org.apache.hadoop.mapred.lib
Class NLineInputFormat
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<LongWritable,Text>
org.apache.hadoop.mapred.lib.NLineInputFormat
- All Implemented Interfaces:
- InputFormat<LongWritable,Text>, JobConfigurable
public class NLineInputFormat
- extends FileInputFormat<LongWritable,Text>
- implements JobConfigurable
NLineInputFormat splits the input so that every N lines form one split.
In many "pleasantly" parallel applications, each process/mapper
processes the same input file(s), but the computations are
controlled by different parameters (referred to as "parameter sweeps").
One way to achieve this is to specify a set of parameters
(one set per line) as input in a control file
(which is the input path to the map-reduce application,
whereas the input dataset is specified
via a config variable in JobConf).
NLineInputFormat can be used in such applications: it splits
the input file such that, by default, one line is fed as
a value to one map task, and the key is the offset,
i.e. (k,v) is (LongWritable, Text).
The location hints will span the whole mapred cluster.
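The default behaviour described above (one control-file line per map task, keyed by its offset) can be pictured with a small sketch that does not use Hadoop at all. The class `ControlFileRecords`, its nested `LineRecord` type, and the sample parameter lines are hypothetical names chosen for illustration; the sketch assumes the offset is a byte offset into a UTF-8 file with `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics the (offset, line) records described above without Hadoop. */
final class ControlFileRecords {

    /** One record: the byte offset of a line (the key) and its text (the value). */
    static final class LineRecord {
        final long offset;
        final String line;

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits the control file contents on '\n' and records where each line starts. */
    static List<LineRecord> records(String controlFile) {
        byte[] bytes = controlFile.getBytes(StandardCharsets.UTF_8);
        List<LineRecord> out = new ArrayList<>();
        int lineStart = 0;
        for (int i = 0; i <= bytes.length; i++) {
            boolean atEnd = (i == bytes.length);
            if (atEnd || bytes[i] == '\n') {
                // Skip the empty remainder after a trailing newline.
                if (i > lineStart || !atEnd) {
                    String line = new String(bytes, lineStart, i - lineStart,
                                             StandardCharsets.UTF_8);
                    out.add(new LineRecord(lineStart, line));
                }
                lineStart = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each line is one hypothetical parameter set for a sweep.
        for (LineRecord r : records("alpha=0.1\nalpha=0.5\n")) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

With N left at its default of one line, each such record would be handed to a separate map task, which would then read the shared dataset named in the job configuration.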
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
NLineInputFormat
public NLineInputFormat()
getRecordReader
public RecordReader<LongWritable,Text> getRecordReader(InputSplit genericSplit,
JobConf job,
Reporter reporter)
throws IOException
- Description copied from interface: InputFormat
- Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect
record boundaries while processing the logical split to present a
record-oriented view to the individual task.
- Specified by:
getRecordReader in interface InputFormat<LongWritable,Text>
- Specified by:
getRecordReader in class FileInputFormat<LongWritable,Text>
- Parameters:
genericSplit - the InputSplit
job - the job that this split belongs to
- Returns:
a RecordReader
- Throws:
IOException
getSplits
public InputSplit[] getSplits(JobConf job,
int numSplits)
throws IOException
- Logically splits the set of input files for the job, grouping every
N lines of the input into one split.
- Specified by:
getSplits in interface InputFormat<LongWritable,Text>
- Overrides:
getSplits in class FileInputFormat<LongWritable,Text>
- Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
- Returns:
an array of InputSplits for the job.
- Throws:
IOException
- See Also:
FileInputFormat.getSplits(JobConf, int)
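The grouping of N lines into one split can be sketched as follows. This is a minimal, Hadoop-free illustration and not the library's implementation: the class `NLineSplitSketch` and its `splits` method are hypothetical, each split is represented as a plain `{begin, length}` pair in bytes, and lines are assumed to end in `\n`.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: groups every N lines of a byte buffer into one {begin, length} range. */
final class NLineSplitSketch {

    static List<long[]> splits(byte[] data, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be at least 1");
        }
        List<long[]> out = new ArrayList<>();
        long begin = 0;   // first byte of the split being built
        long length = 0;  // bytes accumulated so far in that split
        int numLines = 0; // complete lines accumulated so far
        for (byte b : data) {
            length++;
            if (b == '\n') {
                numLines++;
                if (numLines == linesPerSplit) {
                    out.add(new long[] {begin, length});
                    begin += length;
                    length = 0;
                    numLines = 0;
                }
            }
        }
        // Whatever is left (fewer than N lines, or an unterminated last line)
        // becomes the final, shorter split.
        if (length > 0) {
            out.add(new long[] {begin, length});
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "a\nbb\nccc\ndddd\neeeee\n".getBytes();
        for (long[] s : splits(data, 2)) {
            System.out.println("begin=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Note that the number of ranges produced depends only on the line count and N, which is consistent with `numSplits` being described as a hint.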
createFileSplit
protected static FileSplit createFileSplit(Path fileName,
long begin,
long length)
- NLineInputFormat uses LineRecordReader, which always reads
(and consumes) at least one character beyond its upper split
boundary. So, to make sure that each mapper gets exactly N lines, we
move the upper limit of each split back
by one character here.
- Parameters:
fileName - Path of file
begin - the position of the first byte in the file to process
length - number of bytes in InputSplit
- Returns:
FileSplit
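The boundary adjustment described above can be illustrated with plain arithmetic on a `{start, length}` pair. This is a sketch under stated assumptions, not the method's confirmed source: the class `SplitBoundarySketch` and its `adjust` method are hypothetical, and the choice to shift the start of every non-initial split back by one byte (so that its upper limit also moves back by one while its length is unchanged) is an assumption made for illustration.

```java
/** Illustrative only: moves a split's upper limit back by one byte. */
final class SplitBoundarySketch {

    /**
     * Returns the adjusted {start, length} for a split that nominally covers
     * [begin, begin + length).
     */
    static long[] adjust(long begin, long length) {
        if (begin == 0) {
            // First split: keep the start, shorten by one byte.
            return new long[] {0, length - 1};
        }
        // Assumption: later splits start one byte earlier and keep their
        // length, so their upper limit also ends up one byte earlier.
        return new long[] {begin - 1, length};
    }

    /** Upper limit (exclusive) of a {start, length} pair. */
    static long upperLimit(long[] split) {
        return split[0] + split[1];
    }

    public static void main(String[] args) {
        long[] first = adjust(0, 5);
        long[] second = adjust(5, 9);
        System.out.println("first:  start=" + first[0] + " end=" + upperLimit(first));
        System.out.println("second: start=" + second[0] + " end=" + upperLimit(second));
    }
}
```

In both branches the upper limit is one less than the nominal `begin + length`, which is the property the description asks for.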
configure
public void configure(JobConf conf)
- Description copied from interface: JobConfigurable
- Initializes a new instance from a JobConf.
- Specified by:
configure in interface JobConfigurable
- Parameters:
conf - the configuration
Copyright © 2009 The Apache Software Foundation