Package org.apache.hadoop.mapred.lib
Class NLineInputFormat
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<LongWritable,Text>
org.apache.hadoop.mapred.lib.NLineInputFormat
- All Implemented Interfaces:
InputFormat<LongWritable,Text>, JobConfigurable
@Public
@Stable
public class NLineInputFormat
extends FileInputFormat<LongWritable,Text>
implements JobConfigurable
NLineInputFormat splits N lines of input as one split.
In many "pleasantly parallel" applications, each process/mapper
processes the same input file(s), but the computations are
controlled by different parameters (referred to as "parameter sweeps").
One way to achieve this is to specify a set of parameters
(one set per line) as input in a control file
(which is the input path to the map-reduce application,
whereas the input dataset is specified
via a config variable in JobConf).
NLineInputFormat can be used in such applications: it splits
the input file such that, by default, one line is fed as
the value to one map task, and the key is the byte offset of that line,
i.e. (k,v) is (LongWritable, Text).
The location hints will span the whole mapred cluster.
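The grouping described above can be sketched in plain Java. This is a hypothetical stand-in for illustration, not the actual Hadoop implementation; the class `NLineSplitSketch`, the method `splitByNLines`, and its byte-offset arithmetic (assuming one `\n` terminator per line) are all assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of N-line splitting: given the lines of a control
// file and N, produce {startOffset, length} byte ranges of N lines each.
public class NLineSplitSketch {
    static List<long[]> splitByNLines(List<String> lines, int n) {
        List<long[]> splits = new ArrayList<>();
        long offset = 0;  // running byte offset in the file
        long begin = 0;   // start offset of the current split
        int count = 0;    // lines accumulated in the current split
        for (String line : lines) {
            offset += line.length() + 1; // +1 for the '\n' terminator (assumed)
            if (++count == n) {
                splits.add(new long[]{begin, offset - begin});
                begin = offset;
                count = 0;
            }
        }
        if (count > 0) { // trailing split with fewer than N lines
            splits.add(new long[]{begin, offset - begin});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five 4-byte lines (3 chars + newline), grouped two per split.
        List<String> lines = List.of("a=1", "a=2", "a=3", "a=4", "a=5");
        for (long[] s : splitByNLines(lines, 2)) {
            System.out.println(s[0] + "," + s[1]); // prints 0,8 / 8,8 / 16,4
        }
    }
}
```

With N=2 and five lines, each map task would receive two parameter sets, except the last task, which receives the remaining one.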
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
FileInputFormat.Counter -
Field Summary
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LOG, NUM_INPUT_FILES -
Constructor Summary
Constructor  Description
NLineInputFormat()
Method Summary
Modifier and Type  Method  Description
void  configure(JobConf conf)  Initializes a new instance from a JobConf.
protected static FileSplit  createFileSplit(Path fileName, long begin, long length)  NLineInputFormat uses LineRecordReader, which always reads (and consumes) at least one character out of its upper split boundary.
RecordReader<LongWritable,Text>  getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter)  Get the RecordReader for the given InputSplit.
InputSplit[]  getSplits(JobConf job, int numSplits)  Logically splits the set of input files for the job, splits N lines of the input as one split.
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, makeSplit, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
-
Constructor Details
-
NLineInputFormat
public NLineInputFormat()
-
-
Method Details
-
getRecordReader
public RecordReader<LongWritable,Text> getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter) throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit. It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.
- Specified by:
getRecordReader in interface InputFormat<LongWritable,Text>
- Specified by:
getRecordReader in class FileInputFormat<LongWritable,Text>
- Parameters:
genericSplit - the InputSplit
job - the job that this split belongs to
- Returns:
a RecordReader
- Throws:
IOException
-
getSplits
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Logically splits the set of input files for the job, splits N lines of the input as one split.
- Specified by:
getSplits in interface InputFormat<LongWritable,Text>
- Overrides:
getSplits in class FileInputFormat<LongWritable,Text>
- Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
- Returns:
an array of InputSplits for the job.
- Throws:
IOException
- See Also:
-
configure
public void configure(JobConf conf)
Description copied from interface: JobConfigurable
Initializes a new instance from a JobConf.
- Specified by:
configure in interface JobConfigurable
- Parameters:
conf - the configuration
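A minimal configuration sketch for wiring this class into a job via the old mapred API. `JobConf.setInputFormat`, `JobConf.setInt`, and `FileInputFormat.setInputPaths` are standard Hadoop calls; `MyParameterSweepJob`, the input path, and the lines-per-map property name are assumptions (the property name has varied across Hadoop releases, so verify it against your version's documentation):

```
// Sketch: requires the Hadoop mapred classes on the classpath.
JobConf conf = new JobConf(MyParameterSweepJob.class); // hypothetical driver class
conf.setInputFormat(NLineInputFormat.class);
// Assumed property name; check your Hadoop version's configuration reference.
conf.setInt("mapreduce.input.lineinputformat.linespermap", 10);
FileInputFormat.setInputPaths(conf, new Path("/control/params.txt")); // hypothetical path
```

With this setup, each map task would receive ten lines of the control file rather than the default of one.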
-
createFileSplit
NLineInputFormat uses LineRecordReader, which always reads (and consumes) at least one character out of its upper split boundary. So to make sure that each mapper gets N lines, we move back the upper split limits of each split by one character here.- Parameters:
fileName- Path of filebegin- the position of the first byte in the file to processlength- number of bytes in InputSplit- Returns:
- FileSplit
-