@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class InputFormat<K,V>
extends Object
InputFormat describes the input-specification for a Map-Reduce job.
The Map-Reduce framework relies on the InputFormat of the job to split the input into logical InputSplits, each of which is then assigned to an individual Mapper, and to provide the RecordReader implementation used to glean input records from the logical InputSplit for processing by the Mapper.
The default behavior of file-based InputFormats, typically subclasses of FileInputFormat, is to split the input into logical InputSplits based on the total size, in bytes, of the input files. However, the FileSystem blocksize of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set through the job configuration.
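The size-based splitting described above can be sketched in plain Java. This is a simplified illustration, not Hadoop's implementation: the class `SplitSizeSketch` and its method names are hypothetical, the `maxSize` parameter is an assumed additional upper bound, and the clamping formula only mirrors the rule stated above (block size as upper bound, configurable lower bound).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch; it does not use any Hadoop classes.
class SplitSizeSketch {
    // Clamp the block size between the configured lower and upper bounds.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {start, length} pairs that cover [0, fileLength) without copying any data.
    static List<long[]> logicalSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values: 128 MB blocks, a 1-byte lower bound, no upper bound.
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        for (long[] s : logicalSplits(300 * mb, splitSize)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Raising the lower bound above the block size makes each split span more than one block, which is the main reason to set it.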
Clearly, logical splits based on input size are insufficient for many applications, since record boundaries must be respected. In such cases, the application also has to implement a RecordReader, which is responsible for respecting record boundaries and presenting a record-oriented view of the logical InputSplit to the individual task.
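One way a reader can respect record boundaries is sketched below for newline-delimited records held in a byte array. This is an assumption-laden illustration, not Hadoop's own reader: the class `BoundaryAwareReaderSketch` is hypothetical, and the ownership rule (a record belongs to the split that contains its first byte) is chosen here for simplicity.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads newline-delimited records from a byte-range split.
class BoundaryAwareReaderSketch {
    // Returns the lines whose first byte lies in [start, start + length).
    static List<String> readRecords(byte[] data, int start, int length) {
        List<String> records = new ArrayList<>();
        if (start < 0 || start >= data.length || length <= 0) {
            return records;
        }
        int end = Math.min(data.length, start + length);
        int pos = start;
        // If the split begins mid-record, skip to the next record start;
        // the previous split is responsible for that partial record.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline
        }
        // Read every record that starts inside the split, even if it ends past it.
        while (pos < end) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }
}
```

With this rule, adjacent splits never emit the same record twice and never drop one, even when a split boundary falls in the middle of a line.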
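|Modifier and Type|Method and Description|
|abstract RecordReader<K,V>|createRecordReader(InputSplit split, TaskAttemptContext context): Create a record reader for a given split.|
|abstract List<InputSplit>|getSplits(JobContext context): Logically split the set of input files for the job.|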
public abstract List<InputSplit> getSplits(JobContext context) throws IOException, InterruptedException
Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper.
Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
Parameters:
context - job configuration.
Returns:
the InputSplits for the job.
public abstract RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
Parameters:
split - the split to be read
context - the information about the task
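The create-then-initialize ordering can be illustrated with a small stand-alone model. Everything in it is hypothetical (`LifecycleSketch`, `ListReader`, `runTask`); it uses no Hadoop types and only mimics the contract that the reader is created first and initialized by the caller before any record is read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reader lifecycle; not Hadoop's API.
class LifecycleSketch {
    // Stand-in for a record reader: initialize() must run before nextRecord().
    static class ListReader {
        private List<String> source;
        private int next;
        private boolean initialized;

        void initialize(List<String> split) {
            this.source = split;
            this.next = 0;
            this.initialized = true;
        }

        // Returns the next record, or null when the split is exhausted.
        String nextRecord() {
            if (!initialized) {
                throw new IllegalStateException("initialize() was not called");
            }
            return next < source.size() ? source.get(next++) : null;
        }
    }

    // Stand-in for the input format: creates a reader but does not initialize it.
    static ListReader createRecordReader() {
        return new ListReader();
    }

    // Plays the role of the framework for a single task.
    static List<String> runTask(List<String> split) {
        ListReader reader = createRecordReader();
        reader.initialize(split); // the caller, not the factory, initializes the reader
        List<String> seen = new ArrayList<>();
        for (String r = reader.nextRecord(); r != null; r = reader.nextRecord()) {
            seen.add(r);
        }
        return seen;
    }
}
```

Keeping initialization out of the factory method means an implementation can return a lightweight object and defer opening any resources until the split is actually about to be read.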
Copyright © 2017 Apache Software Foundation. All rights reserved.