@InterfaceAudience.Public @InterfaceStability.Stable public interface InputFormat<K,V>
InputFormat describes the input-specification for a Map-Reduce job.
The Map-Reduce framework relies on the InputFormat of the job to:
1. Split up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
2. Provide the RecordReader implementation used to glean input records from the logical InputSplit for processing by the Mapper.
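The two responsibilities above can be illustrated without Hadoop on the classpath. In this sketch, SketchSplit, SketchRecordReader, SketchInputFormat, RangeSplit, InMemoryLinesFormat and FrameworkDriverSketch are hypothetical stand-ins invented for the illustration, not Hadoop types. The sketch only shows the shape of the flow: the framework asks the format for splits, then asks for one reader per split and drains it in a separate (here simulated) task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, Hadoop-free stand-ins for the roles of InputSplit, RecordReader and InputFormat.
interface SketchSplit {
    int length(); // number of records this split covers
}

interface SketchRecordReader<K, V> {
    boolean advance(); // moves to the next record; false once the split is exhausted
    K key();
    V value();
}

interface SketchInputFormat<K, V> {
    List<? extends SketchSplit> getSplits(int numSplits); // numSplits is only a hint
    SketchRecordReader<K, V> getRecordReader(SketchSplit split);
}

// A split is a half-open range [start, end) of record indexes; no data is copied.
final class RangeSplit implements SketchSplit {
    final int start;
    final int end;
    RangeSplit(int start, int end) { this.start = start; this.end = end; }
    @Override public int length() { return end - start; }
}

// Hypothetical format over an in-memory list of lines: key = line index, value = line.
final class InMemoryLinesFormat implements SketchInputFormat<Integer, String> {
    private final List<String> lines;

    InMemoryLinesFormat(List<String> lines) { this.lines = lines; }

    @Override
    public List<RangeSplit> getSplits(int numSplits) {
        int n = lines.size();
        int hint = Math.max(1, numSplits);
        int perSplit = Math.max(1, (n + hint - 1) / hint); // ceiling division, never zero
        List<RangeSplit> splits = new ArrayList<>();
        for (int start = 0; start < n; start += perSplit) {
            splits.add(new RangeSplit(start, Math.min(n, start + perSplit)));
        }
        return splits;
    }

    @Override
    public SketchRecordReader<Integer, String> getRecordReader(SketchSplit split) {
        final RangeSplit range = (RangeSplit) split;
        return new SketchRecordReader<Integer, String>() {
            private int pos = range.start - 1; // positioned just before the first record

            @Override public boolean advance() {
                if (pos + 1 >= range.end) {
                    return false; // reached the end of this split
                }
                pos++;
                return true;
            }
            @Override public Integer key() { return pos; }
            @Override public String value() { return lines.get(pos); }
        };
    }
}

// Plays the framework's part: one simulated task per split, each draining its own reader.
final class FrameworkDriverSketch {
    static List<List<String>> run(SketchInputFormat<Integer, String> format, int numSplits) {
        List<List<String>> perTask = new ArrayList<>();
        for (SketchSplit split : format.getSplits(numSplits)) {
            SketchRecordReader<Integer, String> reader = format.getRecordReader(split);
            List<String> seen = new ArrayList<>();
            while (reader.advance()) {
                seen.add(reader.value());
            }
            perTask.add(seen);
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(run(new InMemoryLinesFormat(input), 2));
    }
}
```

With five input lines and a hint of two splits, main should print [[a, b, c], [d, e]]: each simulated task sees only the records of its own split.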
The default behavior of file-based InputFormats, such as subclasses of FileInputFormat, is to split the input into logical InputSplits based on the total size, in bytes, of the input files. However, the FileSystem block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set through the job configuration.
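The size-based policy can be illustrated with a small calculation. This sketch assumes the split size is max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the requested number of splits; it is modeled on the FileInputFormat behavior described above but does not attempt to reproduce every detail of Hadoop's implementation. SplitSizeSketch, computeSplitSize and splitLengths are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a size-based split policy; not Hadoop's code.
final class SplitSizeSketch {
    // Assumed formula: the block size caps the split, the configured minimum can raise it again.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Lengths of the logical splits carved from one file of fileSize bytes.
    static List<Long> splitLengths(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Long> lengths = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > splitSize) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining > 0) {
            lengths.add(remaining); // the final, possibly shorter, split
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 128 * mb;   // example block size
        long goalSize = 1024 * mb / 4; // 1 GiB of input, caller asked for about 4 splits
        System.out.println(computeSplitSize(goalSize, 1, blockSize));        // capped at the block size
        System.out.println(computeSplitSize(goalSize, 512 * mb, blockSize)); // raised by the minimum
        System.out.println(splitLengths(300, 128));
    }
}
```

Under this formula a configured minimum larger than the block size wins, so the block size acts as an upper bound only while the minimum stays below it.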
Logical splits based on input size alone are insufficient for many applications, because record boundaries must be respected. In such cases the application must also implement a RecordReader, which is responsible for respecting record boundaries and presenting a record-oriented view of the logical InputSplit to the individual task.
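One way a line-oriented RecordReader can respect record boundaries is shown in this in-memory sketch: a reader whose split does not start at offset 0 skips the line it lands in, and every reader keeps reading each line that begins no later than its split end, even if that line runs past the end. The assumptions are stated loudly: character offsets stand in for byte offsets, '\n' is the only record delimiter, and LineSplitReaderSketch is a hypothetical class, not Hadoop's LineRecordReader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch; character offsets stand in for byte offsets.
final class LineSplitReaderSketch {
    // Returns the lines "owned" by the logical split [start, end) of data.
    static List<String> readSplit(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Skip to just past the next newline: the previous split owns the line
            // that straddles our start offset or begins exactly at it.
            int nl = data.indexOf('\n', start);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        // Read every line that begins at or before the split end.
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            records.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return records;
    }

    // Cuts data into fixed-size logical splits and reads each one independently.
    static List<String> readAll(String data, int splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length(); start += splitSize) {
            all.addAll(readSplit(data, start, Math.min(data.length(), start + splitSize)));
        }
        return all;
    }

    public static void main(String[] args) {
        String data = "alpha\nbeta\ngamma\n";
        System.out.println(readSplit(data, 0, 8));
        System.out.println(readSplit(data, 8, data.length()));
        System.out.println(readAll(data, 3));
    }
}
```

Because the first split owns the lines beginning in [0, end] and every later split owns the lines beginning in (start, end], each line is returned exactly once no matter where the split offsets fall.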
InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple.
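To make the "logical split" point concrete, this sketch represents a split purely as a descriptor and computes descriptors from file sizes alone, without opening or copying any file. FileSplitSketch, LogicalSplitsSketch, the fixed splitSize parameter and the example paths are hypothetical; the third field is treated here as a length in bytes, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical descriptor: a logical split is just (path, start, length); no bytes are copied.
final class FileSplitSketch {
    final String path;
    final long start;
    final long length;

    FileSplitSketch(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    @Override
    public String toString() {
        return "<" + path + ", " + start + ", " + length + ">";
    }
}

final class LogicalSplitsSketch {
    // Describes each file as consecutive ranges of at most splitSize bytes; splits never span files.
    static List<FileSplitSketch> getSplits(Map<String, Long> fileSizes, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<FileSplitSketch> splits = new ArrayList<>();
        for (Map.Entry<String, Long> file : fileSizes.entrySet()) {
            long size = file.getValue();
            for (long start = 0; start < size; start += splitSize) {
                splits.add(new FileSplitSketch(file.getKey(), start, Math.min(splitSize, size - start)));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>(); // insertion order keeps output stable
        sizes.put("/data/part-0", 250L); // hypothetical paths and sizes
        sizes.put("/data/part-1", 90L);
        System.out.println(getSplits(sizes, 100));
    }
}
```

Only the sizes are consulted; a RecordReader would later use the path and offsets in each descriptor to read the bytes it needs.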
RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Gets the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect record boundaries while processing the logical split and to present a record-oriented view to the individual task.
Copyright © 2023 Apache Software Foundation. All rights reserved.