@InterfaceAudience.Public @InterfaceStability.Stable public interface InputFormat<K,V>
InputFormat describes the input-specification for a 
 Map-Reduce job. 
 
The Map-Reduce framework relies on the InputFormat of the job to:

1. Split up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
2. Provide the RecordReader implementation to be used to glean input records from the logical InputSplit for processing by the Mapper.
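The two responsibilities above can be illustrated with a small, self-contained Java sketch that does not depend on Hadoop. ToyInputFormat, ToyRecordReader, LineRangeSplit and ListInputFormat are hypothetical stand-ins that only mirror the roles of InputFormat, RecordReader and InputSplit, and runMappers plays the part of the framework by handing each split's records to its own "mapper". The real old-API RecordReader works differently (it fills reusable key and value objects through `next(key, value)` instead of returning a String).

```java
import java.util.ArrayList;
import java.util.List;

// A self-contained toy model of the InputFormat contract. None of these types are
// Hadoop's: ToyInputFormat, ToyRecordReader and LineRangeSplit are hypothetical
// stand-ins that only mirror the roles of InputFormat, RecordReader and InputSplit.
final class InputFormatContractSketch {

    /** A logical split: the half-open range [start, end) of lines; no data is copied. */
    record LineRangeSplit(int start, int end) {}

    /** Plays the role of RecordReader: presents one split as a sequence of records. */
    interface ToyRecordReader {
        /** Returns the next record, or null once the split is exhausted. */
        String next();
    }

    /** Plays the role of InputFormat: computes splits and supplies a reader per split. */
    interface ToyInputFormat {
        List<LineRangeSplit> getSplits(int numSplits);

        ToyRecordReader getRecordReader(LineRangeSplit split);
    }

    /** Treats an in-memory list of lines as the job input. */
    static final class ListInputFormat implements ToyInputFormat {
        private final List<String> lines;

        ListInputFormat(List<String> lines) {
            this.lines = lines;
        }

        @Override
        public List<LineRangeSplit> getSplits(int numSplits) {
            int n = Math.max(1, numSplits);                      // numSplits is only a hint
            int chunk = Math.max(1, (lines.size() + n - 1) / n); // ceiling division
            List<LineRangeSplit> splits = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunk) {
                splits.add(new LineRangeSplit(start, Math.min(start + chunk, lines.size())));
            }
            return splits;
        }

        @Override
        public ToyRecordReader getRecordReader(LineRangeSplit split) {
            return new ToyRecordReader() {
                private int pos = split.start();

                @Override
                public String next() {
                    return pos < split.end() ? lines.get(pos++) : null;
                }
            };
        }
    }

    /** Stands in for the framework: each split goes to its own "mapper". */
    static List<List<String>> runMappers(ToyInputFormat format, int numSplits) {
        List<List<String>> perMapper = new ArrayList<>();
        for (LineRangeSplit split : format.getSplits(numSplits)) {
            ToyRecordReader reader = format.getRecordReader(split);
            List<String> seen = new ArrayList<>();
            for (String value = reader.next(); value != null; value = reader.next()) {
                seen.add(value); // a real Mapper would process the record here
            }
            perMapper.add(seen);
        }
        return perMapper;
    }

    public static void main(String[] args) {
        ToyInputFormat format = new ListInputFormat(List.of("a", "b", "c", "d", "e"));
        System.out.println(runMappers(format, 2)); // [[a, b, c], [d, e]]
    }
}
```

Keeping split computation and record reading behind one interface is what lets the framework stay agnostic about the input's storage and record format.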
The default behavior of file-based InputFormats, typically subclasses of FileInputFormat, is to split the input into logical InputSplits based on the total size, in bytes, of the input files. However, the FileSystem block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via `mapreduce.input.fileinputformat.split.minsize`.
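As a rough illustration of how these bounds interact, the sketch below assumes the sizing rule of the old-API FileInputFormat: a goal size of totalSize / numSplits (numSplits being the hint passed to getSplits) is capped by the block size and then raised to at least the minimum split size. The method name computeSplitSize and its parameter list here are illustrative, not Hadoop's exact signature.

```java
// Sketch of split sizing, assuming the old-API FileInputFormat rule
// splitSize = max(minSize, min(goalSize, blockSize)); names here are illustrative.
final class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    /**
     * goalSize spreads totalSize over the requested number of splits; the block size
     * caps it and minSize (e.g. mapreduce.input.fileinputformat.split.minsize) floors it.
     */
    static long computeSplitSize(long totalSize, int numSplits, long blockSize, long minSize) {
        long goalSize = totalSize / Math.max(1, numSplits); // numSplits is only a hint
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB;
        // 1024 MB in 2 requested splits: the 512 MB goal is capped at the block size.
        System.out.println(computeSplitSize(1024 * MB, 2, blockSize, 1) / MB);         // 128
        // 16 requested splits: the 64 MB goal is under the cap and is used as-is.
        System.out.println(computeSplitSize(1024 * MB, 16, blockSize, 1) / MB);        // 64
        // A 256 MB minimum overrides both the goal and the block-size cap.
        System.out.println(computeSplitSize(1024 * MB, 16, blockSize, 256 * MB) / MB); // 256
    }
}
```

Because the minimum is applied after the cap in this sketch, a configured minimum larger than the block size yields splits bigger than one block.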
Logical splits based purely on input size are insufficient for many applications, since record boundaries must be respected. In such cases, the application also has to implement a RecordReader, which is responsible for respecting record boundaries and presenting a record-oriented view of the logical InputSplit to the individual task.
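A minimal sketch of how a line-oriented reader can respect record boundaries when a byte-range split starts or ends in the middle of a line, written in plain Java without Hadoop. It assumes newline-delimited records and the rule that a line belongs to the split containing its first byte; this is similar in spirit to Hadoop's LineRecordReader but is not its actual code, and readLines is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a line belongs to the split that contains its first byte.
final class LineSplitReaderSketch {

    /** Returns the lines whose first byte lies in the byte range [start, end) of data. */
    static List<String> readLines(byte[] data, int start, int end) {
        int pos = start;
        // If the split begins mid-line, that line belongs to the previous split: skip it.
        if (start > 0 && data[start - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline (or past EOF, which ends the loop below)
        }
        List<String> records = new ArrayList<>();
        // Read every line that *starts* before end, even if it finishes after end.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1; // skip the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        // Byte offset 8 falls in the middle of "bravo".
        System.out.println(readLines(data, 0, 8));           // [alpha, bravo]
        System.out.println(readLines(data, 8, data.length)); // [charlie]
    }
}
```

Concatenating the results of two adjacent splits yields every line exactly once, whichever byte offset the boundary falls on.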
See Also: InputSplit, RecordReader, JobClient, FileInputFormat

| Modifier and Type | Method and Description |
|---|---|
| RecordReader<K,V> | `getRecordReader(InputSplit split, JobConf job, Reporter reporter)` Get the RecordReader for the given InputSplit. |
| InputSplit[] | `getSplits(JobConf job, int numSplits)` Logically split the set of input files for the job. |
InputSplit[] getSplits(JobConf job, int numSplits) throws IOException

Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.

Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an `<input-file-path, start, offset>` tuple.
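To make the note concrete, here is a sketch that plans logical splits for one file as (path, start, length) records without reading or copying any data. FileSplitInfo and planSplits are hypothetical names, and the example path is made up; Hadoop's own FileSplit also records a path, a start offset and a length, but this is not its implementation (it ignores host locations, for instance).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: logical splits are coordinates into a file, nothing more.
final class LogicalSplitSketch {

    /** A logical split: a file path plus a byte range; no bytes are copied or moved. */
    record FileSplitInfo(String path, long start, long length) {}

    /** Cuts a file of fileLength bytes into consecutive ranges of at most splitSize bytes. */
    static List<FileSplitInfo> planSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<FileSplitInfo> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new FileSplitInfo(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A hypothetical 300-byte file cut into 128-byte logical splits.
        for (FileSplitInfo split : planSplits("/data/input.txt", 300, 128)) {
            System.out.println(split);
        }
    }
}
```

The last split is simply shorter; where it ends mid-record, the RecordReader, not the split planner, is responsible for finishing the record.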
Parameters:
- job - job configuration.
- numSplits - the desired number of splits, a hint.

Returns: an array of InputSplits for the job.

Throws: IOException

RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Get the RecordReader for the given InputSplit.

It is the responsibility of the RecordReader to respect record boundaries while processing the logical split, so as to present a record-oriented view to the individual task.
Parameters:
- split - the InputSplit
- job - the job that this split belongs to

Returns: a RecordReader

Throws: IOException

Copyright © 2018 Apache Software Foundation. All rights reserved.