InputFormat (Apache Hadoop Main 2.7.2 API)

java.lang.Object
- org.apache.hadoop.mapreduce.InputFormat<K,V>

Direct Known Subclasses:

ComposableInputFormat, CompositeInputFormat, DBInputFormat, FileInputFormat
```
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class InputFormat<K,V>
extends Object
```
InputFormat describes the input-specification for a Map-Reduce job.
The Map-Reduce framework relies on the InputFormat of the job to:
1. Validate the input-specification of the job.
2. Split up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
3. Provide the RecordReader implementation to be used to glean input records from the logical InputSplit for processing by the Mapper.
The default behavior of file-based InputFormats, typically sub-classes of FileInputFormat, is to split the input into logical InputSplits based on the total size, in bytes, of the input files. However, the FileSystem blocksize of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapreduce.input.fileinputformat.split.minsize.
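As a rough illustration of how the block size and the configured bounds interact, the sketch below assumes the rule max(minSize, min(maxSize, blockSize)), which is how FileInputFormat in Hadoop 2.x is generally understood to pick a split size. The class name and the 128 MiB block size are illustrative assumptions, and the maxSize bound is not mentioned in the description above.

```java
// Minimal sketch of a split-size rule for file-based input formats.
// SplitSizeSketch is a hypothetical name; this is not the Hadoop class itself.
class SplitSizeSketch {

    /** Returns max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed 128 MiB block size

        // With no effective bounds, the split size equals the block size.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE)); // 134217728

        // A minimum above the block size forces larger splits.
        System.out.println(computeSplitSize(blockSize, 256L * 1024 * 1024, Long.MAX_VALUE)); // 268435456
    }
}
```

Note that under this rule a minimum split size larger than the block size wins, so the block size acts as an upper bound only while the minimum stays below it.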

Clearly, logical splits based on input size are insufficient for many applications, since record boundaries must be respected. In such cases, the application must also implement a RecordReader, which is responsible for respecting record boundaries and presenting a record-oriented view of the logical InputSplit to the individual task.
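One common way for a reader to respect record boundaries over byte-oriented splits can be sketched as follows, assuming newline-delimited records. Every split except the first discards its first (possibly partial) line, and every split reads through the line that starts at or before its end offset. The in-memory byte array stands in for a file, and LineSplitSketch, readSplit and skipLine are hypothetical names, not Hadoop APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: turning an arbitrary byte range into whole newline-delimited records.
class LineSplitSketch {

    /** Advances past the next '\n' (or to the end of the data). */
    static int skipLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, data.length);
    }

    /** Returns the records owned by the split [start, start + length). */
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // The previous split reads the line that crosses (or starts at) this offset.
            pos = skipLine(data, pos);
        }
        List<String> records = new ArrayList<>();
        // A line beginning at or before 'end' belongs to this split, even if it runs past 'end'.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 8 falls inside "beta"; the first split still gets the whole record.
        System.out.println(readSplit(data, 0, 8)); // [alpha, beta]
        System.out.println(readSplit(data, 8, 9)); // [gamma]
    }
}
```

With this convention every record is read exactly once, regardless of where the byte boundaries fall.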
See Also:
InputSplit, RecordReader, FileInputFormat

Constructor Summary

Constructors
Constructor and Description

`InputFormat()`

Method Summary

Methods
Modifier and Type	Method and Description
`abstract RecordReader<K,V>`	`createRecordReader(InputSplit split, TaskAttemptContext context)` Create a record reader for a given split.
`abstract List<InputSplit>`	`getSplits(JobContext context)` Logically split the set of input files for the job.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - InputFormat
```
public InputFormat()
```
- Method Detail
  - getSplits
```
public abstract List<InputSplit> getSplits(JobContext context)
                                    throws IOException,
                                           InterruptedException
```
    Logically split the set of input files for the job.
    Each InputSplit is then assigned to an individual Mapper for processing.
    
    Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
    
    Parameters:
    context - job configuration.
    
    Returns:
    a list of InputSplits for the job.
    
    Throws:
    IOException
    InterruptedException
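To make the idea of a logical split concrete, here is a minimal sketch that divides a file into (path, start, length) ranges without touching the file's contents. Range, split and the example path are hypothetical stand-ins, not the Hadoop InputSplit API, and the sketch deliberately omits refinements a real implementation may apply, such as folding a small trailing remainder into the previous split.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: logical splits are just descriptions of byte ranges.
class LogicalSplitSketch {

    /** Hypothetical stand-in for a file-based InputSplit. */
    static final class Range {
        final String path;
        final long start;
        final long length;

        Range(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Describes the file as consecutive ranges of at most splitSize bytes. */
    static List<Range> split(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> splits = new ArrayList<>();
        long start = 0;
        while (start < fileLength) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new Range(path, start, length));
            start += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed example: a 250-byte file and a 100-byte split size.
        for (Range r : split("/data/in.txt", 250, 100)) {
            System.out.println(r.path + " start=" + r.start + " length=" + r.length);
        }
    }
}
```

Each resulting range would then be handed to one mapper, which reads only the bytes its range describes.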
  - createRecordReader
```
public abstract RecordReader<K,V> createRecordReader(InputSplit split,
                                   TaskAttemptContext context)
                                              throws IOException,
                                                     InterruptedException
```
    Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
    
    Parameters:
    split - the split to be read
    context - the information about the task
    
    Returns:
    a new record reader
    
    Throws:
    IOException
    InterruptedException
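The call order described above (create the reader, initialize it, then iterate) can be sketched with simplified stand-in types. SimpleRecordReader and ListRecordReader are hypothetical and only mirror the method names of RecordReader; an in-memory list plays the role of the InputSplit, and the TaskAttemptContext is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the reader lifecycle: create, initialize, iterate, close.
class RecordReaderLifecycleSketch {

    /** Simplified stand-in for RecordReader<K,V>; not the Hadoop class. */
    interface SimpleRecordReader<K, V> extends AutoCloseable {
        void initialize(List<V> split); // stand-in for initialize(InputSplit, TaskAttemptContext)
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
        @Override void close();
    }

    /** Emits (index, element) pairs for an in-memory "split". */
    static final class ListRecordReader implements SimpleRecordReader<Integer, String> {
        private List<String> split;
        private int index = -1;

        @Override public void initialize(List<String> split) {
            this.split = split;
        }

        @Override public boolean nextKeyValue() {
            if (split == null) {
                throw new IllegalStateException("initialize() was not called");
            }
            if (index + 1 >= split.size()) {
                return false;
            }
            index++;
            return true;
        }

        @Override public Integer getCurrentKey() { return index; }
        @Override public String getCurrentValue() { return split.get(index); }
        @Override public void close() { split = null; }
    }

    /** Plays the framework's role for one task. */
    static List<String> runTask(List<String> split) {
        List<String> seen = new ArrayList<>();
        try (ListRecordReader reader = new ListRecordReader()) { // corresponds to createRecordReader(...)
            reader.initialize(split);                             // always before the first read
            while (reader.nextKeyValue()) {
                seen.add(reader.getCurrentKey() + "=" + reader.getCurrentValue());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runTask(Arrays.asList("a", "b"))); // [0=a, 1=b]
    }
}
```

Because initialization happens in a separate call, a createRecordReader implementation can stay cheap and defer opening any resources until initialize runs.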
