java.lang.Object org.apache.hadoop.mapreduce.InputFormat<K,V>
public abstract class InputFormat<K,V>
InputFormat describes the input-specification for a Map-Reduce job.
The Map-Reduce framework relies on the InputFormat of the job to:
1. Validate the input-specification of the job.
2. Split the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
3. Provide the RecordReader implementation to be used to extract input records from the logical InputSplit for processing by the Mapper.
The default behavior of file-based InputFormats, typically sub-classes of FileInputFormat, is to split the input into logical InputSplits based on the total size, in bytes, of the input files. However, the FileSystem blocksize of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size.
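The bounds described above can be illustrated with a minimal sketch in plain Java. It does not use Hadoop classes and is not FileInputFormat's actual code: the names SplitSizeSketch, Range, computeSplitSize and split are hypothetical, as is the file path, and goalSize stands in (as an assumption) for a target derived from the total input size. The sketch also assumes that the configured minimum wins when it exceeds the block size.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSizeSketch {

    /** A logical split: a byte range within a file; no data is copied or moved. */
    static final class Range {
        final String path;
        final long start;
        final long length;

        Range(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    /**
     * Clamp the size-based goal: blockSize is the ceiling, minSize the floor.
     * Assumption: when minSize exceeds blockSize, the floor wins.
     */
    static long computeSplitSize(long goalSize, long blockSize, long minSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Cut one file of fileLength bytes into consecutive ranges of at most splitSize bytes. */
    static List<Range> split(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            ranges.add(new Range(path, start, Math.min(splitSize, fileLength - start)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Goal of 300 MB, block size of 64 MB, minimum of 1 byte: the block size caps the split.
        long splitSize = computeSplitSize(300 * mb, 64 * mb, 1);
        for (Range r : split("/data/input.txt", 150 * mb, splitSize)) {
            System.out.println(r);
        }
    }
}
```

With these inputs a 150 MB file yields three ranges; raising the minimum above the block size would produce fewer, larger splits.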
Clearly, logical splits based on input size are insufficient for many applications, since record boundaries must be respected. In such cases, the application also has to implement a RecordReader, which is responsible for respecting record boundaries and presenting a record-oriented view of the logical InputSplit to the individual task.
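One common way a reader can respect record boundaries over byte-based splits is sketched below for newline-delimited records, in plain Java without Hadoop classes. The class and method names (LineBoundarySketch, readLines, indexAfterNewline) are hypothetical, and the convention is an assumption chosen for illustration: a reader whose range does not start at byte 0 discards its first line, and every reader finishes the line that begins at or before the end of its range. Under that convention each record is read exactly once, regardless of where the split boundaries fall.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineBoundarySketch {

    /** Return the lines owned by the byte range [start, end) of data. */
    static List<String> readLines(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // The first line (partial or whole) is read by the previous range; skip past it.
            pos = indexAfterNewline(data, start);
        }
        List<String> lines = new ArrayList<>();
        // "<= end" lets this range read the line that starts exactly at its end,
        // because the next range always discards its first line.
        while (pos <= end && pos < data.length) {
            int next = indexAfterNewline(data, pos);
            // Drop the trailing newline, if the line has one.
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    /** Index just past the first '\n' at or after from, or data.length if there is none. */
    static int indexAfterNewline(byte[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == '\n') {
                return i + 1;
            }
        }
        return data.length;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 8; // deliberately cuts through the middle of lines
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            System.out.println("[" + start + ", " + end + ") -> " + readLines(data, start, end));
        }
    }
}
```

The byte ranges stay purely size-based; only the reader decides which complete records belong to each range.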
See Also: InputSplit, RecordReader, FileInputFormat
Constructor Summary |
---|
InputFormat() |
Method Summary | |
---|---|
abstract RecordReader<K,V> | createRecordReader(InputSplit split, TaskAttemptContext context) - Create a record reader for a given split. |
abstract List<InputSplit> | getSplits(JobContext context) - Logically split the set of input files for the job. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public InputFormat()
Method Detail |
---|
public abstract List<InputSplit> getSplits(JobContext context) throws IOException, InterruptedException
Logically split the set of input files for the job.
Each InputSplit is then assigned to an individual Mapper for processing.
Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
Parameters:
context - job configuration.
Returns:
the InputSplits for the job.
Throws:
IOException
InterruptedException
public abstract RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException
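Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
Parameters:
split - the split to be read
context - the information about the task
Returns:
a new record reader
Throws:
IOException
InterruptedException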
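The call order described for createRecordReader (create the reader, initialize it, and only then read from the split) can be sketched with simplified stand-ins. The types below (SimpleReader, SimpleFormat, IndexedReader, runTask) are hypothetical and are not the Hadoop classes; an in-memory list stands in for the split, and only the sequence of calls is meant to mirror the contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReaderLifecycleSketch {

    /** Hypothetical stand-in for a record reader; only the call order mirrors the text. */
    interface SimpleReader<K, V> {
        void initialize(List<V> split); // must run before any record is read
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
    }

    /** Hypothetical stand-in for an input format that only creates readers. */
    interface SimpleFormat<K, V> {
        SimpleReader<K, V> createRecordReader();
    }

    /** Reader that pairs each item of an in-memory split with its index. */
    static final class IndexedReader implements SimpleReader<Integer, String> {
        private Iterator<String> items; // null until initialize() runs
        private int index = -1;
        private String value;

        @Override
        public void initialize(List<String> split) {
            items = split.iterator();
        }

        @Override
        public boolean nextKeyValue() {
            if (items == null) {
                throw new IllegalStateException("initialize() was not called");
            }
            if (!items.hasNext()) {
                return false;
            }
            value = items.next();
            index++;
            return true;
        }

        @Override
        public Integer getCurrentKey() {
            return index;
        }

        @Override
        public String getCurrentValue() {
            return value;
        }
    }

    /** What a framework-like driver does for one split: create, initialize, then iterate. */
    static <K, V> List<String> runTask(SimpleFormat<K, V> format, List<V> split) {
        SimpleReader<K, V> reader = format.createRecordReader();
        reader.initialize(split);
        List<String> seen = new ArrayList<>();
        while (reader.nextKeyValue()) {
            seen.add(reader.getCurrentKey() + "=" + reader.getCurrentValue());
        }
        return seen;
    }

    public static void main(String[] args) {
        SimpleFormat<Integer, String> format = IndexedReader::new;
        System.out.println(runTask(format, Arrays.asList("a", "b", "c")));
    }
}
```

Keeping creation separate from initialization means the format only has to choose the reader type, while the driver supplies the split once the task is ready to run.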