org.apache.hadoop.mapreduce.lib.input
Class KeyValueTextInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Text,Text>
org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
@InterfaceAudience.Public
@InterfaceStability.Stable
public class KeyValueTextInputFormat
- extends FileInputFormat<Text,Text>
An InputFormat for plain text files. Files are broken into lines.
Either a line feed or a carriage return signals the end of a line.
Each line is divided into key and value parts by a separator byte. If no
such byte exists, the key is the entire line and the value is empty.
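The splitting rule described above can be illustrated with a minimal plain-Java sketch. It does not use any Hadoop classes; the class name KeyValueLineSketch and the method splitLine are hypothetical, and the tab separator is an assumption chosen for illustration. It only shows how a line is cut at the first occurrence of the separator byte, and what happens when the separator is absent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of the Hadoop API.
final class KeyValueLineSketch {

    /**
     * Splits one line at the FIRST occurrence of the separator byte.
     * Returns a two-element array: {key, value}.
     * If the separator does not occur, the key is the whole line
     * and the value is the empty string.
     */
    static String[] splitLine(String line, byte separator) {
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);

        // Locate the first separator byte, if any.
        int pos = -1;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == separator) {
                pos = i;
                break;
            }
        }

        if (pos == -1) {
            // No separator: entire line is the key, value is empty.
            return new String[] { line, "" };
        }

        // Everything before the separator is the key; everything after it
        // (including any later separator bytes) is the value.
        String key = new String(bytes, 0, pos, StandardCharsets.UTF_8);
        String value = new String(bytes, pos + 1, bytes.length - pos - 1,
                StandardCharsets.UTF_8);
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        byte tab = (byte) '\t'; // assumed separator for this sketch

        // key = "user42", value = "login\tok" (second tab stays in the value)
        System.out.println(Arrays.toString(splitLine("user42\tlogin\tok", tab)));

        // key = "no separator here", value = ""
        System.out.println(Arrays.toString(splitLine("no separator here", tab)));
    }
}
```

Because the sketch splits only at the first separator, later separator bytes remain part of the value.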
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
KeyValueTextInputFormat
public KeyValueTextInputFormat()
isSplitable
protected boolean isSplitable(JobContext context,
Path file)
- Description copied from class:
FileInputFormat
- Is the given filename splitable? Usually true, but if the file is
stream compressed, it will not be. FileInputFormat implementations can
override this and return false to ensure that individual input files are
never split up, so that Mappers process entire files.
- Overrides:
isSplitable
in class FileInputFormat<Text,Text>
- Parameters:
context - the job context
file - the file name to check
- Returns:
- true if this file is splitable, false otherwise
createRecordReader
public RecordReader<Text,Text> createRecordReader(InputSplit genericSplit,
TaskAttemptContext context)
throws IOException
- Description copied from class:
InputFormat
- Create a record reader for a given split. The framework will call
RecordReader.initialize(InputSplit, TaskAttemptContext) before the split
is used.
- Specified by:
createRecordReader
in class InputFormat<Text,Text>
- Parameters:
genericSplit - the split to be read
context - the information about the task
- Returns:
- a new record reader
- Throws:
IOException
Copyright © 2009 The Apache Software Foundation