org.apache.hadoop.mapred
Class TextInputFormat
java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<LongWritable,Text>
    org.apache.hadoop.mapred.TextInputFormat
- All Implemented Interfaces:
- InputFormat<LongWritable,Text>, JobConfigurable
@InterfaceAudience.Public
@InterfaceStability.Stable
public class TextInputFormat
extends FileInputFormat<LongWritable,Text>
implements JobConfigurable
An InputFormat for plain text files. Files are broken into lines.
Either a linefeed or a carriage return signals the end of a line (a carriage
return followed by a linefeed counts as a single terminator). Keys are the
byte offset of each line's start within the file, and values are the line of text.
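As a rough illustration of the key/value contract described above, the sketch below splits an in-memory byte array into (offset, line) records. It is not Hadoop's LineRecordReader: the names `TextRecordsSketch` and `Record` are hypothetical, a plain `long` stands in for `LongWritable`, a `String` stands in for `Text`, and the whole input is assumed to fit in memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of TextInputFormat's records; not Hadoop's LineRecordReader.
class TextRecordsSketch {
    // One (key, value) pair: byte offset of the line start, and the line without its terminator.
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return offset + "\t" + line;
        }
    }

    static List<Record> records(byte[] data) {
        List<Record> out = new ArrayList<>();
        int start = 0; // byte offset where the current line begins
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                out.add(new Record(start, new String(data, start, i - start, StandardCharsets.UTF_8)));
                // A CR immediately followed by LF is one terminator, not two.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) { // last line without a trailing terminator
            out.add(new Record(start, new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "one\ntwo\r\nthree\rfour".getBytes(StandardCharsets.UTF_8);
        for (Record r : records(data)) {
            System.out.println(r);
        }
    }
}
```

Running `main` prints four records with keys 0, 4, 9 and 15: the CR LF pair after `two` advances the offset by two bytes but ends only one line.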
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat:
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, listStatus, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TextInputFormat
public TextInputFormat()
configure
public void configure(JobConf conf)
- Description copied from interface: JobConfigurable
- Initializes a new instance from a JobConf.
- Specified by:
configure in interface JobConfigurable
- Parameters:
conf - the configuration
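The pairing of a public no-argument constructor with `configure` reflects how the framework creates an input format: it instantiates the class reflectively, then, because the class implements JobConfigurable, hands it the job configuration. The sketch below models that two-step lifecycle in plain Java. `ConfigurableSketch`, `TextFormatSketch` and `FactorySketch` are hypothetical stand-ins for JobConfigurable, TextInputFormat and the framework's factory, and a `Map<String, String>` replaces JobConf. The property `io.compression.codecs` is the one Hadoop's codec factory reads; that TextInputFormat's own `configure` builds that codec factory is an assumption based on the 2.x source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JobConfigurable; a Map<String, String> plays the role of JobConf.
interface ConfigurableSketch {
    void configure(Map<String, String> conf);
}

// Hypothetical input format: the constructor does nothing, configure() does the set-up.
class TextFormatSketch implements ConfigurableSketch {
    String codecClasses; // filled in by configure(), not by the constructor

    public TextFormatSketch() { }

    @Override
    public void configure(Map<String, String> conf) {
        // Hadoop's codec factory reads this property; this sketch only stores it.
        codecClasses = conf.getOrDefault("io.compression.codecs", "");
    }
}

class FactorySketch {
    // Instantiate through the no-arg constructor, then pass in the configuration if supported.
    static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (obj instanceof ConfigurableSketch) {
                ((ConfigurableSketch) obj).configure(conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("io.compression.codecs", "org.example.HypotheticalCodec"); // hypothetical value
        TextFormatSketch format = newInstance(TextFormatSketch.class, conf);
        System.out.println(format.codecClasses);
    }
}
```

Keeping set-up out of the constructor is what lets a generic factory create the object without knowing anything about its configuration needs.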
isSplitable
protected boolean isSplitable(FileSystem fs,
Path file)
- Description copied from class: FileInputFormat
- Is the given filename splittable? Usually true, but if the file is
stream compressed, it will not be.
FileInputFormat implementations can override this and return false
to ensure that individual input files are never split up,
so that Mappers process entire files.
- Overrides:
isSplitable in class FileInputFormat<LongWritable,Text>
- Parameters:
fs - the file system that the file is on
file - the file name to check
- Returns:
- whether this file is splittable
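To make the "usually true, unless stream compressed" rule concrete, the sketch below decides splittability from a file name alone. It is a simplification under stated assumptions: Hadoop chooses a compression codec from the file-name suffix, whereas here two suffixes are hard-coded, and `SplitabilitySketch` and the sample paths are hypothetical. The `.bz2` case assumes the Hadoop 2.x behaviour in which a codec implementing SplittableCompressionCodec (such as BZip2Codec) still permits splitting.

```java
// Hypothetical model of the splittability decision; not Hadoop's implementation.
class SplitabilitySketch {
    static boolean isSplitable(String fileName) {
        if (fileName.endsWith(".gz")) {
            return false; // gzip is stream compressed: decoding must start at byte 0
        }
        if (fileName.endsWith(".bz2")) {
            return true;  // assumption: bzip2 is treated as a splittable codec (Hadoop 2.x)
        }
        return true;      // no codec matched: treated as uncompressed text, so splittable
    }

    public static void main(String[] args) {
        for (String f : new String[] {"logs/part-00000", "logs/day1.gz", "logs/day2.bz2"}) {
            System.out.println(f + " -> " + isSplitable(f));
        }
    }
}
```

A gzip file therefore becomes a single split read by one task, no matter how many blocks it spans.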
getRecordReader
public RecordReader<LongWritable,Text> getRecordReader(InputSplit genericSplit,
JobConf job,
Reporter reporter)
throws IOException
- Description copied from interface: InputFormat
- Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect
record boundaries while processing the logical split to present a
record-oriented view to the individual task.
- Specified by:
getRecordReader in interface InputFormat<LongWritable,Text>
- Specified by:
getRecordReader in class FileInputFormat<LongWritable,Text>
- Parameters:
genericSplit - the InputSplit
job - the job that this split belongs to
- Returns:
- a RecordReader
- Throws:
IOException
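Because splits are cut at byte positions, a split usually starts and ends in the middle of a line. The sketch below is a simplified model of the common convention line-oriented readers use to respect record boundaries: a split that does not start at offset 0 skips its first, possibly partial line (the previous split owns it), and every split reads each line that starts at or before its end, even when that last line runs past the split. `SplitBoundarySketch` is hypothetical, works on an in-memory `String`, and treats only `'\n'` as a terminator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of split-boundary handling; not Hadoop's LineRecordReader.
class SplitBoundarySketch {
    // Returns the lines a reader for the split [start, start + length) would emit.
    static List<String> readSplit(String data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Skip through the first '\n': that (partial) line belongs to the previous split.
            while (pos < data.length() && data.charAt(pos) != '\n') {
                pos++;
            }
            pos++; // step over the '\n' itself
        }
        List<String> lines = new ArrayList<>();
        // Emit every line that starts at or before 'end'; the last one may extend past the split.
        while (pos <= end && pos < data.length()) {
            int newline = data.indexOf('\n', pos);
            int lineEnd = (newline == -1) ? data.length() : newline;
            lines.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "aaa\nbb\ncccc\nd\n";
        int splitSize = 5;
        for (int start = 0; start < data.length(); start += splitSize) {
            int length = Math.min(splitSize, data.length() - start);
            System.out.println("split @" + start + ": " + readSplit(data, start, length));
        }
    }
}
```

Cutting the sample into 5-character splits yields `[aaa, bb]`, `[cccc]` and `[d]`: every line is emitted by exactly one split, even though `bb` and `cccc` straddle split boundaries.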
Copyright © 2014 Apache Software Foundation. All Rights Reserved.