org.apache.hadoop.mapred.lib
Class CombineFileInputFormat<K,V>

java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
      org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat<K,V>
        org.apache.hadoop.mapred.lib.CombineFileInputFormat<K,V>
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class CombineFileInputFormat<K,V>
extends CombineFileInputFormat<K,V>
implements InputFormat<K,V>
An abstract InputFormat that returns CombineFileSplit's in the InputFormat.getSplits(JobConf, int) method.
Splits are constructed from the files under the input paths.
A split cannot have files from different pools.
Each split returned may contain blocks from different files.
If a maxSplitSize is specified, then blocks on the same node are
combined to form a single split. Blocks that are left over are
then combined with other blocks in the same rack.
If maxSplitSize is not specified, then blocks from the same rack
are combined in a single split; no attempt is made to create
node-local splits.
If the maxSplitSize is equal to the block size, then this class
is similar to the default splitting behaviour in Hadoop: each
block is a locally processed split.
Subclasses implement InputFormat.getRecordReader(InputSplit, JobConf, Reporter) to construct RecordReader's for CombineFileSplit's.

See Also:
    CombineFileSplit
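By way of illustration, here is a minimal sketch (not part of this API's source) of such a subclass for jobs over many small text files: getRecordReader hands the combined split to a CombineFileRecordReader, which creates one delegate reader per file chunk in the CombineFileSplit. The names CombinedTextInputFormat and SingleFileRecordReader are assumptions made up for this example; only the Hadoop types they use come from the library.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

// Hypothetical subclass: packs small text files into CombineFileSplits.
public class CombinedTextInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    // CombineFileRecordReader instantiates one delegate reader per file
    // chunk contained in the CombineFileSplit.
    return new CombineFileRecordReader<LongWritable, Text>(
        job, (CombineFileSplit) split, reporter,
        (Class) SingleFileRecordReader.class);
  }

  // Hypothetical delegate that reads a single chunk of the combined split.
  // The old-API CombineFileRecordReader requires exactly this constructor
  // signature: (CombineFileSplit, Configuration, Reporter, Integer).
  public static class SingleFileRecordReader
      implements RecordReader<LongWritable, Text> {
    private final LineRecordReader lines;

    public SingleFileRecordReader(CombineFileSplit split, Configuration conf,
        Reporter reporter, Integer idx) throws IOException {
      // Read only the idx-th (path, offset, length) chunk of the split.
      lines = new LineRecordReader(conf, new FileSplit(
          split.getPath(idx), split.getOffset(idx), split.getLength(idx),
          new String[0]));
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      return lines.next(key, value);
    }
    public LongWritable createKey() { return lines.createKey(); }
    public Text createValue() { return lines.createValue(); }
    public long getPos() throws IOException { return lines.getPos(); }
    public float getProgress() throws IOException { return lines.getProgress(); }
    public void close() throws IOException { lines.close(); }
  }
}
```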
Field Summary

| Fields inherited from class org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat |
|---|
| SPLIT_MINSIZE_PERNODE, SPLIT_MINSIZE_PERRACK |

| Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
|---|
| DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE |
Constructor Summary

| Constructor | Description |
|---|---|
| CombineFileInputFormat() | Default constructor. |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| protected void | createPool(JobConf conf, List<PathFilter> filters) Deprecated. Use CombineFileInputFormat.createPool(List) instead. |
| protected void | createPool(JobConf conf, PathFilter... filters) Deprecated. Use CombineFileInputFormat.createPool(PathFilter...) instead. |
| RecordReader<K,V> | createRecordReader(InputSplit split, TaskAttemptContext context) This is not implemented yet. |
| abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) This is not implemented yet. |
| InputSplit[] | getSplits(JobConf job, int numSplits) Logically split the set of input files for the job. |
| protected boolean | isSplitable(FileSystem fs, Path file) |
| protected FileStatus[] | listStatus(JobConf job) List input directories. |
| Methods inherited from class org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat |
|---|
| createPool, createPool, getFileBlockLocations, getSplits, setMaxSplitSize, setMinSplitSizeNode, setMinSplitSizeRack |

| Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
|---|
| addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail

CombineFileInputFormat

public CombineFileInputFormat()

    Default constructor.
Method Detail
getSplits

public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException

    Description copied from interface: InputFormat
    Logically split the set of input files for the job.

    Each InputSplit is then assigned to an individual Mapper for processing.

    Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple.

    Specified by:
        getSplits in interface InputFormat<K,V>
    Parameters:
        job - job configuration.
        numSplits - the desired number of splits, a hint.
    Returns:
        an array of InputSplits for the job.
    Throws:
        IOException
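As a hedged illustration only (none of this appears in the Javadoc above), here is a driver-side sketch that points the hypothetical CombinedTextInputFormat from the earlier example at an assumed input directory, sets the split-size knobs inherited from the mapreduce-side classes, and asks for splits. The path and the size values are assumptions.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

public class SplitPreview {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf();
    // Assumed input directory full of small files.
    FileInputFormat.setInputPaths(job, new Path("/data/small-files"));

    // Upper bound on a combined split (SPLIT_MAXSIZE inherited from
    // FileInputFormat); leftover blocks fall back to rack-level grouping.
    job.setLong(
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MAXSIZE,
        256L * 1024 * 1024);
    // Minimum bytes to keep per node / per rack before spilling upward
    // (SPLIT_MINSIZE_PERNODE / SPLIT_MINSIZE_PERRACK from the superclass).
    job.setLong(
        org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.SPLIT_MINSIZE_PERNODE,
        64L * 1024 * 1024);
    job.setLong(
        org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.SPLIT_MINSIZE_PERRACK,
        128L * 1024 * 1024);

    // numSplits is only a hint; the combining logic decides the real count.
    InputSplit[] splits = new CombinedTextInputFormat().getSplits(job, 1);
    System.out.println("splits: " + splits.length);
  }
}
```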
createPool

@Deprecated
protected void createPool(JobConf conf, List<PathFilter> filters)

    Deprecated. Use CombineFileInputFormat.createPool(List) instead.

createPool

@Deprecated
protected void createPool(JobConf conf, PathFilter... filters)

    Deprecated. Use CombineFileInputFormat.createPool(PathFilter...) instead.
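For a sense of how the replacement createPool(PathFilter...) is used, here is a small sketch under stated assumptions: a hypothetical subclass registers two pools from its constructor so that no combined split mixes files from the two directory trees, as described in the class overview. PooledInputFormat, PrefixFilter, and the paths are illustrative, and the class is left abstract because getRecordReader still has to be supplied by a further subclass.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;

// Hypothetical subclass that separates two directory trees into pools.
public abstract class PooledInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  // Illustrative filter: accepts only paths under the given prefix.
  private static final class PrefixFilter implements PathFilter {
    private final String prefix;
    PrefixFilter(String prefix) { this.prefix = prefix; }
    public boolean accept(Path path) {
      return path.toUri().getPath().startsWith(prefix);
    }
  }

  public PooledInputFormat() {
    // Each createPool call defines one pool; splits never span pools.
    createPool(new PrefixFilter("/data/clicks/"));
    createPool(new PrefixFilter("/data/impressions/"));
  }
}
```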
getRecordReader

public abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException

    This is not implemented yet.

    Specified by:
        getRecordReader in interface InputFormat<K,V>
    Parameters:
        split - the InputSplit
        job - the job that this split belongs to
    Returns:
        a RecordReader
    Throws:
        IOException
createRecordReader

public RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException

    Description copied from class: CombineFileInputFormat
    This is not implemented yet.

    Specified by:
        createRecordReader in class CombineFileInputFormat<K,V>
    Parameters:
        split - the split to be read
        context - the information about the task
    Throws:
        IOException
listStatus

protected FileStatus[] listStatus(JobConf job) throws IOException

    List input directories.

    Parameters:
        job - the job to list input paths for
    Throws:
        IOException - if zero items.
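Since listStatus(JobConf) is the hook that decides which files the combining logic sees at all, a subclass can override it to filter the input. A minimal sketch, assuming a made-up rule that skips in-progress ".tmp" files; the class name and suffix rule are illustrative, not part of this API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;

// Hypothetical subclass that drops ".tmp" files before the combining
// logic ever sees them; getRecordReader is left to a further subclass.
public abstract class FilteringInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  @Override
  protected FileStatus[] listStatus(JobConf job) throws IOException {
    List<FileStatus> kept = new ArrayList<FileStatus>();
    for (FileStatus status : super.listStatus(job)) {
      if (!status.getPath().getName().endsWith(".tmp")) {
        kept.add(status);
      }
    }
    return kept.toArray(new FileStatus[0]);
  }
}
```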
isSplitable

protected boolean isSplitable(FileSystem fs, Path file)