|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object org.apache.hadoop.mapreduce.InputFormat<K,V> org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V> org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat<K,V>
@InterfaceAudience.Public @InterfaceStability.Stable public abstract class CombineFileInputFormat<K,V>
An abstract InputFormat
that returns CombineFileSplit
's in
InputFormat.getSplits(JobContext)
method.
Splits are constructed from the files under the input paths.
A split cannot have files from different pools.
Each split returned may contain blocks from different files.
If a maxSplitSize is specified, then blocks on the same node are
combined to form a single split. Blocks that are left over are
then combined with other blocks in the same rack.
If maxSplitSize is not specified, then blocks from the same rack
are combined in a single split; no attempt is made to create
node-local splits.
If the maxSplitSize is equal to the block size, then this class
is similar to the default splitting behavior in Hadoop: each
block is a locally processed split.
Subclasses implement
InputFormat.createRecordReader(InputSplit, TaskAttemptContext)
to construct RecordReader
's for
CombineFileSplit
's.
CombineFileSplit
Field Summary | |
---|---|
static String |
SPLIT_MINSIZE_PERNODE
|
static String |
SPLIT_MINSIZE_PERRACK
|
Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
---|
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE |
Constructor Summary | |
---|---|
CombineFileInputFormat()
default constructor |
Method Summary | |
---|---|
protected void |
createPool(List<PathFilter> filters)
Create a new pool and add the filters to it. |
protected void |
createPool(PathFilter... filters)
Create a new pool and add the filters to it. |
abstract RecordReader<K,V> |
createRecordReader(InputSplit split,
TaskAttemptContext context)
This is not implemented yet. |
protected BlockLocation[] |
getFileBlockLocations(FileSystem fs,
FileStatus stat)
|
List<InputSplit> |
getSplits(JobContext job)
Generate the list of files and make them into FileSplits. |
protected boolean |
isSplitable(JobContext context,
Path file)
Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. |
protected void |
setMaxSplitSize(long maxSplitSize)
Specify the maximum size (in bytes) of each split. |
protected void |
setMinSplitSizeNode(long minSplitSizeNode)
Specify the minimum size (in bytes) of each split per node. |
protected void |
setMinSplitSizeRack(long minSplitSizeRack)
Specify the minimum size (in bytes) of each split per rack. |
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
---|
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String SPLIT_MINSIZE_PERNODE
public static final String SPLIT_MINSIZE_PERRACK
Constructor Detail |
---|
public CombineFileInputFormat()
Method Detail |
---|
protected void setMaxSplitSize(long maxSplitSize)
protected void setMinSplitSizeNode(long minSplitSizeNode)
protected void setMinSplitSizeRack(long minSplitSizeRack)
protected void createPool(List<PathFilter> filters)
protected void createPool(PathFilter... filters)
protected boolean isSplitable(JobContext context, Path file)
FileInputFormat
FileInputFormat
implementations can override this and return
false
to ensure that individual input files are never split-up
so that Mapper
s process entire files.
isSplitable
in class FileInputFormat<K,V>
context
- the job contextfile
- the file name to check
public List<InputSplit> getSplits(JobContext job) throws IOException
FileInputFormat
getSplits
in class FileInputFormat<K,V>
job
- the job context
InputSplit
s for the job.
IOException
public abstract RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException
createRecordReader
in class InputFormat<K,V>
split
- the split to be readcontext
- the information about the task
IOException
protected BlockLocation[] getFileBlockLocations(FileSystem fs, FileStatus stat) throws IOException
IOException
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |