org.apache.hadoop.mapred.lib
Class CombineFileInputFormat<K,V>

java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
      org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat<K,V>
        org.apache.hadoop.mapred.lib.CombineFileInputFormat<K,V>
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class CombineFileInputFormat<K,V>
extends CombineFileInputFormat<K,V>
implements InputFormat<K,V>
An abstract InputFormat that returns CombineFileSplit's in the InputFormat.getSplits(JobConf, int) method.
Splits are constructed from the files under the input paths.
A split cannot have files from different pools.
Each split returned may contain blocks from different files.
If a maxSplitSize is specified, then blocks on the same node are
combined to form a single split. Blocks that are left over are
then combined with other blocks in the same rack.
If maxSplitSize is not specified, then blocks from the same rack
are combined in a single split; no attempt is made to create
node-local splits.
If the maxSplitSize is equal to the block size, then this class
is similar to the default splitting behaviour in Hadoop: each
block is a locally processed split.
Subclasses implement InputFormat.getRecordReader(InputSplit, JobConf, Reporter) to construct RecordReader's for CombineFileSplit's.

See Also:
    CombineFileSplit
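By way of illustration, here is a minimal sketch (not part of this API's source) of such a subclass for jobs over many small text files: getRecordReader hands the combined split to a CombineFileRecordReader, which creates one delegate reader per file chunk in the CombineFileSplit. The names CombinedTextInputFormat and SingleFileRecordReader are assumptions made up for this example; only the Hadoop types they use come from the library.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

// Hypothetical subclass: packs small text files into CombineFileSplits.
public class CombinedTextInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    // CombineFileRecordReader instantiates one delegate reader per file
    // chunk contained in the CombineFileSplit.
    return new CombineFileRecordReader<LongWritable, Text>(
        job, (CombineFileSplit) split, reporter,
        (Class) SingleFileRecordReader.class);
  }

  // Hypothetical delegate that reads a single chunk of the combined split.
  // The old-API CombineFileRecordReader requires exactly this constructor
  // signature: (CombineFileSplit, Configuration, Reporter, Integer).
  public static class SingleFileRecordReader
      implements RecordReader<LongWritable, Text> {
    private final LineRecordReader lines;

    public SingleFileRecordReader(CombineFileSplit split, Configuration conf,
        Reporter reporter, Integer idx) throws IOException {
      // Read only the idx-th (path, offset, length) chunk of the split.
      lines = new LineRecordReader(conf, new FileSplit(
          split.getPath(idx), split.getOffset(idx), split.getLength(idx),
          new String[0]));
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      return lines.next(key, value);
    }
    public LongWritable createKey() { return lines.createKey(); }
    public Text createValue() { return lines.createValue(); }
    public long getPos() throws IOException { return lines.getPos(); }
    public float getProgress() throws IOException { return lines.getProgress(); }
    public void close() throws IOException { lines.close(); }
  }
}
```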
Field Summary

| Fields inherited from class org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat |
|---|
| SPLIT_MINSIZE_PERNODE, SPLIT_MINSIZE_PERRACK |

| Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
|---|
| DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE |
Constructor Summary

| Constructor | Description |
|---|---|
| CombineFileInputFormat() | Default constructor. |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| protected void | createPool(JobConf conf, List<PathFilter> filters) Deprecated. Use CombineFileInputFormat.createPool(List) instead. |
| protected void | createPool(JobConf conf, PathFilter... filters) Deprecated. Use CombineFileInputFormat.createPool(PathFilter...) instead. |
| RecordReader<K,V> | createRecordReader(InputSplit split, TaskAttemptContext context) This is not implemented yet. |
| abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) This is not implemented yet. |
| InputSplit[] | getSplits(JobConf job, int numSplits) Logically split the set of input files for the job. |
| protected boolean | isSplitable(FileSystem fs, Path file) |
| protected FileStatus[] | listStatus(JobConf job) List input directories. |
| Methods inherited from class org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat |
|---|
| createPool, createPool, getFileBlockLocations, getSplits, setMaxSplitSize, setMinSplitSizeNode, setMinSplitSizeRack |

| Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
|---|
| addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail

CombineFileInputFormat

public CombineFileInputFormat()

    Default constructor.
Method Detail
getSplits

public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException

    Description copied from interface: InputFormat
    Logically split the set of input files for the job.

    Each InputSplit is then assigned to an individual Mapper for processing.

    Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple.

    Specified by:
        getSplits in interface InputFormat<K,V>
    Parameters:
        job - job configuration.
        numSplits - the desired number of splits, a hint.
    Returns:
        an array of InputSplits for the job.
    Throws:
        IOException
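As a hedged illustration only (none of this appears in the Javadoc above), here is a driver-side sketch that points the hypothetical CombinedTextInputFormat from the earlier example at an assumed input directory, sets the split-size knobs inherited from the mapreduce-side classes, and asks for splits. The path and the size values are assumptions.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

public class SplitPreview {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf();
    // Assumed input directory full of small files.
    FileInputFormat.setInputPaths(job, new Path("/data/small-files"));

    // Upper bound on a combined split (SPLIT_MAXSIZE inherited from
    // FileInputFormat); leftover blocks fall back to rack-level grouping.
    job.setLong(
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MAXSIZE,
        256L * 1024 * 1024);
    // Minimum bytes to keep per node / per rack before spilling upward
    // (SPLIT_MINSIZE_PERNODE / SPLIT_MINSIZE_PERRACK from the superclass).
    job.setLong(
        org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.SPLIT_MINSIZE_PERNODE,
        64L * 1024 * 1024);
    job.setLong(
        org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.SPLIT_MINSIZE_PERRACK,
        128L * 1024 * 1024);

    // numSplits is only a hint; the combining logic decides the real count.
    InputSplit[] splits = new CombinedTextInputFormat().getSplits(job, 1);
    System.out.println("splits: " + splits.length);
  }
}
```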
createPool

@Deprecated
protected void createPool(JobConf conf, List<PathFilter> filters)

    Deprecated. Use CombineFileInputFormat.createPool(List) instead.

createPool

@Deprecated
protected void createPool(JobConf conf, PathFilter... filters)

    Deprecated. Use CombineFileInputFormat.createPool(PathFilter...) instead.
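For a sense of how the replacement createPool(PathFilter...) is used, here is a small sketch under stated assumptions: a hypothetical subclass registers two pools from its constructor so that no combined split mixes files from the two directory trees, as described in the class overview. PooledInputFormat, PrefixFilter, and the paths are illustrative, and the class is left abstract because getRecordReader still has to be supplied by a further subclass.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;

// Hypothetical subclass that separates two directory trees into pools.
public abstract class PooledInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  // Illustrative filter: accepts only paths under the given prefix.
  private static final class PrefixFilter implements PathFilter {
    private final String prefix;
    PrefixFilter(String prefix) { this.prefix = prefix; }
    public boolean accept(Path path) {
      return path.toUri().getPath().startsWith(prefix);
    }
  }

  public PooledInputFormat() {
    // Each createPool call defines one pool; splits never span pools.
    createPool(new PrefixFilter("/data/clicks/"));
    createPool(new PrefixFilter("/data/impressions/"));
  }
}
```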
getRecordReader

public abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException

    This is not implemented yet.

    Specified by:
        getRecordReader in interface InputFormat<K,V>
    Parameters:
        split - the InputSplit
        job - the job that this split belongs to
    Returns:
        a RecordReader
    Throws:
        IOException
createRecordReader

public RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException

    Description copied from class: CombineFileInputFormat
    This is not implemented yet.

    Specified by:
        createRecordReader in class CombineFileInputFormat<K,V>
    Parameters:
        split - the split to be read
        context - the information about the task
    Throws:
        IOException
listStatus

protected FileStatus[] listStatus(JobConf job) throws IOException

    List input directories.

    Parameters:
        job - the job to list input paths for
    Throws:
        IOException - if zero items.
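Since listStatus(JobConf) is the hook that decides which files the combining logic sees at all, a subclass can override it to filter the input. A minimal sketch, assuming a made-up rule that skips in-progress ".tmp" files; the class name and suffix rule are illustrative, not part of this API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;

// Hypothetical subclass that drops ".tmp" files before the combining
// logic ever sees them; getRecordReader is left to a further subclass.
public abstract class FilteringInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  @Override
  protected FileStatus[] listStatus(JobConf job) throws IOException {
    List<FileStatus> kept = new ArrayList<FileStatus>();
    for (FileStatus status : super.listStatus(job)) {
      if (!status.getPath().getName().endsWith(".tmp")) {
        kept.add(status);
      }
    }
    return kept.toArray(new FileStatus[0]);
  }
}
```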
isSplitable

protected boolean isSplitable(FileSystem fs, Path file)