org.apache.hadoop.mapreduce.InputFormat<K,V>

org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>

org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat<K,V>

org.apache.hadoop.mapred.lib.CombineFileInputFormat<K,V>

All Implemented Interfaces: InputFormat<K,V>

Direct Known Subclasses: CombineSequenceFileInputFormat, CombineTextInputFormat

@Public @Stable public abstract class CombineFileInputFormat<K,V> extends org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat<K,V> implements org.apache.hadoop.mapred.InputFormat<K,V>

An abstract InputFormat that returns CombineFileSplits from the InputFormat.getSplits(JobConf, int) method. Splits are constructed from the files under the input paths. A split cannot have files from different pools, but each split returned may contain blocks from different files.

If a maxSplitSize is specified, blocks on the same node are combined to form a single split, and blocks that are left over are then combined with other blocks in the same rack. If maxSplitSize is not specified, blocks from the same rack are combined in a single split and no attempt is made to create node-local splits. If maxSplitSize is equal to the block size, this class behaves like the default splitting behavior in Hadoop: each block is a locally processed split.

Subclasses implement InputFormat.getRecordReader(InputSplit, JobConf, Reporter) to construct RecordReaders for CombineFileSplits.
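
The node-then-rack combining described above can be modeled roughly as follows. This is a simplified sketch, not Hadoop's implementation: `CombineSketch`, `Block`, `pack`, and `combine` are hypothetical names, blocks are plain in-memory objects, a positive maxSplitSize is assumed, and the per-node and per-rack minimum sizes are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Simplified model of node-then-rack block combining; not Hadoop's actual code. */
final class CombineSketch {
    /** Hypothetical stand-in for a file block and its location. */
    static final class Block {
        final String file; final long length; final String node; final String rack;
        Block(String file, long length, String node, String rack) {
            this.file = file; this.length = length; this.node = node; this.rack = rack;
        }
    }

    /**
     * Groups blocks by the given key (node or rack) and closes a split each time
     * the accumulated size reaches maxSplitSize. Returns the blocks left over.
     */
    static List<Block> pack(List<Block> blocks, Function<Block, String> key,
                            long maxSplitSize, List<List<Block>> splits) {
        Map<String, List<Block>> groups = new LinkedHashMap<>();
        for (Block b : blocks) {
            groups.computeIfAbsent(key.apply(b), k -> new ArrayList<>()).add(b);
        }
        List<Block> leftover = new ArrayList<>();
        for (List<Block> group : groups.values()) {
            List<Block> current = new ArrayList<>();
            long size = 0;
            for (Block b : group) {
                current.add(b);
                size += b.length;
                if (size >= maxSplitSize) {
                    splits.add(current);
                    current = new ArrayList<>();
                    size = 0;
                }
            }
            leftover.addAll(current);
        }
        return leftover;
    }

    /** Node-local pass first, then rack-local pass; any remainder forms one last split. */
    static List<List<Block>> combine(List<Block> blocks, long maxSplitSize) {
        List<List<Block>> splits = new ArrayList<>();
        List<Block> afterNodes = pack(blocks, b -> b.node, maxSplitSize, splits);
        List<Block> afterRacks = pack(afterNodes, b -> b.rack, maxSplitSize, splits);
        if (!afterRacks.isEmpty()) {
            splits.add(afterRacks);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block("a", 64, "n1", "r1"), new Block("b", 64, "n1", "r1"),
            new Block("c", 64, "n2", "r1"), new Block("d", 64, "n3", "r1"),
            new Block("e", 64, "n4", "r2"));
        System.out.println(combine(blocks, 128).size());
    }
}
```

In this model, five 64-byte blocks with a maxSplitSize of 128 yield three splits: one node-local (a, b), one rack-local (c, d), and one remainder (e). Setting maxSplitSize to the block size closes a split after every block, which mirrors the one-block-per-split case mentioned in the description.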

See Also:

CombineFileSplit

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
FileInputFormat.Counter
Field Summary

Fields inherited from class org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat
SPLIT_MINSIZE_PERNODE, SPLIT_MINSIZE_PERRACK

Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| CombineFileInputFormat() | default constructor |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected void | createPool(JobConf conf, List<PathFilter> filters) | Deprecated. Use CombineFileInputFormat.createPool(List). |
| protected void | createPool(JobConf conf, PathFilter... filters) | Deprecated. Use CombineFileInputFormat.createPool(PathFilter...). |
| RecordReader<K,V> | createRecordReader(InputSplit split, TaskAttemptContext context) | This is not implemented yet. |
| abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) | This is not implemented yet. |
| InputSplit[] | getSplits(JobConf job, int numSplits) | Logically split the set of input files for the job. |
| protected boolean | isSplitable(FileSystem fs, Path file) | |
| protected boolean | isSplitable(JobContext context, Path file) | Subclasses should avoid overriding this method and should instead only override isSplitable(FileSystem, Path). |
| protected FileStatus[] | listStatus(JobConf job) | List input directories. |

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat
createPool, createPool, getFileBlockLocations, getSplits, setMaxSplitSize, setMinSplitSizeNode, setMinSplitSizeRack

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize, shrinkStatus

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- CombineFileInputFormat
  
  public CombineFileInputFormat()
  
  default constructor
Method Details
- getSplits
  
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
  
  Description copied from interface: InputFormat
  
  Logically split the set of input files for the job.
  Each InputSplit is then assigned to an individual Mapper for processing.
  
  Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple.
  
  Specified by:
  
  getSplits in interface InputFormat<K,V>
  
  Parameters:
  
  job - job configuration.
  
  numSplits - the desired number of splits, a hint.
  
  Returns:
  
  an array of InputSplits for the job.
  
  Throws:
  
  IOException
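  
  To illustrate the note on logical splits, the sketch below describes a file as a list of (path, start, length) ranges without reading or copying any data. `LogicalSplitSketch`, `Range`, and `split` are hypothetical names, and the fixed-size slicing is an assumption made for illustration; it is not how this class combines blocks.

  ```java
  import java.util.ArrayList;
  import java.util.List;

  /** Illustrative only: models a logical split as a byte range of a file. */
  final class LogicalSplitSketch {
      /** Hypothetical tuple naming a byte range; the file itself is never touched. */
      static final class Range {
          final String path; final long start; final long length;
          Range(String path, long start, long length) {
              this.path = path; this.start = start; this.length = length;
          }
      }

      /** Slices [0, fileLength) into consecutive ranges of at most splitSize bytes. */
      static List<Range> split(String path, long fileLength, long splitSize) {
          if (splitSize <= 0) {
              throw new IllegalArgumentException("splitSize must be positive");
          }
          List<Range> ranges = new ArrayList<>();
          for (long start = 0; start < fileLength; start += splitSize) {
              ranges.add(new Range(path, start, Math.min(splitSize, fileLength - start)));
          }
          return ranges;
      }

      public static void main(String[] args) {
          for (Range r : split("/data/a.log", 250, 100)) {
              System.out.println(r.path + " " + r.start + " " + r.length);
          }
      }
  }
  ```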
- createPool
  
  @Deprecated protected void createPool(JobConf conf, List<PathFilter> filters)
  
  Deprecated.
  Use CombineFileInputFormat.createPool(List).
  
  Create a new pool and add the filters to it. A split cannot have files from different pools.
- createPool
  
  @Deprecated protected void createPool(JobConf conf, PathFilter... filters)
  
  Deprecated.
  Use CombineFileInputFormat.createPool(PathFilter...).
  
  Create a new pool and add the filters to it. A pathname can satisfy any one of the specified filters. A split cannot have files from different pools.
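  
  The pool rule can be pictured with a small model: each pool is a list of filters, a path joins a pool when any one of that pool's filters accepts it, and paths are then grouped per pool so that no group mixes pools. `PoolSketch`, `assign`, and `NO_POOL` are hypothetical names, `Predicate<String>` stands in for PathFilter, and picking the first matching pool and bucketing unmatched paths together are assumptions of this sketch.

  ```java
  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.function.Predicate;

  /** Illustrative only: groups paths by the first pool whose filters accept them. */
  final class PoolSketch {
      /** Key used in this sketch for paths that match no pool. */
      static final int NO_POOL = -1;

      static Map<Integer, List<String>> assign(List<String> paths,
                                               List<List<Predicate<String>>> pools) {
          Map<Integer, List<String>> byPool = new LinkedHashMap<>();
          for (String path : paths) {
              int chosen = NO_POOL;
              for (int i = 0; i < pools.size() && chosen == NO_POOL; i++) {
                  // A path satisfies a pool if any one of its filters accepts it.
                  for (Predicate<String> filter : pools.get(i)) {
                      if (filter.test(path)) {
                          chosen = i;
                          break;
                      }
                  }
              }
              byPool.computeIfAbsent(chosen, k -> new ArrayList<>()).add(path);
          }
          return byPool;
      }

      public static void main(String[] args) {
          Predicate<String> isLog = p -> p.endsWith(".log");
          Predicate<String> isTxt = p -> p.endsWith(".txt");
          Predicate<String> isCsv = p -> p.endsWith(".csv");
          List<List<Predicate<String>>> pools =
              Arrays.asList(Arrays.asList(isLog, isTxt), Arrays.asList(isCsv));
          System.out.println(assign(Arrays.asList("a.log", "b.csv", "c.txt", "d.bin"), pools));
      }
  }
  ```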
- getRecordReader
  
  public abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
  
  This is not implemented yet.
  
  Specified by:
  
  getRecordReader in interface InputFormat<K,V>
  
  Parameters:
  
  split - the InputSplit
  
  job - the job that this split belongs to
  
  Returns:
  
  a RecordReader
  
  Throws:
  
  IOException
- createRecordReader
  
  public RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException
  
  Description copied from class: CombineFileInputFormat
  
  This is not implemented yet.
  
  Specified by:
  
  createRecordReader in class CombineFileInputFormat<K,V>
  
  Parameters:
  
  split - the split to be read
  
  context - the information about the task
  
  Returns:
  
  a new record reader
  
  Throws:
  
  IOException
- listStatus
  
  protected FileStatus[] listStatus(JobConf job) throws IOException
  
  List input directories. Subclasses may override to, e.g., select only files matching a regular expression.
  
  Parameters:
  
  job - the job to list input paths for
  
  Returns:
  
  array of FileStatus objects
  
  Throws:
  
  IOException - if zero items.
- isSplitable
  
  @Private protected boolean isSplitable(JobContext context, Path file)
  
  Subclasses should avoid overriding this method and should instead only override isSplitable(FileSystem, Path). The implementation of this method simply calls the other method to preserve compatibility.
  Overrides:
  
  isSplitable in class CombineFileInputFormat<K,V>
  
  Parameters:
  
  context - the job context
  
  file - the file name to check
  
  Returns:
  
  true if the file is splitable, false otherwise
  
  See Also:
  
  MAPREDUCE-5530
- isSplitable
  
  protected boolean isSplitable(FileSystem fs, Path file)
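  
  The relationship between the two isSplitable overloads can be sketched with stand-in types. `SplitableSketch`, `Context`, `Fs`, `BaseFormat`, `GzipAwareFormat`, and `check` are hypothetical names, and having the context carry the file system directly is an assumption made to keep the example self-contained. The point is only the delegation pattern: the context-based method forwards to the file-system-based one, so a subclass overrides just the latter.

  ```java
  /** Illustrative only: models the delegation between the two isSplitable overloads. */
  final class SplitableSketch {
      /** Hypothetical stand-in for a file system handle. */
      static final class Fs {}

      /** Hypothetical stand-in for a job context that can supply a file system. */
      static final class Context {
          final Fs fs = new Fs();
      }

      static class BaseFormat {
          /** Context-based entry point: forwards so that existing overrides keep working. */
          protected boolean isSplitable(Context context, String file) {
              return isSplitable(context.fs, file);
          }

          /** The overload subclasses are expected to override. */
          protected boolean isSplitable(Fs fs, String file) {
              return true;
          }
      }

      /** Overrides only the file-system overload, as the documentation recommends. */
      static class GzipAwareFormat extends BaseFormat {
          @Override
          protected boolean isSplitable(Fs fs, String file) {
              return !file.endsWith(".gz");
          }
      }

      /** Calls the context-based overload, as a framework caller would. */
      static boolean check(BaseFormat format, String file) {
          return format.isSplitable(new Context(), file);
      }

      public static void main(String[] args) {
          System.out.println(check(new GzipAwareFormat(), "part-0.gz"));
          System.out.println(check(new GzipAwareFormat(), "part-0.txt"));
      }
  }
  ```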
