java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class FileInputFormat<K,V>
extends InputFormat<K,V>
A base class for file-based InputFormats.

FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext).

Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure input-files are not split-up and are processed as a whole by Mappers.
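For example, a format that must hand each file to a single Mapper can disable splitting while reusing an existing record reader. A minimal sketch, assuming the hypothetical class name NonSplittableTextInputFormat and reusing TextInputFormat's record reader:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical subclass: keeps TextInputFormat's record reader but never
// splits a file, so each input file becomes exactly one split.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;
  }
}
```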
Field Summary

Modifier and Type | Field
---|---
static int | DEFAULT_LIST_STATUS_NUM_THREADS
static String | INPUT_DIR
static String | INPUT_DIR_RECURSIVE
static String | LIST_STATUS_NUM_THREADS
static String | NUM_INPUT_FILES
static String | PATHFILTER_CLASS
static String | SPLIT_MAXSIZE
static String | SPLIT_MINSIZE
Constructor Summary

FileInputFormat()
Method Summary

Modifier and Type | Method | Description
---|---|---
static void | addInputPath(Job job, Path path) | Add a Path to the list of inputs for the map-reduce job.
protected void | addInputPathRecursively(List<FileStatus> result, FileSystem fs, Path path, PathFilter inputFilter) | Add files in the input path recursively into the results.
static void | addInputPaths(Job job, String commaSeparatedPaths) | Add the given comma separated paths to the list of inputs for the map-reduce job.
protected long | computeSplitSize(long blockSize, long minSize, long maxSize) |
protected int | getBlockIndex(BlockLocation[] blkLocations, long offset) |
protected long | getFormatMinSplitSize() | Get the lower bound on split size imposed by the format.
static boolean | getInputDirRecursive(JobContext job) |
static PathFilter | getInputPathFilter(JobContext context) | Get a PathFilter instance of the filter set for the input paths.
static Path[] | getInputPaths(JobContext context) | Get the list of input Paths for the map-reduce job.
static long | getMaxSplitSize(JobContext context) | Get the maximum split size.
static long | getMinSplitSize(JobContext job) | Get the minimum split size.
List<InputSplit> | getSplits(JobContext job) | Generate the list of files and make them into FileSplits.
protected boolean | isSplitable(JobContext context, Path filename) | Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be.
protected List<FileStatus> | listStatus(JobContext job) | List input directories.
protected FileSplit | makeSplit(Path file, long start, long length, String[] hosts) | A factory that makes the split for this class.
protected FileSplit | makeSplit(Path file, long start, long length, String[] hosts, String[] inMemoryHosts) | A factory that makes the split for this class.
static void | setInputDirRecursive(Job job, boolean inputDirRecursive) |
static void | setInputPathFilter(Job job, Class<? extends PathFilter> filter) | Set a PathFilter to be applied to the input paths for the map-reduce job.
static void | setInputPaths(Job job, Path... inputPaths) | Set the array of Paths as the list of inputs for the map-reduce job.
static void | setInputPaths(Job job, String commaSeparatedPaths) | Sets the given comma separated paths as the list of inputs for the map-reduce job.
static void | setMaxInputSplitSize(Job job, long size) | Set the maximum split size.
static void | setMinInputSplitSize(Job job, long size) | Set the minimum input split size.
Methods inherited from class org.apache.hadoop.mapreduce.InputFormat
createRecordReader

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final String INPUT_DIR
public static final String SPLIT_MAXSIZE
public static final String SPLIT_MINSIZE
public static final String PATHFILTER_CLASS
public static final String NUM_INPUT_FILES
public static final String INPUT_DIR_RECURSIVE
public static final String LIST_STATUS_NUM_THREADS
public static final int DEFAULT_LIST_STATUS_NUM_THREADS
Constructor Detail
public FileInputFormat()
Method Detail
public static void setInputDirRecursive(Job job, boolean inputDirRecursive)
Parameters:
job - the job to modify
inputDirRecursive - whether the input directories should be traversed recursively

public static boolean getInputDirRecursive(JobContext job)
Parameters:
job - the job to look at.
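A sketch of switching on recursive listing in a job driver; the job name and the /data/logs path are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "recursive-input");

    // /data/logs is assumed to contain nested date directories,
    // e.g. /data/logs/2024/01/part-00000
    FileInputFormat.addInputPath(job, new Path("/data/logs"));

    // Without this call only files directly under /data/logs are listed;
    // with it, input listing descends into sub-directories as well.
    FileInputFormat.setInputDirRecursive(job, true);

    System.out.println(FileInputFormat.getInputDirRecursive(job));  // true
  }
}
```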
protected long getFormatMinSplitSize()
Get the lower bound on split size imposed by the format.

protected boolean isSplitable(JobContext context, Path filename)
Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split-up so that Mappers process entire files.
Parameters:
context - the job context
filename - the file name to check
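As an illustration of the stream-compressed case, a subclass can consult the file's compression codec and only allow splits when the codec supports them. A sketch (the class name CodecAwareInputFormat is hypothetical; a concrete subclass would still have to provide createRecordReader):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Hypothetical sketch: uncompressed files are splitable; compressed files
// only when the codec itself supports splitting (e.g. bzip2).
public abstract class CodecAwareInputFormat<K, V> extends FileInputFormat<K, V> {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    CompressionCodec codec =
        new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
    return codec == null || codec instanceof SplittableCompressionCodec;
  }
}
```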
public static void setInputPathFilter(Job job, Class<? extends PathFilter> filter)
Set a PathFilter to be applied to the input paths for the map-reduce job.
Parameters:
job - the job to modify
filter - the PathFilter class to use for filtering the input paths.

public static void setMinInputSplitSize(Job job, long size)
Set the minimum input split size.
Parameters:
job - the job to modify
size - the minimum size

public static long getMinSplitSize(JobContext job)
Get the minimum split size.
Parameters:
job - the job
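The minimum split size set here is normally paired with setMaxInputSplitSize, described below; a sketch of configuring both in a driver (the job name, input path, and the 64 MB / 256 MB bounds are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "split-size");
    FileInputFormat.addInputPath(job, new Path("/data/in"));

    // Keep splits between 64 MB and 256 MB regardless of the block size.
    FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
    FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

    System.out.println(FileInputFormat.getMinSplitSize(job));  // 67108864
    System.out.println(FileInputFormat.getMaxSplitSize(job));  // 268435456
  }
}
```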
public static void setMaxInputSplitSize(Job job, long size)
Set the maximum split size.
Parameters:
job - the job to modify
size - the maximum split size

public static long getMaxSplitSize(JobContext context)
Get the maximum split size.
Parameters:
context - the job to look at.

public static PathFilter getInputPathFilter(JobContext context)
Get a PathFilter instance of the filter set for the input paths.
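A sketch of wiring a custom filter into a job with setInputPathFilter; the NoTmpFilter class and the input path are illustrative, and the filter's public no-argument constructor lets the framework instantiate it by reflection:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class FilteredInputExample {

  // Hypothetical filter: skip temporary files left behind by an upstream job.
  public static class NoTmpFilter implements PathFilter {
    @Override
    public boolean accept(Path path) {
      return !path.getName().endsWith(".tmp");
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "filtered-input");
    FileInputFormat.addInputPath(job, new Path("/data/in"));
    FileInputFormat.setInputPathFilter(job, NoTmpFilter.class);
  }
}
```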
protected List<FileStatus> listStatus(JobContext job) throws IOException
List input directories.
Parameters:
job - the job to list input paths for
Throws:
IOException - if zero items.

protected void addInputPathRecursively(List<FileStatus> result, FileSystem fs, Path path, PathFilter inputFilter) throws IOException
Add files in the input path recursively into the results.
Parameters:
result - The List to store all files.
fs - The FileSystem.
path - The input path.
inputFilter - The input filter that can be used to filter files/dirs.
Throws:
IOException
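Subclasses typically refine the file listing by delegating to the base implementation and filtering its result. A sketch (the class name LogOnlyTextInputFormat and the .log suffix are illustrative):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical subclass: keep only files ending in .log from the listed inputs.
public class LogOnlyTextInputFormat extends TextInputFormat {
  @Override
  protected List<FileStatus> listStatus(JobContext job) throws IOException {
    List<FileStatus> filtered = new ArrayList<>();
    for (FileStatus status : super.listStatus(job)) {
      if (status.getPath().getName().endsWith(".log")) {
        filtered.add(status);
      }
    }
    return filtered;
  }
}
```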
protected FileSplit makeSplit(Path file, long start, long length, String[] hosts)
A factory that makes the split for this class.

protected FileSplit makeSplit(Path file, long start, long length, String[] hosts, String[] inMemoryHosts)
A factory that makes the split for this class.

public List<InputSplit> getSplits(JobContext job) throws IOException
Generate the list of files and make them into FileSplits.
Specified by:
getSplits in class InputFormat<K,V>
Parameters:
job - the job context
Returns:
the InputSplits for the job.
Throws:
IOException
protected long computeSplitSize(long blockSize, long minSize, long maxSize)
protected int getBlockIndex(BlockLocation[] blkLocations, long offset)
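The split size is conventionally derived by clamping the block size between the configured minimum and maximum. A small self-contained sketch of that arithmetic (not the Hadoop source itself):

```java
public class SplitSizeMath {
  // Sketch of the conventional clamp used when sizing splits: take the
  // block size, bounded below by minSize and above by maxSize.
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024;
    // 128 MB blocks, 1-byte minimum, 256 MB maximum -> 128 MB splits
    System.out.println(computeSplitSize(128 * mb, 1, 256 * mb) / mb);  // 128
    // Capping the maximum at 64 MB shrinks the splits to 64 MB
    System.out.println(computeSplitSize(128 * mb, 1, 64 * mb) / mb);   // 64
  }
}
```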
public static void setInputPaths(Job job, String commaSeparatedPaths) throws IOException
Sets the given comma separated paths as the list of inputs for the map-reduce job.
Parameters:
job - the job
commaSeparatedPaths - Comma separated paths to be set as the list of inputs for the map-reduce job.
Throws:
IOException
public static void addInputPaths(Job job, String commaSeparatedPaths) throws IOException
Add the given comma separated paths to the list of inputs for the map-reduce job.
Parameters:
job - The job to modify
commaSeparatedPaths - Comma separated paths to be added to the list of inputs for the map-reduce job.
Throws:
IOException
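A sketch of configuring inputs from comma separated strings; the job name and paths are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class CommaSeparatedPathsExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "csv-paths");

    // Replace any previously configured inputs with two directories ...
    FileInputFormat.setInputPaths(job, "/data/2024,/data/2025");

    // ... then append one more afterwards.
    FileInputFormat.addInputPaths(job, "/data/extra");
  }
}
```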
public static void setInputPaths(Job job, Path... inputPaths) throws IOException
Set the array of Paths as the list of inputs for the map-reduce job.
Parameters:
job - The job to modify
inputPaths - the Paths of the input directories/files for the map-reduce job.
Throws:
IOException
public static void addInputPath(Job job, Path path) throws IOException
Add a Path to the list of inputs for the map-reduce job.
Parameters:
job - The Job to modify
path - Path to be added to the list of inputs for the map-reduce job.
Throws:
IOException
public static Path[] getInputPaths(JobContext context)
Get the list of input Paths for the map-reduce job.
Parameters:
context - The job
Returns:
the list of input Paths for the map-reduce job.
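A sketch of the Path-based variants together with reading the configuration back; the job name and paths are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class PathListExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "path-list");

    FileInputFormat.setInputPaths(job, new Path("/data/a"), new Path("/data/b"));
    FileInputFormat.addInputPath(job, new Path("/data/c"));

    for (Path p : FileInputFormat.getInputPaths(job)) {
      System.out.println(p);  // the three configured input paths (fully qualified)
    }
  }
}
```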