java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
public abstract class FileInputFormat<K,V>
extends InputFormat<K,V>

A base class for file-based InputFormats.

FileInputFormat is the base class for all file-based InputFormats. It provides a generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure input files are not split up and are processed as a whole by Mappers.
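As a hedged illustration (not part of the original page), the sketch below shows the common pattern of overriding isSplitable(JobContext, Path) in a FileInputFormat-derived class so each input file is processed whole by a single Mapper. The class name NonSplittableTextInputFormat is hypothetical; TextInputFormat from the same package is assumed as the concrete base.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical subclass: forces whole-file processing by never splitting input files.
public class NonSplittableTextInputFormat extends TextInputFormat {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    // Returning false means getSplits(JobContext) emits one split per file,
    // so a single Mapper sees the entire file.
    return false;
  }
}
```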
Nested Class Summary

| Modifier and Type | Class |
|---|---|
| static class | FileInputFormat.Counter |
Constructor Summary

| Constructor |
|---|
| FileInputFormat() |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static void | addInputPath(Job job, Path path) | Add a Path to the list of inputs for the map-reduce job. |
| static void | addInputPaths(Job job, String commaSeparatedPaths) | Add the given comma separated paths to the list of inputs for the map-reduce job. |
| protected long | computeSplitSize(long blockSize, long minSize, long maxSize) | |
| protected int | getBlockIndex(BlockLocation[] blkLocations, long offset) | |
| protected long | getFormatMinSplitSize() | Get the lower bound on split size imposed by the format. |
| static PathFilter | getInputPathFilter(JobContext context) | Get a PathFilter instance of the filter set for the input paths. |
| static Path[] | getInputPaths(JobContext context) | Get the list of input Paths for the map-reduce job. |
| static long | getMaxSplitSize(JobContext context) | Get the maximum split size. |
| static long | getMinSplitSize(JobContext job) | Get the minimum split size. |
| List<InputSplit> | getSplits(JobContext job) | Generate the list of files and make them into FileSplits. |
| protected boolean | isSplitable(JobContext context, Path filename) | Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. |
| protected List<FileStatus> | listStatus(JobContext job) | List input directories. |
| static void | setInputPathFilter(Job job, Class<? extends PathFilter> filter) | Set a PathFilter to be applied to the input paths for the map-reduce job. |
| static void | setInputPaths(Job job, Path... inputPaths) | Set the array of Paths as the list of inputs for the map-reduce job. |
| static void | setInputPaths(Job job, String commaSeparatedPaths) | Sets the given comma separated paths as the list of inputs for the map-reduce job. |
| static void | setMaxInputSplitSize(Job job, long size) | Set the maximum split size. |
| static void | setMinInputSplitSize(Job job, long size) | Set the minimum input split size. |
Methods inherited from class org.apache.hadoop.mapreduce.InputFormat
createRecordReader

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public FileInputFormat()
Method Detail

protected long getFormatMinSplitSize()
Get the lower bound on split size imposed by the format.

protected boolean isSplitable(JobContext context, Path filename)
Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
Parameters:
    context - the job context
    filename - the file name to check

public static void setInputPathFilter(Job job, Class<? extends PathFilter> filter)
Set a PathFilter to be applied to the input paths for the map-reduce job.
Parameters:
    job - the job to modify
    filter - the PathFilter class to use for filtering the input paths.

public static void setMinInputSplitSize(Job job, long size)
Set the minimum input split size.
Parameters:
    job - the job to modify
    size - the minimum size

public static long getMinSplitSize(JobContext job)
Get the minimum split size.
Parameters:
    job - the job

public static void setMaxInputSplitSize(Job job, long size)
Set the maximum split size.
Parameters:
    job - the job to modify
    size - the maximum split size

public static long getMaxSplitSize(JobContext context)
Get the maximum split size.
Parameters:
    context - the job to look at.

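The following is a minimal, hedged sketch (not from this page) of how a driver might tune split sizes with the helpers above; the byte values and job name are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeTuning {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "split-size-tuning"); // Job.getInstance(conf) on newer Hadoop versions

    // Ask for splits of at least 64 MB and at most 256 MB (example values).
    FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
    FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

    // The configured bounds can be read back from the job context.
    long min = FileInputFormat.getMinSplitSize(job);
    long max = FileInputFormat.getMaxSplitSize(job);
    System.out.println("min=" + min + " max=" + max);
  }
}
```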
public static PathFilter getInputPathFilter(JobContext context)
Get a PathFilter instance of the filter set for the input paths.
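As a hedged illustration (the class name MyLogFileFilter and the suffix check are assumptions, not part of this API), a PathFilter registered with setInputPathFilter is applied when the input paths are listed, and the configured filter can later be retrieved with getInputPathFilter:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Hypothetical filter: only accept files whose names end in ".log".
public class MyLogFileFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    return path.getName().endsWith(".log");
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "filtered-input");
    FileInputFormat.setInputPathFilter(job, MyLogFileFilter.class);

    // Later, e.g. inside an InputFormat, the configured filter can be retrieved.
    PathFilter filter = FileInputFormat.getInputPathFilter(job);
    System.out.println(filter.accept(new Path("/data/events.log"))); // true
  }
}
```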
protected List<FileStatus> listStatus(JobContext job) throws IOException
List input directories.
Parameters:
    job - the job to list input paths for
Throws:
    IOException - if zero items.

public List<InputSplit> getSplits(JobContext job) throws IOException
Generate the list of files and make them into FileSplits.
Specified by:
    getSplits in class InputFormat<K,V>
Parameters:
    job - job configuration.
Returns:
    InputSplits for the job.
Throws:
    IOException

protected long computeSplitSize(long blockSize, long minSize, long maxSize)
protected int getBlockIndex(BlockLocation[] blkLocations, long offset)
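This page does not define what computeSplitSize returns, so the sketch below is an assumption: in common Hadoop releases the split size is the block size clamped to the configured [minSize, maxSize] range. Treat it as an illustration, not the authoritative implementation.

```java
public class SplitSizeFormula {

  // Hedged sketch of the widely documented clamping formula (an assumption here).
  static long computeSplitSizeSketch(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024; // 128 MB HDFS block (example)
    long minSize = 1L;                   // example lower bound
    long maxSize = Long.MAX_VALUE;       // effectively no upper bound (example)
    // With these inputs the split size equals the block size: 134217728 bytes.
    System.out.println(computeSplitSizeSketch(blockSize, minSize, maxSize));
  }
}
```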
public static void setInputPaths(Job job, String commaSeparatedPaths) throws IOException
Sets the given comma separated paths as the list of inputs for the map-reduce job.
Parameters:
    job - the job
    commaSeparatedPaths - Comma separated paths to be set as the list of inputs for the map-reduce job.
Throws:
    IOException

public static void addInputPaths(Job job, String commaSeparatedPaths) throws IOException
Add the given comma separated paths to the list of inputs for the map-reduce job.
Parameters:
    job - The job to modify
    commaSeparatedPaths - Comma separated paths to be added to the list of inputs for the map-reduce job.
Throws:
    IOException

public static void setInputPaths(Job job, Path... inputPaths) throws IOException
Set the array of Paths as the list of inputs for the map-reduce job.
Parameters:
    job - The job to modify
    inputPaths - the Paths of the input directories/files for the map-reduce job.
Throws:
    IOException

public static void addInputPath(Job job, Path path) throws IOException
Add a Path to the list of inputs for the map-reduce job.
Parameters:
    job - The Job to modify
    path - Path to be added to the list of inputs for the map-reduce job.
Throws:
    IOException
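To show these helpers in context, here is a hedged driver-side sketch; the paths, job name, and chosen input format are illustrative assumptions. The comma-separated overloads above accept the same inputs as a single String such as "/data/2011,/data/2012".

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class InputPathSetup {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "input-path-setup");
    job.setInputFormatClass(TextInputFormat.class);

    // Replace any previously configured inputs with these two directories.
    FileInputFormat.setInputPaths(job, new Path("/data/2011"), new Path("/data/2012"));

    // Append one more input without discarding the paths set above.
    FileInputFormat.addInputPath(job, new Path("/data/extra/part-00000"));
  }
}
```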
public static Path[] getInputPaths(JobContext context)
Get the list of input Paths for the map-reduce job.
Parameters:
    context - The job
Returns:
    the list of input Paths for the map-reduce job.
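A short hedged example of reading the configured inputs back from the job; the sample path and the printing loop are illustrative only.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class PrintConfiguredInputs {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "print-inputs");
    FileInputFormat.addInputPath(job, new Path("/data/sample")); // example input
    for (Path p : FileInputFormat.getInputPaths(job)) {
      System.out.println(p); // lists every input Path currently set on the job
    }
  }
}
```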