java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
public abstract class FileInputFormat<K,V>
extends InputFormat<K,V>

A base class for file-based InputFormats.

FileInputFormat is the base class for all file-based InputFormats. It provides a generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure input files are not split up and are processed as a whole by Mappers.
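As a hedged illustration (not part of the original page), the sketch below shows the common pattern of overriding isSplitable(JobContext, Path) in a FileInputFormat-derived class so each input file is processed whole by a single Mapper. The class name NonSplittableTextInputFormat is hypothetical; TextInputFormat from the same package is assumed as the concrete base.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical subclass: forces whole-file processing by never splitting input files.
public class NonSplittableTextInputFormat extends TextInputFormat {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    // Returning false means getSplits(JobContext) emits one split per file,
    // so a single Mapper sees the entire file.
    return false;
  }
}
```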
Nested Class Summary

| Modifier and Type | Class |
|---|---|
| static class | FileInputFormat.Counter |
Constructor Summary

| Constructor |
|---|
| FileInputFormat() |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static void | addInputPath(Job job, Path path) | Add a Path to the list of inputs for the map-reduce job. |
| static void | addInputPaths(Job job, String commaSeparatedPaths) | Add the given comma separated paths to the list of inputs for the map-reduce job. |
| protected long | computeSplitSize(long blockSize, long minSize, long maxSize) | |
| protected int | getBlockIndex(BlockLocation[] blkLocations, long offset) | |
| protected long | getFormatMinSplitSize() | Get the lower bound on split size imposed by the format. |
| static PathFilter | getInputPathFilter(JobContext context) | Get a PathFilter instance of the filter set for the input paths. |
| static Path[] | getInputPaths(JobContext context) | Get the list of input Paths for the map-reduce job. |
| static long | getMaxSplitSize(JobContext context) | Get the maximum split size. |
| static long | getMinSplitSize(JobContext job) | Get the minimum split size. |
| List<InputSplit> | getSplits(JobContext job) | Generate the list of files and make them into FileSplits. |
| protected boolean | isSplitable(JobContext context, Path filename) | Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. |
| protected List<FileStatus> | listStatus(JobContext job) | List input directories. |
| static void | setInputPathFilter(Job job, Class<? extends PathFilter> filter) | Set a PathFilter to be applied to the input paths for the map-reduce job. |
| static void | setInputPaths(Job job, Path... inputPaths) | Set the array of Paths as the list of inputs for the map-reduce job. |
| static void | setInputPaths(Job job, String commaSeparatedPaths) | Sets the given comma separated paths as the list of inputs for the map-reduce job. |
| static void | setMaxInputSplitSize(Job job, long size) | Set the maximum split size. |
| static void | setMinInputSplitSize(Job job, long size) | Set the minimum input split size. |
Methods inherited from class org.apache.hadoop.mapreduce.InputFormat
createRecordReader

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public FileInputFormat()
Method Detail

protected long getFormatMinSplitSize()
Get the lower bound on split size imposed by the format.

protected boolean isSplitable(JobContext context, Path filename)
Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
Parameters:
    context - the job context
    filename - the file name to check

public static void setInputPathFilter(Job job, Class<? extends PathFilter> filter)
Set a PathFilter to be applied to the input paths for the map-reduce job.
Parameters:
    job - the job to modify
    filter - the PathFilter class to use for filtering the input paths.

public static void setMinInputSplitSize(Job job, long size)
Set the minimum input split size.
Parameters:
    job - the job to modify
    size - the minimum size

public static long getMinSplitSize(JobContext job)
Get the minimum split size.
Parameters:
    job - the job

public static void setMaxInputSplitSize(Job job, long size)
Set the maximum split size.
Parameters:
    job - the job to modify
    size - the maximum split size

public static long getMaxSplitSize(JobContext context)
Get the maximum split size.
Parameters:
    context - the job to look at.

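The following is a minimal, hedged sketch (not from this page) of how a driver might tune split sizes with the helpers above; the byte values and job name are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeTuning {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "split-size-tuning"); // Job.getInstance(conf) on newer Hadoop versions

    // Ask for splits of at least 64 MB and at most 256 MB (example values).
    FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
    FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

    // The configured bounds can be read back from the job context.
    long min = FileInputFormat.getMinSplitSize(job);
    long max = FileInputFormat.getMaxSplitSize(job);
    System.out.println("min=" + min + " max=" + max);
  }
}
```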
public static PathFilter getInputPathFilter(JobContext context)
Get a PathFilter instance of the filter set for the input paths.
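As a hedged illustration (the class name MyLogFileFilter and the suffix check are assumptions, not part of this API), a PathFilter registered with setInputPathFilter is applied when the input paths are listed, and the configured filter can later be retrieved with getInputPathFilter:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Hypothetical filter: only accept files whose names end in ".log".
public class MyLogFileFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    return path.getName().endsWith(".log");
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "filtered-input");
    FileInputFormat.setInputPathFilter(job, MyLogFileFilter.class);

    // Later, e.g. inside an InputFormat, the configured filter can be retrieved.
    PathFilter filter = FileInputFormat.getInputPathFilter(job);
    System.out.println(filter.accept(new Path("/data/events.log"))); // true
  }
}
```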
protected List<FileStatus> listStatus(JobContext job) throws IOException
List input directories.
Parameters:
    job - the job to list input paths for
Throws:
    IOException - if zero items.

public List<InputSplit> getSplits(JobContext job) throws IOException
Generate the list of files and make them into FileSplits.
Specified by:
    getSplits in class InputFormat<K,V>
Parameters:
    job - job configuration.
Returns:
    InputSplits for the job.
Throws:
    IOException

protected long computeSplitSize(long blockSize, long minSize, long maxSize)
protected int getBlockIndex(BlockLocation[] blkLocations, long offset)
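This page does not define what computeSplitSize returns, so the sketch below is an assumption: in common Hadoop releases the split size is the block size clamped to the configured [minSize, maxSize] range. Treat it as an illustration, not the authoritative implementation.

```java
public class SplitSizeFormula {

  // Hedged sketch of the widely documented clamping formula (an assumption here).
  static long computeSplitSizeSketch(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024; // 128 MB HDFS block (example)
    long minSize = 1L;                   // example lower bound
    long maxSize = Long.MAX_VALUE;       // effectively no upper bound (example)
    // With these inputs the split size equals the block size: 134217728 bytes.
    System.out.println(computeSplitSizeSketch(blockSize, minSize, maxSize));
  }
}
```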
public static void setInputPaths(Job job, String commaSeparatedPaths) throws IOException
Sets the given comma separated paths as the list of inputs for the map-reduce job.
Parameters:
    job - the job
    commaSeparatedPaths - Comma separated paths to be set as the list of inputs for the map-reduce job.
Throws:
    IOException

public static void addInputPaths(Job job, String commaSeparatedPaths) throws IOException
Add the given comma separated paths to the list of inputs for the map-reduce job.
Parameters:
    job - The job to modify
    commaSeparatedPaths - Comma separated paths to be added to the list of inputs for the map-reduce job.
Throws:
    IOException

public static void setInputPaths(Job job, Path... inputPaths) throws IOException
Set the array of Paths as the list of inputs for the map-reduce job.
Parameters:
    job - The job to modify
    inputPaths - the Paths of the input directories/files for the map-reduce job.
Throws:
    IOException

public static void addInputPath(Job job, Path path) throws IOException
Add a Path to the list of inputs for the map-reduce job.
Parameters:
    job - The Job to modify
    path - Path to be added to the list of inputs for the map-reduce job.
Throws:
    IOException
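To show these helpers in context, here is a hedged driver-side sketch; the paths, job name, and chosen input format are illustrative assumptions. The comma-separated overloads above accept the same inputs as a single String such as "/data/2011,/data/2012".

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class InputPathSetup {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "input-path-setup");
    job.setInputFormatClass(TextInputFormat.class);

    // Replace any previously configured inputs with these two directories.
    FileInputFormat.setInputPaths(job, new Path("/data/2011"), new Path("/data/2012"));

    // Append one more input without discarding the paths set above.
    FileInputFormat.addInputPath(job, new Path("/data/extra/part-00000"));
  }
}
```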
public static Path[] getInputPaths(JobContext context)
Get the list of input Paths for the map-reduce job.
Parameters:
    context - The job
Returns:
    the list of input Paths for the map-reduce job.
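A short hedged example of reading the configured inputs back from the job; the sample path and the printing loop are illustrative only.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class PrintConfiguredInputs {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "print-inputs");
    FileInputFormat.addInputPath(job, new Path("/data/sample")); // example input
    for (Path p : FileInputFormat.getInputPaths(job)) {
      System.out.println(p); // lists every input Path currently set on the job
    }
  }
}
```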