java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class FileInputFormat<K,V>
extends InputFormat<K,V>
A base class for file-based InputFormats.

FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext).

Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure input-files are not split-up and are processed as a whole by Mappers.
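For example, a format that must hand each file to a single Mapper can disable splitting while reusing an existing record reader. A minimal sketch, assuming the hypothetical class name NonSplittableTextInputFormat and reusing TextInputFormat's record reader:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical subclass: keeps TextInputFormat's record reader but never
// splits a file, so each input file becomes exactly one split.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;
  }
}
```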
Field Summary

Modifier and Type | Field
---|---
static int | DEFAULT_LIST_STATUS_NUM_THREADS
static String | INPUT_DIR
static String | INPUT_DIR_RECURSIVE
static String | LIST_STATUS_NUM_THREADS
static String | NUM_INPUT_FILES
static String | PATHFILTER_CLASS
static String | SPLIT_MAXSIZE
static String | SPLIT_MINSIZE
Constructor Summary

FileInputFormat()
Method Summary

Modifier and Type | Method | Description
---|---|---
static void | addInputPath(Job job, Path path) | Add a Path to the list of inputs for the map-reduce job.
protected void | addInputPathRecursively(List<FileStatus> result, FileSystem fs, Path path, PathFilter inputFilter) | Add files in the input path recursively into the results.
static void | addInputPaths(Job job, String commaSeparatedPaths) | Add the given comma separated paths to the list of inputs for the map-reduce job.
protected long | computeSplitSize(long blockSize, long minSize, long maxSize) |
protected int | getBlockIndex(BlockLocation[] blkLocations, long offset) |
protected long | getFormatMinSplitSize() | Get the lower bound on split size imposed by the format.
static boolean | getInputDirRecursive(JobContext job) |
static PathFilter | getInputPathFilter(JobContext context) | Get a PathFilter instance of the filter set for the input paths.
static Path[] | getInputPaths(JobContext context) | Get the list of input Paths for the map-reduce job.
static long | getMaxSplitSize(JobContext context) | Get the maximum split size.
static long | getMinSplitSize(JobContext job) | Get the minimum split size.
List<InputSplit> | getSplits(JobContext job) | Generate the list of files and make them into FileSplits.
protected boolean | isSplitable(JobContext context, Path filename) | Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be.
protected List<FileStatus> | listStatus(JobContext job) | List input directories.
protected FileSplit | makeSplit(Path file, long start, long length, String[] hosts) | A factory that makes the split for this class.
protected FileSplit | makeSplit(Path file, long start, long length, String[] hosts, String[] inMemoryHosts) | A factory that makes the split for this class.
static void | setInputDirRecursive(Job job, boolean inputDirRecursive) |
static void | setInputPathFilter(Job job, Class<? extends PathFilter> filter) | Set a PathFilter to be applied to the input paths for the map-reduce job.
static void | setInputPaths(Job job, Path... inputPaths) | Set the array of Paths as the list of inputs for the map-reduce job.
static void | setInputPaths(Job job, String commaSeparatedPaths) | Sets the given comma separated paths as the list of inputs for the map-reduce job.
static void | setMaxInputSplitSize(Job job, long size) | Set the maximum split size.
static void | setMinInputSplitSize(Job job, long size) | Set the minimum input split size.
Methods inherited from class org.apache.hadoop.mapreduce.InputFormat
createRecordReader

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final String INPUT_DIR
public static final String SPLIT_MAXSIZE
public static final String SPLIT_MINSIZE
public static final String PATHFILTER_CLASS
public static final String NUM_INPUT_FILES
public static final String INPUT_DIR_RECURSIVE
public static final String LIST_STATUS_NUM_THREADS
public static final int DEFAULT_LIST_STATUS_NUM_THREADS
Constructor Detail
public FileInputFormat()
Method Detail
public static void setInputDirRecursive(Job job, boolean inputDirRecursive)
Parameters:
job - the job to modify
inputDirRecursive - whether the input directories should be traversed recursively

public static boolean getInputDirRecursive(JobContext job)
Parameters:
job - the job to look at.
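A sketch of switching on recursive listing in a job driver; the job name and the /data/logs path are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "recursive-input");

    // /data/logs is assumed to contain nested date directories,
    // e.g. /data/logs/2024/01/part-00000
    FileInputFormat.addInputPath(job, new Path("/data/logs"));

    // Without this call only files directly under /data/logs are listed;
    // with it, input listing descends into sub-directories as well.
    FileInputFormat.setInputDirRecursive(job, true);

    System.out.println(FileInputFormat.getInputDirRecursive(job));  // true
  }
}
```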
protected long getFormatMinSplitSize()
Get the lower bound on split size imposed by the format.

protected boolean isSplitable(JobContext context, Path filename)
Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split-up so that Mappers process entire files.
Parameters:
context - the job context
filename - the file name to check
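As an illustration of the stream-compressed case, a subclass can consult the file's compression codec and only allow splits when the codec supports them. A sketch (the class name CodecAwareInputFormat is hypothetical; a concrete subclass would still have to provide createRecordReader):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Hypothetical sketch: uncompressed files are splitable; compressed files
// only when the codec itself supports splitting (e.g. bzip2).
public abstract class CodecAwareInputFormat<K, V> extends FileInputFormat<K, V> {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    CompressionCodec codec =
        new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
    return codec == null || codec instanceof SplittableCompressionCodec;
  }
}
```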
public static void setInputPathFilter(Job job, Class<? extends PathFilter> filter)
Set a PathFilter to be applied to the input paths for the map-reduce job.
Parameters:
job - the job to modify
filter - the PathFilter class to use for filtering the input paths.

public static void setMinInputSplitSize(Job job, long size)
Set the minimum input split size.
Parameters:
job - the job to modify
size - the minimum size

public static long getMinSplitSize(JobContext job)
Get the minimum split size.
Parameters:
job - the job
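The minimum split size set here is normally paired with setMaxInputSplitSize, described below; a sketch of configuring both in a driver (the job name, input path, and the 64 MB / 256 MB bounds are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "split-size");
    FileInputFormat.addInputPath(job, new Path("/data/in"));

    // Keep splits between 64 MB and 256 MB regardless of the block size.
    FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
    FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

    System.out.println(FileInputFormat.getMinSplitSize(job));  // 67108864
    System.out.println(FileInputFormat.getMaxSplitSize(job));  // 268435456
  }
}
```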
public static void setMaxInputSplitSize(Job job, long size)
Set the maximum split size.
Parameters:
job - the job to modify
size - the maximum split size

public static long getMaxSplitSize(JobContext context)
Get the maximum split size.
Parameters:
context - the job to look at.

public static PathFilter getInputPathFilter(JobContext context)
Get a PathFilter instance of the filter set for the input paths.
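A sketch of wiring a custom filter into a job with setInputPathFilter; the NoTmpFilter class and the input path are illustrative, and the filter's public no-argument constructor lets the framework instantiate it by reflection:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class FilteredInputExample {

  // Hypothetical filter: skip temporary files left behind by an upstream job.
  public static class NoTmpFilter implements PathFilter {
    @Override
    public boolean accept(Path path) {
      return !path.getName().endsWith(".tmp");
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "filtered-input");
    FileInputFormat.addInputPath(job, new Path("/data/in"));
    FileInputFormat.setInputPathFilter(job, NoTmpFilter.class);
  }
}
```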
protected List<FileStatus> listStatus(JobContext job) throws IOException
List input directories.
Parameters:
job - the job to list input paths for
Throws:
IOException - if zero items.

protected void addInputPathRecursively(List<FileStatus> result, FileSystem fs, Path path, PathFilter inputFilter) throws IOException
Add files in the input path recursively into the results.
Parameters:
result - The List to store all files.
fs - The FileSystem.
path - The input path.
inputFilter - The input filter that can be used to filter files/dirs.
Throws:
IOException
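Subclasses typically refine the file listing by delegating to the base implementation and filtering its result. A sketch (the class name LogOnlyTextInputFormat and the .log suffix are illustrative):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical subclass: keep only files ending in .log from the listed inputs.
public class LogOnlyTextInputFormat extends TextInputFormat {
  @Override
  protected List<FileStatus> listStatus(JobContext job) throws IOException {
    List<FileStatus> filtered = new ArrayList<>();
    for (FileStatus status : super.listStatus(job)) {
      if (status.getPath().getName().endsWith(".log")) {
        filtered.add(status);
      }
    }
    return filtered;
  }
}
```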
protected FileSplit makeSplit(Path file, long start, long length, String[] hosts)
A factory that makes the split for this class.

protected FileSplit makeSplit(Path file, long start, long length, String[] hosts, String[] inMemoryHosts)
A factory that makes the split for this class.

public List<InputSplit> getSplits(JobContext job) throws IOException
Generate the list of files and make them into FileSplits.
Specified by:
getSplits in class InputFormat<K,V>
Parameters:
job - the job context
Returns:
the InputSplits for the job.
Throws:
IOException
protected long computeSplitSize(long blockSize, long minSize, long maxSize)
protected int getBlockIndex(BlockLocation[] blkLocations, long offset)
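The split size is conventionally derived by clamping the block size between the configured minimum and maximum. A small self-contained sketch of that arithmetic (not the Hadoop source itself):

```java
public class SplitSizeMath {
  // Sketch of the conventional clamp used when sizing splits: take the
  // block size, bounded below by minSize and above by maxSize.
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024;
    // 128 MB blocks, 1-byte minimum, 256 MB maximum -> 128 MB splits
    System.out.println(computeSplitSize(128 * mb, 1, 256 * mb) / mb);  // 128
    // Capping the maximum at 64 MB shrinks the splits to 64 MB
    System.out.println(computeSplitSize(128 * mb, 1, 64 * mb) / mb);   // 64
  }
}
```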
public static void setInputPaths(Job job, String commaSeparatedPaths) throws IOException
Sets the given comma separated paths as the list of inputs for the map-reduce job.
Parameters:
job - the job
commaSeparatedPaths - Comma separated paths to be set as the list of inputs for the map-reduce job.
Throws:
IOException
public static void addInputPaths(Job job, String commaSeparatedPaths) throws IOException
Add the given comma separated paths to the list of inputs for the map-reduce job.
Parameters:
job - The job to modify
commaSeparatedPaths - Comma separated paths to be added to the list of inputs for the map-reduce job.
Throws:
IOException
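A sketch of configuring inputs from comma separated strings; the job name and paths are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class CommaSeparatedPathsExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "csv-paths");

    // Replace any previously configured inputs with two directories ...
    FileInputFormat.setInputPaths(job, "/data/2024,/data/2025");

    // ... then append one more afterwards.
    FileInputFormat.addInputPaths(job, "/data/extra");
  }
}
```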
public static void setInputPaths(Job job, Path... inputPaths) throws IOException
Set the array of Paths as the list of inputs for the map-reduce job.
Parameters:
job - The job to modify
inputPaths - the Paths of the input directories/files for the map-reduce job.
Throws:
IOException
public static void addInputPath(Job job, Path path) throws IOException
Add a Path to the list of inputs for the map-reduce job.
Parameters:
job - The Job to modify
path - Path to be added to the list of inputs for the map-reduce job.
Throws:
IOException
public static Path[] getInputPaths(JobContext context)
Get the list of input Paths for the map-reduce job.
Parameters:
context - The job
Returns:
the list of input Paths for the map-reduce job.
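A sketch of the Path-based variants together with reading the configuration back; the job name and paths are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class PathListExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "path-list");

    FileInputFormat.setInputPaths(job, new Path("/data/a"), new Path("/data/b"));
    FileInputFormat.addInputPath(job, new Path("/data/c"));

    for (Path p : FileInputFormat.getInputPaths(job)) {
      System.out.println(p);  // the three configured input paths (fully qualified)
    }
  }
}
```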