@InterfaceAudience.Public @InterfaceStability.Stable public abstract class FileInputFormat<K,V> extends InputFormat<K,V>
 FileInputFormat is the base class for all file-based
 InputFormats. It provides a generic implementation of
 getSplits(JobContext).
 Implementations of FileInputFormat can also override the
 isSplitable(JobContext, Path) method to prevent input files
 from being split up in certain situations. Implementations that may
 deal with non-splittable files must override this method, since
 the default implementation assumes splitting is always possible.
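To make the effect of isSplitable(JobContext, Path) concrete, here is a minimal, Hadoop-free sketch of how one file might be carved into byte ranges. It is a simplified model, not Hadoop's getSplits implementation: the class SplitModelSketch and the method planSplits are hypothetical names, the splittable flag stands in for the return value of isSplitable, and block locations and the configured split bounds are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hadoop-free model of carving one file into byte ranges (hypothetical names). */
final class SplitModelSketch {

    /**
     * Returns {start, length} pairs covering the file.
     * Simplification: an empty file yields no ranges, and block locations are ignored.
     */
    static List<long[]> planSplits(long fileLength, long splitSize, boolean splittable) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and splitSize > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        if (fileLength == 0) {
            return ranges;
        }
        if (!splittable) {
            // Models isSplitable(...) returning false: one range, so one Mapper reads the whole file.
            ranges.add(new long[] {0L, fileLength});
            return ranges;
        }
        // Splittable case: fixed-size ranges, with a shorter final range if needed.
        for (long start = 0; start < fileLength; start += splitSize) {
            ranges.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (boolean splittable : new boolean[] {true, false}) {
            StringBuilder line = new StringBuilder("splittable=" + splittable + ":");
            for (long[] r : planSplits(300, 128, splittable)) {
                line.append(" [start=").append(r[0]).append(", length=").append(r[1]).append("]");
            }
            System.out.println(line);
        }
    }
}
```

In this model, a 300-byte file with a 128-byte split size yields three ranges when splittable (lengths 128, 128, and 44) and a single 300-byte range when not, which is why a format for stream-compressed files would want to return false.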
| Modifier and Type | Field and Description | 
|---|---|
| static int | DEFAULT_LIST_STATUS_NUM_THREADS | 
| static String | INPUT_DIR | 
| static String | INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS | 
| static String | INPUT_DIR_RECURSIVE | 
| static String | LIST_STATUS_NUM_THREADS | 
| static String | NUM_INPUT_FILES | 
| static String | PATHFILTER_CLASS | 
| static String | SPLIT_MAXSIZE | 
| static String | SPLIT_MINSIZE | 
| Constructor and Description | 
|---|
| FileInputFormat() | 
| Modifier and Type | Method and Description | 
|---|---|
| static void | addInputPath(Job job, Path path): Add a Path to the list of inputs for the map-reduce job. | 
| protected void | addInputPathRecursively(List<FileStatus> result, FileSystem fs, Path path, PathFilter inputFilter): Add files in the input path recursively into the results. | 
| static void | addInputPaths(Job job, String commaSeparatedPaths): Add the given comma separated paths to the list of inputs for the map-reduce job. | 
| protected long | computeSplitSize(long blockSize, long minSize, long maxSize) | 
| protected int | getBlockIndex(BlockLocation[] blkLocations, long offset) | 
| protected long | getFormatMinSplitSize(): Get the lower bound on split size imposed by the format. | 
| static boolean | getInputDirRecursive(JobContext job) | 
| static PathFilter | getInputPathFilter(JobContext context): Get a PathFilter instance of the filter set for the input paths. | 
| static Path[] | getInputPaths(JobContext context): Get the list of input Paths for the map-reduce job. | 
| static long | getMaxSplitSize(JobContext context): Get the maximum split size. | 
| static long | getMinSplitSize(JobContext job): Get the minimum split size. | 
| List<InputSplit> | getSplits(JobContext job): Generate the list of files and make them into FileSplits. | 
| protected boolean | isSplitable(JobContext context, Path filename): Is the given filename splittable? Usually true, but if the file is stream compressed, it will not be. | 
| protected List<FileStatus> | listStatus(JobContext job): List input directories. | 
| protected FileSplit | makeSplit(Path file, long start, long length, String[] hosts): A factory that makes the split for this class. | 
| protected FileSplit | makeSplit(Path file, long start, long length, String[] hosts, String[] inMemoryHosts): A factory that makes the split for this class. | 
| static void | setInputDirRecursive(Job job, boolean inputDirRecursive) | 
| static void | setInputPathFilter(Job job, Class<? extends PathFilter> filter): Set a PathFilter to be applied to the input paths for the map-reduce job. | 
| static void | setInputPaths(Job job, Path... inputPaths): Set the array of Paths as the list of inputs for the map-reduce job. | 
| static void | setInputPaths(Job job, String commaSeparatedPaths): Sets the given comma separated paths as the list of inputs for the map-reduce job. | 
| static void | setMaxInputSplitSize(Job job, long size): Set the maximum split size. | 
| static void | setMinInputSplitSize(Job job, long size): Set the minimum input split size. | 
| static FileStatus | shrinkStatus(FileStatus origStat): The HdfsBlockLocation includes a LocatedBlock which contains messages for issuing more detailed queries to datanodes about a block, but these messages are useless during job submission currently. | 
Methods inherited from class InputFormat: createRecordReader
public static final String INPUT_DIR
public static final String SPLIT_MAXSIZE
public static final String SPLIT_MINSIZE
public static final String PATHFILTER_CLASS
public static final String NUM_INPUT_FILES
public static final String INPUT_DIR_RECURSIVE
public static final String INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS
public static final String LIST_STATUS_NUM_THREADS
public static final int DEFAULT_LIST_STATUS_NUM_THREADS
public static void setInputDirRecursive(Job job, boolean inputDirRecursive)
job - the job to modify
inputDirRecursive - whether input directories are read recursively
public static boolean getInputDirRecursive(JobContext job)
job - the job to look at.
protected long getFormatMinSplitSize()
protected boolean isSplitable(JobContext context, Path filename)
Is the given filename splittable? Usually true, but if the file is stream compressed, it will not be. The default implementation in FileInputFormat always returns true. Implementations that may deal with non-splittable files must override this method.
FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
context - the job context
filename - the file name to check
public static void setInputPathFilter(Job job, Class<? extends PathFilter> filter)
job - the job to modify
filter - the PathFilter class to use for filtering the input paths.
public static void setMinInputSplitSize(Job job, long size)
job - the job to modify
size - the minimum size
public static long getMinSplitSize(JobContext job)
job - the job
public static void setMaxInputSplitSize(Job job, long size)
job - the job to modify
size - the maximum split size
public static long getMaxSplitSize(JobContext context)
context - the job to look at.
public static PathFilter getInputPathFilter(JobContext context)
protected List<FileStatus> listStatus(JobContext job) throws IOException
job - the job to list input paths for and attach tokens to.
Throws: IOException - if zero items.
protected void addInputPathRecursively(List<FileStatus> result, FileSystem fs, Path path, PathFilter inputFilter) throws IOException
result - The List to store all files.
fs - The FileSystem.
path - The input path.
inputFilter - The input filter that can be used to filter files/dirs.
Throws: IOException
public static FileStatus shrinkStatus(FileStatus origStat)
The HdfsBlockLocation includes a LocatedBlock which contains messages for issuing more detailed queries to datanodes about a block, but these messages are useless during job submission currently. Shrinking the status allows listStatus(JobContext) to scan more files with a smaller memory footprint.
origStat - The fat FileStatus.
See also: BlockLocation, HdfsBlockLocation
protected FileSplit makeSplit(Path file, long start, long length, String[] hosts)
protected FileSplit makeSplit(Path file, long start, long length, String[] hosts, String[] inMemoryHosts)
public List<InputSplit> getSplits(JobContext job) throws IOException
Specified by: getSplits in class InputFormat<K,V>
job - the job context
Returns: the InputSplits for the job.
Throws: IOException
protected long computeSplitSize(long blockSize, long minSize, long maxSize)
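The signature of computeSplitSize does not show how its three arguments interact. The sketch below is a minimal illustration, assuming the split size is the block size capped at maxSize and then raised to at least minSize. That formula is not stated in the text above, so treat it as an assumption to check against the Hadoop source; the class name SplitSizeSketch is hypothetical.

```java
/** Hadoop-free sketch of an assumed split-size rule (hypothetical class name). */
final class SplitSizeSketch {

    /** Assumed rule: cap the block size at maxSize, then enforce minSize as a floor. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // No effective bounds: the block size is used as the split size.
        System.out.println(computeSplitSize(128 * mb, 1, Long.MAX_VALUE) / mb); // prints 128
        // A maximum below the block size shrinks the split.
        System.out.println(computeSplitSize(128 * mb, 1, 64 * mb) / mb);        // prints 64
        // A minimum above the block size enlarges the split.
        System.out.println(computeSplitSize(128 * mb, 256 * mb, Long.MAX_VALUE) / mb); // prints 256
    }
}
```

Under this assumption the minimum wins when minSize exceeds maxSize, because the floor is applied last.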
protected int getBlockIndex(BlockLocation[] blkLocations, long offset)
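getBlockIndex carries no description here. One plausible reading of the signature is that it returns the index of the block whose byte range contains offset. The sketch below models that reading without Hadoop: the Block class is a hypothetical stand-in for BlockLocation that keeps only an offset and a length, and throwing IllegalArgumentException for an offset outside every block is an assumption.

```java
/** Hadoop-free sketch of locating the block that contains an offset (hypothetical names). */
final class BlockIndexSketch {

    /** Stand-in for BlockLocation: only the byte range is modelled. */
    static final class Block {
        final long offset;
        final long length;

        Block(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns the index of the block whose range [offset, offset + length) contains the given offset. */
    static int getBlockIndex(Block[] blocks, long offset) {
        for (int i = 0; i < blocks.length; i++) {
            long start = blocks[i].offset;
            long end = start + blocks[i].length; // exclusive
            if (start <= offset && offset < end) {
                return i;
            }
        }
        // Assumed behaviour: an offset outside every block is a caller error.
        throw new IllegalArgumentException("Offset " + offset + " is outside of the file");
    }

    public static void main(String[] args) {
        Block[] blocks = {new Block(0, 128), new Block(128, 128), new Block(256, 44)};
        System.out.println(getBlockIndex(blocks, 0));   // prints 0
        System.out.println(getBlockIndex(blocks, 128)); // prints 1
        System.out.println(getBlockIndex(blocks, 299)); // prints 2
    }
}
```

A split's start offset can be mapped to a block this way, and that block's hosts can then be used as the split's preferred locations.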
public static void setInputPaths(Job job, String commaSeparatedPaths) throws IOException
job - the job
commaSeparatedPaths - Comma separated paths to be set as the list of inputs for the map-reduce job.
Throws: IOException
public static void addInputPaths(Job job, String commaSeparatedPaths) throws IOException
job - The job to modify
commaSeparatedPaths - Comma separated paths to be added to the list of inputs for the map-reduce job.
Throws: IOException
public static void setInputPaths(Job job, Path... inputPaths) throws IOException
Set the array of Paths as the list of inputs for the map-reduce job.
job - The job to modify
inputPaths - the Paths of the input directories/files for the map-reduce job.
Throws: IOException
public static void addInputPath(Job job, Path path) throws IOException
Add a Path to the list of inputs for the map-reduce job.
job - The Job to modify
path - Path to be added to the list of inputs for the map-reduce job.
Throws: IOException
public static Path[] getInputPaths(JobContext context)
Get the list of input Paths for the map-reduce job.
context - The job
Returns: the list of input Paths for the map-reduce job.
Copyright © 2025 Apache Software Foundation. All rights reserved.