org.apache.hadoop.mapreduce.lib.input
Class NLineInputFormat
java.lang.Object
  
org.apache.hadoop.mapreduce.InputFormat<K,V>
      
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<LongWritable,Text>
          
org.apache.hadoop.mapreduce.lib.input.NLineInputFormat
@InterfaceAudience.Public
@InterfaceStability.Stable
public class NLineInputFormat
- extends FileInputFormat<LongWritable,Text>
 
NLineInputFormat splits the input so that every N lines form one split.
 In many "pleasantly" parallel applications, each process/mapper 
 processes the same input file(s), but the computations are 
 controlled by different parameters (referred to as "parameter sweeps").
 One way to achieve this is to specify a set of parameters 
 (one set per line) as input in a control file, 
 which is the input path to the map-reduce application, 
 whereas the input dataset is specified 
 via a config variable in JobConf.
 
 The NLineInputFormat can be used in such applications: it splits 
 the input file such that, by default, one line is fed as
 a value to one map task, and the key is the offset,
 i.e. (k,v) is (LongWritable, Text).
 The location hints will span the whole mapred cluster.
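 
 The (key, value) shape described above can be illustrated without Hadoop on the
 classpath. The following is a minimal sketch, not the library's code: the class
 OffsetRecordSketch, the nested Rec type, and the sample control-file contents are
 all hypothetical, and UTF-8 text with '\n' line terminators is assumed. It pairs
 each control-file line with the byte offset at which it starts, which is the role
 the LongWritable key plays for the Text value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class OffsetRecordSketch {
    /** Hypothetical stand-in for one (LongWritable, Text) record. */
    static final class Rec {
        final long offset;   // byte offset of the line within the file
        final String line;   // line contents without the terminator
        Rec(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Pairs every line of the text with its starting byte offset. */
    static List<Rec> records(String controlFile) {
        List<Rec> out = new ArrayList<>();
        long offset = 0;
        for (String line : controlFile.split("\n")) {
            out.add(new Rec(offset, line));
            // Advance past the line and its '\n' terminator (assumed, 1 byte).
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical control file: one parameter set per line.
        String control = "alpha=0.1 beta=2\nalpha=0.5 beta=2\nalpha=0.9 beta=4\n";
        for (Rec r : records(control)) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

 With the default of one line per split, each of these records would go to its own
 map task, which can then parse its parameter set from the value.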
 
 
 
 
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
 
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
LINES_PER_MAP
public static final String LINES_PER_MAP
- See Also:
 - Constant Field Values
 
NLineInputFormat
public NLineInputFormat()
createRecordReader
public RecordReader<LongWritable,Text> createRecordReader(InputSplit genericSplit,
                                                          TaskAttemptContext context)
                                                   throws IOException
- Description copied from class: InputFormat
- Create a record reader for a given split. The framework will call
 RecordReader.initialize(InputSplit, TaskAttemptContext) before
 the split is used.
- Specified by:
 createRecordReader in class InputFormat<LongWritable,Text>
 
- Parameters:
 genericSplit - the split to be read
 context - the information about the task
- Returns:
 - a new record reader
 - Throws:
 IOException
 
 
getSplits
public List<InputSplit> getSplits(JobContext job)
                           throws IOException
- Logically splits the set of input files for the job so that every
 N lines of the input form one split.
- Overrides:
 getSplits in class FileInputFormat<LongWritable,Text>
 
- Parameters:
 job - job configuration.
- Returns:
 - a list of InputSplits for the job.
 - Throws:
 IOException
- See Also:
 FileInputFormat.getSplits(JobContext)
 
 
getSplitsForFile
public static List<FileSplit> getSplitsForFile(FileStatus status,
                                               Configuration conf,
                                               int numLinesPerSplit)
                                        throws IOException
- Throws:
 IOException
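 
 The page gives no description for getSplitsForFile, so the following is only a
 sketch of how "N lines per split" can be turned into byte ranges. It is not the
 Hadoop implementation: NLineSplitSketch, Range, and splitsForText are hypothetical
 names, the input is an in-memory string rather than a FileStatus, and UTF-8 text
 with '\n' terminators is assumed. Every group of numLinesPerSplit lines becomes one
 (begin, length) range, and any leftover lines form a final, shorter range.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class NLineSplitSketch {
    /** Hypothetical stand-in for a FileSplit: a byte range within one file. */
    static final class Range {
        final long begin;
        final long length;
        Range(long begin, long length) {
            this.begin = begin;
            this.length = length;
        }
        @Override
        public String toString() {
            return "[begin=" + begin + ", length=" + length + "]";
        }
    }

    /** Groups the lines of text into byte ranges of numLinesPerSplit lines each. */
    static List<Range> splitsForText(String text, int numLinesPerSplit) {
        if (numLinesPerSplit < 1) {
            throw new IllegalArgumentException("numLinesPerSplit must be >= 1");
        }
        List<Range> splits = new ArrayList<>();
        long begin = 0;
        long length = 0;
        int numLines = 0;
        int pos = 0;
        while (pos < text.length()) {
            int nl = text.indexOf('\n', pos);
            // Include the terminator in the line's byte count when present.
            int end = (nl < 0) ? text.length() : nl + 1;
            length += text.substring(pos, end).getBytes(StandardCharsets.UTF_8).length;
            numLines++;
            pos = end;
            if (numLines == numLinesPerSplit) {
                splits.add(new Range(begin, length));
                begin += length;
                length = 0;
                numLines = 0;
            }
        }
        // Remaining lines (fewer than N) still form a split of their own.
        if (numLines != 0) {
            splits.add(new Range(begin, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        String text = "aa\nbbb\nc\ndddd\ne\n";
        for (Range r : splitsForText(text, 2)) {
            System.out.println(r);
        }
    }
}
```

 Byte counts, not character counts, are accumulated because split boundaries are
 positions in the file.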
 
createFileSplit
protected static FileSplit createFileSplit(Path fileName,
                                           long begin,
                                           long length)
- NLineInputFormat uses LineRecordReader, which always reads
 (and consumes) at least one character beyond its upper split
 boundary. So, to make sure that each mapper gets exactly N lines, we
 move back the upper limit of each split 
 by one character here.
- Parameters:
 fileName - Path of file
 begin - the position of the first byte in the file to process
 length - number of bytes in InputSplit
- Returns:
 - FileSplit
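 
 The boundary adjustment described above can be sketched as plain arithmetic.
 This is an illustration under assumptions, not the method's source:
 SplitBoundarySketch and adjust are hypothetical names, and the handling of the
 start position for splits that do not begin at byte 0 is an assumption about how
 a line reader skips the first partial line of such splits. What the description
 does state, and what the sketch preserves in both cases, is that the upper limit
 (start + length) ends up one byte earlier than begin + length.

```java
class SplitBoundarySketch {
    /**
     * Returns {start, length} for a split whose lines occupy
     * [begin, begin + length) in the file.
     */
    static long[] adjust(long begin, long length) {
        if (begin == 0) {
            // First split: keep the start, shorten the range by one byte.
            return new long[] {0, length - 1};
        }
        // Later splits (assumption): start one byte earlier, on the previous
        // line's terminator, and keep the length, so the end moves back by one.
        return new long[] {begin - 1, length};
    }

    public static void main(String[] args) {
        long[] first = adjust(0, 7);
        long[] second = adjust(7, 7);
        System.out.println("first:  start=" + first[0] + " end=" + (first[0] + first[1]));
        System.out.println("second: start=" + second[0] + " end=" + (second[0] + second[1]));
    }
}
```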
 
 
 
setNumLinesPerSplit
public static void setNumLinesPerSplit(Job job,
                                       int numLines)
- Set the number of lines per split
- Parameters:
 job - the job to modify
 numLines - the number of lines per split
 
 
getNumLinesPerSplit
public static int getNumLinesPerSplit(JobContext job)
- Get the number of lines per split
- Parameters:
 job - the job
- Returns:
 - the number of lines per split
 
 
 
Copyright © 2009 The Apache Software Foundation