Package org.apache.hadoop.mapred.lib
Class NLineInputFormat
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<LongWritable,Text>
org.apache.hadoop.mapred.lib.NLineInputFormat
- All Implemented Interfaces:
InputFormat<LongWritable,Text>, JobConfigurable
@Public
@Stable
public class NLineInputFormat
extends FileInputFormat<LongWritable,Text>
implements JobConfigurable
NLineInputFormat splits N lines of input as one split.
In many "pleasantly parallel" applications, each process/mapper
processes the same input file(s), but the computations are
controlled by different parameters (referred to as "parameter sweeps").
One way to achieve this is to specify a set of parameters
(one set per line) as input in a control file
(which is the input path to the map-reduce application,
whereas the input dataset is specified
via a config variable in JobConf).
NLineInputFormat can be used in such applications: it splits
the input file such that, by default, one line is fed as
the value to one map task, and the key is the byte offset of that line,
i.e. (k,v) is (LongWritable, Text).
The location hints will span the whole mapred cluster.
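The grouping described above can be sketched in plain Java. This is a hypothetical stand-in for illustration, not the actual Hadoop implementation; the class `NLineSplitSketch`, the method `splitByNLines`, and its byte-offset arithmetic (assuming one `\n` terminator per line) are all assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of N-line splitting: given the lines of a control
// file and N, produce {startOffset, length} byte ranges of N lines each.
public class NLineSplitSketch {
    static List<long[]> splitByNLines(List<String> lines, int n) {
        List<long[]> splits = new ArrayList<>();
        long offset = 0;  // running byte offset in the file
        long begin = 0;   // start offset of the current split
        int count = 0;    // lines accumulated in the current split
        for (String line : lines) {
            offset += line.length() + 1; // +1 for the '\n' terminator (assumed)
            if (++count == n) {
                splits.add(new long[]{begin, offset - begin});
                begin = offset;
                count = 0;
            }
        }
        if (count > 0) { // trailing split with fewer than N lines
            splits.add(new long[]{begin, offset - begin});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five 4-byte lines (3 chars + newline), grouped two per split.
        List<String> lines = List.of("a=1", "a=2", "a=3", "a=4", "a=5");
        for (long[] s : splitByNLines(lines, 2)) {
            System.out.println(s[0] + "," + s[1]); // prints 0,8 / 8,8 / 16,4
        }
    }
}
```

With N=2 and five lines, each map task would receive two parameter sets, except the last task, which receives the remaining one.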
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
FileInputFormat.Counter -
Field Summary
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LOG, NUM_INPUT_FILES -
Constructor Summary
Constructor  Description
NLineInputFormat()
Method Summary
Modifier and Type  Method  Description
void  configure(JobConf conf)  Initializes a new instance from a JobConf.
protected static FileSplit  createFileSplit(Path fileName, long begin, long length)  NLineInputFormat uses LineRecordReader, which always reads (and consumes) at least one character out of its upper split boundary.
RecordReader<LongWritable,Text>  getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter)  Get the RecordReader for the given InputSplit.
InputSplit[]  getSplits(JobConf job, int numSplits)  Logically splits the set of input files for the job, splits N lines of the input as one split.
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, makeSplit, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
-
Constructor Details
-
NLineInputFormat
public NLineInputFormat()
-
-
Method Details
-
getRecordReader
public RecordReader<LongWritable,Text> getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter) throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit. It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.
- Specified by:
getRecordReader in interface InputFormat<LongWritable,Text>
- Specified by:
getRecordReader in class FileInputFormat<LongWritable,Text>
- Parameters:
genericSplit - the InputSplit
job - the job that this split belongs to
- Returns:
a RecordReader
- Throws:
IOException
-
getSplits
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Logically splits the set of input files for the job, splits N lines of the input as one split.
- Specified by:
getSplits in interface InputFormat<LongWritable,Text>
- Overrides:
getSplits in class FileInputFormat<LongWritable,Text>
- Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
- Returns:
an array of InputSplits for the job.
- Throws:
IOException
- See Also:
-
configure
public void configure(JobConf conf)
Description copied from interface: JobConfigurable
Initializes a new instance from a JobConf.
- Specified by:
configure in interface JobConfigurable
- Parameters:
conf - the configuration
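A minimal configuration sketch for wiring this class into a job via the old mapred API. `JobConf.setInputFormat`, `JobConf.setInt`, and `FileInputFormat.setInputPaths` are standard Hadoop calls; `MyParameterSweepJob`, the input path, and the lines-per-map property name are assumptions (the property name has varied across Hadoop releases, so verify it against your version's documentation):

```
// Sketch: requires the Hadoop mapred classes on the classpath.
JobConf conf = new JobConf(MyParameterSweepJob.class); // hypothetical driver class
conf.setInputFormat(NLineInputFormat.class);
// Assumed property name; check your Hadoop version's configuration reference.
conf.setInt("mapreduce.input.lineinputformat.linespermap", 10);
FileInputFormat.setInputPaths(conf, new Path("/control/params.txt")); // hypothetical path
```

With this setup, each map task would receive ten lines of the control file rather than the default of one.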
-
createFileSplit
NLineInputFormat uses LineRecordReader, which always reads (and consumes) at least one character out of its upper split boundary. So to make sure that each mapper gets N lines, we move back the upper split limits of each split by one character here.- Parameters:
fileName- Path of filebegin- the position of the first byte in the file to processlength- number of bytes in InputSplit- Returns:
- FileSplit
-