NLineInputFormat (Apache Hadoop Main 3.4.2 API)

java.lang.Object
- org.apache.hadoop.mapreduce.InputFormat<K,V>
- - org.apache.hadoop.mapreduce.lib.input.FileInputFormat<LongWritable,Text>
  - - org.apache.hadoop.mapreduce.lib.input.NLineInputFormat

```
@InterfaceAudience.Public
 @InterfaceStability.Stable
public class NLineInputFormat
extends FileInputFormat<LongWritable,Text>
```
NLineInputFormat which splits N lines of input as one split. In many "pleasantly" parallel applications, each process/mapper processes the same input file (s), but with computations are controlled by different parameters.(Referred to as "parameter sweeps"). One way to achieve this, is to specify a set of parameters (one set per line) as input in a control file (which is the input path to the map-reduce application, where as the input dataset is specified via a config variable in JobConf.). The NLineInputFormat can be used in such applications, that splits the input file such that by default, one line is fed as a value to one map task, and key is the offset. i.e. (k,v) is (LongWritable, Text). The location hints will span the whole mapred cluster.

Field Summary

Fields
Modifier and Type Field and Description

static String LINES_PER_MAP
- Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
  DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE

Fields
Modifier and Type	Field and Description
`static String`	`LINES_PER_MAP`

Constructor Summary

Constructors
Constructor and Description

NLineInputFormat()

Constructors
Constructor and Description
`NLineInputFormat()`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`protected static FileSplit`	`createFileSplit(Path fileName, long begin, long length)` NLineInputFormat uses LineRecordReader, which always reads (and consumes) at least one character out of its upper split boundary.
`RecordReader<LongWritable,Text>`	`createRecordReader(InputSplit genericSplit, TaskAttemptContext context)` Create a record reader for a given split.
`static int`	`getNumLinesPerSplit(JobContext job)` Get the number of lines per split
`List<InputSplit>`	`getSplits(JobContext job)` Logically splits the set of input files for the job, splits N lines of the input as one split.
`static List<FileSplit>`	`getSplitsForFile(FileStatus status, Configuration conf, int numLinesPerSplit)`
`static void`	`setNumLinesPerSplit(Job job, int numLines)` Set the number of lines per split

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize, shrinkStatus

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - LINES_PER_MAP
```
public static final String LINES_PER_MAP
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - NLineInputFormat
```
public NLineInputFormat()
```
- Method Detail
  - createRecordReader
```
public RecordReader<LongWritable,Text> createRecordReader(InputSplit genericSplit,
                                                          TaskAttemptContext context)
                                                   throws IOException
```
    Description copied from class: InputFormat
    
    Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
    
    Specified by:
    
    createRecordReader in class InputFormat<LongWritable,Text>
    
    Parameters:
    
    genericSplit - the split to be read
    
    context - the information about the task
    
    Returns:
    
    a new record reader
    
    Throws:
    
    IOException
  - getSplits
```
public List<InputSplit> getSplits(JobContext job)
                           throws IOException
```
    Logically splits the set of input files for the job, splits N lines of the input as one split.
    
    Overrides:
    
    getSplits in class FileInputFormat<LongWritable,Text>
    
    Parameters:
    
    job - the job context
    
    Returns:
    
    an array of InputSplits for the job.
    
    Throws:
    
    IOException
    
    See Also:
    
    FileInputFormat.getSplits(JobContext)
  - getSplitsForFile
```
public static List<FileSplit> getSplitsForFile(FileStatus status,
                                               Configuration conf,
                                               int numLinesPerSplit)
                                        throws IOException
```
    Throws:
    
    IOException
  - createFileSplit
```
protected static FileSplit createFileSplit(Path fileName,
                                           long begin,
                                           long length)
```
    NLineInputFormat uses LineRecordReader, which always reads (and consumes) at least one character out of its upper split boundary. So to make sure that each mapper gets N lines, we move back the upper split limits of each split by one character here.
    
    Parameters:
    
    fileName - Path of file
    
    begin - the position of the first byte in the file to process
    
    length - number of bytes in InputSplit
    
    Returns:
    
    FileSplit
  - setNumLinesPerSplit
```
public static void setNumLinesPerSplit(Job job,
                                       int numLines)
```
    Set the number of lines per split
    
    Parameters:
    
    job - the job to modify
    
    numLines - the number of lines per split
  - getNumLinesPerSplit
```
public static int getNumLinesPerSplit(JobContext job)
```
    Get the number of lines per split
    
    Parameters:
    
    job - the job
    
    Returns:
    
    the number of lines per split

Class NLineInputFormat

Field Summary

Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

Constructor Summary

Method Summary

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

Methods inherited from class java.lang.Object

Field Detail

LINES_PER_MAP

Constructor Detail

NLineInputFormat

Method Detail

createRecordReader

getSplits

getSplitsForFile

createFileSplit

setNumLinesPerSplit

getNumLinesPerSplit