org.apache.hadoop.mapreduce.lib.input
Class NLineInputFormat
java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<LongWritable,Text>
      org.apache.hadoop.mapreduce.lib.input.NLineInputFormat
@InterfaceAudience.Public
@InterfaceStability.Stable
public class NLineInputFormat
- extends FileInputFormat<LongWritable,Text>
NLineInputFormat splits N lines of input as one split.
In many "pleasantly" parallel applications, each process/mapper
processes the same input file(s), but the computation is
controlled by different parameters (referred to as "parameter sweeps").
One way to achieve this is to specify a set of parameters
(one set per line) as input in a control file
(which is the input path to the map-reduce application,
whereas the input dataset is specified
via a config variable in JobConf).
NLineInputFormat can be used in such applications: it splits
the input file so that, by default, one line is fed as
the value to one map task, and the key is the byte offset of the line,
i.e. (k,v) is (LongWritable, Text).
The location hints will span the whole mapred cluster.
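A minimal driver sketch is shown below. It is not part of this API's documentation: the driver class (SweepDriver), the mapper class (ParameterSweepMapper, sketched under createRecordReader below), and the paths are illustrative only.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SweepDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "parameter sweep");     // Job.getInstance(conf, ...) in newer releases
        job.setJarByClass(SweepDriver.class);
        job.setMapperClass(ParameterSweepMapper.class); // hypothetical mapper, sketched below
        job.setNumReduceTasks(0);                       // map-only sweep; adjust as needed
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(NLineInputFormat.class);
        // Control file: one parameter set per line (illustrative path).
        NLineInputFormat.addInputPath(job, new Path("/user/me/params.txt"));
        // Each map task receives 10 lines of the control file.
        NLineInputFormat.setNumLinesPerSplit(job, 10);
        FileOutputFormat.setOutputPath(job, new Path("/user/me/sweep-out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }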
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LINES_PER_MAP
public static final String LINES_PER_MAP
Configuration key controlling the number of input lines per split.
- See Also:
- Constant Field Values
NLineInputFormat
public NLineInputFormat()
createRecordReader
public RecordReader<LongWritable,Text> createRecordReader(InputSplit genericSplit,
TaskAttemptContext context)
throws IOException
- Description copied from class: InputFormat
- Create a record reader for a given split. The framework will call
RecordReader.initialize(InputSplit, TaskAttemptContext) before
the split is used.
- Specified by:
createRecordReader in class InputFormat<LongWritable,Text>
- Parameters:
genericSplit - the split to be read
context - the information about the task
- Returns:
- a new record reader
- Throws:
IOException
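Because NLineInputFormat uses a line-oriented record reader, a mapper for this format receives the byte offset as the key and one line of the control file as the value. A sketch of such a mapper follows; the class name and output types are illustrative, not part of this API.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of a mapper fed by NLineInputFormat. The key is the byte offset
    // of the line; the value is one line of the control file (one parameter set).
    public class ParameterSweepMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // Run the computation controlled by this parameter set; here we
        // simply echo the line keyed by its offset.
        context.write(new Text(offset.toString()), line);
      }
    }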
getSplits
public List<InputSplit> getSplits(JobContext job)
throws IOException
- Logically splits the set of input files for the job, so that N lines
of the input form one split.
- Overrides:
getSplits in class FileInputFormat<LongWritable,Text>
- Parameters:
job - job configuration.
- Returns:
- the list of InputSplits for the job.
- Throws:
IOException
- See Also:
FileInputFormat.getSplits(JobContext)
getSplitsForFile
public static List<FileSplit> getSplitsForFile(FileStatus status,
Configuration conf,
int numLinesPerSplit)
throws IOException
- Throws:
IOException
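Since this helper is public and static, it can also be called directly, for example to inspect how a control file would be divided. The configuration, path, and line count below are illustrative only.

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    // Inside a method that may throw IOException:
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(new Path("/user/me/params.txt")); // illustrative path
    // Divide the control file into splits of 5 lines each.
    List<FileSplit> splits = NLineInputFormat.getSplitsForFile(status, conf, 5);
    System.out.println("Number of splits: " + splits.size());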
createFileSplit
protected static FileSplit createFileSplit(Path fileName,
long begin,
long length)
- NLineInputFormat uses LineRecordReader, which always reads
(and consumes) at least one character out of its upper split
boundary. So to make sure that each mapper gets N lines, we
move back the upper split limits of each split
by one character here.
- Parameters:
fileName - Path of file
begin - the position of the first byte in the file to process
length - number of bytes in InputSplit
- Returns:
- FileSplit
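The adjustment can be pictured with the following sketch. It illustrates the idea described above and is not guaranteed to match the exact source of this method.

    // Sketch only: shift the split start back by one character for every
    // split after the first, so that the LineRecordReader consuming one
    // extra character at the boundary still yields exactly N lines.
    protected static FileSplit createFileSplit(Path fileName, long begin, long length) {
      return (begin == 0)
          ? new FileSplit(fileName, begin, length - 1, new String[] {})
          : new FileSplit(fileName, begin - 1, length, new String[] {});
    }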
setNumLinesPerSplit
public static void setNumLinesPerSplit(Job job,
int numLines)
- Set the number of lines per split
- Parameters:
job - the job to modify
numLines - the number of lines per split
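Assuming LINES_PER_MAP is the configuration key backing this setter (an assumption, not stated on this page), the same effect could be obtained by setting the value directly on the job configuration:

    // Assumes LINES_PER_MAP is the key read back by getNumLinesPerSplit.
    job.getConfiguration().setInt(NLineInputFormat.LINES_PER_MAP, 10);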
getNumLinesPerSplit
public static int getNumLinesPerSplit(JobContext job)
- Get the number of lines per split
- Parameters:
job - the job
- Returns:
- the number of lines per split
Copyright © 2009 The Apache Software Foundation