org.apache.hadoop.examples.terasort
Class TeraInputFormat
java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<Text,Text>
    org.apache.hadoop.examples.terasort.TeraInputFormat
- All Implemented Interfaces:
 InputFormat<Text,Text>
 
public class TeraInputFormat
extends FileInputFormat<Text,Text>
 
An input format that reads the first 10 characters of each line as the key
 and the rest of the line as the value. Both key and value are represented
 as Text.
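The key/value layout described above can be illustrated in plain Java. This is a minimal sketch, not the class's actual record reader: it works on a String instead of org.apache.hadoop.io.Text so that it compiles without Hadoop on the classpath, the class and method names are hypothetical, and the handling of lines shorter than 10 characters (the whole line becomes the key and the value is empty) is an assumption.

```java
/** Hypothetical helper illustrating the TeraInputFormat key/value layout. */
final class TeraRecordSplitSketch {
    /** Width of the key prefix, per the class description. */
    static final int KEY_LENGTH = 10;

    /**
     * Splits one input line into {key, value}: the first KEY_LENGTH characters
     * become the key and everything after them becomes the value.
     * Assumption: a line shorter than KEY_LENGTH yields the whole line as the
     * key and an empty value.
     */
    static String[] splitRecord(String line) {
        int keyEnd = Math.min(line.length(), KEY_LENGTH);
        return new String[] {line.substring(0, keyEnd), line.substring(keyEnd)};
    }

    public static void main(String[] args) {
        String[] kv = splitRecord("ABCDEFGHIJrest-of-record");
        System.out.println("key=" + kv[0] + " value=" + kv[1]); // prints key=ABCDEFGHIJ value=rest-of-record
    }
}
```

For ASCII input, counting characters and counting bytes give the same split point; for multi-byte text the two would differ.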
 
| Methods inherited from class org.apache.hadoop.mapred.FileInputFormat |
|---|
| addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
 
TeraInputFormat
public TeraInputFormat()
writePartitionFile
public static void writePartitionFile(JobConf conf,
                                      Path partFile)
                               throws IOException
- Use the input splits to take samples of the input and generate sample
 keys. By default, reads 100,000 keys from 10 locations in the input, sorts
 them, and picks N-1 of them as boundaries that divide the input into N
 roughly equally sized partitions.
- Parameters:
 conf - the job to sample
 partFile - where to write the output file
- Throws:
 IOException - if something goes wrong
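The sampling strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the body of writePartitionFile: keys are plain Strings rather than Text, the helper names (sampledSplitIndices, createPartitions, partitionFor) are hypothetical, N is assumed to be the job's number of reduce tasks, and the binary-search lookup only shows how N-1 sorted cut points define N key ranges. It is not necessarily how TeraSort's own partitioner looks keys up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the sampling idea: sample keys from a few evenly
 * spaced splits, sort them, and keep numPartitions - 1 evenly spaced keys
 * as partition boundaries.
 */
final class PartitionSamplingSketch {

    /** Indices of up to maxSamples splits, spread evenly across numSplits. */
    static int[] sampledSplitIndices(int numSplits, int maxSamples) {
        int n = Math.min(numSplits, maxSamples);
        if (n <= 0) {
            return new int[0];
        }
        int stride = numSplits / n;
        int[] indices = new int[n];
        for (int i = 0; i < n; i++) {
            indices[i] = i * stride;
        }
        return indices;
    }

    /** Sorts the sampled keys and returns numPartitions - 1 cut points. */
    static String[] createPartitions(List<String> sampledKeys, int numPartitions) {
        if (numPartitions < 1 || numPartitions > sampledKeys.size()) {
            throw new IllegalArgumentException("need 1 <= numPartitions <= sample count");
        }
        List<String> sorted = new ArrayList<>(sampledKeys);
        sorted.sort(null); // natural String order; matches byte order for ASCII keys
        float step = sorted.size() / (float) numPartitions;
        String[] cuts = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            cuts[i - 1] = sorted.get(Math.round(step * i));
        }
        return cuts;
    }

    /** Partition for a key: the number of (distinct) cut points that are <= key. */
    static int partitionFor(String key, String[] cuts) {
        int pos = Arrays.binarySearch(cuts, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("j", "c", "a", "h", "e", "b", "i", "d", "g", "f");
        String[] cuts = createPartitions(sample, 4);
        System.out.println(Arrays.toString(cuts)); // prints [d, f, i]
        System.out.println(partitionFor("e", cuts)); // prints 1
    }
}
```

With ten sampled keys and four partitions the step is 2.5, so the cut points are the sorted keys at positions 3, 5 and 8; each partition then receives two or three of the sampled keys.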
 
 
getRecordReader
public RecordReader<Text,Text> getRecordReader(InputSplit split,
                                               JobConf job,
                                               Reporter reporter)
                                        throws IOException
- Description copied from interface: InputFormat
- Get the RecordReader for the given InputSplit.
 It is the responsibility of the RecordReader to respect
 record boundaries while processing the logical split to present a 
 record-oriented view to the individual task.
- Specified by:
 getRecordReader in interface InputFormat<Text,Text>
- Specified by:
 getRecordReader in class FileInputFormat<Text,Text>
 
- Parameters:
 split - the InputSplit
 job - the job that this split belongs to
- Returns:
 a RecordReader
- Throws:
 IOException
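For newline-delimited input, the requirement to respect record boundaries while a split is an arbitrary byte range is commonly met with the convention sketched below: a reader whose split does not start at offset 0 backs up one byte and discards everything through the next newline, and every reader keeps reading whole lines while a line's first byte lies before its split end, even if that line runs past the end. This is a self-contained illustration over an in-memory byte array, not Hadoop's LineRecordReader; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of split-boundary handling for newline-delimited records. */
final class SplitLineReaderSketch {

    /**
     * Returns the lines owned by the split [start, start + length) of data.
     * Assumed convention: a line belongs to the split in which its first byte lies.
     */
    static List<String> linesInSplit(byte[] data, int start, int length) {
        int end = Math.min(data.length, start + length);
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard through the next '\n'. If data[start - 1]
            // is already '\n', nothing inside this split is discarded.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        while (pos < end) {
            // Read a whole line, even if it runs past the split end.
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        // Two 6-byte splits; the second begins in the middle of "bbbb".
        System.out.println(linesInSplit(data, 0, 6)); // prints [aaa, bbbb]
        System.out.println(linesInSplit(data, 6, 6)); // prints [cc]
    }
}
```

Each line is returned by exactly one split: the first reader finishes "bbbb" past its end, and the second reader discards the tail of "bbbb" it would otherwise see.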
 
 
getSplits
public InputSplit[] getSplits(JobConf conf,
                              int splits)
                       throws IOException
- Description copied from class: FileInputFormat
- Splits files returned by FileInputFormat.listStatus(JobConf) when
 they're too big.
- Specified by:
 getSplits in interface InputFormat<Text,Text>
- Overrides:
 getSplits in class FileInputFormat<Text,Text>
 
- Parameters:
 conf - job configuration.
 splits - the desired number of splits, a hint.
- Returns:
 an array of InputSplits for the job.
- Throws:
 IOException
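Because the requested number of splits is only a hint, the actual count depends on how a split size is derived from it; the inherited computeSplitSize method is part of that calculation. The sketch below models the behavior commonly described for the old mapred FileInputFormat: a goal size of total input bytes divided by the hint, clamped between a minimum split size and the file's block size, plus a 10% slop allowance so a short tail is folded into the last split. Treat the formula, the SPLIT_SLOP value of 1.1, and the helper names as assumptions to check against the Hadoop source for your version; block locations and host selection (getSplitHosts) are omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how one file is cut into byte-range splits. */
final class SplitSizingSketch {
    /** Assumed tolerance: the last split may be up to 10% larger than splitSize. */
    static final double SPLIT_SLOP = 1.1;

    /** Goal size clamped to [minSize, blockSize]; minSize wins if they conflict. */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Returns {offset, length} pairs covering one file of fileLength bytes. */
    static List<long[]> splitFile(long fileLength, long totalSize, int numSplitsHint,
                                  long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(1, numSplitsHint);
        // Clamp minSize to at least 1 byte so the loop below always makes progress.
        long splitSize = computeSplitSize(goalSize, Math.max(1, minSize), blockSize);
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        // One 1000-byte file, a hint of 4 splits, 512-byte blocks.
        for (long[] s : splitFile(1000, 1000, 4, 1, 512)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Under these assumptions the example yields four 250-byte splits, while a 1050-byte file with a 1000-byte split size stays a single split because the 50-byte tail is within the slop.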
 
 
Copyright © 2009 The Apache Software Foundation