org.apache.hadoop.mapred.lib
Class MultipleOutputFormat<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapred.FileOutputFormat<K,V>
      extended by org.apache.hadoop.mapred.lib.MultipleOutputFormat<K,V>
All Implemented Interfaces:
OutputFormat<K,V>
Direct Known Subclasses:
MultipleSequenceFileOutputFormat, MultipleTextOutputFormat

@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class MultipleOutputFormat<K,V>
extends FileOutputFormat<K,V>

This abstract class extends FileOutputFormat so that output data can be written to different output files. There are three basic use cases for this class.
Case one: This class is used for a map reduce job with at least one reducer. The reducer wants to write data to different files depending on the actual keys. It is assumed that a key (or value) encodes the actual key (value) and the desired location for the actual key (value).
Case two: This class is used for a map only job. The job wants to use an output file name that is either a part of the input file name of the input data, or some derivation of it.
Case three: This class is used for a map only job. The job wants to use an output file name that depends on both the keys and the input file name.


Constructor Summary
MultipleOutputFormat()
           
 
Method Summary
protected  K generateActualKey(K key, V value)
          Generate the actual key from the given key/value.
protected  V generateActualValue(K key, V value)
          Generate the actual value from the given key and value.
protected  String generateFileNameForKeyValue(K key, V value, String name)
          Generate the output file name based on the given key and the leaf file name.
protected  String generateLeafFileName(String name)
          Generate the leaf name for the output file name.
protected abstract  RecordWriter<K,V> getBaseRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3)
           
protected  String getInputFileBasedOutputFileName(JobConf job, String name)
          Generate the output file name based on a given name and the input file name.
 RecordWriter<K,V> getRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3)
          Create a composite record writer that can write key/value data to different output files.
 
Methods inherited from class org.apache.hadoop.mapred.FileOutputFormat
checkOutputSpecs, getCompressOutput, getOutputCompressorClass, getOutputPath, getPathForCustomFile, getTaskOutputPath, getUniqueName, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultipleOutputFormat

public MultipleOutputFormat()
Method Detail

getRecordWriter

public RecordWriter<K,V> getRecordWriter(FileSystem fs,
                                         JobConf job,
                                         String name,
                                         Progressable arg3)
                                  throws IOException
Create a composite record writer that can write key/value data to different output files.

Specified by:
getRecordWriter in interface OutputFormat<K,V>
Specified by:
getRecordWriter in class FileOutputFormat<K,V>
Parameters:
fs - the file system to use
job - the job conf for the job
name - the leaf file name for the output file (such as "part-00000")
arg3 - a progressable for reporting progress.
Returns:
a composite record writer
Throws:
IOException
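
The composite writer idea can be illustrated without Hadoop on the classpath. The sketch below is a plain-Java model, not the library's implementation: CompositeWriterSketch, byFirstLetter, fileCount and contents are hypothetical names, the in-memory buffers stand in for the writers that getBaseRecordWriter would return, and the tab-separated record layout is an arbitrary choice. It shows one way a single writer could open an underlying writer per generated file name on first use and reuse it afterwards.

```java
import java.util.Map;
import java.util.TreeMap;

public class CompositeWriterSketch {
    // Hypothetical stand-in for per-file record writers: one in-memory buffer per file name.
    private final Map<String, StringBuilder> writers = new TreeMap<>();
    private final String leafName;

    public CompositeWriterSketch(String leafName) {
        this.leafName = leafName;
    }

    // Hook modeled on generateFileNameForKeyValue: by default the name ignores the key.
    protected String generateFileNameForKeyValue(String key, String value, String name) {
        return name;
    }

    // Route one record: pick the file, open its writer on first use, then append.
    public void write(String key, String value) {
        String file = generateFileNameForKeyValue(key, value, leafName);
        StringBuilder out = writers.computeIfAbsent(file, f -> new StringBuilder());
        out.append(key).append('\t').append(value).append('\n');
    }

    // Number of distinct files that have received at least one record.
    public int fileCount() {
        return writers.size();
    }

    // Everything written to the given file so far, or "" if it was never opened.
    public String contents(String file) {
        StringBuilder out = writers.get(file);
        return out == null ? "" : out.toString();
    }

    // Example subclass: route each record into a directory named after the key's first letter.
    public static CompositeWriterSketch byFirstLetter(String leafName) {
        return new CompositeWriterSketch(leafName) {
            @Override
            protected String generateFileNameForKeyValue(String key, String value, String name) {
                return key.isEmpty() ? name : key.substring(0, 1) + "/" + name;
            }
        };
    }

    public static void main(String[] args) {
        CompositeWriterSketch w = byFirstLetter("part-00000");
        w.write("apple", "1");
        w.write("avocado", "2");
        w.write("banana", "3");
        System.out.println(w.fileCount() + " files");
        System.out.print(w.contents("a/part-00000"));
    }
}
```

In this model the default hook keeps every record in the single leaf file, and only the subclass fans records out across files.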

generateLeafFileName

protected String generateLeafFileName(String name)
Generate the leaf name for the output file name. The default behavior does not change the leaf file name (such as part-00000).

Parameters:
name - the leaf file name for the output file
Returns:
the given leaf file name

generateFileNameForKeyValue

protected String generateFileNameForKeyValue(K key,
                                             V value,
                                             String name)
Generate the output file name based on the given key and the leaf file name. The default behavior is that the file name does not depend on the key.

Parameters:
key - the key of the output data
value - the value of the output data
name - the leaf file name
Returns:
generated file name

generateActualKey

protected K generateActualKey(K key,
                              V value)
Generate the actual key from the given key/value. The default behavior is that the actual key is equal to the given key.

Parameters:
key - the key of the output data
value - the value of the output data
Returns:
the actual key derived from the given key/value

generateActualValue

protected V generateActualValue(K key,
                                V value)
Generate the actual value from the given key and value. The default behavior is that the actual value is equal to the given value.

Parameters:
key - the key of the output data
value - the value of the output data
Returns:
the actual value derived from the given key/value
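
For the use case in which a key encodes both the desired location and the actual key, the naming and key hooks could cooperate roughly as follows. This is a minimal sketch under stated assumptions: the "location:actualKey" encoding and the ':' separator are invented for illustration, the class name KeyEncodedLocationSketch is hypothetical, and the methods are plain static functions on String rather than overrides in a real subclass of this class.

```java
public class KeyEncodedLocationSketch {
    // Assumed separator between the location and the actual key (not defined by the API).
    static final char SEP = ':';

    // Place the record under a directory named after the location part of the key.
    public static String generateFileNameForKeyValue(String key, String value, String name) {
        int i = key.indexOf(SEP);
        if (i < 0) {
            return name; // no location encoded: keep the leaf file name unchanged
        }
        return key.substring(0, i) + "/" + name;
    }

    // Strip the location prefix so that only the actual key is written.
    public static String generateActualKey(String key, String value) {
        int i = key.indexOf(SEP);
        return i < 0 ? key : key.substring(i + 1);
    }

    // The value carries no location in this sketch, so it is passed through unchanged.
    public static String generateActualValue(String key, String value) {
        return value;
    }

    public static void main(String[] args) {
        String key = "us-east:user42";
        String value = "clicked";
        System.out.println(generateFileNameForKeyValue(key, value, "part-00000"));
        System.out.println(generateActualKey(key, value) + "\t" + generateActualValue(key, value));
    }
}
```

In a real job the same logic would go into overrides of the protected methods with the job's own key and value types.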

getInputFileBasedOutputFileName

protected String getInputFileBasedOutputFileName(JobConf job,
                                                 String name)
Generate the output file name based on a given name and the input file name. If MRJobConfig.MAP_INPUT_FILE is not set (i.e. this is not a map only job), the given name is returned unchanged. If the config value for "num.of.trailing.legs.to.use" is not set, or is set to 0 or a negative number, the given name is returned unchanged. Otherwise, return a file name consisting of the N trailing legs of the input file name, where N is the config value for "num.of.trailing.legs.to.use".

Parameters:
job - the job config
name - the output file name
Returns:
the output file name based on the given name and the input file name.
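
The "trailing legs" rule is easier to see on a concrete path. The following is a plain-Java approximation of the behavior described for this method, not the library code: TrailingLegsSketch and outputNameFor are hypothetical names, the input file and N are passed as arguments instead of being read from a JobConf, and path handling is simplified to splitting on '/'.

```java
import java.util.ArrayList;
import java.util.List;

public class TrailingLegsSketch {

    // Returns the last n path components ("legs") of inputFile joined by '/'.
    // Falls back to defaultName when inputFile is null or n is 0 or negative.
    public static String outputNameFor(String inputFile, int n, String defaultName) {
        if (inputFile == null || n <= 0) {
            return defaultName;
        }
        // Split on '/' and drop empty components (leading or doubled slashes).
        List<String> legs = new ArrayList<>();
        for (String leg : inputFile.split("/")) {
            if (!leg.isEmpty()) {
                legs.add(leg);
            }
        }
        if (legs.isEmpty()) {
            return defaultName;
        }
        // Keep at most n legs, counted from the end of the path.
        int from = Math.max(0, legs.size() - n);
        return String.join("/", legs.subList(from, legs.size()));
    }

    public static void main(String[] args) {
        String input = "/data/logs/2014/01/events.txt";
        System.out.println(outputNameFor(input, 1, "part-00000")); // events.txt
        System.out.println(outputNameFor(input, 2, "part-00000")); // 01/events.txt
        System.out.println(outputNameFor(input, 0, "part-00000")); // part-00000
        System.out.println(outputNameFor(null, 2, "part-00000"));  // part-00000
    }
}
```

With N = 2 the output keeps the parent directory of the input file, which lets a map only job mirror part of the input layout in its output directory.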

getBaseRecordWriter

protected abstract RecordWriter<K,V> getBaseRecordWriter(FileSystem fs,
                                                         JobConf job,
                                                         String name,
                                                         Progressable arg3)
                                                  throws IOException
Parameters:
fs - the file system to use
job - a job conf object
name - the name of the file over which a record writer object will be constructed
arg3 - a progressable object
Returns:
A RecordWriter object over the given file
Throws:
IOException


Copyright © 2014 Apache Software Foundation. All Rights Reserved.