Class MultipleOutputFormat<K,V>

java.lang.Object
org.apache.hadoop.mapred.FileOutputFormat<K,V>
org.apache.hadoop.mapred.lib.MultipleOutputFormat<K,V>
All Implemented Interfaces:
OutputFormat<K,V>
Direct Known Subclasses:
MultipleSequenceFileOutputFormat, MultipleTextOutputFormat

@Public @Stable public abstract class MultipleOutputFormat<K,V> extends FileOutputFormat<K,V>
This abstract class extends FileOutputFormat so that output data can be written to different output files. There are three basic use cases for this class.
Case one: This class is used for a map reduce job with at least one reducer. The reducer wants to write data to different files depending on the actual keys. It is assumed that a key (or value) encodes the actual key (value) and the desired location for the actual key (value).
Case two: This class is used for a map only job. The job wants to use an output file name that is either a part of the input file name of the input data, or some derivation of it.
Case three: This class is used for a map only job. The job wants to use an output file name that depends on both the keys and the input file name.
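As an illustration of case one, the following is a minimal sketch in plain Java with no Hadoop dependency. It assumes a hypothetical convention in which the incoming key has the form "<location>:<actualKey>". The class name KeyEncodedLocationSketch and the separator are inventions for this example; the two static methods only mirror the shape of the generateFileNameForKeyValue and generateActualKey hooks documented below, using String in place of K and V.

```java
class KeyEncodedLocationSketch {
    // Hypothetical convention: the incoming key is "<location>:<actualKey>".
    static final char SEPARATOR = ':';

    // Mirrors generateFileNameForKeyValue(key, value, name):
    // place the leaf file under a directory named after the location.
    static String generateFileNameForKeyValue(String key, String value, String name) {
        int i = key.indexOf(SEPARATOR);
        // Keys without a location fall back to the unchanged leaf name.
        return i < 0 ? name : key.substring(0, i) + "/" + name;
    }

    // Mirrors generateActualKey(key, value):
    // strip the location prefix so only the real key is written.
    static String generateActualKey(String key, String value) {
        int i = key.indexOf(SEPARATOR);
        return i < 0 ? key : key.substring(i + 1);
    }

    public static void main(String[] args) {
        String key = "us:alice";
        System.out.println(generateFileNameForKeyValue(key, "42", "part-00000")); // us/part-00000
        System.out.println(generateActualKey(key, "42"));                         // alice
    }
}
```

In a real job the same logic would go into overrides of those two methods in a concrete subclass such as MultipleTextOutputFormat.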
  • Constructor Details

    • MultipleOutputFormat

      public MultipleOutputFormat()
  • Method Details

    • getRecordWriter

      public RecordWriter<K,V> getRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3) throws IOException
      Create a composite record writer that can write key/value data to different output files
      Specified by:
      getRecordWriter in interface OutputFormat<K,V>
      Specified by:
      getRecordWriter in class FileOutputFormat<K,V>
      Parameters:
      fs - the file system to use
      job - the job conf for the job
      name - the leaf file name for the output file (such as "part-00000")
      arg3 - a progressable for reporting progress.
      Returns:
      a composite record writer
      Throws:
      IOException
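One way to picture a composite record writer is as a map from generated file names to per-file writers, with each writer created the first time its file name is seen. The sketch below is plain Java and is not Hadoop's implementation. CompositeWriterSketch, fileNameFor, and contents are hypothetical names, and an in-memory StringBuilder stands in for the RecordWriter that getBaseRecordWriter would supply.

```java
import java.util.Map;
import java.util.TreeMap;

class CompositeWriterSketch {
    // One in-memory "file" per generated name; a stand-in for the per-file
    // RecordWriter that getBaseRecordWriter would supply in a real job.
    private final Map<String, StringBuilder> openWriters = new TreeMap<>();
    private final String leafName;

    CompositeWriterSketch(String leafName) {
        this.leafName = leafName;
    }

    // Hypothetical naming rule: one directory per key.
    String fileNameFor(String key, String value) {
        return key + "/" + leafName;
    }

    // Route the record to the writer for its file, creating it on first use.
    void write(String key, String value) {
        String fileName = fileNameFor(key, value);
        StringBuilder writer = openWriters.computeIfAbsent(fileName, n -> new StringBuilder());
        writer.append(key).append('\t').append(value).append('\n');
    }

    // Snapshot of everything written so far, keyed by file name.
    Map<String, String> contents() {
        Map<String, String> out = new TreeMap<>();
        openWriters.forEach((name, sb) -> out.put(name, sb.toString()));
        return out;
    }

    public static void main(String[] args) {
        CompositeWriterSketch w = new CompositeWriterSketch("part-00000");
        w.write("a", "1");
        w.write("b", "2");
        w.write("a", "3");
        System.out.println(w.contents().keySet()); // [a/part-00000, b/part-00000]
    }
}
```

Both records with key "a" end up in the same file because the second write reuses the cached writer instead of opening a new one.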
    • generateLeafFileName

      protected String generateLeafFileName(String name)
      Generate the leaf name for the output file name. The default behavior does not change the leaf file name (such as part-00000)
      Parameters:
      name - the leaf file name for the output file
      Returns:
      the given leaf file name
    • generateFileNameForKeyValue

      protected String generateFileNameForKeyValue(K key, V value, String name)
      Generate the output file name based on the given key and the leaf file name. The default behavior is that the file name does not depend on the key.
      Parameters:
      key - the key of the output data
      value - the value of the output data
      name - the leaf file name
      Returns:
      generated file name
    • generateActualKey

      protected K generateActualKey(K key, V value)
      Generate the actual key from the given key/value. The default behavior is that the actual key is equal to the given key
      Parameters:
      key - the key of the output data
      value - the value of the output data
      Returns:
      the actual key derived from the given key/value
    • generateActualValue

      protected V generateActualValue(K key, V value)
      Generate the actual value from the given key and value. The default behavior is that the actual value is equal to the given value
      Parameters:
      key - the key of the output data
      value - the value of the output data
      Returns:
      the actual value derived from the given key/value
    • getInputFileBasedOutputFileName

      protected String getInputFileBasedOutputFileName(JobConf job, String name)
      Generate the output file name based on a given name and the input file name. If MRJobConfig.MAP_INPUT_FILE does not exist (i.e. this is not for a map only job), the given name is returned unchanged. If the config value for "num.of.trailing.legs.to.use" is not set, or is set to 0 or a negative value, the given name is returned unchanged. Otherwise, return a file name consisting of the N trailing legs of the input file name, where N is the config value for "num.of.trailing.legs.to.use".
      Parameters:
      job - the job config
      name - the output file name
      Returns:
      the outfile name based on a given name and the input file name.
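The "N trailing legs" rule can be illustrated with a small plain-Java sketch. It is an approximation under stated assumptions, not Hadoop's code: the input file path and N are passed in directly instead of being read from a JobConf, legs are taken to be the "/"-separated path components, and TrailingLegsSketch and outputNameFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class TrailingLegsSketch {
    // inputFile:    the map input file path, or null when there is none
    // trailingLegs: N, the number of trailing path components to keep
    // name:         the name to fall back to when the rule does not apply
    static String outputNameFor(String inputFile, int trailingLegs, String name) {
        // No input file, or N unset / zero / negative: return the name unchanged.
        if (inputFile == null || trailingLegs <= 0) {
            return name;
        }
        // Split the path into its non-empty components ("legs").
        List<String> legs = new ArrayList<>();
        for (String leg : inputFile.split("/")) {
            if (!leg.isEmpty()) {
                legs.add(leg);
            }
        }
        if (legs.isEmpty()) {
            return name;
        }
        // Keep at most the last N legs and join them back into a relative path.
        int from = Math.max(0, legs.size() - trailingLegs);
        return String.join("/", legs.subList(from, legs.size()));
    }

    public static void main(String[] args) {
        System.out.println(outputNameFor("/data/logs/2024/part-00003", 2, "part-00000")); // 2024/part-00003
        System.out.println(outputNameFor("/data/logs/2024/part-00003", 0, "part-00000")); // part-00000
    }
}
```

With N = 1 only the input file's own name is kept, which corresponds to case two in the class description.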
    • getBaseRecordWriter

      protected abstract RecordWriter<K,V> getBaseRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3) throws IOException
      Parameters:
      fs - the file system to use
      job - a job conf object
      name - the name of the file over which a record writer object will be constructed
      arg3 - a progressable object
      Returns:
      A RecordWriter object over the given file
      Throws:
      IOException