org.apache.hadoop.mapred.lib
Class MultipleOutputs

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.MultipleOutputs

public class MultipleOutputs
extends Object

The MultipleOutputs class simplifies writing to additional outputs other than the job default output, which is written via the OutputCollector passed to the map() and reduce() methods of the Mapper and Reducer implementations.

Each additional output, or named output, may be configured with its own OutputFormat, with its own key class and with its own value class.

A named output can be a single file or a multi file. The latter is referred to as a multi named output.

A multi named output is an unbounded set of files, all sharing the same OutputFormat, key class and value class configuration.

When named outputs are used within a Mapper implementation, key/values written to a named output are not part of the reduce phase; only key/values written to the job OutputCollector are part of the reduce phase.

MultipleOutputs supports counters; by default they are disabled. The counters group is the MultipleOutputs class name.

The names of the counters are the same as the named outputs. For multi named outputs the name of the counter is the concatenation of the named output, an underscore '_' and the multiname.
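The counter naming rule can be expressed as a small helper. This is a hypothetical sketch, not part of Hadoop: the class `MultipleOutputsCounterNames` and its methods are illustrative names, and it assumes that "class name" means the fully qualified name that `Class.getName()` returns for `org.apache.hadoop.mapred.lib.MultipleOutputs`.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of Hadoop) that mirrors the documented
 * counter-naming rule for MultipleOutputs.
 */
final class MultipleOutputsCounterNames {

    // Assumption: the group is the fully qualified MultipleOutputs class name.
    static final String COUNTERS_GROUP = "org.apache.hadoop.mapred.lib.MultipleOutputs";

    private MultipleOutputsCounterNames() {}

    /** Counter name for a single named output: the named output itself. */
    static String counterName(String namedOutput) {
        return Objects.requireNonNull(namedOutput, "namedOutput");
    }

    /** Counter name for a multi named output: namedOutput + "_" + multiName. */
    static String counterName(String namedOutput, String multiName) {
        Objects.requireNonNull(namedOutput, "namedOutput");
        Objects.requireNonNull(multiName, "multiName");
        return namedOutput + "_" + multiName;
    }

    public static void main(String[] args) {
        System.out.println(counterName("text"));      // prints "text"
        System.out.println(counterName("seq", "A"));  // prints "seq_A"
    }
}
```

With the job configuration shown in the usage pattern, records written to the multi named output "seq" under the multi names "A" and "B" would be counted under seq_A and seq_B respectively.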

Job configuration usage pattern is:


 JobConf conf = new JobConf();

 FileInputFormat.setInputPaths(conf, inDir);
 FileOutputFormat.setOutputPath(conf, outDir);

 conf.setMapperClass(MOMap.class);
 conf.setReducerClass(MOReduce.class);
 ...

 // Defines additional single text based output 'text' for the job
 MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
     LongWritable.class, Text.class);

 // Defines additional multi sequencefile based output 'seq' for the
 // job
 MultipleOutputs.addMultiNamedOutput(conf, "seq",
     SequenceFileOutputFormat.class,
     LongWritable.class, Text.class);
 ...

 // The JobClient must be initialized with the configuration before submitting
 JobClient jc = new JobClient(conf);
 RunningJob job = jc.submitJob(conf);

 ...
 

Usage pattern in a Reducer implementation is:


 public class MOReduce implements
     Reducer<WritableComparable, Writable, WritableComparable, Writable> {
   private MultipleOutputs mos;

   public void configure(JobConf conf) {
     ...
     mos = new MultipleOutputs(conf);
   }

   // getCollector() returns a raw OutputCollector, hence the unchecked calls
   @SuppressWarnings("unchecked")
   public void reduce(WritableComparable key, Iterator<Writable> values,
       OutputCollector<WritableComparable, Writable> output, Reporter reporter)
       throws IOException {
     ...
     mos.getCollector("text", reporter).collect(key, new Text("Hello"));
     mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
     mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
     ...
   }

   public void close() throws IOException {
     mos.close();
     ...
   }

 }
 


Constructor Summary
MultipleOutputs(JobConf job)
          Creates and initializes support for multiple named outputs; it should be instantiated in the Mapper/Reducer configure method.
 
Method Summary
static void addMultiNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass)
          Adds a multi named output for the job.
static void addNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass)
          Adds a named output for the job.
 void close()
          Closes all the opened named outputs.
 OutputCollector getCollector(String namedOutput, Reporter reporter)
          Gets the output collector for a named output.
 OutputCollector getCollector(String namedOutput, String multiName, Reporter reporter)
          Gets the output collector for a multi named output.
static boolean getCountersEnabled(JobConf conf)
          Returns whether the counters for the named outputs are enabled.
static Class<? extends OutputFormat> getNamedOutputFormatClass(JobConf conf, String namedOutput)
          Returns the named output OutputFormat.
static Class<? extends WritableComparable> getNamedOutputKeyClass(JobConf conf, String namedOutput)
          Returns the key class for a named output.
 Iterator<String> getNamedOutputs()
          Returns an iterator over the defined named outputs.
static List<String> getNamedOutputsList(JobConf conf)
          Returns the list of defined named output names.
static Class<? extends Writable> getNamedOutputValueClass(JobConf conf, String namedOutput)
          Returns the value class for a named output.
static boolean isMultiNamedOutput(JobConf conf, String namedOutput)
          Returns whether a named output is a multi named output.
static void setCountersEnabled(JobConf conf, boolean enabled)
          Enables or disables counters for the named outputs.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultipleOutputs

public MultipleOutputs(JobConf job)
Creates and initializes support for multiple named outputs. It should be instantiated in the Mapper/Reducer configure method.

Parameters:
job - the job configuration object
Method Detail

getNamedOutputsList

public static List<String> getNamedOutputsList(JobConf conf)
Returns the list of defined named output names.

Parameters:
conf - job conf
Returns:
list of named output names

isMultiNamedOutput

public static boolean isMultiNamedOutput(JobConf conf,
                                         String namedOutput)
Returns whether a named output is a multi named output.

Parameters:
conf - job conf
namedOutput - named output
Returns:
true if the named output is multi, false if it is single. If the named output is not defined, it returns false.
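To make the single versus multi distinction concrete, the following toy registry models addNamedOutput, addMultiNamedOutput, isMultiNamedOutput and getNamedOutputsList with an in-memory map instead of a JobConf. `NamedOutputRegistry` and `Spec` are hypothetical names, class names are kept as plain strings, and rejecting a duplicate definition is an assumption of the sketch rather than something stated in this documentation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy in-memory model (hypothetical, not Hadoop code) of how named outputs
 * are registered and queried. A map stands in for the JobConf.
 */
final class NamedOutputRegistry {

    /** Configuration of one named output; class names kept as strings. */
    record Spec(boolean multi, String outputFormat, String keyClass, String valueClass) {}

    // Insertion order is kept so the list of names is predictable.
    private final Map<String, Spec> specs = new LinkedHashMap<>();

    void addNamedOutput(String name, String format, String key, String value) {
        add(name, new Spec(false, format, key, value));
    }

    void addMultiNamedOutput(String name, String format, String key, String value) {
        add(name, new Spec(true, format, key, value));
    }

    private void add(String name, Spec spec) {
        // Assumption: defining the same named output twice is rejected.
        if (specs.putIfAbsent(name, spec) != null) {
            throw new IllegalArgumentException("Named output '" + name + "' already defined");
        }
    }

    /** Documented contract: false for a single or an undefined named output. */
    boolean isMultiNamedOutput(String name) {
        Spec spec = specs.get(name);
        return spec != null && spec.multi();
    }

    /** Names of all defined named outputs, in definition order. */
    List<String> getNamedOutputsList() {
        return new ArrayList<>(specs.keySet());
    }
}
```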

getNamedOutputFormatClass

public static Class<? extends OutputFormat> getNamedOutputFormatClass(JobConf conf,
                                                                      String namedOutput)
Returns the named output OutputFormat.

Parameters:
conf - job conf
namedOutput - named output
Returns:
the OutputFormat class of the named output

getNamedOutputKeyClass

public static Class<? extends WritableComparable> getNamedOutputKeyClass(JobConf conf,
                                                                         String namedOutput)
Returns the key class for a named output.

Parameters:
conf - job conf
namedOutput - named output
Returns:
class for the named output key

getNamedOutputValueClass

public static Class<? extends Writable> getNamedOutputValueClass(JobConf conf,
                                                                 String namedOutput)
Returns the value class for a named output.

Parameters:
conf - job conf
namedOutput - named output
Returns:
class of named output value

addNamedOutput

public static void addNamedOutput(JobConf conf,
                                  String namedOutput,
                                  Class<? extends OutputFormat> outputFormatClass,
                                  Class<?> keyClass,
                                  Class<?> valueClass)
Adds a named output for the job.

Parameters:
conf - job conf to add the named output
namedOutput - named output name; it must be a single word of letters and numbers only, and cannot be the word 'part', which is reserved for the default output.
outputFormatClass - OutputFormat class.
keyClass - key class
valueClass - value class
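The naming rule for namedOutput can be checked with a short validator. This is a hypothetical sketch (`NamedOutputNames` is not a Hadoop class); it assumes that "letters and numbers" means the ASCII ranges A-Z, a-z and 0-9, and the same rule applies to the namedOutput parameter of addMultiNamedOutput.

```java
/**
 * Hypothetical validator (not Hadoop code) for the documented naming rule:
 * a single word of letters and numbers only, never the reserved word "part".
 */
final class NamedOutputNames {

    private NamedOutputNames() {}

    /** Throws IllegalArgumentException if the name breaks the rule. */
    static void checkNamedOutputName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Name cannot be null or empty");
        }
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            // Assumption: "letters and numbers" means ASCII A-Z, a-z and 0-9.
            boolean ok = (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= '0' && ch <= '9');
            if (!ok) {
                throw new IllegalArgumentException("Name cannot contain '" + ch + "': " + name);
            }
        }
        // "part" is reserved for the job's default output files.
        if (name.equals("part")) {
            throw new IllegalArgumentException("'part' is reserved for the default output");
        }
    }

    /** Convenience wrapper returning false instead of throwing. */
    static boolean isValid(String name) {
        try {
            checkNamedOutputName(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String n : new String[] {"text", "seq2", "part", "my_out", ""}) {
            System.out.println("'" + n + "' valid? " + isValid(n));
        }
    }
}
```

Under this rule "text" and "seq2" are accepted, while "part", "my_out" (underscore) and the empty string are rejected.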

addMultiNamedOutput

public static void addMultiNamedOutput(JobConf conf,
                                       String namedOutput,
                                       Class<? extends OutputFormat> outputFormatClass,
                                       Class<?> keyClass,
                                       Class<?> valueClass)
Adds a multi named output for the job.

Parameters:
conf - job conf to add the named output
namedOutput - named output name; it must be a single word of letters and numbers only, and cannot be the word 'part', which is reserved for the default output.
outputFormatClass - OutputFormat class.
keyClass - key class
valueClass - value class

setCountersEnabled

public static void setCountersEnabled(JobConf conf,
                                      boolean enabled)
Enables or disables counters for the named outputs.

By default these counters are disabled.

The counters group is the MultipleOutputs class name.

The names of the counters are the same as the named outputs. For multi named outputs the name of the counter is the concatenation of the named output, an underscore '_' and the multiname.

Parameters:
conf - job conf in which to enable or disable the counters.
enabled - indicates whether the counters will be enabled.

getCountersEnabled

public static boolean getCountersEnabled(JobConf conf)
Returns whether the counters for the named outputs are enabled.

By default these counters are disabled.

The counters group is the MultipleOutputs class name.

The names of the counters are the same as the named outputs. For multi named outputs the name of the counter is the concatenation of the named output, an underscore '_' and the multiname.

Parameters:
conf - job conf to check.
Returns:
true if the counters are enabled, false if they are disabled.

getNamedOutputs

public Iterator<String> getNamedOutputs()
Returns an iterator over the defined named outputs.

Returns:
iterator with the defined named outputs

getCollector

public OutputCollector getCollector(String namedOutput,
                                    Reporter reporter)
                             throws IOException
Gets the output collector for a named output.

Parameters:
namedOutput - the named output name
reporter - the reporter
Returns:
the output collector for the given named output
Throws:
IOException - thrown if output collector could not be created

getCollector

public OutputCollector getCollector(String namedOutput,
                                    String multiName,
                                    Reporter reporter)
                             throws IOException
Gets the output collector for a multi named output.

Parameters:
namedOutput - the named output name
multiName - the multi name part
reporter - the reporter
Returns:
the output collector for the given multi named output and multi name
Throws:
IOException - thrown if output collector could not be created
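As a rough mental model of how the two getCollector overloads and close() fit together, the following in-memory sketch keeps one writer per output name, opens it on first use and reuses it afterwards. It is not Hadoop's implementation: `ToyMultipleOutputs`, its `Collector` interface and the tab-separated line format are hypothetical, the Reporter parameter is replaced by a constructor flag that turns counting on, and one counter increment per collected record is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy in-memory model (hypothetical, not Hadoop code) of the getCollector and
 * close contract: one writer per output name, opened on first use and reused
 * afterwards, with a per-output record count when counters are enabled.
 */
class ToyMultipleOutputs {

    /** Stand-in for OutputCollector. */
    interface Collector {
        void collect(Object key, Object value);
    }

    /** Stand-in for a RecordWriter: buffers "key<TAB>value" lines in memory. */
    private static final class ToyWriter {
        final List<String> lines = new ArrayList<>();
        boolean closed;
    }

    private final boolean countersEnabled;
    // Assumption: one writer per base name (namedOutput or namedOutput_multiName).
    private final Map<String, ToyWriter> writers = new LinkedHashMap<>();
    private final Map<String, Long> counters = new HashMap<>();

    ToyMultipleOutputs(boolean countersEnabled) {
        this.countersEnabled = countersEnabled;
    }

    /** Collector for a single named output. */
    Collector getCollector(String namedOutput) {
        return collectorFor(namedOutput);
    }

    /** Collector for one file of a multi named output. */
    Collector getCollector(String namedOutput, String multiName) {
        return collectorFor(namedOutput + "_" + multiName);
    }

    private Collector collectorFor(String baseName) {
        // Open the writer lazily on first use; later calls reuse the same one.
        ToyWriter writer = writers.computeIfAbsent(baseName, n -> new ToyWriter());
        return (key, value) -> {
            if (writer.closed) {
                throw new IllegalStateException("Output '" + baseName + "' is closed");
            }
            writer.lines.add(key + "\t" + value);
            if (countersEnabled) {
                // Counter name equals the base name, per the documented naming rule.
                counters.merge(baseName, 1L, Long::sum);
            }
        };
    }

    /** Current value of a counter; 0 if never incremented or counters disabled. */
    long counter(String name) {
        return counters.getOrDefault(name, 0L);
    }

    /** Lines written so far to the given base name. */
    List<String> lines(String baseName) {
        ToyWriter writer = writers.get(baseName);
        if (writer == null) {
            return new ArrayList<>();
        }
        return writer.lines;
    }

    /** Closes every writer opened so far, mirroring close(). */
    void close() {
        for (ToyWriter writer : writers.values()) {
            writer.closed = true;
        }
    }
}
```

In this model, collecting to ("seq", "A") twice and to ("seq", "B") once leaves the counters seq_A at 2 and seq_B at 1, and any write after close() fails.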

close

public void close()
           throws IOException
Closes all the opened named outputs.

If overridden, subclasses must invoke super.close() at the end of their close() method.

Throws:
IOException - thrown if any of the named output files could not be closed properly.
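The close() contract for subclasses can be illustrated with two small stand-in classes. `BaseOutputs` and `AuditingOutputs` are hypothetical, and unlike the real close() these versions do not declare IOException. The point is only the ordering: the subclass releases its own resources first and calls super.close() as its last statement.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for MultipleOutputs that records close() activity. */
class BaseOutputs {
    final List<String> events = new ArrayList<>();

    /** Stands in for "closes all the opened named outputs". */
    void close() {
        events.add("base closed all named outputs");
    }
}

/** Hypothetical subclass following the documented contract. */
class AuditingOutputs extends BaseOutputs {
    @Override
    void close() {
        // Release subclass-specific state first...
        events.add("subclass flushed its own state");
        // ...then, as documented, end with super.close().
        super.close();
    }
}
```

Calling super.close() last means the named outputs stay open while the subclass finishes its own cleanup, which matters if that cleanup still writes to them.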


Copyright © 2009 The Apache Software Foundation