@InterfaceAudience.Public @InterfaceStability.Stable public class MultipleOutputs extends Object
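The MultipleOutputs class simplifies writing to additional outputs other than the job default output via the OutputCollector passed to the map() and reduce() methods of the Mapper and Reducer implementations.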
Each additional output, or named output, may be configured with its own OutputFormat, its own key class, and its own value class.
A named output can be a single file or a multi file. The latter is referred to as a multi named output.
A multi named output is an unbounded set of files that all share the same OutputFormat, key class, and value class configuration.
When named outputs are used within a Mapper implementation, key/values written to a named output are not part of the reduce phase; only key/values written to the job OutputCollector are part of the reduce phase.
MultipleOutputs supports counters; they are disabled by default. The counters group is the MultipleOutputs class name.
The names of the counters are the same as the named outputs. For multi named outputs, the name of the counter is the concatenation of the named output, an underscore '_', and the multiname.
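The counter naming rule above can be sketched in a few lines of plain Java. This is an illustration only: CounterNames and counterName are hypothetical names that are not part of the Hadoop API, and the sketch assumes a single named output is signalled by passing null as the multi name.

```java
// Hypothetical helper, not part of Hadoop: mirrors the counter naming rule
// described above for single and multi named outputs.
final class CounterNames {
    private CounterNames() {}

    /**
     * Builds the counter name for a named output.
     *
     * @param namedOutput the named output, e.g. "text" or "seq"
     * @param multiName   the multi name part, or null for a single named output
     *                    (the null convention is an assumption of this sketch)
     */
    static String counterName(String namedOutput, String multiName) {
        if (multiName == null) {
            // Single named output: the counter shares the output's name.
            return namedOutput;
        }
        // Multi named output: named output + '_' + multi name.
        return namedOutput + "_" + multiName;
    }

    public static void main(String[] args) {
        System.out.println(counterName("text", null)); // prints "text"
        System.out.println(counterName("seq", "A"));   // prints "seq_A"
    }
}
```

Under this rule, a single named output 'text' and a multi named output 'seq' written with the multi names 'A' and 'B' would report counters named text, seq_A and seq_B.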
Job configuration usage pattern is:

    JobConf conf = new JobConf();

    // JobConf has no setInputPath(); input paths are set via FileInputFormat
    FileInputFormat.setInputPaths(conf, inDir);
    FileOutputFormat.setOutputPath(conf, outDir);

    conf.setMapperClass(MOMap.class);
    conf.setReducerClass(MOReduce.class);
    ...

    // Defines additional single text based output 'text' for the job
    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
        LongWritable.class, Text.class);

    // Defines additional multi sequencefile based output 'seq' for the job
    MultipleOutputs.addMultiNamedOutput(conf, "seq",
        SequenceFileOutputFormat.class,
        LongWritable.class, Text.class);
    ...

    JobClient jc = new JobClient();
    RunningJob job = jc.submitJob(conf);
    ...

Usage pattern in a Reducer is:

    public class MOReduce implements
        Reducer<WritableComparable, Writable, WritableComparable, Writable> {

      private MultipleOutputs mos;

      public void configure(JobConf conf) {
        ...
        // Instantiate once per task, in configure()
        mos = new MultipleOutputs(conf);
      }

      public void reduce(WritableComparable key, Iterator<Writable> values,
          OutputCollector<WritableComparable, Writable> output,
          Reporter reporter) throws IOException {
        ...
        // Single named output 'text'
        mos.getCollector("text", reporter).collect(key, new Text("Hello"));
        // Multi named output 'seq' with multi names 'A' and 'B'
        mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
        mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
        ...
      }

      public void close() throws IOException {
        // Closes all the opened named outputs
        mos.close();
        ...
      }
    }
| Constructor and Description |
|---|
| MultipleOutputs(JobConf job): Creates and initializes multiple named outputs support. It should be instantiated in the Mapper/Reducer configure method. |
| Modifier and Type | Method and Description |
|---|---|
| static void | addMultiNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass): Adds a multi named output for the job. |
| static void | addNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass): Adds a named output for the job. |
| void | close(): Closes all the opened named outputs. |
| OutputCollector | getCollector(String namedOutput, Reporter reporter): Gets the output collector for a named output. |
| OutputCollector | getCollector(String namedOutput, String multiName, Reporter reporter): Gets the output collector for a multi named output. |
| static boolean | getCountersEnabled(JobConf conf): Returns whether the counters for the named outputs are enabled. |
| static Class<? extends OutputFormat> | getNamedOutputFormatClass(JobConf conf, String namedOutput): Returns the OutputFormat class for a named output. |
| static Class<?> | getNamedOutputKeyClass(JobConf conf, String namedOutput): Returns the key class for a named output. |
| Iterator<String> | getNamedOutputs(): Returns an iterator over the defined named outputs. |
| static List<String> | getNamedOutputsList(JobConf conf): Returns the list of named output (channel) names. |
| static Class<?> | getNamedOutputValueClass(JobConf conf, String namedOutput): Returns the value class for a named output. |
| static boolean | isMultiNamedOutput(JobConf conf, String namedOutput): Returns whether a named output is a multi named output. |
| static void | setCountersEnabled(JobConf conf, boolean enabled): Enables or disables counters for the named outputs. |
public MultipleOutputs(JobConf job)
Parameters:
job - the job configuration object

public static List<String> getNamedOutputsList(JobConf conf)
Parameters:
conf - job conf

public static boolean isMultiNamedOutput(JobConf conf, String namedOutput)
Parameters:
conf - job conf
namedOutput - named output
Returns:
true if the named output is multi, false if it is single. If the named output is not defined it returns false.
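
Because isMultiNamedOutput returns false both for a single named output and for a name that was never defined, a caller that needs to tell the two cases apart has to check separately whether the name is defined. The sketch below models that contract with a plain in-memory map. NamedOutputRegistry and its methods are hypothetical stand-ins for illustration and do not reflect how Hadoop stores named outputs in the job configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model, not Hadoop code: it only imitates the
// documented return values of isMultiNamedOutput.
final class NamedOutputRegistry {
    // Maps a named output to true (multi) or false (single).
    private final Map<String, Boolean> outputs = new LinkedHashMap<>();

    void addNamedOutput(String name) {
        outputs.put(name, false);
    }

    void addMultiNamedOutput(String name) {
        outputs.put(name, true);
    }

    // True only if the name was registered as single or multi.
    boolean isDefined(String name) {
        return outputs.containsKey(name);
    }

    // Same contract as documented above: false for a single named output
    // and also false for a name that is not defined.
    boolean isMultiNamedOutput(String name) {
        return outputs.getOrDefault(name, false);
    }
}
```

In a real job, the list returned by getNamedOutputsList(conf) could play the role of the isDefined check before isMultiNamedOutput is consulted.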

public static Class<? extends OutputFormat> getNamedOutputFormatClass(JobConf conf, String namedOutput)
Parameters:
conf - job conf
namedOutput - named output

public static Class<?> getNamedOutputKeyClass(JobConf conf, String namedOutput)
Parameters:
conf - job conf
namedOutput - named output

public static Class<?> getNamedOutputValueClass(JobConf conf, String namedOutput)
Parameters:
conf - job conf
namedOutput - named output

public static void addNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass)
Parameters:
conf - job conf to add the named output to
namedOutput - named output name; it has to be a word made of letters and numbers only, and cannot be the word 'part', as that is reserved for the default output.
outputFormatClass - OutputFormat class.
keyClass - key class
valueClass - value class

public static void addMultiNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass)
Parameters:
conf - job conf to add the named output to
namedOutput - named output name; it has to be a word made of letters and numbers only, and cannot be the word 'part', as that is reserved for the default output.
outputFormatClass - OutputFormat class.
keyClass - key class
valueClass - value class

public static void setCountersEnabled(JobConf conf, boolean enabled)
The counters group is the MultipleOutputs class name.
The names of the counters are the same as the named outputs. For multi named outputs, the name of the counter is the concatenation of the named output, an underscore '_', and the multiname.
Parameters:
conf - job conf in which to enable or disable the counters.
enabled - indicates if the counters will be enabled or not.

public static boolean getCountersEnabled(JobConf conf)
The counters group is the MultipleOutputs class name.
The names of the counters are the same as the named outputs. For multi named outputs, the name of the counter is the concatenation of the named output, an underscore '_', and the multiname.
Parameters:
conf - job conf

public Iterator<String> getNamedOutputs()
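
The naming constraint documented above for addNamedOutput and addMultiNamedOutput (letters and numbers only, never the word 'part') can be sketched as a pre-check run before a name is registered. NamedOutputNames and isValid are hypothetical names, not Hadoop's own validation code. The sketch also assumes that "letters and numbers" means ASCII letters and digits and that an empty name is rejected, neither of which is stated in this page.

```java
// Hypothetical pre-check, not part of Hadoop: applies the naming rule
// documented for addNamedOutput and addMultiNamedOutput.
final class NamedOutputNames {
    private NamedOutputNames() {}

    static boolean isValid(String name) {
        // Assumption of this sketch: null and empty names are rejected.
        if (name == null || name.isEmpty()) {
            return false;
        }
        // 'part' is reserved for the default output.
        if (name.equals("part")) {
            return false;
        }
        // Assumption of this sketch: only ASCII letters and digits count.
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digit = c >= '0' && c <= '9';
            if (!letter && !digit) {
                return false;
            }
        }
        return true;
    }
}
```

With this check, names such as text and seq pass, while part, my-output and out_1 are rejected.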

public OutputCollector getCollector(String namedOutput, Reporter reporter) throws IOException
Parameters:
namedOutput - the named output name
reporter - the reporter
Throws:
IOException - thrown if the output collector could not be created

public OutputCollector getCollector(String namedOutput, String multiName, Reporter reporter) throws IOException
Parameters:
namedOutput - the named output name
multiName - the multi name part
reporter - the reporter
Throws:
IOException - thrown if the output collector could not be created

public void close() throws IOException
If overridden, subclasses must invoke super.close() at the end of their close() method.
Throws:
IOException - thrown if any of the MultipleOutputs files could not be closed properly.

Copyright © 2014 Apache Software Foundation. All Rights Reserved.