Class MultipleOutputs
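The MultipleOutputs class simplifies writing to additional outputs other
than the job default output via the OutputCollector passed to
the map() and reduce() methods of the
Mapper and Reducer implementations.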
Each additional output, or named output, may be configured with its own
OutputFormat, with its own key class and with its own value
class.
A named output can be a single file or a multi file. The latter is referred to as a multi named output.
A multi named output is an unbounded set of files all sharing the same
OutputFormat, key class and value class configuration.
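The difference between a single and a multi named output can be pictured with a small stand-alone model. This is only a sketch and not Hadoop's implementation: the class NamedOutputModel is hypothetical, in-memory lists stand in for files, and joining the two names with '_' is an assumption borrowed from the counter naming rule described for setCountersEnabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model only; this is not how Hadoop stores named outputs. */
final class NamedOutputModel {
    // One in-memory buffer per output "file"; TreeMap keeps names sorted.
    private final Map<String, List<String>> files = new TreeMap<>();

    /** Single named output: every call returns the same buffer. */
    List<String> collector(String namedOutput) {
        return files.computeIfAbsent(namedOutput, k -> new ArrayList<>());
    }

    /** Multi named output: one buffer per distinct multiName, created on first use. */
    List<String> collector(String namedOutput, String multiName) {
        // Assumption: the '_' separator mirrors the documented counter names.
        String name = namedOutput + "_" + multiName;
        return files.computeIfAbsent(name, k -> new ArrayList<>());
    }

    /** Names of all buffers created so far, in sorted order. */
    List<String> fileNames() {
        return new ArrayList<>(files.keySet());
    }

    public static void main(String[] args) {
        NamedOutputModel mos = new NamedOutputModel();
        mos.collector("text").add("Hello");
        mos.collector("seq", "A").add("Bye");
        mos.collector("seq", "B").add("Chau");
        mos.collector("seq", "A").add("Bye again"); // reuses the "A" buffer
        System.out.println(mos.fileNames());
    }
}
```

The set of buffers under "seq" grows with each new multi name, which is the sense in which a multi named output is unbounded.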
When named outputs are used within a Mapper implementation,
key/values written to a named output are not part of the reduce phase; only
key/values written to the job OutputCollector are part of the
reduce phase.
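A toy model may make the map-side behaviour concrete. It is a sketch under stated assumptions, not Hadoop code: MapSideOutputs and its members are hypothetical, and plain lists stand in for the job OutputCollector and for a named output collector.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a map task with one job output and one named output. */
final class MapSideOutputs {
    // Stands in for the job OutputCollector (feeds the reduce phase).
    final List<String> jobOutput = new ArrayList<>();
    // Stands in for a named output collector (written directly, never reduced).
    final List<String> namedOutput = new ArrayList<>();

    /** Hypothetical map(): non-empty records go to the job output, empty ones are noted in the named output. */
    void map(String record) {
        if (record.isEmpty()) {
            namedOutput.add("empty record skipped");
            return;
        }
        jobOutput.add(record);
    }

    /** Only records written to the job output reach the reduce phase. */
    List<String> reduceInput() {
        return new ArrayList<>(jobOutput);
    }

    public static void main(String[] args) {
        MapSideOutputs task = new MapSideOutputs();
        task.map("a");
        task.map("");
        task.map("b");
        System.out.println("reduce sees: " + task.reduceInput());
        System.out.println("named output holds: " + task.namedOutput);
    }
}
```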
MultipleOutputs supports counters; by default they are disabled. The counters
group is the MultipleOutputs class name.
Job configuration usage pattern is:
JobConf conf = new JobConf();

FileInputFormat.setInputPaths(conf, inDir);
FileOutputFormat.setOutputPath(conf, outDir);

conf.setMapperClass(MOMap.class);
conf.setReducerClass(MOReduce.class);
...

// Defines additional single text based output 'text' for the job
MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
    LongWritable.class, Text.class);

// Defines additional multi sequencefile based output 'seq' for the job
MultipleOutputs.addMultiNamedOutput(conf, "seq",
    SequenceFileOutputFormat.class,
    LongWritable.class, Text.class);
...

JobClient jc = new JobClient();
RunningJob job = jc.submitJob(conf);
...
Usage pattern in a Reducer is:
public class MOReduce implements
    Reducer<WritableComparable, Writable, WritableComparable, Writable> {
  private MultipleOutputs mos;

  // Create the MultipleOutputs helper once per task.
  public void configure(JobConf conf) {
    ...
    mos = new MultipleOutputs(conf);
  }

  public void reduce(WritableComparable key, Iterator<Writable> values,
      OutputCollector<WritableComparable, Writable> output, Reporter reporter)
      throws IOException {
    ...
    // Single named output 'text'.
    mos.getCollector("text", reporter).collect(key, new Text("Hello"));
    // Multi named output 'seq' with multi names 'A' and 'B'.
    mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
    mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
    ...
  }

  // Close the named outputs when the task finishes.
  public void close() throws IOException {
    mos.close();
    ...
  }
}
-
Constructor Summary
Constructor and Description
MultipleOutputs(JobConf job)
    Creates and initializes multiple named outputs support. It should be instantiated in the Mapper/Reducer configure method.
Method Summary
Modifier and Type, Method, and Description
static void addMultiNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass)
    Adds a multi named output for the job.
static void addNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass)
    Adds a named output for the job.
void close()
    Closes all the opened named outputs.
OutputCollector getCollector(String namedOutput, String multiName, Reporter reporter)
    Gets the output collector for a multi named output.
OutputCollector getCollector(String namedOutput, Reporter reporter)
    Gets the output collector for a named output.
static boolean getCountersEnabled(JobConf conf)
    Returns whether the counters for the named outputs are enabled.
static Class<? extends OutputFormat> getNamedOutputFormatClass(JobConf conf, String namedOutput)
    Returns the named output OutputFormat.
static Class<?> getNamedOutputKeyClass(JobConf conf, String namedOutput)
    Returns the key class for a named output.
Iterator<String> getNamedOutputs()
    Returns an iterator with the defined named outputs.
static List<String> getNamedOutputsList(JobConf conf)
    Returns the list of named output names.
static Class<?> getNamedOutputValueClass(JobConf conf, String namedOutput)
    Returns the value class for a named output.
static boolean isMultiNamedOutput(JobConf conf, String namedOutput)
    Returns whether a named output is multiple.
static void setCountersEnabled(JobConf conf, boolean enabled)
    Enables or disables counters for the named outputs.
-
Constructor Details
-
MultipleOutputs
Creates and initializes multiple named outputs support. It should be instantiated in the Mapper/Reducer configure method.
Parameters:
job - the job configuration object
-
-
Method Details
-
getNamedOutputsList
Returns the list of named output names.
Parameters:
conf - job conf
Returns:
list of the names of the defined named outputs
-
isMultiNamedOutput
Returns whether a named output is multiple.
Parameters:
conf - job conf
namedOutput - named output
Returns:
true if the named output is multi, false if it is single. If the named output is not defined it returns false.
-
getNamedOutputFormatClass
public static Class<? extends OutputFormat> getNamedOutputFormatClass(JobConf conf, String namedOutput)
Returns the named output OutputFormat.
Parameters:
conf - job conf
namedOutput - named output
Returns:
namedOutput OutputFormat
-
getNamedOutputKeyClass
Returns the key class for a named output.
Parameters:
conf - job conf
namedOutput - named output
Returns:
class for the named output key
-
getNamedOutputValueClass
Returns the value class for a named output.
Parameters:
conf - job conf
namedOutput - named output
Returns:
class of named output value
-
addNamedOutput
public static void addNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass)
Adds a named output for the job.
Parameters:
conf - job conf to add the named output to
namedOutput - named output name. It has to be a single word made of letters and numbers only, and it cannot be the word 'part', which is reserved for the default output.
outputFormatClass - OutputFormat class
keyClass - key class
valueClass - value class
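The naming rule for namedOutput can be sketched as a stand-alone check. This is a minimal illustration, not Hadoop's own validation: the class NamedOutputNames is hypothetical, and reading "letters and numbers" as ASCII letters and digits is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that mirrors the documented naming rule. */
final class NamedOutputNames {
    // Assumption: "letters and numbers only" means ASCII letters and digits.
    private static final Pattern LETTERS_AND_DIGITS = Pattern.compile("[A-Za-z0-9]+");

    /** Returns true if the name is non-empty, alphanumeric, and not the reserved word 'part'. */
    static boolean isValid(String name) {
        return name != null
            && LETTERS_AND_DIGITS.matcher(name).matches()
            && !name.equals("part");
    }

    /** Rejects an invalid name before it would be registered. */
    static void check(String name) {
        if (!isValid(name)) {
            throw new IllegalArgumentException("Invalid named output name: " + name);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"text", "seq", "part", "my-output"}) {
            System.out.println(name + " -> " + isValid(name));
        }
    }
}
```

Running such a check when the job is configured surfaces a bad name early, before any task starts writing.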
-
addMultiNamedOutput
public static void addMultiNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Class<?> keyClass, Class<?> valueClass)
Adds a multi named output for the job.
Parameters:
conf - job conf to add the named output to
namedOutput - named output name. It has to be a single word made of letters and numbers only, and it cannot be the word 'part', which is reserved for the default output.
outputFormatClass - OutputFormat class
keyClass - key class
valueClass - value class
-
setCountersEnabled
Enables or disables counters for the named outputs. By default these counters are disabled.
The counters group is the MultipleOutputs class name.
The names of the counters are the same as the named outputs. For multi named outputs the name of the counter is the concatenation of the named output, an underscore '_' and the multiname.
Parameters:
conf - job conf in which to enable or disable the counters
enabled - indicates if the counters will be enabled or not
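The counter naming rule can be written out as a small sketch. NamedOutputCounters is a hypothetical helper, not part of the Hadoop API, and it covers only the counter name; the counters group is not modelled here.

```java
/** Hypothetical helper that spells out the documented counter names. */
final class NamedOutputCounters {
    /** Counter name for a single named output: the named output itself. */
    static String counterName(String namedOutput) {
        return namedOutput;
    }

    /** Counter name for a multi named output: namedOutput, '_', then the multi name. */
    static String counterName(String namedOutput, String multiName) {
        return namedOutput + "_" + multiName;
    }

    public static void main(String[] args) {
        System.out.println(counterName("text"));
        System.out.println(counterName("seq", "A"));
        System.out.println(counterName("seq", "B"));
    }
}
```

Each distinct multi name therefore gets its own counter, so a multi named output with many multi names produces many counters.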
-
getCountersEnabled
Returns whether the counters for the named outputs are enabled. By default these counters are disabled.
The counters group is the MultipleOutputs class name.
The names of the counters are the same as the named outputs. For multi named outputs the name of the counter is the concatenation of the named output, an underscore '_' and the multiname.
Parameters:
conf - job conf to read the counters setting from
Returns:
true if the counters are enabled, false if they are disabled
-
getNamedOutputs
Returns an iterator with the defined named outputs.
Returns:
iterator with the defined named outputs
-
getCollector
Gets the output collector for a named output.
Parameters:
namedOutput - the named output name
reporter - the reporter
Returns:
the output collector for the given named output
Throws:
IOException - thrown if output collector could not be created
-
getCollector
public OutputCollector getCollector(String namedOutput, String multiName, Reporter reporter) throws IOException
Gets the output collector for a multi named output.
Parameters:
namedOutput - the named output name
multiName - the multi name part
reporter - the reporter
Returns:
the output collector for the given named output
Throws:
IOException - thrown if output collector could not be created
-
close
Closes all the opened named outputs.
If overridden, subclasses must invoke super.close() at the end of their close().
Throws:
IOException - thrown if any of the MultipleOutputs files could not be closed properly
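The override rule for close() can be illustrated with stand-in classes. This is a sketch only: BaseOutputs plays the role of the class that owns the named outputs, AuditedOutputs is a hypothetical subclass, and neither is part of Hadoop. The try/finally form is one way to make sure super.close() still runs when the subclass cleanup fails.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a base class that owns closable outputs. */
class BaseOutputs {
    // Records the order of close steps so the sequence can be inspected.
    protected final List<String> log = new ArrayList<>();

    public void close() throws IOException {
        log.add("base outputs closed");
    }
}

/** Hypothetical subclass that has its own cleanup to perform. */
class AuditedOutputs extends BaseOutputs {
    @Override
    public void close() throws IOException {
        try {
            log.add("audit trail flushed"); // subclass cleanup comes first
        } finally {
            super.close(); // last step, runs even if the cleanup above throws
        }
    }

    public static void main(String[] args) throws IOException {
        AuditedOutputs outputs = new AuditedOutputs();
        outputs.close();
        System.out.println(outputs.log);
    }
}
```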
-