@InterfaceAudience.Public @InterfaceStability.Stable public class ChainReducer extends Object implements Reducer
[MAP+ / REDUCE MAP*]
. And
immediate benefit of this pattern is a dramatic reduction in disk IO.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainReducer, this is done by the setReducer or the addMapper for the last
element in the chain.
ChainReducer usage pattern:
... conf.setJobName("chain"); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); JobConf mapAConf = new JobConf(false); ... ChainMapper.addMapper(conf, AMap.class, LongWritable.class, Text.class, Text.class, Text.class, true, mapAConf); JobConf mapBConf = new JobConf(false); ... ChainMapper.addMapper(conf, BMap.class, Text.class, Text.class, LongWritable.class, Text.class, false, mapBConf); JobConf reduceConf = new JobConf(false); ... ChainReducer.setReducer(conf, XReduce.class, LongWritable.class, Text.class, Text.class, Text.class, true, reduceConf); ChainReducer.addMapper(conf, CMap.class, Text.class, Text.class, LongWritable.class, Text.class, false, null); ChainReducer.addMapper(conf, DMap.class, LongWritable.class, Text.class, LongWritable.class, LongWritable.class, true, null); FileInputFormat.setInputPaths(conf, inDir); FileOutputFormat.setOutputPath(conf, outDir); ... JobClient jc = new JobClient(conf); RunningJob job = jc.submitJob(conf); ...
Constructor and Description |
---|
ChainReducer()
Constructor.
|
Modifier and Type | Method and Description |
---|---|
static <K1,V1,K2,V2> |
addMapper(JobConf job,
Class<? extends Mapper<K1,V1,K2,V2>> klass,
Class<? extends K1> inputKeyClass,
Class<? extends V1> inputValueClass,
Class<? extends K2> outputKeyClass,
Class<? extends V2> outputValueClass,
boolean byValue,
JobConf mapperConf)
Adds a Mapper class to the chain job's JobConf.
|
void |
close()
Closes the ChainReducer, the Reducer and all the Mappers in the chain.
|
void |
configure(JobConf job)
Configures the ChainReducer, the Reducer and all the Mappers in the chain.
|
void |
reduce(Object key,
Iterator values,
OutputCollector output,
Reporter reporter)
Chains the
reduce(...) method of the Reducer with the
map(...) methods of the Mappers in the chain. |
static <K1,V1,K2,V2> |
setReducer(JobConf job,
Class<? extends Reducer<K1,V1,K2,V2>> klass,
Class<? extends K1> inputKeyClass,
Class<? extends V1> inputValueClass,
Class<? extends K2> outputKeyClass,
Class<? extends V2> outputValueClass,
boolean byValue,
JobConf reducerConf)
Sets the Reducer class to the chain job's JobConf.
|
public ChainReducer()
public static <K1,V1,K2,V2> void setReducer(JobConf job, Class<? extends Reducer<K1,V1,K2,V2>> klass, Class<? extends K1> inputKeyClass, Class<? extends V1> inputValueClass, Class<? extends K2> outputKeyClass, Class<? extends V2> outputValueClass, boolean byValue, JobConf reducerConf)
reducerConf
, have precedence over the job's JobConf. This
precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainReducer, this is done by the setReducer or the addMapper for the last
element in the chain.job
- job's JobConf to add the Reducer class.klass
- the Reducer class to add.inputKeyClass
- reducer input key class.inputValueClass
- reducer input value class.outputKeyClass
- reducer output key class.outputValueClass
- reducer output value class.byValue
- indicates if key/values should be passed by value
to the next Mapper in the chain, if any.reducerConf
- a JobConf with the configuration for the Reducer
class. It is recommended to use a JobConf without default values using the
JobConf(boolean loadDefaults)
constructor with FALSE.public static <K1,V1,K2,V2> void addMapper(JobConf job, Class<? extends Mapper<K1,V1,K2,V2>> klass, Class<? extends K1> inputKeyClass, Class<? extends V1> inputValueClass, Class<? extends K2> outputKeyClass, Class<? extends V2> outputValueClass, boolean byValue, JobConf mapperConf)
mapperConf
, have precedence over the job's JobConf. This
precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainMapper, this is done by the addMapper for the last mapper in the chain
.job
- chain job's JobConf to add the Mapper class.klass
- the Mapper class to add.inputKeyClass
- mapper input key class.inputValueClass
- mapper input value class.outputKeyClass
- mapper output key class.outputValueClass
- mapper output value class.byValue
- indicates if key/values should be passed by value
to the next Mapper in the chain, if any.mapperConf
- a JobConf with the configuration for the Mapper
class. It is recommended to use a JobConf without default values using the
JobConf(boolean loadDefaults)
constructor with FALSE.public void configure(JobConf job)
super.configure(...)
should be
invoked at the beginning of the overwriter method.configure
in interface JobConfigurable
job
- the configurationpublic void reduce(Object key, Iterator values, OutputCollector output, Reporter reporter) throws IOException
reduce(...)
method of the Reducer with the
map(...)
methods of the Mappers in the chain.reduce
in interface Reducer
key
- the key.values
- the list of values to reduce.output
- to collect keys and combined values.reporter
- facility to report progress.IOException
public void close() throws IOException
super.close()
should be
invoked at the end of the overwriter method.close
in interface Closeable
close
in interface AutoCloseable
IOException
Copyright © 2016 Apache Software Foundation. All Rights Reserved.