@InterfaceAudience.Public @InterfaceStability.Stable public class ChainMapper extends Object implements Mapper
[MAP+ / REDUCE MAP*]
. And
immediate benefit of this pattern is a dramatic reduction in disk IO.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainMapper, this is done by the addMapper for the last mapper in the chain.
ChainMapper usage pattern:
... conf.setJobName("chain"); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); JobConf mapAConf = new JobConf(false); ... ChainMapper.addMapper(conf, AMap.class, LongWritable.class, Text.class, Text.class, Text.class, true, mapAConf); JobConf mapBConf = new JobConf(false); ... ChainMapper.addMapper(conf, BMap.class, Text.class, Text.class, LongWritable.class, Text.class, false, mapBConf); JobConf reduceConf = new JobConf(false); ... ChainReducer.setReducer(conf, XReduce.class, LongWritable.class, Text.class, Text.class, Text.class, true, reduceConf); ChainReducer.addMapper(conf, CMap.class, Text.class, Text.class, LongWritable.class, Text.class, false, null); ChainReducer.addMapper(conf, DMap.class, LongWritable.class, Text.class, LongWritable.class, LongWritable.class, true, null); FileInputFormat.setInputPaths(conf, inDir); FileOutputFormat.setOutputPath(conf, outDir); ... JobClient jc = new JobClient(conf); RunningJob job = jc.submitJob(conf); ...
Constructor and Description |
---|
ChainMapper()
Constructor.
|
Modifier and Type | Method and Description |
---|---|
static <K1,V1,K2,V2> |
addMapper(JobConf job,
Class<? extends Mapper<K1,V1,K2,V2>> klass,
Class<? extends K1> inputKeyClass,
Class<? extends V1> inputValueClass,
Class<? extends K2> outputKeyClass,
Class<? extends V2> outputValueClass,
boolean byValue,
JobConf mapperConf)
Adds a Mapper class to the chain job's JobConf.
|
void |
close()
Closes the ChainMapper and all the Mappers in the chain.
|
void |
configure(JobConf job)
Configures the ChainMapper and all the Mappers in the chain.
|
void |
map(Object key,
Object value,
OutputCollector output,
Reporter reporter)
Chains the
map(...) methods of the Mappers in the chain. |
public ChainMapper()
public static <K1,V1,K2,V2> void addMapper(JobConf job, Class<? extends Mapper<K1,V1,K2,V2>> klass, Class<? extends K1> inputKeyClass, Class<? extends V1> inputValueClass, Class<? extends K2> outputKeyClass, Class<? extends V2> outputValueClass, boolean byValue, JobConf mapperConf)
mapperConf
, have precedence over the job's JobConf. This
precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainMapper, this is done by the addMapper for the last mapper in the chain
job
- job's JobConf to add the Mapper class.klass
- the Mapper class to add.inputKeyClass
- mapper input key class.inputValueClass
- mapper input value class.outputKeyClass
- mapper output key class.outputValueClass
- mapper output value class.byValue
- indicates if key/values should be passed by value
to the next Mapper in the chain, if any.mapperConf
- a JobConf with the configuration for the Mapper
class. It is recommended to use a JobConf without default values using the
JobConf(boolean loadDefaults)
constructor with FALSE.public void configure(JobConf job)
super.configure(...)
should be
invoked at the beginning of the overwriter method.configure
in interface JobConfigurable
job
- the configurationpublic void map(Object key, Object value, OutputCollector output, Reporter reporter) throws IOException
map(...)
methods of the Mappers in the chain.map
in interface Mapper
key
- the input key.value
- the input value.output
- collects mapped keys and values.reporter
- facility to report progress.IOException
public void close() throws IOException
super.close()
should be
invoked at the end of the overwriter method.close
in interface Closeable
close
in interface AutoCloseable
IOException
Copyright © 2015 Apache Software Foundation. All Rights Reserved.