@InterfaceAudience.Public @InterfaceStability.Stable public class ChainMapper extends Object implements Mapper
[MAP+ / REDUCE MAP*]. And
immediate benefit of this pattern is a dramatic reduction in disk IO.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainMapper, this is done by the addMapper for the last mapper in the chain.
ChainMapper usage pattern:
...
conf.setJobName("chain");
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
JobConf mapAConf = new JobConf(false);
...
ChainMapper.addMapper(conf, AMap.class, LongWritable.class, Text.class,
Text.class, Text.class, true, mapAConf);
JobConf mapBConf = new JobConf(false);
...
ChainMapper.addMapper(conf, BMap.class, Text.class, Text.class,
LongWritable.class, Text.class, false, mapBConf);
JobConf reduceConf = new JobConf(false);
...
ChainReducer.setReducer(conf, XReduce.class, LongWritable.class, Text.class,
Text.class, Text.class, true, reduceConf);
ChainReducer.addMapper(conf, CMap.class, Text.class, Text.class,
LongWritable.class, Text.class, false, null);
ChainReducer.addMapper(conf, DMap.class, LongWritable.class, Text.class,
LongWritable.class, LongWritable.class, true, null);
FileInputFormat.setInputPaths(conf, inDir);
FileOutputFormat.setOutputPath(conf, outDir);
...
JobClient jc = new JobClient(conf);
RunningJob job = jc.submitJob(conf);
...
| Constructor and Description |
|---|
ChainMapper()
Constructor.
|
| Modifier and Type | Method and Description |
|---|---|
static <K1,V1,K2,V2> |
addMapper(JobConf job,
Class<? extends Mapper<K1,V1,K2,V2>> klass,
Class<? extends K1> inputKeyClass,
Class<? extends V1> inputValueClass,
Class<? extends K2> outputKeyClass,
Class<? extends V2> outputValueClass,
boolean byValue,
JobConf mapperConf)
Adds a Mapper class to the chain job's JobConf.
|
void |
close()
Closes the ChainMapper and all the Mappers in the chain.
|
void |
configure(JobConf job)
Configures the ChainMapper and all the Mappers in the chain.
|
void |
map(Object key,
Object value,
OutputCollector output,
Reporter reporter)
Chains the
map(...) methods of the Mappers in the chain. |
public ChainMapper()
public static <K1,V1,K2,V2> void addMapper(JobConf job, Class<? extends Mapper<K1,V1,K2,V2>> klass, Class<? extends K1> inputKeyClass, Class<? extends V1> inputValueClass, Class<? extends K2> outputKeyClass, Class<? extends V2> outputValueClass, boolean byValue, JobConf mapperConf)
mapperConf, have precedence over the job's JobConf. This
precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainMapper, this is done by the addMapper for the last mapper in the chain
job - job's JobConf to add the Mapper class.klass - the Mapper class to add.inputKeyClass - mapper input key class.inputValueClass - mapper input value class.outputKeyClass - mapper output key class.outputValueClass - mapper output value class.byValue - indicates if key/values should be passed by value
to the next Mapper in the chain, if any.mapperConf - a JobConf with the configuration for the Mapper
class. It is recommended to use a JobConf without default values using the
JobConf(boolean loadDefaults) constructor with FALSE.public void configure(JobConf job)
super.configure(...) should be
invoked at the beginning of the overwriter method.configure in interface JobConfigurablejob - the configurationpublic void map(Object key, Object value, OutputCollector output, Reporter reporter) throws IOException
map(...) methods of the Mappers in the chain.map in interface Mapperkey - the input key.value - the input value.output - collects mapped keys and values.reporter - facility to report progress.IOExceptionpublic void close() throws IOException
super.close() should be
invoked at the end of the overwriter method.close in interface Closeableclose in interface AutoCloseableIOExceptionCopyright © 2015 Apache Software Foundation. All Rights Reserved.