|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object org.apache.hadoop.mapred.lib.ChainMapper
@InterfaceAudience.Public @InterfaceStability.Stable public class ChainMapper
The ChainMapper class allows to use multiple Mapper classes within a single Map task.
The Mapper classes are invoked in a chained (or piped) fashion, the output of the first becomes the input of the second, and so on until the last Mapper, the output of the last Mapper will be written to the task's output. The key functionality of this feature is that the Mappers in the chain do not need to be aware that they are executed in a chain. This enables having reusable specialized Mappers that can be combined to perform composite operations within a single task. Special care has to be taken when creating chains that the key/values output by a Mapper are valid for the following Mapper in the chain. It is assumed all Mappers and the Reduce in the chain use maching output and input key and value classes as no conversion is done by the chaining code. Using the ChainMapper and the ChainReducer classes is possible to compose Map/Reduce jobs that look like[MAP+ / REDUCE MAP*]
. And
immediate benefit of this pattern is a dramatic reduction in disk IO.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainMapper, this is done by the addMapper for the last mapper in the chain.
ChainMapper usage pattern:
... conf.setJobName("chain"); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); JobConf mapAConf = new JobConf(false); ... ChainMapper.addMapper(conf, AMap.class, LongWritable.class, Text.class, Text.class, Text.class, true, mapAConf); JobConf mapBConf = new JobConf(false); ... ChainMapper.addMapper(conf, BMap.class, Text.class, Text.class, LongWritable.class, Text.class, false, mapBConf); JobConf reduceConf = new JobConf(false); ... ChainReducer.setReducer(conf, XReduce.class, LongWritable.class, Text.class, Text.class, Text.class, true, reduceConf); ChainReducer.addMapper(conf, CMap.class, Text.class, Text.class, LongWritable.class, Text.class, false, null); ChainReducer.addMapper(conf, DMap.class, LongWritable.class, Text.class, LongWritable.class, LongWritable.class, true, null); FileInputFormat.setInputPaths(conf, inDir); FileOutputFormat.setOutputPath(conf, outDir); ... JobClient jc = new JobClient(conf); RunningJob job = jc.submitJob(conf); ...
Constructor Summary | |
---|---|
ChainMapper()
Constructor. |
Method Summary | ||
---|---|---|
static
|
addMapper(JobConf job,
Class<? extends Mapper<K1,V1,K2,V2>> klass,
Class<? extends K1> inputKeyClass,
Class<? extends V1> inputValueClass,
Class<? extends K2> outputKeyClass,
Class<? extends V2> outputValueClass,
boolean byValue,
JobConf mapperConf)
Adds a Mapper class to the chain job's JobConf. |
|
void |
close()
Closes the ChainMapper and all the Mappers in the chain. |
|
void |
configure(JobConf job)
Configures the ChainMapper and all the Mappers in the chain. |
|
void |
map(Object key,
Object value,
OutputCollector output,
Reporter reporter)
Chains the map(...) methods of the Mappers in the chain. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ChainMapper()
Method Detail |
---|
public static <K1,V1,K2,V2> void addMapper(JobConf job, Class<? extends Mapper<K1,V1,K2,V2>> klass, Class<? extends K1> inputKeyClass, Class<? extends V1> inputValueClass, Class<? extends K2> outputKeyClass, Class<? extends V2> outputValueClass, boolean byValue, JobConf mapperConf)
mapperConf
, have precedence over the job's JobConf. This
precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the
ChainMapper, this is done by the addMapper for the last mapper in the chain
job
- job's JobConf to add the Mapper class.klass
- the Mapper class to add.inputKeyClass
- mapper input key class.inputValueClass
- mapper input value class.outputKeyClass
- mapper output key class.outputValueClass
- mapper output value class.byValue
- indicates if key/values should be passed by value
to the next Mapper in the chain, if any.mapperConf
- a JobConf with the configuration for the Mapper
class. It is recommended to use a JobConf without default values using the
JobConf(boolean loadDefaults)
constructor with FALSE.public void configure(JobConf job)
super.configure(...)
should be
invoked at the beginning of the overwriter method.
configure
in interface JobConfigurable
job
- the configurationpublic void map(Object key, Object value, OutputCollector output, Reporter reporter) throws IOException
map(...)
methods of the Mappers in the chain.
map
in interface Mapper
key
- the input key.value
- the input value.output
- collects mapped keys and values.reporter
- facility to report progress.
IOException
public void close() throws IOException
super.close()
should be
invoked at the end of the overwriter method.
close
in interface Closeable
IOException
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |