Class ChainReducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
For each record output by the Reducer, the Mapper classes are invoked in a chained (or piped) fashion. The output of the Reducer becomes the input of the first Mapper, the output of the first Mapper becomes the input of the second, and so on until the last Mapper; the output of the last Mapper is written to the task's output.
The key functionality of this feature is that the Mappers in the chain do not need to be aware that they are executed after the Reducer or in a chain. This enables having reusable specialized Mappers that can be combined to perform composite operations within a single task.
Special care has to be taken when creating chains: the key/value classes output by a Mapper must be valid input for the following Mapper in the chain. It is assumed that all Mappers and the Reducer in the chain use matching output and input key and value classes, as no conversion is done by the chaining code.
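This type-matching constraint can be pictured outside Hadoop with plain Java function composition. The sketch below is purely illustrative (it uses `java.util.function.Function`, not the Hadoop API): each stage's output type must be the next stage's input type, which is exactly the compatibility the chaining code assumes but never checks or converts.

```java
import java.util.function.Function;

public class ChainSketch {
    // Stage 1 (the "reducer" in the analogy): Integer -> String.
    static Function<Integer, String> reduce = n -> "sum=" + n;

    // Stage 2 (a chained "mapper"): must accept String, the previous
    // stage's output type; emits an Integer.
    static Function<String, Integer> map1 = String::length;

    // The composition compiles only because each stage's output type
    // matches the next stage's input type, mirroring the key/value
    // class matching that ChainReducer requires.
    static int runChain(int n) {
        return reduce.andThen(map1).apply(n);
    }

    public static void main(String[] args) {
        System.out.println(runChain(42)); // "sum=42" has length 6
    }
}
```

If the stages' types did not line up, the error would surface at compile time here; in a real chain job, a mismatch surfaces only at runtime, which is why the class documentation asks for special care.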
Using the ChainMapper and the ChainReducer classes it is possible to compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. An immediate benefit of this pattern is a dramatic reduction in disk IO.
IMPORTANT: There is no need to specify the output key/value classes for the ChainReducer, this is done by the setReducer or the addMapper for the last element in the chain.
ChainReducer usage pattern:
 ...
 Job job = new Job(conf);
 ...
 Configuration reduceConf = new Configuration(false);
 ...
 ChainReducer.setReducer(job, XReduce.class, LongWritable.class, Text.class,
     Text.class, Text.class, reduceConf);

 ChainReducer.addMapper(job, CMap.class, Text.class, Text.class,
     LongWritable.class, Text.class, null);

 ChainReducer.addMapper(job, DMap.class, LongWritable.class, Text.class,
     LongWritable.class, LongWritable.class, null);
 ...
 job.waitForCompletion(true);
 ...
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Reducer.Context
Constructor Summary
Constructors
ChainReducer()
Method Summary
- static void addMapper(Job job, Class<? extends Mapper> klass, Class<?> inputKeyClass, Class<?> inputValueClass, Class<?> outputKeyClass, Class<?> outputValueClass, Configuration mapperConf)
  Adds a Mapper class to the chain reducer.
- void run(Reducer.Context context)
  Advanced application writers can use the Reducer.run(org.apache.hadoop.mapreduce.Reducer.Context) method to control how the reduce task works.
- static void setReducer(Job job, Class<? extends Reducer> klass, Class<?> inputKeyClass, Class<?> inputValueClass, Class<?> outputKeyClass, Class<?> outputValueClass, Configuration reducerConf)
  Sets the Reducer class to the chain job.
- protected void setup(Reducer.Context context)
  Called once at the start of the task.
Constructor Details
ChainReducer
public ChainReducer()
Method Details
setReducer
public static void setReducer(Job job, Class<? extends Reducer> klass, Class<?> inputKeyClass, Class<?> inputValueClass, Class<?> outputKeyClass, Class<?> outputValueClass, Configuration reducerConf)
Sets the Reducer class to the chain job.
The key and values are passed from one element of the chain to the next, by value. For the added Reducer, the configuration given for it, reducerConf, has precedence over the job's Configuration. This precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the ChainReducer; this is done by the setReducer or the addMapper for the last element in the chain.
Parameters:
  job - the job.
  klass - the Reducer class to add.
  inputKeyClass - reducer input key class.
  inputValueClass - reducer input value class.
  outputKeyClass - reducer output key class.
  outputValueClass - reducer output value class.
  reducerConf - a configuration for the Reducer class. It is recommended to use a Configuration without default values, using the Configuration(boolean loadDefaults) constructor with FALSE.
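The precedence rule above (the per-element reducerConf overriding the job's Configuration while the task runs) behaves like a layered key lookup. The class below is a minimal plain-Java sketch of that semantics, using ordinary maps rather than Hadoop's Configuration class:

```java
import java.util.HashMap;
import java.util.Map;

public class LayeredConf {
    private final Map<String, String> jobConf;
    private final Map<String, String> elementConf;

    public LayeredConf(Map<String, String> jobConf,
                       Map<String, String> elementConf) {
        this.jobConf = jobConf;
        this.elementConf = elementConf;
    }

    // The element's own configuration wins when both layers define a
    // key, mirroring how a chain element's reducerConf/mapperConf has
    // precedence over the job's Configuration at task time.
    public String get(String key) {
        String v = elementConf.get(key);
        return v != null ? v : jobConf.get(key);
    }
}
```

This also motivates the recommendation to build the per-element configuration with `new Configuration(false)`: a layer loaded without defaults contains only the keys you explicitly set, so it shadows the job's Configuration only where you intend it to.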
addMapper
public static void addMapper(Job job, Class<? extends Mapper> klass, Class<?> inputKeyClass, Class<?> inputValueClass, Class<?> outputKeyClass, Class<?> outputValueClass, Configuration mapperConf) throws IOException
Adds a Mapper class to the chain reducer.
The key and values are passed from one element of the chain to the next, by value. For the added Mapper, the configuration given for it, mapperConf, has precedence over the job's Configuration. This precedence is in effect when the task is running.
IMPORTANT: There is no need to specify the output key/value classes for the ChainMapper; this is done by the addMapper for the last mapper in the chain.
Parameters:
  job - the job.
  klass - the Mapper class to add.
  inputKeyClass - mapper input key class.
  inputValueClass - mapper input value class.
  outputKeyClass - mapper output key class.
  outputValueClass - mapper output value class.
  mapperConf - a configuration for the Mapper class. It is recommended to use a Configuration without default values, using the Configuration(boolean loadDefaults) constructor with FALSE.
Throws:
  IOException
setup
protected void setup(Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>.Context context)
Description copied from class: Reducer
Called once at the start of the task.
run
public void run(Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>.Context context) throws IOException, InterruptedException
Description copied from class: Reducer
Advanced application writers can use the Reducer.run(org.apache.hadoop.mapreduce.Reducer.Context) method to control how the reduce task works.
Overrides:
  run in class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
Throws:
  IOException
  InterruptedException