ChainMapper (Apache Hadoop Main 2.7.6 API)

java.lang.Object
- org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
- - org.apache.hadoop.mapreduce.lib.chain.ChainMapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

```
@InterfaceAudience.Public
 @InterfaceStability.Stable
public class ChainMapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
extends Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
```
The ChainMapper class allows to use multiple Mapper classes within a single Map task.
The Mapper classes are invoked in a chained (or piped) fashion, the output of the first becomes the input of the second, and so on until the last Mapper, the output of the last Mapper will be written to the task's output.

The key functionality of this feature is that the Mappers in the chain do not need to be aware that they are executed in a chain. This enables having reusable specialized Mappers that can be combined to perform composite operations within a single task.

Special care has to be taken when creating chains that the key/values output by a Mapper are valid for the following Mapper in the chain. It is assumed all Mappers and the Reduce in the chain use matching output and input key and value classes as no conversion is done by the chaining code.

Using the ChainMapper and the ChainReducer classes is possible to compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. And immediate benefit of this pattern is a dramatic reduction in disk IO.

IMPORTANT: There is no need to specify the output key/value classes for the ChainMapper, this is done by the addMapper for the last mapper in the chain.
ChainMapper usage pattern:
```
 ...
 Job = new Job(conf);

 Configuration mapAConf = new Configuration(false);
 ...
 ChainMapper.addMapper(job, AMap.class, LongWritable.class, Text.class,
   Text.class, Text.class, true, mapAConf);

 Configuration mapBConf = new Configuration(false);
 ...
 ChainMapper.addMapper(job, BMap.class, Text.class, Text.class,
   LongWritable.class, Text.class, false, mapBConf);

 ...

 job.waitForComplettion(true);
 ...
 
```

Constructor Summary

Constructors
Constructor and Description

ChainMapper()

Constructors
Constructor and Description
`ChainMapper()`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`static void`	`addMapper(Job job, Class<? extends Mapper> klass, Class<?> inputKeyClass, Class<?> inputValueClass, Class<?> outputKeyClass, Class<?> outputValueClass, Configuration mapperConf)` Adds a `Mapper` class to the chain mapper.
`void`	`run(org.apache.hadoop.mapreduce.Mapper.Context context)` Expert users can override this method for more complete control over the execution of the Mapper.
`protected void`	`setup(org.apache.hadoop.mapreduce.Mapper.Context context)` Called once at the beginning of the task.

Methods inherited from class org.apache.hadoop.mapreduce.Mapper
cleanup, map

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ChainMapper
```
public ChainMapper()
```
- Method Detail
  - addMapper
```
public static void addMapper(Job job,
                             Class<? extends Mapper> klass,
                             Class<?> inputKeyClass,
                             Class<?> inputValueClass,
                             Class<?> outputKeyClass,
                             Class<?> outputValueClass,
                             Configuration mapperConf)
                      throws IOException
```
    Adds a Mapper class to the chain mapper.
    The key and values are passed from one element of the chain to the next, by value. For the added Mapper the configuration given for it, mapperConf, have precedence over the job's Configuration. This precedence is in effect when the task is running.
    
    IMPORTANT: There is no need to specify the output key/value classes for the ChainMapper, this is done by the addMapper for the last mapper in the chain
    
    Parameters:
    
    job - The job.
    
    klass - the Mapper class to add.
    
    inputKeyClass - mapper input key class.
    
    inputValueClass - mapper input value class.
    
    outputKeyClass - mapper output key class.
    
    outputValueClass - mapper output value class.
    
    mapperConf - a configuration for the Mapper class. It is recommended to use a Configuration without default values using the Configuration(boolean loadDefaults) constructor with FALSE.
    
    Throws:
    
    IOException
  - setup
```
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
```
    Description copied from class: Mapper
    
    Called once at the beginning of the task.
    
    Overrides:
    
    setup in class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
  - run
```
public void run(org.apache.hadoop.mapreduce.Mapper.Context context)
         throws IOException,
                InterruptedException
```
    Description copied from class: Mapper
    
    Expert users can override this method for more complete control over the execution of the Mapper.
    
    Overrides:
    
    run in class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
    
    Throws:
    
    IOException
    
    InterruptedException

Class ChainMapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

Constructor Summary

Method Summary

Methods inherited from class org.apache.hadoop.mapreduce.Mapper

Methods inherited from class java.lang.Object

Constructor Detail

ChainMapper

Method Detail

addMapper

setup

run