Class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
- Direct Known Subclasses:
ChainReducer, FieldSelectionReducer, IntSumReducer, LongSumReducer, ValueAggregatorCombiner, ValueAggregatorReducer, WrappedReducer
Reducer implementations can access the Configuration for the job via the JobContext.getConfiguration() method.

Reducer has 3 primary phases:
- Shuffle
The Reducer copies the sorted output from each Mapper using HTTP across the network.
- Sort
The framework merge sorts Reducer inputs by keys (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
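The merge described above can be illustrated with a small k-way merge of already-sorted runs in plain Java. MergeSketch is a hypothetical illustration of the technique only, not Hadoop code; Hadoop's actual merge operates on serialized, possibly spilled key/value segments.

```java
import java.util.*;

public class MergeSketch {
    // k-way merge of already-sorted runs, as the sort phase merges sorted map outputs.
    static List<Integer> merge(List<List<Integer>> runs) {
        // Priority queue of {value, runIndex, positionInRun}, ordered by value.
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) pq.add(new int[]{runs.get(r).get(0), r, 0});
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            out.add(top[0]);
            int next = top[2] + 1;                    // advance within the source run
            if (next < runs.get(top[1]).size()) {
                pq.add(new int[]{runs.get(top[1]).get(next), top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(List.of(1, 4, 7), List.of(2, 3, 9), List.of(5))));
        // [1, 2, 3, 4, 5, 7, 9]
    }
}
```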
SecondarySort
To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce. The grouping comparator is specified via Job.setGroupingComparatorClass(Class). The sort order is controlled by Job.setSortComparatorClass(Class).

For example, say that you want to find duplicate web pages and tag them all with the url of the "best" known example. You would set up the job like:
- Map Input Key: url
- Map Input Value: document
- Map Output Key: document checksum, url pagerank
- Map Output Value: url
- Partitioner: by checksum
- OutputKeyComparator: by checksum and then decreasing pagerank
- OutputValueGroupingComparator: by checksum
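The interplay between the two comparators above can be demonstrated outside Hadoop in plain Java. The PageKey record and the comparator names below are hypothetical stand-ins, but the ordering logic mirrors the job configuration: sort by checksum then decreasing pagerank, group by checksum only, so the "best" url is the first key seen in each group.

```java
import java.util.*;

public class SecondarySortSketch {
    // Hypothetical map output key: document checksum plus url pagerank.
    record PageKey(String checksum, int pagerank) {}

    // Sort comparator (OutputKeyComparator analogue): checksum, then decreasing
    // pagerank, so the best url arrives first within each checksum group.
    static final Comparator<PageKey> SORT =
        Comparator.comparing(PageKey::checksum)
                  .thenComparing(Comparator.comparingInt(PageKey::pagerank).reversed());

    // Grouping analogue (OutputValueGroupingComparator): checksum only, so all
    // urls for one document would reach the same reduce() call.
    static Map<String, Integer> bestPerGroup(List<PageKey> keys) {
        keys.sort(SORT);
        Map<String, Integer> best = new LinkedHashMap<>();
        for (PageKey k : keys) {
            // First key of each checksum group wins, as in the secondary-sort pattern.
            best.putIfAbsent(k.checksum(), k.pagerank());
        }
        return best;
    }

    public static void main(String[] args) {
        List<PageKey> keys = new ArrayList<>(List.of(
            new PageKey("abc", 3), new PageKey("abc", 9), new PageKey("xyz", 5)));
        System.out.println(bestPerGroup(keys)); // {abc=9, xyz=5}
    }
}
```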
- Reduce
In this phase the reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context) method is called for each <key, (collection of values)> in the sorted inputs.

The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object).
The output of the Reducer is not re-sorted.
Example:
public class IntSumReducer<Key> extends Reducer<Key, IntWritable,
                                                Key, IntWritable> {
  private IntWritable result = new IntWritable();

  public void reduce(Key key, Iterable<IntWritable> values,
                     Context context) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
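The summing logic of the example can be exercised outside Hadoop by substituting int for IntWritable. IntSumSketch below is a hypothetical stand-in, not part of the library; reduceAll mimics the framework calling reduce once per key over sorted, grouped input.

```java
import java.util.*;

public class IntSumSketch {
    // Plain-Java analogue of IntSumReducer.reduce: sum the values for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Apply the reducer once per key, as the framework does over grouped input.
    static Map<String, Integer> reduceAll(SortedMap<String, List<Integer>> grouped) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, List<Integer>> in = new TreeMap<>();
        in.put("cat", List.of(1, 1, 1));
        in.put("dog", List.of(2, 3));
        System.out.println(reduceAll(in)); // {cat=3, dog=5}
    }
}
```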
Nested Class Summary
- class org.apache.hadoop.mapreduce.Reducer.Context: The Context passed on to the Reducer implementations.
Constructor Summary
- Reducer()
Method Summary
- protected void cleanup(Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context): Called once at the end of the task.
- protected void reduce(KEYIN key, Iterable<VALUEIN> values, Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context): This method is called once for each key.
- void run(Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context): Advanced application writers can use the run(org.apache.hadoop.mapreduce.Reducer.Context) method to control how the reduce task works.
- protected void setup(Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context): Called once at the start of the task.
Constructor Details
- Reducer
public Reducer()
Method Details
- setup
protected void setup(Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context) throws IOException, InterruptedException
Called once at the start of the task.
- Throws:
IOException, InterruptedException
- reduce
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context) throws IOException, InterruptedException
This method is called once for each key. Most applications will define their reduce class by overriding this method. The default implementation is an identity function.
- Throws:
IOException, InterruptedException
- cleanup
protected void cleanup(Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context) throws IOException, InterruptedException
Called once at the end of the task.
- Throws:
IOException, InterruptedException
- run
public void run(Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context) throws IOException, InterruptedException
Advanced application writers can use the run(org.apache.hadoop.mapreduce.Reducer.Context) method to control how the reduce task works.
- Throws:
IOException, InterruptedException
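The lifecycle that run controls — setup once, reduce per key, cleanup once — can be sketched in plain Java with a minimal stand-in for the context. The MiniContext interface and RunSketch class below are illustrative inventions, not Hadoop API.

```java
import java.util.*;

public class RunSketch {
    // Minimal stand-in for Reducer.Context: iterates grouped <key, values> pairs.
    // This interface is an illustrative invention, not the Hadoop API.
    interface MiniContext<K, V> {
        boolean nextKey();
        K getCurrentKey();
        Iterable<V> getValues();
    }

    // Mirrors the shape of Reducer.run(): setup once, reduce per key, cleanup once.
    static <K, V> List<String> run(MiniContext<K, V> ctx) {
        List<String> trace = new ArrayList<>();
        trace.add("setup");
        try {
            while (ctx.nextKey()) {
                trace.add("reduce(" + ctx.getCurrentKey() + ")");
            }
        } finally {
            trace.add("cleanup");  // runs even if a reduce call throws
        }
        return trace;
    }

    // Simple in-memory context over a sorted map of grouped inputs.
    static <K, V> MiniContext<K, V> contextOver(SortedMap<K, List<V>> grouped) {
        Iterator<Map.Entry<K, List<V>>> it = grouped.entrySet().iterator();
        return new MiniContext<>() {
            Map.Entry<K, List<V>> current;
            public boolean nextKey() {
                if (!it.hasNext()) return false;
                current = it.next();
                return true;
            }
            public K getCurrentKey() { return current.getKey(); }
            public Iterable<V> getValues() { return current.getValue(); }
        };
    }

    public static void main(String[] args) {
        SortedMap<String, List<Integer>> in = new TreeMap<>();
        in.put("a", List.of(1, 2));
        in.put("b", List.of(3));
        System.out.println(run(contextOver(in))); // [setup, reduce(a), reduce(b), cleanup]
    }
}
```

Overriding run in a real Reducer replaces exactly this loop, which is why it is reserved for advanced use: the override takes responsibility for invoking setup, reduce, and cleanup itself.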