org.apache.hadoop.mapred.lib.aggregate
Class ValueAggregatorBaseDescriptor

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorBaseDescriptor
All Implemented Interfaces:
ValueAggregatorDescriptor
Direct Known Subclasses:
AggregateWordCount.WordCountPlugInClass, AggregateWordHistogram.AggregateWordHistogramPlugin

public class ValueAggregatorBaseDescriptor
extends Object
implements ValueAggregatorDescriptor

This class implements the functionality shared by implementations of the ValueAggregatorDescriptor interface.
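A subclass such as AggregateWordCount.WordCountPlugInClass typically inherits the helper methods and overrides generateKeyValPairs. The sketch below is a simplified, Hadoop-free illustration of that pattern, not the library's code. BaseDescriptorSketch and WordCountPlugInSketch are hypothetical names, Strings stand in for Hadoop's Text, and the constant values ("LongValueSum", ":" and "1") are assumptions for LONG_VALUE_SUM, TYPE_SEPARATOR and ONE.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for ValueAggregatorBaseDescriptor (Strings replace Text).
class BaseDescriptorSketch {
    static final String LONG_VALUE_SUM = "LongValueSum"; // assumed value
    static final String TYPE_SEPARATOR = ":";            // assumed value
    static final String ONE = "1";                       // assumed value

    // Shared helper: key is "<type><separator><id>", value is passed through.
    static Map.Entry<String, String> generateEntry(String type, String id, String val) {
        return new AbstractMap.SimpleImmutableEntry<>(type + TYPE_SEPARATOR + id, val);
    }

    // Default behaviour in this sketch: one record-count pair per input record.
    List<Map.Entry<String, String>> generateKeyValPairs(Object key, Object val) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(generateEntry(LONG_VALUE_SUM, "record_count", ONE));
        return pairs;
    }
}

// Word-count style plug-in: emits one LONG_VALUE_SUM pair per whitespace token.
class WordCountPlugInSketch extends BaseDescriptorSketch {
    @Override
    List<Map.Entry<String, String>> generateKeyValPairs(Object key, Object val) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        StringTokenizer words = new StringTokenizer(val.toString());
        while (words.hasMoreTokens()) {
            pairs.add(generateEntry(LONG_VALUE_SUM, words.nextToken(), ONE));
        }
        return pairs;
    }
}
```

Calling `new WordCountPlugInSketch().generateKeyValPairs(0L, "to be or not")` yields four pairs whose keys start with "LongValueSum:", so a reducer that sums per key produces word counts.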


Field Summary
static String DOUBLE_VALUE_SUM
 String inputFile
static String LONG_VALUE_MAX
static String LONG_VALUE_MIN
static String LONG_VALUE_SUM
static String STRING_VALUE_MAX
static String STRING_VALUE_MIN
static String UNIQ_VALUE_COUNT
static String VALUE_HISTOGRAM
 
Fields inherited from interface org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorDescriptor
ONE, TYPE_SEPARATOR
 
Constructor Summary
ValueAggregatorBaseDescriptor()
 
Method Summary
 void configure(JobConf job)
          Get the input file name.
static Map.Entry<Text,Text> generateEntry(String type, String id, Text val)
 ArrayList<Map.Entry<Text,Text>> generateKeyValPairs(Object key, Object val)
          Generate 1 or 2 aggregation-id/value pairs for the given key/value pair.
static ValueAggregator generateValueAggregator(String type)
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UNIQ_VALUE_COUNT

public static final String UNIQ_VALUE_COUNT
See Also:
Constant Field Values

LONG_VALUE_SUM

public static final String LONG_VALUE_SUM
See Also:
Constant Field Values

DOUBLE_VALUE_SUM

public static final String DOUBLE_VALUE_SUM
See Also:
Constant Field Values

VALUE_HISTOGRAM

public static final String VALUE_HISTOGRAM
See Also:
Constant Field Values

LONG_VALUE_MAX

public static final String LONG_VALUE_MAX
See Also:
Constant Field Values

LONG_VALUE_MIN

public static final String LONG_VALUE_MIN
See Also:
Constant Field Values

STRING_VALUE_MAX

public static final String STRING_VALUE_MAX
See Also:
Constant Field Values

STRING_VALUE_MIN

public static final String STRING_VALUE_MIN
See Also:
Constant Field Values

inputFile

public String inputFile
Constructor Detail

ValueAggregatorBaseDescriptor

public ValueAggregatorBaseDescriptor()
Method Detail

generateEntry

public static Map.Entry<Text,Text> generateEntry(String type,
                                                 String id,
                                                 Text val)
Parameters:
type - the aggregation type
id - the aggregation id
val - the value associated with the id to be aggregated
Returns:
an Entry whose key is the aggregation id prefixed with the aggregation type and TYPE_SEPARATOR, and whose value is val.
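The key layout returned by generateEntry can be sketched without Hadoop as follows. This is an illustration under stated assumptions: Strings replace Text, TYPE_SEPARATOR is assumed to be ":", and LONG_VALUE_SUM is assumed to be "LongValueSum" (the real values are on the Constant Field Values page). The splitKey helper is hypothetical and shows how a reduce-side component could recover the type by splitting at the first separator.

```java
import java.util.AbstractMap;
import java.util.Map;

final class EntrySketch {
    static final String TYPE_SEPARATOR = ":";            // assumed value
    static final String LONG_VALUE_SUM = "LongValueSum"; // assumed value

    // Builds an entry whose key is "<type><separator><id>" and whose value is val.
    static Map.Entry<String, String> generateEntry(String type, String id, String val) {
        return new AbstractMap.SimpleImmutableEntry<>(type + TYPE_SEPARATOR + id, val);
    }

    // Hypothetical inverse: split at the FIRST separator, so ids may contain ':'.
    static String[] splitKey(String key) {
        int pos = key.indexOf(TYPE_SEPARATOR);
        if (pos < 0) {
            throw new IllegalArgumentException("no type separator in key: " + key);
        }
        return new String[] { key.substring(0, pos), key.substring(pos + TYPE_SEPARATOR.length()) };
    }
}
```

Splitting at the first separator works because type names such as "LongValueSum" contain no separator, while ids such as file URIs may.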

generateValueAggregator

public static ValueAggregator generateValueAggregator(String type)
Parameters:
type - the aggregation type
Returns:
a value aggregator of the given type.
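generateValueAggregator acts as a factory keyed by the aggregation type string. The sketch below shows that dispatch with a hypothetical SimpleAggregator interface and three tiny implementations; these are not Hadoop's ValueAggregator classes. The type strings are assumed values of LONG_VALUE_SUM, LONG_VALUE_MAX and STRING_VALUE_MAX. Case-insensitive matching and returning null for an unknown type are choices made for this sketch only.

```java
// Hypothetical, simplified aggregator contract (not Hadoop's ValueAggregator).
interface SimpleAggregator {
    void addNextValue(String val);
    String getReport();
}

final class LongSum implements SimpleAggregator {
    private long sum = 0;
    public void addNextValue(String val) { sum += Long.parseLong(val); }
    public String getReport() { return Long.toString(sum); }
}

final class LongMax implements SimpleAggregator {
    private long max = Long.MIN_VALUE;
    public void addNextValue(String val) { max = Math.max(max, Long.parseLong(val)); }
    public String getReport() { return Long.toString(max); }
}

final class StringMax implements SimpleAggregator {
    private String max = null;
    // Keeps the lexicographically largest value seen so far.
    public void addNextValue(String val) { if (max == null || val.compareTo(max) > 0) max = val; }
    public String getReport() { return max == null ? "" : max; }
}

final class AggregatorFactorySketch {
    // Maps an (assumed) type name to a fresh aggregator; null if unrecognized.
    static SimpleAggregator generateValueAggregator(String type) {
        if (type.equalsIgnoreCase("LongValueSum")) return new LongSum();
        if (type.equalsIgnoreCase("LongValueMax")) return new LongMax();
        if (type.equalsIgnoreCase("StringValueMax")) return new StringMax();
        return null;
    }
}
```

Returning a fresh instance per call lets each aggregation id accumulate its own state.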

generateKeyValPairs

public ArrayList<Map.Entry<Text,Text>> generateKeyValPairs(Object key,
                                                           Object val)
Generate 1 or 2 aggregation-id/value pairs for the given key/value pair. The first id is of type LONG_VALUE_SUM, with "record_count" as its aggregation id. If the input is a file split, a second id of the same type is also generated, with the file name as its aggregation id. Together, these pairs count the total number of records in the input data and the number of records in each input file.

Specified by:
generateKeyValPairs in interface ValueAggregatorDescriptor
Parameters:
key - input key
val - input value
Returns:
a list of aggregation id/value pairs. An aggregation id encodes an aggregation type, which determines how the value is aggregated in the combiner/reduce phase of an Aggregate-based job.
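The record-counting behaviour described above can be sketched in plain Java. This is a simplified model, not Hadoop's implementation: Strings replace Text, the inputFile argument stands in for the inputFile field (null when the input is not a file split), and "LongValueSum", ":" and "1" are assumed values of LONG_VALUE_SUM, TYPE_SEPARATOR and ONE. The sumByKey helper is a hypothetical reduce-side step added only to show what the emitted pairs add up to.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RecordCountSketch {
    static final String LONG_VALUE_SUM = "LongValueSum"; // assumed value
    static final String TYPE_SEPARATOR = ":";            // assumed value
    static final String ONE = "1";                       // assumed value

    // key and val are ignored: every record counts once, whatever its content.
    static List<Map.Entry<String, String>> generateKeyValPairs(String inputFile, Object key, Object val) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        // Pair 1: global record count under the fixed id "record_count".
        pairs.add(new AbstractMap.SimpleImmutableEntry<>(LONG_VALUE_SUM + TYPE_SEPARATOR + "record_count", ONE));
        // Pair 2: per-file record count, only when the file name is known.
        if (inputFile != null) {
            pairs.add(new AbstractMap.SimpleImmutableEntry<>(LONG_VALUE_SUM + TYPE_SEPARATOR + inputFile, ONE));
        }
        return pairs;
    }

    // Hypothetical reduce-side step: sum the values emitted under each key.
    static Map<String, Long> sumByKey(List<Map.Entry<String, String>> pairs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            totals.merge(p.getKey(), Long.parseLong(p.getValue()), Long::sum);
        }
        return totals;
    }
}
```

For two records from a.txt and one from b.txt, the summed totals are 3 for "record_count", 2 for a.txt and 1 for b.txt.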

configure

public void configure(JobConf job)
Get the input file name from the job configuration and store it in the inputFile field.

Specified by:
configure in interface ValueAggregatorDescriptor
Parameters:
job - a job configuration object
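A minimal sketch of what configure does, under two labeled assumptions: a Map<String, String> stands in for JobConf, and the property name "map.input.file" is assumed to be where the framework publishes the current split's file name. ConfigureSketch is a hypothetical class; its inputFile field mirrors the public field documented above.

```java
import java.util.Map;

class ConfigureSketch {
    static final String INPUT_FILE_PROPERTY = "map.input.file"; // assumed property name
    String inputFile; // mirrors the public inputFile field

    // Reads the input file name once per task; null if the property is absent,
    // e.g. when the input is not a file split.
    void configure(Map<String, String> job) {
        this.inputFile = job.get(INPUT_FILE_PROPERTY);
    }
}
```

Because inputFile stays null when the property is missing, generateKeyValPairs emits only the "record_count" pair in that case.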


Copyright © 2009 The Apache Software Foundation