org.apache.hadoop.mapred.lib.aggregate
Class ValueAggregatorJob
java.lang.Object
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJob
public class ValueAggregatorJob
- extends Object
This is the main class for creating a map/reduce job using the Aggregate
framework. Aggregate is a specialization of the map/reduce framework for
performing various simple aggregations.
Generally speaking, to implement an application using the Map/Reduce model,
the developer implements Map and Reduce functions (and possibly a Combine
function). However, many applications related to counting and statistics
computation have very similar characteristics. Aggregate abstracts out the
general patterns of these functions and implements those patterns. In
particular, the package provides generic mapper/reducer/combiner classes, a
set of built-in value aggregators, and a generic utility class that helps
users create map/reduce jobs using the generic classes. The built-in
aggregators include:
- sum over numeric values
- count of the number of distinct values
- histogram of values
- minimum, maximum, median, average, and standard deviation of numeric values
A developer using Aggregate need only provide a plugin class conforming to
the following interface:
public interface ValueAggregatorDescriptor {
    public ArrayList generateKeyValPairs(Object key, Object value);
    public void configure(JobConf job);
}
The package also provides a base class, ValueAggregatorBaseDescriptor,
implementing the above interface. The user can extend the base class and
implement generateKeyValPairs accordingly.
The primary work of generateKeyValPairs is to emit one or more key/value
pairs based on the input key/value pair. The key of an output key/value pair
encodes two pieces of information: the aggregation type and the aggregation
id. The value will be aggregated onto the aggregation id according to the
aggregation type.
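As a sketch of the key-encoding convention described above, the following
self-contained snippet emits keys of the form aggregationType:aggregationId,
using the built-in "LongValueSum" aggregator type with each word as the
aggregation id. The class and method signatures here are hypothetical
simplifications: the real generateKeyValPairs operates on Hadoop Writable
types within a descriptor class, not on plain Strings.

```java
import java.util.ArrayList;

// Hypothetical sketch, not the actual Hadoop descriptor class: illustrates
// how generateKeyValPairs typically encodes its output keys.
public class WordCountDescriptorSketch {

    // Emit one ("LongValueSum:<word>", "1") pair per word in the input
    // value, so the framework sums the counts per word.
    public static ArrayList<String[]> generateKeyValPairs(Object key, Object value) {
        ArrayList<String[]> pairs = new ArrayList<>();
        for (String word : value.toString().split("\\s+")) {
            pairs.add(new String[] {"LongValueSum:" + word, "1"});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] p : generateKeyValPairs(null, "hello world hello")) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```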
This class offers a function to generate a map/reduce job using the Aggregate
framework. The function takes the following parameters:
- input directory spec
- input format (text or sequence file)
- output directory
- a file specifying the user plugin class
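A hypothetical command line for launching such a job, assuming the plugin
class is packaged in a jar. The jar name, spec file, and argument order
shown (input dirs, output dir, number of reducers, input format, descriptor
spec file) are illustrative assumptions and may differ between Hadoop
releases:

```shell
# myplugin.jar and myspec.xml are placeholders for the user's plugin jar
# and the file specifying the plugin class.
hadoop jar myplugin.jar org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJob \
    /user/me/input /user/me/output 2 textinputformat myspec.xml
```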
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ValueAggregatorJob
public ValueAggregatorJob()
createValueAggregatorJobs
public static JobControl createValueAggregatorJobs(String[] args,
Class<? extends ValueAggregatorDescriptor>[] descriptors)
throws IOException
- Throws:
IOException
createValueAggregatorJobs
public static JobControl createValueAggregatorJobs(String[] args)
throws IOException
- Throws:
IOException
createValueAggregatorJob
public static JobConf createValueAggregatorJob(String[] args)
throws IOException
- Create an Aggregate based map/reduce job.
- Parameters:
args
- the arguments used for job creation. Generic hadoop
arguments are accepted.
- Returns:
- a JobConf object ready for submission.
- Throws:
IOException
- See Also:
GenericOptionsParser
createValueAggregatorJob
public static JobConf createValueAggregatorJob(String[] args,
Class<? extends ValueAggregatorDescriptor>[] descriptors)
throws IOException
- Throws:
IOException
setAggregatorDescriptors
public static void setAggregatorDescriptors(JobConf job,
Class<? extends ValueAggregatorDescriptor>[] descriptors)
main
public static void main(String[] args)
throws IOException
- Create and run an Aggregate based map/reduce job.
- Parameters:
args
- the arguments used for job creation
- Throws:
IOException
Copyright © 2009 The Apache Software Foundation