Class ValueAggregatorJob
java.lang.Object
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJob
This is the main class for creating a map/reduce job using the Aggregate
framework. Aggregate is a specialization of the map/reduce framework for
performing various simple aggregations.
Generally speaking, to implement an application using the Map/Reduce model,
the developer implements Map and Reduce functions (and possibly a combine
function). However, many applications related to counting and statistics
computation have very similar characteristics. Aggregate abstracts out the
general patterns of these functions and implements those patterns.
In particular, the package provides generic mapper/reducer/combiner classes,
a set of built-in value aggregators, and a generic utility class that helps
users create map/reduce jobs using the generic classes. The built-in
aggregators include:
- sum over numeric values
- count the number of distinct values
- compute the histogram of values
- compute the minimum, maximum, median, average, and standard deviation of
  numeric values
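As a plain-Java illustration of what each built-in aggregate computes, the sketch below works on in-memory lists. It is not the framework's implementation: the class and method names are hypothetical, and using the population (rather than sample) standard deviation is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the built-in aggregates.
// Numeric helpers assume a non-empty list.
class BuiltInAggregatesSketch {

    // Sum over numeric values.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    // Number of distinct values.
    static int distinctCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    // Histogram: value -> number of occurrences, sorted by value.
    static Map<String, Integer> histogram(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) counts.merge(v, 1, Integer::sum);
        return counts;
    }

    static long min(List<Long> values) { return Collections.min(values); }

    static long max(List<Long> values) { return Collections.max(values); }

    // Median: middle element, or the mean of the two middle elements for even sizes.
    static double median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return (n % 2 == 1) ? sorted.get(n / 2)
                            : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    static double average(List<Long> values) {
        return (double) sum(values) / values.size();
    }

    // Population standard deviation (an assumption for this sketch).
    static double stdDev(List<Long> values) {
        double mean = average(values);
        double squares = 0;
        for (long v : values) squares += (v - mean) * (v - mean);
        return Math.sqrt(squares / values.size());
    }
}
```

In the framework these statistics are computed per aggregation id across the whole data set; the sketch only shows the arithmetic for a single group of values.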
The developer using Aggregate only needs to provide a plugin class
conforming to the following interface:

    public interface ValueAggregatorDescriptor {
        public ArrayList<Entry> generateKeyValPairs(Object key, Object value);
        public void configure(JobConf job);
    }
The package also provides a base class, ValueAggregatorBaseDescriptor,
implementing the above interface. The user can extend the base class and
implement generateKeyValPairs accordingly.
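A minimal sketch of such a plugin, assuming a Hadoop-free stand-in for the interface: `SimpleDescriptor` and `WordCountDescriptor` are hypothetical names, plain `String`s and a `Map` replace Hadoop's types, and the `LongValueSum` type name and `:` separator used to build keys are assumptions modeled on the type/id key encoding this page describes. It emits one pair per token so that a summing aggregator would count occurrences of each word.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free stand-in for ValueAggregatorDescriptor.
// The real interface uses Hadoop types (e.g. JobConf); this one uses plain Java types.
interface SimpleDescriptor {
    List<Map.Entry<String, String>> generateKeyValPairs(Object key, Object value);
    void configure(Map<String, String> conf);
}

// Word-count style descriptor: for every token in the input line it emits
// ("LongValueSum:<token>", "1"), i.e. "sum the 1s for this token".
// The type name and ':' separator are assumptions for illustration.
class WordCountDescriptor implements SimpleDescriptor {
    static final String TYPE_SEPARATOR = ":";
    static final String LONG_VALUE_SUM = "LongValueSum";

    @Override
    public List<Map.Entry<String, String>> generateKeyValPairs(Object key, Object value) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) continue; // leading whitespace yields an empty first token
            out.add(new AbstractMap.SimpleEntry<>(LONG_VALUE_SUM + TYPE_SEPARATOR + token, "1"));
        }
        return out;
    }

    @Override
    public void configure(Map<String, String> conf) {
        // Nothing to configure in this sketch.
    }
}
```

The input key is ignored here, as it often is for line-oriented text input; a descriptor is free to use it when it carries information.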
The primary work of generateKeyValPairs is to emit one or more key/value
pairs based on the input key/value pair. The key in an output key/value pair
encodes two pieces of information: the aggregation type and the aggregation
id. The value is aggregated onto the aggregation id according to the
aggregation type.
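The reduce side of this scheme can be sketched without Hadoop: group the emitted pairs by key, split each key into its aggregation type and aggregation id, and fold the grouped values according to the type. The type names (`LongValueSum`, `LongValueMax`, `UniqValueCount`) and the `:` separator are assumptions for illustration, and `AggregateByTypeSketch` is a hypothetical class, not part of the package.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the reduce side: group pairs by "type:id" key,
// decode the key, then fold the values according to the aggregation type.
class AggregateByTypeSketch {

    // Returns aggregation id -> aggregated result.
    static Map<String, String> aggregate(List<Map.Entry<String, String>> pairs) {
        // Group values by the full key (what the shuffle phase would do).
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> g : grouped.entrySet()) {
            int sep = g.getKey().indexOf(':'); // assumed separator
            if (sep < 0) throw new IllegalArgumentException("malformed key: " + g.getKey());
            String type = g.getKey().substring(0, sep);
            String id = g.getKey().substring(sep + 1);
            result.put(id, apply(type, g.getValue()));
        }
        return result;
    }

    // Fold one group's values according to its aggregation type.
    static String apply(String type, List<String> values) {
        switch (type) {
            case "LongValueSum": {
                long sum = 0;
                for (String v : values) sum += Long.parseLong(v);
                return Long.toString(sum);
            }
            case "LongValueMax": {
                long max = Long.MIN_VALUE;
                for (String v : values) max = Math.max(max, Long.parseLong(v));
                return Long.toString(max);
            }
            case "UniqValueCount":
                return Integer.toString(new HashSet<>(values).size());
            default:
                throw new IllegalArgumentException("unknown aggregation type: " + type);
        }
    }
}
```

Carrying the type inside the key is what lets a single generic reducer serve every job: it needs no per-application code, only a dispatch on the type prefix.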
This class offers a function to generate a map/reduce job using the
Aggregate framework. The function takes the following parameters:
- input directory spec
- input format (text or sequence file)
- output directory
- a file specifying the user plugin class
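To make the parameter list concrete, here is a sketch of collecting and validating those four parameters from a command line. Everything in it is an assumption: `AggregateJobArgs` is a hypothetical class, the positional layout `inputDirs outputDir [textinputformat|seq [specFile]]`, the comma-separated directory spec, and the text-input default are choices made for this example, not the real class's documented argument handling.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the job-creation parameters listed above.
// ASSUMED layout: inputDirs outputDir [textinputformat|seq [specFile]]
final class AggregateJobArgs {
    final List<String> inputDirs;  // assumed: comma-separated directory spec
    final String outputDir;
    final boolean textInput;       // true = text files, false = sequence files
    final String specFile;         // file naming the user plugin class, or null

    private AggregateJobArgs(List<String> inputDirs, String outputDir,
                             boolean textInput, String specFile) {
        this.inputDirs = inputDirs;
        this.outputDir = outputDir;
        this.textInput = textInput;
        this.specFile = specFile;
    }

    static AggregateJobArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "usage: inputDirs outputDir [textinputformat|seq [specFile]]");
        }
        List<String> inputs = Arrays.asList(args[0].split(","));
        boolean text = true; // assumed default when no format is given
        if (args.length > 2) {
            if (args[2].equalsIgnoreCase("textinputformat")) {
                text = true;
            } else if (args[2].equalsIgnoreCase("seq")) {
                text = false;
            } else {
                throw new IllegalArgumentException("unknown input format: " + args[2]);
            }
        }
        String spec = args.length > 3 ? args[3] : null;
        return new AggregateJobArgs(inputs, args[1], text, spec);
    }
}
```

Rejecting an unrecognized format string, rather than silently falling back to one format, keeps a typo from producing a job that reads its input with the wrong format.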
Constructor Summary
ValueAggregatorJob()
Method Summary
static JobConf createValueAggregatorJob(String[] args)
    Create an Aggregate-based map/reduce job.
static JobConf createValueAggregatorJob(String[] args, Class<?> caller)
    Create an Aggregate-based map/reduce job.
static JobConf createValueAggregatorJob(String[] args, Class<? extends ValueAggregatorDescriptor>[] descriptors)
static JobConf createValueAggregatorJob(String[] args, Class<? extends ValueAggregatorDescriptor>[] descriptors, Class<?> caller)
static JobControl createValueAggregatorJobs(String[] args)
static JobControl createValueAggregatorJobs(String[] args, Class<? extends ValueAggregatorDescriptor>[] descriptors)
static void main(String[] args)
    Create and run an Aggregate-based map/reduce job.
static void setAggregatorDescriptors(JobConf job, Class<? extends ValueAggregatorDescriptor>[] descriptors)
Constructor Details
ValueAggregatorJob
public ValueAggregatorJob()
Method Details
createValueAggregatorJobs
public static JobControl createValueAggregatorJobs(String[] args, Class<? extends ValueAggregatorDescriptor>[] descriptors) throws IOException
Throws:
    IOException
createValueAggregatorJobs
public static JobControl createValueAggregatorJobs(String[] args) throws IOException
Throws:
    IOException
createValueAggregatorJob
public static JobConf createValueAggregatorJob(String[] args, Class<?> caller) throws IOException
Create an Aggregate-based map/reduce job.
Parameters:
    args - the arguments used for job creation. Generic Hadoop arguments are accepted.
    caller - the caller class.
Returns:
    a JobConf object ready for submission.
Throws:
    IOException
See Also:
    GenericOptionsParser
createValueAggregatorJob
public static JobConf createValueAggregatorJob(String[] args) throws IOException
Create an Aggregate-based map/reduce job.
Parameters:
    args - the arguments used for job creation. Generic Hadoop arguments are accepted.
Returns:
    a JobConf object ready for submission.
Throws:
    IOException
See Also:
    GenericOptionsParser
createValueAggregatorJob
public static JobConf createValueAggregatorJob(String[] args, Class<? extends ValueAggregatorDescriptor>[] descriptors) throws IOException
Throws:
    IOException
setAggregatorDescriptors
public static void setAggregatorDescriptors(JobConf job, Class<? extends ValueAggregatorDescriptor>[] descriptors)
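No description is given for setAggregatorDescriptors; judging by its name and signature, it associates descriptor classes with a job configuration. One way such a registration could work is sketched below, with a `Map<String, String>` standing in for JobConf. The class `DescriptorRegistrySketch` and the property names `example.descriptor.num` and `example.descriptor.<i>` are hypothetical and are not the keys the real method writes.

```java
import java.util.Map;

// Hypothetical sketch: record descriptor classes in a string-valued configuration.
// A Map stands in for JobConf; the property names below are invented for illustration.
final class DescriptorRegistrySketch {
    static final String NUM_KEY = "example.descriptor.num";
    static final String PREFIX = "example.descriptor.";

    static void setDescriptors(Map<String, String> conf, Class<?>[] descriptors) {
        conf.put(NUM_KEY, Integer.toString(descriptors.length));
        for (int i = 0; i < descriptors.length; i++) {
            // Store the fully qualified name so the class can be loaded reflectively later.
            conf.put(PREFIX + i, descriptors[i].getName());
        }
    }

    static String[] getDescriptorNames(Map<String, String> conf) {
        int n = Integer.parseInt(conf.getOrDefault(NUM_KEY, "0"));
        String[] names = new String[n];
        for (int i = 0; i < n; i++) names[i] = conf.get(PREFIX + i);
        return names;
    }
}
```

Storing class names rather than class objects fits a configuration that must travel to remote tasks as plain strings; each task can then instantiate the descriptors by name.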
createValueAggregatorJob
public static JobConf createValueAggregatorJob(String[] args, Class<? extends ValueAggregatorDescriptor>[] descriptors, Class<?> caller) throws IOException
Throws:
    IOException
main
public static void main(String[] args) throws IOException
Create and run an Aggregate-based map/reduce job.
Parameters:
    args - the arguments used for job creation.
Throws:
    IOException