Class InputSampler<K,V>
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.mapreduce.lib.partition.InputSampler<K,V>
- All Implemented Interfaces:
Configurable, Tool
- Direct Known Subclasses:
InputSampler
Utility for collecting samples and writing a partition file for
TotalOrderPartitioner.
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
- static class org.apache.hadoop.mapreduce.lib.partition.InputSampler.IntervalSampler<K,V>
  Sample from s splits at regular intervals.
- static class org.apache.hadoop.mapreduce.lib.partition.InputSampler.RandomSampler<K,V>
  Sample from random points in the input.
- static interface org.apache.hadoop.mapreduce.lib.partition.InputSampler.Sampler<K,V>
  Interface to sample using an InputFormat.
- static class org.apache.hadoop.mapreduce.lib.partition.InputSampler.SplitSampler<K,V>
  Samples the first n records from s splits.
Constructor Summary
Constructors
Method Summary
Modifier and Type, Method, Description
- static void main
- int run
  Driver for InputSampler from the command line.
- static <K,V> void writePartitionFile(Job job, org.apache.hadoop.mapreduce.lib.partition.InputSampler.Sampler<K, V> sampler)
  Write a partition file for the given job, using the Sampler provided.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Constructor Details
-
InputSampler
-
-
Method Details
-
writePartitionFile
public static <K,V> void writePartitionFile(Job job, org.apache.hadoop.mapreduce.lib.partition.InputSampler.Sampler<K, V> sampler) throws IOException, ClassNotFoundException, InterruptedException
Write a partition file for the given job, using the Sampler provided. Queries the sampler for a sample keyset, sorts by the output key comparator, selects the keys for each rank, and writes to the destination returned from TotalOrderPartitioner.getPartitionFile(org.apache.hadoop.conf.Configuration).
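The sort-and-select step described for writePartitionFile can be illustrated in plain Java, without Hadoop on the classpath. This is a minimal sketch under stated assumptions, not the library's implementation: the class SplitPointSketch and the method selectSplitPoints are hypothetical names, the sampled keys are assumed to fit in an in-memory List, and boundary keys are taken at evenly spaced ranks of the sorted sample. Sampling from an InputFormat and writing the partition file are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
final class SplitPointSketch {

    /**
     * Sorts the sampled keys with the given comparator and returns
     * numPartitions - 1 boundary keys taken at evenly spaced ranks.
     * Assumption: a partitioner would route keys below the first
     * boundary to partition 0, and so on. Repeated keys in the sample
     * can yield repeated boundaries; this sketch does not handle that.
     */
    public static <K> List<K> selectSplitPoints(List<K> sample,
                                                Comparator<? super K> comparator,
                                                int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        if (sample.size() < numPartitions) {
            throw new IllegalArgumentException("need at least one sample per partition");
        }
        // Copy before sorting so the caller's list is left untouched.
        List<K> sorted = new ArrayList<>(sample);
        sorted.sort(comparator);

        List<K> splitPoints = new ArrayList<>();
        int n = sorted.size();
        for (int i = 1; i < numPartitions; i++) {
            // Rank of the i-th boundary; always < n because i < numPartitions.
            int rank = (int) ((long) i * n / numPartitions);
            splitPoints.add(sorted.get(rank));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("m", "c", "x", "a", "t", "f", "p", "k");
        // Sorted sample: a c f k m p t x; ranks 2, 4, 6 are selected.
        System.out.println(
            selectSplitPoints(sample, Comparator.<String>naturalOrder(), 4));
        // prints [f, m, t]
    }
}
```

With four partitions the sketch returns three boundary keys, which matches the general shape of a partition file: one fewer split point than there are partitions.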
run
Driver for InputSampler from the command line. Configures a JobConf instance and calls writePartitionFile(org.apache.hadoop.mapreduce.Job, org.apache.hadoop.mapreduce.lib.partition.InputSampler.Sampler<K, V>).
main
- Throws:
Exception
-