org.apache.hadoop.mapreduce.lib.partition
Class InputSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.mapreduce.lib.partition.InputSampler<K,V>
All Implemented Interfaces:
Configurable, Tool

@InterfaceAudience.Public
@InterfaceStability.Stable
public class InputSampler<K,V>
extends Configured
implements Tool

Utility for collecting samples and writing a partition file for TotalOrderPartitioner.
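A partition file holds sorted split points. TotalOrderPartitioner uses them to send each key to a reducer, so that reducer i only receives keys that sort before every key sent to reducer i + 1.

The sketch below is a simplified, hypothetical model of that lookup, not Hadoop code. It makes these assumptions:
- String keys with natural ordering stand in for the job's keys and comparator.
- `SplitPointRouting` and `partitionFor` are illustrative names.
- A key equal to a split point goes to the partition after it.

```java
import java.util.Arrays;

// Hypothetical model, not Hadoop code: routes keys to partitions using
// sorted split points like those stored in a partition file.
public class SplitPointRouting {

    // Returns a partition index in [0, splitPoints.length].
    // Assumption: a key equal to a split point goes to the following partition.
    static int partitionFor(String[] splitPoints, String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos < 0 ? -(pos + 1) : pos + 1;
    }

    public static void main(String[] args) {
        String[] splitPoints = {"g", "p"}; // two split points -> three partitions
        for (String key : new String[] {"apple", "grape", "melon", "zebra"}) {
            System.out.println(key + " -> " + partitionFor(splitPoints, key));
        }
    }
}
```

Running main prints apple -> 0, grape -> 1, melon -> 1 and zebra -> 2. With n - 1 split points there are n partitions.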


Nested Class Summary
static class InputSampler.IntervalSampler<K,V>
          Samples from s splits at regular intervals.
static class InputSampler.RandomSampler<K,V>
          Samples from random points in the input.
static interface InputSampler.Sampler<K,V>
          Interface to sample using an InputFormat.
static class InputSampler.SplitSampler<K,V>
          Samples the first n records from s splits.
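The three concrete samplers differ mainly in which keys they keep. The sketch below models each strategy on in-memory data. It is simplified and hypothetical:
- Each "split" is just a List of String keys, whereas the real classes read records through an InputFormat.
- Their exact selection logic differs in details.
- The names `SamplingStrategies`, `firstN`, `interval` and `random` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical, simplified models of the sampling strategies; not Hadoop code.
public class SamplingStrategies {

    // In the spirit of SplitSampler: the first samplesPerSplit keys of up to maxSplits splits.
    static List<String> firstN(List<List<String>> splits, int samplesPerSplit, int maxSplits) {
        List<String> samples = new ArrayList<>();
        int splitsToSample = Math.min(maxSplits, splits.size());
        for (int i = 0; i < splitsToSample; i++) {
            List<String> split = splits.get(i);
            samples.addAll(split.subList(0, Math.min(samplesPerSplit, split.size())));
        }
        return samples;
    }

    // In the spirit of IntervalSampler: keep a key whenever the kept fraction falls below freq,
    // which spreads the kept keys at regular intervals through each split.
    static List<String> interval(List<List<String>> splits, double freq, int maxSplits) {
        List<String> samples = new ArrayList<>();
        int splitsToSample = Math.min(maxSplits, splits.size());
        for (int i = 0; i < splitsToSample; i++) {
            long records = 0;
            long kept = 0;
            for (String key : splits.get(i)) {
                records++;
                if ((double) kept / records < freq) {
                    samples.add(key);
                    kept++;
                }
            }
        }
        return samples;
    }

    // In the spirit of RandomSampler: consider each key with probability freq and keep
    // at most numSamples keys (numSamples must be positive).
    static List<String> random(List<List<String>> splits, double freq, int numSamples, long seed) {
        Random rng = new Random(seed);
        List<String> samples = new ArrayList<>();
        for (List<String> split : splits) {
            for (String key : split) {
                if (rng.nextDouble() <= freq) {
                    if (samples.size() < numSamples) {
                        samples.add(key);
                    } else {
                        // Once full, overwrite a random earlier sample so later keys still count.
                        samples.set(rng.nextInt(numSamples), key);
                    }
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
            List.of("a", "b", "c", "d"),
            List.of("e", "f", "g", "h"),
            List.of("i", "j", "k", "l"));
        System.out.println(firstN(splits, 2, 2));    // [a, b, e, f]
        System.out.println(interval(splits, 0.5, 3)); // [a, c, e, g, i, k]
        System.out.println(random(splits, 0.5, 3, 42L));
    }
}
```

How the strategies behave on this data:
- `firstN` is cheap but biased toward the start of each split.
- With freq = 0.5, `interval` keeps every other key.
- `random` gives a seed-dependent sample of at most three keys.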
 
Constructor Summary
InputSampler(Configuration conf)
           
 
Method Summary
static void main(String[] args)
           
 int run(String[] args)
          Driver for InputSampler from the command line.
static <K,V> void writePartitionFile(Job job, InputSampler.Sampler<K,V> sampler)
          Write a partition file for the given job, using the Sampler provided.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

InputSampler

public InputSampler(Configuration conf)
Method Detail

writePartitionFile

public static <K,V> void writePartitionFile(Job job,
                                            InputSampler.Sampler<K,V> sampler)
                               throws IOException,
                                      ClassNotFoundException,
                                      InterruptedException
Write a partition file for the given job, using the Sampler provided. This method performs four steps:
1. Queries the sampler for a sample keyset.
2. Sorts the keys with the job's output key comparator.
3. Selects evenly spaced keys as split points, one for each boundary between adjacent partitions.
4. Writes those keys to the destination returned by TotalOrderPartitioner.getPartitionFile(org.apache.hadoop.conf.Configuration).

Throws:
IOException
ClassNotFoundException
InterruptedException
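The rank-selection step of writePartitionFile can be sketched in plain Java. This is a hypothetical model, not Hadoop's implementation. Its assumptions:
- Strings with natural ordering stand in for the output key comparator.
- `numPartitions` stands in for the number of reduce tasks.
- The split points are returned as a list instead of being written to the partition file.
- `SplitPointSelection` and `selectSplitPoints` are illustrative names.

The sketch skips duplicate keys so the split points stay strictly increasing. As a result, it can return fewer than numPartitions - 1 points when the sample has few distinct keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of split-point selection; not Hadoop code.
public class SplitPointSelection {

    // Sorts the sampled keys and picks up to numPartitions - 1 keys at evenly spaced ranks.
    static List<String> selectSplitPoints(String[] samples, int numPartitions) {
        String[] sorted = samples.clone();
        Arrays.sort(sorted);
        List<String> splitPoints = new ArrayList<>();
        float stepSize = sorted.length / (float) numPartitions;
        int last = -1; // index of the previously chosen split point
        for (int i = 1; i < numPartitions; i++) {
            // Target rank for boundary i, never at or before the previous choice.
            int k = Math.max(Math.round(stepSize * i), last + 1);
            // Skip keys equal to the previous split point so points stay strictly increasing.
            while (k < sorted.length && last >= 0 && sorted[k].equals(sorted[last])) {
                k++;
            }
            if (k >= sorted.length) {
                break; // ran out of distinct keys
            }
            splitPoints.add(sorted[k]);
            last = k;
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        String[] samples = {"m", "c", "x", "a", "q", "f", "t", "j"};
        System.out.println(selectSplitPoints(samples, 4)); // [f, m, t]
    }
}
```

With eight sorted samples (a c f j m q t x) and four partitions, the step size is 2. The split points land at ranks 2, 4 and 6, so each partition receives two of the sampled keys.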

run

public int run(String[] args)
        throws Exception
Driver for InputSampler from the command line. Configures a Job instance and calls writePartitionFile(org.apache.hadoop.mapreduce.Job, org.apache.hadoop.mapreduce.lib.partition.InputSampler.Sampler).

Specified by:
run in interface Tool
Parameters:
args - command-specific arguments.
Returns:
exit code.
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2009 The Apache Software Foundation