org.apache.hadoop.mapreduce.lib.partition
Class InputSampler.SplitSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.lib.partition.InputSampler.SplitSampler<K,V>
All Implemented Interfaces:
InputSampler.Sampler<K,V>
Enclosing class:
InputSampler<K,V>

public static class InputSampler.SplitSampler<K,V>
extends Object
implements InputSampler.Sampler<K,V>

Samples the first n records from s splits. This is an inexpensive way to sample data whose records are already in random order.


Constructor Summary
InputSampler.SplitSampler(int numSamples)
          Create a SplitSampler sampling all splits.
InputSampler.SplitSampler(int numSamples, int maxSplitsSampled)
          Create a new SplitSampler that examines at most maxSplitsSampled splits.
 
Method Summary
 K[] getSample(InputFormat<K,V> inf, Job job)
          From each split sampled, take the first numSamples / numSplits records.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InputSampler.SplitSampler

public InputSampler.SplitSampler(int numSamples)
Create a SplitSampler sampling all splits. Takes the first numSamples / numSplits records from each split.

Parameters:
numSamples - Total number of samples to obtain from all selected splits.

InputSampler.SplitSampler

public InputSampler.SplitSampler(int numSamples,
                                 int maxSplitsSampled)
Create a new SplitSampler that examines at most maxSplitsSampled splits.

Parameters:
numSamples - Total number of samples to obtain from all selected splits.
maxSplitsSampled - The maximum number of splits to examine.
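
The relationship between the two parameters can be illustrated with a small standalone sketch. It models only the arithmetic implied by the descriptions above and is not the Hadoop implementation. The class and method names (SplitQuotaSketch, splitsToSample, samplesPerSplit) are hypothetical. Capping the split count with a minimum and using int division (which drops any remainder) are assumptions inferred from the parameter types and descriptions.

```java
// Standalone sketch of the per-split quota arithmetic; not Hadoop's own code.
final class SplitQuotaSketch {

    // Assumption: no more splits are examined than exist, and no more than
    // maxSplitsSampled.
    static int splitsToSample(int numSplits, int maxSplitsSampled) {
        return Math.min(numSplits, maxSplitsSampled);
    }

    // Assumption: both operands are ints, so the division drops any remainder.
    static int samplesPerSplit(int numSamples, int splitsToSample) {
        if (splitsToSample <= 0) {
            throw new IllegalArgumentException("splitsToSample must be positive");
        }
        return numSamples / splitsToSample;
    }

    public static void main(String[] args) {
        int numSamples = 1000;      // total samples requested
        int numSplits = 64;         // splits reported by the input format
        int maxSplitsSampled = 10;  // upper bound on splits to examine

        int splits = splitsToSample(numSplits, maxSplitsSampled);
        int perSplit = samplesPerSplit(numSamples, splits);
        System.out.println(splits + " splits, " + perSplit + " records each");
        // prints "10 splits, 100 records each"
    }
}
```

Under these assumptions, a request for 10 samples over 3 splits yields 3 records per split, so the total collected (9) can fall short of numSamples.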

Method Detail

getSample

public K[] getSample(InputFormat<K,V> inf,
                     Job job)
              throws IOException,
                     InterruptedException
From each split sampled, take the first numSamples / numSplits records.

Specified by:
getSample in interface InputSampler.Sampler<K,V>
Throws:
IOException
InterruptedException
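
The sampling rule described for getSample can be modeled on in-memory data. The sketch below is a simplified illustration and not the Hadoop implementation. Each inner list stands in for the keys of one input split, and the class name FirstRecordsSketch and its sample method are hypothetical. Choosing the first splits in list order and taking fewer records from a short split are assumptions, since the text above does not say which splits are selected or how short splits are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone model of "take the first numSamples / numSplits records from
// each sampled split". It uses plain lists instead of Hadoop input splits.
final class FirstRecordsSketch {

    static <K> List<K> sample(List<List<K>> splits, int numSamples, int maxSplitsSampled) {
        List<K> samples = new ArrayList<>();
        // Assumption: examine at most maxSplitsSampled splits, in list order.
        int splitsToSample = Math.min(maxSplitsSampled, splits.size());
        if (splitsToSample <= 0) {
            return samples;
        }
        // Integer division: any remainder is dropped.
        int perSplit = numSamples / splitsToSample;
        for (int i = 0; i < splitsToSample; i++) {
            List<K> split = splits.get(i);
            // Assumption: a split with fewer records contributes what it has.
            int take = Math.min(perSplit, split.size());
            samples.addAll(split.subList(0, take));
        }
        return samples;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a1", "a2", "a3", "a4"),
                Arrays.asList("b1", "b2", "b3", "b4"),
                Arrays.asList("c1", "c2", "c3", "c4"));
        // 4 samples over at most 2 splits: 2 records from each of 2 splits.
        System.out.println(sample(splits, 4, 2)); // prints [a1, a2, b1, b2]
    }
}
```

Because only the leading records of each split are read, the sample is representative only when keys are not ordered within the input.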


Copyright © 2009 The Apache Software Foundation