org.apache.hadoop.mapred.lib
Class TotalOrderPartitioner<K extends WritableComparable,V>

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.TotalOrderPartitioner<K,V>
All Implemented Interfaces:
JobConfigurable, Partitioner<K,V>

public class TotalOrderPartitioner<K extends WritableComparable,V>
extends Object
implements Partitioner<K,V>

Partitioner effecting a total order by reading split points from an externally generated source.


Field Summary
static String DEFAULT_PATH
           
 
Constructor Summary
TotalOrderPartitioner()
           
 
Method Summary
 void configure(JobConf job)
          Read in the partition file and build indexing data structures.
 int getPartition(K key, V value, int numPartitions)
          Get the partition number for a given key (hence record) given the total number of partitions, i.e. the number of reduce tasks for the job.
static String getPartitionFile(JobConf job)
          Get the path to the SequenceFile storing the sorted partition keyset.
static void setPartitionFile(JobConf job, Path p)
          Set the path to the SequenceFile storing the sorted partition keyset.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_PATH

public static final String DEFAULT_PATH
See Also:
Constant Field Values
Constructor Detail

TotalOrderPartitioner

public TotalOrderPartitioner()
Method Detail

configure

public void configure(JobConf job)
Read in the partition file and build indexing data structures. If the key type is BinaryComparable and total.order.partitioner.natural.order is not false, a trie of the first total.order.partitioner.max.trie.depth (default 2) + 1 bytes will be built. Otherwise, keys will be located using a binary search of the partition keyset using the RawComparator defined for this job. The input file must be sorted with the same comparator and contain JobConf.getNumReduceTasks() - 1 keys.

Specified by:
configure in interface JobConfigurable
Parameters:
job - the configuration
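
The binary-search lookup described above can be illustrated without Hadoop. The following is a minimal sketch, not this class's actual implementation: SplitPointLookup and partitionFor are hypothetical names, and plain objects with a java.util.Comparator stand in for the job's keys and RawComparator. It assumes that a key equal to a split point belongs to the higher of the two adjacent partitions.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class SplitPointLookup {

    // Returns a partition index in [0, splitPoints.length].
    // splitPoints must already be sorted with the same comparator,
    // so R - 1 split points yield R partitions.
    static <K> int partitionFor(K key, K[] splitPoints, Comparator<? super K> cmp) {
        // binarySearch returns the index on an exact match,
        // or -(insertionPoint) - 1 when the key is absent.
        int pos = Arrays.binarySearch(splitPoints, key, cmp) + 1;
        // Exact match at index i      -> partition i + 1 (assumption stated above).
        // Absent, insertion point ip  -> pos == -ip, so partition ip.
        return pos < 0 ? -pos : pos;
    }
}
```

With the three split points "g", "n", "t", keys below "g" fall into partition 0 and keys at or above "t" fall into partition 3, so every key in partition i sorts before every key in partition i + 1.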

getPartition

public int getPartition(K key,
                        V value,
                        int numPartitions)
Description copied from interface: Partitioner
Get the partition number for a given key (hence record) given the total number of partitions, i.e. the number of reduce tasks for the job.

Typically a hash function on all or a subset of the key.

Specified by:
getPartition in interface Partitioner<K extends WritableComparable,V>
Parameters:
key - the key to be partitioned.
value - the entry value.
numPartitions - the total number of partitions.
Returns:
the partition number for the key.

setPartitionFile

public static void setPartitionFile(JobConf job,
                                    Path p)
Set the path to the SequenceFile storing the sorted partition keyset. It must be the case that for R reduces, there are R-1 keys in the SequenceFile.
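
The R-1 requirement can be made concrete with a small example of how such a keyset might be derived. This is only a sketch under stated assumptions: SplitPointChooser and chooseSplitPoints are hypothetical names, the input is assumed to be an already sorted in-memory sample of keys, and writing the result to a SequenceFile is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class SplitPointChooser {

    // Picks numReduces - 1 evenly spaced keys from a sorted sample.
    // A sample with many duplicate keys may yield duplicate split points;
    // this sketch does not try to avoid that.
    static <K> List<K> chooseSplitPoints(List<K> sortedSample, int numReduces) {
        if (numReduces < 1 || sortedSample.size() < numReduces) {
            throw new IllegalArgumentException("need at least one sample per reduce");
        }
        float step = sortedSample.size() / (float) numReduces;
        List<K> splits = new ArrayList<>(numReduces - 1);
        for (int i = 1; i < numReduces; i++) {
            // The i-th split point sits roughly i/R of the way through the sample.
            splits.add(sortedSample.get(Math.round(step * i)));
        }
        return splits;
    }
}
```

For a sample holding the integers 0 through 99 and R = 4, this selects the keys 25, 50 and 75; a single reduce yields an empty keyset.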


getPartitionFile

public static String getPartitionFile(JobConf job)
Get the path to the SequenceFile storing the sorted partition keyset.

See Also:
setPartitionFile(JobConf,Path)


Copyright © 2009 The Apache Software Foundation