org.apache.hadoop.mapred.lib
Class KeyFieldBasedPartitioner<K2,V2>

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner<K2,V2>
All Implemented Interfaces:
JobConfigurable, Partitioner<K2,V2>

public class KeyFieldBasedPartitioner<K2,V2>
extends Object
implements Partitioner<K2,V2>

Defines a way to partition keys based on certain key fields (also see KeyFieldBasedComparator). The key specification supported is of the form -k pos1[,pos2], where pos is of the form f[.c][opts]; f is the number of the key field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If '.c' is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field).
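
The numbering rules above can be illustrated with a small standalone sketch. It is not Hadoop's implementation: the class KeySpecSketch and the method select are hypothetical names, the separator is passed in explicitly, the [opts] part is ignored, both pos1 and pos2 are assumed to be given, and returning an empty string for out-of-range fields is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
class KeySpecSketch {

    /**
     * Returns the part of key selected by "-k f1.c1,f2.c2".
     * Fields and characters are numbered from 1; c2 == 0 means
     * "up to the last character of field f2".
     */
    static String select(String key, char sep, int f1, int c1, int f2, int c2) {
        // Record [start, end) offsets of every field in the key.
        List<int[]> fields = new ArrayList<>();
        int fieldStart = 0;
        for (int i = 0; i <= key.length(); i++) {
            if (i == key.length() || key.charAt(i) == sep) {
                fields.add(new int[] {fieldStart, i});
                fieldStart = i + 1;
            }
        }
        // Assumption: a spec that points outside the key selects nothing.
        if (f1 < 1 || f2 < f1 || f2 > fields.size()) {
            return "";
        }
        int from = fields.get(f1 - 1)[0] + (c1 - 1);
        int[] endField = fields.get(f2 - 1);
        int to = (c2 == 0) ? endField[1] : endField[0] + c2; // exclusive
        from = Math.min(Math.max(from, 0), key.length());
        to = Math.min(to, key.length());
        return from < to ? key.substring(from, to) : "";
    }

    public static void main(String[] args) {
        // -k 2,2 : all of field 2 (pos1 '.c' defaults to 1, pos2 '.c' to 0)
        System.out.println(select("ab.cde.fg", '.', 2, 1, 2, 0)); // cde
        // -k 1.2,2.2 : from char 2 of field 1 through char 2 of field 2
        System.out.println(select("ab.cde.fg", '.', 1, 2, 2, 2)); // b.cd
    }
}
```

Note that a range spanning several fields includes the separators between them, as in the second call.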


Constructor Summary
KeyFieldBasedPartitioner()
           
 
Method Summary
 void configure(JobConf job)
          Initializes a new instance from a JobConf.
protected  int getPartition(int hash, int numReduceTasks)
           
 int getPartition(K2 key, V2 value, int numReduceTasks)
          Get the partition number for a given key (hence record) given the total number of partitions, i.e. the number of reduce tasks for the job.
protected  int hashCode(byte[] b, int start, int end, int currentHash)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KeyFieldBasedPartitioner

public KeyFieldBasedPartitioner()
Method Detail

configure

public void configure(JobConf job)
Description copied from interface: JobConfigurable
Initializes a new instance from a JobConf.

Specified by:
configure in interface JobConfigurable
Parameters:
job - the configuration

getPartition

public int getPartition(K2 key,
                        V2 value,
                        int numReduceTasks)
Description copied from interface: Partitioner
Get the partition number for a given key (hence record) given the total number of partitions, i.e. the number of reduce tasks for the job.

Typically a hash function on all or a subset of the key.

Specified by:
getPartition in interface Partitioner<K2,V2>
Parameters:
key - the key to be partitioned.
value - the entry value.
numReduceTasks - the total number of partitions.
Returns:
the partition number for the key.
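
As a rough illustration of "a hash function on all or a subset of the key", the sketch below hashes a byte range and reduces the result to a partition number. The class HashPartitionSketch and its methods hashBytes and partitionFor are hypothetical; the 31-based hash and the sign-bit mask are common choices assumed here, and this page does not confirm that the class's protected hashCode and getPartition helpers behave this way.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the documented behavior of this class.
class HashPartitionSketch {

    /** Folds bytes b[start..end] (end inclusive) into a running hash. */
    static int hashBytes(byte[] b, int start, int end, int currentHash) {
        for (int i = start; i <= end; i++) {
            currentHash = 31 * currentHash + b[i];
        }
        return currentHash;
    }

    /** Maps any int hash to a partition in [0, numReduceTasks). */
    static int partitionFor(int hash, int numReduceTasks) {
        // Clear the sign bit so the remainder is never negative.
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        byte[] key = "ab.cde.fg".getBytes(StandardCharsets.UTF_8);
        // Hash only the bytes of the second field, "cde" (offsets 3..5).
        int hash = hashBytes(key, 3, 5, 0);
        System.out.println(partitionFor(hash, 4));
    }
}
```

Because only the selected bytes feed the hash, two keys that agree on those bytes land in the same partition regardless of the rest of the key.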

hashCode

protected int hashCode(byte[] b,
                       int start,
                       int end,
                       int currentHash)

getPartition

protected int getPartition(int hash,
                           int numReduceTasks)


Copyright © 2009 The Apache Software Foundation