org.apache.hadoop.mapred
Interface Partitioner<K2,V2>

All Superinterfaces:
JobConfigurable
All Known Implementing Classes:
BinaryPartitioner, HashPartitioner, IndexUpdatePartitioner, KeyFieldBasedPartitioner, SleepJob, TotalOrderPartitioner

public interface Partitioner<K2,V2>
extends JobConfigurable

Partitions the key space.

Partitioner controls the partitioning of the keys of the intermediate map outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Hence this controls which of the m reduce tasks the intermediate key (and hence the record) is sent to for reduction.
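
The hash-based derivation described above can be sketched without any Hadoop dependency. This is a minimal illustration, not the library's source: the class name HashPartitionSketch, the helper partitionFor and the choice of four reduce tasks are hypothetical. The sign bit of the hash code is masked off so that the result always falls in the range 0 to numPartitions - 1.

```java
public class HashPartitionSketch {

    // Hypothetical helper: maps a key to a partition in [0, numPartitions).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode() can never produce a negative partition number.
    public static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // assumed: a job configured with 4 reduce tasks
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Equal keys always map to the same partition, which is what guarantees that every record for a given key reaches the same reduce task.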

See Also:
Reducer

Method Summary
 int getPartition(K2 key, V2 value, int numPartitions)
          Get the partition number for a given key (hence record) given the total number of partitions, i.e. the number of reduce tasks for the job.
 
Methods inherited from interface org.apache.hadoop.mapred.JobConfigurable
configure
 

Method Detail

getPartition

int getPartition(K2 key,
                 V2 value,
                 int numPartitions)
Get the partition number for a given key (hence record) given the total number of partitions, i.e. the number of reduce tasks for the job.

Typically a hash function on all or a subset of the key.

Parameters:
key - the key to be partitioned.
value - the entry value.
numPartitions - the total number of partitions.
Returns:
the partition number for the key.
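
As an illustration of partitioning on a subset of the key, the following sketch hashes only the part of a string key before the first ':' character. The class name PrefixPartitionerSketch, the "prefix:suffix" key layout and the sample keys are assumptions made for this example. To stay self-contained the class does not implement the Hadoop interface; a real implementation would declare implements Partitioner<String, String> (or the job's actual key and value types) and also supply the inherited configure(JobConf) method.

```java
public class PrefixPartitionerSketch {

    // Same parameter list as Partitioner.getPartition, with K2 = V2 = String
    // chosen for illustration. Only the text before the first ':' is hashed,
    // so keys that share a prefix are routed to the same partition.
    public int getPartition(String key, String value, int numPartitions) {
        int sep = key.indexOf(':');
        String prefix = (sep >= 0) ? key.substring(0, sep) : key;
        // Clear the sign bit before taking the remainder to avoid negatives.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        PrefixPartitionerSketch p = new PrefixPartitionerSketch();
        // Hypothetical keys of the form "userId:date"; both share "user42".
        System.out.println(p.getPartition("user42:2009-01-01", "click", 8));
        System.out.println(p.getPartition("user42:2009-06-30", "view", 8));
    }
}
```

Both calls in main return the same partition number, because the value argument and the part of the key after ':' are ignored.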


Copyright © 2009 The Apache Software Foundation