org.apache.hadoop.mapreduce.Mapper<K1,V1,K2,V2>

org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper<K1,V1,K2,V2>

@Public @Stable public class MultithreadedMapper<K1,V1,K2,V2> extends Mapper<K1,V1,K2,V2>

Multithreaded implementation for @link org.apache.hadoop.mapreduce.Mapper.

It can be used instead of the default implementation, MapRunner, when the Map operation is not CPU bound in order to improve throughput.

Mapper implementations using this MapRunnable must be thread-safe.

The Map-Reduce job has to be configured with the mapper to use via setMapperClass(Job, Class) and the number of thread the thread-pool can use with the getNumberOfThreads(JobContext) method. The default value is 10 threads.

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context
Field Summary

Fields

Modifier and Type

Field

Description

static String

MAP_CLASS

static String

NUM_THREADS
Constructor Summary

Constructors

Constructor

Description

MultithreadedMapper()
Method Summary

Modifier and Type

Method

Description

static <K1, V1, K2, V2> Class<Mapper<K1,V1,K2,V2>>

getMapperClass(JobContext job)

Get the application's mapper class.

static int

getNumberOfThreads(JobContext job)

The number of threads in the thread pool that will run the map function.

void

run(Mapper<K1,V1,K2,V2>.org.apache.hadoop.mapreduce.Mapper.Context context)

Run the application's maps using a thread pool.

static <K1, V1, K2, V2> void

setMapperClass(Job job, Class<? extends Mapper<K1,V1,K2,V2>> cls)

Set the application's mapper class.

static void

setNumberOfThreads(Job job, int threads)

Set the number of threads in the pool for running maps.

Methods inherited from class org.apache.hadoop.mapreduce.Mapper
cleanup, map, setup

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- NUM_THREADS
  
  public static String NUM_THREADS
- MAP_CLASS
  
  public static String MAP_CLASS
Constructor Details
- MultithreadedMapper
  
  public MultithreadedMapper()
Method Details
- getNumberOfThreads
  
  public static int getNumberOfThreads(JobContext job)
  
  The number of threads in the thread pool that will run the map function.
  
  Parameters:
  
  job - the job
  
  Returns:
  
  the number of threads
- setNumberOfThreads
  
  public static void setNumberOfThreads(Job job, int threads)
  
  Set the number of threads in the pool for running maps.
  
  Parameters:
  
  job - the job to modify
  
  threads - the new number of threads
- getMapperClass
  
  public static <K1, V1, K2, V2> Class<Mapper<K1,V1,K2,V2>> getMapperClass(JobContext job)
  
  Get the application's mapper class.
  
  Type Parameters:
  
  K1 - the map's input key type
  
  V1 - the map's input value type
  
  K2 - the map's output key type
  
  V2 - the map's output value type
  
  Parameters:
  
  job - the job
  
  Returns:
  
  the mapper class to run
- setMapperClass
  
  public static <K1, V1, K2, V2> void setMapperClass(Job job, Class<? extends Mapper<K1,V1,K2,V2>> cls)
  
  Set the application's mapper class.
  
  Type Parameters:
  
  K1 - the map input key type
  
  V1 - the map input value type
  
  K2 - the map output key type
  
  V2 - the map output value type
  
  Parameters:
  
  job - the job to modify
  
  cls - the class to use as the mapper
- run
  
  public void run(Mapper<K1,V1,K2,V2>.org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
  
  Run the application's maps using a thread pool.
  
  Overrides:
  
  run in class Mapper<K1,V1,K2,V2>
  
  Throws:
  
  IOException
  
  InterruptedException

Class MultithreadedMapper<K1,V1,K2,V2>

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper

Field Summary

Constructor Summary

Method Summary

Methods inherited from class org.apache.hadoop.mapreduce.Mapper

Methods inherited from class java.lang.Object

Field Details

NUM_THREADS

MAP_CLASS

Constructor Details

MultithreadedMapper

Method Details

getNumberOfThreads

setNumberOfThreads

getMapperClass

setMapperClass

run