Class MultithreadedMapper<K1,V1,K2,V2>

java.lang.Object
org.apache.hadoop.mapreduce.Mapper<K1,V1,K2,V2>
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper<K1,V1,K2,V2>

@Public @Stable public class MultithreadedMapper<K1,V1,K2,V2> extends Mapper<K1,V1,K2,V2>
Multithreaded implementation for @link org.apache.hadoop.mapreduce.Mapper.

It can be used instead of the default implementation, MapRunner, when the Map operation is not CPU bound in order to improve throughput.

Mapper implementations using this MapRunnable must be thread-safe.

The Map-Reduce job has to be configured with the mapper to use via setMapperClass(Job, Class) and the number of thread the thread-pool can use with the getNumberOfThreads(JobContext) method. The default value is 10 threads.

  • Field Details

    • NUM_THREADS

      public static String NUM_THREADS
    • MAP_CLASS

      public static String MAP_CLASS
  • Constructor Details

    • MultithreadedMapper

      public MultithreadedMapper()
  • Method Details

    • getNumberOfThreads

      public static int getNumberOfThreads(JobContext job)
      The number of threads in the thread pool that will run the map function.
      Parameters:
      job - the job
      Returns:
      the number of threads
    • setNumberOfThreads

      public static void setNumberOfThreads(Job job, int threads)
      Set the number of threads in the pool for running maps.
      Parameters:
      job - the job to modify
      threads - the new number of threads
    • getMapperClass

      public static <K1, V1, K2, V2> Class<Mapper<K1,V1,K2,V2>> getMapperClass(JobContext job)
      Get the application's mapper class.
      Type Parameters:
      K1 - the map's input key type
      V1 - the map's input value type
      K2 - the map's output key type
      V2 - the map's output value type
      Parameters:
      job - the job
      Returns:
      the mapper class to run
    • setMapperClass

      public static <K1, V1, K2, V2> void setMapperClass(Job job, Class<? extends Mapper<K1,V1,K2,V2>> cls)
      Set the application's mapper class.
      Type Parameters:
      K1 - the map input key type
      V1 - the map input value type
      K2 - the map output key type
      V2 - the map output value type
      Parameters:
      job - the job to modify
      cls - the class to use as the mapper
    • run

      public void run(Mapper<K1,V1,K2,V2>.org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
      Run the application's maps using a thread pool.
      Overrides:
      run in class Mapper<K1,V1,K2,V2>
      Throws:
      IOException
      InterruptedException