Class MultithreadedMapper<K1,V1,K2,V2>
java.lang.Object
org.apache.hadoop.mapreduce.Mapper<K1,V1,K2,V2>
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper<K1,V1,K2,V2>
Multithreaded implementation for org.apache.hadoop.mapreduce.Mapper.
It can be used instead of the default implementation,
MapRunner, to improve throughput when the Map operation
is not CPU bound.
Mapper implementations using this MapRunnable must be thread-safe.
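The thread-safety requirement can be illustrated without Hadoop. The following is a minimal sketch using only the JDK; the class SharedCounterSketch and its map method are hypothetical stand-ins for a mapper body, not part of the Hadoop API. It shows state shared across threads being updated through an AtomicLong so that no increments are lost.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class SharedCounterSketch {

  // State visible to every worker thread. AtomicLong makes each increment
  // atomic; a plain "long" field updated with ++ could lose updates.
  private static final AtomicLong RECORDS_SEEN = new AtomicLong();

  // Hypothetical stand-in for the body of a map() method.
  static String map(String value) {
    RECORDS_SEEN.incrementAndGet();
    return value.trim();
  }

  // Calls map() from numThreads threads and returns the final counter value.
  public static long countWithThreads(int numThreads, int callsPerThread) {
    RECORDS_SEEN.set(0);
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (int t = 0; t < numThreads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < callsPerThread; i++) {
          map(" record ");
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return RECORDS_SEEN.get();
  }
}
```

The same reasoning applies to anything a mapper shares between threads, such as static fields, caches, or client connections: either avoid sharing them or guard them with atomic or synchronized access.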
The Map-Reduce job has to be configured with the mapper to use via
setMapperClass(Job, Class) and
the number of threads the thread pool can use via
setNumberOfThreads(Job, int). The configured value can be read back with
the getNumberOfThreads(JobContext) method. The default
value is 10 threads.
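A job configuration along these lines might look like the sketch below. The class MultithreadedJobSetup, the mapper LookupMapper, the job name, and the thread count of 8 are illustrative assumptions; the calls on Job and MultithreadedMapper are the ones documented on this page and in the Job API. Input and output paths, the reducer, and job submission are omitted.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedJobSetup {

  // Hypothetical mapper standing in for work that is not CPU bound. It keeps
  // no shared mutable state, so concurrent map() calls do not interfere.
  public static class LookupMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      context.write(line, new Text("looked-up"));
    }
  }

  public static Job configure(Configuration conf) throws IOException {
    Job job = Job.getInstance(conf, "multithreaded-example");
    job.setJarByClass(MultithreadedJobSetup.class);
    // The framework runs MultithreadedMapper ...
    job.setMapperClass(MultithreadedMapper.class);
    // ... which in turn runs LookupMapper on its thread pool.
    MultithreadedMapper.setMapperClass(job, LookupMapper.class);
    // Override the default of 10 threads (8 is an arbitrary example value).
    MultithreadedMapper.setNumberOfThreads(job, 8);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    return job;
  }
}
```

Note the two distinct calls: job.setMapperClass registers MultithreadedMapper with the framework, while the static MultithreadedMapper.setMapperClass names the application mapper that the pool threads execute.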
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context -
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
static <K1,V1,K2,V2> Class<Mapper<K1,V1,K2,V2>> | getMapperClass(JobContext job) | Get the application's mapper class.
static int | getNumberOfThreads(JobContext job) | The number of threads in the thread pool that will run the map function.
void | run(Mapper<K1,V1,K2,V2>.Context context) | Run the application's maps using a thread pool.
static <K1,V1,K2,V2> void | setMapperClass(Job job, Class<? extends Mapper<K1,V1,K2,V2>> cls) | Set the application's mapper class.
static void | setNumberOfThreads(Job job, int threads) | Set the number of threads in the pool for running maps.
-
Field Details
-
NUM_THREADS
-
MAP_CLASS
-
-
Constructor Details
-
MultithreadedMapper
public MultithreadedMapper()
-
-
Method Details
-
getNumberOfThreads
public static int getNumberOfThreads(JobContext job)
The number of threads in the thread pool that will run the map function.
Parameters:
job - the job
Returns:
the number of threads
-
setNumberOfThreads
public static void setNumberOfThreads(Job job, int threads)
Set the number of threads in the pool for running maps.
Parameters:
job - the job to modify
threads - the new number of threads
-
getMapperClass
public static <K1,V1,K2,V2> Class<Mapper<K1,V1,K2,V2>> getMapperClass(JobContext job)
Get the application's mapper class.
Type Parameters:
K1 - the map's input key type
V1 - the map's input value type
K2 - the map's output key type
V2 - the map's output value type
Parameters:
job - the job
Returns:
the mapper class to run
-
setMapperClass
public static <K1,V1,K2,V2> void setMapperClass(Job job, Class<? extends Mapper<K1,V1,K2,V2>> cls)
Set the application's mapper class.
Type Parameters:
K1 - the map input key type
V1 - the map input value type
K2 - the map output key type
V2 - the map output value type
Parameters:
job - the job to modify
cls - the class to use as the mapper
-
run
public void run(Mapper<K1,V1,K2,V2>.Context context) throws IOException, InterruptedException
Run the application's maps using a thread pool.
Overrides:
run in class Mapper<K1,V1,K2,V2>
Throws:
IOException
InterruptedException
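The general idea of running maps on a thread pool can be sketched with the JDK alone. The code below is an illustrative model, not Hadoop's implementation: the class ThreadPoolMapSketch and the method runMaps are hypothetical names, and a plain list stands in for the input split. Worker threads take turns pulling the next record, apply the map function concurrently, and take turns again to write results.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class ThreadPoolMapSketch {

  // Applies mapFn to every record using numThreads worker threads.
  // Output order is not guaranteed to match input order.
  public static <I, O> List<O> runMaps(
      List<I> records, Function<I, O> mapFn, int numThreads) {
    Iterator<I> input = records.iterator();
    List<O> output = new ArrayList<>();
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (int t = 0; t < numThreads; t++) {
      pool.execute(() -> {
        while (true) {
          I record;
          synchronized (input) {          // one thread reads at a time
            if (!input.hasNext()) {
              return;                     // input exhausted, worker exits
            }
            record = input.next();
          }
          O result = mapFn.apply(record); // runs concurrently across threads
          synchronized (output) {         // one thread writes at a time
            output.add(result);
          }
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("map threads did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return output;
  }
}
```

Only the map call itself runs in parallel here, which is why this pattern helps when each call spends most of its time waiting rather than computing. The sketch does not propagate exceptions thrown by mapFn; a fuller version would need to collect and rethrow them.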
-