org.apache.hadoop.mapred.lib
Class RegexMapper<K>
java.lang.Object
   org.apache.hadoop.mapred.MapReduceBase
       org.apache.hadoop.mapred.lib.RegexMapper<K>
- All Implemented Interfaces: 
- Closeable, JobConfigurable, Mapper<K,Text,Text,LongWritable>
- public class RegexMapper<K>
- extends MapReduceBase
- implements Mapper<K,Text,Text,LongWritable>
A Mapper that extracts text matching a regular expression.
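As a rough illustration of what "extracts text matching a regular expression" can mean in practice, the sketch below mimics that behavior in plain Java, without Hadoop on the classpath: for every regex match in an input line it emits the matched text paired with a count of 1. The class name RegexExtractSketch, the extract helper, and the use of Map.Entry<String, Long> in place of Text/LongWritable output pairs are assumptions made for illustration; they are not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the mapper's logic; not part of Hadoop.
class RegexExtractSketch {

    // Returns one (matchedText, 1) pair per regex match found in the line.
    // 'group' selects which capture group to emit; 0 means the whole match.
    static List<Map.Entry<String, Long>> extract(Pattern pattern, int group, String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            out.add(new SimpleEntry<>(matcher.group(group), 1L));
        }
        return out;
    }

    public static void main(String[] args) {
        // Emit every capitalized word found in the input line.
        Pattern capitalized = Pattern.compile("[A-Z][a-z]+");
        for (Map.Entry<String, Long> pair : extract(capitalized, 0, "Alice met Bob")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Emitting a constant 1 per match lets a downstream summing reducer turn the pairs into per-match counts.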
 
 
 
| Methods inherited from class java.lang.Object | 
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
 
 
RegexMapper
public RegexMapper()
configure
public void configure(JobConf job)
- Description copied from class: MapReduceBase
- Default implementation that does nothing.
 
- 
- Specified by:
- configure in interface JobConfigurable
- Overrides:
- configure in class MapReduceBase
 
- 
- Parameters:
- job - the configuration
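
The description above is inherited from MapReduceBase. A mapper that depends on job settings would typically override configure to read them once per task. The sketch below shows that pattern in plain Java, with a Map<String, String> standing in for JobConf. The class name ConfigureSketch and the property names mapred.mapper.regex and mapred.mapper.regex.group are assumptions here; they are not stated on this page and should be checked against the Hadoop source before use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for a configurable mapper; a Map replaces JobConf.
class ConfigureSketch {
    private Pattern pattern;
    private int group;

    // Reads settings once so that per-record code need not recompile the regex.
    // Property names are assumed, not taken from this page.
    void configure(Map<String, String> job) {
        pattern = Pattern.compile(job.get("mapred.mapper.regex"));
        group = Integer.parseInt(job.getOrDefault("mapred.mapper.regex.group", "0"));
    }

    Pattern pattern() { return pattern; }

    int group() { return group; }

    public static void main(String[] args) {
        Map<String, String> job = new HashMap<>();
        job.put("mapred.mapper.regex", "[0-9]+");
        ConfigureSketch sketch = new ConfigureSketch();
        sketch.configure(job);
        System.out.println(sketch.pattern().pattern() + " group=" + sketch.group());
    }
}
```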
 
map
public void map(K key,
                Text value,
                OutputCollector<Text,LongWritable> output,
                Reporter reporter)
         throws IOException
- Description copied from interface: Mapper
- Maps a single input key/value pair into an intermediate key/value pair.
 
 Output pairs need not be of the same types as input pairs.  A given 
 input pair may map to zero or many output pairs.  Output pairs are 
 collected with calls to 
 OutputCollector.collect(Object,Object).
 Applications can use the Reporter provided to report progress 
 or just indicate that they are alive. In scenarios where the application 
 takes a significant amount of time to process individual key/value 
 pairs, this is crucial since the framework might assume that the task has 
 timed out and kill that task. The other way of avoiding this is to set 
 mapred.task.timeout to a high-enough value (or even zero for no 
 time-outs).
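
The liveness reporting described above might look like the following minimal sketch, written in plain Java so it runs without Hadoop. The ProgressSink interface is a hypothetical stand-in for Reporter that models only a progress() call, and SlowRecordSketch, process, steps, and interval are likewise illustrative names, not Hadoop API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hadoop's Reporter; only progress() is modeled.
interface ProgressSink {
    void progress();
}

class SlowRecordSketch {

    // Performs 'steps' units of work for one record and signals liveness
    // after every 'interval' units so a long-running record is not mistaken
    // for a hung task.
    static long process(int steps, int interval, ProgressSink sink) {
        long acc = 0;
        for (int i = 1; i <= steps; i++) {
            acc += i; // placeholder for expensive per-record work
            if (i % interval == 0) {
                sink.progress();
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        long result = process(1000, 100, calls::incrementAndGet);
        System.out.println("result=" + result + " progressCalls=" + calls.get());
    }
}
```

Reporting at a fixed interval rather than on every step keeps the overhead low while still signaling well within the timeout.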
 
 
- 
- Specified by:
- map in interface Mapper<K,Text,Text,LongWritable>
 
- 
- Parameters:
- key - the input key.
- value - the input value.
- output - collects mapped keys and values.
- reporter - facility to report progress.
- Throws:
- IOException
 
Copyright © 2009 The Apache Software Foundation