org.apache.hadoop.mapred
Class SequenceFileInputFilter<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<K,V>
      extended by org.apache.hadoop.mapred.SequenceFileInputFormat<K,V>
          extended by org.apache.hadoop.mapred.SequenceFileInputFilter<K,V>
All Implemented Interfaces:
InputFormat<K,V>

public class SequenceFileInputFilter<K,V>
extends SequenceFileInputFormat<K,V>

A class that allows a map/reduce job to process a sample of the records in sequence files. The sample is determined by the filter class configured for the job.


Nested Class Summary
static interface SequenceFileInputFilter.Filter
          filter interface
static class SequenceFileInputFilter.FilterBase
          base class for Filters
static class SequenceFileInputFilter.MD5Filter
          This class returns a set of records by examining the MD5 digest of each record's key against a filtering frequency f.
static class SequenceFileInputFilter.PercentFilter
          This class returns a percentage of records. The percentage is determined by a filtering frequency f using the criterion record# % f == 0.
static class SequenceFileInputFilter.RegexFilter
          Filters records by matching the key against a regular expression.
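
The three filter criteria summarized above can be illustrated with a small standalone sketch. It uses only the Java standard library and is not the Hadoop implementation: the class name FilterCriteriaSketch and its methods are hypothetical, the regex rule assumes a full match of the key, and the way the MD5 digest is folded into a number is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

/** Hypothetical stand-ins for the three filter criteria; not the Hadoop classes. */
class FilterCriteriaSketch {

    /** PercentFilter-style rule: keep a record when record# % f == 0. */
    public static boolean acceptByCount(long recordNumber, int f) {
        return recordNumber % f == 0;
    }

    /** RegexFilter-style rule (assumption: the whole key must match). */
    public static boolean acceptByRegex(String key, Pattern pattern) {
        return pattern.matcher(key).matches();
    }

    /** MD5Filter-style rule: derive a number from the key's MD5 digest, test it against f. */
    public static boolean acceptByDigest(String key, int f) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Assumption: fold the first 8 digest bytes into a long.
            long hash = 0;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFF);
            }
            return hash % f == 0;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        StringBuilder kept = new StringBuilder();
        for (long n = 0; n < 10; n++) {
            if (acceptByCount(n, 3)) {
                kept.append(n).append(' ');
            }
        }
        System.out.println(kept.toString().trim()); // prints 0 3 6 9
        System.out.println(acceptByRegex("user-42", Pattern.compile("user-\\d+"))); // prints true
        System.out.println(acceptByDigest("user-42", 1)); // prints true: any hash is divisible by 1
    }
}
```

With a frequency of f, the count-based rule keeps roughly one record in f by position, while the digest-based rule keeps roughly one in f by key, so the same key is always either kept or dropped regardless of where it appears.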
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
FileInputFormat.Counter
 
Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
SequenceFileInputFilter()
           
 
Method Summary
 RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Create a record reader for the given split
static void setFilterClass(Configuration conf, Class filterClass)
          Sets the filter class.
 
Methods inherited from class org.apache.hadoop.mapred.SequenceFileInputFormat
listStatus
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SequenceFileInputFilter

public SequenceFileInputFilter()
Method Detail

getRecordReader

public RecordReader<K,V> getRecordReader(InputSplit split,
                                         JobConf job,
                                         Reporter reporter)
                                  throws IOException
Create a record reader for the given split

Specified by:
getRecordReader in interface InputFormat<K,V>
Overrides:
getRecordReader in class SequenceFileInputFormat<K,V>
Parameters:
split - file split
job - job configuration
reporter - reporter that sends progress reports to the task tracker
Returns:
RecordReader
Throws:
IOException
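
As a rough illustration of what a filtering record reader does, the sketch below wraps an underlying source of keys and skips every key the filter rejects. It is a standalone approximation under stated assumptions, not Hadoop code: FilteringReaderSketch and readAll are hypothetical names, and java.util.Iterator and java.util.function.Predicate stand in for the RecordReader and Filter types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical reader that yields only the keys accepted by a filter. */
class FilteringReaderSketch<K> implements Iterator<K> {
    private final Iterator<K> source;   // stands in for the underlying record reader
    private final Predicate<K> filter;  // stands in for the configured filter
    private K nextKey;
    private boolean ready;

    FilteringReaderSketch(Iterator<K> source, Predicate<K> filter) {
        this.source = source;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        // Advance past rejected keys until one is accepted or the source ends.
        while (!ready && source.hasNext()) {
            K candidate = source.next();
            if (filter.test(candidate)) {
                nextKey = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public K next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        K result = nextKey;
        nextKey = null;
        return result;
    }

    /** Drains the source through the filter and returns the accepted keys. */
    public static <K> List<K> readAll(Iterator<K> source, Predicate<K> filter) {
        List<K> accepted = new ArrayList<>();
        Iterator<K> reader = new FilteringReaderSketch<>(source, filter);
        while (reader.hasNext()) {
            accepted.add(reader.next());
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "sys-2", "user-3");
        System.out.println(readAll(keys.iterator(), k -> k.startsWith("user-")));
        // prints [user-1, user-3]
    }
}
```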

setFilterClass

public static void setFilterClass(Configuration conf,
                                  Class filterClass)
Sets the filter class used to select records.

Parameters:
conf - application configuration
filterClass - filter class
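
A job might be wired up along the following lines. This is a minimal sketch that needs the Hadoop libraries on the classpath: SampledJobSetup and the inputDir parameter are hypothetical, and PercentFilter.setFrequency(Configuration, int) is not documented on this page, so its signature should be checked against the PercentFilter class in your Hadoop version.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFilter;

/** Hypothetical helper that configures a job to read a sample of records. */
public class SampledJobSetup {

    public static JobConf configure(String inputDir) {
        JobConf job = new JobConf();

        // Read sequence files through the filtering input format.
        job.setInputFormat(SequenceFileInputFilter.class);
        FileInputFormat.setInputPaths(job, new Path(inputDir));

        // Choose the filter; JobConf is a Configuration, so it can be passed directly.
        SequenceFileInputFilter.setFilterClass(job, SequenceFileInputFilter.PercentFilter.class);

        // Assumption: keep records where record# % 10 == 0 (see PercentFilter).
        SequenceFileInputFilter.PercentFilter.setFrequency(job, 10);

        return job;
    }
}
```

The filter's own parameter is set separately from the filter class, so switching to another filter means changing both calls.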


Copyright © 2009 The Apache Software Foundation