org.apache.hadoop.mapreduce.lib.input
Class SequenceFileInputFilter<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
          extended by org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<K,V>
              extended by org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter<K,V>

@InterfaceAudience.Public
@InterfaceStability.Stable
public class SequenceFileInputFilter<K,V>
extends SequenceFileInputFormat<K,V>

A class that allows a MapReduce job to work on a sample of sequence files. The sample is determined by the filter class set on the job.


Nested Class Summary
static interface SequenceFileInputFilter.Filter
          filter interface
static class SequenceFileInputFilter.FilterBase
          Base class for filters
static class SequenceFileInputFilter.MD5Filter
          This class returns a set of records by examining the MD5 digest of each key against a filtering frequency f.
static class SequenceFileInputFilter.PercentFilter
          This class returns a percentage of records. The percentage is determined by a filtering frequency f, using the criterion record# % f == 0.
static class SequenceFileInputFilter.RegexFilter
          Filters records by matching the key against a regular expression
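
The filter criteria summarized above can be illustrated with a minimal, Hadoop-free sketch. FilterCriteriaSketch and all of its methods are hypothetical names used only for illustration; they are not part of the Hadoop API. The percent criterion follows the documented rule record# % f == 0, assuming record numbers start at 0. The regex method assumes the whole key string must match, and the MD5 method shows only one plausible reading of "examining the MD5 digest against a frequency f" (the first 8 digest bytes read as a long and tested for divisibility by f). The actual classes may differ in these details.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, Hadoop-free illustration. None of these names belong to the Hadoop API.
public class FilterCriteriaSketch {

    // Documented PercentFilter criterion: record# % f == 0.
    // Assumption: record numbers start at 0.
    public static boolean percentAccepts(long recordNumber, int frequency) {
        return recordNumber % frequency == 0;
    }

    // Collects the record numbers in [0, totalRecords) that pass the percent criterion.
    public static List<Long> sampleRecordNumbers(long totalRecords, int frequency) {
        List<Long> kept = new ArrayList<>();
        for (long n = 0; n < totalRecords; n++) {
            if (percentAccepts(n, frequency)) {
                kept.add(n);
            }
        }
        return kept;
    }

    // Regex criterion on the key's string form.
    // Assumption: the whole key must match the pattern.
    public static boolean regexAccepts(Pattern pattern, Object key) {
        return pattern.matcher(key.toString()).matches();
    }

    // MD5 criterion.
    // Assumption: the first 8 digest bytes are read as a long and
    // the key is accepted when that value is divisible by the frequency.
    public static boolean md5Accepts(String key, int frequency) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long hash = ByteBuffer.wrap(digest).getLong();
            return hash % frequency == 0;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sampleRecordNumbers(10, 3)); // prints [0, 3, 6, 9]
        Pattern p = Pattern.compile("user-\\d+");
        System.out.println(regexAccepts(p, "user-42"));  // prints true
        System.out.println(regexAccepts(p, "admin-42")); // prints false
        System.out.println(md5Accepts("user-42", 1));    // prints true (any value % 1 == 0)
    }
}
```

Under these assumptions, a frequency of 3 keeps roughly one record in three. The percent criterion depends on record position, while the MD5 and regex criteria depend only on the key, so a given key is always treated the same way.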
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
FileInputFormat.Counter
 
Field Summary
static String FILTER_CLASS
           
static String FILTER_FREQUENCY
           
static String FILTER_REGEX
           
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
SequenceFileInputFilter()
           
 
Method Summary
 RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context)
          Create a record reader for the given split
static void setFilterClass(Job job, Class<?> filterClass)
          Set the filter class
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
getFormatMinSplitSize, listStatus
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

FILTER_CLASS

public static final String FILTER_CLASS
See Also:
Constant Field Values

FILTER_FREQUENCY

public static final String FILTER_FREQUENCY
See Also:
Constant Field Values

FILTER_REGEX

public static final String FILTER_REGEX
See Also:
Constant Field Values
Constructor Detail

SequenceFileInputFilter

public SequenceFileInputFilter()
Method Detail

createRecordReader

public RecordReader<K,V> createRecordReader(InputSplit split,
                                            TaskAttemptContext context)
                                     throws IOException
Create a record reader for the given split

Overrides:
createRecordReader in class SequenceFileInputFormat<K,V>
Parameters:
split - file split
context - the task-attempt context
Returns:
RecordReader
Throws:
IOException

setFilterClass

public static void setFilterClass(Job job,
                                  Class<?> filterClass)
Set the filter class

Parameters:
job - the job to configure
filterClass - the filter class


Copyright © 2009 The Apache Software Foundation