java.lang.Object

org.apache.hadoop.mapreduce.InputFormat<K,V>

org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>

org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<K,V>

org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter<K,V>

@Public @Stable public class SequenceFileInputFilter<K,V> extends SequenceFileInputFormat<K,V>

A class that allows a map/reduce job to work on a sample of sequence files. The sample is determined by the filter class set on the job.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static interface

org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.Filter

filter interface

static class

org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.FilterBase

base class for Filters

static class

org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.MD5Filter

This class returns a set of records by examining the MD5 digest of each key against a filtering frequency f.

static class

org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.PercentFilter

This class returns a percentage of records. The percentage is determined by a filtering frequency f using the criterion record# % f == 0.

static class

org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.RegexFilter

Filters records by matching each key against a regular expression.

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
FileInputFormat.Counter
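
The selection rules described for the nested filter classes can be illustrated with a small stand-alone sketch. It does not use Hadoop and is not the library's implementation: the class `FilterSketch` and its methods are hypothetical names chosen for illustration. Only the `record# % f == 0` criterion comes from the descriptions above. Two details are assumptions: that a regex filter requires the whole key to match, and that an MD5-based filter reduces the digest to a number (here, the first 8 bytes read as a `long`) and tests divisibility by f.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; none of these names come from Hadoop.
class FilterSketch {

    // PercentFilter-style rule from the description: keep record n when n % f == 0.
    static boolean acceptByCount(long recordNumber, int frequency) {
        return recordNumber % frequency == 0;
    }

    // RegexFilter-style rule: keep a record when its key matches the pattern.
    // Assumption: the whole key must match, not just a substring.
    static boolean acceptByRegex(Pattern pattern, String key) {
        return pattern.matcher(key).matches();
    }

    // MD5Filter-style rule: decide from the MD5 digest of the key.
    // Assumption: the first 8 digest bytes are read as a big-endian long
    // and the record is kept when that value is divisible by f.
    static boolean acceptByMd5(String key, int frequency) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long hash = 0;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xffL);
            }
            return hash % frequency == 0;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Count-based sampling: records 0, 10, ..., 90 are kept.
        int kept = 0;
        for (long n = 0; n < 100; n++) {
            if (acceptByCount(n, 10)) {
                kept++;
            }
        }
        System.out.println("kept by count: " + kept); // prints "kept by count: 10"

        // Regex-based selection on the key.
        Pattern p = Pattern.compile("user-\\d+");
        System.out.println("regex accepts user-42: " + acceptByRegex(p, "user-42"));

        // Digest-based sampling: the decision depends only on the key,
        // so the same key is always kept or always dropped.
        System.out.println("md5 accepts user-42: " + acceptByMd5("user-42", 4));
    }
}
```

The count-based rule depends on a record's position in the input, while the digest-based rule depends only on the key, so under the assumptions above it selects the same keys on every run regardless of record order.
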
Field Summary

Fields

Modifier and Type

Field

Description

static final String

FILTER_CLASS

static final String

FILTER_FREQUENCY

static final String

FILTER_REGEX

static final org.slf4j.Logger

LOG

Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
Constructor Summary

Constructors

Constructor

Description

SequenceFileInputFilter()
Method Summary

Modifier and Type

Method

Description

RecordReader<K,V>

createRecordReader(InputSplit split, TaskAttemptContext context)

Create a record reader for the given split

static void

setFilterClass(Job job, Class<?> filterClass)

Sets the filter class

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
getFormatMinSplitSize, listStatus

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize, shrinkStatus

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOG
  
  public static final org.slf4j.Logger LOG
- FILTER_CLASS
  
  public static final String FILTER_CLASS
  See Also:
  
  Constant Field Values
- FILTER_FREQUENCY
  
  public static final String FILTER_FREQUENCY
  See Also:
  
  Constant Field Values
- FILTER_REGEX
  
  public static final String FILTER_REGEX
  See Also:
  
  Constant Field Values
Constructor Details
- SequenceFileInputFilter
  
  public SequenceFileInputFilter()
Method Details
- createRecordReader
  
  public RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException
  
  Create a record reader for the given split
  
  Overrides:
  
  createRecordReader in class SequenceFileInputFormat<K,V>
  
  Parameters:
  
  split - file split
  
  context - the task-attempt context
  
  Returns:
  
  RecordReader
  
  Throws:
  
  IOException
- setFilterClass
  
  public static void setFilterClass(Job job, Class<?> filterClass)
  
  Sets the filter class
  
  Parameters:
  
  job - The job
  
  filterClass - filter class
