SequenceFileInputFilter (Hadoop 1.2.1 API)

org.apache.hadoop.mapreduce.lib.input
Class SequenceFileInputFilter<K,V>

java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
      org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
          org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<K,V>
              org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter<K,V>

@InterfaceAudience.Public @InterfaceStability.Stable public class SequenceFileInputFilter<K,V>
extends SequenceFileInputFormat<K,V>

A class that allows a MapReduce job to work on a sample of sequence files. The sample is determined by the filter class set on the job.
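The idea of a filter deciding which records a job sees can be sketched in plain Java without any Hadoop dependency. In this minimal sketch, `KeyFilterSketch`, `FilteredReadSketch`, and `sample` are hypothetical names introduced for illustration; they are stand-ins and not the Hadoop types documented on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a key filter; NOT the Hadoop Filter interface.
interface KeyFilterSketch {
    boolean accept(Object key);
}

final class FilteredReadSketch {
    // Returns only the keys that the filter accepts, preserving input order.
    static List<String> sample(List<String> keys, KeyFilterSketch filter) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (filter.accept(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "order-7", "user-2");
        // Illustrative filter: keep keys that start with "user-".
        KeyFilterSketch usersOnly = key -> key.toString().startsWith("user-");
        System.out.println(sample(keys, usersOnly)); // prints [user-1, user-2]
    }
}
```

A real job would not loop over keys itself; the record reader returned by this input format is expected to apply the configured filter while reading.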

Nested Class Summary
`static interface`	`SequenceFileInputFilter.Filter` The filter interface.
`static class`	`SequenceFileInputFilter.FilterBase` Base class for filters.
`static class`	`SequenceFileInputFilter.MD5Filter` Returns a set of records by examining the MD5 digest of each key against a filtering frequency f.
`static class`	`SequenceFileInputFilter.PercentFilter` Returns a percentage of records. The percentage is determined by a filtering frequency f using the criterion record# % f == 0.
`static class`	`SequenceFileInputFilter.RegexFilter` Filters records by matching the key against a regular expression.
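The three selection criteria summarized above can be illustrated with a small standalone sketch. Only the percent rule (record# % f == 0) is stated on this page; the way the MD5 digest is reduced to a number and the use of `find()` for regex matching are assumptions made for illustration. `FilterCriteriaSketch` and its methods are hypothetical names, not part of the Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

final class FilterCriteriaSketch {
    // Percent criterion from the summary: keep a record when record# % f == 0.
    // Assumes record numbers start at 0 and f is positive.
    static boolean percentAccept(long recordNumber, int frequency) {
        return recordNumber % frequency == 0;
    }

    // MD5 criterion (assumed form): fold the first 8 digest bytes into a long
    // and keep the key when that value is divisible by f.
    static boolean md5Accept(String key, int frequency) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash |= (digest[i] & 0xffL) << (8 * (7 - i));
            }
            return hash % frequency == 0;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable", e);
        }
    }

    // Regex criterion (assumed form): keep the key when the pattern occurs in it.
    static boolean regexAccept(String key, Pattern pattern) {
        return pattern.matcher(key).find();
    }

    public static void main(String[] args) {
        int kept = 0;
        for (long n = 0; n < 100; n++) {
            if (percentAccept(n, 10)) {
                kept++;
            }
        }
        System.out.println(kept); // prints 10 (records 0, 10, ..., 90)

        Pattern users = Pattern.compile("^user-\\d+$");
        System.out.println(regexAccept("user-42", users)); // prints true
        System.out.println(regexAccept("order-7", users)); // prints false
    }
}
```

The percent rule depends on record position, while the MD5 rule depends only on the key, so under this sketch the same key is always kept or always dropped for a given f regardless of where it appears.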

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
`FileInputFormat.Counter`

Field Summary
`static String`	`FILTER_CLASS`
`static String`	`FILTER_FREQUENCY`
`static String`	`FILTER_REGEX`
`static org.apache.commons.logging.Log`	`LOG`

Constructor Summary
`SequenceFileInputFilter()`

Method Summary
`RecordReader<K,V>`	`createRecordReader(InputSplit split, TaskAttemptContext context)` Creates a record reader for the given split.
`static void`	`setFilterClass(Job job, Class<?> filterClass)` Sets the filter class.

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
`getFormatMinSplitSize, listStatus`

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
`addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

FILTER_CLASS

public static final String FILTER_CLASS

See Also: Constant Field Values

FILTER_FREQUENCY

public static final String FILTER_FREQUENCY

See Also: Constant Field Values

FILTER_REGEX

public static final String FILTER_REGEX

See Also: Constant Field Values

Constructor Detail

SequenceFileInputFilter

public SequenceFileInputFilter()

Method Detail

createRecordReader

public RecordReader<K,V> createRecordReader(InputSplit split,
                                            TaskAttemptContext context)
                                     throws IOException

Creates a record reader for the given split.

Overrides: createRecordReader in class SequenceFileInputFormat<K,V>

Parameters: split - the file split; context - the task-attempt context
Returns: RecordReader
Throws: IOException

setFilterClass

public static void setFilterClass(Job job,
                                  Class<?> filterClass)

Sets the filter class.

Parameters: job - the job; filterClass - the filter class