Class SequenceFileInputFilter<K,V>
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter<K,V>
A class that allows a map/reduce job to work on a sample of sequence files.
The sample is decided by the filter class set by the job.
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
static interface org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.Filter: Filter interface.
static class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.FilterBase: Base class for filters.
static class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.MD5Filter: Returns a set of records by examining the MD5 digest of each key against a filtering frequency f.
static class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.PercentFilter: Returns a percentage of records. The percentage is determined by a filtering frequency f, using the criterion record# % f == 0.
static class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFilter.RegexFilter: Filters records by matching the key against a regex.
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
FileInputFormat.Counter
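The three filter descriptions above can be illustrated with a small standalone sketch. It uses only the JDK and is not Hadoop's implementation: the class name FilterCriteriaSketch and its method names are hypothetical. The whole-key regex match and the way the MD5 digest is reduced to a number (first 8 bytes folded into a long, then tested for divisibility by f) are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

/** Standalone illustration of the three sampling criteria; not Hadoop code. */
final class FilterCriteriaSketch {

    /** Percent-style rule from the summary: keep a record when record# % f == 0. */
    static boolean acceptByRecordNumber(long recordNumber, int frequency) {
        requirePositive(frequency);
        return recordNumber % frequency == 0;
    }

    /** Regex-style rule. ASSUMPTION: the whole key must match the pattern. */
    static boolean acceptByRegex(String key, Pattern pattern) {
        return pattern.matcher(key).matches();
    }

    /**
     * MD5-style rule. ASSUMPTION: the first 8 digest bytes are folded into a
     * long, and the record is kept when that value is divisible by f.
     */
    static boolean acceptByMd5(String key, int frequency) {
        requirePositive(frequency);
        byte[] digest = md5(key.getBytes(StandardCharsets.UTF_8));
        long hash = 0;
        for (int i = 0; i < 8; i++) {
            hash = (hash << 8) | (digest[i] & 0xFFL);
        }
        return hash % frequency == 0;
    }

    private static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    private static void requirePositive(int frequency) {
        if (frequency <= 0) {
            throw new IllegalArgumentException("frequency must be positive: " + frequency);
        }
    }

    public static void main(String[] args) {
        int kept = 0;
        for (long record = 0; record < 100; record++) {
            if (acceptByRecordNumber(record, 10)) {
                kept++;
            }
        }
        System.out.println("kept by record number: " + kept); // 10 of 100 records
        System.out.println(acceptByRegex("user-42", Pattern.compile("user-\\d+"))); // true
        System.out.println(acceptByMd5("user-42", 4)); // depends on the digest of the key
    }
}
```

With f = 10 the record-number rule keeps every tenth record, so a larger f yields a smaller sample. The digest-based rule depends only on the key, so the same key gets the same decision regardless of its position in the file.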
Field Summary
Fields (Modifier and Type, Field)
static final String FILTER_CLASS
static final String FILTER_FREQUENCY
static final String FILTER_REGEX
static final org.slf4j.Logger LOG
Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
Constructor Summary
Constructors
SequenceFileInputFilter()
Method Summary
Methods (Modifier and Type, Method: Description)
RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context): Create a record reader for the given split.
static void setFilterClass(Job job, Class<?> filterClass): Set the filter class.
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
getFormatMinSplitSize, listStatus
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize, shrinkStatus
Field Details
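LOG
public static final org.slf4j.Logger LOG
FILTER_CLASS
public static final String FILTER_CLASS
See Also: Constant Field Values
FILTER_FREQUENCY
public static final String FILTER_FREQUENCY
See Also: Constant Field Values
FILTER_REGEX
public static final String FILTER_REGEX
See Also: Constant Field Values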
Constructor Details
SequenceFileInputFilter
public SequenceFileInputFilter()
Method Details
createRecordReader
public RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException
Create a record reader for the given split.
Overrides: createRecordReader in class SequenceFileInputFormat<K,V>
Parameters:
split - file split
context - the task-attempt context
Returns: RecordReader
Throws: IOException
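As a conceptual illustration of a record reader that hands only a sample of records to the job, the sketch below wraps an iterator of key/value pairs and skips every pair whose key a filter rejects. It uses only the JDK. FilteringReaderSketch and acceptedKeys are hypothetical names, and the sketch does not reproduce the RecordReader contract or Hadoop's actual reader. It only assumes, as the nested class descriptions suggest, that the decision is made on the key.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical key-filtering reader over in-memory records; not Hadoop code. */
final class FilteringReaderSketch<K, V> implements Iterator<Map.Entry<K, V>> {
    private final Iterator<Map.Entry<K, V>> source;
    private final Predicate<K> keyFilter;
    private Map.Entry<K, V> next; // next accepted record, or null when exhausted

    FilteringReaderSketch(Iterator<Map.Entry<K, V>> source, Predicate<K> keyFilter) {
        this.source = source;
        this.keyFilter = keyFilter;
        advance();
    }

    /** Moves to the next record whose key passes the filter, skipping the rest. */
    private void advance() {
        next = null;
        while (source.hasNext()) {
            Map.Entry<K, V> candidate = source.next();
            if (keyFilter.test(candidate.getKey())) {
                next = candidate;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public Map.Entry<K, V> next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        Map.Entry<K, V> result = next;
        advance();
        return result;
    }

    /** Convenience helper: collects the keys of all accepted records in order. */
    static <K, V> List<K> acceptedKeys(List<Map.Entry<K, V>> records, Predicate<K> keyFilter) {
        List<K> keys = new ArrayList<>();
        Iterator<Map.Entry<K, V>> reader =
                new FilteringReaderSketch<>(records.iterator(), keyFilter);
        while (reader.hasNext()) {
            keys.add(reader.next().getKey());
        }
        return keys;
    }
}
```

Because the filter sits inside the reader, the consumer of the iterator never sees rejected records, so downstream code needs no sampling logic of its own.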
setFilterClass
public static void setFilterClass(Job job, Class<?> filterClass)
Set the filter class.
Parameters:
job - The job
filterClass - filter class