Package org.apache.hadoop.mapred.lib
Class MultipleOutputFormat<K,V>
java.lang.Object
org.apache.hadoop.mapred.FileOutputFormat<K,V>
org.apache.hadoop.mapred.lib.MultipleOutputFormat<K,V>
- All Implemented Interfaces:
OutputFormat<K,V>
- Direct Known Subclasses:
MultipleSequenceFileOutputFormat, MultipleTextOutputFormat
This abstract class extends FileOutputFormat so that output data can be
written to different output files. There are three basic use cases for
this class.
Case one: This class is used for a MapReduce job with at least one reducer.
The reducer wants to write data to different files depending on the actual
keys. It is assumed that a key (or value) encodes both the actual key (value)
and the desired location for the actual key (value).
Case two: This class is used for a map-only job. The job wants to use an
output file name that is either a part of the input file name of the input
data, or some derivation of it.
Case three: This class is used for a map-only job. The job wants to use an
output file name that depends on both the keys and the input file name.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileOutputFormat
FileOutputFormat.Counter
-
Constructor Summary
Constructors
MultipleOutputFormat()
-
Method Summary
Modifier and Type, Method, Description
protected K generateActualKey(K key, V value)
  Generate the actual key from the given key/value.
protected V generateActualValue(K key, V value)
  Generate the actual value from the given key and value.
protected String generateFileNameForKeyValue(K key, V value, String name)
  Generate the output file name based on the given key and the leaf file name.
protected String generateLeafFileName(String name)
  Generate the leaf name for the output file name.
protected abstract RecordWriter<K,V> getBaseRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3)
protected String getInputFileBasedOutputFileName(JobConf job, String name)
  Generate the output file name based on a given name and the input file name.
RecordWriter<K,V> getRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3)
  Create a composite record writer that can write key/value data to different output files.
Methods inherited from class org.apache.hadoop.mapred.FileOutputFormat
checkOutputSpecs, getCompressOutput, getOutputCompressorClass, getOutputPath, getPathForCustomFile, getTaskOutputPath, getUniqueName, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath, setWorkOutputPath
-
Constructor Details
-
MultipleOutputFormat
public MultipleOutputFormat()
-
-
Method Details
-
getRecordWriter
public RecordWriter<K,V> getRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3) throws IOException
Create a composite record writer that can write key/value data to different output files.
- Specified by:
getRecordWriter in interface OutputFormat<K,V>
- Specified by:
getRecordWriter in class FileOutputFormat<K,V>
- Parameters:
fs - the file system to use
job - the job conf for the job
name - the leaf file name for the output file (such as "part-00000")
arg3 - a progressable for reporting progress
- Returns:
a composite record writer
- Throws:
IOException
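One way to picture the composite record writer is as a map from output file name to an underlying writer that is created the first time a record is routed to that name. The following is a minimal standalone sketch of that idea, not the Hadoop implementation: `CompositeWriterSketch`, its methods, and the in-memory buffers are hypothetical stand-ins for real record writers and files.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of a composite writer; no Hadoop classes are used.
final class CompositeWriterSketch {
    // One buffer per output file name, created on the first write to that name.
    private final Map<String, StringBuilder> writers = new TreeMap<>();

    // Route a record to the buffer for fileName, creating the buffer if needed.
    void write(String fileName, String key, String value) {
        StringBuilder out = writers.computeIfAbsent(fileName, n -> new StringBuilder());
        out.append(key).append('\t').append(value).append('\n');
    }

    // Return what has been written to fileName so far, or "" if nothing was written.
    String contents(String fileName) {
        StringBuilder out = writers.get(fileName);
        return out == null ? "" : out.toString();
    }

    // Number of distinct output files opened so far.
    int openFiles() {
        return writers.size();
    }
}
```

In this sketch, writing to "us/part-00000" twice and "eu/part-00000" once opens two buffers, one per distinct file name.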
-
generateLeafFileName
protected String generateLeafFileName(String name)
Generate the leaf name for the output file name. The default behavior does not change the leaf file name (such as part-00000).
- Parameters:
name - the leaf file name for the output file
- Returns:
the given leaf file name
-
generateFileNameForKeyValue
protected String generateFileNameForKeyValue(K key, V value, String name)
Generate the output file name based on the given key and the leaf file name. The default behavior is that the file name does not depend on the key.
- Parameters:
key - the key of the output data
name - the leaf file name
- Returns:
generated file name
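A subclass would typically override this method so that the key selects a directory or file. The sketch below shows one possible naming rule as a standalone static method. `KeyBasedNameSketch` and `fileNameFor` are hypothetical names, and prefixing the leaf name with the key is an assumption chosen for illustration, not the library's default behavior.

```java
// Standalone illustration of a key-dependent naming rule; no Hadoop classes are used.
final class KeyBasedNameSketch {
    // Place each record under a directory named after its key, keeping the leaf name.
    static String fileNameFor(String key, String leafName) {
        return key + "/" + leafName;
    }

    // The documented default: the file name does not depend on the key.
    static String defaultFileNameFor(String key, String leafName) {
        return leafName;
    }
}
```

With this rule, records with key "us" and leaf name "part-00000" go to "us/part-00000", so each distinct key ends up in its own directory.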
-
generateActualKey
protected K generateActualKey(K key, V value)
Generate the actual key from the given key/value. The default behavior is that the actual key is equal to the given key.
- Parameters:
key - the key of the output data
value - the value of the output data
- Returns:
the actual key derived from the given key/value
-
generateActualValue
protected V generateActualValue(K key, V value)
Generate the actual value from the given key and value. The default behavior is that the actual value is equal to the given value.
- Parameters:
key - the key of the output data
value - the value of the output data
- Returns:
the actual value derived from the given key/value
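generateActualKey and generateActualValue let a subclass strip routing information from a record before it is written, which fits the case where the key encodes both the desired location and the actual key. The sketch below assumes a hypothetical encoding of the form "location|actualKey". The separator, the class name `EncodedKeySketch`, and both method names are illustrative assumptions, not part of the library.

```java
// Standalone illustration of decoding a key that carries its output location.
final class EncodedKeySketch {
    // Assumed separator between the location and the actual key.
    private static final char SEPARATOR = '|';

    // Part before the first separator: the desired output location ("" if none).
    static String location(String encodedKey) {
        int i = encodedKey.indexOf(SEPARATOR);
        return i < 0 ? "" : encodedKey.substring(0, i);
    }

    // Part after the first separator: the key to actually write.
    // A key without a separator is returned unchanged.
    static String actualKey(String encodedKey) {
        int i = encodedKey.indexOf(SEPARATOR);
        return i < 0 ? encodedKey : encodedKey.substring(i + 1);
    }
}
```

Under this encoding, the key "eu|user42" has location "eu" and actual key "user42".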
-
getInputFileBasedOutputFileName
protected String getInputFileBasedOutputFileName(JobConf job, String name)
Generate the output file name based on a given name and the input file name. If MRJobConfig.MAP_INPUT_FILE does not exist (i.e. this is not a map-only job), the given name is returned unchanged. If the config value for "num.of.trailing.legs.to.use" is not set, or is set to 0 or a negative value, the given name is returned unchanged. Otherwise, return a file name consisting of the N trailing legs of the input file name, where N is the config value for "num.of.trailing.legs.to.use".
- Parameters:
job - the job config
name - the output file name
- Returns:
the output file name based on a given name and the input file name
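The "N trailing legs" rule can be illustrated with a small standalone method that keeps the last N path components of the input file name. This is a sketch of the documented behavior only and does not read any job configuration. `TrailingLegsSketch` and `outputName` are hypothetical names, and keeping every leg when N exceeds the path depth is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration of the trailing-legs rule; no Hadoop classes are used.
final class TrailingLegsSketch {
    // Returns the last n non-empty '/'-separated components of inputFile.
    // When n <= 0 the given name is returned unchanged, as documented above.
    static String outputName(String inputFile, int n, String name) {
        if (n <= 0) {
            return name;
        }
        List<String> legs = new ArrayList<>();
        for (String leg : inputFile.split("/")) {
            if (!leg.isEmpty()) {
                legs.add(leg);
            }
        }
        // Assumption: if n exceeds the path depth, keep every leg.
        int from = Math.max(0, legs.size() - n);
        return String.join("/", legs.subList(from, legs.size()));
    }
}
```

With the input file "/data/logs/2024/part-0" and N = 2, the sketch yields "2024/part-0". With N = 0 it returns the given name unchanged.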
-
getBaseRecordWriter
protected abstract RecordWriter<K,V> getBaseRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3) throws IOException
- Parameters:
fs - the file system to use
job - a job conf object
name - the name of the file over which a record writer object will be constructed
arg3 - a progressable object
- Returns:
A RecordWriter object over the given file
- Throws:
IOException
-