Class FileOutputFormat<K,V>
Direct Known Subclasses:
MapFileOutputFormat, SequenceFileOutputFormat, TextOutputFormat
A base class for OutputFormats that write to FileSystems.
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type | Field | Description
protected static final String | BASE_OUTPUT_NAME |
static final String | COMPRESS | Configuration option: should output be compressed?
static final String | COMPRESS_CODEC | If compression is enabled, name of codec: "mapreduce.output.fileoutputformat.compress.codec".
static final String | COMPRESS_TYPE | Type of compression "mapreduce.output.fileoutputformat.compress.type": NONE, RECORD, BLOCK.
static final String | OUTDIR | Destination directory of work: "mapreduce.output.fileoutputformat.outputdir".
protected static final String | PART |
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
void | checkOutputSpecs(JobContext job) | Check for validity of the output-specification for the job.
static boolean | getCompressOutput(JobContext job) | Is the job output compressed?
Path | getDefaultWorkFile(TaskAttemptContext context, String extension) | Get the default path and filename for the output format.
OutputCommitter | getOutputCommitter(TaskAttemptContext context) | Get the output committer for this output format.
static Class<? extends CompressionCodec> | getOutputCompressorClass(JobContext job, Class<? extends CompressionCodec> defaultValue) | Get the CompressionCodec for compressing the job outputs.
protected static String | getOutputName(JobContext job) | Get the base output name for the output file.
static Path | getOutputPath(JobContext job) | Get the Path to the output directory for the map-reduce job.
static Path | getPathForWorkFile(TaskInputOutputContext<?, ?, ?, ?> context, String name, String extension) | Helper function to generate a Path for a file that is unique for the task within the job output directory.
abstract RecordWriter<K,V> | getRecordWriter(TaskAttemptContext job) | Get the RecordWriter for the given task.
static String | getUniqueFile(TaskAttemptContext context, String name, String extension) | Generate a unique filename, based on the task id, name, and extension.
static Path | getWorkOutputPath(TaskInputOutputContext<?, ?, ?, ?> context) | Get the Path to the task's temporary output directory for the map-reduce job.
static void | setCompressOutput(Job job, boolean compress) | Set whether the output of the job is compressed.
static void | setOutputCompressorClass(Job job, Class<? extends CompressionCodec> codecClass) | Set the CompressionCodec to be used to compress job outputs.
protected static void | setOutputName(JobContext job, String name) | Set the base output name for output file to be created.
static void | setOutputPath(Job job, Path outputDir) | Set the Path of the output directory for the map-reduce job.
-
Field Details
-
BASE_OUTPUT_NAME
-
PART
-
COMPRESS
Configuration option: should output be compressed? "mapreduce.output.fileoutputformat.compress".
-
COMPRESS_CODEC
If compression is enabled, name of codec: "mapreduce.output.fileoutputformat.compress.codec".
-
COMPRESS_TYPE
Type of compression "mapreduce.output.fileoutputformat.compress.type": NONE, RECORD, BLOCK. Generally only used in SequenceFileOutputFormat.
-
OUTDIR
Destination directory of work: "mapreduce.output.fileoutputformat.outputdir".
-
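As an illustration of how the four documented property names fit together, the sketch below collects them in a plain `java.util.Map`. The map is only a stand-in for a job configuration so that the example runs on a bare JDK; the class name `OutputConfigSketch`, the helper `compressedOutput`, and the directory and codec values are hypothetical. In a real job these settings would normally be applied through the static setters documented on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OutputConfigSketch {
    // Property names copied from the field descriptions above.
    static final String COMPRESS = "mapreduce.output.fileoutputformat.compress";
    static final String COMPRESS_CODEC = "mapreduce.output.fileoutputformat.compress.codec";
    static final String COMPRESS_TYPE = "mapreduce.output.fileoutputformat.compress.type";
    static final String OUTDIR = "mapreduce.output.fileoutputformat.outputdir";

    /** Builds the settings for block-compressed output written under outputDir. */
    static Map<String, String> compressedOutput(String outputDir, String codecClassName) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(OUTDIR, outputDir);
        conf.put(COMPRESS, "true");
        conf.put(COMPRESS_CODEC, codecClassName);
        conf.put(COMPRESS_TYPE, "BLOCK"); // one of NONE, RECORD, BLOCK
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical output directory and codec choice, for illustration only.
        compressedOutput("/tmp/wordcount-out", "org.apache.hadoop.io.compress.GzipCodec")
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```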
-
Constructor Details
-
FileOutputFormat
public FileOutputFormat()
-
-
Method Details
-
setCompressOutput
Set whether the output of the job is compressed.
Parameters:
job - the job to modify
compress - should the output of the job be compressed?
-
getCompressOutput
Is the job output compressed?
Parameters:
job - the Job to look in
Returns:
true if the job output should be compressed, false otherwise
-
setOutputCompressorClass
Set the CompressionCodec to be used to compress job outputs.
Parameters:
job - the job to modify
codecClass - the CompressionCodec to be used to compress the job outputs
-
getOutputCompressorClass
public static Class<? extends CompressionCodec> getOutputCompressorClass(JobContext job, Class<? extends CompressionCodec> defaultValue)
Get the CompressionCodec for compressing the job outputs.
Parameters:
job - the Job to look in
defaultValue - the CompressionCodec to return if not set
Returns:
the CompressionCodec to be used to compress the job outputs
Throws:
IllegalArgumentException - if the class was specified, but not found
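The contract described for getOutputCompressorClass (return the default when nothing is set, fail with IllegalArgumentException when a class is named but cannot be found) can be modelled with plain JDK classes. This is a sketch of that contract only, not the library's code: `CodecLookupSketch` and `resolveCodec` are hypothetical names, a `Map` stands in for the job configuration, and `java.util.zip` classes stand in for codec classes.

```java
import java.util.HashMap;
import java.util.Map;

class CodecLookupSketch {
    static final String COMPRESS_CODEC = "mapreduce.output.fileoutputformat.compress.codec";

    /**
     * Returns defaultValue when no codec name is configured; otherwise loads the
     * named class and throws IllegalArgumentException if it cannot be found.
     */
    static Class<?> resolveCodec(Map<String, String> conf, Class<?> defaultValue) {
        String name = conf.get(COMPRESS_CODEC);
        if (name == null) {
            return defaultValue;
        }
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Compression codec " + name + " was not found.", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Nothing configured yet, so the default comes back.
        System.out.println(resolveCodec(conf, java.util.zip.DeflaterOutputStream.class));
        // A configured class name takes precedence over the default.
        conf.put(COMPRESS_CODEC, "java.util.zip.GZIPOutputStream");
        System.out.println(resolveCodec(conf, java.util.zip.DeflaterOutputStream.class));
    }
}
```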
-
getRecordWriter
public abstract RecordWriter<K,V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException
Description copied from class: OutputFormat
Get the RecordWriter for the given task.
Specified by:
getRecordWriter in class OutputFormat<K,V>
Parameters:
job - the information about the current task.
Returns:
a RecordWriter to write the output for the job.
Throws:
IOException
InterruptedException
-
checkOutputSpecs
Description copied from class: OutputFormat
Check for validity of the output-specification for the job.
This validates the output specification when a job is submitted. Typically it checks that the output does not already exist and throws an exception if it does, so that existing output is not overwritten.
Implementations that write to filesystems that support delegation tokens usually collect the tokens for the destination path(s) and attach them to the job context's JobConf.
Specified by:
checkOutputSpecs in class OutputFormat<K,V>
Parameters:
job - information about the job
Throws:
IOException - when output should not be attempted
FileAlreadyExistsException
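The "refuse to overwrite existing output" check described above can be sketched against the local filesystem with `java.nio.file`. This is a model under stated assumptions rather than the actual implementation: `OutputSpecCheckSketch` is a hypothetical class, the JDK's `java.nio.file.FileAlreadyExistsException` stands in for the exception named in the Throws section, and reporting an unset directory as a plain `IOException` is a choice made for this example.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputSpecCheckSketch {
    /** Rejects an unset or pre-existing output directory before any task runs. */
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                outputDir.toString(), null, "Output directory already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("already-there");
        try {
            checkOutputSpecs(existing);
        } catch (FileAlreadyExistsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // A path that does not exist yet passes the check.
        checkOutputSpecs(existing.resolve("fresh-output"));
        System.out.println("accepted: " + existing.resolve("fresh-output"));
    }
}
```

Running the check at submission time, before any task starts, means a mistyped or reused output directory fails fast instead of after work has been done.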
-
setOutputPath
Set the Path of the output directory for the map-reduce job.
Parameters:
job - The job to modify
outputDir - the Path of the output directory for the map-reduce job.
-
getOutputPath
Get the Path to the output directory for the map-reduce job.
Returns:
the Path to the output directory for the map-reduce job.
-
getWorkOutputPath
public static Path getWorkOutputPath(TaskInputOutputContext<?, ?, ?, ?> context) throws IOException, InterruptedException
Get the Path to the task's temporary output directory for the map-reduce job.
Tasks' Side-Effect Files
Some applications need to create/write-to side-files, which differ from the actual job-outputs.
In such cases there could be issues with 2 instances of the same TIP (task-in-progress) running simultaneously, e.g. speculative tasks, and trying to open/write-to the same file (path) on HDFS. Hence the application-writer will have to pick unique names per task-attempt (e.g. using the attemptid, say attempt_200709221812_0001_m_000000_0), not just per TIP.
To get around this, the Map-Reduce framework helps the application-writer out by maintaining a special ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} sub-directory for each task-attempt on HDFS where the output of the task-attempt goes. On successful completion of the task-attempt, the files in ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} (only) are promoted to ${mapreduce.output.fileoutputformat.outputdir}. The framework discards the sub-directory of unsuccessful task-attempts. This is completely transparent to the application.
The application-writer can take advantage of this by creating any side-files required in the work directory during execution of the task, i.e. via getWorkOutputPath(TaskInputOutputContext), and the framework will move them out similarly, so there is no need to pick unique paths per task-attempt.
The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to HDFS.
Returns:
the Path to the task's temporary output directory for the map-reduce job.
Throws:
IOException
InterruptedException
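The promote-or-discard behaviour described for side-effect files can be modelled on the local filesystem. The sketch below is not the framework's committer: `SideEffectFilesSketch`, `workOutputPath`, `commit`, and `abort` are hypothetical names, and a temporary local directory stands in for the HDFS output directory. It shows two attempts of the same task writing an identically named side-file without colliding, with only the successful attempt's file reaching the output directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SideEffectFilesSketch {
    /** Per-attempt work directory: ${outputdir}/_temporary/_${taskid}. */
    static Path workOutputPath(Path outputDir, String attemptId) {
        return outputDir.resolve("_temporary").resolve("_" + attemptId);
    }

    /** Lists a directory eagerly so entries can be moved or deleted afterwards. */
    private static List<Path> list(Path dir) throws IOException {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Successful attempt: promote its files into the output directory. */
    static void commit(Path outputDir, String attemptId) throws IOException {
        Path work = workOutputPath(outputDir, attemptId);
        for (Path file : list(work)) {
            Files.move(file, outputDir.resolve(file.getFileName()));
        }
        Files.delete(work);
    }

    /** Unsuccessful attempt: discard its work directory and the files in it. */
    static void abort(Path outputDir, String attemptId) throws IOException {
        Path work = workOutputPath(outputDir, attemptId);
        for (Path file : list(work)) {
            Files.delete(file);
        }
        Files.delete(work);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        String first = "attempt_200709221812_0001_m_000000_0";
        String second = "attempt_200709221812_0001_m_000000_1"; // speculative duplicate
        for (String attempt : new String[] {first, second}) {
            Path work = Files.createDirectories(workOutputPath(out, attempt));
            // Both attempts use the same side-file name; their directories differ.
            Files.write(work.resolve("side.txt"), attempt.getBytes(StandardCharsets.UTF_8));
        }
        commit(out, first);
        abort(out, second);
        System.out.println(new String(
            Files.readAllBytes(out.resolve("side.txt")), StandardCharsets.UTF_8));
    }
}
```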
-
getPathForWorkFile
public static Path getPathForWorkFile(TaskInputOutputContext<?, ?, ?, ?> context, String name, String extension) throws IOException, InterruptedException
Helper function to generate a Path for a file that is unique for the task within the job output directory.
The path can be used to create custom files from within the map and reduce tasks. The path name will be unique for each task. The path parent will be the job output directory.
This method uses the getUniqueFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String, java.lang.String) method to make the file name unique for the task.
Parameters:
context - the context for the task.
name - the name for the file.
extension - the extension for the file
Returns:
a unique path across all tasks of the job.
Throws:
IOException
InterruptedException
-
getUniqueFile
Generate a unique filename, based on the task id, name, and extension.
Parameters:
context - the task that is calling this
name - the base filename
extension - the filename extension
Returns:
a string like $name-[mrsct]-$id$extension
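To make the $name-[mrsct]-$id$extension pattern concrete, here is a small JDK-only sketch that assembles such a name from its parts. `UniqueFileNameSketch` and its parameters are hypothetical, and rendering the id as a zero-padded five-digit number is an assumption of this example; the description above only specifies the overall shape of the string.

```java
import java.util.Locale;

class UniqueFileNameSketch {
    /**
     * Builds a name of the form $name-[mrsct]-$id$extension.
     *
     * @param name      base filename, e.g. "part"
     * @param taskType  single character identifying the task type, e.g. 'm' or 'r'
     * @param id        numeric task id (assumed here to be zero-padded to five digits)
     * @param extension filename extension including the dot, or "" for none
     */
    static String uniqueFile(String name, char taskType, int id, String extension) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s-%c-%05d%s", name, taskType, id, extension);
    }

    public static void main(String[] args) {
        System.out.println(uniqueFile("part", 'r', 0, ""));
        System.out.println(uniqueFile("side", 'm', 12, ".txt"));
    }
}
```

Because the task type and id are part of the name, two different tasks that pass the same base name and extension still get distinct file names.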
-
getDefaultWorkFile
Get the default path and filename for the output format.
Parameters:
context - the task context
extension - an extension to add to the filename
Returns:
a full path $output/_temporary/$taskid/part-[mr]-$id
Throws:
IOException
-
getOutputName
Get the base output name for the output file.
-
setOutputName
Set the base output name for output file to be created.
-
getOutputCommitter
Description copied from class: OutputFormat
Get the output committer for this output format. This is responsible for ensuring the output is committed correctly.
Specified by:
getOutputCommitter in class OutputFormat<K,V>
Parameters:
context - the task context
Returns:
an output committer
Throws:
IOException
-