FileOutputFormat (Apache Hadoop Main 3.3.1 API)

java.lang.Object
- org.apache.hadoop.mapreduce.OutputFormat<K,V>
- - org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<K,V>

Direct Known Subclasses:

MapFileOutputFormat, SequenceFileOutputFormat, TextOutputFormat
```
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class FileOutputFormat<K,V>
extends OutputFormat<K,V>
```
A base class for OutputFormats that write to FileSystems.

Field Summary

Fields
Modifier and Type	Field and Description
`protected static String`	`BASE_OUTPUT_NAME`
`static String`	`COMPRESS` Configuration option: should output be compressed? "mapreduce.output.fileoutputformat.compress".
`static String`	`COMPRESS_CODEC` If compression is enabled, name of codec: "mapreduce.output.fileoutputformat.compress.codec".
`static String`	`COMPRESS_TYPE` Type of compression "mapreduce.output.fileoutputformat.compress.type": NONE, RECORD, BLOCK.
`static String`	`OUTDIR` Destination directory of work: "mapreduce.output.fileoutputformat.outputdir".
`protected static String`	`PART`

Constructor Summary

Constructors
Constructor and Description
`FileOutputFormat()`

Method Summary

Modifier and Type	Method and Description
`void`	`checkOutputSpecs(JobContext job)` Check for validity of the output-specification for the job.
`static boolean`	`getCompressOutput(JobContext job)` Is the job output compressed?
`Path`	`getDefaultWorkFile(TaskAttemptContext context, String extension)` Get the default path and filename for the output format.
`OutputCommitter`	`getOutputCommitter(TaskAttemptContext context)` Get the output committer for this output format.
`static Class<? extends CompressionCodec>`	`getOutputCompressorClass(JobContext job, Class<? extends CompressionCodec> defaultValue)` Get the `CompressionCodec` for compressing the job outputs.
`protected static String`	`getOutputName(JobContext job)` Get the base output name for the output file.
`static Path`	`getOutputPath(JobContext job)` Get the `Path` to the output directory for the map-reduce job.
`static Path`	`getPathForWorkFile(TaskInputOutputContext<?,?,?,?> context, String name, String extension)` Helper function to generate a `Path` for a file that is unique for the task within the job output directory.
`abstract RecordWriter<K,V>`	`getRecordWriter(TaskAttemptContext job)` Get the `RecordWriter` for the given task.
`static String`	`getUniqueFile(TaskAttemptContext context, String name, String extension)` Generate a unique filename, based on the task id, name, and extension.
`static Path`	`getWorkOutputPath(TaskInputOutputContext<?,?,?,?> context)` Get the `Path` to the task's temporary output directory for the map-reduce job.
`static void`	`setCompressOutput(Job job, boolean compress)` Set whether the output of the job is compressed.
`static void`	`setOutputCompressorClass(Job job, Class<? extends CompressionCodec> codecClass)` Set the `CompressionCodec` to be used to compress job outputs.
`protected static void`	`setOutputName(JobContext job, String name)` Set the base output name for output file to be created.
`static void`	`setOutputPath(Job job, Path outputDir)` Set the `Path` of the output directory for the map-reduce job.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - BASE_OUTPUT_NAME
```
protected static final String BASE_OUTPUT_NAME
```
    See Also:
    
    Constant Field Values
  - PART
```
protected static final String PART
```
    See Also:
    
    Constant Field Values
  - COMPRESS
```
public static final String COMPRESS
```
    Configuration option: should output be compressed? "mapreduce.output.fileoutputformat.compress".
    
    See Also:
    
    Constant Field Values
  - COMPRESS_CODEC
```
public static final String COMPRESS_CODEC
```
    If compression is enabled, name of codec: "mapreduce.output.fileoutputformat.compress.codec".
    
    See Also:
    
    Constant Field Values
  - COMPRESS_TYPE
```
public static final String COMPRESS_TYPE
```
    Type of compression "mapreduce.output.fileoutputformat.compress.type": NONE, RECORD, BLOCK. Generally only used in SequenceFileOutputFormat.
    
    See Also:
    
    Constant Field Values
  - OUTDIR
```
public static final String OUTDIR
```
    Destination directory of work: "mapreduce.output.fileoutputformat.outputdir".
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - FileOutputFormat
```
public FileOutputFormat()
```
- Method Detail
  - setCompressOutput
```
public static void setCompressOutput(Job job,
                                     boolean compress)
```
    Set whether the output of the job is compressed.
    
    Parameters:
    
    job - the job to modify
    
    compress - should the output of the job be compressed?
  - getCompressOutput
```
public static boolean getCompressOutput(JobContext job)
```
    Is the job output compressed?
    
    Parameters:
    
    job - the Job to look in
    
    Returns:
    
    true if the job output should be compressed, false otherwise
  - setOutputCompressorClass
```
public static void setOutputCompressorClass(Job job,
                                            Class<? extends CompressionCodec> codecClass)
```
    Set the CompressionCodec to be used to compress job outputs.
    
    Parameters:
    
    job - the job to modify
    
    codecClass - the CompressionCodec to be used to compress the job outputs
  - getOutputCompressorClass
```
public static Class<? extends CompressionCodec> getOutputCompressorClass(JobContext job,
                                                                         Class<? extends CompressionCodec> defaultValue)
```
    Get the CompressionCodec for compressing the job outputs.
    
    Parameters:
    
    job - the Job to look in
    
    defaultValue - the CompressionCodec to return if not set
    
    Returns:
    
    the CompressionCodec to be used to compress the job outputs
    
    Throws:
    
    IllegalArgumentException - if the class was specified, but not found
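    The lookup contract described above (return the default when nothing is configured, fail with IllegalArgumentException when a configured class cannot be loaded) can be sketched in plain Java. This is an illustration under stated assumptions, not Hadoop's implementation: the class `CodecLookupSketch`, the `lookup` method, and the use of a `Map` in place of the job configuration are all hypothetical.

```java
import java.util.Map;

/** Illustrative only: a Map stands in for the job configuration. */
final class CodecLookupSketch {
    /** Same key as the COMPRESS_CODEC constant documented on this page. */
    static final String COMPRESS_CODEC =
            "mapreduce.output.fileoutputformat.compress.codec";

    private CodecLookupSketch() {}

    /**
     * Returns the class named under COMPRESS_CODEC, or defaultValue when the
     * key is absent. A configured name that cannot be loaded is an error.
     */
    static Class<?> lookup(Map<String, String> conf, Class<?> defaultValue) {
        String name = conf.get(COMPRESS_CODEC);
        if (name == null) {
            return defaultValue; // nothing configured: fall back to the default
        }
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            // Specified but not found, as in the Throws clause above.
            throw new IllegalArgumentException(
                    "Compression codec " + name + " was not found.", e);
        }
    }
}
```

    In this sketch an absent key is not an error, which is why the caller always supplies a default.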
  - getRecordWriter
```
public abstract RecordWriter<K,V> getRecordWriter(TaskAttemptContext job)
                                           throws IOException,
                                                  InterruptedException
```
    Description copied from class: OutputFormat
    
    Get the RecordWriter for the given task.
    
    Specified by:
    
    getRecordWriter in class OutputFormat<K,V>
    
    Parameters:
    
    job - the information about the current task.
    
    Returns:
    
    a RecordWriter to write the output for the job.
    
    Throws:
    
    IOException
    
    InterruptedException
  - checkOutputSpecs
```
public void checkOutputSpecs(JobContext job)
                      throws FileAlreadyExistsException,
                             IOException
```
    Description copied from class: OutputFormat
    
    Check for validity of the output-specification for the job.
    This validates the output specification for the job when the job is submitted. Typically it checks that the output does not already exist and throws an exception if it does, so that existing output is not overwritten.
    Implementations which write to filesystems which support delegation tokens usually collect the tokens for the destination path(s) and attach them to the job context's JobConf.
    
    Specified by:
    
    checkOutputSpecs in class OutputFormat<K,V>
    
    Parameters:
    
    job - information about the job
    
    Throws:
    
    IOException - when output should not be attempted
    
    FileAlreadyExistsException
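    The "refuse to overwrite" check described above can be sketched against the local filesystem. This is an analogy only: `OutputSpecCheckSketch` is a hypothetical class, it uses `java.nio.file` rather than Hadoop's `FileSystem`, and the exception types are the JDK's, not Hadoop's.

```java
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: local-filesystem version of the pre-submission check. */
final class OutputSpecCheckSketch {
    private OutputSpecCheckSketch() {}

    /**
     * Fails fast when the output directory is unset or already exists,
     * so that a new job never overwrites earlier output.
     */
    static void checkOutputSpecs(Path outputDir) throws FileAlreadyExistsException {
        if (outputDir == null) {
            throw new IllegalArgumentException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                    "Output directory " + outputDir + " already exists");
        }
    }
}
```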
  - setOutputPath
```
public static void setOutputPath(Job job,
                                 Path outputDir)
```
    Set the Path of the output directory for the map-reduce job.
    
    Parameters:
    
    job - The job to modify
    
    outputDir - the Path of the output directory for the map-reduce job.
  - getOutputPath
```
public static Path getOutputPath(JobContext job)
```
    Get the Path to the output directory for the map-reduce job.
    
    Returns:
    
    the Path to the output directory for the map-reduce job.
    
    See Also:
    
    getWorkOutputPath(TaskInputOutputContext)
  - getWorkOutputPath
```
public static Path getWorkOutputPath(TaskInputOutputContext<?,?,?,?> context)
                              throws IOException,
                                     InterruptedException
```
    Get the Path to the task's temporary output directory for the map-reduce job.
    
    Tasks' Side-Effect Files
    
    Some applications need to create or write to side-files, which differ from the actual job outputs.
    In such cases two instances of the same TIP (task in progress) running simultaneously, e.g. speculative tasks, could try to open or write to the same file (path) on HDFS. The application-writer would therefore have to pick names that are unique per task-attempt (e.g. using the attemptid, say attempt_200709221812_0001_m_000000_0), not just per TIP.
    
    To get around this, the Map-Reduce framework maintains a special ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} sub-directory on HDFS for each task-attempt, and the output of the task-attempt goes there. On successful completion of the task-attempt, the files in ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} (only) are promoted to ${mapreduce.output.fileoutputformat.outputdir}. The framework discards the sub-directories of unsuccessful task-attempts. This is completely transparent to the application.
    
    The application-writer can take advantage of this by creating any required side-files in the work directory during execution of the task, i.e. via getWorkOutputPath(TaskInputOutputContext). The framework moves them out in the same way, so the application-writer does not have to pick unique paths per task-attempt.
    
    The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to HDFS.
    
    Returns:
    
    the Path to the task's temporary output directory for the map-reduce job.
    
    Throws:
    
    IOException
    
    InterruptedException
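    The promote-or-discard behaviour described for side-effect files can be sketched on the local filesystem. This is an analogy under stated assumptions, not Hadoop's committer: `SideFileCommitSketch` and its methods are hypothetical, and `java.nio.file` stands in for HDFS. Two attempts of the same task write a file with the same name, each in its own `_temporary/_<attemptid>` directory, and only the successful attempt's file reaches the output directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Local-filesystem analogy only; all names here are hypothetical. */
final class SideFileCommitSketch {
    private SideFileCommitSketch() {}

    /** Returns (and creates) the private work directory of one task attempt. */
    static Path workDir(Path outputDir, String attemptId) {
        try {
            return Files.createDirectories(
                    outputDir.resolve("_temporary").resolve("_" + attemptId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Promotes the attempt's files to outputDir, or deletes them, then removes the work directory. */
    static void finish(Path outputDir, String attemptId, boolean promote) {
        Path work = workDir(outputDir, attemptId);
        try {
            List<Path> files;
            try (Stream<Path> listing = Files.list(work)) {
                files = listing.collect(Collectors.toList());
            }
            for (Path file : files) {
                if (promote) {
                    Files.move(file, outputDir.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                } else {
                    Files.delete(file);
                }
            }
            Files.delete(work);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Two attempts write the same side-file name; only the successful one is promoted. */
    static String runDemo() {
        try {
            Path out = Files.createTempDirectory("job-output");
            String winner = "attempt_200709221812_0001_m_000000_0";
            String loser = "attempt_200709221812_0001_m_000000_1";
            Files.write(workDir(out, winner).resolve("side.txt"),
                    "from attempt 0".getBytes(StandardCharsets.UTF_8));
            Files.write(workDir(out, loser).resolve("side.txt"),
                    "from attempt 1".getBytes(StandardCharsets.UTF_8));
            finish(out, loser, false); // unsuccessful attempt: discard
            finish(out, winner, true); // successful attempt: promote
            return new String(Files.readAllBytes(out.resolve("side.txt")),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Because each attempt writes only inside its own directory, neither attempt needs to embed the attempt id in the file name itself.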
  - getPathForWorkFile
```
public static Path getPathForWorkFile(TaskInputOutputContext<?,?,?,?> context,
                                      String name,
                                      String extension)
                               throws IOException,
                                      InterruptedException
```
    Helper function to generate a Path for a file that is unique for the task within the job output directory.
    The path can be used to create custom files from within the map and reduce tasks. The path name will be unique for each task. The path parent will be the job output directory.
    This method uses the getUniqueFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String, java.lang.String) method to make the file name unique for the task.
    
    Parameters:
    
    context - the context for the task.
    
    name - the name for the file.
    
    extension - the extension for the file
    
    Returns:
    
    a unique path across all tasks of the job.
    
    Throws:
    
    IOException
    
    InterruptedException
  - getUniqueFile
```
public static String getUniqueFile(TaskAttemptContext context,
                                   String name,
                                   String extension)
```
    Generate a unique filename, based on the task id, name, and extension.
    
    Parameters:
    
    context - the task that is calling this
    
    name - the base filename
    
    extension - the filename extension
    
    Returns:
    
    a string like $name-[mrsct]-$id$extension
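    The `$name-[mrsct]-$id$extension` pattern can be illustrated with plain Java. This is a sketch, not Hadoop's implementation: the class `UniqueFileNameSketch` and its parameters are hypothetical, and the five-digit zero padding of the id is an assumption chosen to match familiar output names such as part-r-00000.

```java
import java.util.Locale;

/** Illustrative only: builds names of the form $name-[mrsct]-$id$extension. */
final class UniqueFileNameSketch {
    private UniqueFileNameSketch() {}

    /**
     * @param name         base filename, e.g. "part"
     * @param taskTypeChar one of m, r, s, c, t (the task type character)
     * @param partition    numeric id of the task within the job
     * @param extension    filename extension including any leading dot; may be empty
     */
    static String uniqueFile(String name, char taskTypeChar, int partition,
                             String extension) {
        if ("mrsct".indexOf(taskTypeChar) < 0) {
            throw new IllegalArgumentException("unknown task type: " + taskTypeChar);
        }
        // Assumption: the id is zero-padded to at least five digits.
        return String.format(Locale.ROOT, "%s-%c-%05d%s",
                name, taskTypeChar, partition, extension);
    }
}
```

    The name depends on the task, not the attempt, so two attempts of the same task produce the same file name and rely on separate work directories to avoid collisions.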
  - getDefaultWorkFile
```
public Path getDefaultWorkFile(TaskAttemptContext context,
                               String extension)
                        throws IOException
```
    Get the default path and filename for the output format.
    
    Parameters:
    
    context - the task context
    
    extension - an extension to add to the filename
    
    Returns:
    
    a full path $output/_temporary/$taskid/part-[mr]-$id
    
    Throws:
    
    IOException
  - getOutputName
```
protected static String getOutputName(JobContext job)
```
    Get the base output name for the output file.
  - setOutputName
```
protected static void setOutputName(JobContext job,
                                    String name)
```
    Set the base output name for output file to be created.
  - getOutputCommitter
```
public OutputCommitter getOutputCommitter(TaskAttemptContext context)
                                   throws IOException
```
    Description copied from class: OutputFormat
    
    Get the output committer for this output format. This is responsible for ensuring the output is committed correctly.
    
    Specified by:
    
    getOutputCommitter in class OutputFormat<K,V>
    
    Parameters:
    
    context - the task context
    
    Returns:
    
    an output committer
    
    Throws:
    
    IOException
