java.lang.Object

org.apache.hadoop.mapreduce.OutputFormat<K,V>

org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<K,V>

Direct Known Subclasses:: MapFileOutputFormat, SequenceFileOutputFormat, TextOutputFormat

@Public @Stable public abstract class FileOutputFormat<K,V> extends OutputFormat<K,V>

A base class for OutputFormats that read from FileSystems.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static enum

FileOutputFormat.Counter

Deprecated.
Field Summary

Fields

Modifier and Type

Field

Description

protected static final String

BASE_OUTPUT_NAME

static final String

COMPRESS

Configuration option: should output be compressed?

static final String

COMPRESS_CODEC

If compression is enabled, name of codec: "mapreduce.output.fileoutputformat.compress.codec".

static final String

COMPRESS_TYPE

Type of compression "mapreduce.output.fileoutputformat.compress.type": NONE, RECORD, BLOCK.

static final String

OUTDIR

Destination directory of work: "mapreduce.output.fileoutputformat.outputdir".

protected static final String

PART
Constructor Summary

Constructors

Constructor

Description

FileOutputFormat()
Method Summary

Modifier and Type

Method

Description

void

checkOutputSpecs(JobContext job)

Check for validity of the output-specification for the job.

static boolean

getCompressOutput(JobContext job)

Is the job output compressed?

Path

getDefaultWorkFile(TaskAttemptContext context, String extension)

Get the default path and filename for the output format.

OutputCommitter

getOutputCommitter(TaskAttemptContext context)

Get the output committer for this output format.

static Class<? extends CompressionCodec>

getOutputCompressorClass(JobContext job, Class<? extends CompressionCodec> defaultValue)

Get the CompressionCodec for compressing the job outputs.

protected static String

getOutputName(JobContext job)

Get the base output name for the output file.

static Path

getOutputPath(JobContext job)

Get the Path to the output directory for the map-reduce job.

static Path

getPathForWorkFile(TaskInputOutputContext<?,?,?,?> context, String name, String extension)

Helper function to generate a Path for a file that is unique for the task within the job output directory.

abstract RecordWriter<K,V>

getRecordWriter(TaskAttemptContext job)

Get the RecordWriter for the given task.

static String

getUniqueFile(TaskAttemptContext context, String name, String extension)

Generate a unique filename, based on the task id, name, and extension

static Path

getWorkOutputPath(TaskInputOutputContext<?,?,?,?> context)

Get the Path to the task's temporary output directory for the map-reduce job Tasks' Side-Effect Files

static void

setCompressOutput(Job job, boolean compress)

Set whether the output of the job is compressed.

static void

setOutputCompressorClass(Job job, Class<? extends CompressionCodec> codecClass)

Set the CompressionCodec to be used to compress job outputs.

protected static void

setOutputName(JobContext job, String name)

Set the base output name for output file to be created.

static void

setOutputPath(Job job, Path outputDir)

Set the Path of the output directory for the map-reduce job.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- BASE_OUTPUT_NAME
  
  protected static final String BASE_OUTPUT_NAME
  See Also:
  
  Constant Field Values
- PART
  
  protected static final String PART
  See Also:
  
  Constant Field Values
- COMPRESS
  
  public static final String COMPRESS
  
  Configuration option: should output be compressed? "mapreduce.output.fileoutputformat.compress".
  See Also:
  
  Constant Field Values
- COMPRESS_CODEC
  
  public static final String COMPRESS_CODEC
  
  If compression is enabled, name of codec: "mapreduce.output.fileoutputformat.compress.codec".
  See Also:
  
  Constant Field Values
- COMPRESS_TYPE
  
  public static final String COMPRESS_TYPE
  
  Type of compression "mapreduce.output.fileoutputformat.compress.type": NONE, RECORD, BLOCK. Generally only used in SequenceFileOutputFormat.
  See Also:
  
  Constant Field Values
- OUTDIR
  
  public static final String OUTDIR
  
  Destination directory of work: "mapreduce.output.fileoutputformat.outputdir".
  See Also:
  
  Constant Field Values
Constructor Details
- FileOutputFormat
  
  public FileOutputFormat()
Method Details
- setCompressOutput
  
  public static void setCompressOutput(Job job, boolean compress)
  
  Set whether the output of the job is compressed.
  
  Parameters:
  
  job - the job to modify
  
  compress - should the output of the job be compressed?
- getCompressOutput
  
  public static boolean getCompressOutput(JobContext job)
  
  Is the job output compressed?
  
  Parameters:
  
  job - the Job to look in
  
  Returns:
  
  true if the job output should be compressed, false otherwise
- setOutputCompressorClass
  
  public static void setOutputCompressorClass(Job job, Class<? extends CompressionCodec> codecClass)
  
  Set the CompressionCodec to be used to compress job outputs.
  
  Parameters:
  
  job - the job to modify
  
  codecClass - the CompressionCodec to be used to compress the job outputs
- getOutputCompressorClass
  
  public static Class<? extends CompressionCodec> getOutputCompressorClass(JobContext job, Class<? extends CompressionCodec> defaultValue)
  
  Get the CompressionCodec for compressing the job outputs.
  
  Parameters:
  
  job - the Job to look in
  
  defaultValue - the CompressionCodec to return if not set
  
  Returns:
  
  the CompressionCodec to be used to compress the job outputs
  
  Throws:
  
  IllegalArgumentException - if the class was specified, but not found
- getRecordWriter
  
  public abstract RecordWriter<K,V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException
  
  Description copied from class: OutputFormat
  
  Get the RecordWriter for the given task.
  
  Specified by:
  
  getRecordWriter in class OutputFormat<K,V>
  
  Parameters:
  
  job - the information about the current task.
  
  Returns:
  
  a RecordWriter to write the output for the job.
  
  Throws:
  
  IOException
  
  InterruptedException
- checkOutputSpecs
  
  public void checkOutputSpecs(JobContext job) throws FileAlreadyExistsException, IOException
  
  Description copied from class: OutputFormat
  
  Check for validity of the output-specification for the job.
  This is to validate the output specification for the job when it is a job is submitted. Typically checks that it does not already exist, throwing an exception when it already exists, so that output is not overwritten.
  Implementations which write to filesystems which support delegation tokens usually collect the tokens for the destination path(s) and attach them to the job context's JobConf.
  
  Specified by:
  
  checkOutputSpecs in class OutputFormat<K,V>
  
  Parameters:
  
  job - information about the job
  
  Throws:
  
  IOException - when output should not be attempted
  
  FileAlreadyExistsException
- setOutputPath
  
  public static void setOutputPath(Job job, Path outputDir)
  
  Set the Path of the output directory for the map-reduce job.
  
  Parameters:
  
  job - The job to modify
  
  outputDir - the Path of the output directory for the map-reduce job.
- getOutputPath
  
  public static Path getOutputPath(JobContext job)
  
  Get the Path to the output directory for the map-reduce job.
  Returns:
  
  the Path to the output directory for the map-reduce job.
  
  See Also:
  
  getWorkOutputPath(TaskInputOutputContext)
- getWorkOutputPath
  
  public static Path getWorkOutputPath(TaskInputOutputContext<?,?,?,?> context) throws IOException, InterruptedException
  
  Get the Path to the task's temporary output directory for the map-reduce job Tasks' Side-Effect Files
  Some applications need to create/write-to side-files, which differ from the actual job-outputs.
  In such cases there could be issues with 2 instances of the same TIP (running simultaneously e.g. speculative tasks) trying to open/write-to the same file (path) on HDFS. Hence the application-writer will have to pick unique names per task-attempt (e.g. using the attemptid, say attempt_200709221812_0001_m_000000_0), not just per TIP.
  
  To get around this the Map-Reduce framework helps the application-writer out by maintaining a special ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} sub-directory for each task-attempt on HDFS where the output of the task-attempt goes. On successful completion of the task-attempt the files in the ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} (only) are promoted to ${mapreduce.output.fileoutputformat.outputdir}. Of course, the framework discards the sub-directory of unsuccessful task-attempts. This is completely transparent to the application.
  
  The application-writer can take advantage of this by creating any side-files required in a work directory during execution of his task i.e. via getWorkOutputPath(TaskInputOutputContext), and the framework will move them out similarly - thus she doesn't have to pick unique paths per task-attempt.
  
  The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to HDFS.
  
  Returns:
  
  the Path to the task's temporary output directory for the map-reduce job.
  
  Throws:
  
  IOException
  
  InterruptedException
- getPathForWorkFile
  
  public static Path getPathForWorkFile(TaskInputOutputContext<?,?,?,?> context, String name, String extension) throws IOException, InterruptedException
  
  Helper function to generate a Path for a file that is unique for the task within the job output directory.
  The path can be used to create custom files from within the map and reduce tasks. The path name will be unique for each task. The path parent will be the job output directory.
  ls
  This method uses the getUniqueFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String, java.lang.String) method to make the file name unique for the task.
  
  Parameters:
  
  context - the context for the task.
  
  name - the name for the file.
  
  extension - the extension for the file
  
  Returns:
  
  a unique path accross all tasks of the job.
  
  Throws:
  
  IOException
  
  InterruptedException
- getUniqueFile
  
  public static String getUniqueFile(TaskAttemptContext context, String name, String extension)
  
  Generate a unique filename, based on the task id, name, and extension
  
  Parameters:
  
  context - the task that is calling this
  
  name - the base filename
  
  extension - the filename extension
  
  Returns:
  
  a string like $name-[mrsct]-$id$extension
- getDefaultWorkFile
  
  public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException
  
  Get the default path and filename for the output format.
  
  Parameters:
  
  context - the task context
  
  extension - an extension to add to the filename
  
  Returns:
  
  a full path $output/_temporary/$taskid/part-[mr]-$id
  
  Throws:
  
  IOException
- getOutputName
  
  protected static String getOutputName(JobContext job)
  
  Get the base output name for the output file.
- setOutputName
  
  protected static void setOutputName(JobContext job, String name)
  
  Set the base output name for output file to be created.
- getOutputCommitter
  
  public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException
  
  Description copied from class: OutputFormat
  
  Get the output committer for this output format. This is responsible for ensuring the output is committed correctly.
  
  Specified by:
  
  getOutputCommitter in class OutputFormat<K,V>
  
  Parameters:
  
  context - the task context
  
  Returns:
  
  an output committer
  
  Throws:
  
  IOException

Class FileOutputFormat<K,V>

Nested Class Summary

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

BASE_OUTPUT_NAME

PART

COMPRESS

COMPRESS_CODEC

COMPRESS_TYPE

OUTDIR

Constructor Details

FileOutputFormat

Method Details

setCompressOutput

getCompressOutput

setOutputCompressorClass

getOutputCompressorClass

getRecordWriter

checkOutputSpecs

setOutputPath

getOutputPath

getWorkOutputPath

getPathForWorkFile

getUniqueFile

getDefaultWorkFile

getOutputName

setOutputName

getOutputCommitter