FileOutputFormat (Hadoop 1.0.4 API)

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.hadoop.mapreduce.lib.output
Class FileOutputFormat<K,V>

java.lang.Object
  org.apache.hadoop.mapreduce.OutputFormat<K,V>
      org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<K,V>

Direct Known Subclasses:: SequenceFileOutputFormat, TextOutputFormat

public abstract class FileOutputFormat<K,V>
extends OutputFormat<K,V>
extends OutputFormat<K,V>

A base class for OutputFormats that read from FileSystems.

Nested Class Summary
`static class`	`FileOutputFormat.Counter`

Field Summary
`protected static String`	`BASE_OUTPUT_NAME`
`protected static String`	`PART`

Constructor Summary
`FileOutputFormat()`

Method Summary
`void`	`checkOutputSpecs(JobContext job)` Check for validity of the output-specification for the job.
`static boolean`	`getCompressOutput(JobContext job)` Is the job output compressed?
`Path`	`getDefaultWorkFile(TaskAttemptContext context, String extension)` Get the default path and filename for the output format.
`OutputCommitter`	`getOutputCommitter(TaskAttemptContext context)` Get the output committer for this output format.
`static Class<? extends CompressionCodec>`	`getOutputCompressorClass(JobContext job, Class<? extends CompressionCodec> defaultValue)` Get the `CompressionCodec` for compressing the job outputs.
`protected static String`	`getOutputName(JobContext job)` Get the base output name for the output file.
`static Path`	`getOutputPath(JobContext job)` Get the `Path` to the output directory for the map-reduce job.
`static Path`	`getPathForWorkFile(TaskInputOutputContext<?,?,?,?> context, String name, String extension)` Helper function to generate a `Path` for a file that is unique for the task within the job output directory.
`abstract RecordWriter<K,V>`	`getRecordWriter(TaskAttemptContext job)` Get the `RecordWriter` for the given task.
`static String`	`getUniqueFile(TaskAttemptContext context, String name, String extension)` Generate a unique filename, based on the task id, name, and extension
`static Path`	`getWorkOutputPath(TaskInputOutputContext<?,?,?,?> context)` Get the `Path` to the task's temporary output directory for the map-reduce job Tasks' Side-Effect Files
`static void`	`setCompressOutput(Job job, boolean compress)` Set whether the output of the job is compressed.
`static void`	`setOutputCompressorClass(Job job, Class<? extends CompressionCodec> codecClass)` Set the `CompressionCodec` to be used to compress job outputs.
`protected static void`	`setOutputName(JobContext job, String name)` Set the base output name for output file to be created.
`static void`	`setOutputPath(Job job, Path outputDir)` Set the `Path` of the output directory for the map-reduce job.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

BASE_OUTPUT_NAME

protected static final String BASE_OUTPUT_NAME

See Also:: Constant Field Values

PART

protected static final String PART

See Also:: Constant Field Values

Constructor Detail

FileOutputFormat

public FileOutputFormat()

Method Detail

setCompressOutput

public static void setCompressOutput(Job job,
                                     boolean compress)

Set whether the output of the job is compressed.

Parameters:: job - the job to modify; compress - should the output of the job be compressed?

getCompressOutput

public static boolean getCompressOutput(JobContext job)

Is the job output compressed?

Parameters:: job - the Job to look in
Returns:: true if the job output should be compressed, false otherwise

setOutputCompressorClass

public static void setOutputCompressorClass(Job job,
                                            Class<? extends CompressionCodec> codecClass)

Set the CompressionCodec to be used to compress job outputs.

Parameters:: job - the job to modify; codecClass - the CompressionCodec to be used to compress the job outputs

getOutputCompressorClass

public static Class<? extends CompressionCodec> getOutputCompressorClass(JobContext job,
                                                                         Class<? extends CompressionCodec> defaultValue)

Get the CompressionCodec for compressing the job outputs.

Parameters:: job - the Job to look in; defaultValue - the CompressionCodec to return if not set
Returns:: the CompressionCodec to be used to compress the job outputs
Throws:: IllegalArgumentException - if the class was specified, but not found

getRecordWriter

public abstract RecordWriter<K,V> getRecordWriter(TaskAttemptContext job)
                                           throws IOException,
                                                  InterruptedException

Description copied from class: OutputFormat

Get the RecordWriter for the given task.

Specified by:: getRecordWriter in class OutputFormat<K,V>

Parameters:: job - the information about the current task.
Returns:: a RecordWriter to write the output for the job.
Throws:: IOException; InterruptedException

checkOutputSpecs

public void checkOutputSpecs(JobContext job)
                      throws FileAlreadyExistsException,
                             IOException

Description copied from class: OutputFormat

Check for validity of the output-specification for the job.

This is to validate the output specification for the job when it is a job is submitted. Typically checks that it does not already exist, throwing an exception when it already exists, so that output is not overwritten.

Specified by:: checkOutputSpecs in class OutputFormat<K,V>

Parameters:: job - information about the job
Throws:: IOException - when output should not be attempted; FileAlreadyExistsException

setOutputPath

public static void setOutputPath(Job job,
                                 Path outputDir)

Set the Path of the output directory for the map-reduce job.

Parameters:: job - The job to modify; outputDir - the Path of the output directory for the map-reduce job.

getOutputPath

public static Path getOutputPath(JobContext job)

Get the Path to the output directory for the map-reduce job.

Returns:: the Path to the output directory for the map-reduce job.
See Also:: getWorkOutputPath(TaskInputOutputContext)

getWorkOutputPath

public static Path getWorkOutputPath(TaskInputOutputContext<?,?,?,?> context)
                              throws IOException,
                                     InterruptedException

Get the Path to the task's temporary output directory for the map-reduce job

Tasks' Side-Effect Files

Some applications need to create/write-to side-files, which differ from the actual job-outputs.

In such cases there could be issues with 2 instances of the same TIP (running simultaneously e.g. speculative tasks) trying to open/write-to the same file (path) on HDFS. Hence the application-writer will have to pick unique names per task-attempt (e.g. using the attemptid, say attempt_200709221812_0001_m_000000_0), not just per TIP.

To get around this the Map-Reduce framework helps the application-writer out by maintaining a special ${mapred.output.dir}/_temporary/_${taskid} sub-directory for each task-attempt on HDFS where the output of the task-attempt goes. On successful completion of the task-attempt the files in the ${mapred.output.dir}/_temporary/_${taskid} (only) are promoted to ${mapred.output.dir}. Of course, the framework discards the sub-directory of unsuccessful task-attempts. This is completely transparent to the application.

The application-writer can take advantage of this by creating any side-files required in a work directory during execution of his task i.e. via getWorkOutputPath(TaskInputOutputContext), and the framework will move them out similarly - thus she doesn't have to pick unique paths per task-attempt.

The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to HDFS.

Returns:: the Path to the task's temporary output directory for the map-reduce job.
Throws:: IOException; InterruptedException

getPathForWorkFile

public static Path getPathForWorkFile(TaskInputOutputContext<?,?,?,?> context,
                                      String name,
                                      String extension)
                               throws IOException,
                                      InterruptedException

Helper function to generate a Path for a file that is unique for the task within the job output directory.

The path can be used to create custom files from within the map and reduce tasks. The path name will be unique for each task. The path parent will be the job output directory.

This method uses the getUniqueFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String, java.lang.String) method to make the file name unique for the task.

Parameters:: context - the context for the task.; name - the name for the file.; extension - the extension for the file
Returns:: a unique path accross all tasks of the job.
Throws:: IOException; InterruptedException

getUniqueFile

public static String getUniqueFile(TaskAttemptContext context,
                                   String name,
                                   String extension)

Generate a unique filename, based on the task id, name, and extension

Parameters:: context - the task that is calling this; name - the base filename; extension - the filename extension
Returns:: a string like $name-[mr]-$id$extension

getDefaultWorkFile

public Path getDefaultWorkFile(TaskAttemptContext context,
                               String extension)
                        throws IOException

Get the default path and filename for the output format.

Parameters:: context - the task context; extension - an extension to add to the filename
Returns:: a full path $output/_temporary/$taskid/part-[mr]-$id
Throws:: IOException

getOutputName

protected static String getOutputName(JobContext job)

Get the base output name for the output file.

setOutputName

protected static void setOutputName(JobContext job,
                                    String name)

Set the base output name for output file to be created.

getOutputCommitter

public OutputCommitter getOutputCommitter(TaskAttemptContext context)
                                   throws IOException

Description copied from class: OutputFormat

Get the output committer for this output format. This is responsible for ensuring the output is committed correctly.

Specified by:: getOutputCommitter in class OutputFormat<K,V>

Parameters:: context - the task context
Returns:: an output committer
Throws:: IOException

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.hadoop.mapreduce.lib.output Class FileOutputFormat<K,V>

Tasks' Side-Effect Files

BASE_OUTPUT_NAME

PART

FileOutputFormat

setCompressOutput

getCompressOutput

setOutputCompressorClass

getOutputCompressorClass

getRecordWriter

checkOutputSpecs

setOutputPath

getOutputPath

getWorkOutputPath

Tasks' Side-Effect Files

getPathForWorkFile

getUniqueFile

getDefaultWorkFile

getOutputName

setOutputName

getOutputCommitter

org.apache.hadoop.mapreduce.lib.output
Class FileOutputFormat<K,V>