org.apache.hadoop.mapreduce.lib.output
Class FileOutputCommitter

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputCommitter
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
Direct Known Subclasses:
PartialFileOutputCommitter

@InterfaceAudience.Public
@InterfaceStability.Stable
public class FileOutputCommitter
extends OutputCommitter

An OutputCommitter that commits files to the job output directory, i.e. ${mapreduce.output.fileoutputformat.outputdir}.


Field Summary
static String PENDING_DIR_NAME
          Name of directory where pending data is placed.
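static String SUCCEEDED_FILE_NAME
          Name of the marker file created in the job output directory when the job is committed successfully.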
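static String SUCCESSFUL_JOB_OUTPUT_DIR_MARKER
          Configuration key that controls whether the success marker file is written when the job is committed.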
protected static String TEMP_DIR_NAME
          Deprecated. 
 
Constructor Summary
FileOutputCommitter(Path outputPath, JobContext context)
          Create a file output committer.
FileOutputCommitter(Path outputPath, TaskAttemptContext context)
          Create a file output committer.
 
Method Summary
 void abortJob(JobContext context, org.apache.hadoop.mapreduce.JobStatus.State state)
          Delete the temporary directory, including all of the work directories.
 void abortTask(TaskAttemptContext context)
          Delete the work directory
 void cleanupJob(JobContext context)
          Deprecated. 
 void commitJob(JobContext context)
          The job has completed so move all committed tasks to the final output dir.
 void commitTask(TaskAttemptContext context)
          Move the files from the work directory to the job output directory
protected  Path getCommittedTaskPath(int appAttemptId, TaskAttemptContext context)
          Compute the path where the output of a committed task is stored until the entire job is committed for a specific application attempt.
 Path getCommittedTaskPath(TaskAttemptContext context)
          Compute the path where the output of a committed task is stored until the entire job is committed.
static Path getCommittedTaskPath(TaskAttemptContext context, Path out)
          Compute the path where the output of a committed task is stored until the entire job is committed.
protected  Path getJobAttemptPath(int appAttemptId)
          Compute the path where the output of a given job attempt will be placed.
 Path getJobAttemptPath(JobContext context)
          Compute the path where the output of a given job attempt will be placed.
static Path getJobAttemptPath(JobContext context, Path out)
          Compute the path where the output of a given job attempt will be placed.
 Path getTaskAttemptPath(TaskAttemptContext context)
          Compute the path where the output of a task attempt is stored until that task is committed.
static Path getTaskAttemptPath(TaskAttemptContext context, Path out)
          Compute the path where the output of a task attempt is stored until that task is committed.
 Path getWorkPath()
          Get the directory that the task should write results into.
 boolean isRecoverySupported()
          Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.
 boolean needsTaskCommit(TaskAttemptContext context)
          Did this task write any files in the work directory?
 void recoverTask(TaskAttemptContext context)
          Recover the task output.
 void setupJob(JobContext context)
          Create the temporary directory that is the root of all of the task work directories.
 void setupTask(TaskAttemptContext context)
          No task setup required.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PENDING_DIR_NAME

public static final String PENDING_DIR_NAME
Name of the directory where pending data, i.e. data that has not been committed yet, is placed.

See Also:
Constant Field Values

TEMP_DIR_NAME

@Deprecated
protected static final String TEMP_DIR_NAME
Deprecated. 
Temporary directory name. This static variable is retained for compatibility with M/R 1.x.

See Also:
Constant Field Values

SUCCEEDED_FILE_NAME

public static final String SUCCEEDED_FILE_NAME
See Also:
Constant Field Values

SUCCESSFUL_JOB_OUTPUT_DIR_MARKER

public static final String SUCCESSFUL_JOB_OUTPUT_DIR_MARKER
See Also:
Constant Field Values

Constructor Detail

FileOutputCommitter

public FileOutputCommitter(Path outputPath,
                           TaskAttemptContext context)
                    throws IOException
Create a file output committer.

Parameters:
outputPath - the job's output path, or null if you want the output committer to act as a no-op.
context - the task's context
Throws:
IOException

FileOutputCommitter

@InterfaceAudience.Private
public FileOutputCommitter(Path outputPath,
                           JobContext context)
                    throws IOException
Create a file output committer.

Parameters:
outputPath - the job's output path, or null if you want the output committer to act as a no-op.
context - the job's context
Throws:
IOException

Method Detail

getJobAttemptPath

public Path getJobAttemptPath(JobContext context)
Compute the path where the output of a given job attempt will be placed.

Parameters:
context - the context of the job. This is used to get the application attempt id.
Returns:
the path to store job attempt data.

getJobAttemptPath

public static Path getJobAttemptPath(JobContext context,
                                     Path out)
Compute the path where the output of a given job attempt will be placed.

Parameters:
context - the context of the job. This is used to get the application attempt id.
out - the output path to place these in.
Returns:
the path to store job attempt data.

getJobAttemptPath

protected Path getJobAttemptPath(int appAttemptId)
Compute the path where the output of a given job attempt will be placed.

Parameters:
appAttemptId - the ID of the application attempt for this job.
Returns:
the path to store job attempt data.
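
The three overloads above resolve to the same kind of location. As a minimal sketch, assuming the layout <output>/_temporary/<appAttemptId> used by the Hadoop 2.x implementation (this page does not state the layout or the value of PENDING_DIR_NAME), the computation can be modelled with java.nio.file.Path standing in for org.apache.hadoop.fs.Path. The class and helper names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class JobAttemptPathSketch {
    // Assumed value of FileOutputCommitter.PENDING_DIR_NAME; not stated on this page.
    static final String PENDING_DIR_NAME = "_temporary";

    // Hypothetical stand-in for getJobAttemptPath(int appAttemptId):
    // <out>/_temporary/<appAttemptId>
    static Path jobAttemptPath(Path out, int appAttemptId) {
        return out.resolve(PENDING_DIR_NAME).resolve(Integer.toString(appAttemptId));
    }

    public static void main(String[] args) {
        // Each application attempt gets its own subdirectory under the pending directory.
        System.out.println(jobAttemptPath(Paths.get("out"), 0));
        System.out.println(jobAttemptPath(Paths.get("out"), 1));
    }
}
```

Keeping one directory per application attempt is what lets a restarted application master tell its own output apart from that of an earlier attempt.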

getTaskAttemptPath

public Path getTaskAttemptPath(TaskAttemptContext context)
Compute the path where the output of a task attempt is stored until that task is committed.

Parameters:
context - the context of the task attempt.
Returns:
the path where the output of a task attempt should be stored.

getTaskAttemptPath

public static Path getTaskAttemptPath(TaskAttemptContext context,
                                      Path out)
Compute the path where the output of a task attempt is stored until that task is committed.

Parameters:
context - the context of the task attempt.
out - The output path to put things in.
Returns:
the path where the output of a task attempt should be stored.

getCommittedTaskPath

public Path getCommittedTaskPath(TaskAttemptContext context)
Compute the path where the output of a committed task is stored until the entire job is committed.

Parameters:
context - the context of the task attempt
Returns:
the path where the output of a committed task is stored until the entire job is committed.

getCommittedTaskPath

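public static Path getCommittedTaskPath(TaskAttemptContext context,
                                        Path out)
Compute the path where the output of a committed task is stored until the entire job is committed.

Parameters:
context - the context of the task attempt.
out - the output path to place these in.
Returns:
the path where the output of a committed task is stored until the entire job is committed.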

getCommittedTaskPath

protected Path getCommittedTaskPath(int appAttemptId,
                                    TaskAttemptContext context)
Compute the path where the output of a committed task is stored until the entire job is committed for a specific application attempt.

Parameters:
appAttemptId - the id of the application attempt to use
context - the context of any task.
Returns:
the path where the output of a committed task is stored.
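
To make the difference between a task attempt path and a committed task path concrete, the following sketch assumes the Hadoop 2.x layout, in which uncommitted attempts live under a nested _temporary directory keyed by task attempt ID and committed tasks live directly under the job attempt directory keyed by task ID. The layout, the ID strings, and the helper names are assumptions for illustration, not taken from this page.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class TaskPathsSketch {
    // Assumed value of FileOutputCommitter.PENDING_DIR_NAME.
    static final String PENDING_DIR_NAME = "_temporary";

    // <out>/_temporary/<appAttemptId>
    static Path jobAttemptPath(Path out, int appAttemptId) {
        return out.resolve(PENDING_DIR_NAME).resolve(Integer.toString(appAttemptId));
    }

    // Assumed: <jobAttemptPath>/_temporary/<taskAttemptId>, written while the attempt runs.
    static Path taskAttemptPath(Path out, int appAttemptId, String taskAttemptId) {
        return jobAttemptPath(out, appAttemptId).resolve(PENDING_DIR_NAME).resolve(taskAttemptId);
    }

    // Assumed: <jobAttemptPath>/<taskId>, holding output once the task is committed.
    static Path committedTaskPath(Path out, int appAttemptId, String taskId) {
        return jobAttemptPath(out, appAttemptId).resolve(taskId);
    }

    public static void main(String[] args) {
        Path out = Paths.get("out");
        // Illustrative IDs only; real IDs come from the TaskAttemptContext.
        System.out.println(taskAttemptPath(out, 0, "attempt_0001_m_000000_0"));
        System.out.println(committedTaskPath(out, 0, "task_0001_m_000000"));
    }
}
```

Because the committed path is keyed by task ID rather than task attempt ID, at most one attempt's output survives per task.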

getWorkPath

public Path getWorkPath()
                 throws IOException
Get the directory that the task should write results into.

Returns:
the work directory
Throws:
IOException

setupJob

public void setupJob(JobContext context)
              throws IOException
Create the temporary directory that is the root of all of the task work directories.

Specified by:
setupJob in class OutputCommitter
Parameters:
context - the job's context
Throws:
IOException - if temporary output could not be created

commitJob

public void commitJob(JobContext context)
               throws IOException
The job has completed, so move all committed tasks to the final output directory, delete the temporary directory (including all of the work directories), and create a _SUCCESS file to mark the job as successful.

Overrides:
commitJob in class OutputCommitter
Parameters:
context - the job's context
Throws:
IOException
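
The three steps described for commitJob can be sketched against the local filesystem. This is a simplified model under stated assumptions, not the Hadoop implementation: java.nio.file replaces the Hadoop FileSystem API, the directory layout (<output>/_temporary/<appAttemptId>/<taskId>) is assumed, task directories are assumed to contain only flat files, and all class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CommitJobSketch {
    // Snapshot the children of a directory so it can be modified while iterating.
    static List<Path> children(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    static void commitJob(Path out, int appAttemptId) throws IOException {
        Path pending = out.resolve("_temporary");
        Path jobAttempt = pending.resolve(Integer.toString(appAttemptId));
        // Step 1: move the files of every committed task into the final output directory.
        if (Files.isDirectory(jobAttempt)) {
            for (Path task : children(jobAttempt)) {
                // Skip the nested directory that holds uncommitted task attempts.
                if (!Files.isDirectory(task) || task.getFileName().toString().equals("_temporary")) {
                    continue;
                }
                for (Path file : children(task)) {
                    Files.move(file, out.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        // Step 2: delete the temporary directory, children before parents.
        if (Files.exists(pending)) {
            List<Path> doomed;
            try (Stream<Path> walk = Files.walk(pending)) {
                doomed = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : doomed) {
                Files.delete(p);
            }
        }
        // Step 3: create the empty success marker.
        Files.write(out.resolve("_SUCCESS"), new byte[0]);
    }

    // Builds a small output tree in a temp directory, commits it, and checks the result.
    static boolean demo() {
        try {
            Path out = Files.createTempDirectory("commit-job-sketch");
            Path jobAttempt = out.resolve("_temporary").resolve("0");
            Path taskA = Files.createDirectories(jobAttempt.resolve("task_a"));
            Path taskB = Files.createDirectories(jobAttempt.resolve("task_b"));
            Path uncommitted = Files.createDirectories(jobAttempt.resolve("_temporary").resolve("attempt_c_0"));
            Files.createFile(taskA.resolve("part-r-00000"));
            Files.createFile(taskB.resolve("part-r-00001"));
            Files.createFile(uncommitted.resolve("part-r-00002"));
            commitJob(out, 0);
            return Files.exists(out.resolve("part-r-00000"))
                && Files.exists(out.resolve("part-r-00001"))
                && !Files.exists(out.resolve("part-r-00002"))
                && !Files.exists(out.resolve("_temporary"))
                && Files.exists(out.resolve("_SUCCESS"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("commit succeeded: " + demo());
    }
}
```

Output from uncommitted task attempts is never promoted; it is discarded when the temporary directory is deleted.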

cleanupJob

@Deprecated
public void cleanupJob(JobContext context)
                throws IOException
Deprecated. 

Description copied from class: OutputCommitter
For cleaning up the job's output after job completion. This is called from the application master process for the entire job. This may be called multiple times.

Overrides:
cleanupJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Throws:
IOException

abortJob

public void abortJob(JobContext context,
                     org.apache.hadoop.mapreduce.JobStatus.State state)
              throws IOException
Delete the temporary directory, including all of the work directories.

Overrides:
abortJob in class OutputCommitter
Parameters:
context - the job's context
state - final runstate of the job
Throws:
IOException

setupTask

public void setupTask(TaskAttemptContext context)
               throws IOException
No task setup required.

Specified by:
setupTask in class OutputCommitter
Parameters:
context - Context of the task whose output is being written.
Throws:
IOException

commitTask

public void commitTask(TaskAttemptContext context)
                throws IOException
Move the files from the work directory to the job output directory.

Specified by:
commitTask in class OutputCommitter
Parameters:
context - the task context
Throws:
IOException - if commit is not successful.
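
A minimal sketch of the move performed at task commit, assuming (as in the Hadoop 2.x implementation, which this page does not spell out) that the whole task attempt directory is renamed to the committed task path rather than copied file by file. java.nio.file stands in for the Hadoop FileSystem API, and the names and IDs are hypothetical. The sketch does not handle a pre-existing committed directory; Files.move would fail in that case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CommitTaskSketch {
    // Promote a task attempt's work directory to the committed task location.
    static void commitTask(Path taskAttemptDir, Path committedTaskDir) throws IOException {
        if (!Files.exists(taskAttemptDir)) {
            return; // the attempt wrote nothing, so there is nothing to commit
        }
        Files.createDirectories(committedTaskDir.getParent());
        // A single rename keeps the promotion cheap on filesystems with fast renames.
        Files.move(taskAttemptDir, committedTaskDir);
    }

    // Writes one file in a work directory, commits it, and checks where it ended up.
    static boolean demo() {
        try {
            Path out = Files.createTempDirectory("commit-task-sketch");
            Path jobAttempt = out.resolve("_temporary").resolve("0");
            Path attempt = jobAttempt.resolve("_temporary").resolve("attempt_0001_m_000000_0");
            Path committed = jobAttempt.resolve("task_0001_m_000000");
            Files.createDirectories(attempt);
            Files.write(attempt.resolve("part-m-00000"), "key\tvalue\n".getBytes(StandardCharsets.UTF_8));
            commitTask(attempt, committed);
            return Files.exists(committed.resolve("part-m-00000")) && !Files.exists(attempt);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("task committed: " + demo());
    }
}
```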

abortTask

public void abortTask(TaskAttemptContext context)
               throws IOException
Delete the work directory.

Specified by:
abortTask in class OutputCommitter
Parameters:
context - the task's context
Throws:
IOException

needsTaskCommit

public boolean needsTaskCommit(TaskAttemptContext context)
                        throws IOException
Did this task write any files in the work directory?

Specified by:
needsTaskCommit in class OutputCommitter
Parameters:
context - the task's context
Returns:
true if the task wrote files in the work directory and therefore needs to be committed, false otherwise
Throws:
IOException
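
One simple way to answer the question posed by needsTaskCommit is to test whether the task's work directory exists at all. The sketch below assumes that an attempt's work directory is only created once the task writes output; that assumption, the use of java.nio.file, and all names are illustrative rather than taken from this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class NeedsTaskCommitSketch {
    // A task needs a commit only if its work directory was ever created.
    static boolean needsTaskCommit(Path workDir) {
        return Files.exists(workDir);
    }

    // Checks the answer before and after the task writes its first file.
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("needs-commit-sketch");
            Path workDir = root.resolve("attempt_0001_m_000000_0");
            boolean before = needsTaskCommit(workDir);
            Files.createDirectories(workDir);
            Files.createFile(workDir.resolve("part-m-00000"));
            boolean after = needsTaskCommit(workDir);
            return !before && after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("behaves as expected: " + demo());
    }
}
```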

isRecoverySupported

public boolean isRecoverySupported()
Description copied from class: OutputCommitter
Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.

Overrides:
isRecoverySupported in class OutputCommitter
Returns:
true if task output recovery is supported, false otherwise
See Also:
OutputCommitter.recoverTask(TaskAttemptContext)

recoverTask

public void recoverTask(TaskAttemptContext context)
                 throws IOException
Description copied from class: OutputCommitter
Recover the task output. The retry count for the job is passed to the OutputCommitter via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration(). This is called from the application master process, but it is called individually for each task. If an exception is thrown, the task will be attempted again. This may be called multiple times for the same task, but from different application attempts.

Overrides:
recoverTask in class OutputCommitter
Parameters:
context - Context of the task whose output is being recovered
Throws:
IOException
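
Recovery can be pictured as carrying a task's committed output forward from the previous application attempt to the current one. The sketch below is a simplified model that assumes the committed task layout <output>/_temporary/<appAttemptId>/<taskId> and a previous attempt ID of current minus one; neither detail is stated on this page. java.nio.file stands in for the Hadoop FileSystem API and all names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RecoverTaskSketch {
    // Assumed committed task location: <out>/_temporary/<appAttemptId>/<taskId>
    static Path committedTaskPath(Path out, int appAttemptId, String taskId) {
        return out.resolve("_temporary").resolve(Integer.toString(appAttemptId)).resolve(taskId);
    }

    // Returns true if committed output from the previous attempt was carried forward.
    static boolean recoverTask(Path out, int currentAttempt, String taskId) throws IOException {
        int previousAttempt = currentAttempt - 1;
        if (previousAttempt < 0) {
            throw new IOException("Cannot recover task output for the first application attempt");
        }
        Path previous = committedTaskPath(out, previousAttempt, taskId);
        Path current = committedTaskPath(out, currentAttempt, taskId);
        if (!Files.exists(previous)) {
            return false; // nothing was committed earlier, so the task must run again
        }
        Files.createDirectories(current.getParent());
        Files.move(previous, current);
        return true;
    }

    // Commits output under attempt 0, then recovers it as attempt 1.
    static boolean demo() {
        try {
            Path out = Files.createTempDirectory("recover-task-sketch");
            String taskId = "task_0001_m_000000";
            Path old = Files.createDirectories(committedTaskPath(out, 0, taskId));
            Files.createFile(old.resolve("part-m-00000"));
            boolean recovered = recoverTask(out, 1, taskId);
            boolean missing = recoverTask(out, 1, "task_0001_m_000001");
            return recovered
                && !missing
                && Files.exists(committedTaskPath(out, 1, taskId).resolve("part-m-00000"))
                && !Files.exists(old);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("recovery behaves as expected: " + demo());
    }
}
```

Reusing committed output this way is what makes a job restart cheaper when isRecoverySupported() returns true: tasks that already committed do not have to be rerun.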


Copyright © 2014 Apache Software Foundation. All Rights Reserved.