org.apache.hadoop.mapred
Class OutputCommitter

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputCommitter
      extended by org.apache.hadoop.mapred.OutputCommitter
Direct Known Subclasses:
FileOutputCommitter

@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class OutputCommitter
extends OutputCommitter

OutputCommitter describes the commit of task output for a Map-Reduce job.

The Map-Reduce framework relies on the OutputCommitter of the job to:

  1. Setup the job during initialization. For example, create the temporary output directory for the job during the initialization of the job.
  2. Cleanup the job after the job completion. For example, remove the temporary output directory after the job completion.
  3. Setup the task temporary output.
  4. Check whether a task needs a commit. This is to avoid the commit procedure if a task does not need commit.
  5. Commit of the task output.
  6. Discard the task commit.

See Also:
FileOutputCommitter, JobContext, TaskAttemptContext

Constructor Summary
OutputCommitter()
           
 
Method Summary
 void abortJob(JobContext jobContext, int status)
          For aborting an unsuccessful job's output.
 void abortJob(JobContext context, org.apache.hadoop.mapreduce.JobStatus.State runState)
          This method implements the new interface by calling the old method.
abstract  void abortTask(TaskAttemptContext taskContext)
          Discard the task output
 void abortTask(TaskAttemptContext taskContext)
          This method implements the new interface by calling the old method.
 void cleanupJob(JobContext jobContext)
          Deprecated. Use commitJob(JobContext) or abortJob(JobContext, int) instead.
 void cleanupJob(JobContext context)
          Deprecated. Use commitJob(org.apache.hadoop.mapreduce.JobContext) or abortJob(org.apache.hadoop.mapreduce.JobContext, org.apache.hadoop.mapreduce.JobStatus.State) instead.
 void commitJob(JobContext jobContext)
          For committing job's output after successful job completion.
 void commitJob(JobContext context)
          This method implements the new interface by calling the old method.
abstract  void commitTask(TaskAttemptContext taskContext)
          To promote the task's temporary output to final output location The task's output is moved to the job's output directory.
 void commitTask(TaskAttemptContext taskContext)
          This method implements the new interface by calling the old method.
 boolean isRecoverySupported()
          This method implements the new interface by calling the old method.
abstract  boolean needsTaskCommit(TaskAttemptContext taskContext)
          Check whether task needs a commit
 boolean needsTaskCommit(TaskAttemptContext taskContext)
          This method implements the new interface by calling the old method.
 void recoverTask(TaskAttemptContext taskContext)
          Recover the task output.
 void recoverTask(TaskAttemptContext taskContext)
          This method implements the new interface by calling the old method.
abstract  void setupJob(JobContext jobContext)
          For the framework to setup the job output during initialization
 void setupJob(JobContext jobContext)
          This method implements the new interface by calling the old method.
abstract  void setupTask(TaskAttemptContext taskContext)
          Sets up output for the task.
 void setupTask(TaskAttemptContext taskContext)
          This method implements the new interface by calling the old method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

OutputCommitter

public OutputCommitter()
Method Detail

setupJob

public abstract void setupJob(JobContext jobContext)
                       throws IOException
For the framework to setup the job output during initialization

Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException - if temporary output could not be created

cleanupJob

@Deprecated
public void cleanupJob(JobContext jobContext)
                throws IOException
Deprecated. Use commitJob(JobContext) or abortJob(JobContext, int) instead.

For cleaning up the job's output after job completion

Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException

commitJob

public void commitJob(JobContext jobContext)
               throws IOException
For committing job's output after successful job completion. Note that this is invoked for jobs with final runstate as SUCCESSFUL.

Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException

abortJob

public void abortJob(JobContext jobContext,
                     int status)
              throws IOException
For aborting an unsuccessful job's output. Note that this is invoked for jobs with final runstate as JobStatus.FAILED or JobStatus.KILLED

Parameters:
jobContext - Context of the job whose output is being written.
status - final runstate of the job
Throws:
IOException

setupTask

public abstract void setupTask(TaskAttemptContext taskContext)
                        throws IOException
Sets up output for the task.

Parameters:
taskContext - Context of the task whose output is being written.
Throws:
IOException

needsTaskCommit

public abstract boolean needsTaskCommit(TaskAttemptContext taskContext)
                                 throws IOException
Check whether task needs a commit

Parameters:
taskContext -
Returns:
true/false
Throws:
IOException

commitTask

public abstract void commitTask(TaskAttemptContext taskContext)
                         throws IOException
To promote the task's temporary output to final output location The task's output is moved to the job's output directory.

Parameters:
taskContext - Context of the task whose output is being written.
Throws:
IOException - if commit is not

abortTask

public abstract void abortTask(TaskAttemptContext taskContext)
                        throws IOException
Discard the task output

Parameters:
taskContext -
Throws:
IOException

isRecoverySupported

public boolean isRecoverySupported()
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Overrides:
isRecoverySupported in class OutputCommitter
Returns:
true if task output recovery is supported, false otherwise
See Also:
OutputCommitter.recoverTask(TaskAttemptContext)

recoverTask

public void recoverTask(TaskAttemptContext taskContext)
                 throws IOException
Recover the task output. The retry-count for the job will be passed via the MRConstants.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. If an exception is thrown the task will be attempted again.

Parameters:
taskContext - Context of the task whose output is being recovered
Throws:
IOException

setupJob

public final void setupJob(JobContext jobContext)
                    throws IOException
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Specified by:
setupJob in class OutputCommitter
Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException - if temporary output could not be created

cleanupJob

@Deprecated
public final void cleanupJob(JobContext context)
                      throws IOException
Deprecated. Use commitJob(org.apache.hadoop.mapreduce.JobContext) or abortJob(org.apache.hadoop.mapreduce.JobContext, org.apache.hadoop.mapreduce.JobStatus.State) instead.

This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Overrides:
cleanupJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Throws:
IOException

commitJob

public final void commitJob(JobContext context)
                     throws IOException
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Overrides:
commitJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Throws:
IOException

abortJob

public final void abortJob(JobContext context,
                           org.apache.hadoop.mapreduce.JobStatus.State runState)
                    throws IOException
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Overrides:
abortJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
runState - final runstate of the job
Throws:
IOException

setupTask

public final void setupTask(TaskAttemptContext taskContext)
                     throws IOException
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Specified by:
setupTask in class OutputCommitter
Parameters:
taskContext - Context of the task whose output is being written.
Throws:
IOException

needsTaskCommit

public final boolean needsTaskCommit(TaskAttemptContext taskContext)
                              throws IOException
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Specified by:
needsTaskCommit in class OutputCommitter
Returns:
true/false
Throws:
IOException

commitTask

public final void commitTask(TaskAttemptContext taskContext)
                      throws IOException
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Specified by:
commitTask in class OutputCommitter
Parameters:
taskContext - Context of the task whose output is being written.
Throws:
IOException - if commit is not successful.

abortTask

public final void abortTask(TaskAttemptContext taskContext)
                     throws IOException
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Specified by:
abortTask in class OutputCommitter
Throws:
IOException

recoverTask

public final void recoverTask(TaskAttemptContext taskContext)
                       throws IOException
This method implements the new interface by calling the old method. Note that the input types are different between the new and old apis and this is a bridge between the two.

Overrides:
recoverTask in class OutputCommitter
Parameters:
taskContext - Context of the task whose output is being recovered
Throws:
IOException


Copyright © 2012 Apache Software Foundation. All Rights Reserved.