Package org.apache.hadoop.mapred
Class OutputCommitter
java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapred.OutputCommitter
- Direct Known Subclasses:
FileOutputCommitter
OutputCommitter describes the commit of task output for a
Map-Reduce job.
The Map-Reduce framework relies on the OutputCommitter of
the job to:
- Set up the job during initialization. For example, create the temporary output directory for the job.
- Clean up the job after it completes. For example, remove the temporary output directory.
- Set up the task's temporary output.
- Check whether a task needs a commit, so that the commit procedure can be skipped for tasks that do not need one.
- Commit the task output.
- Discard the task commit.
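The responsibilities above imply a call order. The following is a minimal sketch of that order in plain Java. It does not use the Hadoop classes: `LifecycleSketch`, `Committer`, `RecordingCommitter`, and `runJob` are hypothetical names, contexts are reduced to strings, and the sketch assumes (as a simplification) that a single failed attempt fails the whole job with no retries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the committer call order; not the Hadoop API. */
class LifecycleSketch {

    /** Stand-in for the responsibilities listed above; job cleanup is split into commit and abort. */
    interface Committer {
        void setupJob(String job);
        void setupTask(String attempt);
        boolean needsTaskCommit(String attempt);
        void commitTask(String attempt);
        void abortTask(String attempt);
        void commitJob(String job);
        void abortJob(String job);
    }

    /** Records each call so the order can be inspected. */
    static class RecordingCommitter implements Committer {
        final List<String> calls = new ArrayList<>();

        public void setupJob(String job) { calls.add("setupJob"); }
        public void setupTask(String attempt) { calls.add("setupTask:" + attempt); }
        public boolean needsTaskCommit(String attempt) {
            calls.add("needsTaskCommit:" + attempt);
            return true; // assumption: every attempt produced output
        }
        public void commitTask(String attempt) { calls.add("commitTask:" + attempt); }
        public void abortTask(String attempt) { calls.add("abortTask:" + attempt); }
        public void commitJob(String job) { calls.add("commitJob"); }
        public void abortJob(String job) { calls.add("abortJob"); }
    }

    /** Drives one job; attempts named in failedAttempts are aborted instead of committed. */
    static boolean runJob(Committer c, List<String> attempts, List<String> failedAttempts) {
        c.setupJob("job");
        boolean ok = true;
        for (String attempt : attempts) {
            c.setupTask(attempt);
            if (failedAttempts.contains(attempt)) {
                c.abortTask(attempt);
                ok = false;
            } else if (c.needsTaskCommit(attempt)) {
                c.commitTask(attempt);
            }
        }
        if (ok) {
            c.commitJob("job");
        } else {
            c.abortJob("job");
        }
        return ok;
    }

    public static void main(String[] args) {
        RecordingCommitter c = new RecordingCommitter();
        runJob(c, List.of("t1", "t2"), List.of());
        System.out.println(c.calls);
    }
}
```

The recording implementation exists only to make the order visible; a real committer would perform file system work in each method.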
Constructor Summary
Constructors
OutputCommitter()
Method Summary
Modifier and Type | Method | Description
void | abortJob(JobContext jobContext, int status) | For aborting an unsuccessful job's output.
final void | abortJob(JobContext context, JobStatus.State runState) | This method implements the new interface by calling the old method.
abstract void | abortTask(TaskAttemptContext taskContext) | Discard the task output.
final void | abortTask(TaskAttemptContext taskContext) | This method implements the new interface by calling the old method.
void | cleanupJob(JobContext jobContext) | Deprecated. Use commitJob(JobContext) or abortJob(JobContext, int) instead.
final void | cleanupJob(JobContext context) | Deprecated.
void | commitJob(JobContext jobContext) | For committing job's output after successful job completion.
final void | commitJob(JobContext context) | This method implements the new interface by calling the old method.
abstract void | commitTask(TaskAttemptContext taskContext) | To promote the task's temporary output to final output location.
final void | commitTask(TaskAttemptContext taskContext) | This method implements the new interface by calling the old method.
boolean | isCommitJobRepeatable(JobContext jobContext) | Returns true if an in-progress job commit can be retried.
boolean | isCommitJobRepeatable(JobContext jobContext) | Returns true if an in-progress job commit can be retried.
boolean | isRecoverySupported() | Deprecated. Use isRecoverySupported(JobContext) instead.
boolean | isRecoverySupported(JobContext jobContext) | Is task output recovery supported for restarting jobs?
final boolean | isRecoverySupported(JobContext context) | This method implements the new interface by calling the old method.
abstract boolean | needsTaskCommit(TaskAttemptContext taskContext) | Check whether task needs a commit.
final boolean | needsTaskCommit(TaskAttemptContext taskContext) | This method implements the new interface by calling the old method.
void | recoverTask(TaskAttemptContext taskContext) | Recover the task output.
final void | recoverTask(TaskAttemptContext taskContext) | This method implements the new interface by calling the old method.
abstract void | setupJob(JobContext jobContext) | For the framework to set up the job output during initialization.
final void | setupJob(JobContext jobContext) | This method implements the new interface by calling the old method.
abstract void | setupTask(TaskAttemptContext taskContext) | Sets up output for the task.
final void | setupTask(TaskAttemptContext taskContext) | This method implements the new interface by calling the old method.
-
Constructor Details
-
OutputCommitter
public OutputCommitter()
-
-
Method Details
-
setupJob
For the framework to set up the job output during initialization. This is called from the application master process for the entire job. This will be called multiple times, once per job attempt.
- Parameters:
jobContext - Context of the job whose output is being written.
- Throws:
IOException - if temporary output could not be created
-
cleanupJob
Deprecated. Use commitJob(JobContext) or abortJob(JobContext, int) instead.
For cleaning up the job's output after job completion. This is called from the application master process for the entire job. This may be called multiple times.
- Parameters:
jobContext - Context of the job whose output is being written.
- Throws:
IOException
-
commitJob
For committing the job's output after successful job completion. Note that this is invoked only for jobs whose final runstate is SUCCESSFUL. This is called from the application master process for the entire job. This is guaranteed to be called only once. If it throws an exception, the entire job will fail.
- Parameters:
jobContext - Context of the job whose output is being written.
- Throws:
IOException
-
abortJob
For aborting an unsuccessful job's output. Note that this is invoked for jobs with final runstate as JobStatus.FAILED or JobStatus.KILLED. This is called from the application master process for the entire job. This may be called multiple times.
- Parameters:
jobContext - Context of the job whose output is being written.
status - final runstate of the job
- Throws:
IOException
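Because abortJob may be called more than once, its cleanup should tolerate output that is already gone. The following is a minimal sketch of such idempotent cleanup using only java.nio.file, not the Hadoop FileSystem API. `AbortSketch`, `deleteTree`, and `makeTempOutput` are hypothetical names, and a local temporary directory stands in for the job's temporary output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical idempotent cleanup; not the Hadoop API. */
class AbortSketch {

    /** Deletes dir and its contents; returns false if it was already gone. */
    static boolean deleteTree(Path dir) {
        if (!Files.exists(dir)) {
            return false; // repeated call: nothing left to clean up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    /** Creates a temporary directory holding one file, for demonstration only. */
    static Path makeTempOutput() {
        try {
            Path dir = Files.createTempDirectory("job-tmp");
            Files.writeString(dir.resolve("part-0"), "partial output");
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = makeTempOutput();
        System.out.println("first abort removed output: " + deleteTree(dir));
        System.out.println("second abort removed output: " + deleteTree(dir));
    }
}
```

The second call finds nothing to delete and returns without error, which is the property a repeated abort needs.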
-
setupTask
Sets up output for the task. This is called from each individual task's process that will output to HDFS, and it is called just for that task. This may be called multiple times for the same task, but for different task attempts.
- Parameters:
taskContext - Context of the task whose output is being written.
- Throws:
IOException
-
needsTaskCommit
Check whether the task needs a commit. This is called from each individual task's process that will output to HDFS, and it is called just for that task.
- Parameters:
taskContext
- Returns:
true if the task needs a commit, false otherwise
- Throws:
IOException
-
commitTask
To promote the task's temporary output to the final output location. If needsTaskCommit(TaskAttemptContext) returns true and this task is the task that the AM determines finished first, this method is called to commit an individual task's output. This is to mark that task's output as complete, as commitJob(JobContext) will also be called later on if the entire job finishes successfully. This is called from a task's process. This may be called multiple times for the same task, but for different task attempts. It should be very rare for this to be called multiple times; it requires odd networking failures to happen. In the future the Hadoop framework may eliminate this race.
- Parameters:
taskContext - Context of the task whose output is being written.
- Throws:
IOException - if commit is not successful.
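The promotion described above can be pictured as a move from a per-attempt temporary location to a final one. The following is a minimal sketch under stated assumptions: it uses java.nio.file on the local file system rather than the Hadoop API, each attempt writes a single file, and the `_temporary` directory name and all class and method names (`PromoteSketch`, `writeAttempt`, and so on) are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical temp-to-final promotion; not the Hadoop API. */
class PromoteSketch {
    private final Path finalDir;
    private final Path tempDir;

    PromoteSketch(Path outputRoot) {
        this.finalDir = outputRoot;
        this.tempDir = outputRoot.resolve("_temporary"); // assumed name
    }

    /** Simulates a task attempt writing its output to the temporary area. */
    void writeAttempt(String attemptId, String data) {
        try {
            Files.createDirectories(tempDir);
            Files.writeString(tempDir.resolve(attemptId), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** An attempt needs a commit only if it left temporary output behind. */
    boolean needsTaskCommit(String attemptId) {
        return Files.exists(tempDir.resolve(attemptId));
    }

    /** Moves the attempt's file to its final name; a later attempt replaces an earlier one. */
    Path commitTask(String taskId, String attemptId) {
        try {
            return Files.move(tempDir.resolve(attemptId), finalDir.resolve(taskId),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Discards the attempt's temporary output if it is still present. */
    void abortTask(String attemptId) {
        try {
            Files.deleteIfExists(tempDir.resolve(attemptId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a fresh output root in the system temporary directory. */
    static Path newOutputRoot() {
        try {
            return Files.createTempDirectory("job-output");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a committed file back as text. */
    static String read(Path file) {
        try {
            return Files.readString(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PromoteSketch sketch = new PromoteSketch(newOutputRoot());
        sketch.writeAttempt("task0_attempt0", "rows");
        if (sketch.needsTaskCommit("task0_attempt0")) {
            System.out.println("committed to " + sketch.commitTask("task0", "task0_attempt0"));
        }
    }
}
```

Replacing an existing final file is one possible way to tolerate the rare repeated commit; whether that is appropriate depends on the committer.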
-
abortTask
Discard the task output. This is called from a task's process to clean up a single task's output that has not yet been committed. This may be called multiple times for the same task, but for different task attempts.
- Parameters:
taskContext
- Throws:
IOException
-
isRecoverySupported
Deprecated. Use isRecoverySupported(JobContext) instead.
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Overrides:
isRecoverySupported in class OutputCommitter
- Returns:
true if task output recovery is supported, false otherwise
-
isRecoverySupported
Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.
- Parameters:
jobContext - Context of the job whose output is being written.
- Returns:
true if task output recovery is supported, false otherwise
- Throws:
IOException
-
isCommitJobRepeatable
Returns true if an in-progress job commit can be retried. If the MR AM is re-run then it will check this value to determine if it can retry an in-progress commit that was started by a previous version. Note that in rare scenarios, the previous AM version might still be running at that time, due to system anomalies. Hence if this method returns true then the retry commit operation should be able to run concurrently with the previous operation. If repeatable job commit is supported, job restart can tolerate previous AM failures during job commit. By default, it is not supported. Extended classes (like FileOutputCommitter) should explicitly override it if they provide support.
- Parameters:
jobContext - Context of the job whose output is being written.
- Returns:
true if repeatable job commit is supported, false otherwise
- Throws:
IOException
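The default-off, opt-in behaviour described above can be sketched in plain Java. This is a hypothetical model, not the Hadoop classes: `BaseCommitter`, `IdempotentCommitter`, and `mayRetryCommit` are invented names, and the sketch only decides whether a retry is allowed. What a restarted AM does when a retry is not allowed is outside its scope.

```java
/** Hypothetical model of the repeatable-commit check; not the Hadoop API. */
class RepeatableCommitSketch {

    /** Stand-in base type: repeatable commit is off unless a subclass opts in. */
    static class BaseCommitter {
        boolean isCommitJobRepeatable() {
            return false;
        }
    }

    /** Stand-in subclass whose job commit is assumed safe to run again. */
    static class IdempotentCommitter extends BaseCommitter {
        @Override
        boolean isCommitJobRepeatable() {
            return true;
        }
    }

    /** A commit left in progress may be retried only if the committer opts in. */
    static boolean mayRetryCommit(BaseCommitter committer, boolean commitWasInProgress) {
        return commitWasInProgress && committer.isCommitJobRepeatable();
    }

    public static void main(String[] args) {
        System.out.println("default committer: " + mayRetryCommit(new BaseCommitter(), true));
        System.out.println("opt-in committer: " + mayRetryCommit(new IdempotentCommitter(), true));
    }
}
```

A committer that opts in also takes on the concurrency requirement stated above: its retry must be safe even if the earlier commit is still running.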
-
isCommitJobRepeatable
Description copied from class: OutputCommitter
Returns true if an in-progress job commit can be retried. If the MR AM is re-run then it will check this value to determine if it can retry an in-progress commit that was started by a previous version. Note that in rare scenarios, the previous AM version might still be running at that time, due to system anomalies. Hence if this method returns true then the retry commit operation should be able to run concurrently with the previous operation. If repeatable job commit is supported, job restart can tolerate previous AM failures during job commit. By default, it is not supported. Extended classes (like FileOutputCommitter) should explicitly override it if they provide support.
- Overrides:
isCommitJobRepeatable in class OutputCommitter
- Parameters:
jobContext - Context of the job whose output is being written.
- Returns:
true if repeatable job commit is supported, false otherwise
- Throws:
IOException
-
recoverTask
Recover the task output. The retry-count for the job will be passed via the MRConstants.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown the task will be attempted again.
- Parameters:
taskContext - Context of the task whose output is being recovered
- Throws:
IOException
-
setupJob
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Specified by:
setupJob in class OutputCommitter
- Parameters:
jobContext - Context of the job whose output is being written.
- Throws:
IOException - if temporary output could not be created
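The bridge described above, here and in the remaining final methods, can be sketched with stand-in types. This is an illustration under stated assumptions, not the Hadoop source: all names in `BridgeSketch` are hypothetical, the old context type is assumed to be a subtype of the new one, and the methods return a string (the documented methods return void) purely so the delegation is observable.

```java
/** Hypothetical model of an old-API class bridging a new-API method; not the Hadoop API. */
class BridgeSketch {

    /** Stand-in for the new-API context type. */
    static class NewJobContext {
        final String name;

        NewJobContext(String name) {
            this.name = name;
        }
    }

    /** Stand-in for the old-API context type, assumed to extend the new one. */
    static class OldJobContext extends NewJobContext {
        OldJobContext(String name) {
            super(name);
        }
    }

    /** Stand-in for the new-API committer. */
    abstract static class NewCommitter {
        abstract String setupJob(NewJobContext context);
    }

    /** Stand-in for the old-API committer: declares the old method and bridges the new one. */
    abstract static class OldCommitter extends NewCommitter {
        /** Old-API method that subclasses implement. */
        abstract String setupJob(OldJobContext context);

        /** Bridge: narrows the context type and delegates to the old-API overload. */
        @Override
        final String setupJob(NewJobContext context) {
            return setupJob((OldJobContext) context);
        }
    }

    /** Concrete old-API committer used to show the delegation. */
    static class EchoCommitter extends OldCommitter {
        @Override
        String setupJob(OldJobContext context) {
            return "old-api setup for " + context.name;
        }
    }

    public static void main(String[] args) {
        // The caller only knows the new-API type, yet the old-API method runs.
        NewCommitter committer = new EchoCommitter();
        System.out.println(committer.setupJob(new OldJobContext("job1")));
    }
}
```

Marking the bridge method final keeps subclasses from overriding the new-API entry point by accident, so they implement only the old-API method.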
-
cleanupJob
Deprecated.
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Overrides:
cleanupJob in class OutputCommitter
- Parameters:
context - Context of the job whose output is being written.
- Throws:
IOException
-
commitJob
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Overrides:
commitJob in class OutputCommitter
- Parameters:
context - Context of the job whose output is being written.
- Throws:
IOException
-
abortJob
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Overrides:
abortJob in class OutputCommitter
- Parameters:
context - Context of the job whose output is being written.
runState - final runstate of the job
- Throws:
IOException
-
setupTask
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Specified by:
setupTask in class OutputCommitter
- Parameters:
taskContext - Context of the task whose output is being written.
- Throws:
IOException
-
needsTaskCommit
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Specified by:
needsTaskCommit in class OutputCommitter
- Returns:
true if the task needs a commit, false otherwise
- Throws:
IOException
-
commitTask
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Specified by:
commitTask in class OutputCommitter
- Parameters:
taskContext - Context of the task whose output is being written.
- Throws:
IOException - if commit is not successful.
-
abortTask
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Specified by:
abortTask in class OutputCommitter
- Throws:
IOException
-
recoverTask
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Overrides:
recoverTask in class OutputCommitter
- Parameters:
taskContext - Context of the task whose output is being recovered
- Throws:
IOException
-
isRecoverySupported
This method implements the new interface by calling the old method. Note that the input types are different between the new and old APIs and this is a bridge between the two.
- Overrides:
isRecoverySupported in class OutputCommitter
- Parameters:
context - Context of the job whose output is being written.
- Returns:
true if task output recovery is supported, false otherwise
- Throws:
IOException
-