@InterfaceAudience.Public @InterfaceStability.Stable public abstract class OutputCommitter extends Object
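OutputCommitter describes the commit of task output for a Map-Reduce job.
The Map-Reduce framework relies on the OutputCommitter of the job to set up the job and task output, check whether a task needs a commit, commit or discard task output, and commit or abort the job's output.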
See Also: FileOutputCommitter, JobContext, TaskAttemptContext
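The order in which the framework drives these responsibilities can be illustrated with a small stand-in. This is only a sketch: `CommitterSketch`, `RecordingCommitter`, and `LifecycleDriver` are hypothetical names, the methods take plain strings rather than `JobContext` or `TaskAttemptContext`, and a failed task attempt is assumed to fail the whole job, whereas a real framework would normally retry the task first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the commit lifecycle. This is NOT the Hadoop API:
// the real methods take JobContext / TaskAttemptContext and throw IOException.
interface CommitterSketch {
    void setupJob(String jobId);
    void setupTask(String attemptId);
    boolean needsTaskCommit(String attemptId);
    void commitTask(String attemptId);
    void abortTask(String attemptId);
    void commitJob(String jobId);
    void abortJob(String jobId);
}

// Records the name of each call so the resulting order can be inspected.
class RecordingCommitter implements CommitterSketch {
    final List<String> calls = new ArrayList<>();

    public void setupJob(String jobId) { calls.add("setupJob"); }
    public void setupTask(String attemptId) { calls.add("setupTask"); }
    public boolean needsTaskCommit(String attemptId) {
        calls.add("needsTaskCommit");
        return true; // assumption: every attempt in this sketch produced output
    }
    public void commitTask(String attemptId) { calls.add("commitTask"); }
    public void abortTask(String attemptId) { calls.add("abortTask"); }
    public void commitJob(String jobId) { calls.add("commitJob"); }
    public void abortJob(String jobId) { calls.add("abortJob"); }
}

class LifecycleDriver {
    // Drives one job. Attempts listed in failedAttempts are aborted, and the
    // job is then aborted instead of committed (a deliberate simplification).
    static boolean run(CommitterSketch committer, String jobId,
                       List<String> attempts, List<String> failedAttempts) {
        committer.setupJob(jobId);
        boolean allSucceeded = true;
        for (String attempt : attempts) {
            committer.setupTask(attempt);
            if (failedAttempts.contains(attempt)) {
                committer.abortTask(attempt);
                allSucceeded = false;
            } else if (committer.needsTaskCommit(attempt)) {
                committer.commitTask(attempt);
            }
        }
        if (allSucceeded) {
            committer.commitJob(jobId);
        } else {
            committer.abortJob(jobId);
        }
        return allSucceeded;
    }
}
```

Calling `LifecycleDriver.run` with a `RecordingCommitter` and inspecting its `calls` list shows the sequence for a given mix of successful and failed attempts.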
Constructor and Description |
---|
OutputCommitter() |
Modifier and Type | Method and Description |
---|---|
void | abortJob(JobContext jobContext, org.apache.hadoop.mapreduce.JobStatus.State state) For aborting an unsuccessful job's output. |
abstract void | abortTask(TaskAttemptContext taskContext) Discard the task output. |
void | cleanupJob(JobContext jobContext) Deprecated. Use commitJob(JobContext) and abortJob(JobContext, JobStatus.State) instead. |
void | commitJob(JobContext jobContext) For committing job's output after successful job completion. |
abstract void | commitTask(TaskAttemptContext taskContext) To promote the task's temporary output to final output location. |
boolean | isCommitJobRepeatable(JobContext jobContext) Returns true if an in-progress job commit can be retried. |
boolean | isRecoverySupported() Deprecated. Use isRecoverySupported(JobContext) instead. |
boolean | isRecoverySupported(JobContext jobContext) Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently. |
abstract boolean | needsTaskCommit(TaskAttemptContext taskContext) Check whether task needs a commit. |
void | recoverTask(TaskAttemptContext taskContext) Recover the task output. |
abstract void | setupJob(JobContext jobContext) For the framework to set up the job output during initialization. |
abstract void | setupTask(TaskAttemptContext taskContext) Sets up output for the task. |
public abstract void setupJob(JobContext jobContext) throws IOException
For the framework to set up the job output during initialization.
Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException - if temporary output could not be created
@Deprecated public void cleanupJob(JobContext jobContext) throws IOException
Deprecated. Use commitJob(JobContext) and abortJob(JobContext, JobStatus.State) instead.
Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException
public void commitJob(JobContext jobContext) throws IOException
For committing job's output after successful job completion.
Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException
public void abortJob(JobContext jobContext, org.apache.hadoop.mapreduce.JobStatus.State state) throws IOException
For aborting an unsuccessful job's output. This is invoked for jobs whose final runstate is JobStatus.State.FAILED or JobStatus.State.KILLED. This is called from the application master process for the entire job. This may be called multiple times.
Parameters:
jobContext - Context of the job whose output is being written.
state - final runstate of the job
Throws:
IOException
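Because abortJob may be called multiple times, an implementation generally needs its cleanup to tolerate repeated invocation. The following is a minimal sketch of that idea using only `java.nio.file`, assuming the job's temporary output lives in a single local directory. `IdempotentAbortSketch` and its `abortJob(Path)` signature are hypothetical and do not mirror the Hadoop method.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: shows an abort that is safe to repeat.
class IdempotentAbortSketch {
    // Deletes the job's temporary directory tree. Calling it again after the
    // directory is gone is a no-op, so repeated aborts do not fail.
    static void abortJob(Path jobTempDir) throws IOException {
        if (!Files.exists(jobTempDir)) {
            return; // an earlier abort already cleaned up
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(jobTempDir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.deleteIfExists(path);
        }
    }
}
```

Using `deleteIfExists` rather than `delete` keeps the cleanup tolerant of entries that were removed by a concurrent or earlier abort.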
public abstract void setupTask(TaskAttemptContext taskContext) throws IOException
Sets up output for the task.
Parameters:
taskContext - Context of the task whose output is being written.
Throws:
IOException
public abstract boolean needsTaskCommit(TaskAttemptContext taskContext) throws IOException
Check whether task needs a commit.
Parameters:
taskContext
Throws:
IOException
public abstract void commitTask(TaskAttemptContext taskContext) throws IOException
To promote the task's temporary output to final output location. If needsTaskCommit(TaskAttemptContext) returns true and this task is the one that the AM determines finished first, this method is called to commit an individual task's output. This marks that task's output as complete, since commitJob(JobContext) will also be called later if the entire job finishes successfully. This is called from a task's process. This may be called multiple times for the same task, but from different task attempts. Repeated calls should be very rare and require unusual networking failures. In the future the Hadoop framework may eliminate this race.
Parameters:
taskContext - Context of the task whose output is being written.
Throws:
IOException - if commit is not successful.
public abstract void abortTask(TaskAttemptContext taskContext) throws IOException
Discard the task output.
Parameters:
taskContext
Throws:
IOException
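The task-level calls can be sketched as file moves on a local filesystem. This is an illustration under stated assumptions, not Hadoop's implementation: `TaskCommitSketch` is a hypothetical class, the `<outputDir>/_temporary/<attemptId>/` layout is chosen for the example, attempt directories are assumed to hold only plain files, and a later attempt of the same task is allowed to overwrite an earlier attempt's committed files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class TaskCommitSketch {
    // Assumed layout: <outputDir>/_temporary/<attemptId>/<files>
    static Path attemptDir(Path outputDir, String attemptId) {
        return outputDir.resolve("_temporary").resolve(attemptId);
    }

    // Lists the entries of dir up front so they can be moved or deleted safely.
    private static List<Path> list(Path dir) throws IOException {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                entries.add(entry);
            }
        }
        return entries;
    }

    static void setupTask(Path outputDir, String attemptId) throws IOException {
        Files.createDirectories(attemptDir(outputDir, attemptId));
    }

    // A commit is needed only if the attempt actually wrote something.
    static boolean needsTaskCommit(Path outputDir, String attemptId) throws IOException {
        Path dir = attemptDir(outputDir, attemptId);
        return Files.isDirectory(dir) && !list(dir).isEmpty();
    }

    // Promotes the attempt's files to the final location. REPLACE_EXISTING lets a
    // second attempt of the same task commit over the first attempt's files.
    static void commitTask(Path outputDir, String attemptId) throws IOException {
        Path dir = attemptDir(outputDir, attemptId);
        for (Path file : list(dir)) {
            Files.move(file, outputDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(dir);
    }

    // Discards the attempt's files; a no-op if the directory is already gone.
    static void abortTask(Path outputDir, String attemptId) throws IOException {
        Path dir = attemptDir(outputDir, attemptId);
        if (!Files.isDirectory(dir)) {
            return;
        }
        for (Path file : list(dir)) {
            Files.delete(file);
        }
        Files.delete(dir);
    }
}
```

Keeping uncommitted files under a separate directory means an aborted attempt never leaves partial files in the final output location.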
@Deprecated public boolean isRecoverySupported()
Deprecated. Use isRecoverySupported(JobContext) instead.
Returns:
true if task output recovery is supported, false otherwise
See Also:
recoverTask(TaskAttemptContext)
public boolean isCommitJobRepeatable(JobContext jobContext) throws IOException
Returns true if an in-progress job commit can be retried.
Parameters:
jobContext - Context of the job whose output is being written.
Returns:
true if repeatable job commit is supported, false otherwise
Throws:
IOException
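public boolean isRecoverySupported(JobContext jobContext) throws IOException
Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.
Parameters:
jobContext - Context of the job whose output is being written.
Returns:
true if task output recovery is supported, false otherwise
Throws:
IOException
See Also:
recoverTask(TaskAttemptContext)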
public void recoverTask(TaskAttemptContext taskContext) throws IOException
Recover the task output. The retry count for the job is passed via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown, the task will be attempted again. This may be called multiple times for the same task, but from different application attempts.
Parameters:
taskContext - Context of the task whose output is being recovered
Throws:
IOException
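One way to picture recovery is as carrying a task's committed output forward from the previous application attempt to the current one. The sketch below assumes a layout of `<outputDir>/_temporary/<applicationAttempt>/<taskId>/` and uses a plain `Map` in place of the job configuration. `RecoverTaskSketch`, its method signatures, and the key string `sketch.application.attempt.id` are hypothetical placeholders; the key is not claimed to be the value of MRJobConfig.APPLICATION_ATTEMPT_ID.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
class RecoverTaskSketch {
    // Placeholder key for this sketch only.
    static final String ATTEMPT_ID_KEY = "sketch.application.attempt.id";

    // Assumed layout: <outputDir>/_temporary/<applicationAttempt>/<taskId>
    static Path committedTaskDir(Path outputDir, int applicationAttempt, String taskId) {
        return outputDir.resolve("_temporary")
                .resolve(Integer.toString(applicationAttempt))
                .resolve(taskId);
    }

    // Moves the task's committed output from the previous application attempt
    // into the current one. Returns true if the output is available afterwards.
    static boolean recoverTask(Path outputDir, Map<String, String> conf, String taskId)
            throws IOException {
        int current = Integer.parseInt(conf.getOrDefault(ATTEMPT_ID_KEY, "0"));
        if (current == 0) {
            return false; // first application attempt: nothing to recover from
        }
        Path target = committedTaskDir(outputDir, current, taskId);
        if (Files.isDirectory(target)) {
            return true; // an earlier call already recovered this task
        }
        Path previous = committedTaskDir(outputDir, current - 1, taskId);
        if (!Files.isDirectory(previous)) {
            return false; // the task never committed in the previous attempt
        }
        Files.createDirectories(target.getParent());
        Files.move(previous, target); // a rename when both paths share a filesystem
        return true;
    }
}
```

Returning early when the target already exists keeps a repeated call from failing on an output that was already moved.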
Copyright © 2024 Apache Software Foundation. All rights reserved.