@InterfaceAudience.Public @InterfaceStability.Stable public class FileOutputCommitter extends PathOutputCommitter
OutputCommitter
that commits files specified
in job output directory i.e. ${mapreduce.output.fileoutputformat.outputdir}.Modifier and Type | Field and Description |
---|---|
static String |
FILEOUTPUTCOMMITTER_ALGORITHM_VERSION |
static int |
FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT |
static String |
FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED |
static boolean |
FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT |
static String |
FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED |
static boolean |
FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT |
static String |
FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS |
static int |
FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT |
static String |
FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED |
static boolean |
FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT |
static String |
PENDING_DIR_NAME
Name of directory where pending data is placed.
|
static String |
SUCCEEDED_FILE_NAME |
static String |
SUCCESSFUL_JOB_OUTPUT_DIR_MARKER |
protected static String |
TEMP_DIR_NAME
Deprecated.
|
Constructor and Description |
---|
FileOutputCommitter(Path outputPath,
JobContext context)
Create a file output committer
|
FileOutputCommitter(Path outputPath,
TaskAttemptContext context)
Create a file output committer
|
Modifier and Type | Method and Description |
---|---|
void |
abortJob(JobContext context,
org.apache.hadoop.mapreduce.JobStatus.State state)
Delete the temporary directory, including all of the work directories.
|
void |
abortTask(TaskAttemptContext context)
Delete the work directory
|
void |
cleanupJob(JobContext context)
Deprecated.
|
void |
commitJob(JobContext context)
The job has completed, so do works in commitJobInternal().
|
protected void |
commitJobInternal(JobContext context)
The job has completed, so do following commit job, include:
Move all committed tasks to the final output dir (algorithm 1 only).
|
void |
commitTask(TaskAttemptContext context)
Move the files from the work directory to the job output directory
|
protected Path |
getCommittedTaskPath(int appAttemptId,
TaskAttemptContext context)
Compute the path where the output of a committed task is stored until the
entire job is committed for a specific application attempt.
|
Path |
getCommittedTaskPath(TaskAttemptContext context)
Compute the path where the output of a committed task is stored until
the entire job is committed.
|
static Path |
getCommittedTaskPath(TaskAttemptContext context,
Path out) |
protected Path |
getJobAttemptPath(int appAttemptId)
Compute the path where the output of a given job attempt will be placed.
|
Path |
getJobAttemptPath(JobContext context)
Compute the path where the output of a given job attempt will be placed.
|
static Path |
getJobAttemptPath(JobContext context,
Path out)
Compute the path where the output of a given job attempt will be placed.
|
Path |
getOutputPath()
Get the final directory where work will be placed once the job
is committed.
|
Path |
getTaskAttemptPath(TaskAttemptContext context)
Compute the path where the output of a task attempt is stored until
that task is committed.
|
static Path |
getTaskAttemptPath(TaskAttemptContext context,
Path out)
Compute the path where the output of a task attempt is stored until
that task is committed.
|
Path |
getWorkPath()
Get the directory that the task should write results into.
|
boolean |
isCommitJobRepeatable(JobContext context)
Returns true if an in-progress job commit can be retried.
|
boolean |
isRecoverySupported()
Deprecated.
|
boolean |
needsTaskCommit(TaskAttemptContext context)
Did this task write any files in the work directory?
|
void |
recoverTask(TaskAttemptContext context)
Recover the task output.
|
void |
setupJob(JobContext context)
Create the temporary directory that is the root of all of the task
work directories.
|
void |
setupTask(TaskAttemptContext context)
No task setup required.
|
String |
toString() |
hasOutputPath
isRecoverySupported
public static final String PENDING_DIR_NAME
@Deprecated protected static final String TEMP_DIR_NAME
public static final String SUCCEEDED_FILE_NAME
public static final String SUCCESSFUL_JOB_OUTPUT_DIR_MARKER
public static final String FILEOUTPUTCOMMITTER_ALGORITHM_VERSION
public static final int FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT
public static final String FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED
public static final boolean FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT
public static final String FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED
public static final boolean FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT
public static final String FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS
public static final int FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT
public static final String FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED
public static final boolean FILEOUTPUTCOMMITTER_TASK_CLEANUP_ENABLED_DEFAULT
public FileOutputCommitter(Path outputPath, TaskAttemptContext context) throws IOException
outputPath
- the job's output path, or null if you want the output
committer to act as a noop.context
- the task's contextIOException
@InterfaceAudience.Private public FileOutputCommitter(Path outputPath, JobContext context) throws IOException
outputPath
- the job's output path, or null if you want the output
committer to act as a noop.context
- the task's contextIOException
public Path getOutputPath()
PathOutputCommitter
getOutputPath
in class PathOutputCommitter
public Path getJobAttemptPath(JobContext context)
context
- the context of the job. This is used to get the
application attempt id.public static Path getJobAttemptPath(JobContext context, Path out)
context
- the context of the job. This is used to get the
application attempt id.out
- the output path to place these in.protected Path getJobAttemptPath(int appAttemptId)
appAttemptId
- the ID of the application attempt for this job.public Path getTaskAttemptPath(TaskAttemptContext context)
context
- the context of the task attempt.public static Path getTaskAttemptPath(TaskAttemptContext context, Path out)
context
- the context of the task attempt.out
- The output path to put things in.public Path getCommittedTaskPath(TaskAttemptContext context)
context
- the context of the task attemptpublic static Path getCommittedTaskPath(TaskAttemptContext context, Path out)
protected Path getCommittedTaskPath(int appAttemptId, TaskAttemptContext context)
appAttemptId
- the id of the application attempt to usecontext
- the context of any task.public Path getWorkPath() throws IOException
getWorkPath
in class PathOutputCommitter
IOException
public void setupJob(JobContext context) throws IOException
setupJob
in class OutputCommitter
context
- the job's contextIOException
- if temporary output could not be createdpublic void commitJob(JobContext context) throws IOException
commitJob
in class OutputCommitter
context
- the job's contextIOException
@VisibleForTesting protected void commitJobInternal(JobContext context) throws IOException
context
- the job's contextIOException
@Deprecated public void cleanupJob(JobContext context) throws IOException
OutputCommitter
cleanupJob
in class OutputCommitter
context
- Context of the job whose output is being written.IOException
public void abortJob(JobContext context, org.apache.hadoop.mapreduce.JobStatus.State state) throws IOException
abortJob
in class OutputCommitter
context
- the job's contextstate
- final runstate of the jobIOException
public void setupTask(TaskAttemptContext context) throws IOException
setupTask
in class OutputCommitter
context
- Context of the task whose output is being written.IOException
public void commitTask(TaskAttemptContext context) throws IOException
commitTask
in class OutputCommitter
context
- the task contextIOException
- if commit is not successful.public void abortTask(TaskAttemptContext context) throws IOException
abortTask
in class OutputCommitter
IOException
public boolean needsTaskCommit(TaskAttemptContext context) throws IOException
needsTaskCommit
in class OutputCommitter
context
- the task's contextIOException
@Deprecated public boolean isRecoverySupported()
OutputCommitter
isRecoverySupported
in class OutputCommitter
true
if task output recovery is supported,
false
otherwiseOutputCommitter.recoverTask(TaskAttemptContext)
public boolean isCommitJobRepeatable(JobContext context) throws IOException
OutputCommitter
isCommitJobRepeatable
in class OutputCommitter
context
- Context of the job whose output is being written.true
repeatable job commit is supported,
false
otherwiseIOException
public void recoverTask(TaskAttemptContext context) throws IOException
OutputCommitter
MRJobConfig.APPLICATION_ATTEMPT_ID
key in
JobContext.getConfiguration()
for the
OutputCommitter
. This is called from the application master
process, but it is called individually for each task.
If an exception is thrown the task will be attempted again.
This may be called multiple times for the same task. But from different
application attempts.recoverTask
in class OutputCommitter
context
- Context of the task whose output is being recoveredIOException
public String toString()
toString
in class PathOutputCommitter
Copyright © 2024 Apache Software Foundation. All rights reserved.