@InterfaceAudience.Public
@InterfaceStability.Stable
public class FileOutputCommitter
extends PathOutputCommitter

An OutputCommitter that commits files to the job output directory, i.e. ${mapreduce.output.fileoutputformat.outputdir}.

| Modifier and Type | Field and Description |
|---|---|
| static String | FILEOUTPUTCOMMITTER_ALGORITHM_VERSION |
| static int | FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT |
| static String | FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED |
| static boolean | FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT |
| static String | FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED |
| static boolean | FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT |
| static String | FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS |
| static int | FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT |
| static String | PENDING_DIR_NAME: Name of directory where pending data is placed. |
| static String | SUCCEEDED_FILE_NAME |
| static String | SUCCESSFUL_JOB_OUTPUT_DIR_MARKER |
| protected static String | TEMP_DIR_NAME: Deprecated. |

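The field names above pair each configuration key with a _DEFAULT fallback. The following is a minimal sketch of that pattern for the algorithm version, using a plain Map in place of Hadoop's Configuration so that it runs with the JDK alone. The key string, the caller-supplied default, and the rule that only versions 1 and 2 are accepted are assumptions that this page does not state; AlgorithmVersionSketch and resolve are hypothetical names, not part of the Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

/** JDK-only model of resolving the commit algorithm version; not Hadoop code. */
final class AlgorithmVersionSketch {
    // Assumed value of FILEOUTPUTCOMMITTER_ALGORITHM_VERSION; verify against your release.
    static final String ALGORITHM_VERSION_KEY = "mapreduce.fileoutputcommitter.algorithm.version";

    /**
     * Returns the configured version, or defaultVersion when the key is unset.
     * Assumes only versions 1 and 2 are valid and rejects anything else.
     */
    static int resolve(Map<String, String> conf, int defaultVersion) {
        String raw = conf.get(ALGORITHM_VERSION_KEY);
        int version = (raw == null) ? defaultVersion : Integer.parseInt(raw.trim());
        if (version != 1 && version != 2) {
            throw new IllegalArgumentException("Unsupported algorithm version: " + version);
        }
        return version;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key unset: the supplied default is used.
        System.out.println(resolve(conf, 2));
        // Explicit setting takes precedence over the default.
        conf.put(ALGORITHM_VERSION_KEY, "1");
        System.out.println(resolve(conf, 2));
    }
}
```

In a real job the same lookup would go through the job's Configuration and the constants listed in the table, so the default comes from FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT rather than from the caller.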
| Constructor and Description |
|---|
| FileOutputCommitter(Path outputPath, JobContext context): Create a file output committer. |
| FileOutputCommitter(Path outputPath, TaskAttemptContext context): Create a file output committer. |

| Modifier and Type | Method and Description |
|---|---|
| void | abortJob(JobContext context, org.apache.hadoop.mapreduce.JobStatus.State state): Delete the temporary directory, including all of the work directories. |
| void | abortTask(TaskAttemptContext context): Delete the work directory. |
| void | cleanupJob(JobContext context): Deprecated. |
| void | commitJob(JobContext context): The job has completed, so perform the work in commitJobInternal(). |
| protected void | commitJobInternal(JobContext context): The job has completed, so carry out the job commit, which includes moving all committed tasks to the final output directory (algorithm 1 only). |
| void | commitTask(TaskAttemptContext context): Move the files from the work directory to the job output directory. |
| protected Path | getCommittedTaskPath(int appAttemptId, TaskAttemptContext context): Compute the path where the output of a committed task is stored until the entire job is committed for a specific application attempt. |
| Path | getCommittedTaskPath(TaskAttemptContext context): Compute the path where the output of a committed task is stored until the entire job is committed. |
| static Path | getCommittedTaskPath(TaskAttemptContext context, Path out) |
| protected Path | getJobAttemptPath(int appAttemptId): Compute the path where the output of a given job attempt will be placed. |
| Path | getJobAttemptPath(JobContext context): Compute the path where the output of a given job attempt will be placed. |
| static Path | getJobAttemptPath(JobContext context, Path out): Compute the path where the output of a given job attempt will be placed. |
| Path | getOutputPath(): Get the final directory where work will be placed once the job is committed. |
| Path | getTaskAttemptPath(TaskAttemptContext context): Compute the path where the output of a task attempt is stored until that task is committed. |
| static Path | getTaskAttemptPath(TaskAttemptContext context, Path out): Compute the path where the output of a task attempt is stored until that task is committed. |
| Path | getWorkPath(): Get the directory that the task should write results into. |
| boolean | isCommitJobRepeatable(JobContext context): Returns true if an in-progress job commit can be retried. |
| boolean | isRecoverySupported(): Deprecated. |
| boolean | needsTaskCommit(TaskAttemptContext context): Did this task write any files in the work directory? |
| void | recoverTask(TaskAttemptContext context): Recover the task output. |
| void | setupJob(JobContext context): Create the temporary directory that is the root of all of the task work directories. |
| void | setupTask(TaskAttemptContext context): No task setup required. |
| String | toString() |

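The summaries above describe a sequence: setupJob creates a temporary root, each task writes into its own work directory, commitTask or abortTask resolves that directory, and the job commit moves committed output to the final directory (algorithm 1 only). The following is a rough model of that sequence on the local filesystem with java.nio.file, not Hadoop's implementation. The directory name _temporary, the marker name _SUCCESS, the attempt and task IDs, and every class and method name in the block are assumptions made for illustration. The sketch handles only flat files with distinct names and does no conflict resolution.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Local-filesystem model of the commit flow (algorithm 1 style); not Hadoop code. */
final class CommitLifecycleSketch {
    static final String PENDING = "_temporary"; // assumed value of PENDING_DIR_NAME
    static final String SUCCESS = "_SUCCESS";   // assumed value of SUCCEEDED_FILE_NAME

    private final Path out;        // final job output directory
    private final Path jobAttempt; // out/_temporary/<appAttemptId>

    CommitLifecycleSketch(Path out, int appAttemptId) {
        this.out = out;
        this.jobAttempt = out.resolve(PENDING).resolve(Integer.toString(appAttemptId));
    }

    Path taskAttemptPath(String id) { return jobAttempt.resolve(PENDING).resolve(id); }
    Path committedTaskPath(String taskId) { return jobAttempt.resolve(taskId); }

    /** setupJob: create the temporary root of all task work directories. */
    void setupJob() throws IOException {
        Files.createDirectories(jobAttempt.resolve(PENDING));
    }

    /** needsTaskCommit: true if the attempt's work directory exists. */
    boolean needsTaskCommit(String taskAttemptId) {
        return Files.exists(taskAttemptPath(taskAttemptId));
    }

    /** commitTask: rename the work directory to the committed-task path. */
    void commitTask(String taskId, String taskAttemptId) throws IOException {
        Files.move(taskAttemptPath(taskAttemptId), committedTaskPath(taskId));
    }

    /** abortTask: delete the work directory. */
    void abortTask(String taskAttemptId) throws IOException {
        deleteTree(taskAttemptPath(taskAttemptId));
    }

    /** commitJob: move committed files to the output dir, clean up, mark success. */
    void commitJob() throws IOException {
        for (Path taskDir : list(jobAttempt)) {
            if (taskDir.getFileName().toString().equals(PENDING)) continue; // uncommitted work
            for (Path file : list(taskDir)) {
                Files.move(file, out.resolve(file.getFileName().toString()));
            }
        }
        deleteTree(out.resolve(PENDING));
        Files.createFile(out.resolve(SUCCESS));
    }

    private static List<Path> list(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.collect(Collectors.toList());
        }
    }

    private static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) return;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }

    /** Runs one committed and one aborted task; returns the output directory. */
    static Path demo() {
        try {
            Path out = Files.createTempDirectory("job-output");
            CommitLifecycleSketch c = new CommitLifecycleSketch(out, 0);
            c.setupJob();
            Path work0 = Files.createDirectories(c.taskAttemptPath("attempt_0"));
            Files.write(work0.resolve("part-00000"), "committed".getBytes());
            Path work1 = Files.createDirectories(c.taskAttemptPath("attempt_1"));
            Files.write(work1.resolve("part-00001"), "aborted".getBytes());
            if (c.needsTaskCommit("attempt_0")) c.commitTask("task_0", "attempt_0");
            c.abortTask("attempt_1");
            c.commitJob();
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("committed output in " + demo());
    }
}
```

Because each task writes only under its own attempt directory, an aborted or failed attempt can be discarded without touching output that other tasks have already committed.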
Methods inherited from class PathOutputCommitter: hasOutputPath
Methods inherited from class OutputCommitter: isRecoverySupported

public static final String PENDING_DIR_NAME
@Deprecated protected static final String TEMP_DIR_NAME
public static final String SUCCEEDED_FILE_NAME
public static final String SUCCESSFUL_JOB_OUTPUT_DIR_MARKER
public static final String FILEOUTPUTCOMMITTER_ALGORITHM_VERSION
public static final int FILEOUTPUTCOMMITTER_ALGORITHM_VERSION_DEFAULT
public static final String FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED
public static final boolean FILEOUTPUTCOMMITTER_CLEANUP_SKIPPED_DEFAULT
public static final String FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED
public static final boolean FILEOUTPUTCOMMITTER_CLEANUP_FAILURES_IGNORED_DEFAULT
public static final String FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS
public static final int FILEOUTPUTCOMMITTER_FAILURE_ATTEMPTS_DEFAULT

public FileOutputCommitter(Path outputPath, TaskAttemptContext context) throws IOException
Create a file output committer.
Parameters:
outputPath - the job's output path, or null if you want the output committer to act as a no-op.
context - the task's context
Throws:
IOException

@InterfaceAudience.Private
public FileOutputCommitter(Path outputPath, JobContext context) throws IOException
Create a file output committer.
Parameters:
outputPath - the job's output path, or null if you want the output committer to act as a no-op.
context - the job's context
Throws:
IOException

public Path getOutputPath()
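Get the final directory where work will be placed once the job is committed.
Specified by:
getOutputPath in class PathOutputCommitter

public Path getJobAttemptPath(JobContext context)
Compute the path where the output of a given job attempt will be placed.
Parameters:
context - the context of the job. This is used to get the application attempt ID.

public static Path getJobAttemptPath(JobContext context, Path out)
Compute the path where the output of a given job attempt will be placed.
Parameters:
context - the context of the job. This is used to get the application attempt ID.
out - the output path to place these in.

protected Path getJobAttemptPath(int appAttemptId)
Compute the path where the output of a given job attempt will be placed.
Parameters:
appAttemptId - the ID of the application attempt for this job.

public Path getTaskAttemptPath(TaskAttemptContext context)
Compute the path where the output of a task attempt is stored until that task is committed.
Parameters:
context - the context of the task attempt.

public static Path getTaskAttemptPath(TaskAttemptContext context, Path out)
Compute the path where the output of a task attempt is stored until that task is committed.
Parameters:
context - the context of the task attempt.
out - the output path to put things in.

public Path getCommittedTaskPath(TaskAttemptContext context)
Compute the path where the output of a committed task is stored until the entire job is committed.
Parameters:
context - the context of the task attempt.

public static Path getCommittedTaskPath(TaskAttemptContext context, Path out)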
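
The job attempt, task attempt, and committed task paths documented above all nest under the output path. The following is a sketch of one plausible nesting, using java.nio.file.Path rather than Hadoop's Path so that it runs with the JDK alone. The layout, the value _temporary for PENDING_DIR_NAME, and the example IDs are assumptions drawn from how the Hadoop source is commonly understood to arrange these directories; this page does not confirm them, and CommitterPathLayout is a hypothetical name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Models the directory nesting only; uses java.nio.file.Path, not Hadoop's Path. */
final class CommitterPathLayout {
    // Assumed value of PENDING_DIR_NAME; this page does not state it.
    static final String PENDING_DIR_NAME = "_temporary";

    /** Assumed layout: out/_temporary/<appAttemptId> */
    static Path jobAttemptPath(Path out, int appAttemptId) {
        return out.resolve(PENDING_DIR_NAME).resolve(Integer.toString(appAttemptId));
    }

    /** Assumed layout: out/_temporary/<appAttemptId>/_temporary/<taskAttemptId> */
    static Path taskAttemptPath(Path out, int appAttemptId, String taskAttemptId) {
        return jobAttemptPath(out, appAttemptId).resolve(PENDING_DIR_NAME).resolve(taskAttemptId);
    }

    /** Assumed layout: out/_temporary/<appAttemptId>/<taskId> */
    static Path committedTaskPath(Path out, int appAttemptId, String taskId) {
        return jobAttemptPath(out, appAttemptId).resolve(taskId);
    }

    public static void main(String[] args) {
        Path out = Paths.get("out");
        // Illustrative IDs only; real IDs come from the job and task contexts.
        System.out.println(jobAttemptPath(out, 0));
        System.out.println(taskAttemptPath(out, 0, "attempt_x_m_000005_0"));
        System.out.println(committedTaskPath(out, 0, "task_x_m_000005"));
    }
}
```

Under this layout a committed task is keyed by task ID rather than task attempt ID, so at most one attempt's output per task sits at the committed path for a given application attempt.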
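
protected Path getCommittedTaskPath(int appAttemptId, TaskAttemptContext context)
Compute the path where the output of a committed task is stored until the entire job is committed for a specific application attempt.
Parameters:
appAttemptId - the ID of the application attempt to use
context - the context of any task.

public Path getWorkPath() throws IOException
Get the directory that the task should write results into.
Specified by:
getWorkPath in class PathOutputCommitter
Throws:
IOException

public void setupJob(JobContext context) throws IOException
Create the temporary directory that is the root of all of the task work directories.
Specified by:
setupJob in class OutputCommitter
Parameters:
context - the job's context
Throws:
IOException - if temporary output could not be created

public void commitJob(JobContext context) throws IOException
The job has completed, so perform the work in commitJobInternal().
Overrides:
commitJob in class OutputCommitter
Parameters:
context - the job's context
Throws:
IOException

protected void commitJobInternal(JobContext context) throws IOException
The job has completed, so carry out the job commit, which includes moving all committed tasks to the final output directory (algorithm 1 only).
Parameters:
context - the job's context
Throws:
IOException

@Deprecated
public void cleanupJob(JobContext context) throws IOException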
Deprecated.
Overrides:
cleanupJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Throws:
IOException

public void abortJob(JobContext context, org.apache.hadoop.mapreduce.JobStatus.State state) throws IOException
Delete the temporary directory, including all of the work directories.
Overrides:
abortJob in class OutputCommitter
Parameters:
context - the job's context
state - final runstate of the job
Throws:
IOException

public void setupTask(TaskAttemptContext context) throws IOException
No task setup required.
Specified by:
setupTask in class OutputCommitter
Parameters:
context - Context of the task whose output is being written.
Throws:
IOException

public void commitTask(TaskAttemptContext context) throws IOException
Move the files from the work directory to the job output directory.
Specified by:
commitTask in class OutputCommitter
Parameters:
context - the task context
Throws:
IOException - if commit is not successful.

public void abortTask(TaskAttemptContext context) throws IOException
Delete the work directory.
Specified by:
abortTask in class OutputCommitter
Throws:
IOException

public boolean needsTaskCommit(TaskAttemptContext context) throws IOException
Did this task write any files in the work directory?
Specified by:
needsTaskCommit in class OutputCommitter
Parameters:
context - the task's context
Throws:
IOException

@Deprecated
public boolean isRecoverySupported()
Deprecated.
Overrides:
isRecoverySupported in class OutputCommitter
Returns:
true if task output recovery is supported, false otherwise
See Also:
OutputCommitter.recoverTask(TaskAttemptContext)

public boolean isCommitJobRepeatable(JobContext context) throws IOException
Returns true if an in-progress job commit can be retried.
Overrides:
isCommitJobRepeatable in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Returns:
true if repeatable job commit is supported, false otherwise
Throws:
IOException

public void recoverTask(TaskAttemptContext context) throws IOException
Recover the task output. The retry count for the job is passed via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown, the task will be attempted again. This may be called multiple times for the same task, but from different application attempts.
Overrides:
recoverTask in class OutputCommitter
Parameters:
context - Context of the task whose output is being recovered
Throws:
IOException

public String toString()
Overrides:
toString in class PathOutputCommitter

Copyright © 2018 Apache Software Foundation. All rights reserved.