Package org.apache.hadoop.fs.s3a.commit.magic
Class MagicS3GuardCommitter
java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter
All Implemented Interfaces:
org.apache.hadoop.fs.statistics.IOStatisticsSource
@Public
@Unstable
public class MagicS3GuardCommitter
extends org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
This is a dedicated committer which requires the "magic" directory feature
of the S3A filesystem to be enabled. It places the task and job attempt
paths under magic paths, so that the final output goes directly to the
destination directory.
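To make the idea of a magic path concrete, the sketch below maps a path written under a magic directory to the destination it would resolve to. It is a simplified illustration, not the S3A implementation: the class and method names are hypothetical, and the marker directory names `__magic` and `__base` are assumptions taken from the S3A committer documentation, so they may differ between Hadoop releases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Hadoop API. */
final class MagicPathSketch {
    // Assumed marker directory names; check the constants of your Hadoop release.
    static final String MAGIC = "__magic";
    static final String BASE = "__base";

    /** Map a path under a magic directory to the destination it would resolve to. */
    static String finalDestination(String path) {
        List<String> elements = Arrays.asList(path.split("/"));
        int magic = elements.indexOf(MAGIC);
        if (magic < 0) {
            return path; // not a magic path: the data lands where it was written
        }
        // Everything before the magic directory is the destination directory.
        List<String> dest = new ArrayList<>(elements.subList(0, magic));
        List<String> children = elements.subList(magic + 1, elements.size());
        if (children.isEmpty()) {
            throw new IllegalArgumentException("No path found under " + MAGIC);
        }
        int base = children.indexOf(BASE);
        if (base >= 0) {
            // Keep the directory tree that follows the base marker.
            List<String> baseChildren = children.subList(base + 1, children.size());
            if (baseChildren.isEmpty()) {
                throw new IllegalArgumentException("No path found under " + BASE);
            }
            dest.addAll(baseChildren);
        } else {
            // No base marker: only the file name survives.
            dest.add(children.get(children.size() - 1));
        }
        return String.join("/", dest);
    }

    public static void main(String[] args) {
        System.out.println(finalDestination(
            "s3a://bucket/out/__magic/job-1/task-3/__base/year=2024/part-0000"));
        System.out.println(finalDestination(
            "s3a://bucket/out/__magic/job-1/task-3/part-0000"));
    }
}
```

In this model, the job and task attempt directories exist only as path elements under the magic directory; none of them appears in the resolved destination.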
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit, AbstractS3ACommitter.JobUUIDSource
Field Summary
Fields inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
E_SELF_GENERATED_JOB_UUID, THREAD_PREFIX
Constructor Summary
Constructor | Description
MagicS3GuardCommitter(Path outputPath, TaskAttemptContext context) | Create a task committer.
Method Summary
Modifier and Type | Method | Description
void | abortTask(TaskAttemptContext context) | Abort a task.
void | cleanupStagingDirs() | Delete the magic directory.
void | commitTask(TaskAttemptContext context) | To promote the task's temporary output to final output location.
protected final Path | getBaseTaskAttemptPath(TaskAttemptContext context) | Compute the base path where the output of a task attempt is written.
protected final Path | getJobAttemptPath(int appAttemptId) | Compute the path where the output of a given job attempt will be placed.
protected Path | getJobPath() | Compute the path under which all job attempts will be placed.
String | getName() | Get the name of this committer.
final Path | getTaskAttemptPath(TaskAttemptContext context) | Compute the path where the output of a task attempt is stored until that task is committed.
Path | getTempTaskAttemptPath(TaskAttemptContext context) | Get a temporary directory for data.
protected org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit | listPendingUploadsToCommit(org.apache.hadoop.fs.s3a.commit.impl.CommitContext commitContext) | Get the list of pending uploads for this job attempt, by listing all .pendingset files in the job attempt directory.
protected org.apache.hadoop.fs.s3a.commit.files.PendingSet | loadPendingCommits(TaskAttemptContext context) | Loads pending commits from either memory or from the remote store (S3) based on the config.
boolean | needsTaskCommit(TaskAttemptContext context) | Did this task write any files in the work directory?
protected boolean | requiresDelayedCommitOutputInFileSystem() | Require magic paths in the FS client.
void | setupJob(JobContext context) | Base job setup (optionally) deletes the success marker and always creates the destination directory.
String | toString() |
Methods inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
abortJob, abortJobInternal, abortPendingUploads, abortPendingUploads, abortPendingUploadsInCleanup, buildJobUUID, cleanup, cleanupJob, commitJob, commitJobInternal, commitPendingUploads, deleteTaskAttemptPathQuietly, getAuditSpanSource, getCommitOperations, getConf, getDestFS, getDestinationFS, getDestS3AFS, getIOStatistics, getJobAttemptPath, getJobContext, getOutputPath, getRole, getTaskAttemptFilesystem, getUUID, getUUIDSource, getWorkPath, initiateJobOperation, initiateTaskOperation, initOutput, jobCompleted, maybeCreateSuccessMarker, maybeCreateSuccessMarkerFromCommits, maybeIgnore, maybeIgnore, precommitCheckPendingFiles, preCommitJob, recoverTask, setConf, setDestFS, setOutputPath, setupTask, setWorkPath, startOperation, updateCommonContext, warnOnActiveUploads
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
hasOutputPath
Methods inherited from class org.apache.hadoop.mapreduce.OutputCommitter
isCommitJobRepeatable, isRecoverySupported, isRecoverySupported
-
Field Details
-
NAME
Name: "magic".
-
-
Constructor Details
-
MagicS3GuardCommitter
Create a task committer.
Parameters:
outputPath - the job's output path
context - the task's context
Throws:
IOException - on a failure
-
-
Method Details
-
getName
Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Get the name of this committer.
Specified by:
getName in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Returns:
the committer name.
-
requiresDelayedCommitOutputInFileSystem
protected boolean requiresDelayedCommitOutputInFileSystem()
Require magic paths in the FS client.
Overrides:
requiresDelayedCommitOutputInFileSystem in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Returns:
true, always.
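Because this committer always requires magic path support, a job using it has to run with that support switched on in the filesystem client. The sketch below shows the kind of precondition check this implies, with a plain `Map` standing in for a Hadoop `Configuration`. The option names `fs.s3a.committer.name` and `fs.s3a.committer.magic.enabled` are the documented S3A committer options; the class, the method names and the error text are hypothetical. The default of the magic option varies by Hadoop release, so the sketch takes it as a parameter rather than assuming one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical precondition check; a Map stands in for a Hadoop Configuration. */
final class MagicSupportCheckSketch {
    static final String COMMITTER_NAME = "fs.s3a.committer.name";
    static final String MAGIC_ENABLED = "fs.s3a.committer.magic.enabled";

    /** Settings a job would typically carry to select the magic committer. */
    static Map<String, String> magicCommitterSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(COMMITTER_NAME, "magic");
        conf.put(MAGIC_ENABLED, "true");
        return conf;
    }

    /** Read the magic option, falling back to the caller-supplied default. */
    static boolean isMagicEnabled(Map<String, String> conf, boolean defaultValue) {
        String value = conf.get(MAGIC_ENABLED);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    /** Fail fast when magic path support is not available. */
    static void requireMagic(Map<String, String> conf, boolean defaultValue) {
        if (!isMagicEnabled(conf, defaultValue)) {
            throw new IllegalStateException(
                "Magic committer requested but " + MAGIC_ENABLED + " is not true");
        }
    }

    public static void main(String[] args) {
        requireMagic(magicCommitterSettings(), false); // passes silently
        System.out.println("magic path support is enabled");
    }
}
```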
-
setupJob
Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Base job setup (optionally) deletes the success marker and always creates the destination directory. When objects are committed, that destination directory marker will inevitably be deleted; creating it now ensures that something exists at the destination while the job is in progress and, if the job creates no output, that the directory is still there.
The option InternalCommitterConstants.FS_S3A_COMMITTER_UUID is set to the job UUID; if the UUID was generated locally, InternalCommitterConstants.SPARK_WRITE_UUID is also patched. The field AbstractS3ACommitter.jobSetup is set to true to note that this specific committer instance was used to set up a job.
Overrides:
setupJob in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Parameters:
context - context
Throws:
IOException - IO failure
-
listPendingUploadsToCommit
protected org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit listPendingUploadsToCommit(org.apache.hadoop.fs.s3a.commit.impl.CommitContext commitContext) throws IOException
Get the list of pending uploads for this job attempt, by listing all .pendingset files in the job attempt directory.
Specified by:
listPendingUploadsToCommit in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Parameters:
commitContext - job context
Returns:
a list of pending commits.
Throws:
IOException - Any IO failure
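The listing step can be pictured with a local directory standing in for the job attempt directory in S3. This is a minimal sketch under that assumption, not the committer's code: the real method works through the S3A filesystem and returns an ActiveCommit, whereas this hypothetical helper only collects the files whose names end in `.pendingset`. The file names used in `demo()` are invented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical local-filesystem stand-in for listing a job attempt directory. */
final class PendingSetListingSketch {
    static final String PENDINGSET_SUFFIX = ".pendingset";

    /** Return the .pendingset files directly under the given directory, sorted. */
    static List<Path> listPendingSets(Path jobAttemptDir) {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(jobAttemptDir, "*" + PENDINGSET_SUFFIX)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    result.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result);
        return result;
    }

    /** Build a throwaway directory with mixed files and count the matches. */
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("job-attempt");
            Files.createFile(dir.resolve("task_0001.pendingset"));
            Files.createFile(dir.resolve("task_0002.pendingset"));
            Files.createFile(dir.resolve("part-0000.pending")); // ignored: wrong suffix
            return listPendingSets(dir).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pendingset files found: " + demo());
    }
}
```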
-
cleanupStagingDirs
public void cleanupStagingDirs()
Delete the magic directory.
Specified by:
cleanupStagingDirs in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
-
needsTaskCommit
Did this task write any files in the work directory? Probes for the task's output by checking whether the task attempt directory exists. This adds more HTTP requests to the call; it may be better to just return true and rely on the task commit doing the work.
Specified by:
needsTaskCommit in class OutputCommitter
Parameters:
context - the task's context
Returns:
true if the attempt path exists
Throws:
IOException - failure to list the path
-
commitTask
Description copied from class: OutputCommitter
To promote the task's temporary output to final output location. If OutputCommitter.needsTaskCommit(TaskAttemptContext) returns true and this task is the task that the AM determines finished first, this method is called to commit an individual task's output. This marks that task's output as complete, as OutputCommitter.commitJob(JobContext) will also be called later on if the entire job finished successfully. This is called from a task's process. This may be called multiple times for the same task, but for different task attempts. It should be very rare for this to be called multiple times, and it requires odd networking failures to make this happen. In the future the Hadoop framework may eliminate this race.
Specified by:
commitTask in class OutputCommitter
Parameters:
context - Context of the task whose output is being written.
Throws:
IOException - if commit is not successful.
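The sequence described here (probe with needsTaskCommit, commit one attempt per task, then commit the job) can be modelled in memory as follows. This is a simplified sketch, not the committer's implementation: it does not talk to S3, every name in it is hypothetical, and keying committed work by task ID so that a later attempt replaces an earlier one is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the task commit / job commit sequence. */
final class CommitProtocolSketch {
    // Uncommitted work, keyed by task attempt ID.
    private final Map<String, List<String>> pendingByAttempt = new LinkedHashMap<>();
    // Work promoted by commitTask, keyed by task ID (one winner per task).
    private final Map<String, List<String>> committedTasks = new LinkedHashMap<>();
    // Files visible at the destination; empty until the job commits.
    private final List<String> visible = new ArrayList<>();

    void write(String attemptId, String file) {
        pendingByAttempt.computeIfAbsent(attemptId, k -> new ArrayList<>()).add(file);
    }

    boolean needsTaskCommit(String attemptId) {
        return pendingByAttempt.containsKey(attemptId);
    }

    void commitTask(String taskId, String attemptId) {
        List<String> files = pendingByAttempt.remove(attemptId);
        if (files != null) {
            committedTasks.put(taskId, files);
        }
    }

    void abortTask(String attemptId) {
        pendingByAttempt.remove(attemptId); // discard this attempt's pending work
    }

    List<String> commitJob() {
        for (List<String> files : committedTasks.values()) {
            visible.addAll(files);
        }
        committedTasks.clear();
        return visibleFiles();
    }

    List<String> visibleFiles() {
        return new ArrayList<>(visible);
    }

    public static void main(String[] args) {
        CommitProtocolSketch job = new CommitProtocolSketch();
        job.write("task1_attempt0", "part-0000-a");
        job.write("task1_attempt1", "part-0000-b");
        job.abortTask("task1_attempt0");
        if (job.needsTaskCommit("task1_attempt1")) {
            job.commitTask("task1", "task1_attempt1");
        }
        System.out.println(job.commitJob());
    }
}
```

Nothing becomes visible until `commitJob()` runs, which is the property that lets task attempts be aborted or retried without leaving partial output at the destination.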
-
loadPendingCommits
protected org.apache.hadoop.fs.s3a.commit.files.PendingSet loadPendingCommits(TaskAttemptContext context) throws IOException
Loads pending commits from either memory or from the remote store (S3) based on the config.
Parameters:
context - TaskAttemptContext
Returns:
All pending commit data for the given TaskAttemptContext
Throws:
IOException - if there is an error trying to read the commit data
-
abortTask
Abort a task. Attempts to load and then abort all pending files, then tries to delete the task attempt path. This method may be called on the job committer, rather than the task one (such as in the MapReduce AM after a task container failure). It must extract all paths and state from the passed-in context.
Specified by:
abortTask in class OutputCommitter
Parameters:
context - task context
Throws:
IOException - if there was some problem querying the path other than it not actually existing.
-
getJobPath
Compute the path under which all job attempts will be placed.
Specified by:
getJobPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Returns:
the path to store job attempt data.
-
getJobAttemptPath
Compute the path where the output of a given job attempt will be placed. For the magic committer, the path includes the job UUID.
Specified by:
getJobAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Parameters:
appAttemptId - the ID of the application attempt for this job.
Returns:
the path to store job attempt data.
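Including the job UUID in the path keeps the job attempt directories of different jobs apart, even when they write to the same destination. The sketch below illustrates that property only. The directory layout it builds (`__magic/job-<uuid>/<attempt>`) is a hypothetical one chosen for the example, as are the class and method names; the layout actually produced by this committer depends on the Hadoop release.

```java
/** Hypothetical path builder; the layout is illustrative, not the S3A layout. */
final class JobAttemptPathSketch {
    // Assumed name of the magic directory.
    static final String MAGIC = "__magic";

    /** Build a job attempt path that embeds the job UUID and the attempt ID. */
    static String jobAttemptPath(String dest, String jobUUID, int appAttemptId) {
        // Drop one trailing slash so the result has no empty path element.
        String base = dest.endsWith("/") ? dest.substring(0, dest.length() - 1) : dest;
        return String.format("%s/%s/job-%s/%02d", base, MAGIC, jobUUID, appAttemptId);
    }

    public static void main(String[] args) {
        // Two jobs with different UUIDs writing to the same destination.
        System.out.println(jobAttemptPath("s3a://bucket/out", "uuid-a", 0));
        System.out.println(jobAttemptPath("s3a://bucket/out", "uuid-b", 0));
    }
}
```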
-
getTaskAttemptPath
Compute the path where the output of a task attempt is stored until that task is committed.
Overrides:
getTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Parameters:
context - the context of the task attempt.
Returns:
the path where a task attempt should be stored.
-
getBaseTaskAttemptPath
Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Compute the base path where the output of a task attempt is written. This is the path which will be deleted when a task is cleaned up and aborted.
Specified by:
getBaseTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Parameters:
context - the context of the task attempt.
Returns:
the path where a task attempt should be stored.
-
getTempTaskAttemptPath
Get a temporary directory for data. When a task is aborted/cleaned up, the contents of this directory are all deleted.
Specified by:
getTempTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
Parameters:
context - task context
Returns:
a path for temporary data.
-
toString
Overrides:
toString in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
-