org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter

All Implemented Interfaces:: org.apache.hadoop.fs.statistics.IOStatisticsSource

@Public @Unstable public class MagicS3GuardCommitter extends org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

This is a dedicated committer which requires the "magic" directory feature of the S3A Filesystem to be enabled; it then uses paths for task and job attempts in magic paths, so as to ensure that the final output goes direct to the destination directory.

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit, AbstractS3ACommitter.JobUUIDSource
Field Summary

Fields

Modifier and Type

Field

Description

static final String

NAME

Name: "magic".

Fields inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
E_SELF_GENERATED_JOB_UUID, THREAD_PREFIX
Constructor Summary

Constructors

Constructor

Description

MagicS3GuardCommitter(Path outputPath, TaskAttemptContext context)

Create a task committer.
Method Summary

Modifier and Type

Method

Description

void

abortTask(TaskAttemptContext context)

Abort a task.

void

cleanupStagingDirs()

Delete the magic directory.

void

commitTask(TaskAttemptContext context)

To promote the task's temporary output to final output location.

protected final Path

getBaseTaskAttemptPath(TaskAttemptContext context)

Compute the base path where the output of a task attempt is written.

protected final Path

getJobAttemptPath(int appAttemptId)

Compute the path where the output of a given job attempt will be placed.

protected Path

getJobPath()

Compute the path under which all job attempts will be placed.

String

getName()

Get the name of this committer.

final Path

getTaskAttemptPath(TaskAttemptContext context)

Compute the path where the output of a task attempt is stored until that task is committed.

Path

getTempTaskAttemptPath(TaskAttemptContext context)

Get a temporary directory for data.

protected org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit

listPendingUploadsToCommit(org.apache.hadoop.fs.s3a.commit.impl.CommitContext commitContext)

Get the list of pending uploads for this job attempt, by listing all .pendingset files in the job attempt directory.

protected org.apache.hadoop.fs.s3a.commit.files.PendingSet

loadPendingCommits(TaskAttemptContext context)

Loads pending commits from either memory or from the remote store (S3) based on the config.

boolean

needsTaskCommit(TaskAttemptContext context)

Did this task write any files in the work directory?

protected boolean

requiresDelayedCommitOutputInFileSystem()

Require magic paths in the FS client.

void

setupJob(JobContext context)

Base job setup (optionally) deletes the success marker and always creates the destination directory.

String

toString()

Methods inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
abortJob, abortJobInternal, abortPendingUploads, abortPendingUploads, abortPendingUploadsInCleanup, buildJobUUID, cleanup, cleanupJob, commitJob, commitJobInternal, commitPendingUploads, deleteTaskAttemptPathQuietly, getAuditSpanSource, getCommitOperations, getConf, getDestFS, getDestinationFS, getDestS3AFS, getIOStatistics, getJobAttemptPath, getJobContext, getOutputPath, getRole, getTaskAttemptFilesystem, getUUID, getUUIDSource, getWorkPath, initiateJobOperation, initiateTaskOperation, initOutput, jobCompleted, maybeCreateSuccessMarker, maybeCreateSuccessMarkerFromCommits, maybeIgnore, maybeIgnore, precommitCheckPendingFiles, preCommitJob, recoverTask, setConf, setDestFS, setOutputPath, setupTask, setWorkPath, startOperation, updateCommonContext, warnOnActiveUploads

Methods inherited from class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
hasOutputPath

Methods inherited from class org.apache.hadoop.mapreduce.OutputCommitter
isCommitJobRepeatable, isRecoverySupported, isRecoverySupported

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- NAME
  
  public static final String NAME
  
  Name: "magic".
  See Also:
  
  Constant Field Values
Constructor Details
- MagicS3GuardCommitter
  
  public MagicS3GuardCommitter(Path outputPath, TaskAttemptContext context) throws IOException
  
  Create a task committer.
  
  Parameters:
  
  outputPath - the job's output path
  
  context - the task's context
  
  Throws:
  
  IOException - on a failure
Method Details
- getName
  
  public String getName()
  
  Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Get the name of this committer.
  
  Specified by:
  
  getName in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Returns:
  
  the committer name.
- requiresDelayedCommitOutputInFileSystem
  
  protected boolean requiresDelayedCommitOutputInFileSystem()
  
  Require magic paths in the FS client.
  
  Overrides:
  
  requiresDelayedCommitOutputInFileSystem in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Returns:
  
  true, always.
- setupJob
  
  public void setupJob(JobContext context) throws IOException
  
  Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Base job setup (optionally) deletes the success marker and always creates the destination directory. When objects are committed that dest dir marker will inevitably be deleted; creating it now ensures there is something at the end while the job is in progress -and if nothing is created, that it is still there.
  The option InternalCommitterConstants.FS_S3A_COMMITTER_UUID is set to the job UUID; if generated locally InternalCommitterConstants.SPARK_WRITE_UUID is also patched. The field AbstractS3ACommitter.jobSetup is set to true to note that this specific committer instance was used to set up a job.
  
  Overrides:
  
  setupJob in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Parameters:
  
  context - context
  
  Throws:
  
  IOException - IO failure
- listPendingUploadsToCommit
  
  protected org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit listPendingUploadsToCommit(org.apache.hadoop.fs.s3a.commit.impl.CommitContext commitContext) throws IOException
  
  Get the list of pending uploads for this job attempt, by listing all .pendingset files in the job attempt directory.
  
  Specified by:
  
  listPendingUploadsToCommit in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Parameters:
  
  commitContext - job context
  
  Returns:
  
  a list of pending commits.
  
  Throws:
  
  IOException - Any IO failure
- cleanupStagingDirs
  
  public void cleanupStagingDirs()
  
  Delete the magic directory.
  
  Specified by:
  
  cleanupStagingDirs in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
- needsTaskCommit
  
  public boolean needsTaskCommit(TaskAttemptContext context) throws IOException
  
  Did this task write any files in the work directory? Probes for a task existing by looking to see if the attempt dir exists. This adds more HTTP requests to the call. It may be better just to return true and rely on the commit task doing the work.
  
  Specified by:
  
  needsTaskCommit in class OutputCommitter
  
  Parameters:
  
  context - the task's context
  
  Returns:
  
  true if the attempt path exists
  
  Throws:
  
  IOException - failure to list the path
- commitTask
  
  public void commitTask(TaskAttemptContext context) throws IOException
  
  Description copied from class: OutputCommitter
  
  To promote the task's temporary output to final output location. If OutputCommitter.needsTaskCommit(TaskAttemptContext) returns true and this task is the task that the AM determines finished first, this method is called to commit an individual task's output. This is to mark that tasks output as complete, as OutputCommitter.commitJob(JobContext) will also be called later on if the entire job finished successfully. This is called from a task's process. This may be called multiple times for the same task, but different task attempts. It should be very rare for this to be called multiple times and requires odd networking failures to make this happen. In the future the Hadoop framework may eliminate this race.
  
  Specified by:
  
  commitTask in class OutputCommitter
  
  Parameters:
  
  context - Context of the task whose output is being written.
  
  Throws:
  
  IOException - if commit is not successful.
- loadPendingCommits
  
  protected org.apache.hadoop.fs.s3a.commit.files.PendingSet loadPendingCommits(TaskAttemptContext context) throws IOException
  
  Loads pending commits from either memory or from the remote store (S3) based on the config.
  
  Parameters:
  
  context - TaskAttemptContext
  
  Returns:
  
  All pending commit data for the given TaskAttemptContext
  
  Throws:
  
  IOException - if there is an error trying to read the commit data
- abortTask
  
  public void abortTask(TaskAttemptContext context) throws IOException
  
  Abort a task. Attempt load then abort all pending files, then try to delete the task attempt path. This method may be called on the job committer, rather than the task one (such as in the MapReduce AM after a task container failure). It must extract all paths and state from the passed in context.
  
  Specified by:
  
  abortTask in class OutputCommitter
  
  Parameters:
  
  context - task context
  
  Throws:
  
  IOException - if there was some problem querying the path other than it not actually existing.
- getJobPath
  
  protected Path getJobPath()
  
  Compute the path under which all job attempts will be placed.
  
  Specified by:
  
  getJobPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Returns:
  
  the path to store job attempt data.
- getJobAttemptPath
  
  protected final Path getJobAttemptPath(int appAttemptId)
  
  Compute the path where the output of a given job attempt will be placed. For the magic committer, the path includes the job UUID.
  
  Specified by:
  
  getJobAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Parameters:
  
  appAttemptId - the ID of the application attempt for this job.
  
  Returns:
  
  the path to store job attempt data.
- getTaskAttemptPath
  
  public final Path getTaskAttemptPath(TaskAttemptContext context)
  
  Compute the path where the output of a task attempt is stored until that task is committed.
  
  Overrides:
  
  getTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Parameters:
  
  context - the context of the task attempt.
  
  Returns:
  
  the path where a task attempt should be stored.
- getBaseTaskAttemptPath
  
  protected final Path getBaseTaskAttemptPath(TaskAttemptContext context)
  
  Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Compute the base path where the output of a task attempt is written. This is the path which will be deleted when a task is cleaned up and aborted.
  
  Specified by:
  
  getBaseTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Parameters:
  
  context - the context of the task attempt.
  
  Returns:
  
  the path where a task attempt should be stored.
- getTempTaskAttemptPath
  
  public Path getTempTaskAttemptPath(TaskAttemptContext context)
  
  Get a temporary directory for data. When a task is aborted/cleaned up, the contents of this directory are all deleted.
  
  Specified by:
  
  getTempTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
  
  Parameters:
  
  context - task context
  
  Returns:
  
  a path for temporary data.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

Class MagicS3GuardCommitter

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

Field Summary

Fields inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

Constructor Summary

Method Summary

Methods inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

Methods inherited from class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter

Methods inherited from class org.apache.hadoop.mapreduce.OutputCommitter

Methods inherited from class java.lang.Object

Field Details

NAME

Constructor Details

MagicS3GuardCommitter

Method Details

getName

requiresDelayedCommitOutputInFileSystem

setupJob

listPendingUploadsToCommit

cleanupStagingDirs

needsTaskCommit

commitTask

loadPendingCommits

abortTask

getJobPath

getJobAttemptPath

getTaskAttemptPath

getBaseTaskAttemptPath

getTempTaskAttemptPath

toString