Package org.apache.hadoop.fs.s3a.commit.magic

Class MagicS3GuardCommitter

java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter
All Implemented Interfaces:
org.apache.hadoop.fs.statistics.IOStatisticsSource

@Public @Unstable public class MagicS3GuardCommitter extends org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
This is a dedicated committer which requires the "magic" directory feature of the S3A Filesystem to be enabled; it then uses paths for task and job attempts in magic paths, so as to ensure that the final output goes direct to the destination directory.
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

    org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit, AbstractS3ACommitter.JobUUIDSource
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    Name: "magic".

    Fields inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

    E_SELF_GENERATED_JOB_UUID, THREAD_PREFIX
  • Constructor Summary

    Constructors
    Constructor
    Description
    Create a task committer.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    Abort a task.
    void
    Delete the magic directory.
    void
    To promote the task's temporary output to final output location.
    protected final Path
    Compute the base path where the output of a task attempt is written.
    protected final Path
    getJobAttemptPath(int appAttemptId)
    Compute the path where the output of a given job attempt will be placed.
    protected Path
    Compute the path under which all job attempts will be placed.
    Get the name of this committer.
    final Path
    Compute the path where the output of a task attempt is stored until that task is committed.
    Get a temporary directory for data.
    protected org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit
    listPendingUploadsToCommit(org.apache.hadoop.fs.s3a.commit.impl.CommitContext commitContext)
    Get the list of pending uploads for this job attempt, by listing all .pendingset files in the job attempt directory.
    protected org.apache.hadoop.fs.s3a.commit.files.PendingSet
    Loads pending commits from either memory or from the remote store (S3) based on the config.
    boolean
    Did this task write any files in the work directory?
    protected boolean
    Require magic paths in the FS client.
    void
    Base job setup (optionally) deletes the success marker and always creates the destination directory.
     

    Methods inherited from class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter

    abortJob, abortJobInternal, abortPendingUploads, abortPendingUploads, abortPendingUploadsInCleanup, buildJobUUID, cleanup, cleanupJob, commitJob, commitJobInternal, commitPendingUploads, deleteTaskAttemptPathQuietly, getAuditSpanSource, getCommitOperations, getConf, getDestFS, getDestinationFS, getDestS3AFS, getIOStatistics, getJobAttemptPath, getJobContext, getOutputPath, getRole, getTaskAttemptFilesystem, getUUID, getUUIDSource, getWorkPath, initiateJobOperation, initiateTaskOperation, initOutput, jobCompleted, maybeCreateSuccessMarker, maybeCreateSuccessMarkerFromCommits, maybeIgnore, maybeIgnore, precommitCheckPendingFiles, preCommitJob, recoverTask, setConf, setDestFS, setOutputPath, setupTask, setWorkPath, startOperation, updateCommonContext, warnOnActiveUploads

    Methods inherited from class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter

    hasOutputPath

    Methods inherited from class org.apache.hadoop.mapreduce.OutputCommitter

    isCommitJobRepeatable, isRecoverySupported, isRecoverySupported

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

  • Constructor Details

    • MagicS3GuardCommitter

      public MagicS3GuardCommitter(Path outputPath, TaskAttemptContext context) throws IOException
      Create a task committer.
      Parameters:
      outputPath - the job's output path
      context - the task's context
      Throws:
      IOException - on a failure
  • Method Details

    • getName

      public String getName()
      Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Get the name of this committer.
      Specified by:
      getName in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Returns:
      the committer name.
    • requiresDelayedCommitOutputInFileSystem

      protected boolean requiresDelayedCommitOutputInFileSystem()
      Require magic paths in the FS client.
      Overrides:
      requiresDelayedCommitOutputInFileSystem in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Returns:
      true, always.
    • setupJob

      public void setupJob(JobContext context) throws IOException
      Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Base job setup (optionally) deletes the success marker and always creates the destination directory. When objects are committed that dest dir marker will inevitably be deleted; creating it now ensures there is something at the end while the job is in progress -and if nothing is created, that it is still there.

      The option InternalCommitterConstants.FS_S3A_COMMITTER_UUID is set to the job UUID; if generated locally InternalCommitterConstants.SPARK_WRITE_UUID is also patched. The field AbstractS3ACommitter.jobSetup is set to true to note that this specific committer instance was used to set up a job.

      Overrides:
      setupJob in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Parameters:
      context - context
      Throws:
      IOException - IO failure
    • listPendingUploadsToCommit

      protected org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.ActiveCommit listPendingUploadsToCommit(org.apache.hadoop.fs.s3a.commit.impl.CommitContext commitContext) throws IOException
      Get the list of pending uploads for this job attempt, by listing all .pendingset files in the job attempt directory.
      Specified by:
      listPendingUploadsToCommit in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Parameters:
      commitContext - job context
      Returns:
      a list of pending commits.
      Throws:
      IOException - Any IO failure
    • cleanupStagingDirs

      public void cleanupStagingDirs()
      Delete the magic directory.
      Specified by:
      cleanupStagingDirs in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
    • needsTaskCommit

      public boolean needsTaskCommit(TaskAttemptContext context) throws IOException
      Did this task write any files in the work directory? Probes for a task existing by looking to see if the attempt dir exists. This adds more HTTP requests to the call. It may be better just to return true and rely on the commit task doing the work.
      Specified by:
      needsTaskCommit in class OutputCommitter
      Parameters:
      context - the task's context
      Returns:
      true if the attempt path exists
      Throws:
      IOException - failure to list the path
    • commitTask

      public void commitTask(TaskAttemptContext context) throws IOException
      Description copied from class: OutputCommitter
      To promote the task's temporary output to final output location. If OutputCommitter.needsTaskCommit(TaskAttemptContext) returns true and this task is the task that the AM determines finished first, this method is called to commit an individual task's output. This is to mark that tasks output as complete, as OutputCommitter.commitJob(JobContext) will also be called later on if the entire job finished successfully. This is called from a task's process. This may be called multiple times for the same task, but different task attempts. It should be very rare for this to be called multiple times and requires odd networking failures to make this happen. In the future the Hadoop framework may eliminate this race.
      Specified by:
      commitTask in class OutputCommitter
      Parameters:
      context - Context of the task whose output is being written.
      Throws:
      IOException - if commit is not successful.
    • loadPendingCommits

      protected org.apache.hadoop.fs.s3a.commit.files.PendingSet loadPendingCommits(TaskAttemptContext context) throws IOException
      Loads pending commits from either memory or from the remote store (S3) based on the config.
      Parameters:
      context - TaskAttemptContext
      Returns:
      All pending commit data for the given TaskAttemptContext
      Throws:
      IOException - if there is an error trying to read the commit data
    • abortTask

      public void abortTask(TaskAttemptContext context) throws IOException
      Abort a task. Attempt load then abort all pending files, then try to delete the task attempt path. This method may be called on the job committer, rather than the task one (such as in the MapReduce AM after a task container failure). It must extract all paths and state from the passed in context.
      Specified by:
      abortTask in class OutputCommitter
      Parameters:
      context - task context
      Throws:
      IOException - if there was some problem querying the path other than it not actually existing.
    • getJobPath

      protected Path getJobPath()
      Compute the path under which all job attempts will be placed.
      Specified by:
      getJobPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Returns:
      the path to store job attempt data.
    • getJobAttemptPath

      protected final Path getJobAttemptPath(int appAttemptId)
      Compute the path where the output of a given job attempt will be placed. For the magic committer, the path includes the job UUID.
      Specified by:
      getJobAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Parameters:
      appAttemptId - the ID of the application attempt for this job.
      Returns:
      the path to store job attempt data.
    • getTaskAttemptPath

      public final Path getTaskAttemptPath(TaskAttemptContext context)
      Compute the path where the output of a task attempt is stored until that task is committed.
      Overrides:
      getTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Parameters:
      context - the context of the task attempt.
      Returns:
      the path where a task attempt should be stored.
    • getBaseTaskAttemptPath

      protected final Path getBaseTaskAttemptPath(TaskAttemptContext context)
      Description copied from class: org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Compute the base path where the output of a task attempt is written. This is the path which will be deleted when a task is cleaned up and aborted.
      Specified by:
      getBaseTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Parameters:
      context - the context of the task attempt.
      Returns:
      the path where a task attempt should be stored.
    • getTempTaskAttemptPath

      public Path getTempTaskAttemptPath(TaskAttemptContext context)
      Get a temporary directory for data. When a task is aborted/cleaned up, the contents of this directory are all deleted.
      Specified by:
      getTempTaskAttemptPath in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter
      Parameters:
      context - task context
      Returns:
      a path for temporary data.
    • toString

      public String toString()
      Overrides:
      toString in class org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter