Package org.apache.hadoop.mapreduce.lib.output.committer.manifest

Class ManifestCommitter

java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitter
All Implemented Interfaces:
org.apache.hadoop.fs.statistics.IOStatisticsSource, StreamCapabilities, org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks

@Public @Stable public class ManifestCommitter extends PathOutputCommitter implements org.apache.hadoop.fs.statistics.IOStatisticsSource, org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks, StreamCapabilities
This is the Intermediate Manifest committer. At every entry point it updates the thread's audit context with the current stage info; this is a placeholder for adding audit information to stores other than S3A. The class is tagged as public/stable; this stability guarantee is mandatory for the classname and for PathOutputCommitter implementation classes.
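For orientation, the committer is driven through the standard OutputCommitter lifecycle (setupJob, per-task setupTask/needsTaskCommit/commitTask, then commitJob). The sketch below is a hedged, Hadoop-free illustration: the `Committer` interface and `runJob` driver are simplified stand-ins for the real MapReduce machinery, showing only the call order.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitterLifecycle {
    /** Simplified stand-in for OutputCommitter's entry points. */
    interface Committer {
        void setupJob();
        void setupTask(int task);
        boolean needsTaskCommit(int task);
        void commitTask(int task);
        void commitJob();
    }

    /** Drive the protocol in the order the app master and tasks do. */
    static List<String> runJob(Committer c, int tasks) {
        List<String> calls = new ArrayList<>();
        c.setupJob();
        calls.add("setupJob");
        for (int t = 0; t < tasks; t++) {
            c.setupTask(t);
            calls.add("setupTask-" + t);
            // ManifestCommitter.needsTaskCommit() always returns true,
            // so every task attempt is committed even with no output.
            if (c.needsTaskCommit(t)) {
                c.commitTask(t);
                calls.add("commitTask-" + t);
            }
        }
        c.commitJob();
        calls.add("commitJob");
        return calls;
    }

    public static void main(String[] args) {
        Committer noop = new Committer() {
            public void setupJob() {}
            public void setupTask(int t) {}
            public boolean needsTaskCommit(int t) { return true; }
            public void commitTask(int t) {}
            public void commitJob() {}
        };
        System.out.println(runJob(noop, 2));
    }
}
```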
  • Field Details

    • LOG

      public static final org.slf4j.Logger LOG
    • TASK_COMMITTER

      public static final String TASK_COMMITTER
      Role: task committer.
    • JOB_COMMITTER

      public static final String JOB_COMMITTER
      Role: job committer.
  • Constructor Details

    • ManifestCommitter

      public ManifestCommitter(Path outputPath, TaskAttemptContext context) throws IOException
      Create a committer.
      Parameters:
      outputPath - output path
      context - job/task context
      Throws:
      IOException - failure.
  • Method Details

    • setupJob

      public void setupJob(JobContext jobContext) throws IOException
      Set up a job through a SetupJobStage.
      Specified by:
      setupJob in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Throws:
      IOException - IO Failure.
    • setupTask

      public void setupTask(TaskAttemptContext context) throws IOException
      Set up a task through a SetupTaskStage. Classic FileOutputCommitter is a no-op here, relying on RecordWriters to create the dir implicitly on file create(). FileOutputCommitter also uses the existence of that file as a flag to indicate task commit is needed.
      Specified by:
      setupTask in class OutputCommitter
      Parameters:
      context - task context.
      Throws:
      IOException - IO Failure.
    • needsTaskCommit

      public boolean needsTaskCommit(TaskAttemptContext context) throws IOException
      Always return true. This way, even if there is no output, stats are collected.
      Specified by:
      needsTaskCommit in class OutputCommitter
      Parameters:
      context - task context.
      Returns:
      true
      Throws:
      IOException - IO Failure.
    • isCommitJobRepeatable

      public boolean isCommitJobRepeatable(JobContext jobContext) throws IOException
      Failure during Job Commit is not recoverable from.
      Overrides:
      isCommitJobRepeatable in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Returns:
      false, always
      Throws:
      IOException - never
    • isRecoverySupported

      public boolean isRecoverySupported(JobContext jobContext) throws IOException
      Declare that task recovery is not supported. It would be, if someone added the code *and tests*.
      Overrides:
      isRecoverySupported in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Returns:
      false, always
      Throws:
      IOException - never
    • recoverTask

      public void recoverTask(TaskAttemptContext taskContext) throws IOException
      Description copied from class: OutputCommitter
      Recover the task output. The retry-count for the job will be passed via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown the task will be attempted again. This may be called multiple times for the same task, but from different application attempts.
      Overrides:
      recoverTask in class OutputCommitter
      Parameters:
      taskContext - Context of the task whose output is being recovered
      Throws:
      IOException - always
    • commitTask

      public void commitTask(TaskAttemptContext context) throws IOException
      Commit the task. This is where the listing of the task attempt's directory tree takes place.
      Specified by:
      commitTask in class OutputCommitter
      Parameters:
      context - task context.
      Throws:
      IOException - IO Failure.
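The task commit described above can be sketched without Hadoop: walk the task attempt's directory tree and record one (source, destination) rename entry per file. This is an illustration only; the real committer serializes the entries as a manifest via ManifestStoreOperations, whereas plain `"src -> dest"` strings stand in here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class TaskManifestSketch {
    /**
     * List the task attempt directory tree and build one rename entry
     * per regular file, mapping it to its final place under the job
     * destination. Sorting makes the output deterministic.
     */
    static List<String> buildManifest(Path taskAttemptDir, Path jobDest)
            throws IOException {
        List<String> entries = new ArrayList<>();
        try (Stream<Path> files = Files.walk(taskAttemptDir)) {
            files.filter(Files::isRegularFile).sorted().forEach(f -> {
                Path rel = taskAttemptDir.relativize(f);
                entries.add(f + " -> " + jobDest.resolve(rel));
            });
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Path attempt = Files.createTempDirectory("attempt");
        Files.createDirectories(attempt.resolve("year=2024"));
        Files.writeString(attempt.resolve("year=2024/part-00000"), "data");
        System.out.println(buildManifest(attempt, Path.of("/dest")));
    }
}
```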
    • abortTask

      public void abortTask(TaskAttemptContext context) throws IOException
      Abort a task.
      Specified by:
      abortTask in class OutputCommitter
      Parameters:
      context - task context
      Throws:
      IOException - failure during the delete
    • commitJob

      public void commitJob(JobContext jobContext) throws IOException
      This is the big job commit stage. Load the manifests, prepare the destination, rename the files then cleanup the job directory.
      Overrides:
      commitJob in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Throws:
      IOException - failure.
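The job commit sequence described above (load manifests, prepare the destination, rename files, clean up the job directory) can be sketched in plain Java. This is a single-threaded, local-filesystem illustration with an assumed `"src -> dest"` manifest line format; the real stage parses JSON manifests and runs renames in parallel against the destination FileSystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class JobCommitSketch {
    /**
     * Illustrative job commit: for each manifest line "src -> dest",
     * create the destination's parent directory and rename the file,
     * then recursively delete the job attempt directory.
     */
    static void commitJob(List<Path> manifests, Path jobAttemptDir)
            throws IOException {
        for (Path manifest : manifests) {
            for (String line : Files.readAllLines(manifest)) {
                String[] parts = line.split(" -> ");
                Path src = Path.of(parts[0]);
                Path dest = Path.of(parts[1]);
                Files.createDirectories(dest.getParent());
                Files.move(src, dest);
            }
        }
        // cleanup stage: remove the job attempt dir and everything under it
        try (Stream<Path> tree = Files.walk(jobAttemptDir)) {
            tree.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path job = Files.createTempDirectory("job-attempt");
        Path dest = Files.createTempDirectory("dest");
        Path src = job.resolve("task0/part-00000");
        Files.createDirectories(src.getParent());
        Files.writeString(src, "rows");
        Path manifest = job.resolve("manifest_task0");
        Files.writeString(manifest, src + " -> " + dest.resolve("part-00000"));
        commitJob(List.of(manifest), job);
        System.out.println(Files.exists(dest.resolve("part-00000"))
            + " " + Files.exists(job)); // prints: true false
    }
}
```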
    • abortJob

      public void abortJob(JobContext jobContext, JobStatus.State state) throws IOException
      Abort the job. Invokes executeCleanup(String, JobContext, ManifestCommitterConfig) then saves the (ongoing) job report data if reporting is enabled.
      Overrides:
      abortJob in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      state - final runstate of the job
      Throws:
      IOException - failure during cleanup; report failures are swallowed
    • cleanupJob

      public void cleanupJob(JobContext jobContext) throws IOException
      Execute the CleanupJobStage to remove the job attempt dir.
      Overrides:
      cleanupJob in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Throws:
      IOException - failure during cleanup
    • getOutputPath

      public Path getOutputPath()
      Output path: destination directory of the job.
      Specified by:
      getOutputPath in class PathOutputCommitter
      Returns:
      the overall job destination directory.
    • getWorkPath

      public Path getWorkPath()
      Work path of the current task attempt. This is null if the task does not have one.
      Specified by:
      getWorkPath in class PathOutputCommitter
      Returns:
      the work path of the current task attempt, or null if there is none.
    • enterStage

      public void enterStage(String stage)
      Callback on stage entry. Sets activeStage and updates the common context.
      Specified by:
      enterStage in interface org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks
      Parameters:
      stage - new stage
    • exitStage

      public void exitStage(String stage)
      Remove stage from common audit context.
      Specified by:
      exitStage in interface org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks
      Parameters:
      stage - stage exited.
    • getJobUniqueId

      public String getJobUniqueId()
      Get the unique ID of this job.
      Returns:
      job ID (yarn, spark)
    • getConf

      public Configuration getConf()
      Get the config of the task attempt this instance was constructed with.
      Returns:
      a configuration.
    • getSuccessReport

      public ManifestSuccessData getSuccessReport()
      Get the manifest Success data; only valid after a job.
      Returns:
      the job _SUCCESS data, or null.
    • getTaskAttemptPath

      @VisibleForTesting public Path getTaskAttemptPath(TaskAttemptContext context)
      Compute the path where the output of a task attempt is stored until that task is committed.
      Parameters:
      context - the context of the task attempt.
      Returns:
      the path where a task attempt should be stored.
    • getTaskManifestPath

      @VisibleForTesting public Path getTaskManifestPath(TaskAttemptContext context)
      The path to where the manifest file of a task attempt will be saved when the task is committed. This path will be the same for all attempts of the same task.
      Parameters:
      context - the context of the task attempt.
      Returns:
      the path where the task's manifest will be saved.
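The "same path for all attempts of the same task" property falls out of naming the manifest after the task ID rather than the task attempt ID, so the last attempt to commit wins. The sketch below illustrates this with the standard Hadoop ID formats; the `-manifest.json` suffix and the string manipulation are illustrative assumptions, not the committer's actual internals.

```java
public class ManifestPathSketch {
    /**
     * Derive a per-task (not per-attempt) manifest name from a task
     * attempt ID such as "attempt_1678000000000_0001_m_000002_1".
     * Swapping the "attempt_" prefix for "task_" and dropping the
     * trailing attempt counter yields the same name for every attempt
     * of a task. The "-manifest.json" suffix is hypothetical.
     */
    static String manifestName(String taskAttemptId) {
        String taskId = taskAttemptId
            .replaceFirst("^attempt_", "task_")
            .replaceFirst("_\\d+$", "");
        return taskId + "-manifest.json";
    }

    public static void main(String[] args) {
        System.out.println(manifestName("attempt_1678000000000_0001_m_000002_1"));
        // task_1678000000000_0001_m_000002-manifest.json
    }
}
```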
    • getJobAttemptPath

      @VisibleForTesting public Path getJobAttemptPath(JobContext context)
      Compute the path of the job attempt directory, under which the output of task attempts is stored until the job is committed.
      Parameters:
      context - the context of the job.
      Returns:
      the path of the job attempt directory.
    • createManifestStoreOperations

      protected org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperations createManifestStoreOperations() throws IOException
      Create manifest store operations for the destination store. This MUST NOT be used for the success report operations, as they may be to a different filesystem. This is a point which can be overridden during testing.
      Returns:
      a new store operations instance bonded to the destination fs.
      Throws:
      IOException - failure to instantiate.
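The note that this factory method "can be overridden during testing" is the classic overridable-factory pattern: a test subclass swaps in a stub store without touching any commit logic. A hedged, Hadoop-free sketch (the `StoreOperations` interface and `renameFile` method below are simplified stand-ins for ManifestStoreOperations, not its real API):

```java
import java.util.ArrayList;
import java.util.List;

public class OverridableFactorySketch {
    /** Minimal stand-in for ManifestStoreOperations. */
    interface StoreOperations {
        boolean renameFile(String source, String dest);
    }

    /** Mirrors the committer's protected factory hook. */
    static class Committer {
        protected StoreOperations createStoreOperations() {
            // production: operations bonded to the destination filesystem
            return (source, dest) -> true;
        }
    }

    /** Test subclass records renames instead of touching a filesystem. */
    static class StubbedCommitter extends Committer {
        final List<String> renames = new ArrayList<>();

        @Override
        protected StoreOperations createStoreOperations() {
            return (source, dest) -> renames.add(source + "->" + dest);
        }
    }

    public static void main(String[] args) {
        StubbedCommitter c = new StubbedCommitter();
        c.createStoreOperations().renameFile("job-attempt/part-0", "dest/part-0");
        System.out.println(c.renames);
    }
}
```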
    • toString

      public String toString()
      Overrides:
      toString in class PathOutputCommitter
    • getIOStatistics

      public org.apache.hadoop.fs.statistics.impl.IOStatisticsStore getIOStatistics()
      Description copied from interface: org.apache.hadoop.fs.statistics.IOStatisticsSource
      Return a statistics instance.

      It is not a requirement that the same instance is returned every time.

      If the object implementing this is Closeable, this method may return null if invoked on a closed object, even if it returns a valid instance when called earlier.

      Specified by:
      getIOStatistics in interface org.apache.hadoop.fs.statistics.IOStatisticsSource
      Returns:
      an IOStatistics instance or null
    • hasCapability

      public boolean hasCapability(String capability)
      The committer is compatible with Spark's dynamic partitioning algorithm.
      Specified by:
      hasCapability in interface StreamCapabilities
      Parameters:
      capability - string to query the stream support for.
      Returns:
      true if the requested capability is supported.