Package org.apache.hadoop.mapreduce.lib.output.committer.manifest

Class ManifestCommitter

java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitter
All Implemented Interfaces:
org.apache.hadoop.fs.statistics.IOStatisticsSource, StreamCapabilities, org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks

@Public @Stable public class ManifestCommitter extends PathOutputCommitter implements org.apache.hadoop.fs.statistics.IOStatisticsSource, org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks, StreamCapabilities
This is the Intermediate Manifest committer. At every entry point it updates the thread's audit context with the current stage info; this is a placeholder for adding audit information to stores other than S3A. The class is tagged as public/stable; this stability guarantee is mandatory for the classname and for PathOutputCommitter implementation classes.
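For orientation, the committer is driven through the standard OutputCommitter lifecycle (setupJob, per-task setupTask/needsTaskCommit/commitTask, then commitJob). The sketch below is a hedged, Hadoop-free illustration: the `Committer` interface and `runJob` driver are simplified stand-ins for the real MapReduce machinery, showing only the call order.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitterLifecycle {
    /** Simplified stand-in for OutputCommitter's entry points. */
    interface Committer {
        void setupJob();
        void setupTask(int task);
        boolean needsTaskCommit(int task);
        void commitTask(int task);
        void commitJob();
    }

    /** Drive the protocol in the order the app master and tasks do. */
    static List<String> runJob(Committer c, int tasks) {
        List<String> calls = new ArrayList<>();
        c.setupJob();
        calls.add("setupJob");
        for (int t = 0; t < tasks; t++) {
            c.setupTask(t);
            calls.add("setupTask-" + t);
            // ManifestCommitter.needsTaskCommit() always returns true,
            // so every task attempt is committed even with no output.
            if (c.needsTaskCommit(t)) {
                c.commitTask(t);
                calls.add("commitTask-" + t);
            }
        }
        c.commitJob();
        calls.add("commitJob");
        return calls;
    }

    public static void main(String[] args) {
        Committer noop = new Committer() {
            public void setupJob() {}
            public void setupTask(int t) {}
            public boolean needsTaskCommit(int t) { return true; }
            public void commitTask(int t) {}
            public void commitJob() {}
        };
        System.out.println(runJob(noop, 2));
    }
}
```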
  • Field Details

    • LOG

      public static final org.slf4j.Logger LOG
    • TASK_COMMITTER

      public static final String TASK_COMMITTER
      Role: task committer.
    • JOB_COMMITTER

      public static final String JOB_COMMITTER
      Role: job committer.
  • Constructor Details

    • ManifestCommitter

      public ManifestCommitter(Path outputPath, TaskAttemptContext context) throws IOException
      Create a committer.
      Parameters:
      outputPath - output path
      context - job/task context
      Throws:
      IOException - failure.
  • Method Details

    • setupJob

      public void setupJob(JobContext jobContext) throws IOException
      Set up a job through a SetupJobStage.
      Specified by:
      setupJob in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Throws:
      IOException - IO Failure.
    • setupTask

      public void setupTask(TaskAttemptContext context) throws IOException
      Set up a task through a SetupTaskStage. Classic FileOutputCommitter is a no-op here, relying on RecordWriters to create the dir implicitly on file create(). FileOutputCommitter also uses the existence of that file as a flag to indicate task commit is needed.
      Specified by:
      setupTask in class OutputCommitter
      Parameters:
      context - task context.
      Throws:
      IOException - IO Failure.
    • needsTaskCommit

      public boolean needsTaskCommit(TaskAttemptContext context) throws IOException
      Always return true. This way, even if there is no output, stats are collected.
      Specified by:
      needsTaskCommit in class OutputCommitter
      Parameters:
      context - task context.
      Returns:
      true
      Throws:
      IOException - IO Failure.
    • isCommitJobRepeatable

      public boolean isCommitJobRepeatable(JobContext jobContext) throws IOException
      Failure during Job Commit is not recoverable from.
      Overrides:
      isCommitJobRepeatable in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Returns:
      false, always
      Throws:
      IOException - never
    • isRecoverySupported

      public boolean isRecoverySupported(JobContext jobContext) throws IOException
      Declare that task recovery is not supported. It would be, if someone added the code *and tests*.
      Overrides:
      isRecoverySupported in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Returns:
      false, always
      Throws:
      IOException - never
    • recoverTask

      public void recoverTask(TaskAttemptContext taskContext) throws IOException
      Description copied from class: OutputCommitter
      Recover the task output. The retry-count for the job will be passed via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown the task will be attempted again. This may be called multiple times for the same task, but from different application attempts.
      Overrides:
      recoverTask in class OutputCommitter
      Parameters:
      taskContext - Context of the task whose output is being recovered
      Throws:
      IOException - always
    • commitTask

      public void commitTask(TaskAttemptContext context) throws IOException
      Commit the task. This is where the listing of the task attempt's directory tree takes place.
      Specified by:
      commitTask in class OutputCommitter
      Parameters:
      context - task context.
      Throws:
      IOException - IO Failure.
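The task commit described above can be sketched without Hadoop: walk the task attempt's directory tree and record one (source, destination) rename entry per file. This is an illustration only; the real committer serializes the entries as a manifest via ManifestStoreOperations, whereas plain `"src -> dest"` strings stand in here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class TaskManifestSketch {
    /**
     * List the task attempt directory tree and build one rename entry
     * per regular file, mapping it to its final place under the job
     * destination. Sorting makes the output deterministic.
     */
    static List<String> buildManifest(Path taskAttemptDir, Path jobDest)
            throws IOException {
        List<String> entries = new ArrayList<>();
        try (Stream<Path> files = Files.walk(taskAttemptDir)) {
            files.filter(Files::isRegularFile).sorted().forEach(f -> {
                Path rel = taskAttemptDir.relativize(f);
                entries.add(f + " -> " + jobDest.resolve(rel));
            });
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Path attempt = Files.createTempDirectory("attempt");
        Files.createDirectories(attempt.resolve("year=2024"));
        Files.writeString(attempt.resolve("year=2024/part-00000"), "data");
        System.out.println(buildManifest(attempt, Path.of("/dest")));
    }
}
```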
    • abortTask

      public void abortTask(TaskAttemptContext context) throws IOException
      Abort a task.
      Specified by:
      abortTask in class OutputCommitter
      Parameters:
      context - task context
      Throws:
      IOException - failure during the delete
    • commitJob

      public void commitJob(JobContext jobContext) throws IOException
      This is the big job commit stage. Load the manifests, prepare the destination, rename the files then cleanup the job directory.
      Overrides:
      commitJob in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Throws:
      IOException - failure.
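The job commit sequence described above (load manifests, prepare the destination, rename files, clean up the job directory) can be sketched in plain Java. This is a single-threaded, local-filesystem illustration with an assumed `"src -> dest"` manifest line format; the real stage parses JSON manifests and runs renames in parallel against the destination FileSystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class JobCommitSketch {
    /**
     * Illustrative job commit: for each manifest line "src -> dest",
     * create the destination's parent directory and rename the file,
     * then recursively delete the job attempt directory.
     */
    static void commitJob(List<Path> manifests, Path jobAttemptDir)
            throws IOException {
        for (Path manifest : manifests) {
            for (String line : Files.readAllLines(manifest)) {
                String[] parts = line.split(" -> ");
                Path src = Path.of(parts[0]);
                Path dest = Path.of(parts[1]);
                Files.createDirectories(dest.getParent());
                Files.move(src, dest);
            }
        }
        // cleanup stage: remove the job attempt dir and everything under it
        try (Stream<Path> tree = Files.walk(jobAttemptDir)) {
            tree.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path job = Files.createTempDirectory("job-attempt");
        Path dest = Files.createTempDirectory("dest");
        Path src = job.resolve("task0/part-00000");
        Files.createDirectories(src.getParent());
        Files.writeString(src, "rows");
        Path manifest = job.resolve("manifest_task0");
        Files.writeString(manifest, src + " -> " + dest.resolve("part-00000"));
        commitJob(List.of(manifest), job);
        System.out.println(Files.exists(dest.resolve("part-00000"))
            + " " + Files.exists(job)); // prints: true false
    }
}
```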
    • abortJob

      public void abortJob(JobContext jobContext, JobStatus.State state) throws IOException
      Abort the job. Invokes executeCleanup(String, JobContext, ManifestCommitterConfig) then saves the (ongoing) job report data if reporting is enabled.
      Overrides:
      abortJob in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      state - final runstate of the job
      Throws:
      IOException - failure during cleanup; report failures are swallowed
    • cleanupJob

      public void cleanupJob(JobContext jobContext) throws IOException
      Execute the CleanupJobStage to remove the job attempt dir.
      Overrides:
      cleanupJob in class OutputCommitter
      Parameters:
      jobContext - Context of the job whose output is being written.
      Throws:
      IOException - failure during cleanup
    • getOutputPath

      public Path getOutputPath()
      Output path: destination directory of the job.
      Specified by:
      getOutputPath in class PathOutputCommitter
      Returns:
      the overall job destination directory.
    • getWorkPath

      public Path getWorkPath()
      Work path of the current task attempt. This is null if the task does not have one.
      Specified by:
      getWorkPath in class PathOutputCommitter
      Returns:
      the work path of the current task attempt, or null if there is none.
    • enterStage

      public void enterStage(String stage)
      Callback on stage entry. Sets activeStage and updates the common context.
      Specified by:
      enterStage in interface org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks
      Parameters:
      stage - new stage
    • exitStage

      public void exitStage(String stage)
      Remove stage from common audit context.
      Specified by:
      exitStage in interface org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks
      Parameters:
      stage - stage exited.
    • getJobUniqueId

      public String getJobUniqueId()
      Get the unique ID of this job.
      Returns:
      job ID (yarn, spark)
    • getConf

      public Configuration getConf()
      Get the config of the task attempt this instance was constructed with.
      Returns:
      a configuration.
    • getSuccessReport

      public ManifestSuccessData getSuccessReport()
      Get the manifest Success data; only valid after a job.
      Returns:
      the job _SUCCESS data, or null.
    • getTaskAttemptPath

      @VisibleForTesting public Path getTaskAttemptPath(TaskAttemptContext context)
      Compute the path where the output of a task attempt is stored until that task is committed.
      Parameters:
      context - the context of the task attempt.
      Returns:
      the path where a task attempt should be stored.
    • getTaskManifestPath

      @VisibleForTesting public Path getTaskManifestPath(TaskAttemptContext context)
      The path to where the manifest file of a task attempt will be saved when the task is committed. This path will be the same for all attempts of the same task.
      Parameters:
      context - the context of the task attempt.
      Returns:
      the path where the task's manifest will be saved.
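The "same path for all attempts of the same task" property falls out of naming the manifest after the task ID rather than the task attempt ID, so the last attempt to commit wins. The sketch below illustrates this with the standard Hadoop ID formats; the `-manifest.json` suffix and the string manipulation are illustrative assumptions, not the committer's actual internals.

```java
public class ManifestPathSketch {
    /**
     * Derive a per-task (not per-attempt) manifest name from a task
     * attempt ID such as "attempt_1678000000000_0001_m_000002_1".
     * Swapping the "attempt_" prefix for "task_" and dropping the
     * trailing attempt counter yields the same name for every attempt
     * of a task. The "-manifest.json" suffix is hypothetical.
     */
    static String manifestName(String taskAttemptId) {
        String taskId = taskAttemptId
            .replaceFirst("^attempt_", "task_")
            .replaceFirst("_\\d+$", "");
        return taskId + "-manifest.json";
    }

    public static void main(String[] args) {
        System.out.println(manifestName("attempt_1678000000000_0001_m_000002_1"));
        // task_1678000000000_0001_m_000002-manifest.json
    }
}
```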
    • getJobAttemptPath

      @VisibleForTesting public Path getJobAttemptPath(JobContext context)
      Compute the path of the job attempt directory, under which the output of task attempts is stored until the job is committed.
      Parameters:
      context - the context of the job.
      Returns:
      the path of the job attempt directory.
    • createManifestStoreOperations

      protected org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperations createManifestStoreOperations() throws IOException
      Create manifest store operations for the destination store. This MUST NOT be used for the success report operations, as they may be to a different filesystem. This is a point which can be overridden during testing.
      Returns:
      a new store operations instance bonded to the destination fs.
      Throws:
      IOException - failure to instantiate.
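The note that this factory method "can be overridden during testing" is the classic overridable-factory pattern: a test subclass swaps in a stub store without touching any commit logic. A hedged, Hadoop-free sketch (the `StoreOperations` interface and `renameFile` method below are simplified stand-ins for ManifestStoreOperations, not its real API):

```java
import java.util.ArrayList;
import java.util.List;

public class OverridableFactorySketch {
    /** Minimal stand-in for ManifestStoreOperations. */
    interface StoreOperations {
        boolean renameFile(String source, String dest);
    }

    /** Mirrors the committer's protected factory hook. */
    static class Committer {
        protected StoreOperations createStoreOperations() {
            // production: operations bonded to the destination filesystem
            return (source, dest) -> true;
        }
    }

    /** Test subclass records renames instead of touching a filesystem. */
    static class StubbedCommitter extends Committer {
        final List<String> renames = new ArrayList<>();

        @Override
        protected StoreOperations createStoreOperations() {
            return (source, dest) -> renames.add(source + "->" + dest);
        }
    }

    public static void main(String[] args) {
        StubbedCommitter c = new StubbedCommitter();
        c.createStoreOperations().renameFile("job-attempt/part-0", "dest/part-0");
        System.out.println(c.renames);
    }
}
```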
    • toString

      public String toString()
      Overrides:
      toString in class PathOutputCommitter
    • getIOStatistics

      public org.apache.hadoop.fs.statistics.impl.IOStatisticsStore getIOStatistics()
      Description copied from interface: org.apache.hadoop.fs.statistics.IOStatisticsSource
      Return a statistics instance.

      It is not a requirement that the same instance is returned every time.

      If the object implementing this is Closeable, this method may return null if invoked on a closed object, even if it returns a valid instance when called earlier.

      Specified by:
      getIOStatistics in interface org.apache.hadoop.fs.statistics.IOStatisticsSource
      Returns:
      an IOStatistics instance or null
    • hasCapability

      public boolean hasCapability(String capability)
      The committer is compatible with Spark's dynamic partitioning algorithm.
      Specified by:
      hasCapability in interface StreamCapabilities
      Parameters:
      capability - string to query the stream support for.
      Returns:
      true if the requested capability is supported.