Package org.apache.hadoop.mapreduce.lib.output.committer.manifest
Class ManifestCommitter
java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitter
- All Implemented Interfaces:
org.apache.hadoop.fs.statistics.IOStatisticsSource, StreamCapabilities, org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks
@Public
@Stable
public class ManifestCommitter
extends PathOutputCommitter
implements org.apache.hadoop.fs.statistics.IOStatisticsSource, org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks, StreamCapabilities
This is the Intermediate-Manifest committer.
At every entry point it updates the thread's audit context with
the current stage info; this is a placeholder for
adding audit information to stores other than S3A.
The class is tagged as Public/Stable; this tagging is mandatory
for the classname and for PathOutputCommitter implementation
classes.
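In deployments the committer is normally selected through the committer factory rather than instantiated directly. A minimal configuration sketch, assuming the factory binding described in the Hadoop manifest committer documentation (property name and `abfs` scheme shown as an example; verify against your Hadoop release):

```xml
<!-- Sketch: bind the manifest committer factory for abfs:// output paths.
     Property name assumed from the Hadoop manifest committer documentation. -->
<property>
  <name>mapreduce.outputcommitter.factory.scheme.abfs</name>
  <value>org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterFactory</value>
</property>
```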
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.hadoop.fs.StreamCapabilities
StreamCapabilities.StreamCapability
-
Field Summary
Fields
static final String JOB_COMMITTER - Role: job committer.
static final org.slf4j.Logger LOG
static final String TASK_COMMITTER - Role: task committer.
Fields inherited from interface org.apache.hadoop.fs.StreamCapabilities
ABORTABLE_STREAM, DROPBEHIND, HFLUSH, HSYNC, IOSTATISTICS, IOSTATISTICS_CONTEXT, PREADBYTEBUFFER, READAHEAD, READBYTEBUFFER, UNBUFFER, VECTOREDIO, VECTOREDIO_BUFFERS_SLICED -
Constructor Summary
Constructors
ManifestCommitter(Path outputPath, TaskAttemptContext context)
Create a committer.
-
Method Summary
Methods
void abortJob(JobContext jobContext, JobStatus.State state) - Abort the job.
void abortTask(TaskAttemptContext context) - Abort a task.
void cleanupJob(JobContext jobContext) - Execute the CleanupJobStage to remove the job attempt dir.
void commitJob(JobContext jobContext) - This is the big job commit stage.
void commitTask(TaskAttemptContext context) - Commit the task.
protected org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperations createManifestStoreOperations() - Create manifest store operations for the destination store.
void enterStage(String stage) - Callback on stage entry.
void exitStage(String stage) - Remove stage from common audit context.
Configuration getConf() - Get the config of the task attempt this instance was constructed with.
org.apache.hadoop.fs.statistics.impl.IOStatisticsStore getIOStatistics() - Return a statistics instance.
Path getJobAttemptPath(JobContext context) - Compute the path where the output of a job attempt is stored.
String getJobUniqueId() - Get the unique ID of this job.
Path getOutputPath() - Output path: destination directory of the job.
ManifestSuccessData getSuccessReport() - Get the manifest Success data; only valid after a job.
Path getTaskAttemptPath(TaskAttemptContext context) - Compute the path where the output of a task attempt is stored until that task is committed.
Path getTaskManifestPath(TaskAttemptContext context) - The path to where the manifest file of a task attempt will be saved when the task is committed.
Path getWorkPath() - Work path of the current task attempt.
boolean hasCapability(String capability) - The committer is compatible with Spark's dynamic partitioning algorithm.
boolean isCommitJobRepeatable(JobContext jobContext) - Failure during job commit is not recoverable from.
boolean isRecoverySupported(JobContext jobContext) - Declare that task recovery is not supported.
boolean needsTaskCommit(TaskAttemptContext context) - Always return true.
void recoverTask(TaskAttemptContext taskContext) - Recover the task output.
void setupJob(JobContext jobContext) - Set up a job through a SetupJobStage.
void setupTask(TaskAttemptContext context) - Set up a task through a SetupTaskStage.
String toString()
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
hasOutputPath
Methods inherited from class org.apache.hadoop.mapreduce.OutputCommitter
isRecoverySupported
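The method summary above traces the standard commit protocol: setupJob, then per-task setupTask / needsTaskCommit / commitTask, then commitJob. A pure-JDK sketch of that call ordering (a stand-in class whose method names mirror the summary; this is not the real Hadoop OutputCommitter API):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the commit protocol ordering; method names mirror the
// summary above, but this is not the real Hadoop API.
public class CommitProtocolSketch {
    final List<String> events = new ArrayList<>();

    void setupJob()           { events.add("setupJob"); }    // create job attempt dir
    void setupTask()          { events.add("setupTask"); }   // create task attempt dir
    boolean needsTaskCommit() { return true; }               // ManifestCommitter always returns true
    void commitTask()         { events.add("commitTask"); }  // list task attempt tree, save manifest
    void commitJob()          { events.add("commitJob"); }   // load manifests, rename files, cleanup

    public static void main(String[] args) {
        CommitProtocolSketch c = new CommitProtocolSketch();
        c.setupJob();
        c.setupTask();
        // ... the task writes its files under the task attempt path ...
        if (c.needsTaskCommit()) {
            c.commitTask();
        }
        c.commitJob();
        System.out.println(c.events); // [setupJob, setupTask, commitTask, commitJob]
    }
}
```

With multiple tasks, the setupTask/commitTask pair repeats per task attempt before the single commitJob.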
-
Field Details
-
LOG
public static final org.slf4j.Logger LOG -
TASK_COMMITTER
public static final String TASK_COMMITTER
Role: task committer.
-
JOB_COMMITTER
public static final String JOB_COMMITTER
Role: job committer.
-
-
Constructor Details
-
ManifestCommitter
Create a committer.
Parameters:
outputPath - output path
context - job/task context
Throws:
IOException - failure.
-
-
Method Details
-
setupJob
Set up a job through a SetupJobStage.
Specified by:
setupJob in class OutputCommitter
Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException - IO Failure.
-
setupTask
Set up a task through a SetupTaskStage. The classic FileOutputCommitter is a no-op here, relying on RecordWriters to create the dir implicitly on file create(). FileOutputCommitter also uses the existence of that file as a flag to indicate task commit is needed.
Specified by:
setupTask in class OutputCommitter
Parameters:
context - task context.
Throws:
IOException - IO Failure.
-
needsTaskCommit
Always return true. This way, even if there is no output, statistics are still collected.
Specified by:
needsTaskCommit in class OutputCommitter
Parameters:
context - task context.
Returns:
true
Throws:
IOException - IO Failure.
-
isCommitJobRepeatable
Failure during job commit is not recoverable from.
Overrides:
isCommitJobRepeatable in class OutputCommitter
Parameters:
jobContext - Context of the job whose output is being written.
Returns:
false, always
Throws:
IOException - never
-
isRecoverySupported
Declare that task recovery is not supported. It would be, if someone added the code *and tests*.
Overrides:
isRecoverySupported in class OutputCommitter
Parameters:
jobContext - Context of the job whose output is being written.
Returns:
false, always
Throws:
IOException - never
-
recoverTask
Description copied from class: OutputCommitter
Recover the task output. The retry-count for the job will be passed via the MRJobConfig.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration() for the OutputCommitter. This is called from the application master process, but it is called individually for each task. If an exception is thrown the task will be attempted again. This may be called multiple times for the same task, but from different application attempts.
Overrides:
recoverTask in class OutputCommitter
Parameters:
taskContext - Context of the task whose output is being recovered
Throws:
IOException - always
-
commitTask
Commit the task. This is where the task attempt tree listing takes place.
Specified by:
commitTask in class OutputCommitter
Parameters:
context - task context.
Throws:
IOException - IO Failure.
-
abortTask
Abort a task.
Specified by:
abortTask in class OutputCommitter
Parameters:
context - task context
Throws:
IOException - failure during the delete
-
commitJob
This is the big job commit stage. Load the manifests, prepare the destination, rename the files, then clean up the job directory.
Overrides:
commitJob in class OutputCommitter
Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException - failure.
-
abortJob
Abort the job. Invokes executeCleanup(String, JobContext, ManifestCommitterConfig), then saves the (ongoing) job report data if reporting is enabled.
Overrides:
abortJob in class OutputCommitter
Parameters:
jobContext - Context of the job whose output is being written.
state - final runstate of the job
Throws:
IOException - failure during cleanup; report failures are swallowed
-
cleanupJob
Execute the CleanupJobStage to remove the job attempt dir.
Overrides:
cleanupJob in class OutputCommitter
Parameters:
jobContext - Context of the job whose output is being written.
Throws:
IOException - failure during cleanup
-
getOutputPath
Output path: destination directory of the job.
Specified by:
getOutputPath in class PathOutputCommitter
Returns:
the overall job destination directory.
-
getWorkPath
Work path of the current task attempt. This is null if the task does not have one.
Specified by:
getWorkPath in class PathOutputCommitter
Returns:
a path.
-
enterStage
Callback on stage entry. Sets activeStage and updates the common context.
Specified by:
enterStage in interface org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks
Parameters:
stage - new stage
-
exitStage
Remove stage from common audit context.
Specified by:
exitStage in interface org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.StageEventCallbacks
Parameters:
stage - stage exited.
-
getJobUniqueId
Get the unique ID of this job.
Returns:
job ID (YARN, Spark)
-
getConf
Get the config of the task attempt this instance was constructed with.
Returns:
a configuration.
-
getSuccessReport
Get the manifest Success data; only valid after a job.
Returns:
the job _SUCCESS data, or null.
-
getTaskAttemptPath
Compute the path where the output of a task attempt is stored until that task is committed.
Parameters:
context - the context of the task attempt.
Returns:
the path where a task attempt should be stored.
-
getTaskManifestPath
The path to where the manifest file of a task attempt will be saved when the task is committed. This path will be the same for all attempts of the same task.
Parameters:
context - the context of the task attempt.
Returns:
the path where a task attempt should be stored.
-
getJobAttemptPath
Compute the path where the output of a task attempt is stored until that task is committed.- Parameters:
context- the context of the task attempt.- Returns:
- the path where a task attempt should be stored.
-
createManifestStoreOperations
protected org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperations createManifestStoreOperations() throws IOException
Create manifest store operations for the destination store. This MUST NOT be used for the success report operations, as they may be to a different filesystem. This is a point which can be overridden during testing.
Returns:
a new store operations instance bonded to the destination fs.
Throws:
IOException - failure to instantiate.
-
toString
Overrides:
toString in class PathOutputCommitter
-
getIOStatistics
public org.apache.hadoop.fs.statistics.impl.IOStatisticsStore getIOStatistics()
Description copied from interface: org.apache.hadoop.fs.statistics.IOStatisticsSource
Return a statistics instance. It is not a requirement that the same instance is returned every time. If the object implementing this is Closeable, this method may return null if invoked on a closed object, even if it returns a valid instance when called earlier.
Specified by:
getIOStatistics in interface org.apache.hadoop.fs.statistics.IOStatisticsSource
Returns:
an IOStatistics instance or null
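The null-after-close clause above matters to callers, who must null-check rather than assume a live statistics instance. A pure-JDK stand-in illustrating that contract (hypothetical types, not the real IOStatisticsSource interface):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for the IOStatisticsSource contract: a closed source may
// return null, so callers must null-check. Hypothetical types only.
public class StatsSource implements AutoCloseable {
    private Map<String, AtomicLong> stats =
        Map.of("op_commit_task", new AtomicLong(0));

    // May return null once the owning object is closed.
    public Map<String, AtomicLong> getIOStatistics() { return stats; }

    @Override
    public void close() { stats = null; }

    public static void main(String[] args) {
        StatsSource s = new StatsSource();
        System.out.println(s.getIOStatistics() != null); // true
        s.close();
        System.out.println(s.getIOStatistics() == null); // true
    }
}
```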
-
hasCapability
The committer is compatible with Spark's dynamic partitioning algorithm.
Specified by:
hasCapability in interface StreamCapabilities
Parameters:
capability - string to query the stream support for.
Returns:
true if the requested capability is supported.
-
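hasCapability follows the StreamCapabilities probe pattern: callers feature-test with a string before relying on a behaviour such as dynamic partitioning. A pure-JDK sketch of that pattern (the capability string below is a placeholder, not the committer's real constant):

```java
import java.util.Set;

// Sketch of the StreamCapabilities probe pattern; the capability
// string below is a placeholder, not the real Hadoop constant.
public class CapabilityProbe {
    static final String DYNAMIC_PARTITIONING =
        "example.committer.dynamic.partitioning"; // hypothetical

    private static final Set<String> SUPPORTED = Set.of(DYNAMIC_PARTITIONING);

    public static boolean hasCapability(String capability) {
        return SUPPORTED.contains(capability);
    }

    public static void main(String[] args) {
        System.out.println(hasCapability(DYNAMIC_PARTITIONING)); // true
        System.out.println(hasCapability("unbuffer"));           // false
    }
}
```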