org.apache.hadoop.mapred
Class FileOutputCommitter

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputCommitter
      extended by org.apache.hadoop.mapred.OutputCommitter
          extended by org.apache.hadoop.mapred.FileOutputCommitter

@InterfaceAudience.Public
@InterfaceStability.Stable
public class FileOutputCommitter
extends OutputCommitter

An OutputCommitter that commits files specified in the job output directory, i.e. ${mapreduce.output.fileoutputformat.outputdir}.
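The method descriptions below imply a calling order for a job. The following is a minimal sketch, not Hadoop code. A hypothetical Committer interface mirrors a subset of the method names, using plain strings in place of JobContext and TaskAttemptContext. A deliberately simplified driver runs one job through it, and a recording implementation captures the call order. Retries, speculative execution and real scheduling are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the committer callbacks; NOT the Hadoop OutputCommitter API.
// Strings replace JobContext / TaskAttemptContext to keep the sketch self-contained.
interface Committer {
    void setupJob();
    void setupTask(String attempt);
    boolean needsTaskCommit(String attempt);
    void commitTask(String attempt);
    void abortTask(String attempt);
    void commitJob();
    void abortJob();
}

// Records every callback so the calling order can be inspected afterwards.
class RecordingCommitter implements Committer {
    final List<String> calls = new ArrayList<>();

    public void setupJob() { calls.add("setupJob"); }
    public void setupTask(String a) { calls.add("setupTask " + a); }
    public boolean needsTaskCommit(String a) { calls.add("needsTaskCommit " + a); return true; }
    public void commitTask(String a) { calls.add("commitTask " + a); }
    public void abortTask(String a) { calls.add("abortTask " + a); }
    public void commitJob() { calls.add("commitJob"); }
    public void abortJob() { calls.add("abortJob"); }
}

class LifecycleSketch {
    // Deliberately simplified driver: attempts run one after another, and the
    // job is committed only if every attempt succeeded (no retries modeled).
    static void runJob(Committer c, List<String> attempts, boolean attemptsSucceed) {
        c.setupJob();
        boolean allSucceeded = true;
        for (String attempt : attempts) {
            c.setupTask(attempt);
            if (attemptsSucceed) {
                // Only attempts that report pending output are committed.
                if (c.needsTaskCommit(attempt)) {
                    c.commitTask(attempt);
                }
            } else {
                c.abortTask(attempt);
                allSucceeded = false;
            }
        }
        if (allSucceeded) {
            c.commitJob();
        } else {
            c.abortJob();
        }
    }
}
```

For a single successful attempt, the recorded order is setupJob, setupTask, needsTaskCommit, commitTask, commitJob. If the attempt fails, abortTask is recorded instead and the run ends with abortJob.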


Field Summary
static org.apache.commons.logging.Log LOG
           
static String SUCCEEDED_FILE_NAME
           
static String TEMP_DIR_NAME
          Temporary directory name
 
Constructor Summary
FileOutputCommitter()
           
 
Method Summary
 void abortJob(JobContext context, int runState)
          For aborting an unsuccessful job's output.
 void abortTask(TaskAttemptContext context)
          Discard the task output.
 void cleanupJob(JobContext context)
          Deprecated. 
 void commitJob(JobContext context)
          For committing the job's output after successful job completion.
 void commitTask(TaskAttemptContext context)
          To promote the task's temporary output to the final output location.
 Path getWorkPath(TaskAttemptContext context, Path outputPath)
           
 boolean isRecoverySupported()
          Deprecated. 
 boolean isRecoverySupported(JobContext context)
          Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.
 boolean needsTaskCommit(TaskAttemptContext context)
          Check whether the task needs a commit.
 void recoverTask(TaskAttemptContext context)
          Recover the task output.
 void setupJob(JobContext context)
          For the framework to set up the job output during initialization.
 void setupTask(TaskAttemptContext context)
          Sets up output for the task.
 
Methods inherited from class org.apache.hadoop.mapred.OutputCommitter
abortJob, abortTask, cleanupJob, commitJob, commitTask, isRecoverySupported, needsTaskCommit, recoverTask, setupJob, setupTask
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

TEMP_DIR_NAME

public static final String TEMP_DIR_NAME
Temporary directory name

See Also:
Constant Field Values

SUCCEEDED_FILE_NAME

public static final String SUCCEEDED_FILE_NAME
See Also:
Constant Field Values
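Both constants begin with an underscore. In Hadoop 2.x their values are "_temporary" and "_SUCCESS". FileInputFormat ignores input paths whose names start with "_" or ".", so a later job reading this output directory skips the marker file and any leftover temporary directory. A stdlib-only sketch of that naming rule follows. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class OutputListingSketch {
    // Values of TEMP_DIR_NAME and SUCCEEDED_FILE_NAME in Hadoop 2.x.
    static final String TEMP_DIR_NAME = "_temporary";
    static final String SUCCEEDED_FILE_NAME = "_SUCCESS";

    // Same rule as FileInputFormat's hidden-file filter: names starting
    // with "_" or "." are not treated as input data.
    static boolean isData(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Keeps only the entries of a directory listing that a reader would consume.
    static List<String> dataFiles(List<String> names) {
        return names.stream()
                .filter(OutputListingSketch::isData)
                .collect(Collectors.toList());
    }
}
```

Given a listing of part-00000, _SUCCESS, _temporary and .part-00000.crc, only part-00000 is kept.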
Constructor Detail

FileOutputCommitter

public FileOutputCommitter()
Method Detail

getWorkPath

public Path getWorkPath(TaskAttemptContext context,
                        Path outputPath)
                 throws IOException
Throws:
IOException
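getWorkPath returns the directory a task attempt writes into before it is committed. The sketch below uses plain strings to rebuild the directory layout of the Hadoop 2.x file committer under its version-1 commit algorithm. Treat that layout as an assumption about that implementation, not as a guarantee of this API. The class name and the IDs used below are hypothetical.

```java
class WorkPathSketch {
    static final String TEMP_DIR_NAME = "_temporary";

    // <output>/_temporary/<appAttemptId>: private area of one job attempt.
    static String jobAttemptPath(String output, int appAttemptId) {
        return output + "/" + TEMP_DIR_NAME + "/" + appAttemptId;
    }

    // <output>/_temporary/<appAttemptId>/_temporary/<taskAttemptId>:
    // where a running attempt writes (the analogue of getWorkPath's result).
    static String taskAttemptPath(String output, int appAttemptId, String taskAttemptId) {
        return jobAttemptPath(output, appAttemptId) + "/" + TEMP_DIR_NAME + "/" + taskAttemptId;
    }

    // <output>/_temporary/<appAttemptId>/<taskId>:
    // where commitTask moves the attempt's files.
    static String committedTaskPath(String output, int appAttemptId, String taskId) {
        return jobAttemptPath(output, appAttemptId) + "/" + taskId;
    }
}
```

For example, taskAttemptPath("/out", 0, "attempt_1400000000000_0001_m_000000_0") yields "/out/_temporary/0/_temporary/attempt_1400000000000_0001_m_000000_0".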

setupJob

public void setupJob(JobContext context)
              throws IOException
Description copied from class: OutputCommitter
For the framework to set up the job output during initialization. This is called from the application master process for the entire job. It will be called multiple times, once per job attempt.

Specified by:
setupJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Throws:
IOException - if temporary output could not be created
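setupJob runs once per job attempt. A natural design, and the one assumed here for the version-1 algorithm, gives each attempt its own directory under ${output}/_temporary, so that a restarted attempt does not write into the previous attempt's area. The minimal sketch below works on a hypothetical in-memory set of directories. Its mkdirs creates missing parents, and repeating a call is harmless.

```java
import java.util.Set;
import java.util.TreeSet;

class SetupJobSketch {
    // Hypothetical in-memory stand-in for the directories of a file system.
    final Set<String> dirs = new TreeSet<>();

    // Like mkdirs: creates the directory and every missing parent; idempotent.
    void mkdirs(String path) {
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty segment before the leading "/"
            }
            current.append('/').append(part);
            dirs.add(current.toString());
        }
    }

    // One directory per job attempt: <output>/_temporary/<appAttemptId>.
    void setupJob(String output, int appAttemptId) {
        mkdirs(output + "/_temporary/" + appAttemptId);
    }
}
```

Calling setupJob("/out", 0) twice and then setupJob("/out", 1) leaves four directories: /out, /out/_temporary, /out/_temporary/0 and /out/_temporary/1.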

commitJob

public void commitJob(JobContext context)
               throws IOException
Description copied from class: OutputCommitter
For committing the job's output after successful job completion. Note that this is invoked for jobs whose final run state is SUCCESSFUL. This is called from the application master process for the entire job. It is guaranteed to be called only once. If it throws an exception, the entire job will fail.

Overrides:
commitJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Throws:
IOException
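The following is a simplified in-memory model of committing a job, assuming the version-1 layout of the Hadoop 2.x file committer. Files in each committed task directory of the current job attempt move into the output directory. The whole _temporary tree is then deleted, including any uncommitted attempt output. Finally an empty _SUCCESS marker is written. In Hadoop that marker is controlled by mapreduce.fileoutputcommitter.marksuccessfuljobs, which defaults to true. The map-based file system and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CommitJobSketch {
    // Hypothetical in-memory file system: full file path -> contents.
    final Map<String, String> files = new TreeMap<>();

    void commitJob(String output, int appAttemptId) {
        String jobAttempt = output + "/_temporary/" + appAttemptId + "/";
        String pendingAttempts = jobAttempt + "_temporary/";
        // Iterate over a copy of the keys because the map is modified in the loop.
        List<String> paths = new ArrayList<>(files.keySet());
        for (String path : paths) {
            // Committed task output lives directly under the job attempt directory;
            // anything under its own _temporary was never committed.
            if (path.startsWith(jobAttempt) && !path.startsWith(pendingAttempts)) {
                // Drop "<taskId>/" so the file lands directly in the output directory.
                String rest = path.substring(jobAttempt.length());
                String relative = rest.substring(rest.indexOf('/') + 1);
                files.put(output + "/" + relative, files.remove(path));
            }
        }
        // Remove all temporary state, including uncommitted attempt output.
        files.keySet().removeIf(p -> p.startsWith(output + "/_temporary/"));
        // Empty marker telling readers the job completed.
        files.put(output + "/_SUCCESS", "");
    }
}
```

With two committed task directories and one stale attempt directory under /out/_temporary/0, the result is the two part files directly under /out plus /out/_SUCCESS. The stale attempt output is gone.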

cleanupJob

@Deprecated
public void cleanupJob(JobContext context)
                throws IOException
Deprecated. 

Description copied from class: OutputCommitter
For cleaning up the job's output after job completion. This is called from the application master process for the entire job. This may be called multiple times.

Overrides:
cleanupJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Throws:
IOException

abortJob

public void abortJob(JobContext context,
                     int runState)
              throws IOException
Description copied from class: OutputCommitter
For aborting an unsuccessful job's output. Note that this is invoked for jobs whose final run state is JobStatus.FAILED or JobStatus.KILLED. This is called from the application master process for the entire job. It may be called multiple times.

Overrides:
abortJob in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
runState - final run state of the job
Throws:
IOException
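In the Hadoop 2.x file committer, aborting a job essentially deletes everything under the output's _temporary directory. Because abortJob may be called multiple times, that deletion has to be idempotent. The minimal in-memory sketch below illustrates this. The map-based file system and the class name are hypothetical, and this is not Hadoop code.

```java
import java.util.Map;
import java.util.TreeMap;

class AbortJobSketch {
    // Hypothetical in-memory file system: full file path -> contents.
    final Map<String, String> files = new TreeMap<>();

    // Abandon all pending output by deleting <output>/_temporary.
    // A second call finds nothing to delete, so repeated calls are safe.
    void abortJob(String output) {
        String temp = output + "/_temporary/";
        files.keySet().removeIf(p -> p.startsWith(temp));
    }
}
```

Files outside the output's _temporary tree are untouched, and calling abortJob a second time changes nothing.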

setupTask

public void setupTask(TaskAttemptContext context)
               throws IOException
Description copied from class: OutputCommitter
Sets up output for the task. This is called from each individual task's process that will output to HDFS, and it is called just for that task. This may be called multiple times for the same task, but for different task attempts.

Specified by:
setupTask in class OutputCommitter
Parameters:
context - Context of the task whose output is being written.
Throws:
IOException

commitTask

public void commitTask(TaskAttemptContext context)
                throws IOException
Description copied from class: OutputCommitter
To promote the task's temporary output to the final output location. If OutputCommitter.needsTaskCommit(TaskAttemptContext) returns true and this task is the one the AM determines finished first, this method is called to commit an individual task's output. This marks that task's output as complete; OutputCommitter.commitJob(JobContext) will also be called later on if the entire job finishes successfully. This is called from a task's process. It may be called multiple times for the same task, but for different task attempts. This should be very rare, and requires unusual networking failures to happen. In the future the Hadoop framework may eliminate this race.

Specified by:
commitTask in class OutputCommitter
Parameters:
context - Context of the task whose output is being written.
Throws:
IOException - if commit is not successful
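Under the version-1 commit algorithm of Hadoop 2.x, committing a task is essentially a directory rename. The attempt's directory becomes the task's committed directory inside the current job attempt, replacing any copy an earlier attempt left behind. The framework, not the committer, decides which attempt may commit. The simplified in-memory sketch below models only the rename step. The map-based file system and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

class CommitTaskSketch {
    // Hypothetical in-memory file system: full file path -> contents.
    final Map<String, String> files = new TreeMap<>();

    // "Renames" attemptDir to committedDir. Returns false if the attempt wrote nothing.
    boolean commitTask(String attemptDir, String committedDir) {
        String from = attemptDir + "/";
        String to = committedDir + "/";
        if (files.keySet().stream().noneMatch(p -> p.startsWith(from))) {
            return false; // no output to promote
        }
        // Replace whatever an earlier attempt of the same task committed.
        files.keySet().removeIf(p -> p.startsWith(to));
        // Move each file, iterating over a copy of the keys.
        for (String p : new ArrayList<>(files.keySet())) {
            if (p.startsWith(from)) {
                files.put(to + p.substring(from.length()), files.remove(p));
            }
        }
        return true;
    }
}
```

If a second attempt commits while an older committed copy exists, only the second attempt's file remains under the committed directory. An attempt with no files returns false.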

abortTask

public void abortTask(TaskAttemptContext context)
               throws IOException
Description copied from class: OutputCommitter
Discard the task output. This is called from a task's process to clean up a single task's output that has not yet been committed. This may be called multiple times for the same task, but for different task attempts.

Specified by:
abortTask in class OutputCommitter
Throws:
IOException

needsTaskCommit

public boolean needsTaskCommit(TaskAttemptContext context)
                        throws IOException
Description copied from class: OutputCommitter
Check whether the task needs a commit. This is called from each individual task's process that will output to HDFS, and it is called just for that task.

Specified by:
needsTaskCommit in class OutputCommitter
Returns:
true if the task needs a commit, false otherwise
Throws:
IOException
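In the Hadoop 2.x file committer, a task needs a commit only if its attempt directory exists, that is, only if the attempt actually wrote output. A task that produced no files can skip the commit step. The sketch below treats directory existence as "at least one file below it". A hypothetical set of file paths stands in for the file system.

```java
import java.util.Set;

class NeedsTaskCommitSketch {
    // True if any file lives under attemptDir. The trailing "/" keeps
    // attempt_..._0 from matching a sibling such as attempt_..._01.
    static boolean needsTaskCommit(Set<String> existingFiles, String attemptDir) {
        String prefix = attemptDir + "/";
        return existingFiles.stream().anyMatch(p -> p.startsWith(prefix));
    }
}
```

An attempt directory containing part-m-00000 needs a commit. An attempt that never wrote a file does not.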

isRecoverySupported

@Deprecated
public boolean isRecoverySupported()
Deprecated. 

Description copied from class: OutputCommitter
This method implements the new interface by calling the old method. Note that the input types differ between the new and old APIs; this method is a bridge between the two.

Overrides:
isRecoverySupported in class OutputCommitter
Returns:
true if task output recovery is supported, false otherwise
See Also:
OutputCommitter.recoverTask(TaskAttemptContext)

isRecoverySupported

public boolean isRecoverySupported(JobContext context)
                            throws IOException
Description copied from class: OutputCommitter
Is task output recovery supported for restarting jobs? If task output recovery is supported, job restart can be done more efficiently.

Overrides:
isRecoverySupported in class OutputCommitter
Parameters:
context - Context of the job whose output is being written.
Returns:
true if task output recovery is supported, false otherwise
Throws:
IOException
See Also:
OutputCommitter.recoverTask(TaskAttemptContext)
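The efficiency gain comes from not rerunning finished work. The sketch below is a hypothetical planning step for a restarted job attempt, not the MapReduce application master's actual logic. When recovery is supported, tasks whose output the previous attempt already committed are recovered through recoverTask, and only the remaining tasks are rerun. Otherwise every task runs again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RecoveryPlanSketch {
    // Returns the tasks a restarted job attempt would have to execute again.
    static List<String> tasksToRerun(List<String> allTasks,
                                     Set<String> committedByPreviousAttempt,
                                     boolean recoverySupported) {
        List<String> rerun = new ArrayList<>();
        for (String task : allTasks) {
            // Without recovery support, previously committed output cannot be reused.
            if (!recoverySupported || !committedByPreviousAttempt.contains(task)) {
                rerun.add(task);
            }
        }
        return rerun;
    }
}
```

With two of three tasks committed by the previous attempt, only the third is rerun when recovery is supported. With recovery unsupported, all three are rerun.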

recoverTask

public void recoverTask(TaskAttemptContext context)
                 throws IOException
Description copied from class: OutputCommitter
Recover the task output. The retry count for the job is passed to the OutputCommitter via the MRConstants.APPLICATION_ATTEMPT_ID key in JobContext.getConfiguration(). This is called from the application master process, but it is called individually for each task. If an exception is thrown, the task will be attempted again.

Overrides:
recoverTask in class OutputCommitter
Parameters:
context - Context of the task whose output is being recovered
Throws:
IOException
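Under the version-1 layout of the Hadoop 2.x file committer, recovering a task can be a rename of its committed directory from the previous job attempt into the current one. The next commitJob then picks up that output without rerunning the task. This is a simplified in-memory sketch. The attempt number, which the real committer reads from the job configuration, is passed in directly. The map-based file system and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

class RecoverTaskSketch {
    // Hypothetical in-memory file system: full file path -> contents.
    final Map<String, String> files = new TreeMap<>();

    // Moves <output>/_temporary/<attempt-1>/<taskId>/* to <output>/_temporary/<attempt>/<taskId>/*.
    // Returns false if there was nothing to recover.
    boolean recoverTask(String output, int appAttemptId, String taskId) {
        if (appAttemptId < 1) {
            return false; // the first job attempt has no predecessor
        }
        String from = output + "/_temporary/" + (appAttemptId - 1) + "/" + taskId + "/";
        String to = output + "/_temporary/" + appAttemptId + "/" + taskId + "/";
        boolean found = false;
        // Iterate over a copy of the keys because the map is modified in the loop.
        for (String p : new ArrayList<>(files.keySet())) {
            if (p.startsWith(from)) {
                files.put(to + p.substring(from.length()), files.remove(p));
                found = true;
            }
        }
        return found;
    }
}
```

Recovering task_m_000000 in attempt 1 moves its file from /out/_temporary/0/task_m_000000 to /out/_temporary/1/task_m_000000. A task the previous attempt never committed returns false.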


Copyright © 2014 Apache Software Foundation. All Rights Reserved.