java.lang.Object

org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterConstants

@Public @Unstable public final class ManifestCommitterConstants extends Object

Public constants for the manifest committer. This includes all configuration options and their default values.

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final String | CAPABILITY_DYNAMIC_PARTITIONING | Stream Capabilities probe for Spark dynamic partitioning compatibility. |
| static final String | CONTEXT_ATTR_STAGE | Stage attribute in audit context: "st". |
| static final String | CONTEXT_ATTR_TASK_ATTEMPT_ID | Task ID attribute in audit context: "ta". |
| static final boolean | DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER | Default job marker option: true. |
| static final int | DEFAULT_WRITER_QUEUE_CAPACITY | Default value of OPT_WRITER_QUEUE_CAPACITY. |
| static final int | INITIAL_APP_ATTEMPT_ID | Initial number of all app attempts. |
| static final String | JOB_ATTEMPT_DIR_FORMAT_STR | Format string for building a job attempt dir. |
| static final String | JOB_DIR_FORMAT_STR | Format string for building a job dir. |
| static final String | JOB_ID_SOURCE_MAPREDUCE | String to use as source of the job ID. |
| static final String | JOB_TASK_ATTEMPT_SUBDIR | Name of directory under job attempt dir for task attempts. |
| static final String | JOB_TASK_MANIFEST_SUBDIR | Name of directory under job attempt dir for manifests. |
| static final String | MANIFEST_COMMITTER_CLASSNAME | Committer classname as recorded in the committer _SUCCESS file. |
| static final String | MANIFEST_COMMITTER_FACTORY | Name of the factory. |
| static final String | MANIFEST_SUFFIX | Suffix to use in manifest files in the manifest subdir. |
| static final String | OPT_CLEANUP_PARALLEL_DELETE | Should dir cleanup delete task attempt dirs in parallel before trying to delete the top-level dirs? |
| static final String | OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST | Should parallel cleanup try to delete the base first? |
| static final boolean | OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT | Default value of option OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST: false. |
| static final boolean | OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT | Default value: true. |
| static final String | OPT_DELETE_TARGET_FILES | Should job commit check for files/directories at the targets of renames and, if found, delete them? |
| static final boolean | OPT_DELETE_TARGET_FILES_DEFAULT | Default value: false. |
| static final String | OPT_DIAGNOSTICS_MANIFEST_DIR | Directory under which manifests are moved for diagnostics. |
| static final String | OPT_IO_PROCESSORS | Threads to use for IO. |
| static final int | OPT_IO_PROCESSORS_DEFAULT | Default value: 32. |
| static final String | OPT_MANIFEST_SAVE_ATTEMPTS | Number of attempts to save a task manifest by save and rename before giving up. |
| static final int | OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT | Default value of OPT_MANIFEST_SAVE_ATTEMPTS: 5. |
| static final String | OPT_PREFIX | Prefix to use for config options: "mapreduce.manifest.committer.". |
| static final String | OPT_STORE_OPERATIONS_CLASS | Classname of the store operations; filesystems and tests may override. |
| static final String | OPT_SUMMARY_REPORT_DIR | Directory for saving job summary reports. |
| static final String | OPT_VALIDATE_OUTPUT | Should the output be validated? |
| static final boolean | OPT_VALIDATE_OUTPUT_DEFAULT | Default value: false. |
| static final String | OPT_WRITER_QUEUE_CAPACITY | Queue capacity between task manifest loading and the entry file writer. |
| static final String | SPARK_WRITE_UUID | The UUID for jobs: "spark.sql.sources.writeJobUUID". |
| static final String | STORE_OPERATIONS_CLASS_DEFAULT | Default classname of the store operations. |
| static final String | SUCCESS_MARKER | Marker file to create on success: "_SUCCESS". |
| static final int | SUCCESS_MARKER_FILE_LIMIT | The limit to the number of committed objects tracked during job commits and saved to the _SUCCESS file. |
| static final String | SUMMARY_FILENAME_FORMAT | Format string used to build a summary file from a Job ID. |
| static final String | SUMMARY_FILENAME_PREFIX | Prefix for summary files in the report dir. |
| static final String | TMP_SUFFIX | Suffix to use for temp files before renaming them. |

Method Summary

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- MANIFEST_SUFFIX
  
  public static final String MANIFEST_SUFFIX
  
  Suffix to use in manifest files in the manifest subdir. Value: "-manifest.json".
  See Also:
  
  Constant Field Values
- SUMMARY_FILENAME_PREFIX
  
  public static final String SUMMARY_FILENAME_PREFIX
  
  Prefix for summary files in the report dir.
  See Also:
  
  Constant Field Values
- SUMMARY_FILENAME_FORMAT
  
  public static final String SUMMARY_FILENAME_FORMAT
  
  Format string used to build a summary file from a Job ID.
  See Also:
  
  Constant Field Values
- TMP_SUFFIX
  
  public static final String TMP_SUFFIX
  
  Suffix to use for temp files before renaming them. Value: ".tmp".
  See Also:
  
  Constant Field Values
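
  A minimal sketch of how MANIFEST_SUFFIX and TMP_SUFFIX might combine when a manifest is written under a temporary name and then renamed. The suffix values are copied from this page; the class name, the helper names, and the choice of task ID versus task attempt ID for each filename are assumptions made for illustration, not the committer's confirmed naming scheme.

  ```java
  final class ManifestFileNames {
    // Values copied from the field documentation on this page.
    static final String MANIFEST_SUFFIX = "-manifest.json";
    static final String TMP_SUFFIX = ".tmp";

    /** Hypothetical helper: final manifest filename for a task ID. */
    static String manifestName(String taskId) {
      return taskId + MANIFEST_SUFFIX;
    }

    /** Hypothetical helper: temporary filename used before the rename. */
    static String tempManifestName(String taskAttemptId) {
      return taskAttemptId + MANIFEST_SUFFIX + TMP_SUFFIX;
    }

    public static void main(String[] args) {
      // The IDs below are made-up examples.
      System.out.println(manifestName("task_0001"));
      System.out.println(tempManifestName("attempt_0001_0"));
    }
  }
  ```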
- INITIAL_APP_ATTEMPT_ID
  
  public static final int INITIAL_APP_ATTEMPT_ID
  
  Initial number of all app attempts. This is fixed in YARN; for Spark jobs the same number "0" is used.
  See Also:
  
  Constant Field Values
- JOB_DIR_FORMAT_STR
  
  public static final String JOB_DIR_FORMAT_STR
  
  Format string for building a job dir. Value: "%s".
  See Also:
  
  Constant Field Values
- JOB_ATTEMPT_DIR_FORMAT_STR
  
  public static final String JOB_ATTEMPT_DIR_FORMAT_STR
  
  Format string for building a job attempt dir. This uses the job attempt number so previous versions can be found trivially. Value: "%02d".
  See Also:
  
  Constant Field Values
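
  The two format strings can be exercised with String.format, as in the sketch below. The format values and the initial attempt number are taken from this page; the helper names and the nesting of the attempt dir directly under the job dir are assumptions for illustration only.

  ```java
  import java.util.Locale;

  final class JobDirNames {
    // Values copied from the field documentation on this page.
    static final String JOB_DIR_FORMAT_STR = "%s";
    static final String JOB_ATTEMPT_DIR_FORMAT_STR = "%02d";
    static final int INITIAL_APP_ATTEMPT_ID = 0;

    /** Directory name for a job ID. */
    static String jobDir(String jobId) {
      return String.format(Locale.ROOT, JOB_DIR_FORMAT_STR, jobId);
    }

    /** Directory name for a job attempt: zero-padded to two digits. */
    static String jobAttemptDir(int attemptNumber) {
      return String.format(Locale.ROOT, JOB_ATTEMPT_DIR_FORMAT_STR, attemptNumber);
    }

    /** Hypothetical helper: relative path of an attempt dir under its job dir. */
    static String relativeJobAttemptPath(String jobId, int attemptNumber) {
      return jobDir(jobId) + "/" + jobAttemptDir(attemptNumber);
    }

    public static void main(String[] args) {
      // "job_0001" is a made-up job ID.
      System.out.println(relativeJobAttemptPath("job_0001", INITIAL_APP_ATTEMPT_ID));
    }
  }
  ```

  Because the attempt number is zero-padded, directory listings sort in attempt order for up to 100 attempts.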
- JOB_TASK_MANIFEST_SUBDIR
  
  public static final String JOB_TASK_MANIFEST_SUBDIR
  
  Name of directory under job attempt dir for manifests.
  See Also:
  
  Constant Field Values
- JOB_TASK_ATTEMPT_SUBDIR
  
  public static final String JOB_TASK_ATTEMPT_SUBDIR
  
  Name of directory under job attempt dir for task attempts.
  See Also:
  
  Constant Field Values
- MANIFEST_COMMITTER_CLASSNAME
  
  public static final String MANIFEST_COMMITTER_CLASSNAME
  
  Committer classname as recorded in the committer _SUCCESS file.
- SUCCESS_MARKER
  
  public static final String SUCCESS_MARKER
  
  Marker file to create on success: "_SUCCESS".
  See Also:
  
  Constant Field Values
- DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER
  
  public static final boolean DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER
  
  Default job marker option: true.
  See Also:
  
  Constant Field Values
- SUCCESS_MARKER_FILE_LIMIT
  
  public static final int SUCCESS_MARKER_FILE_LIMIT
  
  The limit to the number of committed objects tracked during job commits and saved to the _SUCCESS file. Value: 100.
  See Also:
  
  Constant Field Values
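
  One way to picture the effect of the limit is a list of committed files capped at 100 entries before it is saved. This is only a sketch: the class and method names are hypothetical, and keeping the first entries (rather than some other selection) is an assumption.

  ```java
  import java.util.ArrayList;
  import java.util.List;

  final class SuccessFileList {
    // Value copied from the field documentation on this page.
    static final int SUCCESS_MARKER_FILE_LIMIT = 100;

    /** Returns at most SUCCESS_MARKER_FILE_LIMIT entries (assumption: the first ones). */
    static List<String> truncate(List<String> committedFiles) {
      int count = Math.min(committedFiles.size(), SUCCESS_MARKER_FILE_LIMIT);
      return new ArrayList<>(committedFiles.subList(0, count));
    }

    /** Builds a made-up file list of the given size and reports how many are kept. */
    static int demoTrackedCount(int committedCount) {
      List<String> files = new ArrayList<>();
      for (int i = 0; i < committedCount; i++) {
        files.add("part-" + i);
      }
      return truncate(files).size();
    }

    public static void main(String[] args) {
      System.out.println(demoTrackedCount(250));
    }
  }
  ```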
- SPARK_WRITE_UUID
  
  public static final String SPARK_WRITE_UUID
  
  The UUID for jobs: "spark.sql.sources.writeJobUUID". This was historically created in Spark 1.x's SQL queries, but "went away". It has been restored in recent Spark releases. If found, it is used instead of the MR job attempt ID.
  See Also:
  
  Constant Field Values
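
  The fallback described above can be sketched as a simple lookup. A plain java.util.Map stands in for the Hadoop configuration here so the example is self-contained; the class and method names are hypothetical, and treating an empty value as "not found" is an assumption.

  ```java
  import java.util.HashMap;
  import java.util.Map;

  final class JobIdSelection {
    // Value copied from the field documentation on this page.
    static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";

    /** Prefers the Spark write UUID when set; otherwise uses the MR job attempt ID. */
    static String chooseJobId(Map<String, String> conf, String mrJobAttemptId) {
      String uuid = conf.get(SPARK_WRITE_UUID);
      return (uuid != null && !uuid.isEmpty()) ? uuid : mrJobAttemptId;
    }

    /** Demo wrapper: builds a stand-in configuration and applies chooseJobId. */
    static String demo(String sparkUuid, String mrJobAttemptId) {
      Map<String, String> conf = new HashMap<>();
      if (sparkUuid != null) {
        conf.put(SPARK_WRITE_UUID, sparkUuid);
      }
      return chooseJobId(conf, mrJobAttemptId);
    }

    public static void main(String[] args) {
      // Both IDs are made-up examples.
      System.out.println(demo("3f2c-example-uuid", "job_0001_0"));
      System.out.println(demo(null, "job_0001_0"));
    }
  }
  ```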
- JOB_ID_SOURCE_MAPREDUCE
  
  public static final String JOB_ID_SOURCE_MAPREDUCE
  
  String to use as source of the job ID. This SHOULD be kept in sync with that of AbstractS3ACommitter.JobUUIDSource. Value: "JobID".
  See Also:
  
  Constant Field Values
- OPT_PREFIX
  
  public static final String OPT_PREFIX
  
  Prefix to use for config options: "mapreduce.manifest.committer.".
  See Also:
  
  Constant Field Values
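
  The option names documented on this page all start with this prefix. The sketch below rebuilds two of them and reads a boolean option with a default. It uses a java.util.Map as a stand-in for the Hadoop configuration so that it runs without Hadoop on the classpath; the class name and the getBoolean helper are hypothetical.

  ```java
  import java.util.HashMap;
  import java.util.Map;

  final class CommitterOptions {
    // Values copied from the field documentation on this page.
    static final String OPT_PREFIX = "mapreduce.manifest.committer.";
    static final String OPT_VALIDATE_OUTPUT = OPT_PREFIX + "validate.output";
    static final boolean OPT_VALIDATE_OUTPUT_DEFAULT = false;
    static final String OPT_DELETE_TARGET_FILES = OPT_PREFIX + "delete.target.files";

    /** Hypothetical helper: reads a boolean option, falling back to a default. */
    static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
      String value = conf.get(key);
      return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    /** Demo wrapper: resolves the validate-output option from an optional setting. */
    static boolean demoValidateOutput(String configuredValue) {
      Map<String, String> conf = new HashMap<>();
      if (configuredValue != null) {
        conf.put(OPT_VALIDATE_OUTPUT, configuredValue);
      }
      return getBoolean(conf, OPT_VALIDATE_OUTPUT, OPT_VALIDATE_OUTPUT_DEFAULT);
    }

    public static void main(String[] args) {
      System.out.println(OPT_VALIDATE_OUTPUT + " = " + demoValidateOutput("true"));
    }
  }
  ```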
- OPT_CLEANUP_PARALLEL_DELETE
  
  public static final String OPT_CLEANUP_PARALLEL_DELETE
  
  Should dir cleanup delete task attempt dirs in parallel before trying to delete the top-level dirs? For GCS this may deliver a speedup, while on ABFS it may avoid timeouts in certain deployments, which OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST can also alleviate. Value: "mapreduce.manifest.committer.cleanup.parallel.delete".
  See Also:
  
  Constant Field Values
- OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT
  
  public static final boolean OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT
  
  Default value: true.
  See Also:
  
  Constant Field Values
- OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST
  
  public static final String OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST
  
  Should parallel cleanup try to delete the base first? Best for Azure as it skips the task attempt deletions unless the top-level delete fails. Value: "mapreduce.manifest.committer.cleanup.parallel.delete.base.first".
  See Also:
  
  Constant Field Values
- OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT
  
  public static final boolean OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT
  
  Default value of option OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST: false.
  See Also:
  
  Constant Field Values
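
  Taken together, the two cleanup options suggest an ordering of delete operations. The sketch below lists the steps that would run for a given pair of settings. It is one reading of the descriptions on this page, not the committer's actual code; the class name, method name, and step labels are hypothetical.

  ```java
  import java.util.ArrayList;
  import java.util.List;

  final class CleanupPlan {
    // Defaults copied from the field documentation on this page.
    static final boolean OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT = true;
    static final boolean OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT = false;

    /**
     * Lists the delete steps in order. firstBaseDeleteSucceeds only models the
     * outcome of the optional up-front delete of the base directory.
     */
    static List<String> plan(boolean parallelDelete, boolean baseFirst,
        boolean firstBaseDeleteSucceeds) {
      List<String> steps = new ArrayList<>();
      if (parallelDelete && baseFirst) {
        steps.add("delete base dir");
        if (firstBaseDeleteSucceeds) {
          return steps; // task attempt deletions are skipped entirely
        }
      }
      if (parallelDelete) {
        steps.add("delete task attempt dirs in parallel");
      }
      steps.add("delete base dir");
      return steps;
    }

    public static void main(String[] args) {
      System.out.println(plan(OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT,
          OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT, true));
    }
  }
  ```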
- OPT_IO_PROCESSORS
  
  public static final String OPT_IO_PROCESSORS
  
  Threads to use for IO.
  See Also:
  
  Constant Field Values
- OPT_IO_PROCESSORS_DEFAULT
  
  public static final int OPT_IO_PROCESSORS_DEFAULT
  
  Default value: 32.
  See Also:
  
  Constant Field Values
- OPT_SUMMARY_REPORT_DIR
  
  public static final String OPT_SUMMARY_REPORT_DIR
  
  Directory for saving job summary reports. These are the _SUCCESS files, but are saved even on job failures. Value: "mapreduce.manifest.committer.summary.report.directory".
  See Also:
  
  Constant Field Values
- OPT_DIAGNOSTICS_MANIFEST_DIR
  
  public static final String OPT_DIAGNOSTICS_MANIFEST_DIR
  
  Directory under which manifests are moved for diagnostics. Value: "mapreduce.manifest.committer.diagnostics.manifest.directory".
  See Also:
  
  Constant Field Values
- OPT_VALIDATE_OUTPUT
  
  public static final String OPT_VALIDATE_OUTPUT
  
  Should the output be validated? This will check expected vs actual file lengths, and, if etags can be obtained, etags. Value: "mapreduce.manifest.committer.validate.output".
  See Also:
  
  Constant Field Values
- OPT_VALIDATE_OUTPUT_DEFAULT
  
  public static final boolean OPT_VALIDATE_OUTPUT_DEFAULT
  
  Default value: false.
  See Also:
  
  Constant Field Values
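
  The validation described for OPT_VALIDATE_OUTPUT (compare lengths, and etags when they can be obtained) can be sketched as a single predicate. The class and method names are hypothetical, and representing an unavailable etag as null or an empty string is an assumption.

  ```java
  final class OutputValidation {
    /**
     * Returns true when the lengths match and, if both etags are available,
     * the etags match as well. A missing etag is not treated as a mismatch.
     */
    static boolean matches(long expectedLength, String expectedEtag,
        long actualLength, String actualEtag) {
      if (expectedLength != actualLength) {
        return false;
      }
      boolean etagsAvailable = expectedEtag != null && !expectedEtag.isEmpty()
          && actualEtag != null && !actualEtag.isEmpty();
      return !etagsAvailable || expectedEtag.equals(actualEtag);
    }

    public static void main(String[] args) {
      // Made-up lengths and etags.
      System.out.println(matches(1024L, "etag-a", 1024L, "etag-a"));
      System.out.println(matches(1024L, "etag-a", 1024L, "etag-b"));
    }
  }
  ```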
- OPT_DELETE_TARGET_FILES
  
  public static final String OPT_DELETE_TARGET_FILES
  
  Should job commit check for files/directories at the targets of renames and, if found, delete them? This is part of the effective behavior of the FileOutputCommitter; however, it adds an extra delete call per file being committed. If a job is writing to a directory which has only just been created or where unique filenames are being used, there is no need to perform this preparation. The recognition of newly created dirs is automatic. Value: "mapreduce.manifest.committer.delete.target.files".
  See Also:
  
  Constant Field Values
- OPT_DELETE_TARGET_FILES_DEFAULT
  
  public static final boolean OPT_DELETE_TARGET_FILES_DEFAULT
  
  Default value: false.
  See Also:
  
  Constant Field Values
- MANIFEST_COMMITTER_FACTORY
  
  public static final String MANIFEST_COMMITTER_FACTORY
  
  Name of the factory.
- OPT_STORE_OPERATIONS_CLASS
  
  public static final String OPT_STORE_OPERATIONS_CLASS
  
  Classname of the store operations; filesystems and tests may override. Value: "mapreduce.manifest.committer.store.operations.classname".
  See Also:
  
  Constant Field Values
- STORE_OPERATIONS_CLASS_DEFAULT
  
  public static final String STORE_OPERATIONS_CLASS_DEFAULT
  
  Default classname of the store operations.
- CONTEXT_ATTR_STAGE
  
  public static final String CONTEXT_ATTR_STAGE
  
  Stage attribute in audit context: "st".
  See Also:
  
  Constant Field Values
- CONTEXT_ATTR_TASK_ATTEMPT_ID
  
  public static final String CONTEXT_ATTR_TASK_ATTEMPT_ID
  
  Task ID attribute in audit context: "ta".
  See Also:
  
  Constant Field Values
- CAPABILITY_DYNAMIC_PARTITIONING
  
  public static final String CAPABILITY_DYNAMIC_PARTITIONING
  
  Stream Capabilities probe for spark dynamic partitioning compatibility.
  See Also:
  
  Constant Field Values
- OPT_WRITER_QUEUE_CAPACITY
  
  public static final String OPT_WRITER_QUEUE_CAPACITY
  
  Queue capacity between task manifest loading and the entry file writer. If more than this number of manifest lists are waiting to be written, the enqueue is blocking. There's an expectation that writing to the local file is a lot faster than the parallelized buffer reads, and therefore that this queue can be emptied at the same rate it is filled. Value: "mapreduce.manifest.committer.writer.queue.capacity".
  See Also:
  
  Constant Field Values
- DEFAULT_WRITER_QUEUE_CAPACITY
  
  public static final int DEFAULT_WRITER_QUEUE_CAPACITY
  
  Default value of OPT_WRITER_QUEUE_CAPACITY. Value: 32.
  See Also:
  
  Constant Field Values
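
  The blocking behaviour of a bounded queue can be illustrated with java.util.concurrent. The sketch uses ArrayBlockingQueue, which is an assumption; this page does not say which queue implementation the committer uses. The class and method names are hypothetical.

  ```java
  import java.util.Collections;
  import java.util.List;
  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  final class WriterQueueDemo {
    // Value copied from the field documentation on this page.
    static final int DEFAULT_WRITER_QUEUE_CAPACITY = 32;

    /** Creates a bounded queue of manifest entry lists. */
    static BlockingQueue<List<String>> newQueue(int capacity) {
      return new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Fills the queue to capacity, then reports whether one more non-blocking
     * offer() is accepted. A put() at that point would block until the writer
     * side removes an element.
     */
    static boolean acceptsWhenFull(int capacity) {
      BlockingQueue<List<String>> queue = newQueue(capacity);
      for (int i = 0; i < capacity; i++) {
        queue.offer(Collections.singletonList("entry-" + i));
      }
      return queue.offer(Collections.singletonList("one-too-many"));
    }

    public static void main(String[] args) {
      System.out.println(acceptsWhenFull(DEFAULT_WRITER_QUEUE_CAPACITY));
    }
  }
  ```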
- OPT_MANIFEST_SAVE_ATTEMPTS
  
  public static final String OPT_MANIFEST_SAVE_ATTEMPTS
  
  Number of attempts to save a task manifest by save and rename before giving up. Value: "mapreduce.manifest.committer.manifest.save.attempts".
  See Also:
  
  Constant Field Values
- OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT
  
  public static final int OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT
  
  Default value of OPT_MANIFEST_SAVE_ATTEMPTS: 5.
  See Also:
  
  Constant Field Values
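
  A bounded retry loop of the kind this option implies might look like the sketch below. The default of 5 comes from this page; the SaveOperation interface, the class and method names, and the absence of any delay between attempts are assumptions made for illustration.

  ```java
  import java.io.IOException;

  final class SaveWithRetries {
    // Value copied from the field documentation on this page.
    static final int OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT = 5;

    /** Hypothetical callback: writes a temp file and renames it into place. */
    interface SaveOperation {
      void saveAndRename() throws IOException;
    }

    /** Runs the operation until it succeeds; returns the attempt number that worked. */
    static int save(SaveOperation operation, int attempts) throws IOException {
      if (attempts < 1) {
        throw new IllegalArgumentException("attempts must be >= 1");
      }
      IOException last = null;
      for (int attempt = 1; attempt <= attempts; attempt++) {
        try {
          operation.saveAndRename();
          return attempt;
        } catch (IOException e) {
          last = e; // remember the failure and try again
        }
      }
      throw last; // every attempt failed
    }

    /** Simulates a save that fails a fixed number of times; returns -1 on giving up. */
    static int demoAttemptsUsed(int failuresBeforeSuccess, int attempts) {
      int[] calls = {0};
      try {
        return save(() -> {
          if (calls[0]++ < failuresBeforeSuccess) {
            throw new IOException("simulated save failure");
          }
        }, attempts);
      } catch (IOException e) {
        return -1;
      }
    }

    public static void main(String[] args) {
      System.out.println(demoAttemptsUsed(2, OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT));
    }
  }
  ```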
