Package org.apache.hadoop.mapreduce.lib.output.committer.manifest
Class ManifestCommitterConstants
java.lang.Object
org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterConstants
Public constants for the manifest committer.
This includes all configuration options and their default values.
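These options are normally set in the job configuration. A minimal sketch in mapred-site.xml style, using two option keys quoted on this page; the report directory value is a placeholder, not a recommended path:

```xml
<!-- Sketch only: keys are taken from this page; the directory value is illustrative. -->
<property>
  <name>mapreduce.manifest.committer.validate.output</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.manifest.committer.summary.report.directory</name>
  <value>hdfs://namenode/reports/manifest-committer</value>
</property>
```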
-
Field Summary
Fields (modifier and type, field, description):
static final String CAPABILITY_DYNAMIC_PARTITIONING: Stream Capabilities probe for Spark dynamic partitioning compatibility.
static final String CONTEXT_ATTR_STAGE: Stage attribute in audit context: "st".
static final String CONTEXT_ATTR_TASK_ATTEMPT_ID: Task ID attribute in audit context: "ta".
static final boolean DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER: Default job marker option: true.
static final int DEFAULT_WRITER_QUEUE_CAPACITY: Default value of OPT_WRITER_QUEUE_CAPACITY.
static final int INITIAL_APP_ATTEMPT_ID: Initial number of all app attempts.
static final String JOB_ATTEMPT_DIR_FORMAT_STR: Format string for building a job attempt dir.
static final String JOB_DIR_FORMAT_STR: Format string for building a job dir.
static final String JOB_ID_SOURCE_MAPREDUCE: String to use as source of the job ID.
static final String JOB_TASK_ATTEMPT_SUBDIR: Name of directory under job attempt dir for task attempts.
static final String JOB_TASK_MANIFEST_SUBDIR: Name of directory under job attempt dir for manifests.
static final String MANIFEST_COMMITTER_CLASSNAME: Committer classname as recorded in the committer _SUCCESS file.
static final String MANIFEST_COMMITTER_FACTORY: Name of the factory.
static final String MANIFEST_SUFFIX: Suffix to use in manifest files in the manifest subdir.
static final String OPT_CLEANUP_PARALLEL_DELETE: Should dir cleanup do parallel deletion of task attempt dirs before trying to delete the toplevel dirs?
static final String OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST: Should parallel cleanup try to delete the base first?
static final boolean OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT: Default value of option OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST: false.
static final boolean OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT: Default value: true.
static final String OPT_DELETE_TARGET_FILES: Should job commit look for files/directories at the targets of renames and, if found, delete them?
static final boolean OPT_DELETE_TARGET_FILES_DEFAULT: Default value: false.
static final String OPT_DIAGNOSTICS_MANIFEST_DIR: Directory under which to move manifests for diagnostics.
static final String OPT_IO_PROCESSORS: Threads to use for IO.
static final int OPT_IO_PROCESSORS_DEFAULT: Default value: 32.
static final String OPT_MANIFEST_SAVE_ATTEMPTS: How many attempts to save a task manifest by save and rename before giving up.
static final int OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT: Default value of OPT_MANIFEST_SAVE_ATTEMPTS: 5.
static final String OPT_PREFIX: Prefix to use for config options: "mapreduce.manifest.committer.".
static final String OPT_STORE_OPERATIONS_CLASS: Classname of the store operations; filesystems and tests may override.
static final String OPT_SUMMARY_REPORT_DIR: Directory for saving job summary reports.
static final String OPT_VALIDATE_OUTPUT: Should the output be validated?
static final boolean OPT_VALIDATE_OUTPUT_DEFAULT: Default value: false.
static final String OPT_WRITER_QUEUE_CAPACITY: Queue capacity between task manifest loading and the entry file writer.
static final String SPARK_WRITE_UUID: The UUID for jobs: "spark.sql.sources.writeJobUUID".
static final String STORE_OPERATIONS_CLASS_DEFAULT: Default classname of the store operations.
static final String SUCCESS_MARKER: Marker file to create on success: "_SUCCESS".
static final int SUCCESS_MARKER_FILE_LIMIT: The limit to the number of committed objects tracked during job commits and saved to the _SUCCESS file.
static final String SUMMARY_FILENAME_FORMAT: Format string used to build a summary file from a Job ID.
static final String SUMMARY_FILENAME_PREFIX: Prefix for summary files in the report dir.
static final String TMP_SUFFIX: Suffix to use for temp files before renaming them.
-
Method Summary
-
Field Details
-
MANIFEST_SUFFIX
Suffix to use in manifest files in the manifest subdir. Value: "-manifest.json".
-
SUMMARY_FILENAME_PREFIX
Prefix for summary files in the report dir.
-
SUMMARY_FILENAME_FORMAT
Format string used to build a summary file from a Job ID.
-
TMP_SUFFIX
Suffix to use for temp files before renaming them. Value: ".tmp".
-
INITIAL_APP_ATTEMPT_ID
public static final int INITIAL_APP_ATTEMPT_ID
Initial number of all app attempts. This is fixed in YARN; for Spark jobs the same number "0" is used.
-
JOB_DIR_FORMAT_STR
Format string for building a job dir. Value: "%s".
-
JOB_ATTEMPT_DIR_FORMAT_STR
Format string for building a job attempt dir. This uses the job attempt number so previous versions can be found trivially. Value: "%02d".
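The two format strings compose into paths of the form job-dir/attempt-dir. A quick sketch of the formatting; the job ID used here is illustrative, not part of the API:

```java
public class JobAttemptDirFormat {
    public static void main(String[] args) {
        String jobDirFormat = "%s";     // JOB_DIR_FORMAT_STR, as documented
        String attemptFormat = "%02d";  // JOB_ATTEMPT_DIR_FORMAT_STR, as documented

        // Hypothetical job ID for illustration only.
        String jobDir = String.format(jobDirFormat, "job_1700000000000_0001");
        // INITIAL_APP_ATTEMPT_ID is 0 for both YARN and Spark.
        String attemptDir = String.format(attemptFormat, 0);

        System.out.println(jobDir + "/" + attemptDir); // job_1700000000000_0001/00

        // Zero-padding keeps attempt dirs lexicographically ordered.
        System.out.println(String.format(attemptFormat, 11)); // 11
    }
}
```

The "%02d" padding is what lets previous attempts be found trivially by a sorted directory listing.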
-
JOB_TASK_MANIFEST_SUBDIR
Name of directory under job attempt dir for manifests.
-
JOB_TASK_ATTEMPT_SUBDIR
Name of directory under job attempt dir for task attempts.
-
MANIFEST_COMMITTER_CLASSNAME
Committer classname as recorded in the committer _SUCCESS file.
-
SUCCESS_MARKER
Marker file to create on success: "_SUCCESS".
-
DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER
public static final boolean DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER
Default job marker option: true.
-
SUCCESS_MARKER_FILE_LIMIT
public static final int SUCCESS_MARKER_FILE_LIMIT
The limit to the number of committed objects tracked during job commits and saved to the _SUCCESS file. Value: 100.
-
SPARK_WRITE_UUID
The UUID for jobs: "spark.sql.sources.writeJobUUID". This was historically created in Spark 1.x's SQL queries, but "went away". It has been restored in recent Spark releases. If found, it is used instead of the MR job attempt ID.
-
JOB_ID_SOURCE_MAPREDUCE
String to use as source of the job ID. This SHOULD be kept in sync with that of AbstractS3ACommitter.JobUUIDSource. Value: "JobID".
-
OPT_PREFIX
Prefix to use for config options: "mapreduce.manifest.committer.".
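Every option key on this page is built from this prefix. A small sketch, with the constant values copied from this page, showing how two of the documented full keys compose:

```java
public class ManifestOptionKeys {
    // Values as quoted on this page.
    static final String OPT_PREFIX = "mapreduce.manifest.committer.";
    static final String OPT_VALIDATE_OUTPUT = OPT_PREFIX + "validate.output";
    static final String OPT_SUMMARY_REPORT_DIR =
        OPT_PREFIX + "summary.report.directory";

    public static void main(String[] args) {
        // Both match the full keys documented for these options.
        System.out.println(OPT_VALIDATE_OUTPUT);
        System.out.println(OPT_SUMMARY_REPORT_DIR);
    }
}
```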
-
OPT_CLEANUP_PARALLEL_DELETE
Should dir cleanup do parallel deletion of task attempt dirs before trying to delete the toplevel dirs? For GCS this may deliver speedup, while on ABFS it may avoid timeouts in certain deployments, something OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST can alleviate. Value: "mapreduce.manifest.committer.cleanup.parallel.delete".
-
OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT
public static final boolean OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT
Default value: true.
-
OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST
Should parallel cleanup try to delete the base first? Best for Azure, as it skips the task attempt deletions unless the toplevel delete fails. Value: "mapreduce.manifest.committer.cleanup.parallel.delete.base.first".
-
OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT
public static final boolean OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT
Default value of option OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST: false.
-
OPT_IO_PROCESSORS
Threads to use for IO.
-
OPT_IO_PROCESSORS_DEFAULT
public static final int OPT_IO_PROCESSORS_DEFAULT
Default value: 32.
-
OPT_SUMMARY_REPORT_DIR
Directory for saving job summary reports. These are the _SUCCESS files, but are saved even on job failures. Value: "mapreduce.manifest.committer.summary.report.directory".
-
OPT_DIAGNOSTICS_MANIFEST_DIR
Directory under which to move manifests for diagnostics. Value: "mapreduce.manifest.committer.diagnostics.manifest.directory".
-
OPT_VALIDATE_OUTPUT
Should the output be validated? This will check expected vs actual file lengths, and, if etags can be obtained, etags. Value: "mapreduce.manifest.committer.validate.output".
-
OPT_VALIDATE_OUTPUT_DEFAULT
public static final boolean OPT_VALIDATE_OUTPUT_DEFAULT
Default value: false.
-
OPT_DELETE_TARGET_FILES
Should job commit look for files/directories at the targets of renames and, if found, delete them? This is part of the effective behavior of the FileOutputCommitter; however, it adds an extra delete call per file being committed. If a job is writing to a directory which has only just been created, or where unique filenames are being used, there is no need to perform this preparation. The recognition of newly created dirs is automatic. Value: "mapreduce.manifest.committer.delete.target.files".
-
OPT_DELETE_TARGET_FILES_DEFAULT
public static final boolean OPT_DELETE_TARGET_FILES_DEFAULT
Default value: false.
-
MANIFEST_COMMITTER_FACTORY
Name of the factory.
-
OPT_STORE_OPERATIONS_CLASS
Classname of the store operations; filesystems and tests may override. Value: "mapreduce.manifest.committer.store.operations.classname".
-
STORE_OPERATIONS_CLASS_DEFAULT
Default classname of the store operations.
-
CONTEXT_ATTR_STAGE
Stage attribute in audit context: "st".
-
CONTEXT_ATTR_TASK_ATTEMPT_ID
Task ID attribute in audit context: "ta".
-
CAPABILITY_DYNAMIC_PARTITIONING
Stream Capabilities probe for Spark dynamic partitioning compatibility.
-
OPT_WRITER_QUEUE_CAPACITY
Queue capacity between task manifest loading and the entry file writer. If more than this number of manifest lists are waiting to be written, the enqueue blocks. There's an expectation that writing to the local file is a lot faster than the parallelized buffer reads, and therefore that this queue can be emptied at the same rate it is filled. Value: "mapreduce.manifest.committer.writer.queue.capacity".
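The blocking behaviour described above is that of a standard bounded queue. A minimal sketch using java.util.concurrent.ArrayBlockingQueue with the documented default capacity of 32; the committer's actual manifest-writer wiring is internal and not shown:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class WriterQueueSketch {
    public static void main(String[] args) throws InterruptedException {
        // DEFAULT_WRITER_QUEUE_CAPACITY is 32.
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(32);

        // Loader threads call put(), which would block once 32 entries are waiting.
        for (int i = 0; i < 32; i++) {
            queue.put("manifest-entry-" + i);
        }
        System.out.println("queued=" + queue.size()); // queued=32

        // A writer thread drains the queue (FIFO), unblocking any waiting loaders.
        String next = queue.take();
        System.out.println("drained=" + next); // drained=manifest-entry-0
    }
}
```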
-
DEFAULT_WRITER_QUEUE_CAPACITY
public static final int DEFAULT_WRITER_QUEUE_CAPACITY
Default value of OPT_WRITER_QUEUE_CAPACITY. Value: 32.
-
OPT_MANIFEST_SAVE_ATTEMPTS
How many attempts to save a task manifest by save and rename before giving up. Value: "mapreduce.manifest.committer.manifest.save.attempts".
-
OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT
public static final int OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT
Default value of OPT_MANIFEST_SAVE_ATTEMPTS: 5.
-