Package org.apache.hadoop.mapreduce.lib.output.committer.manifest

Class ManifestCommitterConstants

java.lang.Object
org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterConstants

@Public @Unstable public final class ManifestCommitterConstants extends Object
Public constants for the manifest committer. This includes all configuration options and their default values.
  • Field Details

    • MANIFEST_SUFFIX

      public static final String MANIFEST_SUFFIX
      Suffix to use in manifest files in the manifest subdir. Value: "-manifest.json".
    • SUMMARY_FILENAME_PREFIX

      public static final String SUMMARY_FILENAME_PREFIX
      Prefix for summary files in the report dir.
    • SUMMARY_FILENAME_FORMAT

      public static final String SUMMARY_FILENAME_FORMAT
      Format string used to build a summary file from a Job ID.
    • TMP_SUFFIX

      public static final String TMP_SUFFIX
      Suffix to use for temp files before renaming them. Value: ".tmp".
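The two suffixes above suggest a write-then-rename naming pattern. The following is a minimal sketch of that naming only; the helper names and the idea of building both names from a single task ID are assumptions for illustration, not the committer's actual code. The constants are local copies of the documented values.

```java
/** Hypothetical helper showing how the documented suffixes combine. */
final class ManifestNames {
  // Local copies of the documented values; real code would reference
  // ManifestCommitterConstants instead.
  static final String MANIFEST_SUFFIX = "-manifest.json";
  static final String TMP_SUFFIX = ".tmp";

  /** Final manifest filename for the given ID (assumed naming scheme). */
  static String manifestName(String id) {
    return id + MANIFEST_SUFFIX;
  }

  /** Temporary filename to write first, before renaming to the final name. */
  static String tempManifestName(String id) {
    return manifestName(id) + TMP_SUFFIX;
  }

  public static void main(String[] args) {
    System.out.println(tempManifestName("task_0001"));
    System.out.println(manifestName("task_0001"));
  }
}
```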
    • INITIAL_APP_ATTEMPT_ID

      public static final int INITIAL_APP_ATTEMPT_ID
      Initial number of all app attempts. This is fixed in YARN; for Spark jobs the same number "0" is used.
    • JOB_DIR_FORMAT_STR

      public static final String JOB_DIR_FORMAT_STR
      Format string for building a job dir. Value: "%s".
    • JOB_ATTEMPT_DIR_FORMAT_STR

      public static final String JOB_ATTEMPT_DIR_FORMAT_STR
      Format string for building a job attempt dir. This uses the job attempt number so previous versions can be found trivially. Value: "%02d".
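As an illustration of the two format strings, the sketch below builds a relative job attempt path with `String.format`. The `jobAttemptPath` helper and the "/" separator are assumptions made for the example; only the format string values and the initial attempt number "0" come from the descriptions above.

```java
import java.util.Locale;

/** Hypothetical helper; constants are local copies of the documented values. */
final class JobDirNames {
  static final String JOB_DIR_FORMAT_STR = "%s";
  static final String JOB_ATTEMPT_DIR_FORMAT_STR = "%02d";
  static final int INITIAL_APP_ATTEMPT_ID = 0;

  /** Builds "<job dir>/<two-digit attempt dir>" (assumed layout). */
  static String jobAttemptPath(String jobId, int attemptNumber) {
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    String jobDir = String.format(Locale.ROOT, JOB_DIR_FORMAT_STR, jobId);
    String attemptDir =
        String.format(Locale.ROOT, JOB_ATTEMPT_DIR_FORMAT_STR, attemptNumber);
    return jobDir + "/" + attemptDir;
  }

  public static void main(String[] args) {
    System.out.println(jobAttemptPath("job_01", INITIAL_APP_ATTEMPT_ID));
  }
}
```

Zero-padding the attempt number to two digits keeps attempt directories lexically sorted, which is one way earlier attempts can be located easily.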
    • JOB_TASK_MANIFEST_SUBDIR

      public static final String JOB_TASK_MANIFEST_SUBDIR
      Name of directory under job attempt dir for manifests.
    • JOB_TASK_ATTEMPT_SUBDIR

      public static final String JOB_TASK_ATTEMPT_SUBDIR
      Name of directory under job attempt dir for task attempts.
    • MANIFEST_COMMITTER_CLASSNAME

      public static final String MANIFEST_COMMITTER_CLASSNAME
      Committer classname as recorded in the committer _SUCCESS file.
    • SUCCESS_MARKER

      public static final String SUCCESS_MARKER
      Marker file to create on success: "_SUCCESS".
    • DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER

      public static final boolean DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER
      Default job marker option: true.
    • SUCCESS_MARKER_FILE_LIMIT

      public static final int SUCCESS_MARKER_FILE_LIMIT
      The limit to the number of committed objects tracked during job commits and saved to the _SUCCESS file. Value: 100.
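A limit like this can be read as a cap on how many file entries end up in the marker. The sketch below shows one simple way to apply such a cap; keeping the first 100 entries is an assumption for illustration, and the source does not state which entries the committer retains.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of capping a list at the documented limit. */
final class SuccessMarkerFilesSketch {
  // Local copy of the documented value.
  static final int SUCCESS_MARKER_FILE_LIMIT = 100;

  /** Returns at most SUCCESS_MARKER_FILE_LIMIT entries, in input order. */
  static List<String> filesToRecord(List<String> committedFiles) {
    int count = Math.min(committedFiles.size(), SUCCESS_MARKER_FILE_LIMIT);
    return new ArrayList<>(committedFiles.subList(0, count));
  }
}
```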
    • SPARK_WRITE_UUID

      public static final String SPARK_WRITE_UUID
      The UUID for jobs: "spark.sql.sources.writeJobUUID". This was historically created in Spark 1.x's SQL queries, but "went away". It has been restored in recent Spark releases. If found, it is used instead of the MR job attempt ID.
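The fallback rule described for SPARK_WRITE_UUID can be sketched as follows. A plain `Map` stands in for the job configuration, and `resolveJobId` is a hypothetical helper; treating an empty value as "not found" is also an assumption.

```java
import java.util.Map;

/** Hypothetical sketch of preferring the Spark write UUID over the MR ID. */
final class JobIdResolver {
  // Local copy of the documented key.
  static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";

  /** Returns the Spark UUID if set, otherwise the MR job attempt ID. */
  static String resolveJobId(Map<String, String> conf, String mrJobAttemptId) {
    String uuid = conf.get(SPARK_WRITE_UUID);
    return (uuid != null && !uuid.isEmpty()) ? uuid : mrJobAttemptId;
  }
}
```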
    • JOB_ID_SOURCE_MAPREDUCE

      public static final String JOB_ID_SOURCE_MAPREDUCE
      String to use as source of the job ID. This SHOULD be kept in sync with that of AbstractS3ACommitter.JobUUIDSource. Value: "JobID".
    • OPT_PREFIX

      public static final String OPT_PREFIX
      Prefix to use for config options: "mapreduce.manifest.committer.".
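Every option key documented on this page begins with this prefix. The sketch below, using a local copy of the prefix and two hypothetical helpers, shows how a full key is formed and how a key can be recognized as a committer option.

```java
/** Hypothetical helpers around the documented option prefix. */
final class OptionKeys {
  // Local copy of the documented value.
  static final String OPT_PREFIX = "mapreduce.manifest.committer.";

  /** Builds a full option key from its suffix. */
  static String key(String suffix) {
    return OPT_PREFIX + suffix;
  }

  /** True if the key is in the manifest committer's option namespace. */
  static boolean isCommitterOption(String key) {
    return key.startsWith(OPT_PREFIX);
  }

  public static void main(String[] args) {
    System.out.println(key("validate.output"));
  }
}
```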
    • OPT_CLEANUP_PARALLEL_DELETE

      public static final String OPT_CLEANUP_PARALLEL_DELETE
      Should directory cleanup delete task attempt dirs in parallel before trying to delete the top-level dirs? On GCS this may deliver a speedup, while on ABFS it may avoid timeouts in certain deployments, a problem that OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST can also alleviate. Value: "mapreduce.manifest.committer.cleanup.parallel.delete".
    • OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT

      public static final boolean OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT
      Default value: true.
    • OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST

      public static final String OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST
      Should parallel cleanup try to delete the base first? Best for Azure, as it skips the task attempt deletions unless the top-level delete fails. Value: "mapreduce.manifest.committer.cleanup.parallel.delete.base.first".
    • OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT

      public static final boolean OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT
      Default value of option OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST: false.
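The interaction of the two cleanup options can be sketched as an ordering of delete calls. This is an illustration under assumptions, not the committer's implementation: the `cleanup` method and its parameters are hypothetical, the deletions run sequentially here for determinism (a real implementation would presumably use a thread pool), and treating base-first as effective only when parallel deletion is enabled is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the delete ordering implied by the two options. */
final class CleanupPlanSketch {

  /**
   * Runs the deletes through the supplied function (true = delete succeeded)
   * and returns the directories in the order deletion was attempted.
   */
  static List<String> cleanup(boolean parallelDelete, boolean baseFirst,
      String baseDir, List<String> taskAttemptDirs, Predicate<String> delete) {
    List<String> attempted = new ArrayList<>();
    if (parallelDelete && baseFirst) {
      // Try the top-level directory first; stop if that succeeds.
      attempted.add(baseDir);
      if (delete.test(baseDir)) {
        return attempted;
      }
    }
    if (parallelDelete) {
      // Delete the task attempt dirs (in parallel in a real implementation).
      for (String dir : taskAttemptDirs) {
        attempted.add(dir);
        delete.test(dir);
      }
    }
    // Finally delete the top-level directory.
    attempted.add(baseDir);
    delete.test(baseDir);
    return attempted;
  }
}
```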
    • OPT_IO_PROCESSORS

      public static final String OPT_IO_PROCESSORS
      Threads to use for IO.
    • OPT_IO_PROCESSORS_DEFAULT

      public static final int OPT_IO_PROCESSORS_DEFAULT
      Default value: 32.
    • OPT_SUMMARY_REPORT_DIR

      public static final String OPT_SUMMARY_REPORT_DIR
      Directory for saving job summary reports. These are the _SUCCESS files, but are saved even on job failures. Value: "mapreduce.manifest.committer.summary.report.directory".
    • OPT_DIAGNOSTICS_MANIFEST_DIR

      public static final String OPT_DIAGNOSTICS_MANIFEST_DIR
      Directory under which manifests are moved for diagnostics. Value: "mapreduce.manifest.committer.diagnostics.manifest.directory".
    • OPT_VALIDATE_OUTPUT

      public static final String OPT_VALIDATE_OUTPUT
      Should the output be validated? This will check expected vs actual file lengths, and, if etags can be obtained, etags. Value: "mapreduce.manifest.committer.validate.output".
    • OPT_VALIDATE_OUTPUT_DEFAULT

      public static final boolean OPT_VALIDATE_OUTPUT_DEFAULT
      Default value: false.
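The validation described for OPT_VALIDATE_OUTPUT compares lengths and, where available, etags. A minimal sketch of that comparison follows; the `matches` method is hypothetical, and representing an unobtainable etag as `null` is an assumption.

```java
/** Hypothetical sketch of a length-then-etag comparison for one file. */
final class OutputValidationSketch {

  /**
   * Returns true if the actual file matches the expected one. Lengths must
   * be equal; etags are compared only when both are available (non-null).
   */
  static boolean matches(long expectedLength, String expectedEtag,
      long actualLength, String actualEtag) {
    if (expectedLength != actualLength) {
      return false;
    }
    if (expectedEtag == null || actualEtag == null) {
      // Etag could not be obtained on one side: length check only.
      return true;
    }
    return expectedEtag.equals(actualEtag);
  }
}
```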
    • OPT_DELETE_TARGET_FILES

      public static final String OPT_DELETE_TARGET_FILES
      Should job commit check for files/directories at the targets of renames and, if found, delete them? This is part of the effective behavior of the FileOutputCommitter; however, it adds an extra delete call per file being committed. If a job is writing to a directory which has only just been created, or where unique filenames are being used, there is no need to perform this preparation. The recognition of newly created dirs is automatic. Value: "mapreduce.manifest.committer.delete.target.files".
    • OPT_DELETE_TARGET_FILES_DEFAULT

      public static final boolean OPT_DELETE_TARGET_FILES_DEFAULT
      Default value: false.
    • MANIFEST_COMMITTER_FACTORY

      public static final String MANIFEST_COMMITTER_FACTORY
      Name of the factory.
    • OPT_STORE_OPERATIONS_CLASS

      public static final String OPT_STORE_OPERATIONS_CLASS
      Classname of the store operations; filesystems and tests may override. Value: "mapreduce.manifest.committer.store.operations.classname".
    • STORE_OPERATIONS_CLASS_DEFAULT

      public static final String STORE_OPERATIONS_CLASS_DEFAULT
      Default classname of the store operations.
    • CONTEXT_ATTR_STAGE

      public static final String CONTEXT_ATTR_STAGE
      Stage attribute in audit context: "st".
    • CONTEXT_ATTR_TASK_ATTEMPT_ID

      public static final String CONTEXT_ATTR_TASK_ATTEMPT_ID
      Task ID attribute in audit context: "ta".
    • CAPABILITY_DYNAMIC_PARTITIONING

      public static final String CAPABILITY_DYNAMIC_PARTITIONING
      Stream Capabilities probe for spark dynamic partitioning compatibility.
    • OPT_WRITER_QUEUE_CAPACITY

      public static final String OPT_WRITER_QUEUE_CAPACITY
      Queue capacity between the task manifest loading and the entry file writer. If more than this number of manifest lists are waiting to be written, the enqueue blocks. The expectation is that writing to the local file is much faster than the parallelized buffer reads, so this queue can be emptied at the same rate it is filled. Value: "mapreduce.manifest.committer.writer.queue.capacity".
    • DEFAULT_WRITER_QUEUE_CAPACITY

      public static final int DEFAULT_WRITER_QUEUE_CAPACITY
      Default value of OPT_WRITER_QUEUE_CAPACITY: 32.
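The bounded-queue behavior described for OPT_WRITER_QUEUE_CAPACITY can be illustrated with `java.util.concurrent.ArrayBlockingQueue`. This is a sketch only: the source does not say which queue class the committer uses, and the helper names are hypothetical. The example uses the non-blocking `offer` to show where the capacity limit bites; a producer calling `put` would block at the same point.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of a bounded queue between loaders and a writer. */
final class WriterQueueSketch {
  // Local copy of the documented default.
  static final int DEFAULT_WRITER_QUEUE_CAPACITY = 32;

  /** Creates a bounded queue with the given capacity. */
  static <T> BlockingQueue<T> newQueue(int capacity) {
    return new ArrayBlockingQueue<>(capacity);
  }

  /**
   * Offers n items without blocking and returns how many were accepted.
   * Once the queue is full, offer() returns false; put() would block.
   */
  static int fillWithoutBlocking(BlockingQueue<Integer> queue, int n) {
    int accepted = 0;
    for (int i = 0; i < n; i++) {
      if (queue.offer(i)) {
        accepted++;
      }
    }
    return accepted;
  }
}
```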
    • OPT_MANIFEST_SAVE_ATTEMPTS

      public static final String OPT_MANIFEST_SAVE_ATTEMPTS
      Number of attempts to save a task manifest by save and rename before giving up. Value: "mapreduce.manifest.committer.manifest.save.attempts".
    • OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT

      public static final int OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT
      Default value of OPT_MANIFEST_SAVE_ATTEMPTS: 5.
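A bounded retry loop of the kind OPT_MANIFEST_SAVE_ATTEMPTS configures might look like the sketch below. The `saveWithRetries` helper and the `BooleanSupplier` standing in for the save-and-rename operation are assumptions; the real committer's failure handling and any delay between attempts are not described here.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of retrying a save-and-rename a bounded number of times. */
final class SaveRetrySketch {
  // Local copy of the documented default.
  static final int OPT_MANIFEST_SAVE_ATTEMPTS_DEFAULT = 5;

  /**
   * Invokes the operation until it reports success or maxAttempts is reached.
   * Returns the 1-based attempt number that succeeded, or -1 if none did.
   */
  static int saveWithRetries(int maxAttempts, BooleanSupplier saveAndRename) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (saveAndRename.getAsBoolean()) {
        return attempt;
      }
    }
    return -1;
  }
}
```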