Package org.apache.hadoop.fs.s3a.commit

Class CommitConstants

java.lang.Object
org.apache.hadoop.fs.s3a.commit.CommitConstants

@Public @Unstable public final class CommitConstants extends Object
Constants for working with committers.
  • Field Details

    • MAGIC

      public static final String MAGIC
      Path element for "magic" writes, under which both the written paths and their PENDING_SUFFIX files are placed: "__magic".
      See Also:
    • JOB_ID_PREFIX

      public static final String JOB_ID_PREFIX
      See Also:
    • MAGIC_PATH_PREFIX

      public static final String MAGIC_PATH_PREFIX
      See Also:
    • BASE

      public static final String BASE
      Marker of the start of a directory tree for calculating the final path names: "__base".
      See Also:
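
To illustrate how the MAGIC and BASE markers can relate a write path to its final destination, here is a simplified sketch. The class MagicPathSketch, its method, and the example paths are hypothetical. The mapping rule is an assumption inferred from the two descriptions above, not code taken from the S3A implementation: everything from "__magic" up to "__base" is dropped, or everything up to the filename when no "__base" is present.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of org.apache.hadoop.fs.s3a.commit. */
final class MagicPathSketch {
    static final String MAGIC = "__magic"; // value documented for CommitConstants.MAGIC
    static final String BASE = "__base";   // value documented for CommitConstants.BASE

    /** Map a path written under "__magic" to its assumed final destination. */
    static String finalDestination(String path) {
        // Split into path elements, skipping empty ones.
        List<String> elements = new ArrayList<>();
        for (String e : path.split("/")) {
            if (!e.isEmpty()) {
                elements.add(e);
            }
        }
        int magic = elements.indexOf(MAGIC);
        if (magic < 0) {
            // Not a magic path: return it with empty elements removed.
            return String.join("/", elements);
        }
        // Everything before "__magic" is the destination directory.
        List<String> dest = new ArrayList<>(elements.subList(0, magic));
        List<String> children = elements.subList(magic + 1, elements.size());
        int base = children.indexOf(BASE);
        if (base >= 0) {
            // Keep the directory tree that follows "__base".
            dest.addAll(children.subList(base + 1, children.size()));
        } else if (!children.isEmpty()) {
            // No "__base": keep only the filename.
            dest.add(children.get(children.size() - 1));
        }
        return String.join("/", dest);
    }
}
```

Under these assumptions, "out/__magic/job-1/task-1/__base/year=2024/part-0000" maps to "out/year=2024/part-0000", and the same path without "__base/year=2024" maps to "out/part-0000".
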
    • PENDING_SUFFIX

      public static final String PENDING_SUFFIX
      Suffix applied to pending commit metadata: ".pending".
      See Also:
    • PENDINGSET_SUFFIX

      public static final String PENDINGSET_SUFFIX
      Suffix applied to multiple pending commit metadata: ".pendingset".
      See Also:
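
The two suffixes can be told apart by a plain suffix check. The sketch below is a hypothetical helper (the class PendingSuffixSketch and the sample filenames are not part of the package); it only shows that the values ".pending" and ".pendingset" are distinguishable by String.endsWith.

```java
/** Hypothetical helper; not part of org.apache.hadoop.fs.s3a.commit. */
final class PendingSuffixSketch {
    static final String PENDING_SUFFIX = ".pending";       // documented value
    static final String PENDINGSET_SUFFIX = ".pendingset"; // documented value

    enum Kind { PENDING, PENDINGSET, OTHER }

    /** Classify a filename by the commit-metadata suffix it carries. */
    static Kind classify(String filename) {
        if (filename.endsWith(PENDINGSET_SUFFIX)) {
            return Kind.PENDINGSET;
        }
        if (filename.endsWith(PENDING_SUFFIX)) {
            return Kind.PENDING;
        }
        return Kind.OTHER;
    }
}
```
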
    • MAGIC_COMMITTER_PENDING_OBJECT_ETAG_NAME

      public static final String MAGIC_COMMITTER_PENDING_OBJECT_ETAG_NAME
      Etag name to be returned on non-committed S3 object: "pending".
      See Also:
    • OPT_PREFIX

      public static final String OPT_PREFIX
      Prefix to use for config options: "fs.s3a.committer.".
      See Also:
    • MAGIC_COMMITTER_PREFIX

      public static final String MAGIC_COMMITTER_PREFIX
      Prefix for Magic committer configuration options. Value: "fs.s3a.committer.magic".
      See Also:
    • MAGIC_COMMITTER_ENABLED

      public static final String MAGIC_COMMITTER_ENABLED
      Flag to indicate whether support for the Magic committer is enabled in the filesystem. Value: "fs.s3a.committer.magic.enabled".
      See Also:
    • STREAM_CAPABILITY_MAGIC_OUTPUT

      public static final String STREAM_CAPABILITY_MAGIC_OUTPUT
      Flag to indicate whether a stream is a magic output stream; returned in StreamCapabilities. Value: "fs.s3a.capability.magic.output.stream".
      See Also:
    • STORE_CAPABILITY_MAGIC_COMMITTER

      public static final String STORE_CAPABILITY_MAGIC_COMMITTER
      Flag to indicate that a store supports magic committers; returned in PathCapabilities. Value: "fs.s3a.capability.magic.committer".
      See Also:
    • STREAM_CAPABILITY_MAGIC_OUTPUT_OLD

      @Deprecated public static final String STREAM_CAPABILITY_MAGIC_OUTPUT_OLD
      Deprecated.
      Flag to indicate whether a stream is a magic output stream; returned in StreamCapabilities. Value: "s3a:magic.output.stream".
      See Also:
    • STORE_CAPABILITY_MAGIC_COMMITTER_OLD

      @Deprecated public static final String STORE_CAPABILITY_MAGIC_COMMITTER_OLD
      Deprecated.
      Flag to indicate that a store supports magic committers; returned in PathCapabilities. Value: "s3a:magic.committer".
      See Also:
    • DEFAULT_MAGIC_COMMITTER_ENABLED

      public static final boolean DEFAULT_MAGIC_COMMITTER_ENABLED
      Whether the Magic committer is enabled by default: true.
      See Also:
    • TEMPORARY

      public static final String TEMPORARY
      This is the "Pending" directory of the FileOutputCommitter; data written here is, in that algorithm, renamed into place. Value: "_temporary".
      See Also:
    • TEMP_DATA

      public static final String TEMP_DATA
      Temp data which is not auto-committed: "_temporary".
      See Also:
    • CREATE_SUCCESSFUL_JOB_OUTPUT_DIR_MARKER

      public static final String CREATE_SUCCESSFUL_JOB_OUTPUT_DIR_MARKER
      Flag to trigger creation of a marker file on job completion.
      See Also:
    • _SUCCESS

      public static final String _SUCCESS
      Marker file to create on success: "_SUCCESS".
      See Also:
    • DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER

      public static final boolean DEFAULT_CREATE_SUCCESSFUL_JOB_DIR_MARKER
      Default job marker option: true.
      See Also:
    • S3A_COMMITTER_FACTORY_KEY

      public static final String S3A_COMMITTER_FACTORY_KEY
      Key to set for the S3A scheme to use the specific committer.
    • S3A_COMMITTER_FACTORY

      public static final String S3A_COMMITTER_FACTORY
      S3 Committer factory: "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory". This uses the value of FS_S3A_COMMITTER_NAME to choose the final committer.
      See Also:
    • FS_S3A_COMMITTER_NAME

      public static final String FS_S3A_COMMITTER_NAME
      Option to identify the S3A committer: "fs.s3a.committer.name".
      See Also:
    • COMMITTER_NAME_FILE

      public static final String COMMITTER_NAME_FILE
      Option for FS_S3A_COMMITTER_NAME: classic/file output committer: "file".
      See Also:
    • COMMITTER_NAME_MAGIC

      public static final String COMMITTER_NAME_MAGIC
      Option for FS_S3A_COMMITTER_NAME: magic output committer: "magic".
      See Also:
    • COMMITTER_NAME_DIRECTORY

      public static final String COMMITTER_NAME_DIRECTORY
      Option for FS_S3A_COMMITTER_NAME: directory output committer: "directory".
      See Also:
    • COMMITTER_NAME_PARTITIONED

      public static final String COMMITTER_NAME_PARTITIONED
      Option for FS_S3A_COMMITTER_NAME: partitioned output committer: "partitioned".
      See Also:
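
As a minimal sketch of how the committer name options fit together, the helper below builds the key/value pairs that would select a committer. The class CommitterSelectionSketch and its method are hypothetical; only the keys and values are taken from the field descriptions above. A plain map is used so the example is self-contained; in a real job the pairs would be set on the job's Hadoop configuration, preferably through the constants rather than string literals.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; not part of org.apache.hadoop.fs.s3a.commit. */
final class CommitterSelectionSketch {
    static final String FS_S3A_COMMITTER_NAME = "fs.s3a.committer.name";
    static final String MAGIC_COMMITTER_ENABLED = "fs.s3a.committer.magic.enabled";

    // The four documented COMMITTER_NAME_* values.
    static final Set<String> NAMES = Set.of("file", "magic", "directory", "partitioned");

    /** Build the properties that select the named committer. */
    static Map<String, String> selectCommitter(String name) {
        if (!NAMES.contains(name)) {
            throw new IllegalArgumentException("Unknown committer name: " + name);
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put(FS_S3A_COMMITTER_NAME, name);
        if ("magic".equals(name)) {
            // Set explicitly here, although DEFAULT_MAGIC_COMMITTER_ENABLED is already true.
            props.put(MAGIC_COMMITTER_ENABLED, "true");
        }
        return props;
    }
}
```
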
    • FS_S3A_COMMITTER_STAGING_UNIQUE_FILENAMES

      public static final String FS_S3A_COMMITTER_STAGING_UNIQUE_FILENAMES
      Option to give final files unique names built from job attempt information, falling back to a new UUID if there is no job attempt information to use: "fs.s3a.committer.staging.unique-filenames". When writing data with the "append" conflict option, this guarantees that new data will not overwrite any existing data.
      See Also:
    • DEFAULT_STAGING_COMMITTER_UNIQUE_FILENAMES

      public static final boolean DEFAULT_STAGING_COMMITTER_UNIQUE_FILENAMES
      See Also:
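
The idea behind unique filenames can be sketched as follows. The class UniqueFilenameSketch, the method name, and the exact layout (a "-" plus the identifier inserted before the file extension) are assumptions made for illustration; the real committer may place the identifier differently.

```java
/** Hypothetical helper; not part of org.apache.hadoop.fs.s3a.commit. */
final class UniqueFilenameSketch {

    /** Insert a unique identifier into a filename, before its extension if it has one. */
    static String addUniqueId(String filename, String uniqueId) {
        if (filename.contains(uniqueId)) {
            // Already carries the identifier: leave unchanged.
            return filename;
        }
        int dot = filename.lastIndexOf('.');
        if (dot > 0) {
            return filename.substring(0, dot) + "-" + uniqueId + filename.substring(dot);
        }
        // No extension: append the identifier.
        return filename + "-" + uniqueId;
    }
}
```

With this layout, two jobs that both produce "part-0000.parquet" write differently named files as long as their identifiers differ, which is what makes the "append" conflict mode safe against overwrites.
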
    • FS_S3A_COMMITTER_STAGING_CONFLICT_MODE

      public static final String FS_S3A_COMMITTER_STAGING_CONFLICT_MODE
      Staging committer conflict resolution policy: "fs.s3a.committer.staging.conflict-mode". Supported: fail, append, replace.
      See Also:
    • CONFLICT_MODE_FAIL

      public static final String CONFLICT_MODE_FAIL
      Conflict mode: "fail".
      See Also:
    • CONFLICT_MODE_APPEND

      public static final String CONFLICT_MODE_APPEND
      Conflict mode: "append".
      See Also:
    • CONFLICT_MODE_REPLACE

      public static final String CONFLICT_MODE_REPLACE
      Conflict mode: "replace".
      See Also:
    • DEFAULT_CONFLICT_MODE

      public static final String DEFAULT_CONFLICT_MODE
      Default conflict mode: "append".
      See Also:
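
A configured conflict mode could be validated against the three supported values along these lines. The class ConflictModeSketch is hypothetical, and the trimming, case-insensitive matching, and exception type are assumptions rather than documented behavior of the staging committers.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not part of org.apache.hadoop.fs.s3a.commit. */
final class ConflictModeSketch {
    static final String DEFAULT_CONFLICT_MODE = "append"; // documented default
    static final Set<String> SUPPORTED = Set.of("fail", "append", "replace");

    /** Return the effective conflict mode, or throw if the value is unsupported. */
    static String resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            // Option unset: fall back to the default.
            return DEFAULT_CONFLICT_MODE;
        }
        String mode = configured.trim().toLowerCase(Locale.ENGLISH);
        if (!SUPPORTED.contains(mode)) {
            throw new IllegalArgumentException("Unsupported conflict mode: " + configured);
        }
        return mode;
    }
}
```
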
    • FS_S3A_COMMITTER_THREADS

      public static final String FS_S3A_COMMITTER_THREADS
      Number of threads in committers for parallel operations on files (upload, commit, abort, delete...): "fs.s3a.committer.threads". Two thread pools of this size are created: one for the outer task-level parallelism, and one for parallel execution within tasks (POSTs to commit individual uploads). If the value is negative, it is negated and then multiplied by the number of CPU cores.
      See Also:
    • DEFAULT_COMMITTER_THREADS

      public static final int DEFAULT_COMMITTER_THREADS
      Default value for FS_S3A_COMMITTER_THREADS: 32.
      See Also:
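
The rule for negative thread counts described under FS_S3A_COMMITTER_THREADS can be sketched as a small calculation. The class CommitterThreadsSketch and its method are hypothetical; the core count is passed in as a parameter so the arithmetic is easy to check.

```java
/** Hypothetical helper; not part of org.apache.hadoop.fs.s3a.commit. */
final class CommitterThreadsSketch {
    static final int DEFAULT_COMMITTER_THREADS = 32; // documented default

    /** A negative value is negated and multiplied by the number of cores. */
    static int resolveThreads(int configured, int cores) {
        return configured < 0 ? -configured * cores : configured;
    }
}
```

For example, a configured value of -2 on an 8-core machine resolves to 16 threads, while non-negative values are used as given. In practice the core count would typically come from Runtime.getRuntime().availableProcessors().
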
    • FS_S3A_COMMITTER_MAGIC_TRACK_COMMITS_IN_MEMORY_ENABLED

      public static final String FS_S3A_COMMITTER_MAGIC_TRACK_COMMITS_IN_MEMORY_ENABLED
      Should Magic committer track all the pending commits in memory?
      See Also:
    • FS_S3A_COMMITTER_MAGIC_TRACK_COMMITS_IN_MEMORY_ENABLED_DEFAULT

      public static final boolean FS_S3A_COMMITTER_MAGIC_TRACK_COMMITS_IN_MEMORY_ENABLED_DEFAULT
      See Also:
    • FS_S3A_COMMITTER_MAGIC_CLEANUP_ENABLED

      public static final String FS_S3A_COMMITTER_MAGIC_CLEANUP_ENABLED
      Should the Magic committer clean up all the staging dirs?
      See Also:
    • FS_S3A_COMMITTER_MAGIC_CLEANUP_ENABLED_DEFAULT

      public static final boolean FS_S3A_COMMITTER_MAGIC_CLEANUP_ENABLED_DEFAULT
      See Also:
    • FS_S3A_COMMITTER_STAGING_TMP_PATH

      public static final String FS_S3A_COMMITTER_STAGING_TMP_PATH
      Path in the cluster filesystem for temporary data: "fs.s3a.committer.staging.tmp.path". This is for HDFS, not the local filesystem. It is only for the summary data of each file, not the actual data being committed.
      See Also:
    • FS_S3A_COMMITTER_STAGING_ABORT_PENDING_UPLOADS

      @Deprecated public static final String FS_S3A_COMMITTER_STAGING_ABORT_PENDING_UPLOADS
      Deprecated.
      Should committers abort all pending uploads to the destination directory?

      Deprecated: switch to FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS.

      See Also:
    • FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS

      public static final String FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS
      Should committers abort all pending uploads to the destination directory?

      Value: "fs.s3a.committer.abort.pending.uploads".

      Change this if more than one committer is writing to the same destination tree simultaneously; otherwise the first job to complete will cancel all outstanding uploads from the others. If disabled, configure the bucket lifecycle to remove uploads after a time period, and/or set up a workflow to explicitly delete entries. Otherwise there is a risk that uncommitted uploads may run up bills.

      See Also:
    • DEFAULT_FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS

      public static final boolean DEFAULT_FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS
      Default configuration value for FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS. It is disabled by default to support concurrent writes on the same parent directory but different partition/sub directory. Value: false.
      See Also:
    • SUCCESS_MARKER_FILE_LIMIT

      public static final int SUCCESS_MARKER_FILE_LIMIT
      The limit to the number of committed objects tracked during job commits and saved to the _SUCCESS file.
      See Also:
    • TASK_ATTEMPT_ID

      public static final String TASK_ATTEMPT_ID
      Extra Data key for task attempt in pendingset files.
      See Also:
    • FS_S3A_COMMITTER_REQUIRE_UUID

      public static final String FS_S3A_COMMITTER_REQUIRE_UUID
      Require the Spark UUID to be passed down: "fs.s3a.committer.require.uuid". This verifies that SPARK-33230 has been applied to Spark, and that InternalCommitterConstants.SPARK_WRITE_UUID is set.

      MUST ONLY BE SET WITH SPARK JOBS.

      See Also:
    • DEFAULT_S3A_COMMITTER_REQUIRE_UUID

      public static final boolean DEFAULT_S3A_COMMITTER_REQUIRE_UUID
      Default value for FS_S3A_COMMITTER_REQUIRE_UUID: false.
      See Also:
    • FS_S3A_COMMITTER_GENERATE_UUID

      public static final String FS_S3A_COMMITTER_GENERATE_UUID
      Generate a UUID in job setup rather than fall back to YARN Application attempt ID.

      MUST ONLY BE SET WITH SPARK JOBS.

      See Also:
    • DEFAULT_S3A_COMMITTER_GENERATE_UUID

      public static final boolean DEFAULT_S3A_COMMITTER_GENERATE_UUID
      Default value for FS_S3A_COMMITTER_GENERATE_UUID: false.
      See Also:
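
The interaction between the two UUID options can be sketched as a decision function. The class JobUuidSketch, its parameters, and in particular the order of precedence (a supplied Spark UUID first, then the "require" check, then generation, then the fallback ID) are assumptions for illustration; the descriptions above do not spell out the exact order the committers use.

```java
import java.util.UUID;

/** Hypothetical helper; not part of org.apache.hadoop.fs.s3a.commit. */
final class JobUuidSketch {

    /**
     * Choose a job UUID.
     * @param sparkUuid UUID passed down by Spark, or null if absent
     * @param requireUuid value of the "require UUID" option
     * @param generateUuid value of the "generate UUID" option
     * @param fallbackId ID to use otherwise, such as an application attempt ID
     */
    static String resolveJobUuid(String sparkUuid, boolean requireUuid,
                                 boolean generateUuid, String fallbackId) {
        if (sparkUuid != null && !sparkUuid.isEmpty()) {
            return sparkUuid;
        }
        if (requireUuid) {
            // The option demands a UUID from Spark, and none was supplied.
            throw new IllegalStateException("Job UUID required but not set");
        }
        if (generateUuid) {
            return UUID.randomUUID().toString();
        }
        return fallbackId;
    }
}
```
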
    • X_HEADER_MAGIC_MARKER

      public static final String X_HEADER_MAGIC_MARKER
      Magic marker header declaring the final file length on the marker objects of magic uploads: "x-hadoop-s3a-magic-data-length".
      See Also:
    • XA_MAGIC_MARKER

      public static final String XA_MAGIC_MARKER
      XAttr name of magic marker, with "header." prefix: "header.x-hadoop-s3a-magic-data-length".
      See Also:
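
The relationship between the header name and the XAttr name is a simple prefixing, sketched below. The class MagicMarkerSketch, the local constant name XA_HEADER_PREFIX, and the parseLength helper are hypothetical; reading the header value as a decimal byte count is an assumption based on the description "final file length".

```java
/** Hypothetical helper; not part of org.apache.hadoop.fs.s3a.commit. */
final class MagicMarkerSketch {
    static final String X_HEADER_MAGIC_MARKER = "x-hadoop-s3a-magic-data-length";
    static final String XA_HEADER_PREFIX = "header."; // local name chosen for this sketch
    static final String XA_MAGIC_MARKER = XA_HEADER_PREFIX + X_HEADER_MAGIC_MARKER;

    /** Parse a header value as a length in bytes (assumed to be a decimal string). */
    static long parseLength(String headerValue) {
        return Long.parseLong(headerValue.trim());
    }
}
```
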
    • PARAM_TASK_ATTEMPT_ID

      public static final String PARAM_TASK_ATTEMPT_ID
      Task Attempt ID query header: "ta".
      See Also:
    • OPT_SUMMARY_REPORT_DIR

      public static final String OPT_SUMMARY_REPORT_DIR
      Directory for saving job summary reports. These are the _SUCCESS files, but are saved even on job failures. Value: "fs.s3a.committer.summary.report.directory".
      See Also:
    • S3A_COMMITTER_EXPERIMENTAL_COLLECT_IOSTATISTICS

      public static final String S3A_COMMITTER_EXPERIMENTAL_COLLECT_IOSTATISTICS
      Experimental feature to collect thread-level IO statistics. When set, the committers reset the statistics in task setup and propagate them to the job committer. The job committer will include those and its own statistics. Do not use if the execution engine is collecting statistics, as the multiple reset() operations will result in incomplete statistics. Value: "fs.s3a.committer.experimental.collect.iostatistics".
      See Also:
    • S3A_COMMITTER_EXPERIMENTAL_COLLECT_IOSTATISTICS_DEFAULT

      public static final boolean S3A_COMMITTER_EXPERIMENTAL_COLLECT_IOSTATISTICS_DEFAULT
      Default value for S3A_COMMITTER_EXPERIMENTAL_COLLECT_IOSTATISTICS. Value: false.
      See Also: