class org.apache.hadoop.fs.FSDataOutputStreamBuilder

Builder pattern for FSDataOutputStream and its subclasses. It is used to create a new file or open an existing file on a FileSystem for writing.

Invariants

The FSDataOutputStreamBuilder interface does not validate parameters or modify the state of the FileSystem until build() is invoked.
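The invariant can be illustrated with a small, dependency-free model. DeferredBuilderSketch, its in-memory files set, and the "buffer size must be positive" rule are all hypothetical stand-ins chosen for illustration; they are not Hadoop classes or Hadoop's actual validation logic. The sketch only shows the shape of the contract: setters record values, while checks and state changes happen in build().

```java
import java.util.Set;

// Hypothetical model, not a Hadoop class: setters only record values;
// validation and state changes are deferred to build().
final class DeferredBuilderSketch {
    private final Set<String> files;   // stands in for the filesystem state
    private final String path;
    private int bufferSize = 4096;

    DeferredBuilderSketch(Set<String> files, String path) {
        this.files = files;
        this.path = path;
    }

    // No validation here, so even an invalid value is accepted for now.
    DeferredBuilderSketch bufferSize(int size) {
        this.bufferSize = size;
        return this;
    }

    // Parameters are checked first (assumed rule: size must be positive);
    // the modelled filesystem state is modified only if the checks pass.
    String build() {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("invalid buffer size: " + bufferSize);
        }
        files.add(path);
        return path;
    }
}
```

In this model, calling bufferSize(-1) succeeds and leaves the files set untouched; the failure surfaces only when build() is called.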

Implementation-agnostic parameters.

FSDataOutputStreamBuilder create()

Specify FSDataOutputStreamBuilder to create a file on FileSystem, equivalent to CreateFlag#CREATE.

FSDataOutputStreamBuilder append()

Specify FSDataOutputStreamBuilder to append to an existing file on FileSystem, equivalent to CreateFlag#APPEND.

FSDataOutputStreamBuilder overwrite(boolean overwrite)

Specify whether FSDataOutputStreamBuilder overwrites an existing file. If overwrite==true, it truncates an existing file, equivalent to CreateFlag#OVERWRITE.

FSDataOutputStreamBuilder permission(FsPermission permission)

Set permission for the file.

FSDataOutputStreamBuilder bufferSize(int bufSize)

Set the size of the buffer to be used.

FSDataOutputStreamBuilder replication(short replica)

Set the replication factor.

FSDataOutputStreamBuilder blockSize(long size)

Set block size in bytes.

FSDataOutputStreamBuilder recursive()

Create parent directories if they do not exist.

FSDataOutputStreamBuilder progress(Progressable prog)

Set the facility for reporting progress.

FSDataOutputStreamBuilder checksumOpt(ChecksumOpt chksumOpt)

Set the checksum options.

Set optional or mandatory parameters

FSDataOutputStreamBuilder opt(String key, ...)
FSDataOutputStreamBuilder must(String key, ...)

Set optional or mandatory parameters to the builder. Using opt() or must(), client can specify FS-specific parameters without inspecting the concrete type of FileSystem.

// Don't
if (fs instanceof FooFileSystem) {
    FooFileSystem foofs = (FooFileSystem) fs;
    out = foofs.createFile(path)
        .optionA()
        .optionB("value")
        .cache()
        .build();
} else if (fs instanceof BarFileSystem) {
    ...
}

// Do
out = fs.createFile(path)
    .permission(perm)
    .bufferSize(bufSize)
    .opt("foofs:option.a", true)
    .opt("foofs:option.b", "value")
    .opt("barfs:cache", true)
    .must("foofs:cache", true)
    .must("barfs:cache-size", 256 * 1024 * 1024)
    .build();

Implementation Notes

The concrete FileSystem and/or FSDataOutputStreamBuilder implementation MUST verify that implementation-agnostic parameters (e.g., "syncable") or implementation-specific parameters (e.g., "foofs:cache") are supported. FileSystem will satisfy optional parameters (via opt(key, ...)) on a best-effort basis. If the mandatory parameters (via must(key, ...)) cannot be satisfied in the FileSystem, IllegalArgumentException must be thrown in build().

Conflicts between the parameters set by builder methods (e.g., bufferSize()) and opt()/must() are resolved as follows:

The last option specified defines the value and its optional/mandatory state.
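The opt()/must() rules above can be sketched as a small model. OptionResolutionSketch and its supportedKeys set are hypothetical and not part of Hadoop; the model covers only string-valued opt()/must() calls, not the typed builder methods. It shows the last call for a key deciding both the value and the optional/mandatory state, with unsupported mandatory keys rejected in build().

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not a Hadoop class: tracks each option's latest value
// and whether that latest setting was optional or mandatory.
final class OptionResolutionSketch {
    private final Set<String> supportedKeys;
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> mandatory = new HashSet<>();

    OptionResolutionSketch(Set<String> supportedKeys) {
        this.supportedKeys = supportedKeys;
    }

    // Optional setting: replaces any earlier value and clears the mandatory mark.
    OptionResolutionSketch opt(String key, String value) {
        values.put(key, value);
        mandatory.remove(key);
        return this;
    }

    // Mandatory setting: replaces any earlier value and marks the key mandatory.
    OptionResolutionSketch must(String key, String value) {
        values.put(key, value);
        mandatory.add(key);
        return this;
    }

    // Returns the options that take effect. Unsupported optional keys are
    // dropped (best effort); an unsupported mandatory key fails the build.
    Map<String, String> build() {
        Map<String, String> applied = new HashMap<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            if (supportedKeys.contains(e.getKey())) {
                applied.put(e.getKey(), e.getValue());
            } else if (mandatory.contains(e.getKey())) {
                throw new IllegalArgumentException(
                    "unsupported mandatory option: " + e.getKey());
            }
        }
        return applied;
    }
}
```

With only "foofs:cache" supported, must("barfs:cache", ...) followed by opt("barfs:cache", ...) builds successfully in this model because the later opt() call makes the key optional, while the reverse order throws.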

HDFS-specific parameters.

HdfsDataOutputStreamBuilder extends FSDataOutputStreamBuilder and provides additional HDFS-specific parameters to further customize file creation and append behavior.

FSDataOutputStreamBuilder favoredNodes(InetSocketAddress[] nodes)

Set favored DataNodes for new blocks.

FSDataOutputStreamBuilder syncBlock()

Force closed blocks to the disk device. See CreateFlag#SYNC_BLOCK.

FSDataOutputStreamBuilder lazyPersist()

Create the block on transient storage if possible.

FSDataOutputStreamBuilder newBlock()

Append data to a new block instead of the end of the last partial block.

FSDataOutputStreamBuilder noLocalWrite()

Advise that a block replica NOT be written to the local DataNode.

FSDataOutputStreamBuilder ecPolicyName(String policyName)

Enforce the file to be a striped file with erasure coding policy ‘policyName’, no matter what its parent directory’s replication or erasure coding policy is.

FSDataOutputStreamBuilder replicate()

Enforce the file to be a replicated file, no matter what its parent directory’s replication or erasure coding policy is.

Builder interface

FSDataOutputStream build()

Create a new file or append to an existing file on the underlying FileSystem, and return an FSDataOutputStream for writing.

Preconditions

The following combinations of parameters are not supported:

if APPEND|OVERWRITE: raise HadoopIllegalArgumentException
if CREATE|APPEND|OVERWRITE: raise HadoopIllegalArgumentException

FileSystem may reject the request for other reasons and throw IOException; see FileSystem#create(path, ...) and FileSystem#append().
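The flag preconditions can be sketched as a standalone check. FlagCheckSketch and its Flag enum are hypothetical; the enum mirrors only the three CreateFlag names used above. To stay dependency-free the sketch throws the JDK's IllegalArgumentException, which is the superclass of HadoopIllegalArgumentException.

```java
import java.util.EnumSet;

// Hypothetical model, not a Hadoop class.
final class FlagCheckSketch {
    // Mirrors only the CreateFlag names referenced in the preconditions.
    enum Flag { CREATE, APPEND, OVERWRITE }

    // Rejects the unsupported combinations. Testing for APPEND together with
    // OVERWRITE also covers CREATE|APPEND|OVERWRITE.
    static void validate(EnumSet<Flag> flags) {
        if (flags.contains(Flag.APPEND) && flags.contains(Flag.OVERWRITE)) {
            // Hadoop raises HadoopIllegalArgumentException at this point.
            throw new IllegalArgumentException(
                "APPEND and OVERWRITE cannot be combined: " + flags);
        }
    }
}
```

A single containment test is enough here because both rejected combinations share the APPEND and OVERWRITE pair.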

Postconditions

FS' where :
   FS'.Files'[p] == []
   ancestors(p) is-subset-of FS'.Directories'

result = FSDataOutputStream

The result is an FSDataOutputStream to be used to write data to the filesystem.
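For the file-creation case, the postconditions can be read as assertions over a toy in-memory filesystem. PostconditionSketch is a hypothetical model, not a Hadoop class, and it assumes absolute "/"-separated paths and that missing parent directories are created; it exists only to make the two state conditions concrete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the filesystem state, not a Hadoop class.
final class PostconditionSketch {
    final Map<String, byte[]> files = new HashMap<>();
    final Set<String> directories = new HashSet<>();

    // Ancestors of an absolute path, nearest first, ending at the root "/".
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        String current = path;
        while (current.length() > 1) {
            int slash = current.lastIndexOf('/');
            current = (slash <= 0) ? "/" : current.substring(0, slash);
            result.add(current);
        }
        return result;
    }

    // Models a successful create: the file exists and is empty
    // (Files'[p] == []), and every ancestor of p is a directory.
    void create(String path) {
        directories.addAll(ancestors(path));
        files.put(path, new byte[0]);
    }
}
```

After create("/a/b/c.txt") in this model, the file maps to an empty byte array and "/a/b", "/a" and "/" are all present in directories.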

S3A-specific options

Here are the custom options which the S3A Connector supports.

| Name | Type | Meaning |
|------|------|---------|
| fs.s3a.create.performance | boolean | create a file with maximum performance |
| fs.s3a.create.header | string | prefix for user supplied headers |

fs.s3a.create.performance

Prioritize file creation performance over safety checks for filesystem consistency.

This:

1. Skips the LIST call which makes sure a file is not being created over a directory. Risk: a file is created over a directory.
2. Ignores the overwrite flag.
3. Never issues a DELETE call to delete parent directory markers.

It is possible to probe an S3A Filesystem instance for this capability through the hasPathCapability(path, "fs.s3a.create.performance") check.

Creating files with this option over existing directories is likely to make S3A filesystem clients behave inconsistently.

Operations optimized for directories (e.g. listing calls) are likely to see the directory tree, not the file; operations optimized for files (getFileStatus(), isFile()) are more likely to see the file. The exact form of the inconsistencies, and which operations or parameters trigger them, are undefined and may change even between minor releases.

Using this option is the equivalent of pressing and holding down the “Electronic Stability Control” button on a rear-wheel drive car for five seconds: the safety checks are off. Things will be faster if the driver knows what they are doing. If they don’t, the fact that they held the button down will be used at the inquest as evidence that they made a conscious decision to choose speed over safety and that the outcome was their own fault.

Accordingly: Use if and only if you are confident that the conditions are met.

fs.s3a.create.header User-supplied header support

Options with the prefix fs.s3a.create.header. will be added to the S3 object metadata as “user defined metadata”. This metadata is visible to all applications. It can also be retrieved through the FileSystem/FileContext listXAttrs() and getXAttrs() API calls with the prefix "header.".

When an object is renamed, the metadata is propagated to the copy created.

It is possible to probe an S3A Filesystem instance for this capability through the hasPathCapability(path, "fs.s3a.create.header") check.
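As a rough illustration of the prefix convention, the sketch below collects builder options that start with fs.s3a.create.header. into a header map. CreateHeaderSketch is a hypothetical helper, not part of the S3A connector, and it assumes that the header name is whatever follows the prefix; the text above does not spell out the exact mapping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the S3A connector.
final class CreateHeaderSketch {
    static final String PREFIX = "fs.s3a.create.header.";

    // Collects the options under the prefix, keyed by the remainder of the
    // option name (assumed to be the user-defined metadata name).
    static Map<String, String> userHeaders(Map<String, String> options) {
        Map<String, String> headers = new HashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                headers.put(key.substring(PREFIX.length()), e.getValue());
            }
        }
        return headers;
    }
}
```

Options outside the prefix, such as fs.s3a.create.performance, are ignored by this helper.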