org.apache.hadoop.fs.FSDataOutputStreamBuilder
Builder pattern for FSDataOutputStream and its subclasses. It is used to create a new file or open an existing file on FileSystem for write.
The FSDataOutputStreamBuilder interface does not validate parameters or modify the state of FileSystem until build() is invoked.
FSDataOutputStreamBuilder create()
Specify FSDataOutputStreamBuilder to create a file on FileSystem, equivalent to CreateFlag#CREATE.
FSDataOutputStreamBuilder append()
Specify FSDataOutputStreamBuilder to append to an existing file on FileSystem, equivalent to CreateFlag#APPEND.
FSDataOutputStreamBuilder overwrite(boolean overwrite)
Specify whether FSDataOutputStreamBuilder should overwrite an existing file. If overwrite == true, an existing file is truncated, equivalent to CreateFlag#OVERWRITE.
FSDataOutputStreamBuilder permission(FsPermission permission)
Set permission for the file.
FSDataOutputStreamBuilder bufferSize(int bufSize)
Set the size of the buffer to be used.
FSDataOutputStreamBuilder replication(short replica)
Set the replication factor.
FSDataOutputStreamBuilder blockSize(long size)
Set block size in bytes.
FSDataOutputStreamBuilder recursive()
Create parent directories if they do not exist.
FSDataOutputStreamBuilder progress(Progressable prog)
Set the facility for reporting progress.
FSDataOutputStreamBuilder checksumOpt(ChecksumOpt chksumOpt)
Set the checksum options.
FSDataOutputStreamBuilder opt(String key, ...)
FSDataOutputStreamBuilder must(String key, ...)
Set optional or mandatory parameters on the builder. Using opt() or must(), the client can specify FS-specific parameters without inspecting the concrete type of FileSystem.
```java
// Don't
if (fs instanceof FooFileSystem) {
  FooFileSystem foofs = (FooFileSystem) fs;
  out = foofs.createFile(path)
      .optionA()
      .optionB("value")
      .cache()
      .build();
} else if (fs instanceof BarFileSystem) {
  ...
}

// Do
out = fs.createFile(path)
    .permission(perm)
    .bufferSize(bufSize)
    .opt("foofs:option.a", true)
    .opt("foofs:option.b", "value")
    .opt("barfs:cache", true)
    .must("foofs:cache", true)
    .must("barfs:cache-size", 256 * 1024 * 1024)
    .build();
```
The concrete FileSystem and/or FSDataOutputStreamBuilder implementation MUST verify that implementation-agnostic parameters (e.g., "syncable") or implementation-specific parameters (e.g., "foofs:cache") are supported.
FileSystem will satisfy optional parameters (via opt(key, …)) on a best-effort basis. If the mandatory parameters (via must(key, …)) cannot be satisfied by the FileSystem, an IllegalArgumentException must be thrown in build().
The behavior of resolving conflicts between the parameters set by builder methods (e.g., bufferSize()) and opt()/must() is as follows:
The last option specified defines the value and its optional/mandatory state.
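The option rules above (validation deferred until build(), best-effort handling of opt(), IllegalArgumentException for an unsatisfiable must(), and last-specified-wins) can be sketched with a small stand-alone model. Everything here is hypothetical: OptionModelBuilder, its string-only values, and the fixed set of "supported" keys are assumptions made for illustration, not the Hadoop builder classes, and builder methods such as bufferSize() are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the opt()/must() contract; not the Hadoop implementation. */
class OptionModelBuilder {

  /** A recorded value together with its optional/mandatory state. */
  private static final class Option {
    final String value;
    final boolean mandatory;

    Option(String value, boolean mandatory) {
      this.value = value;
      this.mandatory = mandatory;
    }
  }

  private final Map<String, Option> options = new HashMap<>();
  private final Set<String> supportedKeys; // assumption: what this "filesystem" understands

  OptionModelBuilder(Set<String> supportedKeys) {
    this.supportedKeys = supportedKeys;
  }

  /** Records an optional parameter; nothing is validated yet. */
  OptionModelBuilder opt(String key, String value) {
    options.put(key, new Option(value, false)); // last call for a key wins
    return this;
  }

  /** Records a mandatory parameter; validation is deferred to build(). */
  OptionModelBuilder must(String key, String value) {
    options.put(key, new Option(value, true)); // last call for a key wins
    return this;
  }

  /** Validates mandatory keys and returns the options that would be honored. */
  Map<String, String> build() {
    Map<String, String> honored = new HashMap<>();
    for (Map.Entry<String, Option> e : options.entrySet()) {
      Option o = e.getValue();
      if (supportedKeys.contains(e.getKey())) {
        honored.put(e.getKey(), o.value);
      } else if (o.mandatory) {
        throw new IllegalArgumentException("Unsupported mandatory option: " + e.getKey());
      }
      // An unsupported optional key is silently ignored (best effort).
    }
    return honored;
  }
}
```

For example, calling must("x", "1") and then opt("x", "2") leaves "x" optional in this model, so an unsupported "x" is dropped by build() rather than rejected.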
HdfsDataOutputStreamBuilder extends FSDataOutputStreamBuilder
HdfsDataOutputStreamBuilder provides additional HDFS-specific parameters to further customize file creation / append behavior.
FSDataOutputStreamBuilder favoredNodes(InetSocketAddress[] nodes)
Set favored DataNodes for new blocks.
FSDataOutputStreamBuilder syncBlock()
Force closed blocks to the disk device. See CreateFlag#SYNC_BLOCK.
FSDataOutputStreamBuilder lazyPersist()
Create the block on transient storage if possible.
FSDataOutputStreamBuilder newBlock()
Append data to a new block instead of the end of the last partial block.
FSDataOutputStreamBuilder noLocalWrite()
Advise that a block replica NOT be written to the local DataNode.
FSDataOutputStreamBuilder ecPolicyName(String policyName)
Enforce the file to be a striped file with erasure coding policy ‘policyName’, no matter what its parent directory’s replication or erasure coding policy is.
FSDataOutputStreamBuilder replicate()
Enforce the file to be a replicated file, no matter what its parent directory’s replication or erasure coding policy is.
FSDataOutputStream build()
Create a new file or append to an existing file on the underlying FileSystem, and return FSDataOutputStream for write.
The following combinations of parameters are not supported:
if APPEND|OVERWRITE: raise HadoopIllegalArgumentException
if CREATE|APPEND|OVERWRITE: raise HadoopIllegalArgumentException
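A minimal sketch of this combination check, assuming a hypothetical WriteFlag enum that mirrors the CreateFlag names and using IllegalArgumentException as a stand-in for HadoopIllegalArgumentException:

```java
import java.util.EnumSet;

/** Illustrative stand-in for the CREATE/APPEND/OVERWRITE flags named above. */
enum WriteFlag { CREATE, APPEND, OVERWRITE }

/** Hypothetical helper that rejects the unsupported flag combinations. */
final class WriteFlagCheck {
  private WriteFlagCheck() { }

  static void validate(EnumSet<WriteFlag> flags) {
    // CREATE|APPEND|OVERWRITE contains the APPEND|OVERWRITE pair,
    // so this single test rejects both listed combinations.
    if (flags.contains(WriteFlag.APPEND) && flags.contains(WriteFlag.OVERWRITE)) {
      throw new IllegalArgumentException("Unsupported flag combination: " + flags);
    }
  }
}
```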
FileSystem may reject the request for other reasons and throw IOException; see FileSystem#create(path, ...) and FileSystem#append().
FS' where :
    FS'.Files'[p] == []
    ancestors(p) is-subset-of FS'.Directories'

result = FSDataOutputStream
The result is an FSDataOutputStream to be used to write data to the filesystem.
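To make the postcondition notation concrete, the toy model below represents FS.Directories as a set of paths and FS.Files as a map from path to contents, then checks both clauses after a create. The ModelFs class, its absolute "/"-separated paths, and the assumption that missing parents are created (as with recursive()) are illustrative only; preconditions such as p already being a directory are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the filesystem state FS used in the postcondition. */
final class ModelFs {
  final Set<String> directories = new HashSet<>(Set.of("/"));
  final Map<String, byte[]> files = new HashMap<>();

  /** ancestors(p): every proper parent of p, up to and including "/". */
  static List<String> ancestors(String p) {
    List<String> result = new ArrayList<>();
    String current = p;
    int idx = current.lastIndexOf('/');
    while (idx > 0) {
      current = current.substring(0, idx);
      result.add(current);
      idx = current.lastIndexOf('/');
    }
    result.add("/");
    return result;
  }

  /** Models a successful create with recursive(): parents exist, file is empty. */
  void create(String p) {
    directories.addAll(ancestors(p));
    files.put(p, new byte[0]);
  }

  /** Checks FS'.Files'[p] == [] and ancestors(p) is-subset-of FS'.Directories'. */
  boolean postconditionHolds(String p) {
    byte[] data = files.get(p);
    return data != null && data.length == 0 && directories.containsAll(ancestors(p));
  }
}
```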
Here are the custom options which the S3A Connector supports.
| Name | Type | Meaning |
|---|---|---|
| fs.s3a.create.performance | boolean | create a file with maximum performance |
| fs.s3a.create.header | string | prefix for user supplied headers |
fs.s3a.create.performance
Prioritize file creation performance over safety checks for filesystem consistency.
This:
1. Skips the LIST call which checks that a file is not being created over a directory. Risk: a file is created over a directory.
2. Ignores the overwrite flag.
3. Never issues a DELETE call to delete parent directory markers.
It is possible to probe an S3A Filesystem instance for this capability through the hasPathCapability(path, "fs.s3a.create.performance") check.
Creating files with this option over existing directories is likely to make S3A filesystem clients behave inconsistently.
Operations optimized for directories (e.g. listing calls) are likely to see the directory tree, not the file; operations optimized for files (getFileStatus(), isFile()) are more likely to see the file. The exact form of the inconsistencies, and which operations/parameters trigger them, are undefined and may change even between minor releases.
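The inconsistency can be reproduced with a toy flat key store. In this hypothetical FlatStoreModel (not S3A code), a path "is a directory" if some key starts with path + "/", and "is a file" if a key equals the path. The safe create path performs the directory probe and honors overwrite, while the performance path skips both. IllegalStateException is only a stand-in for whatever exception the real connector raises, and the directory-marker DELETE is not modeled.

```java
import java.util.TreeSet;

/** Hypothetical flat object store used only to illustrate the skipped checks. */
final class FlatStoreModel {
  private final TreeSet<String> keys = new TreeSet<>();

  void put(String key) {
    keys.add(key);
  }

  /** Directory view: any object key lives under path + "/". */
  boolean looksLikeDirectory(String path) {
    String prefix = path + "/";
    String next = keys.ceiling(prefix); // smallest key >= prefix
    return next != null && next.startsWith(prefix);
  }

  /** File view: an object key equals the path exactly. */
  boolean looksLikeFile(String path) {
    return keys.contains(path);
  }

  /** Models file creation with and without the performance option. */
  void create(String path, boolean overwrite, boolean performance) {
    if (!performance) {
      // Safety checks that the performance option skips or ignores.
      if (looksLikeDirectory(path)) {
        throw new IllegalStateException("Cannot create file over directory " + path);
      }
      if (!overwrite && looksLikeFile(path)) {
        throw new IllegalStateException("File exists: " + path);
      }
    }
    keys.add(path);
  }
}
```

After a performance-mode create of "data" while the key "data/part-0000" exists, both looksLikeFile("data") and looksLikeDirectory("data") are true in this model, which is the split view described above.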
Using this option is the equivalent of pressing and holding down the “Electronic Stability Control” button on a rear-wheel drive car for five seconds: the safety checks are off. Things will be faster if the driver knows what they are doing. If they don’t, the fact they held the button down will be used as evidence at the inquest that they made a conscious decision to choose speed over safety and that the outcome was their own fault.
Note: the option can be set for an entire filesystem. Again, the safety checks are there to more closely match the semantics of a classic filesystem, and to reduce the likelihood that the object store ends up in a state which diverges so much from the classic directory + tree structure that applications get confused.
Accordingly: Use if and only if you are confident that the conditions are met.
fs.s3a.create.header
User-supplied header support
Options with the "fs.s3a.create.header." prefix will be added to the S3 object metadata as “user defined metadata”. This metadata is visible to all applications. It can also be retrieved through the FileSystem/FileContext listXAttrs() and getXAttrs() API calls with the "header." prefix.
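A sketch of the prefix handling, under stated assumptions: HeaderOptionSketch is hypothetical, values are kept as strings (the real getXAttrs() returns byte[] values), and the XAttr name is assumed to be "header." followed by the metadata key, which may not match the connector's exact naming.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical mapping from builder options to user metadata and XAttr names. */
final class HeaderOptionSketch {
  static final String OPTION_PREFIX = "fs.s3a.create.header.";
  static final String XATTR_PREFIX = "header."; // assumption about XAttr naming

  private HeaderOptionSketch() { }

  /** Keeps only header options and strips the option prefix from their keys. */
  static Map<String, String> userMetadata(Map<String, String> options) {
    Map<String, String> metadata = new TreeMap<>();
    options.forEach((key, value) -> {
      if (key.startsWith(OPTION_PREFIX) && key.length() > OPTION_PREFIX.length()) {
        metadata.put(key.substring(OPTION_PREFIX.length()), value);
      }
    });
    return metadata;
  }

  /** Names under which the metadata would be listed as XAttrs. */
  static Map<String, String> asXAttrs(Map<String, String> metadata) {
    Map<String, String> xattrs = new TreeMap<>();
    metadata.forEach((name, value) -> xattrs.put(XATTR_PREFIX + name, value));
    return xattrs;
  }
}
```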
When an object is renamed, the metadata is propagated to the copy created.
It is possible to probe an S3A Filesystem instance for this capability through the hasPathCapability(path, "fs.s3a.create.header") check.