Package org.apache.hadoop.fs
Class ChecksumFileSystem
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.FilterFileSystem
org.apache.hadoop.fs.ChecksumFileSystem
- All Implemented Interfaces:
Closeable, AutoCloseable, Configurable, BulkDeleteSource, org.apache.hadoop.fs.PathCapabilities, org.apache.hadoop.security.token.DelegationTokenIssuer
- Direct Known Subclasses:
LocalFileSystem
Abstract checksummed FileSystem.
It provides a basic implementation of a checksummed FileSystem,
which creates a checksum file for each raw file.
It generates and verifies checksums on the client side.
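To make the idea of client-side generation and verification concrete, here is a minimal standalone sketch. It is not Hadoop code: the class name ChunkedChecksumSketch and its methods are hypothetical, and it assumes one CRC32 value per fixed-size chunk of data, which is only an approximation of what a checksummed file system might store in its checksum file.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch of client-side, per-chunk checksumming; not Hadoop code. */
class ChunkedChecksumSketch {

    /** Computes one CRC32 value for each chunk of at most bytesPerSum bytes. */
    static long[] checksums(byte[] data, int bytesPerSum) {
        int chunks = (data.length + bytesPerSum - 1) / bytesPerSum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int offset = i * bytesPerSum;
            int length = Math.min(bytesPerSum, data.length - offset);
            CRC32 crc = new CRC32();
            crc.update(data, offset, length);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk that fails verification, or -1 if all chunks match. */
    static int firstCorruptChunk(byte[] data, long[] stored, int bytesPerSum) {
        long[] actual = checksums(data, bytesPerSum);
        int n = Math.min(actual.length, stored.length);
        for (int i = 0; i < n; i++) {
            if (actual[i] != stored[i]) {
                return i;
            }
        }
        // A differing chunk count is reported at the first unmatched chunk.
        return actual.length == stored.length ? -1 : n;
    }

    public static void main(String[] args) {
        byte[] data = "hello checksum world".getBytes(StandardCharsets.UTF_8);
        long[] stored = checksums(data, 8); // sums written when the data was written
        data[9] ^= 0x01;                    // simulate a flipped bit in the second chunk
        System.out.println("first corrupt chunk: " + firstCorruptChunk(data, stored, 8));
    }
}
```

Keeping one sum per chunk, rather than one per file, lets a reader locate the damaged region instead of only learning that the file as a whole is bad.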
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.FileSystem.DirectoryEntries, org.apache.hadoop.fs.FileSystem.DirListingIterator<T extends FileStatus>, org.apache.hadoop.fs.FileSystem.Statistics
-
Field Summary
Fields inherited from class org.apache.hadoop.fs.FilterFileSystem
fs, swapScheme
Fields inherited from class org.apache.hadoop.fs.FileSystem
DEFAULT_FS, FS_DEFAULT_NAME_KEY, LOG, SHUTDOWN_HOOK_PRIORITY, statistics, TRASH_PREFIX, USER_HOME_PREFIX
Fields inherited from interface org.apache.hadoop.security.token.DelegationTokenIssuer
TOKEN_LOG
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method - Description
FSDataOutputStream append(Path f, int bufferSize, Progressable progress) - Append to an existing file (optional operation).
FSDataOutputStreamBuilder appendFile(Path path) - This is overridden to ensure that this class's create() method is ultimately called.
void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile) - Called when we're all done writing to the target.
void concat(Path f, Path[] psrcs) - Concat existing files together.
void copyFromLocalFile(boolean delSrc, Path src, Path dst) - The src file is on the local disk.
void copyToLocalFile(boolean delSrc, Path src, Path dst) - The src file is under FS, and the dst is on the local disk.
void copyToLocalFile(Path src, Path dst, boolean copyCrc) - The src file is under FS, and the dst is on the local disk.
FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) - Create an FSDataOutputStream at the indicated Path with write-progress reporting.
FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress, org.apache.hadoop.fs.Options.ChecksumOpt checksumOpt) - Create an FSDataOutputStream at the indicated Path with a custom checksum option.
FSDataOutputStreamBuilder createFile(Path path) - This is overridden to ensure that this class's create() method is ultimately called.
FSDataOutputStream createNonRecursive(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) - Opens an FSDataOutputStream at the indicated Path with write-progress reporting.
FSDataOutputStream createNonRecursive(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress) - Opens an FSDataOutputStream at the indicated Path with write-progress reporting.
boolean delete(Path f, boolean recursive) - Implement the delete(Path, boolean) in checksum file system.
static double getApproxChkSumLength(long size)
int getBytesPerSum() - Return the bytes per checksum.
Path getChecksumFile(Path file) - Return the name of the checksum file associated with a file.
long getChecksumFileLength(Path file, long fileSize) - Return the length of the checksum file given the size of the actual file.
static long getChecksumLength(long size, int bytesPerSum) - Calculates the length of the checksum file in bytes.
FileSystem getRawFileSystem() - Get the raw file system.
boolean getVerifyChecksum() - Is checksum verification enabled?
boolean hasPathCapability(Path path, String capability) - Disable those operations which the checksummed FS blocks.
static boolean isChecksumFile(Path file) - Return true if file is a checksum file name.
org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f) - List the statuses of the files/directories in the given path if the path is a directory.
FileStatus[] listStatus(Path f) - List the statuses of the files/directories in the given path if the path is a directory.
org.apache.hadoop.fs.RemoteIterator<FileStatus> listStatusIterator(Path p) - Return a remote iterator for listing in a directory.
boolean mkdirs(Path f) - Call FileSystem.mkdirs(Path, FsPermission) with default permission.
void modifyAclEntries(Path src, List<AclEntry> aclSpec) - Modifies ACL entries of files and directories.
FSDataInputStream open(Path f, int bufferSize) - Opens an FSDataInputStream at the indicated Path.
FutureDataInputStreamBuilder openFile(Path path) - This is overridden to ensure that this class's openFileWithOptions(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.impl.OpenFileParameters) method is called, and so ultimately its open(Path, int).
protected CompletableFuture<FSDataInputStream> openFileWithOptions(Path path, org.apache.hadoop.fs.impl.OpenFileParameters parameters) - Open the file as a blocking call to open(Path, int).
void removeAcl(Path src) - Removes all but the base ACL entries of files and directories.
void removeAclEntries(Path src, List<AclEntry> aclSpec) - Removes ACL entries from files and directories.
void removeDefaultAcl(Path src) - Removes all default ACL entries from files and directories.
boolean rename(Path src, Path dst) - Rename files/dirs.
boolean reportChecksumFailure(Path f, FSDataInputStream in, long inPos, FSDataInputStream sums, long sumsPos) - Report a checksum error to the file system.
void setAcl(Path src, List<AclEntry> aclSpec) - Fully replaces ACL of files and directories, discarding all existing entries.
void setConf(Configuration conf) - Set the configuration to be used by this object.
void setOwner(Path src, String username, String groupname) - Set owner of a path (i.e. a file or a directory).
void setPermission(Path src, FsPermission permission) - Set permission of a path.
boolean setReplication(Path src, short replication) - Set replication for an existing file.
void setVerifyChecksum(boolean verifyChecksum) - Set whether to verify checksum.
void setWriteChecksum(boolean writeChecksum) - Set the write checksum flag.
Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile) - Returns a local File that the user can write output to.
boolean truncate(Path f, long newLength) - Truncate the file in the indicated path to the indicated size.
Methods inherited from class org.apache.hadoop.fs.FilterFileSystem
access, canonicalizeUri, checkPath, close, copyFromLocalFile, copyFromLocalFile, createPathHandle, createSnapshot, createSymlink, deleteSnapshot, getAclStatus, getAllStoragePolicies, getCanonicalUri, getChildFileSystems, getConf, getDefaultBlockSize, getDefaultBlockSize, getDefaultReplication, getDefaultReplication, getEnclosingRoot, getFileBlockLocations, getFileChecksum, getFileChecksum, getFileLinkStatus, getFileStatus, getHomeDirectory, getInitialWorkingDirectory, getLinkTarget, getServerDefaults, getServerDefaults, getStatus, getStoragePolicy, getTrashRoot, getTrashRoots, getUri, getUsed, getUsed, getWorkingDirectory, getXAttr, getXAttrs, getXAttrs, initialize, listCorruptFileBlocks, listLocatedStatus, listXAttrs, makeQualified, mkdirs, msync, open, openFile, openFileWithOptions, primitiveCreate, primitiveMkdir, removeXAttr, rename, renameSnapshot, resolveLink, resolvePath, satisfyStoragePolicy, setStoragePolicy, setTimes, setWorkingDirectory, setXAttr, setXAttr, supportsSymlinks, unsetStoragePolicy
Methods inherited from class org.apache.hadoop.fs.FileSystem
append, append, append, append, areSymlinksEnabled, cancelDeleteOnExit, clearStatistics, closeAll, closeAllForUGI, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, createBulkDelete, createDataInputStreamBuilder, createDataInputStreamBuilder, createDataOutputStreamBuilder, createMultipartUploader, createNewFile, createNonRecursive, createSnapshot, delete, deleteOnExit, enableSymlinks, exists, fixRelativePart, get, get, get, getAdditionalTokenIssuers, getAllStatistics, getBlockSize, getCanonicalServiceName, getContentSummary, getDefaultPort, getDefaultUri, getDelegationToken, getFileBlockLocations, getFileSystemClass, getFSofPath, getGlobalStorageStatistics, getLength, getLocal, getName, getNamed, getPathHandle, getQuotaUsage, getReplication, getScheme, getStatistics, getStatistics, getStatus, getStorageStatistics, globStatus, globStatus, isDirectory, isFile, listFiles, listStatus, listStatus, listStatus, listStatusBatch, mkdirs, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, open, primitiveMkdir, printStatistics, processDeleteOnExit, setDefaultUri, setDefaultUri, setQuota, setQuotaByStorageType
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.security.token.DelegationTokenIssuer
addDelegationTokens
-
Constructor Details
-
ChecksumFileSystem
-
-
Method Details
-
getApproxChkSumLength
public static double getApproxChkSumLength(long size)
-
setConf
Description copied from interface: Configurable
Set the configuration to be used by this object.
- Specified by:
setConf in interface Configurable
- Overrides:
setConf in class Configured
- Parameters:
conf - configuration to be used
-
setVerifyChecksum
public void setVerifyChecksum(boolean verifyChecksum)
Set whether to verify checksum.
- Overrides:
setVerifyChecksum in class FilterFileSystem
- Parameters:
verifyChecksum - Verify checksum flag
-
getVerifyChecksum
public boolean getVerifyChecksum()
Is checksum verification enabled?
- Returns:
- true if files are to be verified through checksums.
-
setWriteChecksum
public void setWriteChecksum(boolean writeChecksum)
Description copied from class: FileSystem
Set the write checksum flag. This is only applicable if the corresponding filesystem supports checksums. By default doesn't do anything.
- Overrides:
setWriteChecksum in class FilterFileSystem
- Parameters:
writeChecksum - Write checksum flag
-
getRawFileSystem
Get the raw file system.
- Overrides:
getRawFileSystem in class FilterFileSystem
- Returns:
- FileSystem being filtered
-
getChecksumFile
Return the name of the checksum file associated with a file.
- Parameters:
file - the file path.
- Returns:
- name of the checksum file associated with a file.
-
isChecksumFile
Return true if file is a checksum file name.
- Parameters:
file - the file path.
- Returns:
- true if file is a checksum file, false otherwise.
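The relationship between getChecksumFile and isChecksumFile can be illustrated with a small standalone sketch. It assumes a naming convention in which the checksum file is a hidden sibling of the data file, named with a leading "." and a trailing ".crc"; that convention, the class name ChecksumNameSketch, and the use of java.nio.file.Path in place of Hadoop's Path are all assumptions made for illustration, so check the implementation before relying on them.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of a checksum-file naming scheme; the ".name.crc" convention is an assumption. */
class ChecksumNameSketch {

    /** Returns the sibling path that would hold the checksums for the given data file. */
    static Path checksumFile(Path file) {
        String name = file.getFileName().toString();
        return file.resolveSibling("." + name + ".crc");
    }

    /** Returns true if the last path element looks like a checksum file name. */
    static boolean isChecksumFile(Path file) {
        String name = file.getFileName().toString();
        return name.startsWith(".") && name.endsWith(".crc");
    }

    public static void main(String[] args) {
        Path data = Paths.get("data", "part-0000");
        Path sums = checksumFile(data);
        System.out.println(sums + " is checksum file: " + isChecksumFile(sums));
        System.out.println(data + " is checksum file: " + isChecksumFile(data));
    }
}
```

A purely name-based test such as this one never touches storage, which is what allows it to be a static method.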
-
getChecksumFileLength
Return the length of the checksum file given the size of the actual file.
- Parameters:
file - the file path.
fileSize - file size.
- Returns:
- checksum length.
-
getBytesPerSum
public int getBytesPerSum()
Return the bytes per checksum.
- Returns:
- bytes per checksum.
-
open
Opens an FSDataInputStream at the indicated Path.
- Overrides:
open in class FilterFileSystem
- Parameters:
f - the file name to open
bufferSize - the size of the buffer to be used.
- Returns:
- input stream.
- Throws:
IOException - if an I/O error occurs.
-
append
Description copied from class: FileSystem
Append to an existing file (optional operation).
- Overrides:
append in class FilterFileSystem
- Parameters:
f - the existing file to be appended.
bufferSize - the size of the buffer to be used.
progress - for reporting progress if it is not null.
- Returns:
- output stream.
- Throws:
IOException - IO failure
-
truncate
Description copied from class: FileSystem
Truncate the file in the indicated path to the indicated size.
- Fails if path is a directory.
- Fails if path does not exist.
- Fails if path is not closed.
- Fails if new size is greater than current size.
- Overrides:
truncate in class FilterFileSystem
- Parameters:
f - The path to the file to be truncated
newLength - The size the file is to be truncated to
- Returns:
true if the file has been truncated to the desired newLength and is immediately available to be reused for write operations such as append, or false if a background process of adjusting the length of the last block has been started, and clients should wait for it to complete before proceeding with further file updates.
- Throws:
IOException - IO failure
-
concat
Description copied from class: FileSystem
Concat existing files together.
- Overrides:
concat in class FilterFileSystem
- Parameters:
f - the path to the target destination.
psrcs - the paths to the sources to use for the concatenation.
- Throws:
IOException - IO failure
-
getChecksumLength
public static long getChecksumLength(long size, int bytesPerSum)
Calculates the length of the checksum file in bytes.
- Parameters:
size - the length of the data file in bytes
bytesPerSum - the number of bytes in a checksum block
- Returns:
- the number of bytes in the checksum file
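As a worked example of how such a length could be derived, the sketch below assumes that the checksum file holds a fixed-size header followed by one fixed-size checksum per block of bytesPerSum data bytes. The constants (a 4-byte checksum, an 8-byte header) and the class name ChecksumLengthSketch are assumptions chosen for illustration and are not taken from this page.

```java
/** Hypothetical sketch of a checksum-file length calculation; constants are assumptions. */
class ChecksumLengthSketch {

    /** Assumed size in bytes of one stored checksum (for example, a CRC32 value). */
    static final int CHECKSUM_SIZE = 4;

    /** Assumed size in bytes of the checksum file header. */
    static final int HEADER_LENGTH = 8;

    /** Header plus one checksum for every started block of bytesPerSum data bytes. */
    static long checksumLength(long size, int bytesPerSum) {
        long blocks = (size + bytesPerSum - 1) / bytesPerSum; // ceiling division
        return blocks * CHECKSUM_SIZE + HEADER_LENGTH;
    }

    public static void main(String[] args) {
        long[] sizes = {0L, 1L, 512L, 513L, 1024L * 1024L};
        for (long size : sizes) {
            System.out.println(size + " data bytes -> " + checksumLength(size, 512) + " checksum bytes");
        }
    }
}
```

Under these assumptions a partially filled final block still costs a full checksum entry, which is why the block count is rounded up rather than truncated.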
-
create
public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
Description copied from class: FileSystem
Create an FSDataOutputStream at the indicated Path with write-progress reporting.
- Overrides:
create in class FilterFileSystem
- Parameters:
f - the file name to open
permission - file permission
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
blockSize - block size
progress - the progress reporter
- Returns:
- output stream.
- Throws:
IOException - IO failure
-
createNonRecursive
public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
Description copied from class: FileSystem
Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
- Overrides:
createNonRecursive in class FileSystem
- Parameters:
f - the file name to open
permission - file permission
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
blockSize - block size
progress - the progress reporter
- Returns:
- output stream.
- Throws:
IOException - IO failure
-
create
public FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress, org.apache.hadoop.fs.Options.ChecksumOpt checksumOpt) throws IOException
Description copied from class: FileSystem
Create an FSDataOutputStream at the indicated Path with a custom checksum option.
- Overrides:
create in class FilterFileSystem
- Parameters:
f - the file name to open
permission - file permission
flags - CreateFlags to use for this stream.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
blockSize - block size
progress - the progress reporter
checksumOpt - checksum parameter. If null, the values found in conf will be used.
- Returns:
- output stream.
- Throws:
IOException - IO failure
-
createNonRecursive
public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
Description copied from class: FileSystem
Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
- Overrides:
createNonRecursive in class FilterFileSystem
- Parameters:
f - the file name to open
permission - file permission
flags - CreateFlags to use for this stream.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
blockSize - block size
progress - the progress reporter
- Returns:
- output stream.
- Throws:
IOException - IO failure
-
setPermission
Description copied from class: FileSystem
Set permission of a path.
- Overrides:
setPermission in class FilterFileSystem
- Parameters:
src - The path
permission - permission
- Throws:
IOException - IO failure
-
setOwner
Description copied from class: FileSystem
Set owner of a path (i.e. a file or a directory). The parameters username and groupname cannot both be null.
- Overrides:
setOwner in class FilterFileSystem
- Parameters:
src - The path
username - If it is null, the original username remains unchanged.
groupname - If it is null, the original groupname remains unchanged.
- Throws:
IOException - IO failure
-
setAcl
Description copied from class: FileSystem
Fully replaces ACL of files and directories, discarding all existing entries.
- Overrides:
setAcl in class FilterFileSystem
- Parameters:
src - Path to modify
aclSpec - List describing modifications, which must include entries for user, group, and others for compatibility with permission bits.
- Throws:
IOException - if an ACL could not be modified
-
modifyAclEntries
Description copied from class: FileSystem
Modifies ACL entries of files and directories. This method can add new ACL entries or modify the permissions on existing ACL entries. All existing ACL entries that are not specified in this call are retained without changes. (Modifications are merged into the current ACL.)
- Overrides:
modifyAclEntries in class FilterFileSystem
- Parameters:
src - Path to modify
aclSpec - List<AclEntry> describing modifications
- Throws:
IOException - if an ACL could not be modified
-
removeAcl
Description copied from class: FileSystem
Removes all but the base ACL entries of files and directories. The entries for user, group, and others are retained for compatibility with permission bits.
- Overrides:
removeAcl in class FilterFileSystem
- Parameters:
src - Path to modify
- Throws:
IOException - if an ACL could not be removed
-
removeAclEntries
Description copied from class: FileSystem
Removes ACL entries from files and directories. Other ACL entries are retained.
- Overrides:
removeAclEntries in class FilterFileSystem
- Parameters:
src - Path to modify
aclSpec - List describing entries to remove
- Throws:
IOException - if an ACL could not be modified
-
removeDefaultAcl
Description copied from class: FileSystem
Removes all default ACL entries from files and directories.
- Overrides:
removeDefaultAcl in class FilterFileSystem
- Parameters:
src - Path to modify
- Throws:
IOException - if an ACL could not be modified
-
setReplication
Set replication for an existing file. Implement the abstract setReplication of FileSystem.
- Overrides:
setReplication in class FilterFileSystem
- Parameters:
src - file name
replication - new replication
- Returns:
- true if successful; false if file does not exist or is a directory
- Throws:
IOException - if an I/O error occurs.
-
rename
Rename files/dirs.
- Overrides:
rename in class FilterFileSystem
- Parameters:
src - path to be renamed
dst - new path after rename
- Returns:
- true if rename is successful
- Throws:
IOException - on failure
-
delete
Implement the delete(Path, boolean) in checksum file system.
- Overrides:
delete in class FilterFileSystem
- Parameters:
f - the path to delete.
recursive - if path is a directory and set to true, the directory is deleted, else throws an exception. In case of a file, recursive can be set to either true or false.
- Returns:
- true if delete is successful, else false.
- Throws:
IOException - IO failure
-
listStatus
List the statuses of the files/directories in the given path if the path is a directory.
- Overrides:
listStatus in class FilterFileSystem
- Parameters:
f - given path
- Returns:
- the statuses of the files/directories in the given path
- Throws:
IOException - if an I/O error occurs.
-
listStatusIterator
public org.apache.hadoop.fs.RemoteIterator<FileStatus> listStatusIterator(Path p) throws IOException
Description copied from class: FilterFileSystem
Return a remote iterator for listing in a directory.
- Overrides:
listStatusIterator in class FilterFileSystem
- Parameters:
p - target path
- Returns:
- remote iterator
- Throws:
FileNotFoundException - if p does not exist
IOException - if any I/O error occurred
-
listLocatedStatus
public org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f) throws IOException
List the statuses of the files/directories in the given path if the path is a directory.
- Overrides:
listLocatedStatus in class FilterFileSystem
- Parameters:
f - given path
- Returns:
- the statuses of the files/directories in the given path
- Throws:
IOException - if an I/O error occurs.
-
mkdirs
Description copied from class: FileSystem
Call FileSystem.mkdirs(Path, FsPermission) with default permission.
- Overrides:
mkdirs in class FilterFileSystem
- Parameters:
f - path
- Returns:
- true if the directory was created
- Throws:
IOException - IO failure
-
copyFromLocalFile
Description copied from class: FilterFileSystem
The src file is on the local disk. Add it to FS at the given dst name. delSrc indicates if the source should be removed.
- Overrides:
copyFromLocalFile in class FilterFileSystem
- Parameters:
delSrc - whether to delete the src
src - path
dst - path
- Throws:
IOException - IO failure.
-
copyToLocalFile
The src file is under FS, and the dst is on the local disk. Copy it from FS control to the local dst name.
- Overrides:
copyToLocalFile in class FilterFileSystem
- Parameters:
delSrc - whether to delete the src
src - path src file in the remote filesystem
dst - path local destination
- Throws:
IOException - IO failure
-
copyToLocalFile
The src file is under FS, and the dst is on the local disk. Copy it from FS control to the local dst name. If src and dst are directories, the copyCrc parameter determines whether to copy CRC files.
- Parameters:
src - src path.
dst - dst path.
copyCrc - copy CRC flag.
- Throws:
IOException - if an I/O error occurs.
-
startLocalOutput
Description copied from class: FilterFileSystem
Returns a local File that the user can write output to. The caller provides both the eventual FS target name and the local working file. If the FS is local, we write directly into the target. If the FS is remote, we write into the tmp local area.
- Overrides:
startLocalOutput in class FilterFileSystem
- Parameters:
fsOutputFile - path of output file
tmpLocalFile - path of local tmp file
- Returns:
- the path.
- Throws:
IOException - IO failure
-
completeLocalOutput
Description copied from class: FilterFileSystem
Called when we're all done writing to the target. A local FS will do nothing, because we've written to exactly the right place. A remote FS will copy the contents of tmpLocalFile to the correct target at fsOutputFile.
- Overrides:
completeLocalOutput in class FilterFileSystem
- Parameters:
fsOutputFile - path of output file
tmpLocalFile - path to local tmp file
- Throws:
IOException - IO failure
-
reportChecksumFailure
public boolean reportChecksumFailure(Path f, FSDataInputStream in, long inPos, FSDataInputStream sums, long sumsPos)
Report a checksum error to the file system.
- Parameters:
f - the file name containing the error
in - the stream open on the file
inPos - the position of the beginning of the bad data in the file
sums - the stream open on the checksum file
sumsPos - the position of the beginning of the bad data in the checksum file
- Returns:
- true if a retry is necessary
-
openFile
public FutureDataInputStreamBuilder openFile(Path path) throws IOException, UnsupportedOperationException
This is overridden to ensure that this class's openFileWithOptions(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.impl.OpenFileParameters) method is called, and so ultimately its open(Path, int).
Open a file for reading through a builder API. Ultimately calls FileSystem.open(Path, int) unless a subclass executes the open command differently. The semantics of this call are therefore the same as that of FileSystem.open(Path, int) with one special point: it is in FSDataInputStreamBuilder.build() that the open operation takes place; it is there that all preconditions to the operation are checked.
- Overrides:
openFile in class FilterFileSystem
- Parameters:
path - file path
- Returns:
- a FSDataInputStreamBuilder object to build the input stream
- Throws:
IOException - if some early checks cause IO failures.
UnsupportedOperationException - if support is checked early.
-
openFileWithOptions
protected CompletableFuture<FSDataInputStream> openFileWithOptions(Path path, org.apache.hadoop.fs.impl.OpenFileParameters parameters) throws IOException
Open the file as a blocking call to open(Path, int).
Execute the actual open file operation. This is invoked from FSDataInputStreamBuilder.build() and from DelegateToFileSystem and is where the action of opening the file should begin. The base implementation performs a blocking call to FileSystem.open(Path, int) in this call; the actual outcome is in the returned CompletableFuture. This avoids having to create some thread pool, while still setting up the expectation that the get() call is needed to evaluate the result.
- Overrides:
openFileWithOptions in class FilterFileSystem
- Parameters:
path - path to the file
parameters - open file parameters from the builder.
- Returns:
- a future which will evaluate to the opened file.
- Throws:
IOException - failure to resolve the link.
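The pattern described above, a blocking call whose outcome is handed back in an already-completed CompletableFuture, can be sketched with the JDK alone. The class EagerFutureSketch, its IOCallable interface, and its eval method are hypothetical names introduced for this illustration; they are not presented here as part of the Hadoop API.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: run a blocking operation now and return its outcome as a future. */
class EagerFutureSketch {

    /** A supplier that is allowed to throw IOException, like a blocking open call. */
    @FunctionalInterface
    interface IOCallable<T> {
        T call() throws IOException;
    }

    /** Invokes op on the calling thread; success or failure is stored in the returned future. */
    static <T> CompletableFuture<T> eval(IOCallable<T> op) {
        CompletableFuture<T> result = new CompletableFuture<>();
        try {
            result.complete(op.call());
        } catch (IOException | RuntimeException e) {
            result.completeExceptionally(e);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The stand-in lambda plays the role of a blocking open(path, bufferSize) call.
        CompletableFuture<String> future = eval(() -> "opened");
        System.out.println(future.isDone() + " " + future.get());
    }
}
```

Because the work happens before the method returns, no thread pool is needed, yet callers still go through get() and so are written as if the open could be asynchronous.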
-
createFile
This is overridden to ensure that this class's create() method is ultimately called.
Create a new FSDataOutputStreamBuilder for the file with path. Files are overwritten by default.
- Overrides:
createFile in class FilterFileSystem
- Parameters:
path - file path
- Returns:
- a FSDataOutputStreamBuilder object to build the file. HADOOP-14384: Temporarily reduce the visibility of method before the builder interface becomes stable.
-
appendFile
This is overridden to ensure that this class's create() method is ultimately called.
Create a Builder to append a file.
- Overrides:
appendFile in class FilterFileSystem
- Parameters:
path - file path.
- Returns:
- a FSDataOutputStreamBuilder to build file append request.
-
hasPathCapability
Disable those operations which the checksummed FS blocks. The base FileSystem implementation generally has no knowledge of the capabilities of actual implementations. Unless it has a way to explicitly determine the capabilities, this method returns false.
Probe for a specific capability under the given path. If the function returns true, this instance is explicitly declaring that the capability is available. If the function returns false, it can mean one of:
- The capability is not known.
- The capability is known but it is not supported.
- The capability is known but the filesystem does not know if it is supported under the supplied path.
Implementors: PathCapabilitiesSupport can be used to help implement this method.
- Specified by:
hasPathCapability in interface org.apache.hadoop.fs.PathCapabilities
- Overrides:
hasPathCapability in class FilterFileSystem
- Parameters:
path - path to query the capability of.
capability - non-null, non-empty string to query the path for support.
- Returns:
- true if the capability is supported under that part of the FS.
- Throws:
IOException - this should not be raised, except on problems resolving paths or relaying the call.
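One way to picture "disabling" capabilities in a wrapping file system is a probe that answers false for a fixed set of blocked capability names and otherwise relays the question to the wrapped instance. The sketch below is standalone and hypothetical: the Capabilities interface, the CapabilityProbeSketch class, and the two capability strings in the blocked set are assumptions made for illustration, not values confirmed by this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a capability probe that blocks some operations and relays the rest. */
class CapabilityProbeSketch {

    /** Minimal stand-in for a path-capability query interface. */
    interface Capabilities {
        boolean hasPathCapability(String path, String capability);
    }

    /** Assumed names of the capabilities the wrapper refuses to declare. */
    static final Set<String> BLOCKED = new HashSet<>(Arrays.asList(
            "fs.capability.paths.append",
            "fs.capability.paths.concat"));

    /** Wraps inner so that blocked capabilities always report false. */
    static Capabilities checksummed(Capabilities inner) {
        return (path, capability) -> {
            if (capability == null || capability.isEmpty()) {
                throw new IllegalArgumentException("capability must be non-null and non-empty");
            }
            if (BLOCKED.contains(capability.toLowerCase(Locale.ENGLISH))) {
                return false; // explicitly not declared, whatever the inner FS says
            }
            return inner.hasPathCapability(path, capability);
        };
    }

    public static void main(String[] args) {
        Capabilities inner = (path, capability) -> true; // an inner FS that claims everything
        Capabilities wrapped = checksummed(inner);
        System.out.println(wrapped.hasPathCapability("/data", "fs.capability.paths.append"));
        System.out.println(wrapped.hasPathCapability("/data", "some.other.capability"));
    }
}
```

Answering false for a blocked name is consistent with the contract above, where false never asserts more than "not declared as available".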
-