Package org.apache.hadoop.fs.adl
Class AdlFileSystem
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.adl.AdlFileSystem
- All Implemented Interfaces:
Closeable, AutoCloseable, Configurable, BulkDeleteSource, org.apache.hadoop.fs.PathCapabilities, org.apache.hadoop.security.token.DelegationTokenIssuer
A FileSystem to access Azure Data Lake Store.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.FileSystem.DirectoryEntries, org.apache.hadoop.fs.FileSystem.DirListingIterator<T extends FileStatus>, org.apache.hadoop.fs.FileSystem.Statistics -
Field Summary
Fields
Fields inherited from class org.apache.hadoop.fs.FileSystem
DEFAULT_FS, FS_DEFAULT_NAME_KEY, SHUTDOWN_HOOK_PRIORITY, statistics, TRASH_PREFIX, USER_HOME_PREFIX
Fields inherited from interface org.apache.hadoop.security.token.DelegationTokenIssuer
TOKEN_LOG -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
void | access(Path path, FsAction mode) | Checks if the user can access a path.
FSDataOutputStream | append(Path f, int bufferSize, Progressable progress) | Append to an existing file (optional operation).
void | concat(Path trg, Path[] srcs) | Concat existing files together.
FSDataOutputStream | create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) | Create call semantics are handled differently in ADL.
FSDataOutputStream | createNonRecursive(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress) | Deprecated. API only for 0.20-append
boolean | delete(Path path, boolean recursive) | Delete a file.
static String | getAccountNameFromFQDN(String accountFQDN) | Gets ADL account name from ADL FQDN.
AclStatus | getAclStatus(Path path) | Gets the ACL of a file or directory.
com.microsoft.azure.datalake.store.ADLStoreClient | getAdlClient() |
long | getBlockSize(Path f) | Deprecated. Use getFileStatus() instead
ContentSummary | getContentSummary(Path f) | Return the ContentSummary of a given Path.
protected AzureADTokenProvider | getCustomAccessTokenProvider(Configuration conf) | This method is provided as a convenience for derived classes to define a custom AzureADTokenProvider instance.
long | getDefaultBlockSize() | Deprecated. Use getDefaultBlockSize(Path) instead
long | getDefaultBlockSize(Path f) | Return the number of bytes that large input files should optimally be split into to minimize I/O time.
int | getDefaultPort() | Get the default port for this FileSystem.
FileStatus | getFileStatus(Path f) | Return a file status object that represents the path.
Path | getHomeDirectory() | Constructing the home directory locally is acceptable as long as the relationship between the Hadoop local user name and the ADL user name is not yet fully defined.
short | getReplication(Path src) | Deprecated. Use getFileStatus() instead
String | getScheme() | Return the protocol scheme for this FileSystem.
protected String | getTransportScheme() |
URI | getUri() | Returns a URI which identifies this FileSystem.
Path | getWorkingDirectory() | Get the current working directory for the given file system.
boolean | hasPathCapability(Path path, String capability) | The base FileSystem implementation generally has no knowledge of the capabilities of actual implementations.
void | initialize(URI storeUri, Configuration originalConf) | Called after a new FileSystem instance is constructed.
FileStatus[] | listStatus(Path f) | List the statuses of the files/directories in the given path if the path is a directory.
boolean | mkdirs(Path path, FsPermission permission) | Make the given file and all non-existent parents into directories.
void | modifyAclEntries(Path path, List<AclEntry> aclSpec) | Modifies ACL entries of files and directories.
FSDataInputStream | open(Path f, int buffersize) | Open call semantics are handled differently in ADL.
static Configuration | propagateAccountOptions(Configuration source, String accountName) | Propagates account-specific settings into generic ADL configuration keys.
void | removeAcl(Path path) | Removes all but the base ACL entries of files and directories.
void | removeAclEntries(Path path, List<AclEntry> aclSpec) | Removes ACL entries from files and directories.
void | removeDefaultAcl(Path path) | Removes all default ACL entries from files and directories.
boolean | rename(Path src, Path dst) | Renames Path src to Path dst.
void | rename(Path src, Path dst, Options.Rename... options) | Deprecated.
void | setAcl(Path path, List<AclEntry> aclSpec) | Fully replaces ACL of files and directories, discarding all existing entries.
void | setOwner(Path path, String owner, String group) | Set owner of a path (i.e. a file or a directory).
void | setPermission(Path path, FsPermission permission) | Set permission of a path.
boolean | setReplication(Path p, short replication) | Azure Data Lake does not support user configuration of data replication, so this call is not forwarded to Azure Data Lake.
void | setUserGroupRepresentationAsUPN(boolean enableUPN) |
void | setWorkingDirectory(Path dir) | Set the current working directory for the given file system.
boolean | supportsSymlinks() |
Methods inherited from class org.apache.hadoop.fs.FileSystem
append, append, append, append, appendFile, areSymlinksEnabled, cancelDeleteOnExit, canonicalizeUri, checkPath, clearStatistics, close, closeAll, closeAllForUGI, completeLocalOutput, copyFromLocalFile, copyFromLocalFile, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createBulkDelete, createDataInputStreamBuilder, createDataInputStreamBuilder, createDataOutputStreamBuilder, createFile, createMultipartUploader, createNewFile, createNonRecursive, createNonRecursive, createPathHandle, createSnapshot, createSnapshot, createSymlink, delete, deleteOnExit, deleteSnapshot, enableSymlinks, exists, fixRelativePart, get, get, get, getAdditionalTokenIssuers, getAllStatistics, getAllStoragePolicies, getCanonicalServiceName, getCanonicalUri, getChildFileSystems, getDefaultReplication, getDefaultReplication, getDefaultUri, getDelegationToken, getEnclosingRoot, getFileBlockLocations, getFileBlockLocations, getFileChecksum, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getGlobalStorageStatistics, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getPathHandle, getQuotaUsage, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, getStoragePolicy, getStorageStatistics, getTrashRoot, getTrashRoots, getUsed, getUsed, getXAttr, getXAttrs, getXAttrs, globStatus, globStatus, isDirectory, isFile, listCorruptFileBlocks, listFiles, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, listStatusBatch, listStatusIterator, listXAttrs, makeQualified, mkdirs, mkdirs, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, msync, newInstance, newInstance, newInstance, newInstanceLocal, open, open, open, openFile, openFile, openFileWithOptions, openFileWithOptions, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, processDeleteOnExit, 
removeXAttr, renameSnapshot, resolveLink, resolvePath, satisfyStoragePolicy, setDefaultUri, setDefaultUri, setQuota, setQuotaByStorageType, setStoragePolicy, setTimes, setVerifyChecksum, setWriteChecksum, setXAttr, setXAttr, startLocalOutput, truncate, unsetStoragePolicy
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.security.token.DelegationTokenIssuer
addDelegationTokens
-
Field Details
-
SCHEME
-
Constructor Details
-
AdlFileSystem
public AdlFileSystem()
-
-
Method Details
-
getScheme
Description copied from class: FileSystem
Return the protocol scheme for this FileSystem. This implementation throws an UnsupportedOperationException.
- Overrides:
getScheme in class FileSystem
- Returns:
- the protocol scheme for this FileSystem.
-
getUri
Description copied from class: FileSystem
Returns a URI which identifies this FileSystem.
- Specified by:
getUri in class FileSystem
- Returns:
- the URI of this filesystem.
-
getDefaultPort
public int getDefaultPort()
Description copied from class: FileSystem
Get the default port for this FileSystem.
- Overrides:
getDefaultPort in class FileSystem
- Returns:
- the default port or 0 if there isn't one
-
supportsSymlinks
public boolean supportsSymlinks()
Description copied from class: FileSystem
- Overrides:
supportsSymlinks in class FileSystem
- Returns:
- true if symlinks are supported, false otherwise.
-
initialize
Called after a new FileSystem instance is constructed.
- Overrides:
initialize in class FileSystem
- Parameters:
storeUri - a URI whose authority section names the host, port, etc. for this FileSystem
originalConf - the configuration to use for the FS. The account-specific options are patched over the base ones before any use is made of the config.
- Throws:
IOException - on any failure to initialize this instance.
-
getCustomAccessTokenProvider
This method is provided as a convenience for derived classes to define a custom AzureADTokenProvider instance. To keep the Hadoop infrastructure secure and to preserve the user context for which the respective AdlFileSystem instance is initialized, simply loading an AzureADTokenProvider is not sufficient. The loading order is to first invoke getCustomAccessTokenProvider(Configuration). If the method returns null, meaning that no implementation is provided by a derived class, the configuration object is then loaded to retrieve the token configuration as specified in the documentation. Custom token management takes precedence during initialization.
- Parameters:
conf - Configuration object
- Returns:
- null if no custom AzureADTokenProvider token management is specified.
- Throws:
IOException - if the token provider fails to initialize.
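The loading order described above can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop or ADL SDK types: `TokenSource`, `TokenProviderSelection`, and the `token.provider.type` key are hypothetical stand-ins chosen for illustration, and the real initialization logic may differ in detail.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a token provider; not the ADL SDK type. */
interface TokenSource {
    String describe();
}

final class TokenProviderSelection {
    private TokenProviderSelection() {
    }

    /**
     * Asks the custom hook first. Only when the hook returns null does the
     * sketch fall back to a source described by the configuration map.
     */
    static TokenSource select(Map<String, String> conf,
                              Function<Map<String, String>, TokenSource> customHook) {
        TokenSource custom = customHook.apply(conf);
        if (custom != null) {
            return custom; // custom token management takes precedence
        }
        // "token.provider.type" is an illustrative key, not a real ADL setting.
        String type = conf.getOrDefault("token.provider.type", "unset");
        return () -> "configured:" + type;
    }
}
```

A subclass that overrides the hook to return a non-null value would bypass the configuration lookup entirely, which is the precedence the description states.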
-
getAdlClient
@VisibleForTesting
public com.microsoft.azure.datalake.store.ADLStoreClient getAdlClient()
-
getHomeDirectory
Constructing the home directory locally is acceptable as long as the relationship between the Hadoop local user name and the ADL user name is not yet fully defined.
- Overrides:
getHomeDirectory in class FileSystem
- Returns:
- Hadoop local user home directory.
-
create
public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
Create call semantics are handled differently in ADL: create semantics are translated to Create/Append semantics.
1. There is no dedicated connection to the server.
2. Buffering is done locally. Once the buffer is full, or flush is invoked by the caller, all pending data is pushed to ADL as an APPEND operation.
3. On close, an additional call is sent to the server to close the stream and release the lock on the stream.
The Create/Append semantics are needed because:
1. The ADL backend server does not allow connections to stay idle for long. With slow writers, connection timeouts and connection resets were observed, causing occasional job failures.
2. Jobs that write slowly get a performance boost, because network latency is avoided.
3. ADL performs equally well or better when data arrives as append calls in chunks that are a multiple of 4 MB.
- Specified by:
create in class FileSystem
- Parameters:
f - File path
permission - Access permission for the newly created file
overwrite - If true, remove the existing file and recreate a new one; otherwise throw an error if the file exists
bufferSize - Buffer size, not honoured by the ADL backend
replication - Replication count, not honoured by the ADL backend
blockSize - Block size, not honoured by the ADL backend
progress - Progress indicator
- Returns:
- FSDataOutputStream OutputStream to which the application can write a stream of bytes
- Throws:
IOException - on system error, internal server error, or user error
-
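The buffering behaviour in the numbered steps above can be sketched without any ADL dependency. The class below is a hypothetical model, not the stream that AdlFileSystem returns: it records each batch that would be sent as an APPEND call in a list instead of contacting a server, and the buffer size is a constructor argument chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BufferedAppendSketch {
    private final byte[] buffer;
    private int count;
    private boolean closed;

    /** Each element stands in for one remote APPEND call. */
    final List<byte[]> appendCalls = new ArrayList<>();

    BufferedAppendSketch(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.buffer = new byte[bufferSize];
    }

    /** Buffers locally and pushes a batch each time the buffer fills up. */
    void write(byte[] data) {
        if (closed) {
            throw new IllegalStateException("stream is closed");
        }
        for (byte b : data) {
            buffer[count++] = b;
            if (count == buffer.length) {
                pushPending();
            }
        }
    }

    /** An explicit flush pushes whatever is pending, even a partial buffer. */
    void flush() {
        pushPending();
    }

    /** Close pushes the remainder; a real client would also release the lock. */
    void close() {
        if (closed) {
            return;
        }
        pushPending();
        closed = true;
    }

    private void pushPending() {
        if (count == 0) {
            return;
        }
        appendCalls.add(Arrays.copyOf(buffer, count));
        count = 0;
    }
}
```

Writing 10 bytes through a 4-byte buffer and then closing yields three recorded batches of 4, 4, and 2 bytes, so no connection needs to stay open between writes.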
createNonRecursive
public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
Deprecated. API only for 0.20-append
Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except that it fails if the parent directory doesn't already exist.
- Overrides:
createNonRecursive in class FileSystem
- Parameters:
f - the file name to open
permission - Access permission for the newly created file
flags - CreateFlags to use for this stream.
bufferSize - the size of the buffer to be used, not honoured by the ADL backend
replication - required block replication for the file, not honoured by the ADL backend
blockSize - Block size, not honoured by the ADL backend
progress - Progress indicator
- Returns:
- output stream.
- Throws:
IOException - on system error, internal server error, or user error
-
append
Append to an existing file (optional operation).
- Specified by:
append in class FileSystem
- Parameters:
f - the existing file to be appended.
bufferSize - the size of the buffer to be used, not honoured by the ADL backend
progress - Progress indicator
- Returns:
- output stream.
- Throws:
IOException - on system error, internal server error, or user error
-
setReplication
Azure Data Lake does not support user configuration of data replication, so this call is not forwarded to Azure Data Lake. Stub implementation.
- Overrides:
setReplication in class FileSystem
- Parameters:
p - Not honoured
replication - Not honoured
- Returns:
- Always true (hard-coded), since the ADL file system does not support replication configuration
- Throws:
IOException - No exception is thrown in this case; the declaration only aligns with the parent API definition.
-
open
Open call semantics are handled differently in ADL. Instead of a network stream, an overridden FsInputStream is returned to the user.
- Specified by:
open in class FileSystem
- Parameters:
f - File path
buffersize - Buffer size, not honoured
- Returns:
- FSDataInputStream InputStream from which the application can read a stream of bytes
- Throws:
IOException - on system error, internal server error, or user error
-
getFileStatus
Return a file status object that represents the path.
- Specified by:
getFileStatus in class FileSystem
- Parameters:
f - The path we want information from
- Returns:
- a FileStatus object
- Throws:
IOException - when the path does not exist or on any other error; see specific implementation
-
listStatus
List the statuses of the files/directories in the given path if the path is a directory.
- Specified by:
listStatus in class FileSystem
- Parameters:
f - given path
- Returns:
- the statuses of the files/directories in the given path
- Throws:
IOException - when the path does not exist or on any other error; see specific implementation
-
rename
Renames Path src to Path dst. Can take place on local fs or remote DFS. ADLS supports the POSIX standard for the rename operation.
- Specified by:
rename in class FileSystem
- Parameters:
src - path to be renamed
dst - new path after rename
- Returns:
- true if rename is successful
- Throws:
IOException - on failure
-
rename
Deprecated.
Description copied from class: FileSystem
Renames Path src to Path dst
- Fails if src is a file and dst is a directory.
- Fails if src is a directory and dst is a file.
- Fails if the parent of dst does not exist or is a file.
If OVERWRITE option is not passed as an argument, rename fails if the dst already exists.
If OVERWRITE option is passed as an argument, rename overwrites the dst if it is a file or an empty directory. Rename fails if dst is a non-empty directory.
Note that atomicity of rename is dependent on the file system implementation. Please refer to the file system documentation for details. This default implementation is non atomic.
This method is deprecated since it is a temporary method added to support the transition from FileSystem to FileContext for user applications.
- Overrides:
rename in class FileSystem
- Parameters:
src - path to be renamed
dst - new path after rename
options - rename options.
- Throws:
FileNotFoundException - src path does not exist, or the parent path of dst does not exist.
FileAlreadyExistsException - dest path exists and is a file
ParentNotDirectoryException - if the parent path of dest is not a directory
IOException - on failure
-
concat
Concat existing files together.
- Overrides:
concat in class FileSystem
- Parameters:
trg - the path to the target destination.
srcs - the paths to the sources to use for the concatenation.
- Throws:
IOException - on system error, internal server error, or user error
-
delete
Delete a file.
- Specified by:
delete in class FileSystem
- Parameters:
path - the path to delete.
recursive - if path is a directory and recursive is set to true, the directory is deleted; otherwise an exception is thrown. For a file, recursive can be set to either true or false.
- Returns:
- true if delete is successful, else false.
- Throws:
IOException - on system error, internal server error, or user error
-
mkdirs
Make the given file and all non-existent parents into directories. Has the semantics of Unix 'mkdir -p'. Existence of the directory hierarchy is not an error.
- Specified by:
mkdirs in class FileSystem
- Parameters:
path - path to create
permission - to apply to path
- Returns:
- true if mkdirs succeeds, false otherwise.
- Throws:
IOException - IO failure
-
setOwner
Set owner of a path (i.e. a file or a directory). The parameters owner and group cannot both be null.
- Overrides:
setOwner in class FileSystem
- Parameters:
path - The path
owner - If it is null, the original username remains unchanged.
group - If it is null, the original groupname remains unchanged.
- Throws:
IOException - IO failure
-
setPermission
Set permission of a path.
- Overrides:
setPermission in class FileSystem
- Parameters:
path - The path
permission - Access permission
- Throws:
IOException - IO failure
-
modifyAclEntries
Modifies ACL entries of files and directories. This method can add new ACL entries or modify the permissions on existing ACL entries. All existing ACL entries that are not specified in this call are retained without changes. (Modifications are merged into the current ACL.)
- Overrides:
modifyAclEntries in class FileSystem
- Parameters:
path - Path to modify
aclSpec - List of AclEntry describing modifications
- Throws:
IOException - if an ACL could not be modified
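The merge behaviour described for modifyAclEntries can be pictured with a small stand-alone model. This is only a sketch under simplifying assumptions: ACL entries are represented as a map from a hypothetical entry key (such as `user:alice`) to a permission string, rather than as Hadoop AclEntry objects, and no server-side validation is modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AclMergeSketch {
    private AclMergeSketch() {
    }

    /**
     * Returns a new ACL in which entries named in spec are added or have
     * their permissions replaced, while all other current entries are kept.
     */
    static Map<String, String> modify(Map<String, String> current,
                                      Map<String, String> spec) {
        Map<String, String> merged = new LinkedHashMap<>(current);
        merged.putAll(spec); // matching keys are overwritten, new keys are appended
        return merged;
    }
}
```

By contrast, setAcl would correspond to discarding `current` and keeping only `spec`, which is why its specification must include the base user, group, and other entries.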
-
removeAclEntries
Removes ACL entries from files and directories. Other ACL entries are retained.
- Overrides:
removeAclEntries in class FileSystem
- Parameters:
path - Path to modify
aclSpec - List of AclEntry describing entries to remove
- Throws:
IOException - if an ACL could not be modified
-
removeDefaultAcl
Removes all default ACL entries from files and directories.
- Overrides:
removeDefaultAcl in class FileSystem
- Parameters:
path - Path to modify
- Throws:
IOException - if an ACL could not be modified
-
removeAcl
Removes all but the base ACL entries of files and directories. The entries for user, group, and others are retained for compatibility with permission bits.
- Overrides:
removeAcl in class FileSystem
- Parameters:
path - Path to modify
- Throws:
IOException - if an ACL could not be removed
-
setAcl
Fully replaces ACL of files and directories, discarding all existing entries.
- Overrides:
setAcl in class FileSystem
- Parameters:
path - Path to modify
aclSpec - List of AclEntry describing modifications, must include entries for user, group, and others for compatibility with permission bits.
- Throws:
IOException - if an ACL could not be modified
-
getAclStatus
Gets the ACL of a file or directory.
- Overrides:
getAclStatus in class FileSystem
- Parameters:
path - Path to get
- Returns:
- AclStatus describing the ACL of the file or directory
- Throws:
IOException - if an ACL could not be read
-
access
Checks if the user can access a path. The mode specifies which access checks to perform. If the requested permissions are granted, then the method returns normally. If access is denied, then the method throws an AccessControlException.
- Parameters:
path - Path to check
mode - type of access to check
- Throws:
AccessControlException - if access is denied
FileNotFoundException - if the path does not exist
IOException - see specific implementation
-
getContentSummary
Return the ContentSummary of a given Path.
- Overrides:
getContentSummary in class FileSystem
- Parameters:
f - path to use
- Returns:
- content summary.
- Throws:
FileNotFoundException - if the path does not resolve
IOException - IO failure
-
getTransportScheme
-
getWorkingDirectory
Get the current working directory for the given file system.
- Specified by:
getWorkingDirectory in class FileSystem
- Returns:
- the directory pathname
-
setWorkingDirectory
Set the current working directory for the given file system. All relative paths will be resolved relative to it.
- Specified by:
setWorkingDirectory in class FileSystem
- Parameters:
dir - Working directory path.
-
getDefaultBlockSize
Deprecated. Use getDefaultBlockSize(Path) instead
Return the number of bytes that large input files should optimally be split into to minimize I/O time.
- Overrides:
getDefaultBlockSize in class FileSystem
- Returns:
- default block size.
-
getDefaultBlockSize
Return the number of bytes that large input files should optimally be split into to minimize I/O time. The given path will be used to locate the actual filesystem. The full path does not have to exist.
- Overrides:
getDefaultBlockSize in class FileSystem
- Parameters:
f - path of file
- Returns:
- the default block size for the path's filesystem
-
getBlockSize
Deprecated. Use getFileStatus() instead
Description copied from class: FileSystem
Get the block size for a particular file.
- Overrides:
getBlockSize in class FileSystem
- Parameters:
f - the filename
- Returns:
- the number of bytes in a block
- Throws:
FileNotFoundException - if the path is not present
IOException - IO failure
-
getReplication
Deprecated. Use getFileStatus() instead
Get replication.
- Overrides:
getReplication in class FileSystem
- Parameters:
src - file name
- Returns:
- file replication
-
setUserGroupRepresentationAsUPN
@VisibleForTesting
public void setUserGroupRepresentationAsUPN(boolean enableUPN)
-
getAccountNameFromFQDN
Gets ADL account name from ADL FQDN.
- Parameters:
accountFQDN - ADL account FQDN
- Returns:
- ADL account name
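As a rough illustration of what extracting an account name from an FQDN can look like, the sketch below assumes the account name is the first DNS label, that is, everything before the first dot. That rule, the class name, and the sample host names in the usage note are assumptions made for this example; the page does not spell out the exact parsing rule used by getAccountNameFromFQDN.

```java
final class AccountNameSketch {
    private AccountNameSketch() {
    }

    /**
     * Returns the text before the first '.', or the whole input when it
     * contains no dot. Assumes the first DNS label is the account name.
     */
    static String accountNameFromFqdn(String accountFqdn) {
        int dot = accountFqdn.indexOf('.');
        return dot < 0 ? accountFqdn : accountFqdn.substring(0, dot);
    }
}
```

Under this assumption, an input such as `myaccount.azuredatalakestore.net` maps to `myaccount`, and a bare `myaccount` is returned unchanged.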
-
propagateAccountOptions
Propagates account-specific settings into generic ADL configuration keys. This is done by propagating the values of the form fs.adl.account.${account_name}.key to fs.adl.key, for all values of "key". The source of the updated property is set to the key name of the account property, to aid in diagnostics of where things came from. Returns a new configuration. Why the clone? You can use the same conf for different filesystems, and the original values are not updated.
- Parameters:
source - Source Configuration object
accountName - account name. Must not be empty
- Returns:
- a (potentially) patched clone of the original
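The key rewriting described for propagateAccountOptions can be sketched with plain maps. This is a simplified model rather than the Hadoop implementation: it uses `Map<String, String>` in place of `Configuration`, does not track the property source used for diagnostics, and the class name and sample keys in the usage note are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

final class AccountOptionsSketch {
    private AccountOptionsSketch() {
    }

    /**
     * Copies the source map, then for every key of the form
     * fs.adl.account.<accountName>.<key> writes its value to fs.adl.<key>
     * in the copy. The source map is left untouched.
     */
    static Map<String, String> propagate(Map<String, String> source, String accountName) {
        if (accountName == null || accountName.isEmpty()) {
            throw new IllegalArgumentException("accountName must not be empty");
        }
        String accountPrefix = "fs.adl.account." + accountName + ".";
        Map<String, String> patched = new HashMap<>(source); // the "clone"
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String name = entry.getKey();
            if (name.startsWith(accountPrefix) && name.length() > accountPrefix.length()) {
                String genericKey = "fs.adl." + name.substring(accountPrefix.length());
                patched.put(genericKey, entry.getValue());
            }
        }
        return patched;
    }
}
```

For example, with `fs.adl.account.acct1.oauth2.client.id` set in the source, calling `propagate(source, "acct1")` returns a copy in which `fs.adl.oauth2.client.id` carries that value, while settings for other accounts are ignored and the original map can be reused for a different file system.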
-
hasPathCapability
Description copied from class: FileSystem
The base FileSystem implementation generally has no knowledge of the capabilities of actual implementations. Unless it has a way to explicitly determine the capabilities, this method returns false.
Probe for a specific capability under the given path. If the function returns true, this instance is explicitly declaring that the capability is available. If the function returns false, it can mean one of:
- The capability is not known.
- The capability is known but it is not supported.
- The capability is known but the filesystem does not know if it is supported under the supplied path.
Implementors: PathCapabilitiesSupport can be used to help implement this method.
- Specified by:
hasPathCapability in interface org.apache.hadoop.fs.PathCapabilities
- Overrides:
hasPathCapability in class FileSystem
- Parameters:
path - path to query the capability of.
capability - non-null, non-empty string to query the path for support.
- Returns:
- true if the capability is supported under that part of the FS.
- Throws:
IOException - this should not be raised, except on problems resolving paths or relaying the call.
-