Class AdlFileSystem

All Implemented Interfaces:
Closeable, AutoCloseable, Configurable, BulkDeleteSource, org.apache.hadoop.fs.PathCapabilities, org.apache.hadoop.security.token.DelegationTokenIssuer

@Public @Evolving public class AdlFileSystem extends FileSystem
A FileSystem to access Azure Data Lake Store.
  • Constructor Details

    • AdlFileSystem

      public AdlFileSystem()
  • Method Details

    • getScheme

      public String getScheme()
      Description copied from class: FileSystem
      Return the protocol scheme for this FileSystem.

      This implementation throws an UnsupportedOperationException.

      Overrides:
      getScheme in class FileSystem
      Returns:
      the protocol scheme for this FileSystem.
    • getUri

      public URI getUri()
      Description copied from class: FileSystem
      Returns a URI which identifies this FileSystem.
      Specified by:
      getUri in class FileSystem
      Returns:
      the URI of this filesystem.
    • getDefaultPort

      public int getDefaultPort()
      Description copied from class: FileSystem
      Get the default port for this FileSystem.
      Overrides:
      getDefaultPort in class FileSystem
      Returns:
      the default port or 0 if there isn't one
    • supportsSymlinks

      public boolean supportsSymlinks()
      Description copied from class: FileSystem
      Overrides:
      supportsSymlinks in class FileSystem
      Returns:
      true if symlinks are supported, false otherwise.
    • initialize

      public void initialize(URI storeUri, Configuration originalConf) throws IOException
      Called after a new FileSystem instance is constructed.
      Overrides:
      initialize in class FileSystem
      Parameters:
      storeUri - a uri whose authority section names the host, port, etc. for this FileSystem
      originalConf - the configuration to use for the FS. The account-specific options are patched over the base ones before any use is made of the config.
      Throws:
      IOException - on any failure to initialize this instance.
    • getCustomAccessTokenProvider

      protected AzureADTokenProvider getCustomAccessTokenProvider(Configuration conf) throws IOException
      This method is provided as a convenience for derived classes to define a custom AzureADTokenProvider instance. To ensure a secure Hadoop infrastructure and the user context for which the respective AdlFileSystem instance is initialized, loading an AzureADTokenProvider alone is not sufficient. The loading order is to first invoke getCustomAccessTokenProvider(Configuration). If the method returns null, meaning no implementation is provided by a derived class, the configuration object is read to retrieve the token configuration as specified in the documentation. Custom token management takes precedence during initialization.
      Parameters:
      conf - Configuration object
      Returns:
      null if no custom AzureADTokenProvider token management is specified.
      Throws:
      IOException - if the token provider fails to initialize.
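      The loading order described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual AdlFileSystem code: the TokenSource interface, the resolve method, and the configuration key "example.token" are hypothetical stand-ins for AzureADTokenProvider, the initialization logic, and the documented token settings.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for AzureADTokenProvider; not the real Hadoop type. */
interface TokenSource {
    String getAccessToken();
}

/** Sketch of "custom provider first, configuration second". */
class TokenProviderOrderSketch {

    /**
     * @param customProvider plays the role of getCustomAccessTokenProvider(conf);
     *                       yields null when a subclass supplies nothing
     * @param conf           plays the role of the Hadoop Configuration
     */
    static TokenSource resolve(Supplier<TokenSource> customProvider,
                               Map<String, String> conf) {
        TokenSource custom = customProvider.get();
        if (custom != null) {
            // Custom token management takes precedence.
            return custom;
        }
        // Fall back to configuration; "example.token" is a made-up key.
        String token = conf.get("example.token");
        if (token == null) {
            throw new IllegalStateException("no token provider configured");
        }
        return () -> token;
    }
}
```

      Calling TokenProviderOrderSketch.resolve(() -> null, conf) falls through to the configuration, while a supplier that returns a non-null TokenSource wins regardless of what the configuration holds.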
    • getAdlClient

      @VisibleForTesting public com.microsoft.azure.datalake.store.ADLStoreClient getAdlClient()
    • getHomeDirectory

      public Path getHomeDirectory()
      Constructing the home directory locally is acceptable for now, because the relationship between the Hadoop local user name and the ADL user name has not yet been fully defined.
      Overrides:
      getHomeDirectory in class FileSystem
      Returns:
      Hadoop local user home directory.
    • create

      public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Create semantics are handled differently for ADL: a create call is translated into create/append semantics.
      1. No dedicated connection to the server is held.
      2. Data is buffered locally. Once the buffer is full or the caller invokes flush, all pending data is pushed to ADL as an APPEND operation.
      3. On close, an additional call is sent to the server to close the stream and release the lock on it.
      The create/append semantics are needed because:
      1. The ADL backend server does not allow a connection to stay idle for long. With slow writers, connection timeouts and connection resets were observed, causing occasional job failures.
      2. Jobs with slow writers get a performance boost because network latency is avoided.
      3. ADL performs equally well or better with append calls in chunks that are a multiple of 4 MB.
      Specified by:
      create in class FileSystem
      Parameters:
      f - File path
      permission - Access permission for the newly created file
      overwrite - if true, remove any existing file and recreate it; otherwise throw an error if the file exists
      bufferSize - Buffer size; not honoured by the ADL backend
      replication - Replication count; not honoured by the ADL backend
      blockSize - Block size; not honoured by the ADL backend
      progress - Progress indicator
      Returns:
      an FSDataOutputStream to which the application can write a stream of bytes
      Throws:
      IOException - on a system error, internal server error, or user error
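      The create/append behaviour described for this method can be sketched with a small buffering stream. This is an illustration under stated assumptions, not the ADL client implementation: AppendTarget, BufferedAppendStream, and RecordingTarget are hypothetical names, and the in-memory target only records the size of each append instead of talking to a server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical remote endpoint; each append call models one APPEND request. */
interface AppendTarget {
    void append(byte[] chunk) throws IOException;
    void close() throws IOException;
}

/** Buffers writes locally and ships them to the target as append calls. */
class BufferedAppendStream extends OutputStream {
    private final AppendTarget target;
    private final int capacity;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    BufferedAppendStream(AppendTarget target, int capacity) {
        this.target = target;
        this.capacity = capacity;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        if (buffer.size() >= capacity) {
            flush(); // buffer full: push pending data as one append
        }
    }

    @Override
    public void flush() throws IOException {
        if (buffer.size() > 0) {
            target.append(buffer.toByteArray());
            buffer.reset();
        }
    }

    @Override
    public void close() throws IOException {
        flush();        // push whatever is still pending
        target.close(); // extra call that closes the remote stream
    }
}

/** In-memory target that records the size of each append, for experimentation. */
class RecordingTarget implements AppendTarget {
    final List<Integer> chunkSizes = new ArrayList<>();
    boolean closed;

    @Override public void append(byte[] chunk) { chunkSizes.add(chunk.length); }
    @Override public void close() { closed = true; }
}
```

      With a capacity of 4, writing 10 bytes produces two appends of 4 bytes while the stream is open, and close() sends the remaining 2 bytes before closing the target. No connection is held between appends, which is the property the description relies on for slow writers.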
    • createNonRecursive

      public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Deprecated.
      API only for 0.20-append
      Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
      Overrides:
      createNonRecursive in class FileSystem
      Parameters:
      f - the file name to open
      permission - Access permission for the newly created file
      flags - CreateFlags to use for this stream.
      bufferSize - the size of the buffer to be used; not honoured by the ADL backend
      replication - required block replication for the file; not honoured by the ADL backend
      blockSize - Block size; not honoured by the ADL backend
      progress - Progress indicator
      Returns:
      output stream.
      Throws:
      IOException - on a system error, internal server error, or user error
    • append

      public FSDataOutputStream append(Path f, int bufferSize, Progressable progress) throws IOException
      Append to an existing file (optional operation).
      Specified by:
      append in class FileSystem
      Parameters:
      f - the existing file to be appended.
      bufferSize - the size of the buffer to be used; not honoured by the ADL backend
      progress - Progress indicator
      Returns:
      output stream.
      Throws:
      IOException - on a system error, internal server error, or user error
    • setReplication

      public boolean setReplication(Path p, short replication) throws IOException
      Azure Data Lake does not support user configuration of data replication, so this call does not leave the local system to query Azure Data Lake. Stub implementation.
      Overrides:
      setReplication in class FileSystem
      Parameters:
      p - Not honoured
      replication - Not honoured
      Returns:
      always true (hard-coded), since the ADL file system does not support replication configuration
      Throws:
      IOException - never thrown by this implementation; declared only to align with the parent API definition.
    • open

      public FSDataInputStream open(Path f, int buffersize) throws IOException
      Open semantics are handled differently for ADL. Instead of a network stream, an overridden FsInputStream is returned to the user.
      Specified by:
      open in class FileSystem
      Parameters:
      f - File path
      buffersize - Buffer size; not honoured
      Returns:
      an FSDataInputStream from which the application can read a stream of bytes
      Throws:
      IOException - on a system error, internal server error, or user error
    • getFileStatus

      public FileStatus getFileStatus(Path f) throws IOException
      Return a file status object that represents the path.
      Specified by:
      getFileStatus in class FileSystem
      Parameters:
      f - The path we want information from
      Returns:
      a FileStatus object
      Throws:
      IOException - when the path does not exist or on any other error; see the specific implementation
    • listStatus

      public FileStatus[] listStatus(Path f) throws IOException
      List the statuses of the files/directories in the given path if the path is a directory.
      Specified by:
      listStatus in class FileSystem
      Parameters:
      f - given path
      Returns:
      the statuses of the files/directories in the given path
      Throws:
      IOException - when the path does not exist or on any other error; see the specific implementation
    • rename

      public boolean rename(Path src, Path dst) throws IOException
      Renames Path src to Path dst. Can take place on local fs or remote DFS. ADLS supports the POSIX standard for the rename operation.
      Specified by:
      rename in class FileSystem
      Parameters:
      src - path to be renamed
      dst - new path after rename
      Returns:
      true if rename is successful
      Throws:
      IOException - on failure
    • rename

      @Deprecated public void rename(Path src, Path dst, Options.Rename... options) throws IOException
      Deprecated.
      Description copied from class: FileSystem
      Renames Path src to Path dst
      • Fails if src is a file and dst is a directory.
      • Fails if src is a directory and dst is a file.
      • Fails if the parent of dst does not exist or is a file.

      If OVERWRITE option is not passed as an argument, rename fails if the dst already exists.

      If OVERWRITE option is passed as an argument, rename overwrites the dst if it is a file or an empty directory. Rename fails if dst is a non-empty directory.

      Note that the atomicity of rename depends on the file system implementation. Please refer to the file system documentation for details. This default implementation is non-atomic.

      This method is deprecated since it is a temporary method added to support the transition from FileSystem to FileContext for user applications.

      Overrides:
      rename in class FileSystem
      Parameters:
      src - path to be renamed
      dst - new path after rename
      options - rename options.
      Throws:
      FileNotFoundException - src path does not exist, or the parent path of dst does not exist.
      FileAlreadyExistsException - dest path exists and is a file
      ParentNotDirectoryException - if the parent path of dest is not a directory
      IOException - on failure
    • concat

      public void concat(Path trg, Path[] srcs) throws IOException
      Concat existing files together.
      Overrides:
      concat in class FileSystem
      Parameters:
      trg - the path to the target destination.
      srcs - the paths to the sources to use for the concatenation.
      Throws:
      IOException - on a system error, internal server error, or user error
    • delete

      public boolean delete(Path path, boolean recursive) throws IOException
      Delete a file.
      Specified by:
      delete in class FileSystem
      Parameters:
      path - the path to delete.
      recursive - if path is a directory and recursive is true, the directory is deleted; otherwise an exception is thrown. For a file, recursive can be either true or false.
      Returns:
      true if the delete is successful, false otherwise.
      Throws:
      IOException - on a system error, internal server error, or user error
    • mkdirs

      public boolean mkdirs(Path path, FsPermission permission) throws IOException
      Make the given file and all non-existent parents into directories. Has the semantics of Unix 'mkdir -p'. Existence of the directory hierarchy is not an error.
      Specified by:
      mkdirs in class FileSystem
      Parameters:
      path - path to create
      permission - to apply to path
      Returns:
      true if the directories were created successfully, false otherwise.
      Throws:
      IOException - IO failure
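      The 'mkdir -p' semantics can be tried out on the local file system with java.nio.file.Files.createDirectories, which behaves comparably: it creates missing parents and does not fail when the directory already exists. This is only a local analogy, not a call into AdlFileSystem. MkdirsSemanticsSketch is a hypothetical name, and Path here is java.nio.file.Path, not the Hadoop Path class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class MkdirsSemanticsSketch {

    /**
     * Creates dir and any missing parents. An already existing
     * directory hierarchy is not treated as an error.
     */
    static boolean mkdirs(Path dir) throws IOException {
        Files.createDirectories(dir);
        return Files.isDirectory(dir);
    }
}
```

      Calling mkdirs twice on the same nested path succeeds both times, which mirrors the statement that existence of the directory hierarchy is not an error.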
    • setOwner

      public void setOwner(Path path, String owner, String group) throws IOException
      Set owner of a path (i.e. a file or a directory). The parameters owner and group cannot both be null.
      Overrides:
      setOwner in class FileSystem
      Parameters:
      path - The path
      owner - If it is null, the original username remains unchanged.
      group - If it is null, the original groupname remains unchanged.
      Throws:
      IOException - IO failure
    • setPermission

      public void setPermission(Path path, FsPermission permission) throws IOException
      Set permission of a path.
      Overrides:
      setPermission in class FileSystem
      Parameters:
      path - The path
      permission - Access permission
      Throws:
      IOException - IO failure
    • modifyAclEntries

      public void modifyAclEntries(Path path, List<AclEntry> aclSpec) throws IOException
      Modifies ACL entries of files and directories. This method can add new ACL entries or modify the permissions on existing ACL entries. All existing ACL entries that are not specified in this call are retained without changes. (Modifications are merged into the current ACL.)
      Overrides:
      modifyAclEntries in class FileSystem
      Parameters:
      path - Path to modify
      aclSpec - List of AclEntry describing modifications
      Throws:
      IOException - if an ACL could not be modified
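      The merge behaviour can be pictured with a simplified model in which an ACL is a map from an entry name to a permission string. This is a sketch under that assumption only: AclMergeSketch and the "user:alice" style keys are hypothetical, and the real API works with AclEntry objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AclMergeSketch {

    /**
     * Returns a new ACL in which every entry of spec is either added
     * or overwrites the existing entry with the same key.
     */
    static Map<String, String> modify(Map<String, String> current,
                                      Map<String, String> spec) {
        Map<String, String> merged = new LinkedHashMap<>(current);
        merged.putAll(spec); // entries not named in spec are retained unchanged
        return merged;
    }
}
```

      Given a current ACL of user:alice=rw- and group:staff=r--, a spec of user:alice=rwx and user:bob=r-- changes alice, adds bob, and leaves the staff entry as it was.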
    • removeAclEntries

      public void removeAclEntries(Path path, List<AclEntry> aclSpec) throws IOException
      Removes ACL entries from files and directories. Other ACL entries are retained.
      Overrides:
      removeAclEntries in class FileSystem
      Parameters:
      path - Path to modify
      aclSpec - List of AclEntry describing entries to remove
      Throws:
      IOException - if an ACL could not be modified
    • removeDefaultAcl

      public void removeDefaultAcl(Path path) throws IOException
      Removes all default ACL entries from files and directories.
      Overrides:
      removeDefaultAcl in class FileSystem
      Parameters:
      path - Path to modify
      Throws:
      IOException - if an ACL could not be modified
    • removeAcl

      public void removeAcl(Path path) throws IOException
      Removes all but the base ACL entries of files and directories. The entries for user, group, and others are retained for compatibility with permission bits.
      Overrides:
      removeAcl in class FileSystem
      Parameters:
      path - Path to modify
      Throws:
      IOException - if an ACL could not be removed
    • setAcl

      public void setAcl(Path path, List<AclEntry> aclSpec) throws IOException
      Fully replaces ACL of files and directories, discarding all existing entries.
      Overrides:
      setAcl in class FileSystem
      Parameters:
      path - Path to modify
      aclSpec - List of AclEntry describing modifications, must include entries for user, group, and others for compatibility with permission bits.
      Throws:
      IOException - if an ACL could not be modified
    • getAclStatus

      public AclStatus getAclStatus(Path path) throws IOException
      Gets the ACL of a file or directory.
      Overrides:
      getAclStatus in class FileSystem
      Parameters:
      path - Path to get
      Returns:
      AclStatus describing the ACL of the file or directory
      Throws:
      IOException - if an ACL could not be read
    • access

      public void access(Path path, FsAction mode) throws IOException
      Checks if the user can access a path. The mode specifies which access checks to perform. If the requested permissions are granted, then the method returns normally. If access is denied, then the method throws an AccessControlException.
      Parameters:
      path - Path to check
      mode - type of access to check
      Throws:
      AccessControlException - if access is denied
      FileNotFoundException - if the path does not exist
      IOException - see specific implementation
    • getContentSummary

      public ContentSummary getContentSummary(Path f) throws IOException
      Return the ContentSummary of a given Path.
      Overrides:
      getContentSummary in class FileSystem
      Parameters:
      f - path to use
      Returns:
      content summary.
      Throws:
      FileNotFoundException - if the path does not resolve
      IOException - IO failure
    • getTransportScheme

      @VisibleForTesting protected String getTransportScheme()
    • getWorkingDirectory

      public Path getWorkingDirectory()
      Get the current working directory for the given file system.
      Specified by:
      getWorkingDirectory in class FileSystem
      Returns:
      the directory pathname
    • setWorkingDirectory

      public void setWorkingDirectory(Path dir)
      Set the current working directory for the given file system. All relative paths will be resolved relative to it.
      Specified by:
      setWorkingDirectory in class FileSystem
      Parameters:
      dir - Working directory path.
    • getDefaultBlockSize

      @Deprecated public long getDefaultBlockSize()
      Deprecated.
      Return the number of bytes that large input files should optimally be split into to minimize I/O time.
      Overrides:
      getDefaultBlockSize in class FileSystem
      Returns:
      default block size.
    • getDefaultBlockSize

      public long getDefaultBlockSize(Path f)
      Return the number of bytes that large input files should optimally be split into to minimize I/O time. The given path will be used to locate the actual filesystem. The full path does not have to exist.
      Overrides:
      getDefaultBlockSize in class FileSystem
      Parameters:
      f - path of file
      Returns:
      the default block size for the path's filesystem
    • getBlockSize

      @Deprecated public long getBlockSize(Path f) throws IOException
      Deprecated.
      Use getFileStatus() instead
      Description copied from class: FileSystem
      Get the block size for a particular file.
      Overrides:
      getBlockSize in class FileSystem
      Parameters:
      f - the filename
      Returns:
      the number of bytes in a block
      Throws:
      FileNotFoundException - if the path is not present
      IOException - IO failure
    • getReplication

      @Deprecated public short getReplication(Path src)
      Deprecated.
      Use getFileStatus() instead
      Get replication.
      Overrides:
      getReplication in class FileSystem
      Parameters:
      src - file name
      Returns:
      file replication
    • setUserGroupRepresentationAsUPN

      @VisibleForTesting public void setUserGroupRepresentationAsUPN(boolean enableUPN)
    • getAccountNameFromFQDN

      public static String getAccountNameFromFQDN(String accountFQDN)
      Gets ADL account name from ADL FQDN.
      Parameters:
      accountFQDN - ADL account fqdn
      Returns:
      ADL account name
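      One plausible reading of this method is that the account name is the first DNS label of the FQDN. The sketch below implements that reading only. It is an assumption, not the confirmed implementation, and AccountNameSketch and the host name used in the usage note are illustrative.

```java
class AccountNameSketch {

    /**
     * Returns the text before the first '.', or the input unchanged
     * when it contains no dot.
     */
    static String accountNameFromFqdn(String accountFqdn) {
        int dot = accountFqdn.indexOf('.');
        return dot < 0 ? accountFqdn : accountFqdn.substring(0, dot);
    }
}
```

      Under this assumption, an input such as myaccount.azuredatalakestore.net yields myaccount, and an input without dots is returned as is.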
    • propagateAccountOptions

      public static Configuration propagateAccountOptions(Configuration source, String accountName)
      Propagates account-specific settings into the generic ADL configuration keys. This is done by propagating values of the form fs.adl.account.${account_name}.key to fs.adl.key, for all values of "key". The source of each updated property is set to the key name of the account property, to aid in diagnosing where a value came from. Returns a new configuration. The clone is needed because the same conf can be used for different filesystems, so the original values must not be updated.
      Parameters:
      source - Source Configuration object
      accountName - account name. Must not be empty
      Returns:
      a (potentially) patched clone of the original
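      The key rewriting can be sketched with a plain map standing in for the Hadoop Configuration. This is a simplified illustration, assuming string keys and values only: AccountOptionsSketch is a hypothetical name, and the sketch does not record the property source the way the description says the real method does.

```java
import java.util.HashMap;
import java.util.Map;

class AccountOptionsSketch {

    /** Copies fs.adl.account.<accountName>.<key> onto fs.adl.<key> in a clone. */
    static Map<String, String> propagate(Map<String, String> source,
                                         String accountName) {
        if (accountName == null || accountName.isEmpty()) {
            throw new IllegalArgumentException("accountName must not be empty");
        }
        String accountPrefix = "fs.adl.account." + accountName + ".";
        // Work on a copy so the caller's settings stay untouched.
        Map<String, String> patched = new HashMap<>(source);
        for (Map.Entry<String, String> entry : source.entrySet()) {
            if (entry.getKey().startsWith(accountPrefix)) {
                String key = entry.getKey().substring(accountPrefix.length());
                patched.put("fs.adl." + key, entry.getValue());
            }
        }
        return patched;
    }
}
```

      Settings for other accounts are carried over unchanged, and because the source map is never modified, the same base settings can be patched again for a different account.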
    • hasPathCapability

      public boolean hasPathCapability(Path path, String capability) throws IOException
      Description copied from class: FileSystem
      The base FileSystem implementation generally has no knowledge of the capabilities of actual implementations. Unless it has a way to explicitly determine the capabilities, this method returns false. Probe for a specific capability under the given path. If the function returns true, this instance is explicitly declaring that the capability is available. If the function returns false, it can mean one of:
      • The capability is not known.
      • The capability is known but it is not supported.
      • The capability is known but the filesystem does not know if it is supported under the supplied path.
      The core guarantee which a caller can rely on is: if the predicate returns true, then the specific operation/behavior can be expected to be supported. However a specific call may be rejected for permission reasons, the actual file/directory not being present, or some other failure during the attempted execution of the operation.

      Implementors: PathCapabilitiesSupport can be used to help implement this method.

      Specified by:
      hasPathCapability in interface org.apache.hadoop.fs.PathCapabilities
      Overrides:
      hasPathCapability in class FileSystem
      Parameters:
      path - path to query the capability of.
      capability - non-null, non-empty string to query the path for support.
      Returns:
      true if the capability is supported under that part of the FS.
      Throws:
      IOException - this should not be raised, except on problems resolving paths or relaying the call.
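      The contract above suggests a caller pattern: use an optional operation only after a positive probe, and treat false as "unknown or unsupported". The sketch below models that pattern without Hadoop. CapabilityProbeSketch, the capability strings, and the "append"/"rewrite" labels are all hypothetical.

```java
import java.util.Set;

class CapabilityProbeSketch {

    /** Capabilities this hypothetical store explicitly declares as available. */
    private static final Set<String> DECLARED = Set.of("example.capability.append");

    /** True only for explicitly declared capabilities; unknown ones yield false. */
    static boolean hasCapability(String capability) {
        if (capability == null || capability.isEmpty()) {
            throw new IllegalArgumentException("capability must be non-empty");
        }
        return DECLARED.contains(capability);
    }

    /** Caller pattern: take the optional path only after a positive probe. */
    static String chooseWritePath(String capability) {
        return hasCapability(capability) ? "append" : "rewrite";
    }
}
```

      Even after a positive probe, the actual call can still fail for permission reasons or because the path is missing, so the caller keeps its normal error handling.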