Class ChecksumFileSystem

All Implemented Interfaces:
Closeable, AutoCloseable, Configurable, BulkDeleteSource, org.apache.hadoop.fs.PathCapabilities, org.apache.hadoop.security.token.DelegationTokenIssuer
Direct Known Subclasses:
LocalFileSystem

@Public @Stable public abstract class ChecksumFileSystem extends FilterFileSystem
Abstract checksummed FileSystem. It provides a basic implementation of a checksummed FileSystem, which creates a checksum file for each raw file. It generates and verifies checksums on the client side.
  • Constructor Details

    • ChecksumFileSystem

      public ChecksumFileSystem(FileSystem fs)
  • Method Details

    • getApproxChkSumLength

      public static double getApproxChkSumLength(long size)
    • setConf

      public void setConf(Configuration conf)
      Description copied from interface: Configurable
      Set the configuration to be used by this object.
      Specified by:
      setConf in interface Configurable
      Overrides:
      setConf in class Configured
      Parameters:
      conf - configuration to be used
    • setVerifyChecksum

      public void setVerifyChecksum(boolean verifyChecksum)
      Set whether to verify checksum.
      Overrides:
      setVerifyChecksum in class FilterFileSystem
      Parameters:
      verifyChecksum - Verify checksum flag
    • getVerifyChecksum

      public boolean getVerifyChecksum()
      Is checksum verification enabled?
      Returns:
      true if files are to be verified through checksums.
    • setWriteChecksum

      public void setWriteChecksum(boolean writeChecksum)
      Description copied from class: FileSystem
      Set the write checksum flag. This is only applicable if the corresponding filesystem supports checksums. By default doesn't do anything.
      Overrides:
      setWriteChecksum in class FilterFileSystem
      Parameters:
      writeChecksum - Write checksum flag
    • getRawFileSystem

      public FileSystem getRawFileSystem()
      Get the raw file system.
      Overrides:
      getRawFileSystem in class FilterFileSystem
      Returns:
      FileSystem being filtered
    • getChecksumFile

      public Path getChecksumFile(Path file)
      Return the name of the checksum file associated with a file.
      Parameters:
      file - the file path.
      Returns:
      name of the checksum file associated with a file.
    • isChecksumFile

      public static boolean isChecksumFile(Path file)
      Return true if file is a checksum file name.
      Parameters:
      file - the file path.
      Returns:
      true if file is a checksum file; false otherwise.
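
      The relationship between getChecksumFile and isChecksumFile can be illustrated with a small standalone sketch. It assumes the naming convention is a hidden sibling file named "." + name + ".crc"; that convention, the class name, and both helper names are assumptions made for illustration, and the sketch uses java.nio.file.Path rather than org.apache.hadoop.fs.Path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Standalone sketch of an ASSUMED "." + name + ".crc" sibling-file convention.
// It does not call Hadoop; the helper names are hypothetical.
class ChecksumNameSketch {

    // Derive the checksum file path that would sit next to a data file.
    static Path checksumFileFor(Path file) {
        String crcName = "." + file.getFileName() + ".crc";
        Path parent = file.getParent();
        return parent == null ? Paths.get(crcName) : parent.resolve(crcName);
    }

    // True if the final path component starts with "." and ends with ".crc".
    static boolean looksLikeChecksumFile(Path file) {
        String name = file.getFileName().toString();
        return name.startsWith(".") && name.endsWith(".crc");
    }

    public static void main(String[] args) {
        Path data = Paths.get("/data/part-00000");
        Path crc = checksumFileFor(data);
        System.out.println(crc);
        System.out.println(looksLikeChecksumFile(crc));
        System.out.println(looksLikeChecksumFile(data));
    }
}
```

      Under this assumption, a data file is never mistaken for a checksum file unless its own name happens to start with "." and end with ".crc".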
    • getChecksumFileLength

      public long getChecksumFileLength(Path file, long fileSize)
      Return the length of the checksum file given the size of the actual file.
      Parameters:
      file - the file path.
      fileSize - file size.
      Returns:
      checksum length.
    • getBytesPerSum

      public int getBytesPerSum()
      Return the number of bytes per checksum.
      Returns:
      bytes per checksum.
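
      To make the role of bytes-per-checksum concrete, here is a minimal standalone sketch of client-side checksumming: one CRC-32 value per chunk of bytesPerSum bytes, recomputed and compared on read. It is not Hadoop's implementation; the class and method names are hypothetical, and the checksum algorithm and on-disk layout Hadoop actually uses may differ.

```java
import java.util.zip.CRC32;

// Standalone sketch of per-chunk checksumming; names are hypothetical.
class ChunkedCrcSketch {

    // Compute one CRC-32 value per chunk of bytesPerSum bytes.
    // The last chunk may be shorter than bytesPerSum.
    static long[] checksums(byte[] data, int bytesPerSum) {
        int chunks = (data.length + bytesPerSum - 1) / bytesPerSum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerSum;
            int len = Math.min(bytesPerSum, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Return the index of the first chunk whose checksum does not match
    // the expected value, or -1 if every chunk matches.
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerSum) {
        long[] actual = checksums(data, bytesPerSum);
        int n = Math.min(actual.length, expected.length);
        for (int i = 0; i < n; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        // A differing chunk count is treated as corruption at the first missing chunk.
        return actual.length == expected.length ? -1 : n;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        long[] sums = checksums(data, 512);
        System.out.println(firstCorruptChunk(data, sums, 512));
        data[600] ^= 1; // flip one bit inside the second chunk
        System.out.println(firstCorruptChunk(data, sums, 512));
    }
}
```

      A smaller bytesPerSum localizes corruption more precisely at the cost of a larger checksum file.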
    • open

      public FSDataInputStream open(Path f, int bufferSize) throws IOException
      Opens an FSDataInputStream at the indicated Path.
      Overrides:
      open in class FilterFileSystem
      Parameters:
      f - the file name to open
      bufferSize - the size of the buffer to be used.
      Returns:
      input stream.
      Throws:
      IOException - if an I/O error occurs.
    • append

      public FSDataOutputStream append(Path f, int bufferSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Append to an existing file (optional operation).
      Overrides:
      append in class FilterFileSystem
      Parameters:
      f - the existing file to be appended.
      bufferSize - the size of the buffer to be used.
      progress - for reporting progress if it is not null.
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • truncate

      public boolean truncate(Path f, long newLength) throws IOException
      Description copied from class: FileSystem
      Truncate the file in the indicated path to the indicated size.
      • Fails if path is a directory.
      • Fails if path does not exist.
      • Fails if path is not closed.
      • Fails if new size is greater than current size.
      Overrides:
      truncate in class FilterFileSystem
      Parameters:
      f - The path to the file to be truncated
      newLength - The size the file is to be truncated to
      Returns:
      true if the file has been truncated to the desired newLength and is immediately available to be reused for write operations such as append, or false if a background process of adjusting the length of the last block has been started, and clients should wait for it to complete before proceeding with further file updates.
      Throws:
      IOException - IO failure
    • concat

      public void concat(Path f, Path[] psrcs) throws IOException
      Description copied from class: FileSystem
      Concat existing files together.
      Overrides:
      concat in class FilterFileSystem
      Parameters:
      f - the path to the target destination.
      psrcs - the paths to the sources to use for the concatenation.
      Throws:
      IOException - IO failure
    • getChecksumLength

      public static long getChecksumLength(long size, int bytesPerSum)
      Calculate the length of the checksum file in bytes.
      Parameters:
      size - the length of the data file in bytes
      bytesPerSum - the number of bytes in a checksum block
      Returns:
      the number of bytes in the checksum file
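
      The calculation can be sketched as a fixed header plus one checksum value per block of bytesPerSum bytes, rounding the block count up. The two constants below (4 bytes per checksum, 8 bytes of header) are assumptions made for this illustration and are not taken from this page; the class and method names are hypothetical.

```java
// Standalone sketch; the constants are ASSUMPTIONS, not values read from Hadoop.
class ChecksumLengthSketch {
    static final int ASSUMED_CHECKSUM_SIZE = 4;  // bytes per stored checksum value
    static final int ASSUMED_HEADER_LENGTH = 8;  // bytes of header before the checksums

    // One checksum per block of bytesPerSum bytes (rounded up), plus the header.
    static long checksumLength(long size, int bytesPerSum) {
        long blocks = (size + bytesPerSum - 1) / bytesPerSum;
        return blocks * ASSUMED_CHECKSUM_SIZE + ASSUMED_HEADER_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(checksumLength(0, 512));
        System.out.println(checksumLength(1024, 512));
        System.out.println(checksumLength(1025, 512));
    }
}
```

      Under these assumptions an empty data file still has a header-only checksum file, and a single extra byte past a block boundary adds one more checksum value.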
    • create

      public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Create an FSDataOutputStream at the indicated Path with write-progress reporting.
      Overrides:
      create in class FilterFileSystem
      Parameters:
      f - the file name to open
      permission - file permission
      overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • createNonRecursive

      public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
      Overrides:
      createNonRecursive in class FileSystem
      Parameters:
      f - the file name to open
      permission - file permission
      overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • create

      public FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress, org.apache.hadoop.fs.Options.ChecksumOpt checksumOpt) throws IOException
      Description copied from class: FileSystem
      Create an FSDataOutputStream at the indicated Path with a custom checksum option.
      Overrides:
      create in class FilterFileSystem
      Parameters:
      f - the file name to open
      permission - file permission
      flags - CreateFlags to use for this stream.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      checksumOpt - checksum parameter. If null, the values found in conf will be used.
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • createNonRecursive

      public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
      Overrides:
      createNonRecursive in class FilterFileSystem
      Parameters:
      f - the file name to open
      permission - file permission
      flags - CreateFlags to use for this stream.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • setPermission

      public void setPermission(Path src, FsPermission permission) throws IOException
      Description copied from class: FileSystem
      Set permission of a path.
      Overrides:
      setPermission in class FilterFileSystem
      Parameters:
      src - The path
      permission - permission
      Throws:
      IOException - IO failure
    • setOwner

      public void setOwner(Path src, String username, String groupname) throws IOException
      Description copied from class: FileSystem
      Set owner of a path (i.e. a file or a directory). The parameters username and groupname cannot both be null.
      Overrides:
      setOwner in class FilterFileSystem
      Parameters:
      src - The path
      username - If it is null, the original username remains unchanged.
      groupname - If it is null, the original groupname remains unchanged.
      Throws:
      IOException - IO failure
    • setAcl

      public void setAcl(Path src, List<AclEntry> aclSpec) throws IOException
      Description copied from class: FileSystem
      Fully replaces ACL of files and directories, discarding all existing entries.
      Overrides:
      setAcl in class FilterFileSystem
      Parameters:
      src - Path to modify
      aclSpec - List describing modifications, which must include entries for user, group, and others for compatibility with permission bits.
      Throws:
      IOException - if an ACL could not be modified
    • modifyAclEntries

      public void modifyAclEntries(Path src, List<AclEntry> aclSpec) throws IOException
      Description copied from class: FileSystem
      Modifies ACL entries of files and directories. This method can add new ACL entries or modify the permissions on existing ACL entries. All existing ACL entries that are not specified in this call are retained without changes. (Modifications are merged into the current ACL.)
      Overrides:
      modifyAclEntries in class FilterFileSystem
      Parameters:
      src - Path to modify
      aclSpec - List<AclEntry> describing modifications
      Throws:
      IOException - if an ACL could not be modified
    • removeAcl

      public void removeAcl(Path src) throws IOException
      Description copied from class: FileSystem
      Removes all but the base ACL entries of files and directories. The entries for user, group, and others are retained for compatibility with permission bits.
      Overrides:
      removeAcl in class FilterFileSystem
      Parameters:
      src - Path to modify
      Throws:
      IOException - if an ACL could not be removed
    • removeAclEntries

      public void removeAclEntries(Path src, List<AclEntry> aclSpec) throws IOException
      Description copied from class: FileSystem
      Removes ACL entries from files and directories. Other ACL entries are retained.
      Overrides:
      removeAclEntries in class FilterFileSystem
      Parameters:
      src - Path to modify
      aclSpec - List describing entries to remove
      Throws:
      IOException - if an ACL could not be modified
    • removeDefaultAcl

      public void removeDefaultAcl(Path src) throws IOException
      Description copied from class: FileSystem
      Removes all default ACL entries from files and directories.
      Overrides:
      removeDefaultAcl in class FilterFileSystem
      Parameters:
      src - Path to modify
      Throws:
      IOException - if an ACL could not be modified
    • setReplication

      public boolean setReplication(Path src, short replication) throws IOException
      Set replication for an existing file. Implements the abstract setReplication of FileSystem.
      Overrides:
      setReplication in class FilterFileSystem
      Parameters:
      src - file name
      replication - new replication
      Returns:
      true if successful; false if file does not exist or is a directory
      Throws:
      IOException - if an I/O error occurs.
    • rename

      public boolean rename(Path src, Path dst) throws IOException
      Rename files or directories.
      Overrides:
      rename in class FilterFileSystem
      Parameters:
      src - path to be renamed
      dst - new path after rename
      Returns:
      true if rename is successful
      Throws:
      IOException - on failure
    • delete

      public boolean delete(Path f, boolean recursive) throws IOException
      Implement the delete(Path, boolean) in checksum file system.
      Overrides:
      delete in class FilterFileSystem
      Parameters:
      f - the path to delete.
      recursive - if the path is a directory and recursive is true, the directory is deleted; otherwise an exception is thrown. For a file, recursive may be either true or false.
      Returns:
      true if delete is successful else false.
      Throws:
      IOException - IO failure
    • listStatus

      public FileStatus[] listStatus(Path f) throws IOException
      List the statuses of the files/directories in the given path if the path is a directory.
      Overrides:
      listStatus in class FilterFileSystem
      Parameters:
      f - given path
      Returns:
      the statuses of the files/directories in the given path
      Throws:
      IOException - if an I/O error occurs.
    • listStatusIterator

      public org.apache.hadoop.fs.RemoteIterator<FileStatus> listStatusIterator(Path p) throws IOException
      Description copied from class: FilterFileSystem
      Return a remote iterator for listing the entries in a directory.
      Overrides:
      listStatusIterator in class FilterFileSystem
      Parameters:
      p - target path
      Returns:
      remote iterator
      Throws:
      FileNotFoundException - if p does not exist
      IOException - if any I/O error occurred
    • listLocatedStatus

      public org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f) throws IOException
      List the statuses of the files/directories in the given path if the path is a directory.
      Overrides:
      listLocatedStatus in class FilterFileSystem
      Parameters:
      f - given path
      Returns:
      the statuses of the files/directories in the given path
      Throws:
      IOException - if an I/O error occurs.
    • mkdirs

      public boolean mkdirs(Path f) throws IOException
      Description copied from class: FileSystem
      Call FileSystem.mkdirs(Path, FsPermission) with default permission.
      Overrides:
      mkdirs in class FilterFileSystem
      Parameters:
      f - path
      Returns:
      true if the directory was created
      Throws:
      IOException - IO failure
    • copyFromLocalFile

      public void copyFromLocalFile(boolean delSrc, Path src, Path dst) throws IOException
      Description copied from class: FilterFileSystem
      The src file is on the local disk. Add it to the FS at the given dst name. delSrc indicates whether the source should be removed.
      Overrides:
      copyFromLocalFile in class FilterFileSystem
      Parameters:
      delSrc - whether to delete the src
      src - path
      dst - path
      Throws:
      IOException - IO failure.
    • copyToLocalFile

      public void copyToLocalFile(boolean delSrc, Path src, Path dst) throws IOException
      The src file is under FS, and the dst is on the local disk. Copy it from FS control to the local dst name.
      Overrides:
      copyToLocalFile in class FilterFileSystem
      Parameters:
      delSrc - whether to delete the src
      src - path src file in the remote filesystem
      dst - path local destination
      Throws:
      IOException - IO failure
    • copyToLocalFile

      public void copyToLocalFile(Path src, Path dst, boolean copyCrc) throws IOException
      The src file is under FS, and the dst is on the local disk. Copy it from FS control to the local dst name. If src and dst are directories, the copyCrc parameter determines whether to copy CRC files.
      Parameters:
      src - src path.
      dst - dst path.
      copyCrc - whether to copy CRC files.
      Throws:
      IOException - if an I/O error occurs.
    • startLocalOutput

      public Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile) throws IOException
      Description copied from class: FilterFileSystem
      Returns a local File that the user can write output to. The caller provides both the eventual FS target name and the local working file. If the FS is local, we write directly into the target. If the FS is remote, we write into the tmp local area.
      Overrides:
      startLocalOutput in class FilterFileSystem
      Parameters:
      fsOutputFile - path of output file
      tmpLocalFile - path of local tmp file
      Returns:
      the path.
      Throws:
      IOException - IO failure
    • completeLocalOutput

      public void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile) throws IOException
      Description copied from class: FilterFileSystem
      Called when we're all done writing to the target. A local FS will do nothing, because we've written to exactly the right place. A remote FS will copy the contents of tmpLocalFile to the correct target at fsOutputFile.
      Overrides:
      completeLocalOutput in class FilterFileSystem
      Parameters:
      fsOutputFile - path of output file
      tmpLocalFile - path to local tmp file
      Throws:
      IOException - IO failure
    • reportChecksumFailure

      public boolean reportChecksumFailure(Path f, FSDataInputStream in, long inPos, FSDataInputStream sums, long sumsPos)
      Report a checksum error to the file system.
      Parameters:
      f - the file name containing the error
      in - the stream open on the file
      inPos - the position of the beginning of the bad data in the file
      sums - the stream open on the checksum file
      sumsPos - the position of the beginning of the bad data in the checksum file
      Returns:
      true if retry is necessary
    • openFile

      This is overridden to ensure that this class's openFileWithOptions(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.impl.OpenFileParameters) method is called, and so ultimately its open(Path, int). Open a file for reading through a builder API. Ultimately calls FileSystem.open(Path, int) unless a subclass executes the open command differently. The semantics of this call are therefore the same as those of FileSystem.open(Path, int), with one special point: the open operation takes place in FSDataInputStreamBuilder.build(), and that is where all preconditions of the operation are checked.
      Overrides:
      openFile in class FilterFileSystem
      Parameters:
      path - file path
      Returns:
      a FSDataInputStreamBuilder object to build the input stream
      Throws:
      IOException - if some early checks cause IO failures.
      UnsupportedOperationException - if support is checked early.
    • openFileWithOptions

      protected CompletableFuture<FSDataInputStream> openFileWithOptions(Path path, org.apache.hadoop.fs.impl.OpenFileParameters parameters) throws IOException
      Open the file as a blocking call to open(Path, int). Execute the actual open file operation. This is invoked from FSDataInputStreamBuilder.build() and from DelegateToFileSystem and is where the action of opening the file should begin. The base implementation performs a blocking call to FileSystem.open(Path, int) in this call; the actual outcome is in the returned CompletableFuture. This avoids having to create some thread pool, while still setting up the expectation that the get() call is needed to evaluate the result.
      Overrides:
      openFileWithOptions in class FilterFileSystem
      Parameters:
      path - path to the file
      parameters - open file parameters from the builder.
      Returns:
      a future which will evaluate to the opened file.
      Throws:
      IOException - failure to resolve the link.
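
      The pattern described above, a blocking call whose outcome is delivered through a CompletableFuture without any thread pool, can be sketched in plain Java as follows. This is an illustration only: BlockingOpener and evaluate are hypothetical names standing in for the blocking open(Path, int) call and whatever helper the library uses internally.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Standalone sketch; it does not call Hadoop and all names are hypothetical.
class BlockingOpenSketch {

    // Stand-in for a blocking open call that may fail with an IOException.
    interface BlockingOpener<T> {
        T open() throws IOException;
    }

    // Run the blocking call on the caller's thread and capture its outcome,
    // success or failure, in an already-completed future.
    static <T> CompletableFuture<T> evaluate(BlockingOpener<T> opener) {
        CompletableFuture<T> result = new CompletableFuture<>();
        try {
            result.complete(opener.open());
        } catch (IOException | RuntimeException e) {
            result.completeExceptionally(e);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> ok = evaluate(() -> "stream");
        System.out.println(ok.get());

        CompletableFuture<String> failed = evaluate(() -> {
            throw new IOException("missing");
        });
        try {
            failed.get();
        } catch (ExecutionException e) {
            // The original IOException is available as the cause.
            System.out.println(e.getCause().getMessage());
        }
    }
}
```

      Callers written against the future keep working if an implementation later performs the open asynchronously, which is the expectation the description above sets up.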
    • createFile

      public FSDataOutputStreamBuilder createFile(Path path)
      This is overridden to ensure that this class's create() method is ultimately called. Create a new FSDataOutputStreamBuilder for the file with path. Files are overwritten by default.
      Overrides:
      createFile in class FilterFileSystem
      Parameters:
      path - file path
      Returns:
      a FSDataOutputStreamBuilder object to build the file. HADOOP-14384: Temporarily reduce the visibility of the method before the builder interface becomes stable.
    • appendFile

      public FSDataOutputStreamBuilder appendFile(Path path)
      This is overridden to ensure that this class's create() method is ultimately called. Create a Builder to append a file.
      Overrides:
      appendFile in class FilterFileSystem
      Parameters:
      path - file path.
      Returns:
      a FSDataOutputStreamBuilder to build file append request.
    • hasPathCapability

      public boolean hasPathCapability(Path path, String capability) throws IOException
      Disable those operations which the checksummed FS blocks. The base FileSystem implementation generally has no knowledge of the capabilities of actual implementations. Unless it has a way to explicitly determine the capabilities, this method returns false. Probe for a specific capability under the given path. If the function returns true, this instance is explicitly declaring that the capability is available. If the function returns false, it can mean one of:
      • The capability is not known.
      • The capability is known but it is not supported.
      • The capability is known but the filesystem does not know if it is supported under the supplied path.
      The core guarantee which a caller can rely on is: if the predicate returns true, then the specific operation/behavior can be expected to be supported. However a specific call may be rejected for permission reasons, the actual file/directory not being present, or some other failure during the attempted execution of the operation.

      Implementors: PathCapabilitiesSupport can be used to help implement this method.

      Specified by:
      hasPathCapability in interface org.apache.hadoop.fs.PathCapabilities
      Overrides:
      hasPathCapability in class FilterFileSystem
      Parameters:
      path - path to query the capability of.
      capability - non-null, non-empty string to query the path for support.
      Returns:
      true if the capability is supported under that part of the FS.
      Throws:
      IOException - this should not be raised, except on problems resolving paths or relaying the call.
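
      As a rough illustration of "disable those operations which the checksummed FS blocks", the sketch below wraps a capability probe so that selected capabilities always report false and everything else is relayed to the wrapped probe. The CapabilityProbe interface, the filtered helper, and the two capability strings in BLOCKED are assumptions chosen for this example; this page does not list which capabilities are blocked.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Standalone sketch; it does not call Hadoop and all names are hypothetical.
class CapabilityFilterSketch {

    // Stand-in for a capability probe on an underlying filesystem.
    interface CapabilityProbe {
        boolean hasCapability(String path, String capability);
    }

    // Capabilities this sketch ASSUMES a checksummed wrapper refuses to advertise.
    static final Set<String> BLOCKED = new HashSet<>(Arrays.asList(
            "fs.capability.paths.append",
            "fs.capability.paths.concat"));

    // Wrap a probe: blocked capabilities report false, all others are relayed.
    static CapabilityProbe filtered(CapabilityProbe inner) {
        return (path, capability) -> {
            if (capability == null || capability.isEmpty()) {
                throw new IllegalArgumentException("capability must be non-empty");
            }
            String key = capability.toLowerCase(Locale.ENGLISH);
            return !BLOCKED.contains(key) && inner.hasCapability(path, key);
        };
    }

    public static void main(String[] args) {
        CapabilityProbe probe = filtered((path, capability) -> true);
        System.out.println(probe.hasCapability("/data", "fs.capability.paths.append"));
        System.out.println(probe.hasCapability("/data", "fs.capability.paths.acls"));
    }
}
```

      A false result here only means the wrapper does not declare the capability; as noted above, it does not distinguish "unknown" from "unsupported".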