org.apache.hadoop.fs
Class ChecksumFileSystem

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.fs.FileSystem
          extended by org.apache.hadoop.fs.FilterFileSystem
              extended by org.apache.hadoop.fs.ChecksumFileSystem
All Implemented Interfaces:
Closeable, Configurable
Direct Known Subclasses:
LocalFileSystem

@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class ChecksumFileSystem
extends FilterFileSystem

Abstract checksummed FileSystem. It provides a basic implementation of a checksummed FileSystem, which creates a checksum file for each raw file. It generates and verifies checksums on the client side.
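
As a rough illustration of client-side checksumming, the sketch below computes one CRC32 value per fixed-size chunk of data and later re-checks the data against those values. The names ChunkChecksumSketch, checksums and firstCorruptChunk are hypothetical and not part of Hadoop, and the use of one CRC32 per chunk is an assumption made for the example rather than something this page states.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical illustration of per-chunk checksums; not Hadoop code. */
final class ChunkChecksumSketch {

    /** Computes one CRC32 value for each chunk of bytesPerSum bytes. */
    static long[] checksums(byte[] data, int bytesPerSum) {
        if (bytesPerSum <= 0) {
            throw new IllegalArgumentException("bytesPerSum must be positive");
        }
        int chunks = (data.length + bytesPerSum - 1) / bytesPerSum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerSum;
            int len = Math.min(bytesPerSum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk that fails verification, or -1. */
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerSum) {
        long[] actual = checksums(data, bytesPerSum);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("chunk count does not match");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        long[] sums = checksums(data, 4);   // generated when writing
        data[5] ^= 1;                       // simulate corruption in chunk 1
        System.out.println(firstCorruptChunk(data, sums, 4));
    }
}
```

The writer would store the array of sums next to the data, and the reader would recompute and compare before handing bytes to the caller.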


Field Summary
 
Fields inherited from class org.apache.hadoop.fs.FilterFileSystem
fs, swapScheme
 
Fields inherited from class org.apache.hadoop.fs.FileSystem
DEFAULT_FS, FS_DEFAULT_NAME_KEY, LOG, SHUTDOWN_HOOK_PRIORITY, statistics
 
Constructor Summary
ChecksumFileSystem(FileSystem fs)
           
 
Method Summary
 FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
          Append to an existing file (optional operation).
 void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Called when we're all done writing to the target.
 void copyFromLocalFile(boolean delSrc, Path src, Path dst)
          The src file is on the local disk.
 void copyToLocalFile(boolean delSrc, Path src, Path dst)
          The src file is under FS, and the dst is on the local disk.
 void copyToLocalFile(Path src, Path dst, boolean copyCrc)
          The src file is under FS, and the dst is on the local disk.
 FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress)
          Create an FSDataOutputStream at the indicated Path with write-progress reporting.
 FSDataOutputStream createNonRecursive(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress)
          Opens an FSDataOutputStream at the indicated Path with write-progress reporting.
 boolean delete(Path f, boolean recursive)
          Implements delete(Path, boolean) for the checksum file system.
static double getApproxChkSumLength(long size)
           
 int getBytesPerSum()
          Return the number of bytes per checksum.
 Path getChecksumFile(Path file)
          Return the name of the checksum file associated with a file.
 long getChecksumFileLength(Path file, long fileSize)
          Return the length of the checksum file given the size of the actual file.
static long getChecksumLength(long size, int bytesPerSum)
          Calculate the length of the checksum file in bytes.
 FileSystem getRawFileSystem()
          Get the raw file system.
static boolean isChecksumFile(Path file)
          Return true iff file is a checksum file name.
 org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f)
          List the statuses of the files/directories in the given path if the path is a directory.
 FileStatus[] listStatus(Path f)
          List the statuses of the files/directories in the given path if the path is a directory.
 boolean mkdirs(Path f)
          Call FileSystem.mkdirs(Path, FsPermission) with default permission.
 FSDataInputStream open(Path f, int bufferSize)
          Opens an FSDataInputStream at the indicated Path.
 boolean rename(Path src, Path dst)
          Rename files or directories.
 boolean reportChecksumFailure(Path f, FSDataInputStream in, long inPos, FSDataInputStream sums, long sumsPos)
          Report a checksum error to the file system.
 void setConf(Configuration conf)
          Set the configuration to be used by this object.
 boolean setReplication(Path src, short replication)
          Set replication for an existing file.
 void setVerifyChecksum(boolean verifyChecksum)
          Set whether to verify checksum.
 void setWriteChecksum(boolean writeChecksum)
          Set the write checksum flag.
 Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Returns a local File that the user can write output to.
 
Methods inherited from class org.apache.hadoop.fs.FilterFileSystem
canonicalizeUri, checkPath, close, concat, copyFromLocalFile, copyFromLocalFile, createNonRecursive, createSnapshot, createSymlink, deleteSnapshot, getAclStatus, getCanonicalUri, getChildFileSystems, getConf, getDefaultBlockSize, getDefaultBlockSize, getDefaultReplication, getDefaultReplication, getFileBlockLocations, getFileChecksum, getFileChecksum, getFileLinkStatus, getFileStatus, getHomeDirectory, getInitialWorkingDirectory, getLinkTarget, getServerDefaults, getServerDefaults, getStatus, getUri, getUsed, getWorkingDirectory, getXAttr, getXAttrs, getXAttrs, initialize, listCorruptFileBlocks, listXAttrs, makeQualified, mkdirs, modifyAclEntries, primitiveCreate, primitiveMkdir, removeAcl, removeAclEntries, removeDefaultAcl, removeXAttr, renameSnapshot, resolveLink, resolvePath, setAcl, setOwner, setPermission, setTimes, setWorkingDirectory, setXAttr, setXAttr, supportsSymlinks
 
Methods inherited from class org.apache.hadoop.fs.FileSystem
append, append, areSymlinksEnabled, cancelDeleteOnExit, clearStatistics, closeAll, closeAllForUGI, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createNewFile, createNonRecursive, createSnapshot, delete, deleteOnExit, enableSymlinks, exists, fixRelativePart, get, get, get, getAllStatistics, getBlockSize, getContentSummary, getDefaultPort, getDefaultUri, getFileBlockLocations, getFileSystemClass, getFSofPath, getLength, getLocal, getName, getNamed, getReplication, getScheme, getStatistics, getStatistics, getStatus, globStatus, globStatus, isDirectory, isFile, listFiles, listLocatedStatus, listStatus, listStatus, listStatus, mkdirs, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveMkdir, printStatistics, processDeleteOnExit, rename, setDefaultUri, setDefaultUri
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ChecksumFileSystem

public ChecksumFileSystem(FileSystem fs)
Method Detail

getApproxChkSumLength

public static double getApproxChkSumLength(long size)

setConf

public void setConf(Configuration conf)
Description copied from interface: Configurable
Set the configuration to be used by this object.

Specified by:
setConf in interface Configurable
Overrides:
setConf in class Configured

setVerifyChecksum

public void setVerifyChecksum(boolean verifyChecksum)
Set whether to verify checksum.

Overrides:
setVerifyChecksum in class FilterFileSystem

setWriteChecksum

public void setWriteChecksum(boolean writeChecksum)
Description copied from class: FileSystem
Set the write checksum flag. This is only applicable if the corresponding FileSystem supports checksums. By default, it does nothing.

Overrides:
setWriteChecksum in class FilterFileSystem

getRawFileSystem

public FileSystem getRawFileSystem()
Get the raw file system.

Overrides:
getRawFileSystem in class FilterFileSystem
Returns:
FileSystem being filtered

getChecksumFile

public Path getChecksumFile(Path file)
Return the name of the checksum file associated with a file.


isChecksumFile

public static boolean isChecksumFile(Path file)
Return true iff file is a checksum file name.
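
The relationship between getChecksumFile and isChecksumFile can be sketched on plain file names. The convention assumed here, a leading dot plus a ".crc" suffix, is not spelled out on this page, so treat it as an assumption. ChecksumNameSketch and its methods are hypothetical helpers, not Hadoop API.

```java
/** Hypothetical naming helpers; the ".name.crc" convention is an assumption. */
final class ChecksumNameSketch {

    /** Builds the assumed checksum file name for a data file name. */
    static String checksumName(String fileName) {
        return "." + fileName + ".crc";
    }

    /** Returns true if the name follows the assumed checksum naming pattern. */
    static boolean isChecksumName(String fileName) {
        return fileName.startsWith(".") && fileName.endsWith(".crc");
    }

    public static void main(String[] args) {
        String sum = checksumName("part-00000");
        System.out.println(sum + " " + isChecksumName(sum));
    }
}
```

Under this assumption, any name produced by checksumName is recognized by isChecksumName, which is the property a listing or copy routine would rely on to tell data files from checksum files.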


getChecksumFileLength

public long getChecksumFileLength(Path file,
                                  long fileSize)
Return the length of the checksum file given the size of the actual file.


getBytesPerSum

public int getBytesPerSum()
Return the number of bytes per checksum.


open

public FSDataInputStream open(Path f,
                              int bufferSize)
                       throws IOException
Opens an FSDataInputStream at the indicated Path.

Overrides:
open in class FilterFileSystem
Parameters:
f - the file name to open
bufferSize - the size of the buffer to be used.
Throws:
IOException

append

public FSDataOutputStream append(Path f,
                                 int bufferSize,
                                 Progressable progress)
                          throws IOException
Description copied from class: FileSystem
Append to an existing file (optional operation).

Overrides:
append in class FilterFileSystem
Parameters:
f - the existing file to be appended.
bufferSize - the size of the buffer to be used.
progress - for reporting progress if it is not null.
Throws:
IOException

getChecksumLength

public static long getChecksumLength(long size,
                                     int bytesPerSum)
Calculate the length of the checksum file in bytes.

Parameters:
size - the length of the data file in bytes
bytesPerSum - the number of bytes in a checksum block
Returns:
the number of bytes in the checksum file
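
The arithmetic behind such a length calculation can be sketched as follows. The constants are assumptions for illustration only: a 4-byte checksum per block of bytesPerSum bytes, plus an 8-byte header at the start of the checksum file. ChecksumLengthSketch is a hypothetical class, not the Hadoop implementation.

```java
/** Hypothetical length arithmetic; the constants below are assumptions. */
final class ChecksumLengthSketch {

    static final int CHECKSUM_BYTES = 4; // assumed width of one checksum
    static final int HEADER_BYTES = 8;   // assumed fixed header size

    /** Rounds the data size up to whole checksum blocks and adds the header. */
    static long checksumLength(long size, int bytesPerSum) {
        if (size < 0 || bytesPerSum <= 0) {
            throw new IllegalArgumentException("invalid size or bytesPerSum");
        }
        long blocks = (size + bytesPerSum - 1) / bytesPerSum;
        return blocks * CHECKSUM_BYTES + HEADER_BYTES;
    }

    public static void main(String[] args) {
        // 1024 data bytes in 512-byte blocks: 2 blocks * 4 + 8 = 16
        System.out.println(checksumLength(1024, 512));
    }
}
```

Note the ceiling division: a trailing partial block still needs its own checksum, so 513 bytes with 512-byte blocks count as two blocks.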

create

public FSDataOutputStream create(Path f,
                                 FsPermission permission,
                                 boolean overwrite,
                                 int bufferSize,
                                 short replication,
                                 long blockSize,
                                 Progressable progress)
                          throws IOException
Description copied from class: FileSystem
Create an FSDataOutputStream at the indicated Path with write-progress reporting.

Overrides:
create in class FilterFileSystem
Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
Throws:
IOException
See Also:
FileSystem.setPermission(Path, FsPermission)

createNonRecursive

public FSDataOutputStream createNonRecursive(Path f,
                                             FsPermission permission,
                                             boolean overwrite,
                                             int bufferSize,
                                             short replication,
                                             long blockSize,
                                             Progressable progress)
                                      throws IOException
Description copied from class: FileSystem
Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.

Overrides:
createNonRecursive in class FileSystem
Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
Throws:
IOException
See Also:
FileSystem.setPermission(Path, FsPermission)

setReplication

public boolean setReplication(Path src,
                              short replication)
                       throws IOException
Set replication for an existing file. Implements the abstract setReplication of FileSystem.

Overrides:
setReplication in class FilterFileSystem
Parameters:
src - file name
replication - new replication
Returns:
true if successful; false if file does not exist or is a directory
Throws:
IOException

rename

public boolean rename(Path src,
                      Path dst)
               throws IOException
Rename files or directories.

Overrides:
rename in class FilterFileSystem
Parameters:
src - path to be renamed
dst - new path after rename
Returns:
true if rename is successful
Throws:
IOException - on failure

delete

public boolean delete(Path f,
                      boolean recursive)
               throws IOException
Implements delete(Path, boolean) for the checksum file system.

Overrides:
delete in class FilterFileSystem
Parameters:
f - the path to delete.
recursive - if the path is a directory and this is set to true, the directory is deleted; otherwise an exception is thrown. For a file, recursive can be set to either true or false.
Returns:
true if delete is successful else false.
Throws:
IOException

listStatus

public FileStatus[] listStatus(Path f)
                        throws IOException
List the statuses of the files/directories in the given path if the path is a directory.

Overrides:
listStatus in class FilterFileSystem
Parameters:
f - given path
Returns:
the statuses of the files/directories in the given path
Throws:
FileNotFoundException - when the path does not exist
IOException - see specific implementation

listLocatedStatus

public org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f)
                                                                         throws IOException
List the statuses of the files/directories in the given path if the path is a directory.

Overrides:
listLocatedStatus in class FilterFileSystem
Parameters:
f - given path
Returns:
the statuses of the files/directories in the given path
Throws:
IOException
FileNotFoundException - If f does not exist

mkdirs

public boolean mkdirs(Path f)
               throws IOException
Description copied from class: FileSystem
Call FileSystem.mkdirs(Path, FsPermission) with default permission.

Overrides:
mkdirs in class FileSystem
Throws:
IOException

copyFromLocalFile

public void copyFromLocalFile(boolean delSrc,
                              Path src,
                              Path dst)
                       throws IOException
Description copied from class: FilterFileSystem
The src file is on the local disk. Add it to FS at the given dst name. delSrc indicates if the source should be removed

Overrides:
copyFromLocalFile in class FilterFileSystem
Parameters:
delSrc - whether to delete the src
src - path
dst - path
Throws:
IOException

copyToLocalFile

public void copyToLocalFile(boolean delSrc,
                            Path src,
                            Path dst)
                     throws IOException
The src file is under FS, and the dst is on the local disk. Copy it from FS control to the local dst name.

Overrides:
copyToLocalFile in class FilterFileSystem
Parameters:
delSrc - whether to delete the src
src - path
dst - path
Throws:
IOException

copyToLocalFile

public void copyToLocalFile(Path src,
                            Path dst,
                            boolean copyCrc)
                     throws IOException
The src file is under FS, and the dst is on the local disk. Copy it from FS control to the local dst name. If src and dst are directories, the copyCrc parameter determines whether to copy CRC files.

Throws:
IOException

startLocalOutput

public Path startLocalOutput(Path fsOutputFile,
                             Path tmpLocalFile)
                      throws IOException
Description copied from class: FilterFileSystem
Returns a local File that the user can write output to. The caller provides both the eventual FS target name and the local working file. If the FS is local, we write directly into the target. If the FS is remote, we write into the tmp local area.

Overrides:
startLocalOutput in class FilterFileSystem
Parameters:
fsOutputFile - path of output file
tmpLocalFile - path of local tmp file
Throws:
IOException

completeLocalOutput

public void completeLocalOutput(Path fsOutputFile,
                                Path tmpLocalFile)
                         throws IOException
Description copied from class: FilterFileSystem
Called when we're all done writing to the target. A local FS will do nothing, because we've written to exactly the right place. A remote FS will copy the contents of tmpLocalFile to the correct target at fsOutputFile.

Overrides:
completeLocalOutput in class FilterFileSystem
Parameters:
fsOutputFile - path of output file
tmpLocalFile - path to local tmp file
Throws:
IOException
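
The startLocalOutput / completeLocalOutput pairing described above can be sketched in plain Java. This is a simplified model under stated assumptions: LocalOutputSketch and its boolean "local" flag are hypothetical, the paths are java.nio.file.Path rather than Hadoop's Path, and the remote case is modeled as an ordinary file copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical model of the local-output protocol; not Hadoop code. */
final class LocalOutputSketch {

    private final boolean local; // assumed flag: is the target FS local?

    LocalOutputSketch(boolean local) {
        this.local = local;
    }

    /** Local FS: write straight to the target. Remote FS: write to the tmp file. */
    Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile) {
        return local ? fsOutputFile : tmpLocalFile;
    }

    /** Local FS: nothing to do. Remote FS: copy the tmp file to the target. */
    void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile) {
        if (local) {
            return;
        }
        try {
            Files.copy(tmpLocalFile, fsOutputFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path target = Paths.get("out", "part-0");
        Path tmp = Paths.get("tmp", "part-0");
        System.out.println(new LocalOutputSketch(false).startLocalOutput(target, tmp));
    }
}
```

A caller writes to whatever path startLocalOutput returns and then always calls completeLocalOutput with the same two arguments, so the calling code stays identical for local and remote targets.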

reportChecksumFailure

public boolean reportChecksumFailure(Path f,
                                     FSDataInputStream in,
                                     long inPos,
                                     FSDataInputStream sums,
                                     long sumsPos)
Report a checksum error to the file system.

Parameters:
f - the file name containing the error
in - the stream open on the file
inPos - the position of the beginning of the bad data in the file
sums - the stream open on the checksum file
sumsPos - the position of the beginning of the bad data in the checksum file
Returns:
true if a retry is necessary
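
One way a reader might use a boolean "retry is necessary" result is sketched below. Everything in it is hypothetical: ChecksumRetrySketch, the ChecksumMismatch exception, and the maxAttempts bound are illustrative assumptions, and the failure report is modeled as a BooleanSupplier rather than the real method signature.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical retry loop driven by a failure report; not Hadoop code. */
final class ChecksumRetrySketch {

    /** Thrown by the hypothetical reader when verification fails. */
    static final class ChecksumMismatch extends RuntimeException {
        final long position;

        ChecksumMismatch(long position) {
            super("checksum mismatch at position " + position);
            this.position = position;
        }
    }

    /**
     * Calls read until it succeeds, the report declines a retry,
     * or maxAttempts is reached. Rethrows the last mismatch on failure.
     */
    static byte[] readWithRetry(Supplier<byte[]> read,
                                BooleanSupplier reportFailure,
                                int maxAttempts) {
        if (maxAttempts <= 0) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        ChecksumMismatch last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return read.get();
            } catch (ChecksumMismatch e) {
                last = e;
                if (!reportFailure.getAsBoolean()) {
                    break; // the file system says a retry will not help
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        byte[] out = readWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new ChecksumMismatch(0L); // first attempt fails
            }
            return new byte[] {1, 2};
        }, () -> true, 3);
        System.out.println(out.length + " bytes after " + calls[0] + " attempts");
    }
}
```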


Copyright © 2014 Apache Software Foundation. All Rights Reserved.