org.apache.hadoop.fs
Class HarFileSystem

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.fs.FileSystem
          extended by org.apache.hadoop.fs.FilterFileSystem
              extended by org.apache.hadoop.fs.HarFileSystem
All Implemented Interfaces:
Closeable, Configurable

public class HarFileSystem
extends FilterFileSystem

This is an implementation of the Hadoop Archive filesystem. An archive stores its contents in files of the form part-* and its indexes in two files, _masterindex and _index. The index file records where each real file is stored and is sorted by the hash code of the paths it contains. The master index is a level of indirection into the index file that makes lookups faster: it contains pointers to the positions in the index for ranges of hash codes.
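The two-level lookup described above can be sketched in plain Java. This is a minimal illustration, not the actual on-disk format or the library's code: the classes IndexEntry and MasterEntry, the stride parameter, and the hash function (a non-negative String hash) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HarIndexSketch {
    /** Hypothetical index line: a path, its hash, and where its bytes live in a part file. */
    public static final class IndexEntry {
        public final int hash;
        public final String path;
        public final String partFile;
        public final long start;
        public final long length;

        public IndexEntry(String path, String partFile, long start, long length) {
            this.hash = hash(path);
            this.path = path;
            this.partFile = partFile;
            this.start = start;
            this.length = length;
        }
    }

    /** Hypothetical master index line: a hash range and the slice of the index that covers it. */
    public static final class MasterEntry {
        public final int startHash;
        public final int endHash;
        public final int from; // first index position, inclusive
        public final int to;   // last index position, exclusive

        public MasterEntry(int startHash, int endHash, int from, int to) {
            this.startHash = startHash;
            this.endHash = endHash;
            this.from = from;
            this.to = to;
        }
    }

    /** Illustrative hash only (an assumption): a non-negative String hash. */
    public static int hash(String path) {
        return path.hashCode() & 0x7fffffff;
    }

    /** Sorts the index by hash and groups every `stride` entries under one master entry. */
    public static List<MasterEntry> buildMaster(List<IndexEntry> index, int stride) {
        index.sort(Comparator.comparingInt((IndexEntry e) -> e.hash));
        List<MasterEntry> master = new ArrayList<>();
        for (int from = 0; from < index.size(); from += stride) {
            int to = Math.min(from + stride, index.size());
            master.add(new MasterEntry(index.get(from).hash, index.get(to - 1).hash, from, to));
        }
        return master;
    }

    /** Uses the master index to pick index slices, then scans only those slices. */
    public static IndexEntry lookup(List<MasterEntry> master, List<IndexEntry> index, String path) {
        int h = hash(path);
        for (MasterEntry m : master) {
            if (h < m.startHash || h > m.endHash) {
                continue; // this slice cannot contain the path
            }
            for (int i = m.from; i < m.to; i++) {
                IndexEntry e = index.get(i);
                if (e.hash == h && e.path.equals(path)) {
                    return e;
                }
            }
        }
        return null; // path is not in the archive
    }

    public static void main(String[] args) {
        List<IndexEntry> index = new ArrayList<>();
        index.add(new IndexEntry("/logs/a.txt", "part-0", 0L, 10L));
        index.add(new IndexEntry("/logs/b.txt", "part-0", 10L, 25L));
        index.add(new IndexEntry("/logs/c.txt", "part-1", 0L, 7L));
        List<MasterEntry> master = buildMaster(index, 2);
        IndexEntry e = lookup(master, index, "/logs/b.txt");
        System.out.println(e.partFile + " @ " + e.start); // prints "part-0 @ 10"
    }
}
```

The point of the indirection is that a lookup only scans the index slices whose hash range can contain the path, instead of the whole index.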


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.fs.FileSystem
FileSystem.Statistics
 
Field Summary
static int VERSION
           
 
Fields inherited from class org.apache.hadoop.fs.FilterFileSystem
fs
 
Fields inherited from class org.apache.hadoop.fs.FileSystem
FS_DEFAULT_NAME_KEY, LOG, statistics
 
Constructor Summary
HarFileSystem()
          Public constructor for HarFileSystem.
HarFileSystem(FileSystem fs)
          Constructor to create a HarFileSystem with an underlying filesystem.
 
Method Summary
 void close()
          No more filesystem operations are needed.
 void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Not implemented.
 void copyFromLocalFile(boolean delSrc, Path src, Path dst)
          Not implemented.
 void copyToLocalFile(boolean delSrc, Path src, Path dst)
          Copies the file in the har filesystem to a local file.
 FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress)
          Opens an FSDataOutputStream at the indicated Path with write-progress reporting.
 FSDataOutputStream create(Path f, int bufferSize)
           
 boolean delete(Path f, boolean recursive)
          Not implemented.
 String getCanonicalServiceName()
          Get a canonical service name for this file system.
 BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
          Get block locations from the underlying fs and fix their offsets and lengths.
 FileChecksum getFileChecksum(Path f)
          Get the checksum of a file.
 FileStatus getFileStatus(Path f)
          Returns the FileStatus of files in the har archive.
static int getHarHash(Path p)
          Returns the hash of the path p inside the filesystem.
 int getHarVersion()
           
 Path getHomeDirectory()
          Returns the top-level archive path.
 URI getUri()
          Returns the uri of this filesystem.
 Path getWorkingDirectory()
          Returns the top-level archive.
 void initialize(URI name, Configuration conf)
          Initialize a Har filesystem per har archive.
 FileStatus[] listStatus(Path f)
          Returns the children of a directory after looking up the index files.
 Path makeQualified(Path path)
          Make sure that a path specifies a FileSystem.
 boolean mkdirs(Path f, FsPermission permission)
          Not implemented.
 FSDataInputStream open(Path f, int bufferSize)
          Returns a har input stream which fakes end of file.
 void setOwner(Path p, String username, String groupname)
          Not implemented.
 void setPermission(Path p, FsPermission permisssion)
          Not implemented.
 boolean setReplication(Path src, short replication)
          Not implemented.
 void setWorkingDirectory(Path newDir)
          Set the current working directory for the given file system.
 Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Not implemented.
 
Methods inherited from class org.apache.hadoop.fs.FilterFileSystem
append, checkPath, concat, delete, getConf, getDefaultBlockSize, getDefaultReplication, getName, rename, setVerifyChecksum
 
Methods inherited from class org.apache.hadoop.fs.FileSystem
addFileSystemForTesting, append, append, clearStatistics, closeAll, closeAllForUGI, copyFromLocalFile, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, createNewFile, createNonRecursive, createNonRecursive, deleteOnExit, exists, get, get, get, getAllStatistics, getBlockSize, getCacheSize, getCanonicalUri, getContentSummary, getDefaultBlockSize, getDefaultPort, getDefaultReplication, getDefaultUri, getDelegationToken, getLength, getLocal, getNamed, getReplication, getStatistics, getStatistics, getUsed, globStatus, globStatus, isDirectory, isFile, listStatus, listStatus, listStatus, mkdirs, mkdirs, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, open, printStatistics, processDeleteOnExit, setDefaultUri, setDefaultUri, setTimes
 
Methods inherited from class org.apache.hadoop.conf.Configured
setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

VERSION

public static final int VERSION
See Also:
Constant Field Values
Constructor Detail

HarFileSystem

public HarFileSystem()
Public constructor for HarFileSystem.


HarFileSystem

public HarFileSystem(FileSystem fs)
Constructor to create a HarFileSystem with an underlying filesystem.

Parameters:
fs - the underlying filesystem
Method Detail

initialize

public void initialize(URI name,
                       Configuration conf)
                throws IOException
Initializes a Har filesystem per har archive. The archive home directory is the top-level directory in the filesystem that contains the HAR archive. Be careful with this method: you do not want to create a new FileSystem instance on every call to path.getFileSystem(). The URI of a Har filesystem is har://underlyingfsscheme-host:port/archivepath or har:///archivepath. When the underlying filesystem is not specified, as in the second form, the default filesystem is used.

Overrides:
initialize in class FilterFileSystem
Parameters:
name - a uri whose authority section names the host, port, etc. for this FileSystem
conf - the configuration
Throws:
IOException
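The URI form above packs the underlying scheme and host into the har authority, separated by a hyphen. The following sketch shows one way such an authority could be split using only java.net.URI. The class and method names are hypothetical, and the parsing rule (split at the first hyphen) is an assumption drawn from the documented URI form, not the library's code.

```java
import java.net.URI;

public class HarUriSketch {
    /**
     * Returns the underlying filesystem URI named in a har URI's authority,
     * or null when the authority is empty (har:///archivepath), in which case
     * the caller would fall back to its default filesystem.
     */
    public static URI underlyingUri(URI har) {
        if (!"har".equals(har.getScheme())) {
            throw new IllegalArgumentException("not a har URI: " + har);
        }
        String authority = har.getAuthority();
        if (authority == null) {
            return null; // no underlying filesystem specified
        }
        int dash = authority.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("expected scheme-host:port in: " + authority);
        }
        String scheme = authority.substring(0, dash);    // e.g. "hdfs"
        String hostPort = authority.substring(dash + 1); // e.g. "namenode:8020"
        return URI.create(scheme + "://" + hostPort);
    }

    public static void main(String[] args) {
        // Host name, port, and archive path below are made up for the example.
        System.out.println(underlyingUri(URI.create("har://hdfs-namenode:8020/user/data.har")));
        System.out.println(underlyingUri(URI.create("har:///user/data.har")));
    }
}
```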

getHarVersion

public int getHarVersion()
                  throws IOException
Throws:
IOException

getWorkingDirectory

public Path getWorkingDirectory()
Returns the top-level archive.

Overrides:
getWorkingDirectory in class FilterFileSystem
Returns:
the directory pathname

getUri

public URI getUri()
Returns the URI of this filesystem. The URI is of the form har://underlyingfsscheme-host:port/pathintheunderlyingfs.

Overrides:
getUri in class FilterFileSystem

getCanonicalServiceName

public String getCanonicalServiceName()
Description copied from class: FileSystem
Get a canonical service name for this file system. The token cache is the only user of this value, and uses it to lookup this filesystem's service tokens. The token cache will not attempt to acquire tokens if the service is null.

Overrides:
getCanonicalServiceName in class FilterFileSystem
Returns:
a service string that uniquely identifies this file system, null if the filesystem does not implement tokens
See Also:
SecurityUtil.buildDTServiceName(URI, int)

makeQualified

public Path makeQualified(Path path)
Description copied from class: FilterFileSystem
Make sure that a path specifies a FileSystem.

Overrides:
makeQualified in class FilterFileSystem

getFileBlockLocations

public BlockLocation[] getFileBlockLocations(FileStatus file,
                                             long start,
                                             long len)
                                      throws IOException
Get block locations from the underlying fs and fix their offsets and lengths.

Overrides:
getFileBlockLocations in class FilterFileSystem
Parameters:
file - the input filestatus to get block locations
start - the start of the desired range in the contained file
len - the length of the desired range
Returns:
block locations for this segment of file
Throws:
IOException
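Because an archived file occupies only a slice of a part file, block locations reported for the part file have to be translated into the archived file's own coordinates. The sketch below illustrates that arithmetic with plain numbers. The method name and parameters are hypothetical, and the clipping rule is an assumption about what "fix their offsets and lengths" involves.

```java
public class HarBlockOffsetSketch {
    /**
     * Translates one part-file block into the coordinates of the archived file.
     *
     * @param blockOffset      offset of the block within the part file
     * @param blockLength      length of the block
     * @param fileOffsetInPart where the archived file starts within the part file
     * @param start            start of the requested range within the archived file
     * @param len              length of the requested range
     * @return {offset, length} relative to the archived file, clipped to the
     *         requested range, or null if the block does not overlap the range
     */
    public static long[] fix(long blockOffset, long blockLength,
                             long fileOffsetInPart, long start, long len) {
        long rangeStart = fileOffsetInPart + start; // requested range in part-file coordinates
        long rangeEnd = rangeStart + len;
        long clippedStart = Math.max(blockOffset, rangeStart);
        long clippedEnd = Math.min(blockOffset + blockLength, rangeEnd);
        if (clippedEnd <= clippedStart) {
            return null; // block lies entirely outside the requested range
        }
        return new long[] { clippedStart - fileOffsetInPart, clippedEnd - clippedStart };
    }

    public static void main(String[] args) {
        // Archived file starts at byte 100 of the part file; request its first 50 bytes.
        long[] first = fix(0, 128, 100, 0, 50);    // part-file block [0, 128)
        long[] second = fix(128, 128, 100, 0, 50); // part-file block [128, 256)
        System.out.println(first[0] + "+" + first[1]);   // prints "0+28"
        System.out.println(second[0] + "+" + second[1]); // prints "28+22"
    }
}
```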

getHarHash

public static int getHarHash(Path p)
Returns the hash of the path p inside the filesystem.

Parameters:
p - the path in the harfilesystem
Returns:
the hash code of the path.

getFileStatus

public FileStatus getFileStatus(Path f)
                         throws IOException
Returns the FileStatus of files in the har archive. The permissions returned are those of the archive index files; permissions are not persisted when a Hadoop archive is created.

Overrides:
getFileStatus in class FilterFileSystem
Parameters:
f - the path in har filesystem
Returns:
the FileStatus of the given path.
Throws:
IOException
FileNotFoundException - when the path does not exist; IOException see specific implementation

getFileChecksum

public FileChecksum getFileChecksum(Path f)
Description copied from class: FilterFileSystem
Get the checksum of a file.

Overrides:
getFileChecksum in class FilterFileSystem
Parameters:
f - The file path
Returns:
null since no checksum algorithm is implemented.

open

public FSDataInputStream open(Path f,
                              int bufferSize)
                       throws IOException
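Returns a har input stream which fakes end of file: the stream reports end of file at the end of the archived file, not at the end of the part file that holds it. It reads the index files to get the part file name and the size and start of the file.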

Overrides:
open in class FilterFileSystem
Parameters:
f - the file name to open
bufferSize - the size of the buffer to be used.
Throws:
IOException
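The idea of a stream that "fakes end of file" can be sketched with a plain java.io wrapper: given the start and size of an archived file inside a part file, the wrapper skips to the start and reports end of file once the size has been consumed. The class below is a hypothetical illustration, not the stream class used by HarFileSystem.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BoundedStreamSketch extends InputStream {
    private final InputStream in;
    private long remaining; // bytes of the archived file not yet read

    public BoundedStreamSketch(InputStream partFile, long start, long length) throws IOException {
        this.in = partFile;
        this.remaining = length;
        long toSkip = start;
        while (toSkip > 0) { // position the stream at the start of the archived file
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                throw new IOException("could not skip to offset " + start);
            }
            toSkip -= skipped;
        }
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // fake end of file at the archived file's boundary
        }
        int b = in.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1; // fake end of file at the archived file's boundary
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        // A made-up "part file" holding three archived files back to back.
        byte[] part = "AAAAhelloBBBB".getBytes(StandardCharsets.US_ASCII);
        try (InputStream s = new BoundedStreamSketch(new ByteArrayInputStream(part), 4, 5)) {
            System.out.println(new String(s.readAllBytes(), StandardCharsets.US_ASCII)); // prints "hello"
        }
    }
}
```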

create

public FSDataOutputStream create(Path f,
                                 int bufferSize)
                          throws IOException
Throws:
IOException

create

public FSDataOutputStream create(Path f,
                                 FsPermission permission,
                                 boolean overwrite,
                                 int bufferSize,
                                 short replication,
                                 long blockSize,
                                 Progressable progress)
                          throws IOException
Description copied from class: FileSystem
Opens an FSDataOutputStream at the indicated Path with write-progress reporting.

Overrides:
create in class FilterFileSystem
Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
Throws:
IOException
See Also:
FileSystem.setPermission(Path, FsPermission)

close

public void close()
           throws IOException
Description copied from class: FileSystem
No more filesystem operations are needed. Will release any held locks.

Specified by:
close in interface Closeable
Overrides:
close in class FilterFileSystem
Throws:
IOException

setReplication

public boolean setReplication(Path src,
                              short replication)
                       throws IOException
Not implemented.

Overrides:
setReplication in class FilterFileSystem
Parameters:
src - file name
replication - new replication
Returns:
true if successful; false if file does not exist or is a directory
Throws:
IOException

delete

public boolean delete(Path f,
                      boolean recursive)
               throws IOException
Not implemented.

Overrides:
delete in class FilterFileSystem
Parameters:
f - the path to delete.
recursive - if the path is a directory and recursive is true, the directory is deleted; otherwise an exception is thrown. For a file, recursive can be either true or false.
Returns:
true if delete is successful else false.
Throws:
IOException

listStatus

public FileStatus[] listStatus(Path f)
                        throws IOException
Returns the children of a directory after looking up the index files.

Overrides:
listStatus in class FilterFileSystem
Parameters:
f - given path
Returns:
the statuses of the files/directories in the given path; returns null if Path f does not exist in the FileSystem
Throws:
IOException

getHomeDirectory

public Path getHomeDirectory()
Returns the top-level archive path.

Overrides:
getHomeDirectory in class FilterFileSystem

setWorkingDirectory

public void setWorkingDirectory(Path newDir)
Description copied from class: FilterFileSystem
Set the current working directory for the given file system. All relative paths will be resolved relative to it.

Overrides:
setWorkingDirectory in class FilterFileSystem

mkdirs

public boolean mkdirs(Path f,
                      FsPermission permission)
               throws IOException
Not implemented.

Overrides:
mkdirs in class FilterFileSystem
Throws:
IOException

copyFromLocalFile

public void copyFromLocalFile(boolean delSrc,
                              Path src,
                              Path dst)
                       throws IOException
Not implemented.

Overrides:
copyFromLocalFile in class FilterFileSystem
Throws:
IOException

copyToLocalFile

public void copyToLocalFile(boolean delSrc,
                            Path src,
                            Path dst)
                     throws IOException
Copies the file in the har filesystem to a local file.

Overrides:
copyToLocalFile in class FilterFileSystem
Throws:
IOException

startLocalOutput

public Path startLocalOutput(Path fsOutputFile,
                             Path tmpLocalFile)
                      throws IOException
Not implemented.

Overrides:
startLocalOutput in class FilterFileSystem
Throws:
IOException

completeLocalOutput

public void completeLocalOutput(Path fsOutputFile,
                                Path tmpLocalFile)
                         throws IOException
Not implemented.

Overrides:
completeLocalOutput in class FilterFileSystem
Throws:
IOException

setOwner

public void setOwner(Path p,
                     String username,
                     String groupname)
              throws IOException
Not implemented.

Overrides:
setOwner in class FilterFileSystem
Parameters:
p - The path
username - If it is null, the original username remains unchanged.
groupname - If it is null, the original groupname remains unchanged.
Throws:
IOException

setPermission

public void setPermission(Path p,
                          FsPermission permisssion)
                   throws IOException
Not implemented.

Overrides:
setPermission in class FilterFileSystem
Throws:
IOException


Copyright © 2009 The Apache Software Foundation