org.apache.hadoop.filecache
Class TrackerDistributedCacheManager

java.lang.Object
  extended by org.apache.hadoop.filecache.TrackerDistributedCacheManager

public class TrackerDistributedCacheManager
extends Object

Manages a single machine's instance of a cross-job cache. This class would typically be instantiated by a TaskTracker (or something that emulates it, like LocalJobRunner). This class is internal to Hadoop, and should not be treated as a public interface.


Nested Class Summary
protected  class TrackerDistributedCacheManager.BaseDirManager
          This class holds properties of each base directories and is responsible for clean up unused cache files in base directories.
protected  class TrackerDistributedCacheManager.CleanupThread
          A thread to check and cleanup the unused files periodically
 
Field Summary
protected  TrackerDistributedCacheManager.BaseDirManager baseDirManager
           
protected  TrackerDistributedCacheManager.CleanupThread cleanupThread
           
 
Constructor Summary
TrackerDistributedCacheManager(Configuration conf, TaskController controller)
           
 
Method Summary
static void createAllSymlink(Configuration conf, File jobCacheDir, File workDir)
          This method create symlinks for all files in a given dir in another directory.
static void determineTimestampsAndCacheVisibilities(Configuration job)
          Determines timestamps of files to be cached, and stores those in the configuration.
static long downloadCacheObject(Configuration conf, URI source, Path destination, long desiredTimestamp, boolean isArchive, FsPermission permission)
          Download a given path to the local file system.
static boolean[] getArchiveVisibilities(Configuration conf)
          Get the booleans on whether the archives are public or not.
static void getDelegationTokens(Configuration job, Credentials credentials)
          For each archive or cache file - get the corresponding delegation token
static boolean[] getFileVisibilities(Configuration conf)
          Get the booleans on whether the files are public or not.
protected  TaskDistributedCacheManager getTaskDistributedCacheManager(JobID jobId)
           
 TaskDistributedCacheManager newTaskDistributedCacheManager(JobID jobId, Configuration taskConf)
           
 void purgeCache()
          Clear the entire contents of the cache and delete the backing files.
 void removeTaskDistributedCacheManager(JobID jobId)
           
 void setArchiveSizes(JobID jobId, long[] sizes)
          Set the sizes for any archives, files, or directories in the private distributed cache.
 void startCleanupThread()
          Start the background thread
 void stopCleanupThread()
          Stop the background thread
static void validate(Configuration conf)
          This is part of the framework API.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

baseDirManager

protected TrackerDistributedCacheManager.BaseDirManager baseDirManager

cleanupThread

protected TrackerDistributedCacheManager.CleanupThread cleanupThread
Constructor Detail

TrackerDistributedCacheManager

public TrackerDistributedCacheManager(Configuration conf,
                                      TaskController controller)
                               throws IOException
Throws:
IOException
Method Detail

downloadCacheObject

public static long downloadCacheObject(Configuration conf,
                                       URI source,
                                       Path destination,
                                       long desiredTimestamp,
                                       boolean isArchive,
                                       FsPermission permission)
                                throws IOException
Download a given path to the local file system.

Parameters:
conf - the job's configuration
source - the source to copy from
destination - where to copy the file. must be local fs
desiredTimestamp - the required modification timestamp of the source
isArchive - is this an archive that should be expanded
permission - the desired permissions of the file.
Returns:
for archives, the number of bytes in the unpacked directory
Throws:
IOException

createAllSymlink

public static void createAllSymlink(Configuration conf,
                                    File jobCacheDir,
                                    File workDir)
                             throws IOException
This method create symlinks for all files in a given dir in another directory. Should not be used outside of DistributedCache code.

Parameters:
conf - the configuration
jobCacheDir - the target directory for creating symlinks
workDir - the directory in which the symlinks are created
Throws:
IOException

purgeCache

public void purgeCache()
Clear the entire contents of the cache and delete the backing files. This should only be used when the server is reinitializing, because the users are going to lose their files.


newTaskDistributedCacheManager

public TaskDistributedCacheManager newTaskDistributedCacheManager(JobID jobId,
                                                                  Configuration taskConf)
                                                           throws IOException
Throws:
IOException

setArchiveSizes

public void setArchiveSizes(JobID jobId,
                            long[] sizes)
                     throws IOException
Set the sizes for any archives, files, or directories in the private distributed cache.

Throws:
IOException

removeTaskDistributedCacheManager

public void removeTaskDistributedCacheManager(JobID jobId)

getTaskDistributedCacheManager

protected TaskDistributedCacheManager getTaskDistributedCacheManager(JobID jobId)

determineTimestampsAndCacheVisibilities

public static void determineTimestampsAndCacheVisibilities(Configuration job)
                                                    throws IOException
Determines timestamps of files to be cached, and stores those in the configuration. Determines the visibilities of the distributed cache files and archives. The visibility of a cache path is "public" if the leaf component has READ permissions for others, and the parent subdirs have EXECUTE permissions for others. This is an internal method!

Parameters:
job -
Throws:
IOException

getFileVisibilities

public static boolean[] getFileVisibilities(Configuration conf)
Get the booleans on whether the files are public or not. Used by internal DistributedCache and MapReduce code.

Parameters:
conf - The configuration which stored the timestamps
Returns:
array of booleans
Throws:
IOException

getArchiveVisibilities

public static boolean[] getArchiveVisibilities(Configuration conf)
Get the booleans on whether the archives are public or not. Used by internal DistributedCache and MapReduce code.

Parameters:
conf - The configuration which stored the timestamps
Returns:
array of booleans

getDelegationTokens

public static void getDelegationTokens(Configuration job,
                                       Credentials credentials)
                                throws IOException
For each archive or cache file - get the corresponding delegation token

Parameters:
job -
credentials -
Throws:
IOException

validate

public static void validate(Configuration conf)
                     throws InvalidJobConfException
This is part of the framework API. It's called within the job submission code only, not by users. In the non-error case it has no side effects and returns normally. If there's a URI in both mapred.cache.files and mapred.cache.archives, it throws its exception.

Parameters:
conf - a Configuration to be cheked for duplication in cached URIs
Throws:
InvalidJobConfException

startCleanupThread

public void startCleanupThread()
Start the background thread


stopCleanupThread

public void stopCleanupThread()
Stop the background thread



Copyright © 2009 The Apache Software Foundation