@InterfaceAudience.Public @InterfaceStability.Stable @Deprecated public class DistributedCache extends org.apache.hadoop.mapreduce.filecache.DistributedCache
DistributedCache is a facility provided by the Map-Reduce
 framework to cache files (text, archives, jars etc.) needed by applications.
 
Applications specify the files to be cached via URLs (hdfs:// or http://)
 in the JobConf. The
 DistributedCache assumes that the files specified via URLs are
 already present on the FileSystem at the path specified by the URL
 and are accessible by every machine in the cluster.
The framework will copy the necessary files on to the worker node before any tasks for the job are executed on that node. Its efficiency stems from the fact that the files are only copied once per job and from its ability to cache archives, which are un-archived on the worker nodes.
DistributedCache can be used to distribute simple, read-only
 data/text files and/or more complex types such as archives, jars etc.
 Archives (zip, tar and tgz/tar.gz files) are un-archived at the worker nodes.
 Jars may be optionally added to the classpath of the tasks, a rudimentary
 software distribution mechanism.  Files have execution permissions.
 In older versions of Hadoop Map/Reduce, users could optionally ask for symlinks
 to be created in the working directory of the child task.  In the current
 version, symlinks are always created.  If the URL does not have a fragment,
 the name of the file or directory will be used as the link name. If multiple
 files or directories map to the same link name, the last one added will be
 used; all others will not even be downloaded.
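The link-naming rule above can be sketched in plain Java. This is a minimal illustration of the rule as described here, not the framework's own code: the class `CacheLinkNames` and its methods `linkName` and `resolve` are hypothetical names, and only hierarchical URIs with a path are assumed.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the documented rule; not part of Hadoop.
final class CacheLinkNames {

    /** Returns the URI fragment if present, otherwise the last path component. */
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        // Strip a trailing slash so a directory resolves to its own name.
        if (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Maps each link name to the URI that wins it; a later entry replaces an earlier one. */
    static Map<String, URI> resolve(List<URI> addedInOrder) {
        Map<String, URI> winners = new LinkedHashMap<>();
        for (URI uri : addedInOrder) {
            winners.put(linkName(uri), uri);
        }
        return winners;
    }

    public static void main(String[] args) {
        List<URI> uris = List.of(
            URI.create("/myapp/lookup.dat#lookup.dat"),
            URI.create("/myapp/map.zip"),
            URI.create("/other/map.zip"));
        // Both map.zip entries share a link name, so only the second survives.
        resolve(uris).forEach((name, uri) -> System.out.println(name + " -> " + uri));
    }
}
```

Giving each URI a distinct fragment is the simplest way to avoid such collisions when two cached items share a file name.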
DistributedCache tracks modification timestamps of the cache
 files. Clearly the cache files should not be modified by the application
 or externally while the job is executing.
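To illustrate why a modified cache file is a problem, the following sketch records a file's modification time and later checks it again. It uses only `java.nio.file` on a local temporary file; the class `CacheTimestampCheck` and its methods are hypothetical and do not reflect how the framework itself performs the check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical illustration of timestamp tracking; not Hadoop's implementation.
final class CacheTimestampCheck {

    /** Records the modification time (in milliseconds) seen at job setup. */
    static long recordTimestamp(Path file) throws IOException {
        return Files.getLastModifiedTime(file).toMillis();
    }

    /** Returns true if the file still carries the recorded modification time. */
    static boolean isUnchanged(Path file, long recordedMillis) throws IOException {
        return Files.getLastModifiedTime(file).toMillis() == recordedMillis;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lookup", ".dat");
        long recorded = recordTimestamp(file);
        System.out.println("unchanged: " + isUnchanged(file, recorded));
        // Simulate an external modification by moving the timestamp forward.
        Files.setLastModifiedTime(file, FileTime.fromMillis(recorded + 60_000L));
        System.out.println("unchanged: " + isUnchanged(file, recorded));
        Files.delete(file);
    }
}
```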
Here is an illustrative example of how to use the
 DistributedCache:
     // Setting up the cache for the application
     1. Copy the requisite files to the FileSystem:
     $ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat
     $ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip
     $ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
     $ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar
     $ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz
     $ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz
     2. Set up the application's JobConf:
     JobConf job = new JobConf();
     DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
                                   job);
     DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
     DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
     DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
     DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
     DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);
     3. Use the cached files in the Mapper or Reducer:
     public static class MapClass extends MapReduceBase
     implements Mapper<K, V, K, V> {
       private Path[] localArchives;
       private Path[] localFiles;
       public void configure(JobConf job) {
         // Cached files and archives are symlinked into the task's
         // working directory, so they can be opened by relative path
         File f = new File("./map.zip/some/file/in/zip.txt");
       }
       public void map(K key, V value,
                       OutputCollector<K, V> output, Reporter reporter)
       throws IOException {
         // Use data from the cached archives/files here
         // ...
         // ...
         output.collect(key, value);
       }
     }
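Once a cached file is symlinked into the working directory, a task can read it like any local file. The sketch below loads a lookup table into memory, as a configure method might do. The tab-separated key/value layout assumed for lookup.dat and the class name `LookupLoader` are illustrative assumptions, not something this page specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the tab-separated file format is an assumption.
final class LookupLoader {

    /** Reads "key<TAB>value" lines into a map, skipping lines without a tab. */
    static Map<String, String> load(Path file) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // malformed line: no separator
                }
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }
}
```

Inside a task this would be called with a relative path such as `Paths.get("lookup.dat")`, matching the link name given by the URI fragment in step 2.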
 It is also very common to use the DistributedCache by using
 GenericOptionsParser.
 This class includes methods that should be used by users
 (specifically those mentioned in the example above, as well
 as DistributedCache.addArchiveToClassPath(Path, Configuration)),
 as well as methods intended for use by the MapReduce framework
 (e.g., JobClient).

| Modifier and Type | Field and Description | 
|---|---|
| static String | CACHE_ARCHIVES Deprecated. | 
| static String | CACHE_ARCHIVES_SIZES Deprecated. | 
| static String | CACHE_ARCHIVES_TIMESTAMPS Deprecated. | 
| static String | CACHE_FILES Deprecated. | 
| static String | CACHE_FILES_SIZES Deprecated. | 
| static String | CACHE_FILES_TIMESTAMPS Deprecated. | 
| static String | CACHE_LOCALARCHIVES Deprecated. | 
| static String | CACHE_LOCALFILES Deprecated. | 
| static String | CACHE_SYMLINK Deprecated. | 

| Constructor and Description | 
|---|
| DistributedCache() Deprecated. | 

| Modifier and Type | Method and Description | 
|---|---|
| static void | addLocalArchives(Configuration conf, String str) Deprecated. | 
| static void | addLocalFiles(Configuration conf, String str) Deprecated. | 
| static void | createAllSymlink(Configuration conf, File jobCacheDir, File workDir) Deprecated. Internal to MapReduce framework. Use DistributedCacheManager instead. | 
| static FileStatus | getFileStatus(Configuration conf, URI cache) Deprecated. | 
| static long | getTimestamp(Configuration conf, URI cache) Deprecated. | 
| static void | setArchiveTimestamps(Configuration conf, String timestamps) Deprecated. | 
| static void | setFileTimestamps(Configuration conf, String timestamps) Deprecated. | 
| static void | setLocalArchives(Configuration conf, String str) Deprecated. | 
| static void | setLocalFiles(Configuration conf, String str) Deprecated. | 

Methods inherited from class org.apache.hadoop.mapreduce.filecache.DistributedCache:
addArchiveToClassPath, addArchiveToClassPath, addCacheArchive, addCacheFile, addFileToClassPath, addFileToClassPath, addFileToClassPath, checkURIs, createSymlink, getArchiveClassPaths, getArchiveTimestamps, getArchiveVisibilities, getCacheArchives, getCacheFiles, getFileClassPaths, getFileTimestamps, getFileVisibilities, getLocalCacheArchives, getLocalCacheFiles, getSymlink, setCacheArchives, setCacheFiles

@Deprecated public static final String CACHE_FILES_SIZES
CACHE_FILES_SIZES is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_FILES_SIZES

@Deprecated public static final String CACHE_ARCHIVES_SIZES
CACHE_ARCHIVES_SIZES is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_ARCHIVES_SIZES

@Deprecated public static final String CACHE_ARCHIVES_TIMESTAMPS
CACHE_ARCHIVES_TIMESTAMPS is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_ARCHIVES_TIMESTAMPS

@Deprecated public static final String CACHE_FILES_TIMESTAMPS
CACHE_FILES_TIMESTAMPS is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_FILE_TIMESTAMPS

@Deprecated public static final String CACHE_ARCHIVES
CACHE_ARCHIVES is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_ARCHIVES

@Deprecated public static final String CACHE_FILES
CACHE_FILES is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_FILES

@Deprecated public static final String CACHE_LOCALARCHIVES
CACHE_LOCALARCHIVES is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_LOCALARCHIVES

@Deprecated public static final String CACHE_LOCALFILES
CACHE_LOCALFILES is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_LOCALFILES

@Deprecated public static final String CACHE_SYMLINK
CACHE_SYMLINK is not a *public* constant.
 The variable is kept for M/R 1.x applications; M/R 2.x applications should
 use MRJobConfig.CACHE_SYMLINK

@Deprecated public static void addLocalArchives(Configuration conf, String str)
Parameters:
conf - The conf to modify to contain the localized caches
str - a comma-separated list of local archives

@Deprecated public static void addLocalFiles(Configuration conf, String str)
Parameters:
conf - The conf to modify to contain the localized caches
str - a comma-separated list of local files

@Deprecated public static void createAllSymlink(Configuration conf, File jobCacheDir, File workDir) throws IOException
Parameters:
conf - the configuration
jobCacheDir - the target directory for creating symlinks
workDir - the directory in which the symlinks are created
Throws:
IOException

@Deprecated public static FileStatus getFileStatus(Configuration conf, URI cache) throws IOException
Returns the FileStatus of a given cache file on hdfs. Internal to
 MapReduce.
Parameters:
conf - configuration
cache - cache file
Returns:
FileStatus of a given cache file on hdfs
Throws:
IOException

@Deprecated public static long getTimestamp(Configuration conf, URI cache) throws IOException
Parameters:
conf - configuration
cache - cache file
Throws:
IOException

@Deprecated public static void setArchiveTimestamps(Configuration conf, String timestamps)
Parameters:
conf - Configuration which stores the timestamps
timestamps - comma-separated list of timestamps of archives.
 The order should be the same as the order in which the archives are added.

@Deprecated public static void setFileTimestamps(Configuration conf, String timestamps)
Parameters:
conf - Configuration which stores the timestamps
timestamps - comma-separated list of timestamps of files.
 The order should be the same as the order in which the files are added.

@Deprecated public static void setLocalArchives(Configuration conf, String str)
Parameters:
conf - The conf to modify to contain the localized caches
str - a comma-separated list of local archives

@Deprecated public static void setLocalFiles(Configuration conf, String str)
Parameters:
conf - The conf to modify to contain the localized caches
str - a comma-separated list of local files

Copyright © 2018 Apache Software Foundation. All rights reserved.