Class Job

java.lang.Object
org.apache.hadoop.mapreduce.task.JobContextImpl
org.apache.hadoop.mapreduce.Job
All Implemented Interfaces:
AutoCloseable, JobContext, org.apache.hadoop.mapreduce.MRJobConfig

@Public @Evolving public class Job extends org.apache.hadoop.mapreduce.task.JobContextImpl implements JobContext, AutoCloseable
The job submitter's view of the Job.

It allows the user to configure the job, submit it, control its execution, and query the state. The set methods only work until the job is submitted, afterwards they will throw an IllegalStateException.

Normally the user creates the application, describes various facets of the job via Job and then submits the job and monitor its progress.

Here is an example on how to submit a job:

     // Create a new Job
     Job job = Job.getInstance();
     job.setJarByClass(MyJob.class);
     
     // Specify various job-specific parameters     
     job.setJobName("myjob");
     
     job.setInputPath(new Path("in"));
     job.setOutputPath(new Path("out"));
     
     job.setMapperClass(MyJob.MyMapper.class);
     job.setReducerClass(MyJob.MyReducer.class);

     // Submit the job, then poll for progress until the job is complete
     job.waitForCompletion(true);
 
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static enum 
     
    static enum 
     
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    Key in mapred-*.xml that sets completionPollInvervalMillis
    static final int
     
    static final boolean
     
    static final String
     
    static final String
    Key in mapred-*.xml that sets progMonitorPollIntervalMillis
    static final String
     
    static final String
     
    static final String
     

    Fields inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl

    conf, credentials, ugi

    Fields inherited from interface org.apache.hadoop.mapreduce.MRJobConfig

    AM_NODE_LABEL_EXP, AM_STRICT_LOCALITY, APPLICATION_ATTEMPT_ID, APPLICATION_MASTER_CLASS, ARCHIVES_FOR_SHARED_CACHE, CACHE_ARCHIVES, CACHE_ARCHIVES_SHARED_CACHE_UPLOAD_POLICIES, CACHE_ARCHIVES_SIZES, CACHE_ARCHIVES_TIMESTAMPS, CACHE_ARCHIVES_VISIBILITIES, CACHE_FILE_TIMESTAMPS, CACHE_FILE_VISIBILITIES, CACHE_FILES, CACHE_FILES_SHARED_CACHE_UPLOAD_POLICIES, CACHE_FILES_SIZES, CACHE_LOCALARCHIVES, CACHE_LOCALFILES, CACHE_SYMLINK, CLASSPATH_ARCHIVES, CLASSPATH_FILES, COMBINE_CLASS_ATTR, COMBINE_RECORDS_BEFORE_PROGRESS, COMBINER_GROUP_COMPARATOR_CLASS, COMPLETED_MAPS_FOR_REDUCE_SLOWSTART, COUNTER_GROUP_NAME_MAX_DEFAULT, COUNTER_GROUP_NAME_MAX_KEY, COUNTER_GROUPS_MAX_DEFAULT, COUNTER_GROUPS_MAX_KEY, COUNTER_NAME_MAX_DEFAULT, COUNTER_NAME_MAX_KEY, COUNTERS_MAX_DEFAULT, COUNTERS_MAX_KEY, DEFAULT_FINISH_JOB_WHEN_REDUCERS_DONE, DEFAULT_HEAP_MEMORY_MB_RATIO, DEFAULT_IO_SORT_FACTOR, DEFAULT_IO_SORT_MB, DEFAULT_JOB_ACL_MODIFY_JOB, DEFAULT_JOB_ACL_VIEW_JOB, DEFAULT_JOB_AM_ACCESS_DISABLED, DEFAULT_JOB_DFS_STORAGE_CAPACITY_KILL_LIMIT_EXCEED, DEFAULT_JOB_MAX_MAP, DEFAULT_JOB_RUNNING_MAP_LIMIT, DEFAULT_JOB_RUNNING_REDUCE_LIMIT, DEFAULT_JOB_SINGLE_DISK_LIMIT_BYTES, DEFAULT_JOB_SINGLE_DISK_LIMIT_CHECK_INTERVAL_MS, DEFAULT_JOB_SINGLE_DISK_LIMIT_KILL_LIMIT_EXCEED, DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED, DEFAULT_LOG_LEVEL, DEFAULT_MAP_CPU_VCORES, DEFAULT_MAP_MEMORY_MB, DEFAULT_MAPRED_ADMIN_JAVA_OPTS, DEFAULT_MAPRED_ADMIN_USER_ENV, DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH, DEFAULT_MAPREDUCE_CROSS_PLATFORM_APPLICATION_CLASSPATH, DEFAULT_MAPREDUCE_JOB_EMIT_TIMELINE_DATA, DEFAULT_MAPREDUCE_JVM_SYSTEM_PROPERTIES_TO_LOG, DEFAULT_MAX_ALLOWED_FETCH_FAILURES_FRACTION, DEFAULT_MAX_FETCH_FAILURES_NOTIFICATIONS, DEFAULT_MAX_SHUFFLE_FETCH_HOST_FAILURES, DEFAULT_MAX_SHUFFLE_FETCH_RETRY_DELAY, DEFAULT_MR_AM_ADMIN_COMMAND_OPTS, DEFAULT_MR_AM_ADMIN_USER_ENV, DEFAULT_MR_AM_COMMAND_OPTS, DEFAULT_MR_AM_COMMIT_WINDOW_MS, DEFAULT_MR_AM_COMMITTER_CANCEL_TIMEOUT_MS, DEFAULT_MR_AM_CONTAINERLAUNCHER_THREAD_COUNT_LIMIT, DEFAULT_MR_AM_CONTAINERLAUNCHER_THREADPOOL_INITIAL_SIZE, DEFAULT_MR_AM_CPU_VCORES, DEFAULT_MR_AM_HARD_KILL_TIMEOUT_MS, DEFAULT_MR_AM_HISTORY_COMPLETE_EVENT_FLUSH_TIMEOUT_MS, DEFAULT_MR_AM_HISTORY_JOB_COMPLETE_UNFLUSHED_MULTIPLIER, DEFAULT_MR_AM_HISTORY_MAX_UNFLUSHED_COMPLETE_EVENTS, DEFAULT_MR_AM_HISTORY_USE_BATCHED_FLUSH_QUEUE_SIZE_THRESHOLD, DEFAULT_MR_AM_IGNORE_BLACKLISTING_BLACKLISTED_NODE_PERCENT, DEFAULT_MR_AM_JOB_CLIENT_THREAD_COUNT, DEFAULT_MR_AM_JOB_REDUCE_PREEMPTION_LIMIT, DEFAULT_MR_AM_JOB_REDUCE_RAMP_UP_LIMIT, DEFAULT_MR_AM_LOG_BACKUPS, DEFAULT_MR_AM_LOG_KB, DEFAULT_MR_AM_LOG_LEVEL, DEFAULT_MR_AM_MAX_ATTEMPTS, DEFAULT_MR_AM_NUM_PROGRESS_SPLITS, DEFAULT_MR_AM_PROFILE, DEFAULT_MR_AM_STAGING_DIR, DEFAULT_MR_AM_STAGING_ERASURECODING_ENABLED, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_INITIALS, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_LAMBDA_MS, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_STAGNATED_MS, DEFAULT_MR_AM_TASK_ESTIMATOR_SMOOTH_LAMBDA_MS, DEFAULT_MR_AM_TASK_LISTENER_THREAD_COUNT, DEFAULT_MR_AM_TO_RM_HEARTBEAT_INTERVAL_MS, DEFAULT_MR_AM_TO_RM_WAIT_INTERVAL_MS, DEFAULT_MR_AM_VMEM_MB, DEFAULT_MR_AM_WEBAPP_HTTPS_CLIENT_AUTH, DEFAULT_MR_AM_WEBAPP_HTTPS_ENABLED, DEFAULT_MR_CLIENT_JOB_MAX_RETRIES, DEFAULT_MR_CLIENT_JOB_RETRY_INTERVAL, DEFAULT_MR_CLIENT_MAX_RETRIES, DEFAULT_MR_CLIENT_TO_AM_IPC_MAX_RETRIES, DEFAULT_MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS, DEFAULT_MR_JOB_END_NOTIFICATION_TIMEOUT, DEFAULT_MR_JOB_REDUCER_PREEMPT_DELAY_SEC, DEFAULT_MR_JOB_REDUCER_UNCONDITIONAL_PREEMPT_DELAY_SEC, DEFAULT_MR_NUM_OPPORTUNISTIC_MAPS_PERCENT, DEFAULT_MR_TASK_ENABLE_PING_FOR_LIVELINESS_CHECK, DEFAULT_REDUCE_CPU_VCORES, DEFAULT_REDUCE_MEMORY_MB, DEFAULT_REDUCE_SEPARATE_SHUFFLE_LOG, DEFAULT_SHELL, DEFAULT_SHUFFLE_FETCH_RETRY_INTERVAL_MS, DEFAULT_SHUFFLE_INPUT_BUFFER_PERCENT, DEFAULT_SHUFFLE_KEY_LENGTH, DEFAULT_SHUFFLE_LOG_BACKUPS, DEFAULT_SHUFFLE_LOG_KB, DEFAULT_SHUFFLE_MERGE_PERCENT, DEFAULT_SPECULATIVE_MINIMUM_ALLOWED_TASKS, DEFAULT_SPECULATIVE_RETRY_AFTER_NO_SPECULATE, DEFAULT_SPECULATIVE_RETRY_AFTER_SPECULATE, DEFAULT_SPECULATIVECAP_RUNNING_TASKS, DEFAULT_SPECULATIVECAP_TOTAL_TASKS, DEFAULT_SPLIT_METAINFO_MAXSIZE, DEFAULT_TASK_ISMAP, DEFAULT_TASK_LOCAL_WRITE_LIMIT_BYTES, DEFAULT_TASK_LOG_BACKUPS, DEFAULT_TASK_PROFILE_PARAMS, DEFAULT_TASK_STUCK_TIMEOUT_MS, DEFAULT_TASK_TIMEOUT_MILLIS, FILES_FOR_CLASSPATH_AND_SHARED_CACHE, FILES_FOR_SHARED_CACHE, FINISH_JOB_WHEN_REDUCERS_DONE, GROUP_COMPARATOR_CLASS, HADOOP_WORK_DIR, HEAP_MEMORY_MB_RATIO, ID, INDEX_CACHE_MEMORY_LIMIT, INPUT_FILE_MANDATORY_PREFIX, INPUT_FILE_OPTION_PREFIX, INPUT_FORMAT_CLASS_ATTR, IO_SORT_FACTOR, IO_SORT_MB, JAR, JAR_UNPACK_PATTERN, JOB_ACL_MODIFY_JOB, JOB_ACL_VIEW_JOB, JOB_AM_ACCESS_DISABLED, JOB_CANCEL_DELEGATION_TOKEN, JOB_CONF_FILE, JOB_DFS_STORAGE_CAPACITY_KILL_LIMIT_EXCEED, JOB_JAR, JOB_JOBTRACKER_ID, JOB_LOCAL_DIR, JOB_MAX_MAP, JOB_NAME, JOB_NAMENODES, JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE, JOB_NODE_LABEL_EXP, JOB_RUNNING_MAP_LIMIT, JOB_RUNNING_REDUCE_LIMIT, JOB_SINGLE_DISK_LIMIT_BYTES, JOB_SINGLE_DISK_LIMIT_CHECK_INTERVAL_MS, JOB_SINGLE_DISK_LIMIT_KILL_LIMIT_EXCEED, JOB_SPLIT, JOB_SPLIT_METAINFO, JOB_SUBMIT_DIR, JOB_SUBMITHOST, JOB_SUBMITHOSTADDR, JOB_TAGS, JOB_TOKEN_TRACKING_IDS, JOB_TOKEN_TRACKING_IDS_ENABLED, JOB_UBERTASK_ENABLE, JOB_UBERTASK_MAXBYTES, JOB_UBERTASK_MAXMAPS, JOB_UBERTASK_MAXREDUCES, JOBJAR_SHARED_CACHE_UPLOAD_POLICY, JOBJAR_SHARED_CACHE_UPLOAD_POLICY_DEFAULT, JOBJAR_VISIBILITY, JOBJAR_VISIBILITY_DEFAULT, JVM_NUMTASKS_TORUN, KEY_COMPARATOR, MAP_CLASS_ATTR, MAP_COMBINE_MIN_SPILLS, MAP_CPU_VCORES, MAP_DEBUG_SCRIPT, MAP_ENV, MAP_FAILURES_MAX_PERCENT, MAP_INPUT_FILE, MAP_INPUT_PATH, MAP_INPUT_START, MAP_JAVA_OPTS, MAP_LOG_LEVEL, MAP_MAX_ATTEMPTS, MAP_MEMORY_MB, MAP_NODE_LABEL_EXP, MAP_OUTPUT_COLLECTOR_CLASS_ATTR, MAP_OUTPUT_COMPRESS, MAP_OUTPUT_COMPRESS_CODEC, MAP_OUTPUT_KEY_CLASS, MAP_OUTPUT_KEY_FIELD_SEPARATOR, MAP_OUTPUT_KEY_FIELD_SEPERATOR, MAP_OUTPUT_VALUE_CLASS, MAP_RESOURCE_TYPE_PREFIX, MAP_SKIP_INCR_PROC_COUNT, MAP_SKIP_MAX_RECORDS, MAP_SORT_CLASS, MAP_SORT_SPILL_PERCENT, MAP_SPECULATIVE, MAPRED_ADMIN_USER_ENV, MAPRED_ADMIN_USER_SHELL, MAPRED_MAP_ADMIN_JAVA_OPTS, MAPRED_REDUCE_ADMIN_JAVA_OPTS, MAPREDUCE_APPLICATION_CLASSPATH, MAPREDUCE_APPLICATION_FRAMEWORK_PATH, MAPREDUCE_JOB_CLASSLOADER, MAPREDUCE_JOB_CLASSLOADER_SYSTEM_CLASSES, MAPREDUCE_JOB_CREDENTIALS_BINARY, MAPREDUCE_JOB_DIR, MAPREDUCE_JOB_EMIT_TIMELINE_DATA, MAPREDUCE_JOB_LOG4J_PROPERTIES_FILE, MAPREDUCE_JOB_SHUFFLE_PROVIDER_SERVICES, MAPREDUCE_JOB_USER_CLASSPATH_FIRST, MAPREDUCE_JVM_ADD_OPENS_JAVA_OPT, MAPREDUCE_JVM_ADD_OPENS_JAVA_OPT_DEFAULT, MAPREDUCE_JVM_SYSTEM_PROPERTIES_TO_LOG, MAPREDUCE_V2_CHILD_CLASS, MAX_ALLOWED_FETCH_FAILURES_FRACTION, MAX_FETCH_FAILURES_NOTIFICATIONS, MAX_RESOURCES, MAX_RESOURCES_DEFAULT, MAX_RESOURCES_MB, MAX_RESOURCES_MB_DEFAULT, MAX_SHUFFLE_FETCH_HOST_FAILURES, MAX_SHUFFLE_FETCH_RETRY_DELAY, MAX_SINGLE_RESOURCE_MB, MAX_SINGLE_RESOURCE_MB_DEFAULT, MAX_TASK_FAILURES_PER_TRACKER, MR_AM_ADMIN_COMMAND_OPTS, MR_AM_ADMIN_USER_ENV, MR_AM_COMMAND_OPTS, MR_AM_COMMIT_WINDOW_MS, MR_AM_COMMITTER_CANCEL_TIMEOUT_MS, MR_AM_CONTAINERLAUNCHER_THREAD_COUNT_LIMIT, MR_AM_CONTAINERLAUNCHER_THREADPOOL_INITIAL_SIZE, MR_AM_CPU_VCORES, MR_AM_CREATE_JH_INTERMEDIATE_BASE_DIR, MR_AM_ENV, MR_AM_HARD_KILL_TIMEOUT_MS, MR_AM_HISTORY_COMPLETE_EVENT_FLUSH_TIMEOUT_MS, MR_AM_HISTORY_JOB_COMPLETE_UNFLUSHED_MULTIPLIER, MR_AM_HISTORY_MAX_UNFLUSHED_COMPLETE_EVENTS, MR_AM_HISTORY_USE_BATCHED_FLUSH_QUEUE_SIZE_THRESHOLD, MR_AM_IGNORE_BLACKLISTING_BLACKLISTED_NODE_PERECENT, MR_AM_JOB_CLIENT_PORT_RANGE, MR_AM_JOB_CLIENT_THREAD_COUNT, MR_AM_JOB_NODE_BLACKLISTING_ENABLE, MR_AM_JOB_RECOVERY_ENABLE, MR_AM_JOB_RECOVERY_ENABLE_DEFAULT, MR_AM_JOB_REDUCE_PREEMPTION_LIMIT, MR_AM_JOB_REDUCE_RAMPUP_UP_LIMIT, MR_AM_JOB_SPECULATOR, MR_AM_LOG_BACKUPS, MR_AM_LOG_KB, MR_AM_LOG_LEVEL, MR_AM_MAX_ATTEMPTS, MR_AM_NUM_PROGRESS_SPLITS, MR_AM_PREEMPTION_POLICY, MR_AM_PREFIX, MR_AM_PROFILE, MR_AM_PROFILE_PARAMS, MR_AM_RESOURCE_PREFIX, MR_AM_SECURITY_SERVICE_AUTHORIZATION_CLIENT, MR_AM_SECURITY_SERVICE_AUTHORIZATION_TASK_UMBILICAL, MR_AM_STAGING_DIR, MR_AM_STAGING_DIR_ERASURECODING_ENABLED, MR_AM_TASK_ESTIMATOR, MR_AM_TASK_ESTIMATOR_EXPONENTIAL_RATE_ENABLE, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_LAMBDA_MS, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_SKIP_INITIALS, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_STAGNATED_MS, MR_AM_TASK_ESTIMATOR_SMOOTH_LAMBDA_MS, MR_AM_TASK_LISTENER_THREAD_COUNT, MR_AM_TO_RM_HEARTBEAT_INTERVAL_MS, MR_AM_TO_RM_WAIT_INTERVAL_MS, MR_AM_VMEM_MB, MR_AM_WEBAPP_HTTPS_CLIENT_AUTH, MR_AM_WEBAPP_HTTPS_ENABLED, MR_AM_WEBAPP_PORT_RANGE, MR_APPLICATION_TYPE, MR_CLIENT_JOB_MAX_RETRIES, MR_CLIENT_JOB_RETRY_INTERVAL, MR_CLIENT_MAX_RETRIES, MR_CLIENT_TO_AM_IPC_MAX_RETRIES, MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS, MR_ENCRYPTED_INTERMEDIATE_DATA, MR_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB, MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS, MR_JOB_END_NOTIFICATION_CUSTOM_NOTIFIER_CLASS, MR_JOB_END_NOTIFICATION_MAX_ATTEMPTS, MR_JOB_END_NOTIFICATION_MAX_RETRY_INTERVAL, MR_JOB_END_NOTIFICATION_PROXY, MR_JOB_END_NOTIFICATION_TIMEOUT, MR_JOB_END_NOTIFICATION_URL, MR_JOB_END_RETRY_ATTEMPTS, MR_JOB_END_RETRY_INTERVAL, MR_JOB_REDACTED_PROPERTIES, MR_JOB_REDUCER_PREEMPT_DELAY_SEC, MR_JOB_REDUCER_UNCONDITIONAL_PREEMPT_DELAY_SEC, MR_JOB_SEND_TOKEN_CONF, MR_NUM_OPPORTUNISTIC_MAPS_PERCENT, MR_PREFIX, MR_TASK_ENABLE_PING_FOR_LIVELINESS_CHECK, NUM_MAP_PROFILES, NUM_MAPS, NUM_REDUCE_PROFILES, NUM_REDUCES, OUTPUT, OUTPUT_FORMAT_CLASS_ATTR, OUTPUT_KEY_CLASS, OUTPUT_VALUE_CLASS, PARTITIONER_CLASS_ATTR, PRESERVE_FAILED_TASK_FILES, PRESERVE_FILES_PATTERN, PRIORITY, QUEUE_NAME, RECORDS_BEFORE_PROGRESS, REDUCE_CLASS_ATTR, REDUCE_CPU_VCORES, REDUCE_DEBUG_SCRIPT, REDUCE_ENV, REDUCE_FAILURES_MAXPERCENT, REDUCE_INPUT_BUFFER_PERCENT, REDUCE_JAVA_OPTS, REDUCE_LOG_LEVEL, REDUCE_MARKRESET_BUFFER_PERCENT, REDUCE_MARKRESET_BUFFER_SIZE, REDUCE_MAX_ATTEMPTS, REDUCE_MEMORY_MB, REDUCE_MEMORY_TOTAL_BYTES, REDUCE_MEMTOMEM_ENABLED, REDUCE_MEMTOMEM_THRESHOLD, REDUCE_MERGE_INMEM_THRESHOLD, REDUCE_NODE_LABEL_EXP, REDUCE_RESOURCE_TYPE_PREFIX, REDUCE_SEPARATE_SHUFFLE_LOG, REDUCE_SKIP_INCR_PROC_COUNT, REDUCE_SKIP_MAXGROUPS, REDUCE_SPECULATIVE, RESERVATION_ID, RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY, RESOURCE_TYPE_NAME_MEMORY, RESOURCE_TYPE_NAME_VCORE, SETUP_CLEANUP_NEEDED, SHARED_CACHE_MODE, SHARED_CACHE_MODE_DEFAULT, SHUFFLE_CONNECT_TIMEOUT, SHUFFLE_FETCH_FAILURES, SHUFFLE_FETCH_RETRY_ENABLED, SHUFFLE_FETCH_RETRY_INTERVAL_MS, SHUFFLE_FETCH_RETRY_TIMEOUT_MS, SHUFFLE_INDEX_CACHE, SHUFFLE_INPUT_BUFFER_PERCENT, SHUFFLE_KEY_LENGTH, SHUFFLE_LOG_BACKUPS, SHUFFLE_LOG_KB, SHUFFLE_MEMORY_LIMIT_PERCENT, SHUFFLE_MERGE_PERCENT, SHUFFLE_NOTIFY_READERROR, SHUFFLE_PARALLEL_COPIES, SHUFFLE_READ_TIMEOUT, SKIP_OUTDIR, SKIP_RECORDS, SKIP_START_ATTEMPTS, SPECULATIVE_MINIMUM_ALLOWED_TASKS, SPECULATIVE_RETRY_AFTER_NO_SPECULATE, SPECULATIVE_RETRY_AFTER_SPECULATE, SPECULATIVE_SLOWNODE_THRESHOLD, SPECULATIVE_SLOWTASK_THRESHOLD, SPECULATIVECAP, SPECULATIVECAP_RUNNING_TASKS, SPECULATIVECAP_TOTAL_TASKS, SPILL_FILES_COUNT_LIMIT, SPLIT_FILE, SPLIT_METAINFO_MAXSIZE, STDERR_LOGFILE_ENV, STDOUT_LOGFILE_ENV, TASK_ATTEMPT_ID, TASK_CLEANUP_NEEDED, TASK_DEBUGOUT_LINES, TASK_EXIT_TIMEOUT, TASK_EXIT_TIMEOUT_CHECK_INTERVAL_MS, TASK_EXIT_TIMEOUT_CHECK_INTERVAL_MS_DEFAULT, TASK_EXIT_TIMEOUT_DEFAULT, TASK_ID, TASK_ISMAP, TASK_LOCAL_WRITE_LIMIT_BYTES, TASK_LOG_BACKUPS, TASK_LOG_PROGRESS_DELTA_THRESHOLD, TASK_LOG_PROGRESS_DELTA_THRESHOLD_DEFAULT, TASK_LOG_PROGRESS_WAIT_INTERVAL_SECONDS, TASK_LOG_PROGRESS_WAIT_INTERVAL_SECONDS_DEFAULT, TASK_MAP_PROFILE_PARAMS, TASK_OUTPUT_DIR, TASK_PARTITION, TASK_PREEMPTION, TASK_PROFILE, TASK_PROFILE_PARAMS, TASK_PROGRESS_REPORT_INTERVAL, TASK_REDUCE_PROFILE_PARAMS, TASK_STUCK_TIMEOUT_MS, TASK_TIMEOUT, TASK_TIMEOUT_CHECK_INTERVAL_MS, TASK_USERLOG_LIMIT, USER_NAME, WORKDIR, WORKFLOW_ADJACENCY_PREFIX_PATTERN, WORKFLOW_ADJACENCY_PREFIX_STRING, WORKFLOW_ID, WORKFLOW_NAME, WORKFLOW_NODE_NAME, WORKFLOW_TAGS, WORKING_DIR
  • Constructor Summary

    Constructors
    Constructor
    Description
    Job()
    Deprecated.
    Job(Configuration conf, String jobName)
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    Add an archive path to the current set of classpath entries.
    static void
    Add an archive path to the current set of classpath entries.
    static boolean
    Add an archive to job config for shared cache processing.
    void
    Add a archives to be localized
    static void
    Add an archives to be localized to the conf.
    void
    Add a file to be localized
    static void
    Add a file to be localized to the conf.
    void
    Add an file path to the current set of classpath entries It adds the file to cache as well.
    static void
    Add a file path to the current set of classpath entries.
    static void
    addFileToClassPath(Path file, Configuration conf, FileSystem fs, boolean addToCache)
    Add a file path to the current set of classpath entries.
    static boolean
    Add a file to job config for shared cache processing.
    static boolean
    Add a file to job config for shared cache processing.
    float
    Get the progress of the job's cleanup-tasks, as a float between 0.0 and 1.0.
    void
    Close the Job.
    void
    Deprecated.
    void
    Fail indicated task attempt.
    This is to get the shared cache upload policies for archives.
    getCluster()
     
    static int
    The interval at which waitForCompletion() should check.
    Gets the counters for this job.
    This is to get the shared cache upload policies for files.
    long
    Get finish time of the job.
     
    static Job
    Creates a new Job with no particular Cluster .
    static Job
    Creates a new Job with no particular Cluster and a given Configuration.
    static Job
    Creates a new Job with no particular Cluster and a given jobName.
    static Job
    Deprecated.
    static Job
    static Job
    getInstance(Cluster cluster, JobStatus status, Configuration conf)
    Creates a new Job with no particular Cluster and given Configuration and JobStatus.
    static Job
    Creates a new Job with no particular Cluster and given Configuration and JobStatus.
    Get the path of the submitted job configuration.
    The user-specified job name.
    Returns the current state of the Job.
    Get scheduling info of the job.
    static int
    The interval at which monitorAndPrintJob() prints status
    Get the reservation to which the job is submitted to, if any
    Get scheduling info of the job.
    long
    Get start time of the job.
     
    getTaskCompletionEvents(int startFrom)
    Get events indicating completion (success/failure) of component tasks.
    getTaskCompletionEvents(int startFrom, int numEvents)
    Get events indicating completion (success/failure) of component tasks.
    Gets the diagnostic messages for a given task attempt.
    Get the task output filter.
    org.apache.hadoop.mapreduce.TaskReport[]
    Get the information of the current state of the tasks of a job.
    Get the URL where some job progress information will be displayed.
    boolean
    Check if the job is finished or not.
    boolean
     
    boolean
    Check if the job completed successfully.
    boolean
     
    void
    Kill the running job.
    void
    Kill indicated task attempt.
    boolean
    killTask(TaskAttemptID taskId, boolean shouldFail)
    Kill indicated task attempt.
    float
    Get the progress of the job's map-tasks, as a float between 0.0 and 1.0.
    boolean
    Monitor a job and print status in real-time as progress is made and tasks fail.
    float
    Get the progress of the job's reduce-tasks, as a float between 0.0 and 1.0.
    static void
    This is to set the shared cache upload policies for archives.
    void
    setCacheArchives(URI[] archives)
    Set the given set of archives
    static void
    setCacheArchives(URI[] archives, Configuration conf)
    Set the configuration with the given set of archives.
    void
    setCacheFiles(URI[] files)
    Set the given set of files
    static void
    Set the configuration with the given set of files.
    void
    Sets the flag that will allow the JobTracker to cancel the HDFS delegation tokens upon job completion.
    void
    setCombinerClass(Class<? extends Reducer> cls)
    Set the combiner class for the job.
    void
    Define the comparator that controls which keys are grouped together for a single call to combiner, Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
    static void
    This is to set the shared cache upload policies for files.
    void
    Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
    void
    Set the InputFormat for the job.
    void
    Set the job jar
    void
    Set the Jar by finding where a given class came from.
    void
    Set the user-specified job name.
    void
    setJobSetupCleanupNeeded(boolean needed)
    Specify whether job-setup and job-cleanup is needed for the job
    void
    Set the key class for the map output data.
    void
    Set the value class for the map output data.
    void
    setMapperClass(Class<? extends Mapper> cls)
    Set the Mapper for the job.
    void
    setMapSpeculativeExecution(boolean speculativeExecution)
    Turn speculative execution on or off for this job for map tasks.
    void
    Expert: Set the number of maximum attempts that will be made to run a map task.
    void
    Expert: Set the number of maximum attempts that will be made to run a reduce task.
    void
    setNumReduceTasks(int tasks)
    Set the number of reduce tasks for the job.
    void
    Set the OutputFormat for the job.
    void
    setOutputKeyClass(Class<?> theClass)
    Set the key class for the job output data.
    void
    Set the value class for job outputs.
    void
    Set the Partitioner for the job.
    void
    setPriority(JobPriority jobPriority)
    Set the priority of a running job.
    void
    setPriorityAsInteger(int jobPriority)
    Set the priority of a running job.
    void
    setProfileEnabled(boolean newValue)
    Set whether the system should collect profiler information for some of the tasks in this job?
    void
    Set the profiler configuration arguments.
    void
    setProfileTaskRange(boolean isMap, String newValue)
    Set the ranges of maps or reduces to profile. setProfileEnabled(true) must also be called.
    void
    setReducerClass(Class<? extends Reducer> cls)
    Set the Reducer for the job.
    void
    setReduceSpeculativeExecution(boolean speculativeExecution)
    Turn speculative execution on or off for this job for reduce tasks.
    void
    Set the reservation to which the job is submitted to
    void
    Define the comparator that controls how the keys are sorted before they are passed to the Reducer.
    void
    setSpeculativeExecution(boolean speculativeExecution)
    Turn speculative execution on or off for this job.
    static void
    Modify the Configuration to set the task output filter.
    float
    Get the progress of the job's setup-tasks, as a float between 0.0 and 1.0.
    void
    Set the reported username for this job.
    void
    Set the current working directory for the default file system.
    void
    Submit the job to the cluster and return immediately.
    Dump stats to screen.
    boolean
    waitForCompletion(boolean verbose)
    Submit the job to the cluster and wait for it to finish.

    Methods inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl

    getArchiveClassPaths, getArchiveClassPaths, getArchiveTimestamps, getArchiveTimestamps, getCacheArchives, getCacheArchives, getCacheFiles, getCacheFiles, getCombinerClass, getCombinerKeyGroupingComparator, getConfiguration, getCredentials, getFileClassPaths, getFileClassPaths, getFileTimestamps, getFileTimestamps, getGroupingComparator, getInputFormatClass, getJar, getJobID, getJobSetupCleanupNeeded, getLocalCacheArchives, getLocalCacheArchives, getLocalCacheFiles, getLocalCacheFiles, getMapOutputKeyClass, getMapOutputValueClass, getMapperClass, getMaxMapAttempts, getMaxReduceAttempts, getNumReduceTasks, getOutputFormatClass, getOutputKeyClass, getOutputValueClass, getPartitionerClass, getProfileEnabled, getProfileParams, getProfileTaskRange, getReducerClass, getSortComparator, getSymlink, getTaskCleanupNeeded, getUser, getWorkingDirectory, setJobID

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

  • Constructor Details

  • Method Details

    • getInstance

      public static Job getInstance() throws IOException
      Creates a new Job with no particular Cluster . A Cluster will be created with a generic Configuration.
      Returns:
      the Job , with no connection to a cluster yet.
      Throws:
      IOException
    • getInstance

      public static Job getInstance(Configuration conf) throws IOException
      Creates a new Job with no particular Cluster and a given Configuration. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter. A Cluster will be created from the conf parameter only when it's needed.
      Parameters:
      conf - the configuration
      Returns:
      the Job , with no connection to a cluster yet.
      Throws:
      IOException
    • getInstance

      public static Job getInstance(Configuration conf, String jobName) throws IOException
      Creates a new Job with no particular Cluster and a given jobName. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
      Parameters:
      conf - the configuration
      Returns:
      the Job , with no connection to a cluster yet.
      Throws:
      IOException
    • getInstance

      public static Job getInstance(JobStatus status, Configuration conf) throws IOException
      Creates a new Job with no particular Cluster and given Configuration and JobStatus. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
      Parameters:
      status - job status
      conf - job configuration
      Returns:
      the Job , with no connection to a cluster yet.
      Throws:
      IOException
    • getInstance

      @Deprecated public static Job getInstance(Cluster ignored) throws IOException
      Deprecated.
      Creates a new Job with no particular Cluster. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
      Parameters:
      ignored -
      Returns:
      the Job , with no connection to a cluster yet.
      Throws:
      IOException
    • getInstance

      @Deprecated public static Job getInstance(Cluster ignored, Configuration conf) throws IOException
      Creates a new Job with no particular Cluster and given Configuration. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
      Parameters:
      ignored -
      conf - job configuration
      Returns:
      the Job , with no connection to a cluster yet.
      Throws:
      IOException
    • getInstance

      @Private public static Job getInstance(Cluster cluster, JobStatus status, Configuration conf) throws IOException
      Creates a new Job with no particular Cluster and given Configuration and JobStatus. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
      Parameters:
      cluster - cluster
      status - job status
      conf - job configuration
      Returns:
      the Job , with no connection to a cluster yet.
      Throws:
      IOException
    • getStatus

      public JobStatus getStatus() throws IOException, InterruptedException
      Throws:
      IOException
      InterruptedException
    • getJobState

      public JobStatus.State getJobState() throws IOException, InterruptedException
      Returns the current state of the Job.
      Returns:
      JobStatus#State
      Throws:
      IOException
      InterruptedException
    • getTrackingURL

      public String getTrackingURL()
      Get the URL where some job progress information will be displayed.
      Returns:
      the URL where some job progress information will be displayed.
    • getJobFile

      public String getJobFile()
      Get the path of the submitted job configuration.
      Returns:
      the path of the submitted job configuration.
    • getStartTime

      public long getStartTime()
      Get start time of the job.
      Returns:
      the start time of the job
    • getFinishTime

      public long getFinishTime() throws IOException, InterruptedException
      Get finish time of the job.
      Returns:
      the finish time of the job
      Throws:
      IOException
      InterruptedException
    • getSchedulingInfo

      public String getSchedulingInfo()
      Get scheduling info of the job.
      Returns:
      the scheduling info of the job
    • getPriority

      public JobPriority getPriority() throws IOException, InterruptedException
      Get scheduling info of the job.
      Returns:
      the priority info of the job
      Throws:
      IOException
      InterruptedException
    • getJobName

      public String getJobName()
      The user-specified job name.
      Specified by:
      getJobName in interface JobContext
      Overrides:
      getJobName in class org.apache.hadoop.mapreduce.task.JobContextImpl
      Returns:
      the job's name, defaulting to "".
    • getHistoryUrl

      public String getHistoryUrl() throws IOException, InterruptedException
      Throws:
      IOException
      InterruptedException
    • isRetired

      public boolean isRetired() throws IOException, InterruptedException
      Throws:
      IOException
      InterruptedException
    • getCluster

      @Private public Cluster getCluster()
    • toString

      public String toString()
      Dump stats to screen.
      Overrides:
      toString in class org.apache.hadoop.mapreduce.task.JobContextImpl
    • getTaskReports

      public org.apache.hadoop.mapreduce.TaskReport[] getTaskReports(TaskType type) throws IOException, InterruptedException
      Get the information of the current state of the tasks of a job.
      Parameters:
      type - Type of the task
      Returns:
      the list of all of the map tips.
      Throws:
      IOException
      InterruptedException
    • mapProgress

      public float mapProgress() throws IOException
      Get the progress of the job's map-tasks, as a float between 0.0 and 1.0. When all map tasks have completed, the function returns 1.0.
      Returns:
      the progress of the job's map-tasks.
      Throws:
      IOException
    • reduceProgress

      public float reduceProgress() throws IOException
      Get the progress of the job's reduce-tasks, as a float between 0.0 and 1.0. When all reduce tasks have completed, the function returns 1.0.
      Returns:
      the progress of the job's reduce-tasks.
      Throws:
      IOException
    • cleanupProgress

      public float cleanupProgress() throws IOException, InterruptedException
      Get the progress of the job's cleanup-tasks, as a float between 0.0 and 1.0. When all cleanup tasks have completed, the function returns 1.0.
      Returns:
      the progress of the job's cleanup-tasks.
      Throws:
      IOException
      InterruptedException
    • setupProgress

      public float setupProgress() throws IOException
      Get the progress of the job's setup-tasks, as a float between 0.0 and 1.0. When all setup tasks have completed, the function returns 1.0.
      Returns:
      the progress of the job's setup-tasks.
      Throws:
      IOException
    • isComplete

      public boolean isComplete() throws IOException
      Check if the job is finished or not. This is a non-blocking call.
      Returns:
      true if the job is complete, else false.
      Throws:
      IOException
    • isSuccessful

      public boolean isSuccessful() throws IOException
      Check if the job completed successfully.
      Returns:
      true if the job succeeded, else false.
      Throws:
      IOException
    • killJob

      public void killJob() throws IOException
      Kill the running job. Blocks until all job tasks have been killed as well. If the job is no longer running, it simply returns.
      Throws:
      IOException
    • setPriority

      public void setPriority(JobPriority jobPriority) throws IOException, InterruptedException
      Set the priority of a running job.
      Parameters:
      jobPriority - the new priority for the job.
      Throws:
      IOException
      InterruptedException
    • setPriorityAsInteger

      public void setPriorityAsInteger(int jobPriority) throws IOException, InterruptedException
      Set the priority of a running job.
      Parameters:
      jobPriority - the new priority for the job.
      Throws:
      IOException
      InterruptedException
    • getTaskCompletionEvents

      public TaskCompletionEvent[] getTaskCompletionEvents(int startFrom, int numEvents) throws IOException, InterruptedException
      Get events indicating completion (success/failure) of component tasks.
      Parameters:
      startFrom - index to start fetching events from
      numEvents - number of events to fetch
      Returns:
      an array of TaskCompletionEvents
      Throws:
      IOException
      InterruptedException
    • getTaskCompletionEvents

      public TaskCompletionEvent[] getTaskCompletionEvents(int startFrom) throws IOException
      Get events indicating completion (success/failure) of component tasks.
      Parameters:
      startFrom - index to start fetching events from
      Returns:
      an array of TaskCompletionEvents
      Throws:
      IOException
    • killTask

      @Private public boolean killTask(TaskAttemptID taskId, boolean shouldFail) throws IOException
      Kill indicated task attempt.
      Parameters:
      taskId - the id of the task to kill.
      shouldFail - if true the task is failed and added to failed tasks list, otherwise it is just killed, w/o affecting job failure status.
      Throws:
      IOException
    • killTask

      public void killTask(TaskAttemptID taskId) throws IOException
      Kill indicated task attempt.
      Parameters:
      taskId - the id of the task to be terminated.
      Throws:
      IOException
    • failTask

      public void failTask(TaskAttemptID taskId) throws IOException
      Fail indicated task attempt.
      Parameters:
      taskId - the id of the task to be terminated.
      Throws:
      IOException
    • getCounters

      public Counters getCounters() throws IOException
      Gets the counters for this job. May return null if the job has been retired and the job is no longer in the completed job store.
      Returns:
      the counters for this job.
      Throws:
      IOException
    • getTaskDiagnostics

      public String[] getTaskDiagnostics(TaskAttemptID taskid) throws IOException, InterruptedException
      Gets the diagnostic messages for a given task attempt.
      Parameters:
      taskid -
      Returns:
      the list of diagnostic messages for the task
      Throws:
      IOException
      InterruptedException
    • setNumReduceTasks

      public void setNumReduceTasks(int tasks) throws IllegalStateException
      Set the number of reduce tasks for the job.
      Parameters:
      tasks - the number of reduce tasks
      Throws:
      IllegalStateException - if the job is submitted
    • setWorkingDirectory

      public void setWorkingDirectory(Path dir) throws IOException
      Set the current working directory for the default file system.
      Parameters:
      dir - the new current working directory.
      Throws:
      IllegalStateException - if the job is submitted
      IOException
    • setInputFormatClass

      public void setInputFormatClass(Class<? extends InputFormat> cls) throws IllegalStateException
      Set the InputFormat for the job.
      Parameters:
      cls - the InputFormat to use
      Throws:
      IllegalStateException - if the job is submitted
    • setOutputFormatClass

      public void setOutputFormatClass(Class<? extends OutputFormat> cls) throws IllegalStateException
      Set the OutputFormat for the job.
      Parameters:
      cls - the OutputFormat to use
      Throws:
      IllegalStateException - if the job is submitted
    • setMapperClass

      public void setMapperClass(Class<? extends Mapper> cls) throws IllegalStateException
      Set the Mapper for the job.
      Parameters:
      cls - the Mapper to use
      Throws:
      IllegalStateException - if the job is submitted
    • setJarByClass

      public void setJarByClass(Class<?> cls)
      Set the Jar by finding where a given class came from.
      Parameters:
      cls - the example class
    • setJar

      public void setJar(String jar)
      Set the job jar
    • setUser

      public void setUser(String user)
      Set the reported username for this job.
      Parameters:
      user - the username for this job.
    • setCombinerClass

      public void setCombinerClass(Class<? extends Reducer> cls) throws IllegalStateException
      Set the combiner class for the job.
      Parameters:
      cls - the combiner to use
      Throws:
      IllegalStateException - if the job is submitted
    • setReducerClass

      public void setReducerClass(Class<? extends Reducer> cls) throws IllegalStateException
      Set the Reducer for the job.
      Parameters:
      cls - the Reducer to use
      Throws:
      IllegalStateException - if the job is submitted
    • setPartitionerClass

      public void setPartitionerClass(Class<? extends Partitioner> cls) throws IllegalStateException
      Set the Partitioner for the job.
      Parameters:
      cls - the Partitioner to use
      Throws:
      IllegalStateException - if the job is submitted
    • setMapOutputKeyClass

      public void setMapOutputKeyClass(Class<?> theClass) throws IllegalStateException
      Set the key class for the map output data. This allows the user to specify the map output key class to be different than the final output value class.
      Parameters:
      theClass - the map output key class.
      Throws:
      IllegalStateException - if the job is submitted
    • setMapOutputValueClass

      public void setMapOutputValueClass(Class<?> theClass) throws IllegalStateException
      Set the value class for the map output data. This allows the user to specify the map output value class to be different than the final output value class.
      Parameters:
      theClass - the map output value class.
      Throws:
      IllegalStateException - if the job is submitted
    • setOutputKeyClass

      public void setOutputKeyClass(Class<?> theClass) throws IllegalStateException
      Set the key class for the job output data.
      Parameters:
      theClass - the key class for the job output data.
      Throws:
      IllegalStateException - if the job is submitted
    • setOutputValueClass

      public void setOutputValueClass(Class<?> theClass) throws IllegalStateException
      Set the value class for job outputs.
      Parameters:
      theClass - the value class for job outputs.
      Throws:
      IllegalStateException - if the job is submitted
    • setCombinerKeyGroupingComparatorClass

      public void setCombinerKeyGroupingComparatorClass(Class<? extends RawComparator> cls) throws IllegalStateException
      Define the comparator that controls which keys are grouped together for a single call to combiner, Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
      Parameters:
      cls - the raw comparator to use
      Throws:
      IllegalStateException - if the job is submitted
    • setSortComparatorClass

      public void setSortComparatorClass(Class<? extends RawComparator> cls) throws IllegalStateException
      Define the comparator that controls how the keys are sorted before they are passed to the Reducer.
      Parameters:
      cls - the raw comparator
      Throws:
      IllegalStateException - if the job is submitted
      See Also:
    • setGroupingComparatorClass

      public void setGroupingComparatorClass(Class<? extends RawComparator> cls) throws IllegalStateException
      Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
      Parameters:
      cls - the raw comparator to use
      Throws:
      IllegalStateException - if the job is submitted
      See Also:
    • setJobName

      public void setJobName(String name) throws IllegalStateException
      Set the user-specified job name.
      Parameters:
      name - the job's new name.
      Throws:
      IllegalStateException - if the job is submitted
    • setSpeculativeExecution

      public void setSpeculativeExecution(boolean speculativeExecution)
      Turn speculative execution on or off for this job.
      Parameters:
      speculativeExecution - true if speculative execution should be turned on, else false.
    • setMapSpeculativeExecution

      public void setMapSpeculativeExecution(boolean speculativeExecution)
      Turn speculative execution on or off for this job for map tasks.
      Parameters:
      speculativeExecution - true if speculative execution should be turned on for map tasks, else false.
    • setReduceSpeculativeExecution

      public void setReduceSpeculativeExecution(boolean speculativeExecution)
      Turn speculative execution on or off for this job for reduce tasks.
      Parameters:
      speculativeExecution - true if speculative execution should be turned on for reduce tasks, else false.
    • setJobSetupCleanupNeeded

      public void setJobSetupCleanupNeeded(boolean needed)
      Specify whether job-setup and job-cleanup is needed for the job
      Parameters:
      needed - If true, job-setup and job-cleanup will be considered from OutputCommitter else ignored.
    • setCacheArchives

      public void setCacheArchives(URI[] archives)
      Set the given set of archives
      Parameters:
      archives - The list of archives that need to be localized
    • setCacheArchives

      public static void setCacheArchives(URI[] archives, Configuration conf)
      Set the configuration with the given set of archives.
      Parameters:
      archives - The list of archives that need to be localized.
      conf - Configuration which will be changed.
    • setCacheFiles

      public void setCacheFiles(URI[] files)
      Set the given set of files
      Parameters:
      files - The list of files that need to be localized
    • setCacheFiles

      public static void setCacheFiles(URI[] files, Configuration conf)
      Set the configuration with the given set of files.
      Parameters:
      files - The list of files that need to be localized.
      conf - Configuration which will be changed.
    • addCacheArchive

      public void addCacheArchive(URI uri)
      Add a archives to be localized
      Parameters:
      uri - The uri of the cache to be localized
    • addCacheArchive

      public static void addCacheArchive(URI uri, Configuration conf)
      Add an archives to be localized to the conf.
      Parameters:
      uri - The uri of the cache to be localized.
      conf - Configuration to add the cache to.
    • addCacheFile

      public void addCacheFile(URI uri)
      Add a file to be localized
      Parameters:
      uri - The uri of the cache to be localized
    • addCacheFile

      public static void addCacheFile(URI uri, Configuration conf)
      Add a file to be localized to the conf. The localized file will be downloaded to the execution node(s), and a link will be created to the file from the job's working directory. If the last part of URI's path name is "*", then the entire parent directory will be localized and links will be created from the job's working directory to each file in the parent directory.

      The access permissions of the file will determine whether the localized file will be shared across jobs. If the file is not readable by other or if any of its parent directories is not executable by other, then the file will not be shared. In the case of a path that ends in "/*", sharing of the localized files will be determined solely from the access permissions of the parent directories. The access permissions of the individual files will be ignored.

      Parameters:
      uri - The uri of the cache to be localized.
      conf - Configuration to add the cache to.
    • addFileToClassPath

      public void addFileToClassPath(Path file) throws IOException
      Add an file path to the current set of classpath entries It adds the file to cache as well. Files added with this method will not be unpacked while being added to the classpath. To add archives to classpath, use the addArchiveToClassPath(Path) method instead.
      Parameters:
      file - Path of the file to be added
      Throws:
      IOException
    • addFileToClassPath

      public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs)
      Add a file path to the current set of classpath entries. The file will also be added to the cache.
      Parameters:
      file - Path of the file to be added.
      conf - Configuration that contains the classpath setting.
      fs - FileSystem with respect to which file should be interpreted.
    • addFileToClassPath

      public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs, boolean addToCache)
      Add a file path to the current set of classpath entries. The file will also be added to the cache if addToCache is true.
      Parameters:
      file - Path of the file to be added.
      conf - Configuration that contains the classpath setting.
      fs - FileSystem with respect to which file should be interpreted.
      addToCache - Whether the file should also be added to the cache list.
    • addArchiveToClassPath

      public void addArchiveToClassPath(Path archive) throws IOException
      Add an archive path to the current set of classpath entries. It adds the archive to cache as well. Archive files will be unpacked and added to the classpath when being distributed.
      Parameters:
      archive - Path of the archive to be added
      Throws:
      IOException
    • addArchiveToClassPath

      public static void addArchiveToClassPath(Path archive, Configuration conf, FileSystem fs)
      Add an archive path to the current set of classpath entries. It adds the archive to cache as well.
      Parameters:
      archive - Path of the archive to be added.
      conf - Configuration that contains the classpath setting.
      fs - FileSystem with respect to which archive should be interpreted.
    • createSymlink

      @Deprecated public void createSymlink()
      Deprecated.
      Originally intended to enable symlinks, but currently symlinks cannot be disabled.
    • setMaxMapAttempts

      public void setMaxMapAttempts(int n)
      Expert: Set the number of maximum attempts that will be made to run a map task.
      Parameters:
      n - the number of attempts per map task.
    • setMaxReduceAttempts

      public void setMaxReduceAttempts(int n)
      Expert: Set the number of maximum attempts that will be made to run a reduce task.
      Parameters:
      n - the number of attempts per reduce task.
    • setProfileEnabled

      public void setProfileEnabled(boolean newValue)
      Set whether the system should collect profiler information for some of the tasks in this job? The information is stored in the user log directory.
      Parameters:
      newValue - true means it should be gathered
    • setProfileParams

      public void setProfileParams(String value)
      Set the profiler configuration arguments. If the string contains a '%s' it will be replaced with the name of the profiling output file when the task runs. This value is passed to the task child JVM on the command line.
      Parameters:
      value - the configuration string
    • setProfileTaskRange

      public void setProfileTaskRange(boolean isMap, String newValue)
      Set the ranges of maps or reduces to profile. setProfileEnabled(true) must also be called.
      Parameters:
      newValue - a set of integer ranges of the map ids
    • setCancelDelegationTokenUponJobCompletion

      public void setCancelDelegationTokenUponJobCompletion(boolean value)
      Sets the flag that will allow the JobTracker to cancel the HDFS delegation tokens upon job completion. Defaults to true.
    • addFileToSharedCache

      @Unstable public static boolean addFileToSharedCache(URI resource, Configuration conf)
      Add a file to job config for shared cache processing. If shared cache is enabled, it will return true, otherwise, return false. We don't check with SCM here given application might not be able to provide the job id; ClientSCMProtocol.use requires the application id. Job Submitter will read the files from job config and take care of things.
      Parameters:
      resource - The resource that Job Submitter will process later using shared cache.
      conf - Configuration to add the resource to
      Returns:
      whether the resource has been added to the configuration
    • addFileToSharedCacheAndClasspath

      @Unstable public static boolean addFileToSharedCacheAndClasspath(URI resource, Configuration conf)
      Add a file to job config for shared cache processing. If shared cache is enabled, it will return true, otherwise, return false. We don't check with SCM here given application might not be able to provide the job id; ClientSCMProtocol.use requires the application id. Job Submitter will read the files from job config and take care of things. Job Submitter will also add the file to classpath. Intended to be used by user code.
      Parameters:
      resource - The resource that Job Submitter will process later using shared cache.
      conf - Configuration to add the resource to
      Returns:
      whether the resource has been added to the configuration
    • addArchiveToSharedCache

      @Unstable public static boolean addArchiveToSharedCache(URI resource, Configuration conf)
      Add an archive to job config for shared cache processing. If shared cache is enabled, it will return true, otherwise, return false. We don't check with SCM here given application might not be able to provide the job id; ClientSCMProtocol.use requires the application id. Job Submitter will read the files from job config and take care of things. Intended to be used by user code.
      Parameters:
      resource - The resource that Job Submitter will process later using shared cache.
      conf - Configuration to add the resource to
      Returns:
      whether the resource has been added to the configuration
    • setFileSharedCacheUploadPolicies

      @Unstable public static void setFileSharedCacheUploadPolicies(Configuration conf, Map<String,Boolean> policies)
      This is to set the shared cache upload policies for files. If the parameter was previously set, this method will replace the old value with the new provided map.
      Parameters:
      conf - Configuration which stores the shared cache upload policies
      policies - A map containing the shared cache upload policies for a set of resources. The key is the url of the resource and the value is the upload policy. True if it should be uploaded, false otherwise.
    • setArchiveSharedCacheUploadPolicies

      @Unstable public static void setArchiveSharedCacheUploadPolicies(Configuration conf, Map<String,Boolean> policies)
      This is to set the shared cache upload policies for archives. If the parameter was previously set, this method will replace the old value with the new provided map.
      Parameters:
      conf - Configuration which stores the shared cache upload policies
      policies - A map containing the shared cache upload policies for a set of resources. The key is the url of the resource and the value is the upload policy. True if it should be uploaded, false otherwise.
    • getFileSharedCacheUploadPolicies

      @Unstable public static Map<String,Boolean> getFileSharedCacheUploadPolicies(Configuration conf)
      This is to get the shared cache upload policies for files.
      Parameters:
      conf - Configuration which stores the shared cache upload policies
      Returns:
      A map containing the shared cache upload policies for a set of resources. The key is the url of the resource and the value is the upload policy. True if it should be uploaded, false otherwise.
    • getArchiveSharedCacheUploadPolicies

      @Unstable public static Map<String,Boolean> getArchiveSharedCacheUploadPolicies(Configuration conf)
      This is to get the shared cache upload policies for archives.
      Parameters:
      conf - Configuration which stores the shared cache upload policies
      Returns:
      A map containing the shared cache upload policies for a set of resources. The key is the url of the resource and the value is the upload policy. True if it should be uploaded, false otherwise.
    • submit

      Submit the job to the cluster and return immediately.
      Throws:
      IOException
      InterruptedException
      ClassNotFoundException
    • waitForCompletion

      public boolean waitForCompletion(boolean verbose) throws IOException, InterruptedException, ClassNotFoundException
      Submit the job to the cluster and wait for it to finish.
      Parameters:
      verbose - print the progress to the user
      Returns:
      true if the job succeeded
      Throws:
      IOException - thrown if the communication with the JobTracker is lost
      InterruptedException
      ClassNotFoundException
    • monitorAndPrintJob

      public boolean monitorAndPrintJob() throws IOException, InterruptedException
      Monitor a job and print status in real-time as progress is made and tasks fail.
      Returns:
      true if the job succeeded
      Throws:
      IOException - if communication to the JobTracker fails
      InterruptedException
    • getProgressPollInterval

      public static int getProgressPollInterval(Configuration conf)
      The interval at which monitorAndPrintJob() prints status
    • getCompletionPollInterval

      public static int getCompletionPollInterval(Configuration conf)
      The interval at which waitForCompletion() should check.
    • getTaskOutputFilter

      public static Job.TaskStatusFilter getTaskOutputFilter(Configuration conf)
      Get the task output filter.
      Parameters:
      conf - the configuration.
      Returns:
      the filter level.
    • setTaskOutputFilter

      public static void setTaskOutputFilter(Configuration conf, Job.TaskStatusFilter newValue)
      Modify the Configuration to set the task output filter.
      Parameters:
      conf - the Configuration to modify.
      newValue - the value to set.
    • isUber

      public boolean isUber() throws IOException, InterruptedException
      Throws:
      IOException
      InterruptedException
    • getReservationId

      public ReservationId getReservationId()
      Get the reservation to which the job is submitted to, if any
      Returns:
      the reservationId the identifier of the job's reservation, null if the job does not have any reservation associated with it
    • setReservationId

      public void setReservationId(ReservationId reservationId)
      Set the reservation to which the job is submitted to
      Parameters:
      reservationId - the reservationId to set
    • close

      public void close() throws IOException
      Close the Job.
      Specified by:
      close in interface AutoCloseable
      Throws:
      IOException - if fail to close.