org.apache.hadoop.mapreduce.task.JobContextImpl

org.apache.hadoop.mapreduce.Job

All Implemented Interfaces:: AutoCloseable, JobContext, org.apache.hadoop.mapreduce.MRJobConfig

@Public @Evolving public class Job extends org.apache.hadoop.mapreduce.task.JobContextImpl implements JobContext, AutoCloseable

The job submitter's view of the Job.

It allows the user to configure the job, submit it, control its execution, and query the state. The set methods only work until the job is submitted, afterwards they will throw an IllegalStateException.

Normally the user creates the application, describes various facets of the job via Job and then submits the job and monitor its progress.

Here is an example on how to submit a job:

     // Create a new Job
     Job job = Job.getInstance();
     job.setJarByClass(MyJob.class);
     
     // Specify various job-specific parameters     
     job.setJobName("myjob");
     
     job.setInputPath(new Path("in"));
     job.setOutputPath(new Path("out"));
     
     job.setMapperClass(MyJob.MyMapper.class);
     job.setReducerClass(MyJob.MyReducer.class);

     // Submit the job, then poll for progress until the job is complete
     job.waitForCompletion(true);

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static enum

Job.JobState

static enum

Job.TaskStatusFilter
Field Summary

Fields

Modifier and Type

Field

Description

static final String

COMPLETION_POLL_INTERVAL_KEY

Key in mapred-*.xml that sets completionPollInvervalMillis

static final int

DEFAULT_SUBMIT_REPLICATION

static final boolean

DEFAULT_USE_WILDCARD_FOR_LIBJARS

static final String

OUTPUT_FILTER

static final String

PROGRESS_MONITOR_POLL_INTERVAL_KEY

Key in mapred-*.xml that sets progMonitorPollIntervalMillis

static final String

SUBMIT_REPLICATION

static final String

USE_WILDCARD_FOR_LIBJARS

static final String

USED_GENERIC_PARSER

Fields inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl
conf, credentials, ugi

Fields inherited from interface org.apache.hadoop.mapreduce.MRJobConfig
AM_NODE_LABEL_EXP, AM_STRICT_LOCALITY, APPLICATION_ATTEMPT_ID, APPLICATION_MASTER_CLASS, ARCHIVES_FOR_SHARED_CACHE, CACHE_ARCHIVES, CACHE_ARCHIVES_SHARED_CACHE_UPLOAD_POLICIES, CACHE_ARCHIVES_SIZES, CACHE_ARCHIVES_TIMESTAMPS, CACHE_ARCHIVES_VISIBILITIES, CACHE_FILE_TIMESTAMPS, CACHE_FILE_VISIBILITIES, CACHE_FILES, CACHE_FILES_SHARED_CACHE_UPLOAD_POLICIES, CACHE_FILES_SIZES, CACHE_LOCALARCHIVES, CACHE_LOCALFILES, CACHE_SYMLINK, CLASSPATH_ARCHIVES, CLASSPATH_FILES, COMBINE_CLASS_ATTR, COMBINE_RECORDS_BEFORE_PROGRESS, COMBINER_GROUP_COMPARATOR_CLASS, COMPLETED_MAPS_FOR_REDUCE_SLOWSTART, COUNTER_GROUP_NAME_MAX_DEFAULT, COUNTER_GROUP_NAME_MAX_KEY, COUNTER_GROUPS_MAX_DEFAULT, COUNTER_GROUPS_MAX_KEY, COUNTER_NAME_MAX_DEFAULT, COUNTER_NAME_MAX_KEY, COUNTERS_MAX_DEFAULT, COUNTERS_MAX_KEY, DEFAULT_FINISH_JOB_WHEN_REDUCERS_DONE, DEFAULT_HEAP_MEMORY_MB_RATIO, DEFAULT_IO_SORT_FACTOR, DEFAULT_IO_SORT_MB, DEFAULT_JOB_ACL_MODIFY_JOB, DEFAULT_JOB_ACL_VIEW_JOB, DEFAULT_JOB_AM_ACCESS_DISABLED, DEFAULT_JOB_DFS_STORAGE_CAPACITY_KILL_LIMIT_EXCEED, DEFAULT_JOB_MAX_MAP, DEFAULT_JOB_RUNNING_MAP_LIMIT, DEFAULT_JOB_RUNNING_REDUCE_LIMIT, DEFAULT_JOB_SINGLE_DISK_LIMIT_BYTES, DEFAULT_JOB_SINGLE_DISK_LIMIT_CHECK_INTERVAL_MS, DEFAULT_JOB_SINGLE_DISK_LIMIT_KILL_LIMIT_EXCEED, DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED, DEFAULT_LOG_LEVEL, DEFAULT_MAP_CPU_VCORES, DEFAULT_MAP_MEMORY_MB, DEFAULT_MAPRED_ADMIN_JAVA_OPTS, DEFAULT_MAPRED_ADMIN_USER_ENV, DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH, DEFAULT_MAPREDUCE_CROSS_PLATFORM_APPLICATION_CLASSPATH, DEFAULT_MAPREDUCE_JOB_EMIT_TIMELINE_DATA, DEFAULT_MAPREDUCE_JVM_SYSTEM_PROPERTIES_TO_LOG, DEFAULT_MAX_ALLOWED_FETCH_FAILURES_FRACTION, DEFAULT_MAX_FETCH_FAILURES_NOTIFICATIONS, DEFAULT_MAX_SHUFFLE_FETCH_HOST_FAILURES, DEFAULT_MAX_SHUFFLE_FETCH_RETRY_DELAY, DEFAULT_MR_AM_ADMIN_COMMAND_OPTS, DEFAULT_MR_AM_ADMIN_USER_ENV, DEFAULT_MR_AM_COMMAND_OPTS, DEFAULT_MR_AM_COMMIT_WINDOW_MS, DEFAULT_MR_AM_COMMITTER_CANCEL_TIMEOUT_MS, DEFAULT_MR_AM_CONTAINERLAUNCHER_THREAD_COUNT_LIMIT, DEFAULT_MR_AM_CONTAINERLAUNCHER_THREADPOOL_INITIAL_SIZE, DEFAULT_MR_AM_CPU_VCORES, DEFAULT_MR_AM_HARD_KILL_TIMEOUT_MS, DEFAULT_MR_AM_HISTORY_COMPLETE_EVENT_FLUSH_TIMEOUT_MS, DEFAULT_MR_AM_HISTORY_JOB_COMPLETE_UNFLUSHED_MULTIPLIER, DEFAULT_MR_AM_HISTORY_MAX_UNFLUSHED_COMPLETE_EVENTS, DEFAULT_MR_AM_HISTORY_USE_BATCHED_FLUSH_QUEUE_SIZE_THRESHOLD, DEFAULT_MR_AM_IGNORE_BLACKLISTING_BLACKLISTED_NODE_PERCENT, DEFAULT_MR_AM_JOB_CLIENT_THREAD_COUNT, DEFAULT_MR_AM_JOB_REDUCE_PREEMPTION_LIMIT, DEFAULT_MR_AM_JOB_REDUCE_RAMP_UP_LIMIT, DEFAULT_MR_AM_LOG_BACKUPS, DEFAULT_MR_AM_LOG_KB, DEFAULT_MR_AM_LOG_LEVEL, DEFAULT_MR_AM_MAX_ATTEMPTS, DEFAULT_MR_AM_NUM_PROGRESS_SPLITS, DEFAULT_MR_AM_PROFILE, DEFAULT_MR_AM_STAGING_DIR, DEFAULT_MR_AM_STAGING_ERASURECODING_ENABLED, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_INITIALS, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_LAMBDA_MS, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_STAGNATED_MS, DEFAULT_MR_AM_TASK_ESTIMATOR_SMOOTH_LAMBDA_MS, DEFAULT_MR_AM_TASK_LISTENER_THREAD_COUNT, DEFAULT_MR_AM_TO_RM_HEARTBEAT_INTERVAL_MS, DEFAULT_MR_AM_TO_RM_WAIT_INTERVAL_MS, DEFAULT_MR_AM_VMEM_MB, DEFAULT_MR_AM_WEBAPP_HTTPS_CLIENT_AUTH, DEFAULT_MR_AM_WEBAPP_HTTPS_ENABLED, DEFAULT_MR_CLIENT_JOB_MAX_RETRIES, DEFAULT_MR_CLIENT_JOB_RETRY_INTERVAL, DEFAULT_MR_CLIENT_MAX_RETRIES, DEFAULT_MR_CLIENT_TO_AM_IPC_MAX_RETRIES, DEFAULT_MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS, DEFAULT_MR_JOB_END_NOTIFICATION_TIMEOUT, DEFAULT_MR_JOB_REDUCER_PREEMPT_DELAY_SEC, DEFAULT_MR_JOB_REDUCER_UNCONDITIONAL_PREEMPT_DELAY_SEC, DEFAULT_MR_NUM_OPPORTUNISTIC_MAPS_PERCENT, DEFAULT_MR_TASK_ENABLE_PING_FOR_LIVELINESS_CHECK, DEFAULT_REDUCE_CPU_VCORES, DEFAULT_REDUCE_MEMORY_MB, DEFAULT_REDUCE_SEPARATE_SHUFFLE_LOG, DEFAULT_SHELL, DEFAULT_SHUFFLE_FETCH_RETRY_INTERVAL_MS, DEFAULT_SHUFFLE_INPUT_BUFFER_PERCENT, DEFAULT_SHUFFLE_KEY_LENGTH, DEFAULT_SHUFFLE_LOG_BACKUPS, DEFAULT_SHUFFLE_LOG_KB, DEFAULT_SHUFFLE_MERGE_PERCENT, DEFAULT_SPECULATIVE_MINIMUM_ALLOWED_TASKS, DEFAULT_SPECULATIVE_RETRY_AFTER_NO_SPECULATE, DEFAULT_SPECULATIVE_RETRY_AFTER_SPECULATE, DEFAULT_SPECULATIVECAP_RUNNING_TASKS, DEFAULT_SPECULATIVECAP_TOTAL_TASKS, DEFAULT_SPLIT_METAINFO_MAXSIZE, DEFAULT_TASK_ISMAP, DEFAULT_TASK_LOCAL_WRITE_LIMIT_BYTES, DEFAULT_TASK_LOG_BACKUPS, DEFAULT_TASK_PROFILE_PARAMS, DEFAULT_TASK_STUCK_TIMEOUT_MS, DEFAULT_TASK_TIMEOUT_MILLIS, FILES_FOR_CLASSPATH_AND_SHARED_CACHE, FILES_FOR_SHARED_CACHE, FINISH_JOB_WHEN_REDUCERS_DONE, GROUP_COMPARATOR_CLASS, HADOOP_WORK_DIR, HEAP_MEMORY_MB_RATIO, ID, INDEX_CACHE_MEMORY_LIMIT, INPUT_FILE_MANDATORY_PREFIX, INPUT_FILE_OPTION_PREFIX, INPUT_FORMAT_CLASS_ATTR, IO_SORT_FACTOR, IO_SORT_MB, JAR, JAR_UNPACK_PATTERN, JOB_ACL_MODIFY_JOB, JOB_ACL_VIEW_JOB, JOB_AM_ACCESS_DISABLED, JOB_CANCEL_DELEGATION_TOKEN, JOB_CONF_FILE, JOB_DFS_STORAGE_CAPACITY_KILL_LIMIT_EXCEED, JOB_JAR, JOB_JOBTRACKER_ID, JOB_LOCAL_DIR, JOB_MAX_MAP, JOB_NAME, JOB_NAMENODES, JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE, JOB_NODE_LABEL_EXP, JOB_RUNNING_MAP_LIMIT, JOB_RUNNING_REDUCE_LIMIT, JOB_SINGLE_DISK_LIMIT_BYTES, JOB_SINGLE_DISK_LIMIT_CHECK_INTERVAL_MS, JOB_SINGLE_DISK_LIMIT_KILL_LIMIT_EXCEED, JOB_SPLIT, JOB_SPLIT_METAINFO, JOB_SUBMIT_DIR, JOB_SUBMITHOST, JOB_SUBMITHOSTADDR, JOB_TAGS, JOB_TOKEN_TRACKING_IDS, JOB_TOKEN_TRACKING_IDS_ENABLED, JOB_UBERTASK_ENABLE, JOB_UBERTASK_MAXBYTES, JOB_UBERTASK_MAXMAPS, JOB_UBERTASK_MAXREDUCES, JOBJAR_SHARED_CACHE_UPLOAD_POLICY, JOBJAR_SHARED_CACHE_UPLOAD_POLICY_DEFAULT, JOBJAR_VISIBILITY, JOBJAR_VISIBILITY_DEFAULT, JVM_NUMTASKS_TORUN, KEY_COMPARATOR, MAP_CLASS_ATTR, MAP_COMBINE_MIN_SPILLS, MAP_CPU_VCORES, MAP_DEBUG_SCRIPT, MAP_ENV, MAP_FAILURES_MAX_PERCENT, MAP_INPUT_FILE, MAP_INPUT_PATH, MAP_INPUT_START, MAP_JAVA_OPTS, MAP_LOG_LEVEL, MAP_MAX_ATTEMPTS, MAP_MEMORY_MB, MAP_NODE_LABEL_EXP, MAP_OUTPUT_COLLECTOR_CLASS_ATTR, MAP_OUTPUT_COMPRESS, MAP_OUTPUT_COMPRESS_CODEC, MAP_OUTPUT_KEY_CLASS, MAP_OUTPUT_KEY_FIELD_SEPARATOR, MAP_OUTPUT_KEY_FIELD_SEPERATOR, MAP_OUTPUT_VALUE_CLASS, MAP_RESOURCE_TYPE_PREFIX, MAP_SKIP_INCR_PROC_COUNT, MAP_SKIP_MAX_RECORDS, MAP_SORT_CLASS, MAP_SORT_SPILL_PERCENT, MAP_SPECULATIVE, MAPRED_ADMIN_USER_ENV, MAPRED_ADMIN_USER_SHELL, MAPRED_MAP_ADMIN_JAVA_OPTS, MAPRED_REDUCE_ADMIN_JAVA_OPTS, MAPREDUCE_APPLICATION_CLASSPATH, MAPREDUCE_APPLICATION_FRAMEWORK_PATH, MAPREDUCE_JOB_CLASSLOADER, MAPREDUCE_JOB_CLASSLOADER_SYSTEM_CLASSES, MAPREDUCE_JOB_CREDENTIALS_BINARY, MAPREDUCE_JOB_DIR, MAPREDUCE_JOB_EMIT_TIMELINE_DATA, MAPREDUCE_JOB_LOG4J_PROPERTIES_FILE, MAPREDUCE_JOB_SHUFFLE_PROVIDER_SERVICES, MAPREDUCE_JOB_USER_CLASSPATH_FIRST, MAPREDUCE_JVM_ADD_OPENS_JAVA_OPT, MAPREDUCE_JVM_ADD_OPENS_JAVA_OPT_DEFAULT, MAPREDUCE_JVM_SYSTEM_PROPERTIES_TO_LOG, MAPREDUCE_V2_CHILD_CLASS, MAX_ALLOWED_FETCH_FAILURES_FRACTION, MAX_FETCH_FAILURES_NOTIFICATIONS, MAX_RESOURCES, MAX_RESOURCES_DEFAULT, MAX_RESOURCES_MB, MAX_RESOURCES_MB_DEFAULT, MAX_SHUFFLE_FETCH_HOST_FAILURES, MAX_SHUFFLE_FETCH_RETRY_DELAY, MAX_SINGLE_RESOURCE_MB, MAX_SINGLE_RESOURCE_MB_DEFAULT, MAX_TASK_FAILURES_PER_TRACKER, MR_AM_ADMIN_COMMAND_OPTS, MR_AM_ADMIN_USER_ENV, MR_AM_COMMAND_OPTS, MR_AM_COMMIT_WINDOW_MS, MR_AM_COMMITTER_CANCEL_TIMEOUT_MS, MR_AM_CONTAINERLAUNCHER_THREAD_COUNT_LIMIT, MR_AM_CONTAINERLAUNCHER_THREADPOOL_INITIAL_SIZE, MR_AM_CPU_VCORES, MR_AM_CREATE_JH_INTERMEDIATE_BASE_DIR, MR_AM_ENV, MR_AM_HARD_KILL_TIMEOUT_MS, MR_AM_HISTORY_COMPLETE_EVENT_FLUSH_TIMEOUT_MS, MR_AM_HISTORY_JOB_COMPLETE_UNFLUSHED_MULTIPLIER, MR_AM_HISTORY_MAX_UNFLUSHED_COMPLETE_EVENTS, MR_AM_HISTORY_USE_BATCHED_FLUSH_QUEUE_SIZE_THRESHOLD, MR_AM_IGNORE_BLACKLISTING_BLACKLISTED_NODE_PERECENT, MR_AM_JOB_CLIENT_PORT_RANGE, MR_AM_JOB_CLIENT_THREAD_COUNT, MR_AM_JOB_NODE_BLACKLISTING_ENABLE, MR_AM_JOB_RECOVERY_ENABLE, MR_AM_JOB_RECOVERY_ENABLE_DEFAULT, MR_AM_JOB_REDUCE_PREEMPTION_LIMIT, MR_AM_JOB_REDUCE_RAMPUP_UP_LIMIT, MR_AM_JOB_SPECULATOR, MR_AM_LOG_BACKUPS, MR_AM_LOG_KB, MR_AM_LOG_LEVEL, MR_AM_MAX_ATTEMPTS, MR_AM_NUM_PROGRESS_SPLITS, MR_AM_PREEMPTION_POLICY, MR_AM_PREFIX, MR_AM_PROFILE, MR_AM_PROFILE_PARAMS, MR_AM_RESOURCE_PREFIX, MR_AM_SECURITY_SERVICE_AUTHORIZATION_CLIENT, MR_AM_SECURITY_SERVICE_AUTHORIZATION_TASK_UMBILICAL, MR_AM_STAGING_DIR, MR_AM_STAGING_DIR_ERASURECODING_ENABLED, MR_AM_TASK_ESTIMATOR, MR_AM_TASK_ESTIMATOR_EXPONENTIAL_RATE_ENABLE, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_LAMBDA_MS, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_SKIP_INITIALS, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_STAGNATED_MS, MR_AM_TASK_ESTIMATOR_SMOOTH_LAMBDA_MS, MR_AM_TASK_LISTENER_THREAD_COUNT, MR_AM_TO_RM_HEARTBEAT_INTERVAL_MS, MR_AM_TO_RM_WAIT_INTERVAL_MS, MR_AM_VMEM_MB, MR_AM_WEBAPP_HTTPS_CLIENT_AUTH, MR_AM_WEBAPP_HTTPS_ENABLED, MR_AM_WEBAPP_PORT_RANGE, MR_APPLICATION_TYPE, MR_CLIENT_JOB_MAX_RETRIES, MR_CLIENT_JOB_RETRY_INTERVAL, MR_CLIENT_MAX_RETRIES, MR_CLIENT_TO_AM_IPC_MAX_RETRIES, MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS, MR_ENCRYPTED_INTERMEDIATE_DATA, MR_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB, MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS, MR_JOB_END_NOTIFICATION_CUSTOM_NOTIFIER_CLASS, MR_JOB_END_NOTIFICATION_MAX_ATTEMPTS, MR_JOB_END_NOTIFICATION_MAX_RETRY_INTERVAL, MR_JOB_END_NOTIFICATION_PROXY, MR_JOB_END_NOTIFICATION_TIMEOUT, MR_JOB_END_NOTIFICATION_URL, MR_JOB_END_RETRY_ATTEMPTS, MR_JOB_END_RETRY_INTERVAL, MR_JOB_REDACTED_PROPERTIES, MR_JOB_REDUCER_PREEMPT_DELAY_SEC, MR_JOB_REDUCER_UNCONDITIONAL_PREEMPT_DELAY_SEC, MR_JOB_SEND_TOKEN_CONF, MR_NUM_OPPORTUNISTIC_MAPS_PERCENT, MR_PREFIX, MR_TASK_ENABLE_PING_FOR_LIVELINESS_CHECK, NUM_MAP_PROFILES, NUM_MAPS, NUM_REDUCE_PROFILES, NUM_REDUCES, OUTPUT, OUTPUT_FORMAT_CLASS_ATTR, OUTPUT_KEY_CLASS, OUTPUT_VALUE_CLASS, PARTITIONER_CLASS_ATTR, PRESERVE_FAILED_TASK_FILES, PRESERVE_FILES_PATTERN, PRIORITY, QUEUE_NAME, RECORDS_BEFORE_PROGRESS, REDUCE_CLASS_ATTR, REDUCE_CPU_VCORES, REDUCE_DEBUG_SCRIPT, REDUCE_ENV, REDUCE_FAILURES_MAXPERCENT, REDUCE_INPUT_BUFFER_PERCENT, REDUCE_JAVA_OPTS, REDUCE_LOG_LEVEL, REDUCE_MARKRESET_BUFFER_PERCENT, REDUCE_MARKRESET_BUFFER_SIZE, REDUCE_MAX_ATTEMPTS, REDUCE_MEMORY_MB, REDUCE_MEMORY_TOTAL_BYTES, REDUCE_MEMTOMEM_ENABLED, REDUCE_MEMTOMEM_THRESHOLD, REDUCE_MERGE_INMEM_THRESHOLD, REDUCE_NODE_LABEL_EXP, REDUCE_RESOURCE_TYPE_PREFIX, REDUCE_SEPARATE_SHUFFLE_LOG, REDUCE_SKIP_INCR_PROC_COUNT, REDUCE_SKIP_MAXGROUPS, REDUCE_SPECULATIVE, RESERVATION_ID, RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY, RESOURCE_TYPE_NAME_MEMORY, RESOURCE_TYPE_NAME_VCORE, SETUP_CLEANUP_NEEDED, SHARED_CACHE_MODE, SHARED_CACHE_MODE_DEFAULT, SHUFFLE_CONNECT_TIMEOUT, SHUFFLE_FETCH_FAILURES, SHUFFLE_FETCH_RETRY_ENABLED, SHUFFLE_FETCH_RETRY_INTERVAL_MS, SHUFFLE_FETCH_RETRY_TIMEOUT_MS, SHUFFLE_INDEX_CACHE, SHUFFLE_INPUT_BUFFER_PERCENT, SHUFFLE_KEY_LENGTH, SHUFFLE_LOG_BACKUPS, SHUFFLE_LOG_KB, SHUFFLE_MEMORY_LIMIT_PERCENT, SHUFFLE_MERGE_PERCENT, SHUFFLE_NOTIFY_READERROR, SHUFFLE_PARALLEL_COPIES, SHUFFLE_READ_TIMEOUT, SKIP_OUTDIR, SKIP_RECORDS, SKIP_START_ATTEMPTS, SPECULATIVE_MINIMUM_ALLOWED_TASKS, SPECULATIVE_RETRY_AFTER_NO_SPECULATE, SPECULATIVE_RETRY_AFTER_SPECULATE, SPECULATIVE_SLOWNODE_THRESHOLD, SPECULATIVE_SLOWTASK_THRESHOLD, SPECULATIVECAP, SPECULATIVECAP_RUNNING_TASKS, SPECULATIVECAP_TOTAL_TASKS, SPILL_FILES_COUNT_LIMIT, SPLIT_FILE, SPLIT_METAINFO_MAXSIZE, STDERR_LOGFILE_ENV, STDOUT_LOGFILE_ENV, TASK_ATTEMPT_ID, TASK_CLEANUP_NEEDED, TASK_DEBUGOUT_LINES, TASK_EXIT_TIMEOUT, TASK_EXIT_TIMEOUT_CHECK_INTERVAL_MS, TASK_EXIT_TIMEOUT_CHECK_INTERVAL_MS_DEFAULT, TASK_EXIT_TIMEOUT_DEFAULT, TASK_ID, TASK_ISMAP, TASK_LOCAL_WRITE_LIMIT_BYTES, TASK_LOG_BACKUPS, TASK_LOG_PROGRESS_DELTA_THRESHOLD, TASK_LOG_PROGRESS_DELTA_THRESHOLD_DEFAULT, TASK_LOG_PROGRESS_WAIT_INTERVAL_SECONDS, TASK_LOG_PROGRESS_WAIT_INTERVAL_SECONDS_DEFAULT, TASK_MAP_PROFILE_PARAMS, TASK_OUTPUT_DIR, TASK_PARTITION, TASK_PREEMPTION, TASK_PROFILE, TASK_PROFILE_PARAMS, TASK_PROGRESS_REPORT_INTERVAL, TASK_REDUCE_PROFILE_PARAMS, TASK_STUCK_TIMEOUT_MS, TASK_TIMEOUT, TASK_TIMEOUT_CHECK_INTERVAL_MS, TASK_USERLOG_LIMIT, USER_NAME, WORKDIR, WORKFLOW_ADJACENCY_PREFIX_PATTERN, WORKFLOW_ADJACENCY_PREFIX_STRING, WORKFLOW_ID, WORKFLOW_NAME, WORKFLOW_NODE_NAME, WORKFLOW_TAGS, WORKING_DIR
Constructor Summary

Constructors

Constructor

Description

Job()

Deprecated.
Use getInstance()

Job(Configuration conf)

Deprecated.
Use getInstance(Configuration)

Job(Configuration conf, String jobName)

Deprecated.
Use getInstance(Configuration, String)
Method Summary

Modifier and Type

Method

Description

void

addArchiveToClassPath(Path archive)

Add an archive path to the current set of classpath entries.

static void

addArchiveToClassPath(Path archive, Configuration conf, FileSystem fs)

Add an archive path to the current set of classpath entries.

static boolean

addArchiveToSharedCache(URI resource, Configuration conf)

Add an archive to job config for shared cache processing.

void

addCacheArchive(URI uri)

Add a archives to be localized

static void

addCacheArchive(URI uri, Configuration conf)

Add an archives to be localized to the conf.

void

addCacheFile(URI uri)

Add a file to be localized

static void

addCacheFile(URI uri, Configuration conf)

Add a file to be localized to the conf.

void

addFileToClassPath(Path file)

Add an file path to the current set of classpath entries It adds the file to cache as well.

static void

addFileToClassPath(Path file, Configuration conf, FileSystem fs)

Add a file path to the current set of classpath entries.

static void

addFileToClassPath(Path file, Configuration conf, FileSystem fs, boolean addToCache)

Add a file path to the current set of classpath entries.

static boolean

addFileToSharedCache(URI resource, Configuration conf)

Add a file to job config for shared cache processing.

static boolean

addFileToSharedCacheAndClasspath(URI resource, Configuration conf)

Add a file to job config for shared cache processing.

float

cleanupProgress()

Get the progress of the job's cleanup-tasks, as a float between 0.0 and 1.0.

void

close()

Close the Job.

void

createSymlink()

Deprecated.

void

failTask(TaskAttemptID taskId)

Fail indicated task attempt.

static Map<String,Boolean>

getArchiveSharedCacheUploadPolicies(Configuration conf)

This is to get the shared cache upload policies for archives.

Cluster

getCluster()

static int

getCompletionPollInterval(Configuration conf)

The interval at which waitForCompletion() should check.

Counters

getCounters()

Gets the counters for this job.

static Map<String,Boolean>

getFileSharedCacheUploadPolicies(Configuration conf)

This is to get the shared cache upload policies for files.

long

getFinishTime()

Get finish time of the job.

String

getHistoryUrl()

static Job

getInstance()

Creates a new Job with no particular Cluster .

static Job

getInstance(Configuration conf)

Creates a new Job with no particular Cluster and a given Configuration.

static Job

getInstance(Configuration conf, String jobName)

Creates a new Job with no particular Cluster and a given jobName.

static Job

getInstance(Cluster ignored)

Deprecated.
Use getInstance()

static Job

getInstance(Cluster ignored, Configuration conf)

Deprecated.
Use getInstance(Configuration)

static Job

getInstance(Cluster cluster, JobStatus status, Configuration conf)

Creates a new Job with no particular Cluster and given Configuration and JobStatus.

static Job

getInstance(JobStatus status, Configuration conf)

Creates a new Job with no particular Cluster and given Configuration and JobStatus.

String

getJobFile()

Get the path of the submitted job configuration.

String

getJobName()

The user-specified job name.

JobStatus.State

getJobState()

Returns the current state of the Job.

JobPriority

getPriority()

Get scheduling info of the job.

static int

getProgressPollInterval(Configuration conf)

The interval at which monitorAndPrintJob() prints status

ReservationId

getReservationId()

Get the reservation to which the job is submitted to, if any

String

getSchedulingInfo()

Get scheduling info of the job.

long

getStartTime()

Get start time of the job.

JobStatus

getStatus()

TaskCompletionEvent[]

getTaskCompletionEvents(int startFrom)

Get events indicating completion (success/failure) of component tasks.

TaskCompletionEvent[]

getTaskCompletionEvents(int startFrom, int numEvents)

Get events indicating completion (success/failure) of component tasks.

String[]

getTaskDiagnostics(TaskAttemptID taskid)

Gets the diagnostic messages for a given task attempt.

static Job.TaskStatusFilter

getTaskOutputFilter(Configuration conf)

Get the task output filter.

org.apache.hadoop.mapreduce.TaskReport[]

getTaskReports(TaskType type)

Get the information of the current state of the tasks of a job.

String

getTrackingURL()

Get the URL where some job progress information will be displayed.

boolean

isComplete()

Check if the job is finished or not.

boolean

isRetired()

boolean

isSuccessful()

Check if the job completed successfully.

boolean

isUber()

void

killJob()

Kill the running job.

void

killTask(TaskAttemptID taskId)

Kill indicated task attempt.

boolean

killTask(TaskAttemptID taskId, boolean shouldFail)

Kill indicated task attempt.

float

mapProgress()

Get the progress of the job's map-tasks, as a float between 0.0 and 1.0.

boolean

monitorAndPrintJob()

Monitor a job and print status in real-time as progress is made and tasks fail.

float

reduceProgress()

Get the progress of the job's reduce-tasks, as a float between 0.0 and 1.0.

static void

setArchiveSharedCacheUploadPolicies(Configuration conf, Map<String,Boolean> policies)

This is to set the shared cache upload policies for archives.

void

setCacheArchives(URI[] archives)

Set the given set of archives

static void

setCacheArchives(URI[] archives, Configuration conf)

Set the configuration with the given set of archives.

void

setCacheFiles(URI[] files)

Set the given set of files

static void

setCacheFiles(URI[] files, Configuration conf)

Set the configuration with the given set of files.

void

setCancelDelegationTokenUponJobCompletion(boolean value)

Sets the flag that will allow the JobTracker to cancel the HDFS delegation tokens upon job completion.

void

setCombinerClass(Class<? extends Reducer> cls)

Set the combiner class for the job.

void

setCombinerKeyGroupingComparatorClass(Class<? extends RawComparator> cls)

Define the comparator that controls which keys are grouped together for a single call to combiner, Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)

static void

setFileSharedCacheUploadPolicies(Configuration conf, Map<String,Boolean> policies)

This is to set the shared cache upload policies for files.

void

setGroupingComparatorClass(Class<? extends RawComparator> cls)

Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)

void

setInputFormatClass(Class<? extends InputFormat> cls)

Set the InputFormat for the job.

void

setJar(String jar)

Set the job jar

void

setJarByClass(Class<?> cls)

Set the Jar by finding where a given class came from.

void

setJobName(String name)

Set the user-specified job name.

void

setJobSetupCleanupNeeded(boolean needed)

Specify whether job-setup and job-cleanup is needed for the job

void

setMapOutputKeyClass(Class<?> theClass)

Set the key class for the map output data.

void

setMapOutputValueClass(Class<?> theClass)

Set the value class for the map output data.

void

setMapperClass(Class<? extends Mapper> cls)

Set the Mapper for the job.

void

setMapSpeculativeExecution(boolean speculativeExecution)

Turn speculative execution on or off for this job for map tasks.

void

setMaxMapAttempts(int n)

Expert: Set the number of maximum attempts that will be made to run a map task.

void

setMaxReduceAttempts(int n)

Expert: Set the number of maximum attempts that will be made to run a reduce task.

void

setNumReduceTasks(int tasks)

Set the number of reduce tasks for the job.

void

setOutputFormatClass(Class<? extends OutputFormat> cls)

Set the OutputFormat for the job.

void

setOutputKeyClass(Class<?> theClass)

Set the key class for the job output data.

void

setOutputValueClass(Class<?> theClass)

Set the value class for job outputs.

void

setPartitionerClass(Class<? extends Partitioner> cls)

Set the Partitioner for the job.

void

setPriority(JobPriority jobPriority)

Set the priority of a running job.

void

setPriorityAsInteger(int jobPriority)

Set the priority of a running job.

void

setProfileEnabled(boolean newValue)

Set whether the system should collect profiler information for some of the tasks in this job?

void

setProfileParams(String value)

Set the profiler configuration arguments.

void

setProfileTaskRange(boolean isMap, String newValue)

Set the ranges of maps or reduces to profile. setProfileEnabled(true) must also be called.

void

setReducerClass(Class<? extends Reducer> cls)

Set the Reducer for the job.

void

setReduceSpeculativeExecution(boolean speculativeExecution)

Turn speculative execution on or off for this job for reduce tasks.

void

setReservationId(ReservationId reservationId)

Set the reservation to which the job is submitted to

void

setSortComparatorClass(Class<? extends RawComparator> cls)

Define the comparator that controls how the keys are sorted before they are passed to the Reducer.

void

setSpeculativeExecution(boolean speculativeExecution)

Turn speculative execution on or off for this job.

static void

setTaskOutputFilter(Configuration conf, Job.TaskStatusFilter newValue)

Modify the Configuration to set the task output filter.

float

setupProgress()

Get the progress of the job's setup-tasks, as a float between 0.0 and 1.0.

void

setUser(String user)

Set the reported username for this job.

void

setWorkingDirectory(Path dir)

Set the current working directory for the default file system.

void

submit()

Submit the job to the cluster and return immediately.

String

toString()

Dump stats to screen.

boolean

waitForCompletion(boolean verbose)

Submit the job to the cluster and wait for it to finish.

Methods inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl
getArchiveClassPaths, getArchiveClassPaths, getArchiveTimestamps, getArchiveTimestamps, getCacheArchives, getCacheArchives, getCacheFiles, getCacheFiles, getCombinerClass, getCombinerKeyGroupingComparator, getConfiguration, getCredentials, getFileClassPaths, getFileClassPaths, getFileTimestamps, getFileTimestamps, getGroupingComparator, getInputFormatClass, getJar, getJobID, getJobSetupCleanupNeeded, getLocalCacheArchives, getLocalCacheArchives, getLocalCacheFiles, getLocalCacheFiles, getMapOutputKeyClass, getMapOutputValueClass, getMapperClass, getMaxMapAttempts, getMaxReduceAttempts, getNumReduceTasks, getOutputFormatClass, getOutputKeyClass, getOutputValueClass, getPartitionerClass, getProfileEnabled, getProfileParams, getProfileTaskRange, getReducerClass, getSortComparator, getSymlink, getTaskCleanupNeeded, getUser, getWorkingDirectory, setJobID

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.hadoop.mapreduce.JobContext
getArchiveClassPaths, getArchiveTimestamps, getCacheArchives, getCacheFiles, getCombinerClass, getCombinerKeyGroupingComparator, getConfiguration, getCredentials, getFileClassPaths, getFileTimestamps, getGroupingComparator, getInputFormatClass, getJar, getJobID, getJobSetupCleanupNeeded, getLocalCacheArchives, getLocalCacheFiles, getMapOutputKeyClass, getMapOutputValueClass, getMapperClass, getMaxMapAttempts, getMaxReduceAttempts, getNumReduceTasks, getOutputFormatClass, getOutputKeyClass, getOutputValueClass, getPartitionerClass, getProfileEnabled, getProfileParams, getProfileTaskRange, getReducerClass, getSortComparator, getSymlink, getTaskCleanupNeeded, getUser, getWorkingDirectory

Field Details
- OUTPUT_FILTER
  
  public static final String OUTPUT_FILTER
  See Also:
  
  Constant Field Values
- COMPLETION_POLL_INTERVAL_KEY
  
  public static final String COMPLETION_POLL_INTERVAL_KEY
  
  Key in mapred-*.xml that sets completionPollInvervalMillis
  See Also:
  
  Constant Field Values
- PROGRESS_MONITOR_POLL_INTERVAL_KEY
  
  public static final String PROGRESS_MONITOR_POLL_INTERVAL_KEY
  
  Key in mapred-*.xml that sets progMonitorPollIntervalMillis
  See Also:
  
  Constant Field Values
- USED_GENERIC_PARSER
  
  public static final String USED_GENERIC_PARSER
  See Also:
  
  Constant Field Values
- SUBMIT_REPLICATION
  
  public static final String SUBMIT_REPLICATION
  See Also:
  
  Constant Field Values
- DEFAULT_SUBMIT_REPLICATION
  
  public static final int DEFAULT_SUBMIT_REPLICATION
  See Also:
  
  Constant Field Values
- USE_WILDCARD_FOR_LIBJARS
  
  public static final String USE_WILDCARD_FOR_LIBJARS
  See Also:
  
  Constant Field Values
- DEFAULT_USE_WILDCARD_FOR_LIBJARS
  
  public static final boolean DEFAULT_USE_WILDCARD_FOR_LIBJARS
  See Also:
  
  Constant Field Values
Constructor Details
- Job
  
  @Deprecated public Job() throws IOException
  
  Deprecated.
  Use getInstance()
  
  Throws:
  
  IOException
- Job
  
  @Deprecated public Job(Configuration conf) throws IOException
  
  Deprecated.
  Use getInstance(Configuration)
  
  Throws:
  
  IOException
- Job
  
  @Deprecated public Job(Configuration conf, String jobName) throws IOException
  
  Deprecated.
  Use getInstance(Configuration, String)
  
  Throws:
  
  IOException
Method Details
- getInstance
  
  public static Job getInstance() throws IOException
  
  Creates a new Job with no particular Cluster . A Cluster will be created with a generic Configuration.
  
  Returns:
  
  the Job , with no connection to a cluster yet.
  
  Throws:
  
  IOException
- getInstance
  
  public static Job getInstance(Configuration conf) throws IOException
  
  Creates a new Job with no particular Cluster and a given Configuration. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter. A Cluster will be created from the conf parameter only when it's needed.
  
  Parameters:
  
  conf - the configuration
  
  Returns:
  
  the Job , with no connection to a cluster yet.
  
  Throws:
  
  IOException
- getInstance
  
  public static Job getInstance(Configuration conf, String jobName) throws IOException
  
  Creates a new Job with no particular Cluster and a given jobName. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
  
  Parameters:
  
  conf - the configuration
  
  Returns:
  
  the Job , with no connection to a cluster yet.
  
  Throws:
  
  IOException
- getInstance
  
  public static Job getInstance(JobStatus status, Configuration conf) throws IOException
  
  Creates a new Job with no particular Cluster and given Configuration and JobStatus. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
  
  Parameters:
  
  status - job status
  
  conf - job configuration
  
  Returns:
  
  the Job , with no connection to a cluster yet.
  
  Throws:
  
  IOException
- getInstance
  
  @Deprecated public static Job getInstance(Cluster ignored) throws IOException
  
  Deprecated.
  Use getInstance()
  
  Creates a new Job with no particular Cluster. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
  
  Parameters:
  
  ignored -
  
  Returns:
  
  the Job , with no connection to a cluster yet.
  
  Throws:
  
  IOException
- getInstance
  
  @Deprecated public static Job getInstance(Cluster ignored, Configuration conf) throws IOException
  
  Deprecated.
  Use getInstance(Configuration)
  
  Creates a new Job with no particular Cluster and given Configuration. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
  
  Parameters:
  
  ignored -
  
  conf - job configuration
  
  Returns:
  
  the Job , with no connection to a cluster yet.
  
  Throws:
  
  IOException
- getInstance
  
  @Private public static Job getInstance(Cluster cluster, JobStatus status, Configuration conf) throws IOException
  
  Creates a new Job with no particular Cluster and given Configuration and JobStatus. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
  
  Parameters:
  
  cluster - cluster
  
  status - job status
  
  conf - job configuration
  
  Returns:
  
  the Job , with no connection to a cluster yet.
  
  Throws:
  
  IOException
- getStatus
  
  public JobStatus getStatus() throws IOException, InterruptedException
  
  Throws:
  
  IOException
  
  InterruptedException
- getJobState
  
  public JobStatus.State getJobState() throws IOException, InterruptedException
  
  Returns the current state of the Job.
  
  Returns:
  
  JobStatus#State
  
  Throws:
  
  IOException
  
  InterruptedException
- getTrackingURL
  
  public String getTrackingURL()
  
  Get the URL where some job progress information will be displayed.
  
  Returns:
  
  the URL where some job progress information will be displayed.
- getJobFile
  
  public String getJobFile()
  
  Get the path of the submitted job configuration.
  
  Returns:
  
  the path of the submitted job configuration.
- getStartTime
  
  public long getStartTime()
  
  Get start time of the job.
  
  Returns:
  
  the start time of the job
- getFinishTime
  
  public long getFinishTime() throws IOException, InterruptedException
  
  Get finish time of the job.
  
  Returns:
  
  the finish time of the job
  
  Throws:
  
  IOException
  
  InterruptedException
- getSchedulingInfo
  
  public String getSchedulingInfo()
  
  Get scheduling info of the job.
  
  Returns:
  
  the scheduling info of the job
- getPriority
  
  public JobPriority getPriority() throws IOException, InterruptedException
  
  Get scheduling info of the job.
  
  Returns:
  
  the priority info of the job
  
  Throws:
  
  IOException
  
  InterruptedException
- getJobName
  
  public String getJobName()
  
  The user-specified job name.
  
  Specified by:
  
  getJobName in interface JobContext
  
  Overrides:
  
  getJobName in class org.apache.hadoop.mapreduce.task.JobContextImpl
  
  Returns:
  
  the job's name, defaulting to "".
- getHistoryUrl
  
  public String getHistoryUrl() throws IOException, InterruptedException
  
  Throws:
  
  IOException
  
  InterruptedException
- isRetired
  
  public boolean isRetired() throws IOException, InterruptedException
  
  Throws:
  
  IOException
  
  InterruptedException
- getCluster
  
  @Private public Cluster getCluster()
- toString
  
  public String toString()
  
  Dump stats to screen.
  
  Overrides:
  
  toString in class org.apache.hadoop.mapreduce.task.JobContextImpl
- getTaskReports
  
  public org.apache.hadoop.mapreduce.TaskReport[] getTaskReports(TaskType type) throws IOException, InterruptedException
  
  Get the information of the current state of the tasks of a job.
  
  Parameters:
  
  type - Type of the task
  
  Returns:
  
  the list of all of the map tips.
  
  Throws:
  
  IOException
  
  InterruptedException
- mapProgress
  
  public float mapProgress() throws IOException
  
  Get the progress of the job's map-tasks, as a float between 0.0 and 1.0. When all map tasks have completed, the function returns 1.0.
  
  Returns:
  
  the progress of the job's map-tasks.
  
  Throws:
  
  IOException
- reduceProgress
  
  public float reduceProgress() throws IOException
  
  Get the progress of the job's reduce-tasks, as a float between 0.0 and 1.0. When all reduce tasks have completed, the function returns 1.0.
  
  Returns:
  
  the progress of the job's reduce-tasks.
  
  Throws:
  
  IOException
- cleanupProgress
  
  public float cleanupProgress() throws IOException, InterruptedException
  
  Get the progress of the job's cleanup-tasks, as a float between 0.0 and 1.0. When all cleanup tasks have completed, the function returns 1.0.
  
  Returns:
  
  the progress of the job's cleanup-tasks.
  
  Throws:
  
  IOException
  
  InterruptedException
- setupProgress
  
  public float setupProgress() throws IOException
  
  Get the progress of the job's setup-tasks, as a float between 0.0 and 1.0. When all setup tasks have completed, the function returns 1.0.
  
  Returns:
  
  the progress of the job's setup-tasks.
  
  Throws:
  
  IOException
- isComplete
  
  public boolean isComplete() throws IOException
  
  Check if the job is finished or not. This is a non-blocking call.
  
  Returns:
  
  true if the job is complete, else false.
  
  Throws:
  
  IOException
- isSuccessful
  
  public boolean isSuccessful() throws IOException
  
  Check if the job completed successfully.
  
  Returns:
  
  true if the job succeeded, else false.
  
  Throws:
  
  IOException
- killJob
  
  public void killJob() throws IOException
  
  Kill the running job. Blocks until all job tasks have been killed as well. If the job is no longer running, it simply returns.
  
  Throws:
  
  IOException
- setPriority
  
  public void setPriority(JobPriority jobPriority) throws IOException, InterruptedException
  
  Set the priority of a running job.
  
  Parameters:
  
  jobPriority - the new priority for the job.
  
  Throws:
  
  IOException
  
  InterruptedException
- setPriorityAsInteger
  
  public void setPriorityAsInteger(int jobPriority) throws IOException, InterruptedException
  
  Set the priority of a running job.
  
  Parameters:
  
  jobPriority - the new priority for the job.
  
  Throws:
  
  IOException
  
  InterruptedException
- getTaskCompletionEvents
  
  public TaskCompletionEvent[] getTaskCompletionEvents(int startFrom, int numEvents) throws IOException, InterruptedException
  
  Get events indicating completion (success/failure) of component tasks.
  
  Parameters:
  
  startFrom - index to start fetching events from
  
  numEvents - number of events to fetch
  
  Returns:
  
  an array of TaskCompletionEvents
  
  Throws:
  
  IOException
  
  InterruptedException
- getTaskCompletionEvents
  
  public TaskCompletionEvent[] getTaskCompletionEvents(int startFrom) throws IOException
  
  Get events indicating completion (success/failure) of component tasks.
  
  Parameters:
  
  startFrom - index to start fetching events from
  
  Returns:
  
  an array of TaskCompletionEvents
  
  Throws:
  
  IOException
- killTask
  
  @Private public boolean killTask(TaskAttemptID taskId, boolean shouldFail) throws IOException
  
  Kill indicated task attempt.
  
  Parameters:
  
  taskId - the id of the task to kill.
  
  shouldFail - if true the task is failed and added to failed tasks list, otherwise it is just killed, w/o affecting job failure status.
  
  Throws:
  
  IOException
- killTask
  
  public void killTask(TaskAttemptID taskId) throws IOException
  
  Kill indicated task attempt.
  
  Parameters:
  
  taskId - the id of the task to be terminated.
  
  Throws:
  
  IOException
- failTask
  
  public void failTask(TaskAttemptID taskId) throws IOException
  
  Fail indicated task attempt.
  
  Parameters:
  
  taskId - the id of the task to be terminated.
  
  Throws:
  
  IOException
- getCounters
  
  public Counters getCounters() throws IOException
  
  Gets the counters for this job. May return null if the job has been retired and the job is no longer in the completed job store.
  
  Returns:
  
  the counters for this job.
  
  Throws:
  
  IOException
- getTaskDiagnostics
  
  public String[] getTaskDiagnostics(TaskAttemptID taskid) throws IOException, InterruptedException
  
  Gets the diagnostic messages for a given task attempt.
  
  Parameters:
  
  taskid -
  
  Returns:
  
  the list of diagnostic messages for the task
  
  Throws:
  
  IOException
  
  InterruptedException
- setNumReduceTasks
  
  public void setNumReduceTasks(int tasks) throws IllegalStateException
  
  Set the number of reduce tasks for the job.
  
  Parameters:
  
  tasks - the number of reduce tasks
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setWorkingDirectory
  
  public void setWorkingDirectory(Path dir) throws IOException
  
  Set the current working directory for the default file system.
  
  Parameters:
  
  dir - the new current working directory.
  
  Throws:
  
  IllegalStateException - if the job is submitted
  
  IOException
- setInputFormatClass
  
  public void setInputFormatClass(Class<? extends InputFormat> cls) throws IllegalStateException
  
  Set the InputFormat for the job.
  
  Parameters:
  
  cls - the InputFormat to use
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setOutputFormatClass
  
  public void setOutputFormatClass(Class<? extends OutputFormat> cls) throws IllegalStateException
  
  Set the OutputFormat for the job.
  
  Parameters:
  
  cls - the OutputFormat to use
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setMapperClass
  
  public void setMapperClass(Class<? extends Mapper> cls) throws IllegalStateException
  
  Set the Mapper for the job.
  
  Parameters:
  
  cls - the Mapper to use
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setJarByClass
  
  public void setJarByClass(Class<?> cls)
  
  Set the Jar by finding where a given class came from.
  
  Parameters:
  
  cls - the example class
- setJar
  
  public void setJar(String jar)
  
  Set the job jar
- setUser
  
  public void setUser(String user)
  
  Set the reported username for this job.
  
  Parameters:
  
  user - the username for this job.
- setCombinerClass
  
  public void setCombinerClass(Class<? extends Reducer> cls) throws IllegalStateException
  
  Set the combiner class for the job.
  
  Parameters:
  
  cls - the combiner to use
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setReducerClass
  
  public void setReducerClass(Class<? extends Reducer> cls) throws IllegalStateException
  
  Set the Reducer for the job.
  
  Parameters:
  
  cls - the Reducer to use
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setPartitionerClass
  
  public void setPartitionerClass(Class<? extends Partitioner> cls) throws IllegalStateException
  
  Set the Partitioner for the job.
  
  Parameters:
  
  cls - the Partitioner to use
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setMapOutputKeyClass
  
  public void setMapOutputKeyClass(Class<?> theClass) throws IllegalStateException
  
  Set the key class for the map output data. This allows the user to specify the map output key class to be different than the final output value class.
  
  Parameters:
  
  theClass - the map output key class.
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setMapOutputValueClass
  
  public void setMapOutputValueClass(Class<?> theClass) throws IllegalStateException
  
  Set the value class for the map output data. This allows the user to specify the map output value class to be different than the final output value class.
  
  Parameters:
  
  theClass - the map output value class.
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setOutputKeyClass
  
  public void setOutputKeyClass(Class<?> theClass) throws IllegalStateException
  
  Set the key class for the job output data.
  
  Parameters:
  
  theClass - the key class for the job output data.
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setOutputValueClass
  
  public void setOutputValueClass(Class<?> theClass) throws IllegalStateException
  
  Set the value class for job outputs.
  
  Parameters:
  
  theClass - the value class for job outputs.
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setCombinerKeyGroupingComparatorClass
  
  public void setCombinerKeyGroupingComparatorClass(Class<? extends RawComparator> cls) throws IllegalStateException
  
  Define the comparator that controls which keys are grouped together for a single call to combiner, Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
  
  Parameters:
  
  cls - the raw comparator to use
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setSortComparatorClass
  
  public void setSortComparatorClass(Class<? extends RawComparator> cls) throws IllegalStateException
  
  Define the comparator that controls how the keys are sorted before they are passed to the Reducer.
  Parameters:
  
  cls - the raw comparator
  
  Throws:
  
  IllegalStateException - if the job is submitted
  
  See Also:
  
  setCombinerKeyGroupingComparatorClass(Class)
- setGroupingComparatorClass
  
  public void setGroupingComparatorClass(Class<? extends RawComparator> cls) throws IllegalStateException
  
  Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
  Parameters:
  
  cls - the raw comparator to use
  
  Throws:
  
  IllegalStateException - if the job is submitted
  
  See Also:
  
  setCombinerKeyGroupingComparatorClass(Class)
- setJobName
  
  public void setJobName(String name) throws IllegalStateException
  
  Set the user-specified job name.
  
  Parameters:
  
  name - the job's new name.
  
  Throws:
  
  IllegalStateException - if the job is submitted
- setSpeculativeExecution
  
  public void setSpeculativeExecution(boolean speculativeExecution)
  
  Turn speculative execution on or off for this job.
  
  Parameters:
  
  speculativeExecution - true if speculative execution should be turned on, else false.
- setMapSpeculativeExecution
  
  public void setMapSpeculativeExecution(boolean speculativeExecution)
  
  Turn speculative execution on or off for this job for map tasks.
  
  Parameters:
  
  speculativeExecution - true if speculative execution should be turned on for map tasks, else false.
- setReduceSpeculativeExecution
  
  public void setReduceSpeculativeExecution(boolean speculativeExecution)
  
  Turn speculative execution on or off for this job for reduce tasks.
  
  Parameters:
  
  speculativeExecution - true if speculative execution should be turned on for reduce tasks, else false.
- setJobSetupCleanupNeeded
  
  public void setJobSetupCleanupNeeded(boolean needed)
  
  Specify whether job-setup and job-cleanup is needed for the job
  
  Parameters:
  
  needed - If true, job-setup and job-cleanup will be considered from OutputCommitter else ignored.
- setCacheArchives
  
  public void setCacheArchives(URI[] archives)
  
  Set the given set of archives
  
  Parameters:
  
  archives - The list of archives that need to be localized
- setCacheArchives
  
  public static void setCacheArchives(URI[] archives, Configuration conf)
  
  Set the configuration with the given set of archives.
  
  Parameters:
  
  archives - The list of archives that need to be localized.
  
  conf - Configuration which will be changed.
- setCacheFiles
  
  public void setCacheFiles(URI[] files)
  
  Set the given set of files
  
  Parameters:
  
  files - The list of files that need to be localized
- setCacheFiles
  
  public static void setCacheFiles(URI[] files, Configuration conf)
  
  Set the configuration with the given set of files.
  
  Parameters:
  
  files - The list of files that need to be localized.
  
  conf - Configuration which will be changed.
- addCacheArchive
  
  public void addCacheArchive(URI uri)
  
  Add a archives to be localized
  
  Parameters:
  
  uri - The uri of the cache to be localized
- addCacheArchive
  
  public static void addCacheArchive(URI uri, Configuration conf)
  
  Add an archives to be localized to the conf.
  
  Parameters:
  
  uri - The uri of the cache to be localized.
  
  conf - Configuration to add the cache to.
- addCacheFile
  
  public void addCacheFile(URI uri)
  
  Add a file to be localized
  
  Parameters:
  
  uri - The uri of the cache to be localized
- addCacheFile
  
  public static void addCacheFile(URI uri, Configuration conf)
  
  Add a file to be localized to the conf. The localized file will be downloaded to the execution node(s), and a link will be created to the file from the job's working directory. If the last part of URI's path name is "*", then the entire parent directory will be localized and links will be created from the job's working directory to each file in the parent directory.
  The access permissions of the file will determine whether the localized file will be shared across jobs. If the file is not readable by other or if any of its parent directories is not executable by other, then the file will not be shared. In the case of a path that ends in "/*", sharing of the localized files will be determined solely from the access permissions of the parent directories. The access permissions of the individual files will be ignored.
  
  Parameters:
  
  uri - The uri of the cache to be localized.
  
  conf - Configuration to add the cache to.
- addFileToClassPath
  
  public void addFileToClassPath(Path file) throws IOException
  
  Add an file path to the current set of classpath entries It adds the file to cache as well. Files added with this method will not be unpacked while being added to the classpath. To add archives to classpath, use the addArchiveToClassPath(Path) method instead.
  
  Parameters:
  
  file - Path of the file to be added
  
  Throws:
  
  IOException
- addFileToClassPath
  
  public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs)
  
  Add a file path to the current set of classpath entries. The file will also be added to the cache.
  
  Parameters:
  
  file - Path of the file to be added.
  
  conf - Configuration that contains the classpath setting.
  
  fs - FileSystem with respect to which file should be interpreted.
- addFileToClassPath
  
  public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs, boolean addToCache)
  
  Add a file path to the current set of classpath entries. The file will also be added to the cache if addToCache is true.
  
  Parameters:
  
  file - Path of the file to be added.
  
  conf - Configuration that contains the classpath setting.
  
  fs - FileSystem with respect to which file should be interpreted.
  
  addToCache - Whether the file should also be added to the cache list.
- addArchiveToClassPath
  
  public void addArchiveToClassPath(Path archive) throws IOException
  
  Add an archive path to the current set of classpath entries. It adds the archive to cache as well. Archive files will be unpacked and added to the classpath when being distributed.
  
  Parameters:
  
  archive - Path of the archive to be added
  
  Throws:
  
  IOException
- addArchiveToClassPath
  
  public static void addArchiveToClassPath(Path archive, Configuration conf, FileSystem fs)
  
  Add an archive path to the current set of classpath entries. It adds the archive to cache as well.
  
  Parameters:
  
  archive - Path of the archive to be added.
  
  conf - Configuration that contains the classpath setting.
  
  fs - FileSystem with respect to which archive should be interpreted.
- createSymlink
  
  @Deprecated public void createSymlink()
  
  Deprecated.
  
  Originally intended to enable symlinks, but currently symlinks cannot be disabled.
- setMaxMapAttempts
  
  public void setMaxMapAttempts(int n)
  
  Expert: Set the number of maximum attempts that will be made to run a map task.
  
  Parameters:
  
  n - the number of attempts per map task.
- setMaxReduceAttempts
  
  public void setMaxReduceAttempts(int n)
  
  Expert: Set the number of maximum attempts that will be made to run a reduce task.
  
  Parameters:
  
  n - the number of attempts per reduce task.
- setProfileEnabled
  
  public void setProfileEnabled(boolean newValue)
  
  Set whether the system should collect profiler information for some of the tasks in this job? The information is stored in the user log directory.
  
  Parameters:
  
  newValue - true means it should be gathered
- setProfileParams
  
  public void setProfileParams(String value)
  
  Set the profiler configuration arguments. If the string contains a '%s' it will be replaced with the name of the profiling output file when the task runs. This value is passed to the task child JVM on the command line.
  
  Parameters:
  
  value - the configuration string
- setProfileTaskRange
  
  public void setProfileTaskRange(boolean isMap, String newValue)
  
  Set the ranges of maps or reduces to profile. setProfileEnabled(true) must also be called.
  
  Parameters:
  
  newValue - a set of integer ranges of the map ids
- setCancelDelegationTokenUponJobCompletion
  
  public void setCancelDelegationTokenUponJobCompletion(boolean value)
  
  Sets the flag that will allow the JobTracker to cancel the HDFS delegation tokens upon job completion. Defaults to true.
- addFileToSharedCache
  
  @Unstable public static boolean addFileToSharedCache(URI resource, Configuration conf)
  
  Add a file to job config for shared cache processing. If shared cache is enabled, it will return true, otherwise, return false. We don't check with SCM here given application might not be able to provide the job id; ClientSCMProtocol.use requires the application id. Job Submitter will read the files from job config and take care of things.
  
  Parameters:
  
  resource - The resource that Job Submitter will process later using shared cache.
  
  conf - Configuration to add the resource to
  
  Returns:
  
  whether the resource has been added to the configuration
- addFileToSharedCacheAndClasspath
  
  @Unstable public static boolean addFileToSharedCacheAndClasspath(URI resource, Configuration conf)
  
  Add a file to job config for shared cache processing. If shared cache is enabled, it will return true, otherwise, return false. We don't check with SCM here given application might not be able to provide the job id; ClientSCMProtocol.use requires the application id. Job Submitter will read the files from job config and take care of things. Job Submitter will also add the file to classpath. Intended to be used by user code.
  
  Parameters:
  
  resource - The resource that Job Submitter will process later using shared cache.
  
  conf - Configuration to add the resource to
  
  Returns:
  
  whether the resource has been added to the configuration
- addArchiveToSharedCache
  
  @Unstable public static boolean addArchiveToSharedCache(URI resource, Configuration conf)
  
  Add an archive to job config for shared cache processing. If shared cache is enabled, it will return true, otherwise, return false. We don't check with SCM here given application might not be able to provide the job id; ClientSCMProtocol.use requires the application id. Job Submitter will read the files from job config and take care of things. Intended to be used by user code.
  
  Parameters:
  
  resource - The resource that Job Submitter will process later using shared cache.
  
  conf - Configuration to add the resource to
  
  Returns:
  
  whether the resource has been added to the configuration
- setFileSharedCacheUploadPolicies
  
  @Unstable public static void setFileSharedCacheUploadPolicies(Configuration conf, Map<String,Boolean> policies)
  
  This is to set the shared cache upload policies for files. If the parameter was previously set, this method will replace the old value with the new provided map.
  
  Parameters:
  
  conf - Configuration which stores the shared cache upload policies
  
  policies - A map containing the shared cache upload policies for a set of resources. The key is the url of the resource and the value is the upload policy. True if it should be uploaded, false otherwise.
- setArchiveSharedCacheUploadPolicies
  
  @Unstable public static void setArchiveSharedCacheUploadPolicies(Configuration conf, Map<String,Boolean> policies)
  
  This is to set the shared cache upload policies for archives. If the parameter was previously set, this method will replace the old value with the new provided map.
  
  Parameters:
  
  conf - Configuration which stores the shared cache upload policies
  
  policies - A map containing the shared cache upload policies for a set of resources. The key is the url of the resource and the value is the upload policy. True if it should be uploaded, false otherwise.
- getFileSharedCacheUploadPolicies
  
  @Unstable public static Map<String,Boolean> getFileSharedCacheUploadPolicies(Configuration conf)
  
  This is to get the shared cache upload policies for files.
  
  Parameters:
  
  conf - Configuration which stores the shared cache upload policies
  
  Returns:
  
  A map containing the shared cache upload policies for a set of resources. The key is the url of the resource and the value is the upload policy. True if it should be uploaded, false otherwise.
- getArchiveSharedCacheUploadPolicies
  
  @Unstable public static Map<String,Boolean> getArchiveSharedCacheUploadPolicies(Configuration conf)
  
  This is to get the shared cache upload policies for archives.
  
  Parameters:
  
  conf - Configuration which stores the shared cache upload policies
  
  Returns:
  
  A map containing the shared cache upload policies for a set of resources. The key is the url of the resource and the value is the upload policy. True if it should be uploaded, false otherwise.
- submit
  
  public void submit() throws IOException, InterruptedException, ClassNotFoundException
  
  Submit the job to the cluster and return immediately.
  
  Throws:
  
  IOException
  
  InterruptedException
  
  ClassNotFoundException
- waitForCompletion
  
  public boolean waitForCompletion(boolean verbose) throws IOException, InterruptedException, ClassNotFoundException
  
  Submit the job to the cluster and wait for it to finish.
  
  Parameters:
  
  verbose - print the progress to the user
  
  Returns:
  
  true if the job succeeded
  
  Throws:
  
  IOException - thrown if the communication with the JobTracker is lost
  
  InterruptedException
  
  ClassNotFoundException
- monitorAndPrintJob
  
  public boolean monitorAndPrintJob() throws IOException, InterruptedException
  
  Monitor a job and print status in real-time as progress is made and tasks fail.
  
  Returns:
  
  true if the job succeeded
  
  Throws:
  
  IOException - if communication to the JobTracker fails
  
  InterruptedException
- getProgressPollInterval
  
  public static int getProgressPollInterval(Configuration conf)
  
  The interval at which monitorAndPrintJob() prints status
- getCompletionPollInterval
  
  public static int getCompletionPollInterval(Configuration conf)
  
  The interval at which waitForCompletion() should check.
- getTaskOutputFilter
  
  public static Job.TaskStatusFilter getTaskOutputFilter(Configuration conf)
  
  Get the task output filter.
  
  Parameters:
  
  conf - the configuration.
  
  Returns:
  
  the filter level.
- setTaskOutputFilter
  
  public static void setTaskOutputFilter(Configuration conf, Job.TaskStatusFilter newValue)
  
  Modify the Configuration to set the task output filter.
  
  Parameters:
  
  conf - the Configuration to modify.
  
  newValue - the value to set.
- isUber
  
  public boolean isUber() throws IOException, InterruptedException
  
  Throws:
  
  IOException
  
  InterruptedException
- getReservationId
  
  public ReservationId getReservationId()
  
  Get the reservation to which the job is submitted to, if any
  
  Returns:
  
  the reservationId the identifier of the job's reservation, null if the job does not have any reservation associated with it
- setReservationId
  
  public void setReservationId(ReservationId reservationId)
  
  Set the reservation to which the job is submitted to
  
  Parameters:
  
  reservationId - the reservationId to set
- close
  
  public void close() throws IOException
  
  Close the Job.
  
  Specified by:
  
  close in interface AutoCloseable
  
  Throws:
  
  IOException - if fail to close.

Class Job

Nested Class Summary

Field Summary

Fields inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl

Fields inherited from interface org.apache.hadoop.mapreduce.MRJobConfig

Constructor Summary

Method Summary

Methods inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl

Methods inherited from class java.lang.Object

Methods inherited from interface org.apache.hadoop.mapreduce.JobContext

Field Details

OUTPUT_FILTER

COMPLETION_POLL_INTERVAL_KEY

PROGRESS_MONITOR_POLL_INTERVAL_KEY

USED_GENERIC_PARSER

SUBMIT_REPLICATION

DEFAULT_SUBMIT_REPLICATION

USE_WILDCARD_FOR_LIBJARS

DEFAULT_USE_WILDCARD_FOR_LIBJARS

Constructor Details

Job

Job

Job

Method Details

getInstance

getInstance

getInstance

getInstance

getInstance

getInstance

getInstance

getStatus

getJobState

getTrackingURL

getJobFile

getStartTime

getFinishTime

getSchedulingInfo

getPriority

getJobName

getHistoryUrl

isRetired

getCluster

toString

getTaskReports

mapProgress

reduceProgress

cleanupProgress

setupProgress

isComplete

isSuccessful

killJob

setPriority

setPriorityAsInteger

getTaskCompletionEvents

getTaskCompletionEvents

killTask

killTask

failTask

getCounters

getTaskDiagnostics

setNumReduceTasks

setWorkingDirectory

setInputFormatClass

setOutputFormatClass

setMapperClass

setJarByClass

setJar

setUser

setCombinerClass

setReducerClass

setPartitionerClass

setMapOutputKeyClass

setMapOutputValueClass

setOutputKeyClass

setOutputValueClass

setCombinerKeyGroupingComparatorClass

setSortComparatorClass

setGroupingComparatorClass

setJobName