Class Job
- All Implemented Interfaces:
AutoCloseable, JobContext, org.apache.hadoop.mapreduce.MRJobConfig
Job is the job submitter's view of the job. It allows the user to configure the job, submit it, control its execution, and query its state. The set methods only work until the job is submitted; afterwards they will throw an IllegalStateException.
Normally the user creates the application, describes various facets of the
job via Job, and then submits the job and monitors its progress.
Here is an example of how to submit a job:
// Create a new Job
Job job = Job.getInstance();
job.setJarByClass(MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
FileInputFormat.addInputPath(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
job.waitForCompletion(true);
-
Nested Class Summary
Nested classes: Job.JobState, Job.TaskStatusFilter
Field Summary
Fields:
- static final String COMPLETION_POLL_INTERVAL_KEY: Key in mapred-*.xml that sets completionPollIntervalMillis
- static final int DEFAULT_SUBMIT_REPLICATION
- static final boolean DEFAULT_USE_WILDCARD_FOR_LIBJARS
- static final String OUTPUT_FILTER
- static final String PROGRESS_MONITOR_POLL_INTERVAL_KEY: Key in mapred-*.xml that sets progMonitorPollIntervalMillis
- static final String SUBMIT_REPLICATION
- static final String USE_WILDCARD_FOR_LIBJARS
- static final String USED_GENERIC_PARSER
Fields inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl
conf, credentials, ugi
Fields inherited from interface org.apache.hadoop.mapreduce.MRJobConfig
AM_NODE_LABEL_EXP, AM_STRICT_LOCALITY, APPLICATION_ATTEMPT_ID, APPLICATION_MASTER_CLASS, ARCHIVES_FOR_SHARED_CACHE, CACHE_ARCHIVES, CACHE_ARCHIVES_SHARED_CACHE_UPLOAD_POLICIES, CACHE_ARCHIVES_SIZES, CACHE_ARCHIVES_TIMESTAMPS, CACHE_ARCHIVES_VISIBILITIES, CACHE_FILE_TIMESTAMPS, CACHE_FILE_VISIBILITIES, CACHE_FILES, CACHE_FILES_SHARED_CACHE_UPLOAD_POLICIES, CACHE_FILES_SIZES, CACHE_LOCALARCHIVES, CACHE_LOCALFILES, CACHE_SYMLINK, CLASSPATH_ARCHIVES, CLASSPATH_FILES, COMBINE_CLASS_ATTR, COMBINE_RECORDS_BEFORE_PROGRESS, COMBINER_GROUP_COMPARATOR_CLASS, COMPLETED_MAPS_FOR_REDUCE_SLOWSTART, COUNTER_GROUP_NAME_MAX_DEFAULT, COUNTER_GROUP_NAME_MAX_KEY, COUNTER_GROUPS_MAX_DEFAULT, COUNTER_GROUPS_MAX_KEY, COUNTER_NAME_MAX_DEFAULT, COUNTER_NAME_MAX_KEY, COUNTERS_MAX_DEFAULT, COUNTERS_MAX_KEY, DEFAULT_FINISH_JOB_WHEN_REDUCERS_DONE, DEFAULT_HEAP_MEMORY_MB_RATIO, DEFAULT_IO_SORT_FACTOR, DEFAULT_IO_SORT_MB, DEFAULT_JOB_ACL_MODIFY_JOB, DEFAULT_JOB_ACL_VIEW_JOB, DEFAULT_JOB_AM_ACCESS_DISABLED, DEFAULT_JOB_DFS_STORAGE_CAPACITY_KILL_LIMIT_EXCEED, DEFAULT_JOB_MAX_MAP, DEFAULT_JOB_RUNNING_MAP_LIMIT, DEFAULT_JOB_RUNNING_REDUCE_LIMIT, DEFAULT_JOB_SINGLE_DISK_LIMIT_BYTES, DEFAULT_JOB_SINGLE_DISK_LIMIT_CHECK_INTERVAL_MS, DEFAULT_JOB_SINGLE_DISK_LIMIT_KILL_LIMIT_EXCEED, DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED, DEFAULT_LOG_LEVEL, DEFAULT_MAP_CPU_VCORES, DEFAULT_MAP_MEMORY_MB, DEFAULT_MAPRED_ADMIN_JAVA_OPTS, DEFAULT_MAPRED_ADMIN_USER_ENV, DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH, DEFAULT_MAPREDUCE_CROSS_PLATFORM_APPLICATION_CLASSPATH, DEFAULT_MAPREDUCE_JOB_EMIT_TIMELINE_DATA, DEFAULT_MAPREDUCE_JVM_SYSTEM_PROPERTIES_TO_LOG, DEFAULT_MAX_ALLOWED_FETCH_FAILURES_FRACTION, DEFAULT_MAX_FETCH_FAILURES_NOTIFICATIONS, DEFAULT_MAX_SHUFFLE_FETCH_HOST_FAILURES, DEFAULT_MAX_SHUFFLE_FETCH_RETRY_DELAY, DEFAULT_MR_AM_ADMIN_COMMAND_OPTS, DEFAULT_MR_AM_ADMIN_USER_ENV, DEFAULT_MR_AM_COMMAND_OPTS, DEFAULT_MR_AM_COMMIT_WINDOW_MS, DEFAULT_MR_AM_COMMITTER_CANCEL_TIMEOUT_MS, 
DEFAULT_MR_AM_CONTAINERLAUNCHER_THREAD_COUNT_LIMIT, DEFAULT_MR_AM_CONTAINERLAUNCHER_THREADPOOL_INITIAL_SIZE, DEFAULT_MR_AM_CPU_VCORES, DEFAULT_MR_AM_HARD_KILL_TIMEOUT_MS, DEFAULT_MR_AM_HISTORY_COMPLETE_EVENT_FLUSH_TIMEOUT_MS, DEFAULT_MR_AM_HISTORY_JOB_COMPLETE_UNFLUSHED_MULTIPLIER, DEFAULT_MR_AM_HISTORY_MAX_UNFLUSHED_COMPLETE_EVENTS, DEFAULT_MR_AM_HISTORY_USE_BATCHED_FLUSH_QUEUE_SIZE_THRESHOLD, DEFAULT_MR_AM_IGNORE_BLACKLISTING_BLACKLISTED_NODE_PERCENT, DEFAULT_MR_AM_JOB_CLIENT_THREAD_COUNT, DEFAULT_MR_AM_JOB_REDUCE_PREEMPTION_LIMIT, DEFAULT_MR_AM_JOB_REDUCE_RAMP_UP_LIMIT, DEFAULT_MR_AM_LOG_BACKUPS, DEFAULT_MR_AM_LOG_KB, DEFAULT_MR_AM_LOG_LEVEL, DEFAULT_MR_AM_MAX_ATTEMPTS, DEFAULT_MR_AM_NUM_PROGRESS_SPLITS, DEFAULT_MR_AM_PROFILE, DEFAULT_MR_AM_STAGING_DIR, DEFAULT_MR_AM_STAGING_ERASURECODING_ENABLED, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_INITIALS, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_LAMBDA_MS, DEFAULT_MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_STAGNATED_MS, DEFAULT_MR_AM_TASK_ESTIMATOR_SMOOTH_LAMBDA_MS, DEFAULT_MR_AM_TASK_LISTENER_THREAD_COUNT, DEFAULT_MR_AM_TO_RM_HEARTBEAT_INTERVAL_MS, DEFAULT_MR_AM_TO_RM_WAIT_INTERVAL_MS, DEFAULT_MR_AM_VMEM_MB, DEFAULT_MR_AM_WEBAPP_HTTPS_CLIENT_AUTH, DEFAULT_MR_AM_WEBAPP_HTTPS_ENABLED, DEFAULT_MR_CLIENT_JOB_MAX_RETRIES, DEFAULT_MR_CLIENT_JOB_RETRY_INTERVAL, DEFAULT_MR_CLIENT_MAX_RETRIES, DEFAULT_MR_CLIENT_TO_AM_IPC_MAX_RETRIES, DEFAULT_MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB, DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS, DEFAULT_MR_JOB_END_NOTIFICATION_TIMEOUT, DEFAULT_MR_JOB_REDUCER_PREEMPT_DELAY_SEC, DEFAULT_MR_JOB_REDUCER_UNCONDITIONAL_PREEMPT_DELAY_SEC, DEFAULT_MR_NUM_OPPORTUNISTIC_MAPS_PERCENT, DEFAULT_MR_TASK_ENABLE_PING_FOR_LIVELINESS_CHECK, DEFAULT_REDUCE_CPU_VCORES, DEFAULT_REDUCE_MEMORY_MB, DEFAULT_REDUCE_SEPARATE_SHUFFLE_LOG, DEFAULT_SHELL, DEFAULT_SHUFFLE_FETCH_RETRY_INTERVAL_MS, 
DEFAULT_SHUFFLE_INPUT_BUFFER_PERCENT, DEFAULT_SHUFFLE_KEY_LENGTH, DEFAULT_SHUFFLE_LOG_BACKUPS, DEFAULT_SHUFFLE_LOG_KB, DEFAULT_SHUFFLE_MERGE_PERCENT, DEFAULT_SPECULATIVE_MINIMUM_ALLOWED_TASKS, DEFAULT_SPECULATIVE_RETRY_AFTER_NO_SPECULATE, DEFAULT_SPECULATIVE_RETRY_AFTER_SPECULATE, DEFAULT_SPECULATIVECAP_RUNNING_TASKS, DEFAULT_SPECULATIVECAP_TOTAL_TASKS, DEFAULT_SPLIT_METAINFO_MAXSIZE, DEFAULT_TASK_ISMAP, DEFAULT_TASK_LOCAL_WRITE_LIMIT_BYTES, DEFAULT_TASK_LOG_BACKUPS, DEFAULT_TASK_PROFILE_PARAMS, DEFAULT_TASK_STUCK_TIMEOUT_MS, DEFAULT_TASK_TIMEOUT_MILLIS, FILES_FOR_CLASSPATH_AND_SHARED_CACHE, FILES_FOR_SHARED_CACHE, FINISH_JOB_WHEN_REDUCERS_DONE, GROUP_COMPARATOR_CLASS, HADOOP_WORK_DIR, HEAP_MEMORY_MB_RATIO, ID, INDEX_CACHE_MEMORY_LIMIT, INPUT_FILE_MANDATORY_PREFIX, INPUT_FILE_OPTION_PREFIX, INPUT_FORMAT_CLASS_ATTR, IO_SORT_FACTOR, IO_SORT_MB, JAR, JAR_UNPACK_PATTERN, JOB_ACL_MODIFY_JOB, JOB_ACL_VIEW_JOB, JOB_AM_ACCESS_DISABLED, JOB_CANCEL_DELEGATION_TOKEN, JOB_CONF_FILE, JOB_DFS_STORAGE_CAPACITY_KILL_LIMIT_EXCEED, JOB_JAR, JOB_JOBTRACKER_ID, JOB_LOCAL_DIR, JOB_MAX_MAP, JOB_NAME, JOB_NAMENODES, JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE, JOB_NODE_LABEL_EXP, JOB_RUNNING_MAP_LIMIT, JOB_RUNNING_REDUCE_LIMIT, JOB_SINGLE_DISK_LIMIT_BYTES, JOB_SINGLE_DISK_LIMIT_CHECK_INTERVAL_MS, JOB_SINGLE_DISK_LIMIT_KILL_LIMIT_EXCEED, JOB_SPLIT, JOB_SPLIT_METAINFO, JOB_SUBMIT_DIR, JOB_SUBMITHOST, JOB_SUBMITHOSTADDR, JOB_TAGS, JOB_TOKEN_TRACKING_IDS, JOB_TOKEN_TRACKING_IDS_ENABLED, JOB_UBERTASK_ENABLE, JOB_UBERTASK_MAXBYTES, JOB_UBERTASK_MAXMAPS, JOB_UBERTASK_MAXREDUCES, JOBJAR_SHARED_CACHE_UPLOAD_POLICY, JOBJAR_SHARED_CACHE_UPLOAD_POLICY_DEFAULT, JOBJAR_VISIBILITY, JOBJAR_VISIBILITY_DEFAULT, JVM_NUMTASKS_TORUN, KEY_COMPARATOR, MAP_CLASS_ATTR, MAP_COMBINE_MIN_SPILLS, MAP_CPU_VCORES, MAP_DEBUG_SCRIPT, MAP_ENV, MAP_FAILURES_MAX_PERCENT, MAP_INPUT_FILE, MAP_INPUT_PATH, MAP_INPUT_START, MAP_JAVA_OPTS, MAP_LOG_LEVEL, MAP_MAX_ATTEMPTS, MAP_MEMORY_MB, MAP_NODE_LABEL_EXP, 
MAP_OUTPUT_COLLECTOR_CLASS_ATTR, MAP_OUTPUT_COMPRESS, MAP_OUTPUT_COMPRESS_CODEC, MAP_OUTPUT_KEY_CLASS, MAP_OUTPUT_KEY_FIELD_SEPARATOR, MAP_OUTPUT_KEY_FIELD_SEPERATOR, MAP_OUTPUT_VALUE_CLASS, MAP_RESOURCE_TYPE_PREFIX, MAP_SKIP_INCR_PROC_COUNT, MAP_SKIP_MAX_RECORDS, MAP_SORT_CLASS, MAP_SORT_SPILL_PERCENT, MAP_SPECULATIVE, MAPRED_ADMIN_USER_ENV, MAPRED_ADMIN_USER_SHELL, MAPRED_MAP_ADMIN_JAVA_OPTS, MAPRED_REDUCE_ADMIN_JAVA_OPTS, MAPREDUCE_APPLICATION_CLASSPATH, MAPREDUCE_APPLICATION_FRAMEWORK_PATH, MAPREDUCE_JOB_CLASSLOADER, MAPREDUCE_JOB_CLASSLOADER_SYSTEM_CLASSES, MAPREDUCE_JOB_CREDENTIALS_BINARY, MAPREDUCE_JOB_DIR, MAPREDUCE_JOB_EMIT_TIMELINE_DATA, MAPREDUCE_JOB_LOG4J_PROPERTIES_FILE, MAPREDUCE_JOB_SHUFFLE_PROVIDER_SERVICES, MAPREDUCE_JOB_USER_CLASSPATH_FIRST, MAPREDUCE_JVM_ADD_OPENS_JAVA_OPT, MAPREDUCE_JVM_ADD_OPENS_JAVA_OPT_DEFAULT, MAPREDUCE_JVM_SYSTEM_PROPERTIES_TO_LOG, MAPREDUCE_V2_CHILD_CLASS, MAX_ALLOWED_FETCH_FAILURES_FRACTION, MAX_FETCH_FAILURES_NOTIFICATIONS, MAX_RESOURCES, MAX_RESOURCES_DEFAULT, MAX_RESOURCES_MB, MAX_RESOURCES_MB_DEFAULT, MAX_SHUFFLE_FETCH_HOST_FAILURES, MAX_SHUFFLE_FETCH_RETRY_DELAY, MAX_SINGLE_RESOURCE_MB, MAX_SINGLE_RESOURCE_MB_DEFAULT, MAX_TASK_FAILURES_PER_TRACKER, MR_AM_ADMIN_COMMAND_OPTS, MR_AM_ADMIN_USER_ENV, MR_AM_COMMAND_OPTS, MR_AM_COMMIT_WINDOW_MS, MR_AM_COMMITTER_CANCEL_TIMEOUT_MS, MR_AM_CONTAINERLAUNCHER_THREAD_COUNT_LIMIT, MR_AM_CONTAINERLAUNCHER_THREADPOOL_INITIAL_SIZE, MR_AM_CPU_VCORES, MR_AM_CREATE_JH_INTERMEDIATE_BASE_DIR, MR_AM_ENV, MR_AM_HARD_KILL_TIMEOUT_MS, MR_AM_HISTORY_COMPLETE_EVENT_FLUSH_TIMEOUT_MS, MR_AM_HISTORY_JOB_COMPLETE_UNFLUSHED_MULTIPLIER, MR_AM_HISTORY_MAX_UNFLUSHED_COMPLETE_EVENTS, MR_AM_HISTORY_USE_BATCHED_FLUSH_QUEUE_SIZE_THRESHOLD, MR_AM_IGNORE_BLACKLISTING_BLACKLISTED_NODE_PERECENT, MR_AM_JOB_CLIENT_PORT_RANGE, MR_AM_JOB_CLIENT_THREAD_COUNT, MR_AM_JOB_NODE_BLACKLISTING_ENABLE, MR_AM_JOB_RECOVERY_ENABLE, MR_AM_JOB_RECOVERY_ENABLE_DEFAULT, MR_AM_JOB_REDUCE_PREEMPTION_LIMIT, 
MR_AM_JOB_REDUCE_RAMPUP_UP_LIMIT, MR_AM_JOB_SPECULATOR, MR_AM_LOG_BACKUPS, MR_AM_LOG_KB, MR_AM_LOG_LEVEL, MR_AM_MAX_ATTEMPTS, MR_AM_NUM_PROGRESS_SPLITS, MR_AM_PREEMPTION_POLICY, MR_AM_PREFIX, MR_AM_PROFILE, MR_AM_PROFILE_PARAMS, MR_AM_RESOURCE_PREFIX, MR_AM_SECURITY_SERVICE_AUTHORIZATION_CLIENT, MR_AM_SECURITY_SERVICE_AUTHORIZATION_TASK_UMBILICAL, MR_AM_STAGING_DIR, MR_AM_STAGING_DIR_ERASURECODING_ENABLED, MR_AM_TASK_ESTIMATOR, MR_AM_TASK_ESTIMATOR_EXPONENTIAL_RATE_ENABLE, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_LAMBDA_MS, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_SKIP_INITIALS, MR_AM_TASK_ESTIMATOR_SIMPLE_SMOOTH_STAGNATED_MS, MR_AM_TASK_ESTIMATOR_SMOOTH_LAMBDA_MS, MR_AM_TASK_LISTENER_THREAD_COUNT, MR_AM_TO_RM_HEARTBEAT_INTERVAL_MS, MR_AM_TO_RM_WAIT_INTERVAL_MS, MR_AM_VMEM_MB, MR_AM_WEBAPP_HTTPS_CLIENT_AUTH, MR_AM_WEBAPP_HTTPS_ENABLED, MR_AM_WEBAPP_PORT_RANGE, MR_APPLICATION_TYPE, MR_CLIENT_JOB_MAX_RETRIES, MR_CLIENT_JOB_RETRY_INTERVAL, MR_CLIENT_MAX_RETRIES, MR_CLIENT_TO_AM_IPC_MAX_RETRIES, MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS, MR_ENCRYPTED_INTERMEDIATE_DATA, MR_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB, MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS, MR_JOB_END_NOTIFICATION_CUSTOM_NOTIFIER_CLASS, MR_JOB_END_NOTIFICATION_MAX_ATTEMPTS, MR_JOB_END_NOTIFICATION_MAX_RETRY_INTERVAL, MR_JOB_END_NOTIFICATION_PROXY, MR_JOB_END_NOTIFICATION_TIMEOUT, MR_JOB_END_NOTIFICATION_URL, MR_JOB_END_RETRY_ATTEMPTS, MR_JOB_END_RETRY_INTERVAL, MR_JOB_REDACTED_PROPERTIES, MR_JOB_REDUCER_PREEMPT_DELAY_SEC, MR_JOB_REDUCER_UNCONDITIONAL_PREEMPT_DELAY_SEC, MR_JOB_SEND_TOKEN_CONF, MR_NUM_OPPORTUNISTIC_MAPS_PERCENT, MR_PREFIX, MR_TASK_ENABLE_PING_FOR_LIVELINESS_CHECK, NUM_MAP_PROFILES, NUM_MAPS, NUM_REDUCE_PROFILES, NUM_REDUCES, OUTPUT, OUTPUT_FORMAT_CLASS_ATTR, OUTPUT_KEY_CLASS, OUTPUT_VALUE_CLASS, PARTITIONER_CLASS_ATTR, PRESERVE_FAILED_TASK_FILES, PRESERVE_FILES_PATTERN, PRIORITY, QUEUE_NAME, RECORDS_BEFORE_PROGRESS, REDUCE_CLASS_ATTR, REDUCE_CPU_VCORES, REDUCE_DEBUG_SCRIPT, REDUCE_ENV, 
REDUCE_FAILURES_MAXPERCENT, REDUCE_INPUT_BUFFER_PERCENT, REDUCE_JAVA_OPTS, REDUCE_LOG_LEVEL, REDUCE_MARKRESET_BUFFER_PERCENT, REDUCE_MARKRESET_BUFFER_SIZE, REDUCE_MAX_ATTEMPTS, REDUCE_MEMORY_MB, REDUCE_MEMORY_TOTAL_BYTES, REDUCE_MEMTOMEM_ENABLED, REDUCE_MEMTOMEM_THRESHOLD, REDUCE_MERGE_INMEM_THRESHOLD, REDUCE_NODE_LABEL_EXP, REDUCE_RESOURCE_TYPE_PREFIX, REDUCE_SEPARATE_SHUFFLE_LOG, REDUCE_SKIP_INCR_PROC_COUNT, REDUCE_SKIP_MAXGROUPS, REDUCE_SPECULATIVE, RESERVATION_ID, RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY, RESOURCE_TYPE_NAME_MEMORY, RESOURCE_TYPE_NAME_VCORE, SETUP_CLEANUP_NEEDED, SHARED_CACHE_MODE, SHARED_CACHE_MODE_DEFAULT, SHUFFLE_CONNECT_TIMEOUT, SHUFFLE_FETCH_FAILURES, SHUFFLE_FETCH_RETRY_ENABLED, SHUFFLE_FETCH_RETRY_INTERVAL_MS, SHUFFLE_FETCH_RETRY_TIMEOUT_MS, SHUFFLE_INDEX_CACHE, SHUFFLE_INPUT_BUFFER_PERCENT, SHUFFLE_KEY_LENGTH, SHUFFLE_LOG_BACKUPS, SHUFFLE_LOG_KB, SHUFFLE_MEMORY_LIMIT_PERCENT, SHUFFLE_MERGE_PERCENT, SHUFFLE_NOTIFY_READERROR, SHUFFLE_PARALLEL_COPIES, SHUFFLE_READ_TIMEOUT, SKIP_OUTDIR, SKIP_RECORDS, SKIP_START_ATTEMPTS, SPECULATIVE_MINIMUM_ALLOWED_TASKS, SPECULATIVE_RETRY_AFTER_NO_SPECULATE, SPECULATIVE_RETRY_AFTER_SPECULATE, SPECULATIVE_SLOWNODE_THRESHOLD, SPECULATIVE_SLOWTASK_THRESHOLD, SPECULATIVECAP, SPECULATIVECAP_RUNNING_TASKS, SPECULATIVECAP_TOTAL_TASKS, SPILL_FILES_COUNT_LIMIT, SPLIT_FILE, SPLIT_METAINFO_MAXSIZE, STDERR_LOGFILE_ENV, STDOUT_LOGFILE_ENV, TASK_ATTEMPT_ID, TASK_CLEANUP_NEEDED, TASK_DEBUGOUT_LINES, TASK_EXIT_TIMEOUT, TASK_EXIT_TIMEOUT_CHECK_INTERVAL_MS, TASK_EXIT_TIMEOUT_CHECK_INTERVAL_MS_DEFAULT, TASK_EXIT_TIMEOUT_DEFAULT, TASK_ID, TASK_ISMAP, TASK_LOCAL_WRITE_LIMIT_BYTES, TASK_LOG_BACKUPS, TASK_LOG_PROGRESS_DELTA_THRESHOLD, TASK_LOG_PROGRESS_DELTA_THRESHOLD_DEFAULT, TASK_LOG_PROGRESS_WAIT_INTERVAL_SECONDS, TASK_LOG_PROGRESS_WAIT_INTERVAL_SECONDS_DEFAULT, TASK_MAP_PROFILE_PARAMS, TASK_OUTPUT_DIR, TASK_PARTITION, TASK_PREEMPTION, TASK_PROFILE, TASK_PROFILE_PARAMS, TASK_PROGRESS_REPORT_INTERVAL, TASK_REDUCE_PROFILE_PARAMS, 
TASK_STUCK_TIMEOUT_MS, TASK_TIMEOUT, TASK_TIMEOUT_CHECK_INTERVAL_MS, TASK_USERLOG_LIMIT, USER_NAME, WORKDIR, WORKFLOW_ADJACENCY_PREFIX_PATTERN, WORKFLOW_ADJACENCY_PREFIX_STRING, WORKFLOW_ID, WORKFLOW_NAME, WORKFLOW_NODE_NAME, WORKFLOW_TAGS, WORKING_DIR -
Constructor Summary
Constructors:
- Job(): Deprecated.
- Job(Configuration conf): Deprecated.
- Job(Configuration conf, String jobName): Deprecated.
Method Summary
- void addArchiveToClassPath(Path archive): Add an archive path to the current set of classpath entries.
- static void addArchiveToClassPath(Path archive, Configuration conf, FileSystem fs): Add an archive path to the current set of classpath entries.
- static boolean addArchiveToSharedCache(URI resource, Configuration conf): Add an archive to job config for shared cache processing.
- void addCacheArchive(URI uri): Add an archive to be localized.
- static void addCacheArchive(URI uri, Configuration conf): Add an archive to be localized to the conf.
- void addCacheFile(URI uri): Add a file to be localized.
- static void addCacheFile(URI uri, Configuration conf): Add a file to be localized to the conf.
- void addFileToClassPath(Path file): Add a file path to the current set of classpath entries. It adds the file to the cache as well.
- static void addFileToClassPath(Path file, Configuration conf, FileSystem fs): Add a file path to the current set of classpath entries.
- static void addFileToClassPath(Path file, Configuration conf, FileSystem fs, boolean addToCache): Add a file path to the current set of classpath entries.
- static boolean addFileToSharedCache(URI resource, Configuration conf): Add a file to job config for shared cache processing.
- static boolean addFileToSharedCacheAndClasspath(URI resource, Configuration conf): Add a file to job config for shared cache processing.
- float cleanupProgress(): Get the progress of the job's cleanup-tasks, as a float between 0.0 and 1.0.
- void close(): Close the Job.
- void: Deprecated.
- void failTask(TaskAttemptID taskId): Fail indicated task attempt.
- static Map<String,Boolean> getArchiveSharedCacheUploadPolicies(Configuration conf): This is to get the shared cache upload policies for archives.
- Cluster getCluster()
- static int getCompletionPollInterval(Configuration conf): The interval at which waitForCompletion() should check.
- Counters getCounters(): Gets the counters for this job.
- static Map<String,Boolean> getFileSharedCacheUploadPolicies(Configuration conf): This is to get the shared cache upload policies for files.
- long getFinishTime(): Get finish time of the job.
- String getHistoryUrl()
- static Job getInstance()
- static Job getInstance(Configuration conf)
- static Job getInstance(Configuration conf, String jobName)
- static Job getInstance(Cluster ignored): Deprecated. Use getInstance().
- static Job getInstance(Cluster ignored, Configuration conf): Deprecated.
- static Job getInstance(Cluster cluster, JobStatus status, Configuration conf)
- static Job getInstance(JobStatus status, Configuration conf)
- String getJobFile(): Get the path of the submitted job configuration.
- String getJobName(): The user-specified job name.
- JobStatus.State getJobState(): Returns the current state of the Job.
- JobPriority getPriority(): Get the priority of the job.
- static int getProgressMonitorPollInterval(Configuration conf): The interval at which monitorAndPrintJob() prints status.
- ReservationId getReservationId(): Get the reservation to which the job is submitted, if any.
- String getSchedulingInfo(): Get scheduling info of the job.
- long getStartTime(): Get start time of the job.
- JobStatus getStatus()
- TaskCompletionEvent[] getTaskCompletionEvents(int startFrom): Get events indicating completion (success/failure) of component tasks.
- TaskCompletionEvent[] getTaskCompletionEvents(int startFrom, int numEvents): Get events indicating completion (success/failure) of component tasks.
- String[] getTaskDiagnostics(TaskAttemptID taskid): Gets the diagnostic messages for a given task attempt.
- static Job.TaskStatusFilter getTaskOutputFilter(Configuration conf): Get the task output filter.
- org.apache.hadoop.mapreduce.TaskReport[] getTaskReports(TaskType type): Get the information of the current state of the tasks of a job.
- String getTrackingURL(): Get the URL where some job progress information will be displayed.
- boolean isComplete(): Check if the job is finished or not.
- boolean isRetired()
- boolean isSuccessful(): Check if the job completed successfully.
- boolean isUber()
- void killJob(): Kill the running job.
- void killTask(TaskAttemptID taskId): Kill indicated task attempt.
- boolean killTask(TaskAttemptID taskId, boolean shouldFail): Kill indicated task attempt.
- float mapProgress(): Get the progress of the job's map-tasks, as a float between 0.0 and 1.0.
- boolean monitorAndPrintJob(): Monitor a job and print status in real-time as progress is made and tasks fail.
- float reduceProgress(): Get the progress of the job's reduce-tasks, as a float between 0.0 and 1.0.
- static void setArchiveSharedCacheUploadPolicies(Configuration conf, Map<String,Boolean> policies): This is to set the shared cache upload policies for archives.
- void setCacheArchives(URI[] archives): Set the given set of archives.
- static void setCacheArchives(URI[] archives, Configuration conf): Set the configuration with the given set of archives.
- void setCacheFiles(URI[] files): Set the given set of files.
- static void setCacheFiles(URI[] files, Configuration conf): Set the configuration with the given set of files.
- void setCancelDelegationTokenUponJobCompletion(boolean value): Sets the flag that will allow the JobTracker to cancel the HDFS delegation tokens upon job completion.
- void setCombinerClass(Class<? extends Reducer> cls): Set the combiner class for the job.
- void setCombinerKeyGroupingComparatorClass(Class<? extends RawComparator> cls): Define the comparator that controls which keys are grouped together for a single call to the combiner's Reducer.reduce(Object, Iterable, Reducer.Context).
- static void setFileSharedCacheUploadPolicies(Configuration conf, Map<String,Boolean> policies): This is to set the shared cache upload policies for files.
- void setGroupingComparatorClass(Class<? extends RawComparator> cls): Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, Reducer.Context).
- void setInputFormatClass(Class<? extends InputFormat> cls): Set the InputFormat for the job.
- void setJar(String jar): Set the job jar.
- void setJarByClass(Class<?> cls): Set the Jar by finding where a given class came from.
- void setJobName(String name): Set the user-specified job name.
- void setJobSetupCleanupNeeded(boolean needed): Specify whether job-setup and job-cleanup is needed for the job.
- void setMapOutputKeyClass(Class<?> theClass): Set the key class for the map output data.
- void setMapOutputValueClass(Class<?> theClass): Set the value class for the map output data.
- void setMapperClass(Class<? extends Mapper> cls): Set the Mapper for the job.
- void setMapSpeculativeExecution(boolean speculativeExecution): Turn speculative execution on or off for this job for map tasks.
- void setMaxMapAttempts(int n): Expert: Set the number of maximum attempts that will be made to run a map task.
- void setMaxReduceAttempts(int n): Expert: Set the number of maximum attempts that will be made to run a reduce task.
- void setNumReduceTasks(int tasks): Set the number of reduce tasks for the job.
- void setOutputFormatClass(Class<? extends OutputFormat> cls): Set the OutputFormat for the job.
- void setOutputKeyClass(Class<?> theClass): Set the key class for the job output data.
- void setOutputValueClass(Class<?> theClass): Set the value class for job outputs.
- void setPartitionerClass(Class<? extends Partitioner> cls): Set the Partitioner for the job.
- void setPriority(JobPriority jobPriority): Set the priority of a running job.
- void setPriorityAsInteger(int jobPriority): Set the priority of a running job.
- void setProfileEnabled(boolean newValue): Set whether the system should collect profiler information for some of the tasks in this job.
- void setProfileParams(String value): Set the profiler configuration arguments.
- void setProfileTaskRange(boolean isMap, String newValue): Set the ranges of maps or reduces to profile. setProfileEnabled(true) must also be called.
- void setReducerClass(Class<? extends Reducer> cls): Set the Reducer for the job.
- void setReduceSpeculativeExecution(boolean speculativeExecution): Turn speculative execution on or off for this job for reduce tasks.
- void setReservationId(ReservationId reservationId): Set the reservation to which the job is submitted.
- void setSortComparatorClass(Class<? extends RawComparator> cls): Define the comparator that controls how the keys are sorted before they are passed to the Reducer.
- void setSpeculativeExecution(boolean speculativeExecution): Turn speculative execution on or off for this job.
- static void setTaskOutputFilter(Configuration conf, Job.TaskStatusFilter newValue): Modify the Configuration to set the task output filter.
- float setupProgress(): Get the progress of the job's setup-tasks, as a float between 0.0 and 1.0.
- void setUser(String user): Set the reported username for this job.
- void setWorkingDirectory(Path dir): Set the current working directory for the default file system.
- void submit(): Submit the job to the cluster and return immediately.
- String toString(): Dump stats to screen.
- boolean waitForCompletion(boolean verbose): Submit the job to the cluster and wait for it to finish.
Methods inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl
getArchiveClassPaths, getArchiveTimestamps, getCacheArchives, getCacheFiles, getCombinerClass, getCombinerKeyGroupingComparator, getConfiguration, getCredentials, getFileClassPaths, getFileTimestamps, getGroupingComparator, getInputFormatClass, getJar, getJobID, getJobSetupCleanupNeeded, getLocalCacheArchives, getLocalCacheFiles, getMapOutputKeyClass, getMapOutputValueClass, getMapperClass, getMaxMapAttempts, getMaxReduceAttempts, getNumReduceTasks, getOutputFormatClass, getOutputKeyClass, getOutputValueClass, getPartitionerClass, getProfileEnabled, getProfileParams, getProfileTaskRange, getReducerClass, getSortComparator, getSymlink, getTaskCleanupNeeded, getUser, getWorkingDirectory, setJobID
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.hadoop.mapreduce.JobContext
getArchiveClassPaths, getArchiveTimestamps, getCacheArchives, getCacheFiles, getCombinerClass, getCombinerKeyGroupingComparator, getConfiguration, getCredentials, getFileClassPaths, getFileTimestamps, getGroupingComparator, getInputFormatClass, getJar, getJobID, getJobSetupCleanupNeeded, getLocalCacheArchives, getLocalCacheFiles, getMapOutputKeyClass, getMapOutputValueClass, getMapperClass, getMaxMapAttempts, getMaxReduceAttempts, getNumReduceTasks, getOutputFormatClass, getOutputKeyClass, getOutputValueClass, getPartitionerClass, getProfileEnabled, getProfileParams, getProfileTaskRange, getReducerClass, getSortComparator, getSymlink, getTaskCleanupNeeded, getUser, getWorkingDirectory
-
Field Details
-
OUTPUT_FILTER
public static final String OUTPUT_FILTER
See Also: Constant Field Values
-
COMPLETION_POLL_INTERVAL_KEY
public static final String COMPLETION_POLL_INTERVAL_KEY
Key in mapred-*.xml that sets completionPollIntervalMillis.
See Also: Constant Field Values
-
PROGRESS_MONITOR_POLL_INTERVAL_KEY
public static final String PROGRESS_MONITOR_POLL_INTERVAL_KEY
Key in mapred-*.xml that sets progMonitorPollIntervalMillis.
See Also: Constant Field Values
-
USED_GENERIC_PARSER
public static final String USED_GENERIC_PARSER
See Also: Constant Field Values
-
SUBMIT_REPLICATION
public static final String SUBMIT_REPLICATION
See Also: Constant Field Values
-
DEFAULT_SUBMIT_REPLICATION
public static final int DEFAULT_SUBMIT_REPLICATION
See Also: Constant Field Values
-
USE_WILDCARD_FOR_LIBJARS
public static final String USE_WILDCARD_FOR_LIBJARS
See Also: Constant Field Values
-
DEFAULT_USE_WILDCARD_FOR_LIBJARS
public static final boolean DEFAULT_USE_WILDCARD_FOR_LIBJARS
See Also: Constant Field Values
-
Constructor Details
-
Job
public Job() throws IOException
Deprecated. Use getInstance().
Throws: IOException
-
Job
public Job(Configuration conf) throws IOException
Deprecated.
Throws: IOException
-
Job
public Job(Configuration conf, String jobName) throws IOException
Deprecated.
Throws: IOException
-
Method Details
-
getInstance
public static Job getInstance() throws IOException
Creates a new Job with no particular Cluster. A Cluster will be created with a generic Configuration.
Returns: the Job, with no connection to a cluster yet.
Throws: IOException
-
getInstance
public static Job getInstance(Configuration conf) throws IOException
Creates a new Job with no particular Cluster and a given Configuration. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter. A Cluster will be created from the conf parameter only when it's needed.
Parameters: conf - the configuration
Returns: the Job, with no connection to a cluster yet.
Throws: IOException
-
getInstance
public static Job getInstance(Configuration conf, String jobName) throws IOException
Creates a new Job with no particular Cluster and a given job name. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
Parameters: conf - the configuration
Returns: the Job, with no connection to a cluster yet.
Throws: IOException
-
getInstance
public static Job getInstance(JobStatus status, Configuration conf) throws IOException
Creates a new Job with no particular Cluster and given Configuration and JobStatus. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
Parameters: status - job status; conf - job configuration
Returns: the Job, with no connection to a cluster yet.
Throws: IOException
-
getInstance
public static Job getInstance(Cluster ignored) throws IOException
Deprecated. Use getInstance().
Creates a new Job with no particular Cluster. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
Parameters: ignored
Returns: the Job, with no connection to a cluster yet.
Throws: IOException
-
getInstance
public static Job getInstance(Cluster ignored, Configuration conf) throws IOException
Deprecated.
Creates a new Job with no particular Cluster and given Configuration. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
Parameters: ignored; conf - job configuration
Returns: the Job, with no connection to a cluster yet.
Throws: IOException
-
getInstance
@Private
public static Job getInstance(Cluster cluster, JobStatus status, Configuration conf) throws IOException
Creates a new Job with no particular Cluster and given Configuration and JobStatus. A Cluster will be created from the conf parameter only when it's needed. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.
Parameters: cluster - cluster; status - job status; conf - job configuration
Returns: the Job, with no connection to a cluster yet.
Throws: IOException
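The copy semantics described above matter in practice: because Job copies the Configuration it is given, later changes to the original object do not affect the job. A minimal sketch (the property and job name are chosen for illustration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ConfCopyDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.queuename", "etl");

        Job job = Job.getInstance(conf, "nightly-job");

        // Later edits to conf are invisible to the job: Job copied the Configuration.
        conf.set("mapreduce.job.queuename", "adhoc");

        System.out.println(job.getConfiguration().get("mapreduce.job.queuename")); // still "etl"
    }
}
```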
-
getStatus
Throws: IOException, InterruptedException
-
getJobState
Returns the current state of the Job.
Returns: the current state of the Job, as a JobStatus.State
Throws: IOException, InterruptedException
-
getTrackingURL
Get the URL where some job progress information will be displayed.
Returns: the URL where some job progress information will be displayed.
-
getJobFile
Get the path of the submitted job configuration.
Returns: the path of the submitted job configuration.
-
getStartTime
public long getStartTime()
Get start time of the job.
Returns: the start time of the job
-
getFinishTime
Get finish time of the job.
Returns: the finish time of the job
Throws: IOException, InterruptedException
-
getSchedulingInfo
Get scheduling info of the job.
Returns: the scheduling info of the job
-
getPriority
Get the priority of the job.
Returns: the priority of the job
Throws: IOException, InterruptedException
-
getJobName
The user-specified job name.
Specified by: getJobName in interface JobContext
Overrides: getJobName in class org.apache.hadoop.mapreduce.task.JobContextImpl
Returns: the job's name, defaulting to "".
-
getHistoryUrl
Throws: IOException, InterruptedException
-
isRetired
Throws: IOException, InterruptedException
-
getCluster
-
toString
Dump stats to screen.
Overrides: toString in class org.apache.hadoop.mapreduce.task.JobContextImpl
-
getTaskReports
public org.apache.hadoop.mapreduce.TaskReport[] getTaskReports(TaskType type) throws IOException, InterruptedException
Get the information of the current state of the tasks of a job.
Parameters: type - Type of the task
Returns: the list of task reports for the given type of task
Throws: IOException, InterruptedException
-
mapProgress
Get the progress of the job's map-tasks, as a float between 0.0 and 1.0. When all map tasks have completed, the function returns 1.0.
Returns: the progress of the job's map-tasks.
Throws: IOException
-
reduceProgress
Get the progress of the job's reduce-tasks, as a float between 0.0 and 1.0. When all reduce tasks have completed, the function returns 1.0.
Returns: the progress of the job's reduce-tasks.
Throws: IOException
-
cleanupProgress
Get the progress of the job's cleanup-tasks, as a float between 0.0 and 1.0. When all cleanup tasks have completed, the function returns 1.0.
Returns: the progress of the job's cleanup-tasks.
Throws: IOException, InterruptedException
-
setupProgress
Get the progress of the job's setup-tasks, as a float between 0.0 and 1.0. When all setup tasks have completed, the function returns 1.0.
Returns: the progress of the job's setup-tasks.
Throws: IOException
-
isComplete
Check if the job is finished or not. This is a non-blocking call.
Returns: true if the job is complete, else false.
Throws: IOException
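Taken together, submit(), isComplete(), mapProgress() and reduceProgress() support client-side monitoring without blocking in waitForCompletion(true). A minimal sketch, assuming the job has already been fully configured (the class name and poll interval are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PollForProgress {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        // ... configure jar, mapper, reducer, input/output paths here ...

        job.submit();                        // returns immediately
        while (!job.isComplete()) {          // non-blocking status check
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);              // poll at a modest interval
        }
        System.out.println(job.isSuccessful() ? "Job succeeded" : "Job failed");
    }
}
```

Running this sketch requires a configured Hadoop cluster (or the local job runner) on the classpath.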
-
isSuccessful
Check if the job completed successfully.
Returns: true if the job succeeded, else false.
Throws: IOException
-
killJob
Kill the running job. Blocks until all job tasks have been killed as well. If the job is no longer running, it simply returns.
Throws: IOException
-
setPriority
Set the priority of a running job.
Parameters: jobPriority - the new priority for the job.
Throws: IOException, InterruptedException
-
setPriorityAsInteger
Set the priority of a running job.
Parameters: jobPriority - the new priority for the job.
Throws: IOException, InterruptedException
-
getTaskCompletionEvents
public TaskCompletionEvent[] getTaskCompletionEvents(int startFrom, int numEvents) throws IOException, InterruptedException
Get events indicating completion (success/failure) of component tasks.
Parameters: startFrom - index to start fetching events from; numEvents - number of events to fetch
Returns: an array of TaskCompletionEvents
Throws: IOException, InterruptedException
-
getTaskCompletionEvents
Get events indicating completion (success/failure) of component tasks.
Parameters: startFrom - index to start fetching events from
Returns: an array of TaskCompletionEvents
Throws: IOException
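Events are fetched in pages, so callers typically advance startFrom by the number of events returned on each call. A sketch of that bookkeeping against a running job (the class name and page size of 10 are illustrative):

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCompletionEvent;

public class EventPager {
    // Print one page of completion events and return the next fetch offset.
    static int printNewEvents(Job job, int startFrom) throws Exception {
        TaskCompletionEvent[] events = job.getTaskCompletionEvents(startFrom, 10);
        for (TaskCompletionEvent e : events) {
            System.out.println(e.getTaskAttemptId() + " -> " + e.getStatus());
        }
        return nextOffset(startFrom, events.length);
    }

    // The next page starts right after the events already consumed.
    static int nextOffset(int startFrom, int fetched) {
        return startFrom + fetched;
    }
}
```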
-
killTask
Kill indicated task attempt.
Parameters: taskId - the id of the task to kill. shouldFail - if true the task is failed and added to the failed tasks list, otherwise it is just killed, without affecting job failure status.
Throws: IOException
-
killTask
Kill indicated task attempt.
Parameters: taskId - the id of the task to be terminated.
Throws: IOException
-
failTask
Fail indicated task attempt.
Parameters: taskId - the id of the task to be terminated.
Throws: IOException
-
getCounters
Gets the counters for this job. May return null if the job has been retired and the job is no longer in the completed job store.
Returns: the counters for this job.
Throws: IOException
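Because getCounters() may return null for a retired job, a null check is advisable before reading built-in counters. A sketch reading one of the standard TaskCounter values (the helper name is illustrative):

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class ReadCounters {
    // Typical call site after a run: mapInputRecords(job.getCounters())
    static long mapInputRecords(Counters counters) {
        if (counters == null) {
            // Job was retired and is no longer in the completed job store.
            return -1;
        }
        return counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
    }
}
```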
-
getTaskDiagnostics
Gets the diagnostic messages for a given task attempt.- Parameters:
taskid-- Returns:
- the list of diagnostic messages for the task
- Throws:
IOExceptionInterruptedException
-
setNumReduceTasks
Set the number of reduce tasks for the job.- Parameters:
tasks- the number of reduce tasks- Throws:
IllegalStateException- if the job is submitted
-
setWorkingDirectory
Set the current working directory for the default file system.- Parameters:
dir- the new current working directory.- Throws:
IllegalStateException- if the job is submitted
IOException
-
setInputFormatClass
Set the InputFormat for the job.- Parameters:
cls- the InputFormat to use- Throws:
IllegalStateException- if the job is submitted
-
setOutputFormatClass
Set the OutputFormat for the job.- Parameters:
cls- the OutputFormat to use- Throws:
IllegalStateException- if the job is submitted
-
setMapperClass
Set the Mapper for the job.- Parameters:
cls- the Mapper to use- Throws:
IllegalStateException- if the job is submitted
-
setJarByClass
Set the Jar by finding where a given class came from.- Parameters:
cls- the example class
-
setJar
Set the job jar -
setUser
Set the reported username for this job.- Parameters:
user- the username for this job.
-
setCombinerClass
Set the combiner class for the job.- Parameters:
cls- the combiner to use- Throws:
IllegalStateException- if the job is submitted
-
setReducerClass
Set the Reducer for the job.- Parameters:
cls- the Reducer to use- Throws:
IllegalStateException- if the job is submitted
-
setPartitionerClass
Set the Partitioner for the job.- Parameters:
cls- the Partitioner to use- Throws:
IllegalStateException- if the job is submitted
-
setMapOutputKeyClass
Set the key class for the map output data. This allows the user to specify the map output key class to be different than the final output key class.- Parameters:
theClass- the map output key class.- Throws:
IllegalStateException- if the job is submitted
-
setMapOutputValueClass
Set the value class for the map output data. This allows the user to specify the map output value class to be different than the final output value class.- Parameters:
theClass- the map output value class.- Throws:
IllegalStateException- if the job is submitted
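When the intermediate (map output) types differ from the final job output types, both pairs must be declared explicitly, since the job otherwise assumes they match. A sketch under that assumption (the chosen Writable types are illustrative):

```java
// Mapper emits (Text, IntWritable); reducer writes (Text, LongWritable).
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TypeSetup {
  static void configure(Job job) {
    // Intermediate (map output) types:
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // Final (reduce output) types:
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
  }
}
```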
-
setOutputKeyClass
Set the key class for the job output data.- Parameters:
theClass- the key class for the job output data.- Throws:
IllegalStateException- if the job is submitted
-
setOutputValueClass
Set the value class for job outputs.- Parameters:
theClass- the value class for job outputs.- Throws:
IllegalStateException- if the job is submitted
-
setCombinerKeyGroupingComparatorClass
public void setCombinerKeyGroupingComparatorClass(Class<? extends RawComparator> cls) throws IllegalStateException
Define the comparator that controls which keys are grouped together for a single call to the combiner, Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)- Parameters:
cls- the raw comparator to use- Throws:
IllegalStateException- if the job is submitted
-
setSortComparatorClass
Define the comparator that controls how the keys are sorted before they are passed to the Reducer.- Parameters:
cls- the raw comparator- Throws:
IllegalStateException- if the job is submitted- See Also:
-
setGroupingComparatorClass
public void setGroupingComparatorClass(Class<? extends RawComparator> cls) throws IllegalStateException
Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)- Parameters:
cls- the raw comparator to use- Throws:
IllegalStateException- if the job is submitted- See Also:
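A common use of setSortComparatorClass together with setGroupingComparatorClass is secondary sort: order records by a composite key, but group reduce calls by only its natural-key part. A sketch with hypothetical user-written comparator and partitioner classes (they are not part of Hadoop):

```java
// Hypothetical secondary-sort wiring. FullKeyComparator,
// NaturalKeyGroupingComparator, and NaturalKeyPartitioner are assumed
// to be user-written classes for a composite key type.
import org.apache.hadoop.mapreduce.Job;

public class SecondarySortSetup {
  static void configure(Job job) {
    // Order records by the full composite key (natural key + secondary field).
    job.setSortComparatorClass(FullKeyComparator.class);
    // Group all records sharing the natural key into one reduce() call.
    job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
    // Partition by natural key too, so equal keys reach the same reducer.
    job.setPartitionerClass(NaturalKeyPartitioner.class);
  }
}
```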
-
setJobName
Set the user-specified job name.- Parameters:
name- the job's new name.- Throws:
IllegalStateException- if the job is submitted
-
setSpeculativeExecution
public void setSpeculativeExecution(boolean speculativeExecution) Turn speculative execution on or off for this job.- Parameters:
speculativeExecution- true if speculative execution should be turned on, else false.
-
setMapSpeculativeExecution
public void setMapSpeculativeExecution(boolean speculativeExecution) Turn speculative execution on or off for this job for map tasks.- Parameters:
speculativeExecution- true if speculative execution should be turned on for map tasks, else false.
-
setReduceSpeculativeExecution
public void setReduceSpeculativeExecution(boolean speculativeExecution) Turn speculative execution on or off for this job for reduce tasks.- Parameters:
speculativeExecution- true if speculative execution should be turned on for reduce tasks, else false.
-
setJobSetupCleanupNeeded
public void setJobSetupCleanupNeeded(boolean needed) Specify whether job-setup and job-cleanup are needed for the job.- Parameters:
needed- If true, job-setup and job-cleanup will be considered from OutputCommitter, else ignored.
-
setCacheArchives
Set the given set of archives- Parameters:
archives- The list of archives that need to be localized
-
setCacheArchives
Set the configuration with the given set of archives.- Parameters:
archives- The list of archives that need to be localized.
conf- Configuration which will be changed.
-
setCacheFiles
Set the given set of files- Parameters:
files- The list of files that need to be localized
-
setCacheFiles
Set the configuration with the given set of files.- Parameters:
files- The list of files that need to be localized.
conf- Configuration which will be changed.
-
addCacheArchive
Add an archive to be localized.- Parameters:
uri- The uri of the cache to be localized
-
addCacheArchive
Add an archive to be localized to the conf.- Parameters:
uri- The uri of the cache to be localized.
conf- Configuration to add the cache to.
-
addCacheFile
Add a file to be localized- Parameters:
uri- The uri of the cache to be localized
-
addCacheFile
Add a file to be localized to the conf. The localized file will be downloaded to the execution node(s), and a link will be created to the file from the job's working directory. If the last part of the URI's path name is "*", then the entire parent directory will be localized and links will be created from the job's working directory to each file in the parent directory.
The access permissions of the file determine whether the localized file will be shared across jobs. If the file is not readable by other, or if any of its parent directories is not executable by other, then the file will not be shared. In the case of a path that ends in "/*", sharing of the localized files is determined solely from the access permissions of the parent directories; the access permissions of the individual files are ignored.
- Parameters:
uri- The uri of the cache to be localized.
conf- Configuration to add the cache to.
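The URI fragment names the symlink created in the task working directory, and a path ending in "/*" localizes a whole directory, as described above. A sketch, assuming the paths exist on the cluster's default file system:

```java
// Assumes /data/lookup.txt and /data/dicts exist on the default FS.
import java.net.URI;
import org.apache.hadoop.mapreduce.Job;

public class CacheSetup {
  static void configure(Job job) throws Exception {
    // Single file; tasks can open it as "lookup" in their working dir.
    job.addCacheFile(new URI("/data/lookup.txt#lookup"));
    // Localize every file in the directory (the "/*" rule above).
    job.addCacheFile(new URI("/data/dicts/*"));
  }
}
```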
-
addFileToClassPath
Add a file path to the current set of classpath entries. It adds the file to the cache as well. Files added with this method will not be unpacked while being added to the classpath. To add archives to the classpath, use the addArchiveToClassPath(Path) method instead.- Parameters:
file- Path of the file to be added- Throws:
IOException
-
addFileToClassPath
Add a file path to the current set of classpath entries. The file will also be added to the cache.- Parameters:
file- Path of the file to be added.
conf- Configuration that contains the classpath setting.
fs- FileSystem with respect to which file should be interpreted.
-
addFileToClassPath
public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs, boolean addToCache)
Add a file path to the current set of classpath entries. The file will also be added to the cache if addToCache is true.- Parameters:
file- Path of the file to be added.
conf- Configuration that contains the classpath setting.
fs- FileSystem with respect to which file should be interpreted.
addToCache- Whether the file should also be added to the cache list.
-
addArchiveToClassPath
Add an archive path to the current set of classpath entries. It adds the archive to cache as well. Archive files will be unpacked and added to the classpath when being distributed.- Parameters:
archive- Path of the archive to be added- Throws:
IOException
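Per the descriptions above, files go on the classpath as-is while archives are unpacked when distributed, so a jar is typically added with addFileToClassPath and a bundled resource tree with addArchiveToClassPath. A sketch with assumed paths:

```java
// Assumes the jar and archive already exist on the job's file system.
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ClasspathSetup {
  static void configure(Job job) throws IOException {
    job.addFileToClassPath(new Path("/libs/helper.jar"));       // kept packed
    job.addArchiveToClassPath(new Path("/libs/resources.tgz")); // unpacked
  }
}
```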
-
addArchiveToClassPath
Add an archive path to the current set of classpath entries. It adds the archive to cache as well.- Parameters:
archive- Path of the archive to be added.
conf- Configuration that contains the classpath setting.
fs- FileSystem with respect to which archive should be interpreted.
-
createSymlink
Deprecated. Originally intended to enable symlinks, but currently symlinks cannot be disabled. -
setMaxMapAttempts
public void setMaxMapAttempts(int n) Expert: Set the number of maximum attempts that will be made to run a map task.- Parameters:
n- the number of attempts per map task.
-
setMaxReduceAttempts
public void setMaxReduceAttempts(int n) Expert: Set the number of maximum attempts that will be made to run a reduce task.- Parameters:
n- the number of attempts per reduce task.
-
setProfileEnabled
public void setProfileEnabled(boolean newValue) Set whether the system should collect profiler information for some of the tasks in this job. The information is stored in the user log directory.- Parameters:
newValue- true means it should be gathered
-
setProfileParams
Set the profiler configuration arguments. If the string contains a '%s' it will be replaced with the name of the profiling output file when the task runs. This value is passed to the task child JVM on the command line.- Parameters:
value- the configuration string
-
setProfileTaskRange
Set the ranges of maps or reduces to profile. setProfileEnabled(true) must also be called.- Parameters:
newValue- a set of integer ranges of the map ids
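The three profiling setters are typically used together: enable profiling, pass the JVM agent arguments (with %s standing in for the per-task output file, as noted above), and restrict which task ids are profiled. A sketch; the agent string and range are illustrative:

```java
import org.apache.hadoop.mapreduce.Job;

public class ProfilingSetup {
  static void configure(Job job) {
    job.setProfileEnabled(true);
    // %s is replaced with the profiling output file name at task launch.
    job.setProfileParams(
        "-agentlib:hprof=cpu=samples,heap=sites,force=n,"
        + "thread=y,verbose=n,file=%s");
    // Profile only map tasks 0 through 2 (first arg selects maps vs reduces).
    job.setProfileTaskRange(true, "0-2");
  }
}
```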
-
setCancelDelegationTokenUponJobCompletion
public void setCancelDelegationTokenUponJobCompletion(boolean value) Sets the flag that will allow the JobTracker to cancel the HDFS delegation tokens upon job completion. Defaults to true. -
submit
Submit the job to the cluster and return immediately. -
waitForCompletion
public boolean waitForCompletion(boolean verbose) throws IOException, InterruptedException, ClassNotFoundException Submit the job to the cluster and wait for it to finish.- Parameters:
verbose- print the progress to the user- Returns:
- true if the job succeeded
- Throws:
IOException- thrown if the communication with the JobTracker is lost
InterruptedException
ClassNotFoundException
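A driver's main method usually ends by turning the boolean result of waitForCompletion into a process exit code. A minimal sketch (the driver class name is an assumption; the elided setup is the configuration shown elsewhere on this page):

```java
// Hypothetical driver skeleton around waitForCompletion(boolean).
import org.apache.hadoop.mapreduce.Job;

public class MyJobDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    job.setJobName("myjob");
    // ... set jar, input/output paths, mapper, reducer, etc. ...
    // Block, printing progress, until the job finishes.
    boolean ok = job.waitForCompletion(true);
    System.exit(ok ? 0 : 1);
  }
}
```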
-
monitorAndPrintJob
Monitor a job and print status in real-time as progress is made and tasks fail.- Returns:
- true if the job succeeded
- Throws:
IOException- if communication to the JobTracker fails
InterruptedException
-
getProgressPollInterval
The interval at which monitorAndPrintJob() prints status. -
getCompletionPollInterval
The interval at which waitForCompletion() should check. -
getTaskOutputFilter
Get the task output filter.- Parameters:
conf- the configuration.- Returns:
- the filter level.
-
setTaskOutputFilter
Modify the Configuration to set the task output filter.- Parameters:
conf- the Configuration to modify.newValue- the value to set.
-
isUber
- Throws:
IOException
InterruptedException
-
getReservationId
Get the reservation to which the job is submitted, if any.- Returns:
- the reservationId the identifier of the job's reservation, null if the job does not have any reservation associated with it
-
setReservationId
Set the reservation to which the job is submitted.- Parameters:
reservationId- the reservationId to set
-
close
Close theJob.- Specified by:
closein interfaceAutoCloseable- Throws:
IOException- if the close fails.
-
getInstance()