@InterfaceAudience.Public @InterfaceStability.Stable public class JobClient extends CLI implements AutoCloseable
JobClient is the primary interface for the user-job to interact
with the cluster.
JobClient provides facilities to submit jobs, track their
progress, access component-tasks' reports/logs, get the Map-Reduce cluster
status information etc.
The job submission process involves:
InputSplits for the job.
DistributedCache
of the job, if necessary.
JobConf and then uses the JobClient to submit
the job and monitor its progress.
Here is an example on how to use JobClient:
// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
job.setInputPath(new Path("in"));
job.setOutputPath(new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
JobClient.runJob(job);
Job Control
At times clients would chain map-reduce jobs to accomplish complex tasks which cannot be done via a single map-reduce job. This is fairly easy since the output of the job, typically, goes to distributed file-system and that can be used as the input for the next job.
However, this also means that the onus on ensuring jobs are complete (success/failure) lies squarely on the clients. In such situations the various job-control options are:
runJob(JobConf) : submits the job and returns only after
the job has completed.
submitJob(JobConf) : only submits the job, then poll the
returned handle to the RunningJob to query status and make
scheduling decisions.
JobConf.setJobEndNotificationURI(String) : setup a notification
on job-completion, thus avoiding polling.
JobConf,
ClusterStatus,
Tool,
DistributedCache| Constructor and Description |
|---|
JobClient()
Create a job client.
|
JobClient(Configuration conf)
Build a job client with the given
Configuration,
and connect to the default cluster |
JobClient(InetSocketAddress jobTrackAddr,
Configuration conf)
Build a job client, connect to the indicated job tracker.
|
JobClient(JobConf conf)
Build a job client with the given
JobConf, and connect to the
default cluster |
| Modifier and Type | Method and Description |
|---|---|
void |
cancelDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token)
Deprecated.
|
void |
close()
Close the
JobClient. |
void |
displayTasks(JobID jobId,
String type,
String state)
Display the information about a job's tasks, of a particular type and
in a particular state
|
JobStatus[] |
getAllJobs()
Get the jobs that are submitted.
|
JobQueueInfo[] |
getChildQueues(String queueName)
Returns an array of queue information objects about immediate children
of queue queueName.
|
TaskReport[] |
getCleanupTaskReports(JobID jobId)
Get the information of the current state of the cleanup tasks of a job.
|
Cluster |
getClusterHandle()
Get a handle to the Cluster
|
ClusterStatus |
getClusterStatus()
Get status information about the Map-Reduce cluster.
|
ClusterStatus |
getClusterStatus(boolean detailed)
Get status information about the Map-Reduce cluster.
|
protected long |
getCounter(Counters cntrs,
String counterGroupName,
String counterName) |
int |
getDefaultMaps()
Get status information about the max available Maps in the cluster.
|
int |
getDefaultReduces()
Get status information about the max available Reduces in the cluster.
|
Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> |
getDelegationToken(Text renewer)
Get a delegation token for the user from the JobTracker.
|
FileSystem |
getFs()
Get a filesystem handle.
|
RunningJob |
getJob(JobID jobid)
Get an
RunningJob object to track an ongoing job. |
RunningJob |
getJob(String jobid)
Deprecated.
Applications should rather use
getJob(JobID). |
protected RunningJob |
getJobInner(JobID jobid) |
JobStatus[] |
getJobsFromQueue(String queueName)
Gets all the jobs which were added to particular Job Queue
|
TaskReport[] |
getMapTaskReports(JobID jobId)
Get the information of the current state of the map tasks of a job.
|
TaskReport[] |
getMapTaskReports(String jobId)
Deprecated.
Applications should rather use
getMapTaskReports(JobID) |
org.apache.hadoop.mapred.QueueAclsInfo[] |
getQueueAclsForCurrentUser()
Gets the Queue ACLs for current user
|
JobQueueInfo |
getQueueInfo(String queueName)
Gets the queue information associated to a particular Job Queue
|
JobQueueInfo[] |
getQueues()
Return an array of queue information objects about all the Job Queues
configured.
|
TaskReport[] |
getReduceTaskReports(JobID jobId)
Get the information of the current state of the reduce tasks of a job.
|
TaskReport[] |
getReduceTaskReports(String jobId)
Deprecated.
Applications should rather use
getReduceTaskReports(JobID) |
JobQueueInfo[] |
getRootQueues()
Returns an array of queue information objects about root level queues
configured
|
TaskReport[] |
getSetupTaskReports(JobID jobId)
Get the information of the current state of the setup tasks of a job.
|
Path |
getStagingAreaDir()
Fetch the staging area directory for the application
|
Path |
getSystemDir()
Grab the jobtracker system directory path where job-specific files are to be placed.
|
org.apache.hadoop.mapred.JobClient.TaskStatusFilter |
getTaskOutputFilter()
Deprecated.
|
static org.apache.hadoop.mapred.JobClient.TaskStatusFilter |
getTaskOutputFilter(JobConf job)
Get the task output filter out of the JobConf.
|
void |
init(JobConf conf)
Connect to the default cluster
|
static boolean |
isJobDirValid(Path jobDirPath,
FileSystem fs)
Checks if the job directory is clean and has all the required components
for (re) starting the job
|
JobStatus[] |
jobsToComplete()
Get the jobs that are not completed and not failed.
|
static void |
main(String[] argv) |
boolean |
monitorAndPrintJob(JobConf conf,
RunningJob job)
Monitor a job and print status in real-time as progress is made and tasks
fail.
|
long |
renewDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token)
Deprecated.
|
static RunningJob |
runJob(JobConf job)
Utility that submits a job, then polls for progress until the job is
complete.
|
void |
setTaskOutputFilter(org.apache.hadoop.mapred.JobClient.TaskStatusFilter newValue)
Deprecated.
|
static void |
setTaskOutputFilter(JobConf job,
org.apache.hadoop.mapred.JobClient.TaskStatusFilter newValue)
Modify the JobConf to set the task output filter.
|
RunningJob |
submitJob(JobConf conf)
Submit a job to the MR system.
|
RunningJob |
submitJob(String jobFile)
Submit a job to the MR system.
|
displayJobList, displayTasks, getTaskLogURL, rungetConf, setConfclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitgetConf, setConfpublic JobClient()
public JobClient(JobConf conf) throws IOException
JobConf, and connect to the
default clusterconf - the job configuration.IOExceptionpublic JobClient(Configuration conf) throws IOException
Configuration,
and connect to the default clusterconf - the configuration.IOExceptionpublic JobClient(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException
jobTrackAddr - the job tracker to connect to.conf - configuration.IOExceptionpublic void init(JobConf conf) throws IOException
conf - the job configuration.IOExceptionpublic void close()
throws IOException
JobClient.close in interface AutoCloseableIOExceptionpublic FileSystem getFs() throws IOException
IOExceptionpublic Cluster getClusterHandle()
public RunningJob submitJob(String jobFile) throws FileNotFoundException, InvalidJobConfException, IOException
RunningJob which can be used to track
the running-job.jobFile - the job configuration.RunningJob which can be used to track the
running-job.FileNotFoundExceptionInvalidJobConfExceptionIOExceptionpublic RunningJob submitJob(JobConf conf) throws FileNotFoundException, IOException
RunningJob which can be used to track
the running-job.conf - the job configuration.RunningJob which can be used to track the
running-job.FileNotFoundExceptionIOExceptionprotected RunningJob getJobInner(JobID jobid) throws IOException
IOExceptionpublic RunningJob getJob(JobID jobid) throws IOException
RunningJob object to track an ongoing job. Returns
null if the id does not correspond to any known job.jobid - the jobid of the job.RunningJob handle to track the job, null if the
jobid doesn't correspond to any known job.IOException@Deprecated public RunningJob getJob(String jobid) throws IOException
getJob(JobID).IOExceptionpublic TaskReport[] getMapTaskReports(JobID jobId) throws IOException
jobId - the job to query.IOException@Deprecated public TaskReport[] getMapTaskReports(String jobId) throws IOException
getMapTaskReports(JobID)IOExceptionpublic TaskReport[] getReduceTaskReports(JobID jobId) throws IOException
jobId - the job to query.IOExceptionpublic TaskReport[] getCleanupTaskReports(JobID jobId) throws IOException
jobId - the job to query.IOExceptionpublic TaskReport[] getSetupTaskReports(JobID jobId) throws IOException
jobId - the job to query.IOException@Deprecated public TaskReport[] getReduceTaskReports(String jobId) throws IOException
getReduceTaskReports(JobID)IOExceptionpublic void displayTasks(JobID jobId, String type, String state) throws IOException
jobId - the ID of the jobtype - the type of the task (map/reduce/setup/cleanup)state - the state of the task
(pending/running/completed/failed/killed)IOException - when there is an error communicating with the masterIllegalArgumentException - if an invalid type/state is passedpublic ClusterStatus getClusterStatus() throws IOException
ClusterStatus.IOExceptionpublic ClusterStatus getClusterStatus(boolean detailed) throws IOException
detailed - if true then get a detailed status including the
tracker namesClusterStatus.IOExceptionpublic JobStatus[] jobsToComplete() throws IOException
JobStatus for the running/to-be-run jobs.IOExceptionpublic JobStatus[] getAllJobs() throws IOException
JobStatus for the submitted jobs.IOExceptionpublic static RunningJob runJob(JobConf job) throws IOException
job - the job configuration.IOException - if the job failspublic boolean monitorAndPrintJob(JobConf conf, RunningJob job) throws IOException, InterruptedException
conf - the job's configurationjob - the job to trackIOException - if communication to the JobTracker failsInterruptedException@Deprecated public void setTaskOutputFilter(org.apache.hadoop.mapred.JobClient.TaskStatusFilter newValue)
newValue - task filter.public static org.apache.hadoop.mapred.JobClient.TaskStatusFilter getTaskOutputFilter(JobConf job)
job - the JobConf to examine.public static void setTaskOutputFilter(JobConf job, org.apache.hadoop.mapred.JobClient.TaskStatusFilter newValue)
job - the JobConf to modify.newValue - the value to set.@Deprecated public org.apache.hadoop.mapred.JobClient.TaskStatusFilter getTaskOutputFilter()
protected long getCounter(Counters cntrs, String counterGroupName, String counterName) throws IOException
getCounter in class CLIIOExceptionpublic int getDefaultMaps()
throws IOException
IOExceptionpublic int getDefaultReduces()
throws IOException
IOExceptionpublic Path getSystemDir()
public static boolean isJobDirValid(Path jobDirPath, FileSystem fs) throws IOException
IOExceptionpublic Path getStagingAreaDir() throws IOException
IOExceptionpublic JobQueueInfo[] getRootQueues() throws IOException
IOExceptionpublic JobQueueInfo[] getChildQueues(String queueName) throws IOException
queueName - IOExceptionpublic JobQueueInfo[] getQueues() throws IOException
IOExceptionpublic JobStatus[] getJobsFromQueue(String queueName) throws IOException
queueName - name of the Job QueueIOExceptionpublic JobQueueInfo getQueueInfo(String queueName) throws IOException
queueName - name of the job queue.IOExceptionpublic org.apache.hadoop.mapred.QueueAclsInfo[] getQueueAclsForCurrentUser()
throws IOException
IOExceptionpublic Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> getDelegationToken(Text renewer) throws IOException, InterruptedException
renewer - the user who can renew the tokenIOExceptionInterruptedExceptionpublic long renewDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token) throws org.apache.hadoop.security.token.SecretManager.InvalidToken, IOException, InterruptedException
Token.renew(org.apache.hadoop.conf.Configuration) insteadtoken - the token to reneworg.apache.hadoop.security.token.SecretManager.InvalidTokenIOExceptionInterruptedExceptionpublic void cancelDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token) throws org.apache.hadoop.security.token.SecretManager.InvalidToken, IOException, InterruptedException
Token.cancel(org.apache.hadoop.conf.Configuration) insteadtoken - the token to cancelIOExceptionorg.apache.hadoop.security.token.SecretManager.InvalidTokenInterruptedExceptionCopyright © 2018 Apache Software Foundation. All rights reserved.