org.apache.hadoop.mapred
Class JobClient

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.mapreduce.tools.CLI
          extended by org.apache.hadoop.mapred.JobClient
All Implemented Interfaces:
Configurable, Tool

@InterfaceAudience.Public
@InterfaceStability.Stable
public class JobClient
extends CLI

JobClient is the primary interface for the user-job to interact with the cluster. JobClient provides facilities to submit jobs, track their progress, access component-tasks' reports and logs, and get Map-Reduce cluster status information.

The job submission process involves:

  1. Checking the input and output specifications of the job.
  2. Computing the InputSplits for the job.
  3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
  4. Copying the job's jar and configuration to the map-reduce system directory on the distributed file-system.
  5. Submitting the job to the cluster and optionally monitoring its status.

Normally the user creates the application, describes various facets of the job via JobConf and then uses the JobClient to submit the job and monitor its progress.

Here is an example on how to use JobClient:

     // Create a new JobConf
     JobConf job = new JobConf(new Configuration(), MyJob.class);
     
     // Specify various job-specific parameters     
     job.setJobName("myjob");
     
     FileInputFormat.setInputPaths(job, new Path("in"));
     FileOutputFormat.setOutputPath(job, new Path("out"));
     
     job.setMapperClass(MyJob.MyMapper.class);
     job.setReducerClass(MyJob.MyReducer.class);

     // Submit the job, then poll for progress until the job is complete
     JobClient.runJob(job);
 

Job Control

At times clients chain map-reduce jobs to accomplish complex tasks which cannot be done via a single map-reduce job. This is fairly easy, since the output of a job typically goes to the distributed file-system, and that output can be used as the input for the next job.

However, this also means that the onus of ensuring jobs are complete (success/failure) lies squarely on the clients. In such situations the various job-control options are:

  1. runJob(JobConf) : submits the job and returns only after the job has completed.
  2. submitJob(JobConf) : only submits the job; the client can then poll the returned RunningJob handle to query status and make scheduling decisions.
  3. JobConf.setJobEndNotificationURI(String) : sets up a notification on job-completion, thus avoiding polling.
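The submit-then-poll option can be sketched as follows. This is a minimal sketch, not part of the official examples: the job name and class are placeholders, and a reachable cluster is assumed.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitAndPoll {
  public static void main(String[] args) throws IOException, InterruptedException {
    JobConf conf = new JobConf(new Configuration(), SubmitAndPoll.class);
    conf.setJobName("first-job");                 // placeholder name

    JobClient client = new JobClient(conf);
    RunningJob running = client.submitJob(conf);  // returns immediately, unlike runJob

    // Poll the RunningJob handle instead of blocking inside runJob()
    while (!running.isComplete()) {
      Thread.sleep(5000);
    }
    if (running.isSuccessful()) {
      // The output of this job can now feed the next job in the chain.
      System.out.println("Job " + running.getID() + " succeeded");
    } else {
      System.err.println("Job " + running.getID() + " failed");
    }
    client.close();
  }
}
```

Between polls the client is free to submit or schedule other jobs, which is the point of this option over runJob(JobConf).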

See Also:
JobConf, ClusterStatus, Tool, DistributedCache

Field Summary
 
Fields inherited from class org.apache.hadoop.mapreduce.tools.CLI
cluster
 
Constructor Summary
JobClient()
          Create a job client.
JobClient(Configuration conf)
          Build a job client with the given Configuration, and connect to the default cluster
JobClient(InetSocketAddress jobTrackAddr, Configuration conf)
          Build a job client, connect to the indicated job tracker.
JobClient(JobConf conf)
          Build a job client with the given JobConf, and connect to the default cluster
 
Method Summary
 void cancelDelegationToken(org.apache.hadoop.security.token.Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token)
          Deprecated. Use Token.cancel(org.apache.hadoop.conf.Configuration) instead
 void close()
          Close the JobClient.
 void displayTasks(JobID jobId, String type, String state)
          Display the information about a job's tasks, of a particular type and in a particular state
 JobStatus[] getAllJobs()
          Get the jobs that are submitted.
 JobQueueInfo[] getChildQueues(String queueName)
          Returns an array of queue information objects about immediate children of queue queueName.
 TaskReport[] getCleanupTaskReports(JobID jobId)
          Get the information of the current state of the cleanup tasks of a job.
 Cluster getClusterHandle()
          Get a handle to the Cluster
 ClusterStatus getClusterStatus()
          Get status information about the Map-Reduce cluster.
 ClusterStatus getClusterStatus(boolean detailed)
          Get status information about the Map-Reduce cluster.
protected  long getCounter(Counters cntrs, String counterGroupName, String counterName)
           
 int getDefaultMaps()
          Get status information about the max available Maps in the cluster.
 int getDefaultReduces()
          Get status information about the max available Reduces in the cluster.
 org.apache.hadoop.security.token.Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> getDelegationToken(Text renewer)
          Get a delegation token for the user from the JobTracker.
 FileSystem getFs()
          Get a filesystem handle.
 RunningJob getJob(JobID jobid)
          Get a RunningJob object to track an ongoing job.
 RunningJob getJob(String jobid)
          Deprecated. Applications should rather use getJob(JobID).
 JobStatus[] getJobsFromQueue(String queueName)
          Gets all the jobs which were added to a particular Job Queue
 TaskReport[] getMapTaskReports(JobID jobId)
          Get the information of the current state of the map tasks of a job.
 TaskReport[] getMapTaskReports(String jobId)
          Deprecated. Applications should rather use getMapTaskReports(JobID)
 org.apache.hadoop.mapred.QueueAclsInfo[] getQueueAclsForCurrentUser()
          Gets the Queue ACLs for current user
 JobQueueInfo getQueueInfo(String queueName)
          Gets the queue information associated with a particular Job Queue
 JobQueueInfo[] getQueues()
          Return an array of queue information objects about all the Job Queues configured.
 TaskReport[] getReduceTaskReports(JobID jobId)
          Get the information of the current state of the reduce tasks of a job.
 TaskReport[] getReduceTaskReports(String jobId)
          Deprecated. Applications should rather use getReduceTaskReports(JobID)
 JobQueueInfo[] getRootQueues()
          Returns an array of queue information objects about root level queues configured
 TaskReport[] getSetupTaskReports(JobID jobId)
          Get the information of the current state of the setup tasks of a job.
 Path getStagingAreaDir()
          Fetch the staging area directory for the application
 Path getSystemDir()
          Grab the jobtracker system directory path where job-specific files are to be placed.
 org.apache.hadoop.mapred.JobClient.TaskStatusFilter getTaskOutputFilter()
          Deprecated. 
static org.apache.hadoop.mapred.JobClient.TaskStatusFilter getTaskOutputFilter(JobConf job)
          Get the task output filter out of the JobConf.
 void init(JobConf conf)
          Connect to the default cluster
static boolean isJobDirValid(Path jobDirPath, FileSystem fs)
          Checks if the job directory is clean and has all the required components for (re)starting the job
 JobStatus[] jobsToComplete()
          Get the jobs that are not completed and not failed.
static void main(String[] argv)
           
 boolean monitorAndPrintJob(JobConf conf, RunningJob job)
          Monitor a job and print status in real-time as progress is made and tasks fail.
 long renewDelegationToken(org.apache.hadoop.security.token.Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token)
          Deprecated. Use Token.renew(org.apache.hadoop.conf.Configuration) instead
static RunningJob runJob(JobConf job)
          Utility that submits a job, then polls for progress until the job is complete.
 void setTaskOutputFilter(org.apache.hadoop.mapred.JobClient.TaskStatusFilter newValue)
          Deprecated. 
static void setTaskOutputFilter(JobConf job, org.apache.hadoop.mapred.JobClient.TaskStatusFilter newValue)
          Modify the JobConf to set the task output filter.
 RunningJob submitJob(JobConf conf)
          Submit a job to the MR system.
 RunningJob submitJob(String jobFile)
          Submit a job to the MR system.
 
Methods inherited from class org.apache.hadoop.mapreduce.tools.CLI
displayJobList, displayTasks, run
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

JobClient

public JobClient()
Create a job client.


JobClient

public JobClient(JobConf conf)
          throws IOException
Build a job client with the given JobConf, and connect to the default cluster

Parameters:
conf - the job configuration.
Throws:
IOException

JobClient

public JobClient(Configuration conf)
          throws IOException
Build a job client with the given Configuration, and connect to the default cluster

Parameters:
conf - the configuration.
Throws:
IOException

JobClient

public JobClient(InetSocketAddress jobTrackAddr,
                 Configuration conf)
          throws IOException
Build a job client, connect to the indicated job tracker.

Parameters:
jobTrackAddr - the job tracker to connect to.
conf - configuration.
Throws:
IOException
Method Detail

init

public void init(JobConf conf)
          throws IOException
Connect to the default cluster

Parameters:
conf - the job configuration.
Throws:
IOException

close

public void close()
           throws IOException
Close the JobClient.

Throws:
IOException

getFs

public FileSystem getFs()
                 throws IOException
Get a filesystem handle. We need this to prepare jobs for submission to the MapReduce system.

Returns:
the filesystem handle.
Throws:
IOException

getClusterHandle

public Cluster getClusterHandle()
Get a handle to the Cluster


submitJob

public RunningJob submitJob(String jobFile)
                     throws FileNotFoundException,
                            InvalidJobConfException,
                            IOException
Submit a job to the MR system. This returns a handle to the RunningJob which can be used to track the running-job.

Parameters:
jobFile - the job configuration.
Returns:
a handle to the RunningJob which can be used to track the running-job.
Throws:
FileNotFoundException
InvalidJobConfException
IOException

submitJob

public RunningJob submitJob(JobConf conf)
                     throws FileNotFoundException,
                            IOException
Submit a job to the MR system. This returns a handle to the RunningJob which can be used to track the running-job.

Parameters:
conf - the job configuration.
Returns:
a handle to the RunningJob which can be used to track the running-job.
Throws:
FileNotFoundException
IOException

getJob

public RunningJob getJob(JobID jobid)
                  throws IOException
Get a RunningJob object to track an ongoing job. Returns null if the id does not correspond to any known job.

Parameters:
jobid - the jobid of the job.
Returns:
the RunningJob handle to track the job, null if the jobid doesn't correspond to any known job.
Throws:
IOException

getJob

@Deprecated
public RunningJob getJob(String jobid)
                  throws IOException
Deprecated. Applications should rather use getJob(JobID).

Throws:
IOException

getMapTaskReports

public TaskReport[] getMapTaskReports(JobID jobId)
                               throws IOException
Get the information of the current state of the map tasks of a job.

Parameters:
jobId - the job to query.
Returns:
the list of all of the map tips.
Throws:
IOException
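The returned TaskReport array can be iterated to inspect per-task progress. A hedged sketch; the helper class and method name below are hypothetical, and an initialized JobClient plus a valid JobID are assumed:

```java
import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.TaskReport;

// Hypothetical helper: prints progress and state for each map task of a job.
public class MapReportPrinter {
  static void printMapReports(JobClient client, JobID jobId) throws IOException {
    for (TaskReport report : client.getMapTaskReports(jobId)) {
      System.out.println(report.getTaskID()
          + " progress=" + report.getProgress()
          + " state=" + report.getState());
    }
  }
}
```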

getMapTaskReports

@Deprecated
public TaskReport[] getMapTaskReports(String jobId)
                               throws IOException
Deprecated. Applications should rather use getMapTaskReports(JobID)

Throws:
IOException

getReduceTaskReports

public TaskReport[] getReduceTaskReports(JobID jobId)
                                  throws IOException
Get the information of the current state of the reduce tasks of a job.

Parameters:
jobId - the job to query.
Returns:
the list of all of the reduce tips.
Throws:
IOException

getCleanupTaskReports

public TaskReport[] getCleanupTaskReports(JobID jobId)
                                   throws IOException
Get the information of the current state of the cleanup tasks of a job.

Parameters:
jobId - the job to query.
Returns:
the list of all of the cleanup tips.
Throws:
IOException

getSetupTaskReports

public TaskReport[] getSetupTaskReports(JobID jobId)
                                 throws IOException
Get the information of the current state of the setup tasks of a job.

Parameters:
jobId - the job to query.
Returns:
the list of all of the setup tips.
Throws:
IOException

getReduceTaskReports

@Deprecated
public TaskReport[] getReduceTaskReports(String jobId)
                                  throws IOException
Deprecated. Applications should rather use getReduceTaskReports(JobID)

Throws:
IOException

displayTasks

public void displayTasks(JobID jobId,
                         String type,
                         String state)
                  throws IOException
Display the information about a job's tasks, of a particular type and in a particular state

Parameters:
jobId - the ID of the job
type - the type of the task (map/reduce/setup/cleanup)
state - the state of the task (pending/running/completed/failed/killed)
Throws:
IOException

getClusterStatus

public ClusterStatus getClusterStatus()
                               throws IOException
Get status information about the Map-Reduce cluster.

Returns:
the status information about the Map-Reduce cluster as an object of ClusterStatus.
Throws:
IOException

getClusterStatus

public ClusterStatus getClusterStatus(boolean detailed)
                               throws IOException
Get status information about the Map-Reduce cluster.

Parameters:
detailed - if true then get a detailed status including the tracker names
Returns:
the status information about the Map-Reduce cluster as an object of ClusterStatus.
Throws:
IOException
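As a sketch of the detailed form, the ClusterStatus accessors might be used like this (assumes a reachable cluster; the default JobConf is a placeholder):

```java
import java.io.IOException;

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ShowClusterStatus {
  public static void main(String[] args) throws IOException {
    JobClient client = new JobClient(new JobConf());
    // detailed=true also populates the tracker names
    ClusterStatus status = client.getClusterStatus(true);
    System.out.println("Task trackers : " + status.getTaskTrackers());
    System.out.println("Max map tasks : " + status.getMaxMapTasks());
    System.out.println("Max reduces   : " + status.getMaxReduceTasks());
    for (String name : status.getActiveTrackerNames()) {
      System.out.println("active tracker: " + name);
    }
    client.close();
  }
}
```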

jobsToComplete

public JobStatus[] jobsToComplete()
                           throws IOException
Get the jobs that are not completed and not failed.

Returns:
array of JobStatus for the running/to-be-run jobs.
Throws:
IOException

getAllJobs

public JobStatus[] getAllJobs()
                       throws IOException
Get the jobs that are submitted.

Returns:
array of JobStatus for the submitted jobs.
Throws:
IOException

runJob

public static RunningJob runJob(JobConf job)
                         throws IOException
Utility that submits a job, then polls for progress until the job is complete.

Parameters:
job - the job configuration.
Throws:
IOException - if the job fails

monitorAndPrintJob

public boolean monitorAndPrintJob(JobConf conf,
                                  RunningJob job)
                           throws IOException,
                                  InterruptedException
Monitor a job and print status in real-time as progress is made and tasks fail.

Parameters:
conf - the job's configuration
job - the job to track
Returns:
true if the job succeeded
Throws:
IOException - if communication to the JobTracker fails
InterruptedException
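A minimal sketch combining submitJob with this method (the empty JobConf is a placeholder; a reachable cluster is assumed):

```java
import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitAndMonitor {
  public static void main(String[] args) throws IOException, InterruptedException {
    JobConf conf = new JobConf();             // placeholder: job details omitted
    JobClient client = new JobClient(conf);
    RunningJob job = client.submitJob(conf);
    // Prints progress and task failures until the job finishes.
    boolean success = client.monitorAndPrintJob(conf, job);
    System.exit(success ? 0 : 1);
  }
}
```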

setTaskOutputFilter

@Deprecated
public void setTaskOutputFilter(org.apache.hadoop.mapred.JobClient.TaskStatusFilter newValue)
Deprecated. 

Sets the output filter for tasks. Only those tasks are printed whose output matches the filter.

Parameters:
newValue - task filter.

getTaskOutputFilter

public static org.apache.hadoop.mapred.JobClient.TaskStatusFilter getTaskOutputFilter(JobConf job)
Get the task output filter out of the JobConf.

Parameters:
job - the JobConf to examine.
Returns:
the filter level.

setTaskOutputFilter

public static void setTaskOutputFilter(JobConf job,
                                       org.apache.hadoop.mapred.JobClient.TaskStatusFilter newValue)
Modify the JobConf to set the task output filter.

Parameters:
job - the JobConf to modify.
newValue - the value to set.

getTaskOutputFilter

@Deprecated
public org.apache.hadoop.mapred.JobClient.TaskStatusFilter getTaskOutputFilter()
Deprecated. 

Returns task output filter.

Returns:
task filter.

getCounter

protected long getCounter(Counters cntrs,
                          String counterGroupName,
                          String counterName)
                   throws IOException
Overrides:
getCounter in class CLI
Throws:
IOException

getDefaultMaps

public int getDefaultMaps()
                   throws IOException
Get status information about the max available Maps in the cluster.

Returns:
the max available Maps in the cluster
Throws:
IOException

getDefaultReduces

public int getDefaultReduces()
                      throws IOException
Get status information about the max available Reduces in the cluster.

Returns:
the max available Reduces in the cluster
Throws:
IOException

getSystemDir

public Path getSystemDir()
Grab the jobtracker system directory path where job-specific files are to be placed.

Returns:
the system directory where job-specific files are to be placed.

isJobDirValid

public static boolean isJobDirValid(Path jobDirPath,
                                    FileSystem fs)
                             throws IOException
Checks if the job directory is clean and has all the required components for (re)starting the job

Throws:
IOException

getStagingAreaDir

public Path getStagingAreaDir()
                       throws IOException
Fetch the staging area directory for the application

Returns:
path to staging area directory
Throws:
IOException

getRootQueues

public JobQueueInfo[] getRootQueues()
                             throws IOException
Returns an array of queue information objects about root level queues configured

Returns:
the array of root level JobQueueInfo objects
Throws:
IOException

getChildQueues

public JobQueueInfo[] getChildQueues(String queueName)
                              throws IOException
Returns an array of queue information objects about immediate children of queue queueName.

Parameters:
queueName - the name of the queue whose child queues are required.
Returns:
the array of immediate children JobQueueInfo objects
Throws:
IOException

getQueues

public JobQueueInfo[] getQueues()
                         throws IOException
Return an array of queue information objects about all the Job Queues configured.

Returns:
Array of JobQueueInfo objects
Throws:
IOException

getJobsFromQueue

public JobStatus[] getJobsFromQueue(String queueName)
                             throws IOException
Gets all the jobs which were added to a particular Job Queue

Parameters:
queueName - name of the Job Queue
Returns:
Array of jobs present in the job queue
Throws:
IOException

getQueueInfo

public JobQueueInfo getQueueInfo(String queueName)
                          throws IOException
Gets the queue information associated with a particular Job Queue

Parameters:
queueName - name of the job queue.
Returns:
Queue information associated with the particular queue.
Throws:
IOException

getQueueAclsForCurrentUser

public org.apache.hadoop.mapred.QueueAclsInfo[] getQueueAclsForCurrentUser()
                                                                    throws IOException
Gets the Queue ACLs for current user

Returns:
array of QueueAclsInfo object for current user.
Throws:
IOException

getDelegationToken

public org.apache.hadoop.security.token.Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> getDelegationToken(Text renewer)
                                                                                                                                           throws IOException,
                                                                                                                                                  InterruptedException
Get a delegation token for the user from the JobTracker.

Parameters:
renewer - the user who can renew the token
Returns:
the new token
Throws:
IOException
InterruptedException

renewDelegationToken

public long renewDelegationToken(org.apache.hadoop.security.token.Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token)
                          throws org.apache.hadoop.security.token.SecretManager.InvalidToken,
                                 IOException,
                                 InterruptedException
Deprecated. Use Token.renew(org.apache.hadoop.conf.Configuration) instead

Renew a delegation token

Parameters:
token - the token to renew
Returns:
the new expiration time of the token if the renewal succeeded
Throws:
org.apache.hadoop.security.token.SecretManager.InvalidToken
IOException
InterruptedException

cancelDelegationToken

public void cancelDelegationToken(org.apache.hadoop.security.token.Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token)
                           throws org.apache.hadoop.security.token.SecretManager.InvalidToken,
                                  IOException,
                                  InterruptedException
Deprecated. Use Token.cancel(org.apache.hadoop.conf.Configuration) instead

Cancel a delegation token from the JobTracker

Parameters:
token - the token to cancel
Throws:
IOException
org.apache.hadoop.security.token.SecretManager.InvalidToken
InterruptedException

main

public static void main(String[] argv)
                 throws Exception
Throws:
Exception


Copyright © 2014 Apache Software Foundation. All Rights Reserved.