Class JobClient

All Implemented Interfaces:
AutoCloseable, Configurable, Tool

@Public @Stable public class JobClient extends CLI implements AutoCloseable
JobClient is the primary interface for the user-job to interact with the cluster. JobClient provides facilities to submit jobs, track their progress, access component-tasks' reports/logs, get the Map-Reduce cluster status information etc.

The job submission process involves:

  1. Checking the input and output specifications of the job.
  2. Computing the InputSplits for the job.
  3. Setup the requisite accounting information for the DistributedCache of the job, if necessary.
  4. Copying the job's jar and configuration to the map-reduce system directory on the distributed file-system.
  5. Submitting the job to the cluster and optionally monitoring it's status.
Normally the user creates the application, describes various facets of the job via JobConf and then uses the JobClient to submit the job and monitor its progress.

Here is an example on how to use JobClient:

     // Create a new JobConf
     JobConf job = new JobConf(new Configuration(), MyJob.class);
     
     // Specify various job-specific parameters     
     job.setJobName("myjob");
     
     job.setInputPath(new Path("in"));
     job.setOutputPath(new Path("out"));
     
     job.setMapperClass(MyJob.MyMapper.class);
     job.setReducerClass(MyJob.MyReducer.class);

     // Submit the job, then poll for progress until the job is complete
     JobClient.runJob(job);
 
Job Control

At times clients would chain map-reduce jobs to accomplish complex tasks which cannot be done via a single map-reduce job. This is fairly easy since the output of the job, typically, goes to distributed file-system and that can be used as the input for the next job.

However, this also means that the onus on ensuring jobs are complete (success/failure) lies squarely on the clients. In such situations the various job-control options are:

  1. runJob(JobConf) : submits the job and returns only after the job has completed.
  2. submitJob(JobConf) : only submits the job, then poll the returned handle to the RunningJob to query status and make scheduling decisions.
  3. JobConf.setJobEndNotificationURI(String) : setup a notification on job-completion, thus avoiding polling.
See Also:
  • Field Details

    • MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_KEY

      @Private public static final String MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_KEY
      See Also:
    • MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_DEFAULT

      @Private public static final boolean MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_DEFAULT
      See Also:
    • MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_KEY

      @Private public static final String MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_KEY
      See Also:
    • MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_DEFAULT

      @Private public static final String MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_DEFAULT
      See Also:
  • Constructor Details

    • JobClient

      public JobClient()
      Create a job client.
    • JobClient

      public JobClient(JobConf conf) throws IOException
      Build a job client with the given JobConf, and connect to the default cluster
      Parameters:
      conf - the job configuration.
      Throws:
      IOException
    • JobClient

      public JobClient(Configuration conf) throws IOException
      Build a job client with the given Configuration, and connect to the default cluster
      Parameters:
      conf - the configuration.
      Throws:
      IOException
    • JobClient

      public JobClient(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException
      Build a job client, connect to the indicated job tracker.
      Parameters:
      jobTrackAddr - the job tracker to connect to.
      conf - configuration.
      Throws:
      IOException
  • Method Details

    • init

      public void init(JobConf conf) throws IOException
      Connect to the default cluster
      Parameters:
      conf - the job configuration.
      Throws:
      IOException
    • close

      public void close() throws IOException
      Close the JobClient.
      Specified by:
      close in interface AutoCloseable
      Throws:
      IOException
    • getFs

      public FileSystem getFs() throws IOException
      Get a filesystem handle. We need this to prepare jobs for submission to the MapReduce system.
      Returns:
      the filesystem handle.
      Throws:
      IOException
    • getClusterHandle

      public Cluster getClusterHandle()
      Get a handle to the Cluster
    • submitJob

      Submit a job to the MR system. This returns a handle to the RunningJob which can be used to track the running-job.
      Parameters:
      jobFile - the job configuration.
      Returns:
      a handle to the RunningJob which can be used to track the running-job.
      Throws:
      FileNotFoundException
      InvalidJobConfException
      IOException
    • submitJob

      public RunningJob submitJob(JobConf conf) throws FileNotFoundException, IOException
      Submit a job to the MR system. This returns a handle to the RunningJob which can be used to track the running-job.
      Parameters:
      conf - the job configuration.
      Returns:
      a handle to the RunningJob which can be used to track the running-job.
      Throws:
      FileNotFoundException
      IOException
    • submitJobInternal

      @Private public RunningJob submitJobInternal(JobConf conf) throws FileNotFoundException, IOException
      Throws:
      FileNotFoundException
      IOException
    • getJobInner

      protected RunningJob getJobInner(JobID jobid) throws IOException
      Throws:
      IOException
    • getJob

      public RunningJob getJob(JobID jobid) throws IOException
      Get an RunningJob object to track an ongoing job. Returns null if the id does not correspond to any known job.
      Parameters:
      jobid - the jobid of the job.
      Returns:
      the RunningJob handle to track the job, null if the jobid doesn't correspond to any known job.
      Throws:
      IOException
    • getJob

      @Deprecated public RunningJob getJob(String jobid) throws IOException
      Deprecated.
      Applications should rather use getJob(JobID).
      Throws:
      IOException
    • getMapTaskReports

      public TaskReport[] getMapTaskReports(JobID jobId) throws IOException
      Get the information of the current state of the map tasks of a job.
      Parameters:
      jobId - the job to query.
      Returns:
      the list of all of the map tips.
      Throws:
      IOException
    • getMapTaskReports

      @Deprecated public TaskReport[] getMapTaskReports(String jobId) throws IOException
      Deprecated.
      Applications should rather use getMapTaskReports(JobID)
      Throws:
      IOException
    • getReduceTaskReports

      public TaskReport[] getReduceTaskReports(JobID jobId) throws IOException
      Get the information of the current state of the reduce tasks of a job.
      Parameters:
      jobId - the job to query.
      Returns:
      the list of all of the reduce tips.
      Throws:
      IOException
    • getCleanupTaskReports

      public TaskReport[] getCleanupTaskReports(JobID jobId) throws IOException
      Get the information of the current state of the cleanup tasks of a job.
      Parameters:
      jobId - the job to query.
      Returns:
      the list of all of the cleanup tips.
      Throws:
      IOException
    • getSetupTaskReports

      public TaskReport[] getSetupTaskReports(JobID jobId) throws IOException
      Get the information of the current state of the setup tasks of a job.
      Parameters:
      jobId - the job to query.
      Returns:
      the list of all of the setup tips.
      Throws:
      IOException
    • getReduceTaskReports

      @Deprecated public TaskReport[] getReduceTaskReports(String jobId) throws IOException
      Deprecated.
      Applications should rather use getReduceTaskReports(JobID)
      Throws:
      IOException
    • displayTasks

      public void displayTasks(JobID jobId, String type, String state) throws IOException
      Display the information about a job's tasks, of a particular type and in a particular state
      Parameters:
      jobId - the ID of the job
      type - the type of the task (map/reduce/setup/cleanup)
      state - the state of the task (pending/running/completed/failed/killed)
      Throws:
      IOException - when there is an error communicating with the master
      IllegalArgumentException - if an invalid type/state is passed
    • getClusterStatus

      public ClusterStatus getClusterStatus() throws IOException
      Get status information about the Map-Reduce cluster.
      Returns:
      the status information about the Map-Reduce cluster as an object of ClusterStatus.
      Throws:
      IOException
    • getClusterStatus

      public ClusterStatus getClusterStatus(boolean detailed) throws IOException
      Get status information about the Map-Reduce cluster.
      Parameters:
      detailed - if true then get a detailed status including the tracker names
      Returns:
      the status information about the Map-Reduce cluster as an object of ClusterStatus.
      Throws:
      IOException
    • jobsToComplete

      public JobStatus[] jobsToComplete() throws IOException
      Get the jobs that are not completed and not failed.
      Returns:
      array of JobStatus for the running/to-be-run jobs.
      Throws:
      IOException
    • getAllJobs

      public JobStatus[] getAllJobs() throws IOException
      Get the jobs that are submitted.
      Returns:
      array of JobStatus for the submitted jobs.
      Throws:
      IOException
    • runJob

      public static RunningJob runJob(JobConf job) throws IOException
      Utility that submits a job, then polls for progress until the job is complete.
      Parameters:
      job - the job configuration.
      Throws:
      IOException - if the job fails
    • monitorAndPrintJob

      public boolean monitorAndPrintJob(JobConf conf, RunningJob job) throws IOException, InterruptedException
      Monitor a job and print status in real-time as progress is made and tasks fail.
      Parameters:
      conf - the job's configuration
      job - the job to track
      Returns:
      true if the job succeeded
      Throws:
      IOException - if communication to the JobTracker fails
      InterruptedException
    • setTaskOutputFilter

      @Deprecated public void setTaskOutputFilter(JobClient.TaskStatusFilter newValue)
      Deprecated.
      Sets the output filter for tasks. only those tasks are printed whose output matches the filter.
      Parameters:
      newValue - task filter.
    • getTaskOutputFilter

      public static JobClient.TaskStatusFilter getTaskOutputFilter(JobConf job)
      Get the task output filter out of the JobConf.
      Parameters:
      job - the JobConf to examine.
      Returns:
      the filter level.
    • setTaskOutputFilter

      public static void setTaskOutputFilter(JobConf job, JobClient.TaskStatusFilter newValue)
      Modify the JobConf to set the task output filter.
      Parameters:
      job - the JobConf to modify.
      newValue - the value to set.
    • getTaskOutputFilter

      @Deprecated public JobClient.TaskStatusFilter getTaskOutputFilter()
      Deprecated.
      Returns task output filter.
      Returns:
      task filter.
    • getCounter

      protected long getCounter(Counters cntrs, String counterGroupName, String counterName) throws IOException
      Overrides:
      getCounter in class CLI
      Throws:
      IOException
    • getDefaultMaps

      public int getDefaultMaps() throws IOException
      Get status information about the max available Maps in the cluster.
      Returns:
      the max available Maps in the cluster
      Throws:
      IOException
    • getDefaultReduces

      public int getDefaultReduces() throws IOException
      Get status information about the max available Reduces in the cluster.
      Returns:
      the max available Reduces in the cluster
      Throws:
      IOException
    • getSystemDir

      public Path getSystemDir()
      Grab the jobtracker system directory path where job-specific files are to be placed.
      Returns:
      the system directory where job-specific files are to be placed.
    • isJobDirValid

      public static boolean isJobDirValid(Path jobDirPath, FileSystem fs) throws IOException
      Checks if the job directory is clean and has all the required components for (re) starting the job
      Throws:
      IOException
    • getStagingAreaDir

      public Path getStagingAreaDir() throws IOException
      Fetch the staging area directory for the application
      Returns:
      path to staging area directory
      Throws:
      IOException
    • getRootQueues

      public JobQueueInfo[] getRootQueues() throws IOException
      Returns an array of queue information objects about root level queues configured
      Returns:
      the array of root level JobQueueInfo objects
      Throws:
      IOException
    • getChildQueues

      public JobQueueInfo[] getChildQueues(String queueName) throws IOException
      Returns an array of queue information objects about immediate children of queue queueName.
      Parameters:
      queueName -
      Returns:
      the array of immediate children JobQueueInfo objects
      Throws:
      IOException
    • getQueues

      public JobQueueInfo[] getQueues() throws IOException
      Return an array of queue information objects about all the Job Queues configured.
      Returns:
      Array of JobQueueInfo objects
      Throws:
      IOException
    • getJobsFromQueue

      public JobStatus[] getJobsFromQueue(String queueName) throws IOException
      Gets all the jobs which were added to particular Job Queue
      Parameters:
      queueName - name of the Job Queue
      Returns:
      Array of jobs present in the job queue
      Throws:
      IOException
    • getQueueInfo

      public JobQueueInfo getQueueInfo(String queueName) throws IOException
      Gets the queue information associated to a particular Job Queue
      Parameters:
      queueName - name of the job queue.
      Returns:
      Queue information associated to particular queue.
      Throws:
      IOException
    • getQueueAclsForCurrentUser

      public org.apache.hadoop.mapred.QueueAclsInfo[] getQueueAclsForCurrentUser() throws IOException
      Gets the Queue ACLs for current user
      Returns:
      array of QueueAclsInfo object for current user.
      Throws:
      IOException
    • getDelegationToken

      public Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> getDelegationToken(Text renewer) throws IOException, InterruptedException
      Get a delegation token for the user from the JobTracker.
      Parameters:
      renewer - the user who can renew the token
      Returns:
      the new token
      Throws:
      IOException
      InterruptedException
    • renewDelegationToken

      public long renewDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token) throws org.apache.hadoop.security.token.SecretManager.InvalidToken, IOException, InterruptedException
      Renew a delegation token
      Parameters:
      token - the token to renew
      Returns:
      true if the renewal went well
      Throws:
      org.apache.hadoop.security.token.SecretManager.InvalidToken
      IOException
      InterruptedException
    • cancelDelegationToken

      public void cancelDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token) throws org.apache.hadoop.security.token.SecretManager.InvalidToken, IOException, InterruptedException
      Cancel a delegation token from the JobTracker
      Parameters:
      token - the token to cancel
      Throws:
      IOException
      org.apache.hadoop.security.token.SecretManager.InvalidToken
      InterruptedException
    • main

      public static void main(String[] argv) throws Exception
      Throws:
      Exception