Package org.apache.hadoop.mapred
Class JobClient
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.mapreduce.tools.CLI
org.apache.hadoop.mapred.JobClient
- All Implemented Interfaces:
  - AutoCloseable, Configurable, Tool
JobClient is the primary interface for the user-job to interact
with the cluster.
JobClient provides facilities to submit jobs, track their progress, access component-tasks' reports/logs, get the Map-Reduce cluster status information, etc.
The job submission process involves:
- Checking the input and output specifications of the job.
- Computing the InputSplits for the job.
- Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
- Copying the job's jar and configuration to the map-reduce system directory on the distributed file-system.
- Submitting the job to the cluster and optionally monitoring its status.

Normally the user creates the application, describes various facets of the job via JobConf, and then uses the JobClient to submit the job and monitor its progress.
Here is an example of how to use JobClient (note that input/output paths are set via the static FileInputFormat/FileOutputFormat helpers in this API):

// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);

// Specify various job-specific parameters
job.setJobName("myjob");

FileInputFormat.setInputPaths(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));

job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);

// Submit the job, then poll for progress until the job is complete
JobClient.runJob(job);
Job Control

At times clients chain map-reduce jobs to accomplish complex tasks which cannot be done via a single map-reduce job. This is fairly easy since the output of a job typically goes to the distributed file-system, where it can be used as the input for the next job.

However, this also means that the onus of ensuring jobs are complete (success/failure) lies squarely on the clients. In such situations the various job-control options are:
- runJob(JobConf): submits the job and returns only after the job has completed.
- submitJob(JobConf): only submits the job; then poll the returned handle to the RunningJob to query status and make scheduling decisions.
- JobConf.setJobEndNotificationURI(String): set up a notification on job-completion, thus avoiding polling.
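The second option above can be sketched as follows. This is a minimal illustration of chaining two jobs by polling the first job's handle; the class name, job names, paths, and polling interval are placeholder assumptions, not part of the documented API, and the mapper/reducer setup is omitted for brevity.

```java
// Sketch only: chain two map-reduce jobs by polling the first job's
// RunningJob handle. Class name, job names, and paths are hypothetical.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ChainedStages {
  public static void main(String[] args) throws IOException, InterruptedException {
    JobConf stage1 = new JobConf(new Configuration(), ChainedStages.class);
    stage1.setJobName("stage-1");
    FileInputFormat.setInputPaths(stage1, new Path("in"));
    FileOutputFormat.setOutputPath(stage1, new Path("stage1-out"));

    JobClient client = new JobClient(stage1);
    RunningJob handle = client.submitJob(stage1); // returns immediately

    // Poll the handle instead of blocking in runJob().
    while (!handle.isComplete()) {
      Thread.sleep(5000);
    }
    if (!handle.isSuccessful()) {
      throw new IOException("stage-1 failed");
    }

    // Stage 1's output on the DFS becomes stage 2's input.
    JobConf stage2 = new JobConf(new Configuration(), ChainedStages.class);
    stage2.setJobName("stage-2");
    FileInputFormat.setInputPaths(stage2, new Path("stage1-out"));
    FileOutputFormat.setOutputPath(stage2, new Path("out"));
    JobClient.runJob(stage2); // blocks until stage 2 completes
  }
}
```

A JobConf.setJobEndNotificationURI(String) callback would avoid the sleep loop entirely, at the cost of running an HTTP endpoint to receive the notification.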
- See Also:
  - JobConf, ClusterStatus, Tool, DistributedCache
-
Nested Class Summary
Nested Classes:
- static enum JobClient.TaskStatusFilter
Field Summary
Fields (see Field Details below):
- MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_KEY
- MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_DEFAULT
- MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_KEY
- MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_DEFAULT
Constructor Summary
Constructors:
- JobClient(): Create a job client.
- JobClient(Configuration conf): Build a job client with the given Configuration, and connect to the default cluster.
- JobClient(JobConf conf): Build a job client with the given JobConf, and connect to the default cluster.
- JobClient(InetSocketAddress jobTrackAddr, Configuration conf): Build a job client, connect to the indicated job tracker.
Method Summary
- void cancelDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token): Deprecated.
- void close(): Close the JobClient.
- void displayTasks(JobID jobId, String type, String state): Display the information about a job's tasks, of a particular type and in a particular state.
- JobStatus[] getAllJobs(): Get the jobs that are submitted.
- JobQueueInfo[] getChildQueues(String queueName): Returns an array of queue information objects about immediate children of queue queueName.
- TaskReport[] getCleanupTaskReports(JobID jobId): Get the information of the current state of the cleanup tasks of a job.
- Cluster getClusterHandle(): Get a handle to the Cluster.
- ClusterStatus getClusterStatus(): Get status information about the Map-Reduce cluster.
- ClusterStatus getClusterStatus(boolean detailed): Get status information about the Map-Reduce cluster.
- protected long getCounter(Counters cntrs, String counterGroupName, String counterName)
- int getDefaultMaps(): Get status information about the max available Maps in the cluster.
- int getDefaultReduces(): Get status information about the max available Reduces in the cluster.
- Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> getDelegationToken(Text renewer): Get a delegation token for the user from the JobTracker.
- FileSystem getFs(): Get a filesystem handle.
- RunningJob getJob(String jobid): Deprecated. Applications should rather use getJob(JobID).
- RunningJob getJob(JobID jobid): Get a RunningJob object to track an ongoing job.
- protected RunningJob getJobInner(JobID jobid)
- JobStatus[] getJobsFromQueue(String queueName): Gets all the jobs which were added to a particular Job Queue.
- TaskReport[] getMapTaskReports(String jobId): Deprecated. Applications should rather use getMapTaskReports(JobID).
- TaskReport[] getMapTaskReports(JobID jobId): Get the information of the current state of the map tasks of a job.
- org.apache.hadoop.mapred.QueueAclsInfo[] getQueueAclsForCurrentUser(): Gets the Queue ACLs for the current user.
- JobQueueInfo getQueueInfo(String queueName): Gets the queue information associated with a particular Job Queue.
- JobQueueInfo[] getQueues(): Return an array of queue information objects about all the Job Queues configured.
- TaskReport[] getReduceTaskReports(String jobId): Deprecated. Applications should rather use getReduceTaskReports(JobID).
- TaskReport[] getReduceTaskReports(JobID jobId): Get the information of the current state of the reduce tasks of a job.
- JobQueueInfo[] getRootQueues(): Returns an array of queue information objects about root level queues configured.
- TaskReport[] getSetupTaskReports(JobID jobId): Get the information of the current state of the setup tasks of a job.
- Path getStagingAreaDir(): Fetch the staging area directory for the application.
- Path getSystemDir(): Grab the jobtracker system directory path where job-specific files are to be placed.
- JobClient.TaskStatusFilter getTaskOutputFilter(): Deprecated.
- static JobClient.TaskStatusFilter getTaskOutputFilter(JobConf job): Get the task output filter out of the JobConf.
- void init(JobConf conf): Connect to the default cluster.
- static boolean isJobDirValid(Path jobDirPath, FileSystem fs): Checks if the job directory is clean and has all the required components for (re)starting the job.
- JobStatus[] jobsToComplete(): Get the jobs that are not completed and not failed.
- static void main(String[] argv)
- boolean monitorAndPrintJob(JobConf conf, RunningJob job): Monitor a job and print status in real-time as progress is made and tasks fail.
- long renewDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token): Deprecated.
- static RunningJob runJob(JobConf job): Utility that submits a job, then polls for progress until the job is complete.
- void setTaskOutputFilter(JobClient.TaskStatusFilter newValue): Deprecated.
- static void setTaskOutputFilter(JobConf job, JobClient.TaskStatusFilter newValue): Modify the JobConf to set the task output filter.
- RunningJob submitJob(String jobFile): Submit a job to the MR system.
- RunningJob submitJob(JobConf conf): Submit a job to the MR system.
- RunningJob submitJobInternal(JobConf conf)

Methods inherited from class org.apache.hadoop.mapreduce.tools.CLI:
displayJobList, displayJobList, displayTasks, getTaskLogURL, run
Methods inherited from class org.apache.hadoop.conf.Configured:
getConf, setConf
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable:
getConf, setConf
-
Field Details
- MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_KEY
  @Private public static final String MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_KEY
  - See Also:
    - Constant Field Values
- MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_DEFAULT
  @Private public static final boolean MAPREDUCE_CLIENT_RETRY_POLICY_ENABLED_DEFAULT
  - See Also:
    - Constant Field Values
- MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_KEY
  @Private public static final String MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_KEY
  - See Also:
    - Constant Field Values
- MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_DEFAULT
  @Private public static final String MAPREDUCE_CLIENT_RETRY_POLICY_SPEC_DEFAULT
  - See Also:
    - Constant Field Values
-
Constructor Details
-
JobClient
public JobClient()
Create a job client.
-
JobClient
public JobClient(JobConf conf) throws IOException
Build a job client with the given JobConf, and connect to the default cluster.
- Parameters:
  - conf - the job configuration.
- Throws:
IOException
-
JobClient
public JobClient(Configuration conf) throws IOException
Build a job client with the given Configuration, and connect to the default cluster.
- Parameters:
  - conf - the configuration.
- Throws:
IOException
-
JobClient
public JobClient(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException
Build a job client, connect to the indicated job tracker.
- Parameters:
  - jobTrackAddr - the job tracker to connect to.
  - conf - configuration.
- Throws:
IOException
-
-
Method Details
-
init
public void init(JobConf conf) throws IOException
Connect to the default cluster.
- Parameters:
  - conf - the job configuration.
- Throws:
IOException
-
close
public void close() throws IOException
Close the JobClient.
- Specified by:
  - close in interface AutoCloseable
- Throws:
IOException
-
getFs
Get a filesystem handle. We need this to prepare jobs for submission to the MapReduce system.
- Returns:
- the filesystem handle.
- Throws:
IOException
-
getClusterHandle
Get a handle to the Cluster.
-
submitJob
public RunningJob submitJob(String jobFile) throws FileNotFoundException, InvalidJobConfException, IOException
Submit a job to the MR system. This returns a handle to the RunningJob which can be used to track the running-job.
- Parameters:
  - jobFile - the job configuration.
- Returns:
  - a handle to the RunningJob which can be used to track the running-job.
- Throws:
  - FileNotFoundException
  - InvalidJobConfException
  - IOException
-
submitJob
public RunningJob submitJob(JobConf conf) throws FileNotFoundException, IOException
Submit a job to the MR system. This returns a handle to the RunningJob which can be used to track the running-job.
- Parameters:
  - conf - the job configuration.
- Returns:
  - a handle to the RunningJob which can be used to track the running-job.
- Throws:
  - FileNotFoundException
  - IOException
-
submitJobInternal
@Private public RunningJob submitJobInternal(JobConf conf) throws FileNotFoundException, IOException
- Throws:
  - FileNotFoundException
  - IOException
-
getJobInner
protected RunningJob getJobInner(JobID jobid) throws IOException
- Throws:
  - IOException
-
getJob
public RunningJob getJob(JobID jobid) throws IOException
Get a RunningJob object to track an ongoing job. Returns null if the id does not correspond to any known job.
- Parameters:
  - jobid - the jobid of the job.
- Returns:
  - the RunningJob handle to track the job, null if the jobid doesn't correspond to any known job.
- Throws:
IOException
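A possible usage sketch, checking the null contract described above (the job ID string below is a made-up placeholder):

```java
// Sketch only: look up a job by ID and print its progress.
// The job ID is a hypothetical placeholder.
import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class JobLookup {
  public static void main(String[] args) throws IOException {
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(JobID.forName("job_201301010000_0001"));
    if (job == null) {
      // getJob returns null for IDs that don't correspond to any known job
      System.out.println("no such job");
    } else {
      System.out.printf("map %.0f%% reduce %.0f%%%n",
          job.mapProgress() * 100, job.reduceProgress() * 100);
    }
  }
}
```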
-
getJob
Deprecated. Applications should rather use getJob(JobID).
- Throws:
IOException
-
getMapTaskReports
Get the information of the current state of the map tasks of a job.
- Parameters:
  - jobId - the job to query.
- Returns:
- the list of all of the map tips.
- Throws:
IOException
-
getMapTaskReports
Deprecated. Applications should rather use getMapTaskReports(JobID).
- Throws:
IOException
-
getReduceTaskReports
Get the information of the current state of the reduce tasks of a job.
- Parameters:
  - jobId - the job to query.
- Returns:
- the list of all of the reduce tips.
- Throws:
IOException
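The report getters here return TaskReport arrays; a sketch of walking them might look like the following (the job ID is a made-up placeholder):

```java
// Sketch only: dump per-task state and progress for a job's map and
// reduce tasks. The job ID is a hypothetical placeholder.
import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.TaskReport;

public class TaskReportDump {
  public static void main(String[] args) throws IOException {
    JobClient client = new JobClient(new JobConf());
    JobID id = JobID.forName("job_201301010000_0001");
    for (TaskReport r : client.getMapTaskReports(id)) {
      System.out.printf("map    %s %s %.0f%%%n",
          r.getTaskID(), r.getState(), r.getProgress() * 100);
    }
    for (TaskReport r : client.getReduceTaskReports(id)) {
      System.out.printf("reduce %s %s %.0f%%%n",
          r.getTaskID(), r.getState(), r.getProgress() * 100);
    }
  }
}
```

The same loop works for getSetupTaskReports and getCleanupTaskReports, which return the same TaskReport type.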
-
getCleanupTaskReports
Get the information of the current state of the cleanup tasks of a job.
- Parameters:
  - jobId - the job to query.
- Returns:
- the list of all of the cleanup tips.
- Throws:
IOException
-
getSetupTaskReports
Get the information of the current state of the setup tasks of a job.
- Parameters:
  - jobId - the job to query.
- Returns:
- the list of all of the setup tips.
- Throws:
IOException
-
getReduceTaskReports
Deprecated. Applications should rather use getReduceTaskReports(JobID).
- Throws:
IOException
-
displayTasks
Display the information about a job's tasks, of a particular type and in a particular state.
- Parameters:
  - jobId - the ID of the job
  - type - the type of the task (map/reduce/setup/cleanup)
  - state - the state of the task (pending/running/completed/failed/killed)
- Throws:
  - IOException - when there is an error communicating with the master
  - IllegalArgumentException - if an invalid type/state is passed
-
getClusterStatus
Get status information about the Map-Reduce cluster.
- Returns:
  - the status information about the Map-Reduce cluster as an object of ClusterStatus.
- Throws:
IOException
-
getClusterStatus
Get status information about the Map-Reduce cluster.
- Parameters:
  - detailed - if true then get a detailed status including the tracker names
- Returns:
  - the status information about the Map-Reduce cluster as an object of ClusterStatus.
- Throws:
IOException
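A sketch of reading a few fields from the returned ClusterStatus (the fields printed are a small selection, chosen for illustration):

```java
// Sketch only: print a few ClusterStatus fields from the cluster.
import java.io.IOException;

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ClusterInfo {
  public static void main(String[] args) throws IOException {
    JobClient client = new JobClient(new JobConf());
    // detailed=true also includes the tracker names
    ClusterStatus status = client.getClusterStatus(true);
    System.out.println("trackers:      " + status.getTaskTrackers());
    System.out.println("running maps:  " + status.getMapTasks());
    System.out.println("map capacity:  " + status.getMaxMapTasks());
    System.out.println("reduce capacity: " + status.getMaxReduceTasks());
  }
}
```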
-
jobsToComplete
Get the jobs that are not completed and not failed.
- Returns:
  - array of JobStatus for the running/to-be-run jobs.
- Throws:
IOException
-
getAllJobs
Get the jobs that are submitted.
- Returns:
  - array of JobStatus for the submitted jobs.
- Throws:
IOException
-
runJob
public static RunningJob runJob(JobConf job) throws IOException
Utility that submits a job, then polls for progress until the job is complete.
- Parameters:
  - job - the job configuration.
- Throws:
IOException- if the job fails
-
monitorAndPrintJob
public boolean monitorAndPrintJob(JobConf conf, RunningJob job) throws IOException, InterruptedException
Monitor a job and print status in real-time as progress is made and tasks fail.
- Parameters:
  - conf - the job's configuration
  - job - the job to track
- Returns:
  - true if the job succeeded
- Throws:
  - IOException - if communication to the JobTracker fails
  - InterruptedException
-
setTaskOutputFilter
Deprecated. Sets the output filter for tasks. Only those tasks are printed whose output matches the filter.
- Parameters:
  - newValue - task filter.
-
getTaskOutputFilter
Get the task output filter out of the JobConf.
- Parameters:
  - job - the JobConf to examine.
- Returns:
- the filter level.
-
setTaskOutputFilter
Modify the JobConf to set the task output filter.
- Parameters:
  - job - the JobConf to modify.
  - newValue - the value to set.
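A short sketch of the static setter (the FAILED filter value shown is one of the TaskStatusFilter constants; choosing it here is just an example):

```java
// Sketch only: configure the job so that the client-side console output
// of runJob() echoes logs of failed tasks only.
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class FilterSetup {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    JobClient.setTaskOutputFilter(job, JobClient.TaskStatusFilter.FAILED);
    // The static getter recovers the setting from the same JobConf:
    System.out.println(JobClient.getTaskOutputFilter(job));
  }
}
```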
-
getTaskOutputFilter
Deprecated. Returns task output filter.
- Returns:
- task filter.
-
getCounter
protected long getCounter(Counters cntrs, String counterGroupName, String counterName) throws IOException
- Overrides:
  - getCounter in class CLI
- Throws:
IOException
-
getDefaultMaps
Get status information about the max available Maps in the cluster.
- Returns:
- the max available Maps in the cluster
- Throws:
IOException
-
getDefaultReduces
Get status information about the max available Reduces in the cluster.
- Returns:
- the max available Reduces in the cluster
- Throws:
IOException
-
getSystemDir
Grab the jobtracker system directory path where job-specific files are to be placed.
- Returns:
- the system directory where job-specific files are to be placed.
-
isJobDirValid
Checks if the job directory is clean and has all the required components for (re)starting the job.
- Throws:
IOException
-
getStagingAreaDir
Fetch the staging area directory for the application.
- Returns:
- path to staging area directory
- Throws:
IOException
-
getRootQueues
Returns an array of queue information objects about root level queues configured.
- Returns:
- the array of root level JobQueueInfo objects
- Throws:
IOException
-
getChildQueues
Returns an array of queue information objects about immediate children of queue queueName.
- Parameters:
  - queueName
- Returns:
- the array of immediate children JobQueueInfo objects
- Throws:
IOException
-
getQueues
Return an array of queue information objects about all the Job Queues configured.
- Returns:
- Array of JobQueueInfo objects
- Throws:
IOException
-
getJobsFromQueue
Gets all the jobs which were added to a particular Job Queue.
- Parameters:
  - queueName - name of the Job Queue
- Returns:
- Array of jobs present in the job queue
- Throws:
IOException
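The queue methods above compose naturally; a sketch of listing each configured queue and the jobs in it (the fields printed are illustrative choices):

```java
// Sketch only: list configured queues and the jobs submitted to each.
import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobQueueInfo;
import org.apache.hadoop.mapred.JobStatus;

public class QueueBrowser {
  public static void main(String[] args) throws IOException {
    JobClient client = new JobClient(new JobConf());
    for (JobQueueInfo queue : client.getQueues()) {
      System.out.println("queue: " + queue.getQueueName());
      // getJobsFromQueue returns the JobStatus of each job in the queue
      for (JobStatus status : client.getJobsFromQueue(queue.getQueueName())) {
        System.out.println("  job: " + status.getJobID());
      }
    }
  }
}
```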
-
getQueueInfo
Gets the queue information associated with a particular Job Queue.
- Parameters:
  - queueName - name of the job queue.
- Returns:
  - Queue information associated with the particular queue.
- Throws:
IOException
-
getQueueAclsForCurrentUser
Gets the Queue ACLs for the current user.
- Returns:
- array of QueueAclsInfo object for current user.
- Throws:
IOException
-
getDelegationToken
public Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> getDelegationToken(Text renewer) throws IOException, InterruptedException
Get a delegation token for the user from the JobTracker.
- Parameters:
  - renewer - the user who can renew the token
- Returns:
- the new token
- Throws:
IOException
InterruptedException
-
renewDelegationToken
public long renewDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token) throws org.apache.hadoop.security.token.SecretManager.InvalidToken, IOException, InterruptedException
Deprecated.
Renew a delegation token.
- Parameters:
  - token - the token to renew
- Returns:
  - true if the renewal went well
- Throws:
  - org.apache.hadoop.security.token.SecretManager.InvalidToken
  - IOException
  - InterruptedException
-
cancelDelegationToken
public void cancelDelegationToken(Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token) throws org.apache.hadoop.security.token.SecretManager.InvalidToken, IOException, InterruptedException
Deprecated. Use Token.cancel(org.apache.hadoop.conf.Configuration) instead.
Cancel a delegation token from the JobTracker.
- Parameters:
  - token - the token to cancel
- Throws:
  - IOException
  - org.apache.hadoop.security.token.SecretManager.InvalidToken
  - InterruptedException
-
main
public static void main(String[] argv) throws Exception
- Throws:
  - Exception