MapReduce Commands Guide

Overview
User Commands
- archive
- archive-logs
- classpath
- distcp
- job
- pipes
- queue
- version
- envvars
Administration Commands
- historyserver
- hsadmin

Overview

All MapReduce commands are invoked by the bin/mapred script. Running the mapred script without any arguments prints a description of all commands.

Usage: mapred [SHELL_OPTIONS] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Hadoop has an option-parsing framework that handles generic options as well as running classes.

COMMAND_OPTIONS	Description
SHELL_OPTIONS	The common set of shell options. These are documented on the Hadoop Commands Reference page.
GENERIC_OPTIONS	The common set of options supported by multiple commands. See the Hadoop Commands Reference for more information.
COMMAND COMMAND_OPTIONS	Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands.
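
As an illustration of the ordering in the usage line above, the following minimal sketch assembles one command line from its four parts and prints it without running it. The specific values are assumptions chosen for illustration: `--loglevel DEBUG` as a shell option, `-conf ./client-site.xml` as a generic option pointing at a hypothetical configuration file, and `job -list` as the command and its option.

```shell
# Assemble a mapred command line in the documented order:
#   mapred [SHELL_OPTIONS] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]
# All values below are illustrative assumptions, not required settings.
SHELL_OPTIONS="--loglevel DEBUG"          # shell option (assumed example)
COMMAND="job"                             # the command to run
GENERIC_OPTIONS="-conf ./client-site.xml" # generic option; hypothetical file
COMMAND_OPTIONS="-list"                   # option specific to the job command

CMD="mapred $SHELL_OPTIONS $COMMAND $GENERIC_OPTIONS $COMMAND_OPTIONS"

# Print only; the command is not executed here.
echo "$CMD"
```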

User Commands

Commands useful for users of a hadoop cluster.

`archive`

Creates a hadoop archive. More information can be found at Hadoop Archives Guide.

`archive-logs`

A tool to combine YARN aggregated logs into Hadoop archives to reduce the number of files in HDFS. More information can be found at Hadoop Archive Logs Guide.

`classpath`

Usage: mapred classpath [--glob |--jar <path> |-h |--help]

COMMAND_OPTION	Description
`--glob`	expand wildcards
`--jar` path	write classpath as manifest in jar named path
`-h`, `--help`	print help

Prints the class path needed to get the Hadoop jar and the required libraries. If called without arguments, then prints the classpath set up by the command scripts, which is likely to contain wildcards in the classpath entries. Additional options print the classpath after wildcard expansion or write the classpath into the manifest of a jar file. The latter is useful in environments where wildcards cannot be used and the expanded classpath exceeds the maximum supported command line length.
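
The three forms described above might be invoked as in the following minimal sketch. The `mapred_cmd` helper, the `RUN` variable, and the jar path `/tmp/mapred-classpath.jar` are assumptions made for illustration and are not part of Hadoop. The helper only prints each command line unless `RUN=1` is set and `mapred` is found on the `PATH`.

```shell
# Hypothetical dry-run helper: prints the command line, and executes it
# only when RUN=1 and the mapred script is available on PATH.
mapred_cmd() {
  echo "mapred $*"
  if [ "${RUN:-0}" = "1" ] && command -v mapred >/dev/null 2>&1; then
    mapred "$@"
  fi
}

# Classpath as set up by the command scripts (may contain wildcards).
mapred_cmd classpath

# Classpath after wildcard expansion.
mapred_cmd classpath --glob

# Write the classpath into the manifest of a jar (the path is an assumption).
mapred_cmd classpath --jar /tmp/mapred-classpath.jar
```

The `--jar` form is the one to reach for when the expanded classpath would be too long to pass on a command line.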

`distcp`

Copies files or directories recursively. More information can be found at Hadoop DistCp Guide.

`job`

Command to interact with MapReduce jobs.

COMMAND_OPTION	Description
-submit job-file	Submits the job.
-status job-id	Prints the map and reduce completion percentage and all job counters.
-counter job-id group-name counter-name	Prints the counter value.
-kill job-id	Kills the job.
-events job-id from-event-# #-of-events	Prints the events’ details received by jobtracker for the given range.
-history [all] jobHistoryFile|jobId [-outfile file] [-format human|json]	Prints job details and details of failed and killed tasks. More details about the job, such as successful tasks, task attempts made for each task, and task counters, can be viewed by specifying the [all] option. An optional file output path (instead of stdout) can be specified. The format defaults to human-readable but can be changed to JSON with the [-format] option.
-list [all]	Displays jobs which are yet to complete. `-list all` displays all jobs.
-kill-task task-id	Kills the task. Killed tasks are NOT counted against failed attempts.
-fail-task task-id	Fails the task. Failed tasks are counted against failed attempts.
-set-priority job-id priority	Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW.
-list-active-trackers	List all the active NodeManagers in the cluster.
-list-blacklisted-trackers	List the blacklisted task trackers in the cluster. This command is not supported in an MRv2-based cluster.
-list-attempt-ids job-id task-type task-state	List the attempt-ids based on the task type and the status given. Valid values for task-type are REDUCE, MAP. Valid values for task-state are running, pending, completed, failed, killed.
-logs job-id task-attempt-id	Dumps the container logs for the job if task-attempt-id is not specified; otherwise dumps the log for the task with the specified task-attempt-id. The logs are written to standard output.
-config job-id file	Download the job configuration file.
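
A few of the options in the table above could be combined into a typical inspection session, sketched below. The `job_cmd` helper and the `RUN` variable are hypothetical, and the job ID and attempt ID are placeholders rather than real identifiers; the counter group `org.apache.hadoop.mapreduce.TaskCounter` and counter `MAP_INPUT_RECORDS` are used only as an example. Commands are printed, not executed, unless `RUN=1` is set and `mapred` is on the `PATH`.

```shell
# Hypothetical dry-run helper: prints the `mapred job` command line and
# executes it only when RUN=1 and mapred is available on PATH.
job_cmd() {
  echo "mapred job $*"
  if [ "${RUN:-0}" = "1" ] && command -v mapred >/dev/null 2>&1; then
    mapred job "$@"
  fi
}

# Placeholder IDs (assumptions); take real ones from `mapred job -list`.
JOB_ID="job_1700000000000_0001"
ATTEMPT_ID="attempt_1700000000000_0001_m_000000_0"

# All jobs, not only those yet to complete.
job_cmd -list all

# Map and reduce completion percentages and all job counters.
job_cmd -status "$JOB_ID"

# A single counter value (group and counter names are examples).
job_cmd -counter "$JOB_ID" org.apache.hadoop.mapreduce.TaskCounter MAP_INPUT_RECORDS

# Attempt IDs of map tasks that are currently running.
job_cmd -list-attempt-ids "$JOB_ID" MAP running

# Log of one specific task attempt.
job_cmd -logs "$JOB_ID" "$ATTEMPT_ID"

# Raise the job's priority to one of the allowed values.
job_cmd -set-priority "$JOB_ID" HIGH
```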

`pipes`

Runs a pipes job.

Usage: mapred pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>] [-program <executable>] [-reduces <num>]

COMMAND_OPTION	Description
-conf path	Configuration for job
-jobconf key=value, key=value, …	Add/override configuration for job
-input path	Input directory
-output path	Output directory
-jar jar file	Jar filename
-inputformat class	InputFormat class
-map class	Java Map class
-partitioner class	Java Partitioner
-reduce class	Java Reduce class
-writer class	Java RecordWriter
-program executable	Executable URI
-reduces num	Number of reduces
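
The following minimal sketch shows how a pipes command line might be put together from the options above. The `pipes_cmd` helper, the `RUN` variable, the HDFS paths, and the executable URI are all assumptions for illustration; a real pipes job may need additional configuration that is not shown here. The command is printed only, unless `RUN=1` is set and `mapred` is on the `PATH`.

```shell
# Hypothetical helper that builds a `mapred pipes` command line.
#   $1 = input directory   $2 = output directory
#   $3 = executable URI    $4 = number of reduces
pipes_cmd() {
  echo "mapred pipes -input $1 -output $2 -program $3 -reduces $4"
  # Execute only when explicitly requested and mapred is available.
  if [ "${RUN:-0}" = "1" ] && command -v mapred >/dev/null 2>&1; then
    mapred pipes -input "$1" -output "$2" -program "$3" -reduces "$4"
  fi
}

# All paths below are placeholders, not real locations.
pipes_cmd /user/alice/pipes/input \
          /user/alice/pipes/output \
          hdfs:///user/alice/bin/wordcount \
          2
```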

`queue`

Command to interact with and view job queue information.

Usage: mapred queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]

COMMAND_OPTION	Description
-list	Gets the list of job queues configured in the system, along with the scheduling information associated with each queue.
-info job-queue-name [-showJobs]	Displays the job queue information and associated scheduling information of a particular job queue. If the `-showJobs` option is present, a list of jobs submitted to that job queue is displayed.
-showacls	Displays the queue name and associated queue operations allowed for the current user. The list consists of only those queues to which the user has access.
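
The three mutually exclusive forms in the usage line above might be exercised as in this minimal sketch. The `queue_cmd` helper and the `RUN` variable are hypothetical, and the queue name `default` is an assumption; substitute a queue name reported by `-list` on the cluster in question. Commands are printed rather than executed unless `RUN=1` is set and `mapred` is on the `PATH`.

```shell
# Hypothetical dry-run helper: prints the `mapred queue` command line and
# executes it only when RUN=1 and mapred is available on PATH.
queue_cmd() {
  echo "mapred queue $*"
  if [ "${RUN:-0}" = "1" ] && command -v mapred >/dev/null 2>&1; then
    mapred queue "$@"
  fi
}

# Assumed queue name; replace with one that exists on the cluster.
QUEUE="default"

# All configured queues with their scheduling information.
queue_cmd -list

# One queue, including the jobs submitted to it.
queue_cmd -info "$QUEUE" -showJobs

# Queues and operations permitted for the current user.
queue_cmd -showacls
```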

`version`

Prints the version.

Usage: mapred version

`envvars`

Usage: mapred envvars

Display computed Hadoop environment variables.

Administration Commands

Commands useful for administrators of a hadoop cluster.

`historyserver`

Starts the JobHistoryServer.

Usage: mapred historyserver

`hsadmin`

Runs a MapReduce hsadmin client to execute JobHistoryServer administrative commands.

COMMAND_OPTION	Description
-refreshUserToGroupsMappings	Refresh user-to-groups mappings
-refreshSuperUserGroupsConfiguration	Refresh superuser proxy groups mappings
-refreshAdminAcls	Refresh acls for administration of Job history server
-refreshLoadedJobCache	Refresh loaded job cache of Job history server
-refreshJobRetentionSettings	Refresh job history period, job cleaner settings
-refreshLogRetentionSettings	Refresh log retention period and log retention check interval
-getGroups [username]	Get the groups that the given user belongs to
-help [cmd]	Displays help for the given command or all commands if none is specified.
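
As a sketch of how the refresh options above might be applied after a configuration change, the loop below issues several of them in turn and then looks up a user's groups. The `hsadmin_cmd` helper, the `RUN` variable, and the user name `alice` are assumptions for illustration. Nothing is executed unless `RUN=1` is set and `mapred` is on the `PATH`.

```shell
# Hypothetical dry-run helper: prints the `mapred hsadmin` command line and
# executes it only when RUN=1 and mapred is available on PATH.
hsadmin_cmd() {
  echo "mapred hsadmin $*"
  if [ "${RUN:-0}" = "1" ] && command -v mapred >/dev/null 2>&1; then
    mapred hsadmin "$@"
  fi
}

# Refresh several JobHistoryServer settings, one option per invocation.
for opt in -refreshAdminAcls \
           -refreshLoadedJobCache \
           -refreshJobRetentionSettings \
           -refreshLogRetentionSettings; do
  hsadmin_cmd "$opt"
done

# Look up the groups of a user (the user name is a placeholder).
hsadmin_cmd -getGroups alice
```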
