[ Go Back ]
This document describes the FairScheduler, a pluggable scheduler for Hadoop which provides a way to share large clusters. NOTE: The Fair Scheduler implementation is currently under development and should be considered experimental.
Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time. Hadoop NextGen is capable of scheduling multiple resource types, such as Memory and CPU. Currently only memory is supported, so a "cluster share" is a proportion of aggregate memory in the cluster. When there is a single app running, that app uses the entire cluster. When other apps are submitted, resources that free up are assigned to the new apps, so that each app gets roughly the same amount of resources. Unlike the default Hadoop scheduler, which forms a queue of apps, this lets short apps finish in reasonable time while not starving long-lived apps. It is also a reasonable way to share a cluster between a number of users. Finally, fair sharing can also work with app priorities - the priorities are used as weights to determine the fraction of total resources that each app should get.
The scheduler organizes apps further into "queues", and shares resources fairly between these queues. By default, all users share a single queue, called “default”. If an app specifically lists a queue in a container resource request, the request is submitted to that queue. It is also possible to assign queues based on the user name included with the request through configuration. Within each queue, fair sharing is used to share capacity between the running apps. queues can also be given weights to share the cluster non-proportionally in the config file.
The fair scheduler supports hierarchical queues. All queues descend from a queue named "root". Available resources are distributed among the children of the root queue in the typical fair scheduling fashion. Then, the children distribute the resources assigned to them to their children in the same fashion. Applications may only be scheduled on leaf queues. Queues can be specified as children of other queues by placing them as sub-elements of their parents in the fair scheduler configuration file.
A queue's name starts with the names of its parents, with periods as separators. So a queue named "queue1" under the root named, would be referred to as "root.queue1", and a queue named "queue2" under a queue named "parent1" would be referred to as "root.parent1.queue2". When referring to queues, the root part of the name is optional, so queue1 could be referred to as just "queue1", and a queue2 could be referred to as just "parent1.queue2".
In addition to providing fair sharing, the Fair Scheduler allows assigning guaranteed minimum shares to queues, which is useful for ensuring that certain users, groups or production applications always get sufficient resources. When a queue contains apps, it gets at least its minimum share, but when the queue does not need its full guaranteed share, the excess is split between other running apps. This lets the scheduler guarantee capacity for queues while utilizing resources efficiently when these queues don't contain applications.
The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler's queue until some of the user's earlier apps finish. apps to run from each user/queue are chosen in order of priority and then submit time, as in the default FIFO scheduler in Hadoop.
Certain add-ons are not yet supported which existed in the original (MR1) Fair Scheduler. Among them, is the use of a custom policies governing priority “boosting” over certain apps.
To use the Fair Scheduler first assign the appropriate scheduler class in yarn-site.xml:
<property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> </property>
Customizing the Fair Scheduler typically involves altering two files. First, scheduler-wide options can be set by adding configuration properties in the fair-scheduler.xml file in your existing configuration directory. Second, in most cases users will want to create a manifest file listing which queues exist and their respective weights and capacities. The location of this file is flexible - but it must be declared in fair-scheduler.xml.
The allocation file must be in XML format. The format contains three types of elements:
An example allocation file is given here:
<?xml version="1.0"?> <allocations> <queue name="sample_queue"> <minResources>10000</minResources> <maxResources>90000</maxResources> <maxRunningApps>50</maxRunningApps> <weight>2.0</weight> <schedulingMode>fair</schedulingMode> <queue name="sample_sub_queue"> <minResources>5000</minResources> </queue> </queue> <user name="sample_user"> <maxRunningApps>30</maxRunningApps> </user> <userMaxAppsDefault>5</userMaxAppsDefault> </allocations>
Note that for backwards compatibility with the original FairScheduler, "queue" elements can instead be named as "pool" elements.