YARN Node Labels

Overview

Node labels are a way to group nodes with similar characteristics, and applications can specify where to run.

Currently only node partitions are supported, which means:

Users can specify the set of node labels that each queue can access, and an application can only use a subset of the node labels accessible by the queue it is submitted to.

Features

The node labels feature currently supports the following:

Configuration

Setting up ResourceManager to enable Node Labels:

Set the following properties in yarn-site.xml:

Property                            Value
yarn.node-labels.fs-store.root-dir  hdfs://namenode:port/path/to/store/node-labels/
yarn.node-labels.enabled            true

Notes:

  • Make sure yarn.node-labels.fs-store.root-dir is created and the ResourceManager has permission to access it (typically the directory should be owned by the "yarn" user).
  • If you want to store node labels in the local file system of the RM instead of HDFS, a path such as file:///home/yarn/node-label can be used.
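
Putting the two properties together, a minimal yarn-site.xml snippet might look like the following (the NameNode host, port, and path are placeholders):

<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>hdfs://namenode:port/path/to/store/node-labels/</value>
</property>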

Add/modify node labels list and node-to-labels mapping to YARN

  • Add cluster node labels list:

    • Execute yarn rmadmin -addToClusterNodeLabels "label_1(exclusive=true/false),label_2(exclusive=true/false)" to add node labels.
    • If "(exclusive=…)" is not specified, exclusive defaults to true.
    • Run yarn cluster --list-node-labels to check that the added node labels are visible in the cluster.
  • Add labels to nodes

    • Execute yarn rmadmin -replaceLabelsOnNode "node1[:port]=label1 node2=label2". This adds label1 to node1 and label2 to node2. If no port is specified, the label is added to all NodeManagers running on that node. (See the example session after this list.)
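
For example, to create a GPU partition and map it to host h5 (the label name and hostname here simply match the example used later in this document):

yarn rmadmin -addToClusterNodeLabels "GPU(exclusive=true)"
yarn rmadmin -replaceLabelsOnNode "h5=GPU"
yarn cluster --list-node-labels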

Configuration of Schedulers for node labels

Capacity Scheduler Configuration

yarn.scheduler.capacity.<queue-path>.capacity
    Sets the percentage of the DEFAULT partition (nodes without a label) that the queue can access. The sum of DEFAULT partition capacities for direct children under each parent must equal 100.

yarn.scheduler.capacity.<queue-path>.accessible-node-labels
    The labels the queue can access, specified by the admin as a comma-separated list; for example, "hbase,storm" means the queue can access the labels hbase and storm. All queues can access nodes without a label, so this does not need to be specified. If this field is not specified, it is inherited from the parent queue. To explicitly specify that a queue can only access nodes without labels, put a single space as the value.

yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.capacity
    Sets the percentage of the <label> partition that the queue can access. The sum of <label> capacities for direct children under each parent must equal 100. By default, it is 0.

yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.maximum-capacity
    Similar to yarn.scheduler.capacity.<queue-path>.maximum-capacity, this is the maximum capacity of the queue for the <label> partition. By default, it is 100.

yarn.scheduler.capacity.<queue-path>.default-node-label-expression
    A value such as "hbase" means that applications submitted to the queue without specifying a node label in their resource requests will use "hbase" as the default node label expression. By default this is empty, so applications get containers from nodes without a label.

An example of node label configuration:

Assume we have a queue structure

                root
             /    |     \
    engineering sales  marketing

We have 5 nodes (hostnames h1..h5) in the cluster, each with 24G memory and 24 vcores. One of the 5 nodes has a GPU (assume it is h5), so the admin adds the GPU label to h5.

Assume the user has a Capacity Scheduler configuration like the following (key=value is used here for readability):

yarn.scheduler.capacity.root.queues=engineering,marketing,sales
yarn.scheduler.capacity.root.engineering.capacity=33
yarn.scheduler.capacity.root.marketing.capacity=34
yarn.scheduler.capacity.root.sales.capacity=33

yarn.scheduler.capacity.root.engineering.accessible-node-labels=GPU
yarn.scheduler.capacity.root.marketing.accessible-node-labels=GPU

yarn.scheduler.capacity.root.engineering.accessible-node-labels.GPU.capacity=50
yarn.scheduler.capacity.root.marketing.accessible-node-labels.GPU.capacity=50

yarn.scheduler.capacity.root.engineering.default-node-label-expression=GPU
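
In the actual capacity-scheduler.xml these appear as standard <property> entries; for instance, two of the settings above would look roughly like this:

<property>
  <name>yarn.scheduler.capacity.root.engineering.accessible-node-labels</name>
  <value>GPU</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.engineering.accessible-node-labels.GPU.capacity</name>
  <value>50</value>
</property>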

You can see that root.engineering/marketing/sales.capacity=33 (34 for marketing), so each of them has guaranteed resources equal to roughly 1/3 of the DEFAULT partition (the nodes without a label). So each of them can use roughly 1/3 of the resources of h1..h4, which is 24 * 4 * (1/3) = (32G mem, 32 vcores).

And only the engineering and marketing queues have permission to access the GPU partition (see root.<queue-name>.accessible-node-labels).

Each of the engineering and marketing queues has guaranteed resources equal to 1/2 of the GPU partition, so each of them can use 1/2 of the resources of h5, which is 24 * 0.5 = (12G mem, 12 vcores).

Notes:

  • After finishing the CapacityScheduler configuration, execute yarn rmadmin -refreshQueues to apply the changes.
  • Go to the scheduler page of the RM web UI to check whether the configuration has been applied successfully.

Specifying node label for application

Applications can use the following Java APIs to specify node labels in their requests:

  • ApplicationSubmissionContext.setNodeLabelExpression(..) sets the node label expression for all containers of the application.
  • ResourceRequest.setNodeLabelExpression(..) sets the node label expression for individual resource requests; this overrides the node label expression set in ApplicationSubmissionContext.
  • Use ApplicationSubmissionContext.setAMContainerResourceRequest(..) with ResourceRequest.setNodeLabelExpression(..) to indicate the expected node label for the ApplicationMaster container. (See the sketch after this list.)
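
The following is a minimal sketch of how these calls fit together; the label name "GPU", the container sizes, and the class name are placeholders, and the surrounding submission plumbing (YarnClient, ContainerLaunchContext, queue selection, and so on) is omitted:

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class NodeLabelExpressionSketch {
  public static void main(String[] args) {
    // Application-level default: all containers of this application are
    // requested on the "GPU" partition unless an individual request overrides it.
    ApplicationSubmissionContext appContext =
        Records.newRecord(ApplicationSubmissionContext.class);
    appContext.setNodeLabelExpression("GPU");

    // Per-request override: this request asks for the DEFAULT partition
    // (empty expression) instead of the application-level default above.
    ResourceRequest request = Records.newRecord(ResourceRequest.class);
    request.setResourceName(ResourceRequest.ANY);
    request.setCapability(Resource.newInstance(2048, 1)); // 2G mem, 1 vcore
    request.setNumContainers(1);
    request.setPriority(Priority.newInstance(0));
    request.setNodeLabelExpression("");

    // AM container placement: set the label on the AM resource request
    // and attach it to the submission context.
    ResourceRequest amRequest = Records.newRecord(ResourceRequest.class);
    amRequest.setResourceName(ResourceRequest.ANY);
    amRequest.setCapability(Resource.newInstance(1024, 1));
    amRequest.setNumContainers(1);
    amRequest.setNodeLabelExpression("GPU");
    appContext.setAMContainerResourceRequest(amRequest);
  }
}

In a real application these requests are then handed to YARN through the usual client and AM calls (for example YarnClient.submitApplication(..) and the ApplicationMaster's allocate requests).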

Monitoring

Monitoring through web UI

The following label-related fields can be seen in the web UI:

Monitoring through commandline

  • Use yarn cluster --list-node-labels to get labels in the cluster
  • Use yarn node -status <NodeId> to get the status of a given node, including its labels (see the example after this list)
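
For example (the NodeManager address h5:45454 below is only a placeholder for a real <NodeId> in your cluster):

yarn cluster --list-node-labels
yarn node -status h5:45454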

Useful links