A Node Attribute describes a property of a node without any resource guarantee. Applications can use an expression over multiple such attributes to pick the right nodes for their containers.
The salient features of Node Attributes are as follows:
Unlike labels, attributes can be mapped to a node from both centralized and distributed modes at the same time. There are no clashes, because attributes carry a different prefix in each mode: centralized attributes are identified by the prefix “rm.yarn.io” and distributed attributes by the prefix “nm.yarn.io”. An attribute is therefore uniquely identified by its prefix and name together.
Unlike Node Labels, Node Attributes need not be explicitly enabled: they always exist and have no impact on performance or compatibility even if the feature is not used.
Set up the following properties in yarn-site.xml:
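The prefix-plus-name identity described above can be illustrated with a minimal sketch. It is not YARN code: the class `AttributeKeySketch`, the helper `key`, and the sample attribute values are hypothetical. Only the two prefixes and the `prefix/name` form come from this page.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeKeySketch {
    // Prefixes quoted from the text above.
    static final String CENTRALIZED_PREFIX = "rm.yarn.io";
    static final String DISTRIBUTED_PREFIX = "nm.yarn.io";

    // Hypothetical helper: joins prefix and name into one key such as "rm.yarn.io/java".
    static String key(String prefix, String name) {
        return prefix + "/" + name;
    }

    public static void main(String[] args) {
        Map<String, String> attributesOnNode = new HashMap<>();
        // The same attribute name is set from both modes (sample values).
        attributesOnNode.put(key(CENTRALIZED_PREFIX, "java"), "1.8");
        attributesOnNode.put(key(DISTRIBUTED_PREFIX, "java"), "11");
        // The keys differ by prefix, so neither entry overwrites the other.
        System.out.println(attributesOnNode);
    }
}
```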
Property | Value | Default Value |
---|---|---|
yarn.node-attribute.fs-store.root-dir | path where centralized attribute mappings are stored | file:///tmp/hadoop-yarn-${user}/node-attribute/ |
yarn.node-attribute.fs-store.impl.class | Configured class needs to extend org.apache.hadoop.yarn.nodelabels.NodeAttributeStore | FileSystemNodeAttributeStore |
Notes:
Three options are supported to map attributes to nodes in the centralized approach:
add: Executing yarn nodeattributes -add "node1:attribute[(type)][=value],attribute2 node2:attribute2[=value],attribute3" adds attributes to the nodes without impacting the mappings that already exist on the node(s).
remove: Executing yarn nodeattributes -remove "node1:attribute,attribute1 node2:attribute2" removes attributes from the nodes without impacting the other mappings that already exist on the node(s).
replace: Executing yarn nodeattributes -replace "node1:attribute[(type)][=value],attribute1[=value],attribute2 node2:attribute2[=value],attribute3" replaces the existing attributes on the nodes with the ones specified in this command.
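The difference between the three options can be sketched with an in-memory map. This is a toy model under stated assumptions, not the ResourceManager's implementation: the class `NodeAttributeMappingSketch` and its methods are hypothetical, and the behaviour of adding an attribute that already exists (overwrite here) is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class NodeAttributeMappingSketch {
    // node name -> (attribute name -> value); a stand-in for the real attribute store.
    private final Map<String, Map<String, String>> mappings = new HashMap<>();

    // add: merges new attributes into whatever the node already has.
    void add(String node, Map<String, String> attrs) {
        mappings.computeIfAbsent(node, n -> new HashMap<>()).putAll(attrs);
    }

    // remove: deletes only the named attributes and leaves the rest untouched.
    void remove(String node, Iterable<String> names) {
        Map<String, String> current = mappings.get(node);
        if (current == null) {
            return;
        }
        for (String name : names) {
            current.remove(name);
        }
    }

    // replace: discards the node's existing attributes and keeps only the given ones.
    void replace(String node, Map<String, String> attrs) {
        mappings.put(node, new HashMap<>(attrs));
    }

    Map<String, String> attributesOf(String node) {
        return mappings.getOrDefault(node, Collections.emptyMap());
    }

    public static void main(String[] args) {
        NodeAttributeMappingSketch store = new NodeAttributeMappingSketch();
        store.add("node1", Collections.singletonMap("java", "1.8"));
        store.add("node1", Collections.singletonMap("python", "3"));
        System.out.println(store.attributesOf("node1")); // both attributes present
        store.remove("node1", Arrays.asList("python"));
        System.out.println(store.attributesOf("node1")); // only java remains
        store.replace("node1", Collections.singletonMap("os", "linux"));
        System.out.println(store.attributesOf("node1")); // only os remains
    }
}
```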
Notes:
Configuring attributes to nodes in Distributed mode
Property | Value |
---|---|
yarn.nodemanager.node-attributes.provider | Administrators can configure the provider for the node attributes by configuring this parameter in NM. Administrators can configure “config”, “script” or the class name of the provider. Configured class needs to extend org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeAttributesProvider. If “config” is configured, then “ConfigurationNodeAttributesProvider” and if “script” is configured, then “ScriptBasedNodeAttributesProvider” will be used. |
yarn.nodemanager.node-attributes.provider.fetch-interval-ms | When “yarn.nodemanager.node-attributes.provider” is configured with “config”, “script” or the configured class extends NodeAttributesProvider, then periodically node attributes are retrieved from the node attributes provider. This configuration is to define the interval period. If -1 is configured, then node attributes are retrieved from provider only during initialisation. Defaults to 10 mins. |
yarn.nodemanager.node-attributes.provider.fetch-timeout-ms | When “yarn.nodemanager.node-attributes.provider” is configured with “script”, then this configuration provides the timeout period after which it will interrupt the script which queries the node attributes. Defaults to 20 mins. |
yarn.nodemanager.node-attributes.provider.script.path | The node attribute script that the NM runs to collect node attributes. Each line of the script output that starts with “NODE_ATTRIBUTE:” is treated as one node attribute record and parsed into a node attribute. Within a record, the attribute name, type and value must be delimited by commas. |
yarn.nodemanager.node-attributes.provider.script.opts | The arguments to pass to the node attribute script. |
yarn.nodemanager.node-attributes.provider.configured-node-attributes | When “yarn.nodemanager.node-attributes.provider” is configured with “config” then ConfigurationNodeAttributesProvider fetches node attributes from this parameter. |
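
The script output format described for yarn.nodemanager.node-attributes.provider.script.path can be sketched as a small parser. This is an illustration only, not the ScriptBasedNodeAttributesProvider source: the class `ScriptOutputParserSketch`, the choice to skip malformed records silently, and the sample attribute (`hostname`, `STRING`, `host1234`) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptOutputParserSketch {
    // Marker quoted from the property description above.
    static final String PREFIX = "NODE_ATTRIBUTE:";

    // Returns one {name, type, value} triple per well-formed record line.
    static List<String[]> parse(String scriptOutput) {
        List<String[]> attributes = new ArrayList<>();
        for (String line : scriptOutput.split("\\R")) {
            if (!line.startsWith(PREFIX)) {
                continue; // ordinary script output, not an attribute record
            }
            String[] fields = line.substring(PREFIX.length()).split(",", 3);
            if (fields.length != 3) {
                continue; // assumption: records without all three fields are skipped
            }
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim();
            }
            attributes.add(fields);
        }
        return attributes;
    }

    public static void main(String[] args) {
        String sampleOutput = "collecting attributes...\n"
            + "NODE_ATTRIBUTE:hostname,STRING,host1234\n"
            + "done\n";
        for (String[] attr : parse(sampleOutput)) {
            System.out.println("name=" + attr[0] + " type=" + attr[1] + " value=" + attr[2]);
        }
    }
}
```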
Applications can use the Placement Constraint APIs to specify node attribute requests, as described in the Placement Constraints documentation.
Here is an example for creating a Scheduling Request object with NodeAttribute expression:
    //expression : AND(python!=3:java=1.8)
    SchedulingRequest schedulingRequest =
        SchedulingRequest.newBuilder().executionType(
            ExecutionTypeRequest.newInstance(ExecutionType.GUARANTEED))
            .allocationRequestId(10L).priority(Priority.newInstance(1))
            .placementConstraintExpression(
                PlacementConstraints.and(
                    PlacementConstraints
                        .targetNodeAttribute(PlacementConstraints.NODE,
                            NodeAttributeOpCode.NE,
                            PlacementConstraints.PlacementTargets
                                .nodeAttribute("python", "3")),
                    PlacementConstraints
                        .targetNodeAttribute(PlacementConstraints.NODE,
                            NodeAttributeOpCode.EQ,
                            PlacementConstraints.PlacementTargets
                                .nodeAttribute("java", "1.8")))
                .build()).resourceSizing(
            ResourceSizing.newInstance(1, Resource.newInstance(1024, 1)))
            .build();
The above SchedulingRequest requests 1 container on nodes that satisfy both of the following constraints:
Node attribute rm.yarn.io/python does not exist on the node, or it exists but its value is not equal to 3
Node attribute rm.yarn.io/java exists on the node and its value is equal to 1.8
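The matching rules for the two constraints can be sketched as a standalone check. This is not the scheduler's code: the class `AttributeConstraintSketch` and its local `Op` enum are hypothetical stand-ins (the enum is not Hadoop's NodeAttributeOpCode), and node attributes are modelled as a plain map.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeConstraintSketch {
    // Local stand-in for the two operators used in the example above.
    enum Op { EQ, NE }

    static boolean satisfies(Map<String, String> nodeAttrs, String name, Op op, String expected) {
        String actual = nodeAttrs.get(name);
        switch (op) {
            case EQ:
                // The attribute must exist and carry the expected value.
                return actual != null && actual.equals(expected);
            case NE:
                // A missing attribute also counts as "not equal".
                return actual == null || !actual.equals(expected);
            default:
                throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    // AND(python!=3:java=1.8), as in the SchedulingRequest example.
    static boolean matchesExample(Map<String, String> nodeAttrs) {
        return satisfies(nodeAttrs, "rm.yarn.io/python", Op.NE, "3")
            && satisfies(nodeAttrs, "rm.yarn.io/java", Op.EQ, "1.8");
    }

    public static void main(String[] args) {
        Map<String, String> node = new HashMap<>();
        node.put("rm.yarn.io/java", "1.8");
        System.out.println(matchesExample(node)); // python is absent, java matches
        node.put("rm.yarn.io/python", "3");
        System.out.println(matchesExample(node)); // python=3 violates the NE constraint
    }
}
```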
The attributes mapped to a given node, together with their values, are returned as part of the REST output of http://rm-http-address:port/ws/v1/cluster/nodes/{nodeid}.
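A minimal sketch of querying that endpoint from Java follows. The class `NodeInfoFetchSketch` and its helpers are hypothetical, the request is assumed to be unauthenticated, and the response body is printed as-is because its field names are not described on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class NodeInfoFetchSketch {
    // Builds the node endpoint from "host:port" of the RM web UI and a node id.
    static String nodeInfoUrl(String rmHttpAddress, String nodeId) {
        return "http://" + rmHttpAddress + "/ws/v1/cluster/nodes/" + nodeId;
    }

    // Issues a GET request and returns the raw response body.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: NodeInfoFetchSketch <rm-http-address:port> <nodeid>");
            return;
        }
        System.out.println(fetch(nodeInfoUrl(args[0], args[1])));
    }
}
```

The sketch only performs a network call when both arguments are supplied, so it can be compiled and run without a cluster.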