public class NvidiaGPUPluginForRuntimeV2 extends Object implements DevicePlugin, DevicePluginScheduler
Modifier and Type | Class and Description |
---|---|
static class | NvidiaGPUPluginForRuntimeV2.DeviceLinkType: The different types of link. |
class | NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor: A shell wrapper class that makes testing easier. |
Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG |
static String | NV_RESOURCE_NAME |
static String | TOPOLOGY_POLICY_ENV_KEY: Environment variable that a container can set to choose the topology scheduling policy. |
static String | TOPOLOGY_POLICY_PACK: Scheduling policy that prefers faster GPU-GPU communication. |
static String | TOPOLOGY_POLICY_SPREAD: Scheduling policy that prefers faster CPU-GPU communication. |
Constructor and Description |
---|
NvidiaGPUPluginForRuntimeV2() |
Modifier and Type | Method and Description |
---|---|
Set<Device> | allocateDevices(Set<Device> availableDevices, int count, Map<String,String> envs): Called when allocating devices. |
void | basicSchedule(Set<Device> allocation, int count, Set<Device> availableDevices) |
int | computeCostOfDevices(Device[] devices): The cost function used to calculate the cost of a subset of devices. |
Map<Integer,List<Map.Entry<Set<Device>,Integer>>> | getCostTable() |
Map<String,Integer> | getDevicePairToWeight() |
Set<Device> | getDevices(): Called when updating node resources. |
DeviceRegisterRequest | getRegisterRequestInfo(): Called first when the device plugin framework wants to register. |
void | initCostTable() |
boolean | isTopoInitialized() |
DeviceRuntimeSpec | onDevicesAllocated(Set<Device> allocatedDevices, YarnRuntimeType yarnRuntime): Asks how these devices should be prepared and used before or during container launch. |
void | onDevicesReleased(Set<Device> releasedDevices): Called after devices are released. |
void | parseTopo(String topo, Map<String,Integer> deviceLinkToWeight): A typical sample topo output has the columns GPU0 GPU1 GPU2 GPU3 CPU Affinity and the rows GPU0: X PHB SOC SOC 0-31; GPU1: PHB X SOC SOC 0-31; GPU2: SOC SOC X PHB 0-31; GPU3: SOC SOC PHB X 0-31. Legend: X = Self; SOC = Connection traversing PCIe as well as the SMP link between CPU sockets. |
void | setPathOfGpuBinary(String pOfGpuBinary) |
void | setShellExecutor(NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor shellExecutor) |
void | topologyAwareSchedule(Set<Device> allocation, int count, Map<String,String> envs, Set<Device> availableDevices, Map<Integer,List<Map.Entry<Set<Device>,Integer>>> cTable): Topology-aware scheduling algorithm. |
public static final org.slf4j.Logger LOG
public static final String NV_RESOURCE_NAME
public static final String TOPOLOGY_POLICY_ENV_KEY
public static final String TOPOLOGY_POLICY_PACK
public static final String TOPOLOGY_POLICY_SPREAD
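public DeviceRegisterRequest getRegisterRequestInfo() throws Exception
Description copied from interface: DevicePlugin
Called first when the device plugin framework wants to register.
Specified by: getRegisterRequestInfo in interface DevicePlugin
Returns: DeviceRegisterRequest
Throws: Exception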
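public Set<Device> getDevices() throws Exception
Description copied from interface: DevicePlugin
Called when updating node resources.
Specified by: getDevices in interface DevicePlugin
Returns: A set of Device; TreeSet recommended
Throws: Exception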
public DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices, YarnRuntimeType yarnRuntime) throws Exception
Description copied from interface: DevicePlugin
Asks how these devices should be prepared and used before or during container launch. For example, a VolumeSpec lets the framework create a volume before running the container.
Specified by: onDevicesAllocated in interface DevicePlugin
Parameters:
allocatedDevices - A set of allocated Device.
yarnRuntime - Indicates which runtime YARN will use. Could be RUNTIME_DEFAULT or RUNTIME_DOCKER in DeviceRuntimeSpec constants. The default means YARN's non-Docker container runtime is used. The docker value means YARN's Docker container runtime is used.
Returns: A DeviceRuntimeSpec description about the environment, VolumeSpec, MountVolumeSpec, etc.
Throws: Exception
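public void onDevicesReleased(Set<Device> releasedDevices) throws Exception
Description copied from interface: DevicePlugin
Called after devices are released.
Specified by: onDevicesReleased in interface DevicePlugin
Parameters:
releasedDevices - A set of released devices.
Throws: Exception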
public Set<Device> allocateDevices(Set<Device> availableDevices, int count, Map<String,String> envs)
Description copied from interface: DevicePluginScheduler
Called when allocating devices.
Specified by: allocateDevices in interface DevicePluginScheduler
Parameters:
availableDevices - Devices allowed to be chosen from.
count - Number of devices to be allocated.
envs - Environment variables of the container.
Returns: The set of Device allocated
@VisibleForTesting public void initCostTable() throws IOException
Throws: IOException
@VisibleForTesting public int computeCostOfDevices(Device[] devices)
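One way to picture a cost function over a subset of devices is to sum a link weight over every unordered pair in the subset. The sketch below is only an illustration under stated assumptions: `PairwiseCostSketch`, the integer device IDs, the `"a-b"` pair-key format, and the weights are all hypothetical stand-ins, not the plugin's actual types or values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; not the Hadoop class. */
final class PairwiseCostSketch {

  /** Builds an order-independent key for a device pair (assumed key format). */
  static String pairKey(int a, int b) {
    return Math.min(a, b) + "-" + Math.max(a, b);
  }

  /** Sums the weight of every unordered pair of devices in the subset. */
  static int costOf(int[] deviceIds, Map<String, Integer> pairToWeight) {
    int cost = 0;
    for (int i = 0; i < deviceIds.length; i++) {
      for (int j = i + 1; j < deviceIds.length; j++) {
        // Pairs without a known weight contribute 0 in this sketch.
        cost += pairToWeight.getOrDefault(pairKey(deviceIds[i], deviceIds[j]), 0);
      }
    }
    return cost;
  }

  public static void main(String[] args) {
    Map<String, Integer> weights = new HashMap<>();
    weights.put("0-1", 2); // assumed weight for a close link
    weights.put("0-2", 5); // assumed weight for a more distant link
    weights.put("1-2", 5);
    System.out.println(costOf(new int[] {0, 1, 2}, weights));
  }
}
```

With these made-up weights, a subset whose members sit on closer links gets a lower total than one that crosses more distant links.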
@VisibleForTesting public void topologyAwareSchedule(Set<Device> allocation, int count, Map<String,String> envs, Set<Device> availableDevices, Map<Integer,List<Map.Entry<Set<Device>,Integer>>> cTable)
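The cTable parameter maps a device count to a list of (device set, cost) entries. A minimal sketch of how such a list could drive the choice follows. It assumes the list is sorted by ascending cost, that a pack-style policy takes the cheapest combination that fits in the available devices, and that a spread-style policy walks the list from the other end. `TopologyPickSketch`, the integer device IDs, and the boolean `spread` flag are hypothetical; the actual algorithm may differ.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for illustration; not the Hadoop class. */
final class TopologyPickSketch {

  /** Convenience factory for a (device set, cost) entry. */
  static Map.Entry<Set<Integer>, Integer> entry(int cost, Integer... ids) {
    Set<Integer> devices = new HashSet<>(Arrays.asList(ids));
    return new SimpleEntry<>(devices, cost);
  }

  /**
   * Returns the first combination fully contained in the available devices,
   * scanning from the lowest cost (pack) or the highest cost (spread).
   */
  static Optional<Set<Integer>> pick(
      List<Map.Entry<Set<Integer>, Integer>> sortedByCost,
      Set<Integer> available,
      boolean spread) {
    List<Map.Entry<Set<Integer>, Integer>> order = new ArrayList<>(sortedByCost);
    if (spread) {
      Collections.reverse(order); // assumed: spread prefers the costliest combination
    }
    for (Map.Entry<Set<Integer>, Integer> candidate : order) {
      if (available.containsAll(candidate.getKey())) {
        return Optional.of(candidate.getKey());
      }
    }
    return Optional.empty(); // no combination fits the available devices
  }

  public static void main(String[] args) {
    List<Map.Entry<Set<Integer>, Integer>> sorted = Arrays.asList(
        entry(2, 0, 1), entry(2, 2, 3), entry(5, 0, 2), entry(5, 1, 3));
    Set<Integer> available = new HashSet<>(Arrays.asList(0, 2, 3));
    System.out.println("pack:   " + pick(sorted, available, false));
    System.out.println("spread: " + pick(sorted, available, true));
  }
}
```

When no combination fits, the caller would need a fallback; in this sketch that case is signalled by an empty Optional.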
@VisibleForTesting public void basicSchedule(Set<Device> allocation, int count, Set<Device> availableDevices)
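The signature suggests a topology-agnostic scheduler that fills allocation from availableDevices. Assuming it simply takes devices in iteration order until count is reached, the behaviour could be sketched as below. `BasicScheduleSketch` and the integer device IDs are hypothetical stand-ins, and the real method may behave differently.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for illustration; not the Hadoop class. */
final class BasicScheduleSketch {

  /** Adds up to {@code count} devices to {@code allocation}, in iteration order. */
  static void basicSchedule(Set<Integer> allocation, int count, Set<Integer> available) {
    int taken = 0;
    for (Integer device : available) {
      if (taken >= count) {
        break; // enough devices chosen
      }
      allocation.add(device);
      taken++;
    }
  }

  public static void main(String[] args) {
    Set<Integer> available = new TreeSet<>(Arrays.asList(3, 1, 2));
    Set<Integer> allocation = new TreeSet<>();
    basicSchedule(allocation, 2, available);
    System.out.println(allocation);
  }
}
```

Using a sorted set for the available devices makes the result deterministic, which is convenient in tests.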
public void parseTopo(String topo, Map<String,Integer> deviceLinkToWeight)
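A topology matrix like the sample in the method summary (one row and one column per GPU, followed by a Legend section) can be reduced to a map from device pair to weight. The sketch below is a hedged illustration, not the plugin's parser: `TopoParseSketch`, the `"row-col"` key format, and the link-type weights are assumptions, and it assumes GPU columns are numbered contiguously from 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; not the Hadoop class. */
final class TopoParseSketch {

  /** Maps each GPU pair above the matrix diagonal to the weight of its link type. */
  static Map<String, Integer> parse(String topo, Map<String, Integer> linkTypeToWeight) {
    Map<String, Integer> pairToWeight = new HashMap<>();
    int gpuCount = -1; // unknown until the header row is read
    for (String raw : topo.split("\\R")) {
      String line = raw.trim();
      if (line.isEmpty()) {
        continue;
      }
      if (line.startsWith("Legend")) {
        break; // the matrix ends where the legend starts
      }
      String[] tokens = line.split("\\s+");
      if (gpuCount < 0) {
        // Header row: count the GPU columns.
        gpuCount = 0;
        for (String token : tokens) {
          if (token.startsWith("GPU")) {
            gpuCount++;
          }
        }
        continue;
      }
      if (!tokens[0].startsWith("GPU") || tokens.length < gpuCount + 1) {
        continue; // not a matrix row
      }
      int row = Integer.parseInt(tokens[0].substring(3));
      // The matrix is symmetric, so only read the entries above the diagonal.
      for (int col = row + 1; col < gpuCount; col++) {
        Integer weight = linkTypeToWeight.get(tokens[col + 1]);
        if (weight != null) {
          pairToWeight.put(row + "-" + col, weight);
        }
      }
    }
    return pairToWeight;
  }

  public static void main(String[] args) {
    String topo = "GPU0 GPU1 GPU2 GPU3 CPU Affinity\n"
        + "GPU0 X PHB SOC SOC 0-31\n"
        + "GPU1 PHB X SOC SOC 0-31\n"
        + "GPU2 SOC SOC X PHB 0-31\n"
        + "GPU3 SOC SOC PHB X 0-31\n"
        + "Legend:\n"
        + "X = Self\n";
    Map<String, Integer> weights = new HashMap<>();
    weights.put("PHB", 10); // assumed weights, for illustration only
    weights.put("SOC", 20);
    System.out.println(parse(topo, weights));
  }
}
```

Link types missing from the weight map are skipped here; a stricter parser might reject them instead.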
@VisibleForTesting public void setPathOfGpuBinary(String pOfGpuBinary)
@VisibleForTesting public void setShellExecutor(NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor shellExecutor)
@VisibleForTesting public boolean isTopoInitialized()
@VisibleForTesting public Map<Integer,List<Map.Entry<Set<Device>,Integer>>> getCostTable()
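The return type of getCostTable() reads as a table keyed by device count, where each value lists (device set, cost) entries. A minimal sketch of building a table of that shape follows. It assumes every combination of two or more devices is enumerated and that each list is sorted by ascending cost. `CostTableSketch`, the integer device IDs, and the `"a-b"` pair keys are hypothetical stand-ins; the plugin's own construction may differ.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for illustration; not the Hadoop class. */
final class CostTableSketch {

  /** Builds count -> [(device set, cost)], each list sorted by ascending cost. */
  static Map<Integer, List<Map.Entry<Set<Integer>, Integer>>> build(
      int deviceCount, Map<String, Integer> pairToWeight) {
    Map<Integer, List<Map.Entry<Set<Integer>, Integer>>> table = new HashMap<>();
    // Each bit mask selects one subset of devices 0..deviceCount-1.
    for (int mask = 1; mask < (1 << deviceCount); mask++) {
      int size = Integer.bitCount(mask);
      if (size < 2) {
        continue; // a single device has no pairwise cost
      }
      Set<Integer> subset = new TreeSet<>();
      for (int d = 0; d < deviceCount; d++) {
        if ((mask & (1 << d)) != 0) {
          subset.add(d);
        }
      }
      Integer[] ids = subset.toArray(new Integer[0]);
      int cost = 0;
      for (int i = 0; i < ids.length; i++) {
        for (int j = i + 1; j < ids.length; j++) {
          cost += pairToWeight.getOrDefault(ids[i] + "-" + ids[j], 0);
        }
      }
      table.computeIfAbsent(size, k -> new ArrayList<>())
          .add(new SimpleEntry<>(subset, cost));
    }
    for (List<Map.Entry<Set<Integer>, Integer>> entries : table.values()) {
      entries.sort(Map.Entry.comparingByValue());
    }
    return table;
  }

  public static void main(String[] args) {
    Map<String, Integer> weights = new HashMap<>();
    weights.put("0-1", 10); // assumed weights, for illustration only
    weights.put("2-3", 10);
    weights.put("0-2", 20);
    weights.put("0-3", 20);
    weights.put("1-2", 20);
    weights.put("1-3", 20);
    System.out.println(build(4, weights).get(2));
  }
}
```

Enumerating subsets with a bit mask keeps the sketch short but only suits a small number of devices, since the number of subsets doubles with each device added.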
Copyright © 2008–2024 Apache Software Foundation. All rights reserved.