Anyone writing a YARN application needs to understand the launch process described below, whether they are writing short-lived applications or long-lived services. They also need to start testing on secure clusters during the early stages of development, in order to write code that actually works.
YARN Resource Managers (RMs) and Node Managers (NMs) co-operate to execute the user’s application with the identity and hence access rights of that user.
The (active) Resource Manager:
Finds space in a cluster to deploy the core of the application, the Application Master (AM).
Requests that the NM on that node allocate a container and start the AM in it.
Communicates with the AM, so that the AM can request new containers and manipulate/release current ones, and provides notifications about allocated and running containers.
The Node Managers:
Localize resources: Download from HDFS or other filesystem into a local directory. This is done using the delegation tokens attached to the container launch context. (For non-HDFS resources, using other credentials such as object store login details in cluster configuration files)
Start the application as the user.
Monitor the application and report failure to the RM.
To execute code in the cluster, a YARN application must:
Have a client-side application which sets up the ApplicationSubmissionContext detailing what is to be launched: the resources to localize, the commands to execute, the environment in which to execute them, and any security credentials.
Have an Application Master which, when launched, registers with the YARN RM and listens for events. Any AM which wishes to execute work in other containers must request them from the RM and, when allocated, create a ContainerLaunchContext containing the command to execute, the environment in which to execute the command, binaries to localize and all relevant security credentials.
Even with the NM handling the localization process, the AM must itself be able to retrieve the security credentials supplied at launch time, so that it can work with HDFS and any other services, and pass some or all of these credentials down to the launched containers.
The delegation tokens which a YARN application needs must be acquired from a program executing as an authenticated user. For a YARN application, this means the user launching the application. It is the client-side part of the YARN application which must do this:
Which tokens are required? Normally, at least a token to access HDFS.
An application must request a delegation token from every filesystem with which it intends to interact —including the cluster’s main FS. FileSystem.addDelegationTokens(renewer, credentials) can be used to collect these; it is a no-op on those filesystems which do not issue tokens (including non-kerberized HDFS clusters).
Applications talking to other services, such as Apache HBase and Apache Hive, must request tokens from these services, using the client libraries of those services to acquire delegation tokens. All tokens can be added to the same set of credentials, then saved to a byte buffer for submission; a minimal sketch of this follows below.
The Application Timeline Server also needs a delegation token. This is handled automatically on AM launch.
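Here is a minimal client-side sketch of this token collection and serialization, in the style of the snippets later in this document. The names `conf`, `rmPrincipal` and `amContainer` are assumptions standing in for whatever the client code already holds (its configuration, the RM principal to name as renewer, and the AM's ContainerLaunchContext).

```java
Credentials credentials = new Credentials();

// Collect a delegation token from the cluster filesystem. This is a no-op on
// filesystems which do not issue tokens (e.g. an insecure HDFS cluster).
FileSystem fs = FileSystem.get(conf);
fs.addDelegationTokens(rmPrincipal, credentials);

// ... add tokens for HBase, Hive, etc. to the same credentials here ...

// Serialize the credentials and attach them to the AM's launch context.
DataOutputBuffer dob = new DataOutputBuffer();
credentials.writeTokenStorageToStream(dob);
ByteBuffer tokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
amContainer.setTokens(tokens);
```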
When the Application Master is launched, and any of the UGI/Hadoop operations which trigger a user login are invoked, the UGI class will automatically load all tokens saved in the file named by the environment variable HADOOP_TOKEN_FILE_LOCATION.
This happens on insecure clusters as well as secure ones, and on a secure cluster even if a keytab is used by the application. Why? Because the AM/RM token needed to authenticate the application with the YARN RM is always supplied this way.
This means you have a relatively similar workflow across secure and insecure clusters.
During AM startup, log in to Kerberos. A call to UserGroupInformation.isSecurityEnabled() will trigger this operation.
Enumerate the current user’s credentials, through a call to UserGroupInformation.getCurrentUser().getCredentials().
Filter out the AMRM token, resulting in a new set of credentials. In an insecure cluster, the list of credentials will now be empty; in a secure cluster it will contain the delegation tokens supplied by the client at submission time (see the sketch after this list).
Set the credentials of all containers to be launched to this (possibly empty) list of credentials.
If the filtered list of tokens to renew is non-empty, start up a thread to renew them.
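A minimal sketch of the enumerate-and-filter steps, closely following the pattern used by the YARN distributed shell example; `conf` is assumed to be the AM's Hadoop configuration.

```java
// Trigger security setup (and Kerberos login on a secure cluster).
UserGroupInformation.isSecurityEnabled();

// Copy the current user's credentials, which UGI loaded from the file named
// by HADOOP_TOKEN_FILE_LOCATION, so filtering does not affect the AM's own set.
Credentials credentials =
    new Credentials(UserGroupInformation.getCurrentUser().getCredentials());

// Strip the AM/RM token: it must never be passed to launched containers.
Iterator<Token<? extends TokenIdentifier>> iter =
    credentials.getAllTokens().iterator();
while (iter.hasNext()) {
  if (iter.next().getKind().equals(AMRMTokenIdentifier.KIND_NAME)) {
    iter.remove();
  }
}

// Serialize the filtered credentials for use in container launch contexts.
DataOutputBuffer dob = new DataOutputBuffer();
credentials.writeTokenStorageToStream(dob);
ByteBuffer containerTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
// later: containerLaunchContext.setTokens(containerTokens.duplicate());
```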
Tokens expire: they have a limited lifespan. An application wishing to use a token past this expiry date must renew the token before the token expires.
Hadoop automatically sets up a delegation token renewal thread when needed, the DelegationTokenRenewer.
It is the responsibility of the application to renew all tokens other than the AMRM and timeline tokens.
Here are the different strategies:
Don’t renew the tokens at all; rely on the lifespan of the application being so short that token renewal is not needed. For applications whose life can always be measured in minutes or tens of minutes, this is a viable strategy.
Start a background thread/Executor to renew the tokens at a regular interval. This is what most YARN applications do; a sketch follows this list.
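A minimal sketch of the background-renewal approach. It assumes the filtered token set and the Hadoop configuration are available as `credentials` and `conf`, and that each token's renewer setting permits this process to renew it; the one-hour interval is an arbitrary example.

```java
ScheduledExecutorService renewer = Executors.newSingleThreadScheduledExecutor();
renewer.scheduleAtFixedRate(() -> {
  for (Token<? extends TokenIdentifier> token : credentials.getAllTokens()) {
    try {
      // Token.renew() contacts the issuing service and returns the new expiry.
      long nextExpiry = token.renew(conf);
      LOG.info("Renewed {} until {}", token.getKind(), nextExpiry);
    } catch (Exception e) {
      // Log and retry on the next cycle.
      LOG.warn("Failed to renew token " + token.getKind(), e);
    }
  }
}, 1, 1, TimeUnit.HOURS);
```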
The AM/RM token is renewed automatically; the RM sends out a new token to the AM within an allocate response. Consult the AMRMClientImpl class to see the process. Your AM code does not need to worry about this process.
Even if an application is renewing tokens regularly, if an AM fails and is restarted, it gets restarted from that original ApplicationSubmissionContext. The tokens there may have expired, so localization may fail even before the question of credentials to talk to other services arises.
How is this problem addressed? The YARN Resource Manager gets a new token for the node managers, if needed.
This is primarily an issue for long-lived services (see below).
Unmanaged application masters are not launched in a container set up by the RM and NM, so cannot automatically pick up an AM/RM token at launch time. The YarnClient.getAMRMToken() API permits an Unmanaged AM to request an AM/RM token. Consult UnmanagedAMLauncher for the specifics.
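A minimal sketch, assuming a started YarnClient (`yarnClient`) and the ApplicationId (`appId`) of the submitted unmanaged application; UnmanagedAMLauncher additionally runs the AM logic inside a doAs() block as the appropriate user.

```java
// Fetch the AM/RM token for the unmanaged AM and add it to the current user's
// credentials so that AMRMClient calls can authenticate with the RM.
Token<AMRMTokenIdentifier> amrmToken = yarnClient.getAMRMToken(appId);
UserGroupInformation.getCurrentUser().addToken(amrmToken);
```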
In an insecure cluster, the application will run as the identity of the account of the node manager, such as yarn or mapred. By default, the application will access HDFS as that user, with a different home directory, and with a different user identified in audit logs and on file system owner attributes.
This can be avoided by having the client identify the HDFS/Hadoop user under which the application is expected to run. This does not affect the OS-level user or the application’s access rights to the local machine.
When Kerberos is disabled, the identity of a user is picked up by Hadoop first from the environment variable HADOOP_USER_NAME, then from the OS-level username (e.g. the system property user.name).
YARN applications should propagate the user name of the user launching an application by setting this environment variable.
```java
// Propagate the submitting user's identity to the launched container.
Map<String, String> env = new HashMap<>();
String userName = UserGroupInformation.getCurrentUser().getUserName();
env.put(UserGroupInformation.HADOOP_USER_NAME, userName);
containerLaunchContext.setEnvironment(env);
```
Note that this environment variable is picked up by all applications which talk to HDFS via the Hadoop libraries. That is, if set, it is the identity used by HBase clients and any other Hadoop-based code executed within the environment of the YARN container in which this environment variable is set.
Apache Oozie can launch an application in a secure cluster by acquiring all relevant credentials, saving them to a file in the local filesystem, then setting the path to this file in the environment variable HADOOP_TOKEN_FILE_LOCATION. This is, of course, the same environment variable passed down by YARN in launched containers, and it points to similar content: a file containing serialized credentials.
Here, however, the environment variable is set in the environment executing the YARN client. This client must use the token information saved in the named file instead of acquiring any tokens of its own.
Loading in the token file is automatic: UGI does it during user login.
The client is then responsible for passing the same credentials into the AM launch context. This can be done simply by passing down the current credentials.
```java
credentials = new Credentials(
    UserGroupInformation.getCurrentUser().getCredentials());
```
The Application Timeline Server can be deployed as a secure service —in which case the application will need the relevant token to authenticate with it. This process is handled automatically in YarnClientImpl if ATS is enabled in a secure cluster. Similarly, the AM-side TimelineClient YARN service class manages token renewal automatically via the ATS’s SPNEGO-authenticated REST API.
If you need to prepare a set of delegation tokens for a YARN application launch via Oozie, this can be done via the timeline client API.
```java
try (TimelineClient timelineClient = TimelineClient.createTimelineClient()) {
  timelineClient.init(conf);
  timelineClient.start();
  Token<TimelineDelegationTokenIdentifier> token =
      timelineClient.getDelegationToken(rmprincipal);
  credentials.addToken(token.getService(), token);
}
```
Applications may wish to cancel the tokens they hold when terminating their AM. This ensures that the tokens are no longer valid.
This is not mandatory, and as a clean shutdown of a YARN application cannot be guaranteed, it is not possible to guarantee that the tokens will always be cancelled during application termination. However, it does reduce the window of vulnerability to stolen tokens.
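A minimal sketch of best-effort cancellation during AM shutdown, assuming `credentials` holds the tokens the AM has been using and `conf` is its configuration:

```java
// Best-effort: invalidate each token at its issuing service. Failures are
// only logged, since the application is terminating anyway.
for (Token<? extends TokenIdentifier> token : credentials.getAllTokens()) {
  try {
    token.cancel(conf);
  } catch (Exception e) {
    LOG.warn("Could not cancel token " + token.getKind(), e);
  }
}
```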
There is a time limit on all token renewals, after which tokens can no longer be renewed and the application stops working. This limit is somewhere between seventy-two hours and seven days.
Any YARN service intended to run for an extended period of time must have a strategy for renewing credentials.
Here are the strategies:
A keytab is provided for the application’s use on every node.
This is done by installing the keytab in a well-known location on every node’s local filesystem, providing the path to it in a configuration option, and having the application log in via UserGroupInformation.loginUserFromKeytab().
The keytab must be in a secure directory path, where only the service (and other trusted accounts) can read it. Distribution becomes a responsibility of the cluster operations team.
This is effectively how all static Hadoop applications get their security credentials.
A keytab is uploaded to HDFS.
When launching the AM, the keytab is listed as a resource to localize to the AM’s container.
The Application Master is configured with the relative path to the keytab, and logs in with UserGroupInformation.loginUserFromKeytab().
When the AM launches the container, it lists the HDFS path to the keytab as a resource to localize.
It adds the HDFS delegation token to the container launch context, so that the keytab and other application files can be localized.
Launched containers must themselves log in via UserGroupInformation.loginUserFromKeytab() (a minimal sketch follows this list). UGI handles the login, and schedules a background thread to re-login the user periodically.
Token creation is handled automatically in the Hadoop IPC and REST APIs; the containers stay logged in via Kerberos for their entire duration.
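A minimal sketch of that container-side login. The keytab file name `app.keytab` and the environment variable carrying the principal are illustrative assumptions; the keytab is expected to have been localized into the container's working directory.

```java
// Log in from the localized keytab. UGI schedules a background re-login from
// the keytab, so Hadoop IPC/REST calls keep working for the container's lifetime.
String principal = System.getenv("APP_KERBEROS_PRINCIPAL"); // illustrative
UserGroupInformation.loginUserFromKeytab(principal, "app.keytab");
```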
This avoids the administration task of installing keytabs for specific services across the entire cluster.
It does require the client to have access to the keytab; and, as the keytab is uploaded to the distributed filesystem, it must be secured through the appropriate path permissions/ACLs.
As all containers have access to the keytab, all code executing in the containers has to be trusted. Malicious code (or code escaping some form of sandbox) could read the keytab, and hence have access to the cluster until the keys expire or are revoked.
This is the strategy implemented by Apache Slider (incubating).
A keytab is uploaded to HDFS by the client.
When launching the AM, the keytab is listed as a resource to localize to the AM’s container.
The Application Master is configured with the relative path to the keytab, and logs in with UserGroupInformation.loginUserFromKeytab(). The UGI codepath will still automatically load the file referenced by $HADOOP_TOKEN_FILE_LOCATION, which is how the AMRM token is picked up.
When the AM launches a container, it acquires all the delegation tokens needed by that container, and adds them to the container’s container launch context.
Launched containers must load the delegation tokens from $HADOOP_TOKEN_FILE_LOCATION, and use them (including renewals) until they can no longer be renewed.
The AM must implement an IPC interface which permits containers to request a new set of delegation tokens; this interface must itself use authentication and ideally wire encryption.
Before a delegation token is due to expire, the processes running in the containers must request new tokens from the Application Master over the IPC channel.
When the containers need the new tokens, the AM, logged in with a keytab, asks the various cluster services for new tokens.
(Note that there is an alternative direction for refresh operations: from the AM down to the containers, again over whatever IPC channel is implemented between AM and containers. The rest of the algorithm is the same: the AM regenerates the tokens and passes them to the containers over IPC, as sketched below.)
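A hedged sketch of the AM-side refresh step under either direction: the keytab-logged-in AM re-acquires filesystem delegation tokens and serializes them for transport over the application's own IPC channel. The names `loginUser`, `rmPrincipal` and `conf` are assumptions; tokens for other services (HBase, Hive) would be acquired through those services' client libraries in the same place.

```java
// Re-acquire tokens as the keytab principal, then serialize them so they can
// be shipped to containers over the application's IPC channel.
Credentials fresh = loginUser.doAs(
    (PrivilegedExceptionAction<Credentials>) () -> {
      Credentials c = new Credentials();
      FileSystem.get(conf).addDelegationTokens(rmPrincipal, c);
      return c;
    });
DataOutputBuffer dob = new DataOutputBuffer();
fresh.writeTokenStorageToStream(dob);
byte[] serialized = Arrays.copyOf(dob.getData(), dob.getLength());
// Containers can rebuild the set with Credentials.readTokenStorageStream().
```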
This is the strategy used by Apache Spark 1.5+, with a netty-based protocol between containers and the AM for token updates.
Because only the AM has direct access to the keytab, it is less exposed. Code running in the containers only has access to the delegation tokens.
However, those containers will have access to HDFS from the tokens passed in at container launch, so will have access to the copy of the keytab used for launching the AM. While the AM could delete that keytab on launch, doing so would stop YARN being able to successfully relaunch the AM after any failure.
This strategy may be the sole one acceptable to a strict operations team: a client process running on an account holding a Kerberos TGT negotiates with all needed cluster services for new delegation tokens, tokens which are then pushed out to the Application Master via some RPC interface.
This does require the client process to be re-executed on a regular basis; a cron or Oozie job can do this. The AM will need to implement an IPC API over which renewed tokens can be provided. (Note that as Oozie can collect the tokens itself, all the updater application needs to do whenever executed is set up an IPC connection with the AM and pass up the current user’s credentials).
YARN provides a straightforward way of giving every YARN Application SPNEGO-authenticated web pages: the RM implements SPNEGO authentication in the Resource Manager Proxy and restricts access to the YARN Application’s Web UI to only the RM Proxy. There are two ways to do this:
A YARN Application’s Web Server should load the AM proxy filter (see the AmFilterInitializer class) when setting up its web UI; this filter will redirect all HTTP Requests coming from any host other than the RM Proxy hosts to an RM proxy, to which the client app/browser must re-issue the request. The client will authenticate against the principal of the RM Proxy (usually yarn), and, once authenticated, have its request forwarded.
Known weaknesses in this option are:
The AM proxy filter only checks for the IP/hosts of the RM Proxy, so any application running on those hosts has unrestricted access to the YARN Application’s Web UI. This is why, in a secure cluster, the proxy hosts must run on cluster nodes which do not run end-user code (i.e. nodes which do not run YARN NodeManagers, and hence do not schedule YARN containers, nor support logins by end users).
The HTTP requests between RM proxy and the Yarn Application are not currently encrypted. That is: HTTPS is not supported.
By default, YARN Application Web UIs are not encrypted (i.e. HTTPS). It is up to the Application to provide support for HTTPS. This can be done entirely independently, with a valid HTTPS Certificate from a public CA or a source that the RM or JVM is configured to trust. Alternatively, the RM can act as a limited CA and provide the Application with a Certificate it can use, which is accepted only by the RM proxy and by no other clients (e.g. web browsers). This is important because the Application cannot necessarily be trusted not to steal any issued Certificates or perform other malicious behavior. The Certificates the RM issues will (a) be expired, (b) have a Subject containing CN=<application-id> instead of the typical CN=<hostname|domain>, and (c) be issued by a self-signed CA Certificate generated by the RM.
For an Application to take advantage of this ability, it simply needs to load the provided Keystore into its Web Server of choice. The location of the Keystore can be found in the KEYSTORE_FILE_LOCATION environment variable, and its password in the KEYSTORE_PASSWORD environment variable. This will be available as long as yarn.resourcemanager.application-https.policy is not set to NONE (see table below), and it’s provided an HTTPS Tracking URL.
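A minimal sketch of picking up the RM-provided keystore; the keystore type ("JKS") here is an assumption, and how the keystore is wired into the web server depends entirely on the server the application uses.

```java
// Locate and load the keystore the RM provided for this application.
String path = System.getenv("KEYSTORE_FILE_LOCATION");
char[] password = System.getenv("KEYSTORE_PASSWORD").toCharArray();
KeyStore keyStore = KeyStore.getInstance("JKS"); // keystore type is an assumption
try (InputStream in = Files.newInputStream(Paths.get(path))) {
  keyStore.load(in, password);
}
// Pass the keystore and password to the web server's SSL/TLS configuration.
```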
Additionally, the Application can verify that the RM proxy is in fact the RM via HTTPS Mutual Authentication. Besides the provided Keystore, there is also a provided Truststore with the RM proxy’s client Certificate. By loading this Truststore and enabling needsClientAuth (or equivalent) in its Web Server of choice, the AM’s Web Server should automatically require that the client (i.e. the RM proxy) provide a trusted Certificate, or it will fail the connection. This ensures that only the RM Proxy, which the client authenticated against, can access it.
| yarn.resourcemanager.application-https.policy | Behavior |
|---|---|
| NONE | The RM will do nothing special. |
| LENIENT | The RM will generate and provide a keystore and truststore to the AM, which it is free to use for HTTPS in its tracking URL web server. The RM proxy will still allow HTTP connections to AMs that opt not to use HTTPS. |
| STRICT | This is the same as LENIENT, except that the RM proxy will only allow HTTPS connections to AMs; HTTP connections will be blocked and result in a warning page to the user. |
The default value is NONE.
YARN REST APIs running on the same port as the registered web UI of a YARN application are automatically authenticated via SPNEGO authentication in the RM proxy.
Any REST endpoint (and equally, any web UI) brought up on a different port does not support SPNEGO authentication unless implemented in the YARN application itself.
Here is the checklist of core actions which a YARN application must do to successfully launch in a YARN cluster.
[ ] Client checks for security being enabled via UserGroupInformation.isSecurityEnabled()
In a secure cluster:
[ ] If HADOOP_TOKEN_FILE_LOCATION is unset, client acquires delegation tokens for the cluster filesystems, with the RM principal set as the renewer.
[ ] If HADOOP_TOKEN_FILE_LOCATION is unset, client acquires delegation tokens for all other services to be used in the YARN application.
[ ] If HADOOP_TOKEN_FILE_LOCATION is set, client uses the current user’s credentials as the source of all tokens to be added to the container launch context.
[ ] Client sets all tokens on AM ContainerLaunchContext.setTokens().
[ ] Recommended: if it is set in the client’s environment, client sets the environment variable HADOOP_JAAS_DEBUG=true in the Container launch context of the AM.
In an insecure cluster:
[ ] Propagate local username to YARN AM, hence HDFS identity via the HADOOP_USER_NAME environment variable.
[ ] In a secure cluster, AM retrieves security tokens from the file named by the HADOOP_TOKEN_FILE_LOCATION environment variable (automatically done by UGI).
[ ] A copy of the token set is filtered to remove the AM/RM token and any timeline token.
[ ] A thread or executor is started to renew tokens on a regular basis.
[ ] Recommended: AM cancels tokens when application completes.
[ ] Tokens to be passed to containers are passed via ContainerLaunchContext.setTokens().
[ ] In an insecure cluster, propagate the HADOOP_USER_NAME environment variable.
[ ] Recommended: AM sets the environment variable HADOOP_JAAS_DEBUG=true in the Container launch context if it is set in the AM’s environment.
[ ] Call UserGroupInformation.isSecurityEnabled() to trigger security setup.
[ ] A thread or executor is started to renew tokens on a regular basis.
[ ] Application developers have chosen and implemented a token renewal strategy: shared keytab, AM keytab or client-side token refresh.
[ ] In a secure cluster, the keytab is either already in HDFS (and checked for), or it is in the local FS of the client, in which case it must be uploaded and added to the list of resources to localize.
[ ] If stored in HDFS, keytab permissions should be checked. If the keytab is readable by principals other than the current user, warn, and consider actually failing the launch (much as ssh does with insufficiently protected private keys).
[ ] Client acquires the HDFS delegation token and attaches it to the AM Container Launch Context.
[ ] AM logs in as principal in keytab via loginUserFromKeytab().
[ ] (AM extracts the AM/RM token from the file named by the HADOOP_TOKEN_FILE_LOCATION environment variable).
[ ] For launched containers, either the keytab is propagated, or the AM acquires/attaches all required delegation tokens to the Container Launch context alongside the HDFS delegation token needed by the NMs.
It is straightforward to be confident that a YARN application works in a secure cluster. The process to do so is: test on a secure cluster.
Even a single-VM cluster can be set up with security enabled. If doing so, we recommend turning security up to its strictest, with SPNEGO-authenticated Web UIs (and hence the RM Proxy), as well as IPC wire encryption. Setting the Kerberos token expiry to under an hour will find Kerberos expiry problems early, so is also recommended.
[ ] Application launched in secure cluster.
[ ] Launched application runs as user submitting job (tip: log user.name system property in AM).
[ ] Web browser interaction verified in secure cluster.
[ ] REST client interaction (GET operations) tested.
[ ] Application continues to run after Kerberos Token expiry.
[ ] Application does not launch if user lacks Kerberos credentials.
[ ] If the application supports the timeline server, verify that it publishes events in a secure cluster.
[ ] If the application integrates with other applications, such as HBase or Hive, verify that the interaction works in a secure cluster.
[ ] If the application communicates with remote HDFS clusters, verify that it can do so in a secure cluster (i.e. that the client acquired any delegation tokens for this at launch time).
If you don’t test your YARN application in a secure Hadoop cluster, it won’t work.
And without those tests: your users will be the ones to find out that your application doesn’t work in a secure cluster.
Bear that in mind when considering how much development effort to put into Kerberos support.