Apache Hadoop 3.2.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

This change updates the Microsoft Windows build directions to be more flexible with regards to Visual Studio compiler versions:

Additionally, Snappy and ISA-L that use bin as the location of the DLL will now be recognized without having to set their respective lib paths if the prefix is set.

Note to contributors:

It is very important that solutions for any patches remain at the VS 2010-level.

The response format of resource manager’s REST API endpoint “ws/v1/cluster/scheduler” is optimized, where the “operationInfos” field is converted from a map to list. This change makes the content to be more JSON parser friendly.

This jira helped to remove the private and unused class DataOutputByteBuffer.

Environment variables for MapReduce tasks can now be specified as separate properties, e.g.: mapreduce.map.env.VARNAME=value mapreduce.reduce.env.VARNAME=value yarn.app.mapreduce.am.env.VARNAME=value yarn.app.mapreduce.am.admin.user.env.VARNAME=value This form of specifying environment variables is useful when the value of an environment variable contains commas.

WASB: Bug fix to support non-sequential page blob reads. Required for HBASE replication.

New command is added to dfsadmin. hdfs dfsadmin [-upgrade [query | finalize] 1. -upgrade query gives the upgradeStatus 2. -upgrade finalize is equivalent to -finalizeUpgrade.

To support Queue deletion with RM restart feature, AllocationFileLoaderService constructor signature was changed. YARN-8390 corrected this and made as a compatible change.

If HADOOP_CLIENT_SKIP_UNJAR environment variable is set to true, Apache Hadoop RunJar skips unjar the provided jar.

WASB: Fix Spark process hang at shutdown due to use of non-daemon threads by updating Azure Storage Java SDK to 7.0

Mover could have fail after 20+ minutes if a block move was enqueued for this long, between two DataNodes due to an internal constant that was introduced for Balancer, but affected Mover as well. The internal constant can be configured with the dfs.balancer.max-iteration-time parameter after the patch, and affects only the Balancer. Default is 20 minutes.

commons-lang version 2.6 was removed from Apache Hadoop. If you are using commons-lang 2.6 as transitive dependency of Hadoop, you need to add the dependency directly. Note: this also means it is absent from share/hadoop/common/lib/

FUSE lib now recognize the change of the Kerberos ticket cache path if it was changed between two file system access in the same local user session via the KRB5CCNAME environment variable.

Restore the KMS accept queue size to 500 in Hadoop 3.x, making it the same as in Hadoop 2.x.

ABFS: Improved HTTPS performance

ABFS: Support for OAuth

After this change, attempt to unsetErasureCodingPolicy() on a directory without EC policy explicitly set on it, will get NoECPolicySetException.

The S3A connector no longer supports username and secrets in URLs of the form ``. It is near-impossible to stop those secrets being logged —which is why a warning has been printed since Hadoop 2.8 whenever such a URL was used.s3a://key:secret@bucket/

Fix: use a more secure mechanism to pass down the secrets.

With this feature task, Node Attributes is supported in YARN which will help user’s to effectively use resources and assign to applications based on characteristics of each node’s in the cluster.

When a namenode A sends request RollEditLog to a remote NN, either the remote NN is standby or IO Exception happens, A should continue to try next NN, instead of getting stuck on the problematic one. This Patch is based on trunk.

StoragePolicySatisfier(SPS) allows users to track and satisfy the storage policy requirement of a given file/directory in HDFS. User can specify a file/directory path by invoking “hdfs storagepolicies -satisfyStoragePolicy -path <path>” command or via HdfsAdmin#satisfyStoragePolicy(path) API. For the blocks which has storage policy mismatches, it moves the replicas to a different storage type in order to fulfill the storage policy requirement. Since API calls goes to NN for tracking the invoked satisfier path(iNodes), administrator need to enable dfs.storage.policy.satisfier.mode’ config at NN to allow these operations. It can be enabled by setting ‘dfs.storage.policy.satisfier.mode’ to ‘external’ in hdfs-site.xml. The configs can be disabled dynamically without restarting Namenode. SPS should be started outside Namenode using “hdfs –daemon start sps”. If administrator is looking to run Mover tool explicitly, then he/she should make sure to disable SPS first and then run Mover. See the “Storage Policy Satisfier (SPS)” section in the Archival Storage guide for detailed usage.

The abfs connector in the hadoop-azure module supports Microsoft Azure Datalake (Gen 2), which at the time of writing (September 2018) was in preview, soon to go GA. As with all cloud connectors, corner-cases will inevitably surface. If you encounter one, please file a bug report.

This patch improves the KMS delegation token issuing and authentication logic, to enable tokens to authenticate with a set of KMS servers. The change is backport compatible, in that it keeps the existing authentication logic as a fall back.

Historically, KMS delegation tokens have ip:port as service, making KMS clients only able to use the token to authenticate with the KMS server specified as ip:port, even though the token is shared among all KMS servers at server-side. After this patch, newly created tokens will have the KMS URL as service.

A DelegationTokenIssuer interface is introduced for token creation.

A new option (-replicate) is added to fsck command to re-trigger the replication for mis-replicated blocks. This option should be used instead of previous workaround of increasing and then decreasing replication factor (using hadoop fs -setrep command).

This patch enables “Hadoop” and “MIT” as options for “hadoop.security.auth_to_local.mechanism” and defaults to ‘hadoop’. This should be backward compatible with pre-HADOOP-12751.

This is basically HADOOP-12751 plus configurable + extended tests.