Apache Hadoop 3.4.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

The build image has been upgraded to Bionic.

ZKFC binds host address to “dfs.namenode.servicerpc-bind-host”, if configured. Otherwise, it binds to “dfs.namenode.rpc-bind-host”. If neither of those is configured, ZKFC binds itself to NameNode RPC server address (effectively “dfs.namenode.rpc-address”).

When FairCallQueue is enabled, user can specify capacity allocation among all sub-queues via configuration “ipc.<port>.callqueue.capacity.weights”. The value of this config is a comma-separated list of positive integers, each of which specifies the weight associated with the sub-queue at that index. This list length should be IPC scheduler priority levels, defined by “scheduler.priority.levels”.

By default, each sub-queue is associated with weight 1, i.e., all sub-queues are allocated with the same capacity.

ViewFS#listStatus on root(“/”) considers listing from fallbackLink if available. If the same directory name is present in configured mount path as well as in fallback link, then only the configured mount path will be listed in the returned result.

Added a new BlockPlacementPolicy: “AvailableSpaceRackFaultTolerantBlockPlacementPolicy” which uses the same optimization logic as the AvailableSpaceBlockPlacementPolicy along with spreading the replicas across maximum number of racks, similar to BlockPlacementPolicyRackFaultTolerant. The BPP can be configured by setting the blockplacement policy class as org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceRackFaultTolerantBlockPlacementPolicy

Enable balancer to redirect getBlocks request to a Standby Namenode, thus reducing the performance impact of balancer to the Active NameNode.

The feature is disabled by default. To enable it, configure the hdfs-site.xml of balancer: dfs.ha.allow.stale.reads = true.

Added a UserGroupMapping#getGroupsSet() API and deprecate UserGroupMapping#getGroups.

The UserGroupMapping#getGroups() can be expensive as it involves Set->List conversion. For user with large group membership (i.e., > 1000 groups), we recommend using getGroupSet to avoid the conversion and fast membership look up.

Boost 1.72 is required when building native code.

ABFS: Support for conditional overwrite.

Add a new storage type NVDIMM and a new storage policy ALL_NVDIMM for HDFS. The NVDIMM storage type is for non-volatile random-access memory storage medias whose data survives when DataNode restarts.

New encryption codec “SM4/CTR/NoPadding” is added. Requires openssl version >=1.1.1 for native implementation.

Improved node registration with node health status.

The SnappyCodec uses the snappy-java compression library, rather than explicitly referencing native binaries. It contains the native libraries for many operating systems and instruction sets, falling back to a pure java implementation. It does requires the snappy-java.jar is on the classpath. It can be found in hadoop-common/lib, and has already been present as part of the avro dependencies

The configuration dfs.image.transfer.bandwidthPerSec which defines the maximum bandwidth available for fsimage transfer is changed from 0 (meaning no throttle at all) to 50MB/s.

The Hadoop’s LZ4 compression codec now depends on lz4-java. The native LZ4 is performed by the encapsulated JNI and it is no longer necessary to install and configure the lz4 system package.

The lz4-java is declared in provided scope. Applications that wish to use lz4 codec must declare dependency on lz4-java explicitly.

WARNING: No release note provided for this change.

The default value of the configuration hadoop.http.idle_timeout.ms (how long does Jetty disconnect an idle connection) is changed from 10000 to 60000. This property is inlined during compile time, so an application that references this property must be recompiled in order for it to take effect.

Added support for HuaweiCloud OBS (https://www.huaweicloud.com/en-us/product/obs.html) to Hadoop file system, just like what we do before for S3, ADLS, OSS, etc. With simple configuration, Hadoop applications can read/write data from OBS without any code change.

Added support for rename across namespaces for RBF through DistCp. By default the option is turned off, needs to be explicitly turned on by setting dfs.federation.router.federation.rename.option to DISTCP along with the configurations required for running DistCp. In general the client timeout should also be high enough to use this functionality to ensure that the client doesn’t timeout.

Dependency on HTrace and TraceAdmin protocol/utility were removed. Tracing functionality is no-op until alternative tracer implementation is added.

Add a new configuration “dfs.datanode.same-disk-tiering.capacity-ratio.percentage” to allow admins to configure capacity for individual volumes on the same mount.

WARNING: No release note provided for this change.

`trace` subcommand of hadoop CLI was removed as a follow-up of removal of TraceAdmin protocol.

The protected commons-logging LOG has been removed from the FileSystem class. Any FileSystem implementation which referenced this for logging will no longer link. Fix: move these implementations to using their own, private SLF4J log (or other logging API)

Added a -useiterator option in distcp which uses listStatusIterator for building the listing. Primarily to reduce memory usage at client for building listing.

Removed findbugs from the hadoop build images and added spotbugs instead. Upgraded SpotBugs to 4.2.2 and spotbugs-maven-plugin to 4.2.0.

DFS client can use the newly added URI cache when creating socket address for read operations. By default it is disabled. When enabled, creating socket address will use cached URI object based on host:port to reduce the frequency of URI object creation.

To enable it, set the following config key to true: <property> <name>dfs.client.read.uri.cache.enabled</name> <value>true</value> </property>

Adds auto-reload of keystore.

Adds below new config (default 10 seconds):


The refresh interval used to check if either of the truststore or keystore certificate file has changed.

The default quota initialization thread count during the NameNode startup process (dfs.namenode.quota.init-threads) is increased from 4 to 12.

Removed log counter from JVMMetrics because this feature strongly depends on Log4J 1.x API.

This JIRA changes public fields in DFSHedgedReadMetrics. If you are using the public member variables of DFSHedgedReadMetrics, you need to use them through the public API.

In order to resolve build issues with Maven 3.8.1, we have to bump SolrJ to latest version 8.8.2 as of now. Solr is used by YARN application catalog. Hence, we would recommend upgrading Solr cluster accordingly before upgrading entire Hadoop cluster to 3.4.0 if the YARN application catalog service is used.

org.apache.hadoop.yarn.webapp.hamlet package was removed. Use org.apache.hadoop.yarn.webapp.hamlet2 instead.

Added syncronization so that the “yarn node list” command does not fail intermittently

WARNING: No release note provided for this change.

All of the default charset usages have been replaced to UTF-8. If the default charset of your environment is not UTF-8, the behavior can be different.

WARNING: No release note provided for this change.

WARNING: No release note provided for this change.

When Timeline Service V1 or V1.5 is used, if “yarn.resourcemanager.system-metrics-publisher.timeline-server-v1.enable-batch” is set to true, ResourceManager sends timeline events in batch. The default value is false. If this functionality is enabled, the maximum number that events published in batch is configured by “yarn.resourcemanager.system-metrics-publisher.timeline-server-v1.batch-size”. The default value is 1000. The interval of publishing events can be configured by “yarn.resourcemanager.system-metrics-publisher.timeline-server-v1.interval-seconds”. By default, it is set to 60 seconds.

WARNING: No release note provided for this change.

Use jetty’s Slf4jRequestLog for http request log.

As a side effect, we remove the dummy HttpRequestLogAppender, just make use of DailyRollingFileAppender. But the DRFA in log4j1 lacks the ability of specifying the max retain files, so there is no retainDays config any more.

Will add the above ability back after we switch to log4j2’s RollingFileAppender.

Throughput is one of the core performance evaluation for DataNode instance. However it does not reach the best performance especially for Federation deploy all the time although there are different improvement, because of the global coarse-grain lock. These series issues (include HDFS-16534, HDFS-16511, HDFS-15382 and HDFS-16429.) try to split the global coarse-grain lock to fine-grain lock which is double level lock for blockpool and volume, to improve the throughput and avoid lock impacts between blockpools and volumes.

Java classes generated from previous versions of avro will need to be recompiled

WARNING: No release note provided for this change.

Before this improvement, “hadoop fs -touch” command threw PathIsDirectoryException for directory. Now the command supports directory and will not throw that exception.

WARNING: No release note provided for this change.

log4j 1 was replaced with reload4j which is fork of log4j 1.2.17 with the goal of fixing pressing security issues.

If you are depending on the hadoop artifacts in your build were explicitly excluding log4 artifacts, and now want to exclude the reload4j files, you will need to update your exclusion lists <exclusion> <groupId>org.slf4j</groupId> <artifactId>slf4j-reload4j</artifactId> </exclusion> <exclusion> <groupId>ch.qos.reload4j</groupId> <artifactId>reload4j</artifactId> </exclusion>

Use modified jersey-json 1.20 in https://github.com/pjfanning/jersey-1.x/tree/v1.20 that uses Jackson 2.x. By this change, Jackson 1.x dependency has been removed from Hadoop. downstream applications which explicitly exclude jersey from transitive dependencies must now exclude com.github.pjfanning:jersey-json

WARNING: No release note provided for this change.

okhttp has been updated to address CVE-2021-0341

Namenode metrics that represent Slownode Json now include three important factors (median, median absolute deviation, upper latency limit) that can help user determine how urgently a given slownode requires manual intervention.

Apache Xerces has been updated to 2.12.2 to fix CVE-2022-23437

Downgrades Jackson from 2.13.2 to 2.12.7 to fix class conflicts in downstream projects. This version of jackson does contain the fix for CVE-2020-36518.

Netty has been updated to address CVE-2019-20444, CVE-2019-20445 and CVE-2022-24823

The AWS SDK has been updated to 1.12.262 to address jackson CVE-2018-7489

In preparation for an (incompatible but necessary) move to the AWS SDK v2, some uses of internal/deprecated uses of AWS classes/interfaces are logged as warnings, though only once during the life of a JVM. Set the log “org.apache.hadoop.fs.s3a.SDKV2Upgrade” to only log at INFO to hide these.

The swift:// connector for openstack support has been removed. It had fundamental problems (swift’s handling of files > 4GB). A subset of the S3 protocol is now exported by almost all object store services -please use that through the s3a connector instead. The hadoop-openstack jar remains, only now it is empty of code. This is to ensure that projects which declare the JAR a dependency will still have successful builds.

bouncy castle 1.68+ is a multirelease JAR containing java classes compiled for different target JREs. older versions of asm.jar and maven shade plugin may have problems with these. fix: upgrade the dependencies

ABFS block prefetching has been disabled to avoid HADOOP-18521 and buffer sharing on multithreaded processes (Hive, Spark etc). This will have little/no performance impact on queries against Parquet or ORC data, but can slow down sequential stream processing, including CSV files -however, the read data will be correct. It may slow down distcp downloads, where the race condition does not arise. For maximum distcp performance re-enable the readahead by setting fs.abfs.enable.readahead to true.

WARNING: No release note provided for this change.

If you have a SequenceFile with an old key or value class which has been renamed, you can use WritableName.addName to add an alias class. This functionality previously existed, but only worked for classes which extend Writable. It now works for any class, notably key or value classes which use io.serializations.

ContainerLogAppender and ContainerRollingLogAppender both have quite similar functionality as RollingFileAppender. Both are marked as IS.Unstable.

Before migrating to log4j2, replacing them with RollingFileAppender. Any downstreamers using it should do the same.

TaskLogAppender is IA.Private and IS.Unstable. Removing it before migrating to log4j2 as it is no longer used within Hadoop. Any downstreamers using it should use RollingFileAppender instead.

  1. Migrating Async appenders from code to log4j properties for namenode audit logger as well as datanode/namenode metric loggers
  2. Provided sample log4j properties on how we can configure AsyncAppender to wrap RFA for the loggers
  3. Incompatible change as three hdfs-site configs are no longer in use, they are to be replaced with log4j properties. Removed them and added a log to indicate that they are now replaced with log4j properties. The configs:
  1. Namenode audit logger as well as datanode/namenode metric loggers now use SLF4J logger rather than log4j logger directly

Support has been added for IBM Semeru Runtimes, where due to vendor name based logic and changes in the java.vendor property, failures could occur on java 11 runtimes and above.

json-smart is no longer dependency of the hadoop-auth module (it not required) so is not exported transitively as a dependency or included in hadoop releases. If application code requires this on the classpath, a version must be added to the classpath explicitly -you get to choose which one

The s3a connector no longer deletes directory markers by default, which speeds up write operations, reduces iO throttling and saves money. this can cause problems with older hadoop releases trying to write to the same bucket. (Hadoop 3.3.0; Hadoop 3.2.x before 3.2.2, and all previous releases). Set “fs.s3a.directory.marker.retention” to “delete” for backwards compatibility

By default, the mapreduce manifest committer is used for jobs working with abfs and gcs.. Hadoop mapreduce jobs will pick this up automatically; for Spark it is a bit complicated: read the docs to see the steps required.

The v1 aws-sdk-bundle JAR has been removed; it only required for third party applications or for use of v1 SDK AWSCredentialsProvider classes. There is automatic migration of the standard providers from the v1 to v2 classes, so this is only of issue for third-party providers or if very esoteric classes in the V1 SDK are used. Consult the aws_sdk_upgrade document for details

The S3A connector now uses the V2 AWS SDK. This is a significant change at the source code level. Any applications using the internal extension/override points in the filesystem connector are likely to break. Consult the document aws_sdk_upgrade for the full details.

The default value for fs.azure.data.blocks.buffer is changed from “disk” to “bytebuffer”

This will speed up writing to azure storage, at the risk of running out of memory -especially if there are many threads writing to abfs at the same time and the upload bandwidth is limited.

If jobs do run out of memory writing to abfs, change the option back to “disk”

S3A directory delete and rename will optionally abort all pending uploads under the to-be-deleted paths when fs.s3a.directory.operations.purge.upload is true It is off by default.

Hadoop S3A connector has explicit awareness of and support for S3Express storage.A filesystem can now be probed for inconsistent directoriy listings through fs.hasPathCapability(path, “fs.capability.directory.listing.inconsistent”). If true, then treewalking code SHOULD NOT report a failure if, when walking into a subdirectory, a list/getFileStatus on that directory raises a FileNotFoundException.

We have enhanced the YARN Federation functionality for improved usability. The enhanced features are as follows: 1. YARN Router now boasts a full implementation of all interfaces including the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol. 2. YARN Router support for application cleanup and automatic offline mechanisms for subCluster. 3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities. 4. Audit logs and Metrics for Router received upgrades. 5. A boost in cluster security features was achieved, with the inclusion of Kerberos support. 6. The page function of the router has been enhanced. 7. A set of commands has been added to the Router side for operating on SubClusters and Policies.

S3 Select is no longer supported through the S3A connector

If the user wants to load custom implementations of AWS Credential Providers through user provided jars can set {{fs.s3a.extensions.isolated.classloader}} to {{false}}.

maven/ivy imports of hadoop-common are less likely to end up with log4j versions on their classpath.