These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
Drop support for HBase 1 as the back end of the YARN Application Timeline Service, which is now HBase 2 only. The supported HBase version is 2.5.8.
This does not have any effect on HBase deployments themselves.
Hadoop has upgraded to commons-collections4-4.4. This MUST be on the classpath to create a Configuration() object. The hadoop-common dependency exports have been updated appropriately. If dependencies are configured manually, they MUST be updated. Applications which require the older "commons-collections" binaries on their classpath may have to add them explicitly.
This is a backwards-incompatible upgrade, made for security reasons. All field access is now via setter/getter methods. To marshal Serializable objects, the packages they are in must be declared in the system property org.apache.avro.SERIALIZABLE_PACKAGES.
This upgrade does break our compatibility policy, but this is a critical security issue. Everyone using an Avro version before 1.11.4 MUST upgrade.
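As a minimal sketch of declaring trusted packages before any Avro (de)serialization takes place: the package names here ("com.example.model", "com.example.records") are hypothetical placeholders for an application's own packages.

```java
public class AvroTrustedPackages {
    public static void main(String[] args) {
        // Must be set before Avro reflect-based (de)serialization of
        // Serializable classes; hypothetical application packages shown.
        System.setProperty("org.apache.avro.SERIALIZABLE_PACKAGES",
                "com.example.model,com.example.records");
        System.out.println(
                System.getProperty("org.apache.avro.SERIALIZABLE_PACKAGES"));
    }
}
```

The property can equally be passed on the JVM command line with a -D flag instead of being set programmatically.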
The thresholds at which adjacent vector IO read ranges are coalesced into a single range have been increased, as has the limit beyond which ranges are considered large enough that parallel reads are faster.
* The min/max for local filesystems, and any other FS without custom support, are now 16K and 1M.
* s3a and abfs use 128K as the minimum size and 2M as the maximum.
These values are based on the Facebook Velox paper, which stated that their thresholds for merging were 20K for local SSD and 500K for cloud storage.
Jetty has been upgraded to address CVE-2024-22201, CVE-2023-44487, CVE-2024-8184 and CVE-2024-13009.
The option fs.s3a.create.checksum.algorithm allows a checksum to be set on file upload; it supports the values "CRC32", "CRC32C", "SHA1" and "SHA256".
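As a sketch, the option could be set in core-site.xml as below; "CRC32C" is chosen arbitrarily from the supported values.

```xml
<property>
  <name>fs.s3a.create.checksum.algorithm</name>
  <value>CRC32C</value>
</property>
```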
A Hadoop 2.10.1 or 2.10.2 client connecting to a Hadoop 3.4.0~3.4.1, 3.3.0~3.3.6, 3.2.1~3.2.4 (any version with HDFS-13541) cluster could cause the DataNode to disconnect any subsequent client connections due to an incompatible binary protocol change.
HDFS-16644 provides a partial fix: a Hadoop 3 cluster will reject Hadoop 2.10.1/2.10.2 clients, but it will not fail other subsequent client connections.
For a Hadoop 2 cluster wishing to upgrade to Hadoop 3 in a rolling fashion, the workaround is a two-step upgrade: upgrade to an earlier Hadoop 3 release without HDFS-13541, then upgrade again to the newer Hadoop 3 version. Alternatively, revert HDFS-13541 from your version and rebuild.
The S3A client now uses S3 conditional overwrite PUT requests to perform overwrite-protection checks at the end of the PUT request (in the stream's close()). This saves a HEAD request on file creation and delivers genuinely atomic creation. It may not be supported on third-party stores: set fs.s3a.create.conditional.enabled to false to revert to the old behavior. Consult the third-party-stores documentation for details.
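For a store which rejects conditional writes, a core-site.xml sketch of reverting to the old behavior:

```xml
<property>
  <name>fs.s3a.create.conditional.enabled</name>
  <value>false</value>
</property>
```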
AWS SDK releases 2.30.0 and later are (currently) incompatible with third-party stores; accordingly, this release stays on a 2.29 SDK version. See HADOOP-19490. There may now be some problems using AWS4SignerType with S3.
S3A output streams no longer log warnings when hflush() is used, nor raise exceptions when fs.s3a.downgrade.syncable.exceptions = false. Use of hsync() is still reported with a warning or rejected, as appropriate: that method is absolutely unsupported when writing to S3.
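Assuming, per the text above, that a value of false means hsync() calls are rejected with an exception rather than downgraded to a warning, a core-site.xml sketch:

```xml
<property>
  <name>fs.s3a.downgrade.syncable.exceptions</name>
  <value>false</value>
</property>
```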
Option "fs.file.checksum.verify" disables checksum verification in the local FS, so sliced subsets of larger buffers are never returned.
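Assuming the option is a boolean where false turns verification off (the note above does not state the default), a core-site.xml sketch:

```xml
<property>
  <name>fs.file.checksum.verify</name>
  <value>false</value>
</property>
```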
The stream capability "fs.capability.vectoredio.sliced" is true if a filesystem knows that it is returning slices of a larger buffer. It is false if the filesystem does not, or when probed against the local FS in releases which lack this feature.