Apache Hadoop 3.2.0

Apache Hadoop 3.2.0 incorporates a number of significant enhancements over the previous stable minor release line (hadoop-3.1).

This is the first release in 3.2 release line which is not yet generally available (GA) or production ready.

Overview

Users are encouraged to read the full set of release notes. This page provides an overview of the major changes.

Node Attributes Support in YARN

Node Attributes helps to tag multiple labels on the nodes based on its attributes and supports placing the containers based on expression of these labels.

More details are available in the Node Attributes documentation.

Hadoop Submarine on YARN

Hadoop Submarine enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on very same Hadoop YARN cluster where data resides.

More details are available in the Hadoop Submarine documentation.

Storage Policy Satisfier

Supports HDFS (Hadoop Distributed File System) applications to move the blocks between storage types as they set the storage policies on files/directories.

More details are available in the Storage Policy Satisfier documentation.

ABFS Filesystem connector

Supports the latest Azure Datalake Gen2 Storage.

Enhanced S3A connector

Support of an enhanced S3A connector, including better resilience to throttled AWS S3 and DynamoDB IO.

Upgrades for YARN long running services

Supports in-place seamless upgrades of long running containers via YARN Native Service API and CLI.

More details are available in the YARN Service Upgrade documentation.

Getting Started

The Hadoop documentation includes the information you need to get started using Hadoop. Begin with the Single Node Setup which shows you how to set up a single-node Hadoop installation. Then move on to the Cluster Setup to learn how to set up a multi-node Hadoop installation.