Apache Hadoop 3.1.0

Apache Hadoop 3.1.0 incorporates a number of significant enhancements over the previous minor release line (hadoop-3.0).

Overview

Users are encouraged to read the full set of release notes. This page provides a short overview of the major features and improvements.

The YARN Service framework provides first-class support and APIs to host long-running services natively in YARN.

In a nutshell, it serves as a container orchestration platform for managing containerized services on YARN. It supports both Docker containers and traditional process-based containers in YARN.
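As a rough illustration of what describing a long-running service can look like, the sketch below builds a minimal service definition as a Python dictionary and prints it as JSON. The field names (`name`, `version`, `components`, `number_of_containers`, `launch_command`, `resource`) are assumptions modelled on the shape of the YARN service examples, and the helper `make_service_spec` is hypothetical; check the user documentation for the authoritative schema.

```python
import json


def make_service_spec(name, component, command, containers=1, cpus=1, memory_mb=256):
    """Build a minimal service definition as a plain dict.

    Field names are assumed, not taken from this page; verify them
    against the YARN Service user documentation before relying on them.
    """
    return {
        "name": name,
        "version": "1.0",
        "components": [
            {
                "name": component,
                "number_of_containers": containers,
                "launch_command": command,
                # Memory is written as a string of megabytes in this sketch.
                "resource": {"cpus": cpus, "memory": str(memory_mb)},
            }
        ],
    }


# One component running a single long-lived process-based container.
spec = make_service_spec("sleeper-service", "sleeper", "sleep 900000")
print(json.dumps(spec, indent=2))
```

The point of the sketch is only that a service is declared as data (components, container counts, resources) rather than coded against the scheduler directly.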

See the user documentation for more details.

First-class GPU scheduling and isolation (for both Docker and non-Docker containers) on YARN.

See the user documentation for more details.

First-class FPGA scheduling and isolation (for both Docker and non-Docker containers) on YARN.

See the user documentation for more details.

Support for more expressive placement constraints in YARN. Such constraints can be crucial for the performance and resilience of applications, especially those that include long-running containers, such as services, machine-learning and streaming workloads.

For example, it may be beneficial to co-locate the allocations of a job on the same rack (affinity constraints) to reduce network costs, spread allocations across machines (anti-affinity constraints) to minimize resource interference, or allow up to a specific number of allocations in a node group (cardinality constraints) to strike a balance between the two. Placement decisions also affect resilience. For example, allocations placed within the same cluster upgrade domain would go offline simultaneously.
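The three constraint types can be made concrete with a small sketch. This is not YARN's API or constraint syntax; it is a plain Python illustration, with hypothetical function names and a made-up three-node cluster, of what each constraint checks for a proposed placement.

```python
from collections import Counter


def satisfies_affinity(placement, rack_of):
    """Affinity: every allocation lands on the same rack."""
    return len({rack_of[node] for node in placement}) <= 1


def satisfies_anti_affinity(placement):
    """Anti-affinity: no two allocations share a node."""
    return len(set(placement)) == len(placement)


def satisfies_cardinality(placement, group_of, max_per_group):
    """Cardinality: at most max_per_group allocations in any node group."""
    counts = Counter(group_of[node] for node in placement)
    return all(count <= max_per_group for count in counts.values())


# Hypothetical cluster: node -> rack (the rack doubles as the node group here).
rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}

# Node chosen for each of three allocations of one job.
placement = ["n1", "n2", "n2"]

print(satisfies_affinity(placement, rack_of))        # True: all on rackA
print(satisfies_anti_affinity(placement))            # False: n2 is used twice
print(satisfies_cardinality(placement, rack_of, 2))  # False: rackA holds 3
```

The same placement passes the affinity check and fails the other two, which shows why cardinality is described as a middle ground: raising the limit to 3 would accept this placement while still capping how many allocations share a rack.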

See the user documentation for more details.

Administrators can now specify absolute resources (X memory, Y vcores, Z GPUs, etc.) for a queue instead of percentage-based values. This gives admins finer control over the amount of resources configured for a given queue.
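To illustrate the difference between the two styles, the sketch below parses an absolute resource value and compares it with a percentage-based share. The bracketed form `[memory=10240,vcores=12]` is assumed here as the textual shape of an absolute value (memory taken to be in MB); the helper names and cluster sizes are hypothetical, and none of this is Hadoop code.

```python
def parse_absolute(spec):
    """Parse '[memory=10240,vcores=12]' into {'memory': 10240, 'vcores': 12}."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not an absolute resource spec: {spec!r}")
    resources = {}
    for pair in body[1:-1].split(","):
        if not pair.strip():
            continue  # tolerate an empty spec such as '[]'
        key, _, value = pair.partition("=")
        resources[key.strip()] = int(value)
    return resources


def from_percentage(percent, cluster_total):
    """Resources a queue receives when configured as a percentage."""
    return {name: total * percent // 100 for name, total in cluster_total.items()}


absolute = parse_absolute("[memory=10240,vcores=12]")

small_cluster = {"memory": 102400, "vcores": 120}
large_cluster = {"memory": 204800, "vcores": 240}

# On the small cluster, 10% happens to equal the absolute value.
print(absolute, from_percentage(10, small_cluster))

# After the cluster doubles, the percentage share doubles too,
# while the absolute value stays exactly as the admin wrote it.
print(absolute, from_percentage(10, large_cluster))
```

With a percentage, the queue's share changes whenever the cluster size changes; an absolute value stays fixed at what was configured.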

See the user documentation for more details.

Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a DataNode.

See the user documentation for more details.

Getting Started

The Hadoop documentation includes the information you need to get started using Hadoop. Begin with the Single Node Setup, which shows you how to set up a single-node Hadoop installation. Then move on to the Cluster Setup to learn how to set up a multi-node Hadoop installation.