Apache Hadoop 3.1.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

Added a new configuration property, “dfs.client.block.write.replace-datanode-on-failure.min-replication”.

This property sets the minimum number of replicas required to keep the write pipeline from failing when no new datanodes can be found to replace failed datanodes in the pipeline (for example, because of a network failure). If the number of datanodes remaining in the write pipeline is greater than or equal to this value, the client continues writing to the remaining nodes. Otherwise, it throws an exception.

If this is set to 0, an exception is thrown whenever a replacement cannot be found.
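The decision this property controls can be modeled as a small function. This is a minimal sketch of the documented semantics only, not HDFS client code; the names `continue_without_replacement` and `PipelineRecoveryError` are hypothetical.

```python
class PipelineRecoveryError(Exception):
    """Hypothetical error: the write pipeline cannot continue."""


def continue_without_replacement(remaining_datanodes: int, min_replication: int) -> bool:
    """Model what happens when no replacement datanode can be found.

    Returns True if writing continues on the remaining datanodes;
    otherwise raises PipelineRecoveryError.
    """
    if min_replication == 0:
        # 0 disables the fallback: fail whenever no replacement is found.
        raise PipelineRecoveryError("no replacement datanode found")
    if remaining_datanodes >= min_replication:
        # Enough datanodes survive, so keep writing to them.
        return True
    raise PipelineRecoveryError(
        f"only {remaining_datanodes} datanode(s) left, need {min_replication}"
    )
```

With the property set to 2, a pipeline that drops from three datanodes to two keeps writing, while one that drops to a single datanode fails.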

Adds a getconf command option to list the journal nodes. Usage: hdfs getconf -journalnodes

This is the first version of the Resource Estimator service, a tool that captures the historical resource usage of an application and predicts its future resource requirements.

A framework has been implemented to orchestrate containers on YARN.

A DNS server backed by the YARN service registry has been implemented to enable service discovery on YARN using standard DNS lookups.

A REST API service has been implemented to let users launch and manage container-based services on YARN through a REST API.

Previously, if multiple metrics sinks were configured with different periods, they could emit more frequently than configured, at a period as short as the greatest common divisor (GCD) of the configured periods. With this change, each metrics sink emits at its own configured period.
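The arithmetic behind the old behavior can be illustrated with a simplified model in which one shared timer ticks at the GCD of all configured periods and every sink emits on every tick. This is a sketch of the worst case the note describes, not the metrics system's actual scheduler; the function names are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import List


def emission_times(period: int, horizon: int) -> List[int]:
    """Times (in seconds) at which a sink with this period emits, up to horizon."""
    return list(range(period, horizon + 1, period))


def shared_timer_times(periods: List[int], horizon: int) -> List[int]:
    """Simplified old behavior: a single timer ticks at the GCD of all periods."""
    tick = reduce(gcd, periods)
    return emission_times(tick, horizon)
```

For sinks configured at 10 s and 15 s, the shared timer ticks every 5 s, so the 10 s sink could emit at 5, 10, 15, 20, 25 and 30 s instead of only at 10, 20 and 30 s.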


The Router metrics context is changed from ‘router’ to ‘dfs’.

Mount tables now support ACLs. Mount table entries created before this change had no permissions, so they are assumed to have the default owner superuser, group supergroup, and permission 755. As a result, users will not be able to modify their own existing entries. To fix this, log in as the superuser and modify these mount table entries.
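Why users lose write access follows from a POSIX-style permission check on those defaults. The sketch below is a simplified model, not the Router's actual permission checker; `can_write` is hypothetical, and the name `superuser` only stands in for whatever account is the HDFS superuser.

```python
import stat


def can_write(user, owner, group, user_groups, mode, superusers=("superuser",)):
    """Simplified POSIX-style write check for a mount table entry."""
    if user in superusers:
        return True  # the superuser bypasses permission checks
    if user == owner:
        return bool(mode & stat.S_IWUSR)  # owner write bit
    if group in user_groups:
        return bool(mode & stat.S_IWGRP)  # group write bit
    return bool(mode & stat.S_IWOTH)  # "other" write bit
```

With mode 0o755 only the owner has a write bit, and the owner of a pre-existing entry is the superuser, so an ordinary user falls through to the "other" bits and is refused.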

Ensures that only the NodeManager classpath in 2.x receives the TSv2-related HBase jars, not the user classpath.

Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a datanode. Clients accessing data in PROVIDED storages can cache replicas in local media, enforce HDFS invariants (e.g., security, quotas), and address more data than the cluster could persist in the storage attached to DataNodes.

Now that S3A has a checksum, you need to explicitly disable checksum checks when using DistCp to upload from HDFS: use -skipcrccheck.

Checksum verification does work between S3A buckets, provided the block size used for the uploads was identical.
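The block-size requirement can be illustrated by assuming the S3A checksum is derived from the object's ETag, and that multipart ETags take the widely observed form "MD5 of the concatenated per-part MD5 digests, plus the part count". Both are assumptions for illustration; `multipart_style_checksum` is a hypothetical helper, not an S3A API.

```python
import hashlib


def multipart_style_checksum(data: bytes, part_size: int) -> str:
    """MD5 over the concatenated per-part MD5 digests, suffixed with the part count."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

Under this model, the same bytes uploaded with 4 KiB and 8 KiB parts produce different checksums, while two uploads with the same part size match, which is why identical block sizes are needed.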

Added a new patch that fixes whitespace and some checkstyle issues.

FairScheduler Continuous Scheduling is deprecated starting from 3.1.0.

Added support for multi-threaded pre-reading in AliyunOSSInputStream to improve sequential read performance from Hadoop to Aliyun OSS.
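The general read-ahead technique can be sketched as follows: while the reader consumes block N, a thread pool is already fetching the next few blocks. This is an illustration of multi-threaded prefetching in general, not the AliyunOSSInputStream implementation; `fetch_range` is a hypothetical stand-in for a ranged GET against the object store.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def sequential_read_with_prefetch(
    fetch_range: Callable[[int, int], bytes],
    size: int,
    block_size: int,
    threads: int = 4,
) -> bytes:
    """Read `size` bytes in order, keeping up to `threads` blocks in flight."""
    num_blocks = (size + block_size - 1) // block_size
    out = bytearray()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        pending = {}  # block index -> Future holding that block's bytes
        for index in range(num_blocks):
            # Schedule the current block and the next few (the read-ahead window).
            for ahead in range(index, min(index + threads, num_blocks)):
                if ahead not in pending:
                    offset = ahead * block_size
                    length = min(block_size, size - offset)
                    pending[ahead] = pool.submit(fetch_range, offset, length)
            # Block only on the current block; later ones keep downloading.
            out += pending.pop(index).result()
    return bytes(out)
```

Because blocks are consumed strictly in order, the result is identical to a plain sequential read; only the waiting time overlaps.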

MapReduce jobs that output to filesystems without direct support for recursive delete can set mapreduce.fileoutputcommitter.task.cleanup.enabled=true to have each task delete its intermediate work directory rather than waiting for the ApplicationMaster to clean up at the end of the job. This can significantly speed up the cleanup phase for large jobs on such filesystems.
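As a configuration example, the sketch below renders the property in the `<configuration><property><name/><value/></property>` layout used by Hadoop site files such as mapred-site.xml. The helper `hadoop_site_xml` is hypothetical; only the property name comes from this note.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties: dict) -> str:
    """Render name/value pairs in the Hadoop *-site.xml property layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        # Hadoop expects lowercase boolean literals.
        ET.SubElement(prop, "value").text = (
            str(value).lower() if isinstance(value, bool) else str(value)
        )
    return ET.tostring(root, encoding="unicode")


print(hadoop_site_xml({"mapreduce.fileoutputcommitter.task.cleanup.enabled": True}))
```

For jobs launched through ToolRunner, the same property can usually be passed on the command line as `-D mapreduce.fileoutputcommitter.task.cleanup.enabled=true`.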

Added an option to keep short-circuit reads enabled after failures: set dfs.domain.socket.disable.interval.seconds to 0.
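The effect of the interval can be modeled as a simple time check: after a domain-socket failure, short-circuit reads are skipped until the interval has elapsed, and an interval of 0 means they are never skipped. This is a hypothetical model of the documented behavior, not DFSClient code; `short_circuit_allowed` and the 600-second value below are illustrative.

```python
from typing import Optional


def short_circuit_allowed(
    now: float, last_failure: Optional[float], disable_interval_seconds: int
) -> bool:
    """Model whether short-circuit reads may be attempted at time `now`."""
    if last_failure is None or disable_interval_seconds == 0:
        # No recent failure, or disabling is turned off entirely.
        return True
    return now - last_failure >= disable_interval_seconds
```

With an interval of 600, a failure at t=50 blocks short-circuit reads until t=650; with an interval of 0 they are attempted again immediately.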

Fixed a documentation error in the instructions for setting up HDFS Router Federation.

Changed the default State Store from a local file to ZooKeeper. This requires an additional ZooKeeper address to be configured.

Updated checkstyle to 8.8 and updated maven-checkstyle-plugin to 3.0.0.

The HBase integration module mixed hbase-server and hbase-client dependencies. This JIRA splits it into submodules so that hbase-client-dependent modules and hbase-server-dependent modules are separated. This allows conditional compilation against different versions of HBase.

Containers no longer unconditionally inherit the HADOOP_CONF_DIR environment variable from the NodeManager; it is now inherited only if it appears in the whitelist specified by the yarn.nodemanager.env-whitelist property. If the whitelist has been changed from its default so that it no longer includes HADOOP_CONF_DIR, but containers still need to inherit it from the NodeManager’s environment, update the whitelist to include HADOOP_CONF_DIR.
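The inheritance rule can be sketched as a filter over the NodeManager's environment. This is a simplified model, assuming that whitelisted variables are inherited only when the application has not set them itself; it ignores the other variables the NodeManager sets on every container. `container_environment` is hypothetical.

```python
def container_environment(nm_env: dict, whitelist_csv: str, app_env: dict) -> dict:
    """Simplified model of yarn.nodemanager.env-whitelist inheritance."""
    env = dict(app_env)
    for name in (v.strip() for v in whitelist_csv.split(",")):
        # Inherit a whitelisted variable only if the application did not set it.
        if name and name not in env and name in nm_env:
            env[name] = nm_env[name]
    return env
```

If HADOOP_CONF_DIR is removed from the whitelist, the container no longer sees the NodeManager's value unless the application sets it explicitly.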

Federation supports and enforces global quotas at the mount table level.

In a federated environment, a folder can be spread across multiple subclusters. The Router aggregates the quota usage queried from these subclusters and uses the result for quota verification.
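The aggregation step can be sketched as summing per-subcluster usage and comparing the total against the mount table's global quota. This is an illustration only; `QuotaUsage`, `aggregate` and `within_quota` are hypothetical, and the convention that a negative quota means "no limit" belongs to this sketch, not to the Router.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class QuotaUsage:
    """Usage of one mount-table folder in one subcluster (hypothetical type)."""
    file_and_dir_count: int
    space_consumed: int


def aggregate(usages: Iterable[QuotaUsage]) -> QuotaUsage:
    """Sum the usage reported by every subcluster the folder spans."""
    total = QuotaUsage(0, 0)
    for u in usages:
        total.file_and_dir_count += u.file_and_dir_count
        total.space_consumed += u.space_consumed
    return total


def within_quota(total: QuotaUsage, ns_quota: int, ss_quota: int) -> bool:
    """Check the aggregate against the global quota; negative means unset here."""
    ns_ok = ns_quota < 0 or total.file_and_dir_count <= ns_quota
    ss_ok = ss_quota < 0 or total.space_consumed <= ss_quota
    return ns_ok and ss_ok
```

A folder holding 10 entries in one subcluster and 5 in another counts as 15 against its mount table quota, even though neither subcluster alone would exceed a limit of 10.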