Apache Hadoop 2.7.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

WARNING: No release note provided for this change.

Based on the reconfiguration framework provided by HADOOP-7001, DataNodes can now reconfigure dfs.datanode.data.dir and add new volumes into service.

New fs -find command

To resolve ambiguity about case sensitivity in the KeyStore spec, keys with uppercase names can no longer be created when using the JavaKeyStoreProvider.

Hadoop now supports integration with Azure Storage as an alternative Hadoop Compatible File System.

The following parameters are introduced in this JIRA:

  1. fs.s3a.threads.max: the maximum number of threads to allow in the pool used by TransferManager.
  2. fs.s3a.threads.core: the number of threads to keep in the pool used by TransferManager.
  3. fs.s3a.threads.keepalivetime: when the number of threads is greater than the core, this is the maximum time that excess idle threads will wait for new tasks before terminating.
  4. fs.s3a.max.total.tasks: the maximum number of tasks that the LinkedBlockingQueue can hold.
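
The four parameters above correspond to the standard arguments of a java.util.concurrent.ThreadPoolExecutor backed by a bounded LinkedBlockingQueue. The following is a minimal sketch of that mapping, not the S3A source code. The class name S3aPoolSketch, the numeric values, and the use of seconds for the keep-alive time are assumptions made for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class S3aPoolSketch {
    // Hypothetical values for illustration only; they are not Hadoop's defaults.
    static final int THREADS_CORE = 4;          // fs.s3a.threads.core
    static final int THREADS_MAX = 8;           // fs.s3a.threads.max
    static final long KEEPALIVE_SECONDS = 60L;  // fs.s3a.threads.keepalivetime (unit assumed)
    static final int MAX_TOTAL_TASKS = 100;     // fs.s3a.max.total.tasks

    // Builds a pool whose four knobs line up with the four parameters.
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                THREADS_CORE,
                THREADS_MAX,
                KEEPALIVE_SECONDS,
                TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(MAX_TOTAL_TASKS));
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool();
        System.out.println("core threads:   " + pool.getCorePoolSize());
        System.out.println("max threads:    " + pool.getMaximumPoolSize());
        System.out.println("queue capacity: " + pool.getQueue().remainingCapacity());
        pool.shutdown();
    }
}
```

With a bounded queue, a ThreadPoolExecutor only grows beyond the core size once the queue is full, so the queue capacity and the maximum thread count are best tuned together.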

We have reinstated support for launching Hadoop processes on Windows by using Cygwin to run the shell scripts. All processes still must have access to the native components: hadoop.dll and winutils.exe.

  1. HDFS can now append data to a new block instead of to the end of the last partial block. Users can pass {{CreateFlag.APPEND}} and {{CreateFlag.NEW_BLOCK}} to the {{append}} API to indicate this requirement.
  2. HDFS now allows users to pass {{SyncFlag.END_BLOCK}} to the {{hsync}} API to finish the current block and write remaining data to a new block.
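
The effect of appending to a new block can be pictured with a toy model of a file's block lengths. The sketch below does not call any HDFS API. The class AppendLayoutSketch, its append method, and the sizes used are hypothetical and only illustrate how the resulting block layout differs between the two modes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AppendLayoutSketch {
    /**
     * Returns the block lengths after appending `bytes` to a file whose
     * current block lengths are `blocks`. When `newBlock` is false, the
     * last partial block is filled first (the traditional behavior).
     * When it is true, appended data always starts in a fresh block.
     */
    static List<Long> append(List<Long> blocks, long blockSize, long bytes, boolean newBlock) {
        List<Long> out = new ArrayList<>(blocks);
        long remaining = bytes;
        if (!newBlock && !out.isEmpty()) {
            int last = out.size() - 1;
            long room = blockSize - out.get(last);
            long fill = Math.min(room, remaining);
            out.set(last, out.get(last) + fill);
            remaining -= fill;
        }
        // Whatever is left goes into new blocks of at most blockSize each.
        while (remaining > 0) {
            long len = Math.min(blockSize, remaining);
            out.add(len);
            remaining -= len;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> before = Arrays.asList(128L, 40L); // last block is partial
        System.out.println("default append: " + append(before, 128L, 100L, false));
        System.out.println("new-block mode: " + append(before, 128L, 100L, true));
    }
}
```

In the new-block mode the earlier partial block is left untouched, which is the property the {{CreateFlag.NEW_BLOCK}} flag is described as providing.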

Hadoop metrics sent to Ganglia over multicast now support optional configuration of socket TTL. The default TTL is 1, which preserves the behavior of prior Hadoop versions. Clusters that span multiple subnets/VLANs will likely want to increase this.

Apache Curator version change: Apache Hadoop has updated the version of Apache Curator used from 2.6.0 to 2.7.1. This change should be binary and source compatible for the majority of downstream users. Notable exceptions:

Downstream users are reminded that while the Hadoop community will attempt to avoid egregious incompatible dependency changes, there is currently no policy around when Hadoop’s exposed dependencies will change across versions (ref http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Java_Classpath ).

Introduced a new configuration dfs.pipeline.ecn. When the configuration is turned on, DataNodes will signal in the writing pipelines when they are overloaded. The client can back off based on this congestion signal to avoid overloading the system.
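
As a rough illustration of how a client might react to such a congestion signal, the sketch below doubles its wait time on each congested acknowledgement and resets it once the signal clears. This is a generic exponential back-off under assumed parameters, not the policy implemented by the HDFS client. The class CongestionBackoffSketch and its method onAck are hypothetical names.

```java
final class CongestionBackoffSketch {
    private final long initialMs; // first delay applied after a congestion signal
    private final long maxMs;     // upper bound on the delay
    private long currentMs = 0L;  // 0 means "no back-off in effect"

    CongestionBackoffSketch(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
    }

    /**
     * Feeds the congestion flag carried by one pipeline acknowledgement and
     * returns how long (in milliseconds) to wait before sending more data.
     */
    long onAck(boolean congested) {
        if (!congested) {
            currentMs = 0L;                               // signal cleared: stop backing off
        } else if (currentMs == 0L) {
            currentMs = initialMs;                        // first congested ack
        } else {
            currentMs = Math.min(maxMs, currentMs * 2L);  // repeated congestion: double, capped
        }
        return currentMs;
    }

    public static void main(String[] args) {
        CongestionBackoffSketch backoff = new CongestionBackoffSketch(10L, 50L);
        boolean[] acks = {true, true, true, true, false, true};
        for (boolean congested : acks) {
            System.out.println("congested=" + congested + " wait=" + backoff.onAck(congested) + "ms");
        }
    }
}
```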

Added a replica pinning feature: when a replica is pinned on a DataNode, it will not be moved by the Balancer or Mover. The feature is enabled or disabled by “dfs.datanode.block-pinning.enabled”, which defaults to false.

  1. Introduced quota by storage type as a hard limit on the amount of space usage allowed for different storage types (SSD, DISK, ARCHIVE) under the target directory.
  2. Added {{SetQuotaByStorageType}} API and {{-storagetype}} option for {{hdfs dfsadmin -setSpaceQuota/-clrSpaceQuota}} commands to allow setting and clearing quota by storage type under the target directory.
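
The idea of a hard per-storage-type limit can be sketched as a small bookkeeping class that rejects any allocation that would push usage past the quota for that type. This is an illustrative model under stated assumptions, not the NameNode's implementation. The class StorageTypeQuotaSketch, its methods, and the local StorageType enum (a stand-in for org.apache.hadoop.fs.StorageType) are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

final class StorageTypeQuotaSketch {
    // Local stand-in for the storage types named in the release note.
    enum StorageType { SSD, DISK, ARCHIVE }

    private final Map<StorageType, Long> quota = new EnumMap<>(StorageType.class);
    private final Map<StorageType, Long> used = new EnumMap<>(StorageType.class);

    void setQuota(StorageType type, long bytes) {
        quota.put(type, bytes);
    }

    void clearQuota(StorageType type) {
        quota.remove(type);
    }

    /**
     * Tries to charge `bytes` against the given storage type. Returns false,
     * charging nothing, if that would exceed the quota (a hard limit).
     * Types without a quota are unrestricted in this model.
     */
    boolean tryConsume(StorageType type, long bytes) {
        long current = used.getOrDefault(type, 0L);
        Long limit = quota.get(type);
        if (limit != null && current + bytes > limit) {
            return false;
        }
        used.put(type, current + bytes);
        return true;
    }

    public static void main(String[] args) {
        StorageTypeQuotaSketch dir = new StorageTypeQuotaSketch();
        dir.setQuota(StorageType.SSD, 100L);
        System.out.println("60 bytes on SSD accepted: " + dir.tryConsume(StorageType.SSD, 60L));
        System.out.println("50 more on SSD accepted:  " + dir.tryConsume(StorageType.SSD, 50L));
    }
}
```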

This fix moves the public class StorageType from the package org.apache.hadoop.hdfs to org.apache.hadoop.fs.

Removed commons-httpclient dependency from hadoop-yarn-server-web-proxy module.

The Hadoop Common native components now support 32-bit build targets on Windows.

LibHDFS now supports 32-bit build targets on Windows.

This introduces two new MR2 job configs, mentioned below, which allow users to control the maximum simultaneously-running tasks of the submitted job, across the cluster:

This is controllable at a per-job level.

This merges Block.BLOCK_FILE_PREFIX and DataStorage.BLOCK_FILE_PREFIX into one constant. Hard-coded literals of “blk_” in various files are also updated to use the same constant.

This change introduces a new configuration key used to throttle decommissioning work, “dfs.namenode.decommission.blocks.per.interval”. This new key overrides and deprecates the previous related configuration key “dfs.namenode.decommission.nodes.per.interval”. The new key is intended to result in more predictable pause times while scanning decommissioning nodes.
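
One way to see why a per-block limit gives steadier pause times than a per-node limit is to compare the work done in each interval when nodes hold very different numbers of blocks. The toy model below is an assumption-laden sketch, not the NameNode's decommissioning logic. In particular it assumes a scan can stop at any block and resume in the next interval. The class DecommissionThrottleSketch and the block counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DecommissionThrottleSketch {
    /** Blocks scanned per interval when each interval scans a fixed number of whole nodes. */
    static List<Integer> byNodes(int[] blocksPerNode, int nodesPerInterval) {
        List<Integer> work = new ArrayList<>();
        for (int i = 0; i < blocksPerNode.length; i += nodesPerInterval) {
            int end = Math.min(i + nodesPerInterval, blocksPerNode.length);
            int sum = 0;
            for (int j = i; j < end; j++) {
                sum += blocksPerNode[j];
            }
            work.add(sum);
        }
        return work;
    }

    /** Blocks scanned per interval when each interval scans at most blocksPerInterval blocks. */
    static List<Integer> byBlocks(int[] blocksPerNode, int blocksPerInterval) {
        int total = 0;
        for (int b : blocksPerNode) {
            total += b;
        }
        List<Integer> work = new ArrayList<>();
        while (total > 0) {
            int n = Math.min(blocksPerInterval, total);
            work.add(n);
            total -= n;
        }
        return work;
    }

    public static void main(String[] args) {
        int[] blocksPerNode = {1000, 10, 500000, 20}; // made-up, deliberately uneven
        System.out.println("2 nodes per interval:       " + byNodes(blocksPerNode, 2));
        System.out.println("200000 blocks per interval: " + byBlocks(blocksPerNode, 200000));
    }
}
```

Under the per-node limit the work per interval depends on which nodes happen to be scanned, while under the per-block limit it never exceeds the configured bound.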

Applications that use LogAggregationContext will need to revisit that code to make sure their logs continue to be rolled out.

Added a section to BUILDING.txt on how to install required / optional packages on a clean install of Ubuntu 14.04 LTS Desktop.

Reviewed the CMakeLists.txt files in the repo and added the following optional library dependencies: Snappy, Bzip2, Linux FUSE, and Jansson.

Updated the required packages / version numbers from the trunk branch version of BUILDING.txt.

ProtocolBuffer is packaged in Ubuntu