Apache Hadoop 2.0.3-alpha Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.


This jira adds a new DataNode state called “stale” at the NameNode. A DataNode is marked as stale if it does not send a heartbeat message to the NameNode within the timeout configured using the configuration parameter “dfs.namenode.stale.datanode.interval” in seconds (default value is 30 seconds). The NameNode picks a stale DataNode as the last target to read from when returning block locations for reads.

This feature is turned off by default. To turn it on, set the HDFS configuration “dfs.namenode.check.stale.datanode” to true.
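
The read-ordering rule described above can be sketched in plain Java. This is only an illustration of the idea, not the NameNode's actual code: the `StaleReadOrder` and `Node` types, the method names, and the use of millisecond timestamps are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: orders replica locations so that stale DataNodes come last. */
final class StaleReadOrder {

    /** Hypothetical stand-in for a DataNode descriptor; not a Hadoop class. */
    static final class Node {
        final String name;
        final long lastHeartbeatMillis;

        Node(String name, long lastHeartbeatMillis) {
            this.name = name;
            this.lastHeartbeatMillis = lastHeartbeatMillis;
        }
    }

    /** A node counts as stale when no heartbeat arrived within the interval. */
    static boolean isStale(Node node, long nowMillis, long staleIntervalMillis) {
        return nowMillis - node.lastHeartbeatMillis > staleIntervalMillis;
    }

    /**
     * Returns a copy of the locations with stale nodes moved to the end.
     * List.sort is stable, so the relative order inside each group is kept.
     */
    static List<Node> staleLast(List<Node> locations, long nowMillis, long staleIntervalMillis) {
        List<Node> ordered = new ArrayList<>(locations);
        // Boolean.FALSE sorts before Boolean.TRUE, so non-stale nodes come first.
        ordered.sort(Comparator.comparing((Node n) -> isStale(n, nowMillis, staleIntervalMillis)));
        return ordered;
    }
}
```

With three replicas whose last heartbeats were 100, 10, and 5 seconds ago and a 30-second interval, the first replica would be moved to the end of the returned list while the other two keep their relative order.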


WARNING: No release note provided for this change.


A map task’s syslogs now carry basic information on the InputSplit it processed.


This jira adds a new metric named “StaleDataNodes”, of type Gauge, under the metrics context “dfs”. It tracks the number of DataNodes marked as stale. A DataNode is marked stale when the heartbeat message from the DataNode is not received within the time configured by “dfs.namenode.stale.datanode.interval”.

Please see the hdfs-default.xml documentation for “dfs.namenode.stale.datanode.interval” for more details on how to configure this feature. When this feature is not configured, this metric returns zero.


Add a JSONP alternative output for the /jmx HTTP interface to provide a JavaScript polling ability in browsers.
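
As a rough illustration of what a JSONP response looks like, the sketch below wraps a JSON document in a call to a caller-supplied function name. It does not reproduce the /jmx servlet's code; the `JsonpSketch` class, the callback whitelist, and the exact output shape are assumptions made for this example.

```java
import java.util.regex.Pattern;

/** Illustrative only: wraps a JSON payload so a browser can load it via a script tag. */
final class JsonpSketch {

    // Conservative whitelist: a JavaScript identifier, optionally dotted (e.g. app.onMetrics).
    private static final Pattern SAFE_CALLBACK =
        Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    /**
     * Returns the JSON unchanged when no callback is requested; otherwise returns
     * "callback(json);". Unsafe callback names are rejected to avoid script injection.
     */
    static String wrap(String callback, String json) {
        if (callback == null) {
            return json;
        }
        if (!SAFE_CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("Unsafe JSONP callback name: " + callback);
        }
        return callback + "(" + json + ");";
    }
}
```

A page that defines a function such as `onMetrics` could then receive the metrics by loading the wrapped response from a script element, which is what makes cross-origin polling possible in browsers.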


Speed up Crc32 by improving the cache hit-ratio of hadoop.util.PureJavaCrc32


Handle TaskAttempt diagnostic updates while in the NEW and UNASSIGNED states.


This jira changes the content of some of the log messages. No log messages are removed; only their content is changed, to reduce its size. If you have a tool that depends on the exact content of the log, please look at the patch and make appropriate updates to the tool.


“test” will not print a warning for non-existent paths when testing for existence


Add a separate logger “BlockStateChange” for block state change logs.


The RPC SASL negotiation now always ends with a final response. If the SASL mechanism does not have a final response (GSSAPI, PLAIN), then an empty success response is sent to the client. The client will now always expect a final response to definitively know if negotiation is complete/successful.


Allow ReduceTask to load a third-party plugin for shuffle (and merge) instead of the default shuffle.


MAPREDUCE-4807 Allow external implementations of the sort phase in a Map task


Update FileStatus.toString to include missing fields


WARNING: No release note provided for this change.


The patch adds more tests to verify overwrites and more complex operations such as write-delete-overwrite. By using differently sized datasets and different data inside, these tests verify that the overwrite really did take place. While HDFS meets all these requirements directly, eventually consistent object stores may not, hence these tests.


Resolved as part of HADOOP-9119: its test data generator creates more bits in every test byte.


Member dataEncryptionKey of the protobuf message GetDataEncryptionKeyResponseProto is made optional instead of required. This incompatible change is not likely to affect existing users (those using the HDFS FileSystem and other public APIs).


Protobuf message GetLinkTargetResponseProto member targetPath is made optional from required so that null values can be passed over the wire. This is an incompatible wire protocol change and does not affect the API backward compatibility.


Protobuf message GetBlockKeysResponseProto member keys is made optional from required so that null values can be passed over the wire. This is an incompatible wire protocol change and does not affect the API backward compatibility.


Protobuf message GetDelegationTokenRequestProto field renewer is made required from optional. This change is not wire compatible with the older releases.


The default group mapping policy has been changed to JniBasedUnixGroupsNetgroupMappingWithFallback. This should maintain the same semantics as the prior default for most users.


This jira introduces a new configuration parameter “ipc.client.connect.timeout”. This configuration defines the Hadoop RPC connection timeout in milliseconds for a client to connect to a server. For details see the description associated with this configuration in core-default.xml.


The HDFS implementation of getFileChecksum() can now operate correctly against earlier-version datanodes which do not include the checksum type information in their checksum response. The checksum type is automatically inferred by issuing a read of the first byte of each block.


This is an incompatible change from release 2.0.2-alpha and prior releases. Previously, the Balancer tool exited with exit code 1 on success. It now exits with exit code 0 on success; a non-zero exit code indicates failure.
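
Wrapper tools that launch the Balancer and inspect its exit code may need a matching update. The following is a minimal sketch of such a check, assuming a hypothetical `BalancerExitCheck` helper; it is not part of Hadoop, and the command passed to `run` depends on the local installation.

```java
import java.io.IOException;

/** Illustrative only: interprets the Balancer's exit code under the new convention. */
final class BalancerExitCheck {

    /** Under the new convention, 0 means success and any other code means failure. */
    static boolean succeeded(int exitCode) {
        return exitCode == 0;
    }

    /** Runs the given command with inherited stdio and returns its exit code. */
    static int run(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

A caller could, for example, invoke `BalancerExitCheck.run("hdfs", "balancer")` and pass the result to `succeeded`. Scripts written against 2.0.2-alpha or earlier that treated exit code 1 as success would need the opposite check.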


This patch makes an incompatible configuration change, as described below: In release 1.1.0 and the other 1.1.x point releases, the configuration parameter “dfs.namenode.check.stale.datanode” could be used to turn on checking for stale nodes. This configuration is no longer supported from release 1.2.0 onwards and has been renamed to “dfs.namenode.avoid.read.stale.datanode”.

How the feature works and how to configure it: As described in the HDFS-3703 release notes, the datanode stale period can be configured using the parameter “dfs.namenode.stale.datanode.interval” in seconds (default value is 30 seconds). The NameNode can be configured to use this staleness information for reads using the configuration “dfs.namenode.avoid.read.stale.datanode”. When this parameter is set to true, the NameNode picks a stale datanode as the last target to read from when returning block locations for reads. Using staleness information for writes is described in the release notes of HDFS-3912.
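
Because the old key is no longer honored, site configurations that still set it need to be updated by hand or by tooling. The sketch below shows one way such a migration step might look, using `java.util.Properties` as a stand-in for a real configuration object. The `StaleConfigMigration` helper is hypothetical; Hadoop itself performs no such mapping according to this note.

```java
import java.util.Properties;

/** Illustrative only: carries a value set under the old key over to the renamed key. */
final class StaleConfigMigration {

    static final String OLD_KEY = "dfs.namenode.check.stale.datanode";
    static final String NEW_KEY = "dfs.namenode.avoid.read.stale.datanode";

    /**
     * Returns a copy of the settings in which the old key is removed and its value,
     * if any, is stored under the new key. An explicit new-key value always wins.
     */
    static Properties migrate(Properties original) {
        Properties migrated = new Properties();
        for (String name : original.stringPropertyNames()) {
            migrated.setProperty(name, original.getProperty(name));
        }
        String oldValue = migrated.getProperty(OLD_KEY);
        migrated.remove(OLD_KEY);
        if (oldValue != null && migrated.getProperty(NEW_KEY) == null) {
            migrated.setProperty(NEW_KEY, oldValue);
        }
        return migrated;
    }
}
```

The input object is left untouched, so the caller can compare the two sets of properties and report which keys were rewritten.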


WARNING: No release note provided for this change.