Apache Hadoop 2.4.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

Committed to branch-2 and trunk.

Add an option for distcp to preserve the checksum type of the source files. Users can pass “-pc” as a distcp command option to preserve the checksum type.

Fixes NFS on Kerberized cluster.

Use protobuf to serialize/deserialize the FSImage.

I just committed this. Thank you Chu.

If a read from a block is slow, start up another parallel, ‘hedged’ read against a different block replica. We then take the result of whichever read returns first (the outstanding read is cancelled). This ‘hedged’ read feature will help rein in the outliers, such as the odd read that takes a long time because it hit a bad patch on the disk.

This feature is off by default. To enable it, set <code>dfs.client.hedged.read.threadpool.size</code> to a positive number. The thread pool size is the number of threads dedicated to running these concurrent ‘hedged’ reads in your client.

Then set <code>dfs.client.hedged.read.threshold.millis</code> to the number of milliseconds to wait before starting up a ‘hedged’ read. For example, if you set this property to 10, then if a read has not returned within 10 milliseconds, we will start up a new read against a different block replica.
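
The behaviour described above can be sketched with plain JDK concurrency utilities. This is a minimal illustration, not the HDFS client code: the class `HedgedReadSketch`, the method `hedgedRead`, and its parameters are hypothetical names, each `Callable` stands in for a read against one block replica, and the sketch generalizes to starting one more read every time the threshold elapses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; all names here are hypothetical, not Hadoop APIs. */
class HedgedReadSketch {

    /**
     * Starts the first read, and each time thresholdMillis passes without a
     * result, starts a read against the next replica. Returns the result of
     * whichever read completes first and cancels the rest.
     */
    static <T> T hedgedRead(List<Callable<T>> replicaReads,
                            long thresholdMillis,
                            ExecutorService pool) {
        if (replicaReads.isEmpty()) {
            throw new IllegalArgumentException("no replicas to read from");
        }
        ExecutorCompletionService<T> completion = new ExecutorCompletionService<>(pool);
        List<Future<T>> started = new ArrayList<>();
        try {
            for (Callable<T> read : replicaReads) {
                started.add(completion.submit(read));
                // Wait up to the threshold for any outstanding read to finish.
                Future<T> done = completion.poll(thresholdMillis, TimeUnit.MILLISECONDS);
                if (done != null) {
                    return done.get();
                }
            }
            // Every replica has a read in flight: block until the first one finishes.
            return completion.take().get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } catch (ExecutionException e) {
            // Simplification: the first failure aborts the whole read.
            throw new IllegalStateException("replica read failed", e.getCause());
        } finally {
            // Cancel reads that are still outstanding; a no-op for finished ones.
            for (Future<T> f : started) {
                f.cancel(true);
            }
        }
    }
}
```

In this sketch the `pool` argument plays the role of the thread pool sized by <code>dfs.client.hedged.read.threadpool.size</code>, and `thresholdMillis` plays the role of <code>dfs.client.hedged.read.threshold.millis</code>.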

This feature emits new metrics:

The `ls` command only prints “Found foo items” once when listing the directories recursively.

dfs.http.port and dfs.https.port are removed. Filesystem clients, such as WebHdfsFileSystem, now use fixed default ports rather than configurable ones (50070 for http and 50470 for https).

To access a file system that runs on a non-default port, users can explicitly specify the port in the URI.
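
As a rough illustration of the rule above, the sketch below picks the port a client would connect to: the port named in the URI if there is one, otherwise the fixed default. The class `WebHdfsPortSketch`, the method `effectivePort`, and the host name are hypothetical. Only the two default port numbers come from this note.

```java
import java.net.URI;

/** Illustrative sketch only; not the WebHdfsFileSystem implementation. */
class WebHdfsPortSketch {
    // Fixed defaults quoted in the release note.
    static final int DEFAULT_HTTP_PORT = 50070;
    static final int DEFAULT_HTTPS_PORT = 50470;

    /** Returns the port named in the URI, or defaultPort when the URI names none. */
    static int effectivePort(URI uri, int defaultPort) {
        int port = uri.getPort(); // java.net.URI reports -1 when no port is given
        return port == -1 ? defaultPort : port;
    }
}
```

For example, `effectivePort(URI.create("webhdfs://namenode.example.com:8080/user/alice"), WebHdfsPortSketch.DEFAULT_HTTP_PORT)` yields the explicit port, while the same URI without `:8080` falls back to the default.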

The hadoop.rpc.protection configuration property previously supported specifying a single value: one of authentication, integrity or privacy. An unrecognized value was silently assumed to mean authentication. This configuration property now accepts a comma-separated list of any of the 3 values, and unrecognized values are rejected with an error. Existing configurations containing an invalid value must be corrected. If the property is empty or not specified, authentication is assumed.
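
The parsing rules described above might look roughly like the following. This is a sketch under stated assumptions rather than Hadoop's own parser: the class `RpcProtectionSketch` and its `parse` method are hypothetical, and treating values case-insensitively and returning an unordered set are choices made here for brevity.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only; names and details are assumptions, not Hadoop code. */
class RpcProtectionSketch {
    enum Protection { AUTHENTICATION, INTEGRITY, PRIVACY }

    /**
     * Parses a comma-separated hadoop.rpc.protection value.
     * Empty or missing means authentication; unknown tokens are rejected.
     */
    static Set<Protection> parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return EnumSet.of(Protection.AUTHENTICATION);
        }
        EnumSet<Protection> result = EnumSet.noneOf(Protection.class);
        for (String token : value.split(",")) {
            String name = token.trim();
            try {
                // Assumption: matching is case-insensitive.
                result.add(Protection.valueOf(name.toUpperCase(Locale.ROOT)));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException(
                        "Unrecognized hadoop.rpc.protection value: '" + name + "'", e);
            }
        }
        return result;
    }
}
```

With this sketch, `parse("integrity, privacy")` yields both levels, `parse("")` yields only authentication, and a misspelled value such as `"privcy"` raises an error instead of silently falling back.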

The default configuration of HDFS now sets dfs.namenode.fs-limits.max-component-length to 255 for improved interoperability with other file system implementations. This limits each component of a file system path to a maximum of 255 bytes in UTF-8 encoding. Attempts to create new files that violate this rule will fail with an error. Existing files that violate the rule are not affected. Previously, dfs.namenode.fs-limits.max-component-length was set to 0 (ignored). If necessary, it is possible to set the value back to 0 in the cluster’s configuration to restore the old behavior.
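
Because the limit is counted in UTF-8 bytes rather than characters, a name with fewer than 255 characters can still be rejected. The sketch below shows one way to pre-check a path on the client side. The class `ComponentLengthSketch` and the method `fitsLimit` are hypothetical, and the NameNode's actual check may differ in detail.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not the NameNode's implementation. */
class ComponentLengthSketch {
    // Default value of dfs.namenode.fs-limits.max-component-length per this note.
    static final int DEFAULT_MAX_COMPONENT_LENGTH = 255;

    /**
     * Returns true when every '/'-separated component of path is at most
     * maxBytes long in UTF-8. A limit of 0 disables the check.
     */
    static boolean fitsLimit(String path, int maxBytes) {
        if (maxBytes == 0) {
            return true;
        }
        for (String component : path.split("/")) {
            if (component.getBytes(StandardCharsets.UTF_8).length > maxBytes) {
                return false;
            }
        }
        return true;
    }
}
```

For instance, a component made of 128 copies of “é” is only 128 characters long but occupies 256 bytes in UTF-8, so it exceeds the default limit.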

WARNING: No release note provided for this change.

SaslPropertiesResolver or a subclass of it is used to resolve the QOP (quality of protection) used for a connection. The subclass can be specified via the “hadoop.security.saslproperties.resolver.class” configuration property. If no class is specified, the full set of values specified in hadoop.rpc.protection is used when determining the QOP for the connection. If a class is specified, the QOP values returned by that class are used instead.

Note that this change effectively removes SaslRpcServer.SASL_PROPS, which was a public field. Any use of this variable should be replaced with the following code: `SaslPropertiesResolver saslPropsResolver = SaslPropertiesResolver.getInstance(conf); Map<String, String> sasl_props = saslPropsResolver.getDefaultProperties();`
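
To make the idea of resolving a QOP more concrete, the sketch below builds the kind of SASL property map a default resolver would presumably produce from a list of protection levels. It uses only the JDK's `javax.security.sasl.Sasl` constants. The class `SaslQopSketch` and its methods are hypothetical stand-ins, not the SaslPropertiesResolver API, and the exact set of properties Hadoop places in its map is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.security.sasl.Sasl;

/** Illustrative sketch only; a stand-in, not Hadoop's SaslPropertiesResolver. */
class SaslQopSketch {

    /** Maps a protection level name to the standard SASL QOP token. */
    static String saslQop(String level) {
        switch (level) {
            case "authentication": return "auth";
            case "integrity":      return "auth-int";
            case "privacy":        return "auth-conf";
            default:
                throw new IllegalArgumentException("Unrecognized protection level: " + level);
        }
    }

    /** Builds a property map whose QOP entry lists the levels in the given order. */
    static Map<String, String> defaultProperties(List<String> levels) {
        List<String> qops = new ArrayList<>();
        for (String level : levels) {
            qops.add(saslQop(level));
        }
        Map<String, String> props = new TreeMap<>();
        // Sasl.QOP is the JDK constant "javax.security.sasl.qop".
        props.put(Sasl.QOP, String.join(",", qops));
        return props;
    }
}
```

A custom resolver would follow the same shape but could return a different map per connection, for example a stricter QOP for clients outside a trusted network.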

HDFS now supports ACLs (Access Control Lists). ACLs can specify fine-grained file permissions for specific named users or named groups.
