Apache Hadoop 1.1.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

WARNING: No release note provided for this incompatible change.

This patch enables durable sync by default. Installations that did not use HBase, and that previously ran either without setting “dfs.support.append” or with it explicitly set to false, must add the new flag “dfs.durable.sync” and set it to false to preserve the previous semantics.
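As a minimal sketch, an installation that wants to keep the previous semantics could add the flag named above to its HDFS site configuration. The file name hdfs-site.xml is an assumption here; the note itself does not say where the flag belongs.

```xml
<!-- hdfs-site.xml (assumed location): opt out of durable sync -->
<configuration>
  <property>
    <name>dfs.durable.sync</name>
    <value>false</value>
  </property>
</configuration>
```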

WARNING: No release note provided for this incompatible change.

Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need append. If you enabled dfs.support.append for HBase, no change is needed, because durable sync (the reason HBase required dfs.support.append) is now enabled by default. If you really need the previous append functionality, set the flag “dfs.support.broken.append” to true.
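A minimal sketch of re-enabling the old append behavior with the flag named above. The file name hdfs-site.xml is an assumption; the note does not state where the flag is set.

```xml
<!-- hdfs-site.xml (assumed location): re-enable the unsupported 1.x append -->
<configuration>
  <property>
    <name>dfs.support.broken.append</name>
    <value>true</value>
  </property>
</configuration>
```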

WARNING: No release note provided for this incompatible change.

When configuring proxy users and hosts, the special wildcard value “*” may be specified to match any host or any user.
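The wildcard could be used as sketched below. This note does not name the properties involved, so the example assumes the hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups pattern in core-site.xml, and the superuser name “oozie” is purely hypothetical.

```xml
<!-- core-site.xml (assumed location); "oozie" is a hypothetical superuser -->
<configuration>
  <property>
    <!-- allow the superuser to impersonate from any host -->
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <!-- allow the superuser to impersonate any user -->
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
</configuration>
```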

Zero values for dfs.socket.timeout and dfs.datanode.socket.write.timeout are now respected. Previously, zero values for these parameters resulted in a 5-second timeout.
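For illustration, both parameters could be set to zero as follows. The file name hdfs-site.xml is an assumption.

```xml
<!-- hdfs-site.xml (assumed location): zero is now honored instead of
     being replaced by a 5-second timeout -->
<configuration>
  <property>
    <name>dfs.socket.timeout</name>
    <value>0</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>
</configuration>
```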

This change adds two new configuration parameters.

{{dfs.namenode.invalidate.work.pct.per.iteration}} for controlling deletion rate of blocks.

{{dfs.namenode.replication.work.multiplier.per.iteration}} for controlling replication rate. This in turn allows controlling the time it takes for decommissioning.

Please see hdfs-default.xml for a detailed description of both parameters.
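A sketch of how the two parameters might be overridden. The values shown are illustrative assumptions only, not recommendations, and the file name hdfs-site.xml is likewise assumed; hdfs-default.xml remains the authority on accepted values and defaults.

```xml
<!-- hdfs-site.xml (assumed location); values are illustrative only -->
<configuration>
  <property>
    <!-- controls the deletion rate of blocks -->
    <name>dfs.namenode.invalidate.work.pct.per.iteration</name>
    <value>0.32</value>
  </property>
  <property>
    <!-- controls the replication rate, and so the decommissioning time -->
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <value>2</value>
  </property>
</configuration>
```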

This jira adds a new DataNode state called “stale” at the NameNode. A DataNode is marked as stale if it does not send a heartbeat message to the NameNode within the timeout set by the configuration parameter “dfs.namenode.stale.datanode.interval” in seconds (default value is 30 seconds). When returning block locations for reads, the NameNode picks a stale DataNode as the last target to read from.

This feature is turned off by default. To turn it on, set the HDFS configuration parameter “dfs.namenode.check.stale.datanode” to true.
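A minimal sketch of turning on the stale DataNode check. The file name hdfs-site.xml is an assumption, and the interval parameter is deliberately left at its default.

```xml
<!-- hdfs-site.xml (assumed location): enable the stale DataNode check -->
<configuration>
  <property>
    <name>dfs.namenode.check.stale.datanode</name>
    <value>true</value>
  </property>
</configuration>
```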

getBlockLocations(), and hence open() for read, will now throw SafeModeException if the NameNode is still in safe mode and there are no replicas reported yet for one of the blocks in the file.

Add a utility method HdfsUtils.isHealthy(uri) for checking if the given HDFS is healthy.

The ‘namenode -format’ command now supports the flags ‘-nonInteractive’ and ‘-force’, which make it usable without user input.

This is a new feature. It is documented in hdfs_user_guide.xml.

The fsck “move” option is no longer destructive. It copies the accessible blocks of corrupt files to lost and found as before, but no longer deletes the corrupt files after copying the blocks. The original, destructive behavior can be enabled by specifying both the “move” and “delete” options.

Document and raise the maximum allowed transfer threads on a DataNode to 4096. This helps Apache HBase in particular.

Due to the requirement that KSSL use weak encryption types for Kerberos tickets, HTTP authentication to the NameNode will now use SPNEGO by default. This will require users of previous branch-1 releases with security enabled to modify their configurations and create new Kerberos principals in order to use SPNEGO. The old behavior of using KSSL can optionally be enabled by setting the configuration option “hadoop.security.use-weak-http-crypto” to “true”.
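A sketch of opting back into KSSL with the option named above. The file name core-site.xml is an assumption; the note does not say which file the option belongs in.

```xml
<!-- core-site.xml (assumed location): fall back to KSSL instead of SPNEGO -->
<configuration>
  <property>
    <name>hadoop.security.use-weak-http-crypto</name>
    <value>true</value>
  </property>
</configuration>
```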

HDFS can now use the posix_fadvise and sync_file_range syscalls to manage the OS buffer cache. This support is currently considered experimental and may be enabled by configuring the following keys:

dfs.datanode.drop.cache.behind.writes - set to true to drop data out of the buffer cache after writing.

dfs.datanode.drop.cache.behind.reads - set to true to drop data out of the buffer cache when performing sequential reads.

dfs.datanode.sync.behind.writes - set to true to trigger dirty page writeback immediately after writing data.

dfs.datanode.readahead.bytes - set to a non-zero value to trigger readahead for sequential reads.
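The four keys above could be enabled together as sketched here. The file name hdfs-site.xml and the readahead size of 4194304 bytes are illustrative assumptions, not values taken from this note.

```xml
<!-- hdfs-site.xml (assumed location): experimental buffer cache management -->
<configuration>
  <property>
    <name>dfs.datanode.drop.cache.behind.writes</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.drop.cache.behind.reads</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.sync.behind.writes</name>
    <value>true</value>
  </property>
  <property>
    <!-- any non-zero value triggers readahead; 4194304 is illustrative -->
    <name>dfs.datanode.readahead.bytes</name>
    <value>4194304</value>
  </property>
</configuration>
```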

Optionally call initialize/initializeFileSystem in JobTracker::startTracker() to allow for proper initialization when offerService is not being called.

Fixed a race condition in TestKillSubProcesses caused by a recent commit.

Fixed TestRawHistoryFile and TestJobHistoryServer to not write to /tmp.

Fixes the issue of GenerateDistCacheData job slowness.

Rumen now provides {{Parsed*}} objects. These objects provide extra information that is not provided by {{Logged*}} objects.

Backports latest features from trunk to 0.20.206 branch.

Improves cumulative CPU emulation for short running tasks.

Adds system tests to Gridmix. These system tests cover various features like job types (load and sleep), user resolvers (round-robin, submitter-user, echo) and submission modes (stress, replay and serial).

The default minimum heartbeat interval has been dropped from 3 seconds to 300ms to increase scheduling throughput on small clusters. Users may tune mapreduce.jobtracker.heartbeats.in.second to adjust this value.
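A sketch of where the tuning property named above might be set. The file name mapred-site.xml and the value 50 are assumptions made for illustration; this note does not give the property's default or its accepted range.

```xml
<!-- mapred-site.xml (assumed location); the value is illustrative only -->
<configuration>
  <property>
    <name>mapreduce.jobtracker.heartbeats.in.second</name>
    <value>50</value>
  </property>
</configuration>
```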