Apache Hadoop 1.1.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

Zero values for dfs.socket.timeout and dfs.datanode.socket.write.timeout are now respected. Previously zero values for these parameters resulted in a 5 second timeout.
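A minimal sketch of setting both timeouts to zero, assuming the properties are placed in hdfs-site.xml and that the values are given in milliseconds. Zero is conventionally read as "no timeout"; the note itself only states that zero is now respected.

```xml
<!-- hdfs-site.xml (assumed location for these keys) -->
<configuration>
  <property>
    <name>dfs.socket.timeout</name>
    <!-- 0 is now honored instead of being replaced by a 5 second timeout -->
    <value>0</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>
</configuration>
```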

The default minimum heartbeat interval has been dropped from 3 seconds to 300ms to increase scheduling throughput on small clusters. Users may tune mapreduce.jobtracker.heartbeats.in.second to adjust this value.

When configuring proxy users and hosts, the special wildcard value “*” may be specified to match any host or any user.
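As an illustration, the wildcard might be used as below. The sketch assumes the usual hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups keys in core-site.xml; the superuser name "oozie" is hypothetical.

```xml
<!-- core-site.xml; "oozie" is a hypothetical proxy (super)user -->
<configuration>
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <!-- "*" lets the proxy user connect from any host -->
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <!-- "*" lets the proxy user impersonate any user -->
    <value>*</value>
  </property>
</configuration>
```

A wildcard on both keys is very permissive, so a secured cluster would likely restrict at least one of them.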

Adds system tests to Gridmix. These system tests cover various features like job types (load and sleep), user resolvers (round-robin, submitter-user, echo) and submission modes (stress, replay and serial).

Improves cumulative CPU emulation for short running tasks.

Backports latest features from trunk to 0.20.206 branch.

HDFS now has the ability to use the posix_fadvise and sync_file_range syscalls to manage the OS buffer cache. This support is currently considered experimental, and may be enabled by configuring the following keys:

dfs.datanode.drop.cache.behind.writes - set to true to drop data out of the buffer cache after writing

dfs.datanode.drop.cache.behind.reads - set to true to drop data out of the buffer cache when performing sequential reads

dfs.datanode.sync.behind.writes - set to true to trigger dirty page writeback immediately after writing data

dfs.datanode.readahead.bytes - set to a non-zero value to trigger readahead for sequential reads
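The four keys above could be set together roughly as follows. This is a sketch only: the hdfs-site.xml placement and the 4 MB readahead value are assumptions chosen for illustration, not recommendations from the release note.

```xml
<!-- hdfs-site.xml on each DataNode (assumed location); experimental feature -->
<configuration>
  <property>
    <name>dfs.datanode.drop.cache.behind.writes</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.drop.cache.behind.reads</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.sync.behind.writes</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.readahead.bytes</name>
    <!-- illustrative value: 4 MB; any non-zero value enables readahead -->
    <value>4194304</value>
  </property>
</configuration>
```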

Rumen now provides {{Parsed*}} objects. These objects provide extra information that is not provided by {{Logged*}} objects.

Document and raise the maximum allowed transfer threads on a DataNode to 4096. This helps Apache HBase in particular.

The fsck “move” option is no longer destructive. It copies the accessible blocks of corrupt files to lost and found as before, but no longer deletes the corrupt files after copying the blocks. The original, destructive behavior can be enabled by specifying both the “move” and “delete” options.

WARNING: No release note provided for this change.

Fixes the issue of GenerateDistCacheData job slowness.

This is a new feature. It is documented in hdfs_user_guide.xml.

The ‘namenode -format’ command now supports the flags ‘-nonInteractive’ and ‘-force’ so that it can be run without user input.

WARNING: No release note provided for this change.

Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need append. If you enabled dfs.support.append for HBase, you’re OK, as durable sync (the reason HBase required dfs.support.append) is now enabled by default. If you really need the previous append functionality, set the flag “dfs.support.broken.append” to true.
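For clusters that must keep the old append behavior, the flag named above might be set like this (a sketch, assuming hdfs-site.xml is where the flag is read from):

```xml
<!-- hdfs-site.xml (assumed location) -->
<configuration>
  <property>
    <name>dfs.support.broken.append</name>
    <!-- re-enables the unsupported 1.x append functionality -->
    <value>true</value>
  </property>
</configuration>
```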

getBlockLocations(), and hence open() for read, will now throw SafeModeException if the NameNode is still in safe mode and there are no replicas reported yet for one of the blocks in the file.

Add a utility method HdfsUtils.isHealthy(uri) for checking if the given HDFS is healthy.

This patch enables durable sync by default. Installations that did not use HBase and previously ran without setting “dfs.support.append”, or with it explicitly set to false in the configuration, must add the new flag “dfs.durable.sync” and set it to false to preserve the previous semantics.
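A minimal sketch of opting out of durable sync, assuming the flag is placed in hdfs-site.xml:

```xml
<!-- hdfs-site.xml (assumed location) -->
<configuration>
  <property>
    <name>dfs.durable.sync</name>
    <!-- false restores the pre-1.1.0 semantics for non-HBase installations -->
    <value>false</value>
  </property>
</configuration>
```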

WARNING: No release note provided for this change.

Due to the requirement that KSSL use weak encryption types for Kerberos tickets, HTTP authentication to the NameNode will now use SPNEGO by default. This will require users of previous branch-1 releases with security enabled to modify their configurations and create new Kerberos principals in order to use SPNEGO. The old behavior of using KSSL can optionally be enabled by setting the configuration option “hadoop.security.use-weak-http-crypto” to “true”.
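Sites that cannot yet move to SPNEGO could restore KSSL with a fragment along these lines. The core-site.xml placement is an assumption; only the option name and value come from the note.

```xml
<!-- core-site.xml (assumed location) -->
<configuration>
  <property>
    <name>hadoop.security.use-weak-http-crypto</name>
    <!-- true falls back to KSSL, which relies on weak Kerberos encryption types -->
    <value>true</value>
  </property>
</configuration>
```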

This change adds two new configuration parameters.

{{dfs.namenode.invalidate.work.pct.per.iteration}} for controlling deletion rate of blocks.

{{dfs.namenode.replication.work.multiplier.per.iteration}} for controlling replication rate. This in turn allows controlling the time it takes for decommissioning.

Please see hdfs-default.xml for detailed description.
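The two parameters might be tuned as sketched below. The values are illustrative assumptions only (a fraction for the invalidation percentage, a positive integer for the replication multiplier); hdfs-default.xml remains the authority on defaults and exact semantics.

```xml
<!-- hdfs-site.xml on the NameNode (assumed location); values are illustrative -->
<configuration>
  <property>
    <name>dfs.namenode.invalidate.work.pct.per.iteration</name>
    <!-- controls the block deletion rate -->
    <value>0.32</value>
  </property>
  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <!-- controls the replication rate, and so the time decommissioning takes -->
    <value>2</value>
  </property>
</configuration>
```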

This change adds a new DataNode state called “stale” at the NameNode. A DataNode is marked as stale if it does not send a heartbeat message to the NameNode within the timeout configured using the configuration parameter “dfs.namenode.stale.datanode.interval” in seconds (default value is 30 seconds). The NameNode picks a stale DataNode as the last target to read from when returning block locations for reads.

This feature is turned off by default. To turn on the feature, set the HDFS configuration “dfs.namenode.check.stale.datanode” to true.
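Enabling the stale-DataNode check could look like this (a sketch, assuming the key is set in hdfs-site.xml on the NameNode; the interval is left at its default):

```xml
<!-- hdfs-site.xml on the NameNode (assumed location) -->
<configuration>
  <property>
    <name>dfs.namenode.check.stale.datanode</name>
    <!-- stale DataNodes are then returned last in block locations for reads -->
    <value>true</value>
  </property>
</configuration>
```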

Fixed TestRawHistoryFile and TestJobHistoryServer to not write to /tmp.

Fixed a race condition in TestKillSubProcesses caused by a recent commit.

Optionally call initialize/initializeFileSystem in JobTracker::startTracker() to allow for proper initialization when offerService is not being called.