Apache Hadoop 2.0.2-alpha Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.


WARNING: No release note provided for this incompatible change.


The trash emptier may no longer be run using “hadoop org.apache.hadoop.fs.Trash”. The trash emptier runs on the NameNode (if configured). Old trash checkpoints may be deleted using “hadoop fs -expunge”.


DistCp skips the CRC check on 0-byte files.


If fs.trash.interval is configured on the server then the client’s value for this configuration is ignored.


FsShell’s “mkdir” no longer implicitly creates all non-existent parent directories. The command adopts the POSIX-compliant behavior of requiring the “-p” flag to auto-create parent directories.
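The difference between the two behaviors can be illustrated on a local filesystem. The sketch below is only an analogy and does not call FsShell: os.mkdir stands in for plain mkdir, and os.makedirs stands in for mkdir with the -p flag. The helper name make_dir and the directory names are hypothetical.

```python
import os
import tempfile


def make_dir(path, parents=False):
    """Local analogy: parents=False acts like mkdir, parents=True like mkdir -p."""
    if parents:
        os.makedirs(path, exist_ok=True)
    else:
        os.mkdir(path)  # raises FileNotFoundError when a parent directory is missing


with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "a", "b", "c")  # parents "a" and "a/b" do not exist yet
    try:
        make_dir(nested)
        created_without_flag = True
    except FileNotFoundError:
        created_without_flag = False
    make_dir(nested, parents=True)
    created_with_flag = os.path.isdir(nested)

print(created_without_flag, created_with_flag)
```

Scripts that relied on the old implicit behavior would presumably need to pass “-p” explicitly after this change.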


Merged the change to branch-2


WARNING: No release note provided for this incompatible change.


WARNING: No release note provided for this incompatible change.


WARNING: No release note provided for this incompatible change.


Resolved a sporadic DistCp issue caused by having two DistCp classes (v1 and v2) on the classpath.


Improved exception handling when shutting down the web server. (Devaraj K via Eric Yang)


This is an incompatible change: Before this change, if a file is already open for write by one client, and another client calls fs.create() with overwrite=true, an AlreadyBeingCreatedException is thrown. After this change, the file will be deleted and the new file will be created successfully.


The datanode now performs 4MB readahead by default when reading data from its disks, if the native libraries are present. This has been shown to improve performance in many workloads. The feature may be disabled by setting dfs.datanode.readahead.bytes to “0”.
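As a rough sketch of how the readahead override might be written out, the snippet below renders a Hadoop-style configuration fragment with Python's standard library. The helper name render_site_xml is hypothetical, and placing the property in hdfs-site.xml on the datanodes is an assumption based on the dfs. prefix.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of settings as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# A value of 0 disables datanode readahead, as described in the note above.
xml_text = render_site_xml({"dfs.datanode.readahead.bytes": 0})
print(xml_text)
```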


WARNING: No release note provided for this incompatible change.


getBlockLocations(), and hence open() for read, will now throw SafeModeException if the NameNode is still in safe mode and there are no replicas reported yet for one of the blocks in the file.


Add a utility method HdfsUtils.isHealthy(uri) for checking if the given HDFS is healthy.


This change adds two new configuration parameters.

dfs.namenode.invalidate.work.pct.per.iteration for controlling the deletion rate of blocks.

dfs.namenode.replication.work.multiplier.per.iteration for controlling the replication rate. This in turn allows controlling the time it takes for decommissioning.

Please see hdfs-default.xml for detailed descriptions.


HDFS no longer silently ignores missing or unreadable host files specified by dfs.hosts or dfs.hosts.exclude. In order to specify that no hosts should be included or excluded, administrators should either refrain from setting the relevant config properties, or create an empty file in order to represent an empty list.
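An administrator might want to check the host files before restarting the NameNode. The following is a minimal sketch of such a pre-flight check, not HDFS code: the function name read_host_list is hypothetical, and the file format (host names separated by whitespace) is an assumption.

```python
def read_host_list(path):
    """Return the hosts named in a dfs.hosts or dfs.hosts.exclude file.

    An unset property (None or "") and an empty file both mean "empty list".
    A missing or unreadable file raises OSError instead of being ignored.
    """
    if not path:
        return []
    with open(path) as handle:  # FileNotFoundError / PermissionError propagate
        return handle.read().split()  # assumption: whitespace-separated host names
```

Calling read_host_list with the configured path fails loudly on a missing file, which mirrors the stricter behavior described above.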


WARNING: No release note provided for this incompatible change.


libhdfs is enhanced to read directly into user-supplied buffers when possible, reducing the number of memory copies.


Introduced a new command, “hdfs dfsadmin -rollEdits” which requests that the active NameNode roll its edit log. This can be useful for administrators manually backing up log segments.


libhdfs now uses the server block size configuration rather than the deprecated dfs.block.size client configuration.


This JIRA removes functionality that has not been used or applicable since release 0.17. The incompatibility introduced by this change will not affect any HDFS users.


Due to the requirement that KSSL use weak encryption types for Kerberos tickets, HTTP authentication to the NameNode will now use SPNEGO by default. This will require users of previous branch-1 releases with security enabled to modify their configurations and create new Kerberos principals in order to use SPNEGO. The old behavior of using KSSL can optionally be enabled by setting the configuration option “hadoop.security.use-weak-http-crypto” to “true”.


WARNING: No release note provided for this incompatible change.


WARNING: No release note provided for this incompatible change.


-Djava.library.path in mapred.child.java.opts can cause issues with native libraries. LD_LIBRARY_PATH through mapred.child.env should be used instead.


The Job Summary log may contain commas in values that are escaped by a '\' character. This was true before, but is more likely to be exposed now.
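Tools that split Job Summary lines on every comma may therefore break. Below is a minimal parsing sketch, assuming the escape character is a backslash and that a summary line is a comma-separated list of key=value fields. The function names and the sample line are hypothetical.

```python
ESC = "\\"


def split_unescaped(text, sep, maxsplit=-1):
    """Split on separators not preceded by the escape character (escapes kept)."""
    parts, cur, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == ESC and i + 1 < len(text):
            cur.append(text[i:i + 2])  # keep the escape pair intact for now
            i += 2
        elif ch == sep and maxsplit != 0:
            parts.append("".join(cur))
            cur = []
            maxsplit -= 1
            i += 1
        else:
            cur.append(ch)
            i += 1
    parts.append("".join(cur))
    return parts


def unescape(text):
    """Drop each escape character and keep the character that follows it."""
    out, i = [], 0
    while i < len(text):
        if text[i] == ESC and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_summary(line):
    """Parse one summary line into a dict of unescaped keys and values."""
    result = {}
    for field in split_unescaped(line, ","):
        pieces = split_unescaped(field, "=", 1)
        value = unescape(pieces[1]) if len(pieces) > 1 else ""
        result[unescape(pieces[0])] = value
    return result


# Hypothetical sample line; the job name contains an escaped comma.
sample = r"jobId=job_1,jobName=word\,count,queue=default"
print(parse_summary(sample))
```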


ContainerTokens now have an expiry interval so that stale tokens cannot be used for launching containers.


Fixed NodeManagers’ decommissioning at the RM to also accept IP addresses.


Removes two sets of previously available config properties:

  1. yarn.scheduler.fifo.minimum-allocation-mb and yarn.scheduler.fifo.maximum-allocation-mb, and
  2. yarn.scheduler.capacity.minimum-allocation-mb and yarn.scheduler.capacity.maximum-allocation-mb

In favor of two new, generically named properties:

  1. yarn.scheduler.minimum-allocation-mb - This acts as the floor value of memory resource requests for containers.
  2. yarn.scheduler.maximum-allocation-mb - This acts as the ceiling value of memory resource requests for containers.

Both these properties need to be set at the ResourceManager (RM) to take effect, as the RM is where the scheduler resides.

Also changes the default minimum and maximum to 128 MB and 10 GB, respectively.
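The floor and ceiling semantics can be sketched as a simple clamp. This is an illustration of the description above, not the scheduler's implementation, which may apply further normalization. The function name is hypothetical, and 10240 assumes 10 GB is expressed as 10 * 1024 MB.

```python
DEFAULT_MINIMUM_MB = 128     # new default for yarn.scheduler.minimum-allocation-mb
DEFAULT_MAXIMUM_MB = 10240   # assumption: 10 GB expressed as 10 * 1024 MB


def clamp_memory_request(requested_mb,
                         minimum_mb=DEFAULT_MINIMUM_MB,
                         maximum_mb=DEFAULT_MAXIMUM_MB):
    """Raise requests below the floor and lower requests above the ceiling."""
    return max(minimum_mb, min(requested_mb, maximum_mb))


for requested in (64, 1024, 20000):
    print(requested, "->", clamp_memory_request(requested))
```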


To apply this change, first run the script ./MAPREDUCE-3543v3.sh svn, then apply the patch.

If this is merged to more than trunk, the version inside hadoop-tools/hadoop-gridmix/pom.xml will need to be updated accordingly.


Fixed a bug in MR client to redirect to JobHistoryServer correctly when RM forgets the app.