Apache Hadoop 0.23.1 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

Added the integration artifacts “org.apache.hadoop:hadoop-client” and “org.apache.hadoop:hadoop-minicluster”. They contain all the jars needed to use the Hadoop client APIs and to run Hadoop MiniClusters, respectively. These artifacts are pushed to the Maven repository by mvn deploy, along with the existing artifacts.

Added configuration for the MapReduce History Server protocol to hadoop-policy.xml for service-level authorization.

Fixed ViewFS to handle a null canonical service name, so that the TestViewFileSystem* tests pass.

Fixed Configuration.getClasses() API to return the default value if the key is not set.
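
The rule this fix restores can be modeled with a small lookup helper. This is a hypothetical sketch, not Hadoop's Configuration class: get_classes and the plain dict standing in for the configuration are illustrative assumptions. It shows only that an unset key yields the caller's default instead of an empty or null result.

```python
def get_classes(conf, key, default):
    """Hypothetical stand-in for Configuration.getClasses(): return the
    comma-separated class names stored under key, or default if key is unset."""
    raw = conf.get(key)
    if raw is None:
        # Key not set: fall back to the caller-supplied default.
        return list(default)
    return [name.strip() for name in raw.split(",") if name.strip()]


# Usage with a plain dict standing in for the configuration.
conf = {"my.plugin.classes": "org.example.A, org.example.B"}
print(get_classes(conf, "my.plugin.classes", []))
print(get_classes(conf, "unset.key", ["org.example.Default"]))
```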

The ‘fs -getmerge’ tool now uses a -nl flag to determine whether a newline is added at the end of each file, replacing the ‘addnl’ boolean flag used earlier.
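
The effect of -nl on the merged output can be modeled in a few lines. This is a hypothetical sketch of the semantics only (get_merge is not a Hadoop function). It concatenates the sources in order and, when the flag is set, appends a newline after each one.

```python
def get_merge(file_contents, add_newline=False):
    """Hypothetical model of 'fs -getmerge': concatenate the source files in
    order; with add_newline (the -nl flag), append a newline after each file."""
    merged = bytearray()
    for data in file_contents:
        merged += data
        if add_newline:
            merged += b"\n"
    return bytes(merged)


parts = [b"alpha", b"beta"]
print(get_merge(parts))        # without -nl
print(get_merge(parts, True))  # with -nl
```

The newline matters when the source files do not end with one: without -nl, the last line of one file and the first line of the next are joined.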

Provided WebHDFS as a complete FileSystem implementation for accessing HDFS over HTTP. The previous HFTP feature was a read-only FileSystem and did not provide write access.
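
WebHDFS exposes FileSystem operations as HTTP requests under the /webhdfs/v1 path prefix. An op query parameter selects the operation, for example OPEN, CREATE, MKDIRS, or LISTSTATUS. The helper below only builds such URLs and sends nothing. The host name is a placeholder, 50070 is assumed to be the NameNode HTTP port (the usual default), and user.name is the parameter that identifies the caller when security is off.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build http://<host>:<port>/webhdfs/v1<path>?op=<OP>[&...]."""
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = urlencode({"op": op, **params})
    # quote() leaves '/' unescaped by default, so path separators survive.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode host; 50070 assumed to be the NameNode HTTP port.
print(webhdfs_url("namenode.example.com", 50070, "/user/alice/data.txt", "OPEN"))
print(webhdfs_url("namenode.example.com", 50070, "/user/alice/new", "MKDIRS",
                  **{"user.name": "alice"}))
```

Write operations such as CREATE are sent as HTTP PUT requests, and the NameNode redirects them to a DataNode. A client has to follow that redirect, which this URL builder does not model.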

HDFS clients on the same host as a DataNode can now read block files directly from the local file system (short-circuit read):

  1. New configurations:
     a. dfs.block.local-path-access.user is the key in the DataNode configuration that specifies the user allowed to perform short-circuit reads.
     b. dfs.client.read.shortcircuit is the client-side configuration key that enables short-circuit reads.
     c. dfs.client.read.shortcircuit.skip.checksum is the client-side key that bypasses checksum verification.
  2. By default, none of the above are enabled, and short-circuit reads do not take effect.
  3. If security is enabled, only a user with Kerberos credentials at the client can use the feature. MapReduce tasks therefore generally cannot benefit from it.
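
The three keys are split between the DataNode configuration and the client configuration. The sketch below renders each group in the configuration/property XML layout that Hadoop's *-site.xml files use. The user name hbase is hypothetical. Placing these entries in hdfs-site.xml is typical but is an assumption you should check against your deployment.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(props):
    """Render a dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# DataNode side: "hbase" is a hypothetical user allowed to read block files directly.
datanode_props = {"dfs.block.local-path-access.user": "hbase"}

# Client side: turn short-circuit read on but keep checksum verification.
client_props = {
    "dfs.client.read.shortcircuit": "true",
    "dfs.client.read.shortcircuit.skip.checksum": "false",
}

print(hadoop_config_xml(datanode_props))
print(hadoop_config_xml(client_props))
```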

The default checksum algorithm used on HDFS is now CRC32C. Data written by previous versions of Hadoop can still be read.
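
CRC32C differs from the earlier CRC32 only in its generator polynomial: Castagnoli instead of IEEE. Both use the same reflected bit order and the same 0xFFFFFFFF initial value and final XOR. The bitwise implementation below is a slow but easily checked sketch, not Hadoop's code. The per-chunk helper reflects the fact that HDFS stores one checksum per fixed-size chunk of data, 512 bytes by default.

```python
import zlib

CASTAGNOLI_REFLECTED = 0x82F63B78  # CRC-32C polynomial, bit-reversed form


def crc32c(data: bytes) -> int:
    """Bitwise (table-free) CRC-32C."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CASTAGNOLI_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def chunk_checksums(data: bytes, bytes_per_checksum: int = 512):
    """One CRC-32C per fixed-size chunk; the last chunk may be shorter."""
    return [crc32c(data[i:i + bytes_per_checksum])
            for i in range(0, len(data), bytes_per_checksum)]


msg = b"123456789"  # the standard CRC check string
print(hex(crc32c(msg)))      # 0xe3069283 (CRC-32C, Castagnoli)
print(hex(zlib.crc32(msg)))  # 0xcbf43926 (CRC-32, IEEE)
```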

BlockReader has been reimplemented to use direct byte buffers. If you use a custom socket factory, it must generate sockets that have associated Channels.

The block size property ‘dfs.blocksize’ now accepts unit suffixes in place of a plain byte count. Values such as “10k”, “128m”, and “1g” can now be given, where previously only a number of bytes was accepted.
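
The suffix parsing can be sketched as follows. This assumes binary multipliers (k = 1024, m = 1024², and so on), which is how Hadoop's traditional size prefixes are usually interpreted. parse_size is a hypothetical helper, and it accepts only integer amounts.

```python
# Binary (powers of 1024) multipliers, assumed to match Hadoop's traditional prefixes.
_PREFIXES = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30,
             "t": 1 << 40, "p": 1 << 50, "e": 1 << 60}


def parse_size(value: str) -> int:
    """Convert '128m', '1g', or a plain byte count to a number of bytes."""
    s = value.strip().lower()
    if not s:
        raise ValueError("empty size")
    multiplier = _PREFIXES.get(s[-1])
    if multiplier is None:
        return int(s)  # plain byte count, accepted as before
    return int(s[:-1]) * multiplier


for v in ["10k", "128m", "1g", "134217728"]:
    print(v, parse_size(v))
```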

Fixed and reenabled the tests in TestMiniMRChildTask related to the environment variables of MR child JVMs.

Fixed MR AM hangs during AM restart and the subsequent recovery.

Changed the MR AM to not add the same rack entry multiple times to the container request table when multiple hosts for a split happen to be on the same rack.
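
The deduplication can be illustrated with a hypothetical helper. The function name and the host-to-rack mapping are assumptions for illustration, not the AM's data structures. The helper derives one rack entry per distinct rack from a split's host list and keeps first-seen order.

```python
def locality_requests(split_hosts, rack_of):
    """Hypothetical helper: hosts and racks to request for one split,
    each listed once, in first-seen order."""
    hosts = list(dict.fromkeys(split_hosts))
    # Several hosts may share a rack; dict.fromkeys drops repeated racks.
    racks = list(dict.fromkeys(rack_of[h] for h in hosts))
    return hosts, racks


topology = {"host1": "/rack1", "host2": "/rack1", "host3": "/rack2"}
print(locality_requests(["host1", "host2", "host3"], topology))
```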

Fixed the MR AM to always use hostnames, never IP addresses, when requesting containers, so that the scheduler can correctly allocate data-local containers.

Fixed an NPE in FileOutputCommitter for jobs with maps but no reduces.

Fixed a cross-site scripting (XSS) vulnerability in the web application interface.

Added a test to validate that, after MAPREDUCE-3846, the AM can crash multiple times and still recover successfully.

Fixed the CapacityScheduler so that the per-queue maxActiveApplications and maxActiveApplicationsPerUser limits are not too low for small clusters.

MAPREDUCE-3774. Moved yarn-default.xml to hadoop-yarn-common from hadoop-server-common.

Changed the active nodes list in the web UI and metrics to exclude unhealthy nodes.

Modified the RM UI to filter applications based on their state.

Modified application limits to include queue max-capacities besides the usual user limits.

Modified CapacityScheduler to use only users with pending requests for computing user-limits.

Changed bin/mapred job -list to no longer print job-specific information that is not available at the RM.

This is a very minor incompatibility in the command-line output, unavoidable due to the MRv2 architecture.

Fixed YARN and MR so that MR jobs can use java.io.File.createTempFile to create temporary files as part of their tasks.

Fixed the EventFetcher and Fetcher threads to shut down properly, so that reducers do not hang in corner cases.

Fixed the way the CapacityScheduler allocates headroom to applications, so that it deducts current usage per user rather than per application.
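
A simplified numeric model shows why the distinction matters. It assumes headroom is just a user limit minus memory in use, and it ignores queue capacities and other limits. Both functions are hypothetical, and the usage figures are made up. When one user runs several applications, subtracting only the current application's usage overstates the headroom.

```python
def headroom_per_app(user_limit_mb, usage, app_id):
    """Old behavior (simplified): subtract only this application's usage."""
    return max(0, user_limit_mb - usage[app_id][1])


def headroom_per_user(user_limit_mb, usage, app_id):
    """Fixed behavior (simplified): subtract everything the app's user is using."""
    user = usage[app_id][0]
    used = sum(mem for owner, mem in usage.values() if owner == user)
    return max(0, user_limit_mb - used)


# app id -> (user, memory in use in MB); all values are made up.
usage = {"app1": ("alice", 2048), "app2": ("alice", 3072), "app3": ("bob", 1024)}
print(headroom_per_app(8192, usage, "app1"))   # 6144: ignores alice's app2
print(headroom_per_user(8192, usage, "app1"))  # 3072
```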

Fixed MR AM recovery so that only the single selected task output is recovered, reducing the unnecessarily long recovery time.

Improved FileInputFormat to return better locality for the last split.

Added a new JMX bean in the ResourceManager that provides the list of live NodeManagers:

Hadoop:service=ResourceManager,name=RMNMInfo (attribute: LiveNodeManagers)
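
Hadoop daemons generally also publish their JMX beans as JSON through a /jmx HTTP endpoint, which accepts a qry parameter naming a bean. The sketch below assumes that the ResourceManager's web address serves this endpoint; rm.example.com:8088 is a placeholder. It builds the query URL and extracts LiveNodeManagers from a response. The exact JSON shape of the attribute is also an assumption, so the parser accepts either a JSON-encoded string or an already-decoded list.

```python
import json

RMNMINFO_BEAN = "Hadoop:service=ResourceManager,name=RMNMInfo"


def rmnminfo_url(rm_web_address):
    """URL that asks the /jmx endpoint for just the RMNMInfo bean."""
    return f"http://{rm_web_address}/jmx?qry={RMNMINFO_BEAN}"


def live_node_managers(jmx_response_text):
    """Extract LiveNodeManagers from a /jmx JSON response ({"beans": [...]})."""
    doc = json.loads(jmx_response_text)
    for bean in doc.get("beans", []):
        if bean.get("name") == RMNMINFO_BEAN:
            value = bean.get("LiveNodeManagers", [])
            # Assumption: the attribute may arrive as a JSON-encoded string.
            return json.loads(value) if isinstance(value, str) else value
    return []


print(rmnminfo_url("rm.example.com:8088"))  # placeholder host and port
```

To query a live cluster, pass the body returned by urllib.request.urlopen(rmnminfo_url(...)).read().decode() to live_node_managers.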

Increased the number of RPC handlers for all YARN servers to values suitable for running at scale.

Fixed a race condition in the MR AM that was causing the sort benchmark to fail consistently.

Made the CapacityScheduler more conservative, so that it assigns only one off-switch container per scheduling iteration.
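
The rule can be shown with a toy model. It assumes a simplified scheduler that walks pending requests in priority order, with each request labeled by its locality. The function name and labels are hypothetical, and the real CapacityScheduler's queue hierarchy is not modeled. Ending the pass right after an off-switch assignment is one way to enforce the limit. Remaining requests wait for a later iteration.

```python
def schedule_iteration(requests, free_slots):
    """Toy scheduling pass over pending requests, in priority order.

    Assignment stops when slots run out or right after one off-switch
    container has been assigned; anything left waits for a later iteration.
    """
    assigned = []
    for locality in requests:
        if free_slots == 0:
            break
        assigned.append(locality)
        free_slots -= 1
        if locality == "off-switch":
            break  # at most one off-switch container per iteration
    return assigned


pending = ["node-local", "rack-local", "off-switch", "off-switch", "node-local"]
print(schedule_iteration(pending, free_slots=10))
```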

Fixed TokenCache to work with absent FileSystem canonical service-names.

Fixed TaskHeartbeatHandler to not hold a global lock for all task-updates.

Rumen now provides Parsed* objects. These objects provide extra information that is not available from Logged* objects.

Modified CompositeService to avoid duplicate stop operations thereby solving race conditions in MR AM shutdown.

Optimized job progress calculations in the MR AM.

Fixed failures in TestStagingCleanup and TestJobEndNotifier tests.

Modified the NM to report the correct HTTP address when an ephemeral web port is configured.

Fixed an NPE occurring during scheduling in the ResourceManager.

Fixed TaskHeartbeatHandler to use a new configuration property for the thread loop interval, separate from the task-timeout property.

Fixed a deadlock in the NodeManager’s local-directories handling service.

Batched JobHistory flushes to DFS so that the AM no longer flushes on every event, which was slowing it down.
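
The idea can be illustrated with a minimal sketch of count-based batching. It assumes events can be buffered in memory until a batch fills. The class and its max_batch parameter are hypothetical. A real writer would also need to flush on shutdown or on a timer so that buffered events are not lost.

```python
class BatchingWriter:
    """Buffer events and hand them to sink in batches, not once per event."""

    def __init__(self, sink, max_batch):
        self.sink = sink            # callable that receives a list of events
        self.max_batch = max_batch  # flush once this many events are buffered
        self.buffer = []
        self.flushes = 0

    def write(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        # Emit whatever is buffered; an empty buffer costs nothing.
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()
            self.flushes += 1


batches = []
writer = BatchingWriter(batches.append, max_batch=3)
for i in range(7):
    writer.write(i)
writer.flush()  # drain the partial batch, e.g. at job end
print(batches, writer.flushes)
```

Seven events cause three flushes here instead of seven.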

Removed many cloned and duplicate counters in the AM, thereby reducing the AM heap size and preventing full GCs.

Fixed the MapReduce AM to also count failed maps toward reduce ramp-up.

Fixed the JobHistory web UI to display links to each individual task’s counters page.

Fixed JobEndNotifier to not get interrupted before completing all its retries.

Modified the Gridmix STRESS mode locking structure. The submitter thread and the polling thread now run concurrently without blocking each other.

Fixed failing JUnit tests in Gridmix.

Fixed MR AM in uber mode to write map intermediate outputs in the correct directory to work properly in secure mode.

Fixed job-access-controls to work with MR AM and JobHistoryServer web-apps.

Fixed ‘ant docs’ by removing stale references to the capacity-scheduler docs.

Fixed pom files to refer to the correct MR app-jar needed by the integration tests.

Corrected the MR AM to honor the speculative-execution configuration, so that speculation can be enabled for either maps or reduces.

Modified ContainerLocalizer to send a heartbeat to NM immediately after downloading a resource instead of always waiting for a second.

Fixed log aggregation to work correctly in secure mode. Contributed by Siddharth Seth.

Fixed the Cluster.getDelegationToken API to return null when there is no supported token.

Fixed the AM’s tracking URL to always go through the proxy, even before the job starts, so that it works properly with Oozie throughout job execution.

Enhanced MR AM to use a proxy to ping the job-end notification URL.

Fixed LocalResourceTracker in NodeManager to remove deleted cache entries correctly.

Added system tests to test the memory emulation feature in Gridmix.

Fixed ant test compilation.

Added information about lost/rebooted/decommissioned nodes on the webapps.

Fixed MR AM’s ContainerLauncher to handle node-command timeouts correctly.

Unsuccessful tasks now log hostname and rackname to job history.

Fixed a race condition in ResourceManager that was causing TestContainerManagerSecurity to fail sometimes.

Fixed JobHistoryServer to also show the job’s queue name.

Fixed the MR AM to stop honoring node blacklisting once the number of blacklisted nodes crosses a threshold.
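
A toy model of the threshold rule follows. It assumes the threshold is expressed as a percentage of the cluster's nodes. The function name and the value 33 are hypothetical, not the AM's actual configuration. While the blacklisted share stays at or below the threshold, blacklisted nodes are skipped. Once the share crosses the threshold, the blacklist is ignored so that the job is not starved of nodes.

```python
def usable_nodes(all_nodes, blacklisted, ignore_threshold_percent):
    """Toy model: nodes the AM may request containers on."""
    if not all_nodes:
        return []
    bad = set(blacklisted) & set(all_nodes)
    percent_blacklisted = 100.0 * len(bad) / len(all_nodes)
    if percent_blacklisted > ignore_threshold_percent:
        # Too much of the cluster is blacklisted: stop honoring the blacklist.
        return list(all_nodes)
    return [n for n in all_nodes if n not in bad]


nodes = ["n1", "n2", "n3", "n4"]
print(usable_nodes(nodes, {"n1"}, 33))        # 25% blacklisted: honored
print(usable_nodes(nodes, {"n1", "n2"}, 33))  # 50% blacklisted: ignored
```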

Fixed bugs in the ContainerLauncher of the MR AppMaster that caused per-container connections to NodeManagers to linger long enough to hit the ulimit on the number of processes.

Modified MR AM to not send a stop-container request for a container that isn’t launched at all.

Added an AMInfo table to the MR AM job pages that lists all job attempts when the AM restarts and recovers.

Moved log related components into yarn-common so that HistoryServer and clients can use them without depending on the yarn-server-nodemanager module.

Removed the unnecessary job user-name configuration in mapred-site.xml.

Fixed a bug in TestSubmitJob.

Reenabled and fixed bugs in the failing test TestDelegationToken.

Reenabled and fixed bugs in the failing ant test TestAuditLogger.

Reenabled and fixed bugs in the failing test TestNoJobSetupCleanup.

Changed NodeManager to fail fast when LinuxContainerExecutor has wrong configuration or permissions.

Fixed a bug in TestUserResolve.

Added support for web services in the YARN and MR components.

Fixed bugs in ExecutionSummarizer and ResourceUsageMatcher.

DistCpV2 added to hadoop-tools.

Added system tests for the CPU emulation feature in Gridmix3.

Added an anonymizer tool to Rumen. Anonymizer takes a Rumen trace file and/or topology as input. It supports persistence and plugins to override the default behavior.