Apache Hadoop 0.23.1 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

Fixed bugs in ExecutionSummarizer and ResourceUsageMatcher.

Fixed a bug in TestUserResolve.

The default checksum algorithm used on HDFS is now CRC32C. Data from previous versions of Hadoop can still be read backwards-compatibly.
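
For readers unfamiliar with CRC32C, the following is a minimal bitwise sketch of the checksum (the Castagnoli polynomial in reflected form). It only illustrates the algorithm; it is not Hadoop's implementation, and the function name is chosen here for the example.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# Standard check input for CRC algorithms: the ASCII string "123456789".
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Production code would normally use a table-driven or hardware-accelerated variant rather than this bit-by-bit loop.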

BlockReader has been reimplemented to use direct byte buffers. If you use a custom socket factory, it must generate sockets that have associated Channels.

Moved log-related components into yarn-common so that HistoryServer and clients can use them without depending on the yarn-server-nodemanager module.

Fixed a bug in TestSubmitJob.

Reenabled and fixed bugs in the failing test TestNoJobSetupCleanup.

Reenabled and fixed bugs in the failing test TestDelegationToken.

Reenabled and fixed bugs in the failing ant test TestAuditLogger.

Fixed JobHistoryServer to also show the job’s queue name.

Fixed a race condition in ResourceManager that was causing TestContainerManagerSecurity to fail sometimes.

Fixed ant test compilation.

Added system tests for the CPU emulation feature in Gridmix3.

Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes.

Removed the unnecessary job user-name configuration in mapred-site.xml.

Fixed Cluster’s getDelegationToken API to return null when there isn’t a supported token.

Fixed LocalResourceTracker in NodeManager to remove deleted cache entries correctly.

Documentation changes only.

Added system tests to test the memory emulation feature in Gridmix.

Changed NodeManager to fail fast when LinuxContainerExecutor has wrong configuration or permissions.

Fixed MR AM’s ContainerLauncher to handle node-command timeouts correctly.

Fixed pom files to refer to the correct MR app-jar needed by the integration tests.


Fixes ‘ant docs’ by removing stale references to capacity-scheduler docs.

  1. New configurations:
     a. dfs.block.local-path-access.user is the key in the datanode configuration that specifies the user allowed to do short circuit reads.
     b. dfs.client.read.shortcircuit is the key in the client-side configuration that enables short circuit reads.
     c. dfs.client.read.shortcircuit.skip.checksum is the key in the client-side configuration that bypasses the checksum check.
  2. By default none of the above are enabled and short circuit read will not kick in.
  3. If security is on, the feature can be used only by users that have Kerberos credentials at the client; therefore MapReduce tasks cannot benefit from it in general.
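
As a rough illustration of how the three keys above might be set, the sketch below generates Hadoop-style `<configuration>` fragments for the datanode and the client. The helper name, the user name "hbase", and the chosen values are assumptions made for this example, not recommended settings.

```python
import xml.etree.ElementTree as ET


def build_properties(props: dict) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Datanode side: the user allowed to do short circuit reads ("hbase" is hypothetical).
datanode_xml = build_properties({"dfs.block.local-path-access.user": "hbase"})

# Client side: enable short circuit reads but keep checksum verification on.
client_xml = build_properties({
    "dfs.client.read.shortcircuit": "true",
    "dfs.client.read.shortcircuit.skip.checksum": "false",
})

print(datanode_xml)
print(client_xml)
```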

Fixed Configuration.getClasses() API to return the default value if the key is not set.

Fixed a deadlock in the NodeManager’s LocalDirectories handling service.

Added support for web services in YARN and MR components.

Fixed MR AM in uber mode to write map intermediate outputs in the correct directory to work properly in secure mode.

Fixed log aggregation to work correctly in secure mode. Contributed by Siddharth Seth.

Fixed an NPE occurring during scheduling in the ResourceManager.

Fixed JobEndNotifier to not get interrupted before completing all its retries.

Fixed JobHistory web-UI to display links to each task’s counters page.

Fixed failures in TestStagingCleanup and TestJobEndNotifier tests.

Added an anonymizer tool to Rumen. Anonymizer takes a Rumen trace file and/or topology as input. It supports persistence and plugins to override the default behavior.

Fixed AM’s tracking URL to always go through the proxy, even before the job started, so that it works properly with oozie throughout the job execution.

Fixed MR AM to stop considering node blacklisting after the number of nodes blacklisted crosses a threshold.
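
The threshold check described above could look roughly like the sketch below. The function name, the percentage-based threshold, and the value 33 are assumptions for illustration; the note itself does not say how the threshold is expressed.

```python
def should_ignore_blacklisting(blacklisted: int, total_nodes: int,
                               threshold_percent: float) -> bool:
    """Return True once the share of blacklisted nodes reaches the threshold."""
    if total_nodes <= 0:
        return False  # nothing known about the cluster yet
    return 100.0 * blacklisted / total_nodes >= threshold_percent


# With a hypothetical 33% threshold on a 10-node cluster:
print(should_ignore_blacklisting(2, 10, 33.0))  # False: blacklist still honored
print(should_ignore_blacklisting(4, 10, 33.0))  # True: blacklist is ignored
```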

Unsuccessful tasks now log hostname and rackname to job history.

Modified CompositeService to avoid duplicate stop operations thereby solving race conditions in MR AM shutdown.
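
One common way to avoid duplicate stop operations is to guard the shutdown logic with a "stopped" flag, as in the sketch below. The class names are hypothetical stand-ins rather than the actual Hadoop classes, and stopping children in reverse order is an assumption of this example.

```python
import threading


class SketchService:
    """Hypothetical service whose stop() runs its shutdown logic at most once."""

    def __init__(self, name, log):
        self.name = name
        self._log = log
        self._lock = threading.Lock()
        self._stopped = False

    def stop(self):
        with self._lock:
            if self._stopped:
                return  # a duplicate stop is a no-op
            self._stopped = True
        self._do_stop()

    def _do_stop(self):
        self._log.append("stopped " + self.name)


class SketchCompositeService(SketchService):
    """Stops its children in reverse order, then itself, exactly once."""

    def __init__(self, name, log, children):
        super().__init__(name, log)
        self._children = list(children)

    def _do_stop(self):
        for child in reversed(self._children):
            child.stop()
        super()._do_stop()


log = []
a, b = SketchService("a", log), SketchService("b", log)
parent = SketchCompositeService("parent", log, [a, b])
parent.stop()
parent.stop()  # second call does nothing
b.stop()       # already stopped via the parent
print(log)     # ['stopped b', 'stopped a', 'stopped parent']
```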

Rumen now provides Parsed* objects. These objects provide extra information that is not provided by Logged* objects.

Modified ContainerLocalizer to send a heartbeat to NM immediately after downloading a resource instead of always waiting for a second.

Optimized Job’s progress calculations in MR AM.

The ‘fs -getmerge’ tool now uses a -nl flag to determine whether a newline is added at the end of each file, replacing the ‘addnl’ boolean flag that was used earlier.
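
The effect of the flag can be pictured with the small sketch below, which concatenates in-memory file contents and optionally appends a newline after each one. It models the described behavior only; the function and parameter names are hypothetical and this is not the tool's code.

```python
def merge_files(contents, add_newline=False):
    """Concatenate byte strings, optionally adding a newline after each one."""
    out = bytearray()
    for data in contents:
        out += data
        if add_newline:
            out += b"\n"
    return bytes(out)


parts = [b"first", b"second"]
print(merge_files(parts))                    # b'firstsecond'
print(merge_files(parts, add_newline=True))  # b'first\nsecond\n'
```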

Fixed failing JUnit tests in Gridmix.

The default block size property ‘dfs.blocksize’ now accepts values with unit symbols instead of a plain byte count. Values such as “10k”, “128m”, and “1g” are now accepted, where previously only a number of bytes was allowed.
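
A minimal sketch of how such values might be interpreted follows. It assumes that the suffixes are binary multiples (k = 1024, m = 1024², g = 1024³) and are case-insensitive; the function name is hypothetical and only the three suffixes mentioned in the note are handled.

```python
# Assumed binary multipliers for the suffixes mentioned in the note.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_block_size(value: str) -> int:
    """Convert a value such as '128m' or '134217728' to a byte count."""
    text = value.strip().lower()
    if not text:
        raise ValueError("empty block size")
    suffix = text[-1]
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    return int(text)  # plain byte count, as before


for example in ("10k", "128m", "1g", "512"):
    print(example, "->", parse_block_size(example))
```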

Fixed MapReduce AM to count failed maps also towards Reduce ramp up.

Removed a multitude of cloned/duplicate counters in the AM thereby reducing the AM heap size and preventing full GCs.

Fixed ViewFS to handle a null canonical service-name so that the TestViewFileSystem* tests pass.

Fixed TaskHeartbeatHandler to use a new configuration for the thread loop interval, separate from the task-timeout configuration property.

Fixed TokenCache to work with absent FileSystem canonical service-names.

Modified MR AM to not send a stop-container request for a container that isn’t launched at all.

Enhanced MR AM to use a proxy to ping the job-end notification URL.

Added AMInfo table to the MR AM job pages to list all the job-attempts when AM restarts and recovers.

Fixed TaskHeartbeatHandler to not hold a global lock for all task-updates.

Batched JobHistory flushes to DFS so that the AM is not slowed down by flushing on every event.
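
The general idea of batching can be sketched as below: events accumulate in a buffer and are written out together once a count threshold is reached. The class, the count-based trigger, and the in-memory sink are assumptions for illustration; the note does not describe the actual flush policy.

```python
class BatchingWriter:
    """Hypothetical writer that flushes buffered events in batches."""

    def __init__(self, sink, max_unflushed):
        self._sink = sink              # list standing in for the DFS stream
        self._max_unflushed = max_unflushed
        self._buffer = []

    def write(self, event):
        self._buffer.append(event)
        if len(self._buffer) >= self._max_unflushed:
            self.flush()

    def flush(self):
        if self._buffer:
            self._sink.append(list(self._buffer))  # one flush per batch
            self._buffer.clear()


sink = []
writer = BatchingWriter(sink, max_unflushed=2)
for i in range(5):
    writer.write("event-%d" % i)
writer.flush()    # flush the remainder, for example at shutdown
print(len(sink))  # 3 flushes for 5 events
```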

Fixed a race condition in MR AM that was consistently failing the sort benchmark.

Modified NM to report correct http address when an ephemeral web port is configured.

Corrected MR AM to honor speculative configuration and enable speculating either maps or reduces.

Made CapacityScheduler more conservative so that it assigns only one off-switch container in a single scheduling iteration.

Added configuration for the MapReduce History Server protocol in hadoop-policy.xml for service-level authorization.

New files added:

  1. hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WebServicesIntro.apt.vm
  2. hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRest.apt.vm
  3. hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm
  4. hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/MapredAppMasterRest.apt.vm
  5. hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/HistoryServerRest.apt.vm

The hadoop-project/src/site/site.xml change is split into a separate patch.

Improved FileInputFormat to return better locality for the last split.

Fixed EventFetcher and Fetcher threads to shut-down properly so that reducers don’t hang in corner cases.


Fixed the way head-room is allocated to applications by CapacityScheduler so that it deducts current-usage per user and not per-application.

DistCpV2 added to hadoop-tools.

Increased RPC handlers for all YARN servers to reasonable values for working at scale.

Added information about lost/rebooted/decommissioned nodes on the webapps.

Changed bin/mapred job -list to not print job-specific information not available at RM.

Very minor incompatibility in cmd-line output, inevitable due to MRv2 architecture.

Modified CapacityScheduler to use only users with pending requests for computing user-limits.

Modified Gridmix STRESS mode locking structure. The submitted thread and the polling thread now run simultaneously without blocking each other.

New JMX Bean in ResourceManager to provide list of live node managers:

Hadoop:service=ResourceManager,name=RMNMInfo LiveNodeManagers

Fixed YARN and MR so that MR jobs can use java.io.File.createTempFile to create temporary files as part of their tasks.

Modified RM UI to filter applications based on state of the applications.

MAPREDUCE-3774. Moved yarn-default.xml to hadoop-yarn-common from hadoop-server-common.

WARNING: No release note provided for this change.

Modified application limits to include queue max-capacities besides the usual user limits.

Fixed CapacityScheduler so that maxActiveApplication and maxActiveApplicationsPerUser per queue are not too low for small clusters.

WARNING: No release note provided for this change.

Fixed MR AM recovery so that only a single selected task output is recovered, reducing the unnecessarily long recovery time.

Changed active nodes list to not contain unhealthy nodes on the webUI and metrics.

Fixed job-access-controls to work with MR AM and JobHistoryServer web-apps.

Fixed an NPE in FileOutputCommitter for jobs with maps but no reduces.

Fixed a cross-site scripting vulnerability in the webapp interface.

Fixed MR AM to always use hostnames and never IPs when requesting containers so that the scheduler can hand out data-local containers correctly.

Changed MR AM to not add the same rack entry multiple times into the container request table when multiple hosts for a split happen to be on the same rack.
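
The de-duplication can be illustrated with the sketch below, which derives the host and rack entries for one split and records each rack only once. The function name, the host-to-rack mapping, and the host and rack names are hypothetical; the real request table is more involved.

```python
def locations_for_split(hosts, rack_of):
    """Return (host entries, rack entries) with each rack listed once."""
    rack_entries = []
    seen = set()
    for host in hosts:
        rack = rack_of[host]
        if rack not in seen:  # skip racks already requested for this split
            seen.add(rack)
            rack_entries.append(rack)
    return list(hosts), rack_entries


rack_of = {"host1": "/rack1", "host2": "/rack1", "host3": "/rack2"}
hosts, racks = locations_for_split(["host1", "host2", "host3"], rack_of)
print(racks)  # ['/rack1', '/rack2']
```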

Addressed MR AM hanging issues during AM restart and then the recovery.

Generated integration artifacts “org.apache.hadoop:hadoop-client” and “org.apache.hadoop:hadoop-minicluster” containing all the jars needed to use Hadoop client APIs, and to run Hadoop MiniClusters, respectively. These artifacts are pushed to the Maven repository on ‘mvn deploy’, along with the existing artifacts.

Added a test to validate that the AM can crash multiple times and still recover successfully after MAPREDUCE-3846.

Fixed and reenabled tests related to the MR child JVM’s environment variables in TestMiniMRChildTask.

Provided WebHDFS as a complete FileSystem implementation for accessing HDFS over HTTP. The previous hftp feature was a read-only FileSystem and did not provide “write” access.
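
As a rough sketch of what HTTP access looks like, the helper below builds WebHDFS request URLs of the form `/webhdfs/v1/<path>?op=<OPERATION>`. The host name, the port 50070, and the file paths are placeholders assumed for this example, and the helper only builds URLs without contacting a cluster.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return "http://%s:%s/webhdfs/v1%s?%s" % (host, port, quote(path), query)


# Read access (HTTP GET) and write access (HTTP PUT) use the same URL scheme.
print(webhdfs_url("namenode.example.com", 50070, "/user/alice/data.txt", "OPEN"))
print(webhdfs_url("namenode.example.com", 50070, "/user/alice/new.txt",
                  "CREATE", overwrite="true"))
```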