Apache Hadoop 0.22.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.


Allow map and reduce jvm parameters, environment variables and ulimit to be set separately.

Configuration changes: added mapred.map.child.java.opts, mapred.reduce.child.java.opts, mapred.map.child.env, mapred.reduce.child.env, mapred.map.child.ulimit, and mapred.reduce.child.ulimit; deprecated mapred.child.java.opts, mapred.child.env, and mapred.child.ulimit.
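
To illustrate how a task-specific setting can relate to the deprecated shared one, the sketch below resolves JVM options from a plain java.util.Properties object. The class name ChildOptsResolver, the fallback order (task-specific key first, then the deprecated key) and the empty default are assumptions made for this example; it does not reproduce the TaskTracker code.

```java
import java.util.Properties;

/** Hypothetical helper: picks JVM options for a map or reduce child JVM. */
final class ChildOptsResolver {
    // Deprecated key that applied to both task types.
    static final String SHARED_OPTS = "mapred.child.java.opts";

    /** Returns the task-specific value if set, else the deprecated shared value, else "". */
    static String javaOpts(Properties conf, boolean isMap) {
        String key = isMap ? "mapred.map.child.java.opts" : "mapred.reduce.child.java.opts";
        String shared = conf.getProperty(SHARED_OPTS, "");
        return conf.getProperty(key, shared);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapred.child.java.opts", "-Xmx200m");
        conf.setProperty("mapred.reduce.child.java.opts", "-Xmx1g");
        System.out.println("map:    " + javaOpts(conf, true));
        System.out.println("reduce: " + javaOpts(conf, false));
    }
}
```

In this sketch the map task falls back to the shared value while the reduce task uses its own setting.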


Trash feature notifies the user of an over-quota condition rather than silently deleting files/directories; deletion can be compelled with “rm -skipTrash”.


Split existing RpcMetrics into RpcMetrics and RpcDetailedMetrics. The new RpcDetailedMetrics has per-method usage details and is available under context name “rpc” and record name “detailed-metrics”.


Moved task log cleanup into a separate thread in TaskTracker. Added configuration “mapreduce.job.userlog.retain.hours” to specify the time (in hours) for which user logs are retained after job completion.


WARNING: No release note provided for this change.


Fixed a bug that caused TaskRunner to hit an NPE while getting the ugi from TaskTracker and then crash, so that the task failed only after the task-timeout period.


Added a metric to track number of heartbeats processed by the JobTracker.


Specific exceptions are thrown from HDFS implementation and protocol per the interface defined in AbstractFileSystem. The compatibility is not affected as the applications catch IOException and will be able to handle specific exceptions that are subclasses of IOException.
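
The compatibility argument can be sketched with standard Java exceptions. In the example below, open() is a hypothetical stand-in for a file system call and java.io.FileNotFoundException stands in for any specific IOException subclass; neither is taken from the HDFS code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

final class SpecificExceptionDemo {
    /** Hypothetical stand-in for a file system call that throws a specific subclass. */
    static void open(String path) throws IOException {
        throw new FileNotFoundException(path);
    }

    /** Older caller: catches only IOException and still handles the subclass. */
    static String legacyCaller(String path) {
        try {
            open(path);
            return "opened";
        } catch (IOException e) {
            return "io-error";
        }
    }

    /** Newer caller: distinguishes the specific subclass from other I/O failures. */
    static String specificCaller(String path) {
        try {
            open(path);
            return "opened";
        } catch (FileNotFoundException e) {
            return "not-found";
        } catch (IOException e) {
            return "io-error";
        }
    }

    public static void main(String[] args) {
        System.out.println(legacyCaller("/no/such/file"));
        System.out.println(specificCaller("/no/such/file"));
    }
}
```

Existing callers keep working unchanged, and callers that want finer-grained handling can add a more specific catch clause ahead of the general one.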


New configuration property hadoop.security.service.user.name.key: this setting points to the server principal for RefreshUserToGroupMappingsProtocol. The value should be either the NN or the JT principal, depending on whether it is used in DFSAdmin or MRAdmin. The value is set by the application, so no default value is needed.


Currently provides only a uniform distribution. Uses some older deprecated class interfaces (for mapper and reducer). This was tested locally on 0.20 and 0.22, so it should be fairly backwards compatible.


Incremental enhancements to the JobTracker include a no-lock version of JT.getTaskCompletionEvents, no lock on the JT while doing I/O during job submission, and several fixes to cut down configuration parsing during heartbeat handling.


Removes JNI calls to get jvm current/max heap usage in ClusterStatus. Any instances of ClusterStatus serialized in a prior version will not be correctly deserialized using the updated class.


Improved console messaging for streaming jobs by using the generic JobClient API itself instead of the existing streaming-specific code.


Added a configuration property “stream.map.input.ignoreKey” to specify whether to ignore the key while writing input for the mapper. This configuration parameter is valid only if stream.map.input.writer.class is org.apache.hadoop.streaming.io.TextInputWriter.class. For all other InputWriters, the key is always written.
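
The effect of the flag on a single input record can be sketched as follows. The class TextInputSketch, the tab separator and the trailing newline are assumptions made for this illustration; the sketch is not the TextInputWriter implementation.

```java
/** Hypothetical sketch of formatting one mapper input record as text. */
final class TextInputSketch {
    /** Writes "key<TAB>value" normally, or only "value" when ignoreKey is true. */
    static String formatLine(String key, String value, boolean ignoreKey) {
        return ignoreKey ? value + "\n" : key + "\t" + value + "\n";
    }

    public static void main(String[] args) {
        System.out.print(formatLine("0", "first line of input", false));
        System.out.print(formatLine("0", "first line of input", true));
    }
}
```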


Improved streaming job failure handling when #link is missing from the URI format of -cacheArchive. Previously the job failed when launching individual tasks; now it fails during job submission.


Changed the protocol name (which may be used in hadoop-policy.xml) from security.refresh.usertogroups.mappings.protocol.acl to security.refresh.user.mappings.protocol.acl.


Lazily construct a connection to the JobTracker from the job-submission client.


Adds an audit logging facility to MapReduce. All authorization/authentication events are logged to the audit log. Audit log entries are stored as key=value pairs.
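
A key=value entry of this kind could be assembled as in the sketch below. The field names (USER, OPERATION, RESULT), the tab separator and the class AuditLineSketch are hypothetical and chosen only for illustration; the actual audit log fields are not listed in this note.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: renders audit fields as a single key=value line. */
final class AuditLineSketch {
    /** Joins the fields in insertion order as key=value, separated by tabs (assumed). */
    static String format(Map<String, String> fields) {
        StringJoiner line = new StringJoiner("\t");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            line.add(e.getKey() + "=" + e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("USER", "alice");            // hypothetical field names and values
        fields.put("OPERATION", "SUBMIT_JOB");
        fields.put("RESULT", "SUCCESS");
        System.out.println(format(fields));
    }
}
```

Keeping every entry in key=value form lets simple text tools filter the log without a dedicated parser.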


Incremental enhancements to the JobTracker to optimize heartbeat handling.


Fixed an NPE in streaming that occurs when there is no input to reduce and the streaming reducer sends status updates by writing “reporter:status: xxx” statements to stderr.
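
For context, recognizing such a status line on stderr can be sketched as below. The class ReporterLineSketch and its null-for-no-match convention are assumptions for this example; the sketch does not reproduce the streaming framework's own parsing.

```java
/** Hypothetical sketch: extracts the message from a "reporter:status:" line. */
final class ReporterLineSketch {
    static final String STATUS_PREFIX = "reporter:status:";

    /** Returns the trimmed status message, or null if the line is not a status update. */
    static String parseStatus(String stderrLine) {
        if (stderrLine == null || !stderrLine.startsWith(STATUS_PREFIX)) {
            return null;
        }
        return stderrLine.substring(STATUS_PREFIX.length()).trim();
    }

    public static void main(String[] args) {
        System.out.println(parseStatus("reporter:status: merging partitions"));
        System.out.println(parseStatus("ordinary stderr output"));
    }
}
```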


Improved performance of the method JobInProgress.findSpeculativeTask() which is in the critical heartbeat code path.


MAPREDUCE-1887. MRAsyncDiskService now properly absolutizes volume root paths. (Aaron Kimball via zshao)


Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/“member”; now processing will continue to the end of the concatenated stream, like gzip(1) does. (bzip2 support is unaffected by this patch.)
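
The notion of a concatenated gzip stream can be demonstrated with the JDK's own java.util.zip classes. The sketch below is not Hadoop's gzip codec; it assumes a JDK whose GZIPInputStream reads multi-member streams, and the class ConcatenatedGzipDemo is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ConcatenatedGzipDemo {
    /** Compresses the text into one complete gzip member. */
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Concatenates one gzip member per part, then decompresses the whole stream. */
    static String roundTrip(String... parts) {
        try {
            ByteArrayOutputStream joined = new ByteArrayOutputStream();
            for (String part : parts) {
                joined.write(gzipMember(part));
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPInputStream gz =
                     new GZIPInputStream(new ByteArrayInputStream(joined.toByteArray()))) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = gz.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("first member\n", "second member\n"));
    }
}
```

A reader that stops at the end of the first member would return only the first part; reading to the end of the stream returns both.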


This change introduces a backward incompatibility. Existing pipes applications MUST be recompiled with the new Hadoop pipes library once the change is deployed.


When running fsck, audit log events for listStatus and open are not logged. A new event with cmd=fsck is logged, with the ugi field set to the user requesting fsck and the src field set to the fsck path.


Removed public deprecated class org.apache.hadoop.streaming.UTF8ByteArrayUtils.


A robots.txt is now in place which will prevent well-behaved crawlers from perusing Hadoop web interfaces.


Fixes serialization of job-acls in JobStatus to use AccessControlList.write() instead of AccessControlList.toString().


A new metric “login” of type MetricsTimeVaryingRate is added under the new metrics context name “ugi” and metrics record name “ugi”.


Collect cpu and memory statistics per task.


Moved the libhdfs package to the HDFS subproject.


Fixes a problem where TestJobCleanup leaves behind files that cause TestJobOutputCommitter to error out.


Makes AccessControlList a writable and updates documentation for Job ACLs.



Adds -background option to run a streaming job in background.


Removed some redundant lines from JobInProgress’s constructor that were re-initializing things unnecessarily.


This provides an option to store the fsimage compressed. The layout version is bumped to -25. Users can configure whether the fsimage is compressed and which codec to use. By default the fsimage is not compressed.


Fixed an EOF exception in BlockDecompressorStream when decompressing a previously compressed empty file.


Store fsimage MD5 checksum in VERSION file. Validate checksum when loading a fsimage. Layout version bumped.
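
The validation step can be sketched with the JDK's MessageDigest. The class ImageChecksumSketch and the hex encoding of the stored checksum are assumptions for this example; how the NameNode actually stores and compares the value is not described in this note.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: compares an image's MD5 digest with a stored value. */
final class ImageChecksumSketch {
    /** Returns the MD5 digest of the data as lowercase hex. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True when the digest of the image bytes equals the stored checksum. */
    static boolean matches(byte[] image, String storedHex) {
        return md5Hex(image).equalsIgnoreCase(storedHex);
    }

    public static void main(String[] args) {
        byte[] image = "abc".getBytes(StandardCharsets.UTF_8);
        String stored = md5Hex(image); // what would be recorded at save time
        System.out.println("valid:     " + matches(image, stored));
        System.out.println("corrupted: " + matches(new byte[] {1, 2, 3}, stored));
    }
}
```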


Add a configuration variable dfs.image.transfer.bandwidthPerSec to allow the user to specify the amount of bandwidth for transferring image and edits. Its default value is 0 indicating no throttling.
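
The meaning of the 0 default can be sketched as a small calculation. The class TransferThrottleSketch, the bytes-per-second unit and the way the limit is turned into a minimum transfer time are all assumptions for illustration, not the HDFS throttling code.

```java
/** Hypothetical sketch: how a bandwidth limit bounds the transfer time. */
final class TransferThrottleSketch {
    /**
     * Minimum time in milliseconds that sending the given number of bytes should take.
     * A limit of 0 (or less) means no throttling, so no minimum applies.
     */
    static long minMillis(long bytes, long bandwidthPerSec) {
        if (bandwidthPerSec <= 0) {
            return 0L;
        }
        return (bytes * 1000L) / bandwidthPerSec;
    }

    public static void main(String[] args) {
        long imageBytes = 64L * 1024 * 1024;
        System.out.println("unthrottled: " + minMillis(imageBytes, 0) + " ms");
        System.out.println("1 MiB/s:     " + minMillis(imageBytes, 1024 * 1024) + " ms");
    }
}
```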


Added support to auto-generate the Eclipse .classpath file from ivy.


Support for reporting metrics to Ganglia 3.1 servers


Moved the public API methods Counter getCounter(Enum<?> counterName) and Counter getCounter(String groupName, String counterName) from org.apache.hadoop.mapreduce.TaskInputOutputContext to org.apache.hadoop.mapreduce.TaskAttemptContext.


This patch has changed the serialization format of BlockLocation.


Improved the buffer utilization of ZlibCompressor to avoid invoking a JNI call per write request.


The permissions on datanode data directories (configured by dfs.datanode.data.dir.perm) now default to 0700. Upon startup, the datanode will automatically change the permissions to match the configured value.


The TaskTracker now uses the libhadoop JNI library to operate securely on local files when security is enabled. Secure clusters must ensure that libhadoop.so is available to the TaskTracker.


Updates hadoop-config.sh to always resolve symlinks when determining HADOOP_HOME. Bash built-ins or POSIX:2001 compliant commands are now required.


The native build, when run from trunk, now requires autotools, libtool and openssl dev libraries.


Fix Dynamic Priority Scheduler to work with hierarchical queue names


Fix a misleading documentation note about the usage of Reporter objects in Reducers.


Job names on jobtracker.jsp should be 80 characters long at most.


Job ACL files now have permissions set to 600 (previously 700).


Remove the now defunct property `mapreduce.job.userhistorylocation`.


Adds a new configuration hadoop.work.around.non.threadsafe.getpwuid which can be used to enable a mutex around this call to work around thread-unsafe implementations of getpwuid_r. Users should consult https://wiki.apache.org/hadoop/KnownBrokenPwuidImplementations for a list of such systems.


Removed references to the older fs.checkpoint.* properties that resided in core-site.xml


Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5. Introduces ArrayPrimitiveWritable for a much more efficient wire format to transmit arrays of primitives over RPC. ObjectWritable uses the new writable for array of primitives for RPC and continues to use existing format for on-disk data.


Updated the help for the touchz command.


When Hadoop’s Kerberos integration is enabled, it is now required that either kinit be on the path for user accounts running the Hadoop client, or that the hadoop.kerberos.kinit.command configuration option be manually set to the absolute path to kinit.


Add an FAQ entry regarding the differences between Java API and Streaming development of MR programs.


Removed thriftfs contrib component.


Removed contrib related build targets.


Updated the web documentation to reflect the formatting abilities of ‘fs -stat’.


Option webinterface.private.actions has been renamed to mapreduce.jobtracker.webinterface.trusted and should be specified in mapred-site.xml instead of core-site.xml


Adds a method to NameNode/ClientProtocol that allows a rude revoke of the lease from the current lease holder.


Confirmed that the problem of finding the ivy file occurs without the patch with ant 1.7, and does not occur with the patch (with either ant 1.7 or 1.8). Other unit tests are still failing in the test steps themselves on my laptop, but that is not due to not finding the ivy file.


Add CapacityScheduler servlet to enhance web UI for queue information.