These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
Previously, the MR job will get failed if AM get restarted for some reason (like node failure, etc.) during its doing commit job no matter if AM attempts reach to the maximum attempts. In this improvement, we add a new API isCommitJobRepeatable() to OutputCommitter interface which to indicate if job’s committer can do commitJob again if previous commit work is interrupted by NM/AM failures, etc. The instance of OutputCommitter, which support repeatable job commit (like FileOutputCommitter in algorithm 2), can allow AM to continue the commitJob() after AM restart as a new attempt.
This fix includes public method interface change. A follow-up JIRA issue for this incompatibility for branch-2.7 is HADOOP-13579.
Made CanBuffer interface public for use in client applications.
Added New compression levels for GzipCodec that can be set in zlib.compress.level
Two recommendations for the mapreduce.jobhistory.loadedtasks.cache.size property: 1) For every 100k of cache size, set the heap size of the Job History Server to 1.2GB. For example, mapreduce.jobhistory.loadedtasks.cache.size=500000, heap size=6GB. 2) Make sure that the cache size is larger than the number of tasks required for the largest job run on the cluster. It might be a good idea to set the value slightly higher (say, 20%) in order to allow for job size growth.
Fix inconsistent value type ( String and Array ) of the “type” field for LeafQueueInfo in response of RM REST API
Backport the fix to 2.7 and 2.8