Hadoop 2.6.4 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.6.3

YARN-4598. Major bug reported by tangshangwen and fixed by tangshangwen (nodemanager)
Invalid event: RESOURCE_FAILED at CONTAINER_CLEANEDUP_AFTER_KILL
YARN-4581. Major bug reported by sandflee and fixed by sandflee (resourcemanager)
AHS writer thread leak makes RM crash while RM is recovering
YARN-4546. Critical bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
ResourceManager crash due to scheduling opportunity overflow
YARN-4452. Critical bug reported by Naganarasimha G R and fixed by Naganarasimha G R
NPE when submitting Unmanaged application
YARN-4414. Major bug reported by Jason Lowe and fixed by Chang Li (nodemanager)
Nodemanager connection errors are retried at multiple levels
YARN-4380. Major bug reported by Tsuyoshi Ozawa and fixed by Varun Saxena (test)
TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
YARN-4354. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Public resource localization fails with NPE
YARN-4180. Critical bug reported by Anubhav Dhoot and fixed by Anubhav Dhoot (resourcemanager)
AMLauncher does not retry on failures when talking to NM
YARN-3893. Critical sub-task reported by Bibin A Chundatt and fixed by Bibin A Chundatt (resourcemanager)
Both RM in active state when Admin#transitionToActive failure from refreshAll()
YARN-3857. Critical bug reported by mujunchao and fixed by mujunchao (resourcemanager)
Memory leak in ResourceManager with SIMPLE mode
YARN-3849. Critical bug reported by Sunil G and fixed by Sunil G (capacityscheduler)
Too much preemption activity causing continuous killing of containers across queues
YARN-3842. Critical bug reported by Karthik Kambatla and fixed by Robert Kanter
NMProxy should retry on NMNotYetReadyException
YARN-3697. Critical bug reported by zhihai xu and fixed by zhihai xu (fairscheduler)
FairScheduler: ContinuousSchedulingThread can fail to shutdown
YARN-3695. Major bug reported by Junping Du and fixed by Raju Bairishetti
ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception.
YARN-3535. Critical bug reported by Peng Zhang and fixed by Peng Zhang (capacityscheduler, fairscheduler, resourcemanager)
Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
YARN-3154. Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong (nodemanager, resourcemanager)
Should not upload partial logs for MR jobs or other "short-running" applications

Applications that use LogAggregationContext will need to revisit that code to make sure their logs continue to be rolled out.
YARN-2975. Blocker bug reported by Karthik Kambatla and fixed by Karthik Kambatla
FSLeafQueue app lists are accessed without required locks
YARN-2902. Major sub-task reported by Jason Lowe and fixed by Varun Saxena (nodemanager)
Killing a container that is localizing can orphan resources in the DOWNLOADING state
MAPREDUCE-6621. Major bug reported by Xuan Gong and fixed by Xuan Gong
Memory Leak in JobClient#submitJobInternal()
MAPREDUCE-6619. Major bug reported by shanyu zhao and fixed by Junping Du (mrv2)
HADOOP_CLASSPATH is overwritten in MR container
MAPREDUCE-6618. Major bug reported by Xuan Gong and fixed by Xuan Gong
YarnClientProtocolProvider leaking the YarnClient thread.
MAPREDUCE-6577. Critical bug reported by Sangjin Lee and fixed by Sangjin Lee (mr-am)
MR AM unable to load native library without MR_AM_ADMIN_USER_ENV set
MAPREDUCE-6554. Critical bug reported by Bibin A Chundatt and fixed by Bibin A Chundatt
MRAppMaster servicestart failing with NPE in MRAppMaster#parsePreviousJobHistory
MAPREDUCE-6492. Critical bug reported by Bibin A Chundatt and fixed by Bibin A Chundatt
AsyncDispatcher exit with NPE on TaskAttemptImpl#sendJHStartEventForAssignedFailTask
MAPREDUCE-6436. Blocker improvement reported by Ryu Kobayashi and fixed by Kai Sasaki
JobHistory cache issue
MAPREDUCE-6363. Critical bug reported by Brahma Reddy Battula and fixed by Bibin A Chundatt (benchmarks)
[NNBench] Lease mismatch error when running with multiple mappers
MAPREDUCE-5982. Major bug reported by Jason Lowe and fixed by Chang Li (mr-am)
Task attempts that fail from the ASSIGNED state can disappear
HDFS-9600. Critical bug reported by Phil Yang and fixed by Phil Yang
do not check replication if the block is under construction
HDFS-9574. Major bug reported by Kihwal Lee and fixed by Kihwal Lee
Reduce client failures during datanode restart
HDFS-9445. Blocker bug reported by Kihwal Lee and fixed by Walter Su
Datanode may deadlock while handling a bad volume
HDFS-9415. Major improvement reported by Arpit Agarwal and fixed by Xiaobing Zhou (documentation)
Document dfs.cluster.administrators and dfs.permissions.superusergroup
HDFS-9314. Major improvement reported by Ming Ma and fixed by Xiao Chen
Improve BlockPlacementPolicyDefault's picking of excess replicas
HDFS-9313. Major bug reported by Ming Ma and fixed by Ming Ma
Possible NullPointerException in BlockManager if no excess replica can be chosen
HDFS-9294. Blocker bug reported by DENG FEI and fixed by Brahma Reddy Battula (hdfs-client)
DFSClient deadlock when close file and failed to renew lease
HDFS-9220. Blocker bug reported by Bogdan Raducanu and fixed by Jing Zhao
Reading small file (< 512 bytes) that is open for append fails due to incorrect checksum
HDFS-9178. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Slow datanode I/O can cause a wrong node to be marked bad
HDFS-8767. Critical bug reported by Haohui Mai and fixed by Kanaka Kumar Avvaru
RawLocalFileSystem.listStatus() returns null for UNIX pipefile
HDFS-8722. Critical improvement reported by Kihwal Lee and fixed by Kihwal Lee
Optimize datanode writes for small writes and flushes
HDFS-8647. Major improvement reported by Ming Ma and fixed by Brahma Reddy Battula
Abstract BlockManager's rack policy into BlockPlacementPolicy
HDFS-7694. Major improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
FSDataInputStream should support "unbuffer"
HDFS-6945. Critical bug reported by Akira AJISAKA and fixed by Akira AJISAKA (namenode)
BlockManager should remove a block from excessReplicateMap and decrement ExcessBlocks metric when the block is removed
HDFS-4660. Blocker bug reported by Peng Zhang and fixed by Kihwal Lee (datanode)
Block corruption can happen during pipeline recovery
HADOOP-12736. Major test reported by Xiao Chen and fixed by Xiao Chen
TestTimedOutTestsListener#testThreadDumpAndDeadlocks sometimes times out
HADOOP-12706. Major bug reported by Jason Lowe and fixed by Sangjin Lee (test)
TestLocalFsFCStatistics#testStatisticsThreadLocalDataCleanUp times out occasionally
HADOOP-12107. Critical bug reported by Sangjin Lee and fixed by Sangjin Lee (fs)
long running apps may have a huge number of StatisticsData instances under FileSystem
HADOOP-11252. Critical bug reported by Wilfred Spiegelenburg and fixed by Masatake Iwasaki (ipc)
RPC client does not time out by default