Hadoop 0.23.11 Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes since Hadoop 0.23.10
- YARN-1932.
Blocker bug reported by Mit Desai and fixed by Mit Desai
Javascript injection on the job status page
Scripts can be injected into the job status page because the diagnostics field is
not sanitized. Whatever string is set there shows up on the job page as-is, i.e. if it contains script commands, they will be executed in the browser of the user opening the page.
We need to escape the diagnostic string so that such scripts are not executed.
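The fix amounts to HTML-escaping the diagnostics before rendering. A minimal sketch of that kind of escaping; the class and method names here are illustrative, not the actual patch:

```java
// Illustrative helper (hypothetical name, not the actual YARN fix): replace
// HTML-significant characters so the browser renders the diagnostics as text
// instead of executing it as markup/script.
public class DiagnosticsEscaper {
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```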
- YARN-1670.
Critical bug reported by Thomas Graves and fixed by Mit Desai
aggregated log writer can write more log data than it says is the log length
We have seen exceptions when using 'yarn logs' to read log files.
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
We traced it down to the reader trying to read the file type of the next file, but what it reads is still log data from the previous file. The Log Length was written as a certain size, but the log data was actually longer than that.
Inside the write() routine in LogValue, it first writes the logfile length, but when it then writes the log itself it simply copies to the end of the file. There is a race condition here: if someone is still writing to the file when it is aggregated, the length written could be too small.
The write() routine should stop once it has written whatever it said was the length. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.
We also noticed a bug in readAContainerLogsForALogType where it uses an int for curRead whereas it should use a long:
while (len != -1 && curRead < fileLength) {
This isn't actually a problem right now, as the underlying decoder appears to do the right thing and the len condition exits the loop.
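A minimal sketch of the bounded-copy idea described above, using a hypothetical helper rather than the actual Hadoop patch: the writer copies at most the number of bytes it declared as the log length, and tracks progress in a long rather than an int:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch (not the actual LogValue.write() fix): after writing the
// declared length, copy at most that many bytes, so a log file that grows
// while aggregation is in progress cannot overrun its declared length.
public class BoundedLogWriter {
    public static long copyAtMost(InputStream in, OutputStream out,
                                  long declaredLength) throws IOException {
        byte[] buf = new byte[65536];
        long curRead = 0;  // long, not int: log files can exceed 2 GB
        while (curRead < declaredLength) {
            long remaining = declaredLength - curRead;
            int toRead = (int) Math.min(buf.length, remaining);
            int len = in.read(buf, 0, toRead);
            if (len == -1) {
                break;  // file shorter than declared; stop cleanly
            }
            out.write(buf, 0, len);
            curRead += len;
        }
        return curRead;  // bytes actually copied, never more than declared
    }
}
```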
- YARN-1592.
Major bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)
CapacityScheduler tries to reserve more than a node's total memory on branch-0.23
See YARN-957 for the description; branch-0.23 has the same bug.
- YARN-1575.
Critical sub-task reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Public localizer crashes with "Localized unkown resource"
The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}
- YARN-1180.
Trivial bug reported by Thomas Graves and fixed by Chen He (capacityscheduler)
Update capacity scheduler docs to include types on the configs
The capacity scheduler docs (https://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, minimum-user-limit-percent doesn't say it's an int. It is also the only setting among the Resource Allocation configs that is an int rather than a float.
- YARN-1145.
Major bug reported by Rohith and fixed by Rohith
Potential file handle leak in aggregated logs web ui
If any problem occurs while getting aggregated logs for rendering on the web UI, the LogReader is not closed.
Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser:> jps
*27909* JobHistoryServer
The DataNode port is 50010. Grepping for the DataNode port shows many connections from the JobHistoryServer in CLOSE_WAIT state.
hadoopuser@hadoopuser:> netstat -tanlp |grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
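A minimal sketch of the fix pattern, with illustrative names rather than the actual JobHistoryServer code: the reader is closed in a finally block so a failure while rendering cannot leak the underlying connection:

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrative sketch (hypothetical names, not the JHS code): release the
// reader on both the success and failure paths, so an exception during
// rendering cannot leave the DataNode connection stuck in CLOSE_WAIT.
public class SafeRender {
    static boolean closed = false;  // observable stand-in for the socket state

    static Closeable openReader() {
        return new Closeable() {
            public void close() { closed = true; }
        };
    }

    public static void render(boolean failMidway) throws IOException {
        Closeable reader = openReader();
        try {
            if (failMidway) {
                throw new IOException("render failed part-way");
            }
        } finally {
            reader.close();  // runs even when rendering throws
        }
    }
}
```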
- YARN-1053.
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
If the container launch fails, we send a ContainerExitEvent. This event contains the exit code and a diagnostic message. Today we ignore the diagnostic message while handling this event inside ContainerImpl. Fixing it, as the message is useful in diagnosing the failure.
- YARN-853.
Major bug reported by Devaraj K and fixed by Devaraj K (capacityscheduler)
maximum-am-resource-percent doesn't work after refreshQueues command
If we update the yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent configuration and then do refreshQueues, it uses the new config value to calculate Max Active Applications and Max Active Applications Per User. However, if we add a new node after issuing the 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value for that calculation.
- YARN-500.
Major bug reported by Nishan Shetty, Huawei and fixed by Kenji Kikushima (resourcemanager)
ResourceManager webapp is using next port if configured port is already in use
- MAPREDUCE-5789.
Major bug reported by Rushabh S Shah and fixed by Rushabh S Shah (jobhistoryserver , webapps)
Average Reduce time is incorrect on Job Overview page
- MAPREDUCE-5778.
Major bug reported by Jason Lowe and fixed by Akira AJISAKA (jobhistoryserver)
JobSummary does not escape newlines in the job name
- MAPREDUCE-5757.
Major bug reported by Jason Lowe and fixed by Jason Lowe (client)
ConcurrentModificationException in JobControl.toList
- MAPREDUCE-5746.
Major bug reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)
Job diagnostics can implicate wrong task for a failed job
- MAPREDUCE-5744.
Blocker bug reported by Sangjin Lee and fixed by Gera Shegalov
Job hangs because RMContainerAllocator$AssignedRequests.preemptReduce() violates the comparator contract
- MAPREDUCE-5689.
Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu
MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled
- MAPREDUCE-5623.
Major bug reported by Tsuyoshi OZAWA and fixed by Jason Lowe
TestJobCleanup fails because of RejectedExecutionException and NPE.
- MAPREDUCE-5454.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (test)
TestDFSIO fails intermittently on JDK7
- MAPREDUCE-3191.
Trivial bug reported by Todd Lipcon and fixed by Chen He
docs for map output compression incorrectly reference SequenceFile
- HDFS-6449.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Incorrect counting in ContentSummaryComputationContext in 0.23.
- HDFS-6191.
Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
Disable quota checks when replaying edit log.
- HDFS-6166.
Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (balancer)
revisit balancer so_timeout
- HDFS-5881.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Fix skip() of the short-circuit local reader (legacy).
- HDFS-5806.
Major bug reported by Nathan Roberts and fixed by Nathan Roberts (balancer)
balancer should set SoTimeout to avoid indefinite hangs
- HDFS-5728.
Critical bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode)
[Diskfull] Block recovery will fail if the metafile does not have crc for all chunks of the block
- HDFS-5637.
Major improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client , security)
try to refeatchToken while local read InvalidToken occurred
- HDFS-4576.
Major bug reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
Webhdfs authentication issues
- HDFS-4461.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
DirectoryScanner: volume path prefix takes up memory for every block that is scanned
- HADOOP-10588.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee
Workaround for jetty6 acceptor startup issue
- HADOOP-10454.
Major improvement reported by Kihwal Lee and fixed by Kihwal Lee
Provide FileContext version of har file system
- HADOOP-10332.
Major bug reported by Daryn Sharp and fixed by Jonathan Eagles
HttpServer's jetty audit log always logs 200 OK
- HADOOP-10164.
Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Allow UGI to login with a known Subject
- HADOOP-10148.
Minor sub-task reported by Chen He and fixed by Chen He (ipc)
backport hadoop-10107 to branch-0.23
- HADOOP-10146.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (util)
Workaround JDK7 Process fd close bug
- HADOOP-10129.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (tools/distcp)
Distcp may succeed when it fails
- HADOOP-10112.
Major bug reported by Brandon Li and fixed by Brandon Li (tools)
har file listing doesn't work with wild card
- HADOOP-10110.
Blocker bug reported by Chuan Liu and fixed by Chuan Liu (build)
hadoop-auth has a build break due to missing dependency
- HADOOP-10081.
Critical bug reported by Jason Lowe and fixed by Tsuyoshi OZAWA (ipc)
Client.setupIOStreams can leak socket resources on exception or error
- HADOOP-9230.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (test)
TestUniformSizeInputFormat fails intermittently
- HADOOP-8826.
Minor bug reported by Robert Joseph Evans and fixed by Mit Desai
Docs still refer to 0.20.205 as stable line
- HADOOP-7688.
Major improvement reported by Tsz Wo Nicholas Sze and fixed by Uma Maheswara Rao G
When a servlet filter throws an exception in init(..), the Jetty server failed silently.