These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
The generic resource types feature allows admins to configure custom resource types outside of memory and CPU. Users can request these resource types which YARN will take into account for resource scheduling.
This also adds GPU as a native resource type, built on top of the generic resource types feature. It adds support for GPU resource discovery, GPU scheduling and GPU isolation.
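As a hedged illustration (not prescribed by this note), an ApplicationMaster might ask for GPUs by naming the registered resource type on a container Resource; the 4096 MB / 2 vcore values and the assumption that the cluster has registered yarn.io/gpu via its resource-types configuration are illustrative only:

```java
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch only: build a container resource that adds one GPU on top of the
// usual memory and vcores, assuming the "yarn.io/gpu" resource type has been
// registered on the cluster (resource-types configuration not shown).
public class GpuRequestSketch {
  public static Resource containerWithGpu() {
    Resource resource = Resource.newInstance(4096, 2); // 4 GiB, 2 vcores
    resource.setResourceValue("yarn.io/gpu", 1);       // plus 1 GPU
    return resource;
  }
}
```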
Observer is a new type of NameNode, in addition to the Active and Standby NameNodes in HA settings. An Observer Node maintains a replica of the namespace, the same as a Standby Node, and additionally allows execution of clients' read requests.
To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to a client request only after its own state has caught up with the client's state ID, which the client previously received from the Active NameNode.
Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent.
A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting write and read requests, respectively.
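As a rough sketch of the client-side wiring (the nameservice name "mycluster" is hypothetical, and the msync() call assumes it is exposed on FileSystem as in recent releases):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: route reads to Observer NameNodes via ObserverReadProxyProvider.
public class ObserverReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider");
    FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);

    // Catch this client's state up with the Active NameNode so that
    // subsequent reads served by an Observer are consistent.
    fs.msync();
    System.out.println(fs.getFileStatus(new Path("/")).getModificationTime());
  }
}
```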
This feature allows HDFS to selectively enforce encryption for both RPC (NameNode) and data transfer (DataNode). With this feature enabled, the NameNode can listen on multiple ports, and different ports can have different security configurations. Depending on which NameNode port clients connect to, the RPC calls and the subsequent data transfer will enforce the security configuration corresponding to that NameNode port. This can help when there is a requirement to enforce different security policies depending on where clients connect from.
This can be enabled by setting the hadoop.security.saslproperties.resolver.class configuration to org.apache.hadoop.security.IngressPortBasedResolver, adding the additional NameNode auxiliary ports via dfs.namenode.rpc-address.auxiliary-ports, and configuring the security of individual ports via ingress.port.sasl.configured.ports.
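A minimal sketch of those server-side settings, assuming hypothetical auxiliary ports 9001 and 9002 (the per-port SASL properties themselves are not shown):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: NameNode-side configuration for port-based security policies.
public class MultiPortSecuritySketch {
  public static Configuration namenodeConf() {
    Configuration conf = new Configuration();
    // Resolve SASL properties based on the ingress port a client used.
    conf.set("hadoop.security.saslproperties.resolver.class",
        "org.apache.hadoop.security.IngressPortBasedResolver");
    // Extra NameNode RPC ports in addition to the default one.
    conf.set("dfs.namenode.rpc-address.auxiliary-ports", "9001,9002");
    // Ports whose SASL settings are configured individually.
    conf.set("ingress.port.sasl.configured.ports", "9001,9002");
    return conf;
  }
}
```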
This adds an extension to the IPC FairCallQueue which allows for the consideration of the cost of a user’s operations when deciding how they should be prioritized, as opposed to the number of operations. This can be helpful for protecting the NameNode from clients which submit very expensive operations (e.g. large listStatus operations or recursive getContentSummary operations).
This can be enabled by setting the ipc.<port>.costprovider.impl configuration to org.apache.hadoop.ipc.WeightedTimeCostProvider.
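A hedged sketch of that setting, using 8020 as an illustrative NameNode RPC port and assuming FairCallQueue itself is already enabled on that port:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: prioritize callers by the time their calls consume, not call count.
public class CostBasedFairCallQueueSketch {
  public static Configuration namenodeConf() {
    Configuration conf = new Configuration();
    // FairCallQueue on the NameNode RPC port (assumed to be enabled already;
    // shown here only for context).
    conf.set("ipc.8020.callqueue.impl", "org.apache.hadoop.ipc.FairCallQueue");
    // Weigh users by elapsed processing time rather than number of calls.
    conf.set("ipc.8020.costprovider.impl",
        "org.apache.hadoop.ipc.WeightedTimeCostProvider");
    return conf;
  }
}
```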
This JIRA makes the following change: the Router metrics context is changed from ‘router’ to ‘dfs’.
Mount tables now support ACLs. Users will no longer be able to modify entries they previously created, because old (pre-permission) mount table entries are assigned owner:superuser, group:supergroup, permission:755 as the default permissions. The workaround is to log in as the superuser to modify these mount table entries.
Support multi-threaded pre-read in AliyunOSSInputStream to improve sequential read performance from Hadoop to Aliyun OSS.
MapReduce jobs that output to filesystems without direct support for recursive delete can set mapreduce.fileoutputcommitter.task.cleanup.enabled=true to have each task delete its intermediate work directory rather than waiting for the ApplicationMaster to clean up at the end of the job. This can significantly speed up the cleanup phase for large jobs on such filesystems.
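A small sketch of a job that opts in (the job name is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: each task deletes its own intermediate work directory instead of
// leaving all cleanup to the ApplicationMaster at job end.
public class TaskCleanupSketch {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.fileoutputcommitter.task.cleanup.enabled", true);
    return Job.getInstance(conf, "per-task-cleanup-example");
  }
}
```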
Added an option to prevent short-circuit reads from being disabled on failures, by setting dfs.domain.socket.disable.interval.seconds to 0.
Fix a documentation error in setting up HDFS Router-based Federation.
Change the default State Store from a local file to ZooKeeper. This will require an additional ZooKeeper address to be configured.
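A hedged sketch of the Router-side settings this implies; the State Store driver class name reflects my reading of the RBF defaults, and the ZooKeeper quorum address is purely illustrative:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: point the Router State Store at a ZooKeeper ensemble.
public class RouterZkStateStoreSketch {
  public static Configuration routerConf() {
    Configuration conf = new Configuration();
    // ZooKeeper-backed State Store driver (assumed class name).
    conf.set("dfs.federation.router.store.driver.class",
        "org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl");
    // The additional ZooKeeper address that now needs to be configured.
    conf.set("hadoop.zk.address", "zk1:2181,zk2:2181,zk3:2181");
    return conf;
  }
}
```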
The HBase integration module mixed hbase-server and hbase-client dependencies. This JIRA splits it into sub-modules so that hbase-client-dependent and hbase-server-dependent modules are separated. This allows conditional compilation against different versions of HBase.
Use the environment variable HTTPFS_HTTP_HOSTNAME to limit the IP addresses the HttpFS server binds to. Default: the HttpFS server binds to all IP addresses on the host.
WASB: Bug fix to support non-sequential page blob reads. Required for HBASE replication.
WASB: Bug fix for recent regression in hflush() and hsync().
WASB: Fix Spark process hang at shutdown due to use of non-daemon threads by updating Azure Storage Java SDK to 7.0
Federation supports and controls global quota at mount table level.
In a federated environment, a folder can be spread across multiple subclusters. The Router aggregates the quota usage queried from these subclusters and uses the aggregate for quota verification.
WASB: listStatus 10x performance improvement for listing 700,000 files
This change was required to address license compatibility issues with the JSON parser in the older AWS SDKs.
As a consequence of this, where needed, the applied patch also contains HADOOP-12705 (Upgrade Jackson 2.2.3 to 2.7.8).
This patch changed the default build and test environment from Ubuntu “Trusty” 14.04 to Ubuntu “Xenial” 16.04.
This change allows the inode and inode directory sections of the fsimage to be loaded in parallel. Tests on large images have shown this change to reduce the image load time to about 50% of the pre-change run time.
It works by writing sub-section entries to the image index, effectively splitting each image section into many sub-sections which can be processed in parallel. By default 12 sub-sections per image section are created when the image is saved, and 4 threads are used to load the image at startup.
This is disabled by default for any image with more than 1M inodes (dfs.image.parallel.inode.threshold) and can be enabled by setting dfs.image.parallel.load to true. When the feature is enabled, the next HDFS checkpoint will write the image sub-sections and subsequent namenode restarts can load the image in parallel.
An image with the parallel sections can be read even if the feature is disabled, but HDFS versions without this JIRA cannot load an image with parallel sections. OIV can process a parallel-enabled image without issues.
Key configuration parameters are:
dfs.image.parallel.load=false - enable or disable the feature
dfs.image.parallel.target.sections = 12 - The target number of subsections. Aim for 2 to 3 times the number of dfs.image.parallel.threads.
dfs.image.parallel.inode.threshold = 1000000 - Only save and load in parallel if the image has more than this number of inodes.
dfs.image.parallel.threads = 4 - The number of threads used to load the image. Testing has shown 4 to be optimal, but this may depend on the environment.
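The parameters above, collected into a single hedged configuration sketch that simply restates the defaults (with the feature switched on):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: enable parallel fsimage saving/loading with the default tuning
// values described in this note.
public class ParallelFsImageLoadSketch {
  public static Configuration namenodeConf() {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.image.parallel.load", true);           // turn the feature on
    conf.setInt("dfs.image.parallel.target.sections", 12);      // sub-sections per image section
    conf.setInt("dfs.image.parallel.inode.threshold", 1000000); // minimum inodes to parallelize
    conf.setInt("dfs.image.parallel.threads", 4);                // loader threads at startup
    return conf;
  }
}
```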
UPGRADE WARNING: 1. Upgrading from 2.10 to 3.* is smooth if this feature has never been enabled. 2. If the fsimage parallel loading feature has been enabled, the only supported upgrade path from 2.10 is currently to 3.3. 3. To upgrade from 2.10 to an earlier 3.* release (3.1.*/3.2.*), make sure at least one fsimage file is saved after disabling the feature: first change the configuration parameter (dfs.image.parallel.load=false) and restart the NameNode before performing the upgrade.