This page explains how to quickly setup HttpFS with Pseudo authentication against a Hadoop cluster with Pseudo authentication.
By default, HttpFS assumes that Hadoop configuration files (core-site.xml & hdfs-site.xml) are in the HttpFS configuration directory.
If this is not the case, add to the httpfs-site.xml file the httpfs.hadoop.config.dir property set to the location of the Hadoop configuration directory.
Edit Hadoop core-site.xml and defined the Unix user that will run the HttpFS server as a proxyuser. For example:
<property> <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name> <value>httpfs-host.foo.com</value> </property> <property> <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name> <value>*</value> </property>
IMPORTANT: Replace #HTTPFSUSER# with the Unix user that will start the HttpFS server.
To start/stop HttpFS, use hdfs --daemon start|stop httpfs. For example:
hadoop-3.3.4 $ hdfs --daemon start httpfs
NOTE: The script httpfs.sh is deprecated. It is now just a wrapper of hdfs httpfs.
$ curl -sS 'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs' {"Path":"\/user\/hdfs"}
HttpFS preconfigures the HTTP port to 14000.
HttpFS supports the following configuration properties in the HttpFS’s etc/hadoop/httpfs-site.xml configuration file.
Enable SSL in etc/hadoop/httpfs-site.xml:
<property> <name>httpfs.ssl.enabled</name> <value>true</value> <description> Whether SSL is enabled. Default is false, i.e. disabled. </description> </property>
Configure etc/hadoop/ssl-server.xml with proper values, for example:
<property> <name>ssl.server.keystore.location</name> <value>${user.home}/.keystore</value> <description>Keystore to be used. Must be specified. </description> </property> <property> <name>ssl.server.keystore.password</name> <value></value> <description>Must be specified.</description> </property> <property> <name>ssl.server.keystore.keypassword</name> <value></value> <description>Must be specified.</description> </property>
The SSL passwords can be secured by a credential provider. See Credential Provider API.
You need to create an SSL certificate for the HttpFS server. As the httpfs Unix user, using the Java keytool command to create the SSL certificate:
$ keytool -genkey -alias jetty -keyalg RSA
You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named .keystore and located in the httpfs user home directory.
The password you enter for “keystore password” must match the value of the property ssl.server.keystore.password set in the ssl-server.xml in the configuration directory.
The answer to “What is your first and last name?” (i.e. “CN”) must be the hostname of the machine where the HttpFS Server will be running.
Start HttpFS. It should work over HTTPS.
Using the Hadoop FileSystem API or the Hadoop FS shell, use the swebhdfs:// scheme. Make sure the JVM is picking up the truststore containing the public key of the SSL certificate if using a self-signed certificate. For more information about the client side settings, see SSL Configurations for SWebHDFS.
NOTE: Some old SSL clients may use weak ciphers that are not supported by the HttpFS server. It is recommended to upgrade the SSL client.
The following environment variables are deprecated. Set the corresponding configuration properties instead.
Environment Variable | Configuration Property | Configuration File |
---|---|---|
HTTPFS_HTTP_HOSTNAME | httpfs.http.hostname | httpfs-site.xml |
HTTPFS_HTTP_PORT | httpfs.http.port | httpfs-site.xml |
HTTPFS_MAX_HTTP_HEADER_SIZE | hadoop.http.max.request.header.size and hadoop.http.max.response.header.size | httpfs-site.xml |
HTTPFS_MAX_THREADS | hadoop.http.max.threads | httpfs-site.xml |
HTTPFS_SSL_ENABLED | httpfs.ssl.enabled | httpfs-site.xml |
HTTPFS_SSL_KEYSTORE_FILE | ssl.server.keystore.location | ssl-server.xml |
HTTPFS_SSL_KEYSTORE_PASS | ssl.server.keystore.password | ssl-server.xml |
Name | Description |
---|---|
/conf | Display configuration properties |
/jmx | Java JMX management interface |
/logLevel | Get or set log level per class |
/logs | Display log files |
/stacks | Display JVM stacks |
/static/index.html | The static home page |
To control the access to servlet /conf, /jmx, /logLevel, /logs, and /stacks, configure the following properties in httpfs-site.xml:
<property> <name>hadoop.security.authorization</name> <value>true</value> <description>Is service-level authorization enabled?</description> </property> <property> <name>hadoop.security.instrumentation.requires.admin</name> <value>true</value> <description> Indicates if administrator ACLs are required to access instrumentation servlets (JMX, METRICS, CONF, STACKS). </description> </property> <property> <name>httpfs.http.administrators</name> <value></value> <description>ACL for the admins, this configuration is used to control who can access the default servlets for HttpFS server. The value should be a comma separated list of users and groups. The user list comes first and is separated by a space followed by the group list, e.g. "user1,user2 group1,group2". Both users and groups are optional, so "user1", " group1", "", "user1 group1", "user1,user2 group1,group2" are all valid (note the leading space in " group1"). '*' grants access to all users and groups, e.g. '*', '* ' and ' *' are all valid. </description> </property>