Hadoop: CLI MiniCluster.

Purpose

Using the CLI MiniCluster, users can simply start and stop a single-node Hadoop cluster with a single command, and without the need to set any environment variables or manage configuration files. The CLI MiniCluster starts both a YARN/MapReduce & HDFS clusters.

This is useful for cases where users want to quickly experiment with a real Hadoop cluster or test non-Java programs that rely on significant Hadoop functionality.

Hadoop Tarball

You should be able to obtain the Hadoop tarball from the release. Also, you can directly create a tarball from the source:

$ mvn clean install -DskipTests
$ mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip

NOTE: You will need protoc 2.5.0 installed.

The tarball should be available in hadoop-dist/target/ directory.

Running the MiniCluster

From inside the root directory of the extracted tarball, you can start the CLI MiniCluster using the following command:

$ HADOOP_CLASSPATH=share/hadoop/yarn/test/hadoop-yarn-server-tests-2.9.2-tests.jar bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.9.2-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT

In the example command above, RM_PORT and JHS_PORT should be replaced by the user’s choice of these port numbers. If not specified, random free ports will be used.

There are a number of command line arguments that the users can use to control which services to start, and to pass other configuration properties. The available command line arguments:

$ -D <property=value>    Options to pass into configuration object
$ -datanodes <arg>       How many datanodes to start (default 1)
$ -format                Format the DFS (default false)
$ -help                  Prints option help.
$ -jhsport <arg>         JobHistoryServer port (default 0--we choose)
$ -namenode <arg>        URL of the namenode (default is either the DFS
$                        cluster or a temporary dir)
$ -nnport <arg>          NameNode port (default 0--we choose)
$ -nnhttpport <arg>      NameNode HTTP port (default 0--we choose)
$ -nodemanagers <arg>    How many nodemanagers to start (default 1)
$ -nodfs                 Don't start a mini DFS cluster
$ -nomr                  Don't start a mini MR cluster
$ -rmport <arg>          ResourceManager port (default 0--we choose)
$ -writeConfig <path>    Save configuration to this XML file.
$ -writeDetails <path>   Write basic information to this JSON file.

To display this full list of available arguments, the user can pass the -help argument to the above command.