A normal Hadoop test run will test those FileSystems that can be tested locally via the local filesystem. This typically means file:// and its underlying LocalFileSystem, and hdfs:// via the HDFS MiniCluster.
Other filesystems are skipped unless there is a specific configuration pointing to the remote server providing the filesystem.

These filesystem bindings must be defined in an XML configuration file, usually hadoop-common-project/hadoop-common/src/test/resources/contract-test-options.xml. This file is excluded from version control and MUST NOT be checked in.
In contract-test-options.xml, the filesystem name must be defined in the property fs.contract.test.fs.ftp. The specific login options to connect to the FTP Server must then be provided.
A path to a test directory must also be provided in the option fs.contract.test.ftp.testdir. This is the directory under which operations take place.
Example:
```xml
<configuration>
  <property>
    <name>fs.contract.test.fs.ftp</name>
    <value>ftp://server1/</value>
  </property>

  <property>
    <name>fs.ftp.user.server1</name>
    <value>testuser</value>
  </property>

  <property>
    <name>fs.contract.test.ftp.testdir</name>
    <value>/home/testuser/test</value>
  </property>

  <property>
    <name>fs.ftp.password.server1</name>
    <value>secret-login</value>
  </property>
</configuration>
```
The OpenStack Swift login details must be defined in the file /hadoop-tools/hadoop-openstack/src/test/resources/contract-test-options.xml. The standard hadoop-common contract-test-options.xml resource file cannot be used, as that file does not get included in hadoop-common-test.jar.
In /hadoop-tools/hadoop-openstack/src/test/resources/contract-test-options.xml, the Swift bucket name must be defined in the property fs.contract.test.fs.swift, along with the login details for the specific Swift service provider in which the bucket is hosted.
```xml
<configuration>
  <property>
    <name>fs.contract.test.fs.swift</name>
    <value>swift://swiftbucket.rackspace/</value>
  </property>

  <property>
    <name>fs.swift.service.rackspace.auth.url</name>
    <value>https://auth.api.rackspacecloud.com/v2.0/tokens</value>
    <description>Rackspace US (multiregion)</description>
  </property>

  <property>
    <name>fs.swift.service.rackspace.username</name>
    <value>this-is-your-username</value>
  </property>

  <property>
    <name>fs.swift.service.rackspace.region</name>
    <value>DFW</value>
  </property>

  <property>
    <name>fs.swift.service.rackspace.apikey</name>
    <value>ab0bceyoursecretapikeyffef</value>
  </property>
</configuration>
```
The core of adding a new FileSystem to the contract tests is adding a new contract class, then creating a new non-abstract test class for every test suite that you wish to test.
As an example, here is the implementation of the create() contract tests for the local filesystem.
```java
package org.apache.hadoop.fs.contract.localfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractCreateContractTest;
import org.apache.hadoop.fs.contract.AbstractFSContract;

public class TestLocalCreateContract extends AbstractCreateContractTest {
  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    return new LocalFSContract(conf);
  }
}
```
The standard implementation technique for subclasses of AbstractFSContract is to be driven entirely by a Hadoop XML configuration file stored in the test resource tree. The best practice is to store it under /contract with the name of the FileSystem, such as contract/localfs.xml. Having the XML file define all FileSystem options makes the listing of FileSystem behaviors immediately visible.
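For example, a contract/localfs.xml file could declare options like these (a minimal sketch; the authoritative option names are the constants defined in the ContractOptions class, and the values shown are illustrative):

```xml
<configuration>
  <!-- illustrative subset only; consult ContractOptions for the full set -->
  <property>
    <name>fs.contract.is-case-sensitive</name>
    <value>true</value>
  </property>

  <property>
    <name>fs.contract.supports-append</name>
    <value>true</value>
  </property>

  <property>
    <name>fs.contract.supports-unix-permissions</name>
    <value>true</value>
  </property>
</configuration>
```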
The LocalFSContract is a special case of this, as it must adjust its case sensitivity policy based on the OS on which it is running: for both Windows and OS/X, the filesystem is case insensitive, so the ContractOptions.IS_CASE_SENSITIVE option must be set to false. Furthermore, the Windows filesystem does not support Unix file and directory permissions, so the relevant flag must also be set. This is done after loading the XML contract file from the resource tree, simply by updating the now-loaded configuration options:
```java
getConf().setBoolean(getConfKey(ContractOptions.SUPPORTS_UNIX_PERMISSIONS), false);
```
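A sketch of what those adjustments might look like, assuming the OS flags in org.apache.hadoop.util.Shell are available (the actual LocalFSContract code may differ):

```java
// Hypothetical sketch: tighten the loaded contract options to match the
// host OS; Shell.WINDOWS and Shell.MAC are assumed to be available.
if (Shell.WINDOWS || Shell.MAC) {
  // Windows and OS/X filesystems are case insensitive
  getConf().setBoolean(getConfKey(ContractOptions.IS_CASE_SENSITIVE), false);
}
if (Shell.WINDOWS) {
  // the Windows filesystem does not support Unix permissions
  getConf().setBoolean(getConfKey(ContractOptions.SUPPORTS_UNIX_PERMISSIONS), false);
}
```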
If one of your new FileSystem's test cases fails a contract test, what can you do?

It depends on the cause of the problem.
If a test needs to be skipped because a feature is not supported, look for an existing configuration option in the ContractOptions class. If there is no such option, the short-term fix is to override the test method and use ContractTestUtils.skip() to log the fact that the test was skipped. This method prints the message to the logs, then tells the test runner that the test was skipped, which highlights the problem.
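A sketch of such an override, inside the concrete test class (the test method name is illustrative; override whichever test actually fails):

```java
// Hypothetical override of a contract test the store cannot pass;
// ContractTestUtils.skip() logs the message and marks the test as skipped.
@Override
public void testOverwriteNonEmptyDirectory() throws Throwable {
  ContractTestUtils.skip("the store does not reject overwrites of directories");
}
```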
A recommended strategy is to call the superclass method, catch the exception, and verify that the exception class and part of the error string match those raised by the current implementation. The test should also fail() if the superclass actually succeeded, that is, if it did not fail in the way the implementation currently does. This ensures that the test path is still executed, that any other failure of the test (possibly a regression) is picked up, and that if the feature does become implemented, the change is noticed.
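A minimal sketch of this pattern (the test method name and expected exception are illustrative; fail() and assertTrue() are the JUnit assertions inherited by the contract test base class):

```java
// Hypothetical catch-and-verify override: the superclass test still runs,
// but the currently expected failure is intercepted and checked.
@Override
public void testRenameNonexistentFile() throws Throwable {
  try {
    super.testRenameNonexistentFile();
    fail("expected a FileNotFoundException: the feature may now work");
  } catch (FileNotFoundException e) {
    // verify it failed the way the current implementation fails
    assertTrue("unexpected exception: " + e, e.toString().contains("rename"));
  }
}
```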
A long-term solution is to enhance the base test to add a new optional feature key. This will require collaboration with the developers on the hdfs-dev mailing list.
The contract tests include the notion of strict vs lax exceptions. Strict exception reporting means that failures are reported using specific subclasses of IOException, such as FileNotFoundException and EOFException; lax reporting means that operations simply throw IOException.
While FileSystems SHOULD raise stricter exceptions, there may be reasons why they cannot. Raising lax exceptions is still allowed; it merely hampers diagnostics of failures in user applications. To declare that a FileSystem does not support the stricter exceptions, set the option fs.contract.supports-strict-exceptions to false.
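For example, in the filesystem's contract XML file:

```xml
<property>
  <name>fs.contract.supports-strict-exceptions</name>
  <value>false</value>
</property>
```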
Tests against remote FileSystems will require the URL to the FileSystem to be specified; tests against remote FileSystems that require login details require usernames/IDs and passwords.
All these details MUST be placed in the file src/test/resources/contract-test-options.xml, and your SCM tools must be configured never to commit this file to Subversion, Git or equivalent. Furthermore, the build MUST be configured never to bundle this file in any -test artifacts generated. The Hadoop build does this, excluding src/test/**/*.xml from the JAR files. In addition, src/test/resources/auth-keys.xml will need to be created. It can be a copy of contract-test-options.xml. The AbstractFSContract class automatically loads this resource file if present; specific keys for specific test cases can be added.
As an example, here is what the S3A test keys look like:
```xml
<configuration>
  <property>
    <name>fs.contract.test.fs.s3a</name>
    <value>s3a://tests3contract</value>
  </property>

  <property>
    <name>fs.s3a.access.key</name>
    <value>DONOTPCOMMITTHISKEYTOSCM</value>
  </property>

  <property>
    <name>fs.s3a.secret.key</name>
    <value>DONOTEVERSHARETHISSECRETKEY!</value>
  </property>
</configuration>
```
The AbstractBondedFSContract automatically skips a test suite if the FileSystem URL is not defined in the property fs.contract.test.fs.%s, where %s matches the scheme of the FileSystem.
When running the tests, maven.test.skip will need to be turned off, since it is true by default on these tests. This can be done with a command like mvn test -Ptests-on.
Passing all the FileSystem contract tests does not mean that a filesystem can be described as “compatible with HDFS”. The tests try to look at the isolated functionality of each operation, and focus on the preconditions and postconditions of each action. Core areas not covered are concurrency and aspects of failure across a distributed system.
Proof of this is the fact that the Amazon S3 and OpenStack Swift object stores are eventually consistent, with non-atomic rename and delete operations. Single-threaded test cases are unlikely to see some of the concurrency issues, while consistency is very often only visible in tests that span a datacenter.
There are also some specific aspects of the use of the FileSystem API which are not covered:

* Compatibility with the hadoop fs command line.
* Whether the blocksize policy produces file splits that are suitable for analysis workloads.

Tests that verify these behaviors are of course welcome.
Some tests work directly against the root filesystem, attempting to do things like rename “/” and similar actions. The root directory is “special”, and it’s important to test this, especially on non-POSIX filesystems such as object stores. These tests are potentially very destructive to native filesystems, so use care.
Add the tests under AbstractRootDirectoryContractTest or create a new test with (a) Root in the title and (b) a check in the setup method to skip the test if root tests are disabled:
```java
skipIfUnsupported(TEST_ROOT_TESTS_ENABLED);
```
Don’t provide an implementation of this test suite to run against the local FS.
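A hypothetical suite for a filesystem with the scheme "myfs" might look like the following; the names MyFSContract and TestMyFSRootDirectoryContract are illustrative, and TEST_ROOT_TESTS_ENABLED is assumed to be the relevant ContractOptions constant:

```java
package org.apache.hadoop.fs.contract.myfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractFSContract;
import org.apache.hadoop.fs.contract.AbstractRootDirectoryContractTest;

import static org.apache.hadoop.fs.contract.ContractOptions.TEST_ROOT_TESTS_ENABLED;

// Hypothetical root-directory contract test; MyFSContract is assumed to
// be the filesystem's contract class.
public class TestMyFSRootDirectoryContract
    extends AbstractRootDirectoryContractTest {

  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    return new MyFSContract(conf);
  }

  @Override
  public void setup() throws Exception {
    super.setup();
    // skip the whole suite unless root tests are explicitly enabled
    skipIfUnsupported(TEST_ROOT_TESTS_ENABLED);
  }
}
```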
Tests designed to generate scalable load (this includes a large number of small files, as well as fewer, larger files) should be made configurable, so that users of the test suite can tune the number and size of files.
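One way to do this is to read the scale parameters from the loaded configuration, with modest defaults (the fs.contract.test.scale.* keys below are illustrative, not part of the contract test framework):

```java
// Hypothetical scale parameters, read from the loaded configuration so
// that users can override them in contract-test-options.xml.
Configuration conf = getContract().getConf();
int fileCount = conf.getInt("fs.contract.test.scale.file-count", 10);
long fileSize = conf.getLong("fs.contract.test.scale.file-size", 1024);
```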
Be aware that on object stores the directory rename operation is usually O(files * data), while the delete operation is O(files). The latter means that even directory cleanup operations may take time and can potentially time out. It is important to design tests that work against remote filesystems with possible delays in all operations.
The specification is incomplete. It doesn’t have complete coverage of the FileSystem classes, and there may be bits of the existing specified classes that are not covered.