A normal Hadoop test run will test those FileSystems that can be tested locally via the local filesystem. This typically means file:// and its underlying LocalFileSystem, and hdfs:// via the HDFS MiniCluster.
Other filesystems are skipped unless there is a specific configuration to the remote server providing the filesystem.
These filesystem bindings must be defined in an XML configuration file, usually hadoop-common-project/hadoop-common/src/test/resources/contract-test-options.xml. This file is excluded and should not be checked in.
In contract-test-options.xml, the filesystem name must be defined in the property fs.contract.test.fs.ftp. The specific login options to connect to the FTP Server must then be provided.
A path to a test directory must also be provided in the option fs.contract.test.ftp.testdir. This is the directory under which operations take place.
Example:
<configuration>
  <property>
    <name>fs.contract.test.fs.ftp</name>
    <value>ftp://server1/</value>
  </property>
  <property>
    <name>fs.ftp.user.server1</name>
    <value>testuser</value>
  </property>
  <property>
    <name>fs.contract.test.ftp.testdir</name>
    <value>/home/testuser/test</value>
  </property>
  <property>
    <name>fs.ftp.password.server1</name>
    <value>secret-login</value>
  </property>
</configuration>
The core of adding a new FileSystem to the contract tests is adding a new contract class, then creating a new non-abstract test class for every test suite that you wish to test.
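1. Create a package in your own test source tree, usually under contract, for the files and tests.
2. Subclass AbstractFSContract for your own contract implementation.
3. For every test suite you plan to support, create a non-abstract subclass whose name starts with Test and the name of the filesystem. Example: TestHDFSRenameContract.
4. These non-abstract classes must implement the abstract method createContract().
5. Define any filesystem bindings that are needed in the src/test/resources/contract-test-options.xml file of the specific project.
As an example, here is the implementation of the create() tests for the local filesystem.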
package org.apache.hadoop.fs.contract.localfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractCreateContractTest;
import org.apache.hadoop.fs.contract.AbstractFSContract;

public class TestLocalCreateContract extends AbstractCreateContractTest {

  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    return new LocalFSContract(conf);
  }
}
The standard implementation technique for subclasses of AbstractFSContract is to be driven entirely by a Hadoop XML configuration file stored in the test resource tree. The best practice is to store it under /contract with the name of the FileSystem, such as contract/localfs.xml. Having the XML file define all FileSystem options makes the listing of FileSystem behaviors immediately visible.
The LocalFSContract is a special case of this, as it must adjust its case sensitivity policy based on the OS on which it is running: for both Windows and OS X, the filesystem is case insensitive, so the ContractOptions.IS_CASE_SENSITIVE option must be set to false. Furthermore, the Windows filesystem does not support Unix file and directory permissions, so the relevant flag must also be set. This is done after loading the XML contract file from the resource tree, simply by updating the now-loaded configuration options:
getConf().setBoolean(getConfKey(ContractOptions.SUPPORTS_UNIX_PERMISSIONS), false);
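The OS-dependent adjustment described above can be sketched with the JDK alone. This is an illustration, not the LocalFSContract source: the class name, the adjustForOs() helper, and both key strings are assumptions made for the example, and java.util.Properties stands in for the Hadoop Configuration.

```java
import java.util.Locale;
import java.util.Properties;

/** Sketch: override already-loaded contract options to match the host OS. */
final class LocalContractOptionsSketch {

  // Hypothetical key names, chosen for this illustration only.
  static final String CASE_SENSITIVE_KEY = "fs.contract.is-case-sensitive";
  static final String UNIX_PERMISSIONS_KEY = "fs.contract.supports-unix-permissions";

  /** Updates the options in place, based on the given OS name. */
  static void adjustForOs(Properties options, String osName) {
    String os = osName.toLowerCase(Locale.ROOT);
    boolean windows = os.startsWith("windows");
    boolean mac = os.startsWith("mac");
    if (windows || mac) {
      // Both platforms are treated as case insensitive.
      options.setProperty(CASE_SENSITIVE_KEY, "false");
    }
    if (windows) {
      // Windows has no Unix-style file and directory permissions.
      options.setProperty(UNIX_PERMISSIONS_KEY, "false");
    }
  }

  public static void main(String[] args) {
    Properties options = new Properties();
    // Values as they might be after loading the XML contract file.
    options.setProperty(CASE_SENSITIVE_KEY, "true");
    options.setProperty(UNIX_PERMISSIONS_KEY, "true");
    adjustForOs(options, System.getProperty("os.name"));
    System.out.println(options);
  }
}
```

Keeping the defaults in the XML file and applying only the OS-specific overrides in code means the XML file remains the main record of what the filesystem supports.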
If your new FileSystem test case fails one of the contract tests, what can you do?
It depends on the cause of the problem:
- The FileSystem subclass doesn’t correctly implement the specification. Fix it.
- The underlying filesystem behaves differently from what the tests expect. Where possible, have the FileSystem subclass hide the differences, e.g. by translating exceptions.
- The specification or a test appears to be wrong. Raise it on the hdfs-dev mailing list. Note that while FileSystem tests live in the core Hadoop codebase, it is the HDFS team who owns the FileSystem specification and the tests that accompany it.
If a test needs to be skipped because a feature is not supported, look for an existing configuration option in the ContractOptions class. If there is no such option, the short-term fix is to override the test method and use the ContractTestUtils.skip() method to log the fact that a test is skipped. Using this method prints the message to the logs, then tells the test runner that the test was skipped. This highlights the problem.
A recommended strategy is to call the superclass, catch the exception, and verify that the exception class and part of the error string match those raised by the current implementation. The override should also fail() if the superclass test actually succeeded, that is, if it did not fail in the way that the implementation currently does. This ensures that the test path is still executed and that any other failure of the test, possibly a regression, is picked up. And, if the feature does become implemented, that change is picked up too.
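The strategy above can be sketched as follows. To keep the example self-contained, it uses stand-in classes rather than the Hadoop contract suites: AbstractAppendSuiteSketch, NoAppendSuiteSketch, and the append() operation are hypothetical names, and an AssertionError takes the place of a JUnit fail() call.

```java
import java.io.IOException;

/** Stand-in for an abstract contract test suite; not a Hadoop class. */
abstract class AbstractAppendSuiteSketch {

  /** The operation under test; a real suite would call the FileSystem. */
  protected abstract void append(String path) throws IOException;

  /** Base test case: any exception from the operation fails the test. */
  public void testAppend() throws IOException {
    append("/test/file");
  }
}

/** Suite for a hypothetical store that does not implement append. */
class NoAppendSuiteSketch extends AbstractAppendSuiteSketch {

  @Override
  protected void append(String path) throws IOException {
    throw new IOException("append not supported: " + path);
  }

  @Override
  public void testAppend() throws IOException {
    try {
      super.testAppend();
    } catch (IOException e) {
      // Accept only the known failure: same class, expected message text.
      // Anything else is rethrown so that a regression is still reported.
      if (e.getClass() != IOException.class
          || !e.getMessage().contains("not supported")) {
        throw e;
      }
      return;
    }
    // The superclass test passed, so the feature now works and this
    // override is out of date. Fail so that the change is noticed.
    throw new AssertionError("testAppend() unexpectedly succeeded");
  }
}
```

Because the superclass test is still invoked, the override cannot silently hide a new kind of failure, and it fails loudly once the feature is implemented.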
A long-term solution is to enhance the base test to add a new optional feature key. This will require collaboration with the developers on the hdfs-dev mailing list.
The contract tests include the notion of strict vs lax exceptions. Strict exception reporting means that failures are reported using specific subclasses of IOException, such as FileNotFoundException, EOFException and so on. Lax reporting means that a plain IOException is thrown.
While FileSystems SHOULD raise stricter exceptions, there may be reasons why they cannot. Raising lax exceptions is still allowed; it merely hampers diagnostics of failures in user applications. To declare that a FileSystem does not support the stricter exceptions, set the option fs.contract.supports-strict-exceptions to false.
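To illustrate the difference between the two modes, here is a minimal sketch of how a check on a "file not found" failure could depend on the strictness flag. The class and method names are hypothetical, and this is not how the contract test base classes are written; it only shows the rule in code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Sketch: decide whether an exception satisfies the strict or lax policy. */
final class ExceptionPolicySketch {

  /**
   * Checks an exception raised when opening a missing file.
   * Strict mode requires a FileNotFoundException; lax mode accepts
   * any IOException, including the specific subclasses.
   */
  static boolean isAcceptableForMissingFile(IOException e, boolean strict) {
    return !strict || e instanceof FileNotFoundException;
  }

  public static void main(String[] args) {
    IOException lax = new IOException("no such file");
    System.out.println(isAcceptableForMissingFile(lax, true));
    System.out.println(isAcceptableForMissingFile(lax, false));
  }
}
```

A filesystem that declares the option false would be held to the lax rule only, so a plain IOException would not count as a test failure.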
Tests against remote FileSystems will require the URL to the FileSystem to be specified; tests against remote FileSystems that require login details require usernames/IDs and passwords.
All these details MUST be placed in the file src/test/resources/contract-test-options.xml, and your SCM tools configured to never commit this file to subversion, git or equivalent. Furthermore, the build MUST be configured to never bundle this file in any -test artifacts generated. The Hadoop build does this, excluding src/test/**/*.xml from the JAR files. In addition, src/test/resources/auth-keys.xml will need to be created. It can be a copy of contract-test-options.xml. The AbstractFSContract class automatically loads this resource file if present; specific keys for specific test cases can be added.
As an example, here is what S3A test keys look like:
<configuration>
  <property>
    <name>fs.contract.test.fs.s3a</name>
    <value>s3a://tests3contract</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>DONOTCOMMITTHISKEYTOSCM</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>DONOTEVERSHARETHISSECRETKEY!</value>
  </property>
</configuration>
The AbstractBondedFSContract automatically skips a test suite if the FileSystem URL is not defined in the property fs.contract.test.fs.%s, where %s matches the scheme name of the FileSystem.
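The lookup rule can be illustrated with a small JDK-only sketch. It is not the AbstractBondedFSContract source: the class and method names are hypothetical, java.util.Properties stands in for the Hadoop Configuration, and an empty Optional represents the "skip this suite" outcome.

```java
import java.net.URI;
import java.util.Optional;
import java.util.Properties;

/** Sketch: resolve the test filesystem URI for a given scheme. */
final class BondedContractLookupSketch {

  /** Property name pattern from the text; %s is the filesystem scheme. */
  static final String FS_KEY_PATTERN = "fs.contract.test.fs.%s";

  /** Builds the property name for a scheme such as "s3a" or "ftp". */
  static String keyFor(String scheme) {
    return String.format(FS_KEY_PATTERN, scheme);
  }

  /** Returns the configured URI, or empty if the suite should be skipped. */
  static Optional<URI> testFileSystem(Properties options, String scheme) {
    String value = options.getProperty(keyFor(scheme));
    if (value == null || value.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(URI.create(value.trim()));
  }

  public static void main(String[] args) {
    Properties options = new Properties();
    options.setProperty("fs.contract.test.fs.s3a", "s3a://tests3contract");
    System.out.println(testFileSystem(options, "s3a").isPresent());
    System.out.println(testFileSystem(options, "ftp").isPresent());
  }
}
```

With only the S3A key defined, the S3A suites would run and the FTP suites would be skipped.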
When running the tests, maven.test.skip will need to be turned off, since it is true by default on these tests. This can be done with a command like mvn test -Ptests-on.
Passing all the FileSystem contract tests does not mean that a filesystem can be described as “compatible with HDFS”. The tests try to look at the isolated functionality of each operation, and focus on the preconditions and postconditions of each action. Core areas not covered are concurrency and aspects of failure across a distributed system.
There are also some specific aspects of the use of the FileSystem API:
- Compatibility with the hadoop -fs CLI.
Tests that verify these behaviors are of course welcome.
1. New tests should be split up with a test class per operation, as is done for seek(), rename(), create(), and so on. This is to match up the way that the FileSystem contract specification is split up by operation. It also makes it easier for FileSystem implementors to work on one test suite at a time.
2. Subclass AbstractFSContractTestBase with a new abstract test suite class. Again, use Abstract in the title.
3. Look at org.apache.hadoop.fs.contract.ContractTestUtils for utility methods to aid testing, with lots of filesystem-centric assertions. Use these to make assertions about the filesystem state, and to include diagnostics information such as directory listings and dumps of mismatched files when an assertion actually fails.
Some tests work directly against the root filesystem, attempting to do things like rename “/” and similar actions. The root directory is “special”, and it’s important to test this, especially on non-POSIX filesystems such as object stores. These tests are potentially very destructive to native filesystems, so use care.
Add the tests under AbstractRootDirectoryContractTest or create a new test with (a) Root in the title and (b) a check in the setup method to skip the test if root tests are disabled:
skipIfUnsupported(TEST_ROOT_TESTS_ENABLED);
Don’t provide an implementation of this test suite to run against the local FS.
Tests designed to generate scalable load, which includes a large number of small files as well as fewer larger files, should be designed to be configurable, so that users of the test suite can configure the number and size of files.
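One way to make such a test configurable is to read the file count and file size from options, with small defaults. The sketch below is only an illustration: the class name and both property names are hypothetical and are not options defined by the contract tests.

```java
import java.util.Properties;

/** Sketch: scale test parameters read from options, with small defaults. */
final class ScaleTestOptionsSketch {

  // Hypothetical property names, for illustration only.
  static final String FILE_COUNT_KEY = "scale.test.file.count";
  static final String FILE_SIZE_KEY = "scale.test.file.size.bytes";

  final int fileCount;
  final long fileSizeBytes;

  ScaleTestOptionsSketch(Properties options) {
    // Defaults are kept small so an unconfigured run stays cheap.
    fileCount = Integer.parseInt(options.getProperty(FILE_COUNT_KEY, "10"));
    fileSizeBytes = Long.parseLong(options.getProperty(FILE_SIZE_KEY, "1024"));
    if (fileCount < 0 || fileSizeBytes < 0) {
      throw new IllegalArgumentException("count and size must not be negative");
    }
  }

  /** Total data volume the test would write; fails on overflow. */
  long totalBytes() {
    return Math.multiplyExact((long) fileCount, fileSizeBytes);
  }

  public static void main(String[] args) {
    // System properties let a user override the defaults with -D flags.
    ScaleTestOptionsSketch scale =
        new ScaleTestOptionsSketch(System.getProperties());
    System.out.println(scale.fileCount + " files, "
        + scale.totalBytes() + " bytes in total");
  }
}
```

A user testing against a slow remote store could then lower the count or size without editing the test itself.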
Be aware that on object stores, the directory rename operation is usually O(files)*O(data) while the delete operation is O(files). The latter means that even directory cleanup operations may take time and can potentially time out. It is important to design tests that work against remote filesystems with possible delays in all operations.
The specification is incomplete. It doesn’t have complete coverage of the FileSystem classes, and there may be bits of the existing specified classes that are not covered.