Notation

A formal notation such as The Z Notation would be the strictest way to define Hadoop FileSystem behavior, and could even be used to prove some axioms.

However, it has a number of practical flaws:

  1. Such notations are not as widely used as they should be, so the broader software development community is not going to have practical experience of it.

  2. It’s very hard to work with without dropping into tools such as LaTeX and add-on libraries.

  3. Such notations are difficult to understand, even for experts.

Given that the target audience of this specification is FileSystem developers, formal notations are not appropriate. Instead, broad comprehensibility, ease of maintenance, and ease of deriving tests take priority over mathematically-pure formal notation.

Mathematics Symbols in this document

This document does use a subset of the notation in the Z syntax, but in an ASCII form and the use of Python list notation for manipulating lists and sets.

  • iff : iff If and only if
  • : implies
  • : --> total function
  • : -> partial function
  • : ^: Set Intersection
  • : +: Set Union
  • \ : -: Set Difference
  • : exists Exists predicate

  • : forall: For all predicate
  • = : == Equals operator
  • : != operator. In Java z ≠ y is written as !( z.equals(y)) for all non-simple datatypes
  • : equivalent-to equivalence operator. This is stricter than equals.
  • : {} Empty Set. ∅ ≡ {}
  • : approximately-equal-to operator
  • ¬ : not Not operator. In Java, !
  • : does-not-exist: Does not exist predicate. Equivalent to not exists
  • : and : local and operator. In Java , &&
  • : or : local and operator. In Java, ||
  • : in : element of
  • : not in : not an element of
  • : subset-or-equal-to the subset or equality condition
  • : subset-of the proper subset condition
  • | p | : len(p) the size of a variable
  • := : = :

  • `:#` : Python-style comments

  • happens-before : happens-before : Lamport’s ordering relationship as defined in Time, Clocks and the Ordering of Events in a Distributed System

Sets, Lists, Maps, and Strings

The python data structures are used as the basis for this syntax as it is both plain ASCII and well-known.

Lists
  • A list L is an ordered sequence of elements [e1, e2, ... en]
  • The size of a list len(L) is the number of elements in a list.
  • Items can be addressed by a 0-based index e1 == L[0]
  • Python slicing operators can address subsets of a list L[0:3] == [e1,e2], L[:-1] == en
  • Lists can be concatenated L' = L + [ e3 ]
  • Lists can have entries removed L' = L - [ e2, e1 ]. This is different from Python’s del operation, which operates on the list in place.
  • The membership predicate in returns true iff an element is a member of a List: e2 in L
  • List comprehensions can create new lists: L' = [ x for x in l where x < 5]
  • for a list L, len(L) returns the number of elements.
Sets

Sets are an extension of the List notation, adding the restrictions that there can be no duplicate entries in the set, and there is no defined order.

  • A set is an unordered collection of items surrounded by { and } braces.
  • When declaring one, the python constructor {} is used. This is different from Python, which uses the function set([list]). Here the assumption is that the difference between a set and a dictionary can be determined from the contents.
  • The empty set {} has no elements.
  • All the usual set concepts apply.
  • The membership predicate is in.
  • Set comprehension uses the Python list comprehension. S' = {s for s in S where len(s)==2}
  • for a set s, len(s) returns the number of elements.
  • The - operator returns a new set excluding all items listed in the righthand set of the operator.
Maps

Maps resemble Python dictionaries; {“key”:value, “key2”,value2}

  • keys(Map) represents the set of keys in a map.
  • k in Map holds iff k in keys(Map)
  • The empty map is written {:}
  • The - operator returns a new map which excludes the entry with the key specified.
  • len(Map) returns the number of entries in the map.
Strings

Strings are lists of characters represented in double quotes. e.g. "abc"

"abc" == ['a','b','c']

State Immutability

All system state declarations are immutable.

The suffix “’” (single quote) is used as the convention to indicate the state of the system after an operation:

L' = L + ['d','e']

Function Specifications

A function is defined as a set of preconditions and a set of postconditions, where the postconditions define the new state of the system and the return value from the function.

Exceptions

In classic specification languages, the preconditions define the predicates that MUST be satisfied else some failure condition is raised.

For Hadoop, we need to be able to specify what failure condition results if a specification is not met (usually what exception is to be raised).

The notation raise <exception-name> is used to indicate that an exception is to be raised.

It can be used in the if-then-else sequence to define an action if a precondition is not met.

Example:

if not exists(FS, Path) : raise IOException

If implementations may raise any one of a set of exceptions, this is denoted by providing a set of exceptions:

if not exists(FS, Path) : raise {FileNotFoundException, IOException}

If a set of exceptions is provided, the earlier elements of the set are preferred to the later entries, on the basis that they aid diagnosis of problems.

We also need to distinguish predicates that MUST be satisfied, along with those that SHOULD be met. For this reason a function specification MAY include a section in the preconditions marked ‘Should:’ All predicates declared in this section SHOULD be met, and if there is an entry in that section which specifies a stricter outcome, it SHOULD BE preferred. Here is an example of a should-precondition:

Should:

if not exists(FS, Path) : raise FileNotFoundException

Conditions

There are further conditions used in precondition and postcondition declarations.

supported(instance, method)

This condition declares that a subclass implements the named method -some subclasses of the verious FileSystem classes do not, and instead raise UnsupportedOperation

As an example, one precondition of FSDataInputStream.seek is that the implementation must support Seekable.seek :

supported(FDIS, Seekable.seek) else raise UnsupportedOperation