A formal notation such as The Z Notation would be the strictest way to define Hadoop FileSystem behavior, and could even be used to prove some axioms.
However, it has a number of practical flaws:
Such notations are not as widely used as they should be, so the broader software development community is not going to have practical experience of it.
It’s very hard to work with without dropping into tools such as LaTeX and add-on libraries.
Such notations are difficult to understand, even for experts.
Given that the target audience of this specification is FileSystem developers, formal notations are not appropriate. Instead, broad comprehensibility, ease of maintenance, and ease of deriving tests take priority over mathematically-pure formal notation.
This document does use a subset of the notation in the Z syntax, but in an ASCII form and the use of Python list notation for manipulating lists and sets.
iff
: iff
If and only if⇒
: implies
→
: -->
total function↛
: ->
partial function∩
: ^
: Set Intersection
∪
: +
: Set Union\
: -
: Set Difference∃
: exists
Exists predicate
∀
: forall
: For all predicate=
: ==
Equals operator≠
: !=
operator. In Java z ≠ y
is written as !( z.equals(y))
for all non-simple datatypes≡
: equivalent-to
equivalence operator. This is stricter than equals.∅
: {}
Empty Set. ∅ ≡ {}
≈
: approximately-equal-to
operator¬
: not
Not operator. In Java, !
∄
: does-not-exist
: Does not exist predicate. Equivalent to not exists
∧
: and
: local and operator. In Java , &&
∨
: or
: local and operator. In Java, ||
∈
: in
: element of∉
: not in
: not an element of⊆
: subset-or-equal-to
the subset or equality condition⊂
: subset-of
the proper subset condition| p |
: len(p)
the size of a variable:=
: =
:
`` : #
: Python-style comments
happens-before
: happens-before
: Lamport’s ordering relationship as defined in Time, Clocks and the Ordering of Events in a Distributed System
The python data structures are used as the basis for this syntax as it is both plain ASCII and well-known.
[e1, e2, ... en]
len(L)
is the number of elements in a list.e1 == L[0]
L[0:3] == [e1,e2]
, L[:-1] == en
L' = L + [ e3 ]
L' = L - [ e2, e1 ]
. This is different from Python’s del
operation, which operates on the list in place.in
returns true iff an element is a member of a List: e2 in L
L' = [ x for x in l where x < 5]
L
, len(L)
returns the number of elements.Sets are an extension of the List notation, adding the restrictions that there can be no duplicate entries in the set, and there is no defined order.
{
and }
braces.{}
is used. This is different from Python, which uses the function set([list])
. Here the assumption is that the difference between a set and a dictionary can be determined from the contents.{}
has no elements.in
.S' = {s for s in S where len(s)==2}
len(s)
returns the number of elements.-
operator returns a new set excluding all items listed in the righthand set of the operator.Maps resemble Python dictionaries; {“key”:value, “key2”,value2}
keys(Map)
represents the set of keys in a map.k in Map
holds iff k in keys(Map)
{:}
-
operator returns a new map which excludes the entry with the key specified.len(Map)
returns the number of entries in the map.Strings are lists of characters represented in double quotes. e.g. "abc"
"abc" == ['a','b','c']
All system state declarations are immutable.
The suffix “’” (single quote) is used as the convention to indicate the state of the system after an operation:
L' = L + ['d','e']
A function is defined as a set of preconditions and a set of postconditions, where the postconditions define the new state of the system and the return value from the function.
In classic specification languages, the preconditions define the predicates that MUST be satisfied else some failure condition is raised.
For Hadoop, we need to be able to specify what failure condition results if a specification is not met (usually what exception is to be raised).
The notation raise <exception-name>
is used to indicate that an exception is to be raised.
It can be used in the if-then-else sequence to define an action if a precondition is not met.
Example:
if not exists(FS, Path) : raise IOException
If implementations may raise any one of a set of exceptions, this is denoted by providing a set of exceptions:
if not exists(FS, Path) : raise {FileNotFoundException, IOException}
If a set of exceptions is provided, the earlier elements of the set are preferred to the later entries, on the basis that they aid diagnosis of problems.
We also need to distinguish predicates that MUST be satisfied, along with those that SHOULD be met. For this reason a function specification MAY include a section in the preconditions marked ‘Should:’ All predicates declared in this section SHOULD be met, and if there is an entry in that section which specifies a stricter outcome, it SHOULD BE preferred. Here is an example of a should-precondition:
Should:
if not exists(FS, Path) : raise FileNotFoundException
There are further conditions used in precondition and postcondition declarations.
supported(instance, method)
This condition declares that a subclass implements the named method -some subclasses of the verious FileSystem classes do not, and instead raise UnsupportedOperation
As an example, one precondition of FSDataInputStream.seek
is that the implementation must support Seekable.seek
:
supported(FDIS, Seekable.seek) else raise UnsupportedOperation