The Hadoop FileSystem API Definition

This is a specification of the Hadoop FileSystem APIs, which models the contents of a filesystem as a set of paths that are either directories, symbolic links, or files.

There is surprisingly little prior art in this area. There are multiple specifications of Unix filesystems as a tree of inodes, but nothing public that defines the notion of “Unix filesystem as a conceptual model for data storage access”.

This specification attempts to do that: to define the Hadoop FileSystem model and APIs so that multiple filesystems can implement the APIs and present a consistent model of their data to applications. It does not attempt to formally specify the concurrency behaviours of the filesystems, beyond documenting the behaviours exhibited by HDFS, as Hadoop client applications commonly expect these.

  1. Introduction
  2. Notation
  3. Model
  4. FileSystem class
  5. OutputStream, Syncable and StreamCapabilities
  6. Abortable
  7. FSDataInputStream class
  8. PathCapabilities interface
  9. FSDataOutputStreamBuilder class
  10. Testing with the Filesystem specification
  11. Extending the specification and its tests
  12. Uploading a file using Multiple Parts
  13. IOStatistics
  14. openFile()
  15. SafeMode
  16. LeaseRecoverable