
@InterfaceAudience.Public @InterfaceStability.Unstable

Package org.apache.hadoop.fs.statistics

This package contains support for statistic collection and reporting.


Package org.apache.hadoop.fs.statistics Description

This package contains support for statistic collection and reporting. This is the public API; implementation classes are to be kept elsewhere.

This package defines two interfaces:

IOStatisticsSource: a source of statistic data, which can be retrieved through a call to IOStatisticsSource.getIOStatistics().

IOStatistics: the statistics retrieved from a statistics source.

The retrieved statistics may be an immutable snapshot, in which case another call to IOStatisticsSource.getIOStatistics() must be made to get updated statistics. Or they may be dynamic, in which case the latest value is returned every time a specific statistic is retrieved. Callers should assume that if a statistics instance is dynamic, there is no atomicity when querying multiple statistics. If the statistics source is a closeable object (e.g. a stream), the statistics MUST remain valid after that object is closed.
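The difference between a snapshot and a dynamic instance can be illustrated with a minimal sketch. The `Stats` interface, the `DynamicStats` class, and the `snapshotOf` helper below are hypothetical stand-ins written for this example, not the classes of this package, and the key `stream_read_bytes` is used purely for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class SnapshotVsDynamic {

    /** Hypothetical, simplified stand-in for a statistics instance: counters only. */
    interface Stats {
        Map<String, Long> counters();
    }

    /** Dynamic view: every call reads the live counter, so values track the source. */
    static final class DynamicStats implements Stats {
        private final AtomicLong bytesRead;

        DynamicStats(AtomicLong bytesRead) {
            this.bytesRead = bytesRead;
        }

        @Override
        public Map<String, Long> counters() {
            return Collections.singletonMap("stream_read_bytes", bytesRead.get());
        }
    }

    /** Snapshot: copies the values once; later updates to the source are not seen. */
    static Stats snapshotOf(Stats source) {
        final Map<String, Long> copy =
            Collections.unmodifiableMap(new HashMap<>(source.counters()));
        return () -> copy;
    }

    public static void main(String[] args) {
        AtomicLong bytesRead = new AtomicLong();
        Stats dynamic = new DynamicStats(bytesRead);
        Stats snapshot = snapshotOf(dynamic);   // taken while the counter is 0

        bytesRead.addAndGet(4096);              // simulated IO after the snapshot

        System.out.println(dynamic.counters().get("stream_read_bytes"));  // prints 4096
        System.out.println(snapshot.counters().get("stream_read_bytes")); // prints 0
    }
}
```

A caller holding the snapshot would need to request a new one to observe the 4096 bytes, whereas the dynamic instance reflects them on the next lookup.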

Use pattern:

An application probes an object (filesystem, stream, etc.) to see if it implements IOStatisticsSource and, if it does, calls getIOStatistics() to get its statistics. If the result is non-null, the client has statistics on the current state of the object.
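The probe described above might look like the following sketch. To keep it runnable without Hadoop on the classpath, it declares simplified local stand-ins named after the two interfaces; the real interfaces in this package carry more than the single `counters()` method assumed here. The `retrieve` helper and the `CountingStream` class are hypothetical.

```java
import java.util.Collections;
import java.util.Map;

public class ProbeForStatistics {

    /** Simplified local stand-in; assumed to expose counters only. */
    interface IOStatistics {
        Map<String, Long> counters();
    }

    /** Simplified local stand-in for the source interface. */
    interface IOStatisticsSource {
        IOStatistics getIOStatistics();
    }

    /** Hypothetical stream-like object that publishes statistics. */
    static final class CountingStream implements IOStatisticsSource {
        @Override
        public IOStatistics getIOStatistics() {
            return () -> Collections.singletonMap("stream_read_bytes", 128L);
        }
    }

    /**
     * Probe any object: return its statistics if it is a source,
     * or null if it is not a source or has no statistics to offer.
     */
    static IOStatistics retrieve(Object target) {
        if (target instanceof IOStatisticsSource) {
            return ((IOStatisticsSource) target).getIOStatistics();
        }
        return null;
    }

    public static void main(String[] args) {
        IOStatistics stats = retrieve(new CountingStream());
        if (stats != null) {
            System.out.println("counters: " + stats.counters());
        }
        // A plain object is not a statistics source, so the probe yields null.
        System.out.println(retrieve(new Object()) == null); // prints true
    }
}
```

The null check matters in both directions: the object may not be a source at all, and a source may still return null.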

The expectation is that a statistics source is dynamic: when a value is looked up, the most recent value is returned. When iterating through the set, the values of the iterator SHOULD be frozen at the time the iterator was requested.

These statistics can be used to: log operations, profile applications, and make assertions about the state of the output.

The names of statistics are a matter of choice of the specific source. However, StoreStatisticNames contains a set of names recommended for object store operations. StreamStatisticNames declares recommended names for statistics provided for input and output streams.

Utility classes:

Implementors notes:

  1. IOStatistics keys SHOULD be standard names where possible.
  2. An IOStatistics instance MUST be unique to that specific instance of IOStatisticsSource (i.e. not shared the way StorageStatistics are).
  3. MUST return the same values irrespective of the thread from which the statistics are retrieved or their keys evaluated.
  4. MUST NOT remove keys once a statistic instance has been created.
  5. MUST NOT add keys once a statistic instance has been created.
  6. MUST NOT block for long periods of time while blocking operations (reads, writes) are taking place in the source. That is: minimal synchronization points (AtomicLongs etc.) may be used to share values, but retrieval of statistics should be fast and return values even while slow/blocking remote IO is underway.
  7. MUST support value enumeration and retrieval after the source has been closed.
  8. SHOULD NOT have back-references to potentially expensive objects (filesystem instances etc.)
  9. SHOULD provide statistics which can be added to generate aggregate statistics.
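Several of the notes above (a fixed key set, non-blocking reads, validity after close, and values that can be summed) can be met together with a small counter store. The sketch below is one possible shape under those assumptions, not the implementation used by Hadoop; the class `FixedKeyCounters`, its methods, and the keys `op_read` and `op_write` are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

public class FixedKeyCounters {

    // The key set is fixed at construction: keys are never added or removed later.
    private final Map<String, AtomicLong> counters;

    FixedKeyCounters(String... keys) {
        Map<String, AtomicLong> m = new LinkedHashMap<>();
        for (String key : keys) {
            m.put(key, new AtomicLong());
        }
        this.counters = Collections.unmodifiableMap(m);
    }

    /** Called by the source on each operation; lock-free, so it never waits on IO. */
    void increment(String key, long delta) {
        AtomicLong counter = counters.get(key);
        if (counter == null) {
            throw new IllegalArgumentException("Unknown statistic: " + key);
        }
        counter.addAndGet(delta);
    }

    /**
     * Copy of the current values. It holds no reference back to the source,
     * so it stays usable after the source has been closed.
     */
    Map<String, Long> counters() {
        Map<String, Long> out = new TreeMap<>();
        counters.forEach((key, value) -> out.put(key, value.get()));
        return out;
    }

    /** Counters are plain sums, so two sets of values aggregate by addition. */
    static Map<String, Long> aggregate(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> sum = new TreeMap<>(a);
        b.forEach((key, value) -> sum.merge(key, value, Long::sum));
        return sum;
    }

    public static void main(String[] args) {
        FixedKeyCounters first = new FixedKeyCounters("op_read", "op_write");
        FixedKeyCounters second = new FixedKeyCounters("op_read", "op_write");
        first.increment("op_read", 2);
        second.increment("op_read", 3);
        second.increment("op_write", 1);
        // prints {op_read=5, op_write=1}
        System.out.println(aggregate(first.counters(), second.counters()));
    }
}
```

Each source owns its own `FixedKeyCounters` instance, in line with note 2; sharing happens only through the aggregated copies.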

Copyright © 2024 Apache Software Foundation. All rights reserved.