Class FSDataInputStream

All Implemented Interfaces:
Closeable, DataInput, AutoCloseable, ByteBufferPositionedReadable, ByteBufferReadable, CanSetDropBehind, CanSetReadahead, CanUnbuffer, org.apache.hadoop.fs.HasEnhancedByteBufferAccess, org.apache.hadoop.fs.HasFileDescriptor, PositionedReadable, Seekable, org.apache.hadoop.fs.statistics.IOStatisticsSource, StreamCapabilities
Direct Known Subclasses:
HdfsDataInputStream

@Public @Stable public class FSDataInputStream extends DataInputStream implements Seekable, PositionedReadable, ByteBufferReadable, org.apache.hadoop.fs.HasFileDescriptor, CanSetDropBehind, CanSetReadahead, org.apache.hadoop.fs.HasEnhancedByteBufferAccess, CanUnbuffer, StreamCapabilities, ByteBufferPositionedReadable, org.apache.hadoop.fs.statistics.IOStatisticsSource
Utility that wraps a FSInputStream in a DataInputStream and buffers input through a BufferedInputStream.
  • Constructor Details

    • FSDataInputStream

      public FSDataInputStream(InputStream in)
  • Method Details

    • seek

      public void seek(long desired) throws IOException
      Seek to the given offset.
      Specified by:
      seek in interface Seekable
      Parameters:
      desired - offset to seek to
      Throws:
      IOException - raised on errors performing I/O.
    • getPos

      public long getPos() throws IOException
      Get the current position in the input stream.
      Specified by:
      getPos in interface Seekable
      Returns:
      current position in the input stream
      Throws:
      IOException - raised on errors performing I/O.
    • read

      public int read(long position, byte[] buffer, int offset, int length) throws IOException
      Read bytes from the given position in the stream to the given buffer.
      Specified by:
      read in interface PositionedReadable
      Parameters:
      position - position in the input stream to seek
      buffer - buffer into which data is read
      offset - offset into the buffer in which data is written
      length - maximum number of bytes to read
      Returns:
      total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached
      Throws:
      IOException - IO problems.
    • readFully

      public void readFully(long position, byte[] buffer, int offset, int length) throws IOException
      Read bytes from the given position in the stream to the given buffer. Continues to read until length bytes have been read.
      Specified by:
      readFully in interface PositionedReadable
      Parameters:
      position - position in the input stream to seek
      buffer - buffer into which data is read
      offset - offset into the buffer in which data is written
      length - the number of bytes to read
      Throws:
      IOException - IO problems
      EOFException - If the end of stream is reached while reading. If an exception is thrown an undetermined number of bytes in the buffer may have been written.
    • readFully

      public void readFully(long position, byte[] buffer) throws IOException
      Specified by:
      readFully in interface PositionedReadable
      Parameters:
      position - position within file
      buffer - destination buffer
      Throws:
      IOException - IO problems.
      EOFException - the end of the data was reached before the read operation completed
    • seekToNewSource

      public boolean seekToNewSource(long targetPos) throws IOException
      Seek to the given position on an alternate copy of the data.
      Specified by:
      seekToNewSource in interface Seekable
      Parameters:
      targetPos - position to seek to
      Returns:
      true if a new source is found, false otherwise
      Throws:
      IOException - raised on errors performing I/O.
    • getWrappedStream

      @Public @Stable public InputStream getWrappedStream()
      Get a reference to the wrapped input stream. Used by unit tests.
      Returns:
      the underlying input stream
    • read

      public int read(ByteBuffer buf) throws IOException
      Description copied from interface: ByteBufferReadable
      Reads up to buf.remaining() bytes into buf. Callers should use buf.limit(..) to control the size of the desired read.

      After a successful call, buf.position() will be advanced by the number of bytes read and buf.limit() will be unchanged.

      In the case of an exception, the state of the buffer (the contents of the buffer, the buf.position(), the buf.limit(), etc.) is undefined, and callers should be prepared to recover from this eventuality.

      Callers should use StreamCapabilities.hasCapability(String) with StreamCapabilities.READBYTEBUFFER to check if the underlying stream supports this interface, otherwise they might get a UnsupportedOperationException.

      Implementations should treat 0-length requests as legitimate, and must not signal an error upon their receipt.

      Specified by:
      read in interface ByteBufferReadable
      Parameters:
      buf - the ByteBuffer to receive the results of the read operation.
      Returns:
      the number of bytes read, possibly zero, or -1 if reach end-of-stream
      Throws:
      IOException - if there is some error performing the read
    • getFileDescriptor

      public FileDescriptor getFileDescriptor() throws IOException
      Specified by:
      getFileDescriptor in interface org.apache.hadoop.fs.HasFileDescriptor
      Returns:
      the FileDescriptor
      Throws:
      IOException - raised on errors performing I/O.
    • setReadahead

      public void setReadahead(Long readahead) throws IOException, UnsupportedOperationException
      Description copied from interface: CanSetReadahead
      Set the readahead on this stream.
      Specified by:
      setReadahead in interface CanSetReadahead
      Parameters:
      readahead - The readahead to use. null means to use the default.
      Throws:
      IOException - If there was an error changing the dropBehind setting. UnsupportedOperationException If this stream doesn't support setting readahead.
      UnsupportedOperationException
    • setDropBehind

      public void setDropBehind(Boolean dropBehind) throws IOException, UnsupportedOperationException
      Description copied from interface: CanSetDropBehind
      Configure whether the stream should drop the cache.
      Specified by:
      setDropBehind in interface CanSetDropBehind
      Parameters:
      dropBehind - Whether to drop the cache. null means to use the default value.
      Throws:
      IOException - If there was an error changing the dropBehind setting. UnsupportedOperationException If this stream doesn't support setting the drop-behind.
      UnsupportedOperationException
    • read

      public ByteBuffer read(ByteBufferPool bufferPool, int maxLength, EnumSet<ReadOption> opts) throws IOException, UnsupportedOperationException
      Description copied from interface: org.apache.hadoop.fs.HasEnhancedByteBufferAccess
      Get a ByteBuffer containing file data. This ByteBuffer may come from the stream itself, via a call like mmap, or it may come from the ByteBufferFactory which is passed in as an argument.
      Specified by:
      read in interface org.apache.hadoop.fs.HasEnhancedByteBufferAccess
      Parameters:
      bufferPool - If this is non-null, it will be used to create a fallback ByteBuffer when the stream itself cannot create one.
      maxLength - The maximum length of buffer to return. We may return a buffer which is shorter than this.
      opts - Options to use when reading.
      Returns:
      We will always return an empty buffer if maxLength was 0, whether or not we are at EOF. If maxLength > 0, we will return null if the stream has reached EOF. Otherwise, we will return a ByteBuffer containing at least one byte. You must free this ByteBuffer when you are done with it by calling releaseBuffer on it. The buffer will continue to be readable until it is released in this manner. However, the input stream's close method may warn about unclosed buffers.
      Throws:
      IOException - if there was an error reading.
      UnsupportedOperationException - if factory was null, and we needed an external byte buffer.
    • read

      public final ByteBuffer read(ByteBufferPool bufferPool, int maxLength) throws IOException, UnsupportedOperationException
      Throws:
      IOException
      UnsupportedOperationException
    • releaseBuffer

      public void releaseBuffer(ByteBuffer buffer)
      Description copied from interface: org.apache.hadoop.fs.HasEnhancedByteBufferAccess
      Release a ByteBuffer which was created by the enhanced ByteBuffer read function. You must not continue using the ByteBuffer after calling this function.
      Specified by:
      releaseBuffer in interface org.apache.hadoop.fs.HasEnhancedByteBufferAccess
      Parameters:
      buffer - The ByteBuffer to release.
    • unbuffer

      public void unbuffer()
      Description copied from interface: CanUnbuffer
      Reduce the buffering. This will also free sockets and file descriptors held by the stream, if possible.
      Specified by:
      unbuffer in interface CanUnbuffer
    • hasCapability

      public boolean hasCapability(String capability)
      Description copied from interface: StreamCapabilities
      Query the stream for a specific capability.
      Specified by:
      hasCapability in interface StreamCapabilities
      Parameters:
      capability - string to query the stream support for.
      Returns:
      True if the stream supports capability.
    • toString

      public String toString()
      String value. Includes the string value of the inner stream
      Overrides:
      toString in class Object
      Returns:
      the stream
    • read

      public int read(long position, ByteBuffer buf) throws IOException
      Description copied from interface: ByteBufferPositionedReadable
      Reads up to buf.remaining() bytes into buf from a given position in the file and returns the number of bytes read. Callers should use buf.limit(...) to control the size of the desired read and buf.position(...) to control the offset into the buffer the data should be written to.

      After a successful call, buf.position() will be advanced by the number of bytes read and buf.limit() will be unchanged.

      In the case of an exception, the state of the buffer (the contents of the buffer, the buf.position(), the buf.limit(), etc.) is undefined, and callers should be prepared to recover from this eventuality.

      Callers should use StreamCapabilities.hasCapability(String) with StreamCapabilities.PREADBYTEBUFFER to check if the underlying stream supports this interface, otherwise they might get a UnsupportedOperationException.

      Implementations should treat 0-length requests as legitimate, and must not signal an error upon their receipt.

      This does not change the current offset of a file, and is thread-safe.

      Specified by:
      read in interface ByteBufferPositionedReadable
      Parameters:
      position - position within file
      buf - the ByteBuffer to receive the results of the read operation.
      Returns:
      the number of bytes read, possibly zero, or -1 if reached end-of-stream
      Throws:
      IOException - if there is some error performing the read
    • readFully

      public void readFully(long position, ByteBuffer buf) throws IOException
      Delegate to the underlying stream.
      Specified by:
      readFully in interface ByteBufferPositionedReadable
      Parameters:
      position - position within file
      buf - the ByteBuffer to receive the results of the read operation.
      Throws:
      IOException - on a failure from the nested stream.
      UnsupportedOperationException - if the inner stream does not support this operation.
      See Also:
    • getIOStatistics

      public IOStatistics getIOStatistics()
      Get the IO Statistics of the nested stream, falling back to null if the stream does not implement the interface IOStatisticsSource.
      Specified by:
      getIOStatistics in interface org.apache.hadoop.fs.statistics.IOStatisticsSource
      Returns:
      an IOStatistics instance or null
    • minSeekForVectorReads

      public int minSeekForVectorReads()
      Description copied from interface: PositionedReadable
      What is the smallest reasonable seek?
      Specified by:
      minSeekForVectorReads in interface PositionedReadable
      Returns:
      the minimum number of bytes
    • maxReadSizeForVectorReads

      public int maxReadSizeForVectorReads()
      Description copied from interface: PositionedReadable
      What is the largest size that we should group ranges together as?
      Specified by:
      maxReadSizeForVectorReads in interface PositionedReadable
      Returns:
      the number of bytes to read at once
    • readVectored

      public void readVectored(List<? extends org.apache.hadoop.fs.FileRange> ranges, IntFunction<ByteBuffer> allocate) throws IOException
      Description copied from interface: PositionedReadable
      Read fully a list of file ranges asynchronously from this file. The default iterates through the ranges to read each synchronously, but the intent is that FSDataInputStream subclasses can make more efficient readers. As a result of the call, each range will have FileRange.setData(CompletableFuture) called with a future that when complete will have a ByteBuffer with the data from the file's range.

      The position returned by getPos() after readVectored() is undefined.

      If a file is changed while the readVectored() operation is in progress, the output is undefined. Some ranges may have old data, some may have new and some may have both.

      While a readVectored() operation is in progress, normal read api calls may block.

      Specified by:
      readVectored in interface PositionedReadable
      Parameters:
      ranges - the byte ranges to read
      allocate - the function to allocate ByteBuffer
      Throws:
      IOException - any IOE.
    • readVectored

      public void readVectored(List<? extends org.apache.hadoop.fs.FileRange> ranges, IntFunction<ByteBuffer> allocate, Consumer<ByteBuffer> release) throws IOException
      Description copied from interface: PositionedReadable
      Extension of PositionedReadable.readVectored(List, IntFunction) where a release(buffer) operation may be invoked if problems surface during reads.

      The release operation is invoked after an IOException to return the actively buffer to a pool before reporting a failure in the future.

      The default implementation calls PositionedReadable.readVectored(List, IntFunction).p

      Implementations SHOULD override this method if they can release buffers as part of their error handling.

      Specified by:
      readVectored in interface PositionedReadable
      Parameters:
      ranges - the byte ranges to read
      allocate - function to allocate ByteBuffer
      release - callable to release a ByteBuffer.
      Throws:
      IOException - any IOE.