org.apache.hadoop.io.compress.bzip2
Class CBZip2InputStream

java.lang.Object
  extended by java.io.InputStream
      extended by org.apache.hadoop.io.compress.bzip2.CBZip2InputStream
All Implemented Interfaces:
Closeable, BZip2Constants

public class CBZip2InputStream
extends InputStream
implements BZip2Constants

An input stream that decompresses from the BZip2 format (without the file header chars) to be read as any other stream.

The decompression requires large amounts of memory. Thus you should call the close() method as soon as possible, to force CBZip2InputStream to release the allocated memory. See CBZip2OutputStream for information about memory usage.

CBZip2InputStream reads bytes from the compressed source stream exclusively through the single-byte read() method, so you should consider wrapping the source in a buffered stream.

Instances of this class are not thread-safe.


Nested Class Summary
static class CBZip2InputStream.STATE
          A state machine that keeps track of the current state of the decoder
 
Field Summary
static long BLOCK_DELIMITER
           
static long EOS_DELIMITER
           
 
Fields inherited from interface org.apache.hadoop.io.compress.bzip2.BZip2Constants
baseBlockSize, END_OF_BLOCK, END_OF_STREAM, G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, N_GROUPS, N_ITERS, NUM_OVERSHOOT_BYTES, rNums, RUNA, RUNB
 
Constructor Summary
CBZip2InputStream(InputStream in)
           
CBZip2InputStream(InputStream in, SplittableCompressionCodec.READ_MODE readMode)
          Constructs a new CBZip2InputStream which decompresses bytes read from the specified stream.
 
Method Summary
 void close()
           
 long getProcessedByteCount()
          This method reports the processed bytes so far.
static long numberOfBytesTillNextMarker(InputStream in)
          Returns the number of bytes between the current stream position and the immediate next BZip2 block marker.
 int read()
           
 int read(byte[] dest, int offs, int len)
          In CONTINUOUS reading mode, this read method starts at the beginning of the compressed stream and ends at the end of the file, emitting uncompressed data.
protected  void reportCRCError()
           
 boolean skipToNextMarker(long marker, int markerBitLength)
          This method tries to find the marker (passed to it as the first parameter) in the stream.
protected  void updateProcessedByteCount(int count)
          This method keeps track of raw processed compressed bytes.
 void updateReportedByteCount(int count)
          This method is called by the client of this class in case there are any corrections in the stream position.
 
Methods inherited from class java.io.InputStream
available, mark, markSupported, read, reset, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BLOCK_DELIMITER

public static final long BLOCK_DELIMITER
See Also:
Constant Field Values

EOS_DELIMITER

public static final long EOS_DELIMITER
See Also:
Constant Field Values
Constructor Detail

CBZip2InputStream

public CBZip2InputStream(InputStream in,
                         SplittableCompressionCodec.READ_MODE readMode)
                  throws IOException
Constructs a new CBZip2InputStream which decompresses bytes read from the specified stream.

Although BZip2 headers begin with the magic "BZ", this constructor expects the next byte in the stream to be the first one after the magic. Callers must therefore skip the first two bytes; otherwise this constructor throws an exception.

Throws:
IOException - if the stream content is malformed or an I/O error occurs.
NullPointerException - if in == null
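
The header handling described above can be sketched with JDK classes only. This is a minimal illustration, not code from Hadoop: the class Bzip2MagicSkipSketch and its skipMagic method are hypothetical names. It buffers the source (as the class description recommends), consumes the two magic bytes, and returns a stream positioned where the constructor expects it.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Hadoop.
final class Bzip2MagicSkipSketch {

    // Wraps the raw source in a buffer, consumes the two-byte "BZ" magic,
    // and returns the stream positioned at the first byte after it.
    static InputStream skipMagic(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        int first = in.read();
        int second = in.read();
        if (first != 'B' || second != 'Z') {
            throw new IOException("Stream does not start with the BZ magic");
        }
        // The returned stream is what a caller would hand to
        // new CBZip2InputStream(in) or new CBZip2InputStream(in, readMode).
        return in;
    }
}
```

A caller that tracks compressed positions may also want to account for the two skipped bytes, which is the use case the updateReportedByteCount description mentions.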

CBZip2InputStream

public CBZip2InputStream(InputStream in)
                  throws IOException
Throws:
IOException
Method Detail

getProcessedByteCount

public long getProcessedByteCount()
This method reports the processed bytes so far. Please note that this statistic is only updated on block boundaries and only when the stream is initiated in BYBLOCK mode.


updateProcessedByteCount

protected void updateProcessedByteCount(int count)
This method keeps track of raw processed compressed bytes.

Parameters:
count - the number of bytes to add to the raw processed byte count

updateReportedByteCount

public void updateReportedByteCount(int count)
This method is called by the client of this class when the stream position needs correcting. One common example is when the client strips the leading BZ characters from the compressed stream.

Parameters:
count - the number of bytes to add to the reported byte count

skipToNextMarker

public boolean skipToNextMarker(long marker,
                                int markerBitLength)
                         throws IOException,
                                IllegalArgumentException
This method tries to find the marker (passed to it as the first parameter) in the stream. It can find bit patterns of length <= 63 bits. Specifically this method is used in CBZip2InputStream to find the end of block (EOB) delimiter in the stream, starting from the current position of the stream. If marker is found, the stream position will be right after marker at the end of this call.

Parameters:
marker - The bit pattern to be found in the stream
markerBitLength - Number of bits in the marker
Throws:
IOException
IllegalArgumentException - if markerBitLength is greater than 63
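
Because BZip2 blocks are not byte-aligned, a marker search has to compare at every bit offset. The following is a rough sketch of one way to do that with a sliding bit window. It is not the Hadoop implementation, and MarkerScanSketch and bitsUntilMarkerEnd are hypothetical names. The tests use 0x314159265359, the 48-bit block header magic of the bzip2 format; that BLOCK_DELIMITER holds this value is an assumption, since the constant's value is not shown on this page.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not part of Hadoop.
final class MarkerScanSketch {

    // Scans the stream bit by bit (most significant bit first) and returns
    // the number of bits consumed up to and including the marker,
    // or -1 if the stream ends before the marker is seen.
    static long bitsUntilMarkerEnd(InputStream in, long marker, int markerBitLength)
            throws IOException {
        if (markerBitLength < 1 || markerBitLength > 63) {
            throw new IllegalArgumentException("markerBitLength must be in 1..63");
        }
        long mask = (1L << markerBitLength) - 1; // keeps only the window bits
        long window = 0;
        long bitsRead = 0;
        int b;
        while ((b = in.read()) != -1) {
            for (int i = 7; i >= 0; i--) {
                // Shift the next bit into the window and drop the oldest one.
                window = ((window << 1) | ((b >> i) & 1)) & mask;
                bitsRead++;
                if (bitsRead >= markerBitLength && window == marker) {
                    return bitsRead;
                }
            }
        }
        return -1;
    }
}
```

Capping the pattern at 63 bits lets the window and the mask fit in a single non-negative long, which matches the limit stated for skipToNextMarker.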

reportCRCError

protected void reportCRCError()
                       throws IOException
Throws:
IOException

numberOfBytesTillNextMarker

public static long numberOfBytesTillNextMarker(InputStream in)
                                        throws IOException
Returns the number of bytes between the current stream position and the immediate next BZip2 block marker.

Parameters:
in - The InputStream
Returns:
long Number of bytes between current stream position and the next BZip2 block start marker.
Throws:
IOException

read

public int read()
         throws IOException
Specified by:
read in class InputStream
Throws:
IOException

read

public int read(byte[] dest,
                int offs,
                int len)
         throws IOException
In CONTINUOUS reading mode, this read method starts at the beginning of the compressed stream and ends at the end of the file, emitting uncompressed data. In this mode the stream position is not announced and should be ignored. In BYBLOCK reading mode, this read method signals the end of a BZip2 block by returning EOB. At that point the compressed stream position is also announced; it indicates how much of the compressed stream has been decompressed and read out of this class. Between EOB events, the stream position is not updated.

Overrides:
read in class InputStream
Returns:
int A value greater than 0 is the number of bytes read. A value of -1 means end of stream, and -2 means end of block.
Throws:
IOException - if the stream content is malformed or an I/O error occurs.
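
A caller in BYBLOCK mode has to tell the two negative return values apart. The sketch below shows one possible read loop. To stay self-contained it uses a hypothetical BlockReader interface as a stand-in for this method rather than the real class, and it assumes the inherited END_OF_STREAM and END_OF_BLOCK constants correspond to the -1 and -2 values documented above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical sketch, not part of Hadoop.
final class ByBlockReadLoopSketch {
    static final int END_OF_STREAM = -1; // documented return value
    static final int END_OF_BLOCK = -2;  // documented return value

    // Stand-in for read(byte[] dest, int offs, int len).
    interface BlockReader {
        int read(byte[] dest, int offs, int len) throws IOException;
    }

    // Copies all decompressed bytes to out and returns how many
    // end-of-block events were seen before the end of the stream.
    static int drain(BlockReader reader, ByteArrayOutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int blocks = 0;
        while (true) {
            int n = reader.read(buf, 0, buf.length);
            if (n == END_OF_STREAM) {
                return blocks;
            }
            if (n == END_OF_BLOCK) {
                // With the real class, this is the point at which
                // getProcessedByteCount() is documented to be updated.
                blocks++;
                continue;
            }
            out.write(buf, 0, n);
        }
    }
}
```

Testing for -1 alone, as is common with ordinary InputStream loops, would treat an end-of-block signal as a byte count, so both sentinels are checked before the buffer is used.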

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Overrides:
close in class InputStream
Throws:
IOException


Copyright © 2009 The Apache Software Foundation