org.apache.hadoop.io
Class SequenceFile

java.lang.Object
  extended by org.apache.hadoop.io.SequenceFile

public class SequenceFile
extends Object

SequenceFiles are flat files consisting of binary key/value pairs.

SequenceFile provides SequenceFile.Writer, SequenceFile.Reader and SequenceFile.Sorter classes for writing, reading and sorting, respectively.

There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs:
  1. Writer : Uncompressed records.
  2. RecordCompressWriter : Record-compressed files; only values are compressed.
  3. BlockCompressWriter : Block-compressed files; keys and values are collected separately into 'blocks', which are then compressed. The size of the 'block' is configurable.
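To see why block compression usually pays off for small records, the following stdlib-only sketch deflates 200 short values one at a time (the record-compressed idea) and then all together as one buffer (the block-compressed idea). java.util.zip.Deflater stands in for whatever CompressionCodec is configured; `CompressionTypeSketch` and its methods are hypothetical, and the sizes are illustrative rather than SequenceFile's real on-disk numbers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class CompressionTypeSketch {
    // Deflate a whole byte array and return the compressed length.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // RECORD-style: every value is compressed on its own.
    static int recordCompressedSize(String[] values) {
        int total = 0;
        for (String v : values) {
            total += deflatedSize(v.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // BLOCK-style: values are buffered together and compressed once.
    // (A real block would also keep the value lengths in a separate compressed buffer.)
    static int blockCompressedSize(String[] values) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (String v : values) {
            byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
            block.write(bytes, 0, bytes.length);
        }
        return deflatedSize(block.toByteArray());
    }

    // Short, repetitive values typical of log-like data.
    static String[] sampleValues(int n) {
        String[] values = new String[n];
        for (int i = 0; i < n; i++) {
            values[i] = "status=OK;host=node-" + (i % 10);
        }
        return values;
    }
}
```

With the default zlib wrapper every Deflater output carries its own header and checksum, so compressing tiny values one at a time pays that overhead per record; compressing a block amortizes it and lets the compressor exploit repetition across values.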

The actual compression algorithm used to compress key and/or values can be specified by using the appropriate CompressionCodec.
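The point that the algorithm is chosen by plugging in a codec can be sketched with a tiny hypothetical interface. `SimpleCodec`, `DeflateCodec`, `GzipCodec` and `CodecRoundTrip` below are illustrative stand-ins built on java.util.zip; they are not Hadoop's CompressionCodec API, whose methods differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical codec abstraction: wraps raw streams with a compressor/decompressor.
interface SimpleCodec {
    OutputStream compress(OutputStream raw) throws IOException;
    InputStream decompress(InputStream raw) throws IOException;
}

final class DeflateCodec implements SimpleCodec {
    public OutputStream compress(OutputStream raw) { return new DeflaterOutputStream(raw); }
    public InputStream decompress(InputStream raw) { return new InflaterInputStream(raw); }
}

final class GzipCodec implements SimpleCodec {
    public OutputStream compress(OutputStream raw) throws IOException { return new GZIPOutputStream(raw); }
    public InputStream decompress(InputStream raw) throws IOException { return new GZIPInputStream(raw); }
}

final class CodecRoundTrip {
    // The calling code is identical whichever codec is plugged in.
    static byte[] roundTrip(SimpleCodec codec, byte[] data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (OutputStream out = codec.compress(sink)) {
                out.write(data);
            }
            try (InputStream in = codec.decompress(new ByteArrayInputStream(sink.toByteArray()))) {
                return in.readAllBytes();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```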

The recommended way to create a writer is to use the static createWriter methods provided by SequenceFile to choose the preferred format.
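The createWriter methods are described as choosing the preferred writer for a requested compression type. The sketch below shows that factory shape with hypothetical stand-ins: `WriteKind` only mirrors SequenceFile.CompressionType, and the three writer classes merely report their format. The real overloads also take a FileSystem or stream, a Configuration, key/value classes and optionally a codec.

```java
// Hypothetical mirror of SequenceFile.CompressionType.
enum WriteKind { NONE, RECORD, BLOCK }

// Hypothetical writer abstraction; each implementation owns one format.
interface WriterSketch {
    String format();
}

final class PlainWriterSketch implements WriterSketch {
    public String format() { return "uncompressed"; }
}

final class RecordCompressWriterSketch implements WriterSketch {
    public String format() { return "record-compressed"; }
}

final class BlockCompressWriterSketch implements WriterSketch {
    public String format() { return "block-compressed"; }
}

final class WriterFactorySketch {
    // Callers state the compression type; the factory picks the class.
    static WriterSketch createWriter(WriteKind type) {
        switch (type) {
            case NONE:   return new PlainWriterSketch();
            case RECORD: return new RecordCompressWriterSketch();
            case BLOCK:  return new BlockCompressWriterSketch();
            default:     throw new IllegalArgumentException("Unknown type: " + type);
        }
    }
}
```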

The Reader acts as the bridge and can read any of the above SequenceFile formats.
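This page does not list the header fields, so the following rests on an assumption: if the header stores one flag saying whether compression is enabled and another saying whether it is block compression, a single reader can pick its decoding path from those two flags alone. `DetectedType` and `FormatDetector` are hypothetical names.

```java
// Hypothetical mirror of SequenceFile.CompressionType.
enum DetectedType { NONE, RECORD, BLOCK }

final class FormatDetector {
    // Assumed header flags: 'compressed' and 'blockCompressed'.
    static DetectedType detect(boolean compressed, boolean blockCompressed) {
        if (blockCompressed) {
            if (!compressed) {
                throw new IllegalStateException("block compression implies compression");
            }
            return DetectedType.BLOCK;
        }
        return compressed ? DetectedType.RECORD : DetectedType.NONE;
    }
}
```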

SequenceFile Formats

There are three SequenceFile formats, one for each CompressionType. All of them share a common header.
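As an assumption taken from Hadoop's source rather than from this page, the common header begins with the three ASCII bytes `SEQ` followed by a single version byte. A quick check for that prefix might look like the following; `HeaderSniffer` is a hypothetical helper, not part of the SequenceFile API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class HeaderSniffer {
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    // Returns the version byte, or -1 if the stream does not start with "SEQ".
    static int readVersion(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            byte[] prefix = new byte[MAGIC.length];
            data.readFully(prefix);
            for (int i = 0; i < MAGIC.length; i++) {
                if (prefix[i] != MAGIC[i]) {
                    return -1;
                }
            }
            return data.readUnsignedByte();
        } catch (EOFException e) {
            return -1; // too short to hold the magic plus a version byte
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int readVersion(byte[] bytes) {
        return readVersion(new ByteArrayInputStream(bytes));
    }
}
```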

Uncompressed SequenceFile Format
Record-Compressed SequenceFile Format
Block-Compressed SequenceFile Format
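The per-format layouts under these headings are not reproduced on this page. Assuming the uncompressed layout used by Hadoop's Writer (each record stored as a 4-byte record length covering key plus value, a 4-byte key length, then the key bytes and the value bytes), one record can be encoded and decoded as below; in the record-compressed format the value bytes would be compressed before being written. `RecordLayout` is hypothetical, and keys and values are raw byte arrays rather than serialized Writables.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class RecordLayout {
    // Writes: recordLength (int), keyLength (int), key bytes, value bytes.
    static byte[] encode(byte[] key, byte[] value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(key.length + value.length);
            out.writeInt(key.length);
            out.write(key);
            out.write(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one record back; index 0 is the key, index 1 the value.
    static byte[][] decode(byte[] record) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            int recordLength = in.readInt();
            int keyLength = in.readInt();
            byte[] key = new byte[keyLength];
            byte[] value = new byte[recordLength - keyLength];
            in.readFully(key);
            in.readFully(value);
            return new byte[][] {key, value};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```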

The compressed blocks of key lengths and value lengths consist of the actual lengths of individual keys/values encoded in ZeroCompressedInteger format.
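ZeroCompressedInteger is not defined on this page. Assuming it refers to the variable-length encoding implemented by Hadoop's WritableUtils.writeVLong/readVLong, values from -112 to 127 take a single byte, and larger ones take a prefix byte (carrying sign and payload length) followed by only as many big-endian bytes as needed. The standalone re-implementation below, with the hypothetical class name `VarLong`, illustrates that scheme; real code would call WritableUtils instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

final class VarLong {
    // One byte for [-112, 127]; otherwise a prefix byte, then the payload bytes.
    static byte[] encode(long i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (i >= -112 && i <= 127) {
            out.write((byte) i);
            return out.toByteArray();
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L; // one's complement makes the payload non-negative
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--; // one prefix step per payload byte
        }
        out.write((byte) len);
        int payloadBytes = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = payloadBytes; idx != 0; idx--) {
            out.write((byte) (i >> ((idx - 1) * 8))); // most significant byte first
        }
        return out.toByteArray();
    }

    static long decode(ByteBuffer in) {
        byte first = in.get();
        if (first >= -112) {
            return first; // single-byte form
        }
        boolean negative = first < -120;
        int payloadBytes = negative ? -(first + 120) : -(first + 112);
        long i = 0;
        for (int idx = 0; idx < payloadBytes; idx++) {
            i = (i << 8) | (in.get() & 0xFF);
        }
        return negative ? (i ^ -1L) : i;
    }
}
```

Because most key and value lengths are small, they typically shrink from four bytes to one or two before the length blocks are even compressed.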

See Also:
CompressionCodec

Nested Class Summary
static class SequenceFile.CompressionType
          The compression type used to compress key/value pairs in the SequenceFile.
static class SequenceFile.Metadata
          The class encapsulating the metadata of a file.
static class SequenceFile.Reader
          Reads key/value pairs from a sequence-format file.
static class SequenceFile.Sorter
          Sorts key/value pairs in a sequence-format file.
static interface SequenceFile.ValueBytes
          The interface to 'raw' values of SequenceFiles.
static class SequenceFile.Writer
          Writes key/value pairs to a sequence-format file.
 
Field Summary
static int SYNC_INTERVAL
          The number of bytes between sync points.
 
Method Summary
static SequenceFile.Writer createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec)
          Construct the preferred type of 'raw' SequenceFile Writer.
static SequenceFile.Writer createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, SequenceFile.Metadata metadata)
          Construct the preferred type of 'raw' SequenceFile Writer.
static SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass)
          Construct the preferred type of SequenceFile Writer.
static SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, boolean createParent, SequenceFile.CompressionType compressionType, CompressionCodec codec, SequenceFile.Metadata metadata)
          Construct the preferred type of SequenceFile Writer.
static SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, SequenceFile.Metadata metadata)
          Construct the preferred type of SequenceFile Writer.
static SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType)
          Construct the preferred type of SequenceFile Writer.
static SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec)
          Construct the preferred type of SequenceFile Writer.
static SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress)
          Construct the preferred type of SequenceFile Writer.
static SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, SequenceFile.Metadata metadata)
          Construct the preferred type of SequenceFile Writer.
static SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, Progressable progress)
          Construct the preferred type of SequenceFile Writer.
static SequenceFile.CompressionType getCompressionType(Configuration job)
          Deprecated. Use SequenceFileOutputFormat.getOutputCompressionType(org.apache.hadoop.mapred.JobConf) to get SequenceFile.CompressionType for job-outputs.
static void setCompressionType(Configuration job, SequenceFile.CompressionType val)
          Deprecated. Use one of the many SequenceFile.createWriter methods to specify the SequenceFile.CompressionType while creating the SequenceFile, or SequenceFileOutputFormat.setOutputCompressionType(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.io.SequenceFile.CompressionType) to specify the SequenceFile.CompressionType for job-outputs.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SYNC_INTERVAL

public static final int SYNC_INTERVAL
The number of bytes between sync points.

See Also:
Constant Field Values
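Sync points let a reader start at an arbitrary byte offset, such as the beginning of an input split, and skip forward to the next record boundary. In the simplified sketch below, once at least `syncInterval` bytes have been written since the last marker, the writer emits an escape value where a record length would go, followed by a 16-byte random marker that also ends the "header". A reader positioned anywhere scans for the marker and resumes from there. The layout, the escape value and the class name `SyncSketch` are assumptions for illustration, loosely modeled on the idea rather than SequenceFile's exact on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class SyncSketch {
    static final int SYNC_ESCAPE = -1; // written where a record length would be
    static final int MARKER_SIZE = 16;

    private final byte[] marker = new byte[MARKER_SIZE];
    private final int syncInterval;

    SyncSketch(int syncInterval, long seed) {
        this.syncInterval = syncInterval;
        new Random(seed).nextBytes(marker); // per-file random marker
    }

    // Record = int length + UTF-8 bytes; a sync entry precedes a record once
    // at least syncInterval bytes have been written since the last marker.
    byte[] write(List<String> records) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.write(marker); // the "header" ends with the marker
            int lastSync = out.size();
            for (String record : records) {
                if (out.size() - lastSync >= syncInterval) {
                    out.writeInt(SYNC_ESCAPE);
                    out.write(marker);
                    lastSync = out.size();
                }
                byte[] data = record.getBytes(StandardCharsets.UTF_8);
                out.writeInt(data.length);
                out.write(data);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Scan forward from an arbitrary offset to the next marker, then read records.
    List<String> readFrom(byte[] file, int position) {
        int start = -1;
        for (int i = position; i + MARKER_SIZE <= file.length; i++) {
            if (Arrays.equals(file, i, i + MARKER_SIZE, marker, 0, MARKER_SIZE)) {
                start = i + MARKER_SIZE;
                break;
            }
        }
        List<String> result = new ArrayList<>();
        if (start < 0) {
            return result;
        }
        ByteBuffer buf = ByteBuffer.wrap(file);
        buf.position(start);
        while (buf.remaining() >= Integer.BYTES) {
            int length = buf.getInt();
            if (length == SYNC_ESCAPE) {
                buf.position(buf.position() + MARKER_SIZE); // skip the marker
                continue;
            }
            byte[] data = new byte[length];
            buf.get(data);
            result.add(new String(data, StandardCharsets.UTF_8));
        }
        return result;
    }
}
```

A smaller interval means less data to skip when a reader starts mid-file, at the cost of more marker bytes in the file.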
Method Detail

getCompressionType

@Deprecated
public static SequenceFile.CompressionType getCompressionType(Configuration job)
Deprecated. Use SequenceFileOutputFormat.getOutputCompressionType(org.apache.hadoop.mapred.JobConf) to get SequenceFile.CompressionType for job-outputs.

Get the compression type for the reduce outputs.

Parameters:
job - the job config to look in
Returns:
the kind of compression to use

setCompressionType

@Deprecated
public static void setCompressionType(Configuration job,
                                      SequenceFile.CompressionType val)
Deprecated. Use one of the many SequenceFile.createWriter methods to specify the SequenceFile.CompressionType while creating the SequenceFile, or SequenceFileOutputFormat.setOutputCompressionType(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.io.SequenceFile.CompressionType) to specify the SequenceFile.CompressionType for job-outputs.

Set the compression type for sequence files.

Parameters:
job - the configuration to modify
val - the new compression type (none, block, record)
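Both deprecated helpers amount to reading and writing a single configuration entry. The sketch below uses java.util.Properties as a stand-in for Hadoop's Configuration. The key `io.seqfile.compression.type` and the RECORD fallback are assumptions based on Hadoop's source and should be verified against your release; `SeqCompression` and `CompressionTypeConfig` are hypothetical names, the enum only mirroring SequenceFile.CompressionType.

```java
import java.util.Properties;

// Hypothetical mirror of SequenceFile.CompressionType.
enum SeqCompression { NONE, RECORD, BLOCK }

final class CompressionTypeConfig {
    // Assumed configuration key; check it against your Hadoop release.
    static final String KEY = "io.seqfile.compression.type";

    static void setCompressionType(Properties job, SeqCompression val) {
        job.setProperty(KEY, val.toString());
    }

    static SeqCompression getCompressionType(Properties job) {
        String name = job.getProperty(KEY);
        // Assumed default when nothing has been set.
        return name == null ? SeqCompression.RECORD : SeqCompression.valueOf(name);
    }
}
```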

createWriter

public static SequenceFile.Writer createWriter(FileSystem fs,
                                               Configuration conf,
                                               Path name,
                                               Class keyClass,
                                               Class valClass)
                                        throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

public static SequenceFile.Writer createWriter(FileSystem fs,
                                               Configuration conf,
                                               Path name,
                                               Class keyClass,
                                               Class valClass,
                                               SequenceFile.CompressionType compressionType)
                                        throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

public static SequenceFile.Writer createWriter(FileSystem fs,
                                               Configuration conf,
                                               Path name,
                                               Class keyClass,
                                               Class valClass,
                                               SequenceFile.CompressionType compressionType,
                                               Progressable progress)
                                        throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
progress - The Progressable object to track progress.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

public static SequenceFile.Writer createWriter(FileSystem fs,
                                               Configuration conf,
                                               Path name,
                                               Class keyClass,
                                               Class valClass,
                                               SequenceFile.CompressionType compressionType,
                                               CompressionCodec codec)
                                        throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

public static SequenceFile.Writer createWriter(FileSystem fs,
                                               Configuration conf,
                                               Path name,
                                               Class keyClass,
                                               Class valClass,
                                               SequenceFile.CompressionType compressionType,
                                               CompressionCodec codec,
                                               Progressable progress,
                                               SequenceFile.Metadata metadata)
                                        throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
metadata - The metadata of the file.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException
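The metadata argument attaches free-form key/value text to the file. A rough model, assuming metadata is an ordered map of string pairs written as an entry count followed by each key and value, is sketched below. It mirrors SequenceFile.Metadata only loosely: the class name `MetadataSketch` and the writeUTF encoding are hypothetical choices, not Hadoop's serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

final class MetadataSketch {
    // Entry count, then each key and value as modified-UTF-8 strings.
    static byte[] write(Map<String, String> metadata) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            TreeMap<String, String> sorted = new TreeMap<>(metadata); // deterministic order
            out.writeInt(sorted.size());
            for (Map.Entry<String, String> entry : sorted.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static TreeMap<String, String> read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int count = in.readInt();
            TreeMap<String, String> result = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                result.put(key, in.readUTF());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```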

createWriter

public static SequenceFile.Writer createWriter(FileSystem fs,
                                               Configuration conf,
                                               Path name,
                                               Class keyClass,
                                               Class valClass,
                                               int bufferSize,
                                               short replication,
                                               long blockSize,
                                               SequenceFile.CompressionType compressionType,
                                               CompressionCodec codec,
                                               Progressable progress,
                                               SequenceFile.Metadata metadata)
                                        throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
bufferSize - Buffer size for the underlying output stream.
replication - replication factor for the file.
blockSize - block size for the file.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
metadata - The metadata of the file.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

public static SequenceFile.Writer createWriter(FileSystem fs,
                                               Configuration conf,
                                               Path name,
                                               Class keyClass,
                                               Class valClass,
                                               int bufferSize,
                                               short replication,
                                               long blockSize,
                                               boolean createParent,
                                               SequenceFile.CompressionType compressionType,
                                               CompressionCodec codec,
                                               SequenceFile.Metadata metadata)
                                        throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
bufferSize - Buffer size for the underlying output stream.
replication - replication factor for the file.
blockSize - block size for the file.
createParent - Whether to create the parent directory if it does not exist.
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

public static SequenceFile.Writer createWriter(FileSystem fs,
                                               Configuration conf,
                                               Path name,
                                               Class keyClass,
                                               Class valClass,
                                               SequenceFile.CompressionType compressionType,
                                               CompressionCodec codec,
                                               Progressable progress)
                                        throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

public static SequenceFile.Writer createWriter(Configuration conf,
                                               FSDataOutputStream out,
                                               Class keyClass,
                                               Class valClass,
                                               SequenceFile.CompressionType compressionType,
                                               CompressionCodec codec,
                                               SequenceFile.Metadata metadata)
                                        throws IOException
Construct the preferred type of 'raw' SequenceFile Writer.

Parameters:
conf - The configuration.
out - The stream on top of which the writer is to be constructed.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException
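The 'raw' overloads build the writer on a stream the caller already opened, which raises the question of who closes it. A common convention for writers layered on a borrowed stream, and one Hadoop's Writer appears to follow for these overloads (verify against your version), is to flush but not close that stream on close(), leaving its lifecycle to the caller. `BorrowedStreamWriter` below is a hypothetical plain java.io sketch of that convention.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class BorrowedStreamWriter implements AutoCloseable {
    private final DataOutputStream out;
    private final boolean ownsStream;

    BorrowedStreamWriter(OutputStream target, boolean ownsStream) {
        this.out = new DataOutputStream(target);
        this.ownsStream = ownsStream;
    }

    // Toy record: two ints, standing in for a serialized key and value.
    void append(int key, int value) {
        try {
            out.writeInt(key);
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            if (ownsStream) {
                out.close(); // the writer opened the stream, so it releases it
            } else {
                out.flush(); // borrowed stream: push the bytes, leave it open
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```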

createWriter

public static SequenceFile.Writer createWriter(Configuration conf,
                                               FSDataOutputStream out,
                                               Class keyClass,
                                               Class valClass,
                                               SequenceFile.CompressionType compressionType,
                                               CompressionCodec codec)
                                        throws IOException
Construct the preferred type of 'raw' SequenceFile Writer.

Parameters:
conf - The configuration.
out - The stream on top of which the writer is to be constructed.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
Returns:
The handle to the constructed SequenceFile Writer.
Throws:
IOException


Copyright © 2009 The Apache Software Foundation