org.apache.hadoop.io
Class SequenceFile

java.lang.Object
  extended by org.apache.hadoop.io.SequenceFile

@InterfaceAudience.Public
@InterfaceStability.Stable
public class SequenceFile
extends Object

SequenceFiles are flat files consisting of binary key/value pairs.

SequenceFile provides SequenceFile.Writer, SequenceFile.Reader and SequenceFile.Sorter classes for writing, reading and sorting respectively.

There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs:
  1. Writer : Uncompressed records.
  2. RecordCompressWriter : Record-compressed files; only the values are compressed.
  3. BlockCompressWriter : Block-compressed files; keys and values are collected in separate 'blocks' and each block is compressed. The size of the 'block' is configurable.
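
The difference between compressing each record on its own and compressing a collected block can be illustrated without Hadoop. The sketch below is only an analogy: it uses java.util.zip.Deflater as a stand-in codec, and the class name, method names and sample values are hypothetical. It does not reproduce the SequenceFile on-disk layout.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class CompressionGranularityDemo {

    // Deflate a byte array and return the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Record-style: every value is compressed independently.
    static int recordStyleSize(String[] values) {
        int total = 0;
        for (String value : values) {
            total += deflatedSize(value.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // Block-style: values are collected first, then compressed together.
    static int blockStyleSize(String[] values) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (String value : values) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            block.write(bytes, 0, bytes.length);
        }
        return deflatedSize(block.toByteArray());
    }

    // Hypothetical, highly repetitive sample data.
    static String[] sampleValues(int count) {
        String[] values = new String[count];
        for (int i = 0; i < count; i++) {
            values[i] = "status=OK;region=eu-west;attempt=" + (i % 5);
        }
        return values;
    }
}
```

With short, repetitive values such as these, the block-style total is expected to be much smaller, because the compressor can exploit redundancy across records instead of paying per-record overhead.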

The actual compression algorithm used to compress key and/or values can be specified by using the appropriate CompressionCodec.

The recommended way is to use the static createWriter methods provided by the SequenceFile to choose the preferred format.

The SequenceFile.Reader acts as the bridge and can read any of the above SequenceFile formats.
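
As an illustration, the following sketch writes a block-compressed file through createWriter(Configuration, Writer.Option...) and reads it back with SequenceFile.Reader. It assumes the Hadoop client libraries are on the classpath and that the Configuration resolves the path to a writable file system. The class name, method names, key/value types and record contents are hypothetical choices for the example.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;

final class SequenceFileRoundTrip {

    // Write `count` records using the option-based factory method.
    static void write(Configuration conf, Path path, int count) throws IOException {
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(CompressionType.BLOCK))) {
            for (int i = 0; i < count; i++) {
                writer.append(new IntWritable(i), new Text("value-" + i));
            }
        }
    }

    // The reader detects the format from the file header, so no
    // compression type is passed here. Returns the number of records read.
    static int countRecords(Configuration conf, Path path) throws IOException {
        try (SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            int records = 0;
            while (reader.next(key, value)) {
                records++;
            }
            return records;
        }
    }
}
```

Changing CompressionType.BLOCK to NONE or RECORD should not require any change on the reading side.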

SequenceFile Formats

Essentially there are 3 different formats for SequenceFiles depending on the CompressionType specified. All of them share a common header described below.

Uncompressed SequenceFile Format
Record-Compressed SequenceFile Format
Block-Compressed SequenceFile Format

The compressed blocks of key lengths and value lengths consist of the actual lengths of individual keys/values encoded in ZeroCompressedInteger format.
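
For orientation, here is a standalone sketch of a zero-compressed integer encoding. It assumes that ZeroCompressedInteger refers to the variable-length scheme implemented by WritableUtils.writeVLong and readVLong, and mirrors that scheme as best understood rather than calling Hadoop itself. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

final class ZeroCompressedLong {

    // Encode a long in 1 to 9 bytes; small magnitudes use fewer bytes.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value >= -112 && value <= 127) {
            out.write((int) value); // single byte holds the value itself
            return out.toByteArray();
        }
        int marker = -112;          // marker range for positive values
        if (value < 0) {
            value ^= -1L;           // one's complement, now non-negative
            marker = -120;          // marker range for negative values
        }
        for (long tmp = value; tmp != 0; tmp >>= 8) {
            marker--;               // one step per payload byte
        }
        out.write(marker);
        int payloadBytes = (marker < -120) ? -(marker + 120) : -(marker + 112);
        for (int idx = payloadBytes; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            out.write((int) ((value >> shift) & 0xFF)); // big-endian payload
        }
        return out.toByteArray();
    }

    // Decode a value produced by encode().
    static long decode(byte[] bytes) {
        byte first = bytes[0];
        int size = (first >= -112) ? 1
                 : (first < -120) ? -119 - first : -111 - first;
        if (size == 1) {
            return first;
        }
        long value = 0;
        for (int i = 1; i < size; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return (first < -120) ? ~value : value;
    }
}
```

Under this scheme a length such as 100 occupies one byte and a length such as 300 occupies three bytes (one marker byte plus two payload bytes), which keeps the length blocks compact when most keys and values are short.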

See Also:
CompressionCodec

Field Summary
static int SYNC_INTERVAL
          The number of bytes between sync points.
 
Method Summary
static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
          Create a new Writer with the given options.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileContext fc, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata, EnumSet<CreateFlag> createFlag, org.apache.hadoop.fs.Options.CreateOpts... opts)
          Construct the preferred type of SequenceFile Writer.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, boolean createParent, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata)
          Deprecated. 
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, Progressable progress)
          Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
static org.apache.hadoop.io.SequenceFile.CompressionType getDefaultCompressionType(Configuration job)
          Get the compression type for the reduce outputs
static void setDefaultCompressionType(Configuration job, org.apache.hadoop.io.SequenceFile.CompressionType val)
          Set the default compression type for sequence files.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SYNC_INTERVAL

public static final int SYNC_INTERVAL
The number of bytes between sync points.

See Also:
Constant Field Values

Method Detail

getDefaultCompressionType

public static org.apache.hadoop.io.SequenceFile.CompressionType getDefaultCompressionType(Configuration job)
Get the compression type for the reduce outputs

Parameters:
job - the job config to look in
Returns:
the kind of compression to use

setDefaultCompressionType

public static void setDefaultCompressionType(Configuration job,
                                             org.apache.hadoop.io.SequenceFile.CompressionType val)
Set the default compression type for sequence files.

Parameters:
job - the configuration to modify
val - the new compression type (none, block, record)
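
A minimal sketch of the setter and getter used together, assuming the Hadoop client libraries are on the classpath. The class and method names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;

final class DefaultCompressionTypeDemo {

    // Store the compression type in the configuration, then read it back.
    static CompressionType configure(Configuration conf, CompressionType type) {
        SequenceFile.setDefaultCompressionType(conf, type);
        return SequenceFile.getDefaultCompressionType(conf);
    }
}
```

For example, configure(new Configuration(), CompressionType.BLOCK) should return CompressionType.BLOCK, and later code that consults the same Configuration for its default will see that value.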

createWriter

public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf,
                                                                    org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
                                                             throws IOException
Create a new Writer with the given options.

Parameters:
conf - the configuration to use
opts - the options to create the file with
Returns:
a new Writer
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs,
                                                                               Configuration conf,
                                                                               Path name,
                                                                               Class keyClass,
                                                                               Class valClass)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs,
                                                                               Configuration conf,
                                                                               Path name,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs,
                                                                               Configuration conf,
                                                                               Path name,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                               Progressable progress)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
progress - The Progressable object to track progress.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs,
                                                                               Configuration conf,
                                                                               Path name,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                               CompressionCodec codec)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs,
                                                                               Configuration conf,
                                                                               Path name,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                               CompressionCodec codec,
                                                                               Progressable progress,
                                                                               org.apache.hadoop.io.SequenceFile.Metadata metadata)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
metadata - The metadata of the file.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs,
                                                                               Configuration conf,
                                                                               Path name,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               int bufferSize,
                                                                               short replication,
                                                                               long blockSize,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                               CompressionCodec codec,
                                                                               Progressable progress,
                                                                               org.apache.hadoop.io.SequenceFile.Metadata metadata)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
bufferSize - buffer size for the underlying output stream.
replication - replication factor for the file.
blockSize - block size for the file.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
metadata - The metadata of the file.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException
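
As a migration sketch, the parameters of this deprecated overload can be expressed as Writer options passed to createWriter(Configuration, Writer.Option...). The example assumes the Hadoop client libraries are on the classpath. The class name, the numeric values and the metadata entry are hypothetical, and the progress argument is omitted here (it would map to the Writer.progressable option).

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.SequenceFile.Metadata;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.util.ReflectionUtils;

final class TunedWriterFactory {

    // Open a record-compressed writer with explicit tuning options.
    static SequenceFile.Writer open(Configuration conf, Path name) throws IOException {
        // Instantiate the codec through ReflectionUtils so it receives the configuration.
        CompressionCodec codec = ReflectionUtils.newInstance(DefaultCodec.class, conf);

        // Illustrative metadata entry stored in the file header.
        Metadata metadata = new Metadata();
        metadata.set(new Text("origin"), new Text("example"));

        return SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(name),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.bufferSize(64 * 1024),        // was: bufferSize
                SequenceFile.Writer.replication((short) 1),       // was: replication
                SequenceFile.Writer.blockSize(32L * 1024 * 1024), // was: blockSize
                SequenceFile.Writer.compression(CompressionType.RECORD, codec),
                SequenceFile.Writer.metadata(metadata));          // was: metadata
    }
}
```

Options that are left out fall back to defaults, so a caller only needs to list the settings it actually wants to override.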

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs,
                                                                               Configuration conf,
                                                                               Path name,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               int bufferSize,
                                                                               short replication,
                                                                               long blockSize,
                                                                               boolean createParent,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                               CompressionCodec codec,
                                                                               org.apache.hadoop.io.SequenceFile.Metadata metadata)
                                                             throws IOException
Deprecated. 

Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
bufferSize - buffer size for the underlying output stream.
replication - replication factor for the file.
blockSize - block size for the file.
createParent - create parent directory if non-existent
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileContext fc,
                                                                    Configuration conf,
                                                                    Path name,
                                                                    Class keyClass,
                                                                    Class valClass,
                                                                    org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                    CompressionCodec codec,
                                                                    org.apache.hadoop.io.SequenceFile.Metadata metadata,
                                                                    EnumSet<CreateFlag> createFlag,
                                                                    org.apache.hadoop.fs.Options.CreateOpts... opts)
                                                             throws IOException
Construct the preferred type of SequenceFile Writer.

Parameters:
fc - The context for the specified file.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
createFlag - gives the semantics of create: overwrite, append etc.
opts - file creation options; see Options.CreateOpts.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs,
                                                                               Configuration conf,
                                                                               Path name,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                               CompressionCodec codec,
                                                                               Progressable progress)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of SequenceFile Writer.

Parameters:
fs - The configured filesystem.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
progress - The Progressable object to track progress.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf,
                                                                               FSDataOutputStream out,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                               CompressionCodec codec,
                                                                               org.apache.hadoop.io.SequenceFile.Metadata metadata)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of 'raw' SequenceFile Writer.

Parameters:
conf - The configuration.
out - The stream on top of which the writer is to be constructed.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException

createWriter

@Deprecated
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf,
                                                                               FSDataOutputStream out,
                                                                               Class keyClass,
                                                                               Class valClass,
                                                                               org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
                                                                               CompressionCodec codec)
                                                             throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.

Construct the preferred type of 'raw' SequenceFile Writer.

Parameters:
conf - The configuration.
out - The stream on top of which the writer is to be constructed.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException


Copyright © 2014 Apache Software Foundation. All Rights Reserved.