Class SequenceFile
SequenceFiles are flat files consisting of binary key/value
pairs.
SequenceFile provides SequenceFile.Writer,
SequenceFile.Reader and SequenceFile.Sorter classes for writing,
reading and sorting respectively.
There are three SequenceFile Writers, based on the
SequenceFile.CompressionType used to compress key/value pairs:
- Writer: Uncompressed records.
- RecordCompressWriter: Record-compressed files, only compress values.
- BlockCompressWriter: Block-compressed files, both keys & values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.
The actual compression algorithm used to compress key and/or values can be
specified by using the appropriate CompressionCodec.
The recommended way is to use the static createWriter methods
provided by the SequenceFile to choose the preferred format.
The SequenceFile.Reader acts as the bridge and can read any of the
above SequenceFile formats.
SequenceFile Formats
Essentially there are 3 different formats for SequenceFiles
depending on the CompressionType specified. All of them share a
common header described below.
SequenceFile Header
- version - 3 bytes of magic header SEQ, followed by 1 byte of actual version number (e.g. SEQ4 or SEQ6)
- keyClassName - key class
- valueClassName - value class
- compression - A boolean which specifies if compression is turned on for keys/values in this file.
- blockCompression - A boolean which specifies if block-compression is turned on for keys/values in this file.
- compression codec - CompressionCodec class which is used for compression of keys and/or values (if compression is enabled).
- metadata - SequenceFile.Metadata for this file.
- sync - A sync marker to denote end of the header.
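As an illustration of the version field described above, the following minimal sketch checks a buffer for the SEQ magic and extracts the version byte. It uses only the JDK; the class and method names (SeqHeaderPeek, versionOf) are hypothetical and not part of Hadoop, the version is assumed to be stored as a raw byte value rather than an ASCII digit, and the remaining header fields are not parsed.

```java
import java.nio.charset.StandardCharsets;

final class SeqHeaderPeek {
    // Magic bytes that open the header, as described in the "version" field.
    private static final byte[] MAGIC = "SEQ".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the version number stored in the 4th byte, or -1 if the
     * buffer is too short or does not start with the SEQ magic.
     * Assumption: the version is a raw byte value (e.g. 6), not the character '6'.
     */
    static int versionOf(byte[] firstBytes) {
        if (firstBytes == null || firstBytes.length < 4) {
            return -1;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (firstBytes[i] != MAGIC[i]) {
                return -1;
            }
        }
        return firstBytes[3] & 0xFF;
    }

    public static void main(String[] args) {
        byte[] sample = {'S', 'E', 'Q', 6};
        System.out.println("version = " + versionOf(sample));
    }
}
```

In practice the first four bytes would be read from the start of the file; a result of -1 here only means the buffer does not look like a SequenceFile header.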
Uncompressed SequenceFile Format
- Header
- Record
  - Record length
  - Key length
  - Key
  - Value
- A sync-marker every few 100 kilobytes or so.
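The record layout above can be sketched with a plain JDK ByteBuffer. This is an illustration only: it assumes both length fields are 4-byte big-endian integers and that the record length is the combined size of key and value. The class and method names are hypothetical, and no header or sync marker is written.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RecordLayoutSketch {
    /** Serializes one record as: record length, key length, key, value. */
    static byte[] writeRecord(byte[] key, byte[] value) {
        // ByteBuffer is big-endian by default; 8 bytes cover the two length fields.
        ByteBuffer buf = ByteBuffer.allocate(8 + key.length + value.length);
        buf.putInt(key.length + value.length); // record length (assumed: key + value bytes)
        buf.putInt(key.length);                // key length
        buf.put(key);
        buf.put(value);
        return buf.array();
    }

    /** Reads one record back and returns {key, value}. */
    static byte[][] readRecord(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int recordLength = buf.getInt();
        int keyLength = buf.getInt();
        byte[] key = new byte[keyLength];
        buf.get(key);
        // The value length is implied: record length minus key length.
        byte[] value = new byte[recordLength - keyLength];
        buf.get(value);
        return new byte[][] {key, value};
    }

    public static void main(String[] args) {
        byte[] rec = writeRecord("k1".getBytes(StandardCharsets.UTF_8),
                                 "hello".getBytes(StandardCharsets.UTF_8));
        byte[][] kv = readRecord(rec);
        System.out.println(new String(kv[0], StandardCharsets.UTF_8) + " -> "
                + new String(kv[1], StandardCharsets.UTF_8));
    }
}
```

Storing only the key length explicitly is enough for a reader to split the record, since the value occupies the rest of the record.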
Record-Compressed SequenceFile Format
- Header
- Record
  - Record length
  - Key length
  - Key
  - Compressed Value
- A sync-marker every few 100 kilobytes or so.
Block-Compressed SequenceFile Format
- Header
- Record Block
  - Uncompressed number of records in the block
  - Compressed key-lengths block-size
  - Compressed key-lengths block
  - Compressed keys block-size
  - Compressed keys block
  - Compressed value-lengths block-size
  - Compressed value-lengths block
  - Compressed values block-size
  - Compressed values block
- A sync-marker every block.
The compressed blocks of key lengths and value lengths consist of the actual lengths of individual keys/values encoded in ZeroCompressedInteger format.
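To make the length encoding concrete, here is a JDK-only sketch of a zero-compressed integer codec. It assumes that "ZeroCompressedInteger format" refers to the variable-length encoding written by Hadoop's WritableUtils.writeVInt/writeVLong; the code is a re-implementation for illustration (the class ZeroCompressedSketch is hypothetical), not the library source.

```java
import java.io.ByteArrayOutputStream;

final class ZeroCompressedSketch {
    /** Encodes a value in 1 to 9 bytes; small values take a single byte. */
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value >= -112 && value <= 127) {
            out.write((int) value); // fits directly in one byte
            return out.toByteArray();
        }
        int len = -112;            // marker base for positive values
        if (value < 0) {
            value ^= -1L;          // one's complement, so the magnitude is non-negative
            len = -120;            // marker base for negative values
        }
        for (long tmp = value; tmp != 0; tmp >>= 8) {
            len--;                 // one step per payload byte needed
        }
        out.write(len);            // first byte encodes sign and payload size
        int count = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = count; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            out.write((int) ((value >> shift) & 0xFF)); // most significant byte first
        }
        return out.toByteArray();
    }

    /** Decodes a value produced by encode. */
    static long decode(byte[] bytes) {
        byte first = bytes[0];
        if (first >= -112) {
            return first;          // single-byte value
        }
        boolean negative = first < -120;
        int count = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 1; i <= count; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return negative ? ~value : value;
    }

    public static void main(String[] args) {
        for (long v : new long[] {100L, 300L, -300L}) {
            byte[] enc = encode(v);
            System.out.println(v + " uses " + enc.length + " byte(s), decodes to " + decode(enc));
        }
    }
}
```

Because typical key and value lengths are small, most entries in the length blocks would occupy one to three bytes under this scheme, before the block itself is compressed.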
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| static enum | SequenceFile.CompressionType | The compression type used to compress key/value pairs in the SequenceFile. |
| static class | org.apache.hadoop.io.SequenceFile.Metadata | The class encapsulating the metadata of a file. |
| static class | org.apache.hadoop.io.SequenceFile.Reader | Reads key/value pairs from a sequence-format file. |
| static class | org.apache.hadoop.io.SequenceFile.Sorter | Sorts key/value pairs in a sequence-format file. |
| static interface | org.apache.hadoop.io.SequenceFile.ValueBytes | The interface to 'raw' values of SequenceFiles. |
| static class | org.apache.hadoop.io.SequenceFile.Writer | Write key/value pairs to a sequence-format file. |
Field Summary
Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final int | SYNC_INTERVAL | The number of bytes between sync points. 100 KB, default. |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec) | Deprecated. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata) | Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts) | Create a new Writer with the given options. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileContext fc, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata, EnumSet<CreateFlag> createFlag, org.apache.hadoop.fs.Options.CreateOpts... opts) | Construct the preferred type of SequenceFile Writer. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass) | Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, boolean createParent, SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata) | Deprecated. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata) | Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType) | Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec) | Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress) | Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata) | Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
| static org.apache.hadoop.io.SequenceFile.Writer | createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, Progressable progress) | Deprecated. Use createWriter(Configuration, Writer.Option...) instead. |
| static SequenceFile.CompressionType | getDefaultCompressionType(Configuration job) | Get the compression type for the reduce outputs. |
| static void | setDefaultCompressionType(Configuration job, SequenceFile.CompressionType val) | Set the default compression type for sequence files. |
Field Details
SYNC_INTERVAL
public static final int SYNC_INTERVAL
The number of bytes between sync points. 100 KB, default. Computed as 5 KB * 20 = 100 KB.
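The arithmetic in the description can be checked directly. In the sketch below, the factor 20 is assumed to be the size in bytes of one sync marker entry; that interpretation, along with the class and constant names, is an assumption for illustration and is not stated on this page.

```java
final class SyncIntervalSketch {
    // Assumption: 20 is the byte size of one sync marker entry.
    static final int ASSUMED_SYNC_SIZE = 20;

    // 5 KB (5 * 1024 bytes) multiplied by 20.
    static final int SYNC_INTERVAL = 5 * 1024 * ASSUMED_SYNC_SIZE;

    public static void main(String[] args) {
        System.out.println(SYNC_INTERVAL + " bytes = " + (SYNC_INTERVAL / 1024) + " KB");
    }
}
```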
Method Details
getDefaultCompressionType
public static SequenceFile.CompressionType getDefaultCompressionType(Configuration job)
Get the compression type for the reduce outputs.
- Parameters:
  job - the job config to look in
- Returns:
  the kind of compression to use
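setDefaultCompressionType
public static void setDefaultCompressionType(Configuration job, SequenceFile.CompressionType val)
Set the default compression type for sequence files.
- Parameters:
  job - the configuration to modify
  val - the new compression type (none, block, record)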
createWriter
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts) throws IOException
Create a new Writer with the given options.
- Parameters:
  conf - the configuration to use
  opts - the options to create the file with
- Returns:
  a new Writer
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fs - The configured filesystem.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fs - The configured filesystem.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  compressionType - The compression type.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, Progressable progress) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fs - The configured filesystem.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  compressionType - The compression type.
  progress - The Progressable object to track progress.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fs - The configured filesystem.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  compressionType - The compression type.
  codec - The compression codec.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fs - The configured filesystem.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  compressionType - The compression type.
  codec - The compression codec.
  progress - The Progressable object to track progress.
  metadata - The metadata of the file.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress, org.apache.hadoop.io.SequenceFile.Metadata metadata) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fs - The configured filesystem.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  bufferSize - buffer size for the underlying output stream.
  replication - replication factor for the file.
  blockSize - block size for the file.
  compressionType - The compression type.
  codec - The compression codec.
  progress - The Progressable object to track progress.
  metadata - The metadata of the file.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, int bufferSize, short replication, long blockSize, boolean createParent, SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata) throws IOException
Deprecated.
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fs - The configured filesystem.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  bufferSize - buffer size for the underlying output stream.
  replication - replication factor for the file.
  blockSize - block size for the file.
  createParent - create parent directory if non-existent.
  compressionType - The compression type.
  codec - The compression codec.
  metadata - The metadata of the file.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileContext fc, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata, EnumSet<CreateFlag> createFlag, org.apache.hadoop.fs.Options.CreateOpts... opts) throws IOException
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fc - The context for the specified file.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  compressionType - The compression type.
  codec - The compression codec.
  metadata - The metadata of the file.
  createFlag - gives the semantics of create: overwrite, append etc.
  opts - file creation options; see Options.CreateOpts.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileSystem fs, Configuration conf, Path name, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, Progressable progress) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of SequenceFile Writer.
- Parameters:
  fs - The configured filesystem.
  conf - The configuration.
  name - The name of the file.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  compressionType - The compression type.
  codec - The compression codec.
  progress - The Progressable object to track progress.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of 'raw' SequenceFile Writer.
- Parameters:
  conf - The configuration.
  out - The stream on top of which the writer is to be constructed.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  compressionType - The compression type.
  codec - The compression codec.
  metadata - The metadata of the file.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.
createWriter
@Deprecated public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, FSDataOutputStream out, Class keyClass, Class valClass, SequenceFile.CompressionType compressionType, CompressionCodec codec) throws IOException
Deprecated. Use createWriter(Configuration, Writer.Option...) instead.
Construct the preferred type of 'raw' SequenceFile Writer.
- Parameters:
  conf - The configuration.
  out - The stream on top of which the writer is to be constructed.
  keyClass - The 'key' type.
  valClass - The 'value' type.
  compressionType - The compression type.
  codec - The compression codec.
- Returns:
  The handle to the constructed SequenceFile Writer.
- Throws:
  IOException - raised on errors performing I/O.