org.apache.hadoop.io
Class SequenceFile.Sorter

java.lang.Object
  extended by org.apache.hadoop.io.SequenceFile.Sorter
Enclosing class:
SequenceFile

public static class SequenceFile.Sorter
extends Object

Sorts key/value pairs in a sequence-format file.

For best performance, applications should make sure that the Writable.readFields(DataInput) implementation of their keys is very efficient. In particular, it should avoid allocating memory.
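
One way to keep readFields(DataInput) from allocating is to reuse a backing buffer across calls and grow it only when a record does not fit. The following is a minimal sketch of that idea, not code from Hadoop: ReusableBytesKey, its fields, and its length-prefixed layout are hypothetical, and the class deliberately does not implement Writable so that it compiles with the JDK alone.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical key type. It mirrors the shape of Writable.readFields(DataInput)
// but does not implement the Hadoop interface.
class ReusableBytesKey {
    private byte[] bytes = new byte[16]; // backing buffer, reused across reads
    private int length;                  // number of valid bytes in the buffer

    // Reads a 4-byte length followed by that many bytes (an assumed layout).
    // A new array is allocated only when the incoming record does not fit.
    public void readFields(DataInput in) throws IOException {
        int newLength = in.readInt();
        if (newLength > bytes.length) {
            bytes = new byte[Math.max(newLength, bytes.length * 2)];
        }
        in.readFully(bytes, 0, newLength);
        length = newLength;
    }

    public int getLength() { return length; }

    public int getCapacity() { return bytes.length; }

    public byte byteAt(int index) { return bytes[index]; }
}
```

Because the same key object is refilled for every record, reading many small keys in a row touches no new heap memory once the buffer has reached a sufficient size.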


Nested Class Summary
static interface SequenceFile.Sorter.RawKeyValueIterator
          The interface to iterate over raw keys/values of SequenceFiles.
 class SequenceFile.Sorter.SegmentDescriptor
          This class defines a merge segment.
 
Constructor Summary
SequenceFile.Sorter(FileSystem fs, Class<? extends WritableComparable> keyClass, Class valClass, Configuration conf)
          Sort and merge files containing the named classes.
SequenceFile.Sorter(FileSystem fs, RawComparator comparator, Class keyClass, Class valClass, Configuration conf)
          Sort and merge using an arbitrary RawComparator.
 
Method Summary
 SequenceFile.Writer cloneFileAttributes(Path inputFile, Path outputFile, Progressable prog)
          Clones the attributes (like compression) of the input file and creates a corresponding Writer.
 int getFactor()
          Get the number of streams to merge at once.
 int getMemory()
          Get the total amount of buffer memory, in bytes.
 SequenceFile.Sorter.RawKeyValueIterator merge(List<SequenceFile.Sorter.SegmentDescriptor> segments, Path tmpDir)
          Merges the list of segments of type SegmentDescriptor
 SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, boolean deleteInputs, int factor, Path tmpDir)
          Merges the contents of files passed in Path[]
 SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, boolean deleteInputs, Path tmpDir)
          Merges the contents of the files passed in Path[], using the merge factor that is already set
 void merge(Path[] inFiles, Path outFile)
          Merge the provided files.
 SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, Path tempDir, boolean deleteInputs)
          Merges the contents of files passed in Path[]
 void setFactor(int factor)
          Set the number of streams to merge at once.
 void setMemory(int memory)
          Set the total amount of buffer memory, in bytes.
 void setProgressable(Progressable progressable)
          Set the progressable object in order to report progress.
 void sort(Path[] inFiles, Path outFile, boolean deleteInput)
          Perform a file sort from a set of input files into an output file.
 void sort(Path inFile, Path outFile)
          The backwards-compatible interface to sort: sorts a single input file into an output file.
 SequenceFile.Sorter.RawKeyValueIterator sortAndIterate(Path[] inFiles, Path tempDir, boolean deleteInput)
          Perform a file sort from a set of input files and return an iterator.
 void writeFile(SequenceFile.Sorter.RawKeyValueIterator records, SequenceFile.Writer writer)
          Writes records from RawKeyValueIterator into a file represented by the passed writer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SequenceFile.Sorter

public SequenceFile.Sorter(FileSystem fs,
                           Class<? extends WritableComparable> keyClass,
                           Class valClass,
                           Configuration conf)
Sort and merge files containing the named classes.


SequenceFile.Sorter

public SequenceFile.Sorter(FileSystem fs,
                           RawComparator comparator,
                           Class keyClass,
                           Class valClass,
                           Configuration conf)
Sort and merge using an arbitrary RawComparator.
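
A raw comparator orders keys by looking at their serialized bytes, so records do not have to be deserialized during the sort. The sketch below illustrates the idea for keys assumed to be 4-byte big-endian signed ints. RawIntKeyComparator is a hypothetical name; its method follows the six-argument shape of RawComparator.compare but does not implement the Hadoop interface, so it compiles with the JDK alone.

```java
// Hypothetical helper, for illustration only.
class RawIntKeyComparator {
    // Compares two keys in place. b1/b2 are the buffers, s1/s2 the start
    // offsets, l1/l2 the key lengths (unused here because the assumed
    // encoding is a fixed 4 bytes).
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Decodes a big-endian signed int starting at the given offset.
    private static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             | (b[off + 3] & 0xff);
    }
}
```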

Method Detail

setFactor

public void setFactor(int factor)
Set the number of streams to merge at once.
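
To see why the factor matters, consider a simple multi-pass merge in which each pass combines groups of up to `factor` sorted segments into one. The sketch below counts the passes such a scheme needs. It is a generic illustration under that assumption, not a description of how this class schedules its merges, and MergeFanIn is a hypothetical name.

```java
// Hypothetical helper, for illustration only.
class MergeFanIn {
    // Returns how many passes a simple multi-pass merge needs to reduce
    // `segments` sorted inputs to one, merging at most `factor` per group.
    static int passes(int segments, int factor) {
        if (segments < 1 || factor < 2) {
            throw new IllegalArgumentException("need segments >= 1 and factor >= 2");
        }
        int passes = 0;
        while (segments > 1) {
            // Each group of up to `factor` segments becomes one segment.
            segments = (segments + factor - 1) / factor;
            passes++;
        }
        return passes;
    }
}
```

Under this model, 100 segments with a factor of 10 need two passes, while 101 segments need three. A larger factor means fewer passes but more streams open at once.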


getFactor

public int getFactor()
Get the number of streams to merge at once.


setMemory

public void setMemory(int memory)
Set the total amount of buffer memory, in bytes.


getMemory

public int getMemory()
Get the total amount of buffer memory, in bytes.


setProgressable

public void setProgressable(Progressable progressable)
Set the progressable object in order to report progress.


sort

public void sort(Path[] inFiles,
                 Path outFile,
                 boolean deleteInput)
          throws IOException
Perform a file sort from a set of input files into an output file.

Parameters:
inFiles - the files to be sorted
outFile - the sorted output file
deleteInput - should the input files be deleted as they are read?
Throws:
IOException
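
File sorts of this kind are commonly built as an external sort: the input is cut into chunks that fit in the memory buffer, each chunk is sorted in memory to form a run, and the runs are then merged. The sketch below shows only the run-formation step on plain ints. It is a generic illustration, not this class's implementation; RunFormation and maxRecords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class RunFormation {
    // Splits the input into chunks of at most maxRecords and sorts each
    // chunk independently, returning the sorted runs in input order.
    static List<int[]> sortedRuns(int[] records, int maxRecords) {
        if (maxRecords < 1) {
            throw new IllegalArgumentException("maxRecords must be >= 1");
        }
        List<int[]> runs = new ArrayList<>();
        for (int start = 0; start < records.length; start += maxRecords) {
            int end = Math.min(start + maxRecords, records.length);
            int[] run = Arrays.copyOfRange(records, start, end);
            Arrays.sort(run);
            runs.add(run);
        }
        return runs;
    }
}
```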

sortAndIterate

public SequenceFile.Sorter.RawKeyValueIterator sortAndIterate(Path[] inFiles,
                                                              Path tempDir,
                                                              boolean deleteInput)
                                                       throws IOException
Perform a file sort from a set of input files and return an iterator.

Parameters:
inFiles - the files to be sorted
tempDir - the directory where temp files are created during sort
deleteInput - should the input files be deleted as they are read?
Returns:
a RawKeyValueIterator over the sorted records
Throws:
IOException

sort

public void sort(Path inFile,
                 Path outFile)
          throws IOException
The backwards-compatible interface to sort: sorts a single input file into an output file.

Parameters:
inFile - the input file to sort
outFile - the sorted output file
Throws:
IOException

merge

public SequenceFile.Sorter.RawKeyValueIterator merge(List<SequenceFile.Sorter.SegmentDescriptor> segments,
                                                     Path tmpDir)
                                              throws IOException
Merges the list of segments of type SegmentDescriptor

Parameters:
segments - the list of SegmentDescriptors
tmpDir - the directory to write temporary files into
Returns:
RawKeyValueIterator
Throws:
IOException
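
Merging several sorted segments behind a single iterator is typically done with a priority queue that holds the current head of each segment. The sketch below shows that technique on in-memory lists of ints. It is a generic illustration rather than this class's implementation: KWayMerge and Cursor are hypothetical names, and real segments would be file ranges compared with the configured comparator.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical helper, for illustration only.
class KWayMerge {
    // One cursor per sorted segment: its current head plus the remainder.
    private static final class Cursor {
        int head;
        final Iterator<Integer> rest;

        Cursor(int head, Iterator<Integer> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // Returns a lazy iterator over all values in ascending order.
    // Each input list must already be sorted ascending.
    static Iterator<Integer> merge(List<List<Integer>> segments) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));
        for (List<Integer> segment : segments) {
            Iterator<Integer> it = segment.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it)); // skip empty segments
            }
        }
        return new Iterator<Integer>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public Integer next() {
                Cursor smallest = heap.poll();
                if (smallest == null) {
                    throw new NoSuchElementException();
                }
                int value = smallest.head;
                if (smallest.rest.hasNext()) {
                    // Advance the segment and put it back in the heap.
                    smallest.head = smallest.rest.next();
                    heap.add(smallest);
                }
                return value;
            }
        };
    }
}
```

The caller pulls records one at a time, so only one head per segment is held in memory regardless of how long the segments are.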

merge

public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames,
                                                     boolean deleteInputs,
                                                     Path tmpDir)
                                              throws IOException
Merges the contents of the files passed in Path[], using the merge factor that is already set

Parameters:
inNames - the array of path names
deleteInputs - true if the input files should be deleted when unnecessary
tmpDir - the directory to write temporary files into
Returns:
a RawKeyValueIterator over the merged records
Throws:
IOException

merge

public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames,
                                                     boolean deleteInputs,
                                                     int factor,
                                                     Path tmpDir)
                                              throws IOException
Merges the contents of files passed in Path[]

Parameters:
inNames - the array of path names
deleteInputs - true if the input files should be deleted when unnecessary
factor - the factor that will be used as the maximum merge fan-in
tmpDir - the directory to write temporary files into
Returns:
a RawKeyValueIterator over the merged records
Throws:
IOException

merge

public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames,
                                                     Path tempDir,
                                                     boolean deleteInputs)
                                              throws IOException
Merges the contents of files passed in Path[]

Parameters:
inNames - the array of path names
tempDir - the directory for creating temp files during merge
deleteInputs - true if the input files should be deleted when unnecessary
Returns:
a RawKeyValueIterator over the merged records
Throws:
IOException

cloneFileAttributes

public SequenceFile.Writer cloneFileAttributes(Path inputFile,
                                               Path outputFile,
                                               Progressable prog)
                                        throws IOException
Clones the attributes (like compression) of the input file and creates a corresponding Writer.

Parameters:
inputFile - the path of the input file whose attributes should be cloned
outputFile - the path of the output file
prog - the Progressable to report status during the file write
Returns:
Writer
Throws:
IOException

writeFile

public void writeFile(SequenceFile.Sorter.RawKeyValueIterator records,
                      SequenceFile.Writer writer)
               throws IOException
Writes records from RawKeyValueIterator into a file represented by the passed writer

Parameters:
records - the RawKeyValueIterator
writer - the Writer created earlier
Throws:
IOException

merge

public void merge(Path[] inFiles,
                  Path outFile)
           throws IOException
Merge the provided files.

Parameters:
inFiles - the array of input path names
outFile - the final output file
Throws:
IOException


Copyright © 2009 The Apache Software Foundation