Class CombineFileSplit

java.lang.Object
org.apache.hadoop.mapreduce.InputSplit
org.apache.hadoop.mapreduce.lib.input.CombineFileSplit
All Implemented Interfaces:
Writable
Direct Known Subclasses:
org.apache.hadoop.mapred.lib.CombineFileSplit

@Public @Stable public class CombineFileSplit extends InputSplit implements Writable
A sub-collection of input files. Unlike FileSplit, the CombineFileSplit class does not represent a split of a single file, but a split of input files into smaller sets. A split may contain blocks from different files, but all the blocks in the same split are probably local to some rack.
CombineFileSplit can be used to implement RecordReaders that read one record per file.
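As a rough illustration of reading one record per file, the sketch below models a split as three parallel arrays that stand in for getPaths(), getStartOffsets(), and getLengths(), and iterates i from 0 to the number of paths the way a per-file RecordReader might. The class name `PerFileRecords`, the use of plain strings instead of Hadoop's Path, and the record format are hypothetical; the sketch does not use Hadoop at all. The total it computes mirrors getLength(), which reports the sum of the per-file lengths.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of a combined split; not Hadoop's API. */
final class PerFileRecords {
    private final String[] paths;   // stands in for getPaths()
    private final long[] offsets;   // stands in for getStartOffsets()
    private final long[] lengths;   // stands in for getLengths()

    PerFileRecords(String[] paths, long[] offsets, long[] lengths) {
        if (paths.length != offsets.length || paths.length != lengths.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        this.paths = paths;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    /** Mirrors getNumPaths(): one entry per file chunk in the split. */
    int numPaths() {
        return paths.length;
    }

    /** Mirrors getLength(): total bytes, i.e. the sum of the per-file lengths. */
    long totalLength() {
        long total = 0;
        for (long len : lengths) {
            total += len;
        }
        return total;
    }

    /** Emits one record per file, visiting index i = 0 .. numPaths() - 1. */
    List<String> records() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < numPaths(); i++) {
            out.add(paths[i] + "@" + offsets[i] + "+" + lengths[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        PerFileRecords split = new PerFileRecords(
                new String[] {"/data/a.txt", "/data/b.txt"},
                new long[] {0L, 128L},
                new long[] {100L, 50L});
        System.out.println(split.records());      // [/data/a.txt@0+100, /data/b.txt@128+50]
        System.out.println(split.totalLength());  // 150
    }
}
```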
  • Constructor Details

    • CombineFileSplit

      public CombineFileSplit()
      Default constructor. Creates an empty split, for example one to be filled in by readFields(DataInput).
    • CombineFileSplit

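      public CombineFileSplit(Path[] files, long[] start, long[] lengths, String[] locations)
      Creates a split over the given files, where file i starts at start[i] and spans lengths[i] bytes; locations lists the hosts where the split's data is local.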
    • CombineFileSplit

      public CombineFileSplit(Path[] files, long[] lengths)
      Creates a split over the given files with every start offset set to 0.
    • CombineFileSplit

      public CombineFileSplit(CombineFileSplit old) throws IOException
      Copy constructor. Creates a split with the same paths, start offsets, lengths, and locations as old.
      Throws:
      IOException
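The two-argument constructor behaves as a convenience form of the four-argument one in which each file is read from offset 0. The hypothetical model below sketches that delegation, plus a copy constructor that takes over the fields of an existing split, in plain Java. The class name `SplitFields`, the use of String for paths, and the empty-string placeholder locations are assumptions of this sketch, not statements about Hadoop's internals.

```java
import java.util.Arrays;

/** Hypothetical plain-Java model of the CombineFileSplit constructors; not Hadoop's API. */
final class SplitFields {
    final String[] files;
    final long[] start;
    final long[] lengths;
    final String[] locations;

    /** Four-argument form: the caller supplies every field explicitly. */
    SplitFields(String[] files, long[] start, long[] lengths, String[] locations) {
        this.files = files;
        this.start = start;
        this.lengths = lengths;
        this.locations = locations;
    }

    /** Two-argument form: a new long[] is all zeros, so every start offset is 0. */
    SplitFields(String[] files, long[] lengths) {
        this(files, new long[files.length], lengths, emptyLocations(files.length));
    }

    /** Copy constructor: reuses the field references of old (no deep copy in this sketch). */
    SplitFields(SplitFields old) {
        this(old.files, old.start, old.lengths, old.locations);
    }

    /** Placeholder locations for the two-argument form (an assumption of this sketch). */
    private static String[] emptyLocations(int n) {
        String[] locs = new String[n];
        Arrays.fill(locs, "");
        return locs;
    }

    public static void main(String[] args) {
        SplitFields s = new SplitFields(new String[] {"/in/a", "/in/b"}, new long[] {10L, 20L});
        System.out.println(Arrays.toString(s.start));       // [0, 0]
        SplitFields copy = new SplitFields(s);
        System.out.println(Arrays.toString(copy.lengths));  // [10, 20]
    }
}
```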
  • Method Details

    • getLength

      public long getLength()
      Description copied from class: InputSplit
      Get the size of the split, so that the input splits can be sorted by size.
      Specified by:
      getLength in class InputSplit
      Returns:
      the number of bytes in the split
    • getStartOffsets

      public long[] getStartOffsets()
      Returns an array containing the start offsets of the files in the split
    • getLengths

      public long[] getLengths()
      Returns an array containing the lengths of the files in the split
    • getOffset

      public long getOffset(int i)
      Returns the start offset of the ith Path
    • getLength

      public long getLength(int i)
      Returns the length of the ith Path
    • getNumPaths

      public int getNumPaths()
      Returns the number of Paths in the split
    • getPath

      public Path getPath(int i)
      Returns the ith Path
    • getPaths

      public Path[] getPaths()
      Returns all the Paths in the split
    • getLocations

      public String[] getLocations() throws IOException
      Returns all the host locations where this input split resides.
      Specified by:
      getLocations in class InputSplit
      Returns:
      a new array of the node names.
      Throws:
      IOException
    • readFields

      public void readFields(DataInput in) throws IOException
      Description copied from interface: Writable
      Deserialize the fields of this object from in.

      For efficiency, implementations should attempt to re-use storage in the existing object where possible.

      Specified by:
      readFields in interface Writable
      Parameters:
      in - DataInput to deserialize this object from.
      Throws:
      IOException - any other problem for readFields.
    • write

      public void write(DataOutput out) throws IOException
      Description copied from interface: Writable
      Serialize the fields of this object to out.
      Specified by:
      write in interface Writable
      Parameters:
      out - DataOutput to serialize this object into.
      Throws:
      IOException - any other problem for write.
    • toString

      public String toString()
      Overrides:
      toString in class Object
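The write/readFields contract can be exercised with plain java.io streams. The sketch below is a hypothetical class with the same two method signatures as Writable; its field layout (a count, then each path as a UTF string followed by its offset and length) is an assumption for illustration and is not Hadoop's actual wire format for CombineFileSplit. Following the storage-reuse advice in the readFields description, the sketch's readFields keeps its existing arrays when the incoming count matches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical split with Writable-style write/readFields; layout is illustrative only. */
final class SerializableSplit {
    String[] paths = new String[0];
    long[] offsets = new long[0];
    long[] lengths = new long[0];

    /** Serializes the fields of this object to out. */
    void write(DataOutput out) throws IOException {
        out.writeInt(paths.length);
        for (int i = 0; i < paths.length; i++) {
            out.writeUTF(paths[i]);
            out.writeLong(offsets[i]);
            out.writeLong(lengths[i]);
        }
    }

    /** Deserializes the fields from in, reusing the arrays when the size is unchanged. */
    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        if (paths.length != n) {  // allocate only when the existing storage does not fit
            paths = new String[n];
            offsets = new long[n];
            lengths = new long[n];
        }
        for (int i = 0; i < n; i++) {
            paths[i] = in.readUTF();
            offsets[i] = in.readLong();
            lengths[i] = in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        SerializableSplit original = new SerializableSplit();
        original.paths = new String[] {"/logs/1", "/logs/2"};
        original.offsets = new long[] {0L, 64L};
        original.lengths = new long[] {64L, 32L};

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        original.write(out);
        out.flush();

        // Fresh, empty instance, as a framework would create with the default constructor.
        SerializableSplit copy = new SerializableSplit();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.paths[1] + " " + copy.offsets[1] + " " + copy.lengths[1]); // /logs/2 64 32
    }
}
```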