Class FileSplit

java.lang.Object
org.apache.hadoop.mapreduce.InputSplit
org.apache.hadoop.mapreduce.lib.input.FileSplit
All Implemented Interfaces:
Writable

@Public @Stable public class FileSplit extends InputSplit implements Writable
  • Constructor Details

    • FileSplit

      public FileSplit()
    • FileSplit

      public FileSplit(Path file, long start, long length, String[] hosts)
      Constructs a split with host information
      Parameters:
      file - the file name
      start - the position of the first byte in the file to process
      length - the number of bytes in the file to process
      hosts - the list of hosts containing the block, possibly null
    • FileSplit

      public FileSplit(Path file, long start, long length, String[] hosts, String[] inMemoryHosts)
      Constructs a split with host and cached-blocks information
      Parameters:
      file - the file name
      start - the position of the first byte in the file to process
      length - the number of bytes in the file to process
      hosts - the list of hosts containing the block
      inMemoryHosts - the list of hosts containing the block in memory
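
The start and length parameters above describe a byte range within the file. As a rough illustration of how a file might be covered by such ranges, here is a minimal JDK-only sketch. The class SplitRanges, its nested Range type, and the fixed splitSize argument are hypothetical names introduced for this example; this is not how Hadoop's input formats compute splits, only a picture of what consecutive (start, length) pairs look like.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRanges {
    /** Hypothetical holder for one (start, length) pair, mirroring the two long constructor arguments. */
    public static final class Range {
        public final long start;   // position of the first byte to process
        public final long length;  // number of bytes to process

        public Range(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Cuts a file of fileLength bytes into consecutive ranges of at most splitSize bytes each. */
    public static List<Range> ranges(long fileLength, long splitSize) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and splitSize must be > 0");
        }
        List<Range> result = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last range is shortened so that it ends exactly at the end of the file.
            result.add(new Range(start, Math.min(splitSize, fileLength - start)));
        }
        return result;
    }

    public static void main(String[] args) {
        // A 300-byte file with 128-byte ranges yields 0+128, 128+128, 256+44.
        for (Range r : ranges(300, 128)) {
            System.out.println(r.start + "+" + r.length);
        }
    }
}
```

Each resulting pair could be passed as the start and length arguments of the constructors above, together with the file's Path and a host list.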
  • Method Details

    • getPath

      public Path getPath()
      The file containing this split's data.
    • getStart

      public long getStart()
      The position of the first byte in the file to process.
    • getLength

      public long getLength()
      The number of bytes in the file to process.
      Specified by:
      getLength in class InputSplit
      Returns:
      the number of bytes in the split
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • write

      public void write(DataOutput out) throws IOException
      Description copied from interface: Writable
      Serialize the fields of this object to out.
      Specified by:
      write in interface Writable
      Parameters:
      out - DataOutput to serialize this object into.
      Throws:
      IOException - if an I/O error occurs while writing.
    • readFields

      public void readFields(DataInput in) throws IOException
      Description copied from interface: Writable
      Deserialize the fields of this object from in.

      For efficiency, implementations should attempt to re-use storage in the existing object where possible.

      Specified by:
      readFields in interface Writable
      Parameters:
      in - DataInput to deserialize this object from.
      Throws:
      IOException - if an I/O error occurs while reading.
    • getLocations

      public String[] getLocations() throws IOException
      Description copied from class: InputSplit
      Get the list of nodes by name where the data for the split would be local. The locations do not need to be serialized.
      Specified by:
      getLocations in class InputSplit
      Returns:
      a new array of the node names.
      Throws:
      IOException
    • getLocationInfo

      @Evolving public SplitLocationInfo[] getLocationInfo() throws IOException
      Description copied from class: InputSplit
      Gets info about which nodes the input split is stored on and how it is stored at each location.
      Overrides:
      getLocationInfo in class InputSplit
      Returns:
      list of SplitLocationInfos describing how the split data is stored at each location. A null value indicates that all the locations have the data stored on disk.
      Throws:
      IOException
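
The write and readFields methods described above form a serialize/deserialize pair over java.io.DataOutput and java.io.DataInput. The following JDK-only sketch shows that round-trip pattern with a hypothetical stand-in class, DemoSplit. It is not FileSplit itself, and the field encoding it uses (writeUTF for the path) is an assumption for illustration rather than Hadoop's wire format. In line with the note that locations do not need to be serialized, the stand-in leaves its host list out of the byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SplitRoundTrip {
    /** Hypothetical stand-in for a Writable-style split; not Hadoop's FileSplit. */
    public static final class DemoSplit {
        public String path = "";
        public long start;
        public long length;
        public String[] hosts;  // locality hint only; deliberately not serialized

        public void write(DataOutput out) throws IOException {
            out.writeUTF(path);      // assumed encoding, chosen for this sketch
            out.writeLong(start);
            out.writeLong(length);
        }

        public void readFields(DataInput in) throws IOException {
            // Overwrite the fields of this existing object instead of allocating a new one.
            path = in.readUTF();
            start = in.readLong();
            length = in.readLong();
            hosts = null;            // hosts were never written, so none can be restored
        }
    }

    /** Serializes a split to bytes and reads it back into a fresh instance. */
    public static DemoSplit roundTrip(DemoSplit original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            DemoSplit copy = new DemoSplit();
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DemoSplit s = new DemoSplit();
        s.path = "/data/input.txt";
        s.start = 128;
        s.length = 64;
        s.hosts = new String[] {"node1"};
        DemoSplit copy = roundTrip(s);
        System.out.println(copy.path + ":" + copy.start + "+" + copy.length);
    }
}
```

In this sketch the path, start, and length survive the round trip while the host list comes back as null, so code that receives a deserialized split should not rely on location data being present.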