Class DistCp

java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.tools.DistCp
All Implemented Interfaces:
Configurable, Tool

@Public @Evolving public class DistCp extends Configured implements Tool
DistCp is the main driver-class for DistCpV2. For command-line use, DistCp::main() orchestrates the parsing of command-line parameters and the launch of the DistCp job. For programmatic use, a DistCp object can be constructed by specifying options (in a DistCpOptions object), and DistCp::execute() may be used to launch the copy-job. DistCp may alternatively be sub-classed to fine-tune behaviour.
  • Constructor Details

    • DistCp

      public DistCp(Configuration configuration, DistCpOptions inputOptions) throws Exception
      Public constructor. Creates a DistCp object with the specified input parameters (e.g. source paths, target location).
      Parameters:
      configuration - configuration against which the Copy-mapper must run
      inputOptions - Immutable options
      Throws:
      Exception
  • Method Details

    • run

      public int run(String[] argv)
      Implementation of Tool::run(). Orchestrates the copy of source file(s) to the target location by:
        1. Creating a list of files to be copied to the target.
        2. Launching a map-only job to copy the files. (Delegates to execute().)
      The MR job is not closed as part of run() if the call to run() is blocking.
      Specified by:
      run in interface Tool
      Parameters:
      argv - List of arguments passed to DistCp, from the ToolRunner.
      Returns:
      0 on success; -1 otherwise.
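      The two steps and the exit-code contract described for run() can be modelled with plain JDK types. This is only a simplified sketch: RunFlowSketch, CopyLauncher and buildListing are hypothetical stand-ins invented for illustration and are not Hadoop classes or methods. It shows the listing being built first, the launch step being delegated, and any failure being mapped to -1.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the run() flow; none of these types are Hadoop classes. */
final class RunFlowSketch {

    /** Hypothetical stand-in for the step that launches the map-only copy job. */
    interface CopyLauncher {
        void launch(List<String> fileListing) throws Exception;
    }

    /** Step 1: build the list of files to copy (here: every non-empty input path). */
    static List<String> buildListing(String[] sources) {
        List<String> listing = new ArrayList<>();
        for (String source : sources) {
            if (source != null && !source.isEmpty()) {
                listing.add(source);
            }
        }
        return listing;
    }

    /** Steps 1 and 2 together; returns 0 on success and -1 on any failure. */
    static int run(String[] sources, CopyLauncher launcher) {
        try {
            List<String> listing = buildListing(sources);
            launcher.launch(listing); // Step 2: delegate the actual copy.
            return 0;
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        int exitCode = run(new String[] {"/src/a", "/src/b"}, listing -> { });
        System.out.println("exit code: " + exitCode);
    }
}
```

      A caller that only needs to know whether the copy succeeded can test the result for non-zero rather than comparing against -1 specifically.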
    • execute

      public Job execute() throws Exception
      Original entry point of a distcp job. Calls execute(boolean) without performing the extra context checks or setting the additional configs.
      Returns:
      Job handle
      Throws:
      Exception - if the distcp job cannot be submitted or the job fails
    • execute

      public Job execute(boolean extraContextChecks) throws Exception
      Implements the core execution. Creates the file list for the copy and launches the Hadoop job that performs it.
      Parameters:
      extraContextChecks - if true, does extra context checks and sets some configs.
      Returns:
      Job handle
      Throws:
      Exception - if the distcp job cannot be submitted, the job fails, or the context checks fail
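      The relationship between the two execute overloads can be pictured as follows. This is a JDK-only model inferred from the two descriptions above, not the Hadoop source: ExecuteOverloadSketch, its contextValid flag and the "job-handle" string are hypothetical, and the assumption that the no-argument form passes false is a reading of the text rather than a confirmed detail.

```java
/** Simplified model of the execute() overloads; not the Hadoop implementation. */
final class ExecuteOverloadSketch {

    private final boolean contextValid;   // hypothetical: whether the checks would pass
    private boolean checksPerformed = false;

    ExecuteOverloadSketch(boolean contextValid) {
        this.contextValid = contextValid;
    }

    /** Original entry point: assumed to skip the extra context checks. */
    String execute() {
        return execute(false);
    }

    /** Core execution: optionally runs the extra checks before launching. */
    String execute(boolean extraContextChecks) {
        if (extraContextChecks) {
            checksPerformed = true;
            if (!contextValid) {
                throw new IllegalStateException("context checks failed");
            }
        }
        // A real implementation would build the file list and launch the job here.
        return "job-handle";
    }

    boolean checksPerformed() {
        return checksPerformed;
    }
}
```

      In this model a caller that wants the additional validation asks for it explicitly with execute(true), while existing callers of execute() keep the behaviour without the checks.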
    • createAndSubmitJob

      public Job createAndSubmitJob() throws Exception
      Creates and submits the MapReduce job.
      Returns:
      The MapReduce job object that has been submitted
      Throws:
      Exception
    • waitForJobCompletion

      public void waitForJobCompletion(Job job) throws Exception
      Waits for the given job to complete.
      Parameters:
      job - the MapReduce job, which must already have been submitted
      Throws:
      Exception
    • createInputFileListing

      protected Path createInputFileListing(Job job) throws IOException
      Creates the input listing by invoking an appropriate copy listing implementation. Also adds delegation tokens for each path to the job's credential store.
      Parameters:
      job - Handle to job
      Returns:
      The path where the copy listing is created
      Throws:
      IOException - if an I/O error occurs
    • getFileListingPath

      protected Path getFileListingPath() throws IOException
      Gets the default name of the copy listing file. Uses the meta folder to create the copy listing file.
      Returns:
      Path where the copy listing file has to be saved
      Throws:
      IOException - if an I/O error occurs
    • getContext

      protected org.apache.hadoop.tools.DistCpContext getContext()
      Returns the context.
      Returns:
      the context
    • main

      public static void main(String[] argv)
      Main function of the DistCp program. Parses the input arguments (via OptionsParser), and invokes the DistCp::run() method, via the ToolRunner.
      Parameters:
      argv - Command-line arguments sent to DistCp.
    • cleanup

      protected void cleanup()
      Cleans up the staging folder created by distcp.