Package org.apache.hadoop.tools
Class DistCp
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.hadoop.tools.DistCp
- All Implemented Interfaces:
Configurable, Tool
DistCp is the main driver-class for DistCpV2.
For command-line use, DistCp::main() orchestrates the parsing of command-line
parameters and the launch of the DistCp job.
For programmatic use, a DistCp object can be constructed by specifying
options (in a DistCpOptions object), and DistCp::execute() may be used to
launch the copy-job. DistCp may alternatively be sub-classed to fine-tune
behaviour.
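The programmatic pattern described above has three parts: an immutable options object, a driver whose execute() performs the copy, and protected methods a subclass can override. It can be illustrated without a cluster. The following is a minimal local-filesystem sketch, not Hadoop's API. CopyOptions, LocalCopyDriver and the shouldCopy hook are hypothetical names. The two phases (build a listing, then copy each entry) run sequentially in one JVM rather than as a MapReduce job.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical immutable options object (plays the role DistCpOptions plays for DistCp).
final class CopyOptions {
    private final List<Path> sourcePaths;
    private final Path targetPath;

    CopyOptions(List<Path> sourcePaths, Path targetPath) {
        this.sourcePaths = Collections.unmodifiableList(new ArrayList<>(sourcePaths));
        this.targetPath = targetPath;
    }

    List<Path> sourcePaths() { return sourcePaths; }
    Path targetPath() { return targetPath; }
}

// Hypothetical driver: execute() builds a listing, then copies every entry in it.
class LocalCopyDriver {
    private final CopyOptions options;

    LocalCopyDriver(CopyOptions options) { this.options = options; }

    /** Hook for subclasses: return false to leave a file out of the listing. */
    protected boolean shouldCopy(Path file) { return true; }

    /** Listing phase: map each regular file under each source to its destination. */
    protected Map<Path, Path> createInputFileListing() {
        Map<Path, Path> listing = new LinkedHashMap<>();
        for (Path src : options.sourcePaths()) {
            Path root = src.toAbsolutePath().normalize();
            // Keep the source's own name under the target (a choice of this sketch;
            // assumes no source is a filesystem root, whose file name would be null).
            Path destRoot = options.targetPath().resolve(root.getFileName());
            try (Stream<Path> walk = Files.walk(root)) {
                walk.filter(Files::isRegularFile)
                    .filter(this::shouldCopy)
                    .forEach(f -> listing.put(f, destRoot.resolve(root.relativize(f))));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return listing;
    }

    /** Runs the listing phase, then copies every listed file; returns how many were copied. */
    int execute() {
        Map<Path, Path> listing = createInputFileListing();
        for (Map.Entry<Path, Path> entry : listing.entrySet()) {
            try {
                Files.createDirectories(entry.getValue().getParent());
                Files.copy(entry.getKey(), entry.getValue(), StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return listing.size();
    }
}
```

A subclass that overrides shouldCopy (for example, to skip temporary files) changes the listing without touching the copy loop. This is the same kind of fine-tuning that protected hooks on a driver class allow.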
-
Constructor Summary
Constructors
Constructor | Description
DistCp(Configuration configuration, DistCpOptions inputOptions) | Public Constructor.
-
Method Summary
Modifier and Type | Method | Description
protected void | cleanup() | Clean the staging folder created by distcp.
Job | createAndSubmitJob() | Create and submit the mapreduce job.
protected Path | createInputFileListing(Job job) | Create input listing by invoking an appropriate copy listing implementation.
Job | execute() | Original entrypoint of a distcp job.
Job | execute(boolean extraContextChecks) | Implements the core-execution.
protected org.apache.hadoop.tools.DistCpContext | getContext() | Returns the context.
protected Path | getFileListingPath() | Get default name of the copy listing file.
static void | main(String[] argv) | Main function of the DistCp program.
int | run(String[] argv) | Implementation of Tool::run().
void | waitForJobCompletion(Job job) | Wait for the given job to complete.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Constructor Details
-
DistCp
Public Constructor. Creates a DistCp object with the specified input parameters (e.g. source paths, target location).
- Parameters:
configuration - configuration against which the Copy-mapper must run
inputOptions - immutable options
- Throws:
Exception
-
-
Method Details
-
run
Implementation of Tool::run(). Orchestrates the copy of source file(s) to the target location by:
1. Creating a list of files to be copied to the target.
2. Launching a map-only job to copy the files (delegates to execute()).
The MR job is not closed as part of run() if it is a blocking call to run.
-
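The second step of run() launches a map-only job. The copy listing is divided among map tasks, each task copies its share, and there is no reduce phase. The sketch below imitates that shape with a thread pool. runMapOnly, the round-robin split and the 0/1 exit codes are assumptions of this sketch. They need not match how DistCp assigns work to its map tasks or what run() returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical model of a map-only job: split the work, run one "mapper" per chunk.
final class MapOnlySketch {
    private MapOnlySketch() {}

    /** Splits the work round-robin into at most numMaps (and at least one) chunks. */
    static <T> List<List<T>> split(List<T> work, int numMaps) {
        int chunks = Math.max(1, Math.min(numMaps, work.size()));
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < chunks; i++) result.add(new ArrayList<>());
        for (int i = 0; i < work.size(); i++) result.get(i % chunks).add(work.get(i));
        return result;
    }

    /** Runs the chunks in parallel; returns 0 if every mapper succeeds, 1 otherwise. */
    static <T> int runMapOnly(List<T> work, int numMaps, Consumer<T> mapper) {
        List<List<T>> chunks = split(work, numMaps);
        ExecutorService pool = Executors.newFixedThreadPool(chunks.size());
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (List<T> chunk : chunks) {
                tasks.add(pool.submit(() -> chunk.forEach(mapper)));
            }
            for (Future<?> task : tasks) {
                task.get(); // rethrows, wrapped in ExecutionException, anything a mapper threw
            }
            return 0;
        } catch (ExecutionException e) {
            return 1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        } finally {
            pool.shutdown(); // no reduce phase: nothing to run after the mappers
        }
    }
}
```

Because there is no reduce step, the mappers share no state and can fail independently. Any single failure turns the whole run into a non-zero result.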
execute
Original entrypoint of a distcp job. Calls execute(boolean) without doing the extra context checks or setting the extra configs.
- Returns:
Job handle
- Throws:
Exception - if the distcp job cannot be submitted or the job fails
-
execute
Implements the core execution. Creates the file listing for the copy and launches the Hadoop job that performs the copy.
- Parameters:
extraContextChecks - if true, does extra context checks and sets some configs.
- Returns:
Job handle
- Throws:
Exception - if the distcp job cannot be submitted, the job fails, or the context checks fail
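The two execute overloads are related by plain delegation: the no-argument form forwards to execute(boolean) with the extra checks switched off. The sketch below shows only that shape. ExecuteSketch, validateContext and the "job handle" string are hypothetical stand-ins, because the documentation does not say which checks are performed or which configs are set.

```java
// Hypothetical stand-in showing how execute() delegates to execute(boolean).
class ExecuteSketch {
    private final boolean contextIsValid;
    private int contextChecksRun = 0;

    ExecuteSketch(boolean contextIsValid) { this.contextIsValid = contextIsValid; }

    /** Original entrypoint: forwards with the extra context checks disabled. */
    String execute() {
        return execute(false);
    }

    /** Core execution: optionally validates the context before "submitting" the job. */
    String execute(boolean extraContextChecks) {
        if (extraContextChecks) {
            validateContext(); // throws if the context is not acceptable
        }
        return "job handle"; // placeholder for the submitted job
    }

    private void validateContext() {
        contextChecksRun++;
        if (!contextIsValid) {
            throw new IllegalStateException("context check failed");
        }
    }

    int contextChecksRun() { return contextChecksRun; }
}
```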
-
createAndSubmitJob
Create and submit the MapReduce job.
- Returns:
The MapReduce job object that has been submitted
- Throws:
Exception
-
waitForJobCompletion
Wait for the given job to complete.
- Parameters:
job - the given MapReduce job, which has already been submitted
- Throws:
Exception
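Separating createAndSubmitJob() from waitForJobCompletion(Job) lets a caller do other work, or monitor progress, between submitting the job and blocking on it. The sketch below models that contract with CompletableFuture. SubmitThenWait, its createAndSubmitJob(BooleanSupplier) signature and the boolean success flag are hypothetical. The real methods operate on a MapReduce Job rather than a future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BooleanSupplier;

// Hypothetical model: submission returns immediately, waiting is a separate call.
final class SubmitThenWait implements AutoCloseable {
    private final ExecutorService cluster = Executors.newSingleThreadExecutor();

    /** "Submits" the work and returns a handle without blocking. */
    CompletableFuture<Boolean> createAndSubmitJob(BooleanSupplier copyWork) {
        return CompletableFuture.supplyAsync(copyWork::getAsBoolean, cluster);
    }

    /** Blocks until the job finishes; throws if it crashed or reported failure. */
    void waitForJobCompletion(CompletableFuture<Boolean> job) {
        boolean succeeded;
        try {
            succeeded = job.join();
        } catch (CompletionException e) {
            throw new IllegalStateException("job crashed", e.getCause());
        }
        if (!succeeded) {
            throw new IllegalStateException("job reported failure");
        }
    }

    @Override
    public void close() {
        cluster.shutdown();
    }
}
```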
-
createInputFileListing
Create the input listing by invoking an appropriate copy listing implementation. Also adds delegation tokens for each path to the job's credential store.
- Parameters:
job - handle to the job
- Returns:
The path where the copy listing is created
- Throws:
IOException - if any I/O error occurs
-
getFileListingPath
Get the default name of the copy listing file. Uses the meta folder to create the copy listing file.
- Returns:
Path where the copy listing file is to be saved
- Throws:
IOException - if any I/O error occurs
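Together, getFileListingPath() and createInputFileListing(Job) place a listing of everything to copy inside the meta folder and return the listing's location. The sketch below writes such a listing as a plain text file. ListingSketch, the file name fileList.txt and the tab-separated "source, relative path" line format are assumptions made for illustration. The real listing is produced by a copy listing implementation, in whatever format that implementation uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local analogue of writing a copy listing into a meta folder.
final class ListingSketch {
    // Hypothetical file name; the real default name is not given in this documentation.
    static final String LISTING_FILE_NAME = "fileList.txt";

    private final Path metaFolder;

    ListingSketch(Path metaFolder) { this.metaFolder = metaFolder; }

    /** Default listing location: a fixed name inside the meta folder. */
    Path getFileListingPath() {
        return metaFolder.resolve(LISTING_FILE_NAME);
    }

    /** Writes one "absoluteSource<TAB>relativePath" line per file under sourceRoot. */
    Path createInputFileListing(Path sourceRoot) {
        Path root = sourceRoot.toAbsolutePath().normalize();
        Path listingPath = getFileListingPath();
        try (Stream<Path> walk = Files.walk(root)) {
            List<String> lines = walk.filter(Files::isRegularFile)
                    .sorted()
                    .map(f -> f + "\t" + root.relativize(f))
                    .collect(Collectors.toList());
            Files.createDirectories(metaFolder);
            Files.write(listingPath, lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return listingPath;
    }
}
```

Storing the path relative to the source root alongside each absolute source path gives the copy step everything it needs to rebuild the directory layout under the target.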
-
getContext
protected org.apache.hadoop.tools.DistCpContext getContext()
Returns the context.
- Returns:
the context
-
main
Main function of the DistCp program. Parses the input arguments (via OptionsParser) and invokes the DistCp::run() method via the ToolRunner.
- Parameters:
argv - command-line arguments sent to DistCp
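The command-line path parses the arguments, runs the tool, and turns its integer result into the process exit status. A dependency-free sketch of that flow follows. CliSketch, its argument convention (every argument but the last is a source, the last is the target) and its exit codes are hypothetical. They do not reproduce DistCp's real option syntax, OptionsParser or ToolRunner.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical command-line front end following a main -> parse -> run -> exit flow.
final class CliSketch {
    /** Parsed, immutable options: sources plus a single target. */
    static final class Options {
        final List<String> sources;
        final String target;

        Options(List<String> sources, String target) {
            this.sources = sources;
            this.target = target;
        }
    }

    /** Every argument but the last is a source; the last is the target. */
    static Options parse(String[] argv) {
        if (argv.length < 2) {
            throw new IllegalArgumentException("usage: CliSketch <src>... <target>");
        }
        List<String> sources = List.copyOf(Arrays.asList(argv).subList(0, argv.length - 1));
        return new Options(sources, argv[argv.length - 1]);
    }

    /** Returns 0 on success and 1 on bad input (both codes are assumptions of this sketch). */
    static int run(String[] argv) {
        Options options;
        try {
            options = parse(argv);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            return 1;
        }
        System.out.println("would copy " + options.sources + " to " + options.target);
        return 0;
    }

    public static void main(String[] argv) {
        System.exit(run(argv)); // the tool's result becomes the process exit status
    }
}
```

Keeping the logic in run() and leaving only System.exit in main makes the driver testable without terminating the JVM.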
-
cleanup
protected void cleanup()
Clean the staging folder created by distcp.
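Cleaning a staging folder amounts to deleting a directory tree, children before parents. The sketch below does that for a local directory. StagingCleanup is a hypothetical name. Logging and suppressing failures, so that cleanup never masks the job's own outcome, is a design choice of this sketch, not documented behaviour of DistCp::cleanup().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local analogue of removing a job's staging folder.
final class StagingCleanup {
    private StagingCleanup() {}

    /** Deletes stagingDir and everything under it; returns false if anything failed. */
    static boolean cleanup(Path stagingDir) {
        if (!Files.exists(stagingDir)) {
            return true; // nothing to clean
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(stagingDir)) {
            // Reverse lexical order puts every child before its parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        } catch (IOException e) {
            System.err.println("Unable to list " + stagingDir + ": " + e);
            return false;
        }
        boolean ok = true;
        for (Path p : paths) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException e) {
                System.err.println("Unable to delete " + p + ": " + e); // log and carry on
                ok = false;
            }
        }
        return ok;
    }
}
```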
-