@InterfaceAudience.Public @InterfaceStability.Evolving public class DataDrivenDBInputFormat<T extends DBWritable> extends DBInputFormat<T> implements Configurable
Modifier and Type | Field and Description |
---|---|
static String | SUBSTITUTE_TOKEN: If users are providing their own query, the following string is expected to appear in the WHERE clause, which will be substituted with a pair of conditions on the input to allow input splits to parallelise the import. |
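The following is a minimal sketch of the substitution described for SUBSTITUTE_TOKEN, written in plain Java with no Hadoop dependency. It assumes the token's value is the literal string $CONDITIONS, which is its value in the Hadoop sources as far as we know. The helper expandForSplit and the sample table and column names are hypothetical. In Hadoop the replacement is performed by the framework for each split, not by user code.

```java
public class SubstituteTokenDemo {
    // Assumed value of DataDrivenDBInputFormat.SUBSTITUTE_TOKEN.
    static final String SUBSTITUTE_TOKEN = "$CONDITIONS";

    // Hypothetical helper: replaces the token with a lower/upper condition pair for one split.
    static String expandForSplit(String userQuery, String lowerClause, String upperClause) {
        if (!userQuery.contains(SUBSTITUTE_TOKEN)) {
            throw new IllegalArgumentException("query must contain " + SUBSTITUTE_TOKEN);
        }
        String conditions = "( " + lowerClause + " ) AND ( " + upperClause + " )";
        // String.replace is literal (no regex), so the '$' in the token needs no escaping.
        return userQuery.replace(SUBSTITUTE_TOKEN, conditions);
    }

    public static void main(String[] args) {
        String query = "SELECT id, name FROM employees WHERE $CONDITIONS";
        System.out.println(expandForSplit(query, "id >= 100", "id < 250"));
        // prints: SELECT id, name FROM employees WHERE ( id >= 100 ) AND ( id < 250 )
    }
}
```

Each split receives the same user query with a different pair of conditions, so all splits can run in parallel against the database.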
Fields inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat: conditions, connection, dbConf, dbProductName, fieldNames, tableName
Constructor and Description |
---|
DataDrivenDBInputFormat() |
Modifier and Type | Method and Description |
---|---|
protected RecordReader<LongWritable,T> | createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) |
protected String | getBoundingValsQuery() |
List<InputSplit> | getSplits(JobContext job): Logically split the set of input files for the job. |
protected DBSplitter | getSplitter(int sqlDataType) |
static void | setBoundingQuery(Configuration conf, String query): Set the user-defined bounding query to use with a user-defined query. |
static void | setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputBoundingQuery): setInput() takes a custom query and a separate "bounding query" to use instead of the custom "count query" used by DBInputFormat. |
static void | setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String splitBy, String... fieldNames): Note that the "orderBy" column is called the "splitBy" in this version. |
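Methods inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat: closeConnection, createConnection, createRecordReader, getConf, getConnection, getCountQuery, getDBConf, getDBProductName, setConf
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable: getConf, setConf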
public static final String SUBSTITUTE_TOKEN
public DataDrivenDBInputFormat()
protected DBSplitter getSplitter(int sqlDataType)
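getSplitter chooses a DBSplitter from the JDBC type code of the split column. The sketch below is a hypothetical, dependency-free approximation that returns splitter class names as strings rather than DBSplitter instances. The type-to-splitter mapping reflects our reading of the Hadoop source and should be checked against the Hadoop version in use; a null result stands for an unsupported type.

```java
import java.sql.Types;

public class SplitterChoiceSketch {
    // Hypothetical helper: maps a java.sql.Types code to the name of a splitter class.
    // Returns null when no splitter is assumed to handle the type.
    static String splitterNameFor(int sqlDataType) {
        switch (sqlDataType) {
            case Types.NUMERIC:
            case Types.DECIMAL:
                return "BigDecimalSplitter";
            case Types.BIT:
            case Types.BOOLEAN:
                return "BooleanSplitter";
            case Types.INTEGER:
            case Types.TINYINT:
            case Types.SMALLINT:
            case Types.BIGINT:
                return "IntegerSplitter";
            case Types.REAL:
            case Types.FLOAT:
            case Types.DOUBLE:
                return "FloatSplitter";
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
                return "TextSplitter";
            case Types.DATE:
            case Types.TIME:
            case Types.TIMESTAMP:
                return "DateSplitter";
            default:
                return null; // e.g. BLOB: no range-based splitting assumed
        }
    }

    public static void main(String[] args) {
        System.out.println(splitterNameFor(Types.BIGINT));  // IntegerSplitter
        System.out.println(splitterNameFor(Types.VARCHAR)); // TextSplitter
    }
}
```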
public List<InputSplit> getSplits(JobContext job) throws IOException
Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.
Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
Overrides: getSplits in class DBInputFormat<T extends DBWritable>
Parameters: job - job configuration.
Returns: the InputSplits for the job.
Throws: IOException
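The splitter chosen for the split column turns its minimum and maximum values into one WHERE-clause range per InputSplit. As a simplified, hypothetical illustration of the integer case (not Hadoop's IntegerSplitter, whose boundary arithmetic and clause text may differ), the helper below divides [min, max] into contiguous half-open ranges. The name splitClauses and the sample column are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class IntegerRangeSplitSketch {
    // Hypothetical helper: divides [min, max] into at most numSplits contiguous,
    // non-overlapping half-open ranges and renders each as a WHERE-clause fragment.
    // Assumes (max - min + 1) * numSplits fits in a long.
    static List<String> splitClauses(String col, long min, long max, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        List<String> clauses = new ArrayList<>();
        if (max < min) {
            return clauses; // empty range: nothing to split
        }
        long count = max - min + 1;          // number of distinct integer values
        long n = Math.min(numSplits, count); // never more splits than values
        for (long i = 0; i < n; i++) {
            long lo = min + count * i / n;       // inclusive lower bound
            long hi = min + count * (i + 1) / n; // exclusive upper bound
            clauses.add(col + " >= " + lo + " AND " + col + " < " + hi);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // min and max as a bounding query such as SELECT MIN(id), MAX(id) ... might return
        for (String c : splitClauses("id", 0, 9, 3)) {
            System.out.println(c);
        }
        // id >= 0 AND id < 3
        // id >= 3 AND id < 6
        // id >= 6 AND id < 10
    }
}
```

Half-open ranges keep the splits disjoint while together covering every value from min to max, so no row is read twice or skipped.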
protected String getBoundingValsQuery()
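getBoundingValsQuery supplies the query whose single result row bounds the split column. Below is a hypothetical sketch of how such a query might be assembled in table mode. It assumes that the minimum is returned in the first column and the maximum in the second, and that a bounding query registered with setBoundingQuery takes precedence over the generated one. The method name and parameters are illustrative, not Hadoop's.

```java
public class BoundingQuerySketch {
    // Hypothetical helper: returns a query whose single row holds
    // MIN(splitBy) in column 1 and MAX(splitBy) in column 2.
    static String boundingValsQuery(String userBoundingQuery, String tableName,
                                    String conditions, String splitBy) {
        // Assumption: a user-supplied bounding query wins over the generated one.
        if (userBoundingQuery != null) {
            return userBoundingQuery;
        }
        StringBuilder query = new StringBuilder();
        query.append("SELECT MIN(").append(splitBy).append("), MAX(").append(splitBy).append(")");
        query.append(" FROM ").append(tableName);
        // Restrict the bounds to the same rows the job will read.
        if (conditions != null && !conditions.isEmpty()) {
            query.append(" WHERE ( ").append(conditions).append(" )");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(boundingValsQuery(null, "employees", "active = 1", "id"));
        // prints: SELECT MIN(id), MAX(id) FROM employees WHERE ( active = 1 )
    }
}
```

A MIN/MAX query of this kind gives the splitter the value range it needs, which a row count alone would not.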
public static void setBoundingQuery(Configuration conf, String query)
protected RecordReader<LongWritable,T> createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) throws IOException
Overrides: createDBRecordReader in class DBInputFormat<T extends DBWritable>
Throws: IOException
public static void setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String splitBy, String... fieldNames)
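With this overload, rows are partitioned on value ranges of the splitBy column. The sketch below is a hypothetical approximation of the per-split SELECT in table mode, in which the optional conditions are ANDed with the range clause produced for the split. selectForSplit and the sample names are illustrative only; Hadoop's record reader builds its own query text, which may differ in detail (for example, in how the table is aliased).

```java
public class TableModeQuerySketch {
    // Hypothetical helper: builds one split's SELECT when the input was configured with
    // setInput(job, inputClass, tableName, conditions, splitBy, fieldNames...).
    // splitClause would come from the splitter, e.g. "( id >= 0 ) AND ( id < 3 )".
    static String selectForSplit(String tableName, String conditions,
                                 String splitClause, String... fieldNames) {
        StringBuilder query = new StringBuilder("SELECT ");
        query.append(String.join(", ", fieldNames));
        query.append(" FROM ").append(tableName).append(" WHERE ");
        // User conditions (if any) apply to every split.
        if (conditions != null && !conditions.isEmpty()) {
            query.append("( ").append(conditions).append(" ) AND ");
        }
        query.append(splitClause);
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(selectForSplit("employees", "active = 1",
                "( id >= 0 ) AND ( id < 3 )", "id", "name"));
        // prints: SELECT id, name FROM employees WHERE ( active = 1 ) AND ( id >= 0 ) AND ( id < 3 )
    }
}
```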
Copyright © 2017 Apache Software Foundation. All rights reserved.