DataDrivenDBInputFormat (Apache Hadoop Main 3.2.2 API)

java.lang.Object
- org.apache.hadoop.mapreduce.InputFormat<LongWritable,T>
- - org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T>
  - - org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat<T>

All Implemented Interfaces:

Configurable

Direct Known Subclasses:

OracleDataDrivenDBInputFormat
```
@InterfaceAudience.Public
 @InterfaceStability.Evolving
public class DataDrivenDBInputFormat<T extends DBWritable>
extends DBInputFormat<T>
implements Configurable
```
A InputFormat that reads input data from an SQL table. Operates like DBInputFormat, but instead of using LIMIT and OFFSET to demarcate splits, it tries to generate WHERE clauses which separate the data into roughly equivalent shards.

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`SUBSTITUTE_TOKEN` If users are providing their own query, the following string is expected to appear in the WHERE clause, which will be substituted with a pair of conditions on the input to allow input splits to parallelise the import.

Fields inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat
conditions, connection, dbConf, dbProductName, fieldNames, tableName

Constructor Summary

Constructors
Constructor and Description

DataDrivenDBInputFormat()

Constructors
Constructor and Description
`DataDrivenDBInputFormat()`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`protected RecordReader<LongWritable,T>`	`createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf)`
`protected String`	`getBoundingValsQuery()`
`List<InputSplit>`	`getSplits(JobContext job)` Logically split the set of input files for the job.
`protected DBSplitter`	`getSplitter(int sqlDataType)`
`static void`	`setBoundingQuery(Configuration conf, String query)` Set the user-defined bounding query to use with a user-defined query.
`static void`	`setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputBoundingQuery)` setInput() takes a custom query and a separate "bounding query" to use instead of the custom "count query" used by DBInputFormat.
`static void`	`setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String splitBy, String... fieldNames)` Note that the "orderBy" column is called the "splitBy" in this version.

Methods inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat
closeConnection, createConnection, createRecordReader, getConf, getConnection, getCountQuery, getDBConf, getDBProductName, setConf

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

- Field Detail
  - SUBSTITUTE_TOKEN
```
public static final String SUBSTITUTE_TOKEN
```
    If users are providing their own query, the following string is expected to appear in the WHERE clause, which will be substituted with a pair of conditions on the input to allow input splits to parallelise the import.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - DataDrivenDBInputFormat
```
public DataDrivenDBInputFormat()
```
- Method Detail
  - getSplitter
```
protected DBSplitter getSplitter(int sqlDataType)
```
    Returns:
    
    the DBSplitter implementation to use to divide the table/query into InputSplits.
  - getSplits
```
public List<InputSplit> getSplits(JobContext job)
                           throws IOException
```
    Logically split the set of input files for the job.
    Each InputSplit is then assigned to an individual Mapper for processing.
    
    Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For e.g. a split could be <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
    
    Overrides:
    
    getSplits in class DBInputFormat<T extends DBWritable>
    
    Parameters:
    
    job - job configuration.
    
    Returns:
    
    an array of InputSplits for the job.
    
    Throws:
    
    IOException
  - getBoundingValsQuery
```
protected String getBoundingValsQuery()
```
    Returns:
    
    a query which returns the minimum and maximum values for the order-by column. The min value should be in the first column, and the max value should be in the second column of the results.
  - setBoundingQuery
```
public static void setBoundingQuery(Configuration conf,
                                    String query)
```
    Set the user-defined bounding query to use with a user-defined query. This *must* include the substring "$CONDITIONS" (DataDrivenDBInputFormat.SUBSTITUTE_TOKEN) inside the WHERE clause, so that DataDrivenDBInputFormat knows where to insert split clauses. e.g., "SELECT foo FROM mytable WHERE $CONDITIONS" This will be expanded to something like: SELECT foo FROM mytable WHERE (id > 100) AND (id < 250) inside each split.
  - createDBRecordReader
```
protected RecordReader<LongWritable,T> createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split,
                                                            Configuration conf)
                                                     throws IOException
```
    Overrides:
    
    createDBRecordReader in class DBInputFormat<T extends DBWritable>
    
    Throws:
    
    IOException
  - setInput
```
public static void setInput(Job job,
                            Class<? extends DBWritable> inputClass,
                            String tableName,
                            String conditions,
                            String splitBy,
                            String... fieldNames)
```
    Note that the "orderBy" column is called the "splitBy" in this version. We reuse the same field, but it's not strictly ordering it -- just partitioning the results.
  - setInput
```
public static void setInput(Job job,
                            Class<? extends DBWritable> inputClass,
                            String inputQuery,
                            String inputBoundingQuery)
```
    setInput() takes a custom query and a separate "bounding query" to use instead of the custom "count query" used by DBInputFormat.

Class DataDrivenDBInputFormat<T extends DBWritable>

Field Summary

Fields inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat

Constructor Summary

Method Summary

Methods inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat

Methods inherited from class java.lang.Object

Methods inherited from interface org.apache.hadoop.conf.Configurable

Field Detail

SUBSTITUTE_TOKEN

Constructor Detail

DataDrivenDBInputFormat

Method Detail

getSplitter

getSplits

getBoundingValsQuery

setBoundingQuery

createDBRecordReader

setInput

setInput