java.lang.Object

org.apache.hadoop.mapreduce.InputFormat<LongWritable,T>

org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T>

All Implemented Interfaces: Configurable

Direct Known Subclasses: DataDrivenDBInputFormat, org.apache.hadoop.mapred.lib.db.DBInputFormat

@Public @Stable public class DBInputFormat<T extends DBWritable> extends InputFormat<LongWritable,T> implements Configurable

An InputFormat that reads input data from an SQL table.

DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and input class can be set using one of the two setInput methods.
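Every value class passed to setInput must implement DBWritable, whose two methods are write(PreparedStatement) and readFields(ResultSet). The sketch below shows the usual shape of such a record. So that it compiles with only the JDK, it declares a local stand-in interface, DBWritableLike, assumed to mirror those two methods; in a real job the class would implement org.apache.hadoop.mapreduce.lib.db.DBWritable instead. EmployeeRecord and its columns are hypothetical.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.lib.db.DBWritable,
// assumed to declare the same two methods.
interface DBWritableLike {
    void write(PreparedStatement statement) throws SQLException;

    void readFields(ResultSet resultSet) throws SQLException;
}

// Hypothetical record for a table with columns (id BIGINT, name VARCHAR).
class EmployeeRecord implements DBWritableLike {
    long id;
    String name;

    @Override
    public void readFields(ResultSet resultSet) throws SQLException {
        // Read columns by position, in the order they appear in the SELECT list.
        id = resultSet.getLong(1);
        name = resultSet.getString(2);
    }

    @Override
    public void write(PreparedStatement statement) throws SQLException {
        // Bind fields to the statement's parameters in the same column order.
        statement.setLong(1, id);
        statement.setString(2, name);
    }
}
```

Reading by column position keeps the record independent of column labels, but it means the field order given to setInput (or the SELECT list of inputQuery) must match the order used in readFields.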

Nested Class Summary

Nested Classes
- static class org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit: An InputSplit that spans a set of rows.
- static class org.apache.hadoop.mapreduce.lib.db.DBInputFormat.NullDBWritable: A class that does nothing, implementing DBWritable.

Field Summary

Fields
- protected String conditions
- protected Connection connection
- protected DBConfiguration dbConf
- protected String dbProductName
- protected String[] fieldNames
- protected String tableName

Constructor Summary

Constructors
- DBInputFormat()

Method Summary
- protected void closeConnection()
- Connection createConnection()
- protected RecordReader<LongWritable,T> createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf)
- RecordReader<LongWritable,T> createRecordReader(InputSplit split, TaskAttemptContext context): Create a record reader for a given split.
- Configuration getConf(): Return the configuration used by this object.
- Connection getConnection()
- protected String getCountQuery(): Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.
- DBConfiguration getDBConf()
- String getDBProductName()
- List<InputSplit> getSplits(JobContext job): Logically split the set of input files for the job.
- void setConf(Configuration conf): Set the configuration to be used by this object.
- static void setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery): Initializes the map-part of the job with the appropriate input settings.
- static void setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames): Initializes the map-part of the job with the appropriate input settings.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- dbProductName
  
  protected String dbProductName
- conditions
  
  protected String conditions
- connection
  
  protected Connection connection
- tableName
  
  protected String tableName
- fieldNames
  
  protected String[] fieldNames
- dbConf
  
  protected DBConfiguration dbConf

Constructor Details
- DBInputFormat
  
  public DBInputFormat()

Method Details
- setConf
  
  public void setConf(Configuration conf)
  
  Set the configuration to be used by this object.
  
  Specified by:
  
  setConf in interface Configurable
  
  Parameters:
  
  conf - configuration to be used
- getConf
  
  public Configuration getConf()
  
  Description copied from interface: Configurable
  
  Return the configuration used by this object.
  
  Specified by:
  
  getConf in interface Configurable
  
  Returns:
  
  Configuration
- getDBConf
  
  public DBConfiguration getDBConf()
- getConnection
  
  public Connection getConnection()
- createConnection
  
  public Connection createConnection()
- getDBProductName
  
  public String getDBProductName()
- createDBRecordReader
  
  protected RecordReader<LongWritable,T> createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) throws IOException
  
  Throws:
  
  IOException
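  As we understand the Hadoop implementation, createDBRecordReader picks a vendor-specific reader from dbProductName (the upper-cased JDBC database product name captured in setConf): names starting with ORACLE get OracleDBRecordReader, names starting with MYSQL get MySQLDBRecordReader, and everything else gets the generic DBRecordReader. The sketch below models only that dispatch with JDK types, returning the chosen reader's simple class name; RecordReaderChoice is hypothetical, and the exact prefixes are an assumption to verify against your Hadoop version.
  
  ```java
  import java.util.Locale;
  
  final class RecordReaderChoice {
      private RecordReaderChoice() {}
  
      // Models (as an assumption) how a database product name maps to a reader class.
      static String readerFor(String databaseProductName) {
          // Normalize case first, since JDBC drivers report names such as "MySQL" or "Oracle".
          String product = databaseProductName.toUpperCase(Locale.ROOT);
          if (product.startsWith("ORACLE")) {
              return "OracleDBRecordReader";
          }
          if (product.startsWith("MYSQL")) {
              return "MySQLDBRecordReader";
          }
          // Any other database falls back to the generic reader.
          return "DBRecordReader";
      }
  }
  ```
  
  Because the method is protected, a subclass can override it to plug in a reader for a database that needs different paging syntax.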
- createRecordReader
  
  public RecordReader<LongWritable,T> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException
  
  Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
  
  Specified by:
  
  createRecordReader in class InputFormat<LongWritable,T extends DBWritable>
  
  Parameters:
  
  split - the split to be read
  
  context - the information about the task
  
  Returns:
  
  a new record reader
  
  Throws:
  
  IOException
  
  InterruptedException
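  The class description says each key is the record number. A reading consistent with our understanding of the generic reader is that the key is the row's absolute position: the split's start offset plus the reader's position within the split. The sketch models that with JDK types only; KeyNumbering and numberRows are hypothetical names, and a plain Long stands in for LongWritable.
  
  ```java
  import java.util.AbstractMap;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  
  final class KeyNumbering {
      private KeyNumbering() {}
  
      // Pairs each row of a split with key = splitStart + position within the split,
      // so keys stay unique across splits that cover disjoint row ranges.
      static <V> List<Map.Entry<Long, V>> numberRows(long splitStart, List<V> rows) {
          List<Map.Entry<Long, V>> numbered = new ArrayList<>(rows.size());
          long pos = 0;
          for (V row : rows) {
              numbered.add(new AbstractMap.SimpleImmutableEntry<Long, V>(splitStart + pos, row));
              pos++;
          }
          return numbered;
      }
  }
  ```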
- getSplits
  
  public List<InputSplit> getSplits(JobContext job) throws IOException
  
  Logically split the set of input files for the job.
  Each InputSplit is then assigned to an individual Mapper for processing.
  
  Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
  
  Specified by:
  
  getSplits in class InputFormat<LongWritable,T extends DBWritable>
  
  Parameters:
  
  job - job configuration.
  
  Returns:
  
  a list of InputSplits for the job.
  
  Throws:
  
  IOException
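  For DBInputFormat, what gets split is a range of table rows rather than files. Our understanding of the default behaviour is that getSplits runs the count query, divides the row count evenly across the number of map tasks configured for the job (assumed to be mapreduce.job.maps, default 1), and lets the last split absorb any remainder. The sketch reproduces only that arithmetic; RowRangeSplits and Range are hypothetical stand-ins for the split list and DBInputSplit, using half-open [start, end) ranges.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;
  
  final class RowRangeSplits {
      private RowRangeSplits() {}
  
      // Half-open row range [start, end); hypothetical stand-in for DBInputSplit.
      static final class Range {
          final long start;
          final long end;
  
          Range(long start, long end) {
              this.start = start;
              this.end = end;
          }
  
          long length() {
              return end - start;
          }
      }
  
      // Divides rowCount rows into numMaps contiguous ranges of equal size,
      // with the last range also taking the remainder.
      static List<Range> split(long rowCount, int numMaps) {
          if (numMaps < 1) {
              throw new IllegalArgumentException("numMaps must be at least 1");
          }
          long chunkSize = rowCount / numMaps;
          List<Range> splits = new ArrayList<>(numMaps);
          for (int i = 0; i < numMaps; i++) {
              long start = i * chunkSize;
              long end = (i + 1 == numMaps) ? rowCount : start + chunkSize;
              splits.add(new Range(start, end));
          }
          return splits;
      }
  }
  ```
  
  With 10 rows and 3 map tasks, chunkSize is 3, giving [0, 3), [3, 6) and [6, 10). A consequence of this design is that the last split can hold up to numMaps - 1 more rows than the others.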
- getCountQuery
  
  protected String getCountQuery()
  
  Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.
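  When setInput was given an explicit inputCountQuery, that query is presumably what getCountQuery returns. Otherwise, for the table-based overload, our understanding is that the default builds SELECT COUNT(*) over tableName and appends conditions as a WHERE clause when present. The sketch mirrors that assumed behaviour as a static helper so it can be checked without Hadoop; CountQuery.defaultCountQuery is a hypothetical name.
  
  ```java
  final class CountQuery {
      private CountQuery() {}
  
      // Assumed default: prefer an explicit count query, else count rows of the table,
      // filtered by the same conditions the input query uses.
      static String defaultCountQuery(String inputCountQuery, String tableName, String conditions) {
          if (inputCountQuery != null) {
              return inputCountQuery;
          }
          StringBuilder query = new StringBuilder("SELECT COUNT(*) FROM ").append(tableName);
          if (conditions != null && !conditions.isEmpty()) {
              query.append(" WHERE ").append(conditions);
          }
          return query.toString();
      }
  }
  ```
  
  Whatever a subclass returns here should count exactly the rows the input query will read, since the split boundaries are derived from this number.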
- setInput
  
  public static void setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)
  
  Initializes the map-part of the job with the appropriate input settings.
  
  Parameters:
  
  job - The map-reduce job
  
  inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
  
  tableName - The table to read data from
  
  conditions - The condition with which to select data, e.g. '(updated > 20070101 AND length > 0)'
  
  orderBy - the fieldNames in the orderBy clause.
  
  fieldNames - The field names in the table
  
  See Also:
  
  setInput(Job, Class, String, String)
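  This overload never takes a full SQL string; the record reader assembles one from tableName, conditions, orderBy and fieldNames. The sketch shows roughly how those pieces combine. It is an approximation: TableSelect.buildSelect is a hypothetical helper, and the real reader may emit additional text (such as a table alias) and appends per-split paging clauses.
  
  ```java
  final class TableSelect {
      private TableSelect() {}
  
      // Builds "SELECT f1, f2 FROM t [WHERE (cond)] [ORDER BY o]" from setInput-style arguments.
      static String buildSelect(String tableName, String conditions, String orderBy, String... fieldNames) {
          if (fieldNames.length == 0) {
              throw new IllegalArgumentException("at least one field name is required");
          }
          StringBuilder query = new StringBuilder("SELECT ");
          query.append(String.join(", ", fieldNames));
          query.append(" FROM ").append(tableName);
          if (conditions != null && !conditions.isEmpty()) {
              // Parenthesize so that OR inside the conditions cannot bind to later clauses.
              query.append(" WHERE (").append(conditions).append(")");
          }
          if (orderBy != null && !orderBy.isEmpty()) {
              query.append(" ORDER BY ").append(orderBy);
          }
          return query.toString();
      }
  }
  ```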
- setInput
  
  public static void setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)
  
  Initializes the map-part of the job with the appropriate input settings.
  
  Parameters:
  
  job - The map-reduce job
  
  inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
  
  inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
  
  inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
  
  See Also:
  
  setInput(Job, Class, String, String, String, String...)
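  With this overload, each map task presumably runs the same inputQuery restricted to its own row range. Assuming the generic reader pages with LIMIT and OFFSET (vendor-specific readers may use other syntax), each task's query would look like the output of the hypothetical helper below. This is why the example inputQuery ends with ORDER BY f1: without a stable ordering, separate tasks are not guaranteed to see disjoint rows. For the same reason, inputCountQuery should count exactly the rows inputQuery returns.
  
  ```java
  final class PagedQuery {
      private PagedQuery() {}
  
      // Restricts a prebuilt query to the half-open row range [start, start + length).
      static String forSplit(String inputQuery, long start, long length) {
          if (start < 0 || length < 0) {
              throw new IllegalArgumentException("start and length must be non-negative");
          }
          return inputQuery + " LIMIT " + length + " OFFSET " + start;
      }
  }
  ```
  
  For example, a split covering rows 3 through 5 (start 3, length 3) would run "SELECT f1, f2, f3 FROM Mytable ORDER BY f1 LIMIT 3 OFFSET 3".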
- closeConnection
  
  protected void closeConnection()
