org.apache.hadoop.mapreduce.lib.db
Class DBInputFormat<T extends DBWritable>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<LongWritable,T>
      extended by org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T>
All Implemented Interfaces:
Configurable
Direct Known Subclasses:
DataDrivenDBInputFormat

@InterfaceAudience.Public
@InterfaceStability.Stable
public class DBInputFormat<T extends DBWritable>
extends InputFormat<LongWritable,T>
implements Configurable

An InputFormat that reads input data from an SQL table.

DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and the input class can be set using one of the two setInput methods.


Nested Class Summary
static class DBInputFormat.DBInputSplit
          An InputSplit that spans a set of rows
static class DBInputFormat.NullDBWritable
          A class that does nothing, implementing DBWritable
 
Constructor Summary
DBInputFormat()
           
 
Method Summary
protected  void closeConnection()
           
protected  RecordReader<LongWritable,T> createDBRecordReader(DBInputFormat.DBInputSplit split, Configuration conf)
           
 RecordReader<LongWritable,T> createRecordReader(InputSplit split, TaskAttemptContext context)
          Create a record reader for a given split.
 Configuration getConf()
          Return the configuration used by this object.
 Connection getConnection()
           
protected  String getCountQuery()
          Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.
 DBConfiguration getDBConf()
           
 String getDBProductName()
           
 List<InputSplit> getSplits(JobContext job)
          Logically split the set of input files for the job.
 void setConf(Configuration conf)
          Set the configuration to be used by this object.
static void setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)
          Initializes the map-part of the job with the appropriate input settings.
static void setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)
          Initializes the map-part of the job with the appropriate input settings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DBInputFormat

public DBInputFormat()
Method Detail

setConf

public void setConf(Configuration conf)
Set the configuration to be used by this object.

Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Description copied from interface: Configurable
Return the configuration used by this object.

Specified by:
getConf in interface Configurable

getDBConf

public DBConfiguration getDBConf()

getConnection

public Connection getConnection()

getDBProductName

public String getDBProductName()

createDBRecordReader

protected RecordReader<LongWritable,T> createDBRecordReader(DBInputFormat.DBInputSplit split,
                                                            Configuration conf)
                                                                        throws IOException
Throws:
IOException

createRecordReader

public RecordReader<LongWritable,T> createRecordReader(InputSplit split,
                                                       TaskAttemptContext context)
                                                                   throws IOException,
                                                                          InterruptedException
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.

Specified by:
createRecordReader in class InputFormat<LongWritable,T extends DBWritable>
Parameters:
split - the split to be read
context - the information about the task
Returns:
a new record reader
Throws:
IOException
InterruptedException

getSplits

public List<InputSplit> getSplits(JobContext job)
                           throws IOException
Logically split the set of input files for the job.

Each InputSplit is then assigned to an individual Mapper for processing.

Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.

Specified by:
getSplits in class InputFormat<LongWritable,T extends DBWritable>
Parameters:
job - job configuration.
Returns:
a list of InputSplits for the job.
Throws:
IOException
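
For a database input, a logical split can be pictured as a contiguous range of rows rather than a region of a file. The following is a minimal sketch of that idea, assuming the total row count is already known and that the last range absorbs any remainder. The class RowRangeSplitSketch and its members are hypothetical names for illustration and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the Hadoop implementation of getSplits.
class RowRangeSplitSketch {

    /** A contiguous block of rows covering [start, start + length). */
    static final class RowRange {
        final long start;
        final long length;

        RowRange(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Divides rowCount rows into exactly `chunks` contiguous ranges.
     * Assumption of this sketch: every range gets rowCount / chunks rows
     * and the last range also takes the remainder. Ranges may be empty
     * when rowCount is smaller than chunks.
     */
    static List<RowRange> split(long rowCount, int chunks) {
        if (rowCount < 0 || chunks < 1) {
            throw new IllegalArgumentException("rowCount must be >= 0 and chunks >= 1");
        }
        List<RowRange> ranges = new ArrayList<>();
        long chunkSize = rowCount / chunks;
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            // The final range runs to the end of the table.
            long end = (i == chunks - 1) ? rowCount : start + chunkSize;
            ranges.add(new RowRange(start, end - start));
        }
        return ranges;
    }
}
```

Under these assumptions, splitting 10 rows into 3 ranges gives row ranges starting at 0, 3 and 6 with lengths 3, 3 and 4.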

getCountQuery

protected String getCountQuery()
Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.
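
As an illustration of what such a count query might look like, the sketch below prefers an explicitly supplied count query and otherwise derives one from a table name and an optional condition. This is an assumption about a reasonable default, not a statement of what the library generates; CountQuerySketch and its parameters are hypothetical names.

```java
// Illustrative sketch only; the names below are not part of the Hadoop API.
class CountQuerySketch {

    /**
     * Returns explicitCountQuery when one is given; otherwise builds
     * "SELECT COUNT(*) FROM <table>" with an optional WHERE clause.
     */
    static String countQuery(String explicitCountQuery, String tableName, String conditions) {
        if (explicitCountQuery != null) {
            return explicitCountQuery;
        }
        StringBuilder query = new StringBuilder("SELECT COUNT(*) FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            query.append(" WHERE ").append(conditions);
        }
        return query.toString();
    }
}
```

A subclass overriding getCountQuery() could return a string of this shape, for example to count only the rows that a custom condition would select.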


setInput

public static void setInput(Job job,
                            Class<? extends DBWritable> inputClass,
                            String tableName,
                            String conditions,
                            String orderBy,
                            String... fieldNames)
Initializes the map-part of the job with the appropriate input settings.

Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
tableName - The table to read data from
conditions - The condition with which to select data, e.g. '(updated > 20070101 AND length > 0)'
orderBy - The field names to use in the ORDER BY clause.
fieldNames - The field names in the table
See Also:
setInput(Job, Class, String, String)
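
To show how the tableName, conditions, orderBy and fieldNames parameters relate to one another, the sketch below assembles them into a single SELECT statement. It is a hedged illustration of the general shape of such a query, not the query text the library produces; SelectQuerySketch is a hypothetical name.

```java
// Illustrative sketch only; not the query builder used by Hadoop.
class SelectQuerySketch {

    /**
     * Combines the table-based setInput parameters into one SELECT statement.
     * conditions and orderBy are optional and are skipped when null or empty.
     */
    static String build(String tableName, String conditions, String orderBy, String... fieldNames) {
        if (fieldNames.length == 0) {
            throw new IllegalArgumentException("at least one field name is required");
        }
        StringBuilder query = new StringBuilder("SELECT ");
        query.append(String.join(", ", fieldNames));
        query.append(" FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            query.append(" WHERE (").append(conditions).append(")");
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            query.append(" ORDER BY ").append(orderBy);
        }
        return query.toString();
    }
}
```

For example, build("Mytable", "length > 0", "f1", "f1", "f2") yields "SELECT f1, f2 FROM Mytable WHERE (length > 0) ORDER BY f1".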

setInput

public static void setInput(Job job,
                            Class<? extends DBWritable> inputClass,
                            String inputQuery,
                            String inputCountQuery)
Initializes the map-part of the job with the appropriate input settings.

Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
See Also:
setInput(Job, Class, String, String, String, String...)
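
The ORDER BY in the example inputQuery matters when each task reads only a window of the result: without a stable order, windows may overlap or miss rows. The sketch below shows one plausible way to restrict a query to a row window, assuming a database that accepts the LIMIT ... OFFSET syntax (not all products do). PagedQuerySketch is a hypothetical name and the sketch does not claim to reproduce the library's behaviour.

```java
// Illustrative sketch only; assumes a database that supports LIMIT ... OFFSET.
class PagedQuerySketch {

    /**
     * Restricts inputQuery to `length` rows beginning at zero-based row `start`.
     * The query should carry an ORDER BY clause so that windows are stable.
     */
    static String page(String inputQuery, long start, long length) {
        if (start < 0 || length < 0) {
            throw new IllegalArgumentException("start and length must be non-negative");
        }
        return inputQuery + " LIMIT " + length + " OFFSET " + start;
    }
}
```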

closeConnection

protected void closeConnection()


Copyright © 2009 The Apache Software Foundation