org.apache.hadoop.mapreduce.lib.db
Class DBInputFormat<T>

java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<LongWritable,T>
    org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T>

@InterfaceAudience.Public
@InterfaceStability.Stable
public class DBInputFormat<T extends DBWritable>
extends InputFormat<LongWritable,T>
implements Configurable
An InputFormat that reads input data from an SQL table.
DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and input class can be specified using one of the two setInput methods.
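A minimal usage sketch (not part of this Javadoc): a value class implements DBWritable so the framework can populate it from a JDBC ResultSet, and the connection is described through DBConfiguration before one of the setInput variants (detailed below) is called. The EmployeeRecord class, table and column names, driver class, and connection URL are illustrative assumptions, not part of this API.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class DBInputExample {

  // Hypothetical value class for a table employees(id INT, name VARCHAR).
  public static class EmployeeRecord implements DBWritable {
    long id;
    String name;

    @Override
    public void readFields(ResultSet rs) throws SQLException {
      // Column order matches the fieldNames/query given to setInput().
      id = rs.getLong(1);
      name = rs.getString(2);
    }

    @Override
    public void write(PreparedStatement ps) throws SQLException {
      ps.setLong(1, id);
      ps.setString(2, name);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Register the JDBC driver, connection URL, and credentials
    // (all placeholder values) in the job configuration.
    DBConfiguration.configureDB(conf,
        "com.mysql.jdbc.Driver",
        "jdbc:mysql://localhost/mydb",
        "user", "password");

    Job job = Job.getInstance(conf, "db-input-example");
    job.setInputFormatClass(DBInputFormat.class);
    // ... call one of the DBInputFormat.setInput variants here
    // (see Method Detail below), then set the mapper, output
    // format, and output key/value classes as in any MapReduce job.
  }
}
```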
Field Summary

| Modifier and Type | Field |
|---|---|
| protected String | conditions |
| protected Connection | connection |
| protected DBConfiguration | dbConf |
| protected String | dbProductName |
| protected String[] | fieldNames |
| protected String | tableName |
Constructor Summary

| Constructor |
|---|
| DBInputFormat() |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| protected void | closeConnection() | |
| protected RecordReader<LongWritable,T> | createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) | |
| RecordReader<LongWritable,T> | createRecordReader(InputSplit split, TaskAttemptContext context) | Create a record reader for a given split. |
| Configuration | getConf() | Return the configuration used by this object. |
| Connection | getConnection() | |
| protected String | getCountQuery() | Returns the query for getting the total number of rows; subclasses can override this for custom behaviour. |
| DBConfiguration | getDBConf() | |
| String | getDBProductName() | |
| List<InputSplit> | getSplits(JobContext job) | Logically split the set of input files for the job. |
| void | setConf(Configuration conf) | Set the configuration to be used by this object. |
| static void | setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery) | Initializes the map-part of the job with the appropriate input settings. |
| static void | setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames) | Initializes the map-part of the job with the appropriate input settings. |
Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail

protected String dbProductName

protected String conditions

protected Connection connection

protected String tableName

protected String[] fieldNames

protected DBConfiguration dbConf
Constructor Detail

public DBInputFormat()
Method Detail

public void setConf(Configuration conf)

Set the configuration to be used by this object.

Specified by:
    setConf in interface Configurable

public Configuration getConf()

Return the configuration used by this object.

Specified by:
    getConf in interface Configurable
public DBConfiguration getDBConf()
public Connection getConnection()
public String getDBProductName()
protected RecordReader<LongWritable,T> createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split,
                                                            Configuration conf)
                                                     throws IOException

Throws:
    IOException
public RecordReader<LongWritable,T> createRecordReader(InputSplit split,
                                                       TaskAttemptContext context)
                                                throws IOException,
                                                       InterruptedException

Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.

Specified by:
    createRecordReader in class InputFormat<LongWritable,T extends DBWritable>
Parameters:
    split - the split to be read
    context - the information about the task
Throws:
    IOException
    InterruptedException
public List<InputSplit> getSplits(JobContext job)
                           throws IOException

Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.

Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.

Specified by:
    getSplits in class InputFormat<LongWritable,T extends DBWritable>
Parameters:
    job - job configuration.
Returns:
    InputSplits for the job.
Throws:
    IOException
protected String getCountQuery()

Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.
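For example, a subclass might replace the default exact count with something cheaper on very large tables. The class name, table name, and approximate-count query below are assumptions for illustration, not part of this API.

```java
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Hypothetical subclass: an exact COUNT can be slow on very large tables,
// so return MySQL's approximate row count from information_schema instead.
public class ApproxCountDBInputFormat<T extends DBWritable>
    extends DBInputFormat<T> {

  @Override
  protected String getCountQuery() {
    // Assumption: an approximate count is good enough for computing splits.
    return "SELECT TABLE_ROWS FROM information_schema.TABLES "
         + "WHERE TABLE_NAME = 'employees'";
  }
}
```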
public static void setInput(Job job,
                            Class<? extends DBWritable> inputClass,
                            String tableName,
                            String conditions,
                            String orderBy,
                            String... fieldNames)

Initializes the map-part of the job with the appropriate input settings.

Parameters:
    job - The map-reduce job
    inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
    tableName - The table to read data from
    conditions - The condition which to select data with, eg. '(updated > 20070101 AND length > 0)'
    orderBy - the fieldNames in the orderBy clause.
    fieldNames - The field names in the table
See Also:
    setInput(Job, Class, String, String)
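Continuing the hypothetical EmployeeRecord job from the class description above, a table-based call might look like the following (table, column, and condition values are placeholders):

```java
// Reads roughly: SELECT id, name FROM employees
//                WHERE (id > 0) ORDER BY id
DBInputFormat.setInput(job, EmployeeRecord.class,
    "employees",   // tableName
    "(id > 0)",    // conditions for the WHERE clause
    "id",          // orderBy column
    "id", "name"); // fieldNames to select
```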
public static void setInput(Job job,
                            Class<? extends DBWritable> inputClass,
                            String inputQuery,
                            String inputCountQuery)

Initializes the map-part of the job with the appropriate input settings.

Parameters:
    job - The map-reduce job
    inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
    inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
    inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
See Also:
    setInput(Job, Class, String, String, String, String...)
protected void closeConnection()