org.apache.hadoop.mapred.lib.db.DBInputFormat<T>

All Implemented Interfaces: Configurable, InputFormat<LongWritable,T>, JobConfigurable

@Public @Stable public class DBInputFormat<T extends DBWritable> extends org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T> implements InputFormat<LongWritable,T>, JobConfigurable

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

protected static class

org.apache.hadoop.mapred.lib.db.DBInputFormat.DBInputSplit

An InputSplit that spans a set of rows.

protected class

org.apache.hadoop.mapred.lib.db.DBInputFormat.DBRecordReader

A RecordReader that reads records from a SQL table.

static class

org.apache.hadoop.mapred.lib.db.DBInputFormat.NullDBWritable

A class that does nothing, implementing DBWritable.
Field Summary

Fields inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat
conditions, connection, dbConf, dbProductName, fieldNames, tableName
Constructor Summary

Constructors

Constructor

Description

DBInputFormat()
Method Summary

Modifier and Type

Method

Description

void

configure(JobConf job)

Initializes a new instance from a JobConf.

RecordReader<LongWritable,T>

getRecordReader(InputSplit split, JobConf job, Reporter reporter)

Get the RecordReader for the given InputSplit.

InputSplit[]

getSplits(JobConf job, int chunks)

Logically split the set of input files for the job.

static void

setInput(JobConf job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)

Initializes the map-part of the job with the appropriate input settings.

static void

setInput(JobConf job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)

Initializes the map-part of the job with the appropriate input settings.

Methods inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat
closeConnection, createConnection, createDBRecordReader, createRecordReader, getConf, getConnection, getCountQuery, getDBConf, getDBProductName, getSplits, setConf, setInput, setInput

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- DBInputFormat
  
  public DBInputFormat()
Method Details
- configure
  
  public void configure(JobConf job)
  
  Initializes a new instance from a JobConf.
  
  Specified by:
  
  configure in interface JobConfigurable
  
  Parameters:
  
  job - the configuration
- getRecordReader
  
  public RecordReader<LongWritable,T> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
  
  Get the RecordReader for the given InputSplit.
  It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.
  
  Specified by:
  
  getRecordReader in interface InputFormat<LongWritable,T extends DBWritable>
  
  Parameters:
  
  split - the InputSplit
  
  job - the job that this split belongs to
  
  Returns:
  
  a RecordReader
  
  Throws:
  
  IOException
- getSplits
  
  public InputSplit[] getSplits(JobConf job, int chunks) throws IOException
  
  Logically split the set of input files for the job.
  Each InputSplit is then assigned to an individual Mapper for processing.
  
  Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple.
  
  Specified by:
  
  getSplits in interface InputFormat<LongWritable,T extends DBWritable>
  
  Parameters:
  
  job - job configuration.
  
  chunks - the desired number of splits, a hint.
  
  Returns:
  
  an array of InputSplits for the job.
  
  Throws:
  
  IOException
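
To make the role of the chunks hint concrete, here is a minimal sketch of one way a row count could be divided into logical row ranges. It is self-contained and does not call Hadoop; the class RowSplitSketch and the method splitRows are hypothetical names, and the even-division rule (with the last range absorbing the remainder) is an assumption for illustration rather than a statement of what getSplits does internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Hadoop API. */
class RowSplitSketch {

    /**
     * Divides rowCount rows into `chunks` half-open ranges [start, end).
     * Assumption: every range has rowCount / chunks rows, and the last
     * range also takes whatever remainder is left over.
     */
    static List<long[]> splitRows(long rowCount, int chunks) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        List<long[]> ranges = new ArrayList<>();
        long chunkSize = rowCount / chunks;
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            // The final range runs to the end so no rows are dropped.
            long end = (i + 1 == chunks) ? rowCount : start + chunkSize;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10 rows in 3 chunks: prints "0 .. 3", "3 .. 6", "6 .. 10".
        for (long[] r : splitRows(10, 3)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Nothing is copied or moved when ranges like these are computed, which is the sense in which the split is logical rather than physical.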
- setInput
  
  public static void setInput(JobConf job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)
  
  Initializes the map-part of the job with the appropriate input settings.
  Parameters:
  
  job - The job
  
  inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
  
  tableName - The table to read data from
  
  conditions - The condition with which to select data, e.g. '(updated > 20070101 AND length > 0)'
  
  orderBy - the field names to use in the ORDER BY clause.
  
  fieldNames - The field names in the table
  
  See Also:
  
  setInput(JobConf, Class, String, String)
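
As a rough illustration of how the tableName, conditions, orderBy, and fieldNames parameters relate to one another, the sketch below assembles them into a single SELECT statement. It is self-contained and does not use Hadoop; SelectQuerySketch and buildSelect are hypothetical names, and the exact query text, including the fallback to `*` when no field names are given, is an assumption rather than the library's documented behavior.

```java
/** Hypothetical helper for illustration only; not part of the Hadoop API. */
class SelectQuerySketch {

    /**
     * Combines the table-style setInput arguments into one SELECT string.
     * Assumptions: empty or null conditions/orderBy are simply omitted,
     * and an empty fieldNames list selects all columns.
     */
    static String buildSelect(String tableName, String conditions,
                              String orderBy, String... fieldNames) {
        StringBuilder query = new StringBuilder("SELECT ");
        query.append(fieldNames.length == 0 ? "*" : String.join(", ", fieldNames));
        query.append(" FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            query.append(" WHERE ").append(conditions);
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            query.append(" ORDER BY ").append(orderBy);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildSelect(
            "Mytable", "(updated > 20070101 AND length > 0)", "f1",
            "f1", "f2", "f3"));
    }
}
```

Read this way, the six-argument overload covers the common single-table case, while the four-argument overload lets the caller supply the query text directly.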
- setInput
  
  public static void setInput(JobConf job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)
  
  Initializes the map-part of the job with the appropriate input settings.
  Parameters:
  
  job - The job
  
  inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
  
  inputQuery - the input query to select fields. Example : "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
  
  inputCountQuery - the input query that returns the number of records in the table. Example : "SELECT COUNT(f1) FROM Mytable"
  
  See Also:
  
  setInput(JobConf, Class, String, String, String, String...)
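
The ORDER BY in the example inputQuery matters once the result set is divided among mappers: a stable ordering is what lets each reader fetch a distinct, non-overlapping slice of rows. The sketch below shows one plausible way to restrict a query to a slice by appending LIMIT and OFFSET. It is self-contained and does not use Hadoop; SplitQuerySketch and forSplit are hypothetical names, and both the LIMIT/OFFSET syntax (which not every database accepts) and the idea that a reader pages this way are assumptions for illustration.

```java
/** Hypothetical helper for illustration only; not part of the Hadoop API. */
class SplitQuerySketch {

    /**
     * Restricts inputQuery to the rows in the half-open range [start, end).
     * Assumption: the target database accepts "LIMIT n OFFSET m".
     */
    static String forSplit(String inputQuery, long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid row range");
        }
        long length = end - start;
        return inputQuery + " LIMIT " + length + " OFFSET " + start;
    }

    public static void main(String[] args) {
        String inputQuery = "SELECT f1, f2, f3 FROM Mytable ORDER BY f1";
        // Rows 6 through 9 of the ordered result.
        System.out.println(forSplit(inputQuery, 6, 10));
    }
}
```

The inputCountQuery supplies the total row count from which ranges like [6, 10) would be derived.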
