Package org.apache.hadoop.mapred.lib.db
Class DBInputFormat<T extends DBWritable>
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<LongWritable,T>
org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T>
org.apache.hadoop.mapred.lib.db.DBInputFormat<T>
- All Implemented Interfaces:
Configurable, InputFormat<LongWritable,T>, JobConfigurable
@Public
@Stable
public class DBInputFormat<T extends DBWritable>
extends org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T>
implements InputFormat<LongWritable,T>, JobConfigurable
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | org.apache.hadoop.mapred.lib.db.DBInputFormat.DBInputSplit | An InputSplit that spans a set of rows.
protected class | org.apache.hadoop.mapred.lib.db.DBInputFormat.DBRecordReader | A RecordReader that reads records from a SQL table.
static class | org.apache.hadoop.mapred.lib.db.DBInputFormat.NullDBWritable | A class that does nothing, implementing DBWritable.
-
Field Summary
Fields inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat
conditions, connection, dbConf, dbProductName, fieldNames, tableName
-
Constructor Summary
Constructors
DBInputFormat()
-
Method Summary
Modifier and Type | Method | Description
void | configure(JobConf job) | Initializes a new instance from a JobConf.
RecordReader<LongWritable,T> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) | Get the RecordReader for the given InputSplit.
InputSplit[] | getSplits(JobConf job, int chunks) | Logically split the set of input files for the job.
static void | setInput(JobConf job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery) | Initializes the map-part of the job with the appropriate input settings.
static void | setInput(JobConf job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames) | Initializes the map-part of the job with the appropriate input settings.
Methods inherited from class org.apache.hadoop.mapreduce.lib.db.DBInputFormat
closeConnection, createConnection, createDBRecordReader, createRecordReader, getConf, getConnection, getCountQuery, getDBConf, getDBProductName, getSplits, setConf, setInput, setInput
-
Constructor Details
-
DBInputFormat
public DBInputFormat()
-
-
Method Details
-
configure
public void configure(JobConf job)
Initializes a new instance from a JobConf.
- Specified by:
configure in interface JobConfigurable
- Parameters:
job - the configuration
-
getRecordReader
public RecordReader<LongWritable,T> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.
- Specified by:
getRecordReader in interface InputFormat<LongWritable,T extends DBWritable>
- Parameters:
split - the InputSplit
job - the job that this split belongs to
- Returns:
a RecordReader
- Throws:
IOException
-
getSplits
public InputSplit[] getSplits(JobConf job, int chunks) throws IOException
Logically split the set of input files for the job.
Each InputSplit is then assigned to an individual Mapper for processing.
Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple.
- Specified by:
getSplits in interface InputFormat<LongWritable,T extends DBWritable>
- Parameters:
job - job configuration.
chunks - the desired number of splits, a hint.
- Returns:
an array of InputSplits for the job.
- Throws:
IOException
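The inherited description above speaks of input files, while the nested DBInputSplit is described as spanning a set of rows. The following sketch illustrates one way a row count could be divided into the requested number of logical splits. It is an illustration under stated assumptions, not the Hadoop implementation: the class RowSplitSketch and the method rowRanges are hypothetical names, and the policy of giving the remainder rows to the last split is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: divides rowCount rows into `chunks` logical row ranges.
// This is NOT Hadoop code; names and the remainder policy are assumptions.
class RowSplitSketch {

    /** Returns {start, end} pairs (end exclusive) that together cover all rows. */
    static List<long[]> rowRanges(long rowCount, int chunks) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        long chunkSize = rowCount / chunks;
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            // Assumption: the last range absorbs any remainder rows.
            long end = (i == chunks - 1) ? rowCount : start + chunkSize;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : rowRanges(10, 3)) {
            System.out.println("rows [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

With 10 rows and 3 chunks, the sketch yields the ranges [0, 3), [3, 6) and [6, 10). No data is moved; each range only describes which rows a task would read.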
-
setInput
public static void setInput(JobConf job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)
Initializes the map-part of the job with the appropriate input settings.
- Parameters:
job - The job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
tableName - The table to read data from
conditions - The condition with which to select data, e.g. '(updated > 20070101 AND length > 0)'
orderBy - the field names in the ORDER BY clause.
fieldNames - The field names in the table
- See Also:
-
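To show how the tableName, conditions, orderBy and fieldNames parameters of the table-based setInput overload relate to one another, the sketch below assembles a SELECT statement from the same four pieces. It is an assumption-laden illustration: SelectQuerySketch and buildSelect are hypothetical names, and the SQL that Hadoop actually generates may differ (for example in aliasing, paging clauses, or dialect-specific syntax).

```java
// Hypothetical sketch: shows how the setInput parameters could map onto SQL.
// This is NOT the query builder used by Hadoop; it is for illustration only.
class SelectQuerySketch {

    static String buildSelect(String tableName, String conditions,
                              String orderBy, String... fieldNames) {
        StringBuilder sql = new StringBuilder("SELECT ");
        // Assumption: no field names means "all columns".
        sql.append(fieldNames.length == 0 ? "*" : String.join(", ", fieldNames));
        sql.append(" FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE ").append(conditions);
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildSelect(
                "Mytable", "(updated > 20070101 AND length > 0)", "f1",
                "f1", "f2", "f3"));
    }
}
```

The conditions string is passed through unchanged, so it must already be a valid SQL predicate such as the parenthesized example given for that parameter.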
setInput
public static void setInput(JobConf job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)
Initializes the map-part of the job with the appropriate input settings.
- Parameters:
job - The job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
- See Also:
-
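For the query-based setInput overload, the count query tells the framework how many records exist, and each task then needs only its own slice of the input query's result. The sketch below shows one plausible way such a slice could be requested, by appending LIMIT and OFFSET to the input query. PagedQuerySketch and pageQuery are hypothetical names, and the LIMIT/OFFSET syntax is an assumption that holds for some SQL dialects only. A stable ORDER BY, as in the inputQuery example, keeps such slices from overlapping.

```java
// Hypothetical sketch: restricts an input query to one split's rows.
// Assumes a dialect that supports "LIMIT n OFFSET m"; not Hadoop code.
class PagedQuerySketch {

    /** Returns inputQuery limited to `length` rows starting at row `start`. */
    static String pageQuery(String inputQuery, long start, long length) {
        return inputQuery + " LIMIT " + length + " OFFSET " + start;
    }

    public static void main(String[] args) {
        String inputQuery = "SELECT f1, f2, f3 FROM Mytable ORDER BY f1";
        // A split covering 4 rows beginning at row 6.
        System.out.println(pageQuery(inputQuery, 6, 4));
    }
}
```

The count query and the input query should describe the same set of rows; if they disagree, the computed slices would not line up with the data actually returned.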