org.apache.hadoop.mapred.lib.db
Class DBOutputFormat<K extends DBWritable,V>

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.db.DBOutputFormat<K,V>
All Implemented Interfaces:
OutputFormat<K,V>

public class DBOutputFormat<K extends DBWritable,V>
extends Object
implements OutputFormat<K,V>

An OutputFormat that sends the reduce output to a SQL table.

DBOutputFormat accepts <key,value> pairs, where the key has a type extending DBWritable. The returned DBOutputFormat.DBRecordWriter writes only the key to the database, using a batch SQL query; the value is ignored.
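As a minimal sketch of the key-only contract, the example below assumes a hypothetical RowWritable interface in place of DBWritable (the real interface fills a java.sql.PreparedStatement rather than a List). The key supplies one value per column, in column order, and the value passed alongside it is never consulted. EmployeeRecord and rowFor are illustrative names, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyOnlyWriteSketch {
    // Hypothetical stand-in for DBWritable: instead of setting parameters on a
    // PreparedStatement, the row appends its column values in order.
    interface RowWritable {
        void write(List<Object> parameters);
    }

    // Hypothetical key type; its fields map to table columns in order.
    static final class EmployeeRecord implements RowWritable {
        final int id;
        final String name;

        EmployeeRecord(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public void write(List<Object> parameters) {
            parameters.add(id);   // column 1
            parameters.add(name); // column 2
        }
    }

    // Only the key is serialized; the value argument is deliberately unused.
    static <K extends RowWritable, V> List<Object> rowFor(K key, V value) {
        List<Object> parameters = new ArrayList<>();
        key.write(parameters);
        return parameters;
    }

    public static void main(String[] args) {
        List<Object> row = rowFor(new EmployeeRecord(7, "Ada"), "ignored value");
        System.out.println(row); // [7, Ada]
    }
}
```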


Nested Class Summary
protected  class DBOutputFormat.DBRecordWriter
          A RecordWriter that writes the reduce output to a SQL table.
 
Constructor Summary
DBOutputFormat()
           
 
Method Summary
 void checkOutputSpecs(FileSystem filesystem, JobConf job)
          Check for validity of the output-specification for the job.
protected  String constructQuery(String table, String[] fieldNames)
          Constructs the query used as the prepared statement to insert data.
 RecordWriter<K,V> getRecordWriter(FileSystem filesystem, JobConf job, String name, Progressable progress)
          Get the RecordWriter for the given job.
static void setOutput(JobConf job, String tableName, String... fieldNames)
          Initializes the reduce part of the job with the appropriate output settings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DBOutputFormat

public DBOutputFormat()
Method Detail

constructQuery

protected String constructQuery(String table,
                                String[] fieldNames)
Constructs the query used as the prepared statement to insert data.

Parameters:
table - the table to insert into
fieldNames - the fields to insert into. If field names are unknown, supply an array of nulls.
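The standalone sketch below approximates the kind of parameterized INSERT statement constructQuery prepares. It is not Hadoop's code, and the exact text Hadoop generates (spacing, trailing punctuation) may differ. It shows why an array of nulls is enough when column names are unknown: the column list is omitted, but one ? placeholder is still emitted per array element, so values must then be supplied in the table's column order. The sketch assumes the names are either all known or all null.

```java
import java.util.StringJoiner;

public class InsertQuerySketch {
    // Builds "INSERT INTO table (a,b) VALUES (?,?)". If the first name is null,
    // the column list is skipped, but one placeholder is still added per element.
    static String constructQuery(String table, String[] fieldNames) {
        if (fieldNames == null) {
            throw new IllegalArgumentException("Field names may not be null");
        }
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table);
        if (fieldNames.length > 0 && fieldNames[0] != null) {
            query.append(" (").append(String.join(",", fieldNames)).append(")");
        }
        StringJoiner placeholders = new StringJoiner(",", " VALUES (", ")");
        for (int i = 0; i < fieldNames.length; i++) {
            placeholders.add("?");
        }
        return query.append(placeholders).toString();
    }

    public static void main(String[] args) {
        // INSERT INTO employees (id,name) VALUES (?,?)
        System.out.println(constructQuery("employees", new String[] {"id", "name"}));
        // INSERT INTO employees VALUES (?,?)
        System.out.println(constructQuery("employees", new String[] {null, null}));
    }
}
```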

checkOutputSpecs

public void checkOutputSpecs(FileSystem filesystem,
                             JobConf job)
                      throws IOException
Check for validity of the output-specification for the job.

This validates the output specification for the job when the job is submitted. Typically it checks that the output does not already exist and throws an exception if it does, so that existing output is not overwritten.

Specified by:
checkOutputSpecs in interface OutputFormat<K extends DBWritable,V>
Parameters:
job - job configuration.
Throws:
IOException - when output should not be attempted
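As a hypothetical illustration of the "fail before the job runs" contract, the sketch below validates a plain Map standing in for JobConf and throws IOException when output should not be attempted. The property keys and the specific checks are assumptions made for this example, not DBOutputFormat's actual behavior.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class OutputSpecCheckSketch {
    // Hypothetical property names used only by this sketch.
    static final String TABLE_KEY = "sketch.output.table";
    static final String FIELD_COUNT_KEY = "sketch.output.field.count";

    // Throws IOException when output should not be attempted, so the problem
    // surfaces at submission time rather than inside a reduce task.
    static void checkOutputSpecs(Map<String, String> conf) throws IOException {
        String table = conf.get(TABLE_KEY);
        if (table == null || table.trim().isEmpty()) {
            throw new IOException("Output table name is not set");
        }
        String count = conf.get(FIELD_COUNT_KEY);
        int fields;
        try {
            fields = (count == null) ? 0 : Integer.parseInt(count);
        } catch (NumberFormatException e) {
            throw new IOException("Invalid field count: " + count, e);
        }
        if (fields <= 0) {
            throw new IOException("At least one output field is required");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TABLE_KEY, "employees");
        try {
            checkOutputSpecs(conf); // field count is missing, so this is rejected
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```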

getRecordWriter

public RecordWriter<K,V> getRecordWriter(FileSystem filesystem,
                                         JobConf job,
                                         String name,
                                         Progressable progress)
                                                     throws IOException
Get the RecordWriter for the given job.

Specified by:
getRecordWriter in interface OutputFormat<K extends DBWritable,V>
Parameters:
job - configuration for the job whose output is being written.
name - the unique name for this part of the output.
progress - mechanism for reporting progress while writing to file.
Returns:
a RecordWriter to write the output for the job.
Throws:
IOException
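The batch-then-flush pattern described for DBOutputFormat.DBRecordWriter can be sketched without a database. BatchSink is a hypothetical stand-in for the batch methods of java.sql.PreparedStatement; a JDBC-backed writer would typically call PreparedStatement.addBatch() once per record, then Statement.executeBatch() followed by Connection.commit() when it is closed. In this sketch, rows become visible only after close().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchingWriterSketch {
    // Hypothetical stand-in for the batch half of JDBC's PreparedStatement.
    interface BatchSink<R> {
        void addBatch(R row);
        void executeBatch();
    }

    // In-memory sink: rows stay pending until executeBatch() "commits" them.
    static final class InMemorySink implements BatchSink<String> {
        final List<String> pending = new ArrayList<>();
        final List<String> committed = new ArrayList<>();

        @Override
        public void addBatch(String row) {
            pending.add(row);
        }

        @Override
        public void executeBatch() {
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Queues one row per key and ignores the value; close() flushes the batch.
    static final class KeyBatchWriter<K, V, R> {
        private final BatchSink<R> sink;
        private final Function<K, R> toRow;

        KeyBatchWriter(BatchSink<R> sink, Function<K, R> toRow) {
            this.sink = sink;
            this.toRow = toRow;
        }

        void write(K key, V value) {
            sink.addBatch(toRow.apply(key));
        }

        void close() {
            sink.executeBatch();
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        KeyBatchWriter<Integer, String, String> writer =
                new KeyBatchWriter<>(sink, id -> "row-" + id);
        writer.write(1, "ignored");
        writer.write(2, "ignored");
        System.out.println(sink.committed); // []
        writer.close();
        System.out.println(sink.committed); // [row-1, row-2]
    }
}
```

Deferring the flush to close() keeps the number of database round trips low, at the cost of holding each task's rows in the batch until the writer is closed.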

setOutput

public static void setOutput(JobConf job,
                             String tableName,
                             String... fieldNames)
Initializes the reduce part of the job with the appropriate output settings.

Parameters:
job - The job
tableName - The table to insert data into
fieldNames - The field names in the table. If unknown, supply the appropriate number of nulls.
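Because fieldNames is a varargs parameter, a typical call names the columns directly, as in setOutput(job, "employees", "id", "name", "salary"), and "the appropriate number of nulls" means one null per column, as in setOutput(job, "employees", null, null, null). Here job is assumed to be an existing JobConf and "employees" is an example table. The hypothetical describe method below has the same String... shape (without the JobConf) and shows how Java packages those arguments, including the pitfall that a single bare null is passed as a null array rather than a one-element array.

```java
import java.util.Arrays;

public class SetOutputVarargsSketch {
    // Hypothetical helper with the same trailing parameter shape as
    // setOutput(JobConf, String, String...), minus the JobConf.
    static String describe(String tableName, String... fieldNames) {
        if (fieldNames == null) {
            return tableName + ": no field array";
        }
        return tableName + ": " + fieldNames.length + " fields " + Arrays.toString(fieldNames);
    }

    public static void main(String[] args) {
        // employees: 3 fields [id, name, salary]
        System.out.println(describe("employees", "id", "name", "salary"));
        // employees: 3 fields [null, null, null]  (one null per column)
        System.out.println(describe("employees", null, null, null));
        // employees: no field array  (the whole array is null, not one null element)
        System.out.println(describe("employees", (String[]) null));
    }
}
```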


Copyright © 2009 The Apache Software Foundation