org.apache.hadoop.mapreduce.lib.join
Class CompositeInputFormat<K extends WritableComparable>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,TupleWritable>
      extended by org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat<K>

@InterfaceAudience.Public
@InterfaceStability.Stable
public class CompositeInputFormat<K extends WritableComparable>
extends InputFormat<K,TupleWritable>

An InputFormat capable of performing joins over a set of data sources sorted and partitioned the same way.

A user may define new join types by setting the property mapreduce.join.define.<ident> to a class name. When that identifier appears in the expression held by mapreduce.join.expr, the named class is assumed to be a ComposableRecordReader. The property mapreduce.join.keycomparator may name a class used to compare keys in the join.

See Also:
setFormat(org.apache.hadoop.conf.Configuration), JoinRecordReader, MultiFilterRecordReader

Field Summary
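static String JOIN_COMPARATOR
           Name of the property that may hold a key comparator class: "mapreduce.join.keycomparator".
static String JOIN_EXPR
           Name of the property that holds the join expression: "mapreduce.join.expr".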
 
Constructor Summary
CompositeInputFormat()
           
 
Method Summary
protected  void addDefaults()
          Adds the default set of identifiers to the parser.
static String compose(Class<? extends InputFormat> inf, String path)
          Convenience method for constructing composite formats.
static String compose(String op, Class<? extends InputFormat> inf, Path... path)
          Convenience method for constructing composite formats.
static String compose(String op, Class<? extends InputFormat> inf, String... path)
          Convenience method for constructing composite formats.
 RecordReader<K,TupleWritable> createRecordReader(InputSplit split, TaskAttemptContext taskContext)
          Construct a CompositeRecordReader for the children of this InputFormat as defined in the init expression.
 List<InputSplit> getSplits(JobContext job)
          Build a CompositeInputSplit from the child InputFormats by assigning the ith split from each child to the ith composite split.
 void setFormat(Configuration conf)
          Interpret the mapreduce.join.expr property of conf as a composite expression.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

JOIN_EXPR

public static final String JOIN_EXPR
Value: "mapreduce.join.expr" (the property holding the join expression).

JOIN_COMPARATOR

public static final String JOIN_COMPARATOR
Value: "mapreduce.join.keycomparator" (the property that may name a key comparator class).

Constructor Detail

CompositeInputFormat

public CompositeInputFormat()

Method Detail

setFormat

public void setFormat(Configuration conf)
               throws IOException
Interpret the mapreduce.join.expr property of conf as a composite expression with the following grammar:

    func  ::= <ident>([<func>,]*<func>)
    func  ::= tbl(<class>,"<path>")
    class ::= a class name, as accepted by java.lang.Class#forName(java.lang.String)
    path  ::= a path, as accepted by org.apache.hadoop.fs.Path#Path(java.lang.String)

User-supplied join types are read from the mapreduce.join.define.<ident> properties. Paths supplied to tbl are given as input paths to the InputFormat class listed.

Throws:
IOException
See Also:
compose(java.lang.String, java.lang.Class, java.lang.String...)
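As a rough illustration of the setFormat grammar, the hypothetical parser below builds a tree from an expression string. JoinExprParser and its Node type are invented for this sketch. It does not resolve class names, consult mapreduce.join.define.<ident>, or create record readers, and it treats any identifier other than tbl as a join.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent parser for the grammar documented for setFormat:
//   func ::= <ident>([<func>,]*<func>)
//   func ::= tbl(<class>,"<path>")
// It only builds a tree; it does not load classes or create record readers.
final class JoinExprParser {

    /** A join node (ident + children) or a tbl leaf (className + path). */
    static final class Node {
        final String ident;
        final List<Node> children = new ArrayList<>();
        String className;   // set only for tbl leaves
        String path;        // set only for tbl leaves

        Node(String ident) {
            this.ident = ident;
        }

        @Override
        public String toString() {
            return className != null ? "tbl[" + className + " @ " + path + "]" : ident + children;
        }
    }

    private final String s;
    private int i;          // current read position in s

    private JoinExprParser(String s) {
        this.s = s;
    }

    static Node parse(String expr) {
        JoinExprParser p = new JoinExprParser(expr);
        Node root = p.func();
        p.skipWs();
        if (p.i != p.s.length()) {
            throw new IllegalArgumentException("Unexpected input at offset " + p.i);
        }
        return root;
    }

    private Node func() {
        Node n = new Node(name());
        expect('(');
        if (n.ident.equals("tbl")) {
            // Leaf: tbl(<class>,"<path>")
            n.className = name();
            expect(',');
            expect('"');
            int end = s.indexOf('"', i);
            if (end < 0) {
                throw new IllegalArgumentException("Unterminated path at offset " + i);
            }
            n.path = s.substring(i, end);
            i = end + 1;
        } else {
            // Join: one or more comma-separated child expressions.
            n.children.add(func());
            while (peek() == ',') {
                i++;
                n.children.add(func());
            }
        }
        expect(')');
        return n;
    }

    /** Reads an identifier or class name made of letters, digits, '_', '$' and '.'. */
    private String name() {
        skipWs();
        int start = i;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_' || c == '$' || c == '.')) {
                break;
            }
            i++;
        }
        if (start == i) {
            throw new IllegalArgumentException("Expected a name at offset " + i);
        }
        return s.substring(start, i);
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + i);
        }
        i++;
    }

    private char peek() {
        skipWs();
        return i < s.length() ? s.charAt(i) : '\0';
    }

    private void skipWs() {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
    }
}
```

For example, parsing outer(tbl(org.example.InFmt,"/a"),inner(tbl(org.example.InFmt,"/b"),tbl(org.example.InFmt,"/c"))), where org.example.InFmt is a placeholder class name, yields an outer node whose second child is itself an inner join; this is how joins nest in the expression.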

addDefaults

protected void addDefaults()
Adds the default set of identifiers (inner, outer, override and tbl) to the parser.
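The sketch below models the identifier table the parser consults once the defaults have been added and user definitions read. It assumes the defaults are inner, outer, override and tbl, backed by InnerJoinRecordReader, OuterJoinRecordReader, OverrideRecordReader and WrappedRecordReader. JoinIdentifiers and resolve are hypothetical names, class names are kept as plain strings rather than loaded, and in this sketch a user definition with a default's name simply replaces the default.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the identifier table: defaults plus mapreduce.join.define.<ident> entries.
final class JoinIdentifiers {

    private static final Pattern DEFINE =
            Pattern.compile("^mapreduce\\.join\\.define\\.(\\w+)$");

    /** Returns identifier -> class name, sorted by identifier. */
    static Map<String, String> resolve(Map<String, String> conf) {
        Map<String, String> ids = new TreeMap<>();
        // Assumed defaults (the role addDefaults() plays).
        ids.put("inner", "org.apache.hadoop.mapreduce.lib.join.InnerJoinRecordReader");
        ids.put("outer", "org.apache.hadoop.mapreduce.lib.join.OuterJoinRecordReader");
        ids.put("override", "org.apache.hadoop.mapreduce.lib.join.OverrideRecordReader");
        ids.put("tbl", "org.apache.hadoop.mapreduce.lib.join.WrappedRecordReader");
        // User-defined join types: the <ident> suffix of the property name becomes the identifier.
        for (Map.Entry<String, String> kv : conf.entrySet()) {
            Matcher m = DEFINE.matcher(kv.getKey());
            if (m.matches()) {
                ids.put(m.group(1), kv.getValue());
            }
        }
        return ids;
    }
}
```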


getSplits

public List<InputSplit> getSplits(JobContext job)
                           throws IOException,
                                  InterruptedException
Build a CompositeInputSplit from the child InputFormats by assigning the ith split from each child to the ith composite split.

Specified by:
getSplits in class InputFormat<K extends WritableComparable,TupleWritable>
Parameters:
job - job configuration.
Returns:
a list of InputSplits for the job.
Throws:
IOException
InterruptedException
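The pairing that getSplits describes can be sketched with plain lists, with strings standing in for InputSplits. SplitZipper is a hypothetical helper. It rejects children that report different numbers of splits by throwing IllegalArgumentException, since pairing by index only lines up matching keys when every source was partitioned the same way into the same number of parts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: composite split i = split i of every child (strings stand in for InputSplits).
final class SplitZipper {

    static List<List<String>> zip(List<List<String>> childSplits) {
        List<List<String>> composite = new ArrayList<>();
        if (childSplits.isEmpty()) {
            return composite;
        }
        int count = childSplits.get(0).size();
        for (int c = 1; c < childSplits.size(); c++) {
            // Index-based pairing is only meaningful if every child has the same split count.
            if (childSplits.get(c).size() != count) {
                throw new IllegalArgumentException("child " + c + " has "
                        + childSplits.get(c).size() + " splits, expected " + count);
            }
        }
        for (int i = 0; i < count; i++) {
            List<String> ith = new ArrayList<>(childSplits.size());
            for (List<String> child : childSplits) {
                ith.add(child.get(i));   // the i-th split of each child, in child order
            }
            composite.add(ith);
        }
        return composite;
    }
}
```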

createRecordReader

public RecordReader<K,TupleWritable> createRecordReader(InputSplit split,
                                                        TaskAttemptContext taskContext)
                                                                            throws IOException,
                                                                                   InterruptedException
Construct a CompositeRecordReader for the children of this InputFormat as defined in the join expression (mapreduce.join.expr). The outermost join need only be composable, not necessarily a composite, so mandating TupleWritable as the value type isn't strictly correct.

Specified by:
createRecordReader in class InputFormat<K extends WritableComparable,TupleWritable>
Parameters:
split - the split to be read
taskContext - the information about the task
Returns:
a new record reader
Throws:
IOException
InterruptedException

compose

public static String compose(Class<? extends InputFormat> inf,
                             String path)
Convenience method for constructing composite formats. Given an InputFormat class (inf) and a path (p), returns: tbl(<inf>,"<p>")


compose

public static String compose(String op,
                             Class<? extends InputFormat> inf,
                             String... path)
Convenience method for constructing composite formats. Given an operation (op), an InputFormat class (inf) and paths p1 through pn, returns: <op>(tbl(<inf>,"<p1>"),tbl(<inf>,"<p2>"),...,tbl(<inf>,"<pn>"))


compose

public static String compose(String op,
                             Class<? extends InputFormat> inf,
                             Path... path)
Convenience method for constructing composite formats. Given an operation (op), an InputFormat class (inf) and paths p1 through pn, returns: <op>(tbl(<inf>,"<p1>"),tbl(<inf>,"<p2>"),...,tbl(<inf>,"<pn>"))
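The following hypothetical re-implementation shows the string shape the compose methods are documented to return. ComposeSketch is an invented name. It takes the InputFormat class name as a String so that it compiles without Hadoop on the classpath, and its tbl(<inf>,"<p>") quoting follows the grammar documented for setFormat.

```java
// Hypothetical re-implementation of the documented compose(...) output format;
// the real methods take a Class<? extends InputFormat> rather than a class name.
final class ComposeSketch {

    /** tbl(<inf>,"<p>") */
    static String tbl(String inputFormatClass, String path) {
        return "tbl(" + inputFormatClass + ",\"" + path + "\")";
    }

    /** <op>(tbl(<inf>,"<p1>"),...,tbl(<inf>,"<pn>")) */
    static String compose(String op, String inputFormatClass, String... paths) {
        StringBuilder sb = new StringBuilder(op).append('(');
        for (int i = 0; i < paths.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(tbl(inputFormatClass, paths[i]));
        }
        return sb.append(')').toString();
    }
}
```

With Hadoop available, the real methods would typically be used along the lines of conf.set(CompositeInputFormat.JOIN_EXPR, CompositeInputFormat.compose("inner", SequenceFileInputFormat.class, new Path("/data/a"), new Path("/data/b"))) together with job.setInputFormatClass(CompositeInputFormat.class); the paths and the choice of SequenceFileInputFormat are only examples.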



Copyright © 2014 Apache Software Foundation. All Rights Reserved.