org.apache.hadoop.streaming
Class StreamXmlRecordReader

java.lang.Object
  extended by org.apache.hadoop.streaming.StreamBaseRecordReader
      extended by org.apache.hadoop.streaming.StreamXmlRecordReader
All Implemented Interfaces:
RecordReader<Text,Text>

public class StreamXmlRecordReader
extends StreamBaseRecordReader

A way to interpret XML fragments as Mapper input records. Values are XML subtrees delimited by configurable tags. Keys could be the value of a certain attribute in the XML subtree, but this is left to the stream processor application.

The name-value properties that StreamXmlRecordReader understands are:
    String begin (chars marking beginning of record)
    String end (chars marking end of record)
    int maxrec (maximum record size)
    int lookahead (maximum lookahead to sync CDATA)
    boolean slowmatch
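The idea of delimiting records with begin and end markers can be illustrated with a minimal, self-contained sketch. This is not the StreamXmlRecordReader implementation: the class XmlFragmentSketch and its split method are hypothetical, the sketch works on an in-memory String rather than a file split, and it ignores maxrec, lookahead, slowmatch and CDATA handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.apache.hadoop.streaming.
public class XmlFragmentSketch {

    /**
     * Returns every substring of input that starts with the begin marker
     * and ends with the next end marker that follows it.
     */
    public static List<String> split(String input, String begin, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(begin, pos);
            if (start < 0) {
                break; // no further record start
            }
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: this sketch simply drops it
            }
            stop += end.length();
            records.add(input.substring(start, stop));
            pos = stop; // continue scanning after the record just emitted
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<doc><page id=\"1\">a</page>\n<page id=\"2\">b</page></doc>";
        for (String record : split(xml, "<page", "</page>")) {
            System.out.println(record);
        }
    }
}
```

In this sketch the begin marker is a tag prefix ("<page") so that start tags carrying attributes still match. Whether that is the right choice for a given input depends on the data.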


Field Summary
 
Fields inherited from class org.apache.hadoop.streaming.StreamBaseRecordReader
LOG
 
Constructor Summary
StreamXmlRecordReader(FSDataInputStream in, FileSplit split, Reporter reporter, JobConf job, FileSystem fs)
           
 
Method Summary
 void init()
           
 boolean next(Text key, Text value)
          Read a record.
 void seekNextRecordBoundary()
          Implementations should seek the in_ stream forward to the first byte of the next record.
 
Methods inherited from class org.apache.hadoop.streaming.StreamBaseRecordReader
close, createKey, createValue, getPos, getProgress
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StreamXmlRecordReader

public StreamXmlRecordReader(FSDataInputStream in,
                             FileSplit split,
                             Reporter reporter,
                             JobConf job,
                             FileSystem fs)
                      throws IOException
Throws:
IOException
Method Detail

init

public void init()
          throws IOException
Throws:
IOException

next

public boolean next(Text key,
                    Text value)
             throws IOException
Description copied from class: StreamBaseRecordReader
Read a record. Implementations should call numRecStats at the end.

Specified by:
next in interface RecordReader<Text,Text>
Specified by:
next in class StreamBaseRecordReader
Parameters:
key - the key to read data into
value - the value to read data into
Returns:
true if and only if a key/value pair was read; false if at EOF
Throws:
IOException
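
The contract of next (fill the supplied key and value, return true while records remain, return false at EOF) leads to a simple consumer loop. The sketch below mimics that contract with a hypothetical FragmentReader class that uses StringBuilder holders instead of Text, so it runs without Hadoop on the classpath. Placing the fragment in the value and leaving the key empty is an assumption that follows the class description, not a statement about what StreamXmlRecordReader does internally.

```java
// Hypothetical illustration only; not part of org.apache.hadoop.streaming.
public class NextLoopSketch {

    /** Stand-in for a record reader over an in-memory string. */
    static class FragmentReader {
        private final String data;
        private final String begin;
        private final String end;
        private int pos = 0;

        FragmentReader(String data, String begin, String end) {
            this.data = data;
            this.begin = begin;
            this.end = end;
        }

        /** Reads the next fragment into value; returns false when none remains. */
        boolean next(StringBuilder key, StringBuilder value) {
            int start = data.indexOf(begin, pos);
            if (start < 0) {
                return false; // treated as EOF
            }
            int stop = data.indexOf(end, start + begin.length());
            if (stop < 0) {
                return false; // unterminated record, also treated as EOF
            }
            stop += end.length();
            key.setLength(0);               // key is left empty (assumption)
            value.setLength(0);
            value.append(data, start, stop); // value holds the whole fragment
            pos = stop;
            return true;
        }
    }

    /** Drives the reader until next returns false and counts the records. */
    static int countRecords(String data, String begin, String end) {
        FragmentReader reader = new FragmentReader(data, begin, end);
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        int count = 0;
        while (reader.next(key, value)) {
            count++; // the same key and value holders are reused each iteration
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRecords("<r>1</r><r>2</r><r>3", "<r>", "</r>"));
    }
}
```

Reusing the same key and value objects across iterations mirrors the signature above, where the caller passes in the objects to read data into.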

seekNextRecordBoundary

public void seekNextRecordBoundary()
                            throws IOException
Description copied from class: StreamBaseRecordReader
Implementations should seek the in_ stream forward to the first byte of the next record. The initial byte offset in the stream is arbitrary.

Specified by:
seekNextRecordBoundary in class StreamBaseRecordReader
Throws:
IOException
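
Because the initial offset is arbitrary, a reader may start in the middle of a record and has to skip ahead to the next record start. The following sketch shows that scan over an in-memory byte array. The class SeekBoundarySketch and the method nextBoundary are hypothetical, and using a plain byte-pattern search for the begin marker is an assumption made for illustration, not a description of how StreamXmlRecordReader performs the seek.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of org.apache.hadoop.streaming.
public class SeekBoundarySketch {

    /**
     * Returns the index of the first occurrence of pattern that starts
     * at or after from, or -1 if there is none.
     */
    static int nextBoundary(byte[] data, int from, byte[] pattern) {
        for (int i = Math.max(from, 0); i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i; // first byte of the next record
            }
        }
        return -1; // no record starts in the remaining bytes
    }

    public static void main(String[] args) {
        byte[] data = "<r>a</r><r>b</r>".getBytes(StandardCharsets.UTF_8);
        byte[] begin = "<r>".getBytes(StandardCharsets.UTF_8);
        // Starting inside the first record, the scan lands on the second one.
        System.out.println(nextBoundary(data, 1, begin));
    }
}
```

A naive scan like this is quadratic in the worst case. It is kept simple here because the point is the contract (arbitrary start offset in, record boundary out), not the search strategy.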


Copyright © 2009 The Apache Software Foundation