org.apache.hadoop.streaming
Class StreamXmlRecordReader
java.lang.Object
   org.apache.hadoop.streaming.StreamBaseRecordReader
       org.apache.hadoop.streaming.StreamXmlRecordReader
- All Implemented Interfaces: 
- RecordReader<Text,Text>
- public class StreamXmlRecordReader 
- extends StreamBaseRecordReader
A way to interpret XML fragments as Mapper input records.
  Values are XML subtrees delimited by configurable tags.
  Keys could be derived from the value of an attribute in the XML subtree,
  but extracting them is left to the stream processing application.
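To make "XML subtrees delimited by configurable tags" concrete, here is a minimal, Hadoop-independent sketch that cuts a string into fragments bounded by a begin mark and an end mark. The class `XmlFragmentSplitter` and its `split` method are hypothetical names for illustration only. The sketch assumes that each fragment keeps both of its marks and that text between fragments is discarded; it does no XML parsing, and the marks are matched as plain character sequences.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Hadoop: splits text into fragments
// that start with `begin` and end with `end`, keeping both marks.
final class XmlFragmentSplitter {
    private XmlFragmentSplitter() {}

    static List<String> split(String input, String begin, String end) {
        List<String> fragments = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(begin, from);                // next begin mark
            if (start < 0) break;                                   // no more records
            int stop = input.indexOf(end, start + begin.length());  // first end mark after it
            if (stop < 0) break;                                    // unterminated record is dropped
            int after = stop + end.length();
            fragments.add(input.substring(start, after));           // keep both marks
            from = after;                                           // resume after this record
        }
        return fragments;
    }
}
```

For example, `split("junk<page>a</page>mid<page>b</page>tail", "<page>", "</page>")` yields the two `<page>…</page>` fragments. Nesting is not tracked in this sketch: the first end mark after a begin mark closes the record.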
  The name-value properties that StreamXmlRecordReader understands are:
    String begin (chars marking beginning of record)
    String end   (chars marking end of record)
    int maxrec   (maximum record size)
    int lookahead (maximum lookahead to sync CDATA)
    boolean slowmatch
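A sketch of how these five settings might be read and validated, written against `java.util.Properties` so it runs without Hadoop. The key prefix `stream.recordreader.` and the defaults used here (maxrec 50000, lookahead twice maxrec, slowmatch false) are assumptions modeled on Hadoop Streaming; check them against the Hadoop version you run. `XmlReaderSettings` is a hypothetical name, and rejecting a missing begin or end is a choice of this sketch, on the assumption that neither mark has a sensible default.

```java
import java.util.Properties;

// Hypothetical holder for the StreamXmlRecordReader settings.
// ASSUMPTION: the key prefix and the default values; verify for your Hadoop version.
final class XmlReaderSettings {
    static final String PREFIX = "stream.recordreader.";

    final String begin;       // chars marking the beginning of a record
    final String end;         // chars marking the end of a record
    final int maxRec;         // maximum record size
    final int lookahead;      // maximum lookahead to sync CDATA
    final boolean slowMatch;  // the slowmatch flag, read but not interpreted here

    XmlReaderSettings(Properties props) {
        begin = required(props, "begin");
        end = required(props, "end");
        maxRec = Integer.parseInt(props.getProperty(PREFIX + "maxrec", "50000"));
        // Assumed default: lookahead is twice the maximum record size.
        lookahead = Integer.parseInt(
            props.getProperty(PREFIX + "lookahead", Integer.toString(2 * maxRec)));
        slowMatch = Boolean.parseBoolean(props.getProperty(PREFIX + "slowmatch", "false"));
    }

    // Fails fast when a mark is absent or empty.
    private static String required(Properties props, String name) {
        String value = props.getProperty(PREFIX + name);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("missing required property " + PREFIX + name);
        }
        return value;
    }
}
```

Deriving the lookahead default from maxrec keeps the two settings consistent when only maxrec is overridden.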
 
 
 
 
 
| Methods inherited from class java.lang.Object | 
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
 
StreamXmlRecordReader
public StreamXmlRecordReader(FSDataInputStream in,
                             FileSplit split,
                             Reporter reporter,
                             JobConf job,
                             FileSystem fs)
                      throws IOException
- Throws:
- IOException
init
public void init()
          throws IOException
- Throws:
- IOException
 
next
public boolean next(Text key,
                    Text value)
             throws IOException
- Description copied from class: StreamBaseRecordReader
- Read a record. Implementations should call numRecStats at the end.
 
- Specified by:
- next in interface RecordReader<Text,Text>
- Specified by:
- next in class StreamBaseRecordReader
 
- Parameters:
- key - the key to read data into
- value - the value to read data into
- Returns:
- true if and only if a key/value pair was read; false at end of input (EOF)
- Throws:
- IOException
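The return contract of next (true when a record was read, false at EOF) can be illustrated with a small streaming reader. This is a hypothetical sketch, not Hadoop's implementation: `MarkDelimitedReader` is an invented name, a `StringBuilder` stands in for Hadoop's `Text`, and no key is produced. It ignores split boundaries, drops a record left unterminated at EOF, and treats maxrec as a hard limit by throwing an `IOException`; how Hadoop itself applies maxrec is not stated on this page.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical reader illustrating the next() contract: returns true and
// fills `record` when a delimited fragment was read, false at EOF.
final class MarkDelimitedReader {
    private final Reader in;
    private final String begin;
    private final String end;
    private final int maxRec;

    MarkDelimitedReader(Reader in, String begin, String end, int maxRec) {
        this.in = in;
        this.begin = begin;
        this.end = end;
        this.maxRec = maxRec;
    }

    boolean next(StringBuilder record) throws IOException {
        record.setLength(0);
        int c;
        // Phase 1: skip input until a sliding window ends with the begin mark.
        StringBuilder window = new StringBuilder();
        while (!endsWith(window, begin)) {
            if ((c = in.read()) < 0) return false;       // EOF before a begin mark
            window.append((char) c);
            if (window.length() > begin.length()) window.deleteCharAt(0);
        }
        record.append(begin);
        // Phase 2: copy characters until an end mark follows the begin mark.
        while (record.length() < begin.length() + end.length() || !endsWith(record, end)) {
            if ((c = in.read()) < 0) return false;       // EOF inside a record: drop it
            record.append((char) c);
            if (record.length() > maxRec) {
                throw new IOException("record exceeds maxrec=" + maxRec);
            }
        }
        return true;
    }

    // True if `s` ends with `suffix`.
    private static boolean endsWith(CharSequence s, String suffix) {
        int off = s.length() - suffix.length();
        if (off < 0) return false;
        for (int i = 0; i < suffix.length(); i++) {
            if (s.charAt(off + i) != suffix.charAt(i)) return false;
        }
        return true;
    }
}
```

A caller loops `while (reader.next(buf)) { ... }`. Matching against the tail of a buffer, rather than advancing a single match index, avoids missing marks whose prefix repeats (for example, finding `aab` inside `aaab`).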
 
seekNextRecordBoundary
public void seekNextRecordBoundary()
                            throws IOException
- Description copied from class: StreamBaseRecordReader
- Implementations should seek the input stream in_ forward to the first byte of the next record.
  The initial byte offset in the stream is arbitrary.
 
- Specified by:
- seekNextRecordBoundary in class StreamBaseRecordReader
 
- Throws:
- IOException
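Because the starting offset is arbitrary (for instance, the start of an input split that falls in the middle of a record), the reader must skip forward to the next begin mark. Below is a minimal byte-level sketch of that search. `RecordBoundary` and its method are hypothetical, it works on an in-memory array instead of a stream, and it uses a naive scan rather than whatever matching Hadoop performs.

```java
// Hypothetical sketch of seeking to the next record boundary: from an
// arbitrary start offset, return the index of the first byte of the next
// occurrence of the begin mark, or -1 if the mark does not occur again.
final class RecordBoundary {
    private RecordBoundary() {}

    static int seekNextRecordBoundary(byte[] data, int start, byte[] beginMark) {
        for (int i = Math.max(start, 0); i + beginMark.length <= data.length; i++) {
            int j = 0;
            // Compare the mark against the bytes starting at position i.
            while (j < beginMark.length && data[i + j] == beginMark[j]) j++;
            if (j == beginMark.length) return i;  // first byte of the next record
        }
        return -1;                                // no further record starts
    }
}
```

Starting inside `"ge>a</page><page>b</page>"`, the search skips the partial record and returns 11, the offset of the second `<page>`. The begin mark must be encoded with the same charset as the data.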
 
Copyright © 2009 The Apache Software Foundation