org.apache.hadoop.streaming
Class StreamXmlRecordReader
java.lang.Object
org.apache.hadoop.streaming.StreamBaseRecordReader
org.apache.hadoop.streaming.StreamXmlRecordReader
- All Implemented Interfaces:
- RecordReader<Text,Text>
public class StreamXmlRecordReader
- extends StreamBaseRecordReader
A way to interpret XML fragments as Mapper input records.
Values are XML subtrees delimited by configurable tags.
Keys could be the value of a certain attribute in the XML subtree,
but this is left to the stream processor application.
The name-value properties that StreamXmlRecordReader understands are:
String begin (chars marking beginning of record)
String end (chars marking end of record)
int maxrec (maximum record size)
int lookahead (maximum lookahead to sync CDATA)
boolean slowmatch
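To illustrate how the begin and end marks delimit a record, here is a minimal plain-Java sketch. It is not the Hadoop implementation: the class name XmlRecordSketch and the method extractRecords are hypothetical, it scans an in-memory string rather than a file split, and it ignores maxrec, lookahead and slowmatch.

```java
import java.util.ArrayList;
import java.util.List;

public class XmlRecordSketch {
    /**
     * Returns every substring that starts at a begin mark and runs through
     * the next end mark. Hypothetical helper, for illustration only.
     */
    public static List<String> extractRecords(String text, String begin, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = text.indexOf(begin, pos);
            if (start < 0) {
                break; // no further record starts
            }
            int stop = text.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: dropped in this sketch
            }
            stop += end.length();
            records.add(text.substring(start, stop));
            pos = stop; // continue scanning after the end mark
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<doc><page id=\"1\">a</page>junk<page id=\"2\">b</page></doc>";
        for (String record : extractRecords(xml, "<page", "</page>")) {
            System.out.println(record);
        }
    }
}
```

In this sketch the begin mark is "<page" rather than "<page>", so that opening tags carrying attributes still match. Text between records, such as "junk" above, is skipped.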
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
StreamXmlRecordReader
public StreamXmlRecordReader(FSDataInputStream in,
FileSplit split,
Reporter reporter,
JobConf job,
FileSystem fs)
throws IOException
- Throws:
IOException
init
public void init()
throws IOException
- Throws:
IOException
next
public boolean next(Text key,
Text value)
throws IOException
- Description copied from class:
StreamBaseRecordReader
- Read a record. Implementation should call numRecStats at the end.
- Specified by:
next
in interface RecordReader<Text,Text>
- Specified by:
next
in class StreamBaseRecordReader
- Parameters:
key
- the key to read data into
value
- the value to read data into
- Returns:
- true iff a key/value was read, false if at EOF
- Throws:
IOException
seekNextRecordBoundary
public void seekNextRecordBoundary()
throws IOException
- Description copied from class:
StreamBaseRecordReader
- Implementation should seek the stream in_ forward to the first byte of the next record.
The initial byte offset in the stream is arbitrary.
- Specified by:
seekNextRecordBoundary
in class StreamBaseRecordReader
- Throws:
IOException
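The idea of seeking from an arbitrary offset to the first byte of the next record can be sketched in plain Java as follows. This is an assumption-laden illustration, not the Hadoop code: the class BoundarySeekSketch and the method nextBoundary are hypothetical, the data is a byte array instead of a stream, and the search is a naive byte-by-byte comparison against the begin mark.

```java
import java.nio.charset.StandardCharsets;

public class BoundarySeekSketch {
    /**
     * Returns the index of the first byte of the next occurrence of mark
     * at or after offset, or -1 if there is none. Hypothetical helper.
     */
    public static int nextBoundary(byte[] data, int offset, byte[] mark) {
        for (int i = Math.max(offset, 0); i + mark.length <= data.length; i++) {
            int j = 0;
            // Compare the mark against the data starting at position i.
            while (j < mark.length && data[i + j] == mark[j]) {
                j++;
            }
            if (j == mark.length) {
                return i; // whole mark matched: record starts here
            }
        }
        return -1; // no record boundary in the remaining data
    }

    public static void main(String[] args) {
        // The data starts mid-record, as a split boundary might.
        byte[] data = "ge>tail<page>next</page>".getBytes(StandardCharsets.UTF_8);
        byte[] mark = "<page>".getBytes(StandardCharsets.UTF_8);
        System.out.println(nextBoundary(data, 0, mark)); // prints 7
    }
}
```

Because the starting offset may fall in the middle of a record, the partial fragment before the first begin mark is skipped rather than returned.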
Copyright © 2009 The Apache Software Foundation