org.apache.hadoop.mapreduce.lib.db
Class TextSplitter
java.lang.Object
org.apache.hadoop.mapreduce.lib.db.BigDecimalSplitter
org.apache.hadoop.mapreduce.lib.db.TextSplitter
- All Implemented Interfaces:
- DBSplitter
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class TextSplitter
- extends BigDecimalSplitter
Implements DBSplitter over text strings.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TextSplitter
public TextSplitter()
split
public List<InputSplit> split(Configuration conf,
ResultSet results,
String colName)
throws SQLException
- This method needs to determine the splits between two user-provided strings.
In the case where the user's strings are 'A' and 'Z', this is not hard; we
could create two splits, ['A', 'M') and ['M', 'Z'], or 26 splits, one for
strings beginning with each letter, and so on.
If a user has provided us with the strings "Ham" and "Haze", however, we need
to create splits that differ in the third letter.
The algorithm used is as follows:
Since a Java char can take 2**16 distinct values, we interpret characters as digits in
base 65536. Given a string 's' containing characters s_0, s_1 .. s_n, we interpret
the string as the number 0.s_0 s_1 s_2 .. s_n in base 65536. Having mapped the
low and high strings into fractional values, we then use the BigDecimalSplitter
to establish the even split points, then map the resulting values
back into strings.
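The mapping described above can be illustrated with a minimal sketch. This is not the TextSplitter source: the class TextSplitSketch, its methods toFraction, toText and splitPoints, and the maxChars parameter are hypothetical names introduced here, and plain BigDecimal arithmetic stands in for the call to BigDecimalSplitter. It assumes each Java char is treated as one base-65536 digit and that only the first maxChars characters of a string are considered.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
final class TextSplitSketch {
    // One digit per Java char (UTF-16 code unit), so the base is 65536.
    static final BigDecimal BASE = new BigDecimal(65536);

    // Interpret s as the fraction 0.s_0 s_1 ... in base 65536,
    // using at most maxChars leading characters.
    static BigDecimal toFraction(String s, int maxChars) {
        BigDecimal result = BigDecimal.ZERO;
        BigDecimal place = BigDecimal.ONE;
        int len = Math.min(s.length(), maxChars);
        for (int i = 0; i < len; i++) {
            // Exact division: 65536 is a power of two, so the quotient terminates.
            place = place.divide(BASE);
            result = result.add(place.multiply(new BigDecimal((int) s.charAt(i))));
        }
        return result;
    }

    // Inverse mapping: peel off one base-65536 digit per character.
    static String toText(BigDecimal fraction, int maxChars) {
        StringBuilder sb = new StringBuilder();
        BigDecimal cur = fraction;
        for (int i = 0; i < maxChars && cur.signum() != 0; i++) {
            cur = cur.multiply(BASE);
            int digit = cur.intValue();              // integer part is the next digit
            sb.append((char) digit);
            cur = cur.subtract(new BigDecimal(digit)); // keep the fractional remainder
        }
        return sb.toString();
    }

    // Evenly spaced split points from low to high, both endpoints included.
    static List<String> splitPoints(String low, String high, int numSplits, int maxChars) {
        BigDecimal lo = toFraction(low, maxChars);
        BigDecimal hi = toFraction(high, maxChars);
        // 65536^-k has 16*k decimal places, so this scale holds every input exactly;
        // only the step itself may be rounded.
        BigDecimal step = hi.subtract(lo)
                .divide(new BigDecimal(numSplits), 16 * maxChars, RoundingMode.HALF_UP);
        List<String> points = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            points.add(toText(lo.add(step.multiply(new BigDecimal(i))), maxChars));
        }
        points.add(toText(hi, maxChars));
        return points;
    }

    public static void main(String[] args) {
        System.out.println(splitPoints("A", "Y", 2, 8)); // prints [A, M, Y]
    }
}
```

In this sketch, splitPoints("Ham", "Haze", 4, 8) yields interior points that all begin with "Ha" and first differ in the third character, which is the behaviour the description asks for. Split points generally do not land on printable characters, so the resulting strings are meant as range boundaries rather than readable text.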
- Specified by:
split
in interface DBSplitter
- Overrides:
split
in class BigDecimalSplitter
- Throws:
SQLException
Copyright © 2009 The Apache Software Foundation