TextSplitter (Apache Hadoop Main 2.4.1 API)

org.apache.hadoop.mapreduce.lib.db
Class TextSplitter

java.lang.Object
  org.apache.hadoop.mapreduce.lib.db.BigDecimalSplitter
      org.apache.hadoop.mapreduce.lib.db.TextSplitter

All Implemented Interfaces: DBSplitter

@InterfaceAudience.Public @InterfaceStability.Evolving public class TextSplitter
extends BigDecimalSplitter

Implement DBSplitter over text strings.

Constructor Summary
`TextSplitter()`

Method Summary
`List<InputSplit>`	`split(Configuration conf, ResultSet results, String colName)` Determines the split points between two user-provided strings.

Methods inherited from class org.apache.hadoop.mapreduce.lib.db.BigDecimalSplitter
`tryDivide`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

TextSplitter

public TextSplitter()

Method Detail

split

public List<InputSplit> split(Configuration conf,
                              ResultSet results,
                              String colName)
                       throws SQLException

This method needs to determine the splits between two user-provided strings. In the case where the user's strings are 'A' and 'Z', this is not hard; we could create two splits from ['A', 'M') and ['M', 'Z'], or 26 splits, one for strings beginning with each letter, and so on. If a user has provided us with the strings "Ham" and "Haze", however, the two strings share the prefix "Ha", so we need to create splits that differ in the third letter.

The algorithm used is as follows. Since a 16-bit character can take 2**16 values, we interpret characters as digits in base 65536. Given a string 's' containing characters s_0, s_1 .. s_n, we interpret the string as the number 0.s_0 s_1 s_2 .. s_n in base 65536; that is, the character s_i contributes s_i / 65536**(i+1) to the value.

Having mapped the low and high strings into floating-point values, we then use the BigDecimalSplitter to establish the even split points, then map the resulting floating-point values back into strings.
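The mapping described above can be illustrated with a minimal sketch. This is not the Hadoop source: the class name `TextSplitSketch`, the helper names, and the `MAX_CHARS` cap are assumptions made for illustration, and the sketch computes only a single midpoint rather than the full set of even splits that BigDecimalSplitter would produce.

```java
import java.math.BigDecimal;

final class TextSplitSketch {
    // One base-65536 "digit" per 16-bit char value.
    static final BigDecimal BASE = new BigDecimal(65536);
    // Hypothetical cap on how many characters are converted (an assumption of this sketch).
    static final int MAX_CHARS = 8;

    /** Interprets s_0 s_1 .. s_n as the fraction 0.s_0 s_1 .. s_n in base 65536. */
    static BigDecimal stringToBigDecimal(String s) {
        BigDecimal result = BigDecimal.ZERO;
        BigDecimal place = BigDecimal.ONE;
        int len = Math.min(s.length(), MAX_CHARS);
        for (int i = 0; i < len; i++) {
            // Exact division: 1/65536 has a terminating decimal expansion.
            place = place.divide(BASE);
            result = result.add(place.multiply(new BigDecimal((int) s.charAt(i))));
        }
        return result;
    }

    /** Inverse mapping: peels off one base-65536 digit per iteration. */
    static String bigDecimalToString(BigDecimal value) {
        StringBuilder sb = new StringBuilder();
        BigDecimal cur = value;
        for (int i = 0; i < MAX_CHARS; i++) {
            cur = cur.multiply(BASE);
            int digit = cur.intValue(); // the integer part is the next character
            if (digit == 0) {
                break; // nothing left to convert (a zero digit also ends the string)
            }
            cur = cur.subtract(new BigDecimal(digit));
            sb.append((char) digit);
        }
        return sb.toString();
    }

    /** Numeric midpoint of two strings, mapped back into a string. */
    static String midpoint(String low, String high) {
        BigDecimal sum = stringToBigDecimal(low).add(stringToBigDecimal(high));
        return bigDecimalToString(sum.divide(new BigDecimal(2)));
    }

    public static void main(String[] args) {
        System.out.println(bigDecimalToString(stringToBigDecimal("Ham"))); // prints "Ham"
        String mid = midpoint("Ham", "Haze");
        System.out.println(mid.substring(0, 3)); // prints "Has"
    }
}
```

For "Ham" and "Haze" the midpoint begins with "Has": the shared prefix "Ha" is preserved and the third digit falls between 'm' and 'z'. The characters that follow "Has" come from the carried fractional remainder and are generally not printable Latin letters, which is expected when split points are computed numerically.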

Specified by: split in interface DBSplitter
Overrides: split in class BigDecimalSplitter

Throws: SQLException