org.apache.hadoop.mapreduce.lib.fieldsel
Class FieldSelectionHelper

java.lang.Object
  extended by org.apache.hadoop.mapreduce.lib.fieldsel.FieldSelectionHelper

@InterfaceAudience.Public
@InterfaceStability.Stable
public class FieldSelectionHelper
extends Object

This helper class implements the field-selection logic for a mapper/reducer that performs field selections in a manner similar to Unix cut. The input data is treated as fields separated by a user-specified separator (the default value is "\t"). The user can specify a list of fields that form the map output keys and a list of fields that form the map output values. If the input format is TextInputFormat, the mapper ignores the key passed to the map function, and the fields are taken from the value only. Otherwise, the fields are the union of those from the key and those from the value.

The field separator is set under the attribute "mapreduce.fieldsel.data.field.separator". The map output field list spec is set under the attribute "mapreduce.fieldsel.map.output.key.value.fields.spec". The value is expected to have the form "keyFieldsSpec:valueFieldsSpec", where keyFieldsSpec and valueFieldsSpec are each a comma-separated (,) list of field specs: fieldSpec,fieldSpec,fieldSpec ... Each field spec can be a single number (e.g. 5) selecting one field, a range (e.g. 2-5) selecting a run of fields, or an open range (e.g. 3-) selecting all fields starting from field 3. An open-range field spec applies to value fields only; it has no effect on key fields. Field numbers are zero-based, as the example shows.

Here is an example: "4,3,0,1:6,5,1-3,7-". It specifies that fields 4, 3, 0 and 1 are used for keys, and fields 6, 5, 1, 2, 3, 7 and above are used for values.

The reduce output field list spec is set under the attribute "mapreduce.fieldsel.reduce.output.key.value.fields.spec". The reducer extracts output key/value pairs in a similar manner, except that the key is never ignored.
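To make the spec grammar concrete, here is a minimal standalone sketch that parses a "keyFieldsSpec:valueFieldsSpec" string according to the rules described above. It is not the Hadoop implementation: the class name FieldSpecSketch and its methods are hypothetical, and it only handles the three documented forms (single number, closed range, open range).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the documented spec grammar; not Hadoop code. */
public class FieldSpecSketch {

    /**
     * Parses one comma-separated field list into {@code fields}.
     * Returns the start of an open range ("7-" gives 7), or -1 if there is none.
     */
    public static int parseFieldList(String spec, List<Integer> fields) {
        int allFieldsFrom = -1;
        for (String fieldSpec : spec.split(",")) {
            if (fieldSpec.isEmpty()) {
                continue; // tolerate an empty list or stray commas
            }
            int dash = fieldSpec.indexOf('-');
            if (dash < 0) {
                // single field, e.g. "5"
                fields.add(Integer.parseInt(fieldSpec));
            } else if (dash == fieldSpec.length() - 1) {
                // open range, e.g. "3-": remember where it starts
                allFieldsFrom = Integer.parseInt(fieldSpec.substring(0, dash));
            } else {
                // closed range, e.g. "2-5": expand to 2,3,4,5
                int start = Integer.parseInt(fieldSpec.substring(0, dash));
                int end = Integer.parseInt(fieldSpec.substring(dash + 1));
                for (int i = start; i <= end; i++) {
                    fields.add(i);
                }
            }
        }
        return allFieldsFrom;
    }

    /**
     * Parses "keyFieldsSpec:valueFieldsSpec". An open range on the key side
     * is discarded, since the description says it has no effect on keys.
     */
    public static int parse(String keyValueSpec, List<Integer> keys, List<Integer> values) {
        String[] parts = keyValueSpec.split(":", -1);
        parseFieldList(parts[0], keys);
        return parts.length > 1 ? parseFieldList(parts[1], values) : -1;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        List<Integer> values = new ArrayList<>();
        int from = parse("4,3,0,1:6,5,1-3,7-", keys, values);
        // prints: keys=[4, 3, 0, 1] values=[6, 5, 1, 2, 3] allValueFieldsFrom=7
        System.out.println("keys=" + keys + " values=" + values
                + " allValueFieldsFrom=" + from);
    }
}
```

The open range is returned separately from the explicit value fields because its upper bound depends on how many fields each input record has, which is only known at extraction time.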


Field Summary
static String DATA_FIELD_SEPERATOR
           
static Text emptyText
           
static String MAP_OUTPUT_KEY_VALUE_SPEC
           
static String REDUCE_OUTPUT_KEY_VALUE_SPEC
           
 
Constructor Summary
FieldSelectionHelper()
           
FieldSelectionHelper(Text key, Text val)
           
 
Method Summary
 void extractOutputKeyValue(String key, String val, String fieldSep, List<Integer> keyFieldList, List<Integer> valFieldList, int allValueFieldsFrom, boolean ignoreKey, boolean isMap)
           
 Text getKey()
           
 Text getValue()
           
static int parseOutputKeyValueSpec(String keyValueSpec, List<Integer> keyFieldList, List<Integer> valueFieldList)
           
static String specToString(String fieldSeparator, String keyValueSpec, int allValueFieldsFrom, List<Integer> keyFieldList, List<Integer> valueFieldList)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

emptyText

public static Text emptyText

DATA_FIELD_SEPERATOR

public static final String DATA_FIELD_SEPERATOR
See Also:
Constant Field Values

MAP_OUTPUT_KEY_VALUE_SPEC

public static final String MAP_OUTPUT_KEY_VALUE_SPEC
See Also:
Constant Field Values

REDUCE_OUTPUT_KEY_VALUE_SPEC

public static final String REDUCE_OUTPUT_KEY_VALUE_SPEC
See Also:
Constant Field Values

Constructor Detail

FieldSelectionHelper

public FieldSelectionHelper()

FieldSelectionHelper

public FieldSelectionHelper(Text key,
                            Text val)

Method Detail

parseOutputKeyValueSpec

public static int parseOutputKeyValueSpec(String keyValueSpec,
                                          List<Integer> keyFieldList,
                                          List<Integer> valueFieldList)

specToString

public static String specToString(String fieldSeparator,
                                  String keyValueSpec,
                                  int allValueFieldsFrom,
                                  List<Integer> keyFieldList,
                                  List<Integer> valueFieldList)

getKey

public Text getKey()

getValue

public Text getValue()

extractOutputKeyValue

public void extractOutputKeyValue(String key,
                                  String val,
                                  String fieldSep,
                                  List<Integer> keyFieldList,
                                  List<Integer> valFieldList,
                                  int allValueFieldsFrom,
                                  boolean ignoreKey,
                                  boolean isMap)
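The parameter list suggests how a parsed spec is applied to a record: the input is split on fieldSep, the listed fields are picked out, and any fields from allValueFieldsFrom onward are appended. The following standalone sketch illustrates that idea under stated assumptions. The class FieldSelectSketch and its select method are hypothetical, fields missing from a short record are simply skipped here, and how the real method combines key and val or uses ignoreKey and isMap is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of applying a field list to a record; not Hadoop code. */
public class FieldSelectSketch {

    /**
     * Joins the selected fields with {@code sep}. Indexes in {@code fieldList}
     * are zero-based; {@code allFieldsFrom} of -1 means "no open range".
     * Assumption of this sketch: indexes past the end of the record are skipped.
     */
    public static String select(String[] fields, List<Integer> fieldList,
                                int allFieldsFrom, String sep) {
        List<String> picked = new ArrayList<>();
        for (int index : fieldList) {
            if (index < fields.length) {
                picked.add(fields[index]);
            }
        }
        if (allFieldsFrom >= 0) {
            // open range: every remaining field from allFieldsFrom to the end
            for (int i = allFieldsFrom; i < fields.length; i++) {
                picked.add(fields[i]);
            }
        }
        return String.join(sep, picked);
    }

    public static void main(String[] args) {
        String sep = "\t";
        // Nine tab-separated fields, numbered 0 through 8.
        String[] fields = "a\tb\tc\td\te\tf\tg\th\ti".split(Pattern.quote(sep), -1);

        // Field lists corresponding to the spec "4,3,0,1:6,5,1-3,7-".
        String key = select(fields, Arrays.asList(4, 3, 0, 1), -1, sep);
        String value = select(fields, Arrays.asList(6, 5, 1, 2, 3), 7, sep);

        System.out.println(key.replace(sep, "|"));   // prints: e|d|a|b
        System.out.println(value.replace(sep, "|")); // prints: g|f|b|c|d|h|i
    }
}
```

The separator is quoted with Pattern.quote before splitting because String.split treats its argument as a regular expression, which matters for separators such as "|" or ".".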


Copyright © 2009 The Apache Software Foundation