org.apache.hadoop.mapreduce.lib.fieldsel
Class FieldSelectionHelper
java.lang.Object
  
org.apache.hadoop.mapreduce.lib.fieldsel.FieldSelectionHelper
@InterfaceAudience.Public
@InterfaceStability.Stable
public class FieldSelectionHelper
- extends Object
 
This class implements a mapper/reducer class that can be used to perform
 field selections in a manner similar to Unix cut. The input data is treated
 as fields separated by a user-specified separator (the default value is
 "\t"). The user can specify a list of fields that form the map output keys,
 and a list of fields that form the map output values. If the input format is
 TextInputFormat, the mapper ignores the key passed to the map function, and
 the fields are taken from the value only. Otherwise, the fields are the union
 of those from the key and those from the value.
 
 The field separator is set with the attribute "mapreduce.fieldsel.data.field.separator".
 
 The map output field list spec is set with the attribute
 "mapreduce.fieldsel.map.output.key.value.fields.spec".
 The value is expected to have the form "keyFieldsSpec:valueFieldsSpec".
 keyFieldsSpec and valueFieldsSpec are each a comma (,) separated list of field
 specs: fieldSpec,fieldSpec,fieldSpec ...
 Each field spec can be a simple number (e.g. 5) specifying a single field, a range
 (like 2-5) specifying a range of fields, or an open range (like 3-) specifying all
 the fields starting from field 3. Open-range field specs apply to value fields only;
 they have no effect on the key fields.
 
 Here is an example: "4,3,0,1:6,5,1-3,7-". It selects fields 4, 3, 0 and 1 for the key,
 and fields 6, 5, 1, 2, 3, and 7 onward for the value.
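
 The spec grammar above can be illustrated with a small stand-alone parser. This is
 only a sketch in plain Java, not the Hadoop implementation: the class FieldSpecSketch
 and its methods expand and parse are hypothetical names, and returning -1 when no
 open range is present is an assumption made for this example. It does not validate
 malformed specs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration; not the Hadoop class. */
class FieldSpecSketch {

    /**
     * Expands one comma-separated field list such as "6,5,1-3,7-" into out.
     * Returns the start of an open range, or -1 if none is present (assumed convention).
     */
    static int expand(String fieldListSpec, List<Integer> out) {
        int allFieldsFrom = -1;
        for (String spec : fieldListSpec.split(",")) {
            if (spec.isEmpty()) {
                continue; // tolerate an empty list or stray commas
            }
            int dash = spec.indexOf('-');
            if (dash < 0) {
                // Simple number, e.g. "5"
                out.add(Integer.parseInt(spec));
            } else if (dash == spec.length() - 1) {
                // Open range, e.g. "7-": remember where it starts
                allFieldsFrom = Integer.parseInt(spec.substring(0, dash));
            } else {
                // Closed range, e.g. "1-3": both ends inclusive
                int start = Integer.parseInt(spec.substring(0, dash));
                int end = Integer.parseInt(spec.substring(dash + 1));
                for (int i = start; i <= end; i++) {
                    out.add(i);
                }
            }
        }
        return allFieldsFrom;
    }

    /** Splits "keyFieldsSpec:valueFieldsSpec" and expands both halves. */
    static int parse(String keyValueSpec, List<Integer> keyFields, List<Integer> valueFields) {
        String[] halves = keyValueSpec.split(":", -1);
        // An open range in the key half is discarded: it has no effect on key fields.
        expand(halves[0], keyFields);
        return halves.length > 1 ? expand(halves[1], valueFields) : -1;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        List<Integer> values = new ArrayList<>();
        int from = parse("4,3,0,1:6,5,1-3,7-", keys, values);
        // prints: [4, 3, 0, 1] [6, 5, 1, 2, 3] 7
        System.out.println(keys + " " + values + " " + from);
    }
}
```

 For the example spec, the key list is 4, 3, 0, 1, the value list is 6, 5, 1, 2, 3,
 and the open range starts at field 7.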
 
 The reduce output field list spec is under attribute 
 "mapreduce.fieldsel.reduce.output.key.value.fields.spec".
 
 The reducer extracts output key/value pairs in a similar manner, except that
 the key is never ignored.
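
 The selection step itself, including the difference between ignoring and keeping
 the input key, can be sketched as follows. This is a plain-Java illustration under
 stated assumptions, not the Hadoop code: FieldSelectSketch, fields and select are
 hypothetical names, the separator is treated as a literal string, key fields are
 assumed to come before value fields, and indices past the end of a record are
 silently skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration; not the Hadoop implementation. */
class FieldSelectSketch {

    /** Builds the field list from the value alone, or from key fields followed by value fields. */
    static List<String> fields(String key, String value, String sep, boolean ignoreKey) {
        String literalSep = Pattern.quote(sep); // assumption: separator is not a regex
        List<String> all = new ArrayList<>();
        if (!ignoreKey) {
            all.addAll(Arrays.asList(key.split(literalSep)));
        }
        all.addAll(Arrays.asList(value.split(literalSep)));
        return all;
    }

    /** Joins the listed fields, then every field from allFrom onward when allFrom >= 0. */
    static String select(List<String> fields, List<Integer> wanted, int allFrom, String sep) {
        List<String> picked = new ArrayList<>();
        for (int index : wanted) {
            if (index < fields.size()) { // skip indices beyond this record
                picked.add(fields.get(index));
            }
        }
        if (allFrom >= 0) {
            for (int i = allFrom; i < fields.size(); i++) {
                picked.add(fields.get(i));
            }
        }
        return String.join(sep, picked);
    }

    public static void main(String[] args) {
        // Mapper with TextInputFormat: the key (a byte offset) is ignored.
        List<String> mapFields = fields("0", "a\tb\tc\td", "\t", true);
        System.out.println(select(mapFields, Arrays.asList(1, 0), -1, "\t")); // b<TAB>a
        System.out.println(select(mapFields, Arrays.asList(2), 3, "\t"));     // c<TAB>d

        // Reducer: the key is never ignored, so its fields come first.
        List<String> reduceFields = fields("k", "a\tb", "\t", false);
        System.out.println(select(reduceFields, Arrays.asList(0, 2), -1, "\t")); // k<TAB>b
    }
}
```

 Passing -1 as allFrom disables the open range, which is how the key side is handled
 in this sketch.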
 
 
Method Summary
 void | extractOutputKeyValue(String key, String val, String fieldSep, List<Integer> keyFieldList, List<Integer> valFieldList, int allValueFieldsFrom, boolean ignoreKey, boolean isMap)
 Text | getKey()
 Text | getValue()
 static int | parseOutputKeyValueSpec(String keyValueSpec, List<Integer> keyFieldList, List<Integer> valueFieldList)
 static String | specToString(String fieldSeparator, String keyValueSpec, int allValueFieldsFrom, List<Integer> keyFieldList, List<Integer> valueFieldList)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
emptyText
public static Text emptyText
DATA_FIELD_SEPERATOR
public static final String DATA_FIELD_SEPERATOR
- See Also:
 - Constant Field Values
 
MAP_OUTPUT_KEY_VALUE_SPEC
public static final String MAP_OUTPUT_KEY_VALUE_SPEC
- See Also:
 - Constant Field Values
 
REDUCE_OUTPUT_KEY_VALUE_SPEC
public static final String REDUCE_OUTPUT_KEY_VALUE_SPEC
- See Also:
 - Constant Field Values
 
FieldSelectionHelper
public FieldSelectionHelper()
FieldSelectionHelper
public FieldSelectionHelper(Text key,
                            Text val)
parseOutputKeyValueSpec
public static int parseOutputKeyValueSpec(String keyValueSpec,
                                          List<Integer> keyFieldList,
                                          List<Integer> valueFieldList)
 
specToString
public static String specToString(String fieldSeparator,
                                  String keyValueSpec,
                                  int allValueFieldsFrom,
                                  List<Integer> keyFieldList,
                                  List<Integer> valueFieldList)
 
getKey
public Text getKey()
 
getValue
public Text getValue()
 
extractOutputKeyValue
public void extractOutputKeyValue(String key,
                                  String val,
                                  String fieldSep,
                                  List<Integer> keyFieldList,
                                  List<Integer> valFieldList,
                                  int allValueFieldsFrom,
                                  boolean ignoreKey,
                                  boolean isMap)
 
Copyright © 2009 The Apache Software Foundation