org.apache.hadoop.examples
Class RandomTextWriter

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.examples.RandomTextWriter
All Implemented Interfaces:
Configurable, Tool

public class RandomTextWriter
extends Configured
implements Tool

This program uses map/reduce to run a distributed job in which the tasks do not interact and each task writes a large, unsorted, random sequence of words. To generate data for terasort with 5-10 words per key and 20-100 words per value, use the following configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>test.randomtextwrite.min_words_key</name>
    <value>5</value>
  </property>
  <property>
    <name>test.randomtextwrite.max_words_key</name>
    <value>10</value>
  </property>
  <property>
    <name>test.randomtextwrite.min_words_value</name>
    <value>20</value>
  </property>
  <property>
    <name>test.randomtextwrite.max_words_value</name>
    <value>100</value>
  </property>
  <property>
    <name>test.randomtextwrite.total_bytes</name>
    <value>1099511627776</value>
  </property>
</configuration>

Equivalently, RandomTextWriter also supports all the above options and ones supported by Tool via the command-line.

To run:
bin/hadoop jar hadoop-${version}-examples.jar randomtextwriter [-outFormat output format class] output
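As an illustration of the command-line route, the sketch below builds an argument list that carries the same five properties as Hadoop generic options of the form -D name=value, followed by the output path. The class RandomTextWriterArgs and its buildArgs helper are hypothetical names introduced here, not part of Hadoop; the sketch assumes the generic options are parsed before the tool's own arguments, which is why they are placed first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): turns property settings into
// "-D name=value" generic options followed by the output path.
final class RandomTextWriterArgs {

  static List<String> buildArgs(Map<String, String> props, String output) {
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, String> e : props.entrySet()) {
      args.add("-D");
      args.add(e.getKey() + "=" + e.getValue());
    }
    // The output path comes last, after all generic options.
    args.add(output);
    return args;
  }

  public static void main(String[] argv) {
    // LinkedHashMap keeps the properties in insertion order.
    Map<String, String> props = new LinkedHashMap<>();
    props.put("test.randomtextwrite.min_words_key", "5");
    props.put("test.randomtextwrite.max_words_key", "10");
    props.put("test.randomtextwrite.min_words_value", "20");
    props.put("test.randomtextwrite.max_words_value", "100");
    props.put("test.randomtextwrite.total_bytes", "1099511627776");

    // Print the full command line; "output" is the placeholder path used above.
    System.out.println("bin/hadoop jar hadoop-${version}-examples.jar randomtextwriter "
        + String.join(" ", buildArgs(props, "output")));
  }
}
```

With Hadoop on the classpath, the same list could instead be handed to the tool in-process, for example via ToolRunner.run(new Configuration(), new RandomTextWriter(), args.toArray(new String[0])).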


Constructor Summary
RandomTextWriter()
           
 
Method Summary
static void main(String[] args)
           
 int run(String[] args)
          This is the main routine for launching a distributed random write job.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

RandomTextWriter

public RandomTextWriter()
Method Detail

run

public int run(String[] args)
        throws Exception
This is the main routine for launching a distributed random write job. It runs 10 maps per node, and each node writes 1 GB of data to a DFS file. The reduce does nothing.

Specified by:
run in interface Tool
Parameters:
args - command specific arguments.
Returns:
exit code.
Throws:
IOException
Exception
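
The sizes mentioned for this job can be related with a little arithmetic. The sketch below only checks numbers: it assumes that "1 gig" means 2^30 bytes (the description does not say whether the unit is binary or decimal) and shows that the total_bytes value from the sample configuration, 1099511627776, is exactly 2^40 bytes, or 1024 such units. How the job actually divides total_bytes among maps is not stated here, so the class RandomTextWriterSizing is purely illustrative.

```java
// Illustrative arithmetic only; not part of Hadoop.
final class RandomTextWriterSizing {

  // Assumption: "1 gig" is taken to mean 2^30 bytes.
  static final long ONE_GIB = 1L << 30;

  // Number of whole 1 GiB units contained in totalBytes.
  static long gibUnits(long totalBytes) {
    return totalBytes / ONE_GIB;
  }

  public static void main(String[] args) {
    long totalBytes = 1099511627776L; // value from the sample configuration
    System.out.println(totalBytes == (1L << 40)); // prints "true"
    System.out.println(gibUnits(totalBytes));     // prints "1024"
  }
}
```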

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2009 The Apache Software Foundation