Class UpdateIndex

java.lang.Object
  extended by org.apache.hadoop.contrib.index.main.UpdateIndex

public class UpdateIndex
extends Object

A distributed "index" is partitioned into "shards". Each shard corresponds to a Lucene instance. This class contains the main() method, which uses a Map/Reduce job to analyze documents and update Lucene instances in parallel.

The main() method in UpdateIndex requires the following information for updating the shards:

- Input formatter. This specifies how to format the input documents.
- Analysis. This defines the analyzer to use on the input. The analyzer determines whether a document is being inserted, updated, or deleted. For inserts or updates, the analyzer also converts each input document into a Lucene document.
- Input paths. This provides the location(s) of updated documents, e.g., HDFS files or directories, or HBase tables.
- Shard paths, or index path with the number of shards. Either specify the path for each shard, or specify an index path, in which case the shards are the sub-directories of the index directory.
- Output path. When the update to a shard is done, a message is put here.
- Number of map tasks.

All of the information can be specified in a configuration file. All but the first two can also be specified as command line options. Check out conf/index-config.xml.template for other configurable parameters.

Note: Because of the parallel nature of Map/Reduce, the behaviour of multiple inserts, deletes or updates to the same document is undefined.
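The two ways of naming shards described above (one explicit path per shard, or an index path plus a shard count) can be sketched as follows. This is only an illustration, not the UpdateIndex implementation: the class ShardPathSketch, its method names, the comma-separated input form, and the zero-padded five-digit sub-directory names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of org.apache.hadoop.contrib.index.
final class ShardPathSketch {

    // Explicit form: one path per shard, given here as a comma-separated list
    // (the list syntax is an assumption of this sketch).
    static List<String> fromShardList(String commaSeparatedShards) {
        List<String> shards = new ArrayList<>();
        for (String s : commaSeparatedShards.split(",")) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                shards.add(trimmed);
            }
        }
        return shards;
    }

    // Index form: the shards are sub-directories of the index directory.
    // The zero-padded names ("00000", "00001", ...) are an assumed naming scheme.
    static List<String> fromIndexPath(String indexPath, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        String parent = indexPath.endsWith("/") ? indexPath : indexPath + "/";
        List<String> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(parent + String.format(Locale.ROOT, "%05d", i));
        }
        return shards;
    }

    public static void main(String[] args) {
        // Print the shard paths produced by each form.
        System.out.println(fromShardList("/index/a, /index/b"));
        System.out.println(fromIndexPath("/index", 3));
    }
}
```

Either form yields the same kind of result, a list with one directory per shard, which is why the class description treats them as alternatives.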

Field Summary
static org.apache.commons.logging.Log LOG
Constructor Summary
UpdateIndex()
Method Summary
static void main(String[] argv)
          Entry point for updating the index.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


public static final org.apache.commons.logging.Log LOG
Constructor Detail


public UpdateIndex()
Method Detail


public static void main(String[] argv)
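Entry point for updating the index. Uses a Map/Reduce job to analyze documents and update the shards in parallel, as described in the class description.

Parameters:
argv - the command line options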

Copyright © 2009 The Apache Software Foundation