A distributed "index" is partitioned into "shards". Each shard corresponds to a Lucene instance. This class contains the main() method, which uses a Map/Reduce job to analyze documents and update Lucene instances in parallel.

The main() method in UpdateIndex requires the following information for updating the shards:

- Input formatter. This specifies how to format the input documents.
- Analysis. This defines the analyzer to use on the input. The analyzer determines whether a document is being inserted, updated, or deleted. For inserts or updates, the analyzer also converts each input document into a Lucene document.
- Input paths. This provides the location(s) of updated documents, e.g., HDFS files or directories, or HBase tables.
- Shard paths, or index path with the number of shards. Either specify the path for each shard, or specify an index path and the shards are the sub-directories of the index directory.
- Output path. When the update to a shard is done, a message is put here.
- Number of map tasks.

All of the information can be specified in a configuration file. All but the first two can also be specified as command line options. Check out conf/index-config.xml.template for other configurable parameters.

Note: Because of the parallel nature of Map/Reduce, the behaviour of multiple inserts, deletes or updates to the same document is undefined.
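
To illustrate why multiple operations on the same document are undefined, here is a minimal sketch that applies an insert and a delete for one document in both possible orders. It does not use Lucene or Map/Reduce: the class UpdateOrderSketch, the Op enum, and the map standing in for a shard are all hypothetical, chosen only to show that the final state depends on an ordering the job does not guarantee.

```java
import java.util.HashMap;
import java.util.Map;

public class UpdateOrderSketch {
    // Hypothetical operation types; the real analyzer's representation may differ.
    enum Op { INSERT, DELETE }

    // Apply one operation to a toy "shard", modeled as a map from document id to content.
    static void apply(Map<String, String> shard, Op op, String docId, String content) {
        if (op == Op.INSERT) {
            shard.put(docId, content);
        } else {
            shard.remove(docId);
        }
    }

    // Final shard state after INSERT then DELETE (insertFirst == true), or the reverse order.
    static Map<String, String> run(boolean insertFirst) {
        Map<String, String> shard = new HashMap<>();
        if (insertFirst) {
            apply(shard, Op.INSERT, "doc1", "v1");
            apply(shard, Op.DELETE, "doc1", null);
        } else {
            apply(shard, Op.DELETE, "doc1", null);
            apply(shard, Op.INSERT, "doc1", "v1");
        }
        return shard;
    }

    public static void main(String[] args) {
        System.out.println(run(true).containsKey("doc1"));  // prints false
        System.out.println(run(false).containsKey("doc1")); // prints true
    }
}
```

The same two operations leave the document absent in one order and present in the other. One way to avoid the ambiguity, under this reading of the note, is to make sure a single run carries at most one operation per document.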


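The "index path with the number of shards" option described above can be pictured with a small sketch that derives one sub-directory per shard. The class ShardPathSketch, the method shardPaths, and the zero-padded directory names are assumptions made for illustration; the text only says that shards are sub-directories of the index directory, not how they are named.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ShardPathSketch {
    // Derive one path per shard as a sub-directory of the index path.
    // The five-digit, zero-padded naming is an assumption, not taken from the source.
    static List<String> shardPaths(String indexPath, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        String parent = indexPath.endsWith("/") ? indexPath : indexPath + "/";
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            // Locale.ROOT keeps the digits ASCII regardless of the default locale.
            paths.add(parent + String.format(Locale.ROOT, "%05d", i));
        }
        return paths;
    }

    public static void main(String[] args) {
        // The index path here is a made-up example.
        for (String path : shardPaths("/user/example/index", 3)) {
            System.out.println(path);
        }
    }
}
```

Specifying each shard path explicitly remains the alternative when shards do not share a common parent directory.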