org.apache.hadoop.contrib.index.main
Class UpdateIndex
java.lang.Object
org.apache.hadoop.contrib.index.main.UpdateIndex
public class UpdateIndex
- extends Object
A distributed "index" is partitioned into "shards", and each shard corresponds
to a Lucene instance. This class provides the main() method, which runs a
Map/Reduce job to analyze documents and update the Lucene instances in parallel.
The main() method in UpdateIndex requires the following information to
update the shards:
- Input formatter. This specifies how to format the input documents.
- Analysis. This defines the analyzer to use on the input. The analyzer
determines whether a document is being inserted, updated, or deleted.
For inserts or updates, the analyzer also converts each input document
into a Lucene document.
- Input paths. These give the location(s) of the updated documents,
e.g., HDFS files or directories, or HBase tables.
- Shard paths, or an index path with the number of shards. Either specify
the path of each shard, or specify an index path and the number of shards,
in which case the shards are sub-directories of the index directory.
- Output path. A message is written here when the update to a shard is done.
- Number of map tasks.
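The "index path with the number of shards" option can be pictured with a minimal sketch. The class ShardPathSketch, the method shardPaths, and the zero-padded five-digit directory names are all assumptions made for illustration; this page does not state how UpdateIndex actually names the shard sub-directories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the UpdateIndex API. */
final class ShardPathSketch {

    /**
     * Derives one shard directory per shard under the index directory.
     * The five-digit, zero-padded naming scheme is an assumption.
     */
    static List<String> shardPaths(String indexPath, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // Drop a trailing slash so each path is joined with exactly one separator.
        String base = indexPath.endsWith("/")
                ? indexPath.substring(0, indexPath.length() - 1)
                : indexPath;
        List<String> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(base + "/" + String.format(Locale.ROOT, "%05d", i));
        }
        return shards;
    }

    public static void main(String[] args) {
        // Lists three sub-directories of the (made-up) index directory.
        for (String shard : shardPaths("/user/alice/index", 3)) {
            System.out.println(shard);
        }
    }
}
```

Listing the shard paths explicitly, by contrast, lets the shards live in unrelated directories.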
All of this information can be specified in a configuration file. All but
the first two items can also be specified as command line options. See
conf/index-config.xml.template for other configurable parameters.
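As a rough sketch of how the two sources of settings combine, the helper below assembles an argument array in which the input formatter and analysis come from a configuration file and the remaining items are given as options. The option names (-conf, -inputPaths, -outputPath, -indexPath, -numShards, -numMapTasks) are assumptions not confirmed by this page, so check them against the usage message of your version. The class name, method name, and all paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for assembling UpdateIndex arguments; not part of the API. */
final class UpdateIndexArgsSketch {

    static String[] buildArgs(String confPath, String inputPaths, String outputPath,
                              String indexPath, int numShards, int numMapTasks) {
        List<String> args = new ArrayList<>();
        // Assumed option names; verify against the tool's own usage output.
        args.add("-conf");        args.add(confPath);   // input formatter and analysis
        args.add("-inputPaths");  args.add(inputPaths); // comma-separated locations
        args.add("-outputPath");  args.add(outputPath);
        args.add("-indexPath");   args.add(indexPath);
        args.add("-numShards");   args.add(Integer.toString(numShards));
        args.add("-numMapTasks"); args.add(Integer.toString(numMapTasks));
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        String[] argv = buildArgs("conf/index-config.xml", "/data/docs",
                "/data/update-out", "/data/index", 4, 8);
        System.out.println(String.join(" ", argv));
        // With the index contrib classes on the classpath, the array could be passed on:
        // org.apache.hadoop.contrib.index.main.UpdateIndex.main(argv);
    }
}
```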
Note: Because of the parallel nature of Map/Reduce, the behaviour of
multiple inserts, deletes or updates to the same document is undefined.
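One way to stay clear of this undefined behaviour is to check a batch for repeated document identifiers before submitting it. The sketch below is a standalone pre-check and not something UpdateIndex does itself; it assumes each input document carries a string identifier, and the names DuplicateDocIdSketch and repeatedIds are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-check for illustration only; not part of the UpdateIndex API. */
final class DuplicateDocIdSketch {

    /** Returns the identifiers that occur more than once, in order of first repetition. */
    static Set<String> repeatedIds(List<String> docIds) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (String id : docIds) {
            // add() returns false when the identifier was already seen.
            if (!seen.add(id)) {
                repeated.add(id);
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("doc-1", "doc-2", "doc-1", "doc-3", "doc-2");
        System.out.println(repeatedIds(batch)); // prints [doc-1, doc-2]
    }
}
```

How to resolve a repeated identifier (for example, keeping only one operation per document) depends on the application and is left to the caller.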
Field Summary
static org.apache.commons.logging.Log LOG
Method Summary
static void main(String[] argv)
The main() method
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
UpdateIndex
public UpdateIndex()
main
public static void main(String[] argv)
- The main() method, which runs the Map/Reduce job that updates the shards.
- Parameters:
argv - the command line options
Copyright © 2009 The Apache Software Foundation