I am trying to understand how Lucene should be used.

From what I have read, creating an IndexReader is costly, so using a SearcherManager should be the right choice. However, a SearcherManager should be produced by an NRTManager (which, by the way, should replace the IndexWriter for every add or delete operation performed). But in order to have an NRTManager, I first need an IndexWriter, and here comes my problem.

The documentation says:

  • an IndexWriter is thread-safe
  • the constructor of this class takes a Directory object, so creating an instance seems to be costly (as in the case of an IndexReader)
  • all changes are buffered and flushed periodically (so they seem to encourage using a single instance)

but:

  • the changes, although flushed, only become visible after commit or close
  • after finishing the updates (add/delete), the instance should be closed
  • I also found this: https://stackoverflow.com/questions/5374419/forgot-to-close-the-lucene-indexwriter-after-adding-documents-to-the-index, where it is said that not closing a writer might ruin everything

So what am I really supposed to do? Is having a single IndexWriter instance a good idea (only commit, and never close it)?

EDIT: What is more, if I use NRTManager, how can I make a commit? Is it even possible?

Dragos

1 Answer

OK, where to start. First of all, this is written based on Lucene 3.6. NRTManager is meant for near-real-time scenarios, where writes and reads follow each other very closely. An example would be Twitter (which actually uses a modified version of Lucene). In these cases you are not supposed to close your IndexWriter, because all changes that occur are tracked by NRTManager.TrackingIndexWriter; use NRTManagerReopenThread to periodically trigger searcher refreshes.
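
A minimal sketch of that near-real-time setup, assuming the Lucene 3.6 API and an in-memory RAMDirectory. The class name, the "id" field and the staleness values (5 s / 25 ms) are illustrative assumptions, not values from the answer. It shows a document added through TrackingIndexWriter becoming searchable after waitForGeneration, without any commit.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NRTManager;
import org.apache.lucene.search.NRTManagerReopenThread;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NrtSketch {

    // Adds one document through the TrackingIndexWriter and returns the number
    // of hits a searcher acquired right afterwards sees for it.
    static int addAndSearch() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));
        // All adds/updates/deletes go through the tracking wrapper, which
        // returns a generation number for every change.
        NRTManager.TrackingIndexWriter tracking = new NRTManager.TrackingIndexWriter(writer);
        NRTManager nrt = new NRTManager(tracking, new SearcherFactory());
        // Reopens searchers in the background: at most 5 s stale when nobody
        // waits, about 25 ms when a caller waits for a specific generation.
        NRTManagerReopenThread reopener = new NRTManagerReopenThread(nrt, 5.0, 0.025);
        reopener.setDaemon(true);
        reopener.start();
        try {
            Document doc = new Document();
            doc.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
            long gen = tracking.addDocument(doc);
            // Blocks until a searcher covering this change is live; no commit involved.
            nrt.waitForGeneration(gen);
            IndexSearcher searcher = nrt.acquire();
            try {
                return searcher.search(new TermQuery(new Term("id", "42")), 10).totalHits;
            } finally {
                nrt.release(searcher); // release acquired searchers, never close them
            }
        } finally {
            reopener.close(); // stop the reopen thread before closing the manager
            nrt.close();
            writer.close();   // in Lucene 3.x, close() also commits pending changes
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("hits: " + addAndSearch());
    }
}
```

The two staleness arguments trade refresh cost against freshness; the values above are arbitrary.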

For non-real-time scenarios, you would rather use SearcherManager to acquire IndexSearcher instances and an instance of IndexWriter to write documents. After a batch of documents has been written to the index (or at arbitrary intervals), call maybeRefresh(), which SearcherManager inherits from ReferenceManager, to refresh the searchers.
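
A minimal sketch of that non-real-time pattern, again assuming Lucene 3.6 and an in-memory index; the class, method and "type" field names are hypothetical. This sketch opens the SearcherManager from the writer, so uncommitted documents appear after a refresh; opening it from the Directory instead would only expose committed changes. The searcher used before maybeRefresh() does not see the batch, the one acquired after it does.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SearcherManagerSketch {

    // Counts "book" documents using a searcher borrowed from the manager.
    static int countBooks(SearcherManager manager) throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
            return searcher.search(new TermQuery(new Term("type", "book")), 10).totalHits;
        } finally {
            manager.release(searcher);
        }
    }

    // Returns {hits before maybeRefresh(), hits after maybeRefresh()}.
    static int[] writeBatchThenRefresh() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));
        // Searchers are opened from the writer (applyAllDeletes = true).
        SearcherManager manager = new SearcherManager(writer, true, new SearcherFactory());
        try {
            for (int i = 0; i < 3; i++) {
                Document doc = new Document();
                doc.add(new Field("type", "book", Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
            }
            int before = countBooks(manager); // still the searcher opened at construction
            manager.maybeRefresh();           // swap in a searcher that sees the batch
            int after = countBooks(manager);
            return new int[] {before, after};
        } finally {
            manager.close();
            writer.close();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] hits = writeBatchThenRefresh();
        System.out.println("before: " + hits[0] + ", after: " + hits[1]);
    }
}
```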

To sum up:

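  • You can have a single instance of IndexWriter and commit your changes. That single IndexWriter can also be shared by multiple threads writing to one index (only one IndexWriter can hold the write lock on an index at a time), while ConcurrentMergeScheduler runs merges in background threads.
  • Close your IndexWriter only when you are sure you have no more changes to make to the index (mind that opening an IndexWriter is very costly).
  • You never commit an NRTManager itself, as all changes are being tracked and its searchers see them without a commit. Making those changes durable on disk is still done with commit() (or close()) on the underlying IndexWriter.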
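
A minimal sketch of the "single IndexWriter" advice, under the same Lucene 3.6 assumption: four threads (an arbitrary number) share one writer, commit() makes their documents durable and visible to a reader opened on the Directory, and the writer is closed only once. ConcurrentMergeScheduler is not configured explicitly because it is already the default merge scheduler in Lucene 3.x. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SharedWriterSketch {

    // Four threads add 25 documents each through ONE shared writer; after
    // commit() a reader opened on the Directory sees all of them.
    static int indexConcurrentlyThenCommit() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Object>> results = new ArrayList<Future<Object>>();
            for (int t = 0; t < 4; t++) {
                final int thread = t;
                results.add(pool.submit(() -> {
                    for (int i = 0; i < 25; i++) {
                        Document doc = new Document();
                        doc.add(new Field("id", thread + "-" + i, Field.Store.YES, Field.Index.NOT_ANALYZED));
                        writer.addDocument(doc); // IndexWriter is thread-safe
                    }
                    return null;
                }));
            }
            for (Future<Object> f : results) {
                f.get(); // rethrows any exception raised while indexing
            }
            writer.commit(); // durable and visible to new readers; the writer stays open
            IndexReader reader = IndexReader.open(dir);
            try {
                return reader.numDocs();
            } finally {
                reader.close();
            }
        } finally {
            pool.shutdown();
            writer.close(); // in a long-running service, do this only at shutdown
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("committed docs: " + indexConcurrentlyThenCommit());
    }
}
```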
Matt Ball
Mirko
    How do I deal with the case when the server shuts down before the changes in memory are committed? Do I listen for `contextDestroyed` in `contextListener` and call `IndexWriter#submit` then close the index writer? – qualebs Nov 15 '14 at 21:31