6

We wrote and still maintain a large E-Commerce application. Our feature list resembles what you would expect from most shops. We'd like to improve some of our features, and the search/suggestion list functionality (enter some letters and a JavaScript-driven suggestion list appears) has now caught our eye.

Currently, we use Xapian (http://xapian.org/). It has two drawbacks for us. First, it is not really the right tool: it was created to index documents, not ever-changing data at the granularity an E-Commerce application needs. Second, reindexing all data every night puts significant load on the database.

We'd like a framework that has been designed for indexing database data: one that can add to the index easily and without much load, and that can propagate data changes made in the backoffice to the frontend quickly, without much load or delay.

I'm aware that Xapian is Open Source and even Free Software, so we could adapt it to our needs if we decided to invest the time and manpower. But taking a quick look around for a better-suited solution seems fair, right?

Oh, and commercial applications are fine, too. FOSS is not required.

gnat
Dabu

2 Answers

2

We had exactly the same needs that you are describing. We are a Python shop and sell around 300,000 unique items, so we needed fast updates. After testing Xapian I decided against it. What we ended up with was PyLucene: we have a server that constantly rebuilds the search indexes and periodically replaces the read-only copies that searches run against. Having a local copy of the search index on each of your servers makes things amazingly fast. Transporting the database is just copying gzipped files on disk.

We actually spoke to several larger companies and got some advice from them, and this seemed like the right way to go. If you are looking for an FTS (full-text search) engine that is updated in real time, I would look at TSEARCH2 and Postgres; otherwise Lucene or PyLucene is a great way to go. It takes about 20 seconds to build our index, and we can adjust our weights and search formulas as frequently as we like.
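The "rebuild, then replace the read-only copy" idea can be sketched roughly as follows. This is only an illustration under assumptions, not the setup described above: `build_index` is a hypothetical stand-in that writes a JSON file where a real system would build a PyLucene index, and the swap relies on an atomic symlink rename, which assumes a POSIX filesystem.

```python
import json
import os
import tempfile


def build_index(root, version, items):
    """Hypothetical stand-in for a real index build (e.g. with PyLucene).

    Writes the items into a fresh, versioned directory and returns its path.
    """
    build_dir = os.path.join(root, "index-%s" % version)
    os.makedirs(build_dir)
    with open(os.path.join(build_dir, "items.json"), "w") as fh:
        json.dump(items, fh)
    return build_dir


def publish_index(build_dir, live_link):
    """Point live_link at build_dir in one atomic step (POSIX assumption).

    Readers always open the index through live_link, so they see either
    the old complete index or the new one, never a half-written one.
    """
    tmp_link = live_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.abspath(build_dir), tmp_link)
    os.replace(tmp_link, live_link)  # rename replaces the old link atomically


def read_live(live_link):
    """Read the currently published index through the live link."""
    with open(os.path.join(live_link, "items.json")) as fh:
        return json.load(fh)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    live = os.path.join(root, "live")
    publish_index(build_index(root, "v1", ["red shoe"]), live)
    publish_index(build_index(root, "v2", ["red shoe", "blue hat"]), live)
    print(read_live(live))
```

The frontend servers only ever read through the `live` link, so a nightly or continuous rebuild never touches the index that is currently serving searches.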

Ben DeMott
0

Have a look at elasticsearch. It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.

It's being used in production for big projects, and it has several advantages over other search solutions, especially with real-time indexing and scalability.

I'm sure there are Python libraries for it around.
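Because the interface is plain HTTP and JSON, even the standard library is enough to try it out. The sketch below only builds the requests; it assumes a recent Elasticsearch on `localhost:9200`, where documents are addressed as `/<index>/_doc/<id>`, and it uses a hypothetical `products` index with a `name` field. The `match_phrase_prefix` query is one simple way to get "enter some letters" suggestions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local single-node server
INDEX = "products"                  # hypothetical index name


def _json_request(method, path, body):
    """Build (but do not send) a JSON request against the server."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def index_request(product_id, doc):
    """Request that adds or updates one product as soon as it changes."""
    return _json_request("PUT", "/%s/_doc/%s" % (INDEX, product_id), doc)


def suggest_request(prefix, size=5):
    """Request that searches product names starting with the typed letters."""
    body = {
        "size": size,
        "query": {"match_phrase_prefix": {"name": prefix}},
    }
    return _json_request("POST", "/%s/_search" % INDEX, body)


if __name__ == "__main__":
    req = suggest_request("lap")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With a server running, passing either request to `urllib.request.urlopen` sends it. Indexing a single changed product this way is what avoids the nightly full reindex mentioned in the question.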

Hakan Deryal