
I have been working on a completely offline dictionary built from the Wiktionary XML dumps. The dumps themselves are about 10 MB, but when I convert them into an index with a search engine indexer (I use the Whoosh search engine in Python), the complete index comes to about 250 MB. I think that would be difficult to distribute: it could be zipped, but it still won't come anywhere near 10 MB. Indexing also takes about 1 hour on my system, so running the indexer while installing the software on a PC would be tedious.

So I am looking for an alternative way of storing the words and meanings for the dictionary. What would be a better searchable solution? Maybe some sort of database that produces lightweight files?

Or are search engine indexes better than databases?

  • +1: I don't know if this is on topic here, but if people have information about what works well for (large) dictionary databases, I'm possibly interested. – compman Jun 19 '11 at 03:34

1 Answer


Have a look at the Directed Acyclic Word Graph data structure, which is designed to be a highly space-economical way to store dictionaries. They are commonly used on mobile phones, where economizing storage space is important.
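To give a feel for why this saves space, here is a minimal sketch in plain Python. It assumes the words are inserted in sorted order and merges equivalent suffix nodes as it goes. The names `Node`, `Dawg`, `insert`, `finish` and `node_count` are made up for illustration and do not come from any particular library. Note that a structure like this only stores the word list itself, so the meanings would still need to live somewhere else and be looked up separately.

```python
import itertools


class Node:
    """One state of the graph: outgoing letter edges plus an end-of-word flag."""
    _ids = itertools.count()

    def __init__(self):
        self.uid = next(Node._ids)
        self.final = False
        self.edges = {}

    def key(self):
        # Two nodes are interchangeable if they agree on the end-of-word flag
        # and have the same labelled edges leading to the same child nodes.
        return (self.final,
                tuple(sorted((c, n.uid) for c, n in self.edges.items())))


class Dawg:
    """Hypothetical minimal DAWG; words must be added in sorted order."""

    def __init__(self):
        self.root = Node()
        self._register = {}    # key -> canonical node for that key
        self._unchecked = []   # (parent, letter, child) not yet minimized
        self._previous = ""

    def insert(self, word):
        if word <= self._previous:
            raise ValueError("insert non-empty words in strictly sorted order")
        # Length of the prefix shared with the previously inserted word.
        common = 0
        for a, b in zip(word, self._previous):
            if a != b:
                break
            common += 1
        # Everything below the shared prefix can no longer change: minimize it.
        self._minimize(common)
        node = self._unchecked[-1][2] if self._unchecked else self.root
        for letter in word[common:]:
            child = Node()
            node.edges[letter] = child
            self._unchecked.append((node, letter, child))
            node = child
        node.final = True
        self._previous = word

    def finish(self):
        """Call once after the last insert to minimize the remaining nodes."""
        self._minimize(0)

    def _minimize(self, down_to):
        while len(self._unchecked) > down_to:
            parent, letter, child = self._unchecked.pop()
            existing = self._register.get(child.key())
            if existing is not None:
                parent.edges[letter] = existing   # reuse an equivalent node
            else:
                self._register[child.key()] = child

    def __contains__(self, word):
        node = self.root
        for letter in word:
            node = node.edges.get(letter)
            if node is None:
                return False
        return node.final

    def node_count(self):
        seen, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if node.uid not in seen:
                seen.add(node.uid)
                stack.extend(node.edges.values())
        return len(seen)


dawg = Dawg()
for w in sorted(["tap", "taps", "top", "tops"]):
    dawg.insert(w)
dawg.finish()
print("tops" in dawg, "ta" in dawg, dawg.node_count())
```

A plain trie would need a separate node for every distinct prefix of those four words. Here "tap"/"top" and "taps"/"tops" end up sharing their tail nodes, which is where the saving comes from on a large word list with many common endings.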

Robert Harvey
  • +1 for introducing me to a data structure that's obvious when you think about it but cool nonetheless. Just looking at the name I thought "Sounds like a trie except…ah, clever". – Jon Purdy Jun 19 '11 at 07:04
  • Any examples of Python libraries or implementations? –  Jun 19 '11 at 15:15