I'm trying to implement an autocomplete feature for a search engine. I have a database of words(stemmed) that occur in the documents that I have for users to search.
What I am thinking of doing is:
Constructing a graph with words as nodes and weighted edges with weights as the number of times those 2 words have been searched together.
Each node also keeps track of which is the most commonly typed word next to it.
Whenever a user starts typing his query, this graph is queried and the first word is predicted by string matching and then ranking the matched words according to search frequency.
When the user starts typing the next word, the graph is visited and the neighbours of this node are retrieved and shown to the user in order of frequency (weight of edges).
This is a rough overview of what I am trying to achieve. I have not implemented it yet. It would require a fair amount of work to implement this, and I think it would be quite slow. And to be quite honest, I don't think doing that much work knowing that it'll be quite slow is worth it.
If I am on the right track with this conceptual design, are there any optimizations that should be made to this to improve performance? Or perhaps an entirely different design that I should be using instead?
I found only 2 papers on query prediction, this and this. I went through the latter but it isn't that related to my use-case.