Let's say I have a big corpus (for example in english or an arbitrary language), and I want to perform some semantic search on it. For example I have the query:
"Be careful: [art] armada of [sg] is coming to [do sg]!"
And the corpus contains the following sentence:
"Be careful: an armada of alien ships is coming to destroy our planet!"
It can be seen that my query string could contain "semantic placeholders", such as:
[art] - some placeholder for articles (for example a / an in English) [sg], [do sg] - some placeholders for NPs and VPs (subjects and predicates) I would like to develop a library which would be capable to handle these queries efficiently. I suspect that some kind of POS-tagging would be necessary for parsing the text, but because I don't want to fully reimplement an already existing full-text search engine to make it work, I'm considering that how could I integrate this behaviour into a search engine like Lucene?
I know there are SpanQueries which could behave similarly in some cases, but as I can see, Lucene doesn't do any semantic stuff with stored texts.
It is possible to implement a behavior like this? Or do I have to write an own search engine?