This is sort of a follow-up to this question about NLG research directions in linguistics.
How do personal assistant tools such as Siri, Google Now, or Cortana perform Natural Language Generation (NLG)? Specifically, the part that generates the sentence text; I'm not interested in the text-to-speech part.
I'm not looking for exactly how each one does it, since that information probably isn't public. Rather, I'm wondering what setup is required to implement sentence generation of that quality:
- What kind of data would you need in a database (at a high level)?
- Does it require a dictionary of every possible word and its meaning, plus many annotated and statistically analyzed books/corpora?
- Does it require actually recording people talking naturally (e.g. from TV shows or podcasts), transcribing that to text, and then somehow adding it to their "system"? (to get really "human"-like sentences)
- Or are they just using simple syntax-based sentence patterns, with no gigantic semantic "meaning" database? That is, did someone just write a bunch of regular-expression-style templates?
- What algorithms are used to produce such naturally written, human-like sentences?
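To clarify what I mean by the template/pattern option: something like the following minimal sketch, where canned sentence patterns are filled with slot values (this is purely hypothetical on my part, not a claim about how any of these assistants actually works):

```python
import random

# Hypothetical template-based NLG: each intent has a few canned
# sentence patterns with named slots to fill in.
TEMPLATES = {
    "weather": [
        "It's currently {temp} degrees and {condition} in {city}.",
        "Expect {condition} skies in {city}, around {temp} degrees.",
    ],
    "reminder": [
        "OK, I'll remind you to {task} at {time}.",
        "Reminder set: {task} at {time}.",
    ],
}

def generate(intent, **slots):
    """Pick a random template for the intent and fill its slots."""
    template = random.choice(TEMPLATES[intent])
    return template.format(**slots)

print(generate("weather", temp=72, condition="sunny", city="Austin"))
print(generate("reminder", task="buy milk", time="5pm"))
```

Is production-quality sentence generation essentially this, just with far more templates, or is something semantically deeper going on?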
One reason I ask: the NLG field seems very far from being able to do what Siri, Google Now, and the others are accomplishing. So what kind of techniques are they using, just for the sentence text generation part?