
This is a follow-up to this question about NLG research directions in linguistics.

How do personal assistant tools such as Siri, Google Now, or Cortana perform Natural Language Generation (NLG)? I am interested specifically in how the sentence text is generated, not in the text-to-speech part.

I'm not looking for exactly how each one does it, as that information is probably not available.

What setup is required to implement sentence generation of that quality?

  • What kind of data would you need in a database (at a high level)?
    • Does it require a dictionary of every possible word and its meaning, along with many annotated and statistically analyzed books/corpora?
    • Does it require recording people talking naturally (such as on TV shows or podcasts), transcribing the speech to text, and somehow adding that to the "system"? (to get really "human"-like sentences)
    • Or are they just using simple syntax-based sentence patterns, with no gigantic semantic "meaning" database, essentially a pile of hand-written regular-expression-style templates?
  • What algorithms are used to produce such natural, human-like sentences?

One reason for asking is that the NLG field seems very far from being able to do what Siri, Google Now, and the others are accomplishing. So what kind of thing are they doing? (Just for the sentence text generation part.)

Lance
    There's probably a multitude of approaches but I believe one of them is at least a combination of a rule-based system and a statistically based system. Google obviously has a lot of text samples at their disposal so when you type "I want to sw.." it will look at those samples and give you "swim" and "swing at a tree" because that's what occurs the most. The rule-based system can allow for Google to also search for samples with the same grammar structure but different content (e.g. "we wanted to swim"). That's just one approach of many though. – Jeroen Vannevel Jan 18 '15 at 18:51
  • There are probably two very distinct sides to the question: Creating proper sentences and natural sounding text-to-speech. You might want to clarify which side you are most interested in. – Bart van Ingen Schenau Jan 18 '15 at 19:53
    Usually these sentences are not generated, but *retrieved* from the limitless corpus that the internet now constitutes. Peter Norvig has a nice item in *Beautiful Code* showing how some NLP problems basically solve themselves once you've got access to a trillion-word corpus. We like to believe that useful sentence generation is on a completely different level of difficulty than the obvious segmentation or hyphenation, but it isn't really; not when you have that much example data to select from. – Kilian Foth Jan 18 '15 at 20:07
  • @BartvanIngenSchenau updated, I am not interested in the text-to-speech part, only how to naturally construct text sentences. – Lance Jan 18 '15 at 20:47
  • @KilianFoth ah that is very interesting, thanks for the insight. Will look more into that (guessing it's this http://norvig.com/ngrams/). Are you saying it's not worth it nowadays to try other methods? – Lance Jan 18 '15 at 20:50
  • @KilianFoth While individual sentences may be just copy/pasted from the trillions of sentences out there potentially, it doesn't seem possible to construct a simple abstract/summary sort of thing in this fashion. To combine multiple sentences requires a whole bunch of other stuff, such as making a coherent narrative, how to make it interesting, etc.. I am interested in how that component is handled as well by these tools. – Lance Jan 18 '15 at 21:01
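The corpus-lookup idea from the comments (rank continuations by how often they occur in real text) can be illustrated with a toy bigram model. The tiny corpus below is a stand-in for the web-scale data a search company would actually use; everything here is a simplified sketch, not how Google implements it.

```python
from collections import Counter

# A stand-in corpus, tokenized into words. In practice this would be
# trillions of words of real text.
corpus = (
    "i want to swim . i want to swim . i want to swing at a tree . "
    "we wanted to swim ."
).split()

def completions(prev_word, prefix):
    """Rank words that follow `prev_word` and start with `prefix`,
    most frequent first."""
    counts = Counter(
        nxt for w, nxt in zip(corpus, corpus[1:])
        if w == prev_word and nxt.startswith(prefix)
    )
    return [word for word, _ in counts.most_common()]

print(completions("to", "sw"))  # ['swim', 'swing'] — 'swim' occurs more often
```

The ranking falls out of simple frequency counting; with enough data, the most common continuation is usually the most natural one, which is the point Kilian Foth's comment makes.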

2 Answers


Siri typically doesn't "generate" sentences. She parses what you say and recognizes certain keywords, and for common responses she fills in a template, such as I found [N] restaurants fairly close to you or I couldn't find [X] in your music, [Username].
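That template-filling approach can be sketched in a few lines. The intent names, template strings, and slot names below are invented for illustration; they are not Siri's actual internals.

```python
import random

# Illustrative response templates keyed by recognized intent.
# Slots in braces are filled from the parsed query.
TEMPLATES = {
    "restaurant_search": [
        "I found {count} restaurants fairly close to you.",
        "There are {count} restaurants near you.",
    ],
    "music_not_found": [
        "I couldn't find {title} in your music, {user}.",
    ],
}

def respond(intent, **slots):
    """Pick one template for the intent at random and fill its slots."""
    template = random.choice(TEMPLATES[intent])
    return template.format(**slots)

print(respond("restaurant_search", count=3))
print(respond("music_not_found", title="Yellow Submarine", user="Lance"))
```

The `random.choice` step is also how the "random number generator" variety described below would work: several canned phrasings per intent, one picked at answer time.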

But most of her responses are canned, depending on her interpretation of your speech, in addition to a random number generator to choose a creative answer to a flippant question. Simply asking Siri "How much wood can a woodchuck chuck?" or "What is the meaning of life?" will produce any of a variety of answers. There are numerous cultural references and jokes built in (and repeated verbatim) that show with relative certainty that Siri is not spontaneously generating most of her text, but pulling it from a database of some sort. It's likely that incoming questions are saved to a central server, where new responses to those questions can be created by Apple employees, allowing Siri to "learn".

Her text-to-speech part is good enough, however, that it sometimes makes it seem as though the answers are being generated...

Ayelis

If you have a so-called deep syntactic representation of what you want to generate, such as read(he,book), then it's relatively easy to generate its linear representation. One needs a formal grammar describing the syntax of the language and a morphological lexicon for inflected forms. Generation is an order of magnitude easier than analysis (since one is "creating ambiguity", not resolving it).
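A toy surface realizer makes the point concrete. The grammar below is a single fixed subject-verb-object rule and the morphological lexicon has only a couple of entries; it is a minimal sketch of the idea, not a real grammar formalism.

```python
# Realize a deep syntactic representation pred(subj, obj) as a sentence.
# The "grammar" is a fixed SVO rule; LEXICON supplies inflected forms.
LEXICON = {
    "read": {"past": "read", "pres_3sg": "reads"},
    "buy":  {"past": "bought", "pres_3sg": "buys"},
}
PRONOUNS = {"he", "she", "it", "they"}

def realize(pred, subj, obj, tense="past"):
    """Linearize pred(subj, obj), inflecting the verb for the tense."""
    verb = LEXICON[pred][tense]
    obj_np = obj if obj in PRONOUNS else "a " + obj
    return f"{subj.capitalize()} {verb} {obj_np}."

print(realize("read", "he", "book"))  # He read a book.
```

Note there is no search and no disambiguation anywhere in this direction: each representation maps straight to a string, which is why generation from this level is so much easier than analysis.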

If you have only a logical representation (say, in first-order logic), things get more complicated. Say you have buy(John,book) ∧ read(John,book). One could generate two sentences, John bought a book. John read a book, but that feels unnatural. A better output would be John bought a book. He read it. Better still would be a single compound sentence joined with and. The logical representation may look similar to the deep syntactic representation above, but it has no pronouns, no clause boundaries, etc. The phase that translates a purely logical representation of what one wants to convey into something more "human-like" is called "language planning" or "sentence planning", and it is the harder task in the process.
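A crude planner for exactly this example can be sketched as follows: it walks an ordered list of facts, pronominalizes repeated mentions, and aggregates the clauses with and. Real sentence planners must handle reference, ordering, and aggregation far more carefully; this only shows the shape of the task.

```python
# Toy sentence planner: facts are (predicate, subject, object) tuples
# in discourse order. Repeated mentions become pronouns; clauses are
# joined with "and". Lexicon and pronoun choices are illustrative.
PAST = {"buy": "bought", "read": "read"}

def plan(facts):
    clauses = []
    seen_subj = seen_obj = None
    for pred, subj, obj in facts:
        s = "he" if subj == seen_subj else subj
        o = "it" if obj == seen_obj else "a " + obj
        clauses.append(f"{s} {PAST[pred]} {o}")
        seen_subj, seen_obj = subj, obj
    sentence = " and ".join(clauses)
    return sentence[0].upper() + sentence[1:] + "."

print(plan([("buy", "John", "book"), ("read", "John", "book")]))
# John bought a book and he read it.
```

Even this tiny version has to track discourse state (what was last mentioned) across clauses, which is exactly the kind of bookkeeping that makes planning harder than linearization.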

Atamiri