How to teach a script to detect sarcasm?

Question

I'm currently building a fun script that matches given phrases and gives a predefined response based on match points. You can ask it to retrieve information from live feeds, run tasks, tell anecdotes, or just chat with her.

I have already built in detection for bad words, caps lock, or both. The program has a girl's name, and I tried to make its logic as close to a girly-girl's as possible (for example: everybody knows that most girly-girls take 700 ms to respond to a question; joking, of course). Here is a little example:

Client: WHAT IS YOUR PROBLEM?
Kiku: DONT USE THAT TONE WITH ME!
Client: #### you
Kiku: why are you being so mean to me :/

However, I would really like to add a sarcasm feature, so that if you write something sarcastic, she will detect it and respond accordingly. This is the tricky part: how do you teach a script what sarcasm is?

To be more specific: what are the most common sarcastic words used today, and how could I get that statistic? How can I make the script understand the context of a given phrase?

UPDATE

As this question is getting a lot of hype, I think things should be clarified a bit. It is very clear that making a script fully detect sarcasm is basically impossible, at least in any reasonable way. However, I do believe that some possible sarcasm could be detected.

So far, my script can detect a very limited amount of sarcasm. I predefined some common sarcastic words (on their own they are useless), for example: like, whatever, yeah, right and great. The script first matches the simple things, like uppercase and quoted words: THANKS you are so smart or oh you are so "SMART".
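A minimal sketch of the surface checks described above, written in Python purely for illustration (this is not the author's code). The function name `sarcasm_hints`, the "fewer than half the words shouted" rule and the choice that cue words only count alongside another hint are all assumptions.

```python
import re

# Cue words from the question; on their own they prove nothing.
CUE_WORDS = {"like", "whatever", "yeah", "right", "great"}


def sarcasm_hints(message: str) -> list:
    """Return the surface hints found in a message (a rough filter, not a detector)."""
    hints = []
    words = re.findall(r"[A-Za-z']+", message)
    # A few shouted words in an otherwise lowercase message, e.g. "THANKS you are so smart".
    # Mostly-uppercase text is treated as caps lock instead (assumed cut-off: half the words).
    shouted = [w for w in words if len(w) > 1 and w.isupper()]
    if shouted and len(shouted) < len(words) / 2:
        hints.append("emphasized-word")
    # A single word in quotes, e.g. oh you are so "SMART".
    if re.search(r'"\w+"', message):
        hints.append("quoted-word")
    # Cue words only count when some other hint is already present.
    if hints and any(w.lower() in CUE_WORDS for w in words):
        hints.append("cue-word")
    return hints
```

For example, `sarcasm_hints('oh you are so "SMART"')` reports both an emphasized word and a quoted word, while an all-caps message such as "WHAT IS YOUR PROBLEM?" produces no hints and is left to the caps-lock check.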

Since the script's main function is to do tasks or retrieve information, afterwards it asks whether that was what you meant. So I thought of adding "thanks" as a special variable: yeah thanks or whatever thanks will trigger possible sarcasm, and the script will ask you: "Do I detect sarcasm?" Your best bet is to say "sorry" then; otherwise it adds a warning point, and if the limit is reached, it will start ignoring you.
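The "thanks" trigger and warning-point flow can be sketched as a small state machine. The trigger words, the `WARNING_LIMIT` of 3 and the canned replies are hypothetical, since the actual limit isn't stated.

```python
from dataclasses import dataclass

# Hypothetical words that turn "thanks" into possible sarcasm.
SARCASM_TRIGGERS = {"yeah", "whatever", "sure"}
# Hypothetical number of warning points before the bot ignores the client.
WARNING_LIMIT = 3


@dataclass
class Conversation:
    warnings: int = 0
    awaiting_apology: bool = False
    ignoring: bool = False

    def reply(self, message: str) -> str:
        if self.ignoring:
            return ""  # limit reached: stay silent
        text = message.lower()
        if self.awaiting_apology:
            # The previous reply was "Do I detect sarcasm?"
            self.awaiting_apology = False
            if "sorry" in text:
                return "Apology accepted."
            self.warnings += 1
            if self.warnings >= WARNING_LIMIT:
                self.ignoring = True
                return "I'm not talking to you anymore."
            return "Hmph. That's a warning."
        words = text.replace(",", " ").split()
        if "thanks" in words:
            if any(w in SARCASM_TRIGGERS for w in words):
                self.awaiting_apology = True
                return "Do I detect sarcasm?"
            return "You're welcome!"
        return "OK."  # stand-in for the normal phrase-matching path
```

Keeping the state on the conversation object means each client accumulates its own warning points.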

As these very, very simple algorithms seem to actually work, the idea has a future, after a lot of tuning and tweaking, of course. And if somebody a lot smarter made open-source software with the same idea in mind, this feature could be hooked into a lot of functions on the web. Customer care would probably benefit the most, but this kind of software could also be used for detecting "flaggable" content.

Until your script can actually _understand_ what is being said (rather than picking up keywords), I think it's impossible to catch most sarcasm scenarios — Rob, Sep 11 '11 at 23:34
@Rob I'm starting to think the same. Handling most scenarios is too much to hope for anyway. Though, weirdly enough, I've gotten some ideas from the comments on this question. For example, some UPPERCASE words in a phrase or "bunny ears" (quotation marks) are usually related to sarcasm. So I'm a little smarter. However, that is not enough :/ — Kalle H. Väravas, Sep 12 '11 at 00:02
@Kalle Sure, I think quotations around a single word could be a hint - however uppercase words could be used for emphasis as well — Rob, Sep 12 '11 at 00:22
I don't think they reveal their algorithms for obvious reasons, but [ToneCheck](http://tonecheck.com/) is trying to do something like this: analyze the tone of emails and suggest improvements. I'm not sure if they do sarcasm as well, but either way it looks neat and related. Maybe you can install it, write some sarcastic emails, and see if it picks that up? :) — Adam Lear, Sep 12 '11 at 00:37
@Anna Lear: ToneCheck seems very interesting. However, I don't quite get how they are detecting it, and the system is most likely bigger than *my house*. Still, I got an idea of what may trigger sarcasm detection. Thanks for the link. — Kalle H. Väravas, Sep 12 '11 at 01:40
One easy way would be to just flag all anomalous statements, cliche phrases, and words in CAPS-LOCK. Otherwise, super-AI would help. — Caffeinated, Sep 12 '11 at 02:00
CAPS-LOCK gets detected before sarcasm (in the form of most of the string's characters being uppercase). With that I'm hoping to nerf the use of uppercase characters. So if we know that the client doesn't use many uppercase characters, then we can treat one or more uppercase words in a string as possible sarcasm. — Kalle H. Väravas, Sep 12 '11 at 02:26
Here are two current scientific papers about developments in sarcasm detection. This is cutting-edge stuff, not production-ready and very difficult to copy, like (unfortunately) most current developments in NLP: [paper from the Hebrew University of Jerusalem](http://staff.science.uva.nl/~otsur/papers/sarcasmAmazonICWSM10.pdf), [paper from Valencia Technical University](http://aclweb.org/anthology-new/W/W11/W11-1715.pdf). Both approaches use statistical inference on hand-annotated corpora of Amazon product reviews. — Felix Dombek, Sep 12 '11 at 03:43
@Kalle Let's be clear here. Sarcasm is one of the most subtle and advanced devices in spoken language. Even if you're a native English speaker, fully capable of picking up on the subtleties e.g. tone of voice and any relevant contextual information, you'll regularly fail to detect sarcasm. Non-native speakers stand almost no chance whatsoever. Take it to text and even the native speakers stand almost no chance. _And you want computers, which struggle to dimly comprehend even the most simple of sentences, to solve this problem?_ Leave this to someone with a lifetime in speech and text analysis. — doppelgreener, Sep 12 '11 at 03:58
There are some interesting threads [here](http://english.stackexchange.com/search?q=sarcasm) on *people* recognising sarcasm... — Benjol, Sep 12 '11 at 06:55
I wouldn't think it's possible to get a computer recognising sarcasm beyond a rudimentary level, from predictable, commonplace sarcastic phrases. If humans so often miss sarcasm in text (because so much of sarcasm is tone of voice, facial expression and body language), then I don't hold much hope for computers. — Andy Hunt, Sep 12 '11 at 08:42
I don't think this question should've been closed, especially not as "not constructive". I'd cast a reopen vote right now if it weren't binding, but I think this question deserves a few reopen votes especially with the latest edit. — Adam Lear, Sep 12 '11 at 20:15
@Anna: Agreed. However, since my own interests are involved, my opinion is compromised. Still, I thought this fit the format of Programmers, or I wouldn't have posted it. At least it was open long enough for programmers from different time zones to respond :) — Kalle H. Väravas, Sep 12 '11 at 21:25
Sorry this is a really silly question, but how do you vote to reopen? — Rei Miyasaka, Sep 13 '11 at 00:24
@Rei: [You need 3000rep for that.](http://meta.programmers.stackexchange.com/questions/680/why-is-there-no-vote-to-reopen) — Kalle H. Väravas, Sep 13 '11 at 00:56
@Zenzelezz: The Meditation Guru at Node 42 detected sarcasm at line 1 of comment 11. Thanks for playing. — Robert Harvey, Nov 08 '11 at 19:45
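Kalle's comment above about comparing against a client's usual uppercase usage could look roughly like this. `UppercaseBaseline` and its 10% threshold are hypothetical; a real bot might also want to decay the baseline over time.

```python
class UppercaseBaseline:
    """Tracks how much uppercase a single client normally uses (hypothetical helper)."""

    def __init__(self, threshold: float = 0.1) -> None:
        self.threshold = threshold  # assumed cut-off for a client who "rarely shouts"
        self.letters = 0
        self.uppercase = 0

    def observe(self, message: str) -> None:
        letters = [c for c in message if c.isalpha()]
        self.letters += len(letters)
        self.uppercase += sum(1 for c in letters if c.isupper())

    def ratio(self) -> float:
        return self.uppercase / self.letters if self.letters else 0.0

    def is_possible_sarcasm(self, message: str) -> bool:
        # If this client shouts a lot anyway, an uppercase word tells us little.
        if self.ratio() > self.threshold:
            return False
        words = message.split()
        shouted = [w for w in words if len(w) > 1 and w.isupper()]
        # Some, but not all, words shouted; all-caps messages are caps lock.
        return 0 < len(shouted) < len(words)
```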

Dave Nay · score 43 · Answer 1 · 2013-01-07T04:56:06.543

if(string.Contains("<Sarcasm>")) containsSarcasm = true;

</Sarcasm>

Honestly, I have no idea how to go about this. I think only about 30% of people in real life "get" sarcasm in the first place, so making a computer recognize it and understand it sounds like a very difficult task.

Edit: Based on the comments on my original post, I believe I have perfectly illustrated the extreme difficulty of the task being asked. Yes, the first half of my post was sarcastic. I even emphasized that fact by using a made-up markup tag. Posting a sarcastic comment about a trivial solution to an exceptionally difficult problem, and having that sarcasm interpreted as "unhelpful", raises this question:

If you can't recognize written sarcasm, then how are you going to write an algorithm that recognizes it?

Oh, and Anna, if someone includes the term "I think..." in their sentence, it usually indicates that an opinion is being stated, not necessarily researched hard fact.

edited Jan 07 '13 at 04:56

answered Sep 11 '11 at 23:08

Dave Nay

For the record, the back-end platform that deals with the matches is PHP. I deliberately didn't mention it, since matching functions are mostly the same on all platforms. The problem is: what do I match against? I don't quite follow your answer. Do you mean, if the client writes hey there smartypants? But the client is not going to do that. Also, you are missing a `)` in your if statement. – Kalle H. Väravas Sep 11 '11 at 23:26

-1 I'm sorry, I'm not sure why this answer is getting upvotes. The code provided is 100% irrelevant to the question at hand, and in the second part of the answer you are saying "I have no idea". Your answer is 30% on-topic and doesn't really help or give any good direction. – Kalle H. Väravas Sep 11 '11 at 23:59

It's getting upvoted because the poster has pointed out that there's no practical way to do this. Since humans often can't detect sarcasm, there's no way you can teach a machine to. Also, sarcasm is often conveyed by tone of voice, and hence often misunderstood in online communication. – Andy Waite Sep 12 '11 at 00:03

@Andy Waite Yes, of course, you are correct. As I said, his answer is 30% on-topic (even though he started getting upvotes when there was no additional text). Still, I won't accept that answer, because... yes, of course you cannot get a 100% match. But even a 50% match? This answer is basically saying, "I don't understand your question, I think it's not possible, sounds hard, just quit!" I mostly only get these types of answers on Programmers, next to "Why are you doing this? Use a framework!" – Kalle H. Väravas Sep 12 '11 at 00:12

@Kalle I agree with you. This answer is part sarcasm, part "I don't know", and part guesswork (30%? Really? Can you back that up?). It's entirely unhelpful and should've at best been a comment. – Adam Lear Sep 12 '11 at 00:34
@Anna Lear Sadly, it keeps getting upvotes. Typical: a person lands on a question page, views the title and the quoted/code example, checks the most upvoted answer and upvotes it. I'm tempted to flag this answer, but it does fit the format (barely). I'm also sad that the person doesn't respond to my comments. I have the feeling that this user does have some experience and could make this answer better. – Kalle H. Väravas Sep 12 '11 at 01:15

@Kalle: I was typing my edit during your last comment. ;-) I was at the grocery store when you and Anna wrote your other comments. – Dave Nay Sep 12 '11 at 01:24
@DaveNay: I see. Well, your edit clears it up and I removed my -1, but the answer is still telling me to "just quit" rather than "try something". I already have some slight ideas on how to detect some sarcasm, and I don't agree that it is totally impossible. – Kalle H. Väravas Sep 12 '11 at 01:28
@Kalle: You agreed with Rob in the comments to your question above. – Dave Nay Sep 12 '11 at 01:30
And for the record, I never said impossible in my post or comments. – Dave Nay Sep 12 '11 at 01:32
@DaveNay: I agree that predefining sarcastic scenarios is actually possible, but very difficult. Obviously a 100% match is going to be impossible, but some percentage should be possible. I also think that making such a point through incorrect syntax, as you did in the first part of your answer, is not obvious in a programming community, and I still see it as a very poor answer. – Kalle H. Väravas Sep 12 '11 at 01:37
For the record, you didn't say it is possible either, or give any valuable direction. As far as I understand, you were only being sarcastic and basically mocking my attempt to do something you wouldn't do yourself. Correct? – Kalle H. Väravas Sep 12 '11 at 01:43

I can't say it's possible because I have no idea how to do it. I was in no way mocking you at all. – Dave Nay Sep 12 '11 at 01:45
This research may help you out. It's a very different problem but I suspect there are similarities. http://www.physorg.com/news/2011-05-computer-program-understands-the-thats.html – Tyler Sep 12 '11 at 02:37
@DaveNay: The statement is missing a closing parenthesis and would be better expressed as `containsSarcasm = string.Contains("<Sarcasm>")` anyway. However: if someone is tackling an open problem, it's bad form to ridicule the OP's request, especially given that you're not knowledgeable in this field. Computational linguistics has existed since the 1950s and has yielded some interesting insights into [sentiment analysis](http://en.wikipedia.org/wiki/Sentiment_analysis), [semantic role labeling](http://en.wikipedia.org/wiki/Semantic_role_labeling) and other semantic techniques which might be employed for this. – Felix Dombek Sep 12 '11 at 03:26

Interesting comment thread - apparently even humans can't detect sarcasm (or lack thereof) reliably. – Piskvor left the building Sep 12 '11 at 08:36
@Kalle H.: Just accept the other better answer, and over time it will receive more upvotes. You won't get much better answers anymore. – Steven Jeuris Sep 12 '11 at 10:01

@Steven: This is a joke answer. It is as helpful as "google how to detect sarcasm". And actually, I did get a lot better answers. – Kalle H. Väravas Sep 12 '11 at 18:22
Funnily enough, the code you wrote wouldn't succeed in matching your sarcasm tags :P – back2dos Sep 12 '11 at 20:07
`Oh, and Anna if someone includes the term "I think..." in their sentence, it usually indicates that it is an opinion that is being stated, not necessarily researched hard fact.` Really? I had no idea. Seriously though, opinion or not, I still think your answer is unhelpful. It may illustrate a point, but I think if you don't have any idea how to approach a problem, maybe you should leave a comment instead. – Adam Lear Sep 12 '11 at 20:12
@Kalle - What good would 50%, i.e. being wrong half the time, be? I think he nailed it: not possible, and I would add that it's likely a bad idea (although extremely interesting). Just think about the scenario where a computer takes sarcasm seriously. This could lead down a very convoluted path. – JSON Oct 23 '16 at 01:36

Charles E. Grant · score 17 · Accepted Answer · 2011-09-12T01:03:47.093

If you had a full natural language processing system and a database of facts à la the IBM Watson system, you might be able to flag some statements as possible sarcasm. For example, in "I hear your mother has cancer and you just got fired!" "Yeah, isn't life wonderful!", the reply could be flagged because the system could recognize that getting cancer and losing a job are not generally described as positive experiences.
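A Watson-class system would draw on real world knowledge; as a crude stand-in, the "positive reply to a negative event" check can be sketched with two tiny hand-written word lists. Both lexicons are hypothetical placeholders, not a real sentiment resource.

```python
import string

# Toy lexicons standing in for world knowledge and sentiment analysis (hypothetical).
NEGATIVE_EVENTS = {"cancer", "fired", "died", "divorce", "evicted"}
POSITIVE_WORDS = {"wonderful", "great", "fantastic", "lovely", "perfect"}


def tokens(text: str) -> set:
    # Lowercase words with surrounding punctuation stripped.
    return {w.strip(string.punctuation).lower() for w in text.split()}


def sentiment_contrast(context: str, reply: str) -> bool:
    """Flag a reply that sounds positive about something generally seen as negative."""
    bad_news = tokens(context) & NEGATIVE_EVENTS
    happy_words = tokens(reply) & POSITIVE_WORDS
    return bool(bad_news) and bool(happy_words)
```

With the example from the answer, the context contains "cancer" and "fired" while the reply contains "wonderful", so the pair is flagged; the same reply to good news is not.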

I assume you don't have the resources to put together a Watson-grade system. You could put together a database of commonly used sarcastic phrases, and then use some sort of text-matching algorithm between the target statement and the sarcasm database. I'd guess it won't be very effective, because phrases that are used sarcastically are used sincerely more often. For example, "That's a nice X." is usually used sincerely, but is sometimes used sarcastically.
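One way to sketch the phrase-database approach is fuzzy matching with Python's standard `difflib.SequenceMatcher`. The phrase list and the 0.8 threshold are assumptions; as noted above, a match only means the phrase *can* be sarcastic, since the same words are often used sincerely.

```python
import re
from difflib import SequenceMatcher

# Hypothetical phrase database; a real one would be much larger.
SARCASTIC_PHRASES = ["yeah, right", "oh, great", "thanks a lot", "as if"]


def normalize(text: str) -> str:
    # Lowercase, drop punctuation (keeping apostrophes), collapse whitespace.
    return " ".join(re.sub(r"[^\w\s']", " ", text.lower()).split())


def best_phrase_match(message: str, threshold: float = 0.8):
    """Return (phrase, similarity) for the closest stored phrase, or None."""
    msg = normalize(message)
    scored = [(p, SequenceMatcher(None, normalize(p), msg).ratio()) for p in SARCASTIC_PHRASES]
    phrase, score = max(scored, key=lambda pair: pair[1])
    return (phrase, score) if score >= threshold else None
```

Character-level similarity is only one choice; it tolerates typos and missing punctuation ("Yeah right!" matches "yeah, right"), but knows nothing about meaning.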

Sarcasm is very closely related to deception. It's not uncommon for a person using sarcasm to deny it when challenged on it, and their choice of words makes denial possible. I suspect this means that a good sarcasm detector is probably about as hard a problem as a conversational program that passes the Turing test.

edited Sep 12 '11 at 01:03

answered Sep 12 '11 at 00:50

Charles E. Grant

Thanks for your answer. Its quality is definitely better than the one above. I think creating a database of actual phrases is going to get too big. However, a database of words plus some other methods could actually make it work (not 100%, but then again not 0%). I'm planning to add a verification step. If the script detects a slight level of sarcasm, it will ask: Was that sarcasm? If you respond "OH NO", then it will confirm the sarcasm, actually get mad and ignore you for some time (the time will be determined by the level of sarcasm). – Kalle H. Väravas Sep 12 '11 at 01:24

I really think you want to stick with phrases, because sarcasm is all about context, and there are at least a few phrases that are more likely to be used sarcastically. "Yeah" isn't sarcastic, "Right" isn't sarcastic, but "Yeah, right!" is probably used sarcastically more often than sincerely. – Charles E. Grant Sep 12 '11 at 01:56

As for the quality of the answers here, I have to say this is partially your responsibility. Natural language processing is a well-known field that could provide you with several useful techniques, but you give no indication of having done any research into existing methods. It's a tough problem, and not something that can usefully be answered in general in an SO post. Many, many books have been filled on the topic of natural language processing. – Charles E. Grant Sep 12 '11 at 02:07

score 11 · Answer 3 · answered Sep 12 '11 at 03:49

The problem of sarcasm detection is an open problem in computational linguistics; you'd be better served by searching Google Scholar than Stack Exchange for such things. There has, however, been some progress made on the issue. For spoken sarcasm, a robust recognizer can be built using "spectral and contextual features" that (the authors claim) detects sarcasm as well as a human annotator. The authors of the paper therefore claim that the raw text is not enough to detect sarcasm; indeed, they got better results by ignoring the actual words being said.

Tsur et al. have also reported some interesting results in textual sarcasm detection just last year with their SASI algorithm. They also report some additional followup findings in another paper.

In any case, this is the cutting edge of computational language research; do not expect anyone to hand you a libsarcasm on a silver platter. You will need large training datasets and a lot of free time to tweak your sarcasm detector - and even then, a precision of 77% (as reported in the SASI paper) isn't enough to reject a post based solely on a sarcasm flag.
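To make the 77% figure concrete: precision is the fraction of flagged items that really are sarcastic, so at that precision roughly 23 of every 100 flagged posts would be sincere ones rejected by mistake. The counts below are made up purely to illustrate the arithmetic.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged items that really were sarcastic."""
    return true_positives / (true_positives + false_positives)


# Made-up counts: 100 posts flagged, 77 of them actually sarcastic.
flagged_sarcastic = 77
flagged_sincere = 23

p = precision(flagged_sarcastic, flagged_sincere)
wrongly_rejected = flagged_sincere / (flagged_sarcastic + flagged_sincere)
print(f"precision = {p:.2f}, sincere posts among the flagged = {wrongly_rejected:.0%}")
```

Note that precision says nothing about how many sarcastic posts were missed entirely; that is recall, a separate number.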

Hmm, 77%? That seems to be better than most *people*. (no sarcasm intended) — Piskvor left the building, Sep 12 '11 at 08:38
@Piskvor, quite possibly, but I don't have statistics on that. It's better precision than the audio one at least. — bdonlan, Sep 12 '11 at 15:06

score 2 · Answer 4 · answered Sep 12 '11 at 03:08

I don't think this answer is a very realistic approach, but if you had the resources to do it, I believe it would be possible. Consider Google's reCAPTCHA project, which uses human beings to decipher words that computers cannot read ("Learn More Recaptcha Page"). I believe the problem is similar, in that you are trying to get a machine to figure out something humans are already better at doing.

Imagine you had the resources to ask millions of people to identify sarcasm for you within a typed conversation. Imagine you could ask that many people to submit the exact moment in the conversation when they realized it was sarcasm, along with the minimum amount of preceding conversation needed to make that identification. This could be stored in a database, let's say, which your program had access to. Then, as the user typed the conversation to you, the database could be filtered for "similar" conversations.

How to evaluate similarity is something to think about, but I believe there is probably existing research on it. I believe it would be very much like the theory behind spelling-error correction. Either way, it would probably come down to a probability that the conversation being typed is in fact sarcastic, and at some point a threshold would have to be chosen.
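The crowd-annotated database idea could be prototyped as a similarity-weighted nearest-neighbour vote. Similarity here is Jaccard overlap of word sets, which is just one simple choice; the annotated snippets, `k=2` and the 0.6 threshold are all hypothetical.

```python
import re

# Hypothetical crowd-annotated snippets: (conversation up to the "aha" moment, was it sarcasm?)
ANNOTATED = [
    ("my car broke down again, great, just great", True),
    ("the concert was great, loved every minute", False),
    ("oh sure, because that always works", True),
    ("sure, I can help you move on saturday", False),
]


def words(text: str) -> set:
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    # Size of the shared word set divided by the size of the combined word set.
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def sarcasm_probability(text: str, k: int = 2) -> float:
    """Similarity-weighted vote of the k most similar annotated snippets."""
    scored = sorted(((jaccard(text, t), label) for t, label in ANNOTATED), reverse=True)[:k]
    total = sum(score for score, _ in scored)
    if total == 0:
        return 0.0  # nothing similar in the database
    return sum(score for score, label in scored if label) / total


def is_sarcastic(text: str, threshold: float = 0.6) -> bool:
    return sarcasm_probability(text) >= threshold
```

"my phone broke down again, great, just great" lands closest to the annotated car complaint and is flagged, while "the party was great, loved it" sits nearest the sincere concert review and is not.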

I do also like your idea of presenting the question "Was that sarcasm?" to the user and then using their response to reach a more accurate decision.

I do hope my answer was not a complete waste and I wish you luck in this endeavor.

-Asaf

score 1 · Answer 5 · answered Sep 12 '11 at 02:30

Sarcasm detection in computational linguistics (aka natural language processing) is an extremely difficult problem in its own right. It is basically a classification problem, for which a model must first be trained. A similar problem, finding double entendres (PDF file), was recently researched and published. The techniques for both problems are comparable.
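To illustrate what "a model must first be trained" means, here is a tiny multinomial naive Bayes classifier with add-one smoothing. Naive Bayes is swapped in as the simplest trainable classifier; it is not the method used in the double-entendre or SASI papers, and six hand-labelled examples are far too few for anything beyond a demo.

```python
import math
import re
from collections import Counter

# Six hand-labelled toy examples (hypothetical); real work needs thousands.
TRAINING_DATA = [
    ("oh great another monday", True),
    ("yeah right like that will work", True),
    ("wow thanks for nothing", True),
    ("great job on the release", False),
    ("thanks for the help", False),
    ("that will work fine", False),
]


def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()

    def train(self, examples) -> None:
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))

    def log_score(self, text: str, label: bool) -> float:
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        label_words = sum(self.word_counts[label].values())
        # Log prior: share of training documents with this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((self.word_counts[label][word] + 1) / (label_words + vocab_size))
        return score

    def predict(self, text: str) -> bool:
        """True means the model considers the text sarcastic."""
        return self.log_score(text, True) > self.log_score(text, False)


model = NaiveBayes()
model.train(TRAINING_DATA)
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer messages.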

score 1 · Answer 6 · answered Sep 12 '11 at 09:46

My 2 cents:

Ask a psychologist how to recognize sarcasm in phrases, and use that information to compare against the input.

But it would be a really hard project; with the effort that would take, you could surely build the best OS in the world :P

dsocolobsky
