86

I have a website e-mail form. I use a custom CAPTCHA to prevent spam from robots. Despite this, I still get spam.

Why? How do robots beat the CAPTCHA? Do they use some kind of advanced OCR or just get the solution from where it is stored?

How can I prevent this? Should I change to another type of CAPTCHA?


I am sure the e-mails are coming from the form, because they are sent by the mail sender that handles the form messages, and the message layout is the same.

For the record, I am using PHP + MySQL, but I'm not looking for a fix for this specific site. I am interested in how robots beat these technologies in general. I described my situation only as an example, so you can better understand what I'm asking about.

gnat
totymedli
  • 86
    I want to counter your question: how can humans beat CAPTCHA? More often than not it takes me multiple, sometimes very many, attempts to get through. – Pieter B Mar 06 '13 at 12:43
  • 14
    If somebody thinks this question deserves a downvote, at least explain to me why. – totymedli Mar 06 '13 at 12:47
  • @totymedli: It's a great question but not on topic. Why people burn rep to do what will eventually get taken care of is beyond me. – Blrfl Mar 06 '13 at 13:13
  • 10
    @Blrfl, I don't understand how this is off topic. –  Mar 06 '13 at 14:00
  • 5
    Can you tell us a bit more about the captchas you use? – mhr Mar 06 '13 at 14:03
  • 4
    @Blrfl Downvoting questions doesn't lower your rep (unless that rule reverted back recently) – KChaloux Mar 06 '13 at 14:07
  • The cost of getting captchas solved by humans is about $2 per 1000 images with about 96% accuracy. If whatever is on the other side is worth anything, someone will make a bot and exploit it. – idstam Mar 06 '13 at 14:30
  • 4
    You're not using a CAPTCHA made by sony, are you? http://cryptogasm.com/2011/07/sony-captcha-fail/ – whatsisname Mar 06 '13 at 14:39
  • 48
    If you want us to answer your question, first prove you are not a robot. – Pete Kirkham Mar 06 '13 at 14:49
  • 1
    @dan1111: I don't understand what this has to do with programming. – Blrfl Mar 06 '13 at 15:51
  • 1
    Just to check, are you sure the e-mails are coming from the form? Seems like it would be easier to spam you by just *sending directly to the e-mail address*. – deworde Mar 06 '13 at 17:09
  • @deworde Yes, because it is sent from my email-sender that serves the form messages. Also the letter style is the same. – totymedli Mar 06 '13 at 17:32
  • @KChaloux I thought downvoting -anything- on a non-Meta site was -1 rep to the downvoter, which is returned if the question is closed/deleted – Izkata Mar 06 '13 at 17:46
  • 8
    @Blrfl It does, however, have everything to do with quality software development. – Izkata Mar 06 '13 at 17:46
  • @Blrfl Unless the effect isn't instant, I just tried it by downvoting this question (undid it later, don't worry). I don't appear to have lost any rep during the downvote period. I remember there being a distinct change in policy about this, in order to encourage people to use their votes to clear out bad questions faster. – KChaloux Mar 06 '13 at 17:56
  • @KChaloux: Must've missed that. I can vote to close, so I just do that. – Blrfl Mar 06 '13 at 18:03
  • 2
    The best captcha alternative I've found is to ask a question in your registration form that your audience would know the answer to, but a random person wouldnt. This is obviously best suited for niche topic sites. I do this on a phpbb site, and only one or two spam posts a month get through, if that. Some must be manually filling out the registration form. – GrandmasterB Mar 06 '13 at 20:12
  • 1
    Concerning the downvotes -- there is a point that goes roughly like "the question shows serious lack of research". There are thousands of blog posts and online resources, hell, even research papers concerning this topic. Simply googling for it spews out tons of info. There are a lot of people who are annoyed by you asking a question on SE before trying Google. – TC1 Mar 06 '13 at 21:33
  • 2
    @TC1 I made multiple Google searches but I didn't find any really good answer, especially not on SE sites, so that's why I asked it here. – totymedli Mar 06 '13 at 21:38
  • @totymedli My first Google search: http://www.extremetech.com/computing/103283-breaking-and-fixing-captcha/2 (But I didn't downvote, as I think it's a good idea to have a _good_ version of this question somewhere on the SE network; if someone transplanted and explained the automated ways, that would be great) – Izkata Mar 06 '13 at 23:07
  • 1
    what technology does your site use? you didn't mention it which makes it difficult to answer – gnat Mar 07 '13 at 08:50
  • 1
    @gnat It is PHP + MySQL, but I think my question is answered by MainMa's answer. I'm not looking for a solution to this specific problem. I was interested in how robots beat these technologies in general. I described this situation only as an example, so you can better understand what I'm asking about, because usually when I ask something you guys reply with: `Could you give us an example?`. – totymedli Mar 07 '13 at 13:20
  • 1
    @totymedli when you mention _"multiple Google searches"_, it is typically worth also providing details on what search phrases you used: that way, you give answerers a chance to not only answer your question, but also improve your Google-fu by explaining how to search better – gnat Mar 07 '13 at 13:49
  • I suggest changing the type of captcha. If you don't use reCaptcha, give it a try. If you do, you might want to try [one of those](http://martin-thoma.com/captcha/). – Martin Thoma Mar 13 '13 at 19:13

7 Answers

75

The two easiest ways to get past a CAPTCHA:

  • Use human farms, i.e. pay people to solve CAPTCHAs, just like ProTypers does.

  • Use OCR (optical character recognition).

There may also be a bug either in the CAPTCHA mechanism itself or the surrounding application, allowing someone to bypass the CAPTCHA.

By the way, the W3C article "Inaccessibility of CAPTCHA: Alternatives to Visual Turing Tests on the Web" also explains how CAPTCHAs can be compromised:

[...] One of the first documented attacks on the system was by a Carnegie Mellon student, who associated CAPTCHA images with access to an adult Web site, thus gaining free human labor to crack the authentication. [...]

External projects [...] have shown methodologies and results indicating that many of the systems can be defeated by computers with between 88% and 100% accuracy, using optical character recognition.

So how can you prevent those attacks?

  • If you use a custom-implemented CAPTCHA, you may try moving to a popular one, like reCAPTCHA.

    This will help if either your own CAPTCHA was too easy to OCR, or if there was a bug which was successfully exploited.

  • If you use a popular CAPTCHA mechanism, moving to a custom-made one or to another popular one might prevent OCR.

Technically, nothing prevents human farms. You may create animated GIFs where several frames display different text very quickly and only one frame is actually visible to the user, distort or bend text in all directions, or find new, alternative ways to prevent OCR from recognizing text; humans paid to solve CAPTCHAs will still solve them.

You may want to move from visual CAPTCHA to sound (if you're not using both already, and you should), but this means that users with hearing impairment would be unable to use your application.


FrustratedWithFormsDesigner and GalacticCowboy mentioned domain-specific CAPTCHAs in the comments. I tried to find some material about how effective those are, but without success, so here is just my personal opinion:

  1. Domain-specific CAPTCHAs can be hugely annoying when actual users have no idea about the answer.

    Example: I'm visiting a page on a movie-oriented website. I notice a mistake in an article and want to comment on it to notify the author. The comment form asks me, as its CAPTCHA mechanism, to provide the name of the actress shown in a photo. I have no idea who this actress is, so the only thing I can do is leave the website (or spend the next two minutes using Google Images).

    Another example: a website asks for a synonym of "mysterious". Easy as it sounds for a non-impaired person who speaks English fluently, it would be impossible to solve without external help for people who don't speak English well or who have certain developmental disabilities, not to mention that finding synonyms or antonyms is always tricky.

  2. Most of those domain-specific problems can be solved programmatically. Both examples I gave are easily solved using external resources (Google Images and a synonym dictionary). The one about resistors given as an example by FrustratedWithFormsDesigner is better, but could probably still be solved with a custom-made bot.

  3. None resist human farms.

  4. Either they generate data, as ordinary text CAPTCHAs do when drawing distorted characters, in which case the generation algorithm itself can be exploited to tune the bots; or they take data from somewhere, as reCAPTCHA takes text from scanned books, in which case the bot can use the same data against them. For example, if you take words from a dictionary and ask the user to provide synonyms, the bot can use the very same dictionary to achieve a 100% success rate.
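
The dictionary point in item 4 can be illustrated with a minimal Python sketch. Everything here is hypothetical: `SYNONYMS` stands in for whatever public dictionary a site might use, and `check_answer` and `bot_answer` are made-up names, not part of any real CAPTCHA library. It only shows that a bot holding the same data as the site always passes.

```python
# Hypothetical synonym table standing in for a public dictionary
# that both the site and the attacker can obtain.
SYNONYMS = {
    "mysterious": {"enigmatic", "cryptic"},
    "quick": {"fast", "rapid"},
}


def check_answer(word, answer):
    """Site side: accept the answer if the dictionary lists it as a synonym."""
    return answer.strip().lower() in SYNONYMS.get(word, set())


def bot_answer(word):
    """Bot side: look the challenge word up in the very same dictionary."""
    candidates = sorted(SYNONYMS.get(word, set()))
    return candidates[0] if candidates else None


# The bot passes every challenge the site can generate from this table.
solved = all(check_answer(word, bot_answer(word)) for word in SYNONYMS)
```

With shared data, the only thing left protecting the form is the attacker's willingness to write the lookup.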

Arseni Mourzenko
  • 4
    +1 for noting that there is no CAPTCHA guaranteed to work – Neil Mar 06 '13 at 13:19
  • 8
    I've seen some novel captchas that are very domain specific. One of them displayed images of resistors and the user had to enter the resistance (there was a link to a help page for people who didn't know how to read resistor colour bands). Another had an image of a small snippet of musical notation and the user had to enter certain notes (such as "notes in second chord only"). They are still gameable, but the cost to spammers might not be worth it. – FrustratedWithFormsDesigner Mar 06 '13 at 15:16
  • 2
    @MainMa there was a post on a tech report somewhere about someone who had designed a bot that beat captchas remarkably fast, because it was ignoring the visual and instead interpreting the audio cues that came with it, so you would end up having to add static to the audio one, which makes it even harder for users anyway –  Mar 06 '13 at 15:54
  • @RhysW: the W3C article I mentioned also mentions that moving from images to audio doesn't solve bot problem and might even make things easier for bots. – Arseni Mourzenko Mar 06 '13 at 16:00
  • I've seen domain-specific ones as well ("enter the common 3-letter designation for the game style where two teams each try to capture the other team's flag" on a game site, etc.) or ones where the text above the box tells you to enter something specific that is different from the image, solve a math problem, etc. Unfortunately, I don't know how successful these are - one site I saw had 4 different ones, on one page... – GalacticCowboy Mar 06 '13 at 16:06
  • 24
    Human Farms. Why am I picturing the Matrix? – LarsTech Mar 06 '13 at 16:53
  • I would think that Human Farms would be for precomputing CAPTCHAs, but you're saying that they can be used for real-time CAPTCHA analysis? – Ryan Amos Mar 06 '13 at 22:00
  • @Ryan Amos: for me, the first sentence of the W3C quote in my answer is an example of real-time CAPTCHA analysis. Maybe I'm wrong. – Arseni Mourzenko Mar 06 '13 at 22:26
  • @MainMa it's not clear, to me, if he was caching CAPTCHAs and then creating a database which were solved by humans OR if he was having people solve them in real time. – Ryan Amos Mar 07 '13 at 01:59
  • @FrustratedWithFormsDesigner: Decoding the resistors could certainly be easily programmed into a bot while even with a help screen it's not going to work for humans with impaired color vision. To read resistor codes I need a cheat sheet with the actual colors that I can hold up next to the resistor and even that's not going to do it on a tiny resistor. (The narrower the bands the harder it is for me.) – Loren Pechtel Mar 07 '13 at 05:56
  • @LorenPechtel: Sure, it *could* be programmed. If it was used often enough, there would probably be more incentive to program a bot for it. I suppose you could get trickier and give simple circuits and ask the user to analyse and answer some question about it. Which would probably prevent most people from submitting their comments anyway... :/ – FrustratedWithFormsDesigner Mar 07 '13 at 14:33
  • I've seen the scenario where the attacker loads a CAPTCHA from the target site into a "free" porn site. The BOT then transcribes the user's input into the target site's CAPTCHA. – Steven Evers Nov 01 '13 at 22:58
38

Adding to MainMa's answer...

Spammers trick others into doing the CAPTCHA for them

Basically, spammers set up a warez site or a porn site that appears to have a CAPTCHA on it, but it's not a real CAPTCHA. A bot pulls the CAPTCHA from the site they want to spam (or otherwise exploit), and then displays it on the warez site or a porn site where someone completes it for them. Then the CAPTCHA value is passed back to their bot...

A bit more on Spammers

I use reCAPTCHA, and I've found that it's basically worthless. I also use a custom spam filter that catches the spam that got past reCAPTCHA, and I need to review it every few days for false positives.

My forum is also all custom-written and it gets very little traffic. I don't believe anyone coded a specific attack to my site. Still, my spam filter catches 2k spam messages a day! None are ever displayed on the site. Spammers get no benefit from spamming me, yet they still do.

I can see patterns in the spamming attempts because I log it all. I can tell you this: putting aside how they get past the CAPTCHA, spammers are clearly using a brute-force technique, varying which fields are filled out and the kinds of data and word mixes that populate those fields. Apparently they do this so cheaply (including bypassing the CAPTCHA) that it doesn't even pay to analyze individual sites to see whether what they are doing is working.

Year after year, they continue targeting my site with thousands of spam messages a day only to get one through every month, and that one gets manually deleted a day later. It's that cheap to spam!

This is going to be a battle for years to come. Particularly for small one-man moderator sites like mine.


EDIT 6/22/2017: I want to add that since this post, Google has completely revamped reCAPTCHA, and as of this writing it has been working flawlessly. I suspect, though, that there are some false positives or that it is a pain for users, as posts have dropped a bit since I implemented it. The 2 big changes are:

1) They are using images instead of text (so no more OCR).

2) They are combining it with the user's activity across all sites that use reCAPTCHA. So if you get past the reCAPTCHA on site A and then go to site B, it may not even prompt you to prove you are human! Also (I think) if you are hitting too many reCAPTCHAs across too many sites, it will flag you as well. I am sure it is using other sorts of AI based on the user's activity as well.

I'm sure it's just a matter of time until spammers beat this as well...

Morons
  • This is probably the strongest argument for reCAPTCHA for a small website: Google has way more resources to keep up with current spam techniques. – Stephen C. Steel Jun 22 '17 at 14:57
15

Have you ever tried using a cat-dog CAPTCHA? My forum had a standard CAPTCHA; I switched to this one and have had no guest spam since.

cat-dog-man
12

It is possible that your site is being targeted by an exploited ultra-cheap labour force and that a human being is manually entering your CAPTCHA phrases.

If the solution you are using is not overly sophisticated, it is possible that your attacker is doing image recognition.

It is also a possibility that you have a bug somewhere in your code that is allowing the CAPTCHA to be bypassed.

Don't make the assumption that a robot is beating your CAPTCHA. Think of your system holistically and see if it has been compromised.

AakashM
Sam
10

Others have discussed how spammers circumvent CAPTCHAs. Here are some tips on how to prevent this.

Note that there is no silver bullet, and spammers seem to be one step ahead of the game, so you will have to use a combination of multiple techniques:

  1. Use a honey pot form
  2. Use a CAPTCHA or a logic question, for example a basic question like "apple, fish, hand, six - which of these is a body part?"
  3. Have a delay. If the form is posted within 5 seconds of the page loading, ignore the request; most robots will post in less than a second.
  4. Have some IP address monitoring. If you notice a spider crawling your website which is not in a whitelist (Google, Bing), then blacklist and ban its IP address. Preferably this would be dynamic/automated in code/software.
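
As a rough illustration of tips 1 and 3, here is a minimal Python sketch of a server-side check. It is only a sketch under stated assumptions: the hidden honeypot field is assumed to be named `website`, the form data is assumed to arrive as a plain dict, and `looks_like_bot` is a made-up helper name, not part of any framework.

```python
import time

MIN_SECONDS = 5  # threshold taken from tip 3; tune it for your own form


def looks_like_bot(form, rendered_at, now=None):
    """Return True if the submission trips the honeypot or arrives too fast.

    form        -- dict of submitted fields; "website" is a hypothetical
                   honeypot field hidden from real users via CSS
    rendered_at -- server-recorded time (seconds) when the form was served
    now         -- current time; defaults to time.time()
    """
    now = time.time() if now is None else now
    # Humans never see the honeypot field, so any value in it is suspicious.
    if form.get("website", "").strip():
        return True
    # Submissions faster than a human could plausibly type are suspicious.
    if now - rendered_at < MIN_SECONDS:
        return True
    return False
```

The render time should be kept server-side (for example in the session) rather than in a hidden field, since a bot can rewrite anything the page sends it.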
Daveo
6

To echo the other answers, you're likely encountering bots that use human farms to enter the captchas for them.

I've recently discussed a technique (and released an accompanying Drupal module) that blocks spam bots by requiring client-side JavaScript. As far as I'm aware, this has worked with 100% efficiency on all sites that have used this code. The idea is to use AJAX to generate a unique hash and submit it along with the other form data, and then compute that same hash on the backend once the form is submitted, and compare the two values.
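
The server side of such a scheme might look roughly like the following Python sketch. This is not the Badbot module's code; `SECRET_KEY`, `issue_nonce`, `expected_token`, and `verify` are all hypothetical names. The assumption is that the page embeds a nonce, client-side JavaScript fetches the matching token over AJAX and submits it with the form, and the backend recomputes the token and compares.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret


def issue_nonce():
    """Create a unique value to embed in the page when the form is served."""
    return secrets.token_hex(16)


def expected_token(nonce):
    """Token the AJAX endpoint would return to a client that runs JavaScript."""
    return hmac.new(SECRET_KEY, nonce.encode(), hashlib.sha256).hexdigest()


def verify(nonce, submitted_token):
    """Recompute the token on form submission and compare in constant time."""
    return hmac.compare_digest(
        expected_token(nonce).encode(), submitted_token.encode()
    )
```

A bot that posts the form without executing JavaScript never obtains the token, so `verify` fails. A bot that drives a real browser is not stopped by this check.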

Full details in my blog post (coincidentally, since you mentioned using PHP + MySQL, these are the same technologies described there) - Module release: Badbot; eliminating spam...

gnat
-2

If your site is twitter, and someone has targeted it specifically (rather than a bot finding it) then you can stop reading...

Otherwise, it might be worth making your form not look like a form:

  1. Don't have fields with 'e-mail' in the type, name or placeholder; use short or misleading names for all fields.
  2. Don't use an actual HTML form element and submit button. Instead, use AJAX to post the data on the click of a normal div (styled to look like a button).
  3. Don't put the onclick event in the HTML; add a listener in JavaScript.
  4. Use JavaScript to populate any tips such as 'enter your email address here', as it's possible that bots won't actually run JS when trawling pages (not sure on this one, but I do it anyway).