Monday, February 20, 2012

Escape from CAPTCHA

I AM a human being, not a robot. Yet when I created this blog, the CAPTCHA mechanism forcing me to prove it was so horribly inefficient that it took a dozen attempts (literally) to get the form to submit. Or, rather, to get an image readable enough for a human to decode, so that validation would pass and the process could proceed.

This is frustrating on a good day and maddening as hell if you're on a schedule.

Despite its ubiquity, there isn't much discussion of Human User Verification on the 'net. A quick Google® search turns up a handful of software engineers discussing the issue, but throughout the blogosphere, you just don't hear much about it.

This is odd, of course, given most people's animosity towards the entirely un-user-friendly CAPTCHA software most widely used by sites large and small. Despite this usability nightmare, CAPTCHA is used far and wide, hither and yon, much to your average user's intense chagrin.

Before moving forward, though, let's take a step back and talk about Human User Verification. Back in the Wild West days of the internet (read: the early-to-mid '90s), you didn't see much in the way of Human User Verification. Of course, you didn't see much spam in those days, either. Spam, the barnacles on the hull of the ship that is the internet, became more prevalent during the dot-com boom and now accounts for the vast majority of all email sent. A spam-free internet would be faster, more efficient, and far less infuriating. Spam is a pox, a bane, a chancrous sore on an otherwise outstanding form of communication. It has caused legitimate messages to be lost. It has forced people to abandon their email addresses. It has spread malware and viruses. It has, in general, made people's lives miserable.

So it goes...

Over the years, I've developed a variety of methods for dealing with spam, some low-tech, others more advanced. I wrote a spam filter for an email client, and I still use it today; it learns as new forms of spam arrive in my mailbox. That doesn't stop the lower forms of humanity, namely spammers, from flooding servers with bogus communication. In fact, spam has evolved over the years into what is called Bot Spam, written by programmers who have succumbed to the Dark Side (shame on them). Bot Spam applications seek out submittable forms, fill them with bogus data, and submit them, extending the clog to data-driven websites.

And with the advent of this monstrous problem, CAPTCHA emerged.

CAPTCHA grew out of work begun at Carnegie Mellon University around 2000; the term itself was coined in 2003 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford. It stands for Completely Automated Public Turing test to tell Computers and Humans Apart. The acronym is as tortured as the process itself. It has, nevertheless, been an effective, albeit ham-fisted, method for keeping Bot Spam at bay.

More than a decade later, CAPTCHA still holds the title of most popular reverse Turing Test. That is ironic, given the countless hours spammers have poured into developing Bot Spam applications to defeat it.

At the heart of it, Human User Verification needs to fulfill two basic goals:

* Provide a problem easy enough for all humans to solve.
* Prevent standard automated software from filling out a form.

While CAPTCHA easily fulfills the second goal, I believe a case can be made that it fails the first. Further, given the zeal spammers show for defeating such gateways, it is only a matter of time before some Dark and Disturbed engineer creates an algorithm capable of reading the skewed letters a CAPTCHA application generates.

And so it was that, when a client asked me to install a reverse Turing Test on their site to thwart Bot Spam, I decided to re-examine the issue and create one of my own. The result was ABHUVA.

ABHUVA stands for A Better Human User Verification Application. Its primary goal is to put usability back into Human User Verification, while still successfully thwarting Bot Spam applications.

In order to accomplish this, I started from Square One and asked the question: What does it take to prove that there is a human being on the submitting end of a form? The answer was very simple:

"I think, therefore I am."

This Golden Nugget comes from René Descartes, the 17th-century French philosopher and mathematician widely considered the father of modern Western philosophy. At the heart of the matter, this is all a reverse Turing Test is attempting to prove: that the entity on the other end of the line thinks, and therefore is.

Given this, I took the next step: what, besides horribly skewed lettering, can prove that the requesting entity is capable of thought? The answer came quite quickly: discerning symbols.

Symbols are the oldest form of written communication where humans are concerned. Thousands of years before formal written language, humans communicated indirectly through symbols. Hence I built the ABHUVA reverse Turing Test on symbols rather than written language. Here's a fact: more people can recognize symbols than can read written language. It's the same principle behind everything from road signs to restroom designations. And, programmatically speaking, it would be far more difficult to write an algorithm that recognizes a picture of a chicken than one that reads the word "chicken." Thus symbols form the first layer of Human User Verification in ABHUVA.
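The symbol layer can be pictured with a short sketch. This is not ABHUVA's code, which isn't shown here; it's a minimal Python illustration assuming a "click the picture" format. The symbol set, the image paths, and the names `make_challenge` and `check_answer` are all hypothetical. The server picks a target symbol plus a few distractors, hides every tile behind a random token, and keeps the correct token to itself.

```python
import secrets
from dataclasses import dataclass, field

# Hypothetical symbol library mapping a label to an image file.
# Labels and paths are placeholders, not ABHUVA's actual assets.
SYMBOLS = {
    "chicken": "img/chicken.png",
    "bicycle": "img/bicycle.png",
    "house": "img/house.png",
    "tree": "img/tree.png",
    "fish": "img/fish.png",
    "umbrella": "img/umbrella.png",
}


@dataclass
class SymbolChallenge:
    prompt: str            # text shown to the user, e.g. "Click the chicken"
    tiles: dict[str, str]  # random token -> image path; only tokens go to the page
    answer_token: str = field(repr=False)  # correct tile's token, kept on the server


def make_challenge(num_tiles: int = 4) -> SymbolChallenge:
    """Pick distinct symbols, choose one as the target, hide each behind a token."""
    rng = secrets.SystemRandom()
    labels = rng.sample(list(SYMBOLS), num_tiles)  # distinct symbols, random order
    target = rng.choice(labels)  # random target, so its tile position is random too
    tiles: dict[str, str] = {}
    answer_token = ""
    for label in labels:
        token = secrets.token_urlsafe(8)  # opaque ID; reveals nothing about the label
        tiles[token] = SYMBOLS[label]
        if label == target:
            answer_token = token
    return SymbolChallenge(f"Click the {target}", tiles, answer_token)


def check_answer(challenge: SymbolChallenge, clicked_token: str) -> bool:
    """Compare the clicked tile's token with the stored answer in constant time."""
    return secrets.compare_digest(clicked_token.encode(), challenge.answer_token.encode())
```

The random tokens matter: if the page named its tiles `chicken.png` and `tree.png`, a bot could match the prompt to a filename without ever looking at a picture. A real deployment would serve each image under its token URL so the label never reaches the browser.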

ABHUVA doesn't stop there, however. If a Bot Spam application keeps trying to breach the gates, ABHUVA changes the challenge. In this next phase, it begins asking the user questions that require typed answers. The questions start off extremely simple and grow progressively more difficult with each new challenge. This approach changes the paradigm, changes the game, and will challenge even the most cleverly programmed application to respond correctly. Further, it locks the form on the client until the question is answered. There is no going around, no going over or under. Only a correct answer will enable the form.
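The escalation described above can be sketched as a small state machine. Again, this is a hypothetical illustration rather than ABHUVA's implementation: the question bank, the two-failure threshold, and the `Gate` class are assumptions. Each failed attempt moves the user one step along, first from symbols to questions, then to progressively harder questions, and the form stays locked until something is answered correctly.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical question bank, easiest first. The questions, accepted answers,
# and the failure threshold are illustrative assumptions.
QUESTIONS = [
    ("What color is a clear daytime sky?", {"blue"}),
    ("What is three plus four?", {"7", "seven"}),
    ("How many legs does a spider have?", {"8", "eight"}),
    ("Type the word 'robot' spelled backwards.", {"tobor"}),
]
SYMBOL_ATTEMPTS = 2  # failed symbol attempts tolerated before questions begin


@dataclass
class Gate:
    failures: int = 0       # total failed attempts this session
    unlocked: bool = False  # the form may be submitted only once this is True

    def in_question_phase(self) -> bool:
        return self.failures >= SYMBOL_ATTEMPTS

    def _tier(self) -> int:
        # Each failure past the symbol phase moves one question harder,
        # capped at the hardest question in the bank.
        return min(self.failures - SYMBOL_ATTEMPTS, len(QUESTIONS) - 1)

    def current_question(self) -> Optional[str]:
        """Question to display, or None while the symbol challenge is still active."""
        if self.unlocked or not self.in_question_phase():
            return None
        return QUESTIONS[self._tier()][0]

    def record_symbol_result(self, passed: bool) -> bool:
        """Record a symbol-challenge result; ignored once questions have begun."""
        if not self.unlocked and not self.in_question_phase():
            if passed:
                self.unlocked = True
            else:
                self.failures += 1
        return self.unlocked

    def answer_question(self, text: str) -> bool:
        """Check a typed answer; a wrong one escalates to the next question."""
        if self.unlocked or not self.in_question_phase():
            return self.unlocked
        accepted = QUESTIONS[self._tier()][1]
        if text.strip().lower() in accepted:
            self.unlocked = True
        else:
            self.failures += 1
        return self.unlocked
```

Capping the tier at the last question keeps a persistent bot facing the hardest question rather than running off the end of the list.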

Spam has become more aggressive over the years. Doesn't it make sense that the mechanism fending off this attack should become more aggressive as well? The commercial version of ABHUVA adds a third, server-side validation layer built on the same model. It's time to change the game. It's time for ABHUVA.
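How the commercial server-side layer works isn't described here, but the reason for having one is worth spelling out: a lock enforced in the browser only stops bots that run the page's scripts, and a bot that posts straight to the form's URL never sees it. One common pattern, sketched here purely as an assumption and not as ABHUVA's design, is for the server to issue a short-lived, HMAC-signed pass token once it has checked the challenge answer itself, and to reject any form post without a valid one. `SERVER_KEY`, `TOKEN_TTL_SECONDS`, and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server secret; a real deployment would load a fixed key from config.
SERVER_KEY = secrets.token_bytes(32)
TOKEN_TTL_SECONDS = 300  # assumed lifetime of a pass token


def _sign(session_id: str, issued: int) -> str:
    # Bind the signature to both the session and the issue time.
    msg = f"{session_id}:{issued}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def issue_pass_token(session_id: str, now: Optional[float] = None) -> str:
    """Call only after the server itself has verified the challenge answer."""
    issued = int(time.time() if now is None else now)
    return f"{issued}:{_sign(session_id, issued)}"


def verify_pass_token(session_id: str, token: str, now: Optional[float] = None) -> bool:
    """Reject tokens that are malformed, forged, for another session, or expired."""
    try:
        issued_str, sig = token.split(":", 1)
        issued = int(issued_str)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current - issued > TOKEN_TTL_SECONDS:
        return False
    expected = _sign(session_id, issued)
    return hmac.compare_digest(sig.encode(), expected.encode())
```

The form handler would call `verify_pass_token` before accepting a submission, so skipping the browser-side lock gains a bot nothing.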
