Enterprise IT Consultant Views on Technologies and Trends

Jul 25 2010   2:36AM GMT

reCAPTCHA – the power of channelizing human efforts

Sasirekha R

With the advent of Web 2.0, “Architecture of Participation”, “The Network Effect” (social networks) and “Harnessing collective intelligence” became buzzwords, and numerous examples are quoted to show their success.

Wikipedia, based on the notion that an entry can be added by any web user and edited by any other, is a radical experiment in trust, applying Eric Raymond’s dictum “given enough eyeballs, all bugs are shallow” to content creation. It has become quite natural for many of us to turn to Wikipedia as the first source of information. But this is just the tip of the iceberg, as the percentage of users who contribute to Wikipedia is very small.

reCAPTCHA is an example of how powerful participation can be when most Internet users contribute. The term CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. According to Wikipedia, the process usually involves one computer asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human.
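To make the "generate and grade" idea concrete, here is a minimal sketch in Python. It is only an illustration: the function names are my own, and a real CAPTCHA would render the challenge text as a distorted image so that other computers cannot simply read it, which this sketch leaves out.

```python
import secrets
import string

# Characters used for the challenge text (an arbitrary choice for this sketch).
ALPHABET = string.ascii_uppercase + string.digits


def generate_challenge(length: int = 6) -> str:
    """Generate the random text the user will be asked to type back."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade_response(expected: str, response: str) -> bool:
    """Grade the answer, ignoring case and surrounding whitespace."""
    return response.strip().upper() == expected.upper()


if __name__ == "__main__":
    challenge = generate_challenge()
    # In a real system the challenge would be shown as a distorted image.
    print("Challenge:", challenge)
    print("Correct answer accepted:", grade_response(challenge, challenge.lower()))
```

The point is simply that the server both creates the test and knows the answer, so grading is trivial for it while solving is (ideally) hard for another program.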

reCAPTCHA is currently digitizing the archives of The New York Times. Twenty years of the newspaper have already been digitized, and it is believed that another 110 years will be done by the end of 2010.

Luis von Ahn, an early CAPTCHA developer, realized that “he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles.” The question arose of how this human effort could be put to positive use. reCAPTCHA is the answer: it channels the effort spent solving CAPTCHAs online into “reading” books.

The statistics on the number of CAPTCHAs solved every day also give us some interesting perspectives. In 2007, it was said that 30 million CAPTCHAs were being solved every day; the number became 60 million in 2008, and today it is said that over 200 million CAPTCHAs are solved every day by people around the world. Another way to look at it is that CAPTCHAs currently consume the equivalent of more than 150,000 hours of work each day.
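As a rough sanity check on these figures, the daily counts can be converted into hours of human effort. The sketch below assumes the roughly ten seconds per CAPTCHA quoted in this post; that figure is an approximation, and the function name is just for illustration.

```python
SECONDS_PER_CAPTCHA = 10  # assumption: the "ten seconds" figure quoted in this post


def hours_per_day(captchas_per_day: int, seconds_each: float = SECONDS_PER_CAPTCHA) -> float:
    """Convert a daily CAPTCHA count into hours of human effort per day."""
    return captchas_per_day * seconds_each / 3600


if __name__ == "__main__":
    # Daily counts as quoted in the post for 2007, 2008 and 2010.
    for year, count in [(2007, 30_000_000), (2008, 60_000_000), (2010, 200_000_000)]:
        print(f"{year}: about {hours_per_day(count):,.0f} hours per day")
    # 2007: about 83,333 hours per day
    # 2008: about 166,667 hours per day
    # 2010: about 555,556 hours per day
```

At ten seconds each, 200 million CAPTCHAs come to well over the 150,000 hours quoted, so that figure reads as a conservative lower bound; it would correspond to under three seconds per CAPTCHA.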

The success of reCAPTCHA, which involves a vast majority of Internet users, can be attributed to the following facts:

  • Less than ten seconds of human time is spent on each case.
  • reCAPTCHA is available for free, and a large number of popular websites use it to control spam.
  • The user must solve it in order to continue their work, say to get to a website or to download something.

Google has since acquired reCAPTCHA, and you can visit http://www.google.com/recaptcha/learnmore to learn more about this free CAPTCHA service, which helps to digitize books, newspapers and old-time radio shows. Typically, websites use CAPTCHAs to protect themselves from spam. For some applications, such as WordPress and MediaWiki, reCAPTCHA can be used without writing any code by installing the plug-in provided.

Google suggests that it can also be used by individuals who want to control email spam: it provides a service called Mailhide, which takes an address such as jsmith@example.com and turns it into jsm…@example.com. In order to reveal the address, a user must click on the “…” and solve a reCAPTCHA.
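The masking step is easy to picture in code. The following is only a sketch of the display side, not Mailhide's actual implementation: the function name and the choice of three visible characters are my own assumptions, and the real service additionally puts the full address behind a reCAPTCHA challenge.

```python
def mask_email(address: str, visible: int = 3) -> str:
    """Hide most of the local part, keeping only the first `visible` characters."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # The "…" stands in for the hidden characters a visitor would click on.
    return f"{local[:visible]}…@{domain}"


if __name__ == "__main__":
    print(mask_email("jsmith@example.com"))  # jsm…@example.com
```

A spam harvester scraping the page sees only the masked form, while a human visitor can still recover the full address by solving the CAPTCHA.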

As a security solution, CAPTCHAs are not invincible, and there are periodic reports of CAPTCHAs being cracked. Though some of the failures are attributed to incorrect implementation, it is also said that a growing number of research projects are attempting to crack CAPTCHAs, and some are already succeeding in beating visual CAPTCHAs with computer programs. It is also discouraging to note that spammers use human solvers, paying about one dollar per 1,000 solved CAPTCHAs.

The official CAPTCHA website, http://www.captcha.net/, now provides another, image-recognition-based CAPTCHA called ESP-PIX, where instead of typing letters we authenticate ourselves by recognizing which object is common to a set of images. This ongoing war between security providers and breakers can be expected to go on and on as CAPTCHAs become more and more widely used.

reCAPTCHA is an example of how to successfully harness the power of channelizing human effort. I am sure we can think of other uses of collective human effort, by enterprises as well as in global initiatives. Opinion polling and surveys seem to me to be one such area. Let me try to elaborate on the idea in a later post.

2 Comments on this Post

  • Sasirekha R
    [...] involving the users without making them actually pay money. reCAPTCHA (refer to earlier blog http://itknowledgeexchange.techtarget.co…) is a classic example of how the human effort can be channelized for achieving huge productivity - [...]
  • Nikki6
    Text-based authentication is a thing of the past, yet it continues to be a problem for most people: it still allows spam and bots to get through, and you now have to contend with phishing attacks as well. Anyone who has ever used CAPTCHA or reCAPTCHA knows that it becomes a complete hassle trying to decipher the random words, and then having to start over if you enter the wrong text. Now imagine that you have bad vision and need to go online for something; it is a complete nightmare and makes the online experience unenjoyable. At first there was single-word text-based authentication, usually made of random letters, numbers and symbols, but those can be broken more easily, so along came reCAPTCHA, which uses two random words to authenticate. These words cannot be read half the time, and sometimes the word combination is offensive. Image-based authentication and verification solutions that can be embedded within WCMs are the next wave and make sense. An image-based CAPTCHA solution offers ease of use and more security from bots and spam on websites where users can comment, register or blog.