reCAPTCHA – the power of channelizing human efforts
With the advent of Web 2.0, phrases such as “Architecture of Participation”, “The Network Effect (social network)” and “Harnessing the collective intelligence” became buzzwords, and numerous examples are quoted to show their success.
Wikipedia, based on the notion that an entry can be added by any web user and edited by any other, is a radical experiment in trust, applying Eric Raymond’s dictum “with enough eyeballs, all bugs are shallow,” to content creation. It has become quite natural for many of us to turn to Wikipedia as the first source of information. But this is just the tip of the iceberg, as the percentage of users who contribute to Wikipedia is very small.
reCAPTCHA is an example of how powerful the result could be if most Internet users contribute. The term CAPTCHA stands for Completely Automated Public Turing Test To Tell Computers and Humans Apart. According to Wikipedia, the process usually involves one computer asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human.
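To make the "generate and grade" idea concrete, here is a minimal sketch in Python. It is only an illustration of the general process described above, not how reCAPTCHA itself is implemented; the function names `generate_challenge` and `grade` are hypothetical, and the step of rendering the text as a distorted image is left out.

```python
import secrets
import string

# Characters the hypothetical challenge is drawn from.
ALPHABET = string.ascii_uppercase + string.digits


def generate_challenge(length: int = 6) -> str:
    """Generate a random string; the server keeps it as the expected answer."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade(expected: str, response: str) -> bool:
    """Grade a response, ignoring case and surrounding whitespace."""
    return response.strip().upper() == expected.upper()


if __name__ == "__main__":
    challenge = generate_challenge()
    # A real service would show the challenge as a distorted image
    # that is hard for a program to read; here we only grade answers.
    print(grade(challenge, challenge.lower()))   # a correct answer is accepted
    print(grade(challenge, challenge + "X"))     # a wrong answer is rejected
```

The computer can generate and grade the test easily because it already knows the answer; the difficulty for other computers comes entirely from how the challenge is presented.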
reCAPTCHA is currently digitizing the archives of The New York Times. Twenty years of the newspaper have already been digitized, and it is believed that another 110 years will be done by the end of 2010.
Luis von Ahn, an early CAPTCHA developer, realized that “he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles.” The question of how this human effort could be put to positive use arose. reCAPTCHA is the answer: it channels the effort spent solving CAPTCHAs online into “reading” books.
The statistics on the number of CAPTCHAs solved every day also give an interesting perspective. In 2007, about 30 million CAPTCHAs were reportedly being solved every day; the number rose to 60 million in 2008, and today over 200 million CAPTCHAs are said to be solved every day by people around the world. Put another way, CAPTCHAs currently produce output equivalent to more than 150,000 hours of work each day.
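As a rough sanity check on these figures, the arithmetic can be sketched in a few lines of Python. The helper `hours_per_day` is a hypothetical name, and the ten-second figure is only the upper bound quoted in this post, so the results are ballpark estimates rather than official numbers.

```python
def hours_per_day(captchas_per_day: int, seconds_each: float) -> float:
    """Convert a daily CAPTCHA count and a time per CAPTCHA into hours of effort."""
    return captchas_per_day * seconds_each / 3600


# Upper bound: 200 million CAPTCHAs a day at a full ten seconds each.
upper_bound_hours = hours_per_day(200_000_000, 10)

# Seconds per CAPTCHA implied by 150,000 hours spread over 200 million CAPTCHAs.
implied_seconds = 150_000 * 3600 / 200_000_000

print(f"Upper bound: {upper_bound_hours:,.0f} hours per day")
print(f"Implied time per CAPTCHA: {implied_seconds:.1f} seconds")
```

At a full ten seconds each, 200 million CAPTCHAs would add up to roughly 555,000 hours a day, so "more than 150,000 hours" is a conservative figure that corresponds to under three seconds per CAPTCHA.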
The success of reCAPTCHA, which involves a vast majority of Internet users, can be attributed to the following facts:
- Less than ten seconds of human time is spent on each case.
- reCAPTCHA is available for free, and a large number of popular websites use it to control spam.
- The user must solve it in order to continue their work, say to get to a website or start a download.
Google has since taken over reCAPTCHA, and you can visit http://www.google.com/recaptcha/learnmore to learn more about this free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows. Typically CAPTCHAs are used by websites to protect themselves from spam. For some applications, such as WordPress and MediaWiki, plug-ins are provided so that reCAPTCHA can be used without writing any code.
Google suggests that it can also be used by individuals who want to control their email spam. It provides a service called Mailhide, which takes an address such as firstname.lastname@example.org and turns it into jsm…@example.com. In order to reveal the address, a user must click on the “…” and solve a reCAPTCHA.
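The masking step can be illustrated with a short Python sketch. The exact rule Mailhide applies is not described here, so this is an assumption: the hypothetical function `mask_email` keeps the first three characters of the part before the "@" and replaces the rest with "…". The address in the example is made up.

```python
def mask_email(address: str, visible: int = 3) -> str:
    """Hide most of the local part of an email address (hypothetical rule)."""
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    # Keep only the first few characters of the local part.
    # Note: a local part shorter than `visible` would be shown in full,
    # which a real service would need to handle.
    return f"{local[:visible]}…@{domain}"


if __name__ == "__main__":
    print(mask_email("alice.jones@example.com"))
```

In the real service the hidden part is only revealed after the visitor solves a reCAPTCHA; that step is not shown in this sketch.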
As a security solution, CAPTCHAs are not invincible, and there are periodic reports of CAPTCHAs being cracked. Though some of the failures are attributed to incorrect implementation, it is also said that a growing number of research projects are attempting to crack CAPTCHAs, and some are already succeeding in beating visual CAPTCHAs using computer programs. It is also not encouraging to note that spammers use human solvers and pay about one dollar for every 1000 solved CAPTCHAs.
The official CAPTCHA website http://www.captcha.net/ now provides another image-recognition-based CAPTCHA, ESP-PIX, where instead of typing letters we authenticate ourselves by recognizing what object is common to a set of images. This ongoing war between security providers and breakers can be expected to go on and on as CAPTCHAs become more and more widely used.
reCAPTCHA is an example of how to successfully harness the power of channelizing human effort. I am sure we can think of other uses of collective human effort, by enterprises as well as in global initiatives. Opinion polling and surveys seem to me to be one such area. Let me try to elaborate on the idea in a later blog post.