CAPTCHA, an acronym for "Completely Automated Public Turing test* to tell Computers and Humans Apart," refers to various authentication methods that validate users as human, not bots, by presenting a challenge that is simple for humans but difficult for machines. CAPTCHAs prevent scammers and spammers from using bots to fill out web forms for malicious purposes.
Traditional CAPTCHAs asked users to read and correctly retype distorted text that could not be interpreted by optical character recognition (OCR) technology. Newer iterations of CAPTCHA technology use AI-driven behavioral and risk analyses to authenticate human users based on activity patterns rather than a single discrete task.
Many websites require users to complete a CAPTCHA challenge before logging into an account, submitting a registration form, posting a comment, or performing some other action that hackers might use a bot to perform. By meeting the challenge, users prove they are human and are allowed to continue their activity on the website.
The earliest forms of CAPTCHA technology were developed by several different groups in parallel during the late 1990s and early 2000s. Each group was working to combat the widespread problem of hackers using bots to carry out nefarious activities on the internet. For example, computer scientists working for the search engine AltaVista wanted to stop bots from adding malicious web addresses to the company's link database.
Researchers at the IT company Sanctum filed the first patent for a CAPTCHA-style system in 1997. But the term CAPTCHA was coined in 2003 by a group of computer science researchers at Carnegie Mellon University led by Luis von Ahn and Manuel Blum. This team was inspired to work on the technology by a Yahoo executive who delivered a talk about the company's struggles with spambots signing up for millions of fake email accounts.
To solve Yahoo's problem, von Ahn and Blum created a computer program that 1) generated a random string of text, 2) rendered a distorted image of that text (called a 'CAPTCHA code'), 3) presented the image to the user, and 4) asked the user to type the text into a form field and submit it. Because OCR technology of the time struggled to decipher such distorted text, bots could not pass the CAPTCHA challenge. If a user entered the correct string of characters, it could be reliably assumed they were human, and they were permitted to complete their account registration or web form submission.
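Those four steps map naturally onto a small amount of code. The following is a minimal sketch of the classic text-CAPTCHA flow, assuming the Pillow imaging library for rendering; the distortion used here (slight rotation plus noise pixels) is illustrative, not the exact transformations the original system applied.

```python
import random
import secrets
import string

from PIL import Image, ImageDraw, ImageFont  # Pillow, assumed installed


def generate_captcha(length: int = 6) -> str:
    """Step 1: generate a random string of text."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def render_distorted(text: str) -> Image.Image:
    """Step 2: render a distorted image of the text (the 'CAPTCHA code')."""
    img = Image.new("RGB", (60 * len(text), 100), "white")
    draw = ImageDraw.Draw(img)
    draw.text((10, 35), text, fill="black", font=ImageFont.load_default())
    # Sprinkle noise pixels so character segmentation is harder for OCR.
    for _ in range(500):
        xy = (random.randrange(img.width), random.randrange(img.height))
        draw.point(xy, fill="gray")
    # Rotate slightly to distort the glyph shapes.
    return img.rotate(random.uniform(-15, 15), expand=True, fillcolor="white")


def check_answer(expected: str, submitted: str) -> bool:
    """Step 4: compare the user's entry against the generated string."""
    return submitted.strip().upper() == expected


# Step 3: in a real web app, the image would be served to the browser
# while the expected string stays server-side in the user's session.
challenge = generate_captcha()
render_distorted(challenge).save("captcha.png")
```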
Yahoo implemented Carnegie Mellon's technology, requiring all users to pass a CAPTCHA test before signing up for an email address. This significantly cut down on spambot activity, and other companies soon adopted CAPTCHAs to protect their web forms. Over time, however, hackers used data from completed CAPTCHA challenges to develop algorithms capable of reliably passing CAPTCHA tests. This marked the beginning of an ongoing arms race between CAPTCHA developers and cybercriminals, one that has fueled the evolution of CAPTCHA functionality.
reCAPTCHA v1
Launched by von Ahn in 2007, reCAPTCHA v1 had a dual aim: to make the text-based CAPTCHA challenge more difficult for bots to crack, and to improve the accuracy of the OCR then being used to digitize printed texts.
reCAPTCHA achieved the first goal by increasing the distortion of text displayed to the user, and eventually adding lines through the text.
It achieved the second goal by replacing the single image of randomly generated text with two distorted images of words scanned from actual texts by two different OCR programs. The first word, or control word, was one that both OCR programs had identified correctly; the second was one that both programs had failed to identify. If the user correctly typed the control word, reCAPTCHA 1) assumed the user was human and allowed them to continue their task, and 2) assumed the user had also transcribed the second word correctly, and used that response to verify future OCR results.
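In rough Python, the control-word trick looks like the sketch below. The function names and the vote-counting scheme are illustrative assumptions; the real system aggregated many users' answers before accepting a transcription of an unknown word.

```python
from collections import Counter

# Illustrative in-memory store of crowd answers for words OCR could not read.
unknown_word_votes: dict[str, Counter] = {}


def grade_response(control_word: str, unknown_image_id: str,
                   answer_control: str, answer_unknown: str) -> bool:
    """Return True (treat the user as human) if the control word matches."""
    if answer_control.strip().lower() != control_word.lower():
        return False  # Failed the known word: likely a bot, reject.
    # Passed the control word, so trust the transcription of the unknown
    # word and record it as one vote toward digitizing that image.
    votes = unknown_word_votes.setdefault(unknown_image_id, Counter())
    votes[answer_unknown.strip().lower()] += 1
    return True


def digitized_text(unknown_image_id: str, min_votes: int = 3) -> str | None:
    """Accept a transcription once enough independent users agree on it."""
    votes = unknown_word_votes.get(unknown_image_id)
    if votes:
        word, count = votes.most_common(1)[0]
        if count >= min_votes:
            return word
    return None
```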
In this way, reCAPTCHA strengthened anti-bot security while improving the accuracy of texts being digitized at the Internet Archive and The New York Times. (Ironically, over time it also helped improve artificial intelligence and machine learning algorithms to the point that, by 2014, they could identify the most distorted text CAPTCHAs 99.8 percent of the time.)
In 2009, Google acquired reCAPTCHA and began using it to digitize texts for Google Books while offering it as a service to other organizations. However, as OCR technology progressed with the help of reCAPTCHA, so did the artificial intelligence programs that could effectively solve text-based reCAPTCHAs. In response, Google introduced image recognition reCAPTCHAs in 2012, which replaced distorted text with images taken from Google Street View. Users proved their humanity by identifying real-world objects like street lights and taxicabs. In addition to sidestepping the advanced OCR now deployed by bots, these image-based reCAPTCHAs were considered more convenient for mobile app users.
reCAPTCHA v2: No CAPTCHA reCAPTCHA
In 2014 Google released reCAPTCHA v2, which replaced text- and image-based challenges with a simple checkbox stating "I am not a robot." As users check the box, reCAPTCHA v2 analyzes the user’s interactions with web pages, evaluating factors like typing speed, cookies, device history, and IP address to determine whether a user is likely to be human. The checkbox is also part of how the CAPTCHA works: no CAPTCHA reCAPTCHA tracks the user's mouse movements as they click the box. A human's movements tend to be more chaotic, whereas bots' movements are more precise. If no CAPTCHA reCAPTCHA suspects a user may be a bot, it presents them with an image-based CAPTCHA challenge.
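Whichever version a site embeds, the server confirms the resulting token by calling Google's siteverify endpoint. Below is a minimal server-side sketch, assuming the third-party requests library; RECAPTCHA_SECRET is a placeholder for the site's secret key.

```python
import requests  # third-party HTTP library, assumed installed

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"
RECAPTCHA_SECRET = "your-secret-key"  # placeholder


def verify_token(token: str, remote_ip: str | None = None) -> bool:
    """POST the client's reCAPTCHA token to Google and check the result."""
    payload = {"secret": RECAPTCHA_SECRET, "response": token}
    if remote_ip:
        payload["remoteip"] = remote_ip  # optional siteverify parameter
    result = requests.post(VERIFY_URL, data=payload, timeout=5).json()
    return bool(result.get("success"))
```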
reCAPTCHA v3
reCAPTCHA v3, which debuted in 2018, does away with the checkbox and expands on the AI-driven risk analysis of no CAPTCHA reCAPTCHA. reCAPTCHA v3 integrates with a web page via a JavaScript API and runs in the background, scoring a user's behavior on a scale of 0.0 (likely a bot) to 1.0 (likely a human). Website owners can set automated actions to trigger when a user's score suggests they may be a bot. For example, blog comments from low-scoring users may be sent to a moderation queue when they click "submit," or low-scoring users may be asked to complete a multi-factor authentication process when they attempt to log into an account.
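With v3, the siteverify response also carries the behavioral score, and the thresholds and resulting actions are left to the site owner. The sketch below shows one way the blog-comment example might look; the 0.5 cutoff and the queue/publish helpers are illustrative assumptions, not Google-recommended values.

```python
import requests  # third-party HTTP library, assumed installed

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"
RECAPTCHA_SECRET = "your-secret-key"  # placeholder
BOT_THRESHOLD = 0.5  # illustrative cutoff; each site tunes its own


def send_to_moderation_queue(comment: str) -> None:
    print("held for moderation:", comment)  # stand-in for a real queue


def publish_comment(comment: str) -> None:
    print("published:", comment)  # stand-in for real publishing


def handle_comment(token: str, comment: str) -> str:
    """Route a blog comment based on the reCAPTCHA v3 risk score."""
    result = requests.post(
        VERIFY_URL,
        data={"secret": RECAPTCHA_SECRET, "response": token},
        timeout=5,
    ).json()
    score = result.get("score", 0.0)  # 0.0 = likely bot, 1.0 = likely human
    if not result.get("success") or score < BOT_THRESHOLD:
        send_to_moderation_queue(comment)
        return "held for moderation"
    publish_comment(comment)
    return "published"
```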
By removing interactive challenges from the CAPTCHA verification process, AI-based authentication methods like reCAPTCHA v3 seek to sidestep the problem of hackers using data from previously solved challenges to train bots to crack new CAPTCHAs. Because of this, experts believe AI-based CAPTCHAs may become the norm, completely replacing challenge-based CAPTCHAs in the next 5-10 years.
As a bot detection and prevention measure, CAPTCHA technology has several common uses, including:
Preventing fake registrations. By asking users to pass a CAPTCHA test before signing up for an email account, social media profile, or other online service, companies can filter out bots aiming to use these services to spread spam or malware or conduct other malicious activities. CAPTCHA's earliest adopters were companies like Yahoo, Microsoft, and AOL, which wanted to stop bots from registering for fake email accounts.
Guarding against suspicious transactions. Companies like Ticketmaster have used CAPTCHA to stop bots from buying up limited commodities, like concert tickets, and reselling them on secondary markets.
Protecting online poll integrity. Without a deterrent like CAPTCHA, online polls can be compromised by bots. Some of the earliest experiments in CAPTCHA-like technology were motivated by the need to protect the integrity of online poll results. For example, to ensure the quality of its online opinion polls during the 1996 U.S. presidential election, the Digital Equipment Corporation asked users to locate and click a pixelated image of a flag on the web page before casting their votes.
Stopping comment and product review spam. Scammers and cybercriminals often use blog and article comment sections to spread scams and malware; they may also engage in review spam, in which they post large numbers of fake reviews to artificially boost a product's rankings on an eCommerce website or search engine. Bots can also use unprotected comment sections to carry out harassment campaigns. These malicious activities can be mitigated by asking users to complete a CAPTCHA before posting a comment or review.
Defending against brute-force and dictionary attacks. In brute-force and dictionary attacks, hackers break into an account by using bots to guess combinations of numbers, letters, and special characters until they find the correct password. These attacks can be halted by requiring users to complete a CAPTCHA after a certain number of unsuccessful login attempts, as sketched below.
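As an illustration of that last use, here is a minimal sketch of CAPTCHA-gated login throttling. The in-memory counter and the three-attempt limit are simplifying assumptions; a production system would persist counts and combine signals such as IP address and account history.

```python
from collections import defaultdict

MAX_FREE_ATTEMPTS = 3  # illustrative limit before a CAPTCHA is required
failed_attempts: dict[str, int] = defaultdict(int)  # keyed by username


def login(username: str, password_ok: bool, captcha_ok: bool) -> str:
    """Require a solved CAPTCHA once an account sees repeated failures."""
    if failed_attempts[username] >= MAX_FREE_ATTEMPTS and not captcha_ok:
        return "captcha required"  # brute-force bots stall here
    if password_ok:
        failed_attempts[username] = 0  # reset the counter on success
        return "logged in"
    failed_attempts[username] += 1
    return "invalid credentials"
```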
While CAPTCHA technology has generally proven effective in stopping bots, it is not without its trade-offs.
Inconvenient user experience. CAPTCHA challenges add an extra step to registration, login, and form-completion processes that some people find irksome. Moreover, as CAPTCHA complexity has increased to defeat more sophisticated bots, solving CAPTCHAs has also become frustrating for users. In a 2010 study, Stanford University researchers asked groups of three people to solve the same CAPTCHAs; participants agreed unanimously on the CAPTCHA solution just 71 percent of the time (PDF, 2.5 MB; link resides outside IBM.com). The study also found that non-native English speakers have a harder time solving CAPTCHAs than native speakers, suggesting that CAPTCHAs may be more challenging for some demographic groups than others.
Inaccessibility. Text and image CAPTCHAs can be extremely challenging or impossible to solve for visually impaired users. This is compounded by the fact that screen readers cannot read most CAPTCHA challenges because these tests are designed to be unreadable by machines.
Alternative forms of CAPTCHAs have attempted to address this issue, but they have their own limitations. Audio CAPTCHAs, which ask users to decipher garbled audio, are notoriously difficult to solve. The aforementioned Stanford study found that users agree unanimously on audio CAPTCHA solutions just 31 percent of the time.
MAPTCHA, a type of CAPTCHA that asks users to solve simple math problems, is highly vulnerable to being cracked by algorithms.
Using inaccessible CAPTCHAs can have legal repercussions as well. Section 508 of the Rehabilitation Act of 1973, as amended in 1998, requires U.S. federal agencies and private organizations that do business with those agencies to make digital information accessible to people with disabilities. Companies may be in violation of this requirement if they do not offer accessible CAPTCHA options.
Lower conversion rates. The inconvenient user experience and inaccessibility of CAPTCHAs can negatively influence conversion rates. In a 2009 case study of 50 websites, asking users to complete a CAPTCHA reduced legitimate conversions by 3.2 percent (link resides outside IBM.com). Audio CAPTCHAs can be especially detrimental: The Stanford study mentioned above found that users give up on solving sound-based CAPTCHAs 50 percent of the time.
Bot AI continues evolving to defeat new CAPTCHAs. CAPTCHA schemes have changed so many times since the technology's inception because bots have consistently evolved to defeat each new CAPTCHA challenge. The very structure of CAPTCHA technology contributes to this problem because CAPTCHAs rely on unsolved AI problems to thwart bots. When humans solve CAPTCHA challenges, they generate data sets that can train machine learning algorithms to overcome these previously impossible AI problems. For example, in 2016, computer science researcher Jason Polakis used Google's reverse image search to solve Google's image-based CAPTCHAs with 70 percent accuracy.
Privacy concerns. While new forms of CAPTCHA try to solve accessibility problems and halt the bot arms race by removing interactive challenges entirely, some users and researchers find AI-driven CAPTCHAs invasive. People have raised concerns about how reCAPTCHA v3 uses codes and cookies to track users across multiple websites. Some feel there is not enough transparency into how this tracking data might be used for purposes beyond verification.
*A Turing Test, named for its creator Alan Turing, assesses a machine's ability to exhibit intelligent behavior indistinguishable from that of a human.