What is CAPTCHA?

CAPTCHA stands for “completely automated public Turing test* to tell computers and humans apart.” It refers to various authentication methods that validate users as humans, not bots, by presenting a challenge that is simple for humans but difficult for machines.

CAPTCHAs prevent malicious actors and spammers from using bots to complete web forms for malicious purposes.

Traditional CAPTCHAs forced users to read distorted text and retype it correctly because the optical character recognition (OCR) technology of the time could not interpret it. Newer iterations of CAPTCHA technology use AI-driven behavioral and risk analyses to authenticate human users based on activity patterns rather than a single task.

Many websites require users to complete a CAPTCHA challenge before logging in to an account profile, submitting a registration form or posting a comment. This step helps prevent hackers from using bots to perform malicious actions. By meeting the challenge, users confirm that they are human and are then allowed to continue their activity on the website.

* A Turing Test, named for its creator Alan Turing, tests a machine’s ability to exhibit human intelligence.

The evolution of CAPTCHA

Several groups developed the earliest forms of CAPTCHA technology in parallel during the late 1990s and early 2000s. Each group was working to combat the widespread problem of hackers using bots for nefarious activities on the internet. For example, computer scientists working for the search engine AltaVista wanted to stop bots from adding malicious web addresses to the company’s link database.

Researchers at the IT company Sanctum filed a patent for the first CAPTCHA-style system in 1997. However, it was a group of computer science researchers at Carnegie Mellon University, led by Luis von Ahn and Manuel Blum, who coined the term CAPTCHA in 2003. The team was inspired to work on the technology by a Yahoo executive who delivered a talk about the company’s struggles with spambots signing up for millions of fake email accounts.

To solve Yahoo’s problem, von Ahn and Blum created a computer program that:

  1. generated a random string of text
  2. generated a distorted image of that text (called a ‘CAPTCHA code’)
  3. presented the image to the user
  4. asked the user to enter the text into a form field 
  5. required the user to submit the entry by clicking a checkbox next to the phrase “I am not a robot.”

Because the OCR technology of the time struggled to decipher such distorted text, bots were unable to pass the CAPTCHA challenge. If a user entered the correct string of characters, they were presumed to be human and allowed to complete their account registration or web form submission.
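
A minimal sketch of this kind of text CAPTCHA, assuming the Pillow imaging library is installed, might look like the following. The jittered character placement and line noise here are far simpler than the distortion real CAPTCHAs used, and the function names are illustrative only.

    import secrets
    import string

    from PIL import Image, ImageDraw

    def generate_captcha(length=6, size=(200, 70)):
        """Generate a random code and a lightly distorted image of it."""
        code = "".join(secrets.choice(string.ascii_uppercase + string.digits)
                       for _ in range(length))
        image = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(image)
        # Draw each character at a slightly jittered position.
        for i, char in enumerate(code):
            x = 15 + i * 30 + secrets.randbelow(8)
            y = 20 + secrets.randbelow(15)
            draw.text((x, y), char, fill="black")
        # Add line noise so that simple OCR has a harder time.
        for _ in range(5):
            start = (secrets.randbelow(size[0]), secrets.randbelow(size[1]))
            end = (secrets.randbelow(size[0]), secrets.randbelow(size[1]))
            draw.line([start, end], fill="gray", width=1)
        return code, image

    def verify_captcha(expected_code, user_entry):
        """Server-side check: the user passes only if the typed text matches."""
        return user_entry.strip().upper() == expected_code

    # Usage: store the code server-side (for example, in the session) and send only the image.
    code, image = generate_captcha()
    image.save("captcha.png")
    print(verify_captcha(code, code.lower()))  # True: a correct, case-insensitive entry
    print(verify_captcha(code, "WRONG1"))      # almost certainly False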

Yahoo implemented Carnegie Mellon’s technology, requiring all users to pass a CAPTCHA test before signing up for an email address. This innovation cut down on spambot activity, and other companies soon adopted CAPTCHAs to protect their own web forms.

Over time, hackers used data from completed CAPTCHA challenges to develop algorithms capable of reliably passing CAPTCHA tests. This development marked the beginning of an ongoing arms race between CAPTCHA developers and cybercriminals, which has fueled the evolution of CAPTCHA functions.

reCAPTCHA v1

Launched by von Ahn in 2007, reCAPTCHA v1 had a dual aim: to make the text-based CAPTCHA challenge more difficult for bots to crack and to improve the accuracy of OCR being used at the time to digitize printed texts.

reCAPTCHA achieved the first goal by increasing the distortion of text displayed to the user, and eventually adding lines through the text.

It achieved the second goal by replacing a single image of randomly generated distorted text with two distorted images of words taken from scanned printed texts. Both words had already been run through two different OCR programs: the first word, known as the control word, was one that both programs had identified correctly, while the second was one that both programs had failed to identify.

If the user correctly typed the control word, reCAPTCHA assumed that the user was human and allowed them to continue their task. It also assumed that the user’s reading of the second word was correct, and it pooled such responses to digitize that word and validate OCR results.
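
The control-word logic can be sketched roughly as follows. This is a simplified illustration rather than reCAPTCHA’s actual implementation, and the in-memory vote store and function names are hypothetical.

    from collections import Counter, defaultdict

    # Hypothetical in-memory store of human readings of words that OCR could not decipher.
    unknown_word_votes = defaultdict(Counter)

    def check_recaptcha_v1(control_word, control_entry, unknown_word_id, unknown_entry):
        """Pass or fail is decided only by the control word; the other entry is harvested."""
        if control_entry.strip().lower() != control_word.lower():
            return False  # Likely a bot (or a typo): the challenge fails.
        # The user is presumed human, so record their reading of the unknown word.
        unknown_word_votes[unknown_word_id][unknown_entry.strip().lower()] += 1
        return True

    def best_transcription(unknown_word_id, min_votes=3):
        """Once enough humans agree, accept the majority reading as the digitized word."""
        votes = unknown_word_votes[unknown_word_id]
        if not votes:
            return None
        word, count = votes.most_common(1)[0]
        return word if count >= min_votes else None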

In this way, reCAPTCHA improved anti-bot security while also improving the accuracy of texts being digitized at the Internet Archive and the New York Times. Ironically, over time it also helped improve artificial intelligence and machine learning algorithms to the point that, by 2014, they could solve even the most distorted text CAPTCHAs 99.8% of the time.

In 2009, Google acquired reCAPTCHA and began using it to digitize texts for Google Books while offering it as a service to other organizations. However, as OCR technology progressed with the help of reCAPTCHA, so did the artificial intelligence programs that could effectively solve text-based reCAPTCHAs.

In response, Google introduced image-based reCAPTCHAs in 2012, which replaced distorted text with images taken from Google Street View. Users proved their humanity by identifying real-world objects like streetlights and taxicabs. In addition to sidestepping the sophisticated OCR methods used by bots, image-based reCAPTCHAs provided a more convenient experience for mobile app users.

Google reCAPTCHA v2: No CAPTCHA reCAPTCHA

In 2014, Google released reCAPTCHA v2, which replaced text and image-based challenges with a simple checkbox stating “I am not a robot.” As users check the box, reCAPTCHA v2 analyzes their interactions with web pages. It evaluates factors such as typing speed, cookies, device history and IP address to determine whether a user is likely to be human.

The checkbox is also part of how the CAPTCHA works: no CAPTCHA reCAPTCHA tracks the user’s mouse movements as they click the box. A human’s movements tend to be more chaotic, whereas bots’ movements are more precise. If no CAPTCHA reCAPTCHA suspects a user might be a bot, it presents them with an image-based CAPTCHA challenge.
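
As a rough illustration of the idea, and not Google’s actual signal set, a script might score how straight a recorded pointer path is: an almost perfectly straight line from start to finish is more typical of simple automation than of a human hand.

    import math

    def path_straightness(points):
        """Ratio of straight-line distance to actual path length for (x, y) samples.
        Values near 1.0 indicate an almost perfectly straight path, which is more
        typical of simple automation than of a human hand."""
        if len(points) < 2:
            return 1.0
        path_length = sum(math.dist(points[i], points[i + 1])
                          for i in range(len(points) - 1))
        direct = math.dist(points[0], points[-1])
        return direct / path_length if path_length else 1.0

    # Example: a wobbly human-like path versus a perfectly straight bot-like path.
    human_path = [(0, 0), (14, 9), (31, 13), (52, 27), (80, 30), (100, 50)]
    bot_path = [(0, 0), (25, 12.5), (50, 25), (75, 37.5), (100, 50)]
    print(path_straightness(human_path))  # below 1.0
    print(path_straightness(bot_path))    # 1.0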

reCAPTCHA v3

reCAPTCHA v3, which debuted in 2018, does away with the checkbox and expands upon the AI-driven risk analysis of no CAPTCHA reCAPTCHA. reCAPTCHA v3 works through a JavaScript API, running silently in the background to assign each user a score from 0.0 (likely a bot) to 1.0 (likely a human).

Website owners can configure automated actions that trigger when a user’s score suggests they might be a bot. For example, blog comments from low-scoring users might be sent to a moderation queue when they click “submit.” Low-scoring users might also be asked to complete a multifactor authentication process when they attempt to log in to an account.
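
A minimal server-side sketch of acting on the score is shown below, using Google’s documented siteverify endpoint. The 0.5 threshold and the actions taken on a low score are application-level choices rather than part of the API, and the function names are illustrative.

    import requests

    VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

    def recaptcha_v3_score(secret_key, token):
        """Exchange the client-side token for a verdict and a 0.0-1.0 score."""
        resp = requests.post(VERIFY_URL,
                             data={"secret": secret_key, "response": token},
                             timeout=5)
        result = resp.json()
        return result.get("score", 0.0) if result.get("success") else 0.0

    def handle_comment(secret_key, token, comment, threshold=0.5):
        """Publish high-scoring comments; hold low-scoring ones for moderation."""
        score = recaptcha_v3_score(secret_key, token)
        if score >= threshold:
            return "published"
        return "held for moderation"  # Or require MFA, depending on the workflow.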

AI-based authentication methods like reCAPTCHA v3 seek to sidestep the CAPTCHA arms race altogether. By removing interactive challenges from the verification process, they prevent hackers from using data from previously solved challenges to train bots to crack new CAPTCHAs. Because of this shift, experts believe AI-based CAPTCHAs might become the norm, completely replacing challenge-based CAPTCHAs within the next five to ten years.

CAPTCHA use cases

CAPTCHA technology has several common uses as a bot detection and prevention measure, including:

  1. Preventing fake registrations
  2. Guarding against suspicious transactions
  3. Protecting online poll integrity
  4. Stopping comment and product review spam
  5. Defending against brute-force and dictionary attacks

Preventing fake registrations

By presenting users with a CAPTCHA test before they sign up for an email account, social media profile or other online service, companies can block bots from accessing these platforms. These bots often attempt to spread spam, distribute malware or engage in other malicious activities. CAPTCHA’s earliest adopters were companies like Yahoo, Microsoft and AOL, which wanted to stop bots from registering for fake email accounts.

Guarding against suspicious transactions

Companies like Ticketmaster have used CAPTCHA to stop bots from buying up limited commodities (for example, concert tickets) and reselling them on secondary markets.

Protecting online poll integrity

Bots can compromise online polls without a deterrent like CAPTCHA. The need to protect the integrity of online poll results motivated some of the earliest experiments in CAPTCHA technology. 

For example, to help ensure the quality of its online opinion polls during the 1996 US presidential election, the Digital Equipment Corporation added a verification step. Users were asked to locate and click a pixelated image of a flag on the web page before casting their votes.

Stopping comment and product review spam

Malicious actors and cybercriminals often use blog and article comment sections to spread scams and malware. They might also engage in review spam, in which they post large numbers of fake reviews to artificially boost a product’s rankings on an e-commerce website or search engine. Bots can also use unprotected comment sections to launch harassment campaigns.

These malicious activities can be mitigated by asking users to complete a CAPTCHA before posting a comment or review.

Defending against brute-force and dictionary attacks

In brute-force and dictionary attacks, hackers break into an account by using bots to guess combinations of numbers, letters and special characters until they find the correct password. These attacks can be halted by requiring users to complete a CAPTCHA after repeated failed login attempts.
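
One common pattern is to count failed logins per account (or per client IP) and require a CAPTCHA once a small threshold is crossed. The sketch below assumes an in-memory counter and a toy credential check; a real system would use hashed passwords, persistent storage and per-IP rate limiting.

    from collections import defaultdict

    MAX_FREE_ATTEMPTS = 3
    failed_attempts = defaultdict(int)  # keyed by username (or client IP)

    USERS = {"alice": "correct horse battery staple"}  # demo only; store hashes in practice

    def check_password(username, password):
        """Toy credential check standing in for a real authentication backend."""
        return USERS.get(username) == password

    def login(username, password, captcha_passed=False):
        """After repeated failures, demand a CAPTCHA before checking credentials again."""
        if failed_attempts[username] >= MAX_FREE_ATTEMPTS and not captcha_passed:
            return "captcha_required"
        if check_password(username, password):
            failed_attempts[username] = 0
            return "success"
        failed_attempts[username] += 1
        return "invalid_credentials"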

CAPTCHA disadvantages

While CAPTCHA technology has demonstrated effectiveness in stopping bots, it is not without its disadvantages, including:

  1. Inconvenient user experiences
  2. Accessibility challenges
  3. Reduced conversion rates
  4. Bot AI’s ability to defeat new CAPTCHAs
  5. Privacy concerns

Inconvenient user experiences

CAPTCHA challenges add an extra step to registration, login and form-completion processes that some people find irksome. Moreover, as CAPTCHA complexity has increased to defeat more sophisticated bots, solving CAPTCHAs has also become frustrating for users.

In a 2010 study, when Stanford University researchers asked groups of three people to solve the same CAPTCHAs, participants agreed unanimously on the CAPTCHA solution just 71% of the time. The study also found that non-native English speakers have a harder time solving CAPTCHAs than native speakers, which suggests that CAPTCHAs might be more challenging for some demographic groups than others.

Accessibility challenges

Text and image CAPTCHAs can be challenging or impossible for visually impaired users to solve. Compounding the problem, screen readers cannot interpret most CAPTCHA challenges because the tests are deliberately designed to be unreadable by machines.

Alternative forms of CAPTCHA have attempted to address this issue, but they have their own limitations. Audio CAPTCHAs, which require users to decipher garbled audio, are notoriously difficult to solve. The same Stanford study found that users agreed unanimously on audio CAPTCHA solutions just 31% of the time.

Algorithms can easily crack MAPTCHA, a type of CAPTCHA that requires users to solve simple math problems.
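
To see why, consider how little code it takes to answer a challenge such as “What is 7 + 3?” (a toy example, not a real MAPTCHA implementation):

    import re

    def solve_maptcha(challenge):
        """Answer simple 'a <op> b' arithmetic challenges of the kind MAPTCHA presents."""
        match = re.search(r"(\d+)\s*([+\-*])\s*(\d+)", challenge)
        if not match:
            return None
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        return {"+": a + b, "-": a - b, "*": a * b}[op]

    print(solve_maptcha("What is 7 + 3?"))  # 10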

Using inaccessible CAPTCHAs can have legal repercussions as well. Introduced in 1998, the Section 508 Amendment to the Rehabilitation Act of 1973 requires US federal agencies and their private sector partners to make digital information accessible to individuals with disabilities. Companies might be in violation of this requirement when they do not have accessible CAPTCHA options.

Reduced conversion rates

The inconvenient user experience and inaccessibility of CAPTCHAs can negatively affect conversion rates. In a 2009 case study of 50 websites, asking users to complete a CAPTCHA reduced legitimate conversions by 3.2%. Audio CAPTCHAs can be especially detrimental: the same Stanford study found that users gave up on solving sound-based CAPTCHAs 50% of the time.

Bot AI’s ability to defeat new CAPTCHAs

CAPTCHA schemes have changed so many times since the technology’s inception because bots have consistently evolved to defeat each new challenge. The very structure of CAPTCHA technology contributes to this problem: CAPTCHAs rely on unsolved AI problems to thwart bots, but when humans solve CAPTCHA challenges, they generate datasets that can train machine learning algorithms to overcome those previously unsolved problems.

For example, in 2016, computer science researcher Jason Polakis used Google reverse image search to solve Google’s image-based CAPTCHAs with 70% accuracy.

Privacy concerns

While new forms of CAPTCHA try to solve accessibility problems and halt the bot arms race by removing interactive challenges entirely, some users and researchers find AI-driven CAPTCHAs invasive. People have raised concerns about how reCAPTCHA v3 uses codes and cookies to track users across multiple websites. Some feel there is not enough transparency into how this tracking data might be used for purposes beyond verification.
