What is CAPTCHA?

CAPTCHA stands for “completely automated public Turing test* to tell computers and humans apart.” It refers to various authentication methods that validate users as humans, not bots, by presenting a challenge that is simple for humans but difficult for machines.

CAPTCHAs prevent malicious actors and spammers from using bots to complete web forms for malicious purposes.

Traditional CAPTCHAs forced users to read distorted text and retype it correctly because the optical character recognition (OCR) technology of the time could not interpret it. Newer iterations of CAPTCHA technology use AI-driven behavioral and risk analyses to authenticate human users based on activity patterns rather than a single task.

Many websites require users to complete a CAPTCHA challenge before logging in to an account profile, submitting a registration form or posting a comment. This step helps prevent hackers from using bots to perform malicious actions. By meeting the challenge, users confirm that they are human and are then allowed to continue their activity on the website.

* A Turing Test, named for its creator Alan Turing, tests a machine’s ability to exhibit human intelligence.

The evolution of CAPTCHA

Several groups developed the earliest forms of CAPTCHA technology in parallel during the late 1990s and early 2000s. Each group was working to combat the widespread problem of hackers using bots for nefarious activities on the internet. For example, computer scientists working for the search engine AltaVista wanted to stop bots from adding malicious web addresses to the company’s link database.

Researchers at the IT company Sanctum filed a patent for the first CAPTCHA-style system in 1997. However, it was a group of computer science researchers at Carnegie Mellon University, led by Luis von Ahn and Manuel Blum, who coined the term CAPTCHA in 2003. The team was inspired to work on the technology by a Yahoo executive who delivered a talk about the company’s struggles with spambots signing up for millions of fake email accounts.

To solve Yahoo’s problem, von Ahn and Blum created a computer program that:

  1. generated a random string of text
  2. generated a distorted image of that text (called a ‘CAPTCHA code’)
  3. presented the image to the user
  4. asked the user to enter the text into a form field 
  5. required the user to submit the entry by clicking a checkbox next to the phrase “I am not a robot.”

Because the OCR technology of the time struggled to decipher such distorted text, bots were unable to pass the CAPTCHA challenge. If a user entered the correct string of characters, they were presumed to be human and allowed to complete their account registration or web form submission.
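
A minimal sketch of this kind of text CAPTCHA, assuming the Pillow imaging library is installed, might look like the following. The jittered character placement and line noise here are far simpler than the distortion real CAPTCHAs used, and the function names are illustrative only.

    import secrets
    import string

    from PIL import Image, ImageDraw

    def generate_captcha(length=6, size=(200, 70)):
        """Generate a random code and a lightly distorted image of it."""
        code = "".join(secrets.choice(string.ascii_uppercase + string.digits)
                       for _ in range(length))
        image = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(image)
        # Draw each character at a slightly jittered position.
        for i, char in enumerate(code):
            x = 15 + i * 30 + secrets.randbelow(8)
            y = 20 + secrets.randbelow(15)
            draw.text((x, y), char, fill="black")
        # Add line noise so that simple OCR has a harder time.
        for _ in range(5):
            start = (secrets.randbelow(size[0]), secrets.randbelow(size[1]))
            end = (secrets.randbelow(size[0]), secrets.randbelow(size[1]))
            draw.line([start, end], fill="gray", width=1)
        return code, image

    def verify_captcha(expected_code, user_entry):
        """Server-side check: the user passes only if the typed text matches."""
        return user_entry.strip().upper() == expected_code

    # Usage: store the code server-side (for example, in the session) and send only the image.
    code, image = generate_captcha()
    image.save("captcha.png")
    print(verify_captcha(code, code.lower()))  # True: a correct, case-insensitive entry
    print(verify_captcha(code, "WRONG1"))      # almost certainly False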

Yahoo implemented Carnegie Mellon’s technology, requiring all users to pass a CAPTCHA test before signing up for an email address. This innovation cut down on spambot activity, and other companies soon adopted CAPTCHAs to protect their own web forms.

Over time, hackers used data from completed CAPTCHA challenges to develop algorithms capable of reliably passing CAPTCHA tests. This development marked the beginning of an ongoing arms race between CAPTCHA developers and cybercriminals, which has fueled the evolution of CAPTCHA functions.

reCAPTCHA v1

Launched by von Ahn in 2007, reCAPTCHA v1 had a dual aim: to make the text-based CAPTCHA challenge more difficult for bots to crack and to improve the accuracy of OCR being used at the time to digitize printed texts.

reCAPTCHA achieved the first goal by increasing the distortion of text displayed to the user, and eventually adding lines through the text.

It achieved the second goal by replacing a single image of randomly generated distorted text with two distorted images of words taken from scanned printed texts. Both words had already been run through two different OCR programs: the first word, known as the control word, was one that both programs had identified correctly, while the second was one that both programs had failed to identify.

If the user correctly typed the control word, reCAPTCHA assumed that the user was human and allowed them to continue their task. It also assumed that the user’s reading of the second word was correct, and it pooled such responses to digitize that word and validate OCR results.
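
The control-word logic can be sketched roughly as follows. This is a simplified illustration rather than reCAPTCHA’s actual implementation, and the in-memory vote store and function names are hypothetical.

    from collections import Counter, defaultdict

    # Hypothetical in-memory store of human readings of words that OCR could not decipher.
    unknown_word_votes = defaultdict(Counter)

    def check_recaptcha_v1(control_word, control_entry, unknown_word_id, unknown_entry):
        """Pass or fail is decided only by the control word; the other entry is harvested."""
        if control_entry.strip().lower() != control_word.lower():
            return False  # Likely a bot (or a typo): the challenge fails.
        # The user is presumed human, so record their reading of the unknown word.
        unknown_word_votes[unknown_word_id][unknown_entry.strip().lower()] += 1
        return True

    def best_transcription(unknown_word_id, min_votes=3):
        """Once enough humans agree, accept the majority reading as the digitized word."""
        votes = unknown_word_votes[unknown_word_id]
        if not votes:
            return None
        word, count = votes.most_common(1)[0]
        return word if count >= min_votes else None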

In this way, reCAPTCHA improved anti-bot security while also improving the accuracy of texts being digitized at the Internet Archive and the New York Times. Ironically, over time it also helped improve artificial intelligence and machine learning algorithms to the point that, by 2014, they could solve even the most distorted text CAPTCHAs 99.8% of the time.

In 2009, Google acquired reCAPTCHA and began using it to digitize texts for Google Books while offering it as a service to other organizations. However, as OCR technology progressed with the help of reCAPTCHA, so did the artificial intelligence programs that could effectively solve text-based reCAPTCHAs.

In response, Google introduced image-based reCAPTCHAs in 2012, which replaced distorted text with images taken from Google Street View. Users proved their humanity by identifying real-world objects like streetlights and taxicabs. In addition to sidestepping the sophisticated OCR methods used by bots, image-based reCAPTCHAs provided a more convenient experience for mobile app users.

Google reCAPTCHA v2: No CAPTCHA reCAPTCHA

In 2014, Google released reCAPTCHA v2, which replaced text and image-based challenges with a simple checkbox stating “I am not a robot.” As users check the box, reCAPTCHA v2 analyzes their interactions with web pages. It evaluates factors such as typing speed, cookies, device history and IP address to determine whether a user is likely to be human.

The checkbox is also part of how the CAPTCHA works: no CAPTCHA reCAPTCHA tracks the user’s mouse movements as they click the box. A human’s movements tend to be more chaotic, whereas bots’ movements are more precise. If no CAPTCHA reCAPTCHA suspects a user might be a bot, it presents them with an image-based CAPTCHA challenge.
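
As a rough illustration of the idea, and not Google’s actual signal set, a script might score how straight a recorded pointer path is: an almost perfectly straight line from start to finish is more typical of simple automation than of a human hand.

    import math

    def path_straightness(points):
        """Ratio of straight-line distance to actual path length for (x, y) samples.
        Values near 1.0 indicate an almost perfectly straight path, which is more
        typical of simple automation than of a human hand."""
        if len(points) < 2:
            return 1.0
        path_length = sum(math.dist(points[i], points[i + 1])
                          for i in range(len(points) - 1))
        direct = math.dist(points[0], points[-1])
        return direct / path_length if path_length else 1.0

    # Example: a wobbly human-like path versus a perfectly straight bot-like path.
    human_path = [(0, 0), (14, 9), (31, 13), (52, 27), (80, 30), (100, 50)]
    bot_path = [(0, 0), (25, 12.5), (50, 25), (75, 37.5), (100, 50)]
    print(path_straightness(human_path))  # below 1.0
    print(path_straightness(bot_path))    # 1.0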

reCAPTCHA v3

reCAPTCHA v3, which debuted in 2018, does away with the checkbox and expands upon the AI-driven risk analysis of no CAPTCHA reCAPTCHA. reCAPTCHA v3 works through a JavaScript API, running silently in the background to assign each user a score from 0.0 (likely a bot) to 1.0 (likely a human).

Website owners can configure automated actions that trigger when a user’s score suggests they might be a bot. For example, blog comments from low-scoring users might be sent to a moderation queue when they click “submit.” Low-scoring users might also be asked to complete a multifactor authentication process when they attempt to log in to an account.
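
A minimal server-side sketch of acting on the score is shown below, using Google’s documented siteverify endpoint. The 0.5 threshold and the actions taken on a low score are application-level choices rather than part of the API, and the function names are illustrative.

    import requests

    VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

    def recaptcha_v3_score(secret_key, token):
        """Exchange the client-side token for a verdict and a 0.0-1.0 score."""
        resp = requests.post(VERIFY_URL,
                             data={"secret": secret_key, "response": token},
                             timeout=5)
        result = resp.json()
        return result.get("score", 0.0) if result.get("success") else 0.0

    def handle_comment(secret_key, token, comment, threshold=0.5):
        """Publish high-scoring comments; hold low-scoring ones for moderation."""
        score = recaptcha_v3_score(secret_key, token)
        if score >= threshold:
            return "published"
        return "held for moderation"  # Or require MFA, depending on the workflow.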

AI-based authentication methods like reCAPTCHA v3 seek to sidestep the CAPTCHA arms race altogether. By removing interactive challenges from the verification process, they prevent hackers from using data from previously solved challenges to train bots to crack new CAPTCHAs. Because of this shift, experts believe AI-based CAPTCHAs might become the norm, completely replacing challenge-based CAPTCHAs within the next five to ten years.

CAPTCHA use cases

CAPTCHA technology has several common uses as a bot detection and prevention measure, including:

  1. Preventing fake registrations
  2. Guarding against suspicious transactions
  3. Protecting online poll integrity
  4. Stopping comment and product review spam
  5. Defending against brute-force and dictionary attacks

Preventing fake registrations

By presenting users with a CAPTCHA test before they sign up for an email account, social media profile or other online service, companies can block bots from accessing these platforms. These bots often attempt to spread spam, distribute malware or engage in other malicious activities. CAPTCHA’s earliest adopters were companies like Yahoo, Microsoft and AOL, which wanted to stop bots from registering for fake email accounts.

Guarding against suspicious transactions

Companies like Ticketmaster have used CAPTCHA to stop bots from buying up limited commodities (for example, concert tickets) and reselling them on secondary markets.

Protecting online poll integrity

Bots can compromise online polls without a deterrent like CAPTCHA. The need to protect the integrity of online poll results motivated some of the earliest experiments in CAPTCHA technology. 

For example, to help ensure the quality of its online opinion polls during the 1996 US presidential election, the Digital Equipment Corporation added a verification step. Users were asked to locate and click a pixelated image of a flag on the web page before casting their votes.

Stopping comment and product review spam

Malicious actors and cybercriminals often use blog and article comment sections to spread scams and malware. They might also engage in review spam, in which they post large numbers of fake reviews to artificially boost a product’s rankings on an e-commerce website or search engine. Bots can also use unprotected comment sections to launch harassment campaigns.

These malicious activities can be mitigated by asking users to complete a CAPTCHA before posting a comment or review.

Defending against brute-force and dictionary attacks

In brute-force and dictionary attacks, hackers break into an account by using bots to guess combinations of numbers, letters and special characters until they find the correct password. These attacks can be halted by requiring users to complete a CAPTCHA after repeated failed login attempts.
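
One common pattern is to count failed logins per account (or per client IP) and require a CAPTCHA once a small threshold is crossed. The sketch below assumes an in-memory counter and a toy credential check; a real system would use hashed passwords, persistent storage and per-IP rate limiting.

    from collections import defaultdict

    MAX_FREE_ATTEMPTS = 3
    failed_attempts = defaultdict(int)  # keyed by username (or client IP)

    USERS = {"alice": "correct horse battery staple"}  # demo only; store hashes in practice

    def check_password(username, password):
        """Toy credential check standing in for a real authentication backend."""
        return USERS.get(username) == password

    def login(username, password, captcha_passed=False):
        """After repeated failures, demand a CAPTCHA before checking credentials again."""
        if failed_attempts[username] >= MAX_FREE_ATTEMPTS and not captcha_passed:
            return "captcha_required"
        if check_password(username, password):
            failed_attempts[username] = 0
            return "success"
        failed_attempts[username] += 1
        return "invalid_credentials"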

CAPTCHA disadvantages

While CAPTCHA technology has demonstrated effectiveness in stopping bots, it is not without its disadvantages, including:

  1. Inconvenient user experiences
  2. Accessibility challenges
  3. Reduced conversion rates
  4. Bot AI’s ability to defeat new CAPTCHAs
  5. Privacy concerns

Inconvenient user experiences

CAPTCHA challenges add an extra step to registration, login and form-completion processes that some people find irksome. Moreover, as CAPTCHA complexity has increased to defeat more sophisticated bots, solving CAPTCHAs has also become frustrating for users.

In a 2010 study, when Stanford University researchers asked groups of three people to solve the same CAPTCHAs, participants agreed unanimously on the CAPTCHA solution just 71% of the time. The study also found that non-native English speakers have a harder time solving CAPTCHAs than native speakers, which suggests that CAPTCHAs might be more challenging for some demographic groups than others.

Accessibility challenges

Text and image CAPTCHAs can be challenging or impossible for visually impaired users to solve. Compounding the problem, screen readers cannot interpret most CAPTCHA challenges because the tests are deliberately designed to be unreadable by machines.

Alternative forms of CAPTCHA have attempted to address this issue, but they have their own limitations. Audio CAPTCHAs, which require users to decipher garbled audio, are notoriously difficult to solve. The same Stanford study found that users agreed unanimously on audio CAPTCHA solutions just 31% of the time.

Algorithms can easily crack MAPTCHA, a type of CAPTCHA that requires users to solve simple math problems.
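
To see why, consider how little code it takes to answer a challenge such as “What is 7 + 3?” (a toy example, not a real MAPTCHA implementation):

    import re

    def solve_maptcha(challenge):
        """Answer simple 'a <op> b' arithmetic challenges of the kind MAPTCHA presents."""
        match = re.search(r"(\d+)\s*([+\-*])\s*(\d+)", challenge)
        if not match:
            return None
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        return {"+": a + b, "-": a - b, "*": a * b}[op]

    print(solve_maptcha("What is 7 + 3?"))  # 10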

Using inaccessible CAPTCHAs can have legal repercussions as well. Introduced in 1998, the Section 508 Amendment to the Rehabilitation Act of 1973 requires US federal agencies and their private sector partners to make digital information accessible to individuals with disabilities. Companies might be in violation of this requirement when they do not have accessible CAPTCHA options.

Reduced conversion rates

The inconvenient user experience and inaccessibility of CAPTCHAs can negatively affect conversion rates. In a 2009 case study of 50 websites, asking users to complete a CAPTCHA reduced legitimate conversions by 3.2%. Audio CAPTCHAs can be especially detrimental: the same Stanford study found that users gave up on solving sound-based CAPTCHAs 50% of the time.

Bot AI’s ability to defeat new CAPTCHAs

CAPTCHA schemes have changed so many times since the technology’s inception because bots have consistently evolved to defeat each new challenge. The very structure of CAPTCHA technology contributes to this problem: CAPTCHAs rely on unsolved AI problems to thwart bots, but when humans solve CAPTCHA challenges, they generate datasets that can train machine learning algorithms to overcome those previously unsolved problems.

For example, in 2016, computer science researcher Jason Polakis used Google reverse image search to solve Google’s image-based CAPTCHAs with 70% accuracy.

Privacy concerns

While new forms of CAPTCHA try to solve accessibility problems and halt the bot arms race by removing interactive challenges entirely, some users and researchers find AI-driven CAPTCHAs invasive. People have raised concerns about how reCAPTCHA v3 uses codes and cookies to track users across multiple websites. Some feel there is not enough transparency into how this tracking data might be used for purposes beyond verification.
