17 September, 2024
Artificial general intelligence (AGI) is a hypothetical stage in the development of machine learning (ML) in which an artificial intelligence (AI) system can match or exceed the cognitive abilities of human beings across any task. It represents the fundamental, abstract goal of AI development: the artificial replication of human intelligence in a machine or software.
AGI has been actively explored since the earliest days of AI research. Still, there is no consensus within the academic community regarding exactly what would qualify as AGI or how to best achieve it. Though the broad goal of human-like intelligence is fairly straightforward, the details are nuanced and subjective. The pursuit of AGI therefore comprises the development of both a framework to understand intelligence in machines and the models able to satisfy that framework.
The challenge is both philosophical and technological. Philosophically, a formal definition of AGI requires both a formal definition of “intelligence” and general agreement on how that intelligence could be manifested in AI. Technologically, AGI requires the creation of AI models with an unprecedented level of sophistication and versatility, metrics and tests to reliably verify the model’s cognition and the computing power necessary to sustain it.
The notion of “general” intelligence, or general AI, is best understood in contrast to narrow AI: a term that describes nearly all current AI systems, whose “intelligence” is demonstrated only in specialized domains.
The 1956 Dartmouth Summer Research Project on Artificial Intelligence, which brought together mathematicians and scientists from institutions including Dartmouth, IBM, Harvard and Bell Labs, is considered the origin of the term “artificial intelligence.” As described in the proposal, “the study [was] to proceed based on the conjecture that every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it.”
This burgeoning field of “AI” sought to develop a roadmap to machines that can think for themselves. But in the following decades, progress toward human-like intelligence in machines proved elusive.
Much greater progress was made in the pursuit of computing machines that perform specific tasks that typically require significant intelligence in humans, such as chess playing, healthcare diagnostics, forecasting or automobile driving. But these models—for instance, those powering self-driving cars—demonstrate intelligence only within their specific domains.
In 2007, AI researcher Ben Goertzel popularized the term “artificial general intelligence” (AGI), at the suggestion of DeepMind cofounder Shane Legg, in an influential book of the same name (link resides outside ibm.com). In contrast to what he dubbed “narrow AI,” an artificial general intelligence would be a new type of AI with, among other qualities, “the ability to solve general problems in a non-domain-restricted way, in the same sense that a human can.”
AGI is strongly associated with other concepts in machine learning, often being conflated or even used interchangeably with strong AI or artificial superintelligence. While these concepts have a fair amount of overlap, they are each a distinct conception of AI in their own right.
"Strong AI," a concept discussed prominently in the work of philosopher John Searle, refers to an AI system demonstrating consciousness and serves mostly as a counterpoint to weak AI. While strong AI is generally analogous to AGI (and weak AI is generally analogous to narrow AI), they are not mere synonyms of one another.
In essence, whereas weak AI is simply a tool to be used by a conscious mind—that is, a human being—strong AI is itself a conscious mind. Though it is typically implied that this consciousness would entail a corresponding intelligence equal or superior to that of human beings, strong AI is not explicitly concerned with relative performance on various tasks. The two concepts are often conflated because consciousness is usually taken to be either a prerequisite or a consequence of “general intelligence.”
Despite their similarities, AGI and strong AI ultimately describe closely related, but distinct, concepts rather than identical ones.
Artificial superintelligence, as its name implies, constitutes an AI system whose capabilities vastly exceed those of human beings.
It’s worth noting that this concept does not necessarily presuppose "general" superintelligence. Of these three related AI concepts—AGI, strong AI and artificial superintelligence—artificial superintelligence is the only one that has arguably been achieved already. Rather than being the sole domain of science fiction, there are narrow AI models that might fairly be called superintelligent in that they exceed the performance of any human being at their specific task.
For example, modern chess engines and systems such as DeepMind's AlphaGo and AlphaFold have surpassed the best human performance at chess, Go and protein structure prediction, respectively.
Though these models might represent breakthroughs in artificial superintelligence, they have not achieved artificial "general" intelligence, as such AI systems cannot autonomously learn new tasks or expand their problem-solving capabilities beyond their narrowly defined scope.
Furthermore, it’s worth noting that superintelligence is not a prerequisite of AGI. In theory, an AI system that demonstrates consciousness and an intelligence level comparable to that of an average, unremarkable human being would represent both AGI and strong AI—but not artificial superintelligence.
There is no consensus among experts regarding what exactly should qualify as AGI, though plenty of definitions have been proposed throughout the history of computer science. These definitions generally focus on the abstract notion of machine intelligence, rather than the specific algorithms or machine learning models that should be used to achieve it.
In 2023, a Google DeepMind paper (link resides outside ibm.com) surveyed existing academic literature and identified several categories of frameworks for defining artificial general intelligence, which are explored in the sections that follow.
Alan Turing, a seminal figure in the history of theoretical computer science, published one of the earliest and most influential definitions of machine intelligence in his 1950 paper, “Computing Machinery and Intelligence.” The core of his argument was that intelligence can be defined by behavior, rather than by elusive philosophical qualities. Acknowledging the difficulty of pinning down firm definitions of concepts such as machines and thinking, Turing proposed a simple way around the problem based on a party game called the Imitation Game.
The “Turing Test” is simple: a human observer must read text samples and determine whether they were generated by a human or by a machine. Turing proposed that if a human cannot distinguish between the program’s output and another human’s output, the program can be said to demonstrate human-like intelligence.
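To make the protocol concrete, here is a minimal Python sketch of the imitation game as an evaluation loop. The judge, machine_reply and human_reply callables are hypothetical stand-ins, not part of any real benchmark; the point is only that the test scores observable behavior, not inner workings.

```python
# A minimal sketch of the Turing Test as an evaluation protocol.
# judge, machine_reply and human_reply are hypothetical callables supplied
# by the reader; a real study would use live conversation, not canned text.
import random

def run_imitation_game(judge, machine_reply, human_reply, prompts):
    """Return the fraction of trials in which the judge correctly
    identifies which reply came from the machine."""
    correct = 0
    for prompt in prompts:
        replies = [("machine", machine_reply(prompt)),
                   ("human", human_reply(prompt))]
        random.shuffle(replies)  # hide which reply came from which source
        guess = judge(prompt, [text for _, text in replies])  # judge returns 0 or 1
        if replies[guess][0] == "machine":
            correct += 1
    return correct / len(prompts)

# If the judge's accuracy stays near 0.5 (chance), the machine's output is
# behaviorally indistinguishable from the human's -- Turing's criterion.
```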
Criticisms of the Turing Test
Despite its monumental influence, computer scientists today do not consider the Turing Test to be an adequate measure of AGI. Rather than demonstrate the ability of machines to think, the test often simply highlights how easy humans are to fool.
For instance, in 1966 Joseph Weizenbaum created a chatbot program called ELIZA that applied simple rules to transform a user’s input into a response. The most famous script he wrote for ELIZA, called DOCTOR, imitated a Rogerian psychotherapist by either responding generically or rephrasing the user’s input as a question:
Human: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here?
Human: He says I'm depressed much of the time.
ELIZA: I am sorry to hear you are depressed.
Human: It's true. I'm unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
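For a sense of just how simple this methodology was, the following toy Python sketch (not Weizenbaum’s original code) reproduces the flavor of the DOCTOR script with a few regular-expression rules that either rephrase the input as a question or fall back to a generic prompt.

```python
# A toy illustration of ELIZA-style pattern matching: a handful of regex
# rules that rephrase the user's input or fall back to a generic response.
import re

RULES = [
    (r"my (.+) made me come here", "Your {0} made you come here?"),
    (r"i'?m (depressed|sad|unhappy)", "I am sorry to hear you are {0}."),
    (r"i'?m (.+)", "Do you think coming here will help you not to be {0}?"),
]

def respond(user_input: str) -> str:
    text = user_input.lower().strip(".!")
    for pattern, template in RULES:
        match = re.search(pattern, text)
        if match:
            return template.format(*match.groups())
    return "Please tell me more."  # generic fallback, much as DOCTOR used

print(respond("Well, my boyfriend made me come here."))
# Your boyfriend made you come here?
```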
As Weizenbaum explained in his 1976 work, Computer Power and Human Reason, he was “startled to see how quickly and very deeply people conversing with DOCTOR became emotionally involved with the computer and how unequivocally they anthropomorphized it.” He noted that even his secretary, who watched him work on the program for months and obviously knew its simple methodology, asked him to leave the room for privacy when she began conversing with it.1 This phenomenon has come to be known as the ELIZA effect (link resides outside ibm.com).
Another proposed definition sets a higher bar for AGI: an AI system possessing consciousness. As articulated by Searle, “according to strong AI, the computer is not merely a tool in the study of the mind; rather, the appropriately programmed computer really is a mind.”2
Searle authored a prominent philosophical refutation of the Turing Test’s ability to prove strong AI in 1980. He describes an English speaker with no understanding of Chinese, locked in a room full of books of Chinese symbols and instructions (in English) for manipulating those symbols. He argues that by simply following the instructions, the English speaker could fool someone in a different room into thinking he can speak Chinese, despite understanding neither the other person’s messages nor his own replies.3
The decades of debate around the Chinese Room Argument, summarized in this Stanford Encyclopedia of Philosophy article (link resides outside ibm.com), demonstrate the lack of scientific consensus on a definition of “understanding” and whether a computer program can possess it. This disagreement, along with the possibility that consciousness might not even be a requirement for human-like performance, makes strong AI alone an impractical framework for defining AGI.
An intuitive approach to AGI, which aims to replicate the kind of intelligence that (to our knowledge) has only ever been achieved by the human brain, is to replicate the human brain itself.4 This intuition led to the original artificial neural networks, which in turn have yielded the deep learning models that currently represent the state-of-the-art across nearly every subfield of AI.
The success of deep learning neural networks, particularly the large language models (LLMs) and multimodal models at the forefront of generative AI (gen AI), demonstrates the benefits of drawing inspiration from the human brain through self-organizing networks of artificial neurons. However, many of the most capable deep learning models to date use transformer-based architectures, which themselves don’t strictly emulate brain-like structures. This suggests that explicitly mimicking the human brain might not be inherently necessary to achieve AGI.
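As a reminder of how loose that biological inspiration is, the Python sketch below shows the basic “artificial neuron” abstraction: a weighted sum of inputs passed through a nonlinearity. Deep learning models stack enormous numbers of these units, and transformer architectures add attention mechanisms with no direct biological counterpart; the numbers here are illustrative, not drawn from any real model.

```python
# A minimal sketch of a single artificial neuron: weighted sum + nonlinearity.
import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """One artificial neuron: activation = sigmoid(w . x + b)."""
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes the output into (0, 1)

x = np.array([0.5, -1.2, 3.0])   # example input features (illustrative values)
w = np.array([0.8, 0.1, -0.4])   # "learned" connection strengths (illustrative)
print(neuron(x, w, bias=0.2))    # a single activation value
```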
A more holistic approach is to simply define AGI as an AI system that can do all the cognitive tasks that people can do. While this definition is helpfully flexible and intuitive, it’s ambiguous: which tasks? Which people? This ambiguity limits its practical use as a formal framework for AGI.
The most notable contribution of this framework is that it limits the focus of AGI to non-physical tasks. Doing so disregards capabilities like physical tool use, locomotion or manipulating objects, which are often considered to be demonstrations of “physical intelligence.”5 This eliminates further advancements in robotics as a prerequisite to the development of AGI.
Another intuitive approach to AGI, and to intelligence itself, is to emphasize the ability to learn—specifically, to learn as broad a range of tasks and concepts as humans can. This echoes Turing in “Computing Machinery and Intelligence,” wherein he speculates that it might be wiser to program a childlike AI and subject it to a period of education, rather than directly program a computer system as an adult mind.6
This approach is at odds with narrow AI, which explicitly trains models to perform a specific task. For example, even an LLM such as GPT-4 that ostensibly demonstrates the capacity for few-shot learning or even zero-shot learning on “new” tasks is limited to functions adjacent to its main task: autoregressively predicting the next word in a sequence.
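The following toy Python example illustrates what autoregressive next-word prediction means in practice. The hard-coded bigram table is a stand-in for the learned probability distribution an actual LLM computes; the generation loop, however, mirrors the basic greedy decoding idea of repeatedly appending the most likely next token given the tokens so far.

```python
# A toy illustration of autoregressive generation. BIGRAM_PROBS is a
# hard-coded stand-in for the probabilities a trained language model outputs.
BIGRAM_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 1.0},
}

def generate(prompt: list[str], max_new_tokens: int = 3) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_probs = BIGRAM_PROBS.get(tokens[-1])
        if not next_probs:
            break  # no known continuation for this word
        # greedy decoding: pick the highest-probability next token
        tokens.append(max(next_probs, key=next_probs.get))
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```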
Though state-of-the-art multimodal AI models can perform increasingly diverse tasks, from natural language processing (NLP) to computer vision to speech recognition, they’re still limited to a finite list of core skills represented in their training data sets. They can’t, for instance, also learn to drive a car. A true AGI would be able to learn from new experiences in real time—a feat unremarkable for human children and even many animals.
AI researcher Pei Wang offers a definition of machine intelligence that’s useful within this framework: “the ability for an information processing system to adapt to its environment with insufficient knowledge and resources.”7
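One way to read Wang’s definition is through a simple trial-and-error agent. The Python sketch below, an epsilon-greedy bandit chosen here purely for illustration and not drawn from Wang’s work, starts with no knowledge of its environment’s payoffs and adapts its estimates solely from limited experience.

```python
# A minimal sketch of adapting "with insufficient knowledge and resources":
# an epsilon-greedy agent learns which of three actions pays off best,
# starting from zero knowledge of the hidden payout rates.
import random

TRUE_PAYOUTS = [0.2, 0.5, 0.8]           # hidden from the agent
estimates, counts = [0.0, 0.0, 0.0], [0, 0, 0]

for step in range(1000):
    if random.random() < 0.1:            # explore: try something unfamiliar
        action = random.randrange(3)
    else:                                # exploit: use the current best estimate
        action = max(range(3), key=lambda a: estimates[a])
    reward = 1 if random.random() < TRUE_PAYOUTS[action] else 0
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # estimates drift toward the hidden payout rates
```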
OpenAI, whose launch of ChatGPT is often credited with initiating the current generative AI era, defines AGI in its charter as “highly autonomous systems that outperform humans at most economically valuable work.”8
As the DeepMind paper notes, this definition omits elements of human intelligence whose economic value is hard to define, such as artistic creativity or emotional intelligence. At best, those aspects of intelligence can realize economic value in a roundabout way—such as creativity producing profitable movies or emotional intelligence powering machines that perform psychotherapy.
The focus on economic value also implies that the capabilities comprising AGI count only if they’re actually put into real-world deployment. If an AI system can rival humans at a specific task but is impractical to ever deploy for that task for legal, ethical or social reasons, can it be said to “outperform” humans?
The DeepMind paper also notes that OpenAI shut down its robotics division in 2021, implying that the replication of physical labor, and the corresponding role of “physical intelligence” in AGI, is not part of this interpretation of economic value.
Gary Marcus, a psychologist, cognitive scientist and AI researcher, defined AGI as “a shorthand for any intelligence…that is flexible and general, with resourcefulness and reliability comparable to (or beyond) human intelligence.”9 Marcus proposed a set of benchmark tasks intended to demonstrate that adaptability and general competence, akin to a specific and practical implementation of the “learn tasks” framework.
This quantification of AGI is reminiscent of a thought experiment proposed by Apple cofounder Steve Wozniak, who asked: “Could a computer make a cup of coffee?” Wozniak notes that this seemingly simple task is actually quite complex: one must be able to walk, to know what kitchens are, to know what a coffee machine or coffee might look like and to interface with drawers and cabinets. In short, a human must draw upon a lifetime of experience just to brew a cup of coffee.10
Specifically, Marcus proposed five benchmark tasks that, if all performed by a single AI system, would demonstrate AGI: watching a movie and accurately describing what is happening in it, reading a novel and reliably answering questions about its plot, characters and motivations, working as a competent cook in an arbitrary kitchen, reliably writing a bug-free program of 10,000 lines or more from a natural language specification, and converting proofs written in natural language mathematics into a symbolic form suitable for verification.11
While this task-oriented framework introduces some much-needed objectivity into the validation of AGI, it’s difficult to agree on whether these specific tasks cover all of human intelligence. The third task, working as a cook, implies that robotics—and thus, physical intelligence—would be a necessary part of AGI.
In 2023, DeepMind co-founder Mustafa Suleyman, now CEO of Microsoft AI, proposed the term “Artificial Capable Intelligence” (ACI) to describe AI systems that can accomplish complex, open-ended, multistep tasks in the real world. More specifically, he proposed a “Modern Turing Test” in which an AI would be given USD 100,000 of seed capital and tasked with growing that into USD 1 million.12 Broadly speaking, this blends OpenAI’s notion of economic value with Marcus’s focus on flexibility and general intelligence.
While passing this benchmark would likely demonstrate genuine ingenuity and interdisciplinary competence, framing intelligence as a specific kind of economic output is, in practical terms, prohibitively narrow. Furthermore, focusing solely on profit introduces significant alignment risks.13
Some researchers, such as Blaise Agüera y Arcas and Peter Norvig, have argued that advanced LLMs such as Meta’s Llama, OpenAI’s GPT and Anthropic’s Claude have already achieved AGI. They posit that generality is the key element of AGI and that today’s models can already discuss a wide range of topics, perform a wide range of tasks and process a diverse array of multimodal inputs. “‘General intelligence’ must be thought of in terms of a multidimensional scorecard, not a single yes-or-no proposition,” they write.14
There are many detractors to this position. The authors of the DeepMind paper argue that generality itself does not qualify as AGI: it must be paired with a certain degree of performance. For example, if an LLM can write code, but that code isn’t reliable, then that generality “is not yet sufficiently performant.”
Yann LeCun, Meta’s chief AI scientist, has stated that LLMs fall short of AGI because they lack common sense: they can’t think before they act, can’t perform actions in the real world or learn through embodied experience, and lack persistent memory and the capacity for hierarchical planning.15 On a more fundamental level, LeCun and Jacob Browning have argued that “a system trained on language alone will never approximate human intelligence, even if trained from now until the heat death of the universe.”16
Goertzel and Pennachin state that there are at least three basic technological approaches to AGI systems, in terms of algorithms and model architectures.
Predictions about the future of AI always entail a high degree of uncertainty, but nearly all experts agree that AGI will be possible by the end of this century, and some estimate it might happen far sooner.
In 2023, Max Roser of Our World in Data authored a roundup of AGI forecasts (link resides outside ibm.com) to summarize how expert thinking has evolved on AGI forecasting in recent years. Each survey asked respondents—AI and machine learning researchers—how long they thought it would take to reach a 50% chance of human-level machine intelligence. The most significant change from 2018–2022 is the respondents’ increasing certainty that AGI would arrive within 100 years.
However, it's worth noting that those surveys were each conducted before the launch of ChatGPT and the beginning of the modern generative AI (gen AI) era. The increasing pace of advancements in AI technology since late 2022, particularly in LLMs and multimodal AI, has yielded a much different forecasting environment.
In a larger follow-up survey of 2,778 AI researchers by Grace et al., conducted in October 2023 and published in January 2024, respondents estimated a 50% chance of “unaided machines outperforming humans in every possible task” by 2047—13 years earlier than experts had predicted in a similar study only one year prior.
But as Roser notes, research has shown that experts in many fields aren’t necessarily reliable when forecasting the future of their own discipline. He cites the example of the Wright brothers, generally considered to be the inventors of the world’s first successful airplane. In an award acceptance speech on 5 November 1908 at the Aéro Club de France in Paris, Wilbur Wright is said to have proclaimed, “I confess that in 1901, I said to my brother Orville that men would not fly for 50 years. Two years later, we were making flights.”18
NOTE: all links reside outside ibm.com.
1 Computer Power and Human Reason: from Judgment to Calculation (page 6), Joseph Weizenbaum, 1976.
2 "Minds, brains, and programs", Behavioral and Brain Sciences (archived via OCR by University of Southampton), 1980.
3 ibid.
4 "Can we accurately bridge neurobiology to brain-inspired AGI to effectively emulate the human brain?", Research Directions: Bioelectronics (published online by Cambridge University), 12 February 2024.
5 "Physical intelligence as a new paradigm", Extreme Mechanics Letters, Volume 46, July 2021.
6 "Computing Machinery and Intelligence", Mind 49: 433-460 (published online by University of Maryland, Baltimore County), 1950.
7 "On the Working Definition of Intelligence", ResearchGate, January 1999.
8 "Open AI Charter", OpenAI, archived on 1 September 2024.
9 "AGI will not happen in your lifetime. Or will it?", Gary Marcus (on Substack), 22 January 2023.
10 "Wozniak: Could a Computer Make a Cup of Coffee?", Fast Company (on YouTube), 2 March 2010.
11 "Dear Elon Musk, here are five things you might want to consider about AGI", Gary Marcus (on Substack), 31 May 2022.
12 "Mustafa Suleyman: My new Turing test would see if AI can make $1 million", MIT Technology Review, 14 July 2023.
13 "Alignment of Language Agents", arXiv, 26 March 2021.
14 "Artificial General Intelligence Is Already Here", Noema Magazine, 10 October 2023.
15 "Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI", Lex Fridman Podcast (on YouTube), 10 October 2023.
16 "AI and The Limits of Language," Noema Magazine, 23 August 2023.
17 "Why is the human brain so difficult to understand? We asked 4 neuroscientists." Allen Institute, 21 April 2022.
18 "Great Aviation Quotes: Predictions," Great Aviation Quotes.