Scaling AI, agent-led future and race to AGI


Is 2024 the year scaling AI officially breaks? In episode 29 of Mixture of Experts, host Tim Hwang is joined by Anthony Annunziata, Kate Soule and Naveen Rao. Listen as the experts discuss whether we are living in a post-scale world. Next, they chat about AI agents and what the future holds for the technology. Finally, hear them debate whether AGI is here to stay. Tune in to this week’s Mixture of Experts to find out.

Key takeaways:

  • 0:00 Intro
  • 0:48 Scaling AI
  • 16:56 Agent-led future
  • 26:52 AGI hype

The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.


Episode transcript

Tim Hwang: Will we see a 1 million GPU cluster opening up sometime in the next 3 years? Kate Soule is a director of technical product management at Granite. Kate, welcome back to the show. What do you think?

Kate Soule: No, I really don’t think so.

Tim Hwang: Anthony Annunziata is director of AI Open Innovation. Anthony, welcome to the show for the first time. What’s your take?

Anthony Annunziata: I don’t think so either.

Tim Hwang: And then we’ve got a very special guest. Naveen Rao is VP of AI at Databricks. I think our first external guest on Mixture of Experts. Naveen, what do you think?

Naveen Rao: Unlikely. I think there will be a reset in terms of expectations and ROI, and that’s probably going to drive a little more rationality into building these out.

Tim Hwang: All right, all that and more on today’s Mixture of Experts.

I’m Tim Hwang, and welcome to Mixture of Experts. Each week, we bring you the insights you need to navigate the ever-changing, ever-unpredictable world of artificial intelligence. Today, we’re talking about 2025—what the future holds for agents and for AGI—but first, let’s talk about the future of scale.

AI companies have been chasing scale; unless you’ve been living under a rock, that won’t be news to you. Where that’s been most prominent is in data centers and power. McKinsey just came out with a report estimating that global demand for data centers could triple by 2030, with generative AI driving huge increases in energy consumption. Their mind-boggling estimate is that spend on this infrastructure will reach $250 billion by 2030.

Kate, maybe I’ll kick it to you first. Can you give our listeners a little intuition for why all these companies are chasing scale and why that’s been important to the history of AI so far?

Kate Soule: Yeah, sure thing, Tim. If you think of how these models have trained and evolved, it’s been a simple formula: take as much data as you can get, add as much compute as you have access to, and train a model for as long as you can afford to maximize performance. So, the recipe for scale has been a mixture of getting more data and more compute, which drives costs and demand for data centers.

I think interesting things will emerge that might break these trends. For one, we’re just running out of data. All the data is being used; model size no longer scales proportionally, and there’s only so much data worth training on. We’re also seeing more compute spent at inference time instead of just training time. As we max out what we can pre-bake into the model during training, we’re looking at other places, like during inference, to spend extra compute to boost performance. That might also start to break some of those trends.

Tim Hwang: That’s great. Naveen, maybe I’ll turn to you. In your opening comment, you said scale isn’t all you need and we’ll have to reevaluate how we do machine learning. To kind of take Kate’s comment, why shouldn’t we believe it will keep working? We’ve had huge successes just doing the “dumb thing”—adding more data and compute. Why is now different?

Naveen Rao: Well, you also have to look at the motivations. I was a scale maximalist for a long time; I started the first AI chip company in 2014, designed for scale from day one. I’ll offer a different explanation. Yes, everything Kate said is correct, but there’s another motivation: as an engineer, it’s a really freaking cool problem to scale something bigger and bigger. It’s just cool; you want to work on it. I’ve been seduced by that myself—“Oh, this is cool; I want to build that.” There are interesting challenges each time—latency starts to matter, how do I deal with that, can I come up with new strategies? It’s an intellectual pursuit.

But at some point, you’ve got to solve problems not just for their own sake, and I think that’s where we are now. As Kate said, we’ve run out of data, but also, the paradigm of simply training on more data isn’t going to yield more results. I’m happy to go into why: these things are essentially conditional probability estimators, and you can never uncover every conditional probability in the data. You’ll get to the heat death of the universe before you uncover all of them. There will always be some return from getting bigger and using more data, but it’s diminishing for real-world applications, so you need a new paradigm.

Tim Hwang: Yeah, for sure. Do you want to talk a little about what you think that new paradigm is? It’s the multi-billion dollar question. While we’re speculating about 2025, what are your intuitions? If data is failing us—a crazy thing to say—what comes after data?

Naveen Rao: I think there are several facets. On the algorithmic side, it’s intuitive. If you’ve been around a child learning or training an animal, you don’t train it through exhaustive observation. You don’t put a kid in front of every observation of a task and expect them to learn it; that’s what we’re asking of an AI model. We actually do it through trial and error, performing something and getting a reward or anti-reward—that’s reinforcement learning.

We do a weak version of reinforcement learning with neural networks, but it’s predicated on a huge set of distributions it’s been trained on. Animals are more efficient; they observe some, build baseline distributions, then act and update these distributions all at once. I think something towards that end is the answer. We can be more compact with our representations and discern causality. Causality may or may not exist from a physics standpoint, but it’s a more compact way to describe how the world works. We can’t make it hugely observational; it’s not going to work.

Tim Hwang: Yeah, that’s super fascinating. It’s funny to think the recent history of deep learning is out of order. In the AlphaGo era, everything was going to be reinforcement learning; then that petered out as other approaches succeeded. You’re saying we kind of have to get back to that.

Naveen Rao: I actually think AlphaFold and AlphaZero were very much on the right track. We didn’t have the scale part figured out yet, but it was the right approach.

Anthony Annunziata: I’d present a complementary perspective, maybe a simple one. Research is hard. When you find something that works in AI, people jump on it and run. When that happens, there’s irrational exuberance in the research community. Sometimes decisions are made more deeply, but sometimes you just find something that works and push it as hard as you can until it stops working or until other things catch up, including cost and ROI.

Tim Hwang: 100%, yeah, for sure. Your title is specifically looking at open innovation. Traditionally—by which I mean the last 36 months—a lot of breakthroughs required access to a really big computer no one else had. Anthony, if scale is no longer the thing that gets the breakthrough, are there more opportunities elsewhere? Will more people be able to advance the state of the art without needing a million-GPU cluster?

Anthony Annunziata: Yeah, I think so. Taking a bit of what Naveen said, innovation at the architectural level, the feedback level, in how AI systems are built—there’s a huge opportunity for that in the open community, in universities, in players left on the sidelines who struggled to catch up with the scale story due to the centricity of compute. We’re going to see even more of that.

The other side is, after a couple of years of pushing ahead, we have great open models that are very capable, and you’ve already seen a flourishing of innovation with them. There’s a lot more to go with what we’ve built already and what will continue to come out.

Tim Hwang: I want to force the panel to make concrete predictions. One interesting thing about scale is the dream that if you flip over a few more cards, the model will get that much better. It feels like the gas could run out of the scale car before we realize scale is broken. Is 2025 the year where scale breaks? Where we realize this isn’t going to work anymore?

Naveen Rao: I think it already broke.

Tim Hwang: Why do you think that?

Naveen Rao: Show me a bigger model than GPT-4. No one’s built one, and there’s a good reason. They probably have built one, but it didn’t do anything all that special. I think that’s the issue. Scale is not the only ingredient; you need that plus something else, and maybe then you’ll get super intelligence, but we haven’t cracked that yet.

Tim Hwang: Kate, Anthony, do you agree? Has scale already failed? Are we already in a post-scale world?

Kate Soule: I think there’s an important part we haven’t covered: part of the advantage of scale is boosting the performance of smaller models. Maybe the performance at the top has maxed out due to pure size, but there’s more to talk about in scaling performance and packing more performance into fewer parameters on smaller models, using large models as teacher models, synthetic data generators, or in RLAIF workflows to improve smaller models.

We’ve seen a trend: what a model could do last year with 70 billion, 100 billion, or a trillion parameters, you can do many of those same tasks in fewer than eight billion parameters today. I don’t think we’ve maxed out that curve of downsizing and packing more performance into smaller models.

Tim Hwang: The commercial dynamics are interesting. The rhetoric has been, “We’ll train a massive model and sell an API against it.” Kate, you’re presaging a world where big labs have gigantic models for internal purposes, almost for minting smaller models that are the real commercial action.

Kate Soule: I think there’s a huge competitive advantage for model providers having their own in-house large model to boost and create the smaller models everyone will use. No one wants to run a trillion-parameter model for real tasks. It’s cool to say you have it, but no one wants to use it in real-world applications. Smaller models will be much more cost-effective.

Naveen Rao: I’ll offer another set of data. I’m a neuroscientist from grad school, and I like to look at biology as a blueprint. Over four billion years of evolution, interesting things came about. If you look at brains, scale wasn’t all you needed. Humans don’t have the largest brains; brains scale with body size—blue whales have the largest by mass, likely more neurons. Dolphins and elephants have large brains, but they haven’t had the same impact. I argue there are architectural differences that lead to this. We came up with the right mix of scale, architecture, and environment to build human intelligence.

Tim Hwang: Yeah, it’s like the adage: super intelligence isn’t all you need. You might have a huge brain, but its impact may be limited.

Naveen Rao: Yeah, that’s a whole other topic. I don’t even know what super intelligence is; how do we define it? We haven’t even solved regular intelligence; we can’t even define it.

Tim Hwang: We’ll get to that. Anthony, maybe I’ll turn to you for predictions to close this segment. I observe that contracts to build massive data centers are being signed now, regardless of where the scale debate lands. In 36 months, will we see these huge facilities mothballed? Big, empty data centers?

Anthony Annunziata: No, I don’t think that’ll happen. There’ll be a correction, but a smooth one. What’s important is that we’ve focused on the training part of scaling. The scaling of deployment—whether medium-sized models, small models, or APIs to big models—will depend on cloud data center availability. The trend is reasonable; maybe inflated a bit, but it won’t collapse wholesale.

Kate Soule: I agree. Where there are opportunities to add something beyond scale to improve performance, we’re seeing innovation. We can do more at runtime for a single model, regardless of scale, to boost performance—like running it multiple times to generate multiple answers. If that trend continues, a larger population will drive up inferencing costs, paying for their small fraction, versus training where big providers dump billions. If everyone sees that lift and has their own ROI, that will drive investment in inference compute even more. An API call could cost 10 times what it does today because it’s worth it for performance.

Tim Hwang: Yeah, for sure. It’ll be interesting to see data centers built for training but used for inference.

Anthony Annunziata: I think there’s a whole scale story in deploying AI practically that just started. There are parallels to the internet buildout around 2000 when the stock market crashed; people said we overbuilt network infrastructure. In the fullness of time, it was underbuilt. There was an overbuild for a short period, maybe two or three years, until demand caught up. We’ll probably end up there. There will be articles saying everything was overbuilt, the bubble burst, and in two years, it’ll all make sense.

Tim Hwang: Wouldn’t that be a nice change?

Naveen Rao: High-availability GPU compute at reasonable prices.

Tim Hwang: Dream the impossible dream.

I’m moving us to our second segment. If one word characterized enterprise AI in 2024, it’s “agents, agents, agents.” On this show, it’s an in-joke that agents must come up at least once. News is out that Salesforce is planning to hire 1,000 salespeople to support its push into the agents market. As we get into November and think about 2025, is the future really agents? Will I have to hear more about agents in 2025? Naveen, I’ll kick it to you first. How do you think this market will evolve? Is hiring a thousand salespeople justified?

Naveen Rao: Well, honestly, no, because I know the state of the art of agents. But it’s a great headline; Salesforce is great at that. It’ll make them appear as a big AI player, and that’s what they’re going for. I don’t think it’s necessary because when an agent really works, you won’t have to sell it much; it’ll just automate things. We’re not there yet; the hype is ahead. There’ll be a big disillusionment in the next two years, then it’ll come back slowly and be super useful in three or four years.

You won’t need a thousand people to focus on agents; it’ll be amazing for the product, and people will use it. Their sales infrastructure should handle it. At Databricks, we’ve gone through this: should we hire a bunch of people to sell AI or lay it into the product? We’ve done a mix; we haven’t hired a thousand, but some, with mixed success. You need to integrate into how people use the tools, make it somewhat invisible, and it will sell itself.

Tim Hwang: Anthony, I see you nodding vigorously. Naveen said promises aren’t matching up. Do you have thoughts on where the gaps are?

Anthony Annunziata: A few thoughts. First, what is an AI agent? There’s a large spectrum. In announcements like the one you referenced, “agent” refers to a relatively early stage of automation—task automation and execution. If by agent we mean a chat experience with more lookup and search capability, asking questions to get the right data—more interactive, with implicit reasoning—we’ve already seen that; it’ll steadily grow.

If instead we’re talking about an agent you give a goal, and it interacts with a large variety of systems and executes without supervision—no way. There are so many steps with compounded error. We can’t even get high-accuracy basic Q&A in many industry domains yet; no way we’ll get that level of automated execution. So, a piece is valid and true and will grow, but there’s a long tail of research for full fruition.

Tim Hwang: It’s funny; an adage about AI is we don’t know what we’re talking about; people say “AI” for everything. Anthony, you’re arguing that’s happening in the agent market—the word is so broad. Are you just talking about RAG? If so, sure, agents exist.

Anthony Annunziata: I agree; there’s a definite stretching of the definition.

Tim Hwang: Kate, maybe I can ask you to jump in. Anthony had two pictures: a chatbot that looks things up, and an agent that does the whole thing in the real world. Are you an optimist? Do you think we’ll get to the second vision, or is it way off?

Kate Soule: I’m pretty skeptical of the broad definition of agent today. An agent is really just a long prompt now—a multi-page prompt asking a model nicely to do five different things, think in a specific order, call APIs a specific way. It works well but isn’t controllable. There’s no real thought on control points needed in agentic workflows for robustness and reliability deployed in the world.

A lot of work is needed to transition from a four-page “word vomit” of instructions to a controllable program with clear rules—some at the system level, some at the model level—that can execute tasks within a degree of freedom, not unlimited. I worry everyone’s amazed that a model can follow four pages of instructions—I can’t read four pages and remember everything—so it’s impressive, and hype is built around it. But if we’re not careful, we’ll cram more instructions into the prompt instead of focusing on control points for AI-enabled workflows to be automated. Does chat need to be part of them? “Agent” connotes conversation or dialogue, but many opportunities for AI aren’t chat-based. A lot of evolution is needed for agents to find application and get traction.

Tim Hwang: Yeah, it’s interesting. The chat thing might be an accident of history; in the long run, it might prove to be a bad interface. If I’m writing an email, I don’t want a multi-turn conversation; I want a short box to put info in and get an email out.

Naveen Rao: Chat has been an obsession in AI for decades—go back to ELIZA. As Kate said earlier, it feels really cool; that’s a strong motivator. Part of this is we haven’t built models that uncover causality. You can’t build something with agency unless it understands the intrinsic nature of the world—“I do this, and that happens.” These models don’t have that; they pick up on patterns but don’t understand causal relationships.

Tim Hwang: I’m starting to get your vibe; you’re grumpy about AI.

Naveen Rao: I devoted my last 15-18 years to this field. I’m not grumpy; I’m a realist.

Tim Hwang: In the spirit of realism, can I push you on predictions for agents in 2025? What’s the bull case? What’s the most impactful thing on agents in the next 12 months, if anything?

Naveen Rao: Yeah, if you narrow the definition, we get something super useful. An LLM that can summarize and do all it does is super useful; it doesn’t mean it’s AGI (I hate that term). As Kate said, the interface isn’t necessarily a chatbot. I want something in an Excel spreadsheet to impute values or describe things. There are so many ways to add value to experiences. Automating tasks—“copy these cells and apply this formula across rows”—that’s an agentic workflow, but it’s not thinking on its own; I’m telling it what to do within the app’s framework.

That’s what we’ll see more of. Inside Databricks, we’re using LLMs and generative AI to improve the experience—finding bugs in SQL code, proposing fixes. These are big time-savers. That’s what we’ll see in 2025; it will drive demand for compute, but it’s not super intelligence. That’s what I’m grumpy about.

Tim Hwang: I’m glad you said that, Naveen; it’s a good sign when a panelist says, “I really dislike that term.”

Moving to the third segment, let’s talk about super intelligence and AGI. As we look toward 2025, it’s part of the narrative. The Information reported that OpenAI is seeing improvement rates in GPT slowing over time. I think Ilya said progress is slowing down. I want to put those rumblings next to what we hear from industry leaders. Sam Altman predicted super intelligence is potentially a thousand days away. Anthropic warned systems are advancing so quickly we need serious, targeted regulation in the next 18 months.

Anthony, I’ll kick it to you first. What are we to make of this? Is AGI on the way? How do we square what we’ve talked about—it’s getting harder—with strong claims that ultra-powerful systems are coming in the next thousand days?

Anthony Annunziata: Look, we’re talking about it, so the headlines work. It’s a compelling topic that attracts public attention, like the superhero obsession. I think a lot of it is that. Where are we today? I don’t even know a working definition of AGI. I can propose my own, but what will matter is more ways AI is integrated, embedded, and helps in specific contexts. Naveen mentioned coding assistance; it’s an early use case with lots of utility; we’ll see more.

When does AI reach general intelligence, even defined as equivalence to human capacity to know, reason, perceive? That’s a long way off.

Naveen Rao: I don’t know if it’s a long way off; it depends on your definition. We will get there; it’ll be harder than we think. Our perception is linear: it gets better every year, so in two years, we’ll have this other thing. That’s not how technologies evolve; we always overestimate in the short term but underestimate in the long term because they work on exponentials. A 5-10% improvement year-on-year adds up fast by year seven.

In 10 years, we might have something that reasons and understands causality. My prediction: by 30 years, it’s a 95% chance; by 10 years, 30%. My bounds are 10-30 years from now. That’s not that long.

Tim Hwang: So, people alive today will see it.

Naveen Rao: Yeah, totally, which is cool. But it won’t happen next year; that’s a hype train. We haven’t solved fundamental problems yet, and we’ll see around that precipice a year ahead of time. Right now, it’s not clear, so it’s not credible.

Kate Soule: We’re conflating things—cause and effect vs. super intelligence. There are causal models today in drug discovery that isolate cause-effect relationships. Are we talking about models understanding causal reasoning or sentience, having a personality, doing things of its own will? Those aspirations are more around marketing. There aren’t the right economic incentives to develop that vs. better tools for handling language and tasks.

The next three to 10-plus years is a more realistic horizon for causal reasoning.

Tim Hwang: There’s an adage in financial markets: the market can be irrational longer than you can stay solvent. I joked with a friend: AGI can be imminent longer than you can stay solvent—it’s just around the corner.

Anthony, to go back, I want to challenge the idea it’s all marketing. In Dario Amodei’s essay (we talked about it a few episodes ago), he writes about AI changing the world. People say it’s marketing, but he wrote about this as a grad student; he still does. To look past current problems, you almost have to be a true believer. I don’t think it’s marketing; I think they genuinely believe it’s imminent. How do you think about that?

Anthony Annunziata: I think AI will change the world—incrementally, practically, and quickly. It already is, in practical ways we’ve talked about: specific applications integrated with software, capabilities we want assistance with. I wouldn’t say people are disingenuous, but there’s a cultural obsession with “super anything.” It’s interesting and fun. A more negative side is the existential debates that hopefully are dying back; they were really heated a year and a half ago.

It’s a natural attractor; it’s hard not to bring it up. Maybe I’m too much of a pragmatist; I focus on how AI is actually helping and will help every day. On this podcast, probably before too long, that’s how the world changes—not with super intelligence.

Tim Hwang: I agree; I don’t think they’re disingenuous; people believe it, and that’s fine. Naveen, you want to pull back and contextualize?

Naveen Rao: Do you care if the airplane was invented in 1903 or 1910? Does it matter? We’re splitting hairs. If it’s three years, as Dario says, or 10 years, looking back in 50 years, it doesn’t matter because of exponentials. It’s okay to be exuberant and believe.

Also, Anthropic’s warnings on safety and understanding aren’t predicated on super intelligence arriving. Dumb intelligence can be dangerous once it’s out in the world. If we give LLMs access to API calls and the ability to affect the world, pulling real data into decision-making, then from that perspective the warning is absolutely valid and something everyone should be aware of, AGI or not.

Tim Hwang: Yeah, for sure. These last comments are interesting. All three of you pitch yourselves as realists, but we’re landing on: we all agree this technology will be a huge deal; we’re just hair-splitting over 10 years vs. two months, which is interesting.

A final comment: you all talk to customers in the market who need to decide if this technology is better than their current stack. Do you hear from customers, “By the way, should I be worried this will destroy the world?” How much is this chin-stroking media discussion vs. influencing enterprise decisions? Are they separate worlds?

Anthony Annunziata: Certainly, lots of customers are concerned and ask about accuracy, trust, implementing use cases with high-quality output they can trust in deployment to save or make money without big liability. There are challenges in health, finance, legal, many areas. I hear very little, if any, questions about AI destroying the world or contributing to a robot army. It’s very practical, business-focused, as it should be.

Tim Hwang: That’s interesting. We think of AI discussion as one block, but in practice, there are distinct fora. Kate, Naveen, thoughts on what you hear from customers and if AGI stuff registers?

Naveen Rao: Anthony nailed it; it’s practically grounded. That said, motivations are such that no one wants to be left behind. There’s a lot of top-down push for AI, even from boards. I’ve spoken to multiple boards of large public companies; this is front and center. It’s about the next technology transition; we have to be part of it. No one’s talking about it taking over the world; it’s about crafting a strategy to be part of this new world.

Kate Soule: I’d echo both. I’m optimistic that conversations with enterprises are about how to take advantage but also ensure the right protocols, control points, and safety measures are in place because their bottom line is at risk. That provides helpful pressure on model providers to develop solutions for a responsible, governed approach vs. building far and fast. That will help create tooling so it isn’t a concern about AGI taking over; we’ll have built controls and processes for a well-governed AI system.

Tim Hwang: I can’t think of a better note to end on. I’m wrapping us up. Kate, as always, thanks. Anthony and Naveen, hope to have you on again. Thanks for joining us. If you enjoyed this, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. We’ll see you next week on Mixture of Experts.

Stay on top of AI news with our experts

Follow us on Apple Podcasts and Spotify.

Subscribe to our playlist on YouTube