Will 2025 be the year of AI agents? In episode 35 of Mixture of Experts, join host Tim Hwang along with some show veterans as they look back at 2024 in AI. Tune in to this week’s discussion, as we review AI models, agents, hardware and product releases with some of the top industry experts. What was the best model of 2024? Will NVIDIA still be king in 2025? What are some of the AI trends in 2025?
All that and more on this special edition of Mixture of Experts.
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Tim Hwang: All right, looking back at 2024, what was the best model of the year? For me, it’s going to be Gemini Flash. And I’m going to nominate a sequence, I think, which is the sequence of the Llama models. So, is the bubble finally going to burst on agents in 2025? Agents are the world. Agents are everything. And in 2025, we’re going to have super agents. In 2025, is NVIDIA still going to be king? Not only is NVIDIA here, but we also see new entrants, the other players in the market. Are we going to end up having openness and safety? You can do this out in the open. It does not need to be behind a black curtain, so to speak. All that and more on today’s “Mixture of Experts.”
I am Tim Hwang, and welcome to “Mixture of Experts.” Each week, MoE is dedicated to bringing you the gold-standard banter you need to make sense of the ever-evolving landscape of artificial intelligence. Today, we’re looking back at the huge evolutions across 2024. You know, just to take you back: in January of 2024, we were all chattering about the release of the GPT Store and Claude 2.1’s long context window, and I think at that point we were still waiting for the release of Llama 3. 2024 was obviously an incredible, dynamic year in AI, and so what we’ve done is we’ve gathered a bunch of our best panelists to talk about what stood out to them, what didn’t go as well, and what they think will happen in 2025. We’re going to talk about agents, hardware, and product releases from the whole year, but first, we’re going to start with what happened in the world of AI models in 2024.
And to help us unpack the journey we’ve been on, we have with us Marina Danilevsky, who’s a Senior Research Scientist, and Shobhit Varshney, Senior Partner Consulting on AI for US, Canada, and Latin America. And so I want to actually start with a quick, more recent story, right? Even before we zoom back to the dark ages of January 2024, which is the release of o1. Obviously, this was a big announcement, one of the biggest announcements of the year. And I know, Shobhit, before the show, you and I were talking, you wanted to kind of get in and actually just point out that the release of o1 actually marks a pretty big change in how these companies are thinking about doing models and scaling these models. And maybe we’ll just start there if you want to jump in.
Shobhit Varshney: Excellent. It’s such a great time to be alive. What we see all around us—there’s no other year in your entire career that you would rather be alive than today. In the last year or so, we saw the era of scaling laws. We got to a point where we realized that adding more compute and building larger models got us incredible, incredible performance from these models, right? So we got to a point where we have insanely large models—now Llama 3 at 400 billion parameters, a reported 1.75 trillion for GPT-4. You can see this huge set of big models that are doing amazing work.
Now we are transitioning to a couple of different shifts in the market. One, we are seeing more of a shift towards the inference phase: slow down, think about what you want me to do, think through a plan, and come up with an answer. We also started to give these models more tools that they can use, just like we learned to use tools as we grew up. So we have these agentic flows that are helping us increase the intelligence as well. We also saw a big shift in overall cost. The cost of these proprietary models plummeted in the last year or so, and smaller models got more and more efficient and started to perform much, much better. So we’ve seen this shift towards insanely large models that can think a lot more. We saw us run out of all the public internet data, and now we’re focusing a lot more on high-quality enterprise data or data built for specific models. So we’re now getting to a point where you have a teacher model that’s insanely large and can really think through the whole problem, that can create synthetic data and help train, or distill, a smaller model that delivers high performance at a lower price point. So we’ve shifted quite a bit in how we think about AI models and how we have been investing in building them. 2025 and beyond is going to be a completely different ballgame in what we see AI models do.
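To make the teacher-student idea Shobhit describes concrete, here is a minimal distillation sketch in PyTorch-style Python. The load_model helper, model names, temperature, and loss weighting are illustrative assumptions, not a description of any particular lab’s pipeline.

```python
import torch
import torch.nn.functional as F

# Hypothetical models: a large frozen "teacher" and a small trainable "student".
# Any causal LMs sharing a tokenizer would do; load_model is an assumed helper.
teacher = load_model("big-teacher-llm").eval()
student = load_model("small-student-llm").train()
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

T = 2.0      # softening temperature for the teacher distribution
alpha = 0.5  # mix between distillation loss and ordinary next-token loss

def distill_step(batch):
    """One training step: push the student toward the teacher's token distribution."""
    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"]).logits
    student_logits = student(batch["input_ids"]).logits

    # KL divergence between softened teacher and student distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Standard cross-entropy on the ground-truth (or teacher-generated) tokens.
    ce_loss = F.cross_entropy(
        student_logits[:, :-1].flatten(0, 1),
        batch["input_ids"][:, 1:].flatten(),
    )

    loss = alpha * kd_loss + (1 - alpha) * ce_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```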
Tim Hwang: Marina, what are your thoughts?
Marina Danilevsky: Yeah, I think you’re right. It’s been a really interesting year in terms of where we started and where we’ve ended up. We’ve seen that, yes, we can go bigger and bigger and bigger, and now we’re finally there. We can say, great, how well can we do now if we go far smaller? So after that initial research push of “how big can we go,” we’ve finally given ourselves the luxury of, “All right, now it’s time for efficiency. Now it’s time for cutting costs. Now it’s maybe eventually time to talk about environmental aspects and things of that nature.” Maybe next year.
Tim Hwang: Is that a prediction for 2025 or?
Marina Danilevsky: 2025. So that part is very interesting. It also means that the quality has gotten to where we can start to build enterprise-grade solutions reliably. And I’m excited for that. I know we’re not talking about next year yet, but that’s the thing that I’m really excited for. The quality is there, I think finally. And we can start getting real serious about enterprise solutions.
Tim Hwang: Yeah, I mean, I think that seemed like a really big trend this year. Certainly, as someone who kind of does software engineering in their free time as a hobby, this is the year where I was like, “Wow, I am finally able to do stuff with these coding assistants that I would not otherwise be able to do.” It’s like finally fit for purpose for me to kind of use on a day-to-day basis. And I think that was a very big jump that we noticed in the last 12 months. I guess, Marina, are there particular stories that stand out to you from, I don’t know, earlier in the spring or otherwise where you’re like, “Oh, when I look back on 2024, I’ll really remember it for X”?
Marina Danilevsky: I mean, first of all, I’ll remember it for just the very, very high levels of competition. It felt like every two weeks somebody was coming out with something, and companies that you maybe wouldn’t even expect—like, very recently, Amazon being like, “Oh, they’re working on that. Oh, that’s actually pretty good.” So I think I’ll remember it for a lot of people trying to really one-up each other in a good way, in a way that actually really pushes the thing forward. But I think that the number of players that we have this year is what’s really going to make it stand out for me. And as we talked about in previous episodes, some of the debuts were more successful, some were less successful. Sometimes people didn’t quite double-check everything. Maybe sometimes people thought that the demos were a little bit overcooked. And so I think that’s the thing that’ll make me really remember the year: the different ways people joined the competition and introduced their own flavor.
Tim Hwang: Shobhit, how about you?
Shobhit Varshney: I think from an enterprise perspective, this was an amazing year. We recently ran a survey for our AI report, and about 15 percent of our clients globally got real tangible value by applying generative AI. There’s a lot of knowledge that was locked in documents and processes, things of that nature. And we saw meaningful movement in how clients are focusing on a few small, complex workflows and delivering exceptional value out of them. I think we did not get enough value out of the generic co-pilots or assistants. That has shifted more towards, “Hey, this really has to be grounded in my data and my knowledge and things of that nature.” But overall, the last two weeks that we just went through, I think that was the most action we’ve ever seen in the last two, three years of AI—the competition between OpenAI and Google, and then Meta jumping in. That has been a phenomenal, phenomenal movement in the community. And now we’re starting to see a move towards, “Hey, we have exceptional models, how do we start to control them a little bit more, adapt them to our enterprise workflows and our datasets, and have them think and reason with tools more?” The big movement around o1, I think, is going to go down in history as the point in time when we started to realize that $200 a month is actually great value. You start to get to a point where if you’re spending $200 a month, you’re really being very focused on which workflows truly can see an uplift from applying AI to them. Now you’re at a point where you’re really paying somebody to augment every aspect of your daily life. I think we have great momentum to start 2025.
Tim Hwang: Yeah, for sure. And I guess, I don’t know if folks have nominees on superlatives for this, or it’s like, is o1 the release of the year? I mean, I think from a model standpoint, or were there other ones that kind of stand out? I mean, I guess we also had Llama 3 this year, right? It was also a huge, huge announcement.
Shobhit Varshney: For me, it’s going to be Gemini Flash. I think what they’ve just done with a small model that does multimodal is going to drive the next two, three years of computing. And the reason I say that is everything that you can now unlock. If you followed the Android XR announcements recently, you’re now at a point where—multimodal models were inherently insanely large, they needed a lot of compute, and they always ran on servers. Now with models like Gemini Flash, you’re getting to a point where a small model can do multimodal really, really well. And the thing that will blow you away is how it starts to remember things that you’ve just seen, right? I think it’s going to start augmenting all parts of our day-to-day workflows, including memory. That’s something that we have not seen so far. We used to generally ask questions from a very cold start. Now we’ll get to a point where these models will have infinite memory and can have access to tools like we do. I’m very excited about high performance at a really small size. So we can then eventually get to this compute infrastructure where you can have XR/AR experiences, and you can bring compute closer and closer to the devices, which will drive a lot more privacy as well, because then the data is locked into those devices that I’m carrying with me versus somebody else’s cloud.
Tim Hwang: Yeah, I want to agree with that, actually—the small models thing. Because I think that we’re going to start, at least in the next year or two, seeing a lot more formal regulation going on and a lot more people waking up to what does it really mean—as you’re talking, Shobhit—if the models are starting to remember, starting to be personalized, starting to be customized, that’s going to become extremely, extremely relevant. So having something small, local, that you can actually have that guarantee technologically, that’s going to become very, very important. I agree with you.
Marina Danilevsky: Yeah, for sure.
Tim Hwang: And how about you, Marina? I think in terms of, you know, I know Shobhit was saying o1 was huge. Like if you have a “best model of the year” kind of nomination.
Marina Danilevsky: That’s a hard one. I like seeing them in a holistic way, and I feel like it’s hard to tell in the moment which thing is actually going to pan out. I’m going to nominate a sequence, I think, which is the sequence of the Llama models—not the Llama models themselves, but the sequence: we’ve seen what we can do with pre-training, and then we’re going to see what we can do with post-training. So we’re going to get bigger, bigger, bigger, and then we’re going to see how far down we can go. I’d like to see a consistent perspective of that as a sequence that people try: push the pre-training, push the post-training, push the size, and do that iteratively, iteratively, iteratively. I’d like to see that continue to be a thing.
Tim Hwang: Yeah, I feel like that’s how we know you’re a connoisseur, Marina, is like you like the curation of Llama. It’s not just like any given model is the best model.
Shobhit Varshney: Marina, I think we’ll get to a point where the big research labs are going to build even bigger models, but they may not release them to the public; they’ll use them more for creating synthetic data, for distilling, for teaching as a teacher model, and so forth. But I’m really excited that we’re finally coming to a point where—we’ve poked at this for a while, and we said, “Oh, if I just ask this model to think before it answers...” Well, this is what elementary school teachers teach kids, right? And now we’re trying to relearn how we teach young kids: try different things out, create a plan, answer the question, go pick up a calculator if you really need to, don’t try to do this in your head, things of that nature. I have little kids, and I’ve spoken about that quite a bit, and I feel there are so many similarities between how we’re training these models and how we do reinforcement learning with our kids, giving them rewards and mechanisms, breaking problems into smaller chunks so they can solve each one separately, with positive reinforcement when they get things right. I think we’re getting to a point where we’re learning how these models learn, and that becomes a good symbiotic relationship. I think we will stop asking these models to do things that humans do really well, and we’ll have a better mutual appreciation of which things should be delegated to these models. And that also means that benchmarks and how we evaluate these models are going to change quite a bit. But I think today we’re starting to get to know these models really well. And in 2025 and ’26, we’ll have a very different relationship with these models, where they become more of a companion versus trying to figure out, “Hey, can you do this as well as I do?”
Tim Hwang: Yeah, absolutely. Yeah, I think one of the funniest outcomes of this year has been all the examples of, like, “Could you just try harder?” and then the model actually just does better, which is very funny. I mean, computers did not use to do that.
So, I think maybe a final question, and then we can wrap up this segment, is we haven’t talked so much about multimodality, but it really seems poised to become a really big deal in 2025. I’m curious, I guess maybe Marina, I’ll start with you, if you’ve got kind of predictions for what’s coming up in the next year for multimodal.
Marina Danilevsky: Yeah, multimodal—that’s something we had thoughts about when foundation models first came on, ‘cause we were all very excited about the fact that, “Oh, well, it’s just tokens in order. It doesn’t have to be text. It can be anything.” But then I think the reason we all went into text first—with code as one of the very early parts of it—is the amount of training data that we had, the amount of examples that we had. So especially now that we’ve gotten better with synthetic data and, like you said, Shobhit, with teacher models, we’re going to be able to explore that space a lot more. And so I think they might finally be at the point where, once again, they are useful. There’s huge interest in having the multimodal models because, you know how with the text models we had the idea that when you have one model doing lots of tasks, the tasks learn from each other? Now it’s going to be even more interesting: if you have a multimodal model, does that actually make it better at each of the individual modalities? Again, I think the data is now finally there—not just the compute, but the data and the ability to create more data. And so I think that, yeah, next year we should see more. I was expecting to see maybe a few more models aimed at the sciences this year. Maybe next year we’ll see models that are more successful with video—not just Sora, but something that is maybe a little bit more useful lower down, think robotics. There’s a lot to be mined there. So that’s, I guess, where I see those. Maybe, yeah, the flashy parts are fun, but the real usefulness is somewhere a little bit lower down, with the hardware.
Shobhit Varshney: No, I think the multimodal space is going to be amazing over the next couple of years. And I think it is important for it to understand all aspects of what humans are seeing, feeling, looking at, reading, and listening to before it comes and helps us. I think it’s going to have a huge impact on its understanding of the world around us. So far, we have done things where, “Hey, I will take a picture of something, or I’ll translate that into text and ask a question of a chatbot.” That paradigm has not scaled. As the multimodal models get better and smaller—like Gemini 2.0 Flash Experimental—those are the ones that are going to drive richer and richer experiences in our day-to-day lives. And the competition is going to be very, very high. You will see these models come out from everywhere. The any-to-any models, going from speech to speech directly, are delivering exceptional customer experiences. If you look at traditional ways of doing AI, you would go speech to text, you take that text, you pass it to an AI model, the AI model figures out what to respond with, and you go back from text to speech. A lot is lost in translation and transcription. Now, when you go from media to media, from voice to voice, it starts to understand the nuances of how humans talk. I’m very excited about the next year of multimodal: small models, and then starting to capture the full context.
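As a concrete picture of the cascaded pipeline Shobhit is contrasting with direct speech-to-speech, here is a tiny sketch; transcribe(), respond(), synthesize(), and speech_to_speech_model() are hypothetical stand-ins for an ASR model, a text-only LLM, a TTS model, and an any-to-any multimodal model.

```python
def cascaded_voice_turn(audio_in: bytes) -> bytes:
    """Traditional pipeline: ASR -> text LLM -> TTS. Tone, emphasis, and pauses are
    lost at the first step, because only the transcript reaches the model."""
    text = transcribe(audio_in)        # hypothetical speech-to-text model
    reply_text = respond(text)         # hypothetical text-only LLM
    return synthesize(reply_text)      # hypothetical text-to-speech model

def native_voice_turn(audio_in: bytes) -> bytes:
    """Any-to-any model: audio tokens in, audio tokens out, so prosody and nuance
    can survive end to end."""
    return speech_to_speech_model(audio_in)   # hypothetical multimodal model
```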
Tim Hwang: That’s awesome. And that’s all the time we have for today to talk about AI models. Shobhit, Marina, thanks for coming on. Happy holidays, and we’ll talk next year about all this and more.
For our next segment, I want to talk about agents in 2024, and to help me do that, I’m gonna bring in Chris Hay, Distinguished Engineer, CTO Customer Transformation, and Maya Murad, who is the Product Manager for AI Incubation. Maya, Chris, welcome back to the show.
Well, so in 2024, it was the year of the agents, agents, agents, agents. I think it almost became a little bit of an in-joke at MoE that if we had an episode that did not include agents, that was a really big thing and an unusual thing. And so I guess probably let’s put it this way. And I guess maybe Chris, we’ll throw it to you first: Agents—overhyped in 2024 or underhyped in 2024?
Chris Hay: Underhyped, not hyped enough. Agents are the world, agents are everything, and in 2025, wow, we’re gonna have super agents. That’s what’s coming in ‘25.
Tim Hwang: Okay. And I guess, Maya, I mean, looking back, I don’t know if you’d agree with Chris or if there’s particular stories in 2024 that really stood out to you in the development of agents, if they’re going to be as big as Chris says for 2025.
Maya Murad: So I definitely agree; 2024, I would say, was a lot of talking about AI agents. I’m excited to see more execution, and what I expect to see is more quality hurdles once we see more agents being pushed into production. I think we’re just scratching the surface of what is needed. A trend that I’m starting to see right now is more protocols and standardization efforts. We saw Meta attempting to do that with the Llama Stack, and Anthropic with their Model Context Protocol, MCP. So I think it’s going to be this little battle over how we standardize how LLMs interact with the external world, and how agents—I think in the future it’s going to be how agents interact with each other. And I think this is where the next frontier is and where a lot of our efforts are going to be heading.
Tim Hwang: Yeah, this felt like a big, almost like a preparation year. I was looking at all the news stories and I was like, “Is the biggest agent story of the year that Salesforce is hiring a lot of sales agents to sell agents?” Between that and the technical standards, the releases were few and far between—it’s hard to point at one and say, “Oh yeah, this was the killer agent release of the year.” In fact, a lot more of it was prep. I don’t know if, Maya, you’d agree with that.
Maya Murad: It felt like it was the year of bracing for what’s to come and all the different things we needed to consider, and then who wanted to own that category. So it was really interesting that, for example, Meta went out early—the first iteration of Llama Stack was a little bit rough, but what they were trying to say was, “We’re in this for the long term, and we want to help define those agent intercommunication protocols.” And I have faith that if that’s a direction Meta wants to take, they’re going to do a good job at it. But this is also signaling something interesting. The last two years, it was mainly the field reacting to what OpenAI put out. So OpenAI put out their chat completions API, and the whole ecosystem followed suit. And if you didn’t have that exact API, your thing was much more difficult to consume. And now we’re seeing a lot more players contending to be the one setting those standards and protocols.
Tim Hwang: Yeah, for sure. And maybe, I guess, Chris, to turn it back to you, I mean, you just used the phrase “agents are the world,” which is a very bold claim. But, I mean, 2025, let’s say agents are a lot more popular, become a lot more prominent as a part of the landscape. You know, is it Meta that’s well-positioned to win here, or do you have any predictions about what we’re going to see in terms of who’s going to be leading in the space versus maybe a little bit further behind?
Chris Hay: So I really like what Maya had to say on Anthropic and the Model Context Protocol. I actually think that is going to be one of the biggest enablers for agents next year. And I think the problem that they’ve solved really well is allowing remote calling of tools. That’s probably the biggest thing that they’ve solved there, right? So if we think about the enterprise for a second, you’re not going to have agents that are sitting scouring the web, or they’re going to be sitting downloading documents, whatever. It’s going to be access to your enterprise tools. It’s going to be things like accessing Slack, it’s going to be accessing your Dropbox, or your Box folders, or whatever, or your GitHub. And a lot of that is being standardized. But more importantly, you want to take your own data, and then expose your own APIs, and expose that in a way that agents can consume data in a standardized way. And I think MCP has done a really good job of allowing you to remote call tools and then be able to chain them together with multiple servers. And I think that’s going to be a big enabler.
Now what’s interesting about what they’ve done there is that it is easy to hook up different LLMs, for example. So it’s not tied to the Claude stack; you can hook up any other model that you want. And it’s all tied in to function calling, which again was a standard created by OpenAI in that sense. So I like what you said there, Maya, about different providers coming in and creating an ecosystem. And I think that’s what I’d like to see happen: no one company winning, and this ecosystem of providers pushing everything forward, and we’re going to enter this world of the big agent marketplace. And that’s why I say super agents are coming, because it’s going to be this really big ecosystem that’s going to start to emerge in 2025.
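For a sense of the remote tool-calling pattern Chris describes, here is a toy sketch of the idea: a server advertises tools with schema-like descriptions, and the client dispatches a model-proposed call by name. It only mimics the shape of MCP (real MCP servers speak JSON-RPC over stdio or HTTP through the official SDKs); the tool names and fields are made up.

```python
import json
from typing import Any, Dict

# A toy "tool server": each tool has a description the model can read, plus a handler.
TOOLS: Dict[str, Dict[str, Any]] = {
    "search_tickets": {
        "description": "Search the ticketing system for open issues",
        "parameters": {"query": "string", "limit": "integer"},
        "handler": lambda query, limit=5: [f"TICKET-{i}: {query}" for i in range(limit)],
    },
}

def list_tools() -> str:
    """What the agent sends to the model so it knows which tools exist."""
    return json.dumps(
        {name: {k: v for k, v in spec.items() if k != "handler"} for name, spec in TOOLS.items()},
        indent=2,
    )

def call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Dispatch a model-proposed call; in a real setup this crosses a server boundary."""
    spec = TOOLS.get(name)
    if spec is None:
        raise ValueError(f"Unknown tool: {name}")
    return spec["handler"](**arguments)

# A model that supports function calling would emit something like this:
proposed = {"name": "search_tickets", "arguments": {"query": "VPN outage", "limit": 3}}
print(call_tool(proposed["name"], proposed["arguments"]))
```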
Tim Hwang: And when you say super agent, what do you mean exactly?
Chris Hay: I just made up the term, Tim, so... You heard it here first on MoE. A really good agent.
Maya Murad: Is that “super” coming from superintelligence, or is this your own definition, or is it in the sense of, like, a Hollywood super-agent?
Chris Hay: Actually—thanks for the save there, Maya, right? I’m going to define a super agent as the combination of the reasoning models—the inference-time compute models that are coming out just now—combined with tool access. So they’re more powerful than the agents that you have today. So there you heard it first. You’re right, Tim. That’s what a super agent is.
Tim Hwang: Very nice. Maya, you had a funny phrase when you were giving your reaction to my first question, which is, you know, next year agents are going to be everywhere, but it’s also going to be the year we discover where the barriers and limitations are—basically, the full force of agents is going to come crashing into reality. And I think we’re going to learn a lot. And, you know, one question I’ve been asking a lot of the panelists for this episode is: what’s underrated? What are people not thinking about that’s likely to be a big hurdle for agents going forward?
Maya Murad: Number one answer: security. Super underrated. I think it’s already being reported that a lot of the existing players in the space are leaking sensitive data. And I see agents as a way of exacerbating these inherent risks of LLMs. And I think we’re under-appreciating what it takes to get it right. I think the other thing is how to nail the right human interactions. When you have this ability to automate more complex tasks, what are the things that you still need to delegate to the human? How do you need to have a human in the loop? How do you avoid an over-trust issue? My team has done a number of user studies, and when information is presented neatly by an actor that looks and seems intelligent, it’s really easy to take everything at face value. And I think there’s a whole new paradigm of human-computer interaction, or maybe human-agent interaction, that will be unlocked. And I’m really excited for what’s to come, because I think this is inherently a creative exercise. How do we retain our creativity, retain our ability to do critical thinking, and yet automate certain parts of processes with AI? That will be a really interesting paradigm to get right.
Tim Hwang: Yeah, I think that delegation problem is going to end up being super, super hard. It’s very easy to become dependent on people who sound smart even when they’re not. It’s no different, I guess, for agents.
Well, I guess put it this way: it sounds like we’re very interested. And the big prediction from both of you seems to be agent marketplaces—that’s going to be maybe the big thing we see next year. I think one of the big questions is also what’s going to be the first really popular agent use case. If you think about a big marketplace, there are a lot of things agents could do that may be fun, but we’re almost looking for what’s going to be the “email” of the agent world, or the “Slack” of the agent world. Curious, in both of your experiences talking to customers, what they really hope to see out of agents, and whether there’s anything recurring there that’s worth our listeners knowing.
Chris Hay: I think from my perspective, Tim, and that marketplace, I think there is some obvious ones. Like translation, I think, if I’m truly honest, like language models today, I don’t think they’ve really nailed translation so well. There’s some models that do certain languages really well, but then if you think of the more esoteric languages, for example, the less popular ones, then the large models aren’t getting that. And then it’s going to be specialized models that have been trained in that specific language. So I think that’s probably a real opportunity for some of these smaller language models combined with an agent to offer translation services. And again, add that into domain services. So things like legal, which is something you know very well, Tim, then I think that will probably be a big piece of that marketplace.
But I’m hoping that it won’t just be about these individual agents. I think any piece of information—it could be sports scores, it could be golf scores, it could be information about a play, it could be absolutely anything. And one of the things—and this is my next prediction for 2025—is I think we’re going to get a shift in the world wide web. Today, HTML, et cetera, is the dominant markup language of the internet. That’s not really well designed for LLMs and not well designed for agents. So I wonder if, in order for the agents to exist—not just having the marketplaces, but having a way to expose that data; we talked about MCP earlier—you’re going to start to see new types of pages appearing where the content is optimized for consumption by agents and the resources they expose, as opposed to necessarily humans. So I’m predicting we’re going to start to see this shift in the web to, dare I say, a Web 4.0—I’m trying to avoid the term Web 3.0—where we have content that is specifically designed for agent consumption.
Tim Hwang: Yeah, it seems to be almost the prediction that’s kind of implicit in what both of you are saying is that there’ll be so much interest in the promise of agents that almost we’re going to be reconstructing the web to make it safe for agents or make it work for agents. And I guess a lot of the stack and a lot of the interoperability stuff that’s being built is an attempt to do that in some ways. I don’t know, Maya, do you agree with that?
Maya Murad: You think that’s going to be the future—we’ll have an agent markup language, basically? A.T.M.L.? I think a lot of the interesting use cases will be unlocked when different agents that were built by different providers and are owned by different organizations are able to interact with each other. And how do you establish a safety protocol? How are you able to do that productively? The promise here is: how do we break out of all these silos of different systems and having to manually architect how each one speaks to the others? Can we get to a universal interaction protocol? This is a really interesting promise. I don’t know if we will fully unlock it next year, but a lot of different actors would like to go in this direction. And there are simpler things that we should nail before that. I know that for software engineering tasks there’s a lot of investment going into that space. I still think no one has nailed the average business user. The average business user has to use, I don’t know, a dozen different tools on their machine. None of them speaks to the others. Each one has its own onboarding experience. So I see a lot of opportunity to flatten out these complex experiences and make them much more dynamic and integrated. And this is the true promise of this technology.
Tim Hwang: And it’s the ultimate dream, I guess. Because the world you’re describing is almost like the agent becomes your entire interface for all these applications. Like they stay independent, but the operating system in the future really is the agent that’s doing things on your behalf. It’s natural language.
Maya Murad: LLMs changed our perception of how we interact with the digital world. We expect everything to be in natural language, or you could do a form and then there’s an option to do natural language interaction. And I think that expectation is gonna widen.
Tim Hwang: Yeah, no, I think that makes a ton of sense. I guess maybe the final turn we should talk a little bit about is the engineering and coding side, right? I was thinking this year that the coding assistants have gotten really, really good. But the dream is that you eventually have agents where you say, “I’m really envisioning a software code base that looks like this,” and it’s able to build and operate on all parts of that, all parts of your code base. What do we think are the prospects for that kind of automation and agentic behavior?
Chris Hay: I’m going to kick off here, and I’m going to be controversial as always. And here is something for people to think about: programming languages today are designed for human beings, right? And if you think about things like loops—while loops, for loops, et cetera—you have however many versions, and the same with conditionals, if statements, blah, blah, blah. But you know what? When you get down to an assembly level, none of that exists, right? It’s all back to branches and jump statements, et cetera. And yet, in an agentic world, we’re getting them to program in a language that is designed for humans. And the big change I think is going to happen over the next few years is that you’re going to have a more agentic-native language—something that is designed for LLMs and therefore has less of the syntactic sugar you need to satisfy humans. So I think there’s an evolution in programming coming. And you can see it already today, right? The LLMs are already generating, you know, “here’s another Fibonacci function.” I don’t need another Fibonacci function in my life, right? We got those.
Tim Hwang: Exactly.
Chris Hay: So then I think you’ll have the equivalent of NPM or something like that, where you have a big, massive AI library from which you can pull the functions that you need. So, like your AI operating system, I think we’re going to get AI programming languages and libraries that are a little bit more native, and that’s going to help the development of coding. So I think that’s an interesting trend. Will it be 2025? Maybe, maybe it’s going to be ’26, but I think that’s where we’re going.
Maya Murad: With the current technology we have, I’m like super impressed with what I’ve seen with Replit, with the ability to stand up like full-stack applications. On the project I’m working on with Bee, it’s been such an interesting paradigm—chat to build applications. I really see the ability to create digital interfaces and code bases being democratized in a way that hasn’t been able to before, purely powered by the current technology of agents that we have. I just think there’s this last-mile problem to nail, and I think next year this is going to blow up in a major way.
Tim Hwang: Nice. Well, you heard it here first. That’s all the time that we have for agents. That was a lot to cover in a short period of time. Chris, Maya, thanks for coming on the show, and we’ll see you next year.
I want to move us on to talk about the hardware that powered AI in 2024, and I couldn’t have picked a better duo of people to help explain it than the two I have online with me today. Khaoutar El Maghraoui is a Principal Research Scientist, AI Engineering, AI Hardware Center, and Volkmar Uhlig is Vice President, AI Infrastructure Portfolio Lead. Welcome to the show.
Volkmar, maybe I’ll turn to you first. So, you know, as we talk about hardware in AI, it’s almost become synonymous with saying that we want to talk about NVIDIA. And I’m curious about what you thought the biggest stories were this year from NVIDIA. I mean, the one that strikes me is the announcement of the upcoming GB200, but curious if there’s other things on your radar as we kind of think about what were the big stories in 2024?
Volkmar Uhlig: NVIDIA made a big splash with the GB200. And I think we are seeing a big shift towards more integrated systems on the training side—very large, rack-scale computers now. Liquid cooling is coming. So all the things we’ve seen over the years—how to cram more compute into a smaller form factor, making it faster, with better networks behind it, et cetera. And I think NVIDIA is really trying to push hard on staying the leader.
And then we are seeing upgrades, which are kind of a reflection of what models now look like. So we have 70-billion-parameter models. And you know, 70 billion parameters is about 70 gigabytes even if you quantize to 8-bit, and 140 gigabytes at 16-bit. Now you don’t want to have to buy full cards. So we see an increase in memory capacity across the board of all the accelerators.
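As a quick sanity check on those numbers, the memory footprint of the weights alone is roughly parameter count times bytes per parameter. This small calculation ignores KV cache and activation memory and is only illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough memory needed just to hold the weights, in gigabytes."""
    bytes_per_param = bits_per_param / 8
    return num_params * bytes_per_param / 1e9

for bits in (16, 8, 4):
    print(f"70B model at {bits}-bit: ~{weight_memory_gb(70e9, bits):.0f} GB")
# 16-bit: ~140 GB, 8-bit: ~70 GB, 4-bit: ~35 GB -- which is why a single
# 80 GB accelerator cannot hold a 70B model at 16-bit precision.
```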
But it’s not only NVIDIA here; we also see new entrants and the other players in the market. AMD is announcing a pretty good roadmap of products with very large memory capacities and memory bandwidth to address those large language models and fit more model into less space or less compute. And Intel is playing in the market as well. And then you have a handful of startups where we also saw really interesting technologies coming onto the market. If you look at Cerebras, that’s wafer-scale AI—a year ago they were talking about it, and now you can actually use it as a cloud service. You have Groq being a player, and there are other companies coming up; there’s D-Matrix, which will have an adapter coming out at the beginning of next year. And so I think there’s a good set of players in the market.
And then there are new entrants, right? We just saw the Broadcom announcement pretty much last week with very large revenue targets and the relationship with Apple. And then Qualcomm is also in the game and has a chip architecture coming, and some of them are available and there’s a good roadmap for them. So I think the market is not only NVIDIA anymore, which is, I think, good for the industry, and it’s moving extremely fast. And we see training systems there, but there’s an increasing focus on inferencing because, from my perspective, it’s kind of where the money will be made.
Tim Hwang: Yeah, for sure. And I guess, Khaoutar, I don’t know if you want to talk a little bit about that bit. I wanted to make sure that we did talk a little bit about the big trends in inferencing this year, because it feels like that was actually a big theme of how this market is developing out. And if you want to speak a little bit to that and where you think things went in 2024.
Khaoutar El Maghraoui: Yeah, so of course there’s a lot happening, especially around inference engines and optimizing inference engines. A lot of hardware-software co-design is also playing a key role in that. So we see technologies like vLLM, for example, and all the work around KV cache optimizations and batching for inference. A lot of innovation is happening in open source around building and scaling inferencing, especially focused on large language models. But a lot of these optimizations are not specific to LLMs; they can also be extended to other models. So there’s a lot of development happening on vLLM—there is work, you know, even at IBM Research and elsewhere, contributing to open source to bring in a lot of these co-optimizations in terms of scheduling, in terms of batching, in terms of figuring out how to best colocate all of these inference requests and get the hardware to run them efficiently.
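To give a flavor of the scheduling and batching ideas Khaoutar mentions, here is a toy sketch of continuous batching, the approach popularized by engines like vLLM. It is not the actual vLLM API; the Request fields, the model.step() interface, and the admission policy are simplified assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    generated: list = field(default_factory=list)
    kv_cache: dict = field(default_factory=dict)  # stands in for a per-request attention cache

class ToyContinuousBatcher:
    """Very simplified continuous batching: finished requests leave the batch
    immediately and waiting requests join, instead of padding everyone to the
    longest sequence and waiting for the whole batch to finish."""

    def __init__(self, model, max_batch_size: int = 8):
        self.model = model                 # assumed: model.step(requests) -> one new token per request
        self.max_batch_size = max_batch_size
        self.waiting = deque()
        self.running = []

    def submit(self, req: Request):
        self.waiting.append(req)

    def step(self):
        # Admit new requests while there is room in the running batch.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())

        # One decode step for every running request, reusing each request's KV cache.
        tokens = self.model.step(self.running)
        finished = []
        for req, tok in zip(self.running, tokens):
            req.generated.append(tok)
            if tok == "<eos>" or len(req.generated) >= req.max_new_tokens:
                finished.append(req)

        # Free slots right away so queued requests can start on the next step.
        self.running = [r for r in self.running if r not in finished]
        return finished
```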
Tim Hwang: Yeah, absolutely. Volkmar, do you want to give us a little bit of a peek into 2025? I mean, it kind of sounds like with this market becoming increasingly crowded, I think everybody’s coming after NVIDIA’s crown here. You know, what do you expect to happen in 2025? Does NVIDIA largely still stay in the lead? Or do we end in December 2025 with the market becoming a lot more divided and diversified than it has been traditionally, particularly on the training side?
Volkmar Uhlig: So I think the training side—that’s my prediction—will still be very strongly in the hands of NVIDIA. I think AMD and Intel will try to break into that market, but I think that will probably be more in the 2026-‘27 timeframe. The reason why I’m saying this is the architecture you need to build a really successful training system: it’s not the GPU, it’s a system. So you need really good, low-latency networking. You need to solve a reliability problem. There’s a strong push to actually move compute into the fabric to further cut down the latency and more efficiently utilize the hardware. And NVIDIA, with their acquisition of Mellanox, effectively bought the number one network vendor for high-performance computing, which training is. And so there are a bunch of consortiums coming up. There’s Ultra Ethernet, where they’re trying to get to similar capabilities to what you have with InfiniBand. And with InfiniBand, despite it being an open standard, there’s pretty much only one vendor on the planet, which is Mellanox, which is now owned by NVIDIA. So I think NVIDIA has a good lock on that side of the market, and therefore a lot of the investment from other players is more in the inferencing market, which is much easier to enter, because you intrinsically don’t only have NVIDIA systems—you don’t have NVIDIA on cell phones, you don’t have NVIDIA on the edge—and the software investment you need to make for inferencing is much lower than on the training side. So I think training is in very safe hands for NVIDIA.
But I think there is now enough—with Gaudi 3 coming online, which has integrated Ethernet, and what AMD is putting on the market—I think there will be a slow creep into that market. And I think in 2026, we will probably see that there is a major break into that market, and NVIDIA loses that very unique position it has right now.
Tim Hwang: Yeah, it’s going to be a big transition. Khaoutar, do you agree with that for the 2025 prediction?
Khaoutar El Maghraoui: Yeah, I agree with that. Of course, there’s rising competition in AI hardware, like Volkmar mentioned. Companies like AMD and Intel, and startups like Groq and Graphcore, are developing competitive hardware. IBM also is developing competitive hardware for training and inference. The problem with NVIDIA GPUs is also cost and power efficiency. NVIDIA GPUs are very expensive and power-hungry, making them less attractive, especially for edge AI and cost-sensitive deployments. So competitors like AWS Inferentia, or IPUs, offer specialized hardware that’s often cheaper and more energy-efficient for certain applications.
And I think, you know, open standards—for example, OpenAI’s Triton and ONNX—and new frameworks are also doing a lot to reduce the reliance on NVIDIA’s proprietary ecosystem, which makes it easier for competitors to gain some traction here. And if we look at inference-specific hardware, these dedicated inference engines—like I mentioned vLLM before, and SGLang, Triton—highlight the potential for non-NVIDIA hardware. They’re opening the door for competitors to enter, and they allow them to excel in inference scenarios, especially for large language models. So we’ll see this widespread emergence of edge inference solutions powered by ASICs. And I think this is challenging NVIDIA’s role in the rapidly growing edge AI market.
Tim Hwang: Yeah, and I think the edge is the last bit I wanted to make sure we touch on before we move on to the next segment. You know, Volkmar, it seems to me that obviously one of the big stories was Apple moving into Apple Intelligence and making sure that, essentially, the AI chips are on the devices. I assume that’s going to continue into 2025, but I’m curious, for our listeners that are less involved in watching the hardware space day to day, if there are any trends that you think are worth people paying attention to as we get into the next 12 months.
Volkmar Uhlig: I think the Apple model is very elegant when you are in a power-constrained environment. So whatever you can do in that power-constrained environment with less accuracy, you do on device. And then whenever you need more, you go somewhere else. I think the Apple architecture—where they run the same silicon in the cloud as they run on the phone—is also very interesting, because it simplifies things for the developer. It simplifies deployment. And so I think we will see more of that type of separation, and I think we will see more compute happening on edge devices. And as silicon matures, and there are more choices and you don’t need a high-powered card anymore, and the silicon gets more and more specialized for that simple matrix multiply, I think pretty much every chip that leaves a factory will effectively contain AI capabilities in one form or another. And then it’s really this hybrid architecture of on-device and off-device processing which allows silicon to live for a long period of time. Because if you’re on the edge—and the edge is not only a phone, it could be an industrial device, where your life cycle is five to ten years—you don’t want to have to swap out the chip every two years just because you want to train another network. And so I think the architecture Apple put out will become more solidified, and we will see software ecosystems being built around it.
Tim Hwang: Yeah, that’s great. Well, Khaoutar, I’ll let you have the last word here. I’ve been asking most panelists as they’ve been coming on, what is the most underrated thing in this particular domain? So for AI hardware, are there things that people are not paying attention to? There’s a lot of hype in the AI hardware space. So I’m curious if there’s any more subtle trends that you think are important to pay attention to.
Khaoutar El Maghraoui: Yeah, that’s a great question. So I think there is a lot of work around real-time compute optimizations—technologies like test-time compute, which allows AI models to allocate additional computational resources during inference. This is something that we saw with the OpenAI o1 model. It really, I think, sets a precedent here, and it allows the models to break down complex problems effectively and also mimic, in a way, what we do in human reasoning. And it also has implications for the way we design these models and the way the models interact with the hardware. So it’s pushing for more hardware-software co-design in this context, where more of the processing happens during inference.
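One simple flavor of test-time compute is best-of-N sampling against a verifier or scoring model; this sketch assumes hypothetical generate() and score() helpers and is an illustration of the general idea, not a description of how o1 itself works.

```python
def best_of_n(prompt: str, n: int = 8) -> str:
    """Spend more compute at inference: sample several candidate reasoning chains
    and keep the one a scorer likes best. generate() and score() are assumed helpers
    (e.g. an LLM sampled at temperature > 0 and a reward or verifier model)."""
    candidates = [generate(prompt, temperature=0.8) for _ in range(n)]
    return max(candidates, key=score)

# Doubling n roughly doubles inference cost for the same trained model --
# the trade-off between training-time and test-time compute described above.
```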
I think another trend I see is hardware accessibility for all. The Llama 3 series illustrates how hardware ecosystems are evolving both for high-end research models and for consumer-grade applications. The Llama models were released in multiple versions—the 400B, the 8B, and so on. So that’s also an important trend we’re seeing, because it helps bridge the gap between high-end data centers, where you have access to this high-end compute and infrastructure, and settings where that isn’t accessible to everyone. So pushing towards that will be really important.
The other thing is open source and enterprise synergy. IBM released Granite 3, which I think is a great step in the right direction and which also highlights the importance of open-source AI and its ability to maximize performance on enterprise hardware. But there are still hardware design challenges—for example, what we see with NVIDIA’s Blackwell GPUs and the issues they’ve had around thermal management and server architectures. For this hardware to scale to meet the demands of next-gen AI, power efficiency is becoming critical. So, if I were to sum up these trends, I think 2024 showcased the importance of hardware-software co-design, and the industry’s pivot towards specialized AI accelerators, open-source adoption, and real-time compute innovations is setting the stage for further breakthroughs.
Tim Hwang: Yeah, that’s a great note to end on. Well, that’s all the time that we have for hardware. Khaoutar, Volkmar, thanks for joining us, and for all your help in 2024 explaining the world of hardware, and we’ll have to have you back on in 2025.
Finally, to round out our picture of 2024, we need to talk about the product releases that stunned us, amazed us, and gave us something to think about. To help me do that are Kate Soule, Director of Technical Product Management for Granite, and Kush Varshney, IBM Fellow on AI Governance.
Kate, maybe I’ll turn it to you first. Obviously, the schedule was crazy this year in terms of product releases. It felt like every other week there was something. But looking back on the last 12 months, I’m kind of curious what you thought were the biggest things—the stories that we’ll look back on 2024 and say, “Yeah, this is the year that...”
Kate Soule: You know, as the director of technical product management for Granite, I feel like I have to celebrate what our team at IBM accomplished and released in launching the Granite 3.0 model family: Apache 2.0-licensed models that are transparent, with ethical sourcing of the data that went into them, where we share all the details online in our report. So I’m really excited about being able to continue that commitment to open-source AI and being able to create state-of-the-art language models in the 2-to-8-billion-parameter range that we can put out there under permissive terms for our customers and for the open-source communities to leverage more broadly.
Looking outside of just IBM, I think the release of the o1 family of models and products was really exciting. I think it launched a new wave of interest in how we continue to improve performance without just spending more money on training compute. So I think that really is ushering in this next wave that we’re going to see in 2025 of how we can spend more at inference time, allowing models, and products that use these models, to do more advanced computation across the inference calls they generate to improve performance, beyond just “let’s throw more money at the training, let’s throw more data, let’s scale, scale, scale.” So that’s, more broadly, something I was pretty excited to see.
Tim Hwang: Yeah, we should definitely talk about both of those themes. I mean, on the first one, 2024 was really the attack of the open source. For a moment there, it felt like the closed-source models would really win the day, and the explosion of activity in open source has been really, really exciting to see. And then the second one is kind of the “play smarter, not harder” world, where there’s a bunch of new techniques that we’re seeing play out in a lot of places. Maybe, Kush, we’ll start with that first theme. In the open-source world, of course, this is also the year of Llama 3. There’s just been a lot happening in open-source land. And I’m curious, as you look back on either of the themes that Kate pointed out—either the open-source side or the different methods for doing AI—if there are things you’d want our listeners to remember from 2024.
Kush Varshney: Yeah, I mean, I think your phrasing of it—“open source returns,” or “the return of,” whatever you want to call it—I think that’s the right way to frame it. When we talk to customers across the board, we’re realizing that 2023 was all about POCs and that sort of thing—getting people excited within their own companies that, “Oh, maybe generative AI has a role to play.” But then over time, they realized that we actually need to worry about copyrighted data, other governance issues, the cost, just how to make these operational. And I think watsonx, the IBM product, kind of shined with that. The Granite models obviously as well. So, how do we take the science experiment that we had in 2023, which was being used more this year, and—going into next year—get as serious as possible about it, I would say.
Tim Hwang: Yeah, for sure. And I think now that you’re on for this segment, I mean, I think it’s a good time to ask, too, obviously you spend a lot of time thinking about AI governance, right? And there were a bunch of stories in that vein this year. I don’t know if there’s ones that you’d want to call out for 2024.
Kush Varshney: Yeah, no, I mean, I think just the fact that the whole AI safety world convened. I mean, we had this Korea summit, we had the summit in San Francisco in November. And yeah, I mean, it’s just—this is now the topic. I think it’s the thing that we need to overcome because just having AI, generative AI out there without the safety guardrails and without the governance, it’s just dangerous. I think the promise of the return on investment is only a promise until you can overcome the hump of the governance issues.
Tim Hwang: Yeah, for sure. Do you have any predictions for where we go in 2025 with all that? I mean, I’m detecting a theme here, which is 2024 almost set up a lot of stuff. 2025, we’re going to almost see how it plays out. I mean, both in open source and in governance, it seems like.
Kush Varshney: Yeah, I think the prediction is—the earlier segment of the show was about agentic AI, and I think that’s gonna really explode as well. And I think agents are going to be what drives governance into other use cases as well, because when you have autonomous agents, the governance, the trust, is extremely important. You have very little control over what these things might do. The extra inference cycles Kate was mentioning are, I think, going to be mainly for the purpose of governance: to make these things self-reflect a little bit, maybe think twice about what answers they’re putting out there, and so forth. So you’re going to have more tools for governing the agents as well. The Granite Guardian 3.1 release that just happened actually has a function-calling hallucination detector in there. That’s one of the things that agents actually do, right? As part of the LLM, they will call some other tool, some other agent, some other function, and if that call itself is hallucinated—the parameters, the types of the parameters, the function names—all of these things can go wrong. So we have ways of detecting issues there.
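To illustrate mechanically what a function-calling hallucination check can look like, here is a minimal sketch that validates a model-proposed call against declared tool schemas. It is an illustration of the idea only, not Granite Guardian 3.1’s actual detector, and the tool names are made up.

```python
from typing import Any, Dict, List

# Tool schemas the agent actually exposes (names and expected parameter types).
TOOL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "get_weather": {"city": str},
    "create_ticket": {"title": str, "priority": int},
}

def find_call_hallucinations(call: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a model-proposed function call:
    unknown function name, unknown parameters, or wrong parameter types."""
    problems = []
    schema = TOOL_SCHEMAS.get(call.get("name"))
    if schema is None:
        return [f"function '{call.get('name')}' does not exist"]
    for param, value in call.get("arguments", {}).items():
        if param not in schema:
            problems.append(f"parameter '{param}' is not defined for '{call['name']}'")
        elif not isinstance(value, schema[param]):
            problems.append(f"parameter '{param}' should be of type {schema[param].__name__}")
    return problems

# Example: the model invented a 'severity' argument and passed priority as a string.
print(find_call_hallucinations(
    {"name": "create_ticket",
     "arguments": {"title": "VPN down", "severity": "high", "priority": "1"}}
))
```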
Tim Hwang: Kush, I’m curious—you said the inference runtime is going to be used more for governance and self-reflection. But I think you had even shared a paper recently about how it also opens this whole can of worms of other risks and potential security issues, right? When the models are running all these loops offline, people aren’t really able to observe what’s going on.
Kush Varshney: Yeah, I mean, this whole—self-reflection, you can call it metacognition, you can call it wisdom—I mean, I think these are going to be things that are going to be part of what happens. But yeah, I mean, anytime you have extra stuff happening, more loops, more opportunities, more surface area for attacks, right? So I think that is certainly going to be part of it. But I have hope that just like in other systems, you can have better control when you can kind of have more opportunity to affect what happens.
Tim Hwang: Yeah, and I think that ends up being critical, and I think is also a pivot that I was going to mention to kind of throw it back to you, Kate, is, you know, if all of the open source is just coming up so quickly in 2024, it feels like 2025 might finally be the year where it’s like we’re at parity or even open source is going past closed source in some sense. And I think this is happening not just because the technology is getting better, but also, like Kush is saying, our ability to have components that ensure safety in deploying open-source models is also getting better, right? In the past, it was like, “Well, we have to rely on closed source because they really understand how to do alignment and security and safety.”
Kate Soule: There’s a lot of scare tactics out there.
Tim Hwang: That’s right.
Kate Soule: “Only the big model providers have the budget or the expertise to be able to figure out how to do this safely.” That’s right. Yeah, I think we’re finally chipping away at that enough. We’re seeing Meta, for example, doing a phenomenal job releasing very large models with excellent safety alignment and showing that you can do this out in the open. It does not need to be behind a black curtain, so to speak.
Tim Hwang: Yeah, for sure. Is that a prediction for 2025? That we can have our cake and eat it too. Like we can have it be open, and it can also be safe.
Kate Soule: Absolutely. Yeah.
Tim Hwang: That’s exciting. Do you have any open-source predictions going into the next 12 months? Like where do we go from here? I guess more Granite, even better Granite.
Kate Soule: I think the next year is really going to be focused a little bit higher up the stack, on top of the models—co-optimizing models and the developer frameworks they’re executed in. We saw the release of Llama Stack in 2024, right? I think we’re going to see that evolve wildly as it starts to mature, along with other similar kinds of capabilities and stacks being developed. I think we’ve also all kind of accepted that the OpenAI endpoint way of working with models is, you know, the incumbent way to operate. But there are probably other ways we can continue to innovate and improve now that we’ve been around the block a few times. So I think we’re going to start to see a lot of open-source innovation a little bit higher up the stack, particularly from model providers asking, “How do we further improve performance?” It goes hand in hand: if you’re trying to optimize and innovate at inference time, you need a stack that can handle that. And so that’s where I think a lot of the development is going to happen.
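For context, the “OpenAI endpoint way of working with models” Kate mentions usually means a chat-completions-style HTTP route that most serving stacks now mimic. Here is a minimal sketch of that convention; the base URL and model name are placeholders for illustration, not a specific product’s endpoint.

```python
# Minimal sketch of the de facto "OpenAI endpoint" convention: POST a list of
# chat messages to a /v1/chat/completions route and read the first choice back.
# BASE_URL and the model name are placeholders, not a real deployment.
import requests

BASE_URL = "http://localhost:8000/v1"   # e.g., a locally hosted open-source model server
payload = {
    "model": "granite-3.1-8b-instruct",  # assumed model name for illustration
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize 2024 in AI in one sentence."},
    ],
    "temperature": 0.2,
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])
```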
Tim Hwang: Yeah, for sure. I feel like there’s so much we’ve just taken as a given simply because that’s where stuff got started. And it’s easy to forget, given there’s so much news, that this is all very fresh—just a few years ago it was basically non-existent.
So, one question I’ll put before this group, particularly because we’re talking about product releases: this year on “Mixture of Experts,” we’ve talked a lot about chat, right? It’s just the interface we started with because ChatGPT was so successful. But there’s no reason why that has to be the way we interact with these systems going forward. I’m curious whether either of you has predictions about the interface itself. Do we start interacting with these systems in a way that’s pretty different from what we’ve gotten used to?
Kush Varshney: Yeah. I mean, I think the co-creativity, co-creation is going to become a bigger thing. So having multiple people—I know there’s been some canvas sort of things that have come out this year as well, but I think it’s just going to grow. And let me give a brief shout-out to my brother. He has a startup called Kocree, K O C R E E, and I just got to get that in.
Tim Hwang: Exactly.
Kush Varshney: And it’s all about co-creating music for people, together with AI, but also helping people and society with their well-being and so forth. Because when you create with others, it’s actually a positive experience as well. So I think, you know, a shift in focus a little bit towards human flourishing, human well-being, and how to get people to really work together, to have that kind of open-endedness and so forth, might be something that emerges.
Tim Hwang: We’ve got a few minutes left on this segment. Is there anything that folks aren’t talking about? I feel like, particularly in AI, everyone is always excited about the latest model release or the latest... Yeah, I’m always trying to see around corners, and you’re both experts who think about this deeply. What’s underhyped, maybe underrated at the moment, that really deserves more attention going into the next year?
Kate Soule: I think there’s going to be a tremendous opportunity—and I really hope this takes off—around thinking about modular components for building with LLMs. There’s work going on, for example, on getting to the point where you could fine-tune a LoRA adapter, right? Kind of a bucket of weights that you fine-tune for your task, which sits on top of the model. Right now they have to be tailored to the exact model you’re going to deploy, and when a new version comes out, you have to retune your adapter. But how do we create more portable versions of this? For example, there’s interesting research on universal adapters that can be applied anywhere. That creates some really nice modular components that you could ship, keep in a catalog to choose from, provision live, and swap in and out at inference time. I think there’s also the architecture side: we’ve all heard of our seminal mixture of experts architecture, right? So I think there’s going to be an increasing look at whether we can make modular experts that get swapped in and out of the architecture. So I would love a focus on how we make building and specializing models more modular in 2025—and I think there’s some really interesting research going on at the ground level that could support it.
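As a rough illustration of the adapter-swapping idea, here is a sketch using the Hugging Face peft library to attach and switch between two task-specific LoRA adapters on one shared base model at inference time. The adapter repository names are hypothetical, and this glosses over the compatibility problem Kate raises: each adapter is tied to the exact base model it was tuned on.

```python
# Sketch: hot-swapping task-specific LoRA adapters over one base model using
# Hugging Face transformers + peft. The adapter repo names are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "ibm-granite/granite-3.1-8b-instruct"      # assumed base model for illustration
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# Attach one adapter, register a second, then switch between them per request.
model = PeftModel.from_pretrained(base, "my-org/summarization-lora",   # hypothetical adapter
                                  adapter_name="summarization")
model.load_adapter("my-org/sql-generation-lora", adapter_name="sql")   # hypothetical adapter
model.set_adapter("sql")   # swap which "bucket of weights" is active for the next call
```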
Tim Hwang: Yeah, that’s super cool, and I think it doesn’t get enough attention. Everybody’s always like, “The AI is just one big model that does everything. Why do I have to choose? Once we have the big model, all our problems will be solved, right? Bigger is better.”
Kush Varshney: Yeah.
Tim Hwang: How about you, Kush? Anything underrated you’d point out to our listeners before we close up on this segment?
Kush Varshney: Yeah, I think middleware for agents would be one thing as well, building on what Kate just said about modularity. Even having different agents in a multi-agent system—how you register them, orchestrate them, and so forth. From IBM Research, we have the Bee framework—Bee as in the thing buzzing around in my ear—and that is out there. There are other startups as well; some former IBM researchers have a company called Emergence AI, and they have one, and there are others out there too. So I think that’s going to pick up, and again, it relates to what Kate was saying: connecting the development environment and the models, linking them much closer. Once we’re at a point where the models are all kind of good enough, then it’s a question of how we use them, how we make productive use of them, and how we build with them better.
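As a toy illustration of the agent middleware idea, here is a minimal sketch of an agent registry with a naive orchestrator that routes a task to the first capable agent. This is a simplified assumption-laden example, not how the Bee framework or Emergence AI’s tooling actually works.

```python
# Toy agent "middleware": agents declare capabilities in a registry, and a
# naive orchestrator dispatches each task to the first agent that can handle
# it. Real frameworks add state, tool access, error recovery, and messaging.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentRegistry:
    agents: dict[str, tuple[set[str], Callable[[str], str]]] = field(default_factory=dict)

    def register(self, name: str, capabilities: set[str], handler: Callable[[str], str]) -> None:
        """Add an agent with the capabilities it advertises and its handler."""
        self.agents[name] = (capabilities, handler)

    def dispatch(self, capability: str, task: str) -> str:
        """Route the task to the first registered agent offering the capability."""
        for name, (caps, handler) in self.agents.items():
            if capability in caps:
                return f"[{name}] {handler(task)}"
        raise LookupError(f"no registered agent provides {capability!r}")

registry = AgentRegistry()
registry.register("researcher", {"search"}, lambda t: f"found 3 sources about {t}")
registry.register("writer", {"summarize"}, lambda t: f"one-paragraph summary of {t}")

print(registry.dispatch("search", "agent middleware"))
print(registry.dispatch("summarize", "agent middleware"))
```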
Tim Hwang: Yeah, for sure. Definitely keep an eye on that space. Well, Kate, Kush, thanks for joining us on this segment. Appreciate you helping us to navigate 2024 in product releases, but also 2025 in product releases. And we will see you in the new year.
Well, that’s everything we have time for on our episode today. So much happened in 2024 that there’s basically no way we could fit it all into one show, but I want to thank all of our panelists for helping us try—and all the panelists we’ve been lucky enough to have on “Mixture of Experts” throughout 2024. Each week, we get to nerd out with some of the smartest people in the business, and it’s a pleasure to be able to talk with them to better understand this crazy world of artificial intelligence. And thanks to you for joining us. If you enjoyed what you heard, you can find us on Apple Podcasts, Spotify, and podcast platforms everywhere. Here’s to what was a great 2024, and here’s looking forward to an incredible 2025.