CES 2025, NVIDIA DIGITS, Apple Intelligence fails and Sam Altman’s reflections


What’s the most exciting AI announcement out of CES? In episode 37 of Mixture of Experts, join host Tim Hwang along with experts Skyler Speakman, Volkmar Uhlig and Shobhit Varshney to discuss CES 2025. Specifically, listen to the experts dive into NVIDIA’s Project DIGITS, among other announcements from the AI hardware giant. Next, take a look at a new enterprise AI development survey detailing how developers really feel about AI implementation. Then, dive into how Apple Intelligence experienced some major hallucination fails and what this tells us about Apple’s stake in the AI game. Finally, hear the experts discuss OpenAI CEO Sam Altman’s reflections blog post, published on the second anniversary of ChatGPT, and his insights on the future of AI. All this and more on today’s Mixture of Experts.

Key takeaways:

  • 00:00 - Intro
  • 01:01 - CES and NVIDIA DIGITS
  • 10:31 - AI developer survey
  • 21:02 - Apple Intelligence fails
  • 31:00 - Sam Altman’s reflections

The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.

Listen on Apple Podcasts, Spotify and YouTube

Episode transcript

Tim Hwang: What are you most excited about coming out of CES? Shobhit Varshney is a Senior Partner in IBM Consulting, leading AI for the US, Canada, and Latin America. Shobhit, welcome back to the show. What do you think?

Shobhit Varshney: NVIDIA’s DIGITS. The supercomputer right next to my laptop. Mwah. Love it.

Tim Hwang: Great. Uh, Skyler Speakman is a Senior Research Scientist. Skyler, welcome back. What are you most interested in coming out of CES?

Skyler Speakman: As a long-time PC gamer, absolutely the new line of graphics cards coming out.

Tim Hwang: And finally, last but not least, Volkmar Uhlig is Vice President and AI Infrastructure Portfolio Lead. Volkmar, what do you take out of CES this year?

Volkmar Uhlig: I’m with Shobhit, it’s the DIGITS.

Tim Hwang: All right, all that and more on today’s Mixture of Experts. I’m Tim Hwang, and welcome to Mixture of Experts. Each week, MoE is dedicated to bringing you the debate, news, and analysis you need to keep up with the top headlines in artificial intelligence. Today, we’re going to be talking about a new report on developer use of AI tools, some trouble with Apple Intelligence, and Sam Altman’s reflections on the second anniversary of ChatGPT. But first, let’s get started. Let’s talk a little about CES. Shobhit, maybe I’ll kick it to you first. We’re all very excited by DIGITS. For those of us who are not obsessively watching all the headlines coming out of CES, what is DIGITS and why are you so excited about it?

Shobhit Varshney: So the intention here is shrinking their DGX all the way down to a small machine. NVIDIA has figured out a way to squeeze in a lot more firepower—a GPU with an insane amount of memory, 128 GB, attached to it—so you can start to run some really large AI workloads on your desktop. Imagine a 200 billion parameter model, which is way bigger than what ChatGPT was when it came out two years back. You’re able to run that locally, right next to your machine. Now, it comes with a flavor of Linux on it, but you can obviously have your Mac or Windows machine use it as a server, and you can do some really cool things. But now you’re talking about having personal supercomputers that you can literally keep on your desk or potentially even carry with you. It won’t be out till May. It’s about $3,000, which, just looking at the hardware going into it, is a ridiculously great price point. But this starts to move computing from cloud supercomputing all the way down to your desk. A petaflop of compute at your desktop—that is just insane value.

Tim Hwang: Yeah, absolutely. I know, Volkmar, you were saying that you were excited about this as well. I think we’ve talked about it in the past, but give our listeners a little intuition: why is NVIDIA moving into this market at all? Arguably, doesn’t this put them in competition with Apple and all these other desktop personal computer makers, whereas NVIDIA’s usual thing has, of course, been data centers? Do you have a sense of why they’re moving into this market?

Volkmar Uhlig: Yeah, I would not say that NVIDIA traditionally is a data center company. They are a gaming company. So, the data center kind of came along and hit them three, four years ago.

Tim Hwang: Right in the face, yeah.

Volkmar Uhlig: And, you know, good for them, they captured it, and it was visible in their market capitalization. I think what NVIDIA is figuring out right now is that the developer market was kind of limited to “buy an RTX and stick it into a developer machine.” And now they are effectively going all in, saying, we need to cover this whole value-creation chain. Today it’s very, very hard: you need to buy a Windows or Linux box, stick in a bunch of NVIDIA cards, and rack the thing up. And now they are effectively coming out and saying, “Okay, here’s a ready-to-go system which is optimized for that specific workload.” When you see what Apple did with the M1 through M4, they are effectively trying to capture that desktop market. And that is not CUDA, and that is not NVIDIA. So I think NVIDIA is doing a preemptive strike here. And if you look at it from a pricing perspective, they’re sitting right between the smaller Mac Studio at $2,000 and the bigger Mac Studio at $4,000—they are at $3,000 with bigger specs. It can be an attachment, but at the same time you can use it as your primary desktop. So I think they are effectively trying to cover their bases. What will be interesting to see is what people now do with it. For $3,000 you get that box—it’s not a DGX—but in many cases it may be sufficient for running small-scale training jobs. So I can imagine that people will buy them by the truckload and put them up in data centers, giving their developers something that’s not necessarily on the desk—maybe it’s tethered—but it’s on-premises. It’s a really good way of getting that development loop going. And you could even use it for production use cases, right? If you don’t need a 19-inch rack server, you could use something smaller.

Skyler Speakman: I think at three different points in the press release, NVIDIA talks about how easy it is to take the models that you’ve trained on your small DIGITS and move them to NVIDIA’s cloud. So I also think they’re really pushing this hook in order to drive more business to their data centers. It’s “start small on your own personalized local system,” and make it extremely easy for you to then scale that up onto, of course, their data centers. So I think that also plays a lot into the strategy of why they’re really pushing this.

Tim Hwang: Yeah. Shobhit, maybe to turn back to you, what do you do with a petaflop? You know, it’s kind of funny because it is very exciting—a supercomputer literally on your desktop—but with that level of computing power, what do we use it for? I mean, is it just gaming? Do you anticipate people doing a lot more homebrew AI stuff? What does this unlock, right, if it just really becomes super successful? I think there are two different markets here, one is enterprise, one is consumer.

Shobhit Varshney: Right. There will be some enthusiasts on the consumer side that’ll obviously gravitate towards it, but I think there’s huge potential on the enterprise side. What that gives you is being able to run compute closer to where the action is happening. So think about industrial applications where, on the factory floor, you want compute to be right next to where the manufacturing is happening. Or, one of my clients, a large auto manufacturer—they have a lot of trucks and buses and things of that nature, and you would want some mobile compute that you can actually run a model on. In a lot of these use cases, there’s a lot of latency between calling a server or a cloud API and getting responses back, and those calls are expensive. So imagine you’re taking a picture on the manufacturing conveyor belt. You want to be able to process those images near where they’re being captured. There’s less latency, and there’s a huge security concern here too: you want to make sure that the data, especially if it relates to something very sensitive, doesn’t leave your premises. Same thing goes for, say, defense applications where you are doing something more tactical in the field. You want to be able to process all the images coming in from the drones right there, because you may be in a territory where you don’t even have a cellular connection. All of those are heavy computing workloads that traditionally took cloud environments to scale up and run, and now you’re able to do them closer to where the action is happening. That’s a huge, huge unlock of value for enterprises. Today, we’ve been constrained to some cutesy little small models running on mobile devices and things of that nature; we’re not quite there yet where you can run a 200 billion parameter model right next to where the action is.
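To make the edge pattern Shobhit describes concrete, here is a minimal sketch of on-premises inference: a DIGITS-class box on the local network serving an OpenAI-compatible chat endpoint, so the data never leaves the site. The hostname, port, and model name are hypothetical placeholders, not anything NVIDIA has announced.

```python
# Hedged sketch: query a model served on the local network instead of a
# cloud API. Endpoint URL and model name are hypothetical placeholders.
import requests

LOCAL_ENDPOINT = "http://digits.factory.local:8000/v1/chat/completions"

def inspect_part(defect_report: str) -> str:
    """Send a line-side defect report to the on-prem model; data stays on site."""
    resp = requests.post(
        LOCAL_ENDPOINT,
        json={
            "model": "local-llm",  # placeholder model name
            "messages": [
                {"role": "system", "content": "You are a factory QA assistant."},
                {"role": "user", "content": defect_report},
            ],
        },
        timeout=5,  # LAN round trip, so a tight timeout is realistic
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```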

Tim Hwang: Yeah, that’s really exciting. Well, a lot more to pay attention to. I’m definitely going to get one, as it sounds like many of the folks on this call are. So we’ll definitely have to compare notes once they start arriving on our respective desktops.

Shobhit Varshney: Tim, apart from DIGITS, there were some insanely good things that NVIDIA released during the keynote. There were three different areas where Jensen wanted to ensure people realize this is what NVIDIA really does, right? One was physical AI: figuring out a way to model the physical universe around us—a good set of open-source starter AI models that understand physics, so we can start to train things around them. That leads to things like robotics and humanoids around us in our environments. The second big area of unlock was automotive AI: figuring out how to do autonomous driving, with the whole pipeline of millions of sensor readings coming in, processing them, and making decisions on the vehicle itself. And the third one was around digital workers—agents doing the regular day-to-day work that you and I do inside all the software we work with. Jensen spent 90 minutes on stage wowing the audience. That’s no easy feat, right? If you analyze the entire 90-minute keynote, you start to realize what an incredible communicator he is, breaking down complex concepts with such clarity. In each of those sections, he proved that NVIDIA is in fact a leader. They are making some bold moves to ensure the ecosystem comes along with them. They just bought Run:ai for maybe $700 million, then turned around and open-sourced it. It’s such a baller move—$700 million, and then you open-source it. So they’re trying to ensure that the entire industry moves toward this era of physical AI, agentic AI, and autonomous driving, and they want to be the backbone across each one of them. Last year in the gaming industry—and Skyler’s going to chuckle at this—the 40-series flagship used to be $1,600. They just released equivalent compute for $550. Just imagine: Apple will never do this. They’ll never take a $1,600 product and make the next iteration a third of the price. So you’re getting to this point where NVIDIA wants compute to be as easily accessible and democratized as plugging into electricity, and they want to be the electric superpower of the entire world. If you look at those three areas, my hot take: NVIDIA is undervalued right now.

Tim Hwang: IBM is out with a new developer report, taking a look at developers’ views on the use of AI tools in their workflows. There are a couple of very interesting data points, but the place I wanted to start is this really interesting result where the developers were asked, “What do you want most out of an AI tool?” The answer was, well, we want things like trustworthiness, and it should be reliable, and all the things that you would want. And then they were asked, “What are the current problems with the existing AI tool set?” And it was exactly those same things. So I want to ask the group: despite all the hype around code assistants and agents in development and all the stuff we’ve been talking about, it seems like ultimately there still is this big trust gap, and it is actually preventing adoption of a lot of these tools. Skyler, I’ll turn it to you first. Do you see that as a big problem? Do you think it’s ultimately going to put a ceiling on the use of these tools? What should we make of this? It was an interesting result for me.

Skyler Speakman: I’m not sure a ceiling is the right term, but certainly a delay. I spent a good amount of time at the end of last year in San Francisco at the International Network of AI Safety Institutes, this big congregation. And the topics there are, of course, safety, robustness, trustworthiness; those are the topics of the day. And here, when I talk to would-be clients, they aren’t concerned about overall accuracy. That’s not their concern. It’s “how are these machines reaching their conclusions, and can we trust them?” That’s the back and forth we have now, not accuracy or even cost. So it’s a concern at a global level and at the level of individual client engagements. It’s been part of IBM Research’s strategy for many years now: what can we do with trust and governance in this space? Lots of work to be done there.

Tim Hwang: And there’s one point of view—Volkmar, I don’t know if you agree, working with a lot of folks who are in the nitty-gritty technical aspects of this—which is the AI person’s response: “Why do you care about trust or reliability? If it just works, it just works, right?” Think about the early days of, like, Google Maps: “Oh, there’s this GPS thing that’s going to tell you where to go. Yeah, sure, I don’t trust that.” And then over time it just turns out the fastest way to get from point A to point B is to put it into the GPS, and people get over their fear about not really knowing how these systems make decisions. Do you think that’ll be the case with all these developer tools doing code gen? You’re like, “I don’t really need to understand, ‘cause it just works, and I’m moving faster than developers who aren’t using it.”

Volkmar Uhlig: I don’t think so. The way development usually works right now—and I hope this is how it works for most companies—is you use code generation as, “Okay, I know what algorithm I want, and I can proofread it.” I can proofread code about 10 times faster than I can write code. So if I need to build something, I go to an agent and the agent produces the code. I’m still checking that the code works, and there is still an architecture behind it; you’re interacting with the system, and you almost have an engineer at hand who is very fast and doesn’t get tired. And you still need to follow all the engineering practices we have. You still need to write unit tests, you still need to write integration tests. So there is a rigor to it. Now, if you have bad engineering practices and you don’t write unit and integration tests, then you may actually litter your code base with bugs. But that’s more of an organizational, structural problem, right? Do you allow untested code in your code base? A developer can make mistakes and the model can make mistakes, and we are primarily asking who has the higher likelihood of getting it right. In the end, confidence in your code base will always come from test coverage and from reviewing that the tests are written well. Typically in engineering you say your tests should be 10 times easier to understand than the code you are actually writing, so that it’s easier to check that the tests are correct than that the code itself is correct. If you follow those practices, I think you will discover the bugs that get introduced. But if you don’t, yeah, good luck.
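A minimal sketch of the loop Volkmar describes: treat the function below as model-generated output, and the short tests as the human-written checks that gate it into the code base. All names are illustrative, and the tests are deliberately simpler than the code they verify.

```python
# The function stands in for model-generated code; the asserts are the
# human-written tests that guard the code base.

def median(values: list[float]) -> float:
    """Model-generated candidate: median of a non-empty list."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Tests are far easier to review than the implementation, so checking
# them is faster than re-deriving the algorithm by hand.
assert median([3.0]) == 3.0
assert median([1.0, 2.0, 3.0]) == 2.0
assert median([1.0, 2.0, 3.0, 4.0]) == 2.5
```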

Tim Hwang: Yeah, definitely. So do you think this report is mostly just revealing that AI engineering is, in effect, still buggier than humans? That the lack of trust is well-warranted?

Volkmar Uhlig: I think we are not at the point where I can go blindly to a model and say, “Produce 10,000 lines of code for me,” and they will be correct. The big challenge is that humans are lazy, so there is a tendency to be overconfident in what the model is doing. If we are not very skeptical about the output and we don’t review it, we will actually get bugs into the code base. But I would flip it around and say the more open-ended question right now is where we put the model in the middle of the execution. Code generation is one thing, and I can review that. What if the model actually executes code? We see this already in ChatGPT: you ask it a random question, it goes out and produces Python code, runs the Python code, and gives you an answer. But then you look at the Python code, and it may be buggy. Sometimes the code doesn’t even run—when you do data aggregations, you have a table with, say, five values in the first column and seven values in the other column, and then it says, “Oh, pandas, sorry, I got an exception.” This happens in real life. And so you get answers which are just bogus, simply because the code generation and then the code execution is wrong. That’s where it becomes much more scary: on-the-fly code generation. I do not think that, with the current accuracy, we are there yet. For small things that may be okay, but for large things you still need human eyes. Will that go away? Yes, probably. Over the next three, four years, we will get to a point where the code will be better than what a human can produce.
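Here is a tiny reproduction of the failure mode Volkmar describes: generated analysis code that assembles columns of unequal length and dies with a pandas exception instead of returning an answer.

```python
# Generated analysis code that assembles mismatched columns crashes at
# run time; an agent executing this on the fly returns no result at all.
import pandas as pd

col_a = [1, 2, 3, 4, 5]        # five values in the first column
col_b = [1, 2, 3, 4, 5, 6, 7]  # seven values in the other column

try:
    df = pd.DataFrame({"a": col_a, "b": col_b})
except ValueError as exc:
    # pandas raises "All arrays must be of the same length" here.
    print(f"aggregation failed: {exc}")
```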

Tim Hwang: Shobhit, to bring you back into this conversation, I gotta believe that this is like your life, right? Customers and clients saying, “Well, I don’t know if I trust this stuff,” and then you being like, “No, the water’s fine.” I’m curious how this is kind of playing out in your world, because it feels like this is a conversation that you have day in, day out, all the time.

Shobhit Varshney: So, from an IBM Consulting perspective, we have very strict guidelines and warranties and things of that nature. Any code that IBM produces for an end client has to be bound by what our master service agreement says—what goes into the code, whether it’s copyright-free, things of that nature as well. So there’s a pretty high bar when our team members are producing code for our clients. And over time, you’re starting to see that the quality of the engineer leveraging these copilots matters a lot. Say you’re a software architect, somebody senior who knows how to make interns productive. We get you some brilliant software developers, and they have these flashes of brilliance—they’ll show you some code and you go, “Oh my God, I can’t believe this intern wrote this.” Then you realize they actually copied it from Stack Exchange and modified it a little. It was brilliant, but because they had access to other things. Unless you know how to judge that piece of code, it’s very difficult to even think about putting it into production. So the bar for the manager of an intern is pretty high. Similarly, when you get a copilot that behaves like an intern, the person using that copilot should understand how code is written—to the earlier points we’ve made, we need to know what good looks like. If the code is generated 100 percent by a copilot, it’s very difficult for you to understand what logic was used. Earlier you said that you can just proofread code, but you need to be really good, and have done this over and over again, to understand what to even look for. What’s happening in reality today: 70 percent of the code gets generated and works pretty well. The last 30 percent, the last mile, is where we get stuck. It’s an iterative process—it takes one step forward, but then it ends up taking two steps backward and may introduce other bugs. So unless you really know how the code was written, how you would have written it yourself if you had the time, you’re not really able to get to that 100 percent unlock of value. In this tandem between a human and a copilot, we also need to figure out how to ask the right questions and how to create the right test cases. And I think we’re moving toward having an agent that goes and acts as the peer reviewer for the code that’s being generated. In a lot of our client deployments, when we introduce other agents to review the code and review the errors, that multi-agent system delivers higher-quality code for our teams than what we got from a single LLM just spitting out code.
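Here is a hedged sketch of the generate-then-review loop Shobhit describes. The call_model function is a stand-in for whatever LLM client you use; the two-role split between generator and reviewer is the point, not any particular API, and the prompts are illustrative.

```python
# Hedged sketch of a generator/reviewer agent pair. `call_model` is
# whatever LLM client you use; prompts and names are illustrative.
from typing import Callable

ModelFn = Callable[[str, str], str]  # (instruction, content) -> reply

def generate_with_review(task: str, call_model: ModelFn, max_rounds: int = 3) -> str:
    code = call_model("You are a senior engineer. Write code for this task:", task)
    for _ in range(max_rounds):
        review = call_model(
            "You are a strict peer reviewer. List concrete defects, or "
            "reply APPROVED if the code is ready:", code)
        if review.strip() == "APPROVED":
            break  # the reviewer agent signed off
        code = call_model("Revise the code to address this review:\n" + review, code)
    return code

# Trivial stand-in model so the sketch runs end to end.
print(generate_with_review("add two numbers", lambda prompt, content: "APPROVED"))
```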

Tim Hwang: It’s really interesting to think about this as part of a maturity of the overall AI tool chain that needs to happen. So the lack of trust is the fact that we have this AI code gen thing, but it’s really not connected to any other AI tools around it, sort of is what you’re saying.

Skyler Speakman: It’s a new year. We can be optimistic. One of the insights from the same study: what was the lowest item on this list of 10—the one the developers don’t think is a problem? I think it’s really interesting: it was the quality of the LLM. These developers are, I think, correct and convinced that LLM quality is going to continue to increase. That’s not one of their concerns. And it’s really interesting to see that play out here as the lowest of the 10 options given; it came up about half as often as the trustworthiness issues did. So I think that’s a pretty interesting takeaway. LLMs will get better. How we integrate them into the decision-making process, that’s a different story, but I think there is a kind of global optimism that these LLMs are going to become stronger.

Tim Hwang: For our next segment, we’re going to talk a little bit about Apple Intelligence. There was a really interesting news story that popped up in the last week about how this new summarization feature that was part of Apple Intelligence had been messing up. So this would be a summary of your voicemails, your text messages, and importantly, your news stories, your news headlines that you were getting. And they found that in many cases, Apple was actually summarizing incorrectly. It was hallucinating. So Apple apologized and promised that they’d be doing better on the version two of this feature. I wanted to bring up this topic just because when we talked about this earlier last year, before the feature came out, the opinion that we had was AI is going to be perfect for Apple, and they’re going to get this so right, and it’s going to be so targeted. So I wanted to just go back and talk a little about, were we right, were we wrong? I guess, maybe Shobhit, I’ll throw it to you first on what your hot take is on that.

Shobhit Varshney: I think it underperforms in a lot of different scenarios. Apple is using much smaller models to do this on-device, dialing up the security side of things: the models have to be small enough to run locally, rather than some insanely large model doing the summaries. So a little bit of the performance hit, I believe, is happening because of the size of the models they’re using. We see this in the real world as well, as we’re building multi-agent systems and things of that nature. So there’s a balance between “Hey, should I make sure that everything runs on-device, constrain it to only a few things, make sure it doesn’t start draining the battery,” and a few other things they have to solve for, versus “Do I get a really intelligent model to do these summarizations and things of that nature?”

Tim Hwang: Skyler, maybe do you have a similar take?

Skyler Speakman: In addition to Apple getting burned here—at least from what I’ve seen in the headlines—it’s other news agencies that were using Apple’s technology. So, for example, the BBC: you see this BBC breaking news alert come up, and it’s completely made up. The BBC is actually feeling quite burnt by this. It’s not just Apple with egg on its face; it’s the partners they’ve gone with, because now these headlines are being blasted to their customers with the BBC icon next to them—gibberish. So, yeah, Apple really has to think about not only the technical challenge of getting these hallucinations taken care of, but also how you pass that messaging on to consumers through another news agency the right way. Because I think they got hurt on this one.

Tim Hwang: Yeah, that’s right. Volkmar, one question I had in particular for you: I was having a conversation recently where a friend of mine was making the argument that Apple is ultimately a hardware company, right? They do hardware. And he was saying that machine learning is very different: you throw a bunch of data at it and the machine just sort of figures it out. The attitude is a lot more “just try it, and see if it works”—basically a lot more shooting from the hip than the hardware mentality. From this argument, he was saying that, culturally, Apple is just not well-positioned to play and win in this space because of how careful Apple is in a lot of ways. Do you think that’s right? Is there a point of view here that Apple was slow to launch the product, and then organizationally it just can’t bear the risk of these things, so it’s always kneecapped from launching really good features in this space?

Volkmar Uhlig: I don’t think so. Do you remember when Apple kicked Google off their phone and did Apple Maps, and it was a disaster? They took a lot of heat for it, and now it’s one of the main routing applications. I think what happened is that Apple was in a bind because they were late to the game. They didn’t build a really strong AI team; this was very visible—I was living in Silicon Valley, and Apple was just not there. They were not present. And then they were effectively in a bind: “Okay, we need to bring something out. We need to make an announcement.” So they made a big splash and shipped the product. They tried to keep the functionality really limited but make a strong statement: “Hey, we are going to put something on our devices. We are not missing the whole thing.” I think they had to rush it out. Fundamentally, there was a backtesting problem: those issues could have been found if they had done decent backtesting on very large-scale data—and they have very large-scale data on the devices—but they didn’t. So now they got burned. Do I believe it will get fixed? Yes. What Apple is doing is defining how you do deep integration on edge devices. It’s still clunky—the whole “rewrite my email and rewrite my text messages.” It’s not good. The models are not good yet; we have much better models out there. But figuring out how to squeeze something into that form factor, with the resource constraints and the power constraints you have, is the tough challenge. On the flip side, what we are seeing now is that with every generation of new models, we get pretty much the same capabilities in the next smaller size: the 70 billion parameter model gets down to 20 billion, the 20 billion to 13 billion, and the 13 billion to 7 billion. So just by waiting 6 to 12 months, we will see capabilities that traditionally could only run in the cloud on, like, two GPUs become possible to run on a phone. But if they had waited that year, they would have lost the market. So I think they were in this bind: “Okay, the technology’s almost there, there’s a lot of hype around it, we need to do something, so let’s get something out.” And now they’ve burned their fingers.

Tim Hwang: Yeah, that’s interesting. But it’s cool to think that this is basically Apple Maps again. As someone who just switched from Google Maps to Apple Maps, I’m like, “Wow, this is actually way better.” But it’s such a funny thing, ‘cause I remember its initial reputation was terrible, so I didn’t touch it for...

Volkmar Uhlig: It was terrible—like, it would drive you into the ocean. And, by the way, it’s still like that anywhere except around the Apple offices. We went to Japan and Apple Maps sent us into the forest. It’s amazing.

Tim Hwang: Shobhit, do you agree with that? It seems like Apple is the most disappointing here, but what Volkmar is saying is, “Give it time; they will eventually win, because you can’t beat Apple.”

Shobhit Varshney: Actually, Apple did a lot in the open-source community last year, and it’s fairly impressive what they did with their Ferret-UI models. They have these smaller adapter models that can run on-device, and the power envelope is pretty low, so they’ve done an incredibly good job and open-sourced a lot of it. There are a few areas where I think Apple has a lead over other mobile manufacturers—understanding what’s on the screen, for example. They’ve open-sourced some brilliant work that lets you understand the different UI elements, so you can build on top of that and create apps that take actions on the screen. So they did some really good fundamental work in 2024, and I’m expecting that in 2025 they’ll make better use of the growing compute power, plus everything they’ve learned. The challenge with a really, really small model—and as you said earlier, small models keep getting better than where they were the year before, so that will improve incrementally—is that you’re asking it to cover news articles from every domain. If you ask a small model to do a bespoke piece of domain expertise, that works really well when we deploy it for our clients. But on an Apple phone, you’re expecting it to understand the nuances of negation in a news article that could be about biology, politics, or sports. It needs to understand every term used in golf, which is different from how you talk about soccer—soccer versus football, things of that nature. So you do need a larger model to do the summary well, and that’s the balance they’re trying to strike. I think they will catch up in 2025, but the fundamental work they did in 2024 was really, really good.

Volkmar Uhlig: I really agree with Shobhit here. The foundational work—how to think about UI integration, how to think about on-device processing and offload, and then the cross-domain data integration: understanding maps, understanding your calendar, understanding your email—all that foundational work is incredible. My expectation is that we will get an “AI Kit” too, where people can bring their own adapters. Right now you can’t, but that’s the next logical step, because you can’t have 20 big models living on a phone; you just don’t have the memory capacity. So the next logical step is, “Okay, I can take the Apple model, fine-tune it for my specific domain, and load my adapter into it, so I can bring new AI capabilities on-device but with shared base weights.” This is where I think Apple did the foundational work, by saying, “Hey, we are providing this as part of the operating system that people can build on.” That is their strength. They will do that ecosystem play and give access to it. But, you know, Apple always starts with a walled garden—nobody can do anything until they’ve figured it out themselves and enabled all the applications—and then it will become obvious how you build this. And then we’ll run it on our DIGITS.
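Here is a toy numpy illustration of the shared-base-weights idea Volkmar sketches: one frozen base matrix serves every app, and each domain ships only a small low-rank adapter (a LoRA-style delta). The shapes are tiny for readability, and nothing here reflects Apple’s actual implementation.

```python
# Toy illustration: one frozen, shared base weight matrix plus a small
# per-domain low-rank adapter, instead of a full model per app.
import numpy as np

d, rank = 512, 8
base_w = np.random.randn(d, d).astype(np.float32)  # frozen, shared by all apps

def make_adapter() -> tuple[np.ndarray, np.ndarray]:
    # Each domain ships ~2*d*rank parameters instead of d*d.
    return (np.random.randn(d, rank).astype(np.float32) * 0.01,
            np.random.randn(rank, d).astype(np.float32) * 0.01)

def forward(x: np.ndarray, adapter=None) -> np.ndarray:
    y = x @ base_w
    if adapter is not None:
        a, b = adapter
        y = y + x @ a @ b  # low-rank, domain-specific correction
    return y

mail_adapter = make_adapter()  # e.g. a mail app's fine-tune
x = np.random.randn(1, d).astype(np.float32)
print(forward(x).shape, forward(x, mail_adapter).shape)  # (1, 512) (1, 512)
```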

Tim Hwang: Exactly. Yeah, exactly. So for our last segment, let’s do a little final round the horn. Sam Altman, on his personal blog, put out a reflections blog post looking back at the last two years of ChatGPT. There’s a lot in it. It’s a very long blog post. I think the big thing that came out of it for me was really just the degree to which Sam still really believes in AGI as the mission of OpenAI. He hits on it multiple times, and it’s still the big thing he’s rallying the company towards. But I kind of wanted to get the view of all of you on the panel on what you thought was surprising, what you thought was interesting. Shobhit, I’m curious if you have any thoughts on the blog post, and if there’s anything that you thought was surprising or kind of worth it for people to pay attention to.

Shobhit Varshney: Yeah, so he talked a lot about AGI, and I think we as a community have not agreed on the levels that define what AGI is. We need to do a better job there before we can even evaluate people’s opinions on whether AGI is achievable. If we can’t agree on a definition of artificial general intelligence among humans—ask ten kids in a high school or college classroom—it’s very difficult for us to have a good measure for it. So the community in 2025 needs better definitions, just like we did with autonomous driving: distinct levels, with clear scenarios and test cases. We should do a better job of defining that before we can evaluate whether Sam is really telling the truth about how far we are from AGI.

Tim Hwang: Yeah, for sure. Skyler, any thoughts? Hot takes? Opinions?

Skyler Speakman: Yeah, a slightly humorous take: I had forgotten that he was fired and hired back. So kudos to the PR team for that—it wasn’t until reviewing the blog that I had that trigger again. You’re like, “Oh yeah, he was briefly not CEO.” I had forgotten about that. If you want a hot take from reading the reflection, that’s probably what jumped out at me; it just triggered that memory again. So, yeah, that’s my hottest take.

Tim Hwang: I had a very similar experience where I was like, “Oh, yeah, that was last year.” So, last but not least, Volkmar, curious if you’ve got any takes.

Volkmar Uhlig: Yeah, I think it’s a mix of both. Having been in startups and venture capital for more than 10 years, I can feel the pain he has gone through, the ups and downs—getting fired from your own company is really not fun. But it’s really interesting to see the product evolution they are going through. And he points this out: “We did ChatGPT and we released this thing into the wild, and it became the fastest-growing consumer product ever.” It’s really amazing to see how AI took off. In the end, OpenAI created this new wave. They took the risk, they figured it out—kudos to them. And now the question is about this really big north star of “we want to get to AGI.” If you look at 2024 with o1, where they say they want to get to human-level reasoning, they are still innovating, and it’s really impressive: OpenAI is clearly the leader in this industry right now. They are defining the next steps. It’s part of Sam’s vision to say we want to get to AGI on a human timescale. Every time they release a new product, it’s like, “Wow, this is possible?” They are still driving the industry, and everybody else is a follower. That’s really impressive.

Tim Hwang: Yeah, I love that. I think one big reflection on the blog post was that the guy running this company seems himself kind of surprised by how fast things are moving. You know, it’s like, “Oh, wow, we’re doing this thing. It’s only been two years.” And it’s very fun to see that even he is continually confounded by how things are happening. So, well, great. Thanks for joining us, Shobhit, Volkmar, and Skyler, as always. It’s a pleasure to have you all on the show. And thanks for joining us, all you listeners out there. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we will see you next week on Mixture of Experts.
