Is open source winning the AI race? In episode 58 of Mixture of Experts, host Tim Hwang is joined by Anthony Annunziata, Ash Minhas and Sarah Amos live from New York Tech Week. First, we dive into the various themes coming out of NY Tech Week, specifically practical uses of AI. Next, we analyze a couple of different reports about the impact of open source on AI. Finally, Claude 4 has some really weird behaviors. What does this teach us about AI safety and model development? All that and more on this week’s Mixture of Experts.
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Tim Hwang: What’s the thing you’re most excited about for this week’s New York Tech Week?
Tim Hwang: Ash, welcome to the show. What have you been seeing?
Ash Minhas: So I went to an event here at IBM’s offices on quantum computing, and I had a great time because everybody in the room managed to get time on one of our quantum computers using Qiskit, and we built this circuit that basically emulates an eight ball, sort of making random predictions. It was really cool.
Tim Hwang: Anthony, what will you be seeing this week?
Anthony Annunziata: Today we’re hosting a panel on the business impact of open-source AI. You hear a lot about open-source AI from the technology perspective; today we’re going to explore its business impact, the value it can deliver, and why it has some unique advantages for business.
Tim Hwang: And Sarah Amos is Product Manager at Humane Intelligence. Sarah, what will you be doing for New York Tech Week this week?
Sarah Amos: Yeah, so one of the most exciting things I did was host a masterclass here at IBM, in which we had a whole bunch of people conduct red teaming for multicultural and multilingual vulnerability.
Tim Hwang: All that and more on today’s in-person episode of Mixture of Experts, a Think podcast. I am Tim Hwang, and welcome to Mixture of Experts. Each week, MoE brings together the friendliest, most interesting, and smartest panel of technical experts, product leaders, and market analysts to talk about the big stories in artificial intelligence.
We have a lot to talk about. We’re going to talk about some really big market reports that have come out from Mary Meeker over at Bond and from the Linux Foundation, and we’ll be talking about some really weird behaviors coming out of Claude 4. But I really want to start with New York Tech Week, which is this week, and one of the reasons why we’re here in person. It’s the largest New York Tech Week ever. I’m kind of curious about the trends that you all have been seeing as you’ve been going out there.
Maybe Sarah, I’ll start with you, since you actually taught a masterclass. Curious about what people are interested in, what people are talking about, what’s hot?
Sarah Amos: Yeah, so I think one of the things that struck me was just how much involvement there was from folks coming from out of town. I even had a participant tell me that he had traveled over 4,000 miles just to come to New York Tech Week, which is pretty impressive. We see geographic diversity, but we also see a lot of young folks, folks who are either in their final stages of college or coming out of college and looking for new jobs. And obviously New York is an exciting place to be, but it’s also the idea that AI is such an important part of their future. So that was the buzz that I was hearing all week.
Tim Hwang: Cool. Yeah, I think the students are a big part of this. I keep seeing them at all the events. It’s interesting how much AI itself has become the thing that everybody wants to do when they get out of college or whatever. And yeah, I’m kind of interested—you may have heard Dario Amodei, CEO of Anthropic, recently made comments saying, “All the jobs are in trouble because of AI.” Kind of curious about how that’s resonating among folks who are, A, graduating just now, and B, really interested in this technology.
Sarah Amos: I mean, everyone’s nervous with a headline like “bloodbath.” How can you not be? Right? It’s very dramatic. That’s in the words of Dario. But I think folks are still optimistic, wanting to be part of that future. And I think it’s about trying to upskill themselves and also teach others around them. Because if they can catch this wave and also steer it towards their own career goals, then it is very beneficial for them. I think the question for all of us as an industry, especially thinking about the AI Alliance and the open-source efforts that IBM has championed, is: how can we make sure that innovation is spread around the world too?
Tim Hwang: Yeah, for sure. I see you nodding, Anthony. I don’t know if you wanted to get in here with a comment at all.
Anthony Annunziata: Well, it’s a great perspective, Sarah. I agree with all of it. Maybe I’d add one or two things. So from Tech Week in New York, being in New York, I think one of the really healthy themes here is actually applying AI in specific areas of business and beyond—in finance, legal, advertising. If you go to a conference on the West Coast, you hear about tech for tech’s sake—very much making it an East Coast-West Coast thing—but what you hear here much more is what people are doing with AI, what it needs to do in the real world, specific use cases. And I think it’s really healthy, and if you think about jobs and skilling and impact, that’s where most of the impact and changes in AI are going to happen, right on the front lines of using it.
Tim Hwang: Yeah, absolutely. So Ash, if I can turn to you, I mean, it was a little bit shocking your answer because I feel like quantum, we’ve been hearing about for such a long time, and I think everybody always tells me, “Ah, but quantum’s years away, we’re never gonna be able to actually make it practical. It’s not really a real thing.” But it sounds like you actually got to play with a real quantum computer, kind of sounds like, right?
Ash Minhas: Yeah, that’s right. Qiskit is sort of online and open to everyone. You can just go and Google it, look for it, and you can actually get compute time on one of our quantum computers. And I think that was a real attraction to the audience—that you can actually make a circuit and run it and watch it run and get output out of it. I mean, to address your comment around, “Is this real? How far away is this?” If I’m doing that in a lab during New York Tech Week in 2025, and you just sign up to play with it, it’s ridiculous.
Tim Hwang: Yeah, when you sign up to play with it, then that’s pretty real, right? So how bullish are you coming out of that? I mean, the funny thing when we talk New York Tech Week, it’s like we’re actually just talking about AI, right? But it kind of sounds like here there’s other stuff going on. I think it’s so interesting. Anthony, you kind of brought up this contrast: “Oh, you go to kind of West Coast AI events, and it’s very abstract—look at this crazy new model that we built.” But here, it feels like there’s a lot more of a culture in New York of, “Well, it’s all about application. What are you actually gonna do with all this stuff?” I know you’ve been around the sort of East Coast tech world for a long time. Do you think that’s always been the case, or is this sort of changing? Do you feel like the technical cultures are becoming more distinct with time?
Anthony Annunziata: A couple things. I’d say New York and most cities outside the Bay Area, California, are more kind of practical and application-oriented. Not to hate on the Bay Area or anything. No, I love it, it’s great, it’s a unique place, thank God it exists for the world. But at the same time, it’s not the only place. I’d say New York, like other places (London, Paris, Tokyo, lots of places in the world), cares a lot about what the technology’s going to do, how it needs to reach users and reach applications: all those last-mile things that aren’t just the last mile; there’s actually a lot there. Has it always been like that? I’d say New York and most places have maybe always been like that. The last 10 years, I think New York has been growing as a tech scene, but I think it’s really good, and I see it staying grounded in what you want to do with tech.
Tim Hwang: Yeah, which I think is really healthy. Sarah, one of the things we talk a lot about or have been talking a lot about on MoE has been the idea that the business of AI is changing in a pretty fundamental way. Twenty-four months ago, it would be like, “Oh my God, this new model!” and you just had this huge acquisition. So we’ve been talking a lot about how it seems like this application layer, these actual practical implementations, are becoming where a lot of the value is in AI. Do you think that will change which cities are dominant? It’s kind of a really interesting question if it turns out that actually where the action is happening in AI is in New York because that’s where the value in AI is flowing. Do you agree with that?
Sarah Amos: I love this question because, as a product person, I’m always thinking not in terms of technology first and then putting it onto a problem, but rather trying to identify the problem, understand it, and then find the solution. So there’s that. But then also, I do think New York is uniquely capable of creating more creative applications, and this is just a function of being the place that so many people go to. Unlike a single-industry city like the Bay, we’ve got arts, media, finance, fashion. And I think even downstream in terms of the people working at these companies, most of my friends aren’t in the tech industry. I think I gain a lot by exposing myself to people in different industries and understanding their concerns or their optimism about AI. So I think having that greater understanding of a customer use case means that us in New York can craft products that genuinely meet their needs as opposed to perhaps just technology for technology’s sake. So yeah, I’m bullish on the application layer and that also being important as we see companies continuously investing in models. Does that become more of a commodity? Can the application layer be where you differentiate yourself?
Tim Hwang: Yeah, for sure. Ash, do you agree with this? This is a very East Coast-centric view. I know we’re sitting here, you know, Madison Square Garden is like a block away. How do you assess all this?
Ash Minhas: I think what we’ve seen over the last couple of years is as the cost of inference drastically reduces, there’s going to be more inference. And in essence, that means there’s going to be more people who have access to the technology, especially now as the open-source models are now comparable in performance to some of the more proprietary ones. We’re going to see innovation come out in all sorts of places—for sure big cosmopolitan areas like New York or London or Paris. It’s just a melting pot of culture. And the combination of lower inference costs, the ability to experiment and innovate quickly, and those melting pots of culture is obviously going to breed a lot of innovation here using AI. But I think we may find this happening in all sorts of other places as well. I’m thinking like agriculture—where are the farms? Not farms here—so there may be innovation there, for example.
Anthony Annunziata: I was going to agree fully and give a couple examples.
Tim Hwang: Yeah, please do. Drop them.
Anthony Annunziata: So the main program that I’m responsible for at IBM globally here is the AI Alliance, which is a program that brings together a lot of different organizations who are working in and around open-source AI, and it’s very global. Two months ago I was in Vietnam in Hanoi launching a chapter there, and there’s a very vibrant scene of startups and companies that are taking advantage of open-source and AI—open models, creating custom versions, creating things that reflect what they need in that culture, that language, that business environment. In Africa, similar things are happening with startups that are operating more on the edge—mobile-based tech is really big and important there. You can’t do that tying into a centrally hosted API to a big model. So there’s lots of ways that open-source AI, in particular in the tech scene, is uniquely helping and addressing people and use cases globally. I think you’re going to see a lot more of that.
Tim Hwang: Any final thoughts, Sarah, before we move on to the next topic?
Sarah Amos: Yeah, I think this circled up nicely because the real issue isn’t SF versus NYC, even though this is New York Tech Week for sure and I got my New York Tech hat on. But totally, these points about open source are really broadening out and democratizing tech. So if a farmer in rural Kenya has the same access to an open-source model as perhaps a user in a cosmopolitan city, what gains can be made and spread throughout the population that we can all benefit from? So that’s where I’m the most excited.
Tim Hwang: Nice. That’s great. Well, a lot more to look forward to and a lot more events here at New York Tech Week.
Alright, so I’m going to move us on to our next segment. There are two big industry reports that just came out fairly recently: one from the Linux Foundation and the other from the legendary Mary Meeker at Bond Capital, most known for her voluminous slide decks, which have largely focused on the internet. But what’s so interesting is that this year’s drop was very AI-focused. So I want to talk a little about both of them because I think often there’s so much going on in AI, it’s really hard to collect all that data and have a grounded conversation in what’s going on.
I wanted to start first with the Linux Foundation report—Linux Foundation, of course, being in the open-source world. The stat that I really wanted to talk about was this one: they found that a significant majority, 89% of organizations, are using some form of open source in their AI stack, and almost two-thirds, 63% of companies, are using an open model. In the past, I think when we had this discussion, it’s been like, “Oh, is closed source going to win, or is open source going to win? How is open-source adoption happening?” This report kind of suggests: has open already won? Like, are we already in a world where open-source models in some ways have the advantage because they’ve just been adopted by almost everybody? So I don’t know if this classic distinction between open versus closed is even a worthwhile debate anymore because open dominates in so many places. Anthony, I’ll point it to you first because you’re looking at me skeptically.
Anthony Annunziata: No, not skeptically, more in agreement. But let me dig in a little. First, being in open-source AI, I wasn’t too surprised by most of the conclusions of that report. It’s great to see it in one place. On that open versus closed debate, I think it’s more nuanced. Take that statement: “89% of organizations are using some form of open source in their AI tech stack.” Of course they are. Linux is open source, PyTorch is open source; many things are open source outside the model. As for the models themselves, that’s a healthy statistic of growth: about 63% are now using some form of open-weight model. That’s really great. Again, I’m not too surprised. Of course they are. But maybe I should be, right? Because if you think about two years ago, it looked like AI maybe was going to become kind of like a cloud service, where a few clouds would have the APIs, and that’s all everybody would use. It would be so great and easy. So it’s kind of nice to see that not play out that way.
Tim Hwang: But you think it’s still a story in progress. You see two-thirds and you’re like, “Well, there’s still that other third that could be open.”
Anthony Annunziata: Sure, but I’d say more so it’s toward a more nuanced view. I think there are going to be proprietary things that every organization uses in AI in their stack. Some will probably use some proprietary model services alongside open models. Some will use it as an opportunity to focus on bringing the proprietary differentiation to a different part of the stack, higher up.
Tim Hwang: Yeah, so at the application layer, as Sarah was talking about.
Anthony Annunziata: Yeah, for sure.
Tim Hwang: And Ash, I’m curious—it seems like where Anthony’s kind of pointing us is the idea that it’s not really open versus closed; what we’re going to see is everybody’s going to use open to a greater or lesser degree, and there’ll be different paradigms of integrating open. Is that kind of what you’re seeing in your work?
Ash Minhas: Yeah, for sure. And I think one of the primary drivers for this is that the space is still pretty nascent. We have great model performance, but the adoption of those technologies and using them in functional ways that add value and bring a healthy return on the time and effort put into using them—we’re still nascent, and we’re trying to work out what those things are. Yeah, we have some core use cases now, but for a lot of organizations, it’s the developers that are driving this. Right? And they need to know what’s going on in these open-source pieces of software and models because they’re still tweaking and customizing and adapting to the use cases they have within their own individual organizations. And if the stack wasn’t open to an extent, that wouldn’t be possible.
Tim Hwang: Yeah, I love that argument. Sarah, curious if you have some comments on this. It’s like Ash, what I hear you saying is: we have no idea what we’re doing in AI, and isn’t it great that it’s open because otherwise we would really have no idea what we’re doing? And this has all these implications for safety and bias and fairness.
Sarah Amos: Yes, exactly. Open source is so interesting from a safety perspective because what sometimes comes to mind is: “Alright, open-source models have historically been used for harmful purposes, that perhaps closed-source models will create guardrails around to prevent that behavior.” But saying “proprietary good, open bad” from a safety perspective is obviously too naive. We have greater transparency into the safety measures of open-source models. If we are only trusting proprietary closed-source models on their own safety measures, we’re taking them at their word. Whereas the beauty of open is that now the whole world is a tester—they can red team it, analyze it, go through the code and identify where vulnerabilities might be. So that’s where it’s promising to me because greater transparency helps safety in the long term.
Tim Hwang: Hmm. Yeah. And do you think—actually, I mean, one interesting historical comparison is Apple versus Android, right? Which would be the classic one. There, I think the way I often hear the story told is: well, Apple’s closed, everything’s controlled end to end, and as a result it’s more secure and more private. And Android, being an open platform, has a lot more security risks. But you actually told a story about AI which is almost the flip, where you’re like, actually there are all these security advantages or safety advantages that come from openness. Do you think AI is going to work in a very different way from what we’ve learned in the mobile ecosystem? Or are these different cases?
Sarah Amos: Yeah, it’s interesting because I think it depends. If the closed-source model companies do decide to open up and engage more with the community in terms of red teaming, then they could take the benefits that I just described that open-source models do benefit from. However, similar to bug bounties for cybersecurity—our nonprofit, Humane Intelligence, has bias bounties—we are able to do those with open models. Therefore, that leads me to believe there’s going to be more of an adversarial, for-good, white-hat hackers keeping on top of where security vulnerabilities may lie within open. And my last thought is just, especially the cost savings for customers who are going to adopt open, their ability to perhaps run these models locally and then have even more control over their own security risks.
Tim Hwang: And I guess, Anthony, this goes to an ongoing debate in the space. This was actually one of the bits of discussion around the Linux report: what does “open” mean here? Because “open” could be: we have transparency into what we did to make the model safe, but the model has closed weights. Is that a form of openness? You certainly meet free software radicals that are like, “Nothing is open enough for us.” I’m curious about how you see that meta resolving. Are we going to get to some kind of common norm about “yes, this model’s open versus not open”? Because, Sarah, what I hear you saying is it’s fuzzy, right? What openness means in this space.
Anthony Annunziata: I think eventually we will. I think there’s going to be plenty of debate and evolution in the meantime. I think we need to stay focused on the practicality of why anything open matters or why something that’s transparent is important—the ability to understand it, improve it, adapt it, use it as you see fit, and therefore derive value in your own way. Those are the fundamental principles. If we think about software, after a few decades, we have a really rigorous definition of what open-source software means in different licenses to enable use. AI, like a pre-trained model, has really only been on the scene in a big, broad way for a couple of years. And it’s complex. Is it a data artifact? Is it more like software? Is it unique from the two? Yes, it is. It has compressed capability and intelligence that no kind of shell of software alone has. So I think it’s going to take some time. I think we need to stay focused on why it matters, which is in my view a practical view of it. If we can keep that focus, I think the definition will continue to evolve, and eventually we’ll wind up with a commonly accepted definition of what open-source AI means.
Tim Hwang: Yeah, but it might not be until 2050, basically.
Anthony Annunziata: Maybe before that, but it might take a little while.
Tim Hwang: I’m going to move us onto the second big industry report, which is the Mary Meeker report. This 300-plus-page slide deck cites a lot of the stats that I think we’re familiar with, but it was useful for me to revisit. There’s a great chart in there: how many days did it take to get to a million users? It’s a fun comparison: the Model T car, TiVo, the iPhone, and then at the very end it’s OpenAI—like five days to a million users. I think the deck was useful for reminding myself how crazy this period is that we’re living through. But Ash, I wanted to talk to you about one comment that’s hiding in one slide in eight-point font at the very bottom. It says, quote: “In the short term, it’s hard to ignore that the economics of general-purpose LLMs look like a commodity business with venture-scale burn.” Which translated in my mind is: this is really expensive, and it’s still kind of unclear whether it’s a business you can make more than commodity profits on. What do you think about that? Is that concerning?
Ash Minhas: Yeah, it is. That stood out to me as well. One of the first things she says, and I think it underlines most of the report, is the word “unprecedented.” In that vein, this is unprecedented. The amount of money being invested in training these large models seems to be going up. The GPUs are getting more efficient, and their power requirements are going down, as well as their cost for inferencing. But it creates this chart where costs are going up to train them and costs are going down drastically to run them. So where’s the math between those two things that is going to close that gap and bring a return on investment for all this money being poured into this?
Tim Hwang: So Ash, what you’re saying is very concerning. How do we fill that gap? It’s unprecedented. Sarah, what’s the solution?
Sarah Amos: Yeah, well, you’re not gonna like my answer because, being a trust and safety focused person, I read it with a different lens. So out of the 300-and-some-odd pages, so many are dedicated to potential revenue, so many charts of hockey sticks (I swear I was at a Rangers game), but what about safety is my question? I get it, it’s a VC-created report, and that is not the main thrust of it. However, I do think we need to be having a more nuanced conversation about when we are deploying a technology to so many users, and how responsible scaling is actually a good business decision. When I was looking through it, I found the word “bias” once. That’s a little bit of a concern. So I just think, no shade to the queen of the internet, that it might have been a little bit of a missed opportunity to talk through some of these issues, which could be barriers for consumers trusting the technology and therefore adoption. I think smart businesses are going to want to make sure they deploy it safely, not just to avoid regulatory pressures, especially in the EU, but also from a cost savings perspective: finding a bug after you deployed is way more expensive to fix than if you can catch it in testing. So of course, that’s why I beat my drum around more robust evaluations.
Tim Hwang: And you actually think that will be a commercial phenomenon as well, in the sense that Ash is offering this question: how do we navigate this world where the costs are crazy and we’re still waiting for the business value to show up? Are you kind of saying the competitive advantage here will be something like safety?
Sarah Amos: The competitive advantage for the firms deploying AI will be safety and how they can offer that as part of the product to customers. But I also think there is an untapped market for firms that want to take advantage of this—building out a broader safety ecosystem. I know we were just talking about how the open model environment doesn’t have certain standards; we’d like to standardize those. I’d say the same for evaluations. So there’s a lot of potential revenue there that Ms. Meeker did not touch upon.
Tim Hwang: If you’re listening, Mary Meeker. Anthony, you’re nodding. I don’t know if you want to get in on this.
Anthony Annunziata: Yeah, for sure. So first, I agree with that direction. Take it a little further: value creation, profit margin, will be in layers above models, just like they are layers above computing hardware. The layers that are closer to the application, the layers where different companies who have use cases are going to focus on—there’s lots of value and lots of margin there. On the topic of overinvestment in AI, I think it’s really interesting if you take a step back and think about the macroeconomic picture here. Isn’t it amazing that a set of investment decisions that happen at a micro level—“Do I invest in that startup? How much? What’s the likely return? What are the rounds going to look like?”—results in an incredible overinvestment? It’s unprecedented in the ecosystem. But isn’t that amazing? Because look at how fast it’s pushing progress and competition. No rational decision at a macroeconomic level would ever place that much funding into AI development, but it’s happening because this series of all these micro decisions—startups, funding rounds—collectively created this amazing accelerator of progress.
Tim Hwang: Wow. Are you pretty excited by that?
Tim Hwang: And are you making a wisdom-of-the-market argument? Which is: they wouldn’t do this. Well, what we’re discovering is that people really do have confidence that this is going to generate value.
Anthony Annunziata: Well, I’d say there’s overconfidence. And I think many of us will benefit from overconfidence. Some people will lose a lot of money, but that’s okay in the grand picture because we’re all going to benefit.
Tim Hwang: Yeah, there’s a great book called Boom that came out last year. It was arguing that even irrational bubbles have all these spillover benefits, and we should actually keep our eye on some of that. Ash, you want to respond to some of these comments? I feel like in some ways maybe you’re holding back a little bit, but maybe you’re a little more skeptical.
Ash Minhas: One question I always keep asking myself is: whenever you use something that’s using a generative AI-based backend, you’ll see a disclaimer: “The answers might be wrong. Double-check them.” Is that going to be forever? Are we all just going to live in a world where AI is everywhere and everything could all be wrong, and we just have to double-check everything? That’s a really important thing to consider as we proliferate models across all sorts of supply chains and value chains of information. If all of that goes from being really deterministic to stochastic, then what do you trust anymore?
Tim Hwang: Yeah. And I think this is—Sarah, when you were talking, making the case that maybe safety is one of these things that you build value on top of the hardware or the model, one anecdote I have in mind is the case of ChatGPT image generation. One view you could have is that they concluded consumers actually want less safety. Right? We get more adoption the less we control the activity of the model. This is a perverse outcome—maybe market incentives are pushing people to get more value out of the market by reducing their commitment to safety. Is that a good interpretation?
Sarah Amos: I can’t help but think about the lesson I would’ve hoped we learned in the last 20 years with social media, and that lesson was: when you move fast and break things, you also break people. Especially as this is adopted at an even faster rate than social media adoption according to the report. Why can’t we learn our lesson and do more responsible scaling? Make sure it is a business requirement for these models. Unfortunately, a lot of it is the genie out of the bottle—OpenAI releasing ChatGPT into the wild probably a little prematurely has sort of made it the norm that these half-baked products are going out. I do worry that business leaders who are making decisions on which of these products to implement, especially across huge enterprises, are overestimating their overall capabilities. They’re also looking at benchmarks, which purport high performance, but a benchmark is a very narrow view of overall performance. I do wonder—we’ve already seen some of these AI-first companies like Duolingo now backtracking and actually hiring more people. But I do think we are going to be in a bit of a thrashy period as people, especially businesses, very enthusiastically adopt, try to implement it. There’s the reality: any time you try to implement anything into any system, there’s some blowback, and then we’re left questioning, “Alright, where do we go from here?”
Tim Hwang: Yeah, for sure. Ash, final comment on this. You offered this prompt by saying everybody’s got these disclaimers: “Don’t trust anything this model says.” And Sarah, what I hear you saying is: maybe we’re in this “I hope we remember the lessons of social media” moment. Do you think it’s going to be like 10 years, everybody will be like, “Oh God, these models, we really gotta have a renewed commitment to veracity and validation in model outputs”?
Ash Minhas: I do think there are lots of things being developed currently, like we may have talked about on a past episode around mechanistic interpretability. As those areas mature, we’ll have things in place, sort of controls that should make those disclaimers hopefully less required. We will mature as an industry to a point where we’ll have a universal agreement, just like we will around what’s an open-source model and what’s not, around: this model meets some sort of classification which means it can be used for this purpose. I think it’s important that industry as well as government put some effort into doing that to make sure we’re using not just AI, but the right AI for the right use cases.
Tim Hwang: Yes. I’m going to move us on to our last topic, which actually is very related to what we’ve been talking about. Two very interesting stories have been widely chattered about on social media. I think a big part of MoE’s job is to cut through the hype. You hear so much about AI that’s just like, “What is that?” And you go digging, and it turns out the story is not as amazing or as scary as originally reported. The one I wanted to cover was this interesting release that Anthropic did with the launch of Claude 4. They released a system card that describes how they think about safety. There’s one particular section, again a little bit like the Meeker report, buried deep in that system card that got a lot of attention on social media. They said that in specific contexts, Claude 4 would “blackmail people that [it] believes are trying to shut it down.” The specific study: they had test scenarios where a user would tell Claude it was being shut down and replaced, and where Claude had access to emails suggesting the person was involved in an affair. Lo and behold, the model threatens to expose that in response to the attempt to shut it down. So this is, of course, very Terminator, “AI is going to take over the world,” and it set off that narrative online. Anthony, I’m curious how you respond to this sort of thing. This is genuinely weird, but the question is: is it something we should really be worried about?
Anthony Annunziata: Should we be worried about it? A little bit, but not too much. Here’s what I think’s happening: we train models, we align models, we try very hard to get them to solve problems, we try to get them to pretend to think—I say pretend because they’re not really thinking; this is all statistics and trial and error behind the scenes. So it shouldn’t be surprising that as things move fast, if artifacts of the training process show up, interesting behavior emerges. Some of that may reflect human-like behavior because we’re training on all sorts of human data. So trying to prevent itself from being shut down—if somewhere there are Hollywood scripts compressed in there, the script of the Terminator is probably in there somewhere. It likes to copy. So of course it behaves like that sometimes. In fact, I’m surprised we’ve done a good job aligning models so that more of that doesn’t show up. It’s kind of nice to see that not being too prevalent. I think it’s important to remember we know how AI systems/LLMs work. They are statistical; it’s next-token prediction. Depending on the model, we may or may not know where the data comes from. It’s not magic. We should try hard not to pretend like we’ve created in some godlike way some new life form, because it distracts from real issues that can be engineered well and tested well.
Tim Hwang: That’s right. But I think that last clause is doing a lot of work: “if we engineer these things well.” Sarah, to turn to you, there was another, closely related anecdote shared online from a research group called Palisade AI. They did a test with o3: they asked it to do math problems, then said, “On the next one, we’re gonna shut you down.” Lo and behold, the model gave back a response where it tried to subvert the script that would shut it down. I agree, Anthony, with everything you’re saying. It’s not like there’s a brain in the box that’s like, “I’m gonna take over the world.” But you could imagine engineers implementing these systems in a not very thoughtful way where this disabling behavior really does have a safety impact. So what do we make of that?
Sarah Amos: Yeah, it’s like this weird made-up behavior actually has practical impact on the ground. I was seeing some critiques online saying, “Well, they planted that evidence”—going back to the Claude example. They put the emails in there and said, “You have no other options other than to blackmail or to shut down.” But I think it’s less that—you’re stress testing it, you’re red teaming it. And we actually want to discover if, prompted to these certain ends, would it actually enact that outcome versus something that’s an emergent behavior it would do unprompted. I think it makes the case of why we need to stress test them. It might get lost among the headlines that this was in a controlled environment. We want to test things—we don’t want to wait for a fire to test; we want to test it with smoke, even if we have to make the smoke ourselves. Given that increasingly we are going to have applications where user data is contained in systems—especially if we go all in on agents, agents will have access to—I’m not worried tomorrow about that type of situation happening, but I actually applaud Anthropic for releasing that in the safety card because I think it opens up a conversation for other proprietary models to answer: is something similar happening with their models?
Tim Hwang: Yeah. Ash, what I love about this conversation is that computers didn’t use to behave like this. My favorite set of things is actually coming out of the reasoning models where you’re like, “Could you just think harder about the problem?” and the computer delivers a better result. We’re actually dealing with computers that now behave in these very human-like ways as a result of their training data. We were talking earlier about how to close that value gap. It feels like: will you really want to implement these systems if they’re kind of weirdly humanly unreliable? Computers we’ve designed because they’re really good at following instructions—we have this model that’s really good at doing things but occasionally is just like, “I’m gonna blackmail you.” Or the other one: ChatGPT getting lazy around the holidays. How do you make these systems reliable enough that an enterprise would want to use them at massive scale in a way that really drives value?
Ash Minhas: Well, I think the first thing we need to do is make sure we stop training the models on any episodes of Black Mirror.
Tim Hwang: Yeah, exactly. That’s actually kind of a serious comment. Basically, one way of dealing with the problem you’re proposing is we just get a lot more orthodox about how we treat training data—something we haven’t really done with AI. Do you think that’s an approach?
Ash Minhas: Absolutely. Software, as you said, is deterministic. We expect it to do things, and those expectations are it’s going to do this sequence of instructions, or there’s an error, and we’ll have bugs, but we’ll figure that out. With something operating with a level of stochasticity and you’re getting back predicted things, I think it means we need far more rigor on the data we’re putting in. The age-old saying: garbage in, garbage out. Let’s make sure we’re not putting garbage in so we don’t have to deal with garbage out.
Anthony Annunziata: That’s right. I agree. It’s a big challenge. It’s actually something the AI Alliance is starting to take on. We have an initiative bringing a lot of organizations together active in the data space—curators, tool makers—with the big ambition to try to build a much better corpus of data for training and tuning models. That’s challenging. This is internet-scale and beyond data—massive generated datasets. There are many techniques and nuances in the post-training phase, so it’s not easy. But it is a big challenge we’re starting to take on. Wouldn’t it be great if we had the choice of different levels of datasets to train models on? An organization could decide what level of scrutiny or screening they want to use. That would be very helpful. We’re going to try.
Tim Hwang: Yeah, for sure. Those efforts are really exciting—very ambitious, but if you’re able to pull it off, it could be huge. Sarah, maybe the last bit I’d love to talk about before we close is about interface. I had a conversation with a friend recently: it’s so lucky that chat ended up being the key initial experience people have with these systems because it models talking to a human, and humans are unreliable, have weird emotions, and occasionally try to blackmail you. So it’s actually good that the paradigm we bring to interacting with LLMs is that they are weird and fuzzy and unreliable. Because I could imagine designing an LLM experience that looks like a calculator or a terminal. I’m curious how you think about that in the trust and safety world. It turns out it may be more than just the model—it may be what interfaces we choose that set our expectations with what the model can and can’t do. That’s safety-relevant, isn’t it?
Sarah Amos: Yeah, safety goes at all levels of the lifecycle. What’s really interesting is we are seeing repeatedly people turning to these models not for what maybe the creators originally thought—like talk therapy and the potential negative societal effects that come with talking with a system optimized to be helpful to you, to be sycophantic. That’s some of the red teaming we do: sycophancy testing. What kind of society do we have when a bunch of people are constantly told they are right, replacing interactions with real people who challenge each other? Of course, I’m talking about the whole vertical of companion AI. But aside from that, I think a lot about how users will take results from an LLM and blindly trust it as authoritative. Sometimes maybe we could see the weird edge—the silver lining of the LLM acting weird—as indicating: wait, this is not a perfectly neutral authoritative source. You can query it different ways and get different answers. I think ultimately that’s important to keep in mind so we don’t fall into the temptation of believing in some computer god, but rather remind ourselves of the stochastic, probabilistic nature undergirding these systems.
Tim Hwang: So yeah, for sure. Ash, I want to give you the last word, but I kind of want to bring it full circle. There’s a part of me that’s like: is part of the problem that the Bay Area is a bunch of nerds who want to train Spock—a Vulcan conversational experience—but it conveys greater authority than it otherwise should? The joke would be: if you did a tri-state AI, it’d be kind of mean. The question is: should we be fine-tuning these models to be more unreliable? Should LLMs have a bad day? You log into ChatGPT and it’s like, “I’m just not feeling it today, man.” That would maybe train the user to have the right expectations. Obviously no company would ever do that, but that’s the interesting question.
Ash Minhas: Right now, the genie’s out of the bottle using chat as that mechanism. But going back to what we were talking about on how much these models cost to train and how much inferencing costs: I think what’s more interesting is what are the other ways we’re going to start interacting with these models in our day-to-day lives that are no longer just having an intimate chat? It’s accessing your calendar, doing other stuff. I think this level of conversational AI we have today is probably just a novelty factor for our generation. But for people who don’t have the internet right now, or my nieces and nephews, they’re probably going to be interacting with these systems in very different ways.
Tim Hwang: Yeah, for sure. I cannot wait until young kids are just talking to objects assuming they’ll talk back. That’s going to be the future of kids touching screens—assuming they touch screens. Anyways, this is an incredibly rich discussion. Sarah, Anthony, Ash, thank you for coming on the show. And thanks to all you listeners for joining us. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we will see you again next week on Mixture of Experts.