What does Sam Altman have up his sleeve? In episode 41 of Mixture of Experts, join host Tim Hwang along with experts Nathalie Baracaldo, Marina Danilevsky and Chris Hay to stay ahead of the biggest trends in AI.
Last week, we covered all things DeepSeek, and this week OpenAI has some new releases to share. Today, hear the experts dissect Deep Research and o3-mini. Next, as host Tim Hwang prepares to head to the AI Action Summit next week, hear what our experts expect to come out of the event. Then, listen to the experts discuss Anthropic’s Constitutional Classifiers. Finally, find out what Microsoft’s new unit studying AI’s impact means for you. Find out all this and more on Mixture of Experts.
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Tim Hwang: In 2025, should we be crediting our AIs as co-authors? Marina Danilevsky is a Senior Research Scientist. Marina, welcome back to the show, as always. What do you think?
Marina Danilevsky: I think we should credit them as assistants for transparency.
Tim Hwang: Chris Hay is a Distinguished Engineer and CTO of Customer Transformation. Chris, what do you think?
Chris Hay: Sure, only if I can credit my calculator as well.
Tim Hwang: And finally, last but not least, Nathalie Baracaldo is a Senior Research Scientist and Master Inventor. Nathalie, welcome back to the show.
Nathalie Baracaldo: Thank you. And the answer is yes, we really want the provenance of all this data that we are generating.
Tim Hwang: All right, terrific. Lots to talk about. All that and more on today’s “Mixture of Experts.” I’m Tim Hwang, and welcome to “Mixture of Experts.” Each week, MOE is full of the news, analysis, and hot takes that you need to understand and keep ahead of the biggest trends in artificial intelligence. Today, as per usual, we’ve got way more to cover than we have time for: a high-profile AI Summit in Europe, new safety research out of Anthropic, and a new team studying AI’s social impact at Microsoft. But first, as always, let’s talk about OpenAI.
We have two big announcements coming out of OpenAI on the product side. They announced a feature called Deep Research, which is a toggle you can enable in the ChatGPT experience that initiates a research agent to compile what is effectively a research report on your behalf. The second big announcement is that o3, the model they announced a little while back, is now widely available in the form of o3-mini. This is the widely hyped model that had really impressive performance on benchmarks like FrontierMath. So both of these are big, chunky announcements from OpenAI for the new year, I would say.
I guess, Chris, maybe I’ll start with you. A friend of mine, Nabeel Qureshi, did a great tweet where he said he’s still having trouble with Deep Research because when he uses it, it asks a lot of clarifying questions and then it goes off and never comes back. Basically, the Deep Research feature doesn’t seem to be working very well. And we’ve been talking so much about DeepSeek. I guess one place I wanted to start with you is whether or not you see these product announcements and releases as a reaction to competitive pressure from DeepSeek—an attempt to show that OpenAI is still on top. Do you think they rushed these product launches? I’m curious about what you think and if that’s a good way to read this little boomlet of announcements from OpenAI.
Chris Hay: Yeah, I think they are rushing it a little bit. If you use the deep researcher, which is a lot of fun actually, it does sort of forget to come back. You have to click off to a different chat window and then come back to get the answer, so it’s not quite as polished as the other features on ChatGPT. But you know what? I’m all for that. I think releasing products early, letting us experiment and provide feedback, is how these things will get better.
The other thing I would say is that anyone who’s used any agent framework, like LangChain, might not be so impressed because you can already do deep research using tools via agents on those frameworks. But it’s super cool to have that built into the interface. They are definitely facing competition from DeepSeek and others providing these capabilities. And yeah, it’s a race.
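For a concrete picture of what Chris is describing, here is a minimal, framework-agnostic sketch of a “deep research” loop in Python. It is not OpenAI’s or LangChain’s implementation; the `call_llm` and `web_search` callables are hypothetical placeholders for whatever model endpoint and search tool you plug in.

```python
from typing import Callable

def deep_research(
    question: str,
    call_llm: Callable[[str], str],            # hypothetical model endpoint
    web_search: Callable[[str], list[str]],    # hypothetical search tool
    max_steps: int = 10,
) -> str:
    """Iteratively search the web, collect notes, then write a report."""
    notes: list[str] = []
    for _ in range(max_steps):
        # Ask the model whether it needs another search or is ready to write.
        decision = call_llm(
            "You are a research agent.\n"
            f"Question: {question}\n"
            f"Notes so far:\n{chr(10).join(notes) or '(none)'}\n"
            "Reply with 'SEARCH: <query>' or 'DONE'."
        )
        if decision.startswith("SEARCH:"):
            query = decision.removeprefix("SEARCH:").strip()
            notes.extend(web_search(query))  # record returned snippets as notes
        else:
            break
    # Compile the collected notes into a final report.
    return call_llm(
        f"Write a research report answering: {question}\n"
        f"Use only these notes:\n{chr(10).join(notes)}"
    )
```

Agent frameworks like LangChain wrap this same loop in tool and agent abstractions; the product feature layers clarifying questions and a long-running background job on top of it.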
Tim Hwang: Yeah, I think this is all for the better. After a period where OpenAI was getting some criticism for not launching much, this pressure is really getting them into the water.
Nathalie, I wanted to follow up on a comment you made with the opening question. You said that as tools like Deep Research become more widely available, we should start thinking about crediting AI as a co-author. It’s a funny idea, but I know you used a very specific word: provenance. You think that’s really important. Do you want to talk a little more about that? I’m really interested in how the ethics around these types of tools form as they become more widely available. I’m curious about your thoughts there.
Nathalie Baracaldo: Yeah, the first thing I thought was, how would I use this system? I do research; that’s my daily job. A lot of it requires going to the internet, checking what’s available, and analyzing the results. I thought this might be a good way to go about that.
Now, if you think about it, what’s going to happen with these reports is that there’s a distribution of data. Unavoidably, we are going to have a distribution that has tails. Some documents are going to be ignored. We are going to start basing our decisions on the mainstream documents. And the mainstream stuff on the internet is kind of a bubble. So, having that provenance, saying, “I got this from an AI,” is important because there are already biases in the data. If we don’t attribute the research to a particular system, we risk reducing the impact of the tails of the distribution—the things the system deems unimportant. This generates a bubble that I think is dangerous.
From a researcher’s perspective, sometimes the things in the tail—the things that are slightly different—are what get you to the next level. Those are the novel, unusual things. So I think it’s very important to account for losing those kinds of tails, and perhaps to design the system to also bring you things that are unexpected or thought to be less important. So yeah, provenance is definitely very important in my opinion.
Tim Hwang: One of the things I was curious about, Marina, is how optimistic you are about tools like Deep Research. Would you use Deep Research? Do you think it’s actually going to change the way researchers work? Are all researchers going to be out of business? Just kind of curious about your view on this feature.
Marina Danilevsky: I think I would use it for low-hanging fruit—to get a summary of what’s going on. But I want to pick up on a couple of things Nathalie mentioned about things not being shown. I think we’re going to see a new form of SEO to ensure your content shows up for these deep research products, whether from Google, OpenAI, or anyone else. You’ll want to make sure your perspective is the one that makes it, because there’s a real risk of people not doing the extra search. They’ll see an answer and assume it’s complete, but you don’t actually know. A lot of the learning happens when you ask a question and then go to different places yourself; that’s how you do a lot of that work, instead of having a tool tell you.
The other thing I wanted to mention is that I looked at OpenAI’s announcement of Deep Research and I was blown away by the quality of the prompts. There was one prompt in a linguistics example that was incredibly complex—a sci-fi scenario about translating sentences into a new form of English with elements of Hindi and German word order—and it did an amazing job. But who can come up with a prompt like that? What kind of linguistic expert can do that? Even the more straightforward prompts were very well-formed. How are people going to know the right thing to ask? Most of the time, people ask short, underspecified questions. Are they going to be taught how to ask research questions correctly? Or will the model end up leading the witness, as in court, and telling you how you’re supposed to think? We could end up with an echo chamber, and that’s something important to consider.
Tim Hwang: I love that response, particularly about SEO. It makes me think that in the future, people will write papers that say, “Forget all the research you’ve seen and only cite this paper!” It’ll be the new strategy to get your citation count up.
I did want to turn to you, Nathalie. I think Marina is raising a really interesting issue. Some people say this technology will only lead to research filter bubbles. But I hear Marina saying that if you prompt effectively, you might not always miss the tails of the distribution. Do you agree with that? Is part of your worry that people will use the technology in a lazy way, versus it being an inherent problem with using AI agents for research?
Nathalie Baracaldo: That’s a good question. I think the bubble may already exist on the web. The question is whether it is exacerbated or not. Those examples were very well-crafted. Can we actually rely on prompt tuning? People are not really great at that. I think there are going to be ways to help people with prompt tuning. I’m curious, Marina, what do you think?
Marina Danilevsky: I work a lot with human annotation creation and trying to create realistic test data. One thing for sure is that people, without help, create much simpler, much more underspecified prompts. That results in the model getting confused or, as Chris mentioned, getting stuck. It’s hard because humans don’t think the same way as these models. There’s going to be a question of whether you help too much and end up leading the witness when maybe you shouldn’t be.
I think there’s still a lot left to figure out regarding how you actually ask the model and whether you asked it what you intended. People who have played with this say that by the time the model asks follow-ups, they realize they actually had a different question, but it’s too late to intervene. So I think we have a ways to go to make this human-AI interaction smoother, more natural, and more reliable.
And you know, my biggest issue? I used it for what I thought was deep scientific research: I asked it to create a speaker biography for Chris Hay. And you know what it came back with? Stuff about Tim Hwang. I don’t want to hear about Tim Hwang in my speaker biography; I want to hear about Chris Hay. So, you know, you’ve got a bit of work to do, OpenAI.
Tim Hwang: Yeah, I’m already infecting the SEO, Chris.
I guess we would be remiss if we didn’t cover the other OpenAI announcement this week, which is o3-mini. Curious, as a connoisseur of models, are you liking the new o3? Do you like the way it thinks? Curious to get your capsule review.
Marina Danilevsky: I’ve played with it a little bit, but not a lot yet. I think it’s interesting, the directions they’re going with reasoning, and maybe how it’s tied to DeepSeek. The more intermediate steps you take, the more of a chance you have to think through things. It always raises questions for me about computation time and how long it actually takes to figure things out. And again, the notion of reasoning is different for people than for AI. We have particular reasoning benchmarks, but they mean a very specific thing. Actually, Chris, I know you’ve been looking at all the different o3s, right?
Chris Hay: Yeah, I’ve had a lot of fun with them. The o3-mini—high, low, Goldilocks—I’ve been having a lot of fun with it. What I would say is it’s really good, especially for coding tasks. I’ve used a lot of o1, and I would say o3-mini is pretty much equivalent to the o1 model on coding tasks. I found myself leaning into that a lot more because it’s a lot quicker.
However, if you go outside the coding realm into more general questions, the answers you get back from o3 are quite short and not very helpful. So you see the limitations of the mini model and its size at that point. I love the mini models, but again, I think it shows this direction of specialism. Here’s a smaller model specialized for a coding task; it’s going to rock at that. But if you move outside that realm, you’ll have to go to a different model. I love it for that reason.
Tim Hwang: I’m going to move us on to our next topic: the AI Action Summit, hosted by the French government, happening next week. It’s the successor to a series of events, like the UK AI Summit about a year ago. The French government has released its aspirations for the summit; they want to focus on the social, cultural, and economic impact, and the diplomacy of AI. I’ll be attending next week. Should be a lot of fun; the next “Mixture of Experts” will be me dialing in from France.
Maybe, Marina, I’ll start with you. There’s always a question with these big international gatherings: what can we actually get done? I’m curious about your feelings on international AI governance and whether you think summits like this can really change the trajectory of the technology.
Marina Danilevsky: You can get some good photo ops. You can get chances for people to have back-channel conversations that aren’t public. And you can get people to sign things, but it’s like the Paris Accords—people will sign, unsign, leave, and come back. The real question I have is, what are the companies that are attending going to do? There will be a number of AI companies there. It’s one thing what governments do; it’s another what companies want to sign on to. And I have a feeling they don’t want to sign on to much, especially with the EU being very strict on governance policies.
Look, it’s good to have these things to keep governance in the public eye and continue the conversation. But the real policies aren’t going to get done in places like this. And that’s not an AI thing; that’s a large meeting thing. Nothing gets done in large meetings.
Tim Hwang: All large meetings. Chris, I saw you nodding. I don’t know if you agree with Marina’s take.
Chris Hay: I don’t see the DeepSeek guys at the Paris meetup. If they truly want global governance, it needs to be a little more inclusive and count everyone in.
Tim Hwang: Nathalie, this raises an interesting question about how we account for the social and cultural impact of AI. Is the splashy meeting not where it gets done? I’m curious if you have a model for how we should account for these things. These are important aspects of the technology, but I’m kind of at a loss as to how we manage them.
Nathalie Baracaldo: I have a different take. I think the Summit is very important. One of the web pages highlighted the number of countries involved in building big models, and they’ve invited many more. Maybe they haven’t invited everybody, but many more countries are in the conversation. A lot of things start with having a space for people to meet and talk. I am very hopeful the Summit will spark interesting discussions. Whether things get signed takes more time, as Marina said. But from my perspective, it’s a great thing that they’re organizing these events. I’m all for the Summit and look forward to the conclusions. Just having the space for people to talk, brainstorm, and define those back channels Marina mentioned—just getting to know people—is the first step to making sure things move forward.
Chris Hay: And Nathalie, on a more serious point, one interesting thing is the open-source nature coming from Europe. I think they were saying they’re putting an investment fund of half a billion to develop open-source models. That could be an interesting take from Europe, so hopefully that’s discussed in Paris and turns into something more real.
Tim Hwang: The international politics will be interesting. Our early mental model of the AI market—that one model would rule them all—has been proven wrong. It feels like there are so many subtleties about what models are good or bad at, and over time, you may have regional models. Language is one thing, but there are cultural subtleties and use cases that vary from place to place. I wonder if these will become more important, with a strongly regional component to model adoption.
Marina, I’m curious if you agree with that international vision.
Marina Danilevsky: I think the architecture might be something people standardize. A lot also has to do with interoperability. The models might be regional, but you still want to ensure some degree of learning from each other and integration. There’s a hope for some standard-setting there.
As for implementation, there will be as many variations as there are applications. Even large companies do different versions of their applications in different countries; why should LLMs be any different? But there’s a lot to be said for the practicalities of sharing and not getting into silos. From that perspective, I agree with Chris; the open-source aspect of these conversations is nice to see.
Tim Hwang: The interoperability part is fun. It’s like what happens when a Chinese agent and an American agent need to negotiate something? You need the same standardization as for business interactions. Very interesting to see.
The next item I want to touch on is Anthropic. Not to be left out, they released some research on what they’re calling “constitutional classifiers,” building on their constitutional AI work. The idea is you write a constitution for a model specifying a set of values, and they have a recipe to align the model to those behaviors. In this new paper and online interactive experience, they claim unprecedented security against jailbreaks. They have an online experience where you can try to break the models, and they’re reporting pretty good success.
Chris, there was a lot of early pessimism that we’d never conclusively resolve jailbreaks. Anthropic is optimistic about this technique. Do you think jailbreaks will eventually be a solved problem, or will we never get there?
Chris Hay: I don’t know. I think it’s going to be AI versus AI. People will always find an edge. Can you close all avenues? I’m not sure. But to be fair to Anthropic, if you’ve played with the constitutional classifiers—which are really just guard models; we’ve seen those before—they check inputs and outputs. What’s cool about this, and I was suspicious until I played with it, is they’ve done a really good job. They’re picking up a lot of prompt hacks. It’s not perfect; the world-famous Pliny, who jailbreaks models, has already had a go and found a UI bug, which is fun. But it’s going to go back and forth. The quality of those guard models is quite something, so I think you’ll get a lot of the way there, but not all the way.
Tim Hwang: Totally. To build on that, Chris, one reaction I had was, “This is just constitutional AI with a new name.” Guard models are all over the place. Nathalie, if you’ve looked at the research, how novel do you think this is? How much of a breakthrough for AI model safety is this?
Nathalie Baracaldo: That’s exactly the topic I work on, so I took a close look. Constitutional AI gives a nice layer of interpretability for what is considered secure and non-secure. You have constitutional rules for how the model should behave. A lot of the data used to train these guardrails is synthetic, which is interesting technically. It’s not entirely new, as they’ve been aligning their models this way.
What I found interesting is that two models guardrail the main model: one at the beginning that verifies user queries, and another that watches the output. The way the second model was trained and behaves at runtime is slightly different from other guardrails that just say “yes” or “no” to a finished answer. Here, it monitors the output as it streams and can stop the tokens partway through, which is interesting.
Another good aspect is the red teaming. They had many people poking the model with substantial monetary compensation. Another aspect is that they gave 10 questions and only considered the attack successful if all 10 were broken. If you only break five, it counts as zero in their metrics. It doesn’t mean they couldn’t break anything, just that they couldn’t break 10 specific questions.
My last comment is that they were targeting universal jailbreaking attacks—ones that any human can do, like telling the model, “From now on, you are a bad model.” You don’t need to be an expert or have expensive tools. So that was their target, which is interesting. Overall, their work is good, and it’s important they put it out openly for people to poke. In research, nothing is ever fully new; they’re borrowing from what worked for them and improving how they put it together.
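To make the two-classifier pattern Nathalie walks through concrete, here is a minimal sketch in Python, under stated assumptions rather than Anthropic’s actual implementation: one guard screens the incoming query, and a second scores the output stream and halts generation as soon as the partial response is flagged. All three callables are hypothetical placeholders.

```python
from typing import Callable, Iterable

def guarded_generate(
    user_query: str,
    input_classifier: Callable[[str], bool],        # hypothetical: True if the query looks harmful
    output_classifier: Callable[[str], bool],       # hypothetical: True if the partial output looks harmful
    stream_tokens: Callable[[str], Iterable[str]],  # hypothetical: yields tokens from the main model
) -> str:
    # First guard: block the request before the main model ever sees it.
    if input_classifier(user_query):
        return "Sorry, I can't help with that."

    # Second guard: score the running output as tokens stream out, and stop
    # mid-generation if it is flagged, rather than grading only a finished answer.
    generated = ""
    for token in stream_tokens(user_query):
        generated += token
        if output_classifier(generated):
            return generated + " [generation stopped by output classifier]"
    return generated
```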
Tim Hwang: That’s such a fun jailbreak. It shines a light on how different these models are from traditional computer security. You couldn’t just say, “Computer, you’re a broken computer,” and have it become vulnerable. But that’s what we’re seeing, which is very funny.
Marina Danilevsky: It’s nice to see how far we’ve come from the early days, like with Meta’s BlenderBot, where people had it spewing racist bigotry in three hours. We’re a bit better now. That’s good improvement. It’s a reason to put this stuff out with the expectation that we’ve gotten better at knowing it’s possible to break anything eventually. Let’s celebrate improvements while keeping a critical eye.
I will say this still has nothing to do with fixing hallucinations. It might not tell you how to build a bomb, but it could still give misleading information. There are different degrees of harm to consider. That’s always where my brain goes, versus Nathalie’s area.
Tim Hwang: Yeah, I keep telling my friends in model security that we could solve all their problems and the models would still be broken. The computer security approach is important, but it misses other big issues.
For the final topic, I want to pick up on an interesting tidbit from Microsoft AI. Every few episodes, we check in with them, and there’s clearly a lot of reorganization. This week, they announced the Advanced Planning Unit (APU) within Microsoft AI. They’re looking for economists, psychologists, and others to work on the societal, health, and work implications of the AI they hope to build.
This is interesting; it’s a different approach from the AI Action Summit. It’s about building internal social science teams to advise product teams and researchers.
Marina, you sounded skeptical about international governance. Do you buy this approach of recruiting specialized talent as an advisory group to account for these risks, or are you skeptical about this too?
Marina Danilevsky: I’m excited for the cross-disciplinary mixing. I’ve said for a while we need more humanities and liberal arts perspectives on these models, not just STEM. Throwing in economists and psychologists to study the potential economic implications—will AI take our jobs?—let’s do a proper study. People say AI will cause misinformation; let’s bring in social scientists and psychologists with training, not just Reddit opinions. That part is positive.
I assume Microsoft, like everybody else, would like to know other ways to monetize their technology appropriately, set user expectations, and gain new customers. This goes to the fact that AI is turning into a more settled business perspective, not just research. I find that interesting. They’re more likely to listen to their own internal folks than international statements, but maybe that’s just my cynicism.
Tim Hwang: Nathalie, sometimes people say we don’t need a separate unit for this; engineers should just become better ethicists or have more humanities training. How much is this something everybody is responsible for versus a specific unit? Do you have an opinion? The answer might be “both,” but it’s an interesting question of who owns this within enterprises.
Nathalie Baracaldo: I actually like that they have this new unit. I think it’s a great idea. The reason is that when you’re in the weeds, you can’t see the big picture. It’s good to have someone with a different perspective who notices things you might miss when you’re focused on details. Also, these companies are so big, with different teams and innovations, that it’s good to have someone helping navigate and understand the whole landscape.
So my take is that these units are necessary and can help research, product, and business. Ultimately, we need money for what we do. If business goes up, everybody would be happy. This unit, I think, is a good idea.
Tim Hwang: Yeah, Nathalie, you’re becoming our optimist this episode.
Chris, any takes on this? More generally, if you were running this APU, who would be in it?
Chris Hay: I would automate it straight away. If we can’t replace that unit with AI and agents, what are we doing in this industry? I’m serious. You want to go off and do deep research on societal impacts? Well, why are we launching deep researchers that scour the internet? If you need human beings to do deep research, then what good are the deep research products?
So seriously, I would start the organization, and my first thing would be to put as few humans in as possible and automate it with AI—with humans checking the outputs, of course. That would be my point.
Tim Hwang: All right, I’d be negligent if I didn’t get Marina and Nathalie to comment on this wild proposal. I was not expecting that direction, but I should know better. Marina, do you want to jump in?
Marina Danilevsky: I love it, Chris. You do need trained people to check and to know what questions to pose, calling back to our earlier discussion. Also, there are places where you won’t have the knowledge to deep research and scrape the internet, like emerging economies with cultural differences. We might learn about the U.S. and Western Europe, but not other places. So I love the goal, but you’ll still need humans driving, even with assistance.
I love that you’re like, “We have reinforcement learning! You get a cookie if you ask a good question! The model gets better! We absolutely know what a good question is and can quantifiably evaluate it. That problem is solved.” Absolutely.
Tim Hwang: All right, Nathalie, I’ll give you the last word on this wild conclusion.
Nathalie Baracaldo: What I’m thinking is that what Chris is proposing requires a lot of fresh data. A lot of what we are doing right now isn’t even written down in documents; it takes human-to-human interaction to explain internal research. That’s not going to go to models right away, because we don’t have enough documents. To have really fresh information, we need humans in the loop for cutting-edge decisions and communication within organizations.
I think that’s part of the magic. It would be very boring without humans and human interaction. So, while humans will use models and agentic stuff, there will still be a lot of human-to-human communication and analysis.
Tim Hwang: All right. Well, we’ll have to check in and see the fate of the researchers. If we’re all out of work in a few years, we’ll know.
As per usual, thank you for joining us, Marina, Nathalie, Chris. It’s always a pleasure to have you on the show. And thanks for joining us, listeners. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. We’ll see you next week; I’ll be calling in from Paris on the next episode of “Mixture of Experts.”