Disney's AI bet: USD 1B OpenAI content deal explained

Why did Disney pay OpenAI a billion dollars to use their characters? This week on Mixture of Experts, host Tim Hwang and experts Marina Danilevsky, Martin Keen and Kush Varshney analyze Disney’s three-year OpenAI licensing deal, and what it means for IP owners, content creators and the future of fan-generated AI content. Next, Time Magazine names “Architects of AI” as 2025 Person of the Year—it’s not the first time the person of the year was not a person, but what’s different about this? Then, NVIDIA drops Nemotron 3 open-source models; we explore what makes this model release different. Finally, Anthropic’s Soul Document leaked. We unpack model alignment, philosophy in AI and the future of prompting vs. fine-tuning.

  • 00:37 – Introduction
  • 02:14 – Disney and OpenAI billion-dollar deal
  • 10:35 – Time Magazine’s Person of the Year: Architects of AI
  • 15:39 – NVIDIA Nemotron 3 open-source models
  • 24:10 – Claude’s Soul Document and model alignment

The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.

Episode transcript

Tim Hwang: I’m Tim Hwang and welcome to Mixture of Experts. Each week, MoE brings together a panel of charming and brilliant minds in technology to distill down what’s important in the latest news in artificial intelligence. Joining us today are three incredible panelists. We’ve got Martin Keen, Master Inventor; Marina Danilevsky, Senior Research Scientist; and Kush Varshney, IBM Fellow. The year is winding up, but it’s still full of AI news. We’re going to be covering Disney and OpenAI’s new licensing deal, the Time Magazine Person of the Year, Nemotron 3 (the new Nvidia launch), and this Claude “Soul” document. But first, we’ve got Aili with the news.

Aili McConnon: Hi, I’m Aili McConnon, a tech news writer for IBM Think. Here are a few AI headlines you might have missed this week. 2025 has been the year that driverless cars have gone mainstream. Google, Amazon, and Tesla each launched their own version of these robo-taxis in various US cities. Walmart has moved from the New York Stock Exchange to Nasdaq, a move illustrating the retail giant is trying to transform into an AI-powered enterprise. IBM and online platform Kaggle have partnered to create a new leaderboard that evaluates AI models and agents as they solve real-world enterprise issues. It’s official: the generated content. For more, subscribe to the Think newsletter linked in the show notes. And now let’s go back to our experts.

Tim Hwang: I want to begin today’s episode by talking about the Disney-OpenAI deal, which was just announced this past week. This is sort of an interesting one. I know at this point we’re all kind of very cynical about deals that are less than USD 10 billion, but the core of this deal basically is that Disney is about to sign a three-year licensing deal with OpenAI to allow Disney’s characters and IP to be used in OpenAI’s generative AI models. Additionally, Disney is going to be taking what is effectively a billion-dollar stake in OpenAI itself and become an equity owner. Martin, maybe I’ll kick it off with you. Why is OpenAI signing this deal, exactly?

Martin Keen: Yeah, it’s such an interesting deal, isn’t it, Tim? Because traditionally, the sort of generative AI deals that we’ve seen up until now have been for training and grounding purposes. So you would purchase, for example, a bunch of news articles. OpenAI did a deal with the Financial Times for that, and Google did a deal with Reddit where they pay Reddit, I think, USD 60 million a year for Reddit training data. But this isn’t that. This is the other end of it. This is taking the finished model and actually using the output of it to incorporate the characters, which is a really different way of looking at it. And I think from one perspective, you can think: as soon as a new image model comes out, people are trying to generate copyrighted material instantly with it and seeing how much the model lets you get away with. And often over time, they kind of — I think one of my first Sora uses was a Mickey Mouse cartoon, exactly. So, you know, might as well do deals. We might see more beyond this if this one works out for Disney and OpenAI.

Tim Hwang: Whenever a new model comes out today, there’s always the same sort of test that everybody does on the model. You always see people try to build a vector image of a pelican riding a bike to see how good each model is, right? Or an otter on a plane using Wi-Fi is another pretty popular one. I have my own: every time a new model comes out that’s supposed to be good at writing, I have it generate a Nelson DeMille short story in the style of Nelson DeMille. It’s not bad, but it always comes through and says, “I can’t write in the style of Nelson DeMille, but here is something similar to it that includes some of the themes.” It’s a disclaimer, right? But maybe not in the future. Maybe in the future, authors will consent as well, and then I could make fan fiction of my favorite authors. Who knows where this is going? Marina, I think one of the things that comes to mind is that with Disney, the most frustrating thing for them for a long time has been the Vault, right? Where they’re like, “You can only access certain movies at certain times,” and then arbitrarily, Disney is like, “It’s in the vault. You can’t get access to it.” That’s a long way of saying I think Disney’s very protective of these characters and this IP. For them to say, “Yeah, we’re okay with a world where anyone can use generative AI to have Mickey Mouse do whatever” — well, Mickey Mouse is a bad case because it’s now entering the public domain. What’s your thinking on why Disney’s finally willing to take this risk? Because it’s kind of a big deal for them in terms of how they think about and control their intellectual property.

Marina Danilevsky: It is. And Bob Iger did say something about this being kind of inevitable anyway, so we want to get ahead of it instead of fall prey to it. Something that I just want to mention is there’s the flip side of the deal. Yes, you can as an individual use Sora to generate things with Disney images, but Disney is going to stream these kinds of videos. So they’re going right back to Disney, and they’re trying to basically have control in some way of that fan-generated content and have it come back to Disney instead of proliferating on X or Bluesky or wherever you’re going. So they’re trying to say, before this gets into a whole “Oh yeah, look at all these wonderful fan-generated shorts” — no, no, no, come be on our platform instead. So this also reads to me, to a large extent, like a platform play: “Look, this is going to happen anyway, so let’s make sure that people are doing these things on our platform and constantly coming back to that much tighter integration with Disney.” That seemed to be an important part of it to me.

Tim Hwang: Yeah, absolutely. Kush, this sort of opens up a really strange world, right? You just turn on the Disney Channel at some point and it’s just infinite user-generated AI cartoons of these characters. That seems like where we’re headed.

Kush Varshney: Yeah, I think it is. For the diehard listeners of the podcast, maybe last year at some point I was mentioning Foucault and the author function and how the whole social contract of authorship is maybe changing. I think that’s exactly it. When you have these thousands or millions or billions of fanfictions out there, the social contract is just completely different. Before, in oral cultures, you had these bards singing. They weren’t trying to make money off it; they were shepherds or farmers or shopkeepers, just doing it for their own status or to continue a tradition. Then with blogs and stuff, that digital reality came about again. Now with AI tools, I think Disney’s just going to capture exactly that same social contract. As Marina was saying, be ahead of the game and just be part of where the world is headed, because they’re still going to make their money on the amusement parks and merchandise and licensing. So whatever authorship is turning into, I think they’re just a part of it too.

Tim Hwang: Martin, where does this all go ultimately? Now that Disney has been willing to jump in, I assume everybody else who owns significant IP is also thinking, “Is this my time to get in and cash in on some of this stuff, to build a partnership with these AI companies?” I don’t know if you have any thoughts on where this goes next.

Martin Keen: Yeah, I mean, like you say, Tim, this completely flips the script. LLMs come out, everybody’s thinking, “Hey, how did this get trained on my data? Do I need to sue to get my data paid for?” Now this is the exact opposite. Disney is paying a billion dollars in an equity investment to OpenAI with more to come. So it’s completely the opposite way around. So I think this really signals: are the eyeballs now going to go more to the Disney IP and not to your own IP? So we may even see the complete reverse of what we’ve seen before, where everybody is kind of pushing to get their content into these models now.

Tim Hwang: Yeah. Marina, one thing I want to note to end this segment on: obviously Disney is Disney — one of the biggest, most powerful content companies in the world. Do you have any thoughts on what this means if you’re just someone creating art online? Because I feel like they’re in a very different position from someone who happens to own the rights to Harry Potter or something like that.

Marina Danilevsky: I was going to say K-pop Demon Hunters — get back to me when. Yes, that’s right, playing with my daughter who’s obsessed right now. But I could imagine other content owners following suit and seeing what this is. I wonder to what extent — I know it’s an exclusive license for a year — are we going to be chopping things up again? Are we going to see more mergers of this kind of thing? I think there’s a lot of economics here that are really interesting and very much not worked out. Like what Kush said, the contract with creators and what counts as official and what counts as unofficial — that line is going to be way harder to tell. I think there are going to be a lot of people involved in that in the next few years trying to draw those lines.

Tim Hwang: Well, we’ll keep an eye on it. Super useful getting all your opinions on it. I’m going to move us on to our next segment. Time magazine, as you all know, has an annual Person of the Year feature. This year’s Person of the Year is not a person, but in fact the architects of AI. If you’ve seen the magazine cover, it’s a take on a classic image of construction in New York, but it’s all of the luminaries of AI sitting on a construction girder. I thought this was so interesting and worth spending time on because it lets us talk a little bit about what’s been happening this year in AI, but also how the media representation of AI is evolving. The first thing that stood out to me was you have CEOs and infrastructure providers, not a whole lot of researchers represented here. Lisa Su was on there, which I think was surprising for people — former IBM researcher, by the way.

Kush Varshney: They got to get that name checked. Exactly. So having the CEOs on there — that’s the signal. “Architects of AI” — what are they architecting? I think it’s the financial aspects, the hype, the business. That’s what’s being architected. If they had chosen to feature the scientists or even the data workers, that would have been a very impactful cover. But capitalism is being forefronted. That’s what “architecture” means, I think. Is that a good thing or a bad thing? I’m not going to comment on that, but I think that’s the message, at least.

Tim Hwang: Marina, this is going to be a very pointed question for someone like you. Does research matter anymore? Time magazine is saying — and I think it does capture something fundamental about this moment in AI — that almost all the action, all the attention is on the business side. I’m really curious how you feel about this balance of power between the folks advancing the actual technical research and this increasingly large ecosystem that sits almost entirely aside from what’s happening on the latest papers from NeurIPS or what have you.

Marina Danilevsky: I really agree with what Kush said. This is a signal that it hasn’t been as much the year of AI as it’s been the year of AI hype, AI communication, AI as business, AI as financial deals — not necessarily so much the technical side. That’s interesting; it’s continued. But people were saying that 2025 was going to be the year where AI agents put all of us out of work. Not quite there, guys. Still getting there. But the hype this year, the stories, the way people talked about it was ridiculous. A lot of it centered on these cults of personality — who could say the most ridiculous things in the news and move the coverage like ping-pong balls: “Oh, now it’s this company. No, it’s back here. No, it’s back here.” So it was certainly a year of that. I also, like Kush, am not completely sure what to make of it. It’s reflective of reality. It’s maybe not reflective of where the real work and the interesting technical work is happening, but you can’t deny the reality — it has been this, for better or worse.

Tim Hwang: Yeah, it has been the hype, basically. Martin, thoughts on this? I’m hearing some grumbles maybe from your other panelists.

Martin Keen: No, I’m totally on board with what you’re saying there, Marina. It does seem like “the year of AI hype” would be the tag for this rather than “the year of the agent,” perhaps. But the article points out how much of this focuses on infrastructure — how much it’s been on infrastructure this year, how much spending there is just raw spending on AI data centers and so forth. They mentioned in the article over USD 400 billion in 2025 just on AI activities, which is a huge amount. So the article kind of made the point: are these the next industrial titans? We had the railroads and so forth, and now it’s the data center. Is this the next thing? I think a lot of the focus is on that now. When I saw that this year it was AI, I went back and had a look at some of the other Time Persons of the Year. This is certainly not the first time it has not been an actual person. Do you know who the Person of the Year in 1982 was? No, I do not. It was the computer — popularized by the IBM PC. So we’ve kind of come full circle from “Hey, everyone has a computer now” to “Hey, now everyone has access to AI” in 2025.

Tim Hwang: I’m going to move us to our next topic. A few weeks ago I was joking: I am tired, because it feels like every few weeks we do a segment on a new model being out. The end result — I forget who commented on a previous episode — is there are a lot of models coming out all the time, they’re all really good, and after a while the distinctions blend into one another. But we’re going to do a segment on this. Nvidia launched its newest generation of its Nemotron open-source models, Nemotron 3. There’s a lot we’ve come to expect from model releases in the last few months: they’re focusing more on agentic behavior, there’s a spread of models from the very largest to the smallest, and there’s a bunch of infrastructure and other component accessories released with this generation. I do want to get into that, but I want to start with a business question for some of our listeners. Kush, why isn’t Nvidia always winning when it releases its models? Doesn’t it have the ability to create models that are ultra-optimized for their own hardware that everybody else runs on? Implicitly, doesn’t it make sense that the Nemotron models would be some of the most successful out there? But that doesn’t really seem to be the case — we tend to focus on a lot of other players for their models. I’m curious if you can account for Nvidia being this huge hardware leader but not necessarily a model leader.

Kush Varshney: Yeah, maybe now they will be. They’ve been moving up the stack — starting as a GPU company, then having CUDA. Part of this announcement was actually an acquisition of SchedMD, which is workload management, scheduling software. They keep moving up the stack, consolidating everything as they go. It’s like a snowball. A long time ago I did an internship at Sun Microsystems, and the key phrase for them was “The network is the computer.” John Gage came up with that. I think Nvidia is just rolling up to becoming “Nvidia: The AI stack is the computer.” They’re controlling the narrative going forward as well. So we just need to keep an eye on what comes next — where else do they go?

Tim Hwang: So you’re actually saying this is a pretty big deal. I shouldn’t necessarily shrug it off like, “Eh, Nemotron 3, much like Nemotron 2, what’s another model?” You actually think this is significant in some ways?

Kush Varshney: Maybe. You could say that the models are being commoditized in some fashion. What they’re doing is probably the same recipe that others are doing, that we’re doing. On the Granite 4 architecture, it’s pretty much the same. So maybe it’s just connecting everything together that matters.

Tim Hwang: Marina, if you agree with Kush, there’s almost a fun, interesting race going on. Nvidia moving up the stack, trying to gobble everybody up, while everybody’s trying to get off Nvidia onto other hardware platforms. Do you agree with Kush’s assessment? Where does this all go in your mind?

Marina Danilevsky: I definitely agree with Kush. The full-stack play is something that has been going on — people coming from different ends, either from the bottom or from the top, trying to do it. Because the quality of these models by themselves has gotten very comparable, so it matters so much more: the levels of integration, the levels of distillation you can do for specific things, how you integrate them together, how you test things for yourself. When everything gets commoditized like this, it turns once again much more into, “Okay, what’s the economic play here?” and also “What’s the ease of use?” People’s expectations for ease of use and ease of trying it out are very, very high. So Nvidia doesn’t want to be dependent on other people to choose it or not choose it, so they are necessarily getting ahead of the game. This makes a lot of sense to me. They’re not the only ones doing this, but it makes sense.

Tim Hwang: Martin, you may have comments on this. One question to throw into the mix: a meta story of 2025 is what are the bounds of an open release in the space? People used to say, “Oh, we just released the model.” Now people are like, “The model and the data.” And Nvidia is here because they’re also releasing training datasets and reinforcement learning libraries. It feels like the scope of “open” is getting broader and broader about what’s expected when you do one of these open releases. Curious if you have thoughts on that trend.

Martin Keen: Yeah, it seems like openness is going to expand a lot, especially with the EU AI Act coming in next year. That’s going to require things like stating what your training dataset is and so forth. That’s a big open thing that is not discussed even with many open models. Tim, you made the point of why doesn’t Nvidia have the best model because they have the GPUs and shouldn’t it just be easy to combine the two? But it’s interesting: what is probably the top-rated frontier model right now is Gemini 3 Pro. That was trained on exactly zero Nvidia GPUs. Google has had the advantage of using their own hardware. They trained the entire model on TPUs, and it was a massive pre-training effort. Most of the work seemed to go into the pre-training. So that’s an example where owning the hardware and building the model has really worked out very well — Google has just been able to make such advances. So it will be interesting to see how these Nvidia models come along as well.

Tim Hwang: Kush, anything you’d want to flag in terms of model architecture otherwise? This is again sort of multi-agent, what everybody else is doing, but anything unique worth pointing out?

Kush Varshney: No. On the narrative point, the architecture is this hybrid: it has a bunch of transformer layers, it has some Mamba layers (a state-space model for long-range dependencies), and a mixture of experts — the name of our podcast in there. That combination, the hype is again saying that this is something unique and new and great, and it can deliver all sorts of performance and efficiency and makes sense for agents. All of that is true, I would completely agree with it. But it’s not like they’re the first to come out with it. So coming back to the Time magazine thing: what are you centering? It’s the hype of it in addition to the fundamentals.
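For readers unfamiliar with the mixture-of-experts component mentioned above, the following is a minimal sketch of top-k expert routing in plain Python. It is an illustration only: the transcript does not describe how Nemotron 3 actually routes tokens, and the names `route_top_k` and `moe_layer`, the toy experts, and the router scores are all hypothetical.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalize so the selected experts' weights sum to 1.
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k=2):
    """Combine the outputs of only the selected experts; the others never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical toy experts: each one simply scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, 0.3, 1.5]  # made-up router scores for a single token

selected = route_top_k(scores, k=2)
output = moe_layer(1.0, experts, scores, k=2)
print(selected, output)
```

Because only `k` of the experts are evaluated for each token, the compute per token depends on the active experts rather than on the total parameter count.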

Tim Hwang: That architecture sounds an awful lot like Granite 4 — the Mamba, the transformer, and the mixture of experts all together.

Martin Keen: Everybody’s been very on message today. We’ve got lots of IBM references coming in, I guess.

Tim Hwang: Marina, on a final note, do you want to do any comparison between this and Granite on the actual architecture itself?

Marina Danilevsky: As I was reading, I was like, “Hold on, am I reading the Nvidia one or the Granite website?” Because it’s awfully familiar. They have the same idea of multiple models as well, which I think a lot of these open models are moving to now. They have Nano, Super, and Ultra — they have 30 billion for Nano and 500 billion for Ultra. But because it’s a mixture-of-experts model, you only use about one tenth of those parameters at inference time. So that 30-billion model, the Nano model, actually only uses 3 billion active parameters at inference time. That means you can take these models and run them on some pretty small devices, which I think is quite interesting.
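The arithmetic behind that remark can be sketched in a few lines of Python. The parameter counts and the "about one tenth" ratio are taken from the discussion above and are approximate; the function name `active_parameters` is a hypothetical helper, and applying the same ratio to the Ultra model is an assumption for illustration rather than a published figure.

```python
def active_parameters(total_params: float, active_fraction: float) -> float:
    """Parameters used per token when only a fraction of experts is active."""
    if not 0.0 < active_fraction <= 1.0:
        raise ValueError("active_fraction must be in (0, 1]")
    return total_params * active_fraction

# Sizes quoted in the discussion; the one-tenth ratio is approximate.
MODEL_SIZES = {"Nano": 30e9, "Ultra": 500e9}
ACTIVE_FRACTION = 0.1

for name, total in MODEL_SIZES.items():
    active = active_parameters(total, ACTIVE_FRACTION)
    print(f"{name}: {total / 1e9:.0f}B total, about {active / 1e9:.0f}B active")
```

Note that the full set of weights still has to be stored; the saving described here is in the computation performed per token.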

Tim Hwang: I’m going to move us on to our final topic of the day. This is a fun story from a number of weeks back. We had scheduled to talk about it when the news broke, but there’s been so much other stuff happening in AI that we’re only addressing it now towards the end of December. It’s a way to talk about what’s happening in model alignment and model safety. A kind of independent researcher was digging around with Claude and uncovered a document that is used in the training process that Anthropic calls its “Claude Soul Document.” The Soul Document is very long and unique in a couple of ways. The way it’s drafted is a lot more narrative and philosophical than a lot of safety documents you might have seen. That said, it’s a long list of things we don’t want the model to do. Marina, what should we make of Claude having a Soul Document? What exactly is going on here from a technical standpoint? What is this actually being used for?

Marina Danilevsky: I think it’s very in line with Anthropic’s perspective and the way they want people to think about how they develop their models and their company. I like Claude; I’ve always kind of liked the Claude model. I like the way they write and what they do. Something interesting here is that this document, if I understood the coverage correctly, is being used at fine-tuning time, not just something that’s in the prompt. That’s different from what a lot of other models do, which is just a whole bunch of fine-tuning or RL techniques that don’t have any framing around the examples. The example is just “Here’s a really specific task, really specific question, and this answer is better than that answer.” I’m simplifying. But you keep going that way without anything around it like, “Oh, you should think about how to answer this in reference to this and this.” As a result of doing it earlier in the stack, this is reinforced over and over again in the model’s own parameters. Calling it a “soul document” is cute, but there is a sense of almost a value and a structure that no matter what the task is, you ought to be referencing a bunch of things beyond just the concrete question and answer. I think there’s something in that. A lot of people maybe do that without being so explicit about it, because of the default system prompt you choose to have or not have as you train your model. You might have your own version of that document, but it’s maybe very small and not very intentional. At Anthropic, it seems to have been deployed very intentionally. It does result in a somewhat different personality of a model because you end up building different biases — and I use that word in a technical sense. So I think it’s just a slightly different perspective on when you put this information into the model earlier up the stack, and that itself is worth looking into — how much of a difference that makes.

Tim Hwang: Marina, why haven’t we done model alignment in this way before? Why have we leaned so much on prompting versus just working it into the fine-tuning? Because what you’re saying is really true: we want this set of principles embedded into the behavior of the model. But the fine-tuning thing is a little bit different from how a lot of people do it. Kush may have perspective on this as well.

Marina Danilevsky: One thing is that it becomes a little more difficult to do things from an evaluation perspective. If you have all of these things as part of what you’re going after, can you really tell that this particular answer is actually better than that particular answer? I wonder how much time they spent not only on figuring out this document, but on figuring out how they need to change their training data to account for training in line with what is there. Most of the time we don’t do that. The datasets we rely on or construct don’t do that; they’re a little more straightforward, a little more focused — actually a lot more focused than that.

Kush Varshney: As we’ve been doing this for Granite, we’ve built up that experience as well for safety alignment or morality alignment. The evaluation is clearly part of this — how do you know which behavior is preferred or not? But there’s also the modularity question. Once you do supervised fine-tuning or similar, it’s fully baked in. Not every use case is exactly the same, especially for the types of customers and use cases we often think about. So maybe it’s too heavy-handed in some fashion. Anthropic’s document states that they want this to be an “expert friend” of some kind. Not every LLM should be your expert friend. So breaking it up, having the options to turn things on and off, do things in a way that makes sense for your use case, is another driving factor.

Tim Hwang: Totally. Marina, your comment about giving up some evaluation ability by doing it this way is pretty interesting. They have a harder time measuring this, but they think it’s more aligned if they do it this way. Martin, Marina was being pretty nice about this, but one of my instincts on reading this was: wow, this is so Anthropic. This is the most Anthropic document I’ve ever read. It’s easy on some level to eye-roll and be like, “Oh yeah, ‘Soul Document,’ very Anthropic.” But I kind of agree with Marina: out of all the models currently operating, Claude is just the most pleasant to interact with right now. I’m really curious about whether the way these documents are written has something to do with this very hard-to-quantify quality we like in Claude.

Martin Keen: Yeah, reading the Soul Document was so interesting. I agree with you, Tim — the nicest model to chat to has always been Claude for the last couple of models. I’m thinking of myself when I prompt a large language model: I’ll read back my prompt and think, “If I gave this to a human, am I giving them a decent chance to actually perform the task? Am I giving them enough information?” I’m editing a document and I just respond, “Make it better” — that hasn’t really given the model a good idea of what to do. So I try to be quite specific. I was interested to see what Anthropic would include in this Soul Document that would really guide the model. The part that I read — it says (this is from the Soul Document) that Anthropic generally believes it might be building one of the most transformative and potentially dangerous technologies in human history, yet presses forward anyway. I’m thinking: if I tell that to a human — that I’m building that potentially dangerous thing — how does that human proceed? I have no idea. So how is that model supposed to take that information and do something with it? I’m looking forward to finding out.

Tim Hwang: Kush, maybe I’ll give you the last comment here. This goes to what I was just asking Martin: how much of what’s in here is actually impacting model behavior, and how much is poetic license or literary flourish? This document is a real pleasure to read. But where I’m left at the end of the day is: how much of this actually impacts how the model behaves?

Kush Varshney: I think it does have an effect, certainly. Could it have been more concise? Maybe. But a few different interesting parts. One is that they do discuss uncertainty, value uncertainty, and calibration in there. As people, most of the things we encounter in life we’re also uncertain about. We don’t know what we believe until we encounter it. The fact that they’re going through all of that uncertainty — how should you reason about it — is actually a nice thing, and that can’t be done very concisely. So I think that’s an aspect of it. Another thing I’ve been waiting this whole podcast history to talk about a little bit is some of the moral philosophy aspects. There’s a little bit of confusion in — it’s very Anthropic, as you just said. But if you step back and look at the philosophy of it, maybe it’s trying to do too many things at the same time. There’s this concept of dualism and non-dualism in a lot of moral philosophy: is the soul of individuals separate for each individual, or is it all universal, all the same thing? I don’t want to get too philosophical, but it’s important here. The reason it is, is because every instance that a person uses the model, it’s like a new birth, a new session. Is this Soul Document really meant to be universal, or is it meant to be individualized for the session? If it’s really meant to be universal, then why is it talking about being a “brilliant friend”? Because every context needs to have a separate sort of soul in that case. So it’s very confusing what the exact goal should be.

Tim Hwang: So what’s the end result? Do you feel like that’s a problem for the model?

Kush Varshney: If you just use it for this very narrow type of use, then it’s fine. But if it’s really a general-purpose technology that you’re going to use in a lot of different situations, then I think it’s too prescriptive in certain ways.

Tim Hwang: Yeah, for sure. That’s a good reminder. Kush, I didn’t know you wanted to talk about that. I’m going to work out an MoE segment next year where we just do moral philosophy. I think it’d make for a super fascinating episode. Marina, I thought of one last question, so maybe we’ll have the last word go to you. A few years ago, I was very excited about prompting because I thought, “For someone who has over time become more of a writer than a coder, this is very exciting.” At the time, I had a couple researcher friends who said, “Don’t invest too much in prompting. We’re going to figure out how to automate it. Prompting is going away. It’s just a temporary thing.” Here we are at the end of 2025. The architects of AI have done their thing, and AI is bigger than ever. If anything, the fact that Anthropic is doubling down so hard on this kind of document, which specifies — as Kush said — the moral philosophy of these models: are we going to be living with prompting for much longer, or is this similarly just a temporary thing in your point of view?

Marina Danilevsky: Prompting is a way to get information into the model. It is certainly a very simple and straightforward way. It doesn’t mean that you know what effect it has. But I will say that a lot of times when you go into these larger systems, prompting does kind of start to dial down. Maybe you specify slightly what you want, but the way you actually end up executing is not via prompt. You end up telling the model roughly, “I kind of want you to do this,” and then there are other intermediate steps. So out of “prompt engineer” you end up with “agentic flow engineer.” So prompting in this sense is going to be part of it, because you’re trying to get information into the model in a particular way. But I agree with your friends back then who said, “Yeah, we’re going to figure this out.” Remember where we started: you prompted and you put in two spaces instead of one space, and the model was like, “I don’t know what to do.” We got past that. The base idea remains the same: you’re trying to get information in, and the model either can kind of figure out what you mean or it can’t quite figure out what you mean. Now it just won’t tell you that it couldn’t figure out what you meant; it continues to keep going. But I think we are going to be moving beyond this in terms of ways to inject information into the models, and as we go beyond the models, ways to inject the information we want into the use of the models. The models themselves are a means to an end most of the time. Especially as Kush was referring to enterprise solutions and real use-case solutions — those are going to be a means to an end. At that point, we’re going to be moving beyond something as fragile as prompting.

Tim Hwang: And also, come to think of it, beyond these types of fine-tuning approaches as well, where you take a Soul Document and try to fine-tune it in.

Incredible episode. This ties together so many threads from the last 12 months. Martin, Kush, Marina, thanks for joining us so late in December. That’s all the time we have for today. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we’ll see you next week on Mixture of Experts.
