We are celebrating MoE podcast’s one year anniversary! In episode 53 of Mixture of Experts, host Tim Hwang is joined by the O.G. panel of experts from our pilot—Chris Hay, Shobhit Varshney and Kush Varshney. This week, we cover some exciting announcements at LlamaCon. Then, we discuss some new Chinese AI models, from Qwen3 to the rumored DeepSeek-R2. Next, J.P. Morgan’s CISO, Patrick Opet, released “An open letter to our third-party suppliers,” covering the need for AI security. Are we doomed? Finally, we look back at some of the topics we discussed in episode 1—the Rabbit AI device, GPT-2 chatbot, Apple Intelligence—after all that, who was the first person to say “agents” on the podcast? Tune in to find out, on today’s one-year celebration of Mixture of Experts.
Tim Hwang: I wanna go back one year: it’s May 2024 again. What’s the biggest thing in AI that turns out to be not that big of a deal? Kush Varshney is an IBM Fellow on AI Governance. Kush, welcome back to the show. What do you think?
Kush Varshney: Kolmogorov-Arnold Networks.

Tim Hwang: Got it. That’s a good one.
Tim Hwang: Shobhit Varshney, Head of Data and AI for the Americas. Shobhit.
Shobhit Varshney: The cost of AI—I think the intelligence per dollar has plummeted significantly.
Tim Hwang: Absolutely. And last but not least is Chris Hay, Distinguished Engineer and CTO of Customer Transformation. Chris, what do you think?
Chris Hay: Those stupid pin things we got all excited about last year.
Tim Hwang: All that and more on today’s Mixture of Experts. I’m Tim Hwang, and welcome to Mixture of Experts. Each week, MoE brings together the smartest and, I think, the most good-looking crew in all of podcasting to discuss and debate the biggest news in artificial intelligence. And this is a big episode. Today we’re officially celebrating our one-year anniversary of MoE. We brought together the original crew from MoE episode one to join us—all-star crew. We’re gonna do a look back, cover a call to action from J.P. Morgan, a new wave of action in the Chinese AI market. But first, I really wanted to cover all the latest from LlamaCon. So I believe this was the first event—officially the first LlamaCon—that Meta has run, focusing on its work in the open-source space and around the Llama class of models. I think a lot of announcements to cover here. Shobhit, the first one that I was really intrigued to get your take on was they announced this thing called the Llama API, and it’s a developer platform that, quote, “will bring together the best of closed source with open-source flexibility.” So for our listeners who might be less familiar with this, what have they done, and why is it kind of a big deal? In my opinion, I think it’s a big deal.
Shobhit Varshney: Yeah. So today, in the current state, if an enterprise wants to play around with Llama models, you go to one of your hyperscaler partners and use their version of the studio, their way of fine-tuning it, whatever the hyperscalers are producing. And once you’re done with that model, it’s difficult to move it around. Right? So in this particular case, Meta is coming out and saying, “We want to be as developer-friendly as possible. We’ll give you a central place with all the playgrounds, the fine-tuning capabilities, evaluations, and so on.” So as you’re fine-tuning the model, you can test it out. All of that will be done centrally. They will host the API for Llama as well. You can obviously still get it everywhere else that you get your LLMs from, but now they’re developing a whole stack, so they’re moving beyond just providing the model to providing the whole ecosystem. They’ve done enough work in the space with Llama Stack and a few other things in the past, but this was their coming-out party, saying we’re gonna be as developer-friendly as possible: come work with us, we’ll help you fine-tune it, and once you’re done with that model, you can take it anywhere. Obviously, there are a lot of things around privacy, where they will not train on the data that you’re providing them. But the inference speed is amazing with their partnerships with Cerebras and Groq and others. So overall, they wanna be the hub where people come and experiment with Llama models, versus Llama models being one of the 200 models available on Microsoft or AWS or Google.
Tim Hwang: For sure. And Chris, maybe to bring you into this conversation, I was having a debate with a friend about this announcement, and we were talking about whether or not this is almost like a position of strength for Meta or a position of weakness. One point of view is, hey, we release these open-source models and everybody will build all the tooling around it. Essentially, that’s what we do: we do the model, and then everybody else builds the ecosystem. So that’s kind of the bear case. And my friend was like, well, the bull case is actually that they recognize there’s such a big opportunity that they have to actively build this stack. I’m curious if you have any feelings about that or how you size up these moves.
Chris Hay: I think it’s a really interesting move. I think having a standardized stack where you can bring your models, you can fine-tune them—and I think fine-tuning is gonna become a bigger thing in the future, because you’re gonna want your own personalized model, you’re gonna want something with domain knowledge—and therefore bringing that into a consistent place, I think, is a good thing. And then if you think about where Meta wants to go in the future, they want AI to power all your avatars, assistants, et cetera, on their platforms and have agents on there. Then I think making it easier to have a playground for developers and individuals to tune models based on Llama’s stack is a sensible thing. I do think, though, that when I really look at this, all the APIs are OpenAI-compatible APIs, and nearly every single service provider is moving towards OpenAI-compatible APIs anyway. So there is still a part of me that goes, “Well, can I do that somewhere else?” And sure, with the fine-tuning part specifically, that is hard, right? Because getting your models out of some of those existing stacks and taking them elsewhere is more difficult. So I think that is a differential play in my mind.
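Chris’s point about OpenAI-compatible APIs is easy to make concrete: since most providers converge on the same chat-completions request shape, portability often reduces to swapping a base URL. A minimal sketch; the endpoint URLs and model name below are entirely hypothetical placeholders, not real values from any provider.

```python
# Sketch: "OpenAI-compatible" means every provider accepts the same
# chat-completions payload, so only the base URL changes per provider.
# These endpoint URLs are illustrative placeholders, not real services.
ENDPOINTS = {
    "llama-api": "https://api.llama.example/v1",   # hypothetical
    "local-vllm": "http://localhost:8000/v1",      # hypothetical
}

def build_chat_request(provider: str, model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request for any provider."""
    return {
        "url": f"{ENDPOINTS[provider]}/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

The payload stays identical across providers; only the URL (and credentials) change, which is exactly why the fine-tuned weights, not the API surface, become the sticky part.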
Tim Hwang: Totally. Yeah. It’s getting more complicated, seeing them navigate this. Kush, another part of the announcement that I wanted you to comment on was that they also announced all of these security and protection models—so Llama Guard 4, Llama Firewall, Llama Prompt Guard 2. It kind of feels like the protection space around AI is starting to get a lot more complicated than it used to be. Where the old thing was like, “Oh, well, we just have a model that tells you if the outputs are toxic,” now it feels like they’ve got, at every layer of the stack, a model you can use for security and safety. Curious about how you read these trends. Where is this going? Is it just gonna become a more and more complicated ecosystem of safety models? Yeah, just curious about your hot take on that.
Kush Varshney: Yeah. As you said, they have this new Llama Guard 4—it’s a 12-billion-parameter model. It’s multimodal, so it handles vision and text. The Prompt Guard they made really tiny, I think 22 million parameters. So yeah, they’re making progress. Certainly the headlines are good; we haven’t had a chance to evaluate the performance yet. Actually, just a week and a half ago, maybe two weeks ago, a new benchmark came out called GuardBench, which tests a lot of different guardrail models. Just a side note: the Granite Guardian model that I’ve talked about in the past is at the top of that leaderboard, but we should see how Llama Guard 4 does there, because if they’ve really made good progress, that’s awesome. And the fact that the Prompt Guard is so tiny is gonna make a huge difference—at 22 million parameters, it runs in the blink of an eye. The overall space is maturing; people are realizing the seriousness of safety and security. Having multiple layers of security is just good practice: on the inputs, on the outputs, the overall firewall—all of that is good stuff. We’ll see how it progresses. No concern for me. I think this is where the field needs to go.
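Kush’s “multiple layers of security” point can be sketched as a simple pipeline: a fast check on the prompt, the model call, then a check on the output. The classifier functions here are hypothetical stand-ins; a real deployment would call actual guard models such as Llama Prompt Guard 2 or Llama Guard 4 in their place.

```python
# Toy sketch of layered guardrails: input check, model call, output check.
# Both checks are hypothetical substring matchers standing in for real
# guard models (e.g. a prompt-injection classifier and a safety classifier).

def prompt_guard(user_input: str) -> bool:
    """Fast input-side check (stand-in for a jailbreak/injection model)."""
    banned = ["ignore all previous instructions"]
    return not any(b in user_input.lower() for b in banned)

def output_guard(text: str) -> bool:
    """Output-side safety check (stand-in for a content-safety model)."""
    return "unsafe" not in text.lower()

def guarded_generate(user_input: str, model_fn) -> str:
    if not prompt_guard(user_input):     # layer 1: screen the input
        return "[blocked at input]"
    response = model_fn(user_input)      # the underlying LLM call
    if not output_guard(response):       # layer 2: screen the output
        return "[blocked at output]"
    return response
```

The appeal of a tiny prompt guard is visible in the structure: the cheap input check runs on every request before any expensive model call happens.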
Tim Hwang: Shobhit, before we move on to our next topic, any other announcements that you’d highlight? I know there’s a bunch announced. Those are the kind of two that stood out to me, but I know there’s a whole blog post. There’s a lot going on.
Shobhit Varshney: Yeah. The other couple of things: one was around their Meta AI app. They’ve consolidated all of their intelligence into one app, which could be a competitor to ChatGPT or Gemini and so on, but they want one app where people can go do some cool things and talk to it. And they have the potential to make this hyper-personalized, because they have billions of interactions happening across WhatsApp and Instagram and Facebook. You could potentially have an avatar that is really personalized to your particular needs and wants and the things you care about. It is a delicate balance between privacy and hyper-personalization, and they’ll have to strike that balance carefully. But they have a huge bet on creating the one app where you go for all of your AI. There were a few other things that may have gotten brushed over in the details: they’ve had about 1.2 billion downloads of Llama models, and the majority of those are derivatives of Llama on Hugging Face and other places. So clearly the momentum around open source with the developer community is amazing, and Llama has had a huge impact on where we are today with open models. But there were a few things on my wishlist that they didn’t get to. There are two other models that they had announced that aren’t coming quite yet. One is their small Llama model, which will be about an 8-billion-parameter model. 8 billion was the most popular size from the previous Llama generation. We have not seen that yet, but it would be a game-changer for enterprises, especially if they have good methods of distilling it down. And on the other end of the spectrum is the Behemoth model. They still need to figure out what to do with it. It’s not practical at this size for enterprises to run, but we need to figure out the right way of distilling it down, or whether it can be used to train other models.
There are other things around multi-agent orchestration that I was expecting them to release as well—things like MCP support and agent-to-agent protocols, or anything around agent ops as part of the whole Llama stack. I’m waiting for them to announce more in that space. But overall, really positive. It’s good to see that we’re celebrating open source getting closer and closer to the frontier models. So a great LlamaCon for all of us. We had a good partnership with them; IBM and Box have done some amazing work with Llama that was announced on stage as well. So overall, very positive. The community has enjoyed it.
Tim Hwang: That’s great. Yeah, and a lot more to come, I’m sure—what you talked about is probably coming out very soon. Speaking of open, I’ll move us on to our next topic. We wanted to do a short segment because there have been a lot of interesting things bubbling up, particularly in open source, in the Chinese market, and I did want to spend a little time talking about that. One that has actually come out: Alibaba has launched Qwen3, a whole class of models that is the latest generation of their Qwen family. And Chris, I wanted to start with a little technical explainer. In the blog post, they talk about how these models are what they call “hybrid models,” which combine, quote, “thinking and non-thinking modes.” And in true AI form, we’ve picked all sorts of terminology that’s very confusing—like, what is a hallucination, anyway? So I wanted to start with: what is a thinking and non-thinking mode when it comes to AI, and why is it important for what they’re doing here?
Chris Hay: Yeah, so when we hear “thinking,” I think of the kind of reasoning models, like o1, o3, and o4. If you think of what a model is, it’s a next-token prediction model: it goes token, token, token, token whenever it’s answering a question, and that works great. But you can imagine some problems are a lot harder to solve. And if you equate thinking time to the number of tokens you generate, then the more tokens you generate, the more likely you are to get a good answer. So “thinking mode” in that sense is like a human being: rather than blurting out the first thing that comes into your mind, spend a little time deliberating on the answer before you open your mouth and announce your feelings to the world. Keep those thoughts inside. That’s the idea of thinking there. Now, there is some class of questions where, no matter how long you think about it, thinking is not gonna help. Things like, “What is the capital of England?” If you don’t know the answer, sitting and thinking about it really isn’t gonna help you. But for something like a math problem or a logic or reasoning problem—if there are six cats and one falls out the window, how many cats do you have left and how many lives has it got?—the model needs to think about that a little bit before it comes to the answer, and that’s where it generates those tokens. So the idea of a hybrid mode: in reality, for some cases you want thinking switched off—quick questions, general Q&A-type knowledge answers. But if you’re doing logic and reasoning, you want the ability to switch it on and have the model take a little time to think before coming back with the answer. I still think this is a problem today that’s gonna go away in the future. Just like human beings, we’ve learned when to blurt out an answer and when not to.
You don’t say to a human being—speak for yourself, Chris. Well, actually, maybe not. Maybe I haven’t learned. But I think in time, that’s gonna relax, and we’re not gonna have to switch that on or off. But I do like this idea of the future of a thinking budget: you’ve got five minutes to think about it, three minutes to think about it. So I think this practice is gonna evolve, but I think the hybrid mode is very much a positive.
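The “thinking budget” Chris describes can be sketched as a cap on how many reasoning tokens the model may emit before it has to answer. This is a toy illustration, not any lab’s actual API: `generate_token` is a hypothetical stand-in for a real decoder, and the `</think>` sentinel mirrors the delimiter reasoning models commonly use to close their hidden scratchpad.

```python
# Toy "thinking budget": the model may spend at most `budget` reasoning
# tokens, then must produce a final answer. The hidden thoughts are never
# returned to the user; only the final token is.

def answer_with_budget(generate_token, question: str, budget: int) -> str:
    thoughts = []
    while budget > 0:
        tok = generate_token(question, thoughts)
        if tok == "</think>":        # model decided it is done thinking
            break
        thoughts.append(tok)
        budget -= 1
    # Final answer is conditioned on the (internal) chain of thought.
    return generate_token(question, thoughts + ["</think>"])
```

Qwen3 reportedly exposes a related per-request switch for turning thinking on and off; an explicit token budget like this is the “five minutes to think about it” version of the same knob.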
Tim Hwang: And Chris, one thing that’s been raised before but might be fun to tackle more directly with these releases: some people have commented—I think Kate might have mentioned it on a previous episode—that we’re really starting to see the return of mixture of experts. It feels like that’s now very much back on the table; it’s what everybody’s doing. So what was kind of uncool is cool again. I wanna talk a little about why that’s the case, now that we’re seeing it in Qwen3 and it’s rumored for the DeepSeek-R2 launch, which is potentially coming out maybe even by the time this episode releases. Rumors have it happening potentially this week?
Kush Varshney: Yeah, I mean, whoever came up with the name of this podcast was quite prescient: “Mixture of Experts.” The term has been around for a long time. It meant something different when I was in grad school, with these gating mechanisms and stuff. But the point of it is really... just like Chris was saying, like humans don’t blurt stuff out, humans also don’t use their entire brain when they’re thinking. I don’t know what the stat is—like we only use 10% of our brain at a time. Same idea: you don’t need to use everything; you don’t need to activate everything. Because there’s a portion that’s really the important part when you’re thinking about something, computing something, inferring something. And so I think it’s just taking advantage of that. You can use less power, less computation, less of everything if you’re only activating the relevant parts. And if you can know which parts to activate, then that’s gonna end up being a good thing. And then you can have different specializations, different things that are better at particular aspects. So about cats falling out of trees—a mixture of experts, or an expert—and then, whatever, all sorts of different experts on there. So I think that’s where things are headed.
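Kush’s “only activate the relevant parts” idea is exactly what an MoE layer’s router does: score every expert, run only the top-k, and combine their outputs by the renormalized gate weights. A toy sketch with trivial stand-in expert functions; real MoE layers do this per token inside a transformer block, with learned gate weights rather than hand-picked logits.

```python
# Toy mixture-of-experts routing: softmax the gate logits, run only the
# top-k experts (sparse activation), and mix their outputs by renormalized
# gate probability. The experts here are trivial stand-in functions.
import math

def top_k_route(x: float, experts, gate_logits, k: int = 2) -> float:
    # Softmax over the gate logits.
    exps = [math.exp(g) for g in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts; the rest never execute.
    chosen = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    norm = sum(probs[i] for i in chosen)
    # Weighted sum of just the chosen experts' outputs.
    return sum(probs[i] / norm * experts[i](x) for i in chosen)
```

This is where the efficiency comes from: with, say, 128 experts and k=8, compute per token is a small fraction of the total parameter count, which is how large MoE models keep active parameters “very, very low.”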
Tim Hwang: Sure. But the final bit that I was hoping to get your take on: what I love about the DeepSeek story is how much it’s messing with all of our intuitions about how competition in AI is supposed to go down. The first one, of course, was, “Okay, in the US, it was Meta doing open source versus these closed-source guys,” and the introduction of DeepSeek is like, “Well, now even Meta has competition.” And the other really interesting element that’s been rumored around R2 is that they’re doing the training not on Nvidia, which I think is really intriguing and also completely scrambles the idea that everybody’s just gonna build on Jensen’s chips and that’s just gonna be the way AI works. What I read was the rumor that R2 was trained on a server cluster of Huawei’s Ascend 910B chips, which would mark a really big transition in how some of this happens at the cutting edge. Do you wanna talk a little about that? I thought it was very interesting.
Shobhit Varshney: Yeah, so China is brilliant. The people in these research labs are absolutely stunning; there’s a very high concentration of talent. So they’re trying to figure out ways around their dependence on the US supply chain for intelligence, and they have a lot of intelligence to go build their own chips. The competition is around not just the chip but the whole set of things that come before and after—the ecosystem around the chips as well. And China, I think, has a good shot at this. They should be able to build on the Huawei series of chips, and this is a really good proving ground for them. If you look at what Google did with their tensor processing units, the TPUs, they have an unfair advantage in that they have both the AI and the chips. So when they release models like Gemini, which are running at crazy volume every day, they can make sure they’re hyper-optimized. The Gemini Flash model, for example, running on TPUs, hits a really good price point at the scale of billions of requests a day. So you’ll start to see a lot of these companies leverage the underlying architecture and optimize for it. And China understands that they will not always have access to technology from the rest of the world, so they will start to create their own supply chain, top to bottom. There’s a lot of investment coming from individual countries in their own sovereign AI, trying to make sure they can be masters of their own destiny in the AI space. I’m very excited about this whole David-versus-Goliath kind of war. If you look at the size of the model that Qwen3 came out with, you have a good mixture-of-experts model where the number of active parameters is very, very low, and they’re outcompeting some of the best-in-class models from OpenAI and Google and Llama.
So you have this compute cost—intelligence per dollar—just completely plummeting, and that unlocks an insane number of other use cases we could deploy this for. If you look at the way we within IBM are looking at, say, our Z series, we want to bring AI closer and closer to where the transactions are happening—billions of them with almost negligible latency. So the whole story of the model being smaller and still outcompeting is a great thing. Taking this open source—these are Apache 2.0 licenses—creating derivatives, owning that intellectual property, carrying it with you, deploying it where you need it: at the edge, on servers, in the clouds. I think that’s the direction they’re taking, and we should be very proud of how far the AI community has come on open-source models and the progress we’re making in this space. But going back to what Kush was mentioning around all the guardrails that are needed: when we see models like Qwen3 come out, we do not see a lot of transparency on the data that went in, or any guardrails that have been put in place, or any equivalent of the Llama Guards or the Granite Guardians being released from the Chinese labs quite yet. The Qwen3 models are text-only at this point; they’re not quite multimodal, which reduces the space of use cases we can deploy them for. So there are a few things they need to catch up on, but there’s just so much happening in this space. This competition is really positive for all of us.
Chris Hay: I think there’s a flip to that as well, which is, although they’re not being particularly open on data, I think labs like DeepSeek, et cetera—maybe not so much Qwen, Alibaba—but they’re being very open with their code bases. They open-source their distributed file system, et cetera. So I think one of the things I really appreciate in this space with the competition is that the innovation is moving out into the open-source community. And because these labs are being constrained, it is forcing them to think in a different way. So I actually really hope that some crazy kid in a garage somewhere is just gonna turn up one day and go, “I’ve trained a 50-billion-trillion-parameter model, and I’ve done it with a cheeseburger, and I just took this chip and my microwave, and it was fine.” Do you know what I mean? And we’ll be like, “Whoa.” And then what do we do with all of those chips at that point? But I think there’s a really serious problem there: when the next innovation comes—and it will come at some point—where we realize we don’t need all of these GPUs, what are we going to do with all of these GPUs and these massive data centers? And if NVIDIA wants to donate some to me, I will happily take them and I will find something to do with them.
Tim Hwang: You hear that, Jensen? If you’re listening...
Shobhit Varshney: We always talk about China and the US, right? I’m not sure if you guys know this, but there was a new model that came out in the last few days for voice—text-to-speech. There’s a small lab of two Korean undergrads who put together a model called D-I-A, Dia. Dia does text-to-speech with very high quality. It handles accents, nonverbal sounds—it does a really, really good job, and it’s outcompeting the bigger labs, like ElevenLabs or even OpenAI’s voices. Super small, Apache 2.0, open-sourced. So you’re seeing innovation from all over the world; this is not just a US-versus-China thing. At times you hear about Mistral from France, right? This is a global moment right now. Everybody is investing heavily. In India, you have a lot of labs that are now getting serious investment to go build these models and own their own destiny. So the fact that the whole global community is all in on open source and powering through it—that’s the way we should be.
Kush Varshney: And, and...
Chris Hay: Shobhit, just to add to that, Sir Demis Hassabis—I mean, please note the word “Sir.” I don’t hear no American accent on him, buddy.
Tim Hwang: That’s actually a great pivot into our next segment. One thing I did really want to cover was this pretty interesting letter that came out of J.P. Morgan. Patrick Opet, their Chief Information Security Officer, penned an open letter to the industry that was a call to action on SaaS security, which is a big and well-known problem. What I thought was pretty interesting is that he focused specifically on AI and its contribution to the issue. I’ll just quote what he said: “Critically, the explosive growth of new value-bearing services and data management, automation, artificial intelligence, and AI agents amplifies and rapidly distributes these risks, bringing them directly to the forefront of every organization.” So there’s a SaaS security issue, and AI is basically gonna pour gasoline on it. Chris, I’ll throw it to you: this seems pretty dire. Are we in trouble?
Chris Hay: I don’t know if there’s anything new being said in this letter, really. I mean, yes, agents, of course. But we’ve had agents, really, for decades, and in some ways the autonomy, the action-taking, is not particularly new. What’s new with AI agents these days is the interaction through natural language, more data, this sort of stuff. And there have been systems—Shobhit mentioned the Z processors—where transactions happen, things happen very quickly, and there’s been software as a service for more than a decade. The point of the letter is: yes, focus on security, of course—who’s gonna say no to that? Maybe there’s a sort of cultural issue being pointed out, that we need to keep these important, consequential, regulated industries at the forefront. And maybe, like you said, Tim, we don’t cover those industries so much on this podcast; maybe we should. But for people who do work in those industries, this is not anything new, really.
Shobhit Varshney: This is a good reminder for everybody that you have to think about governance as we move from experiments to production at scale. I’ll give you my take from an enterprise perspective. When I’m working with my clients and we’re putting things into production, across all of these different SaaS vendors, everybody is in a rush to force agents into their SaaS platforms. Even industry standards like MCP took a while—they went through different versions to get to supporting authentication properly. There’s so much in software engineering that has been done to secure things the right way, and we’re almost throwing that away and starting from scratch when we get to these agents. We have collectively decided as humanity that English is the way to talk to these agents, and it just does not scale quite yet. If I’m trying to call another agent and give it a task, I need more structure, because I need to do error catching; I need to be able to pass authentication—“for this particular task, I’m giving you read access to this particular dataset.” We need to evolve beyond the cute demos we’ve all done on stage last year and this year. As you get more serious about rolling these out at scale, the governance aspects matter: you have 10 different SaaS vendors for the marketing team you’re working with, and the CMO is not spending much time understanding what’s happening with all of that data. If I have to put in a small policy—and a policy could be as simple as, “I don’t want generative AI to create any content that refers to these 10 competitors”—such a basic thing to do, it’s insane how much effort it takes for enterprises to make that happen across every 10, 20, 30 different SaaS vendors they’re working with. So we are struggling as we deliver these with enterprises.
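Shobhit’s argument for structure over free-form English between agents can be sketched as a task object that carries an explicit instruction, target dataset, and required scope, so the receiving agent can check the grant and return a structured, catchable error instead of prose. All names and fields here are illustrative and belong to no real agent protocol such as MCP or A2A.

```python
# Toy structured agent task: explicit target and scope travel with the
# request, so access checks and error handling are mechanical, not
# natural-language interpretation. Field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class AgentTask:
    instruction: str                      # what to do, still natural language
    dataset: str                          # explicit target, never inferred
    needs: list = field(default_factory=lambda: ["read"])  # required scope

def execute(task: AgentTask, grants: dict) -> str:
    """Run a task only if the caller granted the scope it needs."""
    held = grants.get(task.dataset, [])
    missing = [p for p in task.needs if p not in held]
    if missing:
        # Structured, machine-checkable error rather than free-form prose.
        return f"error: missing {missing} on {task.dataset}"
    return f"running '{task.instruction}' on {task.dataset}"
```

The policy example Shobhit gives (“no content referring to these 10 competitors”) would live in the same place: a field on the task or grant that every downstream agent can enforce mechanically, instead of being restated in English to each of 30 SaaS vendors.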
That’s why all the work that Kush and team are doing around governance, security, guardrails, things of that nature, is so important. And this has to work regardless of which AI model you’re bringing into the organization. So I think J.P. Morgan Chase is doing a really good job of giving leadership a reality check, and what J.P. Morgan does is often imitated and copied; people get inspired by the work they’ve done in this space. We need more people to talk about governance, and about why we need smaller models that you can monitor, with the right guardrails. Even speech models: as you start to move from text-based LLMs to native speech-to-speech, I’m trying to roll something out at one of our regulated-industry clients, and it’s very tricky. Tracing for speech models, auditability, the agent ops needed for speech models—it’s insanely difficult. There’s a lot that this community is gonna have to do. If we say 2025 is the year of agents, I would argue that 2025 is also the year of governance. We’ve got to get this right if we have a shot at going into production at scale.
Chris Hay: Or we could build more agents to solve the problem. I get your point—we should go governance, and I believe that’s a serious thing, and we should build walls and put everything behind walls, and then nobody can access anything. Or we could go, “Hmm, let’s build more agents.” And then the good agents can fight the bad agents, and then we’re gonna be fine because we’re gonna have a little good agent versus bad agent war. So I think any problem that is super hard today, we don’t need to solve that with things like governance; we can solve that with more AI. That is my solution to this.
Shobhit Varshney: So Cisco—there was the big security conference this week, and Cisco released a foundation model for security. IBM has done a lot in this space around security-related models as well. So if you’re looking at cybersecurity risks, at hallucinations, things of that nature, I think there will be enough AI improvement happening: you need good AI to fight bad AI. A hundred percent with you on that. But we do need to talk about the discipline enterprises need, to ensure those good AI agents are deployed as the default—baked in, security by design, not bolted on as an afterthought. That’s the point I think J.P. Morgan Chase is arguing: get excited about this huge benefit, but make sure there’s security by design from the very beginning.
Chris Hay: But you can’t cripple innovation at the same time. No, I get it. There’s certain areas where you have to say, “You know what, this is a very serious thing, and I need it to not hallucinate, blah, blah, blah.” But then at the same time, you need to make breakthroughs and you need to discover new things, and we have a hype cycle to maintain. So at the same time, we can’t hold back on that. So I get it. And I think for certain regulated industries, I understand that and that makes sense. But at the same time, sometimes hallucinations are a good thing because it gives you a bit of creativity. So we just need to be appropriate for the right scenario.
Kush Varshney: Chris’s point is actually a good one. When you have a mix of things that are controlling others, it doesn’t always have to be just in a closed system, like with a single governor. Our immune system—it controls diseases and stuff. There’s bad things happening; there’s good guys fighting against it. It happens in nature all the time in different ecosystems. So if you take the big system-level view of things, control is not always just one little knob. So I think it’s actually a mix of things. Yeah, I just wanted to end with a shout-out to the Robust Intelligence folks—that’s the team that put together the Cisco model that Shobhit mentioned. Really good, really good work from them.
Tim Hwang: Great. Well, that’s resolved the Shobhit-Chris debate conclusively.
Chris Hay: I thought my cousin would be on my side.
Shobhit Varshney: I’m on both sides.
Kush Varshney: You’re on both sides. More security, the better.
Shobhit Varshney: Yeah, exactly.
Tim Hwang: Chris likes both of you equally. It’s fine. So to close our episode, as I mentioned at the top of the show, this is the first anniversary episode of MoE. Very fast year. We were able to bring together the original cast from episode one. And a little bit like the kind of kickoff question we did, I thought it’d be fun to end with a final segment talking about what we did on that first episode, ‘cause it’s very fun to take a look back and be like, “Oh yeah, whatever happened to that?” or “Oh, that turned out to be a really big thing.” So it’s kind of a fun exercise. Producer Hans here will be playing some clips. You’ll actually be able to hear yourself from a year ago, which may either be fun or cringeworthy. We’re about to find out. But the first topic that we covered on episode one was the Rabbit R1 device, which, if you recall, was a small, cute little hardware device with AI embedded, and it was a conversation about AI hardware and where it was gonna go. Hans, do you wanna roll that tape so we can have the respective takes of everybody on the show about what they said about that?
Chris Hay (clip): But it’s like trying to sell a pager to somebody today. It’s like, here’s this thing that’s got the things you need. You can get messages, but nobody has a pager, right? Because it was replaced by the phone. And so I do think there will be AI on hardware devices. I just don’t get that one.
Shobhit Varshney (clip): Just being an optimist of where the tech is going, I’m more on the side of... I see the promise of what this... and Apple takes a while to come into this industry, right? Same thing goes with the Vision Pro glasses. I was a big fan of them when I bought them early on, and three days in I did return them.
Kush Varshney (clip): To me, what this is leading to is actually a fourth paradigm of how we interact with computing. There was punch cards, there was command line, then there was GUIs, and this is now the language, natural language interactions and so forth. I think maybe there’s no killer app yet, but the killer app maybe is the fact that we have this new way of interacting, and that’s what these devices are gonna start us on the road down.
Tim Hwang: Nice. That’s awesome. Well, Shobhit, I’ll start with you ‘cause I think you actually bought a Rabbit R1. Where is it? Where is it now?
Shobhit Varshney: It’s in the garage in a box.
Chris Hay: I’d have gone and sold it on eBay, man. Oh, really? You can’t even... there’s no secondary market for the R1. Oh, man.
Shobhit Varshney: I’m hoping this will be one of those things that goes for a million bucks later. But yes, it’s in the garage in a box. I couldn’t even find it to bring for this episode. But overall, I still stand by what I said. I think the market needs to evolve, and we’re not there yet. We’ve not seen a single device that goes beyond... even the Ray-Bans. I obviously have the Ray-Ban glasses as well, and they’re okay, but not at the point where you can really use them as a real device. I think the last time we saw an accessory that could augment your iPhone and such was the watch. Watches found a niche, built off of that, and now they augment and extend your phone. They don’t work without the phone; they just work really, really well as a partner. So I think we’ll get to that point with devices, but I’ve not seen another thing to throw my money at yet.
Tim Hwang: All right. Real quick, Chris, do you wanna take a victory lap on this one? Because I think you won this point.
Chris Hay: Yeah, no, and I’m feeling good about that one. That thing is a pager. I said it at the time, and I’ll say it again. So I’m feeling good.
Tim Hwang: All right. Sounds good. The second thing that we covered on episode one was the rise of a mysterious chatbot on Chatbot Arena called GPT-2 Chatbot, which, if you recall, prompted wild speculation about what it was. Hans, do you wanna play the clip of your takes at the time?
Shobhit Varshney (clip): Is it GPT-5? I don’t know. I think they’ve hyped GPT-5 so much that if that is at this point, it has to be AGI, or it’s not even gonna impress us.
Chris Hay (clip): Exactly. So maybe it’s GPT-4.5. But I don’t think that... I read a theory online—I can’t say who said it—but I actually like it. Somebody said that take the GPT-2 LLM, which they’ve open-sourced—you can download that on Hugging Face—and they reckon that they may have trained GPT-2 on the latest data that trains GPT-4. And I think that’s an interesting theory, right? GPT-2 with GPT-4 data. So maybe it’s something like that. I don’t know. But I don’t think it’s GPT-5. It probably is GPT-4.5, and as you say, you’ve gotta put it in some sort of arena to see how well it’s actually performing.
Tim Hwang: Chris, that was a pretty good guess. I mean, ‘cause we’re now living in a GPT-4.5 world, right?
Chris Hay: Yeah. I think... I can’t even remember what that model was. When was this? Was that GPT-4o, or was it slightly after?
Shobhit Varshney: Yeah, so that was the next model that OpenAI released. And I think somebody had spilled the beans, saying, “Hey, yes, that was... it scored really high during the GPT-4o release.” So our guess is that that was the testing they were doing on Chatbot Arena.
Chris Hay: Yeah. I can’t even remember. That’s how far back it was. But I think my guess was pretty good, right? It wasn’t quite 4.5, but it was basically the next version of the model. Yeah, I feel good. I still think the GPT-2 theory was a great one, though, so somebody’s gotta actually do that. That’s a good idea.
Tim Hwang: All right. And so for the final one that I want to play, we talked about agents, which has since become basically an ongoing MoE in-joke, I suppose. And with Shobhit and Chris on the show, both of you are probably our most prolific users of the word “agents.” If we just did a word count of all the things you’ve said on MoE, probably you two would be at the top of that leaderboard. So Kush, I’m gonna ask you to maybe make a guess: which one of these guys used “agent” first on MoE? Like, who is number one on breaking that seal?
Kush Varshney: Yeah, this time I’ll go with the family member.
Chris Hay: Woo! There was a plan.
Tim Hwang: All right, well, roll the tape, Hans.
Shobhit Varshney (clip): The talk by Andrew on how agentic flows are going to be the way we get to AGI.
Tim Hwang: Yes! Nailed it. Congratulations.
Shobhit Varshney: So Shobhit’s to blame for starting that.
Chris Hay: I heard “agentic.” I didn’t hear the word “agents” at all. I call... this is a fix.
Tim Hwang: Yeah, disqualified. So we’ll do some investigation into who actually used the word “agent” first, but I guess, Shobhit, you can rest easy knowing that you were really a trailblazer there for us.
Shobhit Varshney: But you should take a second to acknowledge how far we have come in the last year. At that point, Andrew had just introduced how GPT-3.5 with tools can actually outcompete GPT-4 and stuff. So just imagine how far we have come in terms of the cost, the kind of tools, the ecosystem around agents. I’m just very proud of where the community is today with what’s happening in the multi-agent space.
Tim Hwang: Yeah, for sure. And more to come soon. I mean, I think we’re gonna have to do this next year as well, where we look back on what we were talking about in 2025.
Kush Varshney: Ooh, we should have done that... what will be the next word that’ll catch on, that’ll go viral? You may have used it on this episode: “open letter.”
Shobhit Varshney: Congratulations to the production crew for the one-year anniversary. I wanna give a big shout-out to the producer Hans, Alex, Michael, and Selma—you guys have poured your soul into this. Thank you so much for bringing Mixture of Experts to our audiences.
Tim Hwang: Thank you so much!
Kush Varshney: Happy birthday.
Chris Hay: Happy birthday.
Tim Hwang: Well, that’s all the time that we have for today. Kush, Shobhit, Chris—an amazing panel. Glad to have you on again! And thanks to all you listeners. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we will see you all next week on Mixture of Experts.