Is Manus a second DeepSeek moment? In episode 46 of Mixture of Experts, host Tim Hwang chats about all things Manus with Chris Hay, Kaoutar El Maghraoui and Vyoma Gajjar. Then, the rise of vibe coding—what started as a joke has now become a reality. Next, we dive deep into the future of scaling laws. Finally, Perplexity is teaming up with Deutsche Telekom to release an AI phone—what’s the motivation here?
Tune in to today’s Mixture of Experts to find out more!
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Tim Hwang: Is Manus AI a second DeepSeek moment? Vyoma Gajjar is an AI Technical Solutions Architect. Welcome back to the show. What do you think?
Vyoma Gajjar: Almost.
Tim Hwang: Great. Kaoutar El Maghraoui is a Principal Research Scientist and Manager at the AI Hardware Center. Kaoutar, welcome back as always. Manus AI, what do you think?
Kaoutar El Maghraoui: I don’t think so.
Tim Hwang: And last but not least is Chris Hay, Distinguished Engineer and CTO of Customer Transformation. Chris, DeepSeek moment, yes or no?
Chris Hay: Yes, but no, but yes, but no, maybe, yes, no, maybe.
Tim Hwang: Well, we’ll be investigating all that and more on today’s Mixture of Experts. I’m Tim Hwang, and welcome to Mixture of Experts. Each week, MoE gathers just the nicest and most brilliant people to talk through the biggest news in the world of artificial intelligence.
As always, there’s going to be a ton to cover. We’re going to talk about vibe coding, scaling laws, and a new phone from Perplexity. But first, I really want to talk about Manus AI, which was the focus of our initial kickoff question.
If you’ve not been watching the news, Manus AI is a Chinese company that announced a multi-purpose agent that has really been taking the AI chattering classes by storm. They have a bunch of demos showing their agent able to pull off some traditionally quite difficult tasks. They show it scheduling trips, doing stock analysis, reviewing resumes, and evaluating insurance. It’s a really wide range of outcomes.
And it seems to be another moment where, following hot on the heels of DeepSeek, people have been asking: Is China really catching up, if not surpassing, a lot of the companies we talk about almost every single episode on this show—your OpenAIs and Anthropics of the world?
So, I guess, Vyoma, maybe I’ll start with you because I think you were the most bullish. I think you said it’s close to being a DeepSeek moment. Do you want to kind of lay out the bull case here for why it actually is a really big deal?
Vyoma Gajjar: Sure. As you know, half of Silicon Valley is building agentic AI startups right now. Manus AI is an agentic paradigm that we are seeing. It is more of an industrialization of the intelligence that has been created from all these large language models. If done right—if they can work well on the compute side, the hardware side—they can come up with something because they’re first in this new paradigm of bringing it to the market. I know there are so many other agentic frameworks available, so I feel that if everything goes right in 10 other aspects we have to evaluate, like metrics, hardware, software, compute, etc., maybe. But then, as I said, there are 15 others, or 50 others, who can always catch up. So, you never know.
Tim Hwang: Yeah, definitely. Kaoutar, you were a little bit more skeptical, I think, in the opening. Curious about what you think here. A friend of mine was saying it’s easy to have a really cool-looking demo, but a real product is a whole other thing. We don’t really know whether or not Manus can deliver. Is that the source of your skepticism, or is it coming from somewhere else?
Kaoutar El Maghraoui: Yeah, I think I’m still a bit skeptical about this. From my perspective, Manus is definitely shaking things up a bit. Of course, there is also a lot of skepticism in the AI community. Some argue it’s transformative, pushing the boundaries of what AI agents can do. Others just say it’s a rebranding of what maybe Claude is doing—more smoke than fire.
The big question here is, can Manus really redefine AI autonomy, or is this just another step in the ongoing AI race between East and West? Is it a leap, or is it just more advancements? I think there is a lot more evaluation that needs to be done to see whether we’re seeing new innovations, a leap, or just a maturing of this technology.
The community is really interested in the implications for AI agent development. If Manus proves to be a significant advancement, it could accelerate the creation of more sophisticated and capable agents. But of course, there is a lot of pressure here. There is a growing awareness of the increasing competition from Chinese AI companies. You heard from Vyoma that half of the startups are agentic AI companies, so there is a lot of competition. I think a lot of people right now are analyzing the output of what Manus is doing to see if they can see the hallmarks of Claude’s outputs. If that is the case, it’s really diminishing the hype surrounding this product.
Tim Hwang: Yeah, for sure. I think it’s a good chance to bring Chris in. On this Claude point, the background is that pretty soon after Manus came out, people said there are a bunch of responses that are very Claude-flavored. In some cases, they were actually able to pull out some verification or strong evidence that it was from Claude.
Let’s say for a moment that it is just a Claude wrapper. Does that totally diminish this as an outcome? I don’t know how we should think about that. I know a lot of people said, “Oh, if it’s just a wrapper, then Manus really hasn’t added all that much.”
Chris Hay: So I guess from my perspective... Let’s think about Cursor, for example. Let’s think about Cline. Let’s think about Perplexity. We could probably argue all of them are Claude wrappers as well, right? They’re all tools where ultimately Claude is driving the experience. But actually, I don’t think this is a story about which AI model is powering it, although that is important. This is really a story of somebody bringing together a really great experience.
I think they have brought together a great experience because when you use the Manus UI, it does the planning, it’s got a little to-do list, and it ticks them off as it goes along. It has access to tools; it’ll access the terminal, access the browser—very similar to what’s going on with OpenAI with Operator, etc., and Deep Research, for example. They brought that together in a nice experience. They’re running it on a sandbox. They’re doing tool calling, and it kind of feels good, right?
Now, it’s a little bit more than a Claude wrapper, to be fair to them. They have taken the open-source tools and integrated them really well together, right? So technically, we could go and do this ourselves. And I think that’s why this is probably gonna end up... This is why I went, “Yes, maybe, maybe, maybe yes, yes, maybe.” Because I think what’s gonna happen is the open-source community is gonna go, “We can do that.” And I know that because I’ve been coding away all week doing the same thing, right? Trying to do the same thing.
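The loop Chris describes rebuilding (a planner emits a to-do list, and an executor ticks items off with sandboxed tool calls) can be caricatured in a few lines of Python. Everything below, from the function names to the toy planner and fake tools, is hypothetical scaffolding for illustration, not Manus's or Chris's actual code:

```python
# Minimal sketch of a plan-then-execute agent loop: a planner turns a
# task into a to-do list, and each item is dispatched to a named tool.
# In a real system the planner is an LLM and the tools run in a sandbox.

from typing import Callable

def run_agent(task: str,
              plan: Callable[[str], list[str]],
              tools: dict[str, Callable[[str], str]]) -> list[str]:
    """Plan a task into steps, execute each with a named tool, log results."""
    log = []
    for step in plan(task):                  # e.g. "browser: search X"
        tool_name, _, arg = step.partition(": ")
        result = tools[tool_name](arg)       # tool call (sandboxed in practice)
        log.append(f"[done] {step} -> {result}")
    return log

# Toy planner and tools standing in for an LLM and a real sandbox
plan = lambda task: [f"browser: search {task}", f"terminal: summarize {task}"]
tools = {"browser": lambda a: f"page for '{a}'",
         "terminal": lambda a: f"ran '{a}'"}

for line in run_agent("stock analysis", plan, tools):
    print(line)
```

The point of the sketch is only that the architecture is small: the hard parts Manus layered on top (fine-tuned planning models, browser and terminal access, sandboxing) are integration work, not new model science.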
Tim Hwang: Yeah, exactly. Like every other developer on the planet, right?
Chris Hay: And therefore, it’s something that’s achievable. And to be fair to them, it’s a little bit more as well. They said that Claude was doing orchestration, but they also said they fine-tuned a bunch of Qwen models. I think they specifically said that for the planning model—the one that comes up with the to-dos, etc.—that was a particular kind of Qwen fine-tune, and they pointed to a version they’d done earlier. So it’s a little bit more than just, “Here’s Claude with a pretty UI.” It’s a bunch of fine-tuned models, bringing these tools together, sandboxing it, and then bringing the package together. I think they’ve done a fabulous job.
And then finally, they’ve generated the hype, right? I was reading today that like 2 million folks have signed up for invites. So we’re all running around going, “Yeah, we’re going to do this.” Will they be around? Are they the next Devin? We will find out in six months or so, right? But for just now, the hype cycle’s there, and I’m hoping it galvanizes that open-source community.
Tim Hwang: You had a good phrase there, which is they’re tying together a bunch of components, and we could do this as well. I did want to dive a little bit into that. I have a friend who focuses a lot on all the kind of state-level AI bills that are bubbling up. He made the observation that US companies could have done this; US open-source efforts could have launched a very similar thing. Maybe one reason they don’t is because some of the things that Manus is showing off have been kind of risky from a legal standpoint in the US, right? Things like resume review is very hotly regulated; it’s a hotly disputed thing.
I guess I’m going to toss it back to you. Do you think there’s almost an edge here? Is Manus winning this, or at least seeming first to get this hype wave, just because they’ve been willing to be more aggressive than other folks? Or do you not really buy that?
Vyoma Gajjar: I feel the first thing is that Anthropic had already tried something similar with Claude, called “Computer Use.” We spoke about it on this show with you, and it’s being compared quite vigorously with Manus AI right now. But Anthropic’s Computer Use actually performs very well in controlled environments, which is exactly what we’re talking about. Say there’s a resume review: it brings along a whole different set of metrics that have to be evaluated before a large language model can be used. A use-case POC is very different from what you can integrate into an enterprise architecture, right? And how do we integrate it?
So sure, Manus showed the way that, “Okay, yes, this is how we can do it.” But to actually do it, it’s going to be a lot of leaps and bounds that the entire industry has to go through. Regulations, etc., have to be written around it for us to be able to use it. But yes, the US did try it, and the Computer Use part was actually something we were all talking about for a while, right?
Tim Hwang: So, Kaoutar, I guess the final question—curious to get your thoughts on this—is whether or not, take us six months into the future, as Chris is saying, do you have predictions on where this all goes? One thing’s for certain: it seems like we’re going to see a bunch of open-source attempts to do the same thing. I guess we can ask whether or not this Manus thing actually changes the fundamental trajectory of where agents are going. Curious if you want to paint a picture of where you think we will be in six months. If anything, Manus is just building more hype and more momentum in this direction, but I’m curious to get your prediction.
Kaoutar El Maghraoui: Yeah, that’s a good question. I definitely feel, and I agree with Chris, that what they’re doing is not just integration; it’s also more about having this fully autonomous agent capable of independently executing complex tasks, doing various things like resume sorting, stock trend analysis, and website creation, which is really great. So we will see others trying to mimic that. They’re leading in this space around autonomy. So beyond mere integration, it’s a significant advancement in AI autonomy.
But I think whether more hype will follow... I think it’s going to be the case. We’re seeing now every few days or every few weeks, we’re seeing new hypes. So I feel we will see more interesting things coming in this space.
Tim Hwang: Yeah, it’s like if this is a DeepSeek moment, then get ready for at least 20 or 30 more this year, I suppose.
Chris Hay: I just want to say, you know what I don’t want to see in six months? Another browser operator. Large language models are really good at text, right? Why are we insisting we have an AI moving a cursor around, finding a bounded box, taking a screenshot, and then typing into the box? You know what I would rather see? I would rather see somebody go, “You know what? I’m going to create an AI-native browser which parses the text.” And actually, yes, it’s going to communicate with the websites, and they’ll recognize it as a real browser, etc. But you don’t need to do screenshots. You don’t need to move a cursor around. You’re a browser, and you’re a large language model that knows how to code. Do that. That’s what I want to see in the next six months.
Kaoutar El Maghraoui: People have gotten lazy, Chris. People don’t want to do that. They’re like, “If we can have someone do this for us, why not?”
Chris Hay: I’ll sit on the couch all day. I’m fine for them to sit on the couch. Just don’t move a cursor around and take screenshots. That’s what’s bothering me.
Tim Hwang: Yeah, I think that is one of my favorite agentic tropes at the moment. People do it because it looks really cool. That’s the main reason; it’s kind of cool and spooky.
Vyoma Gajjar: Yeah, I think it’s more for demo purposes, and also for people to show that this is how we can do AGI. This is the next step to AGI. And I think everyone’s chasing that now, thinking, “Okay, we’re done with the LLMs, etc. Now let’s move on.”
Chris Hay: But Vyoma, it sounds like you’re defining AGI as a 95-year-old grandparent trying to work the internet for the first time. That’s what I see when I see the AI operate with a browser.
Vyoma Gajjar: Yeah, that’s true too. I mean, that’s what we’ve defined, I guess, at this point. But let’s break that.
Kaoutar El Maghraoui: Yeah, I think it’s going to be interesting to see how these human interfaces will evolve. I agree with you; I also don’t like the cursor on these things. So there probably needs to be more serious thinking about what these interfaces should look like. What would you like to see that mimics a true human experience, without the cursor or the screenshots and things like that?
Tim Hwang: Yeah, for sure. Chris, in your weekend experimentation, have you been able to replicate Manus?
Chris Hay: I am surprisingly far. Actually, I went slightly different from them. I’ve put MCP at the heart of what I’ve been doing, which I think is a lot better. But then I haven’t built a product and got it to 2 million people. This is just Chris and his agents in the night. So I don’t think I can take the intellectual high ground on this, but yeah, I’ve got pretty far so far.
Tim Hwang: Yeah, that’s actually a pretty strong indication, right? With not a whole lot of work, you actually get pretty far with these things. I guess it just goes to show how competitive this space is about to be.
I’m going to move us on to our next topic, which is a real fun one. Andrej Karpathy, who we’ve talked about many times on this show before—in addition to being former Tesla and former OpenAI, I think he’s had this kind of career shaping the memes of the AI space. We mentioned Cursor earlier; arguably, his shout-out of Cursor is one of the reasons it has been so wildly successful.
He had a nice tweet capturing his thoughts on using AI-assisted coding recently, where he said: “There’s a new kind of coding I call vibe coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs, e.g., Cursor Composer with Sonnet, are getting too good.”
This has been a funny thing because, in true Andrej Karpathy form, “vibe coding” has just gotten everywhere. It’s been a joke that people keep mentioning now. Weirdly, I feel like in the last week or so, people have been saying, “Oh yeah, I did this project through vibe coding.” It’s almost now becoming a term of art.
If I may turn it to you first: Is vibe coding a real thing? Is this the future of coding? Are people just kind of vibing with it until an app comes out? Is this a good way of thinking about where engineering is going with all this assistance?
Vyoma Gajjar: Yeah, it’s going to be very controversial when I say this, but no, it’s not. I don’t think this is the future. I feel that getting to know the concepts and the basics behind how a particular code works is extremely important. Sure, you won’t be able to code it end-to-end; you can use vibe coding to assist you with that. But that being the only crutch that we all rely on? I don’t think that’s going to be successful.
Sure, I can do vibe coding for a weekend project to test out something that I want to maybe show a small POC for. But for a diff going into a codebase with millions of lines of code, would you use that vibe-coded output to solve the diff and then put it in production? I’m totally against that. I don’t think that’s the right norm.
The other thing is, I know everyone keeps talking about vibe coding, but if you go into the interview market right now, you have to go through a LeetCode interview. You have to do solution design. So the basics aren’t going anywhere. People are talking about vibe coding, but you still have to traverse a string or add up the nodes of a binary tree if asked. It’s not because people want to know how well you code; they want to know whether you understand the concepts, right? So that’s something that I feel is needed.
Tim Hwang: Yeah, so there’s a lot to unpack there. I don’t think even Karpathy is saying, “Oh, you should vibe code an enormously complex project.” But I think it is this interesting debate. You sort of say, “Look, this is going to work for your weekend project, but probably not much further.” There’s almost a question of, where do you draw that line? How far can you get with vibe coding? If everybody agrees you can’t do the most complex thing, but you can definitely vibe code the simplest thing, this dividing line gets very fuzzy as these models get better. I think it’s a genuinely interesting question. I’m curious, are you a vibe coder? Do you vibe code?
Kaoutar El Maghraoui: I do it sometimes, and I like it. Actually, I’m really fascinated by vibe coding. I see vibe coding as a reflection of the changing nature of software development, where AI tools are increasingly being adopted and handling routine tasks, allowing coders to focus on higher-level design and problem-solving.
Of course, I think we’re evolving into a world where we’re going to be combining both vibe coding and serious coding. But understanding what the AI is generating, how to test it, how to integrate it, how to do all these diffs and so on, is still going to be very important. Whether we’re going to get to the point where it’s all automated by AI, I think it remains to be seen. I think we’re heading in that direction as these LLMs are becoming better and better at coding.
One of the things that I’m a bit concerned about is rigor. Vibe coding could lead to a decline in code rigor and best practices. There is a worry that less experienced coders will rely more and more on vibe coding, especially students. People who are still learning are given programming assignments, and they’ll just ask an AI agent or an LLM to do it for them. Sometimes the AI does a pretty good job. So these AI tools are definitely going to influence coding styles and practices. But they’re also enabling a more exploratory and iterative approach to coding.
Tim Hwang: Yeah, for sure. My friend actually coded up an app, and I was like, “Oh, so what does this menu do?” It was very funny because he was like, “I just vibe coded it, so I’m actually not really sure what it does.” I don’t know if this is a sustainable way to go about building bigger systems.
This bleeds pretty well, I think, to a point Vyoma made, which is first you said it’s only going to be good for weekend projects, not anything bigger. I think the second point you made is also really interesting: if you want a job as an engineer, they’re still going to make you go through LeetCode, right? You’re still forced to go through this gate.
I was joking with a friend recently. I was like, “Oh, well, I’m just looking for the 10x vibe coder.” Right? Someone out there is able to vibe code way more proficiently than everybody else. If I can just find that person, maybe he or she doesn’t need to know LeetCode.
Vyoma Gajjar: So, I’m not saying that LeetCode is 100% a reflection of how good a software engineer you are. But even if you’re not able to solve the pseudocode, do you know how an if statement works? How a while loop works? Even being able to explain what a LeetCode question is asking of you is good enough if you can’t code on the spot.
A “10x vibe coder”... would that coder actually even understand legacy code that’s already been written? For that, you’d have to go back and understand: What’s a function? How does that function work? What are the parameters in this function? I’m not there yet. Maybe I haven’t seen a good use of the entire system, and people might get better. The models keep getting better; I totally agree with that. Sure, you can use it for grunt work. That was one of the questions coming up when I was reading about it: if there’s a CSS file and you need to change a particular bracket or a button, you don’t have to sift through thousands of lines. Of course you can use it for that, because it’s a better use of your time to do something else, learn something better. But for building an application end-to-end, I don’t know if it’s the best way to do it.
Chris Hay: I can say right now, this is here to stay. This is not a weekend project thing. As somebody whose starting graduate job was writing Unix C Motif, I can tell you right now I spent most of my time chasing memory pointer bugs. I am never going back to that nightmare ever again. You talk about productivity? You look at your own terrible graduate-written code and try to work out why that memory location is not the place you wanted to point at. You watch your dreams die as every time you boot up your application, it crashes.
What I can do with a large language model is I can go Ctrl+C, Ctrl+V the error message from the compiler and then say, “Fix this, buddy,” and then it goes, “Ah, haha, you messed it up over here.” Oh, that’s great. I will just copy and paste that back in. Or using Cursor, I don’t even need to copy and paste; I can just click, click, click—like Homer Simpson in the nuclear power factory when he was working at home with that little pigeon. That’s the world we’re moving to, right? So I’m all in. I’m all in.
Tim Hwang: Okay, you’ve got some inbound here. Kaoutar, how about you go first?
Kaoutar El Maghraoui: I’m a bit worried that if we only do this, then if you really want to do software-hardware co-design, really do optimizations, then people will lose the skills to understand computer architecture. What does it mean to have a dangling pointer? How do we do these memory allocations, the efficiency? I mean, I see that right now as we are designing these efficient systems, having a core understanding of these concepts is very critical to know how to optimize. So if we’re just going to vibe code and then people have no clue what the underlying systems are doing, how they’re behaving, what it means to have a cache hierarchy, all these implications about data movements and the computational units, the bottlenecks... we’re going to lose that. And that’s what worries me. How do we optimize these systems end-to-end?
Chris Hay: I’m okay with losing that, to be honest. I’m still having nightmares from my past there. Of course, it’s not for everyone. But honestly, I think those skills are important. I think we have different skills within the engineering practice, and that’s going to expand out.
If you think about it, if you think of something like Lewis Hamilton as a Formula One driver—one of the best drivers in the world—does he know how to change a tire? I don’t know. Maybe he does, maybe he doesn’t, right? But what he does know how to do is drive a Formula One car at speed through the race course better than anyone else. Now, the question I would have there is that there is a wider team, right? Some people are going to be specialists at tire changes, some at ball bearings, whatever. I think that’s pretty cool. So not everybody needs to know how to deal with dangling pointers, right? Some people just want to build an app and get it out and try to make some money. Go for it.
Tim Hwang: Yeah, there’s some interesting history here too. Think about the first people who built planes, right? The Wright Brothers were engineers modifying bicycles to get the plane to work. Back then, if you were flying a plane, it broke down so often that you really needed to understand every bit of it. And now, pilots have a certain level of training, but they’re not necessarily airplane engineers.
I actually wonder if that’s going to have that divergence in coding over time, where you almost have good coders, but that’s almost a discipline separate from understanding the inner workings of the machine. I guess we get that in part because machines become super reliable, right? Your car is not breaking down every single day, so you don’t necessarily need to understand it. But software used to break down all the time, and so you really did need to know those internal components.
Vyoma Gajjar: First of all, I’m very happy to know that Chris is a fellow F1 supporter. So even if Hamilton doesn’t try to fix a car, Chris, I think he would know how to do it theoretically. That’s all I feel is actually needed in the vibe coding part.
The other thing is, I wouldn’t entrust a vibe coder with a nuclear reactor control system. But I would make sure that anyone who does vibe coding actually knows what they are doing with it. So as long as that loop ties back, I am okay with it. But if it doesn’t, and we have someone who’s just vibing with a minimum amount of data points and trying to make something production-ready, I don’t think I’m comfortable with that.
Chris Hay: I sort of agree with you, and I think we’re going to move from one extreme to the other. I think that’s the reality. I can see a world where you’re going to vibe code to prototype, vibe code to figure out some issues, etc. It’s almost like, I’ve never painted, but it’s like those Monet paintings where it’s all sort of blurry stuff, and then you hone in on the detail. I think that’s going to become a bit of a pattern. You kind of need something like this, you orchestrate it, and then you start to say, “Okay, I know what I’ve built here. I prototyped this. Now I’m going to start engineering this further and go down into detail.”
The flip of this is, we’re looking at this from an engineering perspective. Why can’t that person who’s never coded go and create an app for themselves, get it in the app store, and make some money? Or maybe somebody who wants to do a home automation project but has never had those skills—why can’t they vibe code and then be able to do that thing they’ve never been able to achieve? And then, you know what? It might get them interested in the discipline and might want to say, “You know what? How does memory work?” And then they start delving into that. So I think actually it’s an area where we can have greater inclusion, greater impact, and a larger community. So I hope that’s the direction we go in.
Tim Hwang: Yeah, I think the feedback loop is super important. At least for me, who doesn’t really code day in and day out, the ability to use these tools makes the experience... I have 45 minutes after my kids have gone down to play with the computer, and I can just get further in that time. It’s a very strong, satisfying feedback loop in a way that you didn’t really have pre-these tools.
A final anecdote: My mom was a very early coder. She still remembers punch cards. To your point, Chris, it’s interesting how much, if you felt a lot of pain coding, you are more likely to want these tools because you remember how painful it is. My mom’s response is, “Oh, yeah, I remember programming a big box of punch cards, dropping them, and then spending hours having to recompile.” And she’s like, “I love this. Automate everything, basically.” I think that’s a really fascinating aspect of people’s personal experience; how difficult it is might make them more or less willing to adopt these tools.
Chris Hay: And you could imagine a vision model, multimodal, controlling an action model with a robotic arm, and then that robotic arm can re-sort those punch cards, and your mom would be fine, right?
Tim Hwang: Yeah, exactly. I need the coding assistant for punch cards. That actually would be an awesome project.
Kaoutar El Maghraoui: I agree with Chris that it’s going to be a hybrid world where we have people who have no clue about coding but are still able to use vibe coding to create really nice things for rapid prototyping, proof of concepts, or applications matured by these tools. But then we still need the people who really understand what’s happening behind the scenes, who know how to debug, who know how to optimize. It’s going to be specialization at these different levels, for sure.
Tim Hwang: Yeah. I’ve got to believe some of these debates happened when object-oriented programming came around. People were like, “Ah, you don’t understand the inner workings of the system!” This battle happens almost at each layer of abstraction, arguably.
I’m going to move us on to our next topic. We wanted to do a quick segment to talk a little bit about scaling laws. This segment kind of puts together a couple of things we’ve touched on in the last few episodes, particularly regarding DeepSeek. Kaoutar, great to have you on the show because I think you suggested this topic.
The background here, of course, is that scaling laws are the idea that we have this interesting relationship in machine learning where the more compute you use in pre-training, the better the capabilities that come out. There’s a rough relationship between how much muscle you’re putting in and the model that comes out. This has motivated the entire thesis of these frontier model companies raising enormous amounts of money, which is to say if we want really powerful systems, we need lots of data, lots of compute, and we need to do the biggest possible pre-training run.
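The rough relationship Tim describes is usually written as a power law in parameter count and training tokens. A minimal sketch in Python, using the published Chinchilla-style coefficients purely to illustrate the shape (the function name and example model sizes are not from the episode):

```python
# Chinchilla-style scaling law: predicted training loss falls as a power
# law in model parameters N and training tokens D. The coefficients are
# the published Chinchilla fit, used here only to show the shape.

def scaling_loss(n_params: float, n_tokens: float) -> float:
    """L(N, D) = E + A / N^alpha + B / D^beta (lower is better)."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Ten times the parameters and ten times the data: predicted loss drops,
# but with diminishing returns as it approaches the irreducible term E.
print(scaling_loss(1e9, 2e10))    # a 1B model on 20B tokens
print(scaling_loss(1e10, 2e11))   # lower loss, but far from 10x lower
```

The diminishing-returns shape is exactly why "just scale it up" motivated the giant fundraising rounds, and also why efficiency work of the kind discussed next is attractive: moving the curve is worth more than sliding along it.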
I know Kaoutar you wanted to bring this up because you think DeepSeek doesn’t necessarily break this idea but nuances it a little. Do you want to talk a little about that?
Kaoutar El Maghraoui: Yeah, definitely. As you mentioned, in traditional AI development there is this general belief that bigger models and larger datasets lead to better performance, following the scaling laws. This often translates into massive investments in hardware and infrastructure. But DeepSeek is challenging these traditional AI scaling laws. They’re demonstrating that smaller, more cost-efficient models can achieve competitive performance, even threatening business models that rely on large-scale infrastructure.
They used a lot of techniques like quantization and distillation. They even did optimizations at the PTX level, given the limitations they had with the H800 GPUs. So there was a lot of focus on efficiency rather than sheer model size, with optimization at different levels of the stack. They also looked at enhancing data quality and employing better training strategies.
The key idea here is how do we leverage smarter training techniques, as in their example, to achieve better performance with fewer parameters and reduced computational costs? I was really fascinated by the wide range of techniques they used. They had data-centric approaches; they also had hardware-aware optimizations; they were considering sustainability. I think this has implications for the AI community, where the focus is shifting from scaling by size to scaling by efficiency.
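To make one of those techniques concrete, here is a generic symmetric int8 post-training quantization sketch. This is an illustration of the general idea, not DeepSeek's actual recipe (their training notably used low-precision FP8 formats), and the function names are my own:

```python
import numpy as np

# Symmetric per-tensor int8 quantization: store weights as int8 plus one
# float scale, cutting memory 4x versus float32 at a small accuracy cost.

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 codes and a per-tensor scale."""
    scale = float(np.abs(w).max()) / 127.0     # largest value maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error per weight is bounded by half the quantization step
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

Distillation is the complementary trick: instead of shrinking the numbers, you train a small "student" model to match a large "teacher" model's outputs, so the efficiency gains compound.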
Tim Hwang: I think this is actually a lot to get into. In some ways, for me, the scaling law question is interesting because people have meant a couple of things in the popular discussion of AI. One of them is just how much compute you need. I guess in some ways DeepSeek doesn’t really change that. It certainly changes the platforms you need to get high performance, but it doesn’t necessarily eliminate the idea that more compute equals better performance. Is that right? Is that the right way of thinking about it? I don’t know, Vyoma, if you want to jump in.
Vyoma Gajjar: With the scaling laws, with DeepSeek, etc., I see a new shift in people trying to optimize GPUs and revolutionize this entire field. I don't know if people know about this, but a month ago, Meta and NVIDIA came up with a paper saying that something called warp specialization will be a part of PyTorch. It optimizes GPU performance on Hopper architectures like the H100 by assigning distinct roles to each of these warps. (A warp is a group of 32 threads that execute together.) It pivots to this entire point that we are looking into how to optimize all of these hardware specs. I think that came into the picture because of some of the scaling laws we've been seeing. So I don't see scaling laws as something that limits us; I feel we've come up with better ways.
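For readers unfamiliar with the warp terminology: GPU threads are grouped into warps of 32, and warp specialization gives whole warps different jobs, for instance producers that fetch data while consumers compute. A toy Python sketch of the indexing; the producer/consumer split here is purely illustrative, not the actual PyTorch/NVIDIA design:

```python
# A warp is a group of 32 consecutive GPU threads that execute in
# lockstep. Warp specialization assigns roles per warp; the even/odd
# split below is illustrative only, not the real scheme.
WARP_SIZE = 32

def warp_of(thread_id: int) -> int:
    return thread_id // WARP_SIZE

def role(thread_id: int) -> str:
    # e.g. even-numbered warps load data while odd warps do the math
    return "producer" if warp_of(thread_id) % 2 == 0 else "consumer"
```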
Tim Hwang: Right. It’s not necessarily about magnitude; it’s more about how we’re treating the GPUs, basically.
Vyoma Gajjar: Exactly. Exactly.
Kaoutar El Maghraoui: I think hardware-aware optimizations are becoming increasingly important. If you also see the work happening around state space models and the flex attentions... Every now and then we hear about different algorithms around Flash. How do you do these transformer attention computations more efficiently? There is Flash Attention; there are various versions. There is Flex Attention. Now the Mamba and Bamba models are also doing a lot of optimizations by understanding the underlying architecture, especially the GPUs right now, and then figuring out how to restructure the computations and the data movement so you can drive more efficiency from the hardware.
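The restructuring of computation and data movement described here can be made concrete with FlashAttention's central idea: process keys and values in blocks while keeping a running softmax maximum and normalizer per query, so the full attention score matrix is never materialized. A simplified NumPy sketch; real implementations fuse this into a single GPU kernel:

```python
import numpy as np

def naive_attention(q, k, v):
    # Standard softmax attention: materializes the full score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def tiled_attention(q, k, v, block=2):
    # FlashAttention-style online softmax: walk over K/V in blocks,
    # keeping a running max m and normalizer l per query row, so no
    # full n x n score matrix is ever built.
    n_q, d = q.shape
    out = np.zeros_like(q, dtype=float)
    m = np.full(n_q, -np.inf)
    l = np.zeros(n_q)
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)          # scores for this block only
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)          # rescale earlier partial sums
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        out = out * scale[:, None] + p @ vb
        m = m_new
    return out / l[:, None]
```

Both functions compute the same result; the tiled version just never holds more than one block of scores in memory, which is where the hardware efficiency comes from.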
Other things like test-time compute are also becoming very important. Instead of focusing solely on pre-training compute, we can focus on inference-time compute, which is really more critical. Meaning smaller models can compute more at test time—longer reasoning, tree search, Monte Carlo inference, and things like that. This also reduces the need for an enormous parameter count. There was a very interesting paper about test-time compute that showed techniques focusing on how to get more from the model during test time, not during training time.
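One simple form of the test-time compute mentioned here is best-of-n sampling: spend extra inference compute drawing several candidate answers and keep the one a scoring function likes best. A hedged sketch, where `generate` and `score` are hypothetical stand-ins for a model call and a verifier or reward function:

```python
import random

# Hedged sketch of best-of-n sampling, a simple test-time-compute
# technique. `generate` and `score` are hypothetical stand-ins for a
# model call and a verifier, not any specific API.
def best_of_n(generate, score, prompt, n=8, seed=0):
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)
```

Spending more at inference here just means raising `n`: more samples, more chances for the verifier to find a strong answer from a small model.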
And also distillation: creating compact models with large-model capabilities. Of course, you still need the large model, but we can create a variety of distilled versions that perform really well and can inherit knowledge and reasoning from much larger models. And of course, high-quality training is also outperforming raw scaling—smart data selections, better fine-tuning, reinforcement learning with self-improvement. A well-trained model can outperform a poorly trained massive model, especially if you focus on data quality.
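The distillation idea can be sketched as a loss term: soften the teacher's logits with a temperature and train the student to match that distribution. A minimal illustration of the Hinton-style soft-target term only; a real setup would typically combine this with the ordinary hard-label loss:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the temperature-softened teacher and student
    # distributions; minimized when the student matches the teacher.
    # (Sketch of the soft-target term only.)
    teacher_p = softmax(teacher_logits, temperature)
    student_p = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_p, student_p))
```

The temperature matters: softening the teacher's distribution exposes how it ranks the wrong answers, which is much of the "knowledge" the small model inherits.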
Chris Hay: Yeah, I was just going to say I think we came from a world of “vibe training,” which is really, if you think about what was going on in the beginning, just taking some transformers and throwing a bunch of data at it to get it to predict the next token. And actually, we’re in this stage now where it’s really about honing the algorithms, honing the chain of thought, as you say, starting to engineer things. Kaoutar, you made some really good points on DeepSeek. One of the interesting things they did, I think it was last week, is they open-sourced a whole bunch of their code bases that they use to train. That’s everything from data frameworks; they even engineered themselves a new distributed file system, etc. So all of these engineering techniques—anything to get more efficiency, better training—I loved your point about high-quality chain of thought and inference-time compute. That makes a huge difference. That allows you to start getting smaller models, higher quality models. I really think we’ve moved into this engineering phase. But I’m going to, like Karpathy, I’m going to call it “vibe training” and see if I can get myself a Wikipedia entry off the back of that.
Tim Hwang: Yeah, you heard it here first. I guess, Kaoutar, maybe I’ll throw it to you for the last question here. Do we think that scaling laws no longer matter? Do we care about scaling laws anymore, given this new era of optimization?
Kaoutar El Maghraoui: Yeah, I think we should really shift from just “bigger” to “smarter” and more efficient models. We should redefine the traditional scaling laws as they were defined by “bigger is better,” but I think it should be about “smarter and more efficient.”
Tim Hwang: Well, I’m going to move us on to our final topic. I want to end on a fun, odd story that came across our cues. There was an announcement recently that Deutsche Telekom and Perplexity were going to work together to launch what they call an “AI phone,” which would integrate a bunch of AI features for less than USD 1,000, coming out in 2026.
This news struck everybody as a little surprising because Perplexity has largely been in the world of AI-powered search. They were one of the first to market with a human-language query that gives you results, attempting to be better than the “10 blue links” from Google.
So the first question is, it’s a very perplexing announcement for Perplexity to be getting into the phone space. Chris, you’re already smiling, so maybe I’ll throw it to you first. Why is Perplexity doing this at all?
Chris Hay: Tim, when you’re on your mobile phone and you’re doing your Google searching, do you ever go to www.google.com and then type in your query there? Is that your action?
Tim Hwang: I never do that.
Chris Hay: What is your action, Tim? How do you search on a regular mobile phone?
Tim Hwang: I would say open the browser and then type my search term into the browser bar.
Chris Hay: Exactly. So this is what this is really about: controlling the browser bar. Whoever controls the browser bar can direct those queries to their search engine. So if I was Perplexity, I would absolutely launch a mobile phone where I’m in control of the browser bar. That’s my opinion of what they’re doing, and that is a smart move.
Tim Hwang: Do you all agree? Vyoma, Kaoutar? Curious if you think it’s a brilliant move by Perplexity; this is exactly what should happen.
Kaoutar El Maghraoui: I think this is also marking a shift in the user interface—how we’re interacting with our phones—to shift to a more voice-centric, AI-driven user experience, potentially reducing reliance on app-based interactions. I think maybe this is going to shift and change with these AI phones. It’s going to be a completely different experience that is mostly voice-centric. Probably we’ll see the disappearance of apps and more of these agents in the background working together to satisfy whatever we need.
Tim Hwang: Yeah, the interface part is really super interesting. One thing that has been said about AI search features, whether it’s Perplexity or Google, is that increasingly they’re moving to a world where you don’t have to go to the underlying website; it curates the result for you. So I guess, Chris, to your original point, it’s kind of weird. It’s almost like the whole feature has been turned inside out: you’re going to a browser bar not to browse the web, but to get the results of a chatbot. That’s really strange from my perspective.
Chris Hay: And that chatbot is gonna launch its own browser instance, go to somewhere else, look up that website, and then come back with the answer. It’s gonna be weird.
Tim Hwang: Definitely. I think the business rationale for Perplexity makes sense. It’s almost a question, though, of how valuable that browser bar is going to be in the future. It of course has been the source of a lot of litigation, like Apple working with Google to have it as the default search engine. But you can almost imagine a future where maybe the app is actually the more powerful thing. When I want to know something, I don’t go to the browser bar anymore; I just go to Perplexity. Or in Kaoutar’s world, I just speak into my phone, and the phone just does what I want it to do.
Vyoma, maybe I’ll ask you: Is there a world where Perplexity is trying to seize this real estate on the phone which might not be so valuable in the future? Maybe browser bars are going to be a thing of the past?
Vyoma Gajjar: Just FYI, it has taken over my browser bar for sure. I’ve been using Perplexity for months now. Many of my friends... you want to research anything? In my past days, I would go to Reddit. Now I no longer have to do that. It’s on my home screen; it’s right there in my most-used apps because that’s all I use now. I don’t research anything anymore. So it has totally taken over that.
AI-layered integration at the OS level is much better than any standalone app that exists. So I think Perplexity has hit it out of the park there. The only thing I sometimes struggle with in Perplexity is when I actually want to buy something. Let’s say I want to buy a mattress or a particular lampshade; I’m researching about it, and it’s not getting that consumer knowledge about me that Google has, because Google’s been integrated at the OS level for years for that context. So this is going to be a great pivot for them to make their product more context-aware, which is the need of the hour now. It’s going to feed all these usage patterns back. I’m never going to lose Perplexity from my phone. I love it, genuinely. I no longer have a Google search bar on my phone anymore. That’s a big deal.
Yeah, and I live in the Bay Area, and there are many, many people who use that. Believe me, you’d see them pop it out on their phones all the time. I have friends who use that. So I feel that is one of the things I’ll see.
But the one thing they didn’t speak about in that entire blog post was the hardware specifications. What inference layer are they going to use? Is it going to be the typical cloud, or robust local inference on-device using accelerators such as NPUs or TPUs? That is going to be the deciding factor in whether it is here to stay or not.
Kaoutar El Maghraoui: That’s a very good point, Vyoma, because this definitely depends on how mature or powerful the edge AI models, especially on-device AI, are. As we’re getting these LLMs to become smaller and more efficient, more AI processing can be done on the device. This provides faster responses, better privacy, and also the context that allows customization. So I see this evolution going hand-in-hand as edge AI becomes more mature and powerful; we can do more with these AI phones.
Vyoma Gajjar: Exactly.
Tim Hwang: Yeah, the price angle is interesting. How cheaply can you pack these features into the phone? That’s a really interesting question. Right now, it’s almost a luxury feature. If you want to run a more sophisticated model, you need all the power and hardware at the edge, which makes for a much more expensive phone. They’re promising it for less than a thousand dollars. It’s already funny to say, “It’s a phone, but it’s going to be cheap—less than USD 1,000.” But even still, to pull that off is pretty interesting in terms of how far you can democratize this tech.
Vyoma Gajjar: Yeah, it is going to start this whole wave of having specialized devices. I hope we are not going back to the era of the Amazon Fire Phone, which detached itself and wasn’t that great. But I hope this breaks that curse, and we are able to see something greater and better.
Chris Hay: I agree, Vyoma. I just want to see that new experience, as you say. A native, integrated experience, have all your contacts but keep it private. I loved your point, Kaoutar, about voice, etc. I think there are so many different modalities that can come into this, and back to the camera as well. I just hope we get a different and new experience. But as I said earlier, if you want to control that search experience, you need a device there. So I do think it’s a brilliant move.
Kaoutar El Maghraoui: Yeah, this is all happening while Apple is delaying its AI features. I don’t know your take on this. Is that because they want to make sure they have a very well-curated, secure offering? Apple has been conservative about security. This is opening the space for Perplexity and others to take over some of the market that Apple phones have.
Vyoma Gajjar: Yeah, I think with Apple, when they put out Apple Intelligence, they got a lot of backlash. So maybe they are field-testing it way more before coming into production. I don’t know if you know about this, but Apple came up with the Apple Kids Watch. Again, they are known as the company that respects privacy, and them coming up with the kids watch showcases their commitment towards it. So I feel they are looking into several avenues before coming out with something publicly.
Tim Hwang: Yeah, and I think, weirdly, the hardware mentality might be working against them in implementing these features. Unlike building a phone, which you can really control, these models are still unreliable and probabilistic. In some ways, the discipline of launching AI features is a bit more risk-loving than Apple might be used to, and I think it’s holding them back in the market.
Chris Hay: Yeah, but I ain’t giving up my iPhone for anything, Tim. So I’m okay with that.
Tim Hwang: Yeah, that’s right. The counter-argument is they can just keep trying because everybody’s on their phone and they’re not going to throw it away, so they can just keep going until they get it. Something to keep an eye on. We’ll definitely be keeping an eye on this Perplexity project; a lot more to come there.
So that’s all the time we have for today. Vyoma, Kaoutar, Chris, thanks for joining us, as always. And thanks for joining us, all you listeners. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. We will see you next week on Mixture of Experts.