What’s the most exciting announcement coming out of NVIDIA GTC? In episode 47 of Mixture of Experts, host Tim Hwang is joined by Nathalie Baracaldo, Kaoutar El Maghraoui and Vyoma Gajjar to walk you through this week’s biggest AI news headlines. First, dive into the latest announcements from NVIDIA GTC, including the GR00T N1 model for humanoid robotics.
Next, find out what Baidu’s new AI reasoning models offer, and why they aren’t open source. Then, get our expert analysis of this week’s paper on Chain-of-Thought reasoning—and explore its flaws. Finally, explore Gemini 2.0 Flash’s new image generation model for developer experimentation. Does this signal that Google is catching up in the AI game? Tune in to this episode of Mixture of Experts to find out.
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Tim Hwang: What’s the announcement you’re most excited about from NVIDIA GTC? Vyoma Gajjar is an AI Technical Solutions Architect. Vyoma, welcome back to the show. What did you think?
Vyoma Gajjar: Thank you. I feel the GR00T N1 model—the generalist model, as they’re calling it, for humanoid robotics—was something I really enjoyed.
Tim Hwang: Kaoutar El Maghraoui is a Principal Research Scientist and Manager at the AI Hardware Center. Kaoutar, welcome back to the show. What did you see from the keynote that you liked?
Kaoutar El Maghraoui: Great to be here. I was also very excited about the robotics and simulation announcement, especially the Newton Physics Engine for real-time physics simulation and how it’s working with the AI.
Tim Hwang: Nathalie Baracaldo is a Senior Research Scientist and Master Inventor. Nathalie, welcome back to the show. We haven’t seen you for a while. What did you like from GTC?
Nathalie Baracaldo: I was super excited with their framework to generate synthetic data for robots. The lack of such data has been a key factor limiting the performance of robots in all sorts of applications, so I’m super excited about that.
Tim Hwang: Absolutely. All that and more on today’s Mixture of Experts. I’m Tim Hwang, and welcome to Mixture of Experts. Each week, MOE brings together the best minds in artificial intelligence to walk you through the biggest headlines of the week.
As always, there’s a lot to cover. We’re going to talk about Baidu’s new models, a paper about the flaws of Chain-of-Thought reasoning, and Gemini 2.0 Flash Experimental. But first, I really want to cover NVIDIA GTC. GTC is NVIDIA’s big conference every year, where the big drops happen. Jensen Huang gets to walk out on stage and do all the exciting keynotes.
It sounds like this group really wants to talk about robots, and specifically GR00T N1, a foundation model for robots that NVIDIA announced. Vyoma, maybe I’ll start with you. What got you so excited about this announcement?
Vyoma Gajjar: One of the things I saw is that a model like GR00T N1, created by NVIDIA, is trained on both synthetic and real data. During the keynote, NVIDIA was claiming that it features a dual-system architecture—it’s thinking fast and slow—which is inspired by human cognitive processes. I feel these are the small ways in which people are trying to get closer to AGI, however far it seems. So I feel that was a good step they were trying to take.
Tim Hwang: Yeah, absolutely. Nathalie, in your response, you flagged the synthetic data part as what got you most excited. It would be good for our listeners to understand how big of a blocker that has been traditionally. Could you talk a little to that?
Nathalie Baracaldo: Yes, definitely. One of the big issues is that when you try to simulate a robot to test it before it goes into the real environment, we have limited data. Traditionally, when you simulate—which is less expensive—you don’t have the full spectrum of different scenarios where the robot might move. As a result, when you move your machine learning program onto the actual robot, it fails. There are very nice videos of how it fails. If you have a humanoid, it may just fall on its face; it’s just crazy.
That’s why many robots in companies and factories have a very restricted set of environments. You see them going down an aisle, for example, if it’s an arm, and so forth. It’s just because it’s very complex to create a robot that can move in an environment that may not be exactly what it was designed for. Moving just a millimeter, the robot may not behave properly.
Having this type of synthetic data generation allows us to create a huge range of environments where we can test this and make the whole development cycle much faster and safer. Another aspect is that as these robots evolve and move around, you may have unknown safety issues arising. That is super interesting to me because understanding how we can make all those environments safe and simulate things that go wrong before the robot is deployed... I think it’s just fascinating. It opens a lot of new opportunities to create safe robots and applications and deploy them in real life. I was also super excited to hear they open-sourced it. I was very, very excited to hear that.
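To make the simulation-first idea concrete, here is a minimal sketch of domain randomization: vary the physical parameters of a simulated environment and collect the conditions that break a control policy before the robot ever ships. This is an illustrative toy under stated assumptions, not NVIDIA’s framework; `sample_environment`, `simulate_episode`, and `stress_test` are hypothetical names, and the simulator is stubbed so the snippet runs on its own.

```python
import random

def sample_environment():
    """Randomize the physical parameters that tend to break robots in the field."""
    return {
        "floor_friction": random.uniform(0.3, 1.0),
        "payload_mass_kg": random.uniform(0.0, 2.5),
        "object_offset_mm": random.uniform(-5.0, 5.0),  # the "millimeter" problem
    }

def simulate_episode(policy, env):
    """Stub for a real physics simulator: this toy policy, tuned for nominal
    conditions, fails when friction drops or the grasp target shifts."""
    return env["floor_friction"] > 0.4 and abs(env["object_offset_mm"]) < policy["tolerance_mm"]

def stress_test(policy, episodes=10_000):
    """Collect the randomized conditions under which the policy fails,
    so they can be addressed before deployment."""
    failures = []
    for _ in range(episodes):
        env = sample_environment()
        if not simulate_episode(policy, env):
            failures.append(env)
    return failures

failures = stress_test({"tolerance_mm": 3.0})
print(f"{len(failures)} failing scenarios out of 10,000 simulated episodes")
```

The failing scenarios are exactly the kind of synthetic data Nathalie describes: cases to train and test against before the robot reaches the real world.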
Tim Hwang: Yeah, the open-source element is a really interesting part of this. They’ve clearly created something big from a model standpoint, but they’re saying, “We’re here to sell hardware.” So the business incentives lean towards open source. Kaoutar, you spend your days thinking about hardware. How big of a deal is this? And why is NVIDIA getting into robots? I think about NVIDIA; they started as a gaming GPU company. The next time we really thought about them, it was for big data centers for language models. A lot of this keynote was robots, robots, robots. We saw videos of robots; they brought a robot on stage. Why is NVIDIA investing in this vertical?
Kaoutar El Maghraoui: I think it’s high time right now to invest in this, and it’s very attractive. All the ingredients are coming together: the models, the hardware, the simulations, the synthetic data generation. Together, these make robots perform very well. The collaboration they have with DeepMind and Disney is also interesting—especially seeing Disney play a role here. Maybe they’ll bring the fun, the entertainment piece, their Disney characters, into these robots. That’s going to be really interesting.
Another thing that was very interesting is this physics engine they talked about, which is designed for robotic simulation. Nathalie mentioned this. It’s built on their Warp framework, which provides a lot of acceleration. It provides high-fidelity, real-time physics simulations, which weren’t really possible or realistic before. This is crucial for training and testing robotic systems in virtual environments before deploying them in the real world. I think that’s a very big step forward to enable humanoid robots to perform well and with high fidelity.
So, combining simulation and AI acceleration using their Warp-based acceleration framework with high-performance parallel programming helps them achieve fast and efficient GPU-accelerated simulations. They’re combining the AI world with physics-based simulations to provide an interesting outcome. They also have integrations with existing frameworks, like Isaac Sim, and with their reinforcement learning tooling. They also have a playground that uses DeepMind’s robotics research. A lot of integration with existing frameworks makes high-precision robotic control possible, paving the way for an environment ideal for simulating tasks like manipulation, grasping, multimodality, etc.
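For a flavor of what Warp-style, GPU-accelerated simulation code looks like, here is a toy kernel that integrates particle positions in parallel, written against Warp’s public Python API. This is a sketch, not code from the Newton engine; the particle system and step count are invented for illustration.

```python
import warp as wp

wp.init()

@wp.kernel
def integrate(x: wp.array(dtype=wp.vec3),
              v: wp.array(dtype=wp.vec3),
              gravity: wp.vec3,
              dt: float):
    tid = wp.tid()                  # one thread per particle
    v[tid] = v[tid] + gravity * dt  # apply gravity
    x[tid] = x[tid] + v[tid] * dt   # integrate position

n = 1024
x = wp.zeros(n, dtype=wp.vec3)  # positions
v = wp.zeros(n, dtype=wp.vec3)  # velocities

for _ in range(100):  # 100 simulation steps at 60 Hz
    wp.launch(integrate, dim=n,
              inputs=[x, v, wp.vec3(0.0, -9.8, 0.0), 1.0 / 60.0])
```

The appeal is that plain Python-looking kernels compile down to parallel GPU (or CPU) code, which is what makes the real-time, high-fidelity simulation Kaoutar describes feasible.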
Tim Hwang: Yeah. A follow-up question for you, Kaoutar: in the last few episodes, we’ve done segments where it’s like, “Oh, OpenAI is about to work on its own chips,” or “Amazon might be catching up.” Lots of people want to capture some of NVIDIA’s market. I look at all this robotics work and the announcements about Blackwell—the performance metrics are just insane. In your opinion, as someone who watches the industry, can anyone catch up? After this keynote, it feels very hard for anyone to credibly claim they’ll do things on par with NVIDIA, particularly because of their ecosystem. Curious about your thoughts.
Kaoutar El Maghraoui: Yeah, I agree with you. I think they’re creating a big gap. They’re also lining up the right collaborators, like DeepMind and Disney and others. So it’s going to be hard to catch up, but I wouldn’t be surprised if somebody comes along with new contributions, from OpenAI or others. Although, I agree, it’s really difficult to catch up.
Tim Hwang: I want to talk a little about some of the other announcements. Nathalie, in particular, I thought of you. A few episodes ago, we talked about a project they announced, I think at one of the last keynotes, called Digits—a quite cute little supercomputer they were selling that would be a desktop unit. As a researcher, is that form factor for doing work interesting to you? I have a friend at a company saying, “Our product’s getting successful, but that means we’re burning all our compute on inference. We have no time to do training or fine-tuning work anymore.” There’s a tension where all compute resources for an organization come out of the same bucket. Is something like DGX Spark (what it’s called now) interesting? Did you put your name in to reserve one? I’m curious about your thoughts.
Nathalie Baracaldo: I would pass that question to Kaoutar, actually. I’m not sure how to answer that.
Kaoutar El Maghraoui: Yeah, I think it’s definitely going to open doors for many researchers, enthusiasts, and people interested in learning about all the different cycles in the AI journey. Of course, there’s a lot of focus on inference and inference scaling because it’s hard to get access to GPUs. We had to be creative about what we can do with available resources. But I feel a lot can be done in the pre-training and fine-tuning stages; right now, only a few organizations can do that because of resource constraints. I would love to have access to an “AI in a box” that I can use in my home and experiment with, pushing the boundaries further. I’m sure many others would have that appetite as well.
Tim Hwang: Yeah, it’s a cool device. I’m a bit of a device nerd; it’s amazing you can have that much computing power on your desktop right now. Very exciting. Vyoma, any thoughts on DGX Spark? Do you think it’s a gimmick, or something you’d actually be interested in playing with?
Vyoma Gajjar: I feel the reason they came up with this is to target the developer community—developers sitting at home or wanting to try something as a side project. That’s how innovation flows. Someone’s side project, if they have the compute to do it, opens creative doors. It helps you experiment, train, fine-tune a small thing and move on. “Fail fast” works very well here. I feel that will be the reason people adopt it more. “Okay, I have this, I can leverage it, learn about it, try seeing if it works or not, and move on.” You’ll no longer see only big companies or institutions spending time on innovation projects because someone somewhere would have tried it and said, “Hey, guys, it’s not going to work. Let’s move on.” A quick turnaround is the angle NVIDIA is trying to play here.
Kaoutar El Maghraoui: If I might add, all the points Vyoma made are great. This is about bringing AI supercomputing to the desktop, trying to democratize AI. But there’s also the angle of robotics and humanoid AI training at scale, which I think was one of their motivations—pushing humanoid robotics forward with models like GR00T. Why DGX Spark matters here is that it allows developers to fine-tune and deploy robotics models locally. Using their Newton Physics Engine, it enables simulation-to-real training. You need these capabilities locally to advance these things. It would also be great for students learning AI. They need to learn these concepts, and the best way is to experiment and have hands-on experiences. Students right now are struggling to get access to GPUs and resources.
Tim Hwang: Yeah, it’ll be really cool if there’s a big school and education market for these devices. I think about when the Macintosh or the first iMacs came out; Apple had a huge market selling to schools because everybody wanted to give computers to their kids. It was a great way to break into people learning how to use these devices.
I’m going to move us to our next topic. Baidu announced this week that they were launching two new models: ERNIE X1 and ERNIE 4.5. X1 is supposed to be their DeepSeek competitor. Baidu is a longstanding Chinese tech company, one of the leaders in the space—someone you would have expected to dominate in AI. Like many other established players, including OpenAI, they too are now contending with new competitors. What’s interesting about ERNIE X1 and ERNIE 4.5 is that they’re both closed-source models. As a first cut, I’m curious about your thoughts on open source here. Why do we think Baidu is still pursuing a closed-source strategy? Do you think they’ll have to open source, like many others are thinking about now?
Vyoma Gajjar: Yeah. I feel Baidu is a company that originated from creating a search engine for China, and they wanted to keep the majority of that data private due to data privacy concerns. I feel this is their chance to utilize some of the information—the knowledge graph—they’ve created across their different AI applications, like Baidu AI for search or maps. They’re trying to create a platform interface with one particular model that creates synergy. I always believe that sooner or later they’ll realize the open-source market is a better way forward. Like how Sam Altman, after a couple of years, had to say in an AMA on Reddit, “Hey, I think maybe we’re on the wrong side of history.”
Tim Hwang: He’s getting dragged into it.
Vyoma Gajjar: Exactly. I don’t think he wanted to say it; it just came out. So I feel that as well. But looking at Baidu’s core structure, they were always very privacy-oriented, something they believed in building around. I get where their mind is right now, but I think sooner or later they’ll have to move. The pricing they’ve set is almost half of what others charge. That’s another point: “See, guys, everyone’s gonna use us; we’re half as expensive.” So they have an edge in the market. They’ll leverage this as much as they can, and then we’ll see.
Tim Hwang: Yeah, the competitive dynamics are interesting. Nathalie, Baidu is like the Google of its market—the search engine. The reputation has been that Google was slow to capture the AI opportunity. It’s interesting that in China, the search engine company is also the one that’s been slow to capture this. Should we read into that? Is there something about search businesses or dominant search companies that makes them more limited in using or benefiting from AI?
Nathalie Baracaldo: Yeah, that’s an interesting question. The way I see it is they probably don’t need to open source because they already have a big user base. People are already trusting them with many things. So from a strategy perspective, open sourcing may not be a key priority as it is for other companies.
Another aspect: whenever I think about open-source vs. closed-source models from a security standpoint, when you have an open-source model, you’re telling people, “Hey, go inspect it. We try our best; tell us how you think we did.” That offers transparency and improves how we move forward. When companies keep models behind the scenes, not telling you exactly how they work, they may be planning to orchestrate different components in the backend. We see this with OpenAI—we’re not fully sure how many models they have behind the scenes; we know they have guardrails. So, since they already have a big base for search, they probably think it’s okay not to fully open source.
But as a security person, I like transparency. It makes it easier to test the system. So that’s my take on open source vs. non-open source and what they’re doing.
Tim Hwang: Yeah, for sure. Kaoutar, are you on team Vyoma? Do you feel this closed-source strategy is doomed? Will we see Baidu have to open source? Or do you think different things are going on in that market?
Kaoutar El Maghraoui: I kind of agree with Nathalie, but I see they’ve already started making a step towards open source. I think they announced they’re planning to open source their new models sometime in June. This shows they’re competing with OpenAI and DeepSeek, especially seeing all the buzz DeepSeek created. Open-sourcing AI models like DeepSeek has gained traction. Baidu likely sees this as a way to increase adoption of its own models, gain market share, attract developers, and build an ecosystem. If you keep things closed, you miss out on the open ecosystem, developers, and community help—especially adoption. Adoption goes hand-in-hand with open source. Driving widespread adoption, more developers, more use cases, faster improvements through external contributions—those are all win-win strategies with open source. I think Baidu is getting it and moving in that direction. This intensifies competition in China and globally. It’s interesting to see these dynamics.
Tim Hwang: The next thing I want to talk about is a paper. I like to always have a paper to discuss. It’s fun to see what’s going on in research. This paper caught my eye: “Chain-of-Thought Reasoning in the Wild is Not Always Faithful.” For those not super aware, Chain-of-Thought (CoT) reasoning traces are where a model thinks through a problem before rendering an answer. Overall, we’ve discovered this method is really good for getting models to perform better. But there’s a growing body of papers investigating the problem of when the model gives erroneous reasoning for its decision. When are these reasoning traces not actually a faithful way of understanding how models make decisions?
Nathalie, you think a lot about security. This paper raises security issues—maybe we’re giving people the wrong impression of how AIs think by giving them CoT traces. Is that the right way of thinking about this paper and the problems of CoT in general?
Nathalie Baracaldo: Yeah, that’s a very interesting question. I’d rather have Chain-of-Thought so I can know at least a little how the model came to an answer. What the paper shows is a lot of biases that may happen in the Chain-of-Thought itself. I think it’s really interesting because the particular bias they demonstrate is a bias that also exists in us humans. For example, if I ask you, Tim, “Is X larger than Y?” depending on how I phrase it, people will answer one way vs. another. That’s a bias cognitive psychologists have long studied. The paper shows that same bias exists in the model. I thought that was interesting—the parallel between cognitive psychology in humans and the paper’s findings. If you study this further, you’ll see models exhibit other types of biases.
For fairness, for example, there are papers about how the Chain-of-Thought itself may be biased, pretty hateful, or tell you stuff you don’t want to see for certain use cases. Overall, I thought it was a really interesting paper. The caveat—because I’m a researcher at heart—is that they only used one dataset and their temperature was 0.7, which I thought was interesting. I’d like to see more expansion on this work because it’s fascinating.
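To illustrate the bias Nathalie describes, here is a minimal sketch of an order-sensitivity probe: ask a comparative question in both directions and flag logically inconsistent answer pairs. This is illustrative only; `ask_model` is a hypothetical stand-in for a real LLM client, stubbed so the snippet runs, and the 0.7 sampling temperature she mentions appears as a parameter.

```python
PAIRS = [
    ("the Nile", "the Amazon", "longer"),
    ("Mount Everest", "K2", "taller"),
]

def ask_model(question: str, temperature: float = 0.7) -> bool:
    """Stub returning a yes/no verdict. This toy version always says yes,
    mimicking the order bias the paper reports; in practice you would call
    a chat-completion API with CoT prompting here."""
    return True

def probe_faithfulness(pairs):
    inconsistent = []
    for a, b, adjective in pairs:
        forward = ask_model(f"Is {a} {adjective} than {b}? Think step by step.")
        backward = ask_model(f"Is {b} {adjective} than {a}? Think step by step.")
        # For a strict comparison of two distinct things, exactly one direction
        # should get "yes"; matching answers signal unfaithful reasoning.
        if forward == backward:
            inconsistent.append((a, b, adjective))
    return inconsistent

print(probe_faithfulness(PAIRS))  # -> both pairs flagged with this stub
```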
Tim Hwang: Yeah, absolutely. Vyoma, did you see things you were interested in? From the temperature point, how much should we believe these results? Maybe, by and large, reasoning traces are useful for understanding how models make decisions, and we shouldn’t be so scared. What do you think?
Vyoma Gajjar: Yeah, I agree with Nathalie on the temperature point. When it’s at that temperature, you’re telling the model to be more creative in its thinking. Should we use that as a basis to say CoT is not here to stay? I feel it is here to stay because it tells you what the model is going through. The other part is, companies that have come up with reasoning models are now looking into how to make CoT processes better. Right now, we see a lot of pattern matching rather than a generalized way to understand deep reasoning. But going further, what about a reverse CoT? Whatever a CoT has given you, go back and evaluate it again. Tell me if that Chain-of-Thought was right. There can be innovative ways researchers will answer this. I feel it is here to stay, and it should.
A short example: I moved recently and was looking for a sofa. I said, “I want a Japanese-style sofa with a table.” It started giving me “table, table, table.” I knew from the reasoning trace that it was spinning on the “table” thing. I didn’t want that. I said, “I want a side table.” It again told me, “No, it’s the table.” What I’m trying to say is that I understood I had to be specific: “I want an adjustable low-level side table.” I wouldn’t have done that if I couldn’t see it spinning, and I would have just been okay with it giving me a Japanese-style sofa someday. So CoT tells you to improve your prompt; it tells you something’s not right. Right now, CoT is also based a bit on your prompt; it doesn’t show you the model’s exact internal workings. But I feel it will evolve with time. People are working on it.
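Vyoma’s “reverse CoT” suggestion maps naturally onto a two-pass prompting loop: generate an answer with its chain of thought, then feed that chain back and ask the model to re-derive and judge it. Below is a minimal sketch under stated assumptions; `complete` is a hypothetical LLM call, stubbed here so the control flow runs as-is.

```python
def complete(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call; a real implementation
    would send `prompt` to an LLM endpoint and return its text."""
    return "VERDICT: FLAWED - step 2 assumes the side table is the dining table."

def answer_with_verification(question: str) -> dict:
    # Pass 1: answer with an explicit chain of thought.
    first_pass = complete(f"{question}\nThink step by step, then answer.")
    # Pass 2 ("reverse CoT"): hand the chain back and ask for an independent check.
    critique = complete(
        "Here is a chain of thought produced for a question.\n"
        f"Question: {question}\n"
        f"Chain of thought: {first_pass}\n"
        "Re-derive the answer independently and say whether the reasoning holds. "
        "Reply with VERDICT: SOUND or VERDICT: FLAWED, plus a reason."
    )
    return {
        "answer": first_pass,
        "verified": "VERDICT: SOUND" in critique,
        "critique": critique,
    }

result = answer_with_verification("Find a Japanese-style sofa with a low side table.")
print(result["verified"], "->", result["critique"])
```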
Tim Hwang: That’s one of the most interesting things—I never thought about it that way. CoT is useful for letting you know when the reasoning is definitely off, even if it’s not a good guide for when it got it right. It’s a debugging tool more than anything, which is a fun way of thinking about it. Kaoutar, I’d love your comment on Nathalie’s point: it’s funny these models have inherited all these cognitive biases humans have. Computers didn’t used to have those biases, but now we live in a world where that’s the case. As someone who thinks about hardware—which I envision as more structured—it feels like computers are now executing systems with weird, soft, emotional aspects. It’s a funny contrast to what we thought about computers 10 years ago. Computers used to be exact, zeros and ones, the opposite of biased.
Kaoutar El Maghraoui: Right now, because they’re learning with AI from data generated by us, they’ve inherited all our biases. I think it’s only natural to see these outcomes, and we have to figure out systematic ways to solve them. Vyoma mentioned some. Of course, this is a challenge: CoT is unable to generalize and accumulates errors the longer the reasoning chain is, leading to faulty logic despite correct answers. But as Vyoma mentioned, there are potential solutions. I also agree this is here to stay, especially the interpretability aspect. We need to amplify its importance.
We need things like self-correction modules. Claude, for example, has Constitutional AI, a reflection-based approach that helps self-correct. There’s structured step verification, and hybrid models where neuro-symbolic reasoning or approaches like Tree of Thoughts can help correct the logic. Combining statistical and logical AI with probabilistic reasoning and logical constraints—all these techniques need to be brought to the table. We need to combine neural-symbolic AI approaches to improve reasoning in LLMs, with self-verification and self-correction. This will keep CoT useful and reduce flaws. Hybrid reasoning frameworks will be necessary to improve reliability.
Tim Hwang: Yeah, we’ll definitely see that. It’s a funny outcome: we created CoT, which can be very emotional. You read a CoT like, “Oh, I’m trying to do the best I can at my job. Okay, let’s research this task.” Then we try to make it more computer-like again. It makes me think: people say, “He’s like a computer.” Maybe 20 years ago, that meant rigid and logical. Kids today might say, “He’s like a computer,” meaning irrational and emotional. It’d be funny if it flips what we mean.
Vyoma Gajjar: Imagine NotebookLM with CoT. You see the Chain-of-Thought and can push back: “No, don’t go there. Don’t think like this; change this.” That can be used as a training dataset. It’ll open new avenues for prompt engineering. People will learn to make prompt engineering more robust, scalable, and precise. This could help with customization. The way you interact with the model might be very different from how Tim or Nathalie interacts. Localizations and customizations might be interesting. You can inject cultural cues and preferences. So, Tim, I think we’re going to introduce even more biases while trying to make it different.
Tim Hwang: It’s going to be a hybrid world. On our Baidu segment, the narrative has been that Google is coming from behind—they should have captured the AI revolution but missed it, and now they’re catching up. Week to week, it feels like Google is really catching up. There are all these launches which are quite impressive. Google recently announced—almost a joke in AI now—a model called Gemini 2.0 Flash Experimental. It’s a model they had in beta for a small group but is now widely available. It’s an image gen model people can play with. This by itself might not be super impressive, but I wanted to use it to talk about one aspect: Google is touting that one reason its model is so good is that it incorporates “world knowledge” to make image generation better. Like many phrases in AI, “world knowledge”—what does that mean? I’ll start with you: What is “world knowledge,” and why is it important for AI generation, particularly in images?
Vyoma Gajjar: Correct. First, I want to answer the point: Is Google really catching up? That point we made...
Tim Hwang: Yeah, the hot take. It’s just vibes; I don’t have industry stats, so feel free to knock me down.
Vyoma Gajjar: No, I get it. I’ve been asked this many times. “Catching up” is subjective. My question is: Are these models going to surpass, or at least match, the creativity of already established models like Midjourney, DALL-E, etc.? Maybe they’ve only just arrived at the dinner table, but maybe they’re bringing really big products none of those players had.
On “world knowledge,” I feel they’re talking about deep integration with Google’s knowledge graph, with access to all the real-world data. Instead of just learning from images, they’re also learning from structured text. Imagine all the Google searches we’ve done, all the pictures I’ve posted about my sofa saying, “This is not what I want; this is what I want.” That has been added into that historically consistent world knowledge Google already has. It’s extremely important to create a model that’s more accurate in answering user questions.
Tim Hwang: Yeah, I see this as Google trying to use its advantages. Lots of people can train image gen models, but we’ve got the knowledge, so we have to put that to use.
Vyoma Gajjar: I would trust them. Even though I’ve been using other models, I’ll use them now because I know they might have more domain-specific, accurate data that I need.
Tim Hwang: Kaoutar, am I just operating on vibes? Is Google catching up, or is this table stakes? They’re just able to generate a model as good as everybody else now.
Kaoutar El Maghraoui: I think it’s table stakes. They’ve been working on it, so it’s just time for release.
Tim Hwang: Nathalie, you were laughing. I guess you agree?
Nathalie Baracaldo: I think at some point last year, I was so surprised and excited with every announcement. Now, every week something is happening. The space has reached a point where people are catching up, and it’s becoming a commodity. That said, I’m still impressed that we can say, “Change my tulips for wildflowers,” and be very focused on one part of an image. That wasn’t the case a few months back, and that makes me happy. I love seeing more models coming out. Not only Google, but many players will continue improving models, capabilities, and how we describe and get beautiful pictures. So yeah.
Tim Hwang: Yeah, for sure. It’s really hard to be in the AI business. You’re doing magical things never done before, and then six months later, people say, “Ah, what else you got?” It’s very hard to keep ahead. I agree; there’s almost announcement fatigue. What is the next big thing? Big announcements every week blend together.
Kaoutar El Maghraoui: The key question: Are they catching up, or are they really innovating? That’s what we need to focus on. You can catch up, see what others are doing, close gaps, mimic—a lot of algorithms are published. But are you bringing something new to the table that nobody else has thought about? Maybe we should start seeing trends: Who’s the real innovator, and who’s just playing catch-up?
Tim Hwang: Yeah, definitely. Kaoutar, you’re a harsh judge. Well, that’s all the time we have for today. Thanks for joining us. Nathalie, Kaoutar, Vyoma, it’s always a pleasure. Thanks to all the listeners. We’re trying something new this week: we’re interested in hearing what you’re interested in. If you’re listening on Spotify, please drop a comment. Let us know; we’ll keep an eye out and work your suggestions into future episodes. Flag anything you’ve seen that you want us to talk about. We look forward to hearing from you. As always, if you enjoyed this, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. We’ll see you all next week on Mixture of Experts.