NEO 1X Robot, OpenAI chips, The AI Scientist and the future of prompt engineering


Will prompt engineering ever die? In Episode 19 of Mixture of Experts, host Tim Hwang is joined by Kaoutar El Maghraoui, Kate Soule and Shobhit Varshney. Today, the experts chat about the future of prompt engineering, a new paper about The AI Scientist, 1X's NEO humanoid robot and OpenAI's in-house AI chips.

Will AI take over scientific discovery? Will everyone have at-home AI assistants? Why is OpenAI investing in chip production? Tune in for our experts' takes.

The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.

📩 Sign up for a monthly newsletter for AI updates from IBM.

Listen on Apple Podcasts and Spotify.

Episode transcript

Tim Hwang: My opinion is that prompt engineering is never going to die. It’s a forever thing.
Kate Soule: Anyone who’s worked with large language models has experienced some of the pain, dark art, black magic of... if I shout loudly enough at my model, maybe like literally if I type in all caps, maybe this time it will do what I’m asking it to do.
Tim Hwang: The creepy factor is big, but these robots are also pretty cool if you can get them to work.
Kaoutar El Maghraoui: I would love to have one actually in my home, cleaning dishes and cooking.
Tim Hwang: How many scientists are going to be out of a job in the next 10 to 15 years?
Shobhit Varshney: I’m just looking forward to a world where we start using the word “we” when AI is actually starting to do something meaningful for us.

Tim Hwang: All that and more on today’s episode of Mixture of Experts. I’m Tim Hwang, and I’m joined today, as I am every Friday, by a world-class panel of engineers, researchers, product leaders, and more to hash out the week’s news in AI. On the panel today: Kate Soule, a Program Director of Generative AI Research; Shobhit Varshney, a Senior Partner Consulting on AI for the U.S., Canada, and Latin America; and Kaoutar El Maghraoui, Principal Research Scientist, AI Engineering and AI Hardware Center.

As always on Mixture of Experts, we’re going to start with a round-the-horn question: Will prompt engineers even exist in five years? Kate, yes or no?

Kate Soule: No.

Tim Hwang: Shobhit, yes or no?

Shobhit Varshney: Not at all, man.

Tim Hwang: Okay, alright. And how about you, Kaoutar?

Kaoutar El Maghraoui: I think it’s going to evolve to a different role.

Tim Hwang: Okay, alright. Well, let’s get right into it. The “prompt” for this first story we want to cover today is that we’ve just had a slew of smaller, B-plot kinds of announcements from all the companies. They haven’t been the most prominent things, but together they form a pattern. Kate, you flagged this for us: a lot of companies have been working on prompt automation.

Anthropic announced a “Meta Prompt” system that helps generate prompts for you. Cohere is launching a “Prompt Tuning” feature that takes a prompt you have and improves it automatically. Google recently acquired a company called “Prompt Poet,” which is very much the same functionality.

This is a big deal. If you’re familiar with LLMs, a lot of work has gone into making a good prompt. The big thing is the future of taking the human out of the loop—the idea that you won’t need prompting anymore. Kate, as someone who threw this topic to us, can you explain for our listeners why that is important? What changes when that happens?

Kate Soule: Yeah. And I like what you did there, Tim: “the prompt for today.” Look, I think anyone who’s worked with large language models has experienced some of the pain, dark art, black magic of... if I shout loudly enough at my model—like, literally if I type in all caps—maybe this time it will do what I’m asking. It can be a frustrating process that doesn’t make logical sense. We’re all rational beings; ideally, there would be a rational, structured way to prompt these models.

I’m really excited to see work that tries to—not take a human entirely out of the loop—but take a human out of the loop of finding these specific phrases, tokens, words, and patterns that seem to be more effective for one given model to perform a task. Being able to search a broader space of natural language to identify how to frame a question for improved accuracy is going to be really powerful overall to improve productivity and reduce some of the stress when working with models.
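The kind of search Kate describes, trying candidate phrasings against a small eval set and keeping the one that scores best, can be sketched in a few lines. Everything below is illustrative: the `model_answer` stub stands in for a real LLM API call, and the eval set is a toy.

```python
# Toy stand-in for a real LLM call; in practice this would hit a model API.
# This hypothetical model happens to answer better with explicit instructions.
def model_answer(prompt: str, question: str) -> str:
    if "step by step" in prompt.lower():
        return "4"
    return "unsure"

# Small eval set of (question, expected answer) pairs used to score prompts.
EVAL_SET = [("What is 2 + 2?", "4"), ("What is 1 + 3?", "4")]

# Candidate phrasings of the same instruction, including the all-caps shout.
CANDIDATE_PROMPTS = [
    "Answer the question.",
    "Think step by step, then answer the question.",
    "ANSWER THE QUESTION.",
]

def score_prompt(prompt: str) -> float:
    """Fraction of eval questions the model answers correctly under this prompt."""
    hits = sum(model_answer(prompt, q) == a for q, a in EVAL_SET)
    return hits / len(EVAL_SET)

# Keep whichever candidate scores highest on the eval set.
best = max(CANDIDATE_PROMPTS, key=score_prompt)
print(best)
```

Real prompt-optimization tools search a much larger space of rewrites, but the loop is the same: generate candidates, score them on held-out examples, keep the winner.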

Tim Hwang: Yeah, for sure. Now, Kaoutar, in your response, you agreed that prompt engineering might not be long for this world, but you said the role will shift. Can you tell us more about what you’re thinking there?

Kaoutar El Maghraoui: Yeah, sure. There have been a lot of recent developments in prompt engineering leading to significant changes in how prompt engineers interact with LLMs, like Kate mentioned—things like meta-prompting from Anthropic. These developments shift the focus from crafting individual prompts to designing systems that guide the AI to adjust its own behavior. Prompt engineers may increasingly focus on creating frameworks for meta-prompting or refining the underlying logic. This creates a more robust role where engineers manage how prompts evolve in real-time.

Look at Cohere’s Prompt Tuner, for example. It enables users to fine-tune and optimize prompts for different applications. The implication is that prompt engineers may transition from manually crafting prompts to overseeing automated tuning systems. This democratizes prompt creation and could reduce technical barriers to entry, pushing engineers to focus on more complex, high-impact tasks where deep expertise is required, like designing industry-specific models or optimizations at scale.

Then there’s Google’s acquisition of Prompt Poet, which emphasizes automation in the generation and optimization of prompts. This further blurs the line between AI systems and prompt engineers. As these AI systems evolve, the engineer’s role may shift towards supervision, focusing on edge cases, creative tasks, or model-specific customizations.

The overall implication is a shift from a manual to a supervisory role. I don’t think we’ll completely remove the human from the loop, but there will be an increased focus on optimization and an expansion of skill sets. Prompt engineers will need broader skills, including model training, dataset curation, and integrating LLMs into broader AI pipelines. To sum up, prompt engineering is likely evolving from a hands-on manual role into a more supervisory one, focusing on higher-level design, optimization, and supervision of automated systems.
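The meta-prompting pattern Kaoutar describes is essentially a two-stage pipeline: the model first writes the task prompt, then executes it. Here is a minimal sketch with a hypothetical `llm` stub in place of real model calls.

```python
# Toy stand-in for a model call; a real implementation would hit an LLM API.
def llm(prompt: str) -> str:
    if prompt.startswith("You are a prompt engineer"):
        # Stage 1: the model drafts a task prompt with an input placeholder.
        return "Summarize the following text in one sentence: {input}"
    # Stage 2: the model executes whatever prompt it is given.
    return "A short summary."

def meta_prompt_pipeline(task: str, user_input: str) -> str:
    # Stage 1: ask the model to write the prompt for the task.
    generated = llm(f"You are a prompt engineer. Write a prompt for: {task}")
    # Stage 2: run the generated prompt on the actual input.
    return llm(generated.format(input=user_input))

print(meta_prompt_pipeline("summarization", "Long article text..."))
```

The supervisory role Kaoutar predicts lives in stage 1: engineers design and audit the framework that generates prompts, rather than hand-writing each one.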

Tim Hwang: Yeah, that makes a lot of sense. It’s interesting that the process happening with the movement to AI agents will also happen in the prompt space—rather than doing everything, you’re monitoring the system. Shobhit, you had your hand up.

Shobhit Varshney: Yes, I think prompts will get more and more personalized to the individual. Over time, a lot more context will automatically be pulled in. The center of gravity is moving towards hyper-personalization for the individual. The way a prompt expands into a meta-prompt will be super hyper-personalized to the context, the memory of everything I’ve done in the past.

Being a good prompter to these LLMs at work has made me a much better parent talking to my eight-year-old daughter. She just...

Tim Hwang: “Explain it clearly, think through it step-by-step,” you know.

Shobhit Varshney: Yes! I have to talk to my daughter saying, “Anya, you just turned nine, you are a big girl now,” and then I walk her through reasoning to get the answer I’m expecting, like, “No, I should not have ice cream before I sleep.”

Tim Hwang: Got it, right? Exactly. That’s the desired outcome.

Shobhit Varshney: Absolutely. And that’s a two-way feedback training. Now we’re at a point where, say, it’s 8 p.m., and if I say “Anya,” her response is going to be, “Papa, I’m almost done eating,” because she understands the pattern that when she’s eating slowly, I’ll check in. She has more context on how to respond to me. But if my wife calls her, her response will be different. So I think the hyper-personalization of these meta-prompts is the direction we’re headed.

Tim Hwang: Yeah, for sure. Kate, maybe to turn it to you before we move on: when we think about prompting with humans, we encode in language. What’s interesting is that the optimizations may use tokens that don’t look like normal grammar—it could be a random string that gets the best results. Do you feel like prompts over time will become more obscure to us because the optimal encoding for the model may not be human-readable? There’s a trade-off between optimization and readability. Your thoughts?

Kate Soule: Yeah, to answer that, it’s important to recognize two sides of innovation happening. One is improving our ability to prompt the models, but the other is improving the model’s ability to take structured, reasonable prompts. Instead of talking to a version of Shobhit’s eight-year-old daughter, can I talk to a software developer that understands structured inputs and provides structured responses?

If we only innovated on prompt optimization—creating new tokens while keeping the model frozen—then yes, we could get to non-human-readable prompts. But we’re also seeing, like with OpenAI’s structured outputs, more structure being baked into models to make interaction more standardized and systematic. Ultimately, that’s where real value gets unlocked, especially in agentic patterns. If we can focus on having very structured, formulaic ways to work with models—maybe not perfectly human-readable like storytelling, but systematic—I think that’s where we’re going to end up.
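The structured-output direction Kate points to usually means the application parses and validates the model's reply against a schema instead of reading free text. A minimal sketch, with a hard-coded `raw_response` standing in for what an API configured for JSON output would return:

```python
import json

# Hypothetical model reply; a real one would come from an API configured
# to emit JSON (e.g. a structured-output or JSON-mode setting).
raw_response = '{"sentiment": "positive", "confidence": 0.92}'

# The schema the application expects: field name -> required type.
REQUIRED_FIELDS = {"sentiment": str, "confidence": float}

def parse_structured(raw: str) -> dict:
    """Parse and validate a model reply against the expected schema."""
    data = json.loads(raw)  # raises if the model emitted non-JSON text
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"missing or mistyped field: {field}")
    return data

result = parse_structured(raw_response)
print(result["sentiment"])
```

This is the "software developer" interface Kate contrasts with storytelling prompts: systematic in and systematic out, which is what agentic pipelines need to chain model calls reliably.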

Tim Hwang: Yeah, it’s funny because what you’re describing is we’re reconverging towards code—structured language as a way to direct systems.

Kate Soule: Yeah, we started structured, created a bunch of unstructured, and now we’re like, “Wait, there were some good things there we should bring back.”

Tim Hwang: So I’m going to move us to our next topic. We spend a lot of time on Mixture of Experts talking about software and enterprise, but one of the most interesting viral AI moments recently was the launch of a humanoid robot called NEO from a company called 1X Technologies. They’re working on humanoid robots designed for home assistance. The demo shows a robot helping around the home—cleaning dishes, cleaning up.

The question is, how much of this is reality? How much is a cool demo? And most importantly, would you buy one? Kaoutar, I’m curious about your thoughts. Did you see the demo? What did you think? Do you think something like this will be a reality? There are practical bits-and-atoms questions about hardware affordability I’d love your take on.

Kaoutar El Maghraoui: I would love to have one actually in my home, cleaning dishes and cooking—someone who spends an hour on tasks I hate the most. The demo was very impressive. 1X is among the most prominent companies in the emerging field of humanoid robots. But will they become a reality or remain a pipe dream? Humanoid robots have been a focus of science fiction, but transitioning comes with significant challenges.

The argument for humanoid robots is that they can fit into environments designed for humans, use existing tools, and interact naturally. However, there are several challenges. First, mobility: building a robot with human-level dexterity and mobility is very difficult. While there’s progress, we’re far from a robot performing all human tasks autonomously. Technologies like soft robotics and advanced actuators are making strides but aren’t there yet.

Second, energy efficiency: these robots require significant power, limiting practical use. NEO and similar projects are working on efficiency, but battery life and energy consumption are still bottlenecks.

Third, cognitive and social interactions: beyond physical tasks, robots must navigate human life complexities—interpreting social cues, responding appropriately, making real-time decisions. Developing AI capable of this is an ongoing research area.

Fourth, economics: building something affordable, versatile, and reliable is a major hurdle. For many applications, simpler, specialized robots are more efficient and cost-effective. The complexity and cost of humanoid robots limit adoption to niche markets.

So, what’s the reality versus the long-term vision? We’re in a transitional phase. Existing prototypes are far from ubiquitous, but demos show promise. It’s not just a technological pipe dream; I think it’s going to happen. But for full realization, it’s going to take years, if not decades, before they become a common reality.

Tim Hwang: Yeah, that functionality gap is interesting. People might purchase these, but if there’s not much they can do, they could end up like lonely Pelotons—expensive hardware sitting around. It’s funny because here it’s a humanoid. Kate, Shobhit, are you more skeptical, or do you agree we’ll see this in our lifetime?

Shobhit Varshney: I’m a big geek; I’ll buy stuff I think is awesome. So I’m...

Tim Hwang: You’re going to have the Peloton NEO robot in your house.

Shobhit Varshney: I feel the same argument applies as with models: one massive, stunning model that can do everything versus a smaller set of niche models for specific use cases that are more efficient. I’m in the camp of preferring a device that helps with a particular task incredibly well. For example, I use the Roborock S8 MaxV Ultra—a high-end robot that vacuums, mops, cleans itself, and dries itself. Specialized tools augment what humans aren’t good at; that’s the future direction in the short run.

It’ll take a while to solve all the constraints for a humanoid replica. In the next five years, specialized tools that do a particular task incredibly well, are cost-optimized, and handle repetitive work will dominate. By the time we get a humanoid that solves cost, flexibility, and dexterity issues... Kate, do you think the same?

Kate Soule: I completely agree. Model specialization has progressed with the same trends. It also reminds me of the story where if you asked someone in the horse-and-buggy days what they wanted, they said a faster horse. Then Ford released the first cars. We’re in a similar scenario: people want more human time, so they think of a humanoid robot. But really, can we rethink how to make humans more superpowered, not just create more humans we don’t have to worry about feeding?

Shobhit Varshney: That sounds more like how we solved the dishwasher paradigm. We figured out an optimal way to wash dishes that does an incredibly good job at a low price point. We changed the human workflow. We didn’t optimize the human action of rinsing a dish; we found a better way to solve the niche use case. I’m with you: the human workflow has to change, and then we optimize. Specialized machines that do a particular task really well will come before a general humanoid.

Tim Hwang: Yeah, and you can’t discount the creep factor. It’s a bit spooky to have a large humanoid in the house. That leans in favor of specialized applications that don’t raise that fear. We’ll see if 1X can pull this off.

Kaoutar El Maghraoui: It’s an interesting development. It comes down to what people can consume. Specialization versus generalization is always a concern. But if we can combine both, that would be great—like what LLMs are doing: having large models for variety but specializing them for tasks. Could we have humanoid robots that do various tasks, but you press a button to focus on cleaning the dishwasher or the pool? A subset of the model specialized within the humanoid? That would be cool.

Tim Hwang: Yeah, ultimately, the humanoid robot will be the one maintaining all the smaller robots. Robots all the way down.

Kaoutar El Maghraoui: It’s like a hierarchy.

Tim Hwang: Exactly.

Shobhit Varshney: Kaoutar, the way you framed it, you’re looking for a Transformer robot that turns into a vacuum cleaner to do one job really well. That’ll be the world we live in.

Kaoutar El Maghraoui: That would be cool. Yeah.

Tim Hwang: I’m going to move us to our next topic. There’s a fascinating paper shared by a friend of the pod, Kush Varshney (a recurring guest). I love how ML papers pick dramatic names; this one is called “The AI Scientist,” with a long title about using AI to automate end-to-end science. It’s a proposed system pushing the limits of whether LLMs can help with scientific discovery in a fully automated way.

This is a big deal. Technological breakthroughs are critical for societal progress. The hope is to augment and accelerate the research process by overcoming the bottleneck of limited brilliant minds. I always worry these papers look too good, with ambition too great. Kaoutar, you looked at this paper. Do you feel they hit upon something new, or will AI’s role in science look different?

Kaoutar El Maghraoui: I enjoyed reading the paper. It presents a nice framework for an automated AI scientist where LLMs generate research ideas, write code, run experiments, visualize results, and write papers. They showed interesting papers generated by this AI scientist. It made me worry: what’s going to happen to scientists? What about conferences? Will papers be generated by real scientists or LLMs?

These advancements could significantly impact scientific discovery, reducing cost and increasing speed, especially as augmentation for human research. The controversy surrounds methodological concerns, particularly the reliance on automated review systems to evaluate scientific quality. That raised concerns for me: can such reviews truly assess novelty, creativity, and rigor? Another skepticism is whether AI can fully replace human intuition in scientific discovery, especially in abstract or interdisciplinary fields. AI isn’t there yet for mimicking human intuition across multiple fields.

There are also broader ethical and social implications for automating research. But from a scientific perspective, it’s a nice piece of work with many implications, including ethical concerns and the automated review process.

Tim Hwang: That’s right. Kate, as a researcher yourself, how do you feel about this? There’s a tendency to push back, like engineers saying, “They’ll never code as well as I do.” Are these experiments fun toys? Would you use them? Would you read AI-produced papers?

Kate Soule: Well, I’m honored you call me a researcher; I work with amazing researchers at IBM Research. As a non-researcher, this might be naive, but I question if there’s something LLMs can do well in understanding past literature on a broader scale than humanly possible—analyzing and finding similar methods or approaches for new, related problems. Kaoutar, any thoughts on that?

Kaoutar El Maghraoui: I agree. They might discover things scientists can’t by pulling from a wide variety of sources. But we still need a human in the loop to validate, verify experiments, and take them to the real world. We can’t just apply LLM results directly. As systems get better, we’ll need verification for scientific discovery.

Tim Hwang: Yeah, some researchers think about the “burden of knowledge”—there’s more and more knowledge and papers. The hope is these systems can find connections between papers that people miss, reducing it to a search problem. What’s interesting here is the idea of the AI then running the experiment. How far beyond search do we need to go?

Shobhit Varshney: Yes. Just like any enterprise workflow—we help clients with R&D for new food formulations, perfumes, car products, battery research—you figure out the steps needed. When you hire a brilliant intern from MIT, you give them a specific task to augment a senior researcher. We’ll incrementally see AI helping on specific tasks in the research spectrum end-to-end. I don’t think it’ll be completely taken over by AI; it’s augmenting intelligence, not replacing. The good tandem between humans and AI will get better at what to request.

For example, creating a knowledge graph across research papers to find novel ideas from overseas. I’m interested in a future where we have AI representatives collaborating. Imagine a team of researchers with their AI counterparts in Israel talking to counterparts in the U.S., exchanging ideas and coming up with a new theorem. I’m looking forward to a world where we start using the word “we” when AI is actually doing something meaningful for us.

Tim Hwang: Well, one big drama in academia is who’s the first author. In the future, you might struggle with an LLM collaborator taking credit. That drama would be funny—humans and AIs.

Kaoutar El Maghraoui: There will be competition between models writing the best papers, AI conferences generated and reviewed by AI.

Tim Hwang: That’s right. Reviewer number two will be an AI, unjustly turning down your paper.

Shobhit Varshney: There are aspects of research researchers don’t want to do that AI will help with. For example, we’re helping utility companies file cases to increase electricity prices. They have to make a case for why to increase by X cents. We help create the submission package by researching all past submissions (publicly available). Then, knowing who’s on the panel, we look at every question they’ve asked. If I’m a reviewer, I typically ask about ethical concerns. Each reviewer has a pattern. We reverse-engineer what judges would ask and change the documentation to address those proactively. Then, for the in-person presentation, we prepare the witness based on the questions the person typically asks. AI will be really helpful in augmenting these aspects. Do you think that’ll be helpful, Kaoutar?

Kaoutar El Maghraoui: I think so, definitely. As humans we’re limited; augmented by AI, we’ll be superhumans, hopefully in the right direction.

Kate Soule: It gets back to what we were talking about: are we going to have AI try to become its own researcher, replicating humans? Or have AI specialize in parts of the process, running it faster and better to support humans in new, efficient workflows?

Tim Hwang: The news story of the week is that it’s rumored OpenAI will invest in producing its own in-house chips to support its work, partly through collaboration with Apple. This has been rumored, but now it looks more certain they’re investing big. Kaoutar, you’re the natural person to ask: why would OpenAI do this? Semiconductors are wildly expensive and hard to pull off. China has been trying to reproduce the Taiwanese semiconductor industry with moderate success. Why is OpenAI making such a big bet?

Kaoutar El Maghraoui: I think Sam Altman, OpenAI’s CEO, has made acquiring more AI chips a top priority. He’s publicly complained about the scarcity. Given rising chip costs, supply chain challenges, and the need for specialized hardware optimized for OpenAI models, this is a strategic move. Designing their own chips could enable OpenAI to tailor hardware for their specific workloads, improving performance, efficiency, and scaling potential.

Of course, there are financial challenges given the complexity of semiconductor design and manufacturing. But by creating in-house chips, OpenAI can reduce reliance on third-party manufacturers like NVIDIA, which controls about 80% of the AI hardware market. This gives them more control over the supply chain and allows specialization. While semiconductor development is challenging and costly, this move could enable OpenAI to differentiate its hardware and scale operations effectively. I think they’ve thought a lot about this; it’s a strategic move to diversify.

Tim Hwang: Totally. It’s wild that what’s cheaper than trying to get H100s is building your own semiconductor supply chain—a crazy thing to say. Kate, Shobhit, thoughts? It’s high-risk. Do we think it’ll be successful?

Kate Soule: Certainly high-risk. I want to emphasize Kaoutar’s point about opportunities in AI and hardware co-design. Developing models and the hardware that runs them in tandem can unlock new performance levels, efficiencies, and cost savings. There’s tremendous opportunity. So it makes sense to put some skin in the game, given the ways they could innovate with better control over hardware design.

Tim Hwang: Yeah, for sure. Shobhit, maybe you’re ideal to wrap up this section and close the episode. You think about what this means for business and enterprise. The semiconductor stuff is abstract, but as Kate says, there are practical implications for our experience of these technologies. What does the everyday look like if OpenAI is successful here?

Shobhit Varshney: NVIDIA is a great partner with us; we have joint clients. Yesterday, I spent the day with NVIDIA discussing work with enterprises beyond hyperscalers. They detailed their intellectual property and differentiation. They have a significant moat not just on the chip level but in architecting the entire end-to-end flow. The total cost of ownership—going from a massive data center to one box—the wiring in existing data centers is more expensive than that one NVIDIA box.

Jensen Huang made a famous statement: even if competitors (who are also customers) made free chips, the total cost would still be lower on NVIDIA. They’ve done an incredibly good job driving higher efficiencies and throughput—5x, 10x on the same footprint. It’ll take a while for a company like OpenAI to replicate everything. It’ll distract them from their core business. They should focus more on adding intelligence—what Ilya Sutskever is doing with SSI, raising a billion dollars, what Claude models are doing with responsible AI. There’s still more focus needed on solving those problems for enterprises.

The cost will come down over time, just as computing costs on NVIDIA have plummeted over the last decade. OpenAI’s focus should still be on problems that need resolving before they vertically integrate end-to-end.

Tim Hwang: Yeah, it’ll be fascinating to see. This won’t be the last time we talk about it. I’m not sad we ran out of time; we’ll pick it up in the future. That’s what we have time for today. So Shobhit, Kate, Kaoutar, thanks for joining us on the show. And for all you listeners out there, if you enjoyed what you heard, as always, you can get Mixture of Experts on Apple Podcasts, Spotify, and podcast platforms everywhere. We’ll see you next week.

Stay on top of AI news with our experts

Follow us on Apple Podcasts and Spotify.

Subscribe to our playlist on YouTube.