The future of agents, AI energy consumption, Anthropic's computer use and Google watermarking AI-generated text

Watch the episode
Episode 27: The future of agents, AI energy consumption, Anthropic's computer use and Google watermarking AI-generated text

Agents, agents and more agents! In episode 27 of Mixture of Experts, host Tim Hwang is joined by Volkmar Uhlig and Vyoma Gajjar. First, the experts chat about Marc Benioff’s spicy tweet and what it means for the future of AI agents. Next, the conversation turns to the energy demands of powering AI models and whether we should be concerned. Then, the experts debrief Anthropic’s release of computer use. Finally, they explore Google’s integration of SynthID Text into Gemini to help watermark AI-generated text and question whether this feature is needed. Tune in to learn more on this episode of Mixture of Experts.

Key takeaways:

  • 0:00 Intro
  • 0:35 The future of agents
  • 5:23 AI energy consumption
  • 13:44 Anthropic’s computer use
  • 23:44 Google watermarking AI-generated text

The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.

📩 Sign up for a monthly newsletter for AI updates from IBM.

Episode transcript

Vyoma Gajjar: Thank you. I do not think that it is Clippy 2.0. Microsoft Copilot has been one of the pioneers in the field of code translation, extraction, and code generation.

Tim Hwang: Volkmar Uhlig is Vice President, AI Infrastructure Portfolio Lead. Volkmar, welcome to the show. What do you think?

Volkmar Uhlig: I think the jury is still out. I rate it a 2.5.

Tim Hwang: All that and more on today’s Mixture of Experts.

I’m Tim Hwang, and welcome to Mixture of Experts. Every week, we bring you the world-class analysis, debate, and thinking you need to navigate the rapidly changing universe of artificial intelligence. We’ve got a discussion about nuclear power and AI using computers, but first, we really want to talk about the rumble happening in the “agent jungle.”

The question is: Is Copilot just like Clippy 2.0? This was inspired by a spicy tweet from Marc Benioff. But more generally, here on Mixture of Experts we want to take a look back over the last few months. The fact is, Salesforce has launched an agent platform and Microsoft has launched an agent platform—2025 is shaping up to be a battle over agents, specifically agents in the enterprise.

I want to spend a little time talking about that and giving you listeners an intuition of what to expect over the next 12 months or so. Volkmar, I’ll turn to you first. Now that there are going to be so many different agent platforms to choose from, do you see different companies taking different approaches to offering these technologies to the enterprise? What do you think are the big competitive dynamics playing out here?

Volkmar Uhlig: All companies are trying to experiment. We are in a world where we are slowly moving out of the training-wheels phase, where the systems are supervised by humans. Right now, the system is in the passenger seat—that’s why it’s the “co-pilot” and not the “pilot.” At some point, I think there will be a switch-over. The systems will become more powerful and trustworthy, and then the system becomes the pilot, and the user is the co-pilot. At some point, we can get the co-pilot out of the seat, and the systems can be fully autonomous.

So, I think we are in a progression of how the technology is evolving, but at this point in time, human eyes are required on the systems. The big experimentation right now is how these user interfaces look. We kind of know how fully autonomous systems look—there’s not even a screen; in cars, there’s no steering wheel anymore. But in the systems today, we are experimenting. If you look at Microsoft, they integrated it sometimes as a chat agent on the side, sometimes directly in the applications. Apple took a different approach; Salesforce is taking different approaches. So, everybody is experimenting with the user experience at this point in time where the technology still has its training wheels on. We are going through the training wheel phase.

Tim Hwang: Yeah, for sure. It’s so interesting how much competition is happening just on the level of the interface. We don’t even know how to effectively interact with these agents.

Another angle to this question is that for the outside observer, they look at this stuff and occasionally think, “This is just Clippy 2.0.” Back in the 90s or early 2000s, we were just talking to a paperclip on a word processor. But it sounds like one reason you think this is genuinely different is that there’s a lot of experimentation happening under the hood as well. Is that right?

Vyoma Gajjar: That is correct. All the legacy information that has been gathered since Clippy... Microsoft has been a great company operating for years. Imagine the amount of data it has gathered: the Clippy data, as everyone’s claiming, plus all the other information from platforms like GitHub, etc. Imagine feeding all of that information into a large language model and making your day-to-day life much better. I feel that is what we are aiming at.

There are a couple of solutions we want from this. The first is enhanced productivity, and I think Microsoft Copilot helps you do that. It also gives us a lot of our free time back to do something more productive and creative.

Tim Hwang: Yeah, that’s great. Volkmar, I know your background is in autonomous vehicles. This model—where agents are the next level of autonomy, and we’re getting people to trust the technology enough to take it to the next level—is a really interesting set of problems we’ll see play out in this space.

Volkmar Uhlig: The nice thing here is your life doesn’t depend on it.

Tim Hwang: Yeah, that’s right. All that will happen if this technology fails is that code breaks or you send a really awkward email to someone. The stakes are a little bit lower.

Well, perfect. One topic I really wanted to touch on, moving to the next segment, is AI and energy. A few weeks ago, news leaked that Microsoft was considering restarting the Three Mile Island nuclear power plant. All current projections suggest that future models will need gigawatts of power in a data center to run. We’ve danced around this topic in previous episodes, but I wanted to tackle it head-on. How are we thinking about dealing with the environmental impact of these models and the energy required to unlock all of their potential?

As someone who’s excited about the technology but also concerned about climate change, this topic is near and dear to my heart. I’m interested in the approaches people are thinking about. Volkmar, maybe I’ll start with you. I’m curious about how IBM is thinking about it, but also how you’re seeing the space evolve around this tricky problem.

Volkmar Uhlig: Sure. In general, at IBM, we are trying to leave a green thumbprint on the planet. We’re trying to be conscious of the environmental impact. The power consumption for data centers right now is about 1.5% of total power production in the United States, so it’s tiny.

With the expected growth in AI, the projections are not really friendly—assuming H100s at 700-900 watts, and the next ones from AMD at 2,000 watts. I think we have not yet done the projections for technological improvements. I do not believe we will see these high-power cards in the long run; it’s just a moment in time. But even if we stay on that projection, the total power consumption is going to increase from 1.5% to 4%.

Okay, well, compare that with the population growth of the United States right now. That’s nothing. The added demand from population growth is already bigger than what we are adding here in total data center power consumption. So, I think the moment right now is that there is a concentrated interest in a very rapid build-out, and we are actually putting the discussion about what constitutes green and efficient energy back on the table.
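To put these figures in rough perspective, here is a back-of-envelope sketch of the numbers Volkmar cites. The US generation total and the idea of accelerators running around the clock are illustrative assumptions, not figures from the episode.

```python
# Rough back-of-envelope sketch of the data center share numbers discussed above.
# The US generation total and GPU duty cycle are assumed, illustrative values.
US_ANNUAL_GENERATION_TWH = 4_200        # assumed: approx. US electricity generation per year
SHARE_TODAY = 0.015                     # ~1.5% of total, per the discussion
SHARE_PROJECTED = 0.04                  # ~4% projected

H100_WATTS = 700                        # lower end of the 700-900 W range mentioned
HOURS_PER_YEAR = 8_760

added_twh = US_ANNUAL_GENERATION_TWH * (SHARE_PROJECTED - SHARE_TODAY)

# Energy one always-on H100-class accelerator draws in a year, in TWh.
gpu_twh_per_year = H100_WATTS * HOURS_PER_YEAR / 1e12

print(f"Added demand: {added_twh:.0f} TWh/year, "
      f"roughly {added_twh / gpu_twh_per_year / 1e6:.0f} million H100-class GPUs running 24/7")
```

With these assumed numbers, the extra 2.5 points of share works out to roughly 100 TWh a year, on the order of 17 million accelerators running continuously, which is why the conversation turns so quickly to generation capacity.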

I do not think that has anything to do with AI, but it’s actually a key moment, a tipping point, where we can have a conversation about nuclear power in the United States. I’m really excited about that because this is one of the cleanest power sources. Tech companies trying to utilize nuclear power in a very careful, orchestrated way is a good thing. If the conclusion is we still shouldn’t do it, then that’s a consensus for people who have these power plants in their backyard. But I think the discussion needs to be rational, and over the last 50 years, it was irrational.

Tim Hwang: Yeah, for sure. That’ll be the most interesting thing. So often in AI, the AI itself isn’t the thing, but it triggers a bigger discussion, which is fascinating.

Vyoma, you work a lot with customers and clients. Is the environmental discussion popping up? Are clients raising it, or are people looking for solutions saying, “I want you to deliver this, but we have to make sure the emissions are good”? I’m curious about what you’re seeing on the front lines.

Vyoma Gajjar: Yeah, of course. That’s a good question. In 2023, we were just getting up to speed with this technology; people wanted to know more about it. But in 2024, we see that so many of our clients want to make it much more sustainable.

As you see, companies like Microsoft, figures like Sam Altman, and others are investing in companies such as Oklo. Google and Amazon have their own ways of investing in nuclear plants. They are trying to make this more sustainable and avoid the lag. If models run on nuclear energy, they run much faster and more seamlessly. There are fewer chances of them breaking in the middle, forcing you to rerun pipelines that take hours of compute and resources.

That is something we are making clients much more aware of. I was at a client location two weeks ago telling them that 15 to 20% of our electricity comes from nuclear plants, so that’s something we have to look into. The government is also helping through the Inflation Reduction Act, giving more tax credits because, as mentioned, we have a much better structure around it now. The technology has evolved, and if we trust it, we should be doing much better.

One thing I wanted to add is that not everyone wants to leverage large language models. People are pivoting towards smaller models that do just the job they need, through techniques like fine-tuning or prompt tuning. That is also a trend I’m seeing nowadays.

Tim Hwang: Yeah, for sure. I think you and Volkmar represent two sides of a very interesting coin. The argument you just made is that customers are thinking about smaller models to reduce their energy footprint. Volkmar also says the projections are based on the idea that current chip energy consumption will last forever, but the next generation will likely consume a lot less.

There’s an interesting interplay: Does the model need to consume as much energy, and will the hardware become more efficient? I could see a world where Volkmar’s prediction doesn’t come to pass for some time, so customers increasingly want smaller models. I could also imagine a breakthrough where the next generation of boards is so energy-efficient that people run the biggest model because it costs less energy. It’ll be interesting to see that play out. Do either of you have an impression of what’s going to hit first?

Volkmar Uhlig: The moment you have something so dominant in the market that costs so much money but has a huge upside potential, innovation will take place. We are already in a perfect market for inference; it’s a commodity. You pay by tokens, so the race to the bottom is on. The race to the bottom is across different disciplines: I can make smaller models that run faster, I can make faster inference, or I can produce power more cheaply.

I’m expecting that each participant in this market—because it’s such a big market, 2-3% of total power consumption in the US is billions of dollars at stake—will innovate. The model people will innovate on the models, the hardware people will innovate on the hardware, and the power plant people will innovate on the power plant. Overall, we are better off because now there’s a very specific problem that radiates into the rest of the economy. If you can suddenly make power at half the cost, that’s wonderful; it will make everything cheaper.

Tim Hwang: Yeah, there are other reasons we want to do that. Exactly. Volkmar, this is the time you should get back to the Bay Area startup idea.

Volkmar Uhlig: Yeah, exactly. Have you considered getting into fusion?

Tim Hwang: There you go. I’m going to push on to our next topic. Anthropic just last week launched a new feature called “Computer Use.” The basic premise is simple and fun: the idea that your AI agent will be able to take over your mouse, pilot your cursor, and do things for you as if it were a user on screen.

This generated all sorts of funny stories. One was that during testing, the Computer Use feature would occasionally get distracted. It would go off-task and pause to look at photos of Yellowstone National Park before continuing. It’s like these models simulate actual human behavior in funny ways.

But I want to start with the business question. Vyoma, I’ll toss it to you: Why is Anthropic working on a feature like Computer Use? Is it just a cool demo from a research lab, or is it actually connected to what they need to do as a business?

Vyoma Gajjar: Look at Anthropic, look at agents—everything these companies are trying to do is create a symbiotic relationship between humans and machines. In this case, I think Anthropic is trying to do that.

I feel that with the Claude models coming into play, they are trying to help augment our behavior and make our lives better and more productive. I was just speaking to my mother yesterday; she said, “I need to book this ticket, help me,” and I was in the middle of a meeting. Imagine if she had this Computer Use model. It would help so many people with training, enablement, and people with disabilities. It has a social impact angle that goes unseen, and I feel that’s what the market and clients want in the future. It has great potential.

Tim Hwang: Yeah, for sure. Volkmar, this is fun because it connects to what you were talking about earlier regarding innovation on the interface level. We invented GUIs and operating systems to make it easier for humans to interact with machines. Now, we have this funny situation where the machine is taking over that interface to pilot the machine. It’s a funny historical development. I’m curious how this fits into your earlier thoughts about all the innovation we’re seeing on the interface side.

Volkmar Uhlig: Yeah, so when I looked at their demo video, it felt kind of useless at first. I thought, “Why?” But I think there is a certain level of smartness behind it.

If a computer is interfacing with a computer, there are much better ways to do it. They use the browser; there’s an engine, it’s JavaScript. I can just directly hook into a JavaScript engine. I don’t need to render something into pixels. That rendering effort and then “un-rendering” it is just insane. Computer-to-computer interaction happens through APIs.

Now, I think we are seeing something very interesting emerging: the API to a computer is becoming the English language. That’s what large language models do. I talk to you in English, and you interface with the outside world. If you look at what ChatGPT is doing, it’s creating a Python script to automate a task, pulling data from the internet, converting it to JSON, and giving you an answer back. It’s the translator in the middle.
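As a toy illustration of that translator-in-the-middle pattern (fetch structured data, reshape it as JSON, answer in plain English), here is a minimal sketch. The endpoint and field names are hypothetical placeholders, not a real service or anything referenced in the episode.

```python
# Toy sketch of the "translator in the middle" pattern: the model writes and runs
# a small script that pulls structured data and phrases the result in English.
import json
import urllib.request

def answer_question(city: str) -> str:
    # Hypothetical endpoint and fields, used only to show the shape of the flow.
    url = f"https://api.example.com/weather?city={city}"
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read())          # structured payload in, JSON out
    # Turn the JSON into a plain-English answer for the user.
    return f"It is {data['temp_c']} degrees C and {data['conditions']} in {city}."

# print(answer_question("Boston"))  # not runnable as-is: the endpoint is a placeholder
```

A real agent generates and executes code like this on the fly, which is the sense in which plain English becomes the API and the script is just plumbing.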

So, I think the ability to interface with human perception—the visual domain, not just text-based or auditory—is key. Suddenly, if I can understand the visual domain a human is consuming, I can interface with that.

If I were a business doing what Anthropic is doing, my guess is they’re probably looking at automating development processes and debugging. The demo is just showing, “Hey, look, we can do this.” But if you convert this into economic value, it’s probably in testing, quality control, and QA of software development, which employs millions of people today and can be automated. That’s the direction I would take this. It’s not about replacing machine-to-machine interaction but doing what the human is doing: “Are all my buttons correctly aligned? Is my text formatted correctly?” Then it makes sense.

In that realm—quality control, potentially data generation where you visually inspect if your code generation was correct, or if the web page renders correctly in all browsers—that’s where I can see this going.

Tim Hwang: Yeah, that’s really interesting. It’s kind of a debugging thing. It’s fascinating that their stated reason for releasing it isn’t really the ultimate business purpose.

One angle, which I don’t know if you buy, Volkmar, is that we don’t live in a world with perfect APIs. You could imagine these models being helpful for facilitating interactions when there’s no clean API for a system to talk to another system.

Volkmar Uhlig: I don’t think you would do this in the visual domain, rendering something in a browser on a laptop. I think it’s still a crazy way to do it. It’s just so inefficient to convert 10 characters of JSON into a million pixels and then try to understand that.

I think there will be a different layer. Each of these layers has value. You could also have the code generated for the API by the large language model. But you can go one layer up: I run a JavaScript engine, and the next layer up is I render the output in a web browser and read the pixels. Or, I could just read the DOM; they could have just read the DOM instead of converting it to pixels. That’s why my immediate reaction was, “Oh, this is kind of weird.”
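To make the contrast concrete, here is a minimal sketch of the two access paths Volkmar is comparing, using Playwright as an assumed example tool (it is not mentioned in the episode): reading the page’s DOM text directly versus rendering the page to pixels for a vision model to interpret.

```python
# Two ways for software to "see" a web page, assuming Playwright and its browsers
# are installed: read the DOM directly, or render to pixels and interpret the image.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")

    # Path 1: read the DOM text directly, a few kilobytes, no rendering needed.
    dom_text = page.inner_text("body")
    print(dom_text[:200])

    # Path 2: render the page to pixels; a vision model would then have to
    # "un-render" this screenshot back into text and layout.
    page.screenshot(path="page.png", full_page=True)

    browser.close()
```

The DOM path hands a model a small amount of text, while the pixel path produces an image that must be interpreted back into text and structure, which is the round trip Volkmar calls inefficient for machine-to-machine work.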

I think from a quality control perspective, that’s huge. Then you can also say, “Please judge if this interface is better than that interface.” Suddenly, you can do experimentation. That’s where the true value comes if you can actually understand a screen.

Tim Hwang: Well, we have a bit of a difference of opinion between you and Vyoma. Vyoma, you made an argument earlier that this is amazing for interfacing with agents for your mom. Volkmar is taking a very technical approach—there are more efficient ways of doing what Computer Use does. But you’re making an argument that it might help people understand and interface with these systems, even if it’s technically less efficient. Do you agree with that?

Vyoma Gajjar: Yeah, there are two caveats to this. Right now, we belong to the tech space; that’s what we do day in and day out. When I go out and talk to clients, they have not even embarked on their AI journey. They’re still working with traditional legacy models and systems and don’t even know what AI does or where to go from here.

To onboard these clients and use cases, I feel this is a great starting point to show them the value and get them excited. One use case I’ve seen is with people retiring who have a lot of information about COBOL or legacy network systems. Where does all this legacy system information go? Companies are concerned about how to reuse this information before someone retires and augment it into new systems.

Imagine if you have a technical aspect like Computer Use that can look at logs or network issues from the past few years and show how to embed them into new software. It helps people understand that this isn’t trying to replace them but to make their lives easier and bring back all the lost information. So, code translation and code understanding are great use cases, and so is validation testing. For example, if someone built a decades-old COBOL function, it could tell you step-by-step what is going on and how it will proceed, broken down into multiple steps.

Tim Hwang: That’s great. We’ll have to see how this evolves. I guess we’ll have a long bet on whether this ends up being a debugging feature or a user-facing feature.

The final story I wanted to focus on today came from Google. They announced an advancement they were working on called SynthID Text, integrated into Gemini. The idea is to help watermark AI-generated text. If you’re familiar with this space, the traditional problem is that watermarking text often forces model outputs into patterns that are not great for solving actual problems. Their claim is that this methodology is better because you can identify AI-generated text without compromising quality, accuracy, creativity, or speed.

Vyoma, I’ll kick it over to you first. Why is something like this important? Do we need watermarking for text? What’s it for?

Vyoma Gajjar: Let me answer these questions one by one. We do need watermarking for text, and I know it is quite controversial to say that. Google has been very bold to at least come up with this product and be so vocal about it. Other companies have been experimenting; I know OpenAI has, but they haven’t brought it out publicly yet. Some companies fear people will stop using their models because of the watermark—writers might think, “Oh, now I’ll be caught.”

But I feel watermarking is not there to catch you; it creates an ethical standard. Everyone is trying to move towards some regulation: if X tokens are generated by Y model, then this is how they should be watermarked. There should be some logging on top of it. This brings a lot of confidence to clients and people that whatever model they’re using, or whatever text has been generated, has some marks or metrics attached.
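As a rough illustration of how sampling-time watermarking can leave a detectable statistical signature, here is a toy green-list sketch. It is a generic simplification for intuition only, not Google’s actual SynthID Text algorithm.

```python
# Toy "green-list" watermark: bias generation toward a keyed subset of the
# vocabulary, then detect the watermark as an unusually high green-token rate.
import hashlib
import random

VOCAB = ["the", "a", "model", "text", "water", "mark", "token", "output"]

def green_list(prev_token: str, fraction: float = 0.5) -> set:
    """Deterministically pick a 'green' subset of the vocabulary, keyed on the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))

def generate(length: int = 50, bias: float = 0.9) -> list:
    """Generate toy tokens, preferring green-list tokens with probability `bias`."""
    tokens = ["the"]
    for _ in range(length):
        green = green_list(tokens[-1])
        pool = list(green) if random.random() < bias else VOCAB
        tokens.append(random.choice(pool))
    return tokens

def green_fraction(tokens: list) -> float:
    """Detector: fraction of tokens that fall in the green list keyed by their predecessor."""
    hits = sum(t in green_list(prev) for prev, t in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

if __name__ == "__main__":
    print("watermarked:", green_fraction(generate(bias=0.9)))   # well above 0.5
    print("unmarked:   ", green_fraction(generate(bias=0.0)))   # close to 0.5 by chance
```

Running it typically shows the biased sequence landing well above the roughly 50% green rate expected by chance, which is the kind of statistical signal a detector looks for; a real scheme works on the model’s token probabilities rather than a toy vocabulary.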

I like this angle because I work heavily in AI ethics, standards, and policies. This topic comes up every other day: “How do I know this generated text is right or wrong?” Teachers come up to me and say, “I don’t know if the student copied this assignment.” It’s going to help students, teachers, and everyone create a healthier environment.

Tim Hwang: Yeah, I think it’s great. Volkmar, I’m curious if you have any thoughts on this. Clearly, this isn’t going to solve the use of these models for spreading fake information. But do you agree that these measures are necessary to make this technology used in an ethical manner?

Volkmar Uhlig: I’m on the total opposite side.

Tim Hwang: Yeah, let’s hear it.

Volkmar Uhlig: I have two school-age children, and the schools are trying desperately to prevent kids from using ChatGPT to write their essays. I believe they should just do everything on ChatGPT. The reason is that ChatGPT does not substitute thinking; it just substitutes the process of content creation, or it enhances it.

What we are now arguing is that I have a tool, and I need to tag everything produced with the tool. But I’m not tagging everything if I use a power drill instead of drilling a hole by hand. I don’t say, “Wow, I used a power drill to make this hole, and therefore I need to tell you.” I’m not announcing to the world every time I drive somewhere that I used a car and fossil fuels.

I think we are in a bifurcated world right now. We have a society that actively uses large language models and uses their power, and a society that doesn’t. Then we have people who want to regulate everything and tell everybody how to live. It’s like, “Oh my God, we need to protect the people who are not using LLMs.” The poor teachers need to change their way of educating, but it will take 100 years for them to get there. So, let’s give them tools to do the useless teaching they’ve been doing for 100 years, so they can figure out if someone is using tools of the 21st century and punish them for it.

It’s like saying I need to walk to school even though my parents could drive me and save 20 minutes. I think we are at a breaking point. The technology is not yet widely adopted, but a good chunk of society—early adults and children—are using it. ChatGPT probably grew like crazy when the first kid found out it could write an essay.

We need an education system that embraces it, and we need a corporate system that embraces it. The second thing is, there’s a certain arrogance by Google to say, “Oh, look, we can watermark.” I could use another chat agent, downloaded from the internet, that removes your watermark, and you’re done. The idea that a company has such broad distribution that they can push watermarking into the world... it just tells you there will be models of different value.

There will be the Google model that watermarks everything, and the non-watermarking model that is more valuable because nobody can see that I used it. So, of course, you’ve just created an economy of cheating because you’re trying to tag everything. Except, as Google, you have the knob to turn off the watermarking for your own purposes. The idea that you could actually do this is ridiculous from my perspective.

Vyoma Gajjar: We can agree to disagree on this. There are two sides to it. As Volkmar mentioned, there are people who know and understand AI, and people who are scared to use it. I feel the merging point, where everyone is comfortable, comes when all these techniques and tools have been experimented with for a while. I still feel we are quite early in this.

Look at the internet revolution, and then look at how short a period it has been since the LLM boom. There haven’t been enough products or use cases that have gone into full-fledged production yet. Until we reach a point where we can see the effects of these techniques, including the long-term effects, I feel we can keep debating the best ways to regulate or not. But until then, just keep experimenting and working on this. I feel somewhere we’ll all come to a merging point where everyone is comfortable.

Volkmar Uhlig: But this is true for every technology invented by humanity. If something is three years old, we do not know, so let’s experiment with it. The US, in general, always tries first and then figures out what works and what doesn’t before regulating. Let’s not anticipate every bad problem and regulate it before anything happens. I think the US will probably be reactive in regulation. Typically, regulators are years behind. So, let’s build something valuable first before figuring out how to put guardrails around it.

Tim Hwang: We could go much longer on this. Vyoma, we’ll have to have you back on the show. Thanks for coming on. And Volkmar, it’s a pleasure as always. Thanks for joining us.

If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. Listeners, we’ll see you next week.

Stay on top of AI news with our experts

Follow us on Apple Podcasts and Spotify.

Subscribe to our playlist on YouTube.