What is physical AI?


Physical AI, explained

Physical AI refers to artificial intelligence (AI) systems that operate in and interact with the physical world, rather than existing only in software or digital environments.

Physical AI typically combines AI models with sensors, actuators and other control systems that allow those models to act on real-world environments, taking AI from the realm of bits to the realm of atoms. With AI, advanced physical systems can now perceive the environment, reason with the power of a large language model (LLM), act accordingly, and then learn from the outcome of that action.
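
In code terms, this perceive-reason-act-learn loop can be sketched in a few lines of Python. The sketch below is purely illustrative: the sensor, planner, actuator and memory objects are hypothetical placeholders, not any specific robotics API.

```python
# A minimal sketch of the perceive-reason-act-learn loop described above.
# Sensor, planner, actuator and memory are hypothetical placeholders,
# not a specific product or robotics API.

class PhysicalAIAgent:
    def __init__(self, sensor, planner, actuator, memory):
        self.sensor = sensor      # cameras, lidar, force sensors
        self.planner = planner    # e.g., an LLM- or policy-based reasoner
        self.actuator = actuator  # motors, wheels, grippers
        self.memory = memory      # stores outcomes for later learning

    def step(self):
        observation = self.sensor.read()                   # perceive
        action = self.planner.decide(observation)          # reason
        outcome = self.actuator.execute(action)            # act
        self.memory.record(observation, action, outcome)   # learn from the result
```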

Another way of thinking about physical AI is that it is simply AI-powered models applied to systems in physical space. For example, robotics focuses on mechanics and control of physical machines. Before AI, robot behavior was typically rule-based or scripted, and robots could only perform narrow tasks within specifically engineered environments. Think of a robotic arm that welds the same seam 1,000 times a day on an automotive production line, or an early-generation robotic vacuum that follows preset navigation rules.

In contrast, robotic AI agents equipped with general understanding from LLMs have a limited but still powerful “common sense” about the world. These models can be paired with reinforcement learning techniques in high-performance hybrid architectures so that robots can possess both general knowledge and a specialized understanding of a specific use case.

What’s more, physical AI goes far beyond individual robots to entire AI-powered factories, energy-efficient smart grids or fleets of automated vehicles. Many systems that exist in physical space can be augmented with AI.

Why is physical AI a hot topic?

Several bottlenecks that previously prevented a physical AI revolution are being broken at the same time. The first and most important is the arrival of generative AI, powered by foundation models. Today’s large computer vision and multimodal models can recognize objects, understand spatial relationships and generalize across settings. This reduces the amount of task-specific training required and allows systems to reuse intelligence across tasks.

The second bottleneck is being overcome by the power of modern simulation, which combines high-fidelity physics modeling, photorealistic rendering and parallelization. This dramatically reduces model training times and makes simulation useful not just for testing but as a primary training ground. A related trend is the explosion of available compute: breakthroughs in GPUs and data centers have made training at scale feasible.

Finally, hardware is better than ever. Modern robots have better sensors and lighter materials. They can take advantage of recent edge AI breakthroughs and better communications capabilities. These innovations have made experimentation viable, even for small startups. The result is a renaissance for physical automation initiatives, from autonomous vehicles to industrial robots and healthcare bots that perform surgery and other complicated procedures.

Jensen Huang, CEO of Nvidia, is widely credited with popularizing the term “physical AI” and framing it as the next major wave of AI-driven innovation. During a January 2026 podcast interview, Huang predicted a future with “a billion robots.”1 This vision involves a new global economy built around developing and maintaining all these robots, which could become one of the largest industries on the planet: nothing less than a second industrial revolution.

That same month, Nvidia released a collection of open models, frameworks and advanced AI infrastructure for physical AI.2 The release touted new technologies to speed up workflows across “the entire robot development lifecycle.”

“The ChatGPT moment for robotics is here,” Huang said.

The release includes open, fully customizable world models that enable physically based synthetic data generation and robot policy evaluation in simulation, an open reasoning vision language model and an open reasoning vision language action model, alongside new simulation and compute frameworks.


How does physical AI work?

Imagine the goal is to train a fleet of autonomous mobile robots (AMRs) that can pick up litter from sidewalks, parks and streets without harming people or themselves. The task is not simply defined as “picking up objects,” but as distinguishing litter from non-litter, navigating crowded environments, choosing safe paths and grasping objects of variable shape and size, among other concerns.

Once the goals are defined, the robot must be designed with the proper morphology. Should it be a humanoid robot or something else? Does it use wheels or legs? Does it need a gripper that pinches objects or a vacuum that sucks them up? What sort of cameras and sensors does it need to navigate its environment?

Then, a simulated environment is typically created. Such an environment might include terrain, litter, random objects (rocks, benches, fences, etc.), people, lighting effects and various weather conditions.

In this simulated training environment, the model governing the robot’s behavior learns what litter looks like, from bottles and cans to scraps of paper and tiny candy wrappers. It learns how to maintain balance on uneven terrain and in strong winds. It learns how to avoid bumping into people and how to grasp glass bottles firmly enough to pick them up but not so hard that they shatter.

Each training run changes the qualities of the components involved: bigger pieces of trash, different weather conditions, more people walking around. The robot “never sees the same sidewalk twice.”
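
One way to picture this per-run variation is as a randomized scene configuration that is re-sampled for every training run. The sketch below is written against a hypothetical configuration object rather than any particular simulator; the field names and ranges are illustrative assumptions.

```python
import random
from dataclasses import dataclass

# Hypothetical scene parameters for one simulated training run.
# Field names and ranges are illustrative, not tied to a specific simulator.
@dataclass
class SceneConfig:
    litter_count: int
    max_litter_size_cm: float
    pedestrian_count: int
    weather: str
    wind_speed_mps: float

def sample_scene() -> SceneConfig:
    """Sample a new randomized scene so the robot 'never sees the same sidewalk twice'."""
    return SceneConfig(
        litter_count=random.randint(5, 50),
        max_litter_size_cm=random.uniform(2.0, 40.0),
        pedestrian_count=random.randint(0, 30),
        weather=random.choice(["clear", "rain", "fog", "snow"]),
        wind_speed_mps=random.uniform(0.0, 15.0),
    )

# Each training run gets its own randomized configuration.
configs = [sample_scene() for _ in range(1000)]
```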

When the robot gets a defined task right, its behavior is “rewarded” with a high score, which reinforces the best behaviors. Across many iterations, the robot learns how to do its job.
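
A reward signal of this kind is typically expressed as a scoring function over what happened during a step or episode. Here is a simplified, hypothetical example for the litter-collecting robot; the event names and weights are illustrative only.

```python
# A simplified, hypothetical reward function for the litter-collecting robot.
# Event names and weights are illustrative assumptions.
def compute_reward(events: dict) -> float:
    reward = 0.0
    reward += 10.0 * events.get("litter_collected", 0)        # main objective
    reward -= 50.0 * events.get("collisions_with_people", 0)  # safety penalty
    reward -= 5.0 * events.get("objects_dropped", 0)          # sloppy grasps
    reward -= 0.01 * events.get("energy_used_joules", 0.0)    # encourage efficiency
    return reward

# Example: one step in which the robot picked up two items but dropped one.
print(compute_reward({"litter_collected": 2, "objects_dropped": 1}))  # 15.0
```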

Once the robot surpasses a certain success threshold, it is deployed to a real-world training environment, like a quiet street without too many people. The robot is fine-tuned to handle unexpected new conditions that weren’t present in the simulation, like wind blowing small bits of trash.

This information is used to improve the simulated training environment for additional training. The robot can then be stress-tested in more complex environments with dense crowds, in poor lighting or on wet, slippery surfaces.

Reinforcement learning

The reward mechanism described above is part of reinforcement learning, a type of machine learning in which autonomous agents learn to make decisions through trial-and-error interactions with their environment. Reinforcement learning is crucial for robotics because agents learn behavior through interaction over time, which is exactly what robots must do in the physical world.

The world is messy: surfaces differ, objects deform, sensor data is noisy and humans behave unpredictably. Writing hard-coded rules for every situation doesn’t scale. Reinforcement learning allows robots to discover strategies on their own by experimenting within constraints. Instead of being told how to move, the robot learns which behaviors work best under real conditions.

Reinforcement learning excels where other machine learning methods fall short. For example, grasping litter involves approaching it, aligning a manipulator, adjusting force and lifting, all while responding to real-time feedback. Supervised learning can theoretically label what a “good grasp” looks like, but it cannot easily teach a robot how to recover from a slip or adapt mid-motion. Reinforcement learning, by contrast, optimizes entire action sequences based on long-term outcomes.
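
As a concrete, minimal illustration of this interaction loop, the sketch below uses the open-source Gymnasium library. The CartPole environment stands in for a real robotics simulator, and random actions stand in for a trained policy; a real training run would update the policy based on the collected rewards.

```python
# A minimal reinforcement learning interaction loop using the open-source
# Gymnasium library. CartPole stands in for a real robotics environment,
# and random actions stand in for a learned policy.
import gymnasium as gym

env = gym.make("CartPole-v1")

for episode in range(5):
    observation, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()  # placeholder for a trained policy
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    print(f"Episode {episode}: reward = {total_reward}")

env.close()
```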

This is just one example of how a robot might be trained. There are many other training methods for physical AI systems, such as supervised and unsupervised learning, imitation learning and learning from demonstration (LfD).


Challenges in training physical AI

Training physical AI works differently from training nonphysical autonomous systems for a few reasons.

  • Data is expensive
  • Physics is hard
  • Time is of the essence
  • Real stakes

Data is expensive

While traditional AI models are trained on static datasets of text, images and audio, physical AI usually requires data from robots interacting with real environments. In traditional machine learning, training data can be cheaply scraped, copied and reused. Not so with physical AI. One typically can’t just “download a dataset.”

Data collection takes time. Every data point requires a robot to move its body, manipulate objects or simply observe its environment in continuous time. And in the real world, machines break down: a blown gasket can stall the gathering of good training data.

Physics is hard

Physical AI must contend with physics. Gravity, friction, temperature, torque, balance, timing, momentum, wear, noise, lag—the real world is infinitely complex, which is why models that look great in simulated environments often fail when tested in the field.

To grapple with the uncertainties and complexities of physics, training might incorporate physics-informed models or hybrid systems in which simpler control algorithms ensure stability and learning models are limited to handling perception and decision-making.
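
The sketch below illustrates that hybrid pattern in a toy setting: a classical proportional-derivative (PD) controller keeps the low-level motion stable, while a learned model (represented by a placeholder function) only proposes high-level targets. All names, gains and dynamics are illustrative assumptions.

```python
# Hybrid control sketch: a classical PD controller handles stability,
# while a (placeholder) learned model handles perception and decision-making.

def learned_policy(observation):
    """Placeholder for a trained perception/decision model."""
    return {"target_position": 1.2}  # e.g., "move toward the detected bottle at 1.2 m"

def pd_controller(target, position, velocity, kp=4.0, kd=0.8):
    """Classical proportional-derivative control law."""
    error = target - position
    return kp * error - kd * velocity

# Toy unit-mass dynamics integrated at 20 Hz.
position, velocity, dt = 0.0, 0.0, 0.05
for _ in range(400):
    target = learned_policy({"position": position})["target_position"]
    force = pd_controller(target, position, velocity)
    velocity += force * dt
    position += velocity * dt

print(f"Final position: {position:.2f} m")  # settles near the 1.2 m target
```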

Time is of the essence

Physical systems operate in continuous time. In many use cases, tight feedback loops with minimal latency are required between perception, decision and action. Small delays can cause failures. Speed is often just as important as accuracy, or even more so. In other AI domains, the focus is usually on producing the most accurate output; factoring in the need for speed introduces a major engineering challenge.
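
In practice, this often looks like a fixed-rate control loop with a hard per-cycle budget. The sketch below assumes a hypothetical 50 Hz controller; the sense, decide and act functions are placeholders, not a specific framework's API.

```python
import time

# A fixed-rate control loop with a latency budget. The 50 Hz rate and the
# sense/decide/act placeholders are illustrative assumptions.
CONTROL_PERIOD_S = 0.02  # 50 Hz: perception, decision and action must fit in 20 ms

def sense():
    return {}          # placeholder: read cameras and other sensors

def decide(observation):
    return None        # placeholder: run the perception/decision model

def act(action):
    pass               # placeholder: send commands to the actuators

for _ in range(250):   # run for roughly 5 seconds
    start = time.monotonic()
    act(decide(sense()))
    elapsed = time.monotonic() - start
    if elapsed > CONTROL_PERIOD_S:
        print(f"Deadline missed by {elapsed - CONTROL_PERIOD_S:.4f} s")
    else:
        time.sleep(CONTROL_PERIOD_S - elapsed)  # hold the loop at a steady rate
```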

Real stakes

In most AI training environments, errors are harmless and easily discarded. But stakes are high in the real world. If an LLM makes a wrong prediction in a digital environment, a human can choose to act on it or not. In contrast, if a self-driving car incorrectly predicts the speed of the car in front of it, the results can be catastrophic. Training often involves constraints and gradual increases in autonomy, sometimes requiring human oversight and other forms of monitoring.

The role of synthetic data

To address these drawbacks, researchers rely heavily on simulated environments and synthetic data generated by robots, often virtual ones, interacting with virtual environments.

The use of world foundation models (WFMs) is increasingly common in robotics. A WFM is a powerful AI system that has learned the dynamics of the physical world (geometry, motion, physics) from vast amounts of real-world data, enabling it to generate realistic, physics-aware scenarios for training physical AI.

This simulation often involves the creation of a digital twin of a system or environment, such as a factory. In this virtual space, autonomous machines perform tasks, generating synthetic data about how they performed.

Techniques like domain randomization, in which the characteristics of simulated environments are intentionally varied at random, can help produce more useful synthetic data, resulting in more robust models that can transfer their skills to messy, highly variable reality. However, an overreliance on synthetic data can lead to models that overfit to the simulation rather than the real world.
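
As a small illustration, domain randomization can also be applied to physics and sensor properties rather than scene contents. The parameter names and ranges below are illustrative assumptions, not taken from any particular simulator.

```python
# A minimal sketch of domain randomization applied to physics and sensor
# parameters. Parameter names and ranges are illustrative assumptions.
import random

def randomize_physics():
    return {
        "ground_friction": random.uniform(0.3, 1.2),    # icy to grippy surfaces
        "object_mass_scale": random.uniform(0.5, 2.0),  # lighter or heavier litter
        "sensor_noise_std": random.uniform(0.0, 0.05),  # camera/lidar noise
        "actuator_delay_s": random.uniform(0.0, 0.1),   # lag between command and motion
    }

# Apply a fresh randomization at the start of every simulated episode so the
# learned policy cannot overfit to one specific version of "reality".
for episode in range(3):
    params = randomize_physics()
    print(f"Episode {episode}: {params}")
```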

Author:

Cole Stryker

Staff Editor, AI Models

IBM Think

Footnotes:
  1. Jensen Huang, January 2026 podcast interview (video), No Priors: AI, Machine Learning, Tech, & Startups, YouTube.com, January 8, 2026.
  2. NVIDIA Releases New Physical AI Models as Global Partners Unveil Next-Generation Robots, NVIDIA Newsroom, Nvidia.com, January 5, 2026.