If RH20T showed what one focused lab could do, Open X-Embodiment demonstrated the power of pooling resources at unprecedented scale. The project brought together 21 institutions to aggregate data across 22 different robot platforms (integrated hardware and software systems used for building and testing robots), a radical departure from the traditional model in which individual labs keep their datasets proprietary. Karl Pertsch, a postdoctoral researcher at UC Berkeley and Stanford University and one of the project’s organizers, said the speed of the effort was remarkable.

“At the time, around 2023, the largest robot datasets that were used in research were on the order of a few tens of hours of robot data,” Pertsch said in an interview with IBM Think. “Open X contains about 2,000 hours and was put together within the span of just a few months.”

The innovation was cross-embodiment learning: when models train on data from multiple robot types, they learn general task representations instead of robot-specific quirks. The proof came quickly, according to Pertsch. “A robot that had never seen a Coke can in its training data was able to pick up a Coke can after it was co-trained on data from other robots that featured Coke cans.”

Getting 21 institutions to collaborate wasn’t trivial. It involved countless meetings, alignment on data formats and work by many graduate students, Pertsch noted. But the robotics community was ready. “All the people we reached out to were actually very excited to contribute,” he said. “I think there was a shared belief in large parts of the robotics community at the time that we needed larger and more diverse datasets.”

The impact has been dramatic. “The community has broadly adopted Open X as the default data source for larger-scale data-driven robot learning research in the open-source community,” Pertsch said. The most impressive validation? “Most VLAs [vision-language-action models] today, open and closed, use at least a portion of the Open X dataset as part of their data mix.”

But Open X-Embodiment had a limitation, according to Pertsch: most of the contributing datasets came from controlled lab environments. The solution, he said, was DROID (Distributed Robot Interaction Dataset), one of Open X-Embodiment’s major components, which pushed the collaboration into real-world settings. DROID sent 50 data collectors across three continents to gather manipulation data in realistic environments, including graduate students’ homes.

The logistics were intense. Pertsch recalled that one of his colleagues spent a lot of time driving robots around in rented vans. But there was a payoff. “Today, the most generalizable open-source robot models are trained on DROID, and DROID has become a great platform for testing robot models out of the box, just like you would download an LLM and prompt it to do something,” Pertsch said.