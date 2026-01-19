Imagine the goal is to train a network of autonomous mobile robots (AMRs) that can pick up litter from sidewalks, parks and streets without harming people or themselves. The task is not simply “picking up objects.” It also involves telling litter apart from non-litter, navigating crowded environments, choosing safe paths and grasping objects of variable shape and size, among other concerns.

Once the goals are defined, the robot must be designed with the proper morphology. Should it be a humanoid robot or something else? Does it use wheels or legs? Does it need a gripper that pinches objects or a vacuum that sucks them up? What sort of cameras and sensors does it need to navigate its environment?

Then, a simulated environment is typically created. Such an environment might include terrain, litter, random objects (rocks, benches, fences, etc.), people, lighting effects and various weather conditions.

In this simulated training environment, the model governing the robot’s behavior learns what litter looks like, from bottles and cans to scraps of paper and tiny candy wrappers. It learns how to maintain balance on uneven terrain and in strong winds. It learns how to best avoid bumping into people and how to grasp glass bottles hard enough to pick them up but not so hard that it shatters them.

Each training run varies the properties of the simulated scene: bigger pieces of trash, different weather conditions, more people walking around. The robot “never sees the same sidewalk twice.”
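Varying the scene on every run is commonly called domain randomization. The sketch below shows one way the idea could look in Python. The `SceneConfig` fields, the value ranges and the `sample_scene` function are all hypothetical, chosen only for illustration; a real simulator would expose its own parameters.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneConfig:
    """Hypothetical description of one randomized training scene."""
    litter_count: int
    litter_size_cm: float
    pedestrian_count: int
    weather: str
    light_level: float  # 0.0 = dark, 1.0 = full daylight


# Assumed set of weather conditions; not taken from any specific simulator.
WEATHER_OPTIONS = ("clear", "rain", "wind", "fog")


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw a fresh scene so that no two training runs look the same."""
    return SceneConfig(
        litter_count=rng.randint(1, 30),
        litter_size_cm=rng.uniform(1.0, 40.0),
        pedestrian_count=rng.randint(0, 50),
        weather=rng.choice(WEATHER_OPTIONS),
        light_level=rng.uniform(0.1, 1.0),
    )


# A seeded generator keeps the sequence of scenes reproducible for debugging.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(5)]
for scene in scenes:
    print(scene)
```

Passing the random generator in explicitly, rather than using the global one, makes it possible to replay the exact scene in which a failure occurred.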

When the robot gets a defined task right, its behavior is “rewarded” with a high score, which reinforces the best behaviors. Across many iterations, the robot learns how to do its job.
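As a rough illustration of how such a score might be computed, the sketch below adds points for collected litter and subtracts points for unsafe or unwanted outcomes. The `EpisodeOutcome` fields and all of the weights are assumptions made for this example; in practice the reward terms and their relative sizes are tuned for the specific robot and task.

```python
from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    """Hypothetical summary of what happened during one simulated run."""
    litter_collected: int
    non_litter_picked: int
    collisions: int
    items_broken: int
    fell_over: bool


def episode_reward(outcome: EpisodeOutcome) -> float:
    """Score one run: reward correct pickups, penalize unsafe behavior.

    All weights are illustrative assumptions, not recommended values.
    """
    reward = 1.0 * outcome.litter_collected
    reward -= 0.5 * outcome.non_litter_picked  # picked up something that was not litter
    reward -= 5.0 * outcome.collisions         # bumping into people or objects costs the most
    reward -= 2.0 * outcome.items_broken       # e.g. a glass bottle gripped too hard
    if outcome.fell_over:
        reward -= 3.0                          # lost balance on uneven terrain
    return reward


clean_run = EpisodeOutcome(10, 0, 0, 0, False)
risky_run = EpisodeOutcome(10, 0, 1, 0, False)
print(episode_reward(clean_run), episode_reward(risky_run))
```

Weighting collisions more heavily than missed pickups reflects the stated goal of collecting litter without harming people or the robot itself.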

Once the robot surpasses a certain success threshold, it is deployed to a real-world training environment, like a quiet street without too many people. The robot is fine-tuned to handle unexpected new conditions that weren’t present in the simulation, like wind blowing small bits of trash.
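The "success threshold" can be pictured as a simple gate over recent simulation results. The following is a minimal sketch, assuming each simulated run is recorded as a pass or fail; the 95% threshold, the 100-run minimum and the function names are hypothetical values chosen for illustration.

```python
def success_rate(results: list[bool]) -> float:
    """Fraction of runs that succeeded; 0.0 if there are no runs yet."""
    if not results:
        return 0.0
    return sum(results) / len(results)


def ready_for_field_trial(
    results: list[bool],
    threshold: float = 0.95,   # assumed success threshold
    min_episodes: int = 100,   # assumed minimum sample size
) -> bool:
    """Allow real-world deployment only after enough successful simulated runs."""
    return len(results) >= min_episodes and success_rate(results) >= threshold


recent_runs = [True] * 96 + [False] * 4
print(ready_for_field_trial(recent_runs))
```

Requiring a minimum number of runs guards against promoting a model that looks good only because it was evaluated on a handful of easy scenes.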

Observations from these real-world trials are used to improve the simulated training environment for additional training. The robot can then be stress-tested in more complex environments: dense crowds, poor lighting or wet, slippery surfaces.
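One simple way field observations could feed back into the simulation is by widening the range of conditions the simulator samples from. The sketch below assumes a hypothetical wind-speed parameter with made-up numbers; the `widen_range` helper is illustrative and not part of any particular simulator.

```python
def widen_range(
    current: tuple[float, float], observed: list[float]
) -> tuple[float, float]:
    """Extend a simulated parameter range to cover values seen in the field."""
    if not observed:
        return current
    low, high = current
    return (min(low, min(observed)), max(high, max(observed)))


# Hypothetical numbers: the simulator covered wind up to 5.0 m/s,
# but field trials recorded gusts of 6.8 m/s.
sim_wind_range = (0.0, 5.0)
field_wind_speeds = [2.1, 6.8, 4.0]
sim_wind_range = widen_range(sim_wind_range, field_wind_speeds)
print(sim_wind_range)
```

After the update, later simulated runs can include the stronger gusts that blew small bits of trash around during the field trial.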