Facebook’s AI uses schemas to teach robots to manipulate objects

How might a two-armed robot go about accomplishing a task like opening a bottle? Invariably, it’ll need to hold the bottle’s base with one hand while grasping the cap with the other and twisting it off. That high-level sequence of steps is what’s known as a schema, and it conveniently does not depend on the objects’ geometric and spatial states. As an added bonus, unlike reinforcement learning techniques that aim to solve tasks by learning a policy from scratch, schemas don’t require millions of examples ingested over the course of hours, weeks, or even months.

Recently, a team at Facebook AI Research sought to imbue two robotic Sawyer arms with the ability to select appropriate steps from a library to complete an objective. At each timestep, their agent had to decide which skill to use and what arguments to use for it (e.g., the location to apply force, the amount of force, or the target pose to move to). Despite the complexity involved, the team says that their approach yielded improvements in learning efficiency, such that manipulation skills could be discovered within only a few hours of training.

The team’s key insight was that for many tasks, the learning process could be split into two parts: (1) learning a task schema and (2) learning a policy that chooses appropriate parameterizations for the different skills. They assert that this approach leads to faster learning, in part because data from different versions of a given task could be used to improve shared skills. Moreover, they say it allowed for the transfer of learned schemas among related tasks.
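To make the two-part split concrete, here is a minimal Python sketch. It is not the researchers’ code: the skill names (`hold_base`, `grasp_cap`, `twist_cap`), the observation fields, and the hand-written `toy_arg_policy` are all assumptions made up for illustration, and a learned model would take the place of the hand-written rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All skill names and observation fields below are hypothetical.
Observation = Dict[str, float]
Args = Dict[str, float]


@dataclass
class SkillCall:
    """One step of execution: which skill to run and with what arguments."""
    skill: str
    args: Args


# Part 1: the schema is only the ordered skill names. It carries no geometry.
BOTTLE_SCHEMA: List[str] = ["hold_base", "grasp_cap", "twist_cap"]


# Part 2: the argument policy maps (skill, observation) to skill arguments.
# This hand-written rule is a stand-in for a learned policy.
def toy_arg_policy(skill: str, obs: Observation) -> Args:
    if skill == "hold_base":
        return {"height": obs["base_height"]}
    if skill == "grasp_cap":
        return {"height": obs["cap_height"]}
    if skill == "twist_cap":
        return {"torque": 0.5 * obs["cap_tightness"]}
    raise ValueError(f"unknown skill: {skill}")


def plan(schema: List[str],
         arg_policy: Callable[[str, Observation], Args],
         obs: Observation) -> List[SkillCall]:
    """Combine a fixed schema with observation-dependent arguments."""
    return [SkillCall(skill, arg_policy(skill, obs)) for skill in schema]


small_bottle = {"base_height": 0.02, "cap_height": 0.15, "cap_tightness": 1.0}
large_bottle = {"base_height": 0.05, "cap_height": 0.30, "cap_tightness": 2.0}

small_plan = plan(BOTTLE_SCHEMA, toy_arg_policy, small_bottle)
large_plan = plan(BOTTLE_SCHEMA, toy_arg_policy, large_bottle)
```

Both plans follow the same skill order, and only the arguments change with the bottle. That mirrors the claim that the schema is unaffected by the objects’ geometric and spatial states.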

“For example, suppose we have learned a good schema for picking up a long bar in simulation, where we have access to object poses, geometry information, [and more],” explained the coauthors of the paper detailing the work. “We can then reuse that schema for a related task such as picking up a tray in the real world from only raw camera observations, even though both the state space and the optimal parameterizations (e.g., grasp poses) differ significantly. As the schema is fixed, policy learning for this tray pickup task will be very efficient, since it only requires learning the (observation-dependent) arguments for each skill.”
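The transfer described in the quote can be sketched in Python as follows. Everything here is an illustrative assumption rather than the paper’s method: the three-step `PICKUP_SCHEMA`, the state fields, the tiny image, and the rule that treats nonzero pixels as the tray. The only point is that one fixed schema can be paired with argument functions that read very different observations.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Args = Dict[str, float]

# Hypothetical schema for a two-armed pickup, shared by both tasks.
PICKUP_SCHEMA: List[str] = ["grasp_left", "grasp_right", "lift"]


def bar_args_from_state(skill: str, state: Dict[str, float]) -> Args:
    """Simulation case: the exact object pose and length are available."""
    half = state["length"] / 2.0
    if skill == "grasp_left":
        return {"x": state["center_x"] - half}
    if skill == "grasp_right":
        return {"x": state["center_x"] + half}
    return {"height": 0.2}  # lift by a fixed, made-up amount


def tray_args_from_image(skill: str, image: Sequence[Sequence[int]]) -> Args:
    """Real-world case: only raw pixels. Toy rule: nonzero pixels are the tray."""
    cols = [c for c in range(len(image[0])) if any(row[c] > 0 for row in image)]
    if skill == "grasp_left":
        return {"x": float(min(cols))}  # leftmost tray column, in pixels
    if skill == "grasp_right":
        return {"x": float(max(cols))}  # rightmost tray column, in pixels
    return {"height": 0.2}


def run(schema: List[str],
        arg_fn: Callable[[str, Any], Args],
        observation: Any) -> List[Tuple[str, Args]]:
    """Apply a fixed schema, asking arg_fn for each skill's arguments."""
    return [(skill, arg_fn(skill, observation)) for skill in schema]


bar_state = {"center_x": 1.0, "length": 0.5}
tray_image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 9, 0],
    [0, 0, 0, 0, 0, 0],
]

bar_steps = run(PICKUP_SCHEMA, bar_args_from_state, bar_state)
tray_steps = run(PICKUP_SCHEMA, tray_args_from_image, tray_image)
```

In this sketch, moving from the bar to the tray means replacing only the argument function. The schema and the `run` loop stay untouched, which is the sense in which policy learning for the new task would be limited to the observation-dependent arguments.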