Toyota Robots That Do Housework! – Lifeboat News: The Blog

“Operating and navigating in home environments is very challenging for robots. Every home is unique, with a different combination of objects in distinct configurations that change over time. To address the diversity a robot faces in a home environment, we teach the robot to perform arbitrary tasks with a variety of objects, rather than program the robot to perform specific predefined tasks with specific objects. In this way, the robot learns to link what it sees with the actions it is taught. When the robot sees a specific object or scenario again, even if the scene has changed slightly, it knows what actions it can take with respect to what it sees.

We teach the robot using an immersive telepresence system, in which there is a model of the robot, mirroring what the robot is doing. The teacher sees what the robot is seeing live, in 3D, from the robot’s sensors. The teacher can select different behaviors to instruct and then annotate the 3D scene, such as associating parts of the scene to a behavior, specifying how to grasp a handle, or drawing the line that defines the axis of rotation of a cabinet door. When teaching a task, a person can try different approaches, making use of their creativity to use the robot’s hands and tools to perform the task. This makes leveraging and using different tools easy, allowing humans to quickly transfer their knowledge to the robot for specific situations.

Historically, robots, like most automated cars, continuously perceive their surroundings, predict a safe path, then compute a plan of motions based on this understanding. At the other end of the spectrum, new deep learning methods compute low-level motor actions directly from visual inputs, which requires a significant amount of data from the robot performing the task. We take a middle ground. Our teaching system only needs to understand things around it that are relevant to the behavior being performed. Instead of linking low-level motor actions to what it sees, it uses higher-level behaviors. As a result, our system does not need prior object models or maps. It can be taught to associate a given set of behaviors to arbitrary scenes, objects, and voice commands from a single demonstration of the behavior. This also makes the system easy to understand and makes failure conditions easy to diagnose and reproduce.”