C. G. Atkeson, J. G. Hale, F. Pollick, M. Riley, S. Kotosaka, S. Schaal, T. Shibata, G. Tevatia, A. Ude, S. Vijayakumar, E. Kawato, M. Kawato. (2000). Using Humanoid Robots To Study Human Behavior.
This paper uses humanoid robots to study trajectory formation and planning, learning from demonstration, oculomotor control, and interactive behaviors, with the aim of better understanding the corresponding human abilities.
The inverse kinematics problem describes the process of determining which joints (DOFs) to use, and at what angles, so that the robot's fingers touch a target. The problem has no unique solution, since there are many ways to touch a target. By using an extended Jacobian algorithm and imposing optimization criteria on movement planning, for instance by requiring that the system accomplish the task in minimum time, an optimal trajectory can be calculated.
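As a concrete illustration, the sketch below iterates a Jacobian-based (resolved-rate) step for a hypothetical planar two-link arm. The link lengths, gain, and target are illustrative assumptions; the paper's extended Jacobian method additionally projects an optimization criterion into the redundant DOFs, which this minimal, non-redundant version omits.

```python
import math

# Hypothetical planar 2-link arm; link lengths are assumed, not from the paper.
L1, L2 = 1.0, 1.0

def fk(q1, q2):
    """Forward kinematics: fingertip position for joint angles q1, q2."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y

def jacobian(q1, q2):
    """2x2 Jacobian of fingertip position w.r.t. joint angles."""
    j11 = -L1 * math.sin(q1) - L2 * math.sin(q1 + q2)
    j12 = -L2 * math.sin(q1 + q2)
    j21 = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    j22 = L2 * math.cos(q1 + q2)
    return (j11, j12), (j21, j22)

def ik_step(q, target, gain=0.5):
    """One resolved-rate step: dq = J^-1 * (target - fingertip)."""
    x, y = fk(*q)
    ex, ey = target[0] - x, target[1] - y
    (a, b), (c, d) = jacobian(*q)
    det = a * d - b * c  # assumes the arm stays away from singularities
    dq1 = gain * (d * ex - b * ey) / det
    dq2 = gain * (-c * ex + a * ey) / det
    return (q[0] + dq1, q[1] + dq2)

# Iterate toward a reachable target; the joint angles converge to one of
# the many configurations that touch it.
q = (0.3, 0.6)
target = (1.2, 0.8)
for _ in range(100):
    q = ik_step(q, target)
```

Because the solution is not unique, which configuration the iteration lands on depends on the starting angles; the extended Jacobian resolves this ambiguity by optimizing a criterion in the null space.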
Trajectory-planning projects can also exploit a common feature of the brain: topographic maps as basic representations of sensory signals. Such maps can be built with various neural network approaches and utilized by a robot when performing movement planning.
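One neural-network approach that builds such a map is a self-organizing map (SOM); the sketch below trains a minimal one-dimensional SOM on a scalar sensory variable so that neighboring units come to encode similar inputs. The map size, learning rate, and neighborhood radius are illustrative assumptions, not parameters from the paper.

```python
import random

# Minimal 1-D self-organizing map: units arranged in a line learn a
# topographic representation of a scalar input in [0, 1].
random.seed(0)
n_units = 10
weights = [random.random() for _ in range(n_units)]  # each unit's preferred input

def train_step(x, lr=0.2, radius=1):
    # Find the best-matching unit (BMU) for input x.
    bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
    # Move the BMU and its map neighbors toward x; updating neighbors is
    # what makes the learned representation topographic.
    for i in range(max(0, bmu - radius), min(n_units, bmu + radius + 1)):
        weights[i] += lr * (x - weights[i])

for _ in range(5000):
    train_step(random.random())
```

After training, nearby units respond to nearby input values, which is the hallmark of the topographic maps found in sensory cortex.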
Our goal is to discover methods by which robots can learn, from sensory information, how to acquire perceptual and motor skills. We explore neural networks and statistical and machine learning algorithms, and investigate three specific areas: (1) supervised and unsupervised learning, (2) learning from demonstration, and (3) reinforcement learning.
1) Supervised and unsupervised learning. Working with humanoid robots has forced us to develop algorithms that:
- learn incrementally as training data is generated,
- learn in real time as the robot behaves, and
- scale to complex, high-dimensional learning problems.
We are developing learning algorithms and appropriate representations to acquire useful models automatically. Our ultimate goal is to compare the behavior of these learning algorithms with human learning. One algorithm that can deal with the high dimensionality of humanoid robot learning is locally weighted projection regression (LWPR). LWPR
- learns rapidly using second-order learning methods supporting incremental training,
- uses statistically sound stochastic cross-validation to learn,
- adjusts its local weighting kernels (how much and what shape area the local model covers) based only on local information to avoid interference with other models,
- has a computational complexity that is linear in the number of inputs, and
- can detect redundant or irrelevant inputs.
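The heart of LWPR, stripped of its incremental second-order updates and projection directions, is locally weighted linear regression: fit a linear model around each query point, with training samples weighted by a Gaussian kernel centered on the query. A minimal batch sketch, with an assumed fixed kernel bandwidth rather than LWPR's locally adapted one:

```python
import math

# Locally weighted regression: a local linear fit around each query point.
# This shows only the core idea of LWPR; the full algorithm is incremental,
# adapts its kernels, and projects high-dimensional inputs. The bandwidth h
# is an assumed constant here.
def lwr_predict(query, xs, ys, h=0.1):
    # Gaussian weights centered on the query point.
    w = [math.exp(-((x - query) ** 2) / (2 * h * h)) for x in xs]
    # Weighted least squares for the local model y ~ b0 + b1 * x.
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, xs))
    sy = sum(wi * yi for wi, yi in zip(w, ys))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, xs))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, xs, ys))
    det = sw * sxx - sx * sx
    b1 = (sw * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / sw
    return b0 + b1 * query

# Noise-free samples of a nonlinear function.
xs = [i / 20.0 for i in range(41)]   # 0.0 .. 2.0
ys = [math.sin(2 * x) for x in xs]
print(lwr_predict(1.0, xs, ys))      # close to sin(2.0)
```

Because each local model only sees nearby data, fitting one region does not interfere with models elsewhere, which is the property the third bullet above refers to.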
2) Learning from demonstration (LFD) is learning by following a demonstration. Important questions to ask, however, are: How does the learner know what is important or irrelevant in the demonstration? How does the learner infer the performer’s goals? How does the learner generalize to different situations? Thus we can identify three key challenges that must be met before LFD becomes viable:
- to be able to perceive and understand what happens during a demonstration;
- to find an appropriate way to translate the behavior into something the robot can actually do—it is humanoid, not human, so it has many fewer joints and ways to move, and it is weaker and slower than a human; and
- to enable the robot to fill in missing information using learning from practice—many things are hard or impossible to perceive in a demonstration, such as muscle activations or responses to errors that do not occur in the demonstration.
Perceiving human movement can be achieved by having robots recreate or predict measured images based on recovered information. Thus, perception becomes an optimization process, trying to find the underlying movement or motor program that predicts the measured data and deviates the least from what we know about human movement.
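In miniature, this optimization view of perception amounts to regularized estimation: choose the movement parameter that best predicts the measurements while deviating least from a prior model of human movement. The constant-velocity motion model, the prior value, and the weighting below are illustrative assumptions, far simpler than a real motor program.

```python
# Recover a movement parameter (here, a hand speed v) from noisy position
# measurements by minimizing
#     sum_t (x_t - v*t)^2  +  lam * (v - prior_speed)^2,
# i.e., prediction error plus deviation from prior knowledge. Closed form.
def recover_speed(times, measured, prior_speed, lam=1.0):
    stt = sum(t * t for t in times)
    stx = sum(t * x for t, x in zip(times, measured))
    return (stx + lam * prior_speed) / (stt + lam)

times = [0.1 * i for i in range(1, 11)]
noise = [0.03, -0.02, 0.01, -0.04, 0.02, 0.0, -0.01, 0.03, -0.02, 0.01]
measured = [2.0 * t + n for t, n in zip(times, noise)]  # true speed is 2.0
v_hat = recover_speed(times, measured, prior_speed=1.8)
```

The estimate is pulled slightly toward the prior yet stays close to what the data imply, which is exactly the trade-off described above.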
Translating movement and inferring goals is a difficult objective, since most human movement is complex and uses far more DOFs than most robots have, so demonstrated movements can only be imitated by leaving out details. Developing an algorithm or automatic criterion function for scoring the imitated motion, however, is very time consuming. The solution: something more abstract than motion trajectories must be transferred in LFD. The robot must perceive the teacher’s goals to perform the necessary abstraction. We are exploring alternative ways to do this.
3) Learning from practice using reinforcement learning. In our LFD approach, the robot learns a reward function from the demonstration that then lets it learn from practice without further demonstrations. The learned function rewards robot actions that look like the observed demonstration. This simple reward function does not capture the true goals of actions, but it works well for many tasks. From the implementation we learned the following lessons:
- Simply mimicking demonstrated motions is often not adequate.
- Given the differences between the human teacher and the robot learner and the small number of demonstrations, learning the teacher’s policy (what the teacher does in every possible situation) is often impossible.
- However, a task planner can use a learned model and a reward function to compute an appropriate policy.
- This model-based planning process supports rapid learning.
- Both parametric and nonparametric models can be learned and used.
- Incorporating a task-level direct learning component that is non-model-based, in addition to the model-based planner, is useful in compensating for structural modelling errors and slow model learning.
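A toy version of this demonstration-derived reward can be sketched as follows. The one-parameter policy, first-order plant, and exhaustive search over gains are purely illustrative stand-ins for the paper's learned models and model-based planner.

```python
# Learning from practice with a reward derived from a demonstration.
# The reward scores how closely the robot's trajectory tracks the observed
# demonstration; "practice" then improves the policy with no further demos.
demo = [0.1 * i for i in range(11)]  # demonstrated positions over 11 steps

def rollout(gain):
    """Simple plant: each action pulls the state toward the current target."""
    x, traj = 0.0, []
    for target in demo:
        x += gain * (target - x)   # one-parameter proportional policy
        traj.append(x)
    return traj

def reward(gain):
    """Demonstration-derived reward: penalize deviation from the demo."""
    return -sum((x - d) ** 2 for x, d in zip(rollout(gain), demo))

# Practice: evaluate candidate policies and keep the best one.
gains = [0.1 * k for k in range(1, 11)]
best = max(gains, key=reward)
```

Note how this mirrors the lessons above: the robot is not mimicking the motion directly, and it never learns the teacher's full policy; it optimizes its own behavior against a reward extracted from the demonstration.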