Dieter Vanderelst, Alan Winfield (2017), Rational imitation for robots: the cost difference model


A summary of “Rational Imitation for Robots: the Cost Difference Model”

Infants readily imitate the behaviour of others, but not blindly; they use a form of rational imitation, imitating only what they judge to be worth imitating. In the classic example, if an experimenter demonstrates switching on a light with his head while his hands are free, infants will imitate the head action; but if the experimenter does the same thing with his hands tied behind his back, infants will often use their hands instead. Infants thus (1) take the constraints of the demonstrator into account and (2) discount actions in favour of goals. In terms of this example, infants estimate the cost of the demonstrated action and, because it is only an estimate, may also try an alternative to see whether its cost is lower.

Computations assumed to underlie the selection of action policies for imitation:

  1. Parsing behaviour. The existence of such a parser is assumed, as implementing one is beyond the scope of the paper (and beyond the current state of the art).
  2. Solving the correspondence problem, i.e. mapping the observed behaviour onto the observer’s own body and frame of reference. Several approaches to this problem are referenced in the paper. The output is modelled as a sequence of states in the observer’s coordinate system, denoted o_t, with t indexing time and t = 0, ..., T.
  3. Comparing costs. Inferring the demonstrator’s action policy can be thought of as selecting a minimal set of intermediate states from o_t, denoted o_s. From these subgoals the observer plans its own action sequence a_t = f(o_s, C), where C represents the demonstrator’s constraints, so that a_t reflects the observer’s current best-known way of reaching the goal through o_s. The cost difference is dE = (Ê(o_t) - Ê(a_t)) * S(o_t): the estimated energetic cost of the demonstrated sequence, minus that of the planned action sequence, weighted by the saliency of the demonstrated states. Ideally, dE approaches 0 as o_s is expanded. If dE reaches a certain threshold, no imitation occurs and the best known action to attain the goal is used instead. o_s is expanded by adding the demonstrated states at odd indices (see the sketch after this list).
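The following Python sketch illustrates one possible reading of the cost comparison in step 3. It is not the authors' implementation: the function names (cost_difference, plan, cost), the subgoal-expansion rule and the stopping criterion are assumptions pieced together from the summary above, and the exact decision rule in the paper may differ.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]  # here: grid coordinates, but any state representation works


def cost_difference(
    demo: Sequence[State],                       # demonstrated states o_t, t = 0..T
    plan: Callable[[List[State]], List[State]],  # planner: subgoals o_s -> own sequence a_t
    cost: Callable[[Sequence[State]], float],    # energetic cost estimate Ê(.)
    saliency: float,                             # saliency S(o_t) of the demonstration
    threshold: float,
) -> Tuple[List[State], float, bool]:
    """Return (selected subgoals o_s, final dE, whether to imitate via o_s)."""
    o_s: List[State] = [demo[0], demo[-1]]       # minimal subgoal set: start and goal only
    demo_cost = cost(demo)                       # Ê(o_t): cost of the demonstrated sequence

    while True:
        a = plan(o_s)                            # a_t = f(o_s, C): own plan through the subgoals
        dE = (demo_cost - cost(a)) * saliency    # dE = (Ê(o_t) - Ê(a_t)) * S(o_t)

        if abs(dE) <= threshold:
            # The plan through o_s matches the demonstration's cost closely enough:
            # imitate by executing the plan through the selected subgoals.
            return o_s, dE, True

        # Expand o_s with the next demonstrated state at an odd index that is not yet
        # included, so that the own plan follows the demonstration more closely.
        for i in range(1, len(demo) - 1, 2):
            if demo[i] not in o_s:
                o_s.insert(-1, demo[i])
                break
        else:
            # Nothing left to add and dE still exceeds the threshold:
            # do not imitate; fall back on the best known action for the goal.
            return o_s, dE, False
```

The planner and the cost estimate are passed in as functions so the same comparison can be reused for different tasks, such as the pathfinding experiment described below.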

In the experiment, the described infant learning and imitation behaviour is recreated in a pathfinding task on a ‘robot’, and the formalization in step 3 indeed yields similar results. This is, admittedly, a very basic representation of a problem far beyond the scope of the paper; the purpose is merely to offer a formalization of one requirement for this kind of infant learning. Nevertheless, because the infant behaviour is replicated so well, the proposed method might enable us to investigate, and thereby help explain, infant behaviour further.
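As a hypothetical usage example in the spirit of the pathfinding experiment, using the cost_difference sketch above: the demonstrator takes a detour on its way to the goal, and the observer keeps adding demonstrated states to o_s until the cost difference drops below the threshold. The grid, planner and parameters below are invented for illustration and do not correspond to the paper's actual setup.

```python
def manhattan_path(subgoals):
    """Plan a simple grid path that visits the subgoals in order (move in x, then in y)."""
    path = [subgoals[0]]
    for (x1, y1) in subgoals[1:]:
        x0, y0 = path[-1]
        while x0 != x1:
            x0 += 1 if x1 > x0 else -1
            path.append((x0, y0))
        while y0 != y1:
            y0 += 1 if y1 > y0 else -1
            path.append((x0, y0))
    return path


def path_length(path):
    return float(len(path) - 1)                  # energetic cost ~ number of grid steps


# Demonstration: the demonstrator reaches (4, 0) from (0, 0) via a detour through (2, 3).
demo = manhattan_path([(0, 0), (2, 3), (4, 0)])

o_s, dE, imitate = cost_difference(
    demo, plan=manhattan_path, cost=path_length, saliency=1.0, threshold=0.5
)
print(o_s, dE, imitate)  # the detour is captured in o_s and imitation is chosen once dE is small
```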