PRE2020 3 Group8


Group description

Abstract

A pure software end-user application that supports people in their need to socialize while motivating self-improvement. Anthropomorphism is intentionally used to increase user commitment and improve the user experience. Machine learning techniques are used to process the user's data, provide feedback, and facilitate the anthropomorphized interface.


Members

(in alphabetical order):

  • Edwin Steenkamer
  • Emi Kuijpers (1227154)
  • Fanni Egresits
  • Lulof Pirée (1363638)
  • Morris Boers (1253107)


GitHub Page:

GitHub

Logbook

See the page logbook_group_8

Problem statement and objectives

Goals

The software application should:

  • Significantly reduce users' symptoms of loneliness induced by infrequent social contact
  • Register personal goals set by the users
  • Collect data on the user's behavior and progress towards goals
  • Provide the user with feedback and constructive nudges

Beyond the scope

The following features are probably valuable additions to the product, but they are beyond the scope of what can be achieved in one quartile:

  • Voice recognition
  • Animated anthropomorphized interface (e.g. simulated face)

Who are the users

The application is intended to support people in their daily lives. For the prototype, the audience is narrowed down to adolescents and adults who use a computer on a daily basis.

TODO...

Approach, milestones and deliverables

TODO...

Literature Review

Statistical dialog systems

Two categories: Seq2Seq and task-oriented

Statistical dialog systems can be divided into two major categories[1]. The first category learns mappings from input messages to responses. In the simplest case, this amounts to learning a probability distribution over responses given the input. More advanced algorithms, such as Seq2Seq, also take prior context into account. In particular, Seq2Seq uses two LSTMs (Long Short-Term Memory networks, a commonly used variant of Recurrent Neural Networks): one to encode input messages into an abstract feature vector, and another to convert such vectors into a reply [2].
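To make the encoder-decoder idea concrete, the following is a minimal sketch of a Seq2Seq model built from two LSTMs, roughly in the spirit of Sutskever et al. (2014) [2]. It is written with PyTorch; the vocabulary size, embedding and hidden dimensions, and the class name Seq2Seq are illustrative assumptions, not taken from the cited work.

<syntaxhighlight lang="python">
# Minimal encoder-decoder sketch: one LSTM encodes the input message into a
# fixed-size state, a second LSTM decodes that state into a reply.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_tokens, tgt_tokens):
        # Encoder: compress the input message into an abstract feature vector.
        _, state = self.encoder(self.embed(src_tokens))
        # Decoder: generate the reply conditioned on that state.
        dec_out, _ = self.decoder(self.embed(tgt_tokens), state)
        return self.out(dec_out)  # per-token vocabulary logits

model = Seq2Seq()
src = torch.randint(0, 1000, (1, 7))  # a 7-token input message
tgt = torch.randint(0, 1000, (1, 5))  # a 5-token (shifted) target reply
logits = model(src, tgt)              # shape: (1, 5, 1000)
</syntaxhighlight>

Only the teacher-forced training pass is shown; at inference time the decoder would emit tokens one at a time, feeding each generated token back in as the next input.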

The other category consists of task-oriented dialogue systems. These are often tuned to a specific application domain and trained with reinforcement learning. Examples are statistical models based on Markov Decision Processes (a typical model for reinforcement learning) and models that learn generation rules. Because of their fine-tuned nature, they cannot be flexibly employed beyond their domain.
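As a toy illustration of the Markov-Decision-Process view (not a reproduction of any of the cited systems), the sketch below trains a tabular Q-learning policy on a hypothetical booking dialogue. The states, actions, and rewards are invented for this example.

<syntaxhighlight lang="python">
# Toy task-oriented dialogue modeled as an MDP, solved with tabular Q-learning.
import random

states = ["ask_date", "ask_time", "confirm", "done"]
actions = ["request_date", "request_time", "confirm_booking"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def step(state, action):
    # Hypothetical environment: the correct action advances the task (reward 1),
    # any other action leaves the dialogue where it is (small penalty).
    correct = {"ask_date": "request_date", "ask_time": "request_time",
               "confirm": "confirm_booking"}
    if correct.get(state) == action:
        return states[states.index(state) + 1], 1.0
    return state, -0.1

for _ in range(500):  # training episodes
    s = "ask_date"
    while s != "done":
        a = (random.choice(actions) if random.random() < epsilon
             else max(actions, key=lambda act: Q[(s, act)]))
        s_next, r = step(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions)
                              - Q[(s, a)])
        s = s_next
</syntaxhighlight>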

Seq2Seq with Deep Reinforcement Learning

(This section assumes some basic knowledge of Reinforcement Learning)

Li et al. (2016) [1] propose an algorithm that combines the two categories. That is, they train a Seq2Seq model (an encoder-decoder recurrent neural network) with reinforcement learning, by having two agents converse with each other. The state of the conversation is represented by the previous two responses, which are alternately generated by the two agents. The policies (i.e. the neural networks) are optimized towards maximizing the rewards attained by their actions. The rewards are defined in terms of how well a response keeps the conversation going. In particular, the reward is higher the more the response differs from the responses on a predefined list of 'dull' responses. This is implemented by taking the negative log of the probability that a 'dull' response will be generated (note that the negative log of a small probability is greater than the negative log of a large probability):

[math]\displaystyle{ r_1 = -\frac{1}{N_S} \sum_{s \in S} \frac{1}{N_s} \log P_{\text{seq2seq}}(s \mid a) }[/math]

Here S is the predefined list of dull responses, N_S its size, N_s the number of tokens in dull response s, a the generated response, and P_seq2seq(s | a) the likelihood that the Seq2Seq model assigns to producing s as a reply to a.
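As a minimal sketch of how this 'dull response' reward could be computed, the snippet below averages the length-normalised log-likelihoods of the dull responses and negates the result. The helper seq2seq_log_prob and the example dull responses are hypothetical placeholders; in the actual method the likelihood comes from the trained Seq2Seq model.

<syntaxhighlight lang="python">
# Reward: higher when the model is unlikely to answer with a dull response.
DULL_RESPONSES = ["i don't know what you are talking about",
                  "i don't know",
                  "i have no idea"]

def seq2seq_log_prob(response, context):
    # Placeholder: a real system would return log P_seq2seq(response | context)
    # from the trained Seq2Seq model.
    return -2.0 * len(response.split())

def ease_of_answering_reward(action):
    total = 0.0
    for s in DULL_RESPONSES:
        n_tokens = len(s.split())                        # N_s
        total += seq2seq_log_prob(s, action) / n_tokens
    return -total / len(DULL_RESPONSES)                  # -(1/N_S) * sum

print(ease_of_answering_reward("shall we talk about music instead?"))
</syntaxhighlight>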

Overview

Work-in-progress-page

See the page WIP group 8 for an actively edited file of notes.

User guide

TODO...

Software documentation

TODO...

References

  1. Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky (2016). Deep Reinforcement Learning for Dialogue Generation. Published: arXiv.org. URL: [1]. Date accessed: 01-02-2021.
  2. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le (2014). Sequence to sequence learning with neural networks. Published: Advances in neural information processing systems, pages 3104-3112. URL: [2]. Date accessed: 02-02-2021.