Dialogsysteme mit Gefühlen: Wie, warum und wohin? - Dr. Felix Burkhardt Research director, audEERING GmbH - ITG-Workshop Sprachassistenten
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Dialogsysteme mit Gefühlen: Wie, warum und wohin? Dr. Felix Burkhardt Research director, audEERING GmbH
Outlook • General motivation for emotional HMI • How to model emotions related states • Recognition • Simulation • Applications • Ethical considerations • Market 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Trends • Ubiquitous computing accessible via a) Smart mobile devices: phones, glasses, watches, t-shirts, implants, etc. b) Home automation: central intelligence controlling media, communication, environment c) Aging society gets supported by technological interfaces d) Big data, faster hardware and new algorithms: DNNs • Uses natural interface: voice, gestures, wearables, … • Gets much nearer to user, unobtrusive • Will be emotional because it‘s easier: emotion expression is a channel of communication, e.g. urgency, irony, ... 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Emotions and intelligence • Antonio Damasio demonstrated that emotions are central to the life-regulating processes of almost all living creatures. • E.g. brain injuries specific to emotional processing robbed people of their capacity to make decisions • Emotions help to react fast, be social, be motivated etc. • In opposition to Descartes, body and mind are not separated „The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions,“ - Marvin Minsky 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
How to model emotions: categories • …everyone except a psychologist knows what an emotion is (Young 1973) • Charles Darwin: The Expression of the Emotions in Man and Animals • The big four: • Anger • Sadness • Joy • Fear Emotions as characters in Pixar‘s „Inside • Needed to survive and „culturally universal“ Out“ (anger, fear, joy, disgust, sadness) • Many more catgorical models exist, e.g. Ekman‘s six or Plutchik‘s emotion weel 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Dimensional models • Dimensions consider an emotion as a point in an n-dimensional emotion space. • One of the most well-known spaces is the PAD-space: • Pleasure (valence) • Arousal (activation) • Dominance • Specific dimensions are better recognized by different modalities, e.g. activation in the speech but valence in the mimics 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Panksepp‘s seven primal Emotions • Jaak Panksepp was a neuro-scientist who suggested seven emotion categories in men and animals that can be localized in the brain. • Search (anticipation, desire) • Rage ((frustration, body surface irritation, restraint, indignation) • Fear (pain, threat, foreboding) • Panic/Loss ((separation distress, social loss, grief, loneliness) • Play ((rough-and tumble carefree play, joy) • Lust (copulation, mating) • Care ((maternal nurturance) 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Appraisal theory • Appraisal theory means that emotions are extracted from our evaluations (appraisals or estimates) of events that cause specific reactions in different people. • E.g. Scherer's multi-level sequential check model • Three levels of processing are: innate (sensory-motor), learned (schema- based), and deliberate (conceptual) Source: https://en.wikipedia.org/wiki/Appraisal_theory 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
How are emotions expressed: modalities • User introspection: e.g. Emoticon, press button etc • Text: sentiment analysis • Audio: speech, extralinguistics • Video: facial expression, gestures, posture • Physiology: respiration rate, blood pressure, skin conductivity, neuronal activity, speech (held vowels) • Behaviour, e.g. switched room often, typing speed • Context: localization, weather, time of day, other people‘s moods etc. 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Training and evaluation data • Ideally from the application • From an application similar to the target • From Wizard of Oz scenario • From field recordings (e.g. VAM) • From induced emotions („Lost luggage“, „Aibo“) • From actors Felix Burkhardt, Astrid Paeschke, Miriam Rolfes, Walther F. Sendlmeier and Benjamin Weiss: A Database of German Emotional Speech, Proc. Interspeech 2005 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Ground truth and gold standards • Five human labelers annotated the emotional content of textual data using four categories. labeler A labeler B labeler C labeler D labeler E majority machine • A machine algorithm did the same labeler A 1,00 0,20 0,19 0,10 0,24 0,27 0,15 classification. labeler B 1,00 0,79 0,46 0,15 0,81 0,15 • “majority” means the majority voting of the labeler C 1,00 0,47 0,19 0,83 0,14 human labelers. labeler D 1,00 0,09 0,52 0,07 • The chart shows the Cohen’s kappa values for the so-called “inter rater agreement”, i.e. labeler E 1,00 0,29 0,10 how much each rater agrees with all other majority 1,00 0,17 raters. machine 1,00 • EWE (evaluator weighted estimator) is a possibility to weight labelers according to their inter rater agreement 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Recognition by statistical classification • Basic approach: Early Fusion • extract features, • select best ones, • classify features, • fuse classifier outputs (unimodal/multimodal) • Classifiers: Gaussian Mixture Models: model training data as Gaussian densities, Artificial Neural Networks (Late) (ANN), e.g. Multi Layer Perceptron, Support Vector Machines (SVM): use „kernel functions“ to separate non-linear decision boundaries, Classification and Regression Trees (CART), Hidden Markov Models (HMMs) used to model temporal structure 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Recognition by Deep Neural Networks Tremendous success during the last a) decade Three reasons: more data , faster hardware, new algorithms Can work end-to-end, no feature engineering Can be used for analysis and synthesis Can learn from unlabeled data b) BUT: needs lots of data (does it?) Is lazy → Explainabilty 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Deep Learning II • Data Augmentation • Transfer Learning • Synthetic Training GANs Autoencoder • Unsupervised learning • Reinforcement Images: https://towardsdatascience.com learning source: medium.com 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Speech Synthesis • With deep learning using Embeddings Style tokens Source: https://towardsdatascience.com/neural-network-embeddings-explained- 4d028e6f0526 Source: https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like- speech.html Source: https://www.researchgate.net/publication/335601425_Comic- Guided_Speech_Synthesis
Emofilt • Emofilt is a Java tool to transform the prosody of a given utterance in order to simulate emotional expression • It is based on Mbrola for speech generation and an arbitrary phonemization generator like MARY or Txt2Pho • Mbrola is a diphone synthesizer from the University of Mons with databases for 34 languages
The storyteller • Used emofilt to “emotionalize” a fairytale • Asked 12 schoolkids for both versions How they like the story How they like the speaker How many facts they remember • F. Burkhardt: “An Affective Spoken Story Teller”, Interspeech, 2011 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
W3C recommendation for emotion annotation • As the Web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. • The specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. • The language is conceived as a "plug-in" language suitable for use in three different areas: • manual annotation of data • automatic recognition of emotion-related states from user behavior • generation of emotion-related system behavior https://www.w3.org/TR/emotionml/ 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Five types of applications a) Mediated emotion b) Affect recognition c) Affect simulation d) Modeling emotional intelligence e) Modeling human emotional behavior A. Batliner, F. Burkhardt, M. van Ballegooy, E. Nöth: A Taxonomy of Applications that Utilize Emotional Awareness, Proc. IS-LTC 2006 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Conciliation strategies • The price question in HMI dialogs is how to react on sensed emotion expression • Remember emotion implies intelligence • Human agents are expensive • Current lack of world knowledge prevents many of human strategies Literature: F. Burkhardt, K.P. Engelbrecht, M. van Ballegooy, T. Polzehl and J. Stegmann: Emotion Detection in Dialog Systems - Usecases, Strategies and Challenges. ACII 2009 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Applications irrespective of dialog strategies • Irony / Sarcasm detection: To analyze user opinion this is still a big problem. • Call center support: distribute aggressive callers, support training. • Automated dialog support: Anger detection can be used for churn prevention or for automatic quality monitoring. • Emotional Chat: Facilitate emotional computer mediated communication. • Emotion-aware Surrounding: Computer controlled environment that adapts automatically on the user’s mood. • Search–by-emotion, e.g. Entertain product. • Believable Agent: The naturalness of an artificial ‘being’ and the appearance of intelligence is highly altered by emotional expressions. • Artificial intelligence models, use emotions for motivation modeling 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
The anger monitor • Recorded and annotated anger data in office and at home • Machine chunks and classifies continuously and warns when anger is detected • Household with two teenagers • Cross database training not successful • F. Burkhardt: "You Seem Aggressive! - Monitoring Anger in a Practical Application”, LREC, 2012 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Ethical implications • Emotional expression can be a) voluntary (communicative) or b) involuntary • In a way, emotional expression of type b) reveals our “inner thoughts” • There are cases were people decide voluntary to detect their “subconscious” emotions (e.g. therapy) • It is not in any case of the interest of humans for these to be revealed, but what if an ethically cgorrect procedure is facilitated by AI, e.g. fraud detection? • General: keep the user in control Emerging standard: https://sagroups.ieee.org/7014/ “This standard defines "empathic technologies" as affect-sensitive technologies employed to algorithmically infer, model and simulate understanding of emotions, feelings, moods, perspective, attention and intention. Data insights and actions taken in response to these automated inferences typically, but not always, inform future interactions between a person or group and system (or between systems).” 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
Market overview Q3 2017 Audio recognition Emotions from face Sentiment from text Emotional 3.3.2020, ITG Workshop Magdeburg Emotional Dialogsysteme systems mit Gefühlen: Wie, Warum und Wohin? biosignals
Wrap up • Emotional processing comes with pervasive computing • It can be used with intuitive interfaces, more natural mediated communication and sophisticated AI models • It is ethically very sensitive • Emotional categories contrast with more complex models, but the nature of emotion is dictated by application • The market is growing, all the big players are already there • Deep Learning is a big driver 3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?
You can also read