Dialogue act classification is a laughing matter - SemDial 2021

Vladislav Maraev*, Bill Noble*, Chiara Mazzocconi†, Christine Howes*

* Centre for Linguistic Theory and Studies in Probability (CLASP), Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg
† Institute of Language, Communication, and the Brain, Laboratoire Parole et Langage, Aix-Marseille University

PotsDial 2021
Why laughter?
• We laugh a lot: laughter can constitute up to 17% of a conversation.*
• It is social: we are 30 times more likely† to laugh in the presence of others.
• Kids laugh before they learn to speak; laughter can be informative about their development.‡

* Tian, Y., Mazzocconi, C., & Ginzburg, J. (2016). When do we laugh? In Proceedings of SemDial 2016.
† Provine, R. R. (2004). Laughing, tickling, and the evolution of speech and self. Current Directions in Psychological Science, 13(6):215–218.
‡ Mazzocconi, C., & Ginzburg, J. (2020). Laughter growing up. In Laughter and Other Non-Verbal Vocalisations Workshop: Proceedings (2020).
What is laughter?
• Laughter is not the same as humour.
• Laughter can convey a wide spectrum of emotions, from embarrassment to joy.
• Laughter can express (or interplay with) a communicative intent.
• Laughter can be the subject of a clarification request.*

* Mazzocconi, C., Maraev, V., & Ginzburg, J. (2018). Laughter repair. In Proceedings of the 22nd Workshop on the Semantics and Pragmatics of Dialogue. Aix-en-Provence, France.
Communicative intent
• Laughter can help determine the sincerity of an utterance (e.g. sarcasm).*
• Listeners can be biased towards a non-literal interpretation of sentences accompanied by laughter.†
• We explore the role of laughter in attributing communicative intent to utterances.

* Tepperman, J., Traum, D., & Narayanan, S. (2006). ‘Yeah right’: Sarcasm recognition for spoken dialogue systems. In Ninth International Conference on Spoken Language Processing.
† Bryant, G. A. (2016). How do laughter and language interact? In Proceedings of EVOLANG11.
The concept of a dialogue act (DA)
• Based on Austin’s concept of a speech act.*
• Considers not only the propositional content of an utterance but also the action it performs.
• A dialogue act is an extension of a speech act, focussing on interaction.
• Dialogue act recognition (DAR) is the task of labelling a sequence of utterances with DAs.

* Austin, J. L. (1975). How to do things with words. Oxford University Press.
Example SWDA-2827

Utterance                                                               Dialogue act
A: Well, I’m the kind of cook that I don’t normally measure things,    Statement-non-opinion (sd)
A: I just kind of throw them in                                         sd
A: and, you know, I don’t to the point of, you know, measuring down to the exact amount that they say.   sd
B: That means that you are real cook.                                   Statement-opinion
A: Oh, is that what it means                                            Downplayer
A: Uh-huh.                                                              Backchannel
A:                                                                      Non-verbal
In this work
• We explore the collocation of laughs and dialogue acts.
• We investigate whether laughter is helpful for the computational task of dialogue act recognition (DAR).
Data
• Switchboard Dialogue Act Corpus (SWDA).*
• 220 dialogue act labels from the DAMSL schema,† clustered into 42 DAs (loading sketch below).
• 1,155 conversations, 400k utterances, 3M tokens.

* Jurafsky, D., Shriberg, E., & Biasca, D. (1997). Switchboard Dialog Act Corpus. International Computer Science Institute, Berkeley, CA, Tech. Rep.
† Jurafsky, D., Shriberg, E., & Biasca, D. (1997). Switchboard SWBD-DAMSL Shallow-Discourse-Function Annotation Coders Manual.
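To make the data concrete, here is a minimal sketch of tallying the 42 clustered tags. It assumes Christopher Potts’ swda.py reader (github.com/cgpotts/swda) with the corpus CSVs unpacked under ./swda; the reader, the path, and the API version are our assumptions, not part of the slides.

    # Minimal sketch (assumes Potts' swda.py and a local ./swda directory;
    # the API may differ between versions of the reader).
    from collections import Counter

    from swda import CorpusReader

    corpus = CorpusReader('swda')
    tag_counts = Counter(
        utt.damsl_act_tag()  # the 220 raw labels, clustered to 42 tags
        for utt in corpus.iter_utterances(display_progress=False)
    )

    for tag, n in tag_counts.most_common(10):
        print(f'{tag}\t{n}')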
[Figure: per-DA proportions of utterances that contain laughter vs. are adjacent to laughter (scale 0–0.7); Non-verbal ranks highest, followed by Downplayer and Apology.]
Pentagonal representations of DAs

[Figure: for each DA (Statement-non-opinion, Apology, Downplayer), a pentagon over five laughter-collocation dimensions: within the given DA; in the previous/next utterance by self; in the previous/next utterance by the other speaker. Radial scale 0.04–0.16.]
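A minimal sketch of how these five statistics could be computed. It assumes conversations are lists of (speaker, text, tag) triples and that laughter appears literally as '<laughter>' in the utterance text; both the representation and the marker are assumptions to adapt to your corpus.

    # Sketch: the five 'pentagon' statistics for one DA.
    def has_laugh(text):
        return '<laughter>' in text.lower()  # assumed laughter marker

    def pentagon(conversations, da_tag):
        keys = ['within DA', 'prev by self', 'next by self',
                'prev by other', 'next by other']
        hits = dict.fromkeys(keys, 0)
        total = 0
        for conv in conversations:
            for i, (speaker, text, tag) in enumerate(conv):
                if tag != da_tag:
                    continue
                total += 1
                hits['within DA'] += has_laugh(text)
                # Look at the neighbouring utterances, split by speaker.
                for j, where in ((i - 1, 'prev'), (i + 1, 'next')):
                    if 0 <= j < len(conv):
                        nb_speaker, nb_text, _ = conv[j]
                        who = 'self' if nb_speaker == speaker else 'other'
                        hits[f'{where} by {who}'] += has_laugh(nb_text)
        return {k: v / total for k, v in hits.items()} if total else hits

    # e.g. pentagon(corpus_conversations, 'fa')  # 'fa' = Apology in SWDA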
[Figure: the 42 DAs plotted in two dimensions. Tag-Question, Apology and Summarize/reformulate sit highest; Quotation, Agree/Accept, Maybe/Accept-part, Signal-non-understanding and Downplayer sit lowest; most DAs cluster near zero.]
Modification and enrichment of the current DA (with a degree of urgency)
• Smoothing/softening: Action-directive, Reject, Dispreferred answers, Apology.
• Stressing a positive disposition: Appreciation, Downplayer, Thanking.
• Cueing a less probable, non-literal meaning: Rhetorical-Questions.
Benevolence induction
• Laughter can induce or invite a particular response (Downplayer, Appreciation).
• Self-talk: laughter signals the ‘social’ incongruity of the action.

A: Well, I don’t have a Mexi-, -      Statement n/o
A: I don’t, shouldn’t say that,       Self-talk
A: I don’t have an ethnic maid .      Statement n/o
[Figure: pentagonal representations for Apology and Downplayer, over the same five laughter-collocation dimensions as above.]
Apology and Downplayer

A: I’m sorry to keep you waiting      Apology
   #.#
B: Okay                               Downplayer
A: Uh, I was calling from work        Statement n/o

• The intended positive effect of the laughter is attained.
• We also recently found that in cases of social incongruity, laughter is likely to be followed by a gaze ‘check’.*

* Mazzocconi, C., Maraev, V., Somashekarappa, V., & Howes, C. (2021). Looking for laughs: Gaze interaction with laughter pragmatics and coordination. Accepted at ICMI 2021.
Intermediate conclusion
• Laughter is tightly related to dialogue information structure.
• ...should it then be an important cue for a computational model?
Dialogue act recognition model
• We use BERT pre-trained on massive non-dialogical data (see Noble and Maraev, 2021);* a code sketch follows the diagram.
[Model diagram: each utterance t, given as a speaker and tokens (s_t, w_0^t, w_1^t, ..., w_{n_t}^t), is fed to an utterance encoder; the encoder outputs pass through an RNN whose hidden states h_1, ..., h_T yield per-utterance DA predictions Ŷ_1, ..., Ŷ_T.]
* Noble, B., & Maraev, V. (2021). Large-scale text pre-training helps with dialogue act recognition, but not without fine-tuning. In Proceedings of the 14th International Conference on Computational Semantics (IWCS) (pp. 166–172).
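A minimal PyTorch sketch of this architecture under stated assumptions: Hugging Face transformers for the utterance encoder, 'bert-base-uncased' as a stand-in checkpoint, and a GRU for the utterance-level RNN; none of these reflect the paper's exact configuration.

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class DARTagger(nn.Module):
        def __init__(self, n_tags=42, rnn_dim=256, model_name='bert-base-uncased'):
            super().__init__()
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.encoder = AutoModel.from_pretrained(model_name)
            self.rnn = nn.GRU(self.encoder.config.hidden_size, rnn_dim,
                              batch_first=True)
            self.out = nn.Linear(rnn_dim, n_tags)

        def forward(self, utterances):
            # Encode each utterance independently; use the [CLS] vector.
            batch = self.tokenizer(utterances, return_tensors='pt',
                                   padding=True, truncation=True)
            cls = self.encoder(**batch).last_hidden_state[:, 0]  # (T, d)
            # Run the RNN over the sequence of utterance embeddings.
            h, _ = self.rnn(cls.unsqueeze(0))                    # (1, T, rnn_dim)
            return self.out(h.squeeze(0))                        # (T, n_tags)

    tagger = DARTagger()
    logits = tagger(["I'm sorry to keep you waiting <laughter>", "Okay"])
    print(logits.argmax(dim=-1))  # predicted DA indices, one per utterance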
Results

               macro F1   accuracy
BERT-NL           36.48      76.00
BERT-L            36.75      76.60
BERT-L+OSNL       43.71      76.95
BERT-L+OSL        41.43      77.09

• L = trained with laughter tokens included; NL = laughter tokens removed.
• OS = OpenSubtitles (350M tokens, of which 0.3% are laughter tokens).
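A side note on the two metrics: SWDA is heavily skewed towards a few classes, so accuracy rewards majority-class predictions while macro F1 averages per-class F1 equally, which is why the two can move independently. A toy illustration with made-up labels, not the paper's data:

    from sklearn.metrics import accuracy_score, f1_score

    y_true = ['sd'] * 90 + ['qh'] * 10              # 90% majority class
    pred_a = ['sd'] * 100                           # always predict majority
    pred_b = ['sd'] * 90 + ['qh'] * 5 + ['sd'] * 5  # catches half the 'qh'

    for name, pred in [('majority', pred_a), ('better on qh', pred_b)]:
        print(name,
              'acc=%.2f' % accuracy_score(y_true, pred),
              'macroF1=%.2f' % f1_score(y_true, pred, average='macro'))
    # The second model gains ~0.35 macro F1 for only +0.05 accuracy.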
A.2 Model performance in DAR task

[Figure: per-tag bar chart over all 42 SWDA DA abbreviations, with bars for ‘contains laughter’ and ‘adjacent utterances contain laughter’ (scale 0–1).]
In the first experiment we investigated whether laughter, as an example of a dialogue-specific signal, is a helpful feature for DAR. We therefore train two versions of each model: one containing laughs (L) and one with laughs left out (NL), and compare their performance on the DAR task. Table 2 compares the results of applying the models with two different utterance encoders (BERT, CNN). First of all, we can see that BERT outperforms CNN in all cases. Observing the effect of laughter, we see differences in performance depending on the dialogue act, regardless of how often laughter occurs in the current or adjacent utterances (see Figure 7 in Appendix A). The strongest evidence for laughter as a helpful feature was found in SWDA for the BERT utterance encoder, where the macro-F1 score increases by 7.89 percentage points.

Confusion matrices (Figure 5) provide some food for thought. Most of the misclassifications fall into the majority classes, such as sd (Statement-non-opinion), on the left edge of the matrix.

[Figure 5: Confusion matrices for BERT-NL (top) vs BERT-L (bottom); SWDA corpus. Solid lines show the classification improvement of rhetorical questions. Axes list the 42 DAMSL tag abbreviations (sd, b, sv, ..., fa, ft).]
Rhetorical questions
• Are misclassified by BERT-NL as Wh-Questions.
• Laughter cancels seriousness and reduces commitment to the literal meaning.*†

B: Um, as far as spare time, they talked about,   Statement n/o
B: I don’t, + I think,                            Statement n/o
B: who has any spare time ?                       Rhetorical q.

* Tepperman, J., Traum, D., & Narayanan, S. (2006). ‘Yeah right’: Sarcasm recognition for spoken dialogue systems. In Ninth International Conference on Spoken Language Processing.
† Ginzburg, J., Breitholtz, E., Cooper, R., Hough, J., & Tian, Y. (2015). Understanding laughter. In Proceedings of the 20th Amsterdam Colloquium (pp. 137–146).
What about non-verbals?
• What if our model was unaware of this class?
• We mask the Non-verbal class in the model’s output for utterances whose gold label is Non-verbal (sketch below).
• We test on 659 non-verbals (413 of which contain laughter).
• Predicted: Acknowledge/Backchannel (76%), continuation of the previous DA (11%).
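A minimal sketch of this masking probe, assuming the model's per-utterance logits are available as a tensor; the Non-verbal class index used here is a hypothetical placeholder.

    import torch

    NONVERBAL_IDX = 7  # placeholder; use your own tag-to-index mapping

    def masked_predictions(logits, mask_idx=NONVERBAL_IDX):
        """logits: (n_utterances, n_tags) tensor of model outputs."""
        masked = logits.clone()
        masked[:, mask_idx] = float('-inf')  # model can no longer choose it
        return masked.argmax(dim=-1)

    # e.g. tally what the gold Non-verbal utterances become:
    # collections.Counter(masked_predictions(logits_nv).tolist())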
Example

B: I would go from one side of the boat to the other,                Statement n/o
B: and, uh,                                                          +
A:                                                                   Backchannel
B: the, uh, the party boat captain could not understand, you know,   +
B: he even, even he started baiting my hook ,                        Statement n/o
A:                                                                   Backchannel
B: and holding, holding the, uh, the fishing rod.                    +
A: How funny,                                                        Appreciation
Conclusions
• Laughter is tightly related to dialogue information structure.
• Laughter is a valuable cue for the DAR task (with implications for NLU).
• Laughter can help disambiguate literal and non-literal interpretations (a struggle for many NLP tasks).
• Future models need meaningful DAs for standalone laughs.
Thank you!

Maraev, Noble and Howes were supported by the Swedish Research Council (VR) grant 2014-39 for the establishment of the Centre for Linguistic Theory and Studies in Probability (CLASP). Mazzocconi was supported by the “Investissements d’Avenir” French government program managed by the French National Research Agency (reference: ANR-16-CONV-0002) and by the Excellence Initiative of Aix-Marseille University, “A*MIDEX”, through the Institute of Language, Communication and the Brain.
