Dialogue act classification is a laughing matter - SemDial 2021
Dialogue act classification is a laughing matter
Vladislav Maraev*, Bill Noble*, Chiara Mazzocconi†, Christine Howes*
* Centre for Linguistic Theory and Studies in Probability (CLASP), Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg
† Institute of Language, Communication, and the Brain, Laboratoire Parole et Langage, Aix-Marseille University
PotsDial 2021
Why laughter?
• We laugh a lot: laughter can constitute up to 17% of conversation.*
• It is social: we are 30 times more likely to laugh in the presence of others.†
• Kids laugh before they learn to speak; laughter can be informative about their development.‡
* Tian, Y., Mazzocconi, C., & Ginzburg, J. (2016). When do we laugh? In Proc. of SemDial 2016.
† Provine, R. R. (2004). Laughing, tickling, and the evolution of speech and self. Current Directions in Psychological Science, 13(6):215–218.
‡ Mazzocconi, C., & Ginzburg, J. (2020). Laughter growing up. In Laughter and Other Non-Verbal Vocalisations Workshop: Proceedings (2020).
What is laughter?
• Laughter is not the same as humour.
• Laughter can convey a wide spectrum of emotions, from embarrassment to joy.
• Laughter can express (or interplay with) a communicative intent.
• Laughter can be the subject of a clarification request.*
* Mazzocconi, C., Maraev, V., & Ginzburg, J. (2018). Laughter Repair. In Proceedings of the 22nd Workshop on the Semantics and Pragmatics of Dialogue. Aix-en-Provence, France.
Communicative intent
• Laughter can help determine the sincerity of an utterance (e.g. sarcasm).*
• Listeners can be influenced towards a non-literal interpretation of sentences accompanied by laughter.†
• We explore the role of laughter in attributing communicative intents to utterances.
* Tepperman, J., Traum, D., & Narayanan, S. (2006). ‘Yeah right’: sarcasm recognition for spoken dialogue systems. In Ninth International Conference on Spoken Language Processing.
† Bryant, G. A. (2016). How do laughter and language interact? In Proceedings of EVOLANG11.
The concept of a dialogue act (DA)
• based on Austin’s concept of a speech act*
• considers not only the propositional content of an utterance but also the action it performs
• A dialogue act is an extension of a speech act, focussing on interaction.
• Dialogue act recognition (DAR) is the task of labelling a sequence of utterances with DAs.
* Austin, J. L. (1975). How to do things with words. Oxford University Press.
Example SWDA-2827
A: Well, I’m the kind of cook that I don’t normally measure things,   Statement-non-opinion (sd)
A: I just kind of throw them in   sd
A: and, you know, I don’t to the point of, you know, measuring down to the exact amount that they say.   sd
B: That means that you are real cook.   Statement-opinion
A: Oh, is that what it means   Downplayer
A: Uh-huh.   Backchannel
A:   Non-verbal
In this work
• We explore the collocation of laughs and dialogue acts.
• We investigate whether laughter is helpful for the computational task of dialogue act recognition (DAR).
Data
• Switchboard Dialogue Act Corpus (SWDA)*
• 220 dialogue act tags according to the DAMSL schema,† clustered into 42 DAs
• 1155 conversations, 400k utterances, 3M tokens
* Jurafsky, D., Shriberg, E., & Biasca, D. (1997). Switchboard Dialog Act Corpus. International Computer Science Institute, Berkeley CA, Tech. Rep.
† Jurafsky, D., Shriberg, E., & Biasca, D. (1997). Switchboard SWBD-DAMSL Shallow-Discourse-Function Annotation Coders Manual.
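To make the collocation analysis concrete, the sketch below counts, for each dialogue act tag, the proportion of utterances containing a laughter token. This is not the authors' code: the CSV file name, the column names, and the form of the laughter marker are illustrative assumptions about how the corpus might be exported.

```python
import csv
from collections import Counter

# Minimal sketch (assumptions: a flat CSV export of SWDA with columns
# "act_tag" and "text", and laughter marked in the text with a token
# such as "<Laughter>"). Not the authors' code.

def contains_laughter(text: str) -> bool:
    return "laughter" in text.lower()

per_tag = Counter()     # utterances per DA tag
laughing = Counter()    # utterances per DA tag that contain laughter

with open("swda_utterances.csv", newline="") as f:
    for row in csv.DictReader(f):
        per_tag[row["act_tag"]] += 1
        if contains_laughter(row["text"]):
            laughing[row["act_tag"]] += 1

# Proportion of laughter-containing utterances per tag, most frequent tags first.
for tag, n in per_tag.most_common():
    print(f"{tag}\t{laughing[tag] / n:.3f}")
```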
[Bar chart: proportion of utterances of each dialogue act that contain laughter vs. that are adjacent to laughter; Non-verbal, Downplayer and Apology show the highest proportions.]
[Pentagonal representations of DAs (Statement-non-opinion, Apology, Downplayer): proportion of laughter within the given DA, in the previous/next utterance by self, and in the previous/next utterance by other (scale 0.04–0.16).]
[Two plots positioning all 42 SWDA dialogue act tags; the underlying axes and values are not recoverable from the transcription.]
Modification and enrichment of the current DA (with a degree of urgency)
• smoothing/softening: Action-directive, Reject, Dispreferred answers, Apology
• stressing a positive disposition: Appreciation, Downplayer, Thanking
• cueing a less probable, non-literal meaning: Rhetorical-Questions
Benevolence induction
• Laughter can induce or invite a determinate response (Downplayer, Appreciation).
• Self-talk: laughter signals the ‘social’ incongruity of the action.
A: Well, I don’t have a Mexi-,   Statement n/o
A: I don’t, shouldn’t say that,   Self-talk
A: I don’t have an ethnic maid.   Statement n/o
[Pentagonal representations for Apology and Downplayer: proportion of laughter within the given DA, in the previous/next utterance by self, and in the previous/next utterance by other (scale 0.04–0.16).]
Apology and Downplayer
A: I'm sorry to keep you waiting #.#   Apology
B: Okay   Downplayer
A: Uh, I was calling from work   Statement n/o
• The intended positive effect of the laughter is attained.
• We also recently found that in cases of social incongruity, laughter is likely to be followed by a gaze ‘check’.*
* Mazzocconi, C., Maraev, V., Somashekarappa, V., & Howes, C. (2021). Looking for laughs: Gaze interaction with laughter pragmatics and coordination. Accepted at ICMI 2021.
Intermediate conclusion
• Laughter is tightly related to dialogue information structure.
• ...should it then be an important cue for a computational model?
Dialogue act recognition model
• We use BERT pre-trained on massive non-dialogical data (see Noble and Maraev, 2021).*
[Architecture diagram: each utterance (a speaker token plus its word tokens) is passed through an utterance encoder; the sequence of utterance encodings is fed to an RNN, which predicts a dialogue act label for each utterance.]
* Noble, B., & Maraev, V. (2021). Large-scale text pre-training helps with dialogue act recognition, but not without fine-tuning. In Proceedings of the 14th International Conference on Computational Semantics (IWCS) (pp. 166–172).
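As a rough illustration of the architecture above, here is a minimal sketch of a hierarchical DA tagger in PyTorch. It is not the authors' implementation: the GRU, the hidden size, taking the [CLS] vector as the utterance encoding, and the bert-base-uncased checkpoint are assumptions made for the sketch, and the speaker token from the diagram is omitted.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class HierarchicalDATagger(nn.Module):
    """Sketch: BERT utterance encoder + GRU over utterances + DA classifier."""

    def __init__(self, n_tags: int, hidden: int = 256,
                 bert_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(bert_name)
        self.rnn = nn.GRU(self.encoder.config.hidden_size, hidden, batch_first=True)
        self.clf = nn.Linear(hidden, n_tags)

    def forward(self, input_ids, attention_mask):
        # input_ids: (n_utterances, max_tokens) for a single dialogue
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        h, _ = self.rnn(cls.unsqueeze(0))   # treat the dialogue as one sequence
        return self.clf(h.squeeze(0))       # (n_utterances, n_tags)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
utterances = ["I'm sorry to keep you waiting", "Okay"]
batch = tokenizer(utterances, padding=True, return_tensors="pt")
model = HierarchicalDATagger(n_tags=42)
logits = model(batch["input_ids"], batch["attention_mask"])   # shape (2, 42)
```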
Results
               macro F1   accuracy
BERT-NL        36.48      76.00
BERT-L         36.75      76.60
BERT-L+OSNL    43.71      76.95
BERT-L+OSL     41.43      77.09
• OS = OpenSubtitles, 350M tokens, 0.3% laughter tokens
• L = trained on data with laughter tokens; NL = laughter tokens removed
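The macro-F1 column moves much more than the accuracy column because SWDA is heavily skewed towards a few majority tags. A toy illustration with scikit-learn (made-up labels, not corpus data) shows the effect:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy data: 95 majority-class utterances ("sd") and 5 rare ones ("qh").
y_true = ["sd"] * 95 + ["qh"] * 5
y_a = ["sd"] * 100                         # model A never predicts the rare tag
y_b = ["sd"] * 95 + ["qh"] * 4 + ["sd"]    # model B recovers 4 of the 5 rare cases

for name, y_pred in [("A", y_a), ("B", y_b)]:
    print(name,
          f"accuracy={accuracy_score(y_true, y_pred):.2f}",
          f"macro-F1={f1_score(y_true, y_pred, average='macro', zero_division=0):.2f}")
# Accuracy barely changes (0.95 -> 0.99) while macro F1 roughly doubles.
```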
[Figure A.2: model performance in the DAR task per dialogue act tag, distinguishing utterances that contain laughter from utterances whose adjacent utterances contain laughter.]
In the first experiment we investigated whether laughter, as an example of a dialogue-specific signal, is a helpful feature for DAR. We therefore train two versions of each model: one containing laughs (L) and one with laughs left out (NL), and compare their performance on the DAR task. Table 2 compares the results of applying the models with two different utterance encoders (BERT, CNN). First of all, we can see that BERT outperforms CNN in all cases. From observing the effect of laughters, we can see differences in performance depending on the dialogue act, regardless of how often laughter occurs in the current or adjacent utterances (see Figure 7 in Appendix A). The strongest evidence for laughter as a helpful feature was found in SWDA for the BERT utterance encoder, where the macro-F1 score increases by 7.89 percentage points. Confusion matrices (Figure 5) provide some food for thought. Most of the misclassifications fall into the majority classes, such as sd (Statement-non-opinion), on the left edge of the matrix.
[Figure 5: Confusion matrices for BERT-NL (top) vs BERT-L (bottom); SWDA corpus. Solid lines show the classification improvement of rhetorical questions.]
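The L vs. NL comparison boils down to a preprocessing switch over the training data. A minimal sketch of what such a switch could look like (the laughter markers and the regular expression are assumptions about the transcription format, not the authors' code):

```python
import re

# Hypothetical laughter markers as they might appear in transcriptions,
# e.g. "<Laughter>" in Switchboard-style text or bracketed "[laughs]" tokens.
LAUGHTER_RE = re.compile(r"<laughter>|\[laugh(s|ter|ing)?\]", re.IGNORECASE)

def strip_laughter(utterance: str) -> str:
    """Return the NL ('no laughs') variant of an utterance."""
    return " ".join(LAUGHTER_RE.sub(" ", utterance).split())

dialogue = [
    ("I'm sorry to keep you waiting <Laughter>", "fa"),   # Apology
    ("Okay", "bd"),                                       # Downplayer
]

l_variant = dialogue                                                   # laughs kept
nl_variant = [(strip_laughter(text), tag) for text, tag in dialogue]   # laughs removed
print(nl_variant[0][0])   # "I'm sorry to keep you waiting"
```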
Rhetorical questions
• are misclassified by BERT-NL as Wh-Questions.
• Laughter cancels seriousness and reduces commitment to the literal meaning.*†
B: Um, as far as spare time, they talked about,   Statement n/o
B: I don’t, + I think,   Statement n/o
B: who has any spare time?   Rhetorical q.
* Tepperman, J., Traum, D., & Narayanan, S. (2006). ‘Yeah right’: sarcasm recognition for spoken dialogue systems. In Ninth International Conference on Spoken Language Processing.
† Ginzburg, J., Breitholtz, E., Cooper, R., Hough, J., & Tian, Y. (2015). Understanding Laughter. In Proceedings of the 20th Amsterdam Colloquium (pp. 137–146).
What about non-verbals?
• What if our model were unaware of this class?
• We mask the outputs where the gold class is Non-verbal.
• We test on 659 non-verbal utterances (413 of which contain laughter).
• Predicted: Acknowledge/Backchannel (76%), continuation of the previous DA (11%)
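One way to simulate a model that is unaware of the Non-verbal class is to exclude that tag from the output distribution at prediction time. A minimal sketch (the tag index and tensor shapes are illustrative assumptions):

```python
import torch

def predict_without_class(logits: torch.Tensor, masked_idx: int) -> torch.Tensor:
    """Argmax over DA logits with one class excluded.

    logits: (n_utterances, n_tags) scores from the tagger.
    masked_idx: index of the tag to hide (here: Non-verbal).
    """
    masked = logits.clone()
    masked[:, masked_idx] = float("-inf")   # this tag can never win the argmax
    return masked.argmax(dim=-1)

# Toy usage: 3 utterances, 5 tags; pretend tag index 4 is Non-verbal.
logits = torch.randn(3, 5)
print(predict_without_class(logits, masked_idx=4))
```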
Example
B: I would go from one side of the boat to the other,   Statement n/o
B: and, uh,   +
A:   Backchannel
B: the, uh, the party boat captain could not understand, you know,   +
B: he even, even he started baiting my hook,   Statement n/o
A:   Backchannel
B: and holding, holding the, uh, the fishing rod.   +
A: How funny,   Appreciation
Conclusions
• Laughter is tightly related to dialogue information structure.
• Laughter is a valuable cue for the DAR task (implications for NLU).
• Laughter can help disambiguate literal and non-literal interpretations (a struggle for many NLP tasks).
• Future models need meaningful DAs for standalone laughs.
Thank you!
Maraev, Noble and Howes were supported by the Swedish Research Council (VR) grant 2014-39 for the establishment of the Centre for Linguistic Theory and Studies in Probability (CLASP). Mazzocconi was supported by the “Investissements d’Avenir” French government program managed by the French National Research Agency (reference: ANR-16-CONV-0002) and by the Excellence Initiative of Aix-Marseille University “A*MIDEX” through the Institute of Language, Communication and the Brain.
Bill Noble · Chiara Mazzocconi · Chris Howes