Interpreting Verbal Metaphors by Paraphrasing
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Interpreting Verbal Metaphors by Paraphrasing Rui Mao1 , Chenghua Lin2 , and Frank Guerin3 1 Department of Computing Science, University of Aberdeen, AB24 3UE, UK 1 r03rm16@abdn.ac.uk 2 Department of Computer Science, University of Sheffield, S1 4DP, UK 2 c.lin@sheffield.ac.uk 3 Department of Computer Science, University of Surrey, GU2 7XH, UK 3 f.guerin@surrey.ac.uk Abstract similar word to the real meaning of a metaphor, which has the highest probability of appearing in Metaphorical expressions are difficult linguis- tic phenomena, challenging diverse Natural a given context as the literal paraphrase of the metaphor (see § 3 for the reason). The probability arXiv:2104.03391v1 [cs.CL] 7 Apr 2021 Language Processing tasks. Previous works showed that paraphrasing a metaphor as its lit- is given by BERT based LM prediction and the eral counterpart can help machines better pro- semantically similar word is constrained by Word- cess metaphors on downstream tasks. In this Net hypernyms and synonyms. Therefore, our ap- paper, we interpret metaphors with BERT and proach is fully unsupervised which does not require WordNet hypernyms and synonyms in an un- labelling a large dataset for model training. supervised manner, showing that our method significantly outperforms the state-of-the-art By running an automatic evaluation on a pub- baseline. We also demonstrate that our method licly available dataset (MOH, Mohammad et al., can help a machine translation system improve 2016), our model achieves 8% accuracy gains on its accuracy in translating English metaphors verbal metaphor paraphrasing, compared with the to 8 target languages. state-of-the-art baseline (Mao et al., 2018). On the human evaluated paraphrasing task, the aver- 1 Introduction age gain of our model is 13% on two benchmark Metaphor is defined as using one or several words datasets, i.e., MOH and VUA (Steen et al., 2010). to illustrate a meaning different to the basic mean- We also examine the metaphor paraphrases on MT ing of the words (Steen et al., 2010). Due to tasks based on Google translator1 and 8 diverse the difficulty of inferring underlying meanings of target languages. Evaluation results show that our metaphors, metaphoric expressions challenge di- method can improve the accuracy of translating verse Natural Language Processing (NLP) tasks, English metaphors by an average gain of 20.9%, e.g., sentiment analysis (Ghosh et al., 2015) and outperforming the baseline by 12.9%. machine translation (MT) (Mao et al., 2018). Cur- The contribution of this work can be summarised rently, BERT has achieved large improvements as follows: (1) we introduce a simple yet effec- on diverse downstream tasks (Devlin et al., 2019). tive method by using BERT as a LM model for However, it has not been examined on metaphor metaphor interpretation, yielding significant im- interpretation. In this paper, we interpret verbal provements against the state-of-the-art baseline; (2) metaphors with BERT and WordNet (Fellbaum, with our model, a MT system can significantly im- 1998). We focus on verbal metaphors, because prove its accuracy of translating English metaphors verbs are the most common metaphoric expressions into 8 diverse target languages. among all PoS categories (Steen et al., 2010), form- ing the most popular tasks in automatic metaphor 2 Related work identification and interpretation (Shutova, 2015; Due to the lack of large annotated metaphor in- Leong et al., 2018). terpretation corpora, previous works addressed We consider metaphor interpretation as a para- metaphor interpretation mainly by unsupervised phrasing task, predicting literal counterparts for modelling of co-occurrences of words and their metaphors. Unlike fine-tuning or feature based contexts with different knowledge bases and lexical BERT applications, we use BERT as a language 1 modelling (LM) method to predict a semantically https://translate.google.com/
resources (Shutova, 2010; Shutova et al., 2012; Bol- the start and end of an input sequence with a length legala and Shutova, 2013). These works focused of n. For a sentence that has multiple metaphors, on interpreting verbal metaphors from word-pairs we mask one metaphoric word each time. with specific syntactic structures, e.g., verb-subject Next, we predict an appropriate paraphrase for and verb-direct object, which is inconvenient for the metaphor. In order to connect a metaphor (the real-world applications. Another trend in metaphor target word) with its possible literal counterparts, interpretation focused on a specific domain, e.g., we introduce the semantic constraints of Word- question answering about Unix (Martin, 1990), Net hypernyms and synonyms for candidate literal event descriptions in economics (Narayanan, 1997) counterpart development. It is likely that one of and mental states descriptions (Barnden and Lee, the hypernyms and synonyms has a similar mean- 2002) with hand-coded knowledge and logic rules. ing to the underlying meaning of a metaphor, thus, Mao et al. (2018) extended the word-pair models, they help the BERT model filter out irrelevant pre- interpreting metaphors from full sentences in open dictions. The hypernyms {hi,j } and synonyms domains. They modelled the co-occurrences of {si,k } of a metaphoric verb i and their different words and their contexts with word2vec CBOW verb forms (f , inflections) are considered as our (Mikolov et al., 2013) input and output vectors candidate word set (Ci = {hi,j , hfi,j , si,k , sfi,k }). to generate appropriate paraphrases of metaphors. The best fit word (bi ) in Ci is given by a candidate Compared with context-independent word2vec em- word with the highest probability in its context beddings, BERT as a context-dependent word em- bedding method, has shown remarkable perfor- bi = argmax p(mi |w1 , ..., wi−1 , wi+1 , ..., wn ). mi ∈Ci mance on diverse NLP tasks (Devlin et al., 2019). However, to the best of our knowledge, BERT has bi is considered as literal, because according to not been applied in metaphor interpretation. relevant statistics from Cameron (2003); Martin 3 Methodology (2006); Steen et al. (2010) and Shutova (2016), lit- erals are more common in typical corpora. Thus, a Inspired by Mao et al. (2018), we introduce a hy- literal has higher probability appearing in a context pernym and synonym constrained missing word than a metaphor with the similar meaning. No- prediction method for unsupervised metaphor in- ticeably, the limitation of paraphrasing a metaphor terpretation. The difference is our missing word with a single literal word is that the paraphrase may prediction is based on pre-trained BERT (BERT- lose some nuance in the original metaphor. How- LARGE - CASED 2 ) that has 24 Transformer layers ever, such a paraphrase method can help a machine (Vaswani et al., 2017), whereas Mao et al. (2018) to better translate the real meanings of metaphors used word2vec input and output vectors. into human comprehensible languages (see § 6). BERT is a pre-trained Language Modelling method. The training target is to predict randomly 4 Dataset selected masked WordPieces (Wu et al., 2016) and MOH. MOH was formed by Mohammad et al. the next sentence of a current processing sentence. (2016), sourcing from WordNet example sentences. Since the masked word prediction is bidirectional, We select 315 sentences containing 315 metaphoric combining its surrounding context information, a verbs whose metaphoricity was agreed by at least pre-trained BERT model on open corpora can be 70% annotators, forming our automatic evaluation naturally applied as a missing word prediction test set (MOH315 ). The average sentence length model. In our task, we predict the probability of is 8.8. We conduct human evaluation with 50 ran- a missing word (mi ) with its context (w) by using domly selected sentences (MOH50 ) from MOH315 . pre-trained BERT VUA50 . We also randomly select 50 metaphoric p(mi |w1 , ..., wi−1 , wi+1 , ..., wn ) verbs and their associated sentences from VU Ams- = BERT([ CLS ], w1 , ..., [ MASK ]i , ..., wn , [ SEP ]), terdam Metaphor Corpus (Steen et al., 2010) for hu- man evaluation, where the sentences are originated where the position of mi is replaced with [ MASK ]i ; from British National Corpus (Burnard, 2000). [ CLS ] and [ SEP ] are special tokens, representing VUA50 may contain multiple metaphoric verbs in a 2 https://github.com/google-research/ sentence. Thus, 36 different sentences from all four bert different genres e.g., academic text, conversation,
news and fiction are selected. The average sentence vited 3 native English speakers with master degrees. length of VUA50 is 27.7. A participant would be asked if a lemmatized para- phrase of a target word is acceptable in the given 5 Baselines context with a questionnaire. A paraphrase is ac- SIM - CBOW I+O (Mao et al., 2018) modelled ceptable, if it captures the original meaning of a word co-occurrences with CBOW based word2vec target word in the given context, where the gram- (Mikolov et al., 2013) input and output vectors. matical tense of the lemmatized paraphrase is not Their paraphrases were also constrained by Word- a part of consideration. An example of the ques- Net hypernyms and synonyms. tionnaire can be viewed in Figure 1a. The final BERT- LARGE - CASED - WWM (Cui et al., 2019) decision is agreed by at least 2 annotators. masked whole words during the BERT pre-training In the human evaluation (Table 1), BERT- LARGE - procedure, rather than WordPieces. We test this CASED surpasses SIM - CBOW I+O by 14%, 12% method as an alternative of BERT- LARGE - CASED and 13% on MOH50 , VUA50 and their com- for the best fit word prediction. bination datasets (Cohen’s κ = 0.61), respec- tively, yielding an average gain of 13%. The 6 Result average gain of BERT- LARGE - CASED over BERT- LARGE - CASED - WWM is 4%. The accuracy of the The evaluation is conducted as three phases. First, three models on MOH50 with human evaluation we automatically evaluate model performance on is higher than their automatic evaluation, because MOH315 . Since MOH315 originated from WordNet candidate words from other WordNet sense classes example sentences, a paraphrasing prediction is are possibly acceptable from practical aspects. E.g., correct, if the paraphrase belongs to the WordNet “read” is an acceptable paraphrase of “scan” in “She sense class which the test sentence was in. scanned the newspaper headlines while waiting for As seen under the first column in Table 1, the ac- the taxi” for human, although it is incorrect for auto- curacy of BERT- LARGE - CASED on the automatic matic evaluation. BERT- LARGE - CASED performs evaluation are 49%, outperforming SIM - CBOWI+O similarly on MOH50 and VUA50 , although the aver- by 8%. It shows that using BERT can better age length of VUA sentences (27.7) is longer than paraphrase a target word in a sentence, based that of MOH (8.8). on constrained missing word prediction. This is likely because Transformer based BERT can bet- In the third phase, for evaluating the quality of ter model the long-term dependency and word or- paraphrases on a downstream task, we examine ders in sentences, while a shallow neural network the MOH50 paraphrases on English MT with 8 tar- based CBOW only represents word co-occurrences get languages, namely German, Russian, Greek, within a window of contexts. In CBOW, the posi- Italian, Dutch, Chinese, Thai and Japanese. We tion of a target word was not specified in a sentence, invited 3 participants per target language, to anno- while BERT used positional embedding for each tate the quality of a translation of each target word word during its pre-training procedure. Compared in an original sentence, a SIM - CBOWI+O para- with BERT- LARGE - CASED - WWM, there is a gain phrased sentence, and a BERT- LARGE - CASED para- of 3% in BERT- LARGE - CASED. Masking whole phrased sentence. BERT- LARGE - CASED - WWM is words during pre-training does not outperform the excluded in this phase, because many predictions original BERT that masked WordPieces, probably of it are the same as the predictions given by because the learned stem information from Word- BERT- LARGE - CASED by our observation (e.g., B Piece is more useful for predicting a missing word and C in Figure 1a), whereas BERT- LARGE - CASED than the whole-word-masking approach. Thus, the surpasses BERT- LARGE - CASED - WWM on the para- original BERT can better retrieve a missing word. phrase evaluations. The translations are given by In the second phase, we test model perfor- Google translator. The participants are native in the mance on MOH50 and VUA50 with human eval- target languages, and had learnt English in school, uation. As seen in Table 1, the accuracy and had each spent a minimum of one year living of automatic evaluation on MOH50 given by and studying in England as a postgraduate. Partic- SIM - CBOW I+O , BERT- LARGE - CASED - WWM and ipants were provided with senses of the original BERT- LARGE - CASED is similar to their perfor- metaphors in WordNet, so that they can fully un- mance on MOH315 . For human evaluation, we in- derstand the true meanings of the metaphors. The
1. Am I supposed to swallow that story? 1. Am I supposed to swallow that story? For the target word “swallow” (the bold): swallow: believe or accept without Acceptable/ Yes/No questioning or challenge. Unacceptable A. “talk” is an acceptable paraphrase; A. Я должен проглотить эту историю? B. “believe” is an acceptable paraphrase; B. Я должен рассказать эту историю? C. “believe” is an acceptable paraphrase. C. Я должен верить этой истории? (a) A is paraphrased by SIM-CBOWI+O; B is by BERT-LARGE- (b) A is translated by original sentence; B is by SIM-CBOWI+O CASED; C is by BERT-LARGE-CASED-WWM. paraphrasing; C is by BERT-LARGE-CASED paraphrasing. Figure 1: Example questions for human evaluation. (a) Paraphrasing task. (b) Machine Translation task. Automatic Evaluation Human Evaluation Model MOH315 MOH50 MOH50 VUA50 MOH50 +VUA50 SIM - CBOW I+O 0.41 0.42 0.68 0.70 0.69 BERT- LARGE - CASED - WWM 0.46 0.46 0.76 0.80 0.78 BERT- LARGE - CASED 0.49 0.50 0.82 0.82 0.82 Table 1: Metaphor interpretation evaluation on paraphrasing tasks, measured by accuracy. German Russian Greek Italian Dutch Chinese Thai Japanese Avg Cohen’s kappa (κ) 0.59 0.67 0.60 0.63 0.67 0.69 0.71 0.55 - Original 0.48 0.32 0.32 0.50 0.50 0.38 0.30 0.28 0.385 SIM - CBOW I+O 0.38 0.40 0.36 0.56 0.46 0.60 0.52 0.44 0.465 BERT- LARGE - CASED 0.60 0.42 0.46 0.72 0.66 0.68 0.70 0.50 0.594 Table 2: Metaphor interpretation evaluation on Machine Translation tasks, measured by accuracy. quality of a translation is measured by a binary and “tugged” in “She tugged for years to make a de- choice, e.g., acceptable or unacceptable. An ac- cent living.” are badly translated by Google across ceptable translation is qualified, if a translation all the target languages. However, the translations expresses the true meaning of the target word and based on BERT- LARGE - CASED paraphrased target it matches its context (cohesion) in their mother words, e.g. “Am I supposed to believe that story?” language, where errors in context words and tenses and “She struggled for years to make a decent liv- are ignored. The final decision was agreed by at ing.” are acceptable in each target language. We least two annotators (κ ranges from 0.55 to 0.71 in also noticed that there are some translation errors Table 2). An example can be viewed in Figure 1b. introduced due to incorrect paraphrasing. How- As seen in Table 2, the average accuracy of orig- ever, the accuracy gain achieved by paraphrasing inal metaphor translations is 38.5%. By paraphras- metaphors substantially surpasses the small impact ing the metaphors with BERT- LARGE - CASED, the from the error, yielding a significant overall im- average accuracy is improved to 59.4%, achieving provement (20.9% in accuracy) in the MT task. a gain of 20.9%, outperforming SIM - CBOWI+O by 7 Conclusion 12.9%. The average gain of BERT- LARGE - CASED on the target Asian languages (30.7%) against the We propose an unsupervised model for metaphor original metaphor translations are higher than that interpretation based on BERT and WordNet, yield- of European languages (15.0%). This is proba- ing large gains against the baseline on metaphor bly because the larger language and cultural dif- paraphrasing tasks. Our evaluation demonstrates ferences mean that English metaphors which are that paraphrasing English metaphors into their lit- directly translated are unlikely to be understand- eral counterparts can help a MT system improve able by Asians. After paraphrasing, the incom- the accuracy of translating the metaphors into more prehensible phrases become comprehensible. E.g., comprehensible target languages. In future work, “swallow” in “Am I supposed to swallow that story?” we will test our model on other downstream tasks.
References Saif Mohammad, Ekaterina Shutova, and Peter Turney. 2016. Metaphor as a medium for emotion: An em- John A Barnden and Mark G Lee. 2002. An artifi- pirical study. In Proceedings of the Fifth Joint Con- cial intelligence approach to metaphor understand- ference on Lexical and Computational Semantics, ing. Theoria et Historia Scientiarum, 6(1):399–412. pages 23–33. D Bollegala and E Shutova. 2013. Metaphor interpreta- Srini Narayanan. 1997. Knowledge-based action rep- tion using paraphrases extracted from the web. PLoS resentations for metaphor and aspect (KARMA). ONE, 8(9). Computer Science Division, University of California at Berkeley dissertation. Lou Burnard. 2000. Reference guide for the British national corpus (world edition). Ekaterina Shutova. 2010. Automatic metaphor inter- pretation as a paraphrasing task. In Human Lan- Lynne Cameron. 2003. Metaphor in Educational Dis- guage Technologies: The 2010 Annual Conference course. A&C Black. of the North American Chapter of the Association for Computational Linguistics, pages 1029–1037. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Association for Computational Linguistics. Ziqing Yang, Shijin Wang, and Guoping Hu. 2019. Pre-training with whole word masking for Chinese Ekaterina Shutova. 2015. Design and evaluation of BERT. arXiv preprint arXiv:1906.08101. metaphor processing systems. Computational Lin- guistics, 41(4):579–623. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Ekaterina Shutova. 2016. Design and evaluation of deep bidirectional Transformers for language under- metaphor processing systems. Computational Lin- standing. In Proceedings of the 2019 Conference of guistics. the North American Chapter of the Association for Computational Linguistics: Human Language Tech- Ekaterina Shutova, Tim Van de Cruys, and Anna Ko- nologies, pages 4171–4186. rhonen. 2012. Unsupervised metaphor paraphras- ing using a vector space model. In 24th Inter- Christiane Fellbaum. 1998. WordNet: An Electronic national Conference on Computational Linguistics, Lexical Database. Bradford Books. page 1121. Aniruddha Ghosh, Guofu Li, Tony Veale, Paolo Rosso, Gerard J Steen, Aletta G Dorst, J Berenike Herrmann, Ekaterina Shutova, John Barnden, and Antonio Anna Kaal, Tina Krennmayr, and Trijntje Pasma. Reyes. 2015. Semeval-2015 task 11: Sentiment 2010. A Method for Linguistic Metaphor Identifi- analysis of figurative language in twitter. In Pro- cation: From MIP to MIPVU, volume 14. John Ben- ceedings of the 9th International Workshop on Se- jamins Publishing. mantic Evaluation (SemEval 2015), pages 470–478. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Chee Wee Ben Leong, Beata Beigman Klebanov, and Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Ekaterina Shutova. 2018. A report on the 2018 VUA Kaiser, and Illia Polosukhin. 2017. Attention is all metaphor detection shared task. In Proceedings of you need. In Advances in Neural Information Pro- the Workshop on Figurative Language Processing, cessing Systems, pages 5998–6008. pages 56–66. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Rui Mao, Chenghua Lin, and Frank Guerin. 2018. Maxim Krikun, Yuan Cao, Qin Gao, Klaus Word embedding and WordNet based metaphor iden- Macherey, et al. 2016. Google’s neural machine tification and interpretation. In Proceedings of the translation system: Bridging the gap between hu- 56th Annual Meeting of the Association for Compu- man and machine translation. arXiv preprint tational Linguistics, volume 1, pages 1222–1231. arXiv:1609.08144. James Martin. 1990. A Computational Model of Metaphor Interpretation. Academic Press Profes- sional. James H Martin. 2006. A corpus-based analysis of con- text effects on metaphor comprehension. Technical Report CU-CS-738-94, Boulder: University of Col- orado: Computer Science Department. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Cor- rado, and Jeff Dean. 2013. Distributed representa- tions of words and phrases and their compositional- ity. In Advances in neural information processing systems, pages 3111–3119.
You can also read