Multi-word Units Under the Magnifying Glass
Vered Shwartz, Natural Language Processing Lab, Bar-Ilan University
Talk @ ONLP, December 26, 2018
Multi-Word Units (MWUs)*
A sequence of consecutive words that creates a new concept:
• Noun compounds: flea market, flea bite, flea bite treatment, ...
• Adjective-noun compositions: hot tea, hot day, ...
• Verb-particle constructions: wake up, let go, ...
• Light-verb constructions: make a decision, take a walk, ...
• Idioms: look what the cat dragged in, ...

The words may combine in a non-trivial way:
• Implicit meaning
• Non-literal word usage

* Also referred to as Multi-Word Expressions or phrases
Previous MWU Representations

Compositional distributional representations: vec(olive oil) = f(vec(olive), vec(oil))
• Many ways to learn f [Mitchell and Lapata, 2010, Zanzotto et al., 2010, Dinu et al., 2013]
• Usually applied to adjective-noun (AN) or noun-compound (NC) pairs, and limited to a specific number of words

Phrase embeddings: arbitrarily long phrases
• Supervision from PPDB [Wieting et al., 2015]: limited in coverage
• Generalizing word2vec [Poliak et al., 2017]: can compose vectors for unseen phrases, but the naive composition doesn't handle the complexity of phrases
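To make the composition function f concrete, here is a minimal sketch (not from the talk) of the additive and element-wise multiplicative composition functions of [Mitchell and Lapata, 2010]; the toy vectors table stands in for any pre-trained embedding lookup:

    import numpy as np

    # Toy stand-ins for pre-trained word vectors; in practice these
    # would come from word2vec / GloVe / fastText.
    vectors = {
        "olive": np.array([0.2, 0.7, 0.1]),
        "oil":   np.array([0.5, 0.1, 0.9]),
    }

    def compose_additive(w1, w2):
        # f(u, v) = u + v: the simplest composition function
        return vectors[w1] + vectors[w2]

    def compose_multiplicative(w1, w2):
        # f(u, v) = u * v (element-wise): emphasizes shared dimensions
        return vectors[w1] * vectors[w2]

    vec_olive_oil = compose_additive("olive", "oil")

More expressive choices learn f from corpora, e.g. as weighted combinations or per-modifier matrices, which is what the cited works estimate.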
Enter contextualized word embeddings!
• Represent a word in context
  ◦ Good for word sense induction
• Trained as language models
  ◦ On a large corpus
  ◦ Capture world knowledge
• Improve performance of various NLP applications
• Named after characters from Sesame Street

Are meaningful MWU representations built into these models?
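As a rough illustration of what "represent a word in context" means in practice, here is a minimal sketch using the present-day HuggingFace transformers API (an assumption; not the tooling from the talk) to pull BERT's vector for a target token, which changes with the surrounding sentence:

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    def contextual_vector(sentence, word):
        # BERT's contextualized vector for the first subtoken of `word`
        # (single-subtoken words only, for simplicity).
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
        return hidden[tokens.index(word.lower())]

    # The same word gets different vectors in different contexts:
    v1 = contextual_vector("She sat down by the river bank.", "bank")
    v2 = contextual_vector("He deposited the cash at the bank.", "bank")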
Probing Tasks
Simple tasks designed to test a single linguistic property [Adi et al., 2017, Conneau et al., 2018]:

Representation → Minimal Model → Prediction

Representation      Prediction
SkipThoughts(s)     What is s's length?
InferSent(s)        Is w in s?
...                 ...

We follow the same approach for MWUs, with various representations (a minimal sketch of the setup follows).
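The probing recipe is intentionally minimal: the representation stays frozen and only a simple classifier is trained on top, so performance reflects what the representation already encodes. A sketch under those assumptions, with encode standing in for any of the representation functions (it is hypothetical, not an API from the talk):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score

    def probe(encode, train_texts, train_labels, test_texts, test_labels):
        # `encode` maps text to a fixed-size vector (averaged word
        # embeddings, a sentence embedding, a contextualized vector, ...)
        # and is never updated during probing.
        X_train = np.stack([encode(t) for t in train_texts])
        X_test = np.stack([encode(t) for t in test_texts])
        clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
        return f1_score(test_labels, clf.predict(X_test), average="macro")

If the minimal model can recover the property from the frozen vectors, the representation encodes it.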
Representations

Word Embeddings    Sentence Embeddings    Contextualized Word Embeddings
word2vec           SkipThoughts           ELMo
GloVe              InferSent*             OpenAI Transformer
fastText           GenSen*                BERT

* supervised
Tasks and Results
1. MWU Type: Task Definition
• Dataset: Wiki50 corpus [Vincze et al., 2011]
• Input: a sentence
• Goal: sequence labeling with BIO tags
  ◦ MWUs: noun compounds, adjective-noun compositions, idioms, light-verb constructions, verb-particle constructions
  ◦ Named entities: person, location, organization
• Example (rendered as data in the sketch below):

  Authorities meted    out      summary justice in cases as this
  O           B-MW_VPC I-MW_VPC B-MW_NC I-MW_NC O  O     O  O
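For this task the probe operates per token rather than per sentence: each token's frozen vector is classified into a BIO tag. A sketch of how the example above becomes training instances (the encode_token name is hypothetical):

    # The slide's example as (token, BIO-tag) pairs
    sentence = [
        ("Authorities", "O"),
        ("meted", "B-MW_VPC"), ("out", "I-MW_VPC"),
        ("summary", "B-MW_NC"), ("justice", "I-MW_NC"),
        ("in", "O"), ("cases", "O"), ("as", "O"), ("this", "O"),
    ]

    # Each pair is one classification instance for a token-level probe:
    # X = [encode_token(sentence, i) for i in range(len(sentence))]
    # y = [tag for _, tag in sentence]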
1. MWU Type: Results
[Bar chart: F1 on MWU tags, F1(M), and on named-entity tags, F1(N), per representation. Human: 70.6 F1(M) / 95.9 F1(N); the contextualized models (ELMo, BERT, OpenAI Transformer) are the strongest systems on both measures but score far lower, and the static word embeddings and the majority baseline are near zero on MWUs.]

Takeaways: (1) Identifying MWU type is difficult; (2) Named entities are easier; (3) Context helps.
2. Noun Compound Literality
A constituent word may be used in a non-literal way.
2. Noun Compound Literality: Task Definition
• Dataset: based on [Reddy et al., 2011] and [Tratz, 2011]
• Input: sentence s, target word w ∈ s that is part of an NC
• Goal: is w used literally in the NC?
• Example: "The crash course in litigation made me a better lawyer": crash is non-literal, course is literal
2. Noun Compound Literality: Results
[Bar chart: accuracy per representation, grouped into word embeddings, sentence embeddings, and contextualized models. Human: 87; the best contextualized model reaches only the mid-40s, and the remaining models fall between roughly 20 and 42.]

Takeaways: (1) word embeddings < sentence embeddings < contextualized; (2) All models are far from human performance.
2. Noun Compound Literality: Analysis
Top substitutes per model for the target word:

"A search team located the [crash]L site and found small amounts of human remains."
  ELMo:               landfill, wreckage, Web, crash, burial
  OpenAI Transformer: body, place, man, missing, location
  BERT:               archaeological, burial, wreck, excavation, grave

"After a [crash]N course in tactics and maneuvers, the squadron was off to the war..."
  ELMo:               crash, changing, collision, training, reversed
  OpenAI Transformer: few, while, moment, long, couple
  BERT:               short, successful, rigorous, brief, training

Takeaways: (1) BERT > ELMo, but both are reasonable; (2) The OpenAI Transformer errs due to its uni-directionality.
2. Noun Compound Literality: Analysis (cont.)

"The gold/[silver]L price ratio is often analyzed by traders, investors, and buyers."
  ELMo:               silver, blue, platinum, purple, yellow
  OpenAI Transformer: platinum, black, gold, silver, red
  BERT:               silver, copper, platinum, gold, diamond

"Growing up with a [silver]N spoon in his mouth, he was always cheerful..."
  ELMo:               silver, rubber, iron, tin, wooden
  OpenAI Transformer: mother, father, lot, big, man
  BERT:               wooden, greasy, big, silver, little

Takeaway: Things get tougher when both constituent nouns are non-literal!
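Substitute lists like these can be read off a model's language-modeling head at the target position. A hedged sketch with BERT's masked-LM head, assuming the HuggingFace transformers API rather than whatever tooling produced the tables above:

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")
    mlm.eval()

    def top_substitutes(masked_sentence, k=5):
        # Top-k fillers the model predicts for the [MASK] position.
        inputs = tokenizer(masked_sentence, return_tensors="pt")
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
        with torch.no_grad():
            logits = mlm(**inputs).logits[0, mask_pos]  # (1, vocab_size)
        top_ids = logits.topk(k).indices[0].tolist()
        return tokenizer.convert_ids_to_tokens(top_ids)

    # Literal vs. non-literal "silver":
    top_substitutes("The gold / [MASK] price ratio is often analyzed by traders.")
    top_substitutes("Growing up with a [MASK] spoon in his mouth, he was cheerful.")

A left-to-right model can only condition on the prefix, which is exactly why the OpenAI Transformer's substitutes above continue the preceding words instead of fitting the compound.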
3. Noun Compound Relations
NCs express semantic relations between the constituent words, which may require world knowledge and common sense to interpret.
3. Noun Compound Relations: Task Definition
• Dataset: based on [Hendrickx et al., 2013]
• Input: sentence s, NC ∈ s, paraphrase p
• Goal: does p explicate the NC?
• Example: access road
  ◦ "Road that makes access possible" ✓
  ◦ "Road forecasted for access season" ✗
3. Noun Compound Relations: Results
[Bar chart: accuracy per representation. Human: 92; BERT leads the models at 74.2, ELMo and the word and sentence embeddings cluster around 51–67, and SkipThoughts, the OpenAI Transformer, and the majority baseline all sit at 50.]

Takeaways: (1) word embeddings < sentence embeddings < contextualized; (2) All models are far from human performance; (3) The OpenAI Transformer fails.
3. Noun Compound Relations: Analysis
[Scatter plots of BERT representations of candidate paraphrases for drug money (e.g., "money earned from drug sales", "money for buying drugs"), stage area (e.g., "area that includes stage", "area where a stage is located"), and purse net: correct and incorrect paraphrases do not separate into clear clusters.]

No clear signal from BERT. Capturing implicit information is challenging!
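Plots like these are typically produced by projecting the high-dimensional paraphrase vectors to 2D. A minimal sketch with scikit-learn's t-SNE; the exact projection and plotting setup used in the talk are not specified, so treat the details as assumptions:

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_paraphrase_space(vecs, labels, texts):
        # vecs: (n, d) array of BERT vectors for paraphrases such as
        # "money earned from drug sales"; labels: integer ids of their
        # gold relations; texts: the paraphrase strings, for annotation.
        points = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(vecs)
        plt.scatter(points[:, 0], points[:, 1], c=labels)
        for (x, y), text in zip(points, texts):
            plt.annotate(text, (x, y), fontsize=6)
        plt.show()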
4. Adjective-Noun Relations
Adjectives select different attributes of the noun they combine with: the hot debate about the hot office (or: the cold war over the cold office).
4. Adjective-Noun Relations: Task Definition
• Dataset: based on [Hartung, 2015]
• Input: sentence s, AN ∈ s, attribute w
• Goal: is the attribute w conveyed in the AN?
• Example: warm support: temperature ✗, emotionality ✓
4. Adjective-Noun Relations: Results
[Bar chart: accuracy per representation. Human: 77; all models fall between roughly 36 and 53, around the majority baseline of 50.]

The best model performs only slightly better than the majority baseline: capturing implicit information is challenging.
5. Adjective-Noun Entailment: Task Definition
• Dataset: [Pavlick and Callison-Burch, 2016]
• Input: premise p and hypothesis h that differ by a single adjective
• Goal: does p → h?
• Example: ✓
  p: Most people die in the class to which they were born.
  h: Most people die in the social class to which they were born.
5. Adjective-Noun Entailment: Results
[Bar chart: F1 per representation. Human: 74.4; the best scores (roughly 48–55) come from the supervised sentence embeddings (GenSen, InferSent); the contextualized models do worse, and the majority baseline is at 0.]

Performance is poor for all models; the best results come from sentence embeddings trained on RTE data.
6. Verb-Particle Classification
VPC meanings differ from their verbs' meanings.
6. Verb-Particle Classification: Task Definition
• Dataset: [Tu and Roth, 2012]
• Input: sentence s, verb-particle pair VP ∈ s
• Goal: is VP a VPC?
• Example:
  ◦ VPC: "We did get on together"
  ◦ Non-VPC: "Which response did you get on that?"
6. Verb-Particle Classification: Results
[Bar chart: accuracy per representation. Human: 82; all models fall in a narrow band of roughly 66–76, close to the majority baseline.]

Similar performance for all models. Is the good performance merely due to label imbalance?
6. Verb-Particle Classification: Analysis
[Scatter plots of ELMo representations of have on, give in, make for, and get on in context: VPC and non-VPC usages do not form clear clusters.]

Very weak signal from ELMo. It mostly performs well due to label imbalance.
Future Directions
Can we learn MWUs like humans do?
[Cooper, 1999] studied how L2 learners process idioms. Strategies include:
• Inferring from context: used 28% of the time (57% success rate)
• Relying on the literal meaning: used 19% of the time (22% success rate)
• ...
Inferring from context
We need richer context modeling. Previous news stories, for example, may help a model understand that "crocodile tears" refers to manipulative behavior. [Asl, 2013] found that L2 learners interpret idioms more successfully through extended contexts (stories) than through sentential contexts.
Relying on literal meaning
We need world knowledge. One learner's reasoning [Cooper, 1999]:
"Cradle is something that you put the baby in."
"You're stealing a child from a mother."
"So robbing the cradle is like dating a really young person."
Recap
1. Testing existing pre-trained representations: contextualized word embeddings provide better MWU representations, but there is still a long way to go.
2. Future directions: to represent MWUs like humans do, we need better context and world knowledge modeling.

Thank you!
References I

[Adi et al., 2017] Adi, Y., Kermany, E., Belinkov, Y., Lavi, O., and Goldberg, Y. (2017). Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In Proceedings of ICLR Conference Track.

[Asl, 2013] Asl, F. M. (2013). The impact of context on learning idioms in EFL classes. TESOL Journal, 37(1):2.

[Conneau et al., 2018] Conneau, A., Kruszewski, G., Lample, G., Barrault, L., and Baroni, M. (2018). What you can cram into a single vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136. Association for Computational Linguistics.

[Cooper, 1999] Cooper, T. C. (1999). Processing of idioms by L2 learners of English. TESOL Quarterly, 33(2):233–262.

[Dinu et al., 2013] Dinu, G., Pham, N. T., and Baroni, M. (2013). General estimation and evaluation of compositional distributional semantic models. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, pages 50–58, Sofia, Bulgaria. Association for Computational Linguistics.

[Hartung, 2015] Hartung, M. (2015). Distributional Semantic Models of Attribute Meaning in Adjectives and Nouns. Ph.D. thesis, Heidelberg University.

[Hendrickx et al., 2013] Hendrickx, I., Kozareva, Z., Nakov, P., Ó Séaghdha, D., Szpakowicz, S., and Veale, T. (2013). SemEval-2013 Task 4: Free paraphrases of noun compounds. In SemEval, pages 138–143.

[Mitchell and Lapata, 2010] Mitchell, J. and Lapata, M. (2010). Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429.

[Pavlick and Callison-Burch, 2016] Pavlick, E. and Callison-Burch, C. (2016). Most "babies" are "little" and most "problems" are "huge": Compositional entailment in adjective-nouns. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2164–2173, Berlin, Germany. Association for Computational Linguistics.

[Poliak et al., 2017] Poliak, A., Rastogi, P., Martin, M. P., and Van Durme, B. (2017). Efficient, compositional, order-sensitive n-gram embeddings. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 503–508, Valencia, Spain. Association for Computational Linguistics.
References II

[Reddy et al., 2011] Reddy, S., McCarthy, D., and Manandhar, S. (2011). An empirical study on compositionality in compound nouns. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 210–218, Chiang Mai, Thailand. Asian Federation of Natural Language Processing.

[Tratz, 2011] Tratz, S. (2011). Semantically-Enriched Parsing for Natural Language Understanding. Ph.D. thesis, University of Southern California.

[Tu and Roth, 2012] Tu, Y. and Roth, D. (2012). Sorting out the most confusing English phrasal verbs. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 65–69, Montréal, Canada. Association for Computational Linguistics.

[Vincze et al., 2011] Vincze, V., Nagy T., I., and Berend, G. (2011). Multiword expressions and named entities in the Wiki50 corpus. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011, pages 289–295. Association for Computational Linguistics.

[Wieting et al., 2015] Wieting, J., Bansal, M., Gimpel, K., and Livescu, K. (2015). Towards universal paraphrastic sentence embeddings. CoRR, abs/1511.08198.

[Zanzotto et al., 2010] Zanzotto, F. M., Korkontzelos, I., Fallucchi, F., and Manandhar, S. (2010). Estimating linear models for compositional distributional semantics. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1263–1271. Association for Computational Linguistics.