The Role of Modifier and Head Properties in Predicting the Compositionality of English and German Noun-Noun Compounds: A Vector-Space Perspective
Sabine Schulte im Walde, Anna Hätty and Stefan Bott
Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart
Pfaffenwaldring 5B, 70569 Stuttgart, Germany
{schulte,haettyaa,bottsn}@ims.uni-stuttgart.de

Abstract

In this paper, we explore the role of constituent properties in English and German noun-noun compounds (corpus frequencies of the compounds and their constituents; productivity and ambiguity of the constituents; and semantic relations between the constituents) when predicting the degrees of compositionality of the compounds within a vector space model. The results demonstrate that the empirical and semantic properties of the compounds and the head nouns play a significant role.

1 Introduction

The past 20+ years have witnessed an enormous amount of discussion on whether and how the modifiers and heads of noun-noun compounds such as butterfly, snowball and teaspoon influence the compositionality of the compounds, i.e., the degree of transparency vs. opaqueness of the compounds. These discussions took place mostly in psycholinguistic research, typically relying on reading-time and priming experiments. For example, Sandra (1990) demonstrated in three priming experiments that both modifier and head constituents were accessed in semantically transparent English noun-noun compounds (such as teaspoon), but that there were no effects for semantically opaque compounds (such as buttercup), when primed either on their modifier or head constituent. In contrast, Zwitserlood (1994) provided evidence that the lexical processing system is sensitive to morphological complexity independent of semantic transparency. Libben and his colleagues (Libben et al. (1997), Libben et al. (2003)) were the first to systematically categorise noun-noun compounds with nominal modifiers and heads into four groups representing all possible combinations of modifier and head transparency (T) vs. opaqueness (O) within a compound. Examples for these categories were car-wash (TT), strawberry (OT), jailbird (TO), and hogwash (OO). Libben et al. confirmed Zwitserlood's analyses that both semantically transparent and semantically opaque compounds show morphological constituency; in addition, the semantic transparency of the head constituent was found to play a significant role.

From a computational point of view, addressing the compositionality of noun compounds (and of multi-word expressions more generally) is a crucial ingredient for lexicography and NLP applications, in order to know whether an expression should be treated as a whole or through its constituents, and what the expression means. For example, studies such as Cholakov and Kordoni (2014), Weller et al. (2014), Cap et al. (2015), and Salehi et al. (2015b) have integrated the prediction of multi-word compositionality into statistical machine translation.

Computational approaches to automatically predict the compositionality of noun compounds have mostly been realised as vector space models, and can be subdivided into two subfields: (i) approaches that aim to predict the meaning of a compound by composite functions, relying on the vectors of the constituents (e.g., Mitchell and Lapata (2010), Coecke et al. (2011), Baroni et al. (2014), and Hermann (2014)); and (ii) approaches that aim to predict the degree of compositionality of a compound, typically by comparing the compound vectors with the constituent vectors (e.g., Reddy et al. (2011), Salehi and Cook (2013), Schulte im Walde et al. (2013), Salehi et al. (2014; 2015a)). In line with subfield (ii), this paper aims to distinguish the contributions of modifier and head properties when predicting the compositionality of English and German noun-noun compounds in a vector space model.
To date, computational research on noun compounds has largely ignored the influence of constituent properties on the prediction of compositionality. Individual pieces of research noticed differences in the contributions of modifier and head constituents towards the composite functions predicting compositionality (Reddy et al., 2011; Schulte im Walde et al., 2013), but so far the roles of modifiers and heads have not been distinguished. We use a new gold standard of German noun-noun compounds annotated with corpus frequencies of the compounds and their constituents, productivity and ambiguity of the constituents, and semantic relations between the constituents; and we extend three existing gold standards of German and English noun-noun compounds (Ó Séaghdha, 2007; von der Heide and Borgwaldt, 2009; Reddy et al., 2011) to include approximately the same compound and constituent properties. Relying on a standard vector space model of compositionality, we then predict the degrees of compositionality of the English and German noun-noun compounds, and explore the influences of the compound and constituent properties. Our empirical computational analyses reveal that the empirical and semantic properties of the compounds and the head nouns play a significant role in determining the compositionality of noun compounds.

2 Related Work

Regarding relevant psycholinguistic research on the representation and processing of noun compounds, Sandra (1990) hypothesised that an associative prime should facilitate access and recognition of a noun compound if a compound constituent is accessed during processing. His three priming experiments revealed that in transparent noun-noun compounds both constituents are accessed, but he did not find priming effects for the constituents in opaque noun-noun compounds.

Zwitserlood (1994) performed an immediate partial repetition experiment and a priming experiment to explore and distinguish morphological and semantic structures in noun-noun compounds. On the one hand, she confirmed Sandra's results that there is no semantic facilitation of any constituent in opaque compounds. In contrast, she found evidence for morphological complexity independent of semantic transparency, and showed that both transparent and partially opaque compounds (i.e., compounds with one transparent and one opaque constituent) produce semantic priming of their constituents. For the heads of semantically transparent compounds, a larger amount of facilitation was found than for the modifiers. Differences in the results by Sandra (1990) and Zwitserlood (1994) were supposedly due to different definitions of partial opacity, and to different prime–target SOAs.

Libben and his colleagues (Libben et al. (1997), Libben (1998), and Libben et al. (2003)) were the first to systematically categorise noun-noun compounds with nominal modifiers and heads into four groups representing all possible combinations of a constituent's transparency (T) vs. opaqueness (O) within a compound: TT, OT, TO, OO. Libben's examples for these categories were car-wash (TT), strawberry (OT), jailbird (TO), and hogwash (OO). They confirmed Zwitserlood's analyses that both semantically transparent and semantically opaque compounds show morphological constituency, and also that the semantic transparency of the head constituent plays a significant role. Studies such as Jarema et al. (1999) and Kehayia et al. (1999) to a large extent confirmed the insights by Libben and his colleagues for French, Bulgarian, Greek and Polish.

Regarding related computational work, prominent approaches that model the meaning of a compound or a phrase by a composite function include Mitchell and Lapata (2010), Coecke et al. (2011), Baroni et al. (2014), and Hermann (2014). In this area, researchers combine the vectors of the compound/phrase constituents by mathematical functions such that the resulting vector optimally represents the meaning of the compound/phrase. This research is only marginally related to ours, since we are interested in the degree of compositionality of a compound, rather than in its actual meaning.

The most closely related computational work includes distributional approaches that predict the degree of compositionality of a compound with regard to a specific constituent, by comparing the compound vector to the respective constituent vector. Most importantly, Reddy et al. (2011) used a standard distributional model to predict the compositionality of compound–constituent pairs for 90 English compounds. They extended their predictions by applying composite functions (see above). In a similar vein, Schulte im Walde et al. (2013) predicted the compositionality of 244 German compounds. Salehi et al. (2014) defined a cross-lingual distributional model that used translations into multiple languages and distributional similarities in the respective languages to predict the compositionality for the two datasets from Reddy et al. (2011) and Schulte im Walde et al. (2013).
3 Noun-Noun Compounds

Our focus of interest is on noun-noun compounds such as butterfly, snowball and teaspoon as well as car park, zebra crossing and couch potato in English, and Ahornblatt 'maple leaf', Feuerwerk 'fireworks', and Löwenzahn 'dandelion' in German, where both the grammatical head (in English and German, this is typically the rightmost constituent) and the modifier are nouns. We are interested in the degrees of compositionality of noun-noun compounds, i.e., the semantic relatedness between the meaning of a compound (e.g., snowball) and the meanings of its constituents (e.g., snow and ball). More specifically, this paper aims to explore factors that have been found to influence compound processing and representation, such as

• frequency-based factors, i.e., the frequencies of the compounds and their constituents (van Jaarsveld and Rattink, 1988; Janssen et al., 2008);

• the productivity (morphological family size), i.e., the number of compounds that share a constituent (de Jong et al., 2002); and

• semantic variables such as the relationship between compound modifier and head: a teapot is a pot FOR tea; a snowball is a ball MADE OF snow (Gagné and Spalding, 2009; Ji et al., 2011).

In addition, we were interested in the effect of ambiguity (of both the modifiers and the heads) on the compositionality of the compounds.
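The frequency and productivity measures are simple counts over a compound list. As a minimal illustration (in Python, with invented toy data; a real setup would count over the corpus extractions described below and query GermaNet or WordNet for sense counts), the morphological family size of a constituent is just the number of compound types it occurs in:

```python
from collections import Counter

# Invented toy input: two-part compounds as (modifier, head) pairs,
# mapped to their corpus frequencies.
compounds = {
    ("snow", "ball"): 12000,
    ("tea", "spoon"): 8500,
    ("tea", "pot"): 9100,
    ("butter", "cup"): 700,
}

# Productivity (morphological family size): the number of compound TYPES
# sharing a modifier or head constituent (token frequencies are ignored).
modifier_productivity = Counter(mod for mod, head in compounds)
head_productivity = Counter(head for mod, head in compounds)

assert modifier_productivity["tea"] == 2   # teaspoon, teapot
assert head_productivity["ball"] == 1      # snowball
```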
Our explorations required gold standards of compounds annotated with all of these compound and constituent properties. Since most previous work on computational predictions of compositionality has been performed for English and for German, we decided to re-use existing datasets for both languages, which however required extensions to provide all the properties we wanted to take into account. We also created a novel gold standard. In the following, we describe the datasets; they are available from http://www.ims.uni-stuttgart.de/data/ghost-nn/.

German Noun-Noun Compound Datasets

As basis for this work, we created a novel gold standard of German noun-noun compounds: GhOST-NN (Schulte im Walde et al., 2016). The new gold standard was built such that it includes a representative choice of compounds and constituents from various frequency ranges and various productivity ranges, with various numbers of senses, and with various semantic relations. In the following, we describe the creation process in some detail, because the properties of the gold standard are highly relevant for the distributional models.

Relying on the 11.7 billion words of the web corpus DECOW14AX (http://corporafromtheweb.org/decow14/; Schäfer and Bildhauer, 2012; Schäfer, 2015), we extracted all words that were identified as common nouns by the TreeTagger (Schmid, 1994) and analysed as noun compounds with exactly two nominal constituents by the morphological analyser SMOR (Faaß et al., 2010). This set of 154,960 two-part noun-noun compound candidates was enriched with empirical properties relevant for the gold standard:

• corpus frequencies of the compounds and the constituents (i.e., modifiers and heads), relying on DECOW14AX;

• productivity of the constituents, i.e., how many compound types contained a specific modifier/head constituent;

• number of senses of the compounds and the constituents, relying on GermaNet (Hamp and Feldweg, 1997; Kunze, 2000).

From the set of compound candidates we extracted a random subset that was balanced for

• the productivity of the modifiers: we calculated tertiles to identify modifiers with low/mid/high productivity;

• the ambiguity of the heads: we distinguished between heads with 1, 2 and >2 senses.

(Ideally, the random subset would have been balanced across frequency, productivity and ambiguity ranges of both the compounds and their constituents, but defining and combining several ranges for each of the three criteria and for compounds as well as constituents would have led to an explosion of factors, so we focused on these two main criteria.) For each of the resulting nine categories (three productivity ranges × three ambiguity ranges), we randomly selected 20 noun-noun compounds from our candidate set, disregarding compounds with a corpus frequency < 2,000, and disregarding compounds containing modifiers or heads with a corpus frequency < 100. We refer to this dataset of 180 compounds balanced for modifier productivity and head ambiguity as GhOST-NN/S.
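A compact sketch of this balanced extraction, assuming a hypothetical candidate table whose column names (comp_freq, mod_freq, head_freq, mod_productivity, head_senses) are invented for illustration:

```python
import pandas as pd

# One row per compound candidate; column names are illustrative.
df = pd.read_csv("candidates.tsv", sep="\t")

# Frequency thresholds used when selecting GhOST-NN/S compounds.
df = df[(df.comp_freq >= 2000) & (df.mod_freq >= 100) & (df.head_freq >= 100)]

# Tertiles of modifier productivity: low/mid/high.
df["prod_bin"] = pd.qcut(df.mod_productivity, 3, labels=["low", "mid", "high"])

# Head ambiguity: 1, 2, or more than 2 senses.
df["amb_bin"] = pd.cut(df.head_senses, [0, 1, 2, float("inf")],
                       labels=["1", "2", ">2"])

# 20 random compounds per cell -> 9 cells x 20 = 180 compounds.
sample = (df.groupby(["prod_bin", "amb_bin"], observed=True)
            .sample(n=20, random_state=0))
```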
We also created a subset of 5 noun-noun compounds for each of the nine criteria combinations, by randomly selecting 5 of the 20 selected compounds in each cell. This small, balanced subset was then systematically extended by adding all compounds from the original set of compound candidates with either the same modifier or the same head as any of the selected compounds. Taking Haarpracht as an example (the modifier is Haar 'hair', the head is Pracht 'glory'), we added Haarwäsche 'hair washing', Haarkleid 'hair dress', Haarpflege 'hair care', etc. as well as Blütenpracht 'floral glory', Farbenpracht 'colour glory', etc. We refer to this dataset of 868 compounds, which gave up the coherent balance of criteria underlying our random extraction but instead ensured a variety of compounds with either the same modifiers or the same heads, as GhOST-NN/XL.

The two sets of compounds (GhOST-NN/S and GhOST-NN/XL) were annotated with the semantic relations between the modifiers and the heads, and with compositionality ratings. Regarding semantic relations, we applied the relation set suggested by Ó Séaghdha (2007), because (i) he had evaluated his annotation relations and annotation scheme, and (ii) his dataset had a similar size to ours, so we could aim to compare results across languages. Ó Séaghdha (2007) himself had relied on a set of nine semantic relations suggested by Levi (1978), and designed and evaluated a set of relations that took over four of Levi's relations (BE, HAVE, IN, ABOUT) and added two relations referring to event participants (ACTOR, INST(rument)) that replaced the relations MAKE, CAUSE, FOR, FROM and USE. An additional relation LEX refers to lexicalised compounds where no relation can be assigned. Three native speakers of German annotated the compounds with these seven semantic relations; in fact, the annotation was performed for a superset of 1,208 compounds, but we only took into account the 868 compounds with perfect agreement, i.e., IAA=1. Regarding compositionality ratings, eight native speakers of German annotated all 868 gold-standard compounds with compound–constituent compositionality ratings on a scale from 1 (definitely semantically opaque) to 6 (definitely semantically transparent). Another five native speakers provided additional annotation for our small core subset of 180 compounds on the same scale. As final compositionality ratings, we use the mean compound–constituent ratings across the 13 annotators.

As an alternative gold standard for German noun-noun compounds, we used a dataset based on a selection of noun compounds by von der Heide and Borgwaldt (2009) that was previously used in computational models predicting compositionality (Schulte im Walde et al., 2013; Salehi et al., 2014). The dataset contains a subset of their compounds comprising 244 two-part noun-noun compounds, annotated with compositionality ratings on a scale between 1 and 7. We enriched the existing dataset with frequencies and with productivity and ambiguity scores, also based on DECOW14AX and GermaNet, to provide the same empirical information as for the GhOST-NN datasets. We refer to this alternative German dataset as VDHB.

English Noun-Noun Compound Datasets

Reddy et al. (2011) created a gold standard for English noun-noun compounds. Assuming that compounds whose constituents appear either as their hypernyms or in their definitions tend to be compositional, they induced a candidate compound set with various degrees of compound–constituent relatedness from WordNet (Miller et al., 1990; Fellbaum, 1998) and Wiktionary. A random choice of 90 compounds that appeared with a corpus frequency > 50 in the ukWaC corpus (Baroni et al., 2009) constituted their gold-standard dataset and was annotated with compositionality ratings.
Bell and Schäfer (2013) annotated the compounds with semantic relations using all nine of Levi's original relation types: CAUSE, HAVE, MAKE, USE, BE, IN, FOR, FROM, ABOUT. We refer to this dataset as REDDY.

Ó Séaghdha developed computational models to predict the semantic relations between modifiers and heads in English noun compounds (Ó Séaghdha, 2008; Ó Séaghdha and Copestake, 2013; Ó Séaghdha and Korhonen, 2014). As the gold-standard basis for his models, he created a dataset of compounds and annotated the compounds with semantic relations: He tagged and parsed the written part of the British National Corpus using RASP (Briscoe and Carroll, 2002), and applied a simple heuristic to induce compound candidates: He used all sequences of two or more common nouns that were preceded or followed by sentence boundaries or by words not representing common nouns. Of these compound candidates, a random selection of 2,000 instances was used for relation annotation (Ó Séaghdha, 2007) and classification experiments. The final gold standard is a subset of these compounds, containing 1,443 noun-noun compounds. We refer to this dataset as OS.

Both English compound datasets were enriched with frequencies and productivities, based on ENCOW14AX (http://corporafromtheweb.org/encow14/), which contains 9.6 billion words. We also added the number of senses of the constituents to both datasets, using WordNet. And we collected compositionality ratings for a random choice of 396 compounds from the OS dataset, relying on eight experts, in the same way as the GhOST-NN ratings were collected.

Resulting Noun-Noun Compound Datasets

Table 1 summarises the gold-standard datasets. They are of different sizes, but their empirical and semantic annotations have been aligned to a large extent: they use similar corpora, rely on wordnets, and use similar semantic relation inventories based on Levi (1978).

Language  Dataset      #Compounds  Annotation:           Ambiguity  Relations
                                   Frequency/Productivity
DE        GhOST-NN/S          180  DECOW                 GermaNet   Levi (7)
DE        GhOST-NN/XL         868  DECOW                 GermaNet   Levi (7)
DE        VDHB                244  DECOW                 GermaNet   –
EN        REDDY                90  ENCOW                 WordNet    Levi (9)
EN        OS                  396  ENCOW                 WordNet    Levi (6)

Table 1: Noun-noun compound datasets.

4 VSMs Predicting Compositionality

Vector space models (VSMs) and distributional information have been a steadily growing, integral part of lexical semantic research over the past 20 years (Turney and Pantel, 2010): They explore the notion of "similarity" between a set of target objects, typically relying on the distributional hypothesis (Harris, 1954; Firth, 1957) to determine the co-occurrence features that best describe the words, phrases, sentences, etc. of interest.

In this paper, we use VSMs to model compounds as well as constituents by distributional vectors, and we determine the semantic relatedness between the compounds and their modifier and head constituents by measuring the distance between the vectors. We assume that the closer a compound vector and a constituent vector are to each other, the more compositional (i.e., the more transparent) the compound is with regard to that constituent. Correspondingly, the more distant a compound vector and a constituent vector are from each other, the less compositional (i.e., the more opaque) the compound is with regard to that constituent.

Our main questions regarding the VSMs concern the influence of constituent properties on the prediction of compositionality: How do the corpus frequencies of the compounds and their constituents, the productivity and ambiguity of the constituents, and the semantic relations between the constituents influence the quality of the predictions?

4.1 Vector Space Models (VSMs)

We created a standard vector space model for all compounds and constituents in the various datasets, using co-occurrence frequencies of nouns within a sentence-internal window of 20 words to the left and 20 words to the right of the targets. (In previous work, we systematically compared window-based and syntax-based co-occurrence variants for predicting compositionality (Schulte im Walde et al., 2013); the current work adopted the best choice of co-occurrence dimensions.) The frequencies were induced from the German and English COW corpora, and transformed to local mutual information (LMI) values (Evert, 2005).

Relying on the LMI vector space models, the cosine determined the distributional similarity between the compounds and their constituents, which was in turn used to predict the degree of compositionality between the compounds and their constituents, assuming that the stronger the distributional similarity (i.e., the higher the cosine values), the larger the degree of compositionality. The vector space predictions were evaluated against the mean human ratings of the degree of compositionality, using the Spearman rank-order correlation coefficient ρ (Siegel and Castellan, 1988).
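To make the pipeline concrete, the following is a minimal sketch of such a model (with toy dictionary-based vectors; the actual model uses noun co-occurrences within the ±20-word windows of the COW corpora): raw co-occurrence counts are reweighted by LMI, each compound–constituent pair is scored by the cosine of the two vectors, and the scores are correlated with the mean human ratings.

```python
import math
from collections import Counter, defaultdict
from scipy.stats import spearmanr

def lmi_vectors(cooc):
    """Turn raw counts {(target, context): freq} into LMI-weighted vectors:
    LMI(t,c) = f(t,c) * log( f(t,c) * N / (f(t) * f(c)) )  (Evert, 2005)."""
    n = sum(cooc.values())
    f_t, f_c = Counter(), Counter()
    for (t, c), f in cooc.items():
        f_t[t] += f
        f_c[c] += f
    vectors = defaultdict(dict)
    for (t, c), f in cooc.items():
        vectors[t][c] = f * math.log(f * n / (f_t[t] * f_c[c]))
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors {context: weight}."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm = math.sqrt(sum(w * w for w in u.values())) \
         * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def evaluate(vectors, gold):
    """gold: list of (compound, constituent, mean_human_rating) triples.
    Predict compositionality as the compound-constituent cosine and
    report Spearman's rho against the human ratings."""
    predictions = [cosine(vectors[comp], vectors[const])
                   for comp, const, _ in gold]
    ratings = [rating for _, _, rating in gold]
    rho, _ = spearmanr(predictions, ratings)
    return rho
```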
4.2 Overall VSM Prediction Results

Table 2 presents the overall prediction results across languages and datasets. The mod column shows the ρ correlations when predicting only the degree of compositionality of compound–modifier pairs; the head column shows the ρ correlations when predicting only the degree of compositionality of compound–head pairs; and the both column shows the ρ correlations when predicting the degree of compositionality of compound–modifier and compound–head pairs at the same time.

Language  Dataset      mod   head  both
DE        GhOST-NN/S   0.48  0.57  0.46
DE        GhOST-NN/XL  0.49  0.59  0.47
DE        VDHB         0.65  0.60  0.61
EN        REDDY        0.48  0.60  0.56
EN        OS           0.46  0.39  0.35

Table 2: Overall prediction results (ρ).

The models for VDHB and REDDY represent replications of similar models in Schulte im Walde et al. (2013) and Reddy et al. (2011), respectively, but using the much larger COW corpora. Overall, the both prediction results on VDHB are significantly better than all others except REDDY, and the prediction results on the OS compounds are significantly worse than all others. (All significance tests in this paper were performed by Fisher r-to-z transformation.) We can also compare within-dataset results: Regarding the two GhOST-NN datasets and the REDDY dataset, the VSM predictions for the compound–head pairs are better than for the compound–modifier pairs; regarding the VDHB and OS datasets, the VSM predictions for the compound–modifier pairs are better than for the compound–head pairs. These differences do not depend on the language (according to our datasets), and are probably due to properties of the specific gold standards that we did not control. They are, however, also not the main point of this paper.
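For two independent correlations, the Fisher r-to-z comparison transforms each ρ, scales the difference by the standard error implied by the sample sizes, and reads a p-value off the normal distribution. A minimal sketch (using the both results of VDHB and OS as illustrative inputs; the exact implementation behind the paper's tests may differ):

```python
import math
from scipy.stats import norm

def fisher_r_to_z_test(r1, n1, r2, n2):
    """Two-sided test for the difference between two independent
    correlations via Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    return 2 * (1 - norm.cdf(abs(z)))  # p-value

# Illustrative: compare the 'both' correlations of VDHB and OS.
print(fisher_r_to_z_test(0.61, 244, 0.35, 396))  # small p -> significant
```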
4.3 Influence of Compound Properties on VSM Prediction Results

Figures 1 to 5 present the core results of this paper: They explore the influence of compound and constituent properties on predicting compositionality. Since we wanted to maximise insight into the influence of the properties, we selected the 60 maximum instances and the 60 minimum instances for each property. (For REDDY, we could only use 45 maximum/minimum instances, since the dataset only contains 90 compounds.) For example, to explore the influence of head frequency on the prediction quality, we selected the 60 most frequent and the 60 most infrequent compound heads from each gold-standard resource, and calculated Spearman's ρ for each set of 60 compounds with these heads.
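A sketch of this subset analysis, assuming a hypothetical per-pair table with an illustrative property column (here head_freq), the gold rating and the model's cosine prediction:

```python
import pandas as pd
from scipy.stats import spearmanr

def subset_rhos(df, prop, k=60):
    """Spearman's rho on the k maximum and the k minimum instances of a
    property. df columns (illustrative): prop, gold_rating, predicted_cosine."""
    top = df.nlargest(k, prop)
    bottom = df.nsmallest(k, prop)
    rho_max, _ = spearmanr(top.predicted_cosine, top.gold_rating)
    rho_min, _ = spearmanr(bottom.predicted_cosine, bottom.gold_rating)
    return rho_max, rho_min

# e.g., effect of head frequency (use k=45 for REDDY, which has 90 compounds)
# rho_hi, rho_lo = subset_rhos(ghost_xl, "head_freq")
```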
Figure 1 shows that the distributional model predicts high-frequency compounds (red bars) better than low-frequency compounds (blue bars), across datasets. The differences are significant for GhOST-NN/XL.

[Figure 1: Effect of compound frequency.]

Figure 2 shows that the distributional model predicts compounds with low-frequency heads better than compounds with high-frequency heads (right panel), while there is no tendency regarding the modifier frequencies (left panel). The differences regarding the head frequencies are significant (p = 0.1) for both GhOST-NN datasets.

[Figure 2: Effect of modifier/head frequency.]

Figure 3 shows that the distributional model also predicts compounds with low-productivity heads better than compounds with high-productivity heads (right panel), while there is no tendency regarding the productivities of the modifiers (left panel). The prediction differences regarding the head productivities are significant for GhOST-NN/S (p < 0.05).

[Figure 3: Effect of modifier/head productivity.]
Figure 4 shows that the distributional model also predicts compounds with low-ambiguity heads better than compounds with high-ambiguity heads (right panel), with one exception (GhOST-NN/XL), while there is no tendency regarding the ambiguities of the modifiers (left panel). The prediction differences regarding the head ambiguities are significant for GhOST-NN/XL (p < 0.01).

[Figure 4: Effect of modifier/head ambiguity.]

Figure 5 compares the predictions of the distributional model regarding the semantic relations between modifiers and heads, focusing on GhOST-NN/XL. The numbers in brackets refer to the number of compounds with the respective relation. The plot reveals differences between the predictions of compounds with different relations.

[Figure 5: Effect of semantic relation.]

Table 3 summarises those differences across gold standards that are significant (filled cells refer to rows significantly outperforming columns). Overall, the compositionality of BE compounds is predicted significantly better than the compositionality of HAVE compounds (in REDDY), INST and ABOUT compounds (in GhOST-NN) and ACTOR compounds (in GhOST-NN and OS). The compositionality of ACTOR compounds is predicted significantly worse than the compositionality of BE, HAVE, IN and INST compounds in both GhOST-NN and OS.

       HAVE    INST    ABOUT   ACTOR
BE     REDDY   GhOST   GhOST   GhOST, OS
HAVE   –       OS      –       GhOST, OS
IN     –       –       –       GhOST, OS
INST   –       –       –       GhOST, OS

Table 3: Significant differences: relations.

5 Discussion

While modifier frequency, productivity and ambiguity did not show a consistent effect on the predictions, head frequency, productivity and ambiguity influenced the predictions such that the prediction quality for compounds with low-frequency, low-productivity and low-ambiguity heads was better than for compounds with high-frequency, high-productivity and high-ambiguity heads. These differences were significant only for our new GhOST-NN datasets. In addition, the compound frequency also had an effect on the predictions, with high-frequency compounds receiving better prediction results than low-frequency compounds. Finally, the quality of the predictions also differed across compound relation types, with BE compounds predicted best and ACTOR compounds predicted worst. These differences were ascertained mostly in the GhOST-NN and OS datasets.

Our results raise two main questions:

(1) What does it mean if a distributional model predicts a certain subset of compounds (with specific properties) "better" or "worse" than other subsets?

(2) What are the implications for (a) psycholinguistic and (b) computational models regarding the compositionality of noun compounds?

Regarding question (1), there are two possible reasons why a distributional model predicts a certain subset of compounds better or worse than other subsets. On the one hand, one of the underlying gold-standard datasets could contain compounds whose compositionality scores are easier to predict than the compositionality scores of compounds in a different dataset. On the other hand, even if there were differences in individual dataset pairs, this would not explain why we consistently find modelling differences for head constituent properties (and compound properties) but not for modifier constituent properties. We therefore conclude that the effects of compound and head properties are due to the compounds' morphological constituency, with specific emphasis on the influences of the heads.
Looking at the individual effects of the compound and head properties that influence the distributional predictions, we hypothesise that high-frequency compounds are easier to predict because they have better corpus coverage (and less sparse data) than low-frequency compounds, and because they contain many clearly transparent compounds (such as Zitronensaft 'lemon juice') and at the same time many clearly opaque compounds (such as Eifersucht 'jealousy', where the literal translations of the constituents are 'eagerness' and 'addiction'). Concerning the decrease in prediction quality for more frequent, more productive and more ambiguous heads, we hypothesise that all of these properties are indicators of ambiguity, and the more ambiguous a word is, the more difficult it is to provide a unique distributional prediction, since distributional co-occurrence in most cases (including our current work) subsumes the contexts of all word senses within one vector. For example, more than half of the compounds with the most frequent and also the most productive heads have the head Spiel, which has six senses in GermaNet and covers six relations (BE, IN, INST, ABOUT, ACTOR, LEX).

Regarding question (2), the results of our distributional predictions confirm psycholinguistic research that identified morphological constituency in noun-noun compounds: Our models clearly distinguish between properties of the whole compounds, properties of the modifier constituents, and properties of the head constituents. Furthermore, our models reveal the need to carefully balance the frequencies and semantic relations of target compounds, and to carefully balance the frequencies, productivities and ambiguities of their head constituents, in order to optimise experiment interpretations, while a careful choice of empirical modifier properties seems to play a minor role.

For computational models, our work provides similar implications. We demonstrated the need to carefully balance gold-standard datasets for multi-word expressions according to the empirical and semantic properties of the multi-word expressions themselves, and also according to those of their constituents. In the case of noun-noun compounds, the properties of the nominal modifiers were of minor importance, but for other types of multi-word expressions this might differ. If datasets are not balanced for compound and constituent properties, the quality of model predictions is difficult to interpret, because it is not clear whether biases in empirical properties skewed the results. Our advice is strengthened by the fact that most significant differences in prediction results were demonstrated for our new gold standard, which includes compounds across various frequency, productivity and ambiguity ranges.

6 Conclusion

We explored the role of constituent properties in English and German noun-noun compounds when predicting compositionality within a vector space model. The results demonstrated that the empirical and semantic properties of the compounds and the head nouns play a significant role. Therefore, psycholinguistic experiments as well as computational models are advised to carefully balance their selections of compound targets according to compound and constituent properties.

Acknowledgments

The research presented in this paper was funded by the DFG Heisenberg Fellowship SCHU 2580/1 (Sabine Schulte im Walde), the DFG Research Grant SCHU 2580/2 "Distributional Approaches to Semantic Relatedness" (Stefan Bott), and the DFG Collaborative Research Center SFB 732 (Anna Hätty).

References

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation, 43(3):209–226.

Marco Baroni, Raffaella Bernardi, and Roberto Zamparelli. 2014. Frege in Space: A Program for Compositional Distributional Semantics. Linguistic Issues in Language Technologies, 9(6):5–110.

Melanie J. Bell and Martin Schäfer. 2013. Semantic Transparency: Challenges for Distributional Semantics. In Proceedings of the IWCS Workshop on Formal Distributional Semantics, pages 1–10, Potsdam, Germany.

Ted Briscoe and John Carroll. 2002. Robust Accurate Statistical Annotation of General Text. In Proceedings of the 3rd Conference on Language Resources and Evaluation, pages 1499–1504, Las Palmas de Gran Canaria, Spain.

Fabienne Cap, Manju Nirmal, Marion Weller, and Sabine Schulte im Walde. 2015. How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation. In Proceedings of the 11th Workshop on Multiword Expressions, pages 19–28, Denver, Colorado, USA.
Kostadin Cholakov and Valia Kordoni. 2014. Better Statistical Machine Translation through Linguistic Treatment of Phrasal Verbs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 196–201, Doha, Qatar.

Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark. 2011. Mathematical Foundations for a Compositional Distributional Model of Meaning. Linguistic Analysis, 36(1-4):345–384.

Nicole H. de Jong, Laurie B. Feldman, Robert Schreuder, Michael Pastizzo, and Harald R. Baayen. 2002. The Processing and Representation of Dutch and English Compounds: Peripheral Morphological and Central Orthographic Effects. Brain and Language, 81:555–567.

Stefan Evert. 2005. The Statistics of Word Co-Occurrences: Word Pairs and Collocations. Ph.D. thesis, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Gertrud Faaß, Ulrich Heid, and Helmut Schmid. 2010. Design and Application of a Gold Standard for Morphological Analysis: SMOR in Validation. In Proceedings of the 7th International Conference on Language Resources and Evaluation, pages 803–810, Valletta, Malta.

Christiane Fellbaum, editor. 1998. WordNet – An Electronic Lexical Database. Language, Speech, and Communication. MIT Press, Cambridge, MA.

John R. Firth. 1957. Papers in Linguistics 1934-51. Longmans, London, UK.

Christina L. Gagné and Thomas L. Spalding. 2009. Constituent Integration during the Processing of Compound Words: Does it involve the Use of Relational Structures? Journal of Memory and Language, 60:20–35.

Birgit Hamp and Helmut Feldweg. 1997. GermaNet – A Lexical-Semantic Net for German. In Proceedings of the ACL Workshop on Automatic Information Extraction and Building Lexical Semantic Resources for NLP Applications, pages 9–15, Madrid, Spain.

Zellig Harris. 1954. Distributional Structure. Word, 10(23):146–162.

Karl Moritz Hermann. 2014. Distributed Representations for Compositional Semantics. Ph.D. thesis, University of Oxford.

Niels Janssen, Yanchao Bi, and Alfonso Caramazza. 2008. A Tale of Two Frequencies: Determining the Speed of Lexical Access for Mandarin Chinese and English Compounds. Language and Cognitive Processes, 23:1191–1223.

Gonia Jarema, Celine Busson, Rossitza Nikolova, Kyrana Tsapkini, and Gary Libben. 1999. Processing Compounds: A Cross-Linguistic Study. Brain and Language, 68:362–369.

Hongbo Ji, Christina L. Gagné, and Thomas L. Spalding. 2011. Benefits and Costs of Lexical Decomposition and Semantic Integration during the Processing of Transparent and Opaque English Compounds. Journal of Memory and Language, 65:406–430.

Eva Kehayia, Gonia Jarema, Kyrana Tsapkini, Danuta Perlak, Angela Ralli, and Danuta Kadzielawa. 1999. The Role of Morphological Structure in the Processing of Compounds: The Interface between Linguistics and Psycholinguistics. Brain and Language, 68:370–377.

Claudia Kunze. 2000. Extension and Use of GermaNet, a Lexical-Semantic Database. In Proceedings of the 2nd International Conference on Language Resources and Evaluation, pages 999–1002, Athens, Greece.

Judith N. Levi. 1978. The Syntax and Semantics of Complex Nominals. Academic Press, London.

Gary Libben, Martha Gibson, Yeo Bom Yoon, and Dominiek Sandra. 1997. Semantic Transparency and Compound Fracture. Technical Report 9, CLASNET Working Papers.

Gary Libben. 1998. Semantic Transparency in the Processing of Compounds: Consequences for Representation, Processing, and Impairment. Brain and Language, 61:30–44.

Gary Libben, Martha Gibson, Yeo Bom Yoon, and Dominiek Sandra. 2003. Compound Fracture: The Role of Semantic Transparency and Morphological Headedness. Brain and Language, 84:50–64.

George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. 1990. Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4):235–244.

Jeff Mitchell and Mirella Lapata. 2010. Composition in Distributional Models of Semantics. Cognitive Science, 34:1388–1429.

Diarmuid Ó Séaghdha. 2007. Designing and Evaluating a Semantic Annotation Scheme for Compound Nouns. In Proceedings of Corpus Linguistics, Birmingham, UK.

Diarmuid Ó Séaghdha. 2008. Learning Compound Noun Semantics. Ph.D. thesis, University of Cambridge, Computer Laboratory. Technical Report UCAM-CL-TR-735.

Diarmuid Ó Séaghdha and Ann Copestake. 2013. Interpreting Compound Nouns with Kernel Methods. Journal of Natural Language Engineering, 19(3):331–356.

Diarmuid Ó Séaghdha and Anna Korhonen. 2014. Probabilistic Distributional Semantics with Latent Variable Models. Computational Linguistics, 40(3):587–631.
Siva Reddy, Diana McCarthy, and Suresh Manandhar. 2011. An Empirical Study on Compositionality in Compound Nouns. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 210–218, Chiang Mai, Thailand.

Bahar Salehi and Paul Cook. 2013. Predicting the Compositionality of Multiword Expressions Using Translations in Multiple Languages. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, pages 266–275, Atlanta, GA.

Bahar Salehi, Paul Cook, and Timothy Baldwin. 2014. Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 472–481, Gothenburg, Sweden.

Bahar Salehi, Paul Cook, and Timothy Baldwin. 2015a. A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics/Human Language Technologies, pages 977–983, Denver, Colorado, USA.

Bahar Salehi, Nitika Mathur, Paul Cook, and Timothy Baldwin. 2015b. The Impact of Multiword Expression Compositionality on Machine Translation Evaluation. In Proceedings of the 11th Workshop on Multiword Expressions, pages 54–59, Denver, Colorado, USA.

Dominiek Sandra. 1990. On the Representation and Processing of Compound Words: Automatic Access to Constituent Morphemes does not occur. The Quarterly Journal of Experimental Psychology, 42A:529–567.

Roland Schäfer and Felix Bildhauer. 2012. Building Large Corpora from the Web Using a New Efficient Tool Chain. In Proceedings of the 8th International Conference on Language Resources and Evaluation, pages 486–493, Istanbul, Turkey.

Roland Schäfer. 2015. Processing and Querying Large Web Corpora with the COW14 Architecture. In Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora, pages 28–34, Mannheim, Germany.

Helmut Schmid. 1994. Probabilistic Part-of-Speech Tagging using Decision Trees. In Proceedings of the 1st International Conference on New Methods in Language Processing.

Sabine Schulte im Walde, Stefan Müller, and Stephen Roller. 2013. Exploring Vector Space Models to Predict the Compositionality of German Noun-Noun Compounds. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics, pages 255–265, Atlanta, GA.

Sabine Schulte im Walde, Anna Hätty, Stefan Bott, and Nana Khvtisavrishvili. 2016. GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds. In Proceedings of the 10th International Conference on Language Resources and Evaluation, pages 2285–2292, Portoroz, Slovenia.

Sidney Siegel and N. John Castellan. 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, Boston, MA.

Peter D. Turney and Patrick Pantel. 2010. From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37:141–188.

Henk J. van Jaarsveld and Gilbert E. Rattink. 1988. Frequency Effects in the Processing of Lexicalized and Novel Nominal Compounds. Journal of Psycholinguistic Research, 17:447–473.

Claudia von der Heide and Susanne Borgwaldt. 2009. Assoziationen zu Unter-, Basis- und Oberbegriffen. Eine explorative Studie. In Proceedings of the 9th Norddeutsches Linguistisches Kolloquium, pages 51–74.

Marion Weller, Fabienne Cap, Stefan Müller, Sabine Schulte im Walde, and Alexander Fraser. 2014. Distinguishing Degrees of Compositionality in Compound Splitting for Statistical Machine Translation. In Proceedings of the 1st Workshop on Computational Approaches to Compound Analysis, pages 81–90, Dublin, Ireland.

Pienie Zwitserlood. 1994. The Role of Semantic Transparency in the Processing and Representation of Dutch Compounds. Language and Cognitive Processes, 9:341–368.