WORDNETGRAPH: STRUCTURING WORDNET NATURAL LANGUAGE DEFINITIONS
WordNetGraph: Structuring WordNet Natural Language Definitions

Vivian Silva, Santos, Jelena Mitrovic∗1, and Siegfried Handschuh1

1 Faculty of Computer Science and Mathematics, University of Passau – Germany

Abstract

WordNet is widely used as a linguistic resource in a number of semantic tasks, such as Question Answering, Information Retrieval, and Text Entailment, but systems usually query only the links between terms, such as synonym, hypernym or derivational form relationships. The synsets' definitions are usually left aside, although they contain a large amount of relevant information. These natural language definitions can serve as a rich source of knowledge, but structuring them into a comprehensible semantic model is essential for making them useful in semantic interpretation tasks.

In order to allow the use of WordNet's natural language definitions as a structured knowledge source in NLP tasks, we developed WordNetGraph, a graph knowledge base built according to the methodology described in [1]. WordNetGraph builds upon a conceptual model based on entity-centered semantic roles for definitions [2], that is, roles that express the part played by an expression in a definition, showing how it relates to the definiendum, i.e., the entity being defined. This model extends Aristotle's classic genus-differentia definition pattern [3, 4, 5]: the genus concept is replaced by the supertype role (the definiendum's superclass, immediate or not); the essential properties represented by the differentia concept are split into the differentia quality and differentia event roles; and other roles, such as associated fact, purpose or accessory quality, among others, represent the definiendum's non-essential attributes.
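The role model described above can be pictured as a small data structure. The following is only a minimal sketch: the class names `Segment` and `LabeledDefinition`, the helper `by_role`, and the "kettle" definition are assumptions made for illustration and are not taken from WordNet or from the authors' data. The role names are the ones listed in the abstract, which notes that further roles exist.

```python
from dataclasses import dataclass
from typing import List

# Role names as listed in the abstract. The abstract says "among others",
# so this set is deliberately incomplete.
ROLES = {
    "supertype",
    "differentia quality",
    "differentia event",
    "associated fact",
    "purpose",
    "accessory quality",
}


@dataclass(frozen=True)
class Segment:
    """One span of a definition together with its semantic role (hypothetical type)."""
    text: str
    role: str

    def __post_init__(self) -> None:
        # Reject roles that are not in the known set.
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")


@dataclass
class LabeledDefinition:
    """A definiendum and the role-labeled segments of its definition (hypothetical type)."""
    definiendum: str
    segments: List[Segment]

    def by_role(self, role: str) -> List[str]:
        """Return the text of every segment carrying the given role."""
        return [s.text for s in self.segments if s.role == role]


# Hand-made example for illustration only: "a metal pot for boiling water".
# It is not a WordNet gloss and not output of the authors' classifier.
kettle = LabeledDefinition(
    definiendum="kettle",
    segments=[
        Segment("metal", "differentia quality"),
        Segment("pot", "supertype"),
        Segment("for boiling water", "purpose"),
    ],
)
```

In this sketch, `kettle.by_role("supertype")` returns the segment that plays the genus-like role, while the purpose segment is kept apart as a non-essential attribute.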
To build the graph, a small sample of WordNet definitions was first automatically pre-annotated, using the syntactic patterns described in [2] to assign the suitable semantic roles to each segment in a definition, and then manually curated to create a training dataset. This dataset was used to train a machine learning classifier [6], which was later used to label all WordNet noun and verb definitions. After a post-processing phase to fix minor errors in the sequence of labels, the classified data was serialized in RDF format. Figure 1 shows an example of a labeled definition (for the WordNet synset "lake poets"). The same labeled definition is depicted in the final graph format in Figure 2.

WordNetGraph was primarily designed for, and successfully used in, an interpretable text entailment recognition approach that provides human-readable justifications for the entailment decision. Using an algorithm based on distributional semantics [7] to navigate the graph, we look for a path linking the entailing text T to the entailed hypothesis H. If we succeed, then the entailment is confirmed, and the contents of the nodes in the retrieved path are used to build a natural language justification that explains why the entailment is true and what exactly the semantic relationship between T and H is. The complete description of the text entailment recognition approach, including evaluation results and justification examples, can be found in [8].

Figure 1. Example of role labeling for the definition of "lake poets"

Figure 2. RDF representation for the definition of "lake poets"

In future work, this methodology will be applied to GermaNet, and it will also include the adjective synsets because they are organized hierarchically in the lexico-semantic network for the German language.

References

[1] Silva, V. S., Freitas, A., and Handschuh, S. (2018). Building a Knowledge Graph from Natural Language Definitions for Interpretable Text Entailment Recognition. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
[2] Silva, V. S., Handschuh, S., and Freitas, A. (2016). Categorization of semantic roles for dictionary definitions. In Cognitive Aspects of the Lexicon (CogALex-V), Workshop at COLING 2016, pages 176–184.
[3] Berg, J. (1982). Aristotle's theory of definition. ATTI del Convegno Internazionale di Storia della Logica, pages 19–30.
[4] Granger, E. H. (1984). Aristotle on genus and differentia. Journal of the History of Philosophy, 22(1):1–23.
[5] Lloyd, A. C. (1962). Genus, species and ordered series in Aristotle. Phronesis, pages 67–90.
[6] Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tur, D., He, X., Heck, L., Tur, G., Yu, D., et al. (2015). Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(3):530–539.
[7] Freitas, A., da Silva, J. C. P., Curry, E., and Buitelaar, P. (2014). A distributional semantics approach for selective reasoning on commonsense graph knowledge bases. In International Conference on Applications of Natural Language to Data Bases/Information Systems, pages 21–32. Springer.
[8] Silva, V. S., Freitas, A., and Handschuh, S. (2018). Recognizing and justifying text entailment through distributional navigation on definition graphs. In AAAI.

∗ Speaker
sciencesconf.org:wnlex2018:217083
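The path search described in the WordNetGraph abstract above can be illustrated on a toy graph. This is only a sketch under loud assumptions: it uses a plain breadth-first search, not the distributional navigation algorithm of [7]; the triples, node names, and the helpers `find_path` and `justify` are hypothetical and do not reflect the actual WordNetGraph RDF vocabulary.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, role, object)

# Hypothetical toy graph; names are illustrative, not WordNetGraph data.
TRIPLES: List[Triple] = [
    ("kettle", "supertype", "pot"),
    ("pot", "supertype", "container"),
    ("kettle", "purpose", "boiling water"),
]


def find_path(triples: List[Triple], source: str, target: str) -> Optional[List[Triple]]:
    """Breadth-first search for a directed path from source to target.

    Stand-in for the distributional navigation in the abstract: edges are
    explored in insertion order instead of being ranked by semantic relatedness.
    Returns the list of traversed triples, or None if no path exists.
    """
    adjacency: Dict[str, List[Triple]] = {}
    for triple in triples:
        adjacency.setdefault(triple[0], []).append(triple)

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for triple in adjacency.get(node, []):
            neighbour = triple[2]
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [triple]))
    return None


def justify(path: List[Triple]) -> str:
    """Render a retrieved path as a crude, human-readable chain."""
    return "; ".join(f"{s} -[{r}]-> {o}" for s, r, o in path)
```

With this toy data, a path from "kettle" to "container" exists and can be rendered by `justify`, whereas no path leads from "container" back to "kettle", so `find_path` returns `None`. A real justification would be phrased in natural language from the node contents, as the abstract describes.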
Refining WordNets in the Context of OntoLex-Lemon

Thierry Declerck∗1

1 Austrian Center for Digital Humanities, at Austrian Academy of Sciences – Austria

Abstract

In the paper [1], we presented an approach for encoding German compounds listed in GermaNet into the decomposition module of OntoLex-Lemon (https://www.w3.org/2016/05/ontolex/). This led us to the possibility of associating distinct senses with a component that is used in different compounds. Beyond this, we could also associate senses with components of compounds that are not per se lexical entries.

We are currently extending this work to the consideration of morphological variants of a word that is listed in Princeton WordNet [2]. This concerns inflectional and derivational variants. Inflectional variants are relevant because some differences in the use of the singular or the plural form of a word can affect the type and the range of senses (or synsets) associated with a specific lemma. Our approach would thus consist in associating synsets not only with lemmas, but also with forms (in the terminology of OntoLex-Lemon). This can easily be done by adding some restrictions to the senses listed in OntoLex-Lemon. We will present our current encoding of such phenomena in the poster.

Derivation is also a topic of interest, as it would allow us to describe (slight) sense modifications when considering, for example, a verb and its nominalisation, or an adjective and its adverbial derivation. Using OntoLex-Lemon for encoding those aspects could help in extending WordNet with this type of information, including shifts of meaning. We are currently working on proposing an encoding of this phenomenon in OntoLex-Lemon.

References:

[1] Thierry Declerck, Piroska Lendvai. Towards a Formal Representation of Components of German Compounds. In: Micha Elsner, Sandra Kuebler (eds.): Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Berlin, Germany, ACL, Humboldt University, 8/2016.
[2] Princeton University. "About WordNet." https://wordnet.princeton.edu/WordNet. Princeton University. 2010.

∗ Speaker
sciencesconf.org:wnlex2018:219592
WordNet and Distributional Semantics for Computational Rhetoric

Jelena Mitrović, Siegfried Handschuh

Faculty of Computer Science and Mathematics, University of Passau

Computational rhetoric (CR) is an area of Natural Language Processing (NLP) dealing with computational approaches to the modelling and detection of rhetorical figures, as well as rhetorical relations, which, in turn, might aid tasks such as Sentiment and Opinion Mining and Analysis, Argument mining, Argumentation modelling, and Analysis of political argumentation (Mitrović et al., 2017). In this poster, we present some recent advances in CR that use WordNet as a starting point and a valuable resource, as well as future directions in this regard, relating to the paradigm of Distributional semantics.

Ontological modelling of rhetorical figures (Mladenović and Mitrović, 2013) and rhetorical relations (Groza et al., 2016) is one approach to salient CR solutions. On the other hand, WordNet can successfully be used in CR. For the figures Irony and Sarcasm, the Serbian WordNet ontology (SWN) was used, enhanced with the new semantic relations specificOf/specifiedBy, which connect adjective and noun synsets and play the semantic role of the figure Simile (Mladenović et al., 2016; Mladenović et al., 2017; Mitrović, 2018). Detection of rhetorical figures was performed in a Machine learning system shown in Figure 1.

Figure 1. Detection of Irony and Sarcasm using WordNet

Distributional semantics models and distributional representations, such as word embeddings, have been a hot topic in NLP and CR recently. Khodak et al. (2017) have used word embeddings to construct Wordnets in French and Russian, and their approach can be extended to other languages. Gutierrez et al. (2016) investigate compositional distributional semantic models for literal and metaphorical senses.
O'Reilly and Harris (2017) describe a multidimensional vector space to imagine and calculate the rhetorical figure Antimetabole, which has been seen as an important figure in political argumentation (Mitrović et al., 2017). Likewise, Zayed et al. (2018) identify metaphors at the phrase level using distributed representations of word meaning. All these works pave the way to a plethora of possibilities for hybrid approaches to using WordNet and its distributed representation for Computational rhetoric purposes. We envision exciting research directions stemming from Distributional semantics and WordNet modelling, to harness the deeper semantics of lexico-semantic networks and allow for new NLP and CR approaches.

References

Groza, T., Kim, H.L., Handschuh, S. (2016). SALT: Enriching LaTeX with semantic annotations. In Proceedings of the 5th International Semantic Web Conference (ISWC 2006), Athens, GA.
Gutierrez, D., Shutova, E., Marghetis, T., and Bergen, B. (2016). Literal and Metaphorical Senses in Compositional Distributional Semantic Models. In Proceedings of ACL 2016, Berlin, Germany.
Khodak, M., Risteski, A., Fellbaum, C., Arora, S. (2017). Extending and improving Wordnet via unsupervised word embeddings. Linguistic Issues in Language Technology, Vol 10, Issue 4.
Mitrović, J., O'Reilly, C., Mladenovic, M., Handschuh, S. (2017). Ontological Representations of Rhetorical Figures for Argument Mining. Argument and Computation. 2018;7(3).
Mitrović, J. (2018). Electronic Lexical Resources and Tools for Natural Language Processing of Serbian and their Enhancement via Crowdsourcing. PhD Thesis. University of Belgrade. http://uvidok.rcub.bg.ac.rs/bitstream/handle/123456789/2431/Doktorat.pdf?sequence=2
Mladenović, M. and Mitrović, J. (2013). Ontology of rhetorical figures for Serbian. In Proceedings of Text, Speech, and Dialogue – 16th International Conference, TSD2013. I. Habernal and V. Matoušek, eds. pp. 386–393.
Mladenović, M., Mitrović, J., Krstev, C. (2016). A language-independent model for introducing a new semantic relation between adjectives and nouns in a WordNet. In Proceedings of Eighth Global WordNet Conference, GWC2016, pp. 218–225.
Mladenović, M., Mitrović, J., Krstev, C., Stanković, R. (2017). Using Lexical Resources for Irony and Sarcasm Classification. The 8th Balkan Conference in Informatics, Skopje, Macedonia, 20-23.
O'Reilly, C. and Harris, R. A. (2017). Antimetabole and Image Schemata: ontological and vector space models. Joint Ontology Workshop 2017, Bozen-Bolzano.
Zayed, O., McCrae, J.P., Buitelaar, P. (2018). Phrase-level metaphor identification using distributed representations of word meaning. NAACL 2018-FigLang2018 Workshop.