        WordNetGraph: Structuring WordNet Natural Language Definitions

                  Vivian Silva, Santos, Jelena Mitrović∗1, and Siegfried Handschuh
                  1 Faculty of Computer Science and Mathematics, University of Passau – Germany

                                                  Abstract

      WordNet is widely used as a linguistic resource in a number of semantic tasks, such as
      Question Answering, Information Retrieval, and Text Entailment, but systems usually query only
      the links between terms, such as synonym, hypernym, or derivational form relationships. The
      synsets’ definitions are usually left aside, although they contain a large amount of relevant
      information. These natural language definitions can serve as a rich source of knowledge, but
      they must first be structured into a comprehensible semantic model to be useful in semantic
      interpretation tasks.
      In order to allow the use of WordNet’s natural language definitions as a structured knowledge
      source in NLP tasks, we developed WordNetGraph, a graph knowledge base built according to the
      methodology described in [1]. WordNetGraph builds upon a conceptual model based on
      entity-centered semantic roles for definitions [2], that is, roles that express the part played
      by an expression in a definition, showing how it relates to the definiendum, i.e., the entity
      being defined. This model extends Aristotle’s classic genus-differentia definition pattern
      [3, 4, 5]: the genus concept is replaced by the supertype role (the definiendum’s superclass,
      immediate or not); the essential properties represented by the differentia concept are split
      into the differentia quality and differentia event roles; and other roles, such as associated
      fact, purpose, or accessory quality, among others, represent the definiendum’s non-essential
      attributes.
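      To make the role inventory concrete, the following Python sketch models a role-labeled
      definition as a list of segments. It is a minimal illustration under our own assumptions, not
      the WordNetGraph data model: the class names, the extra role "event location", and the toy
      "skyscraper" gloss with its hand-assigned labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Roles named in the abstract, plus "event location", which is assumed here
# for illustration only.
ROLES = {
    "supertype",
    "differentia quality",
    "differentia event",
    "event location",
    "associated fact",
    "purpose",
    "accessory quality",
}

# Roles that together play the part of Aristotle's genus and differentia.
GENUS_DIFFERENTIA = {"supertype", "differentia quality", "differentia event"}


@dataclass
class Segment:
    text: str  # a span of the definition text
    role: str  # the part this span plays with respect to the definiendum

    def __post_init__(self) -> None:
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")


@dataclass
class LabeledDefinition:
    definiendum: str
    segments: list[Segment] = field(default_factory=list)

    def by_role(self, role: str) -> list[str]:
        """Texts of all segments labeled with `role`, in definition order."""
        return [s.text for s in self.segments if s.role == role]

    def genus_differentia(self) -> list[Segment]:
        """Segments in the genus-differentia core; other roles are non-essential."""
        return [s for s in self.segments if s.role in GENUS_DIFFERENTIA]


# Toy gloss "a very tall building with many stories", labeled by hand.
sky = LabeledDefinition("skyscraper", [
    Segment("very tall", "differentia quality"),
    Segment("building", "supertype"),
    Segment("with many stories", "differentia quality"),
])
print(sky.by_role("supertype"))  # ['building']
```

      Keeping the genus-differentia roles in a separate set makes it easy to separate the
      essential core of a definition from its non-essential attributes.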
      To build the graph, a small sample of WordNet definitions was first automatically
      pre-annotated, using the syntactic patterns described in [2] to assign the suitable semantic
      role to each segment of a definition, and then manually curated to create a training dataset.
      This dataset was used to train a machine learning classifier [6], which was then used to label
      all WordNet noun and verb definitions. After a post-processing phase that fixed minor errors in
      the label sequences, the classified data was serialized in RDF format. Figure 1 shows an
      example of a labeled definition (for the WordNet synset “lake poets”), and Figure 2 depicts the
      same labeled definition in the final graph format.
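      The post-processing and serialization steps can be sketched as follows. The abstract does not
      specify the label encoding or the repair rules, so this sketch assumes IOB-style token labels
      (B-/I-/O) and one plausible repair rule, and it writes N-Triples under a hypothetical namespace
      with hypothetical property names; the token sequence and the "predicted" labels are invented.

```python
from __future__ import annotations

from urllib.parse import quote

# Hypothetical namespace; the IRIs used by the real WordNetGraph may differ.
NS = "http://example.org/wordnetgraph/"


def repair_iob(labels: list[str]) -> list[str]:
    """Make an IOB label sequence well-formed: an I-X tag that does not
    continue an X chunk is rewritten as B-X."""
    fixed = []
    prev_role = None  # role of the chunk the previous token belongs to
    for label in labels:
        if label.startswith("I-") and label[2:] != prev_role:
            label = "B-" + label[2:]
        fixed.append(label)
        prev_role = None if label == "O" else label[2:]
    return fixed


def to_segments(tokens: list[str], labels: list[str]) -> list[tuple[str, str]]:
    """Group tokens into (role, text) segments, skipping O-labeled tokens."""
    segments: list[tuple[str, list[str]]] = []
    for token, label in zip(tokens, repair_iob(labels)):
        if label == "O":
            continue
        if label.startswith("B-"):
            segments.append((label[2:], [token]))
        else:  # I-X: after repair_iob it always continues the last X chunk
            segments[-1][1].append(token)
    return [(role, " ".join(words)) for role, words in segments]


def to_ntriples(definiendum: str, segments: list[tuple[str, str]]) -> list[str]:
    """One node per segment, linked to the definiendum by a role property."""
    subject = NS + quote(definiendum.replace(" ", "_"))
    lines = []
    for i, (role, text) in enumerate(segments):
        node = f"{subject}/seg{i}"
        literal = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"<{subject}> <{NS}{role}> <{node}> .")
        lines.append(f'<{node}> <{NS}text> "{literal}" .')
    return lines


tokens = ["a", "very", "tall", "building", "with", "many", "stories"]
# The second label is deliberately ill-formed (I- without a preceding B-).
predicted = ["O", "I-differentia_quality", "I-differentia_quality", "B-supertype",
             "B-differentia_quality", "I-differentia_quality", "I-differentia_quality"]
for line in to_ntriples("skyscraper", to_segments(tokens, predicted)):
    print(line)
```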
      WordNetGraph was primarily designed for, and has been successfully used in, an interpretable
      text entailment recognition approach that provides human-readable justifications for the
      entailment decision. Using an algorithm based on distributional semantics [7] to navigate the
      graph, we look for a path linking the entailing text T to the entailed hypothesis H. If such a
      path is found, the entailment is confirmed, and the contents of the nodes along the retrieved
      path are used to build a natural language justification that explains why the entailment holds
      and what exactly the semantic relationship between T and H is. The complete description of the
      text entailment recognition approach, including evaluation results and justification examples,
      can be found in [8].
∗
    Speaker

                                                                       sciencesconf.org:wnlex2018:217083
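The path search can be illustrated with a small best-first navigation sketch. It only assumes what
the paragraph above states (a graph, distributional relatedness, a path from T to H); the cosine
scoring over toy vectors, the pruning `threshold`, and the word-level nodes are our own
simplifications, not the algorithm of [7] or [8].

```python
import heapq
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def navigate(graph, vectors, source, target, threshold=0.5):
    """Best-first search from `source` to `target`.

    Neighbours are ranked by their distributional relatedness to the target;
    those below `threshold` are pruned. Returns the node path or None.
    """
    goal = vectors[target]
    frontier = [(-cosine(vectors[source], goal), source, [source])]
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph.get(node, []):
            if nxt in visited:
                continue
            score = 1.0 if nxt == target else cosine(vectors[nxt], goal)
            if score >= threshold:
                heapq.heappush(frontier, (-score, nxt, path + [nxt]))
    return None


# Toy definition graph: each node links to words in its (hypothetical) definition.
graph = {
    "novelist": ["writer", "novel"],
    "writer": ["person", "write"],
    "novel": ["fiction"],
}
# Toy 3-dimensional "embeddings", invented for this example.
vectors = {
    "novelist": [0.9, 0.3, 0.1], "writer": [0.8, 0.5, 0.1],
    "novel": [0.1, 0.2, 0.9], "write": [0.3, 0.3, 0.6],
    "person": [0.6, 0.8, 0.0], "fiction": [0.0, 0.1, 1.0],
}
print(navigate(graph, vectors, "novelist", "person"))  # ['novelist', 'writer', 'person']
```

In the approach described above, the contents of the nodes along such a path are what the natural
language justification is built from; pruning by relatedness keeps the search from wandering into
unrelated parts of the graph (here, "novel" is never expanded).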

Figure 1. Example of role labeling for the definition of “lake poets”

Figure 2. RDF representation for the definition of “lake poets”

In future work, we will apply this methodology to GermaNet, where it will also cover the adjective
synsets, since adjectives are organized hierarchically in this lexico-semantic network for German.
References
[1] Silva, V. S., Freitas, A., and Handschuh, S. (2018). Building a Knowledge Graph from Natural
Language Definitions for Interpretable Text Entailment Recognition. In Proceedings of the Eleventh
International Conference on Language Resources and Evaluation (LREC 2018).
[2] Silva, V. S., Handschuh, S., and Freitas, A. (2016). Categorization of semantic roles for
dictionary definitions. In Cognitive Aspects of the Lexicon (CogALex-V), Workshop at COLING 2016,
pages 176–184.
[3] Berg, J. (1982). Aristotle’s theory of definition. ATTI del Convegno Internazionale di Storia
della Logica, pages 19–30.
[4] Granger, E. H. (1984). Aristotle on genus and differentia. Journal of the History of Philosophy,
22(1):1–23.
[5] Lloyd, A. C. (1962). Genus, species and ordered series in Aristotle. Phronesis, pages 67–90.
[6] Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tur, D., He, X., Heck, L.,
Tur, G., Yu, D., et al. (2015). Using recurrent neural networks for slot filling in spoken language
understanding. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP),
23(3):530–539.
[7] Freitas, A., da Silva, J. C. P., Curry, E., and Buitelaar, P. (2014). A distributional semantics
approach for selective reasoning on commonsense graph knowledge bases. In International Conference
on Applications of Natural Language to Data Bases/Information Systems, pages 21–32. Springer.
[8] Silva, V. S., Freitas, A., and Handschuh, S. (2018). Recognizing and justifying text entailment
through distributional navigation on definition graphs. In AAAI.
                Refining WordNets in the Context of OntoLex-Lemon

                                          Thierry Declerck∗1
             1 Austrian Center for Digital Humanities at the Austrian Academy of Sciences – Austria

                                                Abstract

          In [1], we presented an approach for encoding German compounds listed in GermaNet into the
      decomposition module of OntoLex-Lemon (https://www.w3.org/2016/05/ontolex/). This led us to
      the possibility of associating distinct senses with a component that is used in different
      compounds. Beyond this, we could also associate senses with components of compounds that are
      not lexical entries per se.
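      A compound encoding of this kind can be sketched by emitting Turtle from Python. The
      namespaces and the decomp:constituent, decomp:Component and decomp:correspondsTo terms belong
      to OntoLex-Lemon; ex:componentSense, the ex: namespace, and the sense identifiers are
      hypothetical and only stand for "the sense this component carries in this compound". A
      component that is not a lexical entry per se is passed with `entry=None`.

```python
# OntoLex-Lemon core and decomposition module namespaces, plus a hypothetical ex:.
PREFIXES = """@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix decomp: <http://www.w3.org/ns/lemon/decomp#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/lexicon/> .
"""


def compound_ttl(compound, parts):
    """Turtle for a compound and its components.

    parts: (label, entry or None, sense) triples, where `entry` is the lexical
    entry the component corresponds to (None if it is not an entry per se)
    and `sense` is the sense the component carries in this compound.
    ex:componentSense is a hypothetical property used for illustration.
    """
    comps = [f"ex:{compound}_c{i}" for i in range(len(parts))]
    out = [f"ex:{compound} a ontolex:LexicalEntry ;",
           f"    decomp:constituent {' , '.join(comps)} ."]
    for comp, (label, entry, sense) in zip(comps, parts):
        props = ["a decomp:Component", f'rdfs:label "{label}"']
        if entry is not None:
            props.append(f"decomp:correspondsTo ex:{entry}")
        props.append(f"ex:componentSense ex:{sense}")
        out.append(f"{comp} " + " ;\n    ".join(props) + " .")
    return "\n".join(out)


# "Haus" receives a different (toy) sense in each compound.
print(PREFIXES)
print(compound_ttl("Hochhaus", [("Hoch", "hoch", "hoch_tall"),
                                ("haus", "Haus", "Haus_building")]))
print(compound_ttl("Schneckenhaus", [("Schnecken", "Schnecke", "Schnecke_snail"),
                                     ("haus", "Haus", "Haus_shell")]))
```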
      We are currently extending this work to morphological variants of words listed in Princeton
      WordNet [2], namely inflectional and derivational variants. Inflectional variants are relevant
      because differences in the use of the singular or plural form of a word can affect the type
      and range of senses (or synsets) associated with a specific lemma. Our approach would thus
      consist in associating synsets not only with lemmas but also with forms (in the terminology of
      OntoLex-Lemon). This can easily be done by adding restrictions to the senses listed in
      OntoLex-Lemon. We will present our current encoding of such phenomena in the poster.
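      The effect of restricting senses to forms can be sketched with a small in-memory model. The
      classes only loosely mirror ontolex:LexicalEntry and ontolex:LexicalSense; the `only_forms`
      restriction, the "glass"/"glasses" data, and the sense labels are hypothetical placeholders,
      not the encoding presented in the poster and not WordNet synset identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    """Roughly an ontolex:LexicalSense pointing to a synset."""
    synset: str
    # Hypothetical restriction: written forms the sense is limited to.
    # An empty set means the sense is available for every form.
    only_forms: frozenset[str] = frozenset()


@dataclass
class Entry:
    """Roughly an ontolex:LexicalEntry with its forms and senses."""
    lemma: str
    forms: dict[str, str]  # written representation -> grammatical number
    senses: list[Sense] = field(default_factory=list)

    def senses_for(self, written_form: str) -> list[str]:
        if written_form not in self.forms:
            raise KeyError(f"{written_form!r} is not a form of {self.lemma!r}")
        return [s.synset for s in self.senses
                if not s.only_forms or written_form in s.only_forms]


# Toy data; the sense labels are placeholders, not WordNet synset IDs.
glass = Entry(
    "glass",
    {"glass": "singular", "glasses": "plural"},
    [
        Sense("glass (material)", frozenset({"glass"})),
        Sense("glass (drinking container)"),
        Sense("spectacles", frozenset({"glasses"})),
    ],
)
print(glass.senses_for("glasses"))  # ['glass (drinking container)', 'spectacles']
```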

      Derivation is also of interest, as it would allow us to describe (slight) sense modifications,
      for example between a verb and its nominalisation, or between an adjective and its adverbial
      derivation. Using OntoLex-Lemon to encode these aspects could help extend WordNet with this
      type of information, including meaning shifts. We are currently working on a proposal for
      encoding this phenomenon in OntoLex-Lemon.
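      Since the encoding is still being proposed, the following is only one possible sketch: it uses
      the OntoLex-Lemon variation and translation (vartrans) module to relate a derivational pair at
      the entry level and at the sense level. The category names, the ex: namespace, the sense
      identifiers, and the ex:meaningShift flag are hypothetical.

```python
# Namespaces of the OntoLex-Lemon core and its vartrans module, plus a hypothetical ex:.
PREFIXES = """@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .
@prefix ex: <http://example.org/lexicon/> .
"""


def derivation_ttl(base_entry, base_sense, derived_entry, derived_sense,
                   category, meaning_shift=False):
    """Link a derivational pair at the entry level and at the sense level.

    `category` names a hypothetical derivation type (e.g. ex:nominalisation);
    ex:meaningShift is a hypothetical flag, not part of OntoLex-Lemon.
    """
    rel = f"ex:{base_entry}_to_{derived_entry}"
    sense_props = [
        "a vartrans:SenseRelation",
        f"vartrans:source ex:{base_sense}",
        f"vartrans:target ex:{derived_sense}",
        f"vartrans:category ex:{category}",
    ]
    if meaning_shift:
        sense_props.append("ex:meaningShift true")
    return "\n".join([
        f"{rel} a vartrans:LexicalRelation ;",
        f"    vartrans:source ex:{base_entry} ;",
        f"    vartrans:target ex:{derived_entry} ;",
        f"    vartrans:category ex:{category} .",
        f"{rel}_sense " + " ;\n    ".join(sense_props) + " .",
    ])


# "hardly" is derived from "hard" but does not mean "in a hard manner".
print(PREFIXES)
print(derivation_ttl("hard", "hard_difficult", "hardly", "hardly_barely",
                     "adverbialDerivation", meaning_shift=True))
```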

      References:

      [1] Thierry Declerck, Piroska Lendvai. Towards a Formal Representation of Components of
      German Compounds. In: Micha Elsner, Sandra Kuebler (eds.): Proceedings of the 14th
      SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology,
      Berlin, Germany, ACL, Humboldt University, 8/2016.
      [2] Princeton University. “About WordNet.” https://wordnet.princeton.edu/WordNet. Princeton
      University. 2010.

∗
    Speaker

                                                                    sciencesconf.org:wnlex2018:219592
WordNet and Distributional Semantics for Computational Rhetoric
                            Jelena Mitrović, Siegfried Handschuh
              Faculty of Computer Science and Mathematics, University of Passau
         Computational rhetoric (CR) is an area of Natural Language Processing (NLP) that deals with
computational approaches to modelling and detecting rhetorical figures and rhetorical relations,
which in turn may aid tasks such as sentiment and opinion mining and analysis, argument mining,
argumentation modelling, and the analysis of political argumentation (Mitrović et al., 2017). In
this poster, we present recent advances in CR that use WordNet as a starting point and a valuable
resource, as well as future directions in this regard that draw on the paradigm of distributional
semantics.
        Ontological modelling of rhetorical figures (Mladenović and Mitrović, 2013) and rhetorical
relations (Groza et al., 2016) is one approach to CR. WordNet, too, can be used successfully in CR.
For the figures irony and sarcasm, the Serbian WordNet ontology (SWN) was used, enhanced with the
new semantic relations specificOf/specifiedBy, which connect adjective and noun synsets and model
the semantics of the figure simile (Mladenović et al., 2016; Mladenović et al., 2017; Mitrović,
2018). The rhetorical figures were detected with the machine learning system shown in Figure 1.

                     Figure 1 Detection of Irony and Sarcasm using WordNet
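How adjective-noun links can serve as a simile cue is sketched below. This is a hypothetical
illustration, not the system in Figure 1: the toy link table (English words standing in for SWN
synsets), the assumed adjective-to-noun direction, the "as ADJ as NOUN" pattern, and the idea of
flagging unlinked pairs as an irony cue are our assumptions. In the described system, such lexical
information is one input to a machine learning classifier.

```python
import re

# Toy stand-in for the adjective-noun links specificOf/specifiedBy added to
# the Serbian WordNet; the direction adjective -> nouns is assumed here.
SPECIFIC_OF = {
    "white": {"snow", "milk"},
    "clear": {"crystal", "day"},
    "busy": {"bee"},
}

# Matches "as ADJ as NOUN", optionally with an article before the noun.
SIMILE = re.compile(r"\bas (\w+) as (?:an? |the )?(\w+)", re.IGNORECASE)


def simile_features(sentence: str) -> list[tuple[str, str, str]]:
    """Label each simile as conventional, unexpected (possibly ironic) or unknown."""
    features = []
    for adj, noun in SIMILE.findall(sentence):
        adj, noun = adj.lower(), noun.lower()
        linked = SPECIFIC_OF.get(adj)
        if linked is None:
            status = "unknown"       # adjective has no links in the toy resource
        elif noun in linked:
            status = "conventional"  # e.g. "as white as snow"
        else:
            status = "unexpected"    # e.g. "as clear as mud": a possible irony cue
        features.append((adj, noun, status))
    return features


print(simile_features("The instructions were as clear as mud."))
# [('clear', 'mud', 'unexpected')]
```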
       Distributional semantics models and distributional representations, such as word
embeddings, have recently attracted considerable attention in NLP and CR. Khodak et al. (2017) used
word embeddings to construct wordnets for French and Russian, and their approach can be extended to
other languages. Gutierrez et al. (2016) investigate compositional distributional semantic models for
literal and metaphorical senses. O’Reilly and Harris (2017) describe a multidimensional vector space
for modelling and calculating the rhetorical figure antimetabole, which has been identified as an
important figure in political argumentation (Mitrović et al., 2017). Likewise, Zayed et al. (2018)
identify metaphors at the phrase level using distributed representations of word meaning. Together,
these works open up many possibilities for hybrid approaches that combine WordNet and its
distributed representations for computational rhetoric. We envision research directions that
combine distributional semantics and WordNet modelling to harness the deeper semantics of
lexico-semantic networks and enable new NLP and CR approaches.
References
Groza, T., Kim, H.L., Handschuh, S. (2016). SALT: Enriching LaTeX with semantic annotations. In
Proceedings of the 5th International Semantic Web Conference (ISWC 2006), Athens, GA.
Gutierrez, D., Shutova, E., Marghetis, T., and Bergen, B. (2016). Literal and Metaphorical Senses
in Compositional Distributional Semantic Models. In Proceedings of ACL 2016, Berlin, Germany.
Khodak, M., Risteski, A., Fellbaum, C., Arora, S. (2017). Extending and improving Wordnet via
unsupervised word embeddings. Linguistic Issues in Language Technology, Vol 10, Issue 4.
Mitrović, J., O’Reilly, C., Mladenović, M., Handschuh, S. (2017). Ontological Representations of
Rhetorical Figures for Argument Mining. Argument and Computation. 2018;7(3).
Mitrović, J. (2018). Electronic Lexical Resources and Tools for Natural Language Processing of
Serbian and their Enhancement via Crowdsourcing. PhD Thesis. University of Belgrade.
http://uvidok.rcub.bg.ac.rs/bitstream/handle/123456789/2431/Doktorat.pdf?sequence=2
Mladenović, M. and Mitrović, J. (2013). Ontology of rhetorical figures for Serbian. In Proceedings
of Text, Speech, and Dialogue – 16th International Conference, TSD 2013. I. Habernal and V.
Matoušek, eds. pp. 386–393.
Mladenović, M., Mitrović, J., Krstev, C. (2016). A language-independent model for introducing a
new semantic relation between adjectives and nouns in a WordNet. In Proceedings of the Eighth
Global WordNet Conference, GWC 2016, pp. 218–225.
Mladenović, M., Mitrović, J., Krstev, C., Stanković, R. (2017). Using Lexical Resources for Irony
and Sarcasm Classification. The 8th Balkan Conference in Informatics, Skopje, Macedonia, 20–23.
O’Reilly, C. and Harris, R. A. (2017). Antimetabole and Image Schemata-ontological and vector
space models. Joint Ontology Workshop 2017, Bozen-Bolzano.
Zayed, O., McCrae, J.P., Buitelaar, P. (2018). Phrase-level metaphor identification using
distributed representations of word meaning. NAACL 2018-FigLang2018 Workshop.