CARMA Constructional Analyzer using Recursively Multiple AVMs
Ely Edison Matos
ely.matos@ufjf.edu.br
September 6, 2018
FrameNet Brasil Project - UFJF
Table of contents
1. Introduction
2. Premises
3. Computational Processing
4. Limitations and Outlook
Introduction
Context
FNBr (FrameNet Brasil) is working on NLU (Natural Language Understanding) projects. The FNBr approach to NLU comprises three main elements:
1. Linguistic Knowledge: Lexicon, Constructions, GF, POS, Roles, Syntax, etc.
2. World Knowledge: Ontologies and external datasets
3. Situational Context: Frames and Frame Elements
NLU processes must use linguistic knowledge cognitively to obtain an approximate picture of world knowledge in a given situational context.
CARMA
CARMA, a new version of the analyzer described in [4], is a constructional analyzer: given a raw sentence, it tries to identify the constructions in the sentence. If these constructions evoke a frame, this helps to identify the Situational Context.
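Why constructional analysis?
Two sentences with the same UD analysis can instantiate different constructions.

o      celular    quebrou    a      tela
DET    NOUN       VERB       DET    NOUN
The    cellphone  break.PST  the    screen
UD relations: det(celular, o), nsubj(quebrou, celular), det(tela, a), obj(quebrou, tela)
'The screen of the cellphone broke' / Cxn Split_object

o      menino     quebrou    a      cadeira
DET    NOUN       VERB       DET    NOUN
The    boy        break.PST  the    chair
UD relations: det(menino, o), nsubj(quebrou, menino), det(cadeira, a), obj(quebrou, cadeira)
'The boy broke the chair' / Cxn Transitive_action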
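The contrast between the two examples can be made concrete with a small sketch. It encodes both parses as plain tuples and compares them after the word forms are removed. The tuple layout, the `root` relation on the verb, and the helper name `syntactic_skeleton` are assumptions made for illustration; they are not part of CARMA.

```python
# Each token: (form, UPOS, head index, dependency relation); head 0 marks the root.
# The POS tags and relations follow the two examples above; the layout is assumed.
celular = [
    ("o", "DET", 2, "det"),
    ("celular", "NOUN", 3, "nsubj"),
    ("quebrou", "VERB", 0, "root"),
    ("a", "DET", 5, "det"),
    ("tela", "NOUN", 3, "obj"),
]
menino = [
    ("o", "DET", 2, "det"),
    ("menino", "NOUN", 3, "nsubj"),
    ("quebrou", "VERB", 0, "root"),
    ("a", "DET", 5, "det"),
    ("cadeira", "NOUN", 3, "obj"),
]


def syntactic_skeleton(sentence):
    """Drop the word forms, keeping only POS, head index and relation."""
    return [(upos, head, rel) for _, upos, head, rel in sentence]


# The two sentences are indistinguishable at the level of UD POS and relations.
same_syntax = syntactic_skeleton(celular) == syntactic_skeleton(menino)
print(same_syntax)
```

Since the skeletons are identical, whatever separates Split_object from Transitive_action has to come from information beyond the UD parse, which is what the lexical and ontological resources are meant to supply.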
Resources
CARMA uses four resources:
1. FNBr framenet
   • the network of frames and LUs, including all lexical material (words, lexemes, lemmas, ...)
2. FNBr constructicon
   • the network of constructions
3. FNBr ontology
   • a Generative Lexicon based ontology defining extended qualia relations between LUs (based on the SIMPLE ontology [1])
4. UD parser
   • used to obtain the syntactic structure of the sentence in terms of UD POS tags and relations
Premises
AVM
[Figure 1: AVM (Attribute-Value Matrix) structure]

AVM
[Figure 2: Everything as an AVM]
Recursive AVMs: the value of an attribute can be another AVM.
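As a rough illustration of the recursive idea, an AVM can be modeled as a nested dictionary whose values are either atoms or further AVMs. This is only a sketch: the attribute names in `example` (such as `filler`) and the helpers `paths` and `depth` are hypothetical and do not reflect CARMA's internal representation.

```python
from typing import Any, Dict, Iterator, Tuple

# An AVM maps attribute names to values; a value is an atom or another AVM.
AVM = Dict[str, Any]


def paths(avm: AVM, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (attribute path, atomic value) pairs, descending into nested AVMs."""
    for attribute, value in avm.items():
        path = prefix + (attribute,)
        if isinstance(value, dict):
            yield from paths(value, path)
        else:
            yield path, value


def depth(avm: AVM) -> int:
    """Nesting depth of an AVM: 1 for a flat AVM, more when values are AVMs."""
    nested = [depth(v) for v in avm.values() if isinstance(v, dict)]
    return 1 + max(nested, default=0)


# Illustrative AVM only; the attribute names are assumptions.
example: AVM = {
    "type": "cxn",
    "verb": {"pos": "VERB", "form": "quebrou"},
    "nsubj": {"rel": "nsubj", "filler": {"pos": "NOUN", "form": "celular"}},
}

for path, value in paths(example):
    print(".".join(path), "=", value)
```

Treating every level as the same kind of object is what lets a single traversal routine handle words, construction elements and whole constructions alike.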
Constraints
CARMA is a constraint-based system: each AVM attribute must be restricted to a set of acceptable values.
Constructions are defined by constraining construction elements to dependency relations, as proposed by Property Grammars [2].
Construction definition

    cxn_split_object:
      type: cxn
      class: cxn_split_object
      region: cxn_split_object
      attributes:
        nsubj:
          features: {optional: false, head: false}
          value: [ud_nsubj]
        verb:
          features: {optional: false, head: true}
          value: [pos_verb]
        obj:
          features: {optional: false, head: false}
          value: [ud_obj]
        x_part:
          type: xe
          value: [rel_is_part_of]
        x_frame:
          type: xe
          value: [frm_undergoing]
      constraints:
        - {arg1: verb, constraint: dominance, arg2: nsubj}
        - {arg1: verb, constraint: dominance, arg2: obj}
        - {arg1: nsubj, arg2: x_part, constraint: hasword}
        - {arg1: obj, arg2: x_part, constraint: hasword}
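A minimal sketch of how the two `dominance` constraints in this definition might be checked against a UD parse is given below. It assumes that `dominance` means "arg1 is an ancestor of arg2 in the dependency tree" and that the construction elements have already been bound to token indices; both are assumptions, as are the `parse` layout and the function names. The `hasword` constraints are left out because the slides do not specify their semantics.

```python
# Assumed token layout: index -> (form, head index, relation); head 0 is the root.
parse = {
    1: ("o", 2, "det"),
    2: ("celular", 3, "nsubj"),
    3: ("quebrou", 0, "root"),
    4: ("a", 5, "det"),
    5: ("tela", 3, "obj"),
}

# Assumed result of an earlier matching step: construction element -> token index.
bindings = {"verb": 3, "nsubj": 2, "obj": 5}

# The two dominance constraints from the construction definition above.
constraints = [
    {"arg1": "verb", "constraint": "dominance", "arg2": "nsubj"},
    {"arg1": "verb", "constraint": "dominance", "arg2": "obj"},
]


def dominates(parse, ancestor, node):
    """True if `ancestor` is reached by following head links upward from `node`."""
    current = parse[node][1]
    while current != 0:
        if current == ancestor:
            return True
        current = parse[current][1]
    return False


def satisfied(parse, bindings, constraints):
    """Check every constraint with the checker registered for its type."""
    checkers = {"dominance": dominates}
    return all(
        checkers[c["constraint"]](parse, bindings[c["arg1"]], bindings[c["arg2"]])
        for c in constraints
    )


print(satisfied(parse, bindings, constraints))
```

Keeping the checkers in a dictionary keyed by constraint name means that further constraint types could be registered without changing `satisfied`.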
Computational Processing
Topology
CARMA is a recursive hierarchical network and an elaborate pattern-matching system. It is therefore amenable to some Machine Learning techniques.
RCN
RCN: Recursive Cortical Network [3]
[Figure 3: Overview of RCN (source: [3])]

RCN?
RCN resembles AVM
[Figure 4: Detail of RCN (source: [3])]

RCN Processing
RCN can be used for generation and inference (parsing).
Inference
• Belief propagation
• Forward pass
• Backward pass
CARMA processing
In CARMA we are interested in the parsing process.
Resources are kept in persistent storage:
• Lexicon in the FNBr database (MySQL)
• Frames, Constructions and Ontology exported to a Neo4j graph database
CARMA processing
1. The user inputs a sentence.
2. The sentence is parsed for UD (currently using the UDPipe parser).
3. The FNBr database is queried for wordforms, lexemes and lemmas.
4. A type network is built from the lexical material.
5. The graph database is queried to complete the type network.
6. The type network is traversed to create a token network.
7. Word nodes are activated, constraints are evaluated, and the activation spreads through the token network until it reaches a root node.
8. Activated construction nodes correspond to constructions detected in the sentence.
9. Conflicts (more than one construction activated) are resolved by MAP (maximum a posteriori) estimation.
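Steps 7 to 9 can be illustrated with a toy sketch. It is not CARMA's implementation: the network layout, the rule that a node fires only when all of its inputs are active, the identifier `cxn_transitive_action`, and every prior and likelihood value are hypothetical, chosen only to show the shape of spreading activation followed by a MAP choice.

```python
# Hypothetical token network: each non-word node lists the nodes that feed it.
network = {
    "nsubj": ["celular"],
    "verb": ["quebrou"],
    "obj": ["tela"],
    "cxn_split_object": ["nsubj", "verb", "obj"],
}


def spread(network, active_words):
    """Activate word nodes, then propagate upward until nothing changes.

    Assumption: a node becomes active only when all of its inputs are active,
    and it takes the weakest input activation as its own.
    """
    activation = {word: 1.0 for word in active_words}
    changed = True
    while changed:
        changed = False
        for node, inputs in network.items():
            if node not in activation and all(i in activation for i in inputs):
                activation[node] = min(activation[i] for i in inputs)
                changed = True
    return activation


def map_choice(candidates):
    """Return the candidate with the largest prior * likelihood.

    The normalizing constant is the same for all candidates, so it is omitted.
    """
    return max(
        candidates,
        key=lambda name: candidates[name]["prior"] * candidates[name]["likelihood"],
    )


activation = spread(network, ["celular", "quebrou", "tela"])
print("cxn_split_object" in activation)

# Hypothetical scores for two competing constructions; the slides give no values.
candidates = {
    "cxn_split_object": {"prior": 0.2, "likelihood": 0.9},
    "cxn_transitive_action": {"prior": 0.8, "likelihood": 0.1},
}
print(map_choice(candidates))
```

In this toy setting the rarer construction wins because its likelihood outweighs its lower prior, which is the kind of trade-off a MAP criterion is meant to capture.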
CARMA processing
[Figure 5: Partial view of activated network]
Limitations and Outlook
Limitations
• The current version is at a very early stage
• UD parsing for Brazilian Portuguese is very limited and error-prone
• Some basic linguistic phenomena are not handled yet (e.g. Null Instantiation)
• and many others...

Outlook
• How to implement a learning process?
• How to use the analysis in the context of construction alignment?
• How many constraint types?
• and many others...

Thank you!
References
[1] N. Bel, F. Busa, N. Calzolari, E. Gola, A. Lenci, M. Monachini, A. Ogonowski, I. Peters, W. Peters, N. Ruimy, M. Villegas, and A. Zampolli. SIMPLE: A General Framework for the Development of Multilingual Lexicons. Proceedings of the 2nd International Conference on Language Resources and Evaluation, 2000.
[2] P. Blache. Property Grammars: A Fully Constraint-Based Theory. In H. Christiansen et al. (eds.), Constraint Solving and Language Processing, Springer-Verlag, Berlin Heidelberg, pages 1–16, 2005.
[3] D. George, W. Lehrach, K. Kansky, M. Lazaro-Gredilla, C. Laan, B. Marthi, X. Lou, Z. Meng, and Y. Liu. A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science, 358(6368), 2017.
[4] E. Matos, T. Torrent, V. Almeida, A. Laviola, L. Lage, N. Marção, and T. Tavares. Constructional Analysis Using Constrained Spreading Activation in a FrameNet-Based Structured Connectionist Model. The AAAI 2017 Spring Symposium on Computational Construction Grammar and Natural Language Understanding, Technical Report SS-17-02, pages 222–229, 2017.