Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering
Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
{tuanphong,srazniew,weikum}@mpi-inf.mpg.de

Abstract

ASCENT is a fully automated methodology for extracting and consolidating commonsense assertions from web contents (Nguyen et al., 2021). It advances traditional triple-based commonsense knowledge representation by capturing semantic facets like locations and purposes, and composite concepts, i.e., subgroups and related aspects of subjects. In this demo, we present a web portal that allows users to understand its construction process, explore its content, and observe its impact in the use case of question answering. The demo website (https://ascent.mpi-inf.mpg.de) and an introductory video (https://youtu.be/qMkJXqu_Yd4) are both available online.

1 Introduction

Commonsense knowledge (CSK) is an enduring theme of AI (McCarthy, 1960) that has recently been revived for the goal of building more robust and reliable applications (Monroe, 2020). Recent years have witnessed the emergence of large pre-trained language models (LMs), notably BERT (Devlin et al., 2018), GPT (Brown et al., 2020) and their variants, which significantly boosted the performance of tasks requiring natural language understanding, such as question answering and dialogue systems (Clark et al., 2020). Although it has been shown that such LMs implicitly store some commonsense knowledge (Talmor et al., 2019), this comes with various caveats, for example regarding degree of truth or negation, and their commercial development is inherently hampered by their low interpretability and explainability.

Structured knowledge bases (KBs), in contrast, make it possible to explain and interpret the outputs of systems that leverage them. There have been great efforts towards building large-scale commonsense knowledge bases (CSKBs), including expert-annotated KBs (e.g., Cyc (Lenat, 1995)), crowdsourced KBs (e.g., ConceptNet (Speer and Havasi, 2012) and Atomic (Sap et al., 2019)), and KBs built by automatic acquisition methods such as WebChild (Tandon et al., 2014, 2017), TupleKB (Mishra et al., 2017), Quasimodo (Romero et al., 2019) and CSKG (Ilievski et al., 2020). Human-created KBs, although possessing high precision, usually suffer from low coverage. Automatically acquired KBs, on the other hand, typically have better coverage but also contain more noise. Moreover, despite their different construction methods, these KBs are all based on a simple subject-predicate-object model, which has major limitations in validity and expressiveness.

We recently presented ASCENT (Nguyen et al., 2021), a methodology for automatically collecting and consolidating commonsense assertions from the general web. To overcome the limitations of prior works, ASCENT refines subjects with subgroups (e.g., circus elephant and domesticated elephant) and aspects (e.g., elephant tusk and elephant habitat), and captures semantic facets of assertions (e.g., ⟨lawyer, represents, clients, LOCATION: in courts⟩ or ⟨elephant, uses, its trunk, PURPOSE: to suck up water⟩); a minimal sketch of this representation follows at the end of this section.

For a given concept, ASCENT searches the web with pattern-based search queries disambiguated using WordNet (Miller, 1995) hypernymy. Then, irrelevant documents are filtered out based on similarity comparison against the corresponding Wikipedia articles. We then use a series of judicious dependency-parse-based rules to collect faceted assertions from the retained texts. The semantic facets, which come from prepositional phrases and supporting adverbs, are then labeled by a supervised classifier. Finally, assertions are clustered using similarity scores from word2vec (Mikolov et al., 2013) and a fine-tuned RoBERTa (Liu et al., 2019) model.
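To make the faceted representation concrete, the following minimal sketch models such an assertion in Python; the class and field names are our own illustration, not ASCENT's actual schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class FacetedAssertion:
    """Illustrative container for a faceted commonsense assertion.

    Field names are our own; they do not reflect ASCENT's internal schema.
    """
    subject: str    # possibly a subgroup ("circus elephant") or aspect ("elephant trunk")
    predicate: str
    obj: str
    facets: dict[str, str] = field(default_factory=dict)  # facet type -> facet value

# The two examples from the text:
lawyer = FacetedAssertion("lawyer", "represents", "clients",
                          facets={"LOCATION": "in courts"})
elephant = FacetedAssertion("elephant", "uses", "its trunk",
                            facets={"PURPOSE": "to suck up water"})
```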
We executed the ASCENT pipeline for 10,000 prominent concepts (selected based on their respective numbers of assertions in ConceptNet) as primary subjects. In (Nguyen et al., 2021), we showed that the content of the resulting CSKB (hereinafter referred to as the ASCENT KB) is a milestone in both salience and recall. As extrinsic evaluation, we conducted a comprehensive evaluation of the contribution of CSK to zero-shot question answering (QA) with pre-trained language models (Petroni et al., 2020; Guu et al., 2020).

This paper presents a companion web portal of the ASCENT KB, which enables the following interactions:

1. Exploration of the construction process of ASCENT, by inspecting word sense and Wikipedia disambiguation, web search queries, clustered statements, and source sentences and documents.

2. Inspection of the resulting KB, starting from subjects, predicates, or objects, or examining specific subgroups or aspects.

3. Observation of the impact of structured knowledge on question answering with pretrained language models, comparing generated answers across various CSKBs and QA settings.

The web portal is available at https://ascent.mpi-inf.mpg.de, and a screencast demonstrating the system can be found at https://youtu.be/qMkJXqu_Yd4.

2 ASCENT

Two major contributions of ASCENT are its expressive knowledge model and its state-of-the-art extraction methodology. Details are in the technical paper (Nguyen et al., 2021). In this section, we revisit the most important points.

2.1 Knowledge model

ASCENT extends the traditional triple-based data model of existing CSKBs in two ways.

Expressive subjects. Subjects in existing CSKBs are usually single nouns, which implies two shortcomings: (i) different meanings of the same word are conflated, and (ii) refinements and variants of word senses are missed out. ASCENT addresses this problem with the following means:

1. When searching for source texts, ASCENT combines the target subject with an informative hypernym from WordNet to distinguish different senses of the word (e.g., "bus public transport" and "bus network topology" for the subject bus; see the sketch after this list).

2. ASCENT refines subjects with multi-word phrases into subgroups and aspects. For example, subgroups for the subject bus would be tourist bus and school bus, while one of its aspects would be bus driver.
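To illustrate how such hypernym-refined queries could be generated, here is a sketch using NLTK's WordNet interface; the template table is a toy stand-in for the 35 hand-crafted templates described in Section 2.2, and the function is our own illustration rather than ASCENT's actual code.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet") once

# Toy stand-ins for two of the 35 hand-crafted templates (cf. Section 2.2).
QUERY_TEMPLATES = {
    "animal.n.01": "{} animal facts",
    "professional.n.01": "{} job descriptions",
}

def search_queries(subject: str):
    """Yield one disambiguating web query per noun sense of the subject."""
    for synset in wn.synsets(subject, pos=wn.NOUN):
        for hypernym in synset.hypernyms():
            template = QUERY_TEMPLATES.get(hypernym.name())
            if template:
                yield synset.name(), template.format(subject)
            else:
                # Fall back to appending the hypernym's lemma, yielding queries
                # like "bus public transport" vs. "bus network topology".
                lemma = hypernym.lemmas()[0].name().replace("_", " ")
                yield synset.name(), f"{subject} {lemma}"

for sense, query in search_queries("bus"):
    print(sense, "->", query)
```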
Semantic facets. The validity of commonsense assertions is usually non-binary (Zhang et al., 2017; Chalier et al., 2020) and depends on specific temporal and spatial circumstances (e.g., lions live for 10-14 years in the wild but for more than 15 years in captivity). Moreover, CSK triples often benefit from further context regarding causes/effects and instruments (e.g., elephants communicate with each other by creating sounds; beer is served in bars). In ASCENT's knowledge model, such information is added to SPO triples via semantic facets. ASCENT distinguishes 8 types of facets: cause, manner, purpose, transitive-object, degree, location, temporal and other-quality.

2.2 Extraction pipeline

ASCENT is a pipeline operating in three phases: source discovery, knowledge extraction and knowledge consolidation. Fig. 1 illustrates the architecture of the pipeline.

[Figure 1: Architecture of the ASCENT extraction pipeline (Nguyen et al., 2021): (1) Retrieval (web search, filtering); (2) Extraction (coreference resolution, OpenIE, noun chunking, subgroup/aspect extraction); (3) Consolidation (supervised facet labeling, assertion clustering, facet clustering).]

Source discovery. We utilize the Bing Web Search API to obtain documents specific to each subject, with search queries refined by the subject's hypernyms in WordNet. We manually designed query templates for 35 prominent hypernyms (e.g., if subject s0 has the hypernym animal.n.01, we produce the search query "s0 animal facts"; similarly, for the hypernym professional.n.01, the search query will be "s0 job descriptions"). We then compute the cosine similarity between the bag-of-words representations of each obtained document and the respective Wikipedia article to determine the relevance of the documents. Low-ranked documents are omitted in further steps (a sketch of this filtering step follows below).
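The following sketch shows one way to implement this bag-of-words relevance test with scikit-learn; the vectorization details and cut-off are our assumptions, not ASCENT's exact configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_relevance(wiki_article: str, documents: list[str], keep_top: int = 100):
    """Rank retrieved documents by bag-of-words cosine similarity to the
    subject's Wikipedia article; only the best-matching ones are kept."""
    vectorizer = CountVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([wiki_article] + documents)
    # Similarity of each document (rows 1..n) to the Wikipedia article (row 0).
    scores = cosine_similarity(matrix[1:], matrix[0]).ravel()
    ranked = sorted(zip(documents, scores), key=lambda pair: -pair[1])
    return ranked[:keep_top]
```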
Knowledge extraction. The extractors take in the relevant documents, and their outputs include open information extraction (OIE) tuples, a list of subgroups, and a list of aspects. To obtain OIE tuples, we extend the StuffIE approach (Prasojo et al., 2018) with a list of carefully crafted dependency-parse-based rules to pull out faceted assertions from the texts. Then we classify each facet into one of the eight semantic labels using a fine-tuned RoBERTa model. For subgroups, noun phrases whose head word is the target subject are collected as candidates and then clustered using the hierarchical agglomerative clustering (HAC) algorithm on averaged word2vec representations. Finally, we collect aspects from possessive noun chunks and from SPO triples where P is either "have", "contain", "be assembled of" or "be composed of".

Knowledge consolidation. We perform clustering on SPO triples and facet values. For SPO triples, we first filter triple-pair candidates with fast word2vec similarity. After that, an advanced similarity of triple pairs, computed by another fine-tuned RoBERTa model, is fed to the HAC algorithm to group the triples into semantically similar clusters. For facet values, we group phrases with the same head words together (e.g., "during evening" and "in the evening").

2.3 Web portal

The web portal (https://ascent.mpi-inf.mpg.de) is implemented in Python using Django and hosted on an Nginx web server. The underlying structured CSK is stored in a PostgreSQL database, while for the QA part, the statements of all CSKBs are indexed and queried via Apache Solr for fast text-based querying. All components are deployed on a virtual machine with access to 4 virtual CPUs and 8 GB of RAM.

In the demonstration session, we show how users can interact with the portal for exploring the KB (Section 4.1), understanding the KB construction (Section 4.2), and observing its utility for question answering (Section 4.3).

3 Commonsense QA setups

One common extrinsic use case of KBs is question answering. Recently, it was observed that priming language models (LMs) with relevant context can considerably benefit their performance in QA-like tasks (Petroni et al., 2020; Guu et al., 2020). In (Nguyen et al., 2021), to evaluate the contribution of structured CSK to QA, we conducted a comprehensive evaluation consisting of four different setups, all based on this idea:

1. In masked prediction (MP), LMs are asked to predict single masked tokens in generic sentences.

2. In free generation (FG), LMs arbitrarily generate answer sentences to given questions.

3. Guided generation (GG) extends free generation by answer prefixes that prevent the LMs from evading answering.

4. Span prediction (SP) is the task of locating the answer to a question in provided context.

Examples of the QA setups can be seen in Table 1. Generally, given a question, our system retrieves from CSKBs assertions relevant to it, and then uses these assertions as additional context to guide the LMs (a minimal sketch of this priming idea follows below). In the ASCENT demonstrator, we provide a web interface for experimenting with all of those QA setups, with context retrieved from several popular CSKBs.
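As an illustration of the priming idea behind these setups, the sketch below runs masked prediction with and without retrieved context using the HuggingFace transformers pipeline; the model choice and prompt format are our assumptions, not the demo's exact configuration.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")  # roberta-base expects <mask>

question = "Bartenders work in <mask>."
context = "Bartenders work in bar. Bartenders work in restaurant."  # retrieved assertions

# Compare predictions without and with CSKB context prepended.
for prompt in (question, f"{context} {question}"):
    top = fill_mask(prompt, top_k=3)
    print([f"{p['token_str'].strip()} ({p['score']:.1%})" for p in top])
```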
4 Demonstration experience

In the demonstration session, attendees will experience three main functionalities of our demonstration system.

Setup | Input | Sample output
MP | Elephants eat [MASK]. [SEP] Elephants eat roots, grasses, fruit, and bark, and they eat a lot of these things. | everything (15.52%), trees (15.32%), plants (11.26%)
FG | C: Elephants eat roots, grasses, fruit, and bark, and they eat... Q: What do elephants eat? A: | They eat a lot of grasses, fruits, and...
GG | C: Elephants eat roots, grasses, fruit, and bark, and they eat... Q: What do elephants eat? A: Elephants eat | Elephants eat a lot of things.
SP | question="What do elephants eat?" context="Elephants eat roots, grasses, fruit, and bark, and they eat..." | start=14, end=46, answer="roots, grasses, fruit, and bark"

Table 1: Examples of QA setups (Nguyen et al., 2021).

4.1 Exploring the ASCENT KB

Concept page. Suppose a user wants to know which knowledge ASCENT stores for elephants. They can enter the concept into the search field at the top right of the start page and select the first result from the autocompletion list, or press enter, to arrive at the intended concept. The resulting page (see Fig. 2) is divided into three main areas. At the top left, users can inspect an image from https://pixabay.com, the WordNet synset used for disambiguation, the Wikipedia page used for result filtering, and a list of alternative lemmas, if existing.

At the top right, users can see subgroups and related aspects, which, in our knowledge representation model, can carry their own statements. This way, they can learn that the most salient aspects of elephants are their trunks, tusks and ears, or that elephant trunks have more than 40,000 muscles.

The body of the page presents the assertions, organized into groups of same-predicate assertions. In each group, assertions are sorted by their frequency, displayed beside their objects. For example, the most commonly mentioned foods of elephants are grasses, fruits, and plants. Many assertions come with a red asterisk. This indicates that the assertion comes with semantic facets. When clicking on an assertion, a small box shows an SVG-based visualisation of the assertion in which we illustrate all of its elements: subject, predicate, object, facet labels and values, the frequency of the assertion, as well as the frequency of each facet. For example, one can see that the purpose of elephants using their trunks is to suck up water.

Searching and downloading assertions. Alternatively to exploring statements starting from a subject, users can start from a search functionality under the Browse menu. This way, they can search, for instance, for all concepts that eat grass (capybara, zebra, kangaroo, ...).

The website also provides a JSON-formatted data dump (678 MB) of all 8.9 million assertions extracted by the pipeline and their corresponding source sentences and documents. This dataset is also accessible via the HuggingFace Datasets package (https://huggingface.co/datasets/ascent_kb).
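For programmatic access, the dump can be loaded roughly as follows; the dataset identifier is taken from the URL above, and the record fields are whatever the published dataset defines (depending on the installed datasets version, loading a script-based dataset may additionally require trust_remote_code=True).

```python
from datasets import load_dataset

# Stream the dump instead of downloading all 678 MB at once.
ascent_kb = load_dataset("ascent_kb", split="train", streaming=True)

for assertion in ascent_kb.take(3):
    print(assertion)  # one faceted assertion with its provenance per record
```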
4.2 Inspecting the construction of assertions

For many downstream use cases, it is important to know about the provenance of information. Users can inspect general properties of the construction process by observing the WordNet lemma and the Wikipedia page used for filtering, as well as specific statistics about the number of retained websites, sentences, and assertions, in a panel at the bottom of subject pages (e.g., 435 websites were retained for elephant, from which 50k OpenIE assertions could be extracted).

Furthermore, users can look deeply into the construction process of each assertion on its own dedicated page, which displays the following:

1. Clustered triples: These are the triples that were grouped together in the knowledge consolidation phase (cf. Section 2.2), where the most frequent triple was selected as the cluster representative. For example, for the assertion ⟨lion, eat, zebra, DEGREE: mostly⟩ (14), the cluster contains: ⟨lion, eat, zebra⟩ (9), ⟨lion, prey on, zebra⟩ (2), ⟨lion, feed on, zebra⟩ (1), ⟨lion, feed upon, zebra⟩ (1), ⟨lion, prey upon, zebra⟩ (1). The numbers in parentheses indicate the corresponding frequencies (see the sketch after this list).

2. Facets: The assertion's facets are presented in a table whose columns are facet value, facet type and clustered facets. The frequency of each clustered facet is also indicated.

3. Source sentences and documents: Finally, we exhibit the sentences from which the assertions were extracted and their parent documents (in the form of URLs). Furthermore, in the extraction phase, we also recorded the positions of the assertion elements (i.e., subject, predicate, object, facet) in the source sentences. We show that information to users by highlighting each kind of element with a different color in the source sentences.
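The numbers in the lion/zebra example fit a simple aggregation scheme: the representative is the most frequent member triple, and the cluster frequency is the sum of the member frequencies (9+2+1+1+1 = 14). A minimal sketch under that assumption:

```python
from collections import Counter

# Member triples of one consolidated cluster with their observed frequencies.
cluster = Counter({
    ("lion", "eat", "zebra"): 9,
    ("lion", "prey on", "zebra"): 2,
    ("lion", "feed on", "zebra"): 1,
    ("lion", "feed upon", "zebra"): 1,
    ("lion", "prey upon", "zebra"): 1,
})

representative, _ = cluster.most_common(1)[0]  # most frequent member triple
cluster_frequency = sum(cluster.values())      # 9 + 2 + 1 + 1 + 1 = 14

print(representative, cluster_frequency)       # ('lion', 'eat', 'zebra') 14
```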
[Figure 2: Example of ASCENT's page for the concept elephant.]

4.3 Experimenting with commonsense QA

The third functionality experienced in the demo session is the utilization of commonsense knowledge for question answering (QA).

Input. There are four main parts in the input interface for the QA experiment:

1. QA setup: The user chooses the QA setup they want to experiment with. Available are Masked Prediction, Span Prediction and Free/Guided Generation. If Masked Prediction is selected, the user can choose how many answers the LM should produce. For the Generation settings, users can provide an answer prefix to avoid overly evasive answers.

2. Input query: The user enters the text question as input. The question can be in the form of a masked sentence (in the case of Masked Prediction) or a standard natural-language question (in the other setups).

3. Retrieval options: The user can select one of the supported retrieval methods and the number of assertions to be retrieved per CSKB for each question.

4. Context sources: The user selects the sources of context (i.e., "no context", CSKBs, or "custom context"). If a CSKB is selected, the system will retrieve from that KB assertions relevant to the given input question. If "custom context" is selected, the user must enter their own content. The "no context" option is available for all setups but Span Prediction.

Output. The QA system presents its output in the form of a table with three columns: Source, Answer(s) and Context. For Masked Prediction and Span Prediction, answers are printed with their respective confidence scores, whereas for Free/Guided Generation, only the answers are printed. For Span Prediction, in which answers come directly from the given contexts, we also highlight the answers in the contexts.

An example of the QA demo's output for the question "What do rabbits eat?" under the Free Generation setting can be seen in Fig. 3 (a minimal sketch of this setting follows below).
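The sketch below reproduces the Free Generation setting with GPT-2 in simplified form; the prompt layout and the hard-coded context stand in for the demo's actual retrieval and prompting.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Stand-in for assertions retrieved from a CSKB.
context = "Rabbits eat grass. Rabbits eat vegetables. Rabbits eat hay."
question = "What do rabbits eat?"

# Free Generation: the model continues after "A:"; for Guided Generation,
# an answer prefix such as "A: Rabbits eat" would be used instead.
prompt = f"{context}\nQ: {question}\nA:"
output = generator(prompt, max_new_tokens=20, do_sample=False)
print(output[0]["generated_text"][len(prompt):].strip())
```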
One can observe that the language models' predictions are heavily influenced by the given contexts. Without context, GPT-2 is only able to generate an evasive answer. When given context, it tends to re-generate the first sentence of the context first (e.g., see the answers aligned with ASCENT, TupleKB and ConceptNet in Fig. 3). For the context retrieved from Quasimodo, GPT-2 is able to overlook the erroneous first sentence; however, its generated answer is rather elusive, despite the fact that the subsequent statements in the context all contain direct answers to the question.

The question "Bartenders work in [MASK]." under the Masked Prediction setting is another example of the influence of context on the LMs' output. Since bartender is a subject well covered by the ASCENT KB, the assertions pulled out are all relevant (i.e., Bartenders work in bar. Bartenders work in restaurant. ...), which helps guide the LM to a good answer (bar). Meanwhile, because this subject is not present in TupleKB, its retrieved statements are rather unrelated (Work capitals have firm. Work experiences include statement. ...). Given that, the top-1 prediction for this KB was tandem, which is obviously an evasive answer.

5 Related work

CSKB construction. Cyc (Lenat, 1995) was the first attempt to build a large-scale commonsense knowledge base. Since then, there have been a number of other CSKB construction projects, notably ConceptNet (Speer and Havasi, 2012), WebChild (Tandon et al., 2014, 2017), TupleKB (Mishra et al., 2017), and more recently Quasimodo (Romero et al., 2019), Dice (Chalier et al., 2020), Atomic (Sap et al., 2019), and CSKG (Ilievski et al., 2020). The early approach to building a CSKB was based on human annotation (e.g., Cyc with expert annotation and ConceptNet with crowdsourced annotation). Later projects tend to use automated methods based on open information extraction to collect CSK from texts (e.g., WebChild, TupleKB and Quasimodo). Lately, CSKG is an attempt to combine various commonsense knowledge resources into a single KB. The common thread of these CSKBs is that they are all based on SPO triples as their knowledge representation, which has shortcomings (Nguyen et al., 2021). ASCENT is the first attempt to build a large-scale CSKB with assertions equipped with semantic facets, built upon the ideas of semantic role labeling (Palmer et al., 2010).

KB visualization. Most CSKBs share their content via CSV files. Some, like ConceptNet (http://conceptnet.io), WebChild (https://gate.d5.mpi-inf.mpg.de/webchild), Atomic (https://mosaickg.apps.allenai.org/kg-atomic2020) and Quasimodo (https://quasimodo.r2.enst.fr), have a web portal to visualise their assertions. The most common way of CSKB visualisation is to use a single page for each subject and to group assertions by predicate (e.g., in ConceptNet and WebChild). Quasimodo, on the other hand, implements a simple search interface to filter assertions and presents them in a tabular way (Romero and Razniewski, 2020). The ASCENT demo has both functionalities: exhibiting the assertions of each concept on a separate page, and supporting assertion filtering. Our demo also uses an SVG-based visualisation of assertions with semantic facets, which are a distinctive feature of the ASCENT knowledge model.

Context in LM-based question answering. Priming large pretrained LMs with context in QA-like tasks is a relatively new line of research (Petroni et al., 2020; Guu et al., 2020). In our original paper, we made the first attempt to evaluate the contribution of CSKB assertions to QA via four different setups based on this idea. While others use commonsense knowledge for (re-)training language models (Hwang et al., 2021; Ilievski et al., 2021; Ma et al., 2021; Mitra et al., 2020), to the best of our knowledge, our demo system is the first to visualize the effect of priming vanilla language models, i.e., without task-specific retraining.

6 Conclusion

We presented a web portal for a state-of-the-art commonsense knowledge base, the ASCENT KB. It allows users to fully explore and search the CSKB, inspect the construction process of each assertion, and observe the impact of structured CSKBs on different QA tasks. We hope that the portal enables interesting interactions with the ASCENT methodology, and that the QA demo allows researchers to explore the potential of combining structured data with pre-trained language models.
[Figure 3: Free Generation output for the question "What do rabbits eat?".]

References

Tom B. Brown et al. 2020. Language models are few-shot learners. In NeurIPS.

Yohan Chalier, Simon Razniewski, and Gerhard Weikum. 2020. Joint reasoning for multi-faceted commonsense knowledge. In AKBC.

Peter Clark et al. 2020. From 'F' to 'A' on the NY Regents science exams: An overview of the Aristo project. AI Magazine.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training. In ICML.

Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, and Yejin Choi. 2021. COMET-ATOMIC 2020: On symbolic and neural commonsense knowledge graphs. In AAAI.

Filip Ilievski, Alessandro Oltramari, Kaixin Ma, Bin Zhang, Deborah L. McGuinness, and Pedro Szekely. 2021. Dimensions of commonsense knowledge. arXiv preprint arXiv:2101.04640.

Filip Ilievski, Pedro Szekely, and Bin Zhang. 2020. CSKG: The commonsense knowledge graph. arXiv preprint arXiv:2012.11490.

Douglas B. Lenat. 1995. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, and Alessandro Oltramari. 2021. Knowledge-driven data construction for zero-shot evaluation in commonsense question answering. In AAAI.

John McCarthy. 1960. Programs with common sense. RLE and MIT Computation Center.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In ICLR.

George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM.

Bhavana Dalvi Mishra, Niket Tandon, and Peter Clark. 2017. Domain-targeted, high precision knowledge extraction. TACL.

Arindam Mitra, Pratyay Banerjee, Kuntal Kumar Pal, Swaroop Mishra, and Chitta Baral. 2020. How additional knowledge can improve natural language commonsense question answering? arXiv preprint arXiv:1909.08855.

Don Monroe. 2020. Seeking artificial common sense. Communications of the ACM.

Tuan-Phong Nguyen, Simon Razniewski, and Gerhard Weikum. 2021. Advanced semantics for commonsense knowledge extraction. In WWW.

Martha Palmer, Daniel Gildea, and Nianwen Xue. 2010. Semantic Role Labeling. Synthesis Lectures on Human Language Technologies.

Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel. 2020. How context affects language models' factual predictions. In AKBC.

Radityo Eko Prasojo, Mouna Kacimi, and Werner Nutt. 2018. StuffIE: Semantic tagging of unlabeled facets using fine-grained information extraction. In CIKM.

Julien Romero and Simon Razniewski. 2020. Inside Quasimodo: Exploring construction and usage of commonsense knowledge. In CIKM.

Julien Romero, Simon Razniewski, Koninika Pal, Jeff Z. Pan, Archit Sakhadeo, and Gerhard Weikum. 2019. Commonsense properties from query logs and question answering forums. In CIKM.

Maarten Sap et al. 2019. ATOMIC: An atlas of machine commonsense for if-then reasoning. In AAAI.

Robyn Speer and Catherine Havasi. 2012. ConceptNet 5: A large semantic network for relational knowledge. In Theory and Applications of Natural Language Processing.

Alon Talmor, Yanai Elazar, Yoav Goldberg, and Jonathan Berant. 2019. oLMpics - on what language model pre-training captures. TACL.

Niket Tandon, Gerard de Melo, Fabian M. Suchanek, and Gerhard Weikum. 2014. WebChild: Harvesting and organizing commonsense knowledge from the web. In WSDM.

Niket Tandon, Gerard de Melo, and Gerhard Weikum. 2017. WebChild 2.0: Fine-grained commonsense knowledge distillation. In ACL.

Sheng Zhang, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2017. Ordinal common-sense inference. TACL.