Complex Question Answering on knowledge graphs using machine translation and multi-task learning
Saurabh Srivastava, Mayur Patidar, Sudip Chowdhury, Puneet Agarwal, Indrajit Bhattacharya, Gautam Shroff
TCS Research, New Delhi, India
{sriv.saurabh, patidar.mayur, sudip.chowdhury2, puneet.a, gautam.shroff}@tcs.com

Abstract

Question answering (QA) over a knowledge graph (KG) is the task of answering a natural language (NL) query using the information stored in the KG. In a real-world industrial setting, this involves addressing multiple challenges, including entity linking, multi-hop reasoning over the KG, etc. Traditional approaches handle these challenges in a modularized sequential manner, where errors in one module lead to the accumulation of errors in downstream modules. Often these challenges are inter-related, and the solutions to them can reinforce each other when handled simultaneously in an end-to-end learning setup. To this end, we propose a multi-task BERT based Neural Machine Translation (NMT) model to address these challenges. Through experimental analysis, we demonstrate the efficacy of our proposed approach on one publicly available and one proprietary dataset.

[Figure 1: Example queries from a real-world dataset LOCA. Column 6 (is SP?) represents whether the queries can be answered via a shortest path or not; all the other columns are self-explanatory.]

1 Introduction

Question answering on knowledge graphs (KGQA) has mainly been attempted on publicly available KGs such as Freebase Bollacker et al. (2008), DBPedia Lehmann et al. (2015), Yago Suchanek et al. (2007), etc. There is also a demand for question answering on proprietary KGs created by large enterprises. For example, KGQA on a) a KG that contains information related to retail products can help customers choose the right product for their needs, b) a KG containing document catalogs (best practices, white papers, research papers) can help a knowledge worker find a specific piece of information, or c) a KG that stores profiles of various companies can be used to do preliminary analysis before giving them a loan. Our motivating use-case comes from an enterprise system (referred to as LOCA) that is expected to answer users' questions about the R&D division of an enterprise.

Sample questions from the LOCA dataset are shown in Figure 1. The schema of the corresponding KG is shown in Figure 2. Answering such questions often requires a traversal of the KG along multiple relations which may not form a directed chain graph, and may follow a more complex topology, as shown for questions 5, 7, and 8 in Figure 1. It can also be observed that, most often, the words of the natural language question (NLQ) and the corresponding relations have a weak correlation. Most of the proposed approaches to the KGQA task Bollacker et al. (2008) parse the NLQ, convert it into a structured query, and then execute the structured query on the KG to retrieve the factoid answers. Such conversion involves multiple sub-tasks: a) linking the mentioned entity with the corresponding entity-node in the KG Blanco et al. (2015); Pappu et al. (2017), b) identification of the type of the answer entity Ziegler et al. (2017), and c) identification of relations Dubey et al. (2018); Weston et al. (2013); Hakkani-Tür et al. (2014). These tasks are most often performed in a sequence Both et al. (2016); Dubey et al. (2016); Singh et al. (2018), or in parallel Veyseh (2016); Xu et al. (2014); Park et al. (2015), which results in the accumulation of errors Dubey et al. (2018).
Further, most KGQA datasets are not as complex as LOCA. For example: a) all questions of SimpleQA Bordes et al. (2015) can be answered using a single triple, b) NLQs in most of the datasets (e.g., SimpleQA, MetaQA) contain only one mentioned entity, and c) even when multiple relations are required for answer entity retrieval, they are organized in a sequence, i.e., a chain.

Our motivating example contains specific types of questions that pose many challenges with respect to each of the aforementioned tasks. Moreover, some of the questions can only be answered via a model that attempts more than one sub-task together. For example, the first two questions of Figure 1 mention the same words, i.e., "deep learning", but they get associated with two different entity nodes of the KG. Additionally, prior work could detect the set of relations only when the schema sub-graph follows a specific topology; in our example, however, most of the questions follow a different topology. We demonstrate in Section 5 that most of the prior art approaches fail to solve such challenges. We provide a summary of such challenges in Section 2.

In this paper, we propose CQA-NMT, a novel transformer-based NMT (neural machine translation) model to solve the aforementioned challenges by performing four tasks jointly using a single model, i.e., i) detection of mentioned entities, ii) prediction of entity types of answer nodes, iii) prediction of the topology and relations involved, and iv) question type classification, such as 'Factoid', 'Count', etc. CQA-NMT not only performs the four sub-tasks but also helps the downstream tasks of mentioned entity disambiguation and subsequent answer retrieval from the KG. The key contributions of this paper are:

(i) We propose a multi-task model that performs all tasks for parsing a natural language question together, rather than the traditional approach of performing these tasks in a sequential manner, which also involves candidate generation based on an upstream task and then short-listing the candidates to make the final prediction. We also demonstrate that with such an approach, newer types of challenges of the KGQA task can be solved which have not been attempted by prior work so far.

(ii) We propose the use of a neural machine translation based approach to retrieve the variable number of relations involved in answering a complex NLQ against a KG.

(iii) We also demonstrate that every sub-task of parsing an NLQ is complementary to the other tasks and helps the model perform better towards the final goal of KGQA. In Table 3, we demonstrate that via joint training on more than one task, the accuracy of individual tasks improves as compared to training them separately. For example, when trained separately, the best F1-score for detecting mentioned entity(s) was 83.3, and the best accuracy for the prediction of entity types of answer nodes was 75.7. When trained jointly, the corresponding metrics are 87.1 and 76.3. When trained jointly for all tasks, the results improve even further.

(iv) CQA-NMT predicts the relations involved in a sub-graph of the KG and also helps to predict the topology of the sub-graph, resulting in compositional reasoning via a neural network on the KG. In contrast, prior work predicts the relations for a specific topology only.¹

(v) We also demonstrate that our approach outperforms the state-of-the-art approaches on the MetaQA dataset, and therefore we present a new baseline on this dataset. Our approach also performs better than standard approaches as applicable to our dataset and helps us solve most of the real-world industrial challenges.

¹ Topology is a specific arrangement of how the mentioned entities and answer entities are connected to each other via the predicted relations. Sample topologies are given in Figure 1. Our approach can be used to answer questions of any topology if an adequate number of samples is included in the training data. Prior works have not attempted a dataset such as LOCA, which contains many different topologies; to the best of our efforts we could not find another such dataset, which has led to our aforementioned belief.

[Figure 2: Schema of our proprietary KG LOCA. Entity types include Program, Program Goal List, Head, Research Area, Sub Area, Paper, Researcher, Evangelist, Keyword, and Asset, connected by relations such as has_program, has_goal_list, head_of, sub_area, has_paper, author, key_person, explored_in, used_in, and has_asset.]
2 KGQA Problem and Challenges

For answering natural language questions (NLQ), we assume that the background knowledge is stored in a knowledge graph G, comprising a set of nodes V(G) and edges E(G). Here, nodes represent entities, and edges represent the relationship between a pair of entities or connect an entity to one of its properties. An NLQ (q) is a sequence of words w_i of a natural language (e.g., English), i.e., q = {w_1, w_2, ..., w_N}. We also assume that an NLQ can mention zero, one, or more entities present in G and enquire about another entity of G, which is connected with the mentioned entity(s). We pose the KGQA problem as a supervised learning problem, and next describe the labels assumed to be available for every question in the training data, which need to be predicted for every question in the test data.

Entity Linking Annotation: Some of the n-grams (η_i) in an NLQ refer to entity(s) of the KG. Such n-grams have been underlined in Figure 1. The entity-id (as shown in the third column of Figure 1) of the mentioned entity is also assumed to be available as part of the label annotation for every question.

Answer Entity Type Annotation (AET), τ: We assume that every NLQ has an entity type (t_i) for the answer entities. These are shown in the middle column of Figure 1. We refer to τ as the set of all entity types in the knowledge graph G.

Relation Sequence and Topology Annotation (path): The sequences of relations connecting the linked entities to the answer entities can be considered paths (path_i), each of which can contain one or more relations. These paths are connected to form a topology, as shown in Figure 1. This topology of the paths and relations is also assumed to be available for every NLQ in the training data. These paths need not be the shortest paths between the linked entities and the answer entities. For example, the last three columns of Figure 1 indicate a) the set of paths separated by a semicolon (;), b) whether this is the shortest path, and c) the topology of the paths connecting the linked entities to the answer entities.

Question Type Annotation (q_type): Some NLQs can be answered by a single triple of the knowledge graph ('Simple'), while some of them require traversal along a more complex topology as indicated earlier ('Factoid'); some questions require an aggregate operation such as count ('Count', see question 6 in Figure 1), and finally, some questions perform an existence check ('Boolean', see question 8 in Figure 1). Such information is also assumed to be available for every NLQ in the training data.
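To make this annotation scheme concrete, the following is a minimal sketch of how one labeled training sample could be represented; the dict-based layout, the field names, and the entity id "e12" are our illustrative assumptions, not the dataset's actual serialization format.

    # One annotated NLQ, roughly following the label scheme described above.
    # Field names and the entity id "e12" are hypothetical.
    sample = {
        "question": "who heads deep learning?",   # NLQ q = (w_1, ..., w_N)
        "mentions": [                             # entity linking annotation
            {"ngram": "deep learning",
             "entity_id": "e12",                  # id of the linked KG node
             "entity_type": "research_area"}
        ],
        "answer_entity_type": "head",             # AET, an element of tau
        "paths": "head_of",                       # relations to traverse; ";"
                                                  # separates per-entity paths
        "question_type": "factoid",               # simple/factoid/count/boolean
    }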
We now describe the challenges that need to be addressed while performing the KGQA task. To the best of our efforts, we could not find any prior work that covers all of these challenges together.

1. Incomplete Entity Mention: In the NLQ, users often do not mention the complete name of the intended entity Huang et al. (2019), e.g., only the first name of a person, the short name of a group, etc.; see question 8 in Figure 1.

2. Co-occurrence disambiguation: In some situations a mentioned entity can be linked to a KG entity only with the help of another mentioned entity in the question. For example, in question 7 of Figure 1, there can be many people who have the same first name ('Libby'), but only one of them works on NLP; the model needs to use this information to conclusively resolve the mentioned entities Mohammed et al. (2017).

3. Avoid un-intended match: Some of the words in a sentence coincidentally match an entity name but are not an intended mention of an entity, e.g., the word 'vision' may get matched with 'Computer Vision', which is not intended in question 9 of Figure 1.

4. Duplicate KG Entity: The intended entity names may be different from the words used in the NLQ, and there can be more than one entity in the KG that has the same name Shen et al. (2019). For example, "Life Sciences" is the name of a research area, as well as a keyword (see the KG schema given in Figure 2). The model needs to link the entity using other words, as shown in questions 1 and 2 of Figure 1.

5. Relation names mismatch: Often the words of the KG relations and the words of the NLQ do not match Huang et al. (2019), e.g., questions 2, 4, 6, etc. in Figure 1.

6. Implicit Relations Indication: Sometimes the words of the NLQ do not even mention the relations involved; they need to be inferred Zhang et al. (2018). For example, in question 4 of Figure 1, some of the relations are not mentioned in the question.
Problem Definition: The objective of the proposed approach is to output 1) the mentioned entity(s) (s_i) in the query, 2) the answer entity type, 3) the path, or set of predicates, P_q = \{p_q^1, p_q^2, p_q^3, ..., p_q^N\}, where each p_q^i \in E(G), and 4) the question type. The set P_q is a sequence of predicates such that, if traversed along these edges from the mentioned entity node(s), we arrive at the answer entity node(s). The final answer is then retrieved from the KG and is post-processed as per the outputs of the question type and answer entity type modules. We assume that we have N training samples D_{Train} = \{(q_i, \eta_i, t_i, qt_i, path_i)\}_{i=1}^{N} available to us, where q_i \in Q, s_i \subseteq V(G), t_i \in \tau, qt_i \in q_{type}, and path_i \subseteq 2^{E(G)}.

3 Related Work

In this section, we first present a view of prior work on the KGQA problem as an NLP task, and then on the set of techniques used for this task.

3.1 KGQA Task

Berant et al. (2013); Berant and Liang (2014); Reddy et al. (2014); Luo et al. (2018) proposed approaches that perform KGQA by mapping a query to its logical form and then converting it to a formal query to extract answers. However, these are not joint learning approaches as proposed in our work.

Multi-Task based approaches: Similar to us, many works like Lukovnikov et al. (2019); Huang et al. (2019); Shen et al. (2019) rely on jointly learning multiple sub-tasks of the KGQA problem. However, all these approaches focus on single-hop relations only, and therefore we cannot take such approaches as a baseline for our model. In a more complex setting, Shen et al. (2019) proposed a joint learning task for entity linking, path prediction (chain topology only), and question type. However, their model does not predict the answer entity type. We do not compare our approach with Shen et al. (2019) because they focus on the implicit mention of entities in the previous sentences of a dialogue, and also because they do not attempt to predict non-chain topology or the answer entity type.
Non-Chain Multi-Hop Relations: Agarwal et al. (2019) proposed an embedding based approach for non-chain multi-hop relation prediction (for a fixed and small set of topologies). They perform only the one task of relationship prediction.

3.2 Techniques used for KGQA

Transformers and Machine Translation: The Transformer Vaswani et al. (2017) has proved to be one of the most exciting approaches in NLP research, showing dominating results in Vaswani et al. (2017); Devlin et al. (2018), etc. Lukovnikov et al. (2019) closely resembles our approach, as they proposed a joint-learning based multi-task model using a Transformer. However, they handle only 1-hop questions and consider relation prediction as a classification task; in its current form, their model cannot be used to solve the variable-length path prediction problem required in our motivating example. In an extension to the work on using logical forms for KGQA, Dong and Lapata (2016) proposed the usage of an attention-based seq2seq model to generate the logical form of an input utterance. However, they use an LSTM model and not a Transformer.

Graph-Based Approaches: GraftNet Sun et al. (2018) and PullNet Sun et al. (2019) approached the problem of KGQA using graph-based solutions. These approaches extract a subgraph related to the query and then perform reasoning over it to extract the final answer(s). KV-MEM Bordes et al. (2015) proposed a memory network-based approach for single relationship prediction.

Embedding Based Approaches: Approaches for KGQA using KG embeddings (such as TransE Bordes et al. (2013) and TransR Lin et al. (2015)) were used by Huang et al. (2019) when only one relation (i.e., one RDF triple) is involved. One-hop KG modeling approaches were also proposed in Bordes et al. (2013); Socher et al. (2013); Dettmers et al. (2018). Recently, Saxena et al. (2020) presented an approach, EmbedKGQA, for joint learning, again using KG embeddings, in the context of multi-hop relations. However, their approach is not truly a joint model, as they arrive at the answer candidates before executing the model.

Our proposed approach outperforms PullNet and EmbedKGQA on the MetaQA dataset, as shown in Section 5.

4 Proposed Architecture

In this section, we describe our proposed joint model (CQA-NMT), which is an encoder-decoder
based model where BERT Devlin et al. (2018) is used as the encoder and a Transformer Vaswani et al. (2017) as the decoder. Figure 3 illustrates a high-level view of the proposed model.

[Figure 3: Proposed multi-task model CQA-NMT, illustrated on the input "Who starred films for the screenwriter of Rear Window?". The query is passed through all the modules to (1) extract the mentioned entity and link it to a KG node, (2) generate the sequence of predicates required for traversal, (3) predict the question type, and (4) predict the AET. After extracting all the information from the query, a corresponding formal query, such as SPARQL, is generated to retrieve the final answer.]

Joint Model for KGQA: In this paper, we extend BERT to generate paths (or inference chains) and to perform sequence labeling and classification jointly. Details of each module are described next.

1. Entity Mention Detection Module: To extract the mentioned entity(s) from the NL query, we perform a sequence labeling task using BERT's hidden states (Figure 4). Sequence labeling is a seq2seq task that tags the input word sequence x = (w_1, w_2, ..., w_T) with the output label sequence y_{seq} = (y_1, y_2, ..., y_T). In this paper, we augment CQA-NMT to jointly infer the type of the mentioned entity(s) along with its (their) span. We feed the final hidden states of the tokens, h_2, h_3, ..., h_{T-1}, into a softmax layer to generate the output sequence. We ignore h_1 and h_T, i.e., the [CLS] and [SEP] tokens, as they can never be part of an entity and are only required as a preprocessing step of BERT. Since BERT uses WordPiece tokenization, we assign continuation sub-tokens the same label as their word's first sub-token. For example, the output of BERT's WordPiece tokenizer is 'Jim Hen ##son' for the input 'Jim Henson'; we assign the labels for the tokenized output as 'B-Per I-Per I-Per', i.e., the second sub-word '##son' is given the same label as the first sub-word 'Hen'. The output of the softmax layer is:

y_{e_{type}}^{i} = \mathrm{softmax}(W_{e_{type}} \cdot h_i + b_{e_{type}})    (1)

where h_i is the hidden state corresponding to the i-th token.
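The label alignment described above can be sketched as follows. This is our illustration of the stated rule (continuation sub-tokens inherit the tag of their word's first sub-token, with B- turned into I- so spans stay contiguous), not the authors' code.

    # A minimal sketch of WordPiece label propagation.
    def align_labels(words, labels, wordpieces):
        """words:      ['Jim', 'Henson']
           labels:     ['B-Per', 'I-Per']
           wordpieces: ['Jim', 'Hen', '##son'] -> ['B-Per', 'I-Per', 'I-Per']"""
        aligned, word_idx = [], -1
        for piece in wordpieces:
            if piece.startswith("##"):          # continuation of the same word
                tag = labels[word_idx]
                # copy the word's tag, turning B- into I- so the span stays
                # contiguous (matches the 'Jim Hen ##son' example above)
                aligned.append("I-" + tag[2:] if tag.startswith("B-") else tag)
            else:                               # first sub-token of the next word
                word_idx += 1
                aligned.append(labels[word_idx])
        return aligned

    print(align_labels(["Jim", "Henson"], ["B-Per", "I-Per"],
                       ["Jim", "Hen", "##son"]))
    # -> ['B-Per', 'I-Per', 'I-Per']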
2. Entity Linking: The output of the Entity Mention Detection Module is a sequence of tokens along with a type (t_i) for each candidate entity. These mentioned entities still need to be linked to a KG node for traversal. In our work, we do not use any neural network for the linking process. Instead, we rely on an ensemble of string-matching algorithms² and PageRank Page et al. (1999) to break ties between candidate entities.

² We used the Levenshtein Distance and SequenceMatcher packages available in Python.

The Entity Mention Detection Module outputs as many entities as are provided in a query, along with their associated types (t_i). To link a mentioned entity in the NL query, we extract the candidates of type t_i from V(G). We then apply three string-matching algorithms, similar to Mohammed et al. (2017), and take a majority vote to further break ties. Finally, we apply the PageRank algorithm to link the mentioned entity with a KG entity. One way to understand the usability of the PageRank algorithm is to consider the notion of popularity: if a user queries 'Where was Obama born?', the user is more likely referring to the famous Barack Obama than to anyone else. A detailed description of the entity mention detection and entity linking procedure is shown in Figure 4.

[Figure 4: Entity Mention and Entity Linking Modules, illustrated on the query "Who starred films for the screenwriter of Rear Window?".]
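A rough sketch of this linking step is given below. The paper states only that three string matchers vote and that PageRank breaks ties; the particular third matcher (token overlap) and the use of a precomputed score dictionary in place of running PageRank are our assumptions.

    import difflib

    def levenshtein(a, b):
        # classic dynamic-programming edit distance
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def link(mention, candidates, pagerank):
        """candidates: names of KG nodes of the predicted type t_i;
        pagerank: precomputed node popularity scores (hypothetical dict)."""
        def lev_sim(c):   # normalized edit similarity
            return 1 - levenshtein(mention, c) / max(len(mention), len(c))
        def seq_sim(c):   # difflib's SequenceMatcher ratio
            return difflib.SequenceMatcher(None, mention, c).ratio()
        def tok_sim(c):   # token overlap, standing in for the third matcher
            a, b = set(mention.split()), set(c.split())
            return len(a & b) / len(a | b)
        # each matcher votes for its top candidate; remaining ties are
        # broken by popularity, per the PageRank intuition above
        votes = [max(candidates, key=f) for f in (lev_sim, seq_sim, tok_sim)]
        return max(set(votes),
                   key=lambda c: (votes.count(c), pagerank.get(c, 0.0)))

    print(link("obama", ["Barack Obama", "Michelle Obama"],
               {"Barack Obama": 0.9, "Michelle Obama": 0.6}))
    # -> 'Barack Obama'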
3. Path Prediction Module: To generate the sequence of predicates for an input query, we augment our architecture with a Transformer-based decoder Vaswani et al. (2017), as often used in Neural Machine Translation (NMT) tasks. We define y_{path} = \{p_1, p_2, ..., p_N\}, where each p_i \in E(G). In our work, we do not constrain the number of predicates (multiple hops) required to extract the final answer. Hence, an obvious choice is a decoder module that stops generating predicates once it has predicted the end-of-sentence ([EOS]) token (Figure 4).

4. Question Type and Answer Entity Type Prediction Module: We formulate the task of determining the question type and the AET as a classification task, since we have a discrete set for both q_{type} and the answer entity types. Using the hidden state of the first special token from BERT, i.e., [CLS], we predict:

y_{q_{type}} = \mathrm{softmax}(W_{qt} \cdot h_1 + b_{qt})    (2)

y_{\tau} = \mathrm{softmax}(W_{tgt} \cdot h_1 + b_{tgt})    (3)

To jointly model all the tasks using a single architecture, we define our training objective as:

p(y|x) = p(y_{e_{type}}, y_{path}, y_{q_{type}}, y_{\tau} | x)    (4)

p(y|x) = p(y_{q_{type}}|x) \cdot p(y_{\tau}|x) \cdot p(y_{e_{type}}|x) \cdot p(y_{path}|x)    (5)

The path and AET components of CQA-NMT are defined as

p(y_{e_{type}}|x) = \prod_{i=1}^{N} p(y_{e_{type}}^{i}|x)    (6)

p(y_{path}|x) = \prod_{t=1}^{T} p(p_t | p_1, p_2, ..., p_{t-1}, x)    (7)

where

y_{q_{type}} \in \{\text{factoid}, \text{count}, \text{boolean}, \text{simple}\}    (8)

y_{\tau} \in \{\text{entity types in KG}\}    (9)

y_{e_{type}}^{i} \in (\{B, I\} \times \{\text{entity types}\}) \cup \{O\}    (10)

p_i \in E(G)    (11)

For training, we maximize the conditional probability p(y_{e_{type}}, y_{path}, y_{q_{type}}, y_{\tau}|x). The model is fine-tuned end-to-end by minimizing the cross-entropy loss.
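As a concrete reading of Eqs. (4)-(7), the joint objective reduces to a sum of per-task cross-entropies. The numpy sketch below is ours, standing in for the actual TensorFlow training graph; it computes the negative log of Eq. (5) for one example.

    import numpy as np

    def joint_nll(qt_probs, qt_gold, aet_probs, aet_gold,
                  tag_probs, tag_gold, path_probs, path_gold):
        """Negative log of Eq. (5): cross-entropy summed over the four heads.
        *_probs are softmax outputs of the model; *_gold are index labels."""
        nll = -np.log(qt_probs[qt_gold])          # question type, Eq. (2)
        nll -= np.log(aet_probs[aet_gold])        # answer entity type, Eq. (3)
        for i, y in enumerate(tag_gold):          # BIO tags, Eq. (6)
            nll -= np.log(tag_probs[i][y])
        for t, p in enumerate(path_gold):         # predicate sequence, Eq. (7);
            nll -= np.log(path_probs[t][p])       # each step conditions on the
        return nll                                # previously emitted predicates

At inference time, the path decoder emits one predicate at a time and stops when it predicts [EOS], which is what allows a variable number of hops per question.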
5 Experiments and System Details

In this section, we first introduce the datasets used for our experiments. We pre-process all NLQs (of all datasets) by downcasing and tokenizing.

5.1 Datasets, Metrics, and Baselines

LOCA Dataset: We introduce a new challenging dataset, 'LOCA', which consists of 5,010 entities, 42 unique predicates, and a total of 45,869 facts. The dataset has 3,275 one- or multi-hop questions that have 0, 1, or more entities mentioned in the questions. It contains multiple question types, like count, factoid, and boolean.

For questions with multiple entities, we use the operator ";" as a delimiter to separate the paths corresponding to each entity (see queries 5, 7, and 8 in Figure 1). For the scope of this paper, we consider queries involving only intersection, which can be replaced with other operators like union, set-difference, etc., without loss of generality. The operator ";" helps us detect and predict the different topologies involved in an NL query, as the sketch below illustrates.
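For illustration (our sketch, not the dataset loader), the annotation for a multi-entity question can be split into per-entity predicate lists as follows, using the inference chain of the example in Section 5.4:

    # ";" separates the paths of different mentioned entities;
    # "," separates the relations within one path.
    def parse_paths(annotation):
        return [path.strip().split(", ") for path in annotation.split(";")]

    print(parse_paths("key_person; has_paper, author"))
    # -> [['key_person'], ['has_paper', 'author']]  (two paths meeting at the answer)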
MetaQA: The dataset proposed in Zhang et al. (2018) consists of 3 different versions, namely Vanilla, NTM, and Audio. All versions contain single- and multi-hop (maximum 3-hop) questions from the movie domain. For our experiments, we used the Vanilla and NTM versions of the dataset and the KB as provided in Zhang et al. (2018). Since both versions of MetaQA do not consider the AET and question type, we assigned a default label to both tasks.

                           Train     Dev     Test
    MetaQA 1-hop          96,106   9,992    9,947
    MetaQA 2-hop         118,980  14,872   14,872
    MetaQA 3-hop         114,196  14,274   14,274
    LOCA 1-hop             2,000     250      250
    LOCA 2-hop               400      50       50
    LOCA 3[or more]-hop      220      28       28

Table 1: Statistics of the MetaQA and LOCA datasets.

Metrics: We use different metrics for the different subtasks. Since a query can contain partially mentioned entities, we use the F-score to evaluate the mention and type detection module. For inference chain (or path) prediction, question type prediction, and answer entity type prediction, we use accuracy. In Table 2, similar to prior works, we use Hits@1 to evaluate query-answer accuracy.

Baselines: We use KV-Mem Bordes et al. (2015), GraftNet Sun et al. (2018), PullNet Sun et al. (2019), VRN Zhang et al. (2018), and EmbedKGQA Saxena et al. (2020) as the baselines.

5.2 Training Details

All the baselines and the proposed approach were trained on a DGX 32GB NVIDIA GPU using the TensorFlow Abadi et al. (2015) and Texar Hu et al. (2018) libraries. For CQA-NMT, we used the small uncased version of the pre-trained BERT Devlin et al. (2018) model. The Adam Kingma and Ba (2014) optimizer was employed with a learning rate of 2e-5 for BERT and defaults for the others. The training objective of each model was maximized using the cross-entropy loss, and the best models were selected using the validation loss. Dropout values were set to 0.5 and were optimized as described in Srivastava et al. (2014). For BERT, we used 10% of the total training data for the warmup phase Vaswani et al. (2017). Finally, for the division of each dataset into train, test, and dev, we used the same split as provided by Zhang et al. (2018) for the MetaQA dataset and a ratio of 80-10-10 for the LOCA dataset.
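For reference, the stated hyperparameters collected in one place (a summary of the text above, not the authors' configuration file):

    # Training configuration as reported in Section 5.2.
    config = {
        "encoder": "BERT, small uncased, pre-trained",
        "optimizer": "Adam",
        "learning_rate_bert": 2e-5,        # defaults for the other parameters
        "dropout": 0.5,
        "warmup_fraction": 0.10,           # 10% of training data for BERT warmup
        "loss": "cross-entropy",
        "model_selection": "lowest validation loss",
        "loca_split": (0.80, 0.10, 0.10),  # train/dev/test; MetaQA uses the
    }                                      # split of Zhang et al. (2018)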
5.3 Main Results

In this section, we report the results of the experiments on the MetaQA and LOCA datasets. We then provide insights into the model outputs and the results of the error analysis performed on the LOCA dataset.

5.3.1 LOCA

The experimental results for the LOCA dataset are shown in the last row of Table 2. The results affirm that the proposed approach outperforms the baselines. We observed that the baselines' inability to handle duplicate KG entities (Section 2, challenge 4) limits their performance. Additionally, the ability of the NMT Bahdanau et al. (2014) model to effectively handle complex and unknown topologies helped us retrieve answers with better accuracy for variable-hop (v-hop) queries.

5.3.2 MetaQA

The experimental results for MetaQA are shown in Table 2. For Vanilla MetaQA, we achieve better answer accuracy in the 1-hop and 3-hop settings; in the 2-hop setting, we achieve results comparable to the state-of-the-art. An increment of about 2% and 4.9% Hits@1 can be seen in the 1-hop and 3-hop settings, respectively.

To obtain the performance of each baseline on the v-hop (variable-hop) dataset, we re-use the existing models for 1-hop, 2-hop, and 3-hop and assume that there is an oracle which can redirect a query to the correct model. The accuracy of the various approaches estimated in this way is shown in the 4th row of Table 2, while the actual results on the v-hop dataset are shown in the 5th row. It is evident that CQA-NMT outperforms all the baselines on the MetaQA dataset in the variable-hop setting.

To gauge the effectiveness and robustness of our model, we used the same models trained on the vanilla MetaQA dataset and evaluated their performance on NTM MetaQA, i.e., in a zero-shot setting. Here, we achieved better results on 1-hop and 3-hop. The worse performance of CQA-NMT on MetaQA-NTM (2-hop) can be attributed to the zero-shot setting: as compared to VRN, we did not train CQA-NMT on the MetaQA-NTM dataset; we trained it on the MetaQA vanilla dataset only.
                             KV-Mem  GraftNet  PullNet  EmbedKGQA   VRN   CQA-NMT
    MetaQA(1-hop)             96.2     97.0     97.0      97.5      97.5    99.96
    MetaQA(2-hop)             82.7     94.8     99.9      98.8      89.9    99.45
    MetaQA(3-hop)             48.9     77.7     91.4      94.8      62.5    99.78
    MetaQA(V-hop) Estimated  73.79    89.11    96.05        -         -     99.69
    MetaQA(V-hop)               -     83.67*      -         -         -     95.85
    MetaQA-NTM(1-hop)           -        -        -         -       81.3    83.3
    MetaQA-NTM(2-hop)           -        -        -         -       69.7    62.1
    MetaQA-NTM(3-hop)           -        -        -         -       38.0    81.3
    LOCA                     51.07    72.27       -         -         -     78.29

Table 2: Results of the baselines and CQA-NMT on MetaQA (Vanilla and NTM) and the LOCA dataset. Results for MetaQA-NTM were obtained in the zero-shot setting for CQA-NMT. All other baseline results are taken from Sun et al. (2019), Zhang et al. (2018), and Saxena et al. (2020), except for the numbers marked with *. Note: source code for PullNet and VRN is not available, and we were not able to replicate the results of KV-Mem.

                             Mention Detection   AET    Inference Chains  Question Type  Answer
                                (F1-score)     (Acc.)        (Acc.)          (Acc.)      (Acc.)
    Unsupervised                   67.12        53.3          51.0              -          42.9
    BERT (mention only)            83.3         53.22         51.9              -          47.22
    BERT (AET only)                67.38        75.7          55.6              -          45.2
    BERT (IC only)                 67.8         50.9          80.1              -          40.1
    BERT (mention and AET)         87.1         76.3          71.6              -          65.33
    BERT (AET and IC)              66.3         74.1          77.8              -          60.39
    BERT (mention and IC)          87.3         53.1          79.6              -          51.3
    CQA-NMT                        93.66        79.89         81.95           97.65        71.01

Table 3: Effects of reducing the supervision in our approach. The numbers in italics are obtained without any supervision.

5.4 Further Results and Analysis

Advantage of Transformers: An LSTM-based implementation of mentioned entity detection could not detect different entity types for the same phrase "deep learning" in queries 1 and 2 of Figure 1, whereas the BERT-based approach was able to. We infer that this could be due to key features of BERT such as multi-head attention, WordPiece embeddings, positional embeddings, and/or segment embeddings. Moreover, in a different context, it was able to assign different types to entities with the same mentions (queries 1 and 2 in Figure 1).

Effects of using fewer annotations: To study the importance of annotation in our approach, we removed several components from our proposed approach and studied the effects (Table 3). We first studied CQA-NMT after removing all the supervision, using heuristics-based approaches for AET and mention detection (both approaches taken from Mohammed et al. (2017)). The shortest path between the linked KG entity and the AET, similar to Sun et al. (2018, 2019), was then taken to retrieve the answers. This setting (row 1) results in the worst performance. In rows 2, 3, and 4 of Table 3, we kept only one component of CQA-NMT supervised and applied heuristics for the others, as mentioned above. As evident from these rows, mention detection plays a crucial role in extracting the correct answer (a jump in the range of 2%-5% in answer accuracy). A similar analysis can be found in Dong and Lapata (2016); Guo et al. (2018); Srivastava et al. (2020), where the authors found that entity linking error is one of the major errors leading to wrong predictions in KGQA.

In summary, while testing each component of CQA-NMT, we supervised different components at a time and used heuristics-based approaches for the remaining components. The heuristics are: a) a shortest-path algorithm, similar to Sun et al. (2018, 2019), for the path prediction task; b) the candidate and relation pairing score of Mohammed et al. (2017) for entity linking; and c) an LSTM-based classifier Mohammed et al. (2017) for answer entity type prediction (however, in row 1, "Unsupervised", we identify the AET only if the name of the entity type is present in the NLQ).

Benefits of joint training: From rows 5-7 of Table 3, we infer that joint training improves not only the scores of the individual components (in the range of 15%-20%) but also the overall answer accuracy. We observed that challenges 5 and 6 from Section 2 were handled significantly better after jointly training CQA-NMT for AET and mention detection (row 5). We found that the context used by a mentioned entity for AET differs across queries. For example, in the queries 'Who heads Deep Learning?' and 'papers in Deep Learning', Deep Learning is a research area with AET 'head' in the former, and a keyword with AET 'paper.title' in the latter. After jointly training the mention detection model with AET or IC, a jump of ~4% can be observed for the individual components and ~11-20% in answer accuracy.
Errors with Joint Training: From Table 3, it seems that joint training helps. However, we also found that an error made by a single module produces a cascade of errors. For example, if for a single mention span 'deep learning and ai', with the gold label 'B-research_area I-research_area I-research_area I-research_area', the entity detection predicts 'B-research_area I-research_area O B-keyword', then the path prediction module generates two different paths, one for each entity, leading to a wrong answer. Similarly, if the AET prediction module makes an error, it gets propagated to all the other modules simultaneously. The frequency of such errors, however, ...

To retrieve the final answer from the KG, we defined a set of rules for different question types and used simple mapping rules to map the queries to query sketches. For example, consider the query q = "Who is working in automated regulatory compliance and has published a paper in NLP?". The output of CQA-NMT contains all the information required to form a structured query, such as SPARQL. The outputs of CQA-NMT are:

1. Linked Entities: {e5: automated regulatory compliance (sub-area), e6: NLP (keyword)}
2. Inference Chain: key_person; has_paper, author
3. Answer Entity Type (AET): researcher.name
4. Question Type (q_type): Factoid

Using the q_type information, we fill a sketch using the other outputs to obtain the final SPARQL query.
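Only the opening line of the generated query (SELECT DISTINCT ?uri WHERE {) survives in this transcription. A plausible completion, reconstructed from the four outputs above, is sketched below; the loca: namespace, the predicate IRIs (named after the schema relations of Figure 2), and the triple directions are our assumptions.

    PREFIX loca: <http://example.org/loca/>        # hypothetical namespace

    SELECT DISTINCT ?uri WHERE {
      # path 1 (key_person): from the sub-area e5 to the researcher
      loca:e5 loca:key_person ?uri .
      # path 2 (has_paper, author): from the keyword e6 to the same
      # researcher; the ";" in the inference chain marks the intersection
      loca:e6 loca:has_paper ?paper .
      ?paper loca:author ?uri .
    }

The AET (researcher.name) then determines which property of the answer node is returned to the user, and a 'Count' question type would instead wrap the projection in an aggregate such as COUNT(?uri).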
References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

Puneet Agarwal, Maya Ramanath, and Gautam Shroff. 2019. Retrieving relationships from a knowledge graph for question answering. In European Conference on Information Retrieval, pages 35–50. Springer.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544.

Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1415–1425.

Roi Blanco, Giuseppe Ottaviano, and Edgar Meij. 2015. Fast and space-efficient entity linking for queries. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 179–188.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250.

Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795.

Andreas Both, Dennis Diefenbach, Kuldeep Singh, Saedeeh Shekarpour, Didier Cherix, and Christoph Lange. 2016. Qanary – a methodology for vocabulary-driven open question answering systems. In European Semantic Web Conference, pages 625–641. Springer.

Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. arXiv preprint arXiv:1601.01280.

Mohnish Dubey, Debayan Banerjee, Debanjan Chaudhuri, and Jens Lehmann. 2018. EARL: joint entity and relation linking for question answering over knowledge graphs. In International Semantic Web Conference, pages 108–126. Springer.

Mohnish Dubey, Sourish Dasgupta, Ankit Sharma, Konrad Höffner, and Jens Lehmann. 2016. AskNow: A framework for natural language query formalization in SPARQL. In European Semantic Web Conference, pages 300–316. Springer.

Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, and Jian Yin. 2018. Dialog-to-action: Conversational question answering over a large-scale knowledge base. In Advances in Neural Information Processing Systems, pages 2942–2951.

Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, Gokhan Tur, and Geoff Zweig. 2014. Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding. In Fifteenth Annual Conference of the International Speech Communication Association.

Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, et al. 2018. Texar: A modularized, versatile, and extensible toolkit for text generation. arXiv preprint arXiv:1809.00794.

Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019. Knowledge graph embedding based question answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 105–113.

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören Auer, et al. 2015. DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2):167–195.

Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

Denis Lukovnikov, Asja Fischer, and Jens Lehmann. 2019. Pretrained transformers for simple question answering over knowledge graphs. In International Semantic Web Conference, pages 470–486. Springer.

Kangqi Luo, Fengli Lin, Xusheng Luo, and Kenny Zhu. 2018. Knowledge base question answering via encoding of complex query graphs. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2185–2194.

Salman Mohammed, Peng Shi, and Jimmy Lin. 2017. Strong baselines for simple question answering over knowledge graphs with and without neural networks. arXiv preprint arXiv:1712.01969.

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.

Aasish Pappu, Roi Blanco, Yashar Mehdad, Amanda Stent, and Kapil Thadani. 2017. Lightweight multilingual entity extraction and linking. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 365–374.

Seonyeong Park, Soonchoul Kwon, Byungsoo Kim, and Gary Geunbae Lee. 2015. ISOFT at QALD-5: Hybrid question answering system over linked data and text data. In CLEF (Working Notes).

Siva Reddy, Mirella Lapata, and Mark Steedman. 2014. Large-scale semantic parsing without question-answer pairs. Transactions of the Association for Computational Linguistics, 2:377–392.

Apoorv Saxena, Aditay Tripathi, and Partha Talukdar. 2020. Improving multi-hop question answering over knowledge graphs using knowledge base embeddings. In Proceedings of the 2020 Conference of the Association for Computational Linguistics.

Tao Shen, Xiubo Geng, Tao Qin, Daya Guo, Duyu Tang, Nan Duan, Guodong Long, and Daxin Jiang. 2019. Multi-task learning for conversational question answering over a large-scale knowledge base. arXiv preprint arXiv:1910.05069.

Kuldeep Singh, Arun Sethupat Radhakrishna, Andreas Both, Saeedeh Shekarpour, Ioanna Lytra, Ricardo Usbeck, Akhilesh Vyas, Akmal Khikmatullaev, Dharmen Punjani, Christoph Lange, et al. 2018. Why reinvent the wheel: Let's build question answering systems together. In Proceedings of the 2018 World Wide Web Conference, pages 1247–1256.

Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems, pages 926–934.

Himani Srivastava, Prerna Khurana, Saurabh Srivastava, Vaibhav Varshney, Lovekesh Vig, Puneet Agarwal, and Gautam Shroff. 2020. Improved question answering using domain prediction.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, pages 697–706.

Haitian Sun, Tania Bedrax-Weiss, and William W Cohen. 2019. PullNet: Open domain question answering with iterative retrieval on knowledge bases and text. arXiv preprint arXiv:1904.09537.

Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, and William W Cohen. 2018. Open domain question answering using early fusion of knowledge bases and text. arXiv preprint arXiv:1809.00782.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Amir Pouran Ben Veyseh. 2016. Cross-lingual question answering using common semantic space. In Proceedings of TextGraphs-10: The Workshop on Graph-Based Methods for Natural Language Processing, pages 15–19.

Jason Weston, Antoine Bordes, Oksana Yakhnenko, and Nicolas Usunier. 2013. Connecting language and knowledge bases with embedding models for relation extraction. arXiv preprint arXiv:1307.7973.

Kun Xu, Sheng Zhang, Yansong Feng, and Dongyan Zhao. 2014. Answering natural language questions via phrasal semantic parsing. In Natural Language Processing and Chinese Computing, pages 333–344. Springer.

Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander J Smola, and Le Song. 2018. Variational reasoning for question answering with knowledge graph. In Thirty-Second AAAI Conference on Artificial Intelligence.

David Ziegler, Abdalghani Abujabal, Rishiraj Saha Roy, and Gerhard Weikum. 2017. Efficiency-aware answering of compositional questions using answer type prediction. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 222–227.