Jointly Extracting Explicit and Implicit Relational Triples with Reasoning Pattern Enhanced Binary Pointer Network
Yubo Chen†, Yunqi Zhang†, Changran Hu‡, Yongfeng Huang†
† Department of Electronic Engineering & BNRist, Tsinghua University, Beijing, China
‡ University of California, Berkeley, USA
ybch14@gmail.com  zhang-yq15@outlook.com  changran_hu@berkeley.edu  yfhuang@tsinghua.edu.cn

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5694–5703, June 6–11, 2021. ©2021 Association for Computational Linguistics.

Abstract

Relational triple extraction is a crucial task for knowledge graph construction. Existing methods mainly focused on explicit relational triples that are directly expressed, but usually suffer from ignoring implicit triples that lack explicit expressions. This will lead to serious incompleteness of the constructed knowledge graphs. Fortunately, other triples in the sentence provide supplementary information for discovering entity pairs that may have implicit relations. Also, the relation types between the implicitly connected entity pairs can be identified with relational reasoning patterns in the real world. In this paper, we propose a unified framework to jointly extract explicit and implicit relational triples. To explore entity pairs that may be implicitly connected by relations, we propose a binary pointer network to extract overlapping relational triples relevant to each word sequentially and retain the information of previously extracted triples in an external memory. To infer the relation types of implicit relational triples, we propose to introduce real-world relational reasoning patterns in our model and capture these patterns with a relation network. We conduct experiments on several benchmark datasets, and the results prove the validity of our method.

1 Introduction

Relational triple extraction is defined as automatically recognizing semantic relations with triple structures (subject, relation, object) among multiple entities in a sentence. It is a critical task for constructing Knowledge Graphs (KGs) from unlabeled corpora (Dong et al., 2014).

[Figure 1: An example of explicit and implicit relational triples. Italic phrases are key relational expressions corresponding to the explicit relational triples. The example sentence "... Mark Spencer, a designer of Digium, a company in Huntsville ..." contains the explicit triples (Mark Spencer, Work for, Digium) and (Digium, Locate in, Huntsville) and the implicit triple (Mark Spencer, Live in, Huntsville).]

Early work of relational triple extraction applied pipeline methods (Zelenko et al., 2003; Chan and Roth, 2011), which ran entity recognition and relation classification separately. However, such pipeline approaches suffered from error propagation. To address this issue, recent work proposed to jointly extract entities and relations from the text with feature-based methods (Yu and Lam, 2010; Li and Ji, 2014; Ren et al., 2017). Afterward, neural network-based models were proposed to eliminate hand-crafted features (Gupta et al., 2016; Zheng et al., 2017). More recently, several methods were proposed to extract overlapping triples, such as tagging-based (Dai et al., 2019; Wei et al., 2020), graph-based (Wang et al., 2018; Fu et al., 2019), copy-based (Zeng et al., 2018, 2019, 2020) and token pair linking models (Wang et al., 2020).

Existing models achieved considerable success on extracting explicit triples, which have direct relational expressions in the sentence. However, there are many implicit relational triples that are not explicitly expressed. For example, in Figure 1, the explicit triples are strongly indicated by the key relational phrases, but the implicit relation "Live in" is not expressed explicitly. Unfortunately, existing methods usually ignored implicit triples (Zhu et al., 2019), which will cause serious incompleteness of the constructed KGs and performance degradation of downstream tasks (Angeli and Manning, 2013; Jia et al., 2020; Jun et al., 2020).

Our work is motivated by several observations. First, other relational triples within a sentence provide supplementary information for discovering entity pairs that may have implicit relational connections. For example, in Figure 1, the explicit triples establish a relational connection between "Mark Spencer" and "Huntsville" through the intermediate entity "Digium". Second, the relation types of implicit relational triples can be derived through real-world reasoning patterns. For example, in Figure 1, the reasoning pattern "one lives where the company he works for is located" helps identify the type of the implicit triple as "Live in".

In this paper, we propose a unified framework for the joint extraction of explicit and implicit relational triples. We propose a Binary Pointer Network (BPtrNet), which is based on the pointer network (Vinyals et al., 2015), to extract overlapping relational triples relevant to each word sequentially. To discover implicitly connected entity pairs, we preserve the information of previously extracted triples in an external memory and use it to enhance the extraction of later time steps. To infer the relation types between the implicitly connected entity pairs, we propose to augment our model with real-world relational reasoning patterns and capture the relational inference logic with a Relation Network (RN) (Santoro et al., 2017). The RN obtains a pattern-enhanced representation from the memory for each word pair. Then the Reasoning pattern enhanced BPtrNet (R-BPtrNet) uses the word pair representation to compute a binary score for each candidate triple. Finally, triples with positive scores are output as the extraction result.

The main contributions of this paper are:
• We propose a unified framework to jointly extract explicit and implicit relational triples.
• To discover entity pairs that are implicitly connected by relations, we propose a BPtrNet model to extract overlapping relational triples sequentially and utilize an external memory to retain the extracted triples.
• To enhance the relation type inference of implicitly connected entity pairs, we propose to introduce relational reasoning patterns, captured with a RN, to augment our model.
• We conduct experiments on several benchmark datasets and the experimental results demonstrate the validity of our method.

2 Related Work

Early work of relational triple extraction addressed this task in a pipelined manner (Zelenko et al., 2003; Zhou et al., 2005; Chan and Roth, 2011; Gormley et al., 2015). They first ran named entity recognition to identify all entities and then classified relations between all entity pairs. However, these pipelined methods usually suffered from the error propagation problem and failed to capture the interactions between entities and relations.

To overcome these drawbacks, recent research focused on jointly extracting entities and relations, including feature-based models (Yu and Lam, 2010; Li and Ji, 2014; Ren et al., 2017) and neural network-based models (Gupta et al., 2016; Miwa and Bansal, 2016; Zheng et al., 2017). For example, Ren et al. (2017) proposed to jointly embed entities, relations, text features and type labels into two low-dimensional spaces. Miwa and Bansal (2016) proposed a joint model containing two long short-term memories (LSTMs) (Gers et al., 2000) with shared parameters. Zheng et al. (2017) proposed to extract relational triples directly by transforming this task into a sequence tagging problem, whose tags contain the information of entities and the relations they hold. However, they only assigned one label to each word, which means that this method failed to extract overlapping triples. Subsequent work proposed several mechanisms to solve this problem: (1) labeling tagging sequences for words (Dai et al., 2019) or entities (Yu et al., 2019; Wei et al., 2020); (2) transforming the sentence into a graph structure (Wang et al., 2018; Fu et al., 2019); (3) generating triple element sequences with a copy mechanism (Zeng et al., 2018, 2019, 2020; Nayak and Ng, 2020); (4) linking token pairs with a handshake tagging scheme (Wang et al., 2020). However, these methods usually ignored implicit relational triples that are not directly expressed in the sentence (Zhu et al., 2019), which leads to the incompleteness of the resulting KGs and negatively affects the performance of downstream tasks (Angeli and Manning, 2013; Jia et al., 2020).

Our work is motivated by two observations. First, other triples in the sentence provide supplementary evidence for discovering entity pairs with implicit relational connections. Second, the relation types of the implicit connections need to be identified through real-world reasoning patterns. In this paper, we propose a unified framework for the joint extraction of explicit and implicit relational triples. We propose a binary pointer network to sequentially extract overlapping relational triples and externally keep the information of predicted triples for exploring implicitly connected entity pairs. We also propose to introduce real-world reasoning patterns in our model to help derive the relation types of implicit triples with a relation network. Experimental results on several benchmark datasets demonstrate the effectiveness of our method.
[Figure 2: The overall framework of our approach. The figure shows the BPtrNet encoder (word embeddings w, LM embeddings p and character representations c feeding the word encoder), the entity tagger, the BPtrNet decoder with its binary pointer outputs, the external memory of extracted triples, and the relation network that produces the reasoning representations h^P, illustrated on the example "... Mark Spencer ... of Digium, a company in Huntsville ...".]

3 Our Approach

The overall framework of our approach is shown in Figure 2. We introduce the Binary Pointer Network (BPtrNet) and the Relation Network (RN) in Sections 3.1 and 3.2 and the details of training and inference in Section 3.3, respectively.

3.1 Binary Pointer Network

Existing methods usually failed to extract implicit relational triples due to the lack of explicit expressions (Zhu et al., 2019). Fortunately, we observe that other triples in the sentence can help discover entity pairs that may have implicit relational connections. For instance, in the sentence "George is Judy's father and David's grandfather", the relation between Judy and David is not explicitly expressed. In this case, if we first extract the explicit triple (Judy, father, George) and keep its information in our model, we can easily establish an implicit connection between Judy and David through George, because George is explicitly connected with David by the relational keyword "grandfather". Inspired by this observation, our model extracts the relational triples relevant to each word sequentially and keeps all previous triples of the sentence to enhance the extraction at future time steps. This word-by-word extraction process can be regarded as transforming a text sequence into a sequence of extracting actions, which leads us to a sequence-to-sequence (seq2seq) model.

Therefore, we propose a Binary Pointer Network (BPtrNet), based on a seq2seq pointer network (Vinyals et al., 2015), to jointly extract explicit and implicit relational triples. Our model first encodes the words of a sentence into vector representations (Section 3.1.1). Then, we use a binary decoder to sequentially transform the vectors into (overlapping) relational triples (Section 3.1.2). We also introduce an external memory to retain previously extracted triples for enhancing future decoding steps (Section 3.1.3).

3.1.1 Encoder

Given a sentence [w_1, ..., w_n], we first capture morphological patterns of entities with a convolutional neural network (CNN) (LeCun et al., 1989) and compute the character representation c_i of the word w_i (i = 1, ..., n): c_i = CNN(w_i; θ) ∈ R^{d_c}. Then we introduce the context-sensitive representations p_{1:n} captured with a pre-trained Language Model (LM) to bring rich semantics and prior knowledge from the large-scale unlabeled corpus. We feed c_i, p_i and the word embedding w_i into a bidirectional LSTM (BiLSTM) to compute the contextualized word representations x_{1:n} and encode the sentence with another BiLSTM:

    x_i = BiLSTM_in([w_i; p_i; c_i]) ∈ R^{d_in},
    h^E_i = BiLSTM_Enc(x_i) ∈ R^{2 d_E}.        (1)
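To make the encoder concrete, the sketch below mirrors Equation (1) in PyTorch: a character-level CNN, pre-trained LM embeddings p and word embeddings w are concatenated, passed through the input BiLSTM to obtain x, and re-encoded by the sentence BiLSTM to obtain h^E. This is an illustrative reconstruction rather than the authors' released code; the class name, the default dimensions and the assumption that the LM embeddings are precomputed and passed in as a tensor are ours.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Minimal sketch of the R-BPtrNet encoder (Eq. 1): char CNN + word/LM
    embeddings -> input BiLSTM -> sentence BiLSTM. Sizes are illustrative."""
    def __init__(self, n_chars, n_words, d_c=30, d_w=300, d_lm=1024, d_in=300, d_E=200):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_c)
        self.char_cnn = nn.Conv1d(d_c, d_c, kernel_size=3, padding=1)
        self.word_emb = nn.Embedding(n_words, d_w)
        # BiLSTM over [w_i; p_i; c_i] producing contextualized word representations x
        self.bilstm_in = nn.LSTM(d_w + d_lm + d_c, d_in // 2,
                                 bidirectional=True, batch_first=True)
        # Sentence encoder BiLSTM producing h^E in R^{2 d_E}
        self.bilstm_enc = nn.LSTM(d_in, d_E, bidirectional=True, batch_first=True)

    def forward(self, word_ids, char_ids, lm_emb):
        # char_ids: (batch, n, n_char); lm_emb: (batch, n, d_lm) from a pre-trained LM
        b, n, n_char = char_ids.shape
        c = self.char_emb(char_ids.view(b * n, n_char)).transpose(1, 2)  # (b*n, d_c, n_char)
        c = torch.relu(self.char_cnn(c)).max(dim=-1).values.view(b, n, -1)  # max-pool over chars
        w = self.word_emb(word_ids)                                      # (b, n, d_w)
        x, _ = self.bilstm_in(torch.cat([w, lm_emb, c], dim=-1))         # Eq. 1: x_i
        h_E, _ = self.bilstm_enc(x)                                      # Eq. 1: h^E_i
        return x, h_E
```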
3.1.2 Binary Decoder

First, to capture the interactions between entities and relations, we recognize the entities with a span-based entity tagger (Yu et al., 2019; Wei et al., 2020) and transform the tags into vectors as part of the decoder's input (Figure 2). Specifically, we assign each token a start and an end tag to indicate whether the current token corresponds to a start or end position of an entity of a certain type:

    h^T_i = BiLSTM_Tag(x_i) ∈ R^{d_T},
    p(y^{s/e}_i) = softmax(W^{s/e} h^T_i + b^{s/e}),        (2)
    tag^{s/e}_i = argmax_k p(y^{s/e}_i = k),

where (W^s, b^s) and (W^e, b^e) are the parameters of the start and end tag classifiers, respectively. Then we obtain the entity tag embedding e_i ∈ R^{d_e} by averaging the look-up embeddings of the start and end tags. We also capture a global contextual embedding g by max pooling over h^E_{1:n}. Then we adopt an LSTM as the decoder (with h^D_0 = h^E_n):

    h^D_i = LSTM_Dec([x_i; e_i; g], h^D_{i-1}) ∈ R^{2 d_E}.        (3)

Next, we introduce how to extract relational triples at the i-th time step. We consider the current word as the object entity, select words as subjects that form triples with the object from all the words of the sentence, and predict the relation types between the subjects and the object. For example, in Figure 2, when the current object is "Huntsville", the model selects "Digium" as the subject and classifies the relation as "Locate in". Thus ("Digium", "Locate in", "Huntsville") is extracted as a relational triple. Multi-token entities are represented with their last words and recovered by finding the nearest start tags of the same type from their last positions. However, the original softmax pointer (Vinyals et al., 2015) only allows an object to point to one subject, and thus fails to extract multiple triples with overlapping objects. To address this issue, we propose a binary pointer, which independently computes a binary score for each subject to form a relational triple with the current object under each relation type. Our method naturally solves the overlapping triple problem by producing multiple positive scores at one step (Figure 2). We formulate the score of the triple (w_j, r, w_i) as:

    s^{(r)}_{ji} = σ(r^T ρ(W^{ptr} [h^E_j; h^D_i] + b^{ptr})),        (4)

and extract this candidate triple as a relational triple if s^{(r)}_{ji} is higher than some threshold, such as 0.5 in our model (i, j = 1, ..., n). σ and ρ are the sigmoid and tanh functions, respectively. r ∈ R^{d_R} is the type embedding of the relation r. W^{ptr} and b^{ptr} are the parameters of the binary pointer.
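The decoding step of Equations (2)-(4) can be sketched as follows: the span-based tagger produces start and end tags whose embeddings are averaged into e_i, one LSTM-cell step treats the current word as the object, and an independent sigmoid scores every (subject word, relation type) pair, so several subjects can receive positive scores at the same step. Module names, the tag-set size and the exact fusion of start and end tag embeddings are illustrative assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class BinaryPointerDecoder(nn.Module):
    """Sketch of Eqs. 2-4: entity tagging, one decoder step with word i as the
    object, and an independent binary score for every (subject, relation) pair."""
    def __init__(self, d_in=300, d_E=200, d_T=200, d_e=50, d_R=200, n_tags=9, n_rel=24):
        super().__init__()
        self.tagger = nn.LSTM(d_in, d_T // 2, bidirectional=True, batch_first=True)
        self.start_clf = nn.Linear(d_T, n_tags)          # W^s, b^s
        self.end_clf = nn.Linear(d_T, n_tags)            # W^e, b^e
        self.tag_emb = nn.Embedding(n_tags, d_e)
        # decoder input is [x_i; e_i; g]; state size 2*d_E matches h^D in R^{2 d_E}
        self.decoder = nn.LSTMCell(d_in + d_e + 2 * d_E, 2 * d_E)
        self.rel_emb = nn.Embedding(n_rel, d_R)          # relation type embeddings r
        self.ptr = nn.Linear(4 * d_E, d_R)               # W^ptr, b^ptr in Eq. 4

    def tag_entities(self, x):
        h_T, _ = self.tagger(x)                                    # Eq. 2
        start = self.start_clf(h_T).argmax(-1)
        end = self.end_clf(h_T).argmax(-1)
        return (self.tag_emb(start) + self.tag_emb(end)) / 2       # averaged tag embedding e_i

    def step(self, x_i, e_i, g, h_E, state):
        """One time step with word i as the object; state is initialized from h^E_n."""
        h_D, c_D = self.decoder(torch.cat([x_i, e_i, g], dim=-1), state)   # Eq. 3
        n = h_E.size(1)
        pair = torch.cat([h_E, h_D.unsqueeze(1).expand(-1, n, -1)], dim=-1)
        hidden = torch.tanh(self.ptr(pair))               # rho(W^ptr [h^E_j; h^D_i] + b^ptr)
        # Eq. 4: one sigmoid score per (subject j, relation r); scores > 0.5 become triples
        scores = torch.sigmoid(hidden @ self.rel_emb.weight.t())   # (batch, n, n_rel)
        return scores, (h_D, c_D)
```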
3.1.3 External Memory

We introduce an external memory M to keep the previously extracted triples of the sentence. We first initialize M as an empty set. After the decoder's extraction process at the i-th time step, we represent an extracted triple t = (w_{s_t}, r_t, w_i) as:

    h^M_t = [h^E_{s_t}; r_t; h^E_i] ∈ R^{d_M}.        (5)

Then we update the memory with the representations of the output triples t_1, ..., t_{N_i}:

    M ← [M; h^M_{t_1}; ...; h^M_{t_{N_i}}] ∈ R^{N × d_M},        (6)

where N_i is the number of triples extracted at the current step and N = Σ_{k=1}^{i} N_k. Note that we set and update the external memory for each sentence independently, and the memory stores only the triple representations of one single sentence. Thus triples of other sentences will not be introduced into the sentence currently being extracted. Finally, the triples in the memory are utilized to obtain the reasoning pattern-enhanced representations for future time steps, as described in Section 3.2.

3.2 Relation Network for Capturing Patterns of Relational Reasoning

Relation types of implicit relational triples are difficult to infer due to the lack of explicit evidence, and thus need to be derived with real-world relational reasoning patterns. For example, in the sentence "George is Judy's father and David's grandfather", the relation type between "Judy" and "David" can be inferred as "father" using the pattern "father's father is called grandfather".

Based on this fact, we propose to enhance our model by introducing real-world relational reasoning patterns. We capture the patterns with a Relation Network (RN) (Santoro et al., 2017), a neural network module specially designed for relational reasoning. A RN is essentially a composite function over a relational triple set T: RN(T) = f_φ({g_θ(t)}_{t∈T}), where f_φ is an aggregation function and g_θ projects a triple into a fixed-size embedding. We set the memory M as the input relational triple set T and utilize the RN to learn a pattern-enhanced representation h^P_{ji} for the word pair (w_j, w_i) at the i-th time step. First, g_θ reads the triple representations from M and projects them with a fully-connected layer:

    g_θ(t) = ρ(W^θ h^M_t + b^θ) ∈ R^{d_P}.        (7)

Then f_φ selects useful triples with a gating network [1]: u^φ_t = σ(g_θ(t)^T U^φ [h^E_j; h^D_i]) ∈ R, and aggregates the selected triples with the word pair to compute h^P_{ji} using another fully-connected layer:

    h^P_{ji} = f_φ({g_θ(t)}_{t ∈ M})
             = ρ(W^φ [h^E_j; h^D_i; Σ_t u^φ_t · g_θ(t)] + b^φ).        (8)

Finally, we modify Equation 4 as s^{(r)}_{ji} = σ(r^T h^P_{ji}) to compute the binary scores of candidate triples. We denote our Reasoning pattern enhanced BPtrNet model as R-BPtrNet. Note that we use quite simple formulas for f_φ and g_θ because our contribution focuses on the effectiveness of introducing relational reasoning patterns for this task rather than on the model structure. Exploration of more complex structures is left for future work.

[1] We don't use the more common attention mechanism (Bahdanau et al., 2015) to select triples because attention weights are restricted to sum to 1. If all triples in the memory are useless, they will still be assigned a large weight due to this restriction, which will confuse the model.
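A minimal sketch of the external memory and the relation network of Equations (5)-(8) is shown below, assuming the memory is simply a Python list of per-triple vectors for the current sentence. The gate follows u^φ_t = σ(g_θ(t)^T U^φ [h^E_j; h^D_i]); how an empty memory is handled and the presence of bias terms are our assumptions rather than details given in the paper.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Sketch of Eqs. 5-8: the memory stores one vector per extracted triple of the
    current sentence; g_theta projects each stored triple, a gate u selects useful
    ones, and f_phi fuses them with the candidate word pair."""
    def __init__(self, d_E=200, d_R=200, d_P=500):
        super().__init__()
        d_M = 2 * d_E + d_R + 2 * d_E                 # [h^E_s; r; h^E_i], Eq. 5
        self.g_theta = nn.Linear(d_M, d_P)            # Eq. 7
        self.gate = nn.Linear(4 * d_E, d_P)           # U^phi in the gate u^phi_t
        self.f_phi = nn.Linear(4 * d_E + d_P, d_R)    # Eq. 8

    def write(self, memory, h_E_subj, rel_emb, h_E_obj):
        """Eqs. 5-6: append the representation of a newly extracted triple."""
        memory.append(torch.cat([h_E_subj, rel_emb, h_E_obj], dim=-1))
        return memory

    def forward(self, memory, h_E_j, h_D_i):
        """Return the pattern-enhanced representation h^P_ji for word pair (j, i)."""
        pair = torch.cat([h_E_j, h_D_i], dim=-1)               # [h^E_j; h^D_i]
        if len(memory) == 0:
            pooled = torch.zeros(pair.size(0), self.g_theta.out_features, device=pair.device)
        else:
            M = torch.stack(memory, dim=1)                     # (batch, N, d_M)
            g = torch.tanh(self.g_theta(M))                    # Eq. 7
            # gate u^phi_t = sigmoid(g_theta(t)^T U^phi [h^E_j; h^D_i])
            u = torch.sigmoid((g * self.gate(pair).unsqueeze(1)).sum(-1, keepdim=True))
            pooled = (u * g).sum(dim=1)                        # sum_t u_t * g_theta(t)
        return torch.tanh(self.f_phi(torch.cat([pair, pooled], dim=-1)))  # Eq. 8
```

Because h^P_{ji} has the same dimension as the relation type embeddings in this sketch, the modified score σ(r^T h^P_{ji}) can be computed directly as a dot product with each relation embedding.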
3.3 Training and Inference

We calculate the triple loss of a sentence as a binary cross entropy over the valid candidate triples T_v, whose subject and object are different entities (or the end words of different entities):

    L_t = − (1 / |T_v|) Σ_{t ∈ T_v} [ y_t log s_t + (1 − y_t) log(1 − s_t) ],        (9)

where s_t is the score of the candidate triple t, and y_t = 1 for gold triples and 0 for others. We also train the entity tagger with a cross-entropy loss:

    L_e = − (1 / n) Σ_{i=1}^{n} Σ_{* ∈ {s,e}} log p(y^*_i = ŷ^*_i),        (10)

where ŷ^{s/e}_i are the gold start and end tags of the i-th word, respectively. Finally, we train the R-BPtrNet with the joint loss L = L_t + L_e.

To prevent error propagation, we use the gold entity tags to select the valid candidate triples and compute the tag embeddings e_{1:n} during training. We also update the memory M with the gold relational triples. During inference, we extract triples from scratch and use the predicted entity tags and relational triples instead of the gold ones.
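The joint objective L = L_t + L_e of Equations (9) and (10) amounts to a masked binary cross entropy over the candidate triples plus a token-level cross entropy for the start and end tags. The sketch below assumes the binary scores are arranged in a dense (batch, subject, object, relation) tensor with a 0/1 validity mask; this layout is an implementation assumption, not something prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def joint_loss(triple_scores, triple_labels, valid_mask,
               tag_logits_s, tag_logits_e, gold_s, gold_e):
    """Sketch of L = L_t + L_e (Eqs. 9-10).
    triple_scores/triple_labels/valid_mask: float tensors of shape
    (batch, n, n, n_rel) holding sigmoid scores, 0/1 labels and the validity mask
    over (subject, object, relation) candidates."""
    # Eq. 9: binary cross entropy averaged over the valid candidate triples T_v
    bce = F.binary_cross_entropy(triple_scores, triple_labels, reduction="none")
    L_t = (bce * valid_mask).sum() / valid_mask.sum().clamp(min=1)

    # Eq. 10: cross entropy of the start/end entity tags, averaged over words
    L_e = F.cross_entropy(tag_logits_s.flatten(0, 1), gold_s.flatten()) + \
          F.cross_entropy(tag_logits_e.flatten(0, 1), gold_e.flatten())

    return L_t + L_e
```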
4 Experiments

4.1 Datasets and Evaluation Metrics

We evaluate our method on two benchmark datasets. NYT (Riedel et al., 2010) consists of sentences from the New York Times corpus and contains 24 relation types. WebNLG (Gardent et al., 2017) was created for the natural language generation task. It contains 171 relation types [2] and was adopted for relational triple extraction by Zeng et al. (2018). We split the sentences into three categories, Normal, SingleEntityOverlap (SEO) and EntityPairOverlap (EPO), following Zeng et al. (2018). The statistics of the two datasets are shown in Table 1. Following previous work (Zeng et al., 2018; Wei et al., 2020; Wang et al., 2020), an extracted relational triple is regarded as correct only if the relation and the heads of both subject and object are all correct. We report the standard micro precision, recall, and F1 scores on both datasets.

[2] As mentioned in (Wang et al., 2020), this number is miswritten as 246 in (Wei et al., 2020) and (Yu et al., 2019). Here we quote the correct number from (Wang et al., 2020).

Table 1: Statistics of evaluation datasets.

    Dataset | NYT Train | NYT Test | WebNLG Train | WebNLG Test
    Normal  | 37013     | 3266     | 1596         | 246
    SEO     | 9782      | 1297     | 227          | 457
    EPO     | 14735     | 978      | 3406         | 26
    ALL     | 56195     | 5000     | 5019         | 703

4.2 Experimental Settings

We determine the hyper-parameters on the validation sets. We use the pre-trained GloVe (Pennington et al., 2014) embeddings as w. We adopt a one-layer CNN with d_c = 30 channels to learn c from 30-dimensional randomly-initialized character embeddings. We choose the state-of-the-art RoBERTa_LARGE (Liu et al., 2019) model [3] as the pre-trained LM. For a fair comparison with previous methods, we also conduct experiments and report the scores with BERT_BASE (Devlin et al., 2019). We set d_in (Equation 1) as 300. The hidden dimensions of the encoder d_E and the entity tagger d_T are both 200. The dimensions of the entity tag embeddings d_e and the relation type embeddings d_R are set as 50 and 200, respectively. The projection dimension d_P of the RN is set as 500. We add 10% dropout (Srivastava et al., 2014) on the input of all LSTMs for regularization. Following previous work (Zeng et al., 2018; Wei et al., 2020; Wang et al., 2020), we set the max length of input sentences to 100. We use the Adam optimizer (Kingma and Ba, 2014) to fine-tune the LM and train the other parameters with learning rates of 10^-5 and 10^-3, respectively. We train our model for 30/90 epochs with a batch size of 32/8 on NYT/WebNLG. At the beginning of the last 10 epochs, we load the parameters with the best validation performance and divide the learning rates by ten. Finally, we choose the best model on the validation set and output results on the test set.

[3] https://github.com/huggingface/transformers
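For completeness, the partial-match metric described in Section 4.1 can be computed as in the sketch below: a predicted triple counts as correct only if its relation and the head words of both subject and object match a gold triple, and precision, recall and F1 are micro-averaged over all sentences. Representing triples as (subject head, relation, object head) tuples is an assumption for illustration.

```python
def micro_prf(pred_triples, gold_triples):
    """Micro precision/recall/F1 over (subject_head, relation, object_head) tuples,
    one collection of triples per sentence."""
    n_pred = n_gold = n_correct = 0
    for pred, gold in zip(pred_triples, gold_triples):
        pred, gold = set(pred), set(gold)
        n_pred += len(pred)
        n_gold += len(gold)
        n_correct += len(pred & gold)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```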
4.3 Performance Evaluation

We present our results on the NYT and WebNLG test sets in Table 2 and compare them with several previous state-of-the-art models:
• NovelTagging (Zheng et al., 2017) transformed this task into a sequence tagging problem but neglected overlapping triples.
• CopyRE (Zeng et al., 2018) proposed a seq2seq model based on the copy mechanism to generate triple elements as sequences.
• CopyRE_RL (Zeng et al., 2019) proposed to learn the extraction order of CopyRE with Reinforcement Learning (RL).
• GraphRel (Fu et al., 2019) proposed a graph convolutional network for this task.
• ETL-Span (Yu et al., 2019) proposed a decomposition-based tagging scheme.
• CopyMTL (Zeng et al., 2020) proposed a Multi-Task Learning (MTL) framework based on CopyRE to address multi-token entities.
• WDec (Nayak and Ng, 2020) proposed an encoder-decoder architecture for this task.
• CGT_UniLM (Ye et al., 2020) proposed a generative transformer module with a triple contrastive training objective.
• CasRel (Wei et al., 2020) proposed a cascade binary tagging framework.
• TPLinker (Wang et al., 2020) proposed a one-stage token pair linking model with a novel handshaking tagging scheme.

Table 2: Performance of our method and previous state-of-the-art models on the NYT and WebNLG test sets. R-BPtrNet is our model without pre-trained LMs. R-BPtrNet_BERT and R-BPtrNet_RoBERTa are models using BERT_BASE and RoBERTa_LARGE, respectively.

    Method                            | NYT Prec. | NYT Rec. | NYT F1 | WebNLG Prec. | WebNLG Rec. | WebNLG F1
    NovelTagging (Zheng et al., 2017) | 62.4 | 31.7 | 42.0 | 52.5 | 19.3 | 28.3
    CopyRE (Zeng et al., 2018)        | 72.8 | 69.4 | 71.1 | 60.9 | 61.1 | 61.0
    CopyRE_RL (Zeng et al., 2019)     | 77.9 | 67.2 | 72.1 | 63.3 | 59.9 | 61.6
    GraphRel (Fu et al., 2019)        | 63.9 | 60.0 | 61.9 | 44.7 | 41.1 | 42.9
    ETL-Span (Yu et al., 2019)        | 84.9 | 72.3 | 78.1 | 84.0 | 91.5 | 87.6
    CopyMTL (Zeng et al., 2020)       | 75.7 | 68.7 | 72.0 | 58.0 | 54.9 | 56.4
    WDec (Nayak and Ng, 2020)         | 94.5 | 76.2 | 84.4 | -    | -    | -
    CGT_UniLM (Ye et al., 2020)       | 94.7 | 84.2 | 89.1 | 92.9 | 75.6 | 83.4
    CasRel_LSTM (Wei et al., 2020)    | 84.2 | 83.0 | 83.6 | 86.9 | 80.6 | 83.7
    CasRel_BERT (Wei et al., 2020)    | 89.7 | 89.5 | 89.6 | 93.4 | 90.1 | 91.7
    TPLinker_LSTM (Wang et al., 2020) | 83.8 | 83.4 | 83.6 | 90.8 | 90.3 | 90.5
    TPLinker_BERT (Wang et al., 2020) | 91.3 | 92.5 | 91.9 | 91.8 | 92.0 | 91.9
    R-BPtrNet                         | 90.9 | 91.3 | 91.1 | 90.7 | 94.6 | 92.6
    R-BPtrNet_BERT                    | 92.7 | 92.5 | 92.6 | 93.7 | 92.8 | 93.3
    R-BPtrNet_RoBERTa                 | 94.0 | 92.9 | 93.5 | 94.3 | 93.3 | 93.8

From Table 2 we have the following observations: (1) The R-BPtrNet significantly outperforms all previous non-LM methods. It demonstrates the superiority of our seq2seq-based framework to jointly extract explicit and implicit relational triples and improve the performance on this task. Additionally, the R-BPtrNet achieves competitive performance against the BERT-based baseline models without using BERT. It shows that the improvements of our model come not primarily from the pre-trained LM representations, but from the introduction of relational reasoning patterns to this task. (2) R-BPtrNet_BERT outperforms the BERT-based baseline models. It indicates that our method can effectively extract implicit relational triples with the assistance of the triple-retaining external memory and the pattern-capturing RN. (3) R-BPtrNet_RoBERTa further outperforms R-BPtrNet_BERT and the other baseline methods. It indicates that the more powerful LM brings more prior knowledge and real-world relational facts, enhancing the model's ability to learn real-world relational reasoning patterns.

4.4 Performance on Different Sentence Types

To demonstrate the ability of our model in handling the multiple triples and overlapping triples of a sentence, we split the test sets of the NYT and WebNLG datasets according to the overlapping patterns and the number of triples. We conduct further experiments on these subsets and report the results in Table 3.

Table 3: F1 scores on sentences with different overlapping patterns and different triple numbers. † marks scores reproduced by (Wang et al., 2020).

    NYT:
    Method            | Nor. | SEO  | EPO  | N=1  | N=2  | N=3  | N=4  | N≥5
    CopyRE            | 66.0 | 48.6 | 55.0 | 67.1 | 58.6 | 52.0 | 53.6 | 30.0
    CopyRE_RL         | 71.2 | 69.4 | 72.8 | 71.7 | 72.6 | 72.5 | 77.9 | 45.9
    GraphRel          | 69.6 | 51.2 | 58.2 | 71.0 | 61.5 | 57.4 | 55.1 | 41.1
    ETL-Span†         | 88.5 | 87.6 | 60.3 | 85.5 | 82.1 | 74.7 | 75.6 | 76.9
    CasRel_BERT       | 87.3 | 91.4 | 92.0 | 88.2 | 90.3 | 91.9 | 94.2 | 83.7
    TPLinker_BERT     | 90.1 | 93.4 | 94.0 | 90.0 | 92.9 | 93.1 | 96.1 | 90.0
    R-BPtrNet_BERT    | 90.4 | 94.4 | 95.2 | 89.5 | 93.1 | 93.5 | 96.7 | 91.3
    R-BPtrNet_RoBERTa | 91.2 | 95.3 | 96.1 | 90.5 | 93.6 | 94.2 | 97.7 | 92.1

    WebNLG:
    Method            | Nor. | SEO  | EPO  | N=1  | N=2  | N=3  | N=4  | N≥5
    CopyRE            | 59.2 | 33.0 | 36.6 | 59.2 | 42.5 | 31.7 | 24.2 | 30.0
    CopyRE_RL         | 65.4 | 60.1 | 67.4 | 63.4 | 62.2 | 64.4 | 57.2 | 55.7
    GraphRel          | 65.8 | 38.3 | 40.6 | 66.0 | 48.3 | 37.0 | 32.1 | 32.1
    ETL-Span†         | 87.3 | 91.5 | 80.5 | 82.1 | 86.5 | 91.4 | 89.5 | 91.1
    CasRel_BERT       | 89.4 | 92.2 | 94.7 | 89.3 | 90.8 | 94.2 | 92.4 | 90.9
    TPLinker_BERT     | 87.9 | 92.5 | 95.3 | 88.0 | 90.1 | 94.6 | 93.3 | 91.6
    R-BPtrNet_BERT    | 89.5 | 93.9 | 96.1 | 88.5 | 91.4 | 96.2 | 94.9 | 94.2
    R-BPtrNet_RoBERTa | 89.9 | 94.4 | 97.4 | 89.3 | 91.7 | 96.5 | 95.8 | 94.8

From Table 3 we can observe that: (1) The R-BPtrNet_RoBERTa and R-BPtrNet_BERT both significantly outperform previous models on the SEO and EPO subsets of the NYT and WebNLG datasets. It proves the validity of our method to address the overlapping triple problem. Moreover, we find that implicit relational triples usually overlap with others. Therefore, the improvements on the overlapping subsets also validate the effectiveness of our method for extracting implicit relational triples. (2) R-BPtrNet_RoBERTa and R-BPtrNet_BERT both bring improvements on sentences with multiple triples compared to the baseline models. It indicates that our method can effectively extract multiple relational triples from a sentence. Furthermore, we observe more significant improvements when the number of triples grows. We hypothesize that this is because implicit relational triples are more likely to occur in sentences with more triples. Our model extracts the implicit relational triples more correctly and improves the performance.

4.5 Ablation Study on Implicit Triples

We run an ablation study to investigate the contribution of each component in our model to the implicit relational triples. We manually select 134 sentences with rich implicit triples from the NYT test set [4].

[4] We first select sentences that contain at least two overlapping relational triples. For example, if a sentence contains entities A, B and C, and if A→B and B→C exist or A→B and A→C exist, this sentence is selected. Note that A→B and B→A are counted as one triple during this selecting procedure. Then, we manually check all selected sentences and keep the ones with implicit relational triples.

[Figure 3: An ablation study on a manually selected subset with rich implicit relational triples.]
We conduct experiments on the subset using the following ablation options:
• R-BPtrNet_RoBERTa and R-BPtrNet_BERT are the full models using RoBERTa_LARGE and BERT_BASE as LMs, respectively.
• R-BPtrNet removes the pre-trained LM representations from the full model.
• BPtrNet removes the RN from the R-BPtrNet. Under this setting, we feed a gated summation of the memory into the decoder's input of the next time step.
• BPtrNet_NoMem removes the external memory from the BPtrNet, which means that the previously extracted triples are not retained.

We compare the performance of these options with the previous BERT-based models. We also analyze the performance on predicting only the entity pairs and the relations, respectively. We illustrate the results in Figure 3, from which we can observe that: (1) BPtrNet_NoMem produces comparable results to the baseline models. We speculate that it benefits from the seq2seq structure, in which previously extracted triples are embedded into the decoder's hidden states. (2) BPtrNet brings huge improvements over BPtrNet_NoMem on the entity pair and the triple F1 scores. It indicates that the external memory effectively helps discover entity pairs that have implicit relational connections by retaining previously extracted triples. (3) R-BPtrNet brings significant improvements over the BPtrNet on the relation and the triple F1 scores. It indicates that the RN effectively captures the relational reasoning patterns and enhances the relation type inference of implicit relations. (4) The pre-trained LMs only bring minor improvements. It proves that the effectiveness of our model comes primarily from the external memory and the introduction of relational reasoning patterns rather than from the pre-trained LMs.

4.6 Case Study

[Figure 4: Examples of sentences with implicit relational triples, and the predictions from the TPLinker_BERT and R-BPtrNet_BERT models. Bold and colored texts are entities; different entities are distinguished with different colors. Explicit and implicit relational triples are represented by black and green solid arrows, respectively. Blue dashed arrows indicate explicit relational connections between entities, but they do not appear as relational triples because their relation types don't belong to the pre-defined relations of the dataset.]

Figure 4 shows the comparison of the best previous model TPLinker_BERT and our R-BPtrNet_BERT model on three example sentences from the implicit subset in Section 4.5. The first example contains the transitive pattern of the relation "contains". The second example contains a multi-hop relation path pattern between "Chad Hurley" and "Google" through the intermediate entity "YouTube". The third example contains a composite pattern between the siblings "Crowley" and "Edwin Edwards" with a common ancestor "Edmund". We can observe that the TPLinker_BERT model fails to extract the implicit relational triples. The R-BPtrNet_BERT successfully captures various reasoning patterns in the real world and effectively extracts all the implicit relational triples in the examples.
5 Conclusion

In this paper, we propose a unified framework to extract explicit and implicit relational triples jointly. To discover entity pairs that may have implicit relational connections, we propose a binary pointer network to extract relational triples relevant to each word sequentially and introduce an external memory to retain the extracted triples. To derive the relation types of the implicitly connected entity pairs, we propose to introduce real-world relational reasoning patterns to this task and capture the reasoning patterns with a relation network. We conduct experiments on two benchmark datasets, and the results prove the effectiveness of our method.

Acknowledgement

We would like to thank the anonymous reviewers for their constructive comments on this paper. This work was supported by the National Natural Science Foundation of China under Grant numbers U1936208, U1936216, and 61862002.

References

Gabor Angeli and Christopher D. Manning. 2013. Philosophers are mortal: Inferring the truth of unseen facts. In CoNLL, pages 133–142.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.

Yee Seng Chan and Dan Roth. 2011. Exploiting syntactico-semantic structures for relation extraction. In ACL-HLT, pages 551–560.

Dai Dai, Xinyan Xiao, Yajuan Lyu, Shan Dou, Qiaoqiao She, and Haifeng Wang. 2019. Joint extraction of entities and overlapping relations using position-attentive sequence labeling. In AAAI, volume 33, pages 6300–6308.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.

Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In KDD, pages 601–610.

Tsu-Jui Fu, Peng-Hsuan Li, and Wei-Yun Ma. 2019. GraphRel: Modeling text as relational graphs for joint entity and relation extraction. In ACL, pages 1409–1418.

Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating training corpora for NLG micro-planning. In ACL.

Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451–2471.

Matthew R. Gormley, Mo Yu, and Mark Dredze. 2015. Improved relation extraction with feature-rich compositional embedding models. In EMNLP, pages 1774–1784.

Pankaj Gupta, Hinrich Schütze, and Bernt Andrassy. 2016. Table filling multi-task recurrent neural network for joint entity and relation extraction. In COLING, pages 2537–2547.

Ningning Jia, Xiang Cheng, and Sen Su. 2020. Improving knowledge graph embedding using locally and globally attentive relation paths. In European Conference on Information Retrieval, pages 17–32. Springer.

Kuang Jun, Cao Yixin, Zheng Jianbing, He Xiangnan, Gao Ming, and Zhou Aoying. 2020. Improving neural relation extraction with implicit mutual relations. In ICDE, pages 1021–1032.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551.

Qi Li and Heng Ji. 2014. Incremental joint extraction of entity mentions and relations. In ACL, pages 402–412.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In ACL, pages 1105–1116.

Tapas Nayak and Hwee Tou Ng. 2020. Effective modeling of encoder-decoder architecture for joint entity and relation extraction. In AAAI, volume 34, pages 8528–8535.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP, pages 1532–1543.

Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, and Jiawei Han. 2017. CoType: Joint extraction of typed entities and relations with knowledge bases. In WWW, pages 1015–1024.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 148–163. Springer.

Adam Santoro, David Raposo, David G. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. 2017. A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems, pages 4967–4976.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.

Shaolei Wang, Yue Zhang, Wanxiang Che, and Ting Liu. 2018. Joint extraction of entities and relations based on a novel graph scheme. In IJCAI, pages 4461–4467.

Yucheng Wang, Bowen Yu, Yueyang Zhang, Tingwen Liu, Hongsong Zhu, and Limin Sun. 2020. TPLinker: Single-stage joint extraction of entities and relations through token pair linking. arXiv preprint arXiv:2010.13415.

Zhepei Wei, Jianlin Su, Yue Wang, Yuan Tian, and Yi Chang. 2020. A novel cascade binary tagging framework for relational triple extraction. In ACL, pages 1476–1488.

Hongbin Ye, Ningyu Zhang, Shumin Deng, Mosha Chen, Chuanqi Tan, Fei Huang, and Huajun Chen. 2020. Contrastive triple extraction with generative transformer. arXiv preprint arXiv:2009.06207.

Bowen Yu, Zhenyu Zhang, and Jianlin Su. 2019. Joint extraction of entities and relations based on a novel decomposition strategy. arXiv preprint arXiv:1909.04273.

Xiaofeng Yu and Wai Lam. 2010. Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach. In COLING 2010: Posters, pages 1399–1407.

Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, 3(Feb):1083–1106.

Daojian Zeng, Haoran Zhang, and Qianying Liu. 2020. CopyMTL: Copy mechanism for joint extraction of entities and relations with multi-task learning. In AAAI, pages 9507–9514.

Xiangrong Zeng, Shizhu He, Daojian Zeng, Kang Liu, Shengping Liu, and Jun Zhao. 2019. Learning the extraction order of multiple relational facts in a sentence with reinforcement learning. In EMNLP-IJCNLP, pages 367–377.

Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, and Jun Zhao. 2018. Extracting relational facts by an end-to-end neural model with copy mechanism. In ACL, pages 506–514.

Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. 2017. Joint extraction of entities and relations based on a novel tagging scheme. In ACL, pages 1227–1236.

GuoDong Zhou, Jian Su, Jie Zhang, and Min Zhang. 2005. Exploring various knowledge in relation extraction. In ACL, pages 427–434.

Hao Zhu, Yankai Lin, Zhiyuan Liu, Jie Fu, Tat-Seng Chua, and Maosong Sun. 2019. Graph neural networks with generated parameters for relation extraction. In ACL, pages 1331–1339.