Hierarchical Graph Convolutional Networks for Jointly Resolving Cross-document Coreference of Entity and Event Mentions
Hierarchical Graph Convolutional Networks for Jointly Resolving Cross-document Coreference of Entity and Event Mentions

Duy Phung (VinAI Research, Vietnam), Tuan Ngo Nguyen and Thien Huu Nguyen (Department of Computer and Information Science, University of Oregon, Eugene, Oregon, USA)
v.duypv1@vinai.io, {tnguyen,thien}@cs.uoregon.edu

Abstract

This paper studies the problem of cross-document event coreference resolution (CDECR) that seeks to determine if event mentions across multiple documents refer to the same real-world events. Prior work has demonstrated the benefits of predicate-argument information and document context for resolving the coreference of event mentions. However, such information has not been captured effectively in prior work for CDECR. To address these limitations, we propose a novel deep learning model for CDECR that introduces hierarchical graph convolutional neural networks (GCN) to jointly resolve entity and event mentions. As such, sentence-level GCNs enable the encoding of important context words for event mentions and their arguments while the document-level GCN leverages the interaction structures of event mentions and arguments to compute document representations to perform CDECR. Extensive experiments are conducted to demonstrate the effectiveness of the proposed model.

1 Introduction

Event coreference resolution (ECR) aims to cluster event-triggering expressions in text such that all event mentions in a group refer to the same unique event in the real world. We are interested in cross-document ECR (CDECR) where event mentions might appear in the same or different documents.
For instance, consider the following sentences (event mentions) S1 and S2 that involve "leaving" and "left" (respectively) as event trigger words (i.e., the predicates):

S1: O'Brien was forced into the drastic step of leaving the 76ers.
S2: Jim O'Brien left the 76ers after one season as coach.

A CDECR system in this case would need to recognize that both event mentions in S1 and S2 refer to the same event.

A major challenge in CDECR involves the necessity to model entity mentions (e.g., "Jim O'Brien") that participate in events and reveal their spatio-temporal information (Yang et al., 2015) (called event arguments). In particular, as event mentions might be presented in different sentences/documents, an important piece of evidence for predicting the coreference of two event mentions is to realize that the two event mentions have the same participants in the real world and/or occur at the same location and time (i.e., same arguments).

Motivated by this intuition, prior work for CDECR has attempted to jointly resolve cross-document coreference for entities and events so the two tasks can mutually benefit from each other (iterative clustering) (Lee et al., 2012). In fact, this iterative and joint modeling approach has recently led to the state-of-the-art performance for CDECR (Barhom et al., 2019; Meged et al., 2020).

Our model for CDECR follows this joint coreference resolution method; however, we advance it by introducing novel techniques to address two major limitations of previous work (Yang et al., 2015; Kenyon-Dean et al., 2018; Barhom et al., 2019), i.e., the inadequate mechanisms to capture the argument-related information for representing event mentions and the use of only lexical features to represent input documents.

Regarding the first limitation with the event argument-related evidence, existing methods for CDECR have mainly captured the direct information of event arguments for event mention representations, thus failing to explicitly encode other important context words in the sentences that reveal the fine-grained nature of relations between arguments and triggers for ECR (Yang et al., 2015; Barhom et al., 2019). For instance, consider the coreference prediction between the event mentions in S1 and the following sentence S3 (with "leave" as the event trigger word):

S3: The baseball coach Jim O'Brien decided to leave the club on Thursday.
Figure 1: The pruned dependency tree for the event mention in S1. The trigger is red while argument heads are blue.

The arguments for the event mention in S3 involve "The baseball coach Jim O'Brien", "the club", and "Thursday". If an entity coreference resolution system considers the entity mention pairs ("O'Brien" in S1, "The baseball coach Jim O'Brien" in S3) and ("the 76ers" in S1 and "the club" in S3) as being coreferred, a CDECR system that only concerns event arguments and their coreference information would incorrectly predict the coreference between the event mentions in S1 and S3 in this case. However, if the CDECR system further models the important words for the relations between event triggers and arguments (i.e., the words "was forced" in S1 and "decided" in S3), it can realize the unwillingness of the subject for the position-ending event in S1 and the self intent to leave the position for the event in S3. As such, this difference can help the system to reject the event coreference for S1 and S3.

To this end, we propose to explicitly identify and capture important context words for event triggers and arguments in representation learning for CDECR. In particular, our motivation is based on the shortest dependency paths between event triggers and arguments that have been used to reveal important context words for their relations (Li et al., 2013; Sha et al., 2018; Veyseh et al., 2020a, 2021). As an example, Figure 1 shows the dependency tree of S1 where the shortest dependency path between "O'Brien" and "leaving" can successfully include the important context word "forced". As such, for each event mention, we leverage the shortest dependency paths to build a pruned and argument-customized dependency tree to simultaneously contain event triggers, arguments and the important words in a single structure. Afterward, the structure will be exploited to learn richer representation vectors for CDECR.

Second, for document representations, previous work on CDECR has proved that input documents also provide useful context information for event mentions (e.g., document topics) to enhance the clustering performance (Kenyon-Dean et al., 2018). However, the document information is only captured via lexical features in prior work, e.g., TF-IDF vectors (Kenyon-Dean et al., 2018; Barhom et al., 2019), leading to poor generalization to unseen words/tokens and an inability to encapsulate latent semantic information for CDECR. To this end, we propose to learn distributed representation vectors for input documents to enrich event mention representations and improve the generalization of the models for CDECR. In particular, as entity and event mentions are the main objects of interest for CDECR, our motivation is to focus on the context information from these objects to induce document representations for the models. To implement this idea, we propose to represent input documents via interaction graphs between their entity and event mentions, serving as the structures to generate document representation vectors.

Based on those motivations, we introduce a novel hierarchical graph convolutional neural network (GCN) that involves two levels of GCN models to learn representation vectors for the iterative and joint model for CDECR. In particular, sentence-level GCNs will consume the pruned dependency trees to obtain context-enriched representation vectors for event and entity mentions while a document-level GCN will be run over the entity-event interaction graphs, leveraging the mention representations from the sentence-level GCNs as the inputs to generate document representations for CDECR. Extensive experiments show that the proposed model achieves the state-of-the-art resolution performance for both entities and events on the ECB+ dataset. To our knowledge, this is the first work that utilizes GCNs and entity-event interaction graphs for coreference resolution.
2 Related Work

ECR is considered a more challenging task than entity coreference resolution due to the more complex structures of event mentions that require argument reasoning (Yang et al., 2015). Previous work for within-document event resolution includes pairwise classifiers (Ahn, 2006; Chen et al., 2009), spectral graph clustering methods (Chen and Ji, 2009), information propagation (Liu et al., 2014), markov logic networks (Lu et al., 2016), and deep learning (Nguyen et al., 2016).
For only cross-document event resolution, prior work has considered mention-pair classifiers for coreference that use granularities of event slots and lexical features of event mentions (Cybulska and Vossen, 2015b,a). Within- and cross-document event coreference have also been solved simultaneously in previous work (Lee et al., 2012; Bejan and Harabagiu, 2010; Adrian Bejan and Harabagiu, 2014; Yang et al., 2015; Choubey and Huang, 2017; Kenyon-Dean et al., 2018). The works most related to ours involve the joint models for entity and event coreference resolution that use contextualized word embeddings to capture the dependencies between the two tasks and lead to the state-of-the-art performance for CDECR (Lee et al., 2012; Barhom et al., 2019; Meged et al., 2020).

Finally, regarding the modeling perspective, our work is related to the models that use GCNs to learn representation vectors for different NLP tasks, e.g., event detection (Lai et al., 2020; Veyseh et al., 2019) and target opinion word extraction (Veyseh et al., 2020b), applying both sentence- and document-level graphs (Sahu et al., 2019; Tran et al., 2020; Nan et al., 2020; Tran and Nguyen, 2021; Nguyen et al., 2021). However, to our knowledge, none of the prior work has employed GCNs for ECR.

3 Model

Given a set of input documents D, the goal of CDECR is to cluster event mentions in the documents of D according to their coreference. Our model for CDECR follows (Barhom et al., 2019) and simultaneously clusters entity mentions in D to benefit from the inter-dependencies between entities and events for coreference resolution. In this section, we first describe the overall framework of our iterative method for joint entity and event coreference resolution based on (Barhom et al., 2019) (a summary of the framework is given in Algorithm 1). The novel hierarchical GCN model for inducing mention representation vectors (we use "mentions" to refer to both event and entity mentions) will be discussed afterward.

Iterative Clustering for CDECR: Following (Barhom et al., 2019), we first cluster the input document set D into different topics to improve the coreference performance (the set of document topics is called T). As we use the ECB+ dataset (Cybulska and Vossen, 2014) to evaluate the models in this work, the training phase directly utilizes the golden topics of the documents while the test phase applies the K-means algorithm for document clustering as in (Barhom et al., 2019). Afterward, given a topic t ∈ T with the corresponding document subset D_t ⊂ D, our CDECR model initializes the entity and event cluster configurations E_t^0 and V_t^0 (respectively), where E_t^0 involves within-document clusters of the entity mentions in the documents in D_t, and V_t^0 simply puts each event mention presented in D_t into its own cluster (lines 2 and 3 in Algorithm 1). In the training phase, E_t^0 is obtained from the golden within-document coreference information of the entity mentions (to reduce noise) while the within-document entity mention clusters returned by Stanford CoreNLP (Manning et al., 2014) are used for E_t^0 in the test phase, following (Barhom et al., 2019). For convenience, the sets of entity and event mentions in D_t are called M_t^E and M_t^V respectively.
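To make the test-time topic step concrete, the sketch below clusters raw documents into topics with K-means. The TF-IDF featurization, the number of topics, and the toy documents are illustrative assumptions for this sketch, not a description of the exact pipeline of (Barhom et al., 2019).

```python
# Sketch: cluster test documents into topics with K-means, as in the test phase above.
# The TF-IDF features and the chosen K are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_documents_into_topics(documents, num_topics):
    """documents: list of raw document strings; returns one topic id per document."""
    features = TfidfVectorizer(stop_words="english").fit_transform(documents)
    return KMeans(n_clusters=num_topics, random_state=0).fit_predict(features)

# Toy usage: three short documents grouped into two topics.
docs = ["O'Brien left the 76ers.", "Jim O'Brien resigned as coach.", "An earthquake hit Chile."]
print(cluster_documents_into_topics(docs, num_topics=2))
```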
Algorithm 1 Training algorithm
1: for t ∈ T do
2:   E_t^0 ← within-document clusters of entity mentions
3:   V_t^0 ← singleton event mentions in M_t^V
4:   k ← 1
5:   while ∃ meaningful cluster-pair merge do
6:     // Entities
7:     Generate entity mention representations R_E(m_i^e, V_t^{k-1}) for all m_i^e ∈ M_t^E
8:     Compute entity mention-pair coreference scores S_E(m_i^e, m_j^e)
9:     Train R_E and S_E using the gold entity mention clusters
10:    E_t^k ← agglomeratively cluster M_t^E based on S_E(m_i^e, m_j^e)
11:    // Events
12:    Generate event mention representations R_V(m_i^v, E_t^k) for all m_i^v ∈ M_t^V
13:    Compute event mention-pair coreference scores S_V(m_i^v, m_j^v)
14:    Train R_V and S_V using the gold event mention clusters
15:    V_t^k ← agglomeratively cluster M_t^V based on S_V(m_i^v, m_j^v)
16:    k ← k + 1
17:  end while
18: end for

Given the initial configurations, our iterative algorithm involves a sequence of clustering iterations, generating new cluster configurations E_t^k and V_t^k for entities and events (respectively) after each iteration. As such, each iteration k performs two independent clustering steps where entity mentions are clustered first to produce E_t^k, followed by event mention clustering to obtain V_t^k (alternating the clustering).
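The alternating loop of Algorithm 1 for one topic can be summarized as follows. This is a schematic re-statement, not the authors' implementation: rep_entity/rep_event, score_entity/score_event, train_step, and agglomerative_cluster are hypothetical callables supplied by the caller (a sketch of the agglomerative step itself follows later), and the fixed iteration count replaces the "meaningful merge" stopping condition.

```python
# Schematic of Algorithm 1 for a single topic t; all callables are assumed stand-ins.
def joint_clustering(entity_mentions, event_mentions, init_entity_clusters,
                     rep_entity, rep_event, score_entity, score_event,
                     train_step, agglomerative_cluster, max_iters=5):
    entity_clusters = init_entity_clusters                 # E_t^0: within-doc entity clusters
    event_clusters = [[m] for m in event_mentions]         # V_t^0: singleton event clusters
    for k in range(1, max_iters + 1):
        # Entities: represent conditioned on current event clusters, train, re-cluster.
        ent_reps = {m: rep_entity(m, event_clusters) for m in entity_mentions}
        train_step(score_entity, ent_reps)                 # optimize R_E and S_E (training only)
        entity_clusters = agglomerative_cluster(entity_mentions, ent_reps, score_entity)
        # Events: represent conditioned on the new entity clusters, train, re-cluster.
        evt_reps = {m: rep_event(m, entity_clusters) for m in event_mentions}
        train_step(score_event, evt_reps)                  # optimize R_V and S_V (training only)
        event_clusters = agglomerative_cluster(event_mentions, evt_reps, score_event)
    return entity_clusters, event_clusters
```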
Starting with the entity clustering at iteration k, each entity mention m_i^e is first transformed into a representation vector R_E(m_i^e, V_t^{k-1}) (line 7 in Algorithm 1) that is conditioned on not only the specific context of m_i^e but also the current event cluster configuration V_t^{k-1} (i.e., to capture the event-entity inter-dependencies). Afterward, a scoring function S_E(m_i^e, m_j^e) is used to compute the coreference probability/score for each pair of entity mentions, leveraging their mention representations R_E(m_i^e, V_t^{k-1}) and R_E(m_j^e, V_t^{k-1}) at the current iteration (R_E and S_E will be discussed in the next section) (line 8). An agglomerative clustering algorithm then utilizes these coreference scores to cluster the entity mentions, leading to a new configuration E_t^k for entity mention clusters (line 10). Given E_t^k, the same procedure is applied to cluster the event mentions for iteration k, including: (i) obtaining an event representation vector R_V(m_i^v, E_t^k) for each event mention m_i^v based on the current entity cluster configuration E_t^k, (ii) computing coreference scores for the event mention pairs via the scoring function S_V(m_i^v, m_j^v), and (iii) performing agglomerative clustering for the event mentions to produce the new configuration V_t^k for event clusters (lines 12, 13, and 15).

Note that during the training process, the parameters of the representation and scoring functions for entities R_E and S_E (or for events with R_V and S_V) are updated/optimized after the coreference scores are computed for all the entity (or event) mention pairs. This corresponds to lines 9 and 14 in Algorithm 1 (not used in the test phase). In particular, the loss function to optimize R_E and S_E in line 9 is based on the cross-entropy over every pair of entity mentions (m_i^e, m_j^e) in M_t^E:

L_t^{ent,coref} = - Σ_{m_i^e ≠ m_j^e} [ y_{ij}^{ent} log S_E(m_i^e, m_j^e) + (1 - y_{ij}^{ent}) log(1 - S_E(m_i^e, m_j^e)) ]

where y_{ij}^{ent} is a golden binary variable indicating whether m_i^e and m_j^e corefer or not (symmetrically for L_t^{env,coref} to train R_V and S_V in line 14). Also, we use the predicted configurations E_t^k and V_t^{k-1} in the mention representation computation (instead of the golden clusters as in y_{ij}^{ent} for the loss functions) during both the training and test phases for consistency (Barhom et al., 2019).

Finally, we note that the agglomerative clustering in each iteration of our model also starts with the initial clusters as in E_t^0 and V_t^0, and greedily merges the cluster pairs with the highest cluster-pair scores until all the scores are below a predefined threshold. In this way, the algorithm first focuses on high-precision merging operations and postpones less precise ones until more information is available. The cluster-pair score S_C(c_i, c_j) for two clusters c_i and c_j (mention sets) at some algorithm step is based on averaging mention-linkage coreference scores:

S_C(c_i, c_j) = 1/(|c_i| |c_j|) Σ_{m_i ∈ c_i, m_j ∈ c_j} S_*(m_i, m_j)

(Barhom et al., 2019), where * can be E or V depending on whether c_i and c_j are entity or event clusters (respectively).
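A minimal sketch of this greedy agglomerative step with the averaged cluster-pair score S_C is given below. It is a simplified variant that merges one pair at a time under an assumed threshold; pair_score stands in for the trained S_E or S_V applied to mention representations.

```python
import itertools

def cluster_pair_score(c_i, c_j, pair_score):
    """S_C(c_i, c_j): average pairwise coreference score over the two mention sets."""
    total = sum(pair_score(m_i, m_j) for m_i in c_i for m_j in c_j)
    return total / (len(c_i) * len(c_j))

def agglomerative_cluster(initial_clusters, pair_score, threshold=0.5):
    """Greedily merge the highest-scoring cluster pair until no pair exceeds the threshold."""
    clusters = [list(c) for c in initial_clusters]
    while len(clusters) > 1:
        (i, j), best = max(
            ((pair, cluster_pair_score(clusters[pair[0]], clusters[pair[1]], pair_score))
             for pair in itertools.combinations(range(len(clusters)), 2)),
            key=lambda x: x[1])
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]   # merge the best pair (i < j)
        del clusters[j]
    return clusters
```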
Mention Representations: Let m be a mention (event or entity) in a sentence W = w_1, w_2, ..., w_n of n words (w_i is the i-th word) where w_a is the head word of m. To prepare W for the mention representation computation and achieve a fair comparison with (Barhom et al., 2019), we first convert each word w_i ∈ W into a vector x_i using the ELMo embeddings (Peters et al., 2018). Here, x_i is obtained by running the pre-trained ELMo model over W and averaging the hidden vectors for w_i at the three layers in ELMo. This transforms W into a sequence of vectors X = x_1, x_2, ..., x_n for the next steps. The mention representations in our work are based on two major elements, i.e., the modeling of important context words for event triggers and arguments, and the induction of document representations.

(i) Modeling Important Context Words: A motivation for representation learning in our model is to capture event arguments and important context words (for the relations between event triggers and arguments) to enrich the event mention representations. In this work, we employ a symmetric intuition to compute a representation vector for an entity mention m, aiming to encode associated predicates (i.e., event triggers that accept m as an argument in W) and important context words (for the relations between the entity mention and associated event triggers). As such, following (Barhom et al., 2019), we first identify the attached arguments (if m is an event mention) or predicates (if m is an entity mention) in W for m using a semantic role labeling (SRL) system. In particular, we focus on four semantic roles of interest: Arg0, Arg1, Location, and Time. For convenience, let A_m = {w_{i_1}, ..., w_{i_o}} ⊂ W be the set of head words of the attached event arguments or event triggers for m in W based on the SRL system and the four roles (o is the number of head words). In particular, if m is an event mention, A_m would involve the head words of the entity mentions that fill the four semantic roles for m in W. In contrast, if m is an entity mention, A_m would capture the head words of the event triggers that take m as an argument with one of the four roles in W.
Afterward, to encode the important context words for the relations between m and the words in A_m, we employ the shortest dependency paths P_{ij} between the head word w_a of m and the words w_{i_j} ∈ A_m. As such, starting with the dependency tree G^m = {N^m, E^m} of W (N^m involves the words in W), we build a pruned tree Ĝ^m = {N̂^m, Ê^m} of G^m to explicitly focus on the mention m and its important context words in the paths P_{ij}. Concretely, the node set N̂^m of Ĝ^m contains all the words in the paths P_{ij} (N̂^m = ∪_{j=1..o} P_{ij}) while the edge set Ê^m preserves the dependency connections in G^m of the words in N̂^m (Ê^m = {(a, b) | a, b ∈ N̂^m, (a, b) ∈ E^m}). As such, the pruned tree Ĝ^m helps to gather the relevant context words for the coreference resolution of m and organize them into a single dependency graph, serving as a rich structure to learn a representation vector for m (e.g., in Figure 1). If A_m is empty, the pruned tree Ĝ^m only contains the head word of m.

(ii) Graph Convolutional Networks: The context words and structure in Ĝ^m suggest the use of Graph Convolutional Networks (GCN) (Kipf and Welling, 2017; Nguyen and Grishman, 2018) to learn the representation vector for m (called the sentence-level GCN). In particular, the GCN model in this work involves several layers (i.e., L layers in our case) to compute the representation vectors for the nodes in the graph Ĝ^m at different abstraction levels. The input vector h_v^0 for the node v ∈ N̂^m for the GCN is set to the corresponding ELMo-based vector in X. After L layers, we obtain the hidden vectors h_v^L (in the last layer) for the nodes v ∈ N̂^m. We call sent(m) = [h_{v_m}^L, max_pool(h_v^L | v ∈ N̂^m)] the sentence-level GCN vector that will be used later to represent m (v_m is the node corresponding to the head word of m in N̂^m).
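The pruned tree Ĝ^m can be obtained by taking the union of the shortest dependency paths between the mention head and each head word in A_m. The snippet below is a small sketch using networkx, treating the dependency tree as an undirected graph over token indices; the dependency parser and SRL output are assumed to be available upstream, and the toy edges only loosely mirror Figure 1.

```python
import networkx as nx

def build_pruned_tree(dependency_edges, head_index, anchor_indices):
    """dependency_edges: list of (head, dependent) token-index pairs for the sentence.
    head_index: index of the mention head word w_a.
    anchor_indices: A_m, head indices of the attached arguments (or predicates).
    Returns the pruned tree G_hat^m as a networkx graph."""
    tree = nx.Graph(dependency_edges)              # G^m, undirected for path finding
    kept_nodes = {head_index}                      # if A_m is empty, keep only the head word
    for anchor in anchor_indices:
        kept_nodes.update(nx.shortest_path(tree, head_index, anchor))  # words on P_ij
    return tree.subgraph(kept_nodes).copy()        # keep the original dependency connections

# Toy example with illustrative indices (not a real parse of S1):
edges = [(2, 0), (2, 1), (2, 4), (4, 6), (6, 7)]
pruned = build_pruned_tree(edges, head_index=7, anchor_indices=[0])
print(sorted(pruned.nodes()))   # nodes on the path from the trigger head to the argument head
```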
(iii) GCN Interaction: Our discussion about the sentence-level GCN so far has been agnostic to whether m is an event or entity mention, and the straightforward approach is to apply the same GCN model for both entity and event mentions. However, this approach might limit the flexibility of the GCN model to focus on the aspects of information that are specific to each coreference resolution task (i.e., events or entities). For example, event coreference might need to weight the information from event arguments and important context words more heavily than entity coreference does (Yang et al., 2015). To this end, we propose to apply two separate sentence-level GCN models for event and entity mentions, which share the architecture but differ in their parameters, to enhance representation learning (called G^ent and G^env for entities and events respectively). In addition, to introduce a new source of training signals and promote knowledge transfer between the two GCN networks, we propose to regularize the GCN models so they produce similar representation vectors for the same input sentence W. In particular, we apply both GCN models G^ent and G^env over the full dependency tree G^m (we also tried the pruned dependency tree Ĝ^m for this regularization, but the full dependency tree G^m led to better results), using the ELMo-based vectors X for W as the input. This produces the hidden vectors h_1^{ent}, ..., h_n^{ent} and h_1^{env}, ..., h_n^{env} in the last layers of G^ent and G^env for W. Afterward, we compute two versions of representation vectors for W based on max-pooling: h^{ent} = max_pool(h_1^{ent}, ..., h_n^{ent}) and h^{env} = max_pool(h_1^{env}, ..., h_n^{env}). Finally, the mean squared difference between h^{ent} and h^{env} is introduced into the overall loss function to regularize the models: L_m^{reg} = ||h^{ent} - h^{env}||_2^2. As this loss is specific to mention m, it is computed independently for event and entity mentions and added into the corresponding training loss (i.e., lines 9 or 14 in Algorithm 1). In particular, the overall training loss for entity mention coreference resolution in line 9 of Algorithm 1 is L_t^{ent} = α^{ent} L_t^{ent,coref} + (1 - α^{ent}) Σ_{m^e ∈ M_t^E} L_{m^e}^{reg}, while that for event mention coreference resolution is L_t^{env} = α^{env} L_t^{env,coref} + (1 - α^{env}) Σ_{m^v ∈ M_t^V} L_{m^v}^{reg} (line 14). Here, α^{ent} and α^{env} are the trade-off parameters.
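A compact sketch of this interaction mechanism is given below: one minimal GCN layer, two parameter-separate copies for entities and events, and the MSE regularizer between their max-pooled sentence vectors. The single-layer depth, the vector sizes, the random inputs, and the stand-in coreference loss are assumptions for brevity, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One GCN layer: H' = ReLU(A_hat H W) with self-loops and row-normalized adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, adj, h):
        adj = adj + torch.eye(adj.size(0))           # add self-loops
        adj = adj / adj.sum(dim=1, keepdim=True)     # row-normalize
        return torch.relu(self.linear(adj @ h))

dim = 8                                              # illustrative; the model uses ELMo-sized inputs
gcn_entity, gcn_event = SimpleGCNLayer(dim), SimpleGCNLayer(dim)   # G^ent and G^env: same architecture, separate parameters

# Full-tree adjacency and ELMo-based word vectors X for one sentence (random stand-ins here).
n_words = 5
adj = torch.zeros(n_words, n_words)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0
x = torch.randn(n_words, dim)

h_ent = gcn_entity(adj, x).max(dim=0).values         # h^ent = max-pool over words
h_env = gcn_event(adj, x).max(dim=0).values          # h^env
reg_loss = ((h_ent - h_env) ** 2).sum()              # L_m^reg = ||h^ent - h^env||_2^2

alpha = 0.9                                          # trade-off hyper-parameter (illustrative value)
coref_loss = torch.tensor(0.3)                       # stand-in for the pairwise cross-entropy term
total_loss = alpha * coref_loss + (1 - alpha) * reg_loss   # combined training loss
```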
(iv) Document Representation: As motivated in the introduction, we propose to learn representation vectors for input documents to enrich mention representations and improve the generalization of the models (over the lexical features for documents). As such, our principle is to employ entity and event mentions (the main objects of interest in CDECR) and their interactions/structures to represent input documents (i.e., documents as interaction graphs of entity and event mentions). Given the mention m of interest and its corresponding document d ∈ D_t, we start by building an interaction graph G^doc = {N^doc, E^doc} where the node set N^doc involves all the entity and event mentions in d. For the edge set E^doc, we leverage two types of information to connect the nodes in N^doc: (i) predicate-argument information: we establish a link between the nodes for an event mention x and an entity mention y in N^doc if y is an argument of x for one of the four semantic roles, i.e., Arg0, Arg1, Location, and Time (identified by the SRL system), and (ii) entity mention coreference: we connect any pair of nodes in N^doc that correspond to two coreferring entity mentions in d (using the gold within-document entity coreference in training and the predicted one in testing, as in E_t^0). In this way, G^doc helps to emphasize the important objects of d and enables intra- and inter-sentence interactions between event mentions (via entity mentions/arguments) to produce effective document representations for CDECR.

In the next step, we feed the interaction graph G^doc into a GCN model G_doc (called the document-level GCN) using the sentence-level GCN-based representations of the entity and event mentions in N^doc (i.e., sent(m)) as the initial vectors for the nodes (thus called a hierarchical GCN model). The hidden vectors produced by the last layer of G_doc for the nodes in G^doc are called {h_u^d | u ∈ N^doc}. Finally, we obtain the representation vector doc(m) for d based on max-pooling: doc(m) = max_pool(h_u^d | u ∈ N^doc).
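The following sketch shows how the entity-event interaction adjacency and the pooled document vector could be assembled. The mention indices, SRL argument links, coreference links, and the single propagation step are illustrative assumptions; in the model, the node vectors are the sent(m) representations passed through the document-level GCN before pooling.

```python
import torch

def build_interaction_adjacency(num_mentions, argument_links, coref_links):
    """argument_links: (event_idx, entity_idx) pairs from the SRL roles Arg0/Arg1/Location/Time.
    coref_links: (entity_idx, entity_idx) pairs for within-document entity coreference.
    Returns a symmetric adjacency matrix over the document's mentions (nodes of G^doc)."""
    adj = torch.zeros(num_mentions, num_mentions)
    for i, j in list(argument_links) + list(coref_links):
        adj[i, j] = adj[j, i] = 1.0
    return adj

# sent(m) vectors for the document's mentions (random stand-ins here).
sent_vectors = torch.randn(4, 8)
adj = build_interaction_adjacency(4, argument_links=[(0, 1), (2, 3)], coref_links=[(1, 3)])
node_vectors = torch.relu(adj @ sent_vectors)        # one illustrative propagation step of G_doc
doc_vector = node_vectors.max(dim=0).values          # doc(m) = max-pool over the graph's nodes
```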
(v) Final Representation: Given the representation vectors learned so far, we form the final representation vector for m (R_E(m, V_t^{k-1}) or R_V(m, E_t^k) in lines 7 or 12 of Algorithm 1) by concatenating the following vectors:

(1) The sentence- and document-level GCN-based representation vectors for m (i.e., sent(m) and doc(m)).

(2) The cluster-based representation cluster(m) = [Arg0_m, Arg1_m, Location_m, Time_m]. Taking Arg0_m as an example, it is computed by considering the mention m′ that is associated with m via the semantic role Arg0 in W using the SRL system. Here, m′ is an event mention if m is an entity mention and vice versa (Arg0_m = 0 if m′ does not exist). As such, let c be the cluster in the current configuration (i.e., V_t^{k-1} or E_t^k) that contains m′. We then obtain Arg0_m by averaging the ELMo-based vectors (i.e., X = x_1, ..., x_n) of the head words of the mentions in c: Arg0_m = 1/|c| Σ_{q ∈ c} x_{head(q)} (head(q) is the index of the head word of mention q in W). Note that as the current entity cluster configuration E_t^k is used to generate the cluster-based representations if m is an event mention (and vice versa), it serves as the main mechanism to enable the two coreference tasks to interact and benefit from each other. These vectors are inherited from (Barhom et al., 2019) for a fair comparison.

Finally, given two mentions m_1 and m_2, and their corresponding representation vectors R(m_1) and R(m_2) (as computed above), the coreference score functions S_E and S_V send the concatenated vector [R(m_1), R(m_2), R(m_1) ⊙ R(m_2)] to two-layer feed-forward networks (separate ones for S_E and S_V) that involve the sigmoid function in the end to produce the coreference score for m_1 and m_2. Here, ⊙ is the element-wise product. This completes the description of our CDECR model.
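The pair scorer can be sketched as a two-layer feed-forward network over the concatenation [R(m_1), R(m_2), R(m_1) ⊙ R(m_2)]; the hidden size and input dimensionality below are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Two-layer feed-forward scorer with a sigmoid output; one instance each for S_E and S_V."""
    def __init__(self, rep_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * rep_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, r1, r2):
        pair = torch.cat([r1, r2, r1 * r2], dim=-1)   # [R(m1), R(m2), R(m1) element-wise* R(m2)]
        return self.net(pair).squeeze(-1)             # coreference probability

scorer_events = PairScorer(rep_dim=16)
r1, r2 = torch.randn(16), torch.randn(16)
print(float(scorer_events(r1, r2)))                   # score in (0, 1)
```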
4 Experiments

Dataset: We use the ECB+ dataset (Cybulska and Vossen, 2014) to evaluate the CDECR models in this work. Note that ECB+ is the largest dataset with both within- and cross-document annotation for the coreference of entity and event mentions so far. We follow the setup and split for this dataset in prior work to ensure a fair comparison (Cybulska and Vossen, 2014; Kenyon-Dean et al., 2018; Barhom et al., 2019; Meged et al., 2020). In particular, this setup employs the annotation subset that has been validated for correctness by (Cybulska and Vossen, 2014) and involves a larger portion of the dataset for training. In ECB+, only a part of the mentions are annotated. This setup thus utilizes gold-standard event and entity mentions in the evaluation and does not require special treatment for unannotated mentions (Barhom et al., 2019).

Note that there is a different setup for ECB+ that is applied in (Yang et al., 2015) and (Choubey and Huang, 2017). In this setup, the full ECB+ dataset is employed, including the portions with known annotation errors. At test time, such prior work utilizes the predicted mentions from a mention extraction tool (Yang et al., 2015). To handle the partial annotation in ECB+, that prior work only evaluates the systems on the predicted mentions that are also annotated as gold mentions. However, as shown by (Upadhyay et al., 2016), this ECB+ setup has several limitations (e.g., ignoring clusters with a single mention and the separate evaluation for each sub-topic). Following (Barhom et al., 2019), we thus do not evaluate the systems on this setup, i.e., not comparing our model with those models in (Yang et al., 2015) and (Choubey and Huang, 2017) due to the incompatibility.

Hyper-Parameters: To achieve a fair comparison, we utilize the preprocessed data and extend the implementation of the model in (Barhom et al., 2019) to include our novel hierarchical GCN model. The development dataset of ECB+ is used to tune the hyper-parameters of the proposed model (called HGCN). The selected values and the resources for our model are reported in Appendix A.

Comparison: Following (Barhom et al., 2019), we compare HGCN with the following baselines:

(i) LEMMA (Barhom et al., 2019): This first clusters documents into topics and then groups event mentions that are in the same document clusters and share the head lemmas.

(ii) CV (Cybulska and Vossen, 2015b): This is a supervised learning method for CDECR that leverages discrete features to represent event mentions and documents. We compare with the best reported results for this method as in (Barhom et al., 2019).

(iii) KCP (Kenyon-Dean et al., 2018): This is a neural network model for CDECR. Both event mentions and documents are represented via word embeddings and hand-crafted binary features.

(iv) C-KCP (Barhom et al., 2019): This is the KCP model retrained and tuned using the same document clusters in the test phase as in (Barhom et al., 2019) and our model.

(v) BSE (Barhom et al., 2019): This is a joint resolution model for cross-document coreference of entity and event mentions, using ELMo to compute representations for the mentions.

(vi) BSE-DJ (Barhom et al., 2019): This is a variant of BSE that does not use the cluster-based representations cluster(m) in the mention representations, thus performing event and entity coreference resolution separately.

(vii) MCS (Meged et al., 2020): An extension of BSE where some re-ranking features are included.

Note that BSE and MCS are the current state-of-the-art (SOTA) models for CDECR on ECB+. For cross-document entity coreference resolution, we compare our model with the LEMMA and BSE models in (Barhom et al., 2019), the only works that report entity coreference performance on ECB+ so far.

Following (Barhom et al., 2019), we use the common coreference resolution metrics to evaluate the models in this work, including MUC (Vilain et al., 1995), B3, CEAF-e (Luo, 2005), and CoNLL F1 (the average of the three previous metrics). The official CoNLL scorer (Pradhan et al., 2014) is employed to compute these metrics.
Tables 1 and 2 show the performance (F1 scores) of the models for cross-document resolution of event and entity mentions (respectively). Note that we also report the performance of a variant (called HGCN-DJ) of the proposed HGCN model where the cluster-based representations cluster(m) are excluded (thus separately doing event and entity resolution, as BSE-DJ does).

Model | MUC | B3 | CEAF-e | CoNLL
LEMMA (Barhom et al., 2019) | 78.1 | 77.8 | 73.6 | 76.5
CV (Cybulska and Vossen, 2015b) | 73.0 | 74.0 | 64.0 | 73.0
KCP (Kenyon-Dean et al., 2018) | 69.0 | 69.0 | 69.0 | 69.0
C-KCP (Barhom et al., 2019) | 73.4 | 75.9 | 71.5 | 73.6
BSE-DJ (Barhom et al., 2019) | 79.4 | 80.4 | 75.9 | 78.5
BSE (Barhom et al., 2019) | 80.9 | 80.3 | 77.3 | 79.5
MCS (Meged et al., 2020) | 81.6 | 80.5 | 77.8 | 80.0
HGCN-DJ | 81.3 | 80.8 | 76.1 | 79.4
HGCN (proposed) | 83.1 | 82.1 | 78.8 | 81.3

Table 1: The cross-document event coreference resolution performance (F1) on the ECB+ test set.

Model | MUC | B3 | CEAF-e | CoNLL
LEMMA (Barhom et al., 2019) | 76.7 | 65.6 | 60.0 | 67.4
BSE-DJ (Barhom et al., 2019) | 78.7 | 69.9 | 61.6 | 70.0
BSE (Barhom et al., 2019) | 79.7 | 70.5 | 63.3 | 71.2
HGCN-DJ | 80.1 | 70.7 | 62.2 | 71.0
HGCN (proposed) | 82.1 | 71.7 | 63.4 | 72.4

Table 2: The cross-document entity coreference resolution performance (F1) on the ECB+ test set.

As can be seen, HGCN outperforms all the baseline models on both entity and event coreference resolution (over different evaluation metrics). In particular, the CoNLL F1 score of HGCN for event coreference is 1.3% better than that of MCS (the prior SOTA model) while the CoNLL F1 improvement of HGCN over BSE (the prior SOTA model for entity coreference on ECB+) is 1.2%. These performance gaps are significant with p < 0.001, thus demonstrating the effectiveness of the proposed model for CDECR. Importantly, HGCN is significantly better than BSE, the most direct baseline of the proposed model, on both entity and event coreference regardless of whether the cluster-based representations cluster(m) for joint entity and event resolution are used or not. This testifies to the benefits of the proposed hierarchical model for representation learning for CDECR. Finally, we evaluate the full HGCN model when the ELMo embeddings are replaced with BERT embeddings (Devlin et al., 2019), leading to CoNLL F1 scores of 79.7% and 72.3% for event and entity coreference (respectively). This performance is either worse (for events) or comparable (for entities) to that of ELMo, thus showing the advantages of ELMo for our tasks on ECB+.
Ablation Study: To demonstrate the benefits of the proposed components for our CDECR model, we evaluate three groups of ablated/varied models for HGCN. First, for the effectiveness of the pruned dependency tree Ĝ^m and the sentence-level GCN models G^ent and G^env, we consider the following baselines: (i) HGCN-Sentence GCNs: this model removes the sentence-level GCN models G^ent and G^env from HGCN and directly feeds the ELMo-based vectors X of the mention heads into the document-level GCN G_doc (the representation vector sent(m) is thus not included), and (ii) HGCN-Pruned Tree: this model replaces the pruned dependency tree Ĝ^m with the full dependency tree G^m in the computation. Second, for the advantage of the GCN interaction between G^ent and G^env, we examine two baselines: (iii) HGCN with One Sent GCN: this baseline only uses one sentence-level GCN model for both entity and event mentions (the regularization loss L_m^reg for G^ent and G^env is thus not used either), and (iv) HGCN-L_m^reg: this baseline still uses two sentence-level GCN models but excludes the regularization term L_m^reg from the training losses. Finally, for the benefits of the document-level GCN G_doc, we study the following variants: (v) HGCN-G_doc: this model removes the document-level GCN model from HGCN, thus excluding the document representations doc(m) from the mention representations, (vi) HGCN-G_doc+TFIDF: this model also excludes G_doc, but it includes TF-IDF vectors for documents, based on uni-, bi- and tri-grams, in the mention representations (inherited from (Kenyon-Dean et al., 2018)), and (vii) HGCN-G_doc+MP: instead of using the GCN model G_doc, this model aggregates the mention representation vectors produced by the sentence-level GCNs (sent(m)) to obtain the document representations doc(m) using max-pooling. Table 3 presents the performance of the models for event coreference on the ECB+ test set.

Model | MUC | B3 | CEAF-e | CoNLL
HGCN (full) | 83.1 | 82.1 | 78.8 | 81.3
HGCN-Sentence GCNs | 79.7 | 80.7 | 76.4 | 79.0
HGCN-Pruned Tree | 81.8 | 81.1 | 77.1 | 80.0
HGCN with One Sent GCN | 82.1 | 81.0 | 76.3 | 79.8
HGCN-L_m^reg | 81.6 | 81.6 | 77.6 | 80.3
HGCN-G_doc | 82.6 | 81.8 | 76.7 | 80.4
HGCN-G_doc+TFIDF | 82.2 | 81.0 | 78.4 | 80.5
HGCN-G_doc+MP | 82.0 | 81.1 | 77.0 | 80.0

Table 3: The CDECR F1 scores on the ECB+ test set.

It is clear from the table that all the ablated baselines are significantly worse than the full model HGCN (with p < 0.001), thereby confirming the necessity of the proposed GCN models (the sentence-level GCNs with pruned trees and the document-level GCN) and the GCN interaction mechanism for HGCN and CDECR. In addition, the same trends in model performance are also observed for entity coreference in this ablation study (the results are shown in Appendix B), thus further demonstrating the benefits of the proposed components in this work. Finally, we show the distribution of the error types of our HGCN model in Appendix C for future improvement.
5 Conclusion

We present a model to jointly resolve the cross-document coreference of entity and event mentions. Our model introduces a novel hierarchical GCN that captures both sentence and document context for the representations of entity and event mentions. In particular, we design pruned dependency trees to capture important context words for the sentence-level GCNs while interaction graphs between entity and event mentions are employed for the document-level GCN. In the future, we plan to explore better mechanisms to identify important context words for CDECR.

Acknowledgments

This research has been supported by the Army Research Office (ARO) grant W911NF-21-1-0112. This research is also based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2019-19051600006 under the Better Extraction from Text Towards Enhanced Retrieval (BETTER) Program. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ARO, ODNI, IARPA, the Department of Defense, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.
References

Cosmin Adrian Bejan and Sanda Harabagiu. 2014. Unsupervised event coreference resolution. In Computational Linguistics.
David Ahn. 2006. The stages of event extraction. In Proceedings of the Workshop on Annotating and Reasoning about Time and Events.
Shany Barhom, Vered Shwartz, Alon Eirew, Michael Bugert, Nils Reimers, and Ido Dagan. 2019. Revisiting joint modeling of cross-document entity and event coreference resolution. In ACL.
Cosmin Bejan and Sanda Harabagiu. 2010. Unsupervised event coreference resolution with rich linguistic features. In ACL.
Zheng Chen and Heng Ji. 2009. Graph-based event coreference resolution. In Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4).
Zheng Chen, Heng Ji, and Robert Haralick. 2009. A pairwise event coreference model, feature impact and evaluation for event coreference resolution. In Proceedings of the Workshop on Events in Emerging Text Types.
Prafulla Kumar Choubey and Ruihong Huang. 2017. Event coreference resolution by iteratively unfolding inter-dependencies among events. In EMNLP.
Agata Cybulska and Piek Vossen. 2014. Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution. In LREC.
Agata Cybulska and Piek Vossen. 2015a. Translating granularity of event slots into features for event coreference resolution. In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation.
Agata Cybulska and Piek T. J. M. Vossen. 2015b. "Bag of events" approach to event coreference resolution. Supervised classification of event templates. In Int. J. Comput. Linguistics Appl.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.
Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.
Kian Kenyon-Dean, Jackie Chi Kit Cheung, and Doina Precup. 2018. Resolving event coreference with supervised representation learning and clustering-oriented regularization. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics.
Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
Viet Dac Lai, Tuan Ngo Nguyen, and Thien Huu Nguyen. 2020. Event detection: Gate diversity and syntactic importance scores for graph convolution neural networks. In EMNLP.
Heeyoung Lee, Marta Recasens, Angel Chang, Mihai Surdeanu, and Dan Jurafsky. 2012. Joint entity and event coreference resolution across documents. In EMNLP.
Qi Li, Heng Ji, and Liang Huang. 2013. Joint event extraction via structured prediction with global features. In ACL.
Zhengzhong Liu, Jun Araki, Eduard Hovy, and Teruko Mitamura. 2014. Supervised within-document event coreference using information propagation. In LREC.
Jing Lu, Deepak Venugopal, Vibhav Gogate, and Vincent Ng. 2016. Joint inference for event coreference resolution. In COLING.
Xiaoqiang Luo. 2005. On coreference resolution performance metrics. In EMNLP.
Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In ACL System Demonstrations.
Yehudit Meged, Avi Caciularu, Vered Shwartz, and Ido Dagan. 2020. Paraphrasing vs coreferring: Two sides of the same coin. In ArXiv abs/2004.14979.
Guoshun Nan, Zhijiang Guo, Ivan Sekulic, and Wei Lu. 2020. Reasoning with latent structure refinement for document-level relation extraction. In ACL.
Minh Van Nguyen, Viet Dac Lai, and Thien Huu Nguyen. 2021. Cross-task instance representation interactions and label dependencies for joint information extraction with graph convolutional networks. In NAACL-HLT.
Thien Huu Nguyen, Adam Meyers, and Ralph Grishman. 2016. New York University 2016 system for KBP event nugget: A deep learning approach. In TAC.
Thien Huu Nguyen and Ralph Grishman. 2018. Graph convolutional networks with argument-aware pooling for event detection. In AAAI.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. In Journal of Machine Learning Research.
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL-HLT.
Sameer Pradhan, Xiaoqiang Luo, Marta Recasens, Eduard Hovy, Vincent Ng, and Michael Strube. 2014. Scoring coreference partitions of predicted mentions: A reference implementation. In ACL.
Sunil Kumar Sahu, Fenia Christopoulou, Makoto Miwa, and Sophia Ananiadou. 2019. Inter-sentence relation extraction with document-level graph convolutional neural network. In ACL.
Lei Sha, Feng Qian, Baobao Chang, and Zhifang Sui. 2018. Jointly extracting event triggers and arguments by dependency-bridge RNN and tensor-based argument interaction. In AAAI.
M. Surdeanu, Lluís Màrquez i Villodre, X. Carreras, and P. Comas. 2007. Combination strategies for semantic role labeling. In Journal of Artificial Intelligence Research.
Hieu Minh Tran, Minh Trung Nguyen, and Thien Huu Nguyen. 2020. The dots have their values: Exploiting the node-edge connections in graph-based neural models for document-level relation extraction. In EMNLP Findings.
Minh Tran and Thien Huu Nguyen. 2021. Graph convolutional networks for event causality identification with rich document-level structures. In NAACL-HLT.
Shyam Upadhyay, Nitish Gupta, Christos Christodoulopoulos, and Dan Roth. 2016. Revisiting the evaluation for cross document event coreference. In COLING.
Amir Pouran Ben Veyseh, Franck Dernoncourt, Quan Hung Tran, Varun Manjunatha, Lidan Wang, Rajiv Jain, Doo Soon Kim, Walter Chang, and Thien Huu Nguyen. 2021. Inducing rich interaction structures between words for document-level event argument extraction. In Proceedings of the 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD).
Amir Pouran Ben Veyseh, Thien Huu Nguyen, and Dejing Dou. 2019. Graph based neural networks for event factuality prediction using syntactic and semantic structures. In ACL.
Amir Pouran Ben Veyseh, Tuan Ngo Nguyen, and Thien Huu Nguyen. 2020a. Graph transformer networks with syntactic and semantic structures for event argument extraction. In EMNLP Findings.
Amir Pouran Ben Veyseh, Nasim Nouri, Franck Dernoncourt, Dejing Dou, and Thien Huu Nguyen. 2020b. Introducing syntactic structures into target opinion word extraction with deep learning. In EMNLP.
Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman. 1995. A model-theoretic coreference scoring scheme. In Sixth Message Understanding Conference (MUC-6).
Bishan Yang, Claire Cardie, and Peter Frazier. 2015. A hierarchical distance-dependent Bayesian model for event coreference resolution. In TACL.