Event Extraction from Historical Texts: A New Dataset for Black Rebellions

Page created by Clayton Lynch
 
CONTINUE READING
Event Extraction from Historical Texts:
                           A New Dataset for Black Rebellions
          Viet Dac Lai1 , Minh Van Nguyen1 , Heidi Kaufman2 , Thien Huu Nguyen1
                      1
                        Department of Computer and Information Science
                                     2
                                       Department of English
                         University of Oregon, Eugene, OR 97403, USA
                     {vietl,minhnv,thien}@cs.uoregon.edu
                                  hkaufman@uoregon.edu

                     Abstract                              over the quantity of the data. Recent advances of
                                                           natural language processing (NLP) aim to bridge
    Understanding historical events is necessary
                                                           the gap between qualitative and quantitative anal-
    for the study of contemporary society, cul-
    ture, and politics. In this work, we focus on          yses by identifying, extracting, and counting con-
    the event extraction task (EE) to detect event         textual data (Won et al., 2018; Wadden et al., 2019;
    trigger words and their arguments in a novel           Lin et al., 2020). This new approach provides con-
    domain of historical texts. In particular, we          textual information about real-life entities (e.g., in-
    introduce a new EE dataset for a corpus of             dividuals, locations, times, documents) which can
    nineteenth-century African American newspa-            be later integrated into knowledge bases (Won et al.,
    pers. Our goal is to study the discourse of
                                                           2018) to aid historical research and discourse anal-
    slave and non-slave African diaspora rebel-
    lions published in the periodical press in this
                                                           ysis.
    period. Our dataset features 5 entity types,              In this work, we explore Information Extraction
    12 event types, and 6 argument roles that con-         (IE) in NLP for humanities research in support of
    cern slavery and black movements between the           the important and complicated process of knowl-
    eighteenth and nineteenth centuries. Histor-           edge extraction from historical texts. Particularly,
    ical newspapers present many challenges for            we investigate the Event Extraction (EE) task which
    existing EE systems, including the evolution           identifies event trigger words of pre-determined
    of meanings of words and the extensive use
                                                           event types (the most important words/phrases to
    of religious discourse in newspapers from this
    era. Our experiments with current state-of-            evoke events) (Li et al., 2013), together with its
    the-art EE systems and BERT models demon-              arguments (e.g., participants, locations). For exam-
    strate their poor performance over historical          ple, in the following sentence an EE system should
    texts and call for more robust research efforts        be able to detect the word “proclaimed” as a trigger
    in this area.                                          word of the event type “Law Approve” and asso-
                                                           ciate it with the arguments, i.e., agent (Capitol),
1   Introduction
                                                           beneficiary (the slave), and datetime (now).
In the last two decades, the emergence of digital hu-         Freedom to the slave should now be proclaimed
manities has transformed scholarship in the human-         from the Capitol, and should be seen above the
ities. Historical documents are now massively digi-        smoke and fire of every battle field.
tized into photos and texts that allow researchers to         To enable the development and evaluation of EE
query across collections and languages (Piotrowski,        models for historical text, benchmark datasets play
2012). Despite the convenience of these applica-           an important role. However, most of the current
tions (Yang and Eisenstein, 2016), a gap still ex-         datasets in EE (i.e., ACE-2005 (Walker et al., 2005)
ists between datasets and research methods. As             and TAC KBP (Mitamura et al., 2015)) are not suit-
such, humanities scholars do not solely interpret          able for this domain for several reasons. First, these
historical facts from statistical figures derived from     datasets are collected from various sources without
massive data. Rather, they prefer reading texts and        a target topic (Walker et al., 2005; Mitamura et al.,
interpreting words in historical and cultural context,     2015). Therefore, tracking the evolution of some
or by associating texts with the circumstances sur-        specific movements or progress, which is of great
rounding their publication. This working methodol-         interest to literary scholars and historians, is not a
ogy requires an emphasis on the quality of the data        feasible goal. Second, documents in these datasets

                                                      2390
            Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2390–2400
                          August 1–6, 2021. ©2021 Association for Computational Linguistics
are derived from recent articles and documents in             arguments for such events, we introduce five en-
which the use of words in the text differ from their          tity types (i.e., Person, Organization, Geographical-
uses in the past. For example, some words obtain              Political Entities, Time, and Document). The first
new semantics over time, and the dominance of                 four entity types follow the definition in the ACE
religion in the past led to extensive use of religion-        2005 guideline (Walker et al., 2005) while the Doc-
related words and figurative language in histori-             ument type represents government documents (e.g.,
cal publications. Last but not least, existing EE             Slavery Act) used in events. Finally, we define
datasets mostly concern events in common human                six argument roles that such entity types can play
life, such as giving birth, transportation, and crimes.       in our events, including Time, Location, Agent,
These events might not relate to the subjects literary        Patient, Object, and Beneficiary. Tables 9 and 10
scholars and historians want to study.                        provide more descriptions and examples of these
   To redress this problem, we introduce a novel              argument roles for each event type.
EE dataset for historical texts, called BRAD, fo-                The African American corpus is a large corpus
cusing on Black Rebellions in African Diaspora                of 177,582 articles. We thus select documents that
(i.e., African American population). BRAD’s doc-              are relevant to our focused topic of Black diaspora
uments are selected by a humanities expert and are            rebellions. First, automatic selection is done by
annotated by EE experts for 5 entity types, 12 event          keyword matching to identify documents related
types, and 6 argument roles. Finally, we evaluate             to slavery and insurrection. As such, our humani-
the state-of-the-art EE models on BRAD. Our ex-               ties expert defined a set of keywords for the topic
periments show that the performance for historical            of rebellion. In the nineteenth century this clus-
texts of current EE models is significantly poorer            ter of words were used interchangeably to describe
than those for modern texts, necessitating further            African diaspora rebellion events (e.g., “rebel”, “re-
research into this area. We will also release our             volt”, “strike”, “insurrection”). We used the Stan-
dataset and code to facilitate future research.               ford CoreNLP toolkit to split and tokenize docu-
                                                              ments into sentences and words. Next, for each
2   Data Collection and Annotation
                                                              document in the corpus, we counted the number of
In this project we use documents from the African             words in the document that appears in the desig-
American newspaper corpus. These documents                    nated keyword set (called matching rate). The top
involve news articles derived from nineteenth-                1000 documents with the highest matching rates
century African American periodicals1 published               are selected for further consideration. In the second
from 1827 to 1909.                                            step, the humanities expert examined the 1000 doc-
   To create an EE dataset, we first designed a               uments to identify relevant documents for Black
set of event types and annotation guidelines, con-            rebellions, leading to the selection of 151 docu-
sulting our humanities expert who specializes in              ments used for the EE annotation.
nineteenth-century literature. In particular, we fo-             In the next step we recruited two graduate stu-
cus on the four most important events for Black               dents to annotate the selected documents for EE.
rebellions presented in our corpus, including Hu-             Each student was independently trained on the an-
manity: a humanity event concerns a violation or              notation guideline and performed a group of exer-
facilitation of basic human rights (e.g. living, free-        cises to better recognize events and entities. The
dom, property); Law: a law event characterizes an             students annotated the 151 documents for entity
introduction, approval or appeal of a law; Conflict:          mentions and event triggers, achieving Cohen’s
a conflict event represents an act of violence; it            Kappa scores of 0.81 and 0.82 respectively. Note
includes the initialization, development, and con-            that these scores are very close to the near-perfect
sequences of a violent act; and Justice: a justice            agreement range of [0.81, 0.99]. To further im-
event captures an act of punishment of the govern-            prove the quality of the dataset, our humanities ex-
ment to the people who violate a law. These four              pert will resolve the annotation conflicts that arise
events are further expanded into 12 event sub-types.          between the two students, leading to the final anno-
Tables 7 and 8 present event types along with their           tation version of entity mentions and event triggers
descriptions and examples in BRAD. To capture                 in the 151 documents. In the next step, given the
   1
     Douglass Monthly, The Frederick Douglass Paper, Free-    reconciled entity mention and event trigger anno-
dom Journals, The Christian Recorder, The Colored American.   tation, the two students continue to annotate event

                                                         2391
arguments for the event triggers. Our evaluation                        Entity Type               #Entity
                                                                        PERSON (PER)               12,599
shows a Cohen’s Kappa score of 0.75 that indicates
                                                                        ORGANIZATION (ORG)          3,836
a strong agreement between the two annotators.                          GEOPOLITICAL (GPE)          2,873
Also, the lower agreement score for event argu-                         DOCUMENT (DOC)              1,121
ments suggests that event argument annotation is                        DATETIME (TIME)               996
more ambiguous than those for entity mentions and
                                                                  Table 3: Distribution of entity types in BRAD.
event triggers. Finally, our domain expert was con-
sulted to resolve any conflicts in event argument an-
notation, producing the final version of our BRAD             ample, in the following sentence, “Congress” and
dataset with the 151 documents. To facilitate the             “her” are two mentions of the USS Congress bat-
development of EE models, we then split BRAD                  tleship launched by the United State Navy in 1841.
into three portions for training, development, and            Without historical knowledge, our current percep-
test data with 101, 25, and 25 documents, respec-             tion might interpret “Congress” as the legislative
tively. Table 1 presents the statistics while Table 2         branch of the United States. In fact, the second
and 3 presents the frequencies of event and entity            clause mentions the wooden hull that helps to clar-
types in our BRAD dataset.                                    ify it as the battleship that sunk in 1862 during the
                      Train     Dev      Test     Total       US Civil War. Such misinterpretation might lead
   #document             101       25       25       151      to incorrect annotations and analyses.
   #sentence           3,847      925      866      5638         “The Congress was visited and received the
   #token            117,278   27,860   26,920   172,058
   #event trigger      2,720      606      933     4,259      shots and shells in all part of her wooden hull”.
   #entity mention    14,389    3,287    3,749    21,425         Second, we find that annotation disagreements
   #event argument     6,057    1,219    2,570     9,846
                                                              are more likely to occur in the interpretation of
         Table 1: Statistics of the BRAD dataset.             event triggers. In BRAD, we allow event triggers to
                                                              involve multiple words that cause span mismatches
                                                              between annotations for some confusing cases (e.g.,
              Event Type            #Event                    annotating the whole phrase “make the black man
              Conflict Attack        1,628
              Conflict Other           971
                                                              equal” as an event trigger or annotating “make”
              Humanity Deprive         577                    and “equal” as two separate triggers). Another
              Humanity Endow           376                    form of popular disagreement involves mismatches
              Conflict Injure          153                    on event types. Consider the following sentence as
              Law Approve              145                    an example:
              Law Repeal               141
              Law Propose              108                       “Believing his life to be in danger, Patmon
              Justice Arrest-Jail       78                    stepped back, drew his revolver, and told the fellow
              Conflict Protest          42                    to surrender, or he would shoot him .”
              Justice Execute           26                       Two annotators agree that the word “shoot” is
              Justice Sentence          14
                                                              an event trigger. However, one annotator consid-
    Table 2: Distribution of event types in BRAD.             ers this as an event of type Conflict Attack as it is
                                                              a part of the conflict between the overseer (“Pat-
Annotation Challenges: During the EE annota-                  mon”) and the slave (“fellow”, “him”); the other
tion process of historical texts, we found sev-               annotator, on the other hand, treats “shoot” as a Hu-
eral noteworthy challenges regarding the ability              manity Deprive event as the overseer is threatening
to achieve interpretive consensus of the texts.               to kill the slave (i.e., taking the right to life).
   First, for the domain expertise, we find that the          Data Analysis: To illustrate the ambiguity in
use and meaning of words evolves over time and                BRAD, Table 4 shows five words with the high-
across geographical regions, potentially introduc-            est frequency as event triggers (i.e., Event Count),
ing new meanings or making one meaning more                   along with the percentage of times these words are
popular than the others. Language is always in per-           labeled as event triggers in the dataset (i.e., Event
petual flux. As such, understanding texts from the            Rate) (Sims et al., 2019). This table demonstrates
past requires analysis of the context in which texts          the likelihood that words with the highest event
were written. In order to be effective, the annota-           counts might not be annotated as event triggers in
tions must be attentive to these contexts. For ex-            BRAD, thereby necessitating EE models to find a

                                                           2392
Word            #Event    Event Rate             cross-sentence relationships for span representation
          war                183        23.5%
          rebellion          112        47.3%
                                                           propagation for joint IE.
          insurrection        66        78.6%                  OneIE (Lin et al., 2020): This model first iden-
          revolt              77        60.2%              tifies spans of entity mentions and event triggers.
          emancipation        58        82.9%
          take                64        32.3%
                                                           The detected spans are then paired to jointly predict
          put                 35        47.3%              entity types, event types, relations, and argument
          send                25        45.5%              roles for IE. Global features are used to capture
                                                           cross-task and cross-instance dependencies and
Table 4: Event rates of the words with the highest event
counts in BRAD.                                            are employed in the decoding phase with beam
                                                           searches to improve extraction performance.
                         ACE 2005    BRAD                     As such, we adapt the official implementations
            god            4.5%      17.5%
                                                           of such models from their original papers for our
            lord           0.3%       8.7%
            heaven         0.7%       8.7%
                                                           EE task in BRAD by ignoring the relation extrac-
            mighty         0.2%       8.7%                 tion task and re-tuning them on the BRAD devel-
            sacred         0.2%       9.5%                 opment set. For both models, we employ the pre-
            curse          0.5%       5.6%                 trained BERT model (i.e., the bert-base-cased ver-
            christian      1.7%      18.3%                 sion) to encode input texts. Besides, motivated
                                                           by the language difference between historical and
Table 5: Percentages of documents containing religion-
related words in BRAD and ACE 2005.
                                                           modern texts, we further explore a variant of the
                                                           BERT model by fine-tuning it on the African Amer-
                                                           ican corpus via the masked language modeling task
method of effectively capturing context in order to        (Devlin et al., 2019). Note that we exclude the 151
perform correct predictions.                               documents of BRAD in this fine-tuning process.
   Moreover, we find extensive use of religion-            This fine-tuned BERT model will also be fed into
related words in BRAD compared to existing EE              DyGIE++ and OneIE to perform EE in BRAD.
datasets. For example, considering the words                  Result: Table 6 reports the performance of the
“lord”, “heaven”, and “christian”, the percentages         models on the test set of BRAD over five sub-
of documents in ACE 2005 containing these words            tasks: Entity Mention Detection (Entity), Event
are only 0.3%, 0.7%, and 1.7% while those per-             Trigger Identification, i.e., not concerning event
centages for BRAD are 8.7%, 8.7%, and 18.3%                types (Trig-I), Event Trigger Classification (Trig-
respectively. Such language difference suggests the        C), Event Argument Identification, i.e., not con-
potential need to adapt existing language models to        cerning argument roles (Arg-I), and Event Argu-
better capture the nature of historical texts which,       ment Classification (Arg-C). For comparison, we
in turn, will facilitate a more accurate performance       also include the original performance of the models
of EE.                                                     on the popular EE dataset ACE 2005. There are
                                                           three major observations from the table. First, the
3   Experiment
                                                           performance of current EE models on BRAD is
There are three major EE tasks that BRAD supports          significantly and substantially worse than those on
for historical texts, including entity mention detec-      ACE across different tasks. It thus suggests that EE
tion (EMD), event trigger detection (ED), and event        for historical texts in BRAD is a challenging task
argument extraction (EAE). This section aims to            and more research effort is necessary to boost the
reveal the complexity of the EE tasks in BRAD by           EE performance for this domain. Second, compar-
evaluating the performance of existing state-of-the-       ing the performance of the models with different
art models for EE on this dataset. In particular, we       versions of BERT (i.e., original vs fine-tuned), it
focus on the following state-of-the-art models for         is clear that fine-tuning BERT on historical texts
EE that leverage the pre-trained language model            is beneficial for improving the performance of EE
BERT (Devlin et al., 2019) for the text encoding           models on BRAD (especially for OneIE where the
and jointly perform predictions for all EE tasks in        improvement is consistent across different EE sub-
an end-to-end fashion (i.e., joint inference):             tasks with large margins). This observation sug-
   DyGIE++ (Wadden et al., 2019): This model               gests that pre-training BERT on modern texts is
utilizes dynamic span graphs to exploit long-range         unable to capture the nuance of language use in

                                                       2393
ACE            BRAD               BRAD#
                     Task      Model
                                             F1      P       R      F1     P    R    F1
                               DyGIE++      90.7    85.6    75.9   80.5   84.3 78.6 81.4
                     Entity
                               OneIE        90.3    85.4    77.0   81.0   85.0 79.4 82.1
                               DyGIE++      76.5    81.5    50.1   62.0   77.4 56.9 65.6
                     Trig-I
                               OneIE        78.6    80.9    47.0   59.4   80.8 52.9 63.9
                               DyGIE++      73.6    62.7    38.5   47.7   61.6 40.6 49.0
                     Trig-C
                               OneIE        75.2    63.5    36.9   46.7   64.9 42.2 51.2
                               DyGIE++      55.4    55.5    28.0   37.2   47.8 28.6 35.8
                     Arg-I
                               OneIE        60.7    57.7    33.9   42.7   58.9 40.4 47.9
                               DyGIE++      52.5    48.8    24.6   32.7   42.5 25.4 31.8
                     Arg-C
                               OneIE        58.6    49.6    29.1   36.7   52.1 31.6 39.4

Table 6: The performance of models on the test sets of BRAD and ACE 2005. The BRAD# columns report the performance
with BERT fine-tuned on the African American corpus.

history, thus impairing the models and requiring           5   Acknowledgement
appropriate adaptation to boost the EE performance.
                                                           This work is supported by an I3 grant from the
Finally, we note that the human performance (F1
                                                           University of Oregon Provost’s Office (Incubat-
scores) for Entity, Trig-C, and Arg-C on BRAD
                                                           ing Interdisciplinary Initiatives) and the Army Re-
are 95.43, 88.3, and 79.8 respectively. The large
                                                           search Office (ARO) grant W911NF-21-1-0112.
performance gaps between human and current EE
                                                           This research is also based upon work supported by
systems thus presents many research opportunities
                                                           the Office of the Director of National Intelligence
for future work on BRAD.
                                                           (ODNI), Intelligence Advanced Research Projects
                                                           Activity (IARPA), via IARPA Contract No. 2019-
4   Related work
                                                           19051600006 under the Better Extraction from Text
Prior work in NLP for historical texts has mainly          Towards Enhanced Retrieval (BETTER) Program.
focused on spelling and text normalization (Pet-           The views and conclusions contained herein are
tersson et al., 2014; Bollmann et al., 2017; Flachs        those of the authors and should not be interpreted
et al., 2019). Recently, some studies have under-          as necessarily representing the official policies, ei-
taken research on historical texts with NLP tasks          ther expressed or implied, of ARO, ODNI, IARPA,
such as POS tagging (Yang and Eisenstein, 2016)            the Department of Defense, or the U.S. Govern-
and information extraction (Pettersson et al., 2016).      ment. The U.S. Government is authorized to re-
However, none of this work has explored EE.                produce and distribute reprints for governmental
   EE is an active research area due to the availabil-     purposes notwithstanding any copyright annotation
ity of EE datasets e.g., for general (Walker et al.,       therein. This document does not contain technol-
2005; Mitamura et al., 2015) and biomedical (Kim           ogy or technical data controlled under either the
et al., 2011) domains. Most of prior studies fo-           U.S. International Traffic in Arms Regulations or
cus on in-domain EE (Ahn, 2006; Li et al., 2013;           the U.S. Export Administration Regulations.
Nguyen and Grishman, 2015; Chen et al., 2015;
                                                           6   Conclusion
Nguyen et al., 2016; Yang et al., 2019; Wadden
et al., 2019; Lai et al., 2020c; Nguyen et al., 2021).     We present BRAD, a new dataset for EE on histor-
Some recent studies in EE have also addressed ex-          ical texts that focuses on Black rebellions in the
tensible learning settings for EE to new event types,      American Africa corpus. Our experiments demon-
e.g. zero-shot learning (Huang et al., 2018), few-         strate the poor performance of current models for
shot learning (Lai et al., 2020a,b), or new domains        EE on BRAD compared to those on modern texts,
(Naik and Rosé, 2020). The closet works to ours           thus creating room for future research on EE for
involve recent efforts to create new datasets for EE       historical texts. We also illustrate one approach to
(Satyapanich et al., 2020; Ebner et al., 2020; Wang        improve current EE systems for historical texts via
et al., 2020; Trong et al., 2020; Le and Nguyen,           fine-tuning existing pre-trained language models.
2021). However, these works do not consider his-           In the future, we plan to enlarge our datasets with
torical texts as we do.                                    more annotated documents and event types.

                                                      2394
References                                                 Viet Dac Lai, Tuan Ngo Nguyen, and Thien Huu
                                                             Nguyen. 2020c. Event detection: Gate diversity and
David Ahn. 2006. The stages of event extraction. In          syntactic importance scores for graph convolution
  Proceedings of the Workshop on Annotating and              neural networks. In Proceedings of the 2020 Con-
  Reasoning about Time and Events.                           ference on Empirical Methods in Natural Language
Marcel Bollmann, Joachim Bingel, and Anders                  Processing (EMNLP).
 Søgaard. 2017. Learning attention for historical text     Duong Le and Thien Huu Nguyen. 2021. Fine-grained
 normalization by learning to pronounce. In Proceed-         event trigger detection. In Proceedings of the 16th
 ings of the 55th Annual Meeting of the Association          Conference of the European Chapter of the Associa-
 for Computational Linguistics (ACL).                        tion for Computational Linguistics (EACL).
Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, and          Qi Li, Heng Ji, and Liang Huang. 2013. Joint event
  Jun Zhao. 2015. Event extraction via dynamic multi-        extraction via structured prediction with global fea-
  pooling convolutional neural networks. In Proceed-         tures. In Proceedings of the 51st Annual Meet-
  ings of the 53rd Annual Meeting of the Association         ing of the Association for Computational Linguistics
  for Computational Linguistics and the 7th Interna-         (ACL).
  tional Joint Conference on Natural Language Pro-
  cessing (ACL-IJCNLP).                                    Ying Lin, Heng Ji, Fei Huang, and Lingfei Wu. 2020.
                                                             A joint neural model for information extraction with
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and                global features. In Proceedings of the 58th Annual
   Kristina Toutanova. 2019. BERT: Pre-training of           Meeting of the Association for Computational Lin-
   deep bidirectional transformers for language under-       guistics (ACL).
   standing. Proceedings of the 2019 Conference of the
   North American Chapter of the Association for Com-      Teruko Mitamura, Zhengzhong Liu, and Eduard H
   putational Linguistics: Human Language Technolo-          Hovy. 2015. Overview of tac kbp 2015 event nugget
   gies (NAACL-HLT).                                         track. In Proceedings of the Text Analysis Confer-
                                                             ence (TAC).
Seth Ebner, Patrick Xia, Ryan Culkin, Kyle Rawlins,
  and Benjamin Van Durme. 2020. Multi-sentence ar-         Aakanksha Naik and Carolyn Rosé. 2020. Towards
  gument linking. In Proceedings of the 58th Annual          open domain event trigger identification using ad-
  Meeting of the Association for Computational Lin-          versarial domain adaptation. In Proceedings of the
  guistics (ACL).                                            58th Annual Meeting of the Association for Compu-
                                                             tational Linguistics (ACL).
Simon Flachs, Marcel Bollmann, and Anders Søgaard.
  2019. Historical text normalization with delayed         Minh Van Nguyen, Viet Lai, and Thien Huu Nguyen.
  rewards. In Proceedings of the 57th Annual Meet-           2021. Cross-task instance representation interac-
  ing of the Association for Computational Linguistics       tions and label dependencies for joint information
  (ACL).                                                     extraction with graph convolutional networks. In
                                                             Proceedings of the 2021 Conference of the North
Lifu Huang, Heng Ji, Kyunghyun Cho, and Clare R              American Chapter of the Association for Computa-
   Voss. 2018. Zero-shot transfer learning for event ex-     tional Linguistics: Human Language Technologies
   traction. In Proceedings of the 56th Annual Meet-        (NAACL-HLT).
   ing of the Association for Computational Linguistics
  (ACL).                                                   Thien Huu Nguyen, Kyunghyun Cho, and Ralph Gr-
                                                             ishman. 2016. Joint event extraction via recurrent
Jin-Dong Kim, Tomoko Ohta, Sampo Pyysalo, Yoshi-             neural networks. In Proceedings of the 2016 Con-
   nobu Kano, and Jun’ichi Tsujii. 2011. Extracting          ference of the North American Chapter of the Asso-
   bio-molecular events from literature—the bionlp’09        ciation for Computational Linguistics: Human Lan-
   shared task. Computational Intelligence, 27(4):513–       guage Technologies (NAACL-HLT).
   540.
                                                           Thien Huu Nguyen and Ralph Grishman. 2015. Event
Viet Dac Lai, Franck Dernoncourt, and Thien Huu              detection and domain adaptation with convolutional
  Nguyen. 2020a. Exploiting the matching informa-            neural networks. In The 53rd Annual Meeting of the
  tion in the support set for few shot event classifica-     Association for Computational Linguistics and the
  tion. In Proceedings of the 24th Pacific-Asia Con-         7th International Joint Conference on Natural Lan-
  ference on Knowledge Discovery and Data Mining             guage Processing (ACL-IJCNLP).
  (PAKDD).
                                                           Eva Pettersson, Jonas Lindström, Benny Jacobs-
Viet Dac Lai, Thien Huu Nguyen, and Frank Dernon-            son, and Rosemarie Fiebranz. 2016. Histsearch-
  court. 2020b. Extensively matching for few-shot            implementation and evaluation of a web-based tool
  learning event detection. In Proceedings of the First      for automatic information extraction from historical
  Joint Workshop on Narrative Understanding, Story-          text. In HistoInformatics@ DH.
  lines, and Events.

                                                      2395
Eva Pettersson, Beáta Megyesi, and Joakim Nivre.           the 9th International Joint Conference on Natural
  2014. A multilingual evaluation of three spelling         Language Processing (EMNLP-IJCNLP).
  normalisation methods for historical text. In Pro-
  ceedings of the 8th workshop on language technol-       Christopher Walker, Stephanie Strassel, Julie Medero,
  ogy for cultural heritage, social sciences, and hu-       and Kazuaki Maeda. 2005. ACE 2005 multilingual
  manities (LaTeCH).                                        training corpus. In Technical report, Linguistic Data
                                                            Consortium.
Michael Piotrowski. 2012. Natural language process-
  ing for historical texts. Synthesis lectures on human   Xiaozhi Wang, Ziqi Wang, Xu Han, Wangyi Jiang,
  language technologies, 5(2):1–157.                        Rong Han, Zhiyuan Liu, Juanzi Li, Peng Li, Yankai
                                                            Lin, and Jie Zhou. 2020. MAVEN: A Massive Gen-
Taneeya Satyapanich, Francis Ferraro, and Tim Finin.        eral Domain Event Detection Dataset. In Proceed-
  2020. CASIE: Extracting cybersecurity event infor-        ings of the 2020 Conference on Empirical Methods
  mation from text. In Proceedings of the Associa-          in Natural Language Processing (EMNLP).
  tion for the Advancement of Artificial Intelligence
  (AAAI).                                                 Miguel Won, Patricia Murrieta-Flores, and Bruno Mar-
                                                            tins. 2018. Ensemble named entity recognition
Matthew Sims, Jong Ho Park, and David Bamman.
                                                            (NER): evaluating NER tools in the identification of
 2019. Literary event detection. In Proceedings of
                                                            place names in historical corpora. Frontiers in Digi-
 the 57th Annual Meeting of the Association for Com-
                                                            tal Humanities, 5:2.
 putational Linguistics (ACL).
Hieu Man Duc Trong, Duc Trong Le, Amir Pouran             Sen Yang, Dawei Feng, Linbo Qiao, Zhigang Kan,
  Ben Veyseh, Thuat Nguyen, and Thien Huu Nguyen.           and Dongsheng Li. 2019. Exploring pre-trained lan-
  2020. Introducing a new dataset for event detection       guage models for event extraction and generation. In
  in cybersecurity texts. In Proceedings of the 2020        Proceedings of the 57th Annual Meeting of the Asso-
  Conference on Empirical Methods in Natural Lan-           ciation for Computational Linguistics (ACL).
  guage Processing (EMNLP).
                                                          Yi Yang and Jacob Eisenstein. 2016. Part-of-speech
David Wadden, Ulme Wennberg, Yi Luan, and Han-              tagging for historical English. In Proceedings of the
  naneh Hajishirzi. 2019. Entity, relation, and event       2016 Conference of the North American Chapter of
  extraction with contextualized span representations.      the Association for Computational Linguistics: Hu-
  In Proceedings of the 2019 Conference on Empiri-          man Language Technologies (NAACL-HLT).
  cal Methods in Natural Language Processing and

                                                     2396
Type                Description                                       Examples
        LAW            A PROPOSE event occurs           Below we give the salient points of the bill of an entertain-
        Propose        when an actor (Agent) intro-     ment recently given in the interest of a certain church about
                       duces a bill, proposition, or    to be organized in a certain town in New Jersey.
                       treaty which benefits a group    The bill introduced in Congress last week by the congress-
                       of people (Beneficial).          man from North Carolina , to abolish the 15th amendment.
                                                        It ’s only effect will be to create support for the bill of
                                                        congressman Crumpacker which proposes a reduction of
                                                        representation in those States
        LAW            An APPROVE event occurs          Be it enacted by the General Assembly of Maryland.
        Approve        when a bill or order (Object)
                       is passed by either the head of Vermont has passed her Liberty Bill , New York has under
                       the government or a represen- discussion , and Massachusetts will soon report and pass
                       tative committee (Agent).         her Act .
                                                         But it is said that for the Government to adopt the abolition
                                                         policy, would involve the loss of the support of the Union
                                                         men of the Border Slave States.
        LAW            A REPEAL event occurs ... that I determined to revoke the act of the Federal Con-
        Repeal         when an active law (Object) stituent Assembly , whereby Slavery was abolished.
                       is completely repealed by a Even the New York Tribune protests against making this
                       state actor (Agent).              war for the destruction of slavery, and insists that such a
                                                         war would alienate a large body of the Northern people at
                                                         present who adhere to the Government in the prosecution
                                                         of the war.
                                                         They want to se the Government march a powerful array
                                                         into the traitorous States, proclaim liberty to every slave,
                                                         and wipe out the last vestige of that barbarous system from
                                                         the land ...
        CONFLICT       A PROTEST event occurs Almost simultaneously with the appearance of the minstrels
        Protest        when people (Agent) come there arose from every kennel in the neighborhood timely
                       into a public area to demand protest barked forth vigorously by a hundred curs, who, in
                       some action.        PROTEST common with their masters, cursed their common luck.
                       events include, but are not It has attempted to supplant Government with anarchy, and
                       limited to, protests, sit-ins, the fury of a brutal mob for the beneficent operation of law,
                       and riots as the result of a pre- and the legally appointed law-makers.
                       vious protest.                    ...while the majority of the men were absent at a public
                                                         demonstration at Myrtle-avenue Park, in another part of
                                                         the city.
        CONFLICT       An ATTACK event occurs Make the slave first, midst, and last Follow no longer the
        Attack         when a person or a organiza- partial and side issues; strike for the abolition of slavery.
                       tion (Agent) performs an vio- . . . and hovering about Williamsport in an unaccountable
                       lent act causing harm or dam- manner - while the rebel troops are burning , destroying ,
                       age to another person or orga- pressing loyal men into service , or driving them from the
                       nization (Patient).               houses they hoped to possess , and the wheat-fields they
                                                         expected to reap , under the protecting folds of the Federal
                                                         flag .
                                                         The States which rebelled , after having been most thor-
                                                         oughly whipped in a great war , came back into the Union
                                                         upon their promises to abide by the Constitution and Laws
                                                         of the same .
        CONFLICT       A Injure-Die event is defined The life of loyal men are being sacrificed by scores, and
        Injure-Die     as a death or wound of a per- will, by and by, be sacrificed by thousands.
                       son (Patient) which is the re- Why should the nation pour out its blood and lavish its
                       sult of a violence event by an- treasure by the million, consent to protect and preserve the
                       other person (Agent).             guilty cause of all its troubles?
                                                         Our loss is estimated at two hundred killed and wounded.

Table 7: Event types with their descriptions and examples in the BRAD dataset (to be continued in Table 8). Event trigger
words are shown in bold.

                                                          2397
Type                   Description                                          Examples
     CONFLICT        Conflict Other events are re-      Let the slaves and free colored people be
     Other           served for events that are         called into service, and formed into a liberating army, to
                     related to conflicts, but not      march into the South ...
                     classified as one of the con-      Their efforts in this direction have been crowned by entire
                     flict event types above, includ-   success.
                     ing declaring war, threatening     He had called loud and earnestly upon the Government for
                     someone, forming an army, a        reinforcements; but the Government was practically deaf
                     movement, and a march.             to the call, and left him and his brave companions either
                                                        to perform a miracle, or to be completely overwhelmed by
                                                        superior numbers.
     JUSTICE         An Arrest-Jail Event occurs        It appears that he obtained his information direct from Ger-
     Arrest-Jail     when the movement of a per-        man where a supposed agent of the company had been
                     son (Patient) is constrained by    arrested, having in his possession incriminating docu-
                     a state actor (Agent).             ments.
                                                        It is said to be possible to imprison a man for debt in
                                                        Massachusetts.
                                                        He put her in jail at Eastville and she stayed there for some
                                                        time .
     JUSTICE         A SENTENCE Event takes             ... and any person so offending shall be guilty of a felony,
     Sentence        place when a punishment for        and shall, on conviction, be sentenced to confinement in
                     a person or an organization        the penitentiary of this State, for a period not less than
                     (Patient) is issued by a state     ten nor more than twenty years from the time of sentence
                     actor (Agent).                     pronounced on such person.
                                                        If any slave or servant be convicted or any crime the pun-
                                                        ishment whereof may be death or confinement in the peni-
                                                        tentiary
                                                        ..., but that it has been promptly put down and the guilty
                                                        parties summarily punished.
     JUSTICE         An EXECUTE Event occurs            Hector Grant James Horney, and Esther Anderson, white
     Execute         when the life of a person (Pa-     servants, were executed at Chester, Kent county.
                     tient) is taken by a state actor   All these , if the demand of the Administration and its
                     (Agent).                           friends is gratified , are to be hanged ; for the punishment
                                                        of treason by our law is death , ...
                                                        He made some confessions, and managed finally to escape,
                                                        but was arrested, taken to El Dorado, tried, and shot - not,
                                                        however, by regular process.
     HUMANITY An DEPRIVE Event occurs                   We thank Dr. CROFTS for the assurance of his sympathy,
     Deprive  when someone’s right (Pa-                 and hope often to receive his earnest words in behalf of our
              tient) is taken away, disre-              enslaved people.
              spect, or discouraged in any              Before the slaved is freed, this and a hundred other plans
              form of expression including              will be critically canvassed , and the discussion of each will
              but not limited to law, action,           elicit some truth .
              and statement.                            ... shall the four millions slaves , now robbed of all their
                                                        rights, and degraded to a level with brute beast...
     HUMANITY An ENDOW Event occurs                     And as for lynching - let all the officers of the law, with all
     Endow    when someone’s right is en-               the powers of the law, defend the rights and life of every
              riched or appreciated in any              prisoner.
              form of expression including              Before the slaved is freed , this and a hundred other plans
              but not limited to law, action,           will be critically canvassed
              action and statement.                     They are going into every community which offers free-
                                                        dom and protection to their citizens , where law is justly
                                                        administered and where the rights of man are respected ;
                                                        and there are many such sections in this country ; there will
                                                        be the future homes of the Negroes .

Table 8: Event types with their descriptions and examples in the BRAD dataset. Event trigger words are shown in bold.

                                                          2398
Event        Argument       Entity             Description                          Examples
                          Role         Type
                        Agent       PER          The person or organization The resolutions were proposed
                                    ORG          who proposes the law       by the gentleman from Ohio.
        LAW
                        Beneficiary PER          The person or organization
        Propose                                                              Slavery has been brought into
                                    ORG          who benefits from the pro-
                                                 posal                      the House .
                        Object         DOC       The proposed law
                        Time           TIME      When the proposal takes
                                                 place
                        Location       GPE       Where the proposal takes
                                                 place
                        Agent       PER          The person or organization Freedom to the slave should
                                    ORG          who approves the law        now be proclaimed from the
        LAW
                        Beneficiary PER          The person or organization Capitol .
        Approve
                                    ORG          who benefit from this ap- The act was duly approved by
                                                 proval
                                                                            the Executive , published, and
                        Object         DOC       The approved law
                                                                            announced to the civilized words.
                        Time           TIME      When the approval takes
                                                 place
                        Location       GPE       Where the approval takes
                                                 place
                        Agent       PER          The person or organization They shall strike down
                                    ORG          who repeals the law          Slavery .
        LAW
                        Beneficiary PER          The person or organization ... we can not see why the
        Repeal
                                    ORG          who benefits from this re- institution of private property
                                                 peal
                                                                              was to be abolished.
                        Object         DOC       The repealed law
                        Time           TIME      When the repeal takes place
                        Location       GPE       Where the repeal takes place
                        Agent          PER       The person or organization        The red cap was paraded
        CONFLICT                       ORG       who protests                      through Cape Haytien .
        Protest         Patient        ORG       The organization that the         The men were present at a
                                                 agent protest against             public demonstration in
                        Time           TIME      When the protest takes place       Brooklyn .
                        Location       GPE       Where the protest takes
                                                 place
                        Agent          PER        The attacking person or or-      Fremont is scouring the
        CONFLICT
                                       ORG        ganization                       rebels beyond the borders of
        Attack
                        Patient        PER        The target of the attack         Missouri .
                                       ORG                                         On Wednesday morning the
                        Time           TIME       When the attack takes place       rebels prepared to storm our
                        Location       GPE        Where the attack takes place     works in Plymouth .
        CONFLICT        Agent          PER       The person or organization         They cut men in half, and
        Injure-                        ORG       who attempts to attack or kill    pieces from exploded shells,
        Die             Patient        PER       The person or organization        killed and wounded several .
                                       ORG       who is injured or dead            Most of the negroes , we regret
                        Time           TIME      When the injury/death takes       to hear, are said to have been
                                                 place                             massacred.
                        Location       GPE       Where the injury/death takes
                                                 place

Table 9: Descriptions and examples of argument roles for each event type in the BRAD dataset (to be continued in Table 10).
Event triggers are bold and underlined. Arguments are highlighted using colors that match with their roles.

                                                            2399
Event        Argument       Entity            Description                         Examples
                           Role         Type
                          Agent         PER       The acting person or organi-    The poor men of the South
        CONFLICT
                                        ORG       zation                          have been pressed into
        Other
                          Patient       PER       The person or organization      the army to fight the battle of
                                        ORG       who is the object of the act    slavery.
                          Time          TIME      When the action takes place
                          Location      GPE       Where the action takes place
        JUSTICE           Agent         PER       The arresting agent or jailer   A few weeks ago , a man
        Arrest-                         ORG                                       named Hancock was arrested
        Jail              Patient       PER       The person who is arrested      in Union county, Arkansas .
                          Time          TIME      When the arrest/jail takes      Several free colored men were
                                                  place                           captured with the rebels in
                          Location      GPE       Where the arrest/jail takes      Fort Fisher .
                                                  place
                          Agent         PER       The judge or court              It has been represented there are
        JUSTICE
                                        ORG                                       confined in the Government
        Sentence
                          Patient       PER       The person who is sentenced     jail forty-five prisoners , who
                                        ORG                                       are not charged with crime, but
                          Time          TIME      When the sentencing takes       are represented as being slaves
                                                  place
                          Location      GPE       Where the sentencing takes
                                                  place
                          Agent         PER       The    person/organization      A man by the name of Martin .
        JUSTICE
                                        ORG       who orders or carry out the     was tried in El Dorado on a
        Execute
                                                  execution                       similar charge and hanged.
                          Patient       PER       The person who is executed       They seized him, and being
                          Time          TIME      When the execution takes
                                                                                  then convinced of his guilt,
                                                  place
                                                                                  shot him in the woods
                          Location      GPE       Where the execution takes
                                                  place
                          Agent         PER       The    person/organization      It is our right to liberate the
        HUMANITY
                                        ORG       who endows the patient           slaves of an enemy.
        Endow
                          Patient       PER       The person who is endowed       ... and give freedom to persons
                          Time          TIME      When the endowment takes        held to labor in the slave states .
                                                  place
                          Location      GPE       Where the endowment takes
                                                  place
                          Agent         PER       The    person/organization       They have outraged, and
        HUMANITY
                                        ORG       who deprives the patient        robbed, and murdered our
        Deprive
                          Patient       PER       The person who is deprived      peaceful citizens .
                          Time          TIME      When the deprivation takes      ... by the atrocities of the rebels
                                                  place                           delivering back into bondage
                          Location      GPE       Where the deprivation takes     thousands of slaves .
                                                  place

Table 10: Descriptions and examples of argument roles for each event type in the BRAD dataset. Event triggers are bold and
underlined. Arguments are highlighted using colors that match with their roles.

                                                            2400
You can also read