Harry Potter and the Action Prediction Challenge from Natural Language
David Vilares (david.vilares@udc.es) and Carlos Gómez-Rodríguez (carlos.gomez@udc.es)
Universidade da Coruña, CITIC, Departamento de Computación
Campus de Elviña s/n, 15071 A Coruña, Spain

arXiv:1905.11037v1 [cs.CL] 27 May 2019

Abstract

We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. As a case study, we consider the world of the Harry Potter fantasy novels and infer what spell will be cast next given a fragment of a story. Spells act as keywords that abstract actions (e.g. 'Alohomora' to open a door) and denote a response to the environment. This idea is used to automatically build HPAC, a corpus containing 82 836 samples and 85 actions. We then evaluate different baselines. Among the tested models, an LSTM-based approach obtains the best performance for frequent actions and large scene descriptions, but approaches such as logistic regression behave well on infrequent actions.

1 Introduction

Natural language processing (NLP) has achieved significant advances in reading comprehension tasks (Chen et al., 2016; Salant and Berant, 2017). These are partially due to embedding methods (Mikolov et al., 2013; Devlin et al., 2018) and neural networks (Rosenblatt, 1958; Hochreiter and Schmidhuber, 1997; Vaswani et al., 2017), but also to the availability of new resources and challenges. For instance, in cloze-form tasks (Hermann et al., 2015; Bajgar et al., 2016), the goal is to predict the missing word given a short context. Weston et al. (2015) presented bAbI, a set of proxy tasks for reading comprehension. In the SQuAD corpus (Rajpurkar et al., 2016), the aim is to answer questions given a Wikipedia passage. Kocisky et al. (2018) introduce NarrativeQA, where answering the questions requires processing entire stories. In a related line, Frermann et al. (2017) use fictional crime scene investigation data, from the CSI series, to define a task where models try to answer the question: 'who committed the crime?'.

In an alternative line of work, script induction (Schank and Abelson, 1977) has also been a useful approach to evaluate the inference and semantic capabilities of NLP systems. Here, a model processes a document to infer new sequences that reflect events that are statistically probable (e.g. go to a restaurant, be seated, check the menu, ...). For example, Chambers and Jurafsky (2008) introduce narrative event chains, a representation of structured knowledge about a set of events occurring around a protagonist. They then propose a method to learn statistical scripts, and also introduce two different evaluation strategies. With a related aim, Pichotta and Mooney (2014) propose a multi-event representation of statistical scripts to be able to consider multiple entities. The same authors (Pichotta and Mooney, 2016) have also studied the ability of recurrent neural networks to learn scripts, generating upcoming events given a raw sequence of tokens and using BLEU (Papineni et al., 2002) for evaluation.

This paper explores instead a new task: action prediction from natural language descriptions of scenes. The challenge is addressed as follows: given a natural language input sequence describing a scene, such as a piece of a story coming from a transcript, the goal is to infer which action is most likely to happen next.

Contribution. We introduce a fictional-domain English corpus set in the world of the Harry Potter novels. The domain is motivated by the existence of a variety of spells in these books, associated with keywords that can be seen as unambiguous markers for actions that potentially relate to the previous context. This is used to automatically create a natural language corpus coming from hundreds of users, with different styles, interests and writing skills. We then train a number of standard baselines to predict upcoming actions, a task that requires awareness of the context.
In particular, we test a number of generic models, from a simple logistic regression to neural models. Experiments shed some light on their strengths and weaknesses, and on how these relate to the frequency of each action, the existence of other semantically related actions, and the length of the input story.

2 HPAC: The Harry Potter Action Prediction Corpus

To build an action prediction corpus, we need to: (1) fix the set of actions, and (2) collect data where these occur. Data should come from different users, to approximate a real natural language task. It also needs to be annotated, determining that a piece of text ends up triggering an action. These tasks are however time consuming, as they require annotators to read vast amounts of long texts. In this context, machine comprehension resources usually establish a compromise between their complexity and the cost of building them (Hermann et al., 2015; Kocisky et al., 2018).

2.1 Domain motivation

We rely on an intuitive idea that uses transcripts from the Harry Potter world to build a corpus for textual action prediction. The domain has a set of desirable properties for evaluating reading comprehension systems, which we now review.

The Harry Potter novels define a variety of spells. These are keywords cast by witches and wizards to achieve purposes, such as turning on a light ('Lumos'), unlocking a door ('Alohomora') or killing ('Avada Kedavra'). They abstract complex and non-ambiguous actions. Their use also makes it possible to build an automatic and self-annotated corpus for action prediction: the moment a spell occurs in a text represents a response to the environment, and hence it can be used to label the preceding text fragment as a scene description that ends up triggering that action. Table 1 illustrates this with some examples from the original books.

This makes it possible to consider texts from the magic world of Harry Potter as the domain for the action prediction corpus, and the spells as the set of eligible actions.[1] Determining the length of the preceding context, namely the snippet, that is to be considered as the scene description is however not trivial. This paper reports experiments (§4) using snippets with the 32, 64, 96 and 128 tokens previous to an action. We provide the scripts needed to rebuild the corpus using arbitrary lengths.[2]

[1] Note that the corpus is built in an automatic way, so some occurrences might not correspond to actions but, for example, to a description of the spell, or might even be false positive samples. Related to this, we have not censored the content of the stories, so some of them might contain adult content.
[2] https://github.com/aghie/hpac

2.2 Data crawling

The number of occurrences of spells in the original Harry Potter books is small (432 occurrences), which makes it difficult to train and test a machine learning model. However, the amount of fan fiction available for this saga makes it possible to create a large corpus. For HPAC, we used fan fiction (and only fan fiction texts) from https://www.fanfiction.net/book/Harry-Potter/ and a version of the crawler by Milli and Bamman (2016).[3] We collected Harry Potter stories written in English and marked with the status 'completed'. From these we extracted a total of 82 836 spell occurrences, which we used to obtain the scene descriptions. Table 2 details the statistics of the corpus (see also Appendix A). Note that, similar to Twitter corpora, fan fiction stories can be deleted over time by users or admins, causing losses in the dataset.[4]

[3] Due to the website's Terms of Service, the corpus cannot be directly released.
[4] They can also be modified, making it unfeasible to retrieve some of the samples.

Table 2: Corpus statistics (s is the length of the snippet).

Statistics               Training    Dev        Test
#Actions                 85          83         84
#Samples                 66 274      8 279      8 283
#Tokens (s=32)           2 111 180   263 573    263 937
#Unique tokens (s=32)    33 067      13 075     13 207
#Tokens (s=128)          8 329 531   1 040 705  1 041 027
#Unique tokens (s=128)   60 379      25 146     25 285

Preprocessing. We tokenized the samples with Stanford CoreNLP (Manning et al., 2014) and merged the occurrences of multi-word spells into a single token.
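Concretely, building a sample reduces to two mechanical steps: collapsing multi-word spells into single tokens, and slicing out the s tokens that precede each spell occurrence. The following Python sketch illustrates the idea; the trimmed spell inventory, the whitespace tokenization and all names are illustrative stand-ins for the released scripts, which use the full 85-spell list and Stanford CoreNLP:

```python
# A minimal sketch of the corpus-building step: merge multi-word spells
# into single tokens, then label the snippet_len tokens preceding each
# spell occurrence as the scene description for that action.
# SPELLS and MULTIWORD are toy subsets of the real 85-action inventory.

SPELLS = {"alohomora", "lumos", "avada_kedavra", "expecto_patronum"}
MULTIWORD = {("avada", "kedavra"): "avada_kedavra",
             ("expecto", "patronum"): "expecto_patronum"}

def merge_spells(tokens):
    """Collapse multi-word spell occurrences into a single token."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in MULTIWORD:
            merged.append(MULTIWORD[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def extract_samples(tokens, snippet_len=32):
    """Yield (snippet, action) pairs: the tokens before each spell."""
    for i, tok in enumerate(tokens):
        if tok.lower() in SPELLS:
            snippet = tokens[max(0, i - snippet_len):i]
            if snippet:
                yield snippet, tok.lower()

story = ("She grabbed Harry 's wand , tapped the lock , and "
         "whispered Alohomora").split()
for snippet, action in extract_samples(merge_spells(story), snippet_len=8):
    print(action, "<-", " ".join(snippet))
```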
3 Models

This work addresses the task as a classification problem, and in particular as a sequence-to-label classification problem. For this reason, we rely on standard models used for this type of task: multinomial logistic regression, a multi-layer perceptron, convolutional neural networks and long short-term memory networks. We outline the essentials of each of these models, but treat them as black boxes. In a related line, Kaushik and Lipton (2018) discuss the need to provide rigorous baselines that help better understand the improvement coming from future, more complex models, and also the need not to demand architectural novelty when introducing new datasets.

Although not done in this work, an alternative (but also natural) way to address the task is as a special case of language modelling, where the output vocabulary is restricted to the size of the 'action' vocabulary. Also, note that the performance for this task is not expected to reach perfect accuracy, as there may be situations where more than one action is reasonable, and also because writers tell a story playing with elements such as surprise or uncertainty.

The source code for the models can be found in the GitHub repository mentioned above.
Table 1: Examples from the Harry Potter books showing how spells map to reactions to the environment.

Text fragment: "Ducking under Peeves, they ran for their lives, right to the end of the corridor where they slammed into a door - and it was locked. 'This is it!' Ron moaned, as they pushed helplessly at the door, 'We're done for! This is the end!' They could hear footsteps, Filch running as fast as he could toward Peeves's shouts. 'Oh, move over', Hermione snarled. She grabbed Harry's wand, tapped the lock, and whispered, 'Alohomora'."
Action: Unlock the door

Text fragment: "And then, without warning, Harry's scar exploded with pain. It was agony such as he had never felt in all his life; his wand slipped from his fingers as he put his hands over his face; his knees buckled; he was on the ground and he could see nothing at all; his head was about to split open. From far away, above his head, he heard a high, cold voice say, 'Kill the spare.' A swishing noise and a second voice, which screeched the words to the night: 'Avada Kedavra'."
Action: Kill a target

Text fragment: "Harry felt himself being pushed hither and thither by people whose faces he could not see. Then he heard Ron yell with pain. 'What happened?' said Hermione anxiously, stopping so abruptly that Harry walked into her. 'Ron, where are you? Oh, this is stupid' - 'Lumos'."
Action: Turn on a light

Notation. $w_{1:n}$ denotes a sequence of words $w_1, \ldots, w_n$ that represents the scene, with $w_i \in V$. $F_\theta(\cdot)$ is a function parametrized by $\theta$. The task is cast as $F: V^n \rightarrow A$, where $A$ is the set of actions.

3.1 Machine learning models

The input sentence $w_{1:n}$ is encoded as a one-hot vector, $v$ (total occurrence weighting scheme).

Multinomial Logistic Regression. Let $\mathrm{MLR}_\theta(v)$ be an abstraction of a multinomial logistic regression parametrized by $\theta$. The output for an input $v$ is computed as $\arg\max_{a \in A} P(y=a|v)$, where $P(y=a|v)$ is a softmax function, i.e. $P(y=a|v) = \frac{e^{W_a \cdot v}}{\sum_{a' \in A} e^{W_{a'} \cdot v}}$.

Multi-Layer Perceptron. We use one hidden layer with a rectifier activation function ($\mathrm{relu}(x) = \max(0, x)$). The output is computed as $\mathrm{MLP}_\theta(v) = \mathrm{softmax}(W_2 \cdot \mathrm{relu}(W \cdot v + b) + b_2)$.
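As a rough illustration of these two bag-of-words baselines, the sketch below builds count vectors and fits a multinomial logistic regression and a one-hidden-layer relu MLP with scikit-learn. The toy snippets are invented, and the 128-unit hidden layer follows the setup described in §4; this is not the paper's implementation, only an equivalent formulation:

```python
# A minimal sketch of the Section 3.1 baselines: token counts ("total
# occurrence weighting") fed to a multinomial logistic regression and to
# a one-hidden-layer MLP, MLP(v) = softmax(W2 . relu(W . v + b) + b2).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

snippets = ["tapped the lock and whispered",
            "the corridor was pitch black",
            "he pointed his wand at the locked door",
            "she could not see in the dark"]
actions = ["alohomora", "lumos", "alohomora", "lumos"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(snippets)   # v: count-based encoding

mlr = LogisticRegression(max_iter=1000).fit(X, actions)
mlp = MLPClassifier(hidden_layer_sizes=(128,), activation="relu",
                    max_iter=500).fit(X, actions)

test = vectorizer.transform(["the door would not open"])
print(mlr.predict(test), mlp.predict(test))  # argmax over P(y=a|v)
```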
3.2 Sequential models

The input sequence is represented as a sequence of word embeddings, $w_{1:n}$, where $w_i$ is a concatenation of an internal embedding learned during training for the word $w_i$ and a pre-trained embedding extracted from GloVe (Pennington et al., 2014)[5] that is further fine-tuned.

[5] http://nlp.stanford.edu/data/glove.6B.zip

Long short-term memory network (Hochreiter and Schmidhuber, 1997). The output for an element $w_i$ also depends on the output for $w_{i-1}$. $\mathrm{LSTM}_\theta(w_{1:n})$[6] takes as input a sequence of word embeddings and produces a sequence of hidden outputs, $h_{1:n}$ (the size of $h_i$ is set to 128). The last output of the $\mathrm{LSTM}_\theta$, $h_n$, is fed to an $\mathrm{MLP}_\theta$.

[6] n is set equal to the length of the snippet.

Convolutional Neural Network (LeCun et al., 1995; Kim, 2014). It captures local properties over continuous slices of text by applying a convolution layer made of different filters. We use a wide convolution, with a window slice size of length 3 and 250 different filters. The convolutional layer uses relu as the activation function. The output is fed to a max pooling layer, whose output vector is in turn passed as input to an $\mathrm{MLP}_\theta$.
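A minimal Keras sketch of the two sequential baselines follows, under the stated hyperparameters (hidden size 128 for the LSTM; 250 filters of window size 3 plus max pooling for the CNN). For brevity, a single trainable embedding stands in for the paper's concatenation of an internal embedding with fine-tuned GloVe vectors, and vocab_size, n and num_actions are placeholders; the actual implementation lives in the repository cited above:

```python
# A sketch of the Section 3.2 models: embeddings -> LSTM (last hidden
# state h_n) -> MLP, and embeddings -> wide Conv1D -> max pooling -> MLP.

from tensorflow.keras import layers, models

vocab_size, n, num_actions = 20000, 128, 85  # placeholders

def lstm_model():
    return models.Sequential([
        layers.Input(shape=(n,)),
        layers.Embedding(vocab_size, 100),
        layers.LSTM(128),                      # returns h_n only
        layers.Dense(128, activation="relu"),  # MLP hidden layer
        layers.Dense(num_actions, activation="softmax"),
    ])

def cnn_model():
    return models.Sequential([
        layers.Input(shape=(n,)),
        layers.Embedding(vocab_size, 100),
        layers.Conv1D(250, 3, padding="same", activation="relu"),
        layers.GlobalMaxPooling1D(),           # max pooling over time
        layers.Dense(128, activation="relu"),
        layers.Dense(num_actions, activation="softmax"),
    ])

model = lstm_model()
# Adam with lr=0.001 and mini-batches of 16 match the setup in Section 4.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```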
4 Experiments

Setup. All $\mathrm{MLP}_\theta$'s have 128 input neurons and one hidden layer. We trained for up to 15 epochs using mini-batches (size 16), Adam (learning rate 0.001) (Kingma and Ba, 2015) and early stopping.

Table 3 shows the macro and weighted F-scores for the models considering different snippet sizes.[7] To diminish the impact of random seeds and local minima in neural networks, results are averaged across 5 runs.[8] 'Base' is a majority-class model that maps everything to 'Avada Kedavra', the most common action in the training set; it helps test whether the models predict above chance level. When using short snippets (size 32), disparate models such as the MLR, the MLP and the LSTMs achieve similar performance. As the snippet size is increased, the LSTM-based approach shows a clear improvement on the weighted scores,[9] something that happens only marginally for the rest.

[7] As we have addressed the task as a classification problem, we use precision, recall and F-score as the evaluation metrics.
[8] Due to this averaging, some macro F-scores do not lie between the precision and the recall.
[9] For each label, we compute the scores and average them, weighted by the number of true instances of each label. The resulting F-score might not lie between precision and recall.

Table 3: Macro and weighted precision, recall and F-score over 5 runs.

                      Macro               Weighted
Snippet  Model   P     R     F       P     R     F
-        Base    0.1   1.2   0.2     1.3   11.5  2.4
32       MLR     18.7  11.6  13.1    28.9  31.4  28.3
32       MLP     19.1  9.8   10.3    31.7  32.1  28.0
32       LSTM    13.7  9.7   9.5     29.1  32.2  28.6
32       CNN     9.9   7.8   7.3     24.6  29.2  24.7
64       MLR     20.6  12.3  13.9    29.9  32.1  29.0
64       MLP     17.9  9.5   9.8     31.2  32.7  27.9
64       LSTM    13.3  10.3  10.2    30.3  33.9  30.4
64       CNN     9.8   7.8   7.4     25.0  29.9  25.4
96       MLR     20.4  13.3  14.6    30.3  32.0  29.3
96       MLP     16.9  9.5   9.8     30.2  32.6  27.8
96       LSTM    14.0  10.5  10.3    30.6  34.5  30.7
96       CNN     10.2  7.1   6.9     25.2  29.4  24.4
128      MLR     19.6  12.1  12.9    30.0  31.7  28.2
128      MLP     18.9  9.9   10.3    31.4  32.9  28.0
128      LSTM    14.4  10.5  10.5    31.3  35.1  31.1
128      CNN     8.8   7.8   7.1     24.8  30.2  25.0

However, from Table 3 it is hard to find out what the approaches are actually learning to predict. To shed some light, Table 4 shows their performance according to a ranking metric, recall at k. The results show that the LSTM-based approach is the top-performing model, but the MLP obtains only slightly worse results. Recall at 1 is low in both cases, which suggests that the task is indeed complex and that using just LSTMs is not enough. It is also possible to observe that, even if the models have difficulties predicting the correct action as a first option, they develop a certain sense of the scene and consider the right one among their top choices.

Table 4: Averaged recall at k over 5 runs.

Snippet  Model   R@1   R@2   R@5   R@10
-        Base    11.5  -     -     -
32       MLR     31.4  43.7  60.3  73.5
32       MLP     32.1  44.3  61.5  74.9
32       LSTM    32.2  44.3  61.5  74.7
32       CNN     29.2  41.1  58.1  71.6
64       MLR     32.1  44.9  61.9  74.3
64       MLP     32.7  46.0  63.5  76.6
64       LSTM    33.9  46.1  63.1  75.7
64       CNN     29.9  41.8  59.0  72.2
96       MLR     32.0  44.5  60.7  74.6
96       MLP     32.6  45.6  63.4  76.6
96       LSTM    34.5  46.9  63.7  76.1
96       CNN     29.3  41.9  59.5  72.8
128      MLR     31.7  44.5  61.0  74.3
128      MLP     32.9  45.8  63.2  76.9
128      LSTM    35.1  47.4  64.4  76.9
128      CNN     30.2  42.3  59.6  72.8
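The metrics in Tables 3 and 4 can be computed from model scores with a few lines. The following sketch, with toy predictions in place of real model output, shows macro and weighted F via scikit-learn and a recall-at-k helper (the fraction of samples whose gold action appears among the k highest-scoring actions):

```python
# A minimal sketch of the evaluation: macro/weighted F-score plus recall
# at k. y_true holds gold action indices; probs holds per-action scores
# as produced by any of the classifiers above (toy values here).

import numpy as np
from sklearn.metrics import f1_score

def recall_at_k(y_true, probs, k):
    """Fraction of samples whose gold action is in the top-k predictions."""
    topk = np.argsort(probs, axis=1)[:, -k:]   # indices of k best actions
    return float(np.mean([gold in row for gold, row in zip(y_true, topk)]))

y_true = np.array([0, 1, 2, 1])
probs = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.4, 0.4, 0.2],
                  [0.1, 0.2, 0.7]])
y_pred = probs.argmax(axis=1)

print("macro F:   ", f1_score(y_true, y_pred, average="macro"))
print("weighted F:", f1_score(y_true, y_pred, average="weighted"))
print("R@2:       ", recall_at_k(y_true, probs, k=2))
```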
Table 5 delves into this by splitting the performance of the models into frequent and infrequent actions (frequent being those above the average, i.e. those that occur more than 98 times in the training set, a total of 20 actions). There is a clear gap between the performance on these two groups of actions, with a difference of roughly 50 points in recall at 5. Also, a simple logistic regression performs similarly to the LSTM on the infrequent actions.

Table 5: Performance on frequent (above-average) and infrequent actions.

                      Frequent            Infrequent
Snippet  Model   Fwe   R@1   R@5     Fwe   R@1   R@5
-        Base    3.7   14.5  -       0.0   0.0   -
32       MLR     35.8  37.1  70.5    14.8  9.5   23.0
32       MLP     35.9  38.1  71.9    13.2  9.4   21.8
32       LSTM    37.1  38.4  71.6    11.7  8.6   23.0
32       CNN     33.1  35.5  69.3    7.1   5.2   15.2
64       MLR     36.7  37.9  71.8    14.9  9.9   24.0
64       MLP     36.4  39.2  74.5    11.0  7.9   21.6
64       LSTM    39.2  40.3  73.0    12.4  9.4   25.4
64       CNN     33.9  36.4  70.6    6.9   5.2   15.1
96       MLR     36.4  37.4  70.1    17.1  11.7  25.1
96       MLP     36.2  39.1  74.0    11.0  7.9   23.1
96       LSTM    39.6  41.1  73.7    12.4  9.6   25.8
96       CNN     32.7  35.8  71.6    6.3   4.8   13.7
128      MLR     35.4  37.2  70.5    15.4  10.7  25.0
128      MLP     36.5  39.5  74.0    11.1  8.2   22.3
128      LSTM    40.3  41.9  74.4    12.3  9.5   26.2
128      CNN     33.7  36.9  71.4    6.5   5.0   14.6
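The frequent/infrequent partition behind Table 5 is simply a threshold on training counts. A minimal sketch, using a handful of real counts from Table 6 as a toy stand-in for the full distribution (on the complete corpus, this split yields the 20 frequent actions reported above):

```python
# Split actions into frequent/infrequent by comparing each action's
# training count against the average count over all actions.

from collections import Counter

# Toy subset of the real training counts from Table 6.
train_counts = Counter({"avada_kedavra": 7937, "crucio": 7852,
                        "alohomora": 1365, "nox": 673,
                        "colloportus": 269, "peskipiksi_pesternomi": 7})

mean_count = sum(train_counts.values()) / len(train_counts)
frequent = {a for a, c in train_counts.items() if c > mean_count}
infrequent = set(train_counts) - frequent

print(f"threshold={mean_count:.0f}", sorted(frequent))
```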
Error analysis.[10] Some of the misclassifications made by the LSTM approach involve semantically related actions and counter-actions. For example, 'Colloportus' (to close a door) was never predicted. Its most common misclassification (14 out of 41 cases) was 'Alohomora' (to unlock a door), which was 5 times more frequent in the training corpus. Similarly, 'Nox' (to extinguish the light from a wand) was correctly predicted 6 times, while 36 misclassifications corresponded to 'Lumos' (to light a place using a wand), which was 6 times more frequent in the training set. Other, less frequent spells that denote vision and guidance actions, such as 'Point me' (the wand acts as a compass pointing north) and 'Homenum revelio' (to reveal a human presence), were also mainly misclassified as 'Lumos'. This indicates that the LSTM approach has difficulties disambiguating among semantically related actions, especially if their occurrences were unbalanced in the training set. This issue is in line with the tendency observed for recall at k. Spells intended for much more specific purposes, according to the books, obtained a performance significantly higher than the average, e.g. F-score('Riddikulus') = 63.54, F-score('Expecto Patronum') = 55.49 and F-score('Obliviate') = 47.45. As said before, the model is significantly biased towards frequent actions: for 79 out of the 84 gold actions in the test set, we found that the samples tagged with those actions were mainly classified into one of the top 20 most frequent actions.

[10] Computed over one of the runs of the LSTM-based approach, with the snippet size set to 128 tokens.

Human comparison. We collected human annotations for 208 scenes involving frequent actions. The human accuracy/macro F/weighted F was 39.20/30.00/40.90; the LSTM approach obtained 41.26/25.37/39.86. Overall, the LSTM approach obtained similar performance, but its lower macro F-score could be an indicator that humans can distinguish among a wider spectrum of actions. As a side note, super-human performance is not strange in other NLP tasks, such as sentiment analysis (Pang et al., 2002).

5 Conclusion

We explored action prediction from written stories. We first introduced a corpus set in the world of the Harry Potter literature, where spells act as keywords that abstract actions. This idea was used to label a collection of fan fiction. We then evaluated standard NLP approaches, from logistic regression to sequential models such as LSTMs. The latter performed better in general, although vanilla models achieved a higher performance on actions that occurred only a few times in the training set. An analysis of the output of the LSTM approach also revealed difficulties in discriminating among semantically related actions.

The challenge proposed here corresponds to a fictional domain. A future line of work we are interested in is to test whether the knowledge learned with this dataset can be transferred to real-world actions (i.e. real-domain setups), or whether such transfer is not possible and a model needs to be trained from scratch.

Acknowledgments

This work has received support from the TELEPARES-UDC project (FFI2014-51978-C2-2-R) and the ANSWER-ASAP project (TIN2017-85160-C2-1-R) from MINECO, from Xunta de Galicia (ED431B 2017/01), and from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150).

References

Ondrej Bajgar, Rudolf Kadlec, and Jan Kleindienst. 2016. Embracing data abundance: BookTest dataset for reading comprehension. arXiv preprint arXiv:1610.00956.

Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL-08: HLT, pages 789–797.

Danqi Chen, Jason Bolton, and Christopher D. Manning. 2016. A thorough examination of the CNN/Daily Mail reading comprehension task. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2358–2367, Berlin, Germany. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Lea Frermann, Shay B. Cohen, and Mirella Lapata. 2017. Whodunnit? Crime drama as a case for natural language understanding. Transactions of the Association for Computational Linguistics, 6:1–15.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693–1701.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Divyansh Kaushik and Zachary C. Lipton. 2018. How much reading does reading comprehension require? A critical investigation of popular benchmarks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5010–5015.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751. Association for Computational Linguistics.

Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations.

Tomas Kocisky, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gabor Melis, and Edward Grefenstette. 2018. The NarrativeQA reading comprehension challenge. Transactions of the Association for Computational Linguistics, 6:317–328.

Yann LeCun, Yoshua Bengio, et al. 1995. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10).

Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Smitha Milli and David Bamman. 2016. Beyond canonical texts: A computational analysis of fanfiction. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2048–2053. Association for Computational Linguistics.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543.

Karl Pichotta and Raymond Mooney. 2014. Statistical script learning with multi-argument events. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 220–229. Association for Computational Linguistics.

Karl Pichotta and Raymond J. Mooney. 2016. Using sentence-level LSTM language models for script inference. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 279–289. Association for Computational Linguistics.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392. Association for Computational Linguistics.

Frank Rosenblatt. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386.

Shimi Salant and Jonathan Berant. 2017. Contextualized word representations for reading comprehension. ArXiv e-prints.

Roger C. Schank and Robert P. Abelson. 1977. Scripts: Plans, Goals and Understanding. Lawrence Erlbaum.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.

Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. 2015. Towards AI-complete question answering: A set of prerequisite toy tasks. CoRR, abs/1502.05698.

A Corpus distribution

Table 6 summarizes the label distribution across the training, development and test sets of the HPAC corpus.
Action                   #Training  #Dev  #Test
AVADA KEDAVRA            7937       986   954
CRUCIO                   7852       931   980
ACCIO                    4556       595   562
LUMOS                    4159       505   531
STUPEFY                  3636       471   457
OBLIVIATE                3200       388   397
EXPELLIARMUS             2998       377   376
LEGILIMENS               1938       237   247
EXPECTO PATRONUM         1796       212   242
PROTEGO                  1640       196   229
SECTUMSEMPRA             1596       200   189
ALOHOMORA                1365       172   174
INCENDIO                 1346       163   186
SCOURGIFY                1317       152   166
REDUCTO                  1313       171   163
IMPERIO                  1278       159   144
WINGARDIUM LEVIOSA       1265       158   154
PETRIFICUS TOTALUS       1253       175   134
SILENCIO                 1145       153   136
REPARO                   1124       159   137
MUFFLIATO                1005       108   92
AGUAMENTI                796        84    86
FINITE INCANTATEM        693        90    75
INCARCEROUS              686        99    87
NOX                      673        82    80
RIDDIKULUS               655        81    88
DIFFINDO                 565        90    82
IMPEDIMENTA              552        88    79
LEVICORPUS               535        63    68
EVANESCO                 484        53    59
SONORUS                  454        66    73
POINT ME                 422        57    69
EPISKEY                  410        55    59
CONFRINGO                359        52    48
ENGORGIO                 342        52    41
COLLOPORTUS              269        26    41
RENNERVATE               253        24    33
PORTUS                   238        22    31
TERGEO                   235        23    26
MORSMORDRE               219        29    38
EXPULSO                  196        23    20
HOMENUM REVELIO          188        30    24
MOBILICORPUS             176        20    14
RELASHIO                 174        20    27
LOCOMOTOR                172        24    19
AVIS                     166        17    29
RICTUSEMPRA              159        16    26
IMPERVIUS                149        26    13
OPPUGNO                  144        18    7
FURNUNCULUS              137        20    20
SERPENSORTIA             133        14    15
CONFUNDO                 130        17    21
LOCOMOTOR MORTIS         127        14    15
TARANTALLEGRA            126        11    17
REDUCIO                  117        13    22
QUIETUS                  108        15    17
LANGLOCK                 99         12    19
GEMINIO                  78         5     10
FERULA                   78         6     10
ORCHIDEOUS               76         7     5
DENSAUGEO                67         13    8
LIBERACORPUS             63         7     5
APARECIUM                63         14    10
ANAPNEO                  62         6     5
FLAGRATE                 59         4     11
DELETRIUS                59         12    6
OBSCURO                  57         11    7
PRIOR INCANTATO          56         4     3
DEPRIMO                  51         2     2
SPECIALIS REVELIO        50         11    6
WADDIWASI                45         5     8
PROTEGO TOTALUM          44         9     5
DURO                     36         4     4
SALVIO HEXIA             36         8     5
DEFODIO                  34         2     6
PIERTOTUM LOCOMOTOR      30         4     3
GLISSEO                  26         4     3
MOBILIARBUS              25         3     4
REPELLO MUGGLETUM        23         2     5
ERECTO                   23         7     5
CAVE INIMICUM            19         5     2
DESCENDO                 19         0     1
PROTEGO HORRIBILIS       18         7     5
METEOLOJINX RECANTO      10         3     1
PESKIPIKSI PESTERNOMI    7          0     0

Table 6: Label distribution for the HPAC corpus.