A Survey on Sentiment and Emotion Analysis for Computational Literary Studies
Evgeny Kim and Roman Klinger
Institut für Maschinelle Sprachverarbeitung (IMS)
University of Stuttgart
70569 Stuttgart, Germany
{firstname.lastname}@ims.uni-stuttgart.de

arXiv:1808.03137v1 [cs.CL] 9 Aug 2018

Abstract

Emotions have often been a crucial part of compelling narratives: literature tells about people with goals, desires, passions, and intentions. In the past, classical literary studies usually scrutinized the affective dimension of literature within the framework of hermeneutics. However, with the emergence of the research field known as Digital Humanities (DH), some studies of emotions in a literary context have taken a computational turn. Given the fact that DH is still being formed as a science, this direction of research can be considered relatively new. At the same time, research in sentiment analysis started in computational linguistics almost two decades ago and is nowadays an established field that has dedicated workshops and tracks in the main computational linguistics conferences. This leads us to the question: what are the commonalities and discrepancies between sentiment analysis research in computational linguistics and digital humanities? In this survey, we offer an overview of the existing body of research on sentiment and emotion analysis as applied to literature. We precede the main part of the survey with a short introduction to natural language processing and machine learning and to psychological models of emotions, and provide an overview of existing approaches to sentiment and emotion analysis in computational linguistics. The papers presented in this survey come either directly from DH or from computational linguistics venues and are limited to sentiment and emotion analysis as applied to literary text.

1 Introduction and Motivation

1.1 On the Importance of Emotions

Human mental experiences consist of various phenomena that are not directly grounded in the objective perception of the world. A large portion of our daily decisions and interactions with others are driven by subconscious processes, including emotions and affect. Emotions play a crucial role when it comes to the arts (Johnson-Laird and Oatley, 2016). Unintentionally or not, when creating a piece of art, an artist introduces this emotional component into her work, which in turn makes us experience different emotions (Anderson, 2004; Ingermanson and Economy, 2009). When perceiving the arts, for example while reading a novel, people can feel emotions, because they are drawn into the stories that depict characters who act and feel, have desires and fears, reach success or fail (Djikic et al., 2009). Readers of fiction have richer emotional experiences and better abilities of empathy and understanding of others’ lives than people who do not consume literature (Mar et al., 2009; Kidd and Castano, 2013).

This observation has two major implications for the connection between literature and human emotions. First, literature requires that we use our emotions in order to understand it (Robinson, 2005), or better, we have to use our knowledge about human emotions to understand the feelings and moods of the fictional characters. Second, the emotional experiences we draw from literature are of the same sort we have in real life, which makes literature a valid source of the depiction of human emotions (Hogan, 2010, 2015).

All of the above means that emotions are tightly intertwined with the content of artistic work, and thus need to be studied in this context not only by humanities scholars but by psychologists as well, because research in this direction can benefit the
understanding of both the arts and emotions.

The link between emotions and arts in general is a matter of debates that date back to the ancient period, particularly to Plato, who viewed passions and desires as the lowest kind of knowledge and treated poets as undesirable members in his ideal society (Plato, 1969). In contrast, Aristotle’s view on the emotive component of poetry expressed in his Poetics (Aristotle, 1996) differed from Plato’s in that emotions do have great importance, particularly in the moral life of a person (de Sousa, 2017). For a long period of time, no single word or term existed in the English language to describe “the emotions” as a category of feeling (Downes and McNamara, 2016). However, in the late 19th century the emotion theory of arts stepped into the spotlight of philosophers. One of the first accounts on the topic is given by Leo Tolstoy in 1898 in his essay What is Art? (Tolstoy, 1962). Tolstoy argues that art can express emotions experienced in a fictitious context and that the degree to which the audience is convinced of them defines the success of the artistic work (cf. Anderson and McMaster (1986), Hogan (2010), and Piper and Jean So (2015)). But why do imaginary contexts make people experience emotions? This paradox, which later received the name “paradox of fiction”, was first pinpointed by the English philosopher Colin Radford (Radford and Weston, 1975). The paradox is formulated as follows:

1. We experience emotions towards fictitious characters, objects, or events.

2. In order to experience emotions, we must believe that these characters, objects, or events are real.

3. We do not believe that these characters or situations are real.

This paradox and its possible solutions are discussed (e.g., Walton (1978), Lamarque (1981), and Neill (1991)) and disputed (e.g., Tullmann and Buckwalter (2014)) by others, but we leave it to the reader to explore this philosophical problem. What we would like to highlight in relation to this paradox, though, is that Radford’s statements contributed to the popularity of the research on emotions and arts in many fields, from literary studies to psychology.

But what exactly can we learn from this interplay of emotion and literature? Emotional intelligence is a prerequisite to understanding literary fiction, but reading literature in turn enhances our emotional intelligence (Bal and Veltkamp, 2013; Djikic et al., 2013; Johnson, 2012; Samur et al., 2018; Djikic et al., 2009). Moreover, there is a growing body of literature that recognizes the deliberate choices people make with regard to their emotional states when seeking narrative enjoyment, for example from a book or a film (Zillmann et al., 1980; Ross, 1999; Bryant and Zillmann, 1984; Oliver, 2008; Mar et al., 2011). The influence of mood on these choices has been studied by Zillmann (1988). His mood-management theory proposes that readers and viewers, when seeking entertainment, make choices that will promote or maintain positive moods or reduce the negative ones. Usual objections to the mood-management theory point to the fact that people still enjoy tragedies or horror stories (Oliver, 1993; Oliver et al., 2000; Oliver, 2008), though these genres provoke negative emotions in them, such as sadness, fear, anxiety, and anger. A possible solution was proposed by Vorderer et al. (2004): enjoyment is explained by the notion of “meta-emotions”, i.e., emotions we experience towards our emotions directed at some object, which are deemed appropriate in a particular situation. Recent research in cognitive psychology suggests possible explanations why such experiences are perceived as positive in the first place (Tamborini et al., 2010).

New methods of quantitative research emerged in humanities scholarship, bringing forth the so-called “digital revolution” (Lanham, 1989) and the transformation of the field into what we know as digital humanities (Berry, 2012; Schreibman et al., 2015). The adoption of computational methods of text analysis and data mining from the then fast-growing areas of computational linguistics and artificial intelligence provided humanities scholars with new tools of text analytics and data-driven approaches to theory formulation (Vanhoutte, 2013; Jockers and Underwood, 2016).

Although one of the first works on the computational treatment of subjective phenomena originated from the area of artificial intelligence (AI) (Carbonell, 1979) (cited by Pang et al. (2008)), it was only a few years later that the first work on the computer-assisted modeling of emotions in literature was published (Anderson and McMaster, 1982). Challenged by the question why some texts are more interesting than others, Anderson and McMaster concluded in their paper that the “emotional tone” of a story can be responsible for
the reader’s interest. The results of their study suggest that a large-scale analysis of the “emotional tone” of a collection of texts is possible with the help of a computer program. There are three implications of this finding. First, they suggested that by identifying emotional tones of text passages one can model affective patterns of a given text or a collection of texts, which in turn can be used to challenge or test existing literary theories. Second, their approach to affect modeling demonstrates that the stylistic properties of texts can be defined on the basis of their emotional interest and not only linguistic characteristics. And finally, they suggest that functional texts (speeches, memos, advertisements) can be run through an emotion analysis program to test whether they will have the intended impact. With regard to these implications, the work by Anderson and McMaster (1982) is an important early piece as it laid out the “roadmap” for some of the basic applications of sentiment and emotion analysis of texts, namely sentiment and emotion pattern recognition from text and computational text characterization based on sentiment and emotion.

1.2 Scope and Structure of the Survey

The goal of this survey is to provide a comprehensive overview of the methods of emotion and sentiment analysis as applied to text. The survey is prepared with a digital humanities scholar in mind who is looking for an introduction to the existing research in the field of sentiment and emotion analysis from (primarily literary) text. All the studies presented in this article either come directly from digital humanities venues or deal with sentiments and emotions in the literary text context. A substantial number of the works of the latter category originate from the computational linguistics community. Their primary goal is often a methodological rather than an interpretative one. However, these works are still included in the survey, as we believe, and argue in the discussion, that interpretation and methodology should come hand in hand.

The survey does not cover applications of emotion and sentiment analysis in the areas of digital humanities that are not focused on text, e.g., sentiment analysis of visual art and design, movies, or music. It also does not provide an in-depth overview of all possible applications of emotion analysis in the computational context outside of the DH line of research. However, to make the reader aware of these applications, we briefly mention examples of them in Section 1.3.

The survey is structured as follows: Section 1 is an introduction. Section 2 introduces the reader to the field of natural language processing (NLP) and to the standard pipeline used in many NLP projects. Section 3 introduces the most common emotion theories used for the development of methods of computational emotion analysis, and provides an important background to emotion analysis of literary texts from a classic and computational perspective. Section 4 is the core of the survey and is an overview of different applications of sentiment and emotion analysis to literary text. Section 5 concludes the paper.

1.3 Other Applications of Emotion and Sentiment Analysis

The survey does not cover every possible work on emotion analysis that exists, even in the digital humanities context. The understanding and automatic analysis of emotions, sentiments, and affects have played an important role in computer science and artificial intelligence in recent decades. They are applied in a variety of studies, from which we discuss a selection in the following.

Robotics and Artificial Intelligence (AI)

While there is a large overlap between robotics and AI, the former is mostly an engineering field that deals with the design and use of robots, while the latter is more concerned with their actual operation including but not limited to decision making, problem solving, and reasoning (Brady, 1985). This also includes emotional intelligence, as more and more robots that are developed today serve not only pragmatic goals (e.g., cleaning, warehouse operation) but social ones as well (Breazeal, 2003). The motivation for affective computing in robotics and AI, therefore, is to build robots and virtual agents that are more human-like in terms of communication and reasoning.

Robots and virtual agents that are able to recognize and express emotions have been one of the foci in the fields of robotics and artificial intelligence for decades, both at the conceptual (Sloman and Croucher, 1981; Dorner and Hille, 1995; Wright, 1997; Coeckelbergh, 2012) and implementational (case-study) levels (Velásquez, 1998; Leite et al., 2008; Beck et al., 2010; Klein and Cook, 2012). Some works focus on theoretical implications of
emotional robots (Sloman and Croucher, 1981; Frijda and Swagerman, 1987; Evans, 2004; Arbib and Fellous, 2004), engaging in a fundamental discussion of such a possibility. A closely related body of research touches upon moral and ethical implications that arise when we talk about autonomous self-aware robots, which may make decisions that are against human moral judgements (Kahn Jr et al., 2012; Arkin et al., 2012; Malle et al., 2015).

Another thriving line of research related to AI and emotions deals with computational modeling of emotions in robotic and virtual agent applications. For example, Gratch and Marsella (2004) propose a new methodology of emotion modeling based on comparing the behavior of the computational model against human behavior and on the use of standard clinical instruments for assessing emotions. Pereira et al. (2005) outline the belief-desire-intention architecture of emotions based on four modules, namely the Emotional State Manager, the Sensing and Perception Module, the Capabilities module, and the Resources module, where each module is responsible for separate processes within the emotion concept. Jiang et al. (2007) put forth an extended belief-desire-intention model introducing primary and secondary emotions into the architecture.

Human-computer interaction (HCI) can be considered a subfield of artificial intelligence. It has also shown an increased interest in emotions. For instance, Cowie et al. (2001) examine basic issues related to the extraction of emotions from the user, consolidating psychological and linguistic analyses of emotions. Pantic and Rothkrantz (2003) argue that next-generation HCI designs will need to include the ability to recognize the user’s affective states in order to become more effective and more human-like. Both Beale and Creed (2008) and Beale and Creed (2009) provide an overview of the role of emotions in HCI, highlighting important lessons drawn from different research and providing guidelines for future research.

Computer Games and Virtual Reality

As video games become more complex and engaging, research in the field of game AI gains more popularity. The foci of the research are different (cf. Yannakakis (2012)), but the ones relevant to our discussion are mining of player data and enhancing non-player character behavior. The main motivation for researchers from this field is to study what makes players enjoy or detest a game and consequently make the gaming experience even more enjoyable.

On the one hand, recognition and elicitation of the user’s emotions through mining of player data (e.g., recognition of facial expressions and keystroke patterns, chat message analysis) has several applications in the field of game development. For example, by recognizing the player’s emotional state in a timely and accurate manner, the system can adaptively respond to it by changing the game environment (changing the pace or color scheme, or even suggesting that the player take a break). It can also play a role in educational games by customizing the learning process (Zhou, 2003; Conati, 2002; Conati et al., 2003). On the other hand, games that are able to cause an emotional response in players, such as fear or happiness, are more immersive (Sweetser and Johnson, 2004), and are thought to facilitate flow (Johnson and Wiles, 2003), which is a state of profound enjoyment and total immersion in an activity (Csikszentmihalyi and Csikszentmihalyi, 1992). Therefore, for the gaming process to be captivating and realistic, it is important that the player interacts with realistic non-player characters that express emotions in an intelligent way and react to the player’s emotions appropriately (Chaplin and Rhalibi, 2004; Hudlicka and Broekens, 2009; Bosser et al., 2007; Ochs et al., 2008, 2009; Li and Campbell, 2015; Popescu et al., 2014, i.a.).

Emotion Detection from Voice, Face, Body, and Physiology

In contrast to robotics and gaming, the recognition of emotions from bodily reactions focuses on humans: the goal is to identify patterns in acoustic speech signals, facial expressions, body postures, and physiology, and to classify them into different emotions, often with machine learning techniques. Calvo and D’Mello (2010) provide an in-depth survey to which we refer the reader for a comprehensive overview. We will, however, add that since the publication of Calvo and D’Mello’s survey, the methodology has changed in terms of the equipment used. Earlier researchers had to rely on laboratory equipment. Nowadays more and more studies are done with the help of non-invasive wearable devices (wrist bands and smartphone applications) that monitor the subjects’ emotional state (cf. Dupré et al. (2018), Ghandeharioun et al. (2016)). This turn seems warranted as it provides the researchers and developers with a more natural and close monitoring of the subjects, and hence, a
larger amount of research data.

Sentiment Analysis and Opinion Mining

Applications of sentiment analysis and opinion mining outside of a humanities context are not covered by this survey. However, in Section 3.3 we will give an overview of the existing methods used in sentiment analysis, as some of them are relevant in the context of the reviewed papers. In this section, we give a short overview of other application areas of sentiment analysis and opinion mining.

Opinion mining deals with tracking and automatic classification of opinions expressed by people (Liu, 2012). Opinion in a narrow sense is understood as evaluation of or attitude towards some object (Liu, 2015). Although opinion mining and sentiment analysis are often used interchangeably in the literature, opinions are not sentiments (Munezero et al., 2014): while sentiment is prompted by emotions, opinions are judgements based on objective or subjective interpretations of the topic that are not necessarily related to emotions.

Since sentiment analysis and opinion mining are concerned with human attitudes and feelings towards anything, they find applications in many areas, for instance in business and sociology. A popular application of opinion mining is automatic review analysis, often performed on a large collection of reviews that can originate from any domain, for example movie reviews (Amolik et al., 2016; Parkhe and Biswas, 2016; Tang et al., 2018), product reviews (books, electronics, DVDs, etc.) (Fang and Zhan, 2015; Xia et al., 2015), and restaurant and tourism product reviews (Kiritchenko et al., 2014; Gan et al., 2017; Marrese-Taylor et al., 2014). The goal of opinion mining in this context is to classify reviews into positive or negative with various levels of classification granularity.

Opinion mining is not limited to reviews. Computational social sciences have also witnessed an increased interest in automatic sentiment analysis, for example, in the political domain (Maragoudakis et al., 2011; Ceron et al., 2014; Rill et al., 2014; Liu and Lei, 2018). A goal of these studies is not only to understand the electoral preferences of the population, but also to gain insight into how these preferences are formed and propagated via social media (Yaqub et al., 2017).

A significant amount of research is concerned with automatic analysis of social media posts (most commonly, Twitter), for example Khan et al. (2015), Rosenthal et al. (2017), Asghar et al. (2018). The goal here is to enable automatic detection of the sentiments and opinions expressed in users’ posts. This can be useful for automatic monitoring of social media for emergency reports, violent language, and the mood of certain user groups.

2 A Very Short Introduction to Natural Language Processing and Machine Learning

In the introduction, we discussed the importance of emotion analysis for literary studies. In Section 3.3, we will provide an overview of research in sentiment and emotion analysis as applied to text. As that section relies on concepts from both natural language processing (NLP) and machine learning, we provide a short introduction to both disciplines in the following.

A comprehensive overview of NLP is beyond the scope of this survey paper. Therefore, we present NLP tasks as steps of a single pipeline (see Figure 1), which is common for many NLP projects. Readers who are familiar with NLP may skip this section without any hesitation. Readers who feel that they need an in-depth textbook-style introduction to the field are referred to Jurafsky and James (2000), which we follow in this section to describe some important concepts of NLP.

According to the Encyclopedia of Cognitive Science (Allen, 2006), NLP is a field that explores computational methods for interpreting and processing natural language, in either textual or spoken form. NLP addresses a variety of tasks related to language use and text analysis, from machine translation to code switching to named entity recognition to semantic role labeling. Regardless of the task, any NLP project includes several preliminary steps of speech or text processing that are necessary for these and other downstream tasks. We now proceed to the description of these fundamental steps in an NLP pipeline.

2.1 Typical NLP pipeline

Modern NLP pipelines may include a variety of processes and heavy feature engineering combining multiple features. Figure 1 shows the basic NLP pipeline that is most commonly used across various projects.

An NLP pipeline usually starts with speech recognition (if the input is speech) and then continues as if the input is directly text. The next step is tokenization and segmentation followed by
morphological analysis, syntactic analysis, and semantic analysis (Uszkoreit, 2001).

[Figure 1: Typical NLP pipeline: speech recognition, tokenization and segmentation, morphological analysis, syntactic analysis, and semantic analysis, illustrated with the example “he hates me”.]

2.1.1 Speech Recognition

So that a speech signal can be analyzed syntactically and semantically, it is typically first converted to text, which allows the same methods to be applied as for text input. First, the analogue speech signals are sampled by time, filtered, and decomposed into the frequency domain. The frequency components are then analyzed for features (the most common of which are mel frequency cepstral coefficients (Imai, 1983)) and converted into specific acoustic feature vectors. Then, a language model and vocabulary are used to calculate the phonetic likelihood of each speech sample. The decoded speech signal is then available as hypotheses of textual representations.

2.1.2 Tokenization and Segmentation

Text, converted from a speech signal or given directly as input, is passed to the tokenization and segmentation part of the pipeline, which outputs an array of tokens, i.e., the input text in which units, often words, are separated from each other. This processing is required before a morphological analysis can be applied.

In many languages, words are separated by whitespace, and splitting the text on it will, in most cases, produce a meaningful output. However, some words contain whitespace (e.g., San Francisco, rock and roll) and, depending on the application, some tokenizers may also tokenize multi-word expressions. Some tokenizers can also expand clitic contractions that are marked by apostrophes, for example don’t is converted to do not and I’m to I am.

In some cases, the text should be segmented into sentences first and only then into words. Essentially, the task of sentence segmentation is to separate sentences from each other. The most common cues for segmenting a text into sentences are punctuation marks. Though some symbols, like the question mark or exclamation point, are unambiguous markers of a sentence boundary, periods are more ambiguous, as they also indicate abbreviation boundaries (e.g., Mr., Mrs., Inc.). Therefore, it is often more appropriate to address word tokenization and sentence segmentation jointly.

2.1.3 Morphological Analysis

After the text is available in a segmented form, each word in the text can be analyzed for its morphological properties, e.g., inflection and case markers. For each token, a morphological parser outputs its lemma (a dictionary form) and a part-of-speech category with the morphosyntactic information. In addition, words can be stemmed, that is, reduced to their root with prefixes and suffixes removed. Morphological analysis is an important prerequisite for syntactic analysis.

Lemmatization is a process of casting a word to its base form. It is often required to reduce the variability of surface realizations of the words sharing the same root. While lemmatization involves a complex morphological analysis of a word (the algorithm should learn, for example, that the words sang, sung, sings share the same lemma form sing), stemming takes a simpler approach. In some applications, it is only important to map the word to its root, without full parsing of a word. Stemming does exactly that by chopping off the affixes of the words. For example, in web search, one may want to map foxes to fox, but might not need to know that foxes are plural (Jurafsky and James, 2000, p. 46). One popular algorithm for stemming is the Porter stemmer (Porter, 1980).

Morphological parsing is important not only for lemmatization and stemming, but for part-of-speech (POS) tagging as well. In fact, POS tagging is often based on the analysis of word affixes, especially suffixes (e.g., adjectives in English are recognized by -able, -ful, -ish among other suffixes, while verbs by -ate and -en). The number of POS tags used
varies, from seventeen, as in the Universal POS tagset (Petrov et al., 2012), to forty-five in the Penn Treebank (Marcus et al., 1993), to sixty-one used by the Lancaster UCREL project’s CLAWS (the Constituent Likelihood Automatic Wordtagging System) (Rayson and Garside, 1998).

2.1.4 Syntactic Analysis

Based on the morphological information obtained in the previous step, the words in the sentence are analyzed for their grammatical function (e.g., whether a word is a subject, object, modifier). This process is called parsing and it is important for analyzing the relationship between words, including disambiguating their meaning. The output of this step of the pipeline is a text represented by its syntactic or dependency tree.

There are two main types of parsing: constituency parsing based on Chomsky’s generative grammar (Chomsky, 1993), and dependency parsing based on dependency rules (Kübler et al., 2009). The main difference between the two types of parsing is that constituency parsing operates on a phrase level, where each type of phrase (e.g., noun phrase, verb phrase) is allowed to be composed of phrases of certain types, while dependency parsing operates on a word level and takes into account dependency rules between words.

2.1.5 Semantic Analysis

Finally, the sentences, phrases, or words of the text are analyzed for their meaning based on the information obtained in the preceding parts of the pipeline.

Semantic analysis is needed to disambiguate polysemous words, which is especially difficult given the wide range of meanings a single word can take. The most straightforward approach to word sense disambiguation is through the use of lexical resources, such as WordNet (Fellbaum, 1998). WordNet provides a set of lemmas for nouns, verbs, adjectives, and adverbs, where each lemma is annotated with a set of senses.

Many of the disambiguation algorithms, however, rely on contextual similarity when choosing the proper sense. There are different approaches to computing the word context. One of the most popular of them is the distributional semantics approach. Distributional semantics deals with semantic properties of words derived from their distribution across texts. The intuition behind its use is that words that occur in the same context tend to have similar meaning. Generally, these words are represented as vectors or arrays of numbers that are, in some way, related to word counts. These relationships are captured in a term-context matrix that represents how well each word fits with other words (context) in the corpus. Such a matrix is of dimensionality |V| × |V|, with V being the vocabulary, where each cell contains the number of times the row word (target) and the column word (context) co-occur in some context in some corpus. The matrix can then be used to calculate the similarity of the words, with the cosine measure being used most commonly.

In contrast to count-based sparse vector representations, most approaches rely on dense representations nowadays, either obtained by dimensionality reduction or by predicting the target word or its context. Examples for this group of vector representations include embeddings (Mikolov et al., 2013; Pennington et al., 2014).

2.2 Machine Learning

Although some of the previously described tasks, such as POS tagging or syntactic parsing, can be performed using rule-based approaches, most modern NLP pipelines make use of machine learning methods. The advantage of machine learning over rule-based systems becomes especially clear in the context of large data that needs to be processed. Writing down rules that capture all the minuscule differences and variety of language in some corpus is a tedious and by and large impossible task. That is where machine learning techniques come in handy. Machine learning is a subfield of artificial intelligence widely applied to many other disciplines, including natural language processing and data science. In the remainder of this section, we introduce three main paradigms of machine learning and briefly describe how they are used for solving NLP tasks.

2.2.1 Learning Paradigms

Machine learning is about using the right features to build the right models that achieve the right task (Flach, 2012, p. 13). A machine learning model learns to associate characteristics of each instance with a class to be predicted. These characteristics are commonly referred to as features. Following this definition, one must acknowledge that there is no single machine learning framework (cf. the “no silver bullet” argument by Brooks (1987)) that applies to all possible scenarios. Generally, machine learning settings bifurcate into supervised and unsupervised
learning paradigms, with each of the paradigms encompassing a wide range of models.

Supervised machine learning refers to the methods of labeling unseen data by learning a function from labeled training instances. A classifier is a function ĉ : X → Y, where X is an instance of data and Y = {c_1, c_2, ..., c_k} is a finite set of class labels. Labels can be numerical, ordinal, nominal, or structured, and often denote the class membership of each data instance. For example, in the task of POS tagging, labels are the actual POS tags assigned to the words in the training set. During the training phase, a classification algorithm learns a mapping from instances to labels, and later, during the prediction phase, it assigns class labels to new, unseen instances. Examples of supervised machine learning methods are the naïve Bayes classifier, support vector machines, decision trees, and supervised deep learning algorithms.

However, labeled training data is not always available or is prohibitively expensive. Moreover, sometimes researchers do not know the actual labels of the data they have. In this case, another family of machine learning algorithms, referred to as unsupervised machine learning, comes to the rescue. Clustering is one of the most popular unsupervised methods. It works by assessing the similarity between instances and arranging them in such a way that similar instances are put in the same cluster while dissimilar instances are put in different clusters. The output of an unsupervised machine learning algorithm can be used to better understand the nature and variance of the data, or as a prerequisite step to develop a supervised learning task with a set of defined labels.

2.2.2 Feature-based Learning

A common first approach when developing a machine-learning-based model is to map each instance into a representation of its characteristics, its features. Features are functions mapping instances to a set of values, for instance real numbers, Boolean values (e.g., “is this word an adjective/noun/verb?”, “is this word a proper noun?”, “is previous word ‘the’?”, etc.) and integers (when the feature is a count of something). In text classification, a common approach is the so-called bag-of-words, in which each word is represented by its count in an instance (for instance, a document or a sentence).

Features do not come ready-made with the data, and the process of model building and feature engineering is often iterative: features are added, removed, normalized, and fine-tuned until the model achieves the results one expects from it. Traditionally, feature engineering has received great attention in machine learning. However, recent successes in the family of machine learning methods known as deep learning have rendered features a less necessary ingredient than the model architecture.

2.2.3 Deep Learning

During the past decade, neural networks have regained their once-lost popularity, which vanished in the late 1990s due to the computational cost associated with them and the rise of other successful methods, for instance support vector machines. One of the factors that can be attributed to the re-emergence of neural networks is the availability of moderately expensive hardware and software capable of processing big data. What was not possible back in the 1990s has become possible now: neural networks can be trained on large amounts of data, with comparably large sets of parameters and “deep” architectures.

The general idea behind deep learning is to build models in which, specifically in NLP, words are represented in a continuous space (following the ideas of distributional semantics). Neural networks usually have several layers, which are trained jointly to fulfill the specific task at hand. Each of these layers can be interpreted as being responsible for different subtasks on the route to the common goal. The layers of the network extract and transform features sequentially. The layers that are close to the data input extract simple features, while higher layers learn more complex features derived from the lower layer features (Zhang et al., 2018, p. 2). Precisely because of this multi-layered structure of deep neural networks, manually designed features are of lower importance, as every layer extracts them from the input on its own.

Common network substructures for NLP include the following components: embedding layers, convolutional networks or a long short-term memory network, and a dense layer, which we discuss as examples in the following. Word embeddings transform words in a vocabulary to vectors of continuous real numbers that represent words as a function of their context and encode linguistic patterns. Word2Vec (Mikolov et al., 2013) is one popular word embedding approach that includes models for predicting a target word from its context and,
vice versa, predicting the context words given the target word. Dense layers combine the information received from the preceding components and often perform the final classification. Convolutional neural networks (CNN) (LeCun et al., 1989, 1998) are a special kind of neural network originally used in computer vision, inspired by the human visual cortex. Similar to the visual system, CNNs are able to detect relevant features in the input, which is processed in an “n-gram” fashion. This is achieved by using filters, which detect relevant features from the input, and max pooling, an operation that extracts the most representative numeric values from the filtered features. CNNs’ ability to capture the spatial correlation of features proved to be useful in the NLP context, as features important for text classification may be located in different places of the input.

Long short-term memory (LSTM) networks (Hochreiter and Schmidhuber, 1997) have a recurrent structure, interpret the input as a time series, and are capable of learning distant dependencies. In contrast to CNNs, which are limited by their filter sizes, LSTMs have a memory of more distant information. This comes at the cost of computational complexity. The trade-off between efficiency and complexity is realized in the mechanism of a “forget gate”. The gate discards irrelevant information (features) from the previously read input. This makes LSTMs efficient in learning sequential data: discarding irrelevant features improves the prediction, which is then not biased by unimportant details.

These and other components make deep learning an efficient tool for solving many problems. However, deep learning has its limitations. First, it often requires large amounts of data to recognize helpful characteristics in data, which is not always available, especially in certain domains. Second, deep learning algorithms are not always easily interpretable, which often makes it difficult to understand what meta-parameters of the network should be optimized for a better result.

2.3 Applications

Machine learning finds extensive application in natural language processing. The advancements in machine learning have contributed to the development of the field in recent years, both in terms of methodology and the efficiency of performing certain tasks. Most of the steps of the typical NLP pipeline we describe in Section 2.1 are performed today with the help of machine learning. The presented pipeline is rather fundamental and is often used as a part of other, larger pipelines designed for specific applications. These applications include dialogue systems, discourse analysis, document classification, text generation, text mining, machine translation, question answering, text summarization, and, finally, sentiment and emotion analysis.

With this necessary introduction to natural language processing and machine learning, we may now proceed to an overview of what sentiment and emotion analysis is and how it is performed computationally. But before that, we first need to provide a background in the emotion theories that exist in psychology and introduce the role they play in computational emotion analysis.

3 Background on Sentiment Analysis and Emotion Analysis

3.1 Affect and Emotion in Psychology

The history of emotion research has a long and rich tradition that followed Darwin’s 1872 publication of The Expression of the Emotions in Man and Animals (Darwin, 1872). The subject of emotion theories is so vast and diverse that it is not possible to even briefly mention all of the theories or name the prominent psychologists who contributed to emotion research throughout the nineteenth and twentieth centuries (see Gendron and Feldman Barrett (2009) for a brief history of ideas about emotion in psychology). Most emotion theories that appeared in the last century, however, fall into one of three traditions, namely basic, appraisal, and constructionist. In the pages that follow, we briefly discuss models of emotions as they are introduced in psychology. We limit these descriptions to those theories which have been used to formalize computational methods for automatic analysis in the digital humanities and natural language processing. Namely, we will introduce two theories from the basic tradition, and one theory from both the appraisal and constructionist ones.

3.1.1 Ekman’s Theory of Basic Emotions

The basic emotion theory was first articulated by Silvan Tomkins in the early 1960s (Tomkins, 1962). Inspired by Darwin’s view of emotions as mental states that cause stereotypic bodily expressions (Gendron and Feldman Barrett, 2009), Tomkins postulated that certain emotions are automatically triggered by objects or events in the world. Importantly, each episode of a certain emotion (or “instance”), Tomkins argues, is biologically similar to other instances of the same emotion or shares a common trigger. Tomkins’ own work in turn inspired one of his mentees, Paul Ekman, to formulate a new theory of emotions. Ekman called into question the existing emotion theories, which postulated that facial displays of emotion are socially learned and therefore vary from culture to culture. Together with Sorenson and Friesen, Ekman (Ekman et al., 1969) embarked on a field trip to New Guinea, Borneo, the United States, Brazil, and Japan to challenge this view. The outcome of their large-scale study led to a conclusion that would revolutionize the field of psychology for many years: facial displays of fundamental emotions are not learned but innate, and are therefore universal across nationalities. However, there are culture-specific prescriptions about how and in which situations emotions are displayed.

To come to this basic emotion definition, Ekman et al. select 30 photographs of adult males and females, children, professional actors, and mental patients. The photographs are selected in such a way that the portrayed faces express one of the six basic affects from Tomkins’ list of affects, excluding interest and shame, namely anger, fear, disgust, surprise, sadness, and happiness. The selection of affects is based on previous research (Ekman et al., 1971)¹ that finds that facial expressions pertaining to these emotions are clearly identifiable and can be scored by observers. These selected pictures are then shown to the participants of the study along with the list of six basic affects. The observer’s task is to categorize each picture into one of the six categories.

¹ In press at the time of publication of the Ekman et al. (1969) study.

Ekman’s research boosted interest in emotion and brought forth new challenges and questions about the nature of the emotions. In his subsequent studies, Ekman showed that both nature and nurture must be considered in the study of emotions (Ekman, 1971, 1992) and that facial expressions of emotions, even when produced voluntarily, generate the physiology and some subjective feelings pertaining to the true emotional experience (Ekman et al., 1983). The latter findings gave way to a new line of research in the biology of emotions studying the emotion-specific changes in the physiology.

Based on the observation of facial behavior in early development or social interaction, Ekman’s theory also postulates that emotions should be considered discrete categories (Ekman, 1993), rather than dimensional. Though this view allows for conceiving of emotions as having different intensities (for example, anger can take different intensities, from resentment to rage), it does not allow emotions to blend and leaves no room for more complex affective states in which individuals report the co-occurrence of like-valenced discrete emotions (Barrett, 1998). This and other postulates of the theory were widely criticized and disputed in the literature (cf. Russell (1994), Russell et al. (2003), Gendron et al. (2014), Barrett (2017)).

Regardless of the criticism that Ekman’s theory of basic emotions has undergone in recent years, the theory itself, as well as its methodology, was revolutionary at the time of its appearance and continued to shape the research on emotion in the late twentieth century. Ekman’s categories of basic emotions are frequently used in the research on computational facial emotion recognition (e.g., Essa and Pentland (1997), Pantic and Rothkrantz (2000), Bartlett et al. (2005)) as well as in emotion recognition from text.

3.1.2 Plutchik’s Wheel of Emotions

Robert Plutchik was an American psychologist and a professor of psychiatry at the Albert Einstein College of Medicine, who contributed to the study of emotions, violence and suicide². In the early 80’s he formulated his psychoevolutionary theory of emotions (Plutchik, 1991, cited from the revised version) together with the postulates that shape it, some of which overlap with the assumptions of Ekman’s theory (that there is a small number of basic emotions, which differ from each other both in physiology and behavior, and which can exist in varying degrees of intensity). However, there are important differences from Ekman’s study of emotions.

² Based on the information from https://www.the-emotions.com/robert-plutchik.html

First and foremost, Plutchik stated that, apart from a small set of basic emotions, all other emotions are mixed and derived from the various combinations of basic ones. He further categorized these other emotions into primary dyads (very likely to co-occur), secondary dyads (less likely to co-occur) and tertiary dyads (seldom co-occur) (Plutchik, 1991, p. 117). Love, for instance, is a primary-dyad emotion derived from both joy and
trust (the same applies to friendship). Delight is an example of a secondary-dyad emotion, which takes a little bit from both joy and surprise. Finally, guilt is a tertiary-dyad emotion, being a mixture of fear and joy. Some other examples of blended emotions are optimism (anticipation + joy), aggression (anticipation + anger), shame (fear + disgust), and envy (sadness + anger). Plutchik argues that most of our daily emotions are mixed, while primary emotions almost do not exist in their pure form. More importantly, to Plutchik, mixed emotions are the actual personality traits. He writes that “Emotions like pride, aggression, submission, and optimism are usually long-lasting, and in fact are often called personality traits.” (Plutchik, 1991, p. 120) and later concludes that “persisting situations which produce mixed emotions produce personality traits” (Plutchik, 1991, p. 121). In other words, a conflict between two or more emotions produces a new unique personality trait or attitude, which persists over time.

The second radical difference of Plutchik’s emotion theory from the basic emotion theory of Ekman is that emotion is not reduced to physiology only. Plutchik believes that humans recognize and express emotions not with any one particular physiological signal, but in terms of overall behavior. Hence, he claims, we should study emotions through behavior and not by using bodily measurements. Plutchik writes: “Emotion is not a thing in the sense as table or chair is” (Plutchik, 1991, p. 50). For Plutchik, emotion is “a patterned bodily reaction of destruction, reproduction, [. . . ] brought about by a stimulus.” (Plutchik, 1991, p. 151), and the properties of an emotion can only be inferred, but not measured. Like Ekman, Plutchik considers that emotions are innate, but this innateness has nothing to do with certain body parts or neural structures. Emotions are mere adaptive devices inherited by an individual from the process of evolution and struggle for survival. In this sense, adaptive behavior comes first, and emotion follows. Evolution taught us to explore, protect, reproduce, reject, destruct, and emotions are evolutionary devices that have relevance to basic biological adaptive processes.

In order to represent the organization and properties of the emotions as they were defined by his psychoevolutionary theory, Plutchik proposed a structural model of emotions, which he called a multidimensional model of emotions and which is better known today as Plutchik’s wheel of emotions (Plutchik, 1991). The wheel (Figure 2) is constructed in the fashion of a color wheel, with similar emotions placed closer together and opposite emotions 180 degrees apart. The wheel is designed as a cone, where the vertical dimension indicates the intensity, ranging from maximum intensity at the top to a state of deep sleep at the bottom. Such a shape implies that emotions become less distinguishable at lower levels of intensity. Essentially, the wheel is constructed from eight basic bipolar emotions: joy versus sorrow, anger versus fear, trust versus disgust, and surprise versus anticipation. The blank spaces between the leaves are the so-called primary dyads, that is, emotions that are mixtures of two of the primary emotions.

Figure 2: Plutchik’s wheel of emotions

Just as Ekman’s theory of basic emotions influenced the research in facial emotion recognition, the wheel model of emotions proposed by Plutchik had a great impact on the field of affective computing. However, in contrast to Ekman’s model, Plutchik’s wheel of emotions is primarily used in emotion recognition from text as a basis for emotion categorization (some examples are Cambria et al. (2012), Kim et al. (2012), Suttles and Ide (2013), Borth et al. (2013), Mohammad and Turney (2013), Abdul-Mageed and Ungar (2017)).

3.1.3 Russell’s Circumplex Model

Despite wide popularity and influence, the theory of basic emotions elaborated in detail by Ekman is challenged by some theoretical and empirical difficulties associated with it. The main objection raised to
the theory of basic emotions is that there are no reliable neural, physiological and facial correlates to specific basic emotions (Posner et al., 2005), which essentially challenges the idea of innate, and hence “universal”, emotions. At the same time, investigations into the subjective experience of emotions suggest that they arise from cognitive interpretations of physiological experiences (Cacioppo et al., 2000). Attempts to overcome the shortcomings of basic emotions theory and its unfitness for clinical studies led researchers to suggest various dimensional models, the most prominent of which is the circumplex model of affect proposed by James Russell (Russell, 1980). The word “circumplex” in the name of the model refers to the fact that emotional episodes do not cluster at the axes but at the periphery of a circle (Figure 3).

Figure 3: Circumplex model of affect. The horizontal axis represents the valence dimension, the vertical axis represents the arousal dimension.

At the core of the circumplex model is the notion of two dimensions plotted on a circle along horizontal and vertical axes. These dimensions are valence (how pleasant or unpleasant one feels) and arousal (the degree of calmness or excitement). The number of dimensions is not strictly fixed and there are adaptations of the model that incorporate more dimensions, such as the Valence-Arousal-Dominance model, which adds the dimension of dominance, the degree of control one feels over the situation that causes an emotion (Bradley and Lang, 1994).

Essentially, by moving from discrete categories to a dimensional representation, researchers are able to account for subjective experiences that do not fit nicely into isolated, non-overlapping categories. Accordingly, each affective experience can be depicted as a point in a circumplex that is described by only two parameters (valence and arousal) without need for labeling or reference to folk emotional concepts (Russell, 2003). However, the strengths of the model turned out to be its weaknesses: for example, it is not clear if there are basic dimensions in the model (Larsen and Diener, 1992) and what to do with qualitatively different events of fear, anger, embarrassment and disgust that fall in identical places in the circumplex structure (Russell and Barrett, 1999). Despite these shortcomings, the circumplex model of affect is widely used in psychological and psycholinguistic studies. In computational linguistics, the circumplex model is applied when the interest is in continuous measurements of valence and arousal rather than in specific discrete emotional categories.

3.2 Emotion Analysis in Classical Literary Studies

Until the end of the twentieth century, literary and art theories often disregarded the importance of the aesthetic and affective dimension of literature, which in part stemmed from the rejection of old-fashioned literary history that had explained the meaning of art works by the biography of the author (Sætre et al., 2014a). However, the affective turn taken by a wide range of disciplines in the past two decades, from political and sociological sciences to neurosciences to media studies, has refueled the interest of literary critics in human affects and sentiments.

We already mentioned several works that explore the link between the arts and emotions in the Introduction. In this section, we will talk about several other studies that focus on the emotions expressed in literary art form to set a ground for further discussion of differences between classical and computational approaches to theorizing about emotions.

We said earlier that there seems to be a consensus among literary critics that literary art and emotions go hand in hand. However, one might be challenged to define the specific way in which emotions come into play in the text. The exploration of this problem is presented by van Meel (1995). Underscoring the centrality of human destiny, hopes, and feelings in the themes of many artworks, from painting to literature, van Meel explores how
emotions are involved in the production of arts. Pointing out the big differences between the two media in their possibilities to depict human emotions (paintings convey nonverbal behavior directly, but lack the temporal dimension that novels have and use to describe emotions), van Meel provides an analysis of the nonverbal descriptions used by writers to convey the emotional behavior of the characters. Description of visual characteristics, van Meel speculates, responds to a fundamental need of a reader to build an image of a person and her behavior. Moreover, nonverbal descriptions add important information, which can in some cases play a crucial hermeneutical role, as in Kafka’s Der Prozess, where the fatal decisions for K. are made clear by gestures rather than words. However, gestures are not the only nonverbal channels that are used to convey emotions in literature. Van Meel defines eight channels (bodily characteristics, clothing, facial expressions, looking behavior, hand gestures, movements of the body, voice, and spatial relations) and offers a small-scale quantitative systematic analysis of their use in literature (on a sample of six twentieth-century “classics”). The analysis shows that the voice category was the most frequently used, followed by facial expressions and hand gestures. The results, van Meel suggests, show that such types of analysis could contribute to unraveling the hidden presuppositions about inner life and its outer appearance, and can help in reconstructing the emotional universe of individual writers and historical periods.

A hermeneutic approach through the lens of emotions is presented by Kuivalainen (2009), which provides a detailed analysis of linguistic features that contribute to characters’ emotional involvement in Mansfield’s prose. The study shows how, through the extensive use of adjectives, adverbs, deictic markers, and orthography, Mansfield steers the reader towards the protagonist’s climax. Subtly shifting between psycho-narration and free indirect discourse, Mansfield makes use of evaluative and emotive descriptors in psycho-narrative sections, often marking the internal discourse with dashes, exclamation marks, intensifiers, and repetition, which triggers an emotional climax. Various deictic features introduced in the text are used to pinpoint the source of emotions in the text, which helps in creating a picture of the characters’ emotional world. Verbs (especially in present tense), adjectives, and adverbs serve the same goal in Mansfield’s prose, to describe the emotional world of characters. Going back and forth from psycho-narration to free indirect discourse provides Mansfield with a tool to point out the significant moments in the protagonists’ lives and draw a separation between characters and narration.

Both van Meel’s and Kuivalainen’s works, separated from each other by more than a decade, underscore the importance of emotional language in the interpretation of characters’ traits, hopes, and tragedy, and this view in fact finds empirical support, for example in Barton (1996) and Van Horn (1997). Of course, the power of linguistic tools in conveying emotions cannot be underestimated. But at the same time its role in the creation and depiction of emotion should not be overestimated. That is, saying that someone looked angry or fearful or sad, as well as directly expressing characters’ emotions, are not the only ways authors resort to when building a believable fictional space filled with characters, action, and emotions. In fact, many novelists strove to express emotions indirectly by way of figures of speech or catachresis (Hillis Miller, 2014), first of all because emotional language can be ambiguous and vague, and, second, to avoid any allusions to Victorian emotionalism and pathos.

How can an author convey emotions indirectly? A book chapter by Hillis Miller (2014) in Exploring Text and Emotions (Sætre et al., 2014b) seeks the answer to exactly this question. Using the opening scenes of Conrad’s Nostromo as material, Hillis Miller shows how Conrad’s descriptions of an imaginary space generate emotions in readers without direct communication of emotions.

The opening chapter of Conrad’s Nostromo is an objective description of Sulaco, an imaginary land. The description is mainly topographical and includes occasional architectural metaphors, but it combines wide expanse with hermetically sealed enclosure, which generates “depthless emotional detachment” (Hillis Miller, 2014, p. 93). Through the use of present tense, Conrad suggests to the readers that the whole scene is timeless and does not change. The topographical descriptions are given in a pure materialist way: there is nothing behind clouds, mountains, rocks, and sea that would matter to humankind, not a single feature of the landscape is personified, not a single topographical shape is symbolic. Knowingly or unknowingly, the author argues, but by telling the reader what she should see, with no deviations from truth, Conrad em-