BHAAV (भाव) - A Text Corpus for Emotion Analysis from Hindi Stories

Yaman Kumar, Adobe Systems, Noida, ykumar@adobe.com
Debanjan Mahata∗, Bloomberg LP, dmahata@bloomberg.net
Sagar Aggarwal, NSIT-Delhi, sagara.co@nsit.net.in
Anmol Chugh, Adobe Systems, Noida, achugh@adobe.com
Rajat Maheshwari, USICT, New Delhi, rajat.usict.101164@ipu.ac.in
Rajiv Ratn Shah, IIIT-Delhi, rajivratn@iiitd.ac.in

∗ Author participated in this research as an adjunct faculty at IIIT-Delhi, India.

Abstract

In this paper, we introduce the first and largest Hindi text corpus, named BHAAV (भाव), which means emotions in Hindi, for analyzing emotions that a writer expresses through his characters in a story, as perceived by a narrator/reader. The corpus consists of 20,304 sentences collected from 230 different short stories spanning across 18 genres such as प्रेरणादायक (Inspirational) and रहस्यमयी (Mystery). Each sentence has been annotated into one of the five emotion categories (anger, joy, suspense, sad, and neutral), by three native Hindi speakers with at least ten years of formal education in Hindi. We also discuss challenges in the annotation of low resource languages such as Hindi, and discuss the scope of the proposed corpus along with its possible uses. We also provide a detailed analysis of the dataset and train strong baseline classifiers reporting their performances.

1 Introduction

Emotion analysis from text is the study of identifying, classifying and analyzing emotions (e.g., joy, sadness), as expressed and reflected in a piece of given text (Yadollahi et al., 2017). Its wide range of applications in areas such as customer relation management (Bougie et al., 2003), dialogue systems (Ravaja et al., 2006), intelligent tutoring systems (Litman and Forbes-Riley, 2004), analyzing human communications (Kövecses, 2003), natural text-to-speech systems (Francisco and Gervás, 2006), assistive robots (Breazeal and Brooks, 2005), product analysis (Knautz et al., 2010), and studying psychology from social media (De Choudhury et al., 2013), has drawn considerable attention from the scientific community, making it one of the important areas of research in computational linguistics.

The majority of the methods and resources developed in the emotion analysis domain deal with the English language (Yadollahi et al., 2017). Moreover, since there are no text based resources for emotion analysis in Hindi, our understanding of expression of emotions is only limited to English text. To this end, we develop a text corpus¹ for emotion analysis from stories written in Hindi, which is one of the 22 official languages of India and is among the top five most widely spoken languages in the world². The proposed corpus is the largest annotated corpus for studying emotions from Hindi text, and facilitates the development of linguistic resources in low-resource languages.

According to a joint report by KPMG and Google³ published in 2017, there are 234 million Internet users in India using one of the Indian languages as their medium of communication against 175 million users using English. This gap is predicted to increase by 2021, with users using Indian languages reaching 536 million. Thus, social media companies like Facebook and Internet search companies like Google have increased their support for popularly used Indian languages. Since Hindi is the most widely spoken Indian language, followed by Bengali and Telugu, the introduction of the BHAAV dataset is apt and timely.

¹ https://doi.org/10.5281/zenodo.3457467
² https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
³ https://assets.kpmg.com/content/dam/kpmg/in/pdf/2017/04/Indian-languages-Defining-Indias-Internet.pdf

Related to the task of emotion analysis in Hindi, previous attempts have been made
in developing corpus for predicting emotions from Hindi-English code switched language used in social media (Vijay et al., 2018) (2,866 sentences), and from auditory speech signals (Koolagudi et al., 2011). Some work has been undertaken in a closely related task of sentiment analysis and datasets have been created for identifying sentiments expressed in movie reviews (Mittal et al., 2013) (664 reviews), Hindi blogs (Arora, 2013) (250 blogs), and generating lexical resources like Hindi SentiWordnet (Joshi et al., 2010). Given the dearth of resources for analyzing emotions from Hindi text, we present and publicly share BHAAV, a corpus of 20,304 sentences collected from 230 different short stories (e.g., Eidgah (ईदगाह) by Munshi Premchand) written in Hindi, spanning across 18 genres (see Table 7 for complete list). Each sentence has been annotated by three native Hindi speakers who have at least ten years of reading and writing experience in the Hindi language, with the goal of identifying one of the following five popular emotion categories: anger, joy, suspense, sad, and neutral.

Stories are a melting pot of different types of emotions expressed by the author through the characters and plots that he develops in his writing. Emotions in storytelling have been previously studied, resulting in identification of six basic types of emotional arcs in English stories (Reagan et al., 2016), namely - Rags to riches, Tragedy, Man in a hole, Icarus, Cinderella, and Oedipus. This motivated us to develop BHAAV from Hindi stories. We believe that apart from studying emotions in Hindi text, the presented corpus would also enable studies related to the analysis of Hindi literature from the perspective of identifying the inherent emotional arcs. It also has the potential to catalyze research related to human text-to-speech systems geared towards improving automated storytelling experiences, for instance by inducing emotion cues while automatically synthesizing speech for stories from text.

We keep the order of the sentences intact as they occur in their source story. This makes the corpus ideal for performing temporal analysis of emotions in the stories, and provides enough information for training machine learning models that take into account temporal context.

Major contributions of this work are:
- Publicly share the first and the largest annotated Hindi corpus (BHAAV) for sentiment analysis, consisting of 20,304 sentences from 230 popular Hindi short stories spanning across 18 popular genres. Each sentence is labeled with one of the five emotion categories: anger, joy, suspense, sad, and neutral.
- Describe potential applications of BHAAV, the process of annotation, and main challenges in creating an emotion analysis text corpus for a low-resource language like Hindi.
- Propose strong baseline classifiers and report their results for identifying emotion expressed in a sentence of a story written in Hindi.

2 Related Work

It is necessary to mention that there has been extensive work in sentiment analysis, especially in the past two decades. Although there is significant intersection between techniques used for sentiment analysis and emotion analysis, the two are different in many ways. Emotion analysis is often tackled at a fine grained level and has historically proved to be more challenging due to subtleties involved in identifying and defining emotions. Additionally, resources for emotion analysis are scarce when compared to sentiment analysis, especially for low resource languages like Hindi. For a detailed survey of methods, datasets and theoretical foundations on sentiment analysis and emotion analysis, please refer to (Yadollahi et al., 2017; Lei et al., 2018; Cambria et al., 2017; Poria et al., 2017).

Analyzing emotions from text has been primarily manifested through four different types of tasks - Emotion Detection (Gupta et al., 2013), Emotion Polarity Classification (Alm et al., 2005), Emotion Classification (Yang et al., 2007), and Emotion Cause Detection (Gao et al., 2015). The scope of this work is limited to the task of Emotion Classification. (Pang et al., 2008) mention that emotions are expressed at four levels - morphological, lexical, syntactic and figurative, and note that as we move from morphological to figurative, the difficulty of the emotion analysis task increases and the number of resources for the same decreases. Developed from stories written in a morphologically rich language, BHAAV primarily deals with the first and the last levels.

Most of the work for creating data resources for emotion analysis has been fairly limited to building emotion lexicons (Strapparava et al., 2004; Pennebaker et al., 2001; Shahraki and Zaiane, 2017; Mohammad and Turney, 2013), or concentrated in annotating emotions of individual sentences without giving any context (Strapparava and Mihalcea, 2007). As indicated by many, this approach is non-holistic for a task such as emotion analysis (Schwarz-Friesel, 2015; Ortony et al., 1987). To this end, BHAAV not only presents annotated sentences, but also provides their context.

Lastly, when it comes to the task of analyzing emotions from text, there are no datasets available in Hindi. Although resource-poor Indian languages have started catching up with their richer counterparts in the domain of sentiment analysis (SA) (Mittal et al., 2013; Arora, 2013; Joshi et al., 2010), sufficient work still needs to be done considering the pace at which these languages are finding their uses in modern digitally driven India. The lack of resources can be judged from the wide usage of one of the very few Hindi datasets for SA tasks (Balamurali et al., 2012). It consists of just 200 positive and negative sentences for two major Indian languages, Hindi and Marathi. Another popular and recent attempt is by (Patra et al., 2015). Their dataset contains approximately 1500 tweets for the languages Hindi, Bengali and Tamil, annotated for the task of Aspect Based Sentiment Analysis. BHAAV is certainly an attempt to fill this gap and create a large, effective and high quality resource for emotion mining from text.

3 Language Specific Challenges

As already mentioned and pointed out in (Yadollahi et al., 2017), the computational methods used in the tasks pertaining to sentiment analysis (SA) can readily be applied to the emotion analysis (EA) tasks. Therefore, the challenges for EA from text are very similar to those of the domain of SA from text. For a detailed description of the challenges one can refer to (Mohammad, 2017). However, our task of identifying emotions from sentences poses additional challenges due to the inherent characteristics of the Hindi language. We point out some of these language specific challenges (Arora, 2013), in order to draw a complete picture of the intricacies of the task and emphasize that there is a scope of developing methods specific to Hindi, and not all methods developed for English can be directly translated to Hindi.

Word Order - The order in which words appear in a sentence plays an important role in determining polarity as well as subjectivity of the text. As opposed to English, which is a fixed order language, Hindi is a free order language. For any sentence in English to be grammatically correct the ‘subject’ (S) is followed by ‘verb’ (V), which is followed by ‘object’ (O), i.e., in the [SVO] pattern. For example the English sentence - “Ram (राम) ate (खाया) three mangoes (तीन आम)”, which follows [SVO], can be expressed in the following three ways in Hindi that do not adhere to the [SVO] pattern: (i) ‘राम ने तीन आम खाया’ [SOV], (ii) ‘तीन आम खाया राम ने’ [OVS], and (iii) ‘खाये तीन आम राम ने’ [VOS]. This lack of order can pose challenges to the machine learning algorithms that take into account the order of the words.

Morphological Variations - The Hindi language is morphologically rich. This means that a lot more information can be expressed in a word in Hindi for which one might end up writing many more words in English. One example is that of expressing genders. For example, when using the word ‘खायेगी’, which means ‘will eat’ in English, one can not only indicate that someone will eat but also provide cues of the person’s gender (in this case female - the male variant is ‘खायेगा’).

Handling Spelling Variations - A word with the same meaning can appear with multiple spelling variations. Occurrence of such variations can pose challenges for the machine learning models that have to take into account all the spelling variants. For example the word ‘मेहगा’, which means ‘costly’, has another variant ‘महंगा’ that means the same.

Lack of Resources - The lack of lexicons, developed techniques and elaborate resources in Hindi also adds to the challenge, which is also one of the main motivations for our work.

4 Corpus Creation and Annotation

One of our primary aims was to create a manually annotated large corpus for performing
emotion analysis from text in Hindi. We also wanted to capture the context in which a given piece of text occurs. Therefore, we decided to extract all the sentences from short stories belonging to genres popular in Hindi. Whenever possible we also searched for an audio book⁴ where the same story has been narrated by a narrator. This was done in order to help the annotators during the annotation process, in case they have to refer to examples of how a narrator/reader would express the emotion of a sentence in the context of the story. All our annotators were native Hindi speaking volunteers who had a minimum of 10 years of formal education in Hindi, and showed great interest in reading the stories.

⁴ Example of audio books for some of the stories - https://www.youtube.com/user/sameergoswami/playlists

The extracted text from 230 stories was split into sentences in an automated way and contained much unnecessary text that was not a part of the story. During the annotation process, the annotators filtered the unwanted text and only annotated the relevant portion. Whenever the sentences were not correctly split, the annotators also corrected them. A total of five annotators were used for annotating the entire corpus, such that each sentence gets at least three annotations. During the annotation process the annotators had access to the actual online story and the list of audio books. Each story was annotated in one sitting. It took nine months to finish the process.

The guidelines for annotating emotions were designed to be very short and concise with regards to the definitions of the categories to be assigned. Due to space restrictions, the guidelines for identifying each emotion are presented in the Appendix. In order to identify emotion categories best suited for our short story corpus, we did some initial annotations with (Plutchik, 1984)’s ‘basic’ emotions. From the output of the initial phase, we observed that not all basic emotions occurred prominently in the selected Hindi stories. There were five main categories of emotions which were found to be present extensively in the corpus - anger, joy, suspense, sad, and neutral. A brief description of all the emotion categories is presented in Table 1. A few examples to illustrate the various categories as annotated by the annotators are also given in Table 2. More examples along with common error cases are listed in the Appendix section of the paper.

Emotion Category | κ | α | Emotions expressed by the category
joy | 0.821 | 0.821 | joy, gratitude, happiness, pleasantness
anger | 0.807 | 0.807 | anger, rage, disgust, irritation
suspense | 0.757 | 0.757 | wonder, excitement, anxious uncertainty
sad | 0.835 | 0.835 | sadness, dis-consolation, loneliness, anxiety, misery
neutral | 0.789 | 0.788 | None of the above
BHAAV dataset | 0.802 | 0.802 |

Table 1: Emotions and their inter-annotator agreements as measured using Fleiss’ Kappa (κ) (Fleiss and Cohen, 1973) and Krippendorff’s alpha (α) (Krippendorff, 2011) for the entire BHAAV dataset.

Emotion | Sample Hindi Sentence | English Translation
joy | बादशाह ने कहा तुम्हारी कहानी पहली दोनों से अधिक मनोरंजक है | The king said that your story is more entertaining than the previous two stories
anger | रुपया नई देगा तो उसका खाल उतारकर बाजार में बेच देगा | If he does not give the money then I will take out his skin and sell it in the market
suspense | मजदूर ने अब तक तो झलक भर देखी थी अब तो उसे पूरी नजर भर देखा तो ठगा सा खड़ा रह गया | Till now the worker had only seen his glimpses, but when he saw him fully he was just stunned
sad | उसने रुँआसे होते हुए मम्मी की ओर देखा | With teary eyes he saw his mother
neutral | मैं इसकी मां हूं | I am his mother

Table 2: Sample sentences from BHAAV dataset for each emotion label.

The annotators were instructed not to be biased by their own interpretations of a statement in the story while labeling them. For example, take the case of the following sentence: एक दिवसीय क्रिकेट मैच में भारत से हार गया पाक (Pakistan lost to India in One Day International). An Indian annotator is often inclined to mark it as joy while a Pakistani annotator often marks it as sad, whereas an unbiased reader would read it as having neutral emotion. Thus, the annotators were asked to identify only the emotion that an unbiased narrator/reader of that story would like to express while reading it to someone. Whenever confused, they were asked to do the following: first, mark the reason why they think a sentence should have a particular emotion; second, refer to the audio book of the story if available and try to infer the emotion being expressed; third, if neither of the other options works, mark it as neutral.

General statistics of the dataset are presented in Table 3. As can be seen from the table, BHAAV is imbalanced towards neutral sentences. This is due to the fact that we took raw, unedited stories, making our dataset mimic the distribution of emotions as expressed in the author’s writings. Alternatively, in order to balance the dataset, we could have taken selective sentences. However, there would have been several drawbacks associated with such an approach: 1) loss of immediate sentence contexts; 2) separating the individual sentences from the bigger picture as developed by the author in different plots of the story, and 3) failure to capture the implicit emotions expressed by a character of the story (the emotions which a character is feeling vs what his words indicate). The overall inter-annotator agreements and the agreements for individual emotion categories are presented in Table 1. Next, we present some of the challenges that we faced during the annotation process that we think should be explicitly pointed out in order to provide a true picture of the corpus as well as to give an idea of the difficulties in carrying out such a process.

4.1 Challenges in Annotation

Apart from the challenge of annotating a low-resource language for which one can seldom get high quality crowd workers, there were certain challenges that were both specific to the domain of stories as well as generic ones peculiar to the tasks of sentiment and emotion analysis. Some of the prominent ones as identified from the feedback of the annotators are presented below with examples.

Identifying Implicit Emotions - The annotators were asked to identify the emotions whether they were explicitly or implicitly expressed. An example of explicitly expressed emotion would be Example 1, in which the speaker, by using words such as सुहावना (refreshing) and मनोहर (beautiful), clearly indicates that he is happy with the nature, thus expressing his joy in the statements.

. Example 1 - कितना मनोहर, कितना सुहावना भाव है| वृक्षों पर अजीब हरियाली है, खेतों में कुछ अजीब रौनक है, आसमान पर कुछ अजीब लालिमा है| (It is such a beautiful and enjoyable feeling. There is a strange greenery on the trees, some strange liveliness in the fields, there is some weird but enjoyable redness in the sky)

Identifying implicit emotions was sometimes confusing for the annotators and on taking a closer look we did find some of them being marked as neutral. An example of implicitly expressed emotion would be Example 2, in which a child’s grandmother is complaining about her son being too hasty about going to the mosque. She complains of his ignorance of knowing anything about driving a household and its inherent difficulties. Although there are no explicit words indicating her state of the mind, there is an implicit pointer that she is feeling irritated due to the haste and hence is angry over him. These types of emotions are totally contextual and could be identified only while reading the story. We believe that capturing these emotions is also necessary in order to make our annotation process holistic. Although we do not train any classification model in this work that can take these types of context into account in order to predict the final emotion of a sentence, we think that BHAAV as a dataset provides an opportunity to build such contextual models, making it a rich corpus unlike many other previous ones as already pointed out in Section 2. We would certainly like to take it up as a future work.

. Example 2 - अब जल्दी पड़ी है कि लोग ईदगाह क्यों नहीं चलते| इन्हें गृहस्थी की चिंता से क्या प्रयोजन| (Now he is feeling why do not people go to the mosque a little faster. What do they (the children) know about household chores)

Primary Target of Opinion - Another challenge comes when there is not even an implicit clue in the immediate context of a sentence. For instance, in a story, sometimes a character is developed as an adversary to a particular prop (i.e., PTO, Primary Target of Opinion). The prop can be another character or some inanimate object or phenomena. From the start of the story, the character expresses his emotions in a characteristic manner towards that PTO. Thus if a sentence or a context does not have any explicit clues to know the state of the mind of the character, identifying the PTO and the character’s emotions towards PTO gives some connotation to that sentence. This is in line with what was suggested in the work (Mohammad, 2016). An example of such an instance, as presented in Example 3, can be derived from the famous story by Munshi Premchand, Eidgah. The following sentence when read in isolation could potentially trick someone into thinking whether the boy speaking these dialogues is expressing mercy or even neutrality, when he is actually
expressing joy.

. Example 3 - मोहसिन- लेकिन दिल में कह रहे होंगे कि मिले तो खा लें| (Mohsin- But in the hearts, they must be thinking that if they could get it, they would eat it)

Sarcasm - A common challenge which annotators faced while annotating BHAAV is the case of sarcasm, which is again prevalent in most of the previous works in sentiment and emotion analysis. Sarcasm, as it occurs, is generally accompanied by either anger or delight (or sometimes both) of the speaker at the dismay of the PTO. Thus, in most cases, the emotional state of the speaker of sarcastic comments was a mixture of anger with the PTO and rejoicement at its expense. However, to account for the headline categories we chose for BHAAV, annotators were asked to differentiate between these two causes using the context provided and mark the category which most closely represents the sentence. This was sometimes challenging. For instance, in Example 4 the emotion closest to the state of the speaker is that of anger, when it could be easily misunderstood to be that of joy.

. Example 4 - हा हा हा! अब तुम बताओगे हम क्या बोलें? (Ha Ha Ha ! Now you would tell me what I should speak?)

Annotating Suspense - Suspense was the toughest category for the annotators and it proved very difficult for them to know exactly when a sentence is of this category. The annotators were asked to mark a sentence as suspense when there is some element in it which evokes a sense of wonder, anticipation or worry (see Example 5). Suspense is a unique feature of stories which does not get fully expressed in other types of written materials such as news articles, formal reports, and others.

. Example 5 - पिछले पहर को महफिल में सन्नाटा हो गया| हू-हा की आवाजें बन्द हो गयीं| लीला ने सोचा, क्या लोग कहीं चले गए, या सो गये? एकाएक सन्नाटा क्यों छा गया? (Last afternoon, the silence was over the entire place. There were no voices around. The sounds of Hu-Ha completely stopped. Leela thought, did people go somewhere, or perhaps they slept? Why all of a sudden there is silence everywhere?)

Next, we present the experiments performed for training the baseline models.

5 Baseline Models

In this section, we describe strong baseline models that we train for the task of identifying one of the emotions - anger, joy, suspense, sad, and neutral, from a given sentence taken from a Hindi story. Both classic machine learning and modern deep learning models are trained and their results are analyzed. We extensively use Sklearn (Pedregosa et al., 2011) and Keras (Chollet et al., 2018) as our machine learning toolkits.

5.1 Dataset

Emotion | No. of Sentences | No. of Sentences (Train data) | No. of Sentences (Test data)
joy | 2,463 | 2,242 | 221
anger | 1,464 | 1,321 | 143
suspense | 1,512 | 1,389 | 123
sad | 3,168 | 2,843 | 325
neutral | 11,697 | 10,478 | 1,219

Table 3: Distribution of sentences in different categories of emotions in the BHAAV dataset.

The BHAAV dataset was randomly shuffled and split into train and test datasets with a ratio of approximately 9:1 (18,273 training and 2,031 test sentences, per Table 3). The distribution of labels in the two datasets is shown in Table 3. The proportion of labels in the test dataset is kept similar to the training dataset. We train our models on the training dataset and test the final predictions on the test dataset. We do not create a separate validation dataset. However, we do use validation data extracted from the training data, whenever necessary for tuning the hyperparameters of the models.

5.2 Text Preprocessing

Before training the classification models one needs to preprocess the text and represent each sentence as a feature vector. We tokenize each sentence into words and remove punctuations. We do not remove the stopwords. Since we deal with Hindi, the standard word tokenizers that are suitable for the English language could not be used. Therefore, we used the tokenizer shipped with the Classical Language Toolkit⁵. Each sentence is vectorized after a feature extraction step for the classic machine learning models such as Support Vector Machines. Unigrams, Bigrams and Trigrams were generated as features for each sentence and their TF-IDF (Aizawa, 2003) scores were considered as the feature values.

⁵ http://docs.cltk.org/en/latest/hindi.html

One of the key components of the input fed to the deep learning models are pre-trained word embeddings (Kusner et al., 2015), that
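To make the feature extraction step concrete, here is a minimal plain-Python sketch of TF-IDF scoring over unigrams, bigrams and trigrams. It is an illustration under stated assumptions, not the authors' pipeline: whitespace splitting stands in for the CLTK Hindi tokenizer, the weighting is the textbook tf × log(N/df) without the smoothing or normalisation a library implementation would apply, and the two toy sentences are hypothetical. In Sklearn the same step is usually delegated to `TfidfVectorizer` with `ngram_range=(1, 3)`.

```python
import math
import string
from collections import Counter

# ASCII punctuation plus the Devanagari danda; stopwords are deliberately kept.
PUNCT = set(string.punctuation) | {"।"}


def tokenize(sentence):
    """Drop punctuation and split on whitespace (stand-in for a Hindi tokenizer)."""
    cleaned = "".join(" " if ch in PUNCT else ch for ch in sentence)
    return cleaned.split()


def ngrams(tokens, max_n=3):
    """All word n-grams for n = 1..max_n, joined with single spaces."""
    feats = []
    for n in range(1, max_n + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def tfidf(sentences, max_n=3):
    """Return one {feature: weight} dict per sentence."""
    docs = [Counter(ngrams(tokenize(s), max_n)) for s in sentences]
    doc_freq = Counter(feat for doc in docs for feat in doc)
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            feat: (count / total) * math.log(n_docs / doc_freq[feat])
            for feat, count in doc.items()
        })
    return vectors


# Hypothetical two-sentence corpus.
vectors = tfidf(["राम ने आम खाया।", "राम ने खाना खाया।"])
```

N-grams shared by every sentence (here `राम ने`) receive weight zero, while n-grams unique to one sentence carry the signal a linear classifier can use.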
are used for representing each word of the input sentences by a dense real valued vector. Since the dataset on which we train our models is relatively small, we use the pretrained word embeddings in order to prevent overfitting. This practice is commonly known as transfer learning⁶. We choose the Fasttext⁷ word embeddings (Bojanowski et al., 2016), trained on the Hindi Wikipedia corpus. This was a natural choice due to its easy availability. Additionally, Fasttext is possibly a better choice than other popular word embedding methods as it is more suitable for representing words belonging to morphologically rich languages like Hindi as described in Section 3.

⁶ ftp://ftp.cs.wisc.edu/machine-learning/shavlik-group/torrey.handbook09.pdf
⁷ https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md

While training the deep learning models, each sentence in the training and test dataset is converted to a fixed size document of 126 words (maximum length of a sentence in the dataset). Padding⁸ is used for sentences shorter than 126 words. Each word is represented as a 300 dimensional (D) vector by the word embedding model. All the words in the dataset are mapped to their corresponding word embedding vector. Whenever a word is not found in the vocabulary of the word embedding model we assign it a 300-D zero vector. Each sentence is then represented as a matrix of its constituent words and their corresponding embedding vectors, which is then fed as an input to deep learning algorithms.

⁸ https://keras.io/preprocessing/sequence/

5.3 Training

All the machine learning models were trained after selecting the hyperparameters on validation data. 10-fold cross validation was used for the classic techniques. For the deep learning models, random search (Bergstra and Bengio, 2012) was used for selecting the best hyperparameters among the ones shown in Table 4, that best fitted a fixed randomly selected validation set comprising 20% of the training data. Only 100 iterations of random search were performed. Once the hyperparameter tuning was done the final model was trained on the entire training data using the selected hyperparameters. Adam (Kingma and Ba, 2014) with two annealing restarts has been shown to work faster and perform better than SGD in other NLP tasks (Denkowski and Neubig, 2017). Therefore, we use the same as our optimization algorithm for the deep learning models. As the task is a multi-class classification problem, categorical cross entropy was used as the loss function, and the final layer of both the deep learning models consisted of a fully-connected dense neural network with the extracted features as the input and a softmax output giving the prediction probability for each of the five emotion categories.

Hyperparameter | Range
No. of Filters for CNN | 100, 200, 300, 400
Filter sizes for the CNN model | 1, 2, 3, 4, 5, 6
Dense Output Layer Size | 100, 200, 300, 400
Dropout Probability | 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
Learning Rate | 0.0001, 0.001
Batch Sizes | 8, 16, 32, 64, 128
Epochs | 10, 50, 100, 150
LSTM units | 8, 16, 32, 64, 128, 256

Table 4: Hyperparameter ranges used for random search during training deep learning models (CNN and Bidirectional LSTM).

Among the classic machine learning techniques, Support Vector Machine (SVM) with a linear kernel (Hsu et al., 2003), Logistic Regression (Yu et al., 2011) and Random Forests (Breiman, 2001) were trained. A shallow Convolutional Neural Network with a single input channel similar to (Severyn and Moschitti, 2015), and Bidirectional Long Short Term Memory networks with an architecture similar to (Mahata et al., 2018), are the deep learning models that were trained. A random classifier that randomly generated predictions from a label distribution similar to that of the training dataset was also implemented. Table 5 summarizes the performances of the classifiers on the test dataset for the following metrics - macro average precision, macro average recall, macro average F1-score, and accuracy (Sokolova and Lapalme, 2009). We chose macro-average measures as the data is imbalanced and macro-averaging will assign equal weights to all the categories, which gives a better generic performance of any classifier.

6 Discussion

In order to analyze the possible features chosen by a machine learning classification algorithm for discriminating between different categories
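The fixed-length input described here can be pictured with a small plain-Python sketch: pad each token list to 126 positions, then look every token up in an embedding table, falling back to a zero vector for out-of-vocabulary words as the text specifies. The `<PAD>` token name, the choice to pad on the right, and the tiny two-dimensional toy embeddings are all assumptions made for illustration; Keras' `pad_sequences`, for instance, pads at the front by default.

```python
MAX_LEN = 126   # longest sentence in the corpus, per the text
EMB_DIM = 300   # size of the pretrained word vectors, per the text


def pad_tokens(tokens, max_len=MAX_LEN, pad_token="<PAD>"):
    """Right-pad (or truncate) a token list to exactly max_len entries."""
    clipped = tokens[:max_len]
    return clipped + [pad_token] * (max_len - len(clipped))


def sentence_matrix(tokens, embeddings, max_len=MAX_LEN, dim=EMB_DIM):
    """Return a max_len x dim matrix (list of lists) for one sentence.

    Padding positions and words missing from `embeddings` become zero vectors.
    """
    zero = [0.0] * dim
    return [list(embeddings.get(tok, zero)) for tok in pad_tokens(tokens, max_len)]


# Hypothetical 2-D embeddings, small enough to inspect by eye.
toy_embeddings = {"राम": [1.0, 2.0], "आम": [3.0, 4.0]}
matrix = sentence_matrix(["राम", "अज्ञात", "आम"], toy_embeddings, max_len=5, dim=2)
```

With the real settings each sentence becomes a 126 × 300 matrix, which is the shape a convolutional or recurrent layer would then consume.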
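Random search over the ranges in Table 4 can be sketched in a few lines of plain Python. This is only an outline of the general technique under stated assumptions: the dictionary keys are hypothetical names for the Table 4 rows, and `evaluate` is a caller-supplied function that would train a model with one configuration and return its score on the held-out 20% validation split.

```python
import random

# Ranges copied from Table 4; the key names are illustrative only.
SEARCH_SPACE = {
    "num_filters": [100, 200, 300, 400],
    "filter_size": [1, 2, 3, 4, 5, 6],
    "dense_size": [100, 200, 300, 400],
    "dropout": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "learning_rate": [0.0001, 0.001],
    "batch_size": [8, 16, 32, 64, 128],
    "epochs": [10, 50, 100, 150],
    "lstm_units": [8, 16, 32, 64, 128, 256],
}


def random_search(evaluate, space, n_iter=100, seed=0):
    """Sample n_iter configurations uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_iter):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)          # higher is better
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Each iteration draws every hyperparameter independently, so 100 iterations cover only a small fraction of the full grid; that trade-off is the point of random search.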
of emotions and to validate the ability of the BHAAV dataset in providing such features to any classifier, we looked at the most important features chosen by the Logistic Regression model. Table 6 shows the top 10 most informative unigram features for each category of emotion chosen by the model in order to make the final predictions. As evident from the choices, words like प्रसन्न (glad), सुंदर (beautiful), खुश (happy), हँस (laugh), are sensible indicators of joy, and so are the words like अपमान (insult), गुस्सा (anger), क्रोध (anger), बदला (revenge), for anger. The other categories also show a similar pattern.

Emotion | Top 10 Important Unigram Features
joy | प्रसन्न (glad), सुंदर (beautiful), खुश (happy), हँस (laugh), संगीत (music), खिलौने (toys), मजा (fun), आनंद (joy), हँसकर (smilingly), उछल (jump)
anger | अपमान (insult), गुस्सा (anger), क्रोध (anger), बदला (revenge), मूर्ख (idiot), सजा (punishment), जहन्नुम (hell), आग (fire), [word lost in extraction] (evil), चिल्लाया (screamed)
suspense | आवाज़ (sound), आश्चर्य (astonishment), जिन्न (Genie), देखा (saw), युद्ध (war), छन… (sound of anklets), कहाँ (where), जादू (magic), अचानक (suddenly), जहाज (ship)
sad | रो (cry), मर (die), रोने (crying), दुख (sadness), हृदय (heart), दुखी (sad), जीवन (life), आँसू (tears), रोते (cry), भगवान् (God)
neutral | किसान (farmer), उसने (he), बिन्नी (Binny), पूछा (asked), दादाजी (grandfather), कल (tomorrow), पंडित (pundit), मेहता (mehta), मां (mother), आना (come)

Table 6: Top 10 most important features for each emotion category as identified by the Logistic Regression model during training.

Method | Macro Avg Precision | Macro Avg Recall | Macro Avg F1 | Accuracy
Logistic Regression | 0.58 | 0.62 | 0.58 | 0.62
SVM | 0.48 | 0.52 | 0.49 | 0.52
Random Forests | 0.44 | 0.59 | 0.45 | 0.59
CNN | 0.50 | 0.55 | 0.51 | 0.55
BLSTM | 0.43 | 0.60 | 0.47 | 0.60
Random Classifier | 0.40 | 0.40 | 0.40 | 0.40

Table 5: Performance of the baseline supervised classification models on BHAAV dataset.

We also looked at the performance of the classifiers for individual categories. The neutral category had the best performance consistently, which is quite easy to guess from the data distribution (Table 3) and it being the majority class. The performance of the suspense category was consistently low. Although the category of anger had a similar presence in BHAAV, it had better performance than suspense. This might be due to the presence of better discriminative features for anger than suspense. Another reason could be related to challenges associated with annotating the suspense category (Section 4.1).

Our analysis provides a brief insight into the BHAAV dataset from which we can conclude that it is an appropriate dataset for emotion identification and classification tasks. Although the dataset is created from stories, it can possibly be used for many other domains as it is rich in features indicating the five different emotions as presented in this work. The annotations were done from the perspective of a reader/narrator trying to express the emotion of a sentence, given the existing scenario in the story and whenever applicable trying to express the emotion of a character in the story. This also makes this dataset suitable for training automated text-to-speech interfaces (e.g., audio books) for story narration and improving them by infusing emotions in them.

6.1 Emotions and Genres

We started with 30 frequently used genres as mentioned by (Nagendra, 1994) and selected 500 popular online short stories. However, we narrowed down to the most frequent 18 genres (see Table 7 for complete list) and ended up with extracting text from 230 stories, depending on the availability of online content. Throughout the process of deciding on genres and finding online content relevant to them, we took help from some experts in Hindi literature who have done their PhD in Hindi literature.

Genres
आदर्शवादी (Idealist)
प्रेमपरक (Romantic)
शहरी जीवन (Urban Life)
शोषक और शोषित वर्ग (Exploiter and Exploited Class)
नीतिपरक (Moral Stories)
किसान जीवन (Life of a Farmer)
ऐतिहासिक (Historical)
प्रेरणादायक (Inspiration)
देश भक्ति संबंधित (Patriotic)
व्यक्तिगत जीवन की समस्या (Personal Issues/Problems)
रूढ़ि और अंधविश्वास (Dogmatic and Superstitious)
संयुक्त परिवार की समस्या (Joint Family Problems)
रहस्यमयी (Mystery)
यथार्थवादी (Realistic and Pragmatic)
ग्रामीण (Village Life)
उपदेशपरक (Instructive)
भोगे हुए यथार्थ की कहानी (Real Stories)
समाज सुधारक (Society and its Reformation)

Table 7: Genres present in BHAAV

[Figure 1: Flow of emotions in randomly selected stories from two different genres. Panels: (a) Idealist, (b) Exploiter and Exploited.]

BHAAV is appropriate for analyzing the flow of emotions in individual stories and studying them for different genres. We plotted the flow of emotions in a randomly picked story from two different genres as shown in Figure 1. It is observable from the figures that each story has its own distinct emotion footprint. It would be interesting to study them

learning for text-based emotion prediction. In Proceedings of the conference on human language technology and empirical methods in natural language processing, pages 579–586. Association for Computational Linguistics.

Piyush Arora. 2013. Sentiment analysis for hindi language. MS by Research in Computer Science.

AR Balamurali, Aditya Joshi, and Pushpak Bhattacharyya. 2012. Cross-lingual sentiment analysis for indian languages using linked wordnets. Proceedings of COLING 2012: Posters, pages 73–82.

James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Roger Bougie, Rik Pieters, and Marcel Zeelenberg. 2003. Angry customers don’t come back, they get back: The experience and behavioral implications of anger and dissatisfaction in services. Journal of the Academy of Marketing Science, 31(4):377–393.
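One common way to obtain such per-category feature lists from a linear model is to rank the features by their learned weight for each class. The sketch below shows that ranking in plain Python. It assumes the fitted model exposes one weight per (class, feature) pair, as scikit-learn's `LogisticRegression` does through `coef_`; whether the authors ranked features exactly this way is not stated, and the weights below are invented for illustration.

```python
def top_features(weights, feature_names, class_names, k=10):
    """Return the k highest-weighted features for every class.

    weights: one row per class, one column per feature.
    """
    top = {}
    for cls, row in zip(class_names, weights):
        ranked = sorted(zip(row, feature_names), reverse=True)[:k]
        top[cls] = [name for _, name in ranked]
    return top


# Hypothetical weights for two classes over three unigram features.
features = ["खुश", "गुस्सा", "देखा"]
classes = ["joy", "anger"]
weights = [
    [2.0, -1.0, 0.5],   # joy
    [-0.3, 1.5, 0.1],   # anger
]
ranking = top_features(weights, features, classes, k=2)
```

Large positive weights mark features that push a sentence towards a class, so the head of each ranked list is a quick sanity check that the model relies on sensible words.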
and draw interesting linguistic insights from the Hindi literature using BHAAV. Cynthia Breazeal and Rodney Brooks. 2005. Robot emotion: A functional perspective. Who 7 Future Work and Conclusion needs emotions, pages 271–310. In this work we publicly shared the first and Leo Breiman. 2001. Random forests. Machine learning, 45(1):5–32. the largest annotated corpus, named BHAAV, with 20,304 sentences in Hindi, for emotion Erik Cambria, Soujanya Poria, Alexander Gel- analysis. We provided a detailed description bukh, and Mike Thelwall. 2017. Sentiment anal- ysis is a big suitcase. IEEE Intelligent Systems, of the dataset, language specific challenges, 32(6):74–80. annotation process, challenges associated with annotations and reported performances of the François Chollet et al. 2018. Keras: The python deep learning library. Astrophysics Source Code baseline classification models trained on the Library. dataset for identifying emotions expressed in a sentence. Through different observations we Munmun De Choudhury, Michael Gamon, Scott Counts, and Eric Horvitz. 2013. Predicting de- confirm BHAAV to be rich with emotion cues pression via social media. ICWSM, 13:1–10. and point to the potential applications. In the future, we plan to work on enriching BHAAV Michael Denkowski and Graham Neubig. 2017. Stronger baselines for trustable results in with more annotations related to sentiment neural machine translation. arXiv preprint and discourse analysis, and believe that it will arXiv:1706.09733. prove to be a valuable resource in Hindi. Joseph L Fleiss and Jacob Cohen. 1973. The equivalence of weighted kappa and the intra- References class correlation coefficient as measures of reli- ability. Educational and psychological measure- Akiko Aizawa. 2003. An information-theoretic per- ment, 33(3):613–619. spective of tf–idf measures. Information Pro- cessing & Management, 39(1):45–65. Virginia Francisco and Pablo Gervás. 2006. 
Au- tomated mark up of affective information in en- Cecilia Ovesdotter Alm, Dan Roth, and Richard glish texts. In International Conference on Text, Sproat. 2005. Emotions from text: machine Speech and Dialogue, pages 375–382. Springer.
Kai Gao, Hua Xu, and Jiushuo Wang. 2015. A rule-based approach to emotion cause detection for chinese micro-blogs. Expert Systems with Applications, 42(9):4517–4528.

Narendra Gupta, Mazin Gilbert, and Giuseppe Di Fabbrizio. 2013. Emotion detection in email customer care. Computational Intelligence, 29(3):489–505.

Chih-Wei Hsu, Chih-Chung Chang, Chih-Jen Lin, et al. 2003. A practical guide to support vector classification.

Aditya Joshi, AR Balamurali, and Pushpak Bhattacharyya. 2010. A fall-back strategy for sentiment analysis in hindi: a case study. Proceedings of the 8th ICON.

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Kathrin Knautz, Tobias Siebenlist, and Wolfgang G Stock. 2010. Memose: search engine for emotions in multimedia documents. In Proceedings of the 33rd International ACM SIGIR Conference on Research and development in information retrieval, pages 791–792. ACM.

Shashidhar G Koolagudi, Ramu Reddy, Jainath Yadav, and K Sreenivasa Rao. 2011. Iitkgp-sehsc: Hindi speech corpus for emotion analysis. In Devices and Communications (ICDeCom), 2011 International Conference on, pages 1–5. IEEE.

Zoltán Kövecses. 2003. Metaphor and emotion: Language, culture, and body in human feeling. Cambridge University Press.

Klaus Krippendorff. 2011. Computing krippendorff’s alpha-reliability.

Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In International Conference on Machine Learning, pages 957–966.

Zhang Lei, Wang Shuai, and Liu Bing. 2018. Deep learning for sentiment analysis: A survey. Cornell Science Library.

Diane J Litman and Kate Forbes-Riley. 2004. Predicting student emotions in computer-human tutoring dialogues. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 351. Association for Computational Linguistics.

Debanjan Mahata, Jasper Friedrichs, Rajiv Ratn Shah, et al. 2018. # phramacovigilance-exploring deep learning techniques for identifying mentions of medication intake from twitter. arXiv preprint arXiv:1805.06375.

Namita Mittal, Basant Agarwal, Garvit Chouhan, Nitin Bania, and Prateek Pareek. 2013. Sentiment analysis of hindi reviews based on negation and discourse relation. In Proceedings of the 11th Workshop on Asian Language Resources, pages 45–50.

Saif Mohammad. 2016. A practical guide to sentiment annotation: Challenges and solutions. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 174–179. Association for Computational Linguistics.

Saif M Mohammad. 2017. Challenges in sentiment analysis. In A Practical Guide to Sentiment Analysis, pages 61–83. Springer.

Saif M Mohammad and Peter D Turney. 2013. Nrc emotion lexicon. National Research Council, Canada.

Doctor Nagendra. 1994. Hindi sahitya ka itihas.

Andrew Ortony, Gerald L Clore, and Mark A Foss. 1987. The referential structure of the affective lexicon. Cognitive science, 11(3):341–364.

Bo Pang, Lillian Lee, et al. 2008. Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2):1–135.

Braja Gopal Patra, Dipankar Das, Amitava Das, and Rajendra Prasath. 2015. Shared task on sentiment analysis in indian languages sail tweets - an overview. In Proceedings of the Third International Conference on Mining Intelligence and Knowledge Exploration - Volume 9468, MIKE 2015, pages 650–655. Springer-Verlag.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct):2825–2830.

James W Pennebaker, Martha E Francis, and Roger J Booth. 2001. Linguistic inquiry and word count: Liwc 2001. Mahway: Lawrence Erlbaum Associates, 71(2001):2001.

Robert Plutchik. 1984. Emotions: A general psychoevolutionary theory. Approaches to emotion, 1984:197–219.

Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir Hussain. 2017. A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37:98–125.

Niklas Ravaja, Timo Saari, Marko Turpeinen, Jari Laarni, Mikko Salminen, and Matias Kivikangas. 2006. Spatial presence and emotions during video game playing: Does it matter with whom you play? Presence: Teleoperators and Virtual Environments, 15(4):381–392.

Andrew J Reagan, Lewis Mitchell, Dilan Kiley, Christopher M Danforth, and Peter Sheridan Dodds. 2016. The emotional arcs of stories are dominated by six basic shapes. EPJ Data Science, 5(1):31.

Monika Schwarz-Friesel. 2015. Language and emotion. The Cognitive Linguistic Perspective, in: Ulrike Lüdtke (Hg.), Emotion in Language. Theory–Research–Application, Amsterdam, pages 157–173.

Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter sentiment analysis with deep convolutional neural networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 959–962. ACM.

Ameneh Gholipour Shahraki and Osmar R Zaiane. 2017. Lexical and learning-based emotion mining from text. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing.

Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4):427–437.

Carlo Strapparava and Rada Mihalcea. 2007. SemEval-2007 task 14: Affective text. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 70–74. Association for Computational Linguistics.

Carlo Strapparava, Alessandro Valitutti, et al. 2004. Wordnet affect: an affective extension of wordnet. In Lrec, volume 4, pages 1083–1086. Citeseer.

Deepanshu Vijay, Aditya Bohra, Vinay Singh, Syed Sarfaraz Akhtar, and Manish Shrivastava. 2018. Corpus creation and emotion prediction for hindi-english code-mixed social media text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 128–135.

Ali Yadollahi, Ameneh Gholipour Shahraki, and Osmar R Zaiane. 2017. Current state of text sentiment analysis from opinion to emotion mining. ACM Computing Surveys (CSUR), 50(2):25.

Changhua Yang, Kevin Hsin-Yih Lin, and Hsin-Hsi Chen. 2007. Emotion classification using web blog corpora. In Web Intelligence, IEEE/WIC/ACM International Conference on, pages 275–278. IEEE.

Hsiang-Fu Yu, Fang-Lan Huang, and Chih-Jen Lin. 2011. Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning, 85(1-2):41–75.

A General Instruction

• Attempt HITs only if you are a native speaker of Hindi.

• Your responses are confidential. Any publications based on these responses will not include your specific responses, but rather aggregate information from many individuals. We will not ask any information that can be used to identify who you are.

B Task Specific Instructions

• We take into account these five headline categories: Anger, Joy, Sad, Suspense, Neutral/Plain Talk.

• The headline and subordinate categories are as mentioned below
  – Anger(0) - Emotions include anger, rage, disgust, violent unwillingness, sadism, irritation
  – Joy(1) - Emotions include Joy, gratitude, happiness, pleasantness, elation, positive excitement, triumph, gratification, pride
  – Sad(2) - Emotions include sadness, disconsolation, loneliness, anxiety, misery, sorry, depressing, shameful, grief-stricken, melancholy, unwilling
  – Suspense(3) - Wonder, excitement, anxious uncertainty
  – Neutral(4) / Plain talk - These include no emotions, examples are general talk spoken with no emotion

• Agreeing or disagreeing with the speaker’s views should not have a bearing on your response. You are to assess the language being used (not the views). For example, given the tweet, ‘Evolution makes no sense’, the correct answer is ‘the speaker is using negative language’ since the speaker’s words are criticizing or judging negatively something (in this case the theory of evolution). Note that the answer is not contingent on whether you believe in evolution or not.
• From reading the text, identify the entity towards which opinion is being expressed or the entity towards which the speaker’s attitude can be determined. This entity is usually a person, object, company, group of people, or some such entity. We will call this the PRIMARY TARGET OF OPINION (PTO). For example, if the text criticizes certain actions or beliefs of a person (or group of persons), then that person or group is the PTO. If the text mocks people who do not believe in evolution, then the PTO is ‘people who do not believe in evolution’. If the text questions or mocks evolution, then the PTO is ‘evolution’.

• While annotating, always try to find an explicit or implicit clue which suggests the speaker’s attitude towards the situation. The speaker in this reference can be the narrator himself or the characters of the story. An example of a clue can be positive words or sentiments described in a sentence explaining a situation.
Example - रमजान के पूरे तीस रोजों के बाद ईद आयी है | कितना मनोहर, कितना सुहावना भाव है | वृक्षों पर अजीब हरियाली है, खेतों में कुछ अजीब रौनक है, आसमान पर कुछ अजीब लालिमा है |
Here, the narrator narrates the story about the month of Ramzan. Although the narrator could be plain talking, the sentence “कितना मनोहर, कितना सुहावना भाव है” gives us a clue that he is not simply stating the events as is. Rather, he has an emotional attachment to the climate and the story settings. In particular, he is happy about the environment and its refreshing events. Thus, using this clue we can know for sure that these sentences are not neutral but contain an emotion of joy.

• In case where someone is just quoting another person with no reference to his own emotional state, find an explicit or implicit clue which suggests the speaker’s attitude towards the PTO.

C Example HIT

• Joy: रमजान के पूरे तीस रोजों के बाद ईद आयी है | कितना मनोहर, कितना सुहावना भाव है | वृक्षों पर अजीब हरियाली है, खेतों में कुछ अजीब रौनक है, आसमान पर कुछ अजीब लालिमा है | आज का सूर्य देखो, कितना प्यारा, कितना शीतल है, यानी संसार को ईद की बधाई दे रहा है | गाँव में कितनी हलचल है | ईदगाह जाने की तैयारियाँ हो रही हैं | (Eid has come after 30 days of Ramadan. It is such a beautiful and enjoyable feeling. There is a strange greenery on the trees, some strange liveliness in the fields, there is some weird but enjoyable redness in the sky. Look at today’s sun, how lovely, how cool it is, that is, it is congratulating the world on Eid. There is so much commotion in the village. Preparations are under way to go to the Idgah.)
Here the narrator is expressing his joy towards the change in season and the coming of the festival.

• Anger: लड़के सबसे ज्यादा प्रसन्न हैं | किसी ने एक रोजा रखा है, वह भी दोपहर तक, किसी ने वह भी नहीं, लेकिन ईदगाह जाने की खुशी उनके हिस्से की चीज है | रोजे बड़े-बूढ़ों के लिए होंगे | इनके लिए तो ईद है | रोज ईद का नाम रटते थे, आज वह आ गयी | अब जल्दी पड़ी है कि लोग ईदगाह क्यों नहीं चलते | इन्हें गृहस्थी की चिंता से क्या प्रयोजन | सेवैयों के लिए दूध ओर शक्कर घर में है या नहीं, इनकी बला से, ये तो सेवेयाँ खायेंगे | वह क्या जानें कि अब्बाजान क्यों बदहवास चौधरी कायमअली के घर दौड़े जा रहे हैं | उन्हें क्या खबर कि चौधरी आँखें बदल लें, तो यह सारी ईद मुहर्रम हो जाय | (The boys are most pleased. Someone has kept one roza (fast), and that only until noon; someone has not even done that, but the joy of going to the Idgah is their share. The fasts are for the elderly. For them it is Eid. Every day they used to talk about Eid; today it has come. Now they are impatient, asking why people do not go to the Idgah a little faster. What do they (the children) know about household worries? They are not bothered whether there is milk or sugar in the house for sevaiyan (a type of food); they just want to eat it. What do they know of why the father is rushing to Chowdhary’s house (to ask for money to celebrate Eid)? They do not know that if Chowdhary changes his mood, Eid would become Muharram.)
In the last three sentences, the narrator shows signs of irritation, which is a subcategory of the headline category “Anger”.

• Suspense: पिछले पहर को महफिल में सन्नाटा हो गया | हू-हा की आवाजें बन्द हो गयीं | लीला ने सोचा, क्या लोग कहीं चले गए, या सो गये | एकाएक सन्नाटा क्यों छा गया | (Last afternoon, silence fell over the entire place. There were no voices around. The sounds of hu-ha stopped completely. Leela thought, did people go somewhere, or perhaps they fell asleep? Why is there silence everywhere all of a sudden?)
Here the narrator is trying to create suspense.

• Sad: उन्होंने खुद वह सब कष्ट झेले हैं, जो वह मुझे झेलवाना चाहती हैं | उनके स्वास्थ्य पर उन कष्टों का जरा भी असर नहीं पड़ा | वह इस 65 वर्ष की उम्र में मुझसे कहीं टाँठी हैं | फिर उन्हें कैसे मालूम हो कि इन कष्टों से स्वास्थ्य बिगड़ सकता है | (She herself has experienced all the hardships that she wants me to go through. Those sufferings did not have any effect on her health. She is much healthier than me despite being 65 years of age. Then how will she come to know that these sufferings can ruin one’s health?)
Once the wife justifies her mother-in-law’s actions, she starts explaining herself and her deplorable situation with reference to her failing health. Due to the continuous arguments and hardships she has to face because of her mother-in-law, her health is suffering. She even says that her mother-in-law, at the age of 65, is healthier than herself. Here she is showing signs of being sad about her situation. Thus, this comes under the category ‘Sad’.

• Neutral: गाँव से मेला चला | और बच्चों के साथ हामिद भी जा रहा था | कभी सबके सब दौड़कर आगे निकल जाते | फिर किसी पेड़ के नीचे खड़े होकर साथ वालों का इंतज़ार करते | (A group of people from the village left for the Idgah. Hamid was also going along with the kids. Sometimes they would all run ahead in an attempt to outdo the others. Then they would stand under a tree and wait for the others to catch up with them.)
Here, although the sentence itself is positive, the narrator is not emotionally attached to the situation. He is just reporting it as is. There is no clue whatsoever which indicates that the narrator is expressing a positive sentiment with the reportage.
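The integer codes in Appendix B (Anger(0), Joy(1), Sad(2), Suspense(3), Neutral(4)) and the fact that each sentence is labelled by three annotators suggest a simple way to store and combine the annotations. The following is a minimal sketch only: the majority-vote rule, the handling of three-way disagreements, and the sample votes are assumptions made for illustration and are not described in this section as the procedure actually used for BHAAV.

```python
from collections import Counter
from typing import Optional, Sequence

# Integer codes for the five headline categories, as listed in Appendix B.
LABELS = {0: "Anger", 1: "Joy", 2: "Sad", 3: "Suspense", 4: "Neutral"}


def majority_label(votes: Sequence[int]) -> Optional[int]:
    """Return the label chosen by a strict majority of annotators, else None.

    Assumption: sentences without a strict majority are left unresolved
    (None); the tie-breaking rule used for the corpus is not stated here.
    """
    for vote in votes:
        if vote not in LABELS:
            raise ValueError(f"unknown label code: {vote}")
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else None


# Hypothetical votes from three annotators for four sentences.
annotations = [
    (1, 1, 1),  # all three annotators chose Joy
    (0, 0, 2),  # two of three chose Anger
    (3, 4, 4),  # two of three chose Neutral
    (0, 2, 3),  # three different labels, no majority
]

resolved = [majority_label(votes) for votes in annotations]
names = [LABELS[r] if r is not None else "unresolved" for r in resolved]
print(names)  # ['Joy', 'Anger', 'Neutral', 'unresolved']
```

Returning None for unresolved sentences keeps disagreements visible, so they can be adjudicated separately instead of being silently assigned to a category.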