BHAAV (भाव) - A Text Corpus for Emotion Analysis from - arXiv

BHAAV (भाव) - A Text Corpus for Emotion Analysis from
                       Hindi Stories
       Yaman Kumar                          Debanjan Mahata∗                   Sagar Aggarwal
     Adobe Systems, Noida                     Bloomberg LP                       NSIT-Delhi
          ykumar@adobe.com                   dmahata@bloomberg.net            sagara.co@nsit.net.in

       Anmol Chugh                          Rajat Maheshwari                  Rajiv Ratn Shah
     Adobe Systems, Noida                    USICT, New Delhi                     IIIT-Delhi
          achugh@adobe.com               rajat.usict.101164@ipu.ac.in         rajivratn@iiitd.ac.in

∗ Author participated in this research as an adjunct faculty at IIIT-Delhi, India.

Abstract

In this paper, we introduce the first and largest Hindi text corpus, named BHAAV (भाव), which means emotions in Hindi, for analyzing emotions that a writer expresses through his characters in a story, as perceived by a narrator/reader. The corpus consists of 20,304 sentences collected from 230 different short stories spanning across 18 genres such as प्रेरणादायक (Inspirational) and रहस्यमयी (Mystery). Each sentence has been annotated into one of the five emotion categories (anger, joy, suspense, sad, and neutral), by three native Hindi speakers with at least ten years of formal education in Hindi. We also discuss challenges in the annotation of low resource languages such as Hindi, and discuss the scope of the proposed corpus along with its possible uses. We also provide a detailed analysis of the dataset and train strong baseline classifiers, reporting their performance.

1 Introduction

Emotion analysis from text is the study of identifying, classifying and analyzing emotions (e.g., joy, sadness), as expressed and reflected in a piece of given text (Yadollahi et al., 2017). Its wide range of applications in areas such as customer relation management (Bougie et al., 2003), dialogue systems (Ravaja et al., 2006), intelligent tutoring systems (Litman and Forbes-Riley, 2004), analyzing human communications (Kövecses, 2003), natural text-to-speech systems (Francisco and Gervás, 2006), assistive robots (Breazeal and Brooks, 2005), product analysis (Knautz et al., 2010), and studying psychology from social media (De Choudhury et al., 2013), has drawn considerable attention from the scientific community, making it one of the important areas of research in computational linguistics.

The majority of the methods and resources developed in the emotion analysis domain deal with the English language (Yadollahi et al., 2017). Moreover, since there are no text based resources for emotion analysis in Hindi, our understanding of the expression of emotions is limited to English text. To this end, we develop a text corpus¹ for emotion analysis from stories written in Hindi, which is one of the 22 official languages of India and is among the top five most widely spoken languages in the world². The proposed corpus is the largest annotated corpus for studying emotions from Hindi text, and facilitates the development of linguistic resources in low-resource languages.

¹ https://doi.org/10.5281/zenodo.3457467
² https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers

According to a joint report by KPMG and Google³ published in 2017, there are 234 million Internet users in India using one of the Indian languages as their medium of communication against 175 million users using English. This gap is predicted to increase by 2021, with users using Indian languages reaching 536 million. Thus, social media companies like Facebook and Internet search companies like Google have increased their support for popularly used Indian languages. Since Hindi is the most widely spoken Indian language, followed by Bengali and Telugu, the introduction of the BHAAV dataset is apt and timely.

³ https://assets.kpmg.com/content/dam/kpmg/in/pdf/2017/04/Indian-languages-Defining-Indias-Internet.pdf

Related to the task of emotion analysis in Hindi, previous attempts have been made
in developing corpora for predicting emotions from Hindi-English code switched language used in social media (Vijay et al., 2018) (2,866 sentences), and from auditory speech signals (Koolagudi et al., 2011). Some work has been undertaken in the closely related task of sentiment analysis, and datasets have been created for identifying sentiments expressed in movie reviews (Mittal et al., 2013) (664 reviews), Hindi blogs (Arora, 2013) (250 blogs), and generating lexical resources like Hindi SentiWordnet (Joshi et al., 2010). Given the dearth of resources for analyzing emotions from Hindi text, we present and publicly share BHAAV, a corpus of 20,304 sentences collected from 230 different short stories (e.g., Eidgah (ईदगाह) by Munshi Premchand) written in Hindi, spanning across 18 genres (see Table 7 for complete list). Each sentence has been annotated by three native Hindi speakers who have at least ten years of reading and writing experience in the Hindi language, with the goal of identifying one of the following five popular emotion categories: anger, joy, suspense, sad, and neutral.

Stories are a melting pot of different types of emotions expressed by the author through the characters and plots that he develops in his writing. Emotions in storytelling have been previously studied, resulting in the identification of six basic types of emotional arcs in English stories (Reagan et al., 2016), namely Rags to riches, Tragedy, Man in a hole, Icarus, Cinderella, and Oedipus. This motivated us to develop BHAAV from Hindi stories. We believe that apart from studying emotions in Hindi text, the presented corpus would also enable studies related to the analysis of Hindi literature from the perspective of identifying the inherent emotional arcs. It also has the potential to catalyze research related to human text-to-speech systems geared towards improving automated storytelling experiences, for instance by inducing emotion cues while automatically synthesizing speech for stories from text. We keep the order of the sentences intact as they occur in their source story. This makes the corpus ideal for performing temporal analysis of emotions in the stories, and provides enough information for training machine learning models that take into account temporal context.

Major contributions of this work are:
- Publicly share the first and the largest annotated Hindi corpus (BHAAV) for sentiment analysis, consisting of 20,304 sentences from 230 popular Hindi short stories spanning across 18 popular genres. Each sentence is labeled with one of the five emotion categories: anger, joy, suspense, sad, and neutral.
- Describe potential applications of BHAAV, the process of annotation, and main challenges in creating an emotion analysis text corpus for a low-resource language like Hindi.
- Propose strong baseline classifiers and report their results for identifying the emotion expressed in a sentence of a story written in Hindi.

2 Related Work

It is necessary to mention that there has been extensive work in sentiment analysis, especially in the past two decades. Although there is a significant intersection between the techniques used for sentiment analysis and emotion analysis, the two differ in many ways. Emotion analysis is often tackled at a fine grained level and has historically proved to be more challenging due to the subtleties involved in identifying and defining emotions. Additionally, resources for emotion analysis are scarce when compared to sentiment analysis, especially for low resource languages like Hindi. For a detailed survey of methods, datasets and theoretical foundations of sentiment analysis and emotion analysis, please refer to (Yadollahi et al., 2017; Lei et al., 2018; Cambria et al., 2017; Poria et al., 2017).

Analyzing emotions from text has been primarily manifested through four different types of tasks: Emotion Detection (Gupta et al., 2013), Emotion Polarity Classification (Alm et al., 2005), Emotion Classification (Yang et al., 2007), and Emotion Cause Detection (Gao et al., 2015). The scope of this work is limited to the task of Emotion Classification. Pang et al. (2008) mention that emotions are expressed at four levels (morphological, lexical, syntactic and figurative), and note that as we move from morphological to figurative, the difficulty of the emotion analysis task increases and the number of resources for the same decreases. Developed from stories written in a morphologically rich language, BHAAV primarily deals with the first and the last levels.

Most of the work for creating data resources for emotion analysis has been fairly limited to building emotion lexicons (Strapparava et al., 2004; Pennebaker et al., 2001; Shahraki and Zaiane, 2017; Mohammad and Turney, 2013), or concentrated on annotating emotions of individual sentences without giving any context (Strapparava and Mihalcea, 2007). As indicated by many, this approach is non-holistic for a task such as emotion analysis (Schwarz-Friesel, 2015; Ortony et al., 1987). To this end, BHAAV not only presents annotated sentences, but also provides their context.

Lastly, when it comes to the task of analyzing emotions from text, there are no datasets available in Hindi. Although resource-poor Indian languages have started catching up with their richer counterparts in the domain of sentiment analysis (SA) (Mittal et al., 2013; Arora, 2013; Joshi et al., 2010), considerable work still needs to be done considering the pace at which these languages are finding their uses in modern digitally driven India. The lack of resources can be judged from the wide usage of one of the very few Hindi datasets for SA tasks (Balamurali et al., 2012). It consists of just 200 positive and negative sentences for two major Indian languages, Hindi and Marathi. Another popular and recent attempt is by (Patra et al., 2015). Their dataset contains approximately 1500 tweets for the languages Hindi, Bengali and Tamil, annotated for the task of Aspect Based Sentiment Analysis. BHAAV is an attempt to fill this gap and create a large, effective and high quality resource for emotion mining from text.

3 Language Specific Challenges

As already mentioned and pointed out in (Yadollahi et al., 2017), the computational methods used in the tasks pertaining to sentiment analysis (SA) can readily be applied to the emotion analysis (EA) tasks. Therefore, the challenges for EA from text are very similar to those of SA from text. For a detailed description of the challenges one can refer to (Mohammad, 2017). However, our task of identifying emotions from sentences poses additional challenges due to the inherent characteristics of the Hindi language. We point out some of these language specific challenges (Arora, 2013), in order to draw a complete picture of the intricacies of the task and emphasize that there is scope for developing methods specific to Hindi, and that not all methods developed for English can be directly translated to Hindi.

Word Order - The order in which words appear in a sentence plays an important role in determining the polarity as well as the subjectivity of the text. As opposed to English, which is a fixed order language, Hindi is a free order language. For any sentence in English to be grammatically correct the ‘subject’ (S) is followed by the ‘verb’ (V), which is followed by the ‘object’ (O), i.e., in the [SVO] pattern. For example, the English sentence “Ram (राम) ate (खाया) three mangoes (तीन आम)”, which follows [SVO], can be expressed in the following three ways in Hindi that do not adhere to the [SVO] pattern: (i) ‘राम ने तीन आम खाया’ [SOV], (ii) ‘तीन आम खाया राम ने’ [OVS], and (iii) ‘खाये तीन आम राम ने’ [VOS]. This lack of order can pose challenges to the machine learning algorithms that take into account the order of the words.

Morphological Variations - Hindi is morphologically rich. This means that a single Hindi word can express information that would take several words in English. One example is the expression of gender. For example, when using the word ‘खायेगी’, which means ‘will eat’ in English, one can not only indicate that someone will eat but also provide cues of the person’s gender (in this case female - the male variant is ‘खायेगा’).

Handling Spelling Variations - A word with the same meaning can appear with multiple spelling variations. Occurrence of such variations can pose challenges for machine learning models, which have to take into account all the spelling variants. For example, the word ‘मेहगा’, which means ‘costly’, has another variant ‘महंगा’ that means the same.

Lack of Resources - The lack of lexicons, developed techniques and elaborate resources in Hindi also adds to the challenge, which is also one of the main motivations for our work.

4 Corpus Creation and Annotation

One of our primary aims was to create a manually annotated large corpus for performing
emotion analysis from text in Hindi. We also wanted to capture the context in which a given piece of text occurs. Therefore, we decided to extract all the sentences from short stories belonging to genres popular in Hindi. Whenever possible we also searched for an audio book⁴ in which the same story is narrated by a narrator. This was done in order to help the annotators during the annotation process, in case they had to refer to examples of how a narrator/reader would express the emotion of a sentence in the context of the story. All our annotators were native Hindi speaking volunteers who had a minimum of 10 years of formal education in Hindi, and showed great interest in reading the stories.

⁴ Example of audio books for some of the stories - https://www.youtube.com/user/sameergoswami/playlists

Emotion Category    κ        α        Emotions expressed by the category
joy                 0.821    0.821    joy, gratitude, happiness, pleasantness
anger               0.807    0.807    anger, rage, disgust, irritation
suspense            0.757    0.757    wonder, excitement, anxious uncertainty
sad                 0.835    0.835    sadness, disconsolation, loneliness, anxiety, misery
neutral             0.789    0.788    None of the above
BHAAV dataset       0.802    0.802

Table 1: Emotions and their inter-annotator agreements as measured using Fleiss’ Kappa (κ) (Fleiss and Cohen, 1973) and Krippendorff’s alpha (α) (Krippendorff, 2011) for the entire BHAAV dataset.

The extracted text from 230 stories was split into sentences in an automated way and contained much unnecessary text that was not part of the story. During the annotation process, the annotators filtered the unwanted text and only annotated the relevant portion. Whenever the sentences were not correctly split, the annotators also corrected them. A total of five annotators were used for annotating the entire corpus, such that each sentence gets at least three annotations. During the annotation process the annotators had access to the actual online story and the list of audio books. Each story was annotated in one sitting. It took nine months to finish the process.

The guidelines for annotating emotions were designed to be very short and concise with regard to the definitions of the categories to be assigned. Due to space restrictions, the guidelines for identifying each emotion are presented in the Appendix. In order to identify the emotion categories best suited for our short story corpus, we did some initial annotations with the ‘basic’ emotions of Plutchik (1984). From the output of the initial phase, we observed that not all basic emotions occurred prominently in the selected Hindi stories. There were five main categories of emotions which were found to be present extensively in the corpus: anger, joy, suspense, sad, and neutral. A brief description of all the emotion categories is presented in Table 1. A few examples to illustrate the various categories as annotated by the annotators are also given in Table 2. More examples along with common error cases are listed in the Appendix section of the paper.

Emotion     Sample Hindi Sentence / English Translation
joy         बादशाह ने कहा तुम्हारी कहानी पहली दोनों से अधिक मनोरंजक है
            The king said that your story is more entertaining than the previous two stories
anger       रुपया नई देगा तो उसका खाल उतारकर बाजार में बेच देगा
            If he does not give the money then I will take out his skin and sell it in the market
suspense    मजदूर ने अब तक तो झलक भर देखी थी अब तो उसे पूरी नजर भर देखा तो ठगा सा खड़ा रह गया
            Till now the worker had only seen his glimpses, but when he saw him fully he was just stunned
sad         उसने रुँआसे होते हुए मम्मी की ओर देखा
            With teary eyes he saw his mother
neutral     मैं इसकी मां हूं
            I am his mother

Table 2: Sample sentences from BHAAV dataset for each emotion label.

The annotators were instructed not to be biased by their own interpretations of a statement in the story while labeling it. For example, take the case of the following sentence: एक दिवसीय क्रिकेट मैच में भारत से हार गया पाक (Pakistan lost to India in One Day International). An Indian annotator is often inclined to mark it as joy while a Pakistani annotator often marks it as sad, whereas an unbiased reader would read it as having neutral emotion. Thus, the annotators were asked to identify only the emotion that an unbiased narrator/reader of that story would like to express while reading it to someone. Whenever confused, they were asked to do the following: first, mark the reason why they think a sentence should have a particular emotion; second, refer to the audio book of the story if available and try to infer the emotion being expressed; third, if neither of the other options works, mark it as neutral.

General statistics of the dataset are presented in Table 3. As can be seen from the table, BHAAV is imbalanced towards neutral sentences. This is due to the fact that we took raw, unedited stories, making our dataset mimic the distribution of emotions as expressed in the author’s writings. Alternatively, in order to balance the dataset, we could have taken selective sentences. However, there would have been several drawbacks associated with such an approach: 1) loss of immediate sentence contexts; 2) separating the individual sentences from the bigger picture as developed by the author in different plots of the story; and 3) failure to capture the implicit emotions expressed by a character of the story (the emotions which a character is feeling vs what his words indicate). The overall inter-annotator agreements and the agreements for individual emotion categories are presented in Table 1. Next, we present some of the challenges that we faced during the annotation process that we think should be explicitly pointed out in order to provide a true picture of the corpus as well as to give an idea of the difficulties in carrying out such a process.

4.1 Challenges in Annotation

Apart from the challenge of annotating a low-resource language for which one can seldom get high quality crowd workers, there were certain challenges that were specific to the domain of stories as well as generic ones peculiar to the tasks of sentiment and emotion analysis. Some of the prominent ones, as identified from the feedback of the annotators, are presented below with examples.

Identifying Implicit Emotions - The annotators were asked to identify emotions whether they were explicitly or implicitly expressed. An example of explicitly expressed emotion would be Example 1, in which the speaker, by using words such as सुहावना (refreshing) and मनोहर (beautiful), clearly indicates that he is happy with nature, thus expressing his joy in the statements.

. Example 1 - कितना मनोहर, कितना सुहावना भाव है| वृक्ष पर अजीब हरियाली है, खेत में कुछ अजीब रौनक है, आसमान पर कुछ अजीब लालिमा है| (It is such a beautiful and enjoyable feeling. There is a strange greenery on the trees, some strange liveliness in the fields, there is some weird but enjoyable redness in the sky)

Identifying implicit emotions was sometimes confusing for the annotators, and on taking a closer look we did find some of them being marked as neutral. An example of implicitly expressed emotion would be Example 2, in which a child’s grandmother is complaining about her son being too hasty about going to the mosque. She complains of his ignorance of what running a household involves and its inherent difficulties. Although there are no explicit words indicating her state of mind, there is an implicit pointer that she is feeling irritated due to the haste and hence is angry with him. These types of emotions are totally contextual and could be identified only while reading the story. We believe that capturing these emotions is also necessary in order to make our annotation process holistic. Although we do not train any classification model in this work that can use this type of context to predict the final emotion of a sentence, we think that BHAAV as a dataset provides an opportunity to build such contextual models, making it a rich corpus unlike many previous ones, as already pointed out in Section 2. We would like to take this up as future work.

. Example 2 - अब जल्दी पड़ी है कि लोग ईदगाह क्यों नहीं चलते| इन्हें गृहस्थी की चिंता से क्या प्रयोजन| (Now he is feeling why do not people go to the mosque a little faster. What do they (the children) know about household chores)

Primary Target of Opinion - Another challenge comes when there is not even an implicit clue in the immediate context of a sentence. For instance, in a story, sometimes a character is developed as an adversary to a particular prop (i.e., PTO, Primary Target of Opinion). The prop can be another character or some inanimate object or phenomenon. From the start of the story, the character expresses his emotions in a characteristic manner towards that PTO. Thus, if a sentence or a context does not have any explicit clues to the state of mind of the character, identifying the PTO and the character’s emotions towards the PTO gives some connotation to that sentence. This is in line with what was suggested in (Mohammad, 2016). An example of such an instance, presented in Example 3, can be derived from the famous story by Munshi Premchand, Eidgah. The following sentence, when read in isolation, could potentially trick someone into thinking that the boy speaking these dialogues is expressing mercy or even neutrality, when he is actually
expressing joy.
. Example 3 - मोहसिन- लेकिन दिल में कह रहे होंगे कि मिले तो खा लें| (Mohsin- But in the hearts, they must be thinking that if they could get it, they would eat it)
                                                           Hindi story. Both classic machine learning and modern deep learning models are trained and their results are analyzed. We extensively use Sklearn (Pedregosa et al., 2011) and Keras
Sarcasm - A common challenge which anno-                   (Chollet et al., 2018) as our machine learning
tators faced while annotating BHAAV is the                 toolkits.
case of sarcasm, which is again prevalent in
most of the previous works in sentiment and                5.1 Dataset
                                                           Emotion     No. of Sentences   No. of Sentences   No. of Sentences
                                                                                             (Train data)        (Test data)
                                                           joy               2,463              2,242                221
                                                           anger             1,464              1,321                143
                                                           suspense          1,512              1,389                123
                                                           sad               3,168              2,843                325
                                                           neutral          11,697             10,478              1,219

                                                           Table 3: Distribution of sentences in different categories of emotions in the BHAAV dataset.

emotion analysis. Sarcasm, as it occurs, is generally accompanied by either anger or delight (or sometimes both) of the speaker at the dismay of the PTO. Thus, in most cases, the emotional state of the speaker of sarcastic comments was a mixture of anger with the PTO and rejoicement at its expense. However, to account for the headline categories we
chose for Bhaav, annotators were asked to dif-                The BHAAV dataset was randomly shuf-
ferentiate between these two causes using the              fled and split into train and test datasets
context provided and mark the category which               with a ratio of 10:1. The distribution of la-
most closely represents the sentence. This was             bels in the two datasets are shown in Table
sometimes challenging. For instance, in exam-              3. The proportion of distribution of labels in
ple 4 the emotion most close to the state of               the test dataset is kept similar to the training
speaker is that of anger, when it could be eas-            dataset. We train our models on the training
ily misunderstood to be that of joy.                       dataset and test the final predictions on the
. Example 4 - हा हा हा! अब तुम बताओगे हम या बोल? (Ha Ha    test dataset. We do not create a separate vali-
Ha ! Now you would tell me what I should speak?)           dation dataset. However, we do use validation
Annotating Suspense - Suspense was the                     data extracted from the training data, when-
toughest category for the annotators and                   ever necessary for tuning the hyperparameters
proved very difficult for them to know exactly             of the models.
when a sentence is of this category. The anno-
tators were asked to mark a sentence as sus-               5.2 Text Preprocessing
pense when there is some element in it which               Before training the classification models one
evokes a sense of wonder, anticipation or worry            needs to preprocess the text and represent each
(see Example 5). Suspense is a unique feature              sentence as a feature vector. We tokenize
of stories which does not get fully expressed in           each sentence into words and remove punc-
other types of written materials such as news              tuations. We do not remove the stopwords.
articles, formal reports, and others.                      Since we deal with Hindi, the standard word
. Example 5 - पछले पहर को मह फल म स नाटा हो गया| -हा       tokenizers that are suitable for English lan-
क आवाज ब द हो गय | लीला ने सोचा, या लोग कह चले गए, या सो   guage could not be used. Therefore, we used
गये? एकाएक स नाटा य छा गया? (Last afternoon, the si-       the tokenizer shipped with Classical Language
lence was over the entire place. There were no voices
around. The sounds of Hu-Ha completely stopped.
                                                           Toolkit5 . Each sentence is vectorized after a
Leela thought, did people go somewhere, or perhaps they    feature extraction step for the classic machine
slept? Why all of a sudden there is silence everywhere?)   learning models such as Support Vector Ma-
  Next, we present the experiments performed               chines. Unigrams, Bigrams and Trigrams were
for training the baseline models.                          generated as features for each sentence and
                                                           their TF-IDF (Aizawa, 2003) scores were con-
5 Baseline Models                                          sidered as the feature values.
                                                              One of the key components of the input fed
In this section, we describe strong baseline
                                                           to the deep learning models are pre-trained
models that we train for the task of one of
                                                           word embeddings (Kusner et al., 2015), that
the emotions - anger, joy, suspense, sad, and
neutral, from a given sentence taken from a                   5
                                                                  http://docs.cltk.org/en/latest/hindi.html
are used for representing each word of the input sentences by a dense real-valued vector. Since the dataset on which we train our models is relatively small, we use the pretrained word embeddings in order to prevent overfitting. This practice is commonly known as transfer learning [6]. We choose the Fasttext [7] word embeddings (Bojanowski et al., 2016), trained on the Hindi Wikipedia corpus. This was a natural choice due to its easy availability. Additionally, Fasttext is possibly a better choice than other popular word embedding methods as it is more suitable for representing words belonging to morphologically rich languages like Hindi, as described in Section 3.

While training the deep learning models, each sentence in the training and test datasets is converted to a fixed-size document of 126 words (the maximum length of a sentence in the dataset). Padding [8] is used for sentences shorter than 126 words. Each word is represented as a 300-dimensional (300-D) vector by the word embedding model. All the words in the dataset are mapped to their corresponding word embedding vectors. Whenever a word is not found in the vocabulary of the word embedding model we assign it a 300-D zero vector. Each sentence is then represented as a matrix of its constituent words and their corresponding embedding vectors, which is then fed as an input to the deep learning algorithms.

5.3 Training

All the machine learning models were trained after selecting the hyperparameters on validation data. 10-fold cross validation was used for the classic techniques. For the deep learning models, random search (Bergstra and Bengio, 2012) was used for selecting the best hyperparameters among the ones shown in Table 4, choosing the configuration that performed best on a fixed, randomly selected validation set comprising 20% of the training data. Only 100 iterations of random search were performed. Once the hyperparameter tuning was done, the final model was trained on the entire training data using the selected hyperparameters. Adam (Kingma and Ba, 2014) with two annealing restarts has been shown to work faster and perform better than SGD in other NLP tasks (Denkowski and Neubig, 2017). Therefore, we use the same as our optimization algorithm for the deep learning models. As the task is a multi-class classification problem, categorical cross entropy was used as the loss function, and the final layer of both the deep learning models consisted of a fully-connected dense neural network with the extracted features as the input and a softmax output giving the prediction probability for each of the five emotion categories.

 Hyperparameter                    Range
 No. of Filters for CNN            100, 200, 300, 400
 Filter sizes for the CNN model    1, 2, 3, 4, 5, 6
 Dense Output Layer Size           100, 200, 300, 400
 Dropout Probability               0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
 Learning Rate                     0.0001, 0.001
 Batch Sizes                       8, 16, 32, 64, 128
 Epochs                            10, 50, 100, 150
 LSTM units                        8, 16, 32, 64, 128, 256

Table 4: Hyperparameter ranges used for random search during training deep learning models (CNN and Bidirectional LSTM).

Among the classic machine learning techniques, Support Vector Machine (SVM) with a linear kernel (Hsu et al., 2003), Logistic Regression (Yu et al., 2011) and Random Forests (Breiman, 2001) were trained. A shallow Convolutional Neural Network with a single input channel similar to (Severyn and Moschitti, 2015), and Bidirectional Long Short Term Memory networks with an architecture similar to (Mahata et al., 2018), are the deep learning models that were trained. A random classifier that randomly generated predictions from a label distribution similar to that of the training dataset was also implemented. Table 5 summarizes the performances of the classifiers on the test dataset for the following metrics - macro average precision, macro average recall, macro average F1-score, and accuracy (Sokolova and Lapalme, 2009). We chose macro-average measures as the data is imbalanced and macro-averaging assigns equal weight to all the categories, which gives a better overall picture of a classifier's performance.

[6] ftp://ftp.cs.wisc.edu/machine-learning/shavlik-group/torrey.handbook09.pdf
[7] https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
[8] https://keras.io/preprocessing/sequence/

6 Discussion

In order to analyze the possible features chosen by a machine learning classification algorithm for discriminating between different categories

Figure 1: Flow of emotions in randomly selected stories from two different genres. Panels: (a) Idealist, (b) Exploiter and Exploited.
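
The evaluation protocol of Section 5.3 (a random classifier that samples from the training label distribution, scored with macro-averaged precision, recall, F1 and accuracy as in Table 5) can be illustrated with a minimal standard-library sketch. The function names macro_scores and random_baseline are hypothetical, the training counts are copied from Table 3, and treating an undefined 0/0 ratio as 0 is an assumption; the paper's own numbers were produced with Sklearn and may follow different conventions.

```python
import random
from collections import Counter

# Training-set label counts copied from Table 3.
TRAIN_COUNTS = {"joy": 2242, "anger": 1321, "suspense": 1389,
                "sad": 2843, "neutral": 10478}
LABELS = list(TRAIN_COUNTS)


def macro_scores(y_true, y_pred, labels=LABELS):
    """Return (macro precision, macro recall, macro F1, accuracy)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Assumption: an undefined ratio (0/0) is scored as 0.
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)  # every category gets the same weight
    accuracy = sum(tp.values()) / len(y_true)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, accuracy


def random_baseline(n, counts=TRAIN_COUNTS, seed=0):
    """Draw n labels at random, weighted by the training label counts."""
    rng = random.Random(seed)
    return rng.choices(list(counts), weights=list(counts.values()), k=n)


# Toy illustration of why macro-averaging matters on imbalanced data:
# always predicting the majority class looks fine on accuracy only.
y_true = ["neutral"] * 6 + ["joy", "anger", "sad", "suspense"]
majority = ["neutral"] * len(y_true)
print([round(s, 2) for s in macro_scores(y_true, majority)])
# [0.12, 0.2, 0.15, 0.6]
```

On the toy labels, the majority-class predictor reaches 0.6 accuracy but only 0.15 macro F1, which is the effect the macro-averaged columns are meant to expose. Calling macro_scores(test_labels, random_baseline(len(test_labels))) on real test labels would give a distribution-matched random baseline in the same spirit as the one described in Section 5.3.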

 Method                Macro Avg    Macro Avg    Macro Avg    Accuracy
                       Precision     Recall         F1
 Logistic Regression     0.58         0.62         0.58         0.62
 SVM                     0.48         0.52         0.49         0.52
 Random Forests          0.44         0.59         0.45         0.59
 CNN                     0.50         0.55         0.51         0.55
 BLSTM                   0.43         0.60         0.47         0.60
 Random Classifier       0.40         0.40         0.40         0.40

Table 5: Performance of the baseline supervised classification models on BHAAV dataset.

of emotions and to validate the ability of the BHAAV dataset in providing such features to any classifier, we looked at the most important features chosen by the Logistic Regression model. Table 6 shows the top 10 most informative unigram features for each category of emotion chosen by the model in order to make the final predictions. As evident from the choices, words like प्रसन्न (glad), सुंदर (beautiful), खुश (happy) and हँस (laugh) are sensible indicators of joy, and so are words like अपमान (insult), गुस्सा (anger), क्रोध (anger) and बदला (revenge) for anger. The other categories also show a similar pattern.

 Emotion     Top 10 Important Unigram Features
 joy         प्रसन्न (glad), सुंदर (beautiful), खुश (happy), हँस (laugh), संगीत (music), खिलौने (toys), मजा (fun), आनंद (joy), हँसकर (smilingly), उछल (jump)
 anger       अपमान (insult), गुस्सा (anger), क्रोध (anger), बदला (revenge), मूर्ख (idiot), सजा (punishment), जहन्नुम (hell), आग (fire), (evil), चिल्लाया (screamed)
 suspense    आवाज़ (sound), आश्चर्य (astonishment), जिन्न (Genie), देखा (saw), युद्ध (war), छन… (sound of anklets), कहाँ (where), जादू (magic), अचानक (suddenly), जहाज (ship)
 sad         रो (cry), मर (die), रोने (crying), दुख (sadness), हृदय (heart), दुखी (sad), जीवन (life), आँसू (tears), रोते (cry), भगवान् (God)
 neutral     किसान (farmer), उसने (he), बिन्नी (Binny), पूछा (asked), दादाजी (grandfather), कल (tomorrow), पंडित (pundit), मेहता (mehta), मां (mother), आना (come)

Table 6: Top 10 most important features for each emotion category as identified by the Logistic Regression model during training.

We also looked at the performance of the classifiers for individual categories. The neutral category consistently had the best performance, which is quite easy to guess from the data distribution (Table 3), as it is the majority class. The performance of the suspense category was consistently low. Although the category of anger had a similar presence in BHAAV, it had better performance than suspense. This might be due to the presence of better discriminative features for anger than for suspense. Another reason could be related to the challenges associated with annotating the suspense category (Section 4.1).

Our analysis provides a brief insight into the BHAAV dataset from which we can conclude that it is an appropriate dataset for emotion identification and classification tasks. Although the dataset is created from stories, it can possibly be used for many other domains as it is rich in features indicating the five different emotions presented in this work. The annotations were done from the perspective of a reader/narrator trying to express the emotion of a sentence, given the existing scenario in the story, and, whenever applicable, trying to express the emotion of a character in the story. This also makes the dataset suitable for training automated text-to-speech interfaces (e.g., audio books) for story narration and improving them by infusing emotions in them.

6.1 Emotions and Genres

We started with the 30 frequently used genres mentioned by (Nagendra, 1994) and selected 500 popular online short stories. However, we narrowed these down to the 18 most frequent genres (see Table 7 for the complete list) and ended up extracting text from 230 stories, depending on the availability of online content. Throughout the process of deciding on genres and finding online content relevant to them, we took help from experts who hold a PhD in Hindi literature.

 Genres
 आदर्शवादी (Idealist)
 प्रेमपरक (Romantic)
 शहरी जीवन (Urban Life)
 शोषक और शोषित वर्ग (Exploiter and Exploited Class)
 नीतिपरक (Moral Stories)
 किसान जीवन (Life of a Farmer)
 ऐतिहासिक (Historical)
 प्रेरणादायक (Inspiration)
 देशभक्ति संबंधित (Patriotic)
 व्यक्तिगत जीवन की समस्या (Personal Issues/Problems)
 रूढ़ि और अंधविश्वास (Dogmatic and Superstitious)
 संयुक्त परिवार की समस्या (Joint Family Problems)
 रहस्यमयी (Mystery)
 यथार्थवादी (Realistic and Pragmatic)
 ग्रामीण (Village Life)
 उपदेशपरक (Instructive)
 भोगे हुए यथार्थ की कहानी (Real Stories)
 समाज सुधारक (Society and its Reformation)

Table 7: Genres present in BHAAV

BHAAV is appropriate for analyzing the flow of emotions in individual stories and studying it across different genres. We plotted the flow of emotions in a randomly picked story from each of two different genres, as shown in Figure 1. It is observable from the figures that each story has its own distinct emotion footprint. It would be interesting to study these footprints and draw linguistic insights about Hindi literature using BHAAV.

7 Future Work and Conclusion

In this work we publicly shared the first and the largest annotated corpus, named BHAAV, with 20,304 sentences in Hindi, for emotion analysis. We provided a detailed description of the dataset, language-specific challenges, the annotation process and the challenges associated with annotations, and reported the performance of baseline classification models trained on the dataset for identifying emotions expressed in a sentence. Through different observations we confirm BHAAV to be rich with emotion cues and point to potential applications. In the future, we plan to work on enriching BHAAV with more annotations related to sentiment and discourse analysis, and believe that it will prove to be a valuable resource in Hindi.

References

Akiko Aizawa. 2003. An information-theoretic perspective of tf–idf measures. Information Processing & Management, 39(1):45–65.

Cecilia Ovesdotter Alm, Dan Roth, and Richard Sproat. 2005. Emotions from text: machine learning for text-based emotion prediction. In Proceedings of the conference on human language technology and empirical methods in natural language processing, pages 579–586. Association for Computational Linguistics.

Piyush Arora. 2013. Sentiment analysis for hindi language. MS by Research in Computer Science.

AR Balamurali, Aditya Joshi, and Pushpak Bhattacharyya. 2012. Cross-lingual sentiment analysis for indian languages using linked wordnets. Proceedings of COLING 2012: Posters, pages 73–82.

James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Roger Bougie, Rik Pieters, and Marcel Zeelenberg. 2003. Angry customers don’t come back, they get back: The experience and behavioral implications of anger and dissatisfaction in services. Journal of the Academy of Marketing Science, 31(4):377–393.

Cynthia Breazeal and Rodney Brooks. 2005. Robot emotion: A functional perspective. Who needs emotions, pages 271–310.

Leo Breiman. 2001. Random forests. Machine learning, 45(1):5–32.

Erik Cambria, Soujanya Poria, Alexander Gelbukh, and Mike Thelwall. 2017. Sentiment analysis is a big suitcase. IEEE Intelligent Systems, 32(6):74–80.

François Chollet et al. 2018. Keras: The python deep learning library. Astrophysics Source Code Library.

Munmun De Choudhury, Michael Gamon, Scott Counts, and Eric Horvitz. 2013. Predicting depression via social media. ICWSM, 13:1–10.

Michael Denkowski and Graham Neubig. 2017. Stronger baselines for trustable results in neural machine translation. arXiv preprint arXiv:1706.09733.

Joseph L Fleiss and Jacob Cohen. 1973. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and psychological measurement, 33(3):613–619.

Virginia Francisco and Pablo Gervás. 2006. Automated mark up of affective information in english texts. In International Conference on Text, Speech and Dialogue, pages 375–382. Springer.
Kai Gao, Hua Xu, and Jiushuo Wang. 2015. A rule-based approach to emotion cause detection for chinese micro-blogs. Expert Systems with Applications, 42(9):4517–4528.

Narendra Gupta, Mazin Gilbert, and Giuseppe Di Fabbrizio. 2013. Emotion detection in email customer care. Computational Intelligence, 29(3):489–505.

Chih-Wei Hsu, Chih-Chung Chang, Chih-Jen Lin, et al. 2003. A practical guide to support vector classification.

Aditya Joshi, AR Balamurali, and Pushpak Bhattacharyya. 2010. A fall-back strategy for sentiment analysis in hindi: a case study. Proceedings of the 8th ICON.

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Kathrin Knautz, Tobias Siebenlist, and Wolfgang G Stock. 2010. Memose: search engine for emotions in multimedia documents. In Proceedings of the 33rd International ACM SIGIR Conference on Research and development in information retrieval, pages 791–792. ACM.

Shashidhar G Koolagudi, Ramu Reddy, Jainath Yadav, and K Sreenivasa Rao. 2011. Iitkgp-sehsc: Hindi speech corpus for emotion analysis. In Devices and Communications (ICDeCom), 2011 International Conference on, pages 1–5. IEEE.

Zoltán Kövecses. 2003. Metaphor and emotion: Language, culture, and body in human feeling. Cambridge University Press.

Klaus Krippendorff. 2011. Computing krippendorff’s alpha-reliability.

Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In International Conference on Machine Learning, pages 957–966.

Zhang Lei, Wang Shuai, and Liu Bing. 2018. Deep learning for sentiment analysis: A survey. Cornell Science Library.

Diane J Litman and Kate Forbes-Riley. 2004. Predicting student emotions in computer-human tutoring dialogues. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 351. Association for Computational Linguistics.

Debanjan Mahata, Jasper Friedrichs, Rajiv Ratn Shah, et al. 2018. #phramacovigilance - exploring deep learning techniques for identifying mentions of medication intake from twitter. arXiv preprint arXiv:1805.06375.

Namita Mittal, Basant Agarwal, Garvit Chouhan, Nitin Bania, and Prateek Pareek. 2013. Sentiment analysis of hindi reviews based on negation and discourse relation. In Proceedings of the 11th Workshop on Asian Language Resources, pages 45–50.

Saif Mohammad. 2016. A practical guide to sentiment annotation: Challenges and solutions. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 174–179. Association for Computational Linguistics.

Saif M Mohammad. 2017. Challenges in sentiment analysis. In A Practical Guide to Sentiment Analysis, pages 61–83. Springer.

Saif M Mohammad and Peter D Turney. 2013. Nrc emotion lexicon. National Research Council, Canada.

Doctor Nagendra. 1994. Hindi sahitya ka itihas.

Andrew Ortony, Gerald L Clore, and Mark A Foss. 1987. The referential structure of the affective lexicon. Cognitive science, 11(3):341–364.

Bo Pang, Lillian Lee, et al. 2008. Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2):1–135.

Braja Gopal Patra, Dipankar Das, Amitava Das, and Rajendra Prasath. 2015. Shared task on sentiment analysis in indian languages sail tweets - an overview. In Proceedings of the Third International Conference on Mining Intelligence and Knowledge Exploration - Volume 9468, MIKE 2015, pages 650–655. Springer-Verlag.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct):2825–2830.

James W Pennebaker, Martha E Francis, and Roger J Booth. 2001. Linguistic inquiry and word count: Liwc 2001. Mahway: Lawrence Erlbaum Associates, 71(2001):2001.

Robert Plutchik. 1984. Emotions: A general psychoevolutionary theory. Approaches to emotion, 1984:197–219.

Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir Hussain. 2017. A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37:98–125.

Niklas Ravaja, Timo Saari, Marko Turpeinen, Jari Laarni, Mikko Salminen, and Matias Kivikangas. 2006. Spatial presence and emotions during
video game playing: Does it matter with whom you play? Presence: Teleoperators and Virtual Environments, 15(4):381–392.

Andrew J Reagan, Lewis Mitchell, Dilan Kiley, Christopher M Danforth, and Peter Sheridan Dodds. 2016. The emotional arcs of stories are dominated by six basic shapes. EPJ Data Science, 5(1):31.

Monika Schwarz-Friesel. 2015. Language and emotion. The Cognitive Linguistic Perspective, in: Ulrike Lüdtke (Hg.), Emotion in Language. Theory–Research–Application, Amsterdam, pages 157–173.

Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter sentiment analysis with deep convolutional neural networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 959–962. ACM.

Ameneh Gholipour Shahraki and Osmar R Zaiane. 2017. Lexical and learning-based emotion mining from text. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing.

Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4):427–437.

Carlo Strapparava and Rada Mihalcea. 2007. SemEval-2007 task 14: Affective text. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 70–74. Association for Computational Linguistics.

Carlo Strapparava, Alessandro Valitutti, et al. 2004. Wordnet affect: an affective extension of wordnet. In Lrec, volume 4, pages 1083–1086. Citeseer.

Deepanshu Vijay, Aditya Bohra, Vinay Singh, Syed Sarfaraz Akhtar, and Manish Shrivastava. 2018. Corpus creation and emotion prediction for hindi-english code-mixed social media text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 128–135.

Ali Yadollahi, Ameneh Gholipour Shahraki, and Osmar R Zaiane. 2017. Current state of text sentiment analysis from opinion to emotion mining. ACM Computing Surveys (CSUR), 50(2):25.

Hsiang-Fu Yu, Fang-Lan Huang, and Chih-Jen Lin. 2011. Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning, 85(1-2):41–75.

A General Instruction

 • Attempt HITs only if you are a native speaker of Hindi.

 • Your responses are confidential. Any publications based on these responses will not include your specific responses, but rather aggregate information from many individuals. We will not ask any information that can be used to identify who you are.

B Task Specific Instructions

 • We take into account these five headline categories: Anger, Joy, Sad, Suspense, Neutral/Plain Talk.

 • The headline and subordinate categories are as mentioned below
      – Anger(0) - Emotions include anger, rage, disgust, violent unwillingness, sadism, irritation
      – Joy(1) - Emotions include joy, gratitude, happiness, pleasantness, elation, positive excitement, triumph, gratification, pride
      – Sad(2) - Emotions include sadness, disconsolation, loneliness, anxiety, misery, sorry, depressing, shameful, grief-stricken, melancholy, unwilling
      – Suspense(3) - Wonder, excitement, anxious uncertainty
      – Neutral(4) / Plain talk - These include no emotions, examples are general talk spoken with no emotion

 • Agreeing or disagreeing with the speaker’s views should not have a bearing on your response. You are to assess the language being used (not the views). For example, given the tweet, ‘Evolution makes no sense’, the correct answer is ‘the speaker is using negative language’ since the speaker’s words are criticizing or
Changhua Yang, Kevin Hsin-Yih Lin, and Hsin-           judging negatively something (in this case
  Hsi Chen. 2007. Emotion classification us-
  ing web blog corpora. In Web Intelligence,
                                                       the theory of evolution). Note that the
  IEEE/WIC/ACM International Conference on,            answer is not contingent on whether you
  pages 275–278. IEEE.                                 believe in evolution or not.
 • From reading the text, identify the entity towards which opinion is
   being expressed or the entity towards which the speaker’s attitude can
   be determined. This entity is usually a person, object, company, group
   of people, or some such entity. We will call this the PRIMARY TARGET
   OF OPINION (PTO). For example, if the text criticizes certain actions
   or beliefs of a person (or group of persons), then that person or
   group is the PTO. If the text mocks people who do not believe in
   evolution, then the PTO is ‘people who do not believe in evolution’.
   If the text questions or mocks evolution, then the PTO is ‘evolution’.

 • While annotating, always try to find an explicit or implicit clue
   which suggests the speaker’s attitude towards the situation. The
   speaker in this reference can be the narrator himself or the
   characters of the story. An example of a clue can be positive words or
   sentiments described in a sentence explaining a situation.
    Example - रमजान के पूरे तीस रोजों के बाद ईद आयी है | कितना मनोहर, कितना सुहावना
    भाव है | वृक्षों पर अजीब हरियाली है, खेतों में कुछ अजीब रौनक है, आसमान पर कुछ अजीब
    लालिमा है |

    Here, the narrator narrates the story about the month of Ramzan.
    Although the narrator could be plain talking, the sentence “कितना
    मनोहर, कितना सुहावना भाव है” gives us a clue that he is not simply stating
    the events as is. Rather, he has an emotional attachment to the
    climate and the story settings. In particular, he is happy about the
    environment and its refreshing events. Thus, using this clue we can
    know for sure that these sentences are not neutral but contain an
    emotion of joy.

 • In case someone is just quoting another person with no reference to
   his own emotional state, find an explicit or implicit clue which
   suggests the speaker’s attitude towards the PTO.

C Example HIT

 • Joy: रमजान के पूरे तीस रोजों के बाद ईद आयी है | कितना मनोहर, कितना सुहावना भाव
   है | वृक्षों पर अजीब हरियाली है, खेतों में कुछ अजीब रौनक है, आसमान पर कुछ अजीब
   लालिमा है | आज का सूर्य देखो, कितना प्यारा, कितना शीतल है, यानी संसार को ईद की
   बधाई दे रहा है | गाँव में कितनी हलचल है | ईदगाह जाने की तैयारियाँ हो रही हैं |
   (Eid has come after 30 days of Ramadan. It is such a beautiful and
   enjoyable feeling. There is a strange greenery on the trees, some
   strange liveliness in the fields, there is some weird but enjoyable
   redness in the sky. Look at today’s sun, how lovely, how cool it is,
   as if it is congratulating the world on Eid. There is so much
   commotion in the village. Preparations are under way to go to Idgah.)
   Here the narrator is expressing his joy towards the change in season
   and the coming of the festival.

 • Anger: लड़के सबसे ज्यादा प्रसन्न हैं | किसी ने एक रोजा रखा है, वह भी दोपहर तक,
   किसी ने वह भी नहीं, लेकिन ईदगाह जाने की खुशी उनके हिस्से की चीज है | रोजे
   बड़े-बूढ़ों के लिए होंगे | इनके लिए तो ईद है | रोज ईद का नाम रटते थे, आज वह आ
   गयी | अब जल्दी पड़ी है कि लोग ईदगाह क्यों नहीं चलते | इन्हें गृहस्थी की चिंता से
   क्या प्रयोजन | सेवैयों के लिए दूध और शक्कर घर में है या नहीं, इनकी बला से, ये तो
   सेवैयाँ खायेंगे | वह क्या जानें कि अब्बाजान क्यों बदहवास चौधरी कायमअली के घर दौड़े
   जा रहे हैं | उन्हें क्या खबर कि चौधरी आँखें बदल लें, तो यह सारी ईद मुहर्रम हो जाय |
   (The boys are most pleased. Someone has kept a roza (fast), and that
   too only until noon; someone has not even done that, but the joy of
   going to Idgah is their share. Rozas will be for the elderly. For them
   it is Eid. Every day they used to talk about Eid, and today it has
   come. Now they are excited, asking why people do not go to the mosque
   a little faster. What do they (the children) know about household
   chores? They are not bothered whether there is milk or sugar in the
   house for sevaiyan (a type of food), they just want to eat it. What do
   they know about why the father is running to Chowdhary (to ask for
   money to celebrate Eid)? They do not know that if Chowdhary changes
   his mood, Eid would become Muharram.)
   In the last three sentences, the narrator shows signs of irritation,
   which is a subcategory of the headline category “Anger”.

 • Suspense: पिछले पहर को महफिल में सन्नाटा हो गया | हू-हा की आवाजें बन्द हो गयीं |
   लीला ने सोचा,
   क्या लोग कहीं चले गए, या सो गये | एकाएक सन्नाटा क्यों छा गया | (Last
   afternoon, silence fell over the entire place. There were no voices
   around. The sounds of Hu-Ha completely stopped. Leela thought, did
   people go somewhere, or perhaps they slept? Why is there silence
   everywhere all of a sudden?)
   Here the narrator is trying to create suspense.

 • Sad: उन्होंने खुद वह सब कष्ट झेले हैं, जो वह मुझे झेलवाना चाहती हैं | उनके स्वास्थ्य
   पर उन कष्टों का जरा भी असर नहीं पड़ा | वह इस 65 वर्ष की उम्र में मुझसे कहीं टाँठी
   हैं | फिर उन्हें कैसे मालूम हो कि इन कष्टों से स्वास्थ्य बिगड़ सकता है | (She herself
   has experienced all the hardships that she wants me to go through.
   Those sufferings did not have any effect on her health. She is much
   healthier than me despite being 65 years of age. Then how will she
   come to know that health can be worsened by these sufferings?)
   Once the wife justifies her mother-in-law’s actions, she starts
   explaining herself and her deplorable situation by referring to her
   failing health. Because of the continuous arguments and hardships she
   faces from her mother-in-law, her health is suffering. She even says
   that her mother-in-law, at the age of 65, is healthier than she is.
   Here she shows signs of being sad about her situation, so this comes
   under the category ‘Sad’.

 • Neutral: गाँव से मेला चला | और बच्चों के साथ हामिद भी जा रहा था | कभी सबके
   सब दौड़कर आगे निकल जाते | फिर किसी पेड़ के नीचे खड़े होकर साथ वालों का इंतज़ार
   करते | (A group of people from the village left for Idgah. Hamid was
   also going with the other kids. Sometimes they would all run ahead of
   the others. Then they would stand under a tree and wait for the rest
   to catch up with them.)
   Here, although the sentence itself is positive, the narrator is not
   emotionally attached to the situation. He is just reporting it as is.
   There is no clue whatsoever which indicates that the narrator is
   expressing a positive sentiment with the reportage.
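
The label scheme from the task-specific instructions can be written down as a small lookup table. The following is only an illustrative sketch: the numeric ids and the subordinate emotion lists are copied from the instructions, while the names `Emotion`, `SUBORDINATE`, `HEADLINE_OF` and `headline_for` are hypothetical and do not come from the corpus release.

```python
from enum import IntEnum


class Emotion(IntEnum):
    """Headline categories with the numeric ids given in the instructions."""
    ANGER = 0
    JOY = 1
    SAD = 2
    SUSPENSE = 3
    NEUTRAL = 4


# Subordinate emotions per headline category, lowercased from the instructions.
SUBORDINATE = {
    Emotion.ANGER: ["anger", "rage", "disgust", "violent unwillingness",
                    "sadism", "irritation"],
    Emotion.JOY: ["joy", "gratitude", "happiness", "pleasantness", "elation",
                  "positive excitement", "triumph", "gratification", "pride"],
    Emotion.SAD: ["sadness", "disconsolation", "loneliness", "anxiety",
                  "misery", "sorry", "depressing", "shameful",
                  "grief-stricken", "melancholy", "unwilling"],
    Emotion.SUSPENSE: ["wonder", "excitement", "anxious uncertainty"],
    Emotion.NEUTRAL: [],  # plain talk carries no subordinate emotion
}

# Reverse index: subordinate emotion -> headline category.
HEADLINE_OF = {
    sub: label for label, subs in SUBORDINATE.items() for sub in subs
}


def headline_for(subordinate: str) -> Emotion:
    """Map a subordinate emotion to its headline category (hypothetical helper).

    Raises KeyError if the emotion is not listed in the instructions.
    """
    return HEADLINE_OF[subordinate.strip().lower()]
```

For instance, the irritation identified in the Anger example corresponds to `headline_for("irritation")`, which yields `Emotion.ANGER` (id 0).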