JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY
Journal of ICT, 20, No. 1 (January) 2021, pp: 1-20

JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY
http://jict.uum.edu.my

How to cite this article: Nezhad, Z. B., & Deihimi, A. M. (2021). Sarcasm detection in Persian. Journal of Information and Communication Technology, 20(1), 1-20. https://doi.org/10.32890/jict.20.1.2021.6249

SARCASM DETECTION IN PERSIAN

1Zahra Bokaee Nezhad & 2Mohammad Ali Deihimi
1Department of Computer Engineering, Zand University, Iran
2Department of Electronics Engineering, Bahonar University, Iran

Corresponding author: zbokaee@gmail.com; m.a.deihimi@gmail.com

Received: 25/4/2019  Revised: 13/5/2020  Accepted: 14/5/2020  Published: 4/11/2020

ABSTRACT

Sarcasm is a form of communication where the individual states the opposite of what is implied. Therefore, detecting a sarcastic tone is somewhat complicated due to its ambiguous nature. On the other hand, identification of sarcasm is vital to various natural language processing tasks such as sentiment analysis and text summarisation. However, research on sarcasm detection in Persian is very limited. This paper investigated a sarcasm detection technique on Persian tweets by combining deep learning-based and machine learning-based approaches. Four sets of features that cover different types of sarcasm were proposed, namely deep polarity, sentiment, part-of-speech, and punctuation features. These features were utilised to classify the tweets as sarcastic and non-sarcastic. In this study, the deep polarity feature was proposed by conducting a sentiment analysis using a deep neural network architecture. In addition, to extract the sentiment feature, a Persian sentiment dictionary was developed, which consisted of four sentiment categories. The study also used a new Persian proverb dictionary in the preparation step to enhance the accuracy of the proposed model. The performance of the model was analysed using several
standard machine learning algorithms. The results of the experiment showed that the method outperformed the baseline method and reached an accuracy of 80.82%. The study also examined the importance of each proposed feature set and evaluated its added value to the classification.

Keywords: Sarcasm detection, natural language processing, machine learning, sentiment analysis, classification.

INTRODUCTION

Twitter has become one of the biggest destinations for people to put forward their opinions. There is a great opportunity for companies and organisations to notice users’ ideas (Rajadesingan, Zafarani, & Liu, 2015). Sentiment analysis has emerged as a field of study in identifying opinionative data on the Web and classifying them according to their polarity. Sentiment analysis can present reasonable opportunities for marketers to generate market intelligence on consumer attitudes and help organisations to fulfil their purposes (Al-Otaibi et al., 2018). Nevertheless, there are some problems with using sentiment analysis tools. One of these problems is the existence of sarcasm in the user’s view, which can lead to misclassification in sentiment analysis (Maynard & Greenwood, 2014). In fact, sarcasm is one of the considerable challenges in sentiment analysis. It is an indirect way of conveying a message and can be expressed in different ways such as direct conversation, speech, text, etc. (Seyed Sadeqi & Ehsanjou, 2018). In direct conversation, facial expressions and body gestures help to recognise sarcasm. In speech, sarcasm can be derived from changes in tone. In text, it is much more difficult to identify sarcasm; however, there are some methods that help to reveal it. Cambridge Dictionary describes sarcasm as “the use of remarks that mean the opposite of what they say, made to hurt someone’s feelings or to criticise something humorously”. Consider the following tweet, “Yay! It’s a holiday weekend, and I’m on call for work!
Couldn’t be luckier!” Although this tweet includes the words “yay” and “lucky” with positive sentiments, the expression has a negative sentiment (Liu et al., 2014). This shows that detecting the sentiment of a tweet is complicated when the tweet is sarcastic. Sarcasm may create problems not only for sentiment analysis approaches but also for many other natural language processing (NLP) tasks such as review summarisation systems. Consider the following tweet, “I’m very pleased to waste my four hours on such a pathetic movie!”. Although this tweet has the word “pleased” with a positive sentiment, the whole emotion of the tweet is negative. Accordingly, if a movie review summarisation system does not employ a
sarcasm detection model, it may recognise this tweet as a positive review (Maynard & Greenwood, 2014). It has been demonstrated that state-of-the-art approaches to sentiment analysis can be highly enhanced when there is an ability to detect sarcastic statements (Hazarika et al., 2018). To compound the problem, in Persian, people tend to use sarcasm in their daily conversations for criticising and censuring, especially in political topics (Hokmi, 2017). To the best of the researchers’ knowledge, there is no work on sarcasm detection in Persian. Therefore, they aim to present a model that performs the task of sarcasm detection in Persian. The proposed approach considers different types of sarcasm, and it is evaluated on the first sarcastic Persian dataset.

The rest of the article is structured as follows. The next section describes related works. It is followed by the architecture of the proposed model. The results are shown in the fourth section, while the fifth section concludes this work and proposes possible directions for future works.

RELATED WORKS

Currently, despite several studies that have been conducted to detect sarcasm in English, there is no attempt to address this problem in Persian. However, there are several studies related to the recognition of sarcasm in different languages, and this subject has gained rapid attention. Twitter sarcasm detection techniques can be classified into five categories, namely pattern-based approach, context-based approach, deep learning-based approach, machine learning-based approach, and lexicon-based approach (Bharti et al., 2017).

Pattern-based approach: Bouazizi and Otsuki (2016) used a pattern-based approach to detect sarcasm on Twitter.
They utilised three patterns for sarcasm in tweets: (1) sarcasm as Wit, which uses capital-letter words, punctuation marks, or sarcasm-related emoticons to show a sense of humour; (2) sarcasm as Evasion, which is used when a person wants to avoid giving a clear answer; and (3) sarcasm as Whimper, in which the anger of a person is shown. Then, four sets of features were extracted: sentiment-related features, punctuation-related features, syntactic and semantic features, and pattern features. They reached an accuracy of 83.1%.

Context-based approach: Schifanella et al. (2016) built a complex classification model that worked over an entire tweet sequence rather than one tweet at a time. They deployed features based on the integration between linguistic and contextual features.
Deep learning-based approach: Some studies proposed automated sarcasm detection using deep neural network architectures. Son et al. (2019) put forward a hybrid deep learning model based on soft Attention-Based Bidirectional Long Short-Term Memory (sAtt-BLSTM) and a convolutional neural network (ConvNet). They applied Global Vectors (GloVe) for word representation. Additionally, they used feature maps generated by sAtt-BLSTM as well as punctuation-based features. Felbo et al. (2017) proposed the DeepMoji model based on the occurrence of emoji. They used Long Short-Term Memory (LSTM) as well as attention mechanisms and gained acceptable results. Ghosh and Veale (2016) built a model combining a Recurrent Neural Network (RNN) with a ConvNet. Their model produced better results compared to a recursive support vector machine (SVM).

Machine learning-based approach: Many studies have been conducted to detect sarcasm based on machine learning approaches. Rajadesingan et al. (2015) proposed a model to detect sarcasm from behavioural features using users’ past tweets. They employed theories from behavioural and psychological studies to construct their features. They used Naïve Bayes and SVM classifiers. Blamey et al. (2012) suggested a new feature to capture properties of figurative language, like emotional scenario and unexpectedness, with ambiguity and polarity. Suhaimin et al. (2019) presented a framework to support sentiment analysis by using sarcasm detection and classification. The framework comprised six modules: pre-processing, feature extraction, feature selection, initial sentiment classification, sarcasm detection and classification, and actual sentiment classification. The framework was evaluated using a nonlinear SVM and Malay social media data. The best average F-measure score of 90.5% was recorded using their framework. Suhaimin et al.
(2017) proposed a feature extraction process to detect sarcasm using Malay social media data as bilingual texts. They considered four categories of features using NLP, namely lexical, pragmatic, prosodic, and syntactic. They investigated the use of the idiosyncratic feature to capture peculiar and odd comments found in the texts. A nonlinear SVM was utilised for classification. Their results demonstrated that the combination of syntactic, pragmatic, and prosodic features produced the best performance with an F-measure score of 85.2%. Lunando and Purwarianti (2013) presented an Indonesian sarcasm detection model using unigrams, the number of interjections, words, negativity, and question words as extracted features. For term recognition, they used a translated SentiWordNet, which led to undetected terms and very low accuracy. Rahayu et al. (2018) proposed another method on Indonesian tweets based on two extracted features, namely interjection and punctuation. They also used two different weighting algorithms. Their proposed model outperformed the model proposed by Lunando and Purwarianti (2013). Nevertheless, the paucity of sophisticated features in their work leaves more room for improvement.
Lexicon-based approach: Parmar et al. (2018) utilised lexical and hyperbole features to improve sentiment analysis results. They employed MapReduce to reduce the execution time through a parallel processing platform. They suggested five parts for their proposed model. In the first part, different Twitter application programming interfaces (APIs) were used to retrieve tweets. Afterwards, the results were stored in the Hadoop Distributed File System (HDFS). Secondly, all data were preprocessed to remove noisy data such as Uniform Resource Locators (URLs). The third part handled word relationships, which was done by the part-of-speech (POS) tagging method. Subsequently, all the phrases were stored in a parsed file. As the fourth step, sentiment analysis was conducted on all phrases. The last step was a feature-based composite approach (FBCA). In this step, the algorithm used hyperbole and lexical features with punctuation and negation features to detect sarcastic tweets.

Riloff et al. (2013) developed two bags of lexicons using bootstrap techniques. These lexicons consisted of positive sentiments and negative situations. They attempted to identify sarcasm in tweets as any positive sentiment in a negative situation. However, their method had some limitations since it could not identify sarcasm across multiple sentences.

In the present study, the researchers investigate the sarcasm detection technique on Persian tweets by combining deep learning-based and machine learning-based approaches. They aim to find both word-level and sentence-level inconsistencies. Four different sets of features are extracted to cover different types of sarcasm in Persian. In addition, this study presents the first Persian proverb dictionary to avoid misclassification, which will be further explained in the next section.
PROPOSED MODEL

The existence of sarcasm in tweets can lead to a state of ambiguity. It can also decrease the accuracy of some NLP tasks such as sentiment analysis and text summarisation (Bharti et al., 2017). Consider the following tweet, "گوشیم خورد شد، بهتر از این نمیشه!!!" (My phone is broken, how better can this be???). Although it is a sarcastic tweet, without deploying a sarcasm detection model, the sentiment analysis model may be misled. Therefore, it is crucial to develop a model to detect sarcasm.

Figure 1 represents the architecture of the proposed model. The model consists of four key components, namely (1) Data Preprocessing, (2) Data Preparation, (3) Feature Extraction, and (4) Classification. From each tweet, the researchers extracted a set of features in a way that covered different types of sarcasm. Then, they used several machine learning algorithms to perform the classification.
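The four components can be chained end to end, as in the toy skeleton below. Every function body here (the cleanup regex, the feature set, the threshold rule) is an illustrative placeholder, not the features or classifiers proposed in the paper:

```python
import re

def preprocess(tweet):
    """Data Preprocessing: strip URLs, mentions, and hashtag marks."""
    tweet = re.sub(r"https?://\S+|@\w+", "", tweet)
    return tweet.replace("#", "").strip()

def prepare(tweet):
    """Data Preparation stand-in: lowercase and whitespace-tokenise."""
    return tweet.lower().split()

def extract_features(tokens):
    """Feature Extraction: exclamation density and a positive-word flag."""
    exclam = sum(t.count("!") for t in tokens)
    positive = any(t.strip("!?.,") in {"yay", "great", "lucky"} for t in tokens)
    return {"exclam": exclam, "positive": positive}

def classify(features):
    """Classification: a positive word plus heavy punctuation is a
    (very crude) sarcasm signal; the paper trains ML classifiers instead."""
    return features["positive"] and features["exclam"] >= 2

def detect_sarcasm(tweet):
    return classify(extract_features(prepare(preprocess(tweet))))
```

The value of the pipeline shape is that each stage can be swapped out independently, e.g. replacing the rule in `classify` with a trained model while keeping the earlier stages unchanged.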
Figure 1. Architecture of Persian sarcasm detection model.

Data Collection

One problem of NLP tasks in Persian is that there is no standard dataset for sentiment analysis nor for sarcasm detection (Gelbukh, 2009). Therefore, the present study created a Persian dataset for sarcasm detection using the Tweepy API. To collect sarcastic tweets, the researchers queried the API for tweets containing the hashtags #طعنه or #کنایه (both mean sarcasm). Non-sarcastic tweets were retrieved based on political hashtags. Then, these tweets were manually checked and cleaned up by removing noisy and irrelevant tweets. In this step, to avoid bias, ten different computer scientists and native speakers were employed for checking labels. Ultimately, the study collected 1,200 sarcastic tweets containing the hashtags #طعنه or #کنایه and 1,300 non-sarcastic tweets. In this dataset, sarcastic and non-sarcastic tweets were labelled 0 and 1, respectively.

Data Preprocessing

In this section, tweets were preprocessed to clean and prepare them for feature extraction. This step includes the following processes:

Text Filtering: The researchers filtered out non-Persian tweets, URLs, retweets, mentions, and some special characters such as ‘^%#-+’. In addition, all hashtags were removed, and informal words were replaced with formal words using Dehkhoda – the largest comprehensive Persian dictionary (Dehkhoda, 1931).
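A minimal sketch of the filtering step is shown below. The regular expressions and the Persian-script heuristic are assumptions for illustration, not the authors' exact rules:

```python
import re

# Hypothetical cleanup rules mirroring the text-filtering step; the
# paper does not publish its exact patterns.
PERSIAN = re.compile(r"[\u0600-\u06FF]")                # rough Persian/Arabic-script check
NOISE = re.compile(r"https?://\S+|@\w+|RT\b|[\^%#+\-]")  # URLs, mentions, retweets, specials

def is_persian(tweet):
    """Keep only tweets that contain Persian-script characters."""
    return bool(PERSIAN.search(tweet))

def clean(tweet):
    """Drop whole hashtags first, then URLs, mentions, retweet markers,
    and special characters, collapsing the leftover whitespace."""
    tweet = re.sub(r"#\S+", "", tweet)
    return re.sub(r"\s+", " ", NOISE.sub("", tweet)).strip()
```

Removing whole hashtags before the other noise matters here: stripping only the '#' would leave the tag text behind, whereas the paper removes hashtags entirely.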
Emoji Dictionary: People usually use emoji in their daily conversations on microblogs such as Twitter and Facebook. As a result, the researchers created an emoji dictionary containing all Twitter emojis. These emojis were manually labelled as happy, sad, or neutral. Based on the dictionary, each emoji in a retrieved tweet was replaced by its relative label.
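The emoji-replacement step amounts to a dictionary lookup; the entries below are a tiny illustrative subset of the full manually labelled dictionary:

```python
# A tiny stand-in for the emoji dictionary; the real one covers all
# Twitter emojis, each manually labelled happy, sad, or neutral.
EMOJI_LABELS = {"😂": "happy", "🙂": "happy", "😢": "sad", "😐": "neutral"}

def replace_emojis(tweet):
    """Substitute each known emoji with its sentiment label."""
    for emoji, label in EMOJI_LABELS.items():
        tweet = tweet.replace(emoji, f" {label} ")
    return " ".join(tweet.split())
```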
Proverb Dictionary: Persian speakers mostly use slang expressions as well as several common proverbs in their conversations (Hokmi, 2017). 1,000 common Persian slang expressions and proverbs were collected using the Dehkhoda Dictionary. The study also used the website created by Ehsan (2013) to find some other slang expressions such as: "به جهنم" (The hell with it), "سرویس کردی دهنمو" (Bite me), and "دلم مثل سیر و سرکه میجوشه" (I have a butterfly in my stomach). Finally, each tweet was scanned to replace its proverbs or slang with the direct meaning.
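The proverb-substitution scan might look like the sketch below. The phrases come from the examples above, but the "direct meanings" paired with them are hypothetical glosses, not entries from the study's dictionary:

```python
# Hypothetical entries; the study's dictionary holds ~1,000 proverbs
# and slang expressions mapped to their actual direct meanings.
PROVERBS = {
    "دلم مثل سیر و سرکه میجوشه": "مضطربم",   # assumed gloss: "I am anxious"
    "به جهنم": "برایم مهم نیست",             # assumed gloss: "I don't care"
}

def expand_proverbs(tweet):
    """Scan the tweet and substitute each proverb or slang expression
    with its direct meaning; longer phrases are replaced first so a
    shorter entry cannot clobber part of a longer one."""
    for phrase in sorted(PROVERBS, key=len, reverse=True):
        tweet = tweet.replace(phrase, PROVERBS[phrase])
    return tweet
```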
Data Preparation

In this section, four types of preparation were performed on the dataset, namely (1) Normalisation, (2) Word Tokenisation, (3) Lemmatisation, and (4) Part-of-Speech (POS) Tagging.
Normalisation: This step converted a list of words into a uniform sequence. With normalisation, the researchers aimed to overcome profound challenges in Persian, including: (1) the existence of various prefixes such as "ب" (b), "بر" (bar), "پس" (pas), "فرا" (fara), etc.; (2) different encoding forms for some characters like "ی" (y) and "ک" (k); (3) the use of the half-space, as in "آن‌ها" and "سخت‌تر"; and (4) the existence of a wide range of suffixes such as "ها" (ha), "ترین" (tarin), "یان" (yan), etc. (Mohtaj, Roshanfekr, Zafarian, & Asghari, 2018).
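The character-level part of this step can be sketched as follows. The mapping table below is a small illustrative sample of the Arabic-to-Persian code-point unifications involved, not Hazm's actual rule set; half-space (ZWNJ) and affix handling are omitted:

```python
# Minimal normalisation sketch: unify Arabic-encoded characters with their
# Persian equivalents and collapse repeated whitespace. This mapping is an
# illustrative subset; a full normaliser (e.g. Hazm's) covers many more rules.
CHAR_MAP = {
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh "ی"
    "\u0643": "\u06a9",  # Arabic kaf -> Persian kaf "ک"
}

def normalize(text: str) -> str:
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())
```

Applied to a string typed with Arabic code points, e.g. "كتاب  عالي", this yields the uniform Persian form "کتاب عالی".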
Tokenisation: This step was conducted to break each tweet down into its constitutive words. Let us consider an example: "من یک برنامه نویس هستم" (I am a programmer). It converts to "من", "یک", "برنامه نویس", "هستم".
Lemmatisation and Stemming: This step was done to decrease the size of the dataset and find the root of all words.

Part-of-Speech (POS) Tagging: This is a process of converting tweets into lists of tuples, where each tuple has the form (word, POS tag). The POS tag signifies whether the word is a noun, adjective, verb, and so on.

All preparation steps were done using Hazm, a Python library for digesting Persian text.
Feature Extraction

In this section, four sets of features were extracted, namely (1) Sentiment Feature, (2) Deep Polarity Feature, (3) POS Feature, and (4) Punctuation Feature. These features were extracted in a way that covered different types of Persian sarcasm. Based on these features, all of the retrieved tweets were represented by feature vectors. This section explains how to represent a tweet as a feature vector to train a classifier for sarcasm identification.

Deep Polarity Features

This section focuses on sentence-level inconsistency. Let us consider the tweet: "گوشیم خراب شد چه قدر خوشبختم!!!" (My phone is broken how lucky I am!!!). There is a sentence-level inconsistency between the first and second parts of the tweet (i.e. My phone is broken and how lucky I am!!!). We considered each tweet with more than n tokens as a multiple sentence tweet (MST). Based on the authors' observation on about 1,000 retrieved tweets, the proper values for n are 6, 12, and 18, respectively.
The best value for n was evaluated in the Analysis and Results section. Each MST was divided into two parts. Thus, for the mentioned tweet, it is as follows:

tweet: My phone is broken how lucky I am!!!
Is the token count greater than n (6, 12, or 18)? If so, the tweet is an MST, so divide it into two equal parts; otherwise, skip it.
Here the token count is 8, hence we have:
Part 1: My phone is broken
Part 2: how lucky I am!!!

If there is any sentiment inconsistency between the first and second parts of the tweet, it will hint at the tweet being sarcastic. To fulfil this, first, the present research combined two deep neural network models, LSTM and convolutional neural network, as proposed by Roshanfekr et al. (2017). Then, the combined deep learning model was employed on each MST. Afterwards, the deep model was applied to the first and second parts of each tweet respectively. Consequently, two new binary features, dpf1 and dpf2, were introduced (which stand for deep polarity feature).
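The MST check and split described above can be sketched as follows; the function name and return convention are ours, not the paper's:

```python
# Sketch of the MST rule: a tweet with more than n tokens is a multiple
# sentence tweet (MST) and is divided into two equal halves; shorter tweets
# are skipped (None). Names are illustrative, not from the paper's code.
def split_mst(tweet, n=6):
    tokens = tweet.split()
    if len(tokens) <= n:
        return None  # not an MST -> skip it
    mid = len(tokens) // 2
    return " ".join(tokens[:mid]), " ".join(tokens[mid:])

parts = split_mst("My phone is broken how lucky I am!!!")
# 8 tokens > 6 -> ("My phone is broken", "how lucky I am!!!")
```

With n = 6 this reproduces the paper's worked example: the 8-token tweet splits into "My phone is broken" and "how lucky I am!!!".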
The feature dpf1 is activated if the first and second parts of the MST do not have the same sentiment. The feature dpf2 is activated if the first or second part of the MST does not have the same sentiment as the whole MST. For example, in this tweet: "My phone is broken and how lucky I am!!!", the sentiment analysis model detected positive sentiment for the whole tweet and negative sentiment for the first part. Thus, the feature dpf2 was activated.
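Given a sentiment model that labels a text span, the two binary features can be sketched as below. The "pos"/"neg" label set and the callable interface are our assumptions; the paper's model is the combined LSTM/CNN network:

```python
# Sketch of the deep polarity features. `sentiment` stands in for the
# combined LSTM/CNN model: any callable mapping text to a polarity label.
def deep_polarity_features(part1, part2, whole, sentiment):
    s1, s2, sw = sentiment(part1), sentiment(part2), sentiment(whole)
    dpf1 = int(s1 != s2)              # the two halves disagree with each other
    dpf2 = int(s1 != sw or s2 != sw)  # a half disagrees with the whole tweet
    return dpf1, dpf2

# Toy lookup reproducing the paper's example: whole tweet positive,
# first part negative.
toy = {"My phone is broken": "neg",
       "how lucky I am!!!": "pos",
       "My phone is broken how lucky I am!!!": "pos"}
dpf1, dpf2 = deep_polarity_features(
    "My phone is broken", "how lucky I am!!!",
    "My phone is broken how lucky I am!!!", toy.get)
# dpf1 = 1, dpf2 = 1
```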
Sentiment Feature

This section concentrates on word-level inconsistency, which refers to the coexistence of negative and positive words within the same tweet. For example: "چه خوبه که انقدر بدبختم" (I love my misfortune). There is a word-level inconsistency between "خوب" (love) and "بدبخت" (misfortune).
To identify such an inconsistency, a sentiment dictionary was provided by using two popular Persian dictionaries, Moein (1972) and Dehkhoda (1931). The sentiment dictionary consisted of 2,500 emotional words along with their polarities and scores. There were four different polarities, considered as positive, high positive, negative, and high negative. The scores were integers from 2 (i.e. high positive) to -2 (i.e. high negative). If any word within the tweet did not exist in the sentiment dictionary, its score was set to 0.
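The per-word scoring rule can be sketched as below. The sample entries are illustrative only, not the paper's 2,500-word dictionary:

```python
# Sketch of sentiment-dictionary lookup. Entries are a tiny illustrative
# sample; the paper's dictionary holds 2,500 words scored from -2 to 2.
SENTIMENT_DICT = {
    "خوب": 1,      # good      -> positive
    "عالی": 2,     # excellent -> high positive
    "ناراحت": -1,  # sad       -> negative
    "افتضاح": -2,  # awful     -> high negative
}

def word_score(word):
    # Words absent from the dictionary get a score of 0.
    return SENTIMENT_DICT.get(word, 0)
```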
Table 1 represents these four polarities with related examples.

Table 1

Different Polarities with Examples

Polarity         Example
Positive         خوشحال (happy), خوب (good)
High positive    عالی (excellent), شگفت‌آور (wonderful)
Negative         بدبخت (miserable), ناراحت (sad)
High negative    افتضاح (awful), چندش (disgusting)

Noticeably, due to the lack of a rich Persian lexicon, the researchers built a semantic dictionary on their own.
First, they selected common emotional words along with their polarity from Dehkhoda (1931) and Moein (1972). Then, for each emotional word, a score between -2 and 2 was assigned based on the word's translation in SentiStrength.
Using the semantic dictionary, four auxiliary features were extracted by counting the number of positive, negative, high positive, and high negative words in each tweet. These features were named pw, nw, PW, and NW. Then, in line with Bouazizi and Ohtsuki's (2015) works, the ratio of the tweet p(t) is defined in Equation 1.
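The four auxiliary counts that feed Equation 1 can be sketched as below, reusing an illustrative score-per-word lookup; the variable names pw, nw, PW, and NW follow the text:

```python
# Sketch of the four auxiliary counts per tweet: positive (pw), negative
# (nw), high-positive (PW), and high-negative (NW) words. The dictionary is
# a tiny illustrative sample, not the paper's 2,500-word lexicon.
SENTIMENT_DICT = {"خوب": 1, "عالی": 2, "ناراحت": -1, "افتضاح": -2}

def polarity_counts(tweet):
    scores = [SENTIMENT_DICT.get(tok, 0) for tok in tweet.split()]
    return {
        "pw": sum(s == 1 for s in scores),
        "nw": sum(s == -1 for s in scores),
        "PW": sum(s == 2 for s in scores),
        "NW": sum(s == -2 for s in scores),
    }

counts = polarity_counts("این فیلم عالی بود ولی صدا افتضاح بود")
# {"pw": 0, "nw": 0, "PW": 1, "NW": 1}
```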