Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter
Muhammad Okky Ibrohim (okkyibrohim@cs.ui.ac.id) and Indra Budi (indra@cs.ui.ac.id)
Faculty of Computer Science, Universitas Indonesia, Kampus UI, Depok, 16424, Indonesia

Abstract

Hate speech and abusive language spreading on social media need to be detected automatically to avoid conflicts between citizens. Moreover, hate speech has a target, category, and level that also need to be detected to help the authorities prioritize which hate speech must be addressed immediately. This research discusses multi-label text classification for abusive language and hate speech detection, including detection of the target, category, and level of hate speech, in Indonesian Twitter. We use machine learning approaches with Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) classifiers and Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) as the data transformation methods. We use several kinds of feature extraction: term frequency, orthography, and lexicon features. Our experiment results show that in general the RFDT classifier with LP as the transformation method gives the best accuracy with fast computation time.

1 Introduction

Hate speech is direct or indirect speech toward a person or group containing hatred based on something inherent to that person or group (Komnas HAM, 2015) [1]. Factors that are often used as bases of hatred include ethnicity, religion, disability, gender, and sexual orientation. Spreading hate speech is a very dangerous action which can have negative effects such as discrimination, social conflict, and even genocide (Komnas HAM, 2015). One of the most horrific genocides caused by the act of spreading hate speech was the genocide of the Tutsi ethnic group in Rwanda in 1994 (Stanton, 2009). The cause of the tragedy was hate speech propagated by some groups claiming that the Tutsi were the cause of increasing political, economic, and social pressure.

[1] Komisi Nasional Hak Asasi Manusia (Komnas HAM) is an independent institution that carries out studies, research, counseling, monitoring, and mediation of human rights in Indonesia. See https://www.komnasham.go.id/index.php/about/1/tentang-komnas-ham.html

In everyday life, and especially on social media, the spread of hate speech is often accompanied by abusive language (Davidson et al., 2017). Abusive language is an utterance containing abusive words/phrases conveyed to an interlocutor (an individual or a group), whether verbally or in writing. Hate speech that contains abusive words/phrases often accelerates the occurrence of social conflict, because the abusive words/phrases trigger emotions. In Indonesia, abusive words are usually derived from: unpleasant conditions such as mental disorder, sexual deviation, physical disability, lack of modernization, lack of etiquette, conditions not allowed by religion, and other unfortunate circumstances; animals considered to have bad characteristics, to be disgusting, or to be forbidden in certain religions; astral beings said to interfere with human life; dirty and bad-smelling objects; body parts and activities related to sexual activity; and low-class professions forbidden by religion (Wijana and Rohmadi, 2010; Ibrohim and Budi, 2018). In general, the abusive words used to curse someone (spreading hate speech) in Indonesia are divided into three types: words, phrases, and clauses (Wijana and Rohmadi, 2010). Although abusive language is sometimes just used as a joke (not to offend anyone), its use on social media can still lead to conflict because of misunderstandings among netizens (Yenala et al., 2017).
Moreover, children could be exposed to language inappropriate for their age through abusive language scattered across the social media they use (Chen et al., 2012).

Hate speech and abusive language on social media must therefore be detected, both to avoid conflicts between citizens and to keep children from learning hate speech and inappropriate language from the social media they use (Komnas HAM, 2015; Chen et al., 2012). In recent years, many researchers have worked on hate speech detection (Waseem and Hovy, 2016; Alfina et al., 2017, 2018; Putri, 2018; Vigna et al., 2017) and abusive language detection (Turaob and Mitrpanont, 2017; Chen et al., 2012; Nobata et al., 2016; Ibrohim and Budi, 2018; Ibrohim et al., 2018) in various social media genres and languages.

According to Komnas HAM (2015), hate speech has a certain target, category, and level. Hate speech can belong to categories such as ethnicity, religion, race, or sexual orientation, and is directed at a particular individual or group with a certain level of hatred. However, based on our literature study, there has been no research on abusive language and hate speech detection that also detects the hate speech target, category, and level simultaneously. Much research on hate speech detection (Waseem and Hovy, 2016; Alfina et al., 2017, 2018; Putri, 2018) only identifies whether a text is hate speech or not. In 2017, Vigna et al. (2017) performed research on hate speech level detection, classifying Italian Facebook posts and comments into three labels: no hate speech, weak hate speech, and strong hate speech. However, they did not classify the target and category of the hate speech. Similarly, many studies on abusive language detection (Turaob and Mitrpanont, 2017; Chen et al., 2012; Nobata et al., 2016) only identify whether a text is abusive or not. In 2018, Ibrohim and Budi (2018) conducted research on hate speech and abusive language detection, classifying Indonesian tweets into three labels: not hate speech, abusive but not hate speech, and abusive and hate speech. However, like other studies on hate speech and abusive language detection, they did not classify the target and category of the hate speech.

According to Hernanto and Jeihan (2018) [2][3], detecting the hate speech target, category, and level is important to help the authorities prioritize cases of hate speech that must be handled immediately. In this work, we research hate speech and abusive language detection in Indonesian Twitter. We chose Twitter as our data source because Twitter is one of the social media platforms in Indonesia that is often used to spread hate speech and abusive language (Alfina et al., 2017, 2018; Putri, 2018; Ibrohim and Budi, 2018; Ibrohim et al., 2018). This is a multi-label text classification problem, where a tweet can be not hate speech, not hate speech but abusive, hate speech but not abusive, or hate speech and abusive; furthermore, hate speech also has a certain target, category, and level.

[2] Staff of Direktorat Tindak Pidana Siber Bareskrim Polri.
[3] Direktorat Tindak Pidana Siber Badan Reserse Kriminal Kepolisian Negara Republik Indonesia (Bareskrim Polri) is a directorate of the Indonesian national police in charge of fostering and carrying out the function of investigating cyber crimes in Indonesia. See https://humas.polri.go.id/category/satker/cyber-crime-bareskrim-polri/

For multi-label hate speech and abusive language detection, we use a machine learning approach with several classifiers: Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT), combined with problem transformation methods: Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC). Based on several previous works, these three classifiers can produce pretty good performance for hate speech and abusive language detection in Indonesian (Alfina et al., 2017, 2018; Putri, 2018; Ibrohim and Budi, 2018; Ibrohim et al., 2018). We use several kinds of text classification features: term frequency (word n-grams and character n-grams), orthography (exclamation marks, question marks, uppercase, and lowercase), and sentiment lexicons (negative, positive, and abusive). We use accuracy to evaluate our proposed approach (Kafrawy et al., 2015), and we validate our experiment results with the 10-fold cross validation technique (Kohavi, 1995).

In this paper, we built an Indonesian Twitter dataset for abusive language and hate speech detection, including detection of the target, category, and level of hate speech. In general, the contributions of this research are:

• Analyzing the target, category, and level of hate speech to produce an annotation guideline and gold standard annotations for building an Indonesian hate speech and abusive language dataset. Our annotation guideline is arranged based on (Komnas HAM, 2015) and on the results of interviews and discussions with staff of Direktorat Tindak Pidana Siber Bareskrim Polri (Hernanto and Jeihan, 2018) and a linguistic expert (Nurasijah, 2018).

• Building a dataset for abusive language and hate speech detection, including detection of the target, category, and level of hate speech, in Indonesian Twitter. We provide this research dataset to the public [4] so that it can be used by other researchers interested in extending this work.

• Conducting preliminary experiments on multi-label abusive language and hate speech detection (including hate speech target, category, and level detection) in Indonesian Twitter using machine learning approaches.

[4] https://github.com/okkyibrohim/id-multi-label-hate-speech-and-abusive-language-detection

This paper is organized as follows. We discuss hate speech targets, categories, and levels in Indonesia in Section 2. Our data collection and annotation process is described in Section 3. Section 4 presents our experiment results and discussion. Finally, the conclusions and future work of our research are presented in Section 5.
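To make the label structure concrete before Section 2 defines each part: every tweet carries a binary hate speech label and a binary abusive label, plus binary labels for the two targets, the five categories, and the three levels. The sketch below is illustrative only; the column names are assumptions and may not match the exact headers of the released dataset.

    # Illustrative multi-label representation of one tweet; column names
    # are assumed, not guaranteed to match the released CSV headers.
    TWEET_LABELS = [
        "HS",            # hate speech
        "Abusive",       # abusive language
        "HS_Individual", # target: individual
        "HS_Group",      # target: group
        "HS_Religion",   # category: religion/creed
        "HS_Race",       # category: race/ethnicity
        "HS_Physical",   # category: physical/disability
        "HS_Gender",     # category: gender/sexual orientation
        "HS_Other",      # category: other invective/slander
        "HS_Weak",       # level: weak
        "HS_Moderate",   # level: moderate
        "HS_Strong",     # level: strong
    ]

    example = {
        "text": "...",          # a (preprocessed) tweet
        "HS": 1, "Abusive": 1,
        "HS_Individual": 1, "HS_Group": 0,
        "HS_Religion": 0, "HS_Race": 0, "HS_Physical": 0,
        "HS_Gender": 0, "HS_Other": 1,
        "HS_Weak": 1, "HS_Moderate": 0, "HS_Strong": 0,
    }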
2 Hate Speech Targets, Categories, and Levels in Indonesia

In this research, we conducted a Focus Group Discussion (FGD) with staff of Direktorat Tindak Pidana Siber Badan Reserse Kriminal Kepolisian Negara Republik Indonesia (Bareskrim Polri), the agency responsible for investigating cybercrimes in Indonesia, in order to get a valid definition of hate speech, including its characterization. From the FGD with staff of Bareskrim Polri (Hernanto and Jeihan, 2018), we established that hate speech has a particular target, category, and level.

Every instance of hate speech is aimed at a particular target. In general, the target of hate speech is divided into two kinds: individual and group. Hate speech with an individual target is aimed at a single person, while hate speech with a group target is aimed at a particular group, association, or community, such as a religious group, race, political party, fan club, or hobby community.

Whether aimed at an individual or a group, hate speech has a particular category as the basis of the hatred. According to the FGD results, hate speech categories are in general as follows:

1. Religion/creed: hate speech based on a religion (Islam, Christianity, Catholicism, etc.), a religious organization/stream, or a particular creed;

2. Race/ethnicity: hate speech based on a human race (human groups defined by physical characteristics such as face shape, height, skin color, and others) or ethnicity (human groups defined by shared citizenship or cultural traditions in a geographical area);

3. Physical/disability: hate speech based on physical deficiencies/differences (e.g. the shape of the face, eyes, or other body parts) or disability (e.g. autism, idiocy, blindness, deafness, etc.), whether merely cursing someone (or a group) with words related to physical traits/disability or referring to conditions truly experienced by the target of the hate speech;

4. Gender/sexual orientation: hate speech based on gender (male and female), cursing someone (or a group) using words that degrade a gender (e.g. gigolo, bitch, etc.), or based on deviant sexual orientation (e.g. homosexual, lesbian, etc.);

5. Other invective/slander: hate speech in the form of swearing/ridicule using crude words/phrases, or other slander/incitement, not related to the four categories explained above.

Note that hate speech can be assigned several categories at once, with the exception of the other invective/slander category. In other words, hate speech under the categories religion/creed, race/ethnicity, physical/disability, or gender/sexual orientation cannot also be categorized as other invective/slander, and vice versa; a small validity check for this constraint is sketched below.
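As an illustration only (not code from the paper), the category constraint can be expressed as a check over a tweet's binary category labels; the label names follow the assumed schema shown in Section 1.

    def valid_categories(labels):
        """Category constraint from the FGD: a tweet may carry several of
        the four specific categories, but 'other invective/slander' is
        mutually exclusive with all of them."""
        specific = [labels["HS_Religion"], labels["HS_Race"],
                    labels["HS_Physical"], labels["HS_Gender"]]
        other = labels["HS_Other"]
        if other and any(specific):
            return False  # 'other' cannot co-occur with a specific category
        # A hate speech tweet needs at least one category of either kind.
        return bool(other or any(specific))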
Besides a target and categories, hate speech also has a certain level. Based on the FGD results, we divide hate speech into three levels: weak, moderate, and strong. The levels are defined as follows:

1. Weak hate speech: hate speech in the form of swearing/slander aimed at an individual, without incitement/provocation toward open conflict. In Indonesia, hate speech of this form is categorized as weak because it is a personal problem: if the target does not report it to the authorities (considering it ordinary and forgiving the people who spread the hate speech), then resolving that hate speech is not a high priority for the authorities.

2. Moderate hate speech: hate speech in the form of swearing/blasphemy/stereotyping/labeling aimed at a group, without incitement/provocation toward open conflict. Although it can invite conflict between groups, this kind of hate speech belongs to the moderate level because the resulting conflict is estimated to remain limited to social media.

3. Strong hate speech: hate speech in the form of swearing/slander/blasphemy/stereotyping/labeling aimed at an individual or group and including incitement/provocation toward open conflict. This kind of hate speech belongs to the strong level because it needs to be prioritized and resolved soon: it can invite widespread conflict and lead to physical destruction in the real world.

3 Data Collection and Annotation

In this research, we used the hate speech and abusive language Twitter datasets from several previous works: (Alfina et al., 2017, 2018), (Putri, 2018), and (Ibrohim and Budi, 2018). Besides using Twitter data from previous research, we also crawled tweets in order to enrich the dataset with kinds of hate speech and abusive language writing that may not yet exist in the earlier data. We crawled Twitter data using the Twitter Search API [5], implemented with the Tweepy library [6]. The queries we used for crawling are words/phrases often used by netizens when spreading hate speech and abusive language on Indonesian social media; examples can be seen in Appendix 1 [7]. We crawled Twitter data for about 7 months, from March 20th, 2018 until September 10th, 2018. The purpose of crawling over such a long period was to capture more tweet writing patterns. A sketch of this crawling step follows.

[5] https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
[6] http://www.tweepy.org/
[7] For the complete list of queries, see https://github.com/okkyibrohim/id-multi-label-hate-speech-and-abusive-language-detection
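The paper does not include its crawling code; the following is a minimal sketch of querying the Search API with Tweepy's 3.x-era interface (tweepy.Cursor over api.search; in Tweepy 4 the method is api.search_tweets). The credentials are placeholders, and the two queries are example words taken from Appendix 1.

    import tweepy

    # Placeholder credentials; a real crawl needs registered API keys.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    queries = ["anjing", "cebong"]  # two of the Appendix 1 query words

    tweets = []
    for q in queries:
        # Search Indonesian-language tweets matching the query word.
        for status in tweepy.Cursor(api.search, q=q, lang="id",
                                    tweet_mode="extended").items(200):
            tweets.append(status.full_text)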
In this research, we used crowdsourcing with a paid mechanism (Sabou et al., 2014) for the annotation process. Since the tweets we want to annotate have many labels, we decided to conduct the annotation in two phases; annotators who are not linguistic experts should not annotate data with too many labels at once (Sabou et al., 2014). The first annotation phase determined whether tweets are hate speech and whether they are abusive language; the second phase annotated the hate speech target, categories, and level. Tweets from (Alfina et al., 2017, 2018) and (Putri, 2018) only needed the abusive language annotation in the first phase, since their hate speech labels were already available. Meanwhile, tweets from (Ibrohim and Budi, 2018) could be annotated directly in the second phase, since their dataset was already annotated with hate speech and abusive labels.

For the annotation process, we built a web-based annotation system to make annotation easy for the annotators, speeding up the process and minimizing annotation errors. We wrote an annotator guideline giving the task definition and examples to help the annotators understand the annotation task. We also built a gold standard annotation for testing whether an annotator has understood the task. We held discussions and consultations with an expert in linguistics (Nurasijah, 2018) [8] in order to obtain a valid annotation guideline and gold standard annotation. The Twitter data used for the gold standard came from previous research (Alfina et al., 2017; Ibrohim and Budi, 2018) and the hate speech handbook (Komnas HAM, 2015).

[8] Master in sociolinguistics.

After the annotation system was built and tested, the next step was annotator recruitment. The annotators came from different religious, racial/ethnic, and residential backgrounds; this reduces bias, because the annotation of hate speech is quite subjective (Alfina et al., 2017). The selection criteria were: (a) aged 20-30 years (since most Twitter users in Indonesia fall in that range (APJII, 2017)); (b) a native speaker of Indonesian (Bahasa Indonesia); (c) experienced in using Twitter; and (d) not a member of any political party/organization (to reduce annotation bias, especially when annotating tweets related to politics).

We used 30 annotators from various demographic backgrounds: 14 males and 16 females; 25 annotators aged 20-24 years and 5 aged 25-32 years; 12 holding a bachelor degree and 18 a senior high school degree as their latest education. The annotators also came from various occupations, ethnicities, and religions: bachelor students (12), master students (3), civil servants (1), honorary employees (1), teachers/tutors/teaching assistants (5), and private employees (8); ethnically Java (11), Bali (4), Tionghoa (4), Betawi (3), Batak (2), and others (6, from Melayu, Minang, Sunda, Cirebon, Ambon, and Toraja); and by religion Islam (15), Christian (5), Catholic (5), Hindu (3), and Buddhist (2).

In the first annotation phase, we collected 16,500 tweets from the crawling process and previous research (Alfina et al., 2017, 2018; Putri, 2018) to be annotated by those 30 annotators. Every tweet was annotated by 3 annotators, and the final label was decided using a 100% agreement rule. From this phase we obtained 11,292 tweets with full agreement (68.44% of the tweets annotated in the first phase), consisting of 6,187 non-hate-speech tweets and 5,105 hate speech tweets (the reliable dataset). According to (McHugh, 2012), this proportion of reliable data (data usable for research experiments) indicates a good level of agreement.

Next, in the second annotation phase, we annotated 5,700 hate speech tweets (the 5,105 tweets from the first phase plus 595 tweets from (Ibrohim and Budi, 2018)). In this phase, the best three annotators from the first phase annotated the target, categories, and level of the hate speech, and the final label was decided by majority voting (sketched below): with 3 annotators, each label needs agreement from at least two. If there was no agreement among the annotators on a label, the tweet was deleted. In this phase, 139 tweets were deleted because there was no agreement on the hate speech category or level labels, leaving 5,561 reliable tweets (97.56% of the tweets annotated in the second phase) for the research experiments. According to (McHugh, 2012), this proportion of reliable data indicates an almost perfect level of agreement.
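As an illustration of the label aggregation just described (not the authors' actual code), majority voting over three annotators, with deletion when no value reaches two votes, could look like this:

    from collections import Counter

    def aggregate(votes):
        """votes: the 3 annotator values for one label field of one tweet
        (e.g. the hate speech level). Returns the majority value, or None
        if no value has at least 2 of the 3 votes (tweet is dropped)."""
        value, count = Counter(votes).most_common(1)[0]
        return value if count >= 2 else None

    assert aggregate(["weak", "weak", "moderate"]) == "weak"
    assert aggregate(["weak", "moderate", "strong"]) is None  # no agreement -> delete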
From this two-phase annotation process, we obtained 13,169 tweets ready for the research experiments: 7,608 non-hate-speech tweets (6,187 from the first annotation phase plus 1,421 from (Ibrohim and Budi, 2018)) and 5,561 hate speech tweets. The distribution of abusive language across non-hate-speech and hate speech tweets can be seen in Figure 1: not all hate speech is abusive language and, conversely, abusive language is not necessarily hate speech.

[Figure 1: Distribution of abusive language across non-hate-speech tweets and hate speech tweets.]

Of the 5,561 hate speech tweets, most are directed at individuals (3,575 tweets target an individual and 1,986 a group). By category, 793 tweets relate to religion/creed, 566 to race/ethnicity, 323 to physical/disability, 306 to gender/sexual orientation, and 3,740 to other invective/slander. Note that the categories religion/creed, race/ethnicity, physical/disability, and gender/sexual orientation are multi-label: a hate speech tweet can belong to several of these categories. For the hate speech level labels, the dataset consists of 3,383 weak, 1,705 moderate, and 473 strong hate speech tweets.

4 Experiments and Discussions

We conduct two experiment scenarios. The first scenario uses multi-label classification to identify abusive language and hate speech, including the target, categories, and level contained in a tweet. The second scenario uses multi-label classification to identify abusive language and hate speech in a tweet without identifying the target, categories, and level. Both scenarios are run to find the best classifier, transformation method, and features for each task, and both share the same flow, shown in Figure 2.

[Figure 2: The experiment flowchart.]

First, we preprocess the data to make the classification process more efficient and improve results. Preprocessing consists of five steps: case folding, data cleaning, text normalization, stemming, and stop word removal. Case folding lowercases all characters to standardize character case. Data cleaning removes unnecessary tokens such as the re-tweet symbol (RT), usernames, URLs, and punctuation; since we do not use emoticons for feature extraction, emoticons are also removed in this step. Text normalization then changes non-formal words into formal ones, using a dictionary combined from several previous works (Alfina et al., 2017; Ibrohim and Budi, 2018; Salsabila et al., 2018) and a dictionary we built from our own dataset. Next, stemming reduces the words in every tweet to their base forms, using the Nazief-Adriani algorithm (Adriani et al., 2007) as implemented in the Sastrawi library [9]. For stop word removal, we use the stop word list of (Tala, 2003). A sketch of this pipeline follows.

[9] https://github.com/har07/PySastrawi
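A minimal sketch of the preprocessing pipeline, assuming the PySastrawi package cited above for stemming; the normalization dictionary and stop word list here are tiny stand-ins for the combined dictionaries and the Tala (2003) list the paper actually uses.

    import re
    from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

    stemmer = StemmerFactory().create_stemmer()  # Nazief-Adriani stemming

    # Stand-in normalization dictionary; the paper combines dictionaries
    # from Alfina et al. (2017), Ibrohim and Budi (2018), Salsabila et al. (2018).
    NORMALIZE = {"gk": "tidak", "yg": "yang"}
    # Stand-in stop word list; the paper uses the list of Tala (2003).
    STOPWORDS = {"yang", "dan", "di", "ke"}

    def preprocess(tweet):
        text = tweet.lower()                                   # case folding
        text = re.sub(r"\brt\b|@\w+|https?://\S+", " ", text)  # RT, usernames, URLs
        text = re.sub(r"[^\w\s]", " ", text)                   # punctuation/emoticons
        tokens = [NORMALIZE.get(t, t) for t in text.split()]   # normalization
        tokens = [stemmer.stem(t) for t in tokens]             # stemming
        tokens = [t for t in tokens if t not in STOPWORDS]     # stop word removal
        return " ".join(tokens)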
After preprocessing, the next step is feature extraction. We use several kinds of features: term frequency, orthography, and lexicon features. The term frequency features consist of word n-grams (unigrams, bigrams, trigrams, and the combination of the three) and character n-grams (trigrams, quadgrams, and the combination of the two). For the orthography features, we use the numbers of exclamation marks, question marks, uppercase characters, and lowercase characters. For the lexicon features, we use the sentiment lexicon (negative and positive) of (Koto and Rahmaningtyas, 2017) and an abusive lexicon we built ourselves, compiled from the abusive words used as queries when crawling the Twitter data. After feature extraction, the dataset is ready for classification.

For the classifiers, we use three machine learning algorithms: Naive Bayes (NB), Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT). Based on previous works (Alfina et al., 2017, 2018; Putri, 2018; Ibrohim and Budi, 2018), these three algorithms give pretty good performance for hate speech and abusive language detection in Bahasa Indonesia (the Indonesian language). Note that these three classifiers produce single-label output, so they cannot solve multi-label text classification directly. To overcome this, we apply data transformation methods (Kafrawy et al., 2015) so that the classifiers can handle the multi-label problem. We use three data transformation methods: Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) (Kafrawy et al., 2015). We first run the classification using each type of feature extraction separately, and then using the combination of the best feature from each type. For evaluation, we use 10-fold cross-validation (Kohavi, 1995) with accuracy as the evaluation metric. Accuracy is calculated as follows (Kafrawy et al., 2015), and a small implementation is sketched at the end of this subsection:

    \mathrm{Accuracy} = \frac{1}{D} \sum_{i=1}^{D} \frac{|\hat{L}^{(i)} \wedge L^{(i)}|}{|\hat{L}^{(i)} \vee L^{(i)}|} \times 100\%    (1)

where D is the total number of documents in the corpus (dataset), \hat{L}^{(i)} is the predicted label set of the i-th document, and L^{(i)} is the actual label set of the i-th document; \wedge and \vee denote the intersection and union of the label sets.

4.1 First Scenario Experiment Results

The first experiment scenario was run to find the combination of features, classifier, and data transformation method that gives the best accuracy in identifying abusive language and hate speech, including the target, categories, and level contained in a tweet. We first experiment with every type of feature extraction separately. The best feature of each type, by average accuracy over all classifiers and data transformation methods, is shown in Table 1.

Table 1: Best feature of each type, by average accuracy, for the first scenario

    Type of feature     Best feature (by average accuracy)    Average accuracy (%)
    word n-gram         word unigram + bigram + trigram       59.44
    character n-gram    character quadgrams                   52.55
    orthography         question mark                         44.44
    lexicon             negative sentiment                    44.45

Based on the average accuracies in Table 1, for the first experiment scenario the combination of word unigrams, bigrams, and trigrams is the best word n-gram feature; character quadgrams is the best character n-gram feature; question mark is the best orthography feature; and negative sentiment is the best lexicon feature. Viewed individually, however, the RFDT classifier with the LP data transformation method and the word unigram feature gives the best performance, at 66.12% accuracy.

After obtaining the best feature of each type, we experiment with the combination of these best features. The combination does not significantly improve classification accuracy: the best combined-feature result, the RFDT classifier with the LP transformation using character quadgrams, question mark, and negative sentiment, reaches only 65.73% accuracy, still below the 66.12% of the RFDT classifier with LP using the word unigram feature alone.
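For concreteness, Equation (1) — label-set intersection over union, averaged over documents — can be implemented as below. This is an illustrative sketch, not the authors' code; y_true and y_pred are lists of per-document label sets.

    def multilabel_accuracy(y_true, y_pred):
        """Equation (1): mean over documents of |pred ∩ actual| / |pred ∪ actual|."""
        total = 0.0
        for actual, pred in zip(y_true, y_pred):
            actual, pred = set(actual), set(pred)
            union = actual | pred
            # Convention assumed here: two empty label sets count as a match.
            total += len(actual & pred) / len(union) if union else 1.0
        return 100.0 * total / len(y_true)

    # Example: one perfect match, one half match -> 75.0
    print(multilabel_accuracy([{"HS", "Abusive"}, {"HS"}],
                              [{"HS", "Abusive"}, {"HS", "Abusive"}]))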
fier, and data transformation method that we used Based on classification results and data analysis, that can give the best accuracy in identifying abu- we observe that the word unigram features gives sive language and hate speech including the target, the best accuracy may be because they represent categories, and level that was contained in a tweet. the characteristics of each label. In each classifica- To obtain that, we do experiments using every type tion label, there are words that characterize the la- of feature extractions first. The experiment results bel. For example, in hate speech label, each tweet for the best type of feature based on average accu- that labeled as hate speech contain hate words such racy using all classifiers and data transformation as abusive words that demean an individual or methods for the first scenario is in Table 1. group (e.g. jelek (ugly), murahan (gimrack), etc.), Based on average accuracy when doing experi- hate words related to politics in Indonesia (e.g. an- ments using every type of feature extractions (Ta- tek (henchman), komunis (communist), etc.), and 52
4.2 Second Scenario Experiment Results

The second experiment scenario was run to find the combination of features, classifiers, and data transformation methods that gives the best accuracy in identifying abusive language and hate speech in a tweet without identifying the target, categories, and level of hate speech. As in the first scenario, we first experiment with every type of feature extraction separately. The best feature of each type, by average accuracy over all classifiers and data transformation methods, is shown in Table 2.

Table 2: Best feature of each type, by average accuracy, for the second scenario

    Type of feature     Best feature (by average accuracy)       Average accuracy (%)
    word n-gram         word unigram                             73.53
    character n-gram    character quadgrams                      72.44
    orthography         exclamation mark                         45.27
    lexicon             positive sentiment + abusive lexicon     52.10

Based on the average accuracies in Table 2, for the second experiment scenario word unigram is the best word n-gram feature; character quadgrams is the best character n-gram feature; exclamation mark is the best orthography feature; and the combination of positive sentiment and the abusive lexicon is the best lexicon feature. Viewed individually, the RFDT classifier with the LP data transformation method and the word unigram feature gives the best performance, at 76.16% accuracy.

After obtaining the best feature of each type, we experiment with the combination of these best features. In this second scenario, the combination gives slightly better performance than the individual features: the RFDT classifier with the LP transformation using the combination of word unigram, character quadgrams, positive sentiment, and abusive lexicon features gives the best performance, at 77.36% accuracy.

4.3 Discussion

Across the two experiment scenarios, word unigram, RFDT, and LP form the best combination of feature, classifier, and data transformation method. In the second scenario, our approach reaches a reasonably good performance for multi-label classification of abusive language and hate speech without target, category, and level identification: 77.36% accuracy with the RFDT classifier, the LP data transformation method, and word unigram feature extraction. However, when the target, categories, and level must also be identified, as in the first scenario, the best of our approaches still does not perform well enough (only 66.12% accuracy).

Our error analysis using the confusion matrix (Fawcett, 2006) of each classification label shows that the most common type of error is the false negative. This misclassification is likely due to the large imbalance in our dataset. According to (Ganganwar, 2012), an imbalanced dataset can hurt classification performance: the skew between majority and minority classes tends to make performance on the majority class better than on the minority class, so the dataset needs to be balanced. Balancing can be done by collecting new data and annotating with a focus on minority-labeled data, though the labeling process may then become more expensive (Sabou et al., 2014). Other methods for balancing the dataset are data resampling (Chawla et al., 2002) and data augmentation (Wang and Yang, 2015; Kobayashi, 2018).
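As a pointer only (neither experiment in the paper applies it), the resampling route cited above — SMOTE (Chawla et al., 2002) — is available in the imbalanced-learn package; on a single binary label it looks like the sketch below, while the multi-label case needs the extra care discussed next.

    from collections import Counter
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    # Synthetic stand-in for one imbalanced binary label (e.g. HS_Strong vs. rest).
    X, y = make_classification(n_samples=500, weights=[0.9], random_state=42)
    # SMOTE synthesizes new minority-class points between existing neighbors.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    print(Counter(y), "->", Counter(y_res))  # minority class oversampled to parity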
Note that balancing a dataset in a multi-label problem is quite difficult because of the relationships between labels (Giraldo-Forero et al., 2013). Several techniques can help here, one of which is hierarchical multi-label classification (Madjarov et al., 2014). In this paper, the multi-label classification problem can be cast as a hierarchical one: first identify hate speech and abusive language, and then reclassify the tweets identified as hate speech to identify the target, categories, and level separately. This approach makes dataset balancing easier, since classification is done separately for each label type (Feng and Zheng, 2017). A sketch of this two-stage idea follows.
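A hedged sketch of the hierarchical decomposition suggested above — illustrative, not an implementation from the paper: a first-stage classifier decides hate speech/abusive on all tweets, and second-stage classifiers, trained only on the hate speech subset, decide target and level (categories would follow the same pattern).

    from sklearn.ensemble import RandomForestClassifier

    def train_hierarchical(X, labels):
        """X: feature matrix; labels: dict of binary numpy arrays keyed by
        label name. Stage 1 trains on all tweets; stage 2 trains only on
        the rows labeled as hate speech."""
        stage1 = {name: RandomForestClassifier().fit(X, labels[name])
                  for name in ("HS", "Abusive")}
        hs_rows = labels["HS"] == 1  # boolean mask of hate speech tweets
        stage2 = {name: RandomForestClassifier().fit(X[hs_rows],
                                                     labels[name][hs_rows])
                  for name in ("HS_Individual", "HS_Group",
                               "HS_Weak", "HS_Moderate", "HS_Strong")}
        return stage1, stage2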
We then built a From the FGD results, we obtained that han- dataset for abusive language and hate speech iden- dling hate speech problem in social media is not tification (including identification of targets, cat- just about identifying whether a text/document is egories, and level hate speech) using annotation hate speech or not. There are several other tasks guidelines and gold standard annotations that have which needs to done to help the authorities in han- been made. Our dataset including the annotation dling hate speech problems such as the identifica- guidelines and gold standard annotations are open tion of buzzers, thread starters, and fake account for public such that other researchers who are in- spreaders of hate speech. terested in doing research in hate speech and abu- sive language identification in Indonesian social Acknowledgments media can use it. After building the dataset, we did two experi- The authors acknowledge the PITTA A research ment scenarios. Our experiment results show that grant NKB-0350/UN2.R3.1/HKP.05.00/2019 word unigram, RFDT, and LP is the best combi- from Directorate Research and Community nation of feature, classifier, and data transforma- Services, Universitas Indonesia. 54
References

Mirna Adriani, Jelita Asian, Bobby Nazief, S. M. M. Tahaghoghi, and Hugh E. Williams. 2007. Stemming Indonesian: A confix-stripping approach. ACM Transactions on Asian Language Information Processing (TALIP), 6(4):1-33.

Ika Alfina, Rio Mulia, Mohamad Ivan Fanany, and Yudo Ekananta. 2017. Hate speech detection in the Indonesian language: A dataset and preliminary study. In International Conference on Advanced Computer Science and Information Systems (ICACSIS), pages 233-238.

Ika Alfina, Siti Hadiyan Pratiwi, Indra Budi, Rio Mulia, and Yudo Ekanata. 2018. Detecting hate speech against religion in the Indonesian language. Journal of Telecommunication, Electronic and Computer Engineering (JTEC).

APJII. 2017. Infografis Penetrasi dan Pengguna Internet Indonesia Survey 2017. Asosiasi Penyelenggara Jasa Internet Indonesia, Jakarta.

Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321-357.

Ying Chen, Yilu Zhou, Sencun Zhu, and Heng Xu. 2012. Detecting offensive language in social media to protect adolescent online safety. In Proceedings of the 2012 ASE/IEEE International Conference on Social Computing and 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust (SOCIALCOM-PASSAT '12), pages 71-80, Washington, DC, USA. IEEE Computer Society.

Thomas Davidson, Dana Warmsley, Michael W. Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In International AAAI Conference on Web and Social Media (ICWSM), pages 512-515.

Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognition Letters, 27:861-874.

S. Feng, P. Fu, and W. Zheng. 2017. A hierarchical multi-label classification algorithm for gene function prediction. Algorithms, 10(4):1-14.

V. Ganganwar. 2012. An overview of classification algorithms for imbalanced datasets. International Journal of Emerging Technology and Advanced Engineering, 2(4):42-47.

A. F. Giraldo-Forero, J. A. Jaramillo-Garzon, J. F. Ruiz-Munoz, and C. G. Castellanos-Dominguez. 2013. Managing imbalanced data sets in multi-label problems: A case study with the SMOTE algorithm. In Proceedings of the 18th Iberoamerican Congress on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 334-342.

Bayu Hernanto and Jeihan. 2018. Personal communication.

Muhammad Okky Ibrohim and Indra Budi. 2018. A dataset and preliminaries study for abusive language detection in Indonesian social media. Procedia Computer Science, 135:222-229.

Muhammad Okky Ibrohim, Erryan Sazany, and Indra Budi. 2018. Identify abusive and offensive language in Indonesian Twitter using deep learning approach. Journal of Physics: Conference Series.

R. Jannati, R. Mahendra, C. W. Wardhana, and M. Adriani. 2018. Stance classification towards political figures on blog writing. In 2018 International Conference on Asian Language Processing (IALP), pages 96-101.

Passent El Kafrawy, Amr Mausad, and Heba Esmail. 2015. Experimental comparison of methods for multi-label classification in different application domains. International Journal of Computer Applications, 114(19):1-9.

S. Kobayashi. 2018. Contextual augmentation: Data augmentation by words with paradigmatic relations. In Proceedings of NAACL-HLT 2018, pages 452-457.

Ron Kohavi. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI'95), pages 1137-1143, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Komnas HAM. 2015. Buku Saku Penanganan Ujaran Kebencian (Hate Speech). Komisi Nasional Hak Asasi Manusia, Jakarta.

F. Koto and G. Y. Rahmaningtyas. 2017. InSet lexicon: Evaluation of a word list for Indonesian sentiment analysis in microblogs. In 2017 International Conference on Asian Language Processing (IALP), pages 391-394.

G. Madjarov, I. Dimitrovski, D. Gjorgjevikj, and S. Dzeroski. 2014. Evaluation of different data-derived label hierarchies in multi-label classification. In Proceedings of the 3rd International Conference on New Frontiers in Mining Complex Patterns, pages 19-37.

Mary L. McHugh. 2012. Interrater reliability: the kappa statistic. Biochemia Medica, 22(3):276-282.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111-3119. Curran Associates, Inc.

C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (IW3C2), pages 145-153.

Muzainah Nurasijah. 2018. Personal communication.

Tansa Trisna Astono Putri. 2018. Analisis dan deteksi hate speech pada sosial Twitter berbahasa Indonesia [Analysis and detection of hate speech on Indonesian-language Twitter]. Master's thesis, Faculty of Computer Science, Universitas Indonesia.

Marta Sabou, Kalina Bontcheva, Leon Derczynski, and Arno Scharl. 2014. Corpus annotation through crowdsourcing: Towards best practice guidelines. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014). European Language Resources Association (ELRA).

Nikmatun Aliyah Salsabila, Yosef Ardhito Winatmoko, and Ali Akbar Septiandri. 2018. Colloquial Indonesian lexicon. In 2018 International Conference on Asian Language Processing (IALP), pages 236-239.

M. S. Saputri, R. Mahendra, and M. Adriani. 2018. Emotion classification on Indonesian Twitter dataset. In 2018 International Conference on Asian Language Processing (IALP), pages 90-95.

Gregory H. Stanton. 2009. The Rwandan genocide: Why early warning failed. Journal of African Conflicts and Peace Studies, 1(2):6-25.

F. Z. Tala. 2003. A study of stemming effects on information retrieval in Bahasa Indonesia. Master's thesis, Universiteit van Amsterdam, The Netherlands.

S. Turaob and J. L. Mitrpanont. 2017. Automatic discovery of abusive Thai language. In International Conference on Asia-Pacific Digital Libraries, pages 267-278.

Fabio Del Vigna, Andrea Cimino, Felice Dell'Orletta, Marinella Petrocchi, and Maurizio Tesconi. 2017. Hate me, hate me not: Hate speech detection on Facebook. In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), pages 86-95.

W. Y. Wang and D. Yang. 2015. That's so annoying!!!: A lexical and frame-semantic embedding based data augmentation approach to automatic categorization of annoying behaviors using #petpeeve tweets. In EMNLP, pages 2557-2563.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88-93, San Diego, California. Association for Computational Linguistics.

I Dewa Putu Wijana and Muhammad Rohmadi. 2010. Sosiolinguistik: Kajian, Teori, dan Analisis. Pustaka Pelajar, Yogyakarta.

Harish Yenala, Ashish Jhanwar, Manoj K. Chinnakotla, and Jay Goyal. 2017. Deep learning for detecting inappropriate content in text. International Journal of Data Science and Analytics.
Appendix 1: Examples of queries used for crawling Twitter data

    Query          Description                                                                Citation
    keparat        Abusive word (other)                                                       (Wijana and Rohmadi, 2010)
    anjing         Abusive word (other)                                                       (Wijana and Rohmadi, 2010)
    asu            Abusive word (other), other form of anjing                                 (Ibrohim and Budi, 2018)
    banci          Abusive word related to gender/sexual orientation                          (Wijana and Rohmadi, 2010)
    bangsat        Abusive word (other)                                                       (Wijana and Rohmadi, 2010)
    bencong        Abusive word related to gender/sexual orientation, other form of banci     (Wijana and Rohmadi, 2010)
    jancuk         Abusive word related to gender/sexual orientation                          (Wijana and Rohmadi, 2010)
    budek          Abusive word related to physical/disability                                (Wijana and Rohmadi, 2010)
    burik          Abusive word related to physical/disability                                (Wijana and Rohmadi, 2010)
    cocot          Abusive word (other)                                                       (Ibrohim and Budi, 2018)
    ngewe          Abusive word related to gender/sexual orientation                          (Ibrohim and Budi, 2018)
    kafir          Abusive word related to religion/creed                                     (Wijana and Rohmadi, 2010)
    kapir          Abusive word related to religion/creed, other form of kafir                (Ibrohim and Budi, 2018)
    sinting        Abusive word related to physical/disability                                (Wijana and Rohmadi, 2010)
    antek          Word related to hate speech issue in politics                              (Hernanto and Jeihan, 2018)
    asing          Word related to hate speech issue in politics                              (Hernanto and Jeihan, 2018)
    aseng          Word related to hate speech issue in politics, other form of asing         (Hernanto and Jeihan, 2018)
    ateis          Abusive word related to religion/creed                                     (Wijana and Rohmadi, 2010)
    sitip          Abusive word related to race/ethnicity                                     (Wijana and Rohmadi, 2010)
    autis          Abusive word related to physical/disability                                (Wijana and Rohmadi, 2010)
    picek          Abusive word related to physical/disability                                (Ibrohim and Budi, 2018)
    ayam kampus    Abusive phrase related to gender/sexual orientation                        (Ibrohim and Budi, 2018)
    bani kotak     Phrase related to hate speech issue in politics                            (Hernanto and Jeihan, 2018)
    cebong         Word related to hate speech issue in politics                              (Hernanto and Jeihan, 2018)
    cina           Word related to hate speech issue in politics and race/ethnicity           (Hernanto and Jeihan, 2018)
    china          Word related to hate speech issue in politics and race/ethnicity,
                   other form of cina                                                         (Hernanto and Jeihan, 2018)
    hindu          Word related to hate speech issue in religion/creed                        (Hernanto and Jeihan, 2018)
    katolik        Word related to hate speech issue in religion/creed                        (Hernanto and Jeihan, 2018)
    katholik       Word related to hate speech issue in religion/creed,
                   other form of katolik                                                      (Hernanto and Jeihan, 2018)
    komunis        Word related to hate speech issue in politics and race/ethnicity           (Hernanto and Jeihan, 2018)
    kristen        Word related to hate speech issue in religion/creed                        (Hernanto and Jeihan, 2018)
    onta           Word related to hate speech issue in politics                              (Hernanto and Jeihan, 2018)
    pasukan nasi   Phrase related to hate speech issue in politics                            (Hernanto and Jeihan, 2018)
    tionghoa       Word related to hate speech issue in politics and race/ethnicity           (Hernanto and Jeihan, 2018)