An enhanced personality detection system through user's digital footprints
Mohammad Mobasher and Saeed Farzi
Department of Software Engineering, K. N. Toosi University of Technology, Tehran, Iran

Correspondence: Saeed Farzi, Department of Software Engineering, K. N. Toosi University of Technology, Tehran, Iran. E-mail: saeedfarzi@kntu.ac.ir

Downloaded from https://academic.oup.com/dsh/advance-article/doi/10.1093/llc/fqaa070/6085985 by guest on 28 January 2021

Digital Scholarship in the Humanities © The Author(s) 2021. Published by Oxford University Press on behalf of EADH. All rights reserved. For permissions, please email: journals.permissions@oup.com doi:10.1093/llc/fqaa070

Abstract

One of the most important aspects of any person's life is personality, which affects one's speech, decisions, well-being, feelings and mental health. Personality detection is usually based on data collected by questionnaires, which suffer from critical problems such as the lack of direct access to individuals and to explicit personal information. Nowadays, however, social networks are a valuable resource for such studies. The footprints and traces of users on social networks provide valuable information for personality recognition. Specifically, this research introduces an intelligent personality recognition system based on modeling user behavior using sophisticated features, i.e., Statistical, Emotional, and Linguistic. Furthermore, a dataset called KNTU_Personality, based on the MBTI personality model and containing profile information and tweets, has been collected. The experimental study follows two scenarios with complementary objectives. First, a sensitivity analysis is performed with respect to parameter settings, the introduced features and different learning algorithms. Next, the proposed system is compared with well-known personality detection systems. The results demonstrate the superiority of the proposed system over its counterparts in terms of F-Score, Precision, Recall and Accuracy.

1 Introduction

For a large-scale society, making policies in areas such as education, mass media, and community orientation in order to eliminate specific anomalies requires a proper perception of the society. This perception can be achieved by identifying the personalities of the people in the society. Personality recognition is also used in a variety of other fields such as finance (Kannadhasan et al., 2016; Wang and Lu, 2018), recommendation systems (Tahmasebi and Fotouhi, 2019), mental health, personal or business relationship improvement (Orme, 2016), and determining job paths (Ting and Varathan, 2018). Nowadays, applications even use the user's personality to improve user experience (Mehta et al., 2019).

Currently, personality detection is done through questionnaires prepared by sociological specialists; nonetheless, this method suffers from two critical problems. (1) Respondents often have little desire to answer lots of questions. (2) Preparing suitable implicit questions is a hard task even for sociological specialists, since the questions need to be asked implicitly in order to reveal a variety of aspects of the respondents' personality.

The main idea for overcoming these problems is to track users' activity and follow users' footprints on social networks instead of using long and hard questionnaires. By analyzing this valuable information, identifying the personality of a user becomes an easy, precise and automatic task.
Due to the widespread use of social networks in recent years (every person spends an average of more than 135 minutes a day on social media), a pristine mine of user data has been created (Kircaburun and Griffiths, 2018). This mine is full of behavioral, contextual, structural and demographic information about different characters, including politicians, artists, athletes and ordinary people. Indeed, mining this data, despite its risks, is valuable and useful.

Social network users commonly share their opinions, either explicitly or implicitly and in a straightforward manner without regard for social interactions, on various issues, e.g., political, social and sports topics, or even their most private emotions and behaviors concerning music, movies, entertainment and so on. This has produced a considerable volume of data, making social networks an ideal platform for analyzing users' personalities (Kumar et al., 2013; Liao et al., n.d.). In the recent decade, Twitter has become one of the most popular social networks as a microblog (Sakaki et al., 2010) for sharing users' opinions, feelings and thoughts. Figure 1 shows the growth in its number of users over the years 2014–2019.

Fig. 1 Distribution of Twitter usage (https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/)

There are various theories of personality prediction in psychology, such as BigFive¹ (Mccrae and John, 1992), MBTI² (Boyle, 1995), DISC³ (Renzulli, 1990) and so on. After a literature review, the MBTI theory, one of the most common among people concerned with understanding their personality or society, was chosen for this study. This theory relies on four distinct dipoles of personality (i.e., Introvert-Extrovert, Intuitive-Sensing, Thinking-Feeling, and Judging-Perceiving).

One of the most important challenges of traditional machine learning algorithms is the lack of labeled data. As the MBTI model uses sixteen personality types, each type must have sufficient information from users who have that personality type. The absence of labeled data and the imbalanced distribution of data across different personality types are two critical problems that personality recognition systems face. To address these problems, a dataset called KNTU_Personality⁴ was collected with 1,357 users. This dataset includes users' profile information and their tweets, and will be provided free of charge to researchers.

Unlike the content of other social networks, which is typically photos or videos, Twitter content usually consists of short texts with a 280-character length limit. In this study, we introduce a personality recognition system that uses users' footprints on the Twitter social network and is based on the MBTI model. Users' footprints are modeled through three types of features: linguistic (linguistic modeling of users' tweets), emotional, and statistical (statistical and descriptive aspects of users' activities). All the experiments performed in this study fall into three general groups: (1) several boosting algorithms, (2) several feature sets, and (3) feature set combinations.

The purpose of all of these experiments is to develop a smart personality recognition system using the best algorithms and the best feature set, and to compare the results. One of the best algorithms used in this study is CatBoost, which yielded an acceptable F-score of 82%.

2 Background and Related Work

Many scholars have focused on identifying users' personalities in social networks, especially Twitter, because of the importance of these networks and their increasing usage, which make it possible to identify one's personality effectively from user-generated content (Alsadhan and Skillicorn, 2017). This section first introduces personality models and then reviews the related works.

2.1 Brief Theory of Personality modeling

The MBTI theory was developed by Myers and McCaulley based on Carl Jung's book.⁵ In this theory, human behavior is based on four essential personality attributes. These four main attributes are named Mind (Extroverts (E) and Introverts (I)), Energy (Observant (S) or Intuitive (N)), Nature (Thinking (T) and Feeling (F)), and Tactic (Judging (J) and Perceiving (P)). Each individual belongs to one of the two dimensions of each of the four attributes. The four main attributes upon which this theory is developed are described as follows.

• Mind refers to how one interacts with the world around them. In this dimension, people are divided into Introverted (I) and Extroverted (E). Extroverts usually have more relationships and friendships than introverts. Introverts, in contrast, spend most of their time immersed in their thoughts and prefer being alone rather than in groups.
• Energy focuses on how one perceives the world around them. Based on this attribute, people are divided into Observant (S) and Intuitive (N). For Observant individuals, genuine and documented information takes precedence over intuition and inspiration, so they are cautious about the details of the steps involved. Intuitive individuals, in turn, pay more attention to probabilities than to facts.
• Nature defines the individual's approach to decision-making when facing difficulties. People are divided into the two categories of Thinking (T) and Feeling (F). Thinkers have an analytical spirit and usually tell the truth, whereas Feeling individuals pay attention to the outcome and the impact that their decision may have on others.
• Tactic is based on how people are oriented in life. People are divided into two categories: Judging (J) and Perceiving (P). Judging individuals plan all their work to avoid unspecified factors and also complete all tasks. In contrast, flexible (Perceiving) people decide too early to do their job and do not plan much ahead.

In the MBTI model, each of these dimensions is denoted by a single letter, and combining them gives sixteen personality types, i.e., INTJ, INTP, ENTJ, ENTP, INFJ, INFP, ENFJ, ENFP, ISTJ, ISFJ, ESTJ, ESFJ, ISTP, ISFP, ESTP, ESFP.

2.2 Related Work

This section categorizes previous works by dataset, describes them in more detail and finally, in Table 1, summarizes them in terms of data, learning algorithm and features.

Table 1 Summary of the related works

| Research work | Year | Data | Personality model | Algorithm | Features and tools |
|---|---|---|---|---|---|
| (Quercia et al., 2012) | 2012 | myPersonality | BigFive | Relation | - |
| (Tandera et al., 2017) | 2017 | Facebook (250 Facebook users with 10,000 statuses; 150 Facebook users) | BigFive | Deep learning; traditional learning algorithm | LIWC and SPLICE |
| (Nave et al., 2018) | 2018 | myPersonality (22,252 myPersonality users) | BigFive | Linear regression | Demographic |
| (Sewwandi et al., 2017) | 2017 | Facebook | BigFive | Naïve Bayesian | LIWC |
| (Preoţiuc-Pietro et al., 2015) | 2015 | Twitter (1957 Twitter users with an average of 3400 messages) | BigFive | - | LIWC; age; gender |
| (Bharadwaj et al., 2018) | 2018 | Twitter (8600 records of each class in MBTI) | MBTI | SVM | LIWC; EmoSenticNet; ConceptNet |
| (Chhabra et al., 2019) | 2019 | Twitter | BigFive | LSTM | Bag of words with unigram, bigram and trigram |
| (Golbeck et al., 2011) | 2011 | Twitter and Facebook (50 users) | BigFive | Regression in Weka | LIWC; statistical |
| (Stankevich et al., 2018) | 2018 | Vkontakte (165 profiles) | BigFive | SVM | Statistical; age; gender |
| (Gatica-Perez et al., 2018) | 2018 | YouTube vlog (99 users) | BigFive | Correlation | Image and audio |
| (Zou and Wu, 2019) | 2018 | Sina Weibo | BigFive | Correlation | - |
| (Barry et al., 2019) | 2019 | Instagram (149 undergraduates from university) | BigFive | Correlation | Like; PNI (a); NPI (b); FoMOS (c) |
| (Yılmaz et al., 2020) | 2017 | James Pennebaker and Laura King's stream-of-consciousness essay dataset (d) | BigFive | Deep learning CNN | Google's pretrained word2vec embeddings (e) |
| (Sarwani et al., 2019) | 2019 | Twitter (25 users) | MBTI | Neural network | Bag of words |

(a) Pathological Narcissism Inventory
(b) Narcissistic Personality Inventory
(c) Fear of missing out survey
(d) http://web.archive.org/web/20160519045708/http://mypersonality.org/wiki/doku.php?id=wcpr13
(e) https://code.google.com/archive/p/word2vec/

2.2.1 Facebook dataset

Quercia et al. (2012) developed a smart system using personality information published by users in the myPersonality web application to measure the relationship between users who are highly popular on Facebook and their personality types. The personality types of people in this study were based on the BigFive theory, and the measure of the popularity of a user was the number of friends. Tandera et al. (2017) predicted the user's personality based on the BigFive theory. They used two datasets from Facebook: one consisting of 250 users' data and their last 1,000 statuses, compiled using the myPersonality web application, and the other with 150 users and their statuses. This research used linguistic features such as LIWC⁶ and SPLICE to build traditional classification models as well as deep learning models. Nave et al. (2018) investigated the relationship between personality types and interests in music. In this study, two datasets consisting of 21,929 myPersonality users, based on the BigFive theory, were used to obtain personality types. Demographic features such as age, gender and number of likes were used to build such a system.

Sewwandi et al. (2017) designed and implemented an intelligent system to identify individuals' personalities based on the BigFive theory using linguistic features of user-generated content on the Facebook social network. In this study, the features obtained from the LIWC tool contributed well to building such a system. Sarwani et al. (2019) solved a personality classification problem using a neural network algorithm. They used a set of 25 Facebook users under the BigFive theory. In this research, they first obtained the users' documents on the Facebook social network and then, using TF/IDF methods, built a neural network with a back-propagation approach. The reported accuracy is 66%. Yılmaz et al. (2020) used the BigFive theory to design and train a model for each personality type in this theory using user-generated sentences. According to the research report, the sentences in this dataset were first converted to Word2Vec vectors and then given as input to the network.

2.2.2 Twitter dataset

Preoţiuc-Pietro et al. (2015) researched the relationship between content generated by users with conditions such as depression, anxiety, and post-traumatic stress disorder (PTSD⁷) on the social network Twitter, based on the BigFive theory. They collected data from 1957 users through self-reporting, in which users reported suffering from a condition, especially depression. They used logistic regression as a classifier, with features such as age, gender, and LIWC, to construct such an intelligent system. Bharadwaj et al. (2018) developed a smart personality recognition system using the LIWC and EmoSenticNet tools on texts produced by users on Twitter. In this study, the MBTI theory and the SVM algorithm were used to categorize the users' personalities. Chhabra et al. (2019) designed and implemented a personality recognition system based on the BigFive theory using a dataset collected from Twitter. Their proposed system uses LSTM for the classification. Golbeck et al. (2011) used demographic features related to individual activities on Twitter to discover the relationship between an individual's lifestyle and their choices. The number of hashtags and the number of followers are two of the features. How an individual communicates with others or chooses friends was considered an indication of choice.

2.2.3 Vkontakte dataset

Vkontakte is a social network in Russia that is available in several languages. According to statistics from 2018, the number of users of this social network has reached 500 million.

Stankevich et al. (2018) used the BigFive theory to develop a system for identifying users' personalities in the Vkontakte social network. One of the problems in this study was the lack of labeled data. To solve this problem, the researchers first asked volunteers to fill in a questionnaire and then give their username in the social network. Their final dataset contains 165 users, along with their profile information.

2.2.4 YouTube dataset

Video blogs or video logs (vlogs) are videos in which vloggers sit in front of the camera and talk about a variety of topics such as politics, books, movies, or personal matters.

Gatica-Perez et al. (2018) studied the behavioral data of users on YouTube using vlogs that users had posted for at least three years, and were able to correlate the type of videos with the user's personality using the BigFive theory. In this study, a set of twenty-one variables was obtained for each video using online tools. According to the research, there is a connection between Extraversion and funny videos.

2.2.5 Sina Weibo dataset

Sina Weibo is a social network (microblog) in China that was launched in 2009. Zou and Wu (2019) investigated the role of user loyalty in sustaining the growth of a social network. In this study, they first explored the relationship between users' personality characteristics, based on the BigFive theory, and their loyalty. According to the results of this study, there is a strong relationship between openness and loyalty.

2.2.6 Instagram dataset

Barry et al. (2019) studied the relationship between users' selfies and their personality based on the BigFive theory. In this study, narcissism was found to be generally unrelated to selfie sharing.

Kircaburun and Griffiths (2018) investigated Instagram addiction and its association with the personality types of the BigFive model. The study was conducted on college students and also considered the extent of internet usage and the degree of narcissism. According to the results of this study, Instagram addiction has a weak relationship with the agreeableness and self-liking personality traits, as well as with the conscientiousness personality trait.

Table 1 briefly describes the related works.

3 Problem Definition

Definition (Personality Recognition): Personality recognition is defined as assigning a personality type $P$, $P \in \{INTJ, ESFJ, ESTP, ENTJ, \ldots\}$, to a given user $U$, $U \in users = \{u_1, u_2, u_3, u_4, \ldots, u_n\}$, according to behavioral, emotional and sociological characteristics. Therefore, if the user $U$ is vectorized by its features as $\vec{U} = \langle f_1, f_2, f_3, \ldots, f_m \rangle$, then personality recognition is a function approximation, as shown by Equation (1).

$$P = F(\vec{U}) \qquad (1)$$

where $P \in \{INTJ, ESFJ, ESTP, ENTJ, \ldots\}$ is a personality type based on the MBTI model. In the machine learning literature, the estimation of the function $F$ is done by a classifier whose classes are the personality types of the MBTI model.

4 Proposed System

The following three critical questions must be answered to design a classifier. (1) How to map a user to a feature space? (2) How to collect labeled training data? (3) What sort of classifier can accurately estimate the function $F$?

To address the first question, three types of features (Linguistic, Statistical and Emotional) are introduced as representatives of the behavioral, emotional and sociological characteristics of a given user. In fact, every part of personality is illuminated through one or more features. Hence, introducing sophisticated features is an essential part of machine learning projects. The proposed features are described in detail in Section 4.2. To address the second question, a research dataset called KNTU_Personality has been collected from the Twitter microblog and then produced, refined, and standardized to obtain a high-quality dataset. This dataset is described in Section 4.1. Finally, to answer the third question, given the data imbalance and the properties of the problem, different classifiers are examined, which are described in detail in Section 4.4.

The overall architecture of the proposed system is shown in Fig. 2, in which four main parts can be observed.

(1) Data gathering: Gathering complete profile information of users along with their tweets. In this phase, users' profile information as well as their tweets, together with their personality types based on the MBTI model, are collected from Twitter accounts.
(2) Feature engineering: Based on studies of the MBTI theory and the nature of the problem, which applies to all avenues of life, we decided to extract three categories of features.
(3) Pre-processing: One of the essential components in any problem is the application of pre-processing techniques suited to the type of data available. This operation is essentially an empirical task and should be based on the knowledge gained from the dataset.
(4) Model and Classification: In this study, the family of Boosting algorithms is used. Several algorithms in this family are also used for comparison and evaluation.

4.1 Data gathering

The dataset should include the following information. (1) User information (such as the number of posts and the number of followers). (2) User-generated content (such as text and image posts). (3) User feedback on others' content (such as Likes and Retweets). (4) User personality types.

Building such a dataset is done in three steps: identifying individuals whose personality type is known, removing users whose profile is private, and then gathering profile information and user-generated content. These steps are described as follows.
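The three dataset-building steps named above (keep users with a known personality type, drop private profiles, then gather profile information and user-generated content) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' code: `build_dataset`, `fetch_profile` and `fetch_tweets` are hypothetical names, the `protected` field is an assumption modelled on typical Twitter user objects, and the network calls are replaced by injected callables so that the sketch runs offline.

```python
from typing import Callable, Dict, List, Optional


def build_dataset(
    labeled_users: Dict[str, str],
    fetch_profile: Callable[[str], Optional[dict]],
    fetch_tweets: Callable[[str, int], List[str]],
    max_tweets: int = 3000,  # per-user cap; the paper reports up to 3,000 tweets
) -> List[dict]:
    """Assemble one record per public user with a known MBTI type.

    labeled_users maps a username to its MBTI label (step 1).
    fetch_profile and fetch_tweets are hypothetical stand-ins for API calls.
    """
    records = []
    for username, mbti in labeled_users.items():
        profile = fetch_profile(username)
        # Step 2: skip users whose profile is missing or private.
        if profile is None or profile.get("protected", False):
            continue
        # Step 3: gather profile information and user-generated content.
        tweets = fetch_tweets(username, max_tweets)[:max_tweets]
        records.append(
            {"user": username, "mbti": mbti, "profile": profile, "tweets": tweets}
        )
    return records


# Offline usage: invented data in place of real API responses.
fake_profiles = {
    "alice": {"protected": False, "followers_count": 10},
    "bob": {"protected": True, "followers_count": 5},
}
fake_tweets = {"alice": ["hello world", "second tweet"], "bob": ["hidden"]}

dataset = build_dataset(
    {"alice": "INTJ", "bob": "ESFP", "carol": "ENTP"},
    fetch_profile=lambda name: fake_profiles.get(name),
    fetch_tweets=lambda name, limit: fake_tweets.get(name, []),
)
print([record["user"] for record in dataset])
```

Passing the fetch functions in as arguments keeps the filtering logic independent of any particular Twitter client, so the same function could be driven by a real API wrapper or by recorded data.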
Personality detection Downloaded from https://academic.oup.com/dsh/advance-article/doi/10.1093/llc/fqaa070/6085985 by guest on 28 January 2021 Fig. 2 Proposed system Step 1: Identify and retrieve user profile of users. In all, 63% of this dataset is for women and information 37% for men. The user information used in this study is part of the 4.2 Feature extraction Twisty (Verhoeven et al., 2012) data set and its users In the classification problem, generating and selecting are English speaking. This dataset includes usernames, the right features is one of the most critical steps in along with the personality types of 1,500 users on designing a learning system. This is also addressed in Twitter’s social network. The Twython8 library is this research. Different kinds of features are extracted used to communicate with the twitter servers and col- in three kinds of Linguistic, Statistical and Emotional lect the data needed in this study, which is briefly which briefly described in following. described in Fig. 3. This function includes two steps. First, it requests 4.2.1 Linguistic to communicate with Twitter server, which requires Generally, in a statistical linguistic model, we seek to five parameters that each developer must enter a find the probability function for a sequence of differ- unique value (line 2 to 5). Then, it fetches profile in- ent words. In other words, if the sentence W is made formation of anyone in the body (line 6 to 8). up of words < w1 : w2 : w3 . . . wN >, the goal is to find the following probability. Step 2: Remove private account At this point, users whose profiles were private were PðwÞ ¼ Pðw1 w2 w3 . . . wN Þ (1) identified and removed from the body. Finally, after this step, 1,357 users were obtained. This joint probability of words is computed using the chain rule as: Step 3: Data collection After identifying users, up to 3,000 recent tweets have Y N Pðw1 w2 w3 . . . 
wN Þ ¼ P ðwi w1 w2 wi1 Þ (2) been fetched per user, using the User_Tweet function i¼1 which its pseudo-code is described in Fig. 4. After these steps, something over 3.300 million It is assumed that the writing style and lexicon set tweets were collected. Figure 5 shows the distribution that users use to write tweets contain information Digital Scholarship in the Humanities, 2021 7 of 21
M. Mobasher and S. Farzi Downloaded from https://academic.oup.com/dsh/advance-article/doi/10.1093/llc/fqaa070/6085985 by guest on 28 January 2021 Fig. 3 Function user’s profile data Fig. 4 Function user’s Tweet data 8 of 21 Digital Scholarship in the Humanities, 2021
Personality detection Downloaded from https://academic.oup.com/dsh/advance-article/doi/10.1093/llc/fqaa070/6085985 by guest on 28 January 2021 Fig. 5 Distributions of user about their personality type. Hence, a linguistic model person’s emoji, we use Emoji-Emotion dataset9 where train for each personality type. The linguistic model an emoji describes negative, neutral, and positive will model users’ writing style in terms of lexicon, emotions. grammar, and meaning for each personality type rep- As we can see in Table 2, for each user based on resented by Equation (3). Equations (10), (11) and (12), three features #PositiveEmoji, #NeutralEmoji, #NegativeEmoji are pt X n generated. P_Emoji, N_Emoji, and Ne_Emoji are score ðu: pt Þ ¼ plm ðt Þ ¼ logplm ðti jti1 Þ (3) the functions that return the number of emoji used i¼1 by user based on their category. where pt represents the personality type and 4.2.3 Emotional logplm ðti jti1 Þ is the language model probability of People’s feelings about a subject have a direct impact term ti. This feature is one of the features introduced on their textual generated-contents such as posts and by this study. tweets. Past studies have shown that emotional traits 4.2.2 Statistical can be used in many tasks, such as the classification of sentences using their latent emotions (Luo, 2018). These features refer to user information such as num- Therefore, this study has a particular emphasis on ber of likes, number of posts, age, gender, and so on in the fact that one’s personality can be identified using the social network. The primary purpose of introduc- the feelings contained in his or her textual generated- ing these features is to understand the personality type contents. For this purpose, a set of features is provided of the user based on their Statistical features on social for recognition of the emotions in the textual networks. In this research, we have tried to introduce generated-contents for each user. 
features that have the most relationship with different To do this, several tools such as NRC,10 ParallelDot types of MBTI personality types, which can provide API11 and Sentiment140 API12 have been used for relatively high quality and accuracy in building an describe each person in terms of emotion in this study. intelligent personality recognition system. Table 1 To do this, several methods have been used in this reports the engineered statistical features used and study to describe each person in terms of emotion, introduced in this study. which is described below. Also, in this feature category, there are several fea- tures that related to emoji used by users. Emoji is one 4.2.3.1 NRC. The Emotion of each tweet could be of the simplest ways to express emotion and the gen- understood by the word used in the tweet. One can eral concept that it is growing rapidly on social media examine the set of tweets of a person based on the (Lin, 2019). To further analyze and understand each words used in those tweets and as we know each Digital Scholarship in the Humanities, 2021 9 of 21
word may have multiple different senses. The NRC lexicon lets us capture these senses for more than 14,000 words (Mohammad and Turney, 2013). This dataset contains eight emotional features (anger, fear, anticipation, trust, surprise, sadness, joy, disgust) and two psychological features (negative, positive) for each word.

Table 2 Statistical formulas. Every score is the min-max normalization of a per-user count over all users, where |X(u)| is the count of items of type X for user u:

\[ \mathrm{Score}_{X}(u) = \frac{|X(u)| - \min_{i \in \mathrm{users}} |X(i)|}{\max_{i \in \mathrm{users}} |X(i)| - \min_{i \in \mathrm{users}} |X(i)|}, \qquad u \in \mathrm{users} \]

| Abbreviation | Description | X in formula | Equation |
|---|---|---|---|
| No_Hashtag (NH) | # Hashtags | Hashtag | (5) |
| No_UniquHashtag | # Unique hashtags | UniquHashtag | (6) |
| No_Mention | # Mentions | Mention | (7) |
| No_Image | # Images | Image | (8) |
| No_Like | # Likes | Like | (9) |
| No_Geo | # Geo | Geo | (10) |
| No_Whole_Emoji | # Emoji | WEmoji | (11) |
| No_Pos_Emoji | # Positive emoji | PEmoji | (12) |
| No_Neu_Emoji | # Neutral emoji | NeEmoji | (13) |
| No_Neg_Emoji | # Negative emoji | NEmoji | (14) |
| No_Question | # Question sentences | QSen | (15) |
| No_Sentence | # Sentences | Sentence | (16) |
| No_Exclamation | # Exclamations | ExclSen | (17) |
| No_Follower | # Followers | Follower | (18) |
| No_Following | # Following | Following | (19) |
| No_List | # Groups | GrMem | (20) |
| No_Tweet | # Reply | Tweet | (21) |
| No_ReTweet | # Retweets | ReTweet | (22) |
| No_Truncated | # Truncated tweets | Truncated | (23) |
| Average_Time interval | Average time for every tweet | Tweet | (24) |

4.2.3.2 ParallelDots. This API[13] can be used in four different languages, and it uses a variety of datasets to provide different emotional properties (happy, angry, excited, sarcasm, sad, fear, bored) for each text. To use this API, all the tweets of each user were collected and, as a result, the following six attributes were obtained for each user.

4.2.3.3 Sentiment140. The basis of this analysis system is the text sentiment approach designed by Go et al. (2009). The data set used in this system is from the Twitter
social network, and this API[14] has been used in many works, such as Heredia et al. (2016).

4.3 Preprocessing
Data preprocessing is a significant step, and choosing the right technique can improve the results even further. All trials go through the preprocessing stage before the modeling phase. This step involves removing URLs, names, hashtags, and extra spaces, converting letters to lowercase, and deleting users whose accounts are private. Python's regex functions are used to remove some entities from the text. To achieve the best results, we used the average value of a feature class for a person in a personality type.

4.4 Prediction model
Machine learning techniques play an essential role in solving many problems in today's world, for example in smart classification of spam emails, intelligent advertising systems, and malware detection systems. Two factors are crucial to achieving good performance on each of these problems with machine learning techniques. One is the use of practical models that can identify the complex relationships in the data. The other is that an enormous amount of data is needed to train these models well.

Among all the machine learning algorithms in use nowadays, the ensemble approaches are especially interesting. In this class of methods, a robust classifier is built by taking advantage of multiple weak classifiers, which is why these techniques are popular and effective (Friedman, 2001; Ke et al., 2017).

In summary, this technique operates by repeatedly retraining a classifier on a dataset selected according to the precision obtained in the previous step. Each of these classifiers also receives a weight based on the accuracy obtained in that iteration.

\[ H(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right) \tag{25} \]

where h_t(x) is the output of the weak classifier t on the input x, and \alpha_t is the weight of the classifier t.

In this research, we use the AdaBoost, CatBoost, GradientBoosting, XGBoost, and LightGBM algorithms in order to obtain the best results as well as to compare the results of three different gradient boosting tree algorithms.

5 Experimental Study
As mentioned before, by analyzing the footprint of Twitter users, their personality can be precisely predicted. To this end, the relationship among different personalities and features is investigated. Table 3 reports the results with different features and learning algorithms over the KNTU_Personality dataset.

The evaluation of the proposed system has been performed in two ways. First, the impact of each feature on the output quality is calculated, and next, the proposed system is compared with other well-known algorithms.

The proposed system is programmed in Python on an Intel Core i5-7200U CPU with 8 GB of memory.

5.1 Evaluation metrics
Evaluating a machine learning algorithm is an essential part of any study. Most of the time, the accuracy metric is used to evaluate a classification model; however, it is not enough to truly judge the model when the data are imbalanced. In this research, we evaluate results based on Accuracy, Precision, Recall, F1-Score and the ROC curve. Four important terms are used in these metrics:

• True Positives (TP): The cases which we predicted YES and the actual output was also YES.
• True Negatives (TN): The cases which we predicted NO and the actual output was NO.
• False Positives (FP): The cases which we predicted YES and the actual output was NO.
• False Negatives (FN): The cases which we predicted NO and the actual output was YES.

• Accuracy
Informally, accuracy is the fraction of predictions our model got right. It is the ratio of correct predictions to the total input samples (Yin et al., 2019). It can be calculated with Equation (26):

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{26} \]

• Precision
Precision is a good measure to use when the cost of a False Positive is high. It is also called the Positive Predictive Value (PPV). It can be calculated
like Equation (27). A low precision can also indicate a large number of False Positives (Davis and Goadrich, 2006; Yin et al., 2019).

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{27} \]

Table 3 Comparison of five algorithms on KNTU_Personality (F: F1-Score, R: Recall, P: Precision, A: Accuracy)

| Features | Metric | AdaBoost | CatBoost | GradientBoosting | LightGBM | XGBoost |
|---|---|---|---|---|---|---|
| S+E+L | F | 0.377 | 0.822 | 0.771 | 0.793 | 0.782 |
| S+E+L | R | 0.573 | 0.831 | 0.786 | 0.811 | 0.797 |
| S+E+L | P | 0.318 | 0.830 | 0.780 | 0.794 | 0.786 |
| S+E+L | A | 0.350 | 0.822 | 0.767 | 0.785 | 0.774 |
| E+L | F | 0.126 | 0.515 | 0.443 | 0.484 | 0.458 |
| E+L | R | 0.324 | 0.548 | 0.471 | 0.513 | 0.488 |
| E+L | P | 0.121 | 0.556 | 0.474 | 0.509 | 0.479 |
| E+L | A | 0.139 | 0.513 | 0.458 | 0.488 | 0.464 |
| S+L | F | 0.416 | 0.803 | 0.782 | 0.782 | 0.784 |
| S+L | R | 0.618 | 0.811 | 0.802 | 0.797 | 0.797 |
| S+L | P | 0.361 | 0.818 | 0.782 | 0.794 | 0.794 |
| S+L | A | 0.358 | 0.803 | 0.777 | 0.781 | 0.779 |
| S+E | F | 0.373 | 0.746 | 0.734 | 0.752 | 0.738 |
| S+E | R | 0.577 | 0.754 | 0.748 | 0.768 | 0.752 |
| S+E | P | 0.317 | 0.753 | 0.741 | 0.758 | 0.742 |
| S+E | A | 0.353 | 0.743 | 0.730 | 0.745 | 0.736 |
| L | F | 0.116 | 0.485 | 0.481 | 0.508 | 0.474 |
| L | R | 0.152 | 0.496 | 0.494 | 0.528 | 0.480 |
| L | P | 0.155 | 0.529 | 0.521 | 0.532 | 0.469 |
| L | A | 0.122 | 0.487 | 0.483 | 0.508 | 0.469 |
| E | F | 0.187 | 0.286 | 0.262 | 0.265 | 0.282 |
| E | R | 0.418 | 0.306 | 0.282 | 0.287 | 0.301 |
| E | P | 0.206 | 0.328 | 0.296 | 0.289 | 0.304 |
| E | A | 0.165 | 0.290 | 0.271 | 0.269 | 0.286 |
| S | F | 0.387 | 0.720 | 0.709 | 0.741 | 0.717 |
| S | R | 0.589 | 0.725 | 0.723 | 0.753 | 0.727 |
| S | P | 0.343 | 0.733 | 0.713 | 0.748 | 0.728 |
| S | A | 0.345 | 0.716 | 0.705 | 0.740 | 0.716 |

• Recall
Recall can be thought of as a measure of a classifier's completeness. It is also called Sensitivity or the True Positive Rate (TPR) (Davis and Goadrich, 2006).

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{28} \]

• F1-Score
F1-Score is the harmonic mean of precision and recall, and the range of this metric is [0, 1]. It tells you how precise your classifier is, as well as how robust it is. To calculate the F1-Score we can use Equation (29).

\[ F_1 = 2 \cdot \frac{1}{\frac{1}{\mathrm{precision}} + \frac{1}{\mathrm{recall}}} \tag{29} \]

The F1-Score tries to find the balance between precision and recall (Davis and Goadrich, 2006).

• Receiver Operating Characteristic (ROC)
The idea of using the ROC diagram in machine learning was first discussed in 2005. This diagram plots the TPR (Sensitivity) against the FPR (1 - Specificity). The TPR is calculated by Equation (28) and the FPR by Equation (30).

\[ \mathrm{FPR} = \frac{FP}{FP + TN} \tag{30} \]

According to this metric, a suitable model is placed at the top left of the diagram, at the point (TP = 100%, FP = 0), while an unsuitable one is placed at the bottom right, at the point (TP = 0, FP = 100%) (Prati and Flach,
2005). An area of 1.0 represents a model that made all predictions perfectly. An area of 0.5 represents a model as good as random.

5.2 Experiments
Applying each category of features used in this study, or combining them, produces different results. Therefore, in Table 3, all combinations of the feature categories are examined.[15] For better results, we also use the gender feature in all models. Each feature category is denoted by the following abbreviations: Linguistic (L), Statistical (S) and Emotional (E). All results are based on 10-fold cross-validation, where folds are randomly sampled from the data.

As we can see in Table 3, the CatBoost algorithm gives the highest Accuracy, Precision, Recall and F1-Score for most feature combinations, including the combination of all three categories. The negative aspect of this algorithm is the amount of time it needs to learn. Figure 6 shows the learning and testing time of each algorithm with the statistical features. As we can see, the learning time of the CatBoost algorithm is longer than that of the others. To resolve this problem, we should change hyperparameters such as grow_policy, depth and so on. This can be done manually by the user or with a tuning procedure (Probst et al., 2019).

Fig. 6 Learning and testing time of statistical features

The best result among the three feature categories is obtained with the statistical features. These features are fully numerical and represent individuals by their statistical activities. This result indicates that a person's statistical activities in social networks are related to personality, which helps in examining a person's personality type.

After the statistical features, the best feature category for predicting a person's personality is the linguistic features. In personality prediction, it reaches an F1-Score of about 50% with LightGBM and 48% with CatBoost.

Finally, the emotional features have the lowest F1-Score across the algorithms compared to the other two feature categories.

The results in Fig. 7 are based on the correlation of each feature category with the MBTI classes. There are many metrics to measure the correlation between a feature and a class label, such as mutual information, chi-square, and the Pearson correlation coefficient. In this work, we use the Pearson correlation coefficient. The coefficient shown in Figure 7 can be calculated as follows:

\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{31} \]

where n is the number of samples, x and y are the items that we want to compare, x_i and y_i are the values of element i in the samples, and \bar{x} and \bar{y} denote the mean of each item.
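As an illustration of Equation (31), the following is a minimal sketch of the Pearson coefficient in plain Python. It is not the authors' implementation: the function name `pearson_r` and the toy values (a normalized feature score per user and a 0/1 indicator for membership in one class) are hypothetical and only show how a feature can be correlated with a class label.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Equation 31)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of the deviations from the means
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (math.sqrt(ss_x) * math.sqrt(ss_y))


# Hypothetical toy data (not taken from KNTU_Personality):
# a min-max normalized feature score per user and a 0/1 class indicator.
feature_score = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
in_class = [0, 0, 1, 0, 1, 1]
r = pearson_r(feature_score, in_class)
```

Encoding class membership as 0/1 is one possible choice for correlating a numeric feature with a class label; the source does not state which encoding was used.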
Fig. 7 Categories correlation with MBTI classes based on Pearson

Fig. 8 CatBoost ROC curve

Fig. 9 Feature importance with p threshold (p > 0.044)
The value range for this metric is between -1 and 1: -1 denotes perfect negative correlation, 1 denotes perfect positive correlation, and 0 denotes no correlation.

As we can see, the statistical features are more strongly related to the MBTI classes than the other feature categories.

To achieve better results, we use all feature categories together. Table 3 shows the results for the combined feature categories; we can see an improvement for each combination, and in the end we achieve the best result with the combination of all three categories.

To better understand these results, we show the ROC curve of the CatBoost algorithm with all feature categories.

As explained in Section 5.1, if the area under the curve is closer to 1, the model responds well for that class. For example, according to Fig. 8, INTP and ISFP have low values, and ISFP has a lower value than INTP. According to Fig. 5, the number of samples for ISFP is smaller than for INTP; the correlation between ISFP and the MBTI classes is also lower than for INTP.

As we can see, some classes such as INTJ have a good result in the ROC curve; reviewing Fig. 5 shows that this class has more users than other classes. A closer look at Figs 5 and 8 shows that classes with more women than men, such as ISTP and ESTP, have lower results. Thus, it can be concluded that gender is an essential feature for this research.

Figure 9 shows the selection of important features for each category with p threshold (p > 0.044). As we can see, the gender feature is an important feature in all of them. In the statistical category, Truncated and Geo are good features. The tendency to write a lot (more than 140 characters), as well as to share location when
publishing a post, seems to be very important in the process of personality recognition.

Table 4 Important feature correlation with MBTI classes

| Class | #Truncated (Statistical) | #Geo (Statistical) | #Photo (Statistical) | Bored (Emotional) | Happy (Emotional) | Sarcasm (Emotional) |
|---|---|---|---|---|---|---|
| ISTP | 0.045 | 0.232 | 0.382 | 0.0223 | 0.3372 | 0.251 |
| ISTJ | 0.099 | 0.007 | 0.226 | 0.4443 | 0.0196 | 0.1736 |
| ISFP | 0.073 | 0.391 | 0.571 | 0.1105 | 0.25703 | 0.0598 |
| ISFJ | 0.064 | 0.189 | 0.330 | 0.6245 | 0.0703 | 0.1527 |
| INTP | 0.090 | 0.162 | 0.162 | 0.0883 | 0.6437 | 0.37 |
| INTJ | 0.108 | 0.364 | 0.137 | 0.0753 | 0.1644 | 0.2536 |
| INFP | 0.133 | 0.124 | 0.100 | 0.1696 | 0.1301 | 0.0784 |
| INFJ | 0.171 | 0.032 | 0.236 | 0.0307 | 0.3824 | 0.1129 |
| ESTP | 0.040 | 0.305 | 0.026 | 0.1446 | 0.426 | 0.0576 |
| ESTJ | 0.060 | 0.231 | 0.409 | 0.2884 | 0.1066 | 0.3442 |
| ESFP | 0.049 | 0.501 | 0.044 | 0.1086 | 0.0733 | 0.4349 |
| ESFJ | 0.060 | 0.117 | 0.115 | 0.1043 | 0.1568 | 0.3413 |
| ENTP | 0.087 | 0.146 | 0.184 | 0.0299 | 0.0142 | 0.0023 |
| ENTJ | 0.948 | 0.024 | 0.167 | 0.0513 | 0.0722 | 0.1272 |
| ENFP | 0.056 | 0.367 | 0.374 | 0.5351 | 0.2032 | 0.3460 |
| ENFJ | 0.103 | 0.097 | 0.242 | 0.2350 | 0.4221 | 0.0746 |

According to Table 4, every feature has a positive correlation with some MBTI classes and a negative correlation with others. The #Photo feature has a good positive correlation with Introversion and a negative correlation with Extraversion. #Geo has a positive correlation with Extraversion, which means that users who share their location are more often of the Extraversion type than the Introversion type. Among the emotional features, as the sample features in Table 4 show, positive features (Happy) have a good positive correlation with the Extraversion type, and negative features such as Bored have a negative correlation with the Introversion type; therefore, it can be concluded that people of the Extraversion type share positive content.

According to Table 4, almost all features have a positive correlation with the INTJ class. As we can see in the ROC result (Fig. 8), this class has a good result.

6 Conclusion and Future Work
The first contribution of this work is a new dataset for personality recognition studies called KNTU_Personality. It contains information from more than 1,200 Twitter accounts, including their profile information together with their tweets. The study on this dataset reveals that the user's gender, together with the average interval between their tweets, is very useful in recognizing personality. The results of this study confirm that the two dimensions of Extraversion and Introversion can be well predicted from user information in social networks.

The second contribution of this work is to show that, from user information on social networks, including linguistic, emotional, and statistical features, boosting algorithms can be developed for personality recognition with excellent results. In the past, however, most of the features used were restricted to a limited number of emotional or statistical features or a combination of these two.

Given the shortage of tagged data in this area, our work can be used to fill this gap. The CatBoost algorithm can be used to tag the data, which can then be verified by experts.

One interesting area to be investigated in future work is the use of models such as deep learning to improve the results of personality recognition studies. To obtain a larger and better dataset in this area, the best model from this research (CatBoost) can be used to label new users of the Twitter social network, with an expert then evaluating the result.

References
Alsadhan, N., and Skillicorn, D. (2017). Estimating Personality from Social Media Posts. In IEEE International Conference on Data Mining Workshops,
ICDMW, 2017-Novem, pp. 350–6. https://doi.org/10.1109/ICDMW.2017.51
Barry, C. T., McDougall, K. H., Anderson, A. C. et al. (2019). 'Check Your Selfie before You Wreck Your Selfie': Personality ratings of Instagram users as a function of self-image posts. Journal of Research in Personality, 82: 103843. https://doi.org/10.1016/j.jrp.2019.07.001
Bharadwaj, S., Sridhar, S., Choudhary, R., and Srinath, R. (2018). Persona Traits Identification based on Myers-Briggs Type Indicator (MBTI) - A Text Classification Approach. In 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018, pp. 1076–82. https://doi.org/10.1109/ICACCI.2018.8554828
Boyle, G. J. (1995). Myers-Briggs Type Indicator (MBTI): Some psychometric limitations.
Chhabra, G. S., Sharma, A., and Murali Krishnan, N. (2019). Deep Learning Model for Personality Traits Classification from Text Emphasis on Data Slicing. In IOP Conference Series: Materials Science and Engineering, 495(1). https://doi.org/10.1088/1757-899X/495/1/012007
Davis, J., and Goadrich, M. (2006). The relationship between precision-recall and ROC curves. ACM International Conference Proceeding Series, 148: 233–40. https://doi.org/10.1145/1143844.1143874
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): 1189–232. https://doi.org/10.2307/2699986
Gatica-Perez, D., Sanchez-Cortes, D., Tri Do, T. M., Jayagopi, D. B., and Otsuka, K. (2018). Vlogging over time: Longitudinal impressions and behavior in YouTube. In ACM International Conference Proceeding Series, pp. 37–47. https://doi.org/10.1145/3282894.3282922
Go, A., Bhayani, R., and Huang, L. (2009). Twitter Sentiment Classification using Distant Supervision. Processing, 1–6.
Golbeck, J., Robles, C., Edmondson, M., and Turner, K. (2011). Predicting personality from twitter. In Proceedings - 2011 IEEE International Conference on Privacy, Security, Risk and Trust and IEEE International Conference on Social Computing, PASSAT/SocialCom 2011, pp. 149–56. https://doi.org/10.1109/PASSAT/SocialCom.2011.33
Heredia, B., Khoshgoftaar, T. M., Prusa, J., and Crawford, M. (2016). Cross-Domain sentiment analysis: An empirical investigation. In Proceedings - 2016 IEEE 17th International Conference on Information Reuse and Integration, IRI 2016, pp. 160–65. https://doi.org/10.1109/IRI.2016.28
Kannadhasan, M., Aramvalarthan, S., Mitra, S. K., and Goyal, V. (2016). Relationship between biopsychosocial factors and financial risk tolerance: an empirical study. Vikalpa, 41(2): 117–31. https://doi.org/10.1177/0256090916642685
Ke, G., Meng, Q., Finley, T., et al. (2017). LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, 2017-Decem(Nips), pp. 3147–55.
Kircaburun, K., and Griffiths, M. D. (2018). Instagram addiction and the Big Five of personality: The mediating role of self-liking. Journal of Behavioral Addictions, 7(1): 158–70. https://doi.org/10.1556/2006.7.2018.15
Kumar, S., Morstatter, F., and Liu, H. (2013). Twitter Data Analytics. Springer, p. 89. https://doi.org/10.1007/978-1-4614-9372-3
Liao, Y., Moshtaghi, M., Han, B. et al. (n.d.). Mining Micro-Blogs: Opportunities and Challenges.
Lin, F. (2019). Positive or negative: emoji usage in online social media. 334(Hsmet), pp. 512–16. https://doi.org/10.2991/hsmet-19.2019.95
Luo, J. (2018). Emotional Analysis Oriented to Short Texts. 166(Amcce), pp. 567–70. https://doi.org/10.2991/amcce-18.2018.98
Mccrae, R. R., and John, O. P. (1992). The five-factor model: issues and applications. Journal of Personality, 60(2): 175–532. http://www.ncbi.nlm.nih.gov/pubmed/1635040
Mehta, Y., Majumder, N., Gelbukh, A., and Cambria, E. (2019). Recent trends in deep learning based personality detection. Artificial Intelligence Review. https://doi.org/10.1007/s10462-019-09770-z
Mohammad, S. M., and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3): 436–65. https://doi.org/10.1111/j.1467-8640.2012.00460.x
Nave, G., Minxha, J., Greenberg, D. M., Kosinski, M., Stillwell, D., and Rentfrow, J. (2018). Musical preferences predict personality: evidence from active listening and Facebook likes. Psychological Science, 29(7): 1145–58. https://doi.org/10.1177/0956797618761659
Orme, J. (2016). Re-examining the Use of Behavioral Assessment Tools for Employee Selection.
Prati, R. C., and Flach, P. A. (2005). ROCCER: An algorithm for rule learning based on ROC analysis. In IJCAI International Joint Conference on Artificial Intelligence, pp. 823–28.
Preoţiuc-Pietro, D., Eichstaedt, J., Park, G. et al. (2015). The Role of Personality, Age, and Gender in Tweeting about Mental Illness. pp. 21–30. https://doi.org/10.3115/v1/w15-1203
Probst, P., Boulesteix, A. L., and Bischl, B. (2019). Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research, 20: 1–32.
Quercia, D., Lambiotte, R., Stillwell, D., Kosinski, M., and Crowcroft, J. (2012). The personality of popular facebook users. In Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW, pp. 955–64. https://doi.org/10.1145/2145204.2145346
Renzulli, J. S. (1990). A practical system for identifying gifted and talented students. Early Child Development and Care, 63(1): 9–18. https://doi.org/10.1080/0300443900630103
Sakaki, T., Okazaki, M., and Matsuo, Y. (2010). Earthquake shakes Twitter users. 851. https://doi.org/10.1145/1772690.1772777
Sarwani, M. Z., Sani, D. A., and Fakhrini, F. C. (2019). Personality classification through social media using probabilistic neural network algorithms. International Journal of Artificial Intelligence and Robotics (IJAIR), 1(1): 9. https://doi.org/10.25139/ijair.v1i1.2025
Sewwandi, D., Perera, K., Sandaruwan, S., Lakchani, O., Nugaliyadde, A., and Thelijjagoda, S. (2017). Linguistic Features based Personality Recognition using Social Media Data.
Stankevich, M., Smirnov, I., Ignatiev, N., Grigoriev, O., and Kiselnikova, N. (2018). Analysis of big five personality traits by processing of social media users activity features. In CEUR Workshop Proceedings, 2277, pp. 162–6.
Tandera, T., Hendro, Suhartono, D., Wongso, R., and Prasetio, Y. L. (2017). Personality prediction system from Facebook users. Procedia Computer Science, 116: 604–11. https://doi.org/10.1016/j.procs.2017.10.016
Tahmasebi, M., Fotouhi, F., and Esmaeili, M. (2019). Hybrid adaptive educational hypermedia recommender accommodating user's learning style and web page features. Journal of AI and Data Mining, 7(2): 225–38. https://doi.org/10.22044/jadm.2018.6397.1755
Ting, T. L., and Varathan, K. D. (2018). Job recommendation using Facebook personality scores. Malaysian Journal of Computer Science, 31(4): 311–31. https://doi.org/10.22452/mjcs.vol31no4.5
Verhoeven, B., Daelemans, W., and Plank, B. (2012). A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling. pp. 1632–7.
Wang, S., and Lu, H. (2018). The effects of personal types and decision-making modes on irrational financial behavior of chinese students under. 10(july), pp. 59–68.
Yin, M., Vaughan, J. W., and Wallach, H. (2019). Understanding the effect of accuracy on trust in machine learning models. In Conference on Human Factors in Computing Systems - Proceedings, pp. 1–12. https://doi.org/10.1145/3290605.3300509
Yılmaz, T., Ergil, A., and İlgen, B. (2020). Deep learning-based document modeling for personality detection from Turkish Texts. Advances in Intelligent Systems and Computing, 1069: 729–36. https://doi.org/10.1007/978-3-030-32520-6_53
Zou, S., and Wu, K. (2019). Impact of Weibo User's Personality Traits on Loyalty. In Proceedings - International Joint Conference on Information, Media and Engineering, ICIME 2018, pp. 77–81. https://doi.org/10.1109/ICIME.2018.00025

Notes
1 Also known as the five-factor model and the OCEAN model
2 Myers–Briggs Type Indicator
3 Dominance Influence Steadiness Conscientiousness
4 https://github.com/MohammadMobasher/KNTU_Personality
5 Psychological Type
6 Linguistic Inquiry and Word Count
7 Posttraumatic stress disorder
8 https://twython.readthedocs.io/en/latest/
9 https://github.com/words/emoji-emotion
10 The feeling of each tweet depends on the words used. In other words, each word itself has one or more different senses. Therefore, one can examine the set of tweets of each individual from this perspective. This is done by using a dataset called the NRC that contains more than 14,000 words (Mohammad and Turney, 2013). Each word in this dataset is described with eight emotional attributes (anger, fear, anticipation, trust, surprise, sadness, joy, disgust) and two psychological attributes (negative, positive). http://www.purl.com/net/lexicons
11 This API can be used in four different languages, and it uses a variety of datasets to provide different emotional properties (happy, angry, excited, sarcasm, sad, fear, bored) for each text. In order to use this API, all the tweets were collected for each user and as a result, the following six attributes were obtained for each user. https://www.paralleldots.com/emotion-detection