Intelligent social bots uncover the link between user preference and diversity of news consumption - arXiv
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Intelligent social bots uncover the link between user preference and diversity of news consumption Yong Min∗ Tingjun Jiang Xiaogang Jin Zhejiang University of Technology Zhejiang University of Technology Zhejiang University Hangzhou, Zhejiang Hangzhou, Zhejiang Hangzhou, Zhejiang myong@zjut.edu.cn jtj9678@163.com xiaogangj@zju.edu.cn Cheng Jin Qu Li Tencent Corporation Zhejiang University of Technology Shenzhen, Guangdong Hangzhou, Zhejiang moonjin@tencent.com liqu@zjut.edu.cn arXiv:1907.02703v1 [cs.SI] 5 Jul 2019 ABSTRACT 1 INTRODUCTION The boom of online social media and microblogging platforms has With the growing popularity of social media and microblogging rapidly alter the way we consume news and exchange opinions. platforms, the way people obtain information and form opinions Even though considerable efforts try to recommend various con- has undergone substantial change. A recent study found that social tents to users, loss of information diversity and the polarization of media has become the primary source for over 60% of users to obtain interest groups are still an enormous challenge for industry and news [2]. These users are exposed to more personalized information, academia. Here, we take advantage of benign social bots to de- which is considered to limit the diversity of content they consume sign a controlled experiment on Weibo (the largest microblogging and even cause filter bubbles [3–5]. In fact, news consumption on platform in China). These software bots can exhibit human-like social media has been extensively studied to determine what factors behavior (e.g., preferring particular content) and simulate the for- lead to polarization [6, 7]. Recent works suggest that confirmation mation of personal social networks and news consumption under bias or selective exposure plays a significant role in online social two well-accepted sociological hypotheses (i.e., homophily and tri- dynamics [6, 8]. That is, online users tend to select messages or adic closure). We deployed 68 bots to Weibo, and each bot ran for information sources supporting their existing beliefs or cohering at least 2 months and followed 100 to 120 accounts. In total, we with their preferences and hence to form filter bubbles [5]. However, observed 5,318 users and recorded about 630,000 messages exposed the mechanism of polarization remains an open question that needs to these bots. Our results show, even with the same selection be- further clarification. haviors, bots preferring entertainment content are more likely to In this paper, we conduct a controlled experiment to explore the form polarized communities with their peers, in which about 80% pattern of news consumption and the factors leading to the emer- of the information they consume is of the same type, which is a gence of polarization on Weibo (NASDAQ: WB), the most popular significant difference for bots preferring sci-tech content. The result Chinese microblogging service. Weibo is the most famous Chinese suggests that users'preference played a more crucial role in limiting social networking and microblogging platform where users can themselves access to diverse content by compared with the two post messages for instant sharing. It has more than 500 million well-known drivers (self-selection and pre-selection). Furthermore, registered users and 313 million active monthly users, including our results reveal an ingenious connection between specific content government agencies, celebrities, enterprises, and ordinary neti- and its propagating sub-structures in the same social network. In zens. The information dissemination across Weibo is chiefly based the Weibo network, entertainment news favors a unidirectional on the “attention” relationship between users [9]. User A can fol- star-like sub-structure, while sci-tech news spreads on a bidirec- low user B (i.e., becoming a “follower” of B, and B then becomes a tional clustering sub-structure. This connection can amplify the “following” of A) and then receive all posts by user B on his page. diversity effect of user preference. The discovery may have impor- Therefore, the social network of Weibo is a directed graph. Repost- tant implications for diffusion dynamics study and recommendation ing or forwarding followings’ messages are the primary methods system design. of diffusing information in Weibo. One microblog can be reposted multiple times. In a reposted microblog, the usernames of the orig- KEYWORDS inal author and the last re-poster are visible to the current user Social network, Information polarization, Controlled experiment, (Figure 1). We focus on two types of user preference (entertainment Social bot, Text classification, Microblogging and sci-tech news) and analyze their effects on news consumption behaviors as well as the local network structure. In order to explore the extent and pathway of preferences af- “…forms of media favor particular kinds of content and therefore are fecting news consumption, we propose an experimental approach capable of taking command of a culture.” by using intelligent social bots [10]. Social bots are generally con- Neil Postman [1] sidered to be harmful, although some of them are benign and, in principle, innocuous or even helpful. Therefore, social bots are ∗ Corresponding author
Composing area especially entertainment [16]. When the overwhelming entertain- ment news obscures the scientific, financial and social news that reaches their home page daily, users are likely to become addicted to the “entertainment gossip” and consume more gossip-related news [17]. A high proportion of these users are so-called fanboys Weibo type & search area or fangirls who desperately support their favorite celebrities, and some others are anti-fans of those celebrities [18]. Fans and anti- Originial weibo fans have similar features and behavior in that they are usually active in polarized clusters of like-minded people, maintaining and reinforcing their current standpoints and even attacking each other [19]. These polarized users could be the important force in making and spreading ridiculous rumors about celebrities because they want to look only for news that supports their views and does not Weibo Operations Reposted weibo care about the facts. Besides, most social networks are usually flooded with malicious attacks and profanities between fans and anti-fans, making the network environment increasingly worse [20]. Moreover, these extreme behaviors gradually extend into real life. For instance, some netizens unearth the real identity of people who hold opposing views by launching a search and then exposing personal information (including photos, cell phone number, home Original author Reposter Direct source address, and other details) to disturb their opponent’s real life [21]. of the weibo of reposting These phenomena reveal the potential interaction between informa- tion polarization in online social networks and offline behavioral Figure 1: Weibo user interface. polarization. The formation mechanism of polarization and the method of restraining polarization are the most pressing issues in the field of information polarization [3, 5, 8, 22]. Many studies have suggested often the subject of research that needs to be eliminated [11], but that personalization, both self-selection, and pre-selection, limit researchers have yet to recognize their potential value as a powerful people's exposure to diverse content and increases polarization tool in social network analysis [10, 12]. Our social bots can use text [23]. Self-selection is the tendency for users to consume content classification algorithms to simulate the selection behavior of users or build new relationships that confirm their existing beliefs and on content and information sources, thereby simulating the evolu- preferences. Pre-selection depends on computer algorithms to per- tion of personal social networks of users with specific preferences. sonalize content for users without any conscious user choice. The By analyzing the personal social networks of these robots and the present research generally agreed that self-selection is the primary information obtained from them, we can clearly show the route of cause of information polarization [8]. user preferences affecting information polarization, thus revealing the complex interactions behind information polarization. 2 RELATED WORK 2.2 Controlled experiment on online social 2.1 Polarization of news consumption networks The advent and growing popularity of social media and personal- At present, research about online social networks relies mainly on ization services have, to a large extent, changed the way people observational methods [7, 8], which means that researchers cannot consume information, which is regarded as a factor in the fostering intervene in the study object but can only process and analyze the of the polarization of information diffusion and public opinions [8]. passively acquired data. However, the big open data contains noise In collective and individual levels, the term “polarization” has two that can affect the results [24]. To overcome the shortcomings of possible meanings. One is that like-minded people form exclusive a purely observational study, some researchers have ingeniously clubs sharing similar opinions and ignoring dissenting views, i.e., adopted the method of natural experiments or quasi-experiments to opinion polarization [6]; the other is that an individual is exposed make comparative analyses and causal inferences from the available to less diverse content and is limited by a narrow set of informa- datasets [25–27]. For example, Phan et al. (2015) analyzed the tion, i.e., information polarization [7]. For this paper, we focus on dynamic development of social relationships after natural disasters information polarization on social networks and media. in the context of Hurricane Ike in 2008 [26]. Nevertheless, the Many scholars have investigated the polarization of political uncontrolled approach limits the freedom of research. In contrast, news consumption [7, 13–15]. For example, Schmidt et al. (2017) the controlled experiment is more rigorous and can provide a more found that users prefer to pay attention to a narrow set of news solid foundation for causal analysis [24]. outlets [7]. However, the phenomenon of information polarization As online social media and networks have expanded to serve is present not only in political content but also in other content, hundreds of millions of users, and their functions have become 2
more accessible and intelligent, the ability to test sociological the- The social bot is based on two well-known sociological hypothe- ories and behavioral phenomena has increased. Application pro- ses. The first is triadic closure [39]. Triadic closure suggests that gramming interfaces from operators and innovative third-party among three people, A, B, and C, if a link exists between A and B, toolkits allow researchers to execute novel experimental designs. and A and C, then there is a high probability of a link between B For example, Facebook provides interfaces to customize application and C. The hypothesis reflects the tendency of a friend of a friend features for particular users and enables the design of controlled to become a friend and is a useful simplification of reality that experiments [28]. Amazon Mechanical Turk lets experimenters can be used to understand and predict the evolution of social net- engaging worldwide users in a precisely defined experimental en- works. Because the relationships embedded in Weibo should be vironment [29, 30]. These powerful tools are enabling a revolution modelled by a directed graph, the bots use directed triadic closure of experimental social science and social network analysis. As a to expand their followings (Figure 2) [40]. The second hypothesis result, the controlled experiment approach has begun to reveal is homophily [6], i.e., people who are similar have a higher chance the nuanced causal relationships among the design and operation of becoming connected. In online social networks, for example, an of the social network and human behavior. For example, recent individual creating her new homepage tends to link it to sites that large-scale experiments have deepened our understanding about are close to her interests (i.e., preferences). information sharing and diffusion, behavior spreading, voting and Based on the two hypotheses, the workflow of the bots includes political mobilization, and cooperation [24, 31–35]. The large scale five steps (Figure 2). (1) Initially, each bot is assigned 2 or 3 de- of modern online social networks also enables researchers to con- fault followings, who mostly post or repost messages consistent duct experiments involved millions of people to explore the variety with the preference of the bot. (2) A bot will periodically awaken of behaviors of diverse users. For example, the experiment con- from idleness at a uniformly random time interval. When the bot ducted by Aral and Walker (2012) estimates the heterogeneity of awakened, it can view the latest messages posted or reposted by the impact of influence-mediating messages on different types of its followings. As well known, not all unreaded information can be consumers [28]; Muchnik et al. (2013) examine in depth whether exposed to users of social networks. A bot can assess only the latest opinion change or selective turnout creates the social influence bias 50 messages (i.e., the maximum amount in the first page on Weibo) that they estimate exists in online ratings [36]. after waking up. Please note that we exclude the influence of algo- Large-scale experiments on social networks, while actively pro- rithmic ranking and recommendation systems (i.e., pre-selection) moting innovation of sociology and communication studies, also on the information exposure by re-ranking all possible messages bring about some serious problems [37]. For instance, the politi- according to the descending order of posting time. (3) After viewing cal voter mobilization experiment conducted by Robert et al. on the exposed messages, the bot selects only the messages consistent Facebook was sharply criticized by a considerable number of schol- with its preferences. The step depends on the FastText text classifi- ars, who argued that scientific research should not interfere with cation algorithm, and all classification results from the algorithm national politics or users'voting behavior [34]. Therefore, inappro- are further verified by the experimenters to ensure correctness. (4) priate design on actual large-scale social networks is likely to cause If there are reposted messages in selected messages, according to social injustice or infringe on the privacy of users, which violates directed triadic closure, the bot randomly selects a reposted mes- the ethics of scientific research and may even be illegal. All these sage and follows the direct source of the reposted message. Please issues are due to improper intervention in the behavior of social note that Weibo limits direct access to the information about the network users and the collection of their private data. followings and followers of a user; thus, bots must find new fol- In 2018, the Deep Mind team published a breakthrough study. lowings through reposting behavior. (5) If the following number By training the recurrent neural network, researchers simulated reaches the upper limit, the bot stops running; otherwise, the bot artificial agents with the function of autonomous mobility and becomes idle and waits to wake up again. To avoid legal and moral judgment of spatial location in two-dimensional virtual space [38]. hazards, the bots in the experiment do not produce or repost any These artificial agents produced a structure similar to that of the information. neural network cells of humans. The study provides a theoretical Technically, the bot is built using the Selenium WebDriver API basis for using artificial intelligence technology to simulate human (http://www.seleniumhq.org/projects/webdriver) to drive Google social behavior. If artificial agents can be used to replace real users Chrome Browser as a human to navigate the Weibo website. To to conduct experiments on large-scale social networks, potential train the FastText classifier for identifying preferred contents [41], moral and legal risks can be avoided, and the cost and threshold of we use the THUCNews dataset (http://thuctc.thunlp.org), which the experiments can be reduced. contains approximately 740 thousand Chinese news texts from 2005 to 2011 and has been marked for 14 classes. We further combine 3 PROPOSED METHOD and filter the original classes in the dataset to meet our requirement for recognizing Weibo text. All codes and data for building bots are 3.1 Social bot design freely accessible in GitHub (anonymous for review). A social bot is a computer algorithm that automatically interacts with humans on social networks, trying to emulate and possibly al- ter their behavior [10]. The social bot in our experiment is designed 3.2 Experiment design to imitate similarity-based relationship formation, which reflects We are interested in exploring the following overarching question: the human behavior of selecting information and relationships what role do the individual preferences of social media users play depending on one's preferences (i.e., self-selection). in their news consumption? Based on the above intelligent social 3
5 Manual checking interface Exposed news 1 Start Main-thread 4 2 3 Initializing 1 social bot 1 1 Reading Preferred news 2 exposed news 4 2 2 3 Selecting preferred 1 news 1 3 Selecting a reposted news Reposted news 3 Sub-thread 4 5 Yes Manually Waiting for 2 3 checking Empty? random time 1 No 4 1 Legal? Yes Follow Adding tree a new following New following 4 No No Removing Followings are illegal follow full Yes Directed triadic End closure Figure 2: Design of social bot. bots, we designed a controlled experiment to help us observe the is a reversed arc < v j , vi >, is a reciprocal edge (vi , v j ) is defined news consumption and personal social network evolution of the between two nodes. As a result, the reciprocal ratio can be defined bots with a specific preference. For each bot, we mainly measured as: the following response variables: n Õ ne (vi ) V1: The probability that the bot is exposed to the preferred con- r= (1) n i=1 a i (v ) tent. For a given time t, the probability can be quantified by the preferred content ratio (PCR), which is the ratio of the amount of where ne is the number of edges linking to node i, and na is the total information matching its preference to the total amount of infor- number of arcs linking to i. We can also measure the correlation mation from 0 to t. Since the running of each bot is not completely (called reciprocal correlation) between ne /na and na to reflect the synchronized, we normalize the time scale from 0 to 1, where t = 0 structural features of personal social networks. represents the initial state after given the initial user, and t = 1 For V1 and V2, we can quantify the extent to which the pref- represents the time when the bot ends the running. erence for particular kinds of content affects the diversity of new V2: The distribution of word frequency in the exposed content. consumption. For V3 and V4, we can reveal the impact of users’ For a word, the word frequency is defined as the ratio of the number preference on their personal social network structure and explore of messages that contain the word to all exposed messages of a bot. the relationship between the social network structure and content V3: The attributes of all followings of a bot. These attributes consumption. include the proportion of followings with the same preference (the We designed two experimental treatments and evaluated the preference of followers is judged according to their screen name as difference in the above response variables. The two treatments well as the tags and content of their microblogs) and the number were designed to reflect the variance in content preference on of the followings'followers and followings. the Weibo. The two treatments are an entertainment group (EG) V4: The structure of the followings network of a bot (i.e., the and a science and technology (sci-tech) group (STG). Bots in EG personal social network of the bot), including clustering coefficient, prefer entertainment messages, including celebrity gossip, fashion, in- and out-degree distribution, in- and out-degree correlation, and movies, TV shows, music, and such, and bots in STG prefer sci- reciprocal ratio (r ) [42, 43]. For a directed arc < vi , v j >, if there tech news, including nature, science, engineering, technological 4
Table 1: Standard for text classification ** (A) ** (B) 1 ** Average ratio Standard Entertainment Sci-Tech 0.8 ** Accept (1) celebrity gossip, (1) nature, science, engi- 0.6 fashion, movies, TV neering, technological 0.4 * shows, and pop music; advances, digital prod- 0.2 (2) Explicitly contain- ucts, and Internet; (2) ing the name, account technical company and 0 EG STG EG STG1STG2STG3 or abbreviation of university. 1 entertainers. EG (C) Common (1) commercial advertisement; (2) less than 5 0.8 Average ratio reject Chinese characters. 0.6 Specific (1) ACG content (i.e., an- (1) financial or business 0.4 STG reject imation, comic, and dig- report of technical or ital game); (2) art, lit- Internet company; (2) 0.2 erature, and personal price of digital products; 0 feeling; (3) simple lyrics (3) military equipment; 0 0.2 0.4 0.6 0.8 1 and lines; (4) personal (4) daily skills; (5) con- Normalized time leisure activities. stellations and divina- tion; (6) weather fore- Figure 3: Preference drives the polarization of exposed con- cast; (7) environmental tent. (A) In the initial state, there is an approximately 50% conservation; (8) docu- higher preferred content ratio (PCR) in the entertainment mentary with irrelevant (EG) than in the sci-tech (STG) treatment (∗ ∗ p < 0.01, two- content. sided Mann-Whitney U test, n = 34 bots). (B) In the final state, the PCR of the EG is 4.4 times that of the STG treat- ment (∗ ∗ p < 0.01, two-sided Mann-Whitney U Test, n = 34 bots). Furthermore, we divided bots in STG into three sub- advance, digital products, Internet, and so on. Each treatment groups according to their initial PCR: STG1 (0.6 ≤ PCR, contained 34 bots who operated on the Weibo platform between n 1 = 14), STG2 (0.3 ≤ PCR < 0.6, n 2 = 11), and STG3 (PCR < 0.3, 13 March and 28 September 2018. The initial followings come from n 3 = 9). Only STG1 and STG2 display a significant difference a large enough user pool, which contained at least 100 candidates (∗p < 0.05, Kruskal-Wallis with Mann-Whitney post hoc test). for each preference. The idle period of all bots was 2 4 hours. The (C) PCR changes with time. The filling areas indicate the maximum number of followings for each bot was 120, according to range between the 25th and 75th percentiles of the ratios in the Dunbar number. the two treatments. Compared with the conventional method, our design is character- ized by using intelligent bots as the agents to conduct experiments in a real online social environment and indirectly observe real human behavior. These bots act entirely on predefined behavior hy- potheses and intelligent algorithms, needing manual operation by 4 RESULTS the experimenters only in the judgment of preference classification. 4.1 User preference drives polarization of Therefore, our experiment can be regarded as a type of randomized content assignment. In social media, users'self-selection behavior of information source and information has been considered the primary mechanism of po- 3.3 Standard for preference classification larization. In this experiment, social bots were designed to expand The most significant feature of the social bots is that they can personal social networks based on the two hypotheses, which aimed autonomously identify whether information matches their pref- to simulate users'self-selection. We aimed to validate whether the erences, benefiting from automatic and manual text classification. effect of self-selection is identical under difference preferences. Therefore, the criteria of text classification are fundamental. We de- For all 68 social bots, since the initial users have been set as the fined a content classification standard for the two kinds of content user who mainly reposts the corresponding preferred content, the in our experiment (Table 1). PCR of EG and STG in the initial state can be as high as 76.77% The classification criteria shown in Table 1 were used to se- and 50.0% on average (Figure 3(A)). That is, most of the messages lect initial followings, prepare the training corpus of the FastText exposed to both treatments are concentrated in the corresponding classifier, manually verify, and classify exposed texts. The unified preference. Even so, preference has a significant impact on infor- standard can ensure consistency in the judgment of preference and mation diversity. The PCR of EG was significantly higher than that the classification of text. of STG (p < 0.01, two-sided Mann-Whitney U test). This initial 5
difference between EG and STG provides a baseline for our further observed the personal social network of each bot and found that analysis. the bots present different structures according to their preferences. Compared to the initial state, we are more concerned about the 4.2.1 The attributes of followings. The followings of a bot are diversity of messages consumed by social bots after personal social the nodes of its personal social network. Our results showed that networks have formed (Figure 3(B)). As the PCR of STG in the initial in the EG treatment, the average proportion of nodes with the state is significantly less than that of EG and has a large variety entertainment preference is 85.32% (up to 92.14% and down to (Figure 3A), we further divide STG into three subgroups according 73.47%) (Figure 5(A)). The high proportion suggests that, with the to their initial PCR to explain the effect of initial deviations: STG1 entertainment preference, users'self-selection can effectively filter (initial PCR is more than 0.6, 14 bots), STG2 (initial PCR is between the information source, making most of the information sources 0.3 and 0.6, 11 bots), and STG3 (initial PCR is less than 0.3, 9 bots). consistent with their preference and thus resulting in a filter bubble. The results show that when the personal social network reaches 100 In the STG treatment, however, the proportion is only 35.33% on to 120 followings, EG still has the higher PCR with an average of average (up to 47.86%, down to 23.12%) (Figure 5(B)). Based on 63.78%, while the average PCR of the three STG subgroups is lower the information diffusion mechanism of Weibo and the design of than 20%. Moreover, Only STG1 and STG2 display a significant social bots, a low proportion indicates that users with the sci-tech difference (∗p < 0.05, Kruskal-Wallis with Mann-Whitney post hoc preference have a high possibility of following users with other test). The results suggest that regardless of the initial state, bots in preferences, and users with various preferences could forward the the STG treatment can receive diverse information rather than sci- sci-tech content. The interaction between multiple preferences tech content. From the changes of PCR with the growth of personal effectively enhances the diversity of news consumption. For the social networks, we can also observe the effect of user preference sci-tech preference, as a result, the content-based self-selection on information polarization (Figure 3(C)). With the social network behavior is inefficient in developing information polarization. grows, the PCR of EG declines only slightly in the early stage and Not only the composition of followings but also the features of remains unchanged after the further increase of followings. In STG, followings in the two treatments are significantly different. Fig- new followings can continuously reduce the PCR; that is, they can ure 5(C) and (D) show that, compared with STG, EG has 53.78% bring more nonpreferred content. Therefore, there is a correlation fewer followings (i.e., an indirect information source for bots) but between the users’ preferences and the polarization of their news 23.43% more followers on average (p < 0.01, two-sided Mann- consumption in Weibo. Whitney U Test). In other words, users with entertainment prefer- The effect of preference on content consumption is reflected ence tend to interact with fewer news sources while diffusing infor- not only on the type of content but also on the semantic level. In mation to a large number of audiences. The “trumpet-like” feature both EG and STG treatments, the word frequency of the preferred can incite information polarization. Therefore, users’ preferences content exposed to the social bots presents a power-law distribution not only directly affect news consumption through self-selection (Figure 4(A)). The power-law distribution (y ∼ x −α ) indicates that but also change their social networks to reinforce this effect. there are some dominant words with high word frequency in the preferred content. Moreover, the dominance of a few words is more 4.2.2 The connection of personal social networks. In addition to severe in EG than in STG (α = 2.23 for EG and α = 4.74 for STG). the followings’ attributes, the structure of the personal social net- In the EG treatment, the maximum frequency of a word can be up work is also modified by the user preference. By roughly comparing to 0.4 on average, which means that the same word can be found in the in- and out-degree distributions of EG and STG's personal so- 40% of messages on average. However, in the STG treatment, the cial networks, we find that both present power-law distributions maximum word frequency is no more than 0.15 (Figure 4(B)). Even (α = 1.766 for in-degrees and α = 1.847 for out-degrees) and are among the top 10 words of EG, the first and second words are more almost identical (Figure 6(A) and (B)). Furthermore, the visualiza- dominant than other words. By analyzing the top 10 high-frequency tion of network evolution also shows that EG and STG have similar words of the two treatments, we find that the high-frequency words personal social networks (Figure 7). However, additional structural of EG are usually the names of certain entertainment celebrities features reveal the opposite result. and related specific words, while the high-frequency words of STG Figure 6(C) shows that the average clustering coefficient of STG are some common words, such as China, America, technology, networks is 35% higher than that of EG networks (p < 0.01, two- market, and company (Figure 4(C)). From this, it can be seen that sided Mann-Whitney U Test). Given the low level of information the messages received by bots preferring entertainment content not polarization in STG, this is an unexpected result, as a dense com- only concentrate on a single type but also show a univocal trend in munity spreads diverse information. This counterintuitive phenom- semantics. enon can be attributed to differences in the connection pattern of personal social networks. Figure 6(D) shows that there is a high positive correlation (0.6) between the in- and out-degrees of STG networks, while the degree correlation of EG networks is very weak 4.2 The personal social network (0.2). This correlation implies a difference between the two types of The dissemination mechanism of Weibo makes the followings of networks on reciprocal edges. We found that an average of 60% of a bot become the primary and direct information sources for the the arcs in STG networks belong to the reciprocal edge, while the bot. Therefore, the personal social network of a social bot (i.e., the proportion is only 40% in EG (Figure 6(E)). Besides, in STG, the de- network consisting of its followings) is its media environment and gree of reciprocal edge presented a weak positive correlation (0.25) reflects a local structure to diffuse specific types of information. We with nodes’ total degree; that is, the higher-degree nodes might 6
100 (A) EG03 top10 words (C) 01) Wu, Yifan (name) 10-2 Distribution 02) Mr 音乐 EG 03) Hip hop y~x-2.23 04) Music 10-4 05) Lu, Han (name) 鹿晗 说唱 STG 06) China y~x-4.74 07) SKR 电影 吴亦凡 08) Diss 10-6 09) Internet 10) Movie 10-4 10-3 10-2 10-1 0.8 中国 Word rank 网络 0.8 (B) 科技 智能 STG07 top10 words 0.6 手机 公司 Frequency 中国 04) Sci-Tech 美元 05) Market 0.4 美国 市场 全球 06) Mobile phone 技术 07) US dollar 01) China 08) Global 0.2 02) Corporation 09) Technology 03) US 10) Intelligence 0 1 2 3 4 5 6 7 8 9 10 Figure 4: Semantic analysis of exposed contents. (A) Inverse cumulative distribution of word frequency (log-log plot). The EG treatment has a longer tail than the STG treatment; i.e., the portion of the EG distribution has more higher-frequency words. (B) Box plots of the frequency of top 10 words in both the EG and STG treatments. (C) Demonstration of top 10 words in exposing messages from EG 03 bot and STG 07 bot. The size of the bubble indicates the word frequency. have more reciprocal edges (Figure 6(F)). In EG, the degree of the Moreover, different news consumption patterns are accompanied by reciprocal edge is negatively correlated with the total degree (-0.18); different compositions and structures of personal social networks. that is, the higher-degree nodes prefer the one-way relationships in Weibo (Figure 6(F)). The visualization analysis also shows the same results. If all 5 DISCUSSION nonreciprocal arcs are removed, we find that the STG networks still have a high connection density, and the high-degree nodes In this paper, we create a new controlled experiment approach are still maintained in the centrality, while EG networks are the to reveal the link between user preferences and patterns of news opposite, with the sparse connection, and the high-degree nodes consumption on a Chinese microblogging platform (Weibo). This are marginalized (Figure 7). Based on the above results, an EG method can be seen as a real-time user behavioral simulation on network tends to be a unidirectional star-like structure with a few real-world social networks using the computer and artificial in- high-degree nodes, while an STG network tends to be a bidirec- telligence technology. (1) The method regards real-world social tional clustering structure. The unidirectional star-like structure networks as an objective media environment and records and ana- means selecting one or several central nodes to play the role of lyzes the feedback of the environment to specific user behaviors. (2) broadcasters to send information to all other nodes with few inter- This method uses artificial intelligence technologies (e.g., natural actions between them. However, the diffusion path of this structure language processing) to simulate specific user behaviors instead is mainly dependent on the selected central node. If the central of employing volunteers or collecting users’ private data. The nodes in the star-like structure produce only a narrow set of news, approach not only improves the freedom of the experimenters in the others will be limited to a lower diversity of content. This designing and controlling the experiment but also avoids the ethical selection effect of central nodes tends to enlarge the polarization risk of violating users’ privacy. (3) Compared with the experimental of the content [44]. In the bidirectional clustering structure, all method of directly altering the real-world social network [34], this nodes are placed in a clustering group, and most connections are method gives the experimenter more flexibility and allows more bidirectional. Those nodes can efficiently exchange information researchers to conduct their own experiments without internal and realize a complementary effect with their friends with distinct assistance from social network operators. (4) Compared with the preferences. The complementary effect can promote the diversity experimental method based on artificial social networks [31, 32], of information [44]. this method can work directly on real-world social networks and Although all social bots select potential followings based on can control user behaviors more directly with lower cost. How- preferred content to ensure that these users post or repost the ever, limited by the current technical conditions, this method can content they like, whether the self-selection behavior can narrow intelligently simulate only relatively simple user behaviors. How users’ news consumption is dependent on the users’ preferences. to simulate complex user behaviors (especially initiative feedback behaviors such as comments and chats) is still a challenging task. 7
(A) (B) (A) (B) 92% 10-1 10-1 Distribution Distribution 35% 73% 10-2 10-2 23% EG 85% STG 10 -3 10-3 y~x-1.77 y~x-1.85 48% 106 100 101 102 100 101 102 1000 3 In-degree Out-degree (C) ** (D) 0.5 0.8 Average followings ** Average followers 800 (C) ** (D) In-Out correlation 2 0.4 ** 600 0.6 Clustering 0.3 400 0.4 1 0.2 ** 200 0.2 0.1 0 0 EG STG EG STG 0 0 EG STG EG STG Figure 5: Demographic features of the followings of bots. 0.8 0.3 Reciprocal correlation (A) and (B) Percentage of followings with the same prefer- (E) (F) ** ** ence with bots in the EG and STG treatments. The white and Reciprocal arc ratio 0.2 colored dotted lines indicate the minimum and maximum 0.6 0.1 percentages, respectively. (C) The average number of follow- 0.4 0 ings of followings (∗ ∗ p < 0.01, two-sided Mann-Whitney U Test, 2,675 followings in EG and 1,622 followings in STG). (D) -0.1 0.2 The average number of followers of followings (∗ ∗ p < 0.01, -0.2 two-sided Mann-Whitney U Test, 2,675 followings in EG and ** 0 -0.3 1,622 followings in STG). Due to the high variance, we do not EG STG EG STG show error lines in this plot. Figure 6: Structural features of personal social networks. (A) Mean of inverse cumulative distribution of in-degree (log- log plot). (B) Mean of inverse cumulative distribution of Taking advantage of our experimental method, we compare the out-degree (log-log plot). (C) Personal social networks in news consumption of users with different preferences. Previous STG have a higher clustering coefficient than networks in studies on the polarization mechanism focus on the effect of users’ EG (∗ ∗ p < 0.01, two-sided Mann-Whitney U Test, n = 34 self-selection behaviors [23]. First, a large number of studies focus networks). (D) Compared with EG, nodes in STG present a on the dissemination of different opinions in response to particular stronger correlation between in- and out-degrees (∗∗p < 0.01, content, such as political news [7, 13–15]. In this case, users are Pearson correlation coefficient, 54,823 nodes for EG, 51,838 actually in the same media environment, and the differences in nodes for STG). (E) Personal social networks in STG obtain content preferences can be ignored. Second, the rise of pre-selection more reciprocal edges than networks in EG (∗ ∗p < 0.01, two- technologies (e.g., recommendation systems) makes people more sided Mann-Whitney U Test, n = 34 networks). (F) Nodes attentive to the comparison of pre-selection and self-selection than in EG show a weak negative correlation between reciprocal to the impact of the media environment [8]. However, our research edge degree (> 0) and total degree. In contrast, nodes in STG shows that user preferences might be the primary factor affecting show a positive reciprocal degree correlation (∗ ∗ p < 0.01, the diversity of news consumption. Distinct preferences can lead to Pearson correlation coefficient, 3,546 nodes for EG, 3,368 entirely different news consumption patterns even with the same nodes for STG). self-selection behavior. Such results deepen our understanding of the mechanism of information polarization. Furthermore, in this experiment, we make the ideal assumption that each user has bot, the messages that a bot can receive is limited to the follow- only a single preference. However, people usually have multiple ings'posts and reposts. Therefore, information diversity depends preferences. How the interaction of multiple preferences shapes on followings'self-selection and the structure of personal social the diversity of news consumption remains an open question. networks. The impact of network structure on information dif- Our results also show that user preference not only affects in- fusion is well known [42, 45, 46], but people often overlook the formation diversity but also shapes personal social networks. De- role of content in it. Until now, only a few works have noticed pending on the mechanism of Weibo and the design of a social the link between content and structure [7, 47]. In communication 8
EG05 STG02 Initialized (0%) 20% 50% End (100%) Reciprocal edges only Figure 7: Visualization of the evolution of personal social networks. In this demonstration, Social bot 05 in EG (EG05) and bot 02 in STG (STG02) display a similar growth process from initialized two followings to the end of running. However, STG02 obtains more reciprocal edges than EG05. The triangles represent the nodes with high in-degree, that is, the corresponding users have a large number of followers. The visualization is based on the radial layout with in-degree centrality; thus the node with higher in-degree is closer to the center of the plot. studies, the media ecology theory has revealed the connection be- Our work is a preliminary experimental analysis of the relation- tween the form and structure of media and content [48]. They even ship between user preference and news consumption, so there are think that “the medium is the message.” [49]. Our results show still many limitations. First, the impact of initial users requires that even in the same media, different content should spread in further evaluation. In the experiment, it was difficult for us to find distinct sub-structures. For example, we found that personal so- users who post merely sci-tech content (RCP ≥ 0.8) as the initial fol- cial networks of users with entertainment preferences tend to be a lowings of bots. Although we find that the weak selection intensity unidirectional star-like structure, which is more powerful for the does not affect the result, the effect of the initial values still requires broadcasting of narrow content. It can be seen that user preference a more detailed analysis. Second, limited by the accuracy of the text drives the coevolution of content consumption and personal social classification algorithm, the robot currently only selects potential network structure, and the coevolution strengthens information followings through the received real-time messages. The selection polarization. is relatively loose. If the robot can examine all the information Our work provides valuable insights into the theoretical analysis about potential followers (historical posts, tags, avatars, etc.), the of diffusion dynamics. We found that, even in the same network, effect of rigorous selection requires further study. Third, the two information about different contents should be propagated through types of preferences are not enough to explain all the problems. different sub-structures. This phenomenon has led us to re-examine The study of more preferences (e.g., sport, finance, politics, etc.) the results of previous studies based on a unified macroscopic struc- will help us to understand the complexity of the triad of preference, ture, such as small-world networks or scale-free networks [42]. content, and structure in news consumption. Fortunately, researchers have begun to use multilayer network In sum, we have introduced a new experimental approach using models to describe the differences in sub-structures. However, social bots and revealed the collective behavior behind information the multilayer network model for content differences and the re- polarization, that is, a group of users with the same preference can lated propagation mechanisms remain to be studied. Furthermore, form a media environment facilitating polarization by specialized our work is also instructive for the design of the recommendation structure and selection behaviors. system [50]. Users with different preferences need different rec- ommendation strategies. For example, our results show that users preferring entertainment content need more diverse recommenda- tions to avoid polarization, while users preferring sci-tech content ACKNOWLEDGMENTS need more relevant content to maintain their activity in social net- This work was supported by the National Natural Science Foun- works [51]. As a result, the recommendation system not only needs dation of China (Grant Nos. 71303217, 31370354 and 31270377) to pay attention to individual preferences and behaviors but also and the Zhejiang Provincial Natural Science Foundation of China needs to understand the collective behavior of the user groups with (Grant No. LY17G030030, LGF18D010001, LGF18D010002). the same preferences. We thank Professor Jie Chang and Professor Ying Ge (Zhejiang University) for their guidance on the diversity theory. 9
We thank Aizhu Liu, Chenyi Fang, Conger Yuan, Fan Li, Hao April 2014. Li, Hao Wu, Haochen Hou, Hengji Wang, Jian Zhou, Jiaye Zhang, [25] Yubo Chen, Qi Wang, and Jinhong Xie. Online social interactions: A natural experiment on word of mouth versus observational learning. Journal of Marketing Jinmeng Wang, Junhao Xu, Lingjian Jin, Longzhong Lu, Lu Chen, Research, 48(2):238–254, April 2011. Luchen Zhang, Qiuhai Zheng, Qiuya Ji, Renyuan Yao, Ruonan [26] Tuan Q. Phan and Edoardo M. Airoldi. A natural experiment of social net- work formation and dynamics. Proceedings of the National Academy of Sciences, Zhang, Shang Gao, Shicong Han, Songyi Huang, Ting Xu, Wei Fang, 112(21):6595–6600, May 2015. Wei Zhang, Xingfan Zhang, Yijing Wang, Yingjie Feng, Yinting [27] Soroush Vosoughi, Deb Roy, and Sinan Aral. The spread of true and false news Chen, Yiqi Ning, Yujie Bao, Yuying Zhou, Zheyu Li, Ziyu Liu for online. Science, 359(6380):1146–1151, March 2018. [28] Sinan Aral and Dylan Walker. Identifying influential and susceptible members their contribution to the manual text classification. of social networks. Science, 337(6092):337–341, July 2012. We thank Haidan Yang for her introduction to the theory of [29] David G. Rand and Martin A. Nowak. The evolution of antisocial punishment in communication. optional public goods games. Nature Communications, 2:434, August 2011. [30] Winter Mason and Duncan J. Watts. Collaborative learning in networks. Pro- ceedings of the National Academy of Sciences, 109(3):764–769, January 2012. [31] Damon Centola. The spread of behavior in an online social network experiment. REFERENCES Science, 329(5996):1194–1197, September 2010. [1] Neil Postman. Amusing ourselves to death: public discourse in the age of show [32] Damon Centola. An experimental study of homophily in the adoption of health business. Penguin, 2006. behavior. Science, 334(6060):1269–1272, December 2011. [2] Nic Newman, David Levy, and Rasmus Kleis Nielsen. Reuters institute digital [33] David G. Rand, Samuel Arbesman, and Nicholas A. Christakis. Dynamic social news report 2015. SSRN Scholarly Paper ID 2619576, Social Science Research networks promote cooperation in experiments with humans. Proceedings of the Network, Rochester, NY, June 2015. National Academy of Sciences, 108(48):19193–19198, November 2011. [3] Eli Pariser. The filter Bubble: How the new personalized web is changing what we [34] Robert M. Bond, Adam D. I. Kramer, Cameron Marlow, Christopher J. Fariss, read and how we think. Penguin, May 2011. Jaime E. Settle, James H. Fowler, and Jason J. Jones. A 61-million-person experi- [4] Seth Flaxman, Sharad Goel, and Justin M. Rao. Filter bubbles, echo chambers, ment in social influence and political mobilization. Nature, 489(7415):295–298, and online news consumption. Public Opinion Quarterly, 80(S1):298–320, January September 2012. 2016. [35] Adam D. I. Kramer, Jamie E. Guillory, and Jeffrey T. Hancock. Experimental evi- [5] F.J. Zuiderveen Borgesius, D. Trilling, J. Moller, B. Bodo, C.H. de Vreese, and dence of massive-scale emotional contagion through social networks. Proceedings N. Helberger. Should we worry about filter bubbles? Internet Policy Review, 5(1), of the National Academy of Sciences, page 201320040, May 2014. March 2016. [36] Lev Muchnik, Sinan Aral, and Sean J. Taylor. Social influence bias: A randomized [6] P. Dandekar, A. Goel, and D. T. Lee. Biased assimilation, homophily, and the experiment. Science, 341(6146):647–651, August 2013. dynamics of polarization. Proceedings of the National Academy of Sciences, [37] Ulf-Dietrich Reips. Standards for internet-based experimenting. Experimental 110(15):5791–5796, April 2013. Psychology, 49(4):243–256, October 2002. [7] Ana Luca Schmidt, Fabiana Zollo, Michela Del Vicario, Alessandro Bessi, Antonio [38] Andrea Banino, Caswell Barry, Benigno Uria, Charles Blundell, Timothy Lillicrap, Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. Anatomy Piotr Mirowski, Alexander Pritzel, Martin J. Chadwick, Thomas Degris, Joseph of news consumption on Facebook. Proceedings of the National Academy of Modayil, Greg Wayne, Hubert Soyer, Fabio Viola, Brian Zhang, Ross Goroshin, Sciences, 114(12):3035–3039, March 2017. Neil Rabinowitz, Razvan Pascanu, Charlie Beattie, Stig Petersen, Amir Sadik, [8] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. Exposure to ideologically Stephen Gaffney, Helen King, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, diverse news and opinion on Facebook. Science, 348(6239):1130–1132, June 2015. and Dharshan Kumaran. Vector-based navigation using grid-like representations [9] Weibo, October 2018. https://en.wikipedia.org/wiki/Sina Weibo. in artificial agents. Nature, 557(7705):429–433, May 2018. [10] Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, and Alessandro [39] Ginestra Bianconi, Richard K. Darst, Jacopo Iacovacci, and Santo Fortunato. Flammini. The rise of social bots. Communications of the ACM, 59(7):96–104, Triadic closure as a basic generating mechanism of communities in complex June 2016. networks. Physical Review E, 90(4):042806, October 2014. [11] Abigail Paradise, Rami Puzis, and Asaf Shabtai. Anti-reconnaissance tools: [40] Daniel Mauricio Romero and Jon M Kleinberg. The directed closure process Detecting targeted socialbots. IEEE Internet Computing, 18(5):11–19, 2014. in hybrid social-information networks, with an analysis of link formation on [12] Norah Abokhodair, Daisy Yoo, and David W McDonald. Dissecting a social twitter. In Proceedings of the Fourth International AAAI Conference on Weblogs botnet: Growth, content and influence in twitter. In Proceedings of the 18th ACM and Social Media, 2010. Conference on Computer Supported Cooperative Work & Social Computing, pages [41] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. Bag of 839–851. ACM, 2015. tricks for efficient text classification. arXiv preprint arXiv:1607.01759, 2016. [13] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Goncalves, [42] Stefano Boccaletti, Vito Latora, Yamir Moreno, Martin Chavez, and D-U Hwang. Filippo Menczer, and Alessandro Flammini. Political polarization on Twitter. In Complex networks: structure and dynamics. Physics reports, 424(4-5):175–308, Fifth International AAAI Conference on Weblogs and Social Media, July 2011. 2006. [14] Markus Prior. Media and political polarization. Annual Review of Political Science, [43] Giorgio Fagiolo. Clustering in complex directed networks. Physical Review E, 16(1):101–127, 2013. 76(2):026107, 2007. [15] Vidya Narayanan, Vlad Barash, John Kelly, Bence Kollanyi, Lisa-Maria Neudert, [44] Michel Loreau and Andy Hector. Partitioning selection and complementarity in and Philip N. Howard. Polarization, partisanship and junk news consumption biodiversity experiments. Nature, 412(6842):72, 2001. over social media in the US. arXiv:1803.01845 [cs], March 2018. arXiv: 1803.01845. [45] Yong Min, Xiaogang Jin, Ying Ge, and Jie Chang. The role of community mix- [16] James G. Webster. Beneath the veneer of fragmentation: television audience ing styles in shaping epidemic behaviors in weighted networks. PLOS ONE, polarization in a multichannel world. Journal of Communication, 55(2):366–382, 8(2):e57100, February 2013. WOS:000315184200214. June 2005. [46] Xiao-gang Jin and Yong Min. Modeling dual-scale epidemic dynamics on complex [17] Su Jung Kim and James G. Webster. The impact of a multichannel environment networks with reaction diffusion processes. Journal of Zhejiang University on television news viewing: a longitudinal study of news audience polarization SCIENCE C, 15(4):265–274, April 2014. in South Korea. International Journal of Communication, 6(0):19, April 2012. [47] Chun-Yuen Teng, Liuling Gong, Avishay Livne Eecs, Celso Brunetti, and Lada [18] Mark Duffett. Understanding fandom: An introduction to the study of media fan Adamic. Coevolution of network structure and content. In Proceedings of the 4th culture. Bloomsbury Academic, New York, 1 edition edition, August 2013. Annual ACM Web Science Conference, WebSci ’12, pages 288–297, New York, NY, [19] Lucy Bennett. Researching online fandom. Cinema Journal, 52(4):129–134, 2013. USA, 2012. ACM. [20] Huishan Huang. Exo “vs” tfboys: Causal analysis of network wars between idol [48] Carlos A. Scolari. Media ecology: exploring the metaphor to expand the theory. fans from an perspective of psychology. Journal of Henan Institute of Education, Communication Theory, 22(2):204–225, May 2012. 25(1):48–52, 2016. [49] Marshall McLuhan and Quentin Fiore. The medium is the message. New York, [21] Dong Han. Search boundaries: human flesh search, privacy law, and internet 123:126–128, 1967. regulation in china. Asian Journal of Communication, 28(4):434–447, 2018. [50] Tao Zhou, Zoltan Kuscsik, Jian-Guo Liu, Matuvs Medo, Joseph Rushton Wake- [22] Natalie Jomini Stroud. Polarization and partisan selective exposure. Journal of ling, and Yi-Cheng Zhang. Solving the apparent diversity-accuracy dilemma Communication, 60(3):556–576, September 2010. of recommender systems. Proceedings of the National Academy of Sciences, [23] Kota Kakiuchi, Fujio Toriumi, Masanori Takano, Kazuya Wada, and Ichiro 107(10):4511–4515, 2010. Fukuda. Influence of selective exposure to viewing contents diversity. Arxiv, [51] Xiaogang Jin, Cheng Jin, Jiaxuan Huang, and Yong Min. Coupling effect of page 1807.08744, July 2018. nodes popularity and similarity on social network persistence. Scientific Reports, [24] Sinan Aral and Dylan Walker. Tie strength, embeddedness, and social influence: 7:42956, February 2017. A large-scale networked experiment. Management Science, 60(6):1352–1370, 10
You can also read