"IT DOESN'T MATTER NOW WHO'S RIGHT AND WHO'S NOT:" A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER - OhioLINK ETD Center
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
“IT DOESN’T MATTER NOW WHO’S RIGHT AND WHO’S NOT:” A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER

by Braeden Bowen

Honors Thesis
Submitted to the Department of Computer Science and the Department of Political Science
Wittenberg University

In partial fulfillment of the requirements for Wittenberg University honors

April 2021
On April 18, 2019, United States Special Counsel Robert Mueller III released a 448-page report on Russian influence on the 2016 United States presidential election [32]. In the report, Mueller and his team detailed a vast network of false social media accounts acting in a coordinated, concerted campaign to influence the outcome of the election and sow systemic distrust in Western democracy. Helmed by the Russian Internet Research Agency (IRA), a state-sponsored organization dedicated to operating the account network, the campaign engaged in "information warfare" to undermine the United States democratic political system.

Russia's campaign of influence on the 2016 U.S. elections is emblematic of a new breed of warfare designed to achieve long-term foreign policy goals by preying on inherent social vulnerabilities that are amplified by the novelty and anonymity of social media [13]. To this end, state actors can weaponize automated accounts controlled through software [55] to exert influence through the dissemination of a narrative or the production of inorganic support for a person, issue, or event [13].

Research Questions

This study asks six core questions about bots, bot activity, and disinformation online:

RQ 1: What are bots?
RQ 2: Why do bots work?
RQ 3: When have bot campaigns been executed?
RQ 4: How do bots work?
RQ 5: What do bots do?
RQ 6: How can bots be modeled?

Hypotheses

With respect to RQ 6, I will propose BotWise, a model designed to distill average behavior on the social media platform Twitter from a set of real users and compare that data against novel input. Regarding this model, I have three central hypotheses:

H 1: Real users and bots exhibit distinct behavioral patterns on Twitter.
H 2: The behavior of accounts can be modeled based on account data and activity.
H 3: Novel bots can be detected using these models by calculating the difference between modeled behavior and novel behavior.

Bots

Automated accounts on social media are not inherently malicious. Originally, software robots, or "bots," were used to post content automatically on a set schedule. Since then, bots have evolved significantly and can now be used for a variety of innocuous purposes, including marketing, distribution of information, automatic responses, news aggregation, or simply highlighting and reposting interesting content [13]. No matter their purpose, bots are built entirely from human-written code. As a result, every action and decision a bot is capable of making must be preprogrammed and decided by the account's owner. But because they are largely self-reliant after creation, bots can generate massive amounts of content and data very quickly.
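To make that mechanic concrete, the sketch below shows the kind of simple, schedule-driven bot described above: a short script that posts a queue of messages at fixed intervals. It is a minimal illustration only; it assumes the third-party tweepy library, placeholder credentials, and an account with posting access, and it is not code from this thesis or from any campaign discussed here.

```python
# Minimal sketch of a benign, schedule-driven Twitter bot (illustrative only).
# Assumes the third-party "tweepy" library and placeholder API credentials.
import time
import tweepy

API_KEY = "..."        # placeholder credentials; a real bot needs developer access
API_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_SECRET = "..."

auth = tweepy.OAuth1UserHandler(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth)

# Every message and every rule is written in advance by the account's owner.
MESSAGES = [
    "Good morning! Here is today's scheduled update.",
    "Reminder: new posts go up every hour.",
]

def run(interval_seconds: int = 3600) -> None:
    """Post each queued message, waiting a fixed interval between posts."""
    for text in MESSAGES:
        api.update_status(status=text)   # pre-programmed action: post the next message
        time.sleep(interval_seconds)     # post on a set schedule

if __name__ == "__main__":
    run()
```

Even a script this small can run unattended indefinitely, which is what allows networks of such accounts to produce content at a scale no individual human user could match.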
Many limited-use bots make it abundantly clear that they are inhuman actors. Some bots, called social bots, though, attempt to subvert real users by emulating human behavior as closely as possible, creating a mirage of imitation [13]. These accounts may attempt to build a credible persona as a real person in order to avoid detection, sometimes going as far as being partially controlled by a human and partially controlled by software [54]. The more sophisticated the bot, the more effectively it can shroud itself and blend into the landscape of real users online.

Not all social bots are designed benevolently. Malicious bots, those designed with an exploitative or abusive purpose in mind, can also be built from the same framework that creates legitimate social bots. These bad actors are created with the intention of exploiting and manipulating information by infiltrating a population of real, unsuspecting users [13].

If a malicious actor like Russia's Internet Research Agency were invested in creating a large-scale disinformation campaign with bots, a single account would be woefully insufficient to produce meaningful results. Malicious bots can be coordinated with extreme scalability to feign the existence of a unified populace or movement, or to inject disinformation or polarization into an existing community of users [13], [30]. These networks, called "troll factories," "farms," or "botnets," can more effectively enact an influence campaign [9] and are often hired by partisan groups or weaponized by states to underscore or amplify a political narrative.

Social Media Usage

In large part, the effectiveness of bots depends on users' willingness to engage with social media. Luckily for bots, social media usage in the U.S. has skyrocketed since the medium's inception in the early 2000s. In 2005, as the Internet began to edge into American life as a mainstay of communication, a mere 5% of Americans reported using social media [40], which was then just a burgeoning new form of online interconnectedness. Just a decade and a half later, almost 75% of Americans found themselves utilizing YouTube, Instagram, Snapchat, Facebook, or Twitter. In a similar study, 90% of Americans aged 18-29, the youngest age range surveyed, reported activity on social media [39]. In 2020, across the globe, over 3.8 billion people, nearly 49% of the world's population, held a presence on social media [23]. In April 2020 alone, Facebook reported that more than 3 billion of those people had used its products [36].

The success of bots also relies on users' willingness to utilize social media not just as a platform for social connections, but as an information source. Again, the landscape is ripe for influence: in January 2021, more than half (53%) of U.S. adults reported reading news from social media and over two-thirds (68%) reported reading news from news websites [45]. In a 2018 Pew study, over half of Facebook users reported getting their news exclusively from Facebook [14]. In large part, this access to information is free, open, and unrestricted, a novel method for the dissemination of news media. Generally, social media has made the transmission of information easier and faster than ever before [22]. Information that once spread slowly by word of mouth now spreads instantaneously through increasingly massive networks, bringing worldwide communication delays to nearly zero.
Platforms like Facebook and Twitter have been marketed by proponents of democracy as a mode of increasing democratic participation, free speech, and political engagement [49]. In theory, Sunstein [47] says, social media as a vehicle of self-governance should bolster democratic information sharing. In reality, though, the proliferation of "fake news," disinformation, and polarization has threatened cooperative political participation [47]. While social media was intended to decentralize and popularize democracy and free speech [49], the advent of these new platforms has inadvertently decreased the authority of institutions (DISNFO) and the power of public officials to influence the public agenda [27] by subdividing groups of people into unconnected spheres of information.

Social Vulnerabilities

Raw code and widespread social media usage alone are not sufficient to usurp an electoral process or disseminate a nationwide disinformation campaign. To successfully avoid detection, spread a narrative, and eventually "hijack" a consumer of social media, bots must work to exploit a number of inherent social vulnerabilities that, while largely predating social media, may be exacerbated by the platforms' novelty and opportunity for relative anonymity [44]. Even the techniques for social exploitation are not new: methods of social self-insertion often mirror traditional methods of exploitation for software and hardware [54].

The primary social vulnerability that bot campaigns may exploit is division. By subdividing large groups of people and herding them into like-minded circles of users inside of which belief-affirmative information flows, campaigns can decentralize political and social narratives, reinforce beliefs, polarize groups, and, eventually, pit groups against one another, even when screens are off [31].

Participatory Media

Publicly and commercially, interconnectedness, not disconnectedness, is the driving force of social media platforms like Facebook, whose public aim is to connect disparate people and give open access to information [58]. In practice, though, this interconnectedness largely revolves around a user's chosen groups, not the platform's entire user base.

A participant in social media is given a number of choices: what platforms to join, who to connect with, who to follow, and what to see. Platforms like Facebook and Twitter revolve around sharing information with users' personal connections and associated groups: a tweet is sent out to all of a user's followers, and a Facebook status update can be seen by anyone within a user's chosen group of "friends." Users can post text, pictures, GIFs, videos, and links to outside sources, including other social media sites. Users also have the ability to restrict who can see the content they post, from anyone on the entire platform to no one at all. Users choose what content to participate in and interact with, and they choose which groups to include themselves in. This choice is the first building block of division: while participation in self-selected groups online provides users with a sense of community and belonging [5], it also builds an individual group identity [20] that may leave users open to manipulation of their social vulnerabilities.

Social Media Algorithms
Because so many people use social media, companies have an ever-increasing opportunity to generate massive profits through advertisement revenue. Potential advertisers, then, want to buy advertisement space on the platforms that provide the most eyes on their products [24]. In order to drive profits and increase the visibility of advertisements on their platforms [24], social media companies began to compete to create increasingly intricate algorithms designed to keep users on their platforms for longer periods of time [31], increasing both the number of posts a user saw and the number of advertisements they would see [24].

Traditionally, the content a user saw on their "timeline" or "feed," the front page of the platform that showed a user's chosen content, was displayed chronologically, from newest to oldest. Modern social media algorithms, though, are designed to maximize user engagement by sorting content from most interesting to least interesting [14], rather than simply newest to oldest (although relevancy and recency are still factors).

The most prominent method of sorting content to maximize engagement is a ranking algorithm. On their own, ranking algorithms are designed to prioritize the most likely solution to a given problem. On social media, they are designed to predict and prioritize content that a user is most likely to interact with, thus extending time spent on the platform [24]. Ranking algorithms require a large amount of intricate, personal data to make accurate decisions. To amass this information, platforms like Twitter and Facebook collect "engagement" data [24], including how often a user sees a certain kind of post, how long they look at it, whether they click the photos, videos, or links included in the post, and whether they like, repost, share, or otherwise engage with the content. Interacting with a post repeatedly or at length is seen as positive engagement, which provides a subtle cue to the ranking algorithm that a user may be more interested in that kind of content. Even without any kind of interaction, the length of time spent on a post is enough to trigger a reaction by the algorithm.

When a user opens Twitter, the algorithm pools all the content posted by users they follow and scores each post based on the previously collected engagement data [24]. Higher-ranking posts, the content a user is most likely to engage with, are ordered first, and lower-ranking posts are ordered last. Some algorithms may cascade high-engagement posts, saving some for later in the timeline in the hope of further extending time spent consuming content. Meanwhile, advertisements are sprinkled into the ranking, placed in optimized spaces to maximize the likelihood that a user sees and engages with them.

Social media algorithms are not designed to damage information consumption [49] or facilitate bot campaigns, but users' ability to selectively build their own personalized profile accidentally leaves them vulnerable to a social attack. Consider this scenario: a user follows only conservative pundits on Twitter (e.g., "@TuckerCarlson," "@JudgeJeanine") and interacts only with conservative video posts. If one such pundit inadvertently reposts a video posted by a bot which contains false information, the user is more likely to see that video, and thus absorb that false information, than someone who follows mixed or liberal sources. The algorithm does not consider whether the information is true, only whether a user is likely to want to interact with it.
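As a toy illustration of the process just described, the sketch below scores a handful of posts by a crude blend of predicted interest (drawn from per-topic engagement history) and recency, then orders them best-first. The weights, field names, and topic categories are hypothetical simplifications for illustration, not Twitter's or Facebook's actual ranking algorithm.

```python
# Toy illustration of engagement-based ranking (hypothetical weights;
# not any platform's actual algorithm). Each post is scored by how strongly
# the user has engaged with similar content, then sorted best-first.
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    topic: str
    age_hours: float

# Engagement history the platform has collected for this user:
# rough per-topic interaction rates (likes, clicks, dwell time), normalized 0..1.
engagement_by_topic = {"politics": 0.9, "sports": 0.2, "music": 0.4}

def score(post: Post) -> float:
    interest = engagement_by_topic.get(post.topic, 0.1)   # predicted interest
    recency = 1.0 / (1.0 + post.age_hours)                # recency still matters, just less
    return 0.8 * interest + 0.2 * recency                 # hypothetical weighting

timeline = [
    Post("@newsbot", "sports", age_hours=1.0),
    Post("@pundit", "politics", age_hours=6.0),
    Post("@dj", "music", age_hours=0.5),
]

for post in sorted(timeline, key=score, reverse=True):
    print(f"{post.author:10s} {post.topic:10s} score={score(post):.2f}")
```

Sorting by a learned engagement score rather than by timestamp is the essential shift described above: the order of the feed now depends on what the user has interacted with before.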
Viral Spread

Beyond a user's timeline, many platforms also utilize algorithms to aggregate worldwide user activity data into a public "trending" page that collects and repackages a slice of popular or "viral" topics of conversation on the platform. Virality online is not necessarily random, though. Guadagno et al. [18] found that videos could be targeted at specific users, who themselves could be primed to share content more rapidly and more consistently. Content that evoked an emotional connection, especially a positive one, was shared more often than not.

Algorithms are demonstrably effective at keeping users engaged for longer periods of time. Their subversive and covert nature also makes their manipulation less obvious, a fact that bots are also effective at exploiting. Algorithms have become the standard method for populating content into a user's feed, from timelines to trending pages to advertisement personalization [24].

Filter Bubbles

The primary consequence of algorithmic subdivision of users is what Pariser [35] calls a "filter bubble." By design, social media users adhere to familiar, comfortable spheres that do not challenge their preconceived beliefs and ideas. If, for instance, a liberal user only follows and interacts with liberal accounts, their ranking algorithm will only have liberal content to draw from when creating a tailored feed; as a result, their ranked information will consist of only the most agreeable liberal content available. Even if that user follows some conservative accounts but does not interact with them, those accounts will receive a low ranking and will be less likely to be seen at all.

This is a filter bubble: a lack of variability in the content that algorithms feed to users [35]. In an effort to maximize advertisement revenue, algorithms incidentally seal off a user's access to diverse information, limiting them to the bubble that they created for themselves by interacting with the content that they prefer. As users follow other users and engage with content that is already familiar to them, the filter bubble inadvertently surrounds them with content they already know and agree with [35]. Psychologically, Pariser says, feeds tailored to exclusively agreeable content overinflate social media users' confidence in their own ideology and upset the traditional cognitive balance between acquiring new ideas and reinforcing old ones. While delivering on the promise of an individually personalized feed, ranking algorithms also serve to amplify confirmation bias, the tendency to accept unverified content as correct if it agrees with previously held beliefs [35]. In this way, filter bubbles act as feedback loops: as users increasingly surround themselves with content that appeals to their existing understanding, the filter bubble of agreeable content becomes denser and more concentrated.

Filter bubbles are the first step towards efficacy for a bot campaign. To be adequately noticed, and thus to disseminate a narrative, a bot needs to be able to work its way into the algorithmic pathway that leads to first contact with a relevant user's filter bubble feed.
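The feedback-loop dynamic described above can be made concrete with a minimal simulation. In the sketch below, a user engages with agreeable content more often than with disagreeable content, and a simple ranker nudges the next day's feed toward whatever drew engagement. All of the probabilities and the update rule are hypothetical choices made here for illustration, not measurements of any real platform.

```python
# Minimal simulation of the filter-bubble feedback loop (all numbers hypothetical).
# The user engages more often with agreeable content; the ranker responds by
# showing more of it, so exposure to disagreeable content shrinks each round.
import random

random.seed(0)

share_agreeable = 0.5                      # fraction of the feed matching the user's views
ENGAGE_AGREE, ENGAGE_DISAGREE = 0.6, 0.1   # assumed engagement probabilities

for day in range(1, 11):
    feed = ["agree" if random.random() < share_agreeable else "disagree"
            for _ in range(100)]
    engaged = [p for p in feed
               if random.random() < (ENGAGE_AGREE if p == "agree" else ENGAGE_DISAGREE)]
    # The ranker nudges tomorrow's feed toward whatever drew engagement today.
    if engaged:
        share_agreeable = 0.9 * share_agreeable + 0.1 * (engaged.count("agree") / len(engaged))
    print(f"day {day:2d}: {share_agreeable:.2f} of the feed is agreeable content")
```

Run over more simulated days, the agreeable share keeps climbing toward saturation, which is the "denser and more concentrated" bubble described above.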
Echo Chambers

Bots may also exploit unity as much as division. Mandiberg and Davidson [28] theorized that users' preexisting biases, which are folded into the ranking algorithm process, could drive the filter bubble process to a more extreme level, one that may break through individual algorithmic boundaries. Filter bubbles operate on an individual level: each user's feed is algorithmically tailored to their individual likes and interests. One of the core elements of social media usage, though, is social interaction: users are presented with the choice to follow, unfollow, and even block whomever they please. Given a path of least resistance, users may be accidentally goaded by their filter bubbles into creating for themselves an "ideological cocoon" [16], [47]. Not only are users more likely to read information they agree with inside of their own filter bubble, Karlova and Fisher [22] found, but they are also more likely to share information, news, and content within their established groups if they agree with it, if it interests them, or if they think it would interest someone else within their circle.

Gillani et al. [16] posited that homophily may be to blame for the creation of "cocoons." Homophily, a real-world phenomenon wherein people instinctively associate with like-minded groups, has found a natural home in filter bubbles online [56]. Unlike real-world public engagement, though, participatory media platforms are just that: participatory. Because humans have a tendency to select and interact with content that they approve of (homophily), they can choose not to participate in or be a part of groups that they do not identify with [28]. On social media, interactions with others are purely voluntary. Users are able to choose whether or not to follow other users. They are able to choose which groups they join and which kinds of content they interact with. They can even completely remove content that does not comply with their chosen culture by blocking it. Beyond being more likely to see information they agree with, social media users are more likely to share it; if the platform's algorithm takes sharing into consideration when ranking, it may strengthen a user's filter bubble [1]. This style of behavior is called "selective exposure," and it can quickly lead to an elimination of involuntary participation from a user's social media experience [4].

This combination of factors creates what Geschke, Lorenz, and Holtz [15] describe as a "triple filter bubble." Building on Pariser's [35] definition of the algorithmic filter bubble, they propose a three-factor system of filtration: individual, social, and technological filters. Each filter feeds into the others: individuals are allowed to make their own decisions. In combination, groups of like-minded users make similar choices about which content they willingly consume. Algorithms, knowing only what a user wants to see more of, deliver more individually engaging content.

The triple filter bubble cycle has the effect of partitioning the information landscape between groups of like beliefs. A new user to Twitter, a Democrat, may choose to follow only Democratic-leaning accounts (e.g., "@TheDemocrats," "@SpeakerPelosi," "@MSNBC"), but the information sphere they reside in will present overwhelmingly pro-Democratic content that rarely provides an ideological challenge to the Democratic beliefs the user had before joining Twitter.
When like-minded users' filter bubbles overlap, they cooperatively create an echo chamber, into and out of which counter-cultural media and information are unlikely to cross [15]. Echo chambers represent a more concentrated pool of information than a single user's filter bubble or participatory groups: information, shared by users to others on timelines, in reposts, and in direct messages, rarely escapes to the broader platform or into other filter bubbles [18]. Echo chambers can quickly lead to the spread of misinformation [4], even through viral online content called "memes," which are designed to be satirical or humorous in nature. Importantly, Guadagno et al. [18] found that the source of social media content had no impact on a user's decision to share it; only the emotional response it elicited impacted decision-making. An appeal to emotion in an echo chamber can further strengthen the walls of the echo chamber, especially if the appeal has no other grounds of legitimacy.

Group Polarization

Echo chambers, Sunstein argues, have an unusual collateral effect: group polarization [47]. When people of similar opinions, those confined within a shared echo chamber, discuss an issue, individuals' positions will not remain unchanged, be moderated, or be curtailed by discussion: they will become more extreme [48]. Group polarization also operates outside of the confines of social media, in family groups, ethnic groups, and work groups. Inside of an online echo chamber, though, where the saliency of contrary information is low and the saliency of belief-affirming, emotionally reactive information is high, polarization may be magnified [47]. The content produced by users settled in an echo chamber tends to follow the same pattern of extremization [48].

Traditionally, the counterstrategy for group polarization has been to expose users to information that is inherently contrary to their held belief (e.g., the exemplary Democratic user should have tweets from "@GOP" integrated into their timeline) [47]. The solution may not be so simple, though: recent research suggests that users exposed to contrary information or arguments that counter the authenticity of a supportive source tend to harden their support for their existing argument rather than reevaluate authenticity [31], [16]. Thus, both contrastive and supportive information, when inserted into an echo chamber, can increase polarization within the group.

Extreme opposing content presents another problem for polarization: virality. Content that breaches social norms generates shock value and strong emotions, making it more likely to be circulated than average content [18]. Compounding the visibility of extremity, social media algorithms that categorize and publicize "viral" popular trends utilize that content to maximize engagement. An internal Facebook marketing presentation described the situation bluntly: "Our algorithms exploit the human brain's attraction to divisiveness," one slide said [36]. Because filter bubbles and echo chambers limit the extent to which groups cross-pollinate beliefs and ideas, only extreme and unrepresentative beliefs tend to break through and receive inter-group exposure [31]. This condition, the "if it's outrageous, it's contagious" principle, may provide an answer as to why contrary information on social media tends to push users further away from consensus.

Intention may introduce another complication. Yardi's [56] review of several studies on digital group polarization found that people tended to go online not to agree, but to argue. This may also indicate that users are predisposed to rejecting contrary information, allowing them to fall back on their preexisting beliefs.
Cultivation

Echo chambers, filter bubbles, and group polarization are the central social vulnerabilities that bots are able to exploit to deliver a payload of disinformation, but even these may be subject to a much older model of information: cultivation theory. First theorized in the 1970s to explain television's ability to define issues that viewers believed were important, cultivation theory has more recently been reapplied to the burgeoning social media model. The theory states that by selecting specific issues to discuss, the media "set the agenda" for what issues viewers consider important or pervasive [34]. Just like television, social media may play a role in shaping our perceptions of reality [31]. On social media, though, agenda-setting is self-selective: ranking algorithms and filter bubbles, rather than television producers, frame users' understanding of the world through the information they consume. The information that users choose to post is also self-selective: images or stories may represent only a sliver of reality or may not represent reality at all [34].

Authenticity

Malicious bots' manipulation of social vulnerabilities is predicated on a constant and ongoing appeal to the users whom they hope to target, tailoring their identity and content to those targets. All of this shared content, though, requires authenticity: the perception by average users that a piece of content is legitimate [30]. In a traditional news environment, content with a well-recognized name (e.g., CNN, Wall Street Journal) or expertise on a topic (e.g., U.S. Department of State) often carries authenticity just by nature of being associated with that well-known label. In contrast, content online can acquire authenticity by being posted by an average, or visibly average, user: the more organic, "bottom-up" content is shared or rewarded with interactions, the more authenticity it generates. Content that is false but appeals to a user's filter bubble [35] and appears authentic is more likely to be spread [30].

Authenticity can be achieved at multiple levels of interaction with a piece of content: a tweet itself, the account's profile, the account's recent tweets, the account's recent replies, and the timing and similarity between each of the former. Low-level authenticity analysis requires the least amount of available information, while high-level authenticity checks require most or all available information. Authenticity may also be tracked over time: users that see a single account repeatedly, view its profile, or regularly interact with its content receive long-term reinforcement of the account's perceived legitimacy. Having the hallmarks of a legitimate account, including posting on a variety of topics, can help increase a bot's authenticity.

Authenticity is generated through the passage of deception checks, the process of looking for cues about an account's authenticity or lack thereof. These checks can occur at each layer of authenticity and can help uncover unusual behavior indicative of a bot [58]. Bots that do not have multiple layers of authenticity are less effective at preying on social vulnerabilities and thus less effective at achieving their intended goal.
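These layered cues are also the raw material for the kind of model proposed in H 2 and H 3: encode a few observable signals per account, learn what the average real user looks like, and flag accounts that sit far from that baseline. The sketch below is a hypothetical, heavily simplified illustration of that idea; the features, sample data, and threshold are assumptions made here for clarity and are not the BotWise implementation.

```python
# Hypothetical sketch: turn layered authenticity cues into features and flag
# accounts whose behavior sits far from an average-user baseline.
from statistics import mean, pstdev

def features(account: dict) -> list[float]:
    """Encode a few layered cues: profile, content variety, and posting timing."""
    gaps = account["post_gaps_seconds"]
    words = [w for t in account["tweets"] for w in t.lower().split()]
    return [
        account["followers"] / max(account["following"], 1),   # profile-level cue
        len(set(words)) / max(len(words), 1),                   # content-variety cue
        pstdev(gaps) if len(gaps) > 1 else 0.0,                  # timing-regularity cue
    ]

def baseline(real_accounts: list[dict]) -> list[tuple[float, float]]:
    """Per-feature mean and standard deviation over a sample of real users."""
    cols = list(zip(*(features(a) for a in real_accounts)))
    return [(mean(c), pstdev(c) or 1.0) for c in cols]

def distance(account: dict, base: list[tuple[float, float]]) -> float:
    """Average absolute z-score between this account and the baseline."""
    return mean(abs(f - m) / s for f, (m, s) in zip(features(account), base))

def looks_like_bot(account: dict, base, threshold: float = 3.0) -> bool:
    """Flag accounts that sit far (in 'standard units') from modeled real behavior."""
    return distance(account, base) > threshold

real_users = [
    {"followers": 180, "following": 200, "tweets": ["off to work", "great game last night"],
     "post_gaps_seconds": [3600.0, 52000.0, 7200.0]},
    {"followers": 90, "following": 150, "tweets": ["new recipe tonight", "traffic is awful"],
     "post_gaps_seconds": [1800.0, 90000.0, 40000.0]},
]
suspect = {"followers": 12, "following": 4800,
           "tweets": ["MAKE AMERICA GREAT"] * 4, "post_gaps_seconds": [60.0, 60.0, 60.0]}

print(looks_like_bot(suspect, baseline(real_users)))   # True under these toy numbers
```

A real model would use many more features and a principled threshold, but the shape of the computation, distilling a baseline from real users and measuring a novel account's distance from it, is the same.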
All content and all users on a platform are subject to deception checks, but even cues for deception checks are subject to social vulnerabilities [22]. Content that appeals to a preexisting bias may be able to bypass checks and deliver the desired narrative payload.

Authority

If a user has authenticity, they also have authority: the perception that, based on authenticity, a user is acceptable and that their information is accurate. Authority can either come from the top or the bottom of an information landscape. "Top-down" authority is derived from a single figure with high authenticity, like official sources, popular figures, or verified users (e.g., "@POTUS," "@VP," "@BarackObama"). "Bottom-up" authority is derived from a wide number of users who may have less authenticity but have a collective consensus, expose a user to a concept repeatedly, or are swept up in some "viral" content (e.g., the sum of the accounts a user is following).

Authority and authenticity are not universal: different users have different standards of authenticity, perform varying degrees of deception checks, and agree variably on the legitimacy of a source. Authority and authenticity are often conflated with agreeability by users: the degree to which they ideologically agree with a piece of content may dictate its perceived reliability. While this is not a legitimate method for deception checks, bots can effectively prey on users by exploiting their social vulnerabilities and predilection for agreeable information to disseminate a desired narrative [22]. Once authenticity and authority have been established, though, a user is likely to accept the information that they consume from these sources, regardless of its actual veracity.

Disinformation

Riding the wave of algorithmic sorting and exploiting social vulnerabilities, malicious bots can weaponize authenticity and authority to distribute their primary payloads: damaging information. As the efficacy of filter bubbles shows, consumers of social media are actively fed information that agrees with their social circles' previously held beliefs, enhancing the effect of the filter bubble feedback loop [2]. In fact, the desire for emotionally and ideologically appealing, low-effort, biased information is so strong that social media users are predisposed to accepting false information as correct if it seems to fit within their filter bubble [2] and if it has perceived authority.

Accounts online can spread three types of information through filter bubbles and echo chambers. Information in its base form is intentionally true and accurate, while misinformation is unintentionally inaccurate, misleading, or false. Critically, though, disinformation, the form of information favored by malicious bots, is more deliberate. Bennett and Livingston [2] define disinformation as "intentional falsehoods spread as news stories or simulated documentary formats to advance political goals." Simply put, disinformation is strategic misinformation [38]. Fig. 1 depicts such a strategy: by mixing partially true and thoroughly false statements in one image, the lines between information and disinformation are easily blurred.

[Figure 1]
Sparkes-Vian [46] argues that while the democratic nature of online social connectivity should foster inherent counters to inaccurate or intentionally false information, appeals to authenticity and subversive tactics for deception online supersede corrective failsafes and allow disinformation to roost. Disinformation can be so effective at weaponizing biases that it can be spread through filter bubbles in the same manner as factual information [26].

Consensus on the mechanisms of disinformation's efficacy has yet to be reached. Some research has found that deception checks may be "hijacked" by existing biases [38], but this conclusion is undercut by the fact that rationalization of legitimate contrary information spurs increased polarization [31]. Pennycook [38], meanwhile, has concluded that such reasoning does not occur at all in the social media setting: "cognitive laziness," a lack of engagement of critical thinking skills while idly scrolling through social media, may disengage critical reasoning entirely. Social media users that consistently implemented deception checks and reflective reasoning, Pennycook found, were more likely to discern disinformation from information. Even when reading politically supportive or contrastive information, users who performed effective deception checks were more effective at rooting out misinformation.

Memes

"Memes," shared jokes or images that evolve based on an agreed-upon format, offer an easy vehicle for disinformation to spread without necessarily needing to generate the same authenticity cues as fake news [46]. Like disinformation, memes often appeal to emotion or preexisting biases, propagating quickly through filter bubbles. Memes are also easily reformatted and reposted, with meaning that can be augmented account by account. According to Sparkes-Vian [46], memes are shared either by "copying a product," where the identical likeness of a piece of content is shared repeatedly, or "copying by instruction," where a base format is agreed upon and variations of the format are shared individually with varying meanings and techniques. Fig. 2 depicts a "copy by instruction" meme format with disinformation.

[Figure 2]

Doxing

Another less common tactic for disinformation spreading is "doxing," whereby private information or potentially compromising material is published on sites like WikiLeaks or Предатель ("traitor") and redistributed on social media [30]. Doxing, also used as a tactic of the American alt-right on message boards like 4chan and 8chan, has most visibly been used to leak emails from the Democratic National Committee (DNC) in June 2016 and French presidential candidate Emmanuel Macron in May 2017 [29].

Fake News

One of the most-discussed mediums for disinformation is "fake news," news stories built on or stylized by disinformation. While the term entered the national lexicon during the 2016 U.S. presidential election, misrepresentative news articles are not a new problem: since the early 2000s, online fake news has been designed specifically to distort a shared social reality [57]. Fake news may proliferate along the same channels as disinformation, and as with other forms of disinformation, fake news, inflammatory articles, and conspiracy theories inserted into an echo chamber may increase group polarization [58]. Fake news articles and links may bolster a malicious bot's efforts to self-insert into a filter bubble, especially if headlines offer support for extant beliefs [57].

While disinformation in its raw form often carries little authenticity, disinformation stylized as legitimate news may build authority, even if the source represents a false claim. If a fake website, like that in Fig. 3, can pass low-level deception checks, it can bolster the legitimacy of a claim, thus boosting the authority of a narrative. Of course, fake news can contribute to disinformation by simply being seen and interpreted as legitimate.

[Figure 3]

Russia's Internet Research Agency effectively weaponized fake news to pull its readers away from more credible sources during the 2016 U.S. presidential election. By investing in the bottom-up narrative that the mainstream media was actually "fake news" and that alternative sources were the only legitimate way to understand the world, Russia was able to label the media, users, and platforms attempting to correct false information as suppressors of a real narrative. A similar waning trust in traditional media and political principles allowed the IRA to engage in the complete fabrication of events of its own design [26].

Prior Exposure

Fake news is also able to spread effectively by preying on a more subtle social vulnerability: prior exposure. Pennycook, Cannon, and Rand [37] found that simply seeing a statement multiple times increased readers' likelihood of recalling the statement as accurate later. Even statements that were officially marked as false were more likely to be recalled as true later. Similarly, blatantly partisan and implausible headlines, like that of Fig. 3, are more likely to be recalled as true if users are repeatedly exposed to them online. Just a single prior exposure to fake news of any type was enough to increase the likelihood of later misidentification.

Prior exposure to fake news creates the "Illusory Truth Effect": since repetition increases the cognitive ease with which information is processed, repetition can be incorrectly used to infer accuracy [37]. The illusory truth problem supports Pennycook's [38] cognitive laziness hypothesis: because humans "lazy scroll" on social media and passively consume information, repetition is an easy way to mentally cut corners for processing information. As a result, though, false information that comes across a user's social media feed repeatedly is more likely to be believed, even if it is demonstrably false.

Propaganda

Damaging information need not be false at all, though. Propaganda, another form of information, may be true or false, but consistently pushes a political narrative and discourages other viewpoints [49]. Traditional propaganda, like that generated by Soviet Union propaganda factories during the 20th century [25], follows the top-down authority model, being created by a state or organization seeking to influence public opinion. Propaganda on social media, however, may follow the bottom-up authority model, being generated not by a top-down media organization or state but dispersed laterally by average users [26]. Organic, or seemingly organic, propaganda is more effective than identical, state-generated efforts [46].

One of the central benefits of social media is the accessibility of propaganda re-distributors: retweets, reposts, and tracked interactions may bolster the visibility of a narrative simply because real users interacted with it. Fig. 4 depicts a piece of top-down propaganda that attempts to utilize factual information to appeal to a lateral group of re-distributors.

[Figure 4]

Just like disinformation, propaganda can spread rapidly online when introduced to directly target a filter bubble [49]. Unlike traditional disinformation, though, propaganda that preys on social vulnerabilities is not designed for a reader to believe in its premise, but to radicalize doubt in truth altogether [7]. Bottom-up propaganda can be mimicked inorganically by top-down actors: distributors of disinformation and propaganda engage in "camouflage" to disseminate seemingly legitimate content through online circles [26]. To effectively shroud propaganda as organic, bottom-up content, distributors must build a visible perception of authority.
Cognitive Hacking

Exploiting social vulnerabilities, weaponizing algorithms, and deploying disinformation and propaganda all serve the ultimate aim of malicious bots: manipulation. Linvill, Boatwright, Grant, and Warren [26] suggested that these methods can be used to engage in "cognitive hacking," exploiting an audience's predisposed social vulnerabilities. While consumers of social media content should judge content by checking for cues that may decrease authenticity and credibility [22], content that both appeals to a preconceived viewpoint and appears authentic is able to bypass deception checks [30].

The traditional media theory of agenda setting holds that media coverage influences which issues the public perceives as salient. In contrast, Linvill and Warren [27] suggest that public agenda building, the behavioral responses to social movements online, can be influenced by disinformation and propaganda. Mass media altering issue salience in mass audiences (agenda setting) is not as effective as mass audiences generating issue salience collectively (agenda building). State-sponsored efforts to influence the public agenda are less effective than naturally generated public discussion, and as such, efforts to alter the public's agenda building are more effective when they are generated either by citizens themselves or by users that appear to be citizens [27]. To this end, malign actors utilize bots in social media environments to disseminate their narrative.

Hacking in Practice

Karlova and Fisher [22] argue that governments can exploit social vulnerabilities to disseminate disinformation and propaganda. Governments and organizations that engage in the manipulation of information, currency, and political narratives online and on social media have been labeled by the European Commission as "hybrid threats": states capable of engaging in both traditional and so-called "non-linear" warfare waged entirely with information [2]. One such state is Russia, whose modern disinformation campaigns rose from the ashes of the Soviet Union's extensive propaganda campaigns throughout the 20th century [25].

Case Study: The Internet Research Agency

The centerpiece of the Russian social media disinformation campaigns since at least 2013 has been the St. Petersburg-based Internet Research Agency [7], a state-sponsored troll factory dedicated to creating and managing bots [27]. The IRA's reach was deep: measuring IRA-generated tweet data and Facebook advertisement log data, up to one in 40,000 internet users were exposed to IRA content per day from 2015 to 2017 [21]. The IRA's efforts were first exposed after a massive, multi-year coordinated effort to influence the outcome of the 2016 United States presidential election [32]. Their goal was not to pursue a particular definition of truth and policy, but to prevent social media users from being able to trust authorities, to encourage them to believe what they were told, and to make truth indistinguishable from disinformation [7].
To those ends, the IRA utilized bots in a variety of multi-national campaigns to amplify a range of viewpoints and orientations in order to decrease coordination in both liberal and conservative camps [27]. Early on, Russian-operated accounts inserted themselves into natural political discourse on Twitter, Facebook, and Instagram to disseminate sensational, misleading, or even outright false information [26]. They worked to "delegitimize knowledge" not at the top levels of public media consumption, but at the ground level of interpersonal communication online.

To create a sense of authenticity and bottom-up authority [30], IRA accounts on Twitter, Facebook, and Instagram built identities as legitimate citizens and organizations with a spectrum of political affiliations, from deep partisan bias to no affiliation at all [27]. Many accounts acted in concert, operating as a fluid, machine-like process that Linvill and Warren [26] liken to a modern propaganda factory. To overcome, or perhaps exploit, social media filter bubbles, the IRA generally operated a wide variety of accounts, including pro-left, pro-right, and seemingly non-partisan news organizations [26]. Increasing authenticity in these political circles meant posting overtly political content and relevant "camouflage" that signals to a user that the account is, in fact, operated by a legitimate citizen.

Case Study: 2014 Ukrainian Protests

In one of Russia's earliest exertions of bot operations, state actors conducted disinformation operations in nearby Ukraine beginning in 2014 [30]. Protest movements in response to Russian-sympathetic Ukrainian president Viktor Yanukovych flourished on social media, but both Russian and Ukrainian authorities were able to disrupt protest movements by inserting disinformation into the social media platforms on which protestors were actively planning. The Ukrainian social media protests occurred just a few years after the Arab Spring protests, in which Twitter and Facebook played vital roles in organization, dissemination of information, and free expression. The speed at which information was passed regarding the Ukrainian protests online heightened insecurity at the upper levels of the Russian and Ukrainian governments [30].

Russian and Ukrainian state actors posing as protestors inserted disinformation into social circles. A 2013 photo of a Syrian war victim was used to claim that Ukrainian soldiers had attacked a young boy in Ukraine [30]. Screenshots from the notoriously violent Belarusian film "The Brest Fortress" were used to show a little girl crying over the body of her mother. While both images were demonstrably false, the content was still used both to dissuade protestors from publicly joining the effort and to destabilize Ukrainian citizen coordination. Because it came from outwardly "regular" or legitimate sources and thus carried both authenticity and bottom-up authority, the false content carried inherently high credibility with protestors [30].

Ukraine was one of Russia's earliest forays into social media manipulation, and a new step away from direct intimidation of opponents [30]. Its experiments in limited online social circles showed state actors that citizens can actively participate in the creation and dissemination of disinformation and propaganda [30] without the state having to exert a previously overt role [25].
Case Study: 2016 U.S. Presidential Election

The genesis of the national conversation about bot campaigns was the 2016 U.S. presidential election, where a coordinated effort by the IRA first sought to destabilize the U.S. political system and erode trust in the political process [32]. The IRA's goal in the election was not to directly support a Trump presidency, but to sow discord, foster antagonism, spread distrust in authorities, and amplify extremist viewpoints [27]. Their methods, though, seemed to favor a Trump presidency overall [26].

On May 17, 2017, former FBI director Robert Mueller was appointed to head a special counsel investigation into Russian interference in the election. While much of the 448-page report, which was released on April 18, 2019, remains redacted, publicly available information in the report details the IRA's reliance on advertisements, Facebook groups, and Twitter trolls to spread disinformation [32].

In sum, the IRA employed nearly 3,900 false Twitter accounts to produce, amplify, and insert disinformation, propaganda, fake news, and divisive content into preexisting American social media circles [26]. The organization utilized a combination of bot techniques and preyed on several social vulnerabilities to sow discord on Twitter, operating accounts with explicitly partisan leanings ("@TEN_GOP") and accounts with bottom-up authenticity ("@Pamela_Moore13," "@jenn_abrams") [32]. IRA Twitter content was viewed and interacted with over 414 million times on the platform between 2015 and 2017 [14]. On Facebook, the IRA focused on acquiring the support of existing American users with partisan groups ("Being Patriotic," "Secured Borders," "Tea Party News") and social justice groups ("Blacktivist," "Black Matters," "United Muslims of America") [32]. The IRA also purchased approximately $100,000 worth of algorithmically targeted Facebook advertisements promoting IRA-operated groups and pro-Trump, anti-Clinton messaging. IRA Facebook content reached a total of 126 million American users [14].

Hall [19] found that American citizens online were unable to differentiate between false 2016 election content and real content on either platform. Even if users were able to distinguish between the two, false information and propaganda often still affected user opinion.

Specific Attacks: 2017 French Presidential Election

Just five months after its successful influence campaign in the U.S., the IRA set its sights on the French presidential election. Evidence collected by Ferrara [12] suggested that troll factories attributed to the Kremlin assembled a coordinated campaign against centrist candidate Emmanuel Macron and his party, "En Marche," in France's 2017 presidential election. 4chan.org, a mostly American far-right message board, fostered the initial discussion of leaking or even manufacturing documents to incriminate the Macron campaign. On May 5, 2017, just two days before the presidential election, a coordinated doxing effort released "MacronLeaks" to the public via the well-known leak aggregator WikiLeaks, mirroring a similar Russian effort against the DNC in the United States the previous year [11].
Russian disinformation groups seized on the initial MacronLeaks dump, coordinating another bottom-up assault via social media [12]. They dispatched bots which largely amplified existing narratives, retweeting tweets pertaining to the dump or sending tweets that included "#MacronLeaks," "#MacronGate," or just "#Macron." The bots also generated tweets aimed at Marine Le Pen, a far-right candidate and Macron's electoral opponent. Beginning on April 30, 2017, the bot campaign ramped up quickly, generating 300 tweets per minute at its peak; it continued through Election Day, May 7, and faded out by the following day. When the MacronLeaks dump occurred on May 5, bot-generated content regarding Macron and the election began to increase. At peak, bot-generated content competed with or matched human-generated posts, suggesting that the MacronLeaks campaign generated substantial public attention. Additionally, increases in bot traffic and content on Twitter tended to slightly precede corresponding increases in human content, suggesting that bots were able to "cognitively hack" human conversations and generate new discussion topics, particularly regarding controversial issues [12].

The bot campaign in France, though, has been largely regarded as a failure: the program, despite generating substantial online discussion, had little success at swaying French voters [12]. It has been suggested that because the majority of bot-generated tweets were in English, not French, the language selection both snubbed native French participation and engaged the American anti-Macron user base, contributing to the volume of tweets about the MacronLeaks dump.

Case Study: 2020 U.S. Presidential Election

After the successful execution of the bot operation during the 2016 election, Russia turned its sights on the 2020 election, which it again hoped to mold in favor of its foreign policy interests. By 2018, though, the Mueller investigation had thoroughly exposed the IRA's playbook, and social media platforms had already begun reacting to pervasive misinformation with more stringent restrictions [53], necessitating a new strategy for influence. Reorganizing under the less stigmatized moniker "Lakhta Internet Research" (LIR), which first operated during the 2018 U.S. midterm elections, the IRA focused its mission fully on amplifying existing domestic narratives and issues [33].

In April 2020, a cooperative investigation between CNN and Clemson University behavioral science researchers Darren Linvill and Patrick Warren uncovered one of the LIR's new tactics: outsourcing [52]. A pair of proxy troll farms, located in Ghana and Nigeria, produced content specifically targeting Black Americans, working to inflame racial tensions and heighten awareness of police brutality. In just eight months, LIR-owned accounts operated out of these troll factories garnered almost 345,000 followers across Twitter, Instagram, and Facebook.

The exposure of the Ghanaian and Nigerian proxy troll factories did not deter Moscow or the LIR. Just four months later, on August 7, 2020, National Counterintelligence and Security Center director William Evanina released a statement outlining the Intelligence Community's confidence in a sustained, pervasive social media influence operation to sway voters and increase social discord [10]. Simultaneously, a Carnegie Mellon University study found that 82% of the most influential accounts on social media distributing information about the COVID-19 pandemic were bots [58].
But on November 2, 2020, just one day before Election Day, researchers at the Foreign Policy Research Institute (FPRI) noted that it was state news channels and official state sources, not swarms of bots and trolls, that were pushing divisive and anti-democratic narratives about the election, particularly utilizing phrases like "rigged" and "civil war" [3]. A separate analysis by the FPRI on the same day came to the same conclusion: state-sponsored networks, personalities, and figures seemed to be doing the heavy lifting of disinformation distribution, especially on the issue of mail-in voting, which had received a similar stamp of "fraudulence" from state sources [51].

After the election, it appeared that Moscow's efforts to bolster President Trump's reelection had failed. In a post-election autopsy report on election interference, the National Intelligence Council (NIC) dissected intelligence on the LIR's tactics, successful or otherwise, during the election cycle [33]. Rather than focus on generating bottom-up authority with an army of bots online, the Kremlin seemingly took an approach of top-down authority instead, using existing social media personas, unreliable news websites, and visible U.S. figures and politicians to deliver its divisive and sometimes outright false messaging. Its comparatively minor bot campaign, centered around the LIR, served mostly to amplify U.S. media coverage of the messaging it had pushed through its other mediums, and tended to ride the coattails of established personas online, including, the report hints, Rudy Giuliani, President Trump's personal lawyer, and President Trump himself. The report also noted, though, that in the month leading up to the election, Moscow opted to shift its tactics again, beginning to discredit an incoming Biden administration and the results of the election rather than continue to support what it viewed as a shrinking possibility for a Trump reelection. After the election, it doubled down on the "rigging" narrative, raising questions about the validity of Biden's election, exclusively to foster further distrust of the system [33].

Watts [53] argues that the efficacy of bots, trolls, and foreign influence, though, paled in comparison to the wide-reaching success of domestic disinformation. Top-down authority was more effective in the 2020 U.S. presidential election, Watts argues, because one of Moscow's most effective amplifiers of disinformation was already in the White House. Indeed, the NIC found that the operation largely rested on existing, popular figures to disseminate the narratives Moscow had chosen [33]. Meanwhile, filter bubbles, echo chambers, group polarization, and preexisting domestic attitudes culminated in the violent January 6, 2021 raid of the U.S. Capitol by Trump supporters who cited the very claims of fraudulence that the Kremlin's disinformation campaign had produced.

If Russia's efforts in the 2016 U.S. presidential election laid the groundwork for the social diffusion of discord, the 2020 election was a successful trial run. Despite its candidate's loss in the election, Moscow's influence operation was able to convince a significant portion of the U.S. population that their election system was illegitimate and broken enough to warrant an insurrection against it [53]. This directly serves one of Moscow's long-term foreign policy goals: destabilizing the Western-led Liberal International Order.
Bot Tactics

In these cases, malicious bots were successful because they effectively bred bottom-up authority and consensus on the narratives they delivered. To exploit social vulnerabilities and "hack" users in these manners, though, bots must utilize a number of tactics to create their personas and to bolster their authenticity, and thus their authority.

Bypassing Controls

In order to operate a successful network of bots, the accounts must first be created and designed. When a traditional user creates their account, they are prompted to enter their name, email, phone number, and date of birth. They must also confirm their legitimacy by entering a numeric code sent to their email address or phone number. Occasionally, users may also be asked to solve a CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart"), a test designed to differentiate artificial and human users that often involves mouse-movement tracking or picture recognition.

Platforms like Twitter have gradually increased the difficulty of creating new accounts to dissuade troll factory creators from producing hordes of accounts en masse. Similarly, Google, which offers simple email account creation tools, increasingly requires cross-authentication to create new accounts. Additionally, both platforms may track the number of new creations on a given IP address, blocking the address after a certain number of account creations in a given period. On Twitter, developer accounts with tightly controlled Application Programming Interface (API) calls are required to own and control bots. Rates for connection to the API, both for reading and writing, are limited to prevent spamming and massive data collection.

A single bot or small group of accounts, though, is largely insufficient to produce the desired illusion of bottom-up authority required to conduct an effective campaign of influence. As restrictions become more stringent, the cybersecurity arms race has encouraged illicit troll factories to craft new tools to work around these restrictions and maximize their output for the creation of a bot army. Rather than apply for a Twitter developer account and work under the confines of rate limits, HTML parsers in programming languages like Python allow troll factories to bypass the API entirely by reading, interpreting, and hijacking the Twitter Web App interface, which is not rate limited. As a result, many bots may display a disproportionate usage of the Twitter Web App. Similarly, programs and scripts designed to bypass proxy limiting, account verification, and CAPTCHA checks are freely available and accessible online. Tools like the PVA Account Creator combine these tactics into one functioning application, allowing users to farm accounts with little resistance or repudiation from Twitter or Google.

Obfuscation

Once an account is created and Twitter's restrictions have been bypassed, the more difficult task of maintaining secrecy and effective operation begins. If a bot wants to remain online and avoid detection by real users employing deception checks, it must engage in a number of obfuscation tactics to increase its authority and blend in with real users. The most glaring flaw of the bottom-up propaganda model, of which bots are a facet, is that the creation of individual authenticity required to support bottom-up propaganda is much more