"IT DOESN'T MATTER NOW WHO'S RIGHT AND WHO'S NOT:" A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER - OhioLINK ETD Center

Page created by Louis Graves
 
CONTINUE READING
"IT DOESN'T MATTER NOW WHO'S RIGHT AND WHO'S NOT:" A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER - OhioLINK ETD Center
“IT DOESN’T MATTER NOW WHO’S RIGHT AND WHO’S NOT:”
A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER

 by

 Braeden Bowen

 Honors Thesis
 Submitted to the Department of Computer Science
 and the Department of Political Science
 Wittenberg University
 In partial fulfillment of the requirements
 for Wittenberg University honors
 April 2021
"IT DOESN'T MATTER NOW WHO'S RIGHT AND WHO'S NOT:" A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER - OhioLINK ETD Center
Bowen 2

On April 18, 2019, United States Special Counsel Robert Mueller III released a 448-page report
on Russian influence on the 2016 United States presidential election [32]. In the report, Mueller
and his team detailed a vast network of false social media accounts acting in a coordinated,
concerted campaign to influence the outcome of the election and instill systemic distrust in
Western democracy. Helmed by the Russian Internet Research Agency (IRA), a state-sponsored
organization dedicated to operating the account network, the campaign engaged in "information
warfare" to undermine the United States democratic political system.

Russia's campaign of influence on the 2016 U.S. elections is emblematic of a new breed of warfare
designed to achieve long-term foreign policy goals by preying on inherent social vulnerabilities
that are amplified by the novelty and anonymity of social media [13]. To this end, state actors can
weaponize automated accounts controlled through software [55] to exert influence through the
dissemination of a narrative or the production of inorganic support for a person, issue, or event
[13].

Research Questions
This study asks six core questions about bots, bot activity, and disinformation online:

RQ 1: What are bots?
RQ 2: Why do bots work?
RQ 3: When have bot campaigns been executed?
RQ 4: How do bots work?
RQ 5: What do bots do?
RQ 6: How can bots be modeled?

Hypotheses
With respect to RQ 6, I will propose BotWise, a model designed to distill average behavior on the
social media platform Twitter from a set of real users and compare that data against novel input.
Regarding this model, I have three central hypotheses:

H 1: real users and bots exhibit distinct behavioral patterns on Twitter
H 2: the behavior of accounts can be modeled based on account data and activity
H 3: novel bots can be detected using these models by calculating the difference between modeled
behavior and novel behavior
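
As a rough illustration of H 1 through H 3 (and not the BotWise implementation itself), the following Python sketch distills an "average" behavioral profile from real-user feature vectors and flags a novel account whose behavior deviates sharply from that profile. The features, weights, threshold, and data shown here are assumptions chosen for the example only.

import numpy as np

def build_profile(real_user_features):
    """Distill average behavior from a matrix of real-user feature vectors (H 2)."""
    mean = real_user_features.mean(axis=0)
    std = real_user_features.std(axis=0) + 1e-9   # avoid division by zero
    return mean, std

def deviation_score(account_features, profile):
    """Mean absolute z-score: distance between modeled and novel behavior (H 3)."""
    mean, std = profile
    return float(np.abs((account_features - mean) / std).mean())

def is_suspected_bot(account_features, profile, threshold=2.0):
    """Flag accounts whose behavior deviates strongly from the modeled average (H 1)."""
    return deviation_score(account_features, profile) > threshold

# Hypothetical feature vectors: [tweets per day, follower/following ratio, retweet share]
real_users = np.array([[12.0, 1.1, 0.30],
                       [ 8.0, 0.9, 0.25],
                       [15.0, 1.3, 0.40]])
profile = build_profile(real_users)
novel_account = np.array([640.0, 0.02, 0.95])    # hypothetical high-volume account
print(is_suspected_bot(novel_account, profile))  # True for this example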

Bots
Automated accounts on social media are not inherently malicious. Originally, software robots, or
"bots," were used to post content automatically on a set schedule. Since then, bots have evolved
significantly, and can now be used for a variety of innocuous purposes, including marketing,
distribution of information, automatic responding, news aggregation, or just for highlighting and
reposting interesting content [13].

No matter their purpose, bots are built entirely from human-written code. As a result, every action
and decision a bot is capable of taking must be preprogrammed and decided by the
account's owner. But because they are largely self-reliant after creation, bots can generate massive
amounts of content and data very quickly.
"IT DOESN'T MATTER NOW WHO'S RIGHT AND WHO'S NOT:" A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER - OhioLINK ETD Center
Bowen 3

Many limited-use bots make it abundantly clear that they are inhuman actors. Some bots, called
social bots, though, attempt to subvert real users by emulating human behavior as closely as
possible, creating a mirage of imitation [13]. These accounts may attempt to build a credible
persona as a real person in order to avoid detection, sometimes going as far as being partially
controlled by a human and partially controlled by software [54]. The more sophisticated the bot,
the more effectively it can shroud itself and blend into the landscape of real users online.

Not all social bots are designed benevolently. Malicious bots, those designed with an exploitative
or abusive purpose in mind, can also be built from the same framework that creates legitimate
social bots. These bad actors are created with the intention of exploiting and manipulating
information by infiltrating a population of real, unsuspecting users [13].

If a malicious actor like Russia's Internet Research Agency were invested in creating a large-scale
disinformation campaign with bots, a single account would be woefully insufficient to produce
meaningful results. Malicious bots can be coordinated with extreme scalability to feign the
existence of a unified populace or movement, or to inject disinformation or polarization into an
existing community of users [13], [30]. These networks, called "troll factories," "farms," or
"botnets," can more effectively enact an influence campaign [9] and are often hired by partisan
groups or weaponized by states to underscore or amplify a political narrative.

Social Media Usage
In large part, the effectiveness of bots depends on users' willingness to engage with social media.
Luckily for bots, social media usage in the U.S. has skyrocketed since the medium's inception in
the early 2000's. In 2005, as the Internet began to edge into American life as a mainstay of
communication, a mere 5% of Americans reported using social media [40], which was then just a
burgeoning new form of online interconnectedness. Just a decade and a half later, almost 75% of
Americans found themselves utilizing YouTube, Instagram, Snapchat, Facebook, or Twitter. In a
similar study, 90% of Americans 18-29, the lowest age range surveyed, reported activity on social
media [39]. In 2020, across the globe, over 3.8 billion people, nearly 49% of the world's
population, held a presence on social media [23]. In April 2020 alone, Facebook reported that more
than 3 billion of those people had used its products [36].

The success of bots also relies on users' willingness to utilize social media not just as a platform
for social connections, but as an information source. Again, the landscape is ripe for influence: in
January 2021, more than half (53%) of U.S. adults reported reading news from social media and
over two-thirds (68%) reported reading news from news websites [45]. In a 2018 Pew study, over
half of Facebook users reported getting their news exclusively from Facebook [14]. In large part,
this access to information is free, open, and unrestricted, a novel method for the dissemination of
news media.

Generally, social media has made the transmission of information easier and faster than ever before
[22]. Information that once spread slowly by mouth now spreads instantaneously through
increasingly massive networks, bringing worldwide communication delays to nearly zero.
Platforms like Facebook and Twitter have been marketed by proponents of democracy as a mode
of increasing democratic participation, free speech, and political engagement [49]. In theory,
Sunstein [47] says, social media as a vehicle of self-governance should bolster democratic
"IT DOESN'T MATTER NOW WHO'S RIGHT AND WHO'S NOT:" A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER - OhioLINK ETD Center
Bowen 4

information sharing. In reality, though, the proliferation of "fake news," disinformation, and
polarization has threatened cooperative political participation [47]. While social media was
intended to decentralize and popularize democracy and free speech [49], the advent of these new
platforms has inadvertently decreased the authority of institutions (DISINFO) and the power of
public officials to influence the public agenda [27] by subdividing groups of people into
unconnected spheres of information.

Social Vulnerabilities
Raw code and widespread social media usage alone are not sufficient to usurp an electoral process
or disseminate a nationwide disinformation campaign. To successfully avoid detection, spread a
narrative, and eventually "hijack" a consumer of social media, bots must work to exploit a number
of inherent social vulnerabilities that, while largely predating social media, may be exacerbated by
the platforms' novelty and opportunity for relative anonymity [44]. Even the techniques for social
exploitation are not new: methods of social self-insertion often mirror traditional methods of
exploitation for software and hardware [54].

The primary social vulnerability that bot campaigns may exploit is division. By subdividing large
groups of people and herding them into like-minded circles of users inside of which belief-
affirmative information flows, campaigns can decentralize political and social narratives, reinforce
beliefs, polarize groups, and, eventually, pit groups against one another, even when screens are off
[31].

Participatory Media
Publicly and commercially, interconnectedness, not disconnectedness, is the animus of social
media platforms like Facebook, whose public aim is to connect disparate people and give open
access to information [58].

In practice, though, this interconnectedness largely revolves around a user's chosen groups, not the
platform's entire user base. A participant in social media is given a number of choices: what
platforms to join, who to connect with, who to follow, and what to see. Platforms like Facebook
and Twitter revolve around sharing information with users' personal connections and associated
groups: a tweet is sent out to all of a user's followers, and a Facebook status update can be seen by
anyone within a user's chosen group of "friends." Users can post text, pictures, GIFs, videos, and
links to outside sources, including other social media sites. Users also have the ability to restrict
who can see the content they post, from anyone on the entire platform to no one at all.

Users choose what content to participate in and interact with and choose which groups to include
themselves in. This choice is the first building block of division: while participation in self-selected
groups online provides users with a sense of community and belonging [5], it also builds an
individual group identity [20] that may leave users open to manipulation of their social
vulnerabilities.

Social Media Algorithms
"IT DOESN'T MATTER NOW WHO'S RIGHT AND WHO'S NOT:" A MODEL TO EVALUATE AND DETECT BOT BEHAVIOR ON TWITTER - OhioLINK ETD Center
Bowen 5

Because so many people use social media, companies have an ever-increasing opportunity to
generate massive profits through advertisement revenue. Potential advertisers, then, want to buy
advertisement space on the platforms that provide the most eyes on their products [24].

In order to drive profits and increase the visibility of advertisements on their platforms [24],
though, social media companies began to compete to create increasingly intricate algorithms
designed to keep users on their platforms for longer periods of time [31], increasing both the
number of tweets a user saw and the number of advertisements they would see [24].

Traditionally, the content a user saw on their "timeline" or "feed," the front page of the platform
that showed a user's chosen content, was displayed chronologically, from newest to oldest. Modern
social media algorithms, though, are designed to maximize user engagement by sorting content
from most interesting to least interesting [14], rather than simply newest to oldest (although
relevancy and recency are still factors).

The most prominent method of sorting content to maximize engagement is a ranking algorithm.
On their own, ranking algorithms are designed to prioritize a most likely solution to a given
problem. On social media, they are designed to predict and prioritize content that a user is most
likely to interact with, thus extending time spent on the platform [24].

Ranking algorithms require a large amount of intricate, personal data to make acute decisions. To
amass this information, platforms like Twitter and Facebook collect "engagement" data [24],
including how often a user sees a certain kind of post, how long they look at it, whether they click
the photos, videos, or links included in the post, and whether they like, repost, share, or otherwise
engage with the content. Interacting with a post repeatedly or at length is seen as positive
engagement, which provides a subtle cue to the ranking algorithm that a user may be more
interested in that kind of content. Even without any kind of interaction, the length of time spent on
a post is enough to trigger a reaction by the algorithm.

When a user opens Twitter, the algorithm pools all the content posted by users they follow and
scores each based on the previously collected engagement data [24]. Higher-ranking posts, the
content a user is most likely to engage with, are ordered first, and lower-ranking posts are ordered
last. Some algorithms may cascade high-engagement posts, saving some for later in the timeline
in the hope of further extending time spent consuming content. Meanwhile, advertisements are
sprinkled into the ranking, placed in optimized spaces to fully maximize the likelihood that a user
sees and engages with them.
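
As an illustration of the ranking process described above (and not Twitter's actual algorithm), the following Python sketch scores pooled posts using assumed engagement signals and weights, orders them from most to least engaging, and interleaves advertisements at an assumed interval.

from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    predicted_dwell_seconds: float   # engagement signals previously collected
    predicted_click_prob: float
    predicted_like_prob: float

def engagement_score(post):
    """Combine engagement signals into one ranking score (weights are assumed)."""
    return (0.4 * post.predicted_dwell_seconds
            + 30.0 * post.predicted_click_prob
            + 20.0 * post.predicted_like_prob)

def build_timeline(pooled_posts, ads, ad_interval=5):
    """Order pooled posts from highest to lowest score, then interleave ads."""
    ranked = sorted(pooled_posts, key=engagement_score, reverse=True)
    timeline = []
    for i, post in enumerate(ranked, start=1):
        timeline.append(post)
        if ads and i % ad_interval == 0:
            timeline.append(ads.pop(0))   # place an ad every few organic posts
    return timeline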

Social media algorithms are not designed to damage information consumption [49] or facilitate
bot campaigns, but users' ability to selectively build their own personalized profile accidentally
leaves them vulnerable to a social attack. Consider this scenario: a user follows only conservative
pundits on Twitter (e.g., “@TuckerCarlson,” “@JudgeJeanine”) and interacts only with
conservative video posts. If one such pundit inadvertently reposts a video posted by a bot which
contains false information, the user is more likely to see that video and thus absorb that false
information than someone who follows mixed or liberal sources. The algorithm does not consider
whether the information is true— only whether a user is likely to want to interact with it.
Viral Spread

Beyond a user's timeline, many platforms also utilize algorithms to aggregate worldwide user
activity data into a public "trending" page that collects and repackages a slice of popular or "viral"
topics of conversation on the platform.

Virality online is not necessarily random, though. Guadagno et al. [18] found that videos could be
targeted at specific users, who themselves could be primed to share content more rapidly and more
consistently. Content that forged an emotional connection, especially content that evoked positive
responses, was more likely to be shared.

Algorithms are demonstrably effective at keeping users engaged for longer periods of time. Their
subversive and covert nature also makes their manipulation less obvious, a fact which bots are also
effective at utilizing. Algorithms have become the standard for the population of content into a
user's feed, from timelines to trending pages to advertisement personalization [24].

Filter Bubbles
The primary consequence of algorithmic subdivision of users is what Pariser [35] calls a "filter
bubble." By design, social media users adhere to familiar, comfortable spheres that do not
challenge their preconceived beliefs and ideas. If, for instance, a liberal user only follows and
interacts with liberal accounts, their ranking algorithm will only have liberal content to draw from
when creating a tailored feed; as a result, their ranked information will consist of only the most
agreeable liberal content available. Even if that user follows some conservative platforms but does
not interact with them, those accounts will receive a low ranking and will be less likely to be seen
at all.

This is a filter bubble: a lack of variability in the content that algorithms feed to users [35]. In an
effort to maximize advertisement revenue, algorithms incidentally seal off a user's access to
diverse information, limiting them to the bubble that they created for themselves by interacting
with the content that they prefer.

As users follow other users and engage with content that is already familiar to them, the filter
bubble inadvertently surrounds them with content that they already know and agree with [35].
Psychologically, Pariser says, feeds tailored to exclusively agreeable content overinflate social
media users' confidence in their own ideology, discourage reflection on it, and upset the
traditional cognitive balance between acquiring new ideas and reinforcing old ones.

While delivering on the promise of an individually personalized feed, ranking algorithms also
serve to amplify the principle of confirmation bias, the tendency to accept unverified content as
correct if it agrees with previously held beliefs [35]. In this way, filter bubbles act as feedback
loops: as users increasingly surround themselves with content that appeals to their existing
understanding, the filter bubble of agreeable content becomes denser and more concentrated.

Filter bubbles are the first step towards efficacy for a bot campaign. To be adequately noticed, and
thus to disseminate a narrative, a bot needs to be able to work its way into the algorithm pathway
that will lead to first contact with a relevant user's filter bubble feed.

Echo Chambers

Bots may also exploit unity as much as division. Mandiberg and Davidson [28] theorized that
users' preexisting biases, which are folded into the ranking algorithm process, could drive the filter
bubble process to a more extreme level, one that may break through individual algorithmic
boundaries.

Filter bubbles operate on an individual level— each user's feed is algorithmically tailored to their
individual likes and interests. One of the core elements of social media usage, though, is social
interaction: users are presented with the choice to follow, unfollow, and even block whomever
they please. Given a path of least resistance, users may be accidentally goaded by their filter
bubbles into creating for themselves an "ideological cocoon" [16] [47]. Not only are users more
likely to read information they agree with inside of their own filter bubble, Karlova and Fisher
[22] found, but they are also more likely to share information, news, and content within their
established groups if they agree with it, if it interests them, or if they think it would interest
someone else within their circle.

Gillani et al. [16] posited that homophily may be to blame for the creation of "cocoons."
Homophily, a real-world phenomenon wherein people instinctively associate with like-minded
groups, has found a natural home in filter bubbles online [56]. Unlike real-world public
engagement, though, participatory media platforms are just that— participatory. Because humans
have a tendency, rooted in homophily, to select and interact with content that they approve of, they
can choose not to participate in or be a part of groups that they do not identify with [28].

On social media, interactions with others are purely voluntary. Users are able to choose to or not
to follow other users. They are able to choose which groups they join and which kinds of content
they interact with. They can even completely remove content that does not comply with their
chosen culture by blocking it. Beyond being more likely to see information that they agree with,
social media users are also more likely to share it; if the platform's algorithm takes sharing into
consideration when ranking, it may strengthen a user's filter bubble [1]. This style of behavior is
called "selective exposure," and it can quickly lead to an elimination of involuntary participation
from a user's social media experience [4].

This combination of factors creates what Geschke, Lorenz, and Holtz [15] describe as a "triple
filter bubble." Building on Pariser's [35] definition of the algorithmic filter bubble, they propose a
three-factor system of filtration: individual, social, and technological filters. Each filter feeds into
the others: individuals are allowed to make their own decisions. In combination, groups of like-
minded users make similar choices about which content they willingly consume. Algorithms,
knowing only what a user wants to see more of, deliver more individually engaging content.

The triple filter bubble cycle has the effect of partitioning the information landscape between
groups of like beliefs. A new user to Twitter, a Democrat, may choose to follow only Democratic-
leaning accounts (e.g., "@TheDemocrats," "@SpeakerPelosi," "@MSNBC"), but the information
sphere they reside in will ostensibly present overwhelmingly pro-Democratic content that rarely
provides an ideological challenge to the Democratic beliefs the user had before joining Twitter.

When like-minded users' filter bubbles overlap, they cooperatively create an echo chamber, into
and out of which counter-cultural media and information are unlikely to cross [15]. Echo chambers
represent a more concentrated pool of information than a single user's filter bubble or participatory
groups: information, shared by users to others on timelines, in reposts, and in direct messages,
rarely escapes to the broader platform or into other filter bubbles [18].

Echo chambers can quickly lead to the spread of misinformation [4], even through viral online
content called "memes," which are designed to be satirical or humorous in nature. Importantly,
Guadagno et al. [18] found that the source of social media content had no impact on a user's
decision to share it— only the emotional response it elicited impacted decision-making. An appeal
to emotion in an echo chamber can further strengthen the walls of the echo chamber, especially if
the appeal has no other grounds of legitimacy.

Group Polarization
Echo chambers, Sunstein argues, have an unusual collateral effect: group polarization [47]. When
people of similar opinions, those confined within a shared echo chamber, discuss an issue,
individuals' positions will not remain unchanged, be moderated, or be curtailed by discussion: they
will be extremified [48]. Group polarization also operates outside of the confines of social media,
in family groups, ethnic groups, and work groups. Inside of an online echo chamber, though, where
the saliency of contrary information is low and the saliency of belief-affirming, emotionally
reactive information is high, polarization may be magnified [47]. The content produced by users
settled in an echo chamber tends to follow the same pattern of extremization [48].

Traditionally, the counterstrategy for group polarization has been to expose users to information
that is inherently contrary to their held belief (e.g., the exemplary Democratic user should be made
to have tweets from "@GOP" integrated into their timeline) [47]. The solution may not be so
simple, though: recent research suggests that users exposed to contrary information or arguments
that counter the authenticity of a supportive source tend to harden their support for their existing
argument rather than reevaluate authenticity [31], [16]. Thus, both contrastive and supportive
information, when inserted into an echo chamber, can increase polarization within the group.

Extreme opposing content presents another problem for polarization: virality. Content that
breaches social norms generates shock value and strong emotions, making it more likely to be
circulated than average content [18]. Compounding the visibility of extremity, social media
algorithms that categorize and publicize "viral" popular trends utilize that content to maximize
engagement. An internal Facebook marketing presentation described the situation bluntly: "Our
algorithms exploit the human brain's attraction to divisiveness," one slide said [36].

Because filter bubbles and echo chambers limit the extent to which groups cross-pollinate beliefs
and ideas, only extreme and unrepresentative beliefs tend to break through and receive inter-group
exposure [31]. This condition, the "if it's outrageous, it's contagious" principle, may provide an
answer as to why contrary information on social media tends to push users further away from
consensus.

Intention may introduce another complication. Yardi's [56] review of several studies on digital
group polarization found that people tended to go online not to agree, but to argue. This may also
indicate that users are predisposed to rejecting contrary information, allowing them to fall back on
their preexisting beliefs.

Cultivation
Echo chambers, filter bubbles, and group polarization are the central social vulnerabilities that bots
are able to exploit to deliver a payload of disinformation, but even these may be subject to a much
older model of information: cultivation theory.

First theorized in the 1970's to explain television's ability to define issues that viewers believed
were important, cultivation theory has more recently been reapplied to the burgeoning social media
model. The theory states that by selecting specific issues to discuss, the media "set the agenda" for
what issues viewers consider important or pervasive [34].

Just like television, social media may play a role in shaping our perceptions of reality [31]. On
social media, though, agenda-setting is self-selective: ranking algorithms and filter bubbles rather
than television producers frame users’ understanding of the world through the information they
consume. The information that users choose to post is also self-selective: images or stories may
represent only a sliver of reality or may not represent reality at all [34].

Authenticity
Malicious bots' manipulation of social vulnerabilities is predicated on a constant and ongoing
appeal to the users whom they hope to target by tailoring their identity and content to appeal to
those targets. All of this shared content, though, requires authenticity, the perception by average
users that a piece of content is legitimate [30]. In a traditional news environment, content with a
well-recognized name (e.g., CNN, Wall Street Journal) or expertise on a topic (e.g., U.S.
Department of State) often carries authenticity just by nature of being associated with that well-
known label. In contrast, content online can acquire authenticity by being posted by an average, or
visibly average, user: the more organic, "bottom-up" content is shared or rewarded with
interactions, the more authenticity it generates. Content that is false but appeals to a user's filter
bubble [35] and appears authentic is more likely to be spread [30].

Authenticity can be achieved at multiple levels of interaction with a piece of content: a tweet itself,
the account's profile, the account's recent tweets, the account's recent replies, and the timing and
similarity between each of the former. Low-level authenticity analysis requires the least amount
of available information, while high-level authenticity checks require most or all available
information.
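
The layered checks described here can be illustrated with a short Python sketch; every cue, weight, and field name below is an assumption chosen for the example rather than a validated detector. Low-level checks use only the tweet itself, while higher-level checks require the profile and recent activity.

def tweet_level_check(tweet_text):
    """Low-level check using only the tweet itself (link and hashtag density)."""
    tokens = tweet_text.split()
    if not tokens:
        return 0.0
    spammy = sum(1 for t in tokens if t.startswith(("#", "http")))
    return 1.0 - spammy / len(tokens)    # 1.0 looks organic, 0.0 looks automated

def profile_level_check(profile):
    """Mid-level check using profile fields (bio, photo, account age in days)."""
    score = 0.0
    score += 0.4 if profile.get("has_bio") else 0.0
    score += 0.3 if profile.get("has_photo") else 0.0
    score += 0.3 if profile.get("account_age_days", 0) > 180 else 0.0
    return score

def history_level_check(recent_tweets):
    """High-level check across recent activity: near-duplicate posts look automated."""
    if not recent_tweets:
        return 0.0
    return len(set(recent_tweets)) / len(recent_tweets)

def authenticity(tweet_text, profile, recent_tweets):
    """Average the three layers; higher layers require more available information."""
    return (tweet_level_check(tweet_text)
            + profile_level_check(profile)
            + history_level_check(recent_tweets)) / 3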

Authenticity may also be tracked over time: users that see a single account repeatedly, view its
profile, or regularly interact with its content receive long-term reinforcement of the account's
perceived legitimacy. Having the hallmarks of a legitimate account, including posting on a variety
of topics, can help increase a bot's authenticity.

Authenticity is generated through the passage of deception checks, the process of looking for cues
about the authenticity or lack thereof for an account. These checks can occur at each layer of
authenticity and can help uncover unusual behavior indicative of a bot [58]. Bots that do not have
multiple layers of authenticity are less effective at preying on social vulnerabilities and thus less
effective at achieving their intended goal.

All content and all users on a platform are subject to deception checks, but even cues for deception
checks are subject to social vulnerabilities [22]. Content that appeals to a preexisting bias may be
able to bypass checks and deliver the desired narrative payload.

Authority
If a user has authenticity, they also have authority, the perception that, based on that authenticity,
the user is credible and that their information is accurate. Authority can either come from the top or the
bottom of an information landscape. "Top-down" authority is derived from a single figure with
high authenticity, like official sources, popular figures, or verified users (e.g., "@POTUS,"
"@VP," "@BarackObama"). "Bottom-up" authority is derived from a wide number of users who
may have less authenticity, but have a collective consensus, expose a user to a concept repeatedly,
or are swept up in some "viral" content (e.g., the sum of the accounts a user is following).

Authority and authenticity are not universal: different users have different standards of
authenticity, perform varying degrees of deception checks, and agree variably on the legitimacy
of a source. Authority and authenticity are often conflated with agreeability by users: the degree
to which they ideologically agree with a piece of content may dictate its reliability. While this is
not a legitimate method for deception checks, bots can effectively prey on users by exploiting their
social vulnerabilities and predilection for agreeable information to disseminate a desired narrative
[22]. Once authenticity and authority have been established, though, a user is likely to accept the
information that they consume from these sources, regardless of its actual veracity.

Disinformation
Riding the wave of algorithmic sorting and exploiting social vulnerabilities, malicious bots can
weaponize authenticity and authority to distribute their primary payloads: damaging information.

As the efficacy of filter bubbles shows, consumers of social media are actively fed information that
agrees with their social circles' previously held beliefs, enhancing the effect of the filter bubble
feedback loop [2]. In fact, the desire for emotionally and ideologically appealing, low-effort,
biased information is so strong that social media users are predisposed to accepting false
information as correct if it seems to fit within their filter bubble [2] and if it has perceived authority.

Accounts online can spread three types of information through filter bubbles and echo chambers.
Information in its base form is intentionally true and accurate, while misinformation is
unintentionally inaccurate, misleading, or false. Critically, though, disinformation, the form of
information favored by malicious bots, is more deliberate. Bennett and Livingston [2]
define disinformation as "intentional falsehoods spread as news stories or simulated documentary
formats to advance political goals." Simply put, disinformation is strategic misinformation [38].
Fig. 1 depicts such a strategy: by mixing partially true and thoroughly false statements in one
image, the lines between information and disinformation are easily blurred.

Sparkes-Vian [46] argues that while the democratic
nature of online social connectivity should foster
inherent counters to inaccurate or intentionally false
information, appeals to authenticity and subversive
tactics for deception online supersede corrective
failsafes and allow disinformation to roost.
Disinformation can be so effective at weaponizing
biases that it can be spread through filter bubbles in the
same manner as factual information [26].

Consensus on the mechanisms of disinformation's
efficacy has yet to be reached. Some research has
found that deception checks may be "hijacked" by
existing biases [38], but this conclusion is undercut by
the fact that rationalization of legitimate contrary
information spurs increased polarization [31].
Pennycook [38], meanwhile, has concluded that such
reasoning does not occur at all in the social media
setting: "cognitive laziness," a lack of engagement of critical thinking skills while idly scrolling
through social media, may disengage critical reasoning skills entirely. Social media users that
consistently implemented deception checks and reflective reasoning, Pennycook found, were more
likely to discern disinformation from information. Even when reading politically supportive or
contrastive information, users who performed effective deception checks were more efficacious at
rooting out misinformation.

Memes
"Memes," shared jokes or images that evolve based on an agreed-upon format, offer an easy
vehicle for disinformation to spread without necessarily needing to generate the same authenticity
cues as fake news [46]. Like disinformation, memes often appeal to emotion or preexisting biases,
propagating quickly through filter bubbles. Memes are also easily re-formatted and re-posted with
an account-by-account augmentable meaning. According to Sparkes-Vian [46], memes are shared
either by "copying a product," where the identical likeness of a piece of content is shared
repeatedly, or "copying by instruction," where a base format is agreed upon and variations of the
format are shared individually with varying meanings and techniques. Fig. 2 depicts a "copy by
instruction" meme format with disinformation.

[Figure 2]

Doxing
Another less common tactic for disinformation spreading is "doxing," whereby private information
or potentially compromising material is published on sites like WikiLeaks or Предатель ("traitor")
and redistributed on social media [30]. Doxing, also used as a tactic of the American alt-right on
message boards like 4chan and 8chan, has most visibly been used to leak emails from the
Democratic National Committee (DNC) in June 2016 and French presidential candidate Emmanuel
Macron in May 2017 [29].

Fake News
One of the most-discussed mediums for disinformation is "fake news," news stories built on or
stylized by disinformation. While the term entered the national lexicon during the 2016 U.S.
presidential election, misrepresentative news articles are not a new problem— since the early
2000's, online fake news has been designed specifically to distort a shared social reality [57].

Fake news may proliferate along the same
channels as disinformation, and as with
other forms of disinformation, fake news,
inflammatory articles, and conspiracy
theories inserted into an echo chamber
may increase group polarization [58]. Fake
news articles and links may bolster a
malicious bot's efforts to self-insert into a
filter bubble, especially if headlines offer
support for extant beliefs [57].

While disinformation in its raw form often
carries little authenticity, disinformation
stylized as legitimate news may build
authority, even if the source represents a
false claim. If a fake website, like that in
Fig. 3, can pass low-level deception
checks, it can bolster the legitimacy of a
claim, thus boosting the authority of a
narrative. Of course, fake news can
contribute to disinformation by simply
being seen and interpreted as legitimate.

Russia's Internet Research Agency
effectively weaponized fake news to
galvanize its readers away from more credible sources during the 2016 U.S. presidential election.
By investing in the bottom-up narrative that the mainstream media was actually "fake news" and
that alternative sources were the only legitimate way to understand the world, Russia was able to
label the media, users, and platforms attempting to correct false information as suppressers of a
real narrative. A similar waning trust in traditional media and political principles allowed the IRA
to engage in the complete fabrication of events of its own design [26].

Prior Exposure
Fake news is also able to spread effectively by preying on a more subtle social vulnerability: prior
exposure. Pennycook, Cannon, and Rand [37] found that simply seeing a statement multiple times
increased readers' likelihood of recalling the statement as accurate later. Even statements that were
officially marked as false were more likely to be recalled as true later. Similarly, blatantly partisan
and implausible headlines, like that of Fig. 3, are more likely to be recalled as true if users are
repeatedly exposed to them online. Just a single prior exposure to fake news of any type was
enough to increase the likelihood of later misidentification.

Prior exposure to fake news creates the "Illusory Truth Effect:" since repetition increases the
cognitive ease with which information is processed, repetition can be incorrectly used to infer
accuracy [37].

The illusory truth problem supports Pennycook's [38] cognitive laziness hypothesis: because
humans "lazy scroll" on social media and passively consume information, repetition is an easy way
to mentally cut corners for processing information. As a result, though, false information that
comes across a user's social media feed repeatedly is more likely to be believed, even if it is
demonstrably false.

Propaganda
Damaging information need not be false at all, though. Propaganda, another form of information,
may be true or false, but consistently pushes a political narrative and discourages other viewpoints
[49]. Traditional propaganda, like that generated by Soviet Union propaganda factories during the
20th century [25], follows the top-down authority model, being created by a state or organization
seeking to influence public opinion. Propaganda on social media, however, may follow the bottom-
up authority model: being generated not by a top-down media organization or state, but dispersed
laterally by average users [26]. Organic, or seemingly organic, propaganda is more effective than
identical, state-generated efforts [46].

[Figure 4]

One of the central benefits of social media is the accessibility of propaganda re-distributors:
retweets, reposts, and tracked interactions may bolster the visibility of a narrative simply because
real users interacted with it. Fig. 4 depicts a piece of top-down propaganda that attempts to utilize
factual information to appeal to a lateral group of re-distributors.

Just like disinformation, propaganda can spread rapidly online when introduced to directly target
a filter bubble [49]. Unlike traditional disinformation, though, propaganda that preys on social
vulnerabilities is not designed for a reader to believe in its premise, but to radicalize doubt in truth
altogether [7].

Bottom-up propaganda can be mimicked inorganically by top-down actors: distributors of
disinformation and propaganda engage in "camouflage" to disseminate seemingly legitimate
content through online circles [26]. To effectively shroud propaganda as being organic, bottom-up
content, distributors must build a visible perception of authority.

Cognitive Hacking
Exploiting social vulnerabilities, weaponizing algorithms, and deploying disinformation and
propaganda all serve the ultimate aim for malicious bots: manipulation. Linvill, Boatwright, Grant,
and Warren [26] suggested that these methods can be used to engage in "cognitive hacking,"
exploiting an audience's predisposed social vulnerabilities. While consumers of social media
content should judge content by checking for cues that may decrease authenticity and credibility
[22], content that both appeals to a preconceived viewpoint and appears authentic is able to bypass
deception checks [30].

The traditional media tactic of agenda setting argues that media coverage influences public
perceptions of issues as salient. In contrast, Linvill and Warren [27] suggest that public agenda
building, the behavioral response to social movements online, can be influenced by disinformation and
propaganda. Mass media altering issue salience in mass audiences (agenda setting) is not as
effective as mass audiences generating issue salience collectively (agenda building).

State-sponsored efforts to influence the public agenda are less effective than naturally generated
public discussion, and as such efforts to alter the public's agenda building are more effective when
they are generated either by citizens themselves or by users that appear to be citizens [27]. To this
end, malign actors utilize bots in social media environments to disseminate their narrative.

Hacking in Practice
Karlova and Fisher [22] argue that governments can exploit social vulnerabilities to disseminate
disinformation and propaganda. Governments and organizations that engage in the manipulation
of information, currency, and political narratives online and on social media have been labeled by
the European Commission as "hybrid threats:" states capable of engaging in both traditional and
so-called "non-linear" warfare waged entirely with information [2].

One such state is Russia, whose modern disinformation campaigns rose from the ashes of the
Soviet Union's extensive propaganda campaigns throughout the 20th century [25].

Case Study: The Internet Research Agency
The centerpiece of the Russian social media disinformation campaigns since at least 2013 has been
the St. Petersburg-based Internet Research Agency [7], a state-sponsored troll factory, a group
dedicated to creating and managing bots [27].

The IRA's reach was deep: measuring IRA-generated tweet data and Facebook advertisement log
data, up to one in 40,000 internet users were exposed to IRA content per day from 2015 to 2017
[21].

The IRA's efforts were first exposed after a massive, multi-year coordinated effort to influence the
outcome of the 2016 United States presidential election [32]. Their goal was not to pursue a
particular definition of truth and policy, but to prevent social media users from being able to trust
authorities or to believe what they were told, and to make truth indistinguishable
from disinformation [7].

To those ends, the IRA utilized bots in a variety of multi-national campaigns to amplify a range of
viewpoints and orientations to decrease coordination in both liberal and conservative camps [27].
Early on, Russian-operated accounts inserted themselves into natural political discourse on
Twitter, Facebook, and Instagram to disseminate sensational, misleading, or even outright false
information [26]. They worked to "delegitimize knowledge" not at the top levels of public media
consumption, but at the ground level of interpersonal communication online.

To create a sense of authenticity and bottom-up authority [30], IRA accounts on Twitter, Facebook,
and Instagram built identities as legitimate citizens and organizations with a spectrum of political
affiliations, from deep partisan bias to no affiliation at all [27]. Many accounts acted in concert,
generating a fluid machine process, which Linvill and Warren [26] liken to a modern propaganda
factory.

To overcome, or perhaps exploit, social media filter bubbles, the IRA generally operated a wide variety of
accounts, including pro-left, pro-right, and seemingly non-partisan news organizations [26].
Increasing authenticity in these political circles meant posting overtly political content and relevant
"camouflage" that signals to a user that the account is, in fact, operated by a legitimate citizen.

Case Study: 2014 Ukrainian Protests
In one of Russia's earliest exertions of bot operations, state actors conducted disinformation
operations in nearby Ukraine beginning in 2014 [30]. Protest movements in response to Russian-
sympathetic Ukrainian president Viktor Yanukovych flourished on social media, but both Russian
and Ukrainian authorities were able to disrupt protest movements by inserting disinformation into
the social media platforms on which protestors were actively planning.

The Ukrainian social media protests occurred just a few years after the Arab Spring protests in
which Twitter and Facebook played vital roles in organization, dissemination of information, and
free expression. The speed at which information was passed regarding the Ukrainian protests
online heightened insecurity at the upper levels of the Russian and Ukrainian governments [30].

Russian and Ukrainian state actors posing as protestors inserted disinformation into social circles.
A 2013 photo of a Syrian war victim was used to show how Ukrainian soldiers had attacked a
young boy in Ukraine [30]. Screenshots from the notoriously violent Belarusian film "The Brest
Fortress" were used to show a little girl crying over the body of her mother. While both images
were demonstrably false, the content was still used both to dissuade protestors from publicly
joining the effort and to destabilize Ukrainian citizen coordination. Because it came from outwardly
"regular" or legitimate sources and thus carried both authenticity and bottom-up authority, the false
content carried inherently high credibility with protestors [30].

Ukraine was one of Russia's earliest forays into social media manipulation, and a new step away
from direct intimidation of opponents [30]. Its experiments in limited online social circles showed
state actors that citizens can actively participate in the creation and dissemination of disinformation
and propaganda [30] without the state having to exert a previously overt role [25].

Case Study: 2016 U.S. Presidential Election
The genesis of the national conversation of bot campaigns was the 2016 U.S. presidential election,
where a coordinated effort by the IRA first sought to destabilize the U.S. political system and erode
trust in the political process [32].

The IRA's goal in the election was not to directly support a Trump presidency, but to sow discord,
foster antagonism, spread distrust in authorities, and amplify extremist viewpoints [27]. Their
methods, though, seemed to favor a Trump presidency overall [26].

On May 17, 2017, former FBI director Robert Mueller was appointed to head a special counsel
investigation into Russian interference in the election. While much of the 448-page report, which
was released on April 18, 2019, remains redacted, publicly available information in the report
details the IRA's use of advertisements, Facebook groups, and Twitter trolls to spread
disinformation [32].

In sum, the IRA employed nearly 3,900 false Twitter accounts to produce, amplify, and insert
disinformation, propaganda, fake news, and divisive content into preexisting American social
media circles [26]. The organization utilized a combination of bot techniques and preyed on several
social vulnerabilities to sow discord on Twitter, operating accounts with explicitly partisan
leanings ("@TEN_GOP") and accounts with bottom-up authenticity ("@Pamela_Moore13,"
"@jenn_abrams") [32]. IRA Twitter content was viewed and interacted with over 414 million
times on the platform between 2015 and 2017 [14].

On Facebook, the IRA focused on acquiring the support of existing American users with partisan
groups ("Being Patriotic," "Secured Borders," "Tea Party News") and social justice groups
("Blacktivist," "Black Matters," "United Muslims of America") [32]. The IRA also purchased
approximately $100,000 worth of algorithmically targeted Facebook advertisements promoting
IRA-operated groups and pro-Trump, anti-Clinton messaging. IRA Facebook content reached a
total of 126 million American users [14].

Hall [19] found that American citizens online were unable to differentiate between false 2016
election content and real content on either platform. Even if users were able to distinguish between
the two, false information and propaganda often still affected user opinion.

Case Study: 2017 French Presidential Election
Just five months after its successful influence campaign in the U.S., the IRA set its sights on the
French presidential election. Evidence collected by Ferrara [12] suggested that troll factories
attributed to the Kremlin assembled a coordinated campaign against centrist candidate Emmanuel
Macron and his party, "En Marche," in France's 2017 presidential elections. 4chan.org, a mostly
American far-right messaging board, fostered the initial discussion of leaking or even
manufacturing documents to incriminate the Macron campaign. On May 5, 2017, just two days
before the presidential election, a coordinated doxing effort released "MacronLeaks" to the public
via well-known leak aggregator WikiLeaks, mirroring a similar Russian effort against the DNC in
the United States the previous year [11].

Russian disinformation groups seized on the initial MacronLeaks dump, coordinating another
bottom-up assault via social media [12]. They dispatched bots which largely amplified existing
narratives, re-tweeting tweets pertaining to the dump or sending tweets which included
"#MacronLeaks," "#MacronGate," or just "#Macron." The bots also generated tweets aimed at
Marine Le Pen, a far-right candidate and Macron's electoral opponent.

Beginning on April 30, 2017, the bot campaign ramped up quickly, generating 300 tweets per
minute at its peak; it continued through Election Day, May 7, and faded out by the following
day. When the MacronLeaks dump occurred on May 5, bot-generated content regarding Macron
and the election began to increase. At peak, bot-generated content competed with or matched
human-generated posts, suggesting that the MacronLeaks campaign generated substantial public
attention. Additionally, increases in bot traffic and content on Twitter tended to slightly precede
corresponding increases in human content, suggesting that bots were able to "cognitively hack"
human conversations and generate new discussion topics, particularly regarding controversial
issues [12].

The bot campaign in France, though, has been largely regarded as a failure: the program, despite
generating substantial online discussion, had little success at swaying French voters [12]. It has
been suggested that because the majority of bot-generated tweets were in English, not French, the
language selection both limited native French participation and engaged the American anti-
Macron user base, contributing to the volume of tweets about the MacronLeaks dump.

Case Study: 2020 U.S. Presidential Election
After the successful execution of the bot operation during the 2016 Election, Russia turned its
sights on the 2020 election, which it again hoped to mold in favor of its foreign policy interests.
The 2018 Mueller Investigation, though, had thoroughly exposed the IRA's playbook, and social
media platforms had already begun reacting to pervasive misinformation with more stringent
social media restrictions [53], necessitating a new strategy for influence. Reorganizing under the
less stigmatic moniker "Lakhta Internet Research" (LIR), which first operated during the 2018
U.S. midterm elections, the IRA focused its mission fully on amplifying existing domestic
narratives and issues [33].

In April 2020, a cooperative investigation between CNN and Clemson University behavioral
science researchers Darren Linvill and Patrick Warren uncovered one of the LIR's new tactics:
outsourcing [52]. A pair of proxy troll farms, located in Ghana and Nigeria, produced content
specifically targeting Black Americans, working to inflame racial tensions and heighten awareness
of police brutality. In just eight months, LIR-owned accounts operated out of these troll factories
garnered almost 345,000 followers across Twitter, Instagram, and Facebook.

The exposure of the Ghanaian and Nigerian proxy troll factories did not deter Moscow or the
LIR. Just four months later, on August 7, 2020, National Counterintelligence and Security Center
director William Evanina released a statement outlining the Intelligence Community's confidence
in a sustained, pervasive social media influence operation to sway voters and increase social
discord [10]. Simultaneously, a Carnegie Mellon University study found that 82% of the most
influential accounts on social media distributing information about the COVID-19 pandemic were
bots [58].

But on November 2, 2020, just one day before Election Day, researchers at the Foreign Policy
Research Institute (FPRI) noted that it was state news channels and official state sources, not
swarms of bots and trolls, that were pushing divisive and anti-democratic narratives about the
election, particularly utilizing phrases like "rigged" and "civil war" [3]. A separate analysis by the
FPRI on the same day came to the same conclusion: state-sponsored networks, personalities, and
figures seemed to be doing the heavy lifting of disinformation distribution, especially on the issue
of mail-in voting, which had received a similar stamp of "fraudulence" from state sources [51].
After the election, it appeared Moscow's efforts to bolster President Trump's reelection had failed.

In a post-election autopsy report on election interference, the National Intelligence Council (NIC)
dissected intelligence on the LIR's tactics, successful or otherwise, during the election cycle [33].
Rather than focus on generating bottom-up authority with an army of bots online, the Kremlin
seemingly took an approach of top-down authority instead, using existing social media personas,
unreliable news websites, and visible U.S. figures and politicians to deliver its divisive and
sometimes outright false messaging. Its comparatively minor bot campaign, centered around the
LIR, served mostly to amplify U.S. media coverage of the messaging it had pushed through its
other mediums, and tended to ride the coattails of established personas online, including, the report
hints, Rudy Giuliani, President Trump's personal lawyer, and President Trump himself.

The report also noted, though, that in the month leading up to the election, Moscow opted to
shift its tactics again, beginning to discredit an incoming Biden administration and the results of
the election rather than continue to support what it viewed as a shrinking possibility for a Trump
reelection. After the election, it doubled down on the "rigging" narrative, raising questions about
the validity of Biden's election, exclusively to foster further distrust of the system [33].

Watts [53] argues that the efficacy of bots, trolls, and foreign influence, though, paled in
comparison to the wide-reaching success of domestic disinformation. Top-down authority was
more effective in the 2020 U.S. presidential election, Watts argues, because one of Moscow's most
effective amplifiers of disinformation was already in the White House. Indeed, the NIC found that
the operation largely rested on existing, popular figures to disseminate the narratives Moscow had
chosen [33]. Meanwhile, filter bubbles, echo chambers, group polarization, and preexisting
domestic attitudes culminated in the violent January 6, 2021 raid of the U.S. Capitol by Trump
supporters who cited the very claims of fraudulence that the Kremlin's disinformation campaign
had produced.

If Russia's efforts in the 2016 U.S. presidential election laid the groundwork for the social diffusion
of discord, the 2020 election was a successful trial run. Despite its candidate's loss in the election,
Moscow's influence operation was able to convince a significant portion of the U.S. population
that their election system was illegitimate and broken enough to warrant an insurrection against it
[53]. This succinctly meets one of Moscow's long-term foreign policy goals of destabilizing the
Western-led Liberal International Order.

Bot Tactics
In these cases, malicious bots were successful because they effectively bred bottom-up authority
and consensus on the narratives they delivered. To exploit social vulnerabilities and "hack" users
in these manners, though, bots must utilize a number of tactics to create their authentic personas
and bolster their authenticity and thus authority.

Bypassing Controls
In order to operate a successful network of bots, the accounts must first be created and designed.
When a traditional user creates their account, they are prompted to enter their name, email, phone
number, and date of birth. They must also confirm their legitimacy by recalling a numeric code
sent to their email address or phone number. Occasionally, users may also be asked to solve a
CAPTCHA, ("Completely Automated Public Turing test to tell Computers and Humans Apart"),
a test designed to differentiate artificial and human users which often involves mouse movement
tracking or picture recognition.

Platforms like Twitter have gradually increased the difficulty of creating new accounts to dissuade
troll factory creators from producing hordes of accounts en masse. Similarly, Google, which offers
simple email account creation tools, increasingly requires cross-authentication to create new
accounts. Additionally, both platforms may track the number of new creations on a given IP
address, blocking the address after a certain number of account creations in a given period.
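
A minimal Python sketch of the kind of per-IP creation throttling described above follows; the window length and creation limit are assumptions, not any platform's published policy.

import time
from collections import defaultdict, deque

class CreationThrottle:
    """Track account creations per IP address inside a sliding time window."""

    def __init__(self, max_creations=3, window_seconds=86400):
        self.max_creations = max_creations
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)   # ip -> timestamps of recent creations

    def allow(self, ip):
        """Return True if this IP may create another account right now."""
        now = time.time()
        recent = self.history[ip]
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()                # drop creations outside the window
        if len(recent) >= self.max_creations:
            return False                    # block further creations from this IP
        recent.append(now)
        return True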

On Twitter, developer accounts with tightly controlled Application Programming Interface (API)
calls are required to own and control bots. Rates for connection to the API, both for reading and
writing, are limited to prevent spamming and massive data collection.
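
For contrast, sanctioned data collection through the developer API respects these limits. The sketch below assumes the third-party Tweepy library (v4), placeholder credentials, and a placeholder account name; it simply pulls account data and recent tweets while waiting out rate-limit windows.

import tweepy

auth = tweepy.OAuth1UserHandler(
    "<consumer_key>", "<consumer_secret>",
    "<access_token>", "<access_token_secret>",
)
# wait_on_rate_limit makes the client sleep until the request window resets
# instead of exceeding the documented read limits.
api = tweepy.API(auth, wait_on_rate_limit=True)

user = api.get_user(screen_name="example_account")         # account-level data
tweets = api.user_timeline(screen_name="example_account",  # recent activity
                           count=200)
print(user.followers_count, len(tweets))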

A single bot or small group of accounts, though, is largely insufficient to produce the desired
illusion of bottom-up authority required to conduct an effective campaign of influence. As
restrictions become more stringent, the cybersecurity arms race has encouraged illicit troll factories
to craft new tools to work around these restrictions and maximize their output for the creation of a
bot army.

Rather than apply for a Twitter developer account and work under the confines of rate limits,
HTML parsers in programming languages like Python allow troll factories to bypass the API
entirely by reading, interpreting, and hijacking the Twitter Web App interface, which is not rate
limited. As a result, many bots may display a disproportionate usage of the Twitter Web App.

Similarly, programs and scripts designed to bypass proxy limiting, account verification, and
CAPTCHA checks are freely available and accessible online. Tools like the PVA Account Creator
combine these tactics into one functioning application, allowing users to farm accounts with little
resistance or repudiation from Twitter or Google.

Obfuscation
Once an account is created and Twitter's restrictions have been bypassed, the more difficult task
of maintaining secrecy and effective operation begins. If a bot wants to remain online and avoid
detection by real users employing deception checks, it must engage in a number of obfuscation
tactics to increase its authority and blend in with real users.

The most glaring flaw of the bottom-up propaganda model, of which bots are a facet, is that the
creation of individual authenticity required to support bottom-up propaganda is much more