2021 IEEE Symposium on Security and Privacy Workshops

Alexa in Phishingland: Empirical Assessment of Susceptibility to Phishing Pretexting in Voice Assistant Environments

Filipo Sharevski, School of Computing, DePaul University, Chicago, IL (fsharevs@cdm.depaul.edu)
Peter Jachim, School of Computing, DePaul University, Chicago, IL (pjacim@depaul.edu)

Abstract—This paper investigates what cues people use to spot a phishing email when the email is spoken back to them by the Alexa voice assistant instead of read on a screen. We configured Alexa to read these emails to a sample of 52 participants and asked for their phishing evaluations. We also asked a control group of another 52 participants to evaluate the same emails on a regular screen to compare the plausibility of phishing pretexting in voice assistant environments. The results suggest that Alexa can be used for pretexting users who lack phishing awareness to receive and act upon a relatively urgent email from an authoritative sender. Inspecting the sender ("authority cue") and relying on their personal experiences helped participants with higher phishing awareness to use Alexa for a preliminary email screening to flag an email as potentially "phishing."

Index Terms—Voice assistant security, IoT security, phishing susceptibility, Amazon Alexa

1. Introduction

Over the years we have come to condition ourselves to spot a phishy URL or an overtly persuasive email narrative. The act of spotting phishing is mainly based on visual inspection of the email through a computer or smartphone interface. Users recently got an alternative interface for email inspection in voice assistants like Amazon Alexa or Google Home [1]. Content in the voice assistant ecosystem is usually "spoken back" to users, and as such, users rely mostly on audio inspection. The voice/audio interface entails different patterns of interaction, which could affect the way users inspect suspicious content like email. This difference in email inspection opens an opportunity, we suspect, for adversaries to target users' susceptibility to phishing emails.

Spoken back emails, even if phishing, are mostly harmless for users: they cannot click on the phishing links or download any damaging attachments. This type of interaction is different from the one where a third-party application for Alexa tries to lure the user into spelling their password directly to the voice assistant [2], a phishing variant called voice phishing or "vishing". Instead of crafting a third-party skill to try to retrieve the user's password, an adversary can send a preliminary email with "pretexting" content, e.g. assume the guise of an "authoritative" figure and create a believable scenario that elicits a user to expect a follow-up email and perform some action with a URL included in that email. The adversary can, in addition, ask a user of the native Alexa email skill to reply in some form (e.g. reply "yes"). If they do, they could be a potential target for phishing emails coming from a particular authoritative sender [3]. If they ignore it or delete the email, the adversary can use another strategy, e.g. "urgency" or "reciprocity" [4]. In this way, the adversary can exploit the trust users have in Amazon Alexa and its native email function to perform "pretexting" without risking user suspicion in third-party skills [5] or being detected by tools like SkillExplorer [6].

In this paper we investigated the preconditions for an adversary to utilize the Alexa native email function for "pretexting" toward phishing, that is, to find out the susceptibility of users to phishing when such an email is spoken back to them. Alexa delivered three separate emails to a sample of 52 participants who were asked to decide whether each of the emails is phishing or not and to elaborate on the cues they based that decision on. To compare the users' susceptibility to spoken back emails versus the susceptibility to visually inspected emails, we exposed a control group of another 52 participants to the same email stimuli and collected their feedback. We measured all 104 participants' "phishing susceptibility" using the SeBIS scale [7] and used the scores together with their responses to determine the effectiveness of various "pretexting" and phishing tactics. This work provides the first empirical assessment of susceptibility to phishing attacks when users listen to an email rather than look at it.

2. Susceptibility to Phishing

2.1. Cognitive Vulnerabilities

The goal of a phishing campaign is to elicit a decision from a vulnerable user that under normal circumstances they otherwise wouldn't make. Exploiting one's cognitive vulnerabilities through phishing rests on three fundamental principles: epistemic asymmetry, technocratic dominance, and teleological replacement [8]. Epistemic asymmetry occurs when an adversary employs a persuasion strategy to manipulate the heuristics a user employs to decide how to act on an email [9], [10]. These heuristics are influenced by the following persuasion strategies [4]: authority, commitment, liking, reciprocation, scarcity, and social proof. These strategies have a varying effect on users, and studies have shown that users are significantly

© 2021, Filipo Sharevski. Under license to IEEE.
DOI 10.1109/SPW53761.2021.00034
more susceptible to emails using the authority and scarcity principles [3], [11], [12].

Technocratic dominance occurs when an adversary mimics the cues a user employs to assess the authenticity of an email, e.g. spoofing "from" addresses or including deceptive images/logos/banners, deceptive links, "https," and correct spelling and grammar [13], [14]. Studies have found that the use of convincing logos and letterheads makes it significantly harder to detect a phish for an average target [15]. Scarcity of time, or urgency, was found to short-circuit the resources available for assessing the technical cues to detect phishing [16]. Teleological replacement occurs when the adversary manages to exploit users to act against their better judgment [16], [17]. For example, when people are stressed or under pressure, overloaded with information, or heavily focused on a primary task, their ability to notice suspicious emails is reduced [18]. Even if suspicious emails are noticed, people may not feel they have sufficient time, resources, or means to further process any persuasion or technical cues. An attacker can either "pretext" a target with overloading information or utilize outside information to infer their cognitive overload and choose a particular moment to send a phishing email. The phishing incident with John Podesta, Hillary Clinton's campaign manager, provides evidence for such a strategy that ultimately altered his behaviour and resulted in yielding his Gmail login credentials [19].

2.2. Phishing Preparation: Pretexting

Pretexting is a commonly-used technique in social engineering attacks in general and phishing in particular. In pretexting, the adversary uses a pre-designed scenario to legitimize their interactions with potential victims, reduce their suspicions, and eventually mislead them into clicking on a phishing URL or downloading a malicious attachment [20]. Studies have found that phishing attacks coupled with pretexting are more likely to victimize message recipients and result in teleological replacement with a higher success rate [21], [22], [16]. Usually the pretexting scenario contextualizes the phishing principles of persuasion in a context relevant for the phishing victim, for example, crafting emails for a customer survey employing the principle of reciprocation for, say, Verizon and T-Mobile customers, to resemble the usual communication patterns each of the cellphone carriers has with their customers. However, little has been done in exploring the pretexting potential of alternative interfaces and communication channels with the potential victim. In this paper, we introduce a novel type of epistemic asymmetry that utilizes voice assistants like Alexa as a form of technocratic dominance over target users that helps an adversary achieve teleological replacement through "pretexting".

3. Phishing Pretexting with Alexa

An email skill on the voice assistant allows users to ask Alexa to read, flag, respond to, delete, or search with a keyword for a particular email in their inbox on their behalf. Some of these emails could be a preliminary phishing with "pretexting" content, e.g. assuming the guise of an "authoritative" figure. The purpose of this email, sent by a phishing attacker, is to condition the target user, i.e. create a believable scenario that elicits the target user to expect a follow-up email and perform some action (e.g. follow a URL). This email also asks the target user to reply in some form (e.g. reply "yes") to "confirm" that they have received this preliminary notification from the authoritative sender. Any action like confirmation with "yes" gives an indication that the target user might be susceptible to phishing emails coming from an authoritative sender [3].

For example, an adversary can send a preliminary email that appears to be from a user's IT department asking the user to reply "yes" if they don't want to opt out of a large mail server migration. If the target user replies "yes," the adversary can follow up with an actual phishing email and ask the user to confirm their opt-out on "the following link," where the link is actually a fraudulent URL. Certainly, there is no guarantee that the user will actually click the link, or recall they agreed on such an action when conversing with Alexa and succumb to the reciprocity principle. Nevertheless, the adversary can take the "pretexting" further and add a deadline for the reply before the deactivation takes place automatically. If the target user in this case replies "yes," the adversary gets the indication that "urgency" might work and rushes to send the actual phishing email to the user.

Even if all of this is futile and doesn't work for the adversary, it might be beneficial for the user. Users, many of whom have been trained to spot or have experienced phishing emails in the past, could be "incentivized" by Alexa to scrutinize any email in more detail on a screen and neutralize the phishing effort. Another reason for suspicion of such a pretexting email is a type of phone scam called "Can you hear me?" that lures victims to respond with "yes" [23]. The goal of the scam, in a similar way as the Alexa vishing attack [2], is to obtain an approval in the form of a voice signature that can later be used by the adversaries to pretend to be the victim and authorize fraudulent activities. Any immediate action such as giving confirmation "out of the blue" could potentially trigger a phishing flag for users aware of phishing and vishing. In other words, Alexa could in and of itself help users eliminate any epistemic asymmetry, bring the technocratic dominance to the side of the users, and deny any effort for teleological replacement through "pretexting" by the phishing attackers.

4. Alexa in Phishingland

Our study seeks to understand how the change in interface, from visual to voice/audio interaction, affects the inspection of emails by end users. Our objective was to see if an adversary could infer a phishing susceptibility strategy for a user by sending several "pretext" emails through Amazon Alexa and observing the user's behaviour. We set out to answer the following research questions:

Research Question 1a: What action does one take to respond to an email spoken back by Alexa?

Research Question 1b: If one's action is to ask Alexa to reply on their behalf to an email, what are the most effective influencing strategies employed in the email formatting that result in a successful "pretexting"?

Research Question 2a: What cues (audio, experience, prior phishing knowledge) does one use to inspect an email spoken back by Alexa and decide whether the email is phishing or not?

Research Question 2b: What cues (audio, experience, prior phishing knowledge) does one use to decide to ask Alexa to reply on their behalf to a spoken back email?

Research Question 3a: Is there a difference in susceptibility to pretexting and, consequently, phishing between emails spoken back by Alexa and emails visually inspected on a screen?

Research Question 3b: How do the cues one uses to inspect an email differ between an email spoken back by Alexa and a visually inspected email?

With IRB approval we recruited participants affiliated with a university in the US. The inclusion criteria required them to be at least 18 years old and to have interacted with a voice assistant. A sample of 104 participants agreed to be in the study. Half of them were randomly assigned to interact with Alexa and the other half were given emails for visual inspection (a control group). The emails relate to activities usually communicated via email, e.g. university communication, a phone carrier message (Verizon), and a payment provider notification (Venmo). The emails were adapted from previous phishing susceptibility studies [24], [25]. We modified the email text to include only the option of replying "yes" instead of the URL for the Alexa group, corresponding to the "pretexting" strategy we elaborated in Section 3. Each participant in the Alexa group prompted Alexa with "Alexa, read my emails from today." The participants in the control group were simply shown the emails on a screen. After each email, we used the phishing susceptibility questionnaire from [24], [26], shown in Table 1. At the end, the participants completed the SeBIS questionnaire to measure their phishing awareness [7]. The SeBIS scale measures a user's self-reported intent to comply with "good" security practices such as paying attention to contextual phishing cues.

TABLE 1. Phishing Susceptibility Questionnaire

User Response: What action will you take as the next step?
1. Ask Alexa to reply "yes" to this email ("Respond to this email by following the link" for the control group)
2. Flag and check the email on a computer or smartphone screen later (only the Alexa group)
3. Reach out to the sender separately
4. Ignore the email
5. Delete the email
Authority: Do you recognize the sender? (Yes/No)
Scarcity: Did the email convey a sense of urgency? (Yes/No)
Personal Experience: Have you ever received an email like this from this or similar senders? (Yes/No)
Detection: Does the email appear to be legitimate? (Yes/No/Maybe)
Inspection: What cues did you use to make your determination? (Open ended)

5. Results

5.1. Research Question 1

5.1.1. A: Responding to a Spoken Back Email. Table 2 shows the distribution of responses chosen for each of the three emails. Roughly 9% of the participants in the Alexa group indicated they would ask Alexa to reply "yes" on their behalf. This result is in line with the percentage of people that fall for phishing emails (click rate) [27], given that Alexa group participants scored high on the SeBIS scale on average (M = 3.7147, SD = .658). Inspecting the spoken back email prompted most of the participants to flag the emails and check on the screen, or to simply ignore the email. This is expected, given that users have yet to grow accustomed to utilizing Alexa in full assisting capacity [28]. The first email, employing both the principles of authority and scarcity, prompted 5.8% of the participants to reach out and call the "Information Security" department. This result suggests that a combination of the persuasion principles elicits a response from the user (potentially contrary to the adversary's objectives). Roughly half of the participants indicated that they would delete the second email, probably numbed by receiving too many email surveys.

TABLE 2. User Responses to Spoken Back Emails

                                 Email Stimuli
Response                     First    Second   Third
Alexa, reply "yes"           9.6%     8.7%     8.7%
Flag and check on a screen   34.7%    53.8%    68.3%
Reach out (call)             5.8%     1%       1%
Ignore the email             15.4%    27.9%    18.3%
Delete the email             1.9%     46.2%    3.8%

5.1.2. B: Pretexting and Influencing Strategies. Table 3 gives the distribution of the "pretexting" effectiveness in relation to the influencing strategies for the portion of participants susceptible to phishing pretexting, i.e. those who replied "yes" to the email stimuli. The scarcity and authority principles worked well for the first email, but the anchoring in personal experience worked half the time for these participants. This finding confirms the previous evidence that authority and scarcity are indeed effective persuasion strategies [3] when it comes to phishing. As for personal experience, it is expected that participants did not necessarily receive, or perhaps did not pay attention to, university administrative emails in the past. For the second email, authority and personal experience were the most successful in the pretexting, expectedly, because the email was from a known sender. The scarcity principle worked roughly half the time. Scarcity didn't work at all for the third email; the authority of the sender and the personal experience of the participants are what worked well. Using a payment service like Venmo as a disguise works as an effective adaptation of the old phishing emails from banks (Wachovia, Bank of America) or PayPal to the younger generation (the average age of the participants in the sample was 28 years old) [26].
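The SeBIS figures reported throughout these results (e.g., M = 3.7147, SD = .658) are plain averages over participants' Likert-style responses. A minimal scoring sketch with hypothetical responses coded 1-5 (not the study's data, and a simplification of the full SeBIS instrument):

```python
from statistics import mean, stdev

# Each participant's score is the mean of their Likert item responses (1-5).
# These responses are HYPOTHETICAL, purely to illustrate the scoring.
participant_scores = [
    mean([4, 5, 3, 4]),  # participant 1
    mean([3, 4, 4, 3]),  # participant 2
    mean([5, 4, 4, 5]),  # participant 3
]

# Group-level statistics, reported as M and SD in the paper's style
group_mean = mean(participant_scores)
group_sd = stdev(participant_scores)  # sample SD, as typically reported

print(round(group_mean, 2), round(group_sd, 2))  # prints: 4.0 0.5
```

Higher averages correspond to stronger self-reported intent to follow good security practices, which is how the comparisons between "correct" and "incorrect" deciders above should be read.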

TABLE 3. "Pretexting" and Influencing Strategies

                          Email Stimuli
Pretexting            First    Second   Third
Scarcity (Urgency)    100%     57.1%    0%
Authority             80%      100%     100%
Personal Experience   50%      100%     100%

5.2. Research Question 2

5.2.1. A: Audio Inspection Cues. The decisions for each of the spoken back emails are given in Table 4. For the first email, participants are roughly equally divided between the email being phishing, not phishing, or potentially phishing. The participants who incorrectly decided the email is not phishing scored the lowest on the phishing awareness scale (SeBIS=3.5911), while the undecided scored the highest on average (SeBIS=3.8526). Those that said the email was phishing scored SeBIS=3.75. When asked what cues they used to inspect the first email spoken back by Alexa, the participants that incorrectly identified the email as not phishing predominantly referred to the authority principle of influence, saying that the "source is trustworthy," "the sender is valid," and the email allowed them to "safely ignore it." The participants who correctly identified the email as phishing equally referred to the scarcity principle of influence (saying "the email seems urgent," "it prompted me to reply immediately") and personal experience (asking "why would I or someone else deactivate a spam filter"). The participants that were undecided questioned the authority of the sender ("seems like it is from our university, but I don't know that department") and the urgency of the email ("it said I can respond right here right now"), and also relied on their personal experience ("I am not sure who would've initiated this email").

TABLE 4. Detection: Spoken Back Emails

                       Email Stimuli
Detection         First    Second   Third
Not Phishing      30.7%    13.5%    36.5%
Phishing          30.7%    63.5%    57.7%
Maybe Phishing    38.6%    23%      5.8%

For the second email, more than half of the participants incorrectly identified it as phishing (63.5%), with 13.5% deciding it was not phishing and 23% potentially phishing. The participants who decided the email is not phishing scored the lowest on the phishing awareness scale on average, SeBIS=3.5667. Those participants who said it was phishing or were not entirely sure scored higher on average: SeBIS=4.007 and SeBIS=4.006, respectively. When asked what cues they used to inspect the second email spoken back by Alexa, the participants that correctly identified that the email was not phishing predominantly referred to the authority principle of influence, saying that it "sounds like something Verizon will ask." The participants who incorrectly identified the email as phishing referred to the reciprocity principle of influence (saying "Verizon usually gives incentives," "there is no indication what I will get in return"). The participants that were undecided questioned the authority of the sender ("someone could easily fake Verizon's email") and the urgency of the email ("it gives a week to respond"), and relied on their personal experience ("usually a Verizon survey email is signed by a representative").

For the third email, slightly more than half of the participants correctly identified it as phishing (57.7%), with 36.5% deciding it was not phishing and 5.8% potentially phishing. The participants who decided the email is phishing scored the highest on the phishing awareness scale (SeBIS=3.92). Those participants who said it was not phishing or were not entirely sure scored lower on average: SeBIS=3.7 and SeBIS=3.2, respectively. When asked what cues they used to inspect the third email spoken back by Alexa, the participants that incorrectly identified the email as not phishing referred to the authority principle of influence ("The email is from Venmo," "Venmo is a reputable company that doesn't send scam emails") and personal experience ("I have received similar emails from Venmo," "Venmo sends me these all the time"). The participants who correctly identified the email as phishing predominantly referred to their personal experience, saying that "Venmo states the amount of the transaction directly in the email" and that "Venmo sends notifications in the app." Personal experience was also the main factor for the participants that were undecided, saying that the email "didn't sound the same as the other emails from Venmo" or "they don't ask me to reply to get a receipt, the receipt is there in the email."

5.2.2. B: Pretexting Cues. Table 5 gives the breakdown of the phishing decisions made by the participants who asked Alexa to reply "yes" on their behalf. The authority and scarcity in the first email were sufficient to persuade 70% of this group of participants to incorrectly decide the email was not phishing. These participants scored the lowest on the phishing awareness scale: SeBIS=3.77. The pretexting was suspicious to 20% of them, and only 10% of them correctly decided that someone was trying to condition them to take an action. The correct and the suspecting participants scored the highest on the phishing awareness scale: SeBIS=4.21 and SeBIS=4.20, respectively. The authority and personal experience helped 85.7% of the participants to correctly identify the second email as not phishing, scoring the highest on the phishing awareness scale (SeBIS=4.77). Only 14.3% of the participants suspected the email is phishing, scoring the lowest on the phishing awareness scale (SeBIS=3.00). The authority and personal experience were persuasive enough for 75% of these participants to incorrectly identify the third email as not phishing. Their scores were the lowest on the phishing awareness scale (SeBIS=3.00). Equal shares of 12.5% of the participants suspected or decided correctly that the email is phishing, and their scores were the highest (SeBIS=4.2 on average).

TABLE 5. Pretexting and Email Detection

                       Email Stimuli
Pretexting        First    Second   Third
Not Phishing      70%      85.7%    75%
Phishing          10%      0%       12.5%
Maybe Phishing    20%      14.3%    12.5%

5.3. Research Question 3

5.3.1. A: Differences in Susceptibility. To see if there is a difference in susceptibility to pretexting between emails
spoken back by Alexa and emails visually inspected on a screen, we performed several Chi-square tests between the control and the Alexa group. Table 6 shows the results of a 2 x 4 Chi-square test comparing the user responses for each of the email stimuli used in the study, respectively. We observed a statistically significant association between the mode of email inspection (spoken back vs. regular) and the user response for the first email, χ2(3) = 16.867, p = 0.002, and for the third email, χ2(3) = 20, p = 0.001. Both of these emails were phishing emails, and in both cases the participants in the Alexa group appear more susceptible to the pretexting than the control group participants were to the phishing. This result provides additional evidence that pretexting in voice assistant environments is a viable strategy.

TABLE 6. USER RESPONSE: SPOKEN BACK VS. REGULAR EMAILS

                  User Response [Email 1]
          Yes/Click   Call   Ignore   Delete   Total
Control   3           6      25       18       52
Alexa     16          10     20       6        52
Total     19          16     45       24       104
                  User Response [Email 2]
Control   10          4      29       9        52
Alexa     11          1      29       11       52
Total     21          5      58       20       104
                  User Response [Email 3]
Control   1           2      23       26       52
Alexa     14          4      24       10       52
Total     15          6      47       36       104

Table 7 shows the results of a 2 x 3 Chi-square test comparing the detection decision for each of the email stimuli used in the study, respectively. We observed a statistically significant association between the mode of email inspection (spoken back vs. regular) and the phishing detection success for all three emails: 1) χ2(2) = 18.682, p = 0.001; 2) χ2(2) = 6.504, p = 0.039; and 3) χ2(2) = 30.239, p = 0.001. For the first email, there are more than twice as many correct detections that the email is phishing in the control group compared to the Alexa group. For the second email, there are noticeably more correct detections that the email is not phishing in the Alexa group. For the third email, there are nearly twice as many correct detections that the email is phishing in the control group compared to the Alexa group. There are several implications of these results, namely, that 1) an adversary could pretext a user to believe a phishing email is not phishing when the email is spoken back; and 2) voice assistant users are much less likely to flag a legitimate email as phishing when it is spoken back.

TABLE 7. DETECTION: SPOKEN BACK VS. REGULAR EMAILS

                  Detection [Email 1]
          Phishing   Not Phishing   Maybe   Total
Control   35         2              15      52
Alexa     16         16             20      52
Total     51         18             35      104
                  Detection [Email 2]
Control   12         20             20      52
Alexa     7          33             12      52
Total     19         53             32      104
                  Detection [Email 3]
Control   37         4              11      52
Alexa     19         30             3       52
Total     56         34             14      104

5.3.2. A: Differences in Cues for Inspection. Table 8 shows the difference in relying on personal experience for making a decision about phishing between the spoken back and the regular emails. We observed a statistically significant association between the mode of email inspection (spoken back vs. regular) and the personal experience cue for the second and the third email: 2) χ2(1) = 7.80, p = 0.005; and 3) χ2(1) = 28.719, p = 0.001. While for the first email there was no difference between the groups, the Alexa group participants reported higher exposure to business communication emails. The participants in the control group expressed a somewhat different personal experience with customer satisfaction surveys and payment receipts. For the second email they indicated that the "greeting is vague" and that "customer surveys are usually done via phone at end of service call, no need for an email." For the third email, they noticed that "e-mails like this would provide the receipt details regardless." This outcome might be a result of the Alexa group participants' reliance on the availability heuristic when deciding on a spoken back email: hearing a keyword like "customer satisfaction survey" or "payment receipt" might have prompted them to relate any experience with email surveys and receipts, bypassing the vagueness of the greeting or the need to hear the details of the receipt. This result indicates that the pretexting strategy should rely on contextually relevant experiences for the user, instead of a simple email blast.

TABLE 8. PERSONAL EXPERIENCE: SPOKEN BACK VS. REGULAR EMAILS

          Personal experience [Email 1]
          Yes   No    Total
Control   16    36    52
Alexa     25    27    52
Total     41    63    104
          Personal experience [Email 2]
Control   34    18    52
Alexa     46    6     52
Total     80    24    104
          Personal experience [Email 3]
Control   21    31    52
Alexa     47    5     52
Total     68    36    104

Table 9 shows the difference in the effectiveness of the authority persuasion strategy between the spoken back and the regular emails. We observed a statistically significant association between the mode of email inspection (spoken back vs. regular) and the authority strategy for all three emails: 1) χ2(1) = 7.558, p = 0.006; 2) χ2(1) = 28.556, p = 0.001; and 3) χ2(1) = 9.43, p = 0.002. More of the Alexa group participants recognized the sender in the first email, but fewer in the second and third email. The participants in the control group suspected the authority of the sender in the first email, indicating that "the email doesn't seem to come from a .edu address." They acknowledged, though still suspected, the authority of the second email (Verizon), asking "why the email has an impersonal greeting and not my name." For the third email, the control group participants visually caught the misspelling of the authoritative sender ("Venmou" instead of "Venmo," which, when spoken back in the Alexa group, sounds the same). This outcome might be a result of the contextual relevance of the first email for the Alexa group participants (allegedly coming from the university
information security department), but not the particular businesses in the second and the third email (not all participants are Verizon and Venmo users). As with the personal experiences above, the results indicate that the pretexting strategy needs to be relevant for the user in the context of their personal day-to-day interactions to allow for application of the authority persuasion principle.

TABLE 9. AUTHORITY: SPOKEN BACK VS. REGULAR EMAILS

          Authority [Email 1]
          Yes   No    Total
Control   19    33    52
Alexa     33    19    52
Total     52    52    104
          Authority [Email 2]
Control   36    16    52
Alexa     9     43    52
Total     45    59    104
          Authority [Email 3]
Control   26    26    52
Alexa     11    41    52
Total     37    67    104

Table 10 shows the difference in the effectiveness of the scarcity persuasion strategy between the spoken back and the regular emails. We observed a statistically significant association between the mode of email inspection (spoken back vs. regular) and the scarcity strategy for the second and the third email: 2) χ2(1) = 40.316, p = 0.001; and 3) χ2(1) = 61.905, p = 0.001. While for the first email there was no difference between the groups, the Alexa group participants had the opposite perception of the scarcity, or the urgency, of the emails than the participants in the control group. The participants in the control group recognized the urgency in the formatting of the first ("Sense of urgency, link is suspicious since it's not a .edu") and second email ("The time limit alludes to a bit of urgency"). The inability to contextualize the urgency of the emails in the Alexa group might be a result of the audio interface itself, which doesn't allow for immediate action on the email like the regular computer interface does. The results indicate that scarcity might not be applicable for pretexting in voice assistant environments.

TABLE 10. SCARCITY: SPOKEN BACK VS. REGULAR EMAILS

          Scarcity [Email 1]
          Yes   No    Total
Control   29    23    52
Alexa     32    20    52
Total     61    43    104
          Scarcity [Email 2]
Control   18    34    52
Alexa     49    3     52
Total     67    37    104
          Scarcity [Email 3]
Control   8     44    52
Alexa     48    4     52
Total     56    48    104

6. Discussion and Conclusion

Our analysis shows that the percentage of participants susceptible to "pretexting" through Alexa is non-negligible. Not surprisingly, for the emails that were actually phishing, these participants showed relatively low phishing awareness scores on average. The lack of phishing awareness prevented them from correctly spotting the emails as phishing when they were spoken back by Alexa. Our further analysis suggested that this particular outcome results because the first email successfully employed the authority principle of influence, while authority and personal experience were the most successful in the pretexting for the third email. The comparison between the Alexa and the control group of participants indicated that pretexting in voice assistant environments might be a threat worth exploring, especially when the pretexting strategy relies on contextually relevant experiences and day-to-day interactions. We must note that the results do not serve as an authoritative test but rather as an early exploration of new avenues for phishing pretexting. As such, we urge readers to take the results as indicative of the said threat, which we certainly believe needs further investigation.

A significant portion of the participants utilized Alexa to their advantage as a form of a potential early phishing detection buffer, indicating that they are inclined to flag the emails and check them later on a computer or a smartphone screen. For the first email, 5.8% of the participants indicated they would directly call the department that sent the email to ask what it is about. The majority of them correctly identified the first email as phishing, but half of them failed to make the correct decision for the third email, despite scoring the highest on the phishing awareness scale in both cases. Most of them exhibited "better safe than sorry" behaviour for the second email, suspecting that the email is possibly phishing. A possible explanation for the lapse in judgement for the third email is the lack of urgency in the email formatting, indicating that the absence of known influencing strategies could be useful for "pretexting" through Alexa for users who are highly aware of phishing. Attention needs to be brought to the participants who said that they would ignore the emails altogether. An additional analysis revealed that those who chose to ignore the first and the second email scored the lowest on the phishing awareness scale, but those that ignored the third showed the highest score. We suspect that this might be a result of higher exposure to financial phishing campaigns than to institutional or communications ones. People probably tend to be more vigilant about their bank accounts than their institutional or cellphone service accounts. A portion of the participants decided to delete the phishing emails, suggesting Alexa's potential utilization as a phishing removal assistant.

In general, the alternative audio/voice interaction interface has the potential to be used for pretexting based on the authority principle and, to a lesser extent, the scarcity principle, but the relatively higher number of incorrect decisions and "potentially phishing" decisions suggests that users in our study relied on their personal experience when talking to Alexa about emails. This makes sense from a human-computer interaction perspective, given that users tend to personify Alexa in such interactions as reading personal emails and place a high level of trust in the voice assistant [29].

We utilized a relatively younger-leaning sample in our study, and a more representative population might have a different approach towards handling phishing emails and interacting with Alexa. We didn't measure the
frequency of interaction with voice assistants, but studies suggest that the personification of and interaction with Alexa depend on it [29], [26]. It will be interesting in the future to explore the relationship between one's susceptibility to phishing "pretexting" and their interaction habits with Alexa. Our study utilized three sample email stimuli that were customized to the audio interaction with Alexa. Different emails from different senders, perhaps including actual fraudulent links, might yield different results. Hearing an email that wants you to click on a link could trigger a phishing deception judgment much more frequently than hearing an email that asks you to reply "yes." This is certainly an avenue of research that we will explore in the future to yield a more nuanced understanding of users' phishing behaviour.

References

[1] H. Chung, M. Iorga, J. Voas, and S. Lee, "Alexa, Can I Trust You?" Computer, vol. 50, no. 9, pp. 100-104, 2017.

[2] Security Research Labs, "Smart Spies: Alexa and Google Home expose users to vishing and eavesdropping," https://srlabs.de/bites/smart-spies/, December 2019.

[3] T. Lin, D. E. Capecci, D. M. Ellis, H. A. Rocha, S. Dommaraju, D. S. Oliveira, and N. C. Ebner, "Susceptibility to spear-phishing emails: Effects of internet user demographics and email content," ACM Trans. Comput.-Hum. Interact., vol. 26, no. 5, Jul. 2019. [Online]. Available: https://doi.org/10.1145/3336141

[4] R. B. Cialdini, Influence: The Psychology of Persuasion, rev. ed. New York, NY: Collins, 2007. [Online]. Available: http://cds.cern.ch/record/2010777

[5] D. J. Major, D. Y. Huang, M. Chetty, and N. Feamster, "Alexa, who am I speaking to? Understanding users' ability to identify third-party apps on Amazon Alexa," 2019.

[6] Z. Guo, Z. Lin, P. Li, and K. Chen, "SkillExplorer: Understanding the behavior of skills in large scale," in 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, Aug. 2020, pp. 2649-2666. [Online]. Available: https://www.usenix.org/conference/usenixsecurity20/presentation/guo

[7] S. Egelman, L. F. Cranor, and J. Hong, "You've been warned: An empirical study of the effectiveness of web browser phishing warnings," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI '08. New York, NY, USA: Association for Computing Machinery, 2008, pp. 1065-1074. [Online]. Available: https://doi.org/10.1145/1357054.1357219

[8] H. Joseph, "Social engineering in cybersecurity: The evolution of a concept," Computers & Security, vol. 73, pp. 102-113, 2018.

[9] A. Ferreira, L. Coventry, and G. Lenzini, "Principles of persuasion in social engineering and their use in phishing," in Human Aspects of Information Security, Privacy, and Trust, T. Tryfonas and I. Askoxylakis, Eds. Springer, 2015, pp. 36-47.

[10] A. van der Heijden and L. Allodi, "Cognitive triaging of phishing attacks," in 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 1309-1326. [Online]. Available: https://www.usenix.org/conference/usenixsecurity19/presentation/van-der-heijden

[11] O. Zielinska, A. Welk, C. B. Mayhorn, and E. Murphy-Hill, "The persuasive phish: Examining the social psychological principles hidden in phishing emails," in Proceedings of the Symposium and Bootcamp on the Science of Security, ser. HotSoS '16. New York, NY, USA: Association for Computing Machinery, 2016, p. 126. [Online]. Available: https://doi.org/10.1145/2898375.2898382

[12] G. D. Moody, D. F. Galletta, and B. K. Dunn, "Which phish get caught? An exploratory study of individuals' susceptibility to phishing," European Journal of Information Systems, vol. 26, no. 6, pp. 564-584, 2017.

[13] C. Iuga, J. R. C. Nurse, and A. Erola, "Baiting the hook: Factors impacting susceptibility to phishing attacks," Human-centric Computing and Information Sciences, vol. 6, no. 1, p. 8, 2016.

[14] K. Parsons, M. Butavicius, M. Pattinson, D. Calic, A. McCormac, and C. Jerram, "Do users focus on the correct cues to differentiate between phishing and genuine emails?" 2016.

[15] M. Blythe, H. Petrie, and J. A. Clark, "F for fake: Four studies on how we fall for phish," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI '11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 3469-3478. [Online]. Available: https://doi.org/10.1145/1978942.1979459

[16] A. Vishwanath, T. Herath, R. Chen, J. Wang, and H. R. Rao, "Why do people get phished? Testing individual differences in phishing vulnerability within an integrated, information processing model," Decision Support Systems, vol. 51, no. 3, pp. 576-586, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S016792361100090X

[17] J. S. Downs, M. B. Holbrook, and L. F. Cranor, "Decision strategies and susceptibility to phishing," in Proceedings of the Second Symposium on Usable Privacy and Security, ser. SOUPS '06. New York, NY, USA: Association for Computing Machinery, 2006, pp. 79-90.

[18] C. Koumpis, G. Farrell, A. May, J. Mailley, M. Maguire, and V. Sdralia, "To err is human, to design-out divine: Reducing human error as a cause of cyber security breaches," Human Factors Working Group Complementary White Paper, 2007.

[19] E. Lipton, D. Sanger, and S. Shane, "The Perfect Weapon: How Russian Cyberpower Invaded the U.S.," https://www.nytimes.com/2016/12/13/us/politics/russia-hack-election-dnc.html?referer=, December 2016.

[20] K. D. Mitnick and W. L. Simon, The Art of Deception: Controlling the Human Element of Security. John Wiley & Sons, 2003.

[21] X. R. Luo, W. Zhang, S. Burd, and A. Seazzu, "Investigating phishing victimization with the heuristic-systematic model: A theoretical framework and an exploration," Computers & Security, vol. 38, pp. 28-38, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404812001927

[22] M. Workman, "Wisecrackers: A theory-grounded investigation of phishing and pretext social engineering threats to information security," Journal of the American Society for Information Science and Technology, vol. 59, no. 4, pp. 662-674, 2008.

[23] Federal Communications Commission, "FCC Warns of 'Can You Hear Me' Phone Scams," March 2017, https://www.fcc.gov/.

[24] C. Canfield, A. Davis, B. Fischhoff, A. Forget, S. Pearman, and J. Thomas, "Replication: Challenges in using data logs to validate phishing detection ability metrics," in Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017). Santa Clara, CA: USENIX Association, Jul. 2017, pp. 271-284. [Online]. Available: https://www.usenix.org/conference/soups2017/technical-sessions/presentation/canfield

[25] C. I. Canfield, B. Fischhoff, and A. Davis, "Quantifying phishing susceptibility for detection and behavior decisions," Human Factors, vol. 58, no. 8, pp. 1158-1172, 2016.

[26] S. Sheng, M. Holbrook, P. Kumaraguru, L. F. Cranor, and J. Downs, "Who falls for phish? A demographic analysis of phishing susceptibility and effectiveness of interventions," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI '10. New York, NY, USA: Association for Computing Machinery, 2010, pp. 373-382. [Online]. Available: https://doi.org/10.1145/1753326.1753383

[27] Verizon, "Data Breach Investigations Report," online, May 2019, https://enterprise.verizon.com/resources/executivebriefs/2019-dbir-executive-brief.pdf.

[28] J. Lau, B. Zimmerman, and F. Schaub, "Alexa, Are You Listening? Privacy Perceptions, Concerns and Privacy-seeking Behaviors with Smart Speakers," Proc. ACM Hum.-Comput. Interact., vol. 2, no. CSCW, pp. 1-31, Nov. 2018. [Online]. Available: http://doi.acm.org/10.1145/3274371

[29] A. Purington, J. G. Taft, S. Sannon, N. N. Bazarova, and S. H. Taylor, "'Alexa is my new BFF': Social roles, user satisfaction, and personification of the Amazon Echo," in Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, ser. CHI EA '17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 2853-2859. [Online]. Available: https://doi.org/10.1145/3027063.3053246
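The Pearson chi-square statistics reported in Section 5.3 can be recomputed directly from the contingency tables. The following pure-Python sketch is only an illustration (the original analysis was presumably run in a statistics package); the cell counts are taken from Tables 7, 9, and 10, and the variable names are ours:

```python
# Pearson's chi-square statistic for an r x c contingency table,
# computed from the row/column marginals (no external dependencies).
def chi_square(observed):
    rows = [sum(r) for r in observed]                 # row totals
    cols = [sum(c) for c in zip(*observed)]           # column totals
    n = sum(rows)                                     # grand total
    # Sum of (O - E)^2 / E over all cells, with E = row_total * col_total / n.
    return sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )

# Rows: Control, Alexa. Columns as in the corresponding table.
detection_email1 = [[35, 2, 15], [16, 16, 20]]   # Table 7, reported chi2(2) = 18.682
authority_email2 = [[36, 16], [9, 43]]           # Table 9, reported chi2(1) = 28.556
scarcity_email3  = [[8, 44], [48, 4]]            # Table 10, reported chi2(1) = 61.905

for name, table in [("detection [Email 1]", detection_email1),
                    ("authority [Email 2]", authority_email2),
                    ("scarcity [Email 3]", scarcity_email3)]:
    print(name, round(chi_square(table), 3))
```

The p-values follow from the chi-square survival function with (r - 1)(c - 1) degrees of freedom; `scipy.stats.chi2_contingency(table, correction=False)` returns both the statistic and the p-value in one call (`correction=False` disables the Yates continuity correction, which would otherwise change the 2 x 2 results).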