In Alexa We Trust: How Increasingly Humanoid Computers Are Changing Human Behavior - Universiteit van Amsterdam

Page created by Alan Evans

In Alexa We Trust:
How Increasingly Humanoid Computers Are Changing Human Behavior

                          Master’s Thesis

                           29 June 2018
                   New Media and Digital Culture
                     Universiteit van Amsterdam

Abstract

This research revolves around the anthropomorphism of (computer) devices and assesses how
this practice affects human behavior in general and human-computer interaction in particular. It
is nested in the domain of the voice-activated ‘conversational interface’, bringing forward the
Amazon Echo as a case study. As the first and foremost ‘smart speaker’ in the US market, the
Echo is approached on both a theoretical and empirical level by carefully examining three of the
main ‘pillars’ of the current Echo ecosystem: The Echo Facebook page, the Alexa Skills store,
and the Amazon webstore. Combining quantitative and qualitative analyses of several datasets
from these domains with theory from the field of media studies, drawing mostly from platform
and app studies, this research demonstrates how device anthropomorphism affects human-
computer interaction in various ways. In conclusion, it is argued that the anthropomorphism of
the Echo device is a deliberate, ‘trust-inducing design strategy’ above all, ultimately employed by
Amazon to increase profits. Looking beyond the widely popularized conceptualization of the
voice-activated conversational interface as merely being a ‘natural’ or ‘intuitive’ medium for
human-computer interaction, this research illuminates the economic factors underlying this
phenomenon.

In Alexa We Trust                                                                                  2

Table of Contents

Introduction
1. Amazon on Facebook
       1.1 Method
       1.2 Results
       1.3 Discussion
2. Alexa Skills
       2.1 Method
       2.2 Results
       2.3 Discussion
3. Echo Reviews
       3.1 Method
       3.2 Results
       3.3 Discussion
Conclusion
Acknowledgements
References
Appendices

“Alexa. Good morning.”
“Good Morning.”
“Alexa. How are you doing today?”
“I’m AI okay.”
“Alexa. Are you being serious?”
“I like to be useful. But I can have fun too.”
“Alexa. Does that make you human?”
“Hmm. I’m not sure.”
“Alexa. What are you then?”
“I’m Alexa and I’m designed around your voice. I can provide information, music, news,
weather, and more.”

For the largest part of my life, I have been interacting with technology, but interacting
with Alexa somehow feels very different from any previous encounter. While being fully aware
of the fact that I am conversing with a machine, Alexa’s human voice and quirky character
trigger me into engaging with it in general chat, addressing it with human courtesy, and even
developing what feels like a personal relationship. Alexa, however, is nothing more than the
proverbial face – or: voice – behind which a variety of complex technologies hide. It is the
so-called ‘virtual personal assistant’ (VPA) of Amazon; the personified, voice-activated
‘conversational interface’ through which users can connect with Amazon’s Echo-devices in an
“intuitive and natural way” (McTear 11-22).
“Echo”, as Amazon first introduced its voice-activated smart speaker to the press in June
2015, “is a new category of device designed around your voice—it’s always on, hands-free, and
fast—just ask for information, music, news, weather, and more from across the room and get
answers instantly”. “Alexa”, the company continues, is “the brain behind Echo (…) built in the
cloud, so it is always getting smarter” (Amazon, Amazon Echo Now Available to All
Customers). The Echo is thus a device with a brain (i.e. Alexa) that communicates by voice – a
computer with human characteristics. In other words, it is an ‘anthropomorphized’ device.
Anthropomorphism, put simply, is the tendency to attribute human characteristics to non-
human objects as a way to help rationalize their actions and behavior (Duffy 180). It is this
human-computer duality that lies at the heart of this research, which revolves around the
question: How does the anthropomorphizing – or: personification – of the Amazon Echo device
affect human behavior in general and human-computer interaction in particular?

Before elaborating on this question and delving into a human-computer duality narrative, it
is important to first outline the cultural, technological, and economic context in which Alexa and
the Echo came into existence, as well as the specific technologies they consist of. To be clear,
‘Alexa’ and ‘Echo’ refer to separate, yet inseparable things: The Echo is the hardware that houses
the Alexa software; it is the device through which the underlying VPA technology can be
accessed. As these come in a package, this research treats them as such, referring to both when
mentioning the device (i.e. ‘Echo’). At the same time however, the study respects the significance
of the underlying technology – which can be accessed through other devices as well – by
referring to ‘Alexa’ when only the software is discussed. In some deviating cases there will be a
clear indication of the subject(s) under scrutiny.
    The idea of employing VPAs in everyday life is definitely not new: The voice-activated
conversational interface has been a long-standing vision of researchers in artificial intelligence
(AI) and speech technology (Cassell et al. 520). Until recently however, the realization of this
vision was confined to the imagination of popular culture, with science fiction books and movies
depicting ‘sentient computers’, like HAL 9000 in 2001: A Space Odyssey (1968), or personified,
‘intelligent operating systems’, like Samantha in Her (2013) (Pieraccini 263; Wan 166). Such
systems have also been in the picture of major technology companies for quite some time, as a
1987 concept video of Apple depicting its Knowledge Navigator indicates (McTear 15). It was also
this company that, after acquiring the necessary technology from the US-American startup Siri
Incorporated for an undisclosed amount in 2010, introduced Siri in 2011, now generally recognized
as the first voice-activated VPA (Both 108; McTear 16).
    To understand the recent rise of the conversational interface, a term that refers both to
voice-activated assistants like Siri and Alexa and ‘intelligent’, automated text-based chatbots with
which one interacts by typing, such as Facebook’s M, it is important to highlight some of the
technological advances that have contributed to this development (Newman; Brownlee). Besides
the more obvious factors that appear on the surface, such as the ever-increasing computing
power of devices, faster wireless networks, and the fact that major technology companies share
great interest in the technology, there are also some more profound reasons behind the rise of
the (voice-activated) conversational interface (McTear 16-18).
    First, research in the field of artificial intelligence has shifted its focus from so-called
‘knowledge-based’ approaches, which pursue intelligence in computers by training them to solve
problems that are difficult for humans but easy for computers (e.g. in the domains of
decision-making, chess, etc.), towards ‘subsymbolic’ approaches, which instead revolve around easy
‘problems’ for humans that have proven to be difficult for computers (e.g. in the domains of
speech, emotion, etc.). In an official Google video on YouTube, psychologist Alison Gopnik notes
that AI has shifted its focus towards tackling new challenges: “The things that we thought were
going to be easy for a computer system, like understanding language, those things have turned
out to be incredibly hard” (Google, Behind the Mic). Secondly, language technologies have
benefitted greatly from recent technological developments within domains such as neural
networks, big data, and deep learning, drastically increasing the accuracy of speech recognition
technology and spoken language understanding (McTear 16-18; Pieraccini 136). Lastly, as
Tim Berners-Lee, the inventor of the World Wide Web, already prophesied in 2001, the Web has been
evolving into a ‘Semantic Web’, where search and other functionalities are built around the
meaning of input, rather than on literal keywords (Berners-Lee 34; McTear 17).
Having sketched out the cultural and technological context in which the conversational
interface could come into existence, I now briefly turn to the question of how the Echo and
similar devices actually work, before elaborating on the economic context in which they came to
thrive. Shedding light on the technicity of the device, I argue, in fact contributes to a better
understanding of this economic context. To be sure, this is not an in-depth, all-encompassing
specialist explanation, but rather a concise overview of the device’s main technical components.
Typically, there are five sequential layers to any voice-activated conversational interface: speech
recognition, spoken or natural language understanding, dialogue management, response
generation, and text-to-speech synthesis (McTear 20-21). After activating the Echo by using a
‘wake word’ (‘Alexa’ by default), the integral speech technology first attempts to recognize any
spoken language by converting the audio to words (Bohn; Vermeulen et al. 1). Then, it interprets
these words and discovers the intended meaning of the speaker (Hwang). If an intended
meaning is not recognized, the dialogue management system seeks clarification by engaging in a
dialogue with the user (McTear 20). If the meaning is understood, the system proceeds by
constructing a response in natural language, converting meaning back to words (Pieraccini 170).
Finally, these words are converted to audio, as the device responds to the user in spoken
language (Taylor 146; McTear 21) (see Figure 1a).
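
The five sequential layers described above can be illustrated with a minimal sketch. It is not a depiction of how Alexa is actually implemented: every function name, the keyword-based ‘understanding’ step, and the canned responses are hypothetical stand-ins, and plain text strings stand in for audio on both ends.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_WORD = "alexa"  # the Echo's default wake word, per the text


@dataclass
class Intent:
    """Hypothetical container for the meaning extracted from an utterance."""
    name: str


def recognize_speech(audio: str) -> str:
    """Layer 1 (stub): a real system converts audio to words; here the
    'audio' is already a text transcript that is merely normalized."""
    return audio.lower().strip()


def understand(words: str) -> Optional[Intent]:
    """Layer 2 (stub): naive keyword matching stands in for language understanding."""
    for keyword, intent_name in (("weather", "get_weather"), ("music", "play_music")):
        if keyword in words:
            return Intent(intent_name)
    return None  # intended meaning not recognized


def manage_dialogue(intent: Optional[Intent]) -> str:
    """Layer 3: act on a recognized intent, otherwise seek clarification."""
    return intent.name if intent is not None else "clarify"


def generate_response(action: str) -> str:
    """Layer 4: convert the chosen action back into words (canned sentences)."""
    responses = {
        "get_weather": "Here is today's weather.",
        "play_music": "Playing music.",
        "clarify": "Sorry, what would you like me to do?",
    }
    return responses[action]


def synthesize(text: str) -> str:
    """Layer 5 (stub): a marker string stands in for synthesized audio."""
    return f"<audio>{text}</audio>"


def handle_utterance(audio: str) -> Optional[str]:
    """Run the five layers in sequence; stay silent without the wake word."""
    words = recognize_speech(audio)
    if not words.startswith(WAKE_WORD):
        return None
    request = words[len(WAKE_WORD):].lstrip(" ,.")
    return synthesize(generate_response(manage_dialogue(understand(request))))


if __name__ == "__main__":
    print(handle_utterance("Alexa, play some music"))
```

The sketch keeps each layer as a separate function so that the sequence named in the text stays visible; in a production system each stub would be replaced by a dedicated model or service.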

Figure 1a. The five sequential layers of the voice-activated conversational interface: speech recognition, spoken or
natural language understanding, dialogue management, response generation, and text-to-speech synthesis (McTear
21).

While text-based chatbots, the more ‘primitive’ conversational interfaces that do not comprise all
of the aforementioned technologies, were initially believed by many in the technology industry to
be “the next big platform” but have failed dramatically to live up to expectations, Alexa (in this
case synonymous with the Echo) and most other voice-activated VPAs have experienced a rapid
and continuous increase in popularity ever since their introduction (Griffith and Simonite).
Recent market research concludes that one-in-six US adults now owns a voice-activated smart
speaker (Ong). Another 2017 study even predicts that by 2020, 75% of US households will own
such a device (Gartner). Many of the world’s largest technology companies are devoting vast
amounts of knowledge and resources to at least one of the underlying technologies; some have
even introduced a voice-activated smart speaker – with all of the inherent technologies – of their
own (Coyne et al. 1). As mentioned before, Apple’s Siri is generally recognized as the first VPA.
With regard to voice-activated smart speakers that house VPAs however, Amazon is widely
accepted as the market’s ‘first mover’ with its Echo (Weinberger). After the introduction of the
Echo, Microsoft and Google were quick to formulate an answer, with their Cortana and Google
Home systems respectively. Apple followed suit by announcing the HomePod in June 2017
(Apple). However, enjoying first mover advantage, Amazon has firmly established itself as the
market leader, commanding around 72% market share (Kinsella and Mutchler 10).

Figure 1b. The first generation of smart speakers of four of the largest technology companies worldwide. From left
to right: Amazon’s Echo; Apple’s HomePod; Microsoft’s Cortana; Google’s Home.

The varying physical designs of all of these speakers do not hide the fact that they are remarkably
similar in function (see Figure 1b). They are all equipped to play music, tell jokes, read the news,
translate between languages, provide information, set timers and alarms, and much more. The
companies introducing them, however, have very different backgrounds. While Amazon, Apple,
Microsoft and Google (or: Alphabet), the four largest technology companies in the world in terms
of market value, originate from the separate market spheres of e-commerce, consumer goods,
computer software and hardware, and web search respectively, they are now seizing the exact
same opportunity (Statista). With the power and influence that these companies wield in today’s
global economy, it is of great importance – for academics and society in general – to understand
why they all share the interest in building and selling voice-activated smart speakers and the
accompanying VPAs.
Part of the reason for this development can be found in the simple fact that these
companies are actors in a capitalist system. Shedding light on technology companies from a
predominantly economic perspective, Nick Srnicek argues that their decision-making is best
apprehended by scrutinizing their quest for profit and their effort to fend off competition. This
approach in fact makes the ‘next move’ of such companies more predictable for outside
observers. “Capitalism”, Srnicek continues, “demands that firms constantly seek out new
avenues for profit, new markets, new commodities, and new means of exploitation” (10). In
accordance with this ‘logic of accumulation’, major technology companies – having already
established themselves as uncontested market leaders in their respective spheres – unsurprisingly
turn to new markets, such as smart speaker hardware and VPA technology, as new possible
avenues for profit (Zuboff 76). However, this is still not a sufficient explanation for why these
companies have all turned to the exact same markets, introducing strikingly similar products.

To truly understand this development, it is essential to delve deeper into economic
theory – without resorting to too much jargon – and approach these technology companies as a
new kind of firm, constructed around a new kind of business model – the ‘platform’ – and
situated in a new kind of capitalism, one that has turned to data as a way to maintain economic
growth (Srnicek 13). In this ‘platform capitalism’ – the economic system of the ‘information
society’ – data is the raw material to be refined and exploited after being extracted from user
activity, the natural source (Srnicek 54; Yeung 119). Platforms like Amazon, Apple, and Google
revolve around obtaining control over data, with the aim to predict and even modify the
behavior of their users as a means to produce revenue and increase market control (Zuboff 75).
In their essential quest to gain access to more data, and enabled by the aforementioned
technological advances, these companies are now expanding their data collection into the
relatively undiscovered realm of the home, expecting to uncover and control rich new data
sources. As one report puts it: “From a data-production perspective, activities are like lands
waiting to be discovered. Whoever gets there first and holds them gets their resources – in this
case, their data riches” (Srnicek 127).
        In platform capitalism, controlling more data means more control over a market. When
in control of a market, platforms can ‘set the rules of the game’, eventually becoming non-
regulable, hegemonic models that may even “take on a powerful institutional role, solidifying
economies and cultures in their image over time” (Srnicek 13; Bratton 41). Only platforms can
compete with and thereby possibly regulate other platforms: no other business model thrives so
well in the information society (Srnicek 62). Other platforms thus form the only real threat for a
platform’s conquering of a market: Only they are able to extract and control the same large
amounts of data needed to expand. As the expansion of platforms is driven by the need for
more data, we can see the development of a rat race between platforms, which compete
vigorously for control over key market positions that are rich in data. This ‘data rush’ ultimately
leads to a situation in which platforms become increasingly similar, entering the same markets
and launching similar products (Idem 67-68; 136). In this light, the introduction of the
HomePod (paired with Siri) for example, both challenges and resembles Amazon’s longer
established effort of the Echo (and Alexa) and Google’s more recent Home (and Google Assistant)
in their quests to control and exploit the supposedly data-rich markets of smart speaker hardware
and VPA technology.
        Having explained how the Echo works technically and having sketched out the cultural,
technological, and economic context in which such devices could come into existence, I have
paved the way to return to the main narrative of this research, which revolves around
anthropomorphism. In this regard, it is important to note that the striking similarities between
the smart speakers and VPAs that Amazon, Apple, and Google have introduced are not limited
to the confines of function, with which I refer to the aforementioned abilities of these products,
such as playing music or reading the news. Rather, these similarities spill over into the realm of
form – the less tangible domain of terminology and underlying ideology with which these
products are introduced to the public by the companies in question. One of the most dominant
narratives within this domain, propagated by Amazon as well as Apple and Google, is that the
voice-activated conversational interface encompasses an ‘intuitive’ or ‘natural’ way of interacting
with computers. At the introduction of a new Home device in 2017 for example, Google notes:
“The way you interact with our products has to be so intuitive you never even have to think
about it and so simple that the entire household can use it” (Google Event October 4 2017 New
Google Home Mini). Propagating similar narratives, Amazon and Apple further attempt to
establish natural interaction between humans and computers by personifying their devices – by
default addressed as persons, ‘Alexa’ and ‘Siri’ respectively (McTear 15).
The main question of this research revolves around the consequences that this
anthropomorphizing of computers has on human behavior, specifically addressing the effects it
has on human-computer interaction. As a case study, the Amazon Echo device is introduced.
This research elaborates on that case study, bringing multiple sets of empirical data into the
equation. It is divided into three main chapters, as it approaches this subject from three different
angles. Each chapter is preceded by and constructed around specific sub-questions. Together,
these sub-questions form a framework from which the main research question can be
approached more comprehensively. Discussing different datasets, these chapters are structured
according to a similar setup, a) introducing and contextualizing the dataset, b) explaining the
research methodology, c) presenting the research results, and d) discussing the findings.
The Echo, compared to other such devices, makes for an interesting case for several
reasons. As mentioned before, Amazon is both first mover and leader in the smart
speaker market. Further, Apple’s HomePod is not yet available to the public and Google’s Home
is not personified to the same extent as the Echo, as it is addressed and activated as a device
rather than a person (‘OK Google’). The HomePod has thus not yet sparked much research at all,
and the Home has not sparked enough within domains that are of interest to this research, such as
the anthropomorphism of devices. The most important reason to study the Echo from the
perspective of this data-driven study, however, is the simple fact that it was the first of its kind to
hit the market and has thus generated a relatively dense body of data (Weinberger). What also
makes the Echo an interesting case study is that Amazon, more so than other contestants, has
built a platform around its devices, introducing the ‘Alexa Skills store’ that prompts developers
to create custom applications for Alexa users, thereby creating a multisided market that caters to
several different groups of stakeholders (Rieder and Sire 199). The availability of extensive data
from different groups of stakeholders makes for a more holistic approach of the subject, one
that better captures its versatility and complexity.
        The first chapter of this research approaches the Echo by examining the official
Facebook page on which the device has been promoted from its very introduction. This chapter
analyzes the specific terminology that Amazon conveys when parading its product to the public.
With access to all publicly available historical data of this substantial marketing channel – which
gives a unique insight into how Amazon has thus far framed its device to the public – I propose
a quantitative approach to answering the question: How does Amazon employ the notion of
anthropomorphism in presenting the Echo to its prospective customers on Facebook? Also
having access to the ‘engagement metrics’ (likes, shares, reactions, etc.) of this Facebook page,
this research will then, through quantitative and qualitative analyses, proceed to answer the
question: How does Amazon’s framing of the Echo subsequently affect the relationship between
Echo users and their devices? Elaborating on the last question, this chapter argues that this
relationship is not merely a product of top-down imposed marketing, but rather an ever-
evolving, in flux phenomenon that develops in dialogue between Amazon and its (prospective)
customers. With respect to this argument, this chapter is constructed around theories of
‘prosumers’, ‘customer coproduction’, and ‘consumer publics’, among others (Lloyd; Arnould
and Thompson; Arvidsson, The Potential of Consumer Publics). Taking a step back, this chapter
also discusses the semantics of engagement metrics on social media, building on a dense body of
literature concerning the ‘real’ and the ‘virtual’ (Rogers, The End of the Virtual; Rogers, Digital
Methods; Gerlitz, What Counts?; Rieder, Studying Facebook via Data Extraction), as well as the
role of Facebook with regard to (the limits of) user expression, experience, and sentiment
(Thaler and Sunstein; Gillespie; Gerlitz and Helmond).
    The second chapter is set against the backdrop of the ‘Alexa Skills store’, a subdomain on
the Amazon website that lists over 30,000 ‘skills’: instantly accessible functionalities that can be
activated on the Echo by using custom voice commands (e.g. “Alexa, what’s my flash
briefing?”). With this market, Amazon invites developers and (commercial) third parties into
its ecosystem, further establishing its position as a platform that caters to – and between –
multiple stakeholders (Rieder and Sire 199). Approaching the Echo from the perspective of such
parties, this chapter evaluates how each understands the device, asking: How do developers and
third parties associated with the Echo ecosystem envision people using the device? I approach
this question by resorting to a quantitative empirical analysis of the aforementioned Amazon
subdomain. Subsequently, this chapter employs Skills store data to unveil how Echo users are
actually using the device, asking: How is the Echo actually being put to use and how does this
differ from the usage envisioned by developers and third parties in the ecosystem? How does
this differ from the usage envisioned by the platform itself? To answer these questions, I rely on
two extensive datasets that contain (the metadata of) over 11,000 Alexa skills in total, while also
borrowing from publicly available market research on the subject (Kinsella and Mutchler; NPR
and Edison Research). As the Alexa Skills store concerns a relatively new phenomenon, this
chapter will borrow from platform studies, as well as build on a body of literature surrounding a
similar marketplace – that of mobile applications (Helmond et al.; Islam; Guzman and Maalej).
The third chapter zooms in on more specific user behavior and sentiment with regard to the
Echo. This last chapter describes both a quantitative and qualitative analysis of the interaction
between Echo users and their devices. It forms, to an extent, a renewal of previous research,
carried out by Purington et al. and described in the 2017 article ‘“Alexa is my new BFF”: Social
Roles, User Satisfaction, and Personification of the Amazon Echo’. Analyzing customer reviews
of Echo-buyers on Amazon.com, the researchers aim to illuminate the ways in which “people
perceive, interact with, and integrate this device into social life” (Purington et al. 2854). This
chapter takes a similar approach, albeit with a different, more up-to-date dataset, and revolves
around the following questions: How do Echo users address and interact with their devices?
How does this behavior subsequently affect user sentiment with regard to the Echo? The main
dataset that is used in this chapter consists of over eighteen thousand customer reviews of the
Echo on Amazon.com. Borrowing from methodologies presented in prior research, this chapter
applies a structured approach to distill from this dense dataset the information needed to
formulate comprehensive answers to the questions above (Purington et al.; Coyne et al.;
Mudambi and Schuff).
The current study brings together theory and empirical data to establish whether and how
anthropomorphism of computers is changing the ways in which we perceive and use these
computers, ultimately touching upon the implications this has for the future of human-computer
interaction. By approaching the Amazon Echo from three different perspectives, collecting and
analyzing vast amounts of data from three of the core ‘pillars’ of the Echo ecosystem, this
research proposes an extensive, yet tangible way of tackling this subject. Importantly, this study
rises to the occasion of exploring the rather unexplored domain of the novel and immensely
popular voice-activated conversational interface. As ‘natural’ and ‘intuitive’ as this interface may
seem, its rapid rise can only truly be understood by first scrutinizing the companies behind it.

1. Amazon on Facebook

In 2015, research found that manipulating slot machines to expose users to an
anthropomorphized description of these machines increased gambling behavior. ‘Priming’ such
users with these anthropomorphic machines, the research concludes, makes them gamble – and
lose – more, ultimately benefiting the casino and negatively impacting the gambler (Riva et al.
313). Anthropomorphism of devices, other research underscores, increases the user’s trust in
and engagement with these devices, thereby affecting user behavior (Schuetzler et al.
12). The choice of Amazon to personify the Echo then, can be conceptualized as a “trust-
inducing design strategy”, aimed at establishing a more positive and thus more durable
relationship between users and their devices (Seeger and Heinzl 130).
With regard to commerce, such a relationship may also be a more fruitful one: As
research shows, trust is of key importance in the decision-making process of customers
within the domain of e-commerce (Gefen 734). As all of these studies indicate, there are clear
advantages for Amazon in introducing anthropomorphized hardware – none of which,
unsurprisingly, are mentioned in the company’s official press release of the Echo (Amazon, Amazon
Echo Now Available to All Customers). The current research, however, does emphasize how the
anthropomorphizing of the Echo device benefits Amazon’s commercial operations. In this first
chapter, I approach such commercial benefits by illuminating the ways in which Amazon actively
shapes public perception of the Echo by deliberately integrating and iterating
anthropomorphism narratives in their public communication around this product, ultimately
exploring if and how this affects the behavior of (prospective) users with regard to their Echo.
To do so in a constructive and tangible manner, this chapter takes a two-fold approach
to the concept of ‘anthropomorphism’. On the one hand, it establishes and compares the ‘degree
of personification’ that Amazon and (prospective) users ascribe to the Echo device. On the
other hand, these parties are scrutinized and compared for the ‘degree of sociability’ they ascribe
to the device in their descriptions of varying use cases. Building on methodology
introduced in prior research, this chapter approximates these degrees by analyzing the specific
language that Amazon and its (prospective) customers – or: users – use to address the device
and to specify how it is (to be) used (Purington et al. 2855). This chapter applies this
approach against the backdrop of the ‘Computers Are Social Actors’ (CASA) paradigm, which
describes how “people respond to technologies as though they were human, despite knowing
that they are interacting with a machine” (Nass et al. 228; Purington et al. 2854).
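
To make the idea of approximating a ‘degree of personification’ from language more tangible, the following sketch scores a text by the share of person-like references among all references to the device. It is an illustration under stated assumptions, not the coding scheme of Purington et al. or of this research: the two word lists, the function name, and the ratio itself are hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical word lists: terms that address the device as a person
# versus terms that address it as an object.
PERSON_TERMS = {"alexa", "she", "her", "hers"}
OBJECT_TERMS = {"echo", "it", "its", "device", "speaker"}


def personification_degree(text: str) -> float:
    """Return the share of person-like references among all device references.

    1.0 means the text only uses person-like terms, 0.0 means it only uses
    object-like terms (or contains no device references at all).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    person = sum(counts[term] for term in PERSON_TERMS)
    obj = sum(counts[term] for term in OBJECT_TERMS)
    total = person + obj
    return person / total if total else 0.0


if __name__ == "__main__":
    print(personification_degree("Alexa plays music on my Echo"))
```

A crude word count like this ignores context (for instance, ‘it’ referring to something other than the device), which is one reason a qualitative reading of the same texts remains necessary alongside any such measure.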

By incorporating the CASA paradigm, I add a certain layer of nuance to the analysis,
arguing that in most of the cases where humans anthropomorphize their computers this does
not imply that they see or treat them as equals. Following the logic of this paradigm, degrees of
personification and sociability are thus to be considered within the confines of human-computer
interaction and are not to be mistaken for measurement tools that transcend this domain and can
simply be applied to approximate the types and ‘depths’ of interaction between equals. As the
CASA paradigm indicates, computers – and other devices – have become social actors that take
on various social roles in our lives, albeit still to a limited, non-human extent.
Anthropomorphism can be considered a logical consequence of the social roles that these
devices have appropriated (Nass et al. 229). In the case of the Echo, this is no different.
However, it is also important to view the anthropomorphizing of the Echo as a ‘response’ to
Amazon’s initial introduction and framing of the device: It is a humanoid – thus social – device
from the very outset. This chapter thus explores the notion of anthropomorphism and its
consequences for human-computer interaction by first examining what precedes it – in this case:
the marketing effort of the Echo.
One way or another, before goods are sold to customers, these customers have to be
convinced of buying them. In other words: these products have to be marketed to prospective
customers (Kotler 46-48). During this marketing process, companies communicate the ways in
which (prospective) customers can use their products. Perhaps unavoidably, this in fact ‘nudges’
those customers towards using the product in specific ways (Thaler and Sunstein 6). In this
respect, the case of the Echo is not any different: When announcing the Echo in 2014 – and in
many marketing efforts since then – Amazon attached clear directions for its usage (Echo
announcement Amazon). To get a deeper understanding of how Echo users interact with and
make use of their devices, it is therefore of key importance to first understand the ways in which
they are being ‘instructed’ to do so – whether that is before or after their purchase. In order to
illuminate the ways in which such top-down instructing occurs, this research analyzes one of the
Echo’s most substantial marketing channels: its Facebook fan page1. This particular page ‘went
public’ (i.e. with the first public page post) in July 2016 and counted over 514,000 followers at
the time of writing.
Arguably Amazon’s largest external marketing channel for the Echo, this Facebook page
forms an important object of study – a rich source that gives insight into the company’s sales
strategy and broader underlying motivations. It is perhaps one of the most accurate lenses
through which the rapid expansion of the smart speaker market can be analyzed, which has

1 Accessed: 7 March 2018.


taken on an almost unparalleled magnitude: Only three years after the first public introduction,
one-sixth of the total US adult population now owns a smart speaker (Ong). Ironically, whereas
it took Facebook, a free software service, two years to reach fifty million ‘customers’, this same
number was reached in three years by smart speaker hardware with selling prices between $30
and $180 (Kinsella and Mutchler 7). With Amazon commanding 72% of this potent market, this
company ought to be the first to be scrutinized in order to better comprehend a market that has
grown at such an explosive rate. In doing so, this chapter sheds light on the rapid, yet in many
ways early-stage, rise of the voice-activated conversational
interface (Dale 815-817).
        As a research object in the domain of media studies in general and ‘platform studies’ in
particular, Facebook has been approached from a vast range of perspectives. In its capacity as
intermediary, Facebook is often brought forward as a multi-sided market that aims to cater to all
of its stakeholders (Rieder and Sire 199). Other observers emphasize the technical specificities
with which the platform determines and streamlines user behavior and data flows (Gerlitz and
Helmond; Mittelstadt et al.), or the role of the platform’s ‘political affordances’ in this context
(Gillespie). Whereas these studies emphasize Facebook as an entity of which the technical and
political affordances guide and restrict the maneuvering space of its different stakeholders,
Facebook is also often conceptualized for the infrastructure it in fact offers to third parties to
benefit from (Bogost and Montfort). Furthermore, at the intersection of media studies,
psychology, social and political sciences, Facebook has notoriously been illuminated for its
capacity to classify, predict, and modify user behavior (Bachrach et al.).
        Borrowing from all of these approaches, yet not remaining confined to their exclusivity,
this chapter first seeks to answer the question: How does Amazon employ the notion of
anthropomorphism in presenting the Echo to its prospective customers – or: users – on
Facebook? As this question indicates, the terms ‘customer’ and ‘user’ can be considered
interchangeable throughout this chapter, unless otherwise stated. To approach this question, I
first take a step back and briefly disconnect from the underlying theoretical framework of
platform studies to emphasize the importance of the specific language that Amazon employs to
nudge its users in specific directions. With this narrowed-down approach, I aim to identify
patterns in the interaction between Amazon and (prospective) Echo users. Subsequently, I
reconnect to the broader framework of platform studies and consult these patterns to answer the
question: How does Amazon’s framing of the Echo subsequently affect the relationship between
Echo users and their devices? To contribute to the formulation of a more constructive answer to
these questions, this chapter makes use of empirical research. By elaborating on these questions


and introducing two extensive datasets, I argue, Amazon’s underlying motivations for the
anthropomorphism of the Echo can be mapped and better apprehended. This apprehension,
ultimately, is necessary for a more conclusive approximation of the main research question: How
does the anthropomorphizing of the Amazon Echo device affect human behavior in general and
human-computer interaction in particular?
        On the one hand, research on the Facebook page of the Echo gives meaningful insight
into Amazon’s underlying marketing strategy. On the other hand, as Arvidsson points out, these
are also sites of collaborative consumer practices: places where consumers are not only told
about products top-down, but also contribute to the value creation of those products
(Arvidsson, The Potential of Consumer Publics 368). In such places, consumers are in fact
becoming producers (Arnould and Thompson 868-870). It is often on the basis of these
so-called ‘consumer publics’ that suggestions for the innovative use of products arise, which in the
long run may form a “common horizon of values that (…) determine the direction of [the
consumers’] passions and engagements” (Arvidsson, The Potential of Consumer Publics 370;
384). With regard to the Echo, or any such device for that matter, there has been little research
on consumer publics. This chapter, however, approaches Amazon’s marketing effort on
Facebook not only as top-down communication, but also as a two-way interaction between
producer and consumer – the latter becoming increasingly difficult to distinguish from the
former (Lloyd 42). The Echo, this chapter argues, is a fluid, ‘in flux’ product, the narrative
around which is changing continuously and is at least in part determined by the product’s users
in a process called ‘customer coproduction’ (Arnould and Thompson 869).
        Building on the aforementioned theoretical framework that describes the conjoining of
consumers and producers, this chapter supplements theory with empirical data, introducing the
‘engagement metrics’ of the Echo Facebook page. An analysis of these metrics, of ‘natively
digital objects’ such as likes, shares, and reactions, gives insight into how producer-consumer
interaction shapes the narrative around the Echo device. Importantly, this chapter first takes the
necessary step back and approaches the semantics of such natively digital objects. In line with
Richard Rogers’ studies that introduced the field of ‘digital methods’ and argued for the ‘end of the
virtual’ (Rogers, The End of the Virtual; Digital Methods), I argue that societal and cultural
claims can be made on the basis of research into digital sources alone. Agreeing on this ‘online
groundedness’, this chapter at the same time acknowledges and respects the limits of digital
methods when it comes to approaching the ‘real’ through the lens of the ‘virtual’ (Rogers,
Digital Methods 29). To illustrate this nuanced approach: In this chapter, a ‘like’ on Facebook is
in itself an object of study that may indicate the enjoyment of a user with regard to what she


liked, while at the same time presenting a form of user expression that can only be witnessed in a
digital environment and thus cannot be said to be representative of any form of user expression
witnessed outside of the digital domain – or: outside of Facebook for that matter.
        By liking, whatever such user expression may in fact represent, users produce and engage
with Facebook’s data, in this case participating in the shaping of a narrative around the Echo on
the platform (Gerlitz). Thus, in the process of approaching the subject of data semantics
(Rogers, The End of the Virtual; Digital Methods), it is also important to consider the role Facebook plays in the formulation of
these semantics. To do so, this chapter returns to platform studies and examines how Facebook
both enables and restricts user expression (Bogost and Montfort; Gerlitz and Helmond;
Gillespie). It is argued that the very design of Facebook – its political and technical affordances
– nudges and restricts users in voicing their true feelings with regard to the Echo device (Gerlitz
and Helmond; Gillespie). As this is a commercial platform that is subject to – and benefits from
– the ‘law of the network effects’, which holds that an increased usage further increases usage, it
strongly encourages any form of user participation (Rieder and Sire 200; Bucher 484). In this
sense, Facebook is not neutral, but rather driven by technology that is designed and motivations
that are commercial (Bucher 480). This research holds that a user’s like, to continue this
illustrative example, cannot be regarded as mere user expression, but should also be considered
to be co-produced by Facebook, which in fact benefits from increased user participation and
thus stimulates such interaction (Gerlitz and Helmond 1361-1362).
        However, the main aim of this first chapter is not merely to analyze and explain
Facebook data, but rather to examine the behavior of Amazon on Facebook, focusing on how
the company uses anthropomorphism in the narrative around the Echo to shape and modify
customer behavior with regard to this device. Even though these customers indeed co-produce
this narrative, as this chapter also brings forward, I emphasize how Amazon’s ‘instructions’ on
Echo interaction and usage precede any such co-production process. This chapter hereby
presents both a top-down and bottom-up dialogue between the Echo producer and consumer.
Indeed, whatever shape or form this dialogue holds, it takes place within the confines of
Facebook. As one of the main pillars of the current Echo ecosystem, this platform is thus
scrutinized for the role it plays as intermediary between producer and consumer – or: company
and customer – as this chapter borrows from a range of studies mostly originating from
the field of platform studies.


1.1 Method

The first data sample used for the research in this chapter consists of 288 unique page posts that
were collected from the official Amazon Echo Facebook page. Spanning the page’s entire public
lifetime – from its first post in July 2016 to the data extraction for this research in March 2018 –
this sample does not necessarily represent all contact moments between Amazon and its
customers via this medium. Indeed, posts may have been deleted in the meantime. Due to
Facebook privacy regulations, there is no way of recovering deleted data and revealing the
complete history of the page. The 288 posts do, however, make for a substantial data sample – one
that suffices for the purposes of this research.
This dataset contains the textual content of all of these posts as well as the specification of
their ‘type’ (e.g. photo, video, link, etc.). The temporal and “post-demographical” properties of
the posts are also included: their publication date and time and a wide range of ‘engagement
metrics’ (e.g. likes, comments), as well as other information that is not important for this
particular research (Rieder, Studying Facebook via Data Extraction 346). With the notion of
‘consumer publics’ in mind, this dataset is further supplemented by a second dataset that zooms
in on user comments to posts. For the sake of a meaningful, qualitative analysis, this dataset is
limited to 721 user comments – covering all twenty posts of the month of December 2017. As is
discussed later, this particular period was selected in an effort to pursue academic consistency,
echoing prior research that covers the same month in 2016 (Purington et al.).
Both datasets were formed using Bernhard Rieder’s Netvizz application: “A data collection
and extraction application that allows researchers to export data in standard file formats from
different sections of the Facebook social networking service” (Rieder, Studying Facebook via
Data Extraction 346). Netvizz, which has a monthly active user base of over 3000 at the time of
writing, is an application that only functions within the confines of Facebook and thus requires a
user to have a Facebook-account2. To retrieve data, it makes use of the sanctioned Facebook
‘Application Programming Interface’ (API) (Idem 348). Netvizz is written in PHP
and runs on a server that is provided by the Amsterdam-based Digital Methods Initiative (Idem
349). The tool allows for the quantitative and qualitative analysis of friendship networks, groups,
and pages on Facebook. This research only makes use of the Netvizz tool with regard to
Facebook pages – in this case: The Amazon Echo page. While the analysis of both friendship
networks and groups faces reliability issues on the basis of privacy settings of individual users,

2 Accessed: 7 March 2018.


page engagement data – which forms the core of this analysis – can be considered more robust
(Idem 349).
Although the data is reliable in a technological sense (i.e. page data retrieved with Netvizz does
not contain any miscalculations or leave any dubious blank spaces), there are some reliability as
well as validity issues when it comes to the semantics of the data. For example, only the textual
content of posts and comments is taken into consideration for this research; any accompanying
images, videos, or links are left out of the equation. This strong focus on text leaves obvious
questions about the impact of such added media unanswered, potentially harming the reliability
and validity of the data. Further, considering page engagement data, some validity questions
arise: What does a like actually represent? And what about a wow reaction? What does it mean
when someone shares a post? Importantly, carrying out such research into user behavior,
expression, and interaction on Facebook confines the researcher to the
technical and visual affordances of such a platform – what Agre famously deemed its “grammars
of action” (Agre 745). Facebook’s very architecture and policy in fact determine and thereby
limit the freedom of movement and expression of its users, forming “real and substantive
interventions into the contours of public discourse” (Gillespie 359).
Regardless of these semantic complexities, “for researchers from the humanities and social
sciences”, as Rieder points out, “the possibility to analyze the expressions and behavioral traces
from sometimes very large numbers of individuals or groups using these platforms can provide
valuable insights into the arrays of meaning and practice that emerge and manifest themselves
online” (Rieder, Studying Facebook via Data Extraction 347). As Rogers argues, Facebook is not
merely a “virtual space” that exists in isolation of “real life”. Rather, it can be regarded as “a
source of data about society and culture” (Rogers, Digital Methods 29). Compared to traditional
empirical methods such as experiments or interviews, using data capturing software such as
Netvizz has the added value of producing ‘observational data’ (i.e. data documenting what
people do, instead of what they say they do) – besides having more obvious advantages in the
domains of cost, speed, and exhaustiveness (Rieder, Studying Facebook via Data Extraction 346-
347).
Having outlined what the data sample represents and how it was formed, briefly discussing
reliability and validity issues, I now go into more detail and discuss the specific procedures that
were carried out during this research. First, to retrieve the main dataset with Netvizz, the
numerical page id of the Amazon Echo page was recovered using Lookup-id3. The other
parameters were then specified to capture the page’s post history in its entirety (see Figure 2). To

3 Accessed: 7 March 2018.


be sure, the page itself was also analyzed manually and the first post was found to date from 13
July 2016. This manual analysis revealed that the page did not allow for any posts by users to be
displayed. Thus, after setting the parameters, I retrieved the posts by page only. As a check,
choosing the ‘post by page and users’ option instead in fact returned identical results. Netvizz then returned a zip-file
with two tabular files: one containing the 288 page posts (ranging from 13 July 2016 until 7
March 2018) with metadata and engagement metrics and the other merely describing the
engagement statistics per day (see Appendix I). A third tabular file, describing the page’s fans per
country, was not included, due to recent changes in Facebook’s API policy (Kmieckowiak).

Figure 2. The exact parameters with which this research used the Netvizz application to request data from the
official Amazon Echo page on Facebook. In this case: the last 999 page posts and accompanying metadata.

Secondly, following a similar procedure, the second dataset was retrieved. Zooming in on
December 2017, this particular dataset contains all user comments to the posts of this particular
month. It was retrieved using Netvizz with the parameters as specified in
Figure 3. Again, to activate Netvizz and start data retrieval, I selected the ‘post by page only’ option.
In this case, three tabular files were returned: the same two as with the aforementioned request
and a third containing all user comments. In total, before filtering, this last file contained 950
comments to 20 posts (see Appendix II).


Figure 3. The exact parameters with which this research used the Netvizz application to request data from the
official Amazon Echo page on Facebook. In this case: user comments on the December 2017 page posts.

Thirdly, both datasets were filtered. For the first dataset, which is referred to as ‘1A’ from here
on, only the tabular file containing the actual posts was used for this research and thus subjected
to filtering. For the second dataset, which is referred to as ‘1B’ from here on, this was the case
for the tabular file containing the user comments only. Both ‘raw’ datasets – although some
would argue that data is always already ‘cooked’ (Gitelman) – were filtered and analyzed using
the built-in filter function of Google Sheets. For the filtering of 1A, the irrelevant columns of data
were omitted from the file, leaving the type of post (link, status, photo, or video), its text, publish
date, and engagement metrics (total engagement; likes, comments, shares, and types of reactions)
(see Figure 4). Similar filtering was done for 1B, preserving the following data: the post to which
the comment forms a reply, temporal data (of post and comment), whether the comment
directly replies to a post or to another comment, the text of the comment, and its number of
likes (see Figure 5). To not overcomplicate the analysis of 1B, second-tier and even third-tier
replies (replies to replies, etc.) were omitted from the data sample, leaving 721 of 950 comments
(76%).
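
The omission of second-tier and third-tier replies can be illustrated with a minimal Python sketch. It is not the procedure that was used for this research, which relied on the filter function of Google Sheets. The column name `is_reply` and the sample rows are hypothetical, as the actual Netvizz column that marks replies is not reproduced here; `comment_message` is the column shown in Figure 5.

```python
def keep_top_level(comments):
    """Keep only comments that reply directly to a post (first-tier comments)."""
    return [c for c in comments if not c["is_reply"]]


def share_kept(kept, total):
    """Return the share of retained comments as a rounded percentage."""
    return round(100 * kept / total)


# Hypothetical rows: 'is_reply' is an assumed flag marking replies to comments.
sample = [
    {"comment_message": "first-tier comment", "is_reply": False},
    {"comment_message": "reply to the comment above", "is_reply": True},
]

top_level = keep_top_level(sample)
print(len(top_level))        # 1
print(share_kept(721, 950))  # 76
```

The second call reproduces the share reported above: 721 of 950 comments amounts to roughly 76%.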

Figure 4. An excerpt of the tabular file 1A after step one of filtering: omitting irrelevant columns of raw data. Row
one consists of data categories. Row two contains (meta)data of a post.

Figure 5. An excerpt of the tabular file 1B after step one of filtering: omitting irrelevant columns of raw data. Row
one consists of data categories. Row two contains (meta)data of a comment.


Lastly, both the 288 posts (post_message column in Figure 4) and 721 comments (comment_message
column in Figure 5) were grouped – albeit in separate files – on the basis of a singular textual
characteristic: whether they contained the word ‘Alexa’, the word ‘Echo’, both, or neither. This particular
parameter was established as an indicator of the degree of personification with which both
Amazon and its followers addressed the Echo device, a method derived from previous research
by Purington et al. (2856). In line with the methodology of that study, posts and comments
describing the technology as a person (using the name ‘Alexa’) were categorized separately from
those describing the technology as an object (using ‘Echo’) and those referring to both or neither
(Purington et al. 2855). Then, all posts and comments were reviewed qualitatively to establish the
degree of sociability they ascribed to the device. This was done on the basis of functionalities
and roles of the Echo that were described in these posts and comments. In line with the
methodology of the aforementioned study, which in turn relies on the CASA paradigm to approach
anthropomorphism, five separate categories were identified and coded – and recoded by a
second coder – to represent varying degrees of sociability, from least sociable (0) to most
sociable (4). Deviating from this methodology, this research merged the ‘Companion’ and
‘Friend’ categories, as it found the distinction between the two hard to establish and irrelevant
(Purington et al. 2854) (see Figure 6).

Degree of sociability (what kind of interaction with the Echo is described?)

| Code | Functionality of device | Example 1A | Example 1B |
|---|---|---|---|
| 0 | None / not specified | Say hello to the all-new Echo Dot. Add Alexa to any room for only $49.99. #JustAsk amzn.to/EchoDot | When are we getting this in the uk? |
| 1 | Information source (providing news, weather, facts) | #JustAsk for weather information and more. The all-new Echo Dot for only $49.99. | Alexa no longer recognizes "WBUR" as a streaming radio station. (after months and months of working correctly) It keeps asking me if I want to add an entry to Pandora. |
| 2 | Entertainment provider (playing music, audio books, games, telling jokes) | It’s summer so why not enjoy a soundtrack of seasonal hits? Ask “Alexa play the Summer Vibes station from Prime.” | I want to play the music I own on my Alexa. I used to be able to upload it to Amazon Music and it'd play. Now, you stopped accepting uploads. Now what do I do? |
| 3 | Personal assistant (managing shopping, timers/alarms, schedules) | Order stuff anytime night or day. Add Alexa to any room with Echo Dot for only $49.99. #JustAsk | How hard can it be to turn lights on at dusk? Echo is the only smart home device that does not have that functionality. Please add this to routines. |
| 4 | Companion / Friend (conversation partner, friend, family member, roommate, etc.) | Busting out memories from the past? It’s easy with the all-new Echo Dot available for only $49.99. #JustAsk | Chad Peery this is your girlfriend! |

Figure 6. Categories with which degree of sociability of posts (1A) and comments (1B) was established, with
examples for both datasets. This methodology was largely derived from a previous study by Purington et al.
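
The grouping of posts and comments by their use of ‘Alexa’ and ‘Echo’ can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for this research: the function name `mention_group` and the case-insensitive whole-word matching are assumptions of the sketch. It also does not recognize non-personifying uses of the name (e.g. ‘Alexa app’), which would still require manual review.

```python
import re

# Whole-word, case-insensitive patterns (an assumption of this sketch).
ALEXA = re.compile(r"\balexa\b", re.IGNORECASE)
ECHO = re.compile(r"\becho\b", re.IGNORECASE)


def mention_group(text):
    """Return 'Alexa', 'Echo', 'both', or 'neither' for a post or comment."""
    has_alexa = bool(ALEXA.search(text))
    has_echo = bool(ECHO.search(text))
    if has_alexa and has_echo:
        return "both"
    if has_alexa:
        return "Alexa"
    if has_echo:
        return "Echo"
    return "neither"


# Example texts taken from Figure 6.
print(mention_group("Say hello to the all-new Echo Dot. Add Alexa to any room for only $49.99."))  # both
print(mention_group("I want to play the music I own on my Alexa."))  # Alexa
print(mention_group("How hard can it be to turn lights on at dusk? Echo is the only smart home device"))  # Echo
print(mention_group("When are we getting this in the uk?"))  # neither
```

Whole-word matching means that a fused token such as ‘EchoDot’ in a shortened link is not counted as a mention of ‘Echo’.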

1.2 Results

Before delving into the results, it is important to briefly mention the complications that surfaced
during data retrieval, filtering, and grouping. First, the Netvizz tool, however robust for
retrieving page data, has at least one weak spot in this respect. As displayed in Figures 2 and 3
above, the maximum number of posts that can be retrieved is 999 at a time. If a page has fewer than 999
posts – which is difficult to establish beforehand – while the researcher requests this maximum
retrieval of 999, Netvizz returns a tabular file of 999 posts that repeats some of the earliest
posts. In the case of the Amazon Echo page, the tabular file of 1A contained around 700
duplicates of the page’s first three posts. These duplicates had to be removed manually. As was
established by comparing data manually, however, this glitch did not skew any (meta)data of
these posts. Secondly, there was at least one major outlier in 1A, distorting the average results
and complicating any meaningful observations. In the following section I discuss how this
complication was resolved. Lastly, the coding of the types of interaction to establish degrees of
sociability remains a manual task and is thus exposed to subjectivity and bias. To tackle such
issues a second coder was employed and the inter-coder reliability was established at Cohen’s
κ = 0.85 (Cohen 37-40).
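
For readers unfamiliar with the measure, Cohen’s kappa corrects the observed agreement between two coders for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The Python sketch below shows one way to compute it. The function name and the two lists of sociability codes are hypothetical and serve only as an illustration; they are not the coding data of this research.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Compute Cohen's kappa for two equally long lists of category codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items that received the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal share per category.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical sociability codes (0-4) assigned by two coders to eight items.
first_coder = [0, 1, 2, 3, 4, 2, 1, 0]
second_coder = [0, 1, 2, 3, 4, 2, 2, 0]
print(round(cohens_kappa(first_coder, second_coder), 2))  # 0.84
```

In this invented example the coders disagree on one of eight items, giving an observed agreement of 0.875 and a chance agreement of 0.21875.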
The first research results of the 1A dataset revolve around the use of the words ‘Alexa’
and ‘Echo’ in posts, where the former represents a higher level of device personification than the
latter (Purington et al. 2855). As described in the methodology, posts mentioning ‘Alexa’ were
separated from those mentioning ‘Echo’, those mentioning both and those mentioning neither.
Importantly, posts mentioning ‘Alexa’ in a non-personifying manner (e.g. ‘Alexa app’) were
