MODELING CHOICE INTERDEPENDENCE IN A SOCIAL NETWORK


                     Jing Wang, Anocha Aribarg, and Yves F. Atchadé *
                                         July 2013

                              Forthcoming, Marketing Science

* Jing Wang is a specialist in the Asia-Pacific marketing and sales practice of McKinsey & Company,
233 Taicang Rd, Shanghai, China 200020. E-mail: jingwang16@gmail.com. Anocha Aribarg is
Associate Professor of Marketing at Stephen M. Ross School of Business, University of
Michigan, 701 Tappan Street, Ann Arbor, Michigan 48109. E-mail: anocha@umich.edu. Yves F.
Atchadé is Associate Professor of Statistics at the Department of Statistics, University of
Michigan, 454 West Hall, 1085 South University Avenue, Ann Arbor, Michigan 48109. E-mail:
yvesa@umich.edu. The paper is based on the first author’s dissertation. We thank Stefan
Witwicki for his assistance in JAVA and Flash programming to facilitate our data collection
process. We thank Natasha Zhang Foutz for great comments for our paper. Finally, we thank the
review team for providing us with valuable comments and directions for the paper.

ABSTRACT

This paper investigates how individuals’ product choices are influenced by the product choices

of their connected others, and how the influence mechanism may differ for fashion versus

technology-related products. We conduct a novel field experiment to induce and observe choice

interdependence in a closed social network. In our experiment, we conceptualize individuals’

choices to be driven by multi-attribute utilities and measure their initial attribute preferences

prior to observing their choice interdependence, as well as collecting network information. These

design elements help alleviate concerns in identifying social interaction effects from other

confounds. Given that we have complete information on choices and their sequence, we use a

Discrete-time Markov Chain (DMC) model. Nonetheless, we also use a Markov Random Field

(MRF) model as an alternative when the information on choice sequence is missing. We find

significant social interaction effects. Our findings show that while experts exert asymmetrically

greater influences for a technology-related product, popular individuals exert greater influences

for a fashion-related product. In addition, we find choices made by early decision makers to be

more influential than choices made later for the technology-related product. Finally, using the

MRF with snapshot data can also provide good out-of-sample prediction for a technology-

related product.

Keywords: social interactions, social network, social influence, homophily, conjoint experiment,

Markov random field

Researchers have long recognized the impact of social interactions among individuals in

a social network on their preference evolution and product adoptions. The rapid growth of social

networking websites has facilitated social interactions to an unprecedented level. In addition to

encouraging users to maintain current and build new connections on the site, Facebook

automatically passes along information about users’ site activities to their friends. For instance,

when users become a fan of a brand page or respond to an interactive ad on the site, their friends

receive a notification. These notifications are sent with the expectation that they may influence

others. Social interaction effects are also undoubtedly prevalent in offline contexts. Individuals’

decisions to purchase a hybrid vehicle, for example, may be influenced by recommendations

from their neighbors. Teenagers’ decisions to purchase certain brands of jeans can be influenced

by observing their peers’ choices. In this paper, we examine social interactions in a multi-

attribute choice context where we conceptualize choice interdependence in a social network as

being driven by individuals’ preference shifts induced by social interactions after controlling for

their initial preferences.

        Although extant research has attempted to measure social interaction effects (e.g., Nair,

Manchanda, and Bhatia 2010; Nam, Manchanda, and Chintagunta 2010; Tucker 2008), it does

not recognize that such effects may arise from different types of social influence. Prior

behavioral research (Bearden and Etzel 1982; Burnkrant and Cousineau 1975; Deutsch and

Gerard 1955) distinguishes a value-expressive from an informational social influence. While an

informational influence is driven by an individual’s goal to gain more knowledge, a value-

expressive influence is driven by someone’s goal to maintain and enhance their self-perception

or identity in relation to their peers. Given different goal orientations, consumers rely more on

the credibility of the source of influence when it comes to an informational influence and more

on desirable characteristics of the source when it comes to a value-expressive influence

(McGuire 1969). Given that we expect an informational influence to play a more important role

for a technology-related product and a value-expressive influence for a fashion-related product,

we examine how experts (i.e., more credible) versus popular (i.e., with a more desirable

characteristic) individuals are asymmetrically more influential for technology-related versus

fashion-related products. In addition, previous research suggests that two distinct groups of

influentials and imitators may play different roles in the diffusion of innovations (Van den Bulte

and Joshi 2007). That is, influentials tend to be opinion leaders (Katz and Lazarsfeld 1955) and

early adopters who influence, but are not or to a lesser degree influenced by, imitators. Provided

with information about the sequence of choices, we also assess whether individuals are more

influenced by the choices made by early decision makers, as opposed to the choices made later.

       Prior research has attempted to identify the causal effects of social interactions

(Hartmann et al. 2008; Manski 1993, 2000; Moffitt 2001). With observational data, social

interactions are likely confounded with homophily or endogenous group formation, correlated

unobservables, and simultaneity. We propose a novel conjoint experiment for studying social

interactions that helps alleviate these endogeneity concerns. Specifically, we examine choice

interdependence in a social network and conceptualize consumer choices to be driven by multi-

attribute utilities. We conduct a two-stage choice experiment with a pre-influence stage, followed

by a post-influence stage. The experiment also involves two product categories: a fashion-related

one and a technology-related one. In the pre-influence stage, we use a conjoint choice

experiment to measure each participant’s initial preferences for different product attributes that

govern the utilities of the product choices. We also collect information about their social

connections within a closed social network. In the post-influence stage, we conduct an incentive-

aligned choice experiment to induce and observe choice interdependence within the network.

This experimental design helps us to identify preference shifts as a result of social interactions

separately from initial preferences.

       To study choice interdependence and identify the social interaction effects, it is also

desirable for researchers to observe both individual choices and the sequence of choices

(complete data). We fit a discrete-time Markov chain (DMC) model to the complete data.

However, oftentimes researchers do not keep records of the sequence due to a capacity

constraint, or in offline contexts it may not even be feasible to observe the choice sequence. In

these situations, researchers have to work with cross-sectional data where realized choices are

recorded only at a particular time point (snapshot data). Given their theoretical connection, we

demonstrate how researchers can use a Markov random field (MRF) model as an alternative to a

DMC model when only snapshot data are available. Both DMC and MRF models are appropriate

for modeling choice interdependence in a social network because both models properly account

for the dependence structure in the data (i.e., the joint probability of individual choices).

       Controlling for individuals’ heterogeneous initial preferences, we find significant social

interaction effects based on both the DMC and MRF models. Our findings suggest that, on the

one hand, popular individuals (i.e., those with more connections) exert greater influence for a

fashion-related product where value-expressive influence is more prominent. On the other hand,

experts exert greater influence for a technology-related product where informational influence

dominates. We also find that individuals who make choices earlier are more influential in the

technology-related product category. This result alludes to the opinion leadership of early

decision makers for this product category. Finally, our model comparison shows good out-of-

sample predictive performance of the MRF model for the technology-related product.

The contribution of our paper is threefold. First, we assess how different types of

influences, informational versus value-expressive, lead experts versus popular individuals to

asymmetrically exert greater influence, respectively, for technology-related versus

fashion-related products. We also examine the potential disproportionate influence of early decision

makers. Second, coupled with network information, we demonstrate how researchers can use a

two-stage conjoint choice experiment, which is conceptualized based on the multi-attribute utility

framework (Keeney and Raiffa 1993), to identify individual preference shifts induced by social

interactions after controlling for their initial preferences. Our approach helps mitigate the

endogeneity concern. Finally, given that we use the DMC to model choice interdependence when

the information on both choices and their sequence is available, we examine how the

theoretically connected MRF model can be used as an alternative when the sequence information

is missing—a common occurrence in most online and offline situations.

        The paper is structured as follows. We begin with the literature review in relation to our

conceptual framework and then present our modeling framework. This is followed by a

description of our experimental design and data collection procedure. Our empirical analysis

section includes: i) descriptive statistics of the participants and their social network; ii) the

analysis of individual-specific initial preferences; iii) the detailed model specification and

estimation procedure; iv) model comparison; and v) estimation results. The discussion section

concludes our paper.

              LITERATURE REVIEW AND CONCEPTUAL FRAMEWORK

        There is a sizable body of literature on social interaction effects (also referred to as peer,

neighborhood, and social contagion effects) in the fields of economics, sociology, and marketing.

In economics, the primary focus is on identifying an “active” social interaction that gives rise to

a social multiplier using observational data (Brock and Durlauf 2001; Manski 1993, 2000).

Hartmann et al. (2008) distinguish active from passive social interactions; the former involves

two individuals’ actions being simultaneously affected by the actions of the other and resolved as

an equilibrium outcome (e.g., Hartmann 2010) while the latter treats the impact of an individual’s

action on the other’s action as being sequential in nature. Although the concept of active social

interactions originates in sociology (Granovetter 1978; Schelling 1971), the majority of work in

this discipline does not hinge upon an equilibrium assumption (i.e., focus on passive social

interactions), and the emphasis is placed on how the structure of individuals’ social network

affects how social interactions propagate information diffusion in the network (Granovetter

1973; Katz and Lazarsfeld 1955; Watts and Dodds 2007). Compared with sociology, not much

attention has been paid to the structure of a social network in the economic literature. Early

research in economics and sociology mainly proposes theoretical frameworks to characterize the

mechanism of social interactions without corroborating empirical evidence.

       Despite no explicit attempt to model social interactions, the marketing literature has long

recognized their role (i.e., influentials vs. imitators) in the diffusion of innovations (Bass 1969;

Van den Bulte and Stremersch 2004; Van den Bulte and Wuyts 2007). With the availability of

observational data especially in online contexts, research in marketing explicitly links social

interactions to firm performance and product adoptions (Godes et al. 2005; Hartmann et al.

2008). On the one hand, some researchers have examined the effects of aggregate word-of-

mouth (Chen, Wang and Xie 2011; Chevalier and Mayzlin 2006; Godes and Mayzlin 2004; Zhu

and Zhang 2010), online viral marketing activities (De Bruyn and Lilien 2008; Toubia, Freud,

and Stephen 2011), and online community participation (Algesheimer et al. 2010; Manchanda,

Packard, and Pattabhiramaiah 2013; Stephen and Toubia 2010) on aggregate sales. On the other

hand, others attempt to establish the causal effects of social interactions on adoption decisions at

the individual level (Nair, Manchanda and Bhatia 2010; Nam, Manchanda, and Chintagunta

2010; Tucker 2008; Zhang 2010). A few researchers examine social interactions in a multi-

attribute utility framework1 (Narayan, Rao, and Saunders 2011; Yang and Allenby 2003) or

explicitly collect network information (Nair, Manchanda, and Bhatia 2010; Narayan, Rao, and

Saunders 2011; Reingen et al. 1984). In studying choice interdependence, we conceptualize

individual choices to be driven by multi-attribute utilities and focus on the passive form of social

interactions. We incorporate explicit information on a network of individuals that is not

endogenously determined by choice behavior.

        None of the aforementioned research considers that social interaction effects may arise

from different types of social influence. Behavioral research (Bearden and Etzel 1982; Bearden,

Netemeyer, and Teel 1989; Burnkrant and Cousineau 1975; Deutsch and Gerard 1955)

distinguishes a value-expressive from an informational social influence. According to Kelman

(1958; 1961), while an informational influence operates through the internalization process, a

value-expressive influence operates through the identification process. An informational

influence is driven by individuals’ goal to gain more knowledge to make informed decisions.

Faced with uncertainty, individuals seek information from sources that are viewed as credible

(DeBono and Harnish 1988; Kelman 1958; McGuire 1969). In contrast, a value-expressive

influence2 is driven by someone’s goal to maintain and enhance their self-perception or identity

1 Some research relies on the random utility framework (Brock and Durlauf 2001; Hartmann 2010), but not the
multi-attribute utility framework.
2 Previous research distinguishes two types of normative influence: i) a value-expressive influence, which is
motivated by self-maintenance or enrichment and ii) a utilitarian influence, which is motivated by external rewards.
We focus only on a sub-type of normative influence, namely, the value-expressive influence, as it simplifies the
connection between the type of social influence and technology-related versus fashion-related products. Note,

by associating with a reference group (Belk 1988). These individuals would associate

(disassociate) themselves with a reference group they perceive as being positive (negative)

(Miller, McIntyre, and Mantrala 1993). As a result, they rely on desirable characteristics of the

information source, such as attractiveness (DeBono and Harnish 1988; Kelman 1958), as criteria

to judge the value of the information.

         Park and Lessig (1977) suggest that the extent to which different types of social influence

affect an individual’s brand selection varies by product category. Informational influence plays a

more important role for a technology-related product; therefore, we expect that individuals pay

more attention to the source credibility, and experts exert an asymmetrically larger influence in

this product category. In contrast, value-expressive influence dominates in a fashion-related

product; therefore, we expect that individuals focus more on the desirable characteristics of the

source, and popular individuals exert an asymmetrically larger influence in this product category.

The asymmetric effects of social interactions have been documented in the prescription drug

(Nair, Manchanda and Bhatia 2010) and video-messaging (Tucker 2008) categories. However,

the study contexts primarily involve informational influences. In addition, Nair, Manchanda, and

Bhatia (2010) examine asymmetric social interaction effects for prescription decisions—

decisions for others, not self. Tucker (2008) studies social interactions in the presence of network

externalities—a larger number of adopters enhance product performance and benefits. We study

choice interdependence in the absence of network externalities.

         In the context of diffusion of innovations, influentials may have a greater influence than

imitators (Van den Bulte and Joshi 2007; Van den Bulte and Wuyts 2007). Influentials are early

adopters of innovations, who are likely to be opinion leaders (Katz and Lazarsfeld 1955; Myers

however, despite their conceptual differences, researchers have had difficulty in empirically distinguishing value-
expressive and utilitarian influences (Burnkrant and Cousineau 1975; Bearden, Netemeyer and Teel 1989).

and Robertson 1972; Rogers and Cartano 1962). Czepiel (1974) reports that early adopters exhibit

great opinion leadership in the diffusion of technological innovation among competing firms.

Nair, Manchanda and Bhatia (2010) find that research-active specialists who are supposed to be

early decision makers exert asymmetrically higher influence. Although not in the context of

innovation adoptions, Trusov, Bodapati, and Bucklin (2010) also document that users who have

been members of a social networking site for a longer period of time (measured by the number of

months of membership) are more influential than others in stimulating activities on the site. Given

the information about the choice sequence, we also assess whether early decision makers (i.e.,

influentials) have a greater influence than later decision makers (i.e., imitators). In studying

choice interdependence, we examine how experts versus popular individuals, respectively, exert

asymmetrically greater influence for technology-related versus fashion-related products, and how

the choices made by early decision makers may be disproportionately more influential.

                                  MODELING FRAMEWORK

       To examine choice interdependence in a social network, it is desirable to have complete

longitudinal choice data where both choices and their sequence are observed. With the choice

sequence information, we can examine how observed choices in a particular order may be more

influential than others. It also allows us to model not only which choice is made, but also who is

likely to make a choice at a particular time. Snapshot data, on the other hand, are cross-sectional

choice data collected at a fixed time point. This type of data is easier and less expensive to

obtain. However, a lack of information on the choice sequence makes it difficult to understand

the dynamic process, as different processes can give rise to the same snapshot data. No matter

what type of data is available, we need to model all consumers’ choices jointly rather than

independently to properly preserve the dependence structure in the data. In modeling choice

interdependence, we treat the social network as being exogenous and fixed instead of modeling

its formation or properties (Ansari, Koenigsberg, and Stahl 2011; Iacobucci and Hopkins 1992;

Reingen and Kernan 1986; Watts and Dodds 2007).
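
The point that different processes can give rise to the same snapshot data can be made concrete with a small sketch. The three-person event list below is hypothetical and is not taken from our data; the function name `snapshot` and the coding of "no choice yet" as 0 are likewise assumptions made only for illustration.

```python
from itertools import permutations


def snapshot(events, n):
    """Collapse a time-ordered list of (person, choice) events into final choices.

    A value of 0 stands for "no choice made yet". Only the last recorded
    choice of each person survives in the snapshot.
    """
    final = [0] * n
    for person, choice in events:
        final[person] = choice
    return tuple(final)


# Hypothetical complete data: three people, each choosing one option once.
events = [(0, 2), (1, 2), (2, 1)]

# Every ordering of the same events is a distinct choice sequence ...
sequences = list(permutations(events))

# ... yet all of them collapse to the same cross-sectional snapshot.
snapshots = {snapshot(seq, n=3) for seq in sequences}

print(len(sequences), "sequences map to", len(snapshots), "snapshot")
```

With complete data the analyst knows which of the orderings actually occurred; with snapshot data only the collapsed vector is observed.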

Discrete-Time Markov Chain for Complete Data

       Besides sociology, social networks and their influences on individuals’ behavior have

also sparked the interest of researchers in other fields, such as statistics and computer science.

While some researchers (Hoff, Raftery, and Handcock 2002; Hunter 2007; Robins et al. 2007;

Wasserman and Pattison 1996) primarily focus on modeling the structure (i.e., connections

among nodes) and properties of a social network, others (Hill, Provost, and Volinsky 2006), like

us, attempt to model the behavior of the network nodes (e.g., choice decisions, as in our

study). For instance, Koskinen and Snijders (2007) and Snijders, Steglich, and Schweinberger

(2007) model the dynamics of network formation and behavioral evolution simultaneously by

collecting the snapshot data of network structure and behavioral outcomes at more than one time

point (e.g., every three months). However, the authors collect data at only a few time points. As such,

between two consecutive time points there are multiple processes that can lead to the same

observed behavioral outcomes. They hence use a continuous-time Markov process to fit the data

and treat the possible processes between consecutive time points as latent variables to be

augmented. With complete data, we do not need to rely on augmented variables and adopt a

simpler discrete-time Markov chain (DMC) model (Ross 1996).

       Following Koskinen and Snijders (2007), we assume at most one individual may make a

choice decision at a given time (i.e., no multiple decisions are allowed simultaneously). This

assumption is reasonable in the context of sequential choice like ours, as it is unlikely that two

individuals will make choice decisions at the exact same time. The assumption simplifies our

model specification, as we do not need to handle the possible dependence of the transitions made

at the same time. The state of the Markov chain hence has two parts: i) who is likely to make a

choice and ii) choices made by all n individuals in a network at a particular time. Again, we

emphasize that the state of the Markov chain involves all individuals’ choices rather than a single

person’s due to the interdependence. We denote “who is likely to make a choice” at time t by $c^t$,
which takes values from 1 to n, and individual i’s choice at time t by $Y_i^t$, which takes nominal
values 0, …, K. The state at time point t is $S^t = (c^t, Y_1^t, \ldots, Y_n^t)$. The first-order
Markovian property dictates that:

$$p(s^{t+1} \mid s^1, \ldots, s^t, \theta) = p(s^{t+1} \mid s^t, \theta). \tag{1}$$

          Under the assumption that only one transition is allowed, only one person’s choice
$Y_{c^{t+1}}^{t+1}$ can be different from $Y_{c^{t+1}}^{t}$, while all other choices remain the same,
$Y_j^{t+1} = Y_j^t$ for $j \neq c^{t+1}$. Then we have:

$$\begin{aligned}
p(s^{t+1} \mid s^t, \theta) &= p(c^{t+1} = i,\, y_i^{t+1} \mid c^t, y^t, \theta)\, I(y_{-i}^{t+1} = y_{-i}^{t}) \\
&= p(c^{t+1} = i \mid c^t, y^t, \theta)\, p(y_i^{t+1} \mid c^{t+1} = i, c^t, y^t, \theta)\, I(y_{-i}^{t+1} = y_{-i}^{t}),
\end{aligned} \tag{2}$$

where $y_{-i} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)$. The likelihood is
$p(s^1, \ldots, s^T \mid s^0, \theta) = \prod_{t=0}^{T-1} p(s^{t+1} \mid s^t, \theta)$, where
$y^0$ and $c^0$ are to be specified later. Researchers can postulate any probability form of interest
for $p(c^{t+1} = i \mid c^t, y^t, \theta)$. Conditional on $c^t$, there is great flexibility in specifying the
multinomial choice probability $p(y_i^{t+1} \mid c^{t+1} = i, c^t, y^t, \theta)$ (e.g., a multinomial logit). We include
$I(y_{-i}^{t+1} = y_{-i}^{t})$ for completion to assure that only one choice can be made at a time. Note that a

transition is determined by an individual attempting to make his or her choice. Given that the DMC

models everyone’s choices jointly, it preserves the interdependence within the system.
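
As an illustration of how the DMC likelihood factorizes, the following is a minimal sketch and not our actual specification (which is given in the empirical section). It assumes, purely for illustration, that the next decision maker is drawn uniformly from those who have not yet chosen, and that the choice probability is a multinomial logit whose utility adds a hypothetical influence term `gamma` times the number of friends currently holding each option. The names `base`, `friends`, and `gamma` are assumptions, and the dependence on the previous decision maker is ignored.

```python
import math


def choice_probs(i, y, base, friends, gamma):
    """Multinomial logit over options 1..K for person i, given current choices y."""
    utils = []
    for k in range(1, len(base[i]) + 1):
        n_friends_k = sum(1 for j in friends[i] if y[j] == k)
        utils.append(base[i][k - 1] + gamma * n_friends_k)
    m = max(utils)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]


def transition_prob(y, i, k, base, friends, gamma):
    """Probability that person i moves next, times the probability that i picks k."""
    undecided = [j for j, yj in enumerate(y) if yj == 0]
    if i not in undecided:
        return 0.0
    p_who = 1.0 / len(undecided)  # assumption: uniform over undecided people
    return p_who * choice_probs(i, y, base, friends, gamma)[k - 1]


def log_likelihood(sequence, n, base, friends, gamma):
    """Sum of log transition probabilities along an observed choice sequence."""
    y = [0] * n  # 0 means "no choice yet"
    ll = 0.0
    for i, k in sequence:
        ll += math.log(transition_prob(y, i, k, base, friends, gamma))
        y[i] = k  # only person i changes; all other choices stay fixed
    return ll


# Hypothetical example: three mutually connected people, two options.
friends = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
base = [[0.0, 0.0] for _ in range(3)]
sequence = [(0, 1), (1, 1), (2, 2)]
ll_no_influence = log_likelihood(sequence, 3, base, friends, gamma=0.0)
ll_with_influence = log_likelihood(sequence, 3, base, friends, gamma=1.0)
```

Updating a single entry of `y` at each step plays the role of the indicator term in the transition probability: everyone other than the current decision maker keeps the previous choice.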

Markov Random Field for Snapshot Data

        Recognizing that researchers often may not have information on the choice

sequence, we examine the use of a Markov Random Field (MRF) model with snapshot data.

Despite the differences in the formulation and type of data they handle, the DMC and MRF are

closely connected. We provide a proof in online Appendix A that under certain conditions, the

DMC converges to the MRF or, in other words, the MRF is an equilibrium model for the DMC.

        The MRF is the general generating function for a multivariate logistic (MVL) choice

model (a detailed description of the MRF structure is provided in online Appendix B). MVL

models have been used in marketing for studying consumer market baskets (Russell and Petersen

2000), multiple category price competition (Song and Chintagunta 2006), multiple category

incidence and quantity (Niraj et al. 2008), product recommendations (Moon and Russell 2008),

and coordinated choice decisions (Yang et al. 2010). An MVL that involves binary choices is

referred to as an autologistic model, and one that involves multinomial choices as an auto-model.

        An auto-model (Besag 1974, 1975) is essentially a special case of the MRF. Denote

individual i’s choice by yi; i=1, …, n. In Yang et al. (2010), each choice is dummy coded as a

vector of 0 and 1, e.g., yi = (0, 1, 0) indicates among the three choice options, individual i

chooses option 2. The joint distribution of the auto-model takes the form of

$$p(y_1, \ldots, y_n) = \exp\Big( \sum_{1 \le i \le n} y_i U(y_i) + \sum_{1 \le i < j \le n} \beta_{ij}\, y_i' y_j \Big) \Big/ z(\beta).$$

$U(y_i)$ is a function capturing individual i’s own preferences, $\beta_{ij}\, y_i' y_j$ captures choice
interdependence, and z(β) is the normalizing constant that sums over all possible values of
$y = (y_1, \ldots, y_n)$. The parameter $\beta_{ij} = 0$ when choice decisions of individual i and j are
independent. Given the restrictive specification of $\beta_{ij}\, y_i' y_j$, the auto-model can only
accommodate symmetric social influence effects $\beta_{ij} = \beta_{ji}$. The MRF

allows for any functional form to capture choice interdependence and thus can accommodate

asymmetric effects of social interactions (a technical discussion is in online Appendix B).
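
For a very small group, the auto-model joint distribution can be computed exactly by enumeration. The sketch below is illustrative only: it assumes choices are coded 0, …, K−1, so that under dummy coding the product of two choice vectors equals 1 when two individuals pick the same option and 0 otherwise, and it stores the pairwise interaction parameters in a hypothetical upper-triangular list `beta`. The names `own`, `beta`, and `joint_distribution` are assumptions.

```python
import math
from itertools import product


def unnormalized(y, own, beta):
    """exp(own-preference terms + pairwise terms) for one configuration y."""
    n = len(y)
    s = sum(own[i][y[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:  # dummy-coded inner product equals 1 only on a match
                s += beta[i][j]
    return math.exp(s)


def joint_distribution(own, beta, K):
    """Exact joint probabilities and normalizing constant via full enumeration."""
    n = len(own)
    states = list(product(range(K), repeat=n))
    weights = [unnormalized(y, own, beta) for y in states]
    z = sum(weights)  # sums over all K**n configurations
    return {y: w / z for y, w in zip(states, weights)}, z


# Hypothetical example: three people, two options, one interacting pair (0, 1).
own = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
beta = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
probs, z = joint_distribution(own, beta, K=2)
p_agree = sum(p for y, p in probs.items() if y[0] == y[1])
print(round(p_agree, 3))
```

Because each pair enters through a single parameter, the influence of person 0 on person 1 cannot differ from that of person 1 on person 0 in this form, which is the symmetry restriction noted above.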

        To incorporate influence asymmetry, some necessary conditions need to be met for both

DMC and MRF. The first condition involves a clearly defined identity of each node at either the

individual or role level (e.g., some people are experts on high-tech products while others are

novices). The second condition is that statistics corresponding to the parameters for asymmetry

must have different values, a requirement that is satisfied most of the time in a directed network. To

identify asymmetric influence at the individual level, researchers also need to observe more than

one choice for each individual. If node identities are defined at the role level, one decision per

individual will be sufficient as long as aggregate statistics associated with each role differ.

        As the group size grows, the MRF, as well as the auto-model, suffers from the intractable

normalizing constant problem. Specifically, because z(β) involves summing over all possible

values of y, its computation can become intractable. Given that our paper involves a large social

network, we demonstrate the use of an efficient approximate sampling algorithm (Murray,

Ghahramani, and MacKay 2006; Wang and Atchadé 2012; online Appendix C) to circumvent the

intractable normalizing constant problem in estimating an MRF model.
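
To see why the normalizing constant becomes intractable while individual conditional probabilities stay cheap, consider the following sketch. It is not the estimation algorithm referenced above; it only shows, for a symmetric auto-model with assumed names `own` and `beta`, that the number of terms in the normalizing constant grows as K to the power n, whereas the conditional distribution of one person's choice given everyone else's involves only K terms because the constant cancels. A plain Gibbs sweep built on these conditionals is included as one standard way to simulate from such a model for fixed parameters.

```python
import math
import random


def n_configurations(n, K):
    """Number of terms summed in the normalizing constant."""
    return K ** n


def conditional_probs(i, y, own, beta, K):
    """p(y_i = k | all other choices); the normalizing constant cancels."""
    scores = []
    for k in range(K):
        s = own[i][k]
        for j in range(len(y)):
            if j != i and y[j] == k:
                s += beta[min(i, j)][max(i, j)]  # upper-triangular storage
        scores.append(s)
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [x / total for x in w]


def gibbs_sweep(y, own, beta, K, rng):
    """Resample each person's choice once from its conditional distribution."""
    for i in range(len(y)):
        probs = conditional_probs(i, y, own, beta, K)
        y[i] = rng.choices(range(K), weights=probs)[0]
    return y


# Hypothetical sizes: the count of configurations explodes with group size.
for n in (5, 20, 50):
    print(n, n_configurations(n, K=5))

# Hypothetical two-person example with one interaction parameter.
own = [[0.0, 0.0], [0.0, 0.0]]
beta = [[0.0, 1.0], [0.0, 0.0]]
p = conditional_probs(0, [0, 0], own, beta, K=2)
state = gibbs_sweep([0, 0], own, beta, K=2, rng=random.Random(0))
```

Simulating configurations in this way does not by itself yield parameter estimates; it only illustrates why sampling-based methods are attractive when the normalizing constant cannot be enumerated.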

                   EXPERIMENTAL DESIGN AND DATA COLLECTION

Experimental Design

        Overview. We conduct a two-stage conjoint choice experiment: a pre-influence stage and

a post-influence stage. The experiment also involves two products: a bundle of university sports

paraphernalia and a Bluetooth® headset. Our multi-stage conjoint design is in line with previous

research examining reference group influence3 (Aribarg, Arora, and Bodur 2002; Arora and

Allenby 1999; Narayan, Rao, and Saunders 2011) using conjoint experiments. In the pre-influence

stage, for each product we use a standard conjoint choice experiment (Haaijer and Wedel 2003)

to measure each participant’s initial preferences (i.e., part-worths) for different product attributes

that govern the utilities of the product choices they make. We then collect information about the

participants’ connections within a relatively well-connected and closed social network. Our

network involves the undergraduate students who took an introduction to marketing class in the

fall of 2009 at a Midwest university. The majority of participants knew each other. We expect

this network to be denser than that of a randomly selected group of students.
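
The role of part-worths in governing choice utilities can be sketched as follows. This is a generic illustration under an additive utility and multinomial logit assumption, not our estimation procedure; the attribute names, levels, and part-worth values below are hypothetical and do not correspond to the attributes used in the experiment.

```python
import math


def utility(profile, part_worths):
    """Additive multi-attribute utility: sum of the part-worths of the profile's levels."""
    return sum(part_worths[attr][level] for attr, level in profile.items())


def choice_probabilities(choice_set, part_worths):
    """Multinomial logit probabilities over the profiles in a choice set."""
    utils = [utility(p, part_worths) for p in choice_set]
    m = max(utils)  # subtract the maximum for numerical stability
    w = [math.exp(u - m) for u in utils]
    total = sum(w)
    return [x / total for x in w]


# Hypothetical part-worths for one participant.
part_worths = {
    "color": {"black": 0.0, "silver": 0.5},
    "battery": {"short": 0.0, "long": 1.0},
}

# Hypothetical two-option choice set.
choice_set = [
    {"color": "black", "battery": "short"},
    {"color": "silver", "battery": "long"},
]

probs = choice_probabilities(choice_set, part_worths)
print([round(p, 3) for p in probs])
```

Given part-worths measured in the pre-influence stage, the same function can score a choice set that the participant has never seen, which is what allows initial preferences to be controlled for in the post-influence stage.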

        In the post-influence stage, we conduct a field conjoint experiment where we ask

participants to enter two raffles as a token of appreciation—in addition to class credits—for their

participation in the pre-influence stage. Each participant may opt out from the raffles. Our design

is consistent with the incentive-aligned design discussed in Ding (2007). We instruct the

participants that they will receive an email with a webpage link where they can make a prize

choice—from four possible options—which they will receive should they win the raffle for each

product. Each of these prize options is described by product attributes for which we measure

individual-specific part-worths of the participants in the pre-influence stage. Participants make a

series of choices in the pre-influence stage but only make one prize choice for each product in

the post-influence stage. The choice set in the post-influence stage also does not overlap with any

choice set in the pre-influence stage. To induce choice interdependence, we show the participants

3
 We study individual choice decisions, and not joint (Aribarg, Arora, and Bodur 2002; Aribarg, Arora, and Kang
2010; Arora and Allenby 1999) or coordinated choice (Hartman 2010; Yang et al. 2010) decisions.

on the webpage the choices made by their friends (including “no choice” being made). We let the

choice interdependence process evolve organically, and collect longitudinal data of choices

sequentially made. All choices are recorded in real time (complete data). As a result, we also

obtain cross-sectional data that are the realization of the complete data at the end of the study

(snapshot data).

         Fashion-related vs. Technology-related Products. We regard university sports

paraphernalia as a fashion-related product and a Bluetooth headset as a technology-related

product. Choice decisions for both products are likely affected by reference group influences, as

they can be easily “seen or identified by others” (Bearden and Etzel 1982; Bourne 1975). The

products are also of interest to the student population. Park and Lessig (1977) develop a series of

Likert scale questions to measure the extent to which informational and value-expressive

influences are relevant to a consumer’s brand selection for a product.4 We use these questions to

validate our product selection.

         We recruit 78 participants from the undergraduate student population at the same

university. Each participant rates both products on this scale. We randomize the order of

products being rated across participants.5 The findings show that for sports paraphernalia, the

mean value-expressive influence (4.139, Std. Dev. = 0.549) is significantly higher than the mean

informational influence (2.994, Std. Dev. = 0.980; t = 9.177), whereas for the Bluetooth headset,

the mean informational influence (4.034, Std. Dev. = 0.655) is significantly higher than the mean

4
  We include all 14 items that are supposed to measure three types of influence, informational, utilitarian, and value-
expressive, although we only report the results for the two types of influence that are relevant to our research. The
complete analysis is available upon request.
5
  A factor analysis shows that for the value-expressive influence, all five items form a single factor. However, for
the informational influence in the Bluetooth category, two out of five items appear to capture a different underlying
factor from the other three that capture the majority of the explained variance. Upon further investigation, we decide
to use only these three statements, as they are also more relevant to the product categories in our study. We drop two
statements: i) The brand which the individual selects is influenced by observing a seal of approval of an independent
testing agency and ii) The individual’s observation of what experts do influences his choice of a brand (such as
observing the type of car the police drive or the brand of TV repairmen buy).

value-expressive influence (3.709, Std. Dev. = 0.791; t = 2.884). The mean informational

influence is also significantly higher for the Bluetooth headset than for sports paraphernalia

(4.034 vs. 2.994; t = 7.745). In contrast, the mean value-expressive influence is significantly

higher for sports paraphernalia than for the Bluetooth headset (4.139 vs. 3.709; t = 4.509). These

results validate the use of university sports paraphernalia as a product that involves a value-

expressive influence and a Bluetooth headset as a product that involves an informational

influence.

       Identification of Social Interaction Effects. Observed correlation in the behavior of

individuals who belong to the same reference group may not indicate that someone’s action has a

causal effect on the actions of the others in the group. Previous research has extensively

discussed the identification (i.e., the causal inference) of social interaction effects from potential

confounds, including homophily or endogenous group formation, correlated unobservables, and

simultaneity (Manski 1993, 2000; Moffitt 2001). Homophily describes the reverse causality

effect of social interactions where the correlated behavior of individuals within a reference group

results from the individuals’ tendencies to form social connections with others who share similar

tastes and preferences, and not a causal effect of someone’s behavior on another.

       In economics, the collection of exogenous social network information is rare. Social

connections are sometimes inferred from the observed behavior whose sensitivity to social

interactions researchers aim to study (Tucker 2008), or the geographic and/or demographic

proximities of individuals (Bell and Song 2007; Nam, Manchanda, and Chintagunta 2010; Yang

and Allenby 2003). Such procedures have the potential to introduce a homophily problem into a

study. Different methods are proposed as remedies. With the availability of panel data, one can

specify a model with individual fixed (Nair, Manchanda and Bhatia 2010) or random effects

(Hartman 2010) to control for individual preferences for a behavior or an action. Alternatively,

researchers can simultaneously model the group formation process, which requires exclusion

restrictions (Steglich, Snijders, and Pearson 2010; Tucker 2008) or use other exogenous variables

to help separate social interaction effects from homophily (Nam, Manchanda, and Chintagunta

2010). Aral, Muchnik, and Sundararajan (2009) propose a dynamic matched sample estimation

framework to tease out correlated product adoption that can be explained by observable

individuals’ characteristics and behavior.

        Bramoulle, Djebbari, and Fortin (2009), Hartman et al. (2008), and Manski (1993, 2000)

advocate more research to collect network information that is exogenous to the focal behavior to

circumvent homophily. The social network in our study is determined by the frequency with

which participants interact, and not their choice behavior. However, one may expect individuals

who interact frequently to have similar preferences for certain product attributes, such as a

particular brand (Reingen et al. 1984). We control for potential homophily by accounting for

each participant’s utility (i.e., overall preference) for each prize option presented to him or her in

the post-influence stage. Each option’s utility is derived from each participant’s part-worths—

capturing his or her initial preferences for different product attributes—estimated from the pre-

influence stage. In addition to network information, Manski (2000) encourages more research to

explicitly elicit preferences using an experimental approach. Our design employs both elements.
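The homophily control described above can be illustrated with a minimal sketch, assuming dummy-coded attributes so that an option's utility is the sum of the part-worths of its levels. The part-worth values, the two options, and the function names `option_utility` and `logit_shares` are hypothetical and serve only as an illustration; they are not estimates or code from the study.

```python
import math

# Hypothetical part-worths for one participant (illustrative values only, not
# estimates from the study). Each key is an (attribute, level) pair; the omitted
# level of each dummy-coded attribute is the baseline with part-worth zero.
part_worths = {
    ("brand", "Plantronics"): 0.4,
    ("color", "silver"): -0.1,
    ("weight", "8 grams"): 0.3,
    ("talk_time", "8 hours"): 0.6,
    ("noise_cancellation", "yes"): 0.9,
}

def option_utility(option, part_worths):
    """Deterministic utility: sum of part-worths for the option's levels."""
    return sum(part_worths.get((attr, level), 0.0) for attr, level in option.items())

def logit_shares(utilities):
    """Multinomial logit choice probabilities implied by a list of utilities."""
    peak = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - peak) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical prize options described by the Bluetooth headset attributes.
option_a = {"brand": "Motorola", "color": "black", "weight": "13 grams",
            "talk_time": "8 hours", "noise_cancellation": "no"}
option_b = {"brand": "Plantronics", "color": "silver", "weight": "13 grams",
            "talk_time": "8 hours", "noise_cancellation": "yes"}

utilities = [option_utility(o, part_worths) for o in (option_a, option_b)]
shares = logit_shares(utilities)
```

Under these assumed values, the second option has the higher baseline utility, so any observed tendency to choose the first option after seeing friends choose it could not be attributed to the participant's initial attribute preferences.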

        Shalizi and Thomas (2011) distinguish latent homophily from homophily. The authors

argue that even when researchers tease out homophily by conditioning on previous behavior or

action (e.g., jumping off a bridge) and observable individual traits including preference (e.g.,

fondness of jumping off a bridge), the presence of unobservable individual traits that drive the

preference for the action (e.g., a thrill-seeking propensity trait) and engender the social tie (e.g., a

bond formed by a common thrill-seeking propensity) still confounds the social interaction effects

with latent homophily.6 Most research studying social interactions cannot easily measure the

latent preference driving the action under investigation (e.g., technology adoption, school

performance), and controls for preference only at the action level (e.g., adding fixed effects to a

model intercept). In contrast, conjoint experiments allow us to precisely measure individuals’

intrinsic preferences at the level of the action’s characteristics (i.e., preferences for attributes of a

choice). In a way, we measure the latent attribute preferences (e.g., the element of thrill-seeking

in jumping off a bridge) that drive the utility (i.e., overall preference) of each choice option. Our

multiple choice tasks in the pre-influence stage are created based on an optimal design, which

aims to maximize the efficiency of part-worth estimates. As a result, each choice option,

including those in the post-influence stage, is hypothetically created and entirely determined by

the attributes that characterize the option. As such, the possibility that a prize choice is

influenced by latent factors other than a participant’s latent initial attribute preferences and

induced social interactions is minimal.

           Correlated unobservables can be explained by individuals behaving similarly because of

their likely exposure to the same external stimuli (e.g., individuals in the same ZIP Code receive

the same promotion from a car dealership). The use of geographic and demographic proximities

to infer social relations can further exacerbate this type of endogeneity. The inclusion of

individual fixed effects helps mitigate both homophily and correlated unobservables concerns.

However, with observational data, researchers may need to also include time fixed effects and

additional variables that help control for time-varying changes that are specific to each individual

(Nair, Manchanda, and Bhatia 2010). In our study, all participants make their prize choices on an

6
    The authors provide the empirical support for their arguments via simulation studies.

identical webpage within a short time frame. There are no promotions on the sports

paraphernalia items we use during the raffle period, and the Bluetooth headset options are

hypothetical. In fact, outside marketing activities should not concern us, as our focal behavior is

prize choice, which is costless to the participants and everybody has an equal probability of

winning. The only difference across the participants is the information about prize choices made

by their friends7. We thus expect the impact of correlated unobservables to be minimal.

         Finally, the simultaneity issue may arise if individuals make choices at the same time.

Given our emphasis on studying passive social interactions in a sequential choice context, our

experimental design separates pre-influence from post-influence choices and exploits the choice

sequence information. The imposition of temporal ordering makes simultaneity irrelevant in

our research (Hartman et al. 2008; Manski 2000; Narayan, Rao, and Saunders 2010).

         More recent research advocates the use of randomized experiments to identify social

interaction effects. Several experimental procedures have been proposed: i) a random assignment

of group membership (Moffitt 2001); ii) with mutually exclusive reference groups, members in

each group are randomly assigned to a treatment (e.g., a policy intervention) or a control

condition (Moffitt 2001); iii) a random assignment of individuals to one of the treatment

conditions where different types of social cues are present or a control condition where the social

cue is absent (Aral and Walker 2011; Bakshy et al. 2012); and iv) for each adopter, randomly

assign some of his or her peers to receive information about the adoption and the rest to receive no

information (Aral and Walker 2012). Despite the merits (Rubin 1974), a few caveats exist for

these approaches. The random assignment of group membership can change the structural nature

of social interactions; hence, the less obtrusive nature of the last three approaches makes them

7
  The participants also see the names of their friends who have not made their choices by the time they log on to the
site.

more appealing. Nonetheless, the last three approaches are more appropriate for a social network

wherein reference groups do not overlap. Overlapping connections across members can create

leakage (Aral and Walker 2011) or incompatibility across conditions. In our social network,

friendships are intertwined across participants. Therefore, it is not feasible to create

non-overlapping treatment and control conditions that allow us to conduct simple mean comparison

analyses. Obtaining good data from a randomization approach also requires having an adequate

sample size in each experimental condition. As a result, our research uses a statistical modeling

approach where we control for initial preferences directly as an alternative to using a randomized

controlled experiment.

Data Collection Procedure

        As mentioned earlier, our data collection procedure consists of pre- and post-influence

stages. The pre-influence stage involves measuring initial preferences for different attributes of a

university sports paraphernalia bundle and a Bluetooth headset, and collecting the participants’ network

information. For sports paraphernalia, we create product bundles based on four attributes: jacket

style (four levels: style 1: blue, style 1: gray, style 2: blue, style 2: gray); t-shirt color (two levels:

blue and white); hat style (two levels: 1 and 2); and price (four levels: $39, $49, $59, and $69).

We include six attributes for the Bluetooth headset: brand (two levels: Motorola and

Plantronics); color (two levels: black and silver); weight (two levels: 8 and 13 grams); talk time

per battery charge (two levels: 5 and 8 hours); noise cancellation (two levels: yes and no); and

price (four levels: $49, $59, $69, and $79). For each product, a blocked design involving ten sets

of twelve quadruples is created using an optimal design generated by SAS OPTEX (Kuhfeld,

Tobias, and Garratt 1994). Participants are randomly assigned to one of the ten sets and complete

twelve choice tasks (each with four options). We also include a fifth, baseline option of “an

ear-plugged wired headset at $29” for the Bluetooth headset category.
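To give a sense of the candidate space from which the choice tasks are drawn, the sketch below enumerates the full factorial of the attribute levels listed above. It only counts candidate profiles; it does not reproduce the optimal blocked design generated by SAS OPTEX, and the dictionary keys and the function name `full_factorial` are illustrative assumptions.

```python
from itertools import product

# Attribute levels as listed in the text; the dictionary keys are illustrative names.
sports_attributes = {
    "jacket_style": ["style 1: blue", "style 1: gray", "style 2: blue", "style 2: gray"],
    "tshirt_color": ["blue", "white"],
    "hat_style": ["1", "2"],
    "price": [39, 49, 59, 69],
}
headset_attributes = {
    "brand": ["Motorola", "Plantronics"],
    "color": ["black", "silver"],
    "weight_grams": [8, 13],
    "talk_time_hours": [5, 8],
    "noise_cancellation": ["yes", "no"],
    "price": [49, 59, 69, 79],
}

def full_factorial(attributes):
    """Return every combination of attribute levels as a list of dicts."""
    names = list(attributes)
    return [dict(zip(names, combo)) for combo in product(*attributes.values())]

sports_profiles = full_factorial(sports_attributes)
headset_profiles = full_factorial(headset_attributes)
print(len(sports_profiles), len(headset_profiles))  # 64 128
```

With ten sets of twelve tasks and four options per task, each product's design fills 480 option slots, so profiles from the 64 (or 128) candidates necessarily recur across sets.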

                                              Figure 1
                                Webpage for Sports Paraphernalia Raffle

         To collect social network information, we first ask the participants to nominate ten

students in the class with whom they have interacted most frequently.8 To ensure privacy,9 we

8
  To facilitate the nomination process, our JAVA program has a built-in search tool. Participants can easily locate
the nominees’ names by typing part of their first or last names. The search tool helps us avoid typos and solves the
problem that participants may not necessarily remember the full name of all the nominees.
9
  This procedure was included at the request of the Institutional Review Boards of Health Sciences and
Behavioral Sciences (IRB-HSBS).

later provide the participants with an option to indicate to which nominees they are unwilling to

disclose their choice information. We then invite the participants to enter the two raffles. We

finally collect demographic information including gender, time spent on email daily, confidence

in apparel taste, knowledge about cell phone headset, and interest in winning the products.

         In the post-influence stage, we first launch the online raffle for sports paraphernalia. All

participants receive an email that includes the URL that directs them to a webpage. On the page,

they can select one of four sports paraphernalia bundles they wish to receive should they win the

raffle. The four prize options are described by the same attributes as those used in the pre-

influence stage (Figure 1).10 The four options are relatively close to one another with regard to

their overall utilities (computed from the part-worth estimates obtained from the pre-influence

stage) and pre-tested with a similar pool of participants to have similar choice shares.

         To facilitate choice interdependence, we show the participants the choices that were

previously made by their “friends.”11 One’s friends are other participants who specify him or her

as someone they have interacted with frequently and agree to share choice information with in

the pre-influence stage. For sports paraphernalia, we expect that the participants’ product

decisions may also be influenced by a larger-scale social interaction based on gender (i.e., gender

norm) (Brock and Durlauf 2001; Miller and Prentice 1996). To examine whether the influence of

gender norm can co-exist with the social influence at the reference group level (i.e., global vs.

local social interactions), we show each participant the aggregate proportions of other

participants choosing each option. The aggregate proportions are computed based on: 1) all

participants who have already chosen each option; 2) only female participants who have already

10
   We omit price, as including price may have prompted students to choose the most expensive bundle.
11
  To further induce social interactions, the experiment features the real-time delivery of friends’ choices during the
raffle period. The procedure works as follows. The first time a participant makes a choice or every time he or she
changes a choice, emails are sent out to his or her “friends.” The email informs each participant’s friends about his
or her most recent activity, and urges the friends to visit the raffle page. The webpage URL is included in the email.

chosen each option; and 3) only male participants who have already chosen each option at a

particular time point. The participants can change their prize choice as often as they wish before

the end of the raffle. We inform them that if they do not make a choice, one of the four bundles

will be drawn randomly as their prize should they win the raffle. Our database is updated in real

time to ensure that the webpage offers up-to-date information about the friends’ activities. To

increase the response rate, we send an email reminder in the middle of the raffle.
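The three aggregate proportions shown on the paraphernalia raffle page can be sketched as follows. This is a minimal illustration that assumes the denominator is the number of participants (overall or of the given gender) who have made a choice at that point in time; the function name `choice_shares` and the toy data are hypothetical.

```python
from collections import Counter

def choice_shares(choices, genders, gender=None):
    """Share of each option among participants who have already chosen.

    choices: dict participant_id -> option label, or None if no choice yet.
    genders: dict participant_id -> "F" or "M".
    gender:  restrict to one gender, or None for all participants.
    """
    made = [opt for pid, opt in choices.items()
            if opt is not None and (gender is None or genders[pid] == gender)]
    counts = Counter(made)
    n = len(made)
    return {opt: c / n for opt, c in counts.items()} if n else {}

# Toy snapshot: participant 4 has not yet chosen and is excluded from all shares.
choices = {1: "A", 2: "B", 3: "A", 4: None, 5: "B", 6: "A"}
genders = {1: "F", 2: "F", 3: "M", 4: "M", 5: "M", 6: "M"}

overall = choice_shares(choices, genders)
female = choice_shares(choices, genders, gender="F")
male = choice_shares(choices, genders, gender="M")
```

Because participants can change their choice until the raffle closes, these shares would be recomputed from the current state of the database each time the page is loaded.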

       A week later, we launch the second raffle for the Bluetooth headset. The setup is very

similar to the paraphernalia raffle, except that participants do not see the aggregate proportions

of males and females who are choosing each Bluetooth option. We do not expect a gender norm

to be relevant for this product category. In addition, the default option for the participants who do

not make a prize choice is a wired headset priced at $29 (the outside option we use in the

conjoint experiment in the pre-influence stage). The raffle is closed after a week. Two email

reminders are sent out during this period. A winner is drawn for each raffle at the conclusion of

the study.

                                    EMPIRICAL ANALYSIS

Descriptive Statistics of Participants and Their Social Network

       Participant Profile. There are 292 participants in the pre-influence stage, of whom

215 agree to participate in the raffles in the post-influence stage. The general demographic

profiles of the participants who agree to enter the raffles and those who do not are not

significantly different, except that those in the raffles are more interested in the products they can

win (Table 1). All the analysis discussed below is based on these 215 participants.

Among the 215 raffle participants, 21.4% rate their confidence in apparel taste greater

than 6 (on a 1-7 scale, 1 being least confident), and 18.6% rate their level of knowledge about cell

phone headsets greater than 5 (on a 1-7 scale, 1 being not knowledgeable). More than half of the

participants are “very interested” in winning a sports paraphernalia bundle, with a rating of 7 on

a 1-7 scale (1 being “not interested at all”), and the mean rating is 5.99. For the Bluetooth

headset, interests are more evenly distributed, although one-third of the ratings are 7, and the

mean rating is 4.51 (on a 1-7 scale).

                                                   Table 1
                                             Descriptive Statistics

Variables                          Participants’ mean    Non-participants’ mean    Test statistic¹
                                   (Std. Dev.)           (Std. Dev.)               (df)
Gender                             60.9% male            63.6% male                .08 (1)
Apparel confidence                 5.52 (1.18)           5.51 (1.30)               .09 (123)
Headset knowledge                  3.59 (1.75)           3.27 (1.70)               1.38 (138)
Time spent on email                38.6%²                28.6%                     5.43 (3)
Interest in sports paraphernalia   5.99 (1.34)           3.22 (2.06)               10.05 (78)**
Interest in Bluetooth headset      4.51 (2.12)           2.35 (1.77)               8.11 (119)**
¹ Chi-square test is used for gender and time spent on email. Independent-sample t-test is used for the remaining
variables.
² Percentage of participants spending more than an hour on email every day.
** Two-sided p-value is < .05.

         Social Network Profile. Figure 2 depicts the directed social network among the raffle

participants. Each node represents a participant. Recall that in our experiment, one’s friends are

defined as participants who nominate him or her as one of those with whom they have frequently

interacted and agree to share choice information. In other words, a participant cannot choose his

or her friends. It is other participants who nominate themselves to be his or her friends. An arrow

indicates the direction of the nomination. For instance, an arrow from node 110 to 2 means

participant 110 nominates participant 2, and becomes one of participant 2’s friends. A

double-headed arrow between 185 and 196 means both participants nominate each other. Among the

628 pairs of connected participants (the same two individuals are only counted as one pair),

26.6% of them have arrows both in and out (i.e., have reciprocal connections).
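As a rough consistency check, the pair counts above can be related to the network statistics reported in Table 2. The arithmetic below assumes that each connected pair contributes one directed tie and each reciprocal pair contributes a second one, with n = 215 raffle participants; the rounding of 26.6% of 628 to 167 reciprocal pairs is an inference, not a figure stated in the text.

```latex
\begin{align*}
\text{reciprocal pairs} &\approx 0.266 \times 628 \approx 167,\\
\text{directed ties} &= 628 + 167 = 795,\\
\text{mean in/out degree} &= 795 / 215 \approx 3.7,\\
\text{density} &= 795 / (215 \times 214) \approx 1.7\%,\\
\text{reciprocity} &= (2 \times 167) / (215 \times 214) \approx 0.73\%.
\end{align*}
```

These values agree with the mean degree, density, and percentage of reciprocal connections reported in Table 2.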

                                             Figure 2
                                       Social Network Plot

       It is not surprising that a small group of participants in the network are substantially more

popular (i.e., many other students nominate each of them as one of their “friends”) than the rest.

We use the size of each node to represent the number of friends each participant has. The bigger

the node, the larger the number of friends he or she has (i.e., more arrows in). Participants 9, 110,

and 140 are examples of the popular participants. On the flip side, we use color to denote the

number of others each participant nominates. The three orange nodes, 24, 50, and 121, each

nominate nine students (nine arrows out), though they themselves do not get as many

nominations from others (i.e., fewer arrows in). From this plot, we do not see a strong correlation

between popularity (being nominated) and the number of nominations. Nodes of the same size

can thus have a variety of colors.

                                                         Table 2
                                                Network Structure Statistics

 Effects                                       Network Statistics                                                                     Values
 Mean in/out degree                            $\sum_{1 \le i,j \le n} e_{ij} / n$                                                    3.7
 Variance of in degree (prestige)              $\mathrm{var}(\sum_{1 \le j \le n} e_{ji})$                                            7.0
 Variance of out degree                        $\mathrm{var}(\sum_{1 \le j \le n} e_{ij})$                                            6.0
 Percentage of isolates (no arrow in or out)   $\#(\sum_{1 \le j \le n} e_{ji} = \sum_{1 \le j \le n} e_{ij} = 0) / n$                2.3%
 Percentage of arrow out only                  $\#(\sum_{1 \le j \le n} e_{ji} = 0,\ \sum_{1 \le j \le n} e_{ij} > 0) / n$            4.2%
 Percentage of arrow in only                   $\#(\sum_{1 \le j \le n} e_{ji} > 0,\ \sum_{1 \le j \le n} e_{ij} = 0) / n$            12.6%
 Density                                       $\sum_{1 \le i,j \le n} e_{ij} / n(n-1)$                                               1.7%
 Percentage of reciprocal connections          $\sum_{1 \le i < j \le n} 2 e_{ij} e_{ji} / n(n-1)$                                    .73%
 Percentage of transitive triads               $\sum_{1 \le i,j,i' \le n,\ i \ne j \ne i'} e_{ij} e_{ji'} e_{ii'} / n(n-1)(n-2)$      .007%
$e_{ij} = 1$ if i nominates j, 0 otherwise.
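The statistics in Table 2 can be computed from the nomination data along the following lines. This is a minimal sketch on a toy network, not the code used in the study: the function name `network_stats` and the edge-list input format are assumptions, and the two degree variances are omitted because the text does not say whether a sample or population variance is reported.

```python
def network_stats(edges, n):
    """Compute Table 2 style statistics for a directed network.

    edges: iterable of (i, j) pairs meaning i nominates j; nodes are 0..n-1.
    """
    e = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i != j:
            e[i][j] = 1
    out_deg = [sum(e[i]) for i in range(n)]
    in_deg = [sum(e[j][i] for j in range(n)) for i in range(n)]
    ties = sum(out_deg)
    # Each reciprocal pair (i < j) contributes two directed ties.
    recip = sum(2 * e[i][j] * e[j][i] for i in range(n) for j in range(i + 1, n))
    # Ordered triples of distinct nodes with i -> j, j -> k, and i -> k.
    trans = sum(e[i][j] * e[j][k] * e[i][k]
                for i in range(n) for j in range(n) for k in range(n)
                if len({i, j, k}) == 3)
    return {
        "mean_degree": ties / n,
        "isolates": sum(1 for i in range(n) if in_deg[i] == 0 and out_deg[i] == 0) / n,
        "out_only": sum(1 for i in range(n) if in_deg[i] == 0 and out_deg[i] > 0) / n,
        "in_only": sum(1 for i in range(n) if in_deg[i] > 0 and out_deg[i] == 0) / n,
        "density": ties / (n * (n - 1)),
        "reciprocity": recip / (n * (n - 1)),
        "transitivity": trans / (n * (n - 1) * (n - 2)),
    }

# Toy network: 0 and 1 nominate each other, both nominate 2, and 3 is an isolate.
stats = network_stats({(0, 1), (1, 0), (1, 2), (0, 2)}, n=4)
```

The triple loop is cubic in the number of participants, which is acceptable for a network of a few hundred nodes but would need a sparse representation for much larger networks.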

          The network solicitation procedure leads to participants having 3.7 friends on average.

The most popular participants have 12 friends, while 14 participants have no friends. Table 2

provides some network structure statistics. As expected, the density, reciprocity, and transitivity

are low as we work with a relatively large network. Two characteristics of our experimental

setup also contribute to the network’s sparse nature. First, for practical reasons, we ask each participant to select up to

ten friends with whom they have interacted most frequently. Narayan, Rao,

and Saunders (2011) allow participants to nominate up to N-1 others (sample size N = 70) as

their influencers. This would have been a prohibitively time-consuming task for us with 292 participants, and

we propose a design that is applicable to a large network. Other researchers take a similar

approach restricting the number of nominations per participant (Lieder et al. 2009; Nair,

Manchanda, and Bhatia 2010). Second, as participants’ choices are shared along with their

names, to protect privacy participants are given the right to opt out from sharing choice

information with the friends they nominate, which makes our network even sparser. Unlike in

Centola (2010, 2011) where people do not know each other prior to the experiment and their

information is shared with the screen names they choose, which eliminates the privacy issue, our

experiment hinges upon actual friendship and cannot get around the privacy issue. Privacy is also

not a concern in a field experiment, where focal behavior is naturally expected to be observed

among peers (Aral and Walker 2011, 2012).

       In Table 2, 4.2% of the participants receive no nomination from other students, although

they nominate at least one other student as their friend (i.e., arrow out only). We observe that

85.1% of the participants share choices with at least one friend, and 93.5% of the participants can

observe at least one other participant’s choice. At the same time, 14.9% (% arrow in only plus %

isolate) are not willing to nominate anybody with whom they will eventually share their choice

information. The major difference between them and those sharing choice information is that

they are more socially peripheral, i.e., they have fewer friends sharing choice information with

them (on average 2.4 vs. 3.9 friends). There is no significant difference in other demographic

profiles, except that participants not sharing on average are less knowledgeable about a cell

phone headset (3.1 vs. 3.7 on a 1-7 point scale).

       Raffle Choice Summary. For sports paraphernalia, 167 participants visit the webpage, and

62.3% of them are male, which is similar to the proportion of males among the 215 raffle

participants. Among the 25 participants who visit the page multiple times, 17 keep the same

choice across all visits, whereas the remaining eight first do not choose any option but then pick

an option later. The percentages of people choosing options 1 to 4 and not choosing any of the

options are 30.5%, 38.9%, 9.0%, 16.8%, and 4.8%, respectively.

For the Bluetooth headset, 97 participants visit the webpage, and 25.8% claim to be

knowledgeable (rating their knowledge greater than 5 on a 1-7 scale), which is higher than the

percentage among the 215 enrolled (18.6%). The main reason could be that the participants who

consider themselves knowledgeable are more interested in winning a Bluetooth headset than

those who are less knowledgeable, as the mean winning interest is 5.9 and 4.2 for the more

knowledgeable and less knowledgeable groups, respectively. Among the 97 participants, 92 also

log on to the paraphernalia raffle page. Thirty-four participants visit the page multiple times, and

28 of them maintain the same choices in all the visits. The remaining six first do not choose an

option but later choose one. The percentages of people choosing options one to four and not

choosing any of the options are 14.4%, 3.1%, 74.2%, 2.1%, and 6.2%, respectively.12

Analysis of Participants’ Initial Preferences

         We use the hierarchical Bayes logit model (Allenby and Rossi 1999) to fit the conjoint

choice data from the pre-influence stage. Participant i’s utility of option k is

$$\sum_{p=1}^{P} \beta^{I}_{ip} x^{I}_{ipq,k} + \varepsilon^{I}_{ik},$$

where $\beta^{I}_{ip}$ captures individual i’s initial preference for the p-th product attribute, $x^{I}_{ipq,k}$ indicates

the p-th attribute’s value for option k in task q, $\varepsilon^{I}_{ik}$ is an error term following a type I extreme

value distribution, and the superscript I denotes initial preference. All the attributes are

dummy-coded, including price. The model specifications for the sports paraphernalia and the Bluetooth

headset are the same except that for the headset, the participants also have an option to choose an

ear-plug version at $29. Hence, we estimate the preference ($\beta^{I}_{i0}$) for this outside option for the

12
   Option 1: Motorola, black, 13 grams, 8-hr talk time, no noise cancellation; Option 2: Motorola, silver, 8 grams, 5-
hr talk time, no noise cancellation; Option 3: Plantronics, silver, 13 grams, 8-hr talk time, noise cancellation; Option
4: Plantronics, black, 8 grams, 8-hr talk time, no noise cancellation.
