How should we investigate variation in the relation between social

Page created by Corey Robinson
 
CONTINUE READING
How should we investigate variation in the relation between social
Preprint under review                                                                                                       1

           How should we investigate variation in the relation between social
                             media and well-being?
                  Niklas Johannes1, Philipp K. Masur2, Matti Vuorre1, & Andrew K. Przybylski1
                           1
                            Oxford Internet Institute, University of Oxford, 2Communication Science, Vrije
                                                       Universiteit Amsterdam

           The study of the relation between social media use and well-being is at a critical junction. Many
           researchers find small to no associations, yet policymakers and public stakeholders keep asking for more
           evidence. One way the field is reacting is by inspecting the variation around average relations – with
           the goal of describing individual social media users. Here, we argue that such an approach risks losing
           sight of the most important outcomes of a quantitative social science: estimates of the average relation
           in a large group. Our analysis begins by describing how the field got to this point. Then, we explain the
           problems of the current approach of studying variation. Next, we propose a principled approach to
           quantify, interpret, and explain variation in average relations: (1) conducting model comparisons, (2)
           defining a region of practical equivalence and testing the theoretical distribution of relations against that
           region, (3) defining a smallest effect size of interest and comparing it against the theoretical distribution.
           We close with recommendations to either study moderators as systematic factors that explain variation
           or to conduct N = 1 studies and qualitative research.
           Keywords: Social media; well-being; effect heterogeneity

                               This preprint is under review and has not been peer reviewed yet.

    In choosing what to study, social scientists interested in             and comes to a consensus, concluding that there is little
digital technologies are often reactive (Orben, 2020b). They               cause for concern: Relationships between technology use
react to calls for evidence from policymakers, which come                  and outcomes of concern are close to zero. At this point, why
with opportunities for funding to deliver that evidence.                   doesn’t the field move on? Is the question about
Likewise, policymakers regularly react to concerns of the                  technology’s potential harm not answered? Wouldn’t it be
public. This reciprocal relation between concerns and calls                prudent to stop investing resources into researching effects
for evidence follows a cyclical nature (Grimes et al., 2008;               that evidence suggests are not there?
Orben, 2020b): Whenever a new technology hits the market,                      We believe the study of social media use and well-being
it sparks concerns about the effects of the technology on the              faces exactly these questions today. For the past fifteen
well-being of its users. All stakeholders are looking for                  years, the public discourse has been ripe with concerns about
guidance: Should we limit our and especially our children’s                the relation between social media use and well-being.
time with the technology? How can we prevent its alleged                   Policymakers reacted, calling for scientific evidence on
negative effects? Governments regularly respond to these                   social media use effects and spending millions in funds for
concerns and call on science for evidence – in the process                 social scientists to deliver that evidence (Council on
committing large amounts of funding. After years, the                      Communications and Media, 2016; Dickson et al., 2019).
scientific community evaluates the best available evidence                 And social scientists delivered. Almost all reviews and
                                                                           meta-analyses come to a similar conclusion: The
                                                                           relationship between social media use and the user’s well-
                                                                           being is close to zero and below any threshold for severe
     Grants from the Huo Family Foundation and Economic and
Social Research Council (ES/T008709/1) supported NJ, MV, and               harm that would require intervention (Appel et al., 2020;
AKP. Code to reproduce the figures and numbers reported in this            Bayer et al., 2020; Dickson et al., 2019; Dienlin & Johannes,
article is available at https://osf.io/b7rpx/. A public preprint of this   2020; Kross et al., 2021; Odgers & Jensen, 2020; Orben,
work is available at https://psyarxiv.com/xahbg. The authors
                                                                           2020a; vanden Abeele, 2020).
declare no conflicts of interest The funders had no role in the
conception of this paper, decision to publish, or preparation of the
manuscript.
     Niklas      Johannes       is    the   corresponding      author:
niklas.johannes@oii.ox.ac.uk.
How should we investigate variation in the relation between social
Preprint under review                                                                                                   2

    Taken together, it’s becoming increasingly clear that the      assumption might have been overly simplistic (Kaye et al.,
field is at a critical junction. The evidence on the               2020; Orben et al., 2020; Przybylski & Weinstein, 2017), but
relationships between social media use and well-being is           it shaped how social scientists studied social media.
clear, yet that consensus has not reached the public, which        However, there is no convincing evidence that social media
continues to express fears over the pervasive use of social        are indeed the dose that causes a response. Most studies in
media, which again prompts governments to keep asking for          the field cannot make causal claims, because there is a lack
more evidence. The disconnect between having delivered             of theory that would allow to test causal models with cross-
that evidence and requests for more of it puts social scientists   sectional data (Eronen & Bringmann, 2021; Rohrer, 2018)
in an unenviable position. That position has led researchers       or inform us about time-varying confounders in longitudinal
to investigate variation around the average association            data (Hernán, 2018; VanderWeele et al., 2016). Moreover,
between social media use and well-being (Aalbers et al.,           the few experiments with high external validity show the
2021; Beyens et al., 2020; Whitlock & Masur, 2019).                lack of a causal effect (Mitev et al., 2021; Przybylski et al.,
Studying between-person and -group variation might be              2021). Therefore, it is questionnable whether social media
informative, but the field has not yet developed a principled      use is indeed the dose that causes a response (i.e., changes
approach to researching or understanding that variation.           in well-being). Moreover, it is unclear whether a dose-
Before we develop such an approach, we believe it is               response relation would be linear, such that with each extra
important that we pause and ask ourselves: How did we get          minute of social media use, well-being varies to the same
here? What’s the problem? And where do we go from here?            extent as with the next minute (Johannes et al., 2021). For
    Studying variation before a solid understanding of how         example, there is some preliminary evidence that the relation
to answer these questions may be hasty and risk investing          between social media use and well-being is curvilinear
resources that could be more valuable elsewhere. Here, we          (Bruggeman et al., 2019; Przybylski & Weinstein, 2017).
tackle those questions in the hope that we can outline a               We believe such a linear dose-response assumption
principled approach for the study of social media and well-        aligned with the way in which the field is understood by
being.                                                             many researchers. Emulating STEM and medicine
                                                                   (Muthukrishna & Henrich, 2019), the field takes a certain
                     How did we get here?                          lawfulness in media effects for granted. Although people are
                                                                   different and media effects will differ from person to person
   Prescribing the impossible                                      to a degree, we tend to assume that we can identify average
     Ever since social media became popular at the turn of the     media effects that we can generalize to a population. At the
millenium, there have been concerns about their negative           same time, calls for advice from the public and policymakers
effects (e.g., Beard, 2005). While the social sciences were        have pressured social scientists into producing prescriptive
still figuring out how to properly research social media, the      recommendations (Council on Communications and Media,
first popular science books about their harmful effects on our     2016; Dickson et al., 2019). How should we use social
thinking (e.g., Carr, 2011) and social fabric (e.g., Turkle,       media? What amount of screen time is safe for a child?
2012) came out and caused a stir in public discourse.              Under the assumption of lawfulness on the level of
Policymakers reacted quickly. For example, in 2018 the UK          individual people, prescriptions are possible. Some
Chief Medical Officers commissioned researchers to                 biochemical processes will affect almost everyone in a
provide an evidence map of published research on the effect        similar way, but to different degrees (Muthukrishna &
of screen time and social media on children’s and young            Henrich, 2019). For example, Figure 1 shows different
people’s mental health (Dickson et al., 2019). In 2016, the        distributions of effects. The effect of administering a drug
American Academy of Pediatrics released a policy statement         on a chemical response in the body will have lower variance
on Media and Young Minds, recommending that children               than investigating relationships between comparably
younger than 18 to 24 months avoid digital media (except           abstract psychological concepts. Thus, psychological
video-chatting) altogether (Council on Communications and          processes are rarely universal to a degree that we can make
Media, 2016).                                                      prescriptions for individual social media users (Bolger et al.,
     What is striking in the calls for evidence is the             2019; Fisher et al., 2018; Molenaar, 2004; Valkenburg &
assumption of how social media effects are supposed to             Peter, 2013). For all those reasons, it’s close to impossible
work. Those calls often assume a linear dose-response              to make (personalized) prescriptions on social media use:
relationship: More time on social media will lead to lower         the evidence is merely correlational, we can’t be certain we
well-being (Johannes et al., 2021). From the get-go, such an
Preprint under review                                                                                                   3

indeed investigate a dose with a linear effect, and relations
uncovered by the field are not universal.

   Figure 1. Examples of the distributions of a null effect (upper panel) and a large effect (lower panel) under different
levels of variation (little variation in blue and a lot of variation in black).
                                                                 (between-person) and using social media more than a person
   A mismatch
                                                                 usually does is not systematically related to changes in that
     When it comes to social media use and well-being,           person’s well-being (within-person). On the level of the
scholars have reached a point of general consensus.              population, concerns don’t seem warranted. However:
Consensus here should not be understood as every                 These studies didn’t look at the content that people engage
researcher holding a similar opinion; there are still social     with on social media or mental health indicators, but
scientists who are concerned about social media (e.g.,           frequency or amount of self-reported social media use and
Twenge et al., 2020). But most scholars reject this framing      various indicators of well-being (e.g., life satisfaction or
(Livingstone, 2018) and large-scale studies investigating        self-esteem).
average associations between social media use and well-              Even with these limitations in mind, if we accept that the
being support the lack of a sizable relation (Dickson et al.,    literature provides convincing evidence of null or negligible
2019; Dienlin & Johannes, 2020; Kross et al., 2021; Kushlev      associations, we are presented with a mismatch: The lack of
& Leitao, 2020; Masur, 2021; Meier et al., in press; Meier       positive evidence linking social media and well-being is
& Reinecke, 2020; Orben, 2020a; Valkenburg, Meier, et al.,       fundamentally out of step with on-going public concerns
2021; Whitlock & Masur, 2019). Most of these studies show        about negative effects, addiction, and distractions (Ellis,
small or negligible associations on both the between-person      2019; Kardefelt-Winther et al., 2017; Loh & Kanai, 2015;
and the within-person level (Coyne et al., 2019; Dienlin et      Satchell et al., 2021; van der Schuur et al., 2019). As a result,
al., 2017; Houghton et al., 2018; Orben et al., 2019; Orben      there are no grounds for making prescriptions on social
& Przybylski, 2019; Schemer et al., 2020; Thorisdottir et al.,   media use. That mismatch between evidence and concerns
2019). In other words, those who use more social media are       might make many researchers feel uneasy. When lay beliefs
not worse off compared to those who use them less                about social media effects are this strong but social scientists
Preprint under review                                                                                                     4

are not able to produce evidence in support, maybe they
can’t be trusted with producing evidence for questions that
have a (seemingly) less obvious answer? The result of this
line of reasoning is dissatisfaction for researchers, the
public, and policymakers alike. A perception of social
scientists as being unable to deliver useful insights,
especially on a topic for which the public has strong
preconceptions, can have serious consequences for the
research funding and the credibility of social science overall
(IJzerman et al., 2020). Consequently, the average negligible
relation between social media use and well-being may well
lead us to feel threatened as a discipline.
    The field has reacted to that threat. Several evidence
reviews on the effects of technology have called into                    Figure 2. An example of the argument to focus on
question the validity of the evidence so far (Bayer et al.,          individual social media users. The average size of the
2020; Dienlin & Johannes, 2020; Kross et al., 2021; Kushlev          relation between social media use and well-being is zero, but
& Leitao, 2020; Meier et al., in press; Orben, 2020a; vanden         for some people the relation will be above a smallest effect
Abeele, 2020). For example, researchers have criticized both         size of interest.
the validity and reliability of self-reported social media use
                                                                     well-being (Anvari et al., 2021; Anvari & Lakens, 2021). As
(Ellis, 2019; Ellis et al., 2019; Kaye et al., 2020; Parry et al.,
                                                                     a result, we should focus, so the argument goes, on
2021; Shaw et al., 2020; vanden Abeele et al., 2013) and the
                                                                     individual social media users and less on the average relation
lack of nuance when inspecting general use rather than the
                                                                     between social media use and well-being among a large
content people engage with (Dienlin & Johannes, 2020;
                                                                     group of people.
Kaye et al., 2020; Kross et al., 2021; Masur, 2021; Orben,
                                                                         In the past year, several analyses of experience sampling
2020a).
                                                                     data explored this idea of person-specific media effects
    The strongest reaction to the status quo that threatens the
                                                                     (Aalbers et al., 2021; Beyens et al., 2020; Siebers et al.,
field (one-size-fits-all approach with negligible effects) has
                                                                     2021; Valkenburg, Beyens, et al., 2021). Based on the
been a call for more work that identifies those who show
                                                                     assumption that everyone is different, they highlight
non-negligible relations between social media use and well-
                                                                     relations per person and place less focus on the average
being (Orben, 2020b; Valkenburg & Peter, 2013). This call
                                                                     effect. These analyses are best understood as a variant of the
responds to the question why the public continues to be
                                                                     heterogeneity argument, an approach used in other parts of
worried about the effects of social media: Several salient
                                                                     the social sciences (Bolger et al., 2019). In brief, the effect
cases (e.g., radicalization) suggest that there may be distinct
                                                                     heterogeneity argument states that effects won’t be uniform
subgroups or contexts under which social media use has
                                                                     across people and understanding the variation around
meaningful effects after all.
                                                                     average effects can be informative. Such investigations of
    One prominent version of that argument goes as follows:
                                                                     variation around average relations (i.e., effect heterogeneity)
Estimating average relationships between social media use
                                                                     may be of high value for the field. However, prioritizing
and well-being isn’t informative. Instead, all people are
                                                                     variation over interpreting and understanding average
different, so we must examine individual social media users
                                                                     associations risks atomizing associations.
and the relation between social media use and well-being
                                                                         We believe that shifting focus from average to individual
that is specific to them. Some users will show large relations
                                                                     relations may not be as promising a solution as it might
and others won’t, leading to overall negligible average
                                                                     appear at first blush. This is in part because most social
relations (Aalbers et al., 2021; Beyens et al., 2020; Siebers
                                                                     scientists researching social media rely on inferential
et al., 2021; Valkenburg, Beyens, et al., 2021). Figure 2
                                                                     statistics to draw inferences about populations of interest. As
illustrates that reasoning: The average relation between
                                                                     long as we are committed to making such inferences, we
social media use and well-being is zero, but there is variation
                                                                     have to make a choice: Either we dismiss most, if not all,
around the average relation, such that large negative and
                                                                     average associations in the literature because we believe we
large positive relations effectively ‘cancel’ each other out. If
                                                                     need to inspect effects per person. In this case, we are
these relations are beyond a smallest effect size of interest,
                                                                     obliged to shift the definition of our population (from large
they can be considered large enough to be meaningful for
Preprint under review                                                                                                    5

groups to a single person) and conduct N = 1 or qualitative         we should know what sources of variability to account for to
studies. Alternatively, we acknowledge average associations         identify the signal. Because we want to generalize from the
as our best bets of the true association in the population,         people in our sample to the population, we need to account
whose variation deserves more scrutiny. Then we’re obliged          for variation of people being different from each other. Only
to develop a principled approach towards identifying,               if we account for these differences are we allowed to
interpreting, and explaining variation around average               generalize to other people. Social scientists often account for
relations. We strongly believe the second option is the             such variation in various forms of mixed-effects models by
answer to the question “Where do we go from here?”. Before          specifying grouping variables (Bates et al., 2015; Bolker et
we present our attempt to provide that answer, we’ll outline        al., 2009; DeBruine & Barr, 2021) – ideally all sources of
the problem we see with the current approach to studying            variability that we want to generalize over (Yarkoni, 2019).
variation around average relations.                                 Therefore, when we predict well-being, we’ll obtain a fixed
                                                                    effect (i.e., average relation) and random slopes (i.e., per-
                      What’s the problem?                           person deviations from the average relation) for social media
                                                                    use. Random slopes mean that the model doesn’t assume that
   Inferential goals and problems                                   the relation will be the same for every participant; the model
    Quantitative social scientists draw samples from a              takes these differences between people into account and
population to make an inference about that population, often        provides us with the best estimate of the average relation,
with the aim of making recommendations to policymakers              that is, the fixed effect.
and the public. In our case, if we want test the general                 The fixed effect, therefore, is our best estimate of the
relation between social media use and well-being, we                true, average association in the population; it’s the effect we
measure social media use and well-being in a sample of, say,        hope to generalize to other observations. If we rely on
1,000 people. Next, we build a statistical model that allows        inferential statistics in that way, we implicitly state that we
us to estimate the direction and magnitude of a relation in         are interested in an inference about the population. The same
the data generated by these 1,000 people. If we find that           logic applies to identifying factors that might make some
there is a negative relation, we don’t want to conclude that        social media users more susceptible than others. If we want
the relation only applies to our sample; rarely are we              to generalize those factors to the population, we are yet again
interested in a relation in those particular 1,000 people we        interested in fixed, average relations – only this time we are
happened to measure. If we were, we wouldn’t need                   after moderators, not main effects. Moderators can explain
inferential statistics. We could just calculate the size of the     systematic variation in an effect, informing us about the
relation and have our answer. We could report back to               factors that, on average, influence the average effect of
policymakers and inform them about the harmful effect               social media in the population we want to generalize to.
social media might have on these particular people. If we           Because moderators are important in explaining variation,
have established causality, we could even consider making           we will come back to their role in a later section.
a prescription, so that policymakers can regulate the time               For those reasons, we caution against treating fixed
those 1,000 people spend on social media. But we’re not             effects as secondary. For example, Beyens and colleagues
only interested in those 1,000 people. We sampled those             (2020) reported the distribution of observed random slopes
1,000 people to draw an inference to the population they            of the relation between social media use and well-being,
come from. Statistical inference is thus necessarily                categorizing individual relations according to sign and size.
inductive: To arrive at an inference about the population, we       They state: “Because only small subsets of adolescents
generalize from a sample of that population.                        experience small to moderate changes in well-being, the true
    Sampling introduces sampling error. Statistical inference       effects of social media reported in previous studies have
helps us to disentangle that error from a true effect. In other     probably been diluted across heterogeneous samples of
words, it separates signal (i.e., the true effect or association)   individuals that differ in their susceptibility to media
from noise (i.e., the error), which means there will be             effects” (p. 2). Such an approach emphasizes description
variation in our measures – be it caused by measurement             over inference by focusing solely on variation. Describing
error, sampling error, or true variation in the effect. That        observed random slopes instead of interpreting the fixed
variation can occur on two levels: Between people (i.e.,            effects therefore neglects the most important outcome of our
differences from one person to another) and, if we have             models: the estimate of the average association in the
multiple measurements per person, within people (i.e.,              population which is intended to generalize beyond the
variation around the person’s mean). In our statistical model,      current sample. It also assumes that the general demographic
Preprint under review                                                                                                  6

factors underlying the specific sample make it representative      that we believe is a good approximation of the average effect
of the population as a whole. As we have argued, variation         in the population. Because there are so many differences
is something that a mixed-effects model adjusts for to             between and within people, it is likely that there will be
provide us their best estimate of the average relation in the      variation even in the Stroop effect, which we consider one
population. Thus, highlighting the variation around fixed          of the most stable phenomena in Psychology (Bolger et al.,
effects or focusing on individual random slopes does not           2019; Fisher et al., 2018; Haaf & Rouder, 2019; Molenaar,
satisfy the goals of understanding the population. This issue      2004).
is exacerbated by non-representative samples typically                 In short: Random variation around effects—including
recruited in the field.                                            null effects—is exceptionally likely in nearly any
    That’s not to say that the variation around the fixed effect   psychological phenomenon. As far as we know, we have yet
is meaningless. That variation complements, but does not in        to identify an invariant phenomenon. Because human
any way supplant, information the fixed effects carry – it can     cognition, emotion, and behavior is complex (Eronen &
inform us about the expected variation from person to person       Bringmann, 2021), it is practically impossible to causally
around the fixed effect. That variation, in turn, can inform       explain them in their totality (Muthukrishna & Henrich,
us whether we should identify causes for this variation, such      2019). There are a myriad of different genetic and
as moderators or other predictors of variance. Yet again, we       environmental influences on human behavior – not to speak
are studying moderators by making inferences based on              of the differences in affordances, content, and user
fixed effects (independent of whether we assume variability        motivation for using social media. These influences can and
between persons).                                                  will interact; therefore, each occasion a person uses social
    Therefore, we are wary of the field assigning wrong            media is so multiply determined as to be nearly unique. As
priorities to the properties of its models by interpreting         a result, people differ to such a substantial degree that no
random slopes without communicating the essential context          single treatment is likely to have identical effects across
provided by fixed effects. We must continue treating fixed         everyone in the population (Fisher et al., 2018; Molenaar,
effects as the primary insight from analysis; otherwise, we        2004). Applying this reasoning to our case: Whenever we
shift the focus from inferences about the population to            investigate the relation between social media use and well-
making predictions for individual participants in our study.       being, we assume that the true relation has a distribution
In other words, a focus on individual random slopes in             across people. Most people will experience an association
particular neglects our inferential goals and casts doubt on       close the average of the distribution, but there will also be
previous research that interprets average effects, thereby         people who display larger or smaller associations.
atomizing our knowledge base. If we want to inspect                    If we accept that not all people are the same and social
variation, we need a principled approach which integrates          media effects naturally contain variation, the conclusion that
both sources of information. Before we outline that                media effects won’t be the same for everyone takes the form
approach, we first discuss where variation comes from and          of a circular argument. When we look at person-specific
what it means.                                                     media effects, the question then is not whether there will be
                                                                   variation around the fixed effect. The questions are rather:
   There will (likely) always be variation
                                                                   How do we estimate variance around the fixed effect? How
    To highlight the role of variation, let’s consider an          much variation is there? And how much variance is relevant
illustrative example from Psychology: the Stroop effect.           to warrant further attention?
People are generally slower to name the color of incongruent           The goal to find factors that make people more
words (i.e., the word RED in the color blue) compared to           susceptible to social media effects rests on the assumption
congruent words (i.e., the word RED in the color red). In          that variation in media effects is meaningful and not just
Psychology, we assume the effect follows some lawfulness           noise. If the effect of social media use has little variation
of cognitive processing that is universal across humans. But       around the general null effect, only few people will
it is unlikely that even effects we consider lawful are            experience negative effects. By contrast, if there’s a lot of
invariant across people, even if their direction is universal      variation around the effect, the large negative influence of
(Haaf & Rouder, 2019). There is likely variation in the effect     social media will apply to more people and identifying
due to differences between people (i.e., you are different         systematic differences becomes relevant. The question of
from me) and within people (i.e., you might behave                 what meaningful variation is goes hand in hand with the
differently later compared to you earlier). That’s why we are      question how we should invest finite resources: How should
trying to estimate an average Stroop effect from our sample        further public funding on researching social media effects be
Preprint under review                                                                                                   7

allocated? Finding factors that make people more                 integrate previous insights and current practices in the field
susceptible implies that finding those factors is worth the      to a principled approach. To illustrate that approach, we
investment of time and money.                                    work along an example taken from Beyens and colleagues
    The field must provide benchmarks against which we           (2020) who presented a study on the relation between social
measure the answers to these questions of effect                 media use and well-being in an experience sampling study.
heterogeneity; it must specify how much variation is             They found a fixed effect for the relation between social
meaningful and warrants further investigation. Otherwise,        media use (in steps of five minutes) and well-being of .06 on
we simply restate circular reasoning: people being different     a 7-point Likert scale. That association was on the within-
because they’re different – unless they’re not different         level: For the average person, spending five more minutes
enough. Next, we therefore outline a principled approach to      on social media in the past hour than they typically do was
dealing with variation in average relations.                     associated with a 0.06 increase in well-being. That fixed
                                                                 effect was not significant (p = .091).
                 Where do we go from here?                           How do we know how much variability there is in that
                                                                 average effect? The standard deviation of the random slopes
   Quantifying variation                                         provides us with that answer. In the case of our example, the
    How to assess whether there is meaningful variability        standard deviation was 0.71 (σ2 = 0.50 from Table 2), more
around the average effect is neither a new challenge nor is it   than ten times as large as the average effect. From the
one special to the study of social media. For example, in the    standard deviation, we can calculate an interval around the
field of personalized medicine, there is a heavy debate on       fixed effect, sometimes referred to as heterogeneity interval
how to deal with variation in effects and how to demark          (Bolger et al., 2019), by multiplying the mean effect (0.06)
effects on the individual level from those on the group-level    with the lower and upper bound of the standard deviation of
(Senn, 2016, 2018). A similar debate has been going on in        the mean social media effect (0.06 ± 1.96 x 0.71). Therefore,
the social sciences for several years (Fisher et al., 2018;      our heterogeneity interval is [-1.33, 1.45]. It tells us that 95%
Molenaar, 2004). Similarly, Bolger and colleagues (2019)         of person-specific associations between social media use
have addressed the question of meaningful variation in           and well-being in the population would fall within this
experimental effects extensively and provide an overview of      interval. According to the model, some people will
how to deal with effect heterogeneity. The study of variation    experience negative associations (-1.33) that are 22 times
in the relation between social media use and well-being can      more intense and negative than the average positive
benefit from the work in these fields. Rather than reinventing   association (0.06); others will display positive associations
the wheel, we aim to integrate this work and translate it to a   (1.45) that are 24 times larger.
principled approach to study variation in social media               In the above example, we used point estimates of the
research.                                                        fixed effect and its standard deviation to obtain a
    How has the field studied variation so far? Researchers      heterogeneity interval. In practice, these parameters are
most often start with model comparisons, where they              estimated from data and therefore introduce uncertainty
compare a model with only a fixed slope (i.e., the effect will   which ought to be included in further calculations (e.g., of
be the same for every person) to a model with additional         heterogeneity intervals). Without representations of these
random slopes (i.e., each person will differ to a degree from    uncertainties, for example in the form of posterior
the overall effect). The model that accounts for variability     distributions, researchers run the risk of making overly
around the effect routinely fits the data better. Another        confident or underconfident statements. However, we only
common practice is plotting the distribution of the observed     had access to point estimates for these examples and
random slopes to demonstrate the variation in the relation       therefore continue working with them, while recognizing
between social media use and well-being. A subsequent step       that in practice such uncertainties should be described.
is often defining cutoffs for effect sizes following the             Note that effect heterogeneity and the uncertainty around
conventional benchmarks of Cohen (1988) and describing           the fixed effects are not the same. The fixed effect is the
what proportion of random slopes in the sample exceeds           average association between social media use and well-
these benchmarks (e.g., 12% of the observed random slopes        being; its surrounding 95% confidence intervals inform us
are considered large).                                           about variability in that average relation from sample to
    Drawing inspiration from previous work on effect             sample. If we ran infinite studies, 95% of the confidence
heterogeneity (Fisher et al., 2018; Molenaar, 2004; Senn,        intervals around the fixed effect would contain the true
2018), especially from Bolger and colleagues (2019), we          population average relation. In contrast, the heterogeneity
Preprint under review                                                                                                  8

interval informs us about variability in the association from    meaningful from random variation, we suggest a principled
person to person. If we ran infinite studies, 95% of the         workflow that follows three steps (see Table 1). First, we can
heterogeneity intervals would contain an individual person’s     compare models as a baseline test. Second, we must define
true relation of social media use and well-being.                a Region of Practical Equivalence (hereafter ROPE;
    However, the accuracy of these parameters only holds         Kruschke, 2014) around the fixed effect and test our
assuming adequate sampling on both the between- and the          heterogeneity distribution against this ROPE to identify
within-person level. On the between-level, if we sample          noteworthy variation. Third, we must define a Smallest
social media users that are not representative of the            Effect Size of Interest (hereafter SESOI; Anvari et al., 2021;
population we want to generalize to, our estimate of the         Lakens et al., 2018) and compare the heterogeneity
variability of the effect also isn’t representative. The same    distribution against it.
limitation applies if we don’t obtain a representative sample         First, Bolger and colleagues (2019) recommend model
from people’s everyday social media use and well-being           comparisons as a first step. During that step, we compare a
(e.g., via a random experience sampling procedure). If we        model without random slopes to a model with random
don’t study a representative sample of a person’s life,          slopes. Goodness of fit is the standard by which model
inferences about the distribution of all participants in our     comparisons are judged. If the slopes significantly improve
study will be flawed. Therefore, the accuracy of any             model fit, we have initial evidence that there might be
descriptive analysis of a distribution of individual relations   meaningful variation around the average effect. As already
depends on sampling on both levels: the individual and the       outlined earlier, this step is far from conclusive.
group.                                                           Theoretically, we know that people are different and a model
    Assuming adequate sampling, the heterogeneity interval       with random slopes will almost always yield a better fit (Barr
therefore answers exactly the question we are interested in:     et al., 2013). Therefore, model comparison provides a
What social media relations can we expect in the                 necessary, but not a sufficient, first step.
population? Unfortunately, the field has not employed these           Second, we must define a ROPE which “indicates a
intervals, which prevents social scientists from being able to   small range of parameter values that are considered to be
quantify variation in media effects from person to person.       practically equivalent to [the fixed effect] for the purposes
Merely inspecting random slopes as evidence of meaningful        of the particular application.” (Kruschke, 2014). Let's apply
variation in the relation confounds sample-to-sample             this definition to our working example. Before we collect
variation of the average relation and person-to-person           data, we decide that our fixed effect of social media use on
variation around the average relation. We strongly               well-being has noteworthy variation if the effect
recommend the field adopts the practice of estimating            heterogeneity distribution exceeds a range of ± 0.3 Likert-
heterogeneity intervals. As a quantitative discipline that is    points around the fixed effect. Note that this number is
interested in variability of a parameter, we need to define      entirely arbitrary; “ROPE limits, by definition, cannot be
how to estimate that parameter before we can even begin to       uniquely ‘correct,’ but instead are established by practical
interpret variability.                                           aims” (Kruschke, 2014, p. 338). We need expert knowledge
                                                                 to determine our ROPE and provide context for analyses
   Interpreting variation
                                                                 which use it as standard in our models. For some, 0.3 will
    After correctly quantifying variation, deciding whether      represent a meaningful and sensible cutoff for this effect; for
the estimated variability is meaningful is both possible and     others, it won’t. Like Bayesian procedures that clearly
necessary. In other words: Now that it’s measurable, can we      communicate prior beliefs about an effect, being transparent
safely ignore effect heterogeneity or is it worthy of further    and putting ROPE up for discussion enables others to better
investigation? Many social scientists default to treating        scrutinize how we deal with effect heterogeneity (Dienes,
variation around effects as a result of hidden moderators        2019). With this procedure, we communicate to readers that
(Kunert, 2016), thereby seeing all variation as meaningful       we only find the variation around a fixed effect worthy of
and worthy of further examination. However, we caution           further study if that variation doesn’t fall within the ROPE.
against such a view. As we have explained, few, if any,               After having defined our ROPE, we need to test the
psychological phenomena will be invariant and much               variation against the ROPE. Here, we don’t rely on the
variation we can consider noise (e.g., from the sampling         observed random slopes, but the theoretical distribution
strategy, sample size, the size of the fixed effect,             around the fixed effect, that is, the heterogeneity distribution
measurement error, to name just a few). Explaining all           from which we draw the heterogeneity interval (Bolger et
variation may practically be impossible. To distinguish          al., 2019). Plotting the observed random effects in the
Preprint under review                                                                                                  9

Table 1. An explanation of the three steps of interpreting variation.
     Step                                          Explanation
 1     Model comparison                              Statistically compare a model with a fixed effect and random slope to a
                                                     model with only a fixed effect.
 2     Region of Practical Equivalence (ROPE)        Define a region of practical equivalence and estimate and compare
                                                     theoretical distribution of average effect against it
 3     Smallest Effect Size of Interest (SESOI)      Define a smallest effect size of interest and compare theoretical
                                                     distribution of average effect against it

sample distracts from the actual purpose of the model, which       size of interest (SESOI) for the relationship of interest (e.g.,
is to make an inference to the population. As we explained         social media use and well-being). The SESOI tells us how
in the section on quantifying variation, we can estimate this      large an association has to be for us to consider it practically
theoretical distribution with the fixed effect and its standard    relevant (Anvari et al., 2021; Anvari & Lakens, 2021;
deviation.                                                         Lakens et al., 2018).
    We then can calculate the area under that theoretical               Both the ROPE (i.e., width of distribution, relative
distribution to infer what proportion of media users fall          limits) and the SESOI (i.e., location and width of
below or above certain thresholds. In our recurring example,       distribution, absolute limits) matter, see Figure 3: Our effect
we have an average relation of 0.06 and a standard deviation       heterogeneity might well lie outside the ROPE, but that
of 0.71. Our ROPE of ± 0.3 Likert-point hence ranges from          doesn’t mean it’s practically relevant. The distribution of
–0.24 to 0.36 (.06 ±0.3). Now, we can calculate what               associations we can infer from our sample might well have
proportion of our distribution falls outside the ROPE. For         noteworthy variation, but fall completely within the bounds
this example, the area outside this range is 67%. As it is         of our SESOI (see blue distribution in Figure 3). Then we
more than 5%, we can conclude that there is noteworthy             conclude that there is noteworthy variation, but that
variation around the fixed effect.                                 variation operates within a range we don’t consider relevant.
    Note several points here: Because we use the theoretical       On the flipside, our distribution might not exceed the ROPE,
distribution, and not observed slopes, we can make an              but lie completely outside our SESOI (grey distribution in
inference to the population. However, as we explained              Figure 3). Now we don’t find noteworthy variation (so we
before, we have used a theoretical distribution derived from       assume the fixed effect is a good enough approximation for
point estimates of fixed effect and its standard deviation. For    everyone), but everyone in the population shows a relevant,
an inference that takes uncertainty into account, ideally we       large enough association. Finally, and probably most
need to estimate the proportion of the theoretical distribution    common, there may be less clear-cut cases (red distribution
outside ROPE for each parameter combination in the                 in Figure 3): For example, we might have noteworthy
posterior. This approach is therefore more informative than        variation, but some parts of the distribution are equivalent to
merely describing what proportion of observed random               a practically insiginicant effect (i.e., inside the SESOI
slopes are outside a cutoff, because observed random slopes        range). Here, we have noteworthy variation and large parts
describe the sample, not the population. Second, effect            of the population show a large effect.
heterogeneity is independent of the location of the fixed               For our running example, we probably want to choose a
effect: we specify the ROPE around wherever the fixed              large SESOI. In standardized terms, an effect that has
effect will fall. Therefore, ROPE limits are relative to the       medium size by Cohen’s (1988) benchmarks might be
location of the fixed effect; ROPE is about the width of the       practically meaningless, depending on the outcome (Anvari
distribution, not its location.                                    et al., 2021; Baguley, 2009; Lakens et al., 2018). In the case
    Third, now that we have tested whether there is                of well-being it has been convincingly argued that people
considerable variation around the fixed effect, we can move        need to experience a somewhat large difference of around
on to investigate the location of the distribution and its width   half a standard deviation before they can subjectively
in relation to an absolute limit. This combination answers         experience that change (Anvari & Lakens, 2021; Norman et
whether there are meaningful person-specific associations.         al., 2003). Therefore, just like with the ROPE, we need to
To investigate whether the variation we consider noteworthy        apply our domain knowledge to define what we consider a
also matters practically, we need to define a smallest effect      meaningful, absolute effect. In our example, say we only
Preprint under review                                                                                                  10

regard large associations of at least one Likert-point or larger    noteworthy and probably worth studying (i.e., ROPE), were
as relevant. We found noteworthy variation in the previous          it not for the generally small associations (i.e., SESOI):
step because the distribution exceeded our ROPE, but only           explaining even noteworthy variation might be
15.9% of the distribution fall outside the SESOI.                   inconsequential. Others will conclude that the variation in
Theoretically, we can expect 15.9% of the population to             associations is noteworthy and large enough in enough cases
exhibit an association between social media use and well-           to be relevant and worthy of further study. Whatever
being that we consider plausibly meaningful. Note again that        researchers decide, we urge them to be explicit and
we’re using point estimates; ideally, we inspect what               transparent in their choices of both ROPE and SESOI. As a
proportion of the heterogeneity interval’s lower and upper          minimal standard, we suggest preregistration as a tool for
bounds lie outside the SESOI.                                       subjecting our hypothesis of effect heterogeneity to a more
    Is 15.9% enough people to conclude that we need to              severe test (Lakens, 2019) – or display a range of ROPEs
explain that variation? Again, there is no absolute rule here       and SESOIs so readers can interpret the results better
and the answer depends on the researcher. Some will                 (Dienes, 2019).
conclude that the variation in associations in itself is

    Figure 3. Examples of how ROPE (Region of Practical Equivalence) and SESOI (Smallest Effect Size of Interest)
interact. Distributions have different ROPEs (bars on top), but the same SESOI (dashed vertical lines). The red case shows
a distribution that is outside ROPE and outside SESOI. The blue case shows a distribution that is outside ROPE, but inside
SESOI. The grey case shows a distribution that is inside the ROPE, but outside the SESOI.
   Explaining variation                                             effects. We agree with previous research that it’s an
                                                                    important step for social media research to identify those
    Once we know how to quantify variation in media effects
                                                                    people who are more susceptible to media effects (Beyens et
and have determined the magnitude of variation necessary
                                                                    al., 2020; Orben, 2020a; Orben et al., 2019; Valkenburg &
to be relevant for social science research, the final logical
                                                                    Peter, 2013). However, that step should not be taken for
step is to ask what factors explain that variation. For whom
                                                                    individual social media users; instead, we must study
does the effect differ and for what reasons? A large amount
                                                                    systematic individual differences or differences in the
of variation around the average effect can mean that there
                                                                    content of social media that can account for variation in
are unobserved factors that explain why some people show
                                                                    social media effects.
a large and others a small effect. It might be well worth to
                                                                         Identifying susceptible people means identifying factors
study these factors. But if we rely on inferential statistics for
                                                                    that can explain systematic variation in the effect.
studying those factors, we are yet again interested in fixed
                                                                    Statistically, those factors are modelled as moderators.
Preprint under review                                                                                                   11

Moderators provide a useful conceptual tool for explaining             If, instead, social media research is truly interested in
variations in the effect across the population, because they       studying effects for each individual person, we argue that
are average effects that we estimate to apply to groups of         such a focus requires a different approach: N = 1 studies.
people. Once more, consider the Stroop effect that we used         They represent an intruiging alternative research direction.
as an example earlier on. The fixed effect will show that          N = 1 studies allow inferences to the person under study only
people, on average, are slower on incongruent trials               – assuming no lawfulness in the effect. Medicine has called
compared to congruent trials. However, that effect likely          for N = 1 studies, but medicine relies on higher lawfulness
varies, such that some people show little slowing and others       (Senn, 2018). With the noisiness of social behavior,
extreme slowing. For example, differences in visual acuity         sampling correctly on the within-person level means
might induce systematic differences between participants. If       ensuring a representative sampling of that person’s usage
some participants have forgotten their contact lenses, they        episodes as well as ensuring enough power to detect
might be slower to read and therefore show a different effect.     seemingly small effects (Götz et al., 2021) – which may
    In the case of social media and well-being, if we find that    require measuring variables as often as 500-1,000 times.
the relation between social media use and well-being has           Alternatively, researchers truly try to describe and
high variability, it’s possible that modelling knowledge           understand an individual person through a qualitative
about group membership—whether someone identifies with             approach. A qualitative approach won’t lead to quantifiable
a particular gender— can explain parts of that variability         social media effect estimates, but to a nuanced
because the relation is present for teenage girls, but absent      understanding of effects in that specific person. It’s
for boys (Orben et al., 2019). But note that we infer that this    extremely valuable to explain social media effects in
moderation reported by Orben and colleagues generalizes            individuals through qualitative research, but qualitative
only to the population which was sampled: A large group of         research won’t test conditions that explain systematic
British young people aged 10 to 15 years old. We’re not            variation in quantifiable social media effect estimates. If the
saying the relation is negative for a specific girl, or null for   goal is to quantify effects to generalize to a large group of
a specific boy. There’s little doubt many girls in the data        people, however, we need to identify moderators. Focusing
show no relation whereas a number of boys show negative            on individual participants in a sample will undermine that
associations. Put differently, identifying moderators will         goal and fail to advance social media effects research.
reduce effect heterogeneity, but cannot entirely eliminate it
(Bolger et al., 2019). We argue that identifying moderators,                                  Conclusion
not inspecting individual users, echoes calls of differential
susceptibility to media effects models because moderators              What is the relation between social media use and well-
answer the question of what factors in groups of people can        being? Social science has done valuable work over the past
explain media effects – not what factors play a role for           fifteen years to address that question with increasing
individual media users (Valkenburg & Peter, 2013).                 fidelity. It has shown that the relation between general social
    Identifying moderators that can systematically explain         media use and general well-being is rarely large and mostly
effect heterogeneity in a diciplined and accurate way is           close to zero. Using these average relations as a vantage
difficult and we expect social scientists will be tempted to       point to identify factors that make some people more
adopt a ‘shot-gun approach’ and measure and test a large           susceptible to potential effects of social media can be an
number of contructs as moderators. This strategy is doomed         important next step. In other words, there’s promise in
to failure. It will lead to high false positive rates and fool     further studying the variation in the average (mostly) null
social scientists into giving more weight to those moderators      effects the literature has shown. However, we believe the
that ‘worked’ (Munafò et al., 2017; Nosek et al., 2018).           field lacks a principled approach to study effect
Testing a wide slate of seemingly plausible moderators will        heterogeneity. Here, we’ve made an attempt to explain how
inevitably yield statistically significant results; but ignoring   social media effects research has gotten to its current state
researcher degrees of freedom means these results will not         and shown a way forward. Either we follow a principled
be informative unless more advanced statistical methods are        approach to the study of effect heterogeneity: continue
used (Gelman & Loken, 2013; Simmons et al., 2011).                 investigating fixed effects, develop principles to quantify
Instead, theory should identify moderators that researchers        and interpret effect heterogeneity, and identify moderators
test in truly confirmatory tests (Fried, 2020). Only such an       of the relation between social media use and well-being.
approach can systematically explain effect heterogeneity           Such moderators may well turn out to be user motivation and
that can be generalized to the population.                         the content people engage with. Or we focus on individual
Preprint under review                                                                                               12

social media users and conduct more qualitative and N = 1        Beyens, I., Pouwels, J. L., van Driel, I. I., Keijsers, L., &
studies. Either of these paths will be insightful, but we                 Valkenburg, P. M. (2020). The effect of social
mustn’t confuse them.                                                     media on well-being differs from adolescent to
                                                                          adolescent. Scientific Reports, 10(1), 10763.
                           References                                     https://doi.org/10.1038/s41598-020-67727-7
                                                                 Bolger, N., Zee, K. S., Rossignac-Milon, M., & Hassin, R.
Aalbers, G., Vanden Abeele, M. M. P., Hendrickson, A. T.,                 R. (2019). Causal processes in psychology are
          de Marez, L., & Keijsers, L. (2021). Caught in the              heterogeneous.      Journal      of     Experimental
          moment: Are there person-specific associations                  Psychology:       General,     148(4),     601–618.
          between momentary procrastination and passively                 https://doi.org/10.1037/xge0000558
          measured smartphone use? Mobile Media &                Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W.,
          Communication,                  2050157921993896.               Poulsen, J. R., Stevens, M. H. H., & White, J. S. S.
          https://doi.org/10.1177/2050157921993896                        (2009). Generalized linear mixed models: A
Anvari, F., Kievit, R., Lakens, D., Przybylski, A. K.,                    practical guide for ecology and evolution. Trends
          Tiokhin, L., Wiernik, B. M., & Orben, A. (2021).                in Ecology and Evolution, 24(3), 127–135.
          Evaluating the practical relevance of observed                  https://doi.org/10.1016/j.tree.2008.10.008
          effect sizes in psychological research. PsyArXiv.      Bruggeman, H., Van Hiel, A., Van Hal, G., & Van Dongen,
          https://doi.org/10.31234/osf.io/g3vtr                           S. (2019). Does the use of digital media affect
Anvari, F., & Lakens, D. (2021). Using anchor-based                       psychological well-being? An empirical test
          methods to determine the smallest effect size of                among children aged 9 to 12. Computers in Human
          interest. Journal of Experimental Social                        Behavior,               101,               104–113.
          Psychology,                96,              104159.             https://doi.org/10.1016/j.chb.2019.07.015
          https://doi.org/10.1016/j.jesp.2021.104159             Carr, N. (2011). The shallows—How the internet is
Appel, M., Marker, C., & Gnambs, T. (2020). Are social                    changing the way we think, read and remember.
          media ruining our lives? A review of meta-analytic              Atlantic Books.
          evidence. Review of General Psychology, 24(1),         Cohen, J. (1988). Statistical power analysis for the
          60–74.                                                          behavioral sciences (2nd ed.). Lawrence Erlbaum.
          https://doi.org/10.1177/1089268019880891               Council on Communications and Media. (2016). Media and
Baguley, T. (2009). Standardized or simple effect size: What              Young         Minds.        Pediatrics,      138(5).
          should be reported? British Journal of Psychology,              https://doi.org/10.1542/peds.2016-2591
          100(3),                                    603–617.    Coyne, S. M., Rogers, A. A., Zurcher, J. D., Stockdale, L.,
          https://doi.org/10.1348/000712608X377117                        & Booth, M. (2019). Does time spent using social
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013).               media impact mental health?: An eight year
          Random effects structure for confirmatory                       longitudinal study. Computers in Human Behavior,
          hypothesis testing: Keep it maximal. Journal of                 106160.
          Memory and Language, 68(3), 255–278.                            https://doi.org/10.1016/j.chb.2019.106160
          https://doi.org/10.1016/j.jml.2012.11.001              DeBruine, L. M., & Barr, D. J. (2021). Understanding
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015).                 mixed-effects models through data simulation.
          Fitting linear mixed-effects models using lme4.                 Advances in Methods and Practices in
          Journal of Statistical Software, 67(1), 1–48.                   Psychological        Science,       4(1),      1–15.
          https://doi.org/10.18637/jss.v067.i01                           https://doi.org/10.1177/2515245920965119
Bayer, J. B., Triệu, P., & Ellison, N. B. (2020). Social media   Dickson, K., Richardson, M., Kwan, I., Macdowall, W.,
          elements, ecologies, and effects. Annual Review of              Burchett, H., Stansfield, C., & Thomas, J. (2019).
          Psychology,              71(1),            471–497.             Screen-based activities and children and young
          https://doi.org/10.1146/annurev-psych-010419-                   people’s mental health and psychosocial
          050944                                                          wellbeing: A systematic map of reviews. London:
Beard, K. W. (2005). Internet addiction: A review of current              EPPI-Centre, Social Science Research Unit, UCL
          assessment techniques and potential assessment                  Institute of Education, University College London.
          questions. CyberPsychology & Behavior, 8(1), 7–        Dienes, Z. (2019). How do I know what my theory predicts?
          14. https://doi.org/10.1089/cpb.2005.8.7                        Advances in Methods and Practices in
You can also read