Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia

Page created by Hugh Wells
 
CONTINUE READING
Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia
Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia

                                        Nicolò Gozzi,1 Michele Tizzani,2 Michele Starnini,2 Fabio Ciulla,3 Daniela Paolotti,2 André Panisson,2, ∗ and Nicola Perra1, †
                                                                        1
                                                                            Networks and Urban Systems Centre, University of Greenwich, London, UK
                                                                                                  2
                                                                                                    ISI Foundation, Turin, Italy
                                                                                               3
                                                                                                 Quid Inc., San Francisco, USA
                                                                                                       (Dated: June 12, 2020)
                                                           The exposure and consumption of information during epidemic outbreaks may alter risk perception, trigger
                                                        behavioural changes, and ultimately affect the evolution of the disease. It is thus of the uttermost importance to
                                                        map information dissemination by mainstream media outlets and public response. However, our understanding
                                                        of this exposure-response dynamic during COVID-19 pandemic is still limited. In this paper, we provide a
                                                        characterization of media coverage and online collective attention to COVID-19 pandemic in four countries: Italy,
                                                        United Kingdom, United States, and Canada. For this purpose, we collect an heterogeneous dataset including
                                                        227, 768 online news articles and 13, 448 Youtube videos published by mainstream media, 107, 898 users posts
                                                        and 3, 829, 309 comments on the social media platform Reddit, and 278, 456, 892 views to COVID-19 related
                                                        Wikipedia pages. Our results show that public attention, quantified as users activity on Reddit and active searches
arXiv:2006.06446v1 [cs.SI] 8 Jun 2020

                                                        on Wikipedia pages, is mainly driven by media coverage and declines rapidly, while news exposure and COVID-
                                                        19 incidence remain high. Furthermore, by using an unsupervised, dynamical topic modeling approach, we show
                                                        that while the attention dedicated to different topics by media and online users are in good accordance, interesting
                                                        deviations emerge in their temporal patterns. Overall, our findings offer an additional key to interpret public
                                                        perception/response to the current global health emergency and raise questions about the effects of attention
                                                        saturation on collective awareness, risk perception and thus on tendencies towards behavioural changes.

                                                                I.    INTRODUCTION                                  the spreading was isolated in far away locations. For example,
                                                                                                                    the hashtag #MilanoNonSiFerma (Milan does not stop) was
                                           “In the next influenza pandemic, be it now or in the future,             coined to invite citizens in Milan to go out and live normally.
                                        be the virus mild or virulent, the single most important weapon             Free aperitifs were offered in Venice. In hindsight, of course, is
                                        against the disease will be a vaccine. The second most impor-               easy to criticise the initial response in Italy. In fact, the country
                                        tant will be communication” [1]. This evocative sentence was                has been one of the first to experience rapid growth of hospi-
                                        written in May 2009 by John M. Barry, in the early phases of                talizations [7]. However, the Mayor of London, twelve days
                                        what soon after become the H1N1 2009 pandemic. In his essay,                before the national lockdown, and few days after the extension
                                        Barry summarised the mishandling of the deadly 1918 Spanish                 of the cordon sanitaire to the entire country in Italy, affirmed
                                        flu highlighting the importance of precise, effective and honest            via his official Facebook page “we should carry on doing what
                                        information in the onset of health crises.                                  we’ve been doing” [8]. More in general, in several western
                                           Eleven years later we find ourselves dealing with another                countries, the news coming from others reporting worrying
                                        pandemic. The cause is not a novel strain of influenza, but                 epidemic outbreaks were not considered as relevant for the
                                        these words are, unfortunately, still extremely relevant. In fact,          internal situation. This initial phase aimed at conveying low
                                        as the SARS-CoV-2 sweeps the world and the vaccine is just                  local risk and boosting confidence about national safety has
                                        a far vision of hope, the most important weapons to reduce                  been repeated, at different times, across countries. A series
                                        the burden of the disease are non-pharmaceutical interven-                  of surveys conducted in late February provide a glimpse of
                                        tions [2, 3]. Social distancing became paramount, gatherings                the possible effects of these approaches. They report that cit-
                                        have been cancelled, mobility within and across countries have              izens of several European countries, despite the grim news
                                        been dramatically reduced. While such measures have been                    coming from Asia, were overly optimistic about the health
                                        enforced to different extents across nations, they all rely on              emergency placing their risk of infections to be 1% or less [9].
                                        compliance. Their effectiveness is linked to risk and suscep-               As happened in 1918, the countries that reacted earlier rather
                                        tibility perception [4], thus the information that citizens are             than later were able to control the virus with significant less
                                        exposed to is fundamental.                                                  victims [10–14].
                                           History repeats itself and we seem not be able to learn from                History repeats itself, but the context often is radically dif-
                                        our past mistakes. As happened in 1918, despite early evi-                  ferent. In 1918, news circulated slowly via news papers, con-
                                        dences from China [5, 6], the virus was first equated, by many,             trolled by editorial choices, and of course words of mouth. In
                                        to the normal seasonal flu. As happened in 1918, many na-                   2009, we witnessed the first pandemic in the social media era.
                                        tional and regional governments organised campaigns aimed                   Newspapers and TV were still very important source of infor-
                                        at boosting social activities (and thus local economies) actively           mation, but Twitter, Facebook, YouTube, Wikipedia started to
                                        trying to convince people that their cities were safe and that              become relevant for decentralized news consumption, boost-
                                                                                                                    ing peer discussions, and misinformation spread. Today these
                                                                                                                    platforms and websites are far more popular, integral part of so-
                                        ∗   andre.panisson@isi.it                                                   ciety and instrumental pieces of the national and international
                                        †   n.perra@greenwich.ac.uk                                                 news circulations. Together with traditional news media, they
Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia
2

are the principal sources of information for the public. As such,    towards different topics are in good accordance, interesting
they are fundamental drivers of people perception, opinions,         deviations emerge in their temporal patterns. Also, we high-
and thus behaviours. This is particularly relevant for health        light that, at the end of our observation period, general interest
issues. For example, about 60% of adults in the USA consulted        grows towards topics about the resumption of activities after
online sources to gather health information [15].                    lockdown, the search for a vaccine against Sars-Cov-2, ac-
   With respect to past epidemics and pandemics, studies on          quired immunity and antibodies tests.
traditional news coverage of the 2009 H1N1 pandemic high-            Overall, the research presented here offers insights to inter-
lighted the importance of framing and its effect on people’s         pret public perception/response to the current global health
perception, behaviours (such as vaccination intent), stigmatisa-     emergency, raises interrogatives about the effects of attention
tion of cultures at the epicentre of the outbreak, and how these     saturation on collective awareness, risk perception and thus on
factors differ across countries/cultures [16–21]. During Zika        tendencies towards behavioural changes.
epidemic in 2016, public attention was synchronised across US
states, driven by news coverage about the outbreak and inde-
pendently of the real local risk of infection [22]. With respect         II.    IMPACT OF MEDIA COVERAGE AND EPIDEMIC
to COVID-19 pandemic itself, a recent study clearly shows                      PROGRESSION ON COLLECTIVE ATTENTION
how Google searches for “coronavirus” in the USA spiked sig-
nificantly right after the announcement of the first confirmed          How is collective attention shaped by news media coverage
case in each state [23]. Several studies based on Twitter data       and epidemic progression? To tackle this important question,
also highlight how misinformation and low quality information        we collected an heterogeneous dataset that includes COVID-
about COVID-19, although overall limited, spread before the          19 related news articles and Youtube videos published online
local outbreak and rapidly took off once the local epidemic          by mainstream information media, relevant posts and relative
started. In the current landscape, this has the potential to boost   discussion of geolocalized Reddit users, and country-specific
irrational, unscientific, and dangerous behaviours [24–26].          views to Wikipedia pages related to COVID-19 for Italy, United
   On the other hand, despite some important limitations [27],       Kingdom, United States and Canada (see subsections A, B,
modern media has become a key data source to observe                 C of Methods and Materials for details). This choice aims to
and monitor health. In fact, posts on Twitter [28–33], Face-         provide an overview of media coverage and a proxy of public
book [34], and Reddit [35, 36], page views in Wikipedia [37,         attention and response. On the one hand, the study of news arti-
38] and searches on Google [39, 40] have been used to study,         cles and videos allows us to estimate the exposure of the public
nowcast and predict the spreading of infectious diseases as well     to COVID-19 pandemic in traditional news media. On the
as the prevalence of noncommunicable illnesses. Therefore, in        other hand, the study of users discussions and response on so-
the current full-fledged digital society, information is not only    cial media (through Reddit) and information seeking (through
key to inform people’s behaviour but can be used to develop          Wikipedia page views) allows us to quantify the reaction of in-
an unprecedented understanding of such behaviours, as well as        dividuals to both the COVID-19 pandemic and news exposure.
of the phenomena driving them.                                       As mentioned in the introduction, previous studies showed the
   The context where COVID-19 is unfolding is thus very het-         usefulness of social media, internet use and search trends to
erogeneous and complex. Traditional and social media are             analyze health-related information streams and monitor public
integral parts of our perception and opinions, have the potential    reaction to infectious diseases [41–45].
to trigger behaviour change and thus influence the pandemic             Hence, we consider volume of comments of geolocalized
spreading. Such complex landscape must be characterized in           users on the subreddit /r/Coronavirus1 to explore the public
order to understand the public attention and response to media       discussion in reaction to media covering the epidemic in the
coverage. Here, we tackle this challenge by assembling an het-       various countries, while we consider the number of views of
erogeneous dataset which includes 227, 768 news and 13, 448          relevant Wikipedia pages about COVID-19 pandemic to quan-
YouTube videos published by traditional media, 278, 456, 892         tify users interest. It is important to stress how Reddit and
views of topical Wikipedia pages, 107, 898 submissions and           Wikipedia provide different aspects of online users behaviour
3, 829, 309 comments from 417, 541 distinct users on Red-            and collective response. In fact, while Reddit posts can be
dit, as well as epidemic data in four different countries: Italy,    regarded as a general indicator of the online discussion sur-
United Kingdom, United States, and Canada.                           rounding the global health emergency, the number of access
                                                                     to COVID-19 related Wikipedia pages is a proxy of health in-
   First, we explore how media coverage and epidemic pro-
                                                                     formation seeking behaviour (HISB). HISB is the act through
gression influence public attention and response. To achieve
                                                                     which individuals retrieve and acquire new knowledge about
this, we analyze news volume and COVID-19 incidence with
                                                                     a specific topic related to health [46, 47], and it is likely to be
respect to Wikipedia page views volume and Reddit comments.
                                                                     triggered on a population scale by a disrupting event, such as
Our results show that public attention and response are mostly
                                                                     the threaten of a previously unknown disease [48, 49].
driven by media coverage rather than disease spreading. Fur-
thermore, we observe typical saturation and memory effects
of public collective attention. Moreover, using an unsuper-
vised topic modeling approach, we explore the different topics       1   Subreddits are user-created areas of interest where discussions are organized
framed in traditional media and in Reddit discussions. We                in Reddit. /r/Coronavirus is the subreddit related to discussions surrounding
show that, while attentions of news outlets and online users             COVID-19 pandemic
3

Figure 1:      Normalized weekly volume of news articles and
Youtube videos (news)), Reddit comments (reddit), Wikipedia views
(wikipedia) related to COVID-19 pandemic and COVID-19 incidence
                  (covid inc.) in different countries.

   Our analysis starts by comparing, in Figure 1, the weekly
volume of news and videos published on Youtube, Wikipedia           Figure 2: Country specific Pearson correlation coefficients between:
views, and Reddit comments of geolocalized users in compari-        1) news coverage and global/domestic COVID-19 incidence, volumes
son with the weekly COVID-19 incidence in the four countries        of Reddit comments and Wikipedia views; 2) global COVID-19 in-
considered. It can be seen how, as COVID-19 spreads, both           cidence and volumes of Reddit comments and Wikipedia views; 3)
media coverage and public interest grow in time. However,           domestic COVID-19 incidence and volumes of Reddit comments and
public attention, quantified by the number of Reddit comments                                Wikipedia views.
and Wikipedia views, sharply decreases after reaching a peak,
despite the volume of news and COVID-19 incidence remain-
ing high. Furthermore, the peak in public attention consis-         while media coverage is generally well synchronized with the
tently anticipates the maximum media exposure and maximum           global COVID-19 incidence, the media attention gradually
COVID-19 incidence.                                                 shifts towards the internal evolution of the pandemic as soon as
   The correlation between media coverage, public attention,        domestic outbreaks erupt. Arguably, this may have played an
and the epidemic progression is quantified more in details in       important role in individual risk perception. We can speculate
Figure 2. The plot shows that news coverage of each country         that re-framing the emergency within a national dimension had
is strongly correlated with COVID-19 incidence (both global         the potential to amplify the perceived susceptibility of indi-
and domestic), and slightly less with the volume of Reddit          viduals [50, 51] and thus increase the adoption of behavioural
comments and Wikipedia views, which, in turn, are much              changes [4, 52]. Indeed, previous studies showed how at the
less correlated with COVID-19 incidence (both global and do-        beginning of February 2020 people were overly optimistic re-
mestic). This holds for all countries under consideration and       garding the risks associated with the new virus circulating in
highlights how the disease spreading triggers media coverage,       Asia, and how their perception sharply changed after first cases
and how the public response is more likely driven by such news      were confirmed in their countries [9, 53].
exposure in each country rather than COVID-19 progression.             To explore more systematically the relationship between
Beyond these observations, it is interesting to notice from Fig-    media coverage, public attention and epidemic progression, we
ure 2 that Italy is the only country where news volume shows        consider a linear regression model to nowcast, separately for
higher correlation with domestic rather than global incidence.      each country, collective public attention (quantified with the
This suggests that Italian media coverage follows more closely      number of comments by geolocalized Reddit users or visits to
the internal evolution rather than the global one, at odds with     relevant Wikipedia pages) given the volume of media cover-
respect to other countries. This is probably due to Italy being     age or the COVID-19 incidence as independent variables. We
the location of the first COVID-19 outbreak outside Asia.           include also “memory effects” in the public attention by con-
   This observation is supported by Figure 3, showing the ci-       sidering an exponential decaying term in the news time series
tation share of Italian locations by Italian news media, before     [22] (see subsection D of Methods and Materials for details).
and after the first COVID-19 death was confirmed in Italy           We compare the three models, where the independent vari-
on 2020/02/20. After this date, Italian locations represent         able(s) are the domestic incidence, the news volume, the news
about 74% of all places cited by Italian media (in our dataset),    volume plus a memory term, by using the Akaike Information
with an increase of 45% with respect to the same statistics         Criterion (AIC) [54] and coefficient of determination (R2 ). We
calculated before. Similar effects, though generally less in-       found that the model considering only COVID-19 incidence
tense, can be observed also in the other countries. Therefore,      has much less predictive power than the ones considering me-
4

                                                                                       news         newsM EM                   R2
                                                                                 reddit wikipedia reddit wikipedia      reddit wikipedia
                                                                          Italy   0.87    0.43    -0.41    -0.15         0.82    0.73
                                                                          UK      0.95    0.99    -0.44    -0.47         0.82    0.85
                                                                          US      1.07    0.87    -0.48    -0.44         0.88    0.82
                                                                          Canada 1.12     1.06    -0.40    -0.45         0.90    0.82

                                                                       Table I: Coefficient estimates for news plus memory effects linear
                                                                       regression model and related R2 . It is worth underlying that, despite
                                                                       the simplicity of this approach, we obtain relatively high values of
                                                                                                       R2 .

                                                                           III.   DYNAMICS OF CONTENT PRODUCTION AND
                                                                                           CONSUMPTION

                                                                           While collective attention and media coverage are well cor-
Figure 3: Share of citations of China versus home country locations    related in terms of volume, the content and topics discussed
by Italian/UK/US/Canadian news outlets before and after first COVID-   by media and consumed by online users may not be as syn-
19 death occurred in different countries considered. Geographic        chronized [65, 66]. To shed light on this issue, we adopt an
           locations are extracted from text using [55, 56]            unsupervised topic modeling approach to extract prevalent top-
                                   .                                   ics in the news articles mentioned and discussed on Reddit.
                                                                       Often, indeed, users on Reddit post a submission containing a
                                                                       news article, and discussion unfolds in comments under such
                                                                       submission. Differently from the first part and to provide a
                                                                       comprehensive overview of the topics discussed, here we do
dia coverage (see Table II in Methods and Material). This              not take into account any geographical context. Nonetheless, in
enforces the idea that collective attention is mainly driven by        the Supplementary Information we provide some insights also
media coverage rather than COVID-19 incidence. In addition,            on the specific topics discussed by users in different countries.
we found that including memory effects improves significantly              We characterize the main topics discussed on Reddit by con-
the model performance. Not surprisingly, the coefficients of           sidering all submissions that include a news article in English.
the “memory effects” term reported in Table I are negative             We then apply a topic modelling approach on the content of
for all countries. This implies that public attention actually         this news article set. Specifically, we extract topics by means
saturates in response to news exposure and gives us the chance         of non-Negative Matrix Factorization (NMF) [67], a popular
to quantify the rate at which this phenomenon happens.                 method for this kind of tasks (see subsection E of Methods
                                                                       and Materials for details). In this way, we extract the n = 64
   The results presented so far are in very good accordance with       most relevant topics in the news shared on Reddit. As a second
findings obtained in previous contexts related to epidemics and        step, we apply the model trained on the Reddit news to the set
pandemics. Indeed, a similar media-driven spiky unfolding of           of articles published by mainstream media. That is, we char-
public attention, measured through the information seeking and         acterize the news published by media in terms of the topics
public discussions of online users, has been observed during           discussed on Reddit. This choice allows us to directly compare
the 2009 H1N1 influenza pandemic [57, 58], the 2016 Zika               the topics covered by media with the public discussion around
outbreak [59], the seasonal flu [60] and during more localized         such news exposure. A complete list of the 64 topics extracted
public health emergency such as the 2013 measles outbreak              with the most frequent words is provided in the Supplementary
in Netherlands [61]. Our findings confirm the central role of          Information.
media, showing how media exposure is capable of shaping and                We consider the number of articles published on a certain
driving collective attention during a national and global health       topic as a proxy of general interest of traditional media towards
emergency.                                                             it, while we measure the collective interest of Reddit users by
                                                                       the number of comments under the news articles on a specific
   Media exposure is an important factor that can influence            topic. Figure 4 shows an overview of the topics extracted and
individual risk perception as well [62–64]. The timing and             a comparison of the interest of media and Reddit users. We
framing of the information disseminated by media can actually          find a diverse and heterogeneous set of topics. Among others,
modulate the attention and ultimately the behaviour of individ-        we recognize topics about the global spreading of the virus
uals [2]. This becomes an even greater concern in a context            (Outbreaks, WHO, CDC), COVID-19 symptoms, treatment,
where the most effective strategy to fight the spreading are           hospitals and care facilities (Symptoms, Medical Treatment,
containment measures based on individuals’ behaviour. For              Medical Staff, Care Facilities), the economic impact of the pan-
this reason, in the next section we characterize media coverage        demic and responses from the governments to the upcoming
and online users response more specifically in terms of content        crisis (Economy, Money), different societal aspects (Sports,
produced and consumed.                                                 Religious Services, Education), and also the possible interven-
5

                                                                        Figure 5: Scatter plot with the 64 topics extracted via NMF. X-axis
                                                                        (Y-axis) coordinate indicates when the topic achieved 50% of its
                                                                           relevance in news outlets (Reddit) during our analysis interval.

                                                                        of the population in the Santa Clara County was infected re-
                                                                        spect to what originally thought [68]. Since the study suggests
                                                                        a lower mortality rate, the preprint has been quickly leveraged
                                                                        to support protest against lockdowns2 , while substantial flaws
                                                                        have been detected in the scientific methodology of the paper 3 .
                                                                           The topics overview presented so far does not take into ac-
                                                                        count any temporal dynamics of interest. However, topics
                                                                        showing a similar overall statistics may present a mismatch
                                                                        in temporal patterns. Hence, in the following, we take into
                                                                        account the temporal evolution of interest towards different
                                                                        topics. In Figure 5 we represent each topic as a single point: its
Figure 4: Difference in interest percentage share of different topics   x-coordinate (y-coordinate) indicates when such topic reached
by traditional media and Reddit users. For example, +2% on the          50% of its total relevance in news outlets (on Reddit) dur-
x-axis indicates that traditional media dedicates proportionally 2%     ing the analysis interval (see subsection E of Methods and
  more attention to that specific topic with respect to Reddit users.   Materials for a formal definition of the relevance of a topic).
                                                                        Therefore, topics at the bottom left became relevant very early
                                                                        in the public discussion. Among these, we recognize themes
tions to mitigate the spreading of the virus (Face Masks, Social        centred on early COVID-19 outbreaks (i.e., Chinese, Japanese,
Distancing, Tests, Vaccine).                                            Iranian and Italian outbreaks), the events related to cruise ships,
   Overall, the attention of traditional media and Reddit users         specific countries (i.e., Israel, Singapore and Malaysia), and
towards different topics are in good accordance. Indeed, in             also topics regarding (early) health issues such as Symptoms,
Figure 4 we represent the difference between interest share             Confirmed Cases and the CDC.
towards different topics in media and Reddit submissions. That             On the contrary, topics in the top right became relevant
is, we compute the percentage share of attention dedicated by           toward the end of the analysis interval (early May). Reason-
news outlets and Reddit users to each topic, and we subtract            ably, we find here topics about the resumption of activities
these two quantities. We observe a maximum absolute mis-                after lockdown (i.e. Reopening), the feasibility and timing
match in interest share of 2.61%. Nonetheless, we observe that          of a possible vaccine against Sars-Cov-2 (i.e. Vaccine), and
Reddit users are slightly more interested to topics regarding           discussions regarding acquired immunity and antibodies tests
health (Symptoms, Medical Treatment), non-pharmaceutical                (i.e. Immunity). In-between, we find all other topics clustered
interventions and personal protective equipment (Social Dis-
tancing, Face Masks), studies and information on the epidemic
(Research, Surveys, Santa Clara Study, CDC), and also to spe-           2   https://www.nytimes.com/2020/05/14/opinion/coronavirus-research-
cific public figures such as Anthony Fauci. Interestingly, the              misinformation.html
                                                                        3
Santa Clara Study topic refers to the discussion about a contro-            https://www.theatlantic.com/health/archive/2020/04/pandemic-confusing-
versial scientific paper suggesting that a much higher fraction             uncertainty/610819/
6

around end of March and mid-April 2020, the period when the          characterized the exposure of individuals to COVID-19 pan-
general discussion surrounding COVID-19 pandemic aroused             demic by considering only news articles and Youtube videos
sharply, as also shown in Figure 1.                                  published online by major news outlets. However, individuals
   Note that the diagonal (plotted as a dashed line) in Fig-         are also exposed to relevant information through other chan-
ure 5 separates topics according to their temporal evolution.        nels, with television on top of these [69]. Second, a 2013 Pew
Above (below) the diagonal, we find topics whose interest on         Internet Study found that Reddit users are more likely young
Reddit grows slowly (quickly) with respect to the media cov-         males [70], showing that around 15% of male internet users
erage. Therefore, above the diagonal the interest of Reddit          aged between 18 and 29 declare to use Reddit, compared to the
users is mainly triggered by media exposure, while below it          5% of women in the same age range and to the 8% of men aged
the interest grows faster and declines rapidly despite sustained     between 30 and 49. Similarly, informal surveys proposed to
media exposure. While the top-left and bottom-right regions          users showed that most of respondents were males in their “late
are empty, indicating that, as a first approximation, temporal       teens to mid-20s”, and that female users were “very much in
patterns of attention by traditional media and Reddit users          the minority” [71]. Furthermore, Reddit is much more popular
are well-synchronized, interesting deviations from the diago-        among urban and suburban residents rather than individuals
nal are observable. For example, above the diagonal one can          living in rural areas [70]. Besides socio-demographic biases,
find mainly topics related to various outbreaks, economics and       other works suggested also that Reddit has become more and
politics, for which the interests on Reddit follows the media        more a self-referential community, reinforcing the tendency
coverage. Below the diagonal, we observe topics more related         to focus on its own contents rather than external sources [72].
to everyday life, such as Schools, Medical Staff, Care Facilities,   Thus, perceptions, interests, and behaviours of Reddit users
and Lockdown, for which the attention on Reddit accelerates          may differ from those of the general population. A similar
with respect to media coverage, and then declines rapidly. Note      argument may be raised for Wikipedia searches. Indeed, the
that our view of topics discussed on Reddit is limited, since we     usage of Internet, especially for information seeking purposes,
only consider topics from news articles shared in submissions        can vary across people with different socio-demographic back-
and do not explicitly take into account content expressed in         grounds [73–76].
comments. This ensures a proper comparison with topics ex-              Finally, our view on online users reaction is partial. Indeed,
tracted from news published and explains the absence of points       we do not consider other popular digital data sources such as,
in the bottom right corner of Figure 5.                              for example, Twitter. The reason behind this choice is twofold.
                                                                     First, many studies already characterized public response dur-
                                                                     ing the current and past health emergencies through the lens
                     IV.   CONCLUSIONS                               of Twitter [25, 43, 45, 58, 59, 77–80]. Second, several stud-
                                                                     ies have reported high prevalence of bots as drivers of low
   In this work, we characterized the response of online users       quality information and discussions on COVID-19 on this plat-
to both media coverage and COVID-19 pandemic progression.            form [24, 25, 81–83]. Thus, careful and challenging extra steps
As a first step, we focused on the impact of media coverage on       would be necessary to isolate, identify, and distinguish organic
collective attention in different countries, characterized as vol-   discussions/reactions possibly originated from traditional me-
umes of country-specific Wikipedia pages views and comments          dia from those sparked by social bots. We leave this for future
of geolocalized Reddit users. We showed that collective atten-       work.
tion was mainly driven by media coverage rather than epidemic           In conclusion, our work offers further insights to interpret
progression, rapidly saturated, and decreased despite media          public response to the current global health emergency and
coverage and COVID-19 incidence remaining high. This trend           raises questions about possible undesired effects of commu-
is very similar to that observed during other outbreaks [57–61].     nication. On one hand, our results confirm the pivotal role
Also, we showed how media coverage sharply shifted to the            of media during health emergencies, showing how collective
domestic situation as soon as the first death was confirmed in       attention is mainly driven by media coverage. Therefore, since
the home country, discussing the implications for re-shaping         people are highly reactive to the news they are exposed to, in
individuals perception of risk [9, 53].                              the beginning of an outbreak, the quality and type of informa-
   As a second step, we focused on the dynamics of content           tion provided might have critical effects on risk perception,
production and consumption. We modeled topics published              behaviours, and ultimately on the unfolding of the disease. On
in mainstream media and discussed on Reddit, showing that            the other hand, however, we found that collective online atten-
Reddit users were generally more interested in health, data          tion saturates and declines rapidly despite media exposure and
regarding the new disease, and interventions needed to halt          disease circulation remaining high. Attention saturation has
the spreading with respect to media exposure. By taking into         the potential to affect collective awareness, perceived risk and
account the dynamics of topics extracted, we show that, while        ultimately propensity towards virtuous individual behavioural
their temporal patterns are generally synchronized, the public       changes aimed at mitigating the spreading. Furthermore, espe-
attention for topics related to politics and economics is mainly     cially in case of unknown viruses, attention saturation might
triggered by media exposure, while the interests for topics          exacerbate the spreading of low quality information, which is
more related to daily life accelerates on Reddit with respect to     likely to spread in the early phases of the outbreak when the
media coverage.                                                      characteristics of the disease are uncertain. Future works are
   Of course, our research comes with limitations. First, we         needed to characterize the actual effects of attention saturation
7

on human perceptions during a global health emergency.             Canadian channels.
   Our findings suggest that public health authorities should
consider to reinforce specific communication channels, such           It is important to underline that, while there is a good overlap
as social media platforms, in order to compensate the (natu-       between the sources of news articles and videos, some do not
ral) phenomenon of attention saturation. Indeed, these chan-       match. This is due to the fact that not all news organizations
nels have the potential to create a more durable engagement        run a YouTube channel and others do not produce traditional
with people, through a continuous loop of direct interactions.     articles. In the Supplementary Information, we provide a com-
Currently, we see public health authorities issuing regularly      plete list of news outlets and Youtube channels considered.
declarations on social media. However, the CDC didn’t even
have a Twitter account in 2009 during H1N1 pandemic (the
account was created in May 2010). While this is just an exam-                                  B.   Reddit data set
ple, it underlines how we are relatively new to communicating
such global health emergencies through social medias. There-           Reddit is a social content aggregation website where users
fore, there is great need to further reinforce and engage people   can post, comment and vote content. It is structured in sub-
through these channels. Alongside, public health authorities       communities (i.e. subreddits), centered around a variety of
should consider to strengthen additional communication chan-       topics.
nels. An example can be represented by participatory surveil-          Reddit has already proven to be suitable for a variety of
lance platforms all over the world such as Influenzanet, Flu       research purposes, ranging from the study of user engagement
Near You and FluTracking [84–86], which have the potential of      and interactions between highly related communities [90, 91]
delivering in-depth targeted information to individuals during     to post-election political analyses [92]. Also, it has been used
public health emergencies, to promote the exchange of infor-       to study the impact of linguistic differences in news titles [93]
mation between people and public health authorities, with the      and to explore recent web-related issues such as hate speech
potential to enhance the level of engagement in the community      [94] or cyberbullying [95] as well as health related issues like
[87].                                                              mental illness [96], also providing insights about the opioid
                                                                   epidemics [77, 97].
                                                                       We used the Reddit API to collect all submissions and com-
              V.    METHODS AND MATERIAL                           ments published in Reddit under the subreddit /r/Coronavirus
                                                                   from 15/02/2020 to 15/05/2020. After data cleanup by re-
   In this section we provide general information about the data   moving entries deleted by authors and moderators, we keep
sets collected and the methods used.                               only submissions with score > 1 to avoid spam. We remove
                                                                   comments with less than 10 characters and with more than 3
                                                                   duplicates, to avoid using automatic messages from modera-
                   A.   News Articles and Videos                   tion. Final data contains 107, 898 submissions and 3, 829, 309
                                                                   comments from 417, 541 distinct users.
                                                                       For the submissions, we then selected entries with links to
   We collect news articles using News API, a service that
                                                                   English news outlets. The content of the urls was extracted
allows to download articles published online in a variety of
                                                                   using the available implementation4 of the method described
countries and languages [88]. For each of the country con-
                                                                   in [98], resulting in 66, 575 valid documents.
sidered, we download all relevant articles published online
                                                                       Reddit does not provide any explicit information about users’
by selected sources in the period 2020/02/07 - 2020/05/15.
                                                                   location, therefore we use self reporting via regular expression
We select “relevant” articles considering those citing one of
                                                                   to assign a location to users. Reddit users often declare geo-
the following keywords: ’coronavirus’, ’covid19’, ’covid-19’,
                                                                   graphical information about themselves in submissions or com-
’ncov-19’, ’sars-cov-2’. Note that for each article we have
                                                                   ment texts. We used the same approach as described in [97],
access to title, description and a preview of the whole text.
                                                                   that found the use of regular expressions as reliable, resulting
In total, our dataset consists in 227, 768 news: 71, 461 pub-
                                                                   in high correlation with census data in the US, although we
lished by Italian, 63, 799 by UK, 82, 630 by US, and 9, 878 by
                                                                   acknowledge a potential higher bias at country level due to
Canadian media.
                                                                   heterogeneities in Reddit population coverage and users demo-
   Additionally, we collect all videos published on YouTube
                                                                   graphics. We selected all texts containing expressions such as
by major news organizations, in the four countries under inves-
                                                                   ‘I am from’ or ‘I live in’ and extracted candidate expressions
tigation, via their official YouTube channels using the official
                                                                   from the text that follows the expression, to identify which
API [89]. In doing so, we download title and description of
                                                                   ones represented country locations. By removing inconsistent
all videos and select as relevant those that mention one of the
                                                                   self reporting we were able to assign a country to 789, 909
following keywords: ‘coronavirus’, ‘virus’, ‘covid’, ‘covid19’,
                                                                   distinct users, from which 41, 465 have written at least one
‘sars’, ‘sars-cov-2’, ‘sarscov2’.
                                                                   comment in the subreddit r/Coronavirus (13, 811 from USA,
   The reach of each channel (measured by number of
                                                                   6, 870 from Canada, 3, 932 from UK and 445 from Italy).
subscribers) varies quite drastically from more than 9 million
for CNN (USA) to about 12 thousand for Ansa (Italy). In
total, the YouTube dataset consist of 13, 448 videos: 3, 325
                                                                    4   https://github.com/jcpeterson/openwebtext
by Italian, 3, 525 by British, 6, 288 by American, ans 310 by
8

                     C.   Wikipedia data set                                    incidence              news         news + newsM EM
                                                                            reddit wikipedia     reddit wikipedia   reddit wikipedia
                                                                     Italy   3.71 -98.73         -29.68 -121.3      -76.44   -138.49
   Wikipedia has become a popular digital data source to study       UK     64.24 65.85          -16.03 -20.46      -51.97    -65.24
health information seeking behaviour [57, 99], and to monitor        US     86.02 61.85          -11.76 -14.67      -48.39    -44.93
and forecast the spreading of infectious diseases [100, 101].        Canada 84.53 68.65          -28.53 -13.99      -69.26    -53.08
Here, we use the Wikimedia API [102] to collect the number
of visits per day of Wikipedia articles and the total monthly ac-    Table II: Akaike Information Criterion for the three linear regression
cesses to a specific project from each country. We consider the      models applied to predict Reddit comments and Wikipedia visits.
language as indicative of a specific country, suggesting the rel-    As a practical rule, a model i is preferred to model j if AIC(j) −
evant projects for our analysis to be in English and Italian, i.e.                              AIC(i) ≥ 3
en.wikipedia and it.wikipedia respectively. We choose the arti-
cles directly related to COVID-19 and the ones in the ’see also’
section of each page at the time of the analysis, 2020/02/07         public collective attention. Then, the models considered are:
- 2020/05/15, including country-specific articles (see Supple-            model I) yt = α1 incidencet + ut
mentary Information for full list of web pages considered).
                                                                         model II) yt = α1 newst + ut                                  (2)
   Except for the Italian, where the language is highly indica-
tive of the location, the number of the access to English pages          model III) yt = α1 newst + α2 newsM EMt + ut
are almost evenly distributed among English-speaking coun-
                                                                     Where yt can be either the volume of Reddit comments of
tries. To normalize the signal related to each country we weight
                                                                     geolocalized users or country specific Wikipedia visits, and
the number of daily accesses to a single article from a specific
                                                                     ut is the error term. In Table II we report the results of the
project p, Sp (d), with the total number of monthly accesses
                                                                     three regressions in terms of Akaike Information Criterion
from a country c to the related Wikipedia project Tpc (d) such
                                                                     (AIC) [54]. We observe that the model considering both news
that the daily page views from a given Wikipedia project and
                                                                     volume and memory effects is generally the better choice,
country is
                                                                     while the model considering COVID-19 incidence only is the
                               Sp (d)Tpc (d)                         worst.
                     c
                    ya,p (d) = P c           ,
                                   c Tp (d)
                                                                                            E.    Topic Modeling
where the denominator is the total number of views of the
Wikipedia specific project. The total volume of views at day d          Topic modeling has emerged as one of the most effective
from a country c is then given by the sum over all the articles      methods for classifying, clustering, and retrieving textual data,
a and projects p, namely                                             and has been the object of extensive investigation in the lit-
                                 X                                   erature. Many topic analysis frameworks are extensions of
                     y c (d) =          c
                                       ya,p (d).                     well known algorithms, considered as state-of-the-art for topic
                                 a,p                                 modeling. Latent Dirichlet Allocation (LDA) [103] is the ref-
                                                                     erence for probabilistic topic modeling. Nonnegative matrix
                                                                     factorization (NMF) [67] is the counterpart of LDA for the
  D.   Linear regression approach to model collective attention      matrix factorization community.
                                                                        Although there are many approaches to temporal and hier-
   Above we showed how media coverage, COVID-19 inci-                archical topic modeling [104–106], we choose to apply NMF
dence, and public attention are correlated even across four          to the dataset, and then build time-varying intensities for each
different countries. To move a step forward in this analy-           topic using the articles publication date. Starting from a dataset
sis, we considered a linear regression model that predicts for       D containing the news articles shared in Reddit, we extract
each country the public response given the news exposure.            words and phrases with the methodology described in [107],
To include “memory effects” in the public response to media          discarding terms with frequency below 10, to form a vocabu-
coverage, we consider also a modified version of this simple         lary V with around 60k terms. Each document is then repre-
model, in which we weight cumulative news articles volume            sented as a vector of term counts, in a bag-of-words approach.
time series with an exponential decaying term [22]. Formally,        We apply TF-IDF normalization [108] and extract a total of
we define the new variable:                                          K = 64 topics through NMF:
                                                                                                               2
                            τ                                                                min kX − WHkF ,                           (3)
                            X           ∆t                                                  W,H
          newsM EM =               e−   τ    news(t − ∆t)     (1)
                            ∆t=1                                     where kk2F is the Frobenius norm and X ∈ R|D|×|V| is the ma-
                                                                     trix resulting form TF-IDF normalization, subject to the con-
Where τ is a free parameter that sets the memory time scale and      straint that the values in W ∈ R|D|×K and H ∈ RK×|V| must
is tuned with cross-validation (more details in the Supplemen-       be nonnegative. The nonnegative factorization is achieved us-
tary Information). These two models are compared to a linear         ing the projected gradient method with sparseness constraints,
regression that considers only COVID-19 incidence to predict         as described in [109, 110]. The matrix H is then used as a
9

transformation basis for other datasets, e.g. with a new matrix            Fondazione Cassa di Risparmio di Torino (Fondazione CRT).
X
e we fix H and calculate a new W   e according to Eq. 3.                   M.T. acknowledges support from EPIPOSE -“Epidemic in-
    For each topic k we build a time series sk for each dataset            telligence to minimize COVID-19’s public health, societal
             (t)
D, where sk is the strength of topic k at time t. For the news             and economical impact” H2020-SC1-PHE-CORONAVIRUS-
                     (t)                                                   2020 call. M.S/ and A.P. acknowledge support from the Re-
outlets dataset, sk = i∈D(t) wik , where D(t) is the set of
                         P
                                                                           search Project “Casa Nel Parco” (POR FESR 14/20 - CANP -
all documents shared at time t in news outlets. For Reddit, we
                                                                           Cod. 320 - 16 - Piattaforma Tecnologica “Salute e Benessere”)
weight each shared document by its number of comments, and
  (t)                                                                      funded by Regione Piemonte in the context of the Regional
sk = i∈D(t) wik · ci , where D(t) is the set of all documents
         P
                                                                           Platform on Health and Wellbeing. A.P. acknowledges partial
shared at time t in Reddit, and ci is the number of comments               support from Intesa Sanpaolo Innovation Center. The fun-
associated to document i. Finally, we define the relevance of a            ders had no role in study design, data collection and analysis,
topic as the integral in time of the strength. Therefore, given            decision to publish, or preparation of the manuscript. N.G.
t0 and tf as the start/end of our analysis interval, and given             acknowledges support from the Doctoral Training Alliance.
       Rt      (t)
R = t0f dtsk , the coordinates of Figure 5 are the t1/2 such
      R t1/2     (t)
that t0 dtsk = R/2.

                                                                                                  CONTRIBUTIONS
                    ACKNOWLEDGMENTS
                                                                           N.G, M.S., D.P., A.P. and N.P. conceptualized the study. N.G.,
  Authors would like to thank the startup Quick Algorithm for              N.P., A.P. and M.T. collected the data. N.G., A.P. and F.C.
providing the platform https://covid19.scops.ai/                           performed analyses. N.G., M.S. and N.P. wrote the initial
scops/home/, where the data collected during COVID-19                      draft of the manuscript. N.G. and A.P. provided visualiza-
pandemic were visualized in real-time. D.P. and M.T. acknowl-              tion. All authors (N.G., N.P., D.P., M.S., A.P., M.T., F.C.)
edge support from the Lagrange Project of the Institute for                discussed the research design, reviewed, edited, and approved
Scientific Interchange Foundation (ISI Foundation) funded by               the manuscript.

  [1] John M Barry, “Pandemics: avoiding the mistakes of 1918,”              [9] Jocelyn Raude, Marion Debin, Cécile Souty, Caroline Guer-
      Nature 459, 324–325 (2009).                                                risi, Clement Turbelin, Alessandra Falchi, Isabelle Bonmarin,
  [2] Sebastian Funk, Marcel Salathé,          and Vincent A. A.                Daniela Paolotti, Yamir Moreno, Chinelo Obi, et al., “Are
      Jansen, “Modelling the influence of human behaviour                        people excessively pessimistic about the risk of coronavirus
      on the spread of infectious diseases: a review,” Jour-                     infection?” (2020).
      nal of The Royal Society Interface 7, 1247–1256 (2010),               [10] Moritz UG Kraemer, Chia-Hung Yang, Bernardo Gutier-
      https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2010.0142.         rez, Chieh-Hsi Wu, Brennan Klein, David M Pigott, Louis
  [3] Frederik Verelst, Lander Willem, and Philippe Beutels,                     du Plessis, Nuno R Faria, Ruoran Li, William P Hanage, et al.,
      “Behavioural change models for infectious disease trans-                   “The effect of human mobility and control measures on the
      mission: a systematic review (2010-2015),” Journal of The                  covid-19 epidemic in china,” Science 368, 493–497 (2020).
      Royal Society Interface 13 (2016), 10.1098/rsif.2016.0820,            [11] Benjamin F. Maier and Dirk Brockmann, “Effective
      https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2016.0820.         containment explains subexponential growth in recent con-
  [4] Irwin M. Rosenstock, Victor J. Strecher, and Marshall H.                   firmed covid-19 cases in china,” Science 368, 742–746 (2020),
      Becker, “Social learning theory and the health belief model,”              https://science.sciencemag.org/content/368/6492/742.full.pdf.
      Health Education Quarterly 15, 175–183 (1988), pMID:                  [12] Roy M Anderson, Hans Heesterbeek, Don Klinkenberg, and
      3378902, https://doi.org/10.1177/109019818801500203.                       T Déirdre Hollingsworth, “How will country-based mitigation
  [5] Joseph T Wu, Kathy Leung, Mary Bushman, Nishant Kishore,                   measures influence the course of the covid-19 epidemic?” The
      Rene Niehus, Pablo M de Salazar, Benjamin J Cowling, Marc                  Lancet 395, 931–934 (2020).
      Lipsitch, and Gabriel M Leung, “Estimating clinical severity          [13] Juliet Bedford, Delia Enria, Johan Giesecke, David L Heymann,
      of covid-19 from the transmission dynamics in wuhan, china,”               Chikwe Ihekweazu, Gary Kobinger, H Clifford Lane, Ziad
      Nature Medicine 26, 506–510 (2020).                                        Memish, Myoung-don Oh, Anne Schuchat, et al., “Covid-19:
  [6] “Covid19               timeline,”             https://www.                 towards controlling of a pandemic,” The Lancet 395, 1015–
      who.int/news-room/detail/                                                  1018 (2020).
      27-04-2020-who-timeline---covid-19                           (),      [14] Tim Colbourn, “Covid-19: extending or relaxing distancing
      accessed: 2020-05-11.                                                      control measures,” The Lancet Public Health (2020).
  [7] “Who covid19 official situation reports,” https:                      [15] Susannah Fox and Maeve Duggan, “Health online 2013,”
      //www.who.int/emergencies/diseases/                                        Health 2013, 1–55 (2013).
      novel-coronavirus-2019/situation-reports/                             [16] Seow Ting Lee, “Predictors of h1n1 influenza pandemic news
      (), accessed: 2020-05-11.                                                  coverage: Explicating the relationships between framing and
  [8] “Mayor of london accouncement march 11th,”                                 news release selection,” International Journal of Strategic Com-
      https://www.facebook.com/sadiqforlondon/                                   munication 8, 294–310 (2014).
      posts/3025766374142796, accessed: 2020-05-11.
10

[17] Michael McCauley, Sara Minsky, and Kasisomayajula                         in Proceedings of the 5th Annual ACM Web Science Conference
     Viswanath, “The h1n1 pandemic: media frames, stigmatization               (2013) pp. 47–56.
     and coping,” BMC Public Health 13, 1116 (2013).                    [33]   David A Broniatowski, Michael J Paul, and Mark Dredze,
[18] Carolyn A Lin and Carolyn Lagoe, “Effects of news media                   “National and local influenza surveillance through twitter: an
     and interpersonal interactions on h1n1 risk perception and                analysis of the 2012-2013 influenza epidemic,” PloS one 8
     vaccination intent,” Communication Research Reports 30, 127–              (2013).
     136 (2013).                                                        [34]   Matheus Araujo, Yelena Mejova, Ingmar Weber, and Fabricio
[19] Seow Ting Lee and Iccha Basnyat, “From press release to news:             Benevenuto, “Using facebook ads audiences for global lifestyle
     mapping the framing of the 2009 h1n1 a influenza pandemic,”               disease surveillance: Promises and limitations,” in Proceedings
     Health Communication 28, 119–132 (2013).                                  of the 2017 ACM on Web Science Conference (2017) pp. 253–
[20] Hyun Jung Oh, Thomas Hove, Hye-Jin Paek, Byoungkwan                       257.
     Lee, Hyegyu Lee, and Sun Kyu Song, “Attention cycles and           [35]   Albert Park and Mike Conway, “Tracking health related dis-
     the h1n1 pandemic: A cross-national study of us and korean                cussions on reddit for public health applications,” in AMIA
     newspaper coverage,” Asian Journal of Communication 22,                   Annual Symposium Proceedings, Vol. 2017 (American Medical
     214–232 (2012).                                                           Informatics Association, 2017) p. 1362.
[21] Maria Keramarou, S Cottrell, MR Evans, C Moore, RE Stiff,          [36]   Mrinal Kumar, Mark Dredze, Glen Coppersmith, and Mun-
     C Elliott, DR Thomas, M Lyons, and RL Salmon, “Two waves                  mun De Choudhury, “Detecting changes in suicide content
     of pandemic influenza a (h1n1) 2009 in wales–the possible                 manifested in social media following celebrity suicides,” in
     impact of media coverage on consultation rates, april–december            Proceedings of the 26th ACM conference on Hypertext & So-
     2009,” Eurosurveillance 16, 19772 (2011).                                 cial Media (2015) pp. 85–94.
[22] Michele Tizzoni, André Panisson, Daniela Paolotti, and Ciro       [37]   Nicholas Generous, Geoffrey Fairchild, Alina Deshpande,
     Cattuto, “The impact of news exposure on collective attention             Sara Y Del Valle, and Reid Priedhorsky, “Global disease mon-
     in the united states during the 2016 zika epidemic,” PLoS                 itoring and forecasting with wikipedia,” PLoS computational
     computational biology 16, e1007633 (2020).                                biology 10 (2014).
[23] Ana I Bento, Thuy Nguyen, Coady Wing, Felipe Lozano-Rojas,         [38]   Kyle S Hickmann, Geoffrey Fairchild, Reid Priedhorsky,
     Yong-Yeol Ahn, and Kosali Simon, “Evidence from internet                  Nicholas Generous, James M Hyman, Alina Deshpande, and
     search data shows information-seeking responses to news of                Sara Y Del Valle, “Forecasting the 2013–2014 influenza season
     local covid-19 cases,” Proceedings of the National Academy of             using wikipedia,” PLoS computational biology 11 (2015).
     Sciences (2020).                                                   [39]   Jeremy Ginsberg, Matthew H Mohebbi, Rajan S Patel, Lynnette
[24] Riccardo Gallotti, Francesco Valle, Nicola Castaldo, Pierluigi            Brammer, Mark S Smolinski, and Larry Brilliant, “Detecting
     Sacco, and Manlio De Domenico, “Assessing the risks of”                   influenza epidemics using search engine query data,” Nature
     infodemics” in response to covid-19 epidemics,” arXiv preprint            457, 1012–1014 (2009).
     arXiv:2004.03997 (2020).                                           [40]   Andrea Freyer Dugas, Mehdi Jalalpour, Yulia Gel, Scott Levin,
[25] Lisa Singh, Shweta Bansal, Leticia Bode, Ceren Budak,                     Fred Torcaso, Takeru Igusa, and Richard E Rothman, “In-
     Guangqing Chi, Kornraphop Kawintiranon, Colton Padden,                    fluenza forecasting with google flu trends,” PloS one 8 (2013).
     Rebecca Vanarsdall, Emily Vraga, and Yanchen Wang, “A first        [41]   Gunther Eysenbach, “Infodemiology and infoveillance tracking
     look at covid-19 information and misinformation sharing on                online health information and cyberbehavior for public health,”
     twitter,” arXiv preprint arXiv:2003.13907 (2020).                         American journal of preventive medicine 40, S154–8 (2011).
[26] Matteo Cinelli, Walter Quattrociocchi, Alessandro Galeazzi,        [42]   Gabriel J Milinovich, Gail M Williams, Archie C A Clements,
     Carlo Michele Valensise, Emanuele Brugnoli, Ana Lu-                       and Wenbiao Hu, “Internet-based surveillance systems for mon-
     cia Schmidt, Paola Zola, Fabiana Zollo, and Antonio                       itoring emerging infectious diseases,” The Lancet Infectious
     Scala, “The covid-19 social media infodemic,” arXiv preprint              Diseases 14, 160 – 168 (2014).
     arXiv:2003.05004 (2020).                                           [43]   Han Woo Park, Sejung Park, and Miyoung Chong, “Conver-
[27] David Lazer, Ryan Kennedy, Gary King, and Alessandro                      sations and medical news frames on twitter: Infodemiological
     Vespignani, “The parable of google flu: traps in big data analy-          study on covid-19 in south korea,” J Med Internet Res 22,
     sis,” Science 343, 1203–1205 (2014).                                      e18897 (2020).
[28] Aron Culotta, “Towards detecting influenza epidemics by ana-       [44]   Albert Park and Mike Conway, “Tracking health related discus-
     lyzing twitter messages,” in Proceedings of the first workshop            sions on reddit for public health applications,” AMIA ... Annual
     on social media analytics (2010) pp. 115–122.                             Symposium proceedings. AMIA Symposium 2017, 1362–1371
[29] Vasileios Lampos and Nello Cristianini, “Tracking the flu pan-            (2018).
     demic by monitoring the social web,” in 2010 2nd international     [45]   Alex Lamb, Michael Paul, and Mark Dredze, “Investigating
     workshop on cognitive information processing (IEEE, 2010)                 twitter as a source for studying behavioral responses to epi-
     pp. 411–416.                                                              demics,” AAAI Fall Symposium - Technical Report (2012).
[30] Qian Zhang, Nicola Perra, Daniela Perrotta, Michele Tizzoni,       [46]   Nehama Lewis, “Information seeking and scanning,” in The
     Daniela Paolotti, and Alessandro Vespignani, “Forecasting                 International Encyclopedia of Media Effects (American Cancer
     seasonal influenza fusing digital indicators and a mechanis-              Society, 2017) pp. 1–10.
     tic disease model,” in Proceedings of the 26th international       [47]   Sylvie D. Lambert and Carmen G. Loiselle, “Health
     conference on world wide web (2017) pp. 311–319.                          information—seeking behavior,” Qualitative Health
[31] Munmun De Choudhury, Michael Gamon, Scott Counts, and                     Research 17, 1006–1019 (2007), pMID: 17928475,
     Eric Horvitz, “Predicting depression via social media,” in Sev-           https://doi.org/10.1177/1049732307305199.
     enth international AAAI conference on weblogs and social           [48]   D Walter, Merle Böhmer, Sabine Reiter, Gerard Krause, and
     media (2013).                                                             Ole Wichmann, “Risk perception and information-seeking be-
[32] Munmun De Choudhury, Scott Counts, and Eric Horvitz, “So-                 haviour during the 2009/10 influenza a(h1n1)pdm09 pandemic
     cial media as a measurement tool of depression in populations,”           in germany,” Euro surveillance : bulletin européen sur les mal-
11

       adies transmissibles = European communicable disease bulletin      [65] Wayne Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng
       17 (2012).                                                              Lim, Hongfei Yan, and Xiaoming Li, “Comparing twitter and
[49]   Natalie Lee-San Pang, “Crisis-based information seeking: mon-           traditional media using topic models,” in Advances in Infor-
       itoring versus blunting in the information seeking behaviour of         mation Retrieval, edited by Paul Clough, Colum Foley, Cathal
       working students during the southeast asian haze crisis,” Inf.          Gurrin, Gareth J. F. Jones, Wessel Kraaij, Hyowon Lee, and
       Res. 19 (2014).                                                         Vanessa Mudoch (Springer Berlin Heidelberg, Berlin, Heidel-
[50]   Branden B. Johnson, “Explaining americans’ responses                    berg, 2011) pp. 338–349.
       to dread epidemics: an illustration with ebola in late             [66] Qiming Diao, Jing Jiang, Feida Zhu, and Ee-Peng Lim, “Find-
       2014,” Journal of Risk Research 20, 1338–1357 (2017),                   ing bursty topics from microblogs,” in Proceedings of the 50th
       https://doi.org/10.1080/13669877.2016.1153507.                          Annual Meeting of the Association for Computational Linguis-
[51]   Tara Sell, Crystal Boddie, Emma Mcginty, Keshia Pollack,                tics: Long Papers - Volume 1, ACL ’12 (Association for Com-
       Katherine Smith, Thomas Burke, and Lainie Rutkow, “Me-                  putational Linguistics, USA, 2012) p. 536–544.
       dia messages and perception of risk for ebola virus infec-         [67] Daniel D Lee and H Sebastian Seung, “Learning the parts of
       tion, united states,” Emerging Infectious Diseases 23 (2017),           objects by non-negative matrix factorization,” Nature 401, 788
       10.3201/eid2301.160589.                                                 (1999).
[52]   Nicolò Gozzi, Daniela Perrotta, Daniela Paolotti, and Nicola      [68] Eran Bendavid, Bianca Mulaney, Neeraj Sood, Soleil Shah,
       Perra, “Towards a data-driven characterization of behavioral            Emilia Ling, Rebecca Bromley-Dulfano, Cara Lai, Zoe
       changes induced by the seasonal flu,” PLOS Computational                Weissberg, Rodrigo Saavedra-Walker, James Tedrow, Dona
       Biology 16, e1007879 (2020).                                            Tversky, Andrew Bogan, Thomas Kupiec, Daniel Eichner,
[53]   Toby Wise, Tomislav Zbozinek, Giorgia Michelini, Cindy Ha-              Ribhav Gupta, John Ioannidis, and Jay Bhattacharya,
       gan, and dean mobbs, “Changes in risk perception and protec-            “Covid-19 antibody seroprevalence in santa clara county,
       tive behavior during the first week of the covid-19 pandemic in         california,” medRxiv (2020), 10.1101/2020.04.14.20062463,
       the united states,” (2020).                                             https://www.medrxiv.org/content/early/2020/04/30/2020.04.14.20062463.full
[54]   Hirotogu Akaike, “Information theory and an extension of the       [69] Weirui Wang and Lee Ahern, “Acting on surprise: emotional
       maximum likelihood principle,” in Selected papers of hirotugu           response, multiple-channel information seeking and vaccina-
       akaike (Springer, 1998) pp. 199–213.                                    tion in the h1n1 flu epidemic,” Social Influence 10, 137–148
[55]   Yanqing Chen and Steven Skiena, “False-friend detection                 (2015), https://doi.org/10.1080/15534510.2015.1011227.
       and entity matching via unsupervised transliteration,” arXiv       [70] maeve duggan and a j s smith, “6% of online adults are reddit
       preprint arXiv:1611.06722 (2016).                                       users,” (2013).
[56]   “Python            geocoder,”           https://geocoder.          [71] Craig Finlay, “Age and gender in reddit commenting and suc-
       readthedocs.io, accessed: 2020-05-29.                                   cess,” Journal of Information Science Theory and Practice 2,
[57]   Yla Tausczik, Kate Faasse, James W. Pennebaker, and Keith J.            18–28 (2014).
       Petrie, “Public anxiety and information seeking following the      [72] Philipp Singer, Fabian Flöck, Clemens Meinhart, Elias Zeitfo-
       h1n1 outbreak: Blogs, newspaper articles, and wikipedia visits,”        gel, and Markus Strohmaier, “Evolution of reddit: From the
       Health Communication 27, 179–185 (2012), pMID: 21827326,                front page of the internet to a self-referential community?” in
       https://doi.org/10.1080/10410236.2011.571759.                           Proceedings of the 23rd International Conference on World
[58]   Cynthia Chew and Gunther Eysenbach, “Pandemics in the age               Wide Web, WWW ’14 Companion (Association for Computing
       of twitter: Content analysis of tweets during the 2009 h1n1             Machinery, New York, NY, USA, 2014) p. 517–522.
       outbreak,” PLOS ONE 5, 1–13 (2010).                                [73] Alexander JAM van Deursen and Jan AGM van
[59]   Dasha Pruss, Yoshinari Fujinuma, Ashlynn R. Daughton,                   Dijk, “The digital divide shifts to differences in us-
       Michael J. Paul, Brad Arnot, Danielle Albers Szafir, and Jordan         age,” New Media & Society 16, 507–526 (2014),
       Boyd-Graber, “Zika discourse in the americas: A multilingual            https://doi.org/10.1177/1461444813487959.
       topic analysis of twitter,” PLOS ONE 14, 1–23 (2019).              [74] Alexander van Deursen and Jan van Dijk, “Internet skills and
[60]   Michael C. Smith and David A. Broniatowski, “Towards real-              the digital divide,” New Media & Society 13, 893–911 (2011),
       time measurement of public epidemic awareness monitoring                https://doi.org/10.1177/1461444810386774.
       influenza awareness through twitter michael c,” (2015).            [75] Laura Robinson, Shelia R. Cotten, Hiroshi Ono, An-
[61]   Liesbeth Mollema, Irene Anhai Harmsen, Emma Broekhuizen,                abel Quan-Haase, Gustavo Mesch, Wenhong Chen, Jeremy
       Rutger Clijnk, Hester De Melker, Theo Paulussen, Gerjo Kok,             Schulz, Timothy M. Hale,               and Michael J. Stern,
       Robert Ruiter, and Enny Das, “Disease detection or public               “Digital inequalities and why they matter,” Informa-
       opinion reflection? content analysis of tweets, other social            tion, Communication & Society 18, 569–582 (2015),
       media, and online newspapers during the measles outbreak in             https://doi.org/10.1080/1369118X.2015.1012532.
       the netherlands in 2013,” J Med Internet Res 17, e128 (2015).      [76] Alexander J.A.M. [van Deursen], Jan A.G.M. [van Dijk], and
[62]   Anders A F Wahlberg and Lennart Sjoberg, “Risk perception               Oscar Peters, “Rethinking internet skills: The contribution of
       and the media,” Journal of Risk Research 3, 31–50 (2000),               gender, age, education, internet experience, and hours online to
       https://doi.org/10.1080/136698700376699.                                medium- and content-related internet skills,” Poetics 39, 125 –
[63]   Celine Klemm, Enny Das, and Tilo Hartmann, “Swine flu                   144 (2011).
       and hype: A systematic review of media dramatization of the        [77] Albert Park and Mike Conway, “Towards tracking opium re-
       h1n1 influenza pandemic,” Journal of Risk Research 19, 1–20             lated discussions in social media.” Online journal of public
       (2014).                                                                 health informatics 9 (2017).
[64]   Jean Tchuenche, Nothabo Dube, Claver Bhunu, Robert Smith,          [78] Jeanine PD Guidry, Yan Jin, Caroline A Orr, Marcus Mess-
       and Chris Bauch, “The impact of media coverage on the trans-            ner, and Shana Meganck, “Ebola on instagram and twitter:
       mission dynamics of human influenza,” BMC public health 11              How health organizations address the health crisis in their so-
       Suppl 1, S5 (2011).                                                     cial media engagement,” Public relations review 43, 477–486
                                                                               (2017).
You can also read