Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia Nicolò Gozzi,1 Michele Tizzani,2 Michele Starnini,2 Fabio Ciulla,3 Daniela Paolotti,2 André Panisson,2, ∗ and Nicola Perra1, † 1 Networks and Urban Systems Centre, University of Greenwich, London, UK 2 ISI Foundation, Turin, Italy 3 Quid Inc., San Francisco, USA (Dated: June 12, 2020) The exposure and consumption of information during epidemic outbreaks may alter risk perception, trigger behavioural changes, and ultimately affect the evolution of the disease. It is thus of the uttermost importance to map information dissemination by mainstream media outlets and public response. However, our understanding of this exposure-response dynamic during COVID-19 pandemic is still limited. In this paper, we provide a characterization of media coverage and online collective attention to COVID-19 pandemic in four countries: Italy, United Kingdom, United States, and Canada. For this purpose, we collect an heterogeneous dataset including 227, 768 online news articles and 13, 448 Youtube videos published by mainstream media, 107, 898 users posts and 3, 829, 309 comments on the social media platform Reddit, and 278, 456, 892 views to COVID-19 related Wikipedia pages. Our results show that public attention, quantified as users activity on Reddit and active searches arXiv:2006.06446v1 [cs.SI] 8 Jun 2020 on Wikipedia pages, is mainly driven by media coverage and declines rapidly, while news exposure and COVID- 19 incidence remain high. Furthermore, by using an unsupervised, dynamical topic modeling approach, we show that while the attention dedicated to different topics by media and online users are in good accordance, interesting deviations emerge in their temporal patterns. Overall, our findings offer an additional key to interpret public perception/response to the current global health emergency and raise questions about the effects of attention saturation on collective awareness, risk perception and thus on tendencies towards behavioural changes. I. INTRODUCTION the spreading was isolated in far away locations. For example, the hashtag #MilanoNonSiFerma (Milan does not stop) was “In the next influenza pandemic, be it now or in the future, coined to invite citizens in Milan to go out and live normally. be the virus mild or virulent, the single most important weapon Free aperitifs were offered in Venice. In hindsight, of course, is against the disease will be a vaccine. The second most impor- easy to criticise the initial response in Italy. In fact, the country tant will be communication” [1]. This evocative sentence was has been one of the first to experience rapid growth of hospi- written in May 2009 by John M. Barry, in the early phases of talizations [7]. However, the Mayor of London, twelve days what soon after become the H1N1 2009 pandemic. In his essay, before the national lockdown, and few days after the extension Barry summarised the mishandling of the deadly 1918 Spanish of the cordon sanitaire to the entire country in Italy, affirmed flu highlighting the importance of precise, effective and honest via his official Facebook page “we should carry on doing what information in the onset of health crises. we’ve been doing” [8]. More in general, in several western Eleven years later we find ourselves dealing with another countries, the news coming from others reporting worrying pandemic. The cause is not a novel strain of influenza, but epidemic outbreaks were not considered as relevant for the these words are, unfortunately, still extremely relevant. In fact, internal situation. This initial phase aimed at conveying low as the SARS-CoV-2 sweeps the world and the vaccine is just local risk and boosting confidence about national safety has a far vision of hope, the most important weapons to reduce been repeated, at different times, across countries. A series the burden of the disease are non-pharmaceutical interven- of surveys conducted in late February provide a glimpse of tions [2, 3]. Social distancing became paramount, gatherings the possible effects of these approaches. They report that cit- have been cancelled, mobility within and across countries have izens of several European countries, despite the grim news been dramatically reduced. While such measures have been coming from Asia, were overly optimistic about the health enforced to different extents across nations, they all rely on emergency placing their risk of infections to be 1% or less [9]. compliance. Their effectiveness is linked to risk and suscep- As happened in 1918, the countries that reacted earlier rather tibility perception [4], thus the information that citizens are than later were able to control the virus with significant less exposed to is fundamental. victims [10–14]. History repeats itself and we seem not be able to learn from History repeats itself, but the context often is radically dif- our past mistakes. As happened in 1918, despite early evi- ferent. In 1918, news circulated slowly via news papers, con- dences from China [5, 6], the virus was first equated, by many, trolled by editorial choices, and of course words of mouth. In to the normal seasonal flu. As happened in 1918, many na- 2009, we witnessed the first pandemic in the social media era. tional and regional governments organised campaigns aimed Newspapers and TV were still very important source of infor- at boosting social activities (and thus local economies) actively mation, but Twitter, Facebook, YouTube, Wikipedia started to trying to convince people that their cities were safe and that become relevant for decentralized news consumption, boost- ing peer discussions, and misinformation spread. Today these platforms and websites are far more popular, integral part of so- ∗ andre.panisson@isi.it ciety and instrumental pieces of the national and international † n.perra@greenwich.ac.uk news circulations. Together with traditional news media, they
2 are the principal sources of information for the public. As such, towards different topics are in good accordance, interesting they are fundamental drivers of people perception, opinions, deviations emerge in their temporal patterns. Also, we high- and thus behaviours. This is particularly relevant for health light that, at the end of our observation period, general interest issues. For example, about 60% of adults in the USA consulted grows towards topics about the resumption of activities after online sources to gather health information [15]. lockdown, the search for a vaccine against Sars-Cov-2, ac- With respect to past epidemics and pandemics, studies on quired immunity and antibodies tests. traditional news coverage of the 2009 H1N1 pandemic high- Overall, the research presented here offers insights to inter- lighted the importance of framing and its effect on people’s pret public perception/response to the current global health perception, behaviours (such as vaccination intent), stigmatisa- emergency, raises interrogatives about the effects of attention tion of cultures at the epicentre of the outbreak, and how these saturation on collective awareness, risk perception and thus on factors differ across countries/cultures [16–21]. During Zika tendencies towards behavioural changes. epidemic in 2016, public attention was synchronised across US states, driven by news coverage about the outbreak and inde- pendently of the real local risk of infection [22]. With respect II. IMPACT OF MEDIA COVERAGE AND EPIDEMIC to COVID-19 pandemic itself, a recent study clearly shows PROGRESSION ON COLLECTIVE ATTENTION how Google searches for “coronavirus” in the USA spiked sig- nificantly right after the announcement of the first confirmed How is collective attention shaped by news media coverage case in each state [23]. Several studies based on Twitter data and epidemic progression? To tackle this important question, also highlight how misinformation and low quality information we collected an heterogeneous dataset that includes COVID- about COVID-19, although overall limited, spread before the 19 related news articles and Youtube videos published online local outbreak and rapidly took off once the local epidemic by mainstream information media, relevant posts and relative started. In the current landscape, this has the potential to boost discussion of geolocalized Reddit users, and country-specific irrational, unscientific, and dangerous behaviours [24–26]. views to Wikipedia pages related to COVID-19 for Italy, United On the other hand, despite some important limitations [27], Kingdom, United States and Canada (see subsections A, B, modern media has become a key data source to observe C of Methods and Materials for details). This choice aims to and monitor health. In fact, posts on Twitter [28–33], Face- provide an overview of media coverage and a proxy of public book [34], and Reddit [35, 36], page views in Wikipedia [37, attention and response. On the one hand, the study of news arti- 38] and searches on Google [39, 40] have been used to study, cles and videos allows us to estimate the exposure of the public nowcast and predict the spreading of infectious diseases as well to COVID-19 pandemic in traditional news media. On the as the prevalence of noncommunicable illnesses. Therefore, in other hand, the study of users discussions and response on so- the current full-fledged digital society, information is not only cial media (through Reddit) and information seeking (through key to inform people’s behaviour but can be used to develop Wikipedia page views) allows us to quantify the reaction of in- an unprecedented understanding of such behaviours, as well as dividuals to both the COVID-19 pandemic and news exposure. of the phenomena driving them. As mentioned in the introduction, previous studies showed the The context where COVID-19 is unfolding is thus very het- usefulness of social media, internet use and search trends to erogeneous and complex. Traditional and social media are analyze health-related information streams and monitor public integral parts of our perception and opinions, have the potential reaction to infectious diseases [41–45]. to trigger behaviour change and thus influence the pandemic Hence, we consider volume of comments of geolocalized spreading. Such complex landscape must be characterized in users on the subreddit /r/Coronavirus1 to explore the public order to understand the public attention and response to media discussion in reaction to media covering the epidemic in the coverage. Here, we tackle this challenge by assembling an het- various countries, while we consider the number of views of erogeneous dataset which includes 227, 768 news and 13, 448 relevant Wikipedia pages about COVID-19 pandemic to quan- YouTube videos published by traditional media, 278, 456, 892 tify users interest. It is important to stress how Reddit and views of topical Wikipedia pages, 107, 898 submissions and Wikipedia provide different aspects of online users behaviour 3, 829, 309 comments from 417, 541 distinct users on Red- and collective response. In fact, while Reddit posts can be dit, as well as epidemic data in four different countries: Italy, regarded as a general indicator of the online discussion sur- United Kingdom, United States, and Canada. rounding the global health emergency, the number of access to COVID-19 related Wikipedia pages is a proxy of health in- First, we explore how media coverage and epidemic pro- formation seeking behaviour (HISB). HISB is the act through gression influence public attention and response. To achieve which individuals retrieve and acquire new knowledge about this, we analyze news volume and COVID-19 incidence with a specific topic related to health [46, 47], and it is likely to be respect to Wikipedia page views volume and Reddit comments. triggered on a population scale by a disrupting event, such as Our results show that public attention and response are mostly the threaten of a previously unknown disease [48, 49]. driven by media coverage rather than disease spreading. Fur- thermore, we observe typical saturation and memory effects of public collective attention. Moreover, using an unsuper- vised topic modeling approach, we explore the different topics 1 Subreddits are user-created areas of interest where discussions are organized framed in traditional media and in Reddit discussions. We in Reddit. /r/Coronavirus is the subreddit related to discussions surrounding show that, while attentions of news outlets and online users COVID-19 pandemic
3 Figure 1: Normalized weekly volume of news articles and Youtube videos (news)), Reddit comments (reddit), Wikipedia views (wikipedia) related to COVID-19 pandemic and COVID-19 incidence (covid inc.) in different countries. Our analysis starts by comparing, in Figure 1, the weekly volume of news and videos published on Youtube, Wikipedia Figure 2: Country specific Pearson correlation coefficients between: views, and Reddit comments of geolocalized users in compari- 1) news coverage and global/domestic COVID-19 incidence, volumes son with the weekly COVID-19 incidence in the four countries of Reddit comments and Wikipedia views; 2) global COVID-19 in- considered. It can be seen how, as COVID-19 spreads, both cidence and volumes of Reddit comments and Wikipedia views; 3) media coverage and public interest grow in time. However, domestic COVID-19 incidence and volumes of Reddit comments and public attention, quantified by the number of Reddit comments Wikipedia views. and Wikipedia views, sharply decreases after reaching a peak, despite the volume of news and COVID-19 incidence remain- ing high. Furthermore, the peak in public attention consis- while media coverage is generally well synchronized with the tently anticipates the maximum media exposure and maximum global COVID-19 incidence, the media attention gradually COVID-19 incidence. shifts towards the internal evolution of the pandemic as soon as The correlation between media coverage, public attention, domestic outbreaks erupt. Arguably, this may have played an and the epidemic progression is quantified more in details in important role in individual risk perception. We can speculate Figure 2. The plot shows that news coverage of each country that re-framing the emergency within a national dimension had is strongly correlated with COVID-19 incidence (both global the potential to amplify the perceived susceptibility of indi- and domestic), and slightly less with the volume of Reddit viduals [50, 51] and thus increase the adoption of behavioural comments and Wikipedia views, which, in turn, are much changes [4, 52]. Indeed, previous studies showed how at the less correlated with COVID-19 incidence (both global and do- beginning of February 2020 people were overly optimistic re- mestic). This holds for all countries under consideration and garding the risks associated with the new virus circulating in highlights how the disease spreading triggers media coverage, Asia, and how their perception sharply changed after first cases and how the public response is more likely driven by such news were confirmed in their countries [9, 53]. exposure in each country rather than COVID-19 progression. To explore more systematically the relationship between Beyond these observations, it is interesting to notice from Fig- media coverage, public attention and epidemic progression, we ure 2 that Italy is the only country where news volume shows consider a linear regression model to nowcast, separately for higher correlation with domestic rather than global incidence. each country, collective public attention (quantified with the This suggests that Italian media coverage follows more closely number of comments by geolocalized Reddit users or visits to the internal evolution rather than the global one, at odds with relevant Wikipedia pages) given the volume of media cover- respect to other countries. This is probably due to Italy being age or the COVID-19 incidence as independent variables. We the location of the first COVID-19 outbreak outside Asia. include also “memory effects” in the public attention by con- This observation is supported by Figure 3, showing the ci- sidering an exponential decaying term in the news time series tation share of Italian locations by Italian news media, before [22] (see subsection D of Methods and Materials for details). and after the first COVID-19 death was confirmed in Italy We compare the three models, where the independent vari- on 2020/02/20. After this date, Italian locations represent able(s) are the domestic incidence, the news volume, the news about 74% of all places cited by Italian media (in our dataset), volume plus a memory term, by using the Akaike Information with an increase of 45% with respect to the same statistics Criterion (AIC) [54] and coefficient of determination (R2 ). We calculated before. Similar effects, though generally less in- found that the model considering only COVID-19 incidence tense, can be observed also in the other countries. Therefore, has much less predictive power than the ones considering me-
4 news newsM EM R2 reddit wikipedia reddit wikipedia reddit wikipedia Italy 0.87 0.43 -0.41 -0.15 0.82 0.73 UK 0.95 0.99 -0.44 -0.47 0.82 0.85 US 1.07 0.87 -0.48 -0.44 0.88 0.82 Canada 1.12 1.06 -0.40 -0.45 0.90 0.82 Table I: Coefficient estimates for news plus memory effects linear regression model and related R2 . It is worth underlying that, despite the simplicity of this approach, we obtain relatively high values of R2 . III. DYNAMICS OF CONTENT PRODUCTION AND CONSUMPTION While collective attention and media coverage are well cor- Figure 3: Share of citations of China versus home country locations related in terms of volume, the content and topics discussed by Italian/UK/US/Canadian news outlets before and after first COVID- by media and consumed by online users may not be as syn- 19 death occurred in different countries considered. Geographic chronized [65, 66]. To shed light on this issue, we adopt an locations are extracted from text using [55, 56] unsupervised topic modeling approach to extract prevalent top- . ics in the news articles mentioned and discussed on Reddit. Often, indeed, users on Reddit post a submission containing a news article, and discussion unfolds in comments under such submission. Differently from the first part and to provide a comprehensive overview of the topics discussed, here we do dia coverage (see Table II in Methods and Material). This not take into account any geographical context. Nonetheless, in enforces the idea that collective attention is mainly driven by the Supplementary Information we provide some insights also media coverage rather than COVID-19 incidence. In addition, on the specific topics discussed by users in different countries. we found that including memory effects improves significantly We characterize the main topics discussed on Reddit by con- the model performance. Not surprisingly, the coefficients of sidering all submissions that include a news article in English. the “memory effects” term reported in Table I are negative We then apply a topic modelling approach on the content of for all countries. This implies that public attention actually this news article set. Specifically, we extract topics by means saturates in response to news exposure and gives us the chance of non-Negative Matrix Factorization (NMF) [67], a popular to quantify the rate at which this phenomenon happens. method for this kind of tasks (see subsection E of Methods and Materials for details). In this way, we extract the n = 64 The results presented so far are in very good accordance with most relevant topics in the news shared on Reddit. As a second findings obtained in previous contexts related to epidemics and step, we apply the model trained on the Reddit news to the set pandemics. Indeed, a similar media-driven spiky unfolding of of articles published by mainstream media. That is, we char- public attention, measured through the information seeking and acterize the news published by media in terms of the topics public discussions of online users, has been observed during discussed on Reddit. This choice allows us to directly compare the 2009 H1N1 influenza pandemic [57, 58], the 2016 Zika the topics covered by media with the public discussion around outbreak [59], the seasonal flu [60] and during more localized such news exposure. A complete list of the 64 topics extracted public health emergency such as the 2013 measles outbreak with the most frequent words is provided in the Supplementary in Netherlands [61]. Our findings confirm the central role of Information. media, showing how media exposure is capable of shaping and We consider the number of articles published on a certain driving collective attention during a national and global health topic as a proxy of general interest of traditional media towards emergency. it, while we measure the collective interest of Reddit users by the number of comments under the news articles on a specific Media exposure is an important factor that can influence topic. Figure 4 shows an overview of the topics extracted and individual risk perception as well [62–64]. The timing and a comparison of the interest of media and Reddit users. We framing of the information disseminated by media can actually find a diverse and heterogeneous set of topics. Among others, modulate the attention and ultimately the behaviour of individ- we recognize topics about the global spreading of the virus uals [2]. This becomes an even greater concern in a context (Outbreaks, WHO, CDC), COVID-19 symptoms, treatment, where the most effective strategy to fight the spreading are hospitals and care facilities (Symptoms, Medical Treatment, containment measures based on individuals’ behaviour. For Medical Staff, Care Facilities), the economic impact of the pan- this reason, in the next section we characterize media coverage demic and responses from the governments to the upcoming and online users response more specifically in terms of content crisis (Economy, Money), different societal aspects (Sports, produced and consumed. Religious Services, Education), and also the possible interven-
5 Figure 5: Scatter plot with the 64 topics extracted via NMF. X-axis (Y-axis) coordinate indicates when the topic achieved 50% of its relevance in news outlets (Reddit) during our analysis interval. of the population in the Santa Clara County was infected re- spect to what originally thought [68]. Since the study suggests a lower mortality rate, the preprint has been quickly leveraged to support protest against lockdowns2 , while substantial flaws have been detected in the scientific methodology of the paper 3 . The topics overview presented so far does not take into ac- count any temporal dynamics of interest. However, topics showing a similar overall statistics may present a mismatch in temporal patterns. Hence, in the following, we take into account the temporal evolution of interest towards different topics. In Figure 5 we represent each topic as a single point: its Figure 4: Difference in interest percentage share of different topics x-coordinate (y-coordinate) indicates when such topic reached by traditional media and Reddit users. For example, +2% on the 50% of its total relevance in news outlets (on Reddit) dur- x-axis indicates that traditional media dedicates proportionally 2% ing the analysis interval (see subsection E of Methods and more attention to that specific topic with respect to Reddit users. Materials for a formal definition of the relevance of a topic). Therefore, topics at the bottom left became relevant very early in the public discussion. Among these, we recognize themes tions to mitigate the spreading of the virus (Face Masks, Social centred on early COVID-19 outbreaks (i.e., Chinese, Japanese, Distancing, Tests, Vaccine). Iranian and Italian outbreaks), the events related to cruise ships, Overall, the attention of traditional media and Reddit users specific countries (i.e., Israel, Singapore and Malaysia), and towards different topics are in good accordance. Indeed, in also topics regarding (early) health issues such as Symptoms, Figure 4 we represent the difference between interest share Confirmed Cases and the CDC. towards different topics in media and Reddit submissions. That On the contrary, topics in the top right became relevant is, we compute the percentage share of attention dedicated by toward the end of the analysis interval (early May). Reason- news outlets and Reddit users to each topic, and we subtract ably, we find here topics about the resumption of activities these two quantities. We observe a maximum absolute mis- after lockdown (i.e. Reopening), the feasibility and timing match in interest share of 2.61%. Nonetheless, we observe that of a possible vaccine against Sars-Cov-2 (i.e. Vaccine), and Reddit users are slightly more interested to topics regarding discussions regarding acquired immunity and antibodies tests health (Symptoms, Medical Treatment), non-pharmaceutical (i.e. Immunity). In-between, we find all other topics clustered interventions and personal protective equipment (Social Dis- tancing, Face Masks), studies and information on the epidemic (Research, Surveys, Santa Clara Study, CDC), and also to spe- 2 https://www.nytimes.com/2020/05/14/opinion/coronavirus-research- cific public figures such as Anthony Fauci. Interestingly, the misinformation.html 3 Santa Clara Study topic refers to the discussion about a contro- https://www.theatlantic.com/health/archive/2020/04/pandemic-confusing- versial scientific paper suggesting that a much higher fraction uncertainty/610819/
6 around end of March and mid-April 2020, the period when the characterized the exposure of individuals to COVID-19 pan- general discussion surrounding COVID-19 pandemic aroused demic by considering only news articles and Youtube videos sharply, as also shown in Figure 1. published online by major news outlets. However, individuals Note that the diagonal (plotted as a dashed line) in Fig- are also exposed to relevant information through other chan- ure 5 separates topics according to their temporal evolution. nels, with television on top of these [69]. Second, a 2013 Pew Above (below) the diagonal, we find topics whose interest on Internet Study found that Reddit users are more likely young Reddit grows slowly (quickly) with respect to the media cov- males [70], showing that around 15% of male internet users erage. Therefore, above the diagonal the interest of Reddit aged between 18 and 29 declare to use Reddit, compared to the users is mainly triggered by media exposure, while below it 5% of women in the same age range and to the 8% of men aged the interest grows faster and declines rapidly despite sustained between 30 and 49. Similarly, informal surveys proposed to media exposure. While the top-left and bottom-right regions users showed that most of respondents were males in their “late are empty, indicating that, as a first approximation, temporal teens to mid-20s”, and that female users were “very much in patterns of attention by traditional media and Reddit users the minority” [71]. Furthermore, Reddit is much more popular are well-synchronized, interesting deviations from the diago- among urban and suburban residents rather than individuals nal are observable. For example, above the diagonal one can living in rural areas [70]. Besides socio-demographic biases, find mainly topics related to various outbreaks, economics and other works suggested also that Reddit has become more and politics, for which the interests on Reddit follows the media more a self-referential community, reinforcing the tendency coverage. Below the diagonal, we observe topics more related to focus on its own contents rather than external sources [72]. to everyday life, such as Schools, Medical Staff, Care Facilities, Thus, perceptions, interests, and behaviours of Reddit users and Lockdown, for which the attention on Reddit accelerates may differ from those of the general population. A similar with respect to media coverage, and then declines rapidly. Note argument may be raised for Wikipedia searches. Indeed, the that our view of topics discussed on Reddit is limited, since we usage of Internet, especially for information seeking purposes, only consider topics from news articles shared in submissions can vary across people with different socio-demographic back- and do not explicitly take into account content expressed in grounds [73–76]. comments. This ensures a proper comparison with topics ex- Finally, our view on online users reaction is partial. Indeed, tracted from news published and explains the absence of points we do not consider other popular digital data sources such as, in the bottom right corner of Figure 5. for example, Twitter. The reason behind this choice is twofold. First, many studies already characterized public response dur- ing the current and past health emergencies through the lens IV. CONCLUSIONS of Twitter [25, 43, 45, 58, 59, 77–80]. Second, several stud- ies have reported high prevalence of bots as drivers of low In this work, we characterized the response of online users quality information and discussions on COVID-19 on this plat- to both media coverage and COVID-19 pandemic progression. form [24, 25, 81–83]. Thus, careful and challenging extra steps As a first step, we focused on the impact of media coverage on would be necessary to isolate, identify, and distinguish organic collective attention in different countries, characterized as vol- discussions/reactions possibly originated from traditional me- umes of country-specific Wikipedia pages views and comments dia from those sparked by social bots. We leave this for future of geolocalized Reddit users. We showed that collective atten- work. tion was mainly driven by media coverage rather than epidemic In conclusion, our work offers further insights to interpret progression, rapidly saturated, and decreased despite media public response to the current global health emergency and coverage and COVID-19 incidence remaining high. This trend raises questions about possible undesired effects of commu- is very similar to that observed during other outbreaks [57–61]. nication. On one hand, our results confirm the pivotal role Also, we showed how media coverage sharply shifted to the of media during health emergencies, showing how collective domestic situation as soon as the first death was confirmed in attention is mainly driven by media coverage. Therefore, since the home country, discussing the implications for re-shaping people are highly reactive to the news they are exposed to, in individuals perception of risk [9, 53]. the beginning of an outbreak, the quality and type of informa- As a second step, we focused on the dynamics of content tion provided might have critical effects on risk perception, production and consumption. We modeled topics published behaviours, and ultimately on the unfolding of the disease. On in mainstream media and discussed on Reddit, showing that the other hand, however, we found that collective online atten- Reddit users were generally more interested in health, data tion saturates and declines rapidly despite media exposure and regarding the new disease, and interventions needed to halt disease circulation remaining high. Attention saturation has the spreading with respect to media exposure. By taking into the potential to affect collective awareness, perceived risk and account the dynamics of topics extracted, we show that, while ultimately propensity towards virtuous individual behavioural their temporal patterns are generally synchronized, the public changes aimed at mitigating the spreading. Furthermore, espe- attention for topics related to politics and economics is mainly cially in case of unknown viruses, attention saturation might triggered by media exposure, while the interests for topics exacerbate the spreading of low quality information, which is more related to daily life accelerates on Reddit with respect to likely to spread in the early phases of the outbreak when the media coverage. characteristics of the disease are uncertain. Future works are Of course, our research comes with limitations. First, we needed to characterize the actual effects of attention saturation
7 on human perceptions during a global health emergency. Canadian channels. Our findings suggest that public health authorities should consider to reinforce specific communication channels, such It is important to underline that, while there is a good overlap as social media platforms, in order to compensate the (natu- between the sources of news articles and videos, some do not ral) phenomenon of attention saturation. Indeed, these chan- match. This is due to the fact that not all news organizations nels have the potential to create a more durable engagement run a YouTube channel and others do not produce traditional with people, through a continuous loop of direct interactions. articles. In the Supplementary Information, we provide a com- Currently, we see public health authorities issuing regularly plete list of news outlets and Youtube channels considered. declarations on social media. However, the CDC didn’t even have a Twitter account in 2009 during H1N1 pandemic (the account was created in May 2010). While this is just an exam- B. Reddit data set ple, it underlines how we are relatively new to communicating such global health emergencies through social medias. There- Reddit is a social content aggregation website where users fore, there is great need to further reinforce and engage people can post, comment and vote content. It is structured in sub- through these channels. Alongside, public health authorities communities (i.e. subreddits), centered around a variety of should consider to strengthen additional communication chan- topics. nels. An example can be represented by participatory surveil- Reddit has already proven to be suitable for a variety of lance platforms all over the world such as Influenzanet, Flu research purposes, ranging from the study of user engagement Near You and FluTracking [84–86], which have the potential of and interactions between highly related communities [90, 91] delivering in-depth targeted information to individuals during to post-election political analyses [92]. Also, it has been used public health emergencies, to promote the exchange of infor- to study the impact of linguistic differences in news titles [93] mation between people and public health authorities, with the and to explore recent web-related issues such as hate speech potential to enhance the level of engagement in the community [94] or cyberbullying [95] as well as health related issues like [87]. mental illness [96], also providing insights about the opioid epidemics [77, 97]. We used the Reddit API to collect all submissions and com- V. METHODS AND MATERIAL ments published in Reddit under the subreddit /r/Coronavirus from 15/02/2020 to 15/05/2020. After data cleanup by re- In this section we provide general information about the data moving entries deleted by authors and moderators, we keep sets collected and the methods used. only submissions with score > 1 to avoid spam. We remove comments with less than 10 characters and with more than 3 duplicates, to avoid using automatic messages from modera- A. News Articles and Videos tion. Final data contains 107, 898 submissions and 3, 829, 309 comments from 417, 541 distinct users. For the submissions, we then selected entries with links to We collect news articles using News API, a service that English news outlets. The content of the urls was extracted allows to download articles published online in a variety of using the available implementation4 of the method described countries and languages [88]. For each of the country con- in [98], resulting in 66, 575 valid documents. sidered, we download all relevant articles published online Reddit does not provide any explicit information about users’ by selected sources in the period 2020/02/07 - 2020/05/15. location, therefore we use self reporting via regular expression We select “relevant” articles considering those citing one of to assign a location to users. Reddit users often declare geo- the following keywords: ’coronavirus’, ’covid19’, ’covid-19’, graphical information about themselves in submissions or com- ’ncov-19’, ’sars-cov-2’. Note that for each article we have ment texts. We used the same approach as described in [97], access to title, description and a preview of the whole text. that found the use of regular expressions as reliable, resulting In total, our dataset consists in 227, 768 news: 71, 461 pub- in high correlation with census data in the US, although we lished by Italian, 63, 799 by UK, 82, 630 by US, and 9, 878 by acknowledge a potential higher bias at country level due to Canadian media. heterogeneities in Reddit population coverage and users demo- Additionally, we collect all videos published on YouTube graphics. We selected all texts containing expressions such as by major news organizations, in the four countries under inves- ‘I am from’ or ‘I live in’ and extracted candidate expressions tigation, via their official YouTube channels using the official from the text that follows the expression, to identify which API [89]. In doing so, we download title and description of ones represented country locations. By removing inconsistent all videos and select as relevant those that mention one of the self reporting we were able to assign a country to 789, 909 following keywords: ‘coronavirus’, ‘virus’, ‘covid’, ‘covid19’, distinct users, from which 41, 465 have written at least one ‘sars’, ‘sars-cov-2’, ‘sarscov2’. comment in the subreddit r/Coronavirus (13, 811 from USA, The reach of each channel (measured by number of 6, 870 from Canada, 3, 932 from UK and 445 from Italy). subscribers) varies quite drastically from more than 9 million for CNN (USA) to about 12 thousand for Ansa (Italy). In total, the YouTube dataset consist of 13, 448 videos: 3, 325 4 https://github.com/jcpeterson/openwebtext by Italian, 3, 525 by British, 6, 288 by American, ans 310 by
8 C. Wikipedia data set incidence news news + newsM EM reddit wikipedia reddit wikipedia reddit wikipedia Italy 3.71 -98.73 -29.68 -121.3 -76.44 -138.49 Wikipedia has become a popular digital data source to study UK 64.24 65.85 -16.03 -20.46 -51.97 -65.24 health information seeking behaviour [57, 99], and to monitor US 86.02 61.85 -11.76 -14.67 -48.39 -44.93 and forecast the spreading of infectious diseases [100, 101]. Canada 84.53 68.65 -28.53 -13.99 -69.26 -53.08 Here, we use the Wikimedia API [102] to collect the number of visits per day of Wikipedia articles and the total monthly ac- Table II: Akaike Information Criterion for the three linear regression cesses to a specific project from each country. We consider the models applied to predict Reddit comments and Wikipedia visits. language as indicative of a specific country, suggesting the rel- As a practical rule, a model i is preferred to model j if AIC(j) − evant projects for our analysis to be in English and Italian, i.e. AIC(i) ≥ 3 en.wikipedia and it.wikipedia respectively. We choose the arti- cles directly related to COVID-19 and the ones in the ’see also’ section of each page at the time of the analysis, 2020/02/07 public collective attention. Then, the models considered are: - 2020/05/15, including country-specific articles (see Supple- model I) yt = α1 incidencet + ut mentary Information for full list of web pages considered). model II) yt = α1 newst + ut (2) Except for the Italian, where the language is highly indica- tive of the location, the number of the access to English pages model III) yt = α1 newst + α2 newsM EMt + ut are almost evenly distributed among English-speaking coun- Where yt can be either the volume of Reddit comments of tries. To normalize the signal related to each country we weight geolocalized users or country specific Wikipedia visits, and the number of daily accesses to a single article from a specific ut is the error term. In Table II we report the results of the project p, Sp (d), with the total number of monthly accesses three regressions in terms of Akaike Information Criterion from a country c to the related Wikipedia project Tpc (d) such (AIC) [54]. We observe that the model considering both news that the daily page views from a given Wikipedia project and volume and memory effects is generally the better choice, country is while the model considering COVID-19 incidence only is the Sp (d)Tpc (d) worst. c ya,p (d) = P c , c Tp (d) E. Topic Modeling where the denominator is the total number of views of the Wikipedia specific project. The total volume of views at day d Topic modeling has emerged as one of the most effective from a country c is then given by the sum over all the articles methods for classifying, clustering, and retrieving textual data, a and projects p, namely and has been the object of extensive investigation in the lit- X erature. Many topic analysis frameworks are extensions of y c (d) = c ya,p (d). well known algorithms, considered as state-of-the-art for topic a,p modeling. Latent Dirichlet Allocation (LDA) [103] is the ref- erence for probabilistic topic modeling. Nonnegative matrix factorization (NMF) [67] is the counterpart of LDA for the D. Linear regression approach to model collective attention matrix factorization community. Although there are many approaches to temporal and hier- Above we showed how media coverage, COVID-19 inci- archical topic modeling [104–106], we choose to apply NMF dence, and public attention are correlated even across four to the dataset, and then build time-varying intensities for each different countries. To move a step forward in this analy- topic using the articles publication date. Starting from a dataset sis, we considered a linear regression model that predicts for D containing the news articles shared in Reddit, we extract each country the public response given the news exposure. words and phrases with the methodology described in [107], To include “memory effects” in the public response to media discarding terms with frequency below 10, to form a vocabu- coverage, we consider also a modified version of this simple lary V with around 60k terms. Each document is then repre- model, in which we weight cumulative news articles volume sented as a vector of term counts, in a bag-of-words approach. time series with an exponential decaying term [22]. Formally, We apply TF-IDF normalization [108] and extract a total of we define the new variable: K = 64 topics through NMF: 2 τ min kX − WHkF , (3) X ∆t W,H newsM EM = e− τ news(t − ∆t) (1) ∆t=1 where kk2F is the Frobenius norm and X ∈ R|D|×|V| is the ma- trix resulting form TF-IDF normalization, subject to the con- Where τ is a free parameter that sets the memory time scale and straint that the values in W ∈ R|D|×K and H ∈ RK×|V| must is tuned with cross-validation (more details in the Supplemen- be nonnegative. The nonnegative factorization is achieved us- tary Information). These two models are compared to a linear ing the projected gradient method with sparseness constraints, regression that considers only COVID-19 incidence to predict as described in [109, 110]. The matrix H is then used as a
9 transformation basis for other datasets, e.g. with a new matrix Fondazione Cassa di Risparmio di Torino (Fondazione CRT). X e we fix H and calculate a new W e according to Eq. 3. M.T. acknowledges support from EPIPOSE -“Epidemic in- For each topic k we build a time series sk for each dataset telligence to minimize COVID-19’s public health, societal (t) D, where sk is the strength of topic k at time t. For the news and economical impact” H2020-SC1-PHE-CORONAVIRUS- (t) 2020 call. M.S/ and A.P. acknowledge support from the Re- outlets dataset, sk = i∈D(t) wik , where D(t) is the set of P search Project “Casa Nel Parco” (POR FESR 14/20 - CANP - all documents shared at time t in news outlets. For Reddit, we Cod. 320 - 16 - Piattaforma Tecnologica “Salute e Benessere”) weight each shared document by its number of comments, and (t) funded by Regione Piemonte in the context of the Regional sk = i∈D(t) wik · ci , where D(t) is the set of all documents P Platform on Health and Wellbeing. A.P. acknowledges partial shared at time t in Reddit, and ci is the number of comments support from Intesa Sanpaolo Innovation Center. The fun- associated to document i. Finally, we define the relevance of a ders had no role in study design, data collection and analysis, topic as the integral in time of the strength. Therefore, given decision to publish, or preparation of the manuscript. N.G. t0 and tf as the start/end of our analysis interval, and given acknowledges support from the Doctoral Training Alliance. Rt (t) R = t0f dtsk , the coordinates of Figure 5 are the t1/2 such R t1/2 (t) that t0 dtsk = R/2. CONTRIBUTIONS ACKNOWLEDGMENTS N.G, M.S., D.P., A.P. and N.P. conceptualized the study. N.G., Authors would like to thank the startup Quick Algorithm for N.P., A.P. and M.T. collected the data. N.G., A.P. and F.C. providing the platform https://covid19.scops.ai/ performed analyses. N.G., M.S. and N.P. wrote the initial scops/home/, where the data collected during COVID-19 draft of the manuscript. N.G. and A.P. provided visualiza- pandemic were visualized in real-time. D.P. and M.T. acknowl- tion. All authors (N.G., N.P., D.P., M.S., A.P., M.T., F.C.) edge support from the Lagrange Project of the Institute for discussed the research design, reviewed, edited, and approved Scientific Interchange Foundation (ISI Foundation) funded by the manuscript. [1] John M Barry, “Pandemics: avoiding the mistakes of 1918,” [9] Jocelyn Raude, Marion Debin, Cécile Souty, Caroline Guer- Nature 459, 324–325 (2009). risi, Clement Turbelin, Alessandra Falchi, Isabelle Bonmarin, [2] Sebastian Funk, Marcel Salathé, and Vincent A. A. Daniela Paolotti, Yamir Moreno, Chinelo Obi, et al., “Are Jansen, “Modelling the influence of human behaviour people excessively pessimistic about the risk of coronavirus on the spread of infectious diseases: a review,” Jour- infection?” (2020). nal of The Royal Society Interface 7, 1247–1256 (2010), [10] Moritz UG Kraemer, Chia-Hung Yang, Bernardo Gutier- https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2010.0142. rez, Chieh-Hsi Wu, Brennan Klein, David M Pigott, Louis [3] Frederik Verelst, Lander Willem, and Philippe Beutels, du Plessis, Nuno R Faria, Ruoran Li, William P Hanage, et al., “Behavioural change models for infectious disease trans- “The effect of human mobility and control measures on the mission: a systematic review (2010-2015),” Journal of The covid-19 epidemic in china,” Science 368, 493–497 (2020). Royal Society Interface 13 (2016), 10.1098/rsif.2016.0820, [11] Benjamin F. Maier and Dirk Brockmann, “Effective https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2016.0820. containment explains subexponential growth in recent con- [4] Irwin M. Rosenstock, Victor J. Strecher, and Marshall H. firmed covid-19 cases in china,” Science 368, 742–746 (2020), Becker, “Social learning theory and the health belief model,” https://science.sciencemag.org/content/368/6492/742.full.pdf. Health Education Quarterly 15, 175–183 (1988), pMID: [12] Roy M Anderson, Hans Heesterbeek, Don Klinkenberg, and 3378902, https://doi.org/10.1177/109019818801500203. T Déirdre Hollingsworth, “How will country-based mitigation [5] Joseph T Wu, Kathy Leung, Mary Bushman, Nishant Kishore, measures influence the course of the covid-19 epidemic?” The Rene Niehus, Pablo M de Salazar, Benjamin J Cowling, Marc Lancet 395, 931–934 (2020). Lipsitch, and Gabriel M Leung, “Estimating clinical severity [13] Juliet Bedford, Delia Enria, Johan Giesecke, David L Heymann, of covid-19 from the transmission dynamics in wuhan, china,” Chikwe Ihekweazu, Gary Kobinger, H Clifford Lane, Ziad Nature Medicine 26, 506–510 (2020). Memish, Myoung-don Oh, Anne Schuchat, et al., “Covid-19: [6] “Covid19 timeline,” https://www. towards controlling of a pandemic,” The Lancet 395, 1015– who.int/news-room/detail/ 1018 (2020). 27-04-2020-who-timeline---covid-19 (), [14] Tim Colbourn, “Covid-19: extending or relaxing distancing accessed: 2020-05-11. control measures,” The Lancet Public Health (2020). [7] “Who covid19 official situation reports,” https: [15] Susannah Fox and Maeve Duggan, “Health online 2013,” //www.who.int/emergencies/diseases/ Health 2013, 1–55 (2013). novel-coronavirus-2019/situation-reports/ [16] Seow Ting Lee, “Predictors of h1n1 influenza pandemic news (), accessed: 2020-05-11. coverage: Explicating the relationships between framing and [8] “Mayor of london accouncement march 11th,” news release selection,” International Journal of Strategic Com- https://www.facebook.com/sadiqforlondon/ munication 8, 294–310 (2014). posts/3025766374142796, accessed: 2020-05-11.
10 [17] Michael McCauley, Sara Minsky, and Kasisomayajula in Proceedings of the 5th Annual ACM Web Science Conference Viswanath, “The h1n1 pandemic: media frames, stigmatization (2013) pp. 47–56. and coping,” BMC Public Health 13, 1116 (2013). [33] David A Broniatowski, Michael J Paul, and Mark Dredze, [18] Carolyn A Lin and Carolyn Lagoe, “Effects of news media “National and local influenza surveillance through twitter: an and interpersonal interactions on h1n1 risk perception and analysis of the 2012-2013 influenza epidemic,” PloS one 8 vaccination intent,” Communication Research Reports 30, 127– (2013). 136 (2013). [34] Matheus Araujo, Yelena Mejova, Ingmar Weber, and Fabricio [19] Seow Ting Lee and Iccha Basnyat, “From press release to news: Benevenuto, “Using facebook ads audiences for global lifestyle mapping the framing of the 2009 h1n1 a influenza pandemic,” disease surveillance: Promises and limitations,” in Proceedings Health Communication 28, 119–132 (2013). of the 2017 ACM on Web Science Conference (2017) pp. 253– [20] Hyun Jung Oh, Thomas Hove, Hye-Jin Paek, Byoungkwan 257. Lee, Hyegyu Lee, and Sun Kyu Song, “Attention cycles and [35] Albert Park and Mike Conway, “Tracking health related dis- the h1n1 pandemic: A cross-national study of us and korean cussions on reddit for public health applications,” in AMIA newspaper coverage,” Asian Journal of Communication 22, Annual Symposium Proceedings, Vol. 2017 (American Medical 214–232 (2012). Informatics Association, 2017) p. 1362. [21] Maria Keramarou, S Cottrell, MR Evans, C Moore, RE Stiff, [36] Mrinal Kumar, Mark Dredze, Glen Coppersmith, and Mun- C Elliott, DR Thomas, M Lyons, and RL Salmon, “Two waves mun De Choudhury, “Detecting changes in suicide content of pandemic influenza a (h1n1) 2009 in wales–the possible manifested in social media following celebrity suicides,” in impact of media coverage on consultation rates, april–december Proceedings of the 26th ACM conference on Hypertext & So- 2009,” Eurosurveillance 16, 19772 (2011). cial Media (2015) pp. 85–94. [22] Michele Tizzoni, André Panisson, Daniela Paolotti, and Ciro [37] Nicholas Generous, Geoffrey Fairchild, Alina Deshpande, Cattuto, “The impact of news exposure on collective attention Sara Y Del Valle, and Reid Priedhorsky, “Global disease mon- in the united states during the 2016 zika epidemic,” PLoS itoring and forecasting with wikipedia,” PLoS computational computational biology 16, e1007633 (2020). biology 10 (2014). [23] Ana I Bento, Thuy Nguyen, Coady Wing, Felipe Lozano-Rojas, [38] Kyle S Hickmann, Geoffrey Fairchild, Reid Priedhorsky, Yong-Yeol Ahn, and Kosali Simon, “Evidence from internet Nicholas Generous, James M Hyman, Alina Deshpande, and search data shows information-seeking responses to news of Sara Y Del Valle, “Forecasting the 2013–2014 influenza season local covid-19 cases,” Proceedings of the National Academy of using wikipedia,” PLoS computational biology 11 (2015). Sciences (2020). [39] Jeremy Ginsberg, Matthew H Mohebbi, Rajan S Patel, Lynnette [24] Riccardo Gallotti, Francesco Valle, Nicola Castaldo, Pierluigi Brammer, Mark S Smolinski, and Larry Brilliant, “Detecting Sacco, and Manlio De Domenico, “Assessing the risks of” influenza epidemics using search engine query data,” Nature infodemics” in response to covid-19 epidemics,” arXiv preprint 457, 1012–1014 (2009). arXiv:2004.03997 (2020). [40] Andrea Freyer Dugas, Mehdi Jalalpour, Yulia Gel, Scott Levin, [25] Lisa Singh, Shweta Bansal, Leticia Bode, Ceren Budak, Fred Torcaso, Takeru Igusa, and Richard E Rothman, “In- Guangqing Chi, Kornraphop Kawintiranon, Colton Padden, fluenza forecasting with google flu trends,” PloS one 8 (2013). Rebecca Vanarsdall, Emily Vraga, and Yanchen Wang, “A first [41] Gunther Eysenbach, “Infodemiology and infoveillance tracking look at covid-19 information and misinformation sharing on online health information and cyberbehavior for public health,” twitter,” arXiv preprint arXiv:2003.13907 (2020). American journal of preventive medicine 40, S154–8 (2011). [26] Matteo Cinelli, Walter Quattrociocchi, Alessandro Galeazzi, [42] Gabriel J Milinovich, Gail M Williams, Archie C A Clements, Carlo Michele Valensise, Emanuele Brugnoli, Ana Lu- and Wenbiao Hu, “Internet-based surveillance systems for mon- cia Schmidt, Paola Zola, Fabiana Zollo, and Antonio itoring emerging infectious diseases,” The Lancet Infectious Scala, “The covid-19 social media infodemic,” arXiv preprint Diseases 14, 160 – 168 (2014). arXiv:2003.05004 (2020). [43] Han Woo Park, Sejung Park, and Miyoung Chong, “Conver- [27] David Lazer, Ryan Kennedy, Gary King, and Alessandro sations and medical news frames on twitter: Infodemiological Vespignani, “The parable of google flu: traps in big data analy- study on covid-19 in south korea,” J Med Internet Res 22, sis,” Science 343, 1203–1205 (2014). e18897 (2020). [28] Aron Culotta, “Towards detecting influenza epidemics by ana- [44] Albert Park and Mike Conway, “Tracking health related discus- lyzing twitter messages,” in Proceedings of the first workshop sions on reddit for public health applications,” AMIA ... Annual on social media analytics (2010) pp. 115–122. Symposium proceedings. AMIA Symposium 2017, 1362–1371 [29] Vasileios Lampos and Nello Cristianini, “Tracking the flu pan- (2018). demic by monitoring the social web,” in 2010 2nd international [45] Alex Lamb, Michael Paul, and Mark Dredze, “Investigating workshop on cognitive information processing (IEEE, 2010) twitter as a source for studying behavioral responses to epi- pp. 411–416. demics,” AAAI Fall Symposium - Technical Report (2012). [30] Qian Zhang, Nicola Perra, Daniela Perrotta, Michele Tizzoni, [46] Nehama Lewis, “Information seeking and scanning,” in The Daniela Paolotti, and Alessandro Vespignani, “Forecasting International Encyclopedia of Media Effects (American Cancer seasonal influenza fusing digital indicators and a mechanis- Society, 2017) pp. 1–10. tic disease model,” in Proceedings of the 26th international [47] Sylvie D. Lambert and Carmen G. Loiselle, “Health conference on world wide web (2017) pp. 311–319. information—seeking behavior,” Qualitative Health [31] Munmun De Choudhury, Michael Gamon, Scott Counts, and Research 17, 1006–1019 (2007), pMID: 17928475, Eric Horvitz, “Predicting depression via social media,” in Sev- https://doi.org/10.1177/1049732307305199. enth international AAAI conference on weblogs and social [48] D Walter, Merle Böhmer, Sabine Reiter, Gerard Krause, and media (2013). Ole Wichmann, “Risk perception and information-seeking be- [32] Munmun De Choudhury, Scott Counts, and Eric Horvitz, “So- haviour during the 2009/10 influenza a(h1n1)pdm09 pandemic cial media as a measurement tool of depression in populations,” in germany,” Euro surveillance : bulletin européen sur les mal-
11 adies transmissibles = European communicable disease bulletin [65] Wayne Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng 17 (2012). Lim, Hongfei Yan, and Xiaoming Li, “Comparing twitter and [49] Natalie Lee-San Pang, “Crisis-based information seeking: mon- traditional media using topic models,” in Advances in Infor- itoring versus blunting in the information seeking behaviour of mation Retrieval, edited by Paul Clough, Colum Foley, Cathal working students during the southeast asian haze crisis,” Inf. Gurrin, Gareth J. F. Jones, Wessel Kraaij, Hyowon Lee, and Res. 19 (2014). Vanessa Mudoch (Springer Berlin Heidelberg, Berlin, Heidel- [50] Branden B. Johnson, “Explaining americans’ responses berg, 2011) pp. 338–349. to dread epidemics: an illustration with ebola in late [66] Qiming Diao, Jing Jiang, Feida Zhu, and Ee-Peng Lim, “Find- 2014,” Journal of Risk Research 20, 1338–1357 (2017), ing bursty topics from microblogs,” in Proceedings of the 50th https://doi.org/10.1080/13669877.2016.1153507. Annual Meeting of the Association for Computational Linguis- [51] Tara Sell, Crystal Boddie, Emma Mcginty, Keshia Pollack, tics: Long Papers - Volume 1, ACL ’12 (Association for Com- Katherine Smith, Thomas Burke, and Lainie Rutkow, “Me- putational Linguistics, USA, 2012) p. 536–544. dia messages and perception of risk for ebola virus infec- [67] Daniel D Lee and H Sebastian Seung, “Learning the parts of tion, united states,” Emerging Infectious Diseases 23 (2017), objects by non-negative matrix factorization,” Nature 401, 788 10.3201/eid2301.160589. (1999). [52] Nicolò Gozzi, Daniela Perrotta, Daniela Paolotti, and Nicola [68] Eran Bendavid, Bianca Mulaney, Neeraj Sood, Soleil Shah, Perra, “Towards a data-driven characterization of behavioral Emilia Ling, Rebecca Bromley-Dulfano, Cara Lai, Zoe changes induced by the seasonal flu,” PLOS Computational Weissberg, Rodrigo Saavedra-Walker, James Tedrow, Dona Biology 16, e1007879 (2020). Tversky, Andrew Bogan, Thomas Kupiec, Daniel Eichner, [53] Toby Wise, Tomislav Zbozinek, Giorgia Michelini, Cindy Ha- Ribhav Gupta, John Ioannidis, and Jay Bhattacharya, gan, and dean mobbs, “Changes in risk perception and protec- “Covid-19 antibody seroprevalence in santa clara county, tive behavior during the first week of the covid-19 pandemic in california,” medRxiv (2020), 10.1101/2020.04.14.20062463, the united states,” (2020). https://www.medrxiv.org/content/early/2020/04/30/2020.04.14.20062463.full [54] Hirotogu Akaike, “Information theory and an extension of the [69] Weirui Wang and Lee Ahern, “Acting on surprise: emotional maximum likelihood principle,” in Selected papers of hirotugu response, multiple-channel information seeking and vaccina- akaike (Springer, 1998) pp. 199–213. tion in the h1n1 flu epidemic,” Social Influence 10, 137–148 [55] Yanqing Chen and Steven Skiena, “False-friend detection (2015), https://doi.org/10.1080/15534510.2015.1011227. and entity matching via unsupervised transliteration,” arXiv [70] maeve duggan and a j s smith, “6% of online adults are reddit preprint arXiv:1611.06722 (2016). users,” (2013). [56] “Python geocoder,” https://geocoder. [71] Craig Finlay, “Age and gender in reddit commenting and suc- readthedocs.io, accessed: 2020-05-29. cess,” Journal of Information Science Theory and Practice 2, [57] Yla Tausczik, Kate Faasse, James W. Pennebaker, and Keith J. 18–28 (2014). Petrie, “Public anxiety and information seeking following the [72] Philipp Singer, Fabian Flöck, Clemens Meinhart, Elias Zeitfo- h1n1 outbreak: Blogs, newspaper articles, and wikipedia visits,” gel, and Markus Strohmaier, “Evolution of reddit: From the Health Communication 27, 179–185 (2012), pMID: 21827326, front page of the internet to a self-referential community?” in https://doi.org/10.1080/10410236.2011.571759. Proceedings of the 23rd International Conference on World [58] Cynthia Chew and Gunther Eysenbach, “Pandemics in the age Wide Web, WWW ’14 Companion (Association for Computing of twitter: Content analysis of tweets during the 2009 h1n1 Machinery, New York, NY, USA, 2014) p. 517–522. outbreak,” PLOS ONE 5, 1–13 (2010). [73] Alexander JAM van Deursen and Jan AGM van [59] Dasha Pruss, Yoshinari Fujinuma, Ashlynn R. Daughton, Dijk, “The digital divide shifts to differences in us- Michael J. Paul, Brad Arnot, Danielle Albers Szafir, and Jordan age,” New Media & Society 16, 507–526 (2014), Boyd-Graber, “Zika discourse in the americas: A multilingual https://doi.org/10.1177/1461444813487959. topic analysis of twitter,” PLOS ONE 14, 1–23 (2019). [74] Alexander van Deursen and Jan van Dijk, “Internet skills and [60] Michael C. Smith and David A. Broniatowski, “Towards real- the digital divide,” New Media & Society 13, 893–911 (2011), time measurement of public epidemic awareness monitoring https://doi.org/10.1177/1461444810386774. influenza awareness through twitter michael c,” (2015). [75] Laura Robinson, Shelia R. Cotten, Hiroshi Ono, An- [61] Liesbeth Mollema, Irene Anhai Harmsen, Emma Broekhuizen, abel Quan-Haase, Gustavo Mesch, Wenhong Chen, Jeremy Rutger Clijnk, Hester De Melker, Theo Paulussen, Gerjo Kok, Schulz, Timothy M. Hale, and Michael J. Stern, Robert Ruiter, and Enny Das, “Disease detection or public “Digital inequalities and why they matter,” Informa- opinion reflection? content analysis of tweets, other social tion, Communication & Society 18, 569–582 (2015), media, and online newspapers during the measles outbreak in https://doi.org/10.1080/1369118X.2015.1012532. the netherlands in 2013,” J Med Internet Res 17, e128 (2015). [76] Alexander J.A.M. [van Deursen], Jan A.G.M. [van Dijk], and [62] Anders A F Wahlberg and Lennart Sjoberg, “Risk perception Oscar Peters, “Rethinking internet skills: The contribution of and the media,” Journal of Risk Research 3, 31–50 (2000), gender, age, education, internet experience, and hours online to https://doi.org/10.1080/136698700376699. medium- and content-related internet skills,” Poetics 39, 125 – [63] Celine Klemm, Enny Das, and Tilo Hartmann, “Swine flu 144 (2011). and hype: A systematic review of media dramatization of the [77] Albert Park and Mike Conway, “Towards tracking opium re- h1n1 influenza pandemic,” Journal of Risk Research 19, 1–20 lated discussions in social media.” Online journal of public (2014). health informatics 9 (2017). [64] Jean Tchuenche, Nothabo Dube, Claver Bhunu, Robert Smith, [78] Jeanine PD Guidry, Yan Jin, Caroline A Orr, Marcus Mess- and Chris Bauch, “The impact of media coverage on the trans- ner, and Shana Meganck, “Ebola on instagram and twitter: mission dynamics of human influenza,” BMC public health 11 How health organizations address the health crisis in their so- Suppl 1, S5 (2011). cial media engagement,” Public relations review 43, 477–486 (2017).
You can also read