Data Mining and Content Analysis of the Chinese Social Media Platform Weibo During the Early COVID-19 Outbreak: Retrospective Observational ...
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al Original Paper Data Mining and Content Analysis of the Chinese Social Media Platform Weibo During the Early COVID-19 Outbreak: Retrospective Observational Infoveillance Study Jiawei Li1,2,3,4, MS; Qing Xu1,2,3,4, MAS; Raphael Cuomo1,4, MPH, PhD; Vidya Purushothaman1,4, MBBS, MAS; Tim Mackey1,2,3,4, MAS, PhD 1 Department of Anesthesiology and Division of Infectious Diseases and Global Public Health, University of California San Diego School of Medicine, La Jolla, CA, United States 2 S-3 Research LLC, San Diego, CA, United States 3 Department of Healthcare Research and Policy, University of California San Diego Extension, La Jolla, CA, United States 4 Global Health Policy Institute, San Diego, CA, United States Corresponding Author: Tim Mackey, MAS, PhD Department of Anesthesiology and Division of Infectious Diseases and Global Public Health University of California San Diego School of Medicine 8950 Villa La Jolla Drive A124 La Jolla, CA, 92037 United States Phone: 1 9514914161 Email: tmackey@ucsd.edu Abstract Background: The coronavirus disease (COVID-19) pandemic, which began in Wuhan, China in December 2019, is rapidly spreading worldwide with over 1.9 million cases as of mid-April 2020. Infoveillance approaches using social media can help characterize disease distribution and public knowledge, attitudes, and behaviors critical to the early stages of an outbreak. Objective: The aim of this study is to conduct a quantitative and qualitative assessment of Chinese social media posts originating in Wuhan City on the Chinese microblogging platform Weibo during the early stages of the COVID-19 outbreak. Methods: Chinese-language messages from Wuhan were collected for 39 days between December 23, 2019, and January 30, 2020, on Weibo. For quantitative analysis, the total daily cases of COVID-19 in Wuhan were obtained from the Chinese National Health Commission, and a linear regression model was used to determine if Weibo COVID-19 posts were predictive of the number of cases reported. Qualitative content analysis and an inductive manual coding approach were used to identify parent classifications of news and user-generated COVID-19 topics. Results: A total of 115,299 Weibo posts were collected during the study time frame consisting of an average of 2956 posts per day (minimum 0, maximum 13,587). Quantitative analysis found a positive correlation between the number of Weibo posts and the number of reported cases from Wuhan, with approximately 10 more COVID-19 cases per 40 social media posts (P
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al (JMIR Public Health Surveill 2020;6(2):e18700) doi: 10.2196/18700 KEYWORDS COVID-19; coronavirus; infectious disease; social media, surveillance; infoveillance; infodemiology key trends that may also correlate with outbreak incidence data Introduction [18]. The coronavirus disease (COVID-19) is a rapidly emerging Leveraging infoveillance approaches, we conducted a infectious disease caused by a novel coronavirus named severe retrospective observational study for COVID-19 on one of the acute respiratory syndrome (SARS) coronavirus 2 [1]. The largest Chinese social media platforms, Sina Weibo [新浪微 COVID-19 outbreak began in late December 2019 in Wuhan, 博]. Sina Weibo is a microblogging website (also known as the Hubei Province, China, with a cluster of patients presenting Chinese equivalent to Twitter) and one of the most influential with pneumonia of unknown origin and reported exposure to a social media platforms in China. According to its own press seafood and live animal market in the same city [2]. The World release in August 2020, it had over 486 million active users Health Organization (WHO) confirmed 41 cases and 1 death [19]. Users can publish content such as messages in microblogs due to the novel coronavirus by January 12, 2020 [3]. Since this and share text, pictures, videos, and music. Compared with initial reporting, COVID-19 has rapidly spread within China WeChat, another popular social media platform in China, Weibo and internationally, with the WHO declaring a Public Health posts are generally more publicly visible; with WeChat, posts Emergency of International Concern (PHEIC) under the revised are generally more private and only visible to certain people International Health Regulations on January 30, 2020 [4]. selected by users. Due to the public nature of the platform, we Since the PHEIC declaration, COVID-19 has spread to every attempted to assess whether Weibo posts about COVID-19 were continent except Antarctica, becoming a highly infectious global predictive of the number of reported cases during the outbreak’s pandemic with sustained community transmission [5]. The early stages and conducted a qualitative analysis of severity of the COVID-19 outbreak, with approximately 1.8 COVID-19–related themes detected and discussed by users million cases worldwide as of mid-April 2020 [6], has far located in Wuhan. surpassed past coronavirus events such as the Middle East respiratory syndrome (MERS)–related coronavirus, which had Methods 2494 cases as of November 2019, and the 2003 SARS coronavirus, which had more than 8000 cases and affected 26 Study Design countries. It is unknown whether viral mutations will result in This observational infoveillance study was conducted in two patterns of annual re-emergence as seen with influenza strains. phases: data collection using an automated Python (Python Software Foundation) programming script to collect Attempts to predict epidemiological features (eg, prevalence, COVID-19-related posts on Weibo, and quantitative and attack rate, replication or reproduction rate, morbidity, and qualitative analysis to identify trends and characterize key mortality) of an outbreak to inform infection control and public themes discussed by Chinese users. health countermeasures are critical. This can be challenging during the earlier stages of an outbreak when there is a lack of Data Collection sufficient information regarding the etiology of the disease, Programming scripts were written in the Python programming inadequate diagnostic and testing capabilities, and incomplete language to extract posts on Weibo in the Chinese language epidemiological data regarding confirmed cases [7]. In the (traditional and simplified Chinese) from users self-reporting absence of such data, the use of information in an electronic their location in Wuhan. Weibo users can post messages limited medium such as social media conversations can enable to 2000 characters with or without images, videos, and other syndromic surveillance approaches to characterize disease multimedia, and can repost messages equivalent to the retweet distribution and provide accurate case counts more rapidly [8]. function on Twitter. Python scripts were set to continuously These “infoveillance” approaches have been used to characterize collect data filtered for COVID-19–related keywords from a host of public health issues including topics related to mental December 23, 2019, to January 30, 2020. Keywords included health, substance abuse behavior, the spread of foodborne the Chinese-language terms: [冠状病毒] (coronavirus), [新型 illness, and the monitoring of infectious disease outbreaks (eg 肺炎] (novel pneumonia), [武汉肺炎] (Wuhan pneumonia), pertussis, influenza, HIV/AIDS, dengue, West Nile virus, Zika [疫情] (epidemic situation), [非典] (severe acute respiratory virus, H1N1, and Ebola) [8-12]. Specifically, the now ubiquitous syndrome), [华南海鲜市场] (Wuhan Seafood Wholesale nature of social media means it represents an important, Market). These keywords were chosen on the basis of manual “nontraditional” source for disease surveillance. Specifically, searches via the platform’s public search function to detect a user-generated social media data can be mined to assess the baseline of user conversations related to the outbreak for our public’s knowledge, attitudes, and behaviors toward the disease, systematic data collection processes. The variation in selected and can be particularly informative when cross-validated with keywords was also necessary, as the official name of the disease traditional disease surveillance data [13-17]. Others have also was not announced until February 11, 2020 [20]. These data used global social media platforms such as Twitter to examine collection methods are consistent with other analyses of the 2009 H1N1 pandemic, conduct content analysis, and identify health-related posts (including flu-related topics) on the Weibo platform not related to COVID-19 [21]. http://publichealth.jmir.org/2020/2/e18700/ JMIR Public Health Surveill 2020 | vol. 6 | iss. 2 | e18700 | p. 2 (page number not for citation purposes) XSL• FO RenderX
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al Posts were filtered for geographic location, thereby identifying Third, we identified parent classifications to select prevalent users specifically within the city of Wuhan, China. Posts with topics and then tagged and grouped these classifications with geographic locations outside of Wuhan City were not included supporting qualitative data (eg, Weibo posts). We primarily in this study, as the aim was to focus on the early origins of the relied on inductive coding approaches starting with Weibo outbreak in this region. Posts were collected from all account COVID-19 posts identified but also informed this inductive types including personal accounts, media accounts, and coding based on themes detected in existing literature from prior government accounts. Technical limitations of the Weibo disease outbreaks [18,21]. Coders individually selected parent platform and Python script used to collect data limited our data topic classifications to represent different thematic areas and collection to a maximum of 2000 posts per hour. However, collapsed infrequent categories into parent classifications. We during the 936 hours of data collection, we did not reach this then combined the related topics, removed duplicate topics, and limit for any hour of collection. evaluated thematic concurrence by independently coding the entire sample of posts collected from the early period of the The number of COVID-19 cases in all of mainland China and outbreak with detected COVID-19-related posts (December 31, Hubei Province were collected for each calendar day between 2019, until January 20, 2020.) December 23, 2019, and January 30, 2020, inclusively. Case counts were made publicly available on the internet by the Chinese National Health Commission, a cabinet-level executive Results department of the Chinese central government, headquartered Data Availability and Ethics Approval in Beijing [20]. Data collected on social media platforms is available on request Quantitative Analysis from authors subject to appropriate deidentification. Ethics Weibo posts with COVID-19-related keywords were binned approval was not required for this study. All information into each calendar day to calculate posts per day. Longitudinal collected from this study was from the public domain, and the trends were then visually conveyed using line graphs. Regression study did not involve any interaction with users. Indefinable analysis was conducted to understand the predictive value of user information was removed from the study results. social media posts on the number of confirmed cases reported Data Collection by the Chinese government. Simple linear regression was performed between social media posts per day and the number There were 115,299 posts collected during the study time frame, of cases reported within mainland China, excluding Hubei with an average of 2956 Weibo posts per day. There was a high Province, and the cases in Hubei Province alone. Also, a simple degree of variation in the number of posts depending on the linear regression model was computed wherein the number of date of collection, with 0 posts collected on one day (December posts per day was used to predict percent daily change in cases 26, 2019) and the highest number of posts (13,740) collected from Hubei, and a separate model was computed to predict on January 27, 2020. During this same time period, China percent daily change in cases from mainland China (excluding reported 36,456 confirmed cases of COVID-19. COVID-19 Hubei). Modeling of daily changes was conducted using the cases were reported starting on January 16, 2020, when 45 cases final 20 days of data, as prior days did not exhibit daily changes were reported. The number of cases then increased rapidly, in posts, cases from Hubei, or cases from the remainder of reaching 9692 cases on January 30, 2020, the final day of data China. A P value
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al reported in Hubei among all incident cases in mainland China outbreak-related Weibo posts are reactive to local disease (Wuhan cases proportion=–0.003*posts+100.2; P
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al with the outbreak. This period of initial uncertainty was followed behaviors that could potentially increase the risk of transmission by the information disclosure about the causative agent by the (eg, going to New Year celebration events and self-evacuation Chinese National Health Commission and other official from Wuhan). New user-response topics began to emerge toward government and academic sources. As the outbreak progressed, the end of the early outbreak period, including criticism of the a higher volume of more precise terms including “novel Wuhan Red Cross response and user uncertainty related to news coronavirus” or “COVID-19” were detected, with a decline and about quarantines, travel restrictions, and new hospital shift from posts mentioning colloquial terms such as “unknown construction projects. reason pneumonia” and “Wuhan pneumonia,” similar to Another subtheme detected in user responses was the discussion terminology adoption during the H1N1 pandemic [18]. about COVID-19-related symptoms that appeared in both news The presence of parent classifications also changed over the and individual posts. At the beginning of the outbreak period, time period. For example, at the onset of the outbreak, many most symptom-related posts were posted by official accounts users discussed what the causative agent of the outbreak might or consisted of news posts that were reposted and shared by the be, including how seasonal influenza, avian influenza, MERS, public that included describing symptoms of fever and shortness and SARS were ruled out. From January 16-20, posts about the of breath (January 1) and coughing and weakness or fatigue causative agent also increased with most conversations (January 10). Separate from these news-related posts, some discussing a novel coronavirus strain. There were also early individual accounts also self-reported other symptoms including conversations about the South China Seafood Market and headache, diarrhea, sore throat, and perspiration during sleep. wildlife trafficking, reflecting uncertainty regarding the zoonotic After January 20, 2020, human-to-human transmission was origins and possible transmission vectors of the disease. The confirmed, and we detected an increase in posts related to user proportion of posts with reference to the South China Seafood self-reported symptoms along with posts that provided Market were much higher prior to January 6. second-hand reporting of symptoms from other people during our coding of the random stratified sample of the entire 39-day The detection of posts related to the epidemiological data study period. Though not fully coded for this study, our characteristics of the outbreak were relatively consistent random selection detected a few users who reported other throughout the entire 21-day period assessed. However, disputed symptoms, including a loss of taste and lack of appetite, following confirmation of the outbreak as a novel coronavirus, in the last 10 days of our overall data collection (January 21-30). there was a spike of discussions about whether COVID-19 was transmittable human-to-human. Relatedly, at the beginning of Importantly, both the nature of the content and the volume of the outbreak, there were some posts where users expressed their posts were likely driven by a combination of release of own personal reaction and concerns about a potential outbreak. government information and news events. For example, From January 14, after 3 cases were confirmed outside of China, December 31, 2019 had the second highest volume of posts there was a notable increase in the number of posts related to related to the COVID-19 term “pneumonia of unknown cause,” the public's reaction to the outbreak and its associated risks and which corresponded to the confirmation from an official source control and response measures. (the Wuhan Health Commission) of cases of pneumonia of unknown etiology detected in Wuhan city. There was also an More specific subthemes regarding users’ knowledge, attitudes, increase in COVID-19 Weibo posts on January 6, 2020, likely and responses to COVID-19 changed as more information about driven by news on January 5 that laboratory test results had the underlining epidemiology became available. Specifically, ruled out other pathogens (eg, influenza, avian influenza, the terms “confirmed case,” “suspected case,” “death case,” adenovirus, MERS, and SARS) as the cause of the outbreak. “human-to-human transmission,” “monitored,” “public health Additionally, on January 15, the WHO announced that the supervision,” and “quarantine” became more frequent as the possibility of limited human-to-human transmission could not outbreak progressed. Accompanying this shift in terminology, be excluded. This drove a second increase in the overall number we also observed wide variation in user reactions as information of posts on January 16. Finally, on January 20, human-to-human from government sources was disseminated and the outbreak transmission was confirmed, generating a large number of worsened. This included posts conveying protective behaviors conversations from both media accounts and private Weibo (eg, cleaning hands, staying away from crowds, wearing medical users, resulting in the largest increase of posts observed during masks in public areas), while others conveyed attitudes and this time period. http://publichealth.jmir.org/2020/2/e18700/ JMIR Public Health Surveill 2020 | vol. 6 | iss. 2 | e18700 | p. 5 (page number not for citation purposes) XSL• FO RenderX
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al Table 1. Posts on the Weibo social media platform organized according to themes detected in qualitative analysis. Theme, Description of topic (English) Example topics (Chinese with English translation) Causative agent Concerns about potential SARSa re-emergence [希望不是非典,大家安心过年] “I wish this is not SARS, hope everyone can have a peaceful New Year Eve” Wuhan Seafood Wholesale Market considered the source of causative [华南海鲜市场可能是病毒来源] agent “Wuhan Seafood Wholesale Market might be the source of the causative agent” Pneumonia caused by unknown reason [武汉出现不明原因肺炎] “cases of pneumonia of unknown etiology detected in Wuhan City” Novel coronavirus has been confirmed as causative agent [初步认定为新型冠状病毒] “Causative agent preliminary identification of a novel coronavirus” Epidemiological characteristics of the outbreak No evidence of human-to-human transmission [目前没有证据证明人传人] “There is no evidence of human-to-human transmission” New confirmed cases and statistics on mortality [截至昨日病例已增至44例,其中11例重症,均接受隔离治疗] “Until yesterday, there were 44 confirm cases, 11 severe cases, all have been isolated and treated” Public Health supervision for people who had close contact with pa- [与病患接触者进行医学观察] tients with confirmed cases “People who have close connection with confirmed cases are under med- ical supervision” Public reaction to outbreak control and response Wear masks, keep away from crowds [虽然不知道是不是人传人,还是戴上口罩,远离人群] “Not sure if it is human to human transmissible, but wear masks and keep away from crowds would help. Just in case.” Self-evacuating Wuhan [太可怕了,快点离开这儿吧] “It’s scary, let’s leave here (Wuhan)” Will still attend New Year celebration event [不明原因肺炎丝毫不影响大家戴口罩出来跨年] “The pneumonia of unknown cause seems not to impact people’s New Year Eve celebration event” Other topics Wuhan Red Cross under scrutiny [红十字会遭到社会指责] “The Red Cross was accused by society” Travel restrictions [从1月23日10时起,武汉市和周边的鄂州市、仙桃市、潜江市、黄 冈市、荆门市等相续宣布暂停运营城市公交、地铁、轮渡、长途客 运,暂时关闭机场、火车站、高速公路等离开通道,严防武汉新型 冠状病毒疫情扩散。] “From 10:00 on January 23, Wuhan and surround- ing Ezhou, Xiantao, Qianjiang, Huanggang, Jingmen, etc. have successively announced the suspension of operation of city buses, subways, ferries, long-distance passenger transport, and temporarily closed airports and trains. Stations, highways, etc. leave the passage to strictly prevent the spread of new coronavirus epidemics in Wuhan.” New hospital construction projects [新建武汉“小汤山] “Will build new hospitals in Wuhan (Lei Shen Shan & Huo Shen Shan), also called Wuhan Xiao Tang Shan” a SARS: severe acute respiratory syndrome. microblogging site Twitter or Google trends data, yet only a Discussion few have examined foreign-language platforms. Due to the Principal Findings COVID-19 outbreak originating in Wuhan, China, this study sought to identify, characterize, and assess the potential The vast majority of infoveillance studies have analyzed data relationship between Chinese social media conversations taking from English-language social media platforms such as the http://publichealth.jmir.org/2020/2/e18700/ JMIR Public Health Surveill 2020 | vol. 6 | iss. 2 | e18700 | p. 6 (page number not for citation purposes) XSL• FO RenderX
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al place in Wuhan and the number of COVID-19 confirmed cases Public perception about the origins and transmission patterns at the early stages of the outbreak, which has now transitioned of COVID-19 changed over the study period, with initial into a worldwide pandemic. It also sought to understand how conversations focusing on a possible link with the seafood user perception changed as additional information became market in Wuhan, later changing to discussions about the available from government and media sources as the outbreak increased number of cases reporting no exposure to the seafood progressed while also attempting to identify parent market, and then leading to discussions of possible classifications of predominant user-generated themes that human-to-human transmission, which was later confirmed by emerged as the outbreak accelerated. These study objectives the Chinese government. Overall, we observed a wide variation and some of its general findings are consistent with prior studies, in user reactions to information, with some users expressing a including a 2010 infoveillance study by Chew and Eysenbach willingness to undertake protective behavior and other users [18] assessing the 2009 H1N1 outbreak. In that study, posts downplaying the risks and engaging in behaviors that could from Twitter were collected, thematically assessed, and found have exacerbated disease spread (eg, leaving Wuhan, attending to be significantly correlated with weekly H1N1 incidence New Year events). Importantly, these observed attitudes and during the outbreak, with the absolute increase in H1N1-related behaviors happened prior to the Chinese government announcing tweet volume coinciding with major news events. a lockdown of Wuhan and other cities in Hubei on January 23, 2020. After the announcement of these quarantine measures, Based on our analysis, there appears to be a positive correlation there was a sizable increase in the volume of Wuhan Weibo between the number of COVID-19-related Weibo posts from posts we collected. Wuhan and the number of cases officially reported in Wuhan during the early stages of this outbreak. This effect size was Though the mixed quantitative and qualitative results of this larger than what was observed for the rest of China excluding study are primarily exploratory, they provide important insight Hubei Province (where Wuhan is the capital city) and held when on the changing knowledge, attitudes, and behaviors of Chinese comparing the number of Weibo posts to the incidence social media users who were at the epicenter of what is now a proportion of cases in Hubei. However, any potential predictive rapidly expanding pandemic that has impacted all facets of value of using social media data as a proxy for real world public global society. More research is needed to better understand the health surveillance statistics needs more rigor and added data effectiveness of health communication strategies during evolving layers to confirm possible associations, particularly in the outbreaks such as COVID-19, particularly in the context of how context of user reactions to news events as discussed in this and information is understood, shared, and acted upon by users in other studies. Despite these limitations, qualitative analysis the face of uncertainty and changing information. Specifically, characterized the early stages of the outbreak as having different we need to better understand how social media platforms can degrees of the Chinese public’s uncertainty regarding the risks influence the public’s risk perception, their trust and credibility posed by COVID-19. As information emerged about the disease, of different information sources, and, ultimately, how it changes users expressed new concerns, which also led to changing real-world behavior that can have an impact on control measures knowledge, attitudes, and behaviors among Chinese social media enacted to mitigate an outbreak. users, some protective and some that may have introduced Early reports indicated that major social media platforms are increased risks of disease spread. struggling with the volume of COVID-19 information and In response to public uncertainty, the Chinese government issued user-generated content flooding their platforms, some of which a series of announcements on the Weibo “official” accounts in is helpful and accurate and some of which are rumors and an attempt to clarify the characteristics of the disease as they misinformation [22]. In fact, popular social media platforms became known, including an official warning to hospitals on TikTok, Facebook, and Twitter have all announced measures December 30, 2019, about how to report potential cases and a to better ensure access to credible and accurate information subsequent announcement on January 8, 2020, confirming a about COVID-19, though whether these platforms are up to this novel coronavirus as the causative disease agent (Figure 2). task remains an open question [22].Evaluating whether social These events evidence that social media was used as an outbreak media can act as a positive tool to promote global health communication tool by the government, media, and users (who objectives, particularly in the context of health emergencies, reposted content) and led to the dissemination of and user will be tested by COVID-19, along with its utility as a modern reaction to information on an outbreak whose trajectory would approach to public health surveillance. take it global. http://publichealth.jmir.org/2020/2/e18700/ JMIR Public Health Surveill 2020 | vol. 6 | iss. 2 | e18700 | p. 7 (page number not for citation purposes) XSL• FO RenderX
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al Figure 2. Timeline of COVID-19 events, themes detected in Weibo posts, and examples of posts. CCTV: China Central Television; CDC: Centers for Disease Control and Prevention; COVID-19: coronavirus disease. simple linear regression showed a positive correlation between Limitations COVID-19 Weibo posts and the number of Wuhan cases in the This study has certain limitations. First, our data collection was study time frame, but we did not control for other potential limited to one Chinese social media platform over a prescribed confounders. Further, it is unclear whether our observed period of time. Hence, it is not generalizable to all COVID-19 predictive trend line would continue or if the trend line is social media conversations occurring among Chinese users. We generalizable to other instances of COVID-19 outbreaks in other did not examine Chinese private communication apps (eg QQ, countries or communities. It is more likely that only under WeChat) due to the difficulty of collecting data on these specific disease transmission circumstances would this platforms. Future studies should assess a broader scope of correlation occur, namely a lack of knowledge and reporting of conversational data from multiple platforms and use natural an outbreak in its early stages, a novel virus with high language processing and machine learning approaches to help transmission and sustained community spread, and high social classify larger volumes of conversations. Second, our data media engagement involving outbreak conversations. Further, collection started from the reported early stages of the as previously stated, it is highly likely that the volume of posts COVID-19 outbreak. During parts of this time period, the are associated with user reactions to news events and causative agent had not been confirmed and there was no official government announcements. Finally, due to censorship in China, name for the disease. Due to early inconsistency in terminology, posts may have been deleted before data collection. In fact, a Weibo users may have used other keywords to describe few messages detected included associated comments that had COVID-19 that were not collected in this study. Third, the been deleted and were not retrievable. Authors' Contributions JL collected the data; all authors designed the study, conducted the data analyses, wrote the manuscript, and approved the final version of the manuscript. Conflicts of Interest JL, QX, and TKM are employees of the startup company S-3 Research LLC. S-3 Research is a startup funded and currently supported by the National Institutes of Health – National Institute on Drug Abuse through a Small Business Innovation and Research contract for opioid-related social media research and technology commercialization. Authors report no other conflicts of interest associated with this manuscript. http://publichealth.jmir.org/2020/2/e18700/ JMIR Public Health Surveill 2020 | vol. 6 | iss. 2 | e18700 | p. 8 (page number not for citation purposes) XSL• FO RenderX
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al References 1. Centers for Disease Control and Prevention. 2019 Novel Coronavirus (2019-nCoV) Situation Summary URL: https://www. cdc.gov/coronavirus/2019-nCoV/summary.html#background [accessed 2020-02-26] 2. Wuhan Municipal Health Commission. [Wuhan Municipal Health Commission on the current situation of pneumonia in our city] URL: http://wjw.wuhan.gov.cn/front/web/showDetail/2019123108989 [accessed 2020-02-26] 3. World Health Organization. 2020 Jan 12. Novel Coronavirus – China URL: https://www.who.int/csr/don/ 12-january-2020-novel-coronavirus-china/en/ [accessed 2020-02-26] 4. World Health Organization. 2020 Jan 31. Novel Coronavirus (2019-nCoV) situation report - 11 URL: https://www.who.int/ docs/default-source/coronaviruse/situation-reports/20200131-sitrep-11-ncov.pdf?sfvrsn=de7c0f7_4 [accessed 2020-02-26] 5. CCDC Weekly. 2020. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) — China, 2020 URL: http://weekly.chinacdc.cn/en/article/id/e53946e2-c6c4-41e9-9a9b-fea8db1a8f51 [accessed 2020-02-26] 6. Wang C, Horby PW, Hayden FG, Gao GF. A novel coronavirus outbreak of global health concern. Lancet 2020 Feb 15;395(10223):470-473. [doi: 10.1016/S0140-6736(20)30185-9] [Medline: 31986257] 7. Kelly-Cirino CD, Nkengasong J, Kettler H, Tongio I, Gay-Andrieu F, Escadafal C, et al. Importance of diagnostics in epidemic and pandemic preparedness. BMJ Glob Health 2019;4(Suppl 2):e001179 [FREE Full text] [doi: 10.1136/bmjgh-2018-001179] [Medline: 30815287] 8. Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res 2009 Mar 27;11(1):e11. [doi: 10.2196/jmir.1157] [Medline: 19329408] 9. Kim SJ, Marsch LA, Hancock JT, Das AK. Scaling up research on drug abuse and addiction through social media big data. J Med Internet Res 2017 Oct 31;19(10):e353. [doi: 10.2196/jmir.6426] [Medline: 29089287] 10. Gianfredi V, Bragazzi N, Mahamid M, Bisharat B, Mahroum N, Amital H, et al. Monitoring public interest toward pertussis outbreaks: an extensive Google Trends-based analysis. Public Health 2018 Dec;165:9-15. [doi: 10.1016/j.puhe.2018.09.001] [Medline: 30342281] 11. Bragazzi NL, Bacigaluppi S, Robba C, Siri A, Canepa G, Brigo F. Infodemiological data of West-Nile virus disease in Italy in the study period 2004-2015. Data Brief 2016 Dec;9:839-845. [doi: 10.1016/j.dib.2016.10.022] [Medline: 27872881] 12. Mamidi R, Miller M, Banerjee T, Romine W, Sheth A. Identifying key topics bearing negative sentiment on Twitter: insights concerning the 2015-2016 Zika epidemic. JMIR Public Health Surveill 2019 Jun 04;5(2):e11036. [doi: 10.2196/11036] [Medline: 31165711] 13. Majumder MS, Santillana M, Mekaru SR, McGinnis DP, Khan K, Brownstein JS. Utilizing nontraditional data sources for near real-time estimation of transmission dynamics during the 2015-2016 Colombian Zika virus disease outbreak. JMIR Public Health Surveill 2016 Jun 01;2(1):e30. [doi: 10.2196/publichealth.5814] [Medline: 27251981] 14. Santillana M, Nguyen AT, Dredze M, Paul MJ, Nsoesie EO, Brownstein JS. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput Biol 2015 Oct;11(10):e1004513. [doi: 10.1371/journal.pcbi.1004513] [Medline: 26513245] 15. Signorini A, Segre AM, Polgreen PM. The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS One 2011 May 04;6(5):e19467. [doi: 10.1371/journal.pone.0019467] [Medline: 21573238] 16. Nsoesie EO, Brownstein JS. Computational approaches to influenza surveillance: beyond timeliness. Cell Host Microbe 2015 Mar 11;17(3):275-278. [doi: 10.1016/j.chom.2015.02.004] [Medline: 25766284] 17. World Health Organization. Naming the coronavirus disease (COVID-19) and the virus that causes it URL: https://www. who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/ naming-the-coronavirus-disease-(covid-2019)-and-the-virus-that-causes-it [accessed 2020-02-26] 18. Chew C, Eysenbach G. Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak. PLoS One 2010 Nov 29;5(11):e14118. [doi: 10.1371/journal.pone.0014118] [Medline: 21124761] 19. Xinhua. 2019 Aug 20. Weibo reports robust Q2 user growth URL: http://www.xinhuanet.com/english/2019-08/20/ c_138323288.htm [accessed 2020-04-13] 20. Muniz-Rodriguez K, Chowell G, Cheung C, Jia D, Lai P, Lee Y, et al. Epidemic doubling time of the COVID-19 epidemic by Chinese province. medRxiv 2020 Feb 28. [doi: 10.1101/2020.02.05.20020750] 21. Wang S, Paul M, Dredze M. Exploring health topics in Chinese social media: an analysis of Sina Weibo. 2014 Presented at: Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence; 2014 June 18; Québec City, Québec. 22. Convertino J. ABC News. 2020 Mar 05. Social media companies partnering with health authorities to combat misinformation on coronavirus URL: https://abcnews.go.com/Technology/ social-media-companies-partnering-health-authorities-combat-misinformation/story?id=69389222 [accessed 2020-03-10] Abbreviations COVID-19: coronavirus disease MERS: Middle East respiratory syndrome http://publichealth.jmir.org/2020/2/e18700/ JMIR Public Health Surveill 2020 | vol. 6 | iss. 2 | e18700 | p. 9 (page number not for citation purposes) XSL• FO RenderX
JMIR PUBLIC HEALTH AND SURVEILLANCE Li et al PHEIC: Public Health Emergency of International Concern SARS: severe acute respiratory syndrome WHO: World Health Organization Edited by T Sanchez, G Eysenbach; submitted 12.03.20; peer-reviewed by A Daughton, X Huang; comments to author 07.04.20; revised version received 14.04.20; accepted 14.04.20; published 21.04.20 Please cite as: Li J, Xu Q, Cuomo R, Purushothaman V, Mackey T Data Mining and Content Analysis of the Chinese Social Media Platform Weibo During the Early COVID-19 Outbreak: Retrospective Observational Infoveillance Study JMIR Public Health Surveill 2020;6(2):e18700 URL: http://publichealth.jmir.org/2020/2/e18700/ doi: 10.2196/18700 PMID: ©Jiawei Ken Li, Qing Xu, Raphael Cuomo, Vidya Purushothaman, Tim Mackey. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 21.04.2020. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included. http://publichealth.jmir.org/2020/2/e18700/ JMIR Public Health Surveill 2020 | vol. 6 | iss. 2 | e18700 | p. 10 (page number not for citation purposes) XSL• FO RenderX
You can also read