COVID-19 and Arabic Twitter: How can Arab World Governments and Public Health Organizations Learn from Social Media? - OpenReview
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
COVID-19 and Arabic Twitter: How can Arab World Governments and Public Health Organizations Learn from Social Media? Lama Alsudias Paul Rayson King Saud University / Saudi Arabia Lancaster University / UK Lancaster University / UK p.rayson@lancaster.ac.uk lalsudias@ksu.edu.sa Abstract academic researchers in various fields including Natural Language Processing (NLP) have carried In March 2020, the World Health Organiza- out studies targetting this subject. For example, tion announced the COVID-19 outbreak as a pandemic. Most previous social media re- COVID-19 and AI1 is one of the conferences that lated research has been on English tweets and has been convened virtually to show how Artificial COVID-19. In this study, we collect approxi- Intelligence (AI) can contribute in helping the mately 1 million Arabic tweets from the Twit- Public Health Organizations during pandemics. ter streaming API related to COVID-19. Fo- People use social media applications such as cussing on outcomes that we believe will be Twitter to find the news related to COVID-19 useful for Public Health Organizations, we and/or express their opinions and feelings about analyse them in three different ways: identi- it. As a result, a vast amount of information could fying the topics discussed during the period, detecting rumours, and predicting the source be exploited by NLP researchers for a myriad of the tweets. We use the k-means algorithm of analyses despite the informal nature of social for the first goal with k=5. The topics dis- media writing style. cussed can be grouped as follows: COVID- 19 statistics, prayers for God, COVID-19 loca- We hypothesise that Public Health Organiza- tions, advise and education for prevention, and tions (PHOs) may benefit from mining the topics advertising. We sample 2000 tweets and label discussed between people during the pandemic. them manually for false information, correct information, and unrelated. Then, we apply This may help in understanding a population’s three different machine learning algorithms, current and changing concerns related to the Logistic Regression, Support Vector Classifi- disease and help to find the best solutions to protect cation, and Naı̈ve Bayes with two sets of fea- people. In addition, during an outbreak people tures, word frequency approach and word em- themselves will search online for reliable and beddings. We find that Machine Learning clas- trusted information related to the disease such as sifiers are able to correctly identify the ru- prevention and transmission pathways. COVID-19 mour related tweets with 84% accuracy. We Twitter conversations may not correlate with the also try to predict the source of the rumour re- lated tweets depending on our previous model actual disease epidemiology. Therefore, Public which is about classifying tweets into five cat- Health Organizations have a vested interest in en- egories: academic, media, government, health suring that information spread in the population is professional, and public. Around (60%) of the accurate (Vorovchenko et al., 2017). For instance, rumour related tweets are classified as written the Ministry of Health in Saudi Arabia2 presents by health professionals and academics. a daily press conference incorporating the aim of quickly stopping the spread of false rumours. 1 Introduction However, there is currently a prolonged period The current coronavirus disease (COVID-19) of time until warnings are issued. For example, outbreak is of major global concern and is the first tweet that included false information classified by the World Health Organization as about hot weather killing the virus was on 10 an international health emergency. Governments 1 https://hai.stanford.edu/events/ around the world have taken different decisions covid-19-and-ai-virtual-conference 2 in order to stop the spread of the disease. Many https://www.moh.gov.sa
February 2020, while the press conference which (2016). These studies followed a variety of di- responded to this rumour on 14 April 2020. There rections for analysis with multiple different goals is a clearly a need to find false information as (Joshi et al., 2019). The study of Ahmed et al. quickly as possible. In addition, effort needs to (2019) used a thematic analysis of tweets related be made in relation to tracking the user accounts to the H1N1 pandemic. Eight key themes emerged that promote rumours. This can be undertaken from the analysis: emotion, health related infor- using a variety of techniques, for example mation, general commentary and resources, media using social network features, geolocation, bot and health organisations, politics, country of origin, detection or content based approaches such as food, and humour and/or sarcasm. language style. Public Health Organizations would A survey study (Fung et al., 2016b) reviewed benefit from speeding up the process of tracking in the research relevant to the Ebola virus and social order to stop rumours and remove the bot networks. media. It compared research questions, study designs, data collection methods, and analytic The vast majority of the previous research in this methods. Ahmed et al. (2017b) used content area has been on English Twitter content but this analysis to identify the topics discussed on Twitter will not directly assist PHOs in Arabic speaking at the beginning of the 2014 Ebola epidemic in the countries. The Arabic language is spoken by 467 United States. In (Vorovchenko et al., 2017), they million people in the world and has more than 26 determined the geolocation of the Ebola tweets dialects3 . Of particular importance for NLP is and named the accounts that interacted more on coping with dialectal and/or meaning differences Twitter related to the 2014 West African Ebola in less formal settings such as social media. As an outbreak. The main goal of the study by Kalyanam example in health field, the word ( ám') may be et al. (2015) was to distinguish between credible understood as vaccination4 in Modern Standard and fake tweets. It highlighted the problems of Arabic or reading supplications in Najdi dialect5 . manual labeling process with verification needs. There has been much recent progress in Arabic The study in (Fung et al., 2016a) highlighted NLP research yet there is still an urgent need to how the problem of misinformation changed develop fake news detection for Arabic tweets during the disease outbreak and recommended (Mouty and Gazdar, 2018). a longitudinal study of information published on social media. Moreover, it pointed out the In this paper, we have combined qualitative and importance of understanding the source of this quantitative studies to analyse Arabic tweets aim- information and the process of spreading rumours ing to support Public Health Organizations who can in order to reduce their impact in the future. learn from social media data along various lines: • Analyzing the topics discussed between peo- Ghenai and Mejova (2017) tracked Zika Fever ple during the peak of COVID-19 misinformation on social media by comparing them with rumours identified by the World • Identifying and detecting the rumours related Health Organization. Also, they pointed out the to COVID-19. importance of credible information sources and • Predicting the type of sources of tweets about encouraged collaboration between researchers COVID-19. and health organizations to rapidly process the misinformation related to health on social media. 2 Related Work The study in (Ortiz-Martı́nez and Jiménez-Arcia, 2017) reviewed the quality of available yellow There is a vast quantity of research over recent fever information on Twitter. It also showed years that analyses social media data related to the significance of the awareness of misleading different pandemics such as H1N1 (2009), Ebola information during pandemic spread. The study of (2014), Zika Fever (2015), and Yellow Fever Zubiaga et al. (2018a) summarised other studies 3 https://en.wikipedia.org/wiki/Arabic related to social media rumours. It illustrated 4 https://en.wikipedia.org/wiki/Modern_ techniques for developing rumour detection, Standard_Arabic 5 https://en.wikipedia.org/wiki/Najdi_ rumour tracking, rumour stance classification, Arabic and rumour veracity classification. Vorovchenko
et al. (2017) mentioned the importance of Twitter 2020), which integrates the scientific and medical information during the epidemic and how Public vocabularies of infectious diseases with their in- Health Organisations can benefit from this. They formal equivalents used in general discourse. We showed the requirement to monitor false informa- collated COVID-19 information from the World tion posted by some accounts and recommended Health Organization6 and Ministry of health in that this was performed in real time to reduce the Saudi Arabia. This included symptom, cause, danger of this information. Also, they discussed prevention, infection, organ, treatment, diagnosis, the lack of available datasets which help in the place of the disease spread, and slang terms for development of rumour classification systems. COVID-19 and extended our ontology7 . These terms were then used in our collection process. Researchers have been doing studies on building and analysing COVID-19 Twitter datasets since 4 Data Collection the disease appeared in December 2019. So We began collecting Arabic tweets about a number far, there are two different datasets which have of infectious diseases from September 2019. Here been published recently related to Arabic and in this paper, we analysed only the tweets related COVID-19 (Alqurashi et al., 2020) and (Haouari to COVID-19 from December 2019 to April et al., 2020). The former collected 3,934,610 2020 (there are a few tweets between September million tweets until April 15 2020, and the latter and November, these are related to Middle East included around 748k tweets until March 31, respiratory syndrome coronavirus, MERS-CoV8 ). 2020. These papers contain an initial analysis We have collected approximately six million and statistical results for the collected tweets and tweets in Arabic during this period. We obtained some suggestions for future work, which include the tweets depending on three keywords ( AKðPñ», pandemic response, behavior analysis, emergency management, misinformation detection, and social AKðQ», 19 YJ¯ ñ») which mean Coronavirus, a analytics. misspelling of the name of Coronavirus, and On the other hand, there are some datasets in En- COVID-19 respectively in English. We collected glish such as (Chen et al., 2020) and (Lopez et al., the tweets weekly using Twitter API. 2020). Also, there is a multilingual COVID-19 dataset containing location information (Qazi Next, we pre-processed the tweets through a et al., 2020). This contains more than 524 million pipeline of different steps: tweets, with 5.5 million Arabic tweets, posted over • Manually remove retweets, advertisements, a period of three months since February 1, 2020. It and spam. focuses on determining the geolocation of a tweet which can help research with various different • Filter out URLs, mentions, hashtags, numbers, challenges, including identifying rumours. emojis, repeating characters, and non-Arabic words using Python scripts9 . Although the above studies have produced datasets related to COVID-19, they do not anal- • Normalize and tokenize tweets. yse them deeply using NLP methods. Previous • Remove Arabic stopwords (Alrefaie, 2017). studies representing earlier epidemics present good techniques and results, however none of them are After pre-processing, the resulting dataset related to Arabic tweets. Therefore, to assist PHOs was 1,048,575 unique tweets from the original in Arabic speaking countries there is an urgent need 6,578,982 collected. Figure 1 shows the number of to analyse tweets related to COVID-19 using mul- Arabic tweets about Coronavirus each week with tiple Arabic NLP techniques. specific dates highlighted to show government deci- sions on protecting the population from COVID-19 3 Update Arabic Infectious Disease and other key dates for context. Ontology 6 http://www.emro.who.int 7 With the recent appearance of COVID-19 as a https://github.com/alsudias/ Arabic-Infectious-Disease-Ontology new disease, there is need to update our Arabic 8 https://www.who.int/ Infectious Disease Ontology (Alsudias and Rayson, news-room/fact-sheets/detail/
Figure 1: Number of Arabic tweets about Coronavirus 5 Methods of the one million tweets, we sampled 2,000 tweets to classify them for rumour detection. We manu- We performed three different types of analysis on ally labelled the tweets to create a gold standard the collected data. Firstly, in order to better under- dataset and then applied different machine learning stand the topics discussed in the corpus, we carried algorithms in this part of our study. out a cluster analysis. Secondly, taking a sample of the corpus, we performed rumour detection. Fi- 5.2.1 Labelling Guidelines nally, we extended our previous work to classify We manually labelled the tweets with 1, -1 , and 0 the source of tweets into five types of Twitter users to represent false information, correct information, which aims at helping to determine their veracity. and unrelated content, respectively. Our reference point for deciding whether the content contained 5.1 Cluster Analysis true or false information was based on the list is- To explore the topics discussed on Twitter dur- sued by the Ministry of Health in Saudi Arabia 10 ing the COVID-19 epidemic in Saudi Arabia and and is regularly updated (the last update applied other countries in the Arab World, we subjected for this study dates from 14 April 2020). Table 1 the text of the tweets to cluster analysis. After presents the list in both Arabic and English and pre-processing the tweets as described above, we Table 2 shows some example tweets for each label. used the N-gram forms (unigram, bigram, and tri- gram) of twitter corpus and clustered them using the K-means algorithm with the Python Scikit-learn 5.2.2 Machine Learning Models 0.20.2 (Pedregosa et al., 2011) software and set the We applied three different machine learning algo- value of k, the number of clusters, to be five. rithms: Logistic Regression (LR), Support Vector Classification (SVC), and Naı̈ve Bayes (NB). To 5.2 Rumour Detection help the classifier distinguish between the classes Following previous work (Zubiaga et al., 2018b), more accurately, we extracted further linguistic fea- we applied a top-down strategy, which is where the tures. The selected features fall into two groups: set of rumours is identified in advance then the data word frequency, count vector and TF-IDF, and is sampled to extract the posts associated with the word embedding based (Word2Vec and FastText). previously identified rumours. In our dataset, out We used 10-fold cross validation to determine accu- racy of the classifiers for this dataset, splitting the middle-east-respiratory-syndrome-coronavirus-(mers-cov) 9 10 https://github.com/alsudias https://www.moh.gov.sa
Rumour in Arabic Rumour in English .AKðPñ» ð Q¯ É® JK é®JËB @ HA K@ñJmÌ '@ Pets are transporters of Coronavirus. .AKðPñºË ɯA K ñªJ.Ë@ Mosquitoes are transporters of Coron- avirus. .AKðPñ» ð Q®K. àñK. A B Y¯ ÈA®£B @ Children are not infected by Coron- avirus. .Q£AjÖÏ àñQªJK Y¯ ¡® ¯ áË@ PAJ.» Only old people may have a high risk .AKðPñ» ð Q¯ áÓ éJ of Coronavirus. .ð Q®Ë@ úΫ úæ® K éKXðQK. ð @ ®¢Ë@ èP@Qk Hot or cold weather can kill the virus. .ð Q®Ë@ úΫ úæ® K iÊÖÏ @ ð ZAÜÏ AK. èQ«QªË@ Gargling with water and salt eliminates the virus. .AKðPñºË@ áÓ ù® K H. A« @ ð HA ¢Êg Yg. ñK There are some herbs that protect against from Coronavirus. .i¢ B@ úΫ ù®J . K B ð Q®Ë@ The virus does not survive on surfaces. Table 1: List of rumours that appear during COVID-19 (source: Saudi Arabia Ministry of Health) Tweet in Arabic Tweet in English Label AKðPñ» ð Q¯ PA KB PAm' @ ¼AJë àñºJ There will be a decrease in the 1 úG. QªË@ ÕËAªË@ ú¯ Añk JË@ ɯ éK @ YK. ©Ó spread of the Corona virus at the (false) beginning of the summer, espe- . èP@QmÌ '@ ék PX ¨A®KPB @Q¢ . cially in the Arab world, due to the high temperatures. ú¯ AB AK. QºKQK ð ð Q®Ë@ ªK : éjË@ The Ministry of Health: A virus -1 lives and is mainly concentrated (true) K Q£ á« éËA® JK@ XP@ð Q« ½Ë YË úæ®JJË@ PAêm.Ì '@ in the respiratory system, so it . ñªJ.Ë@ é«YË ÈCg áÓ ð @ H@ QåmÌ '@ is not likely to be transmitted by insects or by mosquito bites. Ë@ ø Yë ú¯ ÑêÊË@ AJÔgQK à@ ½Ë A é»PA J.ÖÏ @ é«A Oh God, in this blessed hour, 0 We ask you to have mercy on us (unre- @QÓB @ Qå AJ¯ ð ZCK. ð Z@ X É¿ AJ« YªJ.K ð and keep away from us all dis- lated) . áÒÊÖÏ @ XCK. é¯A¿ ð AKXCK. ¡®k@ð . ÐA®B @ð ease and calamity, and protect us from the evil of diseases and sick- nesses. Preserve our country and other Muslim countries. Table 2: Example tweets and our labelling system entire sample into 90% training and 10% testing because it previously achieved the best accuracy for each fold. (77%), and employed it here to predict the source of the COVID-19 tweets that we had already labelled 5.3 Source Type Prediction in Section 5.2. We replicated a Logic Regression model from our 6 Results and Discussion previous study, which was useful for classifying tweets into five categories: academic, media, gov- 6.1 Cluster Analysis ernment, health professional, and public (Alsu- Our cluster analysis of the five main public topics dias and Rayson, 2019). We used this LR model discussed in tweets content is as follows: (1)
disease statistics: the number of infected, died, Mejova (2017). and recovered people; (2) prayers: prayer asking God to stop virus; (3) disease locations: spread Figures 3, 4, and 5 show the accuracy, F1-score, and location (i.e., name of locations, information recall, and precision on our corpus using LR, SVC, about spread); (4) Advise for prevention education: and NB algorithms with various feature selection health information (i.e., prevention methods, signs, approaches. The highest accuracy (84.03%) was symptoms); and (5) advertising: adverts for any achieved by the LR classifier with a count vector product either related or not related to the virus. set of features and SVC with TF-IDF. Therefore, Figure 2 illustrates the five topics of COVID-19 the count vector gives best result in LG and NB in tweets with examples. all metrics results whereas with precision which achieves better results with TF-IDF 83.71% in LG For each cluster, the top terms by frequency are and 81.28% in NB. while TF-IDF in SVC has the as follows: (1) disease statistics: case ( éËAg), new best results except recall which achieved 75.55% ( èYK Yg.), and infection ( éK. A@); (2) prayers: Allah in count vector set features. We also applied several word embedding based ( é
Figure 2: Examples of tweets in each cluster Tweet in Arabic Tweet in English Predicted Label ð Q®Ë@ PAKYK@ ©¯ñ JK ... éJ ÒÊ« èZ@Q¯ ú¯ In scientific reading ... the virus is Academic èP@QmÌ '@ I ÉK QK @ ú¯ expected to erode in April due to heat. .. . . àB @ úæk èY»ñÜÏ @ HB AmÌ '@ éjË@ JÓ HYj Health spokesman has confirmed cases Media áªËAJ.Ë AêÒªÓ ð AKðPñ» ð Q®K. éK . AÖÏ @ so far infected with coronavirus, mostly for adults. ð Q®Ë ɯA JË@ ñë , ñªJ.Ë@ @ñP èP@Pð AK Ministry of health please spray Government mosquitoes, as they are carriers of the ñªJ.Ë@ PA K@ ©Ó HA K. AB @ HX@ P AKðPñ» Coronavirus, increased infections as mosquitoes spread. ZAÜÏ @ PAm'. A J@ à@ Y»ñK úæJ QJ.k A Chinese expert confirms that inhal- Health ing water vapor kills coronavirus. professional AKðPñ» ð Q¯ ÉJ®K ÐñJË@ ð àñÒJÊËAK. AKðQºË@ h. C« Corona treatment with lemon and gar- Public ”H lic ”YouTube link”. . ñJKñK ¡. @P” Table 3: Some examples of false tweets from different source predicted labels 7 Conclusion and Future work methods of analysis suitable for helping Arab World Governments and Public Health Organi- In this paper, we identified and analysed one sations. Our analysis first identifies the topics million tweets related to the COVID-19 pandemic discussed on social media during the epidemic, in the Arabic language. We performed three detects the tweets that contain false information, experiments which we expect can help to develop
highest accuracy result was 84% achieved by the LR classifier with count vector set of features and SVC with TF-IDF. Finally, we also used our previous model to predict the source types of the sampled tweets. Around 60% of the rumour related tweets are classified as written by health professional and academics which shows the urgent need to respond to such fake news. The dataset, including tweet IDs, manually assigned labels for the sampled Figure 3: Results using Logistic Regression tweets, and other resources used in this paper are made freely available for academic research purposes 11 . There are clearly many potential future direc- tions related to analysing social media data on the topics of pandemics. Since false information has the potential to play a dangerous role in topics related to health, there is a need to enhance and automate the automatic detection process support- ing different languages beyond just English. Future potential directions include monitoring the spread of the disease by finding the infected individuals, Figure 4: Results using Support Vector Classification defining the infected locations, or observing people that do not apply self isolation rules. Moreover, the analysis could proceed in an exploratory and thematic way such as discovering further topics discussed during the epidemic, as well as assist- ing governments and public health organisations in measuring people’s concerns resulting from the disease. References Wasim Ahmed, Peter A Bath, Laura Sbaffi, and Gian- luca Demartini. 2019. Novel insights into views to- wards H1N1 during the 2009 Pandemic: a thematic Figure 5: Results using Naı̈ve Bayes analysis of Twitter data. Health Information & Li- braries Journal, 36(1):60–72. and predicts the source of the rumour related Wasim Ahmed, Gianluca Demaerini, and Peter A Bath. tweets based on our previous model for other 2017a. Topics discussed on twitter at the beginning diseases. The clustered topics are COVID-19 of the 2014 Ebola epidemic in United States. iCon- ference 2017 Proceedings. statistics, prayers for God, COVID-19 locations, advise for preventing education, and advertising. Wasim Ahmed, G. Demartini, and P. Bath. 2017b. Top- ics Discussed on Twitter at the Beginning of the Our second contribution is a labeled sample of 2014 Ebola Epidemic in United States. tweets (2,000 out of 1 million) annotated for false Sarah Alqurashi, Ahmad Alhindi, and Eisa Alanazi. information, correct information, and unrelated. To 2020. Large arabic twitter dataset on covid-19. investigate the replicability and scalability of this arXiv preprint arXiv:2004.04315. annotation, we applied multiple Machine Learning 11 https://doi.org/10.17635/lancaster/ Algorithms with different sets of features. The researchdata/375
Mohamed Taher Alrefaie. 2017. arabic stop Edward Ma. 2018. 3 basic approaches in Bag of Words words. https://github.com/mohataher/ which are better than Word Embeddings. arabic-stop-words. Rabeaa Mouty and Achraf Gazdar. 2018. Survey on Lama Alsudias and Paul Rayson. 2019. Classifying in- Steps of Truth Detection on Arabic Tweets. In formation sources in Arabic twitter to support online 2018 21st Saudi Computer Society National Com- monitoring of infectious diseases. In Proceedings puter Conference (NCC), pages 1–6. IEEE. of the 3rd Workshop on Arabic Corpus Linguistics, pages 22–30, Cardiff, United Kingdom. Association Michelle Odlum and Sunmoo Yoon. 2015. What can for Computational Linguistics. we learn about the Ebola outbreak from tweets? American journal of infection control, 43(6):563– Lama Alsudias and Paul Rayson. 2020. Developing 571. an Arabic Infectious Disease Ontology to Include Non-Standard Terminology. In Proceedings of The Yeimer Ortiz-Martı́nez and Luisa F Jiménez-Arcia. 12th Language Resources and Evaluation Confer- 2017. Yellow fever outbreaks and Twitter: Rumors ence, pages 4844–4852, Marseille, France. Euro- and misinformation. American journal of infection pean Language Resources Association. control, 45(7):816–817. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Fabian Pedregosa, Gaël. Varoquaux, Alexandre Gram- Tomas Mikolov. 2017. Enriching word vectors with fort, Vincent Michel, Bertrand Thirion, Olivier subword information. Transactions of the Associa- Grisel, Mathieu Blondel, Peter Prettenhofer, Ron tion for Computational Linguistics, 5:135–146. Weiss, Vincent Dubourg, Jake Vanderplas, Alexan- dre Passos, David Cournapeau, Matthieu Brucher, Emily Chen, Kristina Lerman, and Emilio Ferrara. Matthieu Perrot, and Édouard Duchesnay. 2011. 2020. Covid-19: The first public coronavirus twit- Scikit-learn: Machine Learning in Python. Journal ter dataset. arXiv preprint arXiv:2003.07372. of Machine Learning Research, 12:2825–2830. I.C.-H Fung, King-wa Fu, C.-H Chan, Benedict Chan, Chi-Ngai Cheung, Thomas Abraham, and Zion Tse. Umair Qazi, Muhammad Imran, and Ferda Ofli. 2020. 2016a. Social Media’s Initial Reaction to Informa- GeoCoV19: A Dataset of Hundreds of Millions of tion and Misinformation on Ebola, August 2014: Multilingual COVID-19 Tweets with Location Infor- Facts and Rumors. Public Health Reports, 131:461– mation. arXiv preprint arXiv:2005.11177. 473. Tatiana Vorovchenko, Proochista Ariana, Francois van Isaac Chun-Hai Fung, Carmen Hope Duke, Loggerenberg, and Amirian Pouria. 2017. Ebola Kathryn Cameron Finch, Kassandra Renee Snook, and Twitter. What Insights Can Global Health Draw Pei-Ling Tseng, Ana Cristina Hernandez, Manoj from Social Media?, pages 85–98. Gambhir, King-Wa Fu, and Zion Tsz Ho Tse. 2016b. Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Ebola virus disease and social media: A systematic Maria Liakata, and Rob Procter. 2018a. Detection review. American journal of infection control, and Resolution of Rumours in Social Media: A Sur- 44(12):1660—1671. vey. ACM Comput. Surv., 51(2). Amira Ghenai and Yelena Mejova. 2017. Catching Zika fever: Application of crowdsourcing and ma- Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, chine learning for tracking health misinformation on Maria Liakata, and Rob Procter. 2018b. Detection Twitter. arXiv preprint arXiv:1707.03778. and resolution of rumours in social media: A survey. ACM Computing Surveys (CSUR), 51(2):1–36. Fatima Haouari, Maram Hasanain, Reem Suwaileh, and Tamer Elsayed. 2020. The First Arabic COVID-19 Twitter Dataset with Propagation Net- works. arXiv preprint arXiv:2004.05861. Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cécile Paris, and C Raina Macintyre. 2019. Survey of Text-based Epidemic Intelligence: A Computational Linguistics Perspective. ACM Computing Surveys (CSUR), 52(6):1–19. Janani Kalyanam, Sumithra Velupillai, Son Doan, Mike Conway, and Gert R. G. Lanckriet. 2015. Facts and Fabrications about Ebola: A Twitter Based Study. ArXiv, abs/1508.02079. Christian E Lopez, Malolan Vasu, and Caleb Galle- more. 2020. Understanding the perception of COVID-19 policies by mining a multilanguage Twit- ter dataset. arXiv preprint arXiv:2003.10359.
You can also read