Both Rates of Fake News and Fact-based News on Twitter Negatively Correlate with the State-level COVID-19 Vaccine Uptake
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Both Rates of Fake News and Fact-based News on Twitter Negatively Correlate with the State-level COVID-19 Vaccine Uptake Hanjia Lyu, 1 Zihe Zheng, 1 Jiebo Luo 2,* 1 University of Rochester, Goergen Institute for Data Science, Rochester, 14627, USA 2 University of Rochester, Department of Computer Science, Rochester, 14627, USA * Corresponding author: Jiebo Luo (jluo@cs.rochester.edu) Abstract Data Collection arXiv:2106.07435v1 [cs.SI] 14 Jun 2021 Twitter Data There is evidence of misinformation in the online discourses We use the Tweepy API1 to collect the related tweets and discussions about the COVID-19 vaccines. Using a sam- ple of 1.6 million geotagged English tweets and the data from that are publicly available. The search keywords and hash- the CDC COVID Data Tracker, we conduct a quantitative tags are COVID-19 vaccine-related or vaccine-related, in- study to understand the influence of both misinformation and cluding “vaccine”, “COVID-19 vaccine”, “COVID vac- fact-based news on Twitter on the COVID-19 vaccine uptake cine”, “COVID19 vaccine”, “vaccinated”, “immunization”, in the U.S. from April 19 when U.S. adults were vaccine eligi- “covidvaccine”, “#vaccine” and “covid19vaccine”.2 Slang ble to May 7, 2021, after controlling state-level factors such and misspellings of the related keywords are also included as demographics, education, and the pandemic severity. We which are composed of “vacinne”, “vacine”, “antivax” and identify the tweets related to either misinformation or fact- “anti vax”. The tweets that are only related to other vaccine based news by analyzing the URLs. By analyzing the con- topics like MMR, autism, HPV, tuberculosis, tetanus, hep- tent of the most frequent tweets of these two groups, we find atitis B, flu shot or flu vaccine are removed. Moreover, since that their structures are similar, making it difficult for Twit- ter users to distinguish one from another by reading the text this study focuses on the tweets posted by the U.S. Twitter alone. The users who spread both fake news and fact-based users, we use the geo-location disclosed in the users’ pro- news tend to show a negative attitude towards the vaccines. files to filter out the tweets of non-US users. Similar to Lyu We further conduct the Fama-MacBeth regression with the et al. (2020), the locations with noise are excluded. 1.6 mil- Newey-West adjustment to examine the effect of fake-news- lion geotagged tweets as well as the retweets posted from related and fact-related tweets on the vaccination rate, and April 12, 2021 to May 7, 2021 are collected. find marginally negative correlations. CDC COVID-19 Data The daily state-level number of people with at least one dose, confirmed cases, and deaths per hundred are extracted from Introduction the CDC COVID Data Tracker (Centers for Disease Control and Prevention 2021). A previous study (Wu, Lyu, and Luo 2021) has found evi- dence of misinformation in the online discourses and discus- Census Data sions about the COVID-19 vaccines. Rzymski et al. (2021) From the latest American Community Survey 5-Year Data suggested tracking and tackling emerging and circulating (2015-2019) (U.S. Census Bureau 2020), we collect (1) fake news. Montagni et al. (2021) argued to increase peo- the percentage of male persons; (2) the percentage of per- ple’s ability to detect fake news. Marco-Franco et al. (2021) sons aged 65 years and over; (3) the percentage of White found that citizens do not support the involvement of govern- alone, not Hispanic or Latino; (4) the percent- ment authorities in the direct control of news. Collaboration age of Black or African American alone; (5) with the media and other organizations should be used in- the percentage of Asian alone; (6) the percentage of stead. However, little is known about the scale and scope of Hispanic or Latino; (7) the percentage of persons the influence of misinformation and fact-based news about aged 25 years and over with a Bachelor’s degree or higher; COVID-19 vaccines on social media platforms on the vac- (8) the percentage of persons in the labour force (16 years cine uptake. To summarize, this work (1) quantitatively an- and over); and (9) per capita income in the past 12 months alyzes the effect of fake news and fact-based news on the (in 2019 dollars). vaccine uptake in the U.S. using the Fama-MacBeth regres- sion with the Newey-West adjustment and (2) compares the 1 https://www.tweepy.org/ [Accessed June 9, 2021] most frequent fake-news-related and fact-related tweets by 2 The capitalization of non-hastag keywords does not matter in conducting a content analysis. the Tweepy query.
2020 National Popular Vote Data approaches the lowest on weekends. Therefore, we apply a The results of the 2020 national popular vote (Wasserman 7-day moving average to the vaccination data. To maintain et al. 2020) are used to estimate the political affiliation of the consistency, the number of confirmed cases and deaths individual states. Since the sums of the shares of Biden and are processed in the same way. the shares of Trump are almost equal to 100%, we only se- As for the Twitter data, since Twitter users can post tweets lect the shares of Biden. To keep the consistency among the repeatedly, the series of (1) the percentage of unique Twitter variables, the state-level shares are chosen. users who post fake-news-related tweets, and (2) the per- centage of unique Twitter users who post fact-related tweets Methodology are only processed with a 7-day moving average. Tweets Classification Study Period Following the method of Bovet and Makse (2019), we in- This study focuses on understanding the influence of misin- tend to classify the tweets into (1) fake-news-related, (2) formation about COVID-19 vaccines on Twitter on the vac- fact-related, and (3) others, by examining the URLs (if cine uptake. The CDC vaccination data may not reflect the any) of the tweets. More specifically, if the URL’s domain intention to receive vaccination when the vaccines were not name is judged to be related to the websites containing fake available for all the U.S. adults. We thus set the study pe- news, conspiracy theories, or extremely biased news, the riod to be from April 19, 2021 to May 7, 2021. April 19 is tweets that are associated with (i.e., contain/retweet/quote) selected as the start date because, according to the Reuters4 , this URL are classified as fake-news-related. If the URL’s the U.S. President Joe Biden moved up the COVID-19 vac- domain name is judged to be related to the websites that are cine eligibility target for all American adults to April 19. traditional, fact-based, news outlets, the tweets that are as- sociated with this URL are classified as fact-related. If the Fama-MacBeth Regression tweets are not associated with any URLs or the URLs’s do- main names are not identified as fake-news-related or fact- In our study, we attempt to analyze five time series data, but related, the tweets are classified as others. most of them are non-stationary. For example, the time se- ries of the vaccination data show a declining trend during Websites Classification The curated list of fake-news- our study period. Noticeably, the vaccination data, at this related websites, composed of 1,125 unique domain names, stage, has already been transformed into a relative change was built by the Columbia Journalism Review.3 They built rate. To avoid the spurious regression problem, which might the list by merging the major curated fake-news site lists pro- lead to a incorrectly estimated linear relationship between vided by fact-checking groups like PolitiFact, FactCheck, non-stationary time series variables (Kao 1999), we con- OpenSources, and Snopes. The domain names that were as- duct the Fama-MacBeth regression (Fama and MacBeth signed as fake, conspiracy, bias, and unreliable are included 2021) with the Newey-West adjustment (lag=2) (Newey and in our study. West 1986), which has also been applied in several previ- The curated list of fact-related websites, composed of 77 ous studies to address the time effect in areas such as fi- unique domain names, was reported by Bovet and Makse nance (Loughran and Ritter 1996), public health and epi- (2019). They identified the most important traditional news demiology (Wang et al. 2021). Apart from the time series outlets by manually inspecting the list of top 250 URLs’ do- data, we add control variables from the aforementioned data main names. sources including the Census data and the 2020 National Popular Vote data. Extracting Domain Names The tweets with no URLs are classified as others. For the rest, we compare the domain names of the URLs of the tweets with the aforementioned Results curated lists. Similarly to Bovet and Makse (2019), most News Spreading on Twitter URLs are shortened. We use the Python Requests package There are 616 unique Twitter users who are associated with to open the URLs and extract the actual domain names from fake news, while 11,948 that are associated with fact-based the complete URLs. news. Interestingly, 184 are associated with both fake news and fact-based news, which account for 29.9% and 1.5% of Preprocessing the fake-news-related users and fact-related users, respec- The daily state-level number of people with at least one dose, tively. This suggests that people who are associated with confirmed cases, deaths per hundred are transformed using fake news are more likely to be associated with fact-based a two-step procedure. First, we calculate the relative change news, but not the other way around. rates of these three variable. Next, we smooth the data us- The state-level percentages of fake-news-related and fact- ing a simple moving average. According to the CDC vac- related Twitter users are presented in Figure 1. The states cination data (Centers for Disease Control and Prevention without sufficient data (at least 300 unique Twitter users) 2021), there is a seasonal pattern inside the daily number of 4 people with at least one dose. The number normally reaches https://www.reuters.com/article/us-health-coronavirus- the highest in the middle of the week (i.e., Thursdays), and usa/all-american-adults-to-be-eligible-for-covid-19-vaccine-by- april-19-biden-idUKKBN2BT1IF?edition-redirect=uk [Accessed 3 https://www.cjr.org/fake-beta [Accessed June 9, 2021] June 9, 2021]
Percentage of Twitter Users Associated with Tweets Linked to Fake News quoted by the users who are related to both fake news and Percent fact-based news in Table 1. It is interesting that the news 0.0045 that are spread by this group of users is mostly negative 0.004 about the COVID-19 vaccines. Moreover, when these users 0.0035 post tweets linked to fact-based news, they are most likely 0.003 to show a negative attitude towards the vaccines. For exam- 0.0025 ple, they argue “Pure evil.” to a piece of news with the title 0.002 0.0015 “Children as young as 6 months old now in COVID-19 vac- 0.001 cine trials” written by ABC News.5 0.0005 Percentage of Population Receiving at Least One Dose Percent 65 (a) 60 Percentage of Twitter Users Associated with Tweets Linked to Fact-based News Outlets 55 Percent 50 0.035 45 0.03 40 35 0.025 0.02 Figure 2: Percent of the population receiving at least one dose. (b) Figure 1: (a) Percentage of the Twitter users associated with Fake News, Fact-based News, and Vaccination Rate tweets linked to fake news. (b) Percentage of the Twitter users associated with tweets linked to traditional news out- We conduct the Fama-MacBeth regression with the Newey- lets. West adjustment (lag=2) of the 7-day average relative change of the number of people receiving at least one dose per hundred, on the 7-day average percentages of unique fake-news-related and fact-related Twitter users, during the are plotted in grey. The states with relatively higher rates of period when all U.S. adults are eligible for COVID-19 vac- fake-news-related users are located spatially sparsely. How- cines, while controlling other factors. Figure 2 shows the ever, the states with similar rates of fact-related users seem vaccination rates for different U.S. states as of May 7, 2021. to cluster together. Interestingly, Maine, Louisiana and Mon- Table 2 summarizes the results of the Fama-MacBeth re- tana are the states with relatively lower rates of fact-related gression, which suggest marginal effects of the fake news users but higher fake-news-related users. and fact-based news on the vaccination rate. Overall, the Table 1 lists the top 10 most frequent tweets linked to percentages of fake-news-related and fact-based new-related fake and fact-based news websites. The top 10 most frequent Twitter users are negatively associated with the relative fake-news-related tweets account for 19.9% of the number change of vaccination rates: 1 percent increase in fake-news- of the total fake-news-related tweets, while the top 10 of the related Twitter users is negatively associated with the rel- fact-related tweets comprise of 28.1%, which indicates that ative change of the vaccination rate (B = −0.13, SE = the distribution of the fake news on Twitter is more even 0.07, p < .1); 1 percent increase in fact-related Twitter users compared to the fact-based news. This is consistent with the is negatively associated with the relative change of the vac- pattern of the fake and fact-related tweets during the 2016 cination rate (B = −0.04, SE = 0.02, p < .1). Even the presidential election (Bovet and Makse 2019). By reading percentage of fact-related Twitter users is almost 20 times the text alone, it is difficult to distinguish the fake news from the percentage of fake-news-related Twitter users, the coef- the fact-based news. The structures of these two types of ficient of the fake-news-related user rate is greater. news are similar. For example, capitalizing all the letters of the first word is observed in both group. In addition, the text Control Variables of both groups describes actions or quotes sentences from political figures and celebrities. Some of the control variables are significantly associated with the vaccination rate. Demographically, 1 percent in- As aforementioned, there is a group of Twitter users who crease in the fraction of male population is negatively as- are involved with spreading both fake news and fact-based news. To better understand the difference of the fake-news- 5 https://abcnews.go.com/US/children-young-months-now- related and fact-related tweets, we further highlight in bold covid-19-vaccine-trials/story?id=77353416 [Accessed June 9, the tweets that are most frequently posted, retweeted or 2021]
Table 1: Top 10 most frequent tweets linked to fake/fact-based news websites. Fake Fact-based BREAKING: U.S. health officials say fully vaccinated Americans Will Officially Need A Vaccine Passport For Travel Americans don’t need to wear masks outdoors anymore unless To Europe In 2021 they are in a big crowd of strangers. Cuomo Announces Separate Baseball Stadium Seating For JUST IN: All adults in US now eligible for COVID-19 vaccine Vaccinated And Unvaccinated Fans Joe Rogan Says Not To Get Vaccinated If You’re Young, Sparks Herpes infection possibly linked to COVID-19 vaccine, Massive Backlash study says These poll numbers prove it: People who refuse to wear masks Vaccination Site Denying Appointments To White People In or respect social distance are also refusing vaccination. They’re The Name Of ‘Equity’ holding us back from resuming normal life. In fact, they’re counting on the rest of us to get vaccinated so they don’t have to. WATCH: Fauci calls out GOP Sen. Johnson’s questioning of ANALYSIS: After A Year Of Being Wrong, Experts Confused vaccine effort: "How can anyone say that 567,000 dead About ‘Vaccine Hesitancy’ Americans is not an emergency?" BREAKING: Vaccinated people can ditch the mask outdoors Shh — The Media Doesn’t Seem To Want You To Know in many cases, CDC says—but should still wear one in COVID-19 Cases Are Plummeting Nationally crowds and indoor public spaces. Has general population been enrolled in a vast and unimaginably U.S. health officials conclude that it was anxiety, and not a dangerous phase-three clinical trial without legal informed problem with the coronavirus vaccine, that caused apparent consent? This is a criminal enterprise, the likes of which this reactions in dozens of people this month. Basically, some world has never seen before... people get so upset by injections that their anxiety spurs #Vaccine #Pfizer #Moderna #COVID19 physical symptoms. It’s Not Working: 86 Million Vaccinated, yet Daily Covid-19 Joe Rogan tells 21-year-olds not to get COVID vaccine on Cases Are the Same as They Were in February popular Spotify show Ted Nugent tests positive for COVID-19 after refusing vaccine, Rand Paul has a brutal message for Joe Biden. via falsely claiming “nobody knows what’s in it” "On the Montana side of the border, vaccine recipients were often ‘Follow The CCD Guidelines’ And ‘Visit Vaccines.Gum’: emotional, shedding tears, shouting words of gratitude through car President Biden Gaffes His Way Through Press Conference windows as they drove away, and handing the nurses gifts such as chocolate and clothing."
Table 2: The results of the Fama-MacBeth regression with tweets linked to the fake news and fact-based news, we find the Newey-West adjustment (lag=2). that their structures are similar, which makes it difficult to distinguish one from another by reading the text alone. Fur- Variable B SE t-stat p value thermore, by examining the most frequent tweets that are posted by the users who are identified as spreading both Fake news -0.13 0.07 -1.85 0.081 fake news and fact-based news, we find that these overlap- Fact-based news -0.04 0.02 -1.91 0.072 ping users tend to post negative news about COVID-19 vac- Male -0.11 0.03 -3.90 0.001 cines, and even the news is neutral, they are likely to show 65 years and over -0.01 0.01 -0.80 0.432 a negative attitude. It seems that it is the negativity inside White 0.00 0.00 2.08 0.052 the news instead of the authenticity of the news that is as- Black -0.00 0.00 -0.74 0.469 sociated with the vaccination rates. Future work could in- Asian 0.04 0.01 5.20 0.000 vestigate the causal relationship between the news and the Hispanic 0.00 0.00 0.95 0.353 vaccine uptake – whether the users who do not intend to Bachelor 0.00 0.00 7.42 0.000 take the vaccines tend to spread negative news or they read Labor 2.55e-5 4.50e-5 0.57 0.578 negative news and become resistant to the vaccines. More- Income -2.68e-7 4.90e-8 -5.47 0.000 over, this work employs a method to identify fake news and Confirmed cases 0.36 0.06 6.14 0.000 fact-based news only using the URLs, which could poten- Deaths -0.07 0.09 -0.82 0.423 tially cause a sample bias. However, one of the advantages of Biden shares 1.31e-4 1.36e-5 9.57 0.000 this approach over other text-based machine learning or deep const 0.05 0.01 3.96 0.000 learning methods (Jin et al. 2016, 2017b,a) is its high preci- sion rate. In the future, we intend to combine these methods to detect fake news more reliably. sociated with the relative change of the vaccination rate (B = −0.11, SE = 0.03, p < .01). No statistically signifi- cant relationship is found between the percentage of persons References aged 65 and over, which is within our expectation, since this Bovet, A.; and Makse, H. A. 2019. Influence of fake news demographic group is among the first batches who are el- in Twitter during the 2016 US presidential election. Nature igible for the COVID-19 vaccines in the U.S. By the time communications 10(1): 1–14. of our study period, over 78% of the people aged 65 years Centers for Disease Control and Prevention. 2021. COVID and over have received at least one dose (Centers for Disease Data Tracker . Control and Prevention 2021). As for the race and ethnicity, the percentage of White alone, not Hispanic or Fama, E. F.; and MacBeth, J. D. 2021. Risk, return, and Latino (B = 0.00, SE = 0.00, p < .1) and the percent- equilibrium empirical tests. University of Chicago Press. age of Asian alone (B = 0.04, SE = 0.01, p < .001) Jin, Z.; Cao, J.; Guo, H.; Zhang, Y.; and Luo, J. 2017a. Mul- are both positively associated with the vaccination rate. The timodal fusion with recurrent neural networks for rumor de- patterns of the demographic variables are in consistent with tection on microblogs. In Proceedings of the 25th ACM in- the ongoing vaccination trends (Centers for Disease Con- ternational conference on Multimedia, 795–816. trol and Prevention 2021). With respect to the educational level, 1 percent increase in the percentage of persons aged Jin, Z.; Cao, J.; Guo, H.; Zhang, Y.; Wang, Y.; and Luo, J. 25 years and over with a Bachelor’s degree or higher is posi- 2017b. Detection and analysis of 2016 us presidential elec- tively associated with the vaccination rate (B = 0.00, SE = tion related rumors on twitter. In International conference 0.00, p < .001). Socioeconomically, per capita income (in on social computing, behavioral-cultural modeling and pre- 2019 dollars) is negatively associated with the vaccination diction and behavior representation in modeling and simu- rate (B = −2.68e − 7, SE = 4.90e − 8, p < .001). lation, 14–24. Springer. Furthermore, the daily COVID-19 confirmed cases is posi- Jin, Z.; Cao, J.; Zhang, Y.; and Luo, J. 2016. News ver- tively associated with the vaccination rate (B = 0.36, SE = ification by exploiting conflicting social viewpoints in mi- 0.06, p < .001). The shares of the 2020 National Popular croblogs. In Proceedings of the AAAI Conference on Artifi- Vote of Biden is positively associated with the vaccination cial Intelligence, volume 30. rate (B = 1.31e − 4, SE = 1.36e − 5, p < .001). Kao, C. 1999. Spurious regression and residual-based tests for cointegration in panel data. Journal of econometrics Discussion and Conclusion 90(1): 1–44. We identify the marginal negative correlations between the percentages of the U.S. Twitter users who are associated Loughran, T.; and Ritter, J. R. 1996. Long-term market over- with fake news, fact-based news and the U.S. COVID-19 reaction: The effect of low-priced stocks. The Journal of vaccination rates during the period when all U.S. adults are Finance 51(5): 1959–1970. eligible for the COVID-19 vaccines. Montagni et al. (2021) Lyu, H.; Chen, L.; Wang, Y.; and Luo, J. 2020. Sense and argued that the acceptance of a COVID-19 vaccine is as- sensibility: Characterizing social media users regarding the sociated with ability to detect fake news and health liter- use of controversial terms for covid-19. IEEE Transactions acy. However, by analyzing the content of the most frequent on Big Data .
Marco-Franco, J. E.; Pita-Barros, P.; Vivas-Orts, D.; González-de Julián, S.; and Vivas-Consuelo, D. 2021. COVID-19, Fake News, and Vaccines: Should Regulation Be Implemented? International Journal of Environmental Research and Public Health 18(2): 744. Montagni, I.; Ouazzani-Touhami, K.; Mebarki, A.; Texier, N.; Schück, S.; Tzourio, C.; et al. 2021. Acceptance of a Covid-19 vaccine is associated with ability to detect fake news and health literacy. Journal of public health (Oxford, England) . Newey, W. K.; and West, K. D. 1986. A simple, positive semi-definite, heteroskedasticity and autocorrelationconsis- tent covariance matrix. Technical report, National Bureau of Economic Research. Rzymski, P.; Borkowski, L.; Drag, ˛ M.; Flisiak, R.; Jemielity, J.; Krajewski, J.; Mastalerz-Migas, A.; Matyja, A.; Pyrć, K.; Simon, K.; et al. 2021. The strategies to support the COVID-19 vaccination with evidence-based communication and tackling misinformation. Vaccines 9(2): 109. U.S. Census Bureau. 2020. 2015-2019 American Commu- nity Survey 5-year . Wang, J.; Tang, K.; Feng, K.; Lin, X.; Lv, W.; Chen, K.; and Wang, F. 2021. Impact of temperature and relative humid- ity on the transmission of COVID-19: a modelling study in China and the United States. BMJ open 11(2): e043863. Wasserman, D.; Andrews, S.; Saenger, L.; Cohen, L.; Flinn, A.; and Tatarsky, G. 2020. 2020 National Popular Vote Tracker. The Cook Political Report . Wu, W.; Lyu, H.; and Luo, J. 2021. Characterizing Discourse about COVID-19 Vaccines: A Reddit Version of the Pan- demic Story. arXiv preprint arXiv:2101.06321 .
You can also read