Assessing population-level symptoms of anxiety, depression, and suicide risk in real time using NLP applied to social media data
Alex B. Fine, Patrick Crutchley, Jenny Blase, Joshua Carroll, & Glen Coppersmith
Qntfy
{alex.fine, patrick, jenny.blase, josh, glen}@qntfy.com

Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, pages 50–54, Online, November 20, 2020. © 2020 Association for Computational Linguistics. https://doi.org/10.18653/v1/P17

Abstract

Prevailing methods for assessing population-level mental health require costly collection of large samples of data through instruments such as surveys, and are thus slow to reflect current, rapidly changing social conditions. This constrains how easily population-level mental health data can be integrated into health and policy decision-making. Here, we demonstrate that natural language processing applied to publicly available social media data can provide real-time estimates of psychological distress in the population (specifically, English-speaking Twitter users in the US). We examine population-level changes in linguistic correlates of mental health symptoms in response to the COVID-19 pandemic and to the killing of George Floyd. As a case study, we focus on social media data from healthcare providers, compared to a control sample. Our results provide a concrete demonstration of how the tools of computational social science can be applied to provide real-time or near-real-time insight into the impact of public events on mental health.

1 Introduction

Measurements of the mental health of large populations often become quickly outdated, given traditional techniques for data collection, analysis, and dissemination. For example, estimates of suicide rates in the United States are often delayed by two years (Hedegaard et al., 2018). More up-to-date information about population-level mental health could provide clinicians and other decision-makers with crucial warning signals of shifts in mental health or burgeoning public health crises. Continuous access to sound estimates of population-level mental health variables could also provide a mechanism for evaluating community-level interventions.

The dramatic social upheavals of 2020 provide a visceral illustration of how specific communities are psychologically affected by specific events. For example, the COVID-19 pandemic, which took root in the United States in February and March of 2020, in addition to threatening the health of a broad swath of the population, placed particularly heavy demands on healthcare providers charged with responding to a highly contagious and deadly novel virus, often under resource-constrained circumstances. Anecdotal reports made it clear that the surge in cases, coupled with factors such as underfunded clinics and lack of a coordinated federal response, was leading to acute psychological distress and burnout among healthcare providers such as nurses and physicians. In addition, the killing of George Floyd on May 25, 2020 elicited nationwide responses of grief and anger, and is widely believed to have surfaced latent psychological trauma in large swaths of the American and international population. In both instances, we saw that there was and is no scalable technique for collecting population-scale data to quantify changes in mental health over time, to ask which segments of the population are most severely affected by the situation, or to determine which psychological symptoms are changing in prevalence and therefore what interventions should be prioritized by the community.

Here, we focus on healthcare providers (HCPs) as a case study, and present a framework for monitoring signs of psychological distress in a continuous, scalable, and ethical fashion (Mikal et al.) using public social media data. We use models of anxiety, depression, and suicide risk, trained on a separate data source, to produce longitudinal estimates of the prevalence of symptoms associated with these conditions among HCPs and a comparison sample.

The model-derived estimates of symptom prevalence show relative changes in mental health aligned with the timing of events related to COVID-19 and the killing of George Floyd among HCPs in the US. For example, we were able to observe the particularly negative impact of the COVID-19 pandemic across the population. Furthermore, we find no evidence that rescinding stay-at-home orders reversed the deleterious effects of the pandemic on mental health, nor do we find evidence that either healthcare workers or the general population had returned to their respective pre-COVID levels of anxiety, depression, and suicide risk at the time of writing.

Moreover, we find evidence that the killing of George Floyd and subsequent civil unrest across the United States had a measurably deleterious effect on all aspects of mental health measured in both the HCP and control populations.

These findings constitute, we believe, a persuasive proof of concept for the use of transparently and ethically collected social media data in providing aggregated, real-time, population-level estimates of emotional and psychological distress, extending the capabilities of what is commonly known as infoveillance (Paul and Dredze, 2011; Eichstaedt et al., 2015; Paparrizos et al., 2016; Eysenbach, 2009). (For a review of different approaches to assessing population-level mental health, see Aoun et al. (2004).) We believe the data collection and modeling techniques reported here can inform and improve public and private efforts to promote population-level mental health.

2 Data

All analyses were performed using public social media data collected from Twitter between January 1 and June 1, 2020. Analyses are based on two groups: healthcare professionals and a community sample group. Healthcare professionals (HCPs, n = 25,040) comprise providers working directly with patients (e.g., nurses, doctors) and those in adjacent roles (e.g., epidemiologists and hospital administrators). Users were geo-located using self-stated location in the user profile, and only US-based users were included in the analysis. In order to determine which individuals in our sample were HCPs, we used techniques modeled on those reported by Beller et al. (2014), who automatically identify profession and other fine-grained social roles on the basis of self-disclosure. Here, we manually constructed a corpus of HCP professional labels (e.g., “physician”, “doctor”, “nurse”, “RN”) and searched for strings containing these labels in contexts demonstrated by Beller et al. to indicate that the author identifies with that role (e.g., “I’m a [label]”, “As a [label] I think”). This classification was then manually assessed by human annotators and found to have a 95% true positive rate. The control sample used in these analyses comprises a sample of the general population in the United States (henceforth Community, n = 10,000) that did not self-identify as HCPs, selected randomly from users for whom geographic data was available (either through a geo-tagging algorithm or disclosure of their location in their public profile). Users with fewer than 100 posts between the start of the year and the end of May were excluded from the analysis.

3 Methods

We estimate the impact of various national events in 2020 on population mental health. To do so, we compare measures of average anxiety, depression, and suicide risk before and after each event. We will refer to the time period from January 1 to February 29 as the “Pre-Lockdown Baseline”. The national emergency declaration from the White House came on March 13, 2020, and many stay-at-home orders were put in place around that time. We define “Early Lockdown” as March 15 to March 31, as it signifies a time when people were adjusting to the changes induced by the lockdown, including job loss, homeschooling, and working from home. We refer to the period of April 15 to April 30 as “Mid Lockdown”. [1] States took a varied approach to lifting lockdown restrictions, and each followed its own timeline. We suspect the lifting of stay-at-home guidance may have impacted people’s mental health, and obtained the state-specific dates on which those orders were lifted. On May 25, George Floyd was killed in police custody, setting off protests and unrest across the United States. We examine one week prior to and after his death (May 18-25; May 26-June 2).

[1] These time periods were specified before the analyses reported below. We did not experiment with multiple time windows.

We use classification models, trained on separate data sets from the one described above, to score each Tweet in the sample with an estimate of the probability that the Tweet was authored by a person experiencing anxiety or depression or who had attempted suicide. The labels used in the training were derived via self-stated diagnosis: a user was considered to be living with anxiety, depression, or suicidality if they explicitly reported that they had received a diagnosis of an anxiety disorder or depression or had previously attempted suicide, respectively. Examples of self-statements include disclosures such as, “As a person who has been diagnosed with general anxiety disorder, I can tell you...”, “today marks one year since I tried to take my own life”. Self-statements were found using manually constructed search terms and regular expressions; we then confirmed their plausibility and validity using human annotators with clinical training. Logistic regression models with character n-gram features were trained on three separate samples (anxiety, depression, suicide) to distinguish users with a self-stated mental health diagnosis from control users reporting no such diagnoses. We employed the same models reported in our previous work, using the anxiety and depression models from Coppersmith et al. (2015) and the suicide model from Coppersmith et al. (2018). AUC scores for the anxiety, depression, and suicide models were .84, .72, and .73, respectively.

For each measure (anxiety, depression, suicide), we computed the mean of all messages per user per day. Each user is thus represented as the mean of their per-day estimates. This allows for matched-sample t-tests between time periods, and independent t-tests between groups within the same time period. User data was de-identified prior to being submitted to these models, and all statistical analysis was conducted over aggregated user data.

4 Results

Baseline scores for each mental health variable were higher (i.e., more severe) for HCPs than the Community population. This suggests that, prior to COVID-19 lockdowns, HCPs were experiencing anxiety, depression, and suicide risk at higher rates than the general population (p < 0.001; note that Figure 1, for the sake of comparison, shows by-group Z-scores so that this Pre-Lockdown difference is not apparent).

To get a sense of how each event affected each group relative to their Pre-Lockdown baseline, we calculated by-group Z-scores from this baseline. This is illustrated in Figure 1, along with the time periods under consideration.

First, note that every time period after lockdown exhibits higher scores for all mental health conditions we examined. Furthermore, the killing of George Floyd appears to have had a significant effect on mental health across all groups.

Longitudinal changes in depression for HCPs and Community do not differ reliably (p > 0.1). HCPs exhibit less change in their anxiety over time compared to Community (though HCPs are still at a higher base rate of anxiety). Interestingly, HCPs show a larger change in suicide-related risk during Early Lockdown. This difference disappears in Mid Lockdown, and suicide-related risk moves closer to baseline rates towards the end of May (note, again, that baseline rates for HCPs remain higher than for Community).

5 Discussion

Real-time information about the population’s mental health is critically important, especially in times of crisis. Our work is relevant to government agencies or other organizations with the resources to craft population-scale public health interventions or policy recommendations. The current study provides a proof of concept of how publicly available social media data might be used to assess population-level mental health in a way that could support these organizations.

We hasten to emphasize that this work represents a proof of concept, and raises several questions for future research. First, the population of social media users does not perfectly mirror the general population, and it is plausible that those who do not engage in social media were affected differently by COVID and the killing of George Floyd. We can only speculate about how such a bias might influence our results. Second, we did not correct for population demographic rates in the creation of the community group, but did take care to capture a geographically diverse population.

Finally, in future work we plan to explore how the outputs of the models reported here can be continuously calibrated and refined using psychometrically validated clinical scales of constructs such as anxiety and depression. We take it as uncontroversial that using methods of the general kind employed here to measure phenomena as complex as

Figure 1: Changes in mental health compared to Pre-Lockdown baseline for HCPs and Community. Y-axis indicates Z-scores compared to each group’s Pre-Lockdown baseline; a score of 0 means a return to Pre-Lockdown baseline levels. Time periods for comparison are indicated by thick horizontal bars at the mean for that group across the relevant time period. Significant events are indicated by vertical dotted lines. State reopenings are represented as faded dotted lines.

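The by-group Z-scoring behind Figure 1 can be illustrated with a minimal sketch. The paper does not spell out the formula, so this assumes the conventional one: each day's group mean is centered on the mean of that group's Pre-Lockdown days and divided by the sample standard deviation of those same days. The function name `baseline_z_scores`, the dictionary layout, and the numbers in the example are all hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def baseline_z_scores(daily_means, baseline_days):
    """Z-score one group's daily mean scores against its own baseline period.

    daily_means: dict mapping a day label to that group's mean model score
    baseline_days: day labels that make up the Pre-Lockdown baseline
    Assumption: z = (score - baseline mean) / baseline sample std. dev.
    """
    baseline = [daily_means[day] for day in baseline_days]
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 baseline days
    if sigma == 0:
        raise ValueError("baseline scores have zero variance")
    return {day: (score - mu) / sigma for day, score in daily_means.items()}


# Hypothetical daily mean anxiety scores for one group (not real data).
example_scores = {
    "2020-01-01": 0.40,
    "2020-01-02": 0.42,
    "2020-01-03": 0.44,
    "2020-03-20": 0.48,
}
example_z = baseline_z_scores(
    example_scores, ["2020-01-01", "2020-01-02", "2020-01-03"]
)
```

Because each group is scaled against its own baseline, a value of 0 marks a return to that group's Pre-Lockdown level, which is also why absolute differences between HCPs and Community are not visible on this scale.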
anxiety, depression, and suicide will demand extensive collaboration and iteration.

6 Conclusion

We have demonstrated the ability to assess population-level mental health constructs in real time, based on publicly available social media data. Quick access to this information could allow lawmakers, mental health practitioners, and others to determine what type of interventions are needed, and where, in the face of rapidly changing conditions. Harnessing this kind of information may be critical to our recovery from COVID-19, and in allowing skillful responses to future crises.

References

S. Aoun, D. Pennebaker, and C. Wood. 2004. Assessing population need for mental health care: A review of approaches and predictors. Mental Health Serv. Res., 6:33–46.

Charley Beller, Rebecca Knowles, Craig Harman, Shane Bergsma, Margaret Mitchell, and Benjamin Van Durme. 2014. I’m a belieber: Social roles via self-identification and conceptual attributes. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 181–186.

Glen Coppersmith, Mark Dredze, Craig Harman, and Kristy Hollingshead. 2015. From ADHD to SAD: Analyzing the language of mental health on Twitter through self-reported diagnoses. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Denver, Colorado, USA. North American Chapter of the Association for Computational Linguistics.

Glen Coppersmith, Ryan Leary, Patrick Crutchley, and Alex Fine. 2018. Natural language processing of social media as screening for suicide risk. Biomedical informatics insights, 10:1178222618792860.

Johannes C Eichstaedt, Hansen Andrew Schwartz, Margaret L Kern, Gregory Park, Darwin R Labarthe, Raina M Merchant, Sneha Jha, Megha Agrawal, Lukasz A Dziurzynski, Maarten Sap, et al. 2015. Psychological language on twitter predicts county-level heart disease mortality. Psychological science, 26(2):159–169.

Gunther Eysenbach. 2009. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the internet. Journal of medical Internet research, 11(1):e11.

Holly Hedegaard, Sally C Curtin, and Margaret Warner. 2018. Suicide rates in the United States continue to increase. US Department of Health and Human Services, Centers for Disease Control and . . . .

J. Mikal, S. Hurst, and M. Conway. Ethical issues in using twitter for population-level depression monitoring: a qualitative study. BMC Medical Ethics, 17(22).

John Paparrizos, Ryen W. White, and Eric Horvitz. 2016. Screening for pancreatic adenocarcinoma using signals from web search logs: Feasibility study and results. Journal of Oncology Practice, 12(8):737–744. PMID: 27271506.

Michael J Paul and Mark Dredze. 2011. You are what you tweet: Analyzing twitter for public health. In Fifth International AAAI Conference on Weblogs and Social Media.