The AI Doctor Is In: A Survey of Task-Oriented Dialogue Systems for Healthcare Applications
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
The AI Doctor Is In: A Survey of Task-Oriented Dialogue Systems for Healthcare Applications Mina Valizadeh and Natalie Parde Natural Language Processing Laboratory Department of Computer Science University of Illinois at Chicago {mvaliz2, parde}@uic.edu Abstract gap (Newman-Griffis et al., 2021) between cutting- edge, foundational work in dialogue systems and Task-oriented dialogue systems are increas- prototypical or deployed dialogue agents in health- ingly prevalent in healthcare settings, and have care settings. This limits the proliferation of scien- been characterized by a diverse range of ar- chitectures and objectives. Although these tific progress to real-world systems, constraining systems have been surveyed in the medical the potential benefits of fundamental research. community from a non-technical perspective, We move towards closing this gap by conducting a systematic review from a rigorous compu- a comprehensive, scientifically rigorous analysis of tational perspective has to date remained no- task-oriented healthcare dialogue systems. Our un- ticeably absent. As a result, many important derlying objectives are to (a) explore how these sys- implementation details of healthcare-oriented tems have been employed to date, and (b) map out dialogue systems remain limited or under- specified, slowing the pace of innovation in their characteristics, shortcomings, and subsequent this area. To fill this gap, we investigated an opportunities for follow-up work. Importantly, we initial pool of 4070 papers from well-known seek to address the limitations of prior systematic computer science, natural language process- reviews by extensively investigating the included ing, and artificial intelligence venues, identi- systems from a computational perspective. Our fying 70 papers discussing the system-level primary contributions are as follows: implementation of task-oriented dialogue sys- tems for healthcare applications. We con- 1. We systematically search through 4070 papers ducted a comprehensive technical review of from well-known technical venues and iden- these papers, and present our key findings in- tify 70 papers fitting our inclusion criteria.2 cluding identified gaps and corresponding rec- ommendations. 2. We analyze these systems based on many fac- tors, including system objective, language, ar- 1 Introduction chitecture, modality, device type, and evalua- Dialogue systems1 have a daily presence in many tion paradigm, among others. individuals’ lives, acting as virtual assistants (Hoy, 2018), customer service agents (Xu et al., 2017), 3. We identify common limitations across sys- or even companions (Zhou et al., 2020). While tems, including an incomplete exploration of some systems are designed to conduct unstructured architecture, replicability concerns, ethical conversations in open domains (chatbots), others and privacy issues, and minimal investigation (task-oriented dialogue systems) help users to com- of usability or engagement. We offer prac- plete tasks in a specific domain (Jurafsky and Mar- tical suggestions for addressing these as an tin, 2009; Qin et al., 2019). Task-oriented dialogue on-ramp for future work. systems can potentially play an important role in In the long term, we hope that the gaps and op- health and medical care (Laranjo et al., 2018), and portunities identified in this survey can stimulate they have been adopted by growing numbers of more rapid advances in the design of task-oriented patients, caregivers, and clinicians (Kearns et al., healthcare dialogue systems. We also hope that the 2019). Nonetheless, there remains a translational survey provides a useful starting point and synthe- 1 We follow an inclusive definition of dialogue systems, sis of prior work for NLP researchers and practi- encompassing any intelligent systems designed to converse 2 with humans via natural language. A full listing of these papers is provided in the appendix. 6638 Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics Volume 1: Long Papers, pages 6638 - 6660 May 22-27, 2022 c 2022 Association for Computational Linguistics
tioners entering this critical yet surprisingly under- Screening ACM IEEE ACL AAAI Total studied application domain. Process Initial 1050 1400 1020 600 4070 Search 2 Related Work Title 151 273 106 55 585 Screening Dialogue systems in healthcare have been the focus Abstract 32 45 26 8 110 of several recent surveys conducted by the medical Screening and clinical communities (Vaidyam et al., 2019; Final 21 31 16 2 70 Laranjo et al., 2018; Kearns et al., 2019). These Screening surveys have investigated the real-world utiliza- Table 1: The number of papers included from each tion of deployed systems, rather than examining database in each step of the paper screening process. their design and implementation from a technical perspective. In contrast, studies examining these systems through the lens of AI and NLP research 3 Search Criteria and Screening and practice have been limited. Zhang et al. (2020) We designed search criteria in concert with our goal and Chen et al. (2017) presented surveys of recent of filling a translational information gap between advances in general-domain task-oriented dialogue fundamental dialogue systems research and applied systems. Although they provide an excellent holis- systems in the healthcare domain. To do so, we tic portrait of the subfield, they do not delve into retrieved articles from well-respected computer sci- aspects of particular interest in healthcare settings ence, AI, and NLP databases and screened them for (e.g., system objectives doubling as clinical goals), focus on task-oriented dialogue systems designed limiting their usefulness for this audience. for healthcare settings. Our target databases were: Vaidyam et al. (2019), Laranjo et al. (2018), (1) ACM,3 (2) IEEE,4 (3) the ACL Anthology,5 and and Kearns et al. (2019) conducted systematic (4) the AAAI Digital Library.6 ACM and IEEE are reviews of dialogue systems deployed in mental large databases of papers from prestigious confer- health (Vaidyam et al., 2019) or general healthcare ences and journals across many CS fields, including (Laranjo et al., 2018; Kearns et al., 2019) settings. but not limited to robotics, human-computer inter- Vaidyam et al. (2019) examined 10 articles, and action, data mining, and multimedia systems. The Laranjo et al. (2018) and Kearns et al. (2019) exam- ACL Anthology is the premier database of publica- ined 17 and 46 articles, respectively. All surveys tions within NLP, hosting papers from major con- were written for a medical audience and focused on ferences and topic-specific venues (e.g., SIGDIAL, healthcare issues and impact, covering few articles organized by the Special Interest Group on Dis- from AI, NLP, or general computer science venues. course and Dialogue). The AAAI Digital Library Montenegro et al. (2019) and Tudor Car et al. hosts papers not only from the AAAI Conference on (2020) recently reviewed 40 and 47 articles, re- Artificial Intelligence, but also from other AI con- spectively, covering conversational agents in the ferences, AI Magazine, and the Journal of Artificial healthcare domain. These two surveys are the clos- Intelligence Research. We applied the following est to ours, but differ in important ways. First, inclusion criteria when identifying papers: our focus is on a specific class of conversational agents: task-oriented dialogue systems. The sur- • The main focus must be on the technical de- veys by Montenegro et al. (2019) and Tudor Car sign or implementation of a task-oriented dia- et al. (2020) used a wider search breading their abil- logue system. ity to provide extensive technical depth. We also • The system must be designed for health- reviewed more papers (70 articles), which were related applications. then screened using a more thorough taxonomy as part of the analysis. Some aspects that we consid- • The article must not be dedicated to one spe- ered that differ from these prior surveys include the cific module of the system’s architecture (e.g., overall dialogue system architecture, the dialogue 3 https://dl.acm.org/ management architecture, the system evaluation 4 https://ieeexplore.ieee.org/ methods, and the dataset(s) used when developing 5 https://www.aclweb.org/anthology/ 6 and/or evaluating the system. https://aaai.org/Library/library.php 6639
the natural language understanding compo- nent of a health-related dialogue system). Although a narrower scope—e.g., developing im- proved methods for slot-filling—is common when publishing in the dialogue systems community, these papers tend to place more emphasis on tech- nical design irrespective of application context, of- fering less coverage of the system-level charac- teristics that are the target of this survey. We fol- lowed four steps in our screening process. First (Ini- tial Search), we applied a predefined search query to the databases to populate our initial list of pa- pers. To generate the query, we used the keywords “task-oriented,” “dialogue system,” “conversational agent,” “health,” and “healthcare,” and synonyms Figure 1: Research domains and corresponding subcat- and abbreviations of these keywords. We short- egories for the included papers. Parentheses indicate listed papers using these keywords individually as the number of papers belonging to the (sub)category. well as in combination with one another. Next (Title Screening), we performed a prelimi- nary screening through the initial list of papers by subsections, grouped into thematic categories: on- reading the titles, keeping those that satisfied the tology (§4), system architecture (§5), system de- inclusion criteria. Then (Abstract Screening), we sign (§6), dataset (§7), and system evaluation (§8). went through the list of papers remaining after the 4 Ontology title screening and read the abstracts, keeping those that satisfied the inclusion criteria. Lastly (Final We map each paper to its domain of research (§4.1), Screening), we read the body of the papers remain- system objective (§4.2), target audience (§4.3), and ing after the abstract screening and kept those that language (§4.4), and present our findings. satisfied the inclusion criteria. These funnel filtering processes were conducted 4.1 Domain of Research by a computer science graduate student (a fluent Task-oriented dialogue systems can potentially im- L2 English speaker) using predefined search and pact many facets of healthcare in society (Bick- screening guidelines. Questions or uncertainties more and Giorgino, 2004). We define a domain of regarding a paper’s compliance with inclusion cri- research as the healthcare area in which the sys- teria were forwarded along to the senior project tem operates. We identify both broad domains lead (a computer science professor and fluent L1 and more specific subcategories thereof based on English speaker with expertise in NLP) and final the systems surveyed, outlined in Figure 1. Broad consensus was reached via discussion among the domain categories include mental health, physi- two parties. We detail the number of papers remain- cal health, health information, patient assistance, ing after each screening step in Table 1. Overall, physician assistance, cognitive or developmental this screening process combined with our subse- health, and other (comprising subcategories not quent surveying methods spanned eight months, easily classifiable to one of the broader domains). covering papers published prior to January 2021. Systems in the mental health domain supported In total, 70 papers (21 from ACM, 31 from IEEE, individuals with mental or psychological health 16 from ACL, and 2 from AAAI7 ) satisfied the in- conditions, and systems in the cognitive or devel- clusion criteria. We survey papers meeting our opmental health domain were a close analogue inclusion criteria according to a wide range of pa- for individuals with conditions impacting memory, rameters, and present our findings in the following executive, or other cognitive function. Systems 7 Papers about task-oriented dialogue systems published at in the physical health domain were targeted to- AAAI often focus on one specific component of the system wards individuals with specific physical health con- from a technical perspective, rather than proposing a conver- sational agent as a whole. Therefore, only two papers from cerns, including infectious (e.g., Covid-19), non- the AAAI Digital Library satisfied the inclusion criteria. infectious (e.g., cancer), and temporary (e.g., preg- 6640
System Objective # Papers Target Audience # Papers Diagnosis 7 Patients 59 Monitoring 8 Caregivers 3 Intervention 13 Patients & Caregivers 2 Counseling 5 Clinicians 11 Assistance 12 Table 3: Distribution of the target audiences of the sys- Multi-Objective 25 tems described in the surveyed papers. Table 2: Distribution of system objectives across the surveyed papers. Additional details regarding multi- signed for more than one target objective. Among objective papers are provided in the appendix. multi-objective systems, those that were designed for both diagnosis and assistance had the highest frequency (7/25); we provide additional details re- nancy) conditions. Systems providing health in- garding these systems in Table 8 of the appendix. formation performed general-purpose actions such Separately, we also considered the role of en- as offering advice or suggesting disease diagnoses. gagement as an objective of each system. We de- Finally, systems performing patient assistance or fine this as a goal of engaging target users in in- physician assistance supported specific patient- or teraction, irrespective of underlying health goals. physician-focused healthcare tasks. Dialogue sys- Engagement may be of particular interest in health- tems designed for mental health, physical health, care settings since it can be critical in encouraging and health information were the most prevalent, adoption or adherence with respect to healthcare covering 51 of the 70 included papers. outcomes (Montenegro et al., 2019). Surprisingly, almost 60% of the papers (41 of the 70 surveyed) 4.2 System Objective did not mention any goals pertaining to engaging Task-oriented dialogue systems define value rela- users in more interactions. tive to the goals of a target task. We define the system objective as the healthcare task for which a 4.3 Target Audience system is designed. Some system objectives may The final consumers of healthcare systems often be closely aligned with a single domain, whereas fall into three groups: patients, caregivers, and others may occur in many different domains (e.g., clinicians. Table 3 shows the number of systems monitoring mental, physical, or cognitive condi- surveyed that focus on each category. We find that tions). Thus, although the domain of research and out of 70 task-oriented dialogue systems, 59 are system objective may frequently correlate, there is designed specifically for patients. not by necessity a direct association. 4.4 Language Included systems were categorized as being de- signed to: diagnose a health condition (e.g., by pre- Most general-domain dialogue systems research dicting whether the user suffers from cognitive de- has been conducted in English and other high- cline); monitor user states (e.g., by tracking their di- resource languages (Artetxe et al., 2020). Ex- ets or periodically checking their mood); intervene panding language diversity may extend the ben- by addressing users’ health concerns or improv- efits of health-related dialogue systems more glob- ing their states (e.g., by teaching children how to ally. As shown in Figure 2, among the systems map facial expressions to emotions); counsel users included in our review a majority (56%) are de- without providing any direct intervention (e.g., by signed for English speakers. Encouragingly, sev- listening to users’ concerns and empathizing with eral of the included systems did focus on lower- them); or assist users by providing information or resource languages, including Telugu (Duggenpudi guidance (e.g., by answering questions from users et al., 2019), Bengali (Rahman et al., 2019), and who are filling out forms). Many systems were also Setswana (Grover et al., 2009). categorized as multi-objective, meaning that they were designed for more than one of those goals. 5 System Architecture Table 2 shows the number of systems having We investigate both the general architecture of the each objective. Many systems (25/70) were de- system (§5.1), and if applicable, the dialogue man- 6641
Dialogue Management Architecture # Papers Rule-based 17 Intent-based 20 Hybrid Architecture 21 Corpus-based 0 Table 5: Distribution of dialogue management archi- tectures across the surveyed papers. This table does not include papers describing end-to-end architectures (n = 2) or for which system architecture was not spec- ified (n = 10). Figure 2: Language diversity across the surveyed sys- tems. A small percentage (10%) of papers do not spec- we afford it special attention. In rule-based ap- ify the system’s language. proaches, the system interacts with users based on a predefined set of rules, with success conditioned System Architecture # Papers upon coverage of all relevant cases (Siangchin and Pipeline 58 Samanchuen, 2019). Intent-based approaches seek End-to-End 2 to extract the user’s intention from the dialogue, Not Specified 10 and then perform the relevant action (Jurafsky and Martin, 2009). In hybrid dialogue management Table 4: Distribution of papers describing systems with architectures, the system leverages a combination pipeline or end-to-end architectures, or that do not spec- of rule-based and intent-based approaches, and fi- ify the architecture. nally corpus-based approaches mine the dialogues of human-human conversations and produce re- agement architecture specifically (§5.2). sponses using retrieval methods or generative meth- ods (Jurafsky and Martin, 2009). As shown in Ta- 5.1 General Architecture ble 5, among papers reporting on dialogue manage- Task-oriented dialogue systems are generally de- ment architecture, we observe a fairly even mix of signed using pipeline or end-to-end architectures. rule-based, intent-based, and hybrid architectures. Pipeline architectures typically consist of separate components for natural language understanding, di- 6 System Design alogue state tracking, dialogue policy, and natural 6.1 Modality language generation. The ensemble of the dialogue state tracker and dialogue policy is the dialogue Modality, the channel through which information manager (Chen et al., 2017). End-to-end architec- is exchanged between a computer and a human tures train a single model to produce output for a (Karray et al., 2008), can play an important role in given input, often interacting with structured ex- dialogue quality and user satisfaction (Bilici et al., ternal databases and requiring extensive training 2000). Unimodal systems use a single modality data (Chen et al., 2017). As shown in Table 4, for information exchange, whereas multimodal sys- only 2.85% of papers (2 of the 70 surveyed) imple- tems use multiple modalities (Karray et al., 2008). mented an end-to-end system; this is unsurprising Systems reviewed in this survey operated using one given the limited training data available in most or more of several modalities. In text-based or spo- healthcare domains. We also found that 14% (10 ken interaction, users interact with the system by papers) did not directly specify the architecture of typing or speaking, respectively. In interaction via their developed system. graphical user interface (GUI), users interact with the system through the use of visual elements. 5.2 Dialogue Management Architecture In general, multimodal dialogue systems can be Unlike other pipeline components that impact user flexible and robust, but especially challenging to experience and engagement but not fundamental implement in the medical domain (Sonntag et al., decision-making, the dialogue manager is central 2009). We find that 49 papers describe unimodal to overall functionality (Zhao et al., 2019); thus, systems and 21 describe multimodal systems. Ta- 6642
Unimodal Multimodal Evaluation Type # Papers Category # Papers Category # Papers Human Evaluation 28 Text 23 Spoken + Text 14 Automated Evaluation 7 Spoken 25 Spoken + GUI 4 Human & Automated Evaluation 9 GUI 1 Text + GUI 3 Not Specified 26 Table 6: Distribution of modality type across the uni- Table 7: Distribution of evaluation methods across the modal (49 total, left) and multimodal (21 total, right) surveyed papers. systems surveyed. We reviewed each paper for information regard- ing the data used during system development, fo- cusing on dataset size, availability, and privacy- preserving measures. Only 20 papers provide de- tails about the data used (two papers provided a link to the dataset, and the remaining 18 discussed the dataset size). Unfortunately, the remaining papers did not provide rationale for their lack of data or other replicability information. Our assumption is that often the data contained sensitive information, Figure 3: Distribution of device type across the sur- preventing authors from releasing specific details, veyed papers. but only 19 of the 70 included papers provided in- formation about data-related privacy or ethical con- siderations. Only 10 mentioned Institutional Re- ble 6 provides more details regarding their distribu- view Board (IRB) approval for their dataset and/or tion across modalities. task, despite IRB (or equivalent) review being a 6.2 Device crucial step towards ensuring that research is con- ducted ethically and in such a way that protects Dialogue systems may facilitate interaction using a human subjects to the extent possible (Amdur and variety of devices (Arora et al., 2013), ranging from Biddle, 1997). telephones (Garvey and Sankaranarayanan, 2012) to computers (McTear, 2010) to any other technol- 8 System Evaluation ogy that allows interaction (e.g., VR-based avatars (Brinkman et al., 2012b; McTear, 2010)). We cate- We examined the means through which systems gorized the included systems as mobile, telephone, were evaluated both qualitatively and quantita- desktop/laptop, in-car, PDA, robot, virtual environ- tively (Deriu et al., 2019; Hastie, 2012). We de- ment, or virtual reality (including virtual agents fined human evaluation, often implemented in prior and avatars) systems, considering systems as multi- work through questionnaires (Grover et al., 2009; device if they leveraged multiple devices for inter- Holmes et al., 2019; Parde and Nielsen, 2019; action. As shown in Figure 3, we found that multi- Wang et al., 2020) or direct feedback from real- device and mobile-based dialogue systems were world users (Deriu et al., 2019), as an evaluation most popular. Table 9 in the appendix provides that relies on subjective, first-hand, human user additional details regarding multi-device systems. experience. In contrast, automated evaluation pro- vides an objective, quantitative measurement of 7 Dataset one or more dimensions of the system from a Data is crucial for effective system development mathematical perspective (Finch and Choi, 2020). (Serban et al., 2015), but many datasets for training Some metrics used for automated evaluation of dialogue systems are smaller than those used for the reviewed systems include measures of task per- other NLP tasks (Lowe et al., 2017). This is even formance (Ali et al., 2020) and completion rates more pronounced in the healthcare domain, in part (Holmes et al., 2019), response correctness (Ros- due to the risk of data misuse by others or the lack ruen and Samanchuen, 2018), and response time of data sharing incentives (Lee and Yoon, 2017). (Grover et al., 2009). 6643
In Table 7, we observe that nearly half of the tion. However, it is well-established that individ- papers conducted human evaluations; however, a uals from certain demographic groups are more large percentage (37%) also did not discuss evalua- comfortable conversing with dialogue systems via tion at all. We further analyzed papers conducting speech (Tudor Car et al., 2020). Text-based sys- human evaluations and found that they included tems may also be more likely to violate privacy an average of 26 (mode = 12) participants. More considerations (Tudor Car et al., 2020). Thus, we details regarding the human and automated evalua- recommend that researchers engage in further ex- tions are provided in Tables 10, 11, and 12 of the ploration of multimodal or spoken dialogue sys- appendix. In a follow-up analysis of system usabil- tems when applicable and appropriate. ity, defined as the degree to which users are able to Many of the surveyed systems were also imple- engage with a system safely, effectively, efficiently, mented on mobile phones. Although an advantage and enjoyably (Lee et al., 2019), we observed that of mobile-based systems is that they are readily 33 papers explicitly evaluated the usability of their available using a technology familiar to most users, system. Lee et al. (2018) found that users significantly re- duced their usage over time when engaging long- 9 Discussion term with mobile health applications. Tudor Car et al. (2020) suggest that one way to overcome this We identify common limitations across many sur- limitation in mobile-based systems is by directly veyed systems, accompanied by recommendations embedding them in applications or platforms with for addressing them in future work. which users already engage habitually (e.g., Face- 9.1 Incomplete Exploration of System Design book Messenger). This more ambient dissemina- tion approach may facilitate easier and more lasting We observed little system-level architectural di- integration of system use in individuals’ daily lives. versity across the surveyed systems, with most Finally, we identified that most systems (84%) (83%) having a pipeline architecture. This architec- target only patients, with research on systems tar- tural homogeneity limits our understanding of good geted towards clinicians and caregivers remaining design practice within this domain. Recent stud- limited. We recommend further exploration of sys- ies demonstrate that end-to-end architectures for tems targeted towards these critical audiences. This task-oriented dialogue systems could compete with may offer broad, high-impact support in under- pipeline architectures given sufficient high-quality standing, diagnosing, and treating patients’ health data (Hosseini-Asl et al., 2020; Ham et al., 2020; issues (Valizadeh et al., 2021; Kaelin et al., 2021). Bordes et al., 2017; Wen et al., 2016). However, the external knowledge sources often leveraged in end- 9.2 Replicability Concerns to-end systems are notoriously complex in many healthcare sub-domains (Campillos-Llanos et al., Data accessibility restrictions reduce the capacity 2020). Additionally, for healthcare applications of public health research (Strongman et al., 2019), interpretability is highly desired (Ham et al., 2020), and these limitations may be partially responsible but explanations are often obfuscated in end-to-end for the imbalance of pipeline versus end-to-end systems (Ham et al., 2020; Wen et al., 2016). Fi- architectures (§9.1). Only a small percentage of pa- nally, users of these systems may seek guidance on pers surveyed (29%) ventured to discuss the quan- sensitive topics, which can exacerbate privacy con- tity or characteristics of the data used during sys- cerns (Xu et al., 2021). Any system trained on large, tem development in any way. A lack of data trans- weakly curated datasets may also learn unpleasant parency hinders scientific progress and severely behaviors and amplify biases in the training data, in impedes replicability. We call upon researchers to turn producing harmful consequences (Dinan et al., publish data when permissible by governing pro- 2021; Bender et al., 2021). We recommend fur- tocol, and descriptive statistics to the extent allow- ther experimentation with architectural design, in able when circumstances prevent data release. We parallel with work towards developing high-quality also view the development of high-quality, pub- healthcare dialogue datasets, which to date remain licly available datasets as an important frontier in scarce (Farzana et al., 2020). translational dialogue systems research (§9.1). We noticed that a considerable number of the Many of the surveyed papers also lack important systems (33%) allowed only text-based interac- implementation details, such as evaluation meth- 6644
ods (34%). This prevents the research community languages brings up various challenges (López- from replicating developed systems and general- Cózar Delgado and Araki, 2005), but solving this izing study findings more broadly (Walker et al., problem could have have tremendous benefit for 2018). Well-established guidelines exist and are individuals in non-English speaking communities being increasingly enforced within the NLP com- with minimal or unreliable healthcare access. The munity to prevent reproducibility issues (Dodge systems developed by Duggenpudi et al. (2019), et al., 2019). The disregard of reproducibility best Rahman et al. (2019), and Grover et al. (2009) pro- practices observed with many healthcare dialogue vide case examples for how such systems may be systems may be partially attributed to the most com- implemented. We also note that while troubling, mon target venues for this work, which may place a 56% share of systems targeted towards English less emphasis on replication. This validates a cen- speakers is consistent with linguistic homogeneity tral motivator for publishing this survey—without in the field in general, and actually slightly low adequate inclusion of target domain and techni- relative to many other NLP tasks (Mielke, 2016; cal stakeholders in interdisciplinary, translational Bender, 2009). Healthcare dialogue systems may research, progress will remain constrained. We on some level offer a case example for how appli- strongly urge researchers in this domain to provide cations originally designed for high-resource (i.e., implementation details in their publications. English-language) settings can be adapted and re- engineered to provide better coverage of the di- 9.3 Potential Ethical and Privacy Issues verse, real-world potential user base. Real-world medical data facilitates the devel- opment of high-quality healthcare applications 9.5 Minimal Investigation of Usability or (Bertino et al., 2005; Di Palo and Parde, 2019; User Engagement Farzana et al., 2020), but protecting the rights Finally, more than 50% (37/70) of the included and privacy of contributors to the data is critical papers did not evaluate system usability or gen- for ensuring ethical research conduct (Institute of eral user experience. Usability testing can improve Medicine, 2009), as is proper treatment of copy- productivity and safeguard against errors (Rogers right protections. We screened all included papers et al., 2005), both of which are critical in healthcare for coverage of privacy and ethical concerns, and tasks. Therefore, we urge the research community observed that only 27% of the surveyed papers con- to consider and assess usability when designing for sidered participant or patient privacy in the design this domain. The systems among those surveyed of their system. Moreover, only 14% of the sur- that do this already (e.g., those developed by Wang veyed papers documented any evidence of Institu- et al. (2020), Lee et al. (2020b), Wei et al. (2018), tional Review Board (or IRB-equivalent) approval. or Demasi et al. (2020)) provide case examples for Research involving healthcare dialogue systems how it might be done. is unquestionably human-centered, and as such the Almost 60% of the surveyed systems were not absence of ethical oversight in the design of such explicitly designed to engage users, despite this systems is a grave concern. Although technical being a common objective in the general domain researchers entering this space may be unfamiliar (Ghazarian et al., 2019). Healthcare dialogue sys- with human subjects research and protocol, we urge tems may stand to benefit particularly well from all dialogue systems researchers to submit their such measures (Parde, 2018), since patient engage- experimental design and protocol for review by an ment is predictive of adoption and adherence to appropriate external review board. We also ask that healthcare outcomes (Montenegro et al., 2019). To researchers consider the potential harms from use increase user satisfaction and system performance, or misuse of their systems, following guidelines we recommend that the research community more established by the ACM Code of Ethics.8 purposefully consider engagement when designing their healthcare-oriented dialogue systems. 9.4 Room for Increased Language Diversity We observed that most systems (56%) targeted En- 10 Conclusion glish speakers. Developing multilingual dialogue In this work, we conducted a systematic technical systems or systems for speakers of low-resource survey of task-oriented dialogue systems used for 8 https://www.acm.org/code-of-ethics health-related purposes, providing much-needed 6645
analyses from a computational perspective and nar- often observed among human healthcare providers rowing the translational gap between basic and ap- (FitzGerald and Hurst, 2017). Systems trained to plied dialogue systems research. We comprehen- perform healthcare tasks using datasets that are not sively searched through 4070 papers in computer representative of the target population may exhibit science, NLP, and AI databases, finding 70 papers poorer performance with users who already experi- that satisfied our inclusion criteria. We analyzed ence marginalization or are otherwise vulnerable, these papers based on numerous technical factors impeding or even reversing benefits. We call upon including the domain of research, system objective, researchers to examine, debias, and curate their target audience, language, system architecture, sys- training data such that task-oriented dialogue sys- tem design, training dataset, and evaluation meth- tems for healthcare applications elevate, rather than ods. Following this, we identified and summarized diminish, outcomes for the historically underserved gaps in this existing body of work, including an users which they are best poised to benefit. incomplete exploration of system design, replica- bility concerns, potential ethical and privacy issues, 12 Acknowledgements room for increased language diversity, and mini- This material is based upon work supported by mal investigation of usability or user engagement. the National Science Foundation under Grant Finally, we presented evidence-based recommen- No. 2125411, and by a start-up grant from the dations stemming from our findings as a launching University of Illinois at Chicago. Any opinions, point for future work. It is our hope that inter- findings, and conclusions or recommendations are ested researchers find the information provided in those of the authors and do not necessarily reflect this survey to be a unique and helpful resource the views of the National Science Foundation. We for developing task-oriented dialogue systems for thank the anonymous reviewers for their insightful healthcare applications. suggestions, which further strengthened this work. 11 Ethical Considerations References Beyond the concrete changes suggested during the Parham Aarabi. 2013. Virtual cardiologist — a conver- discussion, it is important to consider the broader sational system for medical diagnosis. In 2013 26th ethical implications of task-oriented dialogue sys- IEEE Canadian Conference on Electrical and Com- tems in healthcare settings. Although the goal of puter Engineering (CCECE), pages 1–4. such systems may not be to replace human health- Yuna Ahn, Yilin Zhang, Yujin Park, and Joonhwan Lee. care providers, it is likely that deployed systems 2020. A chatbot solution to chat app problems: En- would support clinicians, defraying workload for visioning a chatbot counseling system for teenage overburdened individuals. In doing so, these sys- victims of online sexual exploitation. In Extended Abstracts of the 2020 CHI Conference on Human tems may have significant impact on healthcare Factors in Computing Systems, CHI EA ’20, page decision-making. Machines are imperfect, and thus 1–7, New York, NY, USA. Association for Comput- a possible harm is that these systems may misin- ing Machinery. terpret user input or make incorrect predictions— Mohammad Rafayet Ali, Seyedeh Zahra Razavi, Raina a mistake that in high-stakes healthcare settings Langevin, Abdullah Al Mamun, Benjamin Kane, could prove detrimental or even dangerous. Re- Reza Rawassizadeh, Lenhart K. Schubert, and searchers and developers should be cognizant of Ehsan Hoque. 2020. A virtual conversational agent possible harms stemming from the use and misuse for teens with autism spectrum disorder: Experimen- tal results and design lessons. In Proceedings of the of task-oriented dialogue systems for healthcare 20th ACM International Conference on Intelligent settings, and should implement both automated Virtual Agents, IVA ’20, New York, NY, USA. As- (e.g., strict thresholds for diagnostic suggestions) sociation for Computing Machinery. and human (e.g., training to ensure staff awareness Mohammad Rafayet Ali, Taylan Sen, Benjamin Kane, of potential system fallibilities) safeguards. Shagun Bose, Thomas Carroll, Ronald Epstein, Moreover, a potential benefit of these systems Lenhart K. Schubert, and Ehsan Hoque. 2021. is their potential to meaningfully and beneficially Novel computational linguistic measures, dialogue system and the development of sophie: Standardized extend healthcare access to underserved popula- online patient for healthcare interaction education. tions. As such, it is important to ensure that auto- IEEE Transactions on Affective Computing, pages 1– mated systems do not fall prey to the same biases 1. 6646
Robert J. Amdur and Chuck Biddle. 1997. Institutional Vildan Bilici, Emiel Krahmer, Saskia te Riele, and Ray- Review Board Approval and Publication of Human mond Veldhuis. 2000. Preferred modalities in dia- Research Results. JAMA, 277(11):909–914. logue systems. In Sixth International Conference on Spoken Language Processing. Masahiro Araki, Kana Shibahara, and Yuko Mizukami. 2011. Spoken dialogue system for learning braille. Antoine Bordes, Y-Lan Boureau, and Jason Weston. In 2011 IEEE 35th Annual Computer Software and 2017. Learning end-to-end goal-oriented dialog. Applications Conference, pages 152–156. Willem-Paul Brinkman, Dwi Hartanto, Ni Kang, Suket Arora, Kamaljeet Batra, and Sarabjit Singh. Daniel de Vliegher, Isabel L. Kampmann, 2013. Dialogue system: A brief review. CoRR, Nexhmedin Morina, Paul G.M. Emmelkamp, abs/1306.4134. and Mark Neerincx. 2012a. A virtual reality dia- Mikel Artetxe, Sebastian Ruder, Dani Yogatama, logue system for the treatment of social phobia. In Gorka Labaka, and Eneko Agirre. 2020. A call CHI ’12 Extended Abstracts on Human Factors in for more rigor in unsupervised cross-lingual learn- Computing Systems, CHI EA ’12, page 1099–1102, ing. In Proceedings of the 58th Annual Meeting New York, NY, USA. Association for Computing of the Association for Computational Linguistics, Machinery. pages 7375–7388, Online. Association for Compu- Willem-Paul Brinkman, Dwi Hartanto, Ni Kang, tational Linguistics. Daniel Vliegher, Isabel Kampmann, Nexhmedin Lekha Athota, Vinod Kumar Shukla, Nitin Pandey, and Morina, Paul Emmelkamp, and Mark Neerincx. Ajay Rana. 2020. Chatbot for healthcare system 2012b. A virtual reality dialogue system for the using artificial intelligence. In 2020 8th Interna- treatment of social phobia. In Proceedings of the tional Conference on Reliability, Infocom Technolo- Conference on Human Factors in Computing Sys- gies and Optimization (Trends and Future Direc- tems, pages 1099–1102. tions) (ICRITO), pages 619–622. Jacqueline Brixey, Rens Hoegen, Wei Lan, Joshua Ru- Saminda Sundeepa Balasuriya, Laurianne Sitbon, An- sow, Karan Singla, Xusen Yin, Ron Artstein, and drew A. Bayor, Maria Hoogstrate, and Margot Brere- Anton Leuski. 2017. SHIHbot: A Facebook chat- ton. 2018. Use of voice activated interfaces by peo- bot for sexual health information on HIV/AIDS. In ple with intellectual disability. In Proceedings of the Proceedings of the 18th Annual SIGdial Meeting 30th Australian Conference on Computer-Human In- on Discourse and Dialogue, pages 370–373, Saar- teraction, OzCHI ’18, page 102–112, New York, brücken, Germany. Association for Computational NY, USA. Association for Computing Machinery. Linguistics. R. V. Belfin, A. J. Shobana, Megha Manilal, Ashly Ann Leonardo Campillos Llanos, Dhouha Bouamor, Éric Mathew, and Blessy Babu. 2019. A graph based Bilinski, Anne-Laure Ligozat, Pierre Zweigenbaum, chatbot for cancer patients. In 2019 5th Interna- and Sophie Rosset. 2015. Description of the Patient- tional Conference on Advanced Computing Commu- Genesys dialogue system. In Proceedings of the nication Systems (ICACCS), pages 717–721. 16th Annual Meeting of the Special Interest Group Emily M. Bender. 2009. Linguistically naïve != lan- on Discourse and Dialogue, pages 438–440, Prague, guage independent: Why NLP needs linguistic ty- Czech Republic. Association for Computational Lin- pology. In Proceedings of the EACL 2009 Workshop guistics. on the Interaction between Linguistics and Compu- Leonardo Campillos-Llanos, Catherine Thomas, Éric tational Linguistics: Virtuous, Vicious or Vacuous?, Bilinski, Pierre Zweigenbaum, and Sophie Rosset. pages 26–32, Athens, Greece. Association for Com- 2020. Designing a virtual patient dialogue system putational Linguistics. based on terminology-rich resources: Challenges Emily M. Bender, Timnit Gebru, Angelina McMillan- and evaluation. Natural Language Engineering, Major, and Shmargaret Shmitchell. 2021. On the 26(2):183–220. dangers of stochastic parrots: Can language mod- els be too big? . In Proceedings of the 2021 ACM Bo-Wei Chen, Po-Yi Shih, Karunanithi Bharanitha- Conference on Fairness, Accountability, and Trans- ran, Po-Chuan Lin, Jhing-Fa Wang, and Chia- parency, FAccT ’21, page 610–623, New York, NY, Ming Chen. 2013. Customizable cloud-healthcare USA. Association for Computing Machinery. dialogue system based on lvcsr with prosodic- contextual post-processing. In 2013 1st Interna- E. Bertino, B.C. Ooi, Y. Yang, and R.H. Deng. 2005. tional Conference on Orange Technologies (ICOT), Privacy and ownership preserving of outsourced pages 246–249. medical data. In 21st International Conference on Data Engineering (ICDE’05), pages 521–532. Hongshen Chen, Xiaorui Liu, Dawei Yin, and Jiliang Tang. 2017. A survey on dialogue systems: Recent Timothy Bickmore and Toni Giorgino. 2004. Some advances and new frontiers. CoRR, abs/1711.01731. novel aspects of health communication from a dia- logue systems perspective. AAAI Fall Symposium - Ching-Hua Chuan and Susan Morgan. 2021. Creat- Technical Report. ing and evaluating chatbots as eligibility assistants 6647
for clinical trials: An active deep learning approach Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, towards user-centered classification. ACM Trans. Shannon Spruit, Dirk Hovy, Y-Lan Boureau, and Comput. Healthcare, 2(1). Verena Rieser. 2021. Anticipating safety issues in e2e conversational ai: Framework and tooling. Karl Daher, Jacky Casas, Omar Abou Khaled, and Elena Mugellini. 2020. Empathic chatbot response Francesca Dino, Rohola Zandie, Hojjat Abdollahi, for medical assistance. In Proceedings of the 20th Sarah Schoeder, and Mohammad H. Mahoor. 2019. ACM International Conference on Intelligent Virtual Delivering cognitive behavioral therapy using a con- Agents, IVA ’20, New York, NY, USA. Association versational social robot. In 2019 IEEE/RSJ Interna- for Computing Machinery. tional Conference on Intelligent Robots and Systems (IROS), pages 2089–2095. Prathyusha Danda, Brij Mohan Lal Srivastava, and Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Manish Shrivastava. 2016. Vaidya: A spoken di- Schwartz, and Noah A. Smith. 2019. Show your alog system for health domain. In Proceedings of work: Improved reporting of experimental results. the 13th International Conference on Natural Lan- In Proceedings of the 2019 Conference on Empirical guage Processing, pages 161–166, Varanasi, India. Methods in Natural Language Processing and the NLP Association of India. 9th International Joint Conference on Natural Lan- guage Processing (EMNLP-IJCNLP), pages 2185– Johan Oswin De Nieva, Jose Andres Joaquin, 2194, Hong Kong, China. Association for Computa- Chaste Bernard Tan, Ruzel Khyvin Marc Te, and tional Linguistics. Ethel Ong. 2020. Investigating students’ use of a mental health chatbot to alleviate academic stress. Suma Reddy Duggenpudi, Kusampudi Siva Subra- In 6th International ACM In-Cooperation HCI and hamanyam Varma, and Radhika Mamidi. 2019. UX Conference, CHIuXiD ’20, page 1–10, New Samvaadhana: A Telugu dialogue system in hospi- York, NY, USA. Association for Computing Machin- tal domain. In Proceedings of the 2nd Workshop on ery. Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 234–242, Hong Kong, China. Orianna Demasi, Yu Li, and Zhou Yu. 2020. A multi- Association for Computational Linguistics. persona chatbot for hotline counselor training. In Wilmer Stalin Erazo, Germán Patricio Guerrero, Car- Findings of the Association for Computational Lin- los Carrión Betancourt, and Iván Sánchez Salazar. guistics: EMNLP 2020, pages 3623–3636, Online. 2020. Chatbot implementation to collect data on Association for Computational Linguistics. possible covid-19 cases and release the pressure on the primary health care system. In 2020 11th Jan Deriu, Álvaro Rodrigo, Arantxa Otegi, Guillermo IEEE Annual Information Technology, Electronics Echegoyen, Sophie Rosset, Eneko Agirre, and Mark and Mobile Communication Conference (IEMCON), Cieliebak. 2019. Survey on evaluation methods for pages 0302–0307. dialogue systems. CoRR, abs/1905.04071. Ahmed Fadhil and Ahmed Ghassan Tawfiq AbuRa’ed. David DeVault, Kallirroi Georgila, Ron Artstein, Fab- 2019. Ollobot - towards a text-based arabic health rizio Morbini, David Traum, Stefan Scherer, Al- conversational agent: Evaluation and results. In bert Skip Rizzo, and Louis-Philippe Morency. 2013. RANLP. Verbal indicators of psychological distress in inter- Shahla Farzana, Mina Valizadeh, and Natalie Parde. active dialogue with a virtual human. In Proceed- 2020. Modeling dialogue in conversational cogni- ings of the SIGDIAL 2013 Conference, pages 193– tive health screening interviews. In Proceedings of 202, Metz, France. Association for Computational the 12th Language Resources and Evaluation Con- Linguistics. ference, pages 1167–1177, Marseille, France. Euro- pean Language Resources Association. Alessandro Di Nuovo, Josh Bamforth, Daniela Conti, Karen Sage, Rachel Ibbotson, Judy Clegg, Anna Sarah E. Finch and Jinho D. Choi. 2020. Towards uni- Westaway, and Karen Arnold. 2020. An explo- fied dialogue system evaluation: A comprehensive rative study on robotics for supporting children with analysis of current evaluation protocols. In Proceed- autism spectrum disorder during clinical procedures. ings of the 21th Annual Meeting of the Special Inter- In Companion of the 2020 ACM/IEEE International est Group on Discourse and Dialogue, pages 236– Conference on Human-Robot Interaction, HRI ’20, 245, 1st virtual meeting. Association for Computa- page 189–191, New York, NY, USA. Association for tional Linguistics. Computing Machinery. Chloë FitzGerald and Samia Hurst. 2017. Implicit bias in healthcare professionals: a systematic review. Flavio Di Palo and Natalie Parde. 2019. Enriching neu- BMC medical ethics, 18(1):1–18. ral models with targeted features for dementia detec- tion. In Proceedings of the 57th Annual Meeting of Floyd Garvey and Suresh Sankaranarayanan. 2012. In- the Association for Computational Linguistics: Stu- telligent agent based flight search and booking sys- dent Research Workshop, pages 302–308, Florence, tem. International Journal of Advanced Research in Italy. Association for Computational Linguistics. Artificial Intelligence, 1(4). 6648
Sarik Ghazarian, Ralph M. Weischedel, Aram Gal- Koji Inoue, Pierrick Milhorat, Divesh Lala, Tianyu styan, and Nanyun Peng. 2019. Predictive en- Zhao, and Tatsuya Kawahara. 2016. Talking with gagement: An efficient metric for automatic eval- ERICA, an autonomous android. In Proceedings uation of open-domain dialogue systems. CoRR, of the 17th Annual Meeting of the Special Interest abs/1911.01456. Group on Discourse and Dialogue, pages 212–215, Los Angeles. Association for Computational Lin- Nancy Green, William Lawton, and Boyd Davis. 2004. guistics. An assistive conversation skills training system for caregivers of persons with alzheimer’s disease. In Institute of Medicine. 2009. Beyond the HIPAA Pri- Proceedings of the AAAI 2004 Fall Symposium on vacy Rule: Enhancing Privacy, Improving Health Dialogue Systems for Health Communication. Through Research. The National Academies Press, Washington, DC. Aditi Sharma Grover, Madelaine Plauché, Etienne Barnard, and Christiaan Kuun. 2009. Hiv health Hifza Javed, Myounghoon Jeon, Ayanna Howard, information access using spoken dialogue systems: and Chung Hyuk Park. 2018. Robot-assisted Touchtone vs. speech. In Proceedings of the 3rd In- socio-emotional intervention framework for chil- ternational Conference on Information and Commu- dren with autism spectrum disorder. In Companion nication Technologies and Development, ICTD’09, of the 2018 ACM/IEEE International Conference on page 95–107. IEEE Press. Human-Robot Interaction, HRI ’18, page 131–132, New York, NY, USA. Association for Computing Donghoon Ham, Jeong-Gwan Lee, Youngsoo Jang, Machinery. and Kee-Eung Kim. 2020. End-to-end neural pipeline for goal-oriented dialogue systems using Daniel Jurafsky and James H. Martin. 2009. Speech GPT-2. In Proceedings of the 58th Annual Meet- and Language Processing (2nd Edition). Prentice- ing of the Association for Computational Linguis- Hall, Inc., USA. tics, pages 583–592, Online. Association for Com- putational Linguistics. Dipesh Kadariya, Revathy Venkataramanan, Hong Yung Yip, Maninder Kalra, Krishnaprasad Helen Hastie. 2012. Metrics and evaluation of spo- Thirunarayanan, and Amit Sheth. 2019. kbot: ken dialogue systems. In Data-driven methods for Knowledge-enabled personalized chatbot for adaptive spoken dialogue systems, pages 131–150. asthma self-management. In 2019 IEEE In- Springer. ternational Conference on Smart Computing (SMARTCOMP), pages 138–143. Samuel Holmes, Anne Moorhead, Raymond Bond, Huiru Zheng, Vivien Coates, and Michael Mctear. Vera C Kaelin, Mina Valizadeh, Zurisadai Salgado, Na- 2019. Usability testing of a healthcare chatbot: Can talie Parde, and Mary A Khetani. 2021. Artificial we use conventional methods to assess conversa- intelligence in rehabilitation targeting the participa- tional user interfaces? In Proceedings of the 31st Eu- tion of children and youth with disabilities: Scoping ropean Conference on Cognitive Ergonomics, ECCE review. J Med Internet Res, 23(11):e25745. 2019, page 207–214, New York, NY, USA. Associa- tion for Computing Machinery. Takeshi Kamita, Atsuko Matsumoto, Boyu Sun, and Tomoo Inoue. 2020. Promotion of continuous use Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, of a self-guided mental healthcare system by a chat- Semih Yavuz, and Richard Socher. 2020. A simple bot. In Conference Companion Publication of the language model for task-oriented dialogue. CoRR, 2020 on Computer Supported Cooperative Work and abs/2005.00796. Social Computing, CSCW ’20 Companion, page 293–298, New York, NY, USA. Association for Matthew B. Hoy. 2018. Alexa, siri, cortana, and Computing Machinery. more: An introduction to voice assistants. Medical Reference Services Quarterly, 37(1):81–88. PMID: B. Amir H. Kargar and Mohammad H. Mahoor. 2017. 29327988. A pilot study on the ebear socially assistive robot: Implication for interacting with elderly people with Chin-Yuan Huang, Ming-Chin Yang, Chin-Yu Huang, moderate depression. In 2017 IEEE-RAS 17th In- Yu-Jui Chen, Meng-Lin Wu, and Kai-Wen Chen. ternational Conference on Humanoid Robotics (Hu- 2018. A chatbot-supported smart wireless interac- manoids), pages 756–762. tive healthcare system for weight control and health promotion. In 2018 IEEE International Conference Fakhri Karray, Milad Alemzadeh, Jamil Saleh, and on Industrial Engineering and Engineering Manage- Mo Nours Arab. 2008. Human-computer interac- ment (IEEM), pages 1791–1795. tion: Overview on state of the art. International Journal on Smart Sensing and Intelligent Systems, Tae-Ho Hwang, JuHui Lee, Se-Min Hyun, and KangY- 1:137–159. oon Lee. 2020. Implementation of interactive health- care advisor model using chatbot and visualiza- William Kearns, Nai-Ching Chi, Yong Choi, Shih-Yin tion. In 2020 International Conference on Informa- Lin, Hilaire Thompson, and George Demiris. 2019. tion and Communication Technology Convergence A systematic review of health dialog systems. Meth- (ICTC), pages 452–455. ods of Information in Medicine, 58:179–193. 6649
Liliana Laranjo, Adam Dunn, Huong Ly Tong, A. Baki A. Loisel, N. Chaignaud, and J-Ph. Kotowicz. Kocaballi, Jessica Chen, Rabia Bashir, Didi Surian, 2007. Designing a human-computer dialog sys- Blanca Gallego, Farah Magrabi, Annie Lau, and En- tem for medical information search. In 2007 rico Coiera. 2018. Conversational agents in health- IEEE/WIC/ACM International Conferences on Web care: A systematic review. Journal of the American Intelligence and Intelligent Agent Technology - Medical Informatics Association, 0. Workshops, pages 350–353. Choong Lee and Hyung-Jin Yoon. 2017. Medical big Ramón López-Cózar Delgado and Masahiro Araki. data: promise and challenges. Kidney Research and 2005. Spoken, Multilingual and Multimodal Dia- Clinical Practice, 36:3–11. logue Systems: Development and Assessment. Wi- ley, Chichester, UK. Dongkeon Lee, Kyo-Joong Oh, and Ho-Jin Choi. 2017. The chatbot feels you - a counseling service using Ryan Lowe, Nissan Pow, Iulian Serban, Laurent Char- emotional response generation. In 2017 IEEE Inter- lin, Chia-Wei Liu, and Joelle Pineau. 2017. Train- national Conference on Big Data and Smart Com- ing end-to-end dialogue systems with the ubuntu di- puting (BigComp), pages 437–440. alogue corpus. Dialogue and Discourse, 8:31–65. Ju Yeon Lee, Ju Young Kim, Seung Ju You, You Soo Raju Maharjan, Per Bækgaard, and Jakob E. Bardram. Kim, Hye Yeon Koo, Jeong Hyun Kim, Sohye Kim, 2019. "hear me out": Smart speaker based conversa- Jung Ha Park, Jong Soo Han, Siye Kil, Hyerim tional agent to monitor symptoms in mental health. Kim, Ye Seul Yang, and Kyung Min Lee. 2019. De- In Adjunct Proceedings of the 2019 ACM Interna- velopment and usability of a life-logging behavior tional Joint Conference on Pervasive and Ubiqui- monitoring application for obese patients. Journal tous Computing and Proceedings of the 2019 ACM of Obesity and Metabolic Syndrome, 28(3):194–202. International Symposium on Wearable Computers, Publisher Copyright: Copyright © 2019 Korean So- UbiComp/ISWC ’19 Adjunct, page 929–933, New ciety for the Study of Obesity. York, NY, USA. Association for Computing Machin- ery. Kyunghee Lee, Hyeyon Kwon, Byungtae Lee, Guna Lee, Jae Ho Lee, Yu Rang Park, and Soo-Yong Shin. Rohit Binu Mathew, Sandra Varghese, Sera Elsa Joy, 2018. Effect of self-monitoring on long-term patient and Swanthana Susan Alex. 2019. Chatbot for dis- engagement with mobile health applications. PLOS ease prediction and treatment recommendation us- ONE, 13(7):1–12. ing machine learning. In 2019 3rd International Conference on Trends in Electronics and Informat- Yi-Chieh Lee, Naomi Yamashita, and Yun Huang. ics (ICOEI), pages 851–856. 2020a. Designing a chatbot as a mediator for pro- Michael McTear. 2010. Chapter 9 - the role of spo- moting deep self-disclosure to a real mental health ken dialogue in user–environment interaction. In professional. Proc. ACM Hum.-Comput. Interact., Hamid Aghajan, Ramón López-Cózar Delgado, and 4(CSCW1). Juan Carlos Augusto, editors, Human-Centric Inter- faces for Ambient Intelligence, pages 225–254. Aca- Yi-Chieh Lee, Naomi Yamashita, Yun Huang, and Wai demic Press, Oxford. Fu. 2020b. "i hear you, i feel you": Encouraging deep self-disclosure through a chatbot. In Proceed- Sabrina J. Mielke. 2016. Language diversity in ACL ings of the 2020 CHI Conference on Human Factors 2004 - 2016. in Computing Systems, CHI ’20, page 1–12, New York, NY, USA. Association for Computing Machin- Mahdi Naser Moghadasi, Yu Zhuang, and Hashim Gell- ery. ban. 2020. Robo: A counselor chatbot for opioid ad- dicted patients. In 2020 2nd Symposium on Signal Peter Ljunglöf, Britt Claesson, Ingrid Mattsson Müller, Processing Systems, SSPS 2020, page 91–95, New Stina Ericsson, Cajsa Ottesjö, Alexander Berman, York, NY, USA. Association for Computing Machin- and Fredrik Kronlid. 2011. Lekbot: A talking ery. and playing robot for children with disabilities. In Proceedings of the Second Workshop on Speech Joao Luis Zeni Montenegro, Cristiano André da Costa, and Language Processing for Assistive Technolo- and Rodrigo da Rosa Righi. 2019. Survey of conver- gies, pages 110–119, Edinburgh, Scotland, UK. As- sational agents in health. Expert Systems with Appli- sociation for Computational Linguistics. cations, 129:56–67. Peter Ljunglöf, Staffan Larsson, Katarina Fabrizio Morbini, David DeVault, Kallirroi Georgila, Heimann Mühlenbock, and Gunilla Thunberg. Ron Artstein, David Traum, and Louis-Philippe 2009. TRIK: A talking and drawing robot for Morency. 2014. A demonstration of dialogue pro- children with communication disabilities. In cessing in SimSensei kiosk. In Proceedings of the Proceedings of the 17th Nordic Conference of Com- 15th Annual Meeting of the Special Interest Group putational Linguistics (NODALIDA 2009), pages on Discourse and Dialogue (SIGDIAL), pages 254– 275–278, Odense, Denmark. Northern European 256, Philadelphia, PA, U.S.A. Association for Com- Association for Language Technology (NEALT). putational Linguistics. 6650
You can also read