Conversational AI Systems for Social Good: Opportunities and Challenges
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Conversational AI Systems for Social Good: Opportunities and Challenges Peng Qi Jing Huang Youzheng Wu Xiaodong He Bowen Zhou JD AI Research Mountain View, CA 94043 {peng.qi, jing.huang, wuyouzheng1, xiaodong.he, bowen.zhou}@jd.com Abstract allow users to convey their needs the same way they would to a human counterpart. Second, with arXiv:2105.06457v1 [cs.CL] 13 May 2021 Conversational artificial intelligence (ConvAI) the help of increasingly robust automatic speech systems have attracted much academic and commercial attention recently, making signif- recognition (ASR) and speech synthesis (or text- icant progress on both fronts. However, lit- to-speech, TTS) systems, ConvAI systems are a tle existing work discusses how these sys- lot easier and inexpensive to access than many tems can be developed and deployed for so- other technical solutions. One could gain access cial good. In this paper, we briefly review to a ConvAI system as long as they can access a the progress the community has made towards telephone, without needing Internet access, smart better ConvAI systems and reflect on how ex- devices, or even a digital cellular network. isting technologies can help advance social good initiatives from various angles that are Both reasons also make ConvAI a great can- unique for ConvAI, or not yet become com- didate technology for social good, because these mon knowledge in the community. We further properties help the underlying technical solution discuss about the challenges ahead for ConvAI reach a much broader population at virtually no ad- systems to better help us achieve these goals ditional cost to the society, nor delay in implemen- and highlight the risks involved in their devel- tation to the communities to be served. However, opment and deployment in the real world. despite these inherent advantages and the grow- ing interest in ConvAI, there has been little ex- 1 Introduction isting work that discusses how these systems and Conversational artificial intelligence (ConvAI) technologies can be developed and used for social systems, or dialogue systems, are increasingly good to the best of our knowledge. prevalent in our everyday lives, taking the form of In this paper, we aim to better understand how smart assistants for home and hand-held devices conversational AI technologies and systems can (e.g., Amazon’s Alexa, Apple’s Siri, Google As- be developed and deployed for social good. To sistant) or friendly helpers with telephone banking this end, we begin by briefly reviewing the com- or online customer service, to name a few. Con- munity’s efforts and progress on ConvAI over the temporaneously, there has been a renewed inter- past few decades. We then explore how existing est in ConvAI from the research community, evi- technolgy can be, or have been, applied to various denced by the enthusiasm around the ConvAI chal- scenarios to advance the United Nations’ Sustain- lenge1 and the growing number of publications in able Development Goals (SDGs)for social good. the ACL Anthology that mention ConvAI-themed We illustrate specific use cases with concrete ex- keywords in their titles (see Figure 2 in the ap- amples, with an focus on application scenarios that pendix). are far from common knowledge. Finally, we con- Aside from the recent technical advances, we clude with reflections on potential challenges and recognize two important reasons for their popular- risks when we develop and deploy new ConvAI ity. First, ConvAI systems provide a natural lan- technologies into the real world, focusing on the guage user interface (LUI) for their underlying ap- societal impact and how one might mitigate them. plications, requiring little to no prior training on the users’ part to use them. In an ideal world, 2 Background on Conversational AI ConvAI technology would help us build LUIs that Communicating with computers in human lan- 1 https://convai.io/ guages has been a long-standing goal of com-
puter science since its early days. From intuitively explored practical conversations across the spec- named commands to more recent conversational trum of different goal specificity, domain open- AI systems, effective use of computer systems has ness, as well as levels of grounding. Some take the been made more and more accessible to average familiar form of a ConvAI system assisting human users. users (e.g., knowledge-based ConvAI, including For our discussion, we adopt an deliberately question answering systems (Choi et al., 2018)), broad definition of the term “Conversational while others focus more on conversational skills AI”: any system or technology that allows hu- that featured in collaborative or competitive peer man users to interact with computers via natu- conversations (He et al., 2017, 2018). ral language. This definition encompasses most Before we move on to exploring how ConvAI language-related interactive systems, including systems can be applied for social good, we must those based on audible speech interactions (e.g., first answer the following questions: once we have CMU’s Let’s Go! system for bus information built a ConvAI system, what effects can we expect (Raux et al., 2005)). it to have on the world around us? What proper- ties make them more appealing to, say, their hu- Many early ConvAI systems give one of the con- man counterparts? One of the most basic proper- versationalists the exclusive agency to drive the ties that makes ConvAI systems potentially more conversation. System-initiative ConvAI agents appealing to human operators is that they are more typically follow a predefined dialogue plan and of- readily available and scalable. This makes it pos- fer users limited options. As a result, these sys- sible for ConvAI services to be accessible dur- tems tend to have an easier time understanding the ing non-working hours, and allow them to cater user as long as they are cooperating (e.g., auto- to more users at the same time at virtually no mated banking services). On the other hand, user- marginal cost. As a result, ConvAI systems can be initiative systems focus on responding or react- perfect alternatives in helping us gather informa- ing to user requests. These systems can perform tion from, or disseminate information to, a large specific tasks at the user’s request (e.g., Wino- population for social benefit with minimal require- grad’s (1971) SHRDLU or question answering ments to the technological infrastructure. What systems) or keep the user company and soothed is more, ConvAI systems can potentially be more (e.g., ELIZA Weizenbaum, 1966). Recent work easily personalized than human operators when has placed more emphasis on mixed-initiative di- serving a large population (e.g., adjusting the vol- alogues, where interlocutors take turns to direct ume and/or rate of speech if the user has diffi- the conversation. This is also reflected in many culty in hearing). With data properly handled, ConvAI systems that serve a large number of users, they can also be better at preserving the privacy such as XiaoIce (Zhou et al., 2020) and Alexa of individuals using them. These properties make Prize socialbots (Gabriel et al., 2020). ConvAI systems promising candidates for tasks The community has also witnessed an evolution where a more personal-level connection is bene- in format of the conversations systems are pre- ficial, and especially where the stakes of privacy pared to engage. Task-oriented conversations re- is high. Such tasks might range from lower-stake main a popular problem to tackle, as they help hu- ones that inform the public about conservation man users to achieve concrete goals, such as book- and recycling in an attempt to bring behavioral ing a restaurant or turning on the light (e.g., conver- change, to higher-stake ones that consult individ- sations in the MultiWOZ dataset (Budzianowski uals on mental health issues or detecting suicidal et al., 2018)). Similar are experiment-based di- tendency. alogues where users perform experiments within an environment with the help of an ConvAI 3 Opportunities for Conversational AI system (e.g., SHRDLU or data analytics LUIs). Systems for Social Good These conversations often take place in a rela- tively closed world and involve well-framed prob- In this section, we will focus on discussing and il- lems. On the other end of the spectrum, chitchat lustrating how existing ConvAI technologies and ConvAI systems (or chatbots) engage and enter- systems can be applied for social good. This sec- tain the user without a predefined agenda, goal, or tion is organized around how ConvAI systems can even limit on topics. More recent research has also be applied to help us approach the UN’s Sustain-
able Development Goals (SDGs). For each SDG, tems can be made so that they only require natural we will focus on providing what in our perspec- language interactions (instead of keypad), further tive are salient but sometimes lesser-known exam- making it accessible to a broader demographic to ples to illustrate how today’s ConvAI technology collect answers. can be applied, but we acknowledge that this is Public health is not just about physical health. far from a comprehensive review. As our effort to Aside from physical health, mental well-being is avoid the pitfall of technological solutionism, we also of great importance, though public awareness, will focus on the SDGs where we see ConvAI tech- discovery, and intervention are still lagging behind nology having a more direct impact. (World Health Organization, 2013). With the right 3.1 Good Health and Well-being (SDG #3) set of tools, ConvAI systems have the great poten- tial of helping us uncover potential mental issues As the world is swept by the COVID-19 pandemic (stress, anxiety, depression, among others), and and quarantine measures in response, health and even administer the proper intervention. As the mental well-being is becoming more and more the pandemic exerts great mental stress on the public center issue on people’s mind. Many in the tech- (World Health Organization, 2020), one need not nology community wonder what we can do to help look far from our COVID-19 survey example to improve the situation, if anything. find a potential candidate (see Figure 1(b)). Smart assistants can help inform the public. Mental health issues can potentially be miti- The smart voice assistants in many homes is one gated outside of active counseling. We should way through which ConvAI technologies can help by no means limit ourselves to only considering inform the public about practical guidelines in this mental health issues when times are challenging, global health crisis. ConvAI agents can act as a as they are affecting the lives of many even with- source for answers to frequently asked questions out the pandemic. For instance, suicide was iden- regarding the pandemic, such as “What are the tified as the second leading cause of death in 2016 common symptoms of COVID-19?” and “What for young adults between 15 and 29 years of age are some best practices to prevent the spread of (World Health Organization, 2019). While chat- COVID-19?” (Eddy, 2020). Operating in a user- bots have been developed for counseling and/or initative, knowledge-based manner in a relatively remedying certain mental issues (Han et al., 2013; closed domain with trusted sources for answers al- Fadhil and AbuRa’ed, 2019; Simonite, 2020), they lows them to mitigate the risk of misinformation. require willing participants to make a difference. More effective and equitable public health mea- On the other hand, as ConvAI systems are already sures via telephone. Effective and equitable being widely deployed to offer customers pre-sale public health measures should be able to reach a and post-sale support in task-oriented dialogues, it much broader population than just those who have is also important to understand that these systems access to stable Internet connections and smart can also be designed more responsibly to detect, devices. Telephone surveys are an irreplaceable monitor, and respond to extreme emotions and be- means for understanding the spread and effect of haviors. With the reach of these online platforms the pandemic (World Bank, 2021). A timely up- and their ConvAI systems that often directly inter- date on a critical mass in the population is crucial act with users in place of human agents, seemingly to the control and monitoring of a rapidly devel- simple design choices can literally result in the dif- oping pandemic, but it is also tedious and labor- ference between life and death (e.g., (XinhuaNet, intensive. One potential way that ConvAI sys- 2020) where one such system notified help in time tems can help improve this (and similar future) sit- to prevent a suicide attempt via drug purchase). uation is conducting automated surveys over the phone. Since surveys are usually designed to be 3.2 Quality Education (SDG #4) well-structured with a relatively one-sided flow Help instructors be more effective. Making ef- of information, a ConvAI system can engage in fective use of computers to aid human instructors system-initiative conversations with a large num- better achieve their educational goals has been ber of users and collect their responses relatively the pursuit of the computer-assisted instruction easily (see Figure 1(a) for an example). With (CAI) community for decades (Graesser et al., the help of the ASR technologies today, such sys- 1999; Yacef, 2002; Olney et al., 2012). Combined
SYSTEM : Hello, this is the CDC calling to conduct a short survey in an effort to monitor the spread of COVID- (system inquires about flu-like symptoms) USER : Yes, I am coughing a little these two 19 in the city. Is this Ms. X? USER : Umm... Yes? days. I’m a little concerned. SYSTEM : I totally understand how you must SYSTEM : Hi Ms. X, may I ask if you have recently ex- feel. Are you experiencing fevers over 38 de- perienced flu-like symptoms like fever, dry cough, short- grees Celsius or 100 degrees Fahrenheit? ness of breath, or fatigue? USER : No, no fevers. Am I going to be okay? USER : No, none of those. SYSTEM : I see, there is no need to panic. I SYSTEM : I see. Have you had in-person contact with have recorded your information and will flag anyone who recently traveled to City Y in the past a health worker to check-in with you later in month? the day. In the meantime, please take your USER : No, not that I’m aware of. Not in the past month. body temperature twice a day, drink enough SYSTEM : I see. Given recent developments of the pan- fluids, and get plentiful rest. If you still demic, please continue to avoid indoor or outdoor gath- feel unwell and experience fevers or chills, erings, avoid unnecessary travels, and wear a mask when please visit a designated fever clinic. Call... you are in public spaces. Thank you for your time! (b) Pandemic survey with emotional support (a) Pandemic survey over the phone Figure 1: Two illustrative examples of pandemic survey over the phone using a ConvAI system. Besides gathering information for the survey (a), the system can be equipped with relatively simple natural language understanding and/or emotion detection tools to help offer appeasing messages and helpful advice (b). with domain knowledge and teaching strategies in- ing for teaching resource (e.g., asking questions in spired by human instructors, these conversational class). These agents can further be personalized to agents can potentially distribute the experience of fit each learner’s habit and better help each individ- quality one-on-one tutoring more broadly. This ual tackle long-term goals, such as building one’s can be especially helpful in communities that ex- vocabulary or preparing for a test, with helpful re- perience a shortage in supply of qualified educa- minders, check-ins, and interactive exercises. tors due to economic and/or geographic reasons, or where in-person learning is limited and engag- 3.3 Reducing Inequalities (SDG #10) ing learners via one-to-one instruction becomes more difficult (Becker et al., 2020). While teach- Our society evolves at breakneck speed, espe- ing dialogues are often mixed-initiative, which cially when one retrospects on the technological are more challenging, we could pursue relatively advances over the recent decades. In the mean- closed subsets of the conversation space by of- time, social inequalities also deteriorate as differ- fering experiment-based help for solving specific ent countries, communities, or individuals benefit multi-step problems, for instance. With direct from these advances differently. ConvAI is well- contact to the learners, ConvAI systems can also positioned to reduce some of the inequalities we be equipped to detect learning disabilities (Håvik experience. et al., 2018). This unique advantage will not only Equitable policies begin with equitable access allow the ConvAI systems themselves to adapt, to governments. One of the essential needs that but also potentially inform the human instructors ConvAI systems are well-equipped to help address to better cater to each learner’s unique learning is that each community can easily express their needs. needs to local officials to inform policy-making. Available beyond the classroom. Besides im- Similar to the telephone survey we described in proving the bandwidth and teaching style of tra- Section 3.1, a system-initiative ConvAI agent over ditional classroom learning, ConvAI systems can the phone can not only help gather community also engage learners better outside the physical feedback, but also help organize it for authorities or virtual classroom. Not being bound to limited to process more efficiently, expanding the band- office hours, relatively simple system-initiative width in public opinion intake (Androutsopoulou ConvAI helpers can be made available at virtually et al., 2019). Once an issue has been resolved, any time to a student. Having exclusive access depending on the subject matter, ConvAI systems to a virtual learning assistant can also potentially can also help inform the original caller of the reso- help eliminate the effect of imbalance in compet- lution and gather further feedback if necessary.
Disability and quality of life should not be mu- 3.4 Shared Economic Growth (SDGs #1) tually exclusive. Aside from gaining access to the public discourse, it is also important that ev- Besides aging, disability, and other personal traits, eryone is able to enjoy their private life, especially social inequalities within and between countries the benefits brought by technological development. are often rooted in the socioeconomic statuses of People with disabilities are too often left behind the population. It is imperative that we consider by the convenience of modern life. Natural lan- technological solution to not only reduce existing guage processing has great potential in improving inequalities, but also eradicate its source when pos- life quality for the visually and hearing impaired sible. Perhaps one of the most important and sus- through assistive technologies (e.g., audio descrip- tainable approach towards ending poverty is gen- tion for movies (Rohrbach et al., 2017) and closed erating and providing decent jobs with reasonable captioning via ASR). However, many existing as- pay, so that people can afford improved liveli- sistive language technologies provide one-sided in- hoods and pull themselves out of economic limbo. terfaces where people are only receiving informa- tion from the system. We argue that ConvAI tech- ConvAI can positively affect people aside from nologies can be applied in many of these applica- directly serving them. Despite our focus so far tions to bring an interactive experience that further on applications of ConvAI for social good, one as- extend the potential for them. For instance, aug- pect that is commonly neglected is how these sys- mented with knowledge-based visual question an- tems are built. Data is critical to modern (Conv)AI swering (Antol et al., 2015) and visual dialogue systems, and the process of acquiring this data of- (Das et al., 2017), movie description systems can ten involve paid annotation teams working in a further help the visually impaired explore scenes controlled setting. The nature of ConvAI makes in creative art, providing a fuller viewing experi- it a great candidate for redistributing the payout ence. of data collection to the disadvantaged, because: (1) natural conversation data is in great need, which is easy to generate without too much train- Fight inequality by reinforcing existing mecha- ing (e.g., using a Wizard-of-Oz approach (Kelley, nisms. Inequality is not new to our society, and 1984; Budzianowski et al., 2018)), and can usually it predates most of the technological advances we be collected in a safe environment;2 (2) ConvAI have discussed in this paper. To reduce inequal- data can be delivered via telephone/text messaging ity brought about by various societal changes, we (SMS) systems if necessary, which is much more have collectively come up with many solutions widely available; (3) as a result of the wider reach, in human history, one of which is charity orga- the resulting data is also more diverse, and will nizations that help redistribute resources to peo- help ConvAI systems better adapt to different data ple and causes in need. Besides directly helping variations, including those that would naturally oc- people affected by inequality, can ConvAI tech- cur in some of the aforementioned scenarios where nologies benefit social good by making a positive ConvAI can be applied (over the phone or SMS). impact on these organizations? Recently, Wang et al. (2019) demonstrated an inspiring angle to 4 Challenges for Conversational AI answer this question, where ConvAI agents are Systems for Social Good trained to persuade the interlocutor to make char- itable donations to certain causes. They showed In the previous section, we have explored vari- that these agents can potentially be personalized ous opportunities that current ConvAI technolo- to the psychological backgrounds of different per- gies can contribute to social good. In this section, suadees to promote positive outcome. For in- we turn to potential challenges we need to tackle to stance, when attempting to persuade someone who better realize this goal, and articulate some of the is more open (as defined in the Big-Five traits risks that might lie within the further development (Goldberg, 1992)), inviting them to learn about and deployment of ConvAI technology in the real the cause/organization (termed source-related in- world. quiry by Wang et al. (2019)) could be helpful, e.g., “Have you heard of Save the Children?” or “Are 2 Here, we focus on the raw speech/text data, and acknowl- you familiar with the organization?”. edge that they usually require separate post-processing.
4.1 Shared Challenges with ML/NLP for 4.2 Technical challenges for ConvAI Social Good In this section, we will focus on technical chal- Before diving into issues that are more unique to lenges and research frontiers that are more closely ConvAI, we briefly review some of the key chal- related to ConvAI (but not necessarily exclu- lenges and considerations that CovnAI systems sively). share with other machine learning (ML) or natu- Know thy interlocutor. One of the fundamen- ral language processing (NLP) technologies when tal goals of ConvAI is to facilitate communica- applied for social good. tion, therefore it is important to take into consid- Problem-centric, not tech-centric. One of the eration whether the system is communicating in common pitfalls is technology-centered solution- a manner that is clear and effective for the inter- ism. Tuomi (2018) summarized its origins and locutor. This is particularly important consider- risks aptly in an European Union Science for Pol- ing the increased diversity of population we set icy report on AI’s impact on education: out to reach in applications for social good. One aspect to factor into consideration when design- In the stage of technology push, technol- ing ConvAI systems to communicate with people ogy experts possess scarce knowledge ... from diverse backgrounds is to understand how [which] often dominates and overrides they would perceive the same message or mode of other types of knowledge ... this can be- communication, as miscommunication can some- come a problem as technologists easily times easily slip detection and lead to catastrophic transfer their own experiences and be- outcomes (Wikipedia contributors, 2021). We liefs about learning to their designs. urge researchers to consider using guidelines like Although technologists do bring fresh perspec- Datasheets (Gebru et al., 2018) to help calibrate tives, it is crucial to remember that when devel- the collection and use of data, and when possible, oping technology to solve problems, domain ex- involve members of communities the system is in- perts are more knowledgeable of the mechanisms, tended for in the process. causes, and nuances regarding the subject matter. Go beyond words. Besides a better understand- Green (2019) also warns us against the dangers ing of users that the system is designed to interact of underdefined or short-sighted metrics of social with, we can also make ConvAI systems communi- good, which are sometimes the result of lack of cate more effectively with the help of other modal- communication between technologists and subject ities than just text and speech. As interactive tech- matter experts. nologies like high-quality computer graphics, ro- By extension, it is also important to assess bust computer vision, haptics devices, augmented whether ConvAI (or other approaches familiar to and virtual reality (AR/VR), and embodied tech- the technologists) is significantly better than alter- nology become increasingly available to the pub- native solutions. While requiring simpler equip- lic, there are also great opportunities for computer ment to deliver, language interfaces are not always scientists from these different areas to join forces the most efficient if graphical user interfaces are in the quest towards better interactive computer applicable (e.g., providing directions in a build- systems (e.g., (BAAI, 2020)). These will also help ing). broaden our horizon, and potentially help enable Avoid Amplifying Bias. It is also important to interactive conversational systems for social good avoid exacerbating existing social biases or in- that is beyond our current imagination. However, equalities. Today, this is commonly rooted in the one of the obstacles of further developments in this kind of data available for many ML applications, area, we believe, involves data sharing, which we from facial recognition (Garvie and Frankle, 2016) will detail in the next section. to automatic speech recognition (Koenecke et al., Faster, Higher, Greener. As we pack more fea- 2020), to what is more and more recognized in tures into ConvAI systems, it is always useful to virtually any NLP application, the lack of linguis- remember that at the end of the day, these systems tic diversity (Joshi et al., 2020). It is important to are built to interact with humans in real time. Thus, carefully design and curate the data used to build besides being able to communicate effectively, rea- these systems, especially in a social good setting sonable response time is of great importance com- to avoid making a bad situation worse. pared to off-line ML systems. On the other hand,
the great opportunity for ConvAI to help in various fundamental ways to alleviate this bias is to col- settings and grounded situations should also en- lect data that is representative of the target audi- courage us to investigate better methods for trans- ence of our system. (SDG #10) Aside from rep- ferring between different settings with higher data- resentation of the population, what can usually efficiency. For instance, once we have built a well- be neglected is the representation of actual users designed system for pandemic survey for COVID- of the system. In the ConvAI research literature, 19, it can ideally help with surveying other infec- many datasets are collected from paid crowdwork- tious diseases without significant effort in data col- ers rather than actual users that the systems are in- lection, training, or system configuration. Both of tended to serve, which might not faithfully repre- these goals are not only practical and desirable for sent user behavior (de Vries et al., 2020). Further- ConvAI developers and social good stakeholders, more, as most datasets are collected via a Wizard- but potentially also helpful in reducing the compu- of-Oz approach, even if we were able to simulate tational cost and climate impact of these systems the intended users with crowdworkers, they will while they are serving to address societal problems be conversing with a “perfect” system that doesn’t (SDGs #12 Responsible Consumption and Pro- make certain mistakes that a trained ConvAI sys- duction & #13 Climate Action). We believe that tem would. This divide is crucial to realize and the rise of multi-domain dialogue datasets (e.g., mitigate in future work, especially for real-world MultiWOZ, Budzianowski et al., 2018) are a great applications. first step in this area. Another challenge that threatens the data quality It takes two... or more. Although so far in this for ConvAI concerns systems that adapt on the fly paper we have largely restricted our attention to to data collected during the time they are serving. conversations between one human user and one On one hand, overly relying on usage data could ConvAI system, this is by no means the only form potentially obfuscate or even exacerbate the im- of conversation that naturally occurs, or in the con- pact of blind spots in the system, as humans tend text of social good. One of the common scenar- to adjust their language use to accommodate the ios that multi-party conversations naturally occur perceived properties of the interlocutor (Giles and is peer support groups (SDG #3), where ConvAI Coupland, 1991). If a component in a ConvAI sys- can potentially serve as the coordinator between tem does not meet a user’s initial expectation, in- multiple human speakers to help guide the conver- stead of trying the same thing over and over again, sation (Nordberg et al., 2019). To the best of our they are much more likely to accommodate the knowledge, ConvAI research in this direction re- system by either enunciating or avoid the compo- mains relatively limited. nent entirely if possible. It is thus important to look beyond the survivorship bias that is usually 4.3 Social challenges for ConvAI present in such usage data. On the other hand, sys- Now that we have discussed the technical chal- tems made publicly available are more vulnerable lenges facing ConvAI and its application in social to data poisoning attacks (Nelson et al., 2008; Ru- good in the near future, we turn to social chal- binstein et al., 2009). Without sufficient defense lenges that are unlikely addressed solely by techni- and audit mechanisms, a well-intended system can cal advances, and/or that require a wider societal easily and quickly be polluted through its vulnera- awareness and collective effort to mitigate. bilities, and turned against the very people it set out to serve (e.g., Microsoft’s Tay bot (Vincent, Data sourcing and quality. The source of train- 2016)). ing and evaluation data is one of the most impor- tant aspects to consider when building and deploy- Data sharing. For the sustained development of ing modern AI systems. This is because not only ConvAI and its adoption in social good, a coor- does data affect the behavior of these systems, but dinated effort between academia and the industry it also often functions as a standard to compare and is also essential. Despite the abundance of tal- evaluate different candidates before “better” sys- ent and research freedom, academia usually lacks tems are chosen and deployed. the means to obtain substantial amounts of data As we have discussed in Section 4, it is impor- for ConvAI, especially data that reflect the “real- tant that we don’t exacerbate existing social biases istic data distribution”. In contrast, the industry when building these systems, and one of the most has a broader access to real-world data and prob-
lems, yet usually limited research staff to tackle all cause of its coherent and convincing posts, it went of them. While their interests somewhat diverge undetected for weeks (Heaven, 2020). between academically interesting, abstract, and This poses a higher risk in ConvAI systems open research projects (academia) versus propri- due to their interactive nature, as their misuse can etary technology and practical solutions for prob- lead to more elaborate and convincing deception lems at hand (industry), we would like to highlight schemes such as spreading mis/disinformation or that their shared common interests in empirically scamming. Seemingly benign technology that al- sound technology that hopefully satisfy a wide low ConvAI systems to assume a coherent iden- range of real-world needs. Therefore, we would tity (Li et al., 2016) can potentially help some- like to advocate closer collaborations between the one impersonate someone else online, akin to how two on how to collect, convert, anonymize, and generative adversarial networks (Goodfellow et al., share conversational data (especially data less pro- 2014) have lead to convincing deepfakes. This prietary in nature) and work on repeatable evalua- would not only jeopardize causes for social good tion methods, so that the broader research commu- that we have advocated for in this paper, but can nity can be exposed to unique problems rooted in potentially cause significant damage to our society real-world issues that can generate great positive if left unchecked. We urge the community to re- impact. flect on the ramifications of misuse of current AI Aggression from human users. As ConvAI sys- technologies, and devote time and energy to miti- tems are deployed in the wild, it is not uncommon gating their societal risks. that they meet aggression from human users from Transparency and trustworthiness. When time to time, and managing user frustration is an ConvAI systems interact with the general popu- important topic. Beyond helping the user achieve lation, they face the choice of whether to declare their goals, however, we should also watch out for their identity as a computer system or robot. This how these interactions can potentially affect our active act of transparency is not always designed interaction with other humans. into systems for various reasons (e.g., Google Most ConvAI systems available today on the Duplex (Google, 2018)), and as a result might market project a female persona (exclusively or sometimes have unintended consequences (Garun, by default), which is even more pronounced when 2019). Although the lack of transparency in this they interact using speech (Tay et al., 2014; UN- case is not a deceptive act for malignant purposes, ESCO and EQUALS Skills Coalition, 2019). As the restaurants’ perplexed reaction to Duplex they inevitably fail to fulfill our requests from clearly indicates that it is not inconsequential. time to time, these systems can become the tar- Another aspect related to transparency is trust- get of frustration, which may also inadvertently af- worthiness, which is essential for any AI system, fect how the new generation interact with ConvAI especially ConvAI systems that interact with hu- systems as they grow up around them (Elgan, mans directly. To this end, we argue that ConvAI 2018; Rudgard, 2018). Setting aside the debate systems deployed in the real world should be able about child education, it is important and respon- to recognize their limits of capabilities, and com- sible that we better understand whether this phe- municate these limits clearly. This can be a crucial nomenon mirrors actual human interactions, and step in earning trust especially in the event that the how it would contribute to social interactions in system has misstepped as a result of misidentify- the long term. If these devices are helping instill ing user intent and sentiment, or even mischarac- or reinforce stereotypes of genders and gender dy- terized user profile in an attempt to personalize to namics, it is the responsibility of the community to the user. raise public awareness and help advocate means to mitigate this effect (SDG #5 Gender equality). 5 Concluding Remarks Misuse for Deception. Technological advances In this paper, we have introduced how conversa- are almost always double-edged swords. As NLP tional artificial intelligence (ConvAI) systems can systems become more advanced, it will inevitably be applied to causes for social good today and in be easier to make them appear more human-like. the near future, and summarized some of the ma- For instance, GPT-3 (Brown et al., 2020) was used jor technical and social challenges the community to post on behalf of an account on Reddit, and be- still needs to tackle down the line. We acknowl-
edge that the opportunities and risks we included Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen- in this paper are by no means comprehensive, but tau Yih, Yejin Choi, Percy Liang, and Luke Zettle- moyer. 2018. QuAC: Question answering in context. an attempt of us to summarize and introduce what In Proceedings of the 2018 Conference on Empiri- in our opinion are typical and atypical scenarios cal Methods in Natural Language Processing, Brus- where ConvAI can be impactful or concerning to sels, Belgium. Association for Computational Lin- the community. We refer the reader to (Ruane guistics. et al., 2019) for a more comprehensive view of so- Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad cial and ethical considerations of ConvAI. Goel, and Aziz Huq. 2017. Algorithmic decision This paper is part of a broader conversation making and the cost of fairness. In Proceedings around the impact of AI systems in the real world, of the 23rd acm sigkdd international conference on including the very definition of social good (Green, knowledge discovery and data mining, pages 797– 806. 2019), transparency and interpretablility (Doshi- Velez and Kim, 2017), as well as algorithmic fair- Abhishek Das, Satwik Kottur, Khushi Gupta, Avi ness (Corbett-Davies et al., 2017), among others. Singh, Deshraj Yadav, José MF Moura, Devi Parikh, We hope that our paper can serve as a starting point and Dhruv Batra. 2017. Visual dialog. In Proceed- for the community brainstorming of various oppor- ings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 326–335. tunities and potential risks for ConvAI to promote social good and betterment. Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608. References Nathan Eddy. 2020. Mayo Clinic adds COVID- Aggeliki Androutsopoulou, Nikos Karacapilidis, Euri- 19 skills to Amazon Alexa. https://www. pidis Loukis, and Yannis Charalabidis. 2019. Trans- healthcareitnews.com/news/mayo-clinic- forming the communication between citizens and adds-covid-19-skills-amazon-alexa. government through AI-guided chatbots. Govern- Accessed: January 15, 2021. ment Information Quarterly, 36(2):358 – 367. Mike Elgan. 2018. The case against teaching Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Mar- kids to be polite to Alexa. https://www. garet Mitchell, Dhruv Batra, C Lawrence Zitnick, fastcompany.com/40588020/the-case- and Devi Parikh. 2015. VQA: Visual question an- against-teaching-kids-to-be-polite-to- swering. In Proceedings of the IEEE international alexa. Accessed: January 22, 2021. conference on computer vision, pages 2425–2433. Ahmed Fadhil and Ahmed AbuRa’ed. 2019. OlloBot BAAI. 2020. BAAI-JD multimodal dialog challenge. - towards a text-based Arabic health conversational https://jddc.jd.com. Accessed: January 22, agent: Evaluation and results. In Proceedings of 2021 (in Chinese). the International Conference on Recent Advances in Stephen P. Becker, Rosanna Breaux, Caroline N. Natural Language Processing (RANLP 2019), pages Cusick, Melissa R. Dvorsky, Nicholas P. Marsh, 295–303, Varna, Bulgaria. INCOMA Ltd. Emma Sciberras, and Joshua M. Langberg. 2020. Remote learning during COVID-19: Examining Raefer Gabriel, Yang Liu, Anna Gottardi, Mihail Eric, school practices, service continuation, and difficul- Anju Khatri, Anjali Chadha, Qinlang Chen, Behnam ties for adolescents with and without attention-d- Hedayatnia, Pankaj Rajan, Ali Binici, Shui Hu, eficit/hyperactivity disorder. Journal of Adolescent Karthik Gopalakrishnan, Seokhwan Kim, Lauren Health, 67(6):769 – 777. Stubel, Kate Bland, Arindam Mandal, and Dilek Hakkani-Tür. 2020. Further advances in open do- Tom B Brown, Benjamin Mann, Nick Ryder, Melanie main dialog systems in the third alexa prize socialbot Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind grand challenge. Technical report, Amazon Alexa Neelakantan, Pranav Shyam, Girish Sastry, Amanda AI. Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165. Natt Garun. 2019. One year later, restaurants are still confused by Google Duplex. https://www. Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang theverge.com/2019/5/9/18538194/google- Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ra- duplex-ai-restaurants-experiences- madan, and Milica Gašić. 2018. MultiWOZ - a review-robocalls. Accessed: January 10, 2021. large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In Proceedings of Clare Garvie and Jonathan Frankle. 2016. Facial- the 2018 Conference on Empirical Methods in Natu- recognition software might have a racial bias prob- ral Language Processing. lem. The Atlantic, 7.
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Jennifer Wortman Vaughan, Hanna Wallach, Hal Bali, and Monojit Choudhury. 2020. The state and Daumé III, and Kate Crawford. 2018. Datasheets fate of linguistic diversity and inclusion in the NLP for datasets. arXiv preprint arXiv:1803.09010. world. In Proceedings of the 58th Annual Meet- ing of the Association for Computational Linguistics, Howard Giles and Nikolas Coupland. 1991. Accommo- pages 6282–6293, Online. Association for Computa- dating language. Open University Press. tional Linguistics. Lewis R Goldberg. 1992. The development of mark- John F Kelley. 1984. An iterative design methodology ers for the big-five factor structure. Psychological for user-friendly natural language office information assessment, 4(1):26. applications. ACM Transactions on Information Sys- tems (TOIS), 2(1):26–41. Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Allison Koenecke, Andrew Nam, Emily Lake, Joe Courville, and Yoshua Bengio. 2014. Generative ad- Nudell, Minnie Quartey, Zion Mengesha, Connor versarial networks. arXiv preprint arXiv:1406.2661. Toups, John R Rickford, Dan Jurafsky, and Sharad Goel. 2020. Racial disparities in automated speech Google. 2018. Google Duplex: An AI sys- recognition. Proceedings of the National Academy tem for accomplishing real-world tasks over of Sciences, 117(14):7684–7689. the phone. https://ai.googleblog.com/ 2018/05/duplex-ai-system-for-natural- Jiwei Li, Michel Galley, Chris Brockett, Georgios Sp- conversation.html. Accessed: January 10, ithourakis, Jianfeng Gao, and Bill Dolan. 2016. A 2021. persona-based neural conversation model. In Pro- ceedings of the 54th Annual Meeting of the Associa- Arthur C Graesser, Katja Wiemer-Hastings, Peter tion for Computational Linguistics (Volume 1: Long Wiemer-Hastings, and Roger Kreuz. 1999. AutoTu- Papers), pages 994–1003, Berlin, Germany. Associ- tor: A simulation of a human tutor. Cognitive Sys- ation for Computational Linguistics. tems Research, 1(1):35 – 51. Ben Green. 2019. “Good” isn’t good enough. In Blaine Nelson, Marco Barreno, Fuching Jack Chi, An- NeurIPS Joint Workshop on AI for Social Good. thony D Joseph, Benjamin IP Rubinstein, Udam Saini, Charles A Sutton, J Doug Tygar, and Kai Xia. Sangdo Han, Kyusong Lee, Donghyeon Lee, and 2008. Exploiting machine learning to subvert your Gary Geunbae Lee. 2013. Counseling dialog sys- spam filter. LEET, 8:1–9. tem with 5W1H extraction. In Proceedings of the SIGDIAL 2013 Conference, pages 349–353, Metz, Oda Elise Nordberg, Jo Dugstad Wake, Emilie Sektnan France. Association for Computational Linguistics. Nordby, Eivind Flobak, Tine Nordgreen, Suresh Ku- mar Mukhiya, and Frode Guribye. 2019. Designing Robin Håvik, Jo Dugstad Wake, Eivind Flobak, As- chatbots for guiding online peer support conversa- tri Lundervold, and Frode Guribye. 2018. A con- tions for adults with ADHD. In International Work- versational interface for self-screening for ADHD in shop on Chatbot Research and Design, pages 113– adults. In International Conference on Internet Sci- 126. Springer. ence, pages 133–144. Springer. Andrew M Olney, Sidney D’Mello, Natalie Person, He He, Anusha Balakrishnan, Mihail Eric, and Percy Whitney Cade, Patrick Hays, Claire Williams, Blair Liang. 2017. Learning symmetric collaborative dia- Lehman, and Arthur Graesser. 2012. Guru: A com- logue agents with dynamic knowledge graph embed- puter tutor that models expert human tutors. In Inter- dings. In Proceedings of the 55th Annual Meeting of national conference on intelligent tutoring systems, the Association for Computational Linguistics (Vol- pages 256–261. Springer. ume 1: Long Papers), pages 1766–1776, Vancouver, Canada. Association for Computational Linguistics. Antoine Raux, Brian Langner, Dan Bohus, Alan W Black, and Maxine Eskenazi. 2005. Let’s go pub- He He, Derek Chen, Anusha Balakrishnan, and Percy lic! taking a spoken dialog system to the real world. Liang. 2018. Decoupling strategy and generation in In Ninth European conference on speech communi- negotiation dialogues. In Proceedings of the 2018 cation and technology. Conference on Empirical Methods in Natural Lan- guage Processing, pages 2333–2343, Brussels, Bel- Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, gium. Association for Computational Linguistics. Niket Tandon, Christopher Pal, Hugo Larochelle, Aaron Courville, and Bernt Schiele. 2017. Movie Will Douglas Heaven. 2020. A GPT-3 bot posted description. International Journal of Computer Vi- comments on reddit for a week and no one no- sion, 123(1):94–120. ticed. https://www.technologyreview.com/ 2020/10/08/1009845/a-gpt-3-bot-posted- Elayne Ruane, Abeba Birhane, and Anthony Ven- comments-on-reddit-for-a-week-and-no- tresque. 2019. Conversational AI: Social and ethical one-noticed/. Accessed: January 21, 2021. considerations. In AICS, pages 104–115.
Benjamin IP Rubinstein, Blaine Nelson, Ling Huang, World Bank. 2021. LSMS-supported high-frequency Anthony D Joseph, Shing-hon Lau, Satish Rao, Nina phone surveys on covid-19. https://www. Taft, and J Doug Tygar. 2009. Antidote: understand- worldbank.org/en/programs/lsms/brief/ ing and defending against poisoning of anomaly de- lsms-launches-high-frequency-phone- tectors. In Proceedings of the 9th ACM SIGCOMM surveys-on-covid-19. Accessed: January 19, conference on Internet measurement, pages 1–14. 2021. Olivia Rudgard. 2018. ‘alexa generation’ may be learn- World Health Organization. 2013. Comprehensive ing bad manners from talking to digital assistants, mental health action plan: 2013–2020. https:// report warns. https://www.telegraph.co. apps.who.int/gb/ebwha/pdf_files/WHA66/ uk/news/2018/01/31/alexa-generation- A66_R8-en.pdf. Accessed: January 19, 2021. could-learning-bad-manners-talking- digital/. Accessed: January 10, 2021. World Health Organization. 2019. Suicide. https:// www.who.int/news-room/fact-sheets/ Tom Simonite. 2020. The therapist is in—and it’s a detail/suicide. Accessed: January 19, 2021. chatbot app. https://www.wired.com/story/ therapist-in-chatbot-app/. Accessed: Jan- World Health Organization. 2020. COVID-19 disrupt- uary 20, 2021. ing mental health services in most countries, WHO survey. https://www.who.int/news/item/ Benedict Tay, Younbo Jung, and Taezoon Park. 2014. 05-10-2020-covid-19-disrupting-mental- When stereotypes meet robots: the double-edge health-services-in-most-countries-who- sword of robot gender and personality in human– survey. Accessed: January 19, 2021. robot interaction. Computers in Human Behavior, 38:75–84. XinhuaNet. 2020. Precisely detecting desperate emo- tions, this AI customer service saved a life by Ilkka Tuomi. 2018. The impact of artificial intelligence alerting help. http://www.xinhuanet.com/ on learning, teaching, and education. Technical re- tech/2020-06/24/c_1126152953.htm. Ac- port, Joint Research Centre. cessed: January 19, 2021 (in Chinese). UNESCO and EQUALS Skills Coalition. 2019. I’d K. Yacef. 2002. Intelligent teaching assistant systems. blush if I could: closing gender divides in digital In International Conference on Computers in Educa- skills through education. https://unesdoc. tion, 2002. Proceedings., pages 136–140 vol.1. unesco.org/ark:/48223/pf0000367416. page=1. Accessed: January 21, 2021. Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum. James Vincent. 2016. Twitter taught Microsoft’s 2020. The design and implementation of xiaoice, an AI chatbot to be a racist asshole in less than a empathetic social chatbot. Computational Linguis- day. https://www.theverge.com/2016/3/24/ tics, 46(1):53–93. 11297050/tay-microsoft-chatbot-racist. Accessed: January 21, 2021. Harm de Vries, Dzmitry Bahdanau, and Christopher Manning. 2020. Towards ecologically valid re- search on language user interfaces. arXiv preprint arXiv:2007.14435. Xuewei Wang, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. 2019. Per- suasion for good: Towards a personalized persuasive dialogue system for social good. In Proceedings of the 57th Annual Meeting of the Association for Com- putational Linguistics. Joseph Weizenbaum. 1966. Eliza—a computer pro- gram for the study of natural language communica- tion between man and machine. Communications of the ACM, 9(1):36–45. Wikipedia contributors. 2021. Avianca flight 52 — Wikipedia, the free encyclopedia. [Online; accessed 22-January-2021]. Terry Winograd. 1971. Procedures as a represen- tation for data in a computer program for under- standing natural language. Technical report, MAS- SACHUSETTS INST OF TECH CAMBRIDGE PROJECT MAC.
400 dialog(ue) chat conversation(al) Paper Count 200 0 1980 2000 2020 Year Figure 2: Statistics of papers in the ACL Anthology that mention “dialog(ue)”, “chat”, or “conversation(al)” explicitly in the title over the past five decades. A Supplemental Material In Figure 2, we take all the references from the ACL Anthology3 as of May 2021, and illustrate the growing interest in Conversational AI as is re- flected in direct mentions of “dialog(ue)”, “chat”, and “conversation(al)” in paper titles. As can be seen from the figure, there has been a renewed interested in ConvAI from the ACL community since the turn of the millennium, followed by a sharp growth in the past five years. 3 https://www.aclweb.org/anthology/
You can also read