Conversational AI Systems for Social Good: Opportunities and Challenges

 
CONTINUE READING
Conversational AI Systems for Social Good:
                                                                        Opportunities and Challenges

                                                  Peng Qi         Jing Huang
                                                                   Youzheng Wu       Xiaodong He Bowen Zhou
                                                                     JD AI Research
                                                                 Mountain View, CA 94043
                                         {peng.qi, jing.huang, wuyouzheng1, xiaodong.he, bowen.zhou}@jd.com

                                                                  Abstract                         allow users to convey their needs the same way
                                                                                                   they would to a human counterpart. Second, with
arXiv:2105.06457v1 [cs.CL] 13 May 2021

                                                Conversational artificial intelligence (ConvAI)
                                                                                                   the help of increasingly robust automatic speech
                                                systems have attracted much academic and
                                                commercial attention recently, making signif-      recognition (ASR) and speech synthesis (or text-
                                                icant progress on both fronts. However, lit-       to-speech, TTS) systems, ConvAI systems are a
                                                tle existing work discusses how these sys-         lot easier and inexpensive to access than many
                                                tems can be developed and deployed for so-         other technical solutions. One could gain access
                                                cial good. In this paper, we briefly review        to a ConvAI system as long as they can access a
                                                the progress the community has made towards        telephone, without needing Internet access, smart
                                                better ConvAI systems and reflect on how ex-       devices, or even a digital cellular network.
                                                isting technologies can help advance social
                                                good initiatives from various angles that are
                                                                                                      Both reasons also make ConvAI a great can-
                                                unique for ConvAI, or not yet become com-          didate technology for social good, because these
                                                mon knowledge in the community. We further         properties help the underlying technical solution
                                                discuss about the challenges ahead for ConvAI      reach a much broader population at virtually no ad-
                                                systems to better help us achieve these goals      ditional cost to the society, nor delay in implemen-
                                                and highlight the risks involved in their devel-   tation to the communities to be served. However,
                                                opment and deployment in the real world.           despite these inherent advantages and the grow-
                                                                                                   ing interest in ConvAI, there has been little ex-
                                          1 Introduction
                                                                                                   isting work that discusses how these systems and
                                         Conversational artificial intelligence (ConvAI)           technologies can be developed and used for social
                                         systems, or dialogue systems, are increasingly            good to the best of our knowledge.
                                         prevalent in our everyday lives, taking the form of          In this paper, we aim to better understand how
                                         smart assistants for home and hand-held devices           conversational AI technologies and systems can
                                         (e.g., Amazon’s Alexa, Apple’s Siri, Google As-           be developed and deployed for social good. To
                                         sistant) or friendly helpers with telephone banking       this end, we begin by briefly reviewing the com-
                                         or online customer service, to name a few. Con-           munity’s efforts and progress on ConvAI over the
                                         temporaneously, there has been a renewed inter-           past few decades. We then explore how existing
                                         est in ConvAI from the research community, evi-           technolgy can be, or have been, applied to various
                                         denced by the enthusiasm around the ConvAI chal-          scenarios to advance the United Nations’ Sustain-
                                         lenge1 and the growing number of publications in          able Development Goals (SDGs)for social good.
                                         the ACL Anthology that mention ConvAI-themed              We illustrate specific use cases with concrete ex-
                                         keywords in their titles (see Figure 2 in the ap-         amples, with an focus on application scenarios that
                                         pendix).                                                  are far from common knowledge. Finally, we con-
                                            Aside from the recent technical advances, we           clude with reflections on potential challenges and
                                         recognize two important reasons for their popular-        risks when we develop and deploy new ConvAI
                                         ity. First, ConvAI systems provide a natural lan-         technologies into the real world, focusing on the
                                         guage user interface (LUI) for their underlying ap-       societal impact and how one might mitigate them.
                                         plications, requiring little to no prior training on
                                         the users’ part to use them. In an ideal world,           2   Background on Conversational AI
                                         ConvAI technology would help us build LUIs that
                                                                                                   Communicating with computers in human lan-
                                            1
                                                https://convai.io/                                 guages has been a long-standing goal of com-
puter science since its early days. From intuitively      explored practical conversations across the spec-
named commands to more recent conversational              trum of different goal specificity, domain open-
AI systems, effective use of computer systems has         ness, as well as levels of grounding. Some take the
been made more and more accessible to average             familiar form of a ConvAI system assisting human
users.                                                    users (e.g., knowledge-based ConvAI, including
   For our discussion, we adopt an deliberately           question answering systems (Choi et al., 2018)),
broad definition of the term “Conversational              while others focus more on conversational skills
AI”: any system or technology that allows hu-             that featured in collaborative or competitive peer
man users to interact with computers via natu-            conversations (He et al., 2017, 2018).
ral language. This definition encompasses most               Before we move on to exploring how ConvAI
language-related interactive systems, including           systems can be applied for social good, we must
those based on audible speech interactions (e.g.,         first answer the following questions: once we have
CMU’s Let’s Go! system for bus information                built a ConvAI system, what effects can we expect
(Raux et al., 2005)).                                     it to have on the world around us? What proper-
                                                          ties make them more appealing to, say, their hu-
   Many early ConvAI systems give one of the con-
                                                          man counterparts? One of the most basic proper-
versationalists the exclusive agency to drive the
                                                          ties that makes ConvAI systems potentially more
conversation. System-initiative ConvAI agents
                                                          appealing to human operators is that they are more
typically follow a predefined dialogue plan and of-
                                                          readily available and scalable. This makes it pos-
fer users limited options. As a result, these sys-
                                                          sible for ConvAI services to be accessible dur-
tems tend to have an easier time understanding the
                                                          ing non-working hours, and allow them to cater
user as long as they are cooperating (e.g., auto-
                                                          to more users at the same time at virtually no
mated banking services). On the other hand, user-
                                                          marginal cost. As a result, ConvAI systems can be
initiative systems focus on responding or react-
                                                          perfect alternatives in helping us gather informa-
ing to user requests. These systems can perform
                                                          tion from, or disseminate information to, a large
specific tasks at the user’s request (e.g., Wino-
                                                          population for social benefit with minimal require-
grad’s (1971) SHRDLU or question answering
                                                          ments to the technological infrastructure. What
systems) or keep the user company and soothed
                                                          is more, ConvAI systems can potentially be more
(e.g., ELIZA Weizenbaum, 1966). Recent work
                                                          easily personalized than human operators when
has placed more emphasis on mixed-initiative di-
                                                          serving a large population (e.g., adjusting the vol-
alogues, where interlocutors take turns to direct
                                                          ume and/or rate of speech if the user has diffi-
the conversation. This is also reflected in many
                                                          culty in hearing). With data properly handled,
ConvAI systems that serve a large number of users,
                                                          they can also be better at preserving the privacy
such as XiaoIce (Zhou et al., 2020) and Alexa
                                                          of individuals using them. These properties make
Prize socialbots (Gabriel et al., 2020).
                                                          ConvAI systems promising candidates for tasks
   The community has also witnessed an evolution          where a more personal-level connection is bene-
in format of the conversations systems are pre-           ficial, and especially where the stakes of privacy
pared to engage. Task-oriented conversations re-          is high. Such tasks might range from lower-stake
main a popular problem to tackle, as they help hu-        ones that inform the public about conservation
man users to achieve concrete goals, such as book-        and recycling in an attempt to bring behavioral
ing a restaurant or turning on the light (e.g., conver-   change, to higher-stake ones that consult individ-
sations in the MultiWOZ dataset (Budzianowski             uals on mental health issues or detecting suicidal
et al., 2018)). Similar are experiment-based di-          tendency.
alogues where users perform experiments within
an environment with the help of an ConvAI                 3   Opportunities for Conversational AI
system (e.g., SHRDLU or data analytics LUIs).                 Systems for Social Good
These conversations often take place in a rela-
tively closed world and involve well-framed prob-         In this section, we will focus on discussing and il-
lems. On the other end of the spectrum, chitchat          lustrating how existing ConvAI technologies and
ConvAI systems (or chatbots) engage and enter-            systems can be applied for social good. This sec-
tain the user without a predefined agenda, goal, or       tion is organized around how ConvAI systems can
even limit on topics. More recent research has also       be applied to help us approach the UN’s Sustain-
able Development Goals (SDGs). For each SDG,           tems can be made so that they only require natural
we will focus on providing what in our perspec-        language interactions (instead of keypad), further
tive are salient but sometimes lesser-known exam-      making it accessible to a broader demographic to
ples to illustrate how today’s ConvAI technology       collect answers.
can be applied, but we acknowledge that this is        Public health is not just about physical health.
far from a comprehensive review. As our effort to      Aside from physical health, mental well-being is
avoid the pitfall of technological solutionism, we     also of great importance, though public awareness,
will focus on the SDGs where we see ConvAI tech-       discovery, and intervention are still lagging behind
nology having a more direct impact.                    (World Health Organization, 2013). With the right
3.1 Good Health and Well-being (SDG #3)                set of tools, ConvAI systems have the great poten-
                                                       tial of helping us uncover potential mental issues
As the world is swept by the COVID-19 pandemic         (stress, anxiety, depression, among others), and
and quarantine measures in response, health and        even administer the proper intervention. As the
mental well-being is becoming more and more the        pandemic exerts great mental stress on the public
center issue on people’s mind. Many in the tech-       (World Health Organization, 2020), one need not
nology community wonder what we can do to help         look far from our COVID-19 survey example to
improve the situation, if anything.                    find a potential candidate (see Figure 1(b)).
Smart assistants can help inform the public.           Mental health issues can potentially be miti-
The smart voice assistants in many homes is one        gated outside of active counseling. We should
way through which ConvAI technologies can help         by no means limit ourselves to only considering
inform the public about practical guidelines in this   mental health issues when times are challenging,
global health crisis. ConvAI agents can act as a       as they are affecting the lives of many even with-
source for answers to frequently asked questions       out the pandemic. For instance, suicide was iden-
regarding the pandemic, such as “What are the          tified as the second leading cause of death in 2016
common symptoms of COVID-19?” and “What                for young adults between 15 and 29 years of age
are some best practices to prevent the spread of       (World Health Organization, 2019). While chat-
COVID-19?” (Eddy, 2020). Operating in a user-          bots have been developed for counseling and/or
initative, knowledge-based manner in a relatively      remedying certain mental issues (Han et al., 2013;
closed domain with trusted sources for answers al-     Fadhil and AbuRa’ed, 2019; Simonite, 2020), they
lows them to mitigate the risk of misinformation.      require willing participants to make a difference.
More effective and equitable public health mea-        On the other hand, as ConvAI systems are already
sures via telephone. Effective and equitable           being widely deployed to offer customers pre-sale
public health measures should be able to reach a       and post-sale support in task-oriented dialogues, it
much broader population than just those who have       is also important to understand that these systems
access to stable Internet connections and smart        can also be designed more responsibly to detect,
devices. Telephone surveys are an irreplaceable        monitor, and respond to extreme emotions and be-
means for understanding the spread and effect of       haviors. With the reach of these online platforms
the pandemic (World Bank, 2021). A timely up-          and their ConvAI systems that often directly inter-
date on a critical mass in the population is crucial   act with users in place of human agents, seemingly
to the control and monitoring of a rapidly devel-      simple design choices can literally result in the dif-
oping pandemic, but it is also tedious and labor-      ference between life and death (e.g., (XinhuaNet,
intensive. One potential way that ConvAI sys-          2020) where one such system notified help in time
tems can help improve this (and similar future) sit-   to prevent a suicide attempt via drug purchase).
uation is conducting automated surveys over the
phone. Since surveys are usually designed to be        3.2   Quality Education (SDG #4)
well-structured with a relatively one-sided flow       Help instructors be more effective. Making ef-
of information, a ConvAI system can engage in          fective use of computers to aid human instructors
system-initiative conversations with a large num-      better achieve their educational goals has been
ber of users and collect their responses relatively    the pursuit of the computer-assisted instruction
easily (see Figure 1(a) for an example). With          (CAI) community for decades (Graesser et al.,
the help of the ASR technologies today, such sys-      1999; Yacef, 2002; Olney et al., 2012). Combined
SYSTEM : Hello, this is the CDC calling to conduct a
           short survey in an effort to monitor the spread of COVID-             (system inquires about flu-like symptoms)
                                                                               USER :  Yes, I am coughing a little these two
           19 in the city. Is this Ms. X?
           USER : Umm... Yes?
                                                                               days. I’m a little concerned.
                                                                               SYSTEM : I totally understand how you must
           SYSTEM : Hi Ms. X, may I ask if you have recently ex-
                                                                               feel. Are you experiencing fevers over 38 de-
           perienced flu-like symptoms like fever, dry cough, short-
                                                                               grees Celsius or 100 degrees Fahrenheit?
           ness of breath, or fatigue?
                                                                               USER : No, no fevers. Am I going to be okay?
           USER : No, none of those.
                                                                               SYSTEM : I see, there is no need to panic. I
           SYSTEM : I see. Have you had in-person contact with
                                                                               have recorded your information and will flag
           anyone who recently traveled to City Y in the past
                                                                               a health worker to check-in with you later in
           month?
                                                                               the day. In the meantime, please take your
           USER : No, not that I’m aware of. Not in the past month.
                                                                               body temperature twice a day, drink enough
           SYSTEM : I see. Given recent developments of the pan-
                                                                               fluids, and get plentiful rest. If you still
           demic, please continue to avoid indoor or outdoor gath-
                                                                               feel unwell and experience fevers or chills,
           erings, avoid unnecessary travels, and wear a mask when
                                                                               please visit a designated fever clinic. Call...
           you are in public spaces. Thank you for your time!
                                                                             (b) Pandemic survey with emotional support
                   (a) Pandemic survey over the phone

Figure 1: Two illustrative examples of pandemic survey over the phone using a ConvAI system. Besides gathering
information for the survey (a), the system can be equipped with relatively simple natural language understanding
and/or emotion detection tools to help offer appeasing messages and helpful advice (b).

with domain knowledge and teaching strategies in-                      ing for teaching resource (e.g., asking questions in
spired by human instructors, these conversational                      class). These agents can further be personalized to
agents can potentially distribute the experience of                    fit each learner’s habit and better help each individ-
quality one-on-one tutoring more broadly. This                         ual tackle long-term goals, such as building one’s
can be especially helpful in communities that ex-                      vocabulary or preparing for a test, with helpful re-
perience a shortage in supply of qualified educa-                      minders, check-ins, and interactive exercises.
tors due to economic and/or geographic reasons,
or where in-person learning is limited and engag-                      3.3     Reducing Inequalities (SDG #10)
ing learners via one-to-one instruction becomes
more difficult (Becker et al., 2020). While teach-                     Our society evolves at breakneck speed, espe-
ing dialogues are often mixed-initiative, which                        cially when one retrospects on the technological
are more challenging, we could pursue relatively                       advances over the recent decades. In the mean-
closed subsets of the conversation space by of-                        time, social inequalities also deteriorate as differ-
fering experiment-based help for solving specific                      ent countries, communities, or individuals benefit
multi-step problems, for instance. With direct                         from these advances differently. ConvAI is well-
contact to the learners, ConvAI systems can also                       positioned to reduce some of the inequalities we
be equipped to detect learning disabilities (Håvik                    experience.
et al., 2018). This unique advantage will not only
                                                                       Equitable policies begin with equitable access
allow the ConvAI systems themselves to adapt,
                                                                       to governments. One of the essential needs that
but also potentially inform the human instructors
                                                                       ConvAI systems are well-equipped to help address
to better cater to each learner’s unique learning
                                                                       is that each community can easily express their
needs.
                                                                       needs to local officials to inform policy-making.
Available beyond the classroom. Besides im-                            Similar to the telephone survey we described in
proving the bandwidth and teaching style of tra-                       Section 3.1, a system-initiative ConvAI agent over
ditional classroom learning, ConvAI systems can                        the phone can not only help gather community
also engage learners better outside the physical                       feedback, but also help organize it for authorities
or virtual classroom. Not being bound to limited                       to process more efficiently, expanding the band-
office hours, relatively simple system-initiative                      width in public opinion intake (Androutsopoulou
ConvAI helpers can be made available at virtually                      et al., 2019). Once an issue has been resolved,
any time to a student. Having exclusive access                         depending on the subject matter, ConvAI systems
to a virtual learning assistant can also potentially                   can also help inform the original caller of the reso-
help eliminate the effect of imbalance in compet-                      lution and gather further feedback if necessary.
Disability and quality of life should not be mu-         3.4    Shared Economic Growth (SDGs #1&#8)
tually exclusive. Aside from gaining access to
the public discourse, it is also important that ev-      Besides aging, disability, and other personal traits,
eryone is able to enjoy their private life, especially   social inequalities within and between countries
the benefits brought by technological development.       are often rooted in the socioeconomic statuses of
People with disabilities are too often left behind       the population. It is imperative that we consider
by the convenience of modern life. Natural lan-          technological solution to not only reduce existing
guage processing has great potential in improving        inequalities, but also eradicate its source when pos-
life quality for the visually and hearing impaired       sible. Perhaps one of the most important and sus-
through assistive technologies (e.g., audio descrip-     tainable approach towards ending poverty is gen-
tion for movies (Rohrbach et al., 2017) and closed       erating and providing decent jobs with reasonable
captioning via ASR). However, many existing as-          pay, so that people can afford improved liveli-
sistive language technologies provide one-sided in-      hoods and pull themselves out of economic limbo.
terfaces where people are only receiving informa-
tion from the system. We argue that ConvAI tech-         ConvAI can positively affect people aside from
nologies can be applied in many of these applica-        directly serving them. Despite our focus so far
tions to bring an interactive experience that further    on applications of ConvAI for social good, one as-
extend the potential for them. For instance, aug-        pect that is commonly neglected is how these sys-
mented with knowledge-based visual question an-          tems are built. Data is critical to modern (Conv)AI
swering (Antol et al., 2015) and visual dialogue         systems, and the process of acquiring this data of-
(Das et al., 2017), movie description systems can        ten involve paid annotation teams working in a
further help the visually impaired explore scenes        controlled setting. The nature of ConvAI makes
in creative art, providing a fuller viewing experi-      it a great candidate for redistributing the payout
ence.                                                    of data collection to the disadvantaged, because:
                                                         (1) natural conversation data is in great need,
                                                         which is easy to generate without too much train-
Fight inequality by reinforcing existing mecha-          ing (e.g., using a Wizard-of-Oz approach (Kelley,
nisms. Inequality is not new to our society, and         1984; Budzianowski et al., 2018)), and can usually
it predates most of the technological advances we        be collected in a safe environment;2 (2) ConvAI
have discussed in this paper. To reduce inequal-         data can be delivered via telephone/text messaging
ity brought about by various societal changes, we        (SMS) systems if necessary, which is much more
have collectively come up with many solutions            widely available; (3) as a result of the wider reach,
in human history, one of which is charity orga-          the resulting data is also more diverse, and will
nizations that help redistribute resources to peo-       help ConvAI systems better adapt to different data
ple and causes in need. Besides directly helping         variations, including those that would naturally oc-
people affected by inequality, can ConvAI tech-          cur in some of the aforementioned scenarios where
nologies benefit social good by making a positive        ConvAI can be applied (over the phone or SMS).
impact on these organizations? Recently, Wang
et al. (2019) demonstrated an inspiring angle to         4     Challenges for Conversational AI
answer this question, where ConvAI agents are                  Systems for Social Good
trained to persuade the interlocutor to make char-
itable donations to certain causes. They showed          In the previous section, we have explored vari-
that these agents can potentially be personalized        ous opportunities that current ConvAI technolo-
to the psychological backgrounds of different per-       gies can contribute to social good. In this section,
suadees to promote positive outcome. For in-             we turn to potential challenges we need to tackle to
stance, when attempting to persuade someone who          better realize this goal, and articulate some of the
is more open (as defined in the Big-Five traits          risks that might lie within the further development
(Goldberg, 1992)), inviting them to learn about          and deployment of ConvAI technology in the real
the cause/organization (termed source-related in-        world.
quiry by Wang et al. (2019)) could be helpful, e.g.,
“Have you heard of Save the Children?” or “Are              2
                                                              Here, we focus on the raw speech/text data, and acknowl-
you familiar with the organization?”.                    edge that they usually require separate post-processing.
4.1 Shared Challenges with ML/NLP for                    4.2   Technical challenges for ConvAI
    Social Good                                          In this section, we will focus on technical chal-
Before diving into issues that are more unique to        lenges and research frontiers that are more closely
ConvAI, we briefly review some of the key chal-          related to ConvAI (but not necessarily exclu-
lenges and considerations that CovnAI systems            sively).
share with other machine learning (ML) or natu-          Know thy interlocutor. One of the fundamen-
ral language processing (NLP) technologies when          tal goals of ConvAI is to facilitate communica-
applied for social good.                                 tion, therefore it is important to take into consid-
Problem-centric, not tech-centric. One of the            eration whether the system is communicating in
common pitfalls is technology-centered solution-         a manner that is clear and effective for the inter-
ism. Tuomi (2018) summarized its origins and             locutor. This is particularly important consider-
risks aptly in an European Union Science for Pol-        ing the increased diversity of population we set
icy report on AI’s impact on education:                  out to reach in applications for social good. One
                                                         aspect to factor into consideration when design-
     In the stage of technology push, technol-           ing ConvAI systems to communicate with people
     ogy experts possess scarce knowledge ...            from diverse backgrounds is to understand how
     [which] often dominates and overrides               they would perceive the same message or mode of
     other types of knowledge ... this can be-           communication, as miscommunication can some-
     come a problem as technologists easily              times easily slip detection and lead to catastrophic
     transfer their own experiences and be-              outcomes (Wikipedia contributors, 2021). We
     liefs about learning to their designs.              urge researchers to consider using guidelines like
Although technologists do bring fresh perspec-           Datasheets (Gebru et al., 2018) to help calibrate
tives, it is crucial to remember that when devel-        the collection and use of data, and when possible,
oping technology to solve problems, domain ex-           involve members of communities the system is in-
perts are more knowledgeable of the mechanisms,          tended for in the process.
causes, and nuances regarding the subject matter.        Go beyond words. Besides a better understand-
Green (2019) also warns us against the dangers           ing of users that the system is designed to interact
of underdefined or short-sighted metrics of social       with, we can also make ConvAI systems communi-
good, which are sometimes the result of lack of          cate more effectively with the help of other modal-
communication between technologists and subject          ities than just text and speech. As interactive tech-
matter experts.                                          nologies like high-quality computer graphics, ro-
   By extension, it is also important to assess          bust computer vision, haptics devices, augmented
whether ConvAI (or other approaches familiar to          and virtual reality (AR/VR), and embodied tech-
the technologists) is significantly better than alter-   nology become increasingly available to the pub-
native solutions. While requiring simpler equip-         lic, there are also great opportunities for computer
ment to deliver, language interfaces are not always      scientists from these different areas to join forces
the most efficient if graphical user interfaces are      in the quest towards better interactive computer
applicable (e.g., providing directions in a build-       systems (e.g., (BAAI, 2020)). These will also help
ing).                                                    broaden our horizon, and potentially help enable
Avoid Amplifying Bias. It is also important to           interactive conversational systems for social good
avoid exacerbating existing social biases or in-         that is beyond our current imagination. However,
equalities. Today, this is commonly rooted in the        one of the obstacles of further developments in this
kind of data available for many ML applications,         area, we believe, involves data sharing, which we
from facial recognition (Garvie and Frankle, 2016)       will detail in the next section.
to automatic speech recognition (Koenecke et al.,        Faster, Higher, Greener. As we pack more fea-
2020), to what is more and more recognized in            tures into ConvAI systems, it is always useful to
virtually any NLP application, the lack of linguis-      remember that at the end of the day, these systems
tic diversity (Joshi et al., 2020). It is important to   are built to interact with humans in real time. Thus,
carefully design and curate the data used to build       besides being able to communicate effectively, rea-
these systems, especially in a social good setting       sonable response time is of great importance com-
to avoid making a bad situation worse.                   pared to off-line ML systems. On the other hand,
the great opportunity for ConvAI to help in various      fundamental ways to alleviate this bias is to col-
settings and grounded situations should also en-         lect data that is representative of the target audi-
courage us to investigate better methods for trans-      ence of our system. (SDG #10) Aside from rep-
ferring between different settings with higher data-     resentation of the population, what can usually
efficiency. For instance, once we have built a well-     be neglected is the representation of actual users
designed system for pandemic survey for COVID-           of the system. In the ConvAI research literature,
19, it can ideally help with surveying other infec-      many datasets are collected from paid crowdwork-
tious diseases without significant effort in data col-   ers rather than actual users that the systems are in-
lection, training, or system configuration. Both of      tended to serve, which might not faithfully repre-
these goals are not only practical and desirable for     sent user behavior (de Vries et al., 2020). Further-
ConvAI developers and social good stakeholders,          more, as most datasets are collected via a Wizard-
but potentially also helpful in reducing the compu-      of-Oz approach, even if we were able to simulate
tational cost and climate impact of these systems        the intended users with crowdworkers, they will
while they are serving to address societal problems      be conversing with a “perfect” system that doesn’t
(SDGs #12 Responsible Consumption and Pro-               make certain mistakes that a trained ConvAI sys-
duction & #13 Climate Action). We believe that           tem would. This divide is crucial to realize and
the rise of multi-domain dialogue datasets (e.g.,        mitigate in future work, especially for real-world
MultiWOZ, Budzianowski et al., 2018) are a great         applications.
first step in this area.                                    Another challenge that threatens the data quality
It takes two... or more. Although so far in this         for ConvAI concerns systems that adapt on the fly
paper we have largely restricted our attention to        to data collected during the time they are serving.
conversations between one human user and one             On one hand, overly relying on usage data could
ConvAI system, this is by no means the only form         potentially obfuscate or even exacerbate the im-
of conversation that naturally occurs, or in the con-    pact of blind spots in the system, as humans tend
text of social good. One of the common scenar-           to adjust their language use to accommodate the
ios that multi-party conversations naturally occur       perceived properties of the interlocutor (Giles and
is peer support groups (SDG #3), where ConvAI            Coupland, 1991). If a component in a ConvAI sys-
can potentially serve as the coordinator between         tem does not meet a user’s initial expectation, in-
multiple human speakers to help guide the conver-        stead of trying the same thing over and over again,
sation (Nordberg et al., 2019). To the best of our       they are much more likely to accommodate the
knowledge, ConvAI research in this direction re-         system by either enunciating or avoid the compo-
mains relatively limited.                                nent entirely if possible. It is thus important to
                                                         look beyond the survivorship bias that is usually
4.3 Social challenges for ConvAI                         present in such usage data. On the other hand, sys-
Now that we have discussed the technical chal-           tems made publicly available are more vulnerable
lenges facing ConvAI and its application in social       to data poisoning attacks (Nelson et al., 2008; Ru-
good in the near future, we turn to social chal-         binstein et al., 2009). Without sufficient defense
lenges that are unlikely addressed solely by techni-     and audit mechanisms, a well-intended system can
cal advances, and/or that require a wider societal       easily and quickly be polluted through its vulnera-
awareness and collective effort to mitigate.             bilities, and turned against the very people it set
                                                         out to serve (e.g., Microsoft’s Tay bot (Vincent,
Data sourcing and quality. The source of train-          2016)).
ing and evaluation data is one of the most impor-
tant aspects to consider when building and deploy-       Data sharing. For the sustained development of
ing modern AI systems. This is because not only          ConvAI and its adoption in social good, a coor-
does data affect the behavior of these systems, but      dinated effort between academia and the industry
it also often functions as a standard to compare and     is also essential. Despite the abundance of tal-
evaluate different candidates before “better” sys-       ent and research freedom, academia usually lacks
tems are chosen and deployed.                            the means to obtain substantial amounts of data
    As we have discussed in Section 4, it is impor-      for ConvAI, especially data that reflect the “real-
tant that we don’t exacerbate existing social biases     istic data distribution”. In contrast, the industry
when building these systems, and one of the most         has a broader access to real-world data and prob-
lems, yet usually limited research staff to tackle all   cause of its coherent and convincing posts, it went
of them. While their interests somewhat diverge          undetected for weeks (Heaven, 2020).
between academically interesting, abstract, and             This poses a higher risk in ConvAI systems
open research projects (academia) versus propri-         due to their interactive nature, as their misuse can
etary technology and practical solutions for prob-       lead to more elaborate and convincing deception
lems at hand (industry), we would like to highlight      schemes such as spreading mis/disinformation or
that their shared common interests in empirically        scamming. Seemingly benign technology that al-
sound technology that hopefully satisfy a wide           low ConvAI systems to assume a coherent iden-
range of real-world needs. Therefore, we would           tity (Li et al., 2016) can potentially help some-
like to advocate closer collaborations between the       one impersonate someone else online, akin to how
two on how to collect, convert, anonymize, and           generative adversarial networks (Goodfellow et al.,
share conversational data (especially data less pro-     2014) have lead to convincing deepfakes. This
prietary in nature) and work on repeatable evalua-       would not only jeopardize causes for social good
tion methods, so that the broader research commu-        that we have advocated for in this paper, but can
nity can be exposed to unique problems rooted in         potentially cause significant damage to our society
real-world issues that can generate great positive       if left unchecked. We urge the community to re-
impact.                                                  flect on the ramifications of misuse of current AI
Aggression from human users. As ConvAI sys-              technologies, and devote time and energy to miti-
tems are deployed in the wild, it is not uncommon        gating their societal risks.
that they meet aggression from human users from          Transparency and trustworthiness. When
time to time, and managing user frustration is an        ConvAI systems interact with the general popu-
important topic. Beyond helping the user achieve         lation, they face the choice of whether to declare
their goals, however, we should also watch out for       their identity as a computer system or robot. This
how these interactions can potentially affect our        active act of transparency is not always designed
interaction with other humans.                           into systems for various reasons (e.g., Google
   Most ConvAI systems available today on the            Duplex (Google, 2018)), and as a result might
market project a female persona (exclusively or          sometimes have unintended consequences (Garun,
by default), which is even more pronounced when          2019). Although the lack of transparency in this
they interact using speech (Tay et al., 2014; UN-        case is not a deceptive act for malignant purposes,
ESCO and EQUALS Skills Coalition, 2019). As              the restaurants’ perplexed reaction to Duplex
they inevitably fail to fulfill our requests from        clearly indicates that it is not inconsequential.
time to time, these systems can become the tar-             Another aspect related to transparency is trust-
get of frustration, which may also inadvertently af-     worthiness, which is essential for any AI system,
fect how the new generation interact with ConvAI         especially ConvAI systems that interact with hu-
systems as they grow up around them (Elgan,              mans directly. To this end, we argue that ConvAI
2018; Rudgard, 2018). Setting aside the debate           systems deployed in the real world should be able
about child education, it is important and respon-       to recognize their limits of capabilities, and com-
sible that we better understand whether this phe-        municate these limits clearly. This can be a crucial
nomenon mirrors actual human interactions, and           step in earning trust especially in the event that the
how it would contribute to social interactions in        system has misstepped as a result of misidentify-
the long term. If these devices are helping instill      ing user intent and sentiment, or even mischarac-
or reinforce stereotypes of genders and gender dy-       terized user profile in an attempt to personalize to
namics, it is the responsibility of the community to     the user.
raise public awareness and help advocate means to
mitigate this effect (SDG #5 Gender equality).           5   Concluding Remarks
Misuse for Deception. Technological advances             In this paper, we have introduced how conversa-
are almost always double-edged swords. As NLP            tional artificial intelligence (ConvAI) systems can
systems become more advanced, it will inevitably         be applied to causes for social good today and in
be easier to make them appear more human-like.           the near future, and summarized some of the ma-
For instance, GPT-3 (Brown et al., 2020) was used        jor technical and social challenges the community
to post on behalf of an account on Reddit, and be-       still needs to tackle down the line. We acknowl-
edge that the opportunities and risks we included          Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-
in this paper are by no means comprehensive, but             tau Yih, Yejin Choi, Percy Liang, and Luke Zettle-
                                                             moyer. 2018. QuAC: Question answering in context.
an attempt of us to summarize and introduce what
                                                             In Proceedings of the 2018 Conference on Empiri-
in our opinion are typical and atypical scenarios            cal Methods in Natural Language Processing, Brus-
where ConvAI can be impactful or concerning to               sels, Belgium. Association for Computational Lin-
the community. We refer the reader to (Ruane                 guistics.
et al., 2019) for a more comprehensive view of so-
                                                           Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad
cial and ethical considerations of ConvAI.                   Goel, and Aziz Huq. 2017. Algorithmic decision
   This paper is part of a broader conversation              making and the cost of fairness. In Proceedings
around the impact of AI systems in the real world,           of the 23rd acm sigkdd international conference on
including the very definition of social good (Green,         knowledge discovery and data mining, pages 797–
                                                             806.
2019), transparency and interpretablility (Doshi-
Velez and Kim, 2017), as well as algorithmic fair-         Abhishek Das, Satwik Kottur, Khushi Gupta, Avi
ness (Corbett-Davies et al., 2017), among others.            Singh, Deshraj Yadav, José MF Moura, Devi Parikh,
We hope that our paper can serve as a starting point         and Dhruv Batra. 2017. Visual dialog. In Proceed-
for the community brainstorming of various oppor-            ings of the IEEE Conference on Computer Vision
                                                             and Pattern Recognition, pages 326–335.
tunities and potential risks for ConvAI to promote
social good and betterment.                                Finale Doshi-Velez and Been Kim. 2017. Towards a
                                                              rigorous science of interpretable machine learning.
                                                              arXiv preprint arXiv:1702.08608.
References
                                                           Nathan Eddy. 2020. Mayo Clinic adds COVID-
Aggeliki Androutsopoulou, Nikos Karacapilidis, Euri-         19 skills to Amazon Alexa.  https://www.
  pidis Loukis, and Yannis Charalabidis. 2019. Trans-        healthcareitnews.com/news/mayo-clinic-
  forming the communication between citizens and             adds-covid-19-skills-amazon-alexa.
  government through AI-guided chatbots. Govern-             Accessed: January 15, 2021.
  ment Information Quarterly, 36(2):358 – 367.
                                                           Mike Elgan. 2018.    The case against teaching
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Mar-          kids to be polite to Alexa.   https://www.
   garet Mitchell, Dhruv Batra, C Lawrence Zitnick,          fastcompany.com/40588020/the-case-
   and Devi Parikh. 2015. VQA: Visual question an-           against-teaching-kids-to-be-polite-to-
   swering. In Proceedings of the IEEE international         alexa. Accessed: January 22, 2021.
   conference on computer vision, pages 2425–2433.
                                                           Ahmed Fadhil and Ahmed AbuRa’ed. 2019. OlloBot
BAAI. 2020. BAAI-JD multimodal dialog challenge.
                                                            - towards a text-based Arabic health conversational
  https://jddc.jd.com. Accessed: January 22,
                                                             agent: Evaluation and results. In Proceedings of
  2021 (in Chinese).
                                                             the International Conference on Recent Advances in
Stephen P. Becker, Rosanna Breaux, Caroline N.               Natural Language Processing (RANLP 2019), pages
   Cusick, Melissa R. Dvorsky, Nicholas P. Marsh,            295–303, Varna, Bulgaria. INCOMA Ltd.
   Emma Sciberras, and Joshua M. Langberg. 2020.
   Remote learning during COVID-19: Examining              Raefer Gabriel, Yang Liu, Anna Gottardi, Mihail Eric,
   school practices, service continuation, and difficul-     Anju Khatri, Anjali Chadha, Qinlang Chen, Behnam
   ties for adolescents with and without attention-d-        Hedayatnia, Pankaj Rajan, Ali Binici, Shui Hu,
   eficit/hyperactivity disorder. Journal of Adolescent      Karthik Gopalakrishnan, Seokhwan Kim, Lauren
  Health, 67(6):769 – 777.                                   Stubel, Kate Bland, Arindam Mandal, and Dilek
                                                             Hakkani-Tür. 2020. Further advances in open do-
Tom B Brown, Benjamin Mann, Nick Ryder, Melanie              main dialog systems in the third alexa prize socialbot
  Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind           grand challenge. Technical report, Amazon Alexa
  Neelakantan, Pranav Shyam, Girish Sastry, Amanda           AI.
  Askell, et al. 2020. Language models are few-shot
  learners. arXiv preprint arXiv:2005.14165.               Natt Garun. 2019. One year later, restaurants are
                                                             still confused by Google Duplex. https://www.
Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang               theverge.com/2019/5/9/18538194/google-
  Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ra-           duplex-ai-restaurants-experiences-
  madan, and Milica Gašić. 2018. MultiWOZ - a              review-robocalls. Accessed: January 10, 2021.
  large-scale multi-domain Wizard-of-Oz dataset for
  task-oriented dialogue modelling. In Proceedings of      Clare Garvie and Jonathan Frankle. 2016. Facial-
  the 2018 Conference on Empirical Methods in Natu-          recognition software might have a racial bias prob-
  ral Language Processing.                                   lem. The Atlantic, 7.
Timnit Gebru, Jamie Morgenstern, Briana Vecchione,       Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika
  Jennifer Wortman Vaughan, Hanna Wallach, Hal             Bali, and Monojit Choudhury. 2020. The state and
  Daumé III, and Kate Crawford. 2018. Datasheets          fate of linguistic diversity and inclusion in the NLP
  for datasets. arXiv preprint arXiv:1803.09010.           world. In Proceedings of the 58th Annual Meet-
                                                           ing of the Association for Computational Linguistics,
Howard Giles and Nikolas Coupland. 1991. Accommo-          pages 6282–6293, Online. Association for Computa-
  dating language. Open University Press.                  tional Linguistics.
Lewis R Goldberg. 1992. The development of mark-         John F Kelley. 1984. An iterative design methodology
  ers for the big-five factor structure. Psychological     for user-friendly natural language office information
  assessment, 4(1):26.                                     applications. ACM Transactions on Information Sys-
                                                           tems (TOIS), 2(1):26–41.
Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza,
   Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron     Allison Koenecke, Andrew Nam, Emily Lake, Joe
   Courville, and Yoshua Bengio. 2014. Generative ad-      Nudell, Minnie Quartey, Zion Mengesha, Connor
   versarial networks. arXiv preprint arXiv:1406.2661.     Toups, John R Rickford, Dan Jurafsky, and Sharad
                                                           Goel. 2020. Racial disparities in automated speech
Google. 2018.   Google Duplex: An AI sys-                  recognition. Proceedings of the National Academy
  tem for accomplishing real-world tasks over              of Sciences, 117(14):7684–7689.
  the phone.   https://ai.googleblog.com/
  2018/05/duplex-ai-system-for-natural-                  Jiwei Li, Michel Galley, Chris Brockett, Georgios Sp-
  conversation.html.  Accessed: January 10,                 ithourakis, Jianfeng Gao, and Bill Dolan. 2016. A
  2021.                                                     persona-based neural conversation model. In Pro-
                                                            ceedings of the 54th Annual Meeting of the Associa-
Arthur C Graesser, Katja Wiemer-Hastings, Peter
                                                            tion for Computational Linguistics (Volume 1: Long
  Wiemer-Hastings, and Roger Kreuz. 1999. AutoTu-
                                                            Papers), pages 994–1003, Berlin, Germany. Associ-
  tor: A simulation of a human tutor. Cognitive Sys-
                                                            ation for Computational Linguistics.
  tems Research, 1(1):35 – 51.

Ben Green. 2019. “Good” isn’t good enough. In            Blaine Nelson, Marco Barreno, Fuching Jack Chi, An-
  NeurIPS Joint Workshop on AI for Social Good.            thony D Joseph, Benjamin IP Rubinstein, Udam
                                                           Saini, Charles A Sutton, J Doug Tygar, and Kai Xia.
Sangdo Han, Kyusong Lee, Donghyeon Lee, and                2008. Exploiting machine learning to subvert your
  Gary Geunbae Lee. 2013. Counseling dialog sys-           spam filter. LEET, 8:1–9.
  tem with 5W1H extraction. In Proceedings of the
  SIGDIAL 2013 Conference, pages 349–353, Metz,          Oda Elise Nordberg, Jo Dugstad Wake, Emilie Sektnan
  France. Association for Computational Linguistics.       Nordby, Eivind Flobak, Tine Nordgreen, Suresh Ku-
                                                           mar Mukhiya, and Frode Guribye. 2019. Designing
Robin Håvik, Jo Dugstad Wake, Eivind Flobak, As-          chatbots for guiding online peer support conversa-
  tri Lundervold, and Frode Guribye. 2018. A con-          tions for adults with ADHD. In International Work-
  versational interface for self-screening for ADHD in     shop on Chatbot Research and Design, pages 113–
  adults. In International Conference on Internet Sci-     126. Springer.
  ence, pages 133–144. Springer.
                                                         Andrew M Olney, Sidney D’Mello, Natalie Person,
He He, Anusha Balakrishnan, Mihail Eric, and Percy         Whitney Cade, Patrick Hays, Claire Williams, Blair
  Liang. 2017. Learning symmetric collaborative dia-       Lehman, and Arthur Graesser. 2012. Guru: A com-
  logue agents with dynamic knowledge graph embed-         puter tutor that models expert human tutors. In Inter-
  dings. In Proceedings of the 55th Annual Meeting of      national conference on intelligent tutoring systems,
  the Association for Computational Linguistics (Vol-      pages 256–261. Springer.
  ume 1: Long Papers), pages 1766–1776, Vancouver,
  Canada. Association for Computational Linguistics.     Antoine Raux, Brian Langner, Dan Bohus, Alan W
                                                           Black, and Maxine Eskenazi. 2005. Let’s go pub-
He He, Derek Chen, Anusha Balakrishnan, and Percy          lic! taking a spoken dialog system to the real world.
  Liang. 2018. Decoupling strategy and generation in       In Ninth European conference on speech communi-
  negotiation dialogues. In Proceedings of the 2018        cation and technology.
  Conference on Empirical Methods in Natural Lan-
  guage Processing, pages 2333–2343, Brussels, Bel-      Anna Rohrbach, Atousa Torabi, Marcus Rohrbach,
  gium. Association for Computational Linguistics.         Niket Tandon, Christopher Pal, Hugo Larochelle,
                                                           Aaron Courville, and Bernt Schiele. 2017. Movie
Will Douglas Heaven. 2020. A GPT-3 bot posted              description. International Journal of Computer Vi-
  comments on reddit for a week and no one no-             sion, 123(1):94–120.
  ticed. https://www.technologyreview.com/
  2020/10/08/1009845/a-gpt-3-bot-posted-                 Elayne Ruane, Abeba Birhane, and Anthony Ven-
  comments-on-reddit-for-a-week-and-no-                    tresque. 2019. Conversational AI: Social and ethical
  one-noticed/. Accessed: January 21, 2021.                considerations. In AICS, pages 104–115.
Benjamin IP Rubinstein, Blaine Nelson, Ling Huang,         World Bank. 2021. LSMS-supported high-frequency
  Anthony D Joseph, Shing-hon Lau, Satish Rao, Nina         phone surveys on covid-19.      https://www.
  Taft, and J Doug Tygar. 2009. Antidote: understand-        worldbank.org/en/programs/lsms/brief/
  ing and defending against poisoning of anomaly de-         lsms-launches-high-frequency-phone-
  tectors. In Proceedings of the 9th ACM SIGCOMM             surveys-on-covid-19. Accessed: January 19,
  conference on Internet measurement, pages 1–14.            2021.
Olivia Rudgard. 2018. ‘alexa generation’ may be learn-     World Health Organization. 2013. Comprehensive
  ing bad manners from talking to digital assistants,       mental health action plan: 2013–2020. https://
  report warns.      https://www.telegraph.co.               apps.who.int/gb/ebwha/pdf_files/WHA66/
  uk/news/2018/01/31/alexa-generation-                       A66_R8-en.pdf. Accessed: January 19, 2021.
  could-learning-bad-manners-talking-
  digital/. Accessed: January 10, 2021.                    World Health Organization. 2019. Suicide. https://
                                                             www.who.int/news-room/fact-sheets/
Tom Simonite. 2020. The therapist is in—and it’s a           detail/suicide. Accessed: January 19, 2021.
  chatbot app. https://www.wired.com/story/
  therapist-in-chatbot-app/. Accessed: Jan-                World Health Organization. 2020. COVID-19 disrupt-
  uary 20, 2021.                                            ing mental health services in most countries, WHO
                                                            survey.     https://www.who.int/news/item/
Benedict Tay, Younbo Jung, and Taezoon Park. 2014.           05-10-2020-covid-19-disrupting-mental-
  When stereotypes meet robots: the double-edge              health-services-in-most-countries-who-
  sword of robot gender and personality in human–            survey. Accessed: January 19, 2021.
  robot interaction. Computers in Human Behavior,
  38:75–84.                                                XinhuaNet. 2020. Precisely detecting desperate emo-
                                                             tions, this AI customer service saved a life by
Ilkka Tuomi. 2018. The impact of artificial intelligence     alerting help.    http://www.xinhuanet.com/
   on learning, teaching, and education. Technical re-       tech/2020-06/24/c_1126152953.htm.             Ac-
   port, Joint Research Centre.                              cessed: January 19, 2021 (in Chinese).
UNESCO and EQUALS Skills Coalition. 2019. I’d
                                                           K. Yacef. 2002. Intelligent teaching assistant systems.
 blush if I could: closing gender divides in digital
                                                             In International Conference on Computers in Educa-
 skills through education.     https://unesdoc.
                                                             tion, 2002. Proceedings., pages 136–140 vol.1.
  unesco.org/ark:/48223/pf0000367416.
  page=1. Accessed: January 21, 2021.                      Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum.
James Vincent. 2016. Twitter taught Microsoft’s               2020. The design and implementation of xiaoice, an
  AI chatbot to be a racist asshole in less than a            empathetic social chatbot. Computational Linguis-
  day. https://www.theverge.com/2016/3/24/                    tics, 46(1):53–93.
  11297050/tay-microsoft-chatbot-racist.
  Accessed: January 21, 2021.
Harm de Vries, Dzmitry Bahdanau, and Christopher
  Manning. 2020. Towards ecologically valid re-
  search on language user interfaces. arXiv preprint
  arXiv:2007.14435.
Xuewei Wang, Weiyan Shi, Richard Kim, Yoojung Oh,
  Sijia Yang, Jingwen Zhang, and Zhou Yu. 2019. Per-
  suasion for good: Towards a personalized persuasive
  dialogue system for social good. In Proceedings of
  the 57th Annual Meeting of the Association for Com-
  putational Linguistics.
Joseph Weizenbaum. 1966. Eliza—a computer pro-
  gram for the study of natural language communica-
  tion between man and machine. Communications of
  the ACM, 9(1):36–45.
Wikipedia contributors. 2021. Avianca flight 52 —
  Wikipedia, the free encyclopedia. [Online; accessed
  22-January-2021].
Terry Winograd. 1971. Procedures as a represen-
  tation for data in a computer program for under-
  standing natural language. Technical report, MAS-
  SACHUSETTS INST OF TECH CAMBRIDGE
  PROJECT MAC.
400      dialog(ue)
                                 chat
                           conversation(al)
       Paper Count

                     200

                       0
                           1980        2000   2020
                                      Year

Figure 2: Statistics of papers in the ACL Anthology
that mention “dialog(ue)”, “chat”, or “conversation(al)”
explicitly in the title over the past five decades.

A Supplemental Material
In Figure 2, we take all the references from the
ACL Anthology3 as of May 2021, and illustrate
the growing interest in Conversational AI as is re-
flected in direct mentions of “dialog(ue)”, “chat”,
and “conversation(al)” in paper titles. As can be
seen from the figure, there has been a renewed
interested in ConvAI from the ACL community
since the turn of the millennium, followed by a
sharp growth in the past five years.

   3
       https://www.aclweb.org/anthology/
You can also read