Developing and evaluating a style guide for chatbots deployed in a technical setting - AGNES PETÄJÄVAARA
Degree project in Interactive Media Technology
Second cycle, 30 credits

Developing and evaluating a style guide for chatbots deployed in a technical setting

AGNES PETÄJÄVAARA

Stockholm, Sweden 2022
Developing and evaluating a style guide for chatbots deployed in technical settings

Agnes Petäjävaara
M.Sc. Interactive Media Technology, KTH, Stockholm, Sweden
agnespet@kth.se

Document date: 2022-02-18. © 2021 Copyright held by the author.

ABSTRACT
This study evaluates the perceived credibility of a technical chatbot based on its communication style - the way it interacts with its users, embodied through text and emojis. A chatbot's initial communication style was compared to a humble version. The humble communication style was developed from a design workshop held together with six participants and is presented in this paper as a design style guide.

The perceived credibility was divided into six dimensions: Competence, Goodwill, Honesty, Predictability, Reputation, and Trustworthiness. The results from the evaluation of the two chatbot versions showed that the credibility was, in general, perceived as higher for the chatbot using a humble communication style. Two exceptions were found: (1) the dimension of Trustworthiness stayed at the same level between the versions, and (2) the dimension of Goodwill was perceived as higher for the chatbot not using the humble communication style. The satisfaction with the chatbot was measured and resulted in an NPS of 17 for the chatbot using the humble communication style compared to a negative score of -16 for the chatbot not using it.

This study found that a more humble communication style would not harm the perceived credibility of a technical professional chatbot.

SAMMANFATTNING (SUMMARY)
This study evaluates the perceived credibility of a technical chatbot based on its communication style - how it interacts with its users through text and emojis. A chatbot's initial communication style was compared with a humble version. The humble communication style was developed from a design workshop and is presented in this study as a design guide.

The perceived credibility was divided into six dimensions: Competence, Goodwill, Honesty, Predictability, Reputation, and Trustworthiness. The evaluation of the two chatbot versions showed that credibility was generally perceived as higher for the more humble chatbot. Two exceptions were found, however: (1) Trustworthiness was unchanged between the two communication styles, and (2) Goodwill received a higher value for the initial communication style. User satisfaction with the chatbot was evaluated and resulted in an NPS of 17 for the chatbot that used the humble communication style, compared with a negative value of -16 for the chatbot that used the initial communication style.

This study showed that a more humble communication style does not harm the perceived credibility of a technical professional chatbot.

Author Keywords
Chatbot; Communication style; Design style guide; Conversational Interface; Internal support systems; CUI

CCS Concepts
• Human-centered computing → Human computer interaction (HCI); Haptic devices; User studies; Please use the 2012 Classifiers and see this link to embed them in the text: https://dl.acm.org/ccs/ccs_flat.cfm

INTRODUCTION
Perceived credibility is an important user experience (UX) feature for software systems [27, 20, 17]. A conversational user interface (CUI) is essentially a digital interface enabling users to interact with software following the same principles as human conversations [4]. Examples of CUIs are chatbots and voice or virtual assistants.

This study concerns chatbots and the perceived credibility of a humble communication style. Humble is a part of the brand tone of the company that issued this thesis, Ericsson, where humble is used to describe a more personal and kind way of interacting that does not feel too mechanical. In this study's design workshop, humble chatbots were defined as unpretentious and respectful.

Large-scale software development organizations need to have an efficient development and support infrastructure. However, large-scale organizations, such as Ericsson, come with complexity for support and maintenance, which makes the need to simplify the process both necessary and challenging. Ericsson currently has a prototype of a chatbot to help alleviate these problems. The chatbot aims to be the primary point of entry where Ericsson employees can search for guidance and support. The main users of the chatbot are engineers who need support with technical questions rather quickly. Ericsson believes that once the chatbot is in production, it will make the development process faster and more efficient. This would improve the quality of deliverables in the long term and the ease
of use, speed, quality, and efficiency of the current handling of support issues in the short term. It is of great importance for the user experience and business value that users trust and believe what the chatbot tells them, since the perceived credibility of a system influences the user's interest in it [2].

It has been found important for systems that act as a source of knowledge, aim to instruct or tutor users, or act as a decision aid, to be perceived as credible [14]. Therefore, it is of great importance that Ericsson's chatbot is also perceived as credible. Tools and studies evaluating the perceived credibility based on the communication styles of human-to-human interaction exist, but there is a gap in the qualitative research of human-computer interaction (HCI) [10, 25].

A study by Liebrecht, Sander, and van Hooijdonk investigated communication styles (informal vs. formal) of customer service chatbots of familiar and unfamiliar brands. It was found that a chatbot's informal communication style induced a higher perceived social presence, which in turn positively influenced the quality of the interaction and brand attitude [19].

An interview study by Nordheim, Følstad, and Bjørkli on chatbots used in customer support explains the importance of users trusting chatbots to provide the required support. Trust was determined by the chatbot's interpretation of requests, its self-presentation, and its professional appearance [15].

A study by Beattie, Edwards and Edwards found that when emojis are used in chats, the sender is considered more socially attractive, competent, and credible when compared to verbal-only message senders, no matter if the sender is a human or a chatbot [5].

Based on the previous literature studies, there seems to be a gap in research investigating the perceived credibility of chatbots deployed in technical settings based on their communication style. This study aims to bridge this gap by investigating whether a more humble conversation style of a chatbot, embodied through text and emojis, affects its perceived credibility. Additionally, the study presents a style guide for a humble and professional chatbot and the results of its evaluation.

The research question of this study is “What is the impact of a humble communication style on the perceived credibility of a chatbot used in a technical environment?”

Six dimensions of credibility
Credibility is a perceived quality that results from evaluating multiple dimensions simultaneously [13]. A credible human is perceived to be competent, trustworthy, and have goodwill [5]. For computers in general, credibility is said to consist of trustworthiness and expertise [14]. The credibility of AI for private use, such as voice agents and personal assistants, is conceptualized along the dimensions of honesty, expertise, predictability, and reputation [23, 10].

This paper focuses on evaluating the perceived credibility of a support chatbot used in a professional technical environment. For this reason, the credibility concept consists of the union of dimensions from the definitions for human-human interaction, computers in general, and AI for personal use. Competence, Goodwill, Honesty, Predictability, Reputation, and Trustworthiness were used in this study to evaluate credibility.

Communication style
Communication style is an expression of a person's personality and determines the way people interact with others. A person's communication style determines how they speak, act, or react in various situations [24]. The HEXACO Personality Inventory-Revised (HEXACO-PI-R) is an instrument to measure the six major dimensions of personality and is found to have medium to strong associations with communication styles [8]. To be able to develop a CUI that feels human-like in conversation, it is important to think of the CUI's personality [4]. Deciding on a voice and tone is fundamental, and this is why this study focuses on the chatbot's communication style.

Humility
Humble was chosen as the communication style because it has been found to be an appreciated communication style in professional settings and is included in the brand tone of the company that issued this thesis. Based on the research done on HEXACO-PI-R, the personality dimension “honesty-humility” was studied in relation to workplace behavior. It was found that personalities with a high rating on the “honesty-humility” dimension showed productive and dutiful behaviors at work. The “honesty-humility” dimension has also been found to be crucial for providing the foundation for moral action within organizations, where leaders with a high rating have a more effective team performance. Additionally, personalities showing high honesty-humility are perceived to be more trustworthy than others [9, 26, 3, 22].

Ericsson's brand personality is said to be that of a challenger, meaning that they sometimes want to be perceived as daring, and sometimes as heartfelt. Their tone of voice therefore consists of five different pillars. One is “Humbly intelligent” and is said to surprise, engage, and captivate their audience [11].

METHODOLOGY
To answer the research question, four phases beyond the literature research had to be covered. First, humility was defined in the specific context of a technical setting. This phase was covered in a design workshop conducted with six participants who all came from outside of Ericsson. The workshop was used as a method to gather data on the perception of humbleness from tech office workers. To the author's knowledge, there is limited data available on humble chatbots, and this method was an approach to gain more data for the research work. There are two main reasons why these participants were recruited: (1) it was difficult to find volunteers from the company who could all participate in a design workshop, and (2) using non-Ericsson participants gives a non-company-biased approach to humility in a technical setting.

Second, a style guide was designed that can be easily implemented by chatbots. The style guide was developed based on data from the literature research and the outcomes from the design workshop using an affinity diagram. The affinity diagram
was evaluated and confirmed by a person with expertise in UX to ensure that it represented the collected data accurately.

Third, the style guide was implemented on a chatbot used in a large-scale software organization. This was done by changing only the sentences Ericsson's chatbot prototype communicated to its users, embodied through text and emojis, and not changing any visual elements of the chatbot prototype.

Lastly, the potential impact was evaluated by measuring the perceived credibility of the chatbot using the humble style guide in comparison to a version of the chatbot using the initial communication style. Ericsson's chatbot is at a prototype stage, meaning that it has not yet been released to a wider audience. Methods used to evaluate products in production, such as A/B testing, are therefore not applicable. The methods used in this phase were think-alouds, interviews, and surveys. To the author's knowledge, there is no standardized credibility evaluation available for technical support chatbots. Therefore, a union of established credibility evaluation surveys from related fields was used, backed up by measuring the Net Promoter Score (NPS) of the two versions of the chatbot prototype. Twelve employees at Ericsson participated in the evaluation: six on the initial version and six on the version following the humble style guide.

DESIGN WORKSHOP
To better understand what a humble communication style used by professional chatbots is, a design workshop was held. It had two main aims: (1) to understand how the communication style of Ericsson's initial prototype of the chatbot was perceived, and (2) to understand how its communication style could be redesigned to instead be perceived as humble. Six participants, A, B, C, D, E, and F, volunteered to participate. The group of participants had a median age of 27 years and came from four different nationalities. Due to the COVID-19 pandemic and the several locations of the participants, the workshop was held online for an effective 90 minutes and was divided into five activities.

1. Perception of the initial version: As a first activity, the initial version of Ericsson's chatbot prototype was demoed for the participants of the workshop. They were asked to focus on the personality of the chatbot to later be able to discuss their perception of it. The participants described their perception of the chatbot as “basic but professional”, motivated by it giving only the least amount of response needed while using “strict” and “formal” language.

2. What is humble?: The next activity was to define humble in the context of CUI. The participants were to rank synonyms of humble based on how well they thought that the word corresponds to their definition of a humble online conversation. The results can be seen in Figure 1. “Unpretentious” and “Respectful” were the synonyms that were the truest to the participants' beliefs. From this activity, the need for a conversation with a humble CUI not to be overly complex was mentioned. Participant B explained it as “If you are humble you are using simple language, kind of straight to the point, which makes the conversation easy for everyone to understand”.

Figure 1. The results from the design workshop activity "What is humble?".

3. Place the sentences on the grid: In this activity, the participants were given ten sentences and were asked to rate them on a dimensional grid of humble vs. arrogant and professional vs. unprofessional. The dimensional grid was used as a tool to encourage discussion about different ways of expressing yourself online and how sentences are perceived differently by the sender and receiver depending on the relation between them. Therefore, the actual placement of the sentences on the grid was insignificant for this study.

The sentences used were constructed by the author of this paper with inspiration from real online conversations between employees at Ericsson:
1. "No worries! I am happy to help "
2. "I hate giving people more work so you are welcome"
3. " well done in deleting your history "
4. "Your history was successfully deleted "
5. "Hellooooooo "
6. "Hi! Can I help with anything?"
7. "Do you need support??"
8. "Bye! Come back soon "
9. "Oh, I’m sorry I didn’t get that."
10. "How can you not already know that?"

Following the participants' placements on the grid, two main areas were discussed: (1) the use of emojis, and (2) the need not to feel judged. Regarding the use of emojis/emoticons, the participants agreed that only the most common ones are OK to use to still be perceived as professional. Participant A referred to sentence nr. 3 and said:

“It seems very sarcastic with the clapping between every word. If you are friends it is another story, then you know that the other person is joking. Otherwise, it feels very unprofessional to me” - A

Participant F agreed with participant A's statement and added:
“You can use emojis and still be professional. Just not a bunch of them. And use the simple one, like a smiley, wave, or thumbs up. Don’t use flowers unless you are talking about flowers [referring to sentence nr. 5].” - F

The participants also stated the need not to feel judged or stupid for the chatbot to be perceived as humble. Sentences nr. 10 and nr. 9 essentially express the same thing but were placed on the grid as opposites. Sentence nr. 10 was rated as extremely arrogant and unprofessional. Participant B expressed that they felt judged by that sentence and said "Who are they [the chatbot] to tell me what I should know?".

Sentence nr. 9 was rated as humble and professional. Participant E expressed their reasoning behind it as:

“This is the most humble one because it states like I am sorry that *I* did not get it. It’s not blaming anyone else, more like, ‘sorry it’s my fault, not yours’” - E

Participant F agreed with participant E and said that nr. 9 does not imply that the receiver of the message is stupid. Sentence nr. 7 also evoked the feeling of being stupid, and according to participant F this was only due to the double question marks.

4. Which features are essential priorities?: Since one of the goals from Ericsson was for the chatbot to motivate focused engagement and deliver simple and compelling UX, the fourth activity of the workshop was to discuss design principles for CUIs based on what the participants consider a priority for a humble chatbot. The design principles come from Følstad and Brandtzaeg's paper "Chatbots: Changing user needs and motivation" and focused on response time, friendliness, human touch, and the presentation of chat elements [6]. The results from the participants' ratings can be seen in Figure 2. What the participants of the workshop considered the least important was the gender of the chatbot. However, previous research in this field indicates that chatbot gender does have an effect on users' overall satisfaction and gender-stereotypical perception [21].

Figure 2. Results from the workshop activity “Which features are essential priority?”.

The most important feature was that the chatbot should have a fast response time. The participants expressed that this is essential for a humble chatbot because it needs to be respectful of your time. The participants also agreed that it is important for the chatbot to be able to redirect the user to an actual human when they reach a dead-end. Participant C stated that:

“If a chatbot should be humble, it has to realize that it may not have all the answers and instead pass [ the user ] on to someone who does.” - C

The priority of the friendliness of the humble chatbot divided the participants. Half of the group said that it is an essential priority and the rest not so much. During the discussion that followed, it was stated that how friendly the chatbot needs to be depends on how clever it is. Participant F explained it as:

“If I get the answer that I need then it doesn’t matter if it is friendly or not. However, if it fails to give me the correct answers I would be annoyed if it also isn’t friendly. It [the chatbot] must be friendly if it is a bit stupid” - F

The participants considered deemphasized chat elements to be a priority because they seem more time-efficient, which is considered an essential priority for humble CUIs. Participant A explained their rating as:

“I first rated this one quite low because I thought that I want the chatbot to adapt to how I am talking, but through our discussions, I realized that it should actually be a priority. Especially if you are either short on time or if you don’t know exactly how to phrase stuff, preselected options could help you out. Having preset options is essential for a humble but because it would value/respect the user’s time“ - A

The level of human likeness was considered important but not essential. Participant D explained their ranking as:

“There needs to be some kind of human element to a chatbot for me to be able to perceive it as humble. Adding these human-unnecessary-words here and there, for example [interjections such as] “oh”, makes it sound more human and hence perceived as more humble to me” - D

5. Scripting of a humble conversation: In the last activity of the workshop, a few humble conversations were scripted based on the support chatbot prototype demoed at the beginning of the workshop.

The first task for the participants in this activity was to decide on how a humble chatbot should greet. Almost all participants replied with an answer similar to "Hey! How can I help you? ". The biggest differences were the word chosen as a greeting and the emojis used. Participant E explained why they used “hey” as a greeting word by saying:
“I think the ‘hey’ makes it a little bit more personal than “hi”. In the same way as [interjections such as] ‘oh’ make the chatbot seem a bit more human, and in that way also humble” - E

Participant F agreed and further explained their choice of emojis with:

“The emoji could emphasize that you are not bothering it. That means a real emoji, not a :) [referring to emoticons]“ - F

From the discussion, the participants also agreed that it is important that the chatbot, already when greeting the user, expresses what it is capable of so that the user does not waste their time.

The second scripting task for the participants focused on the scenario where the chatbot needs more information from the user to proceed. From the participants' submissions, the importance of the chatbot apologizing was clear, see Figure 3. Participant F explained their submission with:

“Adding a “sorry” at the beginning makes me feel like I didn’t do the wrong thing by not giving the chatbot the right information. The blame is not on me. So adding that would make the chatbot seem a bit more humble” - F

Figure 3. Participants' submissions to activity five and the scripting task "How should a humble chatbot ask the user for more information?".

In a discussion regarding when to use “sorry” and “please”, participant A explained that “sorry” is a better word because “it is a bit more ‘it’s my fault, not yours’ than ‘please’”.

Participant B was clear in stating the importance of humble chatbots being able to connect the user to an actual human and explained it with:

“If the chatbot is asking for more information, there is already a miscommunication happening. Therefore I think it is important to already here throw in the possibility of talking to a human” - B

The next scripting task was regarding what the chatbot should do if the user tries to break it. Trying to break the chatbot is a very common human behavior when first getting in contact with a chatbot [4]. One example could be asking a technical support chatbot "I hear music, do you?". The participants' answers differed: some stated the importance of helping the user back on track with the chatbot's capabilities, whereas others wanted the chatbot to reply in a similarly odd fashion to the user's question. Participant D submitted a more playful answer and explained that “If the user says something odd you can as well say something odd back”. Participant B, on the other hand, expressed the importance of bringing the users back on track, focusing on deemphasized chat elements and the preset answer options:

“The options are good to give [the user] just to be sure that if the user, in fact, tries to mess with the chatbot or if they just didn’t succeed in giving [the chatbot] the correct key-words” - B

The fifth scripting question was regarding how a humble chatbot should present knowledge or results to a user query. All participants stated the importance of the chatbot being less confident in its answers. The chatbot can do this by explaining that what it found might not be what the user was looking for, and that in that case it can try to find something better. Participant A explained their submission with:

“By saying - ‘let me know if this isn’t what you are looking for’, or ‘I can help you find something else’ - is a nice way of putting ‘the blame’ on the chatbot in the case that it didn’t give you the right results. It also shows that the chatbot is open to help more like it is not bothered if you ask it for more things. Rather than ‘ here are the results - that’s it ‘, where the chatbot intends that if you don’t ask for the right thing it is on you that the chatbot couldn’t find the right thing.” - A

The last scripting task was regarding how a chatbot should end a session. Here, too, the participants were on the same page regarding not explicitly including a formal "good-bye word" and instead suggested things like "I hope I helped you today, come back anytime! " or "Thank you for asking me, just let me know if there’s anything else I can do for you ". Participant F explained it with:

“I feel like the chatbot is always there in the background, so it would be weird if it gave me a definite bye” - F

From some general discussion at the end of the workshop, the participants determined that it is important to use smiling emojis during the entire conversation with a user to showcase that the chatbot has a “positive mindset”.

HUMBLE CHATBOT STYLE GUIDE
The humble chatbot style guide is based on the outcomes from the design workshop and literature study and was examined using an affinity diagram that was evaluated by a person with experience in UX. The style guide is applicable for support chatbots used in a technical setting, suggesting personal traits
of the user to be highly skilled, well-educated, and possibly stressed.

Sentiment
The chatbot needs to be fast and straight to the point to be perceived as humble. If you are humble, you don't waste someone's time. The chatbot should use simple, well-known words and phrases. This is to be as inclusive as possible for non-native English speakers. Interjections are important for the chatbot to seem more human-like and, in that way, also be perceived as more humble.

What to do
• “Hey ”
• “Oh, cool!”
• “That’s unusual"

What not to do
• “Hi”
• “Ok.”
• “That is arcane”

Clarity
The chatbot should state early in the conversation what it can do. This is so that the user directly knows how the chatbot can help them. Showing preset options already when greeting the user also helps the user start problem-solving as soon as possible. All of this prevents the user from wasting their time, which is something a humble chatbot should avoid.

What to do
• “Select a topic or type your question below and I’ll do my best to help you [list of preset answer options]”

What not to do
• “What do you need help with?”

Assertiveness
The chatbot should be less assertive in what it tells the user. It is important to ensure that the blame is always on the chatbot if something goes wrong or if the results given are not correct. It is also important to show that the chatbot is open to helping more and is not bothered if the user asks it for more things. The user should never get the perception that it is their fault that the chatbot could not help them. Adding a “sorry” at the beginning of a sentence when the chatbot was unsuccessful makes it perceived as more humble. The “sorry” makes the user feel that the error is on the chatbot and not on the input they gave the chatbot. It is also important to include a human hand-off as soon as the user is not fully understood, to prevent deeper miscommunication from happening. The chatbot can ask the user to clarify or rephrase their question, but if the chatbot after that still does not understand what the user is asking for, it is important to already at this point offer the possibility of talking to a human.

What to do
• “I am sorry, I didn’t get that. Would you mind rephrasing the question or should I find you a human to talk to?”
• “Please let me know if this is not what you are looking for ”

What not to do
• “Your input is wrong”
• “This is how it is”

Emojis
Humble chatbots used in a professional setting can use emojis, but only the most common ones, such as a smile, thumbs up, or a wave. This is to avoid possible miscommunications between age groups and cultures. For a humble communication style, it is important to keep a positive mindset during the entire conversation, and by using smiling emojis the chatbot emphasizes that the user is not bothering it. However, less is more. Emojis should not be overused, in order to keep the conversation at a professional level. The emojis used by a humble chatbot need to be relevant to the content of the message. The meaning behind the emoji needs to be well-known and perceived in a similar manner across different cultures. More informal emojis can be used if the context allows it, e.g. if a user tries to break the chatbot by asking questions that have very little to do with the service the chatbot offers, or open-ended, hypothetical, or rhetorical questions.

What to do
• Hey, how can I help you? [smiling face-emoji]

What not to do
• Hey, how can I help you? [smirking face-emoji and hibiscus-emoji]

Availability
The chatbot should always be there, ready to help the user. For this reason, it should never end a session with a formal closing phrase. If the user submits a good-bye word to the chatbot, the chatbot should end the ongoing conversation but explain that it will stay idle in the background.

What to do
• “I hope I helped you today, come back anytime! [waving hand-emoji]”
• “Thank you for asking me, just let me know if there’s anything else I can do for you [smiling face-emoji]”
• “Let me know if there is anything else I can help you with. Have a nice day [smiling face-emoji]”

What not to do
• “Goodbye [waving hand-emoji]”

IMPLEMENTATION OF STYLE GUIDE
The style guide was implemented in Ericsson's initial chatbot prototype by changing only the sentences the chatbot communicated to the users, embodied through text and emojis, and not changing any visual elements of the CUI.
The initial chatbot listed its capabilities in a list of preset answer options. For the humble version, the message was changed from “What kind of support do you need?” to the more humble “Hey! How can I help you? Select a topic or write your question below.”

When the user selects a topic that they are interested in, the initial chatbot was straightforward and demanding, directly asking the user to provide more information in order to proceed with the user’s request. The humble chatbot has a less demanding approach.

When the chatbot presents results to a user request, the humble chatbot says “I hope these results were helpful for you! If what you were looking for is not included here, please let me know and I will do my best to improve over time”, in comparison to the initial chatbot that did not say anything.

If a user initially tried to break the chatbot by asking irrelevant questions such as “I hear music, do you?”, the chatbot replied with “I can’t handle that request”. In the humble version, the chatbot uses interjections such as “hm”, and instead of leaving the user in a dead-end, it lists the preset answer options of what the chatbot can do to further support the user. See Figure 4 for a comparison between the two versions.

Figure 4. Chatbot’s initial reply when user tries to break it to the left and after the implementation of a humble style guide to the right.

If a user wants to end a session by typing a goodbye phrase, the initial version of the chatbot replies with a simple “Goodbye!”. The humble version of the chatbot alternates between several sentences as a reply, such as “I hope I helped you today, come back anytime!”, “Let me know if there is anything else I can help you with. Have a nice day”, and “Thank you for asking me, just let me know if there is anything else I can do for you”.

CREDIBILITY EVALUATION
To understand how much and in what way the communication style of technical support chatbots affects its users, the chatbot’s credibility was evaluated by 12 employees at Ericsson. The participants were asked to grade sample statements based on how well they agreed with them. The sample statements were compiled from credibility evaluation surveys from related fields. A Likert scale was used as a way to get the conversation going regarding each dimension of the perceived credibility; therefore, each statement was followed by a mandatory question asking the user to further explain why they graded the statements the way they did. The survey data was collected during a one-on-one interview. The first part of the interview was a think-aloud focusing on the general usability of the chatbot, and in the last part the participants were asked to answer the survey. The credibility evaluation was made on both the initial version and the “humble” version of the chatbot. The statements and the median of the survey results can be seen in Figure 5.

Initial version
Six participants participated in the evaluation of the initial version of the chatbot. Two of the participants worked as support engineers, doing similar work to what the chatbot is intended to do in the future. They expected the chatbot to be able to help them with more technically advanced questions. The others had general engineering roles at Ericsson, such as test coordinator, product owner, and developer, and expected the chatbot to help them with both technologically advanced and less advanced questions. All of these roles have a relation to the technical areas to which the chatbot is intended to give support.

Three participants stated that they strongly agreed with the Goodwill statement and the rest that they agreed. One participant motivated their answer with:

“While it is not clear how the chatbot finds the content presented by it, I have no reasons to think the chatbot is not well-intentioned.”

The initial version was perceived high on the Trustworthiness and Reputation dimensions of credibility. However, the initial chatbot prototype was perceived as less positive on the Competence, Honesty, and Predictability dimensions. Three participants said that they neither agree nor disagree with the Honesty statement of the chatbot, saying that the chatbot did not reach that level. One participant strongly disagreed with it and said:

“I do not think that the chatbot is responsible for the reliability of the information. If the information source is not accurate (erroneous or incomplete), the chatbot will simply present erroneous or incomplete information. The chatbot can be trained to ignore unreliable information, but it is impossible to ensure that the information will always be reliable.”

Humble version
In the evaluation of the humble version of the chatbot, two participants worked as system architects, one as a UX specialist, and the rest as data engineers. These participants had less knowledge of the chatbot’s area of expertise than the participants of the initial version evaluation, meaning that they expected the chatbot to help them with a wider range of technical questions.

Analyzing the six independent data collections from the humble version evaluation showed positive results in comparison to the initial chatbot. Two exceptions were raised: (1) the dimension of Goodwill, and (2) the dimension of Trustworthiness. Two participants said that they strongly agreed with the
Goodwill statement, the rest that they just agreed with it. The dimension of Trustworthiness stayed at the same rating as for the initial version of the chatbot. One participant explained their score with:

“If it finds some relevant info, the info page itself is of trustworthy sources, so I can freely accept it. However, I would never accept it if it would say something is not there, not found: that might just mean it cannot find it.”

Figure 5. The median results from the two communication style evaluations based on the six dimensions of credibility; the median of the results from the credibility evaluation of the initial communication style of the chatbot and the version of the chatbot with the implemented “humble” style guide.

The dimensions of Competence, Honesty, Predictability, and Reputation all increased by half a level on the Likert scale compared to the initial version of the chatbot. For the Reputation statement, several participants expressed that the humble version was “simple and efficient to use”.

Net Promoter Score (NPS)
To get a more accurate overview of the credibility measurement, all 12 participants were asked “How likely are you to recommend the chatbot to a colleague?”. Since NPS measures customer satisfaction, it is also related to the dimensions of credibility and was therefore included as a standardized complement to the survey. The initial version of the chatbot got a negative result of -16. The humble version had a positive score with an NPS value of 17. See Figure 6 for diagrams.

Figure 6. The NPS results to the question “How likely are you to recommend the chatbot to a colleague?”. To the left are the results from the chatbot using the initial communication style, and to the right are the results of the chatbot using the humble communication style.

DISCUSSION
This thesis work aimed to investigate whether the perceived credibility of chatbots is affected by a more humble conversation style embodied through text and emojis. To understand the impact, this study had to cover several phases. First, defining humility in the specific context of a technical environment. Second, designing a style guide that can be easily implemented by chatbots. Third, implementing the style guide on a prototype used in a large-scale software organization. And lastly, measuring the eventual impact by measuring the perceived credibility of the chatbot using the humble style guide in comparison to a version of the chatbot that is not. This study found that the perceived credibility is, in general, higher for chatbots using a more humble conversation style. Two exceptions were found: (1) the credibility dimension of Trustworthiness stayed at the same level for both chatbot versions in this study, and (2) the credibility dimension of Goodwill was lower for the humble chatbot compared to the chatbot of comparison.

The neutral Trustworthiness result indicates that a humble communication style has no direct impact on the perceived trust of the chatbot. This means that the personality of a chatbot used for support in a technical environment cannot directly be compared to the personality of a human, since previous research on the HEXACO-PI-R instrument found that personalities with high honesty-humility are perceived to be more trustworthy than others [9, 26, 3, 22]. Additionally, this result can be extended to the study by Nordheim, Følstad, and Bjørkli where trust was determined by the chatbot’s interpretation of requests, its self-presentation, and its professional
appearance, by explaining that a humble communication style has no impact on the trustworthiness of chatbots that are used in a technical work environment [15].

The negative Goodwill result suggests that Goodwill does not necessarily contribute to the positive perception of credibility for humble chatbots. This contradicts the hypothesis by Beattie, Edwards, and Edwards, where they explain that “chatbots using emojis in their conversation may be perceived to be demonstrating human goodwill by taking steps to convey relational information and keeping information open via facing more conversational cues”. However, since the humble chatbot is perceived to be more credible in general, but also more competent in comparison to the chatbot using the basic communication style, the outcome of this thesis work also confirms the study by Beattie, Edwards, and Edwards where the sender is perceived to be both more competent and credible when using emojis [5].

The previous research on communication styles (informal vs formal) of customer service chatbots of familiar and unfamiliar brands, by Liebrecht, Sander, and van Hooijdonk, can to some extent be extended by the findings of this thesis. The style guide is based on the outcome from the design workshop where all six independent participants were not trying to use human approaches to be humble, but rather rethinking what this means for a chatbot. Some formalities were explained to be needed for the chatbot to be perceived as professional, but at the same time interjections, emojis, and simpler words, which can be found to be less formal, were mentioned to be fundamental features of a humble communication style. This is also something that is explained in the previous research to be important to improve customers’ brand attitude and quality of interactions [19].

Based on the present findings, there are two main outcomes of this thesis that can be directly applied in the industry: the style guide and the survey results. The style guide can be of interest to other designers and developers working with CUIs in general, and support chatbots in particular, since it gives practical guidelines of “dos” and “don’ts” in how to design a chatbot that is perceived as humble. In addition, the results from the evaluation, especially the positive NPS results, can be of interest for people working with developing chatbots in a technical environment in general, and in particular for stakeholders who need the statistics to allow or reject a humble communication style.

Limitations
To gain a deeper understanding of the mechanisms behind the perceptions of a credible technical support chatbot, more research is needed that takes the following limitations into account.

Design workshop limitations
The participants of the design workshop who evaluated the perceived communication style of the initial chatbot were not able to interact with it themselves before the workshop but only observed a demo of it during the workshop and drew their conclusions on its communication style from that. Ideally, the participants should have interacted with the chatbot on their own before drawing conclusions about its communication style.

An online interactive presentation tool, Mentimeter, was used to collaboratively brainstorm ideas and answers to the different activities of the workshop [1]. On the positive side, Mentimeter encourages all voices to be heard and generates great visualization of the data collected. On the downside, the participants could influence other participants’ answers. This was due to time pressure on completing each task in time, and the visualization showing all participants’ answers to everyone directly when an answer was submitted. For example, for the first scripting activity, all six participants submitted very similar sentences to the task “How should a humble chatbot greet?”.

Credibility evaluation limitations
The perceived credibility evaluation generated mostly positive results for the humble communication style, but the results might have turned out differently with a more homogeneous set of user groups. On the one hand, it has been found that users who are familiar with the content of a CUI will evaluate it more stringently and likely perceive it to be less credible [14]. On the other hand, the main end-users of Ericsson’s internal chatbot are engineers who all should have some level of knowledge of the technical systems they are asking the chatbot for support with. It has been found difficult to measure perceived credibility if participants’ judgments are influenced by objective properties of the information or its source [12]. For these reasons, the outcome of this study might have turned out differently if the style guide had been implemented and evaluated on a chatbot that was not intended to be used in a technical professional environment.

In hindsight, there are disadvantages to using the Likert scale as a method to collect data. One example is acquiescence bias, a phenomenon arising in surveys in general where respondents are more likely to agree than disagree with the statement shown [18]. To prevent accidentally misleading the participants and to be able to ask more accurate follow-up questions, it would have been preferable to ask the participants to rate the credibility statements only over interviews instead of the survey. For instance, from analyzing the results of the survey, several participants from the initial communication style evaluation overlooked the chatbot’s negative behavior and tried to find solutions for it, instead of purely explaining why they agreed or disagreed with the statement. This was shown in the think-alouds, where they more often made excuses for the chatbot’s bad behavior by saying things like “I should adapt my search query to fit this microservice” when the chatbot was unable to return their intended results, instead of thinking that the chatbot did something wrong. A reason for this may be that participants of the initial evaluation had a higher knowledge of the expertise of the chatbot due to their daily work tasks.

Ethics
Interaction happens both ways. For this reason, it is important to be aware of the chatbot’s personality since this may influence user behavior in the long term. The decision to take a humble approach for the communication style of the CUI can be more ethically defensible since humble qualities are
an important trait at work and an appreciated communication style in professional settings [9, 26, 3, 22]. Beyond work benefits, the humble style guide encourages gratitude. A study by Grant and Gino has shown that showing gratitude, even the smallest thank you, can motivate prosocial behaviors in others [16]. Not only those who give or receive prosocial behavior benefit from it; it also affects the people observing the kind acts or being part of the community where prosocial behavior happens [7].

The style guide encourages the design of a chatbot that is humanlike, but not overly so, to ensure that it still fulfills its primary purpose of giving the user the right answer as fast as possible. Additionally, the style guide encourages chatbots to have a human handoff if miscommunication occurs. This is not only a more efficient approach to problem solving, but can also have a positive influence on the user’s mental health since early studies indicate that a purely robotic conversation might generate an increased feeling of isolation, loneliness, and depression [6].

Future work
Future work should focus on better understanding why the dimension of Goodwill is the only dimension that harms the perceived credibility. Would the Goodwill indication stay the same if the Goodwill statement had been phrased differently, if a more diverse set of participants had answered the survey, or if more participants had participated? It would also be of interest to investigate if the perception of Trustworthiness would increase if the chatbot had a defined gender, since previous research indicates that gender transparency of a bot can create trust among users [4]. Another interesting approach to future work would be to analyze if and how the credibility evaluation would have differed if the style guide had been applied to another CUI or evaluated on another more diverse user group. Would the style guide also be perceived as humble by neurodiverse users? Additionally, the last phase of this research work was quite short. It would be interesting to investigate whether a longer evaluation phase might show effects on human behavioral change since interactions happen both ways. Would a humble chatbot influence its users to be more humble too? Finally, to make the style guide easier to adapt by other software development organizations it should be further improved, evaluated, generalized, and optimized for easier implementation and adaption in the future. This could be done by looping over more rounds of the Double Diamond approach, which due to time limitations was not possible for this thesis work.

CONCLUSION
This study contributes to the field of HCI by highlighting the importance of the communication style of support chatbots used in a technical environment. It aims to bridge the theoretical gap between credibility evaluation of support chatbots and the use of a more humble communication style. The paper presents a style guide for a humble communication style based on a design workshop with participants from a diverse set of cultures and work areas. The designed humble communication style was implemented and evaluated by employees from a large-scale software development organization.

Participants of the evaluation study experienced that a more humble communication style positively influenced the perceived knowledge of the chatbot, the reliability of its stated information, as well as the predictability of the chatbot’s next action. The participants who used the chatbot with a humble communication style also stated that they were more likely to use the chatbot in the future as well as recommend the chatbot to a colleague. If a support chatbot used in a technical environment is perceived as credible, it would greatly increase the chances of employees trusting the chatbot as the primary source of information. If more employees use a chatbot for assistance it will reduce the time it takes for the employees to get support and also free up time for the employees working with supporting others today.

In conclusion, this study shows that using a more humble communication style generally positively affects the perceived credibility of internal support chatbots used in a technical environment.

ACKNOWLEDGMENTS
The author thanks all the volunteers who participated in the design workshop, interviews, and credibility survey. A grateful thank you to Lori-Ann Robertson and Leif Jonsson at Ericsson for your help and support in providing feedback on ideas and help in collecting participants for the interviews and questionnaire. A big thank you to Madeline Balaam for being an excellent academic supervisor and thank you to my examiner Kristina Höök for keeping me updated on the latest in the field. Finally, the author gratefully acknowledges friends and family for providing endless feedback and support throughout the entire thesis work. Thank you!

REFERENCES
[1] 2022. (2022). https://www.mentimeter.com/

[2] Farah Alsudani and Matthew Casey. 2009. The Effect of Aesthetics on Web Credibility. In Proceedings of the 23rd British HCI Group Annual Conference on People and Computers: Celebrating People and Technology (BCS-HCI ’09). BCS Learning & Development Ltd., Swindon, GBR, 512–519. DOI: http://dx.doi.org/10.5555/1671011.1671077

[3] Michael C. Ashton and Kibeom Lee. 2009. The HEXACO–60: A Short Measure of the Major Dimensions of Personality. Journal of Personality Assessment 91, 4 (2009), 340–345. DOI: http://dx.doi.org/10.1080/00223890902935878 PMID: 20017063.

[4] Rachel Batish. 2018. Voicebot and Chatbot Design: Flexible Conversational Interfaces with Amazon Alexa, Google Home, and Facebook Messenger. Packt Publishing, Limited, Birmingham. 1789139627

[5] Austin Beattie, Autumn P. Edwards, and Chad Edwards. 2020. A Bot and a Smile: Interpersonal Impressions of Chatbots and Humans Using Emoji in
Computer-mediated Communication. Communication Studies 71, 3 (2020), 409–427. DOI: http://dx.doi.org/10.1080/10510974.2020.1725082

[6] Petter Bae Brandtzaeg and Asbjørn Følstad. 2018. Chatbots: Changing User Needs and Motivations. Interactions 25, 5 (aug 2018), 38–43. DOI: http://dx.doi.org/10.1145/3236669

[7] Joseph Chancellor, Seth Margolis, and Sonja Lyubomirsky. 2018. The propagation of everyday prosociality in the workplace. The Journal of Positive Psychology 13, 3 (2018), 271–283. DOI: http://dx.doi.org/10.1080/17439760.2016.1257055

[8] Reinout E. de Vries, Angelique Bakker-Pieper, Femke E. Konings, and Barbara Schouten. 2013. The Communication Styles Inventory (CSI): A Six-Dimensional Behavioral Model of Communication Styles and Its Relation With Personality. Communication Research 40, 4 (2013), 506–532. DOI: http://dx.doi.org/10.1177/0093650211413571

[9] Reinout E. de Vries and Jean-Louis van Gelder. 2015. Explaining workplace delinquency: The role of Honesty–Humility, ethical culture, and employee surveillance. Personality and Individual Differences 86 (2015), 112–116. DOI: http://dx.doi.org/10.1016/j.paid.2015.06.008

[10] Cal W. Downs, Joan Archer, John McGrath, and Jeff Stafford. 1988. An Analysis of Communication Style Instrumentation. Management Communication Quarterly 1, 4 (1988), 543–571. DOI: http://dx.doi.org/10.1177/0893318988001004006

[11] Ericsson. 2020. Ericsson brand guidelines - extract from ericsson brand house (v.1.0 ed.). 14 pages. https://mediabank.ericsson.net/admin/mb/?h=dbeb87a1bcb16fa379c0020bdf713872&p=dccda36951e6721097a93eae5c593859&display=list

[12] Andrew J. Flanagin and Miriam J. Metzger. 2007. The role of site features, user attributes, and information verification behaviors on the perceived credibility of web-based information. New Media & Society 9, 2 (2007), 319–342. DOI: http://dx.doi.org/10.1177/1461444807075015

[13] B. J. Fogg, Jonathan Marshall, Othman Laraki, Alex Osipovich, Chris Varma, Nicholas Fang, Jyoti Paul, Akshay Rangnekar, John Shon, Preeti Swani, and Marissa Treinen. 2001. What Makes Web Sites Credible? A Report on a Large Quantitative Study. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’01). Association for Computing Machinery, New York, NY, USA, 61–68. 1581133278 DOI: http://dx.doi.org/10.1145/365024.365037

[14] B. J. Fogg and Hsiang Tseng. 1999. The Elements of Computer Credibility. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’99). Association for Computing Machinery, New York, NY, USA, 80–87. 0201485591 DOI: http://dx.doi.org/10.1145/302979.303001

[15] Asbjørn Følstad, Cecilie Bertinussen Nordheim, and Cato Alexander Bjørkli. 2018. What Makes Users Trust a Chatbot for Customer Service? An Exploratory Interview Study. In Internet Science, Svetlana S. Bodrunova (Ed.). Springer International Publishing, Cham, 194–208. 978-3-030-01437-7

[16] Adam Grant and Francesca Gino. 2010. A Little Thanks Goes a Long Way: Explaining Why Gratitude Expressions Motivate Prosocial Behavior. Journal of personality and social psychology 98 (06 2010), 946–55. DOI: http://dx.doi.org/10.1037/a0017935

[17] Kieun Kim. 2016. The Relationship of UX and Perceptions of Credibility: The Case of the Mobile Social Commerce Sites. International Journal of Affective Engineering 15, 2 (2016), 109–114.

[18] Ozan Kuru and Josh Pasek. 2016. Improving social media measurement in surveys: Avoiding acquiescence bias in Facebook research. Computers in Human Behavior 57 (2016), 82–92. DOI: http://dx.doi.org/10.1016/j.chb.2015.12.008

[19] Christine Liebrecht, Lena Sander, and Charlotte van Hooijdonk. 2021. Too Informal? How a Chatbot’s Communication Style Affects Brand Attitude and Quality of Interaction. Følstad A. et al. (eds) Chatbot Research and Design. CONVERSATIONS 2020. Lecture Notes in Computer Science 12604 (2021), 16–31. DOI: http://dx.doi.org/10.1007/978-3-030-68288-0_2

[20] Jessica Lindblom and Rebecca Andreasson. 2016. Current Challenges for UX Evaluation of Human-Robot Interaction. In Advances in Ergonomics of Manufacturing: Managing the Enterprise of the Future, Christopher Schlick and Stefan Trzcieliński (Eds.). Springer International Publishing, Cham, 267–277. 978-3-319-41697-7

[21] Marian McDonnell and David Baxter. 2019. Chatbots and Gender Stereotyping. Interacting with Computers 31, 2 (04 2019), 116–121. DOI: http://dx.doi.org/10.1093/iwc/iwz007

[22] Lea Müller, Jens Mattke, Christian Maier, Tim Weitzel, and Heinrich Graser. 2019. Chatbot Acceptance. Proceedings of the 2019 on Computers and People Research Conference (2019). DOI: http://dx.doi.org/10.1145/3322385.3322392

[23] Cecilie Bertinussen Nordheim, Asbjørn Følstad, and Cato Alexander Bjørkli. 2019. An Initial Model of Trust in Chatbots for Customer Service—Findings from a Questionnaire Study. Interacting with Computers 31, 3 (08 2019), 317–335. DOI: http://dx.doi.org/10.1093/iwc/iwz022

[24] Robert Norton. 1983. Communicator style. Sage Publications.