User expectations towards machine translation: A case study
Barbara Heinisch
Centre for Translation Studies
University of Vienna
Austria
barbara.heinisch@univie.ac.at

Vesna Lušicky
Centre for Translation Studies
University of Vienna
Austria
vesna.lusicky@univie.ac.at

Abstract

Neural machine translation (NMT) systems have emerged as powerful platforms for providing fluent translations in a variety of languages and domains. The widespread adoption of NMT has heightened the need to study the results and impact of these systems. Although the acceptance of machine translation has been analyzed, users’ expectations towards NMT have not yet received much attention. This paper investigates the expectations of novice translators enrolled on a postgraduate program in specialized translation. In addition, it examines the confirmation or disconfirmation of expectations towards machine translation (MT) output among this user group. A three-step mixed-method approach was applied: a quantitative questionnaire and two recurrent (pre-trial and post-trial) evaluations of raw MT outputs. The evaluations consisted of the identification and classification of errors in NMT output according to the Multidimensional Quality Metrics. The respondents expected the MT output to be of rather low quality, but the NMT output turned out to be of even lower quality than the participants had expected. The reported frequency of error types in the MT output also differed significantly from the expected frequency. This paper argues that users’ experience and expectations have an impact on the use and evaluation of machine translation.

© 2019 The authors. This article is licensed under a Creative Commons 4.0 licence, no derivative works, attribution, CC BY-ND.
Proceedings of MT Summit XVII, volume 2 Dublin, Aug. 19-23, 2019 | p. 42

1 Introduction

Language technology applications have become a ubiquitous service used by various user groups to overcome language barriers. While certain types of technology, such as translation memory systems, are specialized tools used only by translators, machine translation (MT) systems are also used by non-translators. Whereas users’ exposure to MT was largely limited to gist translation in the past, they are increasingly using MT in professional and other scenarios. The acceptance of MT tools and services is attested by the high number of users of generic online MT services (Way, 2018). Based on their prior experiences, users develop expectations towards MT. Expectations are beliefs about the attributes or performance of a product or service in the future (Olson & Dover, 1979). Users’ expectations may influence the intended use and evaluation of MT.

Expectations also provide the frame of reference for satisfaction (Higgs et al., 2005). Satisfaction with a service is crucial when introducing or evaluating MT. Expectations are dynamic constructs that combine users’ pre-trial perceptions and beliefs about the performance or attributes of a product or service. Although there is some ambiguity regarding the definition and operationalization of expectations, the service quality literature distinguishes several categories of expectations, most frequently forecast, normative, ideal and minimum tolerable expectations. The four categories cover different dimensions: forecast describes users’ perception of what will occur; normative describes users’ perception of what should occur; ideal describes the highest level attainable in a category; and minimum tolerable describes the minimum baseline for normative and ideal expectations
(Higgs et al., 2005). Users’ expectations and the type of expectations depend on internal and external cues, such as users’ prior experience and information on products.

Users’ (quality) expectations towards MT output and the resulting implications for MT use are an under-explored topic in MT research (Way, 2018). So far, expectations have been addressed in relation to the estimation of MT quality and post-editing effort (Specia et al., 2009). Way (2018) gives an overview of the level of quality that can be expected from MT. Existing research recognizes the critical role played by the adoption (Cadwell et al., 2018) and acceptance of MT (Moorkens & Way, 2016; Koskinen & Ruokonen, 2017). Gaspari et al. (2015) also attempted to map the expectations, requirements and needs of the translation industry concerning translation quality and MT.

With the widespread application of neural machine translation (NMT) as the MT approach of choice in generic as well as specialized MT systems, the question of pre-trial user expectations should be addressed, especially expectations based on previous use and on information obtained about the service. Such expectations may have implications for the purpose for which users intend to use MT and for their satisfaction with the service. Expectations should also be considered in the human evaluation of MT output: the types of expectations and a potential negative bias may influence the results of such evaluations.

A growing body of literature recognizes the importance of quality assessment of MT output. For MT developers, scale and robustness are major concerns, but end-users are also interested in receiving good-enough or high-quality translations (Way, 2018). The concept of fitness-for-purpose of translation has been widely recognized, but assessment methods vary in operationalization and theoretical framework. The quality of MT output is assessed either automatically or by humans. First, automatic evaluation is usually based on metrics such as BLEU (Papineni et al., 2002), NIST, WNMf or METEOR (Anastasiou & Gupta, 2011). Metrics such as BLEU compare the MT output string with a human translation that is treated as the “gold standard”. However, these metrics do not use the source sentence as a reference and ignore the fact that there might be more than one correct translation (Way, 2018). Second, human evaluation also requires evaluation criteria (see Fiederer & O’Brien, 2009, for a brief overview). When raw MT output is compared with human translations, the purpose of MT, e.g. whether MT is used to get the gist of a text or for publication purposes, is usually not taken into account. Only the latter would usually require post-editing.

Several error typologies have been developed to assess the quality of machine-translated content. The Multidimensional Quality Metrics (MQM) error typology (Lommel et al., 2014) has been increasingly used and expanded for the evaluation of NMT (Klubička et al., 2018). The MQM framework provides a comprehensive typology of quality issues. It contains standardized names and definitions of errors and offers flexibility in the number of assessment layers and their granularity. The MQM issues are organized in eight major dimensions: accuracy, fluency, terminology, locale convention, style, verity, design and internationalization (Lommel et al., 2014).

By design, the assessment of the quality of MT output is a post-trial evaluation and does not consider pre-trial expectations.

2 Research design and method

The research reported in this paper has two objectives. First, it investigates the expectations of a group of postgraduate specialized translation students towards MT, exploring how previous experience with MT influences their expectations towards the overall quality of MT output and the error types found in it. Second, it examines the confirmation or disconfirmation of these expectations through an evaluation of two MT outputs.

This study contributes to research on expectations towards MT by demonstrating that experience and expectations influence the use of MT systems and the evaluation of MT output. We applied a mixed-method approach, combining a quantitative questionnaire with MT output evaluation, i.e. error identification, error classification and correction of MT output.

2.1 Questionnaire

A questionnaire consisting of three parts with closed and open questions was distributed among the user group. The first part was designed to ascertain the respondents’ translation experience, working languages (A, B and C languages; AIIC, 2018) and professional experience.

The second part of the questionnaire addressed the respondents’ prior experience in MT use, including the frequency of and reasons for MT use. The participants were asked to state whether they use MT for professional, study or private purposes, which MT systems they use and for which
types of text. This part also elicited information on the respondents’ forecast, normative and ideal expectations towards MT. The participants were asked to rank the MQM quality-related issues by the frequency with which they would expect them in MT output. All respondents had to state the most frequent errors they expect in MT output.

The third part of the questionnaire elicited information on the quality expectations and expected errors when using an MT system for two different texts. First, the students were asked to read the English source text. Afterwards, they had to state their expectations towards the quality of the related MT output on a five-point grading scale (excellent, good, satisfactory, sufficient, useless) and to rank the expected errors in the MT output according to the MQM. Second, they had to download a spreadsheet containing the MQM and the TAUS Dynamic Quality Framework (DQF) (Görög, 2014). They compared the source and target texts and identified (and corrected) errors in the MT output. In the spreadsheet, each error was assigned to an MQM error (sub)category and to an error severity level on a five-point scale. The completed spreadsheets served as the basis for the third step, which consisted of ranking the error types found in the MT output according to their frequency. Using the TAUS DQF and MQM for the error identification and classification task allowed us to compare the participants’ expectations with the evaluation results.

The questionnaire was circulated in early 2019. Seventy-nine students enrolled on a master’s program in translation focusing on specialized translation were recruited for this study. Thirty-two of them were excluded because English was not one of their working languages or because they did not complete all the tasks.

2.2 Evaluation of MT output

The objective of the participants’ evaluation of MT output in the third part of the questionnaire was to collect the errors that the respondents detected in the raw MT output. The evaluation was used for a contrastive analysis of users’ expectations towards errors in MT output and the errors actually detected, which made it possible to analyze the confirmation or disconfirmation of expectations.

The students evaluated the quality of the raw MT output based on the MQM error typology and the TAUS DQF. Prior to the evaluation, they were familiarized with both frameworks.

The students were given two English source texts and their German MT outputs. The source texts were excerpts from British newspaper articles on a topic related to Austria. They comprised about 200 words each and were translated from English into German with the EU Council Presidency Translator (2019) platform. The participants were provided with the source texts and the raw MT output as well as the MQM and TAUS DQF spreadsheet for both texts. The German sentences were evaluated at segment level in accordance with the MQM.

3 Results

3.1 Profile of the respondents

Of the final cohort of 47 respondents, 8 already worked as professional translators and 39 were novice translators. The majority (68%) of the respondents had German as their A language, ahead of Italian (11%) and Russian, Hungarian, Polish, English and French. More than half of the participants (60%) stated that English was their B language, with German, Russian, Croatian and Japanese being the B language of the remaining respondents. The C languages were quite diverse, ranging from English (38%), French, Spanish, Slovak, Italian and German to Greek and Romanian. Six respondents stated that they do not work with a C language. When asked about their translation experience, the majority (79%) indicated that they had translated more than 15 texts during their studies. The 8 students (17%) who already worked as professional translators were active in the fields of engineering, social sciences and humanities.

3.2 Experience in MT use

About 62% of the respondents already had experience in MT use. Almost all of them (93%) reported that they use MT as part of their studies. More than two-thirds (69%) indicated that they use MT for private purposes and 31% for professional purposes. When asked about the frequency of MT use in a professional, private or study context, 41% of the students indicated that they use MT for study purposes on a weekly basis, while others use it several times a year (19%) or several times a month (15%). For private purposes, they reported using MT several times a year (31%), on a weekly basis (21%), on a daily basis (3%) or never (14%). For professional purposes, the respondents indicated that they never use MT (55%) or use it several times a month (17%), on a daily basis (14%), several times a year or on a weekly basis (7% each).
Those experienced in MT use translated documents, e.g. reports or files (79%), ahead of websites (34%) and correspondence, e.g. e-mails (24%). Most of them reported that they use MT for translations from German into English and vice versa. Asked about their MT system of choice, they listed DeepL (69%) and Google Translate (59%). Another system mentioned was eTranslation. MT systems that respondents had tested but did not use frequently included Google Translate, the Facebook translator, Bing, Yandex and Babel.

The reasons for using MT included saving time (69%), getting the gist of a text (66%), consulting a reference (55%), avoiding repetitive work (31%), avoiding typing (21%) and avoiding research (3%).

3.3 Expectations towards MT quality

When using MT for study purposes, the participants expected MT to provide a raw translation, i.e. a first draft they can post-edit (53%), or a gist translation (38%). Only 5 respondents (11%) would want MT to provide immediately usable translations in a study context. For professional and private purposes, 21 respondents (45%) expected MT to produce texts that can be used immediately without post-editing, i.e. they expected a final translation. For professional purposes, 15 respondents (32%) reported that they would use MT output as a draft translation. For private purposes, 24 respondents (51%) would use MT output only as a gist translation. This means that draft translations were more important in a study context, whereas gist purposes (understanding the meaning of the text) and final translations were more relevant in a private context.

When asked to rank their general expectations towards working with an MT system, 81% of the respondents ranked fast translation first. Proper functioning and intuitive use of the MT system ranked second among 60% of the respondents, whereas intuitive use still ranked third among 28% of the respondents. On ranks 4 to 6, the respondents predominantly listed translation of different file formats, status feedback and accessibility of the MT system.

In response to the question about the expected quality-related issues in MT output, nearly a third (30%) of those surveyed ranked accuracy first, while nearly one quarter (23%) ranked fluency first. Just over a third of the respondents ranked accuracy second, while approximately a fifth (21%) ranked fluency second. Terminology (30%) and style (23%) were the two main aspects on the third rank, while locale conventions and style (23% each) received the highest number of responses on the fourth rank. Design and verity were mentioned predominantly on ranks 6 and 7.

3.4 Expectations towards error types and their (dis)confirmation

After having read the first source text (ST1), the respondents rated the expected quality of the related MT output (O1) with a grade ranging from excellent to useless. Almost half (49%) of the respondents expected the quality of the MT output to be sufficient, while 40% expected satisfactory MT output. Only a small number of participants expected good quality (4%) or useless translations (6%). After having read O1 and having identified, categorized and corrected the errors in the raw MT output, the participants rated the quality of O1 as follows: sufficient (40%), useless (28%), satisfactory (23%) and good (9%). Thus, the number of useless grades increased significantly, while the number of satisfactory and sufficient grades decreased.

The expected errors and their frequency in O1 were primarily related to fluency (38% on the first rank), accuracy (28% on the first rank, 32% on the second rank), style (23% on the second rank) and terminology (21% on the third rank). Compared with these expectations, the errors reported after the evaluation included more accuracy errors and fewer fluency and verity errors on rank 1, and more fluency errors and fewer accuracy and terminology errors on rank 2. Style errors increased slightly on rank 3, while locale convention errors increased on rank 4.

For the second source text (ST2), the students predominantly expected the MT output (O2) to be of sufficient quality (55%) or useless (26%). The other students expected O2 to be of satisfactory (13%) or good quality (6%). Compared to their expectations, they rated the actual translation lower: the participants stated that O2 was useless (36%) or of sufficient quality (49%). This shows that they had expected the MT output to be of higher quality than they later reported.

When asked about the expected error types in O2, well over half (64%) of the respondents ranked accuracy errors first and more than half (57%) ranked fluency errors second. Well under half of those surveyed (40%) ranked style errors third. After completing the MQM table, there was a significant increase in fluency errors and a decrease in accuracy errors on rank 1 as well as a
significant increase in fluency errors on rank 2 and a slight increase in terminology errors. On rank 3, the students reported more accuracy errors and fewer locale convention errors than expected.

Thus, accuracy and fluency were the MQM error categories listed most often in all analyzed areas, i.e. the overall quality of MT output, the expected error types and the error types found. However, the data showed a slight shift in the accuracy and fluency categories between the expected and actual error types in both texts.

In summary, the majority of the participants expected the MT output to be of sufficient or lower quality. The translations of both texts partly failed to meet these expectations: the participants rated the MT output higher before the evaluation than after it.

The respondents’ expectations towards the error types in MT output were disconfirmed. Before the evaluation of O1, the participants most often ranked fluency errors first. After the evaluation, however, they most often reported accuracy errors on the first rank (62%).

The error types the students expected in O2 may have been influenced by the outcome of their analysis of the errors found in O1. As mentioned above, after having analyzed O1, the majority of the errors reported were related to accuracy (62% on the first rank). This is also reflected in the errors expected for O2: here, 64% of the respondents ranked accuracy errors first. Their expectations towards the translation quality of O2 were thus confirmed to a greater extent. For O2, slightly more fluency errors and fewer accuracy errors were reported on the first rank than expected.

This demonstrates that the participants in this study have rather low expectations towards the quality of MT output. These expectations were only partly met, since the quality of both target texts translated with the MT system was reported to be even lower than expected. This might also be the reason why the participants expected the second text to be of slightly lower quality than the first one. It also means that there was a minor discrepancy between the pre-trial expectations and the errors found by the participants during evaluation. Moreover, this user group expected a higher frequency of some error types than the frequency reported post-trial.

4 Discussion

We focused on postgraduate translation students due to the documented competence profile of this user group, which includes translation, technological and revision competence (EMT, 2009). Therefore, we assumed that the students had a basic knowledge of MT systems, their advantages and disadvantages, and post-editing. Nevertheless, it was necessary to familiarize them with the rather complex MQM framework, which required a certain amount of time.

Although this study is limited to a small number of participants, one NMT engine, one text type (newspaper articles) and one language pair and direction, it revealed that participants use MT regularly or have used it at least once, especially freely available systems. DeepL was the most frequently used system among the translation students, ahead of Google Translate. We also saw that users’ previous experience with MT systems has an impact on their future expectations towards similar systems. This is in line with Anastasiou & Gupta (2011), who assume that freely available, easily accessible MT that produces good-enough translations will continue to be the MT system of choice for casual users who wish to translate websites or use MT for private purposes.

The analyzed user group expected the MT system to work fast, to function properly and to be intuitive to use.

The majority of the respondents had considerable experience of MT use for study or private purposes. Almost half of the students (45% each) reported that they expect MT output in professional and private contexts to be usable immediately without any further editing. However, when they used MT as part of their studies, more than half of the respondents expected a raw translation they can post-edit rather than an immediately usable translation. Gist translations were more important in a private context. A possible explanation is that the majority had already used MT output as a draft translation they post-edited. Based on our experience, translation students aim to produce high-quality translations. Therefore, they adapt the MT output to meet their ideal expectations. For private purposes, however, they seem to use MT output not as a pre-translation they can work on, but for languages they might not understand. Here, getting the gist of the text might be more important than high accuracy and fluency. Thus, their expectations fall into the category of minimum tolerable. This finding
seems to be consistent with other research that found a “dance of agency” (Cadwell et al., 2018).

One interesting finding is that the students expected the MT output to be of rather low quality although they had used (general-purpose) MT before. This finding contrasts with previous studies suggesting that those expressing greater skepticism towards MT had the least exposure to it (Fulford, 2002) and that a negative attitude towards MT seems to be related to a lack of knowledge and (practical) experience (Gaspari, 2001). However, these studies focused on participants’ opinions or attitudes, whereas this study addressed students’ previous MT experience in relation to their expectations as well as the confirmation or disconfirmation of these expectations. A possible explanation for the rather low expectations towards the quality of MT output is that the students may be aware of the limitations of MT systems since they use them in their studies.

When we asked the students about their expectations towards the MT output quality, accuracy and fluency were ranked high. This suggests that, for them, accuracy and fluency constitute translation quality. A similar finding was reported by another study, in which translators expected an MT engine to suggest correct translations, which may refer to correct target-language syntax and grammar as well as semantic equivalence to the source text (Lagoudaki, 2008).

Given the small sample size and the focus on translation students (rather than professional translators), caution must be applied, as the findings might not be generalizable to other user groups. However, MT-related tasks require competences that differ from the traditional profile of professional translators and go beyond those acquired in translator training (Pym, 2013). Professional translators may also have limited practical exposure to MT and post-editing (Blagodarna, 2018). In addition, a major issue in conceptualizing expectations concerns the sources of information (or lack thereof) used to form them: marketing communication by developers, mass media, training settings, word-of-mouth referrals and prior experience with similar products. Service quality is not static but should be considered a dynamic process (Boulding et al., 1993). Therefore, this study can only provide a small insight into the expectations of translation students at a certain point in time. In addition, the students may not have identified all errors in the raw MT output. They may also lack a critical approach to evaluating MT output and instead search for errors that human translators usually make (Sycz-Opoń & Gałuskina, 2017). Moreover, our analysis does not take into account intra-annotator or inter-annotator agreement when identifying and categorizing the errors in the MT output.

The aim of (neural) MT is to reach the fluency of human translations (Way, 2018). However, accuracy, i.e. whether the MT output conveys the meaning of the source text, seems to be a major concern of translation students for the texts analyzed in this study. NMT engines provide fluent and easily readable translations. However, such fluent translations may mislead users into thinking that the content is translated correctly, although the message may be completely wrong.

5 Conclusion

Translation should fulfil a specific purpose for the intended recipient in a certain context (Reiss & Vermeer, 1984). Therefore, this paper highlights the importance of paying attention not only to MT (quality) evaluation (by users) but also to user expectations. It attempts to show that user expectations are crucial in translation, including MT-related processes, since they may help predict user interventions such as pre- and post-editing. This paper argues that users’ past experiences, expectations and the (dis)confirmation of these expectations frame human evaluation of MT. Therefore, users’ expectations should be factored in when introducing MT services and novel approaches to MT.

Acknowledgment

This work has been partly funded by the European Union’s Connecting Europe Facility under grant agreement no. INEA/CEF/ICT/A2016/1297953.

References

AIIC. 2018. Working languages. Retrieved from https://aiic.net/page/4004/what-are-working-languages-to-a-conference-interpreter/lang/1
Anastasiou, Dimitra and Rajat Gupta. 2011. Comparison of crowdsourcing translation with Machine Translation. Journal of Information Science, 37(6):637–659. https://doi.org/10.1177/0165551511418760
Blagodarna, Olena. 2018. Insights into post-editors’ profiles and post-editing practices. Tradumàtica: tecnologies de la traducció, (16):35–51.
Boulding, William, Ajay Kalra, Richard Staelin, and Valarie A. Zeithaml. 1993. A Dynamic Process Model of Service Quality: From Expectations to Behavioral Intentions. Journal of Marketing Research, 30(1):7–27. https://doi.org/10.2307/3172510
Cadwell, Patrick, Sharon O’Brien, and Carlos C. S. Teixeira. 2018. Resistance and accommodation: factors for the (non-) adoption of machine translation among professional translators. Perspectives, 26(3):301–321. https://doi.org/10.1080/0907676X.2017.1337210
EMT. 2009. Competences for professional translators, experts in multilingual and multimedia communication. Retrieved from https://ec.europa.eu/info/sites/info/files/emt_competences_translators_en.pdf
EU Council Presidency Translator. 2019. EU Council Presidency Translator. Retrieved from https://translate2018.eu/
Fiederer, Rebecca and Sharon O’Brien. 2009. Quality and Machine Translation: A realistic objective? JoSTrans, (11):52–74. Retrieved from http://www.jostrans.org/issue11/art_fiederer_obrien.pdf
Fulford, Heather. 2002. Freelance translators and machine translation: An investigation of perceptions, uptake, experience and training needs. In 6th European Association of Machine Translation, 117–122. Retrieved from http://www.mt-archive.info/EAMT-2002-Fulford.pdf
Gaspari, Federico. 2001. Teaching Machine Translation to Trainee Translators: A Survey of Their Knowledge and Opinions. In Proceedings of the MT Summit VIII, Santiago de Compostela, Spain, 35–44.
Gaspari, Federico, Hala Almaghout, and Stephen Doherty. 2015. A survey of machine translation competences: Insights for translation technology educators and practitioners. Perspectives, 23(3):333–358. https://doi.org/10.1080/0907676X.2014.979842
Görög, Attila. 2014. Dynamic Quality Framework: quantifying and benchmarking quality. Tradumàtica: tecnologies de la traducció, (12):443–454. https://doi.org/10.5565/rev/tradumatica.66
Higgs, Bronwyn, Michael Jay Polonsky, and Mary Hollick. 2005. Measuring expectations: forecast vs. ideal expectations. Does it really matter? Journal of Retailing and Consumer Services, 12(1):49–64. https://doi.org/10.1016/j.jretconser.2004.02.002
Klubička, Filip, Antonio Toral, and Víctor M. Sánchez-Cartagena. 2018. Quantitative fine-grained human evaluation of machine translation systems: a case study on English to Croatian. Machine Translation, 32(3):195–215.
Koskinen, Kaisa and Minna Ruokonen. 2017. Love letters or hate mail? Translators’ technology acceptance in the light of their emotional narratives. In Dorothy Kenny (ed.), Human issues in translation technology. Routledge, 26–42.
Lagoudaki, Elina. 2008. The value of machine translation for the professional translator. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas, 262–269.
Lommel, Arle, Hans Uszkoreit, and Aljoscha Burchardt. 2014. Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics. Tradumàtica: tecnologies de la traducció, (12):455–463.
Moorkens, Joss and Andy Way. 2016. Comparing Translator Acceptability of TM and SMT outputs. In Proceedings of the 19th Annual Conference of the European Association for Machine Translation, 141–151.
Olson, Jerry and Philip A. Dover. 1979. Disconfirmation of consumer expectations through product trial. Journal of Applied Psychology, 64(2):179–189. https://doi.org/10.1037/0021-9010.64.2.179
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, 311–318. https://doi.org/10.3115/1073083.1073135
Pym, Anthony. 2013. Translation skill-sets in a machine-translation age. Meta: Journal des traducteurs/Meta: Translators’ Journal, 58(3):487–503.
Reiss, Katharina and Hans J. Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Linguistische Arbeiten: Vol. 147. Tübingen, Max Niemeyer.
Specia, Lucia, Craig Saunders, Marco Turchi, Zhuoran Wang, and John Shawe-Taylor. 2009. Improving the confidence of machine translation quality estimates. In Proceedings of the Twelfth Machine Translation Summit (MT Summit XII), 136–143.
Sycz-Opoń, Joanna and Ksenia Gałuskina. 2017. Machine Translation in the Hands of Trainee Translators – an Empirical Study. Studies in Logic, Grammar and Rhetoric, 49(1):195–212. https://doi.org/10.1515/slgr-2017-0012
Way, Andy. 2018. Quality expectations of machine translation. Retrieved from http://arxiv.org/pdf/1803.08409v1