User expectations towards machine translation: A case study


Barbara Heinisch
Centre for Translation Studies
University of Vienna
Austria
barbara.heinisch@univie.ac.at

Vesna Lušicky
Centre for Translation Studies
University of Vienna
Austria
vesna.lusicky@univie.ac.at

Abstract

Neural machine translation (NMT) systems have emerged as powerful platforms for providing fluent translations in a variety of languages and domains. The widespread adoption of NMT has heightened the need for studying the results and impact of these systems. Although acceptance of machine translation has been analyzed, the expectations of users towards NMT have not received much attention yet. This paper investigates the expectations of novice translators enrolled on a postgraduate program in specialized translation. In addition, it examines the confirmation or disconfirmation of expectations towards machine translation (MT) output among this user group. A three-step mixed-method approach was applied: a quantitative questionnaire and two recurrent (pre-trial and post-trial) evaluations of raw MT outputs. The evaluations consisted of the identification and classification of errors in NMT output according to the Multidimensional Quality Metrics. The respondents expected the MT output to be of rather low quality, but the quality of NMT output was not as high as the participants expected. Compared to the expected frequency of error types in the MT output, the reported frequency differed significantly. This paper argues that the users’ experience and expectations have an impact on the use and evaluation of machine translation.

1    Introduction

Language technology applications have become a ubiquitous service used by various user groups to overcome language barriers. While certain types of technology, such as translation memory systems, are specialized tools used by translators only, machine translation (MT) systems are also used by non-translators. If the exposure of MT users was somewhat limited to gist translation in the past, users are increasingly implementing MT in professional and other scenarios. The acceptance of MT tools and services is attested by the high number of users of generic online MT services (Way, 2018). Based on their prior experiences, users develop and form expectations towards MT.

Expectations are beliefs about attributes or performance of a product or service in the future (Olson et al., 1979). Users’ expectations may have an influence on the intended use and evaluation of MT. Expectations also provide the frame of reference for satisfaction (Higgs et al., 2005). Satisfaction with a service is crucial when introducing or evaluating MT. Expectations are dynamic constructs, a synergy of users’ pre-trial perceptions and beliefs about performance or attributes of a product or a service. Although there is some ambiguity regarding the definition and operationalization of expectations, the service quality literature differentiates several categories of expectations, most frequently: forecast, normative, ideal and minimum tolerable. The four categories cover different dimensions of expectations: forecast describes users’ perception of what will occur; normative describes users’ perception of what should occur; ideal describes the highest level attainable in a category; and minimum tolerable describes the minimum baseline for normative and ideal

 © 2019 The authors. This article is licensed under a Crea-
tive Commons 4.0 licence, no derivative works, attribution,
CC BY-ND.

Proceedings of MT Summit XVII, volume 2                                      Dublin, Aug. 19-23, 2019 | p. 42
(Higgs et al., 2005). Users’ expectations and the type of expectations depend on internal and external cues, such as users’ prior experience and information on products.

Users’ (quality) expectations towards MT output and resulting implications for MT use are an under-explored topic in MT research (Way, 2018). So far, expectations have been addressed in relation to the estimation of the quality of post-editing effort (Specia et al., 2009). Way (2018) gives an overview of what level of quality can be expected from MT. Existing research recognizes the critical role played by adoption (Cadwell et al., 2018) and acceptance of MT (Moorkens & Way, 2016; Koskinen & Ruokonen, 2017). Gaspari et al. (2015) also attempted to map the expectations, requirements and needs of the translation industry concerning translation quality and MT.

With the widespread application of neural machine translation (NMT) as the MT approach of choice in generic as well as specialized MT systems, the question of pre-trial user expectations should be addressed, especially user expectations based on previous use and information obtained on the service. They may have implications for the users’ intended purpose of MT use and their satisfaction with the service. The notion of expectations should also be considered in human evaluation of MT output: the types of expectations and a potential negative bias may influence the results of human evaluations of MT output.

There is a growing body of literature that recognizes the importance of quality assessment of MT output. For MT developers, scale and robustness are major concerns, but end-users are also interested in receiving good-enough or high-quality translations (Way, 2018). The concept of fitness-for-purpose of translation has been widely recognized, but the assessment methods vary in operationalization and theoretical framework. The quality of MT output is either assessed automatically or by humans. First, automatic evaluation is usually based on evaluation metrics such as BLEU (Papineni et al., 2002), NIST, WNMf or METEOR (Anastasiou & Gupta, 2011). Metrics such as BLEU compare the MT output string with a human translation which is seen as the “gold standard”. However, these metrics ignore the source sentence as a reference and the fact that there might be more than one correct translation (Way, 2018). Second, human evaluation (also) requires the use of evaluation criteria (Fiederer & O’Brien (2009) provide a brief overview of evaluation criteria). When comparing raw MT output with human translations, the purpose of MT, e.g. whether MT is used to get the gist of a text or for publication purposes, is usually not taken into account. Only the latter would usually require post-editing.

A series of error typologies has been developed to assess the quality of machine-translated content. The Multidimensional Quality Metrics (MQM) error typology (Lommel et al., 2014) has been increasingly used and expanded for the evaluation of NMT (Klubička et al., 2018). The MQM framework provides a comprehensive typology of quality issues. This error typology contains standardized names and definitions of errors and offers flexibility in the number of assessment layers and their granularity. The MQM issues are organized in eight major dimensions: accuracy, fluency, terminology, locale convention, style, verity, design, and internationalization (Lommel et al., 2014).

By design, the assessment of the quality of MT output is a post-trial evaluation and does not consider pre-trial expectations.

2    Research design and method

The research reported in this paper has several objectives. First, the research investigates the expectations of a group of postgraduate specialized translation students towards MT. This paper explores how previous experience with MT influences their expectations towards the overall quality of and error types found in MT output. Second, it seeks to examine the confirmation or disconfirmation of these expectations by an evaluation of two MT outputs.

This study contributes to research on expectations towards MT by demonstrating that experience and expectations influence the use of MT systems and the evaluation of MT output. We applied a mixed-method approach, combining a quantitative questionnaire with MT output evaluation, i.e. error identification, error classification and correction of MT output.

2.1    Questionnaire

A questionnaire consisting of three parts with closed and open questions was distributed among the user group. The first part was designed to ascertain the respondents’ translation experience, working languages (A, B and C language (AIIC, 2018)) and professional experience.

The second part of the questionnaire addressed the respondents’ prior experience in MT use, including the frequency of and reasons for MT use. The participants were asked to state whether they use MT for professional, study or private purposes, which MT systems they use and for which

types of text. This part also elicited information on the respondents’ forecast, normative and ideal expectations towards MT. The participants were asked to rank the quality-related issues they would expect in MT output by frequency, according to the MQM. All respondents had to state the most frequent errors they expect in MT output.

The third part of the questionnaire elicited information on the quality expectations and expected errors when using an MT system for two different texts. First, the students were asked to read the English source text. Afterwards they had to state their expectations towards the quality of the related MT output utilizing a five-point grade system (excellent, good, satisfactory, sufficient, useless). They had to rank the expected errors in the MT output according to the MQM. Second, they had to download a spreadsheet containing the MQM and TAUS Dynamic Quality Framework (DQF) (Görög, 2014). They compared the source and target text and identified (and corrected) errors in the MT output. Each error was assigned to an MQM error (sub)category and an error severity level on a five-point scale in the spreadsheet. The completed spreadsheets served as the basis for the third step, which consisted in ranking the error types found in the MT output according to their frequency. By using the TAUS DQF and MQM for the error identification and classification task, we could compare their expectations with the evaluation result.

The questionnaire was circulated in early 2019. A total of 79 students enrolled on a master’s program in translation and focusing on specialized translation were recruited for this study. Of these, 32 individuals were excluded from the study because English was not one of their working languages or because they did not complete all the tasks.

2.2    Evaluation of MT output

The objective of the participants’ evaluation of MT output in the third part of the questionnaire was to collect the error issues detected in raw MT output by the respondents. The evaluation was used for contrastive analysis of users’ expectations towards error issues in MT output and the actual errors detected. It helped analyze the confirmation or disconfirmation of expectations.

The quality of the raw MT output was evaluated by the students based on the MQM error typology and the TAUS DQF. Prior to evaluation, they were familiarized with both frameworks.

The students were given two English source texts and their German MT outputs. The MT outputs used for evaluation were excerpts from British newspaper articles on a topic related to Austria. They comprised about 200 words each and were translated from English to German with the EU Council Presidency Translator (2019) platform. The study participants were provided with the source texts and the raw MT output as well as the MQM and TAUS DQF spreadsheet for both texts. The sentences in German were evaluated at the segment level in accordance with the MQM.

3    Results

3.1    Profile of the respondents

Of the final cohort of 47 respondents, 8 already worked as professional translators and 39 were novice translators. The majority (68%) of the respondents worked with German as A language, ahead of Italian (11%) and Russian, Hungarian, Polish, English and French. More than half of the participants (60%) stated that English was their B language, with German, Russian, Croatian and Japanese being the B language of the remaining respondents. The C languages were quite diverse, ranging from English (38%), French, Spanish, Slovakian, Italian and German to Greek and Romanian. Six respondents stated that they do not work with a C language. When asked about their translation experience, the majority (79%) indicated that they had translated more than 15 texts during their studies. The 8 students (17%) who had already worked as professional translators were active in the fields of engineering, social sciences and humanities.

3.2    Experience in MT use

About 62% of the respondents already had experience in MT use. Almost all of them (93%) reported that they use MT as part of their studies. More than two-thirds (69%) indicated that they use MT for private purposes and 31% of the respondents for professional purposes. When asked about the frequency of MT use in a professional, private or study context, 41% of the students indicated that they use MT for study purposes on a weekly basis and the remainder several times a year (19%) or several times a month (15%). For private purposes, they reported using MT several times a year (31%), on a weekly basis (21%), on a daily basis (3%) or never (14%). For professional purposes, the respondents indicated that they never use MT (55%) or they use it several times a month (17%), on a daily basis (14%), several times a year or on a weekly basis (7% each).
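The participant figures reported in Sections 2.1 and 3.1 can be cross-checked with simple arithmetic. The following Python snippet is only a minimal consistency check on those figures; the variable and function names are illustrative assumptions and are not part of the study.

```python
# Consistency check on the cohort figures reported in Sections 2.1 and 3.1.
# All names are illustrative; only the numbers are taken from the text.

RECRUITED = 79      # students recruited for the study
EXCLUDED = 32       # excluded (no English working language or incomplete tasks)
PROFESSIONALS = 8   # respondents already working as professional translators


def percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


cohort = RECRUITED - EXCLUDED        # size of the final cohort
novices = cohort - PROFESSIONALS     # respondents without professional work

print(f"final cohort: {cohort}")
print(f"novice translators: {novices}")
print(f"professional share: {percent(PROFESSIONALS, cohort)}%")
```

The check reproduces the final cohort of 47 respondents, the 39 novice translators and the 17% share of respondents with professional experience.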

Those experienced in MT use mostly translated documents, e.g. reports or files (79%), ahead of websites (34%) or correspondence, e.g. e-mails (24%). Most of them reported that they use MT for translations from German into English and vice versa. They listed DeepL (69%) and Google Translate (59%) when asked about the MT system of choice. Another system mentioned was eTranslation. Among the MT systems which the respondents had already tested but did not use frequently were Google Translate, the Facebook translator, Bing, Yandex and Babel.

The reasons for using MT included saving time (69%), getting the gist of a text (66%), consulting a reference (55%), avoiding repetitive work (31%), avoiding typing (21%) and avoiding research (3%).

3.3    Expectations towards MT quality

The participants expected MT to provide a raw translation, i.e. a first draft they can post-edit (53%) or a gist translation (38%) when using MT for study purposes. Only 5 respondents (11%) would want MT to provide immediately usable translations in a study context. For professional and private purposes, 21 respondents (45%) expected MT to produce texts which can be used immediately without post-editing, i.e. they expected a final translation. For professional purposes, 15 respondents (32%) reported that they would use MT output as a draft translation. For private purposes, 24 respondents (51%) would use MT output only as a gist translation. This means that draft translations were more important in a study context, whereas gist purposes (to understand the meaning of the text) and final translations were more relevant in a private context.

When asked to rank their general expectations towards working with an MT system, 81% of the respondents ranked fast translation first. Proper functioning and intuitive use of the MT system ranked second among 60% of the respondents, whereas intuitive use still ranked third among 28% of the respondents. On ranks 4 to 6 the respondents predominantly listed translation of different file formats, status feedback and accessibility of the MT system.

In response to the question about the expected quality-related issues in MT output, nearly a third (30%) of those surveyed ranked accuracy first while nearly one quarter (23%) ranked fluency first. Just over a third of those who responded ranked accuracy second, while approximately a fifth (21%) ranked fluency second. Terminology (30%) and style (23%) were the two main aspects on the third rank while locale conventions and style (23% each) had the highest number of responses on the fourth rank. Design and verity were mentioned predominantly on ranks 6 and 7.

3.4    Expectations towards error types and their (dis)confirmation

After having read the first source text (ST1), the respondents rated the expected quality of the related MT output (O1) with a grade ranging from excellent to useless. Almost half (49%) of the respondents expected the quality of the MT output to be sufficient, while 40% of those surveyed expected satisfactory MT output. Only a small number of the participants expected good quality (4%) or useless translations (6%). After having read O1 and after having identified, categorized and corrected the errors in the raw MT output, the participants rated the quality of O1 as follows: sufficient (40%), useless (28%), satisfactory (23%) and good (9%). Thus, the number of useless grades increased significantly while the number of satisfactory and sufficient grades decreased.

The expected errors and their frequency in O1 were primarily related to fluency (38% on the first rank), accuracy (28% on the first rank, 32% on the second rank), style (23% on the second rank) and terminology (21% on the third rank). In the errors actually reported, accuracy errors increased and fluency and verity errors decreased on rank 1 relative to these expectations, while fluency errors increased and accuracy and terminology errors decreased on rank 2. Style errors increased slightly on rank 3 while locale convention errors increased on rank 4.

For the second source text (ST2), the students predominantly expected the MT output (O2) to be of sufficient quality (55%) or useless (26%). The other students reported that O2 would have satisfactory (13%) or good quality (6%). Compared to their expectations, they rated the actual translation to be of lower quality. The participants stated that O2 was useless (36%) or of sufficient quality (49%). This demonstrates that they expected the MT output to be of higher quality than later reported.

When asked about the expected error types in O2, well over half (64%) of the respondents ranked accuracy errors first and more than half (57%) ranked fluency errors second. Well under half of those surveyed (40%) ranked style errors third. After completing the MQM table, there was a significant increase in fluency errors and decrease of accuracy errors on rank 1 as well as a

significant increase in fluency errors on rank 2 and a slight increase in terminology errors. On rank 3, the students reported a higher number of accuracy errors and a smaller number of locale convention errors than expected.

Thus, both accuracy and fluency were the MQM error categories listed the most in all analyzed areas, i.e. the overall quality of MT output, the expected error types and the error types found. However, the data showed a slight shift of the accuracy and fluency categories between the expected and actual error types in both texts.

In summary, the majority of the participants expected the MT output to be of sufficient or inferior quality. The translations of both texts partly failed to meet their expectations, since the participants rated the MT output higher before the evaluation than after it.

There was a disconfirmation of the respondents’ expectations towards the error types in MT output. For O1, the participants expected a higher frequency of fluency errors (on the first rank) before the evaluation. However, they reported a higher frequency of accuracy errors after the evaluation (62% on the first rank).

The expected error types in O2 mentioned by the students may be influenced by the outcome of the analysis of the error issues found in O1. As mentioned before, the majority of the errors reported after the analysis of O1 were related to accuracy (62% on the first rank). This is also reflected in the expected error issues reported for O2. Here, accuracy errors were expected by 64% of the respondents (on the first rank). There was a higher confirmation of their expectations towards the translation quality of O2. For O2, there were slightly more fluency and fewer accuracy errors (on the first rank) reported than expected.

This demonstrates that the participants in this study have rather low expectations towards the quality of the MT output. These expectations have been partly met, since the quality of both target texts translated with the MT system was reported to be lower than expected. This might also be the reason why the participants expected the second text to be of a slightly lower quality than the first one. This also means that there was a minor discrepancy between the pre-trial expectations and the errors found by the participants during evaluation. Moreover, this user group expected a higher frequency of some error types compared with the reported post-trial frequency.

4    Discussion

We focused on postgraduate translation students due to the documented competence profile of this user group. Their competence profile included translation, technological and revision competence (EMT, 2009). Therefore, we assumed that the students had a basic knowledge of MT systems, their advantages and disadvantages as well as post-editing. It was necessary to familiarize them with the rather complex MQM framework, which required a certain amount of time.

Although this study is limited to a small number of participants, one NMT engine, the text type newspaper article and a certain language pair and direction, it revealed that participants use MT regularly or have used it at least once, especially freely available systems. DeepL was the most frequently used system among the translation students, ahead of Google Translate. We also saw that the users’ previous experience with MT systems has an impact on future expectations towards similar systems. This is in accordance with Anastasiou & Gupta (2011), who assume that freely available, easily accessible MT which produces good-enough quality translations continues to be the MT system of choice for casual users who wish to translate websites or use MT for private purposes.

The expectations towards working with the MT system among the analyzed user group were that the system should work fast, function properly and be intuitive to use.

The majority of the respondents had considerable experience of MT use for study or private purposes. Almost half of the students (45% each) reported that they expect MT output, in professional and private contexts, to be usable immediately without any further editing. However, when they used MT as part of their studies, more than half of the respondents expected a raw translation they can post-edit rather than an immediately usable translation. Gist translations were more important in a private context. A possible explanation for this might be that the majority had already used MT output as a draft translation they post-edited. Based on our experience, translation students aim to produce high-quality translations. Therefore, they adapt the MT output to meet their ideal expectations. For private purposes, however, they seem to use MT output not as a pre-translation they can work on, but for languages they might not understand. Here, it might be more important to get the gist of the text rather than high accuracy and fluency. Thus, their expectations fall into the category of minimum tolerable. This finding

seems to be consistent with other research which found a dance of agency (Cadwell et al., 2018).

One interesting finding is that the students expected the MT output to be of rather low quality although they had used (general-purpose) MT before. This finding is contrary to previous studies which have suggested that those students expressing higher skepticism towards MT had the least exposure to it (Fulford, 2002) and that a negative attitude towards MT seems to be related to a lack of knowledge and (practical) experience (Gaspari, 2001). However, these studies focused on the students’ opinions or attitudes, whereas this study addressed their previous MT experience in relation to their expectations as well as the confirmation or disconfirmation of their expectations. A possible explanation for the rather low expectations towards the quality of MT output is that students may be aware of the limitations of MT systems since they use them in their studies.

When we asked the students about their expectations towards the MT output quality, accuracy and fluency were ranked high. This suggests that, for them, accuracy and fluency constituted translation quality. This finding was also reported by another study, where translators expected an MT engine to suggest correct translations, which may refer to correct target-language syntax as well as grammar and semantic equivalence to the source text (Lagoudaki, 2008).

With a small sample size and a focus on translation students (and not professional translators), caution must be applied, as the findings might not be generalizable to other user groups. However, MT-related tasks require competences other than those in the traditional profile of professional translators and in addition to those acquired in translator training (Pym, 2013). Professional translators may also have limited practical exposure to MT and post-editing (Blagodarna, 2018). In addition, a major issue with conceptualizing expectations is the sources of information or lack thereof used to form expectations: marketing communication by developers, mass media, training settings, word-of-mouth referrals, and prior experience with similar products. Service quality is not static but should be considered as a dynamic process (Boulding et al., 1993). Therefore, this study can only provide a small insight into user expectations of translation students at a certain point in time. In addition, students may not have identified all errors in the raw MT output. They may also lack critical evaluation of the MT output
Gałuskina, 2017). Moreover, our analysis does not take account of intra-annotator or inter-annotator agreement when identifying and categorizing the errors of the MT output.

The aim of (neural) MT is to reach the fluency of human translations (Way, 2018). However, accuracy, e.g. whether the MT output imparts the meaning of the source text, seems to be a major concern of translation students for the texts analyzed in this study. NMT engines provide fluent and easily readable translations. However, these fluent translations may mislead users into thinking that the content is translated correctly, although the message may be completely wrong.

5    Conclusion

Translation should fulfil a specific purpose for the intended recipient in a certain context (Reiss & Vermeer, 1984). Therefore, this paper highlights the importance of paying attention to user expectations and not only to MT (quality) evaluation (by users). This article attempts to show that user expectations are crucial in translation, including processes in MT, since they may help predict user interventions, such as pre- and post-editing. This paper argues that users’ past experiences, expectations and (dis)confirmation of expectations frame human evaluation of MT. Therefore, users’ expectations should be factored in when introducing MT services and novel approaches to MT.

Acknowledgment

This work has been partly funded by the European Union’s Connecting Europe Facility under grant agreement no. INEA/CEF/ICT/A2016/1297953.

References

AIIC. 2018. Working languages. Retrieved from https://aiic.net/page/4004/what-are-working-languages-to-a-conference-interpreter/lang/1

Anastasiou, Dimitra and Rajat Gupta. 2011. Comparison of crowdsourcing translation with Machine Translation. Journal of Information Science, 37(6):637–659. https://doi.org/10.1177/0165551511418760

Blagodarna, Olena. 2018. Insights into post-editors’ profiles and post-editing practices. Tradumàtica: tecnologies de la traducció, (16):35–51.

Boulding, William, Ajay Kalra, Richard Staelin, and
and would rather search for errors that human            Valarie A. Zeithaml. 1993. A Dynamic Process
translators usually make (Sycz-Opoń &                    Model of Service Quality: From Expectations to

   Behavioral Intentions. Journal of Marketing Research, 30(1):7–27.
   https://doi.org/10.2307/3172510

Cadwell, Patrick, Sharon O’Brien, and Carlos C. S. Teixeira. 2018. Resistance
   and accommodation: factors for the (non-) adoption of machine translation
   among professional translators. Perspectives, 26(3):301–321.
   https://doi.org/10.1080/0907676X.2017.1337210

EMT. 2009. Competences for professional translators, experts in multilingual
   and multimedia communication. Retrieved from
   https://ec.europa.eu/info/sites/info/files/emt_competences_translators_en.pdf

EU Council Presidency Translator. 2019. EU Council Presidency Translator.
   Retrieved from https://translate2018.eu/

Fiederer, Rebecca and Sharon O’Brien. 2009. Quality and Machine Translation:
   A realistic objective? JoSTrans. (11):52–74. Retrieved from
   http://www.jostrans.org/issue11/art_fiederer_obrien.pdf

Fulford, Heather. 2002. Freelance translators and machine translation: An
   investigation of perceptions, uptake, experience and training needs. In 6th
   European Association of Machine Translation, 117–122. Retrieved from
   http://www.mt-archive.info/EAMT-2002-Fulford.pdf

Gaspari, Federico. 2001. Teaching Machine Translation to Trainee Translators:
   A Survey of Their Knowledge and Opinions. In Proceedings of the MT Summit
   VIII, Santiago de Compostela, Spain, 35–44.

Gaspari, Federico, Hala Almaghout, and Stephen Doherty. 2015. A survey of
   machine translation competences: Insights for translation technology
   educators and practitioners. Perspectives, 23(3):333–358.
   https://doi.org/10.1080/0907676X.2014.979842

Görög, Attila. 2014. Dynamic Quality Framework: quantifying and benchmarking
   quality. Tradumàtica: tecnologies de la traducció, (12):443–454.
   https://doi.org/10.5565/rev/tradumatica.66

Higgs, Brownyn, Michael Jay Polonsky, and Mary Hollick. 2005. Measuring
   expectations: forecast vs. ideal expectations. Does it really matter?
   Journal of Retailing and Consumer Services, 12(1):49–64.
   https://doi.org/10.1016/j.jretconser.2004.02.002

Klubička, Filip, Antonio Toral, and Víctor M. Sánchez-Cartagena. 2018.
   Quantitative fine-grained human evaluation of machine translation systems:
   a case study on English to Croatian. Machine Translation, 32(3):195–215.

Koskinen, Kaisa and Minna Ruokonen. 2017. Love letters or hate mail?
   Translators’ technology acceptance in the light of their emotional
   narratives. In Dorothy Kenny (ed.). Human issues in translation technology,
   Routledge, 26–42.

Lagoudaki, Elina. 2008. The value of machine translation for the professional
   translator. Proceedings of the 8th Conference of the Association for
   Machine Translation in the Americas, 262–269.

Lommel, Arle, Hans Uszkoreit, and Aljoscha Burchardt. 2014. Multidimensional
   quality metrics (MQM): A framework for declaring and describing translation
   quality metrics. Tradumàtica: tecnologies de la traducció, (12):455–463.

Moorkens, Joss and Andy Way. 2016. Comparing Translator Acceptability of TM
   and SMT outputs. In Proceedings of the 19th Annual Conference of the
   European Association for Machine Translation, 141–151.

Olson, Jerry and Philip A. Dover. 1979. Disconfirmation of consumer
   expectations through product trial. Journal of Applied Psychology
   (64):179–189. https://doi.org/10.1037/0021-9010.64.2.179

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A
   Method for Automatic Evaluation of Machine Translation. In Proceedings of
   the 40th Annual Meeting of the Association for Computational Linguistics
   (ACL), Philadelphia, 311–318. https://doi.org/10.3115/1073083.1073135

Pym, Anthony. 2013. Translation skill-sets in a machine-translation age. Meta:
   Journal des traducteurs/Meta: Translators’ Journal, 58(3):487–503.

Reiss, Katharina and Hans J. Vermeer. 1984. Grundlegung einer allgemeinen
   Translationstheorie. Linguistische Arbeiten: Vol. 147. Tübingen, Max
   Niemeyer.

Specia, Lucia, Craig Saunders, Marco Turchi, Zhuoran Wang, and John
   Shawe-Taylor. 2009. Improving the confidence of machine translation quality
   estimates. Proceedings of the Twelfth Machine Translation Summit (MT Summit
   XII), 136–143.

Sycz-Opoń, Joanna and Ksenia Gałuskina. 2017. Machine Translation in the Hands
   of Trainee Translators – an Empirical Study. Studies in Logic, Grammar and
   Rhetoric, 49(1):195–212. https://doi.org/10.1515/slgr-2017-0012

Way, Andy. 2018. Quality expectations of machine translation. Retrieved from
   http://arxiv.org/pdf/1803.08409v1

Proceedings of MT Summit XVII, volume 2                                   Dublin, Aug. 19-23, 2019 | p. 48