MACHINE TRANSLATION REPORT Q2/2021 - Memsource

Page created by Brett Banks
 
CONTENTS

INTRODUCTION
REPORT STRUCTURE
MEASURING QUALITY
DETERMINING DOMAINS
DATA
MT QUALITY BY LANGUAGE PAIR
LANGUAGE PAIR IN THE SPOTLIGHT
DOMAIN IN THE SPOTLIGHT
MEMSOURCE TRANSLATE
CONCLUSION

Introduction
Welcome to the Q2/2021 issue of Memsource’s Machine Translation Report.

Machine translation (MT) is one of the key technologies for localization and will only become
more important going forward, as MT systems become even more sophisticated and the demand
for localized content continues to grow.

At Memsource, our mission is to help our customers translate efficiently. We see MT as a truly
essential part of a translator’s toolbox, and so one of our main goals is to enable our customers
to utilize MT to its full potential.

With that in mind, we’ve developed features such as MT quality estimation and Memsource
Translate, which help users better deal with the sometimes unpredictable quality of MT output.

With this MT report, we hope to give you a general overview of how MT currently performs
across different language pairs and how this has changed since our last report, with spotlight
sections that will be especially interesting to those who translate into German or work with
customer-facing User Support content.

We hope that this report will help you make informed decisions about MT, and also see
how Memsource monitors and analyzes MT data to ensure that our customers always use
optimal-quality MT.

               Aleš Tamchyna
               AI Research and Development Manager
               at Memsource

                                                 FOR MORE INFORMATION VISIT MEMSOURCE.COM     2
Report Structure
The report is split into three main sections. Readers familiar with our reports can skip to
the data presented in sections 2 and 3.

01 METHODS
    a. Measuring Quality: how we measure machine translation quality.
    b. Domains: how we define and identify domains.
    c. Data: what kind of data was used to produce this report.

02 MT QUALITY BY LANGUAGE PAIR
    a. MT Quality by Language Pair: an overview of MT quality by language pair.
    b. Changes in Language Pair Quality: changes in recorded MT quality since our last report.

03 SPOTLIGHT SECTIONS
    a. Language Pair: engine performance by domain for English-German.
    b. Domain: engine performance for the User Support domain for a variety of language pairs.

Measuring Quality                                                                                01
Throughout this report we will be using post-editing data to measure the quality of MT
output. Post-editing is the process in which a professional linguist revises machine
translation output.

To determine the post-editing score (PE), we measure the post-editing effort, essentially the
percentage change of any given segment required to achieve the final translation. Throughout
this report it is expressed as a number between 0 and 100.

To illustrate this, a score of 100 for a given segment means that the MT output was accepted
without any change. A score of 0 means that the whole segment was comprehensively rewritten,
without any of the original being retained.

We use anonymized Memsource post-editing data taken from real workflows. This has the
advantage of providing us with a continuously growing dataset, but also directly reflects real
life use cases.
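
The report does not state the exact formula behind the PE score. As a rough sketch only, the idea of "percentage change of a segment" can be approximated with a character-level edit distance, normalized by the longer of the two strings. The function names and the choice of Levenshtein distance are assumptions made for this illustration, not Memsource's implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def post_editing_score(mt_output: str, final_translation: str) -> float:
    """Hypothetical PE score: 100 = accepted unchanged, 0 = fully rewritten."""
    longest = max(len(mt_output), len(final_translation))
    if longest == 0:
        return 100.0  # two empty segments: nothing had to change
    distance = edit_distance(mt_output, final_translation)
    return 100.0 * (1 - distance / longest)


print(post_editing_score("abc", "abc"))  # 100.0
print(post_editing_score("abc", "xyz"))  # 0.0
```

Under this approximation, an unchanged segment scores 100 and a segment with every character replaced scores 0, matching the two endpoints described above.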

Determining Domains                                                                             01
The type of content you translate with machine translation can significantly affect the
quality of the output. An engine that might perform well for product descriptions might
struggle with a medical report. The content type variable is known as a domain.

Categorizing documents into domains is normally a process that requires human input, which
creates complications when it comes to scaling translation and allows for basic human error.

In Memsource domains were created automatically using an AI-powered analysis of documents.
An unsupervised machine learning algorithm recognized 11 distinct types of documents that
shared similar sets of keywords. Based on the selection of keywords, the domains were then
named by Memsource engineers.

Here is a list of domains together with a few associated keywords:

DOMAIN                             KEYWORDS
1. Medical                         'study', 'patients', 'patient', 'treatment', 'dose', 'mg', 'clinical'
2. Travel and Hospitality          'km', 'hotel', 'guests', 'room', 'accommodation'
3. Business and Education          'team', 'business', 'work', 'school', 'students'
4. Legal and Finance               'agreement', 'company', 'contract', 'services', 'financial'
5. Software User Documentation     'click', 'select', 'data', 'text', 'view', 'file'
6. Consumer Electronics            'power', 'battery', 'switch', 'sensor', 'usb'
7. User Support                    'please', 'email', 'account', 'domain', 'contact'
8. Cloud Services                  'network', 'server', 'database', 'sql', 'data'
9. Industrial                      'mm', 'pressure', 'valve', 'machine', 'oil'
10. Software Development           'value', 'class', 'type', 'element', 'string'
11. Entertainment                  'game', 'like', 'get', 'love', 'play', 'go'

Data                                                                                       01
1) Data Selection & Volume

The report is based on data collected in Q4 2020.

As in our previous reports, the data for the spotlight sections was collected exclusively
using Memsource Translate. The data for the quality score matrix is taken from the Memsource
platform in general.

To gather precise MT quality results, we’ve filtered the translation segments based on the
following criteria:

      • The segment was either post-edited starting from raw MT output or MT was available,
        but the linguist decided to translate from scratch.

      • Segments which were translated using translation memory matches or non-translatable
        segments are excluded.

This filtering means that the data collected should closely reflect the required post-editing
effort; either MT was used and post-edited, or the linguist translated from scratch,
suggesting that the MT quality was too low for post-editing.
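
The two filtering criteria can be expressed as a simple predicate over segment records. The sketch below is illustrative only: the `Segment` class and all of its field names are hypothetical and do not describe Memsource's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Field names are hypothetical, chosen only for this illustration.
    text: str
    mt_available: bool      # an MT suggestion was offered for this segment
    tm_match_used: bool     # the translation came from a translation memory match
    non_translatable: bool  # e.g. numbers or codes that are copied unchanged


def keep_for_analysis(segment: Segment) -> bool:
    """Return True if the segment passes both filtering criteria."""
    if segment.tm_match_used or segment.non_translatable:
        return False
    # Whether the linguist post-edited the MT output or translated from
    # scratch, MT must at least have been available for the segment.
    return segment.mt_available


segments = [
    Segment("Click Save to continue.", True, False, False),
    Segment("Terms and conditions", True, True, False),
    Segment("12345", False, False, True),
    Segment("No MT was offered here.", False, False, False),
]
kept = [s.text for s in segments if keep_for_analysis(s)]
print(kept)  # ['Click Save to continue.']
```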

Specific details regarding how much data was used for each analysis are provided in each
spotlight section.

2) Domain Assignment

Every document was analyzed automatically and assigned a domain label. While the algo-
rithm might assign multiple domains to a document, for our analysis we only considered
the most prominent domain.
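
To make "most prominent domain" concrete, here is a deliberately simplified sketch that scores a document against a few of the keyword lists from this report and keeps only the top-scoring domain. Memsource describes its own method as an unsupervised machine learning algorithm, so plain keyword counting is an assumption made purely for illustration, as are the function names.

```python
import re
from collections import Counter
from typing import Optional

# Keywords copied from the domain table in this report (three domains only).
DOMAIN_KEYWORDS = {
    "Medical": {"study", "patients", "patient", "treatment", "dose", "mg", "clinical"},
    "Travel and Hospitality": {"km", "hotel", "guests", "room", "accommodation"},
    "User Support": {"please", "email", "account", "domain", "contact"},
}


def domain_scores(text: str) -> Counter:
    """Count keyword hits per domain; a document may match several domains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter({
        domain: sum(token in keywords for token in tokens)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    })


def most_prominent_domain(text: str) -> Optional[str]:
    """Keep only the domain with the most hits, or None if nothing matched."""
    domain, hits = domain_scores(text).most_common(1)[0]
    return domain if hits > 0 else None


email = "Please contact us by email if you cannot access your hotel account."
print(most_prominent_domain(email))  # User Support
```

The example text matches both Travel and Hospitality ('hotel') and User Support ('please', 'contact', 'email', 'account'), but only the label with the most hits is kept.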

MT Quality by Language Pair                                                               02
Language Pair overview

Here are the top ten most popular language pairs, listed in descending order from the most
popular:

RANK        MOST POPULAR LANGUAGE PAIRS IN Q4 2020
1           English - Japanese
2           English - Spanish
3           English - French
4           Japanese - English
5           English - Russian
6           English - German
7           English - Portuguese
8           English - Chinese
9           English - Italian
10          Dutch - English

These ten language pairs account for approximately half of all of the machine translation in
Memsource, with an additional 40 language pairs accounting for the remainder.

Most machine translation in Memsource starts with English. This reflects both the state of
MT technology and the localization needs of Memsource’s customers.

The target languages, however, are incredibly varied. Although most users are translating
from English, they are able to effectively leverage MT to communicate in over thirty
languages. Machine translation is an effective tool to reach new audiences.

What language are users translating from?                                               02
    English        68.9%
    Japanese        7.2%
    Lithuanian      3.9%
    Italian         2.5%
    Dutch           2.4%
    Swedish         2.3%
    Spanish         1.7%
    French          1.6%
    Other           8.1%

What languages are users translating to?

    English        24.5%
    Japanese        8.2%
    Spanish         7.5%
    French          7.4%
    Russian         7.3%
    German          5.3%
    Portuguese      4.2%
    Chinese         3.6%
    Swedish         2.9%
    Italian         2.8%
    Other          26.2%

Engine Overview                                                                           02
Many different engines can be used effectively for translation, with the Memsource platform
alone supporting over thirty different engines. Although the use of generic engines,
such as Google, Microsoft, or Amazon, is prevalent, a number of customers have used their
translation data to develop their own custom models.

The following graph illustrates which engines are most often chosen for translation:

Which engines are used in Memsource?

    Google           31.5%
    Microsoft        26.7%
    DeepL             7.7%
    Amazon            6.4%
    Mirai             4.8%
    Other engines    22.9%

Language Pair Scores                                                                                                 02
The following matrix shows the average post-editing score for some of the most popular
languages used in Memsource Translate. Over 34 million unique and manually post-edited
segments were used to create this analysis.

The scores represent an average across multiple domains and engines. For some specific
domains and with certain engines it is possible to achieve significantly better results.
For some language pairs there is not enough data to provide a reliable estimate; these are
marked with "--".

                      TARGET LANGUAGE
SOURCE LANGUAGE       Czech  German  English  Spanish  French  Italian  Japanese  Portuguese  Russian  Chinese (Simplified)
Czech                 --     67      76       --       --      62       --        --          --       --
German                61     --      78       76       69      74       --        --          --       --
English               63     70      --       78       74      73       69        76          65       70
Spanish               --     --      79       --       76      --       --        84          --       --
French                --     70      69       71       --      --       --        --          --       --
Italian               --     68      74       75       79      --       --        84          70       --
Japanese              --     --      67       --       --      --       --        --          --       53
Portuguese            --     --      78       78       --      --       --        --          --       --
Russian               --     --      74       --       --      --       --        --          --       --
Chinese (Simplified)  --     --      46       --       --      --       50        --          --       --

This table shows us that for most language pairs the quality of MT is quite high. This is
especially true for the most commonly used language pairs, with the average for English
source translations being approximately 71. For these language pairs MT can significantly
contribute to speeding up translation workflows.
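
The figure of approximately 71 can be checked against the English row of the matrix. The snippet below takes a plain unweighted mean of the nine English-source scores; whether the report weights the average by data volume is not stated, so the unweighted mean is an assumption.

```python
from statistics import mean

# English-source scores from the matrix above, keyed by target language.
english_source_scores = {
    "Czech": 63, "German": 70, "Spanish": 78, "French": 74, "Italian": 73,
    "Japanese": 69, "Portuguese": 76, "Russian": 65, "Chinese (Simplified)": 70,
}

average = mean(english_source_scores.values())
print(round(average, 1))  # 70.9
```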

Commentary                                                                                  02
There are some notable outliers. The lowest scores are found with Simplified Chinese
and Japanese source texts. This does not mean that MT cannot be viable when translating
from Chinese or Japanese. After all, for certain domains or specific engines real
performance can be significantly higher. It is likely that the quality gap could be bridged
more effectively if users started leveraging MT engines that were developed with these
source languages in mind.

On the other hand, MT shows very strong performance for language pairs where Portuguese
was either the source or target language. When translating from Spanish or Italian into
Portuguese, users can expect on average a score of 84, which in practical terms means that
many segments require little or no post-editing. These high scores are likely due to
linguistic similarities between these Romance languages.

Changes in score since the last report                                                                                02
This chart reflects the observed changes in output quality, measured with PE score,
between Q3 and Q4 2020. Please note that the data was rounded to the nearest integer, which
means that a score of 0 does not necessarily mean that there was no recorded change.
New language pairs that were added this quarter with no point of comparison are marked
“N/A”.

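                      TARGET LANGUAGE
SOURCE LANGUAGE       Czech  German  English  Spanish  French  Italian  Japanese  Portuguese  Russian  Chinese (Simplified)
Czech                 --     8       2        --       --      6        --        --          --       --
German                N/A    --      1        3        4       N/A      --        --          --       --
English               0      1       --       1        -1      0        0         -1          1        -2
Spanish               --     --      1        --       N/A     --       --        6           --       --
French                --     0       0        N/A      --      --       --        --          --       --
Italian               --     4       4        3        -1      --       --        0           N/A      --
Japanese              --     --      1        --       --      --       --        --          --       0
Portuguese            --     --      1        3        --      --       --        --          --       --
Russian               --     --      1        --       --      --       --        --          --       --
Chinese (Simplified)  --     --      1        --       --      --       1         --          --       --

-- : not enough data for a reliable estimate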

From this table we can observe that the quality of MT has increased for almost all language
pairs since the last report. This is to be expected, as MT providers gather new data and
retrain their engines for better performance. For some language pairs, notably where Czech,
German or Italian was the source language, the observed gains were quite significant.
Another positive trend was the near universal gains achieved in translation into English.

Language Pair in the Spotlight                                                           03
Our previous report featured a spotlight section on the English-French language pair. This
quarter, we will focus on the English-German language pair.

Memsource users translate a variety of different documents into German. This chart shows
relative data volumes of the individual domains in this language pair.

Language Spotlight: Overview of domains

    Industrial                     22.8%
    Business and Education         22.5%
    Entertainment                  15.6%
    Software User Documentation     8.7%
    User Support                    8.5%
    Legal and Finance               6.4%
    Travel and Hospitality          5.6%
    Consumer Electronics            5.5%
    Medical                         4.4%

Due to a lack of representative data, the Software Development and Cloud Services
domains were excluded from this and subsequent graphs.

Best engine performance by domain                                                                  03
[Bar chart: PE score (0 to 100) of the best performing engine per domain. Domains shown:
Business and Education, Consumer Electronics, Entertainment, Industrial, Legal and Finance,
Medical, Travel and Hospitality, User Support.]

The above graph shows us the best engine performance for each domain in the English-German
language pair.

For nearly all domains, the quality of the best performing engine was relatively high, with
only the Entertainment domain scoring less than 70. In many cases MT can be used with
little to no post-editing. This is especially true for Travel and Hospitality and Legal and
Finance domains.

Language Pair Spotlight: English to German                                                          03
[Bar chart: average PE score (0 to 100) per domain and engine. Domains shown: Business and
Education, Consumer Electronics, Entertainment, Industrial, Legal and Finance, Medical,
Travel and Hospitality, User Support. Engines: Amazon, Google, Microsoft, DeepL.]

This chart shows us the average quality of MT output for the four stock engines used in
Memsource for the English to German language pair, further divided into domains.
Sufficient performance data was not available for DeepL in the last two domains, so it
is omitted for those domains.

The chart shows that the output quality can vary quite significantly based on both domain
and engine. The difference between the best and worst performing engine was often quite
significant: for Industrial and Legal and Finance content, the difference was up to 11
points. Suboptimal engine choice can result in lower quality and higher post-editing
costs, which can easily be mitigated through careful choice of engines.

Another important observation is that there is no “perfect” engine that would consistently
translate better than its competitors. For the best possible results across all domains,
users should always use the best performing engine for the given domain.
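
The comparison described here, finding the best engine per domain and the gap to the worst one, is a small computation once average scores are available. In the sketch below the engine names are anonymous and every score is an invented placeholder; none of the numbers are taken from the chart.

```python
from typing import Dict, Tuple

# Placeholder scores for illustration only; they are NOT values from the chart.
# Engine names are deliberately anonymous.
scores_by_domain: Dict[str, Dict[str, float]] = {
    "Industrial": {"engine_a": 70.0, "engine_b": 75.0, "engine_c": 66.0, "engine_d": 77.0},
    "Entertainment": {"engine_a": 64.0, "engine_b": 61.0, "engine_c": 66.0, "engine_d": 63.0},
}


def best_engine_and_gap(engine_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the top-scoring engine and its lead over the lowest-scoring one."""
    best = max(engine_scores, key=engine_scores.get)
    worst = min(engine_scores, key=engine_scores.get)
    return best, engine_scores[best] - engine_scores[worst]


for domain, engine_scores in scores_by_domain.items():
    engine, gap = best_engine_and_gap(engine_scores)
    print(f"{domain}: best={engine}, gap to worst={gap:.0f} points")
```

With these placeholder numbers a different engine wins in each domain, which is the situation the paragraph above describes.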

Domain in the Spotlight                                                                                                                        03
In this issue our domain spotlight focuses on User Support content translated from English
into a number of different languages.

This domain is primarily associated with content that is created when communicating
directly with customers, usually with the intent to help them with a problem or inform them
of a change. Examples of user support content include support emails or help center articles.

Domain Spotlight: User Support

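[Bar chart: average PE score (0 to 100) per target language and engine for User Support
content translated from English. Target languages: Arabic, Czech, Danish, German, Spanish,
French, Italian, Korean, Dutch, Portuguese, Russian, Slovak, Swedish, Turkish, Simplified
Chinese. Engines: Amazon, Google, Microsoft.]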

Commentary                                                                               03
This chart shows us the average quality of MT output for the three most popular stock
engines used in Memsource, further subdivided by target language.

From the data we can see that machine translation performance can vary significantly due
to target language and engine choice. However, in most cases at least one engine was able
to provide translation that scored on average 70 or above.

However, the difference between the best and worst performing engines in some cases can
be significant. For Arabic, Czech, Russian, Turkish, and Simplified Chinese the difference
was over 20 points.

Memsource Translate
At Memsource, we have been exploring new ways for users to go further with MT. Our
platform currently supports over 30 different engines with our internal data showing
a steady rate of adoption and general quality improvements for most engines year on
year. To avoid the hassle of engine testing and implementation, most users tend to
rely on one engine exclusively, losing out on the gains achieved by other engines for
specific language pairs and content types.

We have found that in over 70% of all translation projects, a better performing
engine could have been used.

This is why we have developed Memsource Translate, an AI-powered machine translation
management solution that automatically selects the optimal MT engine for the user’s
content and language pair based on past performance data.

Testing MT engines is normally a costly and time-consuming process. For many users
keeping up to date with the latest engine developments is prohibitively expensive
and taking advantage of them impossible. Memsource Translate offers a unique solution
to this problem.

Automatic and data-driven engine selection saves time and money, by choosing the best
performing engine for every task. The algorithm will consider not just the language
pair, but also the type of content you are translating, all in real-time and using
continuously updated performance data. Memsource Translate is fully integrated with
existing workflows in Memsource, with features such as automatic domain detection
further reducing the need for human input.

Memsource Translate makes managing multiple engines convenient as customers can use,
track, and pay for multiple engines all in one place. Users can immediately start
translating with three fully managed engines, or add their own generic or customizable
engines.
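
The report does not describe how Memsource Translate is implemented. Purely as an illustration of selecting an engine from continuously updated performance data, the sketch below keeps a running mean PE score per language pair, domain, and engine, and returns the engine with the highest mean. The class, method names, engine names, and scores are all hypothetical.

```python
from collections import defaultdict
from typing import Optional


class EnginePerformanceTracker:
    """Keeps a running mean PE score per (language pair, domain, engine)."""

    def __init__(self) -> None:
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, language_pair: str, domain: str, engine: str, pe_score: float) -> None:
        """Add one newly observed post-editing score to the running totals."""
        key = (language_pair, domain, engine)
        self._totals[key] += pe_score
        self._counts[key] += 1

    def best_engine(self, language_pair: str, domain: str) -> Optional[str]:
        """Engine with the highest mean score so far, or None without data."""
        averages = {
            engine: self._totals[(pair, dom, engine)] / count
            for (pair, dom, engine), count in self._counts.items()
            if pair == language_pair and dom == domain
        }
        return max(averages, key=averages.get) if averages else None


tracker = EnginePerformanceTracker()
tracker.record("en-de", "User Support", "engine_a", 70.0)
tracker.record("en-de", "User Support", "engine_a", 74.0)
tracker.record("en-de", "User Support", "engine_b", 80.0)
tracker.record("en-de", "User Support", "engine_b", 70.0)
print(tracker.best_engine("en-de", "User Support"))  # engine_b
```

Because each new score only updates a total and a count, the recommendation can change as soon as fresh post-editing data arrives, without recomputing anything from scratch.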

Conclusion
In this report, we have seen how the performance of different machine translation
engines can vary across different language pairs and content types. We have seen that
machine translation can achieve impressive results with high quality outputs, but
this is not necessarily true in all cases. Most importantly, the choice of the engine
used for a specific domain or language pair can significantly impact the output quality.

As seen in our spotlight sections, relying on one engine alone is not enough to get the
best possible results.

The data found in this report represents a snapshot in time. Memsource Translate, our
engine management solution, monitors MT engine performance in real-time to recommend
the optimal engine for your content. If you would like to learn more about this
technology, contact us to try out Memsource Translate.

We look forward to seeing you again in our next MT Report in Q3 2021.

Memsource a.s.
Spalena 51, 110 00 Praha 1
Czech Republic

   twitter.com/memsource
   facebook.com/memsource
   cz.linkedin.com/company/memsource