MACHINE TRANSLATION REPORT Q2/2021
CONTENTS
Introduction
Report Structure
Measuring Quality
Determining Domains
Data
MT Quality by Language Pair
Language Pair in the Spotlight
Domain in the Spotlight
Memsource Translate
Conclusion
Introduction

Welcome to the Q2/2021 issue of Memsource's Machine Translation Report.

Machine translation (MT) is one of the key technologies for localization and will only become more important going forward, as MT systems become even more sophisticated and the demand for localized content continues to grow.

At Memsource, our mission is to help our customers translate efficiently. We see MT as a truly essential part of a translator's toolbox, and so one of our main goals is to enable our customers to utilize MT to its full potential. With that in mind, we've developed features such as MT quality estimation and Memsource Translate, which help users better deal with the sometimes unpredictable quality of MT output.

With this MT report, we hope to give you a general overview of how MT currently performs across different language pairs and how this has changed since our last report, with spotlight sections that will be especially interesting to those who translate into German or work with customer-facing User Support content. We hope that this report will help you make informed decisions about MT, and also show how Memsource monitors and analyzes MT data to ensure that our customers always use optimal-quality MT.

Aleš Tamchyna
AI Research and Development Manager at Memsource
Report Structure

The report is split into three main sections. Readers familiar with our reports can skip to the data presented in sections 2 and 3.

01 METHODS
a. Measuring Quality: how we measure machine translation quality.
b. Domains: how we define and identify domains.
c. Data: what kind of data was used to produce this report.

02 MT QUALITY BY LANGUAGE PAIR
a. MT Quality by Language Pair: an overview of MT quality by language pair.
b. Changes in Language Pair Quality: changes in recorded MT quality since our last report.

03 SPOTLIGHT SECTIONS
a. Language Pair: engine performance by domain for English-German.
b. Domain: engine performance for the User Support domain for a variety of language pairs.
Measuring Quality

Throughout this report we will be using post-editing data to measure the quality of MT output. Post-editing is the process of adapting machine translation output by a professional linguist.

To determine the post-editing score (PE), we measure the post-editing effort, essentially the percentage change of any given segment required to achieve the final translation. Throughout this report it is expressed as a number between 0 and 100.

To illustrate this, a score of 100 for a given segment means that the MT output was accepted without any change. A score of 0 means that the whole segment was comprehensively rewritten, without any of the original being retained.

We use anonymized Memsource post-editing data taken from real workflows. This has the advantage of providing us with a continuously growing dataset, and it also directly reflects real-life use cases.
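The report does not give the exact formula behind the PE score, only that it reflects the percentage of a segment that had to change. The following minimal sketch assumes a character-level similarity measure as one plausible way to derive such a score; the normalization is illustrative only and is not Memsource's actual method.

```python
# A minimal PE-score sketch, assuming the score is derived from character-level
# similarity between the raw MT output and the final post-edited translation.
import difflib

def pe_score(mt_output: str, post_edited: str) -> float:
    """Return a 0-100 score: 100 = MT accepted unchanged, 0 = fully rewritten."""
    if not mt_output and not post_edited:
        return 100.0
    # Ratio of matching characters between the two strings (0.0 to 1.0).
    similarity = difflib.SequenceMatcher(None, mt_output, post_edited).ratio()
    return round(100 * similarity, 1)

print(pe_score("The cat sat on the mat.", "The cat sat on the mat."))   # 100.0
print(pe_score("The cat sat on the mat.", "A dog slept under a tree.")) # low score
```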
Determining Domains

The type of content you translate with machine translation can significantly affect the quality of the output. An engine that might perform well for product descriptions might struggle with a medical report. The content type variable is known as a domain.

Categorizing documents into domains is normally a process that requires human input, which creates complications when it comes to scaling translation and allows for basic human error.

In Memsource, domains were created automatically using an AI-powered analysis of documents. An unsupervised machine learning algorithm recognized 11 distinct types of documents that shared similar sets of keywords. Based on the selection of keywords, the domains were then named by Memsource engineers (a sketch of this kind of clustering follows the list below).

Here is a list of domains together with a few associated keywords:

DOMAIN: KEYWORDS
1. Medical: 'study', 'patients', 'patient', 'treatment', 'dose', 'mg', 'clinical'
2. Travel and Hospitality: 'km', 'hotel', 'guests', 'room', 'accommodation'
3. Business and Education: 'team', 'business', 'work', 'school', 'students'
4. Legal and Finance: 'agreement', 'company', 'contract', 'services', 'financial'
5. Software User Documentation: 'click', 'select', 'data', 'text', 'view', 'file'
6. Consumer Electronics: 'power', 'battery', 'switch', 'sensor', 'usb'
7. User Support: 'please', 'email', 'account', 'domain', 'contact'
8. Cloud Services: 'network', 'server', 'database', 'sql', 'data'
9. Industrial: 'mm', 'pressure', 'valve', 'machine', 'oil'
10. Software Development: 'value', 'class', 'type', 'element', 'string'
11. Entertainment: 'game', 'like', 'get', 'love', 'play', 'go'
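The report does not name the specific algorithm used. The sketch below assumes a TF-IDF + k-means pipeline as one common unsupervised approach to grouping documents by shared keywords; the toy corpus and parameter choices are hypothetical.

```python
# A sketch of unsupervised domain discovery, assuming TF-IDF features and
# k-means clustering; this is not Memsource's actual algorithm.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "Patients received a 50 mg dose during the clinical study.",
    "The hotel offers guests a room with breakfast and free parking.",
    "Click Select All, then choose the file you want to view.",
    # ... in practice, a large corpus of documents from real projects
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# The report found 11 domains; the toy corpus here is smaller, so cap the count.
n_clusters = min(11, X.shape[0])
kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(X)

# Print the top keywords per cluster; engineers would then name each domain.
terms = vectorizer.get_feature_names_out()
for i, center in enumerate(kmeans.cluster_centers_):
    top = center.argsort()[::-1][:7]
    print(f"Domain {i}: {[terms[t] for t in top]}")
```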
Data

1) Data Selection & Volume

The report is based on data collected in Q4 2020. As in our previous reports, the data for the spotlight sections was collected exclusively using Memsource Translate. The data for the quality score matrix is taken from the Memsource platform in general.

To gather precise MT quality results, we've filtered the translation segments based on the following criteria (a sketch of this filtering is shown at the end of this section):

• The segment was either post-edited starting from raw MT output, or MT was available but the linguist decided to translate from scratch.
• Segments which were translated using translation memory matches, as well as non-translatable segments, are excluded.

This filtering means that the data collected should closely reflect the required post-editing effort; either MT was used and post-edited, or the linguist translated from scratch, suggesting that the MT quality was too low for post-editing. Specific details regarding how much data was used for each analysis are featured in each spotlight section.

2) Domain Assignment

Every document was analyzed automatically and assigned a domain label. While the algorithm might assign multiple domains to a document, for our analysis we only considered the most prominent domain.
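As a rough illustration of the filtering criteria above, the sketch below uses hypothetical segment fields ("origin", "is_non_translatable"); the actual Memsource data model is not described in the report.

```python
# A sketch of the segment filter described in the Data section; field names
# are made up for illustration.
segments = [
    {"origin": "mt_post_edited", "is_non_translatable": False},
    {"origin": "translated_from_scratch_with_mt_available", "is_non_translatable": False},
    {"origin": "translation_memory_match", "is_non_translatable": False},
    {"origin": "mt_post_edited", "is_non_translatable": True},
]

KEPT_ORIGINS = {"mt_post_edited", "translated_from_scratch_with_mt_available"}

def keep_for_analysis(segment: dict) -> bool:
    """Keep segments where MT was post-edited or deliberately discarded;
    drop TM matches and non-translatable segments."""
    return segment["origin"] in KEPT_ORIGINS and not segment["is_non_translatable"]

filtered = [s for s in segments if keep_for_analysis(s)]
print(len(filtered))  # 2 of the 4 example segments pass the filter
```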
MT Quality by Language Pair

Language Pair Overview

Here are the top ten most popular language pairs, listed in descending order from the most popular:

MOST POPULAR LANGUAGE PAIRS IN Q4 2020
1. English - Japanese
2. English - Spanish
3. English - French
4. Japanese - English
5. English - Russian
6. English - German
7. English - Portuguese
8. English - Chinese
9. English - Italian
10. Dutch - English

These ten language pairs account for approximately half of all of the machine translation in Memsource, with an additional 40 language pairs accounting for the remainder. Most machine translation in Memsource starts with English. This reflects both the state of MT technology and the localization needs of Memsource's customers.

The target languages are, however, incredibly varied. Although most users are translating from English, they are able to effectively leverage MT to communicate in over thirty languages. Machine translation is an effective tool to reach new audiences.
What languages are users translating from?
English 68.9%, Japanese 7.2%, Lithuanian 3.9%, Italian 2.5%, Dutch 2.4%, Swedish 2.3%, Spanish 1.7%, French 1.6%, Other 8.1%

What languages are users translating to?
English 24.5%, Japanese 8.2%, Spanish 7.5%, French 7.4%, Russian 7.3%, German 5.3%, Portuguese 4.2%, Chinese 3.6%, Swedish 2.9%, Italian 2.8%, Other 26.2%
Engine Overview

Many different engines can be used effectively for translation, with the Memsource platform alone supporting over thirty different engines. Although the use of generic engines such as Google, Microsoft, or Amazon is prevalent, a number of customers have used their translation data to develop their own custom models. The following breakdown shows which engines are most often chosen for translation:

Which engines are used in Memsource?
Google engines 31.5%, Microsoft 26.7%, DeepL 7.7%, Amazon 6.4%, Mirai 4.8%, Other 22.9%
Language Pair Scores

The following matrix shows the average post-editing score for some of the most popular languages used in Memsource Translate. Over 34 million unique and manually post-edited segments were used to create this analysis (a sketch of this aggregation follows the matrix below).

The scores represent an average across multiple domains and engines. For some specific domains and with certain engines it is possible to achieve significantly better results. For some language pairs there is not enough data to provide a reliable estimate; those pairs are omitted from the listing below.

Target languages covered: Czech, German, English, Spanish, French, Italian, Japanese, Portuguese, Russian, Chinese (Simplified).

SOURCE LANGUAGE: RECORDED SCORES
Czech: 67, 76, 62
German: 61, 78, 76, 69, 74
English: 63 (Czech), 70 (German), 78 (Spanish), 74 (French), 73 (Italian), 69 (Japanese), 76 (Portuguese), 65 (Russian), 70 (Chinese, Simplified)
Spanish: 79, 76, 84
French: 70, 69, 71
Italian: 68, 74, 75, 79, 84, 70
Japanese: 67, 53
Portuguese: 78, 78
Russian: 74
Chinese (Simplified): 46, 50

This table shows us that for most language pairs the quality of MT is quite high. This is especially true for the most commonly used language pairs, with the average for English source translations being approximately 71. For these language pairs MT can significantly contribute to speeding up translation workflows.
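The report does not describe the aggregation pipeline itself. The sketch below assumes per-segment PE scores in a pandas DataFrame with hypothetical column names and shows one straightforward way to produce such a source-by-target matrix.

```python
# A sketch of aggregating per-segment PE scores into a language-pair matrix;
# column names and the sample data are hypothetical.
import pandas as pd

segments = pd.DataFrame([
    {"source_lang": "en", "target_lang": "de", "pe_score": 72},
    {"source_lang": "en", "target_lang": "de", "pe_score": 68},
    {"source_lang": "es", "target_lang": "pt", "pe_score": 85},
    {"source_lang": "ja", "target_lang": "en", "pe_score": 66},
])

# Average PE score per language pair, pivoted into a source x target matrix.
matrix = (
    segments
    .groupby(["source_lang", "target_lang"])["pe_score"]
    .mean()
    .round()
    .unstack("target_lang")
)
print(matrix)
```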
Commentary

There are some notable outliers. The lowest scores are found with Simplified Chinese and Japanese source texts. This does not mean that MT cannot be viable when translating from Chinese or Japanese. After all, for certain domains or specific engines real performance can be significantly higher. It is likely that the quality gap could be bridged more effectively if users started leveraging MT engines that were developed with these source languages in mind.

On the other hand, MT shows very strong performance for language pairs where Portuguese was either the source or target language. When translating from Spanish or Italian into Portuguese, users can expect on average a score of 84, which in practical terms can bring segments that require little or no post-editing. These high scores are caused by linguistic similarities between these Romance languages.
Changes in Score Since the Last Report

This table reflects the observed changes in output quality, measured with the PE score, between Q3 and Q4 2020. Please note that the data was rounded to the nearest integer, which means that a score of 0 does not necessarily mean that there was no recorded change. New language pairs that were added this quarter with no point of comparison are marked "N/A".

Target languages covered: Czech, German, English, Spanish, French, Italian, Japanese, Portuguese, Russian, Chinese (Simplified).

SOURCE LANGUAGE: RECORDED CHANGES
Czech: 8, 2, 6
German: N/A, 1, 3, 4, N/A
English: 0, 1, 1, -1, 0, 0, -1, 1, -2
Spanish: 1, N/A, 6
French: 0, 0, N/A
Italian: 4, 4, 3, -1, 0, N/A
Japanese: 1, 0
Portuguese: 1, 3
Russian: 1
Chinese (Simplified): 1, 1

From this table we can observe that the quality of MT has increased for almost all language pairs since the last report. This is to be expected, as MT providers gather new data and retrain their engines for better performance. For some language pairs, notably where Czech, German, or Italian was the source language, the observed gains were quite significant. Another positive trend was the near universal gains achieved in translation into English.
Language Pair in the Spotlight

Our previous report featured a spotlight section on the English-French language pair. This quarter, we will focus on the English-German language pair. Memsource users translate a variety of different documents into German. The breakdown below shows the relative data volumes of the individual domains in this language pair.

Language Spotlight: Overview of Domains
Industrial 22.8%, Business and Education 22.5%, Entertainment 15.6%, Software User Documentation 8.7%, User Support 8.5%, Legal and Finance 6.4%, Travel and Hospitality 5.6%, Consumer Electronics 5.5%, Medical 4.4%

Due to a lack of representative data, the Software Development and Cloud Services domains were excluded from this and subsequent analyses.
Best Engine Performance by Domain

[Chart: best engine PE score for each domain in the English-German language pair; domains: Business and Education, Consumer Electronics, Entertainment, Industrial, Legal and Finance, Medical, Travel and Hospitality, User Support; score scale 0-100]

The chart shows the best engine performance for each domain in the English-German language pair.

For nearly all domains, the quality of the best performing engine was relatively high, with only the Entertainment domain scoring less than 70. In many cases MT can be used with little to no post-editing. This is especially true for the Travel and Hospitality and Legal and Finance domains.
Language Pair Spotlight: English to German

[Chart: average PE score by engine (Amazon, Google, Microsoft, DeepL) for each domain: Business and Education, Consumer Electronics, Entertainment, Industrial, Legal and Finance, Medical, Travel and Hospitality, User Support; score scale 0-100]

This chart shows us the average quality of MT output for the four stock engines used in Memsource for the English to German language pair, further divided into domains. Sufficient performance data was not available for DeepL in the last two domains, so it was excluded there.

The chart shows that the output quality can vary quite significantly based on both domain and engine. The difference between the best and worst performing engine was often quite significant: for Industrial and Legal and Finance content, the difference was up to 11 points. A suboptimal engine choice can result in lower quality and higher post-editing costs, which can easily be mitigated through careful choice of engine.

Another important observation is that there is no "perfect" engine that would consistently translate better than its competitors. For the best possible results across all domains, users should always use the best performing engine for the given domain (see the sketch after this section).
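Picking the best engine per domain is, at its core, a lookup over per-domain performance data. The sketch below uses made-up scores purely for illustration; it is not Memsource's implementation and the figures are not from the report.

```python
# A sketch of per-domain engine selection, assuming average PE scores per
# (domain, engine) are already available; scores below are made up.
scores = {
    ("Industrial", "Amazon"): 72,
    ("Industrial", "Google"): 78,
    ("Industrial", "Microsoft"): 69,
    ("Legal and Finance", "Google"): 80,
    ("Legal and Finance", "DeepL"): 84,
}

def best_engine(domain: str) -> str:
    """Pick the engine with the highest average PE score for a domain."""
    candidates = {eng: s for (dom, eng), s in scores.items() if dom == domain}
    return max(candidates, key=candidates.get)

print(best_engine("Industrial"))         # Google (in this made-up example)
print(best_engine("Legal and Finance"))  # DeepL (in this made-up example)
```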
Domain in the Spotlight

In this issue our domain spotlight focuses on User Support content translated from English into a number of different languages. This domain is primarily associated with content that is created when communicating directly with customers, usually with the intent to help them with a problem or inform them of a change. Examples of user support content include support emails or help center articles.

[Chart: Domain Spotlight: User Support - average PE score by engine (Amazon, Google, Microsoft) for each target language: Arabic, Czech, Danish, Spanish, French, Italian, Korean, Dutch, Portuguese, Russian, Slovak, Turkish, Simplified Chinese, German, Swedish; score scale 0-100]
Commentary

This chart shows us the average quality of MT output for the three most popular stock engines used in Memsource, further subdivided by target language.

From the data we can see that machine translation performance can vary significantly due to target language and engine choice. However, in most cases at least one engine was able to provide a translation that scored on average 70 or above.

At the same time, the difference between the best and worst performing engines can in some cases be significant. For Arabic, Czech, Russian, Turkish, and Simplified Chinese the difference was over 20 points.
Memsource Translate

At Memsource, we have been exploring new ways for users to go further with MT. Our platform currently supports over 30 different engines, with our internal data showing a steady rate of adoption and general quality improvements for most engines year on year. To avoid the hassle of engine testing and implementation, most users tend to rely on one engine exclusively, losing out on the gains achieved by other engines for specific language pairs and content types. We have found that in over 70% of all translation projects, a better performing engine could have been used.

This is why we have developed Memsource Translate, an AI-powered machine translation management solution that automatically selects the optimal MT engine for the user's content and language pair based on past performance data.

Testing MT engines is normally a costly and time-consuming process. For many users, keeping up to date with the latest engine developments is prohibitively expensive and taking advantage of them impossible. Memsource Translate offers a unique solution to this problem.

Automatic and data-driven engine selection saves time and money by choosing the best performing engine for every task. The algorithm will consider not just the language pair, but also the type of content you are translating, all in real time and using continuously updated performance data (a sketch of this kind of selection follows below). Memsource Translate is fully integrated with existing workflows in Memsource, with features such as automatic domain detection further reducing the need for human input.

Memsource Translate makes managing multiple engines convenient, as customers can use, track, and pay for multiple engines all in one place. Users can immediately start translating with three fully managed engines, or add their own generic or customizable engines.
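The report only describes Memsource Translate's behavior at a high level: selection keyed on language pair and content type, driven by continuously updated performance data. The sketch below is a hypothetical illustration of that idea (a running average of PE scores per language pair, domain, and engine), not the product's actual implementation.

```python
# A hypothetical sketch of data-driven engine selection with continuously
# updated performance data; names and logic do not reflect Memsource Translate.
from collections import defaultdict

class EngineSelector:
    def __init__(self, engines):
        self.engines = engines
        # (language_pair, domain, engine) -> [total_score, segment_count]
        self.stats = defaultdict(lambda: [0.0, 0])

    def record(self, language_pair, domain, engine, pe_score):
        """Update performance data as new post-edited segments arrive."""
        entry = self.stats[(language_pair, domain, engine)]
        entry[0] += pe_score
        entry[1] += 1

    def select(self, language_pair, domain):
        """Choose the engine with the best average PE score for this pair/domain."""
        def avg(engine):
            total, count = self.stats[(language_pair, domain, engine)]
            return total / count if count else 0.0
        return max(self.engines, key=avg)

selector = EngineSelector(["Amazon", "Google", "Microsoft"])
selector.record(("en", "de"), "Industrial", "Google", 78)
selector.record(("en", "de"), "Industrial", "Microsoft", 69)
print(selector.select(("en", "de"), "Industrial"))  # Google
```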
Conclusion

In this report, we have seen how the performance of different machine translation engines can vary across different language pairs and content types. We have seen that machine translation can achieve impressive results with high quality outputs, but this is not necessarily true in all cases. Most importantly, the choice of the engine used for a specific domain or language pair can significantly impact the output quality. As seen in our spotlight sections, relying on one engine alone is not enough to get the best possible results.

The data found in this report represents a snapshot in time. Memsource Translate, our engine management solution, monitors MT engine performance in real time to recommend the optimal engine for your content. If you would like to learn more about this technology, contact us to try out Memsource Translate.

We look forward to seeing you again in our next MT Report in Q3 2021.
Memsource a.s.
Spalena 51, 110 00 Praha 1, Czech Republic
twitter.com/memsource
facebook.com/memsource
cz.linkedin.com/company/memsource