STUDYING THE USEFULNESS AND RELIABILITY OF ENGLISH TO CHINESE MACHINE TRANSLATION - Master IDTM
Université Clermont Auvergne UFR Langues, Cultures et Communication STUDYING THE USEFULNESS AND RELIABILITY OF ENGLISH TO CHINESE MACHINE TRANSLATION Marion Batisse Master 2 Langues Étrangères Appliquées Ingénierie de la Documentation Technique Multilingue Directed by Dacia Dressen-Hammouda 2017-2018
Abstract
The purpose of this study is to identify whether Google Translate is reliable, and to what extent, and whether it is useful in a working environment. Emphasis was placed on English to Chinese translation, as there is very little research in this direction. Through an analysis of a corpus translated from English to Chinese by machine translators, it is shown that Google Translate is more accurate than expected, with ratings higher than average. Interviews with managers who deal with translations daily were also conducted in order to define several situations where machine translators would be useful, such as checking the general meaning of a text or of some words and short sentences, cross-referencing an agency’s translations, or seeing what a target-language translation would look like in a layout. However, it has to be kept in mind that machine translators should be used with caution, as there is no way to assess the quality of a translation without any knowledge of the language. The use of this tool is recommended only when the user has mastered the language, and the post-editing and proofreading phases are mandatory.
Keywords: Chinese, Google Translate, machine translation, reliability, translation project, usefulness, working environment

Résumé
Le but de cette étude est de définir si Google Traduction est fiable et à quel niveau, et s'il est utile dans le cadre de l’entreprise. L'accent a été mis sur la traduction de l'anglais vers le chinois car il y a très peu de recherches dans ce sens. Grâce à l'analyse d'un corpus de textes traduit de l'anglais vers le chinois par des traducteurs automatiques, il est démontré que Google Traduction est plus performant qu’imaginé avec des notes supérieures à la moyenne. Des interviews avec des managers qui s'occupent quotidiennement de traductions ont également été menées afin de définir plusieurs situations où les traducteurs automatiques seraient utiles.
Par exemple, pour vérifier le sens général du texte ou certains mots et petites phrases, pour intégrer les traductions d'une agence dans le document d’origine et donc faire des références croisées, ou encore pour voir à quoi ressemblerait une traduction dans la langue cible au niveau de la mise en page du document. Cependant, il faut garder à l'esprit que les traducteurs automatiques doivent être utilisés avec précaution car il n'y a aucun moyen d'évaluer la qualité des traductions lorsque l’utilisateur ne possède pas de connaissances suffisantes en langues étrangères. L'utilisation de cet outil n'est donc recommandée que pour la maîtrise de la langue, et les phases de post-édition et de relecture sont obligatoires. Mots-clés : cadre de travail, chinois, fiabilité, Google Traduction, projet de traduction, traduction automatique, utilité 3
1. Introduction
A little more than 7,000 languages are currently spoken in the world (Fennig & Simmons, 2018). Some are spoken by a large part of the world population and some are used by a minority of people. Either way, the need to understand other people’s languages arose very early with the development of different civilizations. Translation comes from the Latin word translatio, which means “to carry across” (Vélez, 2016). It is an old practice that is generally thought to date back to antiquity. As civilizations evolved and the world continued to develop, translation became essential for the diffusion of information. As a steadily growing nation, China is one of the countries that attract a great number of international companies. English to Chinese translation has become a necessity for communicating effectively in the global market. Technological advances have led to big changes in the translation process (Li, 2015; Doherty, 2016). With the development of the internet, it is possible to find more content, more easily than ever, and in the language that one speaks. However, a closer look at internet content reveals the dominance of the English language. In 2018, 25.3% of internet users spoke English and 52.4% of the available content was in English. This is a very large share of internet content, which seems logical as English is the most spoken and understood language in the world (Crystal, 2008). However, it should be noted that although 19.4% of internet users speak Chinese, only 1.8% of the content is in Chinese. The content available on the internet is clearly not proportional to the number of users speaking the language. As Chinese is the language with the second-highest number of internet users, its speakers should have access to a greater amount of information and content. The situation is similar for other languages.
Although the internet content currently available in each language may be satisfactory for its speakers, we can see that the diffusion of information has a language barrier to overcome. Moreover, these figures are not growing quickly. For Chinese content, they are even falling because more content is being created in other languages. Therefore, we need technological tools to improve this situation. One possible answer to this need is machine translation. However, machine translation systems carry many risks, whether they are used for business or in personal daily life (MacKenzie, 2014). While machine translation is cheap and supposedly “effortless”, it can lower the value and quality of the content. Research, which will be discussed in more detail later, has identified several solutions for reducing the error rate, such as using controlled languages or post-editing (Vivien, 2013). The aim of this study is to see whether machine translation can be considered a reliable tool for use in a company and to find in which cases it can be useful. Therefore, the main questions of this paper are: Is a perfect translation really important for a company?
And what is a perfect translation? To what extent is it acceptable to rely on machine translation? In which situations can machine translation be useful in a working environment? To answer those questions, a corpus of texts translated from English to Chinese with a machine translator was analyzed. This analysis makes it possible to rate whether machine translation is reliable for this language pair. Moreover, two interviews were conducted with a French project manager and a Chinese product manager in order to identify the problems related to translation in a company. These interviews allowed me to define different situations where machine translation could be useful and how to use it. My hypothesis was that although machine translation quality has greatly evolved recently, it is hard to consider it reliable, and that the results for the translated corpus would be very wide-ranging. I thought that neither of my interviewees would use machine translation since its reputation is quite bad (MacKenzie, 2014), but that they would nevertheless describe many situations where the use of a machine translator could be useful. First, an overview of machine translation will be given, with its dangers and the solutions researchers have found to reduce the number of mistakes, followed by more details about English and Chinese translation and about the reliability of machine translators. Second, the methodology of this experiment will be explained before the results of the analysis and interviews are given. Lastly, these results will be discussed, and the final part will define in which situations machine translation could be useful in a working environment. Several avenues for further study will also be suggested.
2. Theoretical background

2.1 Machine translation
Machine translation is the process of translating a text from a source language to a target language by using a computer (Hutchins, 1995). According to this definition, a human should not need to be involved in the process, but we will see later that this is not necessarily true. There are several types of technologies used for translation: statistical, rule-based, example-based, hybrid and neural machine translators. This paper focuses on one of the most widely used free machine translators in the world: Google Translate. Google Translate was launched in 2006. It originally relied on statistical technology. It translated the source language into English and then into the target language, using bilingual references called parallel corpora (Koehn, 2010). However, this approach puts languages that do not have many human-translated documents and resources at a big disadvantage. Since 2016, Google has used a new technology: Google Neural Machine Translation. According to Google, this system is capable of learning by itself how to produce more fluent translations (Turovsky, 2016). The change is that instead of translating pieces of sentences and putting them together, the new system translates the sentence as a whole, without needing to translate it into English as a first step. Without going into technical details, this system mimics the human brain by analyzing the meaning, the grammar and finally the context of the sentence. According to Turovsky, Google Translate is the most used machine translator in the world, with more than 500 million daily users benefiting from its features.

2.2 Problems and some solutions
Machine translators, such as Google Translate, are used for their many advantages. Indeed, they work much faster than manual translation and save a lot of time.
According to Boitet (2008), one hour and five minutes can be saved by translating a text with a machine translator and post-editing it, instead of translating it manually. Moreover, it is cheaper than hiring a professional translator, and a certain level of confidentiality can be kept. However, machine translators can easily introduce problems of accuracy and context which can, in certain situations, cause a lot of damage. MacKenzie (2014) explains that people mostly use Google Translate when shopping on e-commerce websites or reading blogs and articles from overseas. In addition, Google can offer to translate a web page, and more and more businesses are relying on those methods to translate their websites. According to the author, this is a dangerous decision. She asserts that translations are not entirely accurate, as machine translators still cannot understand and use the local nuances, peculiarities
and idiosyncrasies present in all the world’s languages. In addition, because grammar can differ greatly within a language pair, such as English/Chinese, the final meaning or syntax is often wrong. That is why a machine translator is a risky tool for businesses. While it is cheap and effortless, it can lower the value and quality of the content. For the author, businesses in technical industries, such as manufacturing, engineering or chemical companies, should definitely avoid machine translators, as there is a vast amount of important terminology and grammar that should be translated carefully. Previous research has developed and improved solutions to ease the translation process when using machine translators. Ferret (2015) studied whether pre-translating a text could improve the quality of machine translation. He demonstrated that pre-translation could avoid recurrent translation errors and simplify post-editing, but that it is not sufficient on its own. Therefore, he decided to combine pre-translation with controlled languages, which are sets of generally accepted rules that help to avoid machine translation mistakes as much as possible. His results showed that, at present, there is no better alternative than the controlled language approach, and that pre-translation does not produce good enough results. However, given how quickly machine translation is developing, it is quite possible that Ferret’s method could become more efficient in the near future. The controlled language approach was studied in more detail by Vivien (2013). She investigated the quality of machine translation for the English-French language pair. She explains that the quality of machine translation and the number of mistakes it contains make post-editing by a human mandatory. In order to reduce this post-editing phase as much as possible, Vivien created a template for translatable content based on principles from controlled languages.
After her experiment on nine texts, she observed that using the template does not remove the post-editing phase, but that the translation quality is better and post-editing is clearly faster and easier. She concludes that while her template is not perfect because human participation is still needed, it would definitely help people who wish to offer a good online language experience to their target readers.

2.3 English and Chinese translation
Chinese and English are very different languages, which makes translation between the two even harder. The grammar is not the same, and there are no letters, plural forms or tenses in Chinese. While in English the verb is conjugated to indicate the tense, in Chinese this is done by adding words to the sentence. The order of the words in a sentence can also be very different, which can cause confusion. In addition, some Chinese characters can correspond to several English words and vice versa. In her research, Brazill (2016) identified the problems of Chinese to English translation, and one of them was machine translators. Because machine translators cannot translate idiomatic expressions or
understand the context, cultural awareness is important to improve translation quality. She thinks that a machine translator is best used for rough translations, in order to understand the essence of a text even if some parts are not accurate. It is important to have a professional translator at least proofread the translated texts. What we can understand from this is that machine translation should not be used by people with no knowledge of the target language. However, the situation could be different if it could be established that the quality is good enough for a general understanding.

2.4 Machine translators and reliability
Assessing machine translation quality can be quite difficult. Many studies have tried to evaluate whether machine translators are reliable. However, translation quality can be rated as high “according to some standards and be a bad translation according to others” (Görög, 2014). He points out: “A translator's work might be excellent in terms of fluency (meaning it sounds natural or intuitive), but how about the adequacy of the translation (and its fidelity to the source text) or errors made based on an error typology (such as terminology, country standards and formatting)?”. He suggests grading translations the way one would rate a hotel with stars. With comprehensibility as the standard, a translation with less than one star would be a text not even considered a translation, since the meaning would be completely different. A five-star translation could be either very fluent or accurate to the source text but done with tight deadlines. A similar system for rating machine translation could be very useful. For this reason, it is necessary to be able to analyze whether machine translators are reliable, and to what extent. Roig Allué (2017) studied the reliability of Google Translate by analyzing the mistakes made during the translation process.
She defined three variables for her corpus compilation: the languages (English/Spanish), the direction of the translation (English into Spanish and vice versa) and the genre of the texts (tourism: tourist texts; sport: football match reports). Her hypothesis was that Google Translate would be reliable enough to give a general idea of what a text means rather than to produce a professional translation. After her analysis, she found that lexicogrammatical and syntactic mistakes occur frequently in the translated texts for both the tourism and sport genres. Sometimes the meaning is still understandable, but at other times the texts contain many mistakes, which leads to misunderstandings. Translations of tourist texts are better when done from Spanish to English. She supposes that this is due to the fact that the more frequent a genre is online, the better the quality of its translations will be. She concludes that Google Translate cannot be
considered reliable as it does not fulfill the user’s need for understanding, although the quality will presumably improve during the next few years. However, this does not rule out the fact that machine translators could be useful in several situations. The objective is to find the right situations in which to use them.
3. Methodology

3.1 Research design
The purpose of this study was to determine whether machine translators can be useful for Chinese-English translations, in particular in work settings. A number of authors have written extensively about Chinese to English translation, highlighting its difficulties (Brazill, 2016; Vilar et al., 2006). A qualitative approach was used. In order to study reliability, a corpus of twelve different texts was built and analyzed after the texts had been translated from English to Chinese by a machine translator. The translations were categorized using a grid. This method is based on the research of Roig Allué (2017), who evaluated the reliability of Google Translate using three variables: the languages being translated, the direction of the translation and the genre of the texts. In order to gain further insight into how machine translators could be useful in a work environment, interviews were conducted with two people working at the company where I am currently doing my internship. Hearing their points of view on translation and machine translation for projects in a company setting, and looking for solutions to the translation problems they encountered, was a good way to see whether machine translators would be useful in different situations.

3.2 Participants
A native Chinese speaker helped with the analysis of the translated corpus. She is a twenty-two-year-old student currently living and studying in Beijing. She also speaks English, Korean and Japanese at a level that allows her to follow university classes, so she has an affinity with languages. She graded the translated Chinese texts, as my level of Chinese was not high enough to do so. Two participants took part in the interviews. One is a French project manager who creates e-learning modules for a French company’s commercial delegates.
The other is a Chinese product manager currently working in China for the Chinese branch of the same company. I had the opportunity to meet her during a training session where we discussed translation problems for sales representatives and employees in China. Both had encountered several situations where they had to translate documents or e-learning modules from English to Chinese and had difficulty doing so. It should be noted that I personally know the people who were interviewed. I needed to have an idea of their level in Chinese so that the interviews would be relevant to my study. However, this will not affect the results of this study as the analysis method is completely objective.
3.3 Materials and procedure for data collection
To evaluate whether machine translation is reliable and useful in different situations, a 1,835-word text corpus covering different genres was built from three different websites (see Appendix A). Three genres were chosen because they differ considerably from one another and because translation accuracy could have a different impact in each: recipes, news articles and medical notices for children. They were also selected because the use of Google Translate is justified for all three. Indeed, someone could simply be searching for a recipe but have problems with the names of ingredients or with the instructions. News articles are also often translated, since potential users would like to know what is happening in the rest of the world. As for medical notices for children, machine translation could be very useful if the user is in a country whose language is not the one used in the notice. However, it is important to note that a wrongly translated cooking recipe has less impact than a mistranslated medicine notice for children. For each genre, different text extracts were selected, as shown in Table 1. Three texts were chosen for News (585 words) and for Medicine (635 words), while six texts were selected for Recipes (615 words) because the texts were shorter. Please note that the corpus is relatively small because each text was translated by two different machine translators.

Table 1: Number of words used for the text corpus

TOPIC       TEXT     NUMBER OF WORDS
Recipes     1        49
            2        82
            3        221
            4        126
            5        57
            6        80
            TOTAL    615
News        1        267
            2        197
            3        121
            TOTAL    585
Medicine    1        247
            2        104
            3        284
            TOTAL    635

All the source texts were modified according to Vivien’s (2013) controlled language template. This approach has been shown to reduce the number of translation mistakes and was chosen for the present study because machine translators have to be as reliable as possible.
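The word counts reported in Table 1 can be cross-checked with a few lines of code. The following Python sketch is not part of the original study; it only re-adds the per-text counts transcribed from Table 1 and compares them with the stated totals. The variable names are illustrative assumptions.

```python
# Per-text word counts transcribed from Table 1.
# The dictionary and variable names are illustrative, not from the study.
word_counts = {
    "Recipes": [49, 82, 221, 126, 57, 80],
    "News": [267, 197, 121],
    "Medicine": [247, 104, 284],
}

# Sum each genre, then the whole corpus.
genre_totals = {genre: sum(counts) for genre, counts in word_counts.items()}
corpus_total = sum(genre_totals.values())
text_count = sum(len(counts) for counts in word_counts.values())

for genre, total in genre_totals.items():
    print(f"{genre}: {total} words")
print(f"Corpus: {text_count} texts, {corpus_total} words")
```

The sums agree with the totals given in the text: 615, 585 and 635 words per genre, and 1,835 words over twelve texts.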
Each text was then translated by two machine translators to compare which one was more reliable and faithful to the source text. Google Translate and Baidu Fanyi were chosen because the first is the most used machine translator in the world (Turovsky, 2016) and the second comes from a Chinese company that is currently the leader of the online search market in China (Incitez China, 2015). Then, following Görög’s (2014) idea for rating a translation, a Chinese assistant helped to rate the translated texts from one to five for their reliability in terms of the meaning of the translation:
1. The text makes no sense
2. A large portion of the text is incomprehensible
3. Some elements are wrong or missing and the meaning of the text is hard to understand (some important information is left out)
4. Some elements are wrong or missing but the meaning of the text is clear (you get the essential information)
5. The text was faithfully translated
For my co-workers’ interviews, a list of ten questions, mostly inspired by previous discussions on the subject, was prepared in advance (see Appendix B). The questions were also largely based on research by Brazill (2016), who carried out several interviews and built surveys in order to identify and solve Chinese to English translation problems. Because my French colleague was working at the same site, it was possible to interview her directly. The interview was recorded in order to explore and analyze her remarks. For my Chinese colleague, the distance did not allow a direct interview. Instead, she recorded herself answering the questions.
4. Results
First, the translated corpus analysis will be explained, followed by the interview outcomes.

4.1 Translated corpus analysis
Table 2 shows that the Chinese native speaker considered most of the translations to be better when produced by Google Translate. This implies that the meaning of the translated text was closer to the original and that less information was lost. Only the translations of Recipe 5 and Recipe 6 were better when produced by Baidu Fanyi. Because the Google Translate version was preferred for 83% of the texts (10 out of 12), this indicates that this machine translator is typically more reliable than Baidu Fanyi.

Table 2: Machine translator which had the best translation results

TOPIC       TEXT    GOOGLE TRANSLATE    BAIDU
Recipes     1       X
            2       X
            3       X
            4       X
            5                           X
            6                           X
News        1       X
            2       X
            3       X
Medicine    1       X
            2       X
            3       X

Concerning the rating of the texts, the results are quite average, as shown in Table 3. Indeed, no translated text scored 1, 2 or 5 out of 5. Half of the texts scored 3 and the other half scored 4 out of 5. This indicates that in all texts, some elements were wrong or missing. In Recipes 1, 2 and 3 and in Medicine 1, 2 and 3, the meaning of the translated text is clear enough to get the essential information. In Recipes 4, 5 and 6 and in News 1, 2 and 3, the meaning of the text was hard to understand because important information was left out. If we analyze the scores by theme, we can see that Medicine was translated best, with an average score of 4 out of 5. Just behind is Recipes with an average of 3.5 out of 5, and in last place News with 3 out of 5.
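The percentages and averages reported in this section can be recomputed from the values in Tables 2 and 3. The Python sketch below is only an illustration and not the procedure used in the study; the data are transcribed from the two tables, and all names are hypothetical.

```python
from statistics import mean

# Ratings (1 to 5) transcribed from Table 3, listed in text order.
ratings = {
    "Recipes": [4, 4, 4, 3, 3, 3],
    "News": [3, 3, 3],
    "Medicine": [4, 4, 4],
}

# Texts for which Baidu Fanyi gave the better translation (Table 2);
# Google Translate was preferred for all the others.
baidu_preferred = {("Recipes", 5), ("Recipes", 6)}

# Share of texts for which Google Translate was preferred.
total_texts = sum(len(scores) for scores in ratings.values())
google_share = (total_texts - len(baidu_preferred)) / total_texts

# Distribution of scores and average score per topic.
all_scores = [score for scores in ratings.values() for score in scores]
share_scoring_3 = all_scores.count(3) / len(all_scores)
share_scoring_4 = all_scores.count(4) / len(all_scores)
topic_means = {topic: mean(scores) for topic, scores in ratings.items()}

print(f"Google Translate preferred: {google_share:.0%}")
print(f"Scored 3: {share_scoring_3:.0%}, scored 4: {share_scoring_4:.0%}")
for topic, average in topic_means.items():
    print(f"{topic}: average {average} out of 5")
```

With only twelve texts and a single rater, these figures describe this corpus rather than machine translation in general.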
Table 3: Rating the meaning of translated texts

TOPIC       TEXT    RATING (1 TO 5)
Recipes     1       4
            2       4
            3       4
            4       3
            5       3
            6       3
News        1       3
            2       3
            3       3
Medicine    1       4
            2       4
            3       4

4.2 Interview results
A total of ten questions were answered by the two participants (see Appendix B).

1. While taking into account that only a few people in the company present in China can speak English, do you think it is important to translate contents intended for commercial delegates in their native language (in Chinese)? Why?
Both interviewees think that translating content intended for commercial delegates into their native language is very important. They mentioned that in their company, very few Chinese salespeople actually speak English. The Chinese product manager said that sometimes the workers will make the effort to ask someone who speaks English to translate, and sometimes they will just ignore the document. This situation clearly has an influence on their business. The French project manager stated that delivering content is not the only important part: making people want to read it and engaging them are also necessary. If not, the translated texts will not be effective. Engagement depends on how easy it is to navigate through the content, and language is a way to smooth this navigation so that the information is eventually assimilated better. She added that although language is important, cultural differences are also fundamental. How people perceive a subject differs greatly from one country to another. She related a previous experience in which she was told that she was being too familiar with German employees, although she was speaking the same way as she would in France. Therefore, the content needs to be not only translated but also adapted to Chinese sales representatives.
2. By which party should this translation be done? France (with the source language), China (with the target language), internally (by the services responsible for the documents…) or externally (translation agency…)?
Both interviewees think that translations should be done both by a translation agency and internally. The Chinese product manager mentioned that for translations related to product knowledge, company employees will do a better job because translation agencies do not know the company’s products, wording or corporate codes. According to the French project manager, a translation agency could be very useful for producing a first rough version of the text. This would ease the process by taking care of the general sentences that take the most time to translate. The proofreading would then be done by a native speaker who knows the company’s specificities. She said that although this solution is the most complete, it is not the most efficient because of the wording used in the company. Standardizing the vocabulary and manner of speaking could be more efficient in the long run. This solution takes longer to carry out but is more relevant and fitting.

3. Generally, are translations done directly from French to Chinese, or from French to English to Chinese?
The source language chosen for the translation depends on the initial request. The interviewees said that, generally, translations are done from English to Chinese although the company is French. This is because the first training module or document created is most often the English one.

4. What do you think is the biggest challenge when you translate English to Chinese?
The manager from China believes that the biggest challenges when translating English to Chinese are language habits, such as the fact that the grammar differs between the two languages and that long sentences feel more complicated in Chinese than in English.
The French interviewee mentioned that the biggest challenge is replacing one language with the other in a document by matching up the corresponding passages. The problem is that she usually sends files in a language that she masters but receives the translated files in a language that she does not understand. Thus, proofreading is mandatory after the source text has been replaced by the translated text in the training module. Moreover, this situation causes some layout issues, since English and Chinese are written in completely different ways. Indeed, Chinese sentences, written in characters, take up less space and can even be written from top to bottom.

5. Some words are quite difficult/impossible to translate (for ex: product name, brand…). In those cases, who decides about the final name/translation?
There is also the issue of some words being impossible to translate into Chinese, such as product names or brands. In this case, the company will translate them based on the pronunciation. According to the
Chinese manager, they sometimes just choose a nice name which sounds good in Chinese and is not necessarily related to the English name. The French manager added that the global marketing director is usually the one who has the final say. He makes his decision together with each country manager because each country and market has its own specificities.

6. To your knowledge, were there some cases of mistranslation? What was the impact?
To my interviewees’ knowledge, there have been several cases of mistranslation in the company. According to the Chinese product manager, such mistranslations are often not serious mistakes. The employees can still understand the meaning of the text, but it can make people laugh and deliver its message less effectively. There were of course mistranslations which resulted in simple misunderstandings and, in the most serious cases, employees understood something completely different from what was intended. The French project manager related another experience, in which the training team tried to produce an e-learning module intended for the company’s Chinese sales representatives. However, the final document was never delivered. Indeed, the training team received a text which had been translated by a translation agency, but no one could cross-reference the text. China did not have the necessary resources to do it and France did not have the knowledge of Chinese needed to finish it. She added that the problem was not a lack of vision or willingness but a lack of resources. Another case was mentioned in which a poor-quality German translation was delivered. It had originally been proofread by a native speaker. Although the training was important and necessary for the German team, the number of learners who participated online was unusually low. Further feedback informed the training department that the translation was of very low quality even though it had been proofread by a native speaker.
She concluded that within a single language, nuances and regional intricacies can have a big impact on the comprehension, attendance and effectiveness of a training course.

7. Do you think that some translation mistakes due to the fact that the translator did not speak a perfect English or Chinese have a lot of importance?
The Chinese product manager answered that even a native-speaker translator could make mistakes. She gave the example of a former Chinese product manager who could speak both languages very well but did not have any knowledge of the target fields, which were chemistry and physics. Therefore, speaking perfect Chinese or English is not the only thing that matters. The project manager from France added that it is not a problem of vocabulary but rather a problem of culture and phrasing. In her opinion, a translation agency will translate word for word in order to stay as close as possible to the text. She added that it was a pity to follow the original structure of the sentence because this prevents the use of everyday idioms and phrasing. Even if the
translation is right, the sentence is wrong with regard to acculturation. This is what machine translators are actually lacking. She also mentioned the Deepl online translator which she thinks does a better job in this field. 8. Do you think that if the person making the document (ex: e-learning) knew Chinese, it could make the translation process easier? 9. Do you think that if the person doing the translated text integration in the document knew Chinese, it could make the translation process easier? For the questions concerning if the person making the document (e-learning) or the translated text integration knew Chinese, they both answered that it could make the translation process easier. If you have the Chinese language’s codes in mind when creating a document in English, the creation could be optimized so that the translated text is easier to do. They added that knowing Chinese would also make the text integration easier and this could avoid a situation like we saw previously with the non-delivery of the training module. 10. What do you think about the use of machine translators in the company? Finally, both interviewees think that machine translators are not really reliable in a working setting but could still be useful depending on how it is used. The Chinese manager would not rely on it for full sentences. She said that it would be useful to translate some words or small phrases. The French interlocutor added that she uses machine translator for the languages that she masters or have some knowledge of. She needs it when she has a doubt or in order to feel reassured about the right word but she knows where she wants to go with it. The mistake would be to use machine translator without any knowledge. For her, machine translators can be used if somebody already has languages notions because they allow a critical thinking. Therefore, not having this knowledge will result in translating an approximate language and will not make the flow or the learning easier. 
She concluded that machine translators only make sense when the user’s level of language proficiency is sufficient.
5. Discussion

5.1 Reliability is a matter of content

It was no surprise that Google Translate had the best translation results. In a non-scientific article written in 2018, a test was conducted to see which machine translator produced the best-quality output (Fu, 2018). The results showed that Google Translate performed well at conveying the main idea of a text. In this article, the machine translators were classified by tiers: Google Translate was listed as a first-tier machine translator, while Baidu was listed as second tier. Another point to note is that the official translation used by the Chinese government was very similar to the translation from Google Translate. The author supposes that Google Translate may have been used as a reference. To this day, no scientific study comparing machine translators for the English-Chinese language pair has been found. It would therefore be interesting to see which machine translator is the most reliable.

In another study from 2016, the authors analyzed whether Neural Machine Translation (NMT) was really better than Statistical Machine Translation (SMT) in terms of quality (Junczys-Dowmunt et al., 2016). For all language pairs involving Chinese, NMT always had better results than SMT, and by a wide margin (see Figure 1).

Figure 1: Comparison between NMT and SMT systems for 6 language pairs
Source: Junczys-Dowmunt, M., Dwojak, T., & Hoang, H. (2016). Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions.
Therefore, we can see that the NMT system is better adapted to Chinese than the SMT system.

One of my hypotheses was that some genres of text would be translated more faithfully than others. I expected Recipes to have the best translations, followed by News and finally Medicine. The reasoning was that Recipes did not contain very difficult words to translate, News would have some mistakes, mainly because of names that would be hard to translate into Chinese, and Medicine would suffer from its difficult lexical field. However, this was not the case.

When we study why the results were not exactly what was expected, we can note several points. The medicine leaflets were quite easy to follow and the instructions were straight to the point, which made the translation process easier for the machine translator. One of the hardest points could have been the translation of disease and medicine names, but these were also translated correctly. We did not give a rating of 5 out of 5 because, although the sentences were understandable, they were not always grammatically correct. Recipes received a similar rating for the same reasons. The most recurrent mistake was that words were reversed, which often changed the meaning of the whole sentence. We suppose that there was too much important information in a short sentence. We also assume that the News texts had the lowest score because of their writing style, which is quite formal and more difficult to translate than instructions, and because of names of people or places and quotations. Some words were not even translated.

These results suggest that the reliability of a translation depends on how easy the genre is to translate.
In addition, the way the source text is written has an impact on translation reliability. For example, if a company wants to translate a technical datasheet for a product, it should be careful with product or brand names because those kinds of words do not have a match in other languages. The translation is bound to contain mistakes if a machine translator is used on such words. This is why the author of the document should think about the content and how it will be translated. For example, the author can use standard wording and avoid cultural references or slang, because these do not translate well into other languages and cultures (Rimalower, 2009). It would therefore greatly help if the author of the document had knowledge of the target language, which is often not the case.

Another surprising fact was that no text was rated 1 or 2. This suggests that machine translation is useful when you want to know what a text is about. However, no text was rated 5 out of 5. We can ask ourselves whether 4 is the highest grade we can give to a machine translator. This could be the case because, to this day, human post-editing is still necessary to obtain an error-free and fluent-sounding translation. In practice, a translation rated 4 would therefore be treated no differently from one rated 5. While the meaning will be clear, there will still be work to do on the style, grammar or even expressions. Even if the text was faithfully translated by a machine translator, it would still be far from perfect.

5.2 A perfect translation

Even if a translation is really faithful to the source text, it is not necessarily perfect. In part 4.2, we saw that acculturation is very important for engagement, that is, for giving people the desire to look at the content we write. Researchers explain that literal translation was considered an ideal during certain periods (Lilova, 1987; Jin, 1997). However, this concept has changed a lot along with the development of translation. Lilova states that “Translation is not just an activity of reproduction but is one of creation”. The fact that the effect matters more than matching every word is well known but not necessarily applied. In the interview, the project manager mentioned that some of the translations she received from an agency were too similar to the original text and did not suit the target language. This result is comparable to what a machine translator could produce. The difference is that, with a professional translator, we are sure that the meaning is right even if it is not well adapted.

An interesting non-scientific article discusses whether a perfect translation can exist. The author wonders if translation is a matter of taste and if it should be treated subjectively (Rourke, 2015). A translation can be good for one native speaker and completely wrong for another, which was a problem identified through the interviews. To solve it, we could either study the target audience in detail and adapt the translation precisely to it, or choose to translate in a more neutral way so that the text is understood by a majority. The first solution is definitely the best regarding quality and potential engagement. However, when several different audiences need a translation into the same language, it would be too difficult and expensive.
For example, if we had to translate a text into Chinese, we should take into account that the main spoken language is Mandarin. However, if the Chinese headquarters are in Shanghai, translating the text into the Shanghainese dialect might give better results because using local expressions would raise engagement. Yet people working in Shanghai may come from many different places and do not necessarily speak this dialect. Therefore, the only viable solution for businesses is to follow the majority by using the most widely used language.

Being aware of language nuances is good, but it raises several challenges for machine translation. Translation is more a problem of culture and phrasing than of vocabulary. Even if we want to convey the same thing, we have to be careful about which expression or tone to use. In 4.2, we saw an example in which the project manager used the same level of distance to communicate with the German and the French sales forces. However, she received feedback that she was being too familiar with the Germans. This is due to the fact that German culture is considered a low-context culture (Meyer, 2014). This term was first introduced by Edward Hall (1976). According to him, low-context cultures share specific behaviors: to communicate effectively, messages should be explicit, and privacy and a certain distance between people are also essential. All of these cultural aspects have to be taken into account when translating and localizing.

We saw that the effect is the most important part of a perfect translation. However, machine translators are known for staying as close as possible to the source text. The French interviewee mentioned that the free machine translator DeepL actually gives better translations because it matches expressions instead of translating word for word. Several recent studies found that Google Translate tended towards literal translation while DeepL seemed more fluent and nuanced (Coldewey & Lardinois, 2017; Isabelle & Kuhn, 2018). However, it is still not possible to test the English-Chinese language pair as it is not available on the platform.

In addition, we have to mind language habits. As said in 5.1, translating expressions or slang can be challenging. For example, the Chinese expression 加油 (jiāyóu) literally means “fill a tank with petrol”. However, this expression is used to encourage somebody and can be translated in many ways, such as “do your best”, “hang in there” or “good luck”. Machine translators have difficulty translating those kinds of expressions, as seen in the following picture.

Picture 1: Difficult Chinese-to-English translation of the word 加油
Source: Google Translate

Another point to note is that when we translate into Chinese, we have to think about the future document layout. Writing in characters is very different from writing in letters. When making an e-learning module, creating the layout is a big part of the work. Therefore, we have to think beforehand about the final result and whether it will be suitable for the target language. This is where machine translation can be useful, as people can get an idea of what the text will look like even though the content will not be right.

Because of all of this, it is hard to say what a perfect translation would be. We can only conclude that there are as many perfect translations as there are people and situations.

5.3 The right situations to use machine translation

As said in 5.1, a translation rated 4 is actually good enough for a machine translator because post-editing and proofreading phases are still necessary. The results are therefore similar to Roig Allué’s (2017). She concluded that although the meaning of the translated texts is sometimes understandable, misunderstandings can happen frequently, which makes machine translators unreliable tools.

Even though we cannot entirely rely on machine translators, this does not mean that they are not useful in a working environment. Machine translators are useful if people can assess the translation quality. This means that people need knowledge of the languages and their codes in order to see whether the meaning is right. If somebody does not master the language or have sufficient skills, it is recommended to use a translation agency because it can deliver a reliable document. Indeed, the translation process in agencies is well supervised and carried out by experts, and the translation is sent after one or several rounds of proofreading. However, choosing the right translator is also important: it has to be an expert in the field. Consider asking for a fluent-sounding translation that could help raise engagement, as it is acceptable not to have a word-for-word result. You can also provide a glossary of the company’s wording and a style guide with all the company’s conventions for documents in order to obtain the best possible translation with the fewest subsequent modifications.
Machine translators can also help to cross-reference the source text and the translated text in a document. Even if the person responsible for this task does not have any knowledge of the languages used, machine translators make it possible to check whether the meaning is at least similar, which is enough for this situation. However, this requires several mandatory rounds of proofreading to be sure the cross-referencing was done correctly.

In a situation where a collaborator cannot read English and a Chinese translation is not available, machine translators can be used to get the main ideas of the text. However, it is important to be careful with this method. As machine translators are considered unreliable, it is necessary to confirm the meaning with other people. If the user knows the source language, the best way to avoid as many mistakes as possible is to work on the source text. It has indeed been shown that using controlled languages gives better translation results.
Using a machine translator to check single words or to translate general sentences is also conceivable. However, people have to be careful to avoid elements likely to cause errors, such as specific wording, slang or idiomatic expressions. It can also help with the document layout during the creation phase: the author can see what the result would look like and adapt the document early to simplify the text integration.
6. Conclusion

The purpose of this study was to evaluate whether machine translators can be used in a work environment. It was also designed to see to what extent we can trust them for the English-Chinese language pair. The findings of this investigation were that Google Translate did a better job than expected. Indeed, the quality of the translations regarding meaning was higher than average for half of the texts, which means that the main ideas and information were comprehensible. However, for the other half of the texts, some important details were lost in translation, which compromised the quality of Google Translate for this language pair. It follows that machine translators should be used only if the user’s knowledge of the source and target languages is high enough to assess whether the meaning of the translation is correct. Overall, this study reinforces the view that post-editing and proofreading are always necessary.

Having the opportunity to interview both a Chinese product manager and a French project manager who deal with translations daily was also a good way to identify situations where machine translators would help rather than harm. These findings suggest that machine translators are suited to supporting human translation, translating words or short sentences, inserting translated text into a document without knowing the language, previewing what a translated text could look like in a target language (mostly for the layout) and helping somebody who does not understand a text to grasp its general meaning. We still have to note that this solution is far from ideal, as information can be lost.

Before this study, evidence of Google Translate’s accuracy for translating English texts into Chinese was purely anecdotal.
Indeed, several authors have analyzed which translator produced the best quality or whether machine translation was accurate (Fu, 2018; Isabelle & Kuhn, 2018), but no scientific article was found for the English to Chinese pair. The results of this study provide an actual analysis of a translated corpus and concrete evidence of how reliable Google Translate is.

The main weakness of this study is the size of the corpus: analyzing a larger amount of text would have given more precise results. Half of the texts were rated 3 and the other half 4 out of 5. Analyzing more texts might have tipped the balance and shown which rating is the most frequent. Moreover, as my ability in Chinese was not sufficient to do this study by myself, I cannot guarantee that my Chinese colleague rated the texts in line with my own understanding of them. We discussed the ratings together in order to be as accurate and as objective as possible. More interviews would also have been a good way to collect various points of view and raise new questions about machine translators.

Despite its exploratory nature, this study adds to our understanding of how English to Chinese machine translation has developed over the years. We can now see that the idea that machine translators, for English to Chinese in particular, produce complete nonsense is a misconception. They can clearly be useful in business and solve problems related to translation, provided that caution is always exercised when dealing with these tools.

This study should be repeated using a larger text corpus. This would give a more representative sample and show whether the most frequent rating is 3 or 4 out of 5. Further research could also be conducted to determine which machine translator is the most accurate and feels the most fluent for this language pair. One point mentioned in this study was that DeepL produces translations of better quality. Because research conducted on the available languages has shown good results (Coldewey & Lardinois, 2017; Isabelle & Kuhn, 2018), we should keep an eye on this machine translator. It would be very interesting to wait for Chinese to be implemented and then compare Google Translate and DeepL to see which gives the most accurate and fluent translations.
7. References

Boitet, C. (2008). La Traduction Automatique : ça marche ou non ?
Brazill, S. (2016). Chinese to English Translation: Identifying Problems and Providing Solutions. Graduate Theses & Non-Theses, 71.
Incitez China. (2015). China Search Engine Market Overview in 2014. Retrieved September 10, 2018, from https://www.chinainternetwatch.com/12678/search-engine-market-overview-2014/
Coldewey, D., & Lardinois, F. (2017). DeepL schools other online translators with clever machine learning. Retrieved September 10, 2018, from https://techcrunch.com/2017/08/29/deepl-schools-other-online-translators-with-clever-machine-learning/?guccounter=1
Crystal, D. (2008). Two thousand million? English Today, 24(01), 3-6. doi: 10.1017/s0266078408000023
Doherty, S. (2016). The Impact of Translation Technologies on the Process and Product of Translation. International Journal Of Communication, 10, 947-969.
Farret, J. (2015). Machine Translation: how to avoid errors?
Fennig, C., & Simmons, G. (2018). Ethnologue: Languages of the World, Twenty-first edition. Dallas, Texas: SIL International. Retrieved September 10, 2018, from http://www.ethnologue.com
Fu, Y. (2018). Who Offers the Best Chinese-English Machine Translation? A Comparison of Google, Microsoft Bing, Baidu, Tencent, Sogou, and NetEase Youdao. Retrieved September 10, 2018, from https://yiqinfu.github.io/posts/machine-translation-chinese-english-june-2018/
Görög, A. (2014). Evaluating quality in translation [Ebook]. Retrieved September 10, 2018, from https://www.multilingual.com/article/201412-22.pdf
Hall, E. (1976). Beyond culture.
Hutchins, W. (1995). Machine Translation. In S. Chan & D. Pollard, An Encyclopedia of Translation (pp. 591-602).
Isabelle, P., & Kuhn, R. (2018). A Challenge Set for French -> English Machine Translation. arXiv:1806.02725v2
Jin, D. (1997). What is a perfect translation? Babel Revue Internationale De La Traduction / International Journal Of Translation, 43(3), 267-272. doi: 10.1075/babel.43.3.06jin
Junczys-Dowmunt, M., Dwojak, T., & Hoang, H. (2016). Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions. arXiv:1610.01108
Koehn, P. (2014). Statistical machine translation (pp. 4-7). Cambridge: Cambridge University Press.
Li, A. (2015). Machines, Lost In Translation: The Dream Of Universal Understanding.
Lilova, A. (1987). The perfect translation - Ideal and reality. In M. Gaddis Rose, Translation Excellence: Assessment, Achievement, Maintenance (pp. 9-18).
MacKenzie, E. (2014). The dangers of machine translation. Retrieved September 10, 2018, from http://blog.webcertain.com/the-dangers-of-machine-translation/09/07/2014/
Meyer, E. (2014). The Culture Map: Breaking Through the Invisible Boundaries of Global Business. New York, NY: PublicAffairs.
Rimalower, G. (2009). Tips for Writing a Document Destined for Translation. Intercom (pp. 21-22).
Roig Allué, B. (2017). The Reliability and Limitations of Google Translate: A Bilingual, Bidirectional and Genre-Based Evaluation.
Rourke, J. (2015). Does the perfect translation exist? Retrieved September 10, 2018, from https://silvertonguetranslations.com/perfect-translation/
Turovsky, B. (2016). Found in translation: More accurate, fluent sentences in Google Translate. Retrieved September 10, 2018, from https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/
Vélez, F. (2016). Antes de Babel (pp. 3-21). Granada: Comares.
Vilar, D., Xu, J., D'Haro, L., & Ney, H. (2006). Error analysis of statistical machine translation output. Proceedings Of LREC, 697-702.
Vivien, J. (2013). A Loosely-Defined Controlled Language Template can help the English-to-French Machine Translation of Non-Technical Texts.
8. Appendices

Appendix A: Corpus of texts used for the translation analysis

TOPIC | TEXT | RATING (1 TO 5)
Recipes | 1. https://www.allrecipes.com/recipe/260527/maple-bacon-crepe-stack/ |
Recipes | 2. https://www.allrecipes.com/recipe/241528/fudgy-nutella-mug-cake/ |
Recipes | 3. https://www.allrecipes.com/recipe/23600/worlds-best-lasagna/ |
Recipes | 4. https://www.allrecipes.com/recipe/93234/honey-walnut-shrimp/ |
Recipes | 5. https://www.allrecipes.com/recipe/14746/mushroom-pork-chops/ |
Recipes | 6. https://www.allrecipes.com/recipe/21313/banana-pudding-iii/ |
News | 1. https://www.bbc.com/news/world-europe-44531448 |
News | 2. https://www.bbc.com/news/uk-politics-44532500 |
News | 3. https://www.bbc.com/news/world-us-canada-44538110 |
Medicine | 1. https://www.medicinesforchildren.org.uk/amoxicillin-bacterial-infections-0 |
Medicine | 2. https://www.medicinesforchildren.org.uk/metformin-diabetes |
Medicine | 3. https://www.medicinesforchildren.org.uk/topiramate-preventing-seizures |
Appendix B: Questions asked during the interviews

1. Taking into account that only a few people in the company in China can speak English, do you think it is important to translate content intended for sales representatives into their native language (Chinese)? Why?
2. Which party should do this translation: France (with the source language), China (with the target language), internally (by the services responsible for the documents…) or externally (translation agency…)?
3. Generally, are translations done directly from French to Chinese, or from French to English to Chinese?
4. What do you think is the biggest challenge when you translate English to Chinese?
5. Some words are quite difficult or impossible to translate (e.g. product name, brand…). In those cases, who decides on the final name or translation?
6. To your knowledge, were there any cases of mistranslation? What were the impacts?
7. Do you think that translation mistakes due to the translator not speaking perfect English or Chinese matter a lot?
8. Do you think that if the person making the document (e.g. e-learning) knew Chinese, it could make the translation process easier?
9. Do you think that if the person integrating the translated text into the document knew Chinese, it could make the translation process easier?
10. What do you think about the use of machine translators in the company?