COMMON LEXICAL ERRORS MADE BY MACHINE TRANSLATION ON CULTURAL TEXT
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Edulingua: Jurnal Linguistiks Terapan dan Pendidikan Bahasa Inggris | Vol 8. No. 1. Juli 2021 COMMON LEXICAL ERRORS MADE BY MACHINE TRANSLATION ON CULTURAL TEXT Nanda Fitri Mar'athus Sholikhah1, Rohmani Nur Indah2 1 Department of English Language Education, Faculty of Education and Teacher Training, State Islamic Institute of Kediri nandafitrims@gmail.com 2 Department of English Literature, Faculty of Humanities, Universitas Islam Negeri Maulana Malik Ibrahim Malang indah@bsi.uin-malang.ac.id ABSTRACT Machine translation is one tool of Google that presents various languages to translate. As a translator machine, the results of Google Translate are not always perfectly correct which is still needed to be revised. Arok Dedes story is one of the Javanese stories that contain elements of culture. Translating texts which contain elements of a culture is not easy because one region to another have different cultures, so that it is difficult to look for parallel words that contain elements of culture. This study is aimed at two main purposes: (1) finding out the types of lexical errors made by machine translation in translating cultural text and (2) knowing the most dominant type of lexical errors made by machine translation in translating cultural text. This study was carried out in a population of 553 pages of Arok Dedes story. A simple random sampling technique was done to select samples. The study results are that there are only 9 types of the total 21 types of lexical errors, namely calque, misselection, consonant-based type, false friend, vowel-based type, inappropriate co-hyponym statistically weighted preferences, semantically determined word selection, and preposition partners. The most dominant error of lexical errors is calque. Keywords: Lexical Errors; Machine Translation; Cultural Text Internet technology today allows everyone to presented on web pages on the internet can access information from all over the world be in multiple languages. Therefore, many anytime and anywhere. One of the tools that web visitors use machine translation help internet users find information assistance to help them translate from one effectively is Google. Google has a wide language to another. variety of applications and features that its Various types of machine translation users can take advantage of. In education, can be accessed through Google, such as Google is one of the media that is often Google Translate, Yandex Translate, visited by students and teachers. Google Translate.com, and Bing Microsoft developers recognize that the information Translator (Kumparan.com Tekno & Sains,
Jurnal Edulingua | Vol 4. N. 2 Juli - Desember 2017 2020). The four machine translations are the assisted translation (Keshavarz, 1999). This most frequently visited by Google users in theory can be applied in finding and Indonesia. Google Translate is a translation observing machine translation errors, which service that has been successfully recognized are then edited by human translators. by the world. This translation, which is This article analyzes the errors shaded by Google, has a database of up to translated by the machine translator, namely
Nanda Fitri Mar'athus Sholikhah dan Rohmani Nur Indah religion, social, customs, social organization, source is a book entitled "Arok Dedes." This procedures, sign language, and ecology. book was written by Pramoedya Ananta Machine translators cannot translate cultural Toer, who is one of the most prolific writers terms easily. Examples of cultural text in in the history of Indonesian literature. Javanese stories are terms of an object, Pramoedya has produced numorous works action, and behavior or technical terms that and translated into several foreign languages. express cultural meanings or are written in The instrument used in qualitative 41 | >> pure regional languages. research is a human instrument. Accordingly, The purpose of this study is to this study used a human instrument, namely identify the types of lexical errors made by the researchers, who act as an instrument to machine translators in translating cultural collect and analyze the data. Moreover, this texts in Pramoedya Ananta Toer's Arok research uses cultural texts that have been Dedes dialogue and to determine the most translated by machine translators to collect dominant types of lexical errors. This study the required data. Data collection techniques can provide useful information on how also cover tracing relevant informations from machine translation can be used effectively books, the internet, journals, articles, and when translating various texts, such as texts others. According to Sudaryanto, there are containing cultural elements. This study's five data collection strategies in linguistic findings can clarify the types of lexical errors research: recording techniques, recording found in the Arok Dedes dialogue created by techniques, separating data techniques, data Google Translate. Furthermore, the results transfer techniques, and replacing techniques can be used as a reference in the future to (Sudaryanto, 1992). In this study, the authors train machine translation users to be more only used three data collection strategies: 1) careful and have to draw on re-examining the recording techniques. This technique is used results of their translations so that they do to collect data by recording it using a not necessarily adopt or take the translation notebook. 2) Separation technique is a results. It will help them get a fast translation strategy to separate data from other data to even if they still have to make some edits find similarities and the distribution between and retouch the translation. Thus, this study's them: 3) Transfer technique, namely findings are expected to provide guidelines transferring data to other media. for various learning fields or language skills The data collection procedures of this and those interested in translation. study are as follows: first, the authors read the entire book Arok Dedes. Second, the RESEARCH METHOD writers gave a sign to the text to be analyzed, This section discusses the research design, namely the cultural text contained in Arok data sources, research instruments, data Dedes. Third, the writers listed the text that collection, and data analysis. This study used has been selected, which is a dialogue that a descriptive analysis method to describe the contains cultural texts. Fourth, the writers lexical error analysis results in Arok Dedes' translated the text into a translation machine, story dialogue. Researchers also use namely Google, Translate. Fifth, the authors qualitative descriptive because it tries to rewrote the cultural text with the machine describe the errors found in the object. This translator's translation to a new page as the study discusses machine translators' lexical first data. Sixth, the authors identified the errors in translating Arok Dedes' story machine translators' lexical errors in dialogue, which contains cultural texts. translating Arok Dedes' story dialogue. The data collected is in the form of Finally, the authors conducted data checking words and sentences. This study describes with the experts so that the data are valid. the types of lexical errors in translating In this research, the descriptive machines in translating dialogue in Arok qualitative method employed focuses on Dedes' stories. In this research, the data words, phrases, and sentences rather than Common Lexical Errors Made by Machine Translation on Cultural Text
Jurnal Edulingua | Vol 4. N. 2 Juli - Desember 2017 numbers. There are several procedures in analyzing data based on Keshavarz and Brown (in Hana Amanah, 2017), who explain that error analysis refers to collecting samples, identifying errors and classifying them into which category they belong to, and A. Formal errors
Nanda Fitri Mar'athus Sholikhah dan Rohmani Nur Indah this research is a qualitative descriptive study, the data results will be described qualitatively by presenting the frequency and percentage of errors. The frequency and percentage of each error category are the Table 1. The Frequency and Percentage of results of calculations using certain formulas Lexical Errors that have been mentioned in the previous Error Types Number 43 | >> Percent chapter. of (%) Errors The Types and The Dominant Type of (34) Lexical Errosmade by Google Translate in A. Formal errors Translating Cultural Text 1. Formal Misselection As shown in table 1, there are thirty-four a. Vowel-based type 1 3.00% lexical errors found from fourteen pages in b. Consonant-based Arok Dedes' story dialogue that contains type 1 3.00% cultural texts. Some errors are common (for c. False type example, calque, statistically weighted 2. Misformations 5 15.00% preferences, and semantically determined a. Calque word selection). Others are relatively rare 3. Distortion 9 27.00% (e.g., false type, misselection, consonant a. Misselection based type, vowel based type, inappropriate B. Semantic errors 4 12.00% co-hyponyms, and preposition partners). 1. Confusion of sense There were no errors in any other categories relation (for example, suffix types, prefixing types, a. Inappropriate co borrowing, coinage, omission, overinclusion, hyponyms 1 3.00% misordering, blending, a supermom for 2. Collocation errors hyponym, hyponym for superonym, wrong a. Semantically near-synonym, and arbitrary). determined word selection 6 18.00% b. Statistically weighted preferences 6 18.00% c. Preposition partners 1 3.00% Based on the table above, the authors provide further explanation as follows: a. Formal Errors In formal errors, the most problematic error category in the data is a calque (there are 27.00% of all errors), followed by false type (15.00%), then misselection (12.00%). The errors that rarely appear in formal errors are consonant based type and vowel based type, only 3.00% of the total. These findings indicate that the Indonesian language, especially the word structure of culture, is quite influential in making English Common Lexical Errors Made by Machine Translation on Cultural Text
Jurnal Edulingua | Vol 4. N. 2 Juli - Desember 2017 sentences. The five formal errors that arise in 4) Consonant Based Type this study will be discussed below by This error case is almost the same as the providing some examples and explanations vowel based type case. In this case, they are of how they occur and how they should almost the same shape but have different occur. consonants. Let's take an example: 1) Calque Source Text: Tak pernah yang mulia
Nanda Fitri Mar'athus Sholikhah dan Rohmani Nur Indah Statistically weighted preferences are words ears, and then you dare to speak like or phrases often used in conjunction with that? other words or phrases and sound natural and The words 'tell' and 'bitter' in the two appropriate for native speakers. An incorrect sentences above are examples of phrase may not be completely wrong; it is semantically word determined selection. In imprecise. the first sentence (the word 'tell'), let's see There are six statistically weighted how the first language was used before 45 | >> preferences found from thirty-four samples. translation. "Did Hyang Wisynu command We will discuss one of them. Here is a you to empower that eagle diligently too?" mistake: The word 'commission' means that someone Source Text: Bukankah Yang Mulia Akuwu is ordered to do something. Command here sudah cukup memuliakan kau, means action. If we use the word 'tell,' it is Dedes? Pramesywari Tumapel? not wrong, but it is inappropriate. The word Telah mengangkat naik kau dalam 'tell' here is only a command word, not an perkawinan kebesaran ini? action. So the suggested word to use is Google Translate: Is not your Honor Akuwu 'command/ask'. enough to glorify you, Dedes? In the second sentences (the word Pramesywari Tumapel? Has raised 'bitter'), it can be seen that several words you in this marriage of greatness? may have the same meaning. For example, In the case of these statistically the words 'bitter' and 'bad.' have negative weighted preferences, the phrase 'your meanings. Bitter is a word to express a Honor' is the mistake. The word 'honor' is negative that can be felt most clearly, such as not appropriate to attach to the word Akuwu, butter's bitterness. However, if we look at the a person who can glorify someone. This context of "That is the bitter result of Sri word is inaccurate because it is not an Erlangga's legacy.", It means that it is not appropriate adjective or combination for something that can be immediately felt 'Akuwu.' Even though we change the speech clearly by the human senses. However, the part of the word 'honor' to an adjective, it word is against bad circumstances or bad still does not make the phrase correct. The conditions. So the suggested translation is suggested translation is 'glorious.' "That is the bad consequence of Sri Erlangga" 2) Semantically Determined Word Selection 3) Preposition Partners This is an example of an error found in the Preposition adalah kata yang Reveal Arok Dedes dialog: relationships. Preposition partners relate to Source Text: prepositions that follow before or after a a) Apakah Hyang Wisynu menitahkan verb, adjective, or noun. This is an example agar kalian memelihara elang itu of the preposition partners found in the Arok dengan tekun juga? Dedes dialog by Google Translate: b) Itulah akibat pahit peninggalan Sri Source Text: Setidak-tidaknya dari Hyang Erlangga.Apakah gurumu yang lama, Bthara Guru aku tahu, dua hari lagi Tantripala, yang membisikan itu kalian akan mendapat perintah pada kupingmu, maka kau berani untuk mengangkut upeti ke Kediri. bicara seperti itu? Dari Hyang Wisynu aku tahu, kalian Google Translate: akan lakukan itu dengan patuh. a) Did Hyang Wisynu tell you to keep Google Translate: At least from Hyang the eagle diligently too? Bthara Guru I know, in two days b) That's the bitter consequence of Sri you will get an order to bring tribute Erlangga. What is your old teacher, to Kediri. From Hyang Wisynu I Tantripala, who whispers it to your know you will do it obediently. Common Lexical Errors Made by Machine Translation on Cultural Text
Jurnal Edulingua | Vol 4. N. 2 Juli - Desember 2017 The preposition 'in' in a sentence is shown in table 1, there were a total of 34 incorrect because the preposition 'in' is not a lexical errors found in Arok Dedes preposition to indicate time. The preposition Pramoedya Ananta Toer's story dialogue, 'in' is a preposition that indicates direction. which was translated by Google Translate. Actually, in this sentence, the word above The most dominant type of lexical error is means 'for two days.' There is a time calque.
Nanda Fitri Mar'athus Sholikhah dan Rohmani Nur Indah For example, the translation, "By morphologically similar, and as a result, Hyang Wisynu, on the closing day of this machine translators think they can correctly Brahmancarya ..." is translated as "For the produce the target language. Another sake of Hyang Wisynu, on the day of closing common error that occurs is an 'incorrect of this Brahman ...". This is an interesting collocation.' There could be several reasons mistake made by Google Translate for for this collocation error. In the first place, failing to recognize that the word there may be L1 interference, and the 47 | >> "Brahmancarya" refers to "Brahman." In resulting collocation is the result of direct fact, the two mean very different words. translation. Another possible reason is Thus, the source text, which is literally underdeveloped knowledge of the word. The translated or mirrored, results in the target word may be in their productive vocabulary, text's wrong text structure, which also affects but machine translation may not understand its meaning. Based on these findings, formal the different connotation coloring, thus errors were more frequent than semantic placing the word in the wrong context. errors. This shows that knowledge of There are several mistakes in the morphology is more difficult than semantic 'word-formation.' This type of error knowledge. Besides, it proves that Google frequently occurs for two reasons. The first Translate cannot translate cultural elements is a misapplication of the L2 derived rule, or easily. sometimes, machine translation applies the Lexical errors have been proven to be L1 derived rule to produce an L2 target major errors in the dialogue of the translation word. This is outside the scope of this study of Arok Dedes Pramoedya Ananta Toer's but will be an interesting research topic for a story, which was translated by Google different study. In summary, half of all errors Translate. Lexical errors contribute to nearly made are classified as lexical errors. The half of all errors made. A closer look at the most common type of lexical error data shows that the second problem type of encountered has to deal with a limited lexical error is semantic. This type of error understanding of the semantic range of the can be attributed to underdeveloped words and how they intersect with other vocabulary knowledge. Machine translation words. It was also found that the most uses words in the semantically correct field, significant mistake Google Translate made but the connotative meaning of the words was in the wrong word order. In translation used does not fit the context. This error is units (in dialogs containing cultural text), considered an error in the connotation of the Google Translate cannot recognize the word culture. There are two possible reasons source text's root text due to the order of the for this. The first reason may be that machine elements that appear completely independent translations do not have the same word count or are used for different lexical purposes. to cover the semantic field. The second This section also presents the results reason that makes sense is that the machine of previous research in the form of Farah translator does not fully know the word. It Hana Amanah, Error Made by Google means that they do not know the appropriate Translate and It is Rectification by Human collocates. Translators, Faculty of Languages and An important part of knowing a word Linguistics University of Malaya Kuala is when, where, and how to use it. Formal Lumpur, 2017 where three human translators errors make the most common lexical errors. rectify or edit Google Translation raw The reasons for this error can be myriad. The outputs with access to the source text. Error authors may have accessed the wrong word corrections made by human translators are in their mental lexicon because of how it was made based on a taxonomy of error stored. Perhaps the word is included in their categorization. The results show that human receptive vocabulary. The target language translators produce accurate output and the resulting language are compared to Google translations. The reason Common Lexical Errors Made by Machine Translation on Cultural Text
Jurnal Edulingua | Vol 4. N. 2 Juli - Desember 2017 is because human translators know more Foreign Language (EFL).' This research about the context of the text than Google demonstrates the importance of Translate. understanding how the lexis is acquired and Error analysis is collecting samples, identifying where learning has not taken identifying errors, and classifying errors, and place and therefore the areas of teaching finally evaluating the errors (which means and/or remedial correction, hope that this
Nanda Fitri Mar'athus Sholikhah dan Rohmani Nur Indah implications. It was found that a large Less common lexical error subtypes are false number of lexical errors were made. More friend types, misselection, consonant based than 50% of lexical errors relate to learners types, vowel based types, inappropriate co- not understanding the semantic ranges of hyponyms, and preposition partners. The words and not understanding the other categories do not appear at all. They corresponding word sets. Based on these are the suffix type, prefix type, borrowing, findings, several approaches and activities coinage, omission, overinclusion, 49 | >> are provided for use with ALL. The focus of misordering, blending, a superonym for this activity is to create individualized and hyponyms, a hyponym for a supermom, differentiated instruction through the use of wrong near-synonyms, and arbitrary student writing and goal setting. This activity combination. Of the two main types of also provides deeper vocabulary knowledge lexical errors, formal errors are more to ALLs using semantic mapping, studying problematic than semantic errors found in collocations, and using concordances. the Arok Dedes dialog. As found in this study, teachers can Google translate does not easily use exercises to help students differentiate translate morphological knowledge rather between minimal pairs and increase their than semantics. Google is a great source of morphological awareness when teaching information and knowledge. From Google, vocabulary and spelling. To deal with we can know and learn all the sciences. One collocation errors, students can be informed of the Google tools that helps internet users about the corporation's value and can be and has common uses is Google translate. encouraged to access the corporation online This is because Google translate presents and use these facilities when studying various languages. However, as a machine collocation. Also, students at beginner and translator, Google Translate's results are not lower intermediate levels can initially entirely correct. Therefore, it is suggested to memorize word pieces in the learning readers, especially users, to be more collocation. With this in mind, we suggest corrective and careful in adopting them. that teachers use a taxonomy of lexical error During this research, the researcher or develop their own in their vocabulary found another problem in Arok Dedes' teaching. We firmly believe that this dialogue, which was translated by Google taxonomy serves not only as a research tool Translate. Another problem that arises is but, more importantly, as a learning tool that how to revise the results of Google teachers should use. When used effectively, Translate, which have many errors and still this lexical error taxonomy can help students need correction. Therefore, the authors improve their metacognitive skills in suggest that other researchers conduct further recognizing and perhaps even correcting research on this matter. Furthermore, Google their own mistakes. This could be a way to Translate has difficulty with certain types of help minimize lexical error fossilization. errors, as this study shows. It would be interesting to test other machine translations CONCLUSION to compare accuracy rates to see which of After conducting a study of lexical errors, them can produce the translation output with the authors came to several conclusions. This the fewest errors. Further research may also conclusion is to answer the research explore other text type genres and use larger problems of this study. Of the twenty-one volumes of text. subtypes of lexical errors, only nine subtypes appeared in the Arok Dedes dialogue, and ACKNOWLEDGEMENT the rest did not appear at all. Some of Arok The publication of this paper is inseparable Dedes' dialogue's common mistakes are from the guidance and support of various calque, statistical weight preferences, and parties. Therefore, on this occasion the semantically determined word selection. author would like to extend the deepest Common Lexical Errors Made by Machine Translation on Cultural Text
Jurnal Edulingua | Vol 4. N. 2 Juli - Desember 2017 appreciation to the advisors, Mr Bahruddin, who always guide and give suggestions. S.S, M.Pd and Mr Chothibul Umam, M.Pd REFERENCES Amini, R., & Bayesteh, J. (2020). Investigating Error Analysis in Interlinguistic English
Nanda Fitri Mar'athus Sholikhah dan Rohmani Nur Indah 51 | >> Common Lexical Errors Made by Machine Translation on Cultural Text
You can also read