COMMON LEXICAL ERRORS MADE BY MACHINE TRANSLATION ON CULTURAL TEXT

Edulingua: Jurnal Linguistiks Terapan dan Pendidikan Bahasa Inggris | Vol 8. No. 1. Juli 2021

COMMON LEXICAL ERRORS MADE BY MACHINE TRANSLATION
                ON CULTURAL TEXT

Nanda Fitri Mar'athus Sholikhah¹, Rohmani Nur Indah²
¹ Department of English Language Education, Faculty of Education and Teacher Training, State Islamic Institute of Kediri
nandafitrims@gmail.com
² Department of English Literature, Faculty of Humanities, Universitas Islam Negeri Maulana Malik Ibrahim Malang
indah@bsi.uin-malang.ac.id

                                          ABSTRACT
Machine translation is one of the tools Google offers, supporting translation across many languages. As a machine translator, Google Translate does not always produce correct results, and its output still needs revision. The Arok Dedes story is a Javanese story that contains cultural elements. Translating texts that contain cultural elements is not easy because cultures differ from one region to another, which makes it difficult to find equivalent words for culture-bound terms. This study has two main purposes: (1) to find out the types of lexical errors made by machine translation in translating cultural text and (2) to identify the most dominant type of lexical error made by machine translation in translating cultural text. The population of this study was the 553 pages of the Arok Dedes story, and samples were selected using a simple random sampling technique. The results show that only 9 of the 21 types of lexical errors occur, namely calque, misselection, consonant-based type, false friend, vowel-based type, inappropriate co-hyponym, statistically weighted preferences, semantically determined word selection, and preposition partners. The most dominant lexical error is calque.

Keywords: Lexical Errors; Machine Translation; Cultural Text

Internet technology today allows everyone to access information from all over the world anytime and anywhere. One of the tools that help internet users find information effectively is Google. Google has a wide variety of applications and features that its users can take advantage of. In education, Google is one of the media that is often visited by students and teachers. Google developers recognize that the information presented on web pages on the internet can be in multiple languages. Therefore, many web visitors use machine translation assistance to help them translate from one language to another.
        Various types of machine translation can be accessed through Google, such as Google Translate, Yandex Translate, Translate.com, and Bing Microsoft Translator (Kumparan.com Tekno & Sains,
Jurnal Edulingua | Vol 4. N. 2 Juli - Desember 2017

2020). The four machine translators are the most frequently visited by Google users in Indonesia. Google Translate is a translation service that has been successfully recognized by the world. This service, which is run by Google, has a database of up to

assisted translation (Keshavarz, 1999). This theory can be applied in finding and observing machine translation errors, which are then edited by human translators.
        This article analyzes the errors made by the machine translator, namely
Nanda Fitri Mar'athus Sholikhah dan Rohmani Nur Indah

religion, social, customs, social organization, procedures, sign language, and ecology. Machine translators cannot translate cultural terms easily. Examples of cultural text in Javanese stories are terms for an object, action, or behavior, or technical terms that express cultural meanings or are written in pure regional languages.
        The purpose of this study is to identify the types of lexical errors made by machine translators in translating cultural texts in the dialogue of Pramoedya Ananta Toer's Arok Dedes and to determine the most dominant types of lexical errors. This study can provide useful information on how machine translation can be used effectively when translating various texts, such as texts containing cultural elements. This study's findings can clarify the types of lexical errors found in the Arok Dedes dialogue as translated by Google Translate. Furthermore, the results can be used as a reference in the future to train machine translation users to be more careful and to re-examine the results of their translations rather than adopting them as they are. It will help them get a fast translation even if they still have to edit and retouch it. Thus, this study's findings are expected to provide guidelines for various learning fields or language skills and for those interested in translation.

RESEARCH METHOD
This section discusses the research design, data sources, research instruments, data collection, and data analysis. This study used a descriptive analysis method to describe the results of the lexical error analysis of the Arok Dedes story dialogue. The researchers also use a qualitative descriptive approach because the study tries to describe the errors found in the object. This study discusses machine translators' lexical errors in translating the Arok Dedes story dialogue, which contains cultural texts.
        The data collected are in the form of words and sentences. This study describes the types of lexical errors made by machine translation in translating dialogue in the Arok Dedes story. In this research, the data source is a book entitled "Arok Dedes." This book was written by Pramoedya Ananta Toer, who is one of the most prolific writers in the history of Indonesian literature. Pramoedya has produced numerous works, which have been translated into several foreign languages.
        The instrument used in qualitative research is a human instrument. Accordingly, this study used a human instrument, namely the researchers, who act as an instrument to collect and analyze the data. Moreover, this research uses cultural texts that have been translated by machine translators to collect the required data. Data collection techniques also cover tracing relevant information from books, the internet, journals, articles, and other sources. According to Sudaryanto, there are five data collection strategies in linguistic research: recording techniques, recording techniques, separating data techniques, data transfer techniques, and replacing techniques (Sudaryanto, 1992). In this study, the authors only used three data collection strategies: 1) Recording technique, which is used to collect data by recording it in a notebook. 2) Separation technique, a strategy to separate data from other data to find similarities and the distribution between them. 3) Transfer technique, namely transferring data to other media.
        The data collection procedures of this study are as follows: first, the authors read the entire book Arok Dedes. Second, the authors marked the text to be analyzed, namely the cultural text contained in Arok Dedes. Third, the authors listed the selected text, which is dialogue that contains cultural texts. Fourth, the authors entered the text into a translation machine, namely Google Translate. Fifth, the authors rewrote the cultural text with the machine translator's translation on a new page as the first data. Sixth, the authors identified the machine translator's lexical errors in translating the Arok Dedes story dialogue. Finally, the authors checked the data with experts so that the data are valid.
        In this research, the descriptive qualitative method employed focuses on words, phrases, and sentences rather than

                                Common Lexical Errors Made by Machine Translation on Cultural Text

numbers. There are several procedures in analyzing data based on Keshavarz and Brown (in Hana Amanah, 2017), who explain that error analysis refers to collecting samples, identifying errors and classifying them into the categories they belong to, and

A. Formal errors

this research is a qualitative descriptive study, the data results will be described qualitatively by presenting the frequency and percentage of errors. The frequency and percentage of each error category are the results of calculations using certain formulas that have been mentioned in the previous chapter.

The Types and the Dominant Type of Lexical Errors Made by Google Translate in Translating Cultural Text
As shown in Table 1, there are thirty-four lexical errors found in fourteen pages of the Arok Dedes story dialogue that contain cultural texts. Some errors are common (for example, calque, statistically weighted preferences, and semantically determined word selection). Others are relatively rare (e.g., false friend, misselection, consonant-based type, vowel-based type, inappropriate co-hyponyms, and preposition partners). There were no errors in any other categories (for example, suffix types, prefixing types, borrowing, coinage, omission, overinclusion, misordering, blending, superonym for hyponym, hyponym for superonym, wrong near-synonym, and arbitrary combination).

Table 1. The Frequency and Percentage of Lexical Errors

Error Types                                         Number of Errors (34)    Percent (%)
A. Formal errors
   1. Formal Misselection
      a. Vowel-based type                                     1                 3.00%
      b. Consonant-based type                                 1                 3.00%
      c. False friend                                         5                15.00%
   2. Misformations
      a. Calque                                               9                27.00%
   3. Distortion
      a. Misselection                                         4                12.00%
B. Semantic errors
   1. Confusion of sense relation
      a. Inappropriate co-hyponyms                            1                 3.00%
   2. Collocation errors
      a. Semantically determined word selection               6                18.00%
      b. Statistically weighted preferences                   6                18.00%
      c. Preposition partners                                 1                 3.00%
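
The frequency-to-percentage step behind Table 1 can be sketched in a few lines of Python. This is only an illustration, not the authors' formula: the dictionary name, the function name, and the pairing of each count with its subtype are assumptions, the pairing being inferred from the prose (calque most frequent, then false friend, then misselection). Exact shares can differ slightly from the printed, rounded figures; for example, 9 of 34 is about 26.47%.

```python
# Counts per subtype as read from Table 1 (hypothetical names; the pairing
# of counts with subtypes is an assumption based on the surrounding prose).
ERROR_COUNTS = {
    "vowel-based type": 1,
    "consonant-based type": 1,
    "false friend": 5,  # subtype named as in the abstract
    "calque": 9,
    "misselection": 4,
    "inappropriate co-hyponyms": 1,
    "semantically determined word selection": 6,
    "statistically weighted preferences": 6,
    "preposition partners": 1,
}


def error_percentages(counts):
    """Return each subtype's share of all recorded errors, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no errors recorded")
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    shares = error_percentages(ERROR_COUNTS)
    for name, n in ERROR_COUNTS.items():
        print(f"{name:<42}{n:>3}{shares[name]:>8.2f}%")
```

Keeping the raw counts and deriving the percentages from them makes it easy to recheck the table whenever an error is reclassified.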

        Based on the table above, the authors provide further explanation as follows:
a. Formal Errors
In formal errors, the most problematic error category in the data is calque (27.00% of all errors), followed by false friend (15.00%), then misselection (12.00%). The errors that rarely appear in formal errors are the consonant-based type and the vowel-based type, each only 3.00% of the total. These findings indicate that the Indonesian language, especially the word structure of cultural terms, is quite influential in making English


sentences. The five formal errors that arise in this study will be discussed below by providing some examples and explanations of how they occur and how they should occur.
1) Calque

4) Consonant Based Type
This error case is almost the same as the vowel based type case. In this case, the words have almost the same form but different consonants. Let's take an example:
Source Text: Tak pernah yang mulia

Statistically weighted preferences are words or phrases often used in conjunction with other words or phrases that sound natural and appropriate to native speakers. An incorrect phrase may not be completely wrong; it is imprecise.
        There are six statistically weighted preferences found from thirty-four samples. We will discuss one of them. Here is the mistake:
Source Text: Bukankah Yang Mulia Akuwu sudah cukup memuliakan kau, Dedes? Pramesywari Tumapel? Telah mengangkat naik kau dalam perkawinan kebesaran ini?
Google Translate: Is not your Honor Akuwu enough to glorify you, Dedes? Pramesywari Tumapel? Has raised you in this marriage of greatness?
        In the case of these statistically weighted preferences, the phrase 'your Honor' is the mistake. The word 'honor' is not appropriate to attach to the word Akuwu, a person who can glorify someone. This word is inaccurate because it is not an appropriate adjective or combination for 'Akuwu.' Even if we change the part of speech of the word 'honor' to an adjective, it still does not make the phrase correct. The suggested translation is 'glorious.'

2) Semantically Determined Word Selection
This is an example of an error found in the Arok Dedes dialog:
Source Text:
   a) Apakah Hyang Wisynu menitahkan agar kalian memelihara elang itu dengan tekun juga?
   b) Itulah akibat pahit peninggalan Sri Erlangga. Apakah gurumu yang lama, Tantripala, yang membisikan itu pada kupingmu, maka kau berani bicara seperti itu?
Google Translate:
   a) Did Hyang Wisynu tell you to keep the eagle diligently too?
   b) That's the bitter consequence of Sri Erlangga. What is your old teacher, Tantripala, who whispers it to your ears, and then you dare to speak like that?
        The words 'tell' and 'bitter' in the two sentences above are examples of semantically determined word selection. In the first sentence (the word 'tell'), let us see how the source language was used before translation: "Did Hyang Wisynu command you to empower that eagle diligently too?" The word 'command' means that someone is ordered to do something. Command here means action. If we use the word 'tell,' it is not wrong, but it is inappropriate. The word 'tell' here is only a command word, not an action. So the suggested word to use is 'command/ask'.
        In the second sentence (the word 'bitter'), it can be seen that several words may have the same meaning. For example, the words 'bitter' and 'bad' have negative meanings. Bitter is a word that expresses something negative that can be felt most clearly, such as butter's bitterness. However, if we look at the context of "That is the bitter result of Sri Erlangga's legacy.", it does not mean something that can be immediately and clearly felt by the human senses. Rather, the word refers to bad circumstances or bad conditions. So the suggested translation is "That is the bad consequence of Sri Erlangga."

3) Preposition Partners
A preposition is a word that reveals relationships. Preposition partners relate to prepositions that come before or after a verb, adjective, or noun. This is an example of the preposition partners found in the Arok Dedes dialog as translated by Google Translate:
Source Text: Setidak-tidaknya dari Hyang Bthara Guru aku tahu, dua hari lagi kalian akan mendapat perintah untuk mengangkut upeti ke Kediri. Dari Hyang Wisynu aku tahu, kalian akan lakukan itu dengan patuh.
Google Translate: At least from Hyang Bthara Guru I know, in two days you will get an order to bring tribute to Kediri. From Hyang Wisynu I know you will do it obediently.


        The preposition 'in' in a sentence is incorrect because the preposition 'in' is not a preposition to indicate time. The preposition 'in' is a preposition that indicates direction. Actually, in this sentence, the word above means 'for two days.' There is a time

shown in Table 1, there were a total of 34 lexical errors found in the dialogue of Pramoedya Ananta Toer's Arok Dedes, which was translated by Google Translate. The most dominant type of lexical error is calque.
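
The claim that calque is the dominant type, and the comparison of formal against semantic errors, can be rechecked with a small tally. The following is a minimal sketch, not the authors' procedure: the names `FORMAL`, `SEMANTIC`, `UNOBSERVED`, and `dominant_subtype` are hypothetical, and the counts are assumed to pair with subtypes as the prose reports (calque 9, false friend 5, misselection 4).

```python
from collections import Counter

# Observed subtypes grouped under the two main error types
# (assumed pairing of counts and subtypes, inferred from the prose).
FORMAL = Counter({
    "vowel-based type": 1,
    "consonant-based type": 1,
    "false friend": 5,
    "calque": 9,
    "misselection": 4,
})
SEMANTIC = Counter({
    "inappropriate co-hyponyms": 1,
    "semantically determined word selection": 6,
    "statistically weighted preferences": 6,
    "preposition partners": 1,
})

# Subtypes the text reports as never occurring in the sample.
UNOBSERVED = [
    "suffix type", "prefixing type", "borrowing", "coinage",
    "omission", "overinclusion", "misordering", "blending",
    "superonym for hyponym", "hyponym for superonym",
    "wrong near-synonym", "arbitrary combination",
]


def dominant_subtype(*groups):
    """Merge the groups and return the (subtype, count) with the highest count."""
    merged = Counter()
    for group in groups:
        merged.update(group)  # Counter.update adds counts together
    return merged.most_common(1)[0]


if __name__ == "__main__":
    print("dominant:", dominant_subtype(FORMAL, SEMANTIC))
    print("formal total:", sum(FORMAL.values()))
    print("semantic total:", sum(SEMANTIC.values()))
    print("subtypes in taxonomy:", len(FORMAL) + len(SEMANTIC) + len(UNOBSERVED))
```

Under these assumed counts, the formal group totals 20 errors and the semantic group 14, which agrees with the statement that formal errors are the more frequent of the two.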

        For example, the translation, "By Hyang Wisynu, on the closing day of this Brahmancarya ..." is translated as "For the sake of Hyang Wisynu, on the day of closing of this Brahman ...". This is an interesting mistake made by Google Translate, which fails to distinguish the word "Brahmancarya" from "Brahman." In fact, the two words have very different meanings. Thus, the source text, which is literally translated or mirrored, results in a wrong structure in the target text, which also affects its meaning. Based on these findings, formal errors were more frequent than semantic errors. This shows that knowledge of morphology is more difficult than semantic knowledge. Besides, it proves that Google Translate cannot translate cultural elements easily.
        Lexical errors have been proven to be major errors in the dialogue of Pramoedya Ananta Toer's Arok Dedes as translated by Google Translate. Lexical errors contribute to nearly half of all errors made. A closer look at the data shows that the second most problematic type of lexical error is semantic. This type of error can be attributed to underdeveloped vocabulary knowledge. Machine translation uses words in the semantically correct field, but the connotative meaning of the words used does not fit the context. This error is considered an error in the connotation of the cultural word. There are two possible reasons for this. The first reason may be that machine translation does not have a sufficient stock of words to cover the semantic field. The second plausible reason is that the machine translator does not fully know the word, meaning that it does not know the appropriate collocates.
        An important part of knowing a word is knowing when, where, and how to use it. Formal errors are the most common lexical errors. The reasons for this error can be myriad. The authors may have accessed the wrong word in their mental lexicon because of how it was stored. Perhaps the word is included in their receptive vocabulary. The target language and the resulting language are morphologically similar, and as a result, machine translators think they can correctly produce the target language. Another common error that occurs is an 'incorrect collocation.' There could be several reasons for this collocation error. In the first place, there may be L1 interference, and the resulting collocation is the result of direct translation. Another possible reason is underdeveloped knowledge of the word. The word may be in their productive vocabulary, but machine translation may not understand the different connotative coloring, thus placing the word in the wrong context.
        There are several mistakes in 'word-formation.' This type of error frequently occurs for two reasons. The first is a misapplication of the L2 derivation rule; alternatively, machine translation sometimes applies the L1 derivation rule to produce an L2 target word. This is outside the scope of this study but will be an interesting research topic for a different study. In summary, half of all errors made are classified as lexical errors. The most common type of lexical error encountered has to do with a limited understanding of the semantic range of the words and how they intersect with other words. It was also found that the most significant mistake Google Translate made was wrong word order. In translation units (in dialogs containing cultural text), Google Translate cannot recognize the root of the source text due to the order of the elements, which appear completely independent or are used for different lexical purposes.
        This section also presents the results of previous research by Farah Hana Amanah, Errors Made by Google Translate and Its Rectification by Human Translators, Faculty of Languages and Linguistics, University of Malaya, Kuala Lumpur, 2017, in which three human translators rectify or edit Google Translate raw outputs with access to the source text. Error corrections made by human translators are based on a taxonomy of error categorization. The results show that human translators produce more accurate output than Google Translate. The reason


is because human translators know more about the context of the text than Google Translate.
        Error analysis is collecting samples, identifying errors, and classifying errors, and finally evaluating the errors (which means

Foreign Language (EFL).' This research demonstrates the importance of understanding how the lexis is acquired and identifying where learning has not taken place and therefore the areas of teaching and/or remedial correction, hope that this

implications. It was found that a large number of lexical errors were made. More than 50% of lexical errors relate to learners not understanding the semantic ranges of words and not understanding the corresponding word sets. Based on these findings, several approaches and activities are provided for use with ALL. The focus of this activity is to create individualized and differentiated instruction through the use of student writing and goal setting. This activity also provides deeper vocabulary knowledge to ALLs using semantic mapping, studying collocations, and using concordances.
        As found in this study, teachers can use exercises to help students differentiate between minimal pairs and increase their morphological awareness when teaching vocabulary and spelling. To deal with collocation errors, students can be informed about the value of corpora and can be encouraged to access corpora online and use these facilities when studying collocation. Also, students at beginner and lower intermediate levels can initially memorize word chunks when learning collocation. With this in mind, we suggest that teachers use a taxonomy of lexical errors or develop their own in their vocabulary teaching. We firmly believe that this taxonomy serves not only as a research tool but, more importantly, as a learning tool that teachers should use. When used effectively, this lexical error taxonomy can help students improve their metacognitive skills in recognizing and perhaps even correcting their own mistakes. This could be a way to help minimize lexical error fossilization.

CONCLUSION
After conducting a study of lexical errors, the authors came to several conclusions, which answer the research problems of this study. Of the twenty-one subtypes of lexical errors, only nine subtypes appeared in the Arok Dedes dialogue, and the rest did not appear at all. The common mistakes in the Arok Dedes dialogue are calque, statistically weighted preferences, and semantically determined word selection. Less common lexical error subtypes are false friend, misselection, consonant-based type, vowel-based type, inappropriate co-hyponyms, and preposition partners. The other categories do not appear at all. They are the suffix type, prefix type, borrowing, coinage, omission, overinclusion, misordering, blending, superonym for hyponym, hyponym for superonym, wrong near-synonym, and arbitrary combination. Of the two main types of lexical errors, formal errors are more problematic than semantic errors in the Arok Dedes dialog.
        Google Translate handles morphological knowledge less easily than semantics. Google is a great source of information and knowledge. From Google, we can know and learn all the sciences. One of the Google tools that helps internet users and has common uses is Google Translate. This is because Google Translate presents various languages. However, as a machine translator, Google Translate's results are not entirely correct. Therefore, it is suggested that readers, especially users, be more critical and careful in adopting them.
        During this research, the researchers found another problem in the Arok Dedes dialogue translated by Google Translate, namely how to revise the results of Google Translate, which have many errors and still need correction. Therefore, the authors suggest that other researchers conduct further research on this matter. Furthermore, Google Translate has difficulty with certain types of errors, as this study shows. It would be interesting to test other machine translators to compare accuracy rates and see which of them can produce the translation output with the fewest errors. Further research may also explore other text types and genres and use larger volumes of text.

ACKNOWLEDGEMENT
The publication of this paper is inseparable from the guidance and support of various parties. Therefore, on this occasion the author would like to extend the deepest appreciation to the advisors, Mr Bahruddin, S.S, M.Pd and Mr Chothibul Umam, M.Pd, who always guide and give suggestions.

        REFERENCES
        Amini, R., & Bayesteh, J. (2020). Investigating Error Analysis in Interlinguistic English