How do you pronounce your name? Improving G2P with transliterations
Aditya Bhargava and Grzegorz Kondrak
Department of Computing Science, University of Alberta
Edmonton, Alberta, Canada, T6G 2E8
{abhargava,kondrak}@cs.ualberta.ca

Abstract

Grapheme-to-phoneme conversion (G2P) of names is an important and challenging problem. The correct pronunciation of a name is often reflected in its transliterations, which are expressed within a different phonological inventory. We investigate the problem of using transliterations to correct errors produced by state-of-the-art G2P systems. We present a novel re-ranking approach that incorporates a variety of score and n-gram features, in order to leverage transliterations from multiple languages. Our experiments demonstrate significant accuracy improvements when re-ranking is applied to n-best lists generated by three different G2P programs.

1 Introduction

Grapheme-to-phoneme conversion (G2P), in which the aim is to convert the orthography of a word to its pronunciation (phonetic transcription), plays an important role in speech synthesis and understanding. Names, which comprise over 75% of unseen words (Black et al., 1998), present a particular challenge to G2P systems because of their high pronunciation variability. Guessing the correct pronunciation of a name is often difficult, especially if it is of foreign origin; this is attested by the ad hoc transcriptions which sometimes accompany new names introduced in news articles, especially for international stories with many foreign names.

Transliterations provide a way of disambiguating the pronunciation of names. They are more abundant than phonetic transcriptions, for example when news items of international or global significance are reported in multiple languages. In addition, writing scripts such as Arabic, Korean, or Hindi are more consistent and easier to identify than various phonetic transcription schemes. The process of transliteration, also called phonetic translation (Li et al., 2009b), involves "sounding out" a name and then finding the closest possible representation of the sounds in another writing script. Thus, the correct pronunciation of a name is partially encoded in the form of the transliteration. For example, given the ambiguous letter-to-phoneme mapping of the English letter g, the initial phoneme of the name Gershwin may be predicted by a G2P system to be either /ɡ/ (as in Gertrude) or /dʒ/ (as in Gerald). The transliterations of the name in other scripts provide support for the former (correct) alternative.

Although it seems evident that transliterations should be helpful in determining the correct pronunciation of a name, designing a system that takes advantage of this insight is not trivial. The main source of the difficulty stems from the differences between the phonologies of distinct languages. The mappings between phonemic inventories are often complex and context-dependent. For example, because Hindi has no /w/ sound, the transliteration of Gershwin instead uses a symbol that represents the phoneme /ʋ/, similar to the /v/ phoneme in English. In addition, converting transliterations into phonemes is often non-trivial; although few orthographies are as inconsistent as that of English, this is effectively the G2P task for the particular language in question.

In this paper, we demonstrate that leveraging transliterations can, in fact, improve the grapheme-to-phoneme conversion of names. We propose a novel system based on discriminative re-ranking that is capable of incorporating multiple transliterations. We show that simplistic approaches to the problem fail to achieve the same goal, and that transliterations from multiple languages are more helpful than from a single language. Our approach can be combined with any G2P system that produces n-best lists instead of single outputs. The experiments that we perform demonstrate significant error reduction for three very different G2P base systems.
2 Improving G2P with transliterations

2.1 Problem definition

In both G2P and machine transliteration, we are interested in learning a function that, given an input sequence x, produces an output sequence y. In the G2P task, x is composed of graphemes and y is composed of phonemes; in transliteration, both sequences consist of graphemes but they represent different writing scripts. Unlike in machine translation, the monotonicity constraint is enforced; i.e., we assume that x and y can be aligned without the alignment links crossing each other (Jiampojamarn and Kondrak, 2010). We assume that we have available a base G2P system that produces an n-best list of outputs with a corresponding list of confidence scores. The goal is to improve the base system's performance by applying existing transliterations of the input x to re-rank the system's n-best output list.

2.2 Similarity-based methods

A simple and intuitive approach to improving G2P with transliterations is to select from the n-best list the output sequence that is most similar to the corresponding transliteration. For example, the Hindi transliteration in Figure 1 is arguably closest in perceptual terms to the phonetic transcription of the second output in the n-best list, as compared to the other outputs. One obvious problem with this method is that it ignores the relative ordering of the n-best lists and their corresponding scores produced by the base system.

A better approach is to combine the similarity score with the output score from the base system, allowing it to contribute an estimate of confidence in its output. For this purpose, we apply a linear combination of the two scores, where a single parameter λ, ranging between zero and one, determines the relative weight of the scores. The exact value of λ can be optimized on a training set. This approach is similar to the method used by Finch and Sumita (2010) to combine the scores of two different machine transliteration systems.

2.3 Measuring similarity

The approaches presented in the previous section crucially depend on a method for computing the similarity between various symbol sequences that represent the same word. If we have a method of converting transliterations to phonetic representations, the similarity between two sequences of phonemes can be computed with a simple method such as normalized edit distance or the longest common subsequence ratio, which take into account the number and position of identical phonemes. Alternatively, we could apply a more complex approach, such as ALINE (Kondrak, 2000), which computes the distance between pairs of phonemes. However, the implementation of a conversion program would require ample training data or language-specific expertise.

A more general approach is to skip the transcription step and compute the similarity between phonemes and graphemes directly. For example, the edit distance function can be learned from a training set of transliterations and their phonetic transcriptions (Ristad and Yianilos, 1998). In this paper, we apply M2M-ALIGNER (Jiampojamarn et al., 2007), an unsupervised aligner, which is a many-to-many generalization of the learned edit distance algorithm. M2M-ALIGNER was originally designed to align graphemes and phonemes, but can be applied to discover the alignment between any sets of symbols (given training data). The logarithm of the probability assigned to the optimal alignment can then be interpreted as a similarity measure between the two sequences.

2.4 Discriminative re-ranking

The methods described in Section 2.2, which are based on the similarity between outputs and transliterations, are difficult to generalize when multiple transliterations of a single name are available. A linear combination is still possible but in this case optimizing the parameters would no longer be straightforward. Also, we are interested in utilizing other features besides sequence similarity.
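As a concrete illustration, the similarity-based selection and linear combination of Section 2.2 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and toy scores are invented, base-system scores are assumed to be normalized to [0, 1], and plain normalized edit distance stands in for the similarity function (the paper also considers the longest common subsequence ratio, ALINE, and learned alignment scores).

```python
# Sketch of the similarity-based baseline: pick the n-best output that
# maximizes a linear combination of the base system's (normalized) score
# and the similarity to the transliteration's phonetic transcription.

def edit_distance(a, b):
    """Standard Levenshtein distance between two symbol sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

def similarity(a, b):
    """Normalized edit-distance similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

def rerank(nbest, translit_phonemes, lam=0.5):
    """Combine base score and similarity with weight lam in [0, 1]."""
    return max(
        nbest,
        key=lambda item: lam * item[1]
        + (1 - lam) * similarity(item[0], translit_phonemes),
    )

# Toy n-best list for "Gershwin": (phoneme sequence, normalized base score).
nbest = [("dʒ ɜː ʃ w ɪ n".split(), 0.6), ("ɡ ɜː ʃ w ɪ n".split(), 0.4)]
best = rerank(nbest, "ɡ ʌ r ʃ ʋ ɪ n".split(), lam=0.3)
```

With a low λ the transliteration evidence outweighs the base score and the /ɡ/-initial candidate is selected, even though the base system preferred the /dʒ/-initial one.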
The SVM re-ranking paradigm offers a solution to the problem. Our re-ranking system is informed by a large number of features, which are based on scores and n-grams. The scores are of three types:

1. The scores produced by the base system for each output in the n-best list.

2. The similarity scores between the outputs and each available transliteration.

3. The differences between scores in the n-best lists for both (1) and (2).

Our set of binary n-gram features includes those used for DIRECTL+ (Jiampojamarn et al., 2010). They can be divided into four types:

1. The context features combine output symbols (phonemes) with n-grams of varying sizes in a window of size c centred around a corresponding position on the input side.

2. The transition features are bigrams on the output (phoneme) side.

3. The linear chain features combine the context features with the bigram transition features.

4. The joint n-gram features are n-grams containing both input and output symbols.

We apply the features in a new way: instead of being applied strictly to a given input-output set, we expand their use across many languages and use all of them simultaneously. We apply the n-gram features across all transliteration-transcription pairs in addition to the usual input-output pairs corresponding to the n-best lists. Figure 1 illustrates the set of pairs used for feature generation.

[Figure 1. Input: Gershwin. N-best outputs: /d͡ʒɜːʃwɪn/, /ɡɜːʃwɪn/, /d͡ʒɛɹʃwɪn/. Transliterations: Hindi (/ɡʌrʃʋɪn/), Japanese ガーシュウィン (/ɡaːɕuwiɴ/), and Russian Гершвин (/ɡerʂvin/).]

Figure 1: An example name showing the data used for feature construction. Each arrow links a pair used to generate features, including n-gram and score features. The score features use similarity scores for transliteration-transcription pairs and system output scores for input-output pairs. One feature vector is constructed for each system output.

In this paper, we augment the n-gram features by a set of reverse features. Unlike a traditional G2P generator, our re-ranker has access to the outputs produced by the base system. By swapping the input and the output side, we can add reverse context and linear-chain features. Since the n-gram features are also applied to transliteration-transcription pairs, the reverse features enable us to include features which bind a variety of n-grams in the transliteration string with a single corresponding phoneme.

The construction of n-gram features presupposes a fixed alignment between the input and output sequences. If the base G2P system does not provide input-output alignments, we use M2M-ALIGNER for this purpose. The transliteration-transcription pairs are also aligned by M2M-ALIGNER, which at the same time produces the corresponding similarity scores. (We set a lower limit of −100 on the M2M-ALIGNER scores.) If M2M-ALIGNER is unable to produce an alignment, we indicate this with a binary feature that is included with the n-gram features.
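The four binary n-gram feature types can be illustrated with a small sketch. The feature-name strings here are invented, and a one-to-one alignment is assumed for simplicity; the paper itself uses many-to-many alignments produced by M2M-ALIGNER.

```python
# Illustrative sketch of the four n-gram feature types (context, transition,
# linear chain, joint) extracted from one aligned input-output pair.

def ngram_features(inp, out, c=1):
    """Extract binary feature names from aligned input/output symbol lists."""
    assert len(inp) == len(out), "a fixed one-to-one alignment is assumed"
    feats = set()
    for i, (x, y) in enumerate(zip(inp, out)):
        # Context features: the output symbol paired with input n-grams of
        # varying sizes inside a window of radius c around position i.
        for start in range(max(0, i - c), i + 1):
            for end in range(i + 1, min(len(inp), i + c + 1) + 1):
                feats.add(f"ctx:{''.join(inp[start:end])}->{y}")
        if i > 0:
            # Transition features: output-side bigrams.
            feats.add(f"trans:{out[i - 1]},{y}")
            # Linear chain features: context conjoined with the transition.
            feats.add(f"chain:{x}->{out[i - 1]},{y}")
            # Joint n-gram features: n-grams over input-output pairs.
            feats.add(f"joint:{inp[i - 1]}{x}/{out[i - 1]}{y}")
    return feats

# Aligned pair for the first three letters of "Gershwin".
feats = ngram_features(list("ger"), ["ɡ", "ɜː", "ɹ"], c=1)
```

The reverse features described above would be obtained by calling the same extractor with the input and output sides swapped.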
3 Experiments

We perform several experiments to evaluate our transliteration-informed approaches. We test simple similarity-based approaches on single-transliteration data, and evaluate our SVM re-ranking approach against this as well. We then test our approach using all available transliterations. Relevant code and scripts required to reproduce our experimental results are available online at http://www.cs.ualberta.ca/~ab31/g2p-tl-rr.

3.1 Data & setup

For pronunciation data, we extracted all names from the Combilex corpus (Richmond et al., 2009). We discarded all diacritics, duplicates and multi-word names, which yielded 10,084 unique names. Both the similarity and SVM methods require transliterations for identifying the best candidates in the n-best lists. They are therefore trained and evaluated on the subset of the G2P corpus for which transliterations are available. Naturally, allowing transliterations from all languages results in a larger corpus than the one obtained by the intersection with transliterations from a single language.

For our experiments, we split the data into 10% for testing, 10% for development, and 80% for training. The development set was used for initial tests and experiments, and then for our final results the training and development sets were combined into one set for final system training. For SVM re-ranking, during both development and testing we split the training set into 10 folds; this is necessary when training the re-ranker as it must have system output scores that are representative of the scores on unseen data. We ensured that there was never any overlap between the training and testing data for all trained systems.

Our transliteration data come from the shared tasks on transliteration at the 2009 and 2010 Named Entities Workshops (Li et al., 2009a; Li et al., 2010). We use all of the 2010 English-source data plus the English-to-Russian data from 2009, which makes nine languages in total. In cases where the data provide alternative transliterations for a given input, we keep only one; our preliminary experiments indicated that including alternative transliterations did not improve performance. It should be noted that these transliteration corpora are noisy: Jiampojamarn et al. (2009) note a significant increase in English-to-Hindi transliteration performance with a simple cleaning of the data.

Language   Corpus size   Overlap
Bengali       12,785       1,840
Chinese       37,753       4,713
Hindi         12,383       2,179
Japanese      26,206       4,773
Kannada       10,543       1,918
Korean         6,761       3,015
Russian        6,447         487
Tamil         10,646       1,922
Thai          27,023       5,436

Table 1: The number of unique single-word entries in the transliteration corpora for each language and the amount of common data (overlap) with the pronunciation data.

Our tests involving transliterations from multiple languages are performed on the set of names for which we have both the pronunciation and transliteration data. There are 7,423 names in the G2P corpus for which at least one transliteration is available. Table 1 lists the total size of the transliteration corpora as well as the amount of overlap with the G2P data. Note that the base G2P systems are trained using all 10,084 names in the corpus as opposed to only the 7,423 names for which there are transliterations available. This ensures that the G2P systems have more training data to provide the best possible base performance.

For our single-language experiments, we normalize the various scores when tuning the linear combination parameter λ so that we can compare values across different experimental conditions. For SVM re-ranking, we directly implement the method of Joachims (2002) to convert the re-ranking problem into a classification problem, and then use the very fast LIBLINEAR (Fan et al., 2008) to build the SVM models. Optimal hyperparameter values were determined during development.

We evaluate using word accuracy, the percentage of words for which the pronunciations are correctly predicted. This measure marks pronunciations that are even slightly different from the correct one as incorrect, so even a small change in pronunciation that might be acceptable or even unnoticeable to humans would count against the system's performance.
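The reduction of re-ranking to classification (Joachims, 2002) mentioned above can be sketched roughly as follows; the feature names and dictionary-based representation are illustrative, not the paper's actual implementation. Each training example is the difference between the feature vectors of a better-ranked and a worse-ranked output for the same name; a linear model trained on these differences can then score n-best outputs directly.

```python
# Sketch of the Joachims (2002)-style pairwise reduction of re-ranking
# to binary classification.

def pairwise_examples(candidates):
    """`candidates` is a list of (feature_dict, is_correct) pairs for the
    n-best outputs of one name. Yields (difference_vector, label) training
    examples for a linear classifier such as LIBLINEAR."""
    correct = [f for f, good in candidates if good]
    wrong = [f for f, good in candidates if not good]
    for fc in correct:
        for fw in wrong:
            keys = set(fc) | set(fw)
            diff = {k: fc.get(k, 0.0) - fw.get(k, 0.0) for k in keys}
            yield diff, +1                           # correct beats wrong
            yield {k: -v for k, v in diff.items()}, -1  # mirrored example

# Toy n-best list: one wrong and one correct candidate with two features.
examples = list(
    pairwise_examples(
        [({"base_score": 0.6, "sim": 0.4}, False),
         ({"base_score": 0.4, "sim": 0.7}, True)]
    )
)
```

At test time, the learned weight vector is applied to each candidate's feature vector and the n-best list is re-sorted by the resulting scores.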
3.2 Base systems

It is important to test multiple base systems in order to ensure that any gain in performance applies to the task in general and not just to a particular system. We use three G2P systems in our tests:

1. FESTIVAL (FEST), a popular speech synthesis package, which implements G2P conversion with CARTs (decision trees) (Black et al., 1998).

2. SEQUITUR (SEQ), a generative system based on the joint n-gram approach (Bisani and Ney, 2008).

3. DIRECTL+ (DTL), the discriminative system on which our n-gram features are based (Jiampojamarn et al., 2010).

All systems are capable of providing n-best output lists along with scores for each output, although for FESTIVAL they had to be constructed from the list of output probabilities for each input character.

We run DIRECTL+ with all of the features described in (Jiampojamarn et al., 2010) (i.e., context features, transition features, linear chain features, and joint n-gram features). System parameters, such as the maximum number of iterations, were determined during development. For SEQUITUR, we keep default options except for the enabling of the 10 best outputs, and we convert the probabilities assigned to the outputs to log-probabilities. We set SEQUITUR's joint n-gram order to 6 (this was also determined during development).

Note that the three base systems differ slightly in terms of the alignment information that they provide in their outputs. FESTIVAL operates letter-by-letter, so we use the single-letter inputs with the phoneme outputs as the aligned units. DIRECTL+ specifies many-to-many alignments in its output. For SEQUITUR, however, since it provides no information regarding the output structure, we use M2M-ALIGNER to induce alignments for n-gram feature generation.

3.3 Transliterations from a single language

The goal of the first experiment is to compare several similarity-based methods, and to determine how they compare to our re-ranking approach. In order to find the similarity between phonetic transcriptions, we use the two different methods described in Section 2.3: ALINE and M2M-ALIGNER. We further test the use of a linear combination of the similarity scores with the base system's score so that its confidence information can be taken into account; the linear combination weight is determined from the training set. These methods are referred to as ALINE+BASE and M2M+BASE. For these experiments, our training and testing sets are obtained by intersecting our G2P training and testing sets respectively with the Hindi transliteration corpus, yielding 1,950 names for training and 229 names for testing.

Since the similarity-based methods are designed to incorporate homogeneous same-script transliterations, we can only run this experiment on one language at a time. Furthermore, ALINE operates on phoneme sequences, so we first need to convert the transliterations to phonemes. An alternative would be to train a proper G2P system, but this would require a large set of word-pronunciation pairs. For this experiment, we choose Hindi, for which we constructed a rule-based G2P converter. Aside from simple one-to-one mapping (romanization) rules, the converter has about ten rules to adjust for context.

For these experiments, we apply our SVM re-ranking method in two ways:

1. Using only Hindi transliterations (referred to as SVM-HINDI).

2. Using all available languages (referred to as SVM-ALL).

In both cases, the test set is restricted to the same 229 names, in order to provide a valid comparison.

Table 2 presents the results. Regardless of the choice of the similarity function, the simplest approaches fail in a spectacular manner, significantly reducing the accuracy with respect to the base system. The linear combination methods give mixed results, improving the accuracy for FESTIVAL but not for SEQUITUR or DIRECTL+ (although the differences are not statistically significant). However, they perform much better than the methods based on similarity scores alone, as they are able to take advantage of the base system's output scores. If we look at the values of λ that provide the best performance on the training set, we find that they are higher for the stronger base systems, indicating more reliance on the base system output scores.
For example, for ALINE+BASE the FESTIVAL-based system has λ = 0.58 whereas the DIRECTL+-based system has λ = 0.81. Counter-intuitively, the ALINE+BASE and M2M+BASE methods are unable to improve upon SEQUITUR or DIRECTL+. We would expect to achieve at least the base system's performance, but disparities between the training and testing sets prevent this.

              Base system
            FEST   SEQ   DTL
Base        58.1  67.3  71.6
ALINE       28.0  26.6  27.5
M2M         39.3  36.2  36.2
ALINE+BASE  58.5  65.9  71.2
M2M+BASE    58.5  66.4  70.3
SVM-HINDI   63.3  69.0  69.9
SVM-ALL     68.6  72.5  75.6

Table 2: Word accuracy (in percentages) of various methods when only Hindi transliterations are used.

The two SVM-based methods achieve much better results. SVM-ALL produces impressive accuracy gains for all three base systems, while SVM-HINDI yields smaller (but still statistically significant) improvements for FESTIVAL and SEQUITUR. These results suggest that our re-ranking method provides a bigger boost to systems built with different design principles than to DIRECTL+, which utilizes a similar set of features. On the other hand, the results also show that the information obtained by consulting a single transliteration may be insufficient to improve an already high-performing G2P converter.

3.4 Transliterations from multiple languages

Our second experiment expands upon the first; we use all available transliterations instead of being restricted to one language. This rules out the simple similarity-based approaches, but allows us to test our re-ranking approach in a way that fully utilizes the available data. We test three variants of our transliteration-informed SVM re-ranking approach, which differ with respect to the set of included features:

1. SVM-SCORE includes only the three types of score features described in Section 2.4.

2. SVM-N-GRAM uses only the n-gram features.

3. SVM-ALL is the full system that combines the score and n-gram features.

The objective is to determine the degree to which each of the feature classes contributes to the overall results. Because we are using all available transliterations, we achieve much greater coverage over our G2P data than in the previous experiment; in this case, our training set consists of 6,660 names while the test set has 763 names.

              Base system
            FEST   SEQ   DTL
Base        55.3  66.5  70.8
SVM-SCORE   62.1  68.4  71.0
SVM-N-GRAM  66.2  72.5  73.8
SVM-ALL     67.2  73.4  74.3

Table 3: Word accuracy of the base system versus the re-ranking variants with transliterations from multiple languages.

Table 3 presents the results. Note that the baseline accuracies are somewhat lower than in Table 2 because of the different test set. We find that, when using all features, the SVM re-ranker can provide a very impressive error reduction over FESTIVAL (26.7%) and SEQUITUR (20.7%) and a smaller but still significant (p < 0.01 with the McNemar test) error reduction over DIRECTL+ (12.1%).

When we consider our results using only the score and n-gram features, we can see that, interestingly, the n-gram features are most important. We draw a further conclusion from our results: consider the large disparity in improvements over the base systems. This indicates that FESTIVAL and SEQUITUR are benefiting from the DIRECTL+-style features used in the re-ranking. Without the n-gram features, however, there is still a significant improvement over FESTIVAL, demonstrating that the scores do provide useful information. In this case there is no way for DIRECTL+-style information to make its way into the re-ranking; the process is based purely on the transliterations and their similarities with the transcriptions in the output lists, indicating that the system is capable of extracting useful information directly from transliterations. In the case of DIRECTL+, the transliterations help through the n-gram features rather than the score features; this is probably because the crucial feature that signals the inability of M2M-ALIGNER to align a given transliteration-transcription pair belongs to the set of the n-gram features. Both the n-gram features and score features are dependent on the alignments, but they differ in that the n-gram features allow weights to be learned for local n-gram pairs whereas the score features are based on global information, providing only a single feature for a given transliteration-transcription pair. The two therefore overlap to some degree, although the score features still provide useful information via probabilities learned during the alignment training process.
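A rough sketch of the global score features discussed above, for a single n-best candidate, might look as follows. The −100 floor on aligner scores and the binary alignment-failure indicator come from the text; the feature names, the dictionary representation, and the exact treatment of failed alignments are illustrative assumptions.

```python
# Sketch of the global score features for one candidate pronunciation:
# the base-system score, one alignment-based similarity score per available
# transliteration, and a binary flag when no alignment could be produced.

SCORE_FLOOR = -100.0  # lower limit placed on M2M-ALIGNER log-probabilities

def score_features(base_score, align_scores):
    """`align_scores` is a list of alignment log-probabilities, one per
    transliteration, with None marking an alignment failure."""
    feats = {"base_score": base_score}
    for i, s in enumerate(align_scores):
        if s is None:
            feats[f"align_failed_{i}"] = 1.0      # binary failure indicator
            feats[f"align_score_{i}"] = SCORE_FLOOR
        else:
            feats[f"align_score_{i}"] = max(s, SCORE_FLOOR)
    return feats

# Candidate with three transliterations; the second one failed to align.
f = score_features(-2.5, [-7.3, None, -150.0])
```

In the full system these global features would be concatenated with the local n-gram features before the pairwise SVM training step.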
A closer look at the results provides additional insight into the operation of our re-ranking system. For example, consider the name Bacchus, which DIRECTL+ incorrectly converts into /bæktʃəs/. The most likely reason why our re-ranker selects instead the correct pronunciation /bækəs/ is that M2M-ALIGNER fails to align three of the five available transliterations with /bæktʃəs/. Such alignment failures are caused by a lack of evidence for the mapping of the grapheme representing the sound /k/ in the transliteration training data with the phoneme /tʃ/. In addition, the lack of alignments prevents any n-gram features from being enabled.

Considering the difficulty of the task, the top accuracy of almost 75% is quite impressive. In fact, many instances of human transliterations in our corpora are clearly incorrect. For example, the Hindi transliteration of Bacchus contains the /tʃ/ consonant instead of the correct /k/. Moreover, our strict evaluation based on word accuracy counts all system outputs that fail to exactly match the dictionary data as errors. The differences are often very minor and may reflect an alternative pronunciation. The phoneme accuracy of our best result is 93.1%, which provides some idea of how similar the predicted pronunciation is to the correct one. (The phoneme accuracy is calculated from the minimum edit distance between the predicted and correct pronunciations.)

3.5 Effect of multiple transliterations

One motivating factor for the use of SVM re-ranking was the ability to incorporate multiple transliteration languages. But how important is it to use more than one language? To examine this question, we look particularly at the sets of names having at most k transliterations available.

# TL   # Entries   Improvement
≤ 1        111         0.9
≤ 2        266         3.0
≤ 3        398         3.8
≤ 4        536         3.2
≤ 5        619         2.8
≤ 6        685         3.4
≤ 7        732         3.7
≤ 8        762         3.5
≤ 9        763         3.5

Table 4: Absolute improvement in word accuracy (%) over the base system (DIRECTL+) of the SVM re-ranker for various numbers of available transliterations.

Table 4 shows the results with DIRECTL+ as the base system. Note that the number of names with more than five transliterations was small. Importantly, we see that the increase in performance when only one transliteration is available is so small as to be insignificant. From this, we can conclude that obtaining improvement on the basis of a single transliteration is difficult in general. This corroborates the results of the experiment described in Section 3.3, where we used only Hindi transliterations.

4 Previous work

There are three lines of research that are relevant to our work: (1) G2P in general; (2) G2P on names; and (3) combining diverse data sources and/or systems.

The two leading approaches to G2P are represented by SEQUITUR (Bisani and Ney, 2008) and DIRECTL+ (Jiampojamarn et al., 2010). Recent comparisons suggest that the latter obtains somewhat higher accuracy, especially when it includes joint n-gram features (Jiampojamarn et al., 2010). Systems based on decision trees are far behind. Our results confirm this ranking.
Names can present a particular challenge to G2P systems. Kienappel and Kneser (2001) reported a higher error rate for German names than for general words, while on the other hand Black et al. (1998) report similar accuracy on names as for other types of English words. Yang et al. (2006) and van den Heuvel et al. (2007) post-process the output of a general G2P system with name-specific phoneme-to-phoneme (P2P) systems. They find significant improvement using this method on data sets consisting of Dutch first names, family names, and geographical names. However, it is unclear whether such an approach would be able to improve the performance of the current state-of-the-art G2P systems. In addition, the P2P approach works only on single outputs, whereas our re-ranking approach is designed to handle n-best output lists.

Although our approach is (to the best of our knowledge) the first to use different tasks (G2P and transliteration) to inform each other, it is conceptually similar to model and system combination approaches. In statistical machine translation (SMT), methods that incorporate translations from other languages (Cohn and Lapata, 2007) have proven effective in low-resource situations: when phrase translations are unavailable for a certain language, one can look at other languages where the translation is available and then translate from that language. A similar pivoting approach has also been applied to machine transliteration (Zhang et al., 2010). Notably, the focus of these works has been on cases in which there are less data available; they also modify the generation process directly, rather than operating on existing outputs as we do. Ultimately, a combination of the two approaches is likely to give the best results.

Finch and Sumita (2010) combine two very different approaches to transliteration using simple linear interpolation: they use SEQUITUR's n-best outputs and re-rank them using a linear combination of the original SEQUITUR score and the score for that output of a phrase-based SMT system. The linear weights are hand-tuned. We similarly use linear combinations, but with many more scores and other features, necessitating the use of SVMs to determine the weights. Importantly, we combine different data types where they combine different systems.

5 Conclusions & future work

In this paper, we explored the application of transliterations to G2P. We demonstrated that transliterations have the potential for helping choose between n-best output lists provided by standard G2P systems. Simple approaches based solely on similarity do not work when tested using a single transliteration language (Hindi), necessitating the use of smarter methods that can incorporate multiple transliteration languages. We apply SVM re-ranking to this task, enabling us to use a variety of features based not only on similarity scores but on n-grams as well. Our method shows impressive error reductions over the popular FESTIVAL system and the generative joint n-gram SEQUITUR system. We also find significant error reduction using the state-of-the-art DIRECTL+ system. Our analysis demonstrated that it is essential to provide the re-ranking system with transliterations from multiple languages in order to mitigate the differences between phonological inventories and smooth out noise in the transliterations.

In the future, we plan to generalize our approach so that it can be applied to the task of generating transliterations, and to combine data from distinct G2P dictionaries. The latter task is related to the notion of domain adaptation. We would also like to apply our approach to web data; we have shown that it is possible to use noisy transliteration data, so it may be possible to leverage the noisy ad hoc pronunciation data as well. Finally, we plan to investigate earlier integration of such external information into the G2P process for single systems; while we noted that re-ranking provides a general approach applicable to any system that can generate n-best lists, there is a limit as to what re-ranking can do, as it relies on the correct output existing in the n-best list. Modifying existing systems would provide greater potential for improving results even though the changes would be necessarily system-specific.

Acknowledgements

We are grateful to Sittichai Jiampojamarn and Shane Bergsma for the very helpful discussions. This research was supported by the Natural Sciences and Engineering Research Council of Canada.
References

Maximilian Bisani and Hermann Ney. 2008. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 50(5):434–451, May.

Alan W. Black, Kevin Lenzo, and Vincent Pagel. 1998. Issues in building general letter to sound rules. In The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis, Jenolan Caves House, Blue Mountains, New South Wales, Australia, November.

Trevor Cohn and Mirella Lapata. 2007. Machine translation by triangulation: Making effective use of multi-parallel corpora. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 728–735, Prague, Czech Republic, June. Association for Computational Linguistics.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874.

Andrew Finch and Eiichiro Sumita. 2010. Transliteration using a phrase-based statistical machine translation system to re-score the output of a joint multigram model. In Proceedings of the 2010 Named Entities Workshop (NEWS 2010), pages 48–52, Uppsala, Sweden, July. Association for Computational Linguistics.

Sittichai Jiampojamarn and Grzegorz Kondrak. 2010. Letter-phoneme alignment: An exploration. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 780–788, Uppsala, Sweden, July. Association for Computational Linguistics.

Sittichai Jiampojamarn, Grzegorz Kondrak, and Tarek Sherif. 2007. Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 372–379, Rochester, New York, USA, April. Association for Computational Linguistics.

Sittichai Jiampojamarn, Aditya Bhargava, Qing Dou, Kenneth Dwyer, and Grzegorz Kondrak. 2009. DirecTL: A language independent approach to transliteration. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009), pages 28–31, Suntec, Singapore, August. Association for Computational Linguistics.

Sittichai Jiampojamarn, Colin Cherry, and Grzegorz Kondrak. 2010. Integrating joint n-gram features into a discriminative training framework. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 697–700, Los Angeles, California, USA, June. Association for Computational Linguistics.

Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142, Edmonton, Alberta, Canada. Association for Computing Machinery.

Anne K. Kienappel and Reinhard Kneser. 2001. Designing very compact decision trees for grapheme-to-phoneme transcription. In EUROSPEECH-2001, pages 1911–1914, Aalborg, Denmark, September.

Grzegorz Kondrak. 2000. A new algorithm for the alignment of phonetic sequences. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics, pages 288–295, Seattle, Washington, USA, April.

Haizhou Li, A Kumaran, Vladimir Pervouchine, and Min Zhang. 2009a. Report of NEWS 2009 machine transliteration shared task. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009), pages 1–18, Suntec, Singapore, August. Association for Computational Linguistics.

Haizhou Li, A Kumaran, Min Zhang, and Vladimir Pervouchine. 2009b. Whitepaper of NEWS 2009 machine transliteration shared task. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009), pages 19–26, Suntec, Singapore, August. Association for Computational Linguistics.

Haizhou Li, A Kumaran, Min Zhang, and Vladimir Pervouchine. 2010. Report of NEWS 2010 transliteration generation shared task. In Proceedings of the 2010 Named Entities Workshop (NEWS 2010), pages 1–11, Uppsala, Sweden, July. Association for Computational Linguistics.

Korin Richmond, Robert Clark, and Sue Fitt. 2009. Robust LTS rules with the Combilex speech technology lexicon. In Proceedings of Interspeech, pages 1295–1298, Brighton, UK, September.

Eric Sven Ristad and Peter N. Yianilos. 1998. Learning string edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):522–532, May.

Henk van den Heuvel, Jean-Pierre Martens, and Nanneke Konings. 2007. G2P conversion of names. What can we do (better)? In Proceedings of Interspeech, pages 1773–1776, Antwerp, Belgium, August.

Qian Yang, Jean-Pierre Martens, Nanneke Konings, and Henk van den Heuvel. 2006. Development of a phoneme-to-phoneme (p2p) converter to improve the grapheme-to-phoneme (g2p) conversion of names. In Proceedings of the 2006 International Conference on Language Resources and Evaluation, pages 2570–2573, Genoa, Italy, May.

Min Zhang, Xiangyu Duan, Vladimir Pervouchine, and Haizhou Li. 2010. Machine transliteration: Leveraging on third languages. In Coling 2010: Posters, pages 1444–1452, Beijing, China, August. Coling 2010 Organizing Committee.