A Survey on Retrieval-Augmented Text Generation
Huayang Li♥,∗  Yixuan Su♠,∗  Deng Cai♦,∗  Yan Wang♣,∗  Lemao Liu♣,∗
♥ Nara Institute of Science and Technology  ♠ University of Cambridge  ♦ The Chinese University of Hong Kong  ♣ Tencent AI Lab
li.huayang.lh6@is.naist.jp, ys484@cam.ac.uk, thisisjcykcd@gmail.com, brandenwang@tencent.com, lemaoliu@gmail.com
∗ All authors contributed equally.

Abstract

Recently, retrieval-augmented text generation has attracted increasing attention from the computational linguistics community. Compared with conventional generation models, retrieval-augmented text generation has remarkable advantages and, in particular, has achieved state-of-the-art performance in many NLP tasks. This paper aims to conduct a survey of retrieval-augmented text generation. It first highlights the generic paradigm of retrieval-augmented generation, and then reviews notable approaches according to different tasks, including dialogue response generation, machine translation, and other generation tasks. Finally, it points out some important directions on top of recent methods to facilitate future research.

1 Introduction

Retrieval-augmented text generation, as a new text generation paradigm that fuses emerging deep learning technology and traditional retrieval technology, has achieved state-of-the-art (SOTA) performance in many NLP tasks and attracted the attention of the computational linguistics community (Weston et al., 2018; Dinan et al., 2018; Cai et al., 2021). Compared with its generation-based counterpart, this new paradigm has some remarkable advantages: 1) knowledge need not be implicitly stored in model parameters, but can be explicitly acquired in a plug-and-play manner, leading to great scalability; 2) instead of generating from scratch, the paradigm generates text from retrieved human-written references, which potentially alleviates the difficulty of text generation.

This paper aims to review representative approaches for retrieval-augmented text generation tasks, including dialogue response generation (Weston et al., 2018), machine translation (Gu et al., 2018), and others (Hashimoto et al., 2018). We first present the generic paradigm of retrieval-augmented generation as well as three key components under this paradigm: retrieval sources, retrieval metrics, and generation models. Then, we introduce notable methods for retrieval-augmented generation, organized with respect to different tasks. Specifically, on the dialogue response generation task, exemplar/template retrieval as an intermediate step has been shown to benefit informative response generation (Weston et al., 2018; Wu et al., 2019; Cai et al., 2019a,b). In addition, there has been growing interest in knowledge-grounded generation exploring different forms of knowledge such as knowledge bases and external documents (Dinan et al., 2018; Zhou et al., 2018; Lian et al., 2019; Li et al., 2019; Qin et al., 2019; Wu et al., 2021; Zhang et al., 2021). On the machine translation task, we summarize the early work on how retrieved sentences (called translation memory) are used to improve statistical machine translation (SMT; Koehn et al., 2003) models (Simard and Isabelle, 2009; Koehn and Senellart, 2010), and in particular we highlight several popular methods for integrating translation memory into NMT models (Gu et al., 2018; Zhang et al., 2018; Xu et al., 2020; He et al., 2021). We also review applications of retrieval-augmented generation in other generation tasks such as abstractive summarization (Peng et al., 2019), code generation (Hashimoto et al., 2018), paraphrase generation (Kazemnejad et al., 2020; Su et al., 2021b), and knowledge-intensive generation (Lewis et al., 2020b).
Finally, we point out some promising directions for retrieval-augmented generation to push forward future research.

2 Retrieval-augmented Paradigm

2.1 Formulation and Motivation

Most text generation tasks can be formulated as a mapping from an input sequence x to an output sequence y: y = f(x). For example, x and y could be the dialogue history and its response in dialogue generation, or sequences in the source language and the target language in machine translation, and so on.
[Figure 1: The overview of this survey. Retrieval sources (Sec. 2.2): training corpus, external data, unsupervised data; retrieval metrics (Sec. 2.3): sparse-vector retrieval, dense-vector retrieval, task-specific retrieval; integration (Sec. 2.4): data augmentation, attention mechanism, skeletons & templates; tasks: dialogue generation (Sec. 3), machine translation (Sec. 4), other tasks (Sec. 5). Pipeline: input → retrieval over sources → retrieval memory → generation model → output.]

Recently, some researchers have suggested endowing models with the capability to access external memory via information retrieval techniques, so that they can acquire more information during the generation process (Gu et al., 2018; Weston et al., 2018; Cai et al., 2019b). Retrieval-augmented generation can then be formulated as:

y = f(x, z)    (1)

where z = {⟨x^r, y^r⟩} is a set of relevant instances retrieved from the original training set or from external datasets. The main idea of this paradigm is that y^r may benefit the generation of y if x^r (or y^r) is highly relevant to the input x. It is worth noting that x^r = ∅ when unsupervised retrieval sources are used. More details about how to obtain z are discussed in §2.3.
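To make the formulation concrete, below is a minimal sketch of the generic interface implied by Equation 1. It is our own illustration rather than code from any of the surveyed systems; `similarity` and `generator` are placeholders for the retrieval metrics of §2.3 and the integration methods of §2.4.

```python
from typing import Callable, List, Tuple

Memory = List[Tuple[str, str]]  # (x_r, y_r) pairs; x_r may be empty for unsupervised sources


def retrieve(x: str, memory: Memory, similarity: Callable[[str, str], float], k: int = 3) -> Memory:
    """Return the k memory entries whose source side is most similar to x."""
    return sorted(memory, key=lambda pair: similarity(x, pair[0]), reverse=True)[:k]


def retrieval_augmented_generate(x: str, memory: Memory, similarity, generator) -> str:
    """y = f(x, z): condition the generator on the input x and the retrieved instances z."""
    z = retrieve(x, memory, similarity)
    return generator(x, z)


# Toy usage with placeholder components.
memory = [("how are you", "i am fine, thanks"), ("what is your name", "my name is bot")]
overlap = lambda a, b: len(set(a.split()) & set(b.split()))
generator = lambda x, z: f"<response conditioned on {len(z)} retrieved pairs>"
print(retrieval_augmented_generate("how are you doing", memory, overlap, generator))
```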
In this section, we briefly introduce the basic IR techniques involved. In general, the retrieval memory can be drawn from three kinds of sources: the training corpus, external datasets in the same format as the training corpus, and large-scale unsupervised corpora (§2.2). Metrics that evaluate the relevance between texts vary as well; in §2.3 we divide them into three categories: sparse-vector retrieval, dense-vector retrieval, and training-based retrieval. Finally, how to integrate the retrieval memory into the generation model is also significant, and we introduce some popular integration approaches in §2.4.

2.2 Retrieval Sources

Most previous studies search the external memory from the training corpus (Song et al., 2016; Gu et al., 2018; Weston et al., 2018). At inference time, retrieved examples with high relevance scores can be regarded as extra references and reduce the model's uncertainty in generation. The main motivation of these works is to store knowledge not only in the model parameters but also in an explicit and accessible form, so that the model can re-access it during inference.

Some researchers also propose to retrieve relevant samples from external datasets (Su et al., 2021c; Xiao et al., 2021). In these studies, the retrieval pool is different from the training corpus and can therefore provide additional information that is not contained in the training corpus. This is especially beneficial for applications such as domain adaptation and knowledge update. For example, Khandelwal et al. (2020a) and Zheng et al. (2021a) employ an in-domain dataset as the external memory to achieve fast domain adaptation for machine translation.

One limitation of the previous two sources is that the datasets have to be supervised, consisting of aligned input-output pairs. For machine translation, Cai et al. (2021) propose a cross-lingual retriever to directly retrieve target sentences from an unsupervised monolingual corpus. The main idea is to align source-side sentences and the corresponding target-side translations in a dense vector space, i.e., to align x and y^r when x^r is absent. As a result, the retriever directly connects the source-side input with target-side translations, enabling monolingual data in the target language to be used alone as memory.
2.3 Retrieval Metrics

Given an input sequence x and a retrieval corpus, the retrieval model aims to retrieve a set of relevant examples z = {⟨x^r, y^r⟩} from the corpus. When a supervised corpus is used, {⟨x^r, y^r⟩} is retrieved by measuring the similarity between x and x^r.

For similarity measurement, sparse-vector retrieval methods such as TF-IDF and BM25 (Robertson and Zaragoza, 2009) are widely used. They match keywords efficiently with an inverted index. However, these methods prefer examples with similar surface forms and may fail to retrieve examples that are only semantically relevant.

To alleviate this problem, some studies (Cao and Xiong, 2018) attempt to retrieve in a dense-vector space instead of relying on lexical overlap. Recent work (Lee et al., 2019) makes use of pre-trained language models, encoding the text into low-dimensional dense vectors via BERT-based encoders. The retrieval scores are then computed via inner products between vectors.

Similarity-based retrieval rests on a simple heuristic: the more x^r resembles x, the more likely x^r and y^r will help the generation. However, the most similar example under a universal textual similarity does not necessarily serve the downstream model best. Ideally, the retrieval metric would be learned from the data in a task-dependent way: we wish to consider a memory only if it can indeed boost the quality of the final generation. Cai et al. (2021) propose to unify the memory retriever and its downstream NMT model into a learnable whole, so that memory retrieval is optimized end-to-end for task-specific objectives.
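As a concrete illustration of dense-vector retrieval, the sketch below scores candidates by the inner product of sentence vectors. It is our own toy example: the hashed bag-of-words `encode` function stands in for the BERT-based encoders mentioned above, and a production system would replace the exhaustive scoring loop with an approximate nearest-neighbor index.

```python
import numpy as np


def encode(sentence: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a learned dense encoder: a hashed bag-of-words vector."""
    vec = np.zeros(dim)
    for token in sentence.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec


def dense_retrieve(query: str, corpus: list, k: int = 2):
    """Rank corpus entries by the inner product between query and candidate vectors."""
    q = encode(query)
    scored = [(float(np.dot(q, encode(c))), c) for c in corpus]
    return sorted(scored, reverse=True)[:k]


corpus = ["i would like to book a flight", "the weather is nice today", "please reserve an air ticket"]
print(dense_retrieve("how do i book an air ticket", corpus))
```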
2.4 Integration

There are several ways to integrate the retrieved external memory into generation. One straightforward way is data augmentation, which constructs augmented inputs by concatenating spans from {⟨x^r, y^r⟩} with the original input x. By training on the augmented inputs, a generation model implicitly learns how to integrate the retrieved information. Despite its simplicity, this kind of method works well in many tasks (Song et al., 2016; Weston et al., 2018; Bulte and Tezcan, 2019); a sketch is given after this subsection.

Another integration method is based on the attention mechanism (Bahdanau et al., 2014). The main idea is to adopt additional encoders (of various architectures) to encode the retrieved target sentences and integrate them through attention (Cao and Xiong, 2018; Gu et al., 2018; Bapna and Firat, 2019). Since the attention mechanism (Bahdanau et al., 2014; Vaswani et al., 2017) has become a key module in many NLP models, integrating retrieved memory through attention is a very natural and efficient approach.

In the previous two methods, an NLP model learns implicitly how to filter out irrelevant or even harmful information from the retrieved examples. There also exist works that try to explicitly extract the useful information, i.e., skeleton extraction, from the retrieved memory (Cai et al., 2019a; Wu et al., 2019; Cai et al., 2019b). For example, a skeleton could be the part of a whole utterance that remains after masking irrelevant content, and the generation model only integrates this skeleton into the generation process.
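The data-augmentation style of integration mentioned above can be sketched in a few lines. This is our own illustration; the `[SEP]` and `[MEM]` markers are hypothetical separator tokens rather than ones prescribed by the cited works.

```python
def augment_input(x: str, retrieved: list, sep: str = " [SEP] ", mem: str = " [MEM] ") -> str:
    """Concatenate the retrieved target sides onto the original input; the augmented
    string is then fed to an ordinary encoder-decoder model, which must learn
    implicitly which parts of the appended memory are useful."""
    memory_spans = mem.join(y_r for _, y_r in retrieved)
    return x + sep + memory_spans


print(augment_input("how are you doing", [("how are you", "i am fine, thanks")]))
# -> "how are you doing [SEP] i am fine, thanks"
```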
3 Dialogue Response Generation

Background  Dialogue systems can be grouped into two categories: chit-chat systems and task-oriented systems. While task-oriented dialogue systems are designed to accomplish specific user tasks such as booking air tickets, chit-chat dialogue systems aim at giving a meaningful and fluent response to any dialogue history in the open domain. Dialogue response generation in chit-chat systems is challenging partly due to the diversity of possible responses to a single dialogue history (i.e., the one-to-many problem). The dialogue history alone cannot determine a meaningful and specific response. Also, external knowledge that is not present in the dialogue history is often necessary for avoiding safe but boring responses. We focus on recent efforts tackling these challenges to develop chit-chat dialogue systems.

Most modern chit-chat dialogue systems can be categorized into two classes, namely retrieval-based models and generation-based models. Retrieval-based models (Ji et al., 2014; Hu et al., 2014) directly copy an existing response from curated dialogue corpora (i.e., the retrieval pool) when receiving a response request. The retrieved responses are often informative and grammatical, as they are collected from real-world conversations and possibly post-edited by a human. However, such systems perform poorly when a given dialogue history is substantially different from those in the retrieval pool. On the other hand, generation-based models (Shang et al., 2015; Vinyals and Le, 2015; Li et al., 2016a) generate a new utterance from scratch. These generation-based models have better generalization capacity when handling unseen dialogue contexts. Nevertheless, the generated utterances are inclined to be dull and non-informative (e.g., "I don't know", "I think so", "Me too", etc.) (Li et al., 2016a).

Shallow Integration  As discussed, retrieval-based models may give informative but inappropriate responses, while generation-based models often do the opposite. It is desirable to combine the best of both worlds. Early work (Qiu et al., 2017) attempts to re-rank the outputs from both models. For a deeper integration, Song et al. (2016) and Yang et al. (2019) extend the standard SEQ2SEQ encoder-decoder model (Bahdanau et al., 2014) with an extra encoder for encoding the retrieval result; the output of the extra encoder, along with the output of the original encoder for the dialogue history, is fed to the decoder. Weston et al. (2018) use a single encoder that takes the concatenation of the original dialogue history and the retrieved response as input. Wu et al. (2019) note that the retrieved information should be used with awareness of the context difference, and further propose to construct an edit vector by explicitly encoding the lexical differences between the input dialogue history and the retrieved dialogue history. Pandey et al. (2018) further propose to weight different training instances by context similarity.

Deep Integration  To prevent the inflow of erroneous information, Cai et al. (2019a) propose a general framework that first extracts a skeleton from the retrieved response and then generates the response based on the extracted skeleton. This framework is also adopted for stylistic response generation (Su et al., 2021c). Gupta et al. (2021) suggest using the semantic structure of an exemplar response, instead of the tokens of the exemplar response, to guide generation. Despite their differences, a common issue is that the generation model easily learns to ignore the retrieved response entirely and collapses to a vanilla seq2seq model. This happens with improper training instances: due to the one-to-many nature of dialogue, it frequently happens that a retrieved response (or an extracted skeleton) is suitable for responding to the query but inconsistent with the current target response. Earlier studies (Weston et al., 2018; Wu et al., 2019; Cai et al., 2019a) alleviate this problem by putting hard constraints on the data (e.g., discarding training instances where the retrieved response has low similarity to the target response), which, however, greatly reduces the amount of usable data. Cai et al. (2019b) instead employ a random mechanism for generating the skeletons used for training, which extracts skeletons from the corresponding responses with some deliberate disturbance. Paranjape et al. (2021) propose to model the retriever after the posterior distribution of retrieval given the input and the target output, and train it jointly with the standard retriever and the generator by maximizing the evidence lower bound (ELBo) in expectation over retrieval.

Knowledge-Enhanced Generation  The aforementioned work demonstrates that retrieval-based dialogue systems can be used to build better generation-based models. In general, this is done by conditioning the generation on some retrieved responses. More traditionally, to infuse the response with external knowledge, the retrieval pool is not necessarily a dialogue corpus. In fact, knowledge-grounded dialogue response generation exploring different forms of knowledge, such as knowledge bases and external documents, has been actively explored (Dinan et al., 2018; Zhou et al., 2018; Lian et al., 2019; Li et al., 2019; Qin et al., 2019; Wu et al., 2021; Zhang et al., 2021; Komeili et al., 2021).
Limitations  We note three major limitations in existing work on dialogue response generation. First, current methods only use one retrieved response for generation. It could be more beneficial to combine multiple retrieved responses; however, this can be difficult due to the one-to-many nature of dialogue response generation. Second, current methods use a universal relevance score for retrieval. It could be more effective to use a customized retrieval metric, especially for controlled dialogue response generation (e.g., persona, emotion, etc.). Third, the retrieval pool of existing methods is limited to dialogue corpora (context-response pairs) or documents. It might be useful to enlarge the retrieval pool by including more corpora from other domains or in other modalities. As discussed, there remain plenty of possible directions to explore in the future.

4 Machine Translation

Retrieval-augmented translation originates from human translation scenarios (Somers, 2003). When translating an input source sentence x into ŷ, a human translator typically uses a search engine to retrieve similar sentences {⟨x^r, y^r⟩} from a bilingual database. Such a technique, called translation memory, is helpful for improving the translation quality and efficiency of human translators (Dillon and Fraser, 2006). With the development of machine translation techniques, there has been a surge of interest in improving machine translation models with translation memory. In the rest of this section, we review translation memory for both statistical machine translation (SMT) and neural machine translation (NMT).
4.1 Translation Memory in SMT

Generally, SMT consists of three key components organized in a pipeline: phrase table extraction, parameter tuning, and decoding (Koehn et al., 2003; Chiang, 2007). Accordingly, many efforts have been made to make use of translation memory (TM) on top of each component.

Constrained Decoding with TM  Constrained decoding is the most straightforward way to integrate TM into SMT (Smith and Clark, 2009; Koehn and Senellart, 2010; Zhechev and Van Genabith, 2010; Ma et al., 2011). Its basic idea is to reuse the useful segments in y^r while translating the other segments with SMT. Specifically, the approach consists of three steps: 1) identify the unmatched segments in both x^r and x through the edit-distance algorithm; 2) identify the unmatched segments in y^r, each of which is aligned to one unmatched segment in x^r by a word alignment algorithm; 3) decode each unmatched segment in x with SMT and use the result to replace its corresponding unmatched segment in y^r. Li et al. (2016b) further extend this approach from the sentence level to the phrase level. The advantage of constrained decoding is that it does not require changing the translation model (including the phrase table and parameters) and can be applied in a plug-and-play way. This approach is successful when x is highly similar to x^r; otherwise its performance degrades sharply, because it explicitly isolates TM matching from SMT decoding and decides deterministically whether or not to reuse the retrieved segments.
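Step 1 of this pipeline can be illustrated with a short sketch of edit-distance-style segment matching between x and x^r. This is our own toy example built on Python's difflib, not the exact matching algorithm used in the cited systems.

```python
import difflib


def split_segments(x: str, x_r: str):
    """Mark which word spans of x are covered by the TM source x_r ("matched") and
    which are not ("unmatched"). Matched spans can reuse the aligned spans of y_r,
    while unmatched spans are left for the SMT decoder to translate."""
    x_tokens, r_tokens = x.split(), x_r.split()
    matcher = difflib.SequenceMatcher(a=x_tokens, b=r_tokens)
    segments = []
    for op, i1, i2, _, _ in matcher.get_opcodes():
        span = " ".join(x_tokens[i1:i2])
        if span:
            segments.append(("matched" if op == "equal" else "unmatched", span))
    return segments


print(split_segments("please book a flight to paris", "please book a hotel in paris"))
# -> [('matched', 'please book a'), ('unmatched', 'flight to'), ('matched', 'paris')]
```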
Phrase Table Aggregation with TM  There are also notable efforts to augment the phrase table for SMT by extracting translation rules from the retrieved bilingual sentences {⟨x^r, y^r⟩}, and then re-tuning the parameters of the SMT model so that it makes use of the translation knowledge from {⟨x^r, y^r⟩} in an implicit way when translating x. For example, Biçici and Dymetman (2008) and Simard and Isabelle (2009) directly add the extracted translation rules into the phrase table in a shallow combination: they introduce an additional feature to indicate whether a translation rule comes from {⟨x^r, y^r⟩} or not, and then train all feature weights with MERT (Och, 2003). One characteristic of these works is that a translation rule extracted from {⟨x^r, y^r⟩} that cannot exactly match any segment in x is useless, even if it may contain some useful words on its target side. To remedy this, Wang et al. (2013, 2014) resort to a deeper combination when using the extracted translation rules: for each rule in the phrase table, they design a generative model that rewards rules similar to those extracted from {⟨x^r, y^r⟩}, and this generative model is used as a feature in the log-linear SMT model, whose weight is tuned together with the other features by MERT. In addition, Li et al. (2014) employ a similar way of rewarding the rules, but rely on a discriminative model that can easily integrate rich features from {⟨x^r, y^r⟩}.

Parameter Tuning with TM  Unlike the above two research lines, Liu et al. (2012, 2014) make use of translation memory only for tuning parameters. Specifically, when translating an input sentence x, they first retrieve many similar bilingual sentences {⟨x^r, y^r⟩}, and then tune the parameters on the retrieved sentences as well as a given development set in a sentence-wise manner, i.e., an independent tuning is performed for each input sentence. To improve the efficiency of each tuning step, they propose a local update on top of {⟨x^r, y^r⟩} starting from a baseline model.

Despite the successes of translation memory in SMT, the above three kinds of methods still have some limitations. Firstly, all of them employ a fuzzy match score for retrieval, which is highly dependent on word matching and thus cannot recall examples that are similar in word semantics but different in surface form. Secondly, these methods integrate the retrieved examples into a single module of SMT in ways that cannot make full use of the knowledge in the retrieved examples: the integration in the first two lines (constrained decoding and phrase table aggregation) is heuristic and not optimized towards translation quality, while the parameter tuning method fine-tunes only a few parameters of the log-linear SMT model, which is not enough to preserve sufficient knowledge from the retrieved examples. Thirdly, since SMT operates as a pipeline, it is intractable to jointly optimize the retrieval metric and the SMT model. Consequently, all these methods adopt an off-the-shelf metric for retrieval, leading to sub-optimal performance.
4.2 Translation Memory in NMT

Translation memory has also been widely explored in neural machine translation (NMT). Depending on when retrieval is involved, we can categorize previous works into two classes: 1) the NMT model learns how to cooperate with the retrieval model in the training phase; 2) the NMT model is only made aware of the retrieved data in the inference phase.

Inference Phase  The key point of the literature in this line is to reward some target words based on the words in y^r during inference. A decision can then be made based on both the distribution of the generation model and the additional reward from the retrieval model. Some previous works propose to reward target words based on the sentence-level similarity between x and x^r and the word alignment between x^r and y^r. Given the input sentence x, Zhang et al. (2018) assign higher rewards to target words in ŷ when they appear in y^r and the aligned source words occur in both x^r and x. He et al. (2019) follow a similar framework and additionally consider the position information of those target words when rewarding. These works reward the target words in an explicit way; in contrast, the one-sentence-one-model approach (Li et al., 2016c; Turchi et al., 2017) rewards target words implicitly: for each test input x, it first fine-tunes the translation model on the retrieved memory {⟨x^r, y^r⟩} and then translates x.
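A minimal sketch of this sentence-level rewarding idea follows; it is our own simplification of the cited methods, with a hypothetical word alignment supplied as a dictionary.

```python
def collect_rewarded_tokens(x, x_r, y_r, alignment):
    """alignment maps each target position of y_r to a source position of x_r.
    A target token is rewarded if its aligned source word also occurs in x."""
    shared = set(x.split()) & set(x_r.split())
    x_r_tokens, y_r_tokens = x_r.split(), y_r.split()
    return {y_r_tokens[t] for t, s in alignment.items() if x_r_tokens[s] in shared}


def rescore(candidate_probs, rewarded, bonus=0.2):
    """Add a bonus to candidate target words found in the retrieved translation,
    then renormalize; real systems fold this into beam-search scoring."""
    scores = {w: p + (bonus if w in rewarded else 0.0) for w, p in candidate_probs.items()}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


rewarded = collect_rewarded_tokens(
    "please book a flight", "please book a hotel", "bitte buchen sie ein hotel",
    alignment={0: 0, 1: 1, 3: 2, 4: 3},
)
print(rescore({"bitte": 0.3, "das": 0.4, "buchen": 0.3}, rewarded))
```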
Others reward target words based on token-level similarity scores. Most works in this line are based on a dense retriever (Khandelwal et al., 2020a), e.g., using FAISS. Khandelwal et al. (2020a) build a key-value datastore, where the key is a dense representation h(x^r, y^r_{<t}) of a translation context and the value is the corresponding next target token; at each decoding step, the nearest neighbors of the current decoding context are retrieved and used to reward candidate target words. Follow-up work further employs a light-weight network to learn the reward score (e.g., Zheng et al., 2021a). Since dense retrieval has the potential for cross-lingual retrieval, Zheng et al. (2021b) use a similar approach to achieve unsupervised domain adaptation, where the main change is to create the datastore from synthetic source sentences paired with real target sentences.
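The token-level datastore idea can be sketched as follows. This is a simplified, brute-force illustration of the mechanism described above, using dummy decoder states; real systems store millions of entries in an approximate nearest-neighbor index such as FAISS and interpolate the resulting distribution with the NMT model's own prediction.

```python
import numpy as np


def build_datastore(examples):
    """Key-value datastore: each key is the decoder state for a translation context
    (x_r, y_r_{<t}); the value is the next target token y_r_t of that context."""
    keys, values = [], []
    for hidden_states, target_tokens in examples:
        for h, tok in zip(hidden_states, target_tokens):
            keys.append(h)
            values.append(tok)
    return np.stack(keys), values


def knn_reward(query, keys, values, k=2, temperature=10.0):
    """Turn distances to the k nearest keys into a reward distribution over tokens."""
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    rewards = {}
    for idx, w in zip(nearest, weights):
        rewards[values[idx]] = rewards.get(values[idx], 0.0) + float(w)
    # In practice this is interpolated with the model distribution,
    # e.g. p = (1 - lam) * p_model + lam * p_knn.
    return rewards


rng = np.random.default_rng(0)
examples = [(rng.normal(size=(3, 4)), ["wir", "sehen", "uns"])]
keys, values = build_datastore(examples)
print(knn_reward(rng.normal(size=4), keys, values))
```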
Training Phase  Different from those model-agnostic approaches, works in this line aim to train the generation model to learn how to cooperate with the retrieval model; most of them adopt sentence-level retrieval when integrating the retrieval information into the training process. Bulte and Tezcan (2019) and Hossain et al. (2020) propose a data augmentation method to integrate the retrieved information, where x is concatenated with y^r before being fed into the model. Following the data augmentation approach, Xu et al. (2020) propose additional matching methods to determine which retrieved examples should be included on the source side. There also exist works that propose new architectures to integrate the retrieval information. Under the RNN-based framework, Cao and Xiong (2018) and Gu et al. (2018) use gating and attention mechanisms to incorporate the retrieved target sentences. As the Transformer (Vaswani et al., 2017) became the backbone of NMT, some works use additional Transformer encoders to encode the retrieved target sentences and integrate them through the attention mechanism (Bapna and Firat, 2019; Cao et al., 2019). Xia et al. (2019) represent the retrieved target sentences in a different data structure, i.e., a graph, and integrate it through attention. He et al. (2021) propose a light-weight method to encode the retrieved target sentences that leverages the alignment between the retrieved source and target sides.

There remain some open issues for translation memory in NMT. First, most approaches rely on similarity to the retrieved memory as the primary feature to derive reward scores; however, other information, e.g., the frequencies of words and contexts, may also be beneficial for integrating the translation memory. Second, it remains an open question when the retrieved information should be used and when it should not. In the inference phase, existing approaches tend to integrate the translation memory excessively, e.g., at every time step, which not only reduces translation efficiency but may also dampen the fluency of the generated results.

5 Other Tasks

In addition to dialogue systems and machine translation, retrieval-augmented generation techniques have been shown to be beneficial in many other tasks. In the following, we highlight several key tasks that apply retrieval-augmented generation approaches.¹

¹ Here, we focus on tasks other than question answering. We refer readers interested in QA to Chen and Yih (2020).

Language Modelling  It has been shown that properly leveraging information from a retrieval memory can improve the performance of large pre-trained language models. To build a more accurate language model, Khandelwal et al. (2020b) propose to incorporate a soft memory module into the system. Specifically, an index is built by caching the hidden states over the training corpus; the language model then accesses the index via k-NN search and displays greatly improved performance. As another example, Guu et al. (2020) propose a new paradigm that applies the retrieval-augmented technique to the pre-training of a generative language model. During learning, they train a neural selector that dynamically samples a relevant text to guide the reconstruction of a corrupted input sequence; in this way, the pre-trained model delivers better results by explicitly grounding on the retrieval memory. Lewis et al. (2020a) combine language model pre-training with a paraphrasing approach: during learning, an input sequence is first corrupted, and a set of multi-lingual texts is retrieved, based on which the model learns to reconstruct the original input sequence. Recently, Borgeaud et al. (2021) propose RETRO, a large pre-trained language model enhanced with retrieved documents, and obtain performance comparable to GPT-3 with 25× fewer parameters.

Summarization  Text summarization is another research area that benefits from retrieval-augmented text generation. Peng et al. (2019) propose an adaptive decoding framework which first retrieves an exemplar document given the source document; the summary of the source document is then derived through an adaptive generation process based on the retrieved template. Different from Peng et al. (2019), Cao et al. (2018) and Hossain et al. (2020) introduce an intermediate re-ranking stage into the generation pipeline: before generating the summary, the retrieved documents are first re-ranked based on their similarity scores with respect to the source document, and the summary is then produced by rewriting the selected templates.
Paraphrase Generation  To address the lack of quality as well as diversity in paraphrase generation, Kazemnejad et al. (2020) propose a generation framework that first retrieves a sentence similar to the input sentence; based on the retrieved sentence, a neural editor then produces the resulting paraphrase. Chen et al. (2019) investigate a different aspect of paraphrasing, i.e., how to control the linguistic syntax of the generated text. To achieve this goal, they propose to first extract a sentential exemplar that serves as the syntax template; a neural model then generates the paraphrase with the desired linguistic syntax following the retrieved exemplar.

Text Style Transfer  To improve the quality of generated text, Li et al. (2018) propose a retrieval-augmented framework which first retrieves texts that are similar to the input based on lexical-level similarity; the retrieved tokens that are irrelevant to the source are then deleted, and the output is derived from the edited template. Xiao et al. (2021) also adopt this framework, incorporating retrieval information from two sources (i.e., sparse and dense memories) and obtaining improved model performance.

Data-to-Text Generation  Recently, retrieval-augmented generation has also been adapted to the task of data-to-text generation. To bridge the gap between structured data and natural language text, Su et al. (2021a) propose a novel retrieval-augmented framework. Specifically, given the source data, a set of candidate texts is first retrieved from a large unlabelled corpus; a neural selector is then applied to measure the similarities between the source data and the candidate texts and to extract a set of more fine-grained prototypes from the candidates. Lastly, a generation model takes the prototypes as input to produce the text that describes the given structured data.
While retrieval-augmented generation has been widely explored in the NLP community, we suggest that future research could extend this approach to tasks that involve data from multiple modalities. For instance, with recent advancements in image-text retrieval (Jia et al., 2021; Radford et al., 2021), the structural gap between images and texts is largely bridged. Some early studies (Zhang et al., 2020) have shown that information retrieved from images can improve the performance of neural machine translation models. Naturally, such methods could be extended to other multi-modal tasks, such as image captioning (Karpathy and Li, 2015). A similar idea could also be applied to tasks beyond images, such as speech-to-text transcription (Gales and Young, 2007).

6 Future Directions

Despite the current success of retrieval-augmented text generation, there is still a long way to go, as discussed in the previous sections. We highlight some directions to facilitate future research as follows.

Retrieval Sensitivity  The performance of retrieval-augmented text generation is very sensitive to the retrieval quality, i.e., the similarity between the query and the retrieved examples. Currently, retrieval-augmented text generation models perform well when the retrieved examples are very similar to the query, but they are even worse than generation models without retrieval when the retrieved examples are less similar. It would therefore be important to develop new methods that address this sensitivity to similarity.
Retrieval Efficiency  Generally, if one enlarges the retrieval memory to some extent, it becomes possible to retrieve an example that is very similar to the query. Unfortunately, the downside is that the overall inference of retrieval-augmented generation models becomes less efficient due to the considerable retrieval overhead. In this sense, it is important to find methods that trade off retrieval memory size against retrieval efficiency, for example, data compression for the retrieval memory.

Local vs. Global Optimization  Theoretically, it seems promising to jointly learn the retrieval metric and the generation model. However, in practice, there is an essential gap in the retrieval metric between the training and inference phases: in the training phase, the loss is back-propagated locally to only a few retrieved examples, while in the inference phase the metric is applied globally over all examples in the memory. It would be interesting to narrow this gap when learning a better metric for generation tasks.

Multi-Modalities  With recent advances in image-text retrieval, directly associating images with relevant text becomes possible. This urges researchers to investigate the possibility of retrieval-based text generation in tasks that involve data from different modalities. One typical task is image captioning. Beyond images, other tasks such as speech-to-text transcription could potentially benefit from retrieval-based generation methods as well.

Diverse & Controllable Retrieval  Most existing approaches adopt a universal metric for retrieval, such as the lexical similarity of sentences. Future work should explore how to use customized metrics for retrieval, which can be beneficial for more controlled text generation. For example, instances with particular emotions or styles may be more desirable in personalized dialogue generation, parallel data containing specific terminologies is more helpful in machine translation, and so on. On the other hand, using a universal metric for retrieval may lead to a lack of diversity in the retrieval results; collecting a diverse set of retrieval results can improve the coverage of useful information. Thus, considering multiple different metrics for retrieval may lead to generation of higher quality in the future.

7 Conclusion

In this paper, we surveyed recent approaches for retrieval-augmented text generation. We reviewed and summarized the development of the different components of retrieval-augmented text generation, including retrieval metrics, retrieval sources, and integration paradigms. We gave in-depth discussions of how retrieval-augmented text generation is applied to different tasks, including dialogue response generation, machine translation, and other generation tasks. We also pointed out some future directions for retrieval-augmented text generation.
References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Ankur Bapna and Orhan Firat. 2019. Non-parametric adaptation for neural machine translation. In Proceedings of NAACL-HLT 2019.

Ergun Biçici and Marc Dymetman. 2008. Dynamic translation memory: Using statistical machine translation to improve translation memory fuzzy matches. In International Conference on Intelligent Text Processing and Computational Linguistics.

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, and Laurent Sifre. 2021. Improving language models by retrieving from trillions of tokens. CoRR, abs/2112.04426.

Bram Bulte and Arda Tezcan. 2019. Neural fuzzy repair: Integrating fuzzy matches into neural machine translation. In Proceedings of ACL 2019.

Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, Wai Lam, and Shuming Shi. 2019a. Skeleton-to-response: Dialogue generation guided by retrieval memory. In Proceedings of NAACL-HLT 2019.

Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, and Shuming Shi. 2019b. Retrieval-guided dialogue response generation via a matching-to-generation framework. In Proceedings of EMNLP-IJCNLP 2019.

Deng Cai, Yan Wang, Huayang Li, Wai Lam, and Lemao Liu. 2021. Neural machine translation with monolingual translation memory. In Proceedings of ACL-IJCNLP 2021.

Qian Cao, Shaohui Kuang, and Deyi Xiong. 2019. Learning to reuse translations: Guiding neural machine translation with examples. arXiv preprint arXiv:1911.10732.

Qian Cao and Deyi Xiong. 2018. Encoding gated translation memory into neural machine translation. In Proceedings of EMNLP 2018.

Ziqiang Cao, Wenjie Li, Sujian Li, and Furu Wei. 2018. Retrieve, rerank and rewrite: Soft template based neural summarization. In Proceedings of ACL 2018.

Danqi Chen and Wen-tau Yih. 2020. Open-domain question answering. In Proceedings of ACL 2020: Tutorial Abstracts.

Mingda Chen, Qingming Tang, Sam Wiseman, and Kevin Gimpel. 2019. Controllable paraphrase generation with a syntactic exemplar. In Proceedings of ACL 2019.

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.

Sarah Dillon and Janet Fraser. 2006. Translators and TM: An investigation of translators' perceptions of translation memory adoption. Machine Translation, 20(2):67–79.

Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2018. Wizard of Wikipedia: Knowledge-powered conversational agents. arXiv preprint arXiv:1811.01241.

Mark J. F. Gales and Steve J. Young. 2007. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 1(3):195–304.

Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O.K. Li. 2018. Search engine guided neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence.

Prakhar Gupta, Jeffrey Bigham, Yulia Tsvetkov, and Amy Pavel. 2021. Controlling dialogue generation with semantic exemplars. In Proceedings of NAACL-HLT 2021.

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training. CoRR, abs/2002.08909.

Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S. Liang. 2018. A retrieve-and-edit framework for predicting structured outputs. In Advances in Neural Information Processing Systems.

Qiuxiang He, Guoping Huang, Qu Cui, Li Li, and Lemao Liu. 2021. Fast and accurate neural machine translation with translation memory. In Proceedings of ACL-IJCNLP 2021.

Qiuxiang He, Guoping Huang, Lemao Liu, and Li Li. 2019. Word position aware translation memory for neural machine translation. In CCF International Conference on Natural Language Processing and Chinese Computing.

Nabil Hossain, Marjan Ghazvininejad, and Luke Zettlemoyer. 2020. Simple and effective retrieve-edit-rerank text generation. In Proceedings of ACL 2020.

Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS.

Zongcheng Ji, Zhengdong Lu, and Hang Li. 2014. An information retrieval approach to short text conversation. arXiv preprint arXiv:1408.6988.

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In ICML 2021.

Andrej Karpathy and Fei-Fei Li. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR 2015.

Amirhossein Kazemnejad, Mohammadreza Salehi, and Mahdieh Soleymani Baghshah. 2020. Paraphrase generation by learning how to edit from samples. In Proceedings of ACL 2020.

Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020a. Nearest neighbor machine translation. arXiv preprint arXiv:2010.00710.

Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020b. Generalization through memorization: Nearest neighbor language models. In ICLR 2020.

Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL 2003.

Philipp Koehn and Jean Senellart. 2010. Convergence of translation memory and statistical machine translation. In Proceedings of the AMTA Workshop on MT Research and the Translation Industry.

Mojtaba Komeili, Kurt Shuster, and Jason Weston. 2021. Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566.

Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. arXiv preprint arXiv:1906.00300.

Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, and Luke Zettlemoyer. 2020a. Pre-training via paraphrasing. In NeurIPS 2020.

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020b. Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv preprint arXiv:2005.11401.

Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In NAACL 2016.

Juncen Li, Robin Jia, He He, and Percy Liang. 2018. Delete, retrieve, generate: A simple approach to sentiment and style transfer. In Proceedings of NAACL-HLT 2018.

Liangyou Li, Andy Way, and Qun Liu. 2014. A discriminative framework of integrating translation memory features into SMT. In Proceedings of the 11th Conference of the Association for Machine Translation in the Americas.

Liangyou Li, Andy Way, and Qun Liu. 2016b. Phrase-level combination of SMT and TM using constrained word lattice. In Proceedings of ACL 2016.

Xiaoqing Li, Jiajun Zhang, and Chengqing Zong. 2016c. One sentence one model for neural machine translation. arXiv preprint arXiv:1609.06490.

Zekang Li, Cheng Niu, Fandong Meng, Yang Feng, Qian Li, and Jie Zhou. 2019. Incremental transformer with deliberation decoder for document grounded conversations. In Proceedings of ACL 2019.

Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, and Hua Wu. 2019. Learning to select knowledge for response generation in dialog systems. arXiv preprint arXiv:1902.04911.

Lemao Liu, Hailong Cao, Taro Watanabe, Tiejun Zhao, Mo Yu, and Conghui Zhu. 2012. Locally training the log-linear model for SMT. In Proceedings of EMNLP-CoNLL 2012.

Lemao Liu, Tiejun Zhao, Taro Watanabe, Hailong Cao, and Conghui Zhu. 2014. Discriminative training for log-linear based SMT: Global or local methods. ACM Transactions on Asian Language Information Processing (TALIP), 13(4):1–25.

Yanjun Ma, Yifan He, Andy Way, and Josef van Genabith. 2011. Consistent translation using discriminative learning: A translation memory-inspired approach. In Proceedings of ACL-HLT 2011.

Yuxian Meng, Xiaoya Li, Xiayu Zheng, Fei Wu, Xiaofei Sun, Tianwei Zhang, and Jiwei Li. 2021. Fast nearest neighbor machine translation. arXiv preprint arXiv:2105.14528.

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of ACL 2003.

Gaurav Pandey, Danish Contractor, Vineet Kumar, and Sachindra Joshi. 2018. Exemplar encoder-decoder for neural conversation generation. In Proceedings of ACL 2018.

Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, and Christopher D. Manning. 2021. Hindsight: Posterior-guided training of retrievers for improved open-ended generation. arXiv preprint arXiv:2110.07752.

Hao Peng, Ankur P. Parikh, Manaal Faruqui, Bhuwan Dhingra, and Dipanjan Das. 2019. Text generation with exemplar-based adaptive decoding. In Proceedings of NAACL-HLT 2019.

Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, William B. Dolan, Yejin Choi, and Jianfeng Gao. 2019. Conversing by reading: Contentful neural conversation with on-demand machine reading. In Proceedings of ACL 2019.

Minghui Qiu, Feng-Lin Li, Siyu Wang, Xing Gao, Yan Chen, Weipeng Zhao, Haiqing Chen, Jun Huang, and Wei Chu. 2017. AliMe Chat: A sequence to sequence and rerank based chatbot engine. In Proceedings of ACL 2017.

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning transferable visual models from natural language supervision. In ICML 2021.

Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proceedings of ACL 2015.

Michel Simard and Pierre Isabelle. 2009. Phrase-based machine translation in a computer-assisted translation environment. In Proceedings of MT Summit XII.

James Smith and Stephen Clark. 2009. EBMT for SMT: A new EBMT-SMT hybrid. In Proceedings of the 3rd International Workshop on Example-Based Machine Translation.

Harold Somers. 2003. Translation memory systems. Benjamins Translation Library, 35:31–48.

Yiping Song, Rui Yan, Xiang Li, Dongyan Zhao, and Ming Zhang. 2016. Two are better than one: An ensemble of retrieval- and generation-based dialog systems. arXiv preprint arXiv:1610.07149.

Yixuan Su, Zaiqiao Meng, Simon Baker, and Nigel Collier. 2021a. Few-shot table-to-text generation with prototype memory. In Findings of EMNLP 2021.

Yixuan Su, David Vandyke, Simon Baker, Yan Wang, and Nigel Collier. 2021b. Keep the primary, rewrite the secondary: A two-stage approach for paraphrase generation. In Findings of ACL-IJCNLP 2021.

Yixuan Su, Yan Wang, Deng Cai, Simon Baker, Anna Korhonen, and Nigel Collier. 2021c. Prototype-to-style: Dialogue generation with style-aware editing on retrieval memory. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:2152–2161.

Marco Turchi, Matteo Negri, M. Farajian, and Marcello Federico. 2017. Continuous learning from human post-edits for neural machine translation.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.

Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In ICML Deep Learning Workshop.

Kun Wang, Chengqing Zong, and Keh-Yih Su. 2013. Integrating translation memory into phrase-based machine translation during decoding. In Proceedings of ACL 2013.

Kun Wang, Chengqing Zong, and Keh-Yih Su. 2014. Dynamically integrating cross-domain translation memory into phrase-based machine translation during decoding. In Proceedings of COLING 2014.

Jason Weston, Emily Dinan, and Alexander Miller. 2018. Retrieve and refine: Improved sequence generation models for dialogue. In Proceedings of the 2018 EMNLP Workshop SCAI.

Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, and Ming Zhou. 2019. Response generation by context-aware prototype editing. In Proceedings of the AAAI Conference on Artificial Intelligence.

Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, et al. 2021. A controllable model of grounded response generation. In Proceedings of the AAAI Conference on Artificial Intelligence.

Mengzhou Xia, Guoping Huang, Lemao Liu, and Shuming Shi. 2019. Graph based translation memory for neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence.

Fei Xiao, Liang Pang, Yanyan Lan, Yan Wang, Huawei Shen, and Xueqi Cheng. 2021. Transductive learning for unsupervised text style transfer. In Proceedings of EMNLP 2021.

Jitao Xu, Josep M. Crego, and Jean Senellart. 2020. Boosting neural machine translation with similar translations. In Proceedings of ACL 2020.

Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen, and Jingjing Liu. 2019. A hybrid retrieval-generation neural conversation model. In Proceedings of CIKM 2019.

Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, and Satoshi Nakamura. 2018. Guiding neural machine translation with retrieved translation pieces. In Proceedings of NAACL-HLT 2018.

Yizhe Zhang, Siqi Sun, Xiang Gao, Yuwei Fang, Chris Brockett, Michel Galley, Jianfeng Gao, and Bill Dolan. 2021. Joint retrieval and generation training for grounded text generation. arXiv preprint arXiv:2105.06597.

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, and Hai Zhao. 2020. Neural machine translation with universal visual representation. In ICLR 2020.

Ventsislav Zhechev and Josef Van Genabith. 2010. Seeding statistical machine translation with translation memory output through tree-based structural alignment. In Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation.

Xin Zheng, Zhirui Zhang, Junliang Guo, Shujian Huang, Boxing Chen, Weihua Luo, and Jiajun Chen. 2021a. Adaptive nearest neighbor machine translation. arXiv preprint arXiv:2105.13022.

Xin Zheng, Zhirui Zhang, Shujian Huang, Boxing Chen, Jun Xie, Weihua Luo, and Jiajun Chen. 2021b. Non-parametric unsupervised domain adaptation for neural machine translation. In Findings of EMNLP 2021.

Kangyan Zhou, Shrimai Prabhumoye, and Alan W. Black. 2018. A dataset for document grounded conversations. arXiv preprint arXiv:1809.07358.