Logician and Orator: Learning from the Duality between Language and Knowledge in Open Domain
Mingming Sun (1,2), Xu Li (1,2), Ping Li (1)
(1) Big Data Lab (BDL-US), Baidu Research
(2) National Engineering Laboratory of Deep Learning Technology and Application, China
{sunmingming01,lixu13,liping11}@baidu.com

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2119-2130, Brussels, Belgium, October 31 - November 4, 2018. (c) 2018 Association for Computational Linguistics.

Abstract

We propose the task of Open-Domain Information Narration (OIN) as the reverse task of Open Information Extraction (OIE), to implement the dual structure between language and knowledge in the open domain. We then develop an agent, called Orator, to accomplish the OIN task, and assemble the Orator and the recently proposed OIE agent, Logician (Sun et al., 2018), into a dual system that exploits the duality structure with a reinforcement learning paradigm. Experimental results reveal that the dual structure between the OIE and OIN tasks helps to build better OIE and OIN agents.

1 Introduction

The duality between language and knowledge is natural for human intelligence. Humans can extract knowledge from natural language to learn or remember, and then narrate the knowledge back into natural language to communicate. Information extraction (IE) simulates the first part of this duality and has long been a hot spot of NLP research. Recently, the task that fulfills the second part of the duality, that is, assembling a set of relation instances/facts or database records into natural language sentences or documents, has also attracted much interest (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017; Yin et al., 2016). In the literature, this task has been referred to as "data-to-document generation" (Wiseman et al., 2017) or "knowledge-to-text" (Chisholm et al., 2017). In this paper, we name it information narration (IN), to emphasize its reverse relationship to the information extraction (IE) task.

The duality between language and knowledge (and thus between the IE and IN tasks) can be examined in the closed domain or the open domain. For the closed-domain problem, the closed-domain IE (CIE) task is often referred to as "relation extraction" or "relation classification", which identifies instances of a fixed and finite set of relations in a natural language corpus, using supervised methods (Kambhatla, 2004; Zelenko et al., 2003; Miwa and Bansal, 2016; Zheng et al., 2017) or weakly supervised methods (Mintz et al., 2009; Lin et al., 2016). In the meantime, the closed-domain IN (CIN) task (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017; Yin et al., 2016) transforms a set of facts with a pre-defined schema or relation types (such as facts from Freebase (Bollacker et al., 2008), DBpedia (Auer et al., 2007), or database tables) into natural language sentences or documents. Furthermore, the dual structure between the CIE and CIN tasks has been noticed and utilized by Chisholm et al. (2017).

For the open-domain problem, the open-domain IE (OIE) task investigates how natural language sentences express facts, and then uses the learned knowledge to extract entity- and relation-level intermediate structures from open-domain sentences (Christensen et al., 2011; Etzioni et al., 2011; Schmitz et al., 2012; Pal and Mausam, 2016). Although the OIE task has attracted much interest and found many applications (Christensen et al., 2013, 2014; Mausam, 2016; Stanovsky et al., 2015; Khot et al., 2017; Fader et al., 2014), the OIN task has not been stated, nor has the duality between language and knowledge in the open domain.
The tasks involved in the duality between language and knowledge are shown in Table 1, where the OIN task has not been stated before.

             Open-Domain   Closed-Domain
Extraction   OIE           CIE
Narration    OIN           CIN

Table 1: Taxonomy: tasks between knowledge and natural language.
In this paper, we focus on the OIN task and the duality between the OIE and OIN tasks, for the following reasons: 1) the OIN task is an essential component of an open-domain information processing pipeline; for example, it is helpful for building natural and informative responses for open-domain KBQA systems (Khot et al., 2017; Fader et al., 2014); 2) as the results in this paper will illustrate, the duality between the tasks can be valuable for building better agents for both tasks (Xia et al., 2017, 2016).

A major historical obstacle to investigating the duality between the OIE and OIN tasks is the absence of a parallel corpus between natural language sentences and open-domain facts. Recently, the SAOKE dataset (Sun et al., 2018) was released, which contains more than forty thousand human-labeled open-domain sentence-facts pairs, and thus essentially eliminates this obstacle.

The contributions of this paper lie in the following aspects:

• We propose the concept of the OIN task, which is potentially an important component of an open-domain information pipeline, and develop the Orator agent to fulfill the task;

• We build a multi-agent system with Logician and Orator to exploit the dual structure between language and knowledge in the open domain. Experimental results reveal that the dual information is beneficial for improving the performance of both agents.

The paper is organized as follows. Section 2 discusses related work. Section 3 explains the Orator agent for the OIN task. Section 4 describes the multi-agent system of Logician and Orator and its algorithm for learning from the duality between language and knowledge.
The experimental results of the fine-tuned agents are shown and discussed in Section 5. We conclude our work and discuss future directions in Section 6.

2 Related Work

2.1 Information Narration

The closed-domain information narration (CIN) task has been studied in (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017). These CIN agents face problems from different domains, from people's biographies to basketball game records, but most of them follow the same sequence-to-sequence pattern: the algorithm first encodes a sequence of facts into a set of annotations and then decodes the annotations into a natural language text. Mechanisms such as attention (Bahdanau et al., 2014) and copying (Gu et al., 2016) are employed in the decoder to improve performance, and the models are trained on a supervised dataset with back-propagation.

In this work, we adopt a similar sequence-to-sequence architecture to build our baseline Orator agent, but with the following differences: 1) the Orator is proposed to narrate open-domain facts, so the encoder must encode words rather than the entities and relations of a closed domain; 2) the baseline Orator is fine-tuned using the dual learning algorithm proposed in this paper.

2.2 Dual Learning Systems

For many natural language processing tasks, there exist corresponding reverse/dual tasks. One example of a pair of dual problems is question answering (QA) and question generation (QG). In (Tang et al., 2017), the duality between the QA and QG problems was treated as a constraint that both problems must share the same joint probability, and a loss function implementing this constraint was incorporated into the supervised learning procedures of both agents. Furthermore, researchers (Tang et al., 2017; Duan et al., 2017; Sachan and Xing, 2018) use both the question-answering agent and the question-generation agent to identify extra high-confidence question-answer pairs, which are then used to fine-tune the pre-trained agents.

Back-and-forth translation (or round-trip translation, see https://en.wikipedia.org/wiki/Round-trip_translation) is another example of duality, in the field of machine translation. It has been employed to evaluate the quality of machine translation systems (Van Zaanen and Zwarts, 2006) or to test the suitability of text for machine translation (Gaspari, 2006; Shigenobu, 2007). Recently, Xia et al. (2016) implemented this duality in a neural dual learning system, in which the quality of each translation agent was improved on unlabeled data using the rewards provided by its corresponding dual agent, via reinforcement learning.
Parsing-reconstruction is another pattern of duality. Konstas et al. (2017) considered the AMR (Abstract Meaning Representation) (Banarescu et al., 2013) parsing problem (text to AMR) and the AMR generation problem (AMR to text) in one system, in which the AMR parser generated extra text-AMR pair data to fine-tune the AMR generator. The AMR generator, however, does not contribute to the performance improvement of the AMR parser. The CIE and CIN agents in (Chisholm et al., 2017) also follow this pattern, where both agents help each other improve by sharing weights. Nevertheless, the weight-sharing strategy cannot be applied to agents with different architectures, which is a typical situation in practice.

From these practices, it can be seen that the duality can be implemented with two major approaches: 1) providing additional labeled samples via bootstrapping, or 2) adding losses or rewards to the training procedure of the agents. In this paper, we follow the second approach. We design a set of rewards, some of which are related to the OIE and OIN tasks respectively and some to the duality of the problems, and then optimize both agents using reinforcement learning. The learning algorithm is similar to the dual-NMT algorithm described in (Xia et al., 2016), but adapted to the OIE and OIN tasks, especially in the task-related rewards. Compared to the approach of regularizing both agents to share the same joint probability (Tang et al., 2017), our approach directly optimizes the task objective by introducing task-related rewards. Furthermore, our approach is more adaptable than the weight-sharing approach adopted in (Chisholm et al., 2017).

3 Orator

3.1 SAOKE Dataset

Symbolic Aided Open Knowledge Expression (SAOKE) is proposed in (Sun et al., 2018) as a form to honestly record the facts that humans can extract from sentences when reading them. SAOKE uses a unified form, an n-ary tuple

(subject, predicate, object1, ..., objectN),

to express four categories of facts: 1) Relation: verb/preposition-based n-ary relations between entity mentions; 2) Attribute: nominal attributes of entity mentions; 3) Description: descriptive phrases of entity mentions; 4) Concept: hyponymy and synonymy relations among concepts and instances.
Using this SAOKE format, Sun et al. (2018) manually labeled the SAOKE dataset DSAOKE by crowdsourcing; it includes more than forty thousand sentence-facts pairs <S, F> (available at http://ai.baidu.com/broad). The labeling procedure is under the supervision of the "Completeness" criterion (Sun et al., 2018), so the facts record the information in the sentence as completely as possible (only auxiliary information and relations between facts are omitted (Sun et al., 2018)). As a result, the SAOKE dataset is a valid open-domain sentence-facts parallel dataset for both the OIE and OIN tasks. Table 2 gives an example from the SAOKE dataset that illustrates the dual relationship between a sentence and its facts.

Sentence (Chinese): 他发明了有两个固定点(水的沸点和冰点)的100摄氏度计温尺,是世界大多数地区通用的摄氏温度计的前身。
Sentence (English): He invented a 100-degree centigrade temperature scale with two fixed points (the boiling point and the freezing point of water), which is the precursor of the Celsius thermometer used in most parts of the world.
Facts (Chinese): (他, 发明, 100摄氏度计温尺) (100摄氏度计温尺, 有, 两个固定点) (水的[沸点|冰点], ISA, 两个固定点) (100摄氏度计温尺, 是X的前身, 世界大多数地区通用的摄氏温度计)
Facts (English): (He, invented, the 100-degree Celsius temperature scale) (100-degree Celsius temperature scale, has, two fixed points) (water's [boiling point | freezing point], ISA, two fixed points) (100-degree Celsius temperature scale, is the precursor of X, the Celsius thermometer used in most parts of the world)

Table 2: An example sentence and the corresponding facts in the SAOKE dataset, where "ISA" is a symbol denoting the "is-a" relationship in the SAOKE format.
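To make the SAOKE representation concrete, below is a minimal Python sketch of how an n-ary SAOKE fact and a sentence-facts pair could be represented and flattened into the fact sequence that the Orator consumes. The class and method names are our own illustration and are not part of the released dataset tooling or the Logician/Orator code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SaokeFact:
    """One n-ary SAOKE fact: (subject, predicate, object_1, ..., object_N)."""
    subject: str
    predicate: str
    objects: List[str]

    def to_tuple_string(self) -> str:
        # Render the fact in the parenthesized tuple form shown in Table 2.
        return "(" + ", ".join([self.subject, self.predicate] + self.objects) + ")"

@dataclass
class SaokePair:
    """A sentence paired with the sequence of facts a labeler extracted from it."""
    sentence: str
    facts: List[SaokeFact]

    def fact_sequence(self) -> str:
        # The Orator consumes the facts as one flat sequence, in labeling order.
        return " ".join(f.to_tuple_string() for f in self.facts)

# Toy example loosely following the English side of Table 2.
pair = SaokePair(
    sentence="He invented a 100-degree Celsius temperature scale with two fixed points.",
    facts=[
        SaokeFact("He", "invented", ["the 100-degree Celsius temperature scale"]),
        SaokeFact("100-degree Celsius temperature scale", "has", ["two fixed points"]),
    ],
)
print(pair.fact_sequence())
```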
3.2 Model

The Orator is an agent O that assembles a set of open-domain facts F into a sentence S with probability PO(S|F, ΘO), where ΘO is the set of parameters of O:

Orator O: F → PO(S|F, ΘO).

For each pair <S, F> ∈ DSAOKE, the set of facts F is actually expressed as a sequence of facts, in the order in which the labeler wrote them. Therefore, the deep sequence-to-sequence paradigm is suitable for modeling the Orator. In this work, we build the base Orator model with the attention-based sequence-to-sequence model, together with the copy and coverage mechanisms, in a way similar to the implementation of the Logician in (Sun et al., 2018).

3.2.1 Attention-based Sequence-to-sequence Learning

Attention-based sequence-to-sequence learning (Bahdanau et al., 2014) first encodes the input fact sequence F (actually the sequence of Ne-dimensional word embedding vectors) into Nh-dimensional hidden states H^F = [h^F_1, ..., h^F_{NS}] using a bi-directional GRU (Gated Recurrent Units) network (Cho et al., 2014). Then, when generating the word wt of the target sentence, the decoder computes the probability of generating wt by

p(wt | {w1, ..., wt-1}, ct) = g(ht-1, st, ct),

where st is the hidden state of the GRU decoder, g is the word generation model, and ct is the dynamic context vector that focuses attention on a specific location l in the input hidden states H^F. For the Orator, we use the copy mechanism to implement the word generation model g and the coverage mechanism to compute the dynamic context vector ct.

3.2.2 Copy Mechanism

In the SAOKE dataset, the words in the set of facts (excluding the external symbols) must come from the corresponding sentence, so the problem is well suited to the copy mechanism (Gu et al., 2016). In the copy mechanism, when the decoder is considering generating a word wt, the word can either be copied from the source fact sequence F or selected from a vocabulary V:

p(wt | wt-1, st, ct) = pF(wt | wt-1, st, ct) + pV(wt | wt-1, st, ct),

where pF is the probability of copying from F and pV is the probability of selecting from V. The details can be found in (Gu et al., 2016).
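The sketch below illustrates, with toy NumPy code, how copy scores over source-fact tokens and generation scores over the vocabulary can be normalized jointly so that the word probability decomposes into a copy part pF and a generation part pV. It is a simplified illustration under the assumption that unnormalized scores are already available; the real Orator computes these scores with the networks of (Gu et al., 2016).

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def copy_generate_step(source_tokens, vocab, copy_scores, gen_scores):
    """Combine copy and generation scores into one distribution over words.

    copy_scores: one unnormalized score per source position.
    gen_scores:  one unnormalized score per vocabulary word.
    Both modes are normalized jointly, so p(w) = p_copy(w) + p_gen(w),
    mirroring p(wt) = pF(wt) + pV(wt) above.
    """
    joint = softmax(np.concatenate([copy_scores, gen_scores]))
    p_copy, p_gen = joint[:len(source_tokens)], joint[len(source_tokens):]
    # Accumulate probability mass per surface word.
    probs = {}
    for tok, p in zip(source_tokens, p_copy):
        probs[tok] = probs.get(tok, 0.0) + p
    for word, p in zip(vocab, p_gen):
        probs[word] = probs.get(word, 0.0) + p
    return probs

rng = np.random.default_rng(0)
source = ["He", "invented", "the", "scale"]
vocab = ["a", "which", "precursor", "the"]
dist = copy_generate_step(source, vocab, rng.normal(size=4), rng.normal(size=4))
print(max(dist, key=dist.get))
```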
3.2.3 Coverage Mechanism

To cope with information loss or redundancy in the generated sentence, the copy histories of previously generated words should be remembered to guide future generation. This can be done through the coverage mechanism (Tu et al., 2016), in which a coverage vector m^t_j is introduced for each word w^F_j in F and updated at each step t as a gated function of h^F_j, αtj, st-1, and m^{t-1}_j. By this means, the coverage vectors remember the historical attention over the source sequence and can be incorporated into the alignment model to generate complete and non-redundant sentences. Detailed formulations can be found in (Tu et al., 2016) and (Sun et al., 2018).

4 Learning the Dual Structure between Knowledge and Natural Language

4.1 Dual Structure between Orator and Logician

In (Sun et al., 2018), an agent L, called Logician, was trained to convert a sentence S into a set of facts F with probability PL(F|S, ΘL), where ΘL is the set of parameters of L:

Logician L: S → PL(F|S, ΘL).

Obviously, the Logician and the Orator can cooperate to supervise each other. Given <S, F> ∈ DSAOKE, the Logician produces a predicted set of facts F* for the sentence S, and the Orator can calculate the probability PO(S|F*, ΘO) of reconstructing S from F*. Intuitively, if F* loses major information of S, honestly reconstructing S from F* would be impossible, and thus the probability PO(S|F*, ΘO) would be small; it is therefore a strong signal for evaluating the quality of F*. Similarly, when the Orator produces a sentence S* for the set of facts F, the probability PL(F|S*, ΘL) provided by the Logician is a strong signal for evaluating the quality of S*. These signals help to conquer several problems of the original agents, including information loss, information redundancy, and non-fluency.
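As a rough sketch of how the two agents score each other's outputs, the snippet below wires up the two reconstruction signals with toy stand-in scoring functions; orator_log_prob and logician_log_prob are hypothetical word-overlap proxies we invented for illustration, not the trained models, whose actual log-probabilities would be used in the real system.

```python
import math

def orator_log_prob(sentence, facts):
    # Toy proxy for log P_O(sentence | facts): rewards word overlap.
    fact_words = set(w for f in facts for w in f.split())
    covered = sum(1 for w in sentence.split() if w in fact_words)
    return math.log(max(covered, 1) / len(sentence.split()))

def logician_log_prob(facts, sentence):
    # Toy proxy for log P_L(facts | sentence), symmetric to the one above.
    sent_words = set(sentence.split())
    covered = sum(1 for f in facts for w in f.split() if w in sent_words)
    total = sum(len(f.split()) for f in facts)
    return math.log(max(covered, 1) / max(total, 1))

# Dual supervision: each agent's output is scored by the other agent.
sentence = "He invented the scale"
extracted_facts = ["He invented scale"]            # F* produced by the Logician
reward_for_logician = orator_log_prob(sentence, extracted_facts)
generated_sentence = "He invented the scale"       # S* produced by the Orator
reward_for_orator = logician_log_prob(["He invented scale"], generated_sentence)
print(reward_for_logician, reward_for_orator)
```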
Note that the supervision signals PO(S|F*, ΘO) and PL(F|S*, ΘL) do not rely on any supervised parallel corpus. Thus, similar to the application of the dual learning paradigm to the NMT task (Xia et al., 2016), it is theoretically possible to use unpaired sentences and sets of facts to compute these signals. However, unsupervised collections of fact groups that can be reasonably narrated in a single sentence are not naturally available. Currently, the only available collection is the sets of facts provided by the SAOKE dataset, where supervised information is available. As a result, we implement our dual learning system in a supervised setting, using a reinforcement learning algorithm to optimize the Orator and the Logician. The involved rewards are described in the next subsection, and the algorithm is detailed in the last subsection.

4.2 Rewards

Given <S, F> ∈ DSAOKE, we sample a set of facts F* from the distribution PL(·|S, ΘL) and a sentence S* from the distribution PO(·|F, ΘO). The following rewards are introduced into the proposed dual learning system; the relationships between them are shown in Figure 1.

[Figure 1: Illustration of the dual learning system of Logician and Orator. The Logician maps a sentence S to candidate fact sets {Fi}, the Orator maps a fact set F to candidate sentences {Sj}, and each candidate collects validity, similarity, and reconstruction rewards.]

4.2.1 Reconstruction Rewards

Following the idea described in the above subsection, we design the reconstruction reward for the Orator as

RO(S*, F) = log PL(F|S*, ΘL),

and that for the Logician as

RL(F*, S) = log PO(S|F*, ΘO).

4.2.2 Similarity Rewards

Since the SAOKE dataset has label information, the similarities between the predicted results and the ground truths can be used as rewards. For the Orator, since S* can be viewed as a summarization of S, we use the ROUGE-L measure (Lin, 2004), widely used in text summarization, to evaluate the quality of S*:

SO(S*, S) = ROUGE-L(S, S*).

For the Logician, we use the following procedure to calculate the similarity between F and F*. First, we compute the similarity between each predicted fact f* ∈ F* and each ground-truth fact f ∈ F with the measure

SimFact(f*, f) = ( Σ_{i=1}^{min(|f*|,|f|)} SimStr(f*_i, f_i) ) / max(|f*|, |f|),

where f*_i and f_i denote the i-th elements of the tuples f* and f, SimStr(·, ·) denotes the gestalt pattern matching measure (Ratcliff and Metzener, 1988) for two strings, and |·| is the cardinality function. Then, each predicted fact in F* is aligned to its corresponding ground-truth fact in F by solving a linear assignment problem (Wikipedia, 2017) that maximizes the sum of similarities between the aligned facts. Finally, the similarity reward for the Logician is calculated by

SL(F*, F) = ( Σ SimFact(f*, f) ) / max(|F*|, |F|),

where f* ∈ F* and f ∈ F range over the aligned pairs of facts.

4.2.3 Validity Rewards

For the Orator, the output is expected to be a valid natural language sentence, so the validity reward can be defined as

VO(S*) = LM(S*),

where LM(·) is a language model. For the Logician, the output should represent a valid collection of facts, which means: 1) the output can be parsed into a collection of facts; 2) there is no duplicated fact (identified by a SimFact value larger than 0.85) in the parsed collection. The validity reward for the Logician is defined as

VL(F*) = 0 if F* is valid; -1 otherwise.
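The similarity rewards above can be approximated with standard Python tooling: difflib implements the Ratcliff-Obershelp gestalt pattern matching used for SimStr, and scipy provides a linear-assignment solver for aligning predicted facts to ground-truth facts. The sketch below follows the SimFact and SL definitions; it is our illustrative reimplementation, not the authors' released code.

```python
from difflib import SequenceMatcher
from scipy.optimize import linear_sum_assignment
import numpy as np

def sim_str(a, b):
    # Gestalt pattern matching (Ratcliff-Obershelp), as implemented by difflib.
    return SequenceMatcher(None, a, b).ratio()

def sim_fact(pred, gold):
    # Element-wise string similarity over aligned tuple positions,
    # normalized by the longer tuple length (SimFact above).
    n = min(len(pred), len(gold))
    return sum(sim_str(pred[i], gold[i]) for i in range(n)) / max(len(pred), len(gold))

def similarity_reward(pred_facts, gold_facts):
    # Align predicted facts to gold facts so the total similarity is maximal,
    # then normalize by the larger set size (the S_L reward above).
    if not pred_facts or not gold_facts:
        return 0.0
    sim = np.array([[sim_fact(p, g) for g in gold_facts] for p in pred_facts])
    rows, cols = linear_sum_assignment(-sim)   # negate to maximize similarity
    return sim[rows, cols].sum() / max(len(pred_facts), len(gold_facts))

gold = [("He", "invented", "the scale"), ("scale", "has", "two fixed points")]
pred = [("scale", "has", "two points"), ("He", "invented", "scale")]
print(round(similarity_reward(pred, gold), 3))
```

Using the assignment solver rather than greedy matching matters when several predicted facts are similar to the same ground-truth fact, since the reward should not count that ground-truth fact twice.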
4.3 Algorithm

For each pair <S, F> ∈ DSAOKE, the following two procedures are performed (details are shown in Algorithm 1).

Algorithm 1: A simple dual-learning algorithm for fact extraction and expression.
Require: a set of sentence-facts pairs {<S, F>}; an initial Logician L and an initial Orator O; beam size K.
repeat
  Learning from sentence to facts:
    1: Sample a sentence-facts pair <S, F>;
    2: The Logician produces K sets of facts F1, ..., FK from S via beam search;
    3: for each set of facts Fi do
    4:   compute the reward r^F_i = α1 RL(Fi, S) + α2 VL(Fi) + α3 SL(Fi, F);
    5: end for
    6: Compute the total reward r = (1/K) Σ_{i=1}^{K} r^F_i;
    7: Compute the stochastic gradient of ΘL: ∇ΘL Ê[r] = (1/K) Σ_{i=1}^{K} r^F_i DΘL(Fi, S);
    8: Compute the stochastic gradient of ΘO: ∇ΘO Ê[r] = (α1/K) Σ_{i=1}^{K} DΘO(S, Fi);
    9: Model updates: ΘL ← ΘL + ηL · ∇ΘL Ê[r]; ΘO ← ΘO + ηO · ∇ΘO Ê[r].
  Learning from facts to sentence:
    1: Sample a sentence-facts pair <S, F>;
    2: The Orator produces K sentences S1, ..., SK from F via beam search;
    3: for each sentence Si do
    4:   compute the reward r^S_i = β1 RO(Si, F) + β2 VO(Si) + β3 SO(Si, S);
    5: end for
    6: Compute the total reward r = (1/K) Σ_{i=1}^{K} r^S_i;
    7: Compute the stochastic gradient of ΘO: ∇ΘO Ê[r] = (1/K) Σ_{i=1}^{K} r^S_i DΘO(Si, F);
    8: Compute the stochastic gradient of ΘL: ∇ΘL Ê[r] = (β1/K) Σ_{i=1}^{K} DΘL(F, Si);
    9: Model updates: ΘO ← ΘO + ηO · ∇ΘO Ê[r]; ΘL ← ΘL + ηL · ∇ΘL Ê[r].
until convergence

4.3.1 Learning from Sentence to Facts

We sample F* from the Logician PL(·|S, ΘL) and calculate the total reward for F* by

rL = α1 RL(F*, S) + α2 VL(F*) + α3 SL(F*, F),

where Σ αi = 1. According to the policy gradient theorem (Sutton et al., 1999), the gradients of the expected reward E[rL] with respect to the parameters of the agents can be computed as

∇ΘL E[rL] = E[rL DΘL(F*, S)],
∇ΘO E[rL] = E[α1 DΘO(S, F*)],

where DΘL(F, S) = ∇ΘL log PL(F|S, ΘL) and DΘO(S, F) = ∇ΘO log PO(S|F, ΘO).

4.3.2 Learning from Facts to Sentence

We sample S* from the Orator PO(·|F, ΘO) and define the total reward for S* by

rO = β1 RO(S*, F) + β2 VO(S*) + β3 SO(S*, S),

where Σ βi = 1. The gradients can be computed as

∇ΘO E[rO] = E[rO DΘO(S*, F)],
∇ΘL E[rO] = E[β1 DΘL(F, S*)].

In practice, we use beam search (Sutskever et al., 2014) to obtain high-quality samples as F* and S*, and estimate the true gradient with the empirical average of the gradients over these samples.
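The gradient estimates in Algorithm 1 are empirical averages over the K beam candidates. The sketch below shows only this aggregation step for one direction, with the per-candidate rewards and score-function gradients passed in as arrays; in a real system those gradients come from back-propagating log PL or log PO through the networks, and the array shapes here are purely illustrative.

```python
import numpy as np

def dual_gradient_estimates(rewards, grads_own, grads_dual, alpha1):
    """Aggregate per-candidate quantities as in one direction of Algorithm 1.

    rewards:    shape (K,)        total reward r_i of each beam candidate.
    grads_own:  shape (K, P_own)  score-function gradient D_Theta of the agent
                                  that produced the candidates (e.g. the Logician).
    grads_dual: shape (K, P_dual) gradient of the reconstruction term w.r.t. the
                                  dual agent's parameters (e.g. the Orator).
    alpha1:     weight of the reconstruction reward.
    """
    K = len(rewards)
    grad_own = (rewards[:, None] * grads_own).sum(axis=0) / K   # (1/K) sum r_i D_Theta
    grad_dual = alpha1 * grads_dual.sum(axis=0) / K             # (alpha1/K) sum D_Theta
    return grad_own, grad_dual

rng = np.random.default_rng(1)
K, P = 3, 5
g_own, g_dual = dual_gradient_estimates(
    rewards=rng.normal(size=K),
    grads_own=rng.normal(size=(K, P)),
    grads_dual=rng.normal(size=(K, P)),
    alpha1=0.4,
)
print(g_own.shape, g_dual.shape)
```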
5 Experimental Results

5.1 Experimental Design

First, we evaluate the performance of each agent fine-tuned by the dual learning procedure on the SAOKE dataset. Then we evaluate the Orator on noisy facts, which accords with real OIN application scenarios. Last, we investigate the behavior of the agents in the dual system. In the experiments, the SAOKE dataset is split into a training set, a validation set, and a test set with ratios of 80%, 10%, and 10%, respectively. For each algorithm involved in the experiments, we perform a grid search to find the optimal hyper-parameters, and the model with the best performance on the validation set is chosen as the learnt model to be evaluated on the test set.

5.1.1 Evaluation Metric

For the Orator, BLEU-4 and ROUGE-L are used to measure how well the output matches the ground-truth sentence. For the Logician, based on the fact-equivalence judgment proposed in (Sun et al., 2018), we compute Precision (P), Recall (R), and F1-score over the test set of the SAOKE dataset as the evaluation metric.

5.1.2 Agent Implementation

For the Orator, we build a vocabulary V of size 72,591 by collecting all web pages from the Baidu Baike website (http://baike.baidu.com, a Chinese alternative to Wikipedia) and keeping the words that occur in more than 100 web pages. The dimension of the embedding vectors is set to Ne = 256, and the dimension of the hidden states is set to Nh = 256. We use a three-layer bi-directional GRU with dimension 128 as the encoder, and all hidden states of the decoder have dimension 256. For the Logician, we implement the model described in (Sun et al., 2018), including the shallow tag information and the gated dependency attention mechanism.

Furthermore, to provide an intuitive comprehension of the OIN task, we implement a rule-based method for the OIN task. For each sequence of facts in the SAOKE dataset, the method first identifies the subsequences in which the facts share the same subject. Then it preserves the subject of the first fact in each subsequence and removes the subjects of the following facts (by replacing them with an empty string); this is necessary because the SAOKE dataset requires the shared subject to be repeated for completeness of the related facts. Finally, each fact is formatted into a string by filling the objects into the placeholders of the predicate, and these strings are concatenated with commas to form the final sentence.
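The following is a minimal sketch of the rule-based baseline's mechanics under our reading of the description above (group consecutive facts by shared subject, blank out repeated subjects, fill objects into predicate placeholders, and join with commas); the authors' exact placeholder-filling rules are not specified, so the details here are illustrative only.

```python
from itertools import groupby

def format_fact(subject, predicate, objects):
    # Fill objects into the predicate's placeholders (e.g. the "X" in "是X的前身"),
    # appending them when no placeholder exists, then prepend the subject.
    text = predicate
    for obj in objects:
        text = text.replace("X", obj, 1) if "X" in text else text + obj
    return subject + text

def rule_based_narrate(facts):
    """facts: list of (subject, predicate, [objects]) in labeling order."""
    pieces = []
    # Group consecutive facts that share the same subject.
    for subject, group in groupby(facts, key=lambda f: f[0]):
        group = list(group)
        # Keep the subject only on the first fact of the run.
        pieces.append(format_fact(subject, group[0][1], group[0][2]))
        pieces.extend(format_fact("", p, o) for _, p, o in group[1:])
    return ",".join(pieces) + "。"

facts = [
    ("营业法人", "具备", ["中专(高中)以上学历"]),
    ("营业法人", "有", ["一定的管理能力"]),
]
print(rule_based_narrate(facts))
```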
5.1.3 Reward Implementation

For the validity reward of the Orator, the language model is trained using an RNN-based method (Mikolov et al., 2010) with the same vocabulary V and the web pages from the Baidu Baike website. For the reconstruction reward of the Orator, since the Logician needs the shallow tag and dependency information of S* as inputs, this information is extracted using the LTP toolset (Che et al., 2010) and then fed to the Logician.

5.1.4 Training

When training the base model of each agent, the batch size is set to 20. When training the two agents with dual learning, the batch size is set to 12 and the beam size to 3. Both agents are trained using stochastic gradient descent (SGD) with the RMSPROP strategy (Hinton et al., 2012) and an early-stopping strategy on the validation set. In dual learning, the hyper-parameters, including αi and βi, are determined by grid search.

5.2 Evaluation of Agents on the SAOKE Dataset

First, we evaluate the performance of the agents optimized by the dual learning method. To identify the contribution of the dual structure, we train another pair of agents with α1 = 0 and β1 = 0 in Algorithm 1 to exclude the dual information. Without the dual information, these two agents are trained independently of each other with reinforcement learning on their own supervised information. We name these two agents R-Logician and R-Orator, where "R" means "Reinforced". In the experimental results of this paper, a superscript mark on a result means that the result is significantly different (with p = 0.05) from the corresponding result of the agent carrying that mark.

Methods         Precision   Recall   F1
Logician∗       0.449       0.400    0.423
R-Logician∓     0.462∗      0.432∗   0.446∗
Logician@Dual   0.494∗∓     0.426∗   0.457∗∓

Table 3: Performance of the Logicians.

The experimental results for the Logician agents are shown in Table 3, from which we can observe a significant performance improvement from Logician to R-Logician, and again from R-Logician to Logician@Dual.
The experimental results for the Orator agents are shown in Table 4. The neural Orator agents significantly outperform the rule-based agent. On both evaluation metrics, R-Orator and Orator@Dual both significantly outperform the original Orator. Orator@Dual significantly outperforms R-Orator on the BLEU-4 score, but is not significantly different on the ROUGE-L score.

Methods        BLEU-4     ROUGE-L
Rule†          0.257      0.434
Orator∗        0.401†     0.556†
R-Orator∓      0.405†∗    0.559†∗
Orator@Dual    0.419†∗∓   0.559†∗

Table 4: Performance of the Orators.

By comparing the performance of the R-agents and the agents@Dual, we can observe that the agents@Dual generally achieve better precision but may recall less information, resulting in smaller gains on the balanced evaluation metrics (F1 and ROUGE-L). This may imply that the agents tend to provide easy input for each other in pursuit of higher accuracy, by neglecting some difficult parts of the problem that they currently cannot handle properly. This interesting phenomenon is the subject of our future research.

5.3 Evaluation of Orator on Noisy Facts

The experiments in the previous subsection show the performance of the Orators when narrating sets of human-labeled facts. In practice, however, the input to the Orator might not be perfect human-labeled facts, but noisy facts automatically extracted by OIE algorithms. In this subsection, we build a collection of sets of noisy facts by feeding the sentences in the test set of the SAOKE dataset to the base Logician model and collecting the outputs. We then evaluate the series of Orator models on these noisy facts and report their performance in Table 5, from which we can see a performance improvement from the Orator to Orator@Dual.

Methods        BLEU-4    ROUGE-L
Orator∗        0.428     0.565
R-Orator∓      0.431∗    0.567∗
Orator@Dual    0.458∗∓   0.572∗∓

Table 5: Performance of the Orators on noisy facts.

5.4 Evaluation of the Dual System

In this section, we investigate the behavior of the agents in the dual system. We first examine the procedure F → S* → F**: for each F in the test set of the SAOKE dataset, we let the Orator narrate it into a sentence S*, then let the Logician extract facts F** from S*, and measure the quality of F** by comparing it with F. We then examine the reverse procedure S → F* → S**, in which the Logician first extracts F* from S and the Orator re-narrates it into S**. The comparison is made between the family of base agents and that of the dual-trained agents. The results are shown in Table 6, and one instance of each of the two experiments is shown in Tables 7 and 8, respectively. From these results, we can observe large improvements in reconstruction quality in both directions.

F → S* → F** (Orator then Logician)
Methods   Precision   Recall   F1
Base∗     0.574       0.488    0.527
Dual      0.657∗      0.565∗   0.608∗

S → F* → S** (Logician then Orator)
Methods   BLEU-4    ROUGE-L
Base∗     0.428     0.565
Dual      0.635∗    0.635∗

Table 6: Reconstruction performance for the Logicians and the Orators.
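The round-trip evaluation loop of this section can be expressed compactly as below. This is a hedged sketch: the logician.extract and orator.narrate interfaces and the metric callables are assumed for illustration and do not correspond to a published API.

```python
def round_trip_scores(pairs, logician, orator, fact_metric, sent_metric):
    """Evaluate both reconstruction directions of Section 5.4.

    pairs:       iterable of (sentence, facts) test items.
    logician:    object with an extract(sentence) -> facts method (assumed API).
    orator:      object with a narrate(facts) -> sentence method (assumed API).
    fact_metric: callable(pred_facts, gold_facts) -> float, e.g. fact-level F1.
    sent_metric: callable(pred_sentence, gold_sentence) -> float, e.g. ROUGE-L.
    """
    fact_scores, sent_scores = [], []
    for sentence, facts in pairs:
        # F -> S* -> F**: narrate the gold facts, re-extract, compare to F.
        reconstructed_facts = logician.extract(orator.narrate(facts))
        fact_scores.append(fact_metric(reconstructed_facts, facts))
        # S -> F* -> S**: extract from the gold sentence, re-narrate, compare to S.
        reconstructed_sentence = orator.narrate(logician.extract(sentence))
        sent_scores.append(sent_metric(reconstructed_sentence, sentence))
    n = max(len(fact_scores), 1)
    return sum(fact_scores) / n, sum(sent_scores) / n
```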
barrier of the absence of an extensive collection of Methods BLEU-4 ROUGE-L reasonable sets of open-domain facts and incorpo- Orator∗ 0.428 0.565 rate unsupervised information into this Logician- R-Orator∓ 0.431∗ 0.567∗ Orator dual learning structures for further im- Orator@Dual 0.458∗∓ 0.572∗∓ provement. Lastly, one can also interested in de- veloping task-oriented rewards for adapting the Table 5: Performance of the Orators on noisy facts. agent to a specific task, for example, the answer generation task for open-domain KBQA system. 2126
Input sentence S: 大综货物吞吐量均保持两位数增长,其中铁矿石吞吐量突破4500万吨,木材吞吐量突破600万立方,均创历史新高,成为全国进口木材第一大港。
S in English: The throughput of all integrated cargoes kept double-digit growth. Among them, the throughput of iron ore exceeded 45 million tons and the throughput of timber exceeded 6 million cubic meters, all of which hit record highs, and the port became the country's largest port of timber imports.

Base model:
  S → F* (Logician): (大综货物吞吐量, 保持, 两位数增长) (铁矿石吞吐量, 突破, 4500万吨) (木材吞吐量, 突破, 600万立方) (创历史, DESC, 新高) (_, 成为, 全国进口木材第一大港)
  S → F* in English: (The throughput of all integrated cargoes, kept, double-digit growth) (Throughput of iron ore, exceeded, 45 million tons) (Throughput of timber, exceeded, 6 million cubic meters) (Hit historical, DESC, new high) (_, become, the country's largest port of timber imports)
  S → F* → S** (Orator): 大综货物吞吐量保持两位数增长,突破4500万吨,突破600万立方,新高,成为全国进口木材第一大港。
  S → F* → S** in English: The throughput of all integrated cargoes kept double-digit growth, breaking 45 million tons, breaking 6 million cubic meters, a new high, became the country's largest port of timber imports.

Dual model:
  S → F* (Logician): (大综货物吞吐量, 均保持, 两位数增长) (铁矿石吞吐量, 突破, 4500万吨) (木材吞吐量, 突破, 600万立方) (_, 均创, 历史新高) (_, 成为, 全国进口木材第一大港)
  S → F* in English: (The throughput of integrated cargoes, all kept, double-digit growth) (Throughput of iron ore, exceeded, 45 million tons) (Throughput of timber, exceeded, 6 million cubic meters) (_, all hit, record highs) (_, become, the country's largest port of timber imports)
  S → F* → S** (Orator): 大综货物吞吐量均保持两位数增长,铁矿石吞吐量突破4500万吨,木材吞吐量突破600万立方,均创历史新高,成为全国进口木材第一大港。
  S → F* → S** in English: The throughput of all integrated cargoes kept double-digit growth. The throughput of iron ore exceeded 45 million tons and the throughput of timber exceeded 6 million cubic meters, all hit a record high, and the port became the country's largest port of timber imports.

Table 7: Examples for sentence reconstruction.
Input facts F: (营业法人, 具备, 中专(高中)以上学历) (营业法人, 有, 一定的[管理|经营]能力) (营业法人, 具备, [较强的服务理念|团队合作精神])
F in English: (Business legal person, is with, technical secondary school (high school) degree or above) (Business legal person, has, certain [managerial | operational] ability) (Business legal person, is with, [strong philosophy of service | teamwork spirit])

Base model:
  F → S* (Orator): 营业法人具备中专(高中)以上学历,有一定的管理和经营能力,具备较强的服务理念和团队合作精神。
  F → S* in English: The business legal person is with a technical secondary school (high school) degree or above, has certain managerial and operational abilities, and is with a strong philosophy of service and teamwork spirit.
  F → S* → F** (Logician): (营业法人具备中专(高中)以上学历, 有, 一定的[管理|经营]能力) (营业法人具备中专的[管理|经营]能力, 具备, 较强的[服务理念|团队合作精神])
  F → S* → F** in English: (Business legal person with technical secondary school (high school) degree or above, has, certain [managerial | operational] ability) (Business legal person with technical secondary school (high school) degree or above, is with, strong [philosophy of service | teamwork spirit])

Dual model:
  F → S* (Orator): 营业法人具备中专(高中)以上学历,营业法人有一定的管理和经营能力,具备较强的服务理念和团队合作精神。
  F → S* in English: The business legal person is with a technical secondary school (high school) degree or above, the business legal person has certain managerial and operational abilities, and is with a strong philosophy of service and teamwork spirit.
  F → S* → F** (Logician): (营业法人, 具备, 中专(高中)以上学历) (营业法人, 有, [一定的管理|经营能力]) (营业法人, 具备, 较强的[服务理念|团队合作精神])
  F → S* → F** in English: (Business legal person, is with, technical secondary school (high school) degree or above) (Business legal person, has, [certain managerial | operational ability]) (Business legal person, is with, strong [philosophy of service | teamwork spirit])

Table 8: Examples for fact reconstruction.
References

Shubham Agarwal and Marc Dymetman. 2017. A Surprisingly Effective Out-of-the-Box Char2char Model on the E2E NLG Challenge Dataset. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 158–163.

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. The Semantic Web, 4825 LNCS:722–735.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for Sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250. ACM.

Wanxiang Che, Zhenghua Li, and Ting Liu. 2010. LTP: A Chinese Language Technology Platform. In Proceedings of Coling, pages 13–16.

Andrew Chisholm, Will Radford, and Ben Hachey. 2017. Learning to Generate One-sentence Biographies from Wikidata. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 1, pages 633–642.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1724–1734.

Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2011. An Analysis of Open Information Extraction based on Semantic Role Labeling. In Proceedings of the Sixth International Conference on Knowledge Capture, pages 113–120. ACM Press.

Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2013. Towards Coherent Multi-Document Summarization. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1163–1173.

Janara Christensen, Stephen Soderland, and Gagan Bansal. 2014. Hierarchical Summarization: Scaling Up Multi-Document Summarization. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 902–912.

Nan Duan, Duyu Tang, Peng Chen, and Ming Zhou. 2017. Question Generation for Question Answering. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 866–874.

Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. 2011. Open Information Extraction: The Second Generation. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 3–10.

Anthony Fader, Luke S. Zettlemoyer, and Oren Etzioni. 2014. Open Question Answering Over Curated and Extracted Knowledge Bases. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1156–1165.

Federico Gaspari. 2006. Look Who's Translating: Impersonations, Chinese Whispers and Fun with Machine Translation on the Internet. In EAMT-2006: 11th Annual Conference of the European Association for Machine Translation, pages 19–20.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1631–1640.

Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. 2012. Overview of Mini-batch Gradient Descent. Technical report.

Nanda Kambhatla. 2004. Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations. In Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions.

Tushar Khot, Ashish Sabharwal, and Peter Clark. 2017. Answering Complex Questions Using Open Information Extraction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 311–316.

Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke Zettlemoyer. 2017. Neural AMR: Sequence-to-Sequence Models for Parsing and Generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 146–157.

C. Y. Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the ACL 2004 Workshop on Text Summarization Branches Out, pages 25–26.

Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural Relation Extraction with Selective Attention over Instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 2124–2133.

Mausam. 2016. Open Information Extraction Systems and Downstream Applications. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, pages 4074–4077.

Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. 2010. Recurrent Neural Network based Language Model. In Proceedings of Interspeech, pages 1045–1048.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant Supervision for Relation Extraction without Labeled Data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing, volume 2, page 1003. Association for Computational Linguistics.

Makoto Miwa and Mohit Bansal. 2016. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1105–1116.

Harinder Pal and Mausam. 2016. Demonyms and Compound Relational Nouns in Nominal Open IE. In Proceedings of the 5th Workshop on Automated Knowledge Base Construction, pages 35–39.

John W. Ratcliff and David E. Metzener. 1988. Pattern Matching: The Gestalt Approach. Dr. Dobb's Journal, 13(7).

Mrinmaya Sachan and Eric Xing. 2018. Self-Training for Jointly Learning to Ask and Answer Questions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 629–640.

Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. 2012. Open Language Learning for Information Extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 523–534.

Tomohiro Shigenobu. 2007. Evaluation and Usability of Back Translation for Intercultural Communication. In International Conference on Usability and Internationalization, pages 259–265. Springer.

Gabriel Stanovsky, Ido Dagan, and Mausam. 2015. Open IE as an Intermediate Structure for Semantic Tasks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 303–308.

Mingming Sun, Xu Li, Xin Wang, Miao Fan, Yue Feng, and Ping Li. 2018. Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 556–564. ACM Press.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems 27.

Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Advances in Neural Information Processing Systems 12, pages 1057–1063.

Duyu Tang, Nan Duan, Tao Qin, Zhao Yan, and Ming Zhou. 2017. Question Answering and Question Generation as Dual Tasks. arXiv preprint arXiv:1706.02027.

Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling Coverage for Neural Machine Translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 76–85.

Menno Van Zaanen and Simon Zwarts. 2006. Unsupervised Measurement of Translation Quality using Multi-engine, Bi-directional Translation. In Australasian Joint Conference on Artificial Intelligence, pages 1208–1214. Springer.

Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare, and Elena Simperl. 2017. Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples. Journal of Web Semantics: Science, Services and Agents on the World Wide Web.

Wikipedia. 2017. Assignment Problem. Wikipedia, The Free Encyclopedia.

Sam Wiseman, Stuart M. Shieber, and Alexander M. Rush. 2017. Challenges in Data-to-Document Generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2243–2253.

Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. 2016. Dual Learning for Machine Translation. In Advances in Neural Information Processing Systems 29.

Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, and Tie-Yan Liu. 2017. Dual Supervised Learning. In Proceedings of the 34th International Conference on Machine Learning.

Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, and Xiaoming Li. 2016. Neural Generative Question Answering. In Proceedings of the 2016 NAACL Human-Computer Question Answering Workshop, pages 36–42.

Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 3:1083–1106.

Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. 2017. Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1227–1236.