Natural Language Generation as Planning Under Uncertainty for Spoken Dialogue Systems
Verena Rieser (School of Informatics, University of Edinburgh, vrieser@inf.ed.ac.uk)
Oliver Lemon (School of Informatics, University of Edinburgh, olemon@inf.ed.ac.uk)

Proceedings of the 12th Conference of the European Chapter of the ACL, pages 683–691, Athens, Greece, 30 March – 3 April 2009. © 2009 Association for Computational Linguistics.

Abstract

We present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser). We study its use in a standard NLG problem: how to present information (in this case a set of search results) to users, given the complex trade-offs between utterance length, amount of information conveyed, and cognitive load. We set these trade-offs by analysing existing MATCH data. We then train an NLG policy using Reinforcement Learning (RL), which adapts its behaviour to noisy feedback from the current generation context. This policy is compared to several baselines derived from previous work in this area. The learned policy significantly outperforms all the prior approaches.

1 Introduction

Natural language allows us to achieve the same communicative goal ("what to say") using many different expressions ("how to say it"). In a Spoken Dialogue System (SDS), an abstract communicative goal (CG) can be generated in many different ways. For example, the CG to present database results to the user can be realized as a summary (Polifroni and Walker, 2008; Demberg and Moore, 2006), or by comparing items (Walker et al., 2004), or by picking one item and recommending it to the user (Young et al., 2007).

Previous work has shown that it is useful to adapt the generated output to certain features of the dialogue context, for example user preferences, e.g. (Walker et al., 2004; Demberg and Moore, 2006), user knowledge, e.g. (Janarthanam and Lemon, 2008), or predicted TTS quality, e.g. (Nakatsu and White, 2006).

In extending this previous work we treat NLG as a statistical sequential planning problem, analogously to current statistical approaches to Dialogue Management (DM), e.g. (Singh et al., 2002; Henderson et al., 2008; Rieser and Lemon, 2008a), and "conversation as action under uncertainty" (Paek and Horvitz, 2000). In NLG we have similar trade-offs and unpredictability as in DM, and in some systems the content planning and DM tasks overlap. Clearly, very long system utterances with many actions in them are to be avoided, because users may become confused or impatient, but each individual NLG action will convey some (potentially) useful information to the user. There is therefore an optimization problem to be solved. Moreover, the user judgements or next (most likely) action after each NLG action are unpredictable, and the behaviour of the surface realizer may also be variable (see Section 6.2).

NLG could therefore fruitfully be approached as a sequential statistical planning task, where there are trade-offs and decisions to make, such as whether to choose another NLG action (and which one to choose) or to instead stop generating. Reinforcement Learning (RL) allows us to optimize such trade-offs in the presence of uncertainty, i.e. the chance of achieving a better state, while engaging in the risk of choosing another action.

In this paper we present and evaluate a new model for NLG in Spoken Dialogue Systems as planning under uncertainty. In Section 2 we argue for applying RL to NLG problems and explain the overall framework. In Section 3 we discuss challenges for NLG for Information Presentation. In Section 4 we present results from our analysis of the MATCH corpus (Walker et al., 2004). In Section 5 we present a detailed example of our proposed NLG method. In Section 6 we report on experimental results using this framework for exploring Information Presentation policies. In Section 7 we conclude and discuss future directions.
2 NLG as planning under uncertainty

We adopt the general framework of NLG as planning under uncertainty (see (Lemon, 2008) for the initial version of this approach). Some aspects of NLG have been treated as planning, e.g. (Koller and Stone, 2007; Koller and Petrick, 2008), but never before as statistical planning.

NLG actions take place in a stochastic environment, for example consisting of a user, a realizer, and a TTS system, where the individual NLG actions have uncertain effects on the environment: for example, presenting differing numbers of attributes to the user makes the user more or less likely to choose an item, as shown by (Rieser and Lemon, 2008b) for multimodal interaction.

Most SDS employ fixed template-based generation. Our goal, however, is to employ a stochastic realizer for SDS, see for example (Stent et al., 2004). This will introduce additional noise, which higher-level NLG decisions will need to react to. In our framework, the NLG component must achieve a high-level Communicative Goal from the Dialogue Manager (e.g. to present a number of items) through planning a sequence of lower-level generation steps or actions, for example first to summarize all the items and then to recommend the highest-ranking one. Each such action has unpredictable effects due to the stochastic realizer. For example the realizer might employ 6 attributes when recommending item i4, but it might use only 2 (e.g. price and cuisine for restaurants), depending on its own processing constraints (see e.g. the realizer used to collect the MATCH project data). Likewise, the user may be likely to choose an item after hearing a summary, or they may wish to hear more. Generating appropriate language in context (e.g. attributes presented so far) thus has the following important features in general:

• NLG is goal-driven behaviour
• NLG must plan a sequence of actions
• each action changes the environment state or context
• the effect of each action is uncertain.

These facts make it clear that the problem of planning how to generate an utterance falls naturally into the class of statistical planning problems, rather than rule-based approaches such as (Moore et al., 2004; Walker et al., 2004), or supervised learning as explored in previous work, such as classifier learning and re-ranking, e.g. (Stent et al., 2004; Oh and Rudnicky, 2002). Supervised approaches involve the ranking of a set of completed plans/utterances and as such cannot adapt online to the context or the user. Reinforcement Learning (RL) provides a principled, data-driven optimisation framework for our type of planning problem (Sutton and Barto, 1998).
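To make this planning view concrete, the sketch below is our own minimal illustration (not code from the paper) of a single NLG step with uncertain effects; the class name, action names, and probabilities are hypothetical placeholders for the user/realizer/TTS environment described above.

```python
import random
from dataclasses import dataclass

NLG_ACTIONS = ["SUMMARY", "COMPARE", "RECOMMEND", "stop"]

@dataclass
class GenerationContext:
    attributes: int = 0            # attributes realised so far
    sentences: int = 0             # sentences generated so far
    user_reaction: str = "silent"  # silent / choose / more_constraints / quit

def nlg_step(ctx: GenerationContext, action: str) -> GenerationContext:
    """Apply one NLG action; its effect on the context is stochastic."""
    if action == "stop":
        return ctx
    # The realizer may verbalise more or fewer attributes than intended ...
    ctx.attributes += random.choice([2, 3, 4, 6])
    ctx.sentences += random.choice([1, 2, 3])
    # ... and the user may react before the system has finished presenting.
    ctx.user_reaction = random.choices(
        ["silent", "choose", "more_constraints", "quit"],
        weights=[0.5, 0.2, 0.2, 0.1])[0]
    return ctx

ctx = GenerationContext()
for act in ["SUMMARY", "RECOMMEND", "stop"]:
    ctx = nlg_step(ctx, act)
print(ctx)
```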
3 The Information Presentation Problem

We will tackle the well-studied problem of Information Presentation in NLG to show the benefits of this approach. The task here is to find the best way to present a set of search results to a user (e.g. some restaurants meeting a certain set of constraints). This is a task common to much prior work in NLG, e.g. (Walker et al., 2004; Demberg and Moore, 2006; Polifroni and Walker, 2008).

In this problem, there are many decisions available for exploration. For instance, which presentation strategy to apply (NLG strategy selection), how many attributes of each item to present (attribute selection), how to rank the items and attributes according to different models of user preferences (attribute ordering), how many (specific) items to tell them about (conciseness), how many sentences to use when doing so (syntactic planning), and which words to use (lexical choice), etc. All these parameters (and potentially many more) can be varied, and ideally, jointly optimised based on user judgements.

We had two corpora available to study some of the regions of this decision space. We utilise the MATCH corpus (Walker et al., 2004) to extract an evaluation function (also known as a "reward function") for RL. Furthermore, we utilise the SPaRKy corpus (Stent et al., 2004) to build a high-quality stochastic realizer. Both corpora contain data from "overhearer" experiments targeted to Information Presentation in dialogues in the restaurant domain. While we are ultimately interested in how hearers engaged in dialogues judge different Information Presentations, results from overhearers are still directly relevant to the task.

4 MATCH corpus analysis

The MATCH project made two data sets available, see (Stent et al., 2002) and (Whittaker et al., 2003), which we combine to define an evaluation function for different Information Presentation strategies.
Table 1: NLG strategies present in the MATCH corpus, with average number of attributes and sentences as found in the data.

strategy | example | av. #attr | av. #sentence
SUMMARY | "The 4 restaurants differ in food quality, and cost." (#attr = 2, #sentence = 1) | 2.07 ± .63 | 1.56 ± .5
COMPARE | "Among the selected restaurants, the following offer exceptional overall value. Aureole's price is 71 dollars. It has superb food quality, superb service and superb decor. Daniel's price is 82 dollars. It has superb food quality, superb service and superb decor." (#attr = 4, #sentence = 5) | 3.2 ± 1.5 | 5.5 ± 3.11
RECOMMEND | "Le Madeleine has the best overall value among the selected restaurants. Le Madeleine's price is 40 dollars and It has very good food quality. It's in Midtown West." (#attr = 3, #sentence = 3) | 2.4 ± .7 | 3.5 ± .53

The first data set, see (Stent et al., 2002), comprises 1024 ratings by 16 subjects (where we only use the speech-based half, n = 512) on the following presentation strategies: RECOMMEND, COMPARE, SUMMARY. These strategies are realized using templates as in Table 1, and varying numbers of attributes. In this study the users rate the individual presentation strategies as significantly different (F(2) = 1361, p < .001). We find that SUMMARY is rated significantly worse (with Bonferroni correction) than RECOMMEND and COMPARE, which are rated as equally good.

This suggests that one should never generate a SUMMARY. However, SUMMARY has different qualities from COMPARE and RECOMMEND, as it gives users a general overview of the domain, and probably helps the user to feel more confident when choosing an item, especially when they are unfamiliar with the domain, as shown by (Polifroni and Walker, 2008).

In order to further describe the strategies, we extracted different surface features as present in the data (e.g. number of attributes realised, number of sentences, number of words, number of database items talked about, etc.) and performed a stepwise linear regression to find the features which were important to the overhearers (following the PARADISE framework (Walker et al., 2000)). We discovered a trade-off between the length of the utterance (#sentence) and the number of attributes realised (#attr), i.e. its informativeness, where overhearers like to hear as many attributes as possible in the most concise way, as indicated by the regression model shown in Equation 1 (R² = .34).¹

score = .775 × #attr + (−.301) × #sentence    (1)

¹ For comparison: (Walker et al., 2000) report an R² between .4 and .5 on a slightly larger data set.

The second MATCH data set, see (Whittaker et al., 2003), comprises 1224 ratings by 17 subjects on the NLG strategies RECOMMEND and COMPARE. The strategies realise varying numbers of attributes according to different "conciseness" values: concise (1 or 2 attributes), average (3 or 4), and verbose (4, 5, or 6). Overhearers rate all conciseness levels as significantly different (F(2) = 198.3, p < .001), with verbose rated highest and concise rated lowest, supporting our findings in the first data set. However, the relation between number of attributes and user ratings is not strictly linear: ratings drop for #attr = 6. This suggests that there is an upper limit on how many attributes users like to hear. We expect this to be especially true for real users engaged in actual dialogue interaction, see (Winterboer et al., 2007). We therefore include "cognitive load" as a variable when training the policy (see Section 6).
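The coefficients in Equation 1 above come from this regression on surface features. The short sketch below is our illustration of the fitting step only, using a handful of made-up (#attr, #sentence, rating) tuples in place of the MATCH ratings and plain least squares rather than the stepwise procedure.

```python
import numpy as np

# Hypothetical stand-ins for the MATCH overhearer data:
# each row is (#attr, #sentence) for one presentation, y holds its rating.
X = np.array([[2, 1], [4, 5], [3, 3], [6, 8], [5, 4], [1, 1]], dtype=float)
y = np.array([3.2, 2.5, 3.0, 2.2, 3.6, 2.8])

# Fit rating ~ w_attr * #attr + w_sent * #sentence + bias by least squares,
# analogous to the regression behind Equation 1 (score = .775*#attr - .301*#sentence).
A = np.column_stack([X, np.ones(len(X))])
w_attr, w_sent, bias = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"score ~ {w_attr:.3f} * #attr + {w_sent:.3f} * #sentence + {bias:.3f}")
```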
In addition to the trade-off between length and informativeness for single NLG strategies, we are interested in whether this trade-off will also hold for generating sequences of NLG actions. (Whittaker et al., 2002), for example, generate a combined strategy where first a SUMMARY is used to describe the retrieved subset and then they RECOMMEND one specific item/restaurant. For example: "The 4 restaurants are all French, but differ in food quality, and cost. Le Madeleine has the best overall value among the selected restaurants. Le Madeleine's price is 40 dollars and It has very good food quality. It's in Midtown West."

We therefore extend the set of possible strategies present in the data for exploration: we allow ordered combinations of the strategies, assuming that only COMPARE or RECOMMEND can follow a SUMMARY, and that only RECOMMEND can follow a COMPARE, resulting in 7 possible actions:

1. RECOMMEND
2. COMPARE
3. SUMMARY
4. COMPARE+RECOMMEND
5. SUMMARY+RECOMMEND
6. SUMMARY+COMPARE
7. SUMMARY+COMPARE+RECOMMEND

[Figure 1: Possible NLG policies (X = stop generation)]

We then analytically solved the regression model in Equation 1 for the 7 possible strategies using average values from the MATCH data. This is solved by a system of linear inequalities. According to this model, the best-ranking strategy is to do all the presentation strategies in one sequence, i.e. SUMMARY+COMPARE+RECOMMEND. However, this analytic solution assumes a "one-shot" generation strategy where there is no intermediate feedback from the environment: users are simply static overhearers (they cannot "barge in", for example), there is no variation in the behaviour of the surface realizer (i.e. one would use fixed templates as in MATCH), and the user has unlimited cognitive capabilities. These assumptions are not realistic, and must be relaxed. In the next Section we describe a worked-through example of the overall framework.
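Before that, the analytic ranking itself can be reproduced with a few lines of arithmetic. The sketch below (our illustration, not the authors' code) plugs the Table 1 averages into Equation 1 for each of the 7 sequences, summing the attribute and sentence counts of the components, which is one reading of the "one-shot" solution described above.

```python
# Average (#attr, #sentence) per single strategy, taken from Table 1.
AVERAGES = {"SUMMARY": (2.07, 1.56), "COMPARE": (3.2, 5.5), "RECOMMEND": (2.4, 3.5)}

SEQUENCES = [
    ("RECOMMEND",), ("COMPARE",), ("SUMMARY",),
    ("COMPARE", "RECOMMEND"), ("SUMMARY", "RECOMMEND"),
    ("SUMMARY", "COMPARE"), ("SUMMARY", "COMPARE", "RECOMMEND"),
]

def one_shot_score(sequence):
    """Equation 1 applied to the summed averages of a fixed action sequence."""
    attrs = sum(AVERAGES[a][0] for a in sequence)
    sents = sum(AVERAGES[a][1] for a in sequence)
    return 0.775 * attrs - 0.301 * sents

for seq in sorted(SEQUENCES, key=one_shot_score, reverse=True):
    print(f"{'+'.join(seq):30s} {one_shot_score(seq):+.2f}")
# SUMMARY+COMPARE+RECOMMEND scores highest, but only under the static
# "one-shot" assumptions that Section 5 relaxes.
```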
5 Method: the RL-NLG model

For the reasons discussed above, we treat the NLG module as a statistical planner, operating in a stochastic environment, and optimise it using Reinforcement Learning. The input to the module is a Communicative Goal supplied by the Dialogue Manager. The CG consists of a Dialogue Act to be generated, for example present_items(i1, i2, i5, i8), and a System Goal (SysGoal) which is the desired user reaction, e.g. to make the user choose one of the presented items (user_choose_one_of(i1, i2, i5, i8)). The RL-NLG module must plan a sequence of lower-level NLG actions that achieve the goal (at lowest cost) in the current context. The context consists of a user (who may remain silent, supply more constraints, choose an item, or quit), and variation from the sentence realizer described above.

Now let us walk through one simple utterance plan as carried out by this model, as shown in Table 2. Here, we start with the CG present_items(i1, i2, i5, i8) & user_choose_one_of(i1, i2, i5, i8) from the system's DM. This initialises the NLG state (init). The policy chooses the action SUMMARY and this transitions us to state s1, where we observe that 4 attributes and 1 sentence have been generated, and the user is predicted to remain silent. In this state, the current NLG policy is to RECOMMEND the top-ranked item (i5, for this user), which takes us to state s2, where 8 attributes have been generated in a total of 4 sentences, and the user chooses an item. The policy holds that in states like s2 the best thing to do is "stop" and pass the turn to the user. This takes us to the state end, where the total reward of this action sequence is computed (see Section 6.3), and used to update the NLG policy in each of the visited state-action pairs via back-propagation.

[Figure 2: Example RL-NLG action sequence for Table 2. Actions: summarise → recommend → stop; states: init (GOAL) → s1 (atts=4, user=silent) → s2 (atts=8, user=choose) → end (Reward).]

Table 2: Example utterance planning sequence for Figure 2.

State | Action | State change/effect
init | SysGoal: present_items(i1, i2, i5, i8) & user_choose_one_of(i1, i2, i5, i8) | initialise state
s1 | RL-NLG: SUMMARY(i1, i2, i5, i8) | att=4, sent=1, user=silent
s2 | RL-NLG: RECOMMEND(i5) | att=8, sent=4, user=choose(i5)
end | RL-NLG: stop | calculate Reward

6 Experiments

We now report on a proof-of-concept study where we train our policy in a simulated learning environment based on the results from the MATCH corpus analysis in Section 4. Simulation-based RL allows us to explore unseen actions which are not in the data, and thus less initial data is needed (Rieser and Lemon, 2008b). Note that we cannot directly learn from the MATCH data, as for that we would need data from an interactive dialogue. We are currently collecting such data in a Wizard-of-Oz experiment.

6.1 User simulation

User simulations are commonly used to train strategies for Dialogue Management, see for example (Young et al., 2007). A user simulation for NLG is very similar, in that it is a predictive model of the most likely next user act. However, this user act does not actually change the overall dialogue state (e.g. by filling slots); it only changes the generator state. In other words, the NLG user simulation tells us what the user is most likely to do next, if we were to stop generating now. It also tells us the probability that the user chooses to "barge in" after a system NLG action (by either choosing an item or providing more information).

The user simulation for this study is a simple bi-gram model, which relates the number of attributes presented to the next likely user actions, see Table 3. The user can either follow the goal provided by the DM (SysGoal), for example choosing an item. The user can also do something else (userElse), e.g. provide another constraint, or the user can quit (userQuit).

For simplification, we discretise the number of attributes into concise-average-verbose, reflecting the conciseness values from the MATCH data, as described in Section 4. In addition, we assume that the user's cognitive abilities are limited ("cognitive load"), based on the results from the second MATCH data set in Section 4. Once the number of attributes is more than the "magic number 7" (reflecting psychological results on short-term memory (Baddeley, 2001)), the user is more likely to become confused and quit.
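A minimal rendering of this bi-gram user simulation might look as follows. This is our sketch: the probabilities are taken from Table 3 (given as percentages), while the conciseness thresholds are our reading of Section 4 rather than the exact cut-offs used.

```python
import random

# P(next user act | conciseness of the content presented so far), from Table 3 (%).
USER_MODEL = {
    "concise": {"SysGoal": 20, "userElse": 60, "userQuit": 20},
    "average": {"SysGoal": 60, "userElse": 20, "userQuit": 20},
    "verbose": {"SysGoal": 20, "userElse": 20, "userQuit": 60},
}

def conciseness(n_attributes: int) -> str:
    """Discretise the attribute count; high counts fall into 'verbose', where the
    overloaded user (beyond the 'magic number 7') is most likely to quit."""
    if n_attributes <= 2:
        return "concise"
    if n_attributes <= 4:
        return "average"
    return "verbose"

def sample_user_act(n_attributes: int) -> str:
    dist = USER_MODEL[conciseness(n_attributes)]
    acts, weights = zip(*dist.items())
    return random.choices(acts, weights=weights)[0]

print(sample_user_act(8))  # most often 'userQuit' once the user is overloaded
```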
The probabilities in Table 3 are currently manually set heuristics. We are currently conducting a Wizard-of-Oz study in order to learn these probabilities (and other user parameters) from real data.

Table 3: NLG bi-gram user simulation.

 | SysGoal | userElse | userQuit
concise | 20.0 | 60.0 | 20.0
average | 60.0 | 20.0 | 20.0
verbose | 20.0 | 20.0 | 60.0

6.2 Realizer model

The sequential NLG model assumes a realizer, which updates the context after each generation step (i.e. after each single action). We estimate the realiser's parameters from the mean values we found in the MATCH data (see Table 1). For this study we first (randomly) vary the number of attributes, whereas the number of sentences is fixed (see Table 4). In current work we replace the realizer model with an implemented generator that replicates the variation found in the SPaRKy realizer (Stent et al., 2004).

Table 4: Realizer parameters.

 | #attr | #sentence
SUMMARY | 1 or 2 | 2
COMPARE | 3 or 4 | 6
RECOMMEND | 2 or 3 | 3
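The realizer model of Table 4 can be written down directly; the sketch below is our illustration, sampling the attribute count and keeping the sentence count fixed per strategy.

```python
import random

# Realizer parameters from Table 4: (possible #attr values, fixed #sentence).
REALIZER = {
    "SUMMARY":   ([1, 2], 2),
    "COMPARE":   ([3, 4], 6),
    "RECOMMEND": ([2, 3], 3),
}

def realise(action):
    """Return (#attributes, #sentences) produced for one NLG action."""
    attr_choices, n_sentences = REALIZER[action]
    return random.choice(attr_choices), n_sentences

attrs, sents = realise("COMPARE")
print(f"COMPARE realised with {attrs} attributes in {sents} sentences")
```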
6.3 Reward function

The reward function defines the final goal of the utterance generation sequence. In this experiment the reward is a function of the various data-driven trade-offs as identified in the data analysis in Section 4: utterance length and number of provided attributes, as weighted by the regression model in Equation 1, as well as the next predicted user action. Since we currently only have overhearer data, we manually estimate the reward for the next most likely user act, to supplement the data-driven model. If in the end state the next most likely user act is userQuit, the learner gets a penalty of −100; userElse receives 0 reward; and SysGoal gains +100 reward. Again, these hand-coded scores need to be refined by a more targeted data collection, but the other components of the reward function are data-driven.

Note that RL learns to "make compromises" with respect to the different trade-offs. For example, the user is less likely to choose an item if there are more than 7 attributes, but the realizer can generate 9 attributes. However, in some contexts it might be desirable to generate all 9 attributes, e.g. if the generated utterance is short. Threshold-based approaches, in contrast, cannot (easily) reason with respect to the current content.

6.4 State and Action Space

We now formulate the problem as a Markov Decision Process (MDP), relating states to actions. Each state-action pair is associated with a transition probability, which is the probability of moving from state s at time t to state s′ at time t + 1 after having performed action a when in state s. This transition probability is computed by the environment model (i.e. user and realizer), and explicitly captures noise/uncertainty in the environment. This is a major difference from other, non-statistical planning approaches. Each transition is also associated with a reinforcement signal (or reward) r_{t+1} describing how good the result of action a was when performed in state s.

The state space comprises 9 binary features representing the number of attributes, 2 binary features representing whether the user's predicted next action is to follow the system goal or to quit, as well as a discrete feature reflecting the number of sentences generated so far, as shown in Figure 3. This results in 2^11 × 6 = 12,288 distinct generation states. We trained the policy using the well-known SARSA algorithm with linear function approximation (Sutton and Barto, 1998). The policy was trained for 3600 simulated NLG sequences. In future work we plan to learn lower-level decisions, such as lexical adaptation based on the vocabulary used by the user.

[Figure 3: State-Action space for RL-NLG. State features: attributes 1-9 (0,1), sentence (1-11), userGoal (0,1), userQuit (0,1); actions: SUMMARY, COMPARE, RECOMMEND, end.]
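The learning setup of Sections 6.3-6.4 can be sketched as follows. This is our simplified illustration rather than the authors' implementation: the exact feature encoding (here a unary coding of the attribute count plus a scaled sentence count) and the hyperparameters are assumptions, while the action set, the reward components, and the SARSA update with linear function approximation follow the text.

```python
import random
import numpy as np

ACTIONS = ["SUMMARY", "COMPARE", "RECOMMEND", "end"]

def encode(n_attr, n_sent, user_goal, user_quit):
    """12 features: 9 attribute bits, scaled sentence count, 2 user-prediction bits."""
    attr_bits = [1.0 if i < n_attr else 0.0 for i in range(9)]
    return np.array(attr_bits + [n_sent / 11.0, float(user_goal), float(user_quit)])

def reward(n_attr, n_sent, user_act):
    """Section 6.3: the Equation-1 trade-off plus the user-act bonus/penalty."""
    bonus = {"SysGoal": 100.0, "userElse": 0.0, "userQuit": -100.0}[user_act]
    return 0.775 * n_attr - 0.301 * n_sent + bonus

class LinearSarsa:
    def __init__(self, n_features, alpha=0.01, gamma=0.99, epsilon=0.1):
        self.w = {a: np.zeros(n_features) for a in ACTIONS}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def q(self, phi, action):
        return float(self.w[action] @ phi)

    def act(self, phi):
        if random.random() < self.epsilon:        # epsilon-greedy exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q(phi, a))

    def update(self, phi, a, r, phi_next, a_next, done):
        """SARSA update on the linear weights of the chosen action."""
        target = r if done else r + self.gamma * self.q(phi_next, a_next)
        self.w[a] += self.alpha * (target - self.q(phi, a)) * phi

agent = LinearSarsa(n_features=12)
phi = encode(n_attr=4, n_sent=1, user_goal=0, user_quit=0)
print(agent.act(phi))
```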
6.5 Baselines

We derive the baseline policies from Information Presentation strategies as deployed by current dialogue systems. In total we utilise 7 different baselines (B1-B7), which correspond to single branches in our policy space (see Figure 1):

B1: RECOMMEND only, e.g. (Young et al., 2007)
B2: COMPARE only, e.g. (Henderson et al., 2008)
B3: SUMMARY only, e.g. (Polifroni and Walker, 2008)
B4: SUMMARY followed by RECOMMEND, e.g. (Whittaker et al., 2002)
B5: Randomly choosing between COMPARE and RECOMMEND, e.g. (Walker et al., 2007)
B6: Randomly choosing between all 7 outputs
B7: Always generating the whole sequence, i.e. SUMMARY+COMPARE+RECOMMEND, as suggested by the analytic solution (see Section 4).

6.6 Results

We analyse the test runs (n = 200) using an ANOVA with a post-hoc T-test (with Bonferroni correction). RL significantly (p < .001) outperforms all baselines in terms of final reward, see Table 5. RL is the only policy which significantly improves the next most likely user action by adapting to features in the current context. In contrast to conventional approaches, RL learns to 'control' its environment according to the estimated transition probabilities and the associated rewards.

Table 5: Evaluation results (p < .001).

policy | reward (± std)
B1 | 99.1 (± 129.6)
B2 | 90.9 (± 142.2)
B3 | 65.5 (± 137.3)
B4 | 176.0 (± 154.1)
B5 | 95.9 (± 144.9)
B6 | 168.8 (± 165.3)
B7 | 229.3 (± 157.1)
RL | 310.8 (± 136.1)

The learnt policy can be described as follows: it either starts with SUMMARY or COMPARE after the init state, i.e. it learnt never to start with a RECOMMEND. It stops generating after COMPARE if the userGoal is (probably) reached (e.g. the user is most likely to choose an item in the next turn, which depends on the number of attributes generated); otherwise it goes on and generates a RECOMMEND. If it starts with SUMMARY, it always generates a COMPARE afterwards. Again, it stops if the userGoal is (probably) reached; otherwise it generates the full sequence (which corresponds to the analytic solution B7).

The analytic solution B7 performs second best, and significantly outperforms all the other baselines (p < .01). Still, it is significantly worse (p < .001) than the learnt policy, as this 'one-shot' strategy cannot robustly and dynamically adapt to noise or changes in the environment.

In general, generating sequences of NLG actions rates higher than generating single actions only: B4 and B6 rate directly after RL and B7, while B1, B2, B3, B5 are all equally bad given our data-driven definition of reward and environment. Furthermore, the simulated environment allows us to replicate the results in the MATCH corpus (see Section 4) when only comparing single strategies: SUMMARY performs significantly worse, while RECOMMEND and COMPARE perform equally well.
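Read as explicit rules, the learnt behaviour described in this section corresponds roughly to the following sketch; this is our paraphrase of the prose, not an extract of the learnt Q-function, and user_goal_reached stands in for the predicted-user-action features.

```python
def learnt_policy(history, user_goal_reached):
    """Rule-like paraphrase of the learnt policy described in Section 6.6."""
    if not history:
        return ["SUMMARY", "COMPARE"]   # it never starts with RECOMMEND
    last = history[-1]
    if last == "SUMMARY":
        return ["COMPARE"]              # a SUMMARY is always followed by a COMPARE
    if last == "COMPARE":
        return ["end"] if user_goal_reached else ["RECOMMEND"]
    return ["end"]                      # after RECOMMEND: stop and pass the turn

print(learnt_policy(["SUMMARY", "COMPARE"], user_goal_reached=True))  # ['end']
```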
7 Conclusion

We presented and evaluated a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning. After motivating and presenting the model, we studied its use in Information Presentation.

We derived a data-driven model predicting users' judgements of different information presentation actions (reward function), via a regression analysis on MATCH data. We used this regression model to set weights in a reward function for Reinforcement Learning, and so optimized a context-adaptive presentation policy. The learnt policy was compared to several baselines derived from previous work in this area, and it significantly outperforms all the baselines.

There are many possible extensions to this model, e.g. using the same techniques to jointly optimise choosing the number of attributes, aggregation, word choice, referring expressions, and so on, in a hierarchical manner.

We are currently collecting data in targeted Wizard-of-Oz experiments, to derive a fully data-driven training environment and test the learnt policy with real users, following (Rieser and Lemon, 2008b). The trained NLG strategy will also be integrated in an end-to-end statistical system within the CLASSiC project (www.classic-project.org).

Acknowledgments

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 216594 (CLASSiC project: www.classic-project.org) and from the EPSRC, project no. EP/E019501/1.

References

A. Baddeley. 2001. Working memory and language: an overview. Journal of Communication Disorders, 36(3):189–208.

Vera Demberg and Johanna D. Moore. 2006. Information presentation in spoken dialogue systems. In Proceedings of EACL.

James Henderson, Oliver Lemon, and Kallirroi Georgila. 2008. Hybrid reinforcement/supervised learning of dialogue policies from fixed datasets. Computational Linguistics (to appear).

Srinivasan Janarthanam and Oliver Lemon. 2008. User simulations for online adaptation and knowledge-alignment in troubleshooting dialogue systems. In Proceedings of SEMdial.

Alexander Koller and Ronald Petrick. 2008. Experiences with planning for natural language generation. In ICAPS.

Alexander Koller and Matthew Stone. 2007. Sentence generation as planning. In Proceedings of ACL.

Oliver Lemon. 2008. Adaptive natural language generation in dialogue using reinforcement learning. In Proceedings of SEMdial.

Johanna Moore, Mary Ellen Foster, Oliver Lemon, and Michael White. 2004. Generating tailored, comparative descriptions in spoken dialogue. In Proceedings of FLAIRS.

Crystal Nakatsu and Michael White. 2006. Learning to say it well: Reranking realizations by predicted synthesis quality. In Proceedings of ACL.

Alice Oh and Alexander Rudnicky. 2002. Stochastic natural language generation for spoken dialog systems. Computer Speech & Language, 16(3/4):387–407.

Tim Paek and Eric Horvitz. 2000. Conversation as action under uncertainty. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence.

Joseph Polifroni and Marilyn Walker. 2008. Intensional summaries as cooperative responses in dialogue: Automation and evaluation. In Proceedings of ACL.

Verena Rieser and Oliver Lemon. 2008a. Does this list contain what you were searching for? Learning adaptive dialogue strategies for interactive question answering. Natural Language Engineering, 15(1):55–72.

Verena Rieser and Oliver Lemon. 2008b. Learning effective multimodal dialogue strategies from Wizard-of-Oz data: Bootstrapping and evaluation. In Proceedings of ACL.

S. Singh, D. Litman, M. Kearns, and M. Walker. 2002. Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. JAIR, 16:105–133.

Amanda Stent, Marilyn Walker, Steve Whittaker, and Preetam Maloor. 2002. User-tailored generation for spoken dialogue: an experiment. In Proceedings of ICSLP.

Amanda Stent, Rashmi Prasad, and Marilyn Walker. 2004. Trainable sentence planning for complex information presentation in spoken dialog systems. In Proceedings of ACL.

R. Sutton and A. Barto. 1998. Reinforcement Learning. MIT Press.

Marilyn A. Walker, Candace A. Kamm, and Diane J. Litman. 2000. Towards developing general models of usability with PARADISE. Natural Language Engineering, 6(3).

Marilyn Walker, S. Whittaker, A. Stent, P. Maloor, J. Moore, M. Johnston, and G. Vasireddy. 2004. User tailored generation in the MATCH multimodal dialogue system. Cognitive Science, 28:811–840.

Marilyn Walker, Amanda Stent, François Mairesse, and Rashmi Prasad. 2007. Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research (JAIR), 30:413–456.

Steve Whittaker, Marilyn Walker, and Johanna Moore. 2002. Fish or Fowl: A Wizard of Oz evaluation of dialogue strategies in the restaurant domain. In Proceedings of the International Conference on Language Resources and Evaluation (LREC).

Stephen Whittaker, Marilyn Walker, and Preetam Maloor. 2003. Should I tell all? An experiment on conciseness in spoken dialogue. In Proceedings of the European Conference on Speech Processing (EUROSPEECH).

Andi Winterboer, Jiang Hu, Johanna D. Moore, and Clifford Nass. 2007. The influence of user tailoring and cognitive load on user performance in spoken dialogue systems. In Proceedings of the 10th International Conference on Spoken Language Processing (Interspeech/ICSLP).

S. J. Young, J. Schatzmann, K. Weilhammer, and H. Ye. 2007. The Hidden Information State approach to dialog management. In Proceedings of ICASSP 2007.