Natural Language Generation as Planning Under Uncertainty for Spoken Dialogue Systems

Verena Rieser, School of Informatics, University of Edinburgh (vrieser@inf.ed.ac.uk)
Oliver Lemon, School of Informatics, University of Edinburgh (olemon@inf.ed.ac.uk)

Abstract

We present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser). We study its use in a standard NLG problem: how to present information (in this case a set of search results) to users, given the complex trade-offs between utterance length, amount of information conveyed, and cognitive load. We set these trade-offs by analysing existing MATCH data. We then train an NLG policy using Reinforcement Learning (RL), which adapts its behaviour to noisy feedback from the current generation context. This policy is compared to several baselines derived from previous work in this area. The learned policy significantly outperforms all the prior approaches.

1   Introduction

Natural language allows us to achieve the same communicative goal ("what to say") using many different expressions ("how to say it"). In a Spoken Dialogue System (SDS), an abstract communicative goal (CG) can be generated in many different ways. For example, the CG to present database results to the user can be realized as a summary (Polifroni and Walker, 2008; Demberg and Moore, 2006), by comparing items (Walker et al., 2004), or by picking one item and recommending it to the user (Young et al., 2007).

Previous work has shown that it is useful to adapt the generated output to certain features of the dialogue context, for example user preferences, e.g. (Walker et al., 2004; Demberg and Moore, 2006), user knowledge, e.g. (Janarthanam and Lemon, 2008), or predicted TTS quality, e.g. (Nakatsu and White, 2006).

In extending this previous work we treat NLG as a statistical sequential planning problem, analogously to current statistical approaches to Dialogue Management (DM), e.g. (Singh et al., 2002; Henderson et al., 2008; Rieser and Lemon, 2008a), and to "conversation as action under uncertainty" (Paek and Horvitz, 2000). In NLG we face similar trade-offs and unpredictability as in DM, and in some systems the content planning and DM tasks overlap. Clearly, very long system utterances with many actions in them are to be avoided, because users may become confused or impatient, but each individual NLG action will convey some (potentially) useful information to the user. There is therefore an optimization problem to be solved. Moreover, the user judgements or next (most likely) user action after each NLG action are unpredictable, and the behaviour of the surface realizer may also be variable (see Section 6.2).

NLG could therefore fruitfully be approached as a sequential statistical planning task, where there are trade-offs and decisions to make, such as whether to choose another NLG action (and which one to choose) or to instead stop generating. Reinforcement Learning (RL) allows us to optimize such trade-offs in the presence of uncertainty, i.e. to weigh the chance of reaching a better state against the risk of choosing another action.

In this paper we present and evaluate a new model for NLG in Spoken Dialogue Systems as planning under uncertainty. In Section 2 we argue for applying RL to NLG problems and explain the overall framework. In Section 3 we discuss challenges for NLG for Information Presentation. In Section 4 we present results from our analysis of the MATCH corpus (Walker et al., 2004). In Section 5 we present a detailed example of our proposed NLG method. In Section 6 we report on experimental results using this framework for exploring Information Presentation policies. In Section 7 we conclude and discuss future directions.
2   NLG as planning under uncertainty

We adopt the general framework of NLG as planning under uncertainty (see (Lemon, 2008) for the initial version of this approach). Some aspects of NLG have been treated as planning, e.g. (Koller and Stone, 2007; Koller and Petrick, 2008), but never before as statistical planning.

NLG actions take place in a stochastic environment, for example consisting of a user, a realizer, and a TTS system, where the individual NLG actions have uncertain effects on the environment. For example, presenting differing numbers of attributes to the user makes the user more or less likely to choose an item, as shown by (Rieser and Lemon, 2008b) for multimodal interaction.

Most SDS employ fixed template-based generation. Our goal, however, is to employ a stochastic realizer for SDS, see for example (Stent et al., 2004). This will introduce additional noise, which higher-level NLG decisions will need to react to. In our framework, the NLG component must achieve a high-level Communicative Goal from the Dialogue Manager (e.g. to present a number of items) through planning a sequence of lower-level generation steps or actions, for example first summarizing all the items and then recommending the highest-ranking one. Each such action has unpredictable effects due to the stochastic realizer. For example, the realizer might employ 6 attributes when recommending item i4, but it might use only 2 (e.g. price and cuisine for restaurants), depending on its own processing constraints (see e.g. the realizer used to collect the MATCH project data). Likewise, the user may be likely to choose an item after hearing a summary, or they may wish to hear more. Generating appropriate language in context (e.g. given the attributes presented so far) thus has the following important features in general:

  • NLG is goal-driven behaviour
  • NLG must plan a sequence of actions
  • each action changes the environment state or context
  • the effect of each action is uncertain.

These facts make it clear that the problem of planning how to generate an utterance falls naturally into the class of statistical planning problems, rather than rule-based approaches such as (Moore et al., 2004; Walker et al., 2004), or supervised learning as explored in previous work, such as classifier learning and re-ranking, e.g. (Stent et al., 2004; Oh and Rudnicky, 2002). Supervised approaches involve the ranking of a set of completed plans/utterances and as such cannot adapt online to the context or the user. Reinforcement Learning (RL) provides a principled, data-driven optimisation framework for our type of planning problem (Sutton and Barto, 1998).
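To make this framing concrete, the following minimal sketch (not from the paper; the class, action names, and transition behaviour are illustrative assumptions) casts Information Presentation as an MDP-style interface: a generation state recording what has been produced so far, a small set of NLG actions, and a step function whose effects are stochastic, reflecting the variable realizer and the unpredictable user.

```python
import random
from dataclasses import dataclass

ACTIONS = ["SUMMARY", "COMPARE", "RECOMMEND", "stop"]

@dataclass
class GenState:
    attributes: int = 0            # attributes realised so far
    sentences: int = 0             # sentences generated so far
    user_reaction: str = "silent"  # predicted next user act
    done: bool = False

def step(state: GenState, action: str) -> GenState:
    """One generation step: the effect of an NLG action is uncertain,
    because the realizer output and the user's reaction both vary."""
    if action == "stop" or state.done:
        return GenState(state.attributes, state.sentences, state.user_reaction, True)
    # Illustrative stochastic effect: the realizer decides how many
    # attributes/sentences this action actually produces.
    new_attrs = state.attributes + random.randint(1, 4)
    new_sents = state.sentences + random.randint(1, 6)
    # The user may "barge in" (choose an item or quit) at any point.
    reaction = random.choices(["silent", "choose", "addInfo", "quit"],
                              weights=[0.5, 0.2, 0.2, 0.1])[0]
    return GenState(new_attrs, new_sents, reaction, reaction in ("choose", "quit"))

# A generation episode is a sequence of such uncertain steps.
s = GenState()
for a in ["SUMMARY", "RECOMMEND", "stop"]:
    s = step(s, a)
    print(a, s)
```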

3   The Information Presentation Problem

We will tackle the well-studied problem of Information Presentation in NLG to show the benefits of this approach. The task here is to find the best way to present a set of search results to a user (e.g. some restaurants meeting a certain set of constraints). This is a task common to much prior work in NLG, e.g. (Walker et al., 2004; Demberg and Moore, 2006; Polifroni and Walker, 2008).

In this problem, there are many decisions available for exploration. For instance, which presentation strategy to apply (NLG strategy selection), how many attributes of each item to present (attribute selection), how to rank the items and attributes according to different models of user preferences (attribute ordering), how many (specific) items to tell the user about (conciseness), how many sentences to use when doing so (syntactic planning), and which words to use (lexical choice). All these parameters (and potentially many more) can be varied and, ideally, jointly optimised based on user judgements.

We had two corpora available to study some regions of this decision space. We utilise the MATCH corpus (Walker et al., 2004) to extract an evaluation function (also known as a "reward function") for RL. Furthermore, we utilise the SPaRKy corpus (Stent et al., 2004) to build a high-quality stochastic realizer. Both corpora contain data from "overhearer" experiments targeted to Information Presentation in dialogues in the restaurant domain. While we are ultimately interested in how hearers engaged in dialogues judge different Information Presentations, results from overhearers are still directly relevant to the task.

4   MATCH corpus analysis

The MATCH project made two data sets available, see (Stent et al., 2002) and (Whittaker et al., 2003), which we combine to define an evaluation function for different Information Presentation strategies.
strategy          example                                                           av.#attr       av.#sentence
  SUMMARY           “The 4 restaurants differ in food quality, and cost.”             2.07±.63          1.56±.5
                    (#attr = 2, #sentence = 1)
  COMPARE           “Among the selected restaurants, the following offer               3.2±1.5          5.5±3.11
                    exceptional overall value. Aureole’s price is 71 dol-
                    lars. It has superb food quality, superb service and
                    superb decor. Daniel’s price is 82 dollars. It has su-
                    perb food quality, superb service and superb decor.”
                    (#attr = 4, #sentence = 5)
  RECOMMEND         “Le Madeleine has the best overall value among the                  2.4±.7           3.5±.53
                    selected restaurants. Le Madeleine’s price is 40 dol-
                    lars and It has very good food quality. It’s in Mid-
                    town West. ” (#attr = 3, #sentence = 3)

Table 1: NLG strategies present in the MATCH corpus with average no. attributes and sentences as found
in the data.

The first data set, see (Stent et al., 2002), comprises 1024 ratings by 16 subjects (where we only use the speech-based half, n = 512) on the following presentation strategies: RECOMMEND, COMPARE, and SUMMARY. These strategies are realized using templates as in Table 1, with varying numbers of attributes. In this study the users rate the individual presentation strategies as significantly different (F(2) = 1361, p < .001). We find that SUMMARY is rated significantly worse (p = .05 with Bonferroni correction) than RECOMMEND and COMPARE, which are rated as equally good.

This suggests that one should never generate a SUMMARY. However, SUMMARY has different qualities from COMPARE and RECOMMEND, as it gives users a general overview of the domain, and probably helps the user to feel more confident when choosing an item, especially when they are unfamiliar with the domain, as shown by (Polifroni and Walker, 2008).

In order to further describe the strategies, we extracted different surface features as present in the data (e.g. number of attributes realised, number of sentences, number of words, number of database items talked about, etc.) and performed a stepwise linear regression to find the features which were important to the overhearers (following the PARADISE framework (Walker et al., 2000)). We discovered a trade-off between the length of the utterance (#sentence) and the number of attributes realised (#attr), i.e. its informativeness, where overhearers like to hear as many attributes as possible in the most concise way, as indicated by the regression model shown in Equation 1 (R² = .34).¹

    score = .775 × #attr + (−.301) × #sentence    (1)

¹ For comparison: (Walker et al., 2000) report an R² between .4 and .5 on a slightly larger data set.
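As an illustration of how such a regression model can be fitted, the sketch below (not the authors' code; the feature matrix and ratings are invented toy data) uses ordinary least squares over the two features that survived the stepwise selection, #attr and #sentence.

```python
import numpy as np

# Toy stand-in for the MATCH overhearer ratings: one row per rated
# presentation, columns are [#attr, #sentence], y is the user score.
X = np.array([[2, 1], [4, 5], [3, 3], [6, 4], [2, 2], [5, 6]], dtype=float)
y = np.array([1.2, 1.5, 1.4, 2.9, 0.9, 2.0])

# Add an intercept column and solve the least-squares problem
# score ≈ w_attr * #attr + w_sent * #sentence + b.
A = np.hstack([X, np.ones((len(X), 1))])
(w_attr, w_sent, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"score = {w_attr:.3f} * #attr + {w_sent:.3f} * #sentence + {b:.3f}")
# A full stepwise regression would additionally add/remove candidate
# features (words, database items, ...) based on significance tests.
```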
The second MATCH data set, see (Whittaker et al., 2003), comprises 1224 ratings by 17 subjects on the NLG strategies RECOMMEND and COMPARE. The strategies realise varying numbers of attributes according to different "conciseness" values: concise (1 or 2 attributes), average (3 or 4), and verbose (4, 5, or 6). Overhearers rate all conciseness levels as significantly different (F(2) = 198.3, p < .001), with verbose rated highest and concise rated lowest, supporting our findings in the first data set. However, the relation between number of attributes and user ratings is not strictly linear: ratings drop for #attr = 6. This suggests that there is an upper limit on how many attributes users like to hear. We expect this to be especially true for real users engaged in actual dialogue interaction, see (Winterboer et al., 2007). We therefore include "cognitive load" as a variable when training the policy (see Section 6).

In addition to the trade-off between length and informativeness for single NLG strategies, we are interested in whether this trade-off will also hold for generating sequences of NLG actions. (Whittaker et al., 2002), for example, generate a combined strategy where first a SUMMARY is used to describe the retrieved subset and then they RECOMMEND one specific item/restaurant.

Figure 1: Possible NLG policies (X=stop generation)

For example: "The 4 restaurants are all French, but differ in food quality, and cost. Le Madeleine has the best overall value among the selected restaurants. Le Madeleine's price is 40 dollars and It has very good food quality. It's in Midtown West."

We therefore extend the set of possible strategies present in the data for exploration: we allow ordered combinations of the strategies, assuming that only COMPARE or RECOMMEND can follow a SUMMARY, and that only RECOMMEND can follow a COMPARE, resulting in 7 possible actions:

  1. RECOMMEND
  2. COMPARE
  3. SUMMARY
  4. COMPARE+RECOMMEND
  5. SUMMARY+RECOMMEND
  6. SUMMARY+COMPARE
  7. SUMMARY+COMPARE+RECOMMEND

We then analytically solved the regression model in Equation 1 for the 7 possible strategies, using average values from the MATCH data; this amounts to solving a system of linear inequalities. According to this model, the best-ranking strategy is to do all the presentation strategies in one sequence, i.e. SUMMARY+COMPARE+RECOMMEND.
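This ranking can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; it assumes, as a simplification, that a combined strategy's informativeness and length are the sums of its components' averages from Table 1, and scores each of the 7 strategies with Equation 1.

```python
# Average (#attr, #sentence) per single strategy, from Table 1.
AVG = {"SUMMARY": (2.07, 1.56), "COMPARE": (3.2, 5.5), "RECOMMEND": (2.4, 3.5)}

def score(strategy_sequence):
    """Equation 1: score = .775 * #attr - .301 * #sentence,
    summing the averages of the component strategies (an assumption)."""
    attr = sum(AVG[s][0] for s in strategy_sequence)
    sent = sum(AVG[s][1] for s in strategy_sequence)
    return 0.775 * attr - 0.301 * sent

strategies = [
    ["RECOMMEND"], ["COMPARE"], ["SUMMARY"],
    ["COMPARE", "RECOMMEND"], ["SUMMARY", "RECOMMEND"],
    ["SUMMARY", "COMPARE"], ["SUMMARY", "COMPARE", "RECOMMEND"],
]
for s in sorted(strategies, key=score, reverse=True):
    print(f"{'+'.join(s):35s} {score(s):+.2f}")
# Under this simplification, the full sequence SUMMARY+COMPARE+RECOMMEND
# comes out on top, matching the analytic solution discussed above.
```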
However, this analytic solution assumes a "one-shot" generation strategy where there is no intermediate feedback from the environment: users are simply static overhearers (they cannot "barge in", for example), there is no variation in the behaviour of the surface realizer (i.e. one would use fixed templates as in MATCH), and the user has unlimited cognitive capabilities. These assumptions are not realistic and must be relaxed. In the next section we describe a worked-through example of the overall framework.

5   Method: the RL-NLG model

For the reasons discussed above, we treat the NLG module as a statistical planner, operating in a stochastic environment, and optimise it using Reinforcement Learning. The input to the module is a Communicative Goal supplied by the Dialogue Manager. The CG consists of a Dialogue Act to be generated, for example present_items(i1, i2, i5, i8), and a System Goal (SysGoal), which is the desired user reaction, e.g. to make the user choose one of the presented items (user_choose_one_of(i1, i2, i5, i8)). The RL-NLG module must plan a sequence of lower-level NLG actions that achieve the goal (at lowest cost) in the current context. The context consists of a user (who may remain silent, supply more constraints, choose an item, or quit), and variation from the sentence realizer described above.

Now let us walk through one simple utterance plan as carried out by this model, as shown in Table 2. Here, we start with the CG present_items(i1, i2, i5, i8) & user_choose_one_of(i1, i2, i5, i8) from the system's DM. This initialises the NLG state (init). The policy chooses the action SUMMARY and this transitions us to state s1, where we observe that 4 attributes and 1 sentence have been generated, and the user is predicted to remain silent. In this state, the current NLG policy is to RECOMMEND the top-ranked item (i5, for this user), which takes us to state s2, where 8 attributes have been generated in a total of 4 sentences, and the user chooses an item.

Figure 2: Example RL-NLG action sequence for Table 2 (actions: summarise, recommend, stop; environment: atts=4, user=silent after the summary; atts=8, user=choose after the recommendation; then the Reward is computed)

State   Action                                                                State change/effect
init    SysGoal: present_items(i1, i2, i5, i8) & user_choose_one_of(i1, i2, i5, i8)   initialise state
s1      RL-NLG: SUMMARY(i1, i2, i5, i8)                                       att=4, sent=1, user=silent
s2      RL-NLG: RECOMMEND(i5)                                                 att=8, sent=4, user=choose(i5)
end     RL-NLG: stop                                                          calculate Reward

Table 2: Example utterance planning sequence for Figure 2

The policy holds that in states like s2 the best thing to do is "stop" and pass the turn to the user. This takes us to the state end, where the total reward of this action sequence is computed (see Section 6.3) and used to update the NLG policy in each of the visited state-action pairs via back-propagation.

6   Experiments

We now report on a proof-of-concept study where we train our policy in a simulated learning environment based on the results from the MATCH corpus analysis in Section 4. Simulation-based RL allows us to explore unseen actions which are not in the data, and thus less initial data is needed (Rieser and Lemon, 2008b). Note that we cannot directly learn from the MATCH data, as for that we would need data from an interactive dialogue. We are currently collecting such data in a Wizard-of-Oz experiment.

6.1   User simulation

User simulations are commonly used to train strategies for Dialogue Management, see for example (Young et al., 2007). A user simulation for NLG is very similar, in that it is a predictive model of the most likely next user act. However, this user act does not actually change the overall dialogue state (e.g. by filling slots); it only changes the generator state. In other words, the NLG user simulation tells us what the user is most likely to do next if we were to stop generating now. It also tells us the probability that the user chooses to "barge in" after a system NLG action (by either choosing an item or providing more information).

The user simulation for this study is a simple bi-gram model, which relates the number of attributes presented to the next likely user actions, see Table 3. The user can either follow the goal provided by the DM (SysGoal), for example by choosing an item; do something else (userElse), e.g. provide another constraint; or quit (userQuit).

For simplification, we discretise the number of attributes into concise-average-verbose, reflecting the conciseness values from the MATCH data, as described in Section 4. In addition, we assume that the user's cognitive abilities are limited ("cognitive load"), based on the results from the second MATCH data set in Section 4. Once the number of attributes exceeds the "magic number 7" (reflecting psychological results on short-term memory (Baddeley, 2001)), the user is more likely to become confused and quit.

The probabilities in Table 3 are currently manually set heuristics. We are currently conducting a Wizard-of-Oz study in order to learn these probabilities (and other user parameters) from real data.

              SysGoal    userElse    userQuit
  concise     20.0       60.0        20.0
  average     60.0       20.0        20.0
  verbose     20.0       20.0        60.0

        Table 3: NLG bi-gram user simulation
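A minimal sketch of such a bi-gram user simulation is given below (not the authors' implementation; the discretisation thresholds are an assumption based on the conciseness bands described in Section 4): it maps the number of attributes generated so far to a distribution over the three user reactions in Table 3 and samples the next user act.

```python
import random

# P(next user act | conciseness band), from Table 3 (percentages).
USER_BIGRAM = {
    "concise": {"SysGoal": 20, "userElse": 60, "userQuit": 20},
    "average": {"SysGoal": 60, "userElse": 20, "userQuit": 20},
    "verbose": {"SysGoal": 20, "userElse": 20, "userQuit": 60},
}

def conciseness(num_attributes):
    """Discretise #attributes into the MATCH conciseness bands
    (thresholds here are an illustrative assumption)."""
    if num_attributes <= 2:
        return "concise"
    if num_attributes <= 4:
        return "average"
    return "verbose"

def sample_user_act(num_attributes):
    """Sample the next user act, given what has been generated so far."""
    dist = USER_BIGRAM[conciseness(num_attributes)]
    acts, weights = zip(*dist.items())
    return random.choices(acts, weights=weights)[0]

print(sample_user_act(3))   # e.g. 'SysGoal' with probability 0.6
```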
6.2   Realizer model

The sequential NLG model assumes a realizer, which updates the context after each generation step (i.e. after each single action). We estimate the realiser's parameters from the mean values we found in the MATCH data (see Table 1). For this study we first (randomly) vary the number of attributes, whereas the number of sentences is fixed (see Table 4). In current work we replace the realizer model with an implemented generator that replicates the variation found in the SPaRKy realizer (Stent et al., 2004).

                 #attr     #sentence
  SUMMARY        1 or 2        2
  COMPARE        3 or 4        6
  RECOMMEND      2 or 3        3

           Table 4: Realizer parameters
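The stochastic realizer model can be read directly off Table 4; a small sketch (illustrative only, not the SPaRKy-based generator mentioned above) samples the number of attributes per strategy and keeps the sentence count fixed:

```python
import random

# Per-strategy realizer parameters from Table 4:
# possible #attr values and a fixed #sentence count.
REALIZER = {
    "SUMMARY":   {"attrs": (1, 2), "sentences": 2},
    "COMPARE":   {"attrs": (3, 4), "sentences": 6},
    "RECOMMEND": {"attrs": (2, 3), "sentences": 3},
}

def realize(strategy):
    """Simulate one generation step: randomly pick how many attributes
    the realizer produces; the number of sentences is fixed."""
    params = REALIZER[strategy]
    return random.choice(params["attrs"]), params["sentences"]

attrs, sents = realize("COMPARE")
print(f"COMPARE realised with {attrs} attributes in {sents} sentences")
```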
                                                              tion states. We trained the policy using the well
                                                              known SARSA algorithm, using linear function ap-
6.3   Reward function
                                                              proximation (Sutton and Barto, 1998). The policy
The reward function defines the final goal of the             was trained for 3600 simulated NLG sequences.
utterance generation sequence. In this experiment                In future work we plan to learn lower level deci-
the reward is a function of the various data-driven           sions, such as lexical adaptation based on the vo-
trade-offs as identified in the data analysis in Sec-         cabulary used by the user.
tion 4: utterance length and number of provided
attributes, as weighted by the regression model               6.5   Baselines
in Equation 1, as well as the next predicted user
                                                              We derive the baseline policies from Informa-
action. Since we currently only have overhearer
                                                              tion Presentation strategies as deployed by cur-
data, we manually estimate the reward for the
                                                              rent dialogue systems. In total we utilise 7 differ-
next most likely user act, to supplement the data-
                                                              ent baselines (B1-B7), which correspond to single
driven model. If in the end state the next most
                                                              branches in our policy space (see Figure 1):
likely user act is userQuit, the learner gets a
penalty of −100, userElse receives 0 reward,                  B1: RECOMMEND only, e.g. (Young et al., 2007)
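Putting these pieces together, a hedged sketch of such a reward function might look as follows (the exact way the components are combined is not spelled out in the text, so treat the additive form as an assumption): the Equation 1 score rewards informative-but-concise utterances, and the predicted user act in the end state contributes +100, 0, or −100.

```python
def reward(num_attributes, num_sentences, predicted_user_act):
    """Reward at the end state of a generation sequence (illustrative sketch).

    Combines the data-driven trade-off from Equation 1 with the
    hand-coded scores for the predicted next user act.
    """
    # Equation 1: informativeness vs. utterance length.
    score = 0.775 * num_attributes - 0.301 * num_sentences
    # Hand-coded component for the predicted user reaction.
    user_component = {"SysGoal": 100.0, "userElse": 0.0, "userQuit": -100.0}
    return score + user_component[predicted_user_act]

# A long but informative sequence the user reacts well to:
print(reward(num_attributes=7, num_sentences=8, predicted_user_act="SysGoal"))
# The same content, but the user is predicted to quit:
print(reward(num_attributes=7, num_sentences=8, predicted_user_act="userQuit"))
```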
6.4   State and Action Space

We now formulate the problem as a Markov Decision Process (MDP), relating states to actions. Each state-action pair is associated with a transition probability: the probability of moving from state s at time t to state s' at time t+1 after having performed action a in state s. This transition probability is computed by the environment model (i.e. the user and realizer), and explicitly captures the noise/uncertainty in the environment. This is a major difference to other, non-statistical planning approaches. Each transition is also associated with a reinforcement signal (or reward) r_{t+1} describing how good the result of action a was when performed in state s.

The state space comprises 9 binary features representing the number of attributes, 2 binary features representing whether the predicted next user action is to follow the system goal or to quit, and a discrete feature reflecting the number of sentences generated so far, as shown in Figure 3. This results in 2^11 × 6 = 12,288 distinct generation states. We trained the policy using the well-known SARSA algorithm with linear function approximation (Sutton and Barto, 1998). The policy was trained for 3600 simulated NLG sequences.
                                                                  (Whittaker et al., 2002)
ample, the user is less likely to choose an item
if there are more than 7 attributes, but the real-            B5: Randomly choosing between COMPARE and
izer can generate 9 attributes. However, in some                  RECOMMEND , e.g. (Walker et al., 2007)

Figure 3: State-Action space for RL-NLG. Action: {SUMMARY, COMPARE, RECOMMEND, end}; state: {attribute_1..attribute_9: {0,1}, sentence: {1-11}, userGoal: {0,1}, userQuit: {0,1}}.

6.5   Baselines

We derive the baseline policies from Information Presentation strategies as deployed by current dialogue systems. In total we utilise 7 different baselines (B1-B7), which correspond to single branches in our policy space (see Figure 1):

B1: RECOMMEND only, e.g. (Young et al., 2007)

B2: COMPARE only, e.g. (Henderson et al., 2008)

B3: SUMMARY only, e.g. (Polifroni and Walker, 2008)

B4: SUMMARY followed by RECOMMEND, e.g. (Whittaker et al., 2002)

B5: Randomly choosing between COMPARE and RECOMMEND, e.g. (Walker et al., 2007)

B6: Randomly choosing between all 7 outputs

B7: Always generating the whole sequence, i.e. SUMMARY+COMPARE+RECOMMEND, as suggested by the analytic solution (see Section 4).

6.6   Results

We analyse the test runs (n = 200) using an ANOVA with a post-hoc T-test (with Bonferroni correction). RL significantly (p < .001) outperforms all baselines in terms of final reward, see Table 5. RL is the only policy which significantly improves the next most likely user action by adapting to features in the current context. In contrast to conventional approaches, RL learns to 'control' its environment according to the estimated transition probabilities and the associated rewards.
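As an illustration of this kind of analysis (not the authors' evaluation script; the reward samples are placeholders whose means are loosely based on Table 5 and whose noise is invented), a one-way ANOVA followed by Bonferroni-corrected pairwise t-tests can be run with SciPy as follows:

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Placeholder reward samples: one array of n=200 final rewards per policy.
rng = np.random.default_rng(0)
rewards = {name: rng.normal(mean, 140, size=200)
           for name, mean in [("B1", 99), ("B4", 176), ("B7", 229), ("RL", 311)]}

# One-way ANOVA over all policies.
f_stat, p_value = stats.f_oneway(*rewards.values())
print(f"ANOVA: F = {f_stat:.1f}, p = {p_value:.3g}")

# Post-hoc pairwise t-tests with Bonferroni correction.
pairs = list(combinations(rewards, 2))
alpha_corrected = 0.05 / len(pairs)
for a, b in pairs:
    t, p = stats.ttest_ind(rewards[a], rewards[b])
    verdict = "significant" if p < alpha_corrected else "n.s."
    print(f"{a} vs {b}: p = {p:.3g} ({verdict})")
```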
The learnt policy can be described as follows: it either starts with SUMMARY or COMPARE after the init state, i.e. it learnt never to start with a RECOMMEND. It stops generating after COMPARE if the userGoal is (probably) reached (e.g. the user is most likely to choose an item in the next turn, which depends on the number of attributes generated); otherwise it goes on and generates a RECOMMEND. If it starts with SUMMARY, it always generates a COMPARE afterwards. Again, it stops if the userGoal is (probably) reached; otherwise it generates the full sequence (which corresponds to the analytic solution B7).

The analytic solution B7 performs second best, and significantly outperforms all the other baselines (p < .01). Still, it is significantly worse (p < .001) than the learnt policy, as this 'one-shot' strategy cannot robustly and dynamically adapt to noise or changes in the environment.

In general, generating sequences of NLG actions rates higher than generating single actions only: B4 and B6 rate directly after RL and B7, while B1, B2, B3, and B5 are all equally bad given our data-driven definition of reward and environment. Furthermore, the simulated environment allows us to replicate the results in the MATCH corpus (see Section 4) when only comparing single strategies: SUMMARY performs significantly worse, while RECOMMEND and COMPARE perform equally well.

              policy    reward    (± std)
              B1        99.1      (± 129.6)
              B2        90.9      (± 142.2)
              B3        65.5      (± 137.3)
              B4        176.0     (± 154.1)
              B5        95.9      (± 144.9)
              B6        168.8     (± 165.3)
              B7        229.3     (± 157.1)
              RL        310.8     (± 136.1)

         Table 5: Evaluation Results (p < .001)

7   Conclusion

We presented and evaluated a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning. After motivating and presenting the model, we studied its use in Information Presentation.

We derived a data-driven model predicting users' judgements of different information presentation actions (a reward function), via a regression analysis on MATCH data. We used this regression model to set weights in a reward function for Reinforcement Learning, and so optimized a context-adaptive presentation policy. The learnt policy was compared to several baselines derived from previous work in this area, and it significantly outperforms all the baselines.

We are currently collecting data in targeted Wizard-of-Oz experiments, to derive a fully data-driven training environment and test the learnt policy with real users, following (Rieser and Lemon, 2008b). The trained NLG strategy will also be integrated in an end-to-end statistical system within the CLASSiC project (www.classic-project.org).

Acknowledgments

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 216594 (CLASSiC project: www.classic-project.org) and from the EPSRC, project no. EP/E019501/1.

References

A. Baddeley. 2001. Working memory and language: an overview. Journal of Communication Disorder, 36(3):189–208.

Vera Demberg and Johanna D. Moore. 2006. Information presentation in spoken dialogue systems. In Proceedings of EACL.

James Henderson, Oliver Lemon, and Kallirroi Georgila. 2008. Hybrid reinforcement / supervised learning of dialogue policies from fixed datasets. Computational Linguistics (to appear).

Srinivasan Janarthanam and Oliver Lemon. 2008. User simulations for online adaptation and knowledge-alignment in Troubleshooting dialogue systems. In Proc. of SEMdial.

Alexander Koller and Ronald Petrick. 2008. Experiences with planning for natural language generation. In ICAPS.

Alexander Koller and Matthew Stone. 2007. Sentence generation as planning. In Proceedings of ACL.

Oliver Lemon. 2008. Adaptive Natural Language Generation in Dialogue using Reinforcement Learning. In Proceedings of SEMdial.

Johanna Moore, Mary Ellen Foster, Oliver Lemon, and Michael White. 2004. Generating tailored, comparative descriptions in spoken dialogue. In Proc. FLAIRS.

Crystal Nakatsu and Michael White. 2006. Learning to say it well: Reranking realizations by predicted synthesis quality. In Proceedings of ACL.

Alice Oh and Alexander Rudnicky. 2002. Stochastic natural language generation for spoken dialog systems. Computer, Speech & Language, 16(3/4):387–407.

Tim Paek and Eric Horvitz. 2000. Conversation as action under uncertainty. In Proc. of the 16th Conference on Uncertainty in Artificial Intelligence.

Joseph Polifroni and Marilyn Walker. 2008. Intensional Summaries as Cooperative Responses in Dialogue Automation and Evaluation. In Proceedings of ACL.

Verena Rieser and Oliver Lemon. 2008a. Does this list contain what you were searching for? Learning adaptive dialogue strategies for Interactive Question Answering. J. Natural Language Engineering, 15(1):55–72.

Verena Rieser and Oliver Lemon. 2008b. Learning Effective Multimodal Dialogue Strategies from Wizard-of-Oz data: Bootstrapping and Evaluation. In Proceedings of ACL.

S. Singh, D. Litman, M. Kearns, and M. Walker. 2002. Optimizing dialogue management with Reinforcement Learning: Experiments with the NJFun system. JAIR, 16:105–133.

Amanda Stent, Marilyn Walker, Steve Whittaker, and Preetam Maloor. 2002. User-tailored generation for spoken dialogue: an experiment. In Proc. of ICSLP.

Amanda Stent, Rashmi Prasad, and Marilyn Walker. 2004. Trainable sentence planning for complex information presentation in spoken dialog systems. In Proceedings of ACL.

R. Sutton and A. Barto. 1998. Reinforcement Learning. MIT Press.

Marilyn A. Walker, Candace A. Kamm, and Diane J. Litman. 2000. Towards developing general models of usability with PARADISE. Natural Language Engineering, 6(3).

Marilyn Walker, S. Whittaker, A. Stent, P. Maloor, J. Moore, M. Johnston, and G. Vasireddy. 2004. User tailored generation in the MATCH multimodal dialogue system. Cognitive Science, 28:811–840.

Marilyn Walker, Amanda Stent, François Mairesse, and Rashmi Prasad. 2007. Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research (JAIR), 30:413–456.

Steve Whittaker, Marilyn Walker, and Johanna Moore. 2002. Fish or Fowl: A Wizard of Oz evaluation of dialogue strategies in the restaurant domain. In Proc. of the International Conference on Language Resources and Evaluation (LREC).

Stephen Whittaker, Marilyn Walker, and Preetam Maloor. 2003. Should I tell all? An experiment on conciseness in spoken dialogue. In Proc. European Conference on Speech Processing (EUROSPEECH).

Andi Winterboer, Jiang Hu, Johanna D. Moore, and Clifford Nass. 2007. The influence of user tailoring and cognitive load on user performance in spoken dialogue systems. In Proc. of the 10th International Conference of Spoken Language Processing (Interspeech/ICSLP).

SJ Young, J Schatzmann, K Weilhammer, and H Ye. 2007. The Hidden Information State Approach to Dialog Management. In ICASSP 2007.
