Abstract Meaning Representation for Human-Robot Dialogue
Claire Bonial (1), Lucia Donatelli (2), Jessica Ervin (3), and Clare R. Voss (1)
(1) U.S. Army Research Laboratory, Adelphi, MD 20783
(2) Georgetown University, Washington D.C. 20057
(3) University of Rochester, Rochester, NY 14627
claire.n.bonial.civ@mail.mil

Proceedings of the Society for Computation in Linguistics (SCiL) 2019, pages 236-246. New York City, New York, January 3-6, 2019.

Abstract

In this research, we begin to tackle the challenge of natural language understanding (NLU) in the context of the development of a robot dialogue system. We explore the adequacy of Abstract Meaning Representation (AMR) as a conduit for NLU. First, we consider the feasibility of using existing AMR parsers for automatically creating meaning representations for robot-directed transcribed speech data. We evaluate the quality of output of two parsers on this data against a manually annotated gold-standard data set. Second, we evaluate the semantic coverage and distinctions made in AMR overall: how well does it capture the meaning and distinctions needed in our collaborative human-robot dialogue domain? We find that AMR has gaps that align with linguistic information critical for effective human-robot collaboration in search and navigation tasks, and we present task-specific modifications to AMR to address the deficiencies.

1 Introduction

A central challenge in human-agent collaboration is that robots (or their virtual counterparts) do not have sufficient linguistic or world knowledge to communicate in a timely and effective manner with their human collaborators (Chai et al., 2017; She and Chai, 2017). We address this challenge in ongoing research directed at analyzing robot-directed communication in collaborative human-agent exploration tasks, with the ultimate goal of enabling robots to adapt to domain-specific language.

In this paper, we choose to adopt an intermediate semantic representation and select Abstract Meaning Representation (AMR) (Banarescu et al., 2013) in particular for three reasons: (i) the semantic representation framework abstracts away from surface variation, and therefore the robot need only be trained to process and execute the actions corresponding to semantic elements of the representation; (ii) there are a variety of fairly robust AMR parsers we can employ for this work, enabling us to forego manual annotation of substantial portions of our data and facilitating efficient automatic parsing in a future end-to-end system; and (iii) the structured representation facilitates the interpretation of novel instructions and grounding instructions with respect to the robot's current physical surroundings and set of executable actions. The latter motivation is especially important given that our human-robot dialogue is physically situated. This stands in contrast to many other dialogue systems, such as task-oriented chat bots, which do not require establishing and acting upon a shared understanding of the physical environment and often do not require any intermediate semantic representation (see §6 for further comparison to related work).

Our paper is structured as follows: First, we present background both on the corpus of human-robot dialogue we are leveraging (§2) and on AMR (§3). §4 discusses the implementation and results of two AMR parsers on the human-robot dialogue data. §5 assesses the semantic coverage of AMR for the human-robot dialogue data in particular. We then discuss related work that informs the current research in §6. Finally, §7 concludes and presents ideas for future work.

2 Background: Human-Robot Dialogue Corpus

We aim to support NLU within the broader context of ongoing research to develop a human-robot dialogue system (Marge et al., 2016a) to be used onboard a remotely located agent collaborating with humans in search and navigation tasks (e.g., disaster relief).
In developing this dialogue system, we are making use of portions of the corpus of human-robot dialogue data collected under this effort (Bonial et al., 2018; Traum et al., 2018).[1] This corpus was collected via a phased 'Wizard-of-Oz' (WoZ) methodology, in which human experimenters perform the dialogue and navigation capabilities of the robot during experimental trials, unbeknownst to participants interacting with the 'robot' (Marge et al., 2016b).

[1] This corpus is still being collected and a public release is in preparation.

Specifically, a naïve participant (unaware of the wizards) is tasked with instructing a robot to navigate through a remote, unfamiliar house-like environment, and asked to find and count objects such as shoes and shovels. In reality, the participant is not speaking directly to a robot, but to an unseen Dialogue Manager (DM) Wizard who listens to the participant's spoken instructions and responds with text messages in a chat window or passes a simplified text version of the instructions to a Robot Navigator (RN) Wizard, who joysticks the robot to complete the instructions. Given that the DM acts as an intermediary passing communications between the participant and the RN, the dialogue takes place across multiple conversational floors. The flow of dialogue from participant to DM, DM to RN, and subsequent feedback to the participant can be seen in table 1.

The corpus comprises 20 participants and about 20 hours of audio, with 3,573 participant utterances (continuous speech) totaling 18,336 words, as well as 13,550 words from DM-Wizard text messages. The corpus includes speech transcriptions from participants as well as the speech of the RN-Wizard. These transcriptions are time-aligned with the DM-Wizard text messages passed either to the participant or to the RN-Wizard.

The corpus also includes a dialogue annotation scheme specific to multi-floor dialogue that identifies initiator intent and signals relations between individual utterances pertaining to that intent (Traum et al., 2018). The design of the existing annotation scheme allows for the characterization of distinct information states by way of sets of participants, participant roles, turn-taking and floor-holding, and other factors (Traum and Larsson, 2003). Transaction units (TUs) group utterances from multiple participants and floors into units according to the realization of an initiator's intent, such that all utterances involved in an exchange surrounding the successful execution of a command are grouped and annotated for the relations they hold to one another. Relation types (Rel) signal how utterances within the same TU relate to one another in the context of the ultimate goal of the TU (e.g., "ack-done" in table 1, shortened from "acknowledge-done," signals that an utterance acknowledges completion of a previous utterance; for full details on Rel types, see Traum et al. (2018)). Antecedents (Ant) specify which utterance is related to which. An example of a TU may be seen in table 1. It is notable that the existing annotation scheme highlights dialogue structure and does not provide a markup of the semantic content of participant instructions.

3 Background: Abstract Meaning Representation

The Abstract Meaning Representation (AMR) project (Banarescu et al., 2013) has created a manually annotated "semantics bank" of text drawn from a variety of genres. The AMR project annotations are completed on a sentence-by-sentence basis, where each sentence is represented by a rooted directed acyclic graph (DAG). For ease of creation and manipulation, annotators work with the PENMAN representation of the same information (Penman Natural Language Group, 1989). For example:

    (w / want-01
       :ARG0 (d / dog)
       :ARG1 (p / pet-01
          :ARG0 (g / girl)
          :ARG1 d))

Figure 1: PENMAN notation of "The dog wants the girl to pet him."

In neo-Davidsonian fashion (Davidson, 1969; Parsons, 1990), AMR introduces variables (or graph nodes) for entities, events, properties, and states. Leaves are labeled with concepts, so that (d / dog) refers to an instance (d) of the concept dog. Relations link entities, so that (w / walk-01 :location (p / park)) means the walking event (w) has the relation location to the entity park (p). When an entity plays multiple roles in a sentence (e.g., (d / dog) above), AMR employs re-entrancy in graph notation (nodes with multiple parents) or variable re-use in PENMAN notation.
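The PENMAN strings shown in the figures of this paper can also be handled programmatically. As a minimal sketch (not part of the original annotation workflow), the third-party penman Python library can decode Figure 1 into its graph of triples and surface the re-entrancy of the dog variable:

    import penman

    # Figure 1's PENMAN string, decoded into a rooted graph of triples.
    FIG1 = """
    (w / want-01
       :ARG0 (d / dog)
       :ARG1 (p / pet-01
          :ARG0 (g / girl)
          :ARG1 d))
    """

    graph = penman.decode(FIG1)
    print(graph.top)                 # 'w', the root variable
    for source, role, target in graph.triples:
        print(source, role, target)  # e.g. w :instance want-01, w :ARG0 d, ...

    # Re-entrancy shows up as a variable that is the target of more than one
    # non-instance edge: here 'd' is the target of both :ARG0 of want-01 and
    # :ARG1 of pet-01.
    targets = [t for _, role, t in graph.triples if role != ':instance']
    print({t for t in targets if targets.count(t) > 1})   # {'d'}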
    #  Participant            DM → Participant           DM → RN                RN     TU  Ant  Rel
    1  move forward 3 feet                                                             1
    2                         ok                                                       1   1    ack-wilco
    3                                                    move forward 3 feet           1   1    trans-r
    4                                                                           done   1   3    ack-done
    5                         I moved forward 3 feet                                   1   4    trans-l

(The Participant and DM → Participant columns form the left floor; the DM → RN and RN columns form the right floor; TU, Ant, and Rel are the annotations.)

Table 1: Example of a Transaction Unit (TU) in the existing corpus dialogue annotation, which contains an instruction initiated by the participant, its translation to a simplified form (DM to RN), and the execution of the instruction and acknowledgement of such by the RN. TU, Ant(ecedent), and Rel(ation type) are indicated in the right columns. (Traum et al., 2018)

AMR concepts are either English words (boy), PropBank (Palmer et al., 2005) rolesets (want-01), or special keywords indicating generic entity types: date-entity, world-region, distance-quantity, etc. In addition to the PropBank lexicon of rolesets, which associate argument numbers (ARG0-6) with predicate-specific semantic roles (e.g., ARG0 = wanter in figure 1), AMR uses approximately 100 relations of its own (e.g., :time, :age, :quantity, :destination, etc.).

The representation captures who is doing what to whom like other semantic role labeling (SRL) schemes (e.g., PropBank (Palmer et al., 2005), FrameNet (Baker et al., 1998; Fillmore et al., 2003), VerbNet (Kipper et al., 2008)), but also represents other aspects of meaning outside of semantic role information, such as fine-grained quantity and unit information and parthood relations. Also distinguishing it from other SRL schemes, a goal of AMR is to capture core facets of meaning while abstracting away from idiosyncratic syntactic structures; thus, for example, "She adjusted the machine" and "She made an adjustment to the machine" share the same AMR. AMR has been widely used to support NLU, generation, and summarization (Liu et al., 2015; Pourdamghani et al., 2016), machine translation, question answering (Mitra and Baral, 2016), information extraction (Pan et al., 2015), and biomedical text mining (Garg et al., 2016; Rao et al., 2017; Wang et al., 2017).

4 Evaluating AMR Parsers on Human-Robot Dialogue Data

To serve as a conduit for NLU in a dialogue system, the ideal semantic representation would have robust parsers, allowing the representation to be implemented efficiently on a large scale. There have been a variety of parsers developed for AMR; two parsers using very different approaches are explored in the sections to follow.

4.1 Parsers

To automatically create AMRs for the human-robot dialogue data, we used two off-the-shelf AMR parsers, JAMR[2] (Flanigan et al., 2014) and CAMR[3] (Wang et al., 2015). JAMR was one of the first AMR parsers and uses a two-part algorithm to first identify concepts and then build the maximum spanning connected subgraph of those concepts, adding in the relations. CAMR, in contrast, starts by obtaining the dependency tree (in this case, using the Charniak parser[4] and the Stanford CoreNLP toolkit (Manning et al., 2014)) and then applies a series of transformations to the dependency tree, ultimately transforming it into an AMR graph. One strength of CAMR is that the dependency parser is independent of the AMR creation, so a dependency parser that is trained on a larger data set, and therefore more accurate, can be used. Both JAMR and CAMR learn probabilities from training data in order to apply their algorithms to novel sentences.

[2] https://github.com/jflanigan/jamr
[3] https://github.com/c-amr/camr
[4] https://github.com/BLLIP/bllip-parser

4.2 Gold Standard Data Set

In order to evaluate parser performance on our data set, we hand-annotated a subset of the participant's speech in the human-robot dialogue corpus to create a gold standard data set. We focus on only participant language because it is the natural language that the robot will ultimately need to process and act on. This selected subset comprises 10% of participant utterances from one phase of the corpus including 10 subjects. The resulting sample is 137 sentences, equally distributed across the 10 participants, who tend to have unique speech patterns.
Three expert annotators familiar with both the human-robot dialogue data and AMR independently annotated this sample, obtaining inter-annotator agreement (IAA) scores of .82, .82, and .91 using the Smatch metric.[5]

[5] IAA Smatch scores on AMRs are generally between .7 and .8, depending on the complexity of the data (AMR development group communication, 2014).

After independent annotations, we collaboratively created the gold standard set. Notable choices made during this process include the treatment of "can you" utterances, re-entry of the subject in commands using motion verbs with independent arguments for the mover and thing-moved, and handling of disfluencies; each is described here.

In "can you" utterances, there is an ambiguity as to whether the utterance is a genuine question of ability or a polite request. This difference determines whether the sentence gets annotated with possible-01 (used in AMR to convey both possibility and ability), or just as a command (figure 2). It also determines whether the robot should respond with a statement of ability, or perform the action requested. To resolve this ambiguity, we referred back to the full transcripts of the data, and inferred based on context. In our sample, only one of these utterances ("can you go that way") was deemed to be a genuine question of ability, while the remaining 13 (e.g., "can you take a picture," "can you turn to your right") were treated as commands. Those that were commands were annotated with :polite +, in order to preserve what we believe to be the speaker's intention in using the modal "can."

    (p / possible-01
       :ARG1 (p2 / picture-01
          :ARG0 (y / you))
       :polarity (a / amr-unknown))

    (p / picture-01 :mode imperative
       :polite +
       :ARG0 (y / you))

Figure 2: Two different AMR parses for the utterance "can you take a picture" convey two distinct interpretations of the utterance: the top can be read as "Is it possible for you to take a picture?," the bottom as a command for a picturing event.

With commands like "move" or "turn," it is implied that the robot is the agent impelling motion and the thing being moved. Therefore, we used re-entry in those AMRs to infer the implied "you" as both the :ARG0, mover, and the :ARG1, thing-moved (figure 3). This is consistent with AMR's goal of capturing the meaning of an utterance, independent of syntax: all arguments that can be confidently inferred should be included in the AMR, even if implicit.[6]

[6] See "implicit roles" in the AMR guidelines: https://github.com/amrisi/amr-guidelines/blob/master/amr.md

    (m / move-01 :mode imperative
       :ARG0 (y / you)
       :ARG1 y
       :direction (f / forward)
       :extent (d / distance-quantity
          :quant 3
          :unit (f2 / foot)))

Figure 3: Gold standard AMR for "move forward 3 feet," showing the inferred argument "you" as :ARG0 (mover) and :ARG1 (thing-moved).

Although the LDC AMR corpus[7] does not include speech data, AMR does offer guidance for disfluencies: dropping the disfluent portion of an utterance in favor of representing only the speaker's repair utterance.[8] We followed this general AMR practice and dropped disfluent speech for the surprisingly infrequent cases of disfluency in our gold standard sample.

[7] https://catalog.ldc.upenn.edu/LDC2017T10
[8] https://www.isi.edu/~ulf/amr/lib/amr-dict.html#disfluency

4.3 Results & Error Analysis

Having created a gold standard sample of our data, we ran both JAMR and CAMR on the same sample and obtained the Smatch scores when compared to the gold standard. As seen in Table 2, CAMR performs better on both precision and recall, thus obtaining the higher F-score. However, compared to their self-reported F-scores (0.58 for JAMR and 0.63 for CAMR) on other corpora, both under-perform on the human-robot dialogue data.

             Precision   Recall   F-score
    JAMR     0.27        0.44     0.33
    CAMR     0.33        0.51     0.40

Table 2: Parser performance on human-robot dialogue data.
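The Smatch comparisons reported here (both the IAA figures above and the parser scores in Table 2) can be reproduced with the reference Smatch implementation. The following is a minimal sketch, assuming the smatch package has been installed so that its smatch.py entry point is on the path and that each file holds one AMR per blank-line-separated block; the file names are placeholders and the flag usage reflects our reading of the tool's documentation rather than the exact setup used in this study:

    import subprocess

    def smatch_scores(parsed_file: str, gold_file: str) -> str:
        """Run the reference Smatch script on two AMR files and return its report."""
        result = subprocess.run(
            ["smatch.py", "-f", parsed_file, gold_file, "--pr"],
            capture_output=True, text=True, check=True)
        # stdout holds the precision, recall, and F-score lines printed by Smatch.
        return result.stdout

    if __name__ == "__main__":
        # Placeholder file names for parser output and the gold standard sample.
        print(smatch_scores("camr_output.amr", "gold_standard.amr"))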
Of the errors present in the parser output, many come from improper handling of light verb constructions, imperatives, inferred arguments, and requests phrased as "can you" questions. "Take a picture" is an example of a frequent light verb construction, in which the verb ("take") is not semantically the main predicating element of the sentence. The correct parse is shown in Figure 4, followed by the AMR that both parsers consistently create. They both incorrectly place "take" as the top node (using the grasping, caused motion sense of the verb), when in reality it should be dropped from the AMR completely according to AMR practice for light verbs. Light verb constructions occur in 60 utterances in our sample, with 59 of those being some variation on "take a picture" or "take a photo."

    (p / picture-01 :mode imperative
       :ARG0 (y / you))

    (x1 / take-01
       :ARG1 (x3 / picture))

Figure 4: Gold standard AMR for "take a picture" (top), followed by parser output.

Another common error shown in figure 4 is the notation :mode imperative, used to indicate commands, which is present in 127 sentences within our sample. Despite the prevalence of this feature in our gold standard, it is never present in parser output.

Another problematic omission on the parsers' part is a lack of inferred arguments. As discussed earlier, commands like "move forward 3 feet" have an implied "you" in both the :ARG0 and :ARG1 positions. However, the parsers don't include this variable, instead only including concepts that are explicitly mentioned in the sentence (figure 5).

    (x1 / move-01
       :direction (x2 / forward)
       :ARG1 (x4 / distance-quantity
          :unit (f / foot)
          :quant 3))

Figure 5: CAMR output for "move forward 3 feet." The distance is mistaken for the :ARG1 thing-moved and the implied "you"/robot is omitted. Compare with figure 3.

A fourth error made by the parsers was on "can you" requests. These were consistently handled by both parsers as questions of ability, annotated using possible-01, even when (as was usually the case) the utterances were intended to be polite commands (parser output is shown in the top parse of figure 2).

4.4 Discussion

The poor performance of both parsers on the human-robot dialogue data is unsurprising given the significant differences between it and the data the parsers were trained on. For both parsers, training data came from the LDC AMR corpus, made up entirely of written text, mostly newswire.[9] In contrast, the human-robot dialogue data is transcribed from spoken utterances, taken from dialogue that is instructional and goal-oriented. Thus, running the parsers on this data set shows that the differences in domain have significant effects on the parsers and give rise to the systematic errors described above.

[9] https://catalog.ldc.upenn.edu/LDC2017T10

Improvements to the parser output could be obtained even by just adding a few simple heuristics, due to the formulaic nature of our data. Of 137 sample sentences, 25 were "take a picture," so introducing a heuristic specific to that sentence would be a simple way to make several corrections. To obtain broader improvements, however, it is clear that it will be necessary to retrain the parsers on in-domain data. Given that retraining requires a corpus of hand-annotated data, this gives us an opportunity to examine the current features of AMR in relation to our collaborative human-robot dialogue domain, and to explore possible additions to the annotation scheme to ensure that all elements of meaning essential to our domain have coverage. The findings of this analysis are described in the sections to follow.
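To make the kind of simple heuristic mentioned above concrete, the sketch below post-processes parser output for two of the frequent error patterns (the light-verb "take a picture" and bare imperatives). It uses the penman library and rules of our own devising purely for illustration; it is not the project's actual implementation:

    import re
    import penman

    # Gold target for the formulaic light-verb utterance (figure 4, top).
    TAKE_PICTURE = penman.decode("""
    (p / picture-01 :mode imperative
       :ARG0 (y / you))
    """)

    def apply_heuristics(utterance: str, parsed: penman.Graph) -> penman.Graph:
        text = utterance.lower().strip()

        # Rule 1: light-verb "take a picture/photo" -> drop take-01 entirely.
        if re.fullmatch(r"(please )?take a (picture|photo)", text):
            return TAKE_PICTURE

        # Rule 2: bare imperatives ("move ...", "turn ...") -> mark the top node
        # as a command and add the inferred addressee as :ARG0.  (A fuller version
        # would also repair the :ARG1 thing-moved slot, as in figure 3.)
        if re.match(r"(move|turn|go|drive)\b", text):
            top = parsed.top
            triples = list(parsed.triples)
            if (top, ':mode', 'imperative') not in triples:
                triples.append((top, ':mode', 'imperative'))
            if not any(s == top and r == ':ARG0' for s, r, _ in triples):
                triples += [(top, ':ARG0', 'y'), ('y', ':instance', 'you')]
            return penman.Graph(triples, top=top)

        return parsed

    # Example: patch the CAMR output for "move forward 3 feet" (figure 5).
    camr_out = penman.decode("""
    (x1 / move-01
       :direction (x2 / forward)
       :ARG1 (x4 / distance-quantity :unit (f / foot) :quant 3))
    """)
    print(penman.encode(apply_heuristics("move forward 3 feet", camr_out)))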
5 Evaluating Semantic Coverage & Distinctions of AMR

We assess the adequacy of AMR for its use in an NLU component of a human-robot dialogue system both on theoretical grounds and in light of the results and error analysis presented in §4. To our knowledge, our research is the first to employ AMR to capture the semantics of spoken language; existing corpora are otherwise text-based.[10] Here, we discuss the characteristics of our data relevant to semantic representation and highlight specific challenges and areas of interest that we hope to address with AMR (§5.1); explore how to leverage AMR for these purposes by identifying (§5.2) and remedying (§5.3) gaps in existing AMR; and conclude with discussion (§5.4).

[10] Data released at https://amr.isi.edu/download/amr-bank-struct-v1.6.txt (Little Prince) and the LDC corpus (footnote 9).

5.1 Challenges of the Data

Our goal in introducing AMR is to bridge the gap between what is annotated currently as dialogue structure (Traum et al., 2018) and the semantic content of utterances that comprise such dialogue (not included in the current scheme). This goal follows from the understanding that dialogue acts are composed of two primary components: (i) semantic content, identifying the entities, events, and propositions relevant to the dialogue; and (ii) communicative function, identifying the ways an addressee may use semantic content to update the information state (Bunt et al., 2012). The existing dialogue structure annotation scheme of Traum et al. (2018) distinguishes two primary levels of pragmatic meaning important to dialogue that our research aims to maintain. The first, intentional structure (Grosz and Sidner, 1986), is equivalent to a TU:[11] all utterances that explicate and address an initiator's intent. The second, interactional structure, captures how the information state of participants in the dialogue is updated as the TU is constructed (Traum et al., 2018). These two levels of meaning stand apart from the basic compositional meaning of their associated utterances. We seek to represent these pragmatic levels of meaning and link them to their respective semantic forms.

[11] Transaction Unit is described in §2.

We also seek to represent the temporal, aspectual, and veridical/modal nature of the robot's actions. Human instructions must be interpreted for actions: the robot may respond to such an instruction by asserting whether or not such an instruction is possible given the robot's capabilities and the surrounding physical environment, and the robot may also communicate whether it is in the process of completing or has completed the desired instruction. Such information is implicitly linked to the intentional and interactional structure of the dialogue. For example, the act of giving a command implies that an event (if it occurs) will happen in the future; the act of asserting that an event has occurred signals that the event is past.

The representation of space and specific parameters that contribute to the robot's understanding of how human language maps onto its physical environment is also of interest to our work here.[12] As its capabilities are presented to the participant, the 'robot' in this research is capable of performing low-level actions with specific endpoints or goals: for example, "move five feet forward," "face the doorway on your left," and "get closer to that orange object." This robot cannot successfully perform instructions that have no clear endpoint: "move forward" and "turn" will trigger clarification requests for further specification. In our planned system, however, we would ultimately like to give the robot an instruction such as "Robot, explore this space and tell me if anyone has been here recently." If the robot can learn to decompose an instruction such as explore into smaller actions, as well as how to identify signs of previous inhabitants, such instructions may become feasible.

[12] Though this mapping is outside the scope of our work, we see AMR as contributing substantially richer semantic forms to the NL planning research in robotics of others, such as Howard et al. (2014), that entails such grounding, the process of assigning physical meanings to NL expressions.

Finally, much of the semantic content of the data in our experiment must be situated within the dialogue context to be properly interpreted. A command of "Do that again" is ambiguous in terms of which action it refers back to. Similarly, a negative command such as "No, go to the doorway on the left" negates specific information contained in a previous command. Implicit in natural language input, as well, are subtle event-event temporal relations, such as "Move forward and take a picture" (sequential actions) and "Turn around and take a picture every 45 degrees" (simultaneous actions).

5.2 Gaps in AMR

We focus on the three elements of meaning crucial to human-robot dialogue and currently lacking in AMR: (i) speaker intended meaning (as differing from the compositional meaning of the speaker's utterances);[13] (ii) tense and aspect (and veridicality, by association); and (iii) spatial parameters necessary for the robot to successfully execute instructions within its physical environment.

[13] Speaker vs. compositional meaning is arguably annotated inconsistently in the current AMR corpus; see https://www.isi.edu/~ulf/amr/lib/amr-dict.html#pragmatics for some specific guidelines, but note that the released annotations seem to differ from this guidance in places.

The goal of the AMR project is to represent meaning, but whether such meaning is purely semantic or also captures a speaker's intended meaning is not specified. Here, we attempt to strike a balance between capturing speaker intended meaning and overspecifying utterances. We do this to stay faithful to existing experimental dialogue annotation practices, and to enable the robot to generalize the connections between such intention and underlying semantic content.
To illustrate the need for adding tense and aspect information to existing AMRs, compare the following utterances: (i) "move forward five feet" (uttered by the human participant); (ii) "moving forward five feet..." (sent via text by the robot to signal initiation of the instruction); and (iii) "I moved forward five feet" (sent by the robot upon completion of the action). Although distinctions between these three utterances are critical for our domain, AMR represents these three utterances the same way:[14]

[14] Utterance (i) would additionally annotate (m / move-01) with :mode imperative to signal a command. As noted in §4, parsers rarely capture this information.

    (m / move-01
       :ARG1 (r / robot)
       :extent (d / distance-quantity
          :quant 5
          :unit (f / foot))
       :direction (f2 / forward))

Figure 6: Without tense and aspect representation, current AMR conflates commands to move forward and assertions of ongoing and/or completed motion.

Spatial parameters of actions are represented in the current AMR through the use of predicate-specific ARG numbers, as outlined in the PropBank rolesets, or with the use of AMR relations, such as :path and :destination. Whether or not a relation or argument number is used, and which argument number, is specific to a predicate and therefore inconsistent across motion relations. We aim to make the representation of these parameters more consistent and enrich them with information about which are required, which are optional, and which might have assumed, default interpretations.

5.3 Proposed Refinements

We leverage the existing dialogue annotations to extract intentional and interactional meaning and, where relevant, map these annotations to the proposed refinements described below.

Speech Acts. We add a higher level of pragmatic meaning to the propositional content represented in the AMR through frames that correspond to speech acts (Austin, 1975; Searle, 1969). We use the model of vocatives in existing AMR as our guide.[15] We also make use of the existing dialogue structure annotation scheme for our corpus that identifies dialogue types similar to speech acts. Using this scheme as a starting point, we create 36 unique speech acts and corresponding AMR frames: this consists of six general dialogue types (command, assert, request, question, evaluate, express) with 5 to 12 subtypes each (e.g., move, send-image). An example of a speech act template for command:move may be seen in figure 7.[16] Notably, by adding a layer of speaker-intended meaning to the content of the proposition itself, we are able to capture participant roles within the speech act (:ARG0 and :ARG1 of command-02). Future work will be able to reference these roles and model how participant relations evolve over the course of the discourse (Allwood et al., 2000).

[15] https://www.isi.edu/~ulf/amr/lib/amr-dict.html#vocative
[16] Annotation of tense and aspect, and the need for the extra relations, will be explained in what follows.

    (c / command-02
       :ARG0-commander
       :ARG1-impelled agent
       :ARG2 (g / go-02 :completable +
          :ARG0-goer
          :ARG1-extent
          :ARG3-start point
          :ARG4-end point
          :path
          :direction
          :time (a / after :op1 (n / now))))

Figure 7: Speech act template for command:move. Arguments and relations followed by descriptive fillers (e.g., commander, end point) are filled in from context and the utterance.

Tense and Aspect. We also adopt the annotation scheme proposed by Donatelli et al. (2018) for augmenting AMR with tense and aspect. This scheme identifies the temporal nature of an event relative to the dialogue act in which it is expressed as past, present, or future. The aspectual nature of the event can be specified as atelic (:ongoing -/+), telic and hypothetical (:ongoing -, :completable +), telic and in progress (:ongoing +, :complete -), or telic and complete (:ongoing -, :complete +). A telic and hypothetical event representation can be seen in figure 7. This tense/aspect annotation scheme is specific to AMR and coarse-grained in nature, but through its use of existing AMR relations (:time, before, after, now, :op), it can be adapted to finer-grained temporal relations in future work.
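To make the effect of these attributes concrete, the sketch below attaches them to the temporally underspecified AMR of figure 6, keyed off the dialogue context in which the utterance occurs. The context labels and the helper function are illustrative assumptions of our own (they are not the corpus Rel tags or the project's tooling); the graph manipulation again uses the penman library:

    import penman

    # Coarse tense/aspect attributes in the style of Donatelli et al. (2018),
    # selected here by an assumed dialogue-context label (placeholders, not
    # the corpus annotation scheme).
    ASPECT = {
        "command":        [(":ongoing", "-"), (":completable", "+")],  # telic, hypothetical
        "report-ongoing": [(":ongoing", "+"), (":complete", "-")],     # telic, in progress
        "ack-done":       [(":ongoing", "-"), (":complete", "+")],     # telic, complete
    }

    def add_tense_aspect(graph: penman.Graph, context: str) -> penman.Graph:
        top = graph.top
        triples = list(graph.triples)
        for role, value in ASPECT[context]:
            triples.append((top, role, value))
        if context == "command":
            # Commands describe events that, if executed, lie in the future:
            # :time (a / after :op1 (n / now)), as in the figure 7 template.
            # (Variables 'a' and 'n' are unused in this example graph.)
            triples += [(top, ":time", "a"), ("a", ":instance", "after"),
                        ("a", ":op1", "n"), ("n", ":instance", "now")]
        return penman.Graph(triples, top=top)

    # The underspecified AMR of figure 6.
    base = penman.decode("""
    (m / move-01
       :ARG1 (r / robot)
       :extent (d / distance-quantity :quant 5 :unit (f / foot))
       :direction (f2 / forward))
    """)

    # "I moved forward five feet" (completion report) vs. the command reading.
    print(penman.encode(add_tense_aspect(base, "ack-done")))
    print(penman.encode(add_tense_aspect(base, "command")))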
Spatial Parameters. As seen in figure 7, all arguments of go-02, as well as additional relations, are made use of in the template. These positions correspond to parameters of the action, needed so the robot may carry out the action successfully. Having a template for each domain-relevant speech act will allow us to specify required and optional parameters for robot operationalization. In figure 7, this is either :ARG1 or :ARG4, given that the robot currently needs an endpoint for each action; if both arguments are empty, this ought to trigger a request for further information by the robot. Note also that the same command:move AMR (including go-02) would be implemented for all realizations of commands for movement (e.g., move, go, drive), whereas these would receive distinct AMRs, headed by each individual predicate, under current practices.

5.4 Discussion

The refinements we present to the existing AMR aim to mimic a conservative learning process: we seek to provide just enough pragmatic meaning to assist robot understanding of natural language, but we do not provide specification to the point that there will be over-fitting in the form of one-to-one mappings of semantic content and pragmatic effect. As research continues and robot capabilities expand, we expect to augment the robot's linguistic knowledge based on general patterns in the current annotation scheme.

6 Related Work

6.1 Semantic Representation

There is a long-standing tradition of research in semantic representation within NLP and AI, as well as theoretical linguistics and philosophy (see Schubert (2015) for an overview). In this body of research, there are a variety of options that could be used within dialogue systems for NLU. However, for many of these representations, there are no existing automatic parsers, limiting their feasibility for larger-scale implementation. A notable exception is combinatory categorial grammar (CCG) (Steedman and Baldridge, 2009); CCG parsers have already been incorporated in some current dialogue systems (Chai et al., 2014). Although promising, CCG parses closely mirror the input language, so systems making use of CCG parses still face the challenge of a great deal of linguistic variability that can be associated with a single intent. Again, in abstracting away from surface variation, AMR may offer more regular, consistent parses in comparison to CCG. Universal Conceptual Cognitive Annotation (UCCA) (Abend and Rappoport, 2013), which also abstracts away from syntactic idiosyncrasies, and its corresponding parser (Hershcovich et al., 2017) merit future investigation.

6.2 NLU in Dialogue Systems

Task-oriented spoken dialogue systems have been an active area of research since the early 1990s. Broadly, the architecture of such systems includes (i) automatic speech recognition (ASR) to recognize an utterance, (ii) an NLU component to identify the user's intent, and (iii) a dialogue manager to interact with the user and achieve the intended task (Bangalore et al., 2006). The meaning representation within such systems has, in the past, been predefined frames for particular subtasks (e.g., flight inquiry), with slots to be filled (e.g., destination city) (Issar and Ward, 1993). In such approaches, the meaning representation was crafted for a specific application, making generalizability to new domains difficult if not impossible. Current approaches still model NLU as a combination of intent and dialogue act classification and slot tagging, but many have begun to incorporate recurrent neural networks (RNNs) and some multi-task learning for both NLU and dialogue state tracking (Hakkani-Tür et al., 2016; Chen et al., 2016), the latter of which allows the system to take advantage of information from the discourse context to achieve improved NLU. Substantial challenges to these systems include working in domains with intents that have a large number of possible values for each slot and accommodating out-of-vocabulary slot values (i.e., operating in a domain with a great deal of linguistic variability).

Thus, a primary challenge today and in the past is representing the meaning of an utterance in a form that can exploit the constraints of a particular domain but also remain portable across domains and robust despite linguistic variability. We see AMR as promising because the parsers are domain-independent (and can be retrained), and the representation itself is flexible enough for the addition of some domain-specific constraints. Furthermore, since AMR abstracts away from syntactic variability to represent only core elements of meaning, some of the variability in the input language can be "tamed," to give systems more systematic input.
With the proposed addition of speech acts to AMR described in §5.3, the augmented AMRs also facilitate dialogue state tracking.

Although human-robot dialogue systems often leverage a similar architecture to that of the spoken dialogue systems described above, human-robot dialogue introduces the challenge of physically situated dialogue and the necessity for symbol and action grounding, which generally incorporate computer vision. Few systems are tackling all of these challenges at this point (but see Chai et al. (2017)). The preliminary human-robot dialogue system developed under the umbrella of this project, and where this research might fit into that system, is described in the next section.

7 Conclusions & Future Work

Overall, we find results to be mixed on the feasibility of AMR for NLU within a human-robot dialogue system. On one hand, AMR is attractive given that there are a variety of relatively robust parsers available for AMR, making implementation on a larger scale feasible. However, our evaluation of two parsers on the human-robot dialogue data demonstrates that retraining on domain-relevant data is necessary, and this will require a certain amount of manual annotation. Furthermore, our assessment of the distinctions made in AMR reveals gaps that must be addressed for effective use in collaborative human-robot search and navigation. Nonetheless, these AMR refinements are tractable and may also be valuable to the broader community.

Thus, we have several paths forward in ongoing and future work. First, we plan to use heuristics and manual corrections to CAMR parser output to create a larger in-domain training set following existing AMR guidelines. We plan to combine this training set with other AMRs from various human-agent dialogue data sets being annotated in parallel with this work. In addition to expanding the training set for dialogue, this will allow us to explore the extent to which our findings, with respect to AMR gaps, may also apply to other human-agent dialogue domains.

Second, we will consider how to incorporate AMR into the existing, preliminary dialogue system called 'Scout Bot,' which has been developed as part of our larger research project (Lukin et al., 2018). For NLU, Scout Bot makes use of the NPCEditor (Leuski and Traum, 2011), a statistical classifier that learns a mapping from inputs to outputs. Currently, the NPCEditor relies on string divergence measures to associate an instruction with either a text version to be sent forward to the RN-Wizard or a clarification question to be returned to the participant. However, some of the challenging cases we analyzed in §5.1 suggest that an intermediate semantic representation will be needed within the NLU phase. Specifically, because the instructions must be grounded within physical surroundings and with respect to an executable set of robot actions, a semantic representation provides the structure needed to interpret novel instructions as well as ground instructions in novel physical contexts. Error analysis has demonstrated that the current Scout Bot system, by simply learning an association between an input string and a particular set of executed actions, cannot generalize to unseen, novel input instructions (e.g., "Turn left 100 degrees," as opposed to a more typical number of degrees like 90) and fails to interpret instructions with respect to the current physical surroundings (e.g., the destination of "Move to the door on the left" will be interpreted differently depending on where the robot is facing). The structure of the semantic representation provided by AMR will allow the system to interpret 100 degrees as a novel extent of turning, and allow destination slots like "door on the left" to be grounded to a location in the current physical context with the help of the robot's sensors.

Thus, in future iterations of the dialogue system incorporating AMR, we will retrain or reformulate the NPCEditor to take the automatic AMR parses as input and output the in-domain AMR templates described in §5.3. The Dialogue Manager will then either act upon these templates with a response/question to the participant or pass the domain-specific AMR along to be mapped to the behavior specification of the robot for execution. Specific steps on this research trajectory will include (i) development of graph-to-graph transformations to map parser output to the domain-refined AMRs and (ii) an assessment of how well the domain-refined AMRs map to a specific robot planning and behavior specification, which will facilitate determining what other refinements may be necessary to effectively bridge from natural language instructions to robot execution.
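As an illustration of the kind of graph-to-graph transformation envisioned in (i), the sketch below rewrites parser output headed by a motion predicate into the command:move template of figure 7 and checks for the endpoint the robot requires. The function, its slot names, and the predicate list are our own simplifications for exposition, not the project's implementation; for brevity it copies only the concept of each filler rather than the full subgraph:

    import penman

    # Motion predicates that should all map to the single command:move frame.
    MOTION_PREDICATES = {"move-01", "go-01", "go-02", "drive-01", "turn-01"}

    def to_command_move(parsed: penman.Graph):
        instances = {s: t for s, r, t in parsed.triples if r == ":instance"}
        if instances.get(parsed.top) not in MOTION_PREDICATES:
            return None  # not a movement command; other templates would apply

        # Collect the spatial parameters the template cares about.
        slots = {}
        for s, role, t in parsed.triples:
            if s == parsed.top and role in (":direction", ":extent", ":path",
                                            ":destination", ":ARG4"):
                slots[role] = instances.get(t, t)

        # The robot needs an endpoint: an extent or a destination/:ARG4 end point.
        needs_clarification = not ({":extent", ":destination", ":ARG4"} & slots.keys())

        return {
            "speech_act": "command:move",
            "action": "go-02",               # all movement verbs map to one frame
            "slots": slots,
            "clarify": needs_clarification,  # e.g. a bare "move forward" or "turn"
        }

    # Gold AMR for "move forward 3 feet" (figure 3).
    gold = penman.decode("""
    (m / move-01 :mode imperative
       :ARG0 (y / you)
       :ARG1 y
       :direction (f / forward)
       :extent (d / distance-quantity :quant 3 :unit (f2 / foot)))
    """)
    print(to_command_move(gold))
    # {'speech_act': 'command:move', 'action': 'go-02',
    #  'slots': {':direction': 'forward', ':extent': 'distance-quantity'},
    #  'clarify': False}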
References

Omri Abend and Ari Rappoport. 2013. Universal Conceptual Cognitive Annotation (UCCA). In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 228–238.

Jens Allwood, David Traum, and Kristiina Jokinen. 2000. Cooperation, dialogue and ethics. International Journal of Human-Computer Studies, 53(6):871–914.

John Langshaw Austin. 1975. How to Do Things with Words, volume 88. Oxford University Press.

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project. In Proc. of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1, pages 86–90. Association for Computational Linguistics.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186.

Srinivas Bangalore, Dilek Hakkani-Tür, and Gokhan Tur. 2006. Introduction to the special issue on spoken language understanding in conversational systems. Speech Communication, 48(3):233–238.

Claire Bonial, Stephanie Lukin, Ashley Foots, Cassidy Henry, Matthew Marge, Kimberly Pollard, Ron Artstein, David Traum, and Clare R. Voss. 2018. Human-robot dialogue and collaboration in search and navigation. In Proceedings of the Annotation, Recognition and Evaluation of Actions (AREA) Workshop at LREC 2018.

Harry Bunt, Jan Alexandersson, Jae-Woong Choe, Alex Chengyu Fang, Koiti Hasida, Volha Petukhova, Andrei Popescu-Belis, and David R. Traum. 2012. ISO 24617-2: A semantically-based standard for dialogue annotation. In LREC, pages 430–437.

Joyce Y. Chai, Rui Fang, Changsong Liu, and Lanbo She. 2017. Collaborative language grounding toward situated human-robot dialogue. AI Magazine, 37(4):32.

Joyce Y. Chai, Lanbo She, Rui Fang, Spencer Ottarson, Cody Littley, Changsong Liu, and Kenneth Hanson. 2014. Collaborative effort towards common ground in situated human-robot dialogue. In Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction, pages 33–40. ACM.

Yun-Nung Chen, Dilek Hakkani-Tür, Gökhan Tür, Jianfeng Gao, and Li Deng. 2016. End-to-end memory networks with knowledge carryover for multi-turn spoken language understanding. In INTERSPEECH, pages 3245–3249.

Donald Davidson. 1969. The individuation of events. In Essays in Honor of Carl G. Hempel, pages 216–234. Springer.

Lucia Donatelli, Michael Regan, William Croft, and Nathan Schneider. 2018. Annotation of tense and aspect semantics for sentential AMR. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Charles J. Fillmore, Christopher R. Johnson, and Miriam R.L. Petruck. 2003. Background to FrameNet. International Journal of Lexicography, 16(3):235–250.

Jeffrey Flanigan, Sam Thomson, Jaime Carbonell, Chris Dyer, and Noah A. Smith. 2014. A discriminative graph-based parser for the Abstract Meaning Representation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1426–1436.

Sahil Garg, Aram Galstyan, Ulf Hermjakob, and Daniel Marcu. 2016. Extracting biomolecular interactions using semantic parsing of biomedical text. In Proc. of AAAI, Phoenix, Arizona, USA.

Barbara J. Grosz and Candace L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175–204.

Dilek Hakkani-Tür, Gökhan Tür, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, and Ye-Yi Wang. 2016. Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM. In Interspeech, pages 715–719.

Daniel Hershcovich, Omri Abend, and Ari Rappoport. 2017. A transition-based directed acyclic graph parser for UCCA. arXiv preprint arXiv:1704.00552.

Thomas Howard, Stephanie Tellex, and Nicholas Roy. 2014. A natural language planner interface for mobile manipulators. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2014).

Sunil Issar and Wayne Ward. 1993. CMU's robust spoken language understanding system. In Third European Conference on Speech Communication and Technology.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2008. A large-scale classification of English verbs. Language Resources and Evaluation, 42(1):21–40.
Anton Leuski and David Traum. 2011. NPCEditor: Creating virtual human dialogue using information retrieval techniques. AI Magazine, 32(2):42–56.

Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, and Noah A. Smith. 2015. Toward abstractive summarization using semantic representations. In Proc. of NAACL.

Stephanie M. Lukin, Felix Gervits, Cory J. Hayes, Anton Leuski, Pooja Moolchandani, John G. Rogers III, Carlos Sanchez Amaro, Matthew Marge, Clare R. Voss, and David Traum. 2018. ScoutBot: A dialogue system for collaborative navigation. arXiv preprint arXiv:1807.08074.

Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.

Matthew Marge, Claire Bonial, Brendan Byrne, Taylor Cassidy, A. William Evans, Susan G. Hill, and Clare Voss. 2016a. Applying the Wizard-of-Oz technique to multimodal human-robot dialogue. In Proc. of RO-MAN.

Matthew Marge, Claire Bonial, Kimberly A. Pollard, Ron Artstein, Brendan Byrne, Susan G. Hill, Clare Voss, and David Traum. 2016b. Assessing agreement in human-robot dialogue strategies: A tale of two wizards. In International Conference on Intelligent Virtual Agents, pages 484–488. Springer.

Arindam Mitra and Chitta Baral. 2016. Addressing a question answering challenge by combining statistical methods with inductive rule learning and reasoning. In Proc. of AAAI, pages 2779–2785.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

Xiaoman Pan, Taylor Cassidy, Ulf Hermjakob, Heng Ji, and Kevin Knight. 2015. Unsupervised entity linking with Abstract Meaning Representation. In Proc. of HLT-NAACL, pages 1130–1139.

Terence Parsons. 1990. Events in the Semantics of English, volume 5. MIT Press, Cambridge, MA.

Penman Natural Language Group. 1989. The Penman user guide. Technical report, Information Sciences Institute.

Nima Pourdamghani, Kevin Knight, and Ulf Hermjakob. 2016. Generating English from Abstract Meaning Representations. In Proc. of INLG, pages 21–25.

Sudha Rao, Daniel Marcu, Kevin Knight, and Hal Daumé III. 2017. Biomedical event extraction using Abstract Meaning Representation. In Proc. of BioNLP, pages 126–135, Vancouver, Canada.

Lenhart K. Schubert. 2015. Semantic representation. In AAAI, pages 4132–4139.

John Rogers Searle. 1969. Speech Acts: An Essay in the Philosophy of Language, volume 626. Cambridge University Press.

Lanbo She and Joyce Chai. 2017. Interactive learning of grounded verb semantics towards human-robot communication. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1634–1644, Vancouver, Canada. Association for Computational Linguistics.

Mark Steedman and Jason Baldridge. 2009. Combinatory categorial grammar. Nontransformational Syntax: A Guide to Current Models. Blackwell, Oxford, 9:13–67.

David Traum, Cassidy Henry, Stephanie Lukin, Ron Artstein, Felix Gervits, Kimberly Pollard, Claire Bonial, Su Lei, Clare Voss, Matthew Marge, Cory Hayes, and Susan Hill. 2018. Dialogue structure annotation for multi-floor interaction. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).

David R. Traum and Staffan Larsson. 2003. The information state approach to dialogue management. In Current and New Directions in Discourse and Dialogue, pages 325–353. Springer.

Chuan Wang, Nianwen Xue, and Sameer Pradhan. 2015. A transition-based algorithm for AMR parsing. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 366–375.

Yanshan Wang, Sijia Liu, Majid Rastegar-Mojarad, Liwei Wang, Feichen Shen, Fei Liu, and Hongfang Liu. 2017. Dependency and AMR embeddings for drug-drug interaction extraction from biomedical literature. In Proc. of ACM-BCB, pages 36–43, New York, NY, USA.