Classifying the form of iconic hand gestures from the linguistic categorization of co-occurring verbs
Magdalena Lis and Costanza Navarretta
Centre for Language Technology, University of Copenhagen
Njalsgade 140, 2300 Copenhagen
magdalena@hum.ku.dk, costanza@hum.ku.dk

Abstract

This paper deals with the relation between speech and the form of co-occurring iconic hand gestures. It focuses on the multimodal expression of eventualities. We investigate to what extent it is possible to automatically classify gestural features from the categorization of verbs in a wordnet. We do so by applying supervised machine learning to an annotated multimodal corpus. The annotations describe form features of gestures. They also contain information about the type of eventuality and about verb Aktionsart and Aspect, which were extracted from plWordNet 2.0. Our results confirm the hypothesis that Eventuality Type and Aktionsart are related to the form of gestures. They also indicate that it is possible to some extent to classify certain form characteristics of gestures from the linguistic categorization of their lexical affiliates. We also identify the gestural form features which are most strongly correlated to the Viewpoint adopted in gesture.

Keywords: multimodal eventuality expression, iconic co-speech gesture, wordnet, machine learning

1 Introduction

In face-to-face interaction humans communicate by means of speech as well as co-verbal gestures, i.e. spontaneous and meaningful hand movements semantically integrated with concurrent spoken utterances (Kendon, 2004; McNeill, 1992). Gestures which depict entities are called iconic gestures. Such gestures are co-expressive with speech, but not redundant. According to, inter alia, McNeill (1992; 2005) and Kendon (2004), they form an integral part of a spoken utterance.

Iconic gestures are especially well suited to express spatio-motoric information (Alibali et al., 2001; Krauss et al., 2000; Rauscher et al., 1996) and thus often accompany verbal expressions of eventualities, in particular motion eventualities. Eventuality is an umbrella term for entities like events, actions, states, processes, etc. (Ramchand, 2005).¹ On the level of language, such entities are mostly denoted by verbs. Gesturally, they are depicted by means of the iconicity relation (McNeill, 1992; McNeill, 2005; Peirce, 1931). This relation does not, however, on its own fully explain the form that a gesture takes: a referent can be depicted in gestures in multiple ways, for instance from different perspectives, that of the observer or that of the agent. How a speaker chooses to represent a referent gesturally determines which physical form a gesture takes. Knowledge about the factors influencing this choice is still sparse (Kopp et al., 2008). It is, however, crucial not only for our understanding of human communication but also for theoretical models of gesture production and its interaction with speech. Such models can in turn inform the generation of natural communicative behaviors in Embodied Conversational Agents (Kopp et al., 2008).

¹ In gesture studies the terms 'action' or 'event' are habitually used in this sense. We adopted the term 'eventuality' to accommodate the terminology for the Aktionsart categories reported in Subsection 3.2.2, where 'action' and 'event' are subcategories of what can be termed 'eventualities.'

The present paper contributes to this understanding. It addresses a particular aspect of gesture production and its relationship to speech, with focus on the multimodal expression of eventualities. Various factors have been suggested to influence eventuality gestures, including referent characteristics (Parrill, 2010; Poggi, 2008), verb Aspect (Duncan, 2002) and Aktionsart (Becker et al., 2011). We present a pilot study investigating the extent to which hand gestures can be automatically classified from the information about these factors.
We extract this information from the categorization of verbs in a lexical-semantic database called a wordnet. The theoretical background and methodological framework are discussed in (Lis, 2012a; Lis, 2012b; Lis, submitted).

In the present paper, differing from preceding studies on the multimodal expression of eventualities, we test the hypotheses by applying supervised learning to the data. Our aim in employing this method is to test the annotation scheme and the potential application of the annotations in automatic systems, and to study the relationship between speech and gesture not only as relations between single variables but also between groups of attributes. In this, we follow the approach adopted by a number of researchers. For example, Jokinen and colleagues (2008) have used classification experiments to test the adequacy of annotation categories for the studied phenomenon. Louwerse and colleagues (2006a; 2006b) have applied machine learning algorithms to annotated English map-task dialogues to study the relation between facial expressions, gaze and speech. A number of papers (Fujie et al., 2004; Morency et al., 2009; Morency et al., 2005; Morency et al., 2007; Navarretta and Paggio, 2010) describe classification experiments testing the correlation between speech, prosody and head movements in annotated multimodal corpora. Machine learning algorithms have also been applied to annotations of hand gestures and the co-occurring referring expressions in order to identify gestural features relevant for coreference resolution (Eisenstein and Davis, 2006; Navarretta, 2011).

Moreover, in the present work we extend the annotations reported in (Lis, 2012b) with two more form attributes (Movement and Direction). These attributes are chosen because they belong to the fundamental parameters of gesture form description (Bressem, 2013) and they are associated with motion, so they are expected to be of importance considering that we study eventualities, especially motion ones.

The paper is organized as follows. In section 2, we shortly present the background for our study, and in section 3 we describe the multimodal corpus and the annotations used in our analyses. In section 4, we present the machine learning experiments, and in section 5 we discuss our results and their implications, and we propose directions for future research.

2 Background

The form of co-verbal, iconic gestures is influenced by, among others, the semantics of the co-occurring speech and by the visually perceivable characteristics of the entity referred to (Kita and Özyürek, 2003; McNeill, 1992). Poggi (2008) has suggested that not only the observable properties of the referent should be taken into consideration but also "the type of semantic entity it constitutes." She has distinguished four such types (Animates, Artifacts, Natural Objects and Eventualities) and proposed that their gestural representation will differ.

Eventualities themselves can still be represented in gesture in various ways, for example from different Viewpoints (McNeill, 1992; McNeill, 2005). In Character Viewpoint (C-vpt) gestures, an eventuality is shown from the perspective of the agent: the gesturer mimes the agent's behavior. In Observer Viewpoint (O-vpt), the narrator sees the eventuality as an observer, and in Dual Viewpoint (D-vpt), the gesturer merges the two perspectives. Parrill (2010) has suggested that the choice of Viewpoint is influenced by the eventuality structure. She has proposed that eventualities which have the trajectory as the more salient element elicit O-vpt gestures, while eventualities in which the use of the character's hands in accomplishing a task is more prominent tend to evoke C-vpt gestures.

Other factors suggested to influence eventuality gestures include verb Aspect and Aktionsart. Aspect marks "different ways of viewing the internal temporal constituency of a situation" (Comrie, 1976). The most common distinction is between perfective and imperfective aspect: the former draws focus to the completeness and resultativeness of an eventuality, whereas with the latter the eventuality is viewed as ongoing. Duncan (2002) has analyzed the relationship between the Aspect of verbs and Handedness in gestures in English and Chinese data. Handedness regards which hand performs the movement and, in the case of bi-handed gestures, whether the hands mirror each other. Duncan has found that symmetric bi-handed gestures more often accompany perfective verbs than imperfective ones; the latter mostly co-occur with two-handed non-symmetric gestures. Parrill and colleagues (2013) have investigated the relationship between verbal Aspect and gesture Iteration (repetition of a movement pattern within a gesture).
They have found that descriptions in progressive Aspect are more often accompanied by iterated gestures. This is, however, only the case if the eventualities are presented to the speakers in that Aspect in the stimuli.

Aktionsart is a notion similar to, but discernible from, Aspect.² It concerns Vendler's (1967) distinction between States, Activities, Accomplishments and Achievements, according to the differences between the static and the dynamic, the telic and the atelic, and the durative and the punctual. Becker and colleagues (2011) have conducted a qualitative study on Aktionsart and the temporal coordination between speech and gesture. They have suggested that gestures affiliated with Achievement and Accomplishment verbs are completed, or repeated, on the goal of the verb, whereas in the case of gestures accompanying Activity verbs, the stroke coincides with the verb itself.

² For a discussion of the differences between Aspect and Aktionsart and between the Germanic and Slavic traditions of viewing these two concepts, cf. (Młynarczyk, 2004).

Lis (2012a) has introduced a framework in which the relationship between these factors and gestural expressions of eventualities is investigated using wordnet databases, i.e. electronic linguistic taxonomies. She has employed wordnets to, among others, formalize Poggi's (2008) and Parrill's (2010) insights. Based on the plWordNet 1.5 classification, she has distinguished different types of eventualities and showed their correlation with gestural representation (Lis, 2012b). The present study further builds on that work, using updated (plWN 2.0), revised (Lis, submitted) and extended annotations and machine learning experiments.

3 The data

3.1 The corpus

Our study was conducted on the refined annotations (Lis, submitted) from the corpus described in (Lis, 2012a; Lis, 2012b), which has in turn been an enriched version of the PCNC corpus created by the DiaGest research group (Karpiński et al., 2008). Data collection followed the well-established methodology of McNeill (1992; 2005): the corpus consists of audio-video recordings of 5 male and 5 female adult native Polish speakers who re-tell a Canary Row cartoon to an addressee. The stimulus contains numerous eventualities and has proved to elicit rich multimodal output. The monologues were recorded in a studio as shown in Figure 1, and the whole corpus consists of approximately one hour of recordings.

[Figure 1: A snapshot from the ANVIL tool]

3.2 The annotation

Speech has been transcribed with word time stamps by the DiaGest group, who have also identified communicative hand gestures and annotated their phases, phrases and semiotic types in ELAN (Wittenburg et al., 2006). Lis (2012a; 2012b) exported the annotations to the ANVIL tool (Kipp, 2004) and enriched them with a coding of verbs and of the Viewpoint, Handedness, Handshape and Iteration of gestures. The annotations in the corpus were refined and, for the purpose of the present study, further extended with two more gesture form attributes (Direction and Movement) (Lis, submitted).

3.2.1 The annotation of gestures

Iconic hand gestures were identified based on DiaGest's annotation of semiotic types. Gestures depicting eventualities were manually annotated using six pre-defined features, as reported in detail in (Lis, submitted). Table 1 shows the attributes and values for the gesture annotation used in this study. Viewpoint describes the perspective adopted by the speaker and was encoded using the values proposed by McNeill (1992): C-, O- and D-vpt. The attribute Handedness indicates whether one (Right_Hand, Left_Hand) or two hands are gesturing and whether the two hands are symmetric or not (Symmetric_Hands versus Non-symmetric_Hands). Handshape refers to the configuration of the palm and fingers of the gesturing hand(s); the values are taken from the American Sign Language Handshape inventory (Tennant and Brown, 2010): ASL_C, ASL_G, ASL_5, ASL_O, ASL_S, supplemented with a value for hand shapes changing throughout the stroke (Complex) or not falling under any of the mentioned categories (Handshape_Other). Iteration indicates whether a particular movement pattern within a stroke occurs once (Single) or multiple times (Repeated), or whether the stroke consists of a static Hold. Movement regards the shape of the motion, while Direction regards the plane on which the motion is performed.

Table 1: Annotations of gestures

  Attribute   Values
  Viewpoint   Observer_Viewpoint, Character_Viewpoint, Dual_Viewpoint
  Handshape   ASL_C, ASL_G, ASL_5, ASL_O, ASL_S, Complex, Other
  Handedness  Right_Hand, Left_Hand, Symmetric_Hands, Non-symmetric_Hands
  Iteration   Single, Repeated, Hold
  Movement    Straight, Arc, Circle, Complex, None
  Direction   Vertical, Horizontal, Multidirectional, None
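To make the scheme concrete, the sketch below encodes the attributes and values of Table 1 as Java types. This is purely illustrative: the type and field names are our own, and the actual corpus is annotated in ANVIL tracks rather than in code.

```java
// A minimal, hypothetical rendering of the gesture annotation scheme in
// Table 1; names are ours and do not come from the corpus tooling.
enum Viewpoint  { OBSERVER_VIEWPOINT, CHARACTER_VIEWPOINT, DUAL_VIEWPOINT }
enum Handshape  { ASL_C, ASL_G, ASL_5, ASL_O, ASL_S, COMPLEX, OTHER }
enum Handedness { RIGHT_HAND, LEFT_HAND, SYMMETRIC_HANDS, NON_SYMMETRIC_HANDS }
enum Iteration  { SINGLE, REPEATED, HOLD }
enum Movement   { STRAIGHT, ARC, CIRCLE, COMPLEX, NONE }
enum Direction  { VERTICAL, HORIZONTAL, MULTIDIRECTIONAL, NONE }

// One annotated gesture token, as it would enter the experiments in section 4.
record GestureAnnotation(Viewpoint viewpoint, Handshape handshape,
                         Handedness handedness, Iteration iteration,
                         Movement movement, Direction direction) { }
```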
3.2.2 The annotation of verbs

Verbs were identified in the time-stamped speech transcript. Information about the verbs was extracted from the Polish wordnet, plWordNet 2.0, following the procedure explained in (Lis, 2012a; Lis, 2012b). In a wordnet, the lexical units are classified into sets of synonyms, called synsets, which are linked to each other via a number of conceptual-semantic and lexical relations (Fellbaum, 1998). The most frequently encoded one is hyponymy, also called the IS_A or TYPE_OF relation, which connects a sub-class to its super-class, the hyperonym. Non-lexical synsets in the upper-level hierarchies of the hyponymy encodings in plWordNet contain information on verb Aspect, Aktionsart and domain (Maziarz, 2012).

A domain denotes a segment of reality, and all lexical units belonging to a particular domain share a common semantic property (Brinton, 2000). Lis (2012a; 2012b) has used wordnet domains to categorize the referents of multimodal expressions according to their type. The attribute Eventuality Type was assigned based on the domain of the verb used in speech to denote the eventuality. The choice of the domains in focus has been partially inspired by Parrill's distinction between eventualities with a more prominent trajectory versus eventualities with a more prominent handling element (Parrill, 2010). Based on this, Lis (2012a; 2012b) has distinguished two Eventuality Types:³ Translocation and Body_Motion. The former refers to eventualities with the traversal of a path by a moving object or a focus on spatial arrangement, and the latter refers to a movement of the agent's body (part) not entailing displacement of the agent as a whole (cf. (Levin, 1993)). Lis has subsumed the plWordNet domains to fit this distinction. The domains relevant to our study are (with examples of verbs from the corpus given in parentheses):

TRANSLOCATION
{location or spatial relations}⁴ (spadać 'to fall,' zderzać się 'to collide');
{change of location or spatial relations change} (biegać 'to run,' skakać 'to jump').

BODY_MOTION
{causing change of location or causing spatial relations change} (rzucać 'to throw,' otwierać 'to open');
{physical contact} (bić 'to beat,' łapać 'to catch');
{possession} (dawać 'to give,' brać 'to take');
{producing} (budować 'to build,' gotować 'to cook').

³ Note that these categories are orthogonal to Poggi's (2008) ontological types.
⁴ In wordnets {} indicates a synset.

Verbs from the synset {location or spatial relations} and its alterational counterpart were subsumed under the type Translocation. More examples of these verbs from the corpus include: wspinać się 'to climb,' chodzić 'to walk,' wypadać 'to fall out.' The synsets {causing change of location or causing spatial relations change} and {physical contact}, as well as {possession} and {producing}, were grouped under the type Body_Motion. Further verb examples are: przynosić 'to bring,' trzymać 'to keep,' walić 'to bang,' dawać 'to give,' szyć 'to sew.' Verbs from the remaining domains were collected under the umbrella term Eventuality_Other. These verbs constituted less than 10% of all verb-gesture tokens found in the data. Examples include: {social relationships} grać 'to play,' {mental or emotional state} oglądać 'to watch.' For the purpose of the analyses in the present paper, they were combined with the Body_Motion category.⁵ The domains were semi-automatically assigned to the verbs in our data. Verb polysemy was resolved with a refined version (Lis, submitted) of the heuristics proposed in (Lis, 2012b).

⁵ The resulting frequency distribution of Type in the verb-gesture pairs: Translocation (150) and Body_Motion+Other (119).
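As an illustration of the grouping just described, the following sketch maps a verb's plWordNet domain (given here by its English gloss, as in the lists above) to an Eventuality Type. The class and method names are hypothetical; the actual assignment in the study was done semi-automatically on the corpus, not with this code.

```java
import java.util.Set;

// Illustrative grouping of plWordNet verb domains into Eventuality Types;
// class and method names are ours, not part of any plWordNet API.
final class EventualityTyper {
    private static final Set<String> TRANSLOCATION = Set.of(
            "location or spatial relations",
            "change of location or spatial relations change");
    private static final Set<String> BODY_MOTION = Set.of(
            "causing change of location or causing spatial relations change",
            "physical contact",
            "possession",
            "producing");

    // Maps a verb's domain gloss to one of the Eventuality Type values.
    static String typeOf(String domain) {
        if (TRANSLOCATION.contains(domain)) return "Translocation";
        if (BODY_MOTION.contains(domain)) return "Body_Motion";
        // Remaining domains; merged with Body_Motion in the analyses above.
        return "Eventuality_Other";
    }
}
```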
Apart from the domains, the encoding of the hyponymy-hyperonymy relations of verbs in plWordNet also provides information about Aktionsart and Aspect. The attribute Aspect has two possible values: Perfective and Imperfective. For Aktionsart, seven categories are distinguished: States, Acts, Activities, Accidents, Events, Actions and Processes. They are Laskowski's (1998) adaptation of Vendler's (1967) Aktionsart classification to the features typical of the Polish language.⁶ Table 2 shows the attributes and values for the verb annotation used in our study.

⁶ Laskowski's (1998) categories of Vendler's (1967) Aktionsart are called Classes. For the sake of simplicity, we use the term Aktionsart instead of Class to refer to them.

Table 2: Annotations of verbs

  Attribute         Values
  Eventuality Type  Translocation, Body_Motion, Other
  Aspect            Perfective, Imperfective
  Aktionsart        State, Act, Activity, Accident, Event, Action, Process

3.2.3 The annotation process

Gestures and verbs were coded on separate tracks and connected by means of the MultiLink option in ANVIL. Gestures were linked to the semantically affiliated verb. The verbs and gestures were closely related temporally: 80% of the verb onsets fell within the stroke phase or slightly preceded it (Lis, submitted). Figure 1 shows a screenshot of the annotations in the tool. 269 relevant verb-gesture pairs were found in the data. Intercoder agreement was calculated for the majority of the gesture annotation attributes and ranged from 0.67 to 0.96 (Lis, submitted) in terms of the κ score (Cohen, 1960), i.e. from substantial to almost perfect agreement (Rietveld and van Hout, 1993).
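For reference, Cohen's κ compares the observed agreement p_o with the agreement p_e expected from the two coders' marginal label frequencies: κ = (p_o − p_e) / (1 − p_e). A small self-contained sketch of the computation (ours, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

final class Agreement {
    // Cohen's kappa (Cohen, 1960) for two coders labeling the same items:
    // kappa = (pObserved - pExpected) / (1 - pExpected).
    static double cohensKappa(String[] coder1, String[] coder2) {
        int n = coder1.length;
        Map<String, Integer> counts1 = new HashMap<>();
        Map<String, Integer> counts2 = new HashMap<>();
        int agreements = 0;
        for (int i = 0; i < n; i++) {
            if (coder1[i].equals(coder2[i])) agreements++;
            counts1.merge(coder1[i], 1, Integer::sum);
            counts2.merge(coder2[i], 1, Integer::sum);
        }
        double pObserved = (double) agreements / n;
        double pExpected = 0.0;  // chance agreement from marginal frequencies
        for (Map.Entry<String, Integer> e : counts1.entrySet()) {
            pExpected += e.getValue()
                    * counts2.getOrDefault(e.getKey(), 0) / ((double) n * n);
        }
        return (pObserved - pExpected) / (1 - pExpected);
    }
}
```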
4 The classification experiments

In the machine learning experiments we wanted to test to which extent we can predict the form of hand gestures from the characteristics of eventualities and verbs, as reflected in plWordNet's categorization. The relevant data were extracted from the gesture and orthography tracks in ANVIL and combined using the MultiLink annotation. Classification experiments were performed in WEKA (Witten and Frank, 2005) using ten-fold cross-validation to train and test the classifiers. As the baseline in the evaluation, the results obtained by the ZeroR classifier were used. ZeroR always chooses the most frequently occurring nominal value. An implementation of a support vector classifier (WEKA's SMO) was applied in all other cases; various algorithms were tested, with SMO giving the best results. The results of the experiments are provided in terms of Precision, Recall and F-score (Witten and Frank, 2005).
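The setup just described can be reproduced with WEKA's Java API roughly as follows. The sketch assumes the annotated verb-gesture pairs have been exported to an ARFF file (the file name is hypothetical) whose last attribute is the gesture feature to be predicted; it is our reconstruction of the procedure, not the authors' code.

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GestureFormExperiment {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF export of the 269 verb-gesture pairs.
        Instances data = DataSource.read("verb_gesture_pairs.arff");
        data.setClassIndex(data.numAttributes() - 1);  // e.g. Handshape

        // ZeroR majority-class baseline versus the SMO support vector
        // classifier, both evaluated with ten-fold cross-validation.
        for (Classifier c : new Classifier[] { new ZeroR(), new SMO() }) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1));
            System.out.printf("%-5s P=%.2f R=%.2f F=%.2f%n",
                    c.getClass().getSimpleName(),
                    eval.weightedPrecision(),
                    eval.weightedRecall(),
                    eval.weightedFMeasure());
        }
    }
}
```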
4.1 Classifying the gesture form features from linguistic information

In these experiments we wanted to test whether it is possible to predict the form of the gesture from the type of the eventuality referred to and from information about Aspect and Aktionsart. The first group of experiments regards the Handshape attribute, with seven possible values. In Table 3, the results of these experiments are shown. They indicate that Aspect information does not at all affect the classification of Handshape, and that Eventuality Type and Aktionsart only slightly contribute to the classification (the best result is obtained using the Eventuality Type annotation, an F-score improvement of 0.1 with respect to the baseline, but it is not significant).⁷ Not surprisingly, the confusion matrix from this experiment shows that the categories which are assigned more correctly are those that occur more often in the data (ASL_5 and ASL_S).

⁷ We indicate significant results with ∗. Significance was calculated with a one-tailed t-test and p < 0.05.

Table 3: Classification of Handshape

  Handshape   Precision  Recall  F-score
  baseline    0.08       0.28    0.12
  Aspect      0.08       0.28    0.12
  Aktionsart  0.22       0.28    0.21
  Type        0.17       0.32    0.22
  all         0.19       0.27    0.21

In the following experiment, we wanted to test whether Aktionsart, Aspect and Eventuality Type are related to the employment of the hands in the gestures. Thus, Handedness was predicted using the verb-related annotations. The results of these experiments are in Table 4. Also in this case, Aspect does not contribute to the prediction of gesture form. However, the results show that information about the Eventuality Type to some extent improves classification with respect to the baseline (F-score improvement: 0.14∗). The most correctly identified gestures were performed with Right_Hand and Symmetric_Hands, which are the most frequently occurring Handedness values in the data.

Table 4: Classification of Handedness

  Handedness  Precision  Recall  F-score
  baseline    0.2        0.44    0.27
  Aspect      0.2        0.44    0.27
  Aktionsart  0.33       0.45    0.37
  Type        0.36       0.48    0.41
  all         0.35       0.47    0.40

In the third group of experiments, we wanted to investigate whether the linguistic categorization of verbs improves the prediction of gesture Iteration. The results of these classification experiments are in Table 5. They indicate that no single feature contributes to the classification of hand repetition: in all cases the most frequently occurring value, Single, is chosen, as in the baseline.

Table 5: Classification of Iteration

  Iteration   Precision  Recall  F-score
  baseline    0.55       0.74    0.63
  Aspect      0.55       0.74    0.63
  Aktionsart  0.55       0.74    0.63
  Type        0.55       0.74    0.63
  all         0.55       0.74    0.63

In the fourth group of experiments we analyzed whether the linguistic categorization of verbs enhances the prediction of Movement. We present the results of these classification experiments in Table 6. They show that none of the investigated verbal attributes has a relation to the Movement in gesture.

[Table 6: Classification of Movement]

In the fifth group of experiments, the relation between the linguistic categorization of verbs and the direction of the hand movement was determined. The results of these classification experiments are given in Table 7. They indicate that only Aktionsart contributes to the prediction of Direction (the improvement with respect to the baseline: 0.16∗).

Table 7: Classification of Direction

  Direction   Precision  Recall  F-score
  baseline    0.26       0.50    0.34
  Aspect      0.26       0.50    0.34
  Aktionsart  0.47       0.55    0.50
  Type        0.26       0.50    0.34
  all         0.47       0.55    0.50

4.2 Classifying the Viewpoint

In the following experiments we investigated to what extent it is possible to predict the Viewpoint in gesture from a) the linguistic categorization of the verb and b) the gesture form.

In the first experiment, we tried to automatically identify the Viewpoint in the gesture from the Eventuality Type annotation. We also investigated to which extent the verb Aspect and Aktionsart contribute to the classification. The results of these experiments are in Table 8. They confirm that there is a strong correlation between Viewpoint and Eventuality Type (F-score improvement with respect to the baseline: 0.36∗). We also found a correlation between Viewpoint and Aktionsart.

Table 8: Predicting the Viewpoint type from linguistic information

  Viewpoint   Precision  Recall  F-score
  baseline    0.29       0.54    0.38
  Aspect      0.29       0.54    0.38
  Aktionsart  0.53       0.59    0.53
  Type        0.71       0.78    0.74
  all         0.71       0.78    0.74

In Figure 2 the confusion matrix for the best classification results is given. Not surprisingly, the classifier did not perform well on the very infrequent category, i.e. D-vpt.

[Figure 2: Confusion matrix for the best classification of Viewpoint from linguistic information]

In the second experiment, the Viewpoint was predicted
from Handshape, Handedness, Iteration, Movement and Direction. Table 9 summarizes the results of these experiments. They demonstrate a strong correlation between the form of a gesture and the gesturer's Viewpoint: the F-score improvement with respect to the baseline is 0.31∗ when all form-related features are used, and all features contribute to the classifications. Handshape and Handedness are the features most strongly correlated to Viewpoint. In Figure 3 the confusion matrix for the best classification results is given.

Table 9: Predicting the Viewpoint type from form features

  Viewpoint   Precision  Recall  F-score
  baseline    0.29       0.54    0.38
  Handshape   0.64       0.7     0.67
  Handedness  0.58       0.64    0.60
  Iteration   0.67       0.57    0.44
  Movement    0.55       0.55    0.43
  Direction   0.67       0.57    0.44
  all         0.68       0.72    0.69

[Figure 3: Confusion matrix for the best classification of Viewpoint from gesture form features]

5 Discussion

[...]tualities, on the other hand, is less readily reenacted without the risk of hindering the communicative flow between interlocutors. It can, however, be easily depicted from an external perspective with gestures drawing paths. Moreover, we have identified the form features of gestures which are most tightly related to the Viewpoint, that is, Handshape and Handedness. In line with the previous interpretation, Lis (submitted) suggests that C-vpt gestures often depict interaction with an object, and the hand shapes reflect grasping and holding. O-vpt gestures, on the contrary, focus on shapes and spatial extents and thus utilize hand shapes convenient for depicting lines, i.e. a hand with extended finger(s). It needs, however, to be further examined how far the distribution of Handshape and Handedness in our data is motivated by the specifics of the stimuli.

Our findings also show that the type of eventuality improves the prediction of Handedness. However, Eventuality Type provides a more substantial improvement in the prediction of Viewpoint,
[...] our data.⁸ The three most frequent Aktionsart categories share the feature 'intentionality,' but belong to different groups in Vendler's classification (Maziarz et al., 2011). It should be investigated how far the different Aktionsart types in our data are represented across the different Eventuality Types, as that may provide a further explanation of the obtained results.

⁸ The frequency distribution of Aktionsart in the verb-gesture pairs: Activities (115), Acts (56), Actions (58), Events (23), Accidents (15), States (2), Processes (0); and of Aspect: Imperfective (179) and Perfective (126).

Aspect does not improve the classification for any feature. The observation that Aspect is related to Handedness (Duncan, 2002) and Iteration (Parrill et al., 2013) is thus not reflected in this corpus. It needs to be remembered that the relationship between Aspect and Iteration was found by Parrill and colleagues (2013) only when the eventualities were presented to the speakers in the appropriate Aspect in the stimuli. Our results suggest that it may not be generalizable to an overall correlation between Aspect and gesture Iteration. Moreover, Aspect is expressed very differently in the three languages under consideration (Polish in the present study, English in (Parrill et al., 2013), and English and Chinese in (Duncan, 2002)). Cross-linguistic differences have been found to be reflected in gesturing (Kita and Özyürek, 2003). Whether such differences in the encoding of Aspect impact gestures should thus be investigated further.

The results of the experiments also indicate that gestural Iteration and Movement are not at all related to the linguistic characteristics of the co-occurring verb and that the only feature improving the classification of gesture Direction is Aktionsart. For Iteration, however, our data are biased in that single gestures are predominant, which may have affected the results. Regarding Movement and Direction, we suggest that they may be primarily dependent on the visual properties of the referent rather than on the investigated factors. For example, Kita and Özyürek (2003) have found that the direction of a gesture in elicited narrations reflects the direction in which an eventuality has been presented in the stimuli. The only improvement identified in our experiments in the classification of Direction (due to Aktionsart) requires further investigation.

Our results suggest the viability of the framework adopted in the paper, i.e. the application of a wordnet to the investigation of speech-gesture ensembles. Wordnet classification of lexical items can be used to shed some light on speech-related gestural behavior. Using a wordnet as an external source of annotation increases coding reliability, and due to the wordnet's machine-readable format, it enables automatic assignment of values. Wordnets exist for numerous languages, and the approach may thus be applied cross-linguistically and help to uncover universal versus language-specific structures in gesture production. The findings support the viability of a number of categories in the annotation scheme used: they corroborate that the type of referent is a category relevant to studying gestural characteristics, and they validate the importance of introducing distinctions among eventualities for multimodal phenomena. The experiments also identify another attribute, i.e. Aktionsart, as relevant in the framework.

It has, however, to be noted that our study is only preliminary, because the results of our machine learning experiments are biased by the fact that for some attributes certain values occur much more frequently than others in the data. Future work should address normalization as a possible solution. Moreover, our findings are based on narrational data and need to be tested on different types of interaction. Most importantly, the dataset we used is small for machine learning purposes. Due to the time load of multimodal annotation, small datasets are a well-known challenge in gesture research. Our results thus await validation on a larger sample. Also, cross-linguistic studies on comparable corpora should be performed.

In the present work only one type of bodily behavior, i.e. hand gestures, was taken into account, but people use all of their body when communicating. Thus, we plan to extend our investigation to gestures of other articulators, such as head movements and posture changes. In the present work only gestures referring to eventualities were considered; Lis (submitted) has recently started extending the wordnet-based framework and investigation to animate and inanimate objects.
References

Polish WordNet. Wrocław University of Technology. http://plwordnet.pwr.wroc.pl/wordnet/.

Tennant, R. and Brown, M. The American Sign Language Handshape Dictionary. Gallaudet University Press, Washington, DC (2010).
Alibali, M. W., Heath, D. C., and Meyers, H. J. Effects of visibility between speakers and listeners on gesture production: Some gestures are meant to be seen. Journal of Memory and Language, 44:159–188 (2001).

Becker, R., Cienki, A., Bennett, A., Cudina, C., Debras, C., Fleischer, Z., Haaheim, M., Mueller, T., Stec, K., and Zarcone, A. Aktionsarten, speech and gesture. In Gesture and Speech in Interaction '11 (2011).

Bressem, J. A linguistic perspective on the notation of form features in gestures. In Body – Language – Communication. Handbooks of Linguistics and Communication Science. Mouton de Gruyter, Berlin/New York (2013).

Brinton, L. The Structure of Modern English: A Linguistic Introduction. John Benjamins Publishing Company (2000).

Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46 (1960).

Comrie, B. Aspect. Cambridge University Press, Cambridge (1976).

Duncan, S. Gesture, verb Aspect, and the nature of iconic imagery in natural discourse. Gesture, 2(2):183–206 (2002).

Eisenstein, J. and Davis, R. Gesture features for coreference resolution. In Renals, S., Bengio, S., and Fiscus, J., editors, MLMI '06, pages 154–155 (2006).

Eisenstein, J. and Davis, R. Gesture improves coreference resolution. In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL, pages 37–40, New York (2006).

Fellbaum, C., editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA (1998).

Fujie, S., Ejiri, Y., Nakajima, K., Matsusaka, Y., and Kobayashi, T. A conversation robot using head gesture recognition as para-linguistic information. In Proceedings of the 13th IEEE International Workshop on Robot and Human Interactive Communication, pages 159–164 (2004).

Grice, H. P. Logic and conversation. In Syntax and Semantics, 3:41–58. Academic Press, New York (1975).

Jokinen, K., Navarretta, C., and Paggio, P. Distinguishing the communicative function of gesture. In Proceedings of MLMI (2008).

Karpiński, M., Jarmołowicz-Nowikow, E., Malisz, Z., Szczyszek, M., and Juszczyk, J. Rejestracja, transkrypcja i tagowanie mowy oraz gestów w narracji dzieci i dorosłych. Investigationes Linguisticae, 17 (2008).

Kendon, A. Gesture: Visible Action As Utterance. Cambridge University Press, Cambridge (2004).

Kipp, M. Gesture Generation by Imitation – From Human Behavior to Computer Character Animation. Boca Raton, Florida (2004).

Kita, S. and Özyürek, A. What does cross-linguistic variation in semantic coordination of speech and gesture reveal? Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1):16–32 (2003).

Kopp, S., Bergmann, K., and Wachsmuth, I. Multimodal communication from multimodal thinking – towards an integrated model of speech and gesture production. Semantic Computing, 2(1):115–136 (2008).

Krauss, R. M., Chen, Y., and Gottesman, R. F. Lexical gestures and lexical access: A process model. In McNeill, D., editor, Language and Gesture, pages 261–283. Cambridge University Press, New York (2000).

Laskowski, R. Kategorie morfologiczne języka polskiego — charakterystyka funkcjonalna. PWN, Warszawa (1998).

Levin, B. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993).

Lis, M. Annotation scheme for multimodal communication: Employing plWordNet 1.5. In Proceedings of the Formal and Computational Approaches to Multimodal Communication Workshop, 24th European Summer School in Logic, Language and Information (ESSLLI'12) (2012).

Lis, M. Influencing gestural representation of eventualities: Insights from ontology. In Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI'12), pages 281–288 (2012).

Lis, M. Multimodal representation of entities: A corpus-based investigation of co-speech hand gesture. PhD dissertation, University of Copenhagen (submitted).

Louwerse, M., Jeuniaux, P., Hoque, M., Wu, J., and Lewis, G. Multimodal communication in computer-mediated map task scenarios. In Sun, R. and Miyake, N., editors, Proceedings of the 28th Annual Conference of the Cognitive Science Society, Mahwah, NJ. Erlbaum (2006).

Louwerse, M. M., Benesh, N., Hoque, M., Jeuniaux, P., Lewis, G., Wu, J., and Zirnstein, M. Multimodal communication in face-to-face conversations. In Sun, R. and Miyake, N., editors, Proceedings of the 29th Annual Conference of the Cognitive Science Society, Mahwah, NJ. Erlbaum (2006).

Maziarz, M. Non-lexical verb synsets in upper-hierarchy levels of Polish WordNet 2.0. Technical report, Wrocław University of Technology (2012).

Maziarz, M., Piasecki, M., Szpakowicz, S., Rabiega-Wiśniewska, J., and Hojka, B. Semantic relations between verbs in Polish WordNet 2.0. Cognitive Studies, (11):183–200 (2011).

McNeill, D. Hand and Mind: What Gestures Reveal About Thought. University of Chicago Press, Chicago (1992).

McNeill, D. Gesture and Thought. University of Chicago Press, Chicago (2005).

Melinger, A. and Levelt, W. Gesture and the communicative intention of the speaker. Gesture, 4(2):119–141 (2005).

Młynarczyk, A. Aspectual pairing in Polish. PhD dissertation, University of Utrecht (2004).

Morency, L.-P., Sidner, C., Lee, C., and Darrell, T. Contextual recognition of head gestures. In Proceedings of the International Conference on Multimodal Interfaces (2005).

Morency, L.-P., Sidner, C., Lee, C., and Darrell, T. Head gestures for perceptual interfaces: The role of context in improving recognition. Artificial Intelligence, 171(8–9):568–585 (2007).

Morency, L.-P., de Kok, I., and Gratch, J. A probabilistic multimodal approach for predicting listener backchannels. Autonomous Agents and Multi-Agent Systems, 20:70–84 (2009).

Navarretta, C. Anaphora and gestures in multimodal communication. In Proceedings of the 8th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2011), pages 171–181, Faro, Portugal (2011).

Navarretta, C. and Paggio, P. Classification of feedback expressions in multimodal data. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL'10), pages 318–324, Uppsala, Sweden (2010).

Parrill, F. Viewpoint in speech-gesture integration: Linguistic structure, discourse structure, and event structure. Language and Cognitive Processes, 25(5):650–668 (2010).

Parrill, F., Bergen, B., and Lichtenstein, P. Grammatical aspect, gesture, and conceptualization: Using co-speech gesture to reveal event representations. Cognitive Linguistics, 24(1):135–158 (2013).

Peirce, C. S. Collected Papers of Charles Sanders Peirce (1931–58). Hartshorne, C., Weiss, P., and Burks, A., editors. Harvard University Press, Cambridge, MA (1931).

Poggi, I. Iconicity in different types of gestures. Gesture, 8(1):45–61 (2008).

Ramchand, G. Post-Davidsonianism. Theoretical Linguistics, 31(3):359–373 (2005).

Rauscher, F. H., Krauss, R. M., and Chen, Y. Gesture, speech, and lexical access: The role of lexical movements in speech production. Psychological Science, 7(4):226–231 (1996).

Rietveld, T. and van Hout, R. Statistical Techniques for the Study of Language and Language Behavior. Mouton de Gruyter, Berlin (1993).

Vendler, Z. Linguistics in Philosophy. Cornell University Press, Ithaca, NY (1967).

Witten, I. H. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition (2005).

Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., and Sloetjes, H. ELAN: A professional framework for multimodality research. In LREC'06, Fifth International Conference on Language Resources and Evaluation (2006).