Open Source HamNoSys Parser for Multilingual Sign Language Encoding

Sylwia Majchrowska (1,2), Marta Plantykow (3), Milena Olech (3)
1 Wroclaw University of Science and Technology, Wrocław, Poland
2 AI Sweden, Gothenburg, Sweden
3 Intel Technology Poland, Gdańsk, Poland
sylwia.majchrowska@ai.se, m.plantykow@gmail.com, milena.w.olech@gmail.com

arXiv:2204.06924v1 [cs.CL] 14 Apr 2022

Abstract
This paper presents our recent developments in the field of automatic processing of sign language corpora using the Hamburg Sign Language Annotation System (HamNoSys). We designed an automated tool to convert HamNoSys annotations into numerical labels for defined initial features of body and hand positions. Our proposed numerical multilabels greatly simplify the structure of HamNoSys annotation without significant loss of gloss meaning. These numerical multilabels can potentially be used to feed machine learning models, which would accelerate the development of vision-based sign language recognition. In addition, this tool can assist experts in the annotation process and help identify semantic errors. The code and sample annotations are publicly available at https://github.com/hearai/parse-hamnosys.

Keywords: HamNoSys, Multilingualism, Sign Language Processing

1. Introduction
People communicate to exchange information, feelings, and ideas. Verbal communication is the primary method of human interaction. Those who can speak and listen must use the same type of code (language) to communicate effectively. The inability to use spoken language makes Deaf people the most excluded group in society. Communication outside the Deaf community is a huge challenge, as sign languages (SLs) are unrelated to spoken languages.
Moreover, different nationalities use different versions of SL and there is no universal one. SLs are natural human languages with their own grammatical rules and lexicons. Therefore, developing an efficient system capable of translating spoken languages into sign languages and vice versa would greatly improve two-way communication. However, designing and implementing such a machine learning (ML) based system requires a large dataset with appropriate annotations.
Traditionally, SL corpora intended for ML training have been annotated at the gloss level. In the simplest terms, a gloss is a label. Nevertheless, the same word in different SLs can be represented by different gestures. Therefore, ML methods need large datasets for each SL separately to achieve good results. Although there is no common notation system for SL, the Hamburg Sign Language Notation System (HamNoSys) defines a unified set of written symbols that can be used by the creators of different SL datasets. This system consists of transcriptions of non-manual and manual features that describe the shape, orientation, position, and hand movement of a given sign.
Although HamNoSys is language-agnostic, there are not many ML solutions using this notation system. First of all, the grammar of HamNoSys is difficult to understand without specialized linguistic knowledge. Moreover, using HamNoSys in its raw form in ML-based applications would be extremely difficult. A large training database is needed to perform this task, since the use of HamNoSys labels would change the classification technique from single-class to multi-class as compared to gloss labels. Furthermore, the inconsistent order of symbols in HamNoSys annotations across different datasets makes it almost impossible to process the labels properly.
To address the problem described above, we created a solution for automatic processing of SL corpora that transforms HamNoSys labels into numerical labels for carefully defined classes describing the initial body and hand positions. The numerical labels created this way would support the training of language-agnostic ML models, since such multilabel annotations are easier to interpret for users unfamiliar with HamNoSys. Moreover, the proposed solution allows further simplification of the annotation structure, which may make it easier to develop a suitable model. The presented solution is unique of its kind: a publicly available tool that allows arbitrary selection of appropriate classes to describe a specific part of a given gesture.

2. Related Works
In recent years, work around HamNoSys has focused on creating large multilingual lexicons and simplifying the labeling procedure. Since the grammar of the notation system is complex and known only to sign language experts, the main effort has been in annotation processing (Jakub and Zdeněk, 2008; Power et al., 2019; Skobov and Lepage, 2020). Although not all available SL corpora are annotated using HamNoSys, the unified annotation system has attracted the attention of many researchers (Koller et al., 2016; Dhanjal and Singh, 2022) because it can greatly simplify multilingual research.
2.1. HamNoSys-annotated SL Corpora
Deep learning (DL) based approaches are a promising method for sign language translation. However, they need a large amount of adequately labeled training data to perform well. As HamNoSys annotations are not language-dependent, they can facilitate cross-language research. Moreover, merging several multilingual corpora will increase the available data resources. A comprehensive overview of the existing corpora supplied with HamNoSys labels is provided below.

GLEX: The Fachgebärden Lexicon – Gesundheit und Pflege (Hanke et al., 2018) was developed at the German Sign Language Institute of the University of Hamburg between 2004 and 2007 and contains a total of 2330 signs. This dataset is dedicated to technical terms related to health and nursing care.

GALEX: The Fachgebärden Lexicon – Gärtnerei und Landschaftsbau (Hanke et al., 2018) was also created at the Hamburg Institute of German Sign Language, between 2006 and 2009. This dataset consists of 710 signs related to technical terms for landscaping and gardening.

BL: The Basic Lexicon is part of the multilingual (British Sign Language, German Sign Language, Greek Sign Language, and French Sign Language) three-year research project DICTA-SIGN (Matthes et al., 2012), carried out since 2009 by a consortium of European universities (Institute for Language and Speech Processing, Universität Hamburg, University of East Anglia, University of Surrey, Laboratoire d'informatique pour la mécanique et les sciences de l'ingénieur, Université Paul Sabatier, National Technical University of Athens, and WebSourd). About 1k signs are provided for each SL. The shared list of glosses covers the topic of traveling.

CDPSL: The Corpus-based Dictionary of Polish Sign Language (PJM) (Łacheta et al., 2016) was created in 2016 at the Sign Linguistics Laboratory of the University of Warsaw. The dictionary was developed based on the PJM corpus, which collected video data of 150 deaf signers who use PJM. CDPSL allows documenting and describing authentic everyday use of PJM.

GSLL: The Greek Sign Language Lemmas dataset (Kratimenos et al., 2021; Theodorakis et al., 2014; Efthimiou et al., 2009) was developed by the National Technical University of Athens and supported by the EU research project DICTA-SIGN. The dataset is dedicated to isolated SL recognition and contains 347 glosses signed by two participants, where each sign is repeated between 5 and 17 times.

2.2. HamNoSys Processing Tools
In the first decade of the 21st century, Kanis et al. (Jakub and Zdeněk, 2008) proposed a HamNoSys editor called an automatic signed speech synthesizer. This tool is able to generate a sign animation based on the input HamNoSys label. The created application can accept the input label in two modes: one designed for direct insertion of HamNoSys symbols and the other with a more intuitive graphical interface. The main goal of this solution was to allow an inexperienced annotator to correctly annotate signs. Unfortunately, it has several limitations, as it contains no more than 300 signs in Czech Sign Language and is based on a domain-specific lexicon.
Power et al. (2019) conducted a historical sign language analysis based on 284 signs from several sign languages (American Sign Language, Flemish Sign Language, Mexican Sign Language, French Sign Language, French Belgian Sign Language, and Brazilian Sign Language) annotated with HamNoSys symbols. They developed an open-source Python library to facilitate statistical analysis and manipulation of HamNoSys annotations. As a result, they created the first publicly available tool capable of parsing HamNoSys annotations. However, the proposed parser can only work with sign language data if the HamNoSys labels satisfy strict assumptions: the symbols must appear in the gloss notation in a specific order and be separated by spaces. Consequently, the tool cannot be used on already existing SL corpora without initial preprocessing.
In 2020, an automatic video-to-HamNoSys annotation system based on the HamNoSys grammar structure was presented (Skobov and Lepage, 2020). The proposed solution is based on virtual avatar animations created using the JASigning platform (https://vh.cmp.uea.ac.uk/index.php/JASigning). This approach was able to generate a HamNoSys label based on a given sign animation. However, Skobov and Lepage claimed that the proposed methodology, with some modifications, can also be used to obtain better results on real data.

2.3. ML-based approaches using HamNoSys
There is a visibly growing interest in sign language recognition from video recordings using ML methods. Some of these works use HamNoSys labels in a reduced form, as access to large databases is limited. The most popular approaches use classification networks.
In 2016, Koller et al. (2016) used the part of the HamNoSys annotations that describes the hand orientation modality. They trained a classifier on isolated signs in Swiss-German and Danish Sign Language and applied it to the continuous sign language recognition task on the RWTH-PHOENIX-Weather corpus (Forster et al., 2012) containing German Sign Language. This multilingual approach significantly reduced word errors.
Recently, Hidden Markov Models were also used to directly translate multilingual speech into Indian Sign Language (Dhanjal and Singh, 2022). In this approach, HamNoSys annotations provided an intermediate step between spoken and sign language, so an additional conversion of speech into human-readable text was not needed. Despite the successful creation of an Indian speech recognition system, it was pointed out that building an optimized model requires a sufficient amount of data in a properly transcribed format. This limitation has yet to be overcome.

3. Hamburg Sign Language Notation System
The Hamburg Sign Language Notation System (HamNoSys) is a phonetic transcription system that has been in widespread use for more than 20 years (Hanke et al., 2018). HamNoSys does not refer to different national finger alphabets and can therefore be used internationally. It can be divided into six basic blocks, as presented in Fig. 1 (upper panel). The first two of the six blocks – symmetry operator and non-manual features – are optional. The remaining four components – handshape, hand position, hand location, and movement – are mandatory (Smith, 2013).

Figure 1: Detailed HamNoSys structure.

Each HamNoSys part mentioned above has the following meaning (Hanke, 2004):

• Symmetry operator (optional): denotes two-handed signs and determines how attributes should be mirrored to the non-dominant hand unless otherwise specified.
• Non-manual features (optional): represent non-manual features (e.g., puffed or sucked-in cheeks) that can be used to describe a given sign.
• Handshape: refers to the handshape description, which is composed of three subblocks – Base form, Thumb position, and Bending.
• Hand position/orientation: describes the orientation of the hand using two subblocks – Extended finger direction, which specifies two degrees of freedom seen from the signer's, bird's, or from-the-right view, and Palm orientation, a third degree of freedom defined relative to the extended finger direction.
• Hand location: is split into three components – Location left/right specifies the x coordinate, Location top/bottom the y coordinate, and Distance (which is skipped if natural) the z coordinate.
• Movement/Action: represents a combination of path movements that can be specified as targeted/absolute (location) or relative (direction and size) movements.

4. Decision Tree-based HamNoSys Parser
The main goal of the parser is to translate a label representing a SL gloss, written in HamNoSys format, into a form that can be used for DL-based classification. Since the structure of the HamNoSys grammar can be described as a decision tree (Skobov and Lepage, 2020), we used this method to decompose the notation into numerical multilabels for the defined classes. The parser logic implements rules for sequentially structured data to analyze a series of symbols. It matches each symbol with the class that describes it (while assigning it the appropriate number) or removes it. Fig. 2 shows a diagram of how the parser works.

Figure 2: A schema of action of the implemented parser.
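For illustration, the general idea of such a decision-tree pass over a HamNoSys symbol sequence can be sketched in a few lines of Python. The symbol tables, class order, and numeric codes below are invented placeholders, not the dictionaries shipped with parse-hamnosys; the sketch only shows how each symbol is matched against the next expected class, with NaN used for absent optional classes and a negative value for missing mandatory ones.

# Illustrative sketch only: symbol tables and class order are placeholders,
# not the real dictionaries used by parse-hamnosys.
SYMMETRY_SYMBOLS = {"SYM_MIRROR": 1, "SYM_PARALLEL": 2}    # hypothetical symbols
BASE_FORM_SYMBOLS = {"HAND_FLAT": 0, "HAND_FIST": 1}       # hypothetical symbols

CLASS_ORDER = [
    ("Symmetry operator", SYMMETRY_SYMBOLS, True),     # optional class
    ("Handshape base form", BASE_FORM_SYMBOLS, False),  # mandatory class
]

def parse(symbols):
    """Walk the symbol sequence once, assigning each symbol to the next
    expected class, NaN to absent optional classes, and a negative value
    when a mandatory class is missing."""
    labels, position = {}, 0
    for name, table, optional in CLASS_ORDER:
        if position < len(symbols) and symbols[position] in table:
            labels[name] = table[symbols[position]]
            position += 1
        elif optional:
            labels[name] = float("nan")
        else:
            labels[name] = -1
    return labels

print(parse(["SYM_MIRROR", "HAND_FIST"]))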
4.1. Methodology
As described in Section 3, HamNoSys labels can be represented by blocks. In our implementation, four blocks (symmetry operator, location left/right, location top/bottom, and distance from the body) refer to the overall human posture, while five blocks (handshape base form, handshape thumb position, handshape bending, hand position extended finger direction, and hand position palm orientation) relate to a single hand.
Furthermore, two hands (dominant and non-dominant) can be involved in each sign, and up to two HamNoSys symbols describing a single class can be assigned for each hand. To properly store all symbols, the classes related to a single hand are repeated four times: as the primary and secondary description of the dominant hand and as the primary and secondary description of the non-dominant hand. Moreover, we added one extra class that indicates whether the sign description starts from a relaxed hand sign. As a result, together with the classes describing the overall human posture, the parser considers 25 classes when analyzing HamNoSys labels. Fig. 3 presents the numerical values and the HamNoSys symbols assigned to them.
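To make this layout concrete, the snippet below builds one possible set of the 25 column names from the blocks listed above. The naming is illustrative; the actual column names used by parse-hamnosys may differ.

# Illustrative construction of the 25 parser classes described above
# (1 NonDom-first flag + 4 posture classes + 2 hands x 2 descriptions x 5 features).
posture_classes = [
    "Symmetry operator",
    "Hand location L/R",
    "Hand location T/B",
    "Hand location distance",
]
hand_features = [
    "Handshape base form",
    "Handshape thumb position",
    "Handshape bending",
    "Extended finger direction",
    "Palm orientation",
]
hand_classes = [
    f"{hand} {order} {feature}"
    for hand in ("Dominant", "Non-dominant")
    for order in ("primary", "secondary")
    for feature in hand_features
]
all_classes = ["NonDom first"] + posture_classes + hand_classes
assert len(all_classes) == 25
print(all_classes)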
The Symmetry operator class describes how the description of the dominant hand maps to the non-dominant hand, using nine numbers. The value of the NonDom first class equals 1 if the sign description starts from a relaxed hand sign and is set to 0 otherwise.
The following 20 classes contain primary and secondary descriptions of five features assigned separately to the dominant and non-dominant hands. Secondary classes are assigned if the \ operator is used. For example, if the construct A\B is used, symbol A will be assigned to the primary class and symbol B to the secondary class.
The Handshape base form class describes the base form of a handshape in the initial posture. Its value can be in the range from 0 to 11. The class symbol is expected right after the symmetry operator. Only three symbols can occur in between: the relaxed hand symbol, ˜, or [. The first one, indicating a relaxed hand, will be assigned a numeric label, and the other two will be omitted. If this symbol is not found in the expected place, an error will occur.
The Handshape thumb position class describes the position of the thumb in the initial posture and has four possible values. This class symbol can only be used right after the handshape base form symbol or the bending symbol; otherwise, an error will occur. The Handshape bending class describes the hand bending in the initial posture. Its value is in the range 0 to 5, and the symbol can be used only right after the handshape base form symbol or the handshape thumb position symbol; otherwise, an error will occur. Fig. 3 presents the thumb position and bending in combination with the handshape base form icon to increase readability.

Figure 3: Parser classes values.

The Hand position extended finger direction class specifies two degrees of freedom. Its value can be in the range from 0 to 17. This class must occur at least once in the HamNoSys label; otherwise, an error will occur, since the class is mandatory. The Hand position palm orientation class is in the range 0 to 7 and specifies the third degree of freedom. Like the previous one, this class is mandatory: the symbol must be found in the HamNoSys label at least once, and if not, an error will occur.
The three remaining classes, Hand location L/R (Left/Right), Hand location T/B (Top/Bottom), and Hand location distance, describe the hand location using the x, y, and z coordinates, respectively. The first of them is in the range 0 to 4, the second one uses 37 symbols, and the last class takes values between 0 and 5. For the Hand location L/R class, the symbol position is analyzed in relation to the Hand location T/B symbol (see Fig. 3).
As previously described, there are three mandatory symbol classes the parser must find in the HamNoSys label: handshape base form, hand position extended finger direction, and hand position palm orientation. If any of those classes are not found, an error will occur.
As suggested in the sources (Hanke, 2004), if the location information is missing, the default neutral values are assigned automatically, meaning 0 for distance, 0 for left/right, and 14 for top/bottom. All classes not found in the HamNoSys label are marked as NaN to distinguish them easily. An error when parsing a particular class is indicated by a negative number assigned to the class.
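The primary/secondary convention for the \ operator can be illustrated with a short helper. This is a sketch of the idea only; parse-hamnosys may implement the split differently.

# Illustrative handling of the A\B primary/secondary construct described above.
def split_primary_secondary(symbol: str):
    """Return (primary, secondary) for a class description such as 'A\\B'.
    If no backslash operator is present, the secondary value is None."""
    primary, sep, secondary = symbol.partition("\\")
    return primary, (secondary if sep else None)

print(split_primary_secondary("A\\B"))  # ('A', 'B')
print(split_primary_secondary("A"))     # ('A', None)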
4.2. Example usage
The open-source code is implemented in the Python programming language, which makes it flexible and user-friendly and provides the opportunity to incorporate commonly used Python libraries such as Pandas.
The package consists of two main scripts: parse-hamnosys, which is the main driver, and hamnosys_dicts, which is the dictionary file. The parse-hamnosys script can be used to convert a HamNoSys encoding to numerical labels. It requires arguments indicating the source file containing the HamNoSys notation and the destination files that will separately contain the successfully and unsuccessfully parsed results. Moreover, there are two optional arguments that specify the names of columns in the input and output text files.
Listing 1 gives an example of the command to use the parser with the parameters mentioned above. As a result, two files, result.txt and error.txt, will be generated. The first of them contains correctly parsed glosses with their classes. The other one contains glosses that were not parsed properly.

Listing 1: Basic example of calling the parse-hamnosys script.

python3 parse-hamnosys.py --src_file source.txt \
                          --dst_file result.txt \
                          --err_file error.txt
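Because the output is a plain text table with one column per parser class, it can be inspected directly with Pandas. The snippet below is a minimal sketch and assumes a tab-separated layout; the exact separator and column names written by parse-hamnosys may differ.

# Minimal sketch of inspecting the parser output with Pandas.
# Assumes result.txt is a tab-separated table with one column per class;
# the exact output format of parse-hamnosys may differ.
import pandas as pd

results = pd.read_csv("result.txt", sep="\t")
print(results.head())

# Classes absent from a label are stored as NaN, so missing-value
# counts per class can be obtained directly:
print(results.isna().sum())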
Table 1 presents a few examples from two different datasets. Example 1 stands for the word EΠIΣKEΥAZΩ and comes from the GSLL dataset. The second example comes from the Basic Lexicon dataset and stands for the word know.

Table 1: Example results for two separate words – EΠIΣKEΥAZΩ (Ex. 1) and know (Ex. 2) – from two databases.

Class                                    Ex. 1   Ex. 2
Symmetry                                 7       0
NonDom first                             0       0
Dom. 1    Shape     Base form            0       0
                    Thumb                0       1
                    Bending              0       0
          Position  Ext. finger dir.     0       0
                    Palm                 5       4
Dom. 2    Shape     Base form            2       NaN
                    Thumb                0       NaN
                    Bending              4       NaN
          Position  Ext. finger dir.     NaN     7
                    Palm                 NaN     2
Ndom. 1   Shape     Base form            0       NaN
                    Thumb                0       NaN
                    Bending              0       NaN
          Position  Ext. finger dir.     1       NaN
                    Palm                 6       NaN
Ndom. 2   Shape     Base form            2       NaN
                    Thumb                0       NaN
                    Bending              4       NaN
          Position  Ext. finger dir.     NaN     NaN
                    Palm                 NaN     NaN
Location            x                    0       4
                    y                    13      3
                    z                    0       0

4.3. Error handling
The decision tree-based parser implementation predefines the possible HamNoSys label formats. If the parser does not recognize the order of symbols in a HamNoSys label, an error is announced by filling the given column with a negative value.
To analyze the parser effectiveness ηp, we used the open part of each database mentioned in Section 2.1. The gathered collection of datasets consists of around ten hours of videos accompanied by 11831 glosses, of which 7095 are unique.
Table 2 presents the number of successfully parsed entries for each dataset analyzed. The # glosses column represents the number of all glosses passed to the parser as input. The # parsed column contains the number of correctly parsed glosses, while the # errors column shows the number of HamNoSys labels in which the order of symbols was not properly recognized. The parser effectiveness ηp was calculated as the percentage of entries that were correctly decomposed and parsed by the parser.

Table 2: Number of successfully parsed entries for each dataset.

Dataset name   # glosses   # parsed   # errors   ηp (%)
GALEX          568         561        7          98.77
GLEX           829         778        51         93.85
CDPSL          2835        2828       7          99.75
BL             4123        3907       216        94.76
GSLL           3476        3316       160        95.40
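As a quick sanity check of this definition, the calculation below recomputes ηp for the GALEX row of Table 2, using only the values from the table itself.

# Recomputing the parser effectiveness for the GALEX row of Table 2:
# eta_p = correctly parsed entries / all glosses * 100.
parsed, glosses = 561, 568
eta_p = 100 * parsed / glosses
print(f"{eta_p:.2f}%")  # 98.77%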
5. Data reduction influence analysis
As a final analysis, we performed backward decoding from the assigned numeric multilabels to the proper glosses. In this study, we evaluated the impact of the information reduction caused by the proposed HamNoSys label encoding methodology.
We assumed that the decoding process was successful if the parser assigned one HamNoSys label to each tested gloss. If it assigned more than one, the verification was considered unsuccessful. The results were verified separately for each database due to their diversity and the significant differences in the amount of data. The lowest decoding efficiency ηd was observed for the Basic Lexicon, which is a set that consists of four SLs. For this database, the number of glosses assigned to a single HamNoSys label (and thus misidentified) reached the highest value of 12.
Table 3 summarizes the influence of data reduction on each dataset. The # unique glosses column indicates the total number of distinctive glosses processed by the parser and assigned to HamNoSys labels. The # singly labelled and # repeated columns show the number of correctly and incorrectly decoded glosses, respectively. Finally, the high decoding efficiency ηd proves that the parser is able to correctly recognize an individual gloss from different SLs; the data reduction has no significant negative impact on this process.

Table 3: A summary of the impact of data reduction on individual datasets.

Dataset name   # unique glosses   # singly labelled   # repeated   ηd (%)
GALEX          514                484                 30           94.16
GLEX           723                684                 39           94.61
CDPSL          2480               2259                221          91.09
BL             3078               2580                498          83.82
GSLL           300                283                 17           94.33
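The decoding check itself reduces to a grouping operation over the numeric multilabels. The sketch below assumes a Pandas DataFrame with one row per unique gloss, a gloss column, and one column per parser class; this layout is an assumption for illustration, not the exact parse-hamnosys output format.

# Sketch of the backward-decoding check: a gloss counts as singly labelled
# if no other gloss shares its reduced numeric multilabel.
import pandas as pd

def decoding_efficiency(results: pd.DataFrame) -> float:
    """Return eta_d: the percentage of unique glosses whose multilabel
    is not shared by any other gloss (the "# singly labelled" count)."""
    class_cols = [c for c in results.columns if c != "gloss"]
    glosses_per_label = results.groupby(class_cols, dropna=False)["gloss"].nunique()
    singly_labelled = int((glosses_per_label == 1).sum())
    return 100 * singly_labelled / results["gloss"].nunique()

# For GALEX this should reproduce Table 3: 100 * 484 / 514 = 94.16 (approximately).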
6. Conclusion
Nowadays, several lexical corpus collections include HamNoSys annotations. However, the way signs are annotated is not standardized. We decided to implement the HamNoSys parser as universally as possible to leverage and combine existing annotation efforts from different corpora.
The proposed parser reduces the amount of data stored in the original HamNoSys characters, since it omits some information, such as movement (analyzing only the initial gloss position) or finger-related details. Nevertheless, the essential characteristics of the sign are preserved. We have also proved that backward decoding of the gloss is possible with an efficiency above 80%. We believe that the developed tool will contribute to future research efforts aimed at creating a fully functional sign language-agnostic translator.

7. Acknowledgements
The five-month non-profit educational project HearAI was organized by Natalia Czerep, Sylwia Majchrowska, Agnieszka Mikołajczyk, Milena Olech, Marta Plantykow, and Żaneta Lucka Tomczyk. The main goals were to work on sign language recognition using HamNoSys and to increase public awareness of the Deaf community. The authors acknowledge the other team members who contributed to the project: Natalia Czerep, Maria Ferlin, Wiktor Filipiuk, Grzegorz Goryl, Agnieszka Kamińska, Alicja Krzemińska, Adrian Lachowicz, Agnieszka Mikołajczyk, Patryk Radzki, Krzysztof Bork-Ceszlak, Alicja Kwasniewska, Karol Majek, Jakub Nalepa, and Marek Sowa. The authors acknowledge the infrastructure and support of Voicelab.ai in Gdańsk.

8. Bibliographical References
Dhanjal, A. S. and Singh, W. (2022). An optimized machine translation technique for multilingual speech to sign language notation. Multimedia Tools and Applications.
Jakub, K. and Zdeněk, K. (2008). Interactive HamNoSys notation editor for signed speech annotation. In Proceedings of the 3rd Workshop on the Representation and Processing of Signed Languages: Construction and Exploitation of Sign Language Corpora, 6.
Koller, O., Ney, H., and Bowden, R. (2016). Automatic alignment of HamNoSys subunits for continuous sign language recognition. In Proceedings of the LREC 2016 10th edition of the Language Resources and Evaluation Conference, 05.
Power, J., Quinto-Pozos, D., and Law, D. (2019). Can the comparative method be used for signed language historical analyses?, 09.
Skobov, V. and Lepage, Y. (2020). Video-to-HamNoSys automated annotation system. In Proceedings of the LREC 2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pages 209–216, Marseille, France, May. European Language Resources Association (ELRA).

9. Language Resource References
Efthimiou, E., Fotinea, S.-E., Vogler, C., Hanke, T., Glauert, J., Bowden, R., Braffort, A., Collet, C., Maragos, P., and Segouat, J. (2009). Sign Language Recognition, Generation, and Modelling: A Research Effort with Applications in Deaf Communication. In Proc. 5th Conf. on Universal Access in Human-Computer Interaction (jointly with HCI International 2009), Springer LNCS vol. 5614, pp. 21–30.
Forster, J., Schmidt, C., Hoyoux, T., Koller, O., Zelle, U., Piater, J., and Ney, H. (2012). RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus. European Language Resources Association (ELRA).
Hanke, T. (2004). HamNoSys – Representing Sign Language Data in Language Resources and Language Processing Contexts.
Hanke, T., König, S., Konrad, R., Langer, G., Barbeito Rey-Geißler, P., Blanck, D., Goldschmidt, S., Hofmann, I., Hong, S.-E., Jeziorski, O., Kleyboldt, T., König, L., Matthes, S., Nishio, R., Rathmann, C., Salden, U., Wagner, S., and Worseck, S. (2018). MEINE DGS. Öffentliches Korpus der Deutschen Gebärdensprache, 1. Release. Universität Hamburg.
Kratimenos, A., Pavlakos, G., and Maragos, P. (2021). Independent Sign Language Recognition with 3D Body, Hands, and Face Reconstruction. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
Łacheta, J., Czajkowska-Kisil, M., Linde-Usiekniewicz, J., and Rutkowski, P. (2016). Corpus-based Dictionary of Polish Sign Language. Warsaw: Faculty of Polish Studies, University of Warsaw.
Matthes, S., Hanke, T., Regen, A., Storz, J., Worseck, S., Efthimiou, E., Dimou, A.-L., Braffort, A., Glauert, J., and Safar, E. (2012). Dicta-Sign – Building a Multilingual Sign Language Corpus.
Smith, R. (2013). HamNoSys 4.0 User Guide.
Theodorakis, S., Pitsikalis, V., and Maragos, P. (2014). Dynamic-Static Unsupervised Sequentiality, Statistical Subunits and Lexicon for Sign Language Recognition. Image and Vision Computing, vol. 32, no. 8, pp. 533–549.