Lecture Notes in Artificial Intelligence 5190 Edited by R. Goebel, J. Siekmann, and W. Wahlster Subseries of Lecture Notes in Computer Science
António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma (Eds.)
Computational Processing of the Portuguese Language
8th International Conference, PROPOR 2008
Aveiro, Portugal, September 8-10, 2008
Proceedings
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
António Teixeira
Universidade de Aveiro, Dep. de Electrónica, Telecomunicações e Informática, and Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA)
3810-193 Aveiro, Portugal
E-mail: ajst@ua.pt

Vera Lúcia Strube de Lima
Pontifícia Universidade Católica do Rio Grande do Sul, Faculdade de Informática, Grupo PLN
90619-900 Porto Alegre, RS, Brazil
E-mail: vera.strube@pucrs.br

Luís Caldas de Oliveira
Universidade Técnica de Lisboa, and INESC-ID, L2F
1000 Lisboa, Portugal
E-mail: lco@inesc-id.pt

Paulo Quaresma
Universidade de Évora, Departamento de Informática
7000-671 Évora, Portugal
E-mail: pq@di.uevora.pt

Library of Congress Control Number: 2008933855
CR Subject Classification (1998): H.3.1, H.5.2, I.2.1, I.2.7
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-540-85979-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-85979-6 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2008 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12513574 06/3180 543210
Preface
The International Conference on Computational Processing of Portuguese, formerly the Workshop on Computational Processing of the Portuguese Language (PROPOR), is the main event in the area of Natural Language Processing that focuses on Portuguese and on the theoretical and technological issues related to this specific language. The meeting has been a very rich forum for the exchange of ideas and for partnerships among the research communities dedicated to the automated processing of the Portuguese language.

This year's PROPOR, the first one to adopt the International Conference label, followed workshops held in Lisbon, Portugal (1993), Curitiba, Brazil (1996), Porto Alegre, Brazil (1998), Évora, Portugal (1999), Atibaia, Brazil (2000), Faro, Portugal (2003) and Itatiaia, Brazil (2006).

The constitution of a steering committee (PROPOR Committee) and an international program committee, the adoption of high-standard refereeing procedures, and the support of the prestigious ACL and ISCA international associations demonstrate the steady development of the field and of its scientific community.

A total of 63 papers were submitted to PROPOR 2008. Each submitted paper received a careful, triple-blind review by members of the program committee or by additional reviewers. All those who contributed are listed on the following pages. The reviewing process led to the selection of 21 regular papers for oral presentation and 16 short papers for poster sessions.

The conference and this book were structured around the following main topics: Speech Analysis; Ontologies, Semantics and Anaphora Resolution; Speech Synthesis; Machine Learning Applied to Natural Language Processing; Speech Recognition and Applications; and Natural Language Processing Tools and Applications. Short papers and the related posters were organized according to the two main areas of PROPOR: Natural Language Processing and Speech Technology.
This year's PROPOR had two important novelties: the two main areas of the conference were more equally represented, and a special session was dedicated to Applications of Portuguese Speech and Language Technologies. The special session, promoted by the Microsoft Language Development Center (MLDC), gave university and industrial communities working on Portuguese natural language processing and speech technology an opportunity to report their most recent products, systems, resources and tools for Portuguese. Two satellite events were also organized in association with PROPOR: the Second HAREM Workshop on Named Entity Recognition in Portuguese, and the workshop “Ten years of Linguateca”.

We would like to thank all members of our technical program committee and the additional reviewers listed on the following pages. We are especially grateful to our invited speakers, Tanja Schultz (University of Karlsruhe and CMU) and Chris Quirk (Microsoft), for their invaluable
contribution, which undoubtedly increased both the interest in the conference and its quality. We are indebted to the PROPOR 2008 secretary, Anabela Viegas, for all her support.

We would like to publicly acknowledge the institutions and companies without which this conference would not have been possible: Universidade de Aveiro, Institute of Electronics and Telematics Engineering of Aveiro (IEETA), Association for Computational Linguistics (ACL), International Speech Communication Association (ISCA), ISCA Special Interest Group on Iberian Languages (SIG-IL), Fundação para a Ciência e a Tecnologia (FCT), Microsoft, Springer, !UZ Technologies, DESIGNEED and Grande Hotel da Curia.

June 2008
António Teixeira
Vera Lúcia Strube de Lima
Luís Caldas de Oliveira
Paulo Quaresma
Organization
Conference Chair
António Teixeira, DETI/IEETA, Universidade de Aveiro, Portugal
Program Co-chairs
Vera Lúcia Strube de Lima, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Luís Caldas de Oliveira, L2F/INESC-ID, IST, Portugal
Publication Chair
Paulo Quaresma, Universidade de Évora, Portugal
Program Committee
Alexandre Agustini, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Sandra Aluisio, Universidade de São Paulo, Brazil
Amália Andrade, CLUL, Universidade de Lisboa, Portugal
Jorge Baptista, Universidade do Algarve, Portugal
Plínio Barbosa, Universidade Estadual de Campinas, Brazil
Dante Barone, Universidade Federal do Rio Grande do Sul, Brazil
Steven Bird, University of Melbourne, Australia
Antonio Bonafonte, Universitat Politècnica de Catalunya, Spain
António Branco, Universidade de Lisboa, Portugal
Luís Caldas de Oliveira, INESC-ID/IST, Portugal
Nick Campbell, NiCT/ATR, Japan
Diamantino Caseiro, INESC-ID, Portugal
Berthold Crysmann, Bonn University, Germany
Gaël Dias, Universidade da Beira Interior, Portugal
Bento Dias da Silva, Universidade Estadual Paulista, Brazil
Marcelo Finger, IME-USP, Brazil
Diamantino Freitas, Faculdade de Engenharia, Universidade do Porto, Portugal
Pablo Gamallo, Universidade de Santiago de Compostela, Spain
Caroline Hagège, Xerox Research Centre Europe, France
Julia Hirschberg, Columbia University, USA
Isabel Hub Faria, Universidade de Lisboa, Portugal
Tracy Holloway King, Palo Alto Research Center, USA
Eric Laporte, Université Paris-Est Marne-la-Vallée, France
Gabriel Lopes, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal
Saturnino Luz, Trinity College Dublin, Ireland
Lúcia Machado Rino, Dep. de Computação, Universidade Federal de São Carlos, Brazil
Sandra Madureira, Pontifícia Universidade Católica de São Paulo, Brazil
Belinda Maia, Faculdade de Letras, Universidade do Porto, Portugal
Ranniery Maia, ATR Spoken Language Communication Labs, Japan
Nuno Mamede, INESC-ID/IST, Portugal
Jean-Luc Minel, MoDyCo, CNRS, France
Climent Nadeu, Universitat Politècnica de Catalunya, Spain
João Neto, INESC-ID/IST, Portugal
Viviane Moreira Orengo, Universidade Federal do Rio Grande do Sul, Brazil
Manuel Palomar, Universidad de Alicante, Spain
Fernando Perdigão, Universidade de Coimbra, Portugal
Carlos Prolo, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Paulo Quaresma, Universidade de Évora, Portugal
Violeta Quental, Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Elisabete Ranchhod, Universidade de Lisboa, Portugal
Fernando Gil Resende Jr., Universidade Federal do Rio de Janeiro, Brazil
António Ribeiro, IPSC, Italy
Irene Rodrigues, Departamento de Informática, Universidade de Évora, Portugal
Solange Rossato, University of Grenoble 3, France
Diana Santos, SINTEF, Norway
Luís Seabra Lopes, DETI/IEETA, Universidade de Aveiro, Portugal
António Serralheiro, INESC-ID and Academia Militar, Portugal
Vera Strube de Lima, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
António Teixeira, DETI/IEETA, Universidade de Aveiro, Portugal
Ana Maria Tramunt Ibaños, Pontifícia Universidade Católica do Rio Grande do Sul, Brazil
Isabel Trancoso, INESC-ID/IST, Portugal
João Veloso, Universidade do Porto, Portugal
Renata Vieira, UNISINOS, Brazil
Aline Villavicencio, Universidade Federal do Rio Grande do Sul, Brazil
Fábio Violaro, Universidade Estadual de Campinas, Brazil
Maria das Graças Volpe Nunes, Universidade de São Paulo, Brazil
Dina Wonsever, Universidad de la República, Uruguay
Nestor Yoma, Universidad de Chile, Chile
Additional Reviewers
Petra Wagner, Bonn University, Germany
Luísa Coheur, INESC-ID, Portugal
José Adrián Rodríguez Fonollosa, Universitat Politècnica de Catalunya, Spain
Thiago Pardo, Universidade de São Paulo, Brazil
Table of Contents
Speech Analysis
Event Detection by HMM, SVM and ANN: A Comparative Study (p. 1)
Carla Lopes and Fernando Perdigão
Frication and Voicing Classification (p. 11)
Luis M.T. Jesus and Philip J.B. Jackson
A Spoken Dialog System Speech Interface Based on a Microphone Array (p. 21)
Gustavo Esteves Coelho, António Joaquim Serralheiro, and João Paulo Neto
Ontologies, Semantics and Anaphora Resolution
PAPEL: A Dictionary-Based Lexical Ontology for Portuguese (p. 31)
Hugo Gonçalo Oliveira, Diana Santos, Paulo Gomes, and Nuno Seco
Comparing Window and Syntax Based Strategies for Semantic Extraction (p. 41)
Pablo Gamallo Otero
The Mitkov Algorithm for Anaphora Resolution in Portuguese (p. 51)
Amanda Rocha Chaves and Lucia Helena Machado Rino
Semantic Similarity, Ontologies and the Portuguese Language: A Close Look at the Subject (p. 61)
Juliano Baldez de Freitas, Vera Lúcia Strube de Lima, and Josiane Fontoura dos Anjos Brandolt
Speech Synthesis
Boundary Refining Aiming at Speech Synthesis Applications (p. 71)
Monique V. Nicodem, Sandra G. Kafka, Rui Seara Jr., and Rui Seara
Evolutionary-Based Design of a Brazilian Portuguese Recording Script for a Concatenative Synthesis System (p. 81)
Monique Vitório Nicodem, Izabel Christine Seara, Daiana dos Anjos, Rui Seara Jr., and Rui Seara
DIXI – A Generic Text-to-Speech System for European Portuguese (p. 91)
Sérgio Paulo, Luís C. Oliveira, Carlos Mendes, Luís Figueira, Renato Cassaca, Céu Viana, and Helena Moniz
European Portuguese Articulatory Based Text-to-Speech: First Results (p. 101)
António Teixeira, Catarina Oliveira, and Plínio Barbosa
Machine Learning Applied to Natural Language Processing
Statistical Machine Translation of Broadcast News from Spanish to Portuguese (p. 112)
Raquel Sánchez Martínez, João Paulo da Silva Neto, and Diamantino António Caseiro
Combining Multiple Features for Automatic Text Summarization through Machine Learning (p. 122)
Daniel Saraiva Leite and Lucia Helena Machado Rino
Some Experiments on Clustering Similar Sentences of Texts in Portuguese (p. 133)
Eloize Rossi Marques Seno and Maria das Graças Volpe Nunes
Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning (p. 143)
Cícero Nogueira dos Santos, Ruy L. Milidiú, and Raúl P. Rentería
Learning Coreference Resolution for Portuguese Texts (p. 153)
José Guilherme C. de Souza, Patricia Nunes Gonçalves, and Renata Vieira
Speech Recognition and Applications
Domain Adaptation of a Broadcast News Transcription System for the Portuguese Parliament (p. 163)
Luís Neves, Ciro Martins, Hugo Meinedo, and João Neto
Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data (p. 172)
Alberto Abad, Hugo Meinedo, and João Neto
A Platform of Distributed Speech Recognition for the European Portuguese Language (p. 182)
João Miranda and João P. Neto
Natural Language Processing Tools and Applications
Supporting e-Learning with Language Technology for Portuguese (p. 192)
Mariana Avelãs, António Branco, Rosa Del Gaudio, and Pedro Martins
ParaMT: A Paraphraser for Machine Translation (p. 202)
Anabela Barreiro
POSTERS
Natural Language Processing
Second HAREM: New Challenges and Old Wisdom (p. 212)
Diana Santos, Cláudia Freitas, Hugo Gonçalo Oliveira, and Paula Carvalho
Floresta Sintá(c)tica: Bigger, Thicker and Easier (p. 216)
Cláudia Freitas, Paulo Rocha, and Eckhard Bick
The Identification and Description of Frozen Prepositional Phrases through a Corpus-Oriented Study (p. 220)
Milena Garrão, Violeta Quental, Nuno Caminada, and Eckhard Bick
CorrefSum: Referencial Cohesion Recovery in Extractive Summaries (p. 224)
Patrícia Nunes Gonçalves, Renata Vieira, and Lucia Helena Machado Rino
Answering Portuguese Questions (p. 228)
Luís Fernando Costa and Luís Miguel Cabral
XisQuê: An Online QA Service for Portuguese (p. 232)
António Branco, Lino Rodrigues, João Silva, and Sara Silveira
Using Semantic Prototypes for Discourse Status Classification (p. 236)
Sandra Collovini, Luiz Carlos Ribeiro Jr., Patricia Nunes Gonçalves, Vinicius Muller, and Renata Vieira
Using System Expectations to Manage User Interactions (p. 240)
Filipe M. Martins, Ana Mendes, Joana Paulo Pardal, Nuno J. Mamede, and João P. Neto
Speech and Language Processing
Adaptive Modeling and High Quality Spectral Estimation for Speech Enhancement (p. 244)
Luís Coelho and Daniela Braga
On the Voiceless Aspirated Stops in Brazilian Portuguese (p. 248)
Mariane Antero Alves, Izabel Christine Seara, Fernando Santana Pacheco, Simone Klein, and Rui Seara
Comparison of Phonetic Segmentation Tools for European Portuguese (p. 252)
Luís Figueira and Luís C. Oliveira
Spoltech and OGI-22 Baseline Systems for Speech Recognition in Brazilian Portuguese (p. 256)
Nelson Neto, Patrick Silva, Aldebaro Klautau, and Andre Adami
Development of a Speech Recognizer with the Tecnovoz Database (p. 260)
José Lopes, Cláudio Neves, Arlindo Veiga, Alexandre Maciel, Carla Lopes, Fernando Perdigão, and Luís Sá
Dynamic Language Modeling for the European Portuguese (p. 264)
Ciro Martins, António Teixeira, and João Neto
An Approach to Natural Language Equation Reading in Digital Talking Books (p. 268)
Carlos Juzarte Rolo and António Joaquim Serralheiro
Topic Segmentation in a Media Watch System (p. 272)
Rui Amaral and Isabel Trancoso
Author Index (p. 277)