Lecture Notes in Artificial Intelligence            5190
Edited by R. Goebel, J. Siekmann, and W. Wahlster

Subseries of Lecture Notes in Computer Science
António Teixeira
Vera Lúcia Strube de Lima
Luís Caldas de Oliveira
Paulo Quaresma (Eds.)

Computational
Processing of the
Portuguese Language

8th International Conference, PROPOR 2008
Aveiro, Portugal, September 8-10, 2008
Proceedings

Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
António Teixeira
Universidade de Aveiro, Dep. de Electrónica, Telecomunicações e Informática, and
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA)
3810-193 Aveiro, Portugal
E-mail: ajst@ua.pt

Vera Lúcia Strube de Lima
Pontifícia Universidade Católica do Rio Grande do Sul
Faculdade de Informática, Grupo PLN
90619-900 Porto Alegre, RS, Brazil
E-mail: vera.strube@pucrs.br

Luís Caldas de Oliveira
Universidade Técnica de Lisboa, and
INESC-ID, L2F
1000 Lisboa, Portugal
E-mail: lco@inesc-id.pt

Paulo Quaresma
Universidade de Évora, Departamento de Informática
7000-671 Évora, Portugal
E-mail: pq@di.uevora.pt

Library of Congress Control Number: 2008933855

CR Subject Classification (1998): H.3.1, H.5.2, I.2.1, I.2.7

LNCS Sublibrary: SL 7 – Artificial Intelligence

ISSN              0302-9743
ISBN-10           3-540-85979-9 Springer Berlin Heidelberg New York
ISBN-13           978-3-540-85979-6 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper    SPIN: 12513574        06/3180       543210
Preface

The International Conference on Computational Processing of Portuguese,
formerly the Workshop on Computational Processing of the Portuguese Language
– PROPOR – is the main event in the area of Natural Language Processing that
focuses on Portuguese and the theoretical and technological issues related to this
specific language. The meeting has been a very rich forum for the interchange of
ideas and partnerships for the research communities dedicated to the automated
processing of the Portuguese language.
    This year’s PROPOR, the first one to adopt the International Conference
label, followed workshops held in Lisbon, Portugal (1993), Curitiba, Brazil (1996),
Porto Alegre, Brazil (1998), Évora, Portugal (1999), Atibaia, Brazil (2000), Faro,
Portugal (2003) and Itatiaia, Brazil (2006).
    The constitution of a steering committee (PROPOR Committee) and an
international program committee, the adoption of high-standard refereeing
procedures and the support of the prestigious ACL and ISCA international
associations demonstrate the steady development of the field and of its
scientific community.
    A total of 63 papers were submitted to PROPOR 2008. Each submitted
paper received a careful, triple-blind review by members of the program
committee or by additional reviewers acting on their behalf. All those who
contributed are mentioned on the following pages. The reviewing process led to
the selection of 21 regular papers for oral presentation and 16 short papers for
poster sessions.
    The conference and this book were structured around the following main
topics: Speech Analysis; Ontologies, Semantics and Anaphora Resolution; Speech
Synthesis; Machine Learning Applied to Natural Language Processing; Speech
Recognition and Applications; and Natural Language Processing Tools and
Applications. Short papers and related posters were organized according to the
two main areas of PROPOR: Natural Language Processing and Speech Technology.
    This year’s PROPOR had two important novelties: one was the fact that
the two main areas of the conference were more equally represented and the
other was the inclusion of a special session dedicated to Applications of
Portuguese Speech and Language Technologies. The special session, promoted by
the Microsoft Language Development Center (MLDC), provided an opportunity
for university and industrial communities working on Portuguese natural
language processing and speech technology to report their most recent products,
systems, resources or tools for Portuguese. Two satellite events were also
organized in association with PROPOR: the Second HAREM Workshop, Named
Entity Recognition in Portuguese, and the workshop “Ten years of Linguateca”.
    We would like to express here our thanks to all members of our technical
program committee and additional reviewers, as listed on the following pages.
    We are especially grateful to our invited speakers, Tanja Schultz
(University of Karlsruhe and CMU) and Chris Quirk (Microsoft), for their
invaluable contribution, which undoubtedly increased the interest in the
conference and its quality.
   We are indebted to the PROPOR 2008 secretary, Anabela Viegas, for all her
support.
   We would like to publicly acknowledge the institutions and companies
without which this conference would not have been possible: Universidade de
Aveiro, Institute of Electronics and Telematics Engineering of Aveiro (IEETA),
Association for Computational Linguistics (ACL), International Speech
Communication Association (ISCA), ISCA Special Interest Group on Iberian
Language (SIG-IL), Fundação para a Ciência e a Tecnologia (FCT), Microsoft,
Springer, !UZ Technologies, DESIGNEED and Grande Hotel da Curia.

June 2008                                                      António Teixeira
                                                    Vera Lúcia Strube de Lima
                                                      Luís Caldas de Oliveira
                                                               Paulo Quaresma
Organization

Conference Chair
António Teixeira          DETI/IEETA, Universidade de Aveiro,
                             Portugal

Program Co-chairs
Vera Lúcia Strube
  de Lima                  Pontifícia Universidade Católica do Rio
                             Grande do Sul, Brazil
Luís Caldas de Oliveira    L2F/INESC-ID, IST, Portugal

Publication Chair
Paulo Quaresma             Universidade de Évora, Portugal

Program Committee
Alexandre Agustini         Pontifícia Universidade Católica do Rio
                             Grande do Sul, Brazil
Sandra Aluisio             Universidade de São Paulo, Brazil
Amália Andrade            CLUL, Universidade de Lisboa, Portugal
Jorge Baptista             Universidade do Algarve, Portugal
Plínio Barbosa             Universidade Estadual de Campinas, Brazil
Dante Barone               Universidade Federal do Rio Grande do Sul,
                             Brazil
Steven Bird                University of Melbourne, Australia
Antonio Bonafonte          Universitat Politècnica de Catalunya, Spain
António Branco             Universidade de Lisboa, Portugal
Luís Caldas de Oliveira    INESC-ID/IST, Portugal
Nick Campbell              NiCT/ATR, Japan
Diamantino Caseiro         INESC-ID, Portugal
Berthold Crysmann          Bonn University, Germany
Gaël Dias                 Universidade da Beira Interior, Portugal
Bento Dias da Silva        Universidade Estadual Paulista, Brazil
Marcelo Finger             IME-USP, Brazil
Diamantino Freitas         Faculdade de Engenharia, Universidade do
                             Porto, Portugal
Pablo Gamallo              Universidade de Santiago de Compostela,
                             Spain
Caroline Hagège       Xerox Research Centre Europe, France
Julia Hirschberg       Columbia University, USA
Isabel Hub Faria       Universidade de Lisboa, Portugal
Tracy Holloway King    Palo Alto Research Center, USA
Eric Laporte           Université Paris-Est Marne-la-Vallée, France
Gabriel Lopes          Faculdade de Ciências e Tecnologia,
                         Universidade Nova de Lisboa, Portugal
Saturnino Luz          Trinity College Dublin, Ireland
Lúcia Machado Rino    Dep. de Computação, Universidade Federal de
                         São Carlos, Brazil
Sandra Madureira       Pontifícia Universidade Católica de São Paulo,
                         Brazil
Belinda Maia           Faculdade de Letras, Universidade do Porto,
                         Portugal
Ranniery Maia          ATR Spoken Language Communication Labs,
                         Japan
Nuno Mamede            INESC-ID/IST, Portugal
Jean-Luc Minel         MoDyCo, CNRS, France
Climent Nadeu          Universitat Politècnica de Catalunya, Spain
João Neto             INESC-ID/IST, Portugal
Viviane Moreira Orengo Universidade Federal do Rio Grande do Sul,
                         Brazil
Manuel Palomar         Universidad de Alicante, Spain
Fernando Perdigão     Universidade de Coimbra, Portugal
Carlos Prolo           Pontifícia Universidade Católica do Rio
                         Grande do Sul, Brazil
Paulo Quaresma         Universidade de Évora, Portugal
Violeta Quental        Pontifícia Universidade Católica do Rio de
                         Janeiro, Brazil
Elisabete Ranchhod     Universidade de Lisboa, Portugal
Fernando Gil
   Resende Jr.         Universidade Federal do Rio de Janeiro, Brazil
António Ribeiro       IPSC, Italy
Irene Rodrigues        Departamento de Informática, Universidade
                         de Évora, Portugal
Solange Rossato        University of Grenoble 3, France
Diana Santos           SINTEF, Norway
Luís Seabra Lopes      DETI/IEETA, Universidade de Aveiro,
                         Portugal
António Serralheiro    INESC-ID and Academia Militar, Portugal
Vera Strube de Lima    Pontifícia Universidade Católica do Rio
                         Grande do Sul, Brazil
António Teixeira      DETI/IEETA, Universidade de Aveiro,
                         Portugal
Ana Maria
  Tramunt Ibaños         Pontifícia Universidade Católica do Rio
                           Grande do Sul, Brazil
Isabel Trancoso          INESC-ID/IST, Portugal
João Veloso             Universidade do Porto, Portugal
Renata Vieira            UNISINOS, Brazil
Aline Villavicencio      Universidade Federal do Rio Grande do Sul,
                           Brazil
Fábio Violaro           Universidade Estadual de Campinas, Brazil
Maria das
   Graças Volpe Nunes   Universidade de São Paulo, Brazil
Dina Wonsever            Universidad de la Republica, Uruguay
Nestor Yoma              Universidad de Chile, Chile

Additional Reviewers
Petra Wagner             Bonn University, Germany
Luísa Coheur             INESC-ID, Portugal
José Adrián
  Rodríguez Fonollosa    Universitat Politècnica de Catalunya, Spain
Thiago Pardo             Universidade de São Paulo, Brazil
Table of Contents

Speech Analysis
Event Detection by HMM, SVM and ANN: A Comparative Study . . . 1
   Carla Lopes and Fernando Perdigão

Frication and Voicing Classification . . . 11
   Luis M.T. Jesus and Philip J.B. Jackson

A Spoken Dialog System Speech Interface Based on a Microphone Array . . . 21
   Gustavo Esteves Coelho, António Joaquim Serralheiro, and João Paulo Neto

Ontologies, Semantics and Anaphora Resolution
PAPEL: A Dictionary-Based Lexical Ontology for Portuguese . . . 31
   Hugo Gonçalo Oliveira, Diana Santos, Paulo Gomes, and Nuno Seco

Comparing Window and Syntax Based Strategies for Semantic Extraction . . . 41
   Pablo Gamallo Otero

The Mitkov Algorithm for Anaphora Resolution in Portuguese . . . 51
   Amanda Rocha Chaves and Lucia Helena Machado Rino

Semantic Similarity, Ontologies and the Portuguese Language: A Close
Look at the Subject . . . 61
   Juliano Baldez de Freitas, Vera Lúcia Strube de Lima, and
   Josiane Fontoura dos Anjos Brandolt

Speech Synthesis
Boundary Refining Aiming at Speech Synthesis Applications . . . 71
   Monique V. Nicodem, Sandra G. Kafka, Rui Seara Jr., and Rui Seara

Evolutionary-Based Design of a Brazilian Portuguese Recording Script
for a Concatenative Synthesis System . . . 81
   Monique Vitório Nicodem, Izabel Christine Seara, Daiana dos Anjos,
   Rui Seara Jr., and Rui Seara

DIXI – A Generic Text-to-Speech System for European Portuguese . . . 91
   Sérgio Paulo, Luís C. Oliveira, Carlos Mendes, Luís Figueira,
   Renato Cassaca, Céu Viana, and Helena Moniz

European Portuguese Articulatory Based Text-to-Speech: First Results . . . 101
   António Teixeira, Catarina Oliveira, and Plínio Barbosa

Machine Learning Applied to Natural Language Processing
Statistical Machine Translation of Broadcast News from Spanish to
Portuguese . . . 112
   Raquel Sánchez Martínez, João Paulo da Silva Neto, and
   Diamantino António Caseiro

Combining Multiple Features for Automatic Text Summarization
through Machine Learning . . . 122
   Daniel Saraiva Leite and Lucia Helena Machado Rino

Some Experiments on Clustering Similar Sentences of Texts in Portuguese . . . 133
   Eloize Rossi Marques Seno and Maria das Graças Volpe Nunes

Portuguese Part-of-Speech Tagging Using Entropy Guided
Transformation Learning . . . 143
   Cícero Nogueira dos Santos, Ruy L. Milidiú, and Raúl P. Rentería

Learning Coreference Resolution for Portuguese Texts . . . 153
   José Guilherme C. de Souza, Patricia Nunes Gonçalves, and Renata Vieira

Speech Recognition and Applications
Domain Adaptation of a Broadcast News Transcription System for the
Portuguese Parliament . . . 163
   Luís Neves, Ciro Martins, Hugo Meinedo, and João Neto

Automatic Classification and Transcription of Telephone Speech in
Radio Broadcast Data . . . 172
   Alberto Abad, Hugo Meinedo, and João Neto

A Platform of Distributed Speech Recognition for the European
Portuguese Language . . . 182
   João Miranda and João P. Neto

Natural Language Processing Tools and Applications
Supporting e-Learning with Language Technology for Portuguese . . . 192
   Mariana Avelãs, António Branco, Rosa Del Gaudio, and Pedro Martins

ParaMT: A Paraphraser for Machine Translation . . . 202
   Anabela Barreiro

POSTERS

Natural Language Processing
Second HAREM: New Challenges and Old Wisdom . . . 212
   Diana Santos, Cláudia Freitas, Hugo Gonçalo Oliveira, and Paula Carvalho

Floresta Sintá(c)tica: Bigger, Thicker and Easier . . . 216
   Cláudia Freitas, Paulo Rocha, and Eckhard Bick

The Identification and Description of Frozen Prepositional Phrases
through a Corpus-Oriented Study . . . 220
   Milena Garrão, Violeta Quental, Nuno Caminada, and Eckhard Bick

CorrefSum: Referencial Cohesion Recovery in Extractive Summaries . . . 224
   Patrícia Nunes Gonçalves, Renata Vieira, and Lucia Helena Machado Rino

Answering Portuguese Questions . . . 228
   Luís Fernando Costa and Luís Miguel Cabral

XisQuê: An Online QA Service for Portuguese . . . 232
   António Branco, Lino Rodrigues, João Silva, and Sara Silveira

Using Semantic Prototypes for Discourse Status Classification . . . 236
   Sandra Collovini, Luiz Carlos Ribeiro Jr., Patricia Nunes Gonçalves,
   Vinicius Muller, and Renata Vieira

Using System Expectations to Manage User Interactions . . . 240
   Filipe M. Martins, Ana Mendes, Joana Paulo Pardal,
   Nuno J. Mamede, and João P. Neto

Speech and Language Processing
Adaptive Modeling and High Quality Spectral Estimation for Speech
Enhancement . . . 244
   Luís Coelho and Daniela Braga

On the Voiceless Aspirated Stops in Brazilian Portuguese . . . 248
   Mariane Antero Alves, Izabel Christine Seara,
   Fernando Santana Pacheco, Simone Klein, and Rui Seara

Comparison of Phonetic Segmentation Tools for European Portuguese . . . 252
   Luís Figueira and Luís C. Oliveira

Spoltech and OGI-22 Baseline Systems for Speech Recognition in
Brazilian Portuguese . . . 256
   Nelson Neto, Patrick Silva, Aldebaro Klautau, and Andre Adami

Development of a Speech Recognizer with the Tecnovoz Database . . . 260
   José Lopes, Cláudio Neves, Arlindo Veiga, Alexandre Maciel,
   Carla Lopes, Fernando Perdigão, and Luís Sá

Dynamic Language Modeling for the European Portuguese . . . 264
   Ciro Martins, António Teixeira, and João Neto

An Approach to Natural Language Equation Reading in Digital Talking
Books . . . 268
   Carlos Juzarte Rolo and António Joaquim Serralheiro

Topic Segmentation in a Media Watch System . . . 272
   Rui Amaral and Isabel Trancoso

Author Index . . . 277