Data-Centric Systems and Applications
Series Editors: M.J. Carey, S. Ceri

Editorial Board: P. Bernstein, U. Dayal, C. Faloutsos, J.C. Freytag, G. Gardarin, W. Jonker, V. Krishnamurthy, M.-A. Neimat, P. Valduriez, G. Weikum, K.-Y. Whang, J. Widom

For further volumes: http://www.springer.com/series/5258
Roberto De Virgilio, Francesco Guerra, Yannis Velegrakis (Editors)

Semantic Search over the Web
Editors

Roberto De Virgilio, Department of Informatics and Automation, University Roma Tre, Rome, Italy
Francesco Guerra, University of Modena and Reggio Emilia, Modena, Italy
Yannis Velegrakis, University of Trento, Trento, Italy

ISBN 978-3-642-25007-1
ISBN 978-3-642-25008-8 (eBook)
DOI 10.1007/978-3-642-25008-8
Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: 2012943692

ACM Computing Classification: H.3, I.2

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Introduction

The Web has become the world’s largest database, and search is the main tool that enables organizations and individuals to exploit the huge amount of information it freely offers. Having a successful mechanism for finding and retrieving the information most relevant to the task at hand is therefore of major importance. Traditionally, Web search has been based on textual and structural similarity. Given the set of keywords that make up a query, the goal is to identify the documents containing all of these keywords (or as many of them as possible). Additional information, such as information from logs, references from authorities, popularity, and personalization, has been used extensively to further improve accuracy. However, one dimension that has not been captured to its full extent is semantics, that is, fully understanding the meaning of the words in a query and in a document.

Combining search and semantics gives birth to the idea of semantic search. In one sentence, semantic search is the effort to improve the accuracy of the search process by understanding the context and limiting ambiguity. The idea of the semantic Web is built on this goal and aims at making the semantics of Web content machine-understandable. To do so, a number of technologies have been introduced that allow richer modeling of Web resources, alongside annotations describing their semantics. Furthermore, the semantic Web went on to create associations between different representations of the same real-world entity. These associations are either explicitly specified or derived off-line, after which they remain static. They allow data from many different sources to be interlinked, giving birth to the so-called linked open data cloud. Nevertheless, semantics have yet to fully penetrate existing data management solutions and become an integral part of information retrieval, analysis, integration, and data exchange techniques.
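To make the traditional keyword-matching model concrete, here is a minimal Python sketch of the two behaviors described above: returning only the documents that contain all query keywords, and ranking documents by how many of the keywords they contain. The function names and the toy documents are hypothetical illustrations, not taken from any system discussed in this book; real engines add term weighting, link analysis, and the other signals mentioned above.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def conjunctive_match(query: str, docs: dict[str, str]) -> list[str]:
    """Return the ids of the documents that contain *all* query keywords."""
    keywords = tokenize(query)
    return [doc_id for doc_id, text in docs.items() if keywords <= tokenize(text)]


def rank_by_overlap(query: str, docs: dict[str, str]) -> list[tuple[str, int]]:
    """Rank documents by the number of query keywords they contain.

    Documents containing no keyword are dropped. Python's sort is stable
    (also with reverse=True), so documents with equal scores keep their
    original order.
    """
    keywords = tokenize(query)
    scored = [(doc_id, len(keywords & tokenize(text))) for doc_id, text in docs.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


# Hypothetical toy collection; "jaguar" is deliberately ambiguous.
docs = {
    "d1": "Jaguar cars are built in England",
    "d2": "The jaguar is a big cat living in the Americas",
    "d3": "Car dealers in England",
}

print(conjunctive_match("jaguar england", docs))  # ['d1']
print(rank_by_overlap("jaguar england", docs))    # [('d1', 2), ('d2', 1), ('d3', 1)]
```

Both functions treat "jaguar" purely as a string: the car maker in d1 and the animal in d2 are indistinguishable, which is exactly the kind of ambiguity that semantic search aims to reduce.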
Unfortunately, the general idea of semantic search has remained in its infancy. Existing solutions are either search engines that simply index semantic Web data, like Sindice, or traditional search engines enhanced with some basic form of synonym exploitation, as supported by Google and Bing. Semantic search is about using the semantics of the query terms instead of the terms themselves. This means using synonyms and related terms, providing additional material in the answer that may be related to elements already in the result, searching not only the content but also the semantic annotations of the data, exploiting ontological knowledge through advanced reasoning techniques, treating the query as a natural language expression, clustering the results, offering faceted browsing, etc.

All of the above means that there are currently numerous opportunities to exploit in the area of semantic search on the Web. In this work, we try to give a general overview of the work that has been done in the field and in related areas. However, the book should definitely not be considered a survey. It is simply intended to give the reader a taste of the many different aspects of the problem and to go into depth on some specific technologies and solutions.

The book is divided into three parts. The first part introduces the notion of the Web of Data. It describes the different types of data that exist, their topology, and techniques for storing and indexing them. It also shows how semantic links between the data can be automatically derived. The second part is dedicated specifically to Web search. It presents different kinds of search, such as exploratory and path-oriented search, alongside methods for implementing them efficiently. It discusses the problem of interactive query construction as well as the understanding of keyword query semantics. Other topics include the use of uncertainty in query answering and the exploitation of ontologies. The second part concludes with a discussion of mashup technologies and the way they are affected by semantics. The theme of the third part of the book is Linked Data and, more specifically, how recommender system ideas can be used in linked data management, alongside techniques for efficient query answering.
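The first of the semantic search techniques listed earlier in this introduction, using synonyms and related terms, is commonly realized as query expansion. The following is a minimal Python sketch under the assumption of a small hand-written thesaurus: SYNONYMS, expand, and semantic_match are hypothetical names, not an API of Sindice, Google, Bing, or any system described in this book. A document matches when, for every query keyword, it contains either the keyword itself or one of its synonyms.

```python
import re

# Hypothetical hand-written thesaurus mapping a term to its synonyms.
# A real system might derive such sets from a lexical resource or an ontology.
SYNONYMS: dict[str, set[str]] = {
    "car": {"automobile", "auto", "vehicle"},
    "film": {"movie"},
}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(keyword: str) -> set[str]:
    """Return the keyword together with its synonyms (if any)."""
    return {keyword} | SYNONYMS.get(keyword, set())


def semantic_match(query: str, docs: dict[str, str], use_synonyms: bool = True) -> list[str]:
    """Return ids of documents that contain, for every query keyword,
    the keyword itself or, if use_synonyms is True, one of its synonyms."""
    keywords = tokenize(query)
    hits = []
    for doc_id, text in docs.items():
        terms = tokenize(text)
        # A non-empty intersection means this keyword (or a synonym) occurs.
        if all((expand(k) if use_synonyms else {k}) & terms for k in keywords):
            hits.append(doc_id)
    return hits


docs = {
    "d1": "Cheap automobile rental in Rome",
    "d2": "Car sharing in Milan",
    "d3": "Bike rental in Rome",
}

print(semantic_match("car rome", docs, use_synonyms=False))  # []
print(semantic_match("car rome", docs))                      # ['d1']
```

Literal matching misses d1 because it says "automobile" rather than "car"; expansion recovers it. A real system would also need to decide which sense of a word the user means before expanding it, since adding synonyms of an ambiguous term can lower precision rather than raise it.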
Rome, Italy: Roberto De Virgilio
Modena, Italy: Francesco Guerra
Trento, Italy: Yannis Velegrakis
Contents

Part I Introduction to Web of Data

1 Topology of the Web of Data
Christian Bizer, Pablo N. Mendes, and Anja Jentzsch

2 Storing and Indexing Massive RDF Datasets
Yongming Luo, François Picalausa, George H.L. Fletcher, Jan Hidders, and Stijn Vansummeren

3 Designing Exploratory Search Applications upon Web Data Sources
Marco Brambilla and Stefano Ceri

Part II Search over the Web

4 Path-Oriented Keyword Search Query over RDF
Roberto De Virgilio, Paolo Cappellari, Antonio Maccioni, and Riccardo Torlone

5 Interactive Query Construction for Keyword Search on the Semantic Web
Gideon Zenz, Xuan Zhou, Enrico Minack, Wolf Siberski, and Wolfgang Nejdl

6 Understanding the Semantics of Keyword Queries on Relational Data Without Accessing the Instance
Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Silvia Rota, Raquel Trillo Lado, and Yannis Velegrakis

7 Keyword-Based Search over Semantic Data
Klara Weiand, Andreas Hartl, Steffen Hausmann, Tim Furche, and François Bry

8 Semantic Link Discovery over Relational Data
Oktie Hassanzadeh, Anastasios Kementsietsidis, Lipyeow Lim, Renée J. Miller, and Min Wang

9 Embracing Uncertainty in Entity Linking
Ekaterini Ioannou, Wolfgang Nejdl, Claudia Niederée, and Yannis Velegrakis

10 The Return of the Entity-Relationship Model: Ontological Query Answering
Andrea Calì, Georg Gottlob, and Andreas Pieris

11 Linked Data Services and Semantics-Enabled Mashup
Devis Bianchini and Valeria De Antonellis

Part III Linked Data Search Engines

12 A Recommender System for Linked Data
Roberto Mirizzi, Azzurra Ragone, Tommaso Di Noia, and Eugenio Di Sciascio

13 Flint: From Web Pages to Probabilistic Semantic Data
Lorenzo Blanco, Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, and Paolo Papotti

14 Searching and Browsing Linked Data with SWSE
Andreas Harth, Aidan Hogan, Jürgen Umbrich, Sheila Kinsella, Axel Polleres, and Stefan Decker

Index
Contributors

Sonia Bergamaschi, Dipartimento di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia, Modena, Italy
Devis Bianchini, Department of Electronics for Automation, University of Brescia, Brescia, Italy
Christian Bizer, Web-based Systems Group, Freie Universität Berlin, Berlin, Germany
Lorenzo Blanco, Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy
Marco Brambilla, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Mirko Bronzi, Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy
François Bry, Institute for Informatics, University of Munich, München, Germany
Andrea Calì, Department of Computer Science and Information Systems, Birkbeck University of London, London, UK
Paolo Cappellari, Interoperable System Group, Dublin City University, Dublin, Ireland
Stefano Ceri, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Valter Crescenzi, Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy
Valeria De Antonellis, Department of Electronics for Automation, University of Brescia, Brescia, Italy
Roberto De Virgilio, University Roma Tre, Rome, Italy
Tommaso Di Noia, Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari, Italy
Eugenio Di Sciascio, Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari, Italy
Stefan Decker, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Elton Domnori, Dipartimento di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia, Modena, Italy
George H.L. Fletcher, Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
Tim Furche, Department of Computer Science and Institute for the Future of Computing, Oxford University, Oxford, UK
Georg Gottlob, Computing Laboratory, University of Oxford, Oxford, UK; Oxford-Man Institute of Quantitative Finance, University of Oxford, Oxford, UK
Francesco Guerra, Dipartimento di Economia Aziendale, Università di Modena e Reggio Emilia, Modena, Italy
Andreas Harth, Karlsruhe Institute of Technology, Institute AIFB, Karlsruhe, Germany
Andreas Hartl, Institute for Informatics, University of Munich, München, Germany
Oktie Hassanzadeh, University of Toronto, Toronto, Ontario, Canada
Steffen Hausmann, Institute for Informatics, University of Munich, München, Germany
Jan Hidders, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
Aidan Hogan, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Ekaterini Ioannou, University Campus – Kounoupidiana, Technical University of Crete, Chania, Greece
Anja Jentzsch, Web-based Systems Group, Freie Universität Berlin, Berlin, Germany
Anastasios Kementsietsidis, IBM T.J. Watson Research Center, Hawthorne, NY, USA
Sheila Kinsella, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Lipyeow Lim, University of Hawaii at Manoa, Honolulu, HI, USA
Yongming Luo, Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
Antonio Maccioni, University Roma Tre, Rome, Italy
Pablo N. Mendes, Web-based Systems Group, Freie Universität Berlin, Berlin, Germany
Paolo Merialdo, Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy
Renée J. Miller, University of Toronto, Toronto, Ontario, Canada
Enrico Minack, L3S Research Center, Hannover, Germany
Roberto Mirizzi, Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari, Italy
Wolfgang Nejdl, L3S Research Center, Hannover, Germany
Claudia Niederée, L3S Research Center, Hannover, Germany
Paolo Papotti, Dipartimento di Informatica e Automazione, Università degli Studi Roma Tre, Rome, Italy
François Picalausa, Université Libre de Bruxelles, Brussels, Belgium
Andreas Pieris, Department of Computer Science, University of Oxford, Oxford, UK
Axel Polleres, Siemens AG Österreich, Vienna, Austria; Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Azzurra Ragone, Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari, Italy; Exprivia S.p.A., Molfetta, BA, Italy
Silvia Rota, Dipartimento di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia, Modena, Italy
Wolf Siberski, L3S Research Center, Hannover, Germany
Riccardo Torlone, University Roma Tre, Rome, Italy
Raquel Trillo, Informatica e Ing. Sistemas, Zaragoza, Spain
Jürgen Umbrich, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Stijn Vansummeren, Université Libre de Bruxelles, Brussels, Belgium
Yannis Velegrakis, University of Trento, Trento, Italy
Min Wang, HP Labs China, Beijing, China
Klara Weiand, Institute for Informatics, University of Munich, München, Germany
Gideon Zenz, L3S Research Center, Hannover, Germany
Xuan Zhou, Renmin University of China, Beijing, China