PRISMHA (PROVIDING RICH SEMANTIC METADATA FOR HISTORICAL ARCHIVES)
PRiSMHA (Providing Rich Semantic Metadata for Historical Archives)
A project to equip historical archives with rich semantic metadata
Anna Goy
The images used may be subject to copyright.
DipInfo si racconta 2019, Goy et al.
PRiSMHA - I
PRiSMHA (Providing Rich Semantic Metadata for Historical Archives) is a national project running from 2017 to 2020, funded by Compagnia di San Paolo and Università di Torino.
Partners: Dip. di Informatica and Dip. di Studi Storici (Università di Torino), in collaboration with Polo del '900 and Istituto Piemontese A. Gramsci (Torino).
Goal: experimenting with a crowdsourcing approach for the construction of ontology-based formal semantic representations of the content of historical documents.
PRiSMHA - II
Involved people:
Anna Goy, Rossana Damiano, Diego Magro, Daniele Radicioni, Antonio Lieto, Enrico Mensa, Marco Rovera, Davide Colla, Cristina Re, Marco Leontino (Dip. Informatica, Unito)
Fabrizio Loreto, Cristina Accornero, Stefano Musso (Dip. Studi Storici, Unito)
Dunia Astrologo, Bruno Boniolo, Matteo D'Ambrosio, Elisa Elmi, Valeria Mosca, Claudio Salin, Alice Montanaro (Fondaz. Ist. Piemontese A. Gramsci, Polo del '900)
Web: di.unito.it/prismha
PRiSMHA Partners: Polo del '900
What: Cultural Center
Where: 8,000 m² at the Quartieri Militari juvarriani in Torino
Who: 19 cultural institutions (members); Regione Piemonte, Comune di Torino, Compagnia di S. Paolo (founders)
Online: www.polodel900.it
Library: 300,000 volumes
Archives: 900 archival fonds, 130,000 pictures, 21,000 posters, 53,000 AV, ...
Archives online (9centRo platform): www.polodel900.it/9centro
© Polo del '900
PRiSMHA Partners: Istituto Piem. A. Gramsci
Where: at Polo del '900
Online: www.gramscitorino.it
Library: 60,000 volumes + a large collection of journals and newspapers
Archives: 220 fonds, 33,000 pictures, 4,000 posters, 1,000 AV, flags, banners and objects (= 25% of all Polo del '900 archives)
Pictures Archive online: www.gramscitorino.it/archiviofotografico.html
© Polo del '900
PRiSMHA: semantic layer - I
General goal: building a "smart digital archivist" by enhancing access to historical archives through Semantic Web technologies.
Semantic layer: semantic metadata describing the content of historical documents (events, people, places, etc.) by means of:
◦ a computational ontology = the system's domain knowledge
◦ an RDF triplestore = the KB containing the semantic metadata, based on the vocabulary provided by the ontology
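To make the triplestore idea concrete, here is a minimal sketch of a knowledge base as a set of (subject, predicate, object) triples queried by pattern matching. It is a toy stand-in and assumes nothing about the project's actual store (the prototype lists Apache Jena among its technologies); the `ex:` identifiers are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is (subject, predicate, object); identifiers here are plain strings.
Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory KB: a set of triples with wildcard pattern queries."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # Storing triples in a set means asserting the same fact twice is a no-op.
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield every stored triple that agrees with the pattern; None is a wildcard."""
        for ts, tp, to in self._triples:
            if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o):
                yield (ts, tp, to)

    def __len__(self) -> int:
        return len(self._triples)


if __name__ == "__main__":
    kb = TripleStore()
    kb.add("ex:doc1", "ex:isAbout", "ex:event1")  # hypothetical identifiers
    kb.add("ex:event1", "ex:hasTime", "ex:day1")
    print(sorted(kb.match(s="ex:event1")))
```

Production triplestores typically answer SPARQL queries using indexes over the subject, predicate and object positions; the linear scan here only illustrates the data model.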
PRiSMHA: semantic layer - II
Example: semantic representation of an event
"Il 20 Novembre, ... gli studenti tecnici sono stati aggrediti dai carabinieri armati di catene" ("On 20 November, ... the technical-school students were attacked by carabinieri armed with chains"; source: 152301_18.9_Bonet_watermark.pdf)
[Figure: ontology-based graph of this event. Classes: ConfrontationalAction, PoliceCharge, Organization, LawEnforcementAgency, Day, PhysicalObject, Set, Role. Instances: "aggrediti dai carabinieri", "carabinieri", 20.11.68, "catene", "studenti tecnici", "studente". Relations: ISA, instanceOf, isAbout, hasAgent, hasPatient, hasTime, hasInstrument, hasDescribingConcept.]
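The same event can be written down as plain triples. The sketch below is one plausible reading of the figure, not the project's actual encoding: the `ex:` identifiers, the document node and the exact ISA links are assumptions, while the class and property names are those visible in the figure.

```python
# Hypothetical triples for one reading of the figure; ex:... names are illustrative.
TRIPLES = [
    ("ex:doc_152301", "isAbout", "ex:charge1"),
    ("ex:charge1", "instanceOf", "PoliceCharge"),          # "aggrediti dai carabinieri"
    ("PoliceCharge", "ISA", "ConfrontationalAction"),
    ("ex:charge1", "hasAgent", "ex:carabinieri"),
    ("ex:carabinieri", "instanceOf", "LawEnforcementAgency"),
    ("LawEnforcementAgency", "ISA", "Organization"),
    ("ex:charge1", "hasPatient", "ex:studenti_tecnici"),
    ("ex:studenti_tecnici", "instanceOf", "Set"),
    ("ex:studenti_tecnici", "hasDescribingConcept", "ex:studente"),
    ("ex:studente", "instanceOf", "Role"),
    ("ex:charge1", "hasInstrument", "ex:catene"),
    ("ex:catene", "instanceOf", "PhysicalObject"),
    ("ex:charge1", "hasTime", "ex:day_1968_11_20"),
    ("ex:day_1968_11_20", "instanceOf", "Day"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe_event(event, triples=TRIPLES):
    """Collect who/whom/with-what/when for an event as a dict of property -> fillers."""
    roles = ("hasAgent", "hasPatient", "hasInstrument", "hasTime")
    return {r: objects(event, r, triples) for r in roles}


if __name__ == "__main__":
    event = objects("ex:doc_152301", "isAbout")[0]
    print(describe_event(event))
```

Starting from the document and following `isAbout` yields the event, whose participants and time can then be read off its outgoing properties.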
PRiSMHA: semantic layer - III
Main bottleneck = building semantic metadata!
Solutions:
1. Information Extraction (when text is available)
2. Crowdsourcing (user-generated content): a web platform for collaboratively building semantic metadata
NB: synergy between the two approaches [Figure: IE and the crowdsourcing web platform both contribute RDF triples]
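One way the two approaches could feed the same RDF triples is sketched below under a hypothetical policy: IE output enters a review queue ranked by extractor confidence, and any triple, whether extracted or proposed by a user, is accepted once contributors' net approval reaches a threshold. The `Candidate` fields, the source labels and the threshold of 2 are illustrative choices, not taken from PRiSMHA.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Candidate:
    triple: Triple
    source: str          # "ie" (extracted) or "crowd" (user-proposed); hypothetical labels
    score: float = 1.0   # extractor confidence; user proposals default to 1.0
    votes_up: int = 0
    votes_down: int = 0


def review_queue(cands: List[Candidate]) -> List[Candidate]:
    """IE suggestions nobody has voted on yet, most confident first."""
    pending = [c for c in cands if c.source == "ie" and c.votes_up + c.votes_down == 0]
    return sorted(pending, key=lambda c: c.score, reverse=True)


def accepted_triples(cands: List[Candidate], min_net_votes: int = 2) -> Set[Triple]:
    """Triples confirmed by the crowd, regardless of which source proposed them."""
    return {c.triple for c in cands if c.votes_up - c.votes_down >= min_net_votes}
```

In this design IE never writes to the KB directly: it only reorders the annotators' work, which keeps human validation as the gate while still saving effort on documents where text is available.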
PRiSMHA: semantic layer - III
HERO: Historical Event Representation Ontology
1. Top/core ontology
◦ is a common vocabulary, shared between the system, users of the crowdsourcing platform, and final users querying the "smart archivist", as well as between computer scientists/ontologists (designing and implementing the system) and historians (providing a historical perspective on the documents)
◦ is the result of integrating an analysis of existing models with the outcomes of the dialog between computer scientists and historians
PRiSMHA: semantic layer - IV
Structure of HERO:
HERO-TOP: very general classes and properties (abstract entity, (non)physical object, being part of, being a sub-concept of, ...)
A set of modules containing general classes and properties for characterizing...
◦ events (event, action, participating in an event, playing the role of agent in an event, event causing, ...): HERO-EVENT
◦ roles, organizations, collections, and sets (role, organization, playing a (social) role, being a member of a collective/set, ...): HERO-ROCS
◦ places (place, building, inside/outside, ...): HERO-PLACE
◦ time intervals (time interval, day, Allen's relations between time intervals, ...): HERO-TIME
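HERO-TIME includes Allen's relations between time intervals. As an illustration of what those relations compute (not of how HERO-TIME encodes them in the ontology), the function below classifies a pair of proper intervals (start < end) into one of Allen's 13 relations.

```python
from datetime import date
from typing import Any, Tuple

Interval = Tuple[Any, Any]  # (start, end) with comparable endpoints and start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation holding from interval a to interval b."""
    a_s, a_e = a
    b_s, b_e = b
    if not (a_s < a_e and b_s < b_e):
        raise ValueError("intervals must satisfy start < end")
    if a_e < b_s:
        return "before"
    if b_e < a_s:
        return "after"
    if a_e == b_s:
        return "meets"
    if b_e == a_s:
        return "met-by"
    if a_s == b_s and a_e == b_e:
        return "equals"
    if a_s == b_s:
        return "starts" if a_e < b_e else "started-by"
    if a_e == b_e:
        return "finishes" if a_s > b_s else "finished-by"
    if b_s < a_s and a_e < b_e:
        return "during"
    if a_s < b_s and b_e < a_e:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a_s < b_s else "overlapped-by"


if __name__ == "__main__":
    # A day modelled as the half-open span [20 Nov 1968, 21 Nov 1968).
    day = (date(1968, 11, 20), date(1968, 11, 21))
    years_1968_69 = (date(1968, 1, 1), date(1970, 1, 1))
    print(allen_relation(day, years_1968_69))  # during
```

The checks are ordered so that each later branch can rely on the earlier ones having failed, which is why the final line only needs to distinguish the two partial-overlap cases.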
PRiSMHA: processes & synergies - I
The top-level semantic model guided the analysis and selection of documents from Ist. Gramsci's archives (on the students' and workers' protests of 1968-1969 in Italy):
◦ historians analyzed the documents and built analytical cards, structured on the basis of the top-level semantic model
The content of the cards, coupled with domain expertise provided directly by historians, was used to build the domain ontology (HERO-900) = a specific semantic model refining HERO and containing the concepts and properties that characterize the domain
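Refining HERO means that domain classes sit below more general HERO classes, so a query for a general class should also return instances of its domain subclasses. In the prototype this kind of inference would fall to an OWL reasoner (Konclude is among the listed technologies); the sketch below simply computes the transitive closure of a hypothetical subclass hierarchy, whose links (PoliceCharge under ConfrontationalAction under Action under Event, and so on) are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical fragment: domain-level classes refining more general ones.
SUBCLASS_OF: Dict[str, List[str]] = {
    "PoliceCharge": ["ConfrontationalAction"],
    "ConfrontationalAction": ["Action"],
    "Action": ["Event"],
    "LawEnforcementAgency": ["Organization"],
}


def superclasses(cls: str, hierarchy: Dict[str, List[str]] = SUBCLASS_OF) -> Set[str]:
    """All direct and indirect superclasses of cls, found by breadth-first search."""
    seen: Set[str] = set()
    queue = deque(hierarchy.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # the seen-set also protects against cycles
            seen.add(parent)
            queue.extend(hierarchy.get(parent, []))
    return seen


def instances_of(cls: str, asserted: Dict[str, str],
                 hierarchy: Dict[str, List[str]] = SUBCLASS_OF) -> Set[str]:
    """Individuals whose asserted class is cls or one of its (indirect) subclasses."""
    return {ind for ind, c in asserted.items()
            if c == cls or cls in superclasses(c, hierarchy)}
```

For example, an individual asserted only as a `PoliceCharge` is still returned by `instances_of("Event", ...)`, which is what lets general queries reach domain-specific annotations.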
PRiSMHA: prototype_v1 - I
We designed and implemented a first prototype of the crowdsourcing platform (with a limited set of functionalities):
◦ it enables users to "annotate" archival documents with formal semantic descriptions of their content (knowledge base/RDF triplestore)
◦ the process is driven by the underlying ontology HERO-900
Technologies: Spring, MySQL, Jackson Libraries, OWL API, Konclude, Apache Jena, Log4j, Gradle, Bootstrap, jQuery, D3, PDFObject
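Because annotation is driven by HERO-900, an interface can use the ontology to decide which properties to offer for a given entity and to reject ill-typed triples. The sketch below illustrates that idea with a hypothetical ontology fragment and hypothetical domain/range signatures. It treats domain and range as closed-world checks; in OWL, `rdfs:domain` and `rdfs:range` axioms license inferences rather than reject data, so this is a deliberate interface-level choice, not OWL semantics.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical ontology fragment (not the actual HERO-900 axioms): class -> parent.
SUBCLASS_OF: Dict[str, str] = {
    "PoliceCharge": "Action",
    "Action": "Event",
    "LawEnforcementAgency": "Organization",
}
# Hypothetical signatures: property -> (allowed subject classes, allowed object classes).
SIGNATURES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "hasAgent": ({"Action"}, {"Person", "Organization"}),
    "hasTime": ({"Event"}, {"Day"}),
    "hasInstrument": ({"Action"}, {"PhysicalObject"}),
}


def is_a(cls: str, target: str) -> bool:
    """True if cls equals target or reaches it by following parent links."""
    seen: Set[str] = set()
    current: Optional[str] = cls
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return False


def check_annotation(subj_cls: str, prop: str, obj_cls: str) -> List[str]:
    """Return a list of problems; an empty list means the triple fits the signatures."""
    if prop not in SIGNATURES:
        return [f"unknown property: {prop}"]
    domain, rng = SIGNATURES[prop]
    problems = []
    if not any(is_a(subj_cls, d) for d in domain):
        problems.append(f"{subj_cls} is not a valid subject for {prop}")
    if not any(is_a(obj_cls, r) for r in rng):
        problems.append(f"{obj_cls} is not a valid object for {prop}")
    return problems


def allowed_properties(subj_cls: str) -> List[str]:
    """Properties an annotation form could offer for an entity of this class."""
    return sorted(p for p, (dom, _) in SIGNATURES.items()
                  if any(is_a(subj_cls, d) for d in dom))
```

A form built this way would, for an entity typed as `PoliceCharge`, offer only `hasAgent`, `hasInstrument` and `hasTime`, and flag a `Day` given as an instrument.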
PRiSMHA: prototype_v1 - II...
Examples of documents from Ist. Gramsci's archives...
© FONDAZIONE ISTITUTO PIEMONTESE ANTONIO GRAMSCI ONLUS
PRiSMHA: Further steps
We are designing and implementing prototype_v2 of the crowdsourcing platform (revised on the basis of feedback on v1 and including a larger set of functionalities).
We are investigating the use of automatic Information Extraction on OCR-ized archival documents, to provide effective support to the annotation process.
THANKS FOR YOUR ATTENTION!