PRISMHA (PROVIDING RICH SEMANTIC METADATA FOR HISTORICAL ARCHIVES)

 
PRiSMHA (Providing Rich Semantic
        Metadata for Historical Archives)
                        
      A project to provide historical archives
       with rich semantic metadata

                                     Anna Goy
                   The images used may be subject to COPYRIGHT
DipInfo si racconta 2019                       Goy et al.                       1
PRiSMHA - I
   PRiSMHA
   (Providing Rich Semantic Metadata for Historical Archives)
      national project running 2017-2020
      funded by Compagnia di San Paolo and
       Università di Torino
      partners: Dip. di Informatica and Dip. di Studi
       Storici (Università di Torino)
      in collaboration with Polo del '900 & Istituto
       Piemontese A. Gramsci (Torino)
      goal = experimenting with a crowdsourcing
       approach for the construction of ontology-based
       formal semantic representations of the
       content of historical documents
PRiSMHA - II
   Involved people:
       Anna Goy, Rossana Damiano, Diego Magro,
       Daniele Radicioni, Antonio Lieto, Enrico Mensa,
       Marco Rovera, Davide Colla, Cristina Re,
       Marco Leontino (Dip. Informatica, Unito)
       Fabrizio Loreto, Cristina Accornero, Stefano
       Musso (Dip. Studi Storici, Unito)
       Dunia Astrologo, Bruno Boniolo, Matteo
       D'Ambrosio, Elisa Elmi, Valeria Mosca, Claudio
       Salin, Alice Montanaro (Fondaz. Ist.
       Piemontese A. Gramsci, Polo del '900)
   Web: di.unito.it/prismha

PRiSMHA Partners: Polo del '900
   What: Cultural Center
   Where: 8,000 m² at the Quartieri Militari
           juvarriani in Torino
   Who: 19 cultural institutions (members)
   Regione Piemonte, Comune di Torino, Compagnia di
   S. Paolo (founders)
   Online: www.polodel900.it
   Library: 300,000 volumes
   Archives: 900 archival fonds,
   130,000 pictures, 21,000 posters,
   53,000 AV, ...
   Archives online
   (9centRo platform):
   www.polodel900.it/9centro

PRiSMHA Partners: Istituto Piem. A. Gramsci
   Where: at Polo del '900
   Online: www.gramscitorino.it
   Library: 60,000 volumes
   + a huge amount of journals and newspapers
   Archives: 220 fonds, 33,000 pictures, 4,000 posters,
   1,000 AV, flags, banners and objects
   (= 25% of the total of Polo del '900 archives)
   Pictures Archive online:
   www.gramscitorino.it/archiviofotografico.html

PRiSMHA: semantic layer - I
   General goal: building a "smart digital
   archivist" by enhancing the access to
   historical archives through Semantic
   Web technologies

   Semantic layer: semantic metadata describing
   the content of historical documents (events,
   people, places, etc.) by means of:
    Computational ontology =
     system domain knowledge
    RDF triplestore = KB containing
     semantic metadata,
      based on the
      vocabulary provided
      by the ontology
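
   The relationship between the two components can be sketched in a few lines of Python. This is only an illustrative toy, not the project's implementation: the class names `Triple` and `TripleStore` are hypothetical, the store is a plain in-memory set rather than a real RDF triplestore, and the "ontology" is reduced to a set of allowed property names. It shows one idea only: metadata is accepted into the KB only if it uses the vocabulary provided by the ontology.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Toy in-memory stand-in for an RDF triplestore (hypothetical)."""

    def __init__(self, vocabulary: set[str]) -> None:
        # Property names that the (simplified) ontology allows as predicates.
        self.vocabulary = vocabulary
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Reject metadata that does not use the ontology's vocabulary.
        if predicate not in self.vocabulary:
            raise ValueError(f"unknown property: {predicate}")
        self.triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        """Return the triples matching a pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore(vocabulary={"instance-of", "hasAgent", "hasTime"})
store.add("event1", "instance-of", "PoliceCharge")
store.add("event1", "hasAgent", "Carabinieri")
```

   A query such as `store.match(predicate="hasAgent")` then returns every agent assertion, while `store.add("event1", "likes", "x")` raises `ValueError` because `likes` is not part of the vocabulary.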
PRiSMHA: semantic layer - II
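   Example: semantic representation of an event
      "Il 20 Novembre, ... gli studenti tecnici sono
      stati aggrediti dai carabinieri armati di catene"
      ("On 20 November, ... the technical-school students
      were attacked by carabinieri armed with chains")
   [Figure: graph representing the event; its labelled arcs are listed below]
      152301_18.9_Bonet_watermark.pdf  isAbout  "studenti aggrediti dai carabinieri"
      "studenti aggrediti dai carabinieri"  instance-of  PoliceCharge
      PoliceCharge  ISA  ConfrontationalAction
      "studenti aggrediti dai carabinieri"  hasTime  20.11.68
      20.11.68  instance-of  Day
      "studenti aggrediti dai carabinieri"  hasAgent  Carabinieri
      Carabinieri  instance-of  LawEnforcementAgency
      LawEnforcementAgency  ISA  Organization
      "studenti aggrediti dai carabinieri"  hasInstrument  catene
      catene  instance-of  PhysicalObject
      "studenti aggrediti dai carabinieri"  hasPatient  "studenti tecnici"
      "studenti tecnici"  instance-of  Set
      "studenti tecnici"  hasDescribingConcept  studente
      studente  instance-of  Role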
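
   To make the example concrete, the arcs of the figure can be written down as plain (subject, property, object) tuples and queried. The sketch below is an assumption-laden illustration: the triples are read off the figure, but the helper functions `objects`, `classes_of` and `events_with_agent_of_class` are hypothetical names, and the ISA reasoning is a simple hand-written traversal rather than an OWL reasoner.

```python
# Triples read off the example figure (class names written without spaces).
EVENT = "studenti aggrediti dai carabinieri"
TRIPLES = [
    ("PoliceCharge", "ISA", "ConfrontationalAction"),
    ("LawEnforcementAgency", "ISA", "Organization"),
    ("152301_18.9_Bonet_watermark.pdf", "isAbout", EVENT),
    (EVENT, "instance-of", "PoliceCharge"),
    (EVENT, "hasTime", "20.11.68"),
    (EVENT, "hasAgent", "Carabinieri"),
    (EVENT, "hasPatient", "studenti tecnici"),
    (EVENT, "hasInstrument", "catene"),
    ("20.11.68", "instance-of", "Day"),
    ("Carabinieri", "instance-of", "LawEnforcementAgency"),
    ("catene", "instance-of", "PhysicalObject"),
    ("studenti tecnici", "instance-of", "Set"),
    ("studenti tecnici", "hasDescribingConcept", "studente"),
    ("studente", "instance-of", "Role"),
]


def objects(subject, predicate):
    """Objects of all triples with the given subject and property."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def classes_of(individual):
    """Asserted classes of an individual plus all their ISA ancestors."""
    found = []
    frontier = objects(individual, "instance-of")
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.append(cls)
            frontier.extend(objects(cls, "ISA"))
    return found


def events_with_agent_of_class(cls):
    """Events whose agent belongs, directly or via ISA, to the given class."""
    return [s for s, p, o in TRIPLES if p == "hasAgent" and cls in classes_of(o)]
```

   With these definitions, asking for events whose agent is an `Organization` finds the police charge even though the figure only states that the Carabinieri are a `LawEnforcementAgency`; this is the kind of inference the ontology layer is meant to support.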
PRiSMHA: semantic layer - III
  Main bottleneck = building semantic
  metadata! Solutions:
  1.     Information Extraction (when text is available)
  2.     Crowdsourcing (user-generated content):
         web platform for collaboratively building
         semantic metadata
  NB synergy between the two approaches:
  [Figure: IE, crowdsourcing web platform, RDF triples]
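
  One possible reading of this synergy is that Information Extraction proposes candidate triples and platform users confirm or discard them before they reach the triplestore. The sketch below illustrates that reading only; it is not the project's pipeline. The functions `extract_candidates` and `review`, the regular expression, and the `DD.MM` output format are all assumptions made for the example, and the "IE" step is a toy date matcher.

```python
import re

# Italian month names, needed to read dates such as "20 Novembre".
MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}
DAY_MONTH = re.compile(r"\b(\d{1,2})\s+([A-Za-z]+)\b")


def extract_candidates(event_id, text):
    """Toy IE step: propose a hasTime triple for each 'day month' mention."""
    candidates = []
    for day, month in DAY_MONTH.findall(text):
        number = MONTHS.get(month.lower())
        if number is not None:
            candidates.append((event_id, "hasTime", f"{int(day):02d}.{number:02d}"))
    return candidates


def review(candidates, accepted_by_users):
    """Crowdsourcing step: keep only the candidates that users confirmed."""
    return [triple for triple in candidates if triple in accepted_by_users]


text = ("Il 20 Novembre, ... gli studenti tecnici sono "
        "stati aggrediti dai carabinieri armati di catene")
candidates = extract_candidates("event1", text)
confirmed = review(candidates, accepted_by_users={("event1", "hasTime", "20.11")})
```

  The year is not recoverable from the sentence itself, which is one reason a human in the loop remains useful in this reading.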

PRiSMHA: semantic layer - III
  HERO − Historical Event Representation Ontology
  1.    Top/core ontology
       ◦ is a common vocabulary = shared between:
           the system, users of the crowdsourcing platform,
            and final users querying the "smart archivist"
           computer scientists/ontologists (designing and
            implementing the system) and historians
            (providing a historical perspective on the docs)
       ◦ is the result of the integration of
           an analysis of existing models
           the outcomes of the dialogue between
            computer scientists
            and historians

PRiSMHA: semantic layer - IV
  Structure of HERO:
     HERO-TOP: very general classes and properties
      (abstract entity, (non)physical object, being part of, being
      a sub-concept of, ...)
     A set of modules containing general classes and
      properties for characterizing...
      ◦ events (event, action, participating in an event, playing
          the role of agent in an event, causing, ...)
         HERO-EVENT
      ◦ roles, organizations, collections, and sets (role,
          organization, playing a (social) role, being a member of a
          collective/set, ...)  HERO-ROCS
      ◦ places (place, building, inside/outside, ...)
         HERO-PLACE
      ◦ time intervals (time interval, day, Allen's relations
        between time intervals, ...)  HERO-TIME
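
  As an illustration of what "Allen's relations between time intervals" covers, the sketch below computes which of Allen's thirteen relations holds between two intervals. It is a standalone example, not code from HERO-TIME: the function name `allen_relation` and the choice to model a day as the half-open interval from its date to the next date are assumptions made here.

```python
from datetime import date


def allen_relation(a, b):
    """Return the Allen relation holding between intervals a and b.

    Each interval is a (start, end) pair with start < end.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    # From here on the two intervals share some interior.
    if a_start == b_start:
        if a_end == b_end:
            return "equals"
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if a_start < b_start:
        return "overlaps" if a_end < b_end else "contains"
    return "during" if a_end < b_end else "overlapped-by"


# A day is modelled as [date, next date); the years 1968-1969 as one interval.
day = (date(1968, 11, 20), date(1968, 11, 21))
next_day = (date(1968, 11, 21), date(1968, 11, 22))
protest_years = (date(1968, 1, 1), date(1970, 1, 1))
```

  Here `allen_relation(day, protest_years)` yields `"during"` and `allen_relation(day, next_day)` yields `"meets"`.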
PRiSMHA: processes & synergies - I
     The top-level semantic model guided the analysis
      and selection of documents from the Ist.
      Gramsci's archives (the students and workers protest
      during the years 1968-1969 in Italy):
      ◦ historians analyzed docs and built analytical cards,
        structured on the
        basis of the top-level
        semantic model
     The content of the
      cards, coupled with domain expertise directly
      provided by historians, has been used to build the
      domain ontology (HERO-900) =
      specific semantic model refining HERO
      and containing concepts and
      properties characterizing the domain
PRiSMHA: prototype_v1 - I
   We designed and implemented a
   first prototype of the crowdsourcing platform
   (with a limited set of functionalities)
   ◦ it enables users to "annotate" archival documents
     with formal semantic descriptions of their content
     (knowledge base/RDF triplestore)
   ◦ the process is driven by the underlying ontology
     HERO-900
   Technologies:
   Spring, MySQL,
   Jackson Libraries,
   OWL API,
   Konclude,
   Apache Jena,
   Log4J, Gradle,
   Bootstrap, jQuery,
   D3, PDFObject
PRiSMHA: prototype_v1 - II

   Examples of documents from Ist. Gramsci's
   archives...

   [Figures: sample documents from the archives. © FONDAZIONE ISTITUTO PIEMONTESE ANTONIO GRAMSCI ONLUS]

PRiSMHA: Further steps
      We are designing and
       implementing prototype_v2 of
       the crowdsourcing platform (revised on the
       basis of feedback on v1 and including a larger
       set of functionalities)
      We are investigating the exploitation of
       automatic Information Extraction on OCR-ized
       archival documents, to provide an effective
       support to the annotation process

               THANKS FOR YOUR ATTENTION!
