A Proposal for Space Exploration at The National Archives
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
A Proposal for Space Exploration at The National Archives Amy Warner Paul Clough The National Archives University of Sheffield Kew Gardens, London Sheffield (UK) S1 4DP Amy.Warner@nationalarchives.gsi.gov.uk p.d.clough@sheffield.ac.uk Abstract This information can be exploited and used to The National Archives (TNA) is the UK provide spatial awareness to information systems government's official archive. They store (see, e.g. (Buckland et al., 2007; Purves et al., and maintain records spanning over a 2007)). Zong et al. (2005) discuss how the ability thousand years; more recently in digital to perform query by location can be an important form. Much of the information held by and useful addition to a digital library and Buck- TNA makes reference to place; and fre- land et al. (2007:376) concur with this in saying quently user’s queries to TNA’s online “libraries have a broad need to support geo- catalogue involve searches for location. graphic search” To enhance access to this unique collec- In this paper we consider how geo-referencing tion, we are planning to utilise existing information from The National Archives1 (TNA), technologies from Geographical Informa- the UK government's official archive, could pro- tion Retrieval (GIR). In this paper we de- vide a means of synthesising access across dif- scribe TNA, initial analysis of the geo- ferent archives and provide novel ways of graphical nature of data held by TNA, searching and browsing historical and culturally geo-referencing as a starting point for GIR significant material. Section 2 discusses past and our proposed plan of research. work in exploiting location for enhancing access to libraries and archives; Section 3 introduces 1 Introduction TNA, the data and motivations for the proposed project; Section 4 describes areas involved in Geo-referencing - relating information to geo- geo-referencing information; Section 5 outlines graphic location - is an important consideration our planned project work and Section 6 con- for information access systems (Hill, 2006; cludes the paper. Chapman & Wieczorek, 2006). This is due to the ubiquity of location (particularly place names) 2 Related Work within much of the information we encounter on a regular basis. For example, many Web docu- Geography provides an important facet for in- ments contain geographical identifiers including formation seeking in many contexts, including place names (or toponyms), addresses (and ad- historical, cultural and library archives. Several dress fragments), postal codes, and hyperlinks studies have examined the nature of search que- (Ding et al., 2000; McCurley, 2001; Himmel- ries in the cultural heritage field, both for general stein, 2005). In addition, Petras (2004) showed information (Cunningham et al, 2004) and for that over half of a set of five million library cata- images (Pask, 2005; Choi & Rasmussen, 2003; logue records of the University of California Collins, 1998; Chen, 2001). These all had similar contained one or more place-related subject findings: namely that people, places, time peri- headings or codes. However, not only do docu- ods, and subjects were popular topics of search ments contain geo-references, but users of search in this domain. This is also true of library users systems also query based upon location. For ex- (Buckland et al., 2007; Zong et al., 2005) and ample, a study by Zhang et al. (2006) found that utilizing place (and time) for information access 12.7% of queries submitted to a Web search en- has been investigated in a range of past projects. gine contained a place name; Sanderson & Han Probably one of the most widely cited research (2007) also showed that queries for city names projects that made use of geo-referencing was were repeated more frequently than for other names (e.g. country and state). 1 http://www.nationalarchives.gov.uk/
the Alexandria Digital Library (ADL) project lection, is investigated. This includes the devel- (Frew et al., 1998; Goodchild, 2004) aimed “to opment of a prototype system to demonstrate develop a user-friendly digital library system that advanced search and browse functionalities, in- provides a comprehensive range of services to cluding faceted browsing, timelines and maps, collections of maps, images and spatially- based on semantic enrichment of data from Tate referenced information.” Online. Moritz (1999) describes the geo-referencing of Finally, the Vision of Britain5 website brings narrative descriptions of specimens in natural together census data, election results, historical history museums written by collectors in the maps, gazetteers and travel writing between 1901 field. Reid et al. (2004) discuss some of the pro- and 2001 into a single point of access. This is jects funned by EDINA, a National Data Centre part of the Great Britain Historical Geographical in Edinburgh, including Go-Geo! and Geo-X- Information System (GBHGIS) project managed Walk. These projects facilitate the discovery of by the University of Portsmouth (Gregory & geo-referenced information from distributed col- Southall, 2003). lections in the UK. Buckland & Lancaster (2004) describe com- 3 The National Archives (TNA) bining place, time and topic in the Electronic 3.1 The Role of TNA Cultural Atlas Initiative 2 , designed to foster a research community interested in geo-temporally The National Archives (TNA) is the UK gov- encoded data and create a catalogue of geo- ernment's official archive. The documents that it referenced online resources. As a part of this ini- holds reflects almost 1,000 years of history, with tiative, Buckland et al. (2007) overview the Go- records ranging from parchment and paper ing Places in the Catalog project, an initiative to scrolls through to digital files and archived web- offer seamless searching of educational and sites6. The National Archives also provides ad- scholarly numeric and textual resources. This vice and guidance to others throughout the public includes extracting geo-spatial information from and private sectors about caring for archives, and library catalog records to enhance the user’s records the location of archival collections search experience. throughout the UK. It publishes all UK legisla- Johnson (2004; 2005) describes the indexing tion and advises upon and encourages the re-use and delivery of historical maps online from the of public sector information. National Library of Australia using the Time- The National Archives brings together the Map3 tool (a mapping Java applet which gener- Public Record Office (founded in 1838), Histori- ates complete interactive maps with a few simple cal Manuscripts Commission (founded 1869), the lines of html code). With a similar goal, Martins Office for Public Sector Information, and Her et al. (2007) describe the EU co-funded DIG- Majesty's Stationery Office (founded 1786). MAP project aimed at making collections of digitised historical maps spatially aware, through 3.2 Data held by TNA the use of text mining techniques and Geo- The National Archives’ history, and its central graphic Information Retrieval (GIR). This pro- role in information management, means that the ject deals specifically with some of the issues data that it holds is extensive, varied and rich. involved in using digitised versions of historical This information falls into 4 groups: documents (e.g. old font styles, incorrectly OCR’d words etc.). 1. Original documents, in both paper and Mostern & Johnson (2008) describe the con- electronic format (found in Electronic struction of a historical event gazetteer (these are Records Online – ERO) and digitised records of historical events that describe the exis- copies of original documents made tence of named places) and the use of spatio- available online (through Documents temporal visualisation to view events and rela- Online). tionships from the Heurist 4 collaborative data- 2. Catalogues which describe the content base. Clough et al. (2008) describe a study in of The National Archives collections and which semantic access to cultural heritage infor- record the location of archival records mation from Tate Online, a large online art col- 2 5 http://www.ecai.org/ http://www.visionofbritain.org.uk 3 6 http://www.timemap.net http://www.nationalarchives.gov.uk/about/whowhathow.htm? 4 http://heuristscholar.org/heurist/ source= ddmenu_about1
held throughout the UK. These range would enable the creation of a distinct set from the general, in particular TNA’s of place data. Catalogue of its collections and the Na- • Disambiguation between different places tional Register of Archives, to the spe- with the same name. cific, with the E179 Taxation database. 3. Published information, in particular the Exploiting place information in The National London Gazette, official newspaper of Archives databases could unlock their potential, record and the Statute Law Database. allowing the information to be interrogated in a This data is currently maintained and way which is not possible with 15 discreet data 7 hosted separately . sources, currently existing in isolation. Comple- 4. A wide variety of other datasets, in- mentary information in different databases could cluding the ARCHON Directory of ar- be highlighted, enhancing the richness of the chive contact details, and Your Archives, data. For example, researchers interested in the a wiki which allows users to contribute history of Godalming parish could consult: content about archival records. • Tithe records on The Catalogue: In addition to these groups, data can also be http://www.nationalarchives.gov.uk/ca talogue/displaycataloguedetails.asp? divided into structured and unstructured; the CATID=4085656&CATLN=6&Highlight= latter providing a greater challenge for computa- ,GODALMING,PARISH&FullDetails=False tional text analysis. Geographic information can • Details on a seventeenth century Vicar of be found throughout TNA’s databases, varying Godalming, who failed to pay his tax, from the postal addresses of every archive in the found on the E179 database UK (found in the ARCHON directory); a com- http://www.nationalarchives.gov.uk/e1 79/details.asp?piece_id=1030&doc_id=3 prehensive directory of medieval places, with 8513&doc_ref=E179/14/216A related OS grid references and alternative names • The records of Godalming parish, held at (drawn from the E179 tax database); and a list of Surrey History Centre, recorded on the parishes which relate to the National Farm Sur- National Register of Archives: vey carried out during World War Two (found in http://www.nationalarchives.gov.uk/nr a/searches/subjectView.asp?ID=O12730 the Catalogue, reference MAF 32). • The location of records relating to the 3.3 Motivations for Space Exploration manor of Godalming, found on the Ma- norial Documents Register: Allowing users to search TNA data by place is a http://www.nationalarchives.gov.uk/md high priority because of the popularity of place r/searches/detail.asp?SubjectID=17071 names as search queries. A recent survey of a 6&CountyID=398&FirstDate=&LastDate=&P arishName=&MDKeyword= sample of search queries carried out in January and July 2009 found that about 25% of search User’s resource discovery experience could queries related to a place (substantially higher potentially be enriched further by links to related than figures reported in previous studies for gen- external resources, for example: eral-purpose Web search). Enabling TNA’s data to be navigated and ex- • The page on Godalming on the Explor- plored by place could provide an effective means ing Surreys past website: of helping users to find a way through the mass http://www.exploringsurreyspast.org. of information held by The National Archives – uk/themes/places/surrey/waverley/goda lming currently about 32 million records in the online databases and documents. In particular this • The entry for Godalming in the Victoria would allow: County History, found on the British History Online website: http://www.british- • Visualisation of TNA data through geo- history.ac.uk/report.aspx?compid= graphical interfaces. 42924&strquery=godalming • Division between place names and per- • The Wikipedia page on Godalming: http://en.wikipedia.org/wiki/Godalmin sonal names (including personal titles, g such as the Duke of Richmond). This The fragmented nature of TNA’s databases can make resource discovery challenging for us- 7 http://www.gazettes-online.co.uk/, http://www.statutelaw.gov.uk/
ers, and the vision of a single catalogue interface 4.1 Sources of Geographical Knowledge is something which is being actively pursued and To explicitly geo-reference information relies on the identification and exploitation of place data having access to existing geographical knowl- in TNA’s resources could contribute to this. edge. The most commonly used resource is the Because of the nature of data in TNA, a criti- gazetteer: at minimum a list of place names, fea- cal challenge that needs to be addressed is the ture types and spatial positions (Hill, 2006; issue of change of name over time and the use of Mostern & Johnson, 2008). Sources of gazetteers alternative names. The fact that TNA’s data in- commonly include (Goodchild & Hill, cludes, in digital form, information from the me- 2008:1039): dieval period to the present day, means that it provides a unique environment to test the type of • Gazetteers of ‘official’ toponyms (e.g. issues which could arise. Examples of changes of the Ordnance Survey 1:50,000 Scale name over time include: Gazetteer); • Indexes accompanying published atlases • Hampshire, sometimes known by the ab- (e.g. Multimap.com); breviation Hants, also (formerly fre- quently) known as Southampton. • Place identifier tables accompanying GIS datasets; • Richmond, Surrey, formerly known as West Sheen. • Place authority files (e.g. Vision of Brit- ain) and rules (e.g. NCA Rules for the • Stepney, formerly also known as Ste- Construction of Personal, Place and Cor- bunheath. porate Names8) used for cataloguing and 4 Geo-Referencing indexing; • Historical printed gazetteers and ency- Relating information to geographic location in- clopedias (e.g. Gazetteer of Great British volves linking between informal means of refer- Place Names); ring to locations using place names (or • General online resources (e.g. Wikipe- toponyms) and formal representations based on dia). spatial referencing systems (e.g. longitude and latitude or the British National Grid system). Researchers have investigated how to combine Much of the geographical information contained multiple sources to create more comprehensive in libraries and archives is based on place names, resources (see, e.g. (Manguinhas et al., 2008).) however these must be geospatially referenced Example sources which could be used within the (i.e. grounded) if they are to be used to their project include the following 9 (these are pre- greatest potential for information access and dominantly UK-biased): knowledge discovery. Locations are also identi- fiable in other forms than place names, including • Getty Thesaurus of Geographic addresses, postcodes, an address fragment, phone Names 10 (TGN) contains more than 1 numbers, URLs and IP addresses. million names and other information Gazetteers are typically used to provide the about places across all continents (in- link between informal (place names) and formal cluding political and historical places). (corresponding spatial references) representa- The locations are organized in a hierar- tions (section 4.1 presents some example gazet- chical structure (e.g. World>Europe> teers). A key task for providing geospatial infor- United Kingdom>Scotland). mation access is the recognition (or extraction) • Ordnance Survey 1:50,000 Scale Gaz- of potential geo-references and their geospatial etteer (OS50k) contains 260,000 names referencing (discussed further in section 4.2). in Great Britain from the current and Geographic Information Retrieval (GIR) deals previous years’ 1:50,000 Scale gazetteer. with the indexing of information to enable Locations are represented by points (Na- searching over it to find information that is po- tional Grid squares) with additional fea- tentially relevant to some information need (dis- ture type. cussed in section 4.3). 8 http://www.nca.org.uk/materials/namingrules.pdf 9 A comprehensive survey can be found from EDINA: http://edina.ac.uk/projects/crosswalk/ 10 http://www.getty.edu
• Ordnance Survey Code-Point provides In addition to these resources geographic data a National Grid reference for each unit can be mined from the following resources and postcode in Great Britain. The postcode services: data is sourced, from among other re- sources, Address Point, which contains • Wikipedia contains a wealth of place 26 million addresses recorded in the names which can be automatically ex- Royal Mail Postcode Address List tracted and compiled into useful gazetteer (PAF). resources (see, e.g. (Overell & Rüger, • GeoNames11 is a database of over 8 mil- 2006)). lion place names (and postcodes) from • Geo-X-Walk14is available for use through all countries. Locations are represented EDINA and provides APIs for UK name, by points and contain a range of feature postcode and identifier lookup. types (e.g. country, region, city, body of • Geograph 15 is a project which aims to water etc.). The database can be collect photographs and information for downloaded and used free of charge. The every square km of Great Britain. The pro- data is based on several sources includ- ject provides an Application Programming ing the GEOnet Names Server (GNS), Interface (API) which can be used to in- the US Geological Survey Geographic teract with the data. Currently there are Names Information System (GNIS) and 1,359305 images for Great Britain which Wikipedia. include place name metadata and an OS • Gazetteer of British Place Names 12 grid reference. (GBPN) contains over 50,000 entries • Yahoo Geocoding API 16 is a freely- (points - National Grid references) refer- available web service (limited to 5,000 re- ring to places in Great Britain. Each quests per day) that provides coordinates place name is related to its historic for a given address or place name. (The county (administrative areas) and vari- GeoNames service uses this as a backup ants of place names are also included. resource if the place name cannot be • UK Placename Finder13 is a database of found in its own database.) more than 31,000 UK place names (available for purchase on CD-ROM). A more sophisticated form of geographic Places are referenced to the UK National knowledge representation is an ontology which Grid. typically organizes locations into hierarchical • Seamless Administrative Boundaries structures and provides topological relations be- of Europe (SABE) is a pan-European tween locations (see, e.g. Fu et al., 2004). Recent dataset of administrative units compiled work has focused on areas such as spatio- from national contributions (containing temporal modeling (Mostern & Johnson, 2008), the geometry and semantics of the ad- interoperability (Janowicz and Kessler, 2008) ministrative hierarchies of 30 European and automatic gazetteer construction (Nadeau et countries). For the UK, the dataset con- al., 2006; Popescu et al., 2008). sists of around 22,000 place names. 4.2 Identifying/Resolving Geo-references • Alexandria Gazetteer contains around 5.9 million place names, with feature A basic task in geo-referencing information is to type information for all entries. The gaz- identify candidate geo-references. For place etteer is mainly US-based (U.S. place names (or toponyms), this is commonly referred names from the U.S. Geological Survey's to as geo-parsing (Larson, 1996) or toponym GNIS database are used), but also con- recognition (Leidner, 2008) and can be likened tains around 4 million non-US place to the Information Extraction task of Named En- names from the GEOnet Names Server tity Recognition (or NER): the process of assign- (GNS). ing every word or group of words to a set of pre- defined categories (including “not an entity”). 11 14 http://www.geonames.org/ http://www.geoxwalk.ac.uk 12 15 http://www.gazetteer.co.uk/ http://www.geograph.org.uk/ 13 16 http://www.digital-documents.co.uk/archi/placename.htm http://developer.yahoo.com/maps/rest/V1/geocode.html
These categories commonly include entities such geographical ontologies, spatial indexing and as location, person and organization. storage of documents, geographical relevance Most NER systems consist of at least three ba- ranking, the extraction and resolution of geo- sic components: (1) a tokeniser, (2) gazetteer graphical references, determining the geographi- lists, and (3) a Named Entity (NE) grammar cal scope (or focus) of web documents, user in- (syntactic rules that take into account features terfaces and visualization, and methods for the that describe the surrounding context). Rules for formulation of spatial queries and interaction NER can be generated entirely by hand (knowl- with the results of geographic search (see, e.g. edge-based) or automatically, using machine (Hill, 2006:185-214; Purves et al., 2007)). learning (or statistical) techniques on previously classified texts (see, e.g. (Zhou & Su, 2002; 5 Discussion Curran and Clark, 2003)). The most common 5.1 Example Applications approach is to use gazetteers together with syn- tactic rules. For example, Curran & Clark (2003) By geo-referencing data from TNA, a number of use a Maximum Entropy model to perform NER applications could be developed to enhance ac- using features such as Part-Of-Speech tags, a cess and support knowledge discovery. These history of preceding and following NEs, ortho- include: graphic information and gazetteers. Their ap- proach was used on historical texts in the Geo-X- Geographical search/browse: an obvious ap- Walk project (Nissim et al., 2004). plication is providing users with functionalities Once recognised, a candidate place name must to search and navigate the TNA archives through be resolved, i.e. given a spatial reference (Larson place, thereby making the current search spa- (2006) calls this geo-coding; Leidner (2008) re- tially-aware. Browsing across archives and fers to this as toponym resolution). This may in- search results would be enhanced through visual- volve disambiguating the place. There are two ising records on a map. Users could also initiate main ambiguities in geo-references: (1) referent a search by entering a place name, selecting ambiguity and (2) reference ambiguity. The for- names from a list (or ontology) or selecting mer occurs when the same place name is used to points (or a region) from a map. A gazetteer or refer to more than one location (e.g. “Chapel- geographical ontology could also provide search town” refers to a location in South Yorkshire assistance by suggesting locations spatially re- (UK), Lancashire (UK), Kent County (USA) and lated to the current search position (Purves et al., Panola County (USA)). This also includes a geo- 2007). graphical reference being used for other entities (e.g. the name of a person or company), which is Collection overviews: to support more ex- called referent class ambiguity. (Amitay et al. ploratory search paradigms (i.e. users with no (2004) separate ambiguity into geo/non-geo and specific search goal in mind), a summary of loca- geo/geo.) tions represented in different archives could be Reference ambiguity occurs when the same created. For example, through mapping all location can have more than one name, e.g. due documents to grid cells and shading these to in- to the historical deviation of a location name dicate document density. This would indicate the over time (Smith & Mann, 2003), transliteration coverage of the content from the archives (Hill, (Kwok & Deng, 2003) and implicit formats and 2006:211). character encoding (Axelrod, 2003). An exten- sive summary can be found in (Leidner, 2008). Co-occurring concepts: through statistical analysis of term co-occurrences, terms could be 4.3 Geographical IR (GIR) identified that help to characterise a location (e.g. Geographic Information Retrieval (GIR) is con- it is common to find the terms “Sheffield” and cerned with finding documents relevant to que- “steel” co-occur). This could be analysed with ries which include some geographical context respect to time to see changes in concepts related (Larson, 1996). Information is assigned one or to specific places (or between archives). Map more footprints and documents are retrieved and visualizations overlaid with co-occurring con- ranked according to thematic and spatial rele- cepts (and timeline controls) could help to high- vance (results can also be visualised using maps). light concepts or themes which relate to specific Research in this field has addressed a wide regions (see, e.g. (Purves and Edwardes, 2008)). range of relevant areas such as the building of
5.2 Research Plan to use tools such as GATE 17 (Cunningham, 2002) and OpenCalais18 from Reuters for NER, The goal of our proposed work is to investigate OpenSpace19 from Ordnance Survey and Google geo-referencing within TNA (a set of distributed Maps20 for mapping, Solr21 for interaction func- archives) to enable geo-spatial information ac- tionality including faceted browsing and Post- cess to the archive collections. This will include GIS 22 and LocalLucene 23 for spatial/textual in- the following key stages: dexing and retrieval. Stage 1: establishing user (and system) re- Stage 5: evaluating prototype applications quirements for geo-enabling TNA resources. with users and TNA. This will involve previous This will involve understanding the current use work on relevance assessments in GIR (Purves & of place in the search for archived material (e.g. Clough, 2005). log analysis and user interviews) and analysing user’s information needs. 5.3 Potential Challenges Stage 2: developing suitable geographical re- By geo-referencing material from The National sources that can be used for geo-referencing the Archives, we will be able to provide additional collection. Resources such as the OS50k gazet- functionality to explore the content with respect teer and GeoNames will be used as a starting to place (as discussed in section 3.3). However, point and initial analysis will be undertaken to the particular context of TNA provides several compare and understand these available re- challenges, particularly with respect to applying sources (and others). Techniques to combine and automated text processing techniques. Chal- extend existing geographical resources to deal lenges include the following: with historical variations and vernacular expres- sions and add further information (e.g. feature Multiple archives: each TNA database exhib- types) will be explored based on mining existing its different geographical characteristics (e.g. data from the TNA (e.g. from the E179 database) ARCHON includes address information; E179 and online sources (Uryupina, 2003; Jones et al., include historical variants; Documents Online 2006; Popescu et al., 2008; Manguinhas et al., contains unstructured text). Decisions regarding 2008). A key task in this activity will be deciding how documents will be indexed in this distrib- how to create spatial coordinates for places uted setting must also be considered (e.g. central- which do not appear in existing resources (e.g. ized or distributed indexes; single search or me- historical names, (Crane, 2004)). In addition, this tasearch). work will explore a suitable document represen- tation and annotation scheme for the TNA data. Understanding references to place: geo- references are used in a number of different con- Stage 3: developing suitable methods for texts (e.g. could refer to the location of the toponym recognition and resolution (addresses document’s author or publisher; the geographical will also be considered for the ARCHON data- scope of its content; or the location of a physical base). Simple approaches for NER based on gaz- entity, e.g. a library). Simply identifying geo- etteer lookup and minimal syntactic rules references and attaching these to documents or (Clough, 2005) will be tested; together with more records may not be sufficient. sophisticated approaches based on machine learning (Curran & Clark, 2003). An existing Extracting and resolving toponyms: ambi- method for resolving ambiguous toponyms using guities in natural language always present chal- default senses and gazetteer resources (Clough, lenges to natural language processing. In particu- 2005) will be tested, together with approaches lar, extracting and referencing toponyms which which utilize spatial proximity (Pasley & do not exist within the current resources (e.g. Clough, 2007). We plan to investigate the suit- historical variants) will require attention. Al- ability of different approaches though approaches in NER can successfully 17 http://gate.ac.uk/ Stage 4: developing “light-weight” prototypes 18 http://www.opencalais.com/ based on using publicly-accessible resources (i.e. 19 http://openspace.ordnancesurvey.co.uk/openspace/ 20 mashups (Erle et al., 2005)) and a suitable archi- http://maps.google.co.uk/ 21 http://lucene.apache.org/solr/ tecture for providing GIR functionality. We plan 22 http://postgis.refractions.net/ 23 http://locallucene.sourceforge.net/
identify place names, the main problem is how to Amitay, E., Har'El, N., Sivan, R.. and Soffer, A. ( spatially ground the references (i.e. can we find 2004) Web-a-Where: Geotagging Web Content, In: spatial information for known places which are Proceedings of the 27th annual international con- likely to co-occur with historical variants?) ference on Research and development in informa- tion retrieval (SIGIR04), Sheffield UK, 2004, pp. 273-280. Integrating space and time: in many contexts space and time is linked and should therefore be Axelrod, A. E. (2003) On building a high perform- modeled together (e.g. Buckland & Lancaster, ance gazetteer database. In: Kornai, A. and Sund- 2004; Mostern & Johnson, 2008)). Current spa- heim, B. (eds.) Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic Refer- tial indexing methods may not be suitable for ences, Alberta, Canada: ACL, 63-68. supporting representation of space and time without major re-design. Buckland, M. and Lancaster, M. (2004) Combining Place, Time and Topic: The Electronic Cultural At- Implementation: one goal of this work is to las Initiative, D-Lib Magazine, Vol. 10(5). inform future designs of the TNA’s search en- Buckland, M., Chen, A., Gey, F.C., Larson, R.R., gine. Geo-enabling TNA’s content should be Mostern, R. and Petras, V. (2007) Geographic achieved in such a way that it offers minimal Search: Catalogs, Gazetteers, and Maps. College & interruption/change to the current system and Research Libraries. Vol 68, no. 5 (Sept 2007): cataloguing processes. 376-387. Chapman, A.D. and J. Wieczorek (eds) (2006) Guide Scalability: the volume of data held in the to Best Practices for Georeferencing. Copenhagen: TNA databases also presents an implementation Global Biodiversity Information Facility. challenge, particularly for processes such as geo- Chen, H. (2001) An analysis of image queries in the parsing and geo-coding. This may affect the kind field of art history. JASIST, 52(3), 260-273. of solutions developed in the project (e.g. gazet- Choi, Y. and Rasmussen, E.M. (2003) Searching for teer lookup with minimal natural language proc- Images: The Analysis of Users' Queries for Image essing has shown to work efficiently for large Retrieval in American History. JASIST, 54(6), 498- collections in past work (McCurley, 2003; 511. Clough, 2005)). Clough, P. (2005), Extracting Metadata for Spatially- 6 Conclusions Aware Information Retrieval on the Internet, In: Proceedings of Workshop on Geographic Informa- In this paper we have described a proposal for tion Retrieval (GIR'05), held in conjunction with space exploration at The National Archives, the CIKM2005, Bremen, Germany, pp. 25-30. UK government's official archive. The diversity Clough, P., Marlow, J. and Ireson, N. (2008) Enabling and vastness of content at TNA currently pre- Semantic Access to Cultural Heritage: A Case sents a real challenge for the archival staff and Study of Tate Online, Larson, M., K. Fernie, J. end users of the archive, such as scholars and the Oomen and J. Cigarran (eds.) Proceedings of the general public. By exploiting geo-references ECDL 2008 Workshop on Information Access to from the content, material can be linked and ag- Cultural Heritage, Aarhus, Denmark, September gregated to provide enhanced forms of informa- 18, 2008. ISBN 978-90-813489-1-1. tion access and knowledge discovery across dis- Collins, K. (1998) Providing subject access to images: creet data sources. It would also highlight com- A study of user queries. The American Archivist, plementary information in TNA’s databases, and 61, 36-55. in doing so enhance the richness of the data. Crane, G. (2004) Georeferencing in Historical Collec- Place data is also one of the most popular topics tions, D-Lib Magazine, Vol. 10(5). for searching the archives from TNA’s website Cunningham, H. (2002) GATE, a General Architec- and Geographical Information Retrieval (GIR) ture for Text Engineering. Computers and the Hu- technologies could help improve the current manities 36, 223-254. search paradigms. By applying (and exploring) existing technologies from areas such as GIR and Cunningham, S.J., Bainbridge, D. and Masoodian, M. text mining, we can geo-enable the archive and (2004) How people describe their image informa- tion needs: A grounded theory analysis of visual prepare it for future access. arts queries, In: Proceedings of JCDL’04, 47-48. References Curran, J.R. and Clark, S. (2003) Language Inde- pendent NER using a Maximum Entropy Tagger.
In Proceedings of CoNLL-2003, Edmonton, Can- Geographic References, Alberta, Canada: ACL, ada, 2003, 164-167. 26-30. Ding, J., Gravano, L. and Shivakumar, N. (2000) Larson, R.R. (1996) Geographic Information Re- Computing geographical scopes of web resources. trieval and Spatial Browsing. In: GIS and Librar- In: Proceedings of 26th International Conference ies: Patrons, Maps and Spatial Information, Linda on Very Large Databases, VLDB 2000, Cairo, Smith and Myke Gluck, Eds., University of Illi- Egypt, September 2000, 10-14. nois. Frew, J., Freeston, M., Freitas, N., Hill, L.L., Janee, Leidner, J.L. (2008) Toponym Resolution in Text. G., Lovette, K., Nideffer, R., Smith, T.R. and Boca Raton, FL, USA: Universal Publishers Zheng, Q. (1998) The Alexandria Digital Library (ISBN-10 1581123841, ISBN-13: 978158112384 Architecture. ECDL 1998, 61-73. 5). Fu, F., Jones, C.B. and Abdelmoty, A.I. (2005) Build- Manguinhas, H., Martins, B. and Borbinha, J. (2008) ing a Geographical Ontology for Intelligent Spatial A Geo-Temporal Web Gazetteer Integrating Data Search on the Web, In: Proceedings of IASTED In- From Multiple Source. In Proceedings of the 2008 ternational Conference on Databases and Applica- IEEE international Conference on Digital Informa- tions (DBA2005). tion Management (November 13 - 16, 2008). Goodchild, M. (2004) The Alexandria Digital Library Martins, B., Borbinha, J., Pedrosa, G., Gil, J., and Project: Review, Assessments and Prospects, D- Freire, N. (2007) Geographically-aware informa- Lib Magazine, Vol. 10(5). tion retrieval for collections of digitized historical maps. In: Proceedings of the 4th ACM Workshop Gregory, I. and Southall, H. (2003) The Great Britain on Geographical information Retrieval (Lisbon, Historical GIS, University of Portsmouth, Avail- Portugal, November 09 - 09, 2007), ACM, New able online: http://www.port.ac.uk/research/ York, NY, 39-42. gbhgis/aboutthegbhistoricalgis/documentation/ McCurley, S.K. (2001) Geospatial mapping and navi- Goodchild, M. F. and Hill, L. L. (2008) Introduction gation of the web. In: Proceedings of the Tenth In- to digital gazetteer research. Int. J. Geogr. Inf. Sci. ternational WWW Conference, Hong Kong, 1-5 22, 10 (Oct. 2008), 1039-1044. May, 221-229. Hill, L.L. (2006) Georeferencing: The Geographic Mostern, R. and Johnson, I. (2008) From named place Associations of Information, The MIT Press, Cam- to naming event: creating gazetteers for history. bridge, Massachusetts, September 2006, 0-262- Int. J. Geogr. Inf. Sci. 22, 10 (Oct. 2008), 1091- 08354-X. 1108. Himmelstein, M. (2005) Local Search: The Internet Is Moritz, T. (1999) Geo-referencing the Natural and the Yellow Pages, IEEE Computer Society Journal, Cultural World, Past and Present: Towards Build- 0018-9162/05, 26-34. ing a Distributed, Peer-Reviewed Gazetteer Sys- Janowicz, K. and Kessler, C. (2008) The role of on- tem. Presented at the Digital Gazetteer Informa- tology in improving gazetteer interaction, Interna- tion Exchange Workshop, Oct 13-14, 1999. tional Journal of Geographical Information Sci- Nadeau, D., Turney, P.D. and Matwin, S. (2006) Un- ence, Vol. 22(10), 1129-1157. supervised named-entity recognition: Generating Johnson, I. (2004) Putting time on the map: Using gazetteers and resolving ambiguity. In: Proceed- TimeMap for map animation and web delivery, ings of 19th Canadian Conference on Artificial In- GeoInformatics, 26-29. telligence. Johnson, I. (2005) Indexing and Delivery of Historical Nissim, M., Matheson, C. and Reid, J. (2004) Recog- Maps Online Using TimeMapTM, National Library nising Geographical Entities in Scottish Historical of Australia News, January 2005 Volume XV Documents, In: Proceedings of the Workshop on Number 4. Geographic Information Retrieval, SIGIR 2004. Jones, C., Purves, R.S., Clough, P., and Joho, H. Overell, S.E. and Rüger, S. (2006) Identifying and (2008) Modelling Vague Regions with Knowledge Grounding Descriptions of Places, In: Proceedings from the Web, International Journal Geographic of Workshop on Geographic Information Retrieval Information Systems (IJGIS)- Special Edition on (GIR’06). Digital Gazetteer Research, Volume 22(10), 1045- Pask, A. (2005) Art Historians' Use of Digital Images: 1065. A Usability test of ARTstor. Dissertation at Uni- Kwok, K. L. and Deng, Q. (2003) GeoName: a sys- versity of North Carolina, Chapel Hill. tem for back-transliterating pinyin place names. In: Pasley, R., Clough, P. and Sanderson, M. (2007), Kornai, A. and Sundheim, B. (eds.) Proceedings of Geo-Tagging for Imprecise Regions of Different the HLT-NAACL 2003 Workshop on Analysis of
Sizes, In: Proceedings of Workshop on Geographic Zhou, G. and Su, J. (2002) Named Entity Recognition Information Retrieval GIR'07. Using a HMM-based Chunk Tagger. In: Proceed- ings of the 40th Annual Meeting of the Association Petras, V. (2004) Statistical Analysis of Geographic for Computational Linguistics (ACL2002), Phila- and Language Clues in the MARC Record, Tech- delphia, July 2002, 473-480. nical Report, University of California at Berkeley. Zong, W., Wu, D., Sun, A., Lim, E., and Goh, D. H. Popescu, A., Grefenstette, G. and Moëllic, P. A. (2005) On assigning place names to geography re- (2008) Gazetiki: automatic creation of a geo- lated web pages. In: Proceedings of the 5th graphical gazetteer. In: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Li- ACM/IEEE-CS joint conference on Digital librar- braries (Denver, CO, USA, June 07 - 11, 2005). ies table of contents, Pittsburgh PA, PA, USA. JCDL '05, ACM, New York, NY, 354-362. Purves, R.S., Clough, P., Jones, C.B., Arampatzis, A., Bucher, B., Finch, D., Fu, G., Joho, H., Khirini, A.S., Vaid, S., and Yang, B. (2007), The Design and Implementation of SPIRIT: a Spatially-Aware Search Engine for Information Retrieval on the Internet, International Journal Geographic Infor- mation Systems (IJGIS), Volume 21(7), January 2007, pp. 717-745. Purves, R.S. and Clough, P. (2006) Judging spatial relevance and document location for Geographic Information Retrieval, extended abstract, In Pro- ceedings of 4th International Conference on Geo- graphic Information Science (GIScience 2006), Münster, Germany, September 2006, 159-164. Purves, R.S. and Edwardes, A.J. (2008) Exploiting Volunteered Geographic Information to Describe Place. Proceedings of the GIS Research UK 16th Annual Conference,Lambrick, D.(eds), 252-255. Reid, J.S., Higgins, C., Medyckyj-Scott, D. and Robson, A. (2004) Spatial Data Infrastructures and Digital Libraries: Paths to Convergence, D-Lib Magazine, Vol. 10(5). Sanderson, M. and Han, Y. (2007) Search Words and Geography, In: Proceedings of the Geographic In- formation Retrieval workshop at the ACM CIKM conference. Schuyler, E., Gibson, R. and Walsh, J. (2005) Map- ping Hacks: Tips and Tools for Electronic Cartog- raphy, O’Reilly. Smith, D. A. and Mann, G. S. (2003) Bootstrapping toponym classifiers. In: Kornai, A. and Sundheim, B. (eds.) Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, Alberta, Canada: ACL, 45-49. Uryupina, O. (2003) Semi-supervised learning of geographical gazetteers from the internet. In: Kor- nai, A. and Sundheim, B. (eds.) Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geo- graphic References, Alberta, Canada: ACL, 18-25. Zhang, V.W. Rey, B. Stipp. E. and Jones, R. (2006) Geomodification in Query Rewriting. In: Proceed- ings of the 2006 Workshop on Geographic Infor- mation Retrieval, Seattle, USA, 23-27.
You can also read