VocabUsageScenario.doc 14/5/08

“You say sea cow, I say dugong …”[1]: a usage scenario for the use of controlled vocabularies in a federated registry/repository environment

Chris Blackall (APSR)[2]

Background to the Usage Scenario

This usage scenario is the result of an impromptu discussion at a workshop on how controlled vocabularies might be integrated into repository/registry applications.[3] After the discussion I agreed to write the usage scenario and circulate it to attendees. The workshop was organised by Rob Atkinson and was held on 1 May 2008 at the CSIRO IM&T offices at Yarralumla, Canberra.

Note that a usage scenario is not equivalent to a ‘use-case’ as defined in UML. Usage scenarios are more discursive than UML use-cases and include narrative descriptions and other information about users and their needs that provide richer contexts for gathering user requirements.

Scope of the Usage Scenario

The usage scenario addresses the generic need of researchers and other data producers to lower the cost/effort required to create surrogate metadata records for research publications and data before they are ingested into ‘repositories’ (defined broadly here to include long-term data storage facilities).[4] More specifically, it addresses the need to improve the accuracy of surrogate metadata records by providing data producers with automated mechanisms to populate and/or validate the relevant descriptive sections of metadata records; for example, by filling in or validating Web page forms containing ‘subject’ information with controlled vocabularies (e.g. Field of Research codes) taken from authoritative sources (e.g. the Australian and New Zealand Standard Research Classification 2008).[5]

Additionally, the usage scenario covers improving the accuracy of metadata for datasets by using controlled vocabularies in association with the semi-automated production of data product specifications[6]; specifically, ISO 19131 Geographic information - Data product specification.[7]

[1] Sung to the tune of “Let's Call the Whole Thing Off” (originally performed by Ginger Rogers and Fred Astaire, composed by George Gershwin and Ira Gershwin, for the 1937 film Shall We Dance). http://www.youtube.com/watch?v=zZ3fjQa5Hls
[2] Chris Blackall, Business Analyst, Australian Partnership for Sustainable Repositories (APSR), W.K. Hancock Building, Australian National University.
[3] https://www.seegrid.csiro.au/twiki/bin/view/AppSchemas/VocabularyBindingMechanismsWorkshop
[4] Simon Cox discussed this requirement at the open meeting the previous day.
[5] http://www.abs.gov.au/AUSSTATS/abs@.nsf/productsbyCatalogue/5D99AEA1DD8AA8E0CA2574180005421C?OpenDocument
[6] This is my attempt at capturing Rob Atkinson’s requirements. See
[7] http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=36760
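The following Python sketch roughly illustrates the automated validation described in the scope above. It checks a ‘subject’ value entered in a Web form against a small in-memory excerpt of the Field of Research (FOR) vocabulary. The sketch relies on the hierarchical structure of FOR codes: two-digit divisions, four-digit groups and six-digit fields, with each code beginning with its parent's digits. The vocabulary excerpt and the function name `validate_for_code` are hypothetical. A real form would load the full, current code list from the authoritative ABS source rather than hard-code it, and the labels shown here should be checked against that list.

```python
import re

# Hypothetical, illustrative excerpt only: a real service would load the
# complete FOR code list from the authoritative ANZSRC publication.
FOR_CODES = {
    "05": "Environmental Sciences",
    "06": "Biological Sciences",
    "0602": "Ecology",
    "0608": "Zoology",
    "060801": "Animal Behaviour",
}

# FOR codes are 2 (division), 4 (group) or 6 (field) ASCII digits long.
CODE_PATTERN = re.compile(r"[0-9]{2}(?:[0-9]{2}){0,2}")


def validate_for_code(code, vocabulary=FOR_CODES):
    """Return a list of problems with a FOR code; an empty list means valid."""
    code = code.strip()
    if not CODE_PATTERN.fullmatch(code):
        return [f"{code!r} is not a 2-, 4- or 6-digit FOR code"]
    problems = []
    if code not in vocabulary:
        problems.append(f"{code} is not in the controlled vocabulary")
    # Every ancestor (the 2- and 4-digit prefixes) must also be a known code,
    # so a field cannot be filed under a division or group that does not exist.
    for length in range(2, len(code), 2):
        parent = code[:length]
        if parent not in vocabulary:
            problems.append(f"parent code {parent} of {code} is unknown")
    return problems


if __name__ == "__main__":
    for value in ["0608", "060801", "0699", "6"]:
        print(value, validate_for_code(value) or "OK")
```

A form could run this check when a field loses focus and show the returned messages next to the field. Returning a list rather than raising an exception lets several problems be reported at once.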
Usage Scenario: One-click repository ingest and controlled vocabularies

Jill Page[8], Professor of Environmental Science, James Cook University[9], leads a multidisciplinary team of researchers studying marine mammals; in particular, the Dugong (species: Dugong dugon)[10]. One area where Prof. Page’s team has excelled is using new information and communications technologies to remotely capture data about dugong movements and behaviours. For example, they have developed techniques for attaching GPS (Global Positioning System) transmitters to individual dugongs and recording their location and movement data for later analysis with GIS (Geographical Information Systems) software.[11] Furthermore, they have pioneered the use of blimps[12] and Unmanned Aerial Vehicles[13] (UAVs) to remotely record digital video of dugong populations and behaviours (see attached).

Thanks to these new data collection and analysis tools, Page and her team have created large volumes of data that is stored across many computers, storage devices and media. Worryingly for Page as team leader, this data has not been properly described, nor is it under long-term management.

Despite the poor management of data, many of Page’s publications are stored in the University institutional repository (JCU ePrints)[14], which uses the EPrints software[15]. Page encourages her team to submit research articles to the repository not only because there is evidence that doing so improves the impact of their research and contributes to community outreach, but also because she anticipates that the Australian government will eventually mandate the submission of publicly-funded research publications and data, and will possibly allocate research funds partly based on the statistics provided by the repository through the Excellence in Research for Australia (ERA) initiative.[16] Hence, she concludes that creating accurate metadata about research publications and data will be of major strategic importance for her team.

Although Page is convinced of the importance of submitting research publications to the repository, she wants better mechanisms to enable her research team to archive new research publications and primary datasets, and to ensure the metadata is accurate. Moreover, she wants research publications and data to be linked so that users can discover and download a publication together with its data. Finally, Page wants the whole submission process to be streamlined as much as possible: a ‘one-click’ process, as she describes it. Put simply, Page just wants to manually fill in the publication ‘title’ and ‘abstract’ fields in the Web form; the other information should be entered automatically from stored data, or entered from pick-lists, pull-down menus, and other user interface elements that are populated with information from controlled vocabularies and other authoritative sources.

Page’s requirements for a better repository submission process were based on her previous, mostly negative, experiences of submitting research publications to the University repository. Errors often arose because the Web form used in the submission process lacked basic data entry validation for key metadata fields. To fill these fields in correctly, Page had to cut and paste from various documents into the Web form. Even when finished, she lacked confidence that the information was correct. To fix these limitations, Page’s wish list includes:

1. A ‘smart’ Web form for repository metadata
That is, Web forms with pick-lists, pull-down menus, and other user interface elements that would assist the user to populate the form with information from controlled vocabulary registries. These might include, for example, information about:
• Researcher names, identifiers and affiliation information obtained from an institutional (LDAP) directory or a Researcher Name Registry[17]
• Field of Research (FOR) and Socio-Economic Objective (SEO) codes and descriptors obtained from an Australian and New Zealand Standard Research Classification (ANZSRC) registry[18]
• Unique identifiers for species obtained from a Life Science Identifiers (LSID) registry[19]
• Geospatial coverage and place names obtained from a national gazetteer service/registry
• Research collection information obtained from the Online Research Collections Australia (ORCA) Registry[20]
• …

2. A data product specification ‘wizard’
That is, a Web application, or wizard, that guides users through the creation of a standard data product specification for submission to a repository as a Submission Information Package (SIP). The wizard would include controlled vocabularies to assist users to fill in specific metadata fields (as in 1 above). The resulting wizard configuration/profile information would be stored and associated with the user’s identity information so that the configuration/profile can be easily reused. Similarly, local instances of metadata schemas and profiles would be regularly updated and maintained through a central metadata registry.

[8] Prof. Jill Page is a fictional identity; however, it is largely based on the profile and work of Prof. Helene Marsh. http://dugong.id.au/
[9] http://www.jcu.edu.au/ Although James Cook University is small by Australian and world standards, its proximity to the Great Barrier Reef and its affiliations with leading marine research groups mean that it is an important node within the worldwide network of marine mammal researchers.
[10] http://en.wikipedia.org/wiki/Dugong
[11] Pyper, Wendy. 2007. Getting a fast lock on dugong locations. Australian Antarctic Magazine 13: 26.
[12] Hodgson, Amanda. 2007. “BLIMP-CAM”: Aerial video observations of marine animals. Marine Technology Society Journal 41 (2): 39-43.
[13] Pyper, Wendy. 2007. Population survey pilots unmanned aircraft. Australian Antarctic Magazine 13: 25.
[14] http://eprints.jcu.edu.au/
[15] http://www.eprints.org/
[16] http://www.arc.gov.au/era/default.htm
[17] Possibly as part of the Australian Access Federation, http://www.aaf.edu.au/
[18] Not under development, but suggested to the ABS as a service that they should develop.
[19] http://lsids.sourceforge.net/
[20] http://www.apsr.edu.au/orca/index.htm Note that the ORCA Registry is the basis of the proposed ANDS Collections and Services Registry (see Figure 1).
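The pick-lists in item 1 of the wish list need a lookup behind them. The Python sketch below is a minimal, hypothetical version of that lookup, and it echoes the title of this paper. Each concept has one preferred label and any number of alternative labels, loosely modelled on the preferred/alternative label distinction found in SKOS. As a result, a user who types ‘sea cow’ is offered, and stores, the same concept as a user who types ‘dugong’. The identifiers (`urn:example:species:…`) and the three-species excerpt are invented placeholders, not real LSIDs or registry content.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A vocabulary concept: one preferred label plus any alternative labels."""
    identifier: str  # hypothetical identifiers, not real registry IDs
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)


# Hypothetical excerpt of a species vocabulary (illustration only).
SPECIES = [
    Concept("urn:example:species:1", "Dugong dugon", ["dugong", "sea cow"]),
    Concept("urn:example:species:2", "Megaptera novaeangliae", ["humpback whale"]),
    Concept("urn:example:species:3", "Chelonia mydas", ["green turtle"]),
]


def suggest(typed, concepts=SPECIES, limit=10):
    """Return (identifier, preferred label) pairs for a form's pick-list.

    A concept matches if the text typed so far is a case-insensitive prefix
    of any of its labels, or of any word within a label. Whichever label
    matched, the pick-list offers the preferred label, so 'sea cow' and
    'dugong' both resolve to the same concept.
    """
    needle = typed.strip().lower()
    if not needle:
        return []
    matches = []
    for concept in concepts:
        for label in [concept.pref_label] + concept.alt_labels:
            label = label.lower()
            if label.startswith(needle) or any(
                word.startswith(needle) for word in label.split()
            ):
                matches.append((concept.identifier, concept.pref_label))
                break  # one entry per concept, however many labels match
    return sorted(matches, key=lambda match: match[1])[:limit]


if __name__ == "__main__":
    print(suggest("sea c"))  # the 'sea cow' alternative label finds Dugong dugon
```

Only the identifier would be stored in the metadata record. The preferred label can always be regenerated from the registry, which keeps records consistent even when users know a species by different common names.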
Who would be the beneficiaries in this usage scenario?

Three groups would primarily benefit:
1. Producers and owners of the original research publications and data would have low cost/effort methods of creating metadata, while at the same time fulfilling some of the administrative requirements of their host institutions and research funding bodies. The aggregation of this metadata by third parties would make their work visible at national and international levels via search engines and discovery services. This would potentially improve its impact, and certainly its reach. The development of controlled vocabularies by specific research communities would also assist researcher cohesion and collaboration through standardized use of terms, categories and concepts.
2. End-users would have access to accurate information about research publications and data that had been described and organised using controlled vocabularies. The use of controlled vocabularies would enable users to navigate/browse through research collections using faceted browse and navigation functions.
3. Research funding organizations and managers would benefit through access to up-to-date information and statistics about research publications and data that adopted controlled vocabularies to ensure reliable and consistent metadata.

The Architectural Context of the Scenario

The reference architecture for this scenario is the one described in Towards the Australian Data Commons (TADC), which details a federated architecture for a national network of repositories and registries.[21] Following the TADC architecture, the usage scenario assumes that ‘repositories’ are functional entities separate from ‘registries’, although the two are inseparable in terms of the services they provide to end-users of the federation (see Figure 1 for an example).[22]

In the context of the TADC, ‘repositories’ are typically document-centric or data-centric. By this I mean that document-centric repositories (e.g. Fedora, DSpace and EPrints) typically hold research publications (e.g. PDF files) and associated digital objects (e.g. image and audio files), but little in the way of research ‘data’.[23] Nevertheless, these repositories are evolving to operate in a service-oriented environment and thus can communicate with any third-party ‘service’, including registries, via the standard W3C/OASIS Web Services stack, or via REST protocols and interfaces. In other words, they can be easily integrated with data-centric repositories, as long as both support the same interoperability frameworks and standards. Hence, I am assuming that the controlled vocabulary registry applications implied in the usage scenario would be ‘loosely coupled’ to repositories via Web Services/REST. It follows that the smart forms and data product specification wizards would take advantage of Web 2.0 technologies (REST, AJAX, etc.) to dynamically provide controlled vocabulary items to users when filling out Web forms.

[21] ANDS Technical Working Group. 2007. Towards the Australian Data Commons: A proposal for an Australian National Data Service. Canberra: Department of Education, Science and Training (DEST), Australian Government. http://www.pfc.org.au/twiki/pub/Main/Data/TowardstheAustralianDataCommons.pdf
[22] Note that because the ebXML Registry specification combines repository and registry functions, this scenario may need to be adapted to be more understandable to the ebXML community.
[23] Document-centric repositories generally follow the reference model established by the Consultative Committee for Space Data Systems (CCSDS) in the Reference Model for an Open Archival Information System (OAIS). See OAIS. 2001. Reference Model for an Open Archival Information System (OAIS). http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html
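The loose coupling assumed above can be sketched with nothing more than the Python standard library. The toy ‘vocabulary registry’ below answers HTTP GET requests of the hypothetical form `/vocab/<name>?q=<prefix>` with matching terms as JSON. The `lookup` function plays the role of the AJAX call that a smart form would make as the user types. The URL layout, the JSON shape and the tiny FOR excerpt are all assumptions made for illustration. A production registry would also need authentication, vocabulary versioning and fuller error handling.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, quote, urlparse
from urllib.request import urlopen

# Hypothetical vocabulary content served by the toy registry.
VOCABULARIES = {
    "for": {"05": "Environmental Sciences", "06": "Biological Sciences",
            "0608": "Zoology"},
}


class VocabularyHandler(BaseHTTPRequestHandler):
    """Answers GET /vocab/<name>?q=<prefix> with matching terms as JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        parts = url.path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "vocab" or parts[1] not in VOCABULARIES:
            self.send_error(404, "unknown vocabulary")
            return
        prefix = parse_qs(url.query).get("q", [""])[0]
        terms = [
            {"code": code, "label": label}
            for code, label in sorted(VOCABULARIES[parts[1]].items())
            if code.startswith(prefix)
        ]
        body = json.dumps(terms).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def start_registry(port=0):
    """Start the toy registry on localhost in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), VocabularyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def lookup(base_url, vocabulary, prefix):
    """Client side: fetch the terms whose codes start with `prefix`."""
    url = f"{base_url}/vocab/{quote(vocabulary)}?q={quote(prefix)}"
    with urlopen(url, timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    registry = start_registry()
    base = f"http://127.0.0.1:{registry.server_address[1]}"
    print(lookup(base, "for", "06"))
    registry.shutdown()
```

Because the form, the repository and the registry communicate only through HTTP and JSON, any of them could be replaced without changing the others. This is the property the loose-coupling assumption relies on.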
Also, to be clear, the assumption that vocabulary registries are loosely coupled to repositories does not preclude the option proposed by Rob Atkinson of creating local proxy versions of vocabulary data: indeed, these strategies are complementary.

In addition to some basic technical and administrative metadata, the metadata ingested into document-centric repositories is mostly descriptive or bibliographic information that users rely on for discovery and citation. The metadata standard typically used by document-centric repositories is the ‘unqualified’ version of the Dublin Core Metadata Initiative (DCMI) Dublin Core Metadata Element Set, Version 1.1.[24] However, the usage scenario described above would require ‘qualified’ DC metadata, which in turn would require community agreements about metadata profiles and interchange formats.

In contrast, the metadata required for data-centric repositories varies a great deal, as these repositories are often run along community- or discipline-specific lines and adopt local or de facto standards. A further complication is that many data-centric repositories support neither the standard W3C/OASIS Web Services stack nor REST protocols and interfaces.

[24] http://dublincore.org/documents/dces/
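Rob Atkinson's local proxy option can be sketched as a small cache that sits between a form (or repository) and the remote registry. The class below is a hypothetical illustration and is not part of any existing registry software. It serves a local copy of a vocabulary, refreshes that copy after a configurable time-to-live, and falls back to the last good copy if the registry cannot be reached. The sketch assumes that network failures surface as `OSError`, which covers `urllib`'s `URLError` and Python's `ConnectionError`.

```python
import time


class VocabularyProxy:
    """Local proxy copy of a remote vocabulary, refreshed after `ttl` seconds.

    `fetch` is any callable returning the whole vocabulary as a mapping of
    code -> label (for example, a function wrapping an HTTP call to a
    registry). If a refresh fails, the last good copy continues to be served,
    so data entry is not blocked while the registry is unreachable.
    """

    def __init__(self, fetch, ttl=3600.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._terms = None
        self._fetched_at = None

    def terms(self):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            try:
                self._terms = dict(self._fetch())
                self._fetched_at = now
            except OSError:
                if self._terms is None:
                    raise  # nothing cached yet, so the failure must surface
                # Otherwise keep serving the stale copy; the refresh will be
                # retried on the next call because _fetched_at is unchanged.
        return self._terms

    def label(self, code):
        """Return the label for `code`, or None if the code is unknown."""
        return self.terms().get(code)
```

Passing in `fetch` as a plain callable keeps the proxy independent of any particular registry protocol. Whether stale data should be served at all, and for how long, is a policy decision that each community would need to agree on.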
Figure 1: Conceptual view of a (simplified) TADC Architecture including a generic vocabulary registry
Australian Antarctic magazine issue 13: 2007

Getting a fast lock on dugong locations

New generation satellite tag technology that can locate and record the position of tagged animals faster and more efficiently than previous instrumentation, promises to vastly improve scientific understanding of dugong movement and habitat use.

Through the Australian Centre for Applied Marine Mammal Science, Dr Ivan Lawler of James Cook University, and Mr Dave Holley of Edith Cowan University, will test the ability of new ‘Fastloc®’ GPS (Global Positioning System) technology (developed by Wildtrack Telemetry Systems Ltd, UK) to track the fine scale movements of dugongs in deep water and sub-tidal seagrass meadows.

Dugongs have traditionally been tracked with standard GPS tags, which need to remain above the water’s surface long enough to download ‘ephemeris’ data relating to the positions of the passing GPS satellites. The longer a tag is submerged between one position fix and the next, and the further the animal travels before resurfacing, the longer it takes to record the next position. In practice, this often means that the dugong (and tag) re-submerges before a location is calculated, leaving significant gaps in the data. Fastloc® tags, in contrast, do not download ephemeris data and need only 0.02 seconds at the surface to record data that can be processed to provide an animal’s position.

‘When dugongs are in deep water and/or moving quickly, we get fewer location fixes using standard GPS technology, because the tags do not breach the surface for long enough,’ Dr Lawler says.

‘This introduces a serious bias that can interfere with modelling of dugong habitat use and our ability to detect migratory corridors.

‘If we don’t know what routes dugongs take when they move between areas, we don’t know what threats – such as nets – they could potentially be exposed to, and we can’t assess the importance of deep water seagrass beds to the animals. This has implications for the conservation and management of both dugongs and their habitat.’

The research team will test the effectiveness of Fastloc® tags in two very different habitats – Shoalwater Bay in central Queensland and Shark Bay in Western Australia. Both areas are important for dugong conservation. However, Shoalwater Bay has a high tidal range of 7-8 m while Shark Bay has a tidal range of 1.7 m.

‘The habitat use of dugongs within inshore seagrass meadows is poorly understood at low tide because the animals are in deeper water than at high tides when they move up into the intertidal shallows,’ Dr Lawler says.

‘So fewer locations are received from dugongs at low tides than at high tides. We’ll compare the frequency of location fixes between these two areas and if similar numbers of locations are received in both habitats it will demonstrate that the Fastloc® system can acquire position fixes from animals in deep water.’

The tags will also be tested for their ability to acquire location fixes from dugongs moving rapidly between seagrass habitats in different bays.

The tag units will be deployed on five dugongs in each region for 2-3 months, along with time-depth recorders to measure the animals’ dive profiles. Tags will be attached to the tail of the dugong via a harness with a remotely triggered release. The Argos satellite system will then be used to locate the tag and to decode the dugong location information recorded by it.

WENDY PYPER
Information Services, AAD

Photo captions:
A Fastloc® tag, similar to this one produced by Wildlife Computers in the US, but with a dugong-specific housing that allows the tag to be tethered to dugongs’ tails, will be used to track the fine scale movements of dugongs in deep water and sub-tidal seagrass meadows.
A dugong is released with its tag (a traditional GPS unit) attached.
A dugong is restrained during attachment of a tag to its tail.
Photo credits: Wildlife Computers; Paul Lavery; Judy Davidson
MARINE MAMMAL SCIENCE

Population survey pilots unmanned aircraft

Robotic aircraft or ‘Unmanned Aerial Vehicles’ (UAVs) could soon take to the skies in the name of marine mammal research, if a pilot project to test the technology succeeds.

Through the Australian Centre for Applied Marine Mammal Science, Dr Amanda Hodgson and Dr Michael Noad, of the University of Queensland, will conduct and compare traditional manned and UAV surveys of dugongs and humpback whales, to test whether UAVs can improve the safety, cost-effectiveness and accuracy of marine mammal population surveys.

‘Aircraft hire and personnel costs mean that traditional manned aerial surveys are expensive, and eight people have died over the past 20 years after aircraft crashed during aerial surveys,’ Dr Hodgson says.

‘So we want to determine whether UAVs offer a better way of monitoring marine mammal populations, by reducing the cost and the risk, and by increasing the accuracy of species detection, location and identification using on-board imaging technology.’

UAVs have been around since the 1950s and developed for a range of applications including defence, weather research, and search and rescue. They are largely untested in wildlife research, but they have the potential to be used at night – with infrared cameras attached – or in extreme environments. Their lower cost would also enable more aerial surveys to be conducted, improving population estimates.

The research team will use a large (5 m wingspan), commercially available UAV, supplied by Aerocam Australia and equipped with video and still cameras.

‘A larger UAV can carry more equipment and a lot more fuel – allowing us to cover the greater distances necessary for whale surveys,’ Dr Noad says.

The first phase of the project will test the basic capabilities of the UAV for viewing and surveying marine mammals. It will ask a range of questions, including: does the UAV provide video and still images that can be easily analysed by researchers or image analysis programs; what is the optimal camera height and system for different species; can images be viewed in real time to enable operators on the ground to alter the flight path when animals are sighted; and how much post-flight analysis of images is required?

Dugongs and humpback whales are being targeted as they live in different environments, are sighted using different cues from the air, and have very different movement habits and aggregation patterns.

‘Dugongs sometimes congregate in large herds of up to 300 individuals, and need to be circled to be counted,’ Dr Noad says.

‘Migrating humpback whales usually travel singly or in pairs, and often you just see their blows before they submerge again. They’re spread out on a long migratory path, so you have to cover quite a bit of ocean to find them.’

For dugongs, the UAV will fly transects over Moreton Bay and Hervey Bay, in south-east Queensland, and when a herd is sighted – through the live video link – researchers will take over the controls and circle the herd to get an accurate count.

Humpback whales will be located during their winter migration past North Stradbroke Island, and the UAV will again be tested at varying heights above the animals. Still and video images will then be compared to see if there is any advantage of one over the other.

‘Still images will likely have a better resolution than video images, but it may be easier to detect whales from movement in the video,’ Dr Noad says.

If this first phase of the project proves successful, the researchers will move on to the second phase – to directly compare the results of UAV surveys with manned surveys.

The scientists admit this is a high-risk project. But even if the technology does not prove adequate today, with the pace of development, it may be in just a few years’ time.

‘In the medium to long term, smaller UAVs could reduce the cost of flights to just a few dollars an hour, while better imaging software could negate the need for human analysis at all,’ Dr Noad says.

WENDY PYPER
Information Services, AAD

Aerocam’s ‘Shadow’ specs
Wingspan: 5.2 m
Length: 2.9 m
Max weight: 90 kg
Fuel load: 12-24 l
Max range: 1500 km
Endurance: 3-8 hr
Speed: 160-200 km/hr
Max payload: 25 kg

Photo caption: Aerocam’s UAV ‘Shadow’
Photo credits: Joshua Smith and Michael Noad; Aerocam Australia