Digital Data Delights: 50 years of bits and bytes - Hersh Mann and Louise Corti UK Data Archive
Digital Data Delights: 50 years of bits and bytes Hersh Mann and Louise Corti UK Data Archive University of Essex 50th Anniversary University of Essex 13 September 2014
Overview • What are Socioeconomic data? • Why the need for an archive? • Why the University of Essex? • The Archive through the decades • Types of data and media over time • What types of data collections does the Archive hold? • The evolution of our services as technology changes • New data landscapes • The future of the UK Data Archive?
What do we mean by ‘data’? • Quantitative • Statistics • Census data • Survey microdata • Macrodata • Qualitative • Historical documents • Diaries • Interview transcripts • Field notes • Audio recordings • Photographs and video We are now seeing the emergence of new forms and sources of data, e.g. administrative data, big data
Survey microdata Percentage of party supporters who believe large numbers of people falsely claim benefits British Social Attitudes Survey 2010 (weighted data)
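A weighted percentage of the kind shown in this chart can be sketched as follows. This is a minimal illustration only: the records, the field names and the helper weighted_percentage are hypothetical and are not taken from the British Social Attitudes Survey.

```python
from collections import defaultdict

# Hypothetical respondent records (not real survey data): each has a party,
# a yes/no answer to the question, and a survey weight.
respondents = [
    {"party": "A", "agrees": True, "weight": 1.5},
    {"party": "A", "agrees": False, "weight": 0.5},
    {"party": "B", "agrees": True, "weight": 1.0},
    {"party": "B", "agrees": False, "weight": 3.0},
]


def weighted_percentage(records):
    """Return {party: weighted percentage of respondents who agree}."""
    agree = defaultdict(float)  # sum of weights of agreeing respondents
    total = defaultdict(float)  # sum of weights of all respondents
    for r in records:
        total[r["party"]] += r["weight"]
        if r["agrees"]:
            agree[r["party"]] += r["weight"]
    return {party: 100.0 * agree[party] / total[party] for party in total}


result = weighted_percentage(respondents)
print(result)
```

Each respondent counts in proportion to their weight rather than as one person, so that the sample better reflects the population it was drawn from.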
Trends in domestic burglary, 1981-2011/12 Crime Survey for England and Wales Figure 8 from ‘Crime in England and Wales Quarterly First Release, March 2012’ www.ons.gov.uk
The types of data that underlie these outputs need to be preserved and made available for secondary analysis. So…
Planning for an archive • The Social and Economic Archive Committee (SEAC) was established in 1963 to tackle the problem of data being ‘lost’ to British researchers because poor communication was leading academics to replicate work that was costly and time-consuming • SEAC was hosted by Political and Economic Planning (PEP) • Funding was received from the London School of Economics (LSE) and the new Social Science Research Council (SSRC) • SEAC was well supported and well connected and aimed to establish an archive for social research
The contenders • Three proposals were submitted • University of Essex • PEP • SSRC • (Strathclyde had been considered a candidate but did not submit a proposal) • The PEP plan was not worked out in detail and mostly relied on the argument that such a national resource should be located in the capital • The SSRC bid was more interesting and argued that the data should be preserved by the funders because they are better placed than a university to obtain data and would operate ‘neutrally’ across the sector
Let’s gang up on Essex • The drawbacks of the SSRC bid were the higher costs of locating in London and their lack of computing facilities • To counter this problem the leaders of the SSRC bid teamed up with Claus Moser at the LSE, who was strongly opposed to the new archive being housed at Essex • They failed • SEAC chose Essex because: • it could be established quickly • it provided value for money • it had the office space • it had the computing facilities
The only way is Essex • Essex was invited to submit an application • Allen Potter (Head of the Department of Government) was named as the Principal Applicant • £33,000 over 5 years • £4,500 p.a. for staff costs • £500 for travel • £1,000 p.a. for magnetic tapes • The SSRC Data Bank was set up at the University of Essex in 1967 "Data [are] deposited with the Bank on a wide range of topics including such intriguing questions as 'Did you go on a school visit to a coal mine?'" Wyvern, 16 February 1968
1960s - The Data Bank “The creation of Britain’s first memory bank on the computer at the University of Essex is a tremendous feather in the cap of the University…Hitherto the fame of the town of Colchester has rested upon the past, specifically its Roman background. From now on it will rest equally, if not more so, on the University” Colchester Gazette, 7 February 1968
1970s – The Survey Archive • The 1970s was a period of growth in empirical social science research. By the mid-1970s approximately £50 million per annum was being spent on social research in Britain, half in universities and the rest in ‘in-house’ government research and independent research units. • In its early years the Data Bank experienced difficulties in populating its collection due to: • an immature culture of data sharing • the high standards it required from deposits • restrictions on use attached to certain studies particularly government surveys • The turning-point came in the early 1970s when the Government Statistical Service enabled government surveys to pass to the Survey Archive, as the Data Bank had been renamed in 1972.
1980s - The SSRC/ESRC Data Archive • The Survey Archive was renamed the SSRC Data Archive in 1982 to reflect the broader range of data resources being collected and stored • At this time the work of the SSRC was reviewed by the Government. The Rothschild Report supported a stronger focus on empirical research and research considered to be of ‘public concern’. This led to the SSRC being renamed, becoming the Economic and Social Research Council (ESRC) in January 1984. This resulted in a second name change for the Archive in two years: ESRC Data Archive! • Whilst the 1980s could be seen as a low point for the social sciences, in retrospect pressures on funding had both negative and positive impacts on the Archive. Less was spent on primary data collection, yet this in turn encouraged increased secondary use of research data and a greater acceptance of data sharing. • The 1980s also saw the Archive branch out through its involvement in a number of large co-operative data-orientated projects, chief among them the Domesday Project and the Rural Areas Database. This set a trend which has continued up to the present.
1990s - The Data Archive • The 1993 White Paper on Science and Technology led to an emphasis on wealth creation and the need to establish a closer and deeper partnership between academics and the users of their research. In line with this, the 1990s witnessed an extension of our activity. • In 1992 the History Data Unit was formed as a specialist unit within the Archive, becoming part of the Arts and Humanities Data Service (AHDS) • In 1996 direct funding from the Joint Information Systems Committee (Jisc) was received in recognition of the support provided by the Archive for teaching and learning. This led to a new name and a logo that complemented that of the University.
2000s - UK Data Archive • To reflect both its UK-wide remit and the importance of its role within the international data network, we became the UK Data Archive (UKDA) • A new initiative, the Economic and Social Data Service (ESDS), came into operation in 2003. It brought together the Archive and the Institute for Social and Economic Research (ISER) at Essex, and the Cathie Marsh Centre for Census and Survey Research (CCSR) and Manchester Information and Associated Services (MIMAS), both located at Manchester. • In recognition of its position in disseminating and preserving an increasingly diverse collection of government data, from 1 January 2005 the UKDA became a designated Place of Deposit for public records for The National Archives (TNA), thus making the deposit of materials a legal requirement for the first time, and thereby ensuring the supply of key social surveys for future research. • In 2007, the 40th anniversary year, together with ISER, the UKDA moved into a new purpose-built social science research centre
2010s - UK Data Archive • New look • New services • Launch of the UK Data Service • Big data network
Our Directors over time
The colours, buildings, computing media and hair styles change dramatically…
We have gone from this…
…to this, as we adapt to technological changes and embrace the digital age…
Digital age As technology advances, so must we. The history of the Archive is tied to the history of computing
Inside our ‘data factory’ over time the process has remained pretty much the same! “The greatest misconception about survey archives is the belief…that when data…arrive… their transfer is complete” Allen Potter
A united UK Data Service? • a comprehensive resource funded by the Economic and Social Research Council (ESRC) • a single point of access to a wide range of secondary social science data • support, training and guidance throughout the data life cycle • listen to our recorded webinars at http://ukdataservice.ac.uk/news-and-events/videos.aspx
UK Data Service Integrates ESDS, Survey Question Bank and Census.ac.uk ukdataservice.ac.uk
What does the UK Data Service do? • put together a collection of the most valuable data and enhance these over time • preserve data in the long term for future research purposes • make the data and documentation available for reuse • provide data management advice for data creators • provide support for users of the service • provide information about how data are used • offer easy access through the website
Who is it for? • academic researchers and students • government analysts • charities and foundations • business consultants • independent research centres • think tanks • citizen scientists, where skills enable analysis
Our data portfolio • UK Surveys: Large-scale government funded surveys • Longitudinal: Major UK surveys following individuals over time • International: Multi-nation aggregate databanks and survey data • Census: Census data 1971 – 2011 • Business: Microdata and administrative data • Qualitative: Range of multimedia qualitative data sources
How many data collections are there in the UK Data Service catalogue? A. 4,800 B. 5,200 C. 5,700 D. 6,200 E. 7,300
UK survey series • high quality repeated cross-sectional surveys • Individual or household level data • cover many topics including health, work, crime, social attitudes, family expenditure, living costs, housing etc. • Labour Force Survey • Crime Surveys • Health Survey for England • British Social Attitudes • Annual Population Survey ….
Longitudinal studies • British Household Panel Survey and Understanding Society • 1958, 1970, 2000-01 Birth Cohorts • English Longitudinal Study of Ageing • Families and Children Study • Growing Up in Scotland • Longitudinal Study of Young People in England
International macrodata • time series data aggregated to country/region • International governmental organisations (IMF, OECD, IEA, World Bank) • wide range of socio-economic topics • regularly updated • currently limited to UK HE/FE institutions • World Bank data are open access
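As a rough illustration of what "aggregated to country/region" means, the sketch below collapses individual observations into one value per country and year. The observations, the choice of a simple mean and the function name aggregate_by_country_year are assumptions made for this example; they do not describe how any of the organisations listed above compile their series.

```python
from collections import defaultdict

# Hypothetical observations (not real data): (country, year, value).
observations = [
    ("UK", 2007, 10.0), ("UK", 2007, 14.0), ("UK", 2008, 12.0),
    ("France", 2007, 8.0), ("France", 2008, 9.0), ("France", 2008, 11.0),
]


def aggregate_by_country_year(rows):
    """Return {(country, year): mean value}, one entry per country and year."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for country, year, value in rows:
        sums[(country, year)] += value
        counts[(country, year)] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


macrodata = aggregate_by_country_year(observations)
for (country, year), value in macrodata.items():
    print(country, year, value)
```

The result is a small time series per country: the individual records are no longer visible, only the aggregate figures.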
Graph: French snail imports by country, trade value in US$ thousands. Data: UN COMTRADE, 2008. Graph: Celia Russell
UK census data • 1971-2011 census data • baseline for other statistics • detailed combinations of characteristics • small geographies • Census outputs • aggregate data • boundary data • flow data • microdata • aggregate data is open access • some restricted to UK HE/FE
Qualitative data Qualitative data in a number of different formats: interview transcripts, visual data, focus groups, essays, diaries, online data, observation notes, documents, audio data, open-ended survey questions, case notes etc. Examples of sociology data collections: • Family Life and Work Experience before 1918, Middle and Upper Class Families in the Early 20th Century, 1870-1977 (SN 5404) • Gender Difference, Anxiety and the Fear of Crime, 1995 (SN 4581) • Mothers Alone: Poverty and the Fatherless Family, 1955-1966 (SN 5072) • Affluent Worker in the Class Structure, 1961-1962 (SN 6512)
QualiBank
Another example of qualitative data Ray Pahl, SN 4867: School Leavers Study, 1978 Teachers at a comprehensive school on the Isle of Sheppey were asked to set a particular essay to those pupils who were students in English lessons about ten days before they were due to leave school. The students were asked to imagine that they were nearing the end of their life, and that something had made them think back to the time when they left school. They were then asked to write an imaginary account of their life over the next 30 or 40 years. The resulting data: 141 handwritten essays in 1978 by school leavers aged 15 and 16 years old. These can be browsed online.
Links with other data archives worldwide
Some statistics about our Service • 6,000 datasets in the collection • 400 new datasets and new editions added each year • 23,000 registered users • 60,000 downloads worldwide per annum • 4,000+ user support queries per annum
What do our users do with the data? • Comparative research, restudy or follow-up study • Re-analysis/secondary analysis • Research design and methodological advancement • Replication of published statistics • Teaching and learning
Expert advice on creating high quality data • We have supported research funder data policies since 1995 • We advise and support grant applicants and award holders • We write guidance for applicants and Data Management Planning (DMP) reviewers • We provide detailed training • We have published the first researcher-oriented textbook on this topic
Our managing and sharing data resources • Online ukdataservice.ac.uk/manage-data • Managing and Sharing Research Data – a Guide to Good Practice: www.uk.sagepub.com/books/9781446267264 (SAGE Publications) • Training programme
On our horizon • More data that can be linked to our more traditional data • Big data • Cloud computing • Mobile access to data • Access to powerful data through safe settings
What do we mean by ‘big data’? Big data is a buzzword used to describe massive volumes of data, structured and unstructured, within organisations that are so large and move so quickly that they exceed current processing capabilities. Big data have the potential to help society improve operations and allow us to make faster, more intelligent decisions. Nicole Miskelly, bobsguide, 8 August 2014 http://www.bobsguide.com/guide/news/2014/Aug/8/is-big-data-the-new-normal.html
Big data – the three V’s High-volume • Transaction-based data stored over the years • Unstructured data streaming from social media channels • Huge amounts of sensor and machine-to-machine data
Big data – the three V’s High-velocity • Data are streaming at high speeds and need to be processed quickly • This is a challenge for many organisations
Big data – the three V’s High-variety • Data come in structured and unstructured formats • Numeric data in traditional databases are usually structured • Text documents, email, video, audio and financial transactions are all unstructured • Hard to govern, merge and manage these different varieties • Formats • Licensing • Dissemination
Three significant changes in big data • Lower costs • Cloud storage • Technological advancement of open-source software “The cost to store a gigabyte of data is ten times cheaper than it was ten years ago. Open-source tools also now allow users to use commodity software and link together inexpensive computers instead of having to buy one big expensive server and Cloud Computing has enabled companies to borrow servers rather than having to buy and maintain them, which means they can just pay for what they use and then give them back.” Karl Rieder (Executive Consultant, GFT UK Limited)
How can big data help? • Can provide organisations with the ability to harness relevant data and analyse it to find answers • Examples in business might be: • Optimisation of processes • Reduced costs through efficiency • New product development • Smarter business decision making
Will it be possible to share data collected by your iPhone?
What is our future? • New forms of data are emerging • Technology watch • Collaboration • Enable safe and trusted access to data • We have much more computing power in our pockets today than the University had when it was founded. What will the picture be like when the University is 100? What types of data will the UK Data Archive have in 2064?
UK Data Archive Media Exhibition
Keep connected • Subscribe to UK Data Service list: www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKDATASERVICE • Follow UK Data Archive on Twitter: @UKDataArchive • Follow UK Data Service on Twitter: @UKDataService • Facebook: https://www.facebook.com/UKDataService • YouTube: www.youtube.com/user/UKDATASERVICE
Acknowledgements • Many thanks to Andrew Harrison, Maths@Essex for his slides on new technologies
Contact UK Data Archive http://www.data-archive.ac.uk/contact UK Data Service http://ukdataservice.ac.uk/help/get-in-touch.aspx
Questions?