Protecting privacy in practice The current use, development and limits of Privacy Enhancing Technologies in data analysis
Protecting privacy in practice: The current use, development and limits of Privacy Enhancing Technologies in data analysis
Issued: March 2019 DES5759
ISBN: 978-1-78252-390-1
The text of this work is licensed under the terms of the Creative Commons Attribution License which permits unrestricted use, provided the original author and source are credited. The license is available at: creativecommons.org/licenses/by/4.0. Images are not covered by this license.
This report can be viewed online at: royalsociety.org/topics-policy/projects/privacy-enhancing-technologies
Cover image © from2015.
Contents

Foreword
Executive summary
Recommendations
Summary table
Introduction
  Background – previous Royal Society reports
  Purpose
  Target audience
  Scope
  Methodology
  Key terms and definitions
Chapter one – Data sensitivity and protections: what are the issues at stake?
  1.1 Data sensitivity
  1.2 Privacy risks and the data analysis pipeline
  1.3 The legal context for data processing: personal data and the GDPR
Chapter two – PETs today: capabilities and limitations
  2.1 PETs and privacy-preserving data analysis
  2.2 Example PETs, capabilities and limitations
  2.3 Privacy in practice – Privacy-preserving machine learning
Chapter three – The road to adoption
  3.1 Further research and development
  3.2 Awareness raising and quality assurance
  3.3 Adoption within a wider business framework
  3.4 Consider the wider markets
Appendix
  Index of case studies
  Working Group members
  Royal Society staff
  Reviewers
  Workshop participants
Foreword

This report comes in the midst of a period of rapid developments in the collection, analysis and use of data. We are becoming ever more aware of the social, research and business benefits of accessing and using the data generated through everyday activities. We need to ensure that when such data is collected and used it is done so for good reasons, in a well-governed way and so that sensitive personal data or valuable commercial data is adequately protected.

Privacy Enhancing Technologies (PETs) offer an avenue for enabling that well-governed access. The evolving role of PETs is in ensuring that, when we have good reasons and the legal and ethical grounds to access and use data, we can do so while protecting that data and the individuals and organisations it relates to. We aim in this report to explore that role and bring it to the attention of those who hold, analyse and use data.

The field of PETs development is likewise moving quickly, and this report captures a moment in time where the technologies are maturing and opportunities to use these technologies are beginning to emerge. It may be that some of the technologies surveyed here do not achieve their promise in the near term, or that the costs of adoption prove prohibitive, or that other technologies not explored in depth might leapfrog them. However, our aim here is to help raise awareness of the potential of these technologies so that we can inspire further research into their development, spurred by identifying the opportunities where they can be put into practice. We also aim to highlight their practical and technical limitations and to note that there is no technology that replaces the need for good governance and proper business practice relating to the use of data.

We hope that this report adds to the lively debate on the topic of data use and data governance, and complements other work assessing the technological readiness levels of PETs. Our aim is that it will be an important part of conversations between researchers, government and industry on the future use cases for PETs that can both drive research forward and enable everyone to access social benefits from data.

Professor Alison Noble FREng FRS
Chair, Privacy Enhancing Technologies Working Group
Executive summary

The scale and rate at which data is collected, used and analysed is rapidly increasing, offering significant new and developing benefits to society and the economy. However, realising the full potential of large-scale data analysis may be constrained by important legal, reputational, political, business and competition concerns. These concerns arise because the use of data creates a set of social and ethical tensions and risks: in particular there is a balancing act between realising the benefits of data analysis versus protecting sensitive data and the interests of the individuals and organisations it relates to. The failure to adequately address privacy risks may damage trust and limit the realisation of the benefits that can be delivered by data-enabled technologies.

Certain risks can potentially be mitigated and managed with a set of emerging technologies and approaches often collectively referred to as 'Privacy Enhancing Technologies' (PETs). Whilst cybersecurity is focussed on protecting data so that other people cannot access it, PETs, in data analysis, are focussing on enabling the derivation of useful results from data without giving other people access to all of the data. This nascent but potentially disruptive set of technologies, combined with changes in wider policy and business frameworks, could enable significantly greater sharing and use of data in a privacy-preserving, trustworthy manner. It could create new opportunities to use datasets without creating unacceptable risks. It also offers great potential to reshape the data economy, and to change, in particular, the trust relationships between citizens, governments and companies.

The field of PETs is rapidly evolving. However, currently, many of the most promising tools, whilst having a rich research heritage, are relatively new to real-world applications. As such there remain a number of important unanswered questions: What are concrete trade-offs in real-world applications? How mature are different PETs? What opportunities do they present and what are their limitations? How can government and industry accelerate their uptake and make the most of their potential?

This report provides a high-level overview of current PETs, and the roles that they can play, in order to inform applied data science research and government departments' digital strategies as well as those of business. It also considers how PETs sit within wider governance frameworks that are intended to enable the beneficial use of data in an ethical and well-governed way.

This report also aims to prompt reflection about the use of technology in governance and to encourage regulators to consider new ways to approach privacy risks, including the use of PETs. To this end, this document provides an evidence base – including a set of case studies that capture concrete example uses for each of the five PETs considered in this report – and signposts to further resources.

Finally, this report includes recommendations on how the UK could fully realise the potential of PETs and allow their use on a greater scale.

The field of PETs development is moving quickly. This report looks at five interrelated and heterogeneous approaches within a broad field and there is no intention to suggest that these particular technologies will develop earlier or get more uptake than others. However, this report is intended to raise awareness of the potential of this diverse field of technologies and approaches and ways that they could be applied, in order to encourage further research into their development and to inform future policy conversations about the development and use of PETs.
Recommendations

Recommendation 1 – Accelerate the research and development of PETs.
Funders, government, industry and the third sector can work together to articulate and support the development of cross-sector research challenges, alongside providing continued support for fundamental research on PETs.

Recommendation 2 – Promote the development of an innovation ecosystem.
UK Research and Innovation (UKRI) have a role in encouraging data-handling companies to engage with the start-ups and scale-ups developing PETs, to support research and early trials. This will help UK investors and businesses realise the extent of the market opportunity for PETs.

Recommendation 3 – Drive the development and adoption of PETs.
Government can be an important early adopter, using PETs and being open about their use so that others can learn from their experience. Government departments should consider what existing processing might be performed more safely with PETs and how PETs could unlock new opportunities for data analysis, including opening up the analysis of sensitive datasets to a wider pool of experts whilst fully addressing privacy and confidentiality concerns.

Recommendation 4 – Support organisations to become intelligent users of PETs.
There is a need for Government, public bodies and regulators to raise awareness further and provide guidelines about how PETs can mitigate privacy risks and address regulations such as GDPR. For example, the Information Commissioner's Office (ICO) should provide guidance about the use of suitably mature PETs to help UK organisations minimise risks to data protection, and this should be part of the ICO's Data Protection Impact Assessment guidelines. Such guidance would need to cover how PETs fit within an organisation's overall data governance infrastructure, since the use of PETs in isolation is unlikely to be sufficient.

Recommendation 5 – Give public sector organisations the level of expertise and assurance they need to implement new technological applications, enable a centralised approach to due diligence, and assure quality across the board.
The National Cyber Security Centre should act as a source of advice and guidance on the use of suitably mature PETs, as part of a network of expert organisations. Such a network of expertise would support the development and evolution of best practices and also provide access to advice on specific cases of data use or sharing. Ultimately, this could also serve as a point of engagement for academics and industry bodies working in the space and provide a portal from which private sector organisations interested in learning about PETs could access information on existing case studies.

Recommendation 6 – Create the skilled workforce needed to develop and implement PETs.
Funding should be made available so that the capacity to train UK PhD and Master students in cryptography, statistics, systems engineering and software development increases with the level of demand for well-trained, high-calibre candidates. This could be an outcome of the National Cyber Security Programme and the cybersecurity centres of excellence scheme by the Engineering and Physical Sciences Research Council. Universities should consider adding privacy engineering to the curriculum of software engineering and data science courses, treating the need to protect data as core knowledge in data analysis.

Recommendation 7 – Promote human flourishing by exploring innovative ways of governing data and its use that are enabled by PETs.
The Department for Digital, Culture, Media and Sport (DCMS), the Centre for Data Ethics and Innovation (CDEI), Office for AI, regulators and civil society should consider how PETs could become part of the data stewardship infrastructure, underpinning governance tools such as 'data trusts' and other initiatives for the governance of data use.
Summary table (a)

Trusted Execution Environments
• Type of privacy: securely outsourcing to a server, or cloud, computations on sensitive data; safely providing access to sensitive data.
• Privacy risk addressed: revealing sensitive attributes present in a dataset.
• Data protected: in storage; during computing.
• Benefits: commercial solutions widely available; zero loss of information.
• Current limitations: many side-channel attacks possible.
• Readiness level: product.

Homomorphic Encryption
• Type of privacy: securely outsourcing specific operations on sensitive data.
• Privacy risk addressed: revealing sensitive attributes present in a dataset.
• Data protected: in storage; during computing.
• Benefits: can allow zero loss of information; FHE* can support the computation of any operation.
• Current limitations: FHE currently inefficient, but SHE* and PHE* are usable; highly computationally intensive, with bandwidth and latency issues; running time; PHE and SHE support the computation of limited functions; standardisation in progress.
• Readiness level: PHE: product; SHE: pilot; FHE: research – proof of concept.
• Qualification criteria: specialist skills; custom protocols; computing resources.

Secure Multi-Party Computation
• Type of privacy: enabling joint analysis on sensitive data held by several organisations.
• Privacy risk addressed: revealing sensitive attributes present in a dataset.
• Data protected: during computing.
• Benefits: no need for a trusted third party – sensitive information is not revealed to anyone; the parties obtain only the resulting analysis or model.
• Current limitations: highly compute and communication intensive.
• Readiness level: pilot; PSI*, PIR*: proof of concept.
• Qualification criteria: specialist skills; custom protocols; computing resources.

Differential Privacy
• Type of privacy: organisation releasing statistics or derived information – generally an organisation that holds a large amount of data.
• Privacy risk addressed: dataset or output disclosing sensitive information about an entity included in the dataset.
• Data protected: at point of dataset or result disclosure.
• Benefits: formal mathematical proof / privacy guarantee; the user can set the level of protection desired, in particular by reasoning about the number of times the data might be queried.
• Current limitations: noise and loss of information, unless datasets are large enough; setting the level of protection requires expertise.
• Readiness level: product.
• Qualification criteria: specialist skills; custom protocols; very large datasets.

Personal Data Stores (b)
• Type of privacy: individual managing with whom and how they share data; de-centralising services that rely on user data.
• Privacy risk addressed: undesired sharing of sensitive information.
• Data protected: at point of collection; during computing (locally).
• Benefits: gives full control to individuals; removes the risk of attacks on 'honeypots' of centralised data; analysis can be run locally.
• Current limitations: impracticality of individual controlling data sharing with many parties.
• Readiness level: product – pilot.

Key: FHE: Fully Homomorphic Encryption. SHE: Somewhat Homomorphic Encryption. PHE: Partial Homomorphic Encryption. PIR: Private Information Retrieval. PSI: Private Set Intersection.

(a) This table is intended as a guide to the reader of this report, to help understand the five PETs covered in the report in their current state of development as of March 2019. It is not an exhaustive taxonomy of PETs.
(b) Unlike the other four PETs covered in the report, which are tools for privacy-preserving computation, Personal Data Stores are consumer-facing apps and services which can be supported by different kinds of PETs. They provide an example of one of the goals for PETs – enabling people to have more control over data.
Introduction

"PETs allow the derivation of useful insights from data, without requiring full data access."

Background – previous Royal Society reports
The amount of data we generate, collect and use and the power of analysis to draw insights from it are increasing rapidly. This significant change in data-enabled technologies has been the basis for three major Royal Society reports in recent years. Progress and research in cybersecurity1 made the case that trust is essential for growing and maintaining participation in the digital society. Organisations earn trust by acting in a trustworthy manner: building systems that are reliable and secure, and treating people, their privacy and their data with respect. It argued that, as part of this, organisations need to provide credible and comprehensible information to help people understand and assess how secure these systems are.

That report was followed by the Royal Society and British Academy's joint report Data management and use: governance in the 21st century2 which identified a number of social and ethical tensions that arise out of the changing ways that we use data. Several of these relate to the use of (personal or otherwise) sensitive data:
• Making use of the data gathered through daily interaction to provide more efficient services and security, whilst respecting the presence of spheres of privacy.
• Providing ways to exercise reasonable control over data relating to individuals whilst encouraging data sharing for private and public benefit.
• Incentivising innovative uses of data whilst ensuring that such data can be traded and transferred in mutually beneficial ways.

That report recommended a principled approach to resolving these tensions and the need for stewardship of data use. It highlighted that governance challenges go beyond the remit of data protection laws. In line with the report's recommendation that a new stewardship body be created to oversee the whole landscape of data use, several initiatives have been established in the UK. In particular, the UK Government has created a Centre for Data Ethics and Innovation and the Nuffield Foundation has established the Ada Lovelace Institute. Both will have a role in exploring the social, ethical and regulatory issues that arise from new uses of data and the practical means of addressing them. However, the extent to which the tensions identified in the 2017 Royal Society and British Academy report can also be resolved through technological means is a key question for this current project.

The Royal Society Machine learning: the power and promise of computers that learn by example3 report called for a new wave of machine learning research, including technical solutions that can maintain the privacy of datasets, whilst allowing them to be used in new ways by different users. That report referenced differential privacy and homomorphic encryption as promising lines of research. Here we explore these technological approaches, along with others, in more detail. Such technologies and approaches are often collectively referred to as Privacy Enhancing Technologies (PETs).

1. The Royal Society 2016 Progress and Research in Cybersecurity: supporting a resilient and trustworthy system for the UK (see https://royalsociety.org/topics-policy/projects/cybersecurity-research/, accessed 12 February 2019)
2. The British Academy and The Royal Society 2017 Data management and use: Governance in the 21st century (see https://royalsociety.org/topics-policy/projects/data-governance/, accessed 12 February 2019)
3. The Royal Society 2017 Machine learning: the power and promise of computers that learn by example (see https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf, accessed 12 February 2019)
Purpose
Previous reports all show that data is an important resource and new technologies enable us to use it to significant benefit. However, collecting and using such data creates risks for the data user – including legal, reputational, political, business and competition risks – and can generate privacy concerns for individuals, communities and organisations. This report considers the extent to which PETs provide satisfactory routes through these dilemmas, and offer an optimal outcome, where we can release the value in data whilst protecting its sensitivities. How close are we to achieving this through the use of specific technologies, and which of these tensions cannot be robustly resolved through technological means?

Whilst cybersecurity is focussed on protecting data so that other people cannot access it, PETs are focussing on enabling the derivation of useful results from data without giving other people access to all of the data. Different PETs achieve this in different ways, which are described in section 2.2.

The field of PETs is rapidly evolving; however, many of the most promising tools, whilst having a rich research heritage, are relatively new to real-world applications. As such there remain a number of important unanswered questions: What are concrete trade-offs in real-world applications? How mature are different PETs? What opportunities do they present and what are their current limitations? How can government and industry accelerate their uptake and make the most of their potential? Where is there potential for better take up, and where is there need for caution?

Target audience
This report is aimed at an audience that might not have deep technical knowledge of statistics, cryptography, systems security, hardware or other disciplines relevant to the detailed understanding of PETs. This report might, in particular, be of use to those in charge of digital strategies and data protection in the public, private and third sector (eg chief data officers and data protection officers). It is also intended to inform those working in government and third sector bodies concerned with the ethics and governance of data use, in order to consider the extent to which the use of PETs can play a role in data governance frameworks.

Scope
This project has examined the role of PETs in enabling data analysis and extracting value whilst preserving sensitive information, and drawing out the implications for policy and practice. The aims of the project were to explore the interplay between the following questions:
• What are the ethical and social issues at stake?
• What is mathematically possible and what is technically feasible? In particular, the project considered how privacy can be defined mathematically, and what is theoretically possible, alongside practical questions of implementation.
• What business models and incentive systems can deliver these technologies?

Answering these questions, and in fact understanding PETs themselves, requires bringing together knowledge from multiple fields. This is reflected in the multidisciplinary nature of this Royal Society project on PETs.
Chapter one looks at some of the drivers for the use of PETs, in particular the types of risks associated with the analysis of personal data, and the way current regulation in this domain may encourage the uptake of PETs. Chapter two explores opportunities for PETs-enabled data analysis and considerations for the use of PETs as a whole, before taking a deeper dive into a selection of five PETs. Chapter three brings together the rationale for the recommendations made in this report.

'PETs' is an umbrella term covering a broad range of technologies and approaches – from a piece of tape masking a webcam to advanced cryptographic techniques4,5. This project did not consider the role of PETs in private communications. We focus on a subset of five PETs identified during the scoping of the project as being particularly promising to enable privacy-aware data collection, analysis and dissemination of results: homomorphic encryption, trusted execution environments, secure multi-party computation, differential privacy, and personal data stores.

They represent a quite diverse set of approaches highlighting the different ways that distinct communities – such as systems security/hardware, statistics and cryptography – are tackling similar problems.

We recognise that some of these are not technologies per se, but rather concepts or definitions which can be fulfilled by technological means; however, it is common practice to use 'PETs' as shorthand for the broader field. Where possible, we draw out how these five PETs may be combined with one another or with other technologies. For example, blockchain6, which is not a PET in itself, can actually be made more privacy-preserving using PETs (see Case study 2, section 2.2.2). Additionally, section 2.3 summarises how these and other PETs can support privacy-preserving machine learning – enabling the use of a powerful form of data analysis.

Case studies have been used to help illustrate these technologies in practice, suggesting the potential and relevance of different PETs to a range of sectors and application areas (see section 2.2). Sharemind is a commercially available distributed database system, developed by Cybernetica, using secure multi-party computation to enable the safe sharing of sensitive information between companies. Microsoft has developed CoCo, a confidential blockchain framework where trusted processors add confidentiality, performance and governance to blockchain. NHS Digital is using Privitar's de-identification product, Privitar Publisher, which uses homomorphic encryption to enable the safer sharing and linkage of data between authorised parties. The US Census Bureau has used differential privacy and is planning on using it on a large scale for the release of statistics from the upcoming 2020 census. Finally, CitizenMe is a phone app, and an example of a personal data store, that enables users to choose what data to share and with whom.

4. ENISA 2014 Privacy and data protection by design – from policy to engineering (see https://www.enisa.europa.eu/publications/privacy-and-data-protection-by-design, accessed 8 February 2019)
5. Office of the Privacy Commissioner of Canada 2017 Privacy Enhancing Technologies – A Review of Tools and Techniques (see https://www.priv.gc.ca/en/opc-actions-and-decisions/research/explore-privacy-research/2017/pet_201711/, accessed 12 February 2019)
6. Blockchain is an open, distributed ledger that can record transactions between several parties efficiently and in a verifiable and permanent way. See Introduction, Key terms and definitions.
Methodology
This report was informed by a series of conversations with a broad range of stakeholders from academia, industry, government and civil society, two interdisciplinary workshops on 16 May 2018 and 18 July 2018 (on the business and societal needs for PETs, and trends affecting their uptake; and on the state of play of research and implementation, respectively), and consideration by an expert Working Group. The report was reviewed by expert readers and peer reviewers. Full details of the workshop speakers, Working Group members, expert readers and peer reviewers are provided in the Appendix.

Key terms and definitions
This report draws on some concepts and models from business, technology and data analytics. Here is a concise glossary of 23 key terms used:

B2B: Business-to-business; application or service provided by a business to another business.

B2C: Business-to-consumer; application or service provided by a business to an individual.

Blockchain: an open, distributed ledger that can record transactions between several parties efficiently and in a verifiable and permanent way. Blockchain is not a PET.

Derivation: part of a de-identification process where the granularity of a data item is reduced (eg date of birth to year of birth). This is an irreversible activity. It is also known as sub-sampling.

Differential privacy: security definition which means that, when a statistic is released, it should not give much more information about a particular individual than if that individual had not been included in the dataset. The differential privacy definition allows one to reason about how much privacy is lost over multiple queries (see privacy budget).

Epsilon: see privacy budget.

FHE: Fully homomorphic encryption (FHE) refers to encryption schemes where it is possible to compute any polynomial function on the data, which means both additions and multiplications.

Homomorphic encryption (HE): a property that some encryption schemes have, so that it is possible to compute on encrypted data without deciphering it.

Integrity: confidence that data is not tampered with.

MPC: see secure multi-party computation.

Noise: noise refers to a random alteration of data/values in a dataset so that the true data points (eg personal identifiers) are not as easy to identify.

Personal Data Store (PDS): systems that provide the individual with access and control over data about them, so that they can decide what information they want to share and with whom.

Partial Homomorphic Encryption (PHE): encryption supporting only additions or only multiplications (also referred to as additive homomorphic encryption and multiplicative homomorphic encryption).

Privacy budget, or differential privacy budget, or epsilon: quantitative measure of by how much the risk to an individual's privacy may increase, due to that individual's data inclusion in the inputs to the algorithm.

Privacy Enhancing Technologies (PETs): an umbrella term covering a broad range of technologies and approaches that can help mitigate security and privacy risks.

Private Information Retrieval (PIR): an MPC protocol allowing a user to query a database whilst hiding the identity of the data retrieved.

Private Set Intersection (PSI): secure multi-party computation protocol where two parties compare datasets without revealing them in an unencrypted form. At the end, each party knows which items they have in common with the other. There are some scalable open-source implementations of PSI available.

Redaction: part of a de-identification process where an identifier (such as name) is deleted.

Secure Multi-Party Computation (SMPC or MPC): a subfield of cryptography concerned with enabling private distributed computations. MPC protocols allow computation or analysis on combined data without the different parties revealing their own private input.

Somewhat Homomorphic Encryption (SHE): encryption supporting a limited number of both additions and multiplications on encrypted data.

Tokenisation: consistently obscuring a data item (eg an NHS number) by replacement with a token, such as a regular expression, as part of a de-identification process. This is a reversible activity.

Trusted Execution Environment (TEE): isolated part of secure processors that allow the isolation of secret code from the rest of the software that is running on a system in order to achieve confidentiality of the data. Trusted execution environments are also known as secure enclaves.

Zero-knowledge proof: method by which one party can prove to another party that they know a value x, without conveying any information apart from the fact that the statement is true.
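To make the homomorphic encryption entries above concrete, the short sketch below illustrates partial (additive) homomorphic encryption – computing a sum over encrypted values without ever decrypting the individual inputs. It is an illustration only, not an example from this report, and it assumes the open-source python-paillier package (`phe`); any additively homomorphic scheme would behave similarly.

```python
# Illustrative sketch only: additive (partial) homomorphic encryption with the
# Paillier cryptosystem, assuming the open-source `phe` package is installed.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# A data owner encrypts sensitive values before handing them to an analyst.
salaries = [31_000, 45_500, 28_750]
encrypted = [public_key.encrypt(s) for s in salaries]

# An untrusted server can add ciphertexts (and multiply them by plaintext
# constants) without seeing the underlying values.
encrypted_total = sum(encrypted[1:], encrypted[0])

# Only the holder of the private key can recover the result.
print(private_key.decrypt(encrypted_total))  # 105250
```

A scheme of this kind supports only additions (and multiplication by known constants); somewhat and fully homomorphic schemes, as defined above, extend the set of operations at a growing computational cost.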
Chapter one – Data sensitivity and protections: what are the issues at stake?
"Data might be considered sensitive for a range of reasons, including personal privacy, commercial data sensitivity or national security."

This section looks at some of the drivers for the use of PETs, in particular the types of risks associated with the analysis of personal data, and the way current regulation in this domain may encourage the uptake of PETs.

1.1 Data sensitivity
Growing computing power, the volume of information generated every day, and their wider availability is making both benevolent applications and attacks rapidly more powerful. What seemed inconceivable in the past may now become possible. For example, using a smartphone comes with the risk of a profile containing sensitive attributes being created about you without your full knowledge7; and information about you can be inferred from your contacts8. Also, in the case of machine learning models, which are usually trained using a large dataset, there is an expectation that the training dataset cannot be recovered from the trained model, but this is not true in all cases9. As data collection and use is expanding, the analysis of available datasets can make it possible to identify individuals and information about them, therefore data privacy management is also about identity management.

Personal privacy is not the only concern; data might be commercially sensitive or related to national security, for example. Individuals or organisations might be keen to share data but want to restrict whom they are sharing information with and what information they want to share. PETs can help achieve such restrictions in different ways.

An additional need for new PETs is being driven by cloud computing, with the need for the data controller to give the data processor specific and limited capabilities.

People and organisations are becoming more aware of the individual and group harms caused by misuse of data and data breaches – for example the Facebook and Cambridge Analytica scandal or the Equifax breach10,11.

7. New York Times 2017 Your Apps Know Where You Were Last Night, and They're Not Keeping It Secret (see https://www.nytimes.com/interactive/2018/12/10/business/location-data-privacy-apps.html, accessed 12 February 2019)
8. Such JM and Criado R. 2018 Multiparty Privacy in Social Media. Communications of the ACM 61, 74–81. (see https://cacm.acm.org/magazines/2018/8/229766-multiparty-privacy-in-social-media/fulltext, accessed 12 February 2019)
9. Veale M et al 2018 Algorithms that remember: model inversion attacks and data protection law. Phil. Trans. R. Soc. A 376. (see http://rsta.royalsocietypublishing.org/content/376/2133/20180083, accessed 12 February 2019)
10. Information Commissioner's Office 2018 ICO issues maximum £500,000 fine to Facebook for failing to protect users' personal information (see https://ico.org.uk/about-the-ico/news-and-events/news-and-blogs/2018/10/facebook-issued-with-maximum-500-000-fine/, accessed 12 February 2019)
11. Information Commissioner's Office 2018 Credit reference agency Equifax fined for security breach (see https://ico.org.uk/about-the-ico/news-and-events/news-and-blogs/2018/09/credit-reference-agency-equifax-fined-for-security-breach/, accessed 12 February 2019)
It should be emphasised that the use of PETs does not in itself automatically make an analysis legal, ethical or trustworthy. Many ethical questions arise through the data analysis pipeline, eg whether the purpose of data use is socially beneficial, whether it might result in disadvantages for certain individuals or groups, whether the data has been collected appropriately, and so on. Implementing PETs can ensure that the methods by which data is used include protection against specific privacy risks, but it does not tell us anything about these broader ethical concerns.

1.2 Privacy risks and the data analysis pipeline
To understand how PETs protect against privacy risks it is important to acknowledge that 'privacy' is a term which has many different meanings12,13. Privacy has been referred to in legal concepts such as the right to be left alone, data protection rights, rights to control or 'own' personal data, secrecy, anonymity, and the right to respect for private and family life14. To consider the way PETs can mitigate privacy risks that stem from data analysis, it is necessary to establish at the outset different kinds of privacy risk. First of all, for any given data analysis, one might ask generally how much sensitive information it risks revealing. This question has several possible angles and interpretations:
• How much does the analysis reveal about the whole population or group from which the data used for the analysis originated? (Note that there is a definitional issue here: some might argue that this question relates rather to fairness and discrimination, and others might point out that it also affects the privacy of individuals in the population.)
• Does the analysis reveal whether someone or a specific entity is included in the dataset that was used to conduct the analysis?
• How much does an analysis reveal about sensitive attributes about specific individuals or entities in the dataset?
• To whom is information revealed and what might they do with it?
• How sensitive are the input, intermediate values and output of an analysis?

Secondly, it is also important to consider or test whether or not a system presents vulnerabilities. Cryptographers and other experts have set goals for security and privacy which can then be assessed. The security engineering community specified a set of testable goals which systems can be assessed against, in particular: confidentiality, integrity and availability (CIA). Echoing this triad of security requirements, the European Network and Information Security Agency (ENISA), alongside others, have proposed the following testable goals to be considered in privacy engineering: ENISA proposed the goals of unlinkability15, transparency, and intervenability16;

12. Op. Cit. 2
13. The Israel Academy of Sciences and Humanities and The Royal Society 2019 Israel-UK Privacy and Technology workshop – note of discussions (see https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/, accessed March 2019)
14. Article 8, European Convention on Human Rights (see https://www.echr.coe.int/Documents/Convention_ENG.pdf, accessed 12 February 2019)
15. 'Unlinkability of two or more items of interest means that within the system (comprising these and possibly other items) from the attacker's perspective, these items of interest are no more and no less related after his observation than they are related concerning his a-priori knowledge' – Pfitzmann A and Hansen M. 2005 Anonymity, Unlinkability, Unobservability, Pseudonymity, and Identity Management – A Consolidated Proposal for Terminology (see https://www.freehaven.net/anonbib/cache/terminology.pdf, accessed 8 February 2019)
16. Op. Cit. 4
in the US, the National Institute of Standards and Technology (NIST) proposed predictability, manageability and disassociability17.

Thirdly, an interesting proposition from academia is the development of formal mathematical approaches to privacy to support a robust connection between regulation and practical enforcement18. Formalism potentially allows a shift from informal requirements and goals ('obey law', 'be ethical') towards quantifiable measurement of privacy enforcement (such as the 'privacy budget' parameter in differential privacy (section 2.2.4)). However, this would require matching legal and mathematical definitions, as demonstrated by Kobbi Nissim and colleagues (2018)19. It is also important to recognise that technology is not value-neutral and also to scrutinise how governance-sensitive choices are made – see in particular the subsequent discussion around setting the value of the 'privacy budget' parameters when using differential privacy (section 2.2.4). In any case, technical solutions need to be grounded in the legal framework.

Finally, it is useful to consider the risks and harms associated with each stage of a typical data analysis pipeline – collection, processing and dissemination – and where PETs may be applied. Different types of potentially problematic actions in the data analysis pipeline can result in privacy harms (see Box 1 and Figure 1, focusing on harms to individuals). Note that harms and actions are distinct. For instance, different actions (eg 'identification' or 'exposure') can result in the same harm (eg distress)20. Certain tools, such as PETs, can reduce the chance of specific problematic data actions resulting in a harm, but they do not necessarily stop other actions which could result in the same or different types of harm.

17. National Institute of Standards and Technology 2017 An Introduction to Privacy Engineering and Risk Management in Federal Systems (see https://doi.org/10.6028/NIST.IR.8062, accessed 12 February 2019)
18. Nissim K and Wood A. 2018 Is Privacy Privacy? Phil. Trans. R. Soc. A 376. (see https://doi.org/10.1098/rsta.2017.0358, accessed 12 February 2019)
19. Nissim K et al. 2018 Bridging the gap between computer science and legal approaches to privacy. Harvard Journal of Law & Technology 31. (see https://privacytools.seas.harvard.edu/publications/bridging-gap-between-computer-science-and-legal-approaches-privacy, accessed 12 February 2019)
20. Op. Cit. 17
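For readers who want to see the mathematics behind the 'privacy budget' mentioned above, the standard formal statement (given here as background rather than reproduced from later chapters of this report) is: a randomised analysis M satisfies epsilon-differential privacy if, for any two datasets D and D' differing in the records of a single individual, and for any set S of possible outputs,

\[
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,].
\]

The smaller the privacy budget ε, the less the distribution of released results can depend on any one individual's data, and hence the less an observer can infer about that individual; running several analyses on the same data consumes, at most, the sum of their individual budgets.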
BOX 1 – Daniel Solove's taxonomy of privacy

Different types of actions in a data analysis pipeline can result in privacy harms. One way of understanding these different harms is through Daniel Solove's taxonomy of privacy21. Focusing on the analysis of personal data, problematic data actions include:

• 'Aggregation', the gathering together of information about a person. A piece of information here or there is not very telling. But when combined together, bits and pieces of data begin to form a portrait of a person. The accumulation of data and increasing power of analytics mean that aggregation is easier and more insightful than ever before. This presents both potential benefits (eg personalised recommendations by online retailers) and potential harms (eg potential distress or other harms resulting from a profile containing sensitive attributes being created about an individual without their full knowledge)22.
• 'Identification' is connecting data to a specific individual. An obvious benefit is identity verification. Identification can result in harm if an individual would rather stay anonymous. Of note, aggregation can lead to the identification of an individual, by linking 'de-identified' data (where personal identifiers have been removed) and other data.
• 'Insecurity' is a problem caused by the way our information is handled and protected. It might involve glitches, a lack of cybersecurity, abuses and illicit uses of information. One major harm that can result from insecurity is identity theft.
• 'Exclusion' refers to the failure to provide individuals with notice and input about their records. Insecurity and exclusion can create a sense of vulnerability, uncertainty and powerlessness amongst individuals from whom the data comes.
• 'Disclosure' occurs when certain information about a person is revealed to others. Disclosure can threaten people's security and might limit their freedom of association and expression.
• 'Exposure' involves the exposing to others of certain physical and emotional attributes about a person. Such exposure risks creating embarrassment and humiliation.
• 'Intrusion' involves invasions or incursions into one's life, whether in the physical or digital space. For example, spam and junk mail are intrusions that sap people's time.

21. Solove DJ. 2006 A Taxonomy of Privacy. University of Pennsylvania Law Review 154, 477–560. (see https://www.law.upenn.edu/journals/lawreview/articles/volume154/issue3/Solove154U.Pa.L.Rev.477(2006).pdf, accessed 12 February 2019)
22. Op. Cit. 7
Figure 1: Mapping of data lifecycle stages and problematic actions presenting a risk to privacy.
The figure maps Solove's problematic actions onto the stages of a typical pipeline around the data subject and data holders:
• Information collection – surveillance, interrogation.
• Information processing – aggregation, identification, insecurity, secondary use, exclusion.
• Information dissemination – breach of confidentiality, disclosure, exposure, increased accessibility, blackmail, appropriation, distortion.
• Invasions (affecting the data subject directly) – intrusion, decisional interference.
Source: Solove DJ. 2006 A Taxonomy of Privacy. University of Pennsylvania Law Review 154, 477–560. (see https://www.law.upenn.edu/journals/lawreview/articles/volume154/issue3/Solove154U.Pa.L.Rev.477(2006).pdf, accessed 12 February 2019)

1.3 The legal context for data processing: personal data and the GDPR
Considering how technologies can be used as a means of governance is the essence of the 'data protection by design and default' approach required by the General Data Protection Regulation (GDPR), which came into force in May 2018 and introduced new protections across the European Union. Following many of the principles at the heart of the GDPR, new data protection regimes are being adopted across the world. Such regulatory change builds a fresh case for the development and uptake of PETs, which could overcome their apparent 'failure' to go mainstream23. This report is not intended to provide a comprehensive analysis of all relevant legal frameworks with regards to the processing of personal or otherwise sensitive data (such as those mentioned in section 1.2) – rather we take a closer look at the GDPR in the context of the protection of personal data.

The 'data protection by design and default' approach introduced by the GDPR includes a number of data protection principles, including data 'minimisation' – to collect and store only the minimum required amount of data for the purpose of a given application. An increased emphasis on data minimisation could lead to a cultural shift within the organisations that handle data. In particular, the GDPR challenges business models that involve data hoarding and constitutes a push for business models that involve privacy protection. At the moment there is a general culture of accumulating data and working out purpose and use later. However, there might be a counter-trend towards more focused data collection24 based on specific variables of interest and specific hypotheses to be tested. This move could potentially encourage the development of data-enabled technologies that utilise small, high quality data, rather than relying on very large datasets. This in turn could open up new opportunities.

The regulation also recommends that data users employ 'safeguards' to ensure their processing of personal data is compliant with the regulation. Personal data is information that relates to an identified or identifiable individual25. Organisations wanting to adopt a privacy-preserving approach may often ask for a solution that transforms a dataset containing personal data into an anonymised one, so that they can perform computation on it. A key approach to do so is pseudonymisation, a data management procedure by which personally identifiable information fields are replaced by artificial identifiers. However, there have been several studies showing that pseudonymised datasets could in certain cases be re-identified, by linking them with other data26. k-anonymisation is an alternative approach where certain variables in a dataset are suppressed or generalised in a manner that ensures the information from any single individual cannot be distinguished from at least k-1 other individuals in the dataset. But this approach can still lead to sensitive attribute disclosure in some cases.

This has prompted the Information Commissioner's Office to consider ways it could address data that is not per se 'personal data' but might still contain sensitive information27. New approaches, such as the PETs described in this report, need to be considered as alternative safeguards.

The GDPR places an increased emphasis on accountability and, regarding consent for data processing in particular, it requires more specificity about the purposes that data is used for. Personal data stores, one of the PETs considered in more detail in chapter two, have been proposed as one solution to help the 'data subject' manage granular consent and provide individuals with control over their data.

Finally, GDPR also introduces mandated data 'portability' – a data subject has the right to receive data about them from the controller, in a structured, commonly used, and machine readable format. This could, for instance, help consumers switch services and this might create markets, such as a service-switching market. Could PETs be part of the solution and help share consumer data in a secure way?

23. Op. Cit. 5
24. The Economist 2018 Can the EU become another AI superpower? (see https://www.economist.com/business/2018/09/22/can-the-eu-become-another-ai-superpower, accessed 12 February 2019)
25. See https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/ (accessed 12 February 2019)
26. Narayanan A and Shmatikov V. 2008 Robust De-anonymization of Large Sparse Datasets (see https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf, accessed 12 February 2019)
27. The British Academy and The Royal Society 2017 Data management and use: Governance in the 21st century – Priorities for data governance: discussions at a British Academy and Royal Society seminar on 16 October 2017 (see https://royalsociety.org/topics-policy/projects/data-governance/, accessed 12 February 2019)
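As an illustration of the de-identification approaches discussed in section 1.3 – pseudonymisation/tokenisation of direct identifiers and generalisation of quasi-identifiers, followed by a k-anonymity check – the sketch below works through a tiny synthetic table. The column names, values and secret key are hypothetical and the example is not drawn from the report; it also shows why, as noted above, a k-anonymised table can still disclose a sensitive attribute when a whole group shares the same value.

```python
import hashlib
import pandas as pd

# Synthetic records with hypothetical column names, for illustration only.
df = pd.DataFrame({
    "nhs_number": ["943 476 5919", "943 476 5920", "943 476 5921", "943 476 5922"],
    "date_of_birth": ["1984-03-02", "1984-07-19", "1991-11-23", "1991-01-05"],
    "postcode": ["CB2 1TN", "CB2 1TQ", "SW1A 2AA", "SW1A 1AA"],
    "diagnosis": ["asthma", "asthma", "diabetes", "asthma"],
})

# Pseudonymisation / tokenisation: replace the direct identifier with a keyed
# token. Reversible only by whoever holds the secret key and the mapping.
SECRET_KEY = b"replace-with-a-managed-secret"
df["patient_token"] = df["nhs_number"].apply(
    lambda v: hashlib.blake2s(v.encode(), key=SECRET_KEY).hexdigest()[:16]
)
df = df.drop(columns=["nhs_number"])

# Generalisation ('derivation' in the glossary's terms): coarsen the
# quasi-identifiers, eg date of birth to year, full postcode to district.
df["year_of_birth"] = df["date_of_birth"].str[:4]
df["postcode_district"] = df["postcode"].str.split().str[0]
df = df.drop(columns=["date_of_birth", "postcode"])

# k-anonymity check: size of the smallest group sharing the same
# quasi-identifier values. Here k = 2, yet the 1984/CB2 group all share the
# same diagnosis, so the sensitive attribute still leaks for that group.
k = df.groupby(["year_of_birth", "postcode_district"]).size().min()
print(df)
print("k =", int(k))
```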
Chapter two – PETs today: capabilities and limitations
"PETs can reduce the privacy risk of existing processing, reduce the friction of processing currently deemed too high risk, and create entirely new opportunities to access and use data."

This section explores the opportunities for PETs-enabled data analysis and considerations for the use of PETs as a whole, before taking a deeper dive into a selection of five PETs. Finally, section 2.3 shows how different PETs might be used to achieve one type of analysis, focusing on the example of machine learning.

The overviews given here provide a snapshot of this emerging area with the understanding that a number of new methods are being proposed and are developing at different rates, with the five selected PETs presented as examples of some of the main classes of methods at the current time. Some of these technologies are emerging into the basis of real-world applications, and the case studies included demonstrate their potential for use; whilst others remain some distance away from practical use and uptake, being at the point of fundamental research. This section shows what is possible, whilst giving a sense of the technical and practical limitations of the technologies at their current stage of development.

2.1 PETs and privacy-preserving data analysis
2.1.1 What are the potential opportunities?
At a high level, PETs can reduce the privacy risk associated with existing processing, reduce the friction of processing which otherwise would not be able to take place for being deemed too high risk, and create entirely new opportunities to access and use data which had not previously been thought of. Once developed to the appropriate level of technical maturity, PETs could enable secure and controlled data access, and a range of new data-enabled products or services, in circumstances that would otherwise raise legal, reputational, political, business or competition concerns due to the potential for privacy risks. In this respect, PETs may be an enabler of data use, rather than an additional layer of cybersecurity defence. Here we discuss some of the potential benefits offered by the use of PETs.

A system of certification for PETs could enable organisations using them to provide assurance that they are sharing and processing data in a secure, privacy-preserving way, assuming the appropriate standard for the certifications could be agreed by relevant stakeholders. In so doing, these technologies could help open up datasets to more external parties, and thus encourage innovation. PETs could, for example, enable government departments to share data with one another to allow analysis that would provide benefits to citizens through improved services – for example they could support the sharing of data between the National Health Service (NHS) and the Department for Work and Pensions (DWP) for administering health-related benefits.
In addition, PETs may help remove the need for government departments and agencies to hold large datasets in places of restricted access and only perform analysis in situ, which is currently the case, for instance, for Department for Education data28. Instead, PETs might allow secure cloud-based computation. They could also enable wider use of distributed processing on data stored on citizens' smartphones and other connected objects, thus saving the need for servers that could be subject to security breaches29.

Within organisations, PETs could support cross-team working, by enabling different internal teams to work on data that they otherwise would not be able to match or view because of data protection rules. In academia, the safe sharing of data with the research community could also lead to innovations benefiting citizens. However, this would not detract from the need to assess whether it is legal or ethical to carry out an analysis or give access to the data in the first place30.

PETs could also help commercial organisations introduce new business models and use consumer data in new ways. Personal data stores are an interesting example of such an application (see section 2.2.5).

2.1.2 What sort of protections do PETs provide?
There is currently no technology that is applicable to every single situation of privacy-preserving data analysis. Different PETs can be used to achieve distinct aims (see Summary table), such as:
• securely providing access to private datasets;
• enabling joint analysis on private data held by several organisations;
• securely out-sourcing to the cloud computations on private data;
• de-centralising services that rely on user data.

It is worth noting that some PETs might be better suited for use by organisations (business-to-business; B2B), and others for use by individuals (business-to-consumer; B2C). For example, cloud providers may want to employ secure hardware or techniques based on encryption to protect code and data on their platform, whereas individuals may benefit from using personal data stores and other PETs designed for individuals31.

28. See https://www.gov.uk/guidance/how-to-access-department-for-education-dfe-data-extracts (accessed 12 February 2019)
29. The Royal Academy of Engineering 2018 Internet of Things: realising the potential of a trusted smart world (see https://www.raeng.org.uk/publications/reports/internet-of-things-realising-the-potential-of-a-tr, accessed 12 February 2019)
30. For example, in 2017 the Digital Economy Act provided the Office for National Statistics (ONS) with permissive and mandatory gateways to receive data from all public authorities and Crown bodies, and new powers to mandate data from some UK businesses. In limited circumstances data held by ONS may also be shared with the devolved administrations solely for statistical purposes. (see https://www.ons.gov.uk/aboutus/transparencyandgovernance/lookingafterandusingdataforpublicbenefit/policies, accessed 12 February 2019)
31. Op. Cit. 5
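One of the aims listed above – enabling joint analysis on private data held by several organisations – can be illustrated with additive secret sharing, the basic building block behind many secure multi-party computation protocols. The toy sketch below is not taken from the report and uses hypothetical organisation names; production systems such as Sharemind, mentioned in the Introduction, use far more elaborate protocols with protections against dishonest parties.

```python
import secrets

# Illustrative toy only: additive secret sharing over a prime field.
PRIME = 2**61 - 1  # arithmetic is done modulo a public prime


def share(value, n_parties):
    """Split a private value into n random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; only the sum of all shares reveals anything."""
    return sum(shares) % PRIME


# Three hypothetical organisations each hold a private number (eg a case count).
private_inputs = {"org_A": 1200, "org_B": 950, "org_C": 431}

# 1. Each organisation splits its input and sends one share to each party.
all_shares = {org: share(v, 3) for org, v in private_inputs.items()}

# 2. Each party locally adds the shares it received - each share on its own
#    is just a random number, so no individual input is exposed.
party_totals = [
    sum(all_shares[org][p] for org in private_inputs) % PRIME for p in range(3)
]

# 3. Combining the three partial totals reveals only the joint result.
print(reconstruct(party_totals))  # 2581, without any party seeing another's input
```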
"The choice of a PET requires considering forms of control or oversight in any given system, and what attacks that system might be vulnerable to."

The choice of a PET also requires considering forms of control or oversight, or trust models. In any given system, whether centralised, decentralised or distributed32, trust depends on the context and who 'sees' unencrypted data. Most of the PETs discussed in this report have their origins in the field of cryptography, which often frames challenges in terms of 'attacker' models (or threat models), ie what attacks a certain system might be vulnerable to. This has implications for the ways in which these PETs manage data access, for example:
• secure multi-party computation specifically removes the need for a central trusted authority, with which parties would otherwise need to share information (section 2.2.3, Figure 5);
• centralised and distributed differential privacy (section 2.2.4) come with different trust models: in centralised (aka 'global') differential privacy, noise33 is added upon release of the output which means trust lies in a central organisation, whilst in distributed (aka 'local') differential privacy noise is added at the time of data collection. In other words, the risk of sensitive information being disclosed in the case of an attack on the central authority is lower if that organisation uses distributed differential privacy;
• personal data stores provide individuals with the means to choose and control who they want to trust with the data they generate. Also, they might offer the possibility to process data locally rather than sending the raw data to a central authority, whose concentration of data makes it an attractive target for hackers (also known as a 'honeypot').

Finally, it should be noted that there is a difference between ways of specifying privacy definitions and mechanisms for achieving them. Because of the complexity of the context in which PETs operate, guaranteeing privacy in computing requires a solid mathematical definition, often referred to as a 'security definition' or a 'privacy definition'. Differential privacy (section 2.2.4) is an example of a privacy definition – it is a way of measuring privacy. Each privacy definition might have different mechanisms for being achieved. For example, there might be several different ways of injecting random noise into a computation that would result in different 'differentially private' algorithms. These algorithms would all satisfy the definition of differential privacy, and thus offer the same protection, but some might be more accurate than others, resulting in different scalability trade-offs.

2.1.3 The cost of using PETs – accuracy, utility and financial costs
How else do PETs alter data applications? When using PETs, there are trade-offs. Privacy engineers say that PETs incur a cost in terms of 'utility'. In the context of different technologies, the cost in utility might be of a different nature. For example, with differential privacy, adding noise to a dataset entails a loss of some useful information so there is a cost in terms of accuracy. In the case of PETs where computation happens on encrypted data, such as homomorphic encryption and secure multi-party computation, the main cost to utility is in terms of computation resources (time, computing power).

32. Goyal S. 2015 Centralized vs Decentralized vs Distributed (see https://medium.com/delta-exchange/centralized-vs-decentralized-vs-distributed-41d92d463868, accessed 12 February 2019)
33. Noise here refers to a random alteration of data values in a dataset so that the true data points (eg personal identifiers) are not as easy to identify. See Introduction, Key terms and definitions.
You can also read