Protecting privacy in practice: The current use, development and limits of Privacy Enhancing Technologies in data analysis
Issued: March 2019 DES5759
ISBN: 978-1-78252-390-1

The text of this work is licensed under the terms
of the Creative Commons Attribution License
which permits unrestricted use, provided the
original author and source are credited.

The license is available at:
creativecommons.org/licenses/by/4.0

Images are not covered by this license.

This report can be viewed online at:
royalsociety.org/topics-policy/projects/privacy-enhancing-technologies

Cover image © from2015.
Contents
Foreword
Executive summary
Recommendations
Summary table
Introduction
    Background – previous Royal Society reports
    Purpose
    Target audience
    Scope
    Methodology
    Key terms and definitions
Chapter one – Data sensitivity and protections: what are the issues at stake?
    1.1 Data sensitivity
    1.2 Privacy risks and the data analysis pipeline
    1.3 The legal context for data processing: personal data and the GDPR
Chapter two – PETs today: capabilities and limitations
    2.1 PETs and privacy-preserving data analysis
    2.2 Example PETs, capabilities and limitations
    2.3 Privacy in practice – Privacy-preserving machine learning
Chapter three – The road to adoption
    3.1 Further research and development
    3.2 Awareness raising and quality assurance
    3.3 Adoption within a wider business framework
    3.4 Consider the wider markets
Appendix
    Index of case studies
    Working Group members
    Royal Society staff
    Reviewers
    Workshop participants
Foreword

This report comes in the midst of a period of rapid developments in the collection, analysis and use of data. We are becoming ever more aware of the social, research and business benefits of accessing and using the data generated through everyday activities. We need to ensure that when such data is collected and used it is done so for good reasons, in a well-governed way and so that sensitive personal data or valuable commercial data is adequately protected.

Privacy Enhancing Technologies (PETs) offer an avenue for enabling that well-governed access. The evolving role of PETs is in ensuring that, when we have good reasons and the legal and ethical grounds to access and use data, we can do so while protecting that data and the individuals and organisations it relates to. We aim in this report to explore that role and bring it to the attention of those who hold, analyse and use data.

The field of PETs development is likewise moving quickly, and this report captures a moment in time where the technologies are maturing and opportunities to use these technologies are beginning to emerge. It may be that some of the technologies surveyed here do not achieve their promise in the near term, or that the costs of adoption prove prohibitive, or that other technologies not explored in depth might leapfrog them. However, our aim here is to help raise awareness of the potential of these technologies so that we can inspire further research into their development, spurred by identifying the opportunities where they can be put into practice. We also aim to highlight their practical and technical limitations and to note that there is no technology that replaces the need for good governance and proper business practice relating to the use of data.

We hope that this report adds to the lively debate on the topic of data use and data governance, and complements other work assessing the technological readiness levels of PETs. Our aim is that it will be an important part of conversations between researchers, government and industry on the future use cases for PETs that can both drive research forward and enable everyone to access social benefits from data.

Professor Alison Noble FREng FRS
Chair, Privacy Enhancing Technologies Working Group
Executive summary
The scale and rate at which data is collected, used and analysed is rapidly increasing, offering significant new and developing benefits to society and the economy. However, realising the full potential of large-scale data analysis may be constrained by important legal, reputational, political, business and competition concerns. These concerns arise because the use of data creates a set of social and ethical tensions and risks: in particular, there is a balancing act between realising the benefits of data analysis and protecting sensitive data and the interests of the individuals and organisations it relates to. The failure to adequately address privacy risks may damage trust and limit the realisation of the benefits that can be delivered by data-enabled technologies.

Certain risks can potentially be mitigated and managed with a set of emerging technologies and approaches often collectively referred to as ‘Privacy Enhancing Technologies’ (PETs). Whilst cybersecurity is focussed on protecting data so that other people cannot access it, PETs in data analysis focus on enabling the derivation of useful results from data without giving other people access to all of the data. This nascent but potentially disruptive set of technologies, combined with changes in wider policy and business frameworks, could enable significantly greater sharing and use of data in a privacy-preserving, trustworthy manner. It could create new opportunities to use datasets without creating unacceptable risks. It also offers great potential to reshape the data economy, and to change, in particular, the trust relationships between citizens, governments and companies.

The field of PETs is rapidly evolving. However, currently, many of the most promising tools, whilst having a rich research heritage, are relatively new to real-world applications. As such there remain a number of important unanswered questions: What are the concrete trade-offs in real-world applications? How mature are different PETs? What opportunities do they present and what are their limitations? How can government and industry accelerate their uptake and make the most of their potential?

This report provides a high-level overview of current PETs, and the roles that they can play, in order to inform applied data science research and government departments’ digital strategies as well as those of business. It also considers how PETs sit within wider governance frameworks that are intended to enable the beneficial use of data in an ethical and well-governed way.

This report also aims to prompt reflection about the use of technology in governance and to encourage regulators to consider new ways to approach privacy risks, including the use of PETs. To this end, this document provides an evidence base – including a set of case studies that capture concrete example uses for each of the five PETs considered in this report – and signposts to further resources.

Finally, this report includes recommendations on how the UK could fully realise the potential of PETs and allow their use on a greater scale.

The field of PETs development is moving quickly. This report looks at five interrelated and heterogeneous approaches within a broad field and there is no intention to suggest that these particular technologies will develop earlier or get more uptake than others. However, this report is intended to raise awareness of the potential of this diverse field of technologies and approaches and the ways they could be applied, in order to encourage further research into their development and to inform future policy conversations about the development and use of PETs.


Recommendations

Recommendation 1
Accelerate the research and development of PETs.
Funders, government, industry and the third sector can work together to articulate and support the development of cross-sector research challenges, alongside providing continued support for fundamental research on PETs.

Recommendation 2
Promote the development of an innovation ecosystem.
UK Research and Innovation (UKRI) has a role in encouraging data-handling companies to engage with the start-ups and scale-ups developing PETs, to support research and early trials. This will help UK investors and businesses realise the extent of the market opportunity for PETs.

Recommendation 3
Drive the development and adoption of PETs.
Government can be an important early adopter, using PETs and being open about their use so that others can learn from their experience. Government departments should consider what existing processing might be performed more safely with PETs and how PETs could unlock new opportunities for data analysis, including opening up the analysis of sensitive datasets to a wider pool of experts whilst fully addressing privacy and confidentiality concerns.

Recommendation 4
Support organisations to become intelligent users of PETs.
There is a need for Government, public bodies and regulators to raise awareness further and provide guidelines about how PETs can mitigate privacy risks and address regulations such as GDPR. For example, the Information Commissioner’s Office (ICO) should provide guidance about the use of suitably mature PETs to help UK organisations minimise risks to data protection, and this should be part of the ICO’s Data Protection Impact Assessment guidelines. Such guidance would need to cover how PETs fit within an organisation's overall data governance infrastructure, since the use of PETs in isolation is unlikely to be sufficient.

Recommendation 5
Give public sector organisations the level of expertise and assurance they need to implement new technological applications, enable a centralised approach to due diligence, and assure quality across the board.
The National Cyber Security Centre should act as a source of advice and guidance on the use of suitably mature PETs, as part of a network of expert organisations. Such a network of expertise would support the development and evolution of best practices and also provide access to advice on specific cases of data use or sharing. Ultimately, this could also serve as a point of engagement for academics and industry bodies working in the space and provide a portal from which private sector organisations interested in learning about PETs could access information on existing case studies.

Recommendation 6
Create the skilled workforce needed to develop and implement PETs.
Funding should be made available so that the capacity to train UK PhD and Master's students in cryptography, statistics, systems engineering and software development increases with the level of demand for well-trained, high-calibre candidates. This could be an outcome of the National Cyber Security Programme and the cybersecurity centres of excellence scheme by the Engineering and Physical Sciences Research Council. Universities should consider adding privacy engineering to the curriculum of software engineering and data science courses, treating the need to protect data as core knowledge in data analysis.

Recommendation 7
Promote human flourishing by exploring innovative ways of governing data and its use that are enabled by PETs.
The Department for Digital, Culture, Media and Sport (DCMS), the Centre for Data Ethics and Innovation (CDEI), the Office for AI, regulators and civil society should consider how PETs could become part of the data stewardship infrastructure, underpinning governance tools such as ‘data trusts’ and other initiatives for the governance of data use.


Summary table (a)

Trusted Execution Environments
Type of privacy: Securely outsourcing to a server, or cloud, computations on sensitive data.
Privacy risk addressed: Revealing sensitive attributes present in a dataset.
Data protected: In storage; during computing.
Benefits: Commercial solutions widely available; zero loss of information.
Current limitations: Many side-channel attacks possible.
Readiness level: Product.

Homomorphic Encryption
Type of privacy: Securely outsourcing specific operations on sensitive data; safely providing access to sensitive data.
Privacy risk addressed: Revealing sensitive attributes present in a dataset.
Data protected: In storage; during computing.
Benefits: Can allow zero loss of information; FHE can support the computation of any operation.
Current limitations: FHE currently inefficient, but SHE and PHE are usable; highly computationally intensive; bandwidth and latency issues; running time; PHE and SHE support the computation of limited functions; standardisation in progress.
Readiness level: PHE: product; SHE: pilot; FHE: research – proof of concept.
Qualification criteria: Specialist skills; custom protocols; computing resources.

Secure Multi-Party Computation
Type of privacy: Enabling joint analysis on sensitive data held by several organisations.
Privacy risk addressed: Revealing sensitive attributes present in a dataset.
Data protected: During computing.
Benefits: No need for a trusted third party – sensitive information is not revealed to anyone; the parties obtain only the resulting analysis or model.
Current limitations: Highly compute and communication intensive.
Readiness level: Proof of concept – pilot (PSI, PIR: product).
Qualification criteria: Specialist skills; custom protocols; computing resources.

Differential Privacy
Type of privacy: Organisation releasing statistics or derived information – generally an organisation that holds a large amount of data.
Privacy risk addressed: Dataset or output disclosing sensitive information about an entity included in the dataset.
Data protected: At point of dataset or result disclosure.
Benefits: Formal mathematical proof / privacy guarantee; the user can set the level of protection desired, in particular by reasoning about the number of times the data might be queried.
Current limitations: Noise and loss of information, unless datasets are large enough; setting the level of protection requires expertise.
Readiness level: Pilot.
Qualification criteria: Specialist skills; custom protocols; very large datasets.

Personal Data Stores (b)
Type of privacy: Individual managing with whom and how they share data; de-centralising services that rely on user data.
Privacy risk addressed: Undesired sharing of sensitive information.
Data protected: At point of collection; during computing (locally).
Benefits: Gives full control to individuals; removes the risk of attacks on ‘honeypots’ of centralised data; analysis can be run locally.
Current limitations: Impracticality of an individual controlling data sharing with many parties.
Readiness level: Product.

Key
FHE: Fully Homomorphic Encryption; SHE: Somewhat Homomorphic Encryption; PHE: Partial Homomorphic Encryption; PIR: Private Information Retrieval; PSI: Private Set Intersection.

(a) This table is intended as a guide to the reader of this report, to help understand the five PETs covered in the report in their current state of development as of March 2019. It is not an exhaustive taxonomy of PETs.
(b) Unlike the other four PETs covered in the report, which are tools for privacy-preserving computation, Personal Data Stores are consumer-facing apps and services which can be supported by different kinds of PETs. They provide an example of one of the goals for PETs – enabling people to have more control over data.
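To make the distinction between partial and fully homomorphic schemes in the table above more concrete, the sketch below (an illustrative toy, not material from the report) uses unpadded ‘textbook’ RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields an encryption of the product of the plaintexts. The parameters are deliberately tiny and insecure; real deployments would rely on a vetted library and padding.

```python
# Toy illustration of a partially homomorphic (multiplicative) property,
# using unpadded "textbook" RSA with tiny, insecure parameters.
p, q = 61, 53                        # toy primes, far too small for real use
n = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+ modular inverse)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

m1, m2 = 7, 6
c1, c2 = encrypt(m1), encrypt(m2)

# Multiplying the ciphertexts gives a valid encryption of m1 * m2 (mod n),
# so the product is computed without ever decrypting c1 or c2.
assert decrypt((c1 * c2) % n) == (m1 * m2) % n
print(decrypt((c1 * c2) % n))        # 42
```

A fully homomorphic scheme would, in addition, support additions on ciphertexts, which is what allows arbitrary functions to be evaluated on encrypted data, at the cost of much heavier computation.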


Introduction

PETs allow the derivation of useful insights from data, without requiring full data access.

Background – previous Royal Society reports
The amount of data we generate, collect and use, and the power of analysis to draw insights from it, are increasing rapidly. This significant change in data-enabled technologies has been the basis for three major Royal Society reports in recent years. Progress and research in cybersecurity1 made the case that trust is essential for growing and maintaining participation in the digital society. Organisations earn trust by acting in a trustworthy manner: building systems that are reliable and secure, and treating people, their privacy and their data with respect. It argued that, as part of this, organisations need to provide credible and comprehensible information to help people understand and assess how secure these systems are.

That report was followed by the Royal Society and British Academy's joint report Data management and use: governance in the 21st century2, which identified a number of social and ethical tensions that arise out of the changing ways that we use data. Several of these relate to the use of (personal or otherwise) sensitive data:

• Making use of the data gathered through daily interaction to provide more efficient services and security, whilst respecting the presence of spheres of privacy.
• Providing ways to exercise reasonable control over data relating to individuals whilst encouraging data sharing for private and public benefit.
• Incentivising innovative uses of data whilst ensuring that such data can be traded and transferred in mutually beneficial ways.

That report recommended a principled approach to resolving these tensions and the need for stewardship of data use. It highlighted that governance challenges go beyond the remit of data protection laws. In line with the report's recommendation that a new stewardship body be created to oversee the whole landscape of data use, several initiatives have been established in the UK. In particular, the UK Government has created a Centre for Data Ethics and Innovation and the Nuffield Foundation has established the Ada Lovelace Institute. Both will have a role in exploring the social, ethical and regulatory issues that arise from new uses of data and the practical means of addressing them. However, the extent to which the tensions identified in the 2017 Royal Society and British Academy report can also be resolved through technological means is a key question for this current project.

The Royal Society report Machine learning: the power and promise of computers that learn by example3 called for a new wave of machine learning research, including technical solutions that can maintain the privacy of datasets, whilst allowing them to be used in new ways by different users. That report referenced differential privacy and homomorphic encryption as promising lines of research. Here we explore these technological approaches, along with others, in more detail. Such technologies and approaches are often collectively referred to as Privacy Enhancing Technologies (PETs).

1. The Royal Society 2016 Progress and Research in Cybersecurity: supporting a resilient and trustworthy system for the UK (see https://royalsociety.org/topics-policy/projects/cybersecurity-research/, accessed 12 February 2019)

2. The British Academy and The Royal Society 2017 Data management and use: Governance in the 21st century (see https://royalsociety.org/topics-policy/projects/data-governance/, accessed 12 February 2019)

3. The Royal Society 2017 Machine learning: the power and promise of computers that learn by example (see https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf, accessed 12 February 2019)


Purpose
Previous reports all show that data is an important resource and new technologies enable us to use it to significant benefit. However, collecting and using such data creates risks for the data user – including legal, reputational, political, business and competition risks – and can generate privacy concerns for individuals, communities and organisations. This report considers the extent to which PETs provide satisfactory routes through these dilemmas and offer an optimal outcome, where we can release the value in data whilst protecting its sensitivities. How close are we to achieving this through the use of specific technologies, and which of these tensions cannot be robustly resolved through technological means?

Whilst cybersecurity is focussed on protecting data so that other people cannot access it, PETs focus on enabling the derivation of useful results from data without giving other people access to all of the data. Different PETs achieve this in different ways, which are described in section 2.2.

The field of PETs is rapidly evolving; however, many of the most promising tools, whilst having a rich research heritage, are relatively new to real-world applications. As such there remain a number of important unanswered questions: What are the concrete trade-offs in real-world applications? How mature are different PETs? What opportunities do they present and what are their current limitations? How can government and industry accelerate their uptake and make the most of their potential? Where is there potential for better take-up, and where is there need for caution?

Answering these questions, and in fact understanding PETs themselves, requires bringing together knowledge from multiple fields. This is reflected in the multidisciplinary nature of this Royal Society project on PETs.

Target audience
This report is aimed at an audience that might not have deep technical knowledge of statistics, cryptography, systems security, hardware or other disciplines relevant to the detailed understanding of PETs. It might, in particular, be of use to those in charge of digital strategies and data protection in the public, private and third sectors (eg chief data officers and data protection officers). It is also intended to inform those working in government and third sector bodies concerned with the ethics and governance of data use, in order to consider the extent to which the use of PETs can play a role in data governance frameworks.

Scope
This project has examined the role of PETs in enabling data analysis and extracting value whilst preserving sensitive information, and drawing out the implications for policy and practice. The aims of the project were to explore the interplay between the following questions:

• What are the ethical and social issues at stake?
• What is mathematically possible and what is technically feasible? In particular, the project considered how privacy can be defined mathematically, and what is theoretically possible, alongside practical questions of implementation.
• What business models and incentive systems can deliver these technologies?


Chapter one looks at some of the drivers for the use of PETs, in particular the types of risks associated with the analysis of personal data, and the way current regulation in this domain may encourage the uptake of PETs. Chapter two explores opportunities for PETs-enabled data analysis and considerations for the use of PETs as a whole, before taking a deeper dive into a selection of five PETs. Chapter three brings together the rationale for the recommendations made in this report.

‘PETs’ is an umbrella term covering a broad range of technologies and approaches – from a piece of tape masking a webcam to advanced cryptographic techniques4,5. This project did not consider the role of PETs in private communications. We focus on a subset of five PETs identified during the scoping of the project as being particularly promising to enable privacy-aware data collection, analysis and dissemination of results: homomorphic encryption, trusted execution environments, secure multi-party computation, differential privacy, and personal data stores.

They represent a quite diverse set of approaches, highlighting the different ways that distinct communities – such as systems security/hardware, statistics and cryptography – are tackling similar problems.

We recognise that some of these are not technologies per se, but rather concepts or definitions which can be fulfilled by technological means; however, it is common practice to use ‘PETs’ as shorthand for the broader field. Where possible, we draw out how these five PETs may be combined with one another or with other technologies. For example, blockchain6, which is not a PET in itself, can actually be made more privacy-preserving using PETs (see Case study 2, section 2.2.2). Additionally, section 2.3 summarises how these and other PETs can support privacy-preserving machine learning – enabling the use of a powerful form of data analysis.

Case studies have been used to help illustrate these technologies in practice, suggesting the potential and relevance of different PETs to a range of sectors and application areas (see section 2.2). Sharemind is a commercially available distributed database system, developed by Cybernetica, using secure multi-party computation to enable the safe sharing of sensitive information between companies (the idea underlying this approach is sketched below). Microsoft has developed CoCo, a confidential blockchain framework where trusted processors add confidentiality, performance and governance to blockchain. NHS Digital is using Privitar's de-identification product, Privitar Publisher, which uses homomorphic encryption to enable the safer sharing and linkage of data between authorised parties. The US Census Bureau has used differential privacy and is planning on using it on a large scale for the release of statistics from the upcoming 2020 census. Finally, CitizenMe is a phone app, and an example of a personal data store, that enables users to choose what data to share and with whom.
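As a rough illustration of the idea behind secure multi-party computation systems such as Sharemind, the minimal Python sketch below (an illustrative toy under stated assumptions, not code from any of the products named above) uses additive secret sharing: each sensitive value is split into random shares that individually reveal nothing, and only the combined result is ever reconstructed.

```python
import secrets

PRIME = 2_147_483_647  # public modulus; all share arithmetic is modulo this value

def share(value: int, parties: int = 3) -> list[int]:
    """Split a value into additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME

# Two organisations hold sensitive figures they are unwilling to disclose.
value_a, value_b = 41_000, 58_500
shares_a, shares_b = share(value_a), share(value_b)

# Each computing party adds together the single share of each input it holds...
sum_shares = [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]

# ...and only the combined result is reconstructed, never the raw inputs.
assert reconstruct(sum_shares) == value_a + value_b
print(reconstruct(sum_shares))  # 99500
```

A real MPC deployment adds a network protocol, a threat model and support for richer operations than addition; the sketch only conveys why no single party ever needs to see another party's raw input.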

4. ENISA 2014 Privacy and data protection by design – from policy to engineering (see https://www.enisa.europa.eu/publications/privacy-and-data-protection-by-design, accessed 8 February 2019)

5. Office of the Privacy Commissioner of Canada 2017 Privacy Enhancing Technologies – A Review of Tools and Techniques (see https://www.priv.gc.ca/en/opc-actions-and-decisions/research/explore-privacy-research/2017/pet_201711/, accessed 12 February 2019)

6. Blockchain is an open, distributed ledger that can record transactions between several parties efficiently and in a verifiable and permanent way. See Introduction, Key terms and definitions.


Methodology
This report was informed by a series of conversations with a broad range of stakeholders from academia, industry, government and civil society, two interdisciplinary workshops on 16 May 2018 and 18 July 2018 (on the business and societal needs for PETs, and trends affecting their uptake; and on the state of play of research and implementation, respectively), and consideration by an expert Working Group. The report was reviewed by expert readers and peer reviewers. Full details of the workshop speakers, Working Group members, expert readers and peer reviewers are provided in the Appendix.

Key terms and definitions
This report draws on some concepts and models from business, technology and data analytics. Here is a concise glossary of 23 key terms used:

B2B: Business-to-business; application or service provided by a business to another business.

B2C: Business-to-consumer; application or service provided by a business to an individual.

Blockchain: an open, distributed ledger that can record transactions between several parties efficiently and in a verifiable and permanent way. Blockchain is not a PET.

Derivation: part of a de-identification process where the granularity of a data item is reduced (eg date of birth to year of birth). This is an irreversible activity. It is also known as sub-sampling.

Differential privacy: security definition which means that, when a statistic is released, it should not give much more information about a particular individual than if that individual had not been included in the dataset. The differential privacy definition allows one to reason about how much privacy is lost over multiple queries (see privacy budget; a formal statement is sketched after this glossary).

Epsilon: see privacy budget.

FHE: Fully homomorphic encryption (FHE) refers to encryption schemes where it is possible to compute any polynomial function on the data, which means both additions and multiplications.

Homomorphic encryption (HE): a property that some encryption schemes have, so that it is possible to compute on encrypted data without deciphering it.

Integrity: confidence that data is not tampered with.

MPC: see secure multi-party computation.

Noise: a random alteration of data/values in a dataset so that the true data points (eg personal identifiers) are not as easy to identify (a simple noise-addition sketch is given after this glossary).

Personal Data Store (PDS): systems that provide the individual with access and control over data about them, so that they can decide what information they want to share and with whom.

Partial Homomorphic Encryption (PHE): encryption supporting only additions or only multiplications (also referred to as additive homomorphic encryption and multiplicative homomorphic encryption).


Privacy budget, or differential privacy budget, or epsilon: quantitative measure of how much the risk to an individual's privacy may increase due to the inclusion of that individual's data in the inputs to the algorithm.

Privacy Enhancing Technologies (PETs): an umbrella term covering a broad range of technologies and approaches that can help mitigate security and privacy risks.

Private Information Retrieval (PIR): an MPC protocol allowing a user to query a database whilst hiding the identity of the data retrieved.

Private Set Intersection (PSI): secure multi-party computation protocol where two parties compare datasets without revealing them in an unencrypted form. At the end, each party knows which items they have in common with the other. There are some scalable open-source implementations of PSI available.

Redaction: part of a de-identification process where an identifier (such as a name) is deleted.

Secure Multi-Party Computation (SMPC or MPC): a subfield of cryptography concerned with enabling private distributed computations. MPC protocols allow computation or analysis on combined data without the different parties revealing their own private input.

Somewhat Homomorphic Encryption (SHE): encryption supporting a limited number of both additions and multiplications on encrypted data.

Tokenisation: consistently obscuring a data item (eg an NHS number) by replacement with a token, such as a regular expression, as part of a de-identification process. This is a reversible activity.

Trusted Execution Environment (TEE): isolated part of secure processors that allows the isolation of secret code from the rest of the software running on a system, in order to achieve confidentiality of the data. Trusted execution environments are also known as secure enclaves.

Zero-knowledge proof: method by which one party can prove to another party that they know a value x, without conveying any information apart from the fact that the statement is true.
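For readers who want the mathematics behind the differential privacy and privacy budget entries above, the standard textbook formulation (not wording from the report) is sketched below.

```latex
% A randomised mechanism M satisfies \varepsilon-differential privacy if, for
% every pair of datasets D and D' differing in one individual's record, and
% for every set S of possible outputs,
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S].

% Smaller \varepsilon means stronger protection. Under basic sequential
% composition, answering k queries with budgets \varepsilon_1, \ldots,
% \varepsilon_k consumes a total budget of at most
\varepsilon_{\mathrm{total}} = \sum_{i=1}^{k} \varepsilon_i .
```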

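The ‘noise’ entry above can be made concrete with the Laplace mechanism, the simplest way of achieving differential privacy for a numeric query: the true answer is perturbed with noise whose scale is the query's sensitivity divided by the privacy budget ε. The sketch below is illustrative only – the function name, the dataset and the ε values are assumptions, not recommendations from the report.

```python
import numpy as np

def dp_count(flags: list[bool], epsilon: float) -> float:
    """Release a differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one person's
    record changes the true count by at most 1.
    """
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return sum(flags) + noise

# Example: how many records in a dataset carry some sensitive attribute?
records = [True] * 120 + [False] * 880
print(dp_count(records, epsilon=0.1))  # very noisy answer, strong protection
print(dp_count(records, epsilon=1.0))  # less noise, weaker protection
```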

Chapter one
Data sensitivity and protections: what are the issues at stake?
Data might be considered sensitive for a range of reasons, including personal privacy, commercial sensitivity or national security.

This section looks at some of the drivers for the use of PETs, in particular the types of risks associated with the analysis of personal data, and the way current regulation in this domain may encourage the uptake of PETs.

1.1 Data sensitivity
Growing computing power, the volume of information generated every day, and their wider availability are making both benevolent applications and attacks rapidly more powerful. What seemed inconceivable in the past may now become possible. For example, using a smartphone comes with the risk of a profile containing sensitive attributes being created about you without your full knowledge7; and information about you can be inferred from your contacts8. Also, in the case of machine learning models, which are usually trained using a large dataset, there is an expectation that the training dataset cannot be recovered from the trained model, but this is not true in all cases9. As data collection and use expand, the analysis of available datasets can make it possible to identify individuals and information about them; data privacy management is therefore also about identity management.

Personal privacy is not the only concern: data might be commercially sensitive or related to national security, for example. Individuals or organisations might be keen to share data but want to restrict whom they are sharing information with and what information they want to share. PETs can help achieve such restrictions in different ways.

An additional need for new PETs is being driven by cloud computing, with the need for the data controller to give the data processor specific and limited capabilities.

People and organisations are becoming more aware of the individual and group harms caused by misuse of data and data breaches – for example the Facebook and Cambridge Analytica scandal or the Equifax breach10,11.

7. New York Times 2017 Your Apps Know Where You Were Last Night, and They're Not Keeping It Secret (see https://www.nytimes.com/interactive/2018/12/10/business/location-data-privacy-apps.html, accessed 12 February 2019)

8. Such JM and Criado R. 2018 Multiparty Privacy in Social Media. Communications of the ACM 61, 74–81. (see https://cacm.acm.org/magazines/2018/8/229766-multiparty-privacy-in-social-media/fulltext, accessed 12 February 2019)

9. Veale M et al 2018 Algorithms that remember: model inversion attacks and data protection law. Phil. Trans. R. Soc. A 376. (see http://rsta.royalsocietypublishing.org/content/376/2133/20180083, accessed 12 February 2019)

10. Information Commissioner's Office 2018 ICO issues maximum £500,000 fine to Facebook for failing to protect users' personal information (see https://ico.org.uk/about-the-ico/news-and-events/news-and-blogs/2018/10/facebook-issued-with-maximum-500-000-fine/, accessed 12 February 2019)

11. Information Commissioner's Office 2018 Credit reference agency Equifax fined for security breach (see https://ico.org.uk/about-the-ico/news-and-events/news-and-blogs/2018/09/credit-reference-agency-equifax-fined-for-security-breach/, accessed 12 February 2019)


It should be emphasised that the use of PETs does not in itself automatically make an analysis legal, ethical or trustworthy. Many ethical questions arise through the data analysis pipeline, eg whether the purpose of data use is socially beneficial, whether it might result in disadvantages for certain individuals or groups, whether the data has been collected appropriately, and so on. Implementing PETs can ensure that the methods by which data is used include protection against specific privacy risks, but it does not tell us anything about these broader ethical concerns.

1.2 Privacy risks and the data analysis pipeline
To understand how PETs protect against privacy risks it is important to acknowledge that ‘privacy’ is a term which has many different meanings12,13. Privacy has been referred to in legal concepts such as the right to be left alone, data protection rights, rights to control or ‘own’ personal data, secrecy, anonymity, and the right to respect for private and family life14. To consider the way PETs can mitigate privacy risks that stem from data analysis, it is necessary to establish at the outset different kinds of privacy risk. First of all, for any given data analysis, one might ask generally how much sensitive information it risks revealing. This question has several possible angles and interpretations:

• How much does the analysis reveal about the whole population or group from which the data used for the analysis originated? (Note that there is a definitional issue here: some might argue that this question relates rather to fairness and discrimination, and others might point out that it also affects the privacy of individuals in the population.)

• Does the analysis reveal whether someone or a specific entity is included in the dataset that was used to conduct the analysis?

• How much does an analysis reveal about sensitive attributes of specific individuals or entities in the dataset?

• To whom is information revealed and what might they do with it?

• How sensitive are the input, intermediate values and output of an analysis?

Secondly, it is also important to consider and test whether or not a system presents vulnerabilities. Cryptographers and other experts have set goals for security and privacy against which systems can be assessed. The security engineering community specified a set of testable goals, in particular confidentiality, integrity and availability (CIA). Echoing this triad of security requirements, testable goals for privacy engineering have also been proposed: the European Network and Information Security Agency (ENISA) proposed unlinkability15, transparency and intervenability16;

12. Op. Cit. 2

13. The Israel Academy of Sciences and Humanities and The Royal Society 2019 Israel-UK Privacy and Technology workshop –
    note of discussions (see https://royalsociety.org/topics-policy/projects/privacy-enhancing-technologies/, accessed March 2019)

14. Article 8, European Convention on Human Rights (see https://www.echr.coe.int/Documents/Convention_ENG.pdf,
    accessed 12 February 2019)

15. ‘Unlinkability of two or more items of interest means that within the system (comprising these and possibly other items) from the attacker’s
    perspective, these items of interest are no more and no less related after his observation than they are related concerning his a-priori
    knowledge’ - Pfitzmann A and Hansen M. 2005 Anonymity, Unlinkability, Unobservability, Pseudonymity, and Identity Management – A
    Consolidated Proposal for Terminology (see https://www.freehaven.net/anonbib/cache/terminology.pdf, accessed 8 February 2019)

16. Op. Cit. 4


in the US, the National Institute of Standards and Technology (NIST) proposed predictability, manageability and disassociability17.

Thirdly, an interesting proposition from academia is the development of formal mathematical approaches to privacy to support a robust connection between regulation and practical enforcement18. Formalism potentially allows a shift from informal requirements and goals (‘obey the law’, ‘be ethical’) towards quantifiable measurement of privacy enforcement, such as the ‘privacy budget’ parameter in differential privacy (section 2.2.4; an illustrative definition is given at the end of this section). However, this would require matching legal and mathematical definitions, as demonstrated by Kobbi Nissim and colleagues (2018)19. It is also important to recognise that technology is not value-neutral, and to scrutinise how governance-sensitive choices are made – see in particular the subsequent discussion of setting the value of the ‘privacy budget’ parameter when using differential privacy (section 2.2.4). In any case, technical solutions need to be grounded in the legal framework.

Finally, it is useful to consider the risks and harms associated with each stage of a typical data analysis pipeline – collection, processing and dissemination – and where PETs may be applied. Different types of potentially problematic actions in the data analysis pipeline can result in privacy harms (see Box 1 and Figure 1, focusing on harms to individuals). Note that harms and actions are distinct. For instance, different actions (eg ‘identification’ or ‘exposure’) can result in the same harm (eg distress)20. Certain tools, such as PETs, can reduce the chance of specific problematic data actions resulting in a harm, but they do not necessarily stop other actions which could result in the same or different types of harm.
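
To give a flavour of what such a formal approach looks like, the standard definition of differential privacy (discussed further in section 2.2.4) can be stated in a few lines of mathematics. This is the common textbook formulation rather than a definition drawn from any source cited in this report:

A randomised algorithm $M$ is $\varepsilon$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in the data of a single individual, and for every set $S$ of possible outputs,
\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].
\]

The parameter $\varepsilon$ is the ‘privacy budget’: the smaller its value, the harder it is to tell from the output whether any particular individual’s data was used, and the stronger the privacy guarantee – at a cost in accuracy.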

                   17. National Institute of Standards and Technology 2017 An Introduction to Privacy Engineering and Risk Management in Federal
                       Systems (see: https://doi.org/10.6028/NIST.IR.8062, accessed 12 February 2019)

                   18. Nissim K and Wood A. 2018 Is Privacy Privacy? Phil. Trans. R. Soc. A 376. (see https://doi.org/10.1098/rsta.2017.0358,
                       accessed 12 February 2019)

                   19. Nissim K et al. 2018 Bridging the gap between computer science and legal approaches to privacy. Harvard Journal of Law
                       & Technology 31. (see https://privacytools.seas.harvard.edu/publications/bridging-gap-between-computer-science-and-legal-
                       approaches-privacy, accessed 12 February 2019)

                   20. Op. Cit. 17


   BOX 1

   Daniel Solove’s taxonomy of privacy

   Different types of actions in a data analysis pipeline can result in privacy harms. One way of understanding these different harms is through Daniel Solove’s taxonomy of privacy21. Focusing on the analysis of personal data, problematic data actions include:

   • ‘Aggregation’, the gathering together of information about a person. A piece of information here or there is not very telling. But when combined together, bits and pieces of data begin to form a portrait of a person. The accumulation of data and increasing power of analytics mean that aggregation is easier and more insightful than ever before. This presents both potential benefits (eg personalised recommendations by online retailers) and potential harms (eg potential distress or other harms resulting from a profile containing sensitive attributes being created about an individual without their full knowledge)22.

   • ‘Identification’ is connecting data to a specific individual. An obvious benefit is identity verification. Identification can result in harm if an individual would rather stay anonymous. Of note, aggregation can lead to the identification of an individual, by linking ‘de-identified’ data (where personal identifiers have been removed) and other data.

   • ‘Insecurity’ is a problem caused by the way our information is handled and protected. It might involve glitches, a lack of cybersecurity, abuses and illicit uses of information. One major harm that can result from insecurity is identity theft.

   • ‘Exclusion’ refers to the failure to provide individuals with notice and input about their records. Insecurity and exclusion can create a sense of vulnerability, uncertainty and powerlessness amongst individuals from whom the data comes.

   • ‘Disclosure’ occurs when certain information about a person is revealed to others. Disclosure can threaten people’s security and might limit their freedom of association and expression.

   • ‘Exposure’ involves the exposing to others of certain physical and emotional attributes about a person. Such exposure risks creating embarrassment and humiliation.

   • ‘Intrusion’ involves invasions or incursions into one’s life, whether in the physical or digital space. For example, spam and junk mail are intrusions that sap people’s time.

21. Solove DJ. 2006 A Taxonomy of Privacy. University of Pennsylvania Law Review 154, 477–560. (see https://www.law.upenn.edu/
    journals/lawreview/articles/volume154/issue3/Solove154U.Pa.L.Rev.477(2006).pdf, accessed 12 February 2019)

22. Op. Cit. 7


FIGURE 1

Mapping of data lifecycle stages and problematic actions presenting a risk to privacy.

Data flows from the data subject through information collection to the data holders (information processing) and on to information dissemination; invasions act back on the data subject directly. The problematic actions (risks) associated with each stage are:

• Invasions (acting on the data subject): intrusion, decisional interference.

• Information collection: surveillance, interrogation.

• Information processing (by data holders): aggregation, identification, insecurity, secondary use, exclusion.

• Information dissemination: breach of confidentiality, disclosure, exposure, increased accessibility, blackmail, appropriation, distortion.

Source: Solove DJ. 2006 A Taxonomy of Privacy. University of Pennsylvania Law Review 154, 477–560. (see https://www.law.upenn.edu/journals/lawreview/articles/volume154/issue3/Solove154U.Pa.L.Rev.477(2006).pdf, accessed 12 February 2019)

1.3 The legal context for data processing: personal data and the GDPR
Considering how technologies can be used as a means of governance is the essence of the ‘data protection by design and default’ approach required by the General Data Protection Regulation (GDPR), which came into force in May 2018 and introduced new protections across the European Union. Following many of the principles at the heart of the GDPR, new data protection regimes are being adopted across the world. Such regulatory change builds a fresh case for the development and uptake of PETs, which could overcome their apparent ‘failure’ to go mainstream23. This report is not intended to provide a comprehensive analysis of all relevant legal frameworks with regards to the processing of personal or otherwise sensitive data (such as those mentioned in section 1.2) – rather we take a closer look at the GDPR in the context of the protection of personal data.

The ‘data protection by design and default’ approach introduced by the GDPR includes a number of data protection principles, including data ‘minimisation’ – to collect and store only the minimum required amount of data for the

                   23. Op. Cit. 5


purpose of a given application. An increased emphasis on data minimisation could lead to a cultural shift within the organisations that handle data. In particular, the GDPR challenges business models that involve data hoarding and constitutes a push for business models that involve privacy protection. At the moment there is a general culture of accumulating data and working out purpose and use later. However, there might be a counter-trend towards more focused data collection24 based on specific variables of interest and specific hypotheses to be tested. This move could potentially encourage the development of data-enabled technologies that utilise small, high quality data, rather than relying on very large datasets. This in turn could open up new opportunities.

The regulation also recommends that data users employ ‘safeguards’ to ensure their processing of personal data is compliant with the regulation. Personal data is information that relates to an identified or identifiable individual25. Organisations wanting to adopt a privacy-preserving approach may often ask for a solution that transforms a dataset containing personal data into an anonymised one, so that they can perform computation on it. A key approach is pseudonymisation, a data management procedure by which personally identifiable information fields are replaced by artificial identifiers. However, several studies have shown that pseudonymised datasets can in certain cases be re-identified by linking them with other data26. k-anonymisation is an alternative approach, in which certain variables in a dataset are suppressed or generalised so that the information from any single individual cannot be distinguished from that of at least k-1 other individuals in the dataset (a brief illustrative sketch of both approaches is given at the end of this section). But even this approach can still lead to sensitive attribute disclosure in some cases.

This has prompted the Information Commissioner’s Office to consider ways it could address data that is not per se ‘personal data’ but might still contain sensitive information27. New approaches, such as the PETs described in this report, need to be considered as alternative safeguards.

The GDPR places an increased emphasis on accountability and, regarding consent for data processing in particular, it requires more specificity about the purposes that data is used for. Personal data stores, one of the PETs considered in more detail in chapter two, have been proposed as one solution to help the ‘data subject’ manage granular consent and provide individuals with control over their data.

Finally, the GDPR also introduces mandated data ‘portability’ – a data subject has the right to receive data about them from the controller in a structured, commonly used and machine-readable format. This could, for instance, help consumers switch services, and might in turn create new markets such as a service-switching market. Could PETs be part of the solution and help share consumer data in a secure way?
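
As a purely illustrative sketch of the two approaches described above – replacing direct identifiers with artificial ones, and checking whether a set of quasi-identifying variables satisfies k-anonymity – the short Python fragment below uses a keyed hash for pseudonymisation and a simple group count for the k-anonymity test. The field names and key are hypothetical and the code is not drawn from any system described in this report.

import hmac
import hashlib
from collections import Counter

SECRET_KEY = b"example-key-kept-separately-and-securely"  # hypothetical key

def pseudonymise(identifier):
    # Replace a direct identifier with an artificial one. This is
    # pseudonymisation, not anonymisation: the data may still be
    # re-identifiable by linking it with other datasets.
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

def is_k_anonymous(records, quasi_identifiers, k):
    # A dataset is k-anonymous with respect to the chosen quasi-identifiers
    # if every combination of their values is shared by at least k records.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"id": pseudonymise("1234567890"), "age_band": "30-39", "postcode_area": "CB"},
    {"id": pseudonymise("0987654321"), "age_band": "30-39", "postcode_area": "CB"},
    {"id": pseudonymise("1122334455"), "age_band": "40-49", "postcode_area": "SW"},
]
print(is_k_anonymous(records, ["age_band", "postcode_area"], k=2))  # False: one record stands alone

In practice the quasi-identifying variables would be progressively generalised (for example, widening age bands) or suppressed until such a check passes and, as noted above, even a k-anonymous dataset can still allow sensitive attributes to be disclosed.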

24. The Economist 2018 Can the EU become another AI superpower? (see https://www.economist.com/business/2018/09/22/can-the-eu-
    become-another-ai-superpower, accessed 12 February 2019)

25. See https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/ (accessed 12 February 2019)

26. Narayanan A and Shmatikov V. 2008 Robust De-anonymization of Large Sparse Datasets (see https://www.cs.utexas.edu/~shmat/
    shmat_oak08netflix.pdf, accessed 12 February 2019)

27. The British Academy and The Royal Society 2017 Data management and use: Governance in the 21st century –
    Priorities for data governance: discussions at a British Academy and Royal Society seminar on 16 October 2017.
    (see https://royalsociety.org/topics-policy/projects/data-governance/, accessed 12 February 2019)


                            Chapter two
                            PETs today: capabilities
                            and limitations

This section explores the opportunities for PETs-enabled data analysis and considerations for the use of PETs as a whole, before taking a deeper dive into a selection of five PETs. Finally, section 2.3 shows how different PETs might be used to achieve one type of analysis, focusing on the example of machine learning.

The overviews given here provide a snapshot of this emerging area, with the understanding that a number of new methods are being proposed and are developing at different rates; the five selected PETs are presented as examples of some of the main classes of methods at the current time. Some of these technologies are beginning to form the basis of real-world applications, and the case studies included demonstrate their potential for use; others remain some distance away from practical use and uptake, being at the point of fundamental research. This section shows what is possible, whilst giving a sense of the technical and practical limitations of the technologies at their current stage of development.

2.1 PETs and privacy-preserving data analysis
2.1.1 What are the potential opportunities?
At a high level, PETs can reduce the privacy risk associated with existing processing, reduce the friction of processing which otherwise could not take place because it is deemed too high risk, and create entirely new opportunities to access and use data in ways which had not previously been thought of. Once developed to the appropriate level of technical maturity, PETs could enable secure and controlled data access, and a range of new data-enabled products or services, in circumstances that would otherwise raise legal, reputational, political, business or competition concerns due to the potential for privacy risks. In this respect, PETs may be an enabler of data use, rather than an additional layer of cybersecurity defence. Here we discuss some of the potential benefits offered by the use of PETs.

A system of certification for PETs could enable organisations using them to provide assurance that they are sharing and processing data in a secure, privacy-preserving way, assuming the appropriate standard for the certifications could be agreed by relevant stakeholders. In so doing, these technologies could help open up datasets to more external parties, and thus encourage innovation. PETs could, for example, enable government departments to share data with one another to allow analysis that would provide benefits to citizens through improved services – for example they could support the sharing of data between the National Health Service (NHS) and the Department for Work and Pensions (DWP) for administering health-related benefits.


In addition, PETs may help remove the need for government departments and agencies to hold large datasets in places of restricted access where analysis can only be performed in situ, which is currently the case, for instance, for Department for Education data28. Instead, PETs might allow secure cloud-based computation. They could also enable wider use of distributed processing on data stored on citizens’ smartphones and other connected objects, thus saving the need for servers that could be subject to security breaches29.

Within organisations, PETs could support cross-team working, by enabling different internal teams to work on data that they otherwise would not be able to match or view because of data protection rules. In academia, the safe sharing of data with the research community could also lead to innovations benefiting citizens. However, this would not detract from the need to assess whether it is legal or ethical to carry out an analysis or give access to the data in the first place30.

PETs could also help commercial organisations introduce new business models and use consumer data in new ways. Personal data stores are an interesting example of such an application (see section 2.2.5).

2.1.2 What sort of protections do PETs provide?
There is currently no technology that is applicable to every single situation of privacy-preserving data analysis. Different PETs can be used to achieve distinct aims (see Summary table), such as:

• securely providing access to private datasets;

• enabling joint analysis on private data held by several organisations;

• securely outsourcing computations on private data to the cloud;

• decentralising services that rely on user data.

It is worth noting that some PETs might be better suited for use by organisations (business-to-business; B2B), and others for use by individuals (business-to-consumer; B2C). For example, cloud providers may want to employ secure hardware or techniques based on encryption to protect code and data on their platform, whereas individuals may benefit from using personal data stores and other PETs designed for individuals31.

28. See https://www.gov.uk/guidance/how-to-access-department-for-education-dfe-data-extracts (accessed 12 February 2019)

29. The Royal Academy of Engineering 2018 Internet of Things: realising the potential of a trusted smart world (see https://www.raeng.org.
    uk/publications/reports/internet-of-things-realising-the-potential-of-a-tr, accessed 12 February 2019)

30. For example, in 2017 the Digital Economy Act provided the Office for National Statistics (ONS) with permissive and mandatory
    gateways to receive data from all public authorities and Crown bodies, and new powers to mandate data from some UK businesses.
    In limited circumstances data held by ONS may also be shared with the devolved administrations solely for statistical purposes.
    (see https://www.ons.gov.uk/aboutus/transparencyandgovernance/lookingafterandusingdataforpublicbenefit/policies, accessed
    12 February 2019)

31. Op. Cit. 5


The choice of a PET also requires considering forms of control or oversight, or trust models. In any given system, whether centralised, decentralised or distributed32, trust depends on the context and on who ‘sees’ unencrypted data. Most of the PETs discussed in this report have their origins in the field of cryptography, which often frames challenges in terms of ‘attacker’ models (or threat models), ie what attacks a certain system might be vulnerable to. This has implications for the ways in which these PETs manage data access, for example:

• secure multi-party computation specifically removes the need for a central trusted authority, with which parties would otherwise need to share information (section 2.2.3, Figure 5);

• centralised and distributed differential privacy (section 2.2.4) come with different trust models: in centralised (aka ‘global’) differential privacy, noise33 is added upon release of the output, which means trust lies in a central organisation, whilst in distributed (aka ‘local’) differential privacy noise is added at the time of data collection. In other words, the risk of sensitive information being disclosed in the case of an attack on the central authority is lower if that organisation uses distributed differential privacy;

• personal data stores provide individuals with the means to choose and control who they want to trust with the data they generate. They might also offer the possibility to process data locally rather than sending the raw data to a central authority, whose concentration of data makes it an attractive target for hackers (also known as a ‘honeypot’).

Finally, it should be noted that there is a difference between ways of specifying privacy definitions and mechanisms for achieving them. Because of the complexity of the context in which PETs operate, guaranteeing privacy in computing requires a solid mathematical definition, often referred to as a ‘security definition’ or a ‘privacy definition’. Differential privacy (section 2.2.4) is an example of a privacy definition – it is a way of measuring privacy. Each privacy definition might have different mechanisms by which it can be achieved.

For example, there might be several different ways of injecting random noise into a computation, each resulting in a different ‘differentially private’ algorithm. These algorithms would all satisfy the definition of differential privacy, and thus offer the same protection, but some might be more accurate than others, resulting in different scalability trade-offs (a brief illustrative sketch is given below).

2.1.3 The cost of using PETs – accuracy, utility and financial costs
How else do PETs alter data applications? When using PETs, there are trade-offs. Privacy engineers say that PETs incur a cost in terms of ‘utility’. For different technologies, the cost in utility might be of a different nature. For example, with differential privacy, adding noise to a dataset entails a loss of some useful information, so there is a cost in terms of accuracy. In the case of PETs where computation happens on encrypted data, such as homomorphic encryption and secure multi-party computation, the main cost to utility is in terms of computational resources (time, computing power).
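
To make the distinction between these trust models, and between a privacy definition and the mechanisms that achieve it, more concrete, the following minimal Python sketch shows two standard noise-adding mechanisms: the Laplace mechanism applied centrally to an aggregate count, and randomised response applied locally to each individual’s answer. It is an illustrative sketch only – the parameter values are arbitrary and the code is not taken from any system assessed in this report – but both mechanisms satisfy the definition of differential privacy, with different accuracy trade-offs.

import random

def central_count(values, epsilon):
    # Centralised ('global') differential privacy: a trusted data holder
    # computes the true count and adds Laplace noise, calibrated to the
    # sensitivity of a counting query (1), before releasing the result.
    true_count = sum(values)
    laplace_noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + laplace_noise

def randomised_response(value):
    # Distributed ('local') differential privacy: each individual perturbs
    # their own yes/no answer before it leaves their device, so the
    # collector never sees the true value (here epsilon = ln 3).
    if random.random() < 0.5:
        return value              # answer truthfully
    return random.random() < 0.5  # answer with a fair coin flip

# The collector can still estimate the population-level proportion of 'yes'
# answers from the noisy reports, but any single report is plausibly deniable.
reports = [randomised_response(True) for _ in range(1000)]
estimated_proportion = 2 * (sum(reports) / len(reports)) - 0.5
print(central_count([True] * 1000, epsilon=0.1), estimated_proportion)

For the same number of respondents, the centrally added Laplace noise typically distorts the released count far less than the locally added randomised response noise, illustrating the trade-off between accuracy and the degree of trust placed in a central party.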

                    32. Goyal S. 2015 Centralized vs Decentralized vs Distributed (see https://medium.com/delta-exchange/centralized-vs-decentralized-vs-
                        distributed-41d92d463868, accessed 12 February 2019)

                    33. Noise here refers to a random alteration of data values in a dataset so that the true data points (eg personal identifiers) are not as easy
                        to identify. See Introduction, Key terms and definitions.
