L8: Introduction to privacy-preserving computations - Privacy-preserving Technologies / LTAT.04.007

Page created by Don Love
 
L8: Introduction to privacy-preserving computations
Privacy-preserving Technologies / LTAT.04.007

Dan Bogdanov
dan.bogdanov@cyber.ee
Motivation – why privacy-preserving computing?
Example: Linking Tax And Education Data

Regulatory Barriers On Data Linking

Using privacy technologies to solve it

    Source data:
      10 million tax records,
      600 000 education records.
    Each record was uploaded using secret sharing (think: "encryption").
    Records were linked and processed using secure multi-party computation
    (think: "data not decrypted for processing").
    Data never existed outside the source in an unencrypted state.
    Solution based on Sharemind MPC.

    [Figure labels: education records (Ministry of Education and Research);
    employment tax records (Estonian Tax and Customs Board);
    Estonian Information System's Authority; Ministry of Finance IT Center; Cybernetica.]
[Figure: data flow of the study. Data is stored with secret sharing and processed with secure multi-party computation.]

    Tax and Customs Board:
      Extract data (employment tax payments), secret share and upload.
      Employment tax payments -> aggregate by month -> monthly income -> aggregate by year -> average yearly income.
      Employment tax payments -> expand by years and aggregate by person -> employment record of a person.
    Ministry of Education and Research:
      Extract data (higher study events), secret share and upload.
      Higher study events -> aggregate by person -> university career of a person.
    Joint processing:
      Merge by person's ID -> complete record of a person.
      Compute additional attributes and align tax payments -> analysis table.
      Analysis table -> analysis results -> recover results from shares -> statistical analyst.
Sharemind-powered Analytics

    Data scientists used analytics tools based on secure multi-party computation.
    The MPC system also prevented queries outside the study plan.
    Reports were given to industry, universities and the government.
    Result: no clear relation between working during studies and not graduating.

    [Figure labels: Estonian Information System's Authority; Ministry of Finance IT Center; Cybernetica;
    data analyst; universities, companies, policymakers.]
A privacy-preserving statistics tool inspired by R

Non-IT graduation rate is around 40%; IT graduation rate is around 20%

     Figure 1. Share of students graduating within the nominal study period, by year of matriculation,
     ICT and non-ICT curricula, bachelor's studies.
Non-IT and IT students have similar employment ratios, but IT students lost more in the financial crisis

     Figure 4. Share of students who worked during the nominal study period among all students, by year,
     ICT and non-ICT curricula, bachelor's studies.
DATA-DRIVEN SERVICES ON CONFIDENTIAL DATA

Regulatory status of the project

     In an official response, after a study of the system, the Estonian DPA
     suggested that
        neither the hosts of the servers running the statistics
        nor the analysts making the queries
        could feasibly re-identify individuals in the source database (this was pre-GDPR).
     The Internal Supervision Department of the Tax and Customs Board agreed
     to provide unmodified tax records after a code and process review.
     A follow-up legal review in the FP7 PRACTICE project by a researcher from the
     University of Göttingen suggested that the same precedent could hold under
     GDPR as well.
A general model for privacy-preserving computing
Concept of secure computing

       Standard tools: when a standard computer encrypts data, it must be
       decrypted before analysis.

       Secure computing: secure computing systems can analyze data without
       removing the encryption.

       [Figure: an encrypted database accessed through standard tools and through secure computing.]
Extended definition of secure multi-party computation
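     Roles: input parties IP1, ..., IPk; computing parties CP1, ..., CPl; result parties RP1, ..., RPm.

     Step 1: upload and storage of inputs.
        Each input party IPi distributes its input xi as xi1, ..., xil, one piece per computing party,
        so that computing party CPj stores x1j, ..., xkj.
     Step 2: secure computing.
        The computing parties CP1, ..., CPl compute on the stored pieces
        (the figure labels their outputs y1, ..., yl).
     Step 3: publishing of results.
        The result parties RP1, ..., RPm receive the results y1, ..., ym.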
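The three steps can be walked through in a few lines of Python. This is only a sketch under stated assumptions: it uses additive secret sharing modulo 2^32 as the protection mechanism, the computation is a plain sum, and the names `share` and `run_sum` as well as the input values are made up for illustration.

```python
import secrets

MOD = 2**32  # assumed share modulus; real systems choose their own ring or field


def share(x, n_parties):
    """Split x into n_parties additive shares that sum to x modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares


def run_sum(inputs, n_computing=3):
    # Step 1: every input party splits its value and sends share j to CP j.
    stored = [[] for _ in range(n_computing)]
    for x in inputs:
        for cp, s in zip(stored, share(x, n_computing)):
            cp.append(s)
    # Step 2: every computing party adds up the shares it holds, locally.
    result_shares = [sum(cp) % MOD for cp in stored]
    # Step 3: the result party receives the result shares and recombines them.
    return sum(result_shares) % MOD


salaries = [1200, 950, 3100]   # made-up inputs of three input parties
total = run_sum(salaries)
```

No single computing party sees anything but uniformly random numbers, yet the result party obtains the correct total.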
Technique: property-preserving cryptography

                       Analogy: symmetric crypto that preserves a
                       relation on inputs (e.g., order, equality).
                       Pros:
                         Low performance overhead.
                         Fits well into existing systems.
                       Cons:
                         Only allows a few operations (e.g., only
                         equality comparison or ordering).
                         Multi-user systems are a challenge, but can
                         be done with proxy re-encryption.
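As a rough illustration of an equality-preserving transformation, the sketch below replaces identifiers with deterministic keyed tags from Python's standard `hmac` module. This is an assumption-laden stand-in: an HMAC tag cannot be decrypted, so it only demonstrates the preserved relation (equality), not a full encryption scheme. The key and the identifiers are made up.

```python
import hashlib
import hmac


def equality_token(key: bytes, value: str) -> str:
    """Deterministic keyed tag: equal values give equal tokens under one key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"demo key, not for real use"            # hypothetical shared secret
tax_ids = ["EE-1001", "EE-1002", "EE-1003"]    # made-up identifiers
edu_ids = ["EE-1003", "EE-1001", "EE-2000"]

tax_tokens = {equality_token(key, i) for i in tax_ids}
edu_tokens = {equality_token(key, i) for i in edu_ids}

# The party doing the join only sees tokens, yet can count matching records.
matches = len(tax_tokens & edu_tokens)
```

The same property that makes the join cheap is also the limitation named above: the tokens support equality tests and nothing else, and they reveal which records repeat.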

Technique: homomorphic encryption

                     Analogy: asymmetric crypto that allows
                     addition and multiplication of ciphertexts.
                     Pros:
                        Fits well into existing systems.
                     Cons:
                        High performance overhead.
                        Multi-user systems are a challenge, but can
                        be done with proxy re-encryption.
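To make the analogy concrete, here is a toy version of the Paillier cryptosystem. Note the swap: Paillier is only additively homomorphic (it adds plaintexts and multiplies them by public constants), whereas schemes supporting both addition and multiplication of ciphertexts are considerably more involved. The primes are tiny and insecure, chosen only so the arithmetic is easy to follow; the code assumes Python 3.8 or newer for the modular inverse via `pow`.

```python
import math
import secrets

# Toy Paillier key pair with tiny, insecure primes (illustration only).
p, q = 1789, 1931
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                                 # valid because g = n + 1


def encrypt(m):
    """Encrypt 0 <= m < n under the public key (n, g)."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(c):
    """Decrypt with the private key (lam, mu)."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n


c1, c2 = encrypt(1200), encrypt(950)
c_sum = (c1 * c2) % n2       # multiplying ciphertexts adds the plaintexts
c_scaled = pow(c1, 3, n2)    # exponentiation multiplies by a public constant
```

Whoever computes `c_sum` and `c_scaled` needs only the public key; only the holder of the private key can decrypt the results.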

Technique: garbled circuits

                        Analogy: cryptographic versions of electrical
                        circuits.
                        Pros:
                           Flexible programming model.
                        Cons:
                           Medium performance overhead.
                           Fixed number of parties (can be solved by
                           combining with other techniques).
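The circuit analogy can be illustrated with a single garbled AND gate. This is a simplified sketch, not a complete protocol: the label length, the SHA-256 based pad and the zero tag used to recognise the right row are illustrative assumptions, and the oblivious transfer through which the evaluator would normally obtain its input label is omitted.

```python
import hashlib
import secrets

LABEL_LEN = 16
ZERO_TAG = bytes(LABEL_LEN)   # redundancy that marks a correct decryption


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def pad(label_a, label_b):
    """32-byte one-time pad derived from the two input labels."""
    return hashlib.sha256(label_a + label_b).digest()


def garble_and_gate():
    """Return wire labels and the shuffled, encrypted truth table of AND."""
    wires = {w: [secrets.token_bytes(LABEL_LEN) for _ in range(2)]
             for w in ("a", "b", "out")}
    table = []
    for x in (0, 1):
        for y in (0, 1):
            plaintext = wires["out"][x & y] + ZERO_TAG
            table.append(xor(pad(wires["a"][x], wires["b"][y]), plaintext))
    secrets.SystemRandom().shuffle(table)   # hide which row is which
    return wires, table


def evaluate(table, label_a, label_b):
    """The evaluator holds one label per input wire and learns one output label."""
    key = pad(label_a, label_b)
    for row in table:
        candidate = xor(key, row)
        if candidate[LABEL_LEN:] == ZERO_TAG:
            return candidate[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")


wires, table = garble_and_gate()
results = {}
for x in (0, 1):
    for y in (0, 1):
        out_label = evaluate(table, wires["a"][x], wires["b"][y])
        # Only the garbler knows which bit an output label stands for.
        results[(x, y)] = wires["out"].index(out_label)
```

The evaluator works purely on random-looking labels; larger circuits chain gates by feeding output labels into the next gate's table.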

Example: millionaire’s problem

Technique: secret sharing

                       Analogy: give a number of people a random
                       piece of each secret value and let them
                       collaborate to compute results.
                       Pros:
                            Low-to-medium performance overhead.
                            Flexible programming model.
                       Cons:
                            Distributed deployments do not fit into all
                            existing systems.
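The "collaborate to compute results" part is easiest to see by contrasting addition with multiplication. The sketch below assumes additive sharing modulo 2^32 and multiplies with a Beaver triple handed out by a hypothetical trusted dealer; this is one textbook approach and is not claimed to be the protocol of any particular system.

```python
import secrets

MOD = 2**32   # assumed modulus


def share(x, n=3):
    parts = [secrets.randbelow(MOD) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % MOD]


def reconstruct(shares):
    return sum(shares) % MOD


def add(xs, ys):
    """Addition needs no communication: each party adds its own two shares."""
    return [(x + y) % MOD for x, y in zip(xs, ys)]


def multiply(xs, ys):
    """Multiplication using a Beaver triple (a, b, a*b) from a trusted dealer."""
    n = len(xs)
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % MOD, n)
    # The parties open the masked values d = x - a and e = y - b.
    d = reconstruct([(x - s) % MOD for x, s in zip(xs, a_sh)])
    e = reconstruct([(y - s) % MOD for y, s in zip(ys, b_sh)])
    zs = [(c + d * bs + e * as_) % MOD
          for c, bs, as_ in zip(c_sh, b_sh, a_sh)]
    zs[0] = (zs[0] + d * e) % MOD   # the public term d*e is added by one party
    return zs


x_sh, y_sh = share(6), share(7)
total = reconstruct(add(x_sh, y_sh))
product = reconstruct(multiply(x_sh, y_sh))
```

Addition is free, while every multiplication costs a round of communication; that cost difference is what drives the performance of algorithms built on secret sharing.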

Technique: trusted execution environments

                       Analogy: think of a computer process that
                       hides the data from its owner.
                       Pros:
                         Minimal performance overhead.
                         Relatively easy to convert applications to work
                         on trusted execution environments.
                       Cons:
                         Side-channel attack mitigations are
                         complicated to implement.
Lecture exercise: modelling parties for a COVID-19
social distancing tracking application
Lecture task

     Think of an application that would support social distancing and limit infection
     rates. Write down very clearly what the expected benefit of the system is.
        Write down the list of input parties and the data they would provide.
        Write down the list of computing parties and describe the kind of processing they
        would perform.
        Write down the list of result parties and describe the outputs they would receive.
     Bonus tasks, time permitting:
        Think of minimizing personal data processing using process redesign.
        See if any of the secure computing paradigms described above could support
        your application.
     Prepare in 12 minutes and then we’ll have 1-2 students present their ideas.
Programmable privacy-preserving computations
PDK as an abstraction of a secure computing paradigm

     A protection domain kind (PDK) is a set of data representations, algorithms
     and protocols for storing and computing on protected data.
     Examples:
       SMC based on secret sharing,
       SMC based on garbled circuits,
       (fully) homomorphic encryption,
       trusted hardware (e.g., Intel SGX).

Protection domain as an instance of a PDK

     A protection domain (PD) is a set of data that is protected with the same
     resources and for which there is a well-defined set of algorithms and
     protocols for computing on that data while keeping the protection.
     Examples:
        data held by a fixed group of servers performing secure multi-party computation,
        data encrypted under a fixed key of a homomorphic encryption scheme.
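One way to read the two definitions is as class and instance: the PDK fixes representations and protocols, and a PD binds them to concrete resources. The following sketch is purely illustrative; the class names, the `classify` method and the server names are hypothetical, and additive sharing is just one possible kind.

```python
import secrets


class AdditiveSharingKind:
    """A PDK: how values are represented and which protocols act on them."""
    MOD = 2**32

    def __init__(self, servers):
        # A protection domain (PD) is this kind bound to concrete resources.
        self.servers = tuple(servers)

    def classify(self, x):
        n = len(self.servers)
        parts = [secrets.randbelow(self.MOD) for _ in range(n - 1)]
        parts.append((x - sum(parts)) % self.MOD)
        return Protected(self, parts)

    def add(self, u, v):
        if u.domain is not self or v.domain is not self:
            raise TypeError("values belong to different protection domains")
        return Protected(self, [(a + b) % self.MOD
                                for a, b in zip(u.shares, v.shares)])

    def declassify(self, u):
        return sum(u.shares) % self.MOD


class Protected:
    """A value together with the protection domain that guards it."""

    def __init__(self, domain, shares):
        self.domain, self.shares = domain, shares


pd_stats = AdditiveSharingKind(["server-a", "server-b", "server-c"])
pd_other = AdditiveSharingKind(["server-x", "server-y", "server-z"])

s = pd_stats.add(pd_stats.classify(20), pd_stats.classify(22))
answer = pd_stats.declassify(s)

try:
    pd_stats.add(pd_stats.classify(1), pd_other.classify(2))
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Both domains are of the same kind, but values protected by different server groups cannot be combined directly.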

Application model for privacy-preserving computing

       Layers shown: application; secure primitive operations; privacy-preserving algorithms; application logic.

       Secure primitive operations:
       • private outputs from private inputs,
       • have privacy proofs,
       • remain private under sequential or parallel composition,
       • optimized to have a low resource footprint.

       Privacy-preserving algorithms:
       • publish selected results to make system useful,
       • do not leak private inputs or show leakage as acceptable,
       • compositions of secure primitive operations,
       • optimize for running time.
Converting an algorithm to a privacy-preserving one

     We pick frequent itemset mining as a problem of choice.
     Frequent itemset mining is a data mining problem that helps with shopping
     basket analysis and the simplest kinds of recommender systems.
        What kind of things do people buy from stores together most often?
        If the service provider knows this, they can recommend one to a customer who is
        planning to buy the other.
     The simpler algorithms include Apriori (breadth-first search) and Eclat (depth-
     first search).
     We will now look at the basic primitive of frequent itemset mining and then
     build a privacy-preserving approach.

Privacy-preserving data representations

       Private data representations are the key toward
       designing privacy-preserving algorithms.

       Transactions as item lists:
       t1   rendang, nasi lemak, chicken satay
       t2   nasi lemak, lontong
       t3   chicken satay
       t4   rendang, nasi lemak
       t5   nasi lemak, chicken satay
       t6   nasi lemak, chicken satay
       t7   lontong

       The same transactions as a bit matrix:
              rendang   nasi lemak   lontong   chicken satay
       t1        1          1           0            1
       t2        0          1           1            0
       t3        0          0           0            1
       t4        1          1           0            0
       t5        0          1           0            1
       t6        0          1           0            1
       t7        0          0           1            0
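The conversion from item lists to the bit matrix can be sketched as follows; the function name `to_bit_matrix` is made up for this example. Every row ends up with the same length regardless of basket size, and each cell is a single bit that can be protected individually.

```python
ITEMS = ["rendang", "nasi lemak", "lontong", "chicken satay"]

TRANSACTIONS = {
    "t1": {"rendang", "nasi lemak", "chicken satay"},
    "t2": {"nasi lemak", "lontong"},
    "t3": {"chicken satay"},
    "t4": {"rendang", "nasi lemak"},
    "t5": {"nasi lemak", "chicken satay"},
    "t6": {"nasi lemak", "chicken satay"},
    "t7": {"lontong"},
}


def to_bit_matrix(transactions, items):
    """One row per transaction, one 0/1 column per item."""
    return {tid: [1 if item in basket else 0 for item in items]
            for tid, basket in transactions.items()}


matrix = to_bit_matrix(TRANSACTIONS, ITEMS)
```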
Calculating the support of an item

       The data representation allows for very efficient
       calculation of item supports.
              rendang   nasi lemak   lontong   chicken satay   |   nasi lemak
       t1        1          1           0            1         |       1
       t2        0          1           1            0         |       1
       t3        0          0           0            1         |       0
       t4        1          1           0            0         |       1
       t5        0          1           0            1         |       1
       t6        0          1           0            1         |       1
       t7        0          0           1            0         |       0
                                                               ∑ =     5
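In the clear, the support of an item is simply a column sum, as this small sketch shows (the helper name `support` is an illustrative choice). In a secret-shared setting a sum like this would typically be computed by each party on its own shares, without communication.

```python
ITEMS = ["rendang", "nasi lemak", "lontong", "chicken satay"]
MATRIX = [
    [1, 1, 0, 1],  # t1
    [0, 1, 1, 0],  # t2
    [0, 0, 0, 1],  # t3
    [1, 1, 0, 0],  # t4
    [0, 1, 0, 1],  # t5
    [0, 1, 0, 1],  # t6
    [0, 0, 1, 0],  # t7
]


def support(matrix, column):
    """Number of transactions that contain the item in the given column."""
    return sum(row[column] for row in matrix)


supports = {item: support(MATRIX, j) for j, item in enumerate(ITEMS)}
```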
Calculating support for a set of items

       Checking the joint support of a pair of items
       simply requires a multiplication
              rendang   nasi lemak   lontong   chicken satay   |   nasi lemak   x   chicken satay   =   nasi lemak & chicken satay
       t1        1          1           0            1         |       1        x         1         =       1
       t2        0          1           1            0         |       1        x         0         =       0
       t3        0          0           0            1         |       0        x         1         =       0
       t4        1          1           0            0         |       1        x         0         =       0
       t5        0          1           0            1         |       1        x         1         =       1
       t6        0          1           0            1         |       1        x         1         =       1
       t7        0          0           1            0         |       0        x         0         =       0
                                                                                                    ∑ =     3
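The same idea extends from pairs to larger itemsets: multiply the bits of all chosen columns row by row, then sum. The sketch below works on cleartext bits and uses the hypothetical helper `joint_support`; in a secure computation each product would be a secure multiplication, so the cost grows with the size of the itemset. It assumes Python 3.8 or newer for `math.prod`.

```python
from math import prod

# Columns: rendang, nasi lemak, lontong, chicken satay
MATRIX = [
    [1, 1, 0, 1],  # t1
    [0, 1, 1, 0],  # t2
    [0, 0, 0, 1],  # t3
    [1, 1, 0, 0],  # t4
    [0, 1, 0, 1],  # t5
    [0, 1, 0, 1],  # t6
    [0, 0, 1, 0],  # t7
]


def joint_support(matrix, columns):
    """Count rows in which every selected column holds a 1."""
    return sum(prod(row[c] for c in columns) for row in matrix)


pair = joint_support(MATRIX, [1, 3])        # nasi lemak & chicken satay
triple = joint_support(MATRIX, [0, 1, 3])   # rendang, nasi lemak, chicken satay
```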
Evaluating itemsets with a depth-first strategy

          Depth-first search would be intuitive for pruning.

           Level 1: { rendang }  { nasi lemak }  { lontong }  { chicken satay }

           Level 2: { rendang, nasi lemak }  { rendang, lontong }  { rendang, chicken satay }
                    { nasi lemak, lontong }  { nasi lemak, chicken satay }  { lontong, chicken satay }

           Level 3: { rendang, nasi lemak, lontong }  { rendang, nasi lemak, chicken satay }
                    { rendang, lontong, chicken satay }  { nasi lemak, lontong, chicken satay }

           Level 4: { rendang, nasi lemak, lontong, chicken satay }
Evaluating itemsets with a breadth-first strategy

          However, breadth-first search can be done in parallel.

           Level 1: { rendang }  { nasi lemak }  { lontong }  { chicken satay }

           Level 2: { rendang, nasi lemak }  { rendang, lontong }  { rendang, chicken satay }
                    { nasi lemak, lontong }  { nasi lemak, chicken satay }  { lontong, chicken satay }

           Level 3: { rendang, nasi lemak, lontong }  { rendang, nasi lemak, chicken satay }
                    { rendang, lontong, chicken satay }  { nasi lemak, lontong, chicken satay }

           Level 4: { rendang, nasi lemak, lontong, chicken satay }
Balancing optimizations with privacy preservation

     Challenge: exploring all possible itemsets is slow due to combinatorial
     explosion.
     Pruning the search tree requires us to declassify itemset supports during
     computation (leak?).
     Solution: consider that the algorithm will publish all frequent itemsets, as that
     is its intended goal.
     We will compare support to the threshold privately, only declassifying the
     result bit.
     We will prune the search tree based on that bit.
     Not a leak - if the itemset is frequent, we would have learned it from the
     outputs anyway.
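The pruning idea can be sketched as a breadth-first (Apriori-style) search in which the only value leaving the protected computation is the comparison bit. The code below merely simulates this: `is_frequent` stands in for a secure computation and evaluates on cleartext bits, the `declassified` log and the threshold of 3 are illustrative assumptions, and the data is the example matrix from the earlier slides.

```python
from itertools import combinations

ITEMS = ["rendang", "nasi lemak", "lontong", "chicken satay"]
MATRIX = [
    [1, 1, 0, 1], [0, 1, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0],
    [0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 0],
]
declassified = []   # log of everything the algorithm reveals


def is_frequent(columns, threshold):
    """Stand-in for a secure computation: only the comparison bit leaves it."""
    support = sum(all(row[c] for c in columns) for row in MATRIX)  # stays private
    bit = support >= threshold
    declassified.append((tuple(columns), bit))
    return bit


def frequent_itemsets(threshold):
    frequent = []
    level = [(c,) for c in range(len(ITEMS)) if is_frequent((c,), threshold)]
    while level:
        frequent.extend(level)
        known = set(level)
        size = len(level[0]) + 1
        candidates = sorted({tuple(sorted(set(a) | set(b)))
                             for a in level for b in level
                             if len(set(a) | set(b)) == size})
        # Prune: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(s in known for s in combinations(c, size - 1))]
        level = [c for c in candidates if is_frequent(c, threshold)]
    return [tuple(ITEMS[c] for c in cols) for cols in frequent]


result = frequent_itemsets(threshold=3)
```

With this threshold only five itemsets are ever tested instead of all fifteen non-empty ones, and the log contains nothing but itemset/bit pairs; the supports themselves are never revealed.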