Perturbative methods and tools for census data - Presentation at the Satellite meeting in Amsterdam on 12 September - EUROPA

Page created by Edgar Murray
 
Perturbative methods and tools for
census data
   Presentation at the Satellite meeting
   in Amsterdam on 12 September
Outline

 – Census 2011 and confidentiality
 – Details of the project
 – Content of the project ‘Harmonised protection of census
   data in the ESS’
 – Preparation phase
 – Operational phase
 – Definition of test scenarios
 – Record swapping
 – Cell key method
 – Tools
 – Questions

Census 2011 and confidentiality (1)

 European Census 2011 data represent an essential source
 of vital statistical information ranging from the lowest
 small-area geographical divisions to national and
 international levels

 Harmonised census tables of 32 European countries are
 available via the Census Hub
 (https://ec.europa.eu/CensusHub2/)

 Census data are detailed and confidential; protecting the
 census data is the responsibility of the member states

Census 2011 and confidentiality (2)

 In spite of the output harmonisation, international
 comparisons of census data are hampered by differing
 statistical disclosure control approaches

 In this Specific Grant Agreement (SGA), best practices for
 the Census 2021 have been defined and tested

Details of the project (1)

 – Start: 1 September 2016
 – End: 31 August 2017

 – Four WPs:
   – WP 1 Management (7 deliverables)
   – WP 2 Questionnaire (2 deliverables)
   – WP 3 Development and testing of the
     recommendations; identification of best practices (4
     deliverables)
   – WP 4 Dissemination (5 deliverables)

Details of the project (2)

 – Six countries involved:
   - CBS (Eric Schulte Nordholt, Peter-Paul de Wolf),
   - INSEE (Maël-Luc Buron),
   - Destatis (Sarah Gießing, Tobias Enderle),
   - HCSO (László Antal, Beata Nagy),
   - Statistics Finland (Annu Cabrera) and
   - SURS (Andreja Smukavec, Junoš Lukan)

Content of the project ‘Harmonised
protection of census data in the ESS’

 – Reviewed the country-specific data protection
   regulations and methods
 – Provided a harmonised approach to the protection of the
   Census 2021 (taking the national constraints into
   account)
 – Recommended appropriate statistical disclosure control
   methods for hypercubes to Member States
 – Recommended how to handle confidential cells in grid
   squares and regional breakdowns efficiently (risk of
   disclosure due to differencing)
Preparation phase

 – Inventory of the country-specific data protection
   regulations and methods
   - Reused already existing information
   - Collected new information via a questionnaire
 – Review of the census data requirements (breakdowns,
   formats, levels of detail, classifications, and geographical
   classifications such as country, NUTS, LAU2 and grid
   squares); the links between data were taken into
   account (including national versus ESS and other
   international requirements)

Operational phase

 – Development of the recommendations for the treatment
   of statistical confidentiality (by means of best practices)

 – Testing of the recommended approach(es)

 – Support to other NSIs that are willing to test

 – Adaptation of the recommended approach taking into
   account the feedback (after testing)

 – Reports to the relevant ESS bodies

Definition of test scenarios (1)

 Restrictions:
 – No global recodes (layout of hypercubes is fixed in the
   implementing census regulation)
 – No cell suppression (very difficult for linked hypercubes,
   and it would prevent European totals from being calculated)

 Complications:
 – 1 km² grid cells lead to many small cell values
 – 1 km² grid cells ↔ administrative regions (risk of
   disclosure due to differencing)
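
 The differencing risk can be made concrete with a small piece of arithmetic. The sketch below uses made-up, unperturbed counts (none of the numbers come from the project): when grid cells cover almost all of an administrative region, subtracting them from the regional total exposes the small count in the remainder.

```python
# Made-up unperturbed counts, for illustration only (not project data)
region_total = 1203                   # population of an administrative region
grid_cells_inside = [400, 395, 406]   # 1 km² grid cells lying entirely inside the region

# Population of the part of the region not covered by those grid cells,
# obtained purely by differencing two published outputs
residual = region_total - sum(grid_cells_inside)
print(residual)  # 2
```

 Neither output contains a small count on its own; the small value only appears once the two geographies are combined.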

Definition of test scenarios (2) to (4)

 [Figure slides; content not reproduced]

Definition of test scenarios (5)

 The Statistical Disclosure Control solution should not alter
 the spatial distribution of the grid data too much:
 – Zero-frequency grid cells should not be changed to
   positive frequencies too often
 – Rare non-zero frequencies in an area should not be
   changed much

 Usual disclosure risks:
 – Small counts (may lead to direct identification)
 – Attribute disclosure (a positive frequency may lead to
   disclosing information from a hypercube)

Definition of test scenarios (6)

 A flexible approach that Member States can adapt to
 national needs:
 – Record swapping (pre-tabular method)
 – Cell key method (post-tabular method)

 Record swapping and cell key method:
 – Enhanced variant of the cell key method developed by the
   Australian Bureau of Statistics (ABS)
 – Provided by the Office for National Statistics (ONS) and
   adapted in this SGA

Record swapping (1) to (3)

 [Figure slides; content not reproduced]

Cell key method (1)

 Assign each record a random number (record key, Rkey)

   Record   Rkey
   r1         54
   r2          4
   r3         93
   …
   rN         26

 For each cell, sum the Rkeys and apply a function to get a cell key (Ckey)

   Age by sex   Male   Female
   0-15           .       .
   16-24          .       4
   25-34          .       .
   …

   Records in the cell (16-24, Female):

   Record   Rkey
   r2          4
   r4         61
   r56         7
   r72        90
   Sum =     162

   e.g. take the last two digits → Ckey = 62

 Use a look-up table to get the perturbation value

                            Cell key
   Cell value     1    2    3    …   61   62   63    …   99
       1              +1
       2                   +1                  -1
       3                                                 +1
       4         -1                       +1
       5                   -1        -1
       …

 Apply the perturbation value to the cell

   Age by sex   Male   Female
   0-15           .       .
   16-24          .       5
   25-34          .       .
   …

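
 The four steps on this slide can be sketched in a few lines of Python. This is only an illustration using the example values from the slide, not the Tau-ARGUS or cellKey implementation. The function names are hypothetical, and treating (cell value, cell key) pairs that are blank in the look-up table as "no change" is an assumption.

```python
# Record keys of the four records in the cell (16-24, Female), as on the slide
record_keys = {"r2": 4, "r4": 61, "r56": 7, "r72": 90}

# (cell value, cell key) -> perturbation value, copied from the example look-up table.
# Assumption: pairs that are not listed leave the cell unchanged.
lookup = {
    (1, 2): +1,
    (2, 3): +1, (2, 63): -1,
    (3, 99): +1,
    (4, 1): -1, (4, 62): +1,
    (5, 3): -1, (5, 61): -1,
}

def cell_key(keys, modulus=100):
    """Sum the record keys and keep the last two digits."""
    return sum(keys) % modulus

def perturb(cell_value, ckey, table):
    """Add the perturbation found for (cell value, cell key), or 0 if none is listed."""
    return cell_value + table.get((cell_value, ckey), 0)

ckey = cell_key(record_keys.values())        # 162 -> 62
original = len(record_keys)                  # the cell count is the number of records
protected = perturb(original, ckey, lookup)
print(ckey, original, protected)  # 62 4 5
```

 Because the cell key depends only on the records in the cell, rerunning the sketch always gives the same perturbed value for that cell.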
Cell key method (2)

 – The cell key method is primarily used to protect against
   differencing
 – It is applied in addition to record swapping (which is
   considered the primary approach)
 – It takes into account the need to retain 1s and 2s in outputs
 – Structural zeroes can be taken into account
 – It introduces another layer of uncertainty for an intruder
 – The same cell is perturbed consistently across tables
 – After restoring additivity, some small inconsistencies in
   breakdowns of different hypercubes may appear
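
 Two of these properties can be illustrated with a toy example: the same set of records gets the same perturbation in every table, and independently perturbed inner cells need not add up to the perturbed margin (which is why additivity has to be restored afterwards). The records and the function `toy_perturbation` are hypothetical; it stands in for a real look-up table and is not the method used in the project.

```python
# Hypothetical records: (record key, sex, age band)
records = [
    (54, "F", "16-24"),
    (4, "F", "16-24"),
    (93, "M", "16-24"),
    (26, "F", "25-34"),
    (61, "F", "16-24"),
]

def toy_perturbation(ckey):
    """Stand-in for the look-up table (assumption): maps a cell key to -1, 0 or +1."""
    return (ckey % 3) - 1

def protected_count(rows):
    """Perturb the count of a set of records using the cell key of that set."""
    if not rows:
        return 0  # empty cells stay at zero in this sketch
    ckey = sum(rkey for rkey, _, _ in rows) % 100
    return max(len(rows) + toy_perturbation(ckey), 0)

# The cell "Female" reached through two different tables contains the same records,
# so it gets the same cell key and the same perturbed value both times.
female_in_sex_table = protected_count([r for r in records if r[1] == "F"])
female_in_sex_by_age_margin = protected_count(
    [r for r in records if r[1] == "F" and r[2] in ("16-24", "25-34")]
)

# The inner cells are perturbed with their own cell keys, so they need not add up.
female_by_age = {
    age: protected_count([r for r in records if r[1] == "F" and r[2] == age])
    for age in ("16-24", "25-34")
}

print(female_in_sex_table, female_in_sex_by_age_margin)  # 3 3
print(female_by_age, sum(female_by_age.values()))        # {'16-24': 3, '25-34': 2} 5
```

 Here the margin is published as 3 in both tables, while the two inner cells sum to 5.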

Tools

 – A new Specific Grant Agreement, the SGA on ‘Open source tools
   for perturbative confidentiality methods’, started this spring
   and will run for 15 months
 – Aim of this new SGA: integrate the code produced into
   user-friendly open source software packages
 – Seven countries are involved in this SGA: Austria, Finland,
   France, Germany, Hungary, the Netherlands and Slovenia
 – A beta version of Tau-ARGUS (version 4.1.12BETA) with the
   prototype cell key method is available for testing on GitHub
   (https://github.com/sdcTools/tauargus/releases)
 – A prototype R package, cellKey, is available for testing on
   GitHub (https://github.com/sdcTools/cellKey)

Questions

 Do you have any questions or remarks?
