Perturbative methods and tools for census data - Presentation at the Satellite meeting in Amsterdam on 12 September - EUROPA
Outline
– Census 2011 and confidentiality
– Details of the project
– Content of the project ‘Harmonised protection of census data in the ESS’
– Preparation phase
– Operational phase
– Definition of test scenarios
– Record swapping
– Cell key method
– Tools
– Questions
Census 2011 and confidentiality (1)
European Census 2011 data are an essential source of statistical information, ranging from the lowest small-area geographical divisions to national and international levels.
Harmonised census tables of 32 European countries are available via the Census Hub (https://ec.europa.eu/CensusHub2/).
Census data are detailed and confidential; protecting the census data is the responsibility of the member states.
Census 2011 and confidentiality (2)
Despite the output harmonisation, international comparisons of census data are hampered by differing statistical disclosure control approaches.
In this Specific Grant Agreement (SGA), best practices for the Census 2021 have been defined and tested.
Details of the project (1)
– Start: 1 September 2016
– End: 31 August 2017
– Four work packages (WPs):
  - WP 1 Management (7 deliverables)
  - WP 2 Questionnaire (2 deliverables)
  - WP 3 Development and testing of the recommendations; identification of best practices (4 deliverables)
  - WP 4 Dissemination (5 deliverables)
Details of the project (2)
– Six countries involved:
  - CBS (Eric Schulte Nordholt, Peter-Paul de Wolf)
  - INSEE (Maël-Luc Buron)
  - Destatis (Sarah Gießing, Tobias Enderle)
  - HCSO (László Antal, Beata Nagy)
  - Statistics Finland (Annu Cabrera)
  - SURS (Andreja Smukavec, Junoš Lukan)
Content of the project ‘Harmonised protection of census data in the ESS’
– Reviewed the country-specific data protection regulations and methods
– Provided a harmonised approach to the protection of the Census 2021 (taking the national constraints into account)
– Recommended appropriate statistical disclosure control methods for hypercubes to Member States
– Recommended how to handle confidential cells efficiently in grid squares and regional breakdowns (risk of disclosure due to differencing)
Preparation phase
– Inventory of the country-specific data protection regulations and methods
  - Reused existing information
  - Collected new information via a questionnaire
– Review of the census data requirements (breakdowns, formats, levels of detail, classifications, and geographical classifications such as country, NUTS, LAU2 and grid squares); the different links between data were taken into account (also national versus ESS and other international requirements)
Operational phase
– Development of the recommendations for the treatment of statistical confidentiality (by means of best practices)
– Testing of the recommended approach(es)
– Support to other NSIs that are willing to test
– Adaptation of the recommended approach, taking into account the feedback (after testing)
– Reports to the relevant ESS bodies
Definition of test scenarios (1)
Restrictions:
– No global recodes (layout of hypercubes is fixed in the implementing census regulation)
– No cell suppressions (very difficult for linked hypercubes, and otherwise no European total can be calculated)
Complications:
– 1 km² grid cells lead to many small cell values
– 1 km² grid cells versus administrative regions (risk of disclosure due to differencing)
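The differencing risk can be illustrated with a small, entirely hypothetical example: if an administrative region is almost, but not exactly, covered by a set of 1 km² grid cells, subtracting the unprotected grid counts from the unprotected regional total reveals the count in the small leftover area. All figures and names below are made up for illustration.

```python
# Hypothetical unprotected counts; none of these figures come from census data.
region_total = 1203  # persons in an administrative region
grid_counts = {"cell_A": 400, "cell_B": 512, "cell_C": 290}  # grid cells lying inside the region


def difference_sliver(region_total, grid_counts):
    """Count for the part of the region not covered by the listed grid cells."""
    return region_total - sum(grid_counts.values())


sliver = difference_sliver(region_total, grid_counts)
print(sliver)  # 1: a single person lives in the leftover area
```

A count of 1 obtained this way is exactly the kind of small value that the protection method has to guard against, even though neither published table contains it directly.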
Definition of test scenarios (2)
Definition of test scenarios (3)
Definition of test scenarios (4)
Definition of test scenarios (5)
The statistical disclosure control solution should not alter the spatial distribution of the grid data too much:
– Zero-frequency grid cells should not too often be changed to positive frequencies
– Rare non-zero frequencies in an area should not be changed much
Usual disclosure risks:
– Small counts (may lead to direct identification)
– Attribute disclosure (a positive frequency may lead to disclosing information from a hypercube)
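One simple way to monitor the first requirement is to count how many grid cells switch between zero and positive after protection. The function below is a minimal sketch of such a check; the function name and the example counts are assumptions for illustration and are not taken from the project.

```python
def spatial_distortion(original, protected):
    """Return (zero cells turned positive, populated cells turned zero)."""
    zero_to_positive = sum(1 for o, p in zip(original, protected) if o == 0 and p > 0)
    positive_to_zero = sum(1 for o, p in zip(original, protected) if o > 0 and p == 0)
    return zero_to_positive, positive_to_zero


# Hypothetical counts for six grid cells, before and after protection.
original = [0, 0, 3, 1, 0, 12]
protected = [0, 1, 3, 0, 0, 11]
print(spatial_distortion(original, protected))  # (1, 1)
```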
Definition of test scenarios (6)
Flexible method that can be adapted to national needs by the member states:
– Pre-tabular method: record swapping
– Post-tabular method: cell key method
Record swapping and cell key method:
– Enhanced variant of the cell key method developed by the Australian Bureau of Statistics (ABS)
– Provided by the Office for National Statistics (ONS) and adapted in this SGA
Record Swapping (1)
Record Swapping (2)
Record Swapping (3)
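As a rough illustration of the general idea behind pre-tabular record swapping, the sketch below exchanges the region code between pairs of households that share a matching variable. It is a generic, simplified variant and not necessarily the procedure tested in the SGA; the field names `size` and `region`, the matching rule and the `swap_rate` parameter are assumptions made for this example.

```python
import random


def swap_records(households, swap_rate, seed=0):
    """Swap the 'region' of randomly chosen households with a partner that has
    the same household size but a different region (illustrative rule only)."""
    rng = random.Random(seed)
    result = [dict(h) for h in households]  # work on copies, keep the input intact
    order = list(range(len(result)))
    rng.shuffle(order)
    n_target = int(swap_rate * len(result))  # approximate number of records to swap
    swapped = set()
    for i in order:
        if len(swapped) >= n_target:
            break
        if i in swapped:
            continue
        partners = [
            j for j in order
            if j != i and j not in swapped
            and result[j]["size"] == result[i]["size"]
            and result[j]["region"] != result[i]["region"]
        ]
        if not partners:
            continue  # no suitable partner: leave this household unswapped
        j = rng.choice(partners)
        result[i]["region"], result[j]["region"] = result[j]["region"], result[i]["region"]
        swapped.update((i, j))
    return result


# Hypothetical microdata: each household has exactly one possible partner.
households = [
    {"id": 1, "size": 1, "region": "A"}, {"id": 2, "size": 1, "region": "B"},
    {"id": 3, "size": 2, "region": "A"}, {"id": 4, "size": 2, "region": "B"},
    {"id": 5, "size": 3, "region": "A"}, {"id": 6, "size": 3, "region": "B"},
]
swapped_households = swap_records(households, swap_rate=1.0)
print([h["region"] for h in swapped_households])  # ['B', 'A', 'B', 'A', 'B', 'A']
```

Because regions are only exchanged between households of the same size, the number of households per region and per size class stays unchanged in this sketch; tables built on other variables by region do change.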
Cell key method (1)
Step 1: Assign each record a random number (rkey).
  Record   Rkey
  r1       54
  r2       4
  r3       93
  …
  rN       26
Step 2: For each cell, sum the rkeys of its records and apply a function to get a cell key (ckey).
  Age by sex   Male   Female
  0-15         .      .
  16-24        .      4
  25-34        .      .
  Records in the cell (16-24, Female): r2 → 4, r4 → 61, r56 → 7, r72 → 90
  Sum = 162; e.g. take the last two digits → Ckey = 62
Step 3: Use a look-up table to get the perturbation value.
  The look-up table is indexed by cell value (rows 1, 2, 3, 4, 5, …) and cell key (columns 1, 2, 3, …, 61, 62, 63, …, 99); its entries are perturbation values such as +1 and -1.
Step 4: Apply the perturbation value to the cell.
  Age by sex   Male   Female
  0-15         .      .
  16-24        .      5
  25-34        .      .
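The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Tau-ARGUS or cellKey implementation: the function names are hypothetical, and the look-up table contains only the single entry needed to reproduce the slide's example (cell value 4, cell key 62, perturbation +1); a real table would cover all combinations of cell value and cell key.

```python
import random


def assign_record_keys(record_ids, seed=0, max_key=99):
    """Step 1: give every record a fixed pseudo-random key in 0..max_key."""
    rng = random.Random(seed)
    return {rid: rng.randint(0, max_key) for rid in record_ids}


def cell_key(record_ids, rkeys, modulus=100):
    """Step 2: sum the record keys in a cell and keep the last two digits."""
    return sum(rkeys[rid] for rid in record_ids) % modulus


def perturb(count, ckey, lookup):
    """Steps 3 and 4: look up the perturbation and add it (default: no change)."""
    return count + lookup.get((count, ckey), 0)


# Record keys of the slide's example cell (females aged 16-24).
rkeys = {"r2": 4, "r4": 61, "r56": 7, "r72": 90}
cell_records = ["r2", "r4", "r56", "r72"]

# Hypothetical look-up table with only the entry the example needs.
lookup = {(4, 62): +1}

ckey = cell_key(cell_records, rkeys)
protected_count = perturb(len(cell_records), ckey, lookup)
print(ckey)             # 62
print(protected_count)  # 5
```

In practice the record keys would be generated once, for instance with something like `assign_record_keys`, and stored with the microdata so that every tabulation reuses the same keys.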
Cell key method (2)
– The cell key method is primarily used for protecting against differencing
– Applied in addition to record swapping (which is considered the primary approach)
– Considering the need to retain 1s and 2s in outputs
– Structural zeros can be taken into account
– Introduces another layer of uncertainty for an intruder
– Gives consistent results for the same cell across tables
– After restoring additivity, some small inconsistencies in breakdowns of different hypercubes may appear
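Two of these properties can be seen in a small sketch: the same set of records always yields the same cell key and hence the same protected value, whereas independently perturbed cells need not add up to the perturbed total. The perturbation rule below is a hypothetical stand-in for a real look-up table, and the record keys are invented for the example.

```python
def cell_key(record_ids, rkeys, modulus=100):
    """Cell key: last two digits of the sum of the record keys in the cell."""
    return sum(rkeys[r] for r in record_ids) % modulus


def perturbation(count, ckey):
    """Hypothetical rule standing in for a look-up table; zeros stay zero."""
    if count == 0:
        return 0
    if ckey < 25:
        return -1
    if ckey >= 75:
        return 1
    return 0


def protect(record_ids, rkeys):
    """Protected count of a cell defined by the records it contains."""
    count = len(record_ids)
    return count + perturbation(count, cell_key(record_ids, rkeys))


rkeys = {"r1": 54, "r2": 4, "r3": 93, "r4": 61, "r5": 80}  # invented record keys
males = ["r1", "r3"]
females = ["r2", "r4", "r5"]
total = males + females

# Consistency: the same cell in another table (records listed in another order).
print(protect(females, rkeys) == protect(["r5", "r2", "r4"], rkeys))  # True

# Additivity: the protected parts (2 and 3) do not sum to the protected total (6).
print(protect(males, rkeys), protect(females, rkeys), protect(total, rkeys))  # 2 3 6
```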
Tools
– A new Specific Grant Agreement, ‘Open source tools for perturbative confidentiality methods’, started this spring and will run for 15 months
– Aim of this new SGA: integrate the code produced into user-friendly open source software packages
– Seven countries are involved in this SGA: Austria, Finland, France, Germany, Hungary, the Netherlands and Slovenia
– A beta version of Tau-ARGUS (version 4.1.12BETA) with the prototype cell key method is available for testing on GitHub (https://github.com/sdcTools/tauargus/releases)
– A prototype R package, cellKey, is available for testing on GitHub (https://github.com/sdcTools/cellKey)
Questions
Do you have any questions or remarks?