Requirements for the use of pseudonymisation solutions in compliance with data protection regulations - GDD eV
Rolf Schwartmann / Steffen Weiß (Ed.) Requirements for the use of pseudonymisation solutions in compliance with data protection regulations A working paper of the Data Protection Focus Group of the Platform Security, Protection and Trust for Society and Business at the Digital Summit 2018
Author: Data Protection Focus Group of the Digital Summit

Chairman of the Focus Group:
Prof. Dr. Rolf Schwartmann, Cologne Research Centre for Media Law - Cologne University of Technology, Member of the Data Ethics Commission of the Federal Government

Coordination:
Steffen Weiß, LL.M., German Society for Data Protection and Data Security (GDD)

Members:
Prof. Dr. Christoph Bauer, ePrivacy GmbH
Patrick von Braunmühl, Bundesdruckerei GmbH
Dr. Guido Brinkel, Microsoft Germany GmbH
Susanne Dehmel, Federal Association for Information Technology, Telecommunications and New Media e.V. (BITKOM)
Philip Ehmann, eco - Association of the Internet Industry
Walter Ernestus, The Federal Commissioner for Data Protection and Freedom of Information (BfDI)
Nicolas Goß, eco - Association of the Internet Industry
Michael Herfert, Fraunhofer-Society for the Promotion of Applied Research
Maximilian Hermann, Cologne Research Centre for Media Law - Cologne University of Technology
Dr. Detlef Houdeau, Infineon Technologies AG
Angelika Hüsch-Schneider, Deutsche Telekom AG
Clemens John, United Internet AG
Annette Karstedt-Meierrieks, Association of German Chambers of Industry and Commerce (DIHK)
Daniel Krupka, German Informatics Society (GI)
Johannes Landvogt, The Federal Commissioner for Data Protection and Freedom of Information (BfDI)
Prof. Dr. Michael Meier, University of Bonn/German Informatics Society (GI)
Robin L. Mühlenbeck, Cologne Research Centre for Media Law - Cologne University of Technology
Dr. Frank Niedermeyer, Federal Office for Information Security
Jonas Postneek, Federal Office for Information Security
Frederick Richter, LL.M., Data Protection Foundation (Stiftung Datenschutz)
Dr. Sachiko Scheuing, Acxiom Germany GmbH
Irene Schlünder, Technology and Method Platform for Networked Medical Research (TMF)
Sebastian Schulz, Federal Association for E-Commerce and Distance Selling Germany (bevh)
Dr. Claus D. Ulmer, Deutsche Telekom AG
Dr. Winfried Veil, Federal Ministry of the Interior
Dr. Martina Vomhof, German Insurance Association (GDV)
Benjamin Walczack, Independent Centre for Data Protection Schleswig-Holstein

Contact:
Steffen Weiß
German Society for Data Protection and Data Security e.V.
Heinrich-Böll-Ring 10
53119 Bonn, Germany
Phone: +49 228 96 96 75 00
Email: info@gdd.de
www.gdd.de

Version 1.0, 2018
Foreword

Pseudonymisation as a bridge between informational and entrepreneurial self-determination

Machines should make life safer, easier, more pleasant and longer. The human being, or rather human intelligence, is the starting point of AI. The technology is intended to imitate human behaviour through mechanical work and understanding in order to apply it independently on this basis, if necessary. For this purpose, huge amounts of data are processed with the aim of identifying patterns in the data, evaluating them and drawing conclusions from them.

"Artificial intelligence - a key to growth and prosperity." This is the title of the 2018 Digital Summit of the Federal Government. Germany is strong in the digital economy and should become a leader in AI. The strategy is right. However, it is bound by legal guidelines. The General Data Protection Regulation (GDPR) provides a reliable legal framework for innovative technologies and applications, including in the field of AI. It lays down rules on the protection of individuals with regard to the processing of personal data and on the free movement of such data. "The revision of the e-privacy regulation is intended to complement this protection concept." This is the clear commitment of the Federal Government within its Key Issues Paper as part of Germany's digital strategy.

In order to make personal data economically usable, the GDPR relies on pseudonymisation. Pseudonymisation serves two functions: it is intended to protect personal data and, at the same time, to facilitate their economic use. The core of pseudonymisation is to replace a person's identity data with a specific string, as is the case with a vehicle registration number. Disclosing a person's identity from the pseudonym takes place according to fixed rules.

The Focus Group on data protection of the Platform Security, Protection and Trust for Society and Business has already published a white paper* which presents guidelines for the legally secure use of pseudonymisation solutions, taking into account the GDPR.

In 2018, the Focus Group continued its work and presents this working paper. It stipulates requirements for the use of pseudonymisation solutions in compliance with data protection regulations. At the same time, it represents a necessary intermediate step from the white paper on the way to a proposal for a code of conduct for pseudonymisation, which the Focus Group intends to present at the Digital Summit in 2019 and which will help industry to achieve greater investment security.

My sincere thanks are owed to all the members of the Focus Group for their intensive, constructive and efficient work. Special thanks go to Mr Steffen Weiß for coordinating the Group's work so expertly and prudently.

Cologne, November 2018
Professor Dr. Rolf Schwartmann
Head of the Data Protection Focus Group of the Platform Security, Protection and Trust for Society and Business at the Digital Summit 2018 and member of the Federal Government's Data Ethics Commission

* Available at: https://bit.ly/2FjLnVd.
Table of contents

A. Introductory remarks
B. Legal classification of pseudonymisation
C. Requirements for pseudonymisation
D. Technical-organisational requirements for the pseudonymisation
E. Best practices
A. Introductory remarks

The aim of this guideline is to support those responsible for data processing in the legally compliant implementation of pseudonymisation measures by means of corresponding specifications. Already in the white paper for the Digital Summit 2017 [1], the "Data Protection Focus Group of the Platform Security, Protection and Trust for Society and Business" stressed the importance of pseudonymisation.

An essential characteristic of a pseudonymisation is that pseudonyms can no longer be assigned to the person specifically concerned without the involvement of additional information. In this respect, pseudonymisation protects citizens whose personal data are processed from unwanted identification. In contrast to anonymisation [2], pseudonymised data can be traced back to the individual (re-identification). The strength of a pseudonymisation depends on how high the risk, the costs and the time required for a direct or indirect identification by third parties are estimated to be.

The pseudonymisation of data is useful when compiling statistics in conformity with the law, for the implementation of research projects, as well as for the implementation of advertising measures. Special application scenarios for pseudonymisation are listed in the white paper on pseudonymisation mentioned above.

Following the presentation of the legal classification of pseudonymisation (Section B.), this guideline elaborates the prerequisites for a legally secure pseudonymisation process (Section C.) as well as the requirements that a pseudonymisation must typically fulfil (Section D.).

B. Legal classification of pseudonymisation

Pseudonymisation alone does not make data processing lawful. It is merely a building block to ensure that data processing is in accordance with the GDPR. It is therefore always necessary to have a legal basis for processing personal data. The requirements of Art. 6 GDPR ("lawfulness of processing") must be complied with, and in the case of special categories of personal data those of Art. 9 GDPR ("processing of special categories of personal data").

Essentially, pseudonymisation can be applied in two instances:

1. Pseudonymisation as a technical protective measure

Pseudonyms are necessary, for example, if critical data processing needs to be specially protected against inadmissible access. In this case, pseudonymisation serves primarily to reduce risks within the meaning of Art. 32 GDPR. In view of the legal requirements pursuant to Art. 25 GDPR for "Privacy by Design", pseudonymisation ensures that personal information can be decoupled from other data at an early stage.

2. Pseudonymisation to facilitate a processing or the processing for another purpose

According to the GDPR, pseudonyms can also enable certain data processing because of the associated risk reduction. These are in particular the cases of so-called "compatible further processing" in accordance with Art. 6 para. 4 GDPR, meaning that a subsequent change of purpose regarding the processing of personal data is considered compatible with the initial purpose of processing. Whether a new processing purpose is compatible with the original purpose, and whether further processing can therefore be based on the original legal basis, is the result of weighing the different criteria mentioned in Art. 6 para. 4 GDPR. One criterion in favour of compatibility of purposes is the existence of suitable guarantees, which may include encryption or pseudonymisation (Art. 6 para. 4 lit. e GDPR). Even in the context of a balancing of interests pursuant to Art. 6 para. 1 lit. f GDPR, a pseudonymisation may have a positive effect and legitimise a processing.

C. Requirements for pseudonymisation

Every pseudonymisation must comply with certain conditions in order to be legally compliant in the sense of the GDPR.

1. Assignment of responsibilities

For the monitoring of the pseudonymisation process, a person, e.g. a specialist manager, should be appointed. The person should have the necessary technical and legal understanding of pseudonymisation. The role of this person is to take responsibility for important decisions. He/she should be in a position to establish a uniform approach to pseudonymisation at the data controller/processor. Besides, he/she should have the opportunity to draw on know-how from within or outside the organisation. The controller/processor remains responsible for fulfilling the tasks and duties of the GDPR.

2. Applicability and legal admissibility

Depending on the application of pseudonymisation, different conditions must be observed.

[1] https://bit.ly/2FjLnVd.
[2] Anonymisation is a different procedure.
a. In the case of pseudonymisation as a measure for the realisation of an appropriate protection of personal data within the meaning of Art. 32 GDPR, the requirement for pseudonymisation will, as a rule, result from a risk assessment including the criteria specified in Art. 32 GDPR (state of the art, implementation costs, type, scope, circumstances and purpose of processing as well as level of risk for the rights of the person concerned). Especially critical data processing must be provided with a strong pseudonymisation (see D. for technical and organisational requirements).

b. If pseudonymisation is to be used to enable the processing or further processing of personal data, this can be done either by compatible further processing in accordance with Art. 6 para. 4 GDPR or by balancing the interests in accordance with Art. 6 para. 1 lit. f GDPR. In the case of further processing pursuant to Art. 6 para. 4 GDPR, the legal requirements of the article must be cumulatively examined and balanced. Depending on the characteristics of the individual test points (lit. a) to e)), further processing may or may not be permissible.

When balancing the interests within the meaning of Art. 6 para. 1 lit. f GDPR, the legitimate interest of the controller must be examined in particular.

The type and quality of the pseudonymisation is of particular importance in both case variants.

3. Information of data subjects - transparency - and possibilities to object

The data subject should also be informed adequately when his/her data are processed on the basis of pseudonyms. This can be done, for example, via data protection notices or policies. The data subject must be granted the rights provided by law, such as the right to object or the right of access.

a. In the case of pseudonymisation for protection purposes, the general information provided during data collection will be sufficient. Objection rights or consent requirements are determined here in accordance with the legal conditions for the originally planned processing.

b. If a pseudonymous further processing takes place, for example for compatible purposes, the data subject should be informed about it in principle. The other purpose and its scope should be explained to him/her. In this case, it is advisable to inform the person concerned about the purpose and the pseudonymisation carried out. In addition, the data subject must be informed in this context, if necessary, of his/her right to object, with which he/she can prevent the originally collected data from becoming part of a compatible further processing (Art. 6 para. 4 lit. d)).

c. If a responsible party has received pseudonymised data from a third party and the responsible party can no longer easily identify the data subject, the responsible party shall at least provide the general information via its own website that pseudonymous data are being processed. The origin of the data and the existence of the right of access have to be named.

d. The data subjects' rights under Chapter III of the GDPR (right of access, rectification, deletion, restriction of processing, data portability and objection) have to be fulfilled in full by the controller, also in relation to the stored pseudonym, if the controller can identify the natural person linked to a pseudonym. The controller, however, consults the data subject as to whether a re-identification related to the pseudonym is desired.

If a data subject requests access and the controller cannot identify the data subject using the pseudonym because it does not know certain information about the data subject which is necessary for this, the controller will inform the data subject accordingly, if possible. The information must at least include a reference to the origin of the data and to possible identification if the data subject provides the information required for its identification (see Art. 11 para. 2 GDPR).

4. Rules for merging with individual information of certain persons (re-identification)

If the results of the pseudonymous data processing are to be merged with the individual data of data subjects or are to be traced back to the data subject (re-identification), this can either be part of the originally planned processing or an additional service based on the evaluation of the compatible further processing.

a. If pseudonymisation is used as a protective measure within the framework of legitimate data processing, which is basically also permissible with plain data of the data subjects, no further permission is required to trace pseudonyms back to individuals in addition to the original legitimation for data processing.

b. If pseudonymisation is a measure to make a further processing possible, the legitimation according to Art. 6 para. 4 GDPR extends only to the further processing of the data, but not to the re-identification of a data subject behind a pseudonym. In these cases, the re-identification therefore requires the provision of a consent mechanism for the data subjects which meets the requirements of Art. 7 GDPR. This also applies to processing on the basis of a balancing of interests pursuant to Art. 6 para. 1 lit. f GDPR.

5. Documentation

The prerequisites for a legally secure pseudonymisation as well as the process steps for carrying out a pseudonymisation must be documented. This can be done either by an independent pseudonymisation concept or by a general description when documenting technical-organisational measures for a processing.

D. Technical-organisational requirements for pseudonymisation

D.1 Definitions

The following terms are used in the following sections. The short explanations serve to explain how the terms are used below and may not satisfy the requirement of complete definitions.

D.1.1 k-Anonymity
A (pseudonymised) data collection offers k-anonymity if the identity data of each individual person still contained therein match those of at least k - 1 other persons. k is a natural number here.

D.1.2 Discoverability
A pseudonym is discoverable if it is possible to deduce the identity data of the associated person from the pseudonym. This may require a secret cryptographic key that is only available to certain users.

D.1.3 Enumeration/brute-force attack
If all details (including the cryptographic keys used) of a pseudonymisation process are known, the corresponding identity data can be determined from an existing pseudonym by an enumeration/brute-force attack (also "complete exhaustion" or "trial encryption"). For this purpose, all relevant identity data are subjected to pseudonymisation and compared with the existing pseudonym.
For example, if f is a cryptographic hash function and the value y = f(name) is known, but 'name' is unknown, the value f(name) can be calculated for all names in question and compared with y in order to determine 'name'.

D.1.4 Block cipher
A block cipher is an encryption method which transforms a data block of fixed length (e.g. 128 bits) into a block of the same length depending on a cryptographic key. The most common block cipher today is AES (Advanced Encryption Standard), which encrypts 128-bit blocks using a 128-bit, 192-bit or 256-bit key.

D.1.5 Data collection
Data material consisting of several data sets from possibly different sources or years, which is to be evaluated for statistical purposes and for this reason is to be pseudonymised.

D.1.6 Dataset
Information belonging to a person which contains identity and content data and which must be pseudonymised.

D.1.7 Data trustee
See Trusted third party.

D.1.8 l-diversity
A (pseudonymised) data collection offers l-diversity if there are at least l different forms of content data for each group of identical identity data contained in it. l is a natural number.

D.1.9 One-way function
A function f which is easily calculable but difficult to reverse; it should be practically impossible to draw conclusions from a function value y about an x with f(x) = y.
Remark:
For a one-way function it is necessary that the domain of f is very large, otherwise the value f(x) could be calculated and compared with y for all possible x. For an example see: Enumeration/brute-force attack.

D.1.10 Entropy
A measure of the indeterminacy of a sequence. For example, ten independent coin tosses (heads/tails) provide ten bits of entropy. If a sequence is calculated from an initial value ("seed") using a pseudo-random number generator, it can never reach a higher entropy than the initial value. A cryptographic key should have an entropy of at least 100 bits.
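The definitions of k-anonymity (D.1.1) and l-diversity (D.1.8) can be checked mechanically on a small data collection. The following is only a minimal sketch: the field names, the sample records and the choice of which attributes count as identity data are assumptions made for illustration and are not taken from the working paper.

```python
from collections import defaultdict

def k_anonymity(records, identity_keys):
    """Largest k for which the collection is k-anonymous:
    the size of the smallest group sharing identical identity data."""
    group_sizes = defaultdict(int)
    for record in records:
        group_sizes[tuple(record[key] for key in identity_keys)] += 1
    return min(group_sizes.values()) if group_sizes else 0

def l_diversity(records, identity_keys, content_key):
    """Largest l for which the collection is l-diverse: the smallest number
    of distinct content values within any group of identical identity data."""
    group_values = defaultdict(set)
    for record in records:
        group = tuple(record[key] for key in identity_keys)
        group_values[group].add(record[content_key])
    return min(len(values) for values in group_values.values()) if group_values else 0

# Hypothetical sample data: year of birth and postcode remain as identity data.
records = [
    {"year_of_birth": 1965, "postcode": "50667", "diagnosis": "A"},
    {"year_of_birth": 1965, "postcode": "50667", "diagnosis": "B"},
    {"year_of_birth": 1959, "postcode": "50354", "diagnosis": "A"},
    {"year_of_birth": 1959, "postcode": "50354", "diagnosis": "A"},
]

identity_keys = ["year_of_birth", "postcode"]
k = k_anonymity(records, identity_keys)
l = l_diversity(records, identity_keys, "diagnosis")
print(f"k = {k}, l = {l}")
```

In this sample every person shares the identity data with one other person (k = 2), but the second group contains only one diagnosis (l = 1), so knowing the group already reveals the content data.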
D.1.11 HMAC
See: Cryptographic checksum.

D.1.12 Homonym error
A homonym error occurs when pseudonymisation procedures that provide linkability falsely produce the same pseudonym for different persons.

D.1.13 Identity data
All data relating to a person that make it possible to identify the person in more detail.

D.1.14 Content data
In a data collection, essentially all data that do not belong to the identity data. Nevertheless, a personal reference can be established from content data if they are, for example, unique and this information can be linked to a person.
Remark:
Sometimes there may be overlaps between content data and identity data, e.g. in the data collection for a study investigating how certain characteristics depend on age or occupation. In this case, age and occupation would (also) be counted among the content data.

D.1.15 Control number
See: Pseudonym.

D.1.16 Cryptographic hash function
A hash function is a function that maps a string of any length to a string of fixed length (about 256 bits). A cryptographic hash function also has the property of a one-way function. If, in addition, it is practically impossible to find two different input values that provide the same function value, one speaks of a collision-resistant hash function. Internationally standardised cryptographic hash functions are MD5, SHA256 or SHA-3; of these, MD5 is no longer considered collision-resistant.

D.1.17 Cryptographic checksum
A bit sequence of fixed length (about 256 bits) that is calculated from a character string of any length using a cryptographic key. If the key is known, the checksum can be used to verify the integrity of the string. Without knowledge of the key, it is practically impossible to create a valid cryptographic checksum for a character string. An internationally standardised cryptographic checksum is calculated using the HMAC algorithm (Keyed-Hash Message Authentication Code).

D.1.18 Cryptographic key
A character string that is used to transform a set of data using a cryptographic function (encryption or signature). Depending on the application, the key must be kept secret.

D.1.19 Pseudonym
A character string that replaces the identity data of a person and thus represents this person. The identity data behind a pseudonym should, if at all, only be inferred under strictly defined conditions (see Discoverability).

D.1.20 Pseudonymisation list
A list that maps identity data to pseudonyms. A pseudonymisation list can be used to determine a person's pseudonym directly from the person's identity data and, vice versa, to determine a person's identity data from the person's pseudonym.

D.1.21 Pseudonymisation stages
If a pseudonym is not created directly from the identity data, but in mutually independent stages, one speaks of pseudonymisation stages.
Remark:
A pseudonymisation in several stages takes place, for example, with the participation of one or more trust bodies.

D.1.22 Pseudonymisation procedure
A procedure which generates a pseudonym from the identity data of a person.

D.1.23 Record Linkage
In specialist literature, the merging of data records of a pseudonymised data collection on the basis of linkable pseudonyms is referred to as record linkage.

D.1.24 Re-identification
See Discoverability.

D.1.25 Synonym error
Occurs when, in a linkable pseudonymisation procedure, identity data of the same person incorrectly lead to different pseudonyms, although this was not intended.

D.1.26 Linkability of pseudonyms
A pseudonymisation procedure ensures the linkability of pseudonyms if identity data for the same person generally lead to identical or similar pseudonyms. The pseudonyms, and with them the data records of the person, are then "linkable": identical pseudonyms can usually be used to identify identical persons.
Remarks:
The linking of pseudonymised data with persons without knowledge of the pseudonymisation procedure or the pseudonymisation table is not meant and must be avoided.
In the case of linkable pseudonyms, homonym or synonym errors can nevertheless occur.
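A synonym error (D.1.25) typically arises when the same person's identity data arrive in slightly different spellings. The sketch below illustrates this with an HMAC-based pseudonym; the key, the field layout with "|" as separator and the normalisation rules are assumptions chosen for illustration and are not prescribed by the working paper.

```python
import hashlib
import hmac
import unicodedata

# Demo key only (assumption); a real key must be generated and stored securely.
KEY = bytes.fromhex("00112233445566778899aabbccddeeff")

def pseudonym(identity: str) -> str:
    """Keyed checksum of the identity string (HMAC with SHA-256)."""
    return hmac.new(KEY, identity.encode("utf-8"), hashlib.sha256).hexdigest()

def normalise(identity: str) -> str:
    """Hypothetical normalisation: Unicode NFKC, case folding,
    and collapsing of repeated or trailing whitespace."""
    text = unicodedata.normalize("NFKC", identity)
    return " ".join(text.casefold().split())

record_a = "Peter Müller|1965-01-31|Cologne"
record_b = "peter  MÜLLER|1965-01-31|cologne "   # same person, different spelling

# Without normalisation the same person receives two pseudonyms (synonym error).
synonym_error = pseudonym(record_a) != pseudonym(record_b)

# With normalisation before pseudonymisation the records become linkable.
linked = pseudonym(normalise(record_a)) == pseudonym(normalise(record_b))

print(synonym_error, linked)
```

Normalisation reduces synonym errors but can also merge records of different persons whose data differ only in spelling, which would be a homonym error (D.1.12); the rules therefore need to be chosen with the data collection in mind.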
D.1.27 Encryption
A method which converts a plaintext into a ciphertext depending on a cryptographic key. The inversion, i.e. restoring the plaintext from the encrypted text, is called decryption.

D.1.28 Trusted third party
A body which is independent of the data controller in terms of space and organisation. The only task of the trusted third party is to support the conversion of identity data into pseudonyms.
Remark:
If necessary, several trust authorities can be involved in a pseudonymisation process, which create the pseudonyms in several pseudonymisation stages.

D.1.29 Allocation table
See Pseudonymisation list.

D.2 Measures

D.2.1 General information

In the case of pseudonymisation, basic principles must be observed for every procedure:

a. Knowledge, only if necessary
b. Delete data, whenever possible
c. Avoiding the accumulation of too much knowledge in one place (e.g. plain text data and pseudonymised data about a person)
d. Pseudonyms only if there is a need for them; otherwise anonymisation

Depending on the context, different types of pseudonyms can be used:

▪ Personal pseudonyms, which are used instead of identity data such as name, ID number or mobile phone number
▪ Role pseudonyms, where one or more persons are assigned to a pseudonym (e.g. IP number)
▪ Relationship pseudonyms, in which a person uses a different pseudonym for each (communication) relationship, e.g. different nicknames
▪ Role-relationship pseudonyms, which are a combination of the two pseudonym types
▪ Transaction pseudonyms, in which a new pseudonym is used for each transaction, as is done, for example, in online banking

In general, the linkability of personal pseudonyms is considered higher than that of role or relationship pseudonyms. The linkability of role-relationship pseudonyms and transaction pseudonyms is lower still; in principle they cannot be linked. Basically, the less a pseudonymisation can be linked, the greater is the possible anonymity of the data for third parties. A low linkability increases the strength of the pseudonymisation at the same time.

In addition, the technical-organisational implementation of a pseudonymisation requires various process steps, which typically are:

D.2.2 Creation of a pseudonym (pseudonymisation of the data record)

Every pseudonymisation begins with the creation of pseudonyms that connect data sets with the associated natural persons. Since the pseudonym may be used to re-identify a data set, it must be kept separately and protected by technical and organisational measures.

With the data to be pseudonymised, a distinction is made between identity data of the persons involved and content data. A strict separation between the two types of data is not possible in all cases, so that content data can also contain information about a person (e.g. gender, occupational group and year of birth) and thus an identification of a data subject is possible.

The type of pseudonymisation chosen can have a fundamental influence on the user's scope of action. With a strong pseudonymisation, more critical data processing can usually be sufficiently protected than with a weak pseudonymisation. It is also true in the area of compatible further processing that the stronger the pseudonymisation, the more readily compatibility of the intended further processing with the original purpose can be assumed.

There are basically two procedures available for creating a pseudonym: pseudonymisation lists and pseudonyms by calculation methods.

D.2.2.1 Pseudonymisation lists

A pseudonymisation list is used to assign pseudonyms to identity data using a table. The pseudonyms have no relation to the identity data, neither from a functional nor from a content perspective.

Example 1:
Pseudonyms are numbered consecutively.

Identity data                                   Pseudonym
Peter Müller, born 31.01.1965 in Cologne        2022917
Maria Schulze, born 03.05.1959 in Hürth         2022918
Max Klein, born 31.10.1967 in Bornheim          2022919
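The table-based procedure of D.2.2.1 (Example 1, consecutive numbering) can be sketched as a pair of lookup structures. This is an illustrative sketch only: the class name, its methods and the in-memory storage are assumptions, and a real pseudonymisation list would need to be stored separately and protected by technical and organisational measures.

```python
import itertools

class PseudonymisationList:
    """Hypothetical in-memory pseudonymisation list with consecutive numbers."""

    def __init__(self, first_number: int = 2022917):
        self._counter = itertools.count(first_number)
        self._by_identity = {}   # identity data -> pseudonym
        self._by_pseudonym = {}  # pseudonym -> identity data

    def pseudonymise(self, identity: str) -> str:
        """Return the existing pseudonym or assign the next free number."""
        if identity not in self._by_identity:
            new_pseudonym = str(next(self._counter))
            self._by_identity[identity] = new_pseudonym
            self._by_pseudonym[new_pseudonym] = identity
        return self._by_identity[identity]

    def reidentify(self, pseudonym: str) -> str:
        """Discoverability: requires access to the table."""
        return self._by_pseudonym[pseudonym]

table = PseudonymisationList()
p1 = table.pseudonymise("Peter Müller, born 31.01.1965 in Cologne")
p2 = table.pseudonymise("Maria Schulze, born 03.05.1959 in Hürth")
p3 = table.pseudonymise("Max Klein, born 31.10.1967 in Bornheim")
print(p1, p2, p3)
```

Because the same table serves both directions, anyone with access to it can both link and re-identify records; the two capabilities cannot be granted separately with this design.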
Example 2:
Pseudonyms are generated randomly or pseudo-randomly.

Identity data                                 Pseudonym
Peter Müller, born 31.01.1965 in Cologne      2184578
Maria Schulze, born 03.05.1959 in Hürth       3654425
Max Klein, born 31.10.1967 in Bornheim        8745124

Comments:

1. When the pseudonyms are numbered consecutively, it may be possible to draw conclusions about the identity data, for example if the source data was sorted alphabetically, or about the time at which the pseudonyms were created (example: Spanish registration plates provide information about the initial registration of the vehicle).

2. With random pseudonyms, the length of the pseudonyms should not be too short, otherwise collisions and homonym errors can occur. The rule of thumb is that, with n possible pseudonyms, a collision occurs with a probability of about 50% once roughly the square root of n pseudonyms have been formed. So, if the pseudonyms are chosen as ten-digit decimal numbers (n = 10^10), two identical pseudonyms will have been created with a probability of about 50% after roughly 100,000 randomly generated pseudonyms (keyword "birthday paradox", see footnote 3).

3. The random function offered by a programming language should not be used as the source of randomness (e.g. the function rand() in the programming language C). For example, the iterated output of a cryptographic hash function can be used as a random source:

A1 = Hash(A0), Pseudonym1 = bits 1 to 40 of A1
A2 = Hash(A1), Pseudonym2 = bits 1 to 40 of A2
A3 = Hash(A2), Pseudonym3 = bits 1 to 40 of A3

A0 is a genuine random value with an entropy of at least 100 bits, to be chosen by the pseudonymisation authority. For the selection of the number of bits (here 40) see note 2.

4. If several data suppliers are involved in the pseudonymisation process and if a re-identification of the data supplier on the basis of a pseudonym should be possible, the identity of the data supplier can also be pseudonymised and placed in front of the pseudonyms of the persons.

D.2.2.2 Pseudonyms through calculation methods

Another possibility is to calculate the pseudonyms from identity data using an algorithm.

The transformation must use a state-of-the-art procedure (e.g. according to the Federal Office for Information Security's guideline TR-02102-1 or the ENISA guideline on crypto procedures) in order to avoid weaknesses in the encryption which could lead to the disclosure of a person.

So that the identity data (ID) cannot be deduced from the pseudonym, the calculation must depend on a secret parameter, a so-called cryptographic key. Calculation methods can be the following:

Encryption with an encryption method:
Pseudonym = E_K(ID).
Here E_K refers to encryption with a block cipher algorithm, such as AES, with the key K.

Formation of a cryptographic checksum:
Pseudonym = HMAC_K(ID).
HMAC = a Keyed-Hash Message Authentication Code, as specified in RFC 2104.

Comments:

1. The entropy of K should be at least 100 bits.

2. To calculate the pseudonym, not all identity data need to be used. In general, it is sufficient to select enough of the identity data for the person to be identified unambiguously within the data collection to be pseudonymised. See also section E.2.

3. The entire output of the calculation is not needed to generate the pseudonym. See note 2 from section D.2.2.1.

4. Although a cryptographic hash function is a one-way function, it is not sufficient to calculate the pseudonym exclusively using the hash function, for example via
▪ Pseudonym = Hash(PID)
If a pseudonym is present, the PID whose hash value gives the pseudonym could be determined by an exhaustive search of all possible values for PID. In Germany, depending on the composition of PID, this search would be limited to a maximum of 80 million hash value calculations.

3 https://en.wikipedia.org/wiki/Birthday_problem
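The exhaustive search described in comment 4 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the PID is a four-digit number (a far smaller search space than the 80 million values mentioned above) and uses SHA-256 and HMAC-SHA-256 from the Python standard library. The function names and the toy PID are hypothetical and not part of the working paper.

```python
import hashlib
import hmac
import secrets


def hash_pseudonym(pid: str) -> str:
    """Unkeyed variant, Pseudonym = Hash(PID); shown only to illustrate the weakness."""
    return hashlib.sha256(pid.encode("utf-8")).hexdigest()


def hmac_pseudonym(key: bytes, pid: str) -> str:
    """Keyed variant, Pseudonym = HMAC_K(PID), here with HMAC-SHA-256."""
    return hmac.new(key, pid.encode("utf-8"), hashlib.sha256).hexdigest()


def exhaustive_search(target: str, digits: int = 4):
    """Hash every possible PID of the given length and compare it with the target."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if hash_pseudonym(candidate) == target:
            return candidate
    return None


if __name__ == "__main__":
    pid = "0423"  # toy PID, assumed for illustration
    key = secrets.token_bytes(16)  # 128 random bits, above the 100-bit minimum

    # Without a key, anyone can recover the PID by trying all candidates.
    print(exhaustive_search(hash_pseudonym(pid)))

    # With a secret key, the same search finds no match.
    print(exhaustive_search(hmac_pseudonym(key, pid)))
```

The search cost grows only linearly with the number of possible PIDs, which is why a secret key with sufficient entropy, rather than the one-way property of the hash function, has to carry the protection.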
5. In a data collection, the identity data may be replaced by several pseudonyms that are calculated from different attributes of the identity data. Example:
Pseudonym1 = E_K(health insurance number)
Pseudonym2 = E_K(name | birthday | place of birth)
Pseudonym3 = E_K(birth name | birthday | place of birth)

6. The generation and administration (e.g. distribution, storage, use, deletion) of secret parameters (cryptographic keys) must be realised by state-of-the-art technical and organisational measures.

7. The security of the chosen pseudonymisation procedure can be increased by defining suitable intervals, depending on time or data volume, after which a secret parameter (cryptographic key) is replaced. Depending on the type of procedure chosen and the risk for those affected, several pseudonymisation stages can also be built in to rule out discoverability (so-called "over-encryption").
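The example in comment 5 can be sketched as follows. Because the Python standard library has no block cipher, the sketch swaps in the keyed checksum HMAC-SHA-256 for E_K; this is a substitution for illustration, not the encryption variant named in the example. The field names, the normalisation step and the truncation to 80 bits are assumptions.

```python
import hashlib
import hmac

SEP = "|"  # attribute separator, as in "name | birthday | place of birth"


def normalise(value: str) -> str:
    """Assumed normalisation: trim and casefold so trivial variants match."""
    return value.strip().casefold()


def pseudonym(key: bytes, *attributes: str) -> str:
    """Keyed checksum over the joined attributes, truncated to 80 bits (20 hex digits)."""
    message = SEP.join(normalise(a) for a in attributes).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:20]


def pseudonyms_for(key: bytes, person: dict) -> dict:
    """Derive three pseudonyms from different attribute sets of one person."""
    return {
        "Pseudonym1": pseudonym(key, person["health_insurance_number"]),
        "Pseudonym2": pseudonym(
            key, person["name"], person["birthday"], person["place_of_birth"]
        ),
        "Pseudonym3": pseudonym(
            key, person["birth_name"], person["birthday"], person["place_of_birth"]
        ),
    }
```

If a person's name changes, Pseudonym2 changes while Pseudonym1 and Pseudonym3 stay the same, so the records remain linkable through the unchanged pseudonyms. Replacing the key at the intervals described in comment 7 yields a completely new set of pseudonyms.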
D.2.2.3 Multi-stage and mixed pseudonymisation procedures

The security of a pseudonymisation procedure can be increased if the creation of pseudonyms is carried out by several independent bodies. Both pseudonymisation lists and calculation methods can be used.

Example:

1. A, B and C collect data from individuals (A, B and C can be, for example, medical practices that collect patient data).

2. A, B and C pseudonymise their data records (pseudonyms P1) with the help of a calculation method and a cryptographic key K1 (which is available at all data collection points).

3. A, B and C deliver the pseudonymised data records to a trusted third party V.

4. V forms new pseudonyms P2 from the obtained pseudonyms P1 using a calculation method and a cryptographic key K2 for the data records and replaces the obtained pseudonyms P1 with the new pseudonyms P2.

5. V forwards the data records with the new pseudonyms P2 to a collector S.

6. S uses the pseudonyms P2 to merge the received data records by means of record linkage.

7. The data are to be evaluated at points X, Y and Z (from different points of view). For this purpose, S filters the data collection and compiles the necessary data records for X, Y and Z from the data collection.

8. From the (partial) data collection for X (and also for Y and Z) the pseudonyms P2 are removed and replaced by new pseudonyms P3, which result from a pseudonymisation list LX, which assigns the pseudonyms P3 to the pseudonyms P2. The pseudonymisation lists LX, LY and LZ for X, Y and Z are different and independent of each other.

Remark:
Generating different lists ensures that several data evaluators cannot merge the data collections made available to them on the basis of the pseudonyms contained in these collections.

D.2.2.4 Advantages and disadvantages of different pseudonymisation methods

Method: Assignment tables

Advantages:
1. No key management required

Disadvantages:
1. Poor scalability (table can become very large)
2. Table must be protected permanently
3. Pseudonymiser needs permanent access to the whole table
4. Discoverability requires access to the entire table
5. Linkability requires access to the entire table
6. Access to the table implies linkability and discoverability (linkability and discoverability are not separately controllable)
7. Access based on roles requires role-specific table copies

Method: Calculation method

Advantages:
1. Good scalability, no table management necessary
2. Control of knowledge of secret parameters allows access control to calculation rules
3. Different parameters for pseudonymisation, linkability and discoverability are possible, therefore separately controllable
4. Only the cryptographic keys need to be securely protected
5. Role-based access via role-specific parameters is easily possible
6. Purpose limitation via technical parameters provides linkability and discoverability based on specific purposes

Disadvantages:
1. Key management required (if further secret or public parameters are needed)
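The multi-stage example in D.2.2.3 can be sketched in a few lines. The sketch assumes HMAC-SHA-256 as the calculation method for both keyed stages and random values for the pseudonymisation lists; the class and function names are hypothetical, and the keys are generated ad hoc rather than managed as the paper requires.

```python
import hashlib
import hmac
import secrets


def keyed(key: bytes, value: str) -> str:
    """Calculation method assumed for both stages: HMAC-SHA-256, truncated to 80 bits."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:20]


class PseudonymList:
    """Pseudonymisation list (e.g. LX) assigning a random P3 to each P2."""

    def __init__(self):
        self._p2_to_p3 = {}

    def assign(self, p2: str) -> str:
        if p2 not in self._p2_to_p3:
            self._p2_to_p3[p2] = secrets.token_hex(10)
        return self._p2_to_p3[p2]


K1 = secrets.token_bytes(16)  # available to the data suppliers A, B and C
K2 = secrets.token_bytes(16)  # known only to the trusted third party V


def supplier_stage(identity_data: str) -> str:
    """Step 2: a data supplier forms P1 from the identity data."""
    return keyed(K1, identity_data)


def trusted_party_stage(p1: str) -> str:
    """Step 4: V replaces P1 with P2."""
    return keyed(K2, p1)


if __name__ == "__main__":
    lx, ly = PseudonymList(), PseudonymList()
    p2 = trusted_party_stage(supplier_stage("Peter Müller|31.01.1965|Cologne"))
    # Step 8: evaluators X and Y receive unrelated pseudonyms for the same person.
    print(lx.assign(p2), ly.assign(p2))
```

Because a keyed checksum cannot be inverted, this variant does not let V compute P1 back from P2; where re-identification through V is required, a reversible calculation method (encryption) would be needed for the second stage.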
D.2.3 Separate storage of the cryptographic key

D.2.3.1 Access control (authorization concept)

A separate storage of the cryptographic key requires a documented authorization concept. At least two different roles must be defined:
1) The role with access authorization to the key for re-identification;
2) The role with access to the pseudonymised content data.

It is recommended to define the following roles for a pseudonymisation procedure:

1. Provide data
2. Pseudonymise data and re-identify it, if necessary
3. Collect data and merge them using pseudonyms ("record linkage")
4. Evaluate data

It is mandatory that roles 2 and 4 exist separately from each other.

It should be avoided that a person is assigned to multiple roles. This also applies to administrators. Any exceptions must be justified and documented.

Access to a cryptographic key must be restricted to an absolute minimum of trustworthy persons (need-to-know principle).

The possibility of re-identification should not exist in the department of an organisation in which content data belonging to a pseudonym are processed. Any exceptions must be justified and documented.

D.2.3.2 Four-eyes principle

Any access to a cryptographic key for the re-identification of identity data must follow the four-eyes principle. This can be solved technically or organisationally. Furthermore, none of the persons involved should have access rights to the cryptographic key as well as to the pseudonym and the associated content data. If the four-eyes principle is not possible, at least the access to the cryptographic key must be logged individually.

D.2.4 Documentation of technical and organisational measures for non-assignability

Technical and organisational measures to ensure that a pseudonym cannot be assigned to identity data, for example in the case of missing legitimation, must be documented. This can be done in a pseudonymisation concept. The concept must be integrated into an IT security management system (e.g. ISO/IEC 27001). The IT security management system has to be documented and its effectiveness regularly reviewed.

D.2.5 Rules for disclosure

Since a re-identification of identity data may be possible during pseudonymisation, a planned disclosure of a pseudonym must be regulated. To this end, a documented definition of cases of a desired disclosure is needed. The process of re-identifying the data subject must also be logged. The record must show which persons carried out the re-identification. No conclusions about the identity data on which a pseudonym is based may be drawn from the recording. Therefore, the scope of the logging must be restricted. Log data may only be stored for a limited time.

D.2.6 Loss of purpose for processing

The purposes and duration of the pseudonymisation procedure shall be determined in advance, and the measures for the termination of the procedure, including the technical implementation of a data deletion, shall be documented.

If the purpose for a pseudonymisation no longer applies, e.g. the data is no longer needed, pseudonymised data must be deleted or anonymised in accordance with data protection regulations. Such anonymisation cannot usually be achieved by deleting the pseudonyms, but must take place as an independent procedure for which special requirements apply which cannot be dealt with in detail here. In the case of anonymisation, it must also be checked at regular intervals whether the data can still be classified as anonymous.

If a data subject has a right to have his/her data deleted, this right refers to personal data and pseudonymised data, not to anonymous data. Legal retention periods must be observed.
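One possible technical realisation of the four-eyes principle from D.2.3.2 is to split the re-identification key into two shares held by different persons. The following minimal sketch uses a two-out-of-two XOR split and a restricted access log in the spirit of D.2.5. The paper does not prescribe this technique; the function names, custodian identifiers and log format are assumptions.

```python
import secrets
from datetime import datetime, timezone
from typing import List, Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two shares; each share alone reveals nothing about the key."""
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine both shares into the original key."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


# The log records who accessed the key and when, but no pseudonym or identity data.
access_log: List[dict] = []


def release_key(share_a: bytes, custodian_a: str,
                share_b: bytes, custodian_b: str) -> bytes:
    """Release the key only if two different custodians contribute their shares."""
    if custodian_a == custodian_b:
        raise PermissionError("two different custodians are required")
    access_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "custodians": (custodian_a, custodian_b),
    })
    return combine_shares(share_a, share_b)
```

In practice the shares would be stored separately, and the log would need its own retention limit, since log data may only be stored for a limited time.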
E. Best practices

E.1 Linkable pseudonymisation methods

A pseudonymisation process provides linkable pseudonyms if identical or similar pseudonyms are generated for persons with the same or similar identity data. In this case, data records can be merged using pseudonyms. Linkable methods are important for long-term studies, for example, or if the data sets come from different sources and are to be merged for one study. The process of merging by means of linkable pseudonyms is referred to in specialist literature as record linkage.

Examples:

1. For studies on the legal probation of offenders, the content data (offence, sentence, age, etc.) are collected in a database. For data protection reasons, the entries may not have any personal reference. Authorities are regularly obliged to delete data on previous convictions of persons after legally prescribed periods of time. However, in order to carry out long-term studies on the recidivism of offenders, the data material could be analysed using linkable pseudonyms.

2. The German epidemiological cancer registries collect pseudonymised data sets on cancer patients in order to investigate the success of different treatment methods. Data suppliers include doctors, hospitals and death registers. Some of the data extend over long periods of time and may even originate from different federal states, as the patients may have changed their place of residence. Meaningful studies can only be created on the basis of linkable pseudonyms.

Note:
If there are several pseudonyms in the data collection for a data set (see note 5 in Section D.2.2.2), the data sets can be linked if only one of the pseudonyms matches.

E.2 Selection of identity data

All attributes relating to a person that allow the person to be more closely identified belong to the identity data of the person. These could be for example:

▪ First name, family name and maiden name
▪ Gender
▪ Date and place of birth
▪ Place of residence and nationality
▪ Number of siblings
▪ Occupation or occupational group
▪ Health insurance or identity card number
▪ and much more.

E.2.1 Identity data for the calculation of pseudonyms

The identity data of a person can be used, as described in Section D.2.2.2, to calculate the pseudonym of the person.

It has to be taken into account that, when a cryptographic function is used to calculate the pseudonyms, the same identity data will provide the same pseudonyms, but even minor deviations in the identity data will lead to completely different pseudonyms. Reasons for a change of the pseudonyms can be:

▪ Writing and typing errors or transposed numbers
▪ Change of name due to wedding or divorce
▪ Different spellings of the first name (e.g. Hans/Johannes, Inge/Ingrid)
▪ Change of residence
▪ Change of name of a locality due to a territorial reform
▪ Ignorance of an attribute (e.g. place of birth)
▪ and much more.

If a person is assigned different pseudonyms at different times or from different places, one speaks of a synonym error. In this case, the pseudonym can no longer be linked to this person.

The synonym error rate can be reduced by the following measures:

▪ Omission of an attribute when calculating the pseudonym, for example, only the year of birth is used instead of the complete date of birth.
▪ Restriction of the name to the initial letter or letters (e.g. the first three)
▪ Use of a name or phonetic code instead of the name (see for example de.wikipedia.org/wiki/Kölner_Phonetik)
▪ Use of the municipality code number instead of place of residence or birth
▪ and much more.

If, on the other hand, different people receive the same pseudonym at different
times or from different places, this is referred to as a homonym error. If the pseudonyms are calculated from the identity data, homonym errors always occur if the identity data from which the pseudonyms are calculated match for both persons. The homonym error rate can be reduced by the following measures:

▪ Adding additional attributes for the calculation of the pseudonym, e.g. the complete date of birth can be used instead of only the year of birth.
▪ Use of long-lasting unique characteristics for calculating pseudonyms, such as the pension insurance or health insurance numbers
▪ and much more.

Comments:

1. In the case of a high synonym error rate, values are generally underestimated (e.g. the relapse rate in a legal probation study or the mortality rate in a specific treatment method).
2. With a high homonym error rate, values are generally overestimated.
3. A reduction of the synonym error rate usually results in an increase of the homonym error rate - and vice versa.
4. A compromise between synonym error rate and homonym error rate strongly depends on the underlying or expected data collection. The attributes of the identity data to be used for the calculation of the pseudonyms are to be selected accordingly.

E.2.2 Identity data in the content data

In pseudonymised data collections, the content data may still contain identification data, provided that this can be of significance for the intended research using the data collection. For example, gender, age, place of residence (as a five-digit postal code) or occupation may be of interest. In certain cases, however, it may be possible to identify individuals solely on the basis of the identity data contained in the content data.

For example, it is conceivable that there is only one floor tiler in the postal code area 65432. This person would then undoubtedly be identifiable in the data collection. However, even if there are several floor tilers with the postal code 65432, it would have to be ensured that they do not all have a certain characteristic in common, for example a certain illness. Otherwise, anyone who knows that a person is a floor tiler by profession and has the postal code 65432 would immediately know that this person suffers from this illness.

For a pseudonymised data collection, k-anonymity and l-diversity must therefore be guaranteed.

A data collection offers k-anonymity if the identity data of each individual person contained in it matches that of at least k - 1 other persons.

A data collection offers l-diversity if there are at least l different forms of content data for each group of identical identity data contained therein.

k and l are natural numbers.

Comments:

1. Larger values for k and l represent a greater anonymity in this context.

2. k-anonymity and l-diversity can be achieved by aggregating the attributes in the identity data.

Examples:
▪ Instead of "tiler", "craftsman" is indicated as the occupation.
▪ All postal codes in the data collection that begin with 654 are combined. Instead of 65432, 654xx is then stored in the data collection.

3. k-anonymity and l-diversity shall be established by the pseudonymising body (see Section D.2.3.a). For this purpose, the pseudonymising entity must have access to the attributes of the identity data contained in the content data.
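The definitions of k-anonymity and l-diversity, and the aggregation examples from comment 2, can be checked with a short sketch. The records, field names and the occupation mapping below are invented for illustration; only the "tiler" to "craftsman" and "65432" to "654xx" generalisations come from the text.

```python
from collections import defaultdict

# Assumed mapping of occupations to an occupational group.
OCCUPATION_GROUPS = {"tiler": "craftsman", "carpenter": "craftsman", "plumber": "craftsman"}


def k_and_l(records, identity_attributes, content_attribute):
    """Return (k, l): smallest group size and smallest number of distinct
    content values over all groups of identical identity data."""
    groups = defaultdict(list)
    for record in records:
        group_key = tuple(record[a] for a in identity_attributes)
        groups[group_key].append(record[content_attribute])
    if not groups:
        return 0, 0
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l


def generalise(record):
    """Aggregate postal code to three digits and occupation to its group."""
    out = dict(record)
    out["postal_code"] = record["postal_code"][:3] + "xx"
    out["occupation"] = OCCUPATION_GROUPS.get(record["occupation"], record["occupation"])
    return out


RECORDS = [
    {"postal_code": "65432", "occupation": "tiler", "diagnosis": "diabetes"},
    {"postal_code": "65433", "occupation": "carpenter", "diagnosis": "asthma"},
    {"postal_code": "65410", "occupation": "tiler", "diagnosis": "asthma"},
    {"postal_code": "65499", "occupation": "plumber", "diagnosis": "diabetes"},
]

if __name__ == "__main__":
    attributes = ["postal_code", "occupation"]
    print(k_and_l(RECORDS, attributes, "diagnosis"))
    print(k_and_l([generalise(r) for r in RECORDS], attributes, "diagnosis"))
```

In this toy data set every person is unique before aggregation (k = 1, l = 1); after aggregation all four records fall into one group with two different diagnoses (k = 4, l = 2).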
E.3 Involving a trusted third party

The security of pseudonymisation procedures is generally increased if the roles mentioned in Section D.2.3.a are separated organisationally and locally. A trusted third party receives the data collection of the data supplier(s), generates the pseudonyms and forwards them to the data collection entity. The data collection entity then merges the data received using the pseudonyms. The data collection entity then passes them on to the data evaluator(s). In this way, neither the data collection entity nor the data evaluator comes into contact with the identity data at any time.

After pseudonymisation at the trusted third party, the trusted third party may be obliged to delete the identity data irretrievably if there is no need to re-identify the pseudonyms (see sections D.2.5 and E.4). After completion of the entire procedure, the trusted third party may be obliged to delete the cryptographic keys used.

The trusted third party does not need to know the content data, but only the identity data in the case of a linkable pseudonymisation procedure. It is therefore advisable to send the content data via a separate transmission path from the data suppliers directly to the data collection entity. The separate transmission path can be of a physical nature; however, the content data can also pass through the trusted third party and be encrypted using an encryption procedure in which only the data collection entity is able to decrypt the data.

E.4 Discoverability of pseudonyms/re-identification

Under certain circumstances, it may be necessary to trace the associated person or his or her identity data from a pseudonym.

In the event that the pseudonym has been created by a calculation procedure from the identification data, it is necessary for discovery that the cryptographic keys used have not been deleted. If the formation of the pseudonyms was based on an encryption process, the pseudonym can be decrypted immediately in order to access the identity data. If the pseudonym was formed by a cryptographic checksum, it is not possible to discover the identity data directly. However, if the key K used has not been deleted, the identity data can be determined by a complete exhaustion of all relevant identity data (see comment 4 in Section D.2.2.2).

If the pseudonym was created from the identity data by a pseudonymisation list, it is necessary for discoverability that the pseudonymisation list used has not been deleted.

For multi-stage and mixed processes, all cryptographic keys and pseudonymisation lists used for creation are required for discoverability. In the example scenario from section D.2.2.3, a re-identification of a pseudonym P3, which is present at the data evaluator X, would be possible as follows:

1. X returns the pseudonym P3 to S.
2. S determines the pseudonym P2 from P3 using the list LX.
3. S returns the pseudonym P2 to V.
4. V calculates the pseudonym P1 from P2 using the key K2.
5. V returns the pseudonym P1 to an authorized authority that has knowledge of the key K1.

The authorized authority determines the associated identity data from P1 using the key K1.
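For pseudonyms formed by a cryptographic checksum, the last step amounts to the "complete exhaustion" described above: the holder of the key recomputes the checksum for every candidate and compares. A minimal sketch, assuming HMAC-SHA-256 as the checksum and a small register of candidate identity data; the register format and function names are hypothetical.

```python
import hashlib
import hmac


def checksum_pseudonym(key: bytes, identity_data: str) -> str:
    """Pseudonym = HMAC_K(ID), here with HMAC-SHA-256."""
    return hmac.new(key, identity_data.encode("utf-8"), hashlib.sha256).hexdigest()


def reidentify(key: bytes, pseudonym: str, candidates):
    """Recompute the checksum for every candidate and return the one that matches."""
    for identity_data in candidates:
        if hmac.compare_digest(checksum_pseudonym(key, identity_data), pseudonym):
            return identity_data
    return None


# Assumed register of all relevant identity data, in an illustrative format.
REGISTER = [
    "Peter Müller|31.01.1965|Cologne",
    "Maria Schulze|03.05.1959|Hürth",
    "Max Klein|31.10.1967|Bornheim",
]
```

Without the key, or with a key that has since been deleted, the same loop finds no match, which is why deleting the key removes discoverability for checksum-based pseudonyms.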