Contact Tracing Mobile Apps for COVID-19: Privacy Considerations and Related Trade-offs
Hyunghoon Cho∗ (Broad Institute of MIT and Harvard, hhcho@broadinstitute.org)
Daphne Ippolito∗ (University of Pennsylvania, daphnei@seas.upenn.edu)
Yun William Yu∗ (University of Toronto, ywyu@math.toronto.edu)
∗ Authors listed alphabetically.

arXiv:2003.11511v2 [cs.CR] 30 Mar 2020

Abstract

Contact tracing is an essential tool for public health officials and local communities to fight the spread of novel diseases, such as in the COVID-19 pandemic. The Singaporean government just released a mobile phone app, TraceTogether, that is designed to assist health officials in tracking down exposures after an infected individual is identified. However, there are important privacy implications of the existence of such tracking apps. Here, we analyze some of those implications and discuss ways of ameliorating the privacy concerns without decreasing usefulness to public health. We hope in writing this document to ensure that privacy is a central feature of conversations surrounding mobile contact tracing apps and to encourage community efforts to develop alternative effective solutions with stronger privacy protection for the users. Importantly, though we discuss potential modifications, this document is not meant as a formal research paper, but instead is a response to some of the privacy characteristics of direct contact tracing apps like TraceTogether and an early-stage Request for Comments to the community.

Date written: 2020-03-24
Minor correction: 2020-03-30

1 Introduction

The COVID-19 pandemic has spread like wildfire across the globe [1]. Very few countries have managed to keep it well-controlled, but one of the key tools that several such countries use is contact tracing [2]. More specifically, whenever an individual is diagnosed with the coronavirus, every person who had possibly been near that infected individual during the period in which they were contagious is contacted and told to self-quarantine for two weeks [3]. In the early days of the virus, when there were only a few cases, contact tracing could be done manually. With hundreds to thousands of cases surfacing in some cities, contact tracing has become much more difficult [4].

Countries have been employing a variety of means to enable contact tracing. In Israel, legislation was passed to allow the government to track the mobile-phone data of people with suspected infection [5]. In South Korea, the government has maintained a public database of known patients, including information about their age, gender, occupation, and travel routes [6]. In Taiwan, medical institutions were given access to patients’ travel histories [7], and authorities track phone location data for anyone under quarantine [8]. And on March 20, 2020, Singapore released an app that tracks via Bluetooth when two app users have been in close proximity: when a person reports they have been diagnosed with COVID-19, the app allows the Ministry of Health to identify anyone logged as having been near them; a human contact tracer can then call those contacts and determine appropriate follow-up actions.

Solutions that have worked for some countries may not work well in other countries with different societal norms. We believe that in the United States, in particular, the aforementioned measures are unlikely to be widely adopted. On the legal side, publicly revealing patients’ protected health information (PHI) is a violation of the federal HIPAA Privacy Rule [9], and the Fourth Amendment bars the government from requesting phone data without cause [10]. Some of these norms may be suspended during times of crisis: HIPAA has recently been relaxed via enforcement discretion during the crisis to allow for telemedicine [11], and a public health emergency could well be argued to be a valid cause [12]. However, many Americans are wary of sharing location and/or contact data with tech companies or the government, and any privacy
concerns could slow adoption of the system [13].

Singapore’s approach of an app, which gives individuals more control over the process, is perhaps the most promising solution for the United States. However, while Singapore’s TraceTogether app protects the privacy of users from each other, it has serious privacy concerns with respect to the government’s access to the data. In this document, we discuss these privacy issues in more detail and introduce approaches for building a contact tracing application with enhanced privacy guarantees, as well as strategies for encouraging rapid and widespread adoption of this system. We do not make explicit recommendations about how one should build a privacy-preserving contact tracing app, as any design implementation should first be carefully vetted by security, privacy, legal, ethics, and public health experts. However, we hope to show that there exist options for preserving several different notions of user privacy while still fully serving public health aims through contact tracing apps.

2 Singapore’s TraceTogether App

On March 20, 2020, the Singaporean Ministry of Health released the TraceTogether app for Android and iOS [14]. It operates by exchanging tokens between nearby phones via a Bluetooth connection. The tokens are also sent to a central server. These tokens are time-varying random strings, associated with an individual for some amount of time before they are refreshed. Should an individual be diagnosed with COVID-19, the health officials will ask* them to release their data on the app, which includes a list of all the tokens the app has received from nearby phones. Because the government keeps a database linking tokens to phone numbers and identities, it can resolve this list of tokens to the users who may have been exposed.

By using time-varying tokens, the app does keep the users private from each other. A user has no way of knowing who the tokens stored in their app belong to, except by linking them to the time the token was received. However, the app provides little to no privacy for infected individuals; after an infected individual is compelled to release their data, the Singaporean government can build a list of all the other people they have been in contact with. We will formalize these several notions of privacy in Section 3.

* While the health officials ask, it is a crime in Singapore not to assist the Ministry of Health in mapping one’s movements, so ‘ask’ is a bit of a misnomer [15].

3 Desirable Notions of Privacy

Here, we discuss three notions of privacy that are relevant to our analysis of contact-tracing systems: (1) privacy from snoopers, (2) privacy from contacts, and (3) privacy from the authorities. Note that in this document, we do not rigorously define what it means for information to be private, as this is a topic better left for future works; some popular definitions include information-theoretic privacy [16], k-anonymity [17], and differential privacy [18]. Furthermore, we discuss only these three notions of privacy to illustrate some of the shortcomings of direct contact-tracing systems. Other recent work has presented a useful taxonomy of the risks and challenges of contact tracing apps [19].

For any contact tracing app that achieves the aim of telling individuals that they might have been exposed to the virus, there is clearly some amount of information that has to be revealed. Even if the only information provided is a binary yes/no to exposure, a simple linkage attack [20] can be performed: if the individual was only near to one person in the last two weeks, then there will be an obvious inference about the infection status of that person. The goal is of course to reduce the amount of information that can be inferred by each of the three parties (snoopers, contacts, and the authorities) while still achieving the public health goal of informing people of potential exposures to help slow the spread of the disease.

Of note, here we use a semi-honest model for privacy [21], where we do not consider the possibility of malicious actors polluting the database or sending malformed queries, but rather instead just analyze the privacy loss from the information revealed to each party. A nefarious actor could, for example, falsely claim to be infected to spread panic; this is not a privacy violation, though we do consider this further in the Discussion. Alternately, when a server exposes a public API, queries can be crafted to reveal more information than intended by the system design, which is indeed a privacy violation. We leave a more thorough analysis of safeguards for the malicious model to future work.

3.1 Privacy from Snoopers

Consider the most naïve system for contact tracing, which no reasonable privacy-conscious society would ever use, where the app simply broadcasts the name and phone number of the phone’s owner, and nearby phones log this information. Then,
upon diagnosis of COVID-19, the government publishes a public list of those infected, which the app then checks against its list of known recent contacts. This is clearly problematic as a nefarious passive actor (a ‘snooper’) could track the identities of people walking past them on the street.

A slightly more reasonable system would assign a unique user-ID to each individual, which is instead broadcast out. This does not have quite as many immediate security implications, though all it would take is a nefarious actor linking each ID to a user before one runs into the same problem, which is known as a ‘linkage attack.’ Given how easy and common linkage attacks are, this approach also provides insufficient levels of privacy for users [22; 23].

The Singaporean app TraceTogether does better, in that it instead broadcasts random time-varying tokens as temporary IDs. Because these tokens are random and change over time, someone scanning the tokens while walking down the street will not be able to track specific users across different time points, as their tokens are constantly refreshed. Note that the length of time before refreshing a token is an important parameter of the system (too infrequent and users can still be tracked, too frequent and the number of tokens that need to be stored by the server could be huge), but with a reasonable refresh rate, the users are largely protected against attacks by snoopers in public spaces.

3.2 Privacy from Contacts

Here, the term contact is defined as any individual with whom a user has exchanged tokens in the contact tracing app based on some notion of physical proximity. Privacy from contacts is harder to achieve, because the information that needs to be passed along is whether one of the individual’s contacts has been diagnosed with COVID-19, so some information has to be revealed.

The TraceTogether app gives privacy from contacts by instead putting trust in government authorities. When TraceTogether alerts a contact that they have been exposed to COVID-19, the information comes directly from the Singaporean Ministry of Health, and no additional information is shared (to our knowledge) that could identify the individual that was diagnosed. Thus, TraceTogether does protect users’ privacy from each other, except for what can be inferred based on the user’s full list of contacts, as the only information that is revealed to the user is a binary exposure indicator, which is arguably the minimum possible information release for the system to be useful.

3.3 Privacy from the Authorities

Protecting the privacy of the users from the authorities, i.e. whoever is administering the app, whether that is a government agency or a large tech company, is also a challenging task. Clearly, in the absence of a fully decentralized peer-to-peer system, any information sharing among phones with the app installed will have to be mediated by some coordinating servers. Without any protective measures (e.g. based on cryptography), the coordinating servers are given an inordinate amount of knowledge.

TraceTogether does not privilege this type of privacy, instead making use of relatively high trust in the government in its design. While it does not deliberately gather more information than necessary to build a contact map (for example, it does not use GPS location information, as Bluetooth is sufficient for finding contacts), it also does not try to hide anything from the Singaporean government. When a user is diagnosed with COVID-19 and gives their list of tokens to the Ministry of Health, the government can retrieve the mobile numbers of all individuals that user has been in contact with. Thus, neither the diagnosed user nor the exposed contacts have any privacy from the government.

Furthermore, because the government maintains a database linking together time-varying tokens with mobile numbers, they can also, in theory, track people’s activities without GPS simply by placing Bluetooth receivers in public places. There is no reason to disbelieve the TraceTogether team when they state that they do not attempt to track people’s movements directly; however, the data they have could be employed to do so. Citizens of countries such as the U.S. trust authorities much less than Singaporeans [24], so the privacy trade-offs that Singaporeans are willing to make may not be the same ones that Americans will accept.

4 Privacy-Enhancing Augmentations to the TraceTogether System

Here, we discuss potential approaches to build upon the TraceTogether model to obtain a contact tracing system with differing privacy characteristics for the users. Though important and
highly nontrivial, various technical and engineering challenges behind the exchange of Bluetooth tokens [25] are outside the scope of this document. Our abstraction is that there exists some mechanism for nearby phones to exchange short tokens if the devices come within 6 feet of each other, the estimated radius within which viral transmission is a considerable risk [26]. We are primarily concerned with the construction of those tokens, and how those tokens can be used to perform contact tracing in a privacy-preserving manner.

Table 1: Comparison of contact tracing systems discussed in this document with respect to privacy of the users in the semi-honest model and required computational infrastructure.

System | Privacy from snoopers | Privacy from contacts (exposed user) | Privacy from contacts (diagnosed user) | Privacy from authorities (exposed user) | Privacy from authorities (diagnosed user) | Infrastructure requirements
TraceTogether [14] | Yes | Yes | Yes | No. Exposure status and all tokens revealed. | No. Infection status, all tokens, and all contact tokens revealed. | Minimal.
Polling-based* (§4.1) | Yes | Yes | Yes† | Partial. Susceptible to linkage attacks. | Partial. Susceptible to linkage attacks. | Low. Single server.
Polling-based with mixing (§4.3) | Yes | Yes | Yes† | Almost private. Protects against linkage attacks by mixing tokens from different users. | Almost private. Protects against linkage attacks by mixing tokens from different users. | Medium. Multiple servers for mixing.
Public database (§4.4) | Yes | Yes | Partial. Info leaked at time of token exchange. | Yes | Partial. Susceptible to linkage attacks. | Communication cost to phones is high.
Private messaging system (§5) | Yes | Yes | Partial. Info leaked at time of token exchange.‡ | Yes | Yes | High. Multiple servers performing crypto.

* Augmenting with random tokens does not improve privacy.
† However, if contacts are malicious, and they send malformed queries (e.g. a query that includes only a single token), the diagnosed individual only has the same privacy level as in the public database solution. Namely, there’s only partial privacy because information is leaked through knowing the time of token exchange.
‡ This information leakage might be fixable using data aggregation based on multi-key homomorphic encryption, but we do not do so here.

First, we formally describe the TraceTogether system. Let Alice and Bob be users of the app, and let Grace be the government server (or other central authority). Alice generates a series of random tokens $A = \{a_0, a_1, \ldots\}$, one for each time interval, and Bob generates a similar series of tokens $B = \{b_0, b_1, \ldots\}$, all drawn randomly from some space $\{0, 1\}^N$. They also both report their lists of tokens $A$ and $B$, as well as their phone numbers, to Grace. At a time $t$, Alice and Bob encounter each other, exchanging $a_t$ and $b_t$. Alice and Bob keep lists of contact tokens $\hat{A} = \{\hat{a}_0, \hat{a}_1, \ldots\}$ and $\hat{B} = \{\hat{b}_0, \hat{b}_1, \ldots\}$ respectively. These consist of tokens from every person they were exposed to; i.e. $b_t \in \hat{A}$ and $a_t \in \hat{B}$ because Alice and Bob exchanged tokens at time $t$. Five days later, Bob is diagnosed with COVID-19, and sends his list of contact tokens $\hat{B}$, which includes $a_t$, to Grace. Grace then matches each $\hat{b}_i$ to a phone number, reaches out to those individuals, including Alice, and advises them to quarantine themselves because they may have been exposed to the virus.

4.1 Partially Anonymizing via Polling

Instead of having Grace reach out to Alice when Bob reports that he has been diagnosed, a more privacy-conscious alternative is for Alice to “poll” Grace on a regular basis. In this setting, Grace maintains the full database, and Alice asks Grace if she has been exposed. This alternative does not require Alice and Bob to send their phone numbers to Grace. In this setting, there are two reporting choices for when Bob wishes to declare his diagnosis of COVID-19. Bob can send his own tokens $B$ to Grace, or he can send the contact tokens $\hat{B}$ to Grace. In the former case, Alice needs to send Grace her contact tokens $\hat{A}$ to see if any have been diagnosed with COVID-19. In the latter case, Alice needs to send Grace her own tokens $A$ to ask if any
of them have been published. Either way, Grace is able to inform Alice that she has been exposed, without revealing Bob’s identity. This presupposes that Alice is honest but curious (semi-honest); if Alice is malicious and crafts a malformed query containing only the token she exchanged with Bob, she may be able to reveal Bob’s identity.

Note that in either version of this system, individuals still have privacy from snoopers and from contacts. However, they additionally gain some amount of privacy from the authorities, as Grace does not have their mobile numbers. Of course, Grace does have some ability to perform linkage attacks. If Bob publishes to Grace his own tokens $B$ upon being diagnosed, and Alice queries Grace with all her contact tokens $\hat{A}$, then Grace can attempt to link those sets of tokens to individuals or geographic areas; further, Grace can also monitor the source of Alice and Bob’s queries (i.e. IP addresses of phones). For example, if Grace has Bluetooth sensors set up in public places, she can then trace Alice and Bob’s geographic movements. That kind of location trace is often sufficient to deanonymize personal identities [23]. Alternatively, the same is true if Bob publishes his contact tokens to Grace and Alice queries Grace with her own tokens. Thus, there is not perfect privacy from the authorities, but it is still better than in the original TraceTogether system, at the cost of potentially lower privacy for Bob in the malicious model.

4.2 Ineffectiveness of Adding Spurious Tokens for Further Anonymization

To further anonymize the polling-based system to increase privacy from authorities, there are a number of techniques that can be used to hide Alice and Bob’s identities. Let’s begin with a simple approach (one that doesn’t actually work) to give some intuition before moving on to more effective approaches. Consider injecting random noise by augmenting the data with artificial tokens. Whenever Alice and Bob send information to Grace (either in the form of a diagnosis report or a query), they can augment their tokens with random ones. Note that some care has to be taken in deciding which distribution to draw the random tokens from. Not only should the system keep the probability of spurious matches low, but the distributions should also be designed to make inferences by Grace difficult.

For example, assume that Alice and Bob sample their tokens uniformly at random from $\{0, 1\}^N$, where $N$ is chosen to be sufficiently large that accidental collisions between individuals’ tokens are unlikely. Suppose Bob sends to Grace his own tokens $B$ upon being diagnosed, and Alice queries Grace with all her contact tokens $\hat{A}$. In theory, Bob could augment his own tokens with a set of $n$ random tokens $\{r_i\}_{i=1}^{n}$ drawn uniformly from $\{0, 1\}^N$, and send those to Grace as well. Unfortunately, $N$ was chosen to prevent accidental collisions; this means that the probability that the additional random tokens correspond to the tokens broadcast by any individual is vanishingly small. But then, there is actually little to no privacy gained. Grace can just assume that the augmented set of tokens corresponds to Bob, and perform the same linkage analysis that she would with only the correct set of tokens. This does nothing but pollute Grace’s database with extra data, without affording any real privacy gains for Bob. Similarly, Alice also cannot obfuscate her exposure through Bob from Grace, because any extra tokens she sends to Grace will not change the fact that she has Bob’s token as one of her contacts.

The root of the problem is that Grace has access to the universe of all tokens through user queries, and so can simply filter out all of the random tokens generated. Thus, random noise is ineffective for hiding information from Grace.

4.3 Enhancing Anonymity by Mixing Different Users’ Tokens

Although introducing spurious random tokens into the system achieves little in terms of privacy, as discussed in the previous subsection, a slight modification of this idea leads to meaningful privacy guarantees. The issue is that Grace has access to the entire universe of tokens, as well as both of the sets of tokens corresponding to Alice and Bob, possibly augmented with random noise. Instead of hiding true tokens with random noise, suppose the system includes a set of $M$ honest-but-curious non-colluding “mixing” servers not controlled by Grace that aggregate data before forwarding it on to Grace.

When Bob is diagnosed with COVID-19, he partitions the tokens he wishes to send (depending on the setup of the system, either his own tokens, or those of his contacts) into $M$ groups, and sends each group to one of the mixing servers. The mixing servers then combine Bob’s data with that of
other users diagnosed with COVID-19 before forwarding it on to Grace. Similarly, Alice does the same thing for querying, except she also needs to wait on a response from the mixing server for each of the tokens she sends. The linkage problem then becomes much more difficult for Grace, because the valid tokens for individuals have been split up. Similarly, each mixing server only has access to a subset of the tokens corresponding to each individual, making the linkage analysis more difficult for them. Of course, if the mixing servers collude, then the privacy reduces to that of the standard polling-based approach.

Note that this approach can also be simulated without the mixing servers by either Alice or Bob if they have access to a large number of distinct IP addresses. They can simply send their queries and tokens with some time delay from the different IP addresses, preventing Grace from linking all of them together. However, this approach may not be feasible for most users.

4.4 Public Database of Infected Users’ Tokens is Efficient but Less Private

Alternatively, Grace can simply publish the entire database of tokens she receives from infected individuals, including the ones from Bob. If Alice simply downloads the entire database, and locally queries against it, then no information about Alice’s identity is leaked to Grace.

This approach may seem less computationally feasible, especially on mobile devices. In circumstances where the total number of people infected is not very high, this approach works, as evidenced by the South Korean model [6], though the approach may fail as the epidemic reaches a peak. However, the computational and transmission cost can be partially ameliorated by batching together Grace’s database, so that Alice is not downloading the entire thing. For example, in the version where Bob sends his own tokens $B$ to Grace, Alice can download batches corresponding to her contact tokens $\hat{A}$. If each batch has e.g. 50 tokens, then Grace does not know which of those 50 tokens Alice came into contact with.

Unfortunately, it is worth noting that this approach decreases Bob’s privacy from Alice, because Alice knows when she encountered the token Bob sent; she can then limit the number of possible individuals who could have sent the token based on who she was in contact with during the time she encountered Bob’s token. If the token she exchanged with Bob is present in the database, she gets a hint as to the disease status of one of the individuals she was in contact with during the token exchange.

5 Privacy from Authorities based on Private Messaging Systems

None of the easy-to-implement augmentation ideas given in Section 4 guarantee full privacy from the authorities. At a cost of more computation, however, we believe that a solution for secure contact tracing can be built using modern cryptographic protocols. In particular, private messaging systems [27; 28; 29] and private set intersection (cardinality) [30; 31; 32; 33] protocols seem especially relevant. The sketch we provide below is based on private messaging systems, though we do not claim this to be an optimal implementation.

We will give the intuition here before going into technical details necessary for an effective implementation. First, we replace the random tokens $(a_t, b_t)$ exchanged by Alice and Bob with random public keys $(pk_t^A, pk_t^B)$ from asymmetric encryption schemes [34]. The matching secret keys are stored locally on each of Alice’s and Bob’s phones. Then, imagine that Grace has established a collection of mailboxes, one for each public key that Alice and Bob exchange. Additionally, we introduce Frank and Fred. Frank forwards messages to/from Fred. Fred forwards messages to/from Grace. They do not tell each other the source of the messages.

At fixed time points after Bob’s contact with Alice (up to some number of days), Bob addresses a message to Alice encrypted using the public key Alice gave Bob. Bob gives the message to Frank, who then forwards it on to Grace (through Fred), who puts it in Alice’s mailbox. The content of the message is Bob’s current infection status, and the reason he sends messages at fixed time points is to prevent Frank from figuring out Bob’s infection status from the fact that he is sending messages. Alice checks all of the mailboxes corresponding to her last several days’ worth of broadcast public keys. In one of the mailboxes, she then receives and decrypts Bob’s message, and learns whether she has been exposed to the virus. Grace cannot decrypt the message Bob sends to Alice because it is protected by asymmetric encryption. Furthermore, to protect Alice’s privacy, she can also access her mailboxes through Frank and Fred, who deliver the messages in Alice’s mailboxes to her without
revealing which mailboxes she owns.

[Figure 1 panels. “At Contact”: Alice and Bob exchange public keys over Bluetooth. “Periodically After Contact”: Bob sends his encrypted infection status (“I am (not) infected.”) to Alice’s mailbox; the proxy servers (Frank and Fred) obfuscate mailbox access patterns; Grace maintains mailboxes, but cannot tell Bob sent a message to Alice; Alice retrieves and decrypts messages in her mailbox.]

Figure 1: Overview of contact tracing based on private messaging systems. When Alice and Bob are near each other they exchange public keys as tokens. They then periodically encrypt (using each other’s public key, followed by the public keys of the proxy servers) a message indicating their infection status, and send it to the proxy server. They also periodically query the proxy server for messages posted to the mailboxes corresponding to their public keys to find out whether they have been exposed to the virus.

Contact tracing can be viewed as a problem of secure communication between pairs of users who came into contact in the physical world. The communication patterns of who is sending messages to whom can reveal each individual’s contact history to the service provider (Grace). This notion is known as metadata privacy leakage in computer security [35], where the metadata associated with a message (e.g. sender/recipient and time) is considered sensitive, in addition to the actual message contents. In the contact tracing case, such metadata could reveal who has been in contact with whom, potentially revealing the users’ sensitive activities. We believe that recent technical advances [36; 27; 29] for designing scalable private messaging systems with metadata privacy present a promising path for developing a similar platform for secure contact tracing.

Following recent works, our idea is to leverage a ‘mix network’ [37], which is a routing protocol that uses a chain of proxy servers (Frank/Fred) that individually shuffle the incoming messages before passing them on to the next server, thereby decoupling the sender of each message from its destination. These types of mix networks are perhaps best known for being the basis of the Onion Router/Tor anonymity network [38]. This is a more sophisticated use of mixing servers than described in Section 4.3 for the polling-based solution. When Bob wishes to send his encrypted message to Alice, he first encrypts it multiple times with public keys corresponding to each of the servers in the mix network. Because the messages are encrypted in multiple layers, and each server peels only the outermost layer, the final destination (Alice’s mailbox) is revealed only to the last server, and only Alice can read the content of the message (i.e. infection status). To prevent Grace from learning the identity associated with each mailbox, Alice can also access her mailboxes through the mix network, which shuffles the traffic to decouple the mailboxes from their owners. As long as one of the servers is neither breached nor controlled by the adversary, the final message cannot be linked to a specific sender even if the adversary has full control of the rest of the network. Such a system for private communication could allow the users (Bob) to share their infection status with their recent contacts (Alice) while hiding the metadata of their contact patterns from the service providers. The involvement of non-government entities, such as an academic institution or a hospital, in the mix network may help increase users’ trust in the system and lower the bar for adoption.

There are several remaining issues that will
need to be addressed for this system to be widely adopted. First, if time-varying IDs are used, then the user receiving a token from a nearby person could infer the identity of the sender based on their travel history; i.e., Alice might be able to infer who Bob is based on the time they exchanged the tokens, as described in Section 4.4 in the case where the database is made public. This loss of privacy from contacts can be partially alleviated by choosing a less frequent token refresh, so that with high likelihood, Alice cannot completely identify Bob by the time interval. Actual implementations must decide on the right trade-offs between Alice and Bob’s privacy from each other and from the authorities, as well as contact tracing effectiveness. Another possible way to mitigate this problem would be to aggregate the messages for Alice on the server before making the results available to her. The messages are encrypted under different public keys, but it may be possible to use multi-key homomorphic encryption schemes [39; 40], which allow computation over ciphertexts encrypted with different public keys, to sum up the count of ‘infected’ messages. We defer the details of this approach to future work.

One other issue is that the volume of messages delivered to each user may reveal how socially active each user has been, which could be considered sensitive by some users. Approaches to flatten the distribution with dummy messages could alleviate this concern. Flattening the distribution with dummy messages may, however, lead to scalability challenges for existing private messaging systems. Though many techniques [36; 27; 29] have been proposed to address this challenge, further discussion among the stakeholders is needed to determine the suitable trade-off between the level of latency that can be tolerated and the level of privacy guarantees desired by the users. Ultimately, though, private messaging systems enable provable privacy from the authorities while still maintaining the usefulness of contact tracing.

6 Strategies for Encouraging Widespread Adoption

Contact tracing apps depend on the network effect and critical mass to work. Having the app go ‘viral’ requires that people trust the app enough to install it and are enthusiastic enough to convince their friends to do the same. After all, app adoption must have a higher ‘transmission rate’ than the virus itself in order for it to be effective. Providing strong privacy guarantees would likely encourage voluntary adoption. Any app needs to clearly explain privacy guarantees in ways understandable by the average user, which was our motivation in describing here the different types of privacy (from snoopers, contacts, and the authorities) that the app should be able to provide to users in order to earn their trust.

On that note, we believe it is imperative for any app to be open source and audited by both security professionals and privacy advocates. This is not yet true for TraceTogether, but the app’s creators do claim that they will release the source code soon [41]. Furthermore, open sourcing allows different countries to customize such apps for their particular use cases and cultural preferences.

Also, while in some countries it may be difficult to enforce a government mandate that all residents install an app, it is possible to have this as a requirement for entering certain public places. Such a practice has precedent in so-called implied consent laws, such as agreeing to field sobriety tests when getting a driver’s license [42]. One could imagine grocery stores, schools, and universities requiring installation of a contact tracing app as a precondition for entrance. This does not stop users from uninstalling or turning off the app off-premises, but it would at least be useful in getting people over the initial activation barrier of installation.

Finally, some amount of social pressure may also assist in reaching widespread adoption. Contact tracing apps, by design, know how many other people close by have the app installed. An app could display that number. Given this knowledge, a user may be incentivized to attempt to persuade others nearby to install the app, in the interest of public health.

7 Discussion

In this document, we discuss ways to build an app for contact tracing, based upon the premise that phones can broadcast tokens to all nearby phones. Notably, we do not address the engineering behind applying Bluetooth to enable such a feature. Nor do we address the possibility of location data collection for assisting epidemiologists in forecasting disease spread [43]. We also do not discuss appropriate selection of the token refresh interval and the frequency at which phones should poll for nearby ones, which are important factors for balancing privacy and efficiency; stale IDs have been seen
to permit linkage attacks in other similar contexts [44]. Lastly, we also do not build a full model for privacy of contact tracing, which is a delicate and easy-to-get-wrong task that requires much more careful research. Instead, we focus only on the privacy implications of a dedicated contact tracing app, in the hopes that providing sufficiently strong privacy guarantees would assist an app in gaining the critical mass needed to be effective.

Note that here we only discuss direct contact tracing using Bluetooth proximity networks, without using any location data. Some indirect proposals for contact tracing instead simply securely log the user’s location history, which is then given to the authorities if a user is diagnosed with COVID-19 [45]. This approach has the benefit of not requiring network effects, because single individuals can track their locations without needing their contacts to have the app. The approach of logging location history is inherently less private than direct contact tracing, but that may possibly be resolved with appropriate safeguards and redactions [45]. Furthermore, hybrid approaches involving both GPS data and Bluetooth proximity networks may prove to be useful to public health officials in modelling disease spread beyond just contact tracing [46].

We first discussed how, with just minor modifications, a polling-based direct contact tracing solution allows for some anonymity from authorities, which is lacking in the Singaporean Ministry of Health app TraceTogether. We believe that this may help an app succeed in countries such as the U.S., where many citizens are loath to give too much data to the government.

Even the polling-based solution still reveals quite a bit of information to the authorities, who could make use of linkage analysis to track individual users. However, utilizing additional mixing servers is relatively practical and does provide additional protection. Alternatively, a system can follow the South Korean model of openly publishing data about patients diagnosed with COVID-19, trading off some of their privacy to enhance the privacy of individuals who are trying to determine if they have been exposed.

However, if we are willing to invest in additional computational resources, it is possible to achieve increased privacy from snoopers, contacts, and the authorities, and we propose the beginnings of one approach using private messaging systems, which we hope will be further expanded upon in future work. This is more computationally expensive, but would assure users that they do not have to give up their privacy in order to take part in public contact tracing efforts. Indeed, the chief selling point would be that they would get additional information on their exposure without needing to trust any individual third party with their private location or medical information. We believe that such a guarantee would go a long way towards mass adoption of a contact tracing app in the United States.

Future work remains to actually build such an app, of course, and additional engineering, security, and policy considerations are sure to arise. For example, scalability of the data structures used in the servers may become a major issue when the number of infected individuals rises. One additional concern which we have not addressed is that of nefarious actors seeking to spread panic by falsely claiming to be infected. This could be prevented by allowing only hospital workers to trigger the broadcast of infection status, as in Singapore’s system, where the Ministry of Health directly contacts those exposed, though that of course trades away some of the privacy of diagnosed patients. Alternatively, others have proposed cryptographic verification of contact events, which could perhaps be extended to infection event broadcast without giving the authorities direct access to tokens [47]. However, given that some cities are already rationing testing kits and doctors’ visits to only the most serious cases [48; 49], restricting self-reporting might result in many instances of virus spread being missed. Alternatively, the system can also be designed to separate self-reports from confirmed reports by simply keeping two databases.

Our goal in writing this document is to start a conversation on (1) what kinds of privacy trade-offs people are willing to endure for the sake of public health, and (2) the fact that with sufficient computational resources and use of cryptographic protocols, app-based contact tracing can be accomplished without completely sacrificing privacy. Because bad early design choices can persist long after roll-out, we hope that developers and policy-makers will give privacy considerations careful thought when designing new contact tracing apps.

Acknowledgment

We would like to thank David Rolnick, Adam Sealfon, Noah Daniels, and Michael Wirth for helpful comments.
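As a supplementary illustration of the dummy-message idea raised in Section 5, the following is a minimal sketch, not part of the proposed system. It assumes each user has a server-side mailbox represented as a plain list, and that every mailbox is padded to a fixed size so that message volume no longer reveals how socially active a user has been. The names `pad_mailboxes`, `real_messages`, `DUMMY`, the example users, and the target size are all hypothetical, and encryption is omitted entirely.

```python
import random

# Hypothetical marker for a dummy message. In a real system dummies would be
# ciphertexts indistinguishable from real messages; a plain string is used
# here only to keep the sketch readable.
DUMMY = "dummy"


def pad_mailboxes(mailboxes, target):
    """Return a copy of `mailboxes` in which every box holds exactly `target` messages."""
    padded = {}
    for owner, messages in mailboxes.items():
        if len(messages) > target:
            raise ValueError(f"mailbox {owner!r} already exceeds target size {target}")
        box = list(messages) + [DUMMY] * (target - len(messages))
        random.shuffle(box)  # hide where the real messages sit within the box
        padded[owner] = box
    return padded


def real_messages(box):
    """Recipient side: drop the dummies after download."""
    return [m for m in box if m != DUMMY]


# Illustrative data only: three users with different numbers of recent contacts.
mailboxes = {
    "alice": ["infected", "healthy", "healthy"],
    "carol": ["healthy"],
    "dave": [],
}
padded = pad_mailboxes(mailboxes, target=4)

sizes = {owner: len(box) for owner, box in padded.items()}
real_total = sum(len(m) for m in mailboxes.values())
overhead = sum(sizes.values()) - real_total  # dummy messages the system must carry
```

Under these assumptions every mailbox has the same size, and the number of dummy messages grows with the number of users times the target size, which is one way to see the scalability cost mentioned in Section 5.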
References

[1] “Novel Coronavirus Map from HealthMap,” March 2020. [Online]. Available: https://www.healthmap.org/covid-19/

[2] K. T. Eames and M. J. Keeling, “Contact tracing and disease control,” Proceedings of the Royal Society of London. Series B: Biological Sciences, vol. 270, no. 1533, pp. 2565–2571, 2003.

[3] D. Normile, “Coronavirus cases have dropped sharply in South Korea. What’s the secret to its success?” https://www.sciencemag.org/news/2020/03/coronavirus-cases-have-dropped-sharply-south-korea-whats-secret-its-success, 2020, accessed: 2020-03-23.

[4] B. Chappell, “Coronavirus: Sacramento County Gives Up On Automatic 14-Day Quarantines,” https://www.npr.org/sections/health-shots/2020/03/10/813990993/coronavirus-sacramento-county-gives-up-on-automatic-14-day-quarantines, 2020, accessed: 2020-03-23.

[5] J. Tidy, “Coronavirus: Israel enables emergency spy powers,” BBC News, March 2020. [Online]. Available: https://www.bbc.com/news/technology-51930681

[6] M. J. Kim and S. Denyer, “A travel log of the times in South Korea: Mapping the movements of coronavirus carriers,” The Washington Post, March 2020. [Online]. Available: https://www.washingtonpost.com/world/asia_pacific/coronavirus-south-korea-tracking-apps/2020/03/13/2bed568e-5fac-11ea-ac50-18701e14e06d_story.html

[7] C. J. Wang, C. Y. Ng, and R. H. Brook, “Response to COVID-19 in Taiwan: Big Data Analytics, New Technology, and Proactive Testing,” JAMA, 2020.

[8] Y. Lee, “Taiwan’s new ‘electronic fence’ for quarantines leads wave of virus monitoring,” March 2020. [Online]. Available: https://www.reuters.com/article/us-health-coronavirus-taiwan-surveillanc-idUSKBN2170SK

[9] “HIPAA Privacy Rule,” December 2000. [Online]. Available: https://www.hhs.gov/hipaa/for-professionals/privacy/index.html

[10] C. J. Roberts, “Carpenter v. United States,” Supreme Court of the United States, no. 16-402, 2018. [Online]. Available: https://www.supremecourt.gov/opinions/17pdf/16-402_h315.pdf

[11] “Notification of Enforcement Discretion for telehealth remote communications during the COVID-19 nationwide public health emergency,” March 2020. [Online]. Available: https://www.hhs.gov/hipaa/for-professionals/special-topics/emergency-preparedness/notification-enforcement-discretion-telehealth/index.html

[12] A. J. Jacobs, “Is state power to protect health compatible with substantive due process rights,” Annals Health L., vol. 20, p. 113, 2011.

[13] R. Pérez-Peña, “Virus Hits Europe Harder Than China. Is That the Price of an Open Society?” New York Times, March 2020. [Online]. Available: https://www.nytimes.com/2020/03/19/world/europe/europe-china-coronavirus.html

[14] “Help speed up contact tracing with TraceTogether,” Singapore Government Blog, March 2020. [Online]. Available: https://www.gov.sg/article/help-speed-up-contact-tracing-with-tracetogether

[15] T. TraceTogether, “Can I say no to uploading my TraceTogether data when contacted by the Ministry of Health?” https://tracetogether.zendesk.com/hc/en-sg/articles/360044860414-Can-I-say-no-to-uploading-my-TraceTogether-data-when-contacted-by-the-Ministry-of-Health-, 2020, accessed: 2020-03-23.

[16] C. E. Shannon, “Communication theory of secrecy systems,” Bell system technical journal, vol. 28, no. 4, pp. 656–715, 1949.

[17] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557–570, 2002.

[18] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of cryptography conference. Springer, 2006, pp. 265–284.

[19] R. Raskar, I. Schunemann, R. Barbar, K. Vilcans, J. Gray, P. Vepakomma, S. Kapa, A. Nuzzo, R. Gupta, A. Berke et al., “Apps gone rogue: Maintaining personal privacy in an epidemic,” arXiv preprint arXiv:2003.08567, 2020.

[20] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.

[21] O. Goldreich, S. Micali, and A. Wigderson, “How to solve any protocol problem,” in Proc. of STOC, 1987.

[22] M. M. Merener, “Theoretical results on de-anonymization via linkage attacks,” Transactions on Data Privacy, vol. 5, no. 2, pp. 377–402, 2012.

[23] M. Srivatsa and M. Hicks, “Deanonymizing mobility traces: Using social network as a side-channel,” in Proceedings of the 2012 ACM conference on Computer and communications security, 2012, pp. 628–637.

[24] E. T. Barometer, “January 20, 2019,” 2019. [Online]. Available: https://www.edelman.com/sites/g/files/aatuss191/files/2019-02/2019_Edelman_Trust_Barometer_Global_Report_2.pdf

[25] T. TraceTogether, “How does TraceTogether work?” https://tracetogether.zendesk.com/hc/en-sg/articles/360043543473-How-does-TraceTogether-work-, 2020, accessed: 2020-03-23.

[26] “How COVID-19 spreads,” Centers for Disease Control and Prevention, March 2020. [Online]. Available: https://www.cdc.gov/coronavirus/2019-ncov/prepare/transmission.html

[27] J. Van Den Hooff, D. Lazar, M. Zaharia, and N. Zeldovich, “Vuvuzela: Scalable private messaging resistant to traffic analysis,” in Proceedings of the 25th Symposium on Operating Systems Principles, 2015, pp. 137–152.

[28] N. Tyagi, Y. Gilad, D. Leung, M. Zaharia, and N. Zeldovich, “Stadium: A distributed metadata-private messaging system,” in Proceedings of the 26th Symposium on Operating Systems Principles, 2017, pp. 423–440.

[29] H. Corrigan-Gibbs, D. Boneh, and D. Mazières, “Riposte: An anonymous messaging system handling millions of users,” in 2015 IEEE Symposium on Security and Privacy. IEEE, 2015, pp. 321–338.

[30] M. J. Freedman, K. Nissim, and B. Pinkas, “Efficient private matching and set intersection,” in International conference on the theory and applications of cryptographic techniques. Springer, 2004, pp. 1–19.

[31] L. Kissner and D. Song, “Privacy-preserving set operations,” in Annual International Cryptology Conference. Springer, 2005, pp. 241–257.

[32] E. De Cristofaro and G. Tsudik, “Practical private set intersection protocols with linear complexity,” in International Conference on Financial Cryptography and Data Security. Springer, 2010, pp. 143–159.

[33] E. De Cristofaro, P. Gasti, and G. Tsudik, “Fast and private computation of cardinality of set intersection and union,” in International Conference on Cryptology and Network Security. Springer, 2012, pp. 218–231.

[34] G. J. Simmons, “Symmetric and asymmetric encryption,” ACM Computing Surveys (CSUR), vol. 11, no. 4, pp. 305–330, 1979.

[35] B. Greschbach, G. Kreitz, and S. Buchegger, “The devil is in the metadata: new privacy challenges in decentralised online social networks,” in 2012 IEEE International Conference on Pervasive Computing and Communications Workshops. IEEE, 2012, pp. 333–339.

[36] A. Kwon, D. Lu, and S. Devadas, “XRD: Scalable Messaging System with Cryptographic Privacy,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020, pp. 759–776.

[37] D. L. Chaum, “Untraceable electronic mail, return addresses, and digital pseudonyms,” Communications of the ACM, vol. 24, no. 2, pp. 84–90, 1981.

[38] M. G. Reed, P. F. Syverson, and D. M. Goldschlag, “Anonymous connections and onion routing,” IEEE Journal on Selected areas in Communications, vol. 16, no. 4, pp. 482–494, 1998.

[39] A. López-Alt, E. Tromer, and V. Vaikuntanathan, “On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption,” in Proceedings of the forty-fourth annual ACM symposium on Theory of computing, 2012, pp. 1219–1234.

[40] H. Chen, W. Dai, M. Kim, and Y. Song, “Efficient multi-key homomorphic encryption with packed ciphertexts with application to oblivious neural network inference,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 395–412.

[41] J. Zhang, “620,000 people installed TraceTogether in 3 days, S’pore’s open source contact tracing app,” Mothership, March 2020. [Online]. Available: https://mothership.sg/2020/03/tracetogether-installed-open-source/

[42] A. C. Wagenaar, T. S. Zobeck, G. D. Williams, and R. Hingson, “Methods used in studies of drink-drive control efforts: a meta-analysis of the literature from 1960 to 1991,” Accident Analysis & Prevention, vol. 27, no. 3, pp. 307–316, 1995.

[43] S. Pei, S. Kandula, W. Yang, and J. Shaman, “Forecasting the spatial transmission of influenza in the United States,” Proceedings of the National Academy of Sciences, vol. 115, no. 11, pp. 2752–2757, 2018.

[44] S. E. Sarma, S. A. Weis, and D. W. Engels, “RFID systems and security and privacy implications,” in International Workshop on Cryptographic Hardware and Embedded Systems. Springer, 2002, pp. 454–469.

[45] “Private Kit: Safe Paths - Can we slow the spread without giving up individual privacy?” http://safepaths.mit.edu/, 2020, accessed: 2020-03-23.

[46] “COVID Watch,” https://covid-watch.org/, 2020.

[47] J. Petrie, “Cryptographically Secure Contact Tracing,” March 2020.

[48] J. Dolan and B. Mejia, “L.A. County gives up on containing coronavirus, tells doctors to skip testing of some patients,” Los Angeles Times, March 2020. [Online]. Available: https://www.latimes.com/california/story/2020-03-20/coronavirus-county-doctors-containment-testing

[49] C. Y. Johnson and L. H. Sun, “Health officials in New York, California restrict coronavirus testing to health care workers and people who are hospitalized,” The Philadelphia Inquirer, March 2020. [Online]. Available: https://www.inquirer.com/health/coronavirus/coronavirus-testing-20200321.html