PanCast: Listening to Bluetooth Beacons for Epidemic Risk Mitigation

Page created by Bryan Castillo
 
CONTINUE READING
PanCast: Listening to Bluetooth Beacons for Epidemic Risk Mitigation
PanCast: Listening to Bluetooth Beacons for
                                                                         Epidemic Risk Mitigation
                                                                                     White Paper

                                                 Gilles Barthe1 , Roberta De Viti2 , Peter Druschel2 , Deepak Garg2 ,
                                         Manuel Gomez-Rodriguez2 , Pierfrancesco Ingo2 , Matthew Lentz3 , Aastha Mehta2 , and
                                                                        Bernhard Schölkopf4
                                                               1
                                                                 MPI for Security and Privacy, gbarthe@mpi-sp.org
arXiv:2011.08069v1 [cs.CR] 16 Nov 2020

                                                   2
                                                       MPI for Software Systems, {rdeviti, druschel, dg, manuelgr, pieringo,
                                                                             aasthakm}@mpi-sws.org
                                                                     3
                                                                       VMware Research, mlentz@cs.umd.edu
                                                               4
                                                                 MPI for Intelligent Systems, bs@tuebingen.mpg.de

                                                                                  November 17, 2020

                                         This white paper summarizes the preliminary results of a Max Planck working group created in early
                                         April, 2020. The working group assembles expertise from institutes in Intelligent Systems, Security and
                                         Privacy, and Software Systems and it is part of a larger initiative from the Max Planck Society to bring
                                         together diverse fields in tackling COVID-19. The crisis caused by the current pandemic is a global one,
                                         triggering a spirit of open exchange of ideas, and there are a number of related technological efforts.
                                         While we have tried to cite all these related efforts, we appreciate suggestions for further references as
                                         well as constructive feedback and comments. The authors of this white paper are listed alphabetically.
                                         For more details, please visit the website https://pancast.mpi-sws.org.

                                                                                             1
Extended abstract
During the ongoing COVID-19 pandemic, efforts have been made to develop and deploy smartphone
apps to expedite contact tracing and risk notification. Most of these apps track pairwise encounters
between individuals via Bluetooth and then use the tracked encounters to identify and notify those who
might have been in proximity of contagious individuals. Unfortunately, the effectiveness of these apps
for contact tracing has been limited, partly due to low adoption rates, a difficult tradeoff between utility
and privacy, and COVID-19’s overdispersion, i.e., most infected individuals do not infect anybody else
but a few superspreaders infect many.
    PanCast is a system for epidemic risk assessment and notification that scales gracefully with adoption
rates, utilizes location and environmental information to increase utility without tracking its users, and
can be used to identify superspreading events.
    In PanCast, Bluetooth beacons placed in strategic locations continuously broadcast ephemeral IDs
that change over time. A subset of these beacons, called network beacons, also broadcast risk information
(associated with times when individuals who tested positive were near specific beacons). Users carry
inexpensive, zero-maintenance, small dongles in the form of cards or keyfobs that listen to these beacon
broadcasts passively (without transmitting anything), store beacon ephemeral IDs in local memory and,
when in proximity of a network beacon, compare these stored IDs with the broadcast risk information.
When an individual tests positive for the infectious disease, they can optionally disclose (a selected part
of) the list of ephemeral IDs stored in their dongle, which becomes part of the global risk information.
This means that the information about which locations were dangerous at which times gets included
in the broadcast risk messages. By design, a user learns their risk of contagion due to proximity to
diagnosed individuals but does not learn anything about locations visited by other individuals unless the
user was in physical proximity at the same time.
PanCast has the following key properties:
   • It is useful despite low adoption. PanCast lends itself to incremental deployment naturally by
     placing beacons in locations where superspreading events are most likely to occur and it can assist
     and benefit from existing digital contact tracing systems and manual contact tracing.
   • It ensures data minimization and prevents tracking of users. A healthy individual can
     use the system in a purely passive radio mode. Individuals who test positive can optionally and
     partially disclose the same type of information as in a manual contact tracing interview. Disclosed
     information is accessible only to individuals at risk and in a privacy-preserving way.
   • It can be used to identify superspreading events. Similar to manual contact tracing, PanCast
     makes use of location and environmental information (in a privacy-preserving way). This is crucial
     to the identification of superspreading events.

   • It is an inclusive system. PanCast enables the participation of technology challenged, eco-
     nomically disadvantaged, and physically challenged individuals, who cannot or do not wish to use
     smartphones.
   • It can be gracefully dismantled. At the end of the pandemic (or at any time), users who
     only used the system in passive radio mode can reset or destroy their dongles. Risk information
     uploaded by users who voluntarily decided to disclose their list of stored ephemeral IDs does not
     contain any identifiable personal data and might be valuable for epidemiological studies.

                                                     2
Contents

1 Introduction                                                                                                                                                                4
  1.1 Limitations of SPECTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                                                                       4
  1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                                                                     6

2 PanCast Overview                                                                                        8
  2.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
  2.2 Threat model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 PanCast data collection and processing                                                                                                                                     11
  3.1 Beacon and dongle registration . . . . . .                 .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   11
  3.2 Capturing beacon encounters . . . . . . .                  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   12
  3.3 Encounter data upload . . . . . . . . . . .                .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   13
      3.3.1 Requirements . . . . . . . . . . . .                 .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   13
      3.3.2 Upload mechanisms . . . . . . . .                    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   13
      3.3.3 Selective upload . . . . . . . . . .                 .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   14
  3.4 Encounter data processing at the backend                   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   14

4 Risk dissemination                                                                                   14
  4.1 Noising the risk dissemination broadcast . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
  4.2 Bandwidth requirements and broadcast time . . . . . . . . . . . . . . . . . . . . . . . . . 16

5 Risk score calculation                                                                                                                                                     17

6 PanCast privacy and security                                                                                                                                               18
  6.1 Privacy analysis . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   18
      6.1.1 Leaks to the backend . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   18
      6.1.2 Leaks to other users . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   18
      6.1.3 Leaks to network beacons         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   19
      6.1.4 Leaks to user terminals .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   19
      6.1.5 Leaks from dongles . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   19
  6.2 Security analysis . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   19

7 Deployment and interoperability                                                                                                                                            20

8 Conclusion                                                                                                                                                                 21

A Proof of differential privacy of noise added to risk broadcasts                                                                                                            25

                                                                 3
1     Introduction

Containing infectious diseases such as the ongoing COVID-19 pandemic requires effective testing, contact
tracing, and isolation of infected individuals (TTI) [34,44]. Among these, contact tracing is an important
tool that can help use the limited test resources for those who are most likely to be infected, by identifying
infected individuals and those who came into close contact with them during their infectious period.
Furthermore, contact tracing provides knowledge of the circumstances of contagion, which in turn informs
the implementation of public health policies and interventions that can help contain further disease
spread. For instance, outbreaks of COVID-19 in meat packing plants in Germany [9] provided insights
into conditions that can potentially breed infection hotspots, which led to regulatory action1 .
    To expedite contact tracing and assist health officials in responding to patients and at-risk individuals,
many digital contact tracing systems have been deployed [10–13,15,17,33]. We collectively refer to these
systems as SPECTs (smartphone-based, pairwise encounter-based contact tracing systems). In SPECTs,
individuals install a smartphone application that records instances of physical proximity with devices of
other individuals via close-range Bluetooth exchanges, referred to as encounters. When an individual is
diagnosed with the infectious disease, they report their recent encounter history to a health authority,
which can then be used to notify other infected or at-risk individuals who might have come into contact
with the diagnosed individual during the individual’s period of contagiousness (indicated by mutual
encounters).
    Depending on how risk status is determined and delivered to users, current SPECTs are broadly
grouped into two types: centralized and decentralized. In a centralized system, the central health autho-
rity computes the risk status for all users in the system, and either broadcasts it to everyone (e.g., PEPP-
PT-NTK [11], systems based on the BlueTrace protocol [25] like OpenTrace [10] or TraceTogether [15]) or
returns the user-specific risk status to an individual upon inquiry (e.g., ROBERT [12]). In a decentralized
system, the health authority broadcasts information about patients’ encounters, and the smartphone app
computes the user’s risk status locally on the device and reports it to the user. Examples of decentralized
systems include those based on DP3T [17], PACT [33], and the TCN coalition [13].
    In centralized systems, all user data is collected and managed centrally, which places a high degree
of trust in the central authority. On the other hand, decentralized systems aim to preserve the privacy
of users and, therefore, minimize data that is shared with authorities. However, this data minimization
also prevents aggregation of data for epidemiological analysis.
    Despite strong efforts to deploy SPECTs and the availability of tracing apps in many countries,
there is scarce evidence of the effectiveness of SPECTs [6, 28]. This can be partially due to: (i) low
adoption rates (
room size, ventilation, air quality [58], or noise level.2 From an epidemiological point of view,
      it would be desirable to get accurate probabilistic transmission models describing the influence
      of such features on the infection risk associated with an encounter; however, controlled trials are
      ruled out by ethical concerns, and data-driven models require data including the relevant features.
      Such models would help us understand disease transmissions, justify political decisions, reduce false
      positive rates when alerting individuals, and enable more economical use of testing resources [21].
      We seek to address this issue by using Bluetooth beacons that are strategically placed at known
      locations (or trajectories in case of beacons placed in vehicles) and tagged with the relevant features.
      User dongles capture encounters with the beacons (not other user dongles as in SPECTs), thus
      indirectly capturing location information useful for risk identification.

   • Privacy. SPECTs, including decentralized variants, raise privacy concerns related to the fact
     that user smartphones continuously transmit information. These transmissions may enable eaves-
     droppers to learn the whereabouts of users who subsequently test positive and possibly infer the
     identities of patients using background information. Moreover, SPECTs collect and store sensi-
     tive contact information on smartphones, where their privacy is subject to multinational platform
     providers’ compliance with national privacy laws, and to the robustness of these platforms to
     hacking [53].
      We seek to address this concern by enabling user dongles to just listen passively to Bluetooth broad-
      casts from beacons, without transmitting anything during normal operation. Dongles transmit only
      when a patient is diagnosed positive and, even then, they only disclose information explicitly ap-
      proved by their owners. Moreover, dongles rely on a single-purpose, custom software and hardware
      platform with a small attack surface and are independent of multinational platform providers.
   • User transparency. Current systems are non-transparent in that users cannot readily interpret
     what information is exchanged with other users’ devices and with the backend server (if there is
     one). Consequently, users cannot make informed, selective decisions about information they wish
     to release.
      We seek to address this limitation by presenting the information gathered by a dongle to the user as
      a time series of visited beacons labeled with their locations. A diagnosed user can easily interpret
      the labeled information and can select parts of their history that they wish to disclose.
   • Scaling with adoption level. Arguably, the lack of inclusion and user transparency, as well
     as concerns about privacy impede adoption of SPECTs, which in turn limits their effectiveness.
     Worse, SPECTs rely on pairwise encounters between user devices and therefore suffer from the n2
     problem [33]: if only a relatively small fraction n < 1 of a population uses the system, then at most
     n2  1 of all encounters can be detected. Indeed, few SPECTs have adoption rates larger than
     10% in their countries [52], which implies that most relevant encounters are not detected.
      We address this scaling issue by relying not on pairwise encounters between individuals, but on
      encounters between individuals’ dongles and beacons installed in relevant locations. In this case, if
      a fraction n of people carry dongles, we detect n of the encounters of sick people with beacons. As
      argued below, this even allows certain risk predictions that do not involve (and indeed should not
      be limited to) pairwise encounters. Moreover, our system lends itself to incremental deployment
      naturally by placing beacons strategically in locations where superspreading events are more likely
      to occur.

   • Non-contemporaneous transmission. SPECTs can only detect infection transmissions between
     individuals who were in physical proximity of each other, i.e., have been in the same location at the
     same time. These systems cannot detect fomite transmissions (contaminated surfaces) nor airborne
     transmissions caused by the dissemination of droplet nuclei (aerosols) that may remain infectious
     over longer time periods when suspended in air, especially when ventilation is poor [48].
      By relying on beacons, we can detect potential transmission among users who visit a location
      within a certain period of time without ever meeting.
2 Experimental  studies indicate that the amount of virus being shed scales superlinearly with speaking volume, which in
 turn is influenced by the ambient noise level [22].

                                                           5
• Indentifying superspreading events. Given a diagnosed patient A, contact tracing can find
         individuals at risk in two fundamentally different ways. First, by finding people that A may have
         met and infected. Second, by identifying the event where A was infected, and then looking for
         others who have attended that event. Most European countries have focused on the former but
         there is growing evidence that countries that have focused on the latter, e.g., Japan3 , have been
         more successful at containing the spread of COVID-19. This can be in part due to the observed
         overdispersion—most infected individuals do not infect anyone, while a few superspreaders infect
         many [32, 42, 43, 60]. Therefore, while the diagnosed patient A is unlikely to be a superspreader
         herself, she is likely to have been infected by a superspreader. If contact tracing can identify the
         event where she was infected, we may be able to trace an entire cluster [37].
         Due to the lack of circumstantial information, SPECTs do not facilitate identification of such
         events, leading to the recommendation that citizens should in addition keep a ‘cluster diary’ [35].
         Our beacon-based system can address this limitation. Users diagnosed after encountering a beacon
         can help authorities identify a potentially relevant event. In this case, authorities may decide to
         notify all attendees of the event, even if they did not attend at the same time as a diagnosed
         patient, and even if they do not carry a dongle (by conventional means of contract tracing or even
         via broadcast media in the case of public events).
       • Interoperability with manual contact tracing. While digital contact tracing has not played
         a major role in containing COVID-19 in most countries to date, manual contact tracing has proven
         important [27]. However, most digital solutions fail to provide assistance to manual contact tracing
         and cannot benefit from information obtained through manual contact tracing. This is in contrast
         with our beacon-based system, which can assist and benefit from manual contact tracing.
A number of ongoing efforts seek to address some but not all of the limitations of SPECTs. Specifically,
efforts are underway to rely on simple, small form-factor dongles as user devices such as the Trace
Together Token [14], the Corona Warning Buzzer [3], the Corona Armband [2], Contact Harald [1], and
Minew [8]. In the same spirit as our system, other ongoing efforts, such as the MIT Safe Paths [29],
WiFiTrace [55] and CrowdNotifier [5] depart from pair-wise Bluetooth encounters and instead focus on
presence tracing. In these systems, users match their personal diary of location data on their smartphones,
acquired using GPS (Safe Paths), WiFi network logs (WiFiTrace) or QR codes (CrowdNotifier), with
the anonymized location history of infected patients. However, by relying on smartphones, these systems
are less inclusive and, even in their decentralized versions, are more prone to privacy and security attacks
than our system.
     Finally, this work focuses only on the tracing component of TTI and, like other SPECTs, the broader
TTI strategy that it contributes to could suffer from weaknesses in testing and isolation that are beyond
the scope of this work. These weakenesses need to be addressed by governments or health authorities
separately. For example, a recent survey [4] has shown that (i) not every individual who needs to get
tested is actually tested, either for lack of testing capacity or unwillingness of people to get tested, and
(ii) many individuals are unable to self-isolate for a variety of reasons, including sharing their household
with others, having to take care of children, not being able to afford weeks of isolation, not being able
to take medical leave, psychological reasons, and fear of stigmatization (most prevalent among youth).

1.2       Contributions
We propose PanCast, a system that enables rapid dissemination of risk information to users in a passive,
secure, and privacy-preserving manner. Unlike existing SPECTs that record pairwise encounters between
user smartphones, we rely on simple dongles recording the presence of BLE (Bluetooth Low Energy)
beacons installed in strategic locations. Thus, dongles do not capture or disclose trajectories of inter-
personal encounters. Instead, they collect space-time information about their encounters with beacons.
In these encounters, the dongles by default only listen to the beacon broadcasts passively (without
transmitting anything)4 .
   When a user tests positive for the infectious disease, they can voluntarily choose to connect their
dongle to the system using a terminal that has both a BLE interface and an internet connection. Through
3 https://www.japantimes.co.jp/news/2020/06/10/national/japan-coronavirus-strategy/
4 In   the remainder of this document, unless stated otherwise, encounters refer to encounters between dongles and beacons.

                                                              6
the terminal, the dongle uploads the user’s recent history of encountered beacons to the backend together
with a certificate of positive diagnosis from the health facility. When permitted or required by law,
PanCast can also be set up to allow the user to select a subset of their history for upload by manually
going through locations they have visited (see Section 3.3 for details).
    A subset of the beacons, called network beacons, are connected with health authorities over the
internet and play a special role: They continuously broadcast current risk information over BLE. Risk
information is a list of ephemeral ids (called risk entries) of beacons, where diagnosed individuals were
recently present, as well as encounter circumstances considered risky. Whenever a dongle encounters
a network beacon, it passively listens to the beacon’s risk broadcast. If the dongle finds a risk entry
matching with its own recorded space-time history, it alerts its user of potential risk, e.g., through a
blinking LED. In case of an alert, health authorities may encourage (or mandate that) the dongle owner
present themselves for testing.
    The above design enables PanCast to have the following properties:
    • Usability. Users carry inexpensive, zero-maintenance, small devices (e.g., keyfobs attached to a
      keyring or worn on wrist or around the neck) with a minimal user interface. Thus, PanCast enables
      the participation of technology-challenged, economically-disadvantaged, and physically-challenged
      individuals.
    • Utility. PanCast provides utility for users and health authorities. Users obtain risk notification,
      while health authorities receive the relevant space-time information about infection events, which
      they can use for epidemiological analysis5 .

    • Privacy. In PanCast, a user device never transmits any information unless the owner chooses
      to do so (e.g., after the user is tested positive). When an individual is diagnosed they explicitly
      consent to the transmission and can select the information they wish to transmit from their device
      to the health authority. Even the most privacy-conscious users who disclose nothing and never
      transmit anything from their dongles (i.e., only listen passively) receive risk notifications arising
      from space-time proximity to other diagnosed users who choose to disclose information after their
      diagnosis. Thus, from a privacy standpoint, the user experience is closer to that of manual contact
      tracing, where users decide specifically which information to share and those at risk get notified.
      Moreover, a user learns about potential risks in a visited location only when a diagnosed individual
      visited the location within a time window, but does not otherwise learn the location history of a
      diagnosed individual. Finally, the risk broadcast provides strong differential privacy guarantees for
      the number of risk entries and the number of diagnosed individuals contained in a risk broadcast.
      This ensures that an adversary, even one with offline information about some users, learns nothing
      about the health status or whereabouts of the remaining users through the system.
    • Security. PanCast’s data collection and risk dissemination protocols are immune to many of the
      attacks that are possible with SPECTs [24,57]. Moreover, PanCast’s simpler devices have a smaller
      attack surface compared to smartphones.
    • Interoperability. PanCast can effectively and transparently complement manual contact tracing.
      Health authorities may manually obtain location data from consenting diagnosed individuals and
      insert records into PanCast. Thus, even users who do not carry a dongle can contribute to sub-
      sequent risk estimates and broadcasts. Vice versa, by providing a user-comprehensible record of
      visited beacon locations, PanCast can be used as a diary aiding the memory of individuals who
      participate in manual contact tracing. Moreover, since our system can associate risk events with lo-
      cations, information about potential superspreading events can be broadcast also using traditional
      means of communication. Furthermore, PanCast can be designed to inter-operate with existing
      SPECTs.
We acknowledge that PanCast requires investment in infrastructure—beacons must be installed in rel-
evant locations and users need to be provided dongles. However, PanCast’s utility scales more quickly
5 Thismay contribute towards resolving some significant open questions. To give but one example, it has been hypothesized
 that wearing masks may not only reduce infection risk, but also reduce the probability that conditional on being infected,
 a person develops clinical COVID-19 [26]. PanCast would often allow us to pin down whether or not a person was wearing
 a mask at infection, and thus allow gathering data to help address the above question.

                                                            7
Figure 1: PanCast architecture. 1. Beacons and dongles are registered with the backend. 2. Dongles
record encounters with BLE beacons. 3. Diagnosed users or healthy volunteers may upload their history
of encountered beacons to the backend via a terminal. 4. The backend updates the risk database with
uploaded encounters. 5. Risk information is periodically broadcast from the backend to network beacons,
which broadcast the information to nearby dongles.

with the degree of deployment than SPECTs’, which suffer from the n2 problem as described above.
Thus, a strategic deployment plan can help ease the burden of investment while covering high-risk areas
that contribute to significant spread of infection. Initially, beacons can be placed in locations where
infections are most likely. In addition, network beacons can be colocated or integated with existing WiFi
base stations, which can further reduce the installation and maintenance costs for these beacons. Finally,
we expect the battery-operated BLE beacons and dongles to be very cheap—in the order of 10 euros per
device when produced in bulk. Future updates to PanCast protocols can also be easily pushed to the
devices through signed, over-the-air firmware upgrades, thus providing extensibility.
    Possible deployment scenarios include the case where, initially, specific areas such as a hospital, a
company or a school may decide to roll out PanCast across their physical site. This may enable a
country to keep critical infrastructure open during lock-down phases of a pandemic, while retaining the
ability to swiftly test, trace and isolate cases even if manual contact tracing systems are overloaded. It
would also allow us to gather data to better understand where infections take place in order to take
countermeasures. Public acceptance for such measures to improve safety in the work place is likely to be
higher than for systems such as SPECTs that continuously monitor encounter events even when people
are in their homes.

2     PanCast Overview

We start with an overview of PanCast’s components and our trust assumptions about them.

2.1    Components
Figure 1 shows an overview of PanCast’s architecture. PanCast comprises two types of beacons (BLE-
only and BLE+network), personal devices (dongles), terminals, and a backend platform that relays risk
notifications and aggregates data for epidemiological analysis. All beacons and dongles are registered
and authenticated with the backend, receive a secret key from the backend at the time of registration,
and have a coarse-grained timer and a small amount of flash storage. When a user receives a dongle,
they receive a list of one-time passwords (OTPs) that are also stored in the dongle. The user uses these
OTPs to authenticate to the dongle during testing and to control upload of data from the dongle. We
elaborate on the components and their functionalities next.

Beacons. There are two types of beacons. BLE beacons are commodity, battery-operated BLE-only
beacons and require no network connection to the backend. Network beacons also use BLE, but addi-

                                                    8
tionally require mains power and a network connection to provide connectivity to backend servers. The
beacons serve two purposes:
  (i) Every beacon provides a localization point in a specific place (e.g., an office, a bar, a public bus).
      A beacon periodically broadcasts an ephemeral id, its device id, and a location id that is either
      a fixed geo-coordinate or a service id identifying the beacon’s trajectory. The device id and the
      location id are cryptographically signed by the backend, while the ephemeral id is generated by the
      beacon by hashing a beacon-specific secret key, its location id, and an epoch number derived from
      its local clock. Because the hash includes the epoch number, a fresh ephemeral id is broadcast in
      every epoch. This localizes every encounter between the beacon and a dongle to a specific epoch.
      The epoch length is set upfront to a small value, e.g., 15 minutes.
 (ii) Network beacons additionally broadcast global risk information received from the backend period-
      ically using a protocol that balances privacy and efficiency.
Beacons are installed in specific locations, for instance, under the guidance of health authorities or by
organizations who voluntarily place them on their own premises. Stationary beacons broadcast a GPS
coordinate or a named identifier (e.g., city, zipcode, etc). Beacons may also be installed in mobile loca-
tions like a city bus or a train; they broadcast an id that identifies their trajectory (e.g., train service id).
All beacons are registered with the backend using their id and their location id comprising their station-
ary coordinate or trajectory, as well as information about their location that may be epidemiologically
relevant (e.g., indoor, outdoor, ventilation, air quality, ambient noise level, etc.). This information can
be used by the backend when computing infection risks or performing epidemiological analyses.
    Beacons may serve one or both of the above functionalities. Simple, battery-operated BLE beacons
would account for the majority of beacons. They are cheap and easy to install, because they do not require
mains power or network connectivity. We expect them to be installed wherever infection transmission is
likely to occur (e.g., in places where people congregate). A smaller number of network beacons provide
nearby dongles with risk information. To reduce installation costs, we expect them to be installed where
power and network is already available, e.g., next to a WiFi base station.
    Beacon placement. The density and placement of beacons is important for minimizing false positives
and false negatives in PanCast. False positives arise when a user receives a risk notification even though
they have not been in close contact with a diagnosed user. For instance, a false risk notification may be
generated when two users encounter a beacon placed on a glass door but from opposite sides of the door.
Such false positives can be reduced by placing a sufficient number of beacons in a location and using well-
known localization techniques, and by relying on additional labels on beacons (e.g., whether a beacon is
indoor or outdoor, etc.). False negatives arise when potential transmissions between users are missed,
for instance, because of users meeting in locations where there are no beacons. Beacon deployments
can be planned strategically to minimize false negatives. For instance, restaurants are likely to be more
crowded than parks; therefore, restaurants must be prioritized in a partial rollout.

Dongles. Dongles are small, simple devices that users can attach to a keyring, or wear on the wrist
or around the neck. They operate off a coin battery and have a minimal user interface in the form of a
LED that indicates risk status and battery condition, and a button to control the LED notification.
    Dongles continuously listen for BLE transmissions from nearby beacons. They receive ephemeral ids
from both types of beacons and store them along with the timestamps. When in proximity of a network
beacon, they additionally receive risk information, compare that information with the device’s stored
history of ephemeral ids received from beacons, and alert the user in case they were in proximity of a
confirmed patient under circumstances that suggest a possible transmission.
    As discussed in more detail in Section 3.3, dongles transmit information only when in the presence of
terminals and with the explicit consent of the owner (terminals are introduced below). Normally, such
information uploads occur only when the owner has tested positive. However, users have the option of
contributing their beacon encounter histories on a regular basis to aid health authorities with epidemic
analytics.

Backend service. The backend maintains several databases. First, it maintains a database of reg-
istered beacons, their locations/trajectories, and the secret key used by each beacon to generate the

                                                       9
unique sequence of ephemeral ids it broadcasts. A second database contains registered users, their don-
gles, and the cryptographic keys required to authenticate each dongle. A third database, called the risk
database, contains the uploaded encounter histories of recently diagnosed individuals. Finally, a fourth
database, called the epidemiology database, contains the encounter histories of healthy users who chose
to contribute their data for epidemic analytics.
    The encounter database is available to health authorities for analytics, e.g., to identity hotspots,
superspreading events, and to estimate epidemiological parameters. Moreover, it is used by the backend
to transmit risk information to the network beacons, which those beacons in turn broadcast to nearby
user dongles.

User terminals. User terminals are provided at locations that issue dongles as well as health care
faclities that do testing. Terminals allow users to connect to their dongles over BLE to change their
privacy settings, inspect what data is recorded on their dongles, upload data to the backend and, when
allowed or required by law, decide what subset of the recorded information they wish to upload when
they are diagnosed. Users can also use personal computers or smartphones as terminals to perform these
tasks in the comfort of their homes, or use the smartphone of a care provider who visits their home.

2.2   Threat model
Beacons. Beacons are operated by untrusted parties, may fail, be accidentally misconfigured, or be
corrupted by malicious parties. However, we assume that only a small fraction of beacons is affected at
any time.
    BLE and network beacons are managed by untrusted third parties, but they are registered and
approved by the backend, run trusted firmware, and only accept firmware updates signed by the backend.
Network beacons relay risk information from the backend to user dongles. This information is signed by
the backend so dongles can detect unauthorized changes by corrupt beacons. For timely dissemination
of risk dissemination (liveness), PanCast requires that most beacons transmit risk information without
change.
    BLE beacons are prone to four types of issues, which may lead to inconsistencies in data collection
and subsequently incorrect risk notification and epidemiological analysis. First, beacons may fail; for
example, they may run out of battery or may be rebooted at any time. Clocks in beacons may go
out of sync due to failures and reboots, or due to clock drift. Second, a beacon may be installed in
a location different from the one it was registered for; such misconfigurations may result from human
error or deliberate acts. Misconfigured beacons can cause the backend to mis-identify the physical
location of potential transmissions, hotspots, or superspreading events. Third, attackers could relay
and re-broadcast a legitimate beacon’s transmissions at a different location, even in near-real time.
Relay attacks can cause the system to temporarily infer transmission risks that do not actually exist
(false positives). Fourth, an attacker could obtain illegitimate access to a beacon and hack it or break
into it physically. A compromised beacon allows the attacker to steal its secret key and replicate the
beacon’s broadcast, thus creating false positives as with relay transmissions above. We assume that
such failures, misconfigurations, and attacks are sporadic and uncoordinated. The backend can detect
misbehaving and failed beacons, and initiate a manual repair. PanCast’s backend can also detect and
correct inconsistencies in the collected data to a large extent. While a beacon is unavailable for any
reason, transmissions in its vicinity may be missed by PanCast (false negatives).
    We do not address side-channel leaks of beacon secrets, for instance, through power or electromagnetic
radiations.

Dongles. Dongles may fail or be corrupted by malicious parties. However, we assume that only a small
fraction of dongles is affected at any time.
    Like beacons, dongles are registered and approved by the backend, run trusted firmware, and only
accept firmware updates signed by the backend. What information users must provide to register and
obtain a dongle is subject to local policy and legislation. Minimally, evidence should be required to
prevent any individual from registering many dongles. Requiring additional information such as names
and contact details enables health authorities to actively trace individuals at risk, but raises privacy
concerns that may limit adoption.

                                                   10
Like beacons, dongles may fail and be physically compromised. Dongle failures (e.g., reboots, perma-
nent failures) may cause false negatives due to missed beacon broadcasts while the dongle is unavailable
or the user obtains a new one. A physical compromise may allow the attacker to steal the owner’s
data from the dongle and make up plausible encounter histories, thereby potentially causing some false
positives, some false negatives or adding misleading information about user behavior.
   Side-channel leaks of secrets from dongles are out of scope. Also, PanCast does not address leaks
through the sizes and timing of location history uploads from dongles to the backend. In principle, these
can be mitigated by adding chaff traffic to dongle uploads, and having users upload chaff data at random
times.

Backend. PanCast’s backend service and its controlling authority are trusted to not leak or misuse user
registration information, secret keys shared with devices, the encounter histories that users upload, and
the backend’s private key. The backend is also trusted to transmit risk information correctly to beacons.6

User terminals. Terminals are trusted to a limited extent.
   There are two cases. First, users may use their personal computing devices (computers or smart-
phones) as terminals. These computing devices are implicitly trusted by users. Second, users who do
not own personal computing devices may use public terminals, or use the smartphone of a care provider.
These terminals are trusted to a limited extent: (i) When such a terminal is used to select information
on a dongle prior to upload, the terminal must be trusted to not leak that information. (ii) When
such a terminal is used to send commands to a user dongle, the terminal is trusted to send the correct
commands. Terminals are not trusted in any other way.

Eavesdroppers. Any BLE device may receive and store beacon broadcasts. However, we assume that
no party has the ability to continuously receive and collect the BLE transmissions of a significant fraction
of beacons within a large, contiguous geographic region.
    BLE eavesdroppers can receive the transmissions of beacons they are close to and conspiring users may
aggregate their observations. However, we assume that no party has the ability to continuously receive
and collect the transmissions of a significant fraction of beacons within a large, contiguous geographic re-
gion. Malicious users and eavesdroppers may attempt to combine information obtained through PanCast
with any amount of auxiliary information about a subset of users obtained through offline channels to
try and violate the privacy of the remaining users.
       We elaborate on PanCast’s security and privacy properties in Section 6.

3      PanCast data collection and processing

We next describe PanCast’s operation involving device registration, encounter logging, encounter uploads
to the backend, and encounter data processing at the backend.

3.1      Beacon and dongle registration
When a dongle is registered, it receives a dongle id d, the backend’s public key, an initial clock Cd synced
to real time, a secret key skd and a list of one-time passwords (OTPs) from the backend. When a beacon
is registered, it receives a beacon id b, the backend’s public key, a location id locb corresponding to the
location where the beacon is supposed to be installed, an initial clock Cb synced to real time, and a
secret key skb from the backend. The backend stores the registration data of the beacons and dongles
in a device database. Specifically, the database contains entries of the form:

                {device type, device id} :- {secret key, initial clock, clock offset, }

where the device type is either beacon or dongle, and the location id exists only if the device type is
beacon. The clock offset in the backend’s database denotes any divergence between the local timer of
6 Asexplained in Section 3.3, secret keys shared with user dongles are only used to establish secure sessions in which dongles
 upload user histories. Consequently, leaking keys shared with dongles is no worse than leaking uploaded user histories
 directly.

                                                             11
a device and real time that is known to the backend. It is initialized to 0 during device registration.
The clock offsets for a dongle d and a beacon b are denoted δd and δb , respectively. To simplify the
presentation, in this document, we assume that all beacons and dongles are initialized synchronously,
i.e., the initial clock is set to zero in all devices at the same time; however, this is not a necessary
assumption.
     Each dongle and beacon has a coarse-grained timer of 1-minute resolution (td and tb respectively),
which is set to the initial clock value provided by the backend, and subsequently increments once every
minute. A device stores its timer value to local flash storage at intervals of fixed length L, called epochs.
A non-persistent variable tracks the epoch id or the number of epochs elapsed since the device’s start
(id and ib for the dongle and beacon, respectively). As stated earlier, in practice, we suggest an epoch
length L = 15 minutes.
     The secret key of a device is known only to the device and the backend. This key is used to mutually
authenticate the device and the backend to each other, and to establish a secure channel between the
two, whenever the two communicate with each other. The OTPs provided to a dongle are also given to
the dongle’s owner in human-readable form (e.g., printed on paper). These OTPs are used to mutually
authenticate the owner and the dongle whenever the owner interacts with the dongle (e.g., to initiate
data upload to the backend). See Section 3.3 for details of the communication between devices and the
backend, and between dongles and their owners.

3.2   Capturing beacon encounters
A beacon generates a new ephemeral id every epoch. In the ith epoch ib , the beacon b generates an
ephemeral id ephb,i that is computed as follows.

                                         ephb,i = hash(skb , locb , ib )

Here ib = b(tb −Cb )/Lc and hash is a one-way hash function. The beacon broadcasts {ephb,i , b, locb , tb },
which is captured by nearby user dongles. Suppose a dongle d encounters a beacon b when the dongle’s
local timer is td , and the beacon’s timer, epoch and ephemeral id are tb , ib , and ephb,i , respectively. The
dongle logs an entry, enctr, in its persistent storage where enctr is a tuple defined as:

                                          {ephb,i , b, locb , tb , td }

Beacons may broadcast the ephemeral ids several times within an epoch. Dongles persist only one entry
for each unique ephemeral id they receive.

Data structure size and storage requirements Location identifiers locb are 8 bytes in size.
These bytes may be used to represent the GPS coordinates or a hierarchical location naming scheme
(e.g., country.city.zip, or country.city.bus_id). Beacon timer and epoch counters are 4 bytes in
size and have a 1-minute resolution. Thus they can present time for 16 years, which is well beyond the
expected lifetime of the beacons. Beacon identifiers b are 4 bytes in size, and represent a simple global
counter for all beacons. Thus, we can support roughly 4.3 billion beacons in the world. The hash in the
ephemeral id is generated by computing SHA-256 of the inputs and taking the least significant 15 bytes
of the result. Overall, each beacon broadcast ({ephb,i , b, locb , tb }) is 15 + 4 + 8 + 4 = 31 bytes in length.
    Similar to beacons, dongle timers and epoch counters are 4 bytes each and have a 1-minute resolution,
and dongle identifiers are 4 bytes in size. Thus, for each unique ephemeral id received from a beacon, a
dongle stores 35 bytes of data. We store an encounter with a beacon if the dongle observes the beacon’s
ephemeral id broadcast for at least Emin minutes, which we assume to be 5 minutes.
    For making estimates in later sections, we assume that users encounter on average no more than one
unique beacon ephemeral id every 5 minutes throughout the day (24 hours). Accordingly, dongles need
to store data for 288 encounters in one day or 4032 encounters in a 14-day window (the infectious period
for the current COVID-19 disease as determined by health experts), which requires roughly 138KB
of persistent storage in a dongle. In reality, a dongle is very unlikely to encounter a distinct beacon
ephemeral id every 5 minutes continuously for 14 days, so this estimate is conservative.

                                                       12
3.3     Encounter data upload
Diagnosed individuals may share their encounter data with health authorities to enable dissemination
of risk information to people who might have been in their vicinity. We first discuss the requirements
for enabling data upload from users’ dongles. Then, we describe different upload mechanisms that can
enable users with different levels of technological access to upload their data.

3.3.1    Requirements
(i) User dongles only have Bluetooth connectivity; therefore, they need to use a networked BLE node to
upload their data to the backend. (ii) Since dongles have a minimal user interface, users fundamentally
need access to a separate device through which they can authorize their dongle to upload data in case
of a positive test. This device, subsequently called a terminal, can be any device that has a graphical
user interface, supports BLE, and has an Internet connection. A user needs to trust the terminal to send
correct commands to their dongle. In case they wish to perform a selective upload (described below),
then they must also trust the terminal with the data stored on their dongle. The terminal can be part
of a kiosk installed in a test center, clinic, or doctor’s office, or a personal device (e.g., a smartphone
or a computer) owned by the user or a care provider. (iii) Users may be allowed to share their data
selectively. For this, a terminal can be used to allow users to review the history stored in their dongles
and select what subset of the information they wish to release to the backend.

3.3.2    Upload mechanisms
Users typically visit a test center or a clinic for testing and receive their results offline via email or phone.
If the test is positive, they may wish to upload their data. Users may also voluntarily contribute their
data for epidemic analytics in the absence of a test. We present different mechanisms for data upload,
which vary in the time when data is uploaded and when it is actually released to the backend.
    In all cases, patients identify themselves at the time a test is taken using the normal procedures in
place for this purpose. Normally, their contact details are recorded along with the id of the test kit used
for the patient. Once the test results are available, the user is informed using their contact details. If
the result is positive, the notification includes a certificate that the patient has tested positive on the
given date, which the user can forward to the PanCast backend.

Delayed release Here, a user initiates the data upload after they receive a positive test result. If the
patient owns a smartphone or home computer, they can use it as a terminal for the upload. Users who
don’t own or have access to such a device and cannot leave their home to visit a terminal may use the
smartphone of a person who visits them to provide essential services (e.g., delivering groceries).
    When a user wishes to contribute their data, either because they tested positive and wish to warn
other people they have encountered or because they wish to contribute their data for epidemic analytics,
they establish a secure connection to their dongle via the terminal. For this, the user enters one of the
OTPs it shares with the dongle into the terminal. The dongle and the terminal, acting on behalf of
the user, use this OTP as a common shared secret to authenticate each other and establish a secure
connection. Once the connection is established, the user may enable the upload and attach their test
certificate if applicable. The dongle then encrypts the data it stores with a symmetric secret key it shares
with the backend, and uploads to the backend via the terminal, along with any certificate.

Early release Alternatively, users may choose to upload their data into escrow at the time of their
testing, pending a positive test result. Here, the user establishes a secure connnection to their dongle
using a terminal at the testing site, and consents to contributing their data if the test result turns out
positive. The data is immediately uploaded to the backend in encrypted form, using an encryption key
derived from one of the OTPs the dongle shares with the user. The user is asked to reveal this OTP
to the testing site, which uploads it to the backend along with the certificate if and when the result is
positive.
    With the early release approach, users trust the testing center to release their escrowed data only
in case of a positive test result. Alternatively, users themselves can upload the OTP to release their
escrowed information upon receiving a positive test result, for instance, through an automated phone
system.

                                                       13
3.3.3    Selective upload
When a user enables a data upload or release, they can optionally select a subset of their recent history.
For instance, they may choose not to upload data about visits to sensitive locations. The user may
perform this selection using a terminal at the testing center, or using a smartphone or computer at any
time. The user must trust this terminal with the currently recorded data on their dongles, as it is being
displayed on the terminal.

3.4     Encounter data processing at the backend
Once the backend receives a user’s encryption key, it decrypts their dongle’s encounter entries and verifies
their consistency. If the entries are consistent, the backend adds each encounter entry in the form of
{ephb,i , locb , T } to the risk database and/or the epidemiology database. Here, T corresponds to the
real time of encounter between a dongle and a beacon.
    The consistency check verifies that the value of each ephemeral beacon id ephb,i is consistent with
the time at which the dongle received it. It accommodates known clock offsets between encountered
beacons and the dongle. We explain this in more detail below. We describe the handling of other forms
of inconsistencies, for instance, due to misconfigured beacons or relay attacks in Section 6.2.

Verifying an encounter ephermeral id, ephb,i . Recall that an encounter entry is of the form
{ephb,i , b, locb , tb , td }. Suppose the encouner happened at real time T . Taking timer offsets into
account, we have the following equation:

                                          td + δd = T = tb + δb ,

Here, δd and δb are the clock offsets of the dongle and the beacon known to the backend. The backend
checks that td + δd = tb + δb . Next, the backend calculates the beacon’s epoch id ib = b(tb + δb − Cb )/Lc
at the time of encounter. The backend retrieves the beacon’s secret key skb using the beacon identifier
b from the encounter entry, and computes the expected ephemeral id hash(skb , locb , ib ) for the epoch ib .
If this expected ephemeral id matches the ephemeral id ephb,i in the encounter entry, then the encounter
entry is consistent and the backend adds it to its risk database, else the entry is ignored.

4     Risk dissemination

In this section, we describe a protocol for risk dissemination. Such a protocol must balance privacy
and efficiency concerns. On the one hand, the protocol must protect the location history and identity
of diagnosed patients. On the other hand, the protocol must ensure that relevant risk information
reaches potentially affected users’ dongles in a timely manner. We opt for a protocol where network
beacons broadcast global risk information periodically, and user dongles passively listen for relevant risk
information whenever they are in proximity of such a beacon. With this approach, dongles receive risk
information while reveal no information about their identity or their history to the backend, the network
beacons, or to other users.
     The risk information consists of a list of ephemeral ids. The ephemeral id of a beacon b for epoch
e is included in the list if and only if a diagnosed individual encountered b in epoch e. To reduce the
bandwidth overheads and broadcast delays, the list of risk entries is limited to the period of contagion
of the disease (e.g., 14 days in the case of COVID-19). The list can also be compressed using cuckoo
filters [40], at the cost of a small percentage of false positives. For example, to ensure an average false
positive rate of 0.01% for each user in a 14-day period, we require a cuckoo filter with entries of size just
27 bits (as opposed to the 15-byte ephemeral ids), which reduces the size of a risk broadcast by ∼4.4x.
The risk information may additionally contain values of parameters relevant for risk score calculation
performed on user devices, e.g., the weights of various features of an encounter in the risk estimation.
All risk information is signed by the backend to allow detection of any tampering by intermediate nodes,
including network beacons.
     If a dongle has previously received any of the ephemeral ids listed in the risk information, its owner
may have been exposed to a diagnosed individual. The device computes a risk score based on the number
of matched ephemeral ids and other features of each encounter encoded in the beacon broadcast, and
using the latest weights received from the backend. If the risk exceeds a certain threshold, the dongle

                                                     14
      δ       Offset(t)    99%-ile noise              δ      Offset(t)   99%-ile noise
          0.5    0.001   46638        78196                0.5    0.01   28181       59850
          0.2    0.001   94981        173938               0.2    0.01   49361       129117
          0.1    0.001   160148       318262               0.1    0.01   70588       231983
          0.05   0.001   263153       580176               0.05   0.01   90283       420116

        Table 1: 99th percentile noise required for various values of  and δ. Sensitivity (A) is 4032.

notifies the user via a LED so they can self-isolate and get tested. Users press a button on the dongle to
activate the LED; this ensures that they are not notified unexpectedly in public places, which may make
nearby people uncomfortable or stigmatize the user.
    In the following, we discuss the need to add noise in the risk dissemination protocol and the efficiency
of the entire protocol.

4.1     Noising the risk dissemination broadcast
We present two scenarios where the number of entries in the risk broadcast could potentially reveal an
individual’s identity, location history, or health status to an adversary in the locality of the individual.
These leaks arise without the adversary having even encountered the individual at all. We then describe
our solution to mitigate such leaks.

 i Whereabouts of diagnosed individuals. Suppose Alice learns from the local news that there
   was only one case of infection in the past few days within some geographic region. Furthermore,
   Alice happens to know that Bob was diagnosed, that he carries a dongle, and that he agreed to
   upload his encounter history when he got diagnosed. Now, if Alice learns that the risk information
   for her region includes a non-zero number of ephemeral ids, she can infer that Bob left his home
   at some point while he was contagious. Dually, if Alice sees that no risk information is provided
   for her region, she can infer that Bob has not been near a beacon within the region and during the
   period in question. In short, the length of a risk notification broadcast (zero vs. non-zero) reveals to
   an adversary information about the whereabouts of a diagnosed individual. Note that this problem
   exists even if a cuckoo filter is used to encode risk information, because the size of a filter optimized
   for space reveals information about the number of elements it includes.
      This information leak arises only if the location history of diagnosed users is always uploaded to
      the backend. In practice, users have a choice to not upload their history to the backend. If users
      exercise this choice frequently, then an adversary cannot learn much from the absence of risk entries.
      Nevertheless, we recognize that, with what we have described so far, such a leak is possible.
 ii Health status of an individual. Suppose Alice lives in an area with very few people, say n people,
    and Alice is able to track the movements of n − 1 of these people through outside channels. Suppose
    the risk information Alice receives contains more ephemeral ids than can be accounted for by the
    movements of the n − 1 people Alice is tracking. At this point, Alice knows that the nth person
    (whom Alice is not tracking) must be sick as well. Even though such an attack requires a significant
    amount of offline information and may be difficult to use in practice, it does raise privacy concerns.

Note that both these leaks rely solely on the number of ephermeral ids in a risk notification. We propose
to mitigate these leaks by adding noise to the risk notification broadcast in order to hide the actual
number of ephermeral ids. For this, we add junk ids that do not correspond to any real beacon and,
therefore, do not match the history of any user dongle. Since our threat model assumes that no adversary
can monitor the ephermeral ids from a significant fraction of beacons, no adversary can distinguish these
junk ids from legitimate ids. The number of junk ids can be chosen to satisfy differential privacy for all
individual users, which we describe next.

Differential privacy. Our goal is to add a random number of additional junk ids to a risk broadcast
to make the total number of entries differentially private for all individual users. The number of junk ids
we add must obviously be non-negative. For this, we adapt a mechanism proposed in prior work [19].
We describe our adapted mechanism next.

                                                      15
You can also read