Location Privacy in Pervasive Computing
SECURITY & PRIVACY

As location-aware applications begin to track our movements in the name of convenience, how can we protect our privacy? This article introduces the mix zone—a new construction inspired by anonymous communication techniques—together with metrics for assessing user anonymity.

Alastair R. Beresford and Frank Stajano, University of Cambridge

IEEE Pervasive Computing, January–March 2003. Published by the IEEE CS and IEEE Communications Society ■ 1536-1268/03/$17.00 © 2003 IEEE

Many countries recognize privacy as a right and have attempted to codify it in law. The first known piece of privacy legislation was England's 1361 Justices of the Peace Act, which legislated for the arrest of eavesdroppers and stalkers. The Fourth Amendment to the US Constitution proclaims citizens' right to privacy, and in 1890 US Supreme Court Justice Louis Brandeis stated that "the right to be left alone" is one of the fundamental rights of a democracy.1 The 1948 Universal Declaration of Human Rights2 declares that everyone has a right to privacy at home, with family, and in correspondence. Other pieces of more recent legislation follow this principle.

Although many people clearly consider their privacy a fundamental right, comparatively few can give a precise definition of the term. The Global Internet Liberty Campaign3 has produced an extensive report that discusses personal privacy at length and identifies four broad categories: information privacy, bodily privacy, privacy of communications, and territorial privacy.

This article concentrates on location privacy, a particular type of information privacy that we define as the ability to prevent other parties from learning one's current or past location. Until recently, the very concept of location privacy was unknown: people did not usually have access to reliable and timely information about the exact location of others, and therefore most people could see no privacy implications in revealing their location, except in special circumstances. With pervasive computing, though, the scale of the problem changes completely. You probably do not care if someone finds out where you were yesterday at 4:30 p.m., but if this someone could inspect the history of all your past movements, recorded every second with submeter accuracy, you might start to see things differently. A change of scale of several orders of magnitude is often qualitative as well as quantitative—a recurring problem in pervasive computing.4

We shall focus on the privacy aspects of using location information in pervasive computing applications. When location systems track users automatically on an ongoing basis, they generate an enormous amount of potentially sensitive information. Privacy of location information is about controlling access to this information. We do not necessarily want to stop all access—because some applications can use this information to provide useful services—but we want to be in control.

Some goals are clearly mutually exclusive and cannot be simultaneously satisfied: for example, wanting to keep our position secret and yet wanting colleagues to be able to locate us. Despite this, there is still a spectrum of useful combinations to be explored. Our approach to this tension is a privacy-protecting framework based on frequently changing pseudonyms, so that users avoid being identified by the locations they visit. We further develop this framework by introducing the concept of mix zones.
Related Work: Location Awareness and Privacy

The first wide-scale outdoor location system, GPS,1 lets users calculate their own position, but the flow of information is unidirectional; because there is no back-channel from the GPS receiver to the satellites, the system cannot determine the user's computed location, and does not even know whether anyone is accessing the service. At the level of raw sensing, GPS implicitly and automatically gives its users location privacy.

In contrast, the first indoor location system, the Active Badge,2 detects the location of each user and broadcasts the information to everyone in the building. The system as originally deployed assumes anyone in the building is trustworthy; it therefore provides no mechanisms to limit the dissemination of individuals' location information. Ian W. Jackson3 modified the Active Badge system to address this issue. In his version, a badge does not reveal its identity to the sensor detecting its position but only to a trusted personal computer at the network edge. The system uses encrypted and anonymized communication, so observation of the traffic does not reveal which computer a given badge trusts. The badge's owner can then use traditional access control methods to allow or disallow other entities to query the badge's location.

More recent location systems, such as Spirit4 and QoSDream,5 have provided applications with a middleware event model through which entities entering or exiting a predefined region of space generate events. Applications register their interest in a particular set of locations and locatables and receive callbacks when the corresponding events occur. Current location-aware middleware provides open access to all location events, but it would be possible to augment this architecture to let users control the dissemination of their own location information.

So far, few location systems have considered privacy as an initial design criterion. The Cricket location system6 is a notable exception: location data is delivered to a personal digital assistant under the sole control of the user.

Commercial wide-area location-based services will initially appear in mobile cellular systems such as GSM. BTexact's Erica system7 delivers sensitive customer information to third-party applications. Erica provides an API for third-party software to access customer billing information, micropayment systems, customer preferences, and location information. In this context, individual privacy becomes much more of a concern. Users of location-aware applications in this scenario could potentially have all their daily movements traced.

In a work-in-progress Internet draft that appeared after we submitted this article for publication, Jorge R. Cuellar and colleagues also explore location privacy in the context of mobile cellular systems.8 As we do, they suggest using pseudonyms to protect location privacy. Unlike us, however, they focus on policies and rules—they do not consider attacks that might break the unlinkability that pseudonyms offer.

REFERENCES

1. I. Getting, "The Global Positioning System," IEEE Spectrum, vol. 30, no. 12, Dec. 1993, pp. 36–47.
2. R. Want et al., "The Active Badge Location System," ACM Trans. Information Systems, vol. 10, no. 1, Jan. 1992, pp. 91–102.
3. I.W. Jackson, "Anonymous Addresses and Confidentiality of Location," Information Hiding: First Int'l Workshop, R. Anderson, ed., LNCS, vol. 1174, Springer-Verlag, Berlin, 1996, pp. 115–120.
4. N. Adly, P. Steggles, and A. Harter, "SPIRIT: A Resource Database for Mobile Users," Proc. ACM CHI '97 Workshop on Ubiquitous Computing, ACM Press, New York, 1997.
5. H. Naguib, G. Coulouris, and S. Mitchell, "Middleware Support for Context-Aware Multimedia Applications," Proc. 3rd Int'l Conf. Distributed Applications and Interoperable Systems, Kluwer Academic Publishers, Norwell, Mass., 2001, pp. 9–22.
6. N.B. Priyantha, A. Chakraborty, and H. Balakrishnan, "The Cricket Location-Support System," Proc. 6th Int'l Conf. Mobile Computing and Networking (Mobicom 2000), ACM Press, New York, 2000, pp. 32–43.
7. P. Smyth, Project Erica, 2002, www.btexact.com/erica.
8. J.R. Cuellar, J.B. Morris, and D.K. Mulligan, Geopriv Requirements, Internet draft, Nov. 2002, www.ietf.org/internet-drafts/draft-ietf-geopriv-reqs-01.txt.

We show how to map the problem of location privacy onto that of anonymous communication. This gives us access to a growing body of theoretical tools from the information-hiding community. In this context, we describe two metrics that we have developed for measuring location privacy, one based on anonymity sets and the other based on entropy. Finally, we move from theory to practice by applying our methods to a corpus of more than three million location sample points obtained from the Active Bat installation at AT&T Labs Cambridge.5

Problem, threat model, and application framework

In the pervasive computing scenario, location-based applications track people's movements so they can offer various useful services. Users who do not want such services can trivially maintain location privacy by refusing to be tracked—assuming they have the choice. This has always been the case for our Active Badge (see the "Related Work" sidebar) and Active Bat systems but might not be true for, say, a nationwide network of face-recognizing CCTV cameras—an Orwellian dystopia now dangerously close to reality. The more challenging problem we explore in this article is to develop techniques that let users benefit from location-based applications while at the same time retaining their location privacy.

To protect the privacy of our location information while taking advantage of location-aware services, we wish to hide our true identity from the applications receiving our location; at a very high level, this can be taken as a statement of our security policy.
Users of location-based services will not, in general, want information required by one application to be revealed to another. For example, patients at an AIDS testing clinic might not want their movements (or even evidence of a visit) revealed to the location-aware applications in their workplace or bank. We can achieve this compartmentalization by letting location services register for location callbacks only at the same site where the service is provided. But what happens if the clinic and workplace talk about us behind our back? Because we do not trust these applications and have no control over them, we must assume they might collude against us to discover the information we aim to hide. We therefore regard all applications as one global hostile observer.

Some location-based services, such as "when I am inside the office building, let my colleagues find out where I am," cannot work without the user's identity. Others, such as "when I walk past a coffee shop, alert me with the price of coffee," can operate completely anonymously. Between those extremes are applications that cannot be accessed anonymously—but do not require the user's true identity either. An example of this third category might be "when I walk past a computer screen, let me teleport my desktop to it." Here, the application must know whose desktop to teleport, but it could conceivably do this using an internal pseudonym rather than the user's actual name.

Clearly, we cannot use applications requiring our true identity without violating our security policy. Therefore, in this article, we concentrate on the class of location-aware applications that accept pseudonyms, and our aim will be the anonymization of location information.

In our threat model, the individual does not trust the location-aware application but does trust the raw location system (the sensing infrastructure that can position locatables). This trust is certainly appropriate for systems such as GPS and Cricket, in which only the user can compute his or her own location. Its applicability to other systems, such as the Active Bat, depends on the trust relationships between the user and the entity providing the location service.

We assume a system with the typical architecture of modern location-based services, which are based on a shared event-driven middleware system such as Spirit or QoSDream (see the "Related Work" sidebar). In our model, the middleware, like the sensing infrastructure, is trusted and might help users hide their identity. Users register interest in particular applications with the middleware; applications receive event callbacks from the middleware when the user enters, moves within, or exits certain application-defined areas. The middleware evaluates the user's location at regular periodic intervals, called update periods, to determine whether any events have occurred, and issues callbacks to applications when appropriate.

Users cannot communicate with applications directly—otherwise they would reveal their identity straight away. We therefore require an anonymizing proxy for all communication between users and applications. The proxy lets applications receive and reply to anonymous (or, more correctly, pseudonymous) messages from the users. The middleware system is ideally placed to perform this function, passing user input and output between the application and the user. For example, to benefit from the application-level service of "when I walk past a coffee shop, alert me with the price of coffee," the user requests the service, perhaps from the coffee shops' trade-association Web site, but using her location system middleware as the intermediary so as not to reveal her identity. The coffee shop application registers with the user's location system and requests event callbacks for positive containment in the areas in front of each coffee shop. When the user steps into a coffee shop event callback area, the coffee shop application receives a callback from the user's location system on the next location update period. The coffee shop service then sends the current price of coffee as an event message to the registered user via the middleware without having to know the user's real identity or address.

Using a long-term pseudonym for each user does not provide much privacy—even if the same user gives out different pseudonyms to different applications to avoid collusion. This is because certain regions of space, such as user desk locations in an office, act as "homes" and, as such, are strongly associated with certain identities. Applications could identify users by following the "footsteps" of a pseudonym to or from such a "home" area. At an earlier stage in this research, we tried this kind of attack on real location data from the Active Bat and found we could correctly de-anonymize all users by correlating two simple checks: First, where does any given pseudonym spend most of its time? Second, who spends more time than anyone else at any given desk? Therefore, long-term pseudonyms cannot offer sufficient protection of location privacy.

Our countermeasure is to have users change pseudonyms frequently, even while they are being tracked: users adopt a series of new, unused pseudonyms for each application with which they interact. In this scenario, the purpose of using pseudonyms is not to establish and preserve reputation (if we could work completely anonymously instead, we probably would) but to provide a return address. So the problem, in this context, is not whether changing pseudonyms causes a loss of reputation but more basically whether the application will still work when the user changes under its feet. Our preliminary analysis convinced us that many existing applications could be made to work within this framework by judicious use of anonymizing proxies. However, we decided to postpone a full investigation of this issue (see the "Directions for Further Research" sidebar) and concentrate instead on whether this approach, assuming it did not break applications, could improve location privacy. If it could not, adapting the applications would serve no purpose.

Users can therefore change pseudonyms while applications track them; but then, if the system's spatial and temporal resolution were sufficiently high (as in the Active Bat), applications could easily link the old and new pseudonyms, defeating the purpose of the change. To address this point, we introduce the mix zone.
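The pseudonym-changing countermeasure can be sketched in a few lines. This is an illustrative sketch only: the trace layout, the zone_of lookup, and the new_pseudonym helper are our own assumptions, not part of any middleware discussed in this article.

```python
import secrets

def new_pseudonym() -> str:
    """Mint a fresh, unused pseudonym (random, hence unlinkable to earlier ones)."""
    return secrets.token_hex(8)

def pseudonymize(trace, zone_of):
    """Relabel a location trace so that a new pseudonym is adopted each time
    the user enters a mix zone (any area outside all application zones).

    trace   -- iterable of positions in chronological order
    zone_of -- callable mapping a position to an application zone name,
               or None when the position lies in no application zone
    """
    out = []
    pseudonym = new_pseudonym()
    previously_mixed = False
    for position in trace:
        in_mix_zone = zone_of(position) is None   # no registered callback here
        if in_mix_zone and not previously_mixed:
            pseudonym = new_pseudonym()            # change identity on entry
        previously_mixed = in_mix_zone
        if not in_mix_zone:
            out.append((pseudonym, position))      # applications see app zones only
    return out
```

A trace that crosses a mix zone between two application zones thus emerges under two different pseudonyms, and no sightings at all are reported while the user is inside the mix zone.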
Directions for Further Research

During our research on location privacy, several interesting open problems emerged.

Managing application use of pseudonyms

A user who does not wish to be tracked by an application will want to use different pseudonyms on each visit. Many applications can offer better services if they retain per-user state, such as personal preferences; but if a set of preferences were accessed by several different pseudonyms, the application would easily guess that these pseudonyms map to the same user. We therefore need to ensure that the user state for each pseudonym looks, to the application, different from that of any other pseudonym.

There are two main difficulties. First, the state for a given user (common to all that user's pseudonyms) must be stored elsewhere and then supplied to the application in an anonymized fashion. Second, the application must not be able to determine that two sets of preferences map to the same user, so we might have to add small, random variations. However, insignificant variations might be recognizable to a hostile observer, whereas significant ones could adversely affect semantics and therefore functionality.

Reacting to insufficient anonymity

You can use the methods described in this article to compute a quantitative measure of anonymity. How should a user react to a warning that measured anonymity is falling below a certain threshold? The simple-minded solution would be for the middleware to stop divulging location information, but this causes applications to become unresponsive. An alternative might be to reduce the size or number of the application zones in which a user has registered. This is hard to automate: either users will be bombarded with warnings, or they will believe they are still receiving service when they are not. In addition, removing an application zone does not instantaneously increase user anonymity; it just increases the size or number of available mix zones. The user must still visit them and mix with others before anonymity increases. Reconciling privacy with application functionality is still, in general, an open problem.

Improving the models

It would be interesting to define anonymity sets and entropy measurements from the perspective of what a hostile observer can see and deduce, instead of counting the users in the mix zone. The formalization, which we have attempted, is rather more complicated than what is presented in this article, but it might turn out to provide a more accurate location privacy metric.

Dummy users

We could increase location privacy by introducing dummy users, similar to the way cover traffic is used in mix networks, but there are side effects when we apply this technique to mix zones. Serving dummy users, like processing dummy messages, is a waste of resources. Although the overhead might be acceptable in the realm of bits, the cost might be too high with real-world services such as those found in a pervasive computing environment. Dummy users might have to control physical objects—opening and closing doors, for example—or purchase services with electronic cash. Furthermore, realistic dummy user movements are much more difficult to construct than dummy messages.

Granularity

The effectiveness of mixing depends not only on the user population and on the mix zone's geometry, but also on the sensing system's update rate and spatial resolution. At very low spatial resolutions, any "location" will be a region as opposed to a point and so might be seen as something similar to a mix zone. What are the requirements for various location-based applications? How does their variety affect the design of the privacy protection system? Several location-based applications would not work at all if they got updates only on a scale of hours and kilometers, as opposed to seconds and meters.

Scalability

Our results are based on experimental data from our indoor location system, covering a relatively small area and user population. It would be interesting to apply the same techniques to a wider area and larger population—for example, on the scale of a city—where we expect to find better anonymity. We are currently trying to acquire access to location data from cellular telephony operators to conduct further experiments.

Mix zones

Most theoretical models of anonymity and pseudonymity originate from the work of David Chaum, whose pioneering contributions include two constructions for anonymous communications (the mix network6 and the dining cryptographers algorithm7) and the notion of anonymity sets.7 In this article, we introduce a new entity for location systems, the mix zone, which is analogous to a mix node in communication systems. Using the mix zone to model our spatiotemporal problem lets us adopt useful techniques from the anonymous communications field.

Mix networks and mix nodes

A mix network is a store-and-forward network that offers anonymous communication facilities. The network contains normal message-routing nodes alongside special mix nodes. Even hostile observers who can monitor all the links in the network cannot trace a message from its source to its destination without the collusion of the mix nodes.
In its simplest form, a mix node collects n equal-length packets as input and reorders them by some metric (for example, lexicographically or randomly) before forwarding them, thus providing unlinkability between incoming and outgoing messages. For brevity, we must omit some essential details about layered encryption of packets; interested readers should consult Chaum's original work. The number of distinct senders in the batch provides a measure of the unlinkability between the messages coming in and going out of the mix.

Chaum later abstracted this last observation into the concept of an anonymity set. As paraphrased by Andreas Pfitzmann and Marit Köhntopp, whose recent survey generalizes and formalizes the terminology on anonymity and related topics, the anonymity set is "the set of all possible subjects who might cause an action."8 In anonymous communications, the action is usually that of sending or receiving a message. The larger the anonymity set's size, the greater the anonymity offered. Conversely, when the anonymity set reduces to a singleton—the anonymity set cannot become empty for an action that was actually performed—the subject is completely exposed and loses all anonymity.

Mix zones

We define a mix zone for a group of users as a connected spatial region of maximum size in which none of these users has registered any application callback; for a given group of users there might be several distinct mix zones. In contrast, we define an application zone as an area where a user has registered for a callback. The middleware system can define the mix zones a priori or calculate them separately for each group of users as the spatial areas currently not in any application zone.

Because applications do not receive any location information when users are in a mix zone, the identities are "mixed." Assuming users change to a new, unused pseudonym whenever they enter a mix zone, applications that see a user emerging from the mix zone cannot distinguish that user from any other who was in the mix zone at the same time and cannot link people going into the mix zone with those coming out of it.

If a mix zone has a diameter much larger than the distance the user can cover during one location update period, it might not mix users adequately. For example, Figure 1 provides a plan view of a single mix zone with three application zones around the edge: an airline agency (A), a bank (B), and a coffee shop (C). Zone A is much closer to B than C, so if two users leave A and C at the same time and a user reaches B at the next update period, an observer will know the user emerging from the mix zone at B is not the one who entered the mix zone at C. Furthermore, if nobody else was in the mix zone at the time, the user can only be the one from A. If the maximum size of the mix zone exceeds the distance a user covers in one period, mixing will be incomplete. The degree to which the mix zone anonymizes users is therefore smaller than one might believe by looking at the anonymity set size. The two "Results" sections later in this article discuss this issue in more detail.

Figure 1. A sample mix zone arrangement with three application zones: an airline agency (A), a bank (B), and a coffee shop (C). The airline agency is much closer to the bank than the coffee shop. Users leaving A and C at the same time might be distinguishable on arrival at B.

A quantifiable anonymity measure: The anonymity set

For each mix zone that user u visits during time period t, we define the anonymity set as the group of people visiting the mix zone during the same time period. The anonymity set's size is a first measure of the level of location privacy available in the mix zone at that time. For example, a user might decide a total anonymity set size of at least 20 people provides sufficient assurance of the unlinkability of their pseudonyms from one application zone to another. Users might refuse to provide location updates to an application until the mix zone offers a minimum level of anonymity.

Knowing the average size of a mix zone's anonymity set, as opposed to its instantaneous value, is also useful. When a user registers for a new location-aware service, the middleware can calculate from historical data the average anonymity set size of neighboring mix zones and therefore estimate the level of location privacy available. The middleware can present users with this information before they accept the services of a new location-aware application.

There is an additional subtlety here: introducing a new application could result in more users moving to the new application zone. Therefore, a new application's average anonymity set might be an underestimate of the anonymity available, because users who have not previously visited the new application zone's geographical area might venture there to gain access to the service.
Figure 2. Floor plan layout of the laboratory (hallway, main corridor, stairwell, and the north, south, east, and west zones). Combinations of the labeled areas form the mix zones z1, z2, and z3, described in the main text.

Experimental data

We applied the anonymity set technique to location data collected from the Active Bat system5 installed at AT&T Labs Cambridge, where one of us was sponsored and the other employed. Our location privacy experiments predate that lab's closure, so the data we present here refers to the original Active Bat installation at AT&T. The Laboratory for Communications Engineering at the University of Cambridge, our current affiliation, has since redeployed the system.

The system locates the position of bats: small mobile devices each containing an ultrasonic transmitter and a radio transceiver. The laboratory ceilings house a matrix of ultrasonic receivers and a radio network; the timing difference between ultrasound reception at different receivers lets the system locate bats with less than 3 cm error 95 percent of the time. While in use, typical update rates per bat are between one and 10 updates per second.

Almost every researcher at AT&T wore a bat to provide position information to location-aware applications and colleagues located within the laboratory. For this article, we examined location sightings of all the researchers, recorded over a two-week period between 9 a.m. and 5 p.m.—a total of more than 3.4 million samples. All the location sightings used in the experiment come from researchers wearing bats in normal use. Figure 2 provides a floor plan view of the building's first floor. The second- and third-floor layouts are similar.

Results

Using the bat location data, we simulated longer location update periods and measured the size of the anonymity sets for three hypothetical mix zones:

• z1: first-floor hallway
• z2: first-floor hallway and main corridor
• z3: hallway, main corridor, and stairwell on all three floors

We chose to exclude anonymity set sizes of zero occupancy from our measure, for the following reason. In general, a high number of people in the mix zone means greater anonymity and a lower number means less. However, this is only true for values down to one, which is the least-anonymous case (with the user completely exposed); it is not true for zero, when no users are present. While it is sensible to average anonymity set size values from one, it is rather meaningless to average values that also include zero.

Figure 3 plots the size of the anonymity set for the hallway mix zone, z1, for update periods of one second to one hour. A location update period of at least eight minutes is required to provide an anonymity set size of two. Expanding z1 to include the corridor yields z2, but the results (not plotted) for this larger zone do not show any significant increase in the anonymity level provided.

Mix zone z3, encompassing the main corridors and hallways of all three floors as well as the stairwell, considerably improves the anonymity set's cardinality. Figure 4 plots the number of people in z3 for various location update periods. The mean anonymity set size reaches two for a location update period of 15 seconds. This value is much better than the one obtained using mix zone z1, although it is still poor in absolute terms.

The anonymity set measurements tell us that our pseudonymity technique cannot give users adequate location privacy in this particular experimental situation (because of user population, resolution of the location system, geometry of the mix zones, and so on). However, two positive results counterbalance this negative one.
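The averaging rule, dropping zero-occupancy periods before summarizing, might look like the helper below. This is an illustrative sketch, not code from the study.

```python
def summarize(set_sizes):
    """Return (min, mean, max) of per-period anonymity set sizes,
    excluding empty periods, which say nothing about a user's exposure.

    set_sizes -- iterable of non-negative integers, one per update period
    Returns None if every period was empty.
    """
    occupied = [n for n in set_sizes if n > 0]   # drop zero-occupancy periods
    if not occupied:
        return None
    return min(occupied), sum(occupied) / len(occupied), max(occupied)
```

Applied to simulated update periods, such a summary yields the minimum, mean, and maximum curves plotted in Figures 3 and 4.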
SECURITY & PRIVACY Figure 3. Anonymity set size for z1. 18 Maximum 16 Mean 14 Minimum point of view: If I am in the mix zone with Cardinality of anonymity set 12 20 other people, I might consider myself well protected. But if all the others are on 10 the third floor while I am on the second, 8 when I go in and out of the mix zone the observer will strongly suspect that those 6 lonely pseudonyms seen on the second 4 floor one at a time actually belong to the same individual. This discovery motivated 2 the work we describe next. 1s 2s 4s 8s 15s 30s 1m 2m 4m 8m 15m 30m 1h Accounting for user movement Location update period So far we have implicitly assumed the location of entry into a mix zone is inde- pendent of the location of exit. This in gen- have a quantitative measurement tech- approximately 50 seconds. For update eral is not the case, and there is a strong nique that lets us perform this assessment. periods shorter than this value, we cannot correlation between ingress position and Second, the same protection technique that consider users to be “mixed.” Intuitively, it egress position, which is a function of the did not work in the AT&T setting might seems much more likely that a user enter- mix zone’s geography. For example, con- fare better in another context, such as ing the mix zone from an office on the third sider two people walking into a mix zone wide-area tracking of cellular phones. floor will remain on the third floor rather from opposite directions: in most cases Apart from its low anonymity set size, than move to an office on the first. Analy- people will continue walking in the same mix zone z3 presents other difficulties. The sis of the data reveals this to be the case: in direction. maximum walking speed in this area, cal- more than half the movements in this mix Anonymity sets do not model user entry culated from bat movement data, is 1.2 zone, users travel less than 10 meters. This and exit motion. Andrei Serjantov and meters per second. 
To move from one means that the users in the mix zone are George Danezis9 propose an information- extreme of z3 to another, users require not all equal from the hostile observer’s theoretic approach to consider the varying probabilities of users sending and receiv- ing messages through a network of mix nodes. We apply the same principle here to 40 mix zones. Maximum The crucial observation is that the 36 Mean anonymity set’s size is only a good measure 32 Minimum of anonymity when all the members of the Cardinality of anonymity set 28 set are equally likely to be the one of inter- est to the observer; this, according to Shan- 24 non’s definition,10 is the case of maximal 20 entropy. For a set of size n, if all elements 16 are equiprobable, we need log2 n bits of information to identify one of them. If they 12 are not equiprobable, the entropy will be 8 lower, and fewer bits will suffice. We can 4 precisely compute how many in the fol- lowing way. 0 1s 2s 4s 8s 15s 30s 1m 2m 4m 8m 15m 30m 1h Location update period Figure 4. Anonymity set size for z3. 52 PERVASIVE computing http://computer.org/pervasive
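The point about equiprobability is easy to illustrate numerically. The following sketch (our illustration, not code from the article) compares the log2 n bits obtained from a uniform distribution over n users with the entropy of a skewed one, such as the case where one user is far more likely to be the pseudonym's owner:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: h = -sum(p * log2(p)), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four users, all equally likely to be the one of interest: log2(4) = 2 bits.
uniform = [0.25, 0.25, 0.25, 0.25]

# Four users, but one is far more likely (e.g., the only one on this floor).
skewed = [0.91, 0.03, 0.03, 0.03]

print(entropy_bits(uniform))  # 2.0 bits: the full benefit of a set of size 4
print(entropy_bits(skewed))   # well under 1 bit, despite the same set size
```

The anonymity set size is four in both cases, but the skewed distribution leaves the observer far less uncertain, which is exactly why entropy is the tighter measure.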
Figure 5. The movement matrix M records the frequency of movements through the hallway mix zone. Each element represents the frequency of movements from the preceding zone p, at time t − 1, through zone z, at time t, to the subsequent zone s, at time t + 1.

                        Subsequent zone, s
Preceding zone, p    East  North  South  West  Mix zone
Mix zone               96     47     66   101       814
West                  125     13     17     1        43
South                  29     55      7    22        30
North                  39      6     64    39        66
East                   24     47     74   176       162

Consider user movements through a mix zone z. For users traveling through z at time t, we can record the preceding zone, p, visited at time t − 1, and the subsequent zone, s, visited at time t + 1. The user doesn't necessarily change zones on every location update period: the preceding and subsequent zones might all refer to the same mix zone z. Using historical data, we can calculate, for all users visiting zone z, the relative frequency of each pair of preceding and subsequent zones (p, s) and record it in a movement matrix M. M records, for all possible (p, s) pairs, the number of times a person who was in z at t was in p at t − 1 and in s at t + 1. The entries of M are proportional to the joint probabilities, which we can obtain by normalization:

P(prec = p, subs = s) = M(p, s) / Σ_{i,j} M(i, j).

The conditional probability of coming out through zone s, having gone in through zone p, follows from the product rule:

P(subs = s | prec = p) = M(p, s) / Σ_j M(p, j).

We can now apply Shannon's classic measure of entropy10 to our problem:

h = − Σ_i p_i · log2 p_i.

This gives us the information content, in bits, associated with a set of possible outcomes with probabilities p_i. The higher it is, the more uncertain a hostile observer will be about the true answer, and therefore the higher our anonymity will be.

Results
We applied this principle to the same set of Active Bat movement data described earlier. We consider the hallway mix zone, z1, and define four hypothetical application zones surrounding it: east, west, north, and south (see Figure 2). Walls prevent user movement in any other direction. Figure 5 shows M, the frequency of all possible outcomes for users entering the hallway mix zone z1, for a location update period of five seconds. From Figure 5 we observe that users do not often return to the zone they came from: it is unlikely that the preceding zone, p, is the same as the subsequent zone, s, the exception being a prolonged stay in the mix zone itself. It is quite common for users to remain in z1 for more than one location update period; the presence of a public computer in the hallway might explain this.
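The normalization steps above can be made concrete in a few lines of code. The sketch below (our illustration, not code from the article) builds M from the Figure 5 counts and derives the joint and conditional probabilities:

```python
# Movement matrix M for the hallway mix zone z1 (counts from Figure 5).
# Keys are the preceding zone p; list entries follow the column order below.
zones = ["East", "North", "South", "West", "Mix zone"]  # subsequent zone s
M = {
    "Mix zone": [96, 47, 66, 101, 814],
    "West":     [125, 13, 17, 1, 43],
    "South":    [29, 55, 7, 22, 30],
    "North":    [39, 6, 64, 39, 66],
    "East":     [24, 47, 74, 176, 162],
}

total = sum(sum(row) for row in M.values())

def p_joint(p, s):
    """P(prec = p, subs = s) = M(p, s) / sum over all (i, j) of M(i, j)."""
    return M[p][zones.index(s)] / total

def p_cond(s, p):
    """P(subs = s | prec = p) = M(p, s) / sum over j of M(p, j)."""
    return M[p][zones.index(s)] / sum(M[p])

# Users entering from the east most often exit west (176 of 483 movements).
print(round(p_cond("West", "East"), 3))
```

The joint probabilities sum to one by construction, and each row of conditional probabilities encodes the observer's a priori knowledge about where a user entering from p is likely to emerge.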
We can use the data from Figure 5 to determine how modeling user movements affects the anonymity level that the mix zone offers. As an example, consider two users walking in opposite directions, one east to west and the other west to east, passing through the initially empty hallway mix zone z1 in the same location update period. East and west are application zones, so the hostile observer will know that two users went into z1, one from east and another from west, and that later those users came out of z1, under new pseudonyms, one of them into east and the other into west. What the observer wants is to link the pseudonyms: that is, to find out whether they both went straight ahead or each made a U-turn. Without any a priori knowledge, the information content of this alternative is one full bit. We will now show with a numerical example how the knowledge encoded in movement matrix M lets the observer "guess" the answer with a chance of success that is better than random, formalizing the intuitive notion that the U-turn is the less likely option.

We are observing two users, each moving from a preceding zone p through the mix zone z1 to a subsequent zone s. If p and s are limited to {E, W}, we can reduce M to a 2 × 2 matrix M′. There are 16 possible cases, each representable by a four-letter string of Es and Ws: EEEE, EEEW, …, WWWW. The four characters in the string represent, respectively, the preceding and subsequent zones of the first user and the preceding and subsequent zones of the second user.

We identify two events of interest. The first, observed, corresponds to the situation we observed: two users coming into z1 from opposite directions and then again coming out of it in opposite directions. We have

observed = EEWW ∨ EWWE ∨ WEEW ∨ WWEE,

where ∨ means logical OR. The second event, uturn, corresponds to both users doing a U-turn:

uturn = EEWW ∨ WWEE.

We see that, in fact, the first event includes the second.
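The probabilities of these two events can be computed mechanically from the reduced matrix M′. The following sketch (our illustration; the counts are the east/west entries of Figure 5) reproduces the article's numerical example:

```python
import math

# Reduced movement matrix M' over {E, W}: counts of (preceding, subsequent)
# path pairs, taken from the east/west entries of the Figure 5 matrix.
counts = {"EE": 24, "EW": 176, "WE": 125, "WW": 1}
total = sum(counts.values())
P = {path: n / total for path, n in counts.items()}  # normalized path probabilities

def p_case(case):
    """A four-letter case (p1, s1, p2, s2): the two users move independently,
    so its probability is the product of the two path probabilities."""
    return P[case[:2]] * P[case[2:]]

observed_cases = ["EEWW", "EWWE", "WEEW", "WWEE"]  # in and out in opposite directions
uturn_cases = ["EEWW", "WWEE"]                     # both users made a U-turn

p_obs = sum(p_case(c) for c in observed_cases)
p_ut = sum(p_case(c) for c in uturn_cases)
p_ut_given_obs = p_ut / p_obs  # valid because uturn is a subset of observed

# Entropy of the observer's two-way choice: U-turn versus straight ahead.
h = -sum(p * math.log2(p) for p in (p_ut_given_obs, 1 - p_ut_given_obs))

print(round(p_obs, 3), round(p_ut_given_obs, 3), round(h, 3))  # → 0.414 0.001 0.012
```

Running this recovers the figures derived below: P(observed) ≈ 0.414, P(uturn | observed) ≈ 0.001, and an entropy of roughly 0.012 bits for the observer's choice.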
We can easily compute the probabilities of these two events by summing the probabilities of the respective four-letter subcases. We obtain the probability of a four-letter subcase by multiplying together the probabilities of the paths taken by the two users. For example, we obtain the probability of EWWE by calculating P(EW) × P(WE). We obtain these terms, in turn, from the reduced movement matrix M′ by normalization. Numerically, P(observed) = 0.414. Similarly, P(uturn) = 0.0005.

As the hostile observer, what we want to know is who was who, which is formally expressible as the conditional probability of whether each did a U-turn given what we saw: P(uturn | observed). From the product rule, this is equal to P(uturn ∧ observed) / P(observed). This, because the second event is included in the first, reduces to P(uturn) / P(observed) = 0.001.

So the observer now knows that they either did a U-turn, with probability 0.1 percent, or went straight, with probability 99.9 percent. The U-turn is extremely unlikely, so the information content of the outcome is not one bit, as it would be if the choices were equiprobable, but much less, because even before hearing the true answer we are almost sure that it will be "they both went straight ahead." Numerically, the entropy of this choice is 0.012 bits. This value is much lower than 1 because the choices are not equiprobable, and a hostile observer can therefore unlink pseudonyms with much greater success than the mere anonymity set size would suggest. The entropy therefore gives us a tighter and more accurate estimate of the available anonymity.9

Technologies for locating and tracking individuals are becoming increasingly commonplace, and they will become more pervasive and ubiquitous in the future. Location-aware applications will have the potential to follow your every move, from leaving your house to visiting the doctor's office, recording everything from the shelves you view in a supermarket to the time you spend beside the coffee machine at work. We must address the issue of protecting location information before the widespread deployment of the sensing infrastructure.

Applications can be built or modified to use pseudonyms rather than true user identities, and this is one route toward greater location privacy. Because different applications can collude and share information about user sightings, users should adopt different pseudonyms for different applications. Furthermore, to thwart more sophisticated attacks, users should change pseudonyms frequently, even while being tracked.

Drawing on the methods developed for anonymous communication, we use the conceptual tools of mix zones and anonymity sets to analyze location privacy. Although anonymity sets provide a first quantitative measure of location privacy, this measure is only an upper-bound estimate. A better, more accurate metric, developed using an information-theoretic approach, uses entropy, taking into account the a priori knowledge that an observer can derive from historical data. The entropy measurement shows a more pessimistic picture, in which the user has less privacy, because it models a more powerful adversary who might use historical data to de-anonymize pseudonyms more accurately.

Applying the techniques we presented in this article to data from our Active Bat system demonstrated that, because the temporal and spatial resolution of the location data generated by the bats is high, location privacy is low, even with a relatively large mix zone. However, we anticipate that the same techniques will show a much higher degree of unlinkability between pseudonyms over a larger and more populated area, such as a city center, in the context of locating people through their cellular telephones.

ACKNOWLEDGMENTS
We thank our colleagues Andrei Serjantov, Dave Scott, Mike Hazas, Tom Kelly, and Gray Girling, as well as Tim Kindberg and the other IEEE Computer Society referees, for their thoughtful advice and comments. We also thank all of our former colleagues at AT&T Laboratories Cambridge for providing their movement data, as well as EPSRC and AT&T Laboratories Cambridge for financial support.

REFERENCES
1. S. Warren and L. Brandeis, "The Right to Privacy," Harvard Law Rev., vol. 4, no. 5, Dec. 1890, pp. 193–200.
2. United Nations, Universal Declaration of Human Rights, General Assembly Resolution 217 A (III), 1948; www.un.org/Overview.
3. D. Banisar and S. Davies, Privacy and Human Rights, www.gilc.org/privacy/survey.
4. F. Stajano, Security for Ubiquitous Computing, John Wiley & Sons, New York, 2002.
5. A. Ward, A. Jones, and A. Hopper, "A New Location Technique for the Active Office," IEEE Personal Communications, vol. 4, no. 5, Oct. 1997, pp. 42–47.
6. D. Chaum, "Untraceable Electronic Mail, Return Addresses and Digital Pseudonyms," Comm. ACM, vol. 24, no. 2, 1981, pp. 84–88.
7. D. Chaum, "The Dining Cryptographers Problem: Unconditional Sender and Recipient Untraceability," J. Cryptology, vol. 1, no. 1, 1988, pp. 66–75.
8. A. Pfitzmann and M. Köhntopp, "Anonymity, Unobservability and Pseudonymity—A Proposal for Terminology," Designing Privacy Enhancing Technologies: Proc. Int'l Workshop Design Issues in Anonymity and Observability, LNCS, vol. 2009, Springer-Verlag, Berlin, 2000, pp. 1–9.
9. A. Serjantov and G. Danezis, "Towards an Information Theoretic Metric for Anonymity," Proc. Workshop on Privacy Enhancing Technologies, 2002; www.cl.cam.ac.uk/users/aas23/papers_aas/set.ps.
10. C.E. Shannon, "A Mathematical Theory of Communication," Bell System Tech. J., vol. 27, July 1948, pp. 379–423, and Oct. 1948, pp. 623–656.

THE AUTHORS
Alastair Beresford is a PhD student at the University of Cambridge. His research interests include ubiquitous systems, computer security, and networking. After completing a BA in computer science at the University of Cambridge, he was a researcher at BT Labs, returning to study for a PhD in the Laboratory for Communications Engineering in October 2000. Contact him at the Laboratory for Communication Engineering, Univ. of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK; arb33@cam.ac.uk.

Frank Stajano is a faculty member at the University of Cambridge's Laboratory for Communication Engineering, where he holds the ARM Lectureship in Ubiquitous Computing. His research interests include security, mobility, ubicomp, scripting languages, and middleware. He consults for industry on computer security and is the author of Security for Ubiquitous Computing (Wiley, 2002). Contact him at the Laboratory for Communication Engineering, Univ. of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK; fms27@cam.ac.uk; www-lce.eng.cam.ac.uk/~fms27.