Privacy and Accountability for Location-based Aggregate Statistics
Raluca Ada Popa, MIT, ralucap@mit.edu
Andrew J. Blumberg, University of Texas, Austin, blumberg@math.utexas.edu
Hari Balakrishnan, MIT, hari@mit.edu
Frank H. Li, MIT, frankli@mit.edu

ABSTRACT

A significant and growing class of location-based mobile applications aggregate position data from individual devices at a server and compute aggregate statistics over these position streams. Because these devices can be linked to the movement of individuals, there is significant danger that the aggregate computation will violate the location privacy of individuals. This paper develops and evaluates PrivStats, a system for computing aggregate statistics over location data that simultaneously achieves two properties: first, provable guarantees on location privacy even in the face of any side information about users known to the server, and second, privacy-preserving accountability (i.e., protection against abusive clients uploading large amounts of spurious data). PrivStats achieves these properties using a new protocol for uploading and aggregating data anonymously as well as an efficient zero-knowledge proof of knowledge protocol we developed from scratch for accountability. We implemented our system on Nexus One smartphones and commodity servers. Our experimental results demonstrate that PrivStats is a practical system: computing a common aggregate (e.g., count) over the data of 10,000 clients takes less than 0.46 s at the server, and the protocol has modest latency (0.6 s) to upload data from a Nexus phone. We also validated our protocols on real driver traces from the CarTel project.

Categories and subject descriptors: C.2.0 [Computer Communication Networks]: General–Security and protection
General terms: Security

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CCS'11, October 17–21, 2011, Chicago, Illinois, USA. Copyright 2011 ACM 978-1-4503-0948-6/11/10 ...$10.00.

1. INTRODUCTION

The emergence of location-based mobile services and the interest in using them in road transportation, participatory sensing [28, 21], and various social mobile crowdsourcing applications has led to a fertile area of research and commercial activity. At the core of many of these applications are mobile nodes (smartphones, in-car devices, etc.) equipped with GPS or other position sensors, which (periodically) upload time and location coordinates to a server. This information is then processed by the server to compute a variety of aggregate statistics.

As primary motivation, consider applications that process streams of GPS position and/or speed samples along vehicle trajectories, sourced from smartphones, to determine current traffic statistics such as average speed, average delay on a road segment, or congestion at an intersection. Several research projects (e.g., CarTel [21], Mobile Millennium [27]) and commercial products (e.g., TomTom [34]) provide such services. Another motivation is social mobile crowdsourcing, for example, estimating the number of people at a restaurant to determine availability by aggregating data from position samples provided by smartphones carried by participating users.

A significant concern about existing implementations of these services is the violation of user location privacy. Even though the service only needs to compute an aggregate (such as a mean, standard deviation, density of users, etc.), most implementations simply continuously record time-location pairs of all the clients and deliver them to the server, labeled by which client they belong to [21, 27, 22]. In such a system, the server can piece together all the locations and times belonging to a particular client and obtain the client's path, violating her location privacy.

Location privacy concerns are important to address because many users perceive them to be significant (and may refuse to use or even oppose a service) and because they may threaten personal security. Recently (as of the time of writing this paper), two users have sued Google [26] over location data that Android phones collect, citing as one of the concerns a "serious risk of privacy invasions, including stalking." The lawsuit attempts to prevent Google from selling phones with software that can track user location. Just a week before, two users sued Apple [25] for violating privacy laws by keeping a log of user locations without offering users a way to disable this tracking or delete the log. Users of TomTom, a satellite navigation system, have expressed concern over the fact that TomTom logs user paths and sells aggregate statistics (such as speeding hotspots) to the police, who in turn install speed cameras [34]. A study by Riley [35] shows even wider location privacy fears: a significant number of drivers in the San Francisco Bay Area will not install toll transponders in their cars because of privacy concerns. Moreover, online databases are routinely broken into or abused by insiders with access; if that happens, detailed records of user mobility may become known to criminals, who can then attempt security attacks, such as burglarizing homes when they know that their residents are away [40].

In this paper, we design, implement, and evaluate PrivStats, a practical system for computing aggregate statistics in the context of mobile, location-based applications that achieves both strong guarantees of location privacy and protection against cheating clients. PrivStats solves two major problems: it provides formal location privacy guarantees against any side information (SI) attacks, and it provides client accountability without a trusted party. Because of these contributions, in comparison to previous systems [16], CliqueCloak [14], [19], [24], [20], and Triplines [18], PrivStats provides the strongest formal privacy and correctness guarantees while making the weakest trust assumptions.

Side information refers to any out-of-band information the server may have, which, when used together with the data the server gets in a system, can help the server compromise user location privacy. Some previous work ensures that clients upload data free of identifiers (they upload approximate time, location, and data sample), but even when all data is anonymized, a considerable amount of private location information can still leak due to side information. As a simple example, if the server knows that Alice is the only person living on a street, it can infer that Alice just left or arrived at her house when receiving some speed updates from that street. SI can come in many forms and has been shown to leak as much as full client paths in some cases: Krumm [24], as well as Gruteser and Hoh [17], show that one can infer driver paths from anonymized data and then link them to driver identities using side information such as knowledge of the map, typical driving behavior patterns, timing between uploads, and a public web service with addresses and names. Despite SI's ability to compromise considerable location privacy, previous systems for mobile-system aggregates either do not address the problem at all, or they only treat specific cases of it, such as areas of low density (see §10).

Our first contribution is to provide an aggregate statistics protocol with strong location privacy guarantees, including protection against any general side information attack. We formalize our guarantees by providing a definition of location privacy, termed strict location privacy or SLP (Def. 2), which specifies that the server learns nothing further about the location of clients in the face of arbitrary side information other than the desired aggregate result; then, in §5, we provide a protocol, also termed SLP, that provably achieves our definition.

While we manage to hide most sources of privacy leakage in our SLP protocol without any trust assumptions, hiding the number of tuples generated by clients information-theoretically can be reduced, under reasonable model assumptions, to a problem in distributed algorithms that can be shown to be impossible to solve. The reason is that it requires synchronization among a set of clients that do not know of each other. Our solution is to use a lightweight and restricted module, the smoothing module (SM), that helps clients synchronize with respect to the number of tuples uploaded and performs one decryption per aggregate. We distribute the SM on the clients (each "client SM" is responsible for handling a few aggregates) so as to ensure that at most a small fraction of SMs misbehave. We provide our strong SLP guarantees for all aggregates with honest SMs, while still ensuring a high level of protection for compromised SMs:
- A compromised SM cannot change aggregate results undetectably and cannot collude with clients to allow them to corrupt aggregates.
- Although a malicious SM opens the doors to potential SI leakage from the number of uploads and the values uploaded, a significant degree of privacy remains because client uploads are always anonymized, free of timing and origin information. Our SM is distributed on the clients, resulting in few compromised client SMs in practice.
- Our design of the SM (§5) facilitates distribution and ensuring its correctness: the SM has light load, ≈ 50 times less load than the server, performs two simple tasks summing up to only 62 lines of code (excluding libraries) which can be easily scrutinized for correctness, and only uses constant storage per aggregate.
- In contrast to our limited SM, trusted parties in previous work [18, 14, 20, 16] had the ability to fully compromise client paths and modify aggregate results when corrupted; at the same time, their use still did not provide general SI attack protection.

Since clients are anonymous, the problem of accountability becomes serious: malicious clients can significantly affect the correctness of the aggregate results. Hoh et al. [19] present a variety of attacks clients may use to change the aggregate result. For example, clients might try to divert traffic away from a road to reduce a particular driver's travel time or to keep low traffic in front of one's residence, or divert traffic toward a particular roadway to increase revenue at a particular store. A particular challenge is that accountability and privacy goals are in tension: if each client upload is anonymous, it seems that the client could upload as many times as she wishes. Indeed, most previous systems [16, 14, 24, 20] do not provide any protection against such client biasing attacks, and the few that provide such verification ([18, 19]) place this task on heavily loaded trusted parties that, when compromised, can release full location paths with user identities and can corrupt aggregate results.

Our second contribution is a privacy-preserving accountability protocol without any trusted parties (§6). We provide a cryptographic protocol for ensuring that each client can upload at most a fixed quota of values for each aggregate. At the core of the protocol is an efficient zero-knowledge proof of knowledge (ZKPoK) that we designed from scratch for this protocol. The zero-knowledge property is key in maintaining the anonymity of clients. Therefore, no trusted party is needed; in particular, the SM is not involved in this protocol.

Finally, our third contribution is an implementation of the overall system on Nexus One smartphones and commodity servers (§9). One of our main goals was to design a practical system, so we strived to keep our cryptographic protocols practical, including designing our ZKPoK from scratch. Computing a common aggregate (e.g., count) over the data of 10^4 clients takes less than 0.46 s at the server, the protocol has modest latency of about 0.6 s to upload data from a Nexus phone, and the throughput is linearly scalable in the number of processing cores. We believe these measurements indicate that PrivStats is practical. We also validated that our protocols introduce little or no error in statistics computation using real driver traces from the CarTel project [21].

2. MODEL

In this section, we discuss the model for PrivStats. Our setup captures a wide range of aggregate statistics problems for mobile systems.

2.1 Setting

In our model, we have many clients, a server, and a smoothing module (SM). Clients are mobile nodes equipped with smartphones: drivers in a vehicular network or peers in a social crowdsourcing application. The server is an entity interested in computing certain aggregate statistics over data from clients. The SM is a third party involved in the protocol that is distributed on clients.

We assume that clients communicate with the server and the SM using an anonymization network (e.g., Tor), a proxy forwarder, or another anonymizing protocol to ensure that privacy is not compromised by the underlying networking protocol. In this paper, we assume these systems succeed in hiding the origin of a packet from an adversary; it is out of our scope to ensure this. We assume that these systems also hide network travel time; otherwise clients must introduce an appropriate delay so that all communications have roughly the same round trip times. Clients can choose which aggregates they want to participate in; they can opt out if the result is deemed too revealing.

A sample is a client's contribution to an aggregate, such as average speed, delay on a road, or a bit indicating presence in a certain place; samples are generated at sample points in vehicular systems, or at certain location/times, as is the case for social crowdsourcing applications.

Figure 1: Example run of PrivStats: clients generate data at a sample point, contact the SM to learn how many tuples to upload, and upload encrypted tuples to the server.
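To make the model concrete, here is a small sketch of the kind of aggregate statistics the server wants over ⟨id, sample⟩ tuples. This is a hypothetical example of ours, not from the paper: the sample-point ids and speed values are made up, and in PrivStats the uploads would arrive anonymized and encrypted rather than in the clear.

```python
from collections import defaultdict

# Hypothetical raw uploads: (sample-point id, speed sample).
# In PrivStats the server never sees these in the clear; this only
# illustrates the aggregates of interest (count and average per id).
uploads = [(14, 30.0), (14, 45.0), (7, 60.0), (14, 60.0)]

sums = defaultdict(float)
counts = defaultdict(int)
for sid, sample in uploads:
    sums[sid] += sample
    counts[sid] += 1

# Average speed per sample point, e.g., id 14 -> (30+45+60)/3 = 45.0
averages = {sid: sums[sid] / counts[sid] for sid in sums}
assert counts[14] == 3 and averages[14] == 45.0
```

The rest of the paper is about computing exactly such counts and averages while the server sees only encrypted, anonymized, quota-limited uploads.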
I. System setup (Runs once, when the system starts).
1: Server generates public and private keys for accountability by running System setup from §6.
2: Both Server and SM generate public and private keys for the aggregation protocols by running System setup from §5.
II. Client join (Runs once per client, when a client signs up to the system).
1: Client identifies herself and obtains public keys and capabilities to upload data for accountability (by running Client join, §6) from Server, and public keys for aggregation (by running Client join, §5) from SM. Server also informs Client of the aggregates Server wants to compute, and Client decides to which aggregates she wants to contribute (i.e., generate tuples).
III. Client uploads for aggregate id (Runs when a client generates a tuple ⟨id, sample⟩).
1: Client runs SLP's Upload protocol (§5) by communicating with the SM for each ⟨id, sample⟩ and produces a set of transformed tuples T1, ..., Tk and a set of times, t1, ..., tk, when each should be uploaded to Server.
2: For each i, Client uploads Ti to Server at time ti.
3: Client also proves to Server, using the Accountability upload protocol (§6), that the upload is within the permissible quotas and acceptable sample interval.
IV. Server computes aggregate result id (Runs at the end of an aggregate's time interval).
1: Server puts together all the tuples for the same id and aggregates them by running Aggregation with SM, §5.

Figure 2: Overview of PrivStats.

For a security parameter k and an aggregate function F (which can also be a collection of aggregate functions), consider a protocol P = PF(k) for computing F. For example, the SLP protocol (§5) is such a protocol P, where F can be any of the aggregates discussed in §7. Let R be a collection of raw tuples generated by users in some time period; that is, R is a set of tuples of the form ⟨id, sample⟩ together with the precise time, client network information, and client id when they are generated. R is therefore the private data containing the paths of all users and should be hidden from the server. We refer to R as a raw-tuple configuration. Let SI be some side information available at the server about R. Using P, clients transform the tuples in R before uploading them to the server. (The sizes of R, F, and SI are assumed to be polynomial in k.)

The following definition characterizes the information available at the server when running protocol P. Since we consider a real system, the server observes the timing and network origin of the packets it receives; a privacy definition should take these into account.

DEF. 1 (SERVER'S VIEW). The view of the server, ViewP(R), is all the tuples the server receives from clients or SM in P, associated with time of receipt and any packet network information, when clients generate raw tuples R and run protocol P.

Let D be the domain of all raw-tuple configurations R. Let res be some aggregate result. Let RF(res) = {R ∈ D : F(R) = res} be all possible collections of raw tuples where the associated aggregate result is res.

Security game. We describe the SLP definition using a game between a challenger and an adversary. The adversary is a party that would get access to all the information the server gets. Consider a protocol P, a security parameter k, a raw-tuple domain D, and an adversary Adv.
1: The challenger sets up P by choosing any secret and public keys required by P with security parameter k and sends the public keys to Adv.
2: Adv chooses SI, res, and R0, R1 ∈ RF(res) (to facilitate guessing based on SI) and sends R0 and R1 to the challenger.
3: Challenger runs P producing ViewP(R0) and ViewP(R1), and sends them to Adv. It chooses a fair random bit b and sends ViewP(Rb) to Adv. (ViewP usually contains probabilistic encryption, so two encryptions of Rb will lead to different values.)
4: The adversary outputs its best guess b∗ and wins this game if b = b∗. Let winP(Adv, k) := Pr[b = b∗] be the probability that Adv wins this game.

DEF. 2 (STRICT LOCATION PRIVACY – SLP). A protocol P has strict location privacy with respect to a raw-tuple domain D if, for any polynomial-time adversary Adv, winP(Adv, k) ≤ 1/2 + a negligible function of k.

Intuitively, this definition says that, when the server examines the data it receives and any side information, all possible configurations of client paths having the same aggregate result are equally likely. Therefore, the server learns nothing new beyond the aggregate result.

Strict location privacy and differential privacy. The guarantee of strict location privacy is complementary to those of differential privacy [11], and these two approaches address different models. In SLP, the server is untrusted and all that it learns is the aggregate result. In a differential privacy setting, the server is trusted and knows all private information of the clients; clients issue queries to the database at the server, and clients only learn aggregate results that do not reveal individual tuples. Of course, allowing the server to know all clients' private path information is unacceptable in our setting. PrivStats' model takes the first and most important step for privacy: hiding the actual paths of the users from the server. Actual paths leak much more than common aggregate results in practice. A natural question is whether one can add differential privacy on top of PrivStats to reduce leakage from the aggregate result, while retaining the guarantees of PrivStats; we discuss this in §8.

4. OVERVIEW

PrivStats consists of running an aggregation protocol and an accountability protocol. The aggregation protocol (§5) achieves our strict location privacy (SLP) definition, Def. 2, and leaks virtually nothing about the clients other than the aggregate result.
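As a rough intuition for the security game behind Def. 2, the following toy simulation (our own sketch, not part of PrivStats) models a protocol whose view depends only on the aggregate result plus fresh randomness standing in for probabilistic encryption and randomized timing; any guessing strategy then wins the game with probability close to 1/2.

```python
import random

random.seed(0)

# Two raw-tuple configurations with the same aggregate result (here, the sum).
R0 = [10, 20, 30]
R1 = [15, 15, 30]
assert sum(R0) == sum(R1)

def view(R):
    # The "view" reveals only the aggregate result plus fresh randomness;
    # nothing in it depends on which configuration produced it.
    return (sum(R), random.random())

def adversary_guess(v0, v1, vb):
    # A best-effort adversary: compare the challenge view against the two
    # sample views. Since the views differ only in fresh randomness,
    # this can do no better than chance.
    return 0 if abs(vb[1] - v0[1]) < abs(vb[1] - v1[1]) else 1

trials, wins = 10000, 0
for _ in range(trials):
    b = random.randrange(2)
    v0, v1 = view(R0), view(R1)
    vb = view(R0 if b == 0 else R1)
    wins += (adversary_guess(v0, v1, vb) == b)

rate = wins / trials
assert 0.45 < rate < 0.55   # advantage over 1/2 is negligible
```

A protocol fails this game exactly when some component of the view (count, timing, origin, or ciphertexts) correlates with the chosen configuration, which is why the SLP protocol below must hide all of them.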
The accountability protocol (§6) enables the server to check three properties of each client's upload without learning anything about the identity of the client: the client did not exceed a server-imposed quota of uploads for each sample point, did not exceed a total quota of uploads over all sample points, and the sample uploaded is in an acceptable interval of values.

Figure 1 illustrates the interaction of the three components in our system on an example: computation of aggregate statistic with id 14, average speed. The figure shows Alice and Bob generating data when passing through the corresponding sample point and then contacting the SM to know how many tuples to upload. As we explain in §5, the data that reaches the server is anonymized, contains encrypted speeds, arrives at random times independent of the time of generation, and is accompanied by accountability proofs to prevent malicious uploads. Moreover, the number of tuples arriving is uncorrelated with the real number. Figure 2 provides an overview of our protocols, with components elaborated in the upcoming sections.

Figure 3: Staged randomized timing upload for a client C: (1) is the generation interval, (2) the synchronization interval, and (3) the upload interval. C passes by a sample point and generates a tuple in (1), contacts the SM to synchronize at random timing in (2), and uploads a real and a junk tuple to the server at random timing in (3).

5. AGGREGATION: THE SLP (STRICT LOCATION PRIVACY) PROTOCOL

A protocol with strict location privacy must hide all five leakage vectors (included in Def. 1): client identifier, network origin of packet, time of upload, sample value, and number of tuples generated for each aggregate (i.e., the number of clients passing by a sample point). The need to hide the first two is evident, and we already discussed the last three vectors in §3.

Hiding the identifier and network origin. As discussed in §2, clients never upload their identities (which is possible due to our accountability protocol described in §6). We hide the network origin using an anonymizing network as discussed in §2.1.

Hiding the sample. Clients encrypt their samples using a (semantically secure) homomorphic encryption scheme. Various schemes can be used, depending on the aggregate to be computed; Paillier [30] is our running example and it already enables most common aggregates. Paillier has the property that E(a) · E(b) = E(a + b), where E(a) denotes encryption of a under the public key, and the multiplication and addition are performed in appropriate groups. Using this setup, the server computes the desired aggregate on encrypted data; the only decrypted value the server sees is the final aggregate result (due to the SM, as described later in this section).

It is important for the encryption scheme to be verifiable to prevent the SM from corrupting the decrypted aggregate result. That is, given the public key PK and a ciphertext E(a), when given a and some randomness r, the server can verify that E(a) was indeed an encryption of a using r, and there is no b such that E(a) could have been an encryption of b for some randomness. Also, the holder of the secret key should be able to compute r efficiently. Fortunately, Paillier has this property because it is a trapdoor permutation; computing r involves one exponentiation [30].

Hiding the number of tuples. The server needs to receive a number of tuples that is independent of the actual number of tuples generated. The idea is to arrange that the clients will upload in total a constant number of tuples, Uid, for each aggregate id. Uid is a publicly-known value, usually an upper bound on the number of clients generating tuples, computed based on historical estimates of how many clients participate in the aggregate. §9 explains how to choose Uid.

The difficulty with uploading Uid in total is that clients do not know how many other clients pass through the same sample point, and any synchronization point may itself become a point of leakage for the number of clients. Assuming that there are no trusted parties (i.e., everyone tries to learn the number of tuples generated), under a reasonable formalization of our specific practical setting, the problem of hiding the number of tuples can be reduced to a simple distributed algorithms problem which is shown to be impossible (as presented in a longer version of our paper at http://nms.csail.mit.edu/projects/privacy/). For this reason, we use a party called the smoothing module (SM) that clients use to synchronize and upload a constant number of tuples at each aggregate. The SM will also perform the final decryption of the aggregate result. The trust requirement on the SM is that it does not decrypt more than one value for the server per aggregate and it does not leak the actual number of tuples. Although the impossibility result provides evidence that some form of trust is needed for SLP, we do not claim that hiding the number of tuples would remain impossible if one weakened the privacy guarantees or altered the model. However, we think our trust assumption is reasonable, as we explain in §8: a malicious SM has limited effect on privacy (it does not see the timing, origin, and identifier of client tuples) and virtually no effect on aggregate results (it cannot decrypt an incorrect value or affect accountability). Moreover, the SM has light load, and can be distributed on clients, ensuring that most aggregates have our full guarantees. For simplicity, we describe the SLP protocol with one SM and subsequently explain how to distribute it on clients.

Since the number of real tuples generated may be less than Uid, we will arrange to have clients occasionally upload junk tuples: tuples that are indistinguishable from legitimate tuples because the encryption scheme is probabilistic. Using the SM, clients can figure out how many junk tuples they should upload. The value of a junk tuple is a neutral value for the aggregate to be computed (e.g., zero for summations), as we explain in §7.

To summarize, the SM needs to perform only two simple tasks:
• Keep track of the number of tuples that have been uploaded so far (sid) for an aggregate (without leaking this number to an adversary). Clients use this information to figure out how many more tuples they should upload at the server to reach a total of Uid.
• At the end of the time period for an aggregate id, decrypt the value of the aggregate result from the server.

Hiding timing. Clients upload tuples according to a staged randomized timing algorithm described in Figure 3; this protocol hides the tuple generation time from both the server and the SM. The generation interval corresponds to the interval of aggregation when clients generate tuples, and the synchronization and upload intervals are added at the end of the generation interval, but are much shorter (they are intervals rather than single points only to ensure reasonable load per second at the SM and server).

We now put together all these components and specify the SLP protocol. Denote an aggregate by id and let quota be the number of samples a client can upload at id. Let [ts0, ts1] be the sync. interval.
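The Paillier-based aggregation described above can be sketched as follows. This is a minimal textbook Paillier implementation with toy parameters (insecure key sizes; our own demo values), only to show that multiplying ciphertexts adds the underlying samples, so the server needs just one decryption at the end:

```python
import math, random

# Toy Paillier keypair with small demo primes -- illustration only,
# not secure; real deployments use primes of >= 1024 bits.
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1                      # standard choice of generator
lam = math.lcm(p - 1, q - 1)   # Carmichael function of n

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2   # probabilistic: fresh r each time

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Server-side aggregation: multiplying ciphertexts adds the plaintexts,
# E(a) * E(b) = E(a + b), so the server sums samples without seeing them.
speeds = [30, 45, 60]                  # clients' samples (e.g., speeds)
agg = 1
for c in (encrypt(s) for s in speeds):
    agg = (agg * c) % n2
assert decrypt(agg) == sum(speeds)     # one decryption yields the aggregate
```

A junk tuple here is simply `encrypt(0)`: it changes nothing in the sum and, because encryption is probabilistic, is indistinguishable from a real sample.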
per (http://nms.csail.mit.edu/projects/privacy/); nev- I. System setup: SM generates a Paillier secret and public key. ertheless, the proof is easy enough to understand in such terms. II. Client joins: Client obtains the public key from the SM. According to Def. 2, we need to argue that the information in the III. Upload: view of the server (besides the aggregate result and any SI the server ◦ G ENERATION INTERVAL already knows) when clients generate two different collections of 1: Client generates a tuple hid, samplei and computes T1 := raw tuples, R0 and R1 , is indistinguishable. The information that hid, E[sample]i using SM’s public key. reaches the server for id is a set of tuples consisting of three compo- 2: Client selects a random time ts in the sync. interval and does nents: time of receipt, network origin, and encrypted sample. Due nothing until ts . to our Uid scheme, the number of tuples arriving at the server in both ◦ S YNCHRONIZATION INTERVAL cases is the same, Uid . Next, our staged upload protocol ensures 3: Client ↔ SM : At time ts , Client requests sid , the number that the timing of upload is uniformly at random distributed and the of tuples other clients already engaged to upload for id from network origin is hidden. The encryption scheme of the samples is SM (sid = 0 if Client is the first to ask sid for id). semantically-secure so, by definition, any two samples encrypted 4: Client should upload ∆s := min(quota, dUid (ts − with this scheme are indistinguishable to any polynomial adversary. ts0 )/(ts1 − ts0 )e − sid ) tuples. If ∆s ≤ 0, but sid < Uid , as- Since all these three components are independent of each other, the sign ∆s := 1. (In parallel, SM updates sid to sid := ∆s+sid total set of tuples for R0 and R1 are indistinguishable. because it also knows quota and it assumes that Client will Note that this theorem assumes that the clients, server, and SM upload her share.) 
run the SLP protocol correctly; specifically, the server performs all tasks in the protocol correctly, although it may try to infer private information from the data it receives. In §8, we discuss the protection PrivStats provides when these parties deviate from the SLP protocol in various ways.

   5: Client picks ∆s random times, t1, ..., t∆s, in the upload interval to use when uploading tuples.
   ◦ UPLOAD INTERVAL:
   6: Client → Server: The client uploads ∆s tuples at times t1, ..., t∆s. One of the tuples is the real tuple T1; the others, if any, are junk tuples.
   IV. Aggregation:
   1: At the end of the interval, Server computes the aggregate on encrypted samples by homomorphically adding all samples for id.
   2: Server ↔ SM: Server asks SM to decrypt the aggregate result. SM must decrypt only one value per aggregate. Server verifies that SM returned the correct decrypted value, as discussed.

The equation for ∆s is designed so that for each time t in the synchronization interval, approximately a total of Uid · (t − ts0)/(ts1 − ts0) tuples have been scheduled by clients to be uploaded during the upload interval. Therefore, when t becomes ts1, Uid tuples should have been scheduled for upload.

We assign ∆s := 1 when ∆s comes out negative or zero because, as long as sid < Uid, it is better to schedule clients for uploading in order to reach the value of Uid by the end of the time interval; this approach avoids too many junk tuples (which do not carry useful information) being uploaded later in the interval if insufficiently many clients show up then. Moreover, this ensures that, as long as Uid is an upper bound on the number of clients passing by, few clients will refrain from uploading in practice.

Choice of Uid and quota. A reasonable choice is quota = 3 and Uid = avgid + stdid, where avgid is an estimate of the average number of clients passing through a sample point and stdid of the standard deviation; both these values can be deduced from historical records or known trends of the sample point id. We justify and evaluate this choice of Uid and quota in §9.2.

Let D∗ be the set of all raw-tuple configurations in which clients succeed in uploading Uid tuples at sample point id. Raw-tuple configurations in practice should be a subset of this set for properly chosen Uid and quota.

THEOREM 1. The SLP protocol achieves the strict location privacy definition from Def. 2 for D∗.

PROOF. Due to space constraints, we provide a high-level proof, leaving a mathematical exposition for an extended version of this paper.

Note that we provide the SLP guarantees as long as clients succeed in uploading Uid tuples (as captured in the definition of D∗). In the special case when an aggregate is usually popular (so it has high Uid) but suddenly falls in popularity (at least quota = 3 times less), clients may not be able to reach Uid. One potential solution is to have clients volunteer to "watch" over certain sample points they do not necessarily traverse, by uploading junk tuples within the same quota restrictions, if sid is significantly lower than Uid towards the end of the synchronization interval.

Also note that the result of an aggregate statistic is available at the server only at the end of the generation, synchronization, and upload intervals. The last two intervals are short, but the first one is as long as the time interval for which the server wants to compute the aggregate. This approach inherently does not provide real-time statistics; however, it is possible to bring it closer to real-time statistics if the server chooses the generation interval to be short, e.g., the server runs the aggregation for every 5 minutes of an hour of interest.

5.1 Distributing the Smoothing Module

So far, we referred to the smoothing module as one module for simplicity, but we recommend the SM be distributed on the clients: this distributes the already limited trust placed on the SM. Each client will be responsible for running the SM for one or a few aggregate statistics. We denote by a client SM the program running on a client performing the task of the SM for a subset of the sample points. Therefore, each physical client will be running the protocol for the client described above and the protocol for one client SM (these two roles being logically separated).

A set of K clients will handle each aggregate to ensure that the SLP protocol proceeds even if K − 1 clients become unavailable (e.g., fail, lose connectivity). Assuming system membership is public, anyone can compute which K clients are assigned to an aggregate id using consistent hashing [23]; this approach is a standard distributed systems technique with the property that, roughly, load is balanced among clients and clients are assigned to aggregates in a random way (see [23] for more details).

The design of the SM facilitates distribution to smartphones: the SM has light load (≈ 50 times less than the server), performs two simple tasks, and uses constant storage per aggregate (see §9).

We now discuss how the two tasks of a client SM – aggregate
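The scheduling rule just described can be sketched in a few lines (an illustrative sketch, not the paper's exact formula; the function name and the uniform choice of upload times are our own assumptions):

```python
import random

def schedule_uploads(U_id, s_id, t, ts0, ts1, tu0, tu1):
    """Staged-timing sketch: how many tuples (Delta_s) a client arriving
    at time t should schedule, and at which random times in the upload
    interval [tu0, tu1] to send them.

    U_id is the target tuple count for the aggregate, s_id the count of
    tuples already scheduled (as reported by the SM), and [ts0, ts1]
    the synchronization interval.
    """
    # By time t, roughly U_id * (t - ts0) / (ts1 - ts0) tuples should
    # already have been scheduled; this client covers the shortfall.
    target = U_id * (t - ts0) / (ts1 - ts0)
    delta_s = round(target - s_id)
    if delta_s <= 0:
        delta_s = 1  # always schedule at least the client's real tuple
    return delta_s, sorted(random.uniform(tu0, tu1) for _ in range(delta_s))
```

For example, halfway through the synchronization interval with Uid = 100 and sid = 0, an arriving client would schedule 50 tuples (one real, the rest junk); once sid has caught up with the target, each new client schedules just its real tuple.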
result decryption and maintaining the total number of tuples – are accomplished in a distributed setting.

During system setup, one of the K clients chooses a secret and a public key and shares them with the other K − 1 clients. The server can ask any SM to decrypt the aggregate result: since the server can verify the correctness of the decryption, the server can ask a different SM in case it establishes that the first one misbehaved. Client SMs should notify each other if the server asked for a decryption, to prevent the server from asking different client SMs for different decryptions. Decryption is a light task for client devices because it happens only once per aggregate.

For a given aggregate, the aggregate result will be successfully decrypted if at least one of the K client SMs is available. K should be chosen based on the probability that clients become unavailable, which depends on the application. Since it is enough for only one SM to function for correctness, in general we expect that a small value of K should suffice in practice (e.g., K = 4).

For maintaining the total number of tuples, we designed our staged timing protocol (Fig. 3) in such a way that client SMs need only be available for the synchronization time interval. This interval is shorter than the generation interval. A client should ask each client SM corresponding to a sample point for the value of sid in parallel. Since some client may not succeed in contacting all the client SMs, the client SMs should synchronize from time to time. To facilitate client SM synchronization, each client should choose a random identifier for the request and provide it to each client SM so that SMs can track which requests they missed when they synchronize. (As an optimization, for very popular aggregates, one might consider not trying to hide the number of tuples because the leakage is less for such aggregates. For example, an aggregate can be conservatively considered popular if there are at least 1000 clients passing per hour.)

Our strict privacy guarantees will hold for an aggregate if none of the K client SMs for that aggregate are malicious. Since the fraction of malicious clients is small in practice, our strong privacy guarantees will be achieved for most of the aggregates.

6. ACCOUNTABILITY

Since each client uploads tuples anonymously, malicious clients might attempt to bias the aggregates by uploading multiple times. In this section, we discuss PrivStats's accountability protocol, which enforces three restrictions to protect against such attacks: a quota on each client's uploads for each sample point, a total upload quota per client, and a guarantee that each sample value must be within an acceptable interval. The protocol must also achieve the conflicting goal of protecting location privacy. The most challenging part is to ensure that each client uploads within quota at each sample point; the other two requirements are provided by previous work.

We note that the SM is not involved in any part of accountability; the server performs the accountability checks by itself. This is one of the main contributions of PrivStats: with no reliance on any (even partially or distributed) trusted party, the server is able to enforce a quota on the number of uploads of each client without learning who performed any given upload.

The idea is to have each client upload a cryptographic token whose legitimacy the server can check and which does not reveal any information about the client; furthermore, a client cannot create more than her quota of legitimate tokens for a given aggregate. Thus, if a client exceeds her quota, she will have to re-use a token, and the server will note the duplication, discarding the tuple.

Notation and conventions. Let SKs be the server's secret signing key and PKs the corresponding public verification key. Consider an aggregate with identifier id. Let Tid be the token uploaded by a client for aggregate id. All the values in this protocol are computed in Zn, where n = p1·p2 for two safe primes p1 and p2, even if we do not always mark this fact. We make standard cryptographic assumptions that have been used in the literature, such as the strong-RSA assumption.

6.1 Protocol

For clarity, we explain the protocol for a quota of one tuple per aggregate id per client and then explain how to use the protocol for a larger quota.

The accountability protocol consists of an efficient zero-knowledge proof of knowledge protocol we designed from scratch for this problem. Proofs of knowledge [15], [3], [36] are proofs by which one can prove that she knows some value that satisfies a certain relation. For example, Schnorr [36] provided a simple and efficient algorithm for proving possession of a discrete log. The zero-knowledge [15] property provides the guarantee that no information about the value in question is leaked.

System setup. The server generates a public key (PKs) and a private key (SKs) for the signature scheme from [10].

Client joins. Client identifies herself to Server (using her real identity) and Server gives Client one capability (allowing her to upload one tuple per aggregate) as follows. Client picks a random number s and obtains a blind signature from Server on s, denoted sig(s) (using the scheme in [10]). The pair (s, sig(s)) is a capability. A capability enables a client to create exactly one correct token Tid for each aggregate id. The purpose of sig(s) is to attest that the capability of the client is correct. Without the certification provided by the blind signature, Client could create her own capabilities and upload more than she is allowed.

The client keeps s and sig(s) secret. Since the signature is blind, Server never sees s or sig(s). Otherwise, the server could link the identity of Client to sig(s); it could then test each token Tid received to see if it was produced using sig(s) and learn for which tuples Client uploaded, and hence her path. By running the signing protocol only once with Client, Server can ensure that it gives only one capability to Client.

Accountability upload:
   1. Client → Server: Client computes Tid = id^s mod n and uploads it together with a tuple. Client also uploads a zero-knowledge proof of knowledge (ZKPoK, below in Sec. 6.2) with which Client proves that she knows a value s such that id^s = Tid mod n and for which she has a signature from the server.
   2. Server → Client: Server checks the proof and discards the tuple if the proof fails or if the value Tid has been uploaded before for id (signaling an over-upload).

We prove the security of the scheme in Sec. 6.2 and the appendix. Intuitively, Tid does not leak s because of the hardness of the discrete log problem. The client's proof is a zero-knowledge proof of knowledge (ZKPoK), so it does not leak s either. Since Tid is computed deterministically, more than one upload for the same id will result in the same value of Tid. The server can detect these over-uploads and throw them away. The client cannot produce a different Tid∗ for the same id by using a different s∗ because she cannot forge a signature of the server for s∗; without such a signature, Client cannot convince Server that she has a signature for the exponent of id in Tid∗. Here we assumed that id is randomly distributed.

Quotas. To enforce a quota > 1, the server simply issues a quota of capabilities during registration. In addition, the server may want to tie quotas to aggregates. To do so, the server divides the aggregates into categories. For each category, the server runs system
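The token mechanics can be illustrated with toy numbers (a sketch only: the blind signature and the ZKPoK are omitted, and the tiny modulus and constants are purely illustrative — in the protocol, n is a large product of two safe primes):

```python
# Toy illustration of the accountability token T_id = id^s mod n and the
# server's duplicate detection. The capability s would be certified by a
# blind signature in the real protocol; that part is simplified away here.
n = 101 * 103   # toy modulus
s = 12345       # client's secret capability value

def token(agg_id: int) -> int:
    # Deterministic: re-uploading for the same aggregate id necessarily
    # reproduces the same token.
    return pow(agg_id, s, n)

seen = set()
def server_accept(agg_id: int, T: int) -> bool:
    # The server remembers tokens seen per aggregate; a duplicate signals
    # an over-upload, without revealing which client sent it.
    if (agg_id, T) in seen:
        return False
    seen.add((agg_id, T))
    return True
```

A first upload for an aggregate is accepted, a second one built from the same capability is rejected, while uploads for other aggregates (or by clients holding other values of s) yield unrelated tokens.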
setup separately, obtaining different SKs and PKs, and then gives a different number of capabilities to clients for each category.

Quota on total uploads. We also have a simple cryptographic protocol to enforce a quota on how much a client can upload in total over all aggregates, not presented here due to space constraints; however, any existing e-cash scheme suffices here [8].

Ensuring reasonable values. As mentioned, in addition to quota enforcement, for each tuple uploaded, clients prove to the server that the encrypted samples are in an expected interval of values (e.g., for speed, the interval is (0, 150)) to prevent clients from uploading huge numbers that affect the aggregate result. Such a proof is done using the efficient interval zero-knowledge proof of [5]; the server makes such an interval publicly available for each aggregate. We discuss the degree to which fake tuples can affect the result in section 9.2.

6.2 A Zero-Knowledge Proof of Knowledge

We require a commitment scheme (such as the one in [31]): recall that a commitment scheme allows a client Alice to commit to a value x by computing a ciphertext ciph and giving it to another client Bob. Bob cannot learn x from ciph. Later, Alice can open the commitment by providing x and a decommitment key that are checked by Bob. Alice cannot open the commitment for x0 ≠ x and pass Bob's verification check.

   Public inputs: id, Tid, PKs, g, h
   Client's input: s, σ = sig(s)
   Server's input: SKs

CONSTRUCTION 1. Proof that Client knows s and σ such that id^s = Tid and σ is a signature by Server on s.

   1. Client computes a Pedersen commitment to s: com = g^s · h^r mod n, where r is random. Client proves to Server that she knows a signature sig(s) from the server on the value committed in com, using the protocol in [10].
   2. Client proves that she knows s and r such that Tid = id^s and com = g^s · h^r as follows:
      (a) Client picks k1 and k2 at random in Zn. She computes T1 = g^k1 mod n, T2 = h^k2 mod n, and T3 = id^k1 mod n and gives them to the server.
      (b) Server picks c, a prime number, at random and sends it to Client.
      (c) Client computes r1 = k1 + s·c and r2 = k2 + r·c and sends them to Server.
      (d) Server checks whether com^c · T1 · T2 ≡ g^r1 · h^r2 (mod n) and Tid^c · T3 ≡ id^r1 (mod n). If the checks succeed, it outputs "ACCEPT"; else, it outputs "REJECT".

This proof can be made non-interactive using the Fiat-Shamir heuristic [13] or following the proofs in [5]; due to space limits, we do not present the protocol here. The idea is that the client computes c herself using a hash of the values she sends to the server, in a way that makes it computationally infeasible for the client to cheat.

THEOREM 2. Under the strong-RSA assumption, Construction 1 is a zero-knowledge proof of knowledge that the client knows s and σ such that id^s = Tid mod n and σ is a signature by the server on s.

Due to space constraints, the proof is presented in a longer version of this paper at http://nms.csail.mit.edu/projects/privacy/.

6.3 Optimization

To avoid performing the accountability check for each tuple uploaded, the server can perform this check probabilistically; with a probability q, it checks each tuple for aggregate id. If the server notices an attempt to bias the aggregate (a proof failing or a duplicate token), it should then check all proofs for the aggregate from then on or, if it stored the old proofs it did not verify, it can check them all now to determine which to disregard from the computation. Note that the probability of detecting an attempt to violate the upload quota, Q, increases exponentially in the number of repeated uploads, n: Q = 1 − (1 − q)^n. For q = 20%, after 10 repeated uploads, the chance of detection is already 90%. We recommend q = 20%, although this value should be adjusted based on the expected number of clients uploading for an aggregate.

6.4 Generality of Accountability

The accountability protocol is general and not tied to our setting. It can be used to enable a server to prevent clients from uploading more than a quota for each of a set of "subjects" while preserving their anonymity. For example, a class of applications is online reviews and ratings in which the server does not need to be able to link uploads from the same user, e.g., course evaluations. Students would like to preserve their anonymity, while the system needs to prevent them from uploading more than once for each class.

7. AGGREGATE STATISTICS SUPPORTED

SLP supports any aggregates for which efficient homomorphic encryption schemes and efficient verification or proof of decryption exist. Additive homomorphic encryption (e.g., Paillier [30]) already supports many aggregates used in practice, as we explain below. Note that if verification of decryption by the SM is not needed (e.g., in a distributed setting), a wider range of schemes is available. Also note that if the sample value is believed not to leak, arbitrary aggregates could be supported by simply not encrypting the sample.

Table 1 lists some representative functions of practical interest supported by PrivStats. Note the generality of sum and average of functions — F could include conditionals or arbitrary transformations based on the tuple. When computing the average of functions, some error is introduced due to the presence of junk tuples because each sample is weighted by the number of uploads. However, as we show in §9.2, the error is small (e.g., < 3% for average speed for CarTel).

Note in Table 1 that count is a special case for which we can make some simplifications: we do not need the SM at all. The reason is that the result of the aggregate itself equals the number of tuples generated at the sample point, so there is no reason to hide this number any more. Also, since each sample of a client passing through the sample point is 1, there is no reason to hide this value, so we remove the sample encryptions. Instead of contacting the SM in the sync. interval (Fig. 3), clients directly upload to the server at the same random timing.

Additive homomorphic encryption does not support median, min, and max. For the median, one possible solution is to approximate it with the mean. Another possible solution is to use order-preserving encryption [4]: for two values a < b, the encryption of a will be smaller than the encryption of b. This approach enables the server to directly compute median, min, and max. However, it has the drawback that the server learns the order on the values. An alternative that leaks less information is as follows: to compute the min, a client uploads an additional bit with her data. The bit is 1 if her sample is smaller than a certain threshold. The server can collect all values with the bit set and ask the SM to identify and decrypt the minimum. (Special-purpose protocols for other statistics are similar.)
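The algebra of steps 2(a)–(d) can be checked mechanically; the sketch below runs one round of Construction 1 with tiny illustrative parameters (our own choices of n, g, h, s, r) and skips the signature sub-proof of step 1:

```python
import random

n = 101 * 103            # toy modulus (in practice a large product of safe primes)
g, h = 2, 3              # toy bases for the Pedersen commitment
agg_id = 7               # the aggregate identifier id
s, r = 4321, 987         # client's secret s and commitment randomness r

T_id = pow(agg_id, s, n)                   # token T_id = id^s mod n
com = (pow(g, s, n) * pow(h, r, n)) % n    # com = g^s * h^r mod n

# (a) client's first message
k1, k2 = random.randrange(1, n), random.randrange(1, n)
T1, T2, T3 = pow(g, k1, n), pow(h, k2, n), pow(agg_id, k1, n)

# (b) server's challenge; primality matters for the security proof's
# extraction argument, not for an honest run, so any c works here
c = 11

# (c) client's responses, computed over the integers
r1, r2 = k1 + s * c, k2 + r * c

# (d) server's verification equations
check1 = (pow(com, c, n) * T1 * T2) % n == (pow(g, r1, n) * pow(h, r2, n)) % n
check2 = (pow(T_id, c, n) * T3) % n == pow(agg_id, r1, n)
assert check1 and check2   # an honest client always passes
```

Both checks pass for any honest transcript: com^c · T1 · T2 expands to g^(k1 + s·c) · h^(k2 + r·c) = g^r1 · h^r2, and Tid^c · T3 expands to id^(k1 + s·c) = id^r1.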
Table 1: Aggregate statistics supported by PrivStats, with example applications and implementation.

   Summation — Example: number of people carpooled through an intersection. Implementation: each client uploads an encryption of the number of people in the car; junk tuples are encryptions of zero.
   Sum of functions, Σi F(tuple_i) — Example: count of people exceeding the speed limit; this is a generalization of summation. Implementation: each client applies F to her tuple and uploads an encryption of the resulting value; junk tuples are encryptions of zero.
   Average — Example: average speed or delay. Implementation: see average of functions with F(tuple) = sample.
   Standard deviation — Example: standard deviation of delays. Implementation: compute average of functions (see below) with F(tuple) = sample^2 and denote the result Avg1; separately compute the average with F(tuple) = sample and denote the result Avg2. The server computes sqrt(Avg1 − (Avg2)^2). This procedure also reveals the average to the server, but likely one would also want to compute the average when computing the standard deviation.
   Average of functions, (1/N)·Σi F(tuple_i) — Example: average number of people in a certain speed range; average speed and delay; this is a generalization of average. Implementation: if count (see below) is also computed for the sample point, compute Σi F(tuple_i) instead, and then the server can divide by the count. Otherwise, compute the sum of functions, where junk tuples equal real tuples, and the server divides the result by the corresponding Uid.
   Count — Example: traffic congestion, i.e., the number of drivers at an intersection. Implementation: the SM is not needed at all for the count computation. Clients do not upload any sample value (uploading simply ⟨id⟩ is enough to indicate passing through the sample point). There are no junk tuples.

8. PRIVACY ANALYSIS

We saw that if the clients, smoothing module, and the server follow the SLP protocol, clients have strong location privacy guarantees. We now discuss the protection offered by PrivStats if the server, SM, or clients maliciously deviate from our strict location privacy protocol.

Malicious server. The server may attempt to ask the SM to decrypt the value of one sample as opposed to the aggregate result. However, this prevents the server from obtaining the aggregate result, since the SM will only decrypt once per aggregate id, and we assume that the server has incentives to compute the correct result. Nevertheless, to deter such an attack, the SM could occasionally audit the server by asking the server to supply all samples it aggregated. The SM checks that there are roughly Uid samples provided and that their aggregation is indeed the supplied value.

The server may attempt to act as a client and ask the SM for sid (because the upload is anonymous) to figure out the total number of tuples. Note that, if the aggregate is not underpopulated, when the server asks for sid at time t, it will receive as answer approximately Uid · (t − ts0)/(ts1 − ts0), a value it knew even without asking the SM. This is one main reason why we used the "growing sid" idea for the sync. interval (Fig. 3), as opposed to just having the SM count how many clients plan to upload and then informing each client of the total number at the end of the sync. interval so that they can scale up their uploads to reach Uid. Therefore, to learn useful information, the server must ask the SM for sid frequently to catch changes in sid caused by other clients. However, when the server asks the SM for sid, the SM increases sid as well. Therefore, the SM will reach the Uid quota early in the sync. interval and not allow other clients to upload; hence, the server will not get an accurate aggregate result, which is against its interest.

Malicious SM. Since the server verifies the decryption, the SM cannot change the result. Even if the SM colludes with clients, the SM has no effect on the accountability protocol because it is entirely enforced by the server. If the SM misbehaves, the damage is limited because clients upload tuples without identifiers or network origin, and our staged timing protocol for upload hides the time when the tuples were generated even from the SM. A compromised SM does permit SI attacks based on sample value and number of tuples; however, if the SM is distributed, we expect the fraction of aggregates with a compromised SM to be small and hence our SLP guarantees to hold for most aggregates.

The SM may attempt a DoS attack on an aggregate by telling clients that Uid tuples have already been uploaded and only allowing one client with a desired value to upload. However, the server can easily detect such cheating when getting few uploads.

Malicious clients. Our accountability protocol prevents clients from uploading too much at an aggregate, too much over all aggregates, and out-of-range values. Therefore, clients cannot affect the correctness of an aggregate result in this manner. As mentioned in §2, we do not check whether the precise value a client uploads is correct, but we show in §9.2 that an incorrect value in the allowable range will likely not introduce a significant error in the aggregate result.

Colluding clients may attempt to learn the private path of a specific other client. However, since our SLP definition models a general adversary with access to server data, such clients will not learn anything beyond the aggregate result and the SI they already knew (which includes, for example, their own paths). A client may try a DoS attack by repeatedly contacting the SM so that the SM thinks Uid tuples have been uploaded. This attack can be prevented by having the SM also run the check for the total number of tuples a client can upload; Uid for a popular aggregate should be larger than the number of aggregates a client can contribute to in a day. For a more precise, but expensive, check, the SM could run a check for the quota of tuples per aggregate, as the server does.

Aggregate result. In some cases, the aggregate result itself may reveal something about clients' paths. However, our goal was to enable the server to know this result accurately, while not leaking additional information. As mentioned, clients can choose not to participate in certain aggregates (see §2).

Differential privacy protocols [11] add noise to the aggregate result to avoid leaking individual tuples; however, most such protocols leak all the private paths to the server by the nature of the model. §3 explains how the SLP and differential privacy models are complementary. A natural question is whether one can add differential privacy on top of the PrivStats protocol while retaining PrivStats's guarantees. At a high level, the SM, upon decrypting the result, may decide how much noise to add to the result to achieve differential privacy, also using the number of true uploads it knows (the number of times it was contacted during the sync. interval). The design of a concrete protocol for this problem is future work. Related to this question is the work of Shi et al. [37], who proposed a protocol combining data hiding from the server with differentially-private aggregate computation at the server; they consider the different
setting of n parties streaming data to a server computing an overall summation. While this is an excellent attempt at providing both types of privacy guarantees together, their model is inappropriate for our setting: they assume that a trusted party can run a setup phase for each specific set of clients that will contribute to an aggregate ahead of time. Besides the lack of such a trusted party in our model, importantly, one does not know during system setup which specific clients will pass through each sample point in the future; even the number n of such clients is unknown.

Table 2: End-to-end latency measurements.
   Setup: 0.16 s
   Join, Nexus client: 0.92 s
   Join, laptop client: 0.42 s
   Upload without account., Nexus: 0.29 s
   Upload with account., Nexus: 2.0 s
   Upload with 20% account., Nexus: 0.6 s
   Upload without account., laptop: 0.094 s
   Upload with account., laptop: 0.84 s
   Aggregation (10^3 samples): 0.2 s
   Aggregation (10^4 samples): 0.46 s
   Aggregation (10^5 samples): 3.1 s

Figure 4: Breakdown of Nexus upload latency. (Bars, on a 0–2 s scale: contact SM; prepare tuples; prepare accountability proof; upload to server and wait for reply.)

Table 3: Runtime of the accountability protocol.
   Server: 0.29 s
   Client laptop: 0.46 s
   Client Nexus: 1.16 s

Table 4: Server evaluation for uploads. Latency indicates the time it takes for the server to process an upload from the moment the request reaches the server until it is completed. Throughput indicates the number of uploads per minute the server can handle.
   Upload latency, with account.: 0.3 s
   Upload latency, no account.: 0.02 s
   Throughput with 0% account.: 2400 uploads/core/min
   Throughput with 20% account.: 860 uploads/core/min
   Throughput with 100% account.: 170 uploads/core/min

9. IMPLEMENTATION AND EVALUATION

In this section, we demonstrate that PrivStats can run on top of commodity smartphones and hardware at reasonable costs.
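The 20% accountability configuration in Tables 2 and 4 corresponds to the probabilistic checking of §6.3, whose detection guarantee is easy to reproduce (a direct transcription of Q = 1 − (1 − q)^n):

```python
def detection_probability(q: float, n: int) -> float:
    # Each upload is checked independently with probability q, so n
    # repeated over-quota uploads all escape detection with
    # probability (1 - q)^n.
    return 1 - (1 - q) ** n
```

With q = 0.2, ten repeated uploads are caught with probability ≈ 0.89, matching the roughly 90% figure quoted in §6.3.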
We implemented an end-to-end system; the clients are smartphones (Nexus One) or commodity laptops (for some social crowd-sourcing applications), the server is a commodity server, and the SM was evaluated on smartphones because it runs on clients. The system implements our protocols (Fig. 2), with SLP and with enforcement of a quota on the number of uploads at each aggregate per client for accountability.

We wrote the code in both C++ and Java. For our evaluation below, the server runs C++ for efficiency, while clients and SM run Java. Android smartphones run Java because Android does not fully support C++. (As of the time of the evaluation for this paper, the Android NDK lacks support for basic libraries we require.) We implemented our cryptographic protocols using NTL for C++ and BigInteger for Java. Our implementation is ≈ 1300 lines of code for all three parties, not counting libraries; accountability forms 55% of the code. The core code of the SM is only 62 lines of code (not including libraries), making it easy to secure.

9.1 Performance Evaluation

In our experiments, we used Google Nexus One smartphones (1 GHz Scorpion CPU running Android 2.2.2, 512 MB RAM), a commodity laptop (2.13 GHz Intel Pentium CPU, 2 cores, 3 GB RAM), a commodity server (2.53 GHz Intel i5 CPU, 2 cores, 4 GB RAM), and a server with many cores for our scalability measurements (Intel Xeon CPU, 16 cores, 1.60 GHz, 8 GB RAM). In what follows, we report results with accountability, without accountability (to show the overhead of the aggregation protocols), and with 20% of uploads using accountability, as recommended in §6.3.

Table 2 shows end-to-end measurements for the four main operations in our system (Fig. 2); these include our operations and network times. We can see that the latencies of setup and join are insignificant, especially since they only happen once for the service or once for each client, respectively.

The latency of upload, the most frequent operation, is measured from the time the client wants to generate an upload until the upload is acknowledged at the server, including interaction with the SM. The latency of upload is reasonable: 0.6 s and up to 2 s for Nexus. Since this occurs either in the background or after the client triggered the upload, the user does not need to wait for completion. Figure 4 shows the breakdown of an upload into its various operations. We can see that the accountability protocol (at client and server) takes most of the computation time (86%). The cost of accountability is summarized in Table 3.

For aggregation, we used the Paillier encryption scheme, which takes 33 ms to encrypt, 16.5 ms to decrypt, and 0.03 ms for one homomorphic aggregation on the client laptop, and 135 ms to encrypt and 69 ms to decrypt on the Nexus. The aggregation time includes server computation and interaction with the SM. The latency of this operation is small: 0.46 s for 10^4 tuples per aggregate, more tuples than in the common case. Moreover, the server can aggregate samples as it receives them, rather than waiting until the end.

In order to understand how much capacity one needs for an application, it is important to determine the throughput and latency at the server, as well as whether the throughput scales with the number of cores. We issued many simultaneous uploads to the server to measure these metrics, summarized in Table 4. We can see that the server only needs to perform 0.3 s of computation to verify a cryptographic proof, and one commodity core can process 860 uploads per minute, a reasonable number. We parallelized the server using an adjustable number of threads: each thread processes a different set of aggregate identifiers. Moreover, no synchronization between these threads was needed because each aggregate is independent. We ran the experiment on a 16-core machine: Fig. 5 shows that the throughput indeed scales linearly with the number of cores.

We proceed to estimate the number of cores needed in an application. In a social crowd-sourcing application, suppose that a client uploads samples on the order of 10 times a day when it visits a place of interest (e.g., a restaurant). In this case, one commodity core can already serve about 120,000 clients.

In the vehicular case, clients upload more frequently. If n is the number of cars passing through a sample point, a server core working 24 h can thus support ≈ 24 · 60 · 860/n statistics in one day. For instance, California Department of Transportation statistics
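The capacity arithmetic above can be reproduced directly (a sketch; 860 uploads/core/min is Table 4's measured throughput with 20% accountability, and 10 uploads per client per day is the text's assumption for the crowd-sourcing case):

```python
THROUGHPUT = 860  # uploads per core per minute, 20% accountability (Table 4)

uploads_per_core_per_day = THROUGHPUT * 60 * 24      # one core, 24 hours
clients_per_core = uploads_per_core_per_day // 10    # ~10 uploads/client/day

def statistics_per_core_per_day(n: int) -> int:
    # n = number of cars passing through a sample point per day;
    # this is the 24 * 60 * 860 / n estimate from the text.
    return uploads_per_core_per_day // n
```

This gives about 1.24 million uploads per core per day, i.e., roughly the 120,000 clients per core quoted above.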