PRIVACY IN VOIP NETWORKS: FLOW ANALYSIS ATTACKS AND DEFENSE
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
1 Privacy in VoIP Networks: Flow Analysis Attacks and Defense Mudhakar Srivatsa† , Arun Iyengar† , Ling Liu‡ and Hongbo Jiang∗ IBM T.J. Watson Research Center† College of Computing, Georgia Institute of Technology‡ Huazhong Institute of Science and Technology∗ {msrivats, aruni}@us.ibm.com, lingliu@cc.gatech.edu, hongbojiang@hust.edu.cn Abstract— 1 Peer-to-Peer VoIP (voice over IP) networks, ex- in the network and to place voice calls to other clients on emplified by Skype [5], are becoming increasingly popular due the network. VoIP uses the two main protocols: route setup to their significant cost advantage and richer call forwarding protocol for call setup and termination, and Real-time Trans- features than traditional public switched telephone networks. One of the most important features of a VoIP network is pri- port Protocol (RTP) for media delivery. In order to satisfy vacy (for VoIP clients). Unfortunately, most peer-to-peer VoIP QoS requirements, a common solution used in peer-to-peer networks neither provide personalization nor guarantee a quan- VoIP networks is to use a route setup protocol that sets up tifiable privacy level. In this paper we propose novel flow analysis the shortest route on the VoIP network from a caller src to attacks that demonstrate the vulnerabilities of peer-to-peer VoIP a receiver dst2 . RTP is used to carry voice traffic between networks to privacy attacks. We then address two important challenges in designing privacy-aware VoIP networks: Can we the caller and the receiver along an established bi-directional provide personalized privacy guarantees for VoIP clients that voice circuit. allow them to select privacy requirements on a per-call basis? In such VoIP networks, preserving the anonymity of caller- How to design VoIP protocols to support customizable privacy receiver pairs becomes a challenging problem. In this paper we guarantee? This paper proposes practical solutions to address focus on attacks that attempt to infer the receiver for a given these challenges using a quantifiable k-anonymity metric and a privacy-aware VoIP route setup and route maintenance protocols. VoIP call using traffic analysis on the media delivery phase. We present detailed experimental evaluation that demonstrates We make two important contributions. First, we show that the performance and scalability of our protocol, while meeting using the shortest route (as against a random route) for routing customizable privacy guarantees. voice flows makes the anonymizing network vulnerable to flow analysis attacks. Second, we develop practical techniques to achieve quantifiable and customizable k-anonymity on VoIP I. I NTRODUCTION networks. Our proposal exploits the fact that audio codecs The concept of a mix [9] was introduced by Chaum in 1981. (such as G.729A without silence suppression3 ) generate sta- Since then several authors have used mix as a network routing tistically identical packet streams that can be mixed without element to construct anonymizing networks such as Onion leaking much information to an external observer (see Fig 2). Routing [16], Tor [10], Tarzan [15], or Freedom [7]. Mix The following portions of this paper are organized as fol- network provides good anonymity for high latency communi- lows. We present a reference model for a VoIP network fol- cations by routing network traffic through a number of nodes lowed by flow analysis attacks in Section III. Section IV pro- with random delay and random routes. However, emerging vides a more concrete definition of k-anonymity and describes applications such as VoIP, SSH, online gaming, etc have addi- an efficient anonymity-aware route setup protocol (AARSP). tional quality of service (QoS) requirements; for instance ITU We sketch an implementation of our proposal and present ex- (International Telecommunication Union) recommends up to perimental results that quantify the performance and scalability 250ms one-way latency for interactive voice communication; of AARSP in Section V. We present related work in Section recent case study [26] indicates that latencies up to 250ms VI and finally conclude in Section VII. are unperceivable to human users; while latencies over 400ms significantly deteriorate the quality of voice conversations. II. VO IP ROUTE S ETUP P ROTOCOL This paper examines anonymity for QoS sensitive appli- cations on mix networks using peer-to-peer VoIP service as In this section, we describe a commonly used shortest route a sample application. A peer-to-peer VoIP network typically setup protocol in peer-to-peer VoIP networks. The protocol consists of a core proxy network and a set of clients that operates in four steps: initSearch (initiates a route setup by connect to the edge of this proxy network (see Fig 1). This src), processSearch (process route setup request at some network allows a client to dynamically connect to any proxy 2 Enterprise VoIP networks that use SIP or H.323 signaling protocol may 1 A short version of this paper appears in IEEE INFOCOM 2009: not use the shortest route 3 G.729A without silence suppression deterministically generates one IP http://www.research.ibm.com/people/i/iyengar/INFOCOM2009-kanon.pdf. packet every 20ms. With silence suppression voice flows may be non-identical, Prof. Ling Liu’s work was partially supported by grants from NSF CyberTrust thereby making them trivially vulnerable to traditional traffic analysis attacks program, AFOSR, Intel, and an IBM faculty award
2 Fig. 1. Anonymizing VoIP Network Flow 1: x pkts per sec x pkts per sec Flow 1: x pkts per sec y pkts per sec Flow ? Flow 2 Flow ? Flow 1 Flow 2: x pkts per sec x pkts per sec Flow 2: y pkts per sec x pkts per sec VoIP Flows Unequal Flows: |x−y| > threshold Fig. 2. Mixing Statistically Identical VoIP Flows node), processResult (process results of a route setup re- from the following facts: (i) the first search request that reaches quest at some node), and finSearch (concludes the route a node p must have traveled along the shortest route from src setup). One should note that flow analysis attacks exploits only to p, and (ii) In processSearch a node p records the neigh- the shortest path property and is independent of the concrete bor q through which it received the first search request. This route setup protocol. The description here serves as a basis for indicates that the shortest route from src to p is via q. Setting our AARSP in Section IV. p = dst shows that route setup procedure in processResult initSearch. A VoIP client src initiates a route setup for builds the shortest VoIP network path from src to dst. a receiver dst by broadcasting search(searchId, sipurl = After a successful route setup, the clients src and dst ex- dst.sipurl, ts = curT ime) to all nodes p ∈ ngh(src), where change an end-to-end media encryption key and switch to the ngh(src) denotes the neighbors of node src in the VoIP net- media delivery phase. The media delivery phase additional work. Each VoIP client is identified by an URL (say, sip:bob@ uses hop-by-hop re-encryption using pair-wise shared keys example.com). The search identifier searchId is a long ran- between neighboring proxy nodes in the VoIP network. An domly chosen unique identifier and ts denotes the time stamp external observer tapping into the VoIP network may observe at which the search request was initiated. hsrcIP , dstIP , srcP ort, dstP ort, EKp,q (EKsrc,dst (media))i, processSearch. Let us suppose p receives search(searchId, where Kp,q denotes a pair-wise shared symmetric key between sipurl, ts) from its neighbor q. If p has seen searchId in neighboring nodes p and q on the route, Ksrc,dst denotes the the recent past then it drops the search request. Otherwise, end-to-end encryption key and media denotes an encoding of p checks if sipurl is the URL of a VoIP client connected media bits. Further, an observer may observe packet sizes4 and to p. If yes, p returns its IP address using result(searchId, statistics on inter-packet arrival times. Re-encryption essen- p) to q. p broadcasts search(searchId, sipurl, ts) to all p0 tially guarantees unlinkability between EKp,q (EKsrc,dst (media)) ∈ ngh(p)−{q} and caches the search identifier hsearchId, and EKq,r (EKsrc,dst (media)) by examining the contents of sipurl, qi in its recently seen list. Note that p0 has no knowl- the transmitted packet. edge of where the search request is initiated. processResult. Let us suppose p receives result(searchId, III. F LOW A NALYSIS ATTACKS q) from q. Note that p has no knowledge as to where the search In this section, we describe flow analysis attacks on VoIP result was initiated. p looks up its cache of recently seen search networks. These attacks exploit the shortest path nature of queries to locate hsearchId, sipurl, previ. p adds a routing the voice flows to identify pairs of callers and receivers on entry hsipurl, qi and forwards result(searchId, p) to prev. the VoIP network. Similar to other security models for VoIP finSearch. When src receives result(searchId, q) from q, networks, we assume that the physical network infrastructure it adds a routing entry hdst, qi to its routing table. is owned by an untrusted third party (say, tier one/two network The route setup protocol establishes the shortest overlay 4 Media packet sizes obtained from a common encoding algorithm such as network route between src and dst. This observation follows G.729A are typically identical; if not packets can be padded with random bits
3 Fig. 5. Call Hold Time Data Fig. 3. Inferring Number of Flows from Flow Volume p1 1 1 p3 p1 1 1 p3 p5 1 p5 1 1 p2 p4 p2 p4 Fig. 6. Tracing Voice Calls Fig. 7. Shortest Path Tracing varying from 24ms − 150ms with a mean of 74ms and is within 20% error margin from real world latency measure- ments [17]. The average route (shortest path) latency between any two nodes in the network is 170ms, while the worst case route latency is 225ms. Our experiments over NS-2 use a bursty packet delay model wherein 20% of the packets incur Fig. 4. Call Volume Data an additional delay of up to 44% of average one-way latency [17]. In practice, the total cost of framing, decoding and hop- by-hop re-encryption amounts to about 1.4ms per voice packet service provider). Hence, the VoIP service must route voice on commodity hardware. In our simulations, we adjust link flows on the untrusted network in a way that preserves the latencies to reflect the cost of routing VoIP packets. identities of callers and receivers from the untrusted network. We generate voice traffic based on call volume and call We assume that the untrusted network service provider (ad- hold time distribution obtained from a large enterprise with versary) is aware of the VoIP network topology [28][10] and 973 subscribers (averaged over a month) (see Figures 4 and the flow rates on all links in the VoIP network [19][7]. The 5). The call volume is specified in Erlangs [18]: if the mean network service provider can obtain VoIP topology and flow arrival rate of new calls is λ per unit time and the mean information using traffic analysis (see Fig 3) or using various call holding time (duration of voice session) is h, then the measurement based approaches (such as expanding ring search traffic in Erlangs A = λh; for example, if total phone use in a on the network topology) [23]. We experimentally show that given area per hour is 180 minutes, this represents 180/60 = the attack can be very effective even when only one third of 3 Erlangs. We use G.729A audio codec for generating audio the links are monitored by the adversary. traffic. The (src, dst) pair information for each call was not We represent the VoIP network topology as a weighted made available. We have experimented under two settings: graph G = hV , Ei, where V is the set of nodes and E ⊆ first, we assume that for a given VoIP call, the (src, dst) pair is V ×V is the set of undirected edges. The weight of an edge e chosen randomly from the VoIP network; second, we assume = (p, q) (denoted by w(p, q)) is the latency between the nodes that 80% of the calls are made between nodes that are in the p and q. We assume that the adversary can observe the network same network geography (e.g., same autonomous system). As and thus knows nf (p → q) the number of voice flows between noted in Section III-E, any prior information (such as, 80% two nodes p and q on the VoIP network such that (p, q) ∈ E. of call volume is limited to local network geography) can be To illustrate the effectiveness of our flow analysis attacks, used by the adversary to further enhance the efficacy of flow we use a synthetic network topology with 1024 nodes. The analysis attacks. Finally, we note that all results reported in topology is constructed using the GT-ITM topology gener- this paper have been averaged over seven independent runs. ator [35][1] and our experiments were performed on NS-2 [2][3]. GT-ITM models network geography and the small world A. Naive Tracing Algorithm phenomenon (power law graph with parameter γ=2.1) [14][22]. Let src be the caller. We use a Boolean variable f (p) ∈ {0, The topology generator assigns node-to-node round trip times 1} to denote whether the node p is reachable from src using
4 T RACE(Graph G=hV , Ei, Caller src) (1) for each vertex v ∈ V S HORTEST PATH T RACING(Graph G=hV , Ei, Caller (2) f [v] = 0; label[v] = false src) (3) end for (1) for each vertex v ∈ V (4) f [src] = 1; label[src] = true (2) dist[v] = ∞; label[v] = false; prev[v] = (5) while pick a vertex v labeled true null (6) label[v] = false (3) end for (7) for each node u such that (u, v) ∈ E (4) dist[src] = 0; label[src] = true (8) if (f [u] = 0) (5) while pick a labeled vertex v with minimum (9) f [u] = 1; label[u] = true dist[v] (10) end if (6) label[v] = false (11) end for (7) for each node u such that (u, v) ∈ E (12) end while (8) if (dist[u] < dist[v] + w(u, v)) (9) dist[u] = dist[v] + w(u, v) (10) prev[u] = {v}; label[u] = true Fig. 8. Naive Tracing Algorithm (11) end if (12) if (dist[u] = dist[v] + w(u, v)) (13) prev[u] = prev[u] ∪ {v} the measured flows on the VoIP network. One can determine (14) end if (15) end for f (p) for all nodes p in O(|E|) time as follows. The base case (16) end while of the recursion is f (src) = 1. For any node q, we set f (q) (17) G1 = hV 1 , E 1 i: V 1 = V , E 1 = (u → v) ∀u ∈ to one if there exists a node p such that (p, q) ∈ E ∧ f (p) = prev[v], ∀v ∈ V 1 ∧ nf (p → q) > 0. Let us consider a sample topology shown in Figure 6. For Fig. 9. Shortest Path Tracing Algorithm the sake of simplicity assume that each edge has unit latency. The label on the edges in Figure 6 indicates the number of voice flows. A trace starting from caller p1 will result in f (p1 ) 1 = f (p2 ) = f (p3 ) = f (p4 ) = f (p5 ) = 1. Filtering out the VoIP proxy nodes (p5 ) and the caller (p1 ), the clients p2 , p3 and p4 could be potential destinations for a call emerging from p1 . 0.1 However, the tracing algorithm does not consider the short- est path nature of voice routes. Considering the shortest path Precision nature of voice paths leads us to conclude that p2 is not a 0.01 possible receiver for a call from p1 . If indeed p2 were the receiver then the voice flow would have taken the shorter route p1 → p2 (latency = 1), rather than the longer route p1 → p5 → 0.001 p2 (latency = 2) as indicated by the flow information. Hence, we now have only two possible receivers, namely, p3 and p4 . ’naive-trace’ ’shortest-path-trace’ 0.0001 B. Shortest Path Tracing Algorithm 1 2 4 8 16 32 64 128 256 512 Call Traffic (erlangs) In this section, we describe techniques to generate a directed sub-graph G1 = hV 1 , E 1 i from G which encodes the shortest Fig. 10. Shortest Path Tracing Vs Naive Tracing path nature of the voice paths. Given a graph G and a caller src, we construct a sub-graph G1 that contains only those voice paths that respect the shortest path property. Figure 9 1 uses a breadth first search on G to compute G1 in O(|E|) ’recall’ ’precision’ time. ’f-measure’ One can formally show that the directed graph G1 satisfies the following properties: (i) If the voice traffic from src were Attack Efficacy 0.1 to traverse an edge e ∈ / E 1 , then it violates the shortest path property. (ii) All voice paths that respect the shortest path property are included in G1 . (iii) The graph G1 is acyclic. Figure 7 illustrates the result of applying the algorithm in 0.01 Figure 9 on the sample topology in Figure 6. Indeed if one uses the trace algorithm (Figure 8) on graph G1 , we get f (p2 ) = 0, f (p3 ) = f (p4 ) = 1. Figure 10 compares the effectiveness of the shortest path tracing algorithm with the tracing algorithm 0.001 1 2 4 on a 1024 node VoIP network. On the x-axis we plot the call traffic measured in Erlangs. We quantify the efficacy of Parameter k an attack using standard metrics from inference algorithms: Fig. 11. Statistical Shortest Path Tracing: 128 Erlang Call Volume precision, recall and F-measure. We use S to denote the set
5 1 shortest paths from src to v. One can formally show that all voice paths that respect the top-κ shortest path property are included in Gκ . However, unlike G1 , the graph Gκ (for κ ≥ 0.1 2) may contain directed cycles. Evidently, as κ increases, the tracing algorithm can accom- F-measure modate higher uncertainty in network latencies, thereby im- 0.01 proving recall. On the other hand, as κ increases, the precision initially increases and then decreases. The initial increase is attributed to the fact when κ is small the tracing algorithm may 0.001 even fail to identify the actual receiver as a candidate receiver; f [dst] may be 0 resulting in zero precision. However, for ’shortest-path-trace’ ’statistical-shortest-path’ large values of κ, the number of candidate receivers becomes 0.0001 very large, thereby decreasing the precision metric. Figure 11 1 2 4 8 16 32 64 128 256 512 shows the precision, recall and F -measure of the statistical Call Volume (Erlang) shortest path tracing algorithm with 128 Erlang call volume Fig. 12. F -measure with κ = 2 and varying κ. This experiment leads us to conclude that κ = 2 yields a concise and yet precise list of potential receivers; observe that κ = 2 improves precision and recall by 97% and of nodes such that for every p ∈ S, f [p] = 1. Recall denotes the 37.5% (respectively) over κ = 1. probability of identifying the true receiver dst (dst ∈ S?), and Figure 12 compares the F -measure for the statistical short- precision is inversely related to the size of candidate receiver est path tracing algorithm (κ = 2) and the shortest path tracing 1 algorithm (κ = 1) and varying call volume. We observe that set (∝ |S| ). F -measure (computed as harmonic mean of recall and precision scores) is a commonly used as a single metric for low call volumes the shortest path tracing algorithm is for measuring the effectiveness of inference algorithms [29]. sufficiently accurate. However, for moderate call volumes the statistical shortest path tracing algorithm can improve attack recall = P r(f [dst] = 1) efficacy by 1.5-2.5 times. ( 1 |S| if dst ∈ S precision = 0 otherwise D. Flow Analysis Algorithm 2 ∗ recall ∗ precision We have so far used a Boolean variable f (p) to denote F-measure = recall + precision whether a VoIP client p can be a potential receiver for a VoIP In a deterministic network setting, the receiver dst is guar- call from src. In this section, we use the flow measurements anteed to be marked with f [dst] = 1, that is, recall = 1 for to construct a probability distribution over the set of possible both the naive tracking algorithm and the shortest path tracing receivers. Let Gκ be a sub-graph of G obtained using the top-κ algorithm. Hence, Figure 10 compares only the precision of shortest path tracing algorithm with caller src. Let nf (p → q) these two algorithms. We observe that for low call volumes (< denote the number of flows on the edge p → q. Let in(p) 64 Erlangs) the shortest path tracing algorithm is about 5-10 denote the total number of flows into node p. Note that both times more precise than the naive tracking algorithm. However, nf (p → q) and in(p) are observable by an external adversary. higher call volumes facilitate natural mixing of VoIP flows Assuming a node p in the VoIP network performs perfect thereby decreasing the precision of both the naive tracing and mixing, the probability that some incoming flow is forwarded shortest path tracing algorithms. on the edge p → q as observed by an external adversary is nf (p→q) in(p) . Let f (p) denote the probability that a VoIP flow originating at src flows through node p. The function f is C. Statistical Shortest Path Tracing recursively defined on the directed edges in Gκ =hV κ , E κ i as In a realistic setting with uncertainties in network latencies follows: the shortest path tracing algorithm may not identify the re- X nf (p → q) f (q) = f (p) ∗ (1) ceiver. We handle such uncertainties in network link latencies κ in(p) p→q∈E by using a top-κ shortest path algorithm to construct Gκ from G. An edge e is in Gκ if and only if it appears in some top-κ with the base case f (src) = 1 and in(src) = 1. Now, every shortest path originating from src in graph G. We modified VoIP client p (p 6= src) is a possible destination for the VoIP Algorithm 9 to construct Gκ by simply maintaining top-κ flow originating from src if f (p) > 0. We use the top-m distance measurements dist1 [v], dist2 [v], · · · , distκ [v] instead probability metric, namely, the probability that the receiver of only the shortest (top-1) distance measurement. We also dst appears in the top-m entries when f (p) is sorted in de- maintain previous hops prev 1 [v], prev 2 [v], · · · , prev κ [v] that scending order. Top-m probability besides being an intuitive correspond to each of these top-κ shortest paths. We add an metric directly relates to the information theoretic notion of edge (u, v) to E κ if u = prev i [v] for some 1 ≤ i ≤ κ. We self-information (entropy) I = − log p. Indeed higher entropy say that the voice traffic from src to v satisfies the top-κ represents higher randomness and thus translates into better shortest path property if it is routed along one of the top-κ privacy. Such probability and entropy based metrics are often used for quantifying privacy in anonymous networks [19].
6 1 1 0.9 0.9 0.8 0.8 Top-10 Probability Top-m Probability 0.7 0.7 0.6 0.6 0.5 0.5 0.4 0.4 0.3 0.3 0.2 ’spt’ 0.2 ’m=1’ ’fa’ ’m=10’ 0.1 ’fah’ 0.1 ’m=20’ ’fad’ ’m=50’ 0 0 1 2 4 8 16 32 64 128 256 512 1 2 4 8 16 32 64 128 256 512 Call Volume (Erlang) Call Volume (Erlang) Fig. 13. Top-10 Probability (κ = 2): Comparison between shortest path Fig. 14. Top-m Probability with κ = 2 and Distance prior tracing, flow analysis algorithm with no prior and Distance prior κ # Iterations Time (ms) 1 - 0.03 Computing the probabilities f (p) for G1 (top-1 shortest 2 7 31 paths) is very efficient. Observe that since G1 is a directed 3 11 49.7 acyclic graph, it can be sorted topologically. Let p1 = src, p2 , 4 19 84.2 · · · pN be a topological ordering of the nodes in G1 such that Fig. 15. Computation Cost f (pi ) depends on f (pj ) only if j < i. Hence, one can effi- ciently evaluate the probabilities by following the topological order, namely, compute f (p2 ), f (p3 ), · · · , f (pN ) in that order 1 ’m=1’ starting with f (p1 ) = 1. 0.9 ’m=10’ However, Gκ (for κ ≥ 2) may contain cycles and thus ’m=20’ 0.8 ’m=50’ cannot be topologically sorted. In this case, we represent the Top-m Probability 0.7 set of equations in 1 as π = πM , where π is a 1×N row 0.6 vector and M is a N ×N matrix, where πi = f (pi ) and Mij nf (pi →pj ) 0.5 = in(p i) if there exists a directed edge pi → pj in Gκ ; and Mij = 0 otherwise. Hence, the solution π is the stationary 0.4 distribution of a Markov chain whose transition probability 0.3 matrix is given by M . We compute π using a fixed point com- 0.2 putation approach as follows: we recursively compute π t+1 = 0.1 π t M starting with πi0 = 1 if pi = src; πi0 = 0 otherwise. 0 Assuming M is irreducible, π converges to a steady state 0 0.2 0.4 0.6 0.8 1 solution π ∗ in O(N log N ) iterations. If the matrix M were Fraction of Flow Information not irreducible (this happens if the underlying directed graph G κ is not strongly connected5 ), then we approximate π ∗ as P τ ∗N log N t Fig. 16. Incomplete Flow Information π t=0 τ ∗N log N for some constant τ ≥ 1 (typically, we set τ = 1). 1 0.9 E. Distance Prior and Hop Count Prior 0.8 Top-m Probability In this section, we enhance the efficacy of the flow analysis algorithm using hop count and distance prior. We use ghop 0.7 and glat to denote hop count and distance (in terms of latency) between src and dst. For instance, one can use ghop and glat to 0.6 encode the fact that most calls are between nodes in the same 0.5 autonomous system. Using the hop count prior, the probability ’m=1’ that a node p forwards an incoming flow on the edge p → 0.4 ’m=10’ (p→q) ’m=20’ q is nfin(p) ∗ P r(hop ≥ hc(src, p) + 1 | hop ≥ hc(src, ’m=50’ p)), where hc(src, p) denotes the number of hops along the 0.3 0.0625 0.125 0.25 0.5 shortest path between src and p on graph Gκ . P r(hop ≥ Fraction of Compromised Nodes 5 A directed graph is said to be strongly connected if there exists a directed path from every vertex to every other vertex in the graph Fig. 17. Compromised Proxies
7 hc(src, p) + 1 | hop ≥ hc(src, p)) = PPr(hop≥hc(src,p)+1) r(hop≥hc(src,p)) = hV κ , E κ i and the set of unmonitored links U ⊆ E κ as denotes the probability that the receiver dst could be one more follows: hop away given that dst is at least hc(src, p) hops away from X 1 X nf (p → q) src. f (q) = f (p) ∗ + f (p) ∗ deg(p) κ in(p) A similar analysis applies to distance prior as well. We use p→q∈U p→q∈E \U dist(src, p) to denote the latency of the shortest path between Figure 16 shows the efficacy of the attack assuming that only a src and p on graph Gκ , and w(p, q) denotes the one-way fraction of the flow information is monitored by the adversary. latency between nodes p and q. As with the flow analysis It follows from the figure that with only 20% flow information, algorithm, the function f is defined on the directed edges in flow analysis attacks are ineffective; however, with 30-60% graph Gκ = hV κ , E κ i as follows: flow information the top-m probability increases substantially. This suggests that the flow analysis attack is robust against X nf (p → q) P r(hop ≥ hc(src, p) + 1) f (q) = f (p) ∗ in(p) ∗ P r(hop ≥ hc(src, p)) some inaccuracies in topology and flow information. p→q∈E κ X nf (p → q) P r(lat ≥ dist(src, p) + w(p, q)) f (q) = f (p) ∗ ∗ p→q∈E κ in(p) P r(lat ≥ dist(src, p)) G. Compromised Proxies In addition, to passive observation based attacks, the ad- Figure 13 shows the Top-10 probability of an attack versus versary could actively compromise some of the nodes in the call volume using κ = 2 for different attack algorithms spt: VoIP proxy. We assume an honest-but-curious model for the statistical shortest path tracing, fa: flow analysis (with no compromised nodes. A compromised node p reveals its mixing priors), fah: flow analysis with hop count prior and fad: flow information (namely, nf (r → p → q)) to an adversary, where analysis with distance prior. We observe that the flow analysis nf (r → p → q) denotes the number of voice flows that were algorithm with distance prior offers the best results. This is routed from r to q by the malicious node p. With slight abuse primarily because the route setup protocol always constructs of notation, we use f (p → q) to denote the probability that a voice paths that have minimum one-way latency. The distance VoIP call from src traverses the edge p → q. Hence, the new prior directly reflects on the latency based shortest path nature flow analysis equations are as follows: of the route setup protocol and thus performs best. Figure 14 shows the top-m probability, namely, the proba- ( nf (p→q) f (p) ∗ in(p) honest p bility that the true receiver dst appears in the top-m entries f (p → q) = P nf (r→p→q) r→p f (r → p) ∗ nf (r→p) malicious p when f (p) is sorted in descending order using the flow anal- X ysis algorithm with distance prior and κ = 2. We also experi- f (q) = f (p → q) (2) p→q mented with hop count prior; however, distance prior directly reflects on the latency based shortest path nature of the route Figure 17 shows the effectiveness of the attack as we vary setup protocol and thus performs best. With a call volume of the fraction of nodes that is compromised by the adversary. 64 Erlangs, there is 86% chance that the true receiver dst We use a call volume of 128 Erlangs and κ = 2 in this appears in the top-10 entries. Under very high call volume experiment. Evidently, compromised proxy nodes significantly (512 Erlangs) the top-10 probability drops to 0.17. However, enhance attack efficacy; for instance, when 20% of the nodes we note from our enterprise data set (see Fig 4) that the call are compromised the top-1 probability improves from 0.23 to volume is smaller than 64 Erlangs for about 75% of the day. 0.50. Figure 15 shows the computation cost6 incurred in com- puting the probabilities f (p) for all potential receivers p. In IV. VO IP P RIVACY USING k-A NONYMITY practice, we observed that the number of iterations required for f (p) to converge to its stationary value is much smaller In this section, we develop a k-anonymity approach to pro- that the theoretical bound O(N log N ). We attribute this to the tect the identity of a receiver from flow analysis attacks. We sparse nature of matrix M . The overall running time is of the define k-anonymity for identical voice flows as follows: order of few tens of milliseconds making the attack feasible k-anonymity: A voice flow from src to dst is said to be k- in real-time. anonymous if the size of a candidate receiver set identified by an adversary using the naive tracking algorithm is no smaller F. Incomplete Flow Information than k 7 . So far, we have assumed that the adversary can monitor the As pointed out earlier, k-anonymity can be realized by mixing flow rate on all the links in the VoIP network. In the absence a voice flow from src to dst with at least k − 1 other flows of flow information on a link p→q, we use an unbiased ran- (each of which has a different source and destination nodes). dom walk estimator to compute f (p)’s contribution to f (q) as The primary supposition that any two voice flows are indistin- 1 f (p) ∗ deg(p) , where deg(p) denotes the number of neighbors guishable under traffic analysis (see Fig 2) follows from the of p on the VoIP network. As with the flow analysis algorithm, properties of RTP (real-time protocol) and the constant packet the function f is defined on the directed edges in graph Gκ rate property of silence suppression free audio codecs. One 6 As measured on a 900 MHz Pentium III processor running RedHat Linux 7 k-anonymity is defined with respect to naive tracing, since AARSP does 7.2 not use shortest path routing
8 way to achieve k-anonymity is to mix a flow from src to dst p3 0 with k − 1 dummy voice flows; however, this approach can 0 increase aggregation bandwidth consumption by k-fold. In this section, we propose an anonymity aware route set up protocol p1 p2 (AARSP). AARSP reroutes and mixes existing voice flows 1 1 1 1 (without adding dummy traffic) with the goal of: (i) meeting k-anonymity, and (ii) satisfying latency based QoS guarantee. Figures 18 illustrates a simple scenario with two VoIP flows s1 d1 s2 d2 between clients (s1 , d1 ) and (s2 , d2 ). In the figures, each edge Fig. 18. 1-anonymity on the VoIP network is marked with the number of flows it carries. Figure 19 shows how one can reroute VoIP traffic in order to achieve 2-anonymity. Evidently, the rerouted flows 1 p3 1 may not satisfy the shortest route property and thus may vio- late latency based QoS guarantees. However, AARSP exploits the slack between the shortest path latency and the tolerable p1 p2 latency (250ms) to accomplish its goals. 1 1 1 1 A. AARSP: Anonymity-Aware RSP In this section, we summarize our ideas behind AARSP. s1 d1 s2 d2 AARSP accepts an anonymity parameter k as an input for the Fig. 19. 2-anonymity route setup protocol, on a per-client per-call basis. AARSP modifies the basic route set up protocol (RSP) such that it si- multaneously satisfies three conditions: (i) we have at least one respectively. Node p drops the search request received at time node p ∈ route(src, dst) such that in(p) ≥ k (k-anonymity), t2 if: prev1 = prev2 . (ii) the end-to-end one-way latency on the route from src to AARSP sets up the shortest possible route from src to dst dst is smaller than maxLat (typically set to 250ms), and (iii) that meets k-anonymity (if there exists such a route). If no the total call volume on every node p ∈ route(src, dst) is such route exists, AARSP setup identifies a route with highest smaller than its capacity maxF low(p). possible anonymity k 0 < k; in addition, AARSP sets up the Similar to the naive route set up protocol discussed in Sec- shortest route that achieves k 0 -anonymous route from src to tion II, AARSP uses an expanding search approach to con- dst. Indeed, if one sets k = ∞, then AARSP identifies the struct a k-anonymous path from a caller src to a recipient most anonymous route whose one-way latency is smaller than dst. The key idea is that the expanding search not only tracks maxLat = 250ms. For detailed analysis on the properties the distance from src to a node p in the network (dist[p]), but of the AARSP protocol we refer the readers to a detailed also the anonymity of a route from src to p (anon[p]). Recall technical report [27]. that the naive route set up protocol drops a search request if Figure 20 compares the average one-way path latency for it has seen the search identifier searchId in the recent past. both AARSP and RSP for varying call volumes. At low call On the other hand, in the AARSP protocol, let us suppose volumes AARSP may construct significantly longer routes than that a node p receives a search request identified by searchId RSP. At lower call volumes, AARSP may significantly de- at two time instants t1 and t2 (wlog, t1 < t2 ) such that the viate from the shortest path to achieve the desired level of anonymity of the route traversed by the search requests is k1 and k2 respectively. Node p drops the search request received at time t2 if: k1 ≥ k or k1 ≥ k2 . 250 ’rsp’ AARSP may intentionally introduce loops in the path to 240 ’aarsp’ improve the anonymity of voice flows. In doing so, AARSP 230 ensures the protocol converges to a valid path by limiting 220 number of times any loop is traversed by a path to at most one. 210 Latency Unlike the shortest path route setup protocol, the routing entry 200 at a node p in the AARSP protocol is a three tuple hsipurl, prev, nexti, which indicates that when a node p receives voice 190 packets from node prev destined to sipurl, then node p must 180 forward the packet to node next. The three tuple routing table 170 allows us to handle loops; for example, from Figure 19 node 160 p3 may have the following two routing entries: hd1 , p1 , p2 i 150 and hd1 , p2 , p1 i. In the AARSP protocol, let us suppose that 1 2 4 8 16 32 64 128 256 512 a node p receives a search request identified by searchId at Call Volume (Erlang) two time instants t1 and t2 (wlog, t1 < t2 ) such that the search request was forwarded to node p by nodes prev1 and prev2 Fig. 20. AARSP Vs RSP: Path Latency
9 anonymity; higher call volumes facilitate natural mixing of voice flows and thus require relatively smaller deviation from 30 the shortest path. Nonetheless, AARSP ensures that the la- tency is below 250ms and thus preserves the quality of voice 25 conversations. Anonymity Level Figure 21 shows the anonymity level of a VoIP call as 20 a function of its hold time given that the initial route was established using k = 20. Anonymity level of a VoIP flow F 15 at time t denotes the number of flows mixed with F at time t along F ’s route. We tracked the worst case (lowest) and 10 the best case (highest) anonymity levels of a VoIP call. We ’anon-low-64Erlang’ observe that at lower call volumes (64 Erlangs) calls with hold 5 ’anon-low-128Erlang’ time larger than 400 seconds may experience time durations ’anon-high-64Erlang’ ’anon-high-128Erlang’ at which their anonymity level falls below 80% of the initial 0 value. Fortunately, 95% of voice calls are shorter than 400s 100 1000 (see Fig 5). Long lasting calls may set up a lower watermark Call Hold Time (seconds) level k 0 ≤ k such that the AARSP route maintenance protocol Fig. 21. Route Maintenance automatically terminates the call when its anonymity level falls below k 0 . B. Flow Analysis Attacks on AARSP A flow analysis attack on AARSP operates efficiently as 1 follows. We assume that the adversary knows the parameter k 0.9 used by a caller src. First, the adversary identifies all nodes p 0.8 such that in(p) ≥ k; if there exists no such p, then the attacker Top-10 Probability picks p that maximizes in(p). Let Gκsrc be a sub-graph of G 0.7 obtained using the top-κ shortest path tracing algorithm with 0.6 caller src. We use Gκsrc and the flow measurements to compute 0.5 the probability that the voice flow is routed from src to p (fsrc (p)) for all p such that in(p) ≥ k starting with fsrc (src) 0.4 = 1. The algorithm used for computing fsrc (p) is described 0.3 in Section III-D. 0.2 ’rsp’ Second, for every such node p with in(p) ≥ k, we con- ’aarsp’ 0.1 struct Gκp as a sub-graph of G obtained using the top-κ short- 1 2 4 8 16 32 64 128 256 512 est path tracing algorithm with p as the caller. We compute Call Volume (Erlang) the probability of a voice flow being routed from src to r via p (fsrc,p (r)) for every candidate receiver r starting with Fig. 22. AARSP Vs RSP: Top-10 Probability fsrc,p (p) = fsrc (p), where fsrc (p) is obtained from the first step. Third, we compute the probability of r being the re- ceiver as frecv (r) = fsrc,pivot (r), where pivot = argminp {dist(src, p) + dist(p, r)}. Similar to other flow analysis at- 1 tacks, one could sort the receivers in descending order of f (r) ’rsp’ 0.9 ’aarsp’ and use a top-m probability metric to study the efficacy of the attack. 0.8 Top-m Probability Figures 22 and 23 compare the effectiveness of AARSP 0.7 with the shortest path route setup protocol (RSP) in mitigating 0.6 flow analysis attacks. We set the parameter k = ∞ so that 0.5 AARSP identifies the most anonymous route from caller src 0.4 to receiver dst that satisfies the latency constraint. We use the flow analysis described in this section for AARSP and a flow 0.3 analysis attack with distance prior for RSP. Figure 22 shows 0.2 the top-10 probability using both AARSP and RSP for varying 0.1 call volumes. Observe that at low and moderate call volumes 0 AARSP offers significantly improved protection against flow 1 2 4 8 16 32 64 analysis attacks. Figure 23 shows the top-m probability for Parameter m both AARSP and RSP for varying m and a call volume of Fig. 23. AARSP Vs RSP: 128 Erlangs 128 Erlangs. We also observe that AARSP consistently out performs RSP for all values of m.
10 C. Tolerating Compromised Proxies 1 In this section, we study the effect of compromised proxies 0.9 on AARSP. Similar to Section III, we assume that a com- 0.8 promised node will execute AARSP honestly. However, the Top-10 Probability malicious nodes may record observed information during the 0.7 voice session and collude with the external observer. In par- 0.6 ticular, a malicious node may reveal the mapping between its 0.5 in-flows and out-flows thereby decreasing the anonymity of a VoIP flow. 0.4 We describe a simple extension to AARSP to tolerate ma- 0.3 ’c=1’ licious nodes and yet preserve k-anonymity. We allow the 0.2 ’c=2’ caller src to specify a personalized security parameter c for ’c=3’ 0.1 every VoIP call indicating that the route from src to dst 0.0625 0.125 0.25 0.5 should tolerate up to c compromised nodes while preserving Fraction of Compromised Nodes k-anonymity. The key idea is to construct a route from src to dst with at least c + 1 nodes p1 , p2 , · · · , pc+1 such that in(pi ) Fig. 24. Compromised Proxies ≥ k for all 1 ≤ i ≤ c+1. One can introduce this constraint by recording top-c anonymity levels in the anon field of ASRSP. k Time(s) Now, one could replace the constraint anon ≥ k with anon 2 9.0 ≥ (k, c). 10 5.9 While this approach offers significantly higher attack re- 20 4.0 silience, one can extend the flow analysis attack on a k-anonymous 50 2.9 AARSP to a (k, c)-anonymous AARSP. We assume that the Fig. 25. Attack Cost Vs Anonymity Level (k) adversary knows the parameters k and c used by a caller src. The adversary identifies a set of nodes S such that for all nodes p ∈ S, in(p) ≥ k. For every permutation of c computation cost for an attack grows rapidly with c, making nodes from the set S, the adversary computes the probabil- it harder for an adversary to launch flow analysis attacks in ity of a voice flow being routed from src to r via p1 , p2 , real-time. · · · , pc (fsrc,p1 ,p2 ,··· ,pc (r)) for a candidate receiver r. We recursively compute fsrc,p1 ,p2 ,··· ,pi (r) using flow analysis al- V. P ERFORMANCE E VALUATION gorithms on Gκpi and starting probability fsrc,p1 ,··· ,pi (pi ) = A. Implementation Sketch fsrc,p1 ,p2 ,··· ,pi−1 (pi ). The recursion terminates at the base case fsrc (src) = 1. Finally, we compute the probability of r be- In this section, we briefly describe an implementation of ing the receiver as frecv (r) = fsrc,pivot (r), where pivot = our algorithms using Phex [4]: an open source Java based argmin{p1 ,p2 ,··· ,pc }⊆S {latdist(src, p1 ) + latdist(p1 , p2 ) + implementation of shortest route set up protocol (RSP). VoIP · · · + latdist(pc−1 , pc ) + latdist(pc , r)}. This attack reduces protocols operate on top of the peer-to-peer infrastructure. a simple flow analysis attack on AARSP when c = 1; however, We have implemented our algorithms as pluggable modules the pivot size and consequently the attack complexity grows that can be weaved into the Phex client code using AspectJ exponentially in c. [12]. Our implementation is completely transparent to the VoIP Figure 24 shows the effectiveness of the attack as we vary protocol that operates on top of the peer-to-peer infrastructure. the fraction of nodes that are compromised by the adversary Also, our implementation does not require any changes to and the parameter c. We use a call volume of 128 Erlangs, topology construction and maintenance algorithms (as nodes k = 10 and κ = 2 in this experiment. We observe that as c join, leave, fail or recover) and the underlying TCP/IP or UDP increases the effectiveness of the attack decreases significantly; based communication libraries. for instance, when 10% of the nodes are compromised, using Below we sketch our implementation of AARSP. The Phex c = 3 reduces the top-10 probability to 0.06 when compared to broadcast search protocol has four operations: initSearch, processSearch, processResult and finSearch. These c = 1 which results in a top-10 probability of 0.44. Nonethe- less, when a large number of nodes are malicious AARSP four operations are implemented as event handlers in Phex. becomes vulnerable to flow analysis attacks. Under a realistic When a Phex client receives a messages, it determines the setting, when only a small number of nodes (< 20%) are likely type of the message (search request, search result, etc) and to be malicious, AARSP offers very good protection against flow analysis attacks. c Time(s) Figure 25 shows the computation cost incurred to an adver- 1 2.9 sary as k increases. As k increases, the number of pivot nodes 2 4.4 p, that is, p such that in(p) ≥ k decreases; and thus, the 3 11.6 computation cost decreases. Figure 26 shows the computation 4 43.5 cost for an attack as c increases with k=∞. We observe that the Fig. 26. Attack Cost Vs Tolerance to Malicious Nodes (c)
11 triggers the appropriate event handler. AARSP substitutes all four event handlers (see Section IV). In addition, we also mod- Per Node Msg Cost (# msgs per search) ’rsp’ ify the search request payload and the search result payload 2.5 ’aarsp-k=10’ ’aarsp-k=20’ to include the parameters k and c. In the rest of this section, ’aarsp-k=50’ all experimental results are reported using our implementation 2 deployed on 64 nodes in Planet Lab8 . 1.5 B. Performance and Scalability 1 In this section, we compare the messaging cost and the search latency of AARSP and RSP. Figure 27 shows the mes- 0.5 saging cost per node as we vary the call volume and the anonymity parameter k (using c = 1). These experiments show that AARSP incurs about 1-3 times the messaging cost of RSP. 0 1 2 4 8 16 32 64 128 256 512 However, the search request and the results are typically of Call Volume (Erlang) the order of 300 Bytes. Hence, the communication cost at a node for handling 10 messages (3 KB) is equivalent to a voice Fig. 27. Messaging Cost session of one second (24 Kbps). Even though AARSP incurs higher communication cost than RSP, its effect on the overall bandwidth consumption is negligible. Figure 28 shows the latency of a search operation as we 250 vary the call volume and the anonymity parameter k (using c = 1). This experiment shows that AARSPs incurs about 30- 40% higher search latency than RSP. One should note that the 200 search latency only affects the initial connection set up time. Search Latency (ms) Once the route is established AARSP ensures good quality 150 voice conversations by limiting the path latency to 250ms. Figures 27 and 28 also show that the relative overhead of 100 AARSP over RSP decreases with call volume. The key intu- ition here is that higher call volumes facilitate natural mixing of voice flows, thereby decreasing the overall messaging and 50 ’rsp’ ’aarsp-k=10’ search cost. ’aarsp-k=20’ Figure 29 shows the average number of concurrent VoIP ’aarsp-k=50’ 0 calls handled by a node in the VoIP network (using c = 1). 1 2 4 8 16 32 64 128 256 512 AARSP incurs a higher load primarily because the traffic is Call Volume (Erlang) not routed through the optimal route. Hence, a voice route in AARSP may include more network hops than RSP. This Fig. 28. Search Latency increases the average number of proxies that route one voice call, and thus increases the average load on a proxy. The per- centile increase in average node load for higher call volumes is small. Hence, when the call volume is high (VoIP network 45 is heavily loaded), AARSP imposes small overheads (20-40%) ’rsp’ 40 ’aarsp-k=10’ compared with RSP. On the other hand, when the call volume Node Load (# concurrent calls) ’aarsp-k=20’ is low (VoIP network is lightly loaded), AARSP incurs 2-3 35 ’aarsp-k=50’ times the average node load when compared to RSP. However, 30 this 2-3x increase in node load is incurred when the VoIP 25 network is itself lightly loaded; hence, AARSP does not harm the performance and scalability of the VoIP network. 20 15 VI. R ELATED W ORK 10 Privacy has long been a hot button issue for both the VoIP 5 clients and the law enforcement bodies. On one hand, users 0 want their phone conversations to be anonymous; anonymity 1 2 4 8 16 32 64 128 256 512 offers them possible deniability thereby shielding them from Call Volume (Erlangs) law enforcement bodies. On the other hand FCC (Federal Fig. 29. Node Load 8 http://www.planet-lab.org/
12 Communications Commission) [13] considers capability of track- an anonymity aware route setup protocol (AARSP) that allows ing VoIP calls “of paramount importance to the law enforce- clients to specify personalized privacy requirements for their ment and the national security interests of the United States”. voice calls (on a per-client per-call basis) using a quantifiable Similar to other VoIP privacy papers [30], we leave aside the k-anonymity metric. We have implemented our proposal on the controversy between anonymity and security. Instead we focus Phex client and presented detailed experimental evaluation that on technical feasibility of privacy attacks and defenses on VoIP demonstrates the performance and scalability of our protocol, networks. while meeting customizable privacy guarantees. Mix [9] is a routing element that attempts to hide correspon- dences between its input and output messages. A large number R EFERENCES of low latency anonymizing networks have been built using [1] GT-ITM: Georgia tech internetwork topology models. the concept of a mix network [9][25]. Onion routing [16] and http://www.cc.gatech.edu/projects/gtitm/. its second generation Tor [10] aim at providing anonymous [2] The network simulator NS-2. http://www.isi.edu/nsnam/ns/. [3] The network simulator NS-2: Topology generation. transport of TCP flows over the Internet. ISDN mixes [20] http://www.isi.edu/nsnam/ns/ns-topogen.html. proposes solutions to anonymize phone calls over traditional [4] Phex client. http://www.phex.org. PSTN (Public Switched Telephone Networks). In this paper we [5] Skype - the global internet telephone company. http://www.skype.com. [6] Telegeography research. http://www.telegeography.com. have focused on VoIP networks given its recent wide spread [7] A. Back, I. Goldberg, and A. Shostack. Freedom 2.1 security issues and adoption9 . analysis. Zero Knowledge Systems, Inc. white paper, 2001. It is widely acknowledged that low latency anonymizing [8] A. Blum, D. Song, and S. Venkataraman. Detectiong of interactive stepping stones: Algorithms and confidence bounds. In 7th RAID, 2004. networks [10][16][7] are vulnerable to timing analysis attacks [9] D. Chaum. Untraceable electronic mail, return addresses, and digital [28][25], especially from well placed malicious attackers [33]. pseudonyms. In Communications of ACM, 24(2): 84-88, 1981. Several papers have addressed the problem of tracing encrypted [10] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The second traffic using timing analysis [32][34][36][31][8][11][30]. All [11] generation onion router. In 13th USENIX Security Symposium, 2000. D. L. Donoho, A. G. Flesia, U. Shankar, V. Paxson, J. Coit, and these papers use inter-packet timing characteristics for trac- S. Staniford. Multiscale stepping stone detection: Detecting pairs of ing traffic. Complementary to all these approaches, we have jittered interactive streams by exploiting maximum tolerable delay. In introduced flow analysis attacks that target the shortest path [12] 5th RAID, 2002. Eclipse. Aspectj compiler. http://eclipse.org/aspectj. property of voice routes and presented techniques to provide [13] FBI. Letter to FCC. http://www.askcalea.com/docs/20040128.jper.letter.pdf. customizable anonymity guarantees in a VoIP network. Unlike [14] B. Fortz and M. Thorup. Optimizing OSPF/IS-IS weights in a changing world. In IEEE Journal on Special Areas in Communication, 2002. the timing analysis attacks, our approach does not rely upon [15] M. J. Freedman and R. Morris. Tarzan: A peer-to-peer anonymizing inter-packet times to detect caller-receiver pairs; instead we network layer. In 9th ACM CCS, 2002. analyze the volume of flow in the VoIP network and deduce [16] D. Goldschlag, M. Reed, and P. Syverson. Onion routing for anonymous and private internet connections. In Communications of ACM, Vol 42(2), possible caller-receiver pairs using the flow information and 1999. the underlying VoIP network topology. [17] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, and Tarzan [15] presents an anonymizing network layer using a I. Stoica. The impact of DHT routing geometry on resilience and proximity. In ACM SIGCOMM, 2003. gossip-based peer-to-peer protocol. We note that flow analysis [18] J. Y. Hui. Switching and traffic theory for integrated broadband attacks target the shortest path property and not the protocol networks. Academic Press ISBN: 079239061X, 1990. used for constructing the route itself; hence, a gossip based [19] G. Perng, M. K. Reiter, and C. Wang. M2: Multicasting mixes for efficient and anonymous communication. In IEEE ICDCS, 2006. shortest path setup protocol is equally vulnerable to flow anal- [20] A. Pfitzmann, B. Pfitzmann, and M. Waidner. ISDN-MIXes: Untraceable ysis attacks. communication with small bandwidth overhead. In GI/ITG Conference Traditionally, multicast and broadcast protocols have been on Communication in Distributed Systems, 1991. used to protect receiver anonymity [21][24]. However, in a [21] A. Pfitzmann and M. Waidner. Networks without user observability. In Computers and Security, 2(6): 158-166. multicast based approach achieving k-anonymity may increase [22] L. Qiu, V. N. Padmanabhan, and G. M. Voelker. On the placement of the network traffic by k-fold. In contrast our paper attempts web server replicas. In 12th IEEE INFOCOM, 2001. to reroute and mix existing voice flows and thus incurs sig- [23] S. Saroiu, P. K. Gummadi, and S. D. Gribble. A measurement study of peer-to-peer file sharing systems. In Multimedia Computing and nificantly smaller overhead on the VoIP network. Networks (MMCN), 2002. [24] C. Shields and B. N. Levine. A protocol for anonymous communication over the internet. In ACM CCS, 2000. VII. C ONCLUSION [25] V. Shmatikov and M. H. Wang. Timing analysis in low latency mix In this paper we have addressed the problem of providing networks: Attacks and defenses. In 11th ESORICS, 2006. [26] G. I. Sound. VoIP: Better than PSTN? privacy guarantees in peer-to-peer VoIP networks. First, we http://www.globalipsound.com/demo/tutorial.php. have developed flow analysis attacks that allow an adversary [27] M. Srivatsa, A. Iyengar, and L. Liu. Privacy in voip networks: A k- (external observer) to identify a small and accurate set of anonymity approach. Technical report, IBM Research RC24625, 2008. [28] P. Syverson, G. Tsudik, M. Reed, and C. Landwehr. Towards an analysis candidate receivers even when all the nodes in the network of onion routing security. In Workshop on Design Issues in Anonymity are honest. We have used network flow analysis and statistical and Unobservability, 2000. inference to study the efficacy of such an attack. Second, we [29] C. J. Van-Rijsbergen. Information retrieval. In Second Edition, Butterworths, London, 1979. have developed mixing based techniques to provide a guaran- [30] X. Wang, S. Chen, and S. Jajodia. Tracking anonymous peer-to-peer teed level of anonymity for VoIP clients. We have developed VoIP calls on the internet. In 12th ACM CCS, 2005. [31] X. Wang and D. Reeves. Robust correlation of encrypted attack traffic 9 According to TeleGeography Research [6], worldwide VoIP’s share of through stepping stones by manipulation of interpacket delays. In 10th voice traffic has grown from 12.8% in 2003 to about 44% in 2007 ACM CCS, 2003.
13 [32] X. Wang, D. Reeves, and S. Wu. Inter-packet delay based correlation for tracing encrypted connections through stepping stones. In 7th ESORICS, 2002. [33] X. Y. Wang and D. S. Reeves. Robust correlation of encrypted attack traffic through stepping stones by manipulation of interpacket delays. In ACM CCS, 2003. [34] K. yoda and H. Etoh. Finding a connection chain for tracing intruders. In 6th ESORICS, 2000. [35] E. W. Zegura, K. Calvert, and S. Bhattacharjee. How to model an internetwork? In IEEE Infocom, 1996. [36] Y. Zhang and V. Paxon. Detecting stepping stones. In 9th USENIX Security Symposium, 2000. Mudhakar Srivatsa is a research scientist at IBM T. J. Watson Research Center where he conducts re- search at the intersection of networking and security. He received a B.Tech degree in Computer Science PLACE and Engineering from IIT, Madras, and a PhD de- PHOTO gree in Computer Science from Georgia Tech. His HERE research interests primarily include security and re- liability of large scale networks, secure information flow, and risk management. Arun Iyengar received the Ph.D. degree in com- puter science from MIT. He does research and de- velopment into Web performance, distributed com- puting, and high availability at IBM’s T.J. Watson PLACE Research Center. Arun is Co-Editor-in-Chief of the PHOTO ACM Transactions on the Web, Founding Chair of HERE IFIP Working Group 6.4 on Internet Applications Engineering, and an IBM Master Inventor. Ling Liu is an associate professor at the College of Computing at the Georgia Institute of Technol- ogy. There, she directs the research programs in the Distributed Data Intensive Systems Lab (DiSL), ex- PLACE amining research issues and technical challenges in PHOTO building large scale distributed computing systems HERE that can grow without limits. Her research group has produced a number of software systems that are either open sources or directly accessible on- line, among which the most popular are WebCQ and XWRAPElite. She is on the editorial board of sev- eral international journals, such as the IEEE Transactions on Knowledge and Data Engineering, the International Journal of Very large Database Systems (VLDBJ), and the International Journal of Web Services Research. Hongbo Jiang is an associate professor at the De- partment of Electronics and Information Engineering at the Huazhong University of Science and Technol- ogy. Hongbo leads Netwoked and Communication PLACE Systems Research group, wherein he conducts re- PHOTO search on computer networking, especially focused HERE on algorithms and architectures for high performance wireless networks.
You can also read