Efficient Anonymization of the SocioNet with the Aid of Rumor Riding
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Efficient Anonymization of the SocioNet with the Aid of Rumor Riding Hiroki IIZUKA1 and Satoshi FUJITA1 1 Department of Information Engineering, Hiroshima University Higashi-Hiroshima, 739-8527, Japan Abstract— In this paper, we propose an anonymized object efficiency of query flooding can be significantly improved search scheme for the SocioNet which is an unstructured compared with random overlays [4]. However, although it P2P based on the notion of the similarity of interests. The certainly improves the efficiency, it causes a serious risk for proposed scheme is an application of a randomized object each user so that the fact of issuing a query, the fact of search scheme proposed by Liu et al. called Rumor Riding responding to the query and the content of the query and (RR, for short). We propose two techniques to overcome the the reply are disclosed to all peers to have similar interests. inefficiency of a simple application of the RR to the SocioNet. In other words, such a simple flooding could not preserve The performance of the proposed scheme is evaluated by the privacy of users which is a crucial drawback of the most simulation. The simulation result indicates that the proposed of existing flooding-based object search schemes. scheme reduces the number of messages to a half of a simple In this paper, we focus on the SocioNet as the underlying combination, and additionally, shows that the number of similarity-based P2P, and propose a scheme to preserve the delegates selected in the RR severely affects the success rate anonymity of users in the network. The proposed scheme of the overall scheme particularly when the TTL is not large. is an application of a randomized method proposed by Liu et al. called Rumor Riding (RR, for short) [5]. The key idea of the RR is to select delegates through random Keywords: Peer-to-Peer content sharing, anonymity of users, walk and to make those delegates to conduct actual query object search, Rumor Riding, SocioNet. flooding and the response to the query (see Section 3 for the details). It is evaluated by simulation that such a randomized 1. Introduction approach could certainly preserve the anonymity of users Recent advancement of network technologies enables us while keeping the cost reasonably low. However, a direct to easily share various contents over the Internet. For exam- application of the RR to the SocioNet is not efficient since ple, YouTube attracts more than 1 billion unique user visits the RR was originally proposed for random overlays and the per month and the upload of 100 hours of video every minute application of the RR loses the benefit of the SocioNet so in 2014. A key issue to realize such a content sharing over that the distance between the questioner and the respondent a large network is how to find the location of a requested is short. To overcome such an issue, this paper proposes two object. In particular, the support of an efficient object search techniques to improve the efficiency of the object search in is a crucial issue for Peer-to-Peer (P2P) applications since the SocioNet in terms of the number of messages which is in those systems, objects are generally stored in the local necessary to keep a high success rate. storage of each peer without being collected to a specific server as in classical content sharing services. The performance of the proposed scheme is evaluated by Flooding of queries with a designated TTL (time to live) is simulation. The result of simulation indicates that it reduces a simple but commonly used technique to realize an efficient the number of messages to a half of a simple combination of object search in P2P networks. There are many proposals the RR and the SocioNet, and additionally, it shows that the concerned with the variations of the query flooding, which number of delegates severely affects the success rate when includes LightFlood [3], Diff-Flooding [2] and UMPS [10]. the given TTL is not large. More precisely, we found that Among them, we are interested in the object search based the number of delegates, which can be controlled by tuning on an unstructured overlay reflecting the interest of the parameters used in the RR, should be at least three to attain users. SocioNet [4] and UIM [1] are representatives of such a high success rate while keeping the number of messages approaches. The key idea of such similarity-based overlays sufficiently low. is to connect peers to have similar interests by a link so The remainder of this paper is organized as follows. that the peer which issues a query, called questioner, can Sections 2 and 3 describe an overview of the SocioNet and be connected with a peer which has an object matching the the basic flow of the RR, respectively. Section 4 describes the query, called respondent, through a path consisting of a proposed scheme. Section 5 describes the simulation result. small number of links. By adopting such an overlay, the Finally, Section 6 concludes the paper with future work.
2. SocioNet 2.1 Overview rC Flooding of decrypted query The SocioNet is an unstructured P2P based on the notion Sower of similarity of interests. Each link in the SocioNet is either a similarity link or a random link. The former is intended to connect peers to have similar interests so that a query issued Questioner by a peer easily hits a target object with high probability which is expected to be held by a peer to have similar interest to the questioner, and the latter is intended to connect a rK pair of remote peers so that the resulting network has a Random walk of query rumors short diameter. Random links are established by rewiring similarity links with a certain probability β through random walk, as in the Watts and Strogatz’s scheme to construct Fig. 1: Steps 1 and 2 in the Rumor Riding. small-world networks [9] (the value of parameter β is set to around 0.2 to 0.3 in the SocioNet). The search of a target object is done through the flooding 2.3 Update Procedure of a query as in conventional P2Ps, while the existence of similarity links could significantly reduce the number of With the above notions, the SocioNet establishes similar- message transmissions required for attaining a given hit rate ity links in two different ways. The first way is to use a server compared with random overlays such as Gnutella [8]. which keeps the similarity for all pairs of peers to select pairs to have high similarity in a centralized manner. The second 2.2 Similarity of Peers way, which will be adopted in the proposed scheme, is to use random walk. More concretely, each peer i which wishes The similarity of peers is defined as follows. Let Oi be to update its similarity links first conducts x independent the set of objects held by peer i. Assume that each object random walks, where x is the (maximum) degree of the is attached tags representing the attributes of the object, peer in the overlay. At any peer in the random walk, it stops e.g., a music file of the performance of Benny Goodman with probability c/ log N for some constant c so that the will be attached tags Jazz, Clarinet and Swing. Let T = expected length becomes O(log N ), where N is the number {t1 , t2 , . . . , tj , . . .} denote the (universal) set of tags. For of peers in the network, and the peer at the stopped point is each peer i and tag tj ∈ T , let Oi,tj denote the set of regarded as the candidate for new neighbors. Among those x objects attached tag tj in set Oi . Then, the relevance of tag candidates and the currently adjacent x peers, peer i selects tj with peer i is defined as x peers to have highest similarity to peer i, and updates def |Oi,tj | neighbors so that it is connected to the selected x peers. wi,tj = . |Oi | 3. Rumor Riding For example, if peer i has 100 objects and 50 of them are Rumor Riding (RR) is a scheme to realize an anonymous attached tag Jazz, then wi,Jazz = 10050 = 0.5. The profile of object search in unstructured P2Ps. The basic idea of the peer i, denoted by w⃗i , is a vector of relevances, i.e., RR is to delegate the roles of the flooding of a query and def the reply to the query to randomly selected peers called w ⃗i = (wi,t1 , wi,t2 , . . . , wi,tj , . . .). sowers. With such a randomized mechanism, we can keep With the above notions, the similarity of peer j for peer i is the anonymity of the questioner and the respondent. In defined as follows addition, to keep the security of message transmissions, each message is encrypted by the sender of the message using the def |Oi | 1 sim(i, j) = × , (1) public key of the receiver. |Oj | cos(w ⃗i , w⃗j ) The protocol for the object search in RR consists of five where peer j with a smaller sim(i, j) is more favorable for steps. In the following, we explain each step in detail. peer i as an adjacent peer connected by a similarity link. Step 1: Generation of Query Rumors The reader should note that the above notion of similarity is Let i be the questioner. At first, peer i generates a public not symmetrical. If fact, even if two peers a and b have the key Ki+ and inserts it to the content of the query, where same profile w ⃗ = w⃗a = w⃗b , when |Oa | < |Ob |, we have Ki+ will be used to encrypt the reply to the query by the sim(a, b) < 1 < sim(b, a), respondent. Let q be the plain text of the resulting message including Ki+ . Peer i then encrypts q with a symmetric key that is, b would be favorable for a but the reserve is not true. K into a cipher text C, then organizes two query rumors rK
Direct forwarding of rC decrypted reply Sower Direct forwarding of rC’ decrypted ACK Sower Sower Questioner Questioner Respondent Respondent Random walk rK rK’ of rumors Sower Random walk of rumors Fig. 2: Step 3 in the Rumor Riding. Fig. 3: Step 4 in the Rumor Riding. and rC , where rK and rC are messages containing K and C, as the sower concerned with the respondent as follows: 1) it respectively. Those rumors are sent out to different neighbors decrypts R and the IP address of s from reply rumors, and of peer i and start an (independent) random walk with an 2) it directly forwards reply rumors to sower s. See Figure appropriate TTL (more precisely, peer i generates k such 2 for illustration. pairs of rumors to increase the probability of those rumors Step 4: ACK Message “meeting” at a peer, where k is an appropriate parameter; After receiving reply rumors rK ′ and rC ′ , the questioner i it is experimentally verified that k and the TTL should be decrypts R from C ′ with symmetry key K ′ and then decrypts determined so that their product is from 100 to 200, i.e., if the reply message from R with the secret key of peer i. k is four then the TTL should be from 25 to 50 [5]). See Then peer i sends an ACK message to the respondent j Figure 1 for illustration. in the following manner: 1) it encrypts the ACK message Step 2: Sowers Concerned with the Questioner into a cipher text with the public key of j (which should In the RR, a peer which receives both rK and rC serves as be contained in the reply message); 2) it organizes two a delegate of the questioner called sower. More concretely, rumors from the cipher text as in previous steps; and 3) it after decrypting message q from K and C, each sower starts sends out those rumors to different neighbors, as before. The the flooding of q to its neighbors and waits for the reply to sower conceded with the ACK message directly forwards the query from an appropriate respondent. After receiving a the received rumors to the sower concerned with the reply reply message from the respondent, which is encrypted with message described in Step 3, which will be delivered to the the public key Ki+ of the questioner i and is separated into respondent j by traveling the path used in the random walk two rumors similar to the separation of q into rK and rC , in the reverse direction. See Figure 3 for illustration. it sends back those rumors to the questioner along the paths Step 5: Transmission of Object traveled by rK and rC , respectively, in the reverse direction. After receiving the (encrypted) ACK message, the respon- The reader should note that to enable such a behavior of the dent j decrypts it into plain text with the symmetry key sower and the other intermediate peers, the RR should force contained in a rumor and the secret key of j. After that, every peer to cache all rumors passing through the peer for it moves to the actual transmission of the requested object a certain time so that it is expired after the reply message is using digital envelope. More concretely, after encrypting successively received by the questioner. the object into a cipher text F , peer j transfers it to Step 3: Reply from the Respondent the questioner through random walk of two rumors, direct Suppose that query q transmitted by a sower s is received forwarding of the rumors to the sower concerned with the by a peer j holding an object matching the query. After ACK message, and the delivery of rumors by traveling the receiving q, peer j generates a reply message and encrypts path used in the random walk of Step 4 in the reverse it with the public key Ki+ of the questioner i. Let R be direction. the resulting cipher text. Peer j then encrypts R and the IP address of s with a symmetry key K ′ into a cipher text C ′ , 4. Proposed Method then organizes two reply rumors rK ′ and rC ′ similar to Step 1. Those rumors are sent out to different neighbors and start 4.1 Design Issues an (independent) random walk, as before. If a peer receives This section describes the details of the proposed scheme. both rK ′ and rC ′ from its neighbors, then the peer serves The goal of the scheme is to realize an anonymous object
two to five, using both of random and similarity links. Each copy of the query stops the propagation when: 1) the TTL exhausts or 2) it arrives at a peer holding an object matching the query. In addition, if it arrives at a peer which has a Sower similar interest to the query, then it switches the mode of flooding so that it merely uses similarity links to realize an efficient intensification of the exploration. The similarity of a peer j with a query q is calculated as follows. Recall that in the SocioNet, each peer is associated with a profile representing its interests in the form of a vector of relevances to the tags in T . The idea is to associate Similarity link a set of tags to each query issued by the questioners1 . If Random link query q is associated with a single tag t drawn from set T , the similarity of the query with a peer j is calculated in Fig. 4: Dynamic switch of the mode of flooding during the the following three steps: 1) extract the relevance wj,t of j query propagation. with tag t from the profile w⃗j ; 2) extract top α elements from the profile with the maximum relevance; and 3) if wj,t is contained in the extracted α elements, then we judge search in the SocioNet using the notion of the RR described that the similarity between peer j and query q is high. in Section 3. However, if we directly apply the techniques If q is associated with two or more tags, we extend the used in the RR to the SocioNet, we will face to the following above scheme so as to check whether the majority of tags issues: 1) As was described in Section 2, the SocioNet is associated with the query are contained in the top α elements designed in such a way that the questioner is located in in the profile vector. the neighborhood of the respondent. However, the direct Figure 4 illustrates a running example of the scheme. In application of the RR to the SocioNet loses such a benefit this figure, the peer holding an object matching the query of the SocioNet, since in the RR, the actual flooding is issued by the questioner is painted red, and peers which conducted by a sower which is randomly selected from all has a similar interest with the query is painted orange. peers in the network, i.e., we cannot guarantee that the sower After decrypting the query from two query rumors received is in the neighborhood of the respondent. 2) The search in from different neighbors, the sower, which is painted green, the RR is based on a simple flooding, i.e., it repeats the initiates a flooding of the query by setting the TTL to a small forwarding of a received query to all neighbors until the value. The flooded message uses all links within the TTL, TTL given to the query exhausts. However, such a simple and after arriving at an orange peer, which has a similar scheme does not fully utilize the structure of the SocioNet interest to the query, it switches the mode to the flooding so that two types of links play different roles in the overlay, without random links. i.e., random link connects remote peers and similarity link connects peers to have similar interests. This means that to 4.3 Similarity-Based Filtering improve the efficiency of the object search, the propagation The second technique is to filter queries at each similarity of a query from the selected sower should be conducted by link by the similarity of the receiver to the query. Suppose carefully considering the difference of the role of links. that peer j receives a query q associated with a set of In the following subsections, we propose two techniques tags. In the first technique, all similarity edges outgoing to overcome those issues. from j are used for the propagation of the query unless the 4.2 Dynamic Switch of the Mode of Flooding TTL is exhausted. However, since the similarity of peers is defined by the cosign similarity of profiles and the number of The first technique is to take into account the difference of objects held by each peer (see Equation (1) for the details), the role of links during the propagation of query messages. a neighbor ℓ of j connected by a similarity edge (j, ℓ) might More concretely, we devolve the role of diversification to not be relevant to q even if peer j is relevant to q and the random links in an early phase of the query propagation and value of sim(j, ℓ) is small. For example, consider the case the role of intensification to similarity links in the remaining in which peers j and ℓ have 200 objects attached tag Jazz, steps of the query propagation. peer j has 20 objects attached tag Clarinet and peer ℓ has The concrete operation proceeds as follows. Let s be a sower concerned with the questioner which received two 1 The simplest way to realize such a situation is to ask questioners to query rumors rK and rC from its different neighbors. After designate tags associated with the query. Another possible way is to adopt the technique of automatic tag attachment which has been proposed in the decrypting message q from C with K, s starts the flooding literature [7]. In the evaluation described in Section 5, we assume that each of q to its neighbors by setting TTL to a small value, e.g., query is attached a single tag by the questioner.
Table 1: Parameters used in the simulation. 3000 The number of peers 10000 The number of objects 1000 COMB Average number of messages 2500 The number of peers holding matching object 100 PROP Average degree of peers 6 TECH1 2000 TECH2 Rewiring probability 0.3 TTL of the first phase 2 Threshold θ 0.8 1500 1000 no object attached tag Clarinet. In such a case, a query q 500 with tag Clarinet received by peer j should not be forwarded to peer ℓ, since ℓ has no object attached tag Clarinet and 0 2 3 4 5 such a fact can be detected by analyzing the relevance of TTL the receiver ℓ to the query. The filtering of queries is conducted by using the cosign Fig. 5: The average number of messages issued in four similarity. More concretely, each query q is associated with schemes. a binary vector ⃗q so that the ith element in the vector takes value 1 if and only if the ith tag (in set T ) is associated with 100 q. Then, the similarity σ(q, ℓ) between peer ℓ and query q 90 is calculated as σ(q, j) = cos(⃗q , w⃗ℓ ), and the similarity link 80 connecting to ℓ stops the forwarding of q if the value of 70 σ(q, j) is smaller than a predetermined threshold θ. Success rate [%] 60 COMB 5. Evaluation 50 40 PROP 5.1 Setup 30 We evaluate the performance of the proposed scheme by 20 simulation. The simulation is conducted by using PeerSim 10 simulator [6], and as the competitor, we use a simple 0 2 3 4 5 combination of the RR and the SocioNet in which each TTL sower concerned with the questioner initiates a flooding of the decrypted query with a designated TTL. In the following, Fig. 6: The success rate of two schemes obtained by dividing we denote the above combined scheme as COMB and the the number of successful runs by the total number of runs. proposed scheme with two techniques PROP, where for the reader’s reference, we also show the result for the scheme merely with the first technique denoted as TECH1 and that 5.2 Number of Messages with the second technique denoted as TECH2. The metric Figure 5 illustrates the result on the number of messages. for the evaluation is the number of messages and the success The horizontal axis is the TTL of the flooding and four rate, which are averaged over 30 runs. curves correspond to the result for COMB, PROP, TECH1 Parameters used in the simulation are given as follows. and TECH2, respectively. Although there is no big differ- The number of peers and the number of objects are fixed to ence among four schemes when TTL is two, we could find a 10000 and 1000, respectively, where each object can have significant reduction of the number of messages as the TTL several copies in the overlay. The number of copies held by becomes large. In particular, the amount of improvement of each peer follows a Poisson distribution with mean λ = 6. COMB by PROP is about 50% when TTL is five. The popularity of the object matching a query is set to 1%, i.e., we consider a situation in which among 10000 peers, 5.3 Success Rate only 100 peers hold the object matching the query. The Figure 6 compares the success rate of COMB and PROP, overlay network consisting of similarity links is generated by which is calculated by dividing the number of successful the Barbási-Albert (BA) model so that the average degree of runs by the total number of runs in the simulation, where each peer is six and the probability of rewiring a similarity the horizontal axis is the TTL of query flooding, as before. link into a random link is set to 0.3. TTL of the first phase The success rate of COMB monotonically grows as the TTL of the query forwarding used in the first technique is set to increases, which reaches 100% when TTL is four. However, two. Finally, we fix threshold θ used in the second technique the success rate of PROP is not stable with respect to the to 0.8. Those parameters as summarized in Table 1. monotonic change of the TTL; e.g., the success rate when
100 100 90 90 80 80 70 70 Success rate [%] Success rate [%] 60 COMB 60 COMB PROP 50 50 TECH1 PROP 40 TECH2 40 30 30 20 20 10 10 0 0 2 3 4 5 1 2 3 4 TTL Number of sowers Fig. 7: Comparison of the success rate of four schemes which Fig. 8: Impact of the number of sowers to the success rate is calculated by excluding runs with two or less sowers. (TTL is fixed to three). TTL is three seems to be too small compared with the a detailed analysis of the behavior of the proposed scheme, success rate for other TTLs. since in the current work, we merely evaluate the average A reason of such an instability of the success rate is due number of messages and the success rate. to the small number of sowers generated by the RR. See Figure 7 for illustration. This figure redraws the curves of References the success rate after excluding simulation runs in which [1] G. Chen, C.-P. Low, and Z.-H. Yang. “Enhancing Search Performance the number of sowers generated by the random walks is two in Unstructured P2P Networks Based on Users’ Common Interest.” or less. As shown in the figure, by excluding such runs, IEEE Trans. on Parallel and Distributed Systems, 19(6): 821–836, 2008. we have a reasonable grow of the success rate, and can [2] Y.-D. Gong, J. Hu, Z.-J. Dong, S.-N. Wang, and S.-J. Hu. “Improved make the following observations: 1) the use of the second Flooding-Based Resource Discovery.” In Proc. of the 2nd Interna- technique reduces the success rate (recall that the second tional Workshop on Intelligent Systems and Applications (ISA), 2010, pages 1–4. technique stops the forwarding of the query to a peer to have [3] S. Jiang, L. Guo, and X.-D. Zhang. “LightFlood: an efficient flooding a profile which is not similar to the query); and 2) COMB scheme for file search in unstructured peer-to-peer systems.” In Proc. is better than TECH2, i.e., the simultaneous use of the first of International Conference on Parallel Processing, 2003, pages 627– 635. technique with the second technique relaxes the badness of [4] K. C.-J. Lin, C.-P. Wang, C.-F. Chou, and L. Golubchik. “SocioNet: the second scheme. The conjecture such that the number of A Social-Based Multimedia Access System for Unstructured P2P sowers affects the success rate is confirmed by Figure 8, Networks.” IEEE Trans. on Parallel and Distributed Systems, 21(7): 1027–1041, 2010. which illustrates the impact of the number of sowers to the [5] Y.-H. Liu, J.-S. Han, J.-L. Wang. “Rumor Riding: Anonymizing success rate by fixing the TTL to three. Unstructured Peer-to-Peer Systems.” IEEE Trans. on Parallel and Distributed Systems, 22(3): 464–475, 2011. [6] A. Montresor and M. Jelasity. “PeerSim: A scalable P2P simulator.” In 6. Concluding Remarks Proc. of the 9th International Conference on Peer-to-Peer Computing This paper proposes an anonymized object search scheme (P2P’09), 2009, pages 99–100. [7] T.-T. Qin and S. Fujita. “Automatic Tag Attachment Scheme for for the SocioNet. More precisely, we propose two techniques Efficient File Search in Peer-To-Peer File Sharing Systems.” In Proc. to overcome the inefficiency of a simple application of the International Conference on Advances in Social Network Analysis and Rumor Riding to the SocioNet, where the first technique Mining (ASONAM 2011), 2011, pages 507–511. [8] Y. Wang, X.-C. Yun, and Y.-F. Li. “Analyzing the Characteristics of is to dynamically switch the kind of links used for the Gnutella Overlays.” In Proc. of the 4th International Conference on query propagation and the second technique is to filter Information Technology, 2007, pages 1095–1100. queries at each similarity link by the similarity of the [9] D. J. Watts and S. H. Strogatz. “Collective dynamics of ’small-world’ networks.” Nature, 393(6684): 440–442, 1998. receiver to the query. The performance of the scheme is [10] A. Wu, X.-S. Liu, and K.-J. Liu. “Efficient flooding in peer-to-peer evaluated by simulation. The simulation result indicates that networks.” In Proc. of the 7th International Conference on Computer- the proposed scheme reduces the number of messages of a Aided Industrial Design and Conceptual Design (CAIDCD ’06), 2006, pages 1–6. simple combination of the SocioNet and the Rumor Riding to a half without significantly reducing the success rate. A future work is to verify the effect of the popularity of the searched object to the performance, which was fixed to 1% in the current simulation. Another key issue is to conduct
You can also read