Efficient Anonymization of the SocioNet with the Aid of Rumor Riding

Page created by Dustin Hopkins

Current Events

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Efficient Anonymization of the SocioNet
with the Aid of Rumor Riding
Hiroki IIZUKA1 and Satoshi FUJITA1
1 Department of Information Engineering, Hiroshima University

Higashi-Hiroshima, 739-8527, Japan

Abstract— In this paper, we propose an anonymized object efficiency of query flooding can be significantly improved
search scheme for the SocioNet which is an unstructured compared with random overlays [4]. However, although it
P2P based on the notion of the similarity of interests. The certainly improves the efficiency, it causes a serious risk for
proposed scheme is an application of a randomized object each user so that the fact of issuing a query, the fact of
search scheme proposed by Liu et al. called Rumor Riding responding to the query and the content of the query and
(RR, for short). We propose two techniques to overcome the the reply are disclosed to all peers to have similar interests.
inefficiency of a simple application of the RR to the SocioNet. In other words, such a simple flooding could not preserve
The performance of the proposed scheme is evaluated by the privacy of users which is a crucial drawback of the most
simulation. The simulation result indicates that the proposed of existing flooding-based object search schemes.
scheme reduces the number of messages to a half of a simple In this paper, we focus on the SocioNet as the underlying
combination, and additionally, shows that the number of similarity-based P2P, and propose a scheme to preserve the
delegates selected in the RR severely affects the success rate anonymity of users in the network. The proposed scheme
of the overall scheme particularly when the TTL is not large. is an application of a randomized method proposed by
Liu et al. called Rumor Riding (RR, for short) [5]. The
key idea of the RR is to select delegates through random
Keywords: Peer-to-Peer content sharing, anonymity of users,
walk and to make those delegates to conduct actual query
object search, Rumor Riding, SocioNet.
flooding and the response to the query (see Section 3 for the
details). It is evaluated by simulation that such a randomized
1. Introduction approach could certainly preserve the anonymity of users
Recent advancement of network technologies enables us while keeping the cost reasonably low. However, a direct
to easily share various contents over the Internet. For exam- application of the RR to the SocioNet is not efficient since
ple, YouTube attracts more than 1 billion unique user visits the RR was originally proposed for random overlays and the
per month and the upload of 100 hours of video every minute application of the RR loses the benefit of the SocioNet so
in 2014. A key issue to realize such a content sharing over that the distance between the questioner and the respondent
a large network is how to find the location of a requested is short. To overcome such an issue, this paper proposes two
object. In particular, the support of an efficient object search techniques to improve the efficiency of the object search in
is a crucial issue for Peer-to-Peer (P2P) applications since the SocioNet in terms of the number of messages which is
in those systems, objects are generally stored in the local necessary to keep a high success rate.
storage of each peer without being collected to a specific
server as in classical content sharing services. The performance of the proposed scheme is evaluated by
Flooding of queries with a designated TTL (time to live) is simulation. The result of simulation indicates that it reduces
a simple but commonly used technique to realize an efficient the number of messages to a half of a simple combination of
object search in P2P networks. There are many proposals the RR and the SocioNet, and additionally, it shows that the
concerned with the variations of the query flooding, which number of delegates severely affects the success rate when
includes LightFlood [3], Diff-Flooding [2] and UMPS [10]. the given TTL is not large. More precisely, we found that
Among them, we are interested in the object search based the number of delegates, which can be controlled by tuning
on an unstructured overlay reflecting the interest of the parameters used in the RR, should be at least three to attain
users. SocioNet [4] and UIM [1] are representatives of such a high success rate while keeping the number of messages
approaches. The key idea of such similarity-based overlays sufficiently low.
is to connect peers to have similar interests by a link so The remainder of this paper is organized as follows.
that the peer which issues a query, called questioner, can Sections 2 and 3 describe an overview of the SocioNet and
be connected with a peer which has an object matching the the basic flow of the RR, respectively. Section 4 describes the
query, called respondent, through a path consisting of a proposed scheme. Section 5 describes the simulation result.
small number of links. By adopting such an overlay, the Finally, Section 6 concludes the paper with future work.

2. SocioNet
2.1 Overview rC
Flooding of
decrypted query
The SocioNet is an unstructured P2P based on the notion Sower
of similarity of interests. Each link in the SocioNet is either
a similarity link or a random link. The former is intended to
connect peers to have similar interests so that a query issued Questioner
by a peer easily hits a target object with high probability
which is expected to be held by a peer to have similar interest
to the questioner, and the latter is intended to connect a rK
pair of remote peers so that the resulting network has a Random walk of query rumors
short diameter. Random links are established by rewiring
similarity links with a certain probability β through random
walk, as in the Watts and Strogatz’s scheme to construct
Fig. 1: Steps 1 and 2 in the Rumor Riding.
small-world networks [9] (the value of parameter β is set to
around 0.2 to 0.3 in the SocioNet).
The search of a target object is done through the flooding 2.3 Update Procedure
of a query as in conventional P2Ps, while the existence
of similarity links could significantly reduce the number of With the above notions, the SocioNet establishes similar-
message transmissions required for attaining a given hit rate ity links in two different ways. The first way is to use a server
compared with random overlays such as Gnutella [8]. which keeps the similarity for all pairs of peers to select pairs
to have high similarity in a centralized manner. The second
2.2 Similarity of Peers way, which will be adopted in the proposed scheme, is to
use random walk. More concretely, each peer i which wishes
The similarity of peers is defined as follows. Let Oi be to update its similarity links first conducts x independent
the set of objects held by peer i. Assume that each object random walks, where x is the (maximum) degree of the
is attached tags representing the attributes of the object, peer in the overlay. At any peer in the random walk, it stops
e.g., a music file of the performance of Benny Goodman with probability c/ log N for some constant c so that the
will be attached tags Jazz, Clarinet and Swing. Let T = expected length becomes O(log N ), where N is the number
{t1 , t2 , . . . , tj , . . .} denote the (universal) set of tags. For of peers in the network, and the peer at the stopped point is
each peer i and tag tj ∈ T , let Oi,tj denote the set of regarded as the candidate for new neighbors. Among those x
objects attached tag tj in set Oi . Then, the relevance of tag candidates and the currently adjacent x peers, peer i selects
tj with peer i is defined as x peers to have highest similarity to peer i, and updates
def |Oi,tj | neighbors so that it is connected to the selected x peers.
wi,tj = .
|Oi |
3. Rumor Riding
For example, if peer i has 100 objects and 50 of them are Rumor Riding (RR) is a scheme to realize an anonymous
attached tag Jazz, then wi,Jazz = 10050
= 0.5. The profile of object search in unstructured P2Ps. The basic idea of the
peer i, denoted by w⃗i , is a vector of relevances, i.e., RR is to delegate the roles of the flooding of a query and
def the reply to the query to randomly selected peers called
w
⃗i = (wi,t1 , wi,t2 , . . . , wi,tj , . . .).
sowers. With such a randomized mechanism, we can keep
With the above notions, the similarity of peer j for peer i is the anonymity of the questioner and the respondent. In
defined as follows addition, to keep the security of message transmissions, each
message is encrypted by the sender of the message using the
def |Oi | 1
sim(i, j) = × , (1) public key of the receiver.
|Oj | cos(w
⃗i , w⃗j )
The protocol for the object search in RR consists of five
where peer j with a smaller sim(i, j) is more favorable for steps. In the following, we explain each step in detail.
peer i as an adjacent peer connected by a similarity link. Step 1: Generation of Query Rumors
The reader should note that the above notion of similarity is Let i be the questioner. At first, peer i generates a public
not symmetrical. If fact, even if two peers a and b have the key Ki+ and inserts it to the content of the query, where
same profile w
⃗ = w⃗a = w⃗b , when |Oa | < |Ob |, we have Ki+ will be used to encrypt the reply to the query by the
sim(a, b) < 1 < sim(b, a), respondent. Let q be the plain text of the resulting message
including Ki+ . Peer i then encrypts q with a symmetric key
that is, b would be favorable for a but the reserve is not true. K into a cipher text C, then organizes two query rumors rK

Direct forwarding of
              rC                           decrypted reply
                         Sower                                                                        Direct forwarding of
                                                                rC’                                     decrypted ACK

                                                     Sower                                                               Sower
            Questioner                                                            Questioner

                                                 Respondent                                                          Respondent
                                                                            Random walk
               rK                                    rK’                     of rumors    Sower
                                 Random walk of rumors

            Fig. 2: Step 3 in the Rumor Riding.                                   Fig. 3: Step 4 in the Rumor Riding.

and rC , where rK and rC are messages containing K and C,             as the sower concerned with the respondent as follows: 1) it
respectively. Those rumors are sent out to different neighbors        decrypts R and the IP address of s from reply rumors, and
of peer i and start an (independent) random walk with an              2) it directly forwards reply rumors to sower s. See Figure
appropriate TTL (more precisely, peer i generates k such              2 for illustration.
pairs of rumors to increase the probability of those rumors              Step 4: ACK Message
“meeting” at a peer, where k is an appropriate parameter;                After receiving reply rumors rK ′ and rC ′ , the questioner i
it is experimentally verified that k and the TTL should be            decrypts R from C ′ with symmetry key K ′ and then decrypts
determined so that their product is from 100 to 200, i.e., if         the reply message from R with the secret key of peer i.
k is four then the TTL should be from 25 to 50 [5]). See              Then peer i sends an ACK message to the respondent j
Figure 1 for illustration.                                            in the following manner: 1) it encrypts the ACK message
   Step 2: Sowers Concerned with the Questioner                       into a cipher text with the public key of j (which should
   In the RR, a peer which receives both rK and rC serves as          be contained in the reply message); 2) it organizes two
a delegate of the questioner called sower. More concretely,           rumors from the cipher text as in previous steps; and 3) it
after decrypting message q from K and C, each sower starts            sends out those rumors to different neighbors, as before. The
the flooding of q to its neighbors and waits for the reply to         sower conceded with the ACK message directly forwards
the query from an appropriate respondent. After receiving a           the received rumors to the sower concerned with the reply
reply message from the respondent, which is encrypted with            message described in Step 3, which will be delivered to the
the public key Ki+ of the questioner i and is separated into          respondent j by traveling the path used in the random walk
two rumors similar to the separation of q into rK and rC ,            in the reverse direction. See Figure 3 for illustration.
it sends back those rumors to the questioner along the paths             Step 5: Transmission of Object
traveled by rK and rC , respectively, in the reverse direction.          After receiving the (encrypted) ACK message, the respon-
The reader should note that to enable such a behavior of the          dent j decrypts it into plain text with the symmetry key
sower and the other intermediate peers, the RR should force           contained in a rumor and the secret key of j. After that,
every peer to cache all rumors passing through the peer for           it moves to the actual transmission of the requested object
a certain time so that it is expired after the reply message is       using digital envelope. More concretely, after encrypting
successively received by the questioner.                              the object into a cipher text F , peer j transfers it to
   Step 3: Reply from the Respondent                                  the questioner through random walk of two rumors, direct
   Suppose that query q transmitted by a sower s is received          forwarding of the rumors to the sower concerned with the
by a peer j holding an object matching the query. After               ACK message, and the delivery of rumors by traveling the
receiving q, peer j generates a reply message and encrypts            path used in the random walk of Step 4 in the reverse
it with the public key Ki+ of the questioner i. Let R be              direction.
the resulting cipher text. Peer j then encrypts R and the IP
address of s with a symmetry key K ′ into a cipher text C ′ ,         4. Proposed Method
then organizes two reply rumors rK ′ and rC ′ similar to Step
1. Those rumors are sent out to different neighbors and start         4.1 Design Issues
an (independent) random walk, as before. If a peer receives             This section describes the details of the proposed scheme.
both rK ′ and rC ′ from its neighbors, then the peer serves           The goal of the scheme is to realize an anonymous object

two to five, using both of random and similarity links. Each
                                                                   copy of the query stops the propagation when: 1) the TTL
                                                                   exhausts or 2) it arrives at a peer holding an object matching
                                                                   the query. In addition, if it arrives at a peer which has a
                   Sower                                           similar interest to the query, then it switches the mode of
                                                                   flooding so that it merely uses similarity links to realize an
                                                                   efficient intensification of the exploration.
                                                                      The similarity of a peer j with a query q is calculated as
                                                                   follows. Recall that in the SocioNet, each peer is associated
                                                                   with a profile representing its interests in the form of a vector
                                                                   of relevances to the tags in T . The idea is to associate
             Similarity link
                                                                   a set of tags to each query issued by the questioners1 . If
             Random link
                                                                   query q is associated with a single tag t drawn from set T ,
                                                                   the similarity of the query with a peer j is calculated in
Fig. 4: Dynamic switch of the mode of flooding during the          the following three steps: 1) extract the relevance wj,t of j
query propagation.                                                 with tag t from the profile w⃗j ; 2) extract top α elements
                                                                   from the profile with the maximum relevance; and 3) if wj,t
                                                                   is contained in the extracted α elements, then we judge
search in the SocioNet using the notion of the RR described        that the similarity between peer j and query q is high.
in Section 3. However, if we directly apply the techniques         If q is associated with two or more tags, we extend the
used in the RR to the SocioNet, we will face to the following      above scheme so as to check whether the majority of tags
issues: 1) As was described in Section 2, the SocioNet is          associated with the query are contained in the top α elements
designed in such a way that the questioner is located in           in the profile vector.
the neighborhood of the respondent. However, the direct               Figure 4 illustrates a running example of the scheme. In
application of the RR to the SocioNet loses such a benefit         this figure, the peer holding an object matching the query
of the SocioNet, since in the RR, the actual flooding is           issued by the questioner is painted red, and peers which
conducted by a sower which is randomly selected from all           has a similar interest with the query is painted orange.
peers in the network, i.e., we cannot guarantee that the sower     After decrypting the query from two query rumors received
is in the neighborhood of the respondent. 2) The search in         from different neighbors, the sower, which is painted green,
the RR is based on a simple flooding, i.e., it repeats the         initiates a flooding of the query by setting the TTL to a small
forwarding of a received query to all neighbors until the          value. The flooded message uses all links within the TTL,
TTL given to the query exhausts. However, such a simple            and after arriving at an orange peer, which has a similar
scheme does not fully utilize the structure of the SocioNet        interest to the query, it switches the mode to the flooding
so that two types of links play different roles in the overlay,    without random links.
i.e., random link connects remote peers and similarity link
connects peers to have similar interests. This means that to       4.3 Similarity-Based Filtering
improve the efficiency of the object search, the propagation
                                                                      The second technique is to filter queries at each similarity
of a query from the selected sower should be conducted by
                                                                   link by the similarity of the receiver to the query. Suppose
carefully considering the difference of the role of links.
                                                                   that peer j receives a query q associated with a set of
   In the following subsections, we propose two techniques
                                                                   tags. In the first technique, all similarity edges outgoing
to overcome those issues.
                                                                   from j are used for the propagation of the query unless the
4.2 Dynamic Switch of the Mode of Flooding                         TTL is exhausted. However, since the similarity of peers is
                                                                   defined by the cosign similarity of profiles and the number of
   The first technique is to take into account the difference of   objects held by each peer (see Equation (1) for the details),
the role of links during the propagation of query messages.        a neighbor ℓ of j connected by a similarity edge (j, ℓ) might
More concretely, we devolve the role of diversification to         not be relevant to q even if peer j is relevant to q and the
random links in an early phase of the query propagation and        value of sim(j, ℓ) is small. For example, consider the case
the role of intensification to similarity links in the remaining   in which peers j and ℓ have 200 objects attached tag Jazz,
steps of the query propagation.                                    peer j has 20 objects attached tag Clarinet and peer ℓ has
   The concrete operation proceeds as follows. Let s be a
sower concerned with the questioner which received two                1 The simplest way to realize such a situation is to ask questioners to

query rumors rK and rC from its different neighbors. After         designate tags associated with the query. Another possible way is to adopt
                                                                   the technique of automatic tag attachment which has been proposed in the
decrypting message q from C with K, s starts the flooding          literature [7]. In the evaluation described in Section 5, we assume that each
of q to its neighbors by setting TTL to a small value, e.g.,       query is attached a single tag by the questioner.

Table 1: Parameters used in the simulation.                                                    3000
                   The number of peers                10000
                 The number of objects                1000                                                           COMB

                                                                          Average number of messages
                                                                                                       2500
        The number of peers holding matching object    100                                                           PROP
                 Average degree of peers                6                                                            TECH1
                                                                                                       2000          TECH2
                   Rewiring probability                0.3
                  TTL of the first phase                2
                       Threshold θ                     0.8                                             1500

                                                                                                       1000

no object attached tag Clarinet. In such a case, a query q                                              500
with tag Clarinet received by peer j should not be forwarded
to peer ℓ, since ℓ has no object attached tag Clarinet and                                                   0
                                                                                                                 2           3         4    5
such a fact can be detected by analyzing the relevance of                                                                        TTL
the receiver ℓ to the query.
   The filtering of queries is conducted by using the cosign         Fig. 5: The average number of messages issued in four
similarity. More concretely, each query q is associated with         schemes.
a binary vector ⃗q so that the ith element in the vector takes
value 1 if and only if the ith tag (in set T ) is associated with                                      100
q. Then, the similarity σ(q, ℓ) between peer ℓ and query q                                              90
is calculated as σ(q, j) = cos(⃗q , w⃗ℓ ), and the similarity link
                                                                                                        80
connecting to ℓ stops the forwarding of q if the value of
                                                                                                        70
σ(q, j) is smaller than a predetermined threshold θ.
                                                                          Success rate [%]              60
                                                                                                                                           COMB

5. Evaluation                                                                                           50

                                                                                                        40
                                                                                                                                           PROP

5.1 Setup                                                                                               30

   We evaluate the performance of the proposed scheme by                                                20

simulation. The simulation is conducted by using PeerSim                                                10

simulator [6], and as the competitor, we use a simple                                                    0
                                                                                                                 2           3         4    5
combination of the RR and the SocioNet in which each                                                                             TTL
sower concerned with the questioner initiates a flooding of
the decrypted query with a designated TTL. In the following,         Fig. 6: The success rate of two schemes obtained by dividing
we denote the above combined scheme as COMB and the                  the number of successful runs by the total number of runs.
proposed scheme with two techniques PROP, where for the
reader’s reference, we also show the result for the scheme
merely with the first technique denoted as TECH1 and that            5.2 Number of Messages
with the second technique denoted as TECH2. The metric                  Figure 5 illustrates the result on the number of messages.
for the evaluation is the number of messages and the success         The horizontal axis is the TTL of the flooding and four
rate, which are averaged over 30 runs.                               curves correspond to the result for COMB, PROP, TECH1
   Parameters used in the simulation are given as follows.           and TECH2, respectively. Although there is no big differ-
The number of peers and the number of objects are fixed to           ence among four schemes when TTL is two, we could find a
10000 and 1000, respectively, where each object can have             significant reduction of the number of messages as the TTL
several copies in the overlay. The number of copies held by          becomes large. In particular, the amount of improvement of
each peer follows a Poisson distribution with mean λ = 6.            COMB by PROP is about 50% when TTL is five.
The popularity of the object matching a query is set to 1%,
i.e., we consider a situation in which among 10000 peers,            5.3 Success Rate
only 100 peers hold the object matching the query. The                  Figure 6 compares the success rate of COMB and PROP,
overlay network consisting of similarity links is generated by       which is calculated by dividing the number of successful
the Barbási-Albert (BA) model so that the average degree of          runs by the total number of runs in the simulation, where
each peer is six and the probability of rewiring a similarity        the horizontal axis is the TTL of query flooding, as before.
link into a random link is set to 0.3. TTL of the first phase        The success rate of COMB monotonically grows as the TTL
of the query forwarding used in the first technique is set to        increases, which reaches 100% when TTL is four. However,
two. Finally, we fix threshold θ used in the second technique        the success rate of PROP is not stable with respect to the
to 0.8. Those parameters as summarized in Table 1.                   monotonic change of the TTL; e.g., the success rate when

100 100

90 90

80 80

70 70
Success rate [%]

Success rate [%]
60 COMB 60
COMB
PROP
50 50
TECH1
PROP
40 TECH2 40

30 30

20 20

10 10

0 0
2 3 4 5 1 2 3 4
TTL Number of sowers

Fig. 7: Comparison of the success rate of four schemes which Fig. 8: Impact of the number of sowers to the success rate
is calculated by excluding runs with two or less sowers. (TTL is fixed to three).

TTL is three seems to be too small compared with the a detailed analysis of the behavior of the proposed scheme,
success rate for other TTLs. since in the current work, we merely evaluate the average
A reason of such an instability of the success rate is due number of messages and the success rate.
to the small number of sowers generated by the RR. See
Figure 7 for illustration. This figure redraws the curves of References
the success rate after excluding simulation runs in which [1] G. Chen, C.-P. Low, and Z.-H. Yang. “Enhancing Search Performance
the number of sowers generated by the random walks is two in Unstructured P2P Networks Based on Users’ Common Interest.”
or less. As shown in the figure, by excluding such runs, IEEE Trans. on Parallel and Distributed Systems, 19(6): 821–836,
2008.
we have a reasonable grow of the success rate, and can [2] Y.-D. Gong, J. Hu, Z.-J. Dong, S.-N. Wang, and S.-J. Hu. “Improved
make the following observations: 1) the use of the second Flooding-Based Resource Discovery.” In Proc. of the 2nd Interna-
technique reduces the success rate (recall that the second tional Workshop on Intelligent Systems and Applications (ISA), 2010,
pages 1–4.
technique stops the forwarding of the query to a peer to have [3] S. Jiang, L. Guo, and X.-D. Zhang. “LightFlood: an efficient flooding
a profile which is not similar to the query); and 2) COMB scheme for file search in unstructured peer-to-peer systems.” In Proc.
is better than TECH2, i.e., the simultaneous use of the first of International Conference on Parallel Processing, 2003, pages 627–
635.
technique with the second technique relaxes the badness of [4] K. C.-J. Lin, C.-P. Wang, C.-F. Chou, and L. Golubchik. “SocioNet:
the second scheme. The conjecture such that the number of A Social-Based Multimedia Access System for Unstructured P2P
sowers affects the success rate is confirmed by Figure 8, Networks.” IEEE Trans. on Parallel and Distributed Systems, 21(7):
1027–1041, 2010.
which illustrates the impact of the number of sowers to the [5] Y.-H. Liu, J.-S. Han, J.-L. Wang. “Rumor Riding: Anonymizing
success rate by fixing the TTL to three. Unstructured Peer-to-Peer Systems.” IEEE Trans. on Parallel and
Distributed Systems, 22(3): 464–475, 2011.
[6] A. Montresor and M. Jelasity. “PeerSim: A scalable P2P simulator.” In
6. Concluding Remarks Proc. of the 9th International Conference on Peer-to-Peer Computing
This paper proposes an anonymized object search scheme (P2P’09), 2009, pages 99–100.
[7] T.-T. Qin and S. Fujita. “Automatic Tag Attachment Scheme for
for the SocioNet. More precisely, we propose two techniques Efficient File Search in Peer-To-Peer File Sharing Systems.” In Proc.
to overcome the inefficiency of a simple application of the International Conference on Advances in Social Network Analysis and
Rumor Riding to the SocioNet, where the first technique Mining (ASONAM 2011), 2011, pages 507–511.
[8] Y. Wang, X.-C. Yun, and Y.-F. Li. “Analyzing the Characteristics of
is to dynamically switch the kind of links used for the Gnutella Overlays.” In Proc. of the 4th International Conference on
query propagation and the second technique is to filter Information Technology, 2007, pages 1095–1100.
queries at each similarity link by the similarity of the [9] D. J. Watts and S. H. Strogatz. “Collective dynamics of ’small-world’
networks.” Nature, 393(6684): 440–442, 1998.
receiver to the query. The performance of the scheme is [10] A. Wu, X.-S. Liu, and K.-J. Liu. “Efficient flooding in peer-to-peer
evaluated by simulation. The simulation result indicates that networks.” In Proc. of the 7th International Conference on Computer-
the proposed scheme reduces the number of messages of a Aided Industrial Design and Conceptual Design (CAIDCD ’06), 2006,
pages 1–6.
simple combination of the SocioNet and the Rumor Riding
to a half without significantly reducing the success rate.
A future work is to verify the effect of the popularity of
the searched object to the performance, which was fixed to
1% in the current simulation. Another key issue is to conduct

You can also read