A Personalized Recommender System for Pervasive Social Networks

Page created by Timothy Cunningham
 
CONTINUE READING
A Personalized Recommender System for Pervasive Social Networks
A Personalized Recommender System for Pervasive Social Networks

                                                               Valerio Arnaboldia , Mattia G. Campanaa,∗, Franca Delmastroa , Elena Paganib,a
                                                                                      a IIT-CNR,Via G. Moruzzi 1, 56124, Pisa, Italy
                                                        b Department   of Computer Science, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italy

                                         Abstract
                                         The current availability of interconnected portable devices, and the advent of the Web 2.0, raise the problem of supporting
                                         anywhere and anytime access to a huge amount of content, generated and shared by mobile users. On the one hand,
arXiv:2205.15063v1 [cs.NI] 30 May 2022

                                         users tend to be always connected for sharing experiences and conducting their social interactions with friends and
                                         acquaintances, through so-called Mobile Social Networks, further improving their social inclusion. On the other hand,
                                         the pervasiveness of communication infrastructures spreading data (cellular networks, direct device-to-device contacts,
                                         interactions with ambient devices as in the Internet-of-Things) makes compulsory the deployment of solutions able to
                                         filter off undesired information and to select what content should be addressed to which users, for both (i) better user
                                         experience, and (ii) resource saving of both devices and network.
                                         In this work, we propose a novel framework for pervasive social networks, called Pervasive PLIERS (p-PLIERS), able
                                         to discover and select, in a highly personalized way, contents of interest for single mobile users. p-PLIERS exploits the
                                         recently proposed PLIERS tag-based recommender system [2] as context a reasoning tool able to adapt recommendations
                                         to heterogeneous interest profiles of different users. p-PLIERS effectively operates also when limited knowledge about
                                         the network is maintained. It is implemented in a completely decentralized environment, in which new contents are
                                         continuously generated and diffused through the network, and it relies only on the exchange of single nodes’ knowledge
                                         during proximity contacts and through device-to-device communications. We evaluated p-PLIERS by simulating its
                                         behavior in three different scenarios: a big event (Expo 2015), a conference venue (ACM KDD’15), and a working day
                                         in the city of Helsinki. For each scenario, we used real or synthetic mobility traces and we extracted real datasets from
                                         Twitter interactions to characterize the generation and sharing of user contents.
                                         Keywords: pervasive content sharing, mobile social networks, opportunistic networks, personalized recommender
                                         systems

                                         1. Introduction                                                        on device-to-device (D2D) wireless communications have
                                                                                                                been proposed in the literature, such as opportunistic net-
                                         The amount of data accessible through the Internet has                 works [7]. However, most of the classical mechanisms for
                                         dramatically increased over the last years. This trend has             content identification and recommendation designed for
                                         been exacerbated by the advent of online social networks               centralized infrastructures cannot be directly applied to
                                         (hereinafter OSNs) and by video-on-demand services like                these scenarios, and ad-hoc solutions must be adopted.
                                         Netflix [3]. It is then expected to boom with the Internet             This is the case of Mobile Social Networks (hereinafter
                                         of Things, where potentially every object in the physical              MSNs), designed to further improve social interactions and
                                         world will create and share information over the network.              inclusion through experience sharing based on opportunis-
                                         The availability of this data represents an important re-              tic communications, and to this aim they need efficient and
                                         source that is revolutionizing our society, but it is also             personalized mechanisms for useful content selection and
                                         posing some serious technological challenges in terms of               distribution.
                                         maintenance, management, indexing and identification of                The basic approaches to identify useful contents in oppor-
                                         contents. This affects especially mobile communications,               tunistic networks are based on publish/subscribe mecha-
                                         where users want to be always connected and able to share              nisms. Users have to explicitly define their interests by
                                         contents anywhere and anytime. To alleviate the burden                 subscribing to a fixed set of thematic channels and, when
                                         of data traffic on cellular networks, several solutions based          they encounter other mobile users, they can ask them for
                                                                                                                contents related to the channels they are subscribed to
                                           ∗ Corresponding author
                                                                                                                (see for example the PodNet project [19]). Other solu-
                                             Email addresses: v.arnaboldi@iit.cnr.it (Valerio                   tions exploit context information (e.g., the history of phys-
                                         Arnaboldi), m.campana@iit.cnr.it (Mattia G. Campana),                  ical contacts between nodes, social information about the
                                         f.delmastro@iit.cnr.it (Franca Delmastro), pagani@di.unimi.it          users or the presence of communities) to improve forward-
                                         (Elena Pagani)

                                         Preprint submitted to Elsevier                                                                                               May 31, 2022
A Personalized Recommender System for Pervasive Social Networks
ing and content dissemination to potentially interested             homonyms, polysemies, and different users’ tagging be-
users (ContentPlace [4], ProfileCast [15], SocialCast [8]           havior make the reasoning process difficult to perform in
and ICast [26]). They are generally defined as context-             some cases, and undermine the use of simple tag match-
aware forwarding or content dissemination protocols, and            ing [14]. To overcome these limitations, a novel family
context information generally includes information charac-          of recommender systems, called Tag-based Recommender
terizing the user’s behavior, her interests, generated and          Systems [33], have been recently proposed for centralized
shared contents and the surrounding environment.                    infrastructures, where global knowledge is available. To
Recently researchers have investigated the possibility of us-       understand whether a content is interesting for a user,
ing recommender systems [16] to identify useful contents            tag-based recommender systems do not only consider tags
also in the opportunistic environment (mainly based on              that are directly associated with contents, but also the re-
content filtering and tag expansion) [9, 28, 29, 22]. Rec-          lations existing between tags, trying to extract a semantic
ommender systems perform better than publish/subscribe              meaning from the contents. To the best of our knowledge,
mechanisms by mainly relying on information about users’            currently there is no solution in the literature proposing
past actions (e.g., past purchases in e-commerce [17, 20]           the use of tag-based recommender systems in opportunistic
or past visualizations of multimedia content in video-on-           networks. We think that this could substantially change
demand services [3, 1]). At the same time, they can be              the way mobile users and services can access contents, es-
included in the more general definition of context-aware            pecially in Mobile Social Network scenarios.
systems, in which context information is mainly focused             We recently proposed a novel tag-based recommender sys-
on content characterization and it is directly defined by           tem, called PLIERS [2], which outperforms the state-of-
users. In fact, through the use of Social Tagging Systems           the-art tag-based recommender systems defined for online
(STS), mobile users can directly define tags (i.e., labels)         social networks and centralized solutions. In this paper,
that describe contents from a semantic point of view. The           we present a novel framework that exploits PLIERS prin-
ensemble of tags generated by users in a certain online sys-        ciples in a completely decentralized environment, relying
tem is known in the literature as folksonomy [22, 25]. An           only on the exchange of single nodes’ knowledge during
important aspect of folksonomies is that, differently from          proximity contacts and through D2D communications. We
ontologies, no relationship between the terms is required           called it Pervasive PLIERS (p-PLIERS). It represents a
a priori (hierarchical or not). On the contrary, these rela-        general framework for identifying useful and interesting
tionships are automatically built thanks to the tags created        contents in the mobile environment, on top of which sev-
by the users and assigned to contents. Folksonomies have            eral services can be built – from context-aware forward-
the ability to quickly adapt to changes in the user’s vo-           ing protocols, to content dissemination services (e.g., tar-
cabulary and to represent highly personalized information           geted advertising), to content sharing services, etc. We
about the users’ interests. These aspects fit well mobile           extensively evaluated the proposed solution through sim-
users’ behavior, especially when they define, generate, and         ulations considering three different application scenarios:
share contents on-the-fly, while participating in an event          (i) users attending Expo2015 during the World Food Day
or living a particular experience.                                  (one of the most crowded day of the entire event); (ii) users
The richness of information contained in folksonomies can           attending a scientific conference (ACM KDD 2015); (iii)
be used to improve the identification of interesting con-           users moving around the city of Helsinki during a working
tents for each single user, and also to determine relation-         day. For each scenario, we selected appropriate mobility
ships and affinity between different users, depending on the        traces, synthetic or real (when available) and we extracted
content they generate and share. To this aim, it is essential       real datasets from geo-localized Twitter interactions. The
to define an algorithm able to efficiently detect interesting       datasets would reflect the behavior of mobile users, gener-
content starting from the local user preferences and a lim-         ating and sharing contents related to a specific event or,
ited knowledge of what is available on the network. This            more in general, to their life in a European city.
represents a context reasoning problem in a distributed             The rest of the paper is organized as follows. In Section 2,
and mobile environment, where each mobile device has a              we present a summary of the use of recommender systems
local representation of context information in the network          in opportunistic networks for the identification of interest-
(e.g., folksonomy in case of recommender systems). This             ing contents. In Section 3, we describe PLIERS and we
local knowledge is a view of the global knowledge of the            compare it with existing tag-based recommender systems.
information available from the whole network. Nodes can             Then, in Section 4, we present p-PLIERS framework in
enrich their local knowledge by sharing it with other nodes         detail. In Section 5, we provide a general description of
they physically encounter, through opportunistic commu-             the experimental scenarios that we considered for the eval-
nications. They can take decisions about which contents             uation of the proposed solution. Specifically, in Section 6,
to download or forward to other nodes using their local             we present a comparison among existing recommender sys-
knowledge. In this way, mobile users can identify interest-         tems used in pervasive social networks, showing the ad-
ing content locally, without accessing a centralized service.       vantages of PLIERS. In Section 7, we present p-PLIERS
However, since folksonomies are based on user-defined               performance evaluation through simulations in the three
tags, they also have some drawbacks.              Synonyms,         different scenarios. Finally, we discuss examples of ser-
                                                                2
vices that can benefit from the framework in Section 8,                                                                               19/24
                                                                                                                  i1             i1
and we conclude the paper in Section 9.                                                          5/6
                                                                                                       u1              u1
                                                                                                                  i2                  5/24
                                                                                                                                 i2
                                                                                                 5/6 u
                                                                                                         2             u2
2. Related Work                                                                                                   i3                  3/8
                                                                                                                                 i3
                                                                                                 1/3 u
                                                                                     1                   3             u3
In the last few years, many researchers have used Rec-                          i1                                i4             i4
                                                                                                                                      5/8
                                                                                         ProbS    0 u
ommender Systems for disseminating content in a mobile               u1
                                                                                     0                   4             u4
                                                                                i2                                i5                  0
and pervasive setting. Most of the approaches in the liter-                                                                      i5
                                                                     u2
ature are based on popular Web Recommender Systems,                             i3
                                                                                     0                       b)             c)

conveniently modified for mobile environments – e.g., by             u3                                                               2/3
                                                                                     1                            i1             i1
reducing the computational complexity of the algorithms                         i4
                                                                                                  1
                                                                     u4                                u1              u1
and limiting the amount of necessary memory.                                         0 HeatS                      i2             i2
                                                                                                                                      1/2
                                                                                i5
The most popular and widely implemented system is rep-                                           1/2 u                 u2
                                                                           a)                            2                            1/3
resented by the Collaborative Filtering (CF) [13] approach.                                                       i3             i3
                                                                                                 1/2 u
The simplest implementation of CF makes recommenda-                                                      3             u3
                                                                                                                                      3/4
                                                                                                                  i4             i4
tions to a user based on items that other users with similar                                      0 u                  u4
interests liked in the past. The similarity in interests be-                                             4
                                                                                                                  i5             i5
                                                                                                                                      0
tween two users is calculated by a similarity metric (e.g.,
                                                                                                             d)             e)
cosine similarity or Pearson correlation) between their re-
spective histories (i.e., the sets of items they liked in the       Figure 1: Application of ProbS and HeatS with a bipartite graph.[34]
past). This kind of CF is also known as “user-based CF ”,
in contrast to the “item-based CF ” which models the pref-
erence of a user for an item based on ratings of similar            computational resources, these approaches take into ac-
items by the same user.                                             count just the relations existing between users and their
Typically, CF-based systems operate on second-order ten-            preferences, while they neglect the nature of the items
sors (or matrices) that represent the relationships between         shared in the network and do not exploit all the available
users and items (or an item-item matrix for the item-based          information from STS (i.e., the complete folksonomy).
CF). For this reason, CF systems could suffer from scal-            Tag-expansion [22] is the only solution proposed in the lit-
ability problems depending on the size of the data struc-           erature to perform folksonomy-based reasoning for content
tures to be kept in memory and on the sparseness degree             dissemination in an opportunistic environment. Using this
of the matrix. Since the number of items in STS is typ-             approach, each node builds a tag co-occurrence matrix to
ically high and far beyond users’ ability to evaluate even          identify the tags that are most frequently used in conjunc-
a small fraction of them, the data representation in a CF           tion with other tags (i.e., expanded tags), and it downloads
matrix is often highly sparse. Recent research work (e.g.,          an item if it is tagged with one of these tags. One of the
[9, 28, 29]) focused on the reduction of the complexity of          main drawbacks is that a node could receive more items
CF in order to identify useful content for mobile users in          than those really interesting for it because the user may
opportunistic networks and then optimize content dissem-            not be interested in the topics represented by the expanded
ination. For example, diffeRS [9] tries to classify users in        tags. In addition, this approach can suffer from scalability
two separate classes: mass-like minded and atypical users.          problems depending on the dimension of the data struc-
The former set is composed by users whose preferences               tures to be kept in memory and on the sparseness of the
are similar to the interests of other users encountered in          matrix. Since the distribution of the popularity of tags
the past (i.e., the community). For this kind of users,             in STS generally follows a long tail distribution [18], the
the recommendation is simply based on the average of the            data representation in a tag co-occurrence matrix could be
community’s preferences. For atypical users (i.e., users            often highly sparse.
whose preferences differ from those of the community), in-
stead, a user-based CF approach is applied, computed only           2.1. Tag-based Recommender Systems
among those other atypical users who similarly deviate              To overcome the limitations of tag-expansion in oppor-
from the community. Thereby, nodes exchange only the                tunistic networks, we propose to use a new family of recom-
contents that CF identifies as attractive to the local users.       mender systems based on folksonomies: Tag-based Recom-
MobHinter [28], instead, limits the CF computation to the           mender Systems [33]. In the literature, many approaches
most similar users according to a certain similarity mea-           for Tag-based Recommender Systems have been proposed,
sure, which takes into account different parameters (e.g.,          but – to the best of our knowledge – none of them has
resources in common, similar behaviors, and so on). More-           been so far employed in opportunistic networks. Among
over, it proposes different strategies to limit the amount          different possible tag-based solutions, diffusion-based [33]
of information exchanged by nodes every time they meet.             algorithms are the most promising ones for our reference
However, even though there has been a complexity reduc-             scenario. These algorithms try to overcome the complex-
tion to allow CF to run on mobile devices with limited              ity and scalability issues by using graphs as a natural way
                                                                3
u1         u3       u2        u4     u5         u6         u7               resource among the items connected to her (second diffu-
                                                                                     sion step) and each item then receives the same portion of
                                                                                     the resource. The final score of each item ij is given by the
                                                                                     sum of the portions of resources that are assigned to it after
 Verdi   Bach     Piano   Bocelli   Concert   Football   Millwall        Milan
                                                                                     the two diffusion steps. The set of all the scores obtained
                                                                                     in this way is called resource vector and it can be used to
                               HeatS                ProbS
                                                                                     rank the items not directly linked to the target user. The
                   (a) First recommendations to U1                                   higher the score obtained by an item, the greater could
                                                                                     be the interest in it for the target user. The mechanism of
                                                                                     HeatS is similar to ProbS, but it is based on opposite rules:
                                                                                     each time a resource (or a portion of it) is redistributed, it
         u1         u3       u2        u4     u5         u6
                                                         U6         u7
                                                                                     is divided by the number of edges connected to the node
                                                                                     towards which it is heading to. Figure 1 depicts the dif-
                                                                                     fusion steps of the two algorithms, highlighting the dif-
                                                                                     ferences in the two recommendations. Although they may
 Verdi   Bach     Piano   Bocelli   Concert   Football   Millwall        Milan       seem a good way to make recommendations, actually both
                          HeatS                                      ProbS           of them are biased by the presence of extremely popular
                   (b) First recommendations to U6                                   or non-popular items or tags, and they do not take into
                                                                                     account the characteristics of the user’s interests. For this
                Figure 2: ProbS and HeatS suggestions.                               reason, they present strong limitations if applied to STS.
                                                                                     Specifically, ProbS tends to recommend most popular con-
                                                                                     tents, while HeatS tends to highlight those with minimal
to represent folksonomies. In these cases, nodes of these                            popularity (i.e., with the smallest possible number of users
graphs represent users, items and/or tags, while their links                         connected to them). Figure 2 depicts an example of the
represent relationships among nodes. Since nodes are di-                             described behavior. If we consider a target user interested
vided into three separate classes, folksonomies are usually                          in tags with low popularity (u1 in Figure 2a), HeatS will
represented as tripartite graphs, or by separate bipartite                           suggests tags with low popularity (that, however, may be
graphs for user-item or user-tag relationships in order to                           of limited interest for the target user from a semantic point
further reduce the computational complexity. These ap-                               of view). On the other hand, ProbS tends to recommend
proaches rely on the diffusion of fictitious resources within                        tags with high popularity, possibly semantically unrelated
the folksonomy graph, starting from a node representing                              with the user’s interests. By contrast, if we consider a
a target user (i.e., the target of the recommendation) and                           user with popular interests (u6 in Figure 2b), ProbS high-
diffusing the resources by following links between nodes.                            lights the correct tags and, instead, HeatS still (wrongly)
This permits to identify relevant items (or tags) as those                           recommends the less popular tags.
nodes that are indirectly connected to the target user via                           To overcome these limitations, a ProbS+HeatS hybrid ap-
other users with whom she shares one or more connec-                                 proach (hereafter just Hybrid ) [34] has been recently pro-
tions. The higher the number of links connecting an item                             posed in the literature. This algorithm calculates a linear
i to the items of the target user, the higher the score that                         combination of the results of ProbS and HeatS with a pa-
i will receive for the recommendation. In this way, the                              rameter λ governing the relative importance of one of the
recommender system exploits the structure of the graph                               two original algorithms. However, the problem of Hybrid
to identify content relevant for the user. In addition, this                         (and other recently proposed solutions [23, 21, 30, 24]),
approach gives the opportunity to exploit additional hid-                            precisely lies in the use of parameters which can vary
den information derived from graph-based analysis (e.g.,                             greatly depending on the nature of the dataset, and that
community detection [12]), which could be used to further                            are difficult to estimate in real situations.
customize recommendations.                                                           PLIERS [2] solves the dilemma of the choice between pop-
The oldest and most famous diffusion-based solutions are                             ular or non-popular items in the network in a more natu-
represented by two algorithms: ProbS (also known as Mass                             ral way than the other diffusion-based algorithms, without
Diffusion) [35, 31] and HeatS [32]. By applying ProbS to                             requiring any parameters to tune, and ensuring that the
user-item bipartite graphs, a generic resource is assigned                           popularity of recommended items is always comparable
to each item is directly linked to the target user ut of                             with the popularity of items already adopted by the users.
the recommendation such that its value is 1 if an edge is                            PLIERS assumes that if the user is interested in general
present between ut and is , and 0 otherwise. These items                             categories of items, with a high number of connected users
represent the content that the user has created or down-                             (i.e., popular items), the recommendations will be general
loaded in the past. The resource is then split (first diffu-                         as well. On the other hand, if the user is interested in less
sion step) among the users directly connected to the item                            popular items, the recommendations will prefer items with
and each user receives the same portion of the resource.                             less connected users.
Subsequently, each user splits the portion of the received                           In addition, PLIERS assumes that a very popular item/tag
                                                                                 4
Table 1: Ranking for User 1 for PLIERS (PL), ProbS (Pr), HeatS
              2
                                                                                             (He) and Hybrid (Hy).

             1.5                                                                                         Tag                     PL      Pr      He      Hy
                                                                                                         orchestra (1)              1     12       1        1
Popularity

                                                                                                         debussy (1)                2     13       2        2
              1
                                                                                                         thepianoguys (1)           3     16       3        3
                                                                                                         pavarotti (1)              4     19       4        4
             0.5
                                                                                                         nabucco (1)                5     20       5        5
                                                                                                         millwall (22)             13      5      14       13
                                                                                                         music (24)                15      4      15       14
              0                                                                                          sherlock (27)             16      3      16       15
                                                                                                         sport (41)                19      2      19       18
                        t

                               n

                                              i

                                             o

                                             n

                                                    ch

                                                               i

                                                                       al

                                                                                  t
                                            rd

                                                             in
                        r

                                                                               ar
                             an

                                          an

                                           ve
                      be

                                                                    ic
                                                            s
                                                  Ba
                                         Ve

                                                                            oz
                                                         os

                                                                  ss
                                        ho
                           m

                                        Pi
                    hu

                                                                                                         startrek (44)             22      1      22       19
                                                                        M
                         hu

                                                                la
                                                         R
                                     et
                   Sc

                                                                C
                                   Be
                        Sc

                            Figure 3: Tags popularity for User 1.
                                                                                             3.1. Notation
                                                                                             Formally, a folksonomy can be represented with three node
can semantically relate to a more “generic” topic com-                                       sets: users U = {u1 , . . . , un }, items I = {i1 , . . . , im } and
pared to a less popular item/tag that, instead, describes                                    tags T = {t1 , . . . , tk }. Consequently, each binary relation
a more “specific” topic. For example, any content related                                    between them can be described using adjacency matrices,
to the football club Millwall can be tagged with both tags                                   AUI , AIT , AUT respectively for user-item, item-tag and
“Millwall” and “Football” but the opposite is not always                                     user-tag relations. If the user ul has collected the item is ,
true: all content concerning football will not always be                                     we set aUI                      UI                              IT
                                                                                                     l,s = 1, otherwise al,s = 0. Similarly, we set as,q =
tagged with “Millwall”. According to this assumption,                                                                                     IT
                                                                                             1 if is has been tagged with tq and as,q = 0 otherwise.
we can therefore say that the tag “Football” refers to a
                                                                                             Furthermore, connections between users and tags can be
more generic topic than that referred by the tag “Mill-
                                                                                             represented by an adjacency matrix AUT , where aUT           l,q = 1
wall”. Users interested in the Millwall football club, but                                                                               UT
not connected to items tagged with “Football”, are clearly                                   if ul owns items tagged with tq , and al,q = 0 otherwise.
not interested in all the items tagged with the latter tag, as                               In the next subsection, we consider user-item bipartite
these could contain information about other football clubs.                                  graphs described by the AUI adjacency matrix.
PLIERS leverages this assumption to improve the recom-
mendations and to provide more personalized items to the                                     3.2. The algorithm
users. We demonstrated that PLIERS outperforms all the                                       PLIERS is inspired by ProbS and shares with it the same
other diffusion-based approaches applied to Online Social                                    two diffusion steps. In addition, PLIERS normalizes the
Networks, both in terms of recommendation accuracy and                                       value obtained by ProbS when comparing an item ij with
personalization [2]. In this paper we present p-PLIERS, a                                    one of the items is of the target user ut . This normalization
framework able to merge the knowledge about users, inter-                                    is performed by multiplying the recommendation score by
ests and contents on single mobile devices and to evaluate                                   the cardinality of the intersection between the set of users
the relevance of the available contents for each single user                                 connected to ij and the set of users connected to is , divided
through PLIERS recommendations. To make the reader                                           by k(ij ), which is the popularity of ij . In this way, items
able to completely understand this new solution, we pro-                                     with popularity similar to the popularity of the items of
vide, in the next sections, an extended description of PLI-                                  the target user, and that possibly share the same set of
ERS notation and algorithm with respect to [2]. Then, in                                     users, are preferred.
Section 4, we present p-PLIERS.                                                              The final value of the item ij for the target user ut , in
                                                                                             a graph with n users and m items, is then calculated by
                                                                                             PLIERS with the following formula.
3. PLIERS: PopuLarity                             based         ItEm            Recom-
   mender System
                                                                                                         Xn X m
                                                                                                                 al,j · al,s · at,s |Us ∩ Uj |
In this section, we present PLIERS working principles on                                        fjpl =                                           j = 1, . . . , m,   (1)
                                                                                                         l=1 s=1
                                                                                                                  k(ul ) · k(is )     k(ij )
a simple bipartite graph, and its comparison with ProbS,
HeatS and Hybrid. Then, we describe in detail how PLI-                                       Here, Uj is the set of users connected to the item ij , k(ij )
ERS can be applied to a tripartite graph, which is used                                      is the popularity degree of the item ij (i.e., the number of
by p-PLIERS to represent knowledge in opportunistic sce-                                     connected users), and ax,y is an element of the AUI matrix.
narios.                                                                                      The higher the value of fjpl , the more item ij is similar to
                                                                                             the items already owned by ut .

                                                                                         5
Table 2: Ranking for User 3 for PLIERS (PL), ProbS (Pr), HeatS
             50
                                                                   (He) and Hybrid (Hy).

             40
                                                                              Tag                  PL     Pr    He     Hy
                                                                              3g (1)                26    26      5     18
Popularity

             30
                                                                              bag (1)               24    24      4     17
                                                                              carling (1)           23    23      3     16
             20
                                                                              cpu (1)               22    22      2     15
                                                                              cricket (1)           25    25      1     14
             10                                                               seriea (19)            1     1      6      1
                                                                              android (21)           4     8     21      6
             0                                                                googleglass (21)       3     7     20      5
                                                                              mobile (21)            2     6     19      4
                      on ic
                       M rt
                       nS n
                       Pi iro
                     M ano
                       Sc ll
                       S la
                   Pa ell rt
                  St var s
                       W i
                              s
                  ia erlo r
                       G k
                       ot s
                   St sca ll
                      ar la

                   N oog k
                     ab le
                      Bo cco
                             lli
                     ar ott

                eP Sh Inte
                    La llwa

                      E ba
                            o

                           ar

                    no c
                    Fo uy

                     G re
                    Sa ila
                          ce

                     2C po

                          ce
                    C us

                           a

                         T

                         u
                       M

                                                                              crime (23)            12     4     23      7
                       i

                                                                              pop (23)               8     5     24      8
              Th

                                                                              tv-series (34)         9     3     25      3
                  Figure 4: Tags popularity for User 3.                       sci-fi (37)            5     2     22      2

3.3. Working principles of PLIERS on a simple graph                data) recommend similar tags for the first 12 positions,
To describe in detail the mechanism of PLIERS, and to              highlighting tags with a popularity degree similar to those
prove its effectiveness, we manually built a synthetic user-       already connected to the user, and more semantically re-
tag bipartite graph consisting of 64 users, 62 tags and a          lated with them (e.g., “Debussy”, “Orchestra”). On the
total of 612 edges. The graph represents the relation be-          contrary, ProbS presents wrong recommendations in the
tween a set of users and several tags, which are character-        first positions, by assigning a higher score to uncorrelated
ized by a variable degree of popularity. A link between a          tags (from a semantic point of view), which are charac-
user and a tag indicates that the user owns that tag (i.e.,        terized with a popularity degree that deviates too much
she has already downloaded a content labeled with that             from that of the tags held by the user (e.g., “StarTrek”,
tag).                                                              “Sport”).
Figure 5 shows the structure of a synthetic bipartite graph,       By contrast, User 3 is interested in both non-popular tags
highlighting the characteristics of two opposite cases: the        (such as “Pavarotti”, “2Cellos”, and “Nabucco”) and par-
first user (User 1), connected just to non-popular tags            ticularly popular (“StarTrek”, “Star Wars”, “Millwall”) or
(more specific), and the second one (User 3), connected            generic tags (“Sport”, “Music”), which increase the aver-
to tags with, on average, higher popularity. To clarify the        age popularity of her topics to the value of 15.3 (Figure 4).
difference between popular and non-popular tags: the tag           In this case, PLIERS adapts its recommendations to the
“TV-Series” is far more popular than “SherlockHolmes”              “generic” nature of the target’s interests, deviating from
and, since the majority of users connected to the tag “Sher-       HeatS and agreeing with the suggestions made by ProbS
lockHolmes” are also connected to “TV-Series” (but the             and Hybrid (see Table 2).
opposite is not always true), semantically speaking, we
can refer to “TV-Series” as a “superclass” of “Sherlock-
Holmes”.
In the following, we compare the differences in the recom-
mendations lists (i.e., the resource vectors) generated by
PLIERS, ProbS, HeatS and Hybrid for the two considered
users.
User 1 is interested in topics related to classical music
(“Schubert”, “Verdi”, “Rossini”, etc.) that, in our graph,
have a low popularity with mean equal to 1.22. Figure 3
depicts the popularity of each tag connected to User 1 in
terms of number of users connected to that tag.
Table 1 contains the results obtained from the execution of
the four algorithms, considering User 1 as the target. We
reported the top 10 tags common to the rankings of all
the algorithms, together with their position in each rank-
ing, so as to better compare the differences among the
approaches. Next to each tag, its popularity is reported.             Figure 5: Structure of the synthetic user-tag bipartite graph.
PLIERS, HeatS and Hybrid (with λ = 0.5 – the value
generally used when there is no a priori knowledge on the
                                                               6
Affinity                    Symmetrically, we define the similarity index of an item
                       u1         u2          u3                                   ij with respect to the items already linked to the target
                                                                                   user ut , as

                                                                                                       Xk X m
                                                                                                               az,j · az,s · at,s |Ts ∩ Tj |
                                                                                               fjs =                                ·          ,   (3)
                                                                  Similarity                           z=1 s=1
                                                                                                                ki (tz ) · kt (is )   kt (ij )
                i1           i2          i3         i4                             where Tj is the set of tags connected to the item ij , kt (ij )
                                                                                   is the number of tags with which the item ij was marked,
                                                                                   ki (tz ) is the number of items connected to the tag tz , and
                                                                                   ax,y is an element of the matrix AIT .
                                                                                   We define the final score of an item ij as the linear com-
      t1        t2           t3         t4          t5            t6               bination of the two indices (2) and (3) as follows.

                                                                                                        fj = λ · fja + (1 − λ) · fjs ,             (4)
Figure 6: Affinity and similarity indices in a tripartite folksonomy
                                                                                   where λ ∈ [0, 1] is a tunable parameter of the algorithm
graph.                                                                             with which we can weigh the links between users and items
                                                                                   and those between items and tags.

3.4. Extension to tripartite graphs
                                                                                   4. Pervasive PLIERS
The three adjacency matrices introduced in Section 3.1 can
be represented as a tripartite graph G = (U, I, T, E, F )                          p-PLIERS implements a framework for the representation
where U is the set of users in the folksonomy, I is the set                        and exchange of context information describing contents
of items, T is the set of tags, E is the set of links between                      and users among nodes of an opportunistic network. It
users and items, and F is the set of links between items                           exploits PLIERS for the evaluation of the collected knowl-
and tags. If an edge el,s between the user-node ul and the                         edge and to provide personalized recommendations to the
item-node is exists, we say that the user ul was interested                        users about available interesting contents.
in the item is (i.e., she created or downloaded it in the                          To this aim, p-PLIERS supports the maintenance of a lo-
past) and, in a completely analogous way, if the item-node                         cal representation of the knowledge about the users, items
is is connected to the tag-nodes ti , . . . , tk , we mean that                    and tags in the network and their relations (i.e., a local
the item is was tagged with ti , . . . , tk .                                      knowledge graph - LKG) on each node of the network.
Recommender systems based on tripartite rather than bi-                            This knowledge is built by merging the information about
partite graphs exploit all the information available from                          contents created or downloaded by each local user, with
social tagging systems, and this can lead to better results                        the local knowledge of other nodes gathered during phys-
in terms of recommendation accuracy and precision, as we                           ical contacts. The nodes use this partial knowledge to
show in Section 6.                                                                 evaluate the relevance of items (content) carried by other
As depicted in Figure 6, in a tripartite graph we can define                       nodes in proximity, with respect to the interests of their
two characteristic values:                                                         local user. The evaluation is performed locally, without re-
                                                                                   quiring centralized control and without the need of having
Affinity index: is the score obtained by PLIERS when                               a global knowledge of the network (i.e., global knowledge
     applied on user-item links. This indicates the degree                         graph - GKG). We represent the LKG of each node as a tri-
     to which an item is close to the user’s interests and                         partite graph as described in the previous section. When
     preferences.                                                                  a node creates a new item, it updates its LKG by inserting
Similarity index: is the score obtained by PLIERS when                             the relation between the user entity that represents it in
     applied on tag-item links. This indicates the degree
     to which two items are similar in terms of the tags                           Algorithm 1 p-PLIERS: Content discovery and evalua-
     with which they are labeled.                                                  tion in opportunistic networks
More formally, we define the affinity index of an item ij                           1:   procedure Encounter(node n)
for a target user ut with the following formula:                                    2:      Send my LKG to n
                                                                                    3:      nLKG ← Receive n’s LKG
                    Xn X m
                            al,j · al,s · at,s |Us ∩ Uj |                           4:      Update my LKG with nLKG
            fja =                               ·          ,             (2)
                            ki (ul ) · ku (is )   ku (ij )                          5:      I ← new discovered items
                    l=1 s=1
where Uj is the set of users connected to the item ij , ku (ij )                    6:      for each item i ∈ I do
is the popularity degree of the item ij (i.e., the number of                        7:         score(i) ← evaluate i with PLIERS
connected users), ki (ul ) is the number of items connected                         8:      end for
to the user ul , and ax,y is an element of the matrix AUI .                         9:   end procedure

                                                                               7
1                                                           1                                                    500
                                                                                                                               450
                                                                                                                               400
                                                                         0.1                                                   350

                                                                                                                   Frequency
             0.1                                                                                                               300
P(X > x)

                                                           P(X > x)
                                                                                                                               250
                                                                        0.01                                                   200
                                                                                                                               150
            0.01                                                                                                               100
                                                                       0.001                                                    50

                                                                                                                                       po r
                                                                                                                                                15

                                                                                                                                                  s
                                                                                                                                     tio lla

                                                                                                                                      po ay

                                                                                                                                                  ti

                                                                                                                                 be a2 d
                                                                                                                                      de 5
                                                                                                                                      nk ita

                                                                                                                                                  n
                                                                                                                                    ex ge

                                                                                                                                               w

                                                                                                                                               is

                                                                                                                                                o
                                                                                                                                    ro 01

                                                                                                                                              oo
                                                                                                                                  na tare

                                                                                                                                  ex ld

                                                                                                                                   ba llav
                                                                                                                                           20

                                                                                                                                             fo
                                                                                                                                           ne

                                                                                                                                            m
                                                                                                                                  at un

                                                                                                                                         im
                                                                                                                                        na

                                                                                                                                         tti
                                                                                                                               un roh

                                                                                                                                       at

                                                                                                                                       m
                                                                                                                                     m

                                                                                                                                     g
                                                                                                                                  ze
           0.001                                                      0.0001
                   1             10                  100                       1                              10

                                                                                                                               al
                         Number of tweets per user                                 Number of tags per tweet

                               (a)                                                      (b)                                              (c)

Figure 7: Descriptive statistics of the Twitter dataset used for the WFD@Expo2015 scenario. (a) CCDF of the number of tweets per user,
(b) CCDF of the number of tags per tweet, and (c) Frequency of use of the most used tags.

the graph and the created item, and the relations between                                     The static scenario allowed us to evaluate the accuracy
the item and its tags. When a node encounters another                                         of PLIERS using a standard evaluation method (i.e. link
node in the network, the two nodes exchange their LKGs,                                       prediction, as detailed in the following). For this scenario,
and locally integrate them with the received information.                                     we downloaded the tweets generated during World Food
The basic operations of our solution are summarized in                                        Day at Expo 2015 (WFD@Expo2015) in the urban area
Algorithm 1.                                                                                  of Milan, and we built a tripartite graph composed by
Our solution relies on the fact that, as we will demonstrate                                  users, tweets, and their hashtags. This represents a realis-
with simulations, using the local knowledge of each node                                      tic folksonomy graph of online tagged contents related to
is sufficient to correctly evaluate the relevance of contents,                                a popular event.
and leads to recommendations that are compatible with                                         For the dynamic scenarios, we considered different situa-
the results that one would obtain with a global knowledge                                     tions in which pervasive communication systems may be
about all the information available in the whole network at                                   used. Specifically, the scenarios are characterized by vari-
the time of the evaluation. This supports the decentral-                                      able number of people moving with different mobility pat-
ization of recommendations, and allows us to efficiently                                      terns generated by human mobility models or derived from
apply recommender systems, and PLIERS in particular,                                          real contact traces. Each scenario represents a specific
to dynamic mobile environments.                                                               application use case: (i) a big event (WFD@Expo2015),
                                                                                              (ii) a conference (ACM KDD’15), and (iii) an urban area
                                                                                              (Helsinki city center). We performed a set of simulations
5. Experimental Evaluation: General Description
                                                                                              based on these scenarios, where each person is expected
As a first set of experiments, we evaluated the accuracy                                      to use a mobile device able to communicate through D2D
of PLIERS recommendations with respect to other recom-                                        communications with other devices in proximity. In addi-
mender systems typically used for content dissemination in                                    tion, each device allows the users to generate and tag con-
opportunistic networks. To do so, we considered a static                                      tents over time, and uses p-PLIERS to identify potentially
user-item-tag graph obtained from Twitter to evaluate the                                     interesting contents created by others. We think that the
accuracy of the recommendations given by PLIERS. Then,                                        scenarios cover a significant set of cases where pervasive
we thoroughly evaluated p-PLIERS in three different dy-                                       and mobile systems may be applied, and represent thus
namic opportunistic scenarios. In both cases (static and                                      the basis for realistic experimental evaluations.
dynamic), we used graphs derived from real online tagged
contents obtained from Twitter, and we set the parameter                                      6. PLIERS Experimental Evaluation in a Static
λ in equation 4 to 0.5.                                                                          Scenario

Table 3: Statistics of the Twitter dataset used for the simulations.
                                                                                              We compared the performances of PLIERS with respect
                                                                                              to the other recommender systems proposed in the litera-
                   Statistic                                          Value                   ture for opportunistic networks, namely user-based Col-
                   N. of tweets                                        5,260                  laborative Filtering and Tag Expansion. To assess the
                   N. of users                                         2,946                  accuracy of the recommendations, we performed a link
                   N. of unique hashtags                               3,292                  prediction task on a tripartite graph. This is a standard
                   Average n. of hashtags per tweet                     2.12                  way to evaluate and compare recommender systems [16].
                   Max n. of hashtags per tweet                           13                  In essence, the technique consists in randomly removing
                   Min n. of hashtags per tweet                            1                  a small portion of links from a folksonomy graph, then
                                                                                              verifying whether the recommendations generated on the

                                                                                        8
1200%                                                                             100%
                                PLIERS vs Tag-Exp                  PLIERS vs CF                                            PLIERS vs CF                 PLIERS vs Tag-Exp

                   1000%
                                                                                                          80%
                   800%
  Precision Gain

                                                                                            Recall Gain
                   600%                                                                                   60%

                                                                                                          40%
                    60%
                    40%
                                                                                                          20%
                    20%
                     0%                                                                                   0%
                           10    20    30   40      50       60   70   80   90 100                              10    20    30    40      50       60    70    80    90 100
                                                         k                                                                                     k

Figure 8: Precision gain obtained by PLIERS in a complete knowl-                         Figure 9: Recall gain obtained by PLIERS in a complete knowledge
edge scenario.                                                                           scenario.

pruned graph coincide with the removed links. Intuitively,                               not be able to recover the removed links, reducing the sig-
a good recommender system should be able to identify the                                 nificance of the experimental results. The percentage of
items that were originally connected to the removed links.                               links between users and items we deleted from the original
                                                                                         graph is equal to 1.4%. We computed the performances
6.1. Dataset Description                                                                 of each method using the measures of Precision (P) and
We downloaded a dataset of 5,260 tweets generated during                                 Recall (R) [16], defined as follows:
World Food Day at Expo 2015 through Twitter Stream-
                                                                                                                       1 X 1       X   1
ing API. To do so, we applied a filter to Twitter API by                                                        P =                         ,                                 (5)
indicating a set of hashtags (e.g., #Expo2015, #ExpoMi-                                                               |U | |T (u)|   pos(t)
                                                                                                                           u∈U                 t∈T (u)
lan2015, and other possible combinations) and we queried
only tweets generated in the area of Milan. We decided to                                                                   1 X |L(u) ∩ T (u)|
                                                                                                                  R=                           ,                              (6)
validate PLIERS using tweets generated in a limited area                                                                   |U |    |T (u)|
                                                                                                                                 u∈U
during a thematic event such as the WFD@Expo2015 since
                                                                                         where U is the set of users for whom we have removed
we think that the semantic relationships between the rel-
                                                                                         links, L(u) is the recommendation list for the user u, T (u)
ative folksonomy entities is significant, and we expect to
                                                                                         is the set of links removed from the user u, and pos(t) is
obtain meaningful and realistic recommendations.
                                                                                         the position in which the removed item t appears in the
The number of different users that generated the down-
                                                                                         list of recommended items L(u).
loaded tweets is 2,946. We removed the hashtags that we
used to filter the data (e.g., #Expo2015, #ExpoMilan)
                                                                                         6.3. Results
as they are contained in all the tweets, and they are thus
not useful to describe the contents. In Table 3, we report                               Figures 8 and 9 depict respectively the Precision gain and
the statistics of the dataset. Figure 7b and Figure 7a de-                               the Recall gain of PLIERS with respect to both CF and
pict the CCDF of the number of tags per tweet and the                                    Tag Expansion. The gains are expressed in percentage and
number of tweets per user. It is worth noting that both                                  identify the improvement obtained by PLIERS in terms of
CCDFs show a long-tailed distribution, a typical result for                              precision and recall with respect to the other algorithms.
social networks. In addition, in Figure 7c, we depict the                                The parameter k in the figures represents, for CF, the
frequency of use of the 10 most used tags in the dataset.                                number of users similar to the target user to be considered
                                                                                         in the calculation, and for Tag Expansion the k tags more
6.2. Link Prediction Task                                                                correlated with the tags of the target user to be considered.
We removed 1 link from each user connected to at least                                   Higher values of k should therefore lead to better recom-
5 items with popularity greater than 1, and then we ran                                  mendations. It is worth noting that PLIERS outperforms
PLIERS and the other systems on the updated graph to                                     the two reference algorithms for both measures, receiving a
generate the recommendation list for each user. Then,                                    precision score up to eight times higher than that obtained
we calculated the percentage of removed links that are in-                               by Tag Expansion and a score 40% higher than CF, even
cluded in the recommendations of each algorithm (i.e., the                               in case of very high values of k. With regard to the Recall
“recovered links”). We selected the links to be removed                                  measure, PLIERS obtains a score around 50% higher than
in this particular way in order to avoid the complete iso-                               that received by Tag Expansion and around 30% higher
lation of the items connected to just a single user. In                                  than that of CF.
fact, in those cases, all the recommender systems would
                                                                                     9
These results indicate that PLIERS, being able to exploit            knowledge about the associations between the contents in
all the information contained in the folksonomy graph (i.e.,         the network and the nodes which created them, and the
user-item-tag relationships), obtains better results than            associations between the contents and the tags associated
the state-of-the-art solutions used in opportunistic net-            with them.
works, which consider only partial user-item or item-tag             The LKG of each node is initially empty. Every time a
relationships.                                                       new item (tweet) i is created by node u, u, i, and each tag
                                                                     t associated with i are added to LKGu (if not present), as
                                                                     well as the link connecting u and i (eui ) and all the links
7. p-PLIERS Experimental Evaluation in Dy-
                                                                     connecting i and each of its tags t (fit ).
   namic Scenarios
                                                                     We also maintain GKG, the global tripartite graph that
In [11], ElSherief et al. established theoretical limits on          represents the complete knowledge of all the users, items,
the performance of knowledge sharing in opportunistic so-            and tags in the network at a certain time.
cial networks. They calculated how many contacts are
needed to ensure that nodes are able to well approximate             7.2. Measures
the global knowledge (i.e., all the available information            At each time step of the simulation (i.e., every minute),
in the network) for different sharing policies in a scenario         the simulator calculates the following measures:
where the global knowledge is static and defined a priori.
We performed an empirical evaluation similar to that car-              a. Average similarity between the LKG of each node
ried out in [11], but using more complex and realistic refer-             and the GKG.
ence scenarios, where the information is dynamically gen-              b. Average similarity between the vector of recommen-
erated over time by the nodes. Specifically, in our reference             dations generated by PLIERS on the local and global
scenarios, the users are characterized by different mobility              graphs.
and content generation patterns.
                                                                     To calculate the similarity between tripartite graphs, we
                                                                     first flattened the graphs, obtaining two adjacency lists L1
7.1. p-PLIERS Pervasive Simulator
                                                                     and L2 , and then calculated the Jaccard index J on these
To simulate the dynamic scenarios, we implemented a                  lists:
high-level simulator that emulates the execution of p-
PLIERS on a set of mobile nodes that generate contents                                                    |L1 ∩ L2|
                                                                                          J(L1 , L2) =                             (7)
over time. The simulator requires in input (i) a set of                                                   |L1 ∪ L2|
traces defining the physical contacts between nodes over             In the same way, the similarity of two ranked recommen-
time and (ii) a list of contents generated by the nodes and          dation vectors R1 and R2 is calculated as J(R1 , R2 ).
marked with a timestamp. Then, it calculates statistics              Although the Jaccard index is a standard and widely used
to evaluate p-PLIERS both in terms of its ability to ap-             measure of similarity, it does not consider possible differ-
proximate the global knowledge from a local perspective,             ences in terms of position in the rankings given by the rec-
and the accuracy of the given recommendations. The sim-              ommendation vectors, but rather considers only the pres-
ulation process proceeds in discrete steps, each of which            ence or absence of recommended items in the two vectors.
represents 1 minute of simulated time. For each contact              To account for possible differences in terms of position of
in the simulation steps, indicated by the timestamps in              the same elements in the rankings produced either using
the contact traces, the involved nodes establish a D2D               local or global knowledge graphs, we also calculated a sim-
communication between each other and they perform the                ilarity measure based on Spearman’s Footrule [6] defined
operations of Algorithm 1.                                           as follows:
Note that the simulator is at a higher abstraction level
than other existing network simulators (e.g., generic net-
                                                                                                P
                                                                                                           d(x, R1 , L2 )
work simulators such as ns-31 or OMNeT++2 , and simula-                        S(R1 , R2 ) = 1 − x∈R1 ∩R2                 ,   (8)
                                                                                                    max(|R1 |, |R2 |)
tors specific for opportunistic networks such as TheONE3 ).
Thus, at this level, we do not consider network-related is-          where
sues, but we assume that, when two nodes are in contact,
the communication channel between them was successfully                                   (
                                                                                              |R1 (x) − R2 (x)|   if x ∈ R1 ∩ R2
established.                                                           d(x, R1 , R2 ) =                                            (9)
To represent the knowledge about the available content in                                     max(|R1 |, |R2 |)   otherwise,
the simulated mobile network, each node u maintains a
local tripartite graph LKGu that represents its (partial)            where R(x) is the position of object x in the ordered set
                                                                     R. S is similar to the Jaccard index, but, in addition, it
                                                                     penalizes possible differences in terms of rankings of the
  1 https://www.nsnam.org/                                           elements in the resource vectors.
  2 https://omnetpp.org/
                                                                     In addition, for each simulation step, we calculated:
  3 https://akeranen.github.io/the-one

                                                                10
...

Figure 10: Map of Expo 2015 area with the position of five of the simulated communities. Note that the grid in the figure is only an example
to show how we divided the area for the simulations, but it does not represent the real grid used.

   c. Number of contents generated by the nodes over                     presence of “communities”, i.e., each node has a higher
      time.                                                              probability to meet nodes within its own community than
   d. Average number of contacts between nodes over                      nodes belonging to different communities. This fits well
      time.                                                              with WFD@Expo2015 scenario, as the area was divided
                                                                         into several pavilions. We can reasonably assume that
These measures are used to characterize the contact traces               the mobility of people inside each pavilion was lower than
and the contents used in the different scenarios. We an-                 the mobility of people moving between pavilions, and that,
ticipate that the synthetic and the real traces that we                  consequently, the density of intra-community contacts was
used show similar properties (e.g., the contact traces used              higher than that of inter-community ones. We generated
for the WFD@Expo2015 scenario show values compatible                     mobility traces through HCMM for 60 communities, ap-
with those used in the conference scenario), thus support-               proximately the number of pavilions at Expo (see Fig-
ing the significance of the synthetic trace.                             ure 10 for a graphical representation of a map of the area
We also calculated all the measures by considering that the              of Expo and the communities used in HCMM). We set
interests of nodes may be limited in time. To do so, we cal-             a probability of inter-community contacts of 0.1 (this pa-
culated the measures using only the most recent contents                 rameter is called “rewiring” probability in HCMM)4 . The
generated in the network and considering only the infor-                 speed of nodes ranges between 0.01 m/s (almost steady
mation about these contents in the folksonomy graphs. In                 nodes) and 1.86 m/s (nodes representing people walking
the simulations, we considered different “expiry date” for               relatively fast).
the contents, i.e., 1, 2 or 3 hours.                                     Using HCMM, we simulated the contacts between
                                                                         nodes for 13 hours, to cover the timespan of the
7.3. Scenario 1 - Big Event: World Food Day @Expo2015                    WFD@Expo2015, from 10am (opening time) till 11pm
As a first dynamic scenario for the evaluation of p-                     (closing time). We considered that each node has a trans-
PLIERS, we considered a big event attended by a large                    mission range of 20 meters (the maximum distance to avoid
number of people in a relatively large area. In this sce-                that nodes in adjacent pavilions are constantly in contact
nario, accessing the Internet from mobile devices may be                 with each other). When two nodes are in the transmission
problematic and thus obtaining useful content from D2D                   range of each other, HCMM generates a contact between
communications would provide an important source of in-                  them. We think that HCMM mobility model with these
formation for the users. We considered the World Food                    settings can well approximate the real mobility of people
Day at Expo 2015, organized on October 16, 2015. We as-                  during WFD@Expo2015.
sumed that people attending the event were able to create
tagged contents from their mobile devices, and that other                7.3.2. Content Generation
attendees might have been interested in obtaining these                  To have a realistic representation of multimedia contents
contents through D2D communications.                                     generated during the WFD@Expo2015, we downloaded
                                                                         the tweets generated during the event, using the Twit-
7.3.1. Mobility Traces                                                   ter Streaming API. To be sure to obtain only information
We simulated a set of nodes moving within an area of
300 x 2,000 meters, which coincides with the Expo area                       4 We performed the same simulations described in the following
in Milan. Each node represents a person equipped with a                  also with rewiring probabilities 0.2 and 0.3 (thus considering more
mobile device. The mobility of nodes is simulated through                dynamicity in the movements) and this resulted in a general, al-
                                                                         though slight, improvement due to a higher inter-community mo-
the HCMM human mobility model [5]. The model gener-                      bility of the nodes. Since the contact traces generated by setting
ates contacts between nodes according to a Pareto dis-                   a rewiring probability of 0.1 represent the worst case scenario, we
tribution, as observed in real traces, and considers the                 present only the results for this parameter.

                                                                    11
1                                                                                                 1                                                                                        100
                                                                                                                                                                                                                   90         250 users
                                                                                                                                                                                                                              500 users
                                                                                                                                                                                                                   80         900 users
                                                                                                                        0.1                                                                                        70

                                                                                                                                                                                                     Frequency
                                                                                                                                                                                                                   60
             0.1
                                                                                                                                                                                                                   50
P(X > x)

                                                                                                    P(X > x)
                                                                                                                                                                                                                   40
                                                                                                                     0.01                                                                                          30
                                                                                                                                                                                                                   20
            0.01                                                                                                                                                                                                   10
                                                                                                                    0.001                                                                                           0

                                                                                                                                                                                                                                           ita

                                                                                                                                                                                                                                 gm ood

                                                                                                                                                                                                                                 m 015

                                                                                                                                                                                                                                  tio lla

                                                                                                                                                                                                                                           ay

                                                                                                                                                                                                                             un to l

                                                                                                                                                                                                                                    po t
                                                                                                                                                                                                                                  ro 15

                                                                                                                                                                                                                                            er
                                      250 users                                                                                   250 users

                                                                                                                                                                                                                                          pa

                                                                                                                                                                                                                                 ex lis

                                                                                                                                                                                                                                        ng
                                                                                                                                                                                                                               na are
                                                                                                                                                                                                                              v

                                                                                                                                                                                                                                        ld

                                                                                                                                                                                                                               ze 20
                                                                                                                                                                                                                               at do
                                                                                                                                                                                                                                       ne
                                                                                                                                                                                                                          lla
                                      500 users                                                                                   500 users

                                                                                                                                                                                                                                        f
                                                                                                                                                                                                                                      a2

                                                                                                                                                                                                                                     na

                                                                                                                                                                                                                                     hu
                                                                                                                                                                                                                                      t
                                                                                                                                                                                                                        de

                                                                                                                                                                                                                                    at
                                      900 users                                                                                   900 users

                                                                                                                                                                                                                     ro
           0.001                                                                                                   0.0001

                                                                                                                                                                                                                   be
                                  1                       10                     100                                          1                                                  10

                                                                                                                                                                                                                 al
                                               Number of tweets per user                                                               Number of hashtags per tweet

                                                      (a)                                                                                     (b)                                                                                                 (c)

Figure 11: Descriptive statistics of the Twitter datasets used for the WFD@Expo2015 dynamic scenarios. (a) CCDF of the number of tweets
per user, (b) CCDF of the number of tags per tweet, and (c) Frequency of use of the most used tags.

                                                                                                                                                                                350
                                      160000          250 agents                       900 agents                                                                                                                                            250 agents
                                                      500 agents                                                                                                                                                                             500 agents
                                                                                                                                                                                300                                                          900 agents
                                      140000

                                                                                                                                                           Number of contents
             Number of contacts

                                      120000                                                                                                                                    250

                                      100000                                                                                                                                    200
                                      80000
                                                                                                                                                                                150
                                      60000
                                                                                                                                                                                100
                                      40000

                                      20000                                                                                                                                      50

                                           0                                                                                                                                     0
                                                  11 m
                                                         12 m
                                                            am
                                                                 pm
                                                                      pm
                                                                           pm
                                                                                pm
                                                                                     pm
                                                                                          pm
                                                                                               pm
                                                                                                     pm
                                                                                                               pm
                                                                                                                     pm

                                                                                                                                                                                           11 m
                                                                                                                                                                                                  12 m
                                                                                                                                                                                                     am

                                                                                                                                                                                                                 pm

                                                                                                                                                                                                                        pm

                                                                                                                                                                                                                              pm

                                                                                                                                                                                                                                   pm

                                                                                                                                                                                                                                        pm

                                                                                                                                                                                                                                             pm

                                                                                                                                                                                                                                                  pm

                                                                                                                                                                                                                                                        pm

                                                                                                                                                                                                                                                             10 m
                                                                                                                                                                                                                                                                 pm
                                                  a
                                                      a

                                                                                                                                                                                       a

                                                                                                                                                                                              a

                                                                                                                                                                                                                                                             p
                                             10

                                                                1
                                                                    2
                                                                        3
                                                                            4
                                                                                 5
                                                                                       6
                                                                                           7
                                                                                                8
                                                                                                               9
                                                                                                                   10

                                                                                                                                                                                      10

                                                                                                                                                                                                                 1
                                                                                                                                                                                                                     2
                                                                                                                                                                                                                             3
                                                                                                                                                                                                                                  4
                                                                                                                                                                                                                                      5
                                                                                                                                                                                                                                          6
                                                                                                                                                                                                                                               7
                                                                                                                                                                                                                                                    8
                                                                                                                                                                                                                                                         9
Figure 12: Overall number of contacts between nodes during the                                                                                      Figure 13: Number of contents generated by nodes during the simu-
simulation for the Expo 2015 scenario.                                                                                                              lation for the Expo 2015 scenario.

related to Expo, we filtered our requests in the same way                                                                                           number of tweets generated by users and the number of
as done for the dataset collected for the evaluation in the                                                                                         hashtags per tweet respectively, for the samples extracted
static scenario presented in Section 6. With respect to                                                                                             for the three settings (250, 500, and 900 nodes). The dis-
the dataset used for the static scenario, we downloaded                                                                                             tribution of hashtags per tweet are approximately the same
only tweets generated from 10am till 11pm, to cover the                                                                                             for the three samples: the majority of tweets have just one
same span of time of the simulated contact traces. The                                                                                              or two hashtags associated with them, whereas just few
obtained dataset contains 4,817 tweets generated by 2,660                                                                                           tweets are linked to more hashtags. Similarly, the number
users, and contains 3,008 unique hashtags.                                                                                                          of users who generated a high number of tweets is low,
We associated each node of the simulated contact traces                                                                                             when the majority of them has generated just one or two
with a Twitter user. In this way, the tweets of the Twitter                                                                                         tweets.
user associated with each node represent contents that the                                                                                          Figure 11c depicts the frequency of use for the 5 most
node generated during the simulation.                                                                                                               used tags for each sample. As expected, the most used
                                                                                                                                                    tags are related to topics about the Expo2015 exhibition,
7.3.3. Simulation Settings                                                                                                                          the WFD and some special guests (e.g., “mattarella” refers
We considered different settings for our simulations in or-                                                                                         to Sergio Mattarella, who is the actual President of Italy
der to analyze the possible impact of different parame-                                                                                             and was already in charge during the Expo in 2015, and
ters on the results. Specifically, we varied the number                                                                                             he was invited as a special speaker for the WFD event).
of nodes, simulating 250, 500, and 900 nodes in the con-                                                                                            Figure 12 depicts the total number of contacts between
sidered area. As the number of users that tweeted during                                                                                            nodes for each hour of simulation. It is worth noting that,
WFD@Expo2015 was higher than the number of simulated                                                                                                for each setting, the number of contacts between nodes
nodes (900 is the maximum number of nodes supported by                                                                                              is roughly constant for the entire simulation time, but it
HCMM), we sampled, for each setting, the Twitter users                                                                                              greatly differs from one setting to another, providing thus
with a uniform random sampling.                                                                                                                     significantly different cases in terms of opportunities of
In order to give an idea of the dynamics of content gener-                                                                                          contact between nodes in the simulations.
ation, Figures 11a and Figure 11b depict the CCDF of the                                                                                            Similarly, Figure 13 depicts the number of contents

                                                                                                                                              12
1                                                                               1                                                                                  1

                                                              Average Spearman similarity between

                                                                                                                                                  Average Jaccard similarity between
                                                                local and global resource vectors

                                                                                                                                                   local and global resource vectors
                    0.8                                                                             0.8                                                                                0.8
Graphs similarity

                    0.6                                                                             0.6                                                                                0.6

                    0.4                                                                             0.4                                                                                0.4

                    0.2                                                                             0.2                                                                                0.2
                                          250 agents                                                                          250 agents                                                                      250 agents
                                          500 agents                                                                          500 agents                                                                      500 agents
                                          900 agents                                                                          900 agents                                                                      900 agents
                     0                                                                               0                                                                                  0
                     10am   1pm   4pm           8pm    11pm                                          10am   1pm    4pm              8pm    11pm                                         10am   1pm    4pm           8pm    11pm
                                   Time                                                                                Time                                                                            Time

                                  (a)                                                                             (b)                                                                                (c)

Figure 14: Results for the WFD@Expo2015 scenario. (a) Average Jaccard similarity between local graphs of the agents and the global graph,
for different number of agents. (b) Average Spearman and (c) Jaccard similarity between the PLIERS resource vectors obtained on the local
graphs of the agents and those obtained from the global graph, for different number of agents.

(tweets) generated by nodes for each hour of simulation.                                                                graphs converge to those on the global graph. The pres-
The distribution is similar for the three settings: during                                                              ence of negative peaks in Figure 14b that are not visible in
the lunch break (from 12am till 1pm) there is an increment                                                              Figure 14c highlights the higher sensitivity of the similarity
in the generation of content, then it remain stable until the                                                           measure S compared to the Jaccard index. These nega-
evening hours and decreases closer to the exposition clos-                                                              tive peaks indicate a slight decrease in the performances
ing time. It is worth noting that no tweets containing the                                                              of PLIERS, which are nonetheless in the order of ∼15%
selected hashtags have been generated between 10am and                                                                  maximum, and the curve remains always above 0.6 for all
11am, probably due to the low number of visitors in the                                                                 the settings, at least after 4 hours of simulated time.
very first part of the event.                                                                                           Figure 25a depicts the average similarity between the lo-
                                                                                                                        cal graphs of the agents and the global graph, where only
7.3.4. Results                                                                                                          information generated not more than 1, 2 and 3 hours
Figure 14a depicts the average Jaccard index between the                                                                (of simulated time) before the calculations is respectively
local graphs of the agents and the global graph, for the                                                                considered. Note that the figure is related to the simula-
different numbers of simulated agents. From the figure,                                                                 tion with 900 agents. The differences in terms of average
we can note that, after a certain time, the similarity be-                                                              similarity of the curves related to the simulations for 3
tween the local graphs and the global graph, on average,                                                                and 2 hours are similar to the results obtained without
reaches a very high level for all the different settings con-                                                           temporal limitations. This tells us that, nodes can make
sidered. After approximately two hours of simulated time,                                                               accurate recommendations even if they maintain limited
the similarity reaches ∼80% for the cases with 900 and                                                                  information about past history and purge the older infor-
500 agents, with slight variations, whereas ∼4 hours are                                                                mation from their memory. The case where only informa-
needed to reach the same level of similarity for the case                                                               tion generated in the last hour is considered, the similarity
with 250 agents. After that, the similarity remains quite                                                               is slightly lower, but it is still around 75%, which could be
stable until the end of the simulation for all the settings.                                                            enough for some applications.
This means that even with a small number of nodes (250)
that generate items in a relatively large area (the whole                                                               7.4. Scenario 2 - Conference: KDD 2015
area of Expo), the opportunities of contact are enough to                                                               As a second dynamic scenario, we considered a school cam-
have a good approximation of the knowledge about items                                                                  pus during a conference event, where people stay most
and tags generated in the network, even after few hours.                                                                of the time within rooms, but they regularly gather at
Interestingly enough, the differences between 250 and 900                                                               breaks (e.g., coffee or lunch breaks). This is a typical sce-
agents does not have a substantial impact on the curve.                                                                 nario where pervasive and mobile communications could
The similarity between the results obtained by PLIERS on                                                                improve content delivery services and user experience in
local and global graphs are depicted in Figure 14b and 14c.                                                             general. We assume that the contents generated during the
Specifically, the former figure depicts the average similar-                                                            conference can be shared by nodes through wireless com-
ity S (derived from Spearman’s Footrule) for the recom-                                                                 munication, and we assess the potential impact of PLIERS
mendation vectors of the two cases, while the latter figure                                                             in the accuracy of recommendations using only local infor-
depicts the average Jaccard index on the same recommen-                                                                 mation obtained by mobile nodes from direct interactions
dation vectors. From the figures, it is worth noting that, on                                                           with other nodes, similarly to the previous scenario.
average, the differences between the results obtained from
local and global graphs are quite small. In addition, the                                                               7.4.1. Mobility Traces
figures clearly indicate that the results obtained on local                                                             For this scenario, we used real contact traces representing
                                                                                                                        the physical interactions of a group of students, professors,
                                                                                                                  13
You can also read