Combinations of Affinity Functions for Different Community Detection Algorithms in Social Networks

Javier Fumanal-Idocin1, Oscar Cordón2, María Minárová3, Amparo Alonso-Betanzos4, Humberto Bustince1
1 Public University of Navarra, Pamplona, Spain
2 Slovak University of Technology in Bratislava, Bratislava, Slovakia
3 University of A Coruña, A Coruña, Spain
4 University of Granada, Granada, Spain
javier.fumanal@unavarra.es

arXiv:2208.12874v1 [cs.SI] 26 Aug 2022

Abstract

Social network analysis is a popular discipline among the social and behavioural sciences, in which the relationships between different social entities are modelled as a network. One of the most popular problems in social network analysis is finding communities in the network structure. Usually, a community in a social network is a functional sub-partition of the graph. However, as the definition of community is somewhat imprecise, many algorithms have been proposed to solve this task, each of them focusing on different social characteristics of the actors and the communities. In this work we propose to use novel combinations of affinity functions, which are designed to capture different social mechanics in the network interactions. We use them to extend already existing community detection algorithms, so that they can benefit from the capacity of affinity functions to model social interactions different from those exploited by the original algorithms.

1. Introduction

Social network analysis has become an important part of many disciplines such as physics [1], public health and administration [2, 3], and biology [4]. The idea behind studying complex systems using this technique is that the elements composing such systems are part of multi-lateral interactions and processes, which deeply affect the system and each of its composing individuals. Many problems in computer science have been tackled using social network analysis [5]. This is the case of information propagation [6], leader detection [7], recommendation systems [8, 9] and decision making [10, 11]. One of the most popular problems of this kind is community detection.

A community in a social network is a group of nodes that present a significant interaction among them, and much less with the rest of the network. Community detection has been used to identify groups of friends [12] or a protein complex [13], and to discover routing strategies [14]. Many algorithms have been proposed to solve community detection, a great deal of them using the concept of modularity to measure how good the detected sub-partitions are [15]. The most popular community detection algorithm is the Louvain algorithm [16], which uses changes in modularity to obtain the optimal partition. Other popular algorithms that use the idea of modularity are the proposal in [15], which is a greedy algorithm designed to quickly reach a solution, and the proposal in [17], in which the authors propose to iteratively eliminate edges in a network using the idea of edge betweenness. Further approaches include using deep learning to generate an embedding of the graph and a reconstructed adjacency matrix in order to obtain a representation where the community structure can be seen clearly [18]; spreading the labels in the network as if they were an epidemic [19]; or using an equation, called the map equation, to model the information flows of the network [20].

Another proposal that does not include a modularity optimization process is [21], in which the authors propose to use a new kind of functions, the affinity functions, that model the local interactions between actors according to different social behaviours. They do so in order to propose a community detection algorithm, the Borgia Clustering, that simulates the dynamics of the conquests of Cesare Borgia in the Italian Renaissance. However, the possibilities of using affinity functions with other community detection algorithms were not studied. This possibility is of special relevance because affinity functions could improve the network features that some community detection algorithms exploit. In fact, some algorithms compute different representations of the network using computationally expensive processes, like a convolution [18]. Affinity functions could work in a similar way in those algorithms that do not compute other representations for the original network. Besides, although some combinations of affinity functions were proposed in [21], some mathematical properties of this combination could be
relaxed in order to explore further results.

Taking these considerations into account, in this work we aim to:

    1. Extend the convex combination of affinity functions to a more general expression.

    2. Explore the possibilities of using affinity functions with community detection algorithms other than the Borgia Clustering.

In order to achieve our aims, we use different combinations of affinity functions without restricting them to be convex combinations, and then we study these combinations with three of the most popular modularity-based community detection algorithms: the Louvain algorithm [16], the Greedy modularity algorithm [15] and the Girvan-Newman algorithm [17], and compare the results with the Borgia Clustering algorithm.

The rest of the work follows this structure: in Section 2 we describe the affinity functions used and how we combine them, and we also discuss the community detection algorithms used in our experimentation. In Section 3 we discuss the datasets used in our experimentation and the results obtained. Finally, in Section 4 we draw our conclusions for this work and the future lines for our research.

2. Methods

In this section we recall the notion of affinity function, the functions studied and the algorithms tested to perform community detection.

2.1. Affinity functions

Affinity functions were proposed in [21] to measure the strength of the relationship between a pair of actors in a social network by capturing the nature of their local interactions. They are defined as functions that take as input two different actors, x and y, and return a number in the [0, 1] interval:

    F_C : (x, y) \to [0, 1]

where C is the adjacency matrix, each of whose entries C(x, y) quantifies the relationship between the pair of actors x, y in a weighted network. The affinity value will be higher or lower depending on which aspect of the relationship we are taking into account, like the number of friends in common or the relative strength of the interaction between x and y. Some affinity functions are:

    • Best friend affinity: it measures the importance of a relationship with an actor y for the actor x, in relation to all the other relationships of x:

          F_C^{BF}(x, y) = \frac{C(x, y)}{\sum_{a \in V} C(x, a)}    (1)

    • Best common friend affinity: it measures the importance of the relationship by taking into account how important the connections shared by x and y are, in relation to all the other relationships of x in the network:

          F_C^{BCF}(x, y) = \frac{\max_{a \in V} \{\min\{C(x, a), C(y, a)\}\}}{\sum_{a \in V} C(x, a)}    (2)

    • Machiavelli affinity: it computes how affine two actors x and y are, based on how similar the social structure that surrounds them is:

          F_C^{Mach}(x, y) = 1 - \frac{|I_x - I_y|}{\max\{I_x, I_y\}},    (3)

      where I_a = \sum_{z \in Z} D(z), Z is the set of actors z such that C(a, z) > 0, and D(z) is the degree centrality of z.

Regarding the interpretation of the affinity value obtained, a value of 0 means that no affinity has been found at all, while a value of 1 means that there is a perfect match according to the analyzed factors. Since affinities are not necessarily symmetrical, the strength of this interaction depends on who the sender and receiver are, as happens in human interactions. We call a network whose edges have all been computed with an affinity function an “affinity network”.

There are two different kinds of affinity functions: personal and structural affinities. Personal affinities are computed using the edges of the actors x and y, while structural affinities use different centrality measures of these actors. Two examples of personal affinity functions are the best friend and best common friend affinities. The Machiavelli affinity is an example of a structural affinity.

Depending on the affinity function computed, the resulting affinity network will have different structural properties. Consider, for example, the ratio of connections in the network with respect to all the potential ones (commonly called density): the Machiavelli affinity will result in a network with density equal to 1; the best friend affinity will preserve the
same density as in the adjacency network; and the best common friend affinity tends to increase the density with respect to the adjacency network.

More affinity functions and further details about them can be found in [21].

2.2. Combinations of affinity functions

The convex combination of two affinity functions is also a valid affinity function. However, some combinations of affinity functions that are not convex are also affinity functions. As long as we guarantee that the output is inside the [0, 1] interval, the resulting function is a valid affinity function. We are interested in these expressions because some affinity functions tend to have a higher average value than others. Combining affinity functions can be problematic because the values of one of the functions can be irrelevant compared to the other in most cases. One possible solution to this problem is to skew the mixing parameter α of the convex combination heavily in favour of the otherwise irrelevant affinity. However, this solution presents two problems:

    • This solution not only decreases one affinity, it also increases the other, which is not an intended behaviour. Also, finding the proper equilibrium between the mixing parameters can require an expensive fine-tuning.

    • The interpretation of this solution is counter-intuitive. If one of the mixing parameters is much higher than the other, we can interpret that one affinity function is much more important than the other in our analysis. However, this would not be the case, as we did not increase that parameter on purpose; we only tried to decrease the average value of the other affinity function.

Using a non-convex combination might solve both of these problems. By simply reducing one mixing parameter without affecting the other, we are able to scale both affinity functions to an appropriate average value without performing an expensive fine-tuning or affecting the interpretability of the combination.

To do so, we start by substituting the expression of a convex combination of two affinity functions:

    F_C^{CC} = \alpha F_C^{BF}(x, y) + (1 - \alpha) F_C^{BCF}(x, y)    (4)

with α ∈ [0, 1], for the linear combination:

    F_C^{LC} = \alpha F_C^{BF}(x, y) + \beta F_C^{BCF}(x, y)    (5)

with α, β ∈ [0, 1].

However, Eq. (5) might not always be an affinity function, so we establish the constraints that:

    \alpha + \beta \le 1    (6)
    \alpha, \beta \ge 0    (7)

In this way, if α + β = 1 we recover a convex combination, and if not, we obtain a value lower than the one obtained using a convex combination, so it will never be above 1. We call the expression in Eq. (5), as long as Eq. (6) and Eq. (7) hold, a less-than-convex combination.

It is straightforward that the less-than-convex combination of two affinity functions is another valid affinity function, because it will never be higher than the convex combination of affinity functions (which is a valid affinity function), and never lower than 0.

2.3. Community detection algorithms

For our experimentation we have studied three very popular community detection algorithms based on different modularity optimization processes:

    • Louvain community detection algorithm [16]: this method optimizes the modularity of the community structure found as the algorithm progresses. The algorithm is divided in two phases: (1) First, small communities are found by optimizing modularity locally on each node. We do so by computing the changes in modularity when each node i is placed in the same community as each of its neighbours, and choosing the biggest positive change in modularity for each node. We do so for each node until no positive change in modularity is possible. (2) Then, each of these communities is grouped into one node and the process repeats iteratively.

    • Girvan-Newman community detection algorithm [17]: this algorithm progressively removes edges from the original network based on how likely an edge is to be a bridge between communities, which is called “edge betweenness”. The algorithm repeats the following process iteratively until there are no more edges: (1) We
      compute all the edge betweenness. (2) We remove the edge with the highest value. Finally, we obtain a dendrogram where the leaves are the individual nodes.

    • Greedy modularity community detection algorithm [15]: it is a greedy algorithm designed to reach a solution in few steps. It works by optimizing the modularity value of each partition in the graph. Each node starts as the sole member of its community. Then, we select the edge from the original graph that yields the highest positive change in modularity. We do so iteratively, until all the nodes are connected. Then, we choose the partition with the best modularity as the final result.

We chose these algorithms as they are very popular, and the concept of modularity is one of the most popular ones in community detection in social network analysis. The procedure to use them with affinity functions is the same for the three algorithms: we compute the affinity function or the combination of affinity functions on the original adjacency matrix, and then we run the algorithm on the resulting affinity network.

For the sake of comparison, we have also compared the results obtained with these algorithms with those obtained by the Borgia Clustering [21]. The Borgia Clustering is a community detection algorithm that generalizes the gravitational clustering algorithm [22] using affinity functions, among other changes. This algorithm was explicitly designed to work with affinity functions, so we think that it is interesting to compare the results obtained using this algorithm with the rest of the community detection algorithms using affinity functions.

3. Experimental results

In this section we compare the results of performing community detection in three datasets:

    • Zachary Karate Club [23]: a social network of a university karate club that consists of 34 people and how they split into two communities.

    • Dolphin [24]: a social network of bottlenose dolphins, where each link represents the frequency with which they played.

    • Polbooks [25]: nodes represent books about US politics sold by Amazon. Edges represent frequent co-purchasing of books by the same buyers, as indicated by the “customers who bought this book also bought these other books” feature on Amazon.

3.1. Performance metrics

Normalized Mutual Information (NMI) is a normalization of the Mutual Information score to scale the results between 0 (no mutual information) and 1 (perfect correlation). We have used the NMI to evaluate the results obtained in community detection against ground truth labels.

3.1.1. Normalized Mutual Information

Given the conditional entropy of two populations:

    H(Y \mid X) = -\sum_{x} \sum_{y} p(x, y) \log p(y \mid x)    (8)

then, the mutual information of two random discrete variables can be defined as:

    I(X; Y) = H(X) - H(X \mid Y),    (9)

which measures the mutual dependence between X and Y. So, the final expression for the NMI is:

    NMI(X, Y) = \frac{2 I(X; Y)}{H(X) + H(Y)}    (10)

3.1.2. Modularity

Modularity is a metric of the structure of networks that measures the strength of the partition into different communities [26]. Networks with high modularity have dense connections between the nodes within communities but scarce connections between nodes in different ones.

The expression of modularity, Q, is the following:

    Q = \sum_{i=1}^{c} (e_{ii} - a_i^2)    (11)

where c is the number of communities, e_{u,v} is the fraction of edges with one end in community u and the other end in community v, and a_i is the fraction of edges with one end in the i-th community.

3.2. The effects of different affinity combinations in the networks

In this subsection we test the effects of using different affinity functions in each dataset, compared to the original adjacency matrix. We expect the best friend affinity not to alter the degree of each node, although other centrality measures may be affected. For the best common friend affinity, we expect it to increase the degree of each node, which might reduce other centrality measures, especially the betweenness, because more edges will be present in the network.
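As a minimal sketch of these expectations, the following pure Python code computes the best friend affinity of Eq. (1), the best common friend affinity of Eq. (2) and their combination in Eq. (5) on a small toy network. The toy adjacency matrix, the function names, the handling of isolated actors and the check that alpha + beta does not exceed 1 are assumptions made for illustration; they are not taken from the implementation used in the experiments.

```python
def best_friend(C):
    """Eq. (1): weight of the edge (x, y) relative to all the weight leaving x."""
    n = len(C)
    F = [[0.0] * n for _ in range(n)]
    for x in range(n):
        total = sum(C[x])
        if total == 0:
            continue  # assumption: an isolated actor has affinity 0 with everyone
        for y in range(n):
            if x != y:
                F[x][y] = C[x][y] / total
    return F


def best_common_friend(C):
    """Eq. (2): strongest connection shared by x and y, relative to the weight leaving x."""
    n = len(C)
    F = [[0.0] * n for _ in range(n)]
    for x in range(n):
        total = sum(C[x])
        if total == 0:
            continue  # same assumption as in best_friend
        for y in range(n):
            if x != y:
                shared = max(min(C[x][a], C[y][a]) for a in range(n))
                F[x][y] = shared / total
    return F


def combine(F1, F2, alpha, beta):
    """Eq. (5): alpha * F1 + beta * F2, kept in [0, 1] by requiring alpha + beta <= 1."""
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("expected alpha, beta >= 0 and alpha + beta <= 1")
    return [[alpha * a + beta * b for a, b in zip(r1, r2)] for r1, r2 in zip(F1, F2)]


def out_degree(M):
    """Number of non-zero outgoing edges of each actor."""
    return [sum(1 for w in row if w > 0) for row in M]


def density(M):
    """Ratio of existing directed edges to all the potential ones."""
    n = len(M)
    return sum(out_degree(M)) / (n * (n - 1))


# Hypothetical toy network: a triangle (0, 1, 2) plus actor 3 attached to actor 2.
C = [
    [0, 2, 1, 0],
    [2, 0, 1, 0],
    [1, 1, 0, 3],
    [0, 0, 3, 0],
]

bf = best_friend(C)
bcf = best_common_friend(C)
mixed = combine(bf, bcf, alpha=0.5, beta=0.1)

print("degree, adjacency:  ", out_degree(C))
print("degree, best friend:", out_degree(bf))
print("degree, combination:", out_degree(mixed))
print("density, adjacency vs combination:", density(C), density(mixed))
```

On this toy network the best friend affinity keeps the degree of every actor, while the combination adds edges between actors that only share a neighbour, such as actors 0 and 3.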
An example of three different affinity functions applied to the Zachary karate network is shown in Figure 1. We can see that adding the best common friend affinity did increase the number of edges in the network, and that the role that some actors present in the network varies significantly depending on the affinity function used. We expect the results in community detection to change considerably, since the centrality measures of some actors change significantly. Especially relevant are the changes of actors “34” and “2”. It is also noteworthy that some actors, like “17” or “8”, do not change their role in the network regardless of the affinity used. In this particular case, we can see that even though there are clearly two communities in the three affinity networks, the border between them varies significantly, since the position of some actors in between the communities differs remarkably across the different affinity networks.

[Figure 1: three affinity networks of the Zachary karate network. Panel a) α = 0.25, β = 0.50. Panel b) α = 0.50, β = 0.10. Third panel: parameters not shown.]

3.3. Community detection experiments

In this subsection we evaluate the results obtained using the combinations of affinity functions with community detection algorithms other than the Borgia Clustering, and then we compare them with those obtained using this algorithm. We use the modularity as an internal index and the NMI as an external one to evaluate our results.

In Table 1 we display the results obtained for each dataset and algorithm using the original adjacency matrix. We can see that the Greedy Modularity algorithm obtained the best average results in the NMI index and the Louvain algorithm did so for the modularity index.

We have studied how these indexes change when we apply the three algorithms to the affinity matrix obtained with the combinations proposed in Eq. (5) instead of the original adjacency matrix. We report the results in Figure 2 for the modularity index, and in Figure 3 for the NMI index.

A summary of the best results can be found in Table 2, alongside the results of the Borgia Clustering for the same trials. We found that in most cases the best friend affinity obtains the best results, especially in the case of the Greedy Modularity and the Girvan-Newman algorithms. However, in the case of the Louvain algorithm, non-convex combinations of affinity functions gave the best results for the NMI index. Compared to the results obtained using the adjacency matrix, the NMI index decreased dramatically for the Dolphin and Polbooks datasets, but modularity seemed to improve in most cases, especially for the Louvain algorithm.

All the studied algorithms seemed to obtain good modularity results, both in the case of the adjacency and the affinity matrices. In fact, these algorithms seemed to outperform the Borgia Clustering in terms of the modularity index, although the Borgia Clustering presented much higher NMI values than the rest of the algorithms, probably because the Borgia Clustering correctly computed the number of real communities for each dataset.
                          23                 29
                   27
                                                       32
                        30
                                  24
                                        28

                                                  25
                                                                                                                   4.     Results and Discussion
                                             26

                         c) α = 0.75, β = 0.25
                                                                                                                       In this experimentation we studied the performance
     Figure 1. Visual representation of the Zachary                                                                of different community detection algorithms using less
       Karate club Network using three different                                                                   than-convex combinations of affinity functions, in three
    combinations of affinities, using the parameters                                                               different datasets. We found satisfactory results for the
   indicated for each one. We can observe that two                                                                 modularity metric in all datasets for all the algorithms
communities are clearly present in the three networks,                                                             tested, and for the two metrics studied in the Zachary
  although the frontier between both is more clearly                                                               karate network. Also, we found the best friend affinity
        present in (b) compared to (a) and (c).                                                                    to be the most effective affinity function to perform
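
The weighted combinations named in the Figure 1 panels can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two affinity functions below (`row_normalised_affinity` and `common_neighbour_affinity`) are hypothetical stand-ins chosen for this example, and the constraint α + β ≤ 1 is our reading of "less-than-convex".

```python
def row_normalised_affinity(adj):
    """Hypothetical stand-in affinity: each weight divided by its row total."""
    result = []
    for row in adj:
        total = sum(row)
        result.append([w / total if total else 0.0 for w in row])
    return result


def common_neighbour_affinity(adj):
    """Hypothetical stand-in affinity: strongest tie shared through a third node."""
    n = len(adj)
    return [
        [
            0.0 if x == y else max(min(adj[x][a], adj[y][a]) for a in range(n))
            for y in range(n)
        ]
        for x in range(n)
    ]


def combine_affinities(f1, f2, alpha, beta):
    """Entrywise alpha * f1 + beta * f2, assuming alpha, beta >= 0 and alpha + beta <= 1."""
    if alpha < 0 or beta < 0 or alpha + beta > 1 + 1e-12:
        raise ValueError("expected alpha, beta >= 0 and alpha + beta <= 1")
    n = len(f1)
    return [
        [alpha * f1[i][j] + beta * f2[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy path graph 0-1-2-3, combined with the weights of panel (b).
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
combined = combine_affinities(
    row_normalised_affinity(toy_adj),
    common_neighbour_affinity(toy_adj),
    alpha=0.50,
    beta=0.10,
)
```

The resulting matrix can be passed to any algorithm that accepts a weighted adjacency matrix. Note that the second stand-in assigns a positive weight to nodes 0 and 2, which are not adjacent in the toy graph.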
[Figure 2 panels: heatmaps of "value" for Greedy Modularity, Girvan-Newman and Louvain (columns) on the Zachary, Dolphin and Polbooks datasets (rows). Only the cell annotations of the Polbooks panels for Greedy Modularity and Girvan-Newman are recoverable.]

Polbooks, Greedy Modularity
          0.0     0.1     0.25    0.5     0.75    0.9     1.0
  0.0     0       0       0.067   0.19    0.28    0.31    0.34
  0.1     0.069   0.054   0.039   0.026   0.024   0.021   0
  0.25    0.07    0.064   0.057   0.047   0.041   0       0
  0.5     0.07    0.068   0.064   0.058   0       0       0
  0.75    0.073   0.072   0.069   0       0       0       0
  0.9     0.074   0.073   0       0       0       0       0
  1.0     0.075   0       0       0       0       0       0

Polbooks, Girvan-Newman
          0.0     0.1     0.25    0.5     0.75    0.9     1.0
  0.0     0       0.55    0.55    0.55    0.55    0.55    0.55
  0.1     0.42    0.42    0.42    0.42    0.42    0.42    0
  0.25    0.42    0.42    0.42    0.42    0.42    0       0
  0.5     0.42    0.42    0.42    0.42    0       0       0
  0.75    0.42    0.42    0.42    0       0       0       0
  0.9     0.42    0.42    0       0       0       0       0
  1.0     0.42    0       0       0       0       0       0

Figure 2. Modularity results for the community detection algorithms using affinity networks. The best friend affinity
outperforms the combination with the best friend affinity in all cases. Results are generally better than those
obtained with the adjacency matrix according to this metric.
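
The two indices compared throughout this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: `modularity` uses the standard Newman formula on a symmetric weight matrix, and `nmi` normalises the mutual information by the arithmetic mean of the two entropies, which is one common convention (the text does not say which normalisation was used).

```python
import math
from collections import Counter


def modularity(weights, labels):
    """Newman modularity of a partition on an undirected weighted graph.

    weights: symmetric matrix as a list of lists; labels: community id per node.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    two_m = sum(degree)
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def nmi(labels_a, labels_b):
    """Normalised mutual information (arithmetic-mean normalisation, an assumption)."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:
        return 1.0  # both partitions are a single community
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    return 2 * mi / (h_a + h_b)


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy = [[0] * 6 for _ in range(6)]
for u, v in edges:
    toy[u][v] = toy[v][u] = 1

truth = [0, 0, 0, 1, 1, 1]
print(modularity(toy, truth), nmi(truth, [1, 1, 1, 0, 0, 0]))
```

On the toy graph, a partition that merges everything into one community scores zero on both indices here. In general the two indices need not agree, which is the kind of disparity discussed in this section: modularity depends only on the graph, while NMI depends only on the agreement with the reference labels.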

community detection.
    The Louvain algorithm seems to be the most
successful classical algorithm when using affinity
functions, according to the modularity index. The
Girvan-Newman algorithm also obtained a moderate
performance increase. In contrast, the Greedy
modularity algorithm performed better with the original
adjacency networks than with the affinity functions
tested. None of the studied algorithms performed better
than the Borgia Clustering in terms of agreement with the
real-world labels, but they did in terms of modularity.
    A possible explanation for the disparity between
the modularity and NMI results is that the social
interactions exploited by the affinity functions could
differ from the dynamics registered by the ground
truth labels. Besides, the behaviour of some of these
algorithms can be affected dramatically by the affinity
function chosen. For example, the best common friend
affinity leads to an increase in the number of edges in
the network. This will certainly affect the performance
and execution time of the Girvan-Newman algorithm, which
needs to recompute the edge betweenness in each iteration.

5. Conclusions

    In this work we have proposed a new method
to construct affinity functions using non-convex
combinations of affinity functions. We have explained
how these combinations work and how some centrality
measures change based on the mixing factors of the
combination. In addition, we have used the affinity
functions for the first time in classical community
detection algorithms designed to work with the
adjacency matrix. To test our proposal, we have run
a series of experiments using different combinations of
affinity functions, and we have compared the best results
obtained with the classical algorithms against those of
the Borgia Clustering.
    We found that the use of affinity functions can have
Greedy Modularity                                                                                                 Girvan-Newman                                                                                                                    Louvain
                                                                      value                                                                                                      value                                                                                                          value
                                                                                                                                                                                               

                                                                                                                                                                     
                                                                                                                 
                                                                                                                                                                                                                       
                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                             
              value

                                                                                                                            value

                                                                                                                                                                                                                                           value
                                                                                                                                                                                                                              
                                                                                                                                                                                                                               
                                                                                                                 
                                                                                                                                                                                                             
                                                                                                                                                                                                                                       
                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                               

  Zachary
                                                                      value                                                                                                      value                                                                                                          value
                                                                                                                                                                                               

                                                                                                                            
                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                             
              value

                                                                                                                            value

                                                                                                                                                                                                                                           value
                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                            
                                                                                                                 
                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                           

[Figure 3, panels Dolphin and Polbooks: heatmaps of NMI over a 7 x 7 grid with the values 0.0, 0.1, 0.25, 0.5, 0.75, 0.9 and 1.0 on both axes (each axis labelled "value"); colour scales run from 0.000 up to 0.014.]
Figure 3. NMI results for the community detection algorithms using affinity networks. We found good results for
   the Zachary karate network dataset, but underwhelming results for the Dolphin and Polbooks datasets in all cases.

a positive impact on the modularity metric performance,
although evaluation with ground truth labels decreased
in most cases. However, in the case of the Borgia
Clustering we found the opposite result: the evaluation
against ground truth labels was generally better than for
the rest of the algorithms, but the modularity values were
lower.

    Future research will study the relationship between
the Borgia Clustering and the rest of the community
detection algorithms. As the Borgia Clustering is
very expensive to compute in large networks, we are
particularly interested in a possible hybrid algorithm
that combines the Louvain algorithm and the Borgia
Clustering, in order to overcome some of the limitations of
the latter in terms of memory and execution time. We will
also study the interaction of affinity functions
with specific community detection algorithms in order to
develop functions that enhance the performance
of each individual algorithm.

6. Acknowledgments

Javier Fumanal Idocin and Humberto Bustince’s research has been
supported by the project PID2019-108392GBI00
(AEI/10.13039/501100011033).
Maria Minárová’s research has been supported by the projects
APVV-17-0066 and APVV-18-0052.
Oscar Cordón’s research was supported by the Spanish
Ministry of Science, Innovation and Universities under
grant EXASOCO (PGC2018-101216-B-I00), including
European Regional Development Funds (ERDF).

References

[1] M. Newman, “The physics of networks,” Physics Today,
    vol. 61, no. 11, pp. 33–38, 2008.
[2] T. W. Valente, Social networks and health: Models,
    methods, and applications, vol. 1. Oxford University
    Press New York, 2010.
[3] A. P. Chaves, I. Steinmacher, and V. Vieira, “Social
    networks and collective intelligence applied to public
    transportation systems: A survey,” VIII SBSC, pp. 16–23,
    2011.
[4] S. Horvath, Weighted network analysis: applications
    in genomics and systems biology. Springer Science &
    Business Media, 2011.
[5] M. Newman, Networks. Oxford University Press, 2018.
[6] I. Lobel and E. Sadler, “Information diffusion
    in networks through social learning,” Theoretical
    Economics, vol. 10, no. 3, pp. 807–851, 2015.
[7] S. M. H. Bamakan, I. Nurgaliev, and Q. Qu, “Opinion
    leader detection: A methodological review,” Expert
    Systems with Applications, vol. 115, pp. 200–222, 2019.
[8] W. Yu and S. Li, “Recommender systems based on
    multiple social networks correlation,” Future Generation
    Computer Systems, vol. 87, pp. 312–327, 2018.
[9] J. Wu, S. Wang, F. Chiclana, and E. Herrera-Viedma,
    “Two-fold personalized feedback mechanism for
    social network consensus by uninorm interval trust
    propagation,” IEEE Transactions on Cybernetics, 2021.
[10] H. Zhang, F. Wang, Y. Dong, F. Chiclana, and
    E. Herrera-Viedma, “Social trust-driven consensus
    reaching model with a minimum adjustment feedback
    mechanism considering assessments-modifications
    willingness,” IEEE Transactions on Fuzzy Systems,
    2021.
[11] E. Herrera-Viedma, F. J. Cabrerizo, F. Chiclana, J. Wu,
    M. J. Cobo, and S. Konstantin, “Consensus in group
    decision making and social networks,” Studies in
    Informatics and Control, vol. 26(3), pp. 259–268, 2017.
[12] S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and
    P. Spyridonos, “Community detection in social media,”
    Data Mining and Knowledge Discovery, vol. 24, no. 3,
    pp. 515–554, 2012.
[13] O. Mason and M. Verwoerd, “Graph theory and networks
    in biology,” IET Systems Biology, vol. 1, no. 2,
    pp. 89–119, 2007.
[14] N. P. Nguyen, T. N. Dinh, Y. Shen, and M. T.
    Thai, “Dynamic social community detection and its
    applications,” PLoS ONE, vol. 9, no. 4, p. e91431, 2014.
[15] M. E. Newman, “Fast algorithm for detecting community
    structure in networks,” Physical Review E, vol. 69, no. 6,
    p. 066133, 2004.
[16] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and
    E. Lefebvre, “Fast unfolding of communities in large
    networks,” Journal of Statistical Mechanics: Theory and
    Experiment, vol. 2008, Oct. 2008.
[17] M. Girvan and M. E. Newman, “Community structure
    in social and biological networks,” Proceedings of
    the National Academy of Sciences, vol. 99, no. 12,
    pp. 7821–7826, 2002.
[18] L. Wu, Q. Zhang, C.-H. Chen, K. Guo, and D. Wang,
    “Deep learning techniques for community detection in
    social networks,” IEEE Access, vol. 8, pp. 96016–96026,
    2020.
[19] S. E. Garza and S. E. Schaeffer, “Community detection
    with the label propagation algorithm: A survey,” Physica
    A: Statistical Mechanics and its Applications, vol. 534,
    p. 122058, 2019.
[20] M. Rosvall, D. Axelsson, and C. T. Bergstrom, “The
    map equation,” The European Physical Journal Special
    Topics, vol. 178, no. 1, pp. 13–23, 2009.
[21] J. Fumanal-Idocin, A. Alonso-Betanzos, O. Cordón,
    H. Bustince, and M. Minárová, “Community detection
    and social network analysis based on the Italian Wars of
    the 15th century,” Future Generation Computer Systems,
    vol. 113, pp. 25–40, 2020.
[22] W. Wright, “Gravitational clustering,” Pattern
    Recognition, vol. 9, no. 3, pp. 151–166, 1977.
[23] W. W. Zachary, “An information flow model for conflict
    and fission in small groups,” Journal of Anthropological
    Research, vol. 33, no. 4, pp. 452–473, 1977.
[24] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase,
    E. Slooten, and S. M. Dawson, “The bottlenose dolphin
    community of Doubtful Sound features a large proportion
    of long-lasting associations,” Behavioral Ecology and
    Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.
[25] L. A. Adamic and N. Glance, “The political blogosphere
    and the 2004 US election: divided they blog,” in
    Proceedings of the 3rd International Workshop on Link
    Discovery, pp. 36–43, ACM, 2005.
[26] M. E. Newman and M. Girvan, “Finding and evaluating
    community structure in networks,” Physical Review E,
    vol. 69, no. 2, p. 026113, 2004.

Dataset     Algorithm        NMI     Mod.
Zachary     Greedy Mod.      0.45    0.38
Zachary     Girvan-Newman    0.35    0.41
Zachary     Louvain          0.35    0.41
Dolphin     Greedy Mod.      0.41    0.49
Dolphin     Girvan-Newman    0.44    0.51
Dolphin     Louvain          0.33    0.51
Polbooks    Greedy Mod.      0.49    0.50
Polbooks    Girvan-Newman    0.48    0.51
Polbooks    Louvain          0.56    0.49

Table 1. Results for community detection on the three datasets considered, using three different
algorithms. In this case we used the adjacency matrix to compute the network partition.

Dataset     Algorithm            NMI      Mod.
Zachary     Greedy Mod.          0.31     0.23
Zachary     Girvan-Newman        0.42     0.55
Zachary     Louvain              0.31     0.85
Zachary     Borgia Clustering    0.83     0.36
Dolphin     Greedy Mod.          0.052    0.43
Dolphin     Girvan-Newman        0.038    0.55
Dolphin     Louvain              0.089    0.59
Dolphin     Borgia Clustering    1.00     0.37
Polbooks    Greedy Mod.          0.086    0.55
Polbooks    Girvan-Newman        0.014    0.55
Polbooks    Louvain              0.024    0.59
Polbooks    Borgia Clustering    0.56     0.49

Table 2. A summary of the best results for community detection computed over affinity networks
for the datasets and algorithms studied.
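
As an illustration of the two scores reported in Tables 1 and 2, the following minimal sketch computes the modularity of a partition and the NMI between a detected partition and ground truth labels. It is not the authors' code: the function names, the toy graph and the arithmetic-mean normalisation of the NMI are assumptions made for illustration, since the text does not state which normalisation was used.

```python
import math
from collections import Counter


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; communities: list of sets of nodes.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where L_c is the number of edges inside c, d_c the total degree of c,
    and m the number of edges in the graph.
    """
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    m = len(edges)
    internal = Counter()       # edges with both endpoints in the same community
    total_degree = Counter()   # sum of node degrees per community
    for u, v in edges:
        total_degree[label[u]] += 1
        total_degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalised mutual information, 2 I(A;B) / (H(A) + H(B)).

    The arithmetic-mean normalisation is an assumption of this sketch.
    """
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mutual = sum((c / n) * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    denom = entropy(labels_a) + entropy(labels_b)
    return 1.0 if denom == 0 else 2 * mutual / denom


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
detected = [{0, 1, 2}, {3, 4, 5}]
ground_truth = [0, 0, 0, 1, 1, 1]
detected_labels = [0 if node in detected[0] else 1 for node in range(6)]

print("modularity:", modularity(toy_edges, detected))
print("NMI:", nmi(ground_truth, detected_labels))
```

The two scores measure different things, which is consistent with the divergence seen in the tables: modularity depends only on the graph and the partition, while NMI depends only on the agreement between the partition and the ground truth labels.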