POTION : Optimizing Graph Structure for Targeted Diffusion

Page created by Ellen Torres

Arts & Entertainment

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

POTION : Optimizing Graph Structure for Targeted Diffusion
                                                          ∗ †1
                                             Sixie Yu            , Leo Torres‡2 , Scott Alfeld§3 , Tina Eliassi-Rad           ¶2
                                                                                                                                   , and Yevgeniy Vorobeychik              ‖1

                                                                                     1
                                                                                         Washington University in St. Louis
                                                                                             2
                                                                                               Northeastern University
                                                                                                 3
                                                                                                   Amherst College
arXiv:2008.05589v4 [cs.SI] 17 Feb 2021

                                         Abstract                                                          at particular subsets of critical devices [13], which should
                                         The problem of diffusion control on networks has been             be accounted for when modeling attacking behavior.
                                         extensively studied, with applications ranging from               Congestion cascades of ground traffic or flight networks
                                         marketing to controlling infectious disease. However,             are other examples, where the goal of resilience may be
                                         in many applications, such as cybersecurity, an attacker          to ensure that cascades concentrate on a subset of high-
                                         may want to attack a targeted subgraph of a network,              capacity nodes that can handle them, limiting the impact
                                         while limiting the impact on the rest of the network              on the rest of the network [11, 10]. In another domain,
                                         in order to remain undetected. We present a model                 medical treatments for certain diseases such as cancer
                                         POTION in which the principal aim is to optimize                  may leverage a molecular signaling network, with the
                                         graph structure to achieve such targeted attacks. We              goal of targeting just the pathogenic portion of it, while
                                         propose an algorithm POTION-ALG for solving the                   limiting the deleterious effects on the rest [44].
                                         model at scale, using a gradient-based approach that              We study the problem of targeted diffusion in which
                                         leverages Rayleigh quotients and pseudospectrum theory.           an attacker1 can modify the graph structure G =
                                         In addition, we present a condition for certifying that a         (V, E) to achieve two goals: 1) maximize the diffusion
                                         targeted subgraph is immune to such attacks. Finally, we          spread to a target subgraph GS , and 2) minimize
                                         demonstrate the effectiveness of our approach through             the impact on the remaining graph G \ GS . We
                                         experiments on real and synthetic networks.                       capture the first goal by maximizing a utility function
                                                                                                           that incorporates spectral information of the adjacency
                                         1    Introduction                                                 matrix of G, specifically its largest (in magnitude)
                                         Many diverse phenomena that propagate through a                   eigenvalue, eigenvector centrality, and the normalized
                                         network, such as epidemic spread, cascading failures,             cut of the target subgraph. The second goal is achieved
                                         and chemical reactions, can be modeled by network                 by limiting the modifications made outside of the target
                                         diffusion models [4, 6, 24, 46, 43]. The problem of               subgraph. We present a scalable algorithmic framework
                                         controlling diffusion has, as a result, received much             for solving this problem. Our framework leverages a
                                         attention in the literature, with primary focus on two            combination of gradient ascent with the use of Rayleigh
                                         mechanisms for control: the choice of initial nodes to            quotients and pseudospectrum theory, which yields
                                         start the spread [15, 8, 48], and the modification of             differentiable approximations of our objective and allows
                                         network structure [17, 38, 47, 39]. To date, most work            us to avoid projection steps that would otherwise be
                                         on diffusion control (either promotion or inhibition) has         costly and imprecise. Moreover, we derive a condition
                                         considered diffusion over the entire network. However,            that enables us to certify if a network is robust against
                                         in many problems, the focus is instead on diffusion that          a broad class of targeted diffusion attacks. Finally, we
                                         is targeted to a particular subgraph of the network. For          demonstrate the effectiveness of our approach through
                                         example, in cybersecurity, diffusion commonly represents          extensive experiments.
                                         malware spread, but malware attacks are often targeted
                                                                                                           In summary, our contributions are:
                                             ∗ The   first two authors contributed equally to the paper.    1. We propose POTION (oPtimizing graph structures
                                             † sixie.yu@wustl.edu.                                             fOr Targeted diffusION): a model for targeted
                                             ‡ leo@leotrs.com
                                                                                                               diffusion attack by optimizing graph structures.
                                             § salfeld@amherst.edu
                                             ¶ t.eliassirad@northeastern.edu
                                             ‖ yvorobeychik@wustl.edu                                        1 The   attacker is the agent who initiates diffusion.

                                                                                                                                                 Copyright © 2021 by SIAM
                                                                                                                         Unauthorized reproduction of this article is prohibited

2. We present POTION-ALG : an efficient algorithm to edge perturbation. All of these prior efforts focus on the
optimize POTION by leveraging Rayleigh quotient impact either at the network level or at the node-level
and pseudospectrum theory. properties, while our focus is on the impact of diffusion
dynamics on a targeted subgraph of the network.
3. We describe a condition for certifying that a targeted
subgraph is immune to such attacks. 3 POTION : Proposed Model
4. We demonstrate the effectiveness and efficiency of We present a model for targeted diffusion through graph
POTION and POTION-ALG on synthetic and real- structure optimization. We refer to the agent who
world networks; and against baseline and competing initiates diffusion as the attacker. We use cybersecurity
methods.2 as a running example. Here the attacker initiates the
diffusion (e.g., the spread of malware) on a network of
2 Related Work computers. We define the impact of the diffusion as
Various dynamical processes can be modeled as diffusion the number of infected nodes (e.g., compromised with
dynamics on networks, including the spread of infectious malware). The attacker has two objectives: 1) she wishes
diseases [4, 6], cascading failures in infrastructure to maximize the impact of the diffusion on a targeted set
networks [24, 46], and information spread (e.g., rumors, of nodes (e.g., computing nodes with access to critical
fake news) on social networks [20, 21]. One line of assets), and 2) to limit the impact on non-targeted nodes
research assesses the impact of cascading failures. Yang to ensure stealth [13].
et al. [46] simulated cascading failures to quantify Let G = (V, E) be a connected, weighted or unweighted,
the vulnerability of the power grid in North America. undirected graph with no self-loops. Let n = |V| be the
Fleurquin et al. [11] studied the impact of flight delays number of nodes in G and A be its adjacency matrix.
as a cascading failure diffusing through the network. Throughout this paper, the eigenvalues of A are ranked
Motter and Lai [24] investigated the cascading failures in descending order λ1 (A) ≥ · · · ≥ λn (A). Suppose the
on a network due to the malfunction of a single node. attacker targets a subgraph GS where S ⊆ V is the node
Another line of research concerns diffusion control, for set of GS . Let S 0 = V \ S, and its induced subgraph
example, selecting a set of nodes such that if the GS 0 . Throughout the paper we assume GS is connected,
diffusion originated from them, it reaches as many and denote its adjacency matrix by AS . To achieve her
nodes as possible [15, 8, 48]; or modifying network objectives, the attacker modifies the structure of G. The
structures to increase or limit some diffusion [38, 31]. modified graph and targeted subgraph are represented by
However, these lines of research do not differentiate G̃ and G̃S , respectively. Formally, the attacker’s action
between targeted and non-targeted nodes. Ho et al. [14] is to add a perturbation ∆ ∈ Rn×n to A, which results
studied targeted diffusion controlled by changing nodal in the perturbed adjacency matrix Ã = A + ∆. The
status. We focus on the problem where an attacker adjacency matrix of G̃S is denoted by ÃS .
manipulates underlying network structures in order to
achieve targeted diffusion. 3.1 Diffusion Dynamics The status of a node is
Another relevant research thread is network design, modeled by the well-known SIS (Susceptible-Infected-
which is the problem of modifying network structure Susceptible) diffusion dynamics, where it alternates
to induce certain desirable outcomes. Some prior work between “infected” and “susceptible”. 3 Due to the
[38, 47, 36] considered the containment of spreading malware spread by infected neighbors, a susceptible node
dynamics by adding or removing nodes or edges from becomes infected with probability β. An infected node
the network, while others [42, 32, 17, 7, 39] considered becomes susceptible again (e.g., malware is removed)
limiting the spread of infectious disease by minimizing with probability δ. Following Chakrabarti et al.[6], this
the largest eigenvalue of the network. Kempe et al. [16] process is modeled by a nonlinear dynamical system.
studied modifying network structure to induce certain Let πi be the probability of node i becoming infected
outcomes from a game-theoretic perspective, but they (e.g., compromised with malware) in the steady state
did not consider diffusion dynamics. Others have studied of this dynamical system, with π the vector of these
the problem of manipulating node centrality measures probabilities. A key result in [6] is that when λ1 (A) <
(e.g., eigenvector or PageRank centrality) [2, 1] or node δ/β the system converges to the steady state π = 0,
similarity measures (e.g., Katz similarity) [49] through which implies that the diffusion process quickly dies out.

3 Due
to brevity, a discussion on generalization of our approach
2 The
code to replicate the experimental results is at https: to other diffusion dynamics is at https://arxiv.org/abs/2008.
//github.com/marsplus/POTION. 05589.

However, when λ1 (A) ≥ δ/β the system converges to                   contributed by the infected neighbors of node i (which
another steady state π =6 0. We leverage this connection             increases Pit ), while the second term is the force due
between graph structure, dynamical model of epidemic                 to i’s self recovery (which decreases Pit ). Rewriting in
spread, and the epidemic threshold, in constructing our              matrix notation yields:
threat model, as discussed next.                                                            dP t
                                                                                                 = β Ã − δI P t ,
                                                                                                           
                                                                     (3.3)
                                                                                             dt
3.2 Threat Model Maximizing the Impact on
GS : To maximize the impact of diffusion on GS , the                 which gives a linear approximation to the non-linear
attacker has two goals: 1) ensure that epidemics starting            dynamical system  proposed in [6]. The steady state
in GS spread rather than die out, and 2) ensure that                 π must satisfy β Ã − δI π = 0, which is equivalent
epidemics starting outside GS are likely to reach it. We             to Ãπ = (δ/β)π. Suppose λ1 (Ã) = δ/β, and π is the
capture the first goal by maximizing the largest (in                 corresponding eigenvector. Let ṽ1 be the
                                                                                                            P unit eigenvector
modulus) eigenvalue of GS , λ1 (ÃS ), which corresponds             associated with λ1 (Ã). Let σ(S) = j∈S ṽ1 [j] be the
to the epidemic threshold of the targeted subgraph.4 The             eigenvector centrality of S. Noting that π may differ
second goal is captured by maximizing the normalized                 from ṽ1 by up to a multiplicative constant c, the impact
cut of GS , φ(S), where S is the set of nodes in GS and              on GS 0 can be approximated as:
S 0 are the nodes in the remaining graph. The normalized                                    X                X                           
cut is formally defined as follows:                                  (3.4)     I(GS 0 ) =           πj ≈ c           ṽ1 [j] = c 1 − σ(S) ,
                                                                                          j∈S 0            j∈S 0
                                       1         1
(3.1)      φ(S) = cut(S, S 0 )              +                 ,
                                     vol(S)   vol(S 0 )
                                                            where the last equality is because S and S 0 are disjoint
                 0
where cut(S, S ) is the sum of the weights on the edges and ṽ1 is an unit vector. Thus, minimizing the impact
across S and S 0 (unit weights for unweighted graphs), on GS 0 is approximately equivalent to maximizing the
and vol(S) (resp. vol(S 0 )) is the sum of degrees of the eigenvector centrality of S.
nodes in S (resp. S 0 ). The formal rationale for using
                                                            Recall that to have an epidemic spread, one needs
the normalized cut is based on Meila et al. [22], which
                                                            λ1 (Ã) ≥ δ/β. Here, we assumed λ1 (Ã) = δ/β. In
showed that increasing φ(S) increases the probability
                                                            Section 6, we demonstrate that our analysis yields an
that a random walker transitions from S 0 to S, if we
                                                            approach that is effective even when this assumption
assume that GS is smaller than GS 0 .
                                                            fails to hold (i.e., when λ1 (Ã) > δ/β).
Limiting the Impact on GS 0 : Another important
                                                            Now we focus on limiting the impact on the spectrum
objective of the attacker is to limit the impact on GS 0 ,
                                                            of A. 5 Let  > 0 be the attacker’s budget. Formally,
the non-targeted part of the graph. We capture this goal
                                                            this notion is captured through the following constraints:
in two different ways. First, by limiting the likelihood
                                                            (3.5)         |λi (Ã) − λi (A)| ≤ , i = 1, . . . , n.
of the epidemic spreading to GS 0P    , which we define as
minimizing the impact I(GS 0 ) = i∈S 0 πi . Second, by
                                                            In summary, the principal aims to (i ) maximize the
limiting the impact on the spectrum of A.
                                                            impact on GS through maximizing λ1 (ÃS ) while (ii )
We now demonstrate that minimizing I(GS 0 ) is approx- limiting the impact on GS 0 by maximizing the eigenvec-
imately equivalent to minimizing the eigenvector cen- tor centrality σ(S), and satisfying Eq. (3.5). Formally,
trality of S 0 . Let P t be the global configuration of the the principal aims to solve the following optimization
graph at time step t, where Pit is the probability that problem:
node i is infected (e.g., compromised with malware). Fol- (3.6)
lowing Mieghem et al. [23] , ignoring higher-order terms     max α1 λ1 (ÃS ) + α2 σ(S) + α3 φ(S)
                                                               Ã
and taking the time step to be infinitesimally small, the                       (                                              )
                  t                                                                  |λi (Ã) − λi (A)| ≤ , i = 1, . . . , n,
dynamics of Pi is modeled as the following:                  s.t.    Ã ∈ P = Ã                                                 ,
                     dPit   X                                                                   Ã = Ã> , Ãii = 0, ∀i = 1, . . . , n
(3.2)                     =     β Ãij Pjt − δPit .
                      dt                              where the relative importance of the terms is balanced by
                            j∈V
                                                      the nonnegative constants α1 , α2 , α3 , and the restrictions
                                                      Ã = Ã> and Ãii = 0, ∀i = 1, . . . , n ensure that Ã is a
Here, we can think of the two terms on the right side valid adjacency matrix.
as two competing forces. The first term is the force
                                                                       5 In  cybersecurity, there are natural interpretations of an
   4 If
      GS is not connected, we may replace λ1 (ÃS ) by the largest   attack’s stealth. For further details, see the extended version
eigenvalue of the largest connected component of GS .                at https://arxiv.org/abs/2008.05589.

                                                                                                          Copyright © 2021 by SIAM
                                                                                  Unauthorized reproduction of this article is prohibited

4   POTION-ALG : Proposed Algorithm                              and does not provide us with the necessary gradient
To solve the optimization problem in Eq. (3.6), a natural        information. Instead, we use the power method [12] to
approach would be to use a form of projected gradient            compute λ1 (ÃS ). Let vS be the eigenvector associated
ascent. There are, however, two major hurdles to this            with the largest eigenvalue λ1 (ÃS ). Using Rayleigh
basic approach: 1) the objective function involves terms         quotients [40], we can compute λ1 (ÃS ) as follows:
that do not have an explicit functional representation in
the decision variables, and 2) the projection step is quite
                                                                 (4.8a)           vS = arg max x> ÃS x
expensive, as it involves projecting into a spectral norm                                kxk2 =1
ball, which entails an expensive SVD operation [18]. We
                                                                 (4.8b)           λ1 (ÃS ) = vS> ÃS vS .
address these challenges in Algorithm 1, which is our
gradient-based solution to the attacker’s optimization
problem as described in Eq. (3.6).
                                                                 Thus, when vS is known, the computation of λ1 (ÃS )
Algorithm 1 POTION-ALG                                           reduces to matrix multiplications. In addition, ÃS
 1: Input: A, , {ηi }i=1 . {ηi }i=1 is a schedule of step sizes is usually sparse, so we can leverage sparse matrix
 2: Initialize: i = 1, Ã1 = A, B1 = 0 . Bi : the amount of multiplication to speed up the computation.
    budget used just before step i
 3: while True do                                                The remaining challenge is that vS is an optimal solution
 4:   Set ∆i to the gradient of α1 λ1 (ÃS )+α2 σ(S)+α3 φ(S)     of an optimization problem, and we need an explicit
    w.r.t. to Ãi                                                derivative of it. Fortunately, our problem has a special
 5:    Set the diagonal entries of ∆i to zeros                   structure that we exploit to obtain an approximation
 6:    if k∆i k = 0 then          . a local optimum is found     of the derivative of vS . From our experiments we find
 7:        return Ãi                                            that G̃S is nearly always connected. This means that
 8:    end if                                                    the largest eigenvalue of ÃS is simple. In addition, due
 9:    if Bi + ||ηi ∆i ||2 ≤  then     . one-step look ahead
                                                                 to the Perron–Frobenius theorem, the absolute value
10:        Ãi+1 = Ãi +ηi ∆i , Bi+1 = Bi +kηi ∆i k2 , i = i+1
                                                                 of the largest eigenvalue is strictly greater than the
11:    else
12:        return Ãi                                            absolute values of others, i.e., |λ1 (ÃS )| > |λk (ÃS )| for
13:    end if                                                    all k 6= 1. Under these conditions, we can use the
14: end while                                                    power method to estimate vS by repeating the formula:
                                                                   (t+1)          (t)       (t)
                                                                 ṽS      = ÃS ṽS /kÃS ṽS k2 . The `2 -norm distance
A key step of Algorithm 1 is line 4, where we compute the        between ṽS and vS decreases in a rate O(ρk ) [12],
                                                                             k

gradient of the attacker’s utility function with respect         where ρ < 1. In our experiments we found k = 50
to Ã. This gradient involves terms that do not have an          is enough to give a high-quality estimation for a graph
explicit functional form in terms of the decision variable,      with 986 nodes. Intuitively, we are using a sequence
and we deal with each of these in turn.                          of differentiable operations to approximate the argmax
                                                                 operation. Therefore the computation of ∇Ã λ1 (ÃS ) can
First, consider the gradient of the normalized cut φ(S)          be handled by PyTorch.
w.r.t. Ã. Let xS be the characteristic vector of S, that is
         1 iff i ∈ S. Let D̃ be the diagonal degree matrix We use the same machinery to compute ∇Ã σ(S). First,
xS [i] = P
D̃ii = j Ãij , and let L̃ = Ã − D̃ be the Laplacian we write σ(S) in matrix notation:
matrix. Using D̃ and L̃ to express vol(S) and cut(S, S 0 ), (4.9)                    σ(S) = v > xS ,
respectively, we have:
                                                             where v is the unit eigenvector associated with λ1 (Ã).
                                                     !
                                 1            1              Then we apply the power method to compute v. Finally,
(4.7)        φ(S) = x>S L̃xS   >
                                       + >             .     σ(S) is just a linear function of v. All of these operations
                              xS D̃xS    xS 0 D̃xS 0
                                                             are differentiable, and the computation of ∇Ã σ(S) is
                                                             handled by PyTorch.
Clearly, Eq. (4.7) is a differentiable function of Ã. Com-
puting its gradient ∇Ã φ(S) can then be handled by au- We next address the challenge imposed by the constraints
tomatic differentiation tools such as PyTorch [28].          (3.5), which can result in a computationally challenging
                                                             projection step which can also significantly harm solution
Next, we compute the gradient of λ1 (ÃS ) w.r.t. Ã. quality. We address this challenge as follows. Given a
A standard way to compute λ1 (ÃS ) is by using SVD. real symmetric matrix X, let kXk2 denote its spectral
However, this is both prohibitively expensive (O(n3 )), norm. To satisfy Eq. (3.5), we use the following result

                                                                                                     Copyright © 2021 by SIAM
                                                                             Unauthorized reproduction of this article is prohibited

from pseudospectrum theory (see [41], Theorem 2.2): After running Algorithm 1, we obtain a perturbed matrix
                                                           Ã with fractional entries. For unweighted graphs, a
(4.10) λi (Ã)−λi (A) ≤ , i = 1, . . . , n ⇐⇒ kÃ−Ak2 ≤  rounding heuristic is needed to convert Ã to a valid
                                                           adjacency matrix. Let D = {(i, j)|Ãi,j 6= Ai,j } be the
                                                           set of candidate edges that will be added or deleted
Since ∆ = Ã−A is real and symmetric, we have k∆k2 =
                                                           from G. For each edge (i, j) ∈ D define the score
max{|λ1 (∆)|, |λn (∆)|} and −λn (∆) = λ1 (−∆), which
                                                           s(i,j) = |Ãi,j − Ai,j |. Intuitively, s(i,j) indicates the
leads to:
                                                           impact that adding or deleting the edge has on the
(4.11)                                                     principal’s utility. Next, we iteratively modify G, by
  Ã satisfies Eq. (3.5) ⇐⇒ max{|λ1 (∆)|, |λ1 (−∆)|} ≤ .  adding or deleting edges in D, starting with the one
                                                           with the largest s(i,j) . The modification process stops
                                                           when the budget is exhausted, which results in the
                                                           desired binary adjacency matrix. For weighted graphs,
This equivalence allows the attacker to check
                                                           let C = maxi,j Aij and normalize each entry by C, that is
whether she is within budget simply by evalu-
                                                           Aij /C. We run Algorithm 1 on the normalized adjacency
ating max{|λ1 (∆)|, |λ1 (−∆)|}, i.e., computing the
largest eigenvalue of a real symmetric matrix, which matrix, which results in Ã. The desired adjacency matrix
can be computed efficiently using, e.g., the power is obtained by multiplying each Ãij by C, C Ãij . If
method [12].                                               integer weights are desired (e.g., the number of packages
                                                           transmitted between two computers), a final rounding
Our algorithm leverages this connection as follows. step is applied. Our experimental results show that the
Line 9 in Algorithm 1 is a one step look-ahead, which rounding heuristic is effective in practice.
ensures that the perturbation ∆i is only added to Ãi
when there is enough budget. Recall from Section 3.2 5 Certified Robustness
that k∆i k2 = max{|λ1 (∆i )|, |λ1 (−∆i )|}. Thus this This section addresses the following question: what
step requires us to compute λ1 (∆i ) and λ1 (−∆i ), using are the limits on the attacker’s ability to successfully
again the power method. Line 10 tracks the amount of accomplish her attack? More precisely, we now seek to
budget used so far. We now show that the output of identify necessary conditions on the attack budget  so
Algorithm 1 always returns a feasible solution. Suppose the attack succeeds; conversely, we can view a given
Algorithm 1 terminates after k > 1 iterations. This graph to be certified to be robust to attacks that use a
means Bk + kηk ∆k k2 >  and Bk ≤ . In other words smaller budget than the one required.
        Pk−1
Bk = i=1 kηi ∆i k2 ≤ . Note that the total amount
of perturbation added to A is ∆ = i=1 ηi ∆i . The Let TargetDiff(S, G, ) be an instance of the targeted
                                             Pk−1
triangle inequality implies k∆k2 ≤ .                      diffusion problem with target subset S, underlying graph
                                                           G and budget . The attacker is successful on an instance
For each iteration of Algorithm 1, the most computa- TargetDiff(S, G, ) if she is able to modify G into G̃
tionally expensive components are the power method within budget  such that I(G̃S ) > I(GS ). We now
and matrix multiplication. Let m be the number of derive a necessary condition for successful attacks, in
nonzeros in Ãi ; if the graph is unweighted then m is the form of a lower bound on .
the number of edges at this iteration. By leveraging the
sparseness exhibited in Ãi , the power method runs in To derive the necessary condition on , we use our
O(m) and the matrix multiplications cost O(mn). Thus, experimental observation that in successful attacks the
the time complexity of each iteration is O(mn), which degrees of nodes in the targeted subgraph GS always
significantly improves the O(n3 ) time complexity of SVD increase. This is intuitive: a denser subgraph GS will
that would otherwise be needed.                            tend to increase the propensity of the diffusion (e.g., of
                                                           malware) to spread within it, which is one of our explicit
Recall that our model for targeted diffusion is applicable objectives. Let di (resp. d˜i ) be the degree of node i
to both weighted and unweighted graphs. For weighted before (resp. after) graph modification. We assume if
graphs, the attacker modifies the weights on existing an attack is successful, the degrees of nodes in GS are
edges. For unweighted graphs, the attacker adds new increased, i.e., d˜i ≥ di for i ∈ S.
edges or deletes existing edges from the graph. The
main difference between the two settings is that the Now, observe that computing the exact value of I(GS )
latter needs a rounding heuristic to convert a matrix is intractable, since the exact computation of πi is
with fractional entries to a binary adjacency matrix. We prohibitive (see, e.g., [23], Section IV.B). Mieghem
discuss this heuristic below.                              et al. [23] proposed a simple yet effective estimator

                                                                                                Copyright © 2021 by SIAM
                                                                        Unauthorized reproduction of this article is prohibited

for πi to be 1 − δ/(βdi ). The estimator works in 6 Experiments and Discussion
the regime δ/β ≤ dmin , where dmin is the minimum This section presents experimental results on three real-
degree of G. Consequently, an estimator for I(GS ) is world datasets: an email network, an airport network,
ˆ S) = P
I(G i∈S 1 − δ/(βdi ). We focus on the setting and a brain network. 7
where the estimation error is bounded by a small number,
ˆ S ) − I(GS )| ≤ τ . Note that τ can be estimated
i.e., |I(G For each network we run POTION with hyper-
from historical diffusion data. The formal statement of parameters α1 = α2 = α3 = 1/3, which encodes that
the necessary condition is in Theorem 5.1. 6 the attacker’s objectives are equally important. To
study how the attacker’s effectiveness changes with re-
Theorem 5.1. Given an instance TargetDiff(S, G, ), spect to her budget, we set = γλ1 (A) and vary γ from
ˆ S) = P
I(GS ) is estimated by I(G i∈S 1 − δ/(βdi ). 10% to 50%. A single initially infected node is selected
Suppose we have an upper bound |I(G ˆ S ) − I(GS )| ≤ τ , uniformly at random.
the degrees of nodes in S are increased, i.e., d˜i ≥ Recall we use G and G̃ to denote the original and the
di for i ∈ S, and δ/β ≤ dmin . In order to have
modified graphs, respectively. We simulate the spreading
I(G̃S ) − I(GS ) > 2τ , the budget must satisfy:
dynamics 2000 times on both G and G̃. For unweighted
1/2
graphs the recovery rate δ and transmission rate β are
2
r !
2
P P
|S| d
i∈S i i∈S d i set to 0.24, 0.06, resp.; for weighted graphs we set
(5.12) ≥ − .
n |S| |S|2 δ = 0.24 and β = 0.2. The spreading dynamics converges
exponentially fast to the steady state: empirically, we
found 30 time steps to be enough to reach the steady
The quantity inside the square root is always nonnegative state in most cases. When the simulation finishes, we
due to Jensen’s inequality. The lower bound involves extract the number of nodes that are “infected”. We
only structural properties of the graph (node degrees and use Ioriginal and Imodified to represent the fractions of
the size of S) and thus can be easily computed given an infected nodes on G and G̃, resp.
arbitrary graph. As mentioned above, we can view this
lower bound as a robustness certificate, or guarantee for We use two other algorithms as baselines for comparison,
the given graph. It guarantees, in particular, that when which we call deg and gel. The two algorithms work
the budget is below the lower bound, the total probability by alternating between modifying GS and modifying
of “infection” (e.g., malware infection) in GS cannot be GS 0 until the budget is spent. When modifying GS ,
increased by more than 2τ . In the special case of perfect deg chooses the edge (i, j) with the maximum value of
estimation (τ = 0), it implies impossibility of increasing di + dj . Algorithm gel is based on [37], and chooses the
the susceptibility of GS to targeted diffusion. edge (i, j) with maximum eigenscore, defined as u[i]v[j],
where u, v are the left and right principal eigenvectors of
The proof of Theorem 5.1 does not depend on the specific A, respectively. This edge is chosen from among those
objective function proposed in this paper. Consequently, edges that are absent (if the graph is unweighted) or
the certificate is not specific to our particular objective present (if the graph is weighted). When modifying GS 0 ,
function. Further, the lower bound is independent of the these baselines choose an existing edge of to remove (if
values of δ and β, as long as δ/β ≤ dmin . unweighted) or decrease its weight (if weighted) with the
highest value of di + dj or u[i]v[j], respectively.
We briefly discuss the settings where the robustness
guarantee is most applicable. First, the estimation Unweighted Graphs: We consider (the largest
for the infected ratio on GS is accurate, i.e., |I(G ˆ S) − connected component of) an email network [19] that has
I(GS )| ≤ τ and τ is small. According to Mieghem et al. 986 nodes. An edge (i, j) indicates that there were email
[23], this usually happens on graphs with small degree exchanges between nodes i and j. This data set contains
variation. Another setting is where the degrees of nodes ground-truth labels to indicate which community a node
in GS increase as a result of the attack, which is both belongs to. We pick a community with 15 nodes as
natural and empirically founded, as we mentioned earlier. S; the results for communities with other sizes are
We provide experimental results on synthetic networks similar.
to verify the robustness guarantee in Section 6. The overall effectiveness of our approach is shown in
Figure 2, top. The difference Imodified − Ioriginal of the

6 Due to brevity the proof is in Appendix C at https://arxiv. 7 Due to brevity, additional results on real and synthetic

org/abs/2008.05589. networks are at https://arxiv.org/abs/2008.05589.

impact on the modified and original graphs is shown for 15%
BA

Imodified Ioriginal
GS (red line) and GS 0 (purple line), respectively. As γ
10% BTER
gets larger, the impact on GS increases, while the impact WS
on GS 0 is under control, which demonstrates that the 5%
proposed approach is highly effective at both increasing 0%
the impact of diffusion on the targeted subgraph, and at 0.0 0.2 0.4
the same time preventing the impact on the remaining
graph.
Figure 1: Certified robustness results. Dashed lines mark
Weighted Graphs: We consider an airport network
the lower bounds from Eq. (5.12). Solid lines represent
and a brain network. The airport network [25] was
infectious ratios within targeted subgraphs.
collected from the website of Bureau of Transportation
Statistics of the U.S., where the nodes represent all of
Email Airport Brain
the 1572 airports in the U.S. and the weights on edges

%
encode the number of passengers traveled between two

5.0
%
5% 0% 5% 0% 0% 0% 0% 0%

5.0
-0. 0. 0. 5. 10. 15. 0. 10.
airports in 2010. We scaled the weights on the airport

%
Imodified Ioriginal

% % %

5.0 0.0
0.0 5.0 0.0
network to [0, 1]. The targeted set S was chosen by first

%
sampling a node i uniformly at random, and then setting
S to be i and all its neighbors. We report experimental

5% 0% 5% 0%
-0. 0. 0. 0.
results for an S with 60 nodes. The brain network [9]

5% 0% 5%
-0. 0. 0.
consists of 638 nodes where each node corresponds to
a region in human brain. An edge between nodes i
0.1 0.2 0.3 0.4 0.5 0.1 0.2 0.3 0.4 0.5 0.1 0.2 0.3 0.4 0.5
and j indicates that the two regions have co-activated
on some tasks. The weight on the edge quantifies the GS GS 0 POTION deg gel
strength of the co-activation estimated by the Jaccard
index. The weights on edges lie in [0, 1]. The 638 regions
are categorized into four areas: default mode, visual, Figure 2: POTION effectively achieves targeted diffusion
fronto-parietal, and central. Each area is responsible for (top) in GS (red line) without affecting GS 0 (purple); higher
some functionality of human. We select 100 nodes from is better. Comparison against deg and gel baselines in GS
the central area as the targeted set S. The results for (middle; higher is better) and GS 0 (bottom; lower is better).
the airport (resp. brain) network are at the center (resp.
right) column of Figure 2. The overall trend is similar
to that of the email network. lower bounds on the budget computed using Eq. (5.12).
Note that when the budget is less than the lower bound,
Comparison against Baselines: The comparisons the differences are close to zero, which means that the
against the baselines are shown in Figure 2, middle and network is robust against targeted diffusion.
bottom rows. The moddle row shows the infectious Running Time: The running time of Algorithm 1 on
ratios within the targeted subgraphs. It is clear that our the three real-world networks is showed in Figure 3. Each
algorithm is more effective at increasing the infectious point in the figure is the average running time over 10
ratios than the baselines. The bottom row shows the trials. Intuitively, as the budget γ increases the attacker
infectious ratios within the non-targeted subgraphs. The needs to search a larger space, therefore the running
magnitudes of the differences are negligible, although in time increases. The numbers of nodes and edges of the
some cases our algorithm is significantly better than the three networks are in Table 1.
baselines (e.g., on airport network when γ = 0.4).
Airport
Verify the Certified Robustness: We run experi- 150 Brain
Email
seconds

ments on synthetic networks to verify the certified ro- 100
bustness; the synthetic networks include Barabási-Albert
(BA) [5], Watts-Strogatz [45], and Block Two-level Erdős- 50
Rényi (BTER) networks [33]. We use the same exper-
imental setup as described above. Fig. 1 shows the 0.1 0.2 0.3 0.4 0.5
difference of infectious ratios on the modified and origi-
nal graphs (within targeted subgraphs), as a function of
Figure 3: Running time on the real-world networks.
the attacker’s budget . The vertical dashed lines are the

Email Airport Brain of infectious diseases and its applications. Charles
Griffin & Company Ltd, 1975.
#nodes 986 1572 638
#edges 16064 17214 18625 [5] Albert-László Barabási and Réka Albert. Emer-
gence of scaling in random networks. Science, 286
Table 1: Statistics of the real-world networks. (5439):509–512, 1999.

[6] Deepayan Chakrabarti, Yang Wang, Chenxi Wang,
7 Conclusion Jure Leskovec, and Christos Faloutsos. Epidemic
Diffusion control on network has attracted much atten- thresholds in real networks. ACM Trans. Inf. Syst.
tion, however, most studies focus on diffusion over the Secur., 10(4):1:1–1:26, 2008.
entire network. We address the problem of targeted dif-
[7] Chen Chen, Hanghang Tong, B. Aditya Prakash,
fusion attack on networks. We present a combination
Charalampos E. Tsourakakis, Tina Eliassi-Rad,
of modeling and algorithmic advances to systematically
Christos Faloutsos, and Duen Horng Chau. Node
address this problem. On the modeling side, we present
immunization on large graphs: Theory and algo-
a novel model called POTION that optimizes graph
rithms. TKDE, 28(1):113–126, 2016.
structure to affect such targeted diffusion attacks, which
preserves structural properties of the graph. On the [8] Wei Chen, Yajun Wang, and Siyu Yang. Efficient
algorithmic side, we design an efficient algorithm named influence maximization in social networks. In KDD,
POTION-ALG by leveraging Rayleigh quotients and pages 199–208. ACM, 2009.
pseudospectrum theory, which is scalable to real-world
graphs. We also derive a condition to certify whether [9] Nicolas A Crossley, Andrea Mechelli, Petra E Vértes,
a network is robust against a broad class of targeted Toby T Winton-Brown, Ameera X Patel, Cedric E
diffusion. Our experiments on both synthetic and real- Ginestet, Philip McGuire, and Edward T Bullmore.
world networks show that the model is highly effective Cognitive relevance of the community structure of
in implementing the targeted diffusion attack. the human brain functional coactivation network.
PNAS, 110(28):11583–11588, 2013.
Acknowledgement
[10] Ernesto Estrada. ‘Hubs-repelling’ Laplacian and re-
SY and YV were partially supported by the National lated diffusion on graphs/networks. Linear Algebra
Science Foundation (grants IIS-1903207 and IIS-1910392) Appl., 2020.
and Army Research Office (grants W911NF1810208 and
W911NF1910241). LT and TER were supported in part [11] Pablo Fleurquin, José J Ramasco, and Victor M
by the National Science Foundation (IIS-1741197) and by Eguiluz. Systemic delay propagation in the US
the Combat Capabilities Development Command Army airport network. Scientific Reports, 3:1159, 2013.
Research Laboratory (under Cooperative Agreement
Number W911NF-13-2-0045). The authors would like to [12] Gene Golub and Charles Van Loan. Matrix com-
thank the anonymous reviewers and Chloe Wohlgemuth putations. Johns Hopkins Studies in Mathematical
for their helpful comments. Sciences, 1996.

[13] Nika Haghtalab, Aron Laszka, Ariel D. Procaccia,
References
Yevgeniy Vorobeychik, and Xenofon Koutsoukos.
Monitoring stealthy diffusions. Knowledge and
[1] Victor Amelkin and Ambuj K. Singh. Fighting
Information Systems, 2017.
opinion control in social networks via link recom-
mendation. In KDD, pages 677–685. ACM, 2019. [14] Christopher Ho, Mykel J. Kochenderfer, Vineet
Mehta, and Rajmonda S. Caceres. Control of
[2] Konstantin Avrachenkov and Nelly Litvak. The epidemics on graphs. In CDC, pages 4202–4207.
effect of new links on google pagerank. Stochastic IEEE, 2015.
Models, 22(2):319–331, 2006.
[15] David Kempe, Jon M. Kleinberg, and Éva Tardos.
[3] Lars Backstrom and Jure Leskovec. Supervised Maximizing the spread of influence through a social
random walks: predicting and recommending links network. In KDD, pages 137–146. ACM, 2003.
in social networks. In KDD, pages 635–644, 2011.
[16] David Kempe, Sixie Yu, and Yevgeniy Vorobeychik.
[4] Norman TJ Bailey et al. The mathematical theory Inducing equilibria in networked public goods games

through network structure modification. In AAMAS, [29] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena.
       pages 611–619, 2020.                                     Deepwalk: Online learning of social representations.
                                                                In KDD, pages 701–710, 2014.
[17]   Long T. Le, Tina Eliassi-Rad, and Hanghang Tong.
       MET: A fast algorithm for minimizing propagation [30] B. Aditya Prakash, Deepayan Chakrabarti, Nicholas
       in large graphs with small eigen-gaps. In SDM,           Valler, Michalis Faloutsos, and Christos Faloutsos.
       pages 694–702. SIAM, 2015.                               Threshold conditions for arbitrary cascade models
                                                                on arbitrary networks. Knowl. Inf. Syst., 33(3):
[18]   Stamatios Lefkimmiatis, John Paul Ward, and              549–575, 2012.
       Michael Unser. Hessian schatten-norm regulariza-
       tion for linear inverse problems. IEEE Trans. Image [31] Victor M. Preciado, Michael Zargham, Chinwendu
       Process., 22(5):1873–1888, 2013.                         Enyioha, Ali Jadbabaie, and George J. Pappas.
                                                                Optimal vaccine allocation to control epidemic
[19]   Jure Leskovec, Jon M. Kleinberg, and Christos            outbreaks in arbitrary networks. In CDC, pages
       Faloutsos. Graph evolution: Densification and            7486–7491. IEEE, 2013.
       shrinking diameters. ACM Trans. Knowl. Discov.
       Data, 1(1):2, 2007.                                 [32] Sudip Saha, Abhijin Adiga, B. Aditya Prakash,
                                                                and Anil Kumar S. Vullikanti. Approximation
[20]   Jure Leskovec, Mary McGlohon, Christos Faloutsos,        algorithms for reducing the spectral radius to
       Natalie S. Glance, and Matthew Hurst. Patterns of        control epidemic spread. In SDM, pages 568–576.
       cascading behavior in large blog graphs. In SDM,         SIAM, 2015.
       pages 551–556. SIAM, 2007.
                                                           [33] Comandur Seshadhri, Tamara G Kolda, and Ali
[21]   Jure Leskovec, Lars Backstrom, and Jon M. Klein-         Pinar. Community structure and scale-free collec-
       berg. Meme-tracking and the dynamics of the news         tions of erdős-rényi graphs. Phys. Rev. E, 85(5):
       cycle. In KDD, pages 497–506. ACM, 2009.                 056109, 2012.

[22] Marina Meila and Jianbo Shi. Learning segmen- [34] Jimeng Sun, Huiming Qu, Deepayan Chakrabarti,
     tation by random walks. In NIPS, pages 873–879.       and Christos Faloutsos. Neighborhood formation
     MIT Press, 2000.                                      and anomaly detection in bipartite graphs. In ICDM.
                                                           IEEE, 2005.
[23] Piet Van Mieghem, Jasmina Omic, and Robert E.
     Kooij. Virus spread in networks. IEEE/ACM Trans. [35] Hanghang Tong, Christos Faloutsos, and Jia-Yu
     Netw., 17(1):1–14, 2009.                              Pan. Fast random walk with restart and its
                                                           applications. In ICDM, pages 613–622. IEEE, 2006.
[24] Adilson E Motter and Ying-Cheng Lai. Cascade-
     based attacks on complex networks. Phys. Rev. E, [36] Hanghang Tong, B. Aditya Prakash, Charalampos E.
     66(6):065102, 2002.                                   Tsourakakis, Tina Eliassi-Rad, Christos Faloutsos,
                                                           and Duen Horng Chau. On the vulnerability of
[25] Tore Opsahl. US airport network traffic data in       large graphs. In ICDM, pages 1091–1096. IEEE
     2010. https://bit.ly/3dlpN3a, 2010.                   Computer Society, 2010.
[26] Lawrence Page, Sergey Brin, Rajeev Motwani, and [37] Hanghang Tong, B. Aditya Prakash, Tina Eliassi-
     Terry Winograd. The pagerank citation ranking:       Rad, Michalis Faloutsos, and Christos Faloutsos.
     Bringing order to the web. Technical report,         Gelling, and melting, large graphs by edge manipu-
     Stanford InfoLab, 1999.                              lation. In CIKM, pages 245–254. ACM, 2012.

[27] Romualdo Pastor-Satorras, Claudio Castellano, [38] Hanghang Tong, B. Aditya Prakash, Tina Eliassi-
     Piet Van Mieghem, and Alessandro Vespignani.       Rad, Michalis Faloutsos, and Christos Faloutsos.
     Epidemic processes in complex networks. Reviews    Gelling, and melting, large graphs by edge manipu-
     of modern physics, 87(3):925, 2015.                lation. In CIKM, pages 245–254. ACM, 2012.

[28] Adam Paszke, Sam Gross, Soumith Chintala, Gre- [39] Leo Torres, Kevin S Chan, Hanghang Tong,
     gory Chanan, Edward Yang, Zachary DeVito, Zem-      and Tina Eliassi-Rad. Node immunization with
     ing Lin, Alban Desmaison, Luca Antiga, and Adam     non-backtracking eigenvalues. arXiv preprint
     Lerer. Automatic differentiation in pytorch, 2017.  arXiv:2002.12309, 2020.

                                                                                               Copyright © 2021 by SIAM
                                                                       Unauthorized reproduction of this article is prohibited

[40] Lloyd N Trefethen and David Bau III. Numerical Appendix
linear algebra, volume 50. SIAM, 1997. A Generalization to Other Diffusion
[41] Lloyd N Trefethen and Mark Embree. Spectra and Dynamics
pseudospectra: the behavior of nonnormal matrices In this section we discuss generalization of the targeted
and operators. Princeton University Press, 2005. diffusion model, i.e., Eq. (3.6), to other common diffusion
dynamics. The fundamental question is: does the heuris-
[42] Piet Van Mieghem, Dragan Stevanović, Fernando tic encoded by the model apply to other scenarios with
Kuipers, Cong Li, Ruud Van De Bovenkamp, Daijie different diffusion dynamics (e.g., SIR or SEIR)?
Liu, and Huijuan Wang. Decreasing the spectral
radius of a graph by link removals. Phys. Rev. E, First, the feasible region of the model is independent of
84(1):016101, 2011. the diffusion dynamics, as it is only related to the spectral
properties of the underlying graph. Thus, the structural
[43] Tan Van Vu and Yoshihiko Hasegawa. Diffusion- (i.e., spectra, degree sequence, and triangle distribution)
dynamics laws in stochastic reaction networks. Phys. preserving properties of the diffusion model generalize to
Rev. E, 99:012416, 2019. other diffusion dynamics. Next, recall that the objective
[44] Zhihui Wang and Thomas S. Deisboeck. Dynamic function of the model is the following
targeting in cancer treatment. Frontiers in Physiol- α1 λ1 (ÃS ) + α2 σ(S) + α3 φ(S).
ogy, 10:1–9, 2019.
The third term is the normalized cut, which only depends
[45] Duncan J Watts and Steven H Strogatz. Collective on structural properties of the underlying graph, so
dynamics of small-world networks. Nature, 393 it generalizes to any other diffusion dynamics. The
(6684):440, 1998. first term also generalizes to many common diffusion
[46] Yang Yang, Takashi Nishikawa, and Adilson E dynamics, including SIR and SEIR, as their epidemic
Motter. Small vulnerable sets determine large thresholds are known to also be determined by the largest
network cascades in power grids. Science, 358(6365): eigenvalue of the underlying adjacency matrix [30]. The
eaan3184, 2017. only exception is the second term σ(S), that is, limiting
the impact on non-targeted subset through maximizing
[47] Sixie Yu and Yevgeniy Vorobeychik. Removing the eigencentrality of the targeted subset. This is because
malicious nodes from networks. In AAMAS, pages the rationale of maximizing σ(S) depends on the steady
314–322, 2019. state of the diffusion dynamics. Here, the steady state
is where in the long run a constant (in average) fraction
[48] Haifeng Zhang, Yevgeniy Vorobeychik, Joshua
of infected nodes exist. However, both SIR and SEIR
Letchford, and Kiran Lakkaraju. Data-driven agent-
have been shown without a steady state, as in the long
based modeling, with application to rooftop solar
run all nodes will be in the recovered state (i.e., immune
adoption. JAAMAS, 30(6):1023–1049, 2016.
to the diffusion) [27].
[49] Kai Zhou, Tomasz P. Michalak, Marcin Waniek, Ta-
Finally, the certified robustness in Section 5 generalizes
lal Rahwan, and Yevgeniy Vorobeychik. Attacking
to other diffusion dynamics, as its proof only depends on
similarity-based link prediction in social networks.
the spectral properties of the underlying graph.
In AAMAS, pages 305–313, 2019.
B Degree Sequence and Triangles
We now show that satisfying the restrictions Eq (3.5)
implies that certain structural properties of the graph
will be perturbed by only a small amount.
Indeed, the principal’s action has mild impact on the
degree sequence of G. Let d = A1 be the vector whose
i-th entry is the degree of the i-th node in the original
graph, and similarly, let d˜ = Ã1 be the degree sequence
after the perturbation.
Proposition B.1. The degree sequence of G before and
after the perturbation satisfies:
√
(2.13) kd˜ − dk2 ≤ n.

Proof.                                                                         And thus
(2.14)                                                                                                     n                      n
                          (a)            √   √
  kd˜ − dk2 = kÃ1 − A1k2 = k∆1k2 = k∆(1/ n)1 nk2
                                                                                                           X                      X
                                                                                          2|T̃ − T | ≈         λi (A)2 ηi ≤         λi (A)2 ηi
                                    √                                                                      i                      i
                                  ≤ n max k∆xk2
                                                              kxk=1                                                               n
                                                                               (2.20)                                             X
                                                    (b)   √                                                                ≤         λi (A)2
                                                    =         nk∆k2                                                               i
                                                          √
                                                    ≤      n,                                                             =  Tr(A2 ).

where (a) is due to the fact that Ã − A = ∆, and (b) comes                    Since A is symmetric and binary, we have Tr(A2 ) = 2m,
from the definition of spectral norm.                                          where m is the number of edges in G.

A direct corollary of Proposition B.1 concerns the average In what follows we present experimental results to show
degree of G.                                                      that the spectra and degree sequences do not change
                                                                  a lot due to the targeted diffusion. The spectra and
Corollary B.1. The average degree of G after the pertur- degree sequences of G and G̃ are showed in Figure 4,
bation is within  of the average degree before the perturbation: in which the top row presents the spectra with the
                                                                  eigenvalues as ranked in descending order, and the
                                                                  bottom row presents the degree sequences. The three
(2.15)           davg (G; Ã) − davg (G; A) ≤ .                  columns (from left to right) correspond to the email
                                                                  network, the airport network, and the brain network,
                                      > ˜                     >   respectively. The parameter γ is set to 0.5, the most
Proof. Note that davg (G̃) = 1 n d and davg (G) = 1 n d .
                                                                  powerful principal.
Thus we have:
                                                                  The Email Network: From Figure 4 (top row), the
                 ˜ − 1> d/n = (1/n) 1> (d˜ − d)
             1> d/n                                               eigenvalues with large value admit the largest deviation,
(2.16)                                                            while  the bottom row of that figure shows that the
                                    ≤ (1/n)k1k2 · kd˜ − dk2
                                                                  degree sequence is not significantly affected by the
                                    = .                          targeted diffusion. In fact, the change to the original
                                                                  degree sequence is mild, and a student’s t-test cannot
                                                                  differentiate the modified degree sequence from the
                                                                  original one (p-value=0.081).
Next, we perform a similar analysis for the number of
triangles before and after the perturbation.                      The Airport and The Brain Networks:                    The
                                                                  airport network is directed and each node pair is
Proposition B.2. Assume G is unweighted with m edges associated with two edges in opposite directions. We
and T triangles. Suppose the number of triangles after the convert the network to an undirected one by substituting
perturbation is T̃ . Then we have                                 an undirected edge for the two edges. The weight of the
                                                                  undirected edge is the sum of the weights on the two
(2.17)                    |T − T̃ | ≤ m,                         edges. The spectra and degree sequences of G and G̃
                                                                  are showed in the last two columns of Figure 4. As we
where the estimate is correct up to a first order approximation. can see from Figure 4 (top row), the graph spectrum is
                                                                  again nearly preserved, except the eigenvalues with small
Proof. Since G is unweighted, we have T = Tr A3 /6,
                                                            

where Tr is the trace operator. The restrictions Eq. (3.5)        values  admit some deviation. Similarly, the modified
guarantee that we can write λi (Ã) = λi (A) + ηi , where        degree   sequences cannot be differentiated from the
ηi ∈ [−1, 1]. Thus,                                               original one by student’s t-tests (airport: p-value=0.4969,
(2.18)                                                            brain: p-value=0.9919).
             n
             X                 n
                               X                    n
                                                    X
   6T̃ =         λi (Ã3 ) =           λi (Ã)3 =         (λi (A) + ηi )3 .
                                                                               C Proof of Theorem 5.1
             i                     i                i
                                                                               Proof. From the discussion in the main paper, an instance
Expanding the cube and neglecting the terms of higher order                    TargetDiff(S, G, ) can be encoded by the following meta
in ηi , we have                                                                model:
(2.19)
         n              n                      n                                                    max        I(G̃S ) − I(GS )
                                                                                                     Ã
        X               X                      X
6T̃ ≈      λi (A)3 + 3   λi (A)2 ηi = 6T + 3   λi (A)2 ηi .                  (3.21)
         i                     i                                 i                                 s.t.        Ã ∈ P.

                                                                                                                    Copyright © 2021 by SIAM
                                                                                            Unauthorized reproduction of this article is prohibited

eigenvalues

                                         eigenvalues

                                                                              eigenvalues
                         modified                                modified                                modified   of two convex sets.
          50             original                  5             original
                                                                                        4                original
                                                   0                                    2
           0                                                                            0                           The objective function is concave in d˜S , since it is twice
               0       500     1000                    0   500 1000 1500                    0     200 400 600       differentiable on the feasible region M and the Hessian matrix
               eigen-dimension                         eigen-dimension                          eigen-dimension     is negative definite; the Hessian matrix is a diagonal matrix
          300                                      1500                                                             with the i-th diagonal element being −2/d˜3i . Thus, Eq. (3.23)
                             modified                            modified               40              modified
                                         #nodes

                                                                              #nodes
          200                                      1000
#nodes

                             original
                                                                 original
                                                                                                        original    is a convex optimization problem. Note that the Slater’s
          100                                       500                                 20                          condition is satisfied (e.g., with d˜S = dS ), which indicates
            0                                         0                                  0                          that strong duality holds. Thus, the KKT conditions are
                   0 100 200 300                           0     2      4                 0.0 2.5 5.0 7.5 10.0
                      degree                                   degree   1e7                         degree          satisfied at any primal and dual optimal solutions.

Figure 4: Spectra (top) and degree sequences (bottom)                                                               For convenience, in what follows we use di (resp. d˜i ) to
for the left: the email network; middle: the airport                                                                represent the degree of a node i ∈ S before (resp. after)
network; and right: the brain network. The hyper-                                                                   graph modification. The Lagrange function of Eq. (3.23) is:
                                                                                                                    (3.24)
parameters are set to α1 = α2 = α3 = 1/3.                                                                                              X 1           
                                                                                                                                                   1                           
                                                                                                                       L(d˜S , λ, β) =          −       + λ n2 − kd˜S − dS k22
                                                                                                                                       i∈S
                                                                                                                                             di   d˜i
As discussed in [23] (see Section IV.B), computing the exact                                                                                         
                                                                                                                                     + β > d˜S − dS ,
value of I(GS ) is intractable, since the exact computation
of πi is challenging. Our model Eq. (3.6) can be thought
                                                                                                                    where λ ≥ 0 and β  0 are Lagrangian multipliers. Recall
of as a tractable proxy to the meta model.       An estimation
                                  ˆ S) = P                                                                          that the degrees of nodes in S are increased, i.e., d˜i ≥ di
to I(GS ) is given in [23], i.e., I(G           i∈S 1 − δ/(βdi ),                                                   for all i ∈ S. Note that for a node i ∈ S such that d˜i = di ,
where di is the degree of node i in G. The estimator works in
                                                                                                                    we let the corresponding βi = 0. Thus, by complementary
the region δ/β ≤ dmin , where dmin is the minimum degree
                                                                                                                    slackness, we have βi = 0 for all i ∈ S. The gradient of
of G. When the estimation is reasonably good, that is
 ˆ S ) − I(GS )| ≤ τ , we have the following relation:                                                              L(d˜S , λ, β) w.r.t. d˜i becomes:
|I(G
(3.22)                                                                                                                            ∂L      1                1
 I(G̃S ) − I(GS ) > 2τ =⇒ (I( ˆ G̃S ) + τ ) − (I(G
                                               ˆ S ) − τ ) > 2τ                                                                         = 2 − 2λd˜i + βi = 2 − 2λd˜i .
                                                                                                                                  ∂ d˜i  d˜i              d˜i
                                                    ˆ G̃S ) − I(G
                                                 =⇒ I(        ˆ S ) > 0.
                                                                                                                    Setting the gradient to zero leads to:
Thus, in what follows we focus on deriving the necessary                                                                                                     1/3
              ˆ G̃S ) − I(G
condition for I(        ˆ S ) > 0, which directly translates                                                                                              1
                                                                                                                                            d˜i =                    , ∀i ∈ S.
to the necessary condition for I(G̃S ) − I(GS ) > 2τ .                                                                                                   2λ

Suppose there exists an adjacency matrix Ã∗ ∈ P such that                                                          Since the optimal solution exists, we have λ 6= 0. By
I(G̃S ) − I(GS ) > 2τ . This indicates that the corresponding                                                       complementary slackness we have λ n2 − kd˜S − dS k22 = 0,
instance TargetDiff(S, G, ) is successful. Consequently, it                                                        which indicates:
follows       ˆ G̃S )− I(G
             I(         ˆ S ) > 0. Recall that I(
                                               ˆ G̃S )− I(G
                                                        ˆ S) =
 δ
   P that  1      1          ˜S ∈ R|S| represent the degree
                                                                                                                                       n2 = kd˜S − dS k22 .
β
         (
     i∈S di  −   d˜i
                     ). Let d
                                                                                                                    Expand the above equation:
sequence of nodes in S. Due to Proposition B.1 we have
kd˜S − dS k22 ≤ kd˜ − dk22 ≤ n2 . Consider the optimization                                                                            n2 = kd˜S − dS k22
problem in Eq. (3.23), where the objective function is                                                                                        X            2
ˆ G̃S ) − I(G
I(         ˆ S ) (up to a multiplicative factor). The last                                                                                  =      d˜i − di
constraint follows from the assumption that for a successful                                                        (3.25)                      i∈S
instance the degrees of nodes in S increase. The fact that                                                                                    X  1 1/3
                                                                                                                                                             !2
ˆ G̃S ) − I(G
I(         ˆ S ) > 0 implies that the optimal solution of                                                                                   =            − di .
Eq. (3.23) exists and the associated objective value is greater                                                                               i∈S
                                                                                                                                                  2λ
than zero.
                                                                                                                                   1 1/3
                                                                                                                                     
                                                                                                                    Substitute    2λ
                                                                                                                                            with a variable x and re-arrange the above
                                                                                                                    equation:
                                                       X 1      1
                                                                    
                                    max                       −                                                                                                          d2i − n2
                                                                                                                                            P                  P
                                                                                                                                        2           di
                                        d˜S
                                                       i∈S
                                                           di   d˜i                                                              x2 −        i∈S
                                                                                                                                                         x+        i∈S
                                                                                                                                                                                   = 0.
(3.23)                                                                                                                                      |S|                          |S|
                                    s.t.               kd˜S − dS k22 ≤ n2
                                                                                                                    According to vieta theorem, a necessary condition that we
                                                       d˜S  dS .                                                   can solve for x ∈ R from the above equation is:
                                                                                                                                                                 !
                                                                                                                               2 i∈S di 2                2     2
                                                                                                                              P                 P
                                                                                                                                                   i∈S di − n
                                                                                                                                         
Denote the feasible region of the above optimization problem
                                                                                                                                            −4                     ≥ 0,
as M. Note that M is a convex set since it is the intersection                                                                     |S|                 |S|

                                                                                                                                                          Copyright © 2021 by SIAM
                                                                                                                                  Unauthorized reproduction of this article is prohibited

25%                                                       4%                                                         20%
which leads to:                                                                                                                                                        (1/3, 1/3, 1/3)                                             (1/3, 0, 1/3)

                                                                                     Ioriginal

                                                                                                                                                Ioriginal

                                                                                                                                                                                                            Ioriginal
                                                                                                    0%                                                       3%        (1/3, 0, 1/3)                                    15%        (1/3, 0, 0)

                                                                                                                                           0
                                                                                                  -25%                                                       2%

                                                                                      S

                                                                                                                                                 S

                                                                                                                                                                                                             S
             r          P        2       P            2 !1/2                                                                                                                                                           10%
                  |S|       i∈S di           i∈S di                                               -50%                                                       1%

                                                                           Imodified

                                                                                                                                      Imodified

                                                                                                                                                                                                  Imodified
         ≥                          −                          .                                 -75%        (1/3, 0, 1/3)                                  0%                                                         5%
                   n        |S|              |S|2                                                             (0, 0, 1/3)

                                                                                                                                           0
                                                                            S

                                                                                                                                       S

                                                                                                                                                                                                   S
                                                                                                 -100%                                                      -1%                                                         0%
                                                                                                         10% 20% 30% 40% 50%                                      10% 20% 30% 40% 50%                                         10% 20% 30% 40% 50%

                                                                                                 5.00%                                                  0.40%          (1/3, 1/3, 1/3)
                                                                                                                                                                                                                     6.00%         (1/3, 0, 1/3)

                                                                                 Ioriginal

                                                                                                                                            Ioriginal

                                                                                                                                                                                                        Ioriginal
                                                                                                 0.00%
                                                                                                                                                                       (1/3, 0, 1/3)                                 4.00%         (1/3, 0, 0)
                                                                                                                                                        0.20%

                                                                                                                                      0
                                                                                                                                                                                                                     2.00%

                                                                                  S

                                                                                                                                             S

                                                                                                                                                                                                         S
                                                                                                 -5.00%
D    Additional Results on Real Networks                                                                                                                                                                             0.00%

                                                                       Imodified

                                                                                                                                  Imodified

                                                                                                                                                                                              Imodified
                                                                                                                                                        0.00%
                                                                                             -10.00%          (1/3, 0, 1/3)                                                                                         -2.00%
                                                                                                              (0, 0, 1/3)

                                                                                                                                      0
                                                                        S

                                                                                                                                   S

                                                                                                                                                                                               S
We run three experiments to show the effectiveness of                                        -15.00%                                                    -0.20%                                                      -4.00%
                                                                                                          10% 20% 30% 40% 50%                                      10% 20% 30% 40% 50%                                        10% 20% 30% 40% 50%
each term in the objective function of POTION . The
                                                                                                   5%                                                        1%                                                         5%
results are showed in Figure 5. The three rows (from

                                                                                     Ioriginal

                                                                                                                                                Ioriginal

                                                                                                                                                                                                            Ioriginal
                                                                                                                                                                     (1/3, 1/3, 1/3)                                             (1/3, 0, 1/3)
                                                                                                   0%                                                        0%      (1/3, 0, 1/3)                                      4%       (1/3, 0, 0)

                                                                                                                                           0
                                                                                      S

                                                                                                                                                 S

                                                                                                                                                                                                             S
                                                                                                                                                                                                                        3%
top to bottom) correspond to experimental results on                                              -5%                                                        0%
                                                                                                                                                                                                                        2%

                                                                           Imodified

                                                                                                                                      Imodified

                                                                                                                                                                                                  Imodified
                                                                                                 -10%        (1/3, 0, 1/3)                                  -0%
the email, airport and brain networks.                                                                       (0, 0, 1/3)                                                                                                1%
                                                                                                 -15%                                                       -1%                                                         0%

                                                                                                                                           0
                                                                            S

                                                                                                                                       S

                                                                                                                                                                                                   S
                                                                                                        10% 20% 30% 40% 50%                                       10% 20% 30% 40% 50%                                        10% 20% 30% 40% 50%

The first column corresponds to the first experiment,
which is to show that maximizing λ1 (ÃS ) leads to
higher infected ratios. The hyper-parameters are set to                Figure 5: Experiments showing the model’s effectiveness.
α1 = 1/3, α2 = 0, and α3 = 1/3 (the hyper-parameters                   Top: the email network; Middle: the airport network;
do not need to sum to one). The labels of the y-axis                   Bottom: the brain network. The three columns (from left
             S          S                                              to right) show the effectiveness of: 1) maximizing λ1 (ÃS ); 2)
become Imodified    − Ioriginal , which highlights that the
infected ratios are for the targeted subgraph GS (the                  Maximizing the eigenvector centrality of S; 3) maximizing
higher the better). Note that α2 is set to zero in order to            the normalized cut of S.
avoid the coupling between the eigenvector centrality of
S and λ1 (ÃS ). From the plot it is clear that maximizing
λ1 (ÃS ) is important to increase the infected ratios within          qualitatively resemble real networks [45]. BTER are
GS (the blue line). Note that solely maximizing the                    generative network models that can be calibrated to
normalized cut of S may backfire (the red line), as                    match real-world networks, in particular, to reproduce
a large portion of edges are deleted from GS when γ                    the community structures [33].
increases.
                                                                       The experimental setup is similar to the setup for the
The second column is to show the effectiveness of limiting
                                                                       email network, except for a few changes. First, the
the impact on GS 0 by maximizing the eigenvector
                                        S0          S0                 experimental results for each class of the synthetic
centrality of S. The y-axis represents Imodified − Ioriginal ,
                                                                       networks are averaged over 30 randomly generated
which highlights that the infected ratios are for the
                                                                       network topologies. Another difference lies in how the
non-targeted subgraph GS 0 (the lower the better). The
                                                                       targeted set S is selected. For each randomly generated
plot shows that the impact on GS 0 is well limited; the
                                                                       network, the targeted set S is selected as the node whose
effectiveness is most significant when γ > 40%.
                                                                       degree is the 90 percentile of the degree sequence, and
The last column is to show the effectiveness of maximiz-               its neighbors. Some statistics of the synthetic networks
ing the normalized cut of S. We set α2 = 0 to avoid                    are summarized in Table 2. Recall that δ = 0.24 and
the effect of maximizing the eigenvector centrality of                 β = 0.06. The experimental results are showed in
S. Observe that maximizing the normalized cut of S is                  Figure 6. The conclusion derived from Figure 6 is similar
effective in increasing the infected ratio within GS only              to that of the email network. It is worth pointing out that
for the email network. This suggests that on weighted                  maximizing the normalized cut of S is effective on BA
graphs normalized cut might not be a good heuristic to                 networks, while for other network it may backfire.
increase the centrality of S.

E   Additional          Results                on          Synthetic                                                                                                 BA                  Watts-Strogatz                                            BTER
    Networks                                                                                                        |S|                                              17.5                          12                                              20.03
In this section we show experimental results on synthetic           dmin            9.86         10                                                                                                                                                11.69
                                                                  density           0.02        0.03                                                                                                                                                0.03
unweighted graphs with 375 nodes. We focus on
                                                              average degree        9.87         10                                                                                                                                                 11.5
three classes of networks: Barabasi-Albert (BA), Watts-
                                                          average clustering coeff. 0.08        0.35                                                                                                                                                0.05
Strogatz, and BTER [33]. BA is characterized by
its power-law degree distribution [5]. Watts-Strogatz
                                                                 Table 2: Statistics of synthetic networks.
is well-known for its local clustering in a way as to

                                                                                                                                                  Copyright © 2021 by SIAM
                                                                                                                          Unauthorized reproduction of this article is prohibited

You can also read