Dynamic Information Flow Tracking for Detection of Advanced Persistent Threats: A Stochastic Game Approach

Page created by Julie Ortiz

IT & Technique

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Dynamic Information Flow Tracking for Detection of Advanced Persistent Threats: A Stochastic Game Approach

Dynamic Information Flow Tracking for Detection of Advanced
                                                   Persistent Threats: A Stochastic Game Approach
                                                         Shana Moothedath, Dinuka Sahabandu, Joey Allen, Andrew Clark, Member, IEEE,
                                                   Linda Bushnell, Fellow, IEEE, Wenke Lee, Fellow, IEEE and Radha Poovendran, Fellow, IEEE

                                            Abstract—Advanced Persistent Threats (APTs) are stealthy at-              known threats. However, APTs introduce information flows in
                                         tacks by intelligent adversaries. This paper studies the detection           the form of data-flow commands and control-flow commands
                                         of APTs that infiltrate cyber systems and compromise specifically            while interacting with the system and these are continuously
                                         targeted data and/or infrastructures. Dynamic information flow
                                                                                                                      recorded in the log file of the system.
arXiv:2006.12327v2 [cs.GT] 25 Jun 2021

                                         tracking is an information trace-based detection mechanism
                                         against APTs that tags suspicious information flows in the system               After an APT attack, when the consequences of the attack
                                         and performs security analysis for unauthorized use of tagged                has been identified, a forensic investigation is conducted using
                                         data. In this paper, we develop an analytical model for resource-            the data from the log files. The purpose of this postmortem
                                         efficient detection of APTs using an information flow tracking               investigation is to understand the incident’s root cause, and
                                         game. The game is a nonzero-sum, turn-based, stochastic game
                                         with asymmetric information as the defender cannot distinguish               construct appropriate defense strategies against such APTs and
                                         whether an incoming flow is malicious or benign. The payoff                  its variants [1]. During the forensic analysis, the system log
                                         functions of the game capture the cost for performing security               data is analyzed in an offline setting.
                                         analysis and the rewards and penalties received by the players.                 Dynamic Information Flow Tracking (DIFT) [2] is a widely
                                         We analyze equilibrium of the game and prove that a Nash                     used detection mechanism for offline analysis of APT. DIFT
                                         equilibrium is given by a solution to the minimum capacity
                                         cut set problem on a flow-network derived from the system.                   uses the information traces recorded in the system log for
                                         The edge capacities of the flow-network are obtained from                    performing the security analysis [1]. The key idea behind
                                         the cost of performing security analysis. Finally, we implement              DIFT is that it tags all suspicious input/data channels and
                                         our algorithm on a real-world dataset for a data exfiltration                tracks the propagation of the tagged information flows through
                                         attack augmented with false-negative and false-positive rates and            the system. DIFT generates security analysis using a pre-
                                         compute an optimal defender strategy.
                                                                                                                      specified set of heuristic security rules whenever it observes
                                            Index Terms—Advanced Persistent Threats (APTs), Informa-                  an unauthorized use of tagged data. These heuristic rules are
                                         tion flow tracking, Stochastic games, Minimum-cut problem                    typically defined by practicing experts based on the histori-
                                                                                                                      cal attack knowledge and knowledge about nominal system
                                                                  I. I NTRODUCTION                                    behavior [1].
                                                                                                                         Although, security rules incorporated in the DIFT mecha-
                                            An advanced persistent threat (APT) is a prolonged and
                                                                                                                      nism cover a wide range of attacks, these security rules may
                                         targeted cyber attack in which an intruder gains illicit access
                                                                                                                      not be capable of verifying the authenticity of information
                                         to a system and remains undetected for an extended period
                                                                                                                      flows against all possible attacks, resulting in the generation
                                         of time. The intention of an APT is to monitor system
                                                                                                                      of false-negatives and false-positives. Incorrectly identifying
                                         activity and continuously mine highly sensitive data rather
                                                                                                                      a benign (nominal user) flow as malicious is a false-positive
                                         than causing damage to the system or organization. APT
                                                                                                                      (FP) and failing to identify a malicious (adversarial) flow is a
                                         attacks consist of multiple stages that are initiated by an
                                                                                                                      false-negative (FN). For instance, while the security rules for
                                         initial compromise and reconnaissance stage to establish a
                                                                                                                      buffer overflow protection [3] can be verified accurately, the
                                         foothold in the system. Attackers then progress through the
                                                                                                                      security rules for web application vulnerabilities [4] cannot
                                         system, exploring and planning an attack strategy to obtain
                                                                                                                      be accurately verified. Consequently, attacks that exploit web
                                         the desired data. This is followed by exfiltration of sensitive
                                                                                                                      application may lead to incorrect conclusions by DIFT thereby
                                         data, which is continued over a long period of time. Defending
                                                                                                                      generating FPs and FNs.
                                         against APTs is a challenging task since APTs are specifically
                                                                                                                         Additionally, limited availability of resources for defense
                                         designed to evade conventional security mechanisms such as
                                                                                                                      along with the performance and memory overhead imposed
                                         firewalls, anti-virus software and intrusion-detection systems
                                                                                                                      by the defense mechanism on the system demands a resource-
                                         that rely on signatures and can, therefore, guard only against
                                                                                                                      efficient detection technique. An analytical model of DIFT
                                           S. Moothedath, D. Sahabandu, L. Bushnell, and R. Poovendran are            that captures the cost for performing security analysis and
                                         with the Department of Electrical and Computer Engineering, University       the effectiveness of flow-tagging mechanisms will facilitate
                                         of Washington, Seattle, WA 98195, USA. {sm15, sdinuka, lb2,                  the trade-off between the effectiveness of defense and the
                                         rp3}@uw.edu.
                                           J. Allen and W. Lee are with the College of Computing, Georgia Institute   resource efficiency. Further, such a model would enable the
                                         of Technology, Atlanta, GA 30332 USA. jallen309@gatech.edu,                  design of optimal security strategies while taking into account
                                         wenke@cc.gatech.edu.                                                         the generation of FP and FN.
                                           A. Clark is with the Department of Electrical and Computer En-
                                         gineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA.            In this paper, we provide an analytical model to enable
                                         aclark@wpi.edu.                                                              DIFT to optimally select locations in the system to perform

security analysis so as to maximize the probability of detection dynamical system [5]. Stochastic games have been widely
while minimizing the cost of detection. Our framework is used to model security games [6], economic games [7], and
based on the following insights. First, although both APT resilience of cyber-physical systems [8], where each player
and DIFT are unaware of the other player’s strategy, the tries to maximize its individual payoff. A brief overview of
effectiveness of the DIFT depends on the APT’s strategy and the existing results in stochastic games is given in [9].
the APT’s probability of evading detection depends on the Stochastic games have been used to address system security
DIFT’s strategy. This strategic interaction motivates a game- related problems. The interaction between malicious attackers
theoretic approach. Second, the efficiency of detection also and the intrusion detection system (IDS) is modeled using
depends on the effectiveness of performing security analysis a stochastic game in [10]. A nonzero-sum stochastic game
at different locations in the system, and hence is determined model is given in [11] to model network security configuration
by rates of FP and FN. Third, the game unfolds at multiple problem in distributed IDS. Then a value iteration based algo-
states between the entry points and the exit points of the rithm is proposed in [11] to find an ε-NE for an attacker model
attack where each state corresponds to the position of the where multiple adversaries simultaneously attack a network.
tagged flows in the system. At each state, the defender decides A zero-sum stochastic game is formulated in [12] for IDS in
whether to analyze a tagged flow and which one of the a communication or computer network with interdependent
tagged flows to analyze (to avoid generation of FP and FN) nodes and correlated security assets and vulnerabilities. Note
while the adversary decides which process to transition to, that, [13] considered a finite-horizon game and [10]-[12] dealt
at the cost of spending the defense resources. We formulate with zero-sum stochastic games for IDS.
a stochastic game model that is played on an information Game-theoretic models for resource-efficient detection of
flow graph (IFG). An IFG is a directed graph that expresses APTs are given in [14], [15], [16], and [17]. DIFT models
the history of a system’s execution in terms of the spatio- with fixed trapping nodes are introduced and analyzed in
temporal relationships between processes and subjects. The [14], [15]. Paper [16] extended the models in [14], [15] by
contributions of this paper are the following. considering a DIFT model which selects the trapping nodes
• We model the interaction of APT and DIFT as a nonzero- in a dynamic manner rather than being fixed and proposed a
sum, two-player, turn-based stochastic game (G) with min-cut based solution approach. Reference [17] considered
finite state and action spaces. In the APT vs. DIFT game, the detection of APTs when the attack consists of multiple
the state of the game is the nodes of the IFG at which the attackers possibly with different capabilities and analyzed the
tagged flows arrive. The location of the tagged flows is best responses of the players. Note that, the focus in [14], [15],
known to both players (APT and DIFT), and this makes and [16] is resource-efficient detection of APTs and they do
the state of the game observable. However, each player not consider the false-positives and false-negatives generated
is unaware of the other player’s strategy. Also, before in the system. In other words, the game models in [14]-[17]
performing security analysis, DIFT cannot distinguish assume that security analysis performed by DIFT can verify
malicious and benign flows. This results in an asymmetric the authenticity of tagged information flows accurately.
information structure. A stochastic game model for detecting APTs was recently
• We analyze Nash equilibrium (NE) strategy of the game introduced in the conference version of this work [18]. The
and show that an NE can be obtained from a solution to proposed method in [18] analyzed a discounted stochastic
the minimum capacity cut-set problem on a flow-network game and presented a value iteration based algorithm to obtain
constructed from the IFG of the system. an ε-NE of the discounted game. Due to the discounted nature
• We implement our algorithms and results on real-world of the payoff, the Nash equilibrium analysis of the game in
data obtained for a data exfiltration attack using the [18] reduces to a nonlinear program (NLP). Solving an NLP is
Refinable Attack INvestigation (RAIN) system [1] aug- computationally challenging and the convergence of the algo-
mented with FN and FP rates of the system. rithm in [18] is not guaranteed. This necessitates an alternate
The rest of the paper is organized as follows. Section II solution approach for solving the DIFT vs. APT game. In this
presents related work. Section III introduces preliminary con- paper, we solve an average reward (undiscounted) stochastic
cepts on information flow graph, APT attack, and a descrip- game and present a min-cut based approach with guaranteed
tion of DIFT-based defense mechanism. Section IV describes convergence to compute optimal defense strategies.
the formulation of the turn-based stochastic game model. In this paper we consider that the trapping nodes are gener-
Section V gives the solution concept of the game and the ated dynamically and we extend our previous results on min-
equilibrium analysis results for computing optimal strategies cut analysis in [16] in the following aspects. (i) Since there can
of the players. Section VI illustrates the numerical results be a wide range of possible APT attacks, we recognize that
using data exfiltration attack data set collected using RAIN the security rules of DIFT may not be capable of verifying
and augmented with FN and FP rates. Section VII concludes the authenticity of the information flows accurately thereby
the paper. resulting in the generation of FP and FN. Consequently,
the game model in this paper is stochastic unlike in [16].
(ii) We introduce a model with multiple information flows
II. R ELATED W ORK out of which one is malicious and the remaining are benign
Stochastic games model interactions of multiple agents unlike in [16] which dealt with one information flow. Note
that jointly control the evolution of states of a stochastic that, while DIFT knows there is exactly one malicious flow

in the system, it is unaware which flow is the malicious 4. Lateral Movement: The attacker will increase its control
flow before performing security analysis. This results in an of the system by moving laterally to new nodes in the
asymmetric information between players and hence the game system.
is imperfect information game. (iii) In the current game model, 5. Target Attainment: Next, the attacker will try and escalate
although an information flow is concluded as benign at an its privileges which may be necessary in order to access
initial inspection, it can be inspected again later during its sensitive information, such as a proprietary source code
propagation. The model in [16] inspects the flow exactly once or customer information.
as FN and FP rates are assumed to be zero. However, DIFT 6. Attack Completion: The final goal of the attacker is to
may analyze a flow multiple times and this will result in deconstruct the attack, hopefully in a way to minimize
added resource cost. The game model considered in this paper its footprint in order to evade detection. For example,
captures the added resource cost incurred from reevaluation of attackers may rely on deleting the system’s log.
the flows.

III. P RELIMINARIES
In this section, we first describe the graphical representation
of the system, denoted as the information flow graph, and then
present the models of the attacker and the flow tracking-based
defense.

A. Information Flow Graph (IFG)
Let G = (VG , EG ) represent the IFG of the system. VG =
{v1 , . . . , vN } consists of the processes (e.g., an instance of
a computer program), files, and objects in the system and Figure 1: Schematic diagram of an APT attack timeline.
EG ⊆ VG × VG represents the information flows (directed) in
the system from one node to the other. IFG-based auditing is Let the set of possible entry points of the APT be denoted
heavily desired by large enterprises and government agencies as λ ⊂ VG . During an attack, the goal of the APT is to capture
to detect APTs. We perform our game-theoretic analysis on a subset of nodes of the IFG referred to as destinations and
the IFG of the system and use DIFT as the defense mechanism denoted as D ⊂ VG . Here, λ ∩ D = 0. / Once a foothold is
to detect APTs. established in the system, APT tries to elevate the privileges
and proceeds to the destinations through more internal com-
promises and performs data exfiltration at an ultra-low-rate.
B. Attack Model: Advanced Persistent Threats (APTs) In order to achieve this the APT performs operations in the
APTs are intelligent attacks with specific targets. APTs are system to transition through the nodes in VG and arrive at
stealthy attacks that perform data exfiltration at an ultra-low- some node in D . In other words, the adversary (APT) selects
rate to avoid detection. Unlike classical malware, APT cam- paths in the IFG and performs transitions along the paths to
paigns tend to involve multiple hosts, multiple systems, and reach the set D from the set λ .
extend over a long period of time, up to several months [19]. The adversary in the DIFT vs. APT game has the following
APTs are characterized by their abilities to render existing properties. We consider an APT attack that generates a single
security mechanisms ineffective. APTs can evade security pro- malicious information flow. The malicious flow originates at
tection because existing mechanisms lack sufficient visibility an entry point of the attack and the objective of the flow is to
into user, program, and operating system activities to ascertain reach a destination. The state of the game is the set of nodes
the authenticity of an activity and the provenance of its data. of the information flow graph at which the tagged flows arrive.
The timeline and key stages of the APT lifecycle is described APT observes the tagged flows and hence the state of the game
below and presented in Figure 1. is observable to APT. However, APT is unaware of the actions
1. Reconnaissance: During the reconnaissance phase, the and the strategy of DIFT. Thus the information structure of
attacker will try to gain information about the system, APT is asymmetric with respect to that of the defender.
such as what nodes are accessible on the system and the
security defenses that are being used. C. Defender Model: Dynamic Information Flow Tracking
2. Initial Compromise: During the initial compromise stage, (DIFT)
the attacker’s goal is to gain access to an enterprise’s The objective of the defender is to prevent any adversarial
network. In most cases, the attacker achieves this by information flow traversing to the destination nodes. DIFT
exploiting a vulnerability or a social engineering trick, is a dynamic taint analysis based detection mechanism that
such as a phishing email. consists of three main components: (i) tag sources, (ii) tag
3. Foothold Establishment: Once the attacker has completed propagation rules, and (iii) trapping nodes. Trapping nodes
the initial compromise, it will establish a persistent are processes and objects (files and network endpoints) in the
presence by opening up a communication channel with system that are considered as suspicious sources of informa-
their Command & Control (C&C) server. tion by the system. All data originating from a tag source is

labeled or tagged and DIFT tracks the propagation of a tagged the adversarial player. We consider a nonzero-sum turn-based
data through the system. Propagation of a tagged information game. The information structure of the players is asymmetric
flow through the system results in tagging of more information as the defender player does not know if a tagged (suspicious)
flows based on the tag propagation rules specified by the DIFT flow is malicious or benign while the adversarial player knows
[20]. Tag propagation rules are defined based on two kinds of this information. The game G = {S, s0 , Σ , ∆ , P} unfolds on a
information flows in the system: explicit information flows finite state space, S, with initial state, s0 , finite action space of
and implicit information flows [21]. players, Σ , labeled transitions ∆ ⊆ S × Σ × S, and transition
During the execution of a program in the system, DIFT probability matrix P.
keeps track of the tagged information flows and generates
trapping nodes for any unauthorized use of tagged data that
indicate a possible attack. DIFT invokes security analysis A. State Space
at the trapping nodes using the specified security rules to
We consider a turn-based game with the state of the game
verify the authenticity of the tagged flow thereby concluding
at time t ∈ {0, 1, 2, . . .} denoted by the random variable st . Let
whether there is an attack or not. These security rules are pre-
T denote the time horizon of the game, i.e., the game ends
specified depending on the application running on the system
at time T and st 0 = sT , for all t 0 > T . The state space S is
[3]. Note that tagged (suspicious) flows consist of both benign
partitioned into two sets, SA and SD , such that SA ∩ SD = 0/ and
and malicious flows. As the specified security rules may not
SA ∪ SD = S. The states that belong to the set SA are referred
necessarily cover all possible attacks by APTs, DIFT generates
to as the adversary-controlled states and the states in SD are
false-positives and false-negatives. An optimal selection of
referred to as the defender-controlled states. Specifically, Sx
trapping nodes in the IFG is hence critical to detect a wide
is the subset of states at which player Px , where x ∈ {A, D},
range of attacks in the system with minimum false-positives
controls the transitions. The state of the game at time t is
and false-negatives. Note that while tag sources and tag
the state of the system which corresponds to the position of
propagation rules are known to the attacker, the trapping
the tagged (suspicious) flows at time t. Let W be the number
nodes are dynamically generated during the operation of the
of tagged flows that arrive into the system at time t = 1. In
system and hence is unknown to the attacker. Figure 2 shows
practice (from the analysis of log data), only a small fraction
a schematic diagram of the DIFT-based detection framework
of nodes in the system receive a system call at the same time.
considered in this paper.
Hence in our model we assume that W 1, the state
of the game is st = (x, vi1 , . . . , viW ), where x ∈ {A, D} and
{vi1 , . . . , viW } ∈ VG ∪ {φ , τ}. Note that, out of the W flows one
is a malicious flow and the remaining (W − 1) are benign
flows. For notational convenience we consider the first flow is
malicious (note that, defender does not have this information).
Figure 2: Schematic diagram of the DIFT detection framework. The
nodes in the figure denote system components such as processes, While DIFT observes all the W flows, it cannot distinguish
files, and subjects. malicious and benign flows. APT, on the other hand, knows
which flow is the malicious flow. Thus in the DIFT vs. APT
The defender in the DIFT vs. APT game has the following game, APT and DIFT have asymmetric information.
properties. The state of the game, which is the nodes of the Also, corresponding to a state st if vik = φ , for some k ∈
information flow graph at which the tagged flows arrive, is {1, . . . ,W }, then for all time t 0 > t the state corresponding to
known to DIFT. Thus DIFT observes the state of the game. the kth flow remains φ , i.e., if a flow is dropped at time t it
However, DIFT cannot distinguish a malicious and a benign remains dropped through out the rest of the game.
flow before performing security analysis. Also, while DIFT
knows that there is an APT in the system, DIFT is unaware
of the actions and the strategy of APT. B. Action Spaces
We first construct a directed flow-network F = (VF , EF )
IV. T URN -BASED APT VS . DIFT G AME G
from the IFG G by introducing a source node sF with an
In this section, we model the interaction of the APT with outgoing edge to all the entry points and a sink node tF with
the system during the different stages of the attack as a an incoming edge from all the destination nodes. Here VF =
dynamic stochastic game between the defender (DIFT) and VG ∪ {sF ,tF } and EF = EG ∪ {sF × λ } ∪ {D × tF }.
the adversary (APT). Let PD be the defender player and PA be

Definition IV.1. An attack path in the flow-network F is a                        security analysis on an information flow and also concluded it
simple directed path1 from sF to tF . The set of attack paths in                  as malicious, i.e., a flow is trapped, and case (iii) corresponds
F is denoted as Ω D .                                                             to all W flows dropping out.
   The objective of the APT attack is to capture a destination
node in set D . To achieve the same, APT chooses transitions                      C. State Transitions
along a path from the set λ to the set D which translates                         Definition IV.2. Time t is said to be the termination time
into an attack path in the flow-network F . Since APT is a                        of the game G if it satisfies one of the following condition:
stealthy attack, APT performs minimum amount of activities                        (a) t = T and (b) the state of the game st = (x, vi1 , . . . , viW ),
in the system in order to avoid detection. Thus, there are no                     where x ∈ {A, D}, is an absorbing state.
cycles in the transition path of the APT through the IFG. We
also note that, any IFG with set of cycles can be converted                          We use the notation W to denote the number of tagged
into an acyclic IFG without losing any causal relationships                       information flows that are recorded at the first instant of time,
between the components given in the original IFG. One such                        i.e., t = 1. The log recording system (RAIN) used in our
dependency preserving conversion is node versioning given in                      work uses the system call in [23] to create a timestamp for
[22]. Hence all ω ∈ Ω D are simple directed paths.                                each audit log. At every instant of the log recording, RAIN
   Using the construction of the flow-network, we denote                          records at most one system call for each system component
the initial state of the game at t = 0 is denoted as s0 =                         (processes, files). As a result, the W information flows arrive
(A, sF , . . . , sF ), since at t = 0 all flows originate at the source           at distinct nodes of the information flow graph as it is not
node sF . At a state st , either PD or PA chooses an action from                  possible for multiple flows to arrive at one node at the
their respective action sets denoted by AD and AA , depending                     same instant of time. Let the state of the game at t = 1
on whether st ∈ SD or st ∈ SA . If st ∈ SA , the adversary decides                be st = (D, vi1 , . . . , viW ). Here {vi1 , . . . , viW } are the nodes of
whether to quit the attack by dropping the information flow                       the IFG at which the flows arrive. The DIFT-based defense
or to continue the attack. If the adversary decides not to                        mechanism observes the flows. The player PD now chooses
quit the attack, then it selects which neighboring node of the                    an action from the set AD (st ) ∈ {0, vi1 , . . . , viW }. At state st ,
IFG to transition from the current node so as to reach set                        there are three possibilities: (1) PD does not analyze any flow,
D . In other words, the adversary either drops the malicious                      (2) PD analyze the malicious flow, and (3) PD analyze a
flow or performs a transition along a path in Ω D and hence                       benign flow. Based on the action chosen by PD , i.e., AD (st ),
AA := {φ } ∪ VG . Specifically, at state st = (A, vi1 , . . . , viW ),            the next possible state of the game st+1 is defined. At st+1 PA
where vi1 ∈ VG and vi1 ∈         / D , AA (st ) ∈ {φ } ∪ N (vi1 ), where          has two possibilities to choose from: (a) to quit the flow, φ
N (vi1 ) = {v j ∈ VG : (vi1 , v j ) ∈ EG }. On the other hand, if                 and (b) to transition to an out-neighbor of node vi1 , N (vi1 ),
st ∈ SD , the action of the defender is to decide whether to                      i.e., AA (st+1 ) ∈ {φ } ∪ N (vi1 ). Based on the action of PA and
perform security analysis on an information flow or not. If                       the distribution of the benign flows in the system, the next
the defender chooses to perform security analysis, it also                        state st+2 is arrived. At t + 2, PD again chooses its action
selects a flow to analyze. A node at which security analysis                      and the game continues in a turn-based fashion. Note that the
is performed is referred to as a trapping node. Consider a                        DIFT will analyze only one information flow at a time t as
state st = (D, vi1 , . . . , viW ) and let defender decides to perform            we consider a single malicious flow in the system.
security analysis on the flow at vi1 . Then vi1 is chosen as a                       For case (1) the action of PD is AD (st ) = 0, i.e., DIFT does
trapping node. Thus, AD := {0} ∪VG , where 0 represents not                       not perform security analysis on any of the W information
analyzing any flow. The defender performs security analysis                       flows. Then the next state of the game st+1 = (A, vi1 , . . . , viW ).
on at most one flow in a state as there is only malicious                         For case (2), PD chooses action AD (st ) = vi1 and the next
flow and also it is not possible to perform analysis on the                       state of the game is st+1 = (A, vi1 , . . . , viW ) with probability
information flows at the entry points of the attack, λ . Thus                     FN and st+1 = (A, τ, vi2 , . . . , viW ) with probability 1 − FN.
at state st = (D, vi1 , . . . , viW ), AD (st ) ∈ {0} ∪ {vik : vik ∈
                                                                   / λ,k ∈        For case (3), the player PD chooses action AD (st ) = vik ,
{1, . . . ,W }}. We also note that the information flow chosen                    k 6= 1, i.e., k ∈ {2, . . . ,W }. Then the next state of the game
by the DIFT to perform security analysis can be either the                        is st+1 = (A, vi1 , . . . , vik−1 , τ, vik+1 , . . . , viW ) with probability FP
malicious flow or a benign flow.                                                  and st+1 = (A, vi1 , . . . , viW ) with probability 1 − FP. Here, FN
   A state st is said to be an absorbing state if the game                        and FP are the rate of false-negatives and false-positives in
terminates at st and the action sets of the players are empty,                    the DIFT-architecture which are empirically computed. The
i.e., AA (st ) = AD (st ) = 0.    / In G a state st = (x, vi1 , . . . , viW ),    values of FP and FN depend on the security rules and vary
where x ∈ {A, D}, is absorbing if one of the three cases hold:                    across the different DIFT-architectures and they determine the
                                                                                  transition probabilities P of the stochastic game. In order to
  (i) vi1 ∈ D ⊂ VG ,
                                                                                  reduce false-positives and false-negatives in the system, the
 (ii) vik = τ for some k ∈ {1, . . . ,W },
                                                                                  security rules must be capable of identifying the behavior
 (iii vik = φ for all k ∈ {1, . . . ,W }.
                                                                                  and predicting the intend of information flows, i.e., determine
   Case (i) corresponds to adversary reaching a destination,
                                                                                  whether an unknown flow is indeed malicious, or whether it
case (ii) corresponds to the case where DIFT performed
                                                                                  is benign flow that is exhibiting malware-like behavior [24].
  1 A directed path is said to be a simple directed path if there are no cycles      If t +1 is not a terminating time instant, then the adversarial
or loops in the path.                                                             player PA chooses action AA (st+1 ) ∈ {φ } ∪ N (vi1 ). The

remaining W − 1 flows follow the benign flow distribution in                          the adversary. Similarly, UA consists of two components: (i) a
the system, which is computed empirically from the nominal                            reward βA > 0 for reaching a destination node in the set D , and
system operation and known. Let πB : vi ∈ VG → [0, 1]|N (vi )|+1                      (ii) penalty αA < 0 for getting detected by the defender. The
denote the benign flow distribution. If the action of PA is φ ,                       reward and penalty parameters and the resource cost of DIFT
then st+2 = (D, φ , v j2 , . . . , v jW ), where v j2 , . . . , v jW depend on        and APT do not necessarily induce a zero-sum scenario where
the distribution πB . Here {v j2 , . . . , v jW } ∈ {φ } ∪VG . Note that a            the payoff that is gained by one player is lost by the other.
benign flow can also drop out. Now PD chooses its action and                          Therefore, we considered a non-zero sum payoff structure for
the game continues. To summarize, given st = (D, vi1 , . . . , viW )                  the DIFT vs. APT game.
(w.p stands for with probability), state transitions and the
                                                                                         Let the payoff of player Px at an absorbing state st be
corresponding transition probabilities, given by P, are
                                                                                     denoted as cx (st ), where x ∈ {A, D}. Also, let the payoff of PD
       
        (A, vi1 , . . . , viW ),        w.p 1,       if AD (st ) = 0,                at a non-absorbing defense-controlled state st with action dt be
                                                                                      rD (st , dt ), and the payoff of PA at a non-absorbing adversary-
       
         (A,     , . . . ,     ),                        AD (st ) = vi1 ,
       
       
       
            vi1           viW           w.p FN,      if
st+1 = (A, τ, . . . , viW ),             w.p 1 − FN, if AD (st ) = vi1 ,              controlled state st with action at be rA (st , at ). At each state in
                                                                                     the game, st at time t, where t < T and st is a non-absorbing
         (A, v   , . . . , v   ),        w.p 1 − FP, if AD (st ) = vik , k 6= 1
       
              i             i
       
               1             W
                                                                                      state, player chooses its action (dt for st ∈ SD and at for st ∈ SA )
       
       
       
       (A, v , . . . , τ, . . . , v ),w.p FP,
              i1                    iW                 if AD (st ) = vik , k 6= 1.    and receives payoff rD (st , dt ) and rA (st , at ), respectively, and
                                                                                (1)
                                                                                      the game transitions to a next state st+1 . This is continued
 and
         (                                                                            until they reach an absorbing state and incur cA (st ) and cD (st ),
             (D, v j1 , . . . , v jW ),      if AA (st+1 ) ∈ N (vi1 ),                respectively, or the game arrives at the horizon, i.e., t = T .
  st+2 =                                                                        (2)
             (D, φ , v j2 , . . . , v jW ), if AA (st+1 ) = φ .                       Then,

                                                                                                          rA (st , at ) = 0 for all st , at ,                 (4)
D. Strategies of the Players
   A strategy is a rule that each player uses to select actions
                                                                                                   
                                                                                                    αA , st ∈ SA , st = (A, τ, . . . , viW )
                                                                                                   
at every step of the game. We consider mixed (stochastic) and
                                                                                        cA (st ) =   βA , st ∈ SA , st = (A, vi1 , . . . , τ, . . . , viW )   (5)
behavioral player strategies. Since the strategy is mixed, at                                      
                                                                                                     0,   otherwise
                                                                                                   
a state, PD and PA select an action from the action set AD
and AA , respectively, based on some probability distribution.                                   
Further, since the strategy is behavioral, the probability dis-                                   βD , st ∈ SD , st = (D, vi1 , . . . , viW ), vi1 ∈ D
                                                                                                 
tribution on the action set at a time instant t depends on all                        cD (st ) =   αD , st ∈ SD , st = (A, τ, . . . , viW )             (6)
the states traversed by the game and the actions taken by the                                    
                                                                                                 
                                                                                                   0,   otherwise
players until time t.
   Let dt , at be the action of the defender and the attacker,                                        (
respectively, at time t. We denote the information available to                                        0,         st ∈ SD , dt = 0
                                                                                      rD (st , dt ) =                                                      (7)
the players PD and PA at time t 6 T by Yt and Zt , respectively.                                       CD (vik ), st ∈ SD , dt = vik , k ∈ {1, . . . ,W }.
Then
                                                                                        As the initial state of the game is s0 , for a strategy pair
             Yt     := {s0 , d0 , s1 , d1 , . . . , st−1 , dt−1 , st },               (pD , pA ) the expected payoffs of the players are
             Zt     := {s0 , a0 , s1 , a1 , . . . , st−1 , at−1 , st }.        (3)
                                                                                                                             "        #
                                                                                                                                       T

We denote the set of all possible outcomes for Yt and Zt at
                                                                                                  UA (pD , pA ) = Es0 ,pA ,pD          ∑ (RtA )       and     (8)
                                                                                                                                     t=0
time t using Y ? and Z ? , respectively. Let the set of all pure                                                                   "
                                                                                                                                       T
                                                                                                                                                  #
strategies of PD and PA for game G be P̄D and P̄A , respectively.                                 UD (pD , pA ) = Es0 ,pA ,pD               D
                                                                                                                                       ∑ (Rt )        ,       (9)
Define ∆ P̄D (∆ P̄A ) as the simplex of P̄D (P̄A ) or, the set of                                                                    t=0
probability distributions over P̄D (P̄A ). A mixed strategy for PD                    where Es0 ,pA ,pD denotes the expectation with respect to s0 , pA ,
is an element pD ∈ ∆ P̄D , so that pD is a probability distribution                   and pD and from Eqs. (4)-(7)
over P̄D . Similarly, a mixed strategy for PA is an element                                       D
pA ∈ ∆ P̄A , so that pA is a probability distribution over P̄A . A                               
                                                                                                     r (st , dt ), x = D, st ∈ SD , t < T
behavioral mixed strategy of player PD is given by pD : Y ? →
                                                                                                 
                                                                                                  rA (s , a ), x = A, s ∈ S , t < T
                                                                                                          t t                 t    A
∆ P̄D and of player PA is given by pA : Z ? → ∆ P̄A .                                      Rtx =       D
                                                                                                                                                  (10)
                                                                                                 
                                                                                                 
                                                                                                     c (st ),      st ∈ SD , t = T,
                                                                                                  A
                                                                                                      c (st ),      st ∈ SA , t = T.
E. Payoffs to the Players
                                                                                         We incorporate the idea of maximizing the probability of
  The payoff functions of PD and PA are denoted by UD and                             detection and minimizing the probability of adversary evading
UA , respectively. UD consists of three components: (i) resource                      detection using the terms pT and pR , respectively. Further, we
cost CD (vi ) < 0 for performing security analysis of a flow at                       capture the false-positives generated in the system using the
a node vi ∈ VG , (ii) penalty βD < 0 for adversary reaching a                         term pFP . Note that, generation of false-positive implies that
destination node in D , and (iii) reward αD > 0 for detecting                         the defender failed to capture the adversary and concluded a

benign flow is malicious. This means that the adversary will            Lemma V.3. Consider the APT vs. DIFT game G =
attain the target as the malicious flow evaded detection. Given         {S, s0 , Σ , ∆ , P}. Let N be the number of nodes in the IFG
a set of strategies (pD , pA ), where pD ∈ PD and pA ∈ PA , and         and W be the number of flows that arrive into the system at
benign distribution πB , the payoff functions of the players, i.e.,     time t = 1. Then |S| = O(NW ).
Eqs.(8), (9), can be rewritten using Eqs. (4)-(7), (10) as
                                                                      Proof. We prove the result by showing that |SA | = O(NW )
   UD (pD , pA ) = pT αD + (pR + pFP ) βD+∑ ∑ pD (vi )CD (vi ) (, 11)   and |SD | = O(NW ). Consider an arbitrary state s =
                                        s∈SD vi ∈s                      (x, vi1 , . . . , viW ) ∈ S \ s0 , where x ∈ {A, D}. Here, vik ∈ VG ∪
                                        
   UA (pD , pA ) = pT αA + (pR + pFP ) βA .                     (12)    {φ , τ} for all k ∈ {1, . . . ,W }. All the W tagged flows arrive at
                                                                        distinct nodes in the set VG of the IFG as a process or file in
  Here pT is the cumulative probability that the adversarial            the system receive exactly one information flow at a particular
flow is detected by the defender. The term pT αD captures               time. Therefore, for a state s = (x, vi1 , . . . , viW ), where vik ∈ VG
the reward received by DIFT for detecting the attack and the            for all k ∈ {1, . . . ,W }, ia 6= ib for a, b ∈ {1, . . . ,W }. Also,
term pT αA captures the penalty incurred by APT for getting             the defender performs security analysis on one tagged flow
detected. Also, pR denotes the cumulative probability that the          at a time and hence there is no state s = (x, vi1 , . . . , viW )
adversarial flow reaches a destination and pFP denotes the              with via = vib = τ for a 6= b. Repetition of vik ’s is possible
cumulative probability that a benign flow is concluded as               only for states with φ . Hence s belongs to one of the two
malicious (i.e., trapped) by the defender, i.e., false-positive.        types: (a) {s = (x, vi1 , . . . , viW ) : vik ∈ VG ∪ {τ}} and (b) {s =
The term (pR + pFP ) βD captures the penalty incurred by                (x, vi1 , . . . , viW ) : vik ∈ VG ∪ {φ } such that vik = φ for at least
DIFT for not detecting the attack and the term (pR + pFP ) βA           some k ∈ {1, . . . ,W }}.
captures the reward received by the APT for not getting                    Case (a) corresponds to selecting W items from a set of
detected. Recall that pD (vi ) denotes the probability with which       N + 1 items without any repetitions. The cardinality of this
DIFT selects node vi as a security check point (trap). The                                            (N+1)!
                                                                        set is N+1   W        = (W )!(N+1−W            W
                                                                                                               )! = O(N ). Case (b) corresponds
term ∑s∈SD ∑vi ∈s (pD (vi )CD (vi )) captures the total resource cost   to states with one or more φ entries. Note that, here repetitions
associated with DIFT for performing security analysis. Note                                                                                N+1 
                                                                        are allowed only for φ . The cardinality of this set is W            −1 +
that pT , pR are functions of pD and pA , and pFP is a function           N+1  
                                                                                  +    . . . +  N+1  
                                                                                                       +  N+1   
                                                                                                                  = O(N W −2 ). Additionally, there
                                                                          W −2                    2          1
of pD and πB . Our focus is to compute a limiting average               is an initial state s0 . From (a) and (b), the number of possible
equilibrium of the nonzero-sum stochastic game.                         cases for s is O(NW ). Since s = (x, vi1 , . . . , viW ), where x ∈
                                                                        {A, D}, we get |SA | = O(NW ) and |SD | = O(NW ). As S =
        V. C OMPUTATION OF O PTIMAL S TRATEGY
                                                                        SA ∪ SD , we get |S| = O(NW ).
A. Solution Concept
                                                                           It is shown that there exists an NE for nonzero-sum dis-
  In this subsection, we present the solution concept of the
                                                                        counted stochastic games [25]. However, the existence of NE
game.
                                                                        for nonzero-sum undiscounted stochastic games is open when
Definition V.1. Let pA : Z ? → [0, 1]|AA | denote a strategy of         the time horizon is infinite, i.e., T = ∞. In the result below
the adversary. Also let pD : Y ? → [0, 1]|AD | denote a strategy        we prove the existence of NE for the game G. The existence
of the defender, i.e., the probability of performing security           is shown by proving that time horizon is indeed finite for
analysis at a state. Then the best response of the defender is          G and then invoking the existence of NE for finite horizon
given by                                                                undiscounted games.
              BR(pA ) = arg max {U D(pD , pA )}.
                               pD ∈PD                                   Proposition V.4. Let T denote the termination time of the
Similarly, the best responses of the adversary are given by             APT vs. DIFT game. The APT vs. DIFT game, G, terminates
                                                                        in at most is 2N steps, i.e., T 6 2N.
               BR(pD ) = arg max {U A(pD , pA )}.
                               pA ∈PA                                   Proof. The action set AA of the adversary player (APT) is to
   The best response of the defender is the set of defense              choose a transition along an attack path of the flow-network F
strategies that maximize the payoff of the defender for a               (Definition IV.1) or to drop out at some point. Let Ω D denote
given adversarial strategy and known benign flow distribution.          the set of attack paths in F . Consider an arbitrary attack path
The best response of the adversary is a set of transition               ω̂ ∈ Ω D . Recall that ω̂ is a simple path as an APT due to
strategies, for a given defender strategy and known benign              its stealthy behavior will not traverse in cycles through the
flow distribution, that maximizes the probability of adversary          system. Hence the length of ω̂ is at most N, as ω̂ is a simple
reaching a destination node without getting detected.                   directed path. Thus in any run of the game, the adversary can
                                                                        take at most N transitions. As G is a turn-based game, the
Definition V.2. A pair of strategies (pD , pA ) is said to be a         game terminates in at most 2N steps.
Nash equilibrium (NE) if
                                                                        Proposition V.5. There exists a Nash equilibrium for the APT
                 pD ∈ BR(pA ) and pA ∈ BR(pD ).                         vs. DIFT game G.
   An NE is a pair of strategies such that no player can benefit        Proof. It is shown in [13] that there exists a Nash equilibrium
through unilateral deviation, i.e., by changing the strategy            for a nonzero-sum stochastic game with asymmetric informa-
while the other player keeps the strategy unchanged.                    tion structure, under stochastic behavioral strategies, when the

time horizon is finite. Proposition V.4 prove that the APT vs.                        before reaching node tF . In order to ensure security, the
DIFT game will terminate in 2N number of steps. Further,                              defender must select at least one node in all possible paths
the behavioral strategy space is a subset of the strategy space                       from sF to tF as a trapping node. In the equilibrium analysis
PA × PD . Hence by the result in [13], the proof follows.                             of the game we prove that an optimal strategy of the defender
                                                                                      is indeed to select the min-cut nodes of the flow-network as
B. Solution Approach                                                                  trapping nodes. The objective of the adversary is to optimally
   In this section, we compute an NE of the DIFT vs. APT                              choose the transitions in such a way that the probability
game G. Our approach is based on a minimum capacity                                   of reaching tF is maximum. The adversary hence plans its
cut-set formulation on a flow-network constructed from the                            transitions to select an attack path with least probability of
information flow graph of the system. Consider the flow-                              detection.
network F = (VF , EF ), where VF = VG ∪ {sF ,tF } and EF =                               The result below proves that an NE of game G is repre-
EG ∪ {sF × λ } ∪ {D × tF }. Then, a cut of F is defined below.                        sented by the nodes corresponding to a minimum capacity cut
                                                                                      in the flow-network F . Note that, the solution to the min-cut
Definition V.6. In a flow-network F with vertex set VF and                            problem may not be unique. Consequently, there may exist
directed edge set EF , the cut induced by Ŝ ⊂ VF is a subset of                      multiple NE for the game G.
edges κ(Ŝ ) ⊆ EF such that for every (u, v) ∈ κ(Ŝ ), |{u, v} ∩
Ŝ | = 1. Further, given edge capacity vector cF : EF → R+ , the                      Theorem V.7. Every NE (pA , pD ) of the APT vs. DIFT game
capacity of a cut κ(Ŝ ), is defined as the sum of the capacities                     G satisfies the following properties:
of the edges in the cut, i.e., cF (κ(Ŝ )) = ∑e∈κ(Ŝ ) cF (e).                         1) The defender’s strategy pD selects all the nodes in Ŝ ? as
                                                                                          trapping nodes, where Ŝ ? is a set of min-cut nodes of
   The (source-sink)-min-cut problem aims to find a cut κ(Ŝ ? )                          the flow network F .
of Ŝ ? ⊂ VF such that cF (κ(Ŝ ? )) 6 cF (κ(Ŝ )) for any cut κ(Ŝ )                  2) The adversary’s strategy pA chooses transitions such that
of Ŝ ⊂ VF satisfying sF ∈ Ŝ and tF ∈              / Ŝ . The (source-sink)-             each attack path passes through exactly one node in the
min-cut problem is well studied and there exist algorithms                                set Ŝ ? .
to find a min-cut in time polynomial in |VF | and |EF | [26].
                                                                                         We prove Theorem V.7 by invoking the property that at NE
In our approach to compute NE, which is detailed later in
                                                                                      every player plays a best response against the other players
the section, we find a min-cut through nodes of the flow-
                                                                                      simultaneously. We first present prove the best response re-
network instead of the edges. Hence we transform the node
                                                                                      sults, Lemma V.8, Lemma V.10, and Lemma V.11, and then
version of the min-cut problem to an equivalent edge version.
                                                                                      present the proof of Theorem V.7.
For this, we introduce an edge corresponding to each node in
                                                                                         Lemma V.8 gives the best response of the adversary for a
F except the source and sink nodes. The transformed flow-
                                                                                      given defender strategy that selects the min-cut nodes of F
network is denoted as F = (V F , E F ), where V F = VF ∪ VG0
                                                                                      as the trapping nodes for analyzing the flows.
and E F = EF0 ∪ EG0 ∪ Eλ ∪ ED with VG0 = {v01 , . . . , v0N }, E 0 F =
{(v0i , v j ) : (vi , v j ) ∈ EG }, Eλ = {(sF , vi ) : vi ∈ λ }, ED = {(v0i ,tF ) :   Lemma V.8. Let Ω D be the set of all attack paths in F .
vi ∈ D }, and EG0 = {(vi , v0i ) : i = 1, . . . , N}. The capacity vector             Consider a defender strategy pD in which only the min-cut
associated with the edges in F is given by                                            nodes Ŝ ? of the flow-network F are chosen as trapping
                                  (                                                   nodes with nonzero probability. Then, the best response of
                                     CD (vi ), e ∈ EG0                                the adversary, BR(pD ), is to choose the transitions in such a
                      cF (e) :=                                              (13)
                                     ∞,         otherwise                             way that all attack paths which has nonzero probability under
                                                                                      BR(pD ) pass through exactly one node in Ŝ ? .
   Note that EG0 is a cut of F and ∑e∈E 0 cF (e) < ∞. Hence any
                                         G
                                                                                      Proof. Consider the payoff function of the adversary,
minimum capacity cut in F corresponds to edges from the set
                                                                                      UA (pD , pA ) = (pT αA + (pR + pFP ) βA ). Here, pT and pR are
EG0 as the capacity of the remaining edges is ∞ as shown in
                                                                                      functions of pD and pA . However, pFP depends only on pD
Eq. (13). Further, the capacity of an edge in set (vi , v0i ) ∈ EG0
                                                                                      and πB . Thus for a given pD , πB , the probabilities pT and pR
corresponds to the cost of conducting the security analysis at
                                                                                      vary depending on pA , however, pFP is a constant. We prove
node vi . Thus a minimum capacity cut in F corresponds to a
                                                                                      the result using a contradiction argument. Suppose the best
cut node set of the IFG with minimum total cost of performing
                                                                                      response of PA is a strategy p0A such that there exists a path
security analysis.
                                                                                      ω̂ ∈ Ω D that passes through two nodes, say vi , v j ∈ Ŝ ? ⊂ VG ,
   Let κ(Ŝ ? ) denotes an optimal solution to the (source-sink)-
                                                                                      and πA (ω̂) 6= 0, where πA (ω̂) is the probability of attack path
min-cut problem on F . Then κ(Ŝ ? ) ⊆ EG0 and cF (κ(Ŝ ? )) < ∞.
                                                                                      ω̂ under p0A . Note that Ŝ ? is a min-cut and vi , v j ∈ Ŝ ? . Further,
The nodes corresponding to the min-cut is
                                                                                      CD (vi ) 6= 0 and CD (v j ) 6= 0. Thus there exists at least one
                                                                                      directed path, say ω̂ 0 , from vi to tF that does not pass through
                      Ŝ ? := {vi : (vi , v0i ) ∈ κ(Ŝ ? )}.                 (14)
                                                                                      any node in Ŝ ? \ {vi }. Similarly, there exists at least one
   In the APT vs. DIFT game, the aim of the defender is                               directed path, say ω̂ 00 , from v j to tF that does not pass through
to optimally select trapping nodes in the IFG, i.e., nodes of                         any node in Ŝ ? \ {v j }. As per the given defender strategy
IFG to perform security analysis, such that no adversarial flow                       pD the only trapping node in ω̂ 0 , ω̂ 00 is vi , v j , respectively.
reaches some node in D . In other words, defender ensures that                        Thus there exists a strategy pA such that all paths in Ω D with
all adversarial flows that originate in node sF gets detected                         nonzero probability under pA pass through exactly one node

in Ŝ ? and UA (pD , pA ) > UA (pD , p0A ) (since pFP remains same    Given p(ω)’s and f (ω)’s are equal for all ω ∈ Ω . Thus
and pT is higher and pR is lower under p0A when compared to                          h                                i       
the values under strategy pA ). This contradicts the assumption       UD (pD , pA ) = p(ω) (αD − βD ) + (1 + f (ω))βD    ∑ π(ω)
that p0A is a best response and completes the proof.                                                                                         ω∈Ω

                                                                                        +   ∑         ∑       pD (vi )CD (vi )
   Recall that Ω D is the set of all source-to-sink paths in the                            ω∈Ω       vi ∈ω
flow-network F (Definition IV.1). Note that, any path in F             h                           i                                                         
that does not belong to Ω D is not a valid attack, as the attacker    = p(ω) αD +(1− p(ω)+ f (ω))βD +                       ∑         ∑       pD (vi )CD (vi )
cannot reach a destination node. Every attack path in Ω D ,                                                                ω∈Ω        vi ∈ω
                                                                                                                                            (15)
depending on the defender strategy and the transition structure
                                                                      Eq. (15) holds as ∑ω∈Ω π(ω) = 1. Consider a defender
of the game, induces a set of paths in the state space graph
                                                                      strategy pD in which exactly one node in every ω̂ ∈ Ω D is
of the APT vs. DIFT game. Let Ω be the set of all paths
                                                                      chosen as the trapping node. Assume that the defender strategy
induced by Ω D . In other words, Ω is the set of all possible
                                                                      is modified to p0D such that more than one node in some
state transition paths in the state space graph corresponding
                                                                      path are chosen as trapping nodes. This variation updates the
to all attack paths in F .
                                                                      probabilities of nodes in a set of paths in Ω . Note that, due to
Definition V.9. Let the set of paths induced in the state             the constraints on p(ω) and f (ω) (conditions (a) and (b)), the
space S by the attack paths Ω D in F be denoted as Ω .                defender’s probabilities (strategy) at two nodes in a path are
Then the probability of selecting a path ω ∈ Ω , denoted by           dependent. Hence for all paths ω ∈ Ω whose probabilities are
π(ω), is π(ω) = πA (ω)πB (ω), where πA (ω) is the probability         modified in p0D , ∑vi ∈ω p0D (vi )CD (vi ) < ∑vi ∈ω pD (vi )CD (vi ). This
with which an adversary chooses ω (product of the adversary           holds as the probability in the single node case is less than
transition probabilities along path ω) and πB (ω) is the              the sum of the probabilities of more than one node case as the
probability of benign flows in ω under distribution πB .              events are dependent and the sum of the values of CD (·) < 0
   The following result proves that, under certain conditions,        are also minimum (since min-cut). This implies
                                                                                                                                   
for a given adversary strategy the best response of the defender
is to select one node in every attack path as a trapping node.
                                                                         ∑ ∑ p0D (vi )CD (vi ) < ∑ ∑ pD (vi )CD (vi ) . (16)
                                                                        ω∈Ω    vi ∈VG                           ω∈Ω     vi ∈VG

Lemma V.10. Let Ω D be the set of attack paths in F ,                 From Eq. (15) and by conditions (a) and (b), we get
Ω be the set of paths in S induced by Ω D , and pA be a               UD (p0D , pA ) < UD (pD , pA ). Therefore, p0D is not a best response
given strategy of PA . For ω ∈ Ω , let ω(A) denote the set            for the defender and no best response of the defender has
of nodes in ω corresponding to the adversarial (malicious)            more than one node chosen as trapping node, if p(ω)’s and
flow and ω(B) denote the set of nodes in ω corresponding              f (ω)’s are equal for all ω ∈ Ω .
to the benign flows. Also, let p(ω) denote the probabil-
ity
h of detecting the adversary        along path ω, i.e., p(ω) =
                              i                                          The result below proves that if Lemma V.10 holds, then for
  1 − ∏vi ∈ω(A) (1 − pD (vi )) (1 − FN). Also, let f (ω) denote       a given adversary strategy the best response of the defender
the probability
              h of trapping a benigni flow (false-positive),          is to select the min-cut nodes of F as trapping nodes.
i.e., f (ω) = 1 − ∏vi ∈ω(B) (1 − pD (vi )) FP. If the defender’s
strategy satisfies the following two conditions:                      Lemma V.11. Let Ω D be the set of attack paths in F . Assume
 (a) p(ω) = p(ω 0 ), for all ω, ω 0 ∈ Ω and                           that the defender’s strategy satisfies conditions (a) and (b) in
 (b) f (ω) = f (ω 0 ), for all ω, ω 0 ∈ Ω ,                           Lemma V.10 for a given adversary strategy pA . Then the best
                                                                      response of the defender, BR(pA ), is to select with nonzero
then the best response of the defender, BR(pA ), is to select
                                                                      probability a min-cut node set of the flow-network F as the
with nonzero probability exactly one node in every ω̂ ∈ Ω D
                                                                      trapping nodes.
as a trapping node.
Proof. Let π(ω) denote the probability of a path ω ∈ Ω under          Proof. Under conditions (a) and (b) in Lemma V.10, the best
the given strategy pA and benign distribution πB . For a path ω       response of the defender is to select one node in every attack
in the state space S, ω(A) denotes the set of nodes in ω corre-       path as trapping node. Note that all attack paths in F pass
sponding to the adversarial (malicious) flow and ω(B) denotes         through some node in the min-cut node set Ŝ ? . By selecting
the set of nodes in ω corresponding to the benign flows. Let          the nodes in Ŝ ? as trapping nodes, all possible attack paths
p(ω) denote the probability   of detecting the iadversary along       have some nonzero probability of getting detected. We prove
                                                                      the result using a contradiction argument. Suppose that the
                      h
path ω, i.e., p(ω) = 1− ∏vi ∈ω(A) (1− pD (vi )) (1−FN). Also,
                                                                      best response of the defender is not to select the nodes in Ŝ ?
let f (ω) denote the probabilityh of trapping a benign flow,i i.e.,   as trapping nodes. Then, there exists a subset of nodes Ŝ ⊂ VG
false-positive. Then f (ω) = 1 − ∏vi ∈ω(B) (1 − pD (vi )) FP.         such that ∑vi ∈Ŝ CD (vi ) < ∑v j ∈Ŝ ? CD (v j ). Further, all possible
The defender’s payoff                                                 attack paths in F pass through some node in Ŝ . Then, Ŝ is a
                                                                      (source-sink)-cut-set and let κ(Ŝ ) := {(vi , v0i ) : vi ∈ Ŝ }. Then,
                            h                                    i
UD (pD , pA ) = ∑ π(ω) p(ω) αD + (1 − p(ω) + f (ω))βD
                   ω∈Ω                                                κ(Ŝ ) is a cut set and cF (κ(Ŝ )) < cF (κ(Ŝ ? )). This contradicts
                                                                      the fact that κ(Ŝ ? ) is an optimal solution to the (source-sink)-
                                                    
              +     ∑        ∑       pD (vi )CD (vi )
                   ω∈Ω       vi ∈ω                                    min-cut problem. Hence under Lemma V.10 the best response

of the defender is to select the nodes in Ŝ ? as trapping nodes.           flow and ω(B) denote the set of nodes in ω corresponding to
                                                                            the benign flows. Rewriting(p(ω1 ) − p(ω2 )) > 0, we get
                                                                            h                            i h                                     i
                                                                              1−      ∏      (1− pD (vi )) − 1−           ∏         (1− pD (v j )) (1−FN) > 0
    Using Lemma V.8, Lemma V.10, and Lemma V.11, we                                 vi ∈ω1 (A)                         v j ∈ω2 (A)
present below the proof of Theorem V.7.                                                                                                                     (20)
Proof of Theorem V.7: Consider a defender’s strategy that                   Similarly rewriting( f (ω1 ) − f (ω2 )) > 0, we get
selects the min-cut nodes as the trapping nodes. Then, by                     h                                i h                                    i
                                                                                1−        ∏       (1 − pD (vi )) − 1 −            ∏     (1 − pD (v j )) FP > 0
Lemma V.8 the best response of the adversary is to select                              vi ∈ω1 (B)                           v j ∈ω2 (B)
transitions in such a way that all attack paths with nonzero                                                                                                (21)
probability pass through exactly one node in the min-cut.                   As 0 < FP < 1 and 0 < FN < 1, Eqs. (20) and (21) imply
Further Let Ω D be the set of attack paths in F and Ω be                       h                                      i
the set of paths in S induced by Ω D . Lemma V.11 concludes                       ∏ (1 − pD (vi )) − ∏ (1 − pD (vi )) > 0 (22)
that the best response of the defender is to select the min-                        vi ∈ω2 (A)                      v j ∈ω1 (A)
                                                                                h                                                             i
cut nodes of F as trapping nodes, provided the probability                             ∏         (1 − pD (vi )) −      ∏       (1 − pD (v j )) > 0         (23)
of detecting the adversary and the probability of trapping a                        vi ∈ω2 (B)                      v j ∈ω1 (B)
benign flow are equal for all ω ∈ Ω . This implies that, if NE
                                                                            Eqs. (22) and (23) imply
strategy pair satisfy the conditions that p(ω)’s and f (ω)’s
are equal for all ω ∈ Ω , then the defender’s strategy at NE                           ∏         (1 − pD (vi )) >             ∏       (1 − pD (vi )) and (24)
is to select the min-cut nodes as the trapping nodes and the                        vi ∈ω2 (A)                            v j ∈ω1 (A)
adversary’s strategy is to choose an attack path such that it                                    (1 − pD (vi )) >                     (1 − pD (v j ))      (25)
passes through exactly one node in the min-cut node set. By
                                                                                       ∏                                      ∏
                                                                                    vi ∈ω2 (B)                            v j ∈ω1 (B)
Proposition V.5, there exists an NE for G. Consequently, if
p(ω)’s and f (ω)’s are equal at NE, the proof follows.                      Note that at every state in s ∈ S with s = {vi1 , . . . , vik } 0 6
    Let (pA , pD ) be an NE of G. Consider a unilateral deviation           ∑vik ∈s pD (vik ) 6 1. Thus Eqs. (24) and (25) cannot be together
in the strategy of the adversary. Let π(ω)’s for ω ∈ Ω are                  satisfied. Thus case (i) does not hold at an NE. Following
modified due to change in transition probabilities of the adver-            the similar arguments one can show that case (ii) also does
sary such that the updated probabilities of the attack paths are            not hold at an NE. This implies (p(ω1 ) − p(ω2 )) = 0 and
π(ωi ) + εi , for i = 1, . . . , |Ω |. Here, εi ’s can take positive val-   ( f (ω1 ) − f (ω2 )) = 0.
                                                 |Ω |
ues, negative values or zero such that ∑i=1 εi = 0. Consider two                Since ω1 and ω2 are arbitrary, one can show that for the
arbitrary paths, say ω1 ∈ Ω and ω2 ∈ Ω , such that a unilateral             general case
change in the adversary strategy changes π(ω1 ) and π(ω2 ) and                                   |Ω |                        |Ω |
the probabilities of the other paths remain unchanged. Without                                   ∑ εi p(ωi ) = 0 and ∑ εi f (ωi ) = 0.                     (26)
loss of generality, assume that π(ω1 ) increases by ε while                                      i=1                        i=1

π(ω2 ) decreases by ε and all other π(ω)’s             remain the same.        Eq. (26) should hold for all possible values of εi ’s satisfying
                                                                             |Ω |
                                              
As (pD , pA ) is an NE, (π(ω1 ) + ε) p(ω1 )(αA − βA ) + βA +                ∑i=1 εi  = 0. This gives p(ωi ) = p(ω j ) and f (ωi ) = f (ω j ), for
                                                                         all i, j ∈ {1, . . . , |Ω |}, at NE. This completes the proof.
 f (ω1 ) βA + (π(ω2 ) − ε) p(ω2 )(αA − βA ) + βA + f (ω2 ) βA 6
                                                                            Theorem V.7 concludes that the set of all solutions to the
π(ω1 ) p(ω1 )(αA −βA )+βA + f (ω1 ) βA +π(ω2 ) p(ω2 )(αA −                  min-cut problem characterizes the set of NE of game G. Two
                                                                            key properties of an equilibrium strategy pair is that defender
                       
βA ) + βA + f (ω2 ) βA . Thus
                                                                            chooses the min-cut nodes as trapping nodes and indeed with
   (p(ω1 ) − p(ω2 ))(αA − βA ) + ( f (ω1 ) − f (ω2 ))βA 6 0.        (17)    equal probability, which is proved in Lemma V.12.

Now consider another unilateral deviation of adversary strat-               Lemma V.12. Consider the APT vs. DIFT game G and let
egy such that π(ω1 ) decreases by ε while π(ω2 ) increases by               (pA , pD ) be an NE of G. Then under pD all the min-cut nodes
ε and all other π(ω)’s remain the same. This gives                          of the flow-network have equal probability of selecting as a
                                                                            trapping node.
   (p(ω1 ) − p(ω2 ))(αA − βA ) + ( f (ω1 ) − f (ω2 ))βA > 0.        (18)
                                                                            Proof. At NE, all the paths in the state space have equal
Eqs. (17) and (18) imply                                                    probability of getting detected, i.e., p(ω)’s are same for all
   (p(ω1 ) − p(ω2 ))(αA − βA ) + ( f (ω1 ) − f (ω2 ))βA = 0.        (19)    ω ∈ Ω (by Theorem V.7). For a path ω ∈ Ω with ω(A)
                                                                            denoting the set of nodes corresponding to the adversarial
Note that (α A − β A ) < 0 and βA > 0. Therefore, there are three           flow, we know
possible cases where Eq. (19) holds.                                                  h                       i
                                                                              p(ω) = 1 − ∏ (1 − pD (vi )) (1 − FN), for all ω ∈ Ω .
  (i) (p(ω1 ) − p(ω2 )) > 0 and ( f (ω1 ) − f (ω2 )) > 0,                                           vi ∈ω(A)
 (ii) (p(ω1 ) − p(ω2 )) < 0 and ( f (ω1 ) − f (ω2 )) < 0, and                                                                       (27)
(iii) (p(ω1 ) − p(ω2 )) = 0 and ( f (ω1 ) − f (ω2 )) = 0.                   Also, there is exactly one node corresponding to an attack
Consider case (i). For a path ω in the state space let ω(A)                 path that has nonzero probability of selecting as a trapping
denote the set of nodes in ω corresponding to the adversarial               node (Theorem V.7). Hence in set ω(A) exactly one node, say

You can also read