Dynamic Information Flow Tracking for Detection of Advanced Persistent Threats: A Stochastic Game Approach
Shana Moothedath, Dinuka Sahabandu, Joey Allen, Andrew Clark, Member, IEEE, Linda Bushnell, Fellow, IEEE, Wenke Lee, Fellow, IEEE and Radha Poovendran, Fellow, IEEE

arXiv:2006.12327v2 [cs.GT] 25 Jun 2021

S. Moothedath, D. Sahabandu, L. Bushnell, and R. Poovendran are with the Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA. {sm15, sdinuka, lb2, rp3}@uw.edu.
J. Allen and W. Lee are with the College of Computing, Georgia Institute of Technology, Atlanta, GA 30332 USA. jallen309@gatech.edu, wenke@cc.gatech.edu.
A. Clark is with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA. aclark@wpi.edu.

Abstract: Advanced Persistent Threats (APTs) are stealthy attacks by intelligent adversaries. This paper studies the detection of APTs that infiltrate cyber systems and compromise specifically targeted data and/or infrastructures. Dynamic information flow tracking is an information trace-based detection mechanism against APTs that tags suspicious information flows in the system and performs security analysis for unauthorized use of tagged data. In this paper, we develop an analytical model for resource-efficient detection of APTs using an information flow tracking game. The game is a nonzero-sum, turn-based, stochastic game with asymmetric information as the defender cannot distinguish whether an incoming flow is malicious or benign. The payoff functions of the game capture the cost for performing security analysis and the rewards and penalties received by the players. We analyze equilibrium of the game and prove that a Nash equilibrium is given by a solution to the minimum capacity cut set problem on a flow-network derived from the system. The edge capacities of the flow-network are obtained from the cost of performing security analysis. Finally, we implement our algorithm on a real-world dataset for a data exfiltration attack augmented with false-negative and false-positive rates and compute an optimal defender strategy.

Index Terms: Advanced Persistent Threats (APTs), Information flow tracking, Stochastic games, Minimum-cut problem

I. INTRODUCTION

An advanced persistent threat (APT) is a prolonged and targeted cyber attack in which an intruder gains illicit access to a system and remains undetected for an extended period of time. The intention of an APT is to monitor system activity and continuously mine highly sensitive data rather than causing damage to the system or organization. APT attacks consist of multiple stages that are initiated by an initial compromise and reconnaissance stage to establish a foothold in the system. Attackers then progress through the system, exploring and planning an attack strategy to obtain the desired data. This is followed by exfiltration of sensitive data, which is continued over a long period of time. Defending against APTs is a challenging task since APTs are specifically designed to evade conventional security mechanisms such as firewalls, anti-virus software and intrusion-detection systems that rely on signatures and can, therefore, guard only against known threats. However, APTs introduce information flows in the form of data-flow commands and control-flow commands while interacting with the system and these are continuously recorded in the log file of the system.

After an APT attack, when the consequences of the attack have been identified, a forensic investigation is conducted using the data from the log files. The purpose of this postmortem investigation is to understand the incident’s root cause, and construct appropriate defense strategies against such APTs and their variants [1]. During the forensic analysis, the system log data is analyzed in an offline setting.

Dynamic Information Flow Tracking (DIFT) [2] is a widely used detection mechanism for offline analysis of APT. DIFT uses the information traces recorded in the system log for performing the security analysis [1]. The key idea behind DIFT is that it tags all suspicious input/data channels and tracks the propagation of the tagged information flows through the system. DIFT generates security analysis using a pre-specified set of heuristic security rules whenever it observes an unauthorized use of tagged data. These heuristic rules are typically defined by practicing experts based on the historical attack knowledge and knowledge about nominal system behavior [1].

Although the security rules incorporated in the DIFT mechanism cover a wide range of attacks, these security rules may not be capable of verifying the authenticity of information flows against all possible attacks, resulting in the generation of false-negatives and false-positives. Incorrectly identifying a benign (nominal user) flow as malicious is a false-positive (FP) and failing to identify a malicious (adversarial) flow is a false-negative (FN). For instance, while the security rules for buffer overflow protection [3] can be verified accurately, the security rules for web application vulnerabilities [4] cannot be accurately verified. Consequently, attacks that exploit web applications may lead to incorrect conclusions by DIFT thereby generating FPs and FNs.

Additionally, limited availability of resources for defense along with the performance and memory overhead imposed by the defense mechanism on the system demands a resource-efficient detection technique. An analytical model of DIFT that captures the cost for performing security analysis and the effectiveness of flow-tagging mechanisms will facilitate the trade-off between the effectiveness of defense and the resource efficiency. Further, such a model would enable the design of optimal security strategies while taking into account the generation of FP and FN.

In this paper, we provide an analytical model to enable DIFT to optimally select locations in the system to perform
security analysis so as to maximize the probability of detection while minimizing the cost of detection. Our framework is based on the following insights. First, although both APT and DIFT are unaware of the other player’s strategy, the effectiveness of the DIFT depends on the APT’s strategy and the APT’s probability of evading detection depends on the DIFT’s strategy. This strategic interaction motivates a game-theoretic approach. Second, the efficiency of detection also depends on the effectiveness of performing security analysis at different locations in the system, and hence is determined by rates of FP and FN. Third, the game unfolds at multiple states between the entry points and the exit points of the attack where each state corresponds to the position of the tagged flows in the system. At each state, the defender decides whether to analyze a tagged flow and which one of the tagged flows to analyze (to avoid generation of FP and FN) while the adversary decides which process to transition to, at the cost of spending the defense resources. We formulate a stochastic game model that is played on an information flow graph (IFG). An IFG is a directed graph that expresses the history of a system’s execution in terms of the spatio-temporal relationships between processes and subjects. The contributions of this paper are the following.

• We model the interaction of APT and DIFT as a nonzero-sum, two-player, turn-based stochastic game ($\mathbf{G}$) with finite state and action spaces. In the APT vs. DIFT game, the state of the game is the nodes of the IFG at which the tagged flows arrive. The location of the tagged flows is known to both players (APT and DIFT), and this makes the state of the game observable. However, each player is unaware of the other player’s strategy. Also, before performing security analysis, DIFT cannot distinguish malicious and benign flows. This results in an asymmetric information structure.
• We analyze Nash equilibrium (NE) strategy of the game and show that an NE can be obtained from a solution to the minimum capacity cut-set problem on a flow-network constructed from the IFG of the system.
• We implement our algorithms and results on real-world data obtained for a data exfiltration attack using the Refinable Attack INvestigation (RAIN) system [1] augmented with FN and FP rates of the system.

The rest of the paper is organized as follows. Section II presents related work. Section III introduces preliminary concepts on information flow graph, APT attack, and a description of DIFT-based defense mechanism. Section IV describes the formulation of the turn-based stochastic game model. Section V gives the solution concept of the game and the equilibrium analysis results for computing optimal strategies of the players. Section VI illustrates the numerical results using data exfiltration attack data set collected using RAIN and augmented with FN and FP rates. Section VII concludes the paper.

II. RELATED WORK

Stochastic games model interactions of multiple agents that jointly control the evolution of states of a stochastic dynamical system [5]. Stochastic games have been widely used to model security games [6], economic games [7], and resilience of cyber-physical systems [8], where each player tries to maximize its individual payoff. A brief overview of the existing results in stochastic games is given in [9].

Stochastic games have been used to address system security related problems. The interaction between malicious attackers and the intrusion detection system (IDS) is modeled using a stochastic game in [10]. A nonzero-sum stochastic game model is given in [11] to model network security configuration problem in distributed IDS. Then a value iteration based algorithm is proposed in [11] to find an ε-NE for an attacker model where multiple adversaries simultaneously attack a network. A zero-sum stochastic game is formulated in [12] for IDS in a communication or computer network with interdependent nodes and correlated security assets and vulnerabilities. Note that, [13] considered a finite-horizon game and [10]-[12] dealt with zero-sum stochastic games for IDS.

Game-theoretic models for resource-efficient detection of APTs are given in [14], [15], [16], and [17]. DIFT models with fixed trapping nodes are introduced and analyzed in [14], [15]. Paper [16] extended the models in [14], [15] by considering a DIFT model which selects the trapping nodes in a dynamic manner rather than being fixed and proposed a min-cut based solution approach. Reference [17] considered the detection of APTs when the attack consists of multiple attackers possibly with different capabilities and analyzed the best responses of the players. Note that, the focus in [14], [15], and [16] is resource-efficient detection of APTs and they do not consider the false-positives and false-negatives generated in the system. In other words, the game models in [14]-[17] assume that security analysis performed by DIFT can verify the authenticity of tagged information flows accurately.

A stochastic game model for detecting APTs was recently introduced in the conference version of this work [18]. The proposed method in [18] analyzed a discounted stochastic game and presented a value iteration based algorithm to obtain an ε-NE of the discounted game. Due to the discounted nature of the payoff, the Nash equilibrium analysis of the game in [18] reduces to a nonlinear program (NLP). Solving an NLP is computationally challenging and the convergence of the algorithm in [18] is not guaranteed. This necessitates an alternate solution approach for solving the DIFT vs. APT game. In this paper, we solve an average reward (undiscounted) stochastic game and present a min-cut based approach with guaranteed convergence to compute optimal defense strategies.

In this paper we consider that the trapping nodes are generated dynamically and we extend our previous results on min-cut analysis in [16] in the following aspects. (i) Since there can be a wide range of possible APT attacks, we recognize that the security rules of DIFT may not be capable of verifying the authenticity of the information flows accurately thereby resulting in the generation of FP and FN. Consequently, the game model in this paper is stochastic unlike in [16]. (ii) We introduce a model with multiple information flows out of which one is malicious and the remaining are benign unlike in [16] which dealt with one information flow. Note that, while DIFT knows there is exactly one malicious flow
in the system, it is unaware which flow is the malicious flow before performing security analysis. This results in asymmetric information between the players, and hence the game is an imperfect-information game. (iii) In the current game model, although an information flow is concluded as benign at an initial inspection, it can be inspected again later during its propagation. The model in [16] inspects the flow exactly once as FN and FP rates are assumed to be zero. However, DIFT may analyze a flow multiple times and this will result in added resource cost. The game model considered in this paper captures the added resource cost incurred from reevaluation of the flows.

III. PRELIMINARIES

In this section, we first describe the graphical representation of the system, denoted as the information flow graph, and then present the models of the attacker and the flow tracking-based defense.

A. Information Flow Graph (IFG)

Let $G = (V_G, E_G)$ represent the IFG of the system. $V_G = \{v_1, \ldots, v_N\}$ consists of the processes (e.g., an instance of a computer program), files, and objects in the system and $E_G \subseteq V_G \times V_G$ represents the information flows (directed) in the system from one node to the other. IFG-based auditing is heavily desired by large enterprises and government agencies to detect APTs. We perform our game-theoretic analysis on the IFG of the system and use DIFT as the defense mechanism to detect APTs.

B. Attack Model: Advanced Persistent Threats (APTs)

APTs are intelligent attacks with specific targets. APTs are stealthy attacks that perform data exfiltration at an ultra-low-rate to avoid detection. Unlike classical malware, APT campaigns tend to involve multiple hosts, multiple systems, and extend over a long period of time, up to several months [19]. APTs are characterized by their abilities to render existing security mechanisms ineffective. APTs can evade security protection because existing mechanisms lack sufficient visibility into user, program, and operating system activities to ascertain the authenticity of an activity and the provenance of its data. The timeline and key stages of the APT lifecycle are described below and presented in Figure 1.

1. Reconnaissance: During the reconnaissance phase, the attacker will try to gain information about the system, such as what nodes are accessible on the system and the security defenses that are being used.
2. Initial Compromise: During the initial compromise stage, the attacker’s goal is to gain access to an enterprise’s network. In most cases, the attacker achieves this by exploiting a vulnerability or a social engineering trick, such as a phishing email.
3. Foothold Establishment: Once the attacker has completed the initial compromise, it will establish a persistent presence by opening up a communication channel with their Command & Control (C&C) server.
4. Lateral Movement: The attacker will increase its control of the system by moving laterally to new nodes in the system.
5. Target Attainment: Next, the attacker will try and escalate its privileges which may be necessary in order to access sensitive information, such as a proprietary source code or customer information.
6. Attack Completion: The final goal of the attacker is to deconstruct the attack, hopefully in a way to minimize its footprint in order to evade detection. For example, attackers may rely on deleting the system’s log.

Figure 1: Schematic diagram of an APT attack timeline.

Let the set of possible entry points of the APT be denoted as $\lambda \subset V_G$. During an attack, the goal of the APT is to capture a subset of nodes of the IFG referred to as destinations and denoted as $\mathcal{D} \subset V_G$. Here, $\lambda \cap \mathcal{D} = \emptyset$. Once a foothold is established in the system, APT tries to elevate the privileges and proceeds to the destinations through more internal compromises and performs data exfiltration at an ultra-low-rate. In order to achieve this the APT performs operations in the system to transition through the nodes in $V_G$ and arrive at some node in $\mathcal{D}$. In other words, the adversary (APT) selects paths in the IFG and performs transitions along the paths to reach the set $\mathcal{D}$ from the set $\lambda$.

The adversary in the DIFT vs. APT game has the following properties. We consider an APT attack that generates a single malicious information flow. The malicious flow originates at an entry point of the attack and the objective of the flow is to reach a destination. The state of the game is the set of nodes of the information flow graph at which the tagged flows arrive. APT observes the tagged flows and hence the state of the game is observable to APT. However, APT is unaware of the actions and the strategy of DIFT. Thus the information structure of APT is asymmetric with respect to that of the defender.

C. Defender Model: Dynamic Information Flow Tracking (DIFT)

The objective of the defender is to prevent any adversarial information flow traversing to the destination nodes. DIFT is a dynamic taint analysis based detection mechanism that consists of three main components: (i) tag sources, (ii) tag propagation rules, and (iii) trapping nodes. Tag sources are processes and objects (files and network endpoints) in the system that are considered as suspicious sources of information by the system. All data originating from a tag source is
labeled or tagged and DIFT tracks the propagation of a tagged data through the system. Propagation of a tagged information flow through the system results in tagging of more information flows based on the tag propagation rules specified by the DIFT [20]. Tag propagation rules are defined based on two kinds of information flows in the system: explicit information flows and implicit information flows [21].

During the execution of a program in the system, DIFT keeps track of the tagged information flows and generates trapping nodes for any unauthorized use of tagged data that indicate a possible attack. DIFT invokes security analysis at the trapping nodes using the specified security rules to verify the authenticity of the tagged flow thereby concluding whether there is an attack or not. These security rules are pre-specified depending on the application running on the system [3]. Note that tagged (suspicious) flows consist of both benign and malicious flows. As the specified security rules may not necessarily cover all possible attacks by APTs, DIFT generates false-positives and false-negatives. An optimal selection of trapping nodes in the IFG is hence critical to detect a wide range of attacks in the system with minimum false-positives and false-negatives. Note that while tag sources and tag propagation rules are known to the attacker, the trapping nodes are dynamically generated during the operation of the system and hence are unknown to the attacker. Figure 2 shows a schematic diagram of the DIFT-based detection framework considered in this paper.

Figure 2: Schematic diagram of the DIFT detection framework. The nodes in the figure denote system components such as processes, files, and subjects.

The defender in the DIFT vs. APT game has the following properties. The state of the game, which is the nodes of the information flow graph at which the tagged flows arrive, is known to DIFT. Thus DIFT observes the state of the game. However, DIFT cannot distinguish a malicious and a benign flow before performing security analysis. Also, while DIFT knows that there is an APT in the system, DIFT is unaware of the actions and the strategy of APT.

IV. TURN-BASED APT VS. DIFT GAME $\mathbf{G}$

In this section, we model the interaction of the APT with the system during the different stages of the attack as a dynamic stochastic game between the defender (DIFT) and the adversary (APT). Let $P_D$ be the defender player and $P_A$ be the adversarial player. We consider a nonzero-sum turn-based game. The information structure of the players is asymmetric as the defender player does not know if a tagged (suspicious) flow is malicious or benign while the adversarial player knows this information. The game $\mathbf{G} = \{S, s_0, \Sigma, \Delta, P\}$ unfolds on a finite state space, $S$, with initial state, $s_0$, finite action space of players, $\Sigma$, labeled transitions $\Delta \subseteq S \times \Sigma \times S$, and transition probability matrix $P$.

A. State Space

We consider a turn-based game with the state of the game at time $t \in \{0, 1, 2, \ldots\}$ denoted by the random variable $s_t$. Let $T$ denote the time horizon of the game, i.e., the game ends at time $T$ and $s_{t'} = s_T$, for all $t' > T$. The state space $S$ is partitioned into two sets, $S_A$ and $S_D$, such that $S_A \cap S_D = \emptyset$ and $S_A \cup S_D = S$. The states that belong to the set $S_A$ are referred to as the adversary-controlled states and the states in $S_D$ are referred to as the defender-controlled states. Specifically, $S_x$ is the subset of states at which player $P_x$, where $x \in \{A, D\}$, controls the transitions. The state of the game at time $t$ is the state of the system which corresponds to the position of the tagged (suspicious) flows at time $t$. Let $W$ be the number of tagged flows that arrive into the system at time $t = 1$. In practice (from the analysis of log data), only a small fraction of nodes in the system receive a system call at the same time. Hence in our model we assume that $W \ll N$. For $t \geq 1$, the state of the game is $s_t = (x, v_{i_1}, \ldots, v_{i_W})$, where $x \in \{A, D\}$ and $v_{i_1}, \ldots, v_{i_W} \in V_G \cup \{\phi, \tau\}$. Note that, out of the $W$ flows one is a malicious flow and the remaining $(W - 1)$ are benign flows. For notational convenience we consider the first flow is malicious (note that, defender does not have this information). While DIFT observes all the $W$ flows, it cannot distinguish malicious and benign flows. APT, on the other hand, knows which flow is the malicious flow. Thus in the DIFT vs. APT game, APT and DIFT have asymmetric information.

Also, corresponding to a state $s_t$ if $v_{i_k} = \phi$, for some $k \in \{1, \ldots, W\}$, then for all time $t' > t$ the state corresponding to the $k$th flow remains $\phi$, i.e., if a flow is dropped at time $t$ it remains dropped throughout the rest of the game.

B. Action Spaces

We first construct a directed flow-network $F = (V_F, E_F)$ from the IFG $G$ by introducing a source node $s_F$ with an outgoing edge to all the entry points and a sink node $t_F$ with an incoming edge from all the destination nodes. Here $V_F = V_G \cup \{s_F, t_F\}$ and $E_F = E_G \cup \{s_F \times \lambda\} \cup \{\mathcal{D} \times t_F\}$.
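The flow-network construction can be illustrated with a minimal Python sketch. It assumes the IFG is given as plain sets of node names and directed edges; the function names `build_flow_network` and `source_to_sink_paths`, the labels `"s_F"` and `"t_F"`, and the four-node toy graph are hypothetical and chosen only for illustration. The second function lists the simple directed paths from the source to the sink by depth-first search, which is practical only for small graphs.

```python
from typing import Dict, List, Set, Tuple

Node = str
Edge = Tuple[Node, Node]


def build_flow_network(
    nodes: Set[Node],
    edges: Set[Edge],
    entry_points: Set[Node],
    destinations: Set[Node],
    source: Node = "s_F",  # hypothetical label for the added source node
    sink: Node = "t_F",    # hypothetical label for the added sink node
) -> Tuple[Set[Node], Set[Edge]]:
    """Return (V_F, E_F): the IFG plus a source wired to every entry
    point and a sink wired from every destination."""
    if entry_points & destinations:
        raise ValueError("entry points and destinations must be disjoint")
    v_f = set(nodes) | {source, sink}
    e_f = (
        set(edges)
        | {(source, v) for v in entry_points}
        | {(d, sink) for d in destinations}
    )
    return v_f, e_f


def source_to_sink_paths(
    e_f: Set[Edge], source: Node = "s_F", sink: Node = "t_F"
) -> List[List[Node]]:
    """Enumerate simple directed paths from source to sink (DFS)."""
    adjacency: Dict[Node, List[Node]] = {}
    for u, v in e_f:
        adjacency.setdefault(u, []).append(v)

    paths: List[List[Node]] = []

    def dfs(node: Node, path: List[Node]) -> None:
        if node == sink:
            paths.append(list(path))
            return
        for nxt in sorted(adjacency.get(node, [])):
            if nxt not in path:  # never revisit a node: keeps paths simple
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(source, [source])
    return paths


if __name__ == "__main__":
    # Toy IFG (illustrative only): v1 is an entry point, v4 a destination.
    toy_nodes = {"v1", "v2", "v3", "v4"}
    toy_edges = {("v1", "v2"), ("v2", "v3"), ("v1", "v3"), ("v3", "v4")}
    v_f, e_f = build_flow_network(toy_nodes, toy_edges, {"v1"}, {"v4"})
    for p in source_to_sink_paths(e_f):
        print(" -> ".join(p))
```

Keeping the source and sink as ordinary nodes means that any standard max-flow/min-cut routine could be applied to the result once edge capacities are attached; the capacities themselves are not part of this sketch.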
Definition IV.1. An attack path in the flow-network $F$ is a simple directed path¹ from $s_F$ to $t_F$. The set of attack paths in $F$ is denoted as $\Omega_{\mathcal{D}}$.

¹ A directed path is said to be a simple directed path if there are no cycles or loops in the path.

The objective of the APT attack is to capture a destination node in set $\mathcal{D}$. To achieve the same, APT chooses transitions along a path from the set $\lambda$ to the set $\mathcal{D}$ which translates into an attack path in the flow-network $F$. Since APT is a stealthy attack, APT performs minimum amount of activities in the system in order to avoid detection. Thus, there are no cycles in the transition path of the APT through the IFG. We also note that any IFG with a set of cycles can be converted into an acyclic IFG without losing any causal relationships between the components given in the original IFG. One such dependency preserving conversion is node versioning given in [22]. Hence all $\omega \in \Omega_{\mathcal{D}}$ are simple directed paths.

Using the construction of the flow-network, we denote the initial state of the game at $t = 0$ as $s_0 = (A, s_F, \ldots, s_F)$, since at $t = 0$ all flows originate at the source node $s_F$. At a state $s_t$, either $P_D$ or $P_A$ chooses an action from their respective action sets denoted by $\mathcal{A}_D$ and $\mathcal{A}_A$, depending on whether $s_t \in S_D$ or $s_t \in S_A$. If $s_t \in S_A$, the adversary decides whether to quit the attack by dropping the information flow or to continue the attack. If the adversary decides not to quit the attack, then it selects which neighboring node of the IFG to transition to from the current node so as to reach set $\mathcal{D}$. In other words, the adversary either drops the malicious flow or performs a transition along a path in $\Omega_{\mathcal{D}}$ and hence $\mathcal{A}_A := \{\phi\} \cup V_G$. Specifically, at state $s_t = (A, v_{i_1}, \ldots, v_{i_W})$, where $v_{i_1} \in V_G$ and $v_{i_1} \notin \mathcal{D}$, $\mathcal{A}_A(s_t) \in \{\phi\} \cup \mathcal{N}(v_{i_1})$, where $\mathcal{N}(v_{i_1}) = \{v_j \in V_G : (v_{i_1}, v_j) \in E_G\}$. On the other hand, if $s_t \in S_D$, the action of the defender is to decide whether to perform security analysis on an information flow or not. If the defender chooses to perform security analysis, it also selects a flow to analyze. A node at which security analysis is performed is referred to as a trapping node. Consider a state $s_t = (D, v_{i_1}, \ldots, v_{i_W})$ and let the defender decide to perform security analysis on the flow at $v_{i_1}$. Then $v_{i_1}$ is chosen as a trapping node. Thus, $\mathcal{A}_D := \{0\} \cup V_G$, where $0$ represents not analyzing any flow. The defender performs security analysis on at most one flow in a state as there is only one malicious flow, and also it is not possible to perform analysis on the information flows at the entry points of the attack, $\lambda$. Thus at state $s_t = (D, v_{i_1}, \ldots, v_{i_W})$, $\mathcal{A}_D(s_t) \in \{0\} \cup \{v_{i_k} : v_{i_k} \notin \lambda, k \in \{1, \ldots, W\}\}$. We also note that the information flow chosen by the DIFT to perform security analysis can be either the malicious flow or a benign flow.

A state $s_t$ is said to be an absorbing state if the game terminates at $s_t$ and the action sets of the players are empty, i.e., $\mathcal{A}_A(s_t) = \mathcal{A}_D(s_t) = \emptyset$. In $\mathbf{G}$ a state $s_t = (x, v_{i_1}, \ldots, v_{i_W})$, where $x \in \{A, D\}$, is absorbing if one of the three cases holds:

(i) $v_{i_1} \in \mathcal{D} \subset V_G$,
(ii) $v_{i_k} = \tau$ for some $k \in \{1, \ldots, W\}$,
(iii) $v_{i_k} = \phi$ for all $k \in \{1, \ldots, W\}$.

Case (i) corresponds to adversary reaching a destination, case (ii) corresponds to the case where DIFT performed security analysis on an information flow and also concluded it as malicious, i.e., a flow is trapped, and case (iii) corresponds to all $W$ flows dropping out.

C. State Transitions

Definition IV.2. Time $t$ is said to be the termination time of the game $\mathbf{G}$ if it satisfies one of the following conditions: (a) $t = T$, or (b) the state of the game $s_t = (x, v_{i_1}, \ldots, v_{i_W})$, where $x \in \{A, D\}$, is an absorbing state.

We use the notation $W$ to denote the number of tagged information flows that are recorded at the first instant of time, i.e., $t = 1$. The log recording system (RAIN) used in our work uses the system call in [23] to create a timestamp for each audit log. At every instant of the log recording, RAIN records at most one system call for each system component (processes, files). As a result, the $W$ information flows arrive at distinct nodes of the information flow graph as it is not possible for multiple flows to arrive at one node at the same instant of time. Let the state of the game at $t = 1$ be $s_t = (D, v_{i_1}, \ldots, v_{i_W})$. Here $\{v_{i_1}, \ldots, v_{i_W}\}$ are the nodes of the IFG at which the flows arrive. The DIFT-based defense mechanism observes the flows. The player $P_D$ now chooses an action from the set $\mathcal{A}_D(s_t) \in \{0, v_{i_1}, \ldots, v_{i_W}\}$. At state $s_t$, there are three possibilities: (1) $P_D$ does not analyze any flow, (2) $P_D$ analyzes the malicious flow, and (3) $P_D$ analyzes a benign flow. Based on the action chosen by $P_D$, i.e., $\mathcal{A}_D(s_t)$, the next possible state of the game $s_{t+1}$ is defined. At $s_{t+1}$, $P_A$ has two possibilities to choose from: (a) to quit the flow, $\phi$, and (b) to transition to an out-neighbor of node $v_{i_1}$, $\mathcal{N}(v_{i_1})$, i.e., $\mathcal{A}_A(s_{t+1}) \in \{\phi\} \cup \mathcal{N}(v_{i_1})$. Based on the action of $P_A$ and the distribution of the benign flows in the system, the next state $s_{t+2}$ is reached. At $t + 2$, $P_D$ again chooses its action and the game continues in a turn-based fashion. Note that the DIFT will analyze only one information flow at a time $t$ as we consider a single malicious flow in the system.

For case (1) the action of $P_D$ is $\mathcal{A}_D(s_t) = 0$, i.e., DIFT does not perform security analysis on any of the $W$ information flows. Then the next state of the game is $s_{t+1} = (A, v_{i_1}, \ldots, v_{i_W})$. For case (2), $P_D$ chooses action $\mathcal{A}_D(s_t) = v_{i_1}$ and the next state of the game is $s_{t+1} = (A, v_{i_1}, \ldots, v_{i_W})$ with probability $FN$ and $s_{t+1} = (A, \tau, v_{i_2}, \ldots, v_{i_W})$ with probability $1 - FN$. For case (3), the player $P_D$ chooses action $\mathcal{A}_D(s_t) = v_{i_k}$, $k \neq 1$, i.e., $k \in \{2, \ldots, W\}$. Then the next state of the game is $s_{t+1} = (A, v_{i_1}, \ldots, v_{i_{k-1}}, \tau, v_{i_{k+1}}, \ldots, v_{i_W})$ with probability $FP$ and $s_{t+1} = (A, v_{i_1}, \ldots, v_{i_W})$ with probability $1 - FP$. Here, $FN$ and $FP$ are the rates of false-negatives and false-positives in the DIFT-architecture, which are empirically computed. The values of $FP$ and $FN$ depend on the security rules and vary across the different DIFT-architectures and they determine the transition probabilities $P$ of the stochastic game. In order to reduce false-positives and false-negatives in the system, the security rules must be capable of identifying the behavior and predicting the intent of information flows, i.e., determine whether an unknown flow is indeed malicious, or whether it is a benign flow that is exhibiting malware-like behavior [24].

If $t + 1$ is not a terminating time instant, then the adversarial player $P_A$ chooses action $\mathcal{A}_A(s_{t+1}) \in \{\phi\} \cup \mathcal{N}(v_{i_1})$. The
remaining $W - 1$ flows follow the benign flow distribution in the system, which is computed empirically from the nominal system operation and known. Let $\pi_B : v_i \in V_G \to [0, 1]^{|\mathcal{N}(v_i)|+1}$ denote the benign flow distribution. If the action of $P_A$ is $\phi$, then $s_{t+2} = (D, \phi, v_{j_2}, \ldots, v_{j_W})$, where $v_{j_2}, \ldots, v_{j_W}$ depend on the distribution $\pi_B$. Here $v_{j_2}, \ldots, v_{j_W} \in \{\phi\} \cup V_G$. Note that a benign flow can also drop out. Now $P_D$ chooses its action and the game continues. To summarize, given $s_t = (D, v_{i_1}, \ldots, v_{i_W})$ (w.p. stands for with probability), state transitions and the corresponding transition probabilities, given by $P$, are

$$
s_{t+1} =
\begin{cases}
(A, v_{i_1}, \ldots, v_{i_W}), & \text{w.p. } 1, & \text{if } \mathcal{A}_D(s_t) = 0,\\
(A, v_{i_1}, \ldots, v_{i_W}), & \text{w.p. } FN, & \text{if } \mathcal{A}_D(s_t) = v_{i_1},\\
(A, \tau, \ldots, v_{i_W}), & \text{w.p. } 1 - FN, & \text{if } \mathcal{A}_D(s_t) = v_{i_1},\\
(A, v_{i_1}, \ldots, v_{i_W}), & \text{w.p. } 1 - FP, & \text{if } \mathcal{A}_D(s_t) = v_{i_k},\ k \neq 1,\\
(A, v_{i_1}, \ldots, \tau, \ldots, v_{i_W}), & \text{w.p. } FP, & \text{if } \mathcal{A}_D(s_t) = v_{i_k},\ k \neq 1,
\end{cases}
\tag{1}
$$

and

$$
s_{t+2} =
\begin{cases}
(D, v_{j_1}, \ldots, v_{j_W}), & \text{if } \mathcal{A}_A(s_{t+1}) \in \mathcal{N}(v_{i_1}),\\
(D, \phi, v_{j_2}, \ldots, v_{j_W}), & \text{if } \mathcal{A}_A(s_{t+1}) = \phi.
\end{cases}
\tag{2}
$$

D. Strategies of the Players

A strategy is a rule that each player uses to select actions at every step of the game. We consider mixed (stochastic) and behavioral player strategies. Since the strategy is mixed, at a state, $P_D$ and $P_A$ select an action from the action set $\mathcal{A}_D$ and $\mathcal{A}_A$, respectively, based on some probability distribution. Further, since the strategy is behavioral, the probability distribution on the action set at a time instant $t$ depends on all the states traversed by the game and the actions taken by the players until time $t$.

Let $d_t$, $a_t$ be the action of the defender and the attacker, respectively, at time $t$. We denote the information available to the players $P_D$ and $P_A$ at time $t \leq T$ by $Y_t$ and $Z_t$, respectively. Then

$$
\begin{aligned}
Y_t &:= \{s_0, d_0, s_1, d_1, \ldots, s_{t-1}, d_{t-1}, s_t\},\\
Z_t &:= \{s_0, a_0, s_1, a_1, \ldots, s_{t-1}, a_{t-1}, s_t\}.
\end{aligned}
\tag{3}
$$

We denote the set of all possible outcomes for $Y_t$ and $Z_t$ at time $t$ using $Y^{\star}$ and $Z^{\star}$, respectively. Let the set of all pure strategies of $P_D$ and $P_A$ for game $\mathbf{G}$ be $\bar{P}_D$ and $\bar{P}_A$, respectively. Define $\Delta\bar{P}_D$ ($\Delta\bar{P}_A$) as the simplex of $\bar{P}_D$ ($\bar{P}_A$) or, the set of probability distributions over $\bar{P}_D$ ($\bar{P}_A$). A mixed strategy for $P_D$ is an element $p_D \in \Delta\bar{P}_D$, so that $p_D$ is a probability distribution over $\bar{P}_D$. Similarly, a mixed strategy for $P_A$ is an element $p_A \in \Delta\bar{P}_A$, so that $p_A$ is a probability distribution over $\bar{P}_A$. A behavioral mixed strategy of player $P_D$ is given by $p_D : Y^{\star} \to \Delta\bar{P}_D$ and of player $P_A$ is given by $p_A : Z^{\star} \to \Delta\bar{P}_A$.

E. Payoffs to the Players

The payoff functions of $P_D$ and $P_A$ are denoted by $U_D$ and $U_A$, respectively. $U_D$ consists of three components: (i) resource cost $C_D(v_i) < 0$ for performing security analysis of a flow at a node $v_i \in V_G$, (ii) penalty $\beta_D < 0$ for adversary reaching a destination node in $\mathcal{D}$, and (iii) reward $\alpha_D > 0$ for detecting the adversary. Similarly, $U_A$ consists of two components: (i) a reward $\beta_A > 0$ for reaching a destination node in the set $\mathcal{D}$, and (ii) penalty $\alpha_A < 0$ for getting detected by the defender. The reward and penalty parameters and the resource cost of DIFT and APT do not necessarily induce a zero-sum scenario where the payoff that is gained by one player is lost by the other. Therefore, we considered a non-zero sum payoff structure for the DIFT vs. APT game.

Let the payoff of player $P_x$ at an absorbing state $s_t$ be denoted as $c^x(s_t)$, where $x \in \{A, D\}$. Also, let the payoff of $P_D$ at a non-absorbing defense-controlled state $s_t$ with action $d_t$ be $r^D(s_t, d_t)$, and the payoff of $P_A$ at a non-absorbing adversary-controlled state $s_t$ with action $a_t$ be $r^A(s_t, a_t)$. At each state in the game, $s_t$ at time $t$, where $t < T$ and $s_t$ is a non-absorbing state, player chooses its action ($d_t$ for $s_t \in S_D$ and $a_t$ for $s_t \in S_A$) and receives payoff $r^D(s_t, d_t)$ and $r^A(s_t, a_t)$, respectively, and the game transitions to a next state $s_{t+1}$. This is continued until they reach an absorbing state and incur $c^A(s_t)$ and $c^D(s_t)$, respectively, or the game arrives at the horizon, i.e., $t = T$. Then,

$$
r^A(s_t, a_t) = 0 \quad \text{for all } s_t, a_t,
\tag{4}
$$

$$
c^A(s_t) =
\begin{cases}
\alpha_A, & s_t \in S_A,\ s_t = (A, \tau, \ldots, v_{i_W}),\\
\beta_A, & s_t \in S_A,\ s_t = (A, v_{i_1}, \ldots, \tau, \ldots, v_{i_W}),\\
0, & \text{otherwise},
\end{cases}
\tag{5}
$$

$$
c^D(s_t) =
\begin{cases}
\beta_D, & s_t \in S_D,\ s_t = (D, v_{i_1}, \ldots, v_{i_W}),\ v_{i_1} \in \mathcal{D},\\
\alpha_D, & s_t \in S_D,\ s_t = (A, \tau, \ldots, v_{i_W}),\\
0, & \text{otherwise},
\end{cases}
\tag{6}
$$

$$
r^D(s_t, d_t) =
\begin{cases}
0, & s_t \in S_D,\ d_t = 0,\\
C_D(v_{i_k}), & s_t \in S_D,\ d_t = v_{i_k},\ k \in \{1, \ldots, W\}.
\end{cases}
\tag{7}
$$

As the initial state of the game is $s_0$, for a strategy pair $(p_D, p_A)$ the expected payoffs of the players are

$$
U_A(p_D, p_A) = \mathbb{E}_{s_0, p_A, p_D}\left[\sum_{t=0}^{T} R_t^A\right] \quad \text{and}
\tag{8}
$$

$$
U_D(p_D, p_A) = \mathbb{E}_{s_0, p_A, p_D}\left[\sum_{t=0}^{T} R_t^D\right],
\tag{9}
$$

where $\mathbb{E}_{s_0, p_A, p_D}$ denotes the expectation with respect to $s_0$, $p_A$, and $p_D$ and from Eqs. (4)-(7)

$$
R_t^x =
\begin{cases}
r^D(s_t, d_t), & x = D,\ s_t \in S_D,\ t < T,\\
r^A(s_t, a_t), & x = A,\ s_t \in S_A,\ t < T,\\
c^D(s_t), & s_t \in S_D,\ t = T,\\
c^A(s_t), & s_t \in S_A,\ t = T.
\end{cases}
\tag{10}
$$

We incorporate the idea of maximizing the probability of detection and minimizing the probability of adversary evading detection using the terms $p_T$ and $p_R$, respectively. Further, we capture the false-positives generated in the system using the term $p_{FP}$. Note that, generation of false-positive implies that the defender failed to capture the adversary and concluded a
benign flow is malicious. This means that the adversary will attain the target, as the malicious flow evaded detection. Given a set of strategies $(p_D, p_A)$, where $p_D \in \mathbf{P}_D$ and $p_A \in \mathbf{P}_A$, and benign distribution $\pi_B$, the payoff functions of the players, i.e., Eqs. (8) and (9), can be rewritten using Eqs. (4)-(7) and (10) as

\[
U_D(p_D, p_A) = p_T\,\alpha_D + (p_R + p_{FP})\,\beta_D + \sum_{s \in S_D} \sum_{v_i \in s} p_D(v_i)\, C_D(v_i),
\tag{11}
\]

\[
U_A(p_D, p_A) = p_T\,\alpha_A + (p_R + p_{FP})\,\beta_A.
\tag{12}
\]

Here $p_T$ is the cumulative probability that the adversarial flow is detected by the defender. The term $p_T\,\alpha_D$ captures the reward received by DIFT for detecting the attack and the term $p_T\,\alpha_A$ captures the penalty incurred by the APT for getting detected. Also, $p_R$ denotes the cumulative probability that the adversarial flow reaches a destination and $p_{FP}$ denotes the cumulative probability that a benign flow is concluded to be malicious (i.e., trapped) by the defender, i.e., a false-positive. The term $(p_R + p_{FP})\,\beta_D$ captures the penalty incurred by DIFT for not detecting the attack and the term $(p_R + p_{FP})\,\beta_A$ captures the reward received by the APT for not getting detected. Recall that $p_D(v_i)$ denotes the probability with which DIFT selects node $v_i$ as a security check point (trap). The term $\sum_{s \in S_D} \sum_{v_i \in s} p_D(v_i)\, C_D(v_i)$ captures the total resource cost associated with DIFT for performing security analysis. Note that $p_T$ and $p_R$ are functions of $p_D$ and $p_A$, and $p_{FP}$ is a function of $p_D$ and $\pi_B$. Our focus is to compute a limiting average equilibrium of the nonzero-sum stochastic game.

V. COMPUTATION OF OPTIMAL STRATEGY

A. Solution Concept

In this subsection, we present the solution concept of the game.

Definition V.1. Let $p_A : Z^{\star} \to [0,1]^{|\mathcal{A}_A|}$ denote a strategy of the adversary. Also let $p_D : Y^{\star} \to [0,1]^{|\mathcal{A}_D|}$ denote a strategy of the defender, i.e., the probability of performing security analysis at a state. Then the best response of the defender is given by

\[
BR(p_A) = \arg\max_{p_D \in \mathbf{P}_D} \{U_D(p_D, p_A)\}.
\]

Similarly, the best response of the adversary is given by

\[
BR(p_D) = \arg\max_{p_A \in \mathbf{P}_A} \{U_A(p_D, p_A)\}.
\]

The best response of the defender is the set of defense strategies that maximize the payoff of the defender for a given adversarial strategy and known benign flow distribution. The best response of the adversary is the set of transition strategies that, for a given defender strategy and known benign flow distribution, maximize the probability of the adversary reaching a destination node without getting detected.

Definition V.2. A pair of strategies $(p_D, p_A)$ is said to be a Nash equilibrium (NE) if

\[
p_D \in BR(p_A) \quad \text{and} \quad p_A \in BR(p_D).
\]

An NE is a pair of strategies such that no player can benefit through unilateral deviation, i.e., by changing its strategy while the other player keeps its strategy unchanged.

Lemma V.3. Consider the APT vs. DIFT game $G = \{S, s_0, \Sigma, \Delta, P\}$. Let $N$ be the number of nodes in the IFG and $W$ be the number of flows that arrive into the system at time $t = 1$. Then $|S| = O(N^W)$.

Proof. We prove the result by showing that $|S_A| = O(N^W)$ and $|S_D| = O(N^W)$. Consider an arbitrary state $s = (x, v_{i_1}, \ldots, v_{i_W}) \in S \setminus \{s_0\}$, where $x \in \{A, D\}$. Here, $v_{i_k} \in V_G \cup \{\phi, \tau\}$ for all $k \in \{1, \ldots, W\}$. All the $W$ tagged flows arrive at distinct nodes in the set $V_G$ of the IFG, as a process or file in the system receives exactly one information flow at a particular time. Therefore, for a state $s = (x, v_{i_1}, \ldots, v_{i_W})$ with $v_{i_k} \in V_G$ for all $k \in \{1, \ldots, W\}$, we have $i_a \neq i_b$ for $a \neq b$, $a, b \in \{1, \ldots, W\}$. Also, the defender performs security analysis on one tagged flow at a time and hence there is no state $s = (x, v_{i_1}, \ldots, v_{i_W})$ with $v_{i_a} = v_{i_b} = \tau$ for $a \neq b$. Repetition of $v_{i_k}$'s is possible only for states with $\phi$. Hence $s$ belongs to one of two types: (a) $\{s = (x, v_{i_1}, \ldots, v_{i_W}) : v_{i_k} \in V_G \cup \{\tau\}\}$ and (b) $\{s = (x, v_{i_1}, \ldots, v_{i_W}) : v_{i_k} \in V_G \cup \{\phi\}$ such that $v_{i_k} = \phi$ for at least some $k \in \{1, \ldots, W\}\}$.

Case (a) corresponds to selecting $W$ items from a set of $N+1$ items without any repetitions. The cardinality of this set is $\binom{N+1}{W} = \frac{(N+1)!}{W!\,(N+1-W)!} = O(N^W)$. Case (b) corresponds to states with one or more $\phi$ entries. Note that here repetitions are allowed only for $\phi$. The cardinality of this set is $\binom{N+1}{W-1} + \binom{N+1}{W-2} + \ldots + \binom{N+1}{2} + \binom{N+1}{1} = O(N^{W-1})$. Additionally, there is an initial state $s_0$. From (a) and (b), the number of possible cases for $s$ is $O(N^W)$. Since $s = (x, v_{i_1}, \ldots, v_{i_W})$, where $x \in \{A, D\}$, we get $|S_A| = O(N^W)$ and $|S_D| = O(N^W)$. As $S = S_A \cup S_D$, we get $|S| = O(N^W)$.

It is shown that there exists an NE for nonzero-sum discounted stochastic games [25]. However, the existence of NE for nonzero-sum undiscounted stochastic games is open when the time horizon is infinite, i.e., $T = \infty$. In the result below we prove the existence of NE for the game $G$. The existence is shown by proving that the time horizon is indeed finite for $G$ and then invoking the existence of NE for finite-horizon undiscounted games.

Proposition V.4. Let $T$ denote the termination time of the APT vs. DIFT game. The APT vs. DIFT game $G$ terminates in at most $2N$ steps, i.e., $T \leq 2N$.

Proof. The action set $\mathcal{A}_A$ of the adversary player (APT) is to choose a transition along an attack path of the flow-network $\mathcal{F}$ (Definition IV.1) or to drop out at some point. Let $\Omega_{\mathcal{D}}$ denote the set of attack paths in $\mathcal{F}$. Consider an arbitrary attack path $\hat{\omega} \in \Omega_{\mathcal{D}}$. Recall that $\hat{\omega}$ is a simple path, as an APT, due to its stealthy behavior, will not traverse cycles through the system. Hence the length of $\hat{\omega}$ is at most $N$, as $\hat{\omega}$ is a simple directed path. Thus in any run of the game, the adversary can take at most $N$ transitions. As $G$ is a turn-based game, the game terminates in at most $2N$ steps.

Proposition V.5. There exists a Nash equilibrium for the APT vs. DIFT game $G$.

Proof. It is shown in [13] that there exists a Nash equilibrium for a nonzero-sum stochastic game with asymmetric information structure, under stochastic behavioral strategies, when the
time horizon is finite. Proposition V.4 proves that the APT vs. DIFT game terminates in at most $2N$ steps. Further, the behavioral strategy space is a subset of the strategy space $\mathbf{P}_A \times \mathbf{P}_D$. Hence by the result in [13], the proof follows.

B. Solution Approach

In this section, we compute an NE of the DIFT vs. APT game $G$. Our approach is based on a minimum capacity cut-set formulation on a flow-network constructed from the information flow graph of the system. Consider the flow-network $\mathcal{F} = (V_{\mathcal{F}}, E_{\mathcal{F}})$, where $V_{\mathcal{F}} = V_G \cup \{s_{\mathcal{F}}, t_{\mathcal{F}}\}$ and $E_{\mathcal{F}} = E_G \cup (\{s_{\mathcal{F}}\} \times \lambda) \cup (\mathcal{D} \times \{t_{\mathcal{F}}\})$. A cut of $\mathcal{F}$ is defined below.

Definition V.6. In a flow-network $\mathcal{F}$ with vertex set $V_{\mathcal{F}}$ and directed edge set $E_{\mathcal{F}}$, the cut induced by $\hat{S} \subset V_{\mathcal{F}}$ is a subset of edges $\kappa(\hat{S}) \subseteq E_{\mathcal{F}}$ such that for every $(u, v) \in \kappa(\hat{S})$, $|\{u, v\} \cap \hat{S}| = 1$. Further, given an edge capacity vector $c_{\mathcal{F}} : E_{\mathcal{F}} \to \mathbb{R}_+$, the capacity of a cut $\kappa(\hat{S})$ is defined as the sum of the capacities of the edges in the cut, i.e., $c_{\mathcal{F}}(\kappa(\hat{S})) = \sum_{e \in \kappa(\hat{S})} c_{\mathcal{F}}(e)$.

The (source-sink)-min-cut problem aims to find a cut $\kappa(\hat{S}^{\star})$ of $\hat{S}^{\star} \subset V_{\mathcal{F}}$ such that $c_{\mathcal{F}}(\kappa(\hat{S}^{\star})) \leq c_{\mathcal{F}}(\kappa(\hat{S}))$ for any cut $\kappa(\hat{S})$ of $\hat{S} \subset V_{\mathcal{F}}$ satisfying $s_{\mathcal{F}} \in \hat{S}$ and $t_{\mathcal{F}} \notin \hat{S}$. The (source-sink)-min-cut problem is well studied and there exist algorithms to find a min-cut in time polynomial in $|V_{\mathcal{F}}|$ and $|E_{\mathcal{F}}|$ [26].

In our approach to compute an NE, which is detailed later in the section, we find a min-cut through nodes of the flow-network instead of edges. Hence we transform the node version of the min-cut problem to an equivalent edge version. For this, we introduce an edge corresponding to each node in $\mathcal{F}$ except the source and sink nodes. The transformed flow-network is denoted as $\bar{\mathcal{F}} = (V_{\bar{\mathcal{F}}}, E_{\bar{\mathcal{F}}})$, where $V_{\bar{\mathcal{F}}} = V_{\mathcal{F}} \cup V_G'$ and $E_{\bar{\mathcal{F}}} = E_{\mathcal{F}}' \cup E_G' \cup E_{\lambda} \cup E_{\mathcal{D}}$ with $V_G' = \{v_1', \ldots, v_N'\}$, $E_{\mathcal{F}}' = \{(v_i', v_j) : (v_i, v_j) \in E_G\}$, $E_{\lambda} = \{(s_{\mathcal{F}}, v_i) : v_i \in \lambda\}$, $E_{\mathcal{D}} = \{(v_i', t_{\mathcal{F}}) : v_i \in \mathcal{D}\}$, and $E_G' = \{(v_i, v_i') : i = 1, \ldots, N\}$. The capacity vector associated with the edges in $\bar{\mathcal{F}}$ is given by

\[
c_{\bar{\mathcal{F}}}(e) :=
\begin{cases}
C_D(v_i), & e = (v_i, v_i') \in E_G',\\
\infty, & \text{otherwise}.
\end{cases}
\tag{13}
\]

Note that $E_G'$ is a cut of $\bar{\mathcal{F}}$ and $\sum_{e \in E_G'} c_{\bar{\mathcal{F}}}(e) < \infty$. Hence any minimum capacity cut in $\bar{\mathcal{F}}$ consists of edges from the set $E_G'$, as the capacity of the remaining edges is $\infty$ by Eq. (13). Further, the capacity of an edge $(v_i, v_i') \in E_G'$ corresponds to the cost of conducting the security analysis at node $v_i$. Thus a minimum capacity cut in $\bar{\mathcal{F}}$ corresponds to a cut node set of the IFG with minimum total cost of performing security analysis.

Let $\kappa(\hat{S}^{\star})$ denote an optimal solution to the (source-sink)-min-cut problem on $\bar{\mathcal{F}}$. Then $\kappa(\hat{S}^{\star}) \subseteq E_G'$ and $c_{\bar{\mathcal{F}}}(\kappa(\hat{S}^{\star})) < \infty$. The nodes corresponding to the min-cut are

\[
\hat{S}^{\star} := \{v_i : (v_i, v_i') \in \kappa(\hat{S}^{\star})\}.
\tag{14}
\]

In the APT vs. DIFT game, the aim of the defender is to optimally select trapping nodes in the IFG, i.e., nodes of the IFG at which to perform security analysis, such that no adversarial flow reaches any node in $\mathcal{D}$. In other words, the defender ensures that all adversarial flows that originate at node $s_{\mathcal{F}}$ are detected before reaching node $t_{\mathcal{F}}$. In order to ensure security, the defender must select at least one node on every possible path from $s_{\mathcal{F}}$ to $t_{\mathcal{F}}$ as a trapping node. In the equilibrium analysis of the game we prove that an optimal strategy of the defender is indeed to select the min-cut nodes of the flow-network as trapping nodes. The objective of the adversary is to optimally choose the transitions in such a way that the probability of reaching $t_{\mathcal{F}}$ is maximum. The adversary hence plans its transitions to select an attack path with the least probability of detection.

The result below proves that an NE of game $G$ is represented by the nodes corresponding to a minimum capacity cut in the flow-network $\mathcal{F}$. Note that the solution to the min-cut problem may not be unique. Consequently, there may exist multiple NE for the game $G$.

Theorem V.7. Every NE $(p_A, p_D)$ of the APT vs. DIFT game $G$ satisfies the following properties:
1) The defender's strategy $p_D$ selects all the nodes in $\hat{S}^{\star}$ as trapping nodes, where $\hat{S}^{\star}$ is a set of min-cut nodes of the flow-network $\mathcal{F}$.
2) The adversary's strategy $p_A$ chooses transitions such that each attack path passes through exactly one node in the set $\hat{S}^{\star}$.

We prove Theorem V.7 by invoking the property that at an NE every player plays a best response against the other players simultaneously. We first present the best response results, Lemma V.8, Lemma V.10, and Lemma V.11, and then present the proof of Theorem V.7.

Lemma V.8 gives the best response of the adversary for a given defender strategy that selects the min-cut nodes of $\mathcal{F}$ as the trapping nodes for analyzing the flows.

Lemma V.8. Let $\Omega_{\mathcal{D}}$ be the set of all attack paths in $\mathcal{F}$. Consider a defender strategy $p_D$ in which only the min-cut nodes $\hat{S}^{\star}$ of the flow-network $\mathcal{F}$ are chosen as trapping nodes with nonzero probability. Then, the best response of the adversary, $BR(p_D)$, is to choose the transitions in such a way that all attack paths which have nonzero probability under $BR(p_D)$ pass through exactly one node in $\hat{S}^{\star}$.

Proof. Consider the payoff function of the adversary, $U_A(p_D, p_A) = p_T\,\alpha_A + (p_R + p_{FP})\,\beta_A$. Here, $p_T$ and $p_R$ are functions of $p_D$ and $p_A$. However, $p_{FP}$ depends only on $p_D$ and $\pi_B$. Thus for given $p_D$ and $\pi_B$, the probabilities $p_T$ and $p_R$ vary depending on $p_A$, whereas $p_{FP}$ is a constant. We prove the result using a contradiction argument. Suppose the best response of $P_A$ is a strategy $p_A'$ such that there exists a path $\hat{\omega} \in \Omega_{\mathcal{D}}$ that passes through two nodes, say $v_i, v_j \in \hat{S}^{\star} \subset V_G$, and $\pi_A(\hat{\omega}) \neq 0$, where $\pi_A(\hat{\omega})$ is the probability of attack path $\hat{\omega}$ under $p_A'$. Note that $\hat{S}^{\star}$ is a min-cut and $v_i, v_j \in \hat{S}^{\star}$. Further, $C_D(v_i) \neq 0$ and $C_D(v_j) \neq 0$. Thus there exists at least one directed path, say $\hat{\omega}'$, from $v_i$ to $t_{\mathcal{F}}$ that does not pass through any node in $\hat{S}^{\star} \setminus \{v_i\}$. Similarly, there exists at least one directed path, say $\hat{\omega}''$, from $v_j$ to $t_{\mathcal{F}}$ that does not pass through any node in $\hat{S}^{\star} \setminus \{v_j\}$. As per the given defender strategy $p_D$, the only trapping node in $\hat{\omega}'$ is $v_i$ and the only trapping node in $\hat{\omega}''$ is $v_j$. Thus there exists a strategy $p_A$ such that all paths in $\Omega_{\mathcal{D}}$ with nonzero probability under $p_A$ pass through exactly one node
in $\hat{S}^{\star}$ and $U_A(p_D, p_A) > U_A(p_D, p_A')$ (since $p_{FP}$ remains the same, while $p_T$ is higher and $p_R$ is lower under $p_A'$ when compared to the values under strategy $p_A$). This contradicts the assumption that $p_A'$ is a best response and completes the proof.

Recall that $\Omega_{\mathcal{D}}$ is the set of all source-to-sink paths in the flow-network $\mathcal{F}$ (Definition IV.1). Note that any path in $\mathcal{F}$ that does not belong to $\Omega_{\mathcal{D}}$ is not a valid attack, as the attacker cannot reach a destination node. Every attack path in $\Omega_{\mathcal{D}}$, depending on the defender strategy and the transition structure of the game, induces a set of paths in the state space graph of the APT vs. DIFT game. Let $\Omega$ be the set of all paths induced by $\Omega_{\mathcal{D}}$. In other words, $\Omega$ is the set of all possible state transition paths in the state space graph corresponding to all attack paths in $\mathcal{F}$.

Definition V.9. Let the set of paths induced in the state space $S$ by the attack paths $\Omega_{\mathcal{D}}$ in $\mathcal{F}$ be denoted as $\Omega$. Then the probability of selecting a path $\omega \in \Omega$, denoted by $\pi(\omega)$, is $\pi(\omega) = \pi_A(\omega)\,\pi_B(\omega)$, where $\pi_A(\omega)$ is the probability with which an adversary chooses $\omega$ (the product of the adversary transition probabilities along path $\omega$) and $\pi_B(\omega)$ is the probability of the benign flows in $\omega$ under distribution $\pi_B$.

The following result proves that, under certain conditions, for a given adversary strategy the best response of the defender is to select one node in every attack path as a trapping node.

Lemma V.10. Let $\Omega_{\mathcal{D}}$ be the set of attack paths in $\mathcal{F}$, $\Omega$ be the set of paths in $S$ induced by $\Omega_{\mathcal{D}}$, and $p_A$ be a given strategy of $P_A$. For $\omega \in \Omega$, let $\omega(A)$ denote the set of nodes in $\omega$ corresponding to the adversarial (malicious) flow and $\omega(B)$ denote the set of nodes in $\omega$ corresponding to the benign flows. Also, let $p(\omega)$ denote the probability of detecting the adversary along path $\omega$, i.e., $p(\omega) = \big[1 - \prod_{v_i \in \omega(A)} (1 - p_D(v_i))\big](1 - FN)$. Also, let $f(\omega)$ denote the probability of trapping a benign flow (false-positive), i.e., $f(\omega) = \big[1 - \prod_{v_i \in \omega(B)} (1 - p_D(v_i))\big]\,FP$. If the defender's strategy satisfies the following two conditions:
(a) $p(\omega) = p(\omega')$, for all $\omega, \omega' \in \Omega$, and
(b) $f(\omega) = f(\omega')$, for all $\omega, \omega' \in \Omega$,
then the best response of the defender, $BR(p_A)$, is to select with nonzero probability exactly one node in every $\hat{\omega} \in \Omega_{\mathcal{D}}$ as a trapping node.

Proof. Let $\pi(\omega)$ denote the probability of a path $\omega \in \Omega$ under the given strategy $p_A$ and benign distribution $\pi_B$. For a path $\omega$ in the state space $S$, $\omega(A)$ denotes the set of nodes in $\omega$ corresponding to the adversarial (malicious) flow and $\omega(B)$ denotes the set of nodes in $\omega$ corresponding to the benign flows. Let $p(\omega)$ denote the probability of detecting the adversary along path $\omega$, i.e., $p(\omega) = \big[1 - \prod_{v_i \in \omega(A)} (1 - p_D(v_i))\big](1 - FN)$. Also, let $f(\omega)$ denote the probability of trapping a benign flow, i.e., a false-positive. Then $f(\omega) = \big[1 - \prod_{v_i \in \omega(B)} (1 - p_D(v_i))\big]\,FP$. The defender's payoff is

\[
U_D(p_D, p_A) = \sum_{\omega \in \Omega} \pi(\omega)\Big[p(\omega)\,\alpha_D + \big(1 - p(\omega) + f(\omega)\big)\beta_D\Big] + \sum_{\omega \in \Omega} \sum_{v_i \in \omega} p_D(v_i)\, C_D(v_i).
\]

By conditions (a) and (b), the $p(\omega)$'s and $f(\omega)$'s are equal for all $\omega \in \Omega$. Thus

\[
\begin{aligned}
U_D(p_D, p_A) &= \Big[p(\omega)(\alpha_D - \beta_D) + \big(1 + f(\omega)\big)\beta_D\Big] \sum_{\omega \in \Omega} \pi(\omega) + \sum_{\omega \in \Omega} \sum_{v_i \in \omega} p_D(v_i)\, C_D(v_i)\\
&= \Big[p(\omega)\,\alpha_D + \big(1 - p(\omega) + f(\omega)\big)\beta_D\Big] + \sum_{\omega \in \Omega} \sum_{v_i \in \omega} p_D(v_i)\, C_D(v_i).
\end{aligned}
\tag{15}
\]

Eq. (15) holds as $\sum_{\omega \in \Omega} \pi(\omega) = 1$. Consider a defender strategy $p_D$ in which exactly one node in every $\hat{\omega} \in \Omega_{\mathcal{D}}$ is chosen as the trapping node. Assume that the defender strategy is modified to $p_D'$ such that more than one node in some path is chosen as a trapping node. This variation updates the probabilities of nodes in a set of paths in $\Omega$. Note that, due to the constraints on $p(\omega)$ and $f(\omega)$ (conditions (a) and (b)), the defender's probabilities (strategy) at two nodes in a path are dependent. Hence for all paths $\omega \in \Omega$ whose probabilities are modified in $p_D'$, $\sum_{v_i \in \omega} p_D'(v_i)\, C_D(v_i) < \sum_{v_i \in \omega} p_D(v_i)\, C_D(v_i)$. This holds because the probability in the single-node case is less than the sum of the probabilities in the case with more than one node, as the events are dependent, and the sum of the values of $C_D(\cdot) < 0$ is also minimum (since min-cut). This implies

\[
\sum_{\omega \in \Omega} \sum_{v_i \in V_G} p_D'(v_i)\, C_D(v_i) < \sum_{\omega \in \Omega} \sum_{v_i \in V_G} p_D(v_i)\, C_D(v_i).
\tag{16}
\]

From Eq. (15) and by conditions (a) and (b), we get $U_D(p_D', p_A) < U_D(p_D, p_A)$. Therefore, $p_D'$ is not a best response for the defender, and no best response of the defender has more than one node per path chosen as a trapping node if the $p(\omega)$'s and $f(\omega)$'s are equal for all $\omega \in \Omega$.

The result below proves that if Lemma V.10 holds, then for a given adversary strategy the best response of the defender is to select the min-cut nodes of $\mathcal{F}$ as trapping nodes.

Lemma V.11. Let $\Omega_{\mathcal{D}}$ be the set of attack paths in $\mathcal{F}$. Assume that the defender's strategy satisfies conditions (a) and (b) in Lemma V.10 for a given adversary strategy $p_A$. Then the best response of the defender, $BR(p_A)$, is to select with nonzero probability a min-cut node set of the flow-network $\mathcal{F}$ as the trapping nodes.

Proof. Under conditions (a) and (b) in Lemma V.10, the best response of the defender is to select one node in every attack path as a trapping node. Note that all attack paths in $\mathcal{F}$ pass through some node in the min-cut node set $\hat{S}^{\star}$. By selecting the nodes in $\hat{S}^{\star}$ as trapping nodes, all possible attack paths have some nonzero probability of getting detected. We prove the result using a contradiction argument. Suppose that the best response of the defender is not to select the nodes in $\hat{S}^{\star}$ as trapping nodes. Then, there exists a subset of nodes $\hat{S} \subset V_G$ such that $\sum_{v_i \in \hat{S}} C_D(v_i) < \sum_{v_j \in \hat{S}^{\star}} C_D(v_j)$. Further, all possible attack paths in $\mathcal{F}$ pass through some node in $\hat{S}$. Then, $\hat{S}$ is a (source-sink)-cut-set; let $\kappa(\hat{S}) := \{(v_i, v_i') : v_i \in \hat{S}\}$. Then, $\kappa(\hat{S})$ is a cut set and $c_{\bar{\mathcal{F}}}(\kappa(\hat{S})) < c_{\bar{\mathcal{F}}}(\kappa(\hat{S}^{\star}))$. This contradicts the fact that $\kappa(\hat{S}^{\star})$ is an optimal solution to the (source-sink)-min-cut problem. Hence under Lemma V.10 the best response
of the defender is to select the nodes in $\hat{S}^{\star}$ as trapping nodes.

Using Lemma V.8, Lemma V.10, and Lemma V.11, we present below the proof of Theorem V.7.

Proof of Theorem V.7: Consider a defender's strategy that selects the min-cut nodes as the trapping nodes. Then, by Lemma V.8, the best response of the adversary is to select transitions in such a way that all attack paths with nonzero probability pass through exactly one node in the min-cut. Further, let $\Omega_{\mathcal{D}}$ be the set of attack paths in $\mathcal{F}$ and $\Omega$ be the set of paths in $S$ induced by $\Omega_{\mathcal{D}}$. Lemma V.11 concludes that the best response of the defender is to select the min-cut nodes of $\mathcal{F}$ as trapping nodes, provided the probability of detecting the adversary and the probability of trapping a benign flow are equal for all $\omega \in \Omega$. This implies that, if an NE strategy pair satisfies the conditions that the $p(\omega)$'s and $f(\omega)$'s are equal for all $\omega \in \Omega$, then the defender's strategy at the NE is to select the min-cut nodes as the trapping nodes and the adversary's strategy is to choose an attack path such that it passes through exactly one node in the min-cut node set. By Proposition V.5, there exists an NE for $G$. Consequently, if the $p(\omega)$'s and $f(\omega)$'s are equal at an NE, the proof follows.

Let $(p_A, p_D)$ be an NE of $G$. Consider a unilateral deviation in the strategy of the adversary. Let the $\pi(\omega)$'s for $\omega \in \Omega$ be modified due to a change in the transition probabilities of the adversary such that the updated probabilities of the attack paths are $\pi(\omega_i) + \varepsilon_i$, for $i = 1, \ldots, |\Omega|$. Here, the $\varepsilon_i$'s can take positive values, negative values, or zero such that $\sum_{i=1}^{|\Omega|} \varepsilon_i = 0$. Consider two arbitrary paths, say $\omega_1 \in \Omega$ and $\omega_2 \in \Omega$, such that a unilateral change in the adversary strategy changes $\pi(\omega_1)$ and $\pi(\omega_2)$ while the probabilities of the other paths remain unchanged. Without loss of generality, assume that $\pi(\omega_1)$ increases by $\varepsilon$ while $\pi(\omega_2)$ decreases by $\varepsilon$ and all other $\pi(\omega)$'s remain the same. As $(p_D, p_A)$ is an NE,

\[
\begin{aligned}
&(\pi(\omega_1) + \varepsilon)\big[p(\omega_1)(\alpha_A - \beta_A) + \beta_A + f(\omega_1)\,\beta_A\big] + (\pi(\omega_2) - \varepsilon)\big[p(\omega_2)(\alpha_A - \beta_A) + \beta_A + f(\omega_2)\,\beta_A\big]\\
&\quad \leq \pi(\omega_1)\big[p(\omega_1)(\alpha_A - \beta_A) + \beta_A + f(\omega_1)\,\beta_A\big] + \pi(\omega_2)\big[p(\omega_2)(\alpha_A - \beta_A) + \beta_A + f(\omega_2)\,\beta_A\big].
\end{aligned}
\]

Thus

\[
\big(p(\omega_1) - p(\omega_2)\big)(\alpha_A - \beta_A) + \big(f(\omega_1) - f(\omega_2)\big)\beta_A \leq 0.
\tag{17}
\]

Now consider another unilateral deviation of the adversary strategy such that $\pi(\omega_1)$ decreases by $\varepsilon$ while $\pi(\omega_2)$ increases by $\varepsilon$ and all other $\pi(\omega)$'s remain the same. This gives

\[
\big(p(\omega_1) - p(\omega_2)\big)(\alpha_A - \beta_A) + \big(f(\omega_1) - f(\omega_2)\big)\beta_A \geq 0.
\tag{18}
\]

Eqs. (17) and (18) imply

\[
\big(p(\omega_1) - p(\omega_2)\big)(\alpha_A - \beta_A) + \big(f(\omega_1) - f(\omega_2)\big)\beta_A = 0.
\tag{19}
\]

Note that $(\alpha_A - \beta_A) < 0$ and $\beta_A > 0$. Therefore, there are three possible cases in which Eq. (19) holds:
(i) $p(\omega_1) - p(\omega_2) > 0$ and $f(\omega_1) - f(\omega_2) > 0$,
(ii) $p(\omega_1) - p(\omega_2) < 0$ and $f(\omega_1) - f(\omega_2) < 0$, and
(iii) $p(\omega_1) - p(\omega_2) = 0$ and $f(\omega_1) - f(\omega_2) = 0$.

Consider case (i). For a path $\omega$ in the state space, let $\omega(A)$ denote the set of nodes in $\omega$ corresponding to the adversarial flow and $\omega(B)$ denote the set of nodes in $\omega$ corresponding to the benign flows. Rewriting $p(\omega_1) - p(\omega_2) > 0$, we get

\[
\Big\{\Big[1 - \prod_{v_i \in \omega_1(A)} (1 - p_D(v_i))\Big] - \Big[1 - \prod_{v_j \in \omega_2(A)} (1 - p_D(v_j))\Big]\Big\}(1 - FN) > 0.
\tag{20}
\]

Similarly, rewriting $f(\omega_1) - f(\omega_2) > 0$, we get

\[
\Big\{\Big[1 - \prod_{v_i \in \omega_1(B)} (1 - p_D(v_i))\Big] - \Big[1 - \prod_{v_j \in \omega_2(B)} (1 - p_D(v_j))\Big]\Big\}\,FP > 0.
\tag{21}
\]

As $0 < FP < 1$ and $0 < FN < 1$, Eqs. (20) and (21) imply

\[
\prod_{v_i \in \omega_2(A)} (1 - p_D(v_i)) - \prod_{v_j \in \omega_1(A)} (1 - p_D(v_j)) > 0,
\tag{22}
\]

\[
\prod_{v_i \in \omega_2(B)} (1 - p_D(v_i)) - \prod_{v_j \in \omega_1(B)} (1 - p_D(v_j)) > 0.
\tag{23}
\]

Eqs. (22) and (23) imply

\[
\prod_{v_i \in \omega_2(A)} (1 - p_D(v_i)) > \prod_{v_j \in \omega_1(A)} (1 - p_D(v_j)) \quad \text{and}
\tag{24}
\]

\[
\prod_{v_i \in \omega_2(B)} (1 - p_D(v_i)) > \prod_{v_j \in \omega_1(B)} (1 - p_D(v_j)).
\tag{25}
\]

Note that at every state $s \in S$ with $s = \{v_{i_1}, \ldots, v_{i_k}\}$, $0 \leq \sum_{v_{i_k} \in s} p_D(v_{i_k}) \leq 1$. Thus Eqs. (24) and (25) cannot be satisfied together, and case (i) does not hold at an NE. Following similar arguments one can show that case (ii) also does not hold at an NE. This implies $p(\omega_1) - p(\omega_2) = 0$ and $f(\omega_1) - f(\omega_2) = 0$.

Since $\omega_1$ and $\omega_2$ are arbitrary, one can show that for the general case

\[
\sum_{i=1}^{|\Omega|} \varepsilon_i\, p(\omega_i) = 0 \quad \text{and} \quad \sum_{i=1}^{|\Omega|} \varepsilon_i\, f(\omega_i) = 0.
\tag{26}
\]

Eq. (26) should hold for all possible values of the $\varepsilon_i$'s satisfying $\sum_{i=1}^{|\Omega|} \varepsilon_i = 0$. This gives $p(\omega_i) = p(\omega_j)$ and $f(\omega_i) = f(\omega_j)$, for all $i, j \in \{1, \ldots, |\Omega|\}$, at an NE. This completes the proof.

Theorem V.7 concludes that the set of all solutions to the min-cut problem characterizes the set of NE of game $G$. Two key properties of an equilibrium strategy pair are that the defender chooses the min-cut nodes as trapping nodes and that it does so with equal probability, which is proved in Lemma V.12.

Lemma V.12. Consider the APT vs. DIFT game $G$ and let $(p_A, p_D)$ be an NE of $G$. Then under $p_D$ all the min-cut nodes of the flow-network have equal probability of being selected as a trapping node.

Proof. At an NE, all the paths in the state space have equal probability of getting detected, i.e., the $p(\omega)$'s are the same for all $\omega \in \Omega$ (by Theorem V.7). For a path $\omega \in \Omega$ with $\omega(A)$ denoting the set of nodes corresponding to the adversarial flow, we know

\[
p(\omega) = \Big[1 - \prod_{v_i \in \omega(A)} (1 - p_D(v_i))\Big](1 - FN), \quad \text{for all } \omega \in \Omega.
\tag{27}
\]

Also, there is exactly one node corresponding to an attack path that has nonzero probability of being selected as a trapping node (Theorem V.7). Hence in set $\omega(A)$ exactly one node, say
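The detection and false-positive probabilities used in Lemma V.10 and Eq. (27) can be evaluated numerically. The following is a minimal sketch, assuming a defender strategy stored as a dictionary from node names to trapping probabilities; the function names, node names, and numeric values are hypothetical and not taken from the paper. It illustrates that when exactly one node on a path has nonzero trapping probability, the product in Eq. (27) collapses and $p(\omega)$ reduces to that node's probability times $(1 - FN)$.

```python
import math

def detection_probability(adversarial_nodes, p_d, fn_rate):
    """p(omega) = [1 - prod_{v in omega(A)} (1 - p_D(v))] * (1 - FN)."""
    # Probability that no node on the adversarial path is selected as a trap.
    miss = math.prod(1.0 - p_d.get(v, 0.0) for v in adversarial_nodes)
    return (1.0 - miss) * (1.0 - fn_rate)

def false_positive_probability(benign_nodes, p_d, fp_rate):
    """f(omega) = [1 - prod_{v in omega(B)} (1 - p_D(v))] * FP."""
    miss = math.prod(1.0 - p_d.get(v, 0.0) for v in benign_nodes)
    return (1.0 - miss) * fp_rate

# Hypothetical data: two attack paths, each crossing exactly one trap node.
p_d = {"v2": 0.3, "v3": 0.3}   # assumed defender strategy on the trap nodes
FN, FP = 0.1, 0.2              # assumed false-negative / false-positive rates
path_1 = ["v1", "v2", "v4"]
path_2 = ["v1", "v3", "v4"]

p1 = detection_probability(path_1, p_d, FN)
p2 = detection_probability(path_2, p_d, FN)
f1 = false_positive_probability(["v2"], p_d, FP)
```

With a single trap node per path, equal values of $p(\omega)$ across paths correspond to equal trapping probabilities at those nodes, which is the relationship Lemma V.12 states for the min-cut nodes.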
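Returning to the min-cut formulation of Section V-B, the node-splitting transformation of Eq. (13) and the extraction of trapping nodes in Eq. (14) can be sketched in code. This is only an illustration under stated assumptions: the max-flow routine is a plain Edmonds-Karp search chosen for brevity (the paper only notes that polynomial-time algorithms exist), analysis costs are entered as nonnegative magnitudes, and the function name, graph, and cost values are hypothetical.

```python
from collections import deque

INF = float("inf")

def min_cut_trap_nodes(nodes, edges, entry, dest, cost):
    """Split each node v into (v,'in') -> (v,'out') with capacity cost[v];
    every other edge gets infinite capacity, as in Eq. (13)."""
    cap = {"s": {}, "t": {}}  # residual capacities cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse residual edge

    for v in nodes:
        add_edge((v, "in"), (v, "out"), cost[v])   # node edge (v_i, v_i')
    for u, v in edges:
        add_edge((u, "out"), (v, "in"), INF)       # IFG edge (v_i', v_j)
    for v in entry:
        add_edge("s", (v, "in"), INF)              # source to entry points
    for v in dest:
        add_edge((v, "out"), "t", INF)             # destinations to sink

    def bfs_parents():
        parent = {"s": None}
        queue = deque(["s"])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    parent = bfs_parents()
    while "t" in parent:
        # Find the bottleneck on the augmenting path, then push flow along it.
        bottleneck, v = INF, "t"
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = "t"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck
        parent = bfs_parents()

    # A node is in the cut if its 'in' copy is reachable but its 'out' copy is not.
    reachable = set(parent)
    traps = {v for v in nodes if (v, "in") in reachable and (v, "out") not in reachable}
    return traps, total

# Hypothetical IFG: a is the entry point, d the destination.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
cost = {"a": 5, "b": 1, "c": 2, "d": 6}
traps, total = min_cut_trap_nodes(nodes, edges, entry=["a"], dest=["d"], cost=cost)
```

In this toy graph the candidate node cuts are {a}, {d}, and {b, c}, and the cheapest one is returned. Because only the split node edges have finite capacity, the minimum cut can never include an original IFG edge, which mirrors the argument following Eq. (13).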