Preserving Privacy and Security in Federated Learning
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

Truc Nguyen and My T. Thai

arXiv:2202.03402v2 [cs.LG] 17 May 2022

Abstract—Federated learning is known to be vulnerable to both security and privacy issues. Existing research has focused either on preventing poisoning attacks from users or on concealing the local model updates from the server, but not both. However, integrating these two lines of research remains a crucial challenge since they often conflict with one another with respect to the threat model. In this work, we develop a principled framework that offers both privacy guarantees for users and detection of poisoning attacks from them. With a new threat model that includes both an honest-but-curious server and malicious users, we first propose a secure aggregation protocol using homomorphic encryption for the server to combine local model updates in a private manner. Then, a zero-knowledge proof protocol is leveraged to shift the task of detecting attacks in the local models from the server to the users. The key observation here is that the server no longer needs access to the local models for attack detection. Therefore, our framework enables the central server to identify poisoned model updates without violating the privacy guarantees of secure aggregation.

Index Terms—Federated learning, zero-knowledge proof, homomorphic encryption, model poisoning

1 INTRODUCTION

Federated learning is an engaging framework for large-scale distributed training of deep learning models with thousands to millions of users. In every round, a central server distributes the current global model to a random subset of users. Each of the users trains locally and submits a model update to the server. Then, the server averages the updates into a new global model. Federated learning has inspired many applications in many domains, especially training image classifiers and next-word predictors on users' smartphones [15]. To exploit a wide range of training data while maintaining users' privacy, federated learning by design has no visibility into users' local data and training.

Despite the great potential of federated learning in large-scale distributed training with thousands to millions of users, the current system is still vulnerable to certain privacy and security risks. First, although the training data of each user is not disclosed to the server, the model update is. This causes a privacy threat since it is suggested that a trained neural network's parameters enable a reconstruction and/or inference of the original training data [1], [12]. Second, federated learning is generically vulnerable to model poisoning attacks that leverage the fact that federated learning enables adversarial users to directly influence the global model, thereby allowing considerably more powerful attacks [3]. Recent research on federated learning focuses on either improving the privacy of model updates [2], [7], [30] or preventing certain poisoning attacks from malicious users [3], [28], [29], but not both.

To preserve users' privacy, secure aggregation protocols have been introduced into federated learning to devise a training framework that protects the local models [2], [7], [30]. These protocols enable the server to privately combine the local models in order to update the global model without learning any information about each individual local model. Specifically, the server can compute the sum of the parameters of the local models while not having access to the local models themselves. As a result, the local model updates are concealed from the server, thereby preventing the server from exploiting the updates of any user to infer their private training data.

On the other hand, previous studies also unveil a vulnerability of federated learning when it comes to certain adversarial attacks from malicious users, especially poisoning attacks [3], [28]. This vulnerability leverages the fact that federated learning gives users the freedom to train their local models. Specifically, any user can train in any way that benefits the attack, such as arbitrarily modifying the weights of their local model. To prevent such attacks, a conventional approach is to have the central server run some defense mechanisms to inspect each of the model updates, such as using anomaly detection algorithms to filter out the poisoned ones [3].

However, combining these two lines of research on privacy and security is not trivial since they contradict one another. In particular, allowing the server to inspect local model updates from users to filter out the attacked ones violates the security model of secure aggregation. In fact, under a secure aggregation setting, the server should not be able to learn any useful information when observing an individual model update. Therefore, the server cannot run any defense mechanism directly on each model update to deduce whether it is attack-free or not.

To tackle this issue, in this paper, we propose a framework that integrates secure aggregation with defense mechanisms against poisoning attacks from users. To circumvent the aforementioned problem, we shift the task of running the defense mechanism onto the users, and each user is then given the ability to attest the output of the defense mechanism with respect to their model update. Simply speaking, the users are the ones who run the defense mechanism, and they then have to prove to the server that the mechanism was executed correctly.

• T. Nguyen and My T. Thai are with the Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL, 32611. E-mail: truc.nguyen@ufl.edu and mythai@cise.ufl.edu
To achieve this, the framework leverages a zero-knowledge proof (ZKP) protocol that the users can use to prove the correctness of the execution of a defense mechanism. The ZKP protocol must also make it difficult to generate a valid proof if the defense mechanism was not run correctly, and the proof must not reveal any useful information about the model update in order to retain the privacy guarantee of secure aggregation.

To demonstrate the feasibility of the framework, we use the backdoor attack as a use case in which we construct a ZKP protocol for a specific backdoor defense mechanism [19]. Furthermore, we also propose a secure aggregation protocol in federated learning to address the limitations of previous work. Specifically, we show that our proposed aggregation protocol is robust against malicious users in that it can still maintain privacy and liveness despite the fact that some users may not follow the protocol honestly. Our framework can then combine both the ZKP and the secure aggregation protocols to tackle the above-mentioned privacy and security risks in federated learning.

Contributions. Our main contributions are as follows:

• We establish a framework integrating both the secure aggregation protocol and defense mechanisms against poisoning attacks without violating any privacy guarantees.
• We propose a new secure aggregation protocol using homomorphic encryption for federated learning that can tolerate malicious users and a semi-honest server while maintaining both privacy and liveness.
• We construct a ZKP protocol for a backdoor defense to demonstrate the feasibility of our framework.
• Finally, we analyze the computation and communication cost of the framework and provide some benchmarks regarding its performance.

Organization. The rest of the paper is structured as follows. Section 2 provides the system and security models. The main framework for combining both the secure aggregation protocol and defense mechanisms against poisoning attacks is shown in Section 3. Section 4 presents our proposed secure aggregation protocol for federated learning. Section 5 gives the ZKP protocol for a backdoor defense mechanism. We evaluate the complexity and benchmark our solution in Section 6. We discuss some related work in Section 7 and provide some concluding remarks in Section 8.

2 SYSTEM AND SECURITY MODEL

Current systems proposed to address either the poisoning attacks from users or the privacy of model updates are designed using security models that conflict with each other. In fact, when preserving the privacy of model updates, the central server and the users are all considered honest-but-curious (or semi-honest). However, when defending against poisoning attacks from users, the server is considered honest while the users are malicious. Due to that conflict, this section establishes a more general security model combining the ones proposed in previous work on the security and privacy of federated learning. From the security model, we also define some design goals.

2.1 System model

We define two entities in our protocol:

1) A set of n users U: Each user has a local training dataset, computes a model update based on this dataset, and collaborates with the others to securely aggregate the model updates.
2) A single server S: Receives the aggregated model updates from the users and computes a global model that can later be downloaded by the users.

We denote the model update of each user u ∈ U as an input vector xu of dimension m. For simplicity, all values in both xu and Σ_{u∈U} xu are assumed to be integers in the range [0, N) for some known N. The protocol is correct if S can learn x̄ = Σ_{u∈U′} xu for some subset of users U′ ⊆ U. We assume a federated learning process that follows the FedAvg framework [21], in which U′ is a random subset of U (the server selects a random subset of users for each training round). Security requires that (1) S only learns the sum of the users' model updates including contributions from a large portion of users, (2) S can verify that x̄ is the sum of clean model updates, and (3) each user u ∈ U learns nothing.

In addition to securely computing x̄, each user u ∈ U computes a zero-knowledge proof πu proving that xu is a clean model without leaking any information about xu. This proof πu is then sent to and validated by the server S. The server then verifies that {πu}_{u∈U} is consistent with the obtained x̄.

2.2 Security model

Threat model. We take into account a computationally bounded adversary that can corrupt the server or a subset of users in the following manners. The server is considered honest-but-curious in that it behaves honestly according to the training protocol of federated learning but also attempts to learn as much as possible from the data received from the users. The server can also collude with corrupted users in U to learn the inputs of the remaining users. Moreover, some users try to inject unclean model updates to poison the global model, and they can also arbitrarily deviate from the protocol.

As regards the security of the key generation and distribution processes, there has been extensive research in constructing multi-party computation (MPC) protocols where a public key is computed collectively and every user computes private keys [10], [22]. Such a protocol can be used in this architecture to securely generate the necessary keys for the users and the server. Hence, this paper does not focus on securing the key generation and distribution processes but instead assumes that the keys are created securely and honest users' keys are not leaked to others.

Design goals. Besides the security aspects, our design also aims to address unique challenges in federated learning, including operating on high-dimensional vectors and users dropping out. With respect to the threat model, we summarize the design goals as follows:

1) The server learns nothing except what can be inferred from x̄.
Fig. 1: Secure framework for FL. Each user u ∈ U trains a local model xu that is used as an input to (1) User aggregation and (2) ZKP of attack-free model. The User aggregation component returns c̄ = Epk(Σ_{u∈U′} xu), which is the encryption of the sum over the local models of honest users. The ZKP of attack-free model component returns the proof πu and the commitment C(xu) for each xu. The outputs of these two components are then used by the central server as inputs to the Computing global model component. This component validates {πu}_{u∈U}, obtains the global model x̄ from c̄, checks if C(x̄) is consistent with {C(xu)}_{u∈U}, and then returns x̄.

2) Each user does not learn anything about the others' inputs.
3) The server can verify that x̄ is the sum of clean models.
4) The protocol operates efficiently on high-dimensional vectors.
5) Robust to users dropping out.
6) Maintain undisrupted service (i.e., liveness) in case a small subset of users deviate from the protocol.

3 SECURE FRAMEWORK FOR FEDERATED LEARNING

To address the new security model proposed in Section 2, we propose a framework to integrate secure aggregation with defenses against poisoning attacks in a way that they can maintain their respective security and privacy properties. Our framework is designed to ensure the privacy of users' local models while preventing certain attacks from users. As shown in Fig. 1, our framework consists of three components: (1) users aggregation, (2) zero-knowledge proof of attack-free model, and (3) computing global model. We discuss the properties of each component as follows.

3.1 Users aggregation

This component securely computes the sum of local models from the users. First, each user u ∈ U trains a local model update xu and uses it as an input to this component. Then, the component outputs c̄ = Epk(Σ_{u∈U′} xu), which is an encryption of the sum of the model updates of all honest users U′ ⊆ U. Epk(·) denotes a viable additively homomorphic encryption scheme, and we require that it have the property of indistinguishability under chosen plaintext attack (IND-CPA). The encryption function will be discussed in detail in Section 4.

Furthermore, this aggregation must maintain privacy with respect to each local model xu; in particular, we should be able to simulate the output of each user by a Probabilistic Polynomial Time (PPT) simulator. We specify the security properties of the Users aggregation component in Fig. 2.

User aggregation
Inputs:
• Local model xu of each user u ∈ U
• Public key pk
Output: c̄ = Epk(Σ_{u∈U′} xu) where U′ ⊆ U is the set of honest users
Properties:
• The encryption algorithm E(·) is additively homomorphic and IND-CPA secure.
• xu is not revealed to the other users U\{u} nor the server S. More formally, there exists a PPT simulator that can simulate each u ∈ U such that it can generate a view for the adversary that is computationally indistinguishable from the adversary's view when interacting with honest users.
• Robust to users dropping out.

Fig. 2: User aggregation component

3.2 Zero-knowledge proof of attack-free model

This component generates a zero-knowledge proof proving that the local model is free from a certain poisoning attack. Specifically, we first assume the existence of a function that verifies an attack-free model as follows:

    R(x) = C(x) if x is attack-free, and R(x) = ⊥ otherwise    (1)

where x is a local model and C(x) is the commitment of x. Next, the component implements a zero-knowledge proof protocol for the users to attest the correct execution of R(x) without revealing x. A specific implementation of this component for defending against backdoor attacks is shown in Section 5. The properties of this component are shown in Fig. 3. The prover algorithm P is run by the users, while the verifier algorithm V is run by the server S.
Zero-knowledge proof of attack-free model
Inputs:
• Local model xu of user u
• Function R(·) verifying an attack-free model
Outputs:
• Zero-knowledge proof πu verifying the correct execution of R(xu)
• Commitment C(xu)
Properties:
• The commitment scheme C(·) is additively homomorphic.
• Denote (P, V) as a pair of prover and verifier algorithms, respectively. If R(xu) = C(xu) then Pr[(P, V)(xu) = accept] = 1.
• If R(xu) = ⊥ then Pr[(P, V)(xu) = accept] ≤ ε, where ε denotes a soundness error.
• There exists a simulator that, given no access to the prover, can produce a transcript that is computationally indistinguishable from the one between an honest prover and an honest verifier.

Fig. 3: Zero-knowledge proof component

Computing global model
Inputs:
• Zero-knowledge proof πu for u ∈ U
• Commitment C(xu) for u ∈ U
• c̄ obtained from the User aggregation
• Secret key sk
Outputs: Global model x̄ = Σ_{u∈U′} xu where U′ ⊆ U is the set of honest users
Properties:
• Validate πu and only keep valid proofs.
• x̄ = Dsk(c̄)
• C(x̄) = Π_{u∈U′} C(xu), where Π denotes the homomorphic addition of the commitment scheme.

Fig. 4: Computing global model component

3.3 Computing global model

This component takes as input the outputs from the other two components and produces the corresponding global model. Hence, it serves to combine the two previous components, and it is run solely by the server. Specifically, it needs to be able to decrypt c̄ from the first component and to validate the zero-knowledge proofs {πu}_{u∈U′} from the second component. Then, it outputs the sum of the model updates from the honest users. We define this component in Fig. 4.

From the third component, we can prove that this framework preserves the privacy of users' model updates given that all the components satisfy their properties. Let Adv be a random variable representing the joint views of all parties that are corrupted by the adversary according to the threat model. We show that it is possible to simulate the combined view of any subset of honest-but-curious parties given only the inputs of those parties and the global model x̄. This means that those parties do not learn anything other than their own inputs and x̄.

Theorem 1. There exists a PPT simulator S such that for Adv ⊆ U ∪ S, the output of the simulator S is computationally indistinguishable from the output of Adv.

Proof. Refer to Appendix A.

4 SECURE AGGREGATION IN FEDERATED LEARNING

In this section, we describe a secure aggregation scheme that realizes the User aggregation component of our framework. Although some secure aggregation schemes have been proposed for federated learning, they are not secure against our threat model in Section 2. In particular, the aggregation schemes in [2] and [30] assume no collusion between the server and a subset of users. The work of Bonawitz et al. [7] fails to reconstruct the global model if one malicious user arbitrarily deviates from the secret sharing protocol. A detailed discussion of these related works is provided in Section 7.

The main challenge here is that we want to provide robustness in spite of malicious users, i.e., the protocol should be able to tolerate users that do not behave according to the protocol and produce correct results based on the honest users. Moreover, we need to minimize the users' communication overhead when applying this scheme.

4.1 Cryptographic primitives

Damgard-Jurik and Paillier cryptosystems [9], [23]. Given pk = (n, g), where n is the public modulus and g ∈ Z*_{n^{s+1}} is the base, the Damgard-Jurik encryption of a message m ∈ Z_{n^s} is Epk(m) = g^m · r^{n^s} mod n^{s+1} for some random r ∈ Z*_n and a positive natural number s. The Damgard-Jurik cryptosystem is additively homomorphic such that Epk(m1) · Epk(m2) = Epk(m1 + m2). This cryptosystem is semantically secure (i.e., IND-CPA) based on the assumption that computing n-th residue classes is believed to be computationally difficult. The Paillier scheme is a special case of Damgard-Jurik with s = 1.

In the threshold variant of these cryptosystems [9], several shares of the secret key are distributed to decryption parties, such that any subset of at least t of them can perform decryption efficiently, while fewer than t cannot decrypt the ciphertext. Given a ciphertext, each party can obtain a partial decryption. When t partial decryptions are available, the original plaintext can be obtained via a Share combining algorithm.

Pedersen commitment [24]. A general commitment scheme includes an algorithm C that takes as inputs a secret message m and a random r, and outputs the commitment commit to the message m, i.e., commit = C(m, r). A commitment scheme has two important properties: (1) commit does not reveal anything about m (i.e., hiding), and (2) it is infeasible to find (m′, r′) ≠ (m, r) such that C(m′, r′) = C(m, r) (i.e., binding).
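To make the additive homomorphism concrete, here is a toy Paillier implementation (the s = 1 case of Damgard-Jurik, with g = n + 1). The primes are demo-sized and insecure; parameter choices are illustrative only, and the threshold variant is omitted.

```python
from math import gcd
import secrets

# Toy Paillier (Damgard-Jurik with s = 1); insecure demo parameters.
p, q = 1000003, 1000033          # small primes for illustration only
n = p * q
n2 = n * n
g = n + 1                        # standard choice of base
lam = (p - 1) * (q - 1)          # phi(n); works in place of lcm here
mu = pow(lam, -1, n)             # modular inverse (Python 3.8+)

def L(x):
    return (x - 1) // n

def enc(m):
    while True:
        r = secrets.randbelow(n)
        if r and gcd(r, n) == 1:
            break
    # c = g^m * r^n mod n^2
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: E(m1) * E(m2) decrypts to m1 + m2.
c1, c2 = enc(123), enc(456)
assert dec((c1 * c2) % n2) == 579
```

This product-of-ciphertexts property is exactly what the aggregation protocol of Section 4.2 exploits.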
Pedersen commitment [24] is a realization of such a scheme, and it is also additively homomorphic such that C(m, r) · C(m′, r′) = C(m + m′, r + r′).

4.2 Secure aggregation protocol

Our secure aggregation scheme relies on additively homomorphic encryption, that is, Epk(m1 + m2) = Epk(m1) · Epk(m2). In this work, we use the Damgard-Jurik public-key cryptosystem [9], which is both IND-CPA secure and additively homomorphic. The cryptosystem supports threshold encryption and operates on the set Z*_n.

We also assume that the communication channel between each pair of users is secured with end-to-end encryption, such that an external observer cannot eavesdrop on the data being transmitted between the users. The secure communication channel between any pair of users can be mediated by the server.

In the one-time setup phase, the secret shares {su}_{u∈U} of the private key sk are distributed among the n users in U via the threshold encryption scheme. The scheme requires a subset Ut ⊂ U of at least t users to perform decryption with sk. This process of key distribution can be done via a trusted dealer or via an MPC protocol. After that, each secure aggregation process is divided into two phases: (1) Encryption and (2) Decryption.

Encryption phase: To securely compute the sum x̄ = Σ_{u∈U} xu, each user u ∈ U first computes the encryption cu = Epk(xu) and then sends cu to the server. The server then computes the product of all {cu}_{u∈U}, denoted as c̄:

    c̄ = Π_{u∈U} cu mod n^{s+1} = Π_{u∈U} Epk(xu) mod n^{s+1} = Epk(Σ_{u∈U} xu)    (2)

Additionally, per IND-CPA, the server knowing x̄ (or any of the non-leaf nodes in the aggregation tree) learns nothing about each individual xu for u ∈ U. If some subset of users drop out, the rest can still submit their cu and the server can compute c̄ for the remaining users.

Decryption phase: Suppose that up to this phase, fewer than |U| − t users have dropped out; that means we still have a subset Ut of at least t users remaining to perform decryption. First, the server sends c̄ to all users in Ut. Each user u ∈ Ut computes c′u = c̄^{2Δsu}, where Δ = t!, and then sends c′u back to the server. The server then applies the Share combining algorithm in [9] to compute x̄ = Σ_{u∈U} xu from {c′u}_{u∈Ut}. As a result, the server can securely obtain x̄ without knowing each individual xu.

User drop-outs. Furthermore, by design, this protocol can tolerate up to |U| − t users dropping out while the server is still able to construct the global model of the remaining users. In other words, as long as there are at least t users remaining, drop-outs will not affect the outcome of the protocol. Therefore, our protocol is robust to user drop-outs.

Security against active adversary. We analyze the security in the active adversary model. First, the protocol is secure in spite of a collusion between the server and a subset of users that would allow the server to decrypt some cu to obtain the corresponding xu. By using the threshold encryption, as long as there is at least one honest user in Ut, the server will not be able to learn the xu of any user. Hence, the protocol can tolerate up to t − 1 malicious users, with t < |U|.

Second, in the encryption phase, if a user deviates from the protocol, e.g., by computing cu as a random string or by using a different key other than pk, the server will not be able to decrypt c̄. To tolerate this behavior, the server can iteratively drop each cu from c̄ until it obtains a decryptable value.

In the decryption phase, if a user u* does not cooperate to provide c′u*, then the server will not have all the necessary information to perform decryption. However, if we still have at least t honest users (not counting u*), we can eliminate u* from Ut and add another user to Ut; hence, the server can still perform decryption.

In summary, our secure aggregation protocol can tolerate t − 1 malicious users and is robust against |U| − t users dropping out (where t < |U|). Therefore, t should be determined by calibrating the trade-off between usability (small t) and security (large t).

4.3 Efficient encryption of xu

We discuss two methods for reducing the ciphertext size when encrypting xu. Since xu is a vector of dimension m, naively encrypting every element of xu using either the Paillier or Damgard-Jurik cryptosystems will result in a huge ciphertext with an expansion factor of 2 log n (i.e., the ratio between the size of cu and of xu). Below, we present (1) a polynomial packing method and (2) a hybrid encryption method that includes a key encapsulation mechanism (KEM) and a data encapsulation mechanism (DEM).

Polynomial packing. We can reduce the expansion factor by packing xu into polynomials as follows. Suppose xu ≡ (xu^(1), xu^(2), ..., xu^(m)); we transform this into:

    x′u = xu^(1) + xu^(2) · 2^b + xu^(3) · 2^{2b} + ... + xu^(m) · 2^{(m−1)b}    (3)

where b = ⌈log N⌉. Hence, x′u becomes an mb-bit number. Then, to encrypt xu, we simply run the Damgard-Jurik encryption on x′u. Specifically, we obtain the ciphertext cu = E(x′u). This cu will have (⌈mb / log n⌉ + 1) log n bits. The expansion factor can be calculated as:

    lim_{m→∞} (⌈mb / log n⌉ + 1) log n / (mb) = 1    (4)

Therefore, as m increases, i.e., for larger models, the expansion factor approaches 1, implying minimal overhead. Note that since we are packing into polynomials, the additive homomorphic property of the ciphertexts is retained.
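The packing of Eq. (3) amounts to writing the vector as one big integer in base 2^b. A minimal sketch follows; note (an added caveat, implied by the range assumption of Section 2.1) that coordinate-wise sums must stay below 2^b for addition of packed integers to remain carry-free.

```python
import math

N = 256                          # per-coordinate range [0, N)
b = math.ceil(math.log2(N))      # bits per packed coordinate

def pack(x):
    # x'_u = sum_i x_u^(i) * 2^((i-1)b), Eq. (3)
    return sum(v << (i * b) for i, v in enumerate(x))

def unpack(val, m):
    mask = (1 << b) - 1
    return [(val >> (i * b)) & mask for i in range(m)]

# Adding packed integers adds the vectors coordinate-wise,
# so the additive homomorphism survives packing.
x1 = [3, 7, 250]
x2 = [1, 2, 3]
assert unpack(pack(x1) + pack(x2), 3) == [4, 9, 253]
```

In the actual protocol the addition happens under Damgard-Jurik encryption, via the ciphertext product of Eq. (2).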
Additionally, if we use this polynomial packing scheme with the Paillier encryption, it would result in an expansion factor of 2, as Paillier requires the plaintext size to be smaller than the key size. To keep the expansion factor at 1 when using Paillier, we introduce a KEM-DEM scheme as follows.

Hybrid encryption (KEM-DEM technique). In [8], the authors propose a KEM-DEM approach to compress the ciphertext of Fully Homomorphic Encryption (FHE). With FHE, they can use blockciphers to generate the keystreams. However, since our work only deals with Partially Homomorphic Encryption (PHE) like Paillier, we cannot use blockciphers; hence, we need to propose a different approach that generates the keystream with only PHE.

The general idea of our protocol is as follows. Suppose a user wants to send Epk(x) to the server (for simplicity, we use the notation x instead of xu). Instead of sending c = Epk(x) directly, the user chooses a secret random symmetric key k that has the same size as an element in x and computes c′ = (Epk(k), SEk(x)), where SEk(x) = x − k, which means subtracting k from every element in x. c′ is then sent to the server. When the server receives c′, it can obtain c = Epk(x) by computing Epk(k) × Epk(SEk(x)) = Epk(x) = c.

The correctness of this scheme holds because the server can obtain Epk(x) from the user. The security also holds because the server does not know the key k chosen by the user; hence, it cannot know x. However, this scheme is not semantically secure, i.e., IND-CPA secure, because by defining SEk(x) = x − k, identical elements in x will result in the same ciphertext.

To resolve this, we use an IV (initialization vector) for the encryption function. The IV is a nonce, which could be a random number or a counter that is increased each time it is used. Denote x = (x1, ..., xm); on the user's side:

1) Choose the secret l-bit random key k and a nonce IV.
2) Compute the keystream ks: for i ∈ [1, m], compute ksi = k × (IV‖i) mod p, where p is an l-bit number.
3) Compute SEk(x) = x − ks = (x1 − ks1, ..., xm − ksm).
4) Send (IV, Epk(k), SEk(x)) to the server.

On the server's side, upon receiving (IV, Epk(k), SEk(x)):

1) Compute the encrypted keystream Epk(ks): for i ∈ [1, m], compute Epk(k)^{IV‖i} mod n² = Epk(k × (IV‖i)) = Epk(ksi).
2) Compute Epk(x) = Epk(SEk(x)) · Epk(ks) = Epk(x1 − ks1 + ks1, ..., xm − ksm + ksm) = Epk(x1, ..., xm).

Fig. 5: KEM-DEM for PHE

Lemma 1. By picking k uniformly at random, and p as an l-bit prime number, the DEM component (i.e., SEk(x)) is IND-CPA secure.

Proof. To show that the DEM component is IND-CPA secure, it is sufficient to prove that k × (IV‖i) mod p is uniformly random with respect to k (Theorem 3.32 in [17]). In fact, in Z*_p, by picking k uniformly at random, k × (IV‖i) mod p is uniformly distributed if and only if neither k nor (IV‖i) shares prime factors with p. Hence, p should be an l-bit prime number and k can be chosen randomly from Z*_p.

Since the KEM component (i.e., Epk(k)) is IND-CPA secure by default using Paillier, and the DEM component is also IND-CPA secure as shown in Lemma 1, we can derive the following theorem:

Theorem 2. The KEM-DEM scheme is IND-CPA secure.

Figure 5 illustrates this scheme. The bandwidth expansion factor of this scheme can be computed as:

    lim_{m→∞} (mb + 2 log n + c) / (mb) = 1    (5)

where c is the size of the IV. Therefore, this scheme also achieves an expansion factor close to 1 as m grows; however, it incurs some computational overhead as compared to the polynomial packing scheme.

Although both the KEM-DEM scheme and polynomial packing have the same asymptotic expansion factor (i.e., 1), each has its own advantages and disadvantages. Polynomial packing is simple and works out of the box if we use the Damgard-Jurik cryptosystem. This is because Damgard-Jurik allows the plaintext size to be greater than the key size. However, it will not work with conventional cryptosystems like Paillier, as they require the plaintext size to be smaller than the key size. The KEM-DEM scheme would work for all cryptosystems, but it is more complex and incurs more computational overhead than the polynomial packing scheme. Since our protocol uses the Damgard-Jurik cryptosystem, Section 6 only evaluates the polynomial packing scheme.

5 USE CASE: ZKP FOR BACKDOOR DEFENSES

This section describes an implementation that realizes the Zero-knowledge proof component of our framework. Using the backdoor attack as a use case, we provide a ZKP protocol that can detect backdoor attacks from users. Before that, we establish some background regarding multi-party computation (MPC) and ZKP.

5.1 Preliminaries

Multi-party computation (MPC) protocol. In a generic MPC protocol, we have n players P1, ..., Pn; each player Pi holds a secret value xi and wants to compute y = f(x) with x = (x1, ..., xn) while keeping his input private. The players jointly run an n-party MPC protocol Πf. The view of the player Pi, denoted by ViewPi(x), is defined as the concatenation of the private input xi and all the messages received by Pi during the execution of Πf. Two views are consistent if the messages reported in ViewPj(x) as incoming from Pi are consistent with the outgoing messages implied by ViewPi(x) (i ≠ j). Furthermore, the protocol Πf is defined as perfectly correct if there are n functions Πf,1, ..., Πf,n such that y = Πf,i(ViewPi(x)) for all i ∈ [n]. An MPC protocol is designed to be t-private, where 1 ≤ t < n: the views of any t-subset of the parties do not reveal anything more about the private inputs of any other party.
By using a ZKP protocol, a prover wants to prove that it knows an x such that f(x) = y, without revealing what x is. A ZKP protocol must have the following three properties: (1) if the statement f(x) = y is correct, an honest verifier accepts the proof from an honest prover with probability 1 (completeness); (2) if the statement is incorrect, an honest verifier accepts a proof from a dishonest prover claiming that the statement is correct with probability at most some small soundness error (soundness); and (3) during the execution of the ZKP protocol, the verifier learns nothing beyond the fact that the statement is correct (zero-knowledge).

Backdoor attacks and defenses. Backdoor attacks aim to make the target model produce a specific attacker-chosen output label whenever it encounters a known trigger in the input. At the same time, the backdoored model behaves similarly to a clean model on normal inputs. In the context of federated learning, any user can replace the global model with another so that (1) the new model maintains a similar accuracy on the federated-learning task, and (2) the attacker has control over how the model behaves on a backdoor task specified by the attacker [3].

To defend against backdoor attacks, Liu et al. [19] propose a pruning defense that can disable backdoors by removing backdoor neurons, i.e., neurons that are dormant on normal inputs. The defense works as follows: in each iteration, the defender executes the backdoored DNN on samples from the testing dataset and stores the activation of each neuron per test sample. The defender then computes each neuron's average activation over all samples. Next, the defender prunes the neuron with the lowest average activation and measures the accuracy of the resulting DNN on the testing dataset. The process terminates when the accuracy drops below a predetermined threshold.

5.2 ZKP protocol for backdoor detection

To inspect a local model for backdoors while keeping the model private, we propose a non-interactive ZKP such that each user u ∈ U can prove that x_u is clean without leaking any information about x_u.

First, we devise an algorithm backdoor that checks for backdoors in a given DNN model x, as in Algorithm 1.

Algorithm 1: backdoor(x) – Check for backdoors in a DNN model
Input: DNN model x, validation dataset
Output: 1 if x is backdoored, 0 otherwise
1 Test x against the validation dataset, and record the average activation of each neuron;
2 k ← the neuron that has the minimum average activation;
3 Prune k from x;
4 Test x against the validation dataset;
5 if accuracy drops by a threshold τ then
6   return 1
7 else
8   return 0

Theorem 3. backdoor(x) returns 1 ⇒ x is a backdoored model.

Proof. Refer to Appendix B.

Given a model x, we define R(r, x) for some random r as follows:

R(r, x) = C(r, x) if backdoor(x) returns 0, and R(r, x) = ⊥ otherwise,    (6)

where C(r, x) is the commitment of x.

Based on [16], we construct the ZKP protocol as follows. Each user u ∈ U generates a zero-knowledge proof π_u proving that x_u is clean, i.e., R(r_u, x_u) = C(r_u, x_u). To produce the proof, each user u runs Algorithm 2 to obtain π_u. Note that Π_f is a 3-party, 2-private MPC protocol, which means that any subset of 2 views reveals nothing about the input x_u. Then, π_u and commit_u = C(r_u, x_u) are sent to the server, where they are validated as in Algorithm 3.

Algorithm 2: Generate zero-knowledge proof
Input: x_u, λ ∈ N, a perfectly correct and 2-private MPC protocol Π_f among 3 players
Output: Proof π_u
1 π_u ← ∅;
2 for k = 1, ..., λ do
3   Sample 3 random vectors x_1, x_2, x_3 such that x_u = x_1 + x_2 + x_3;
4   Consider the 3-input function f(x_1, x_2, x_3) = R(r, x_1 + x_2 + x_3) and emulate the protocol Π_f on inputs x_1, x_2, x_3. Obtain the views v_i = View_{P_i}(x) for all i ∈ [3];
5   Compute the commitments commit_1, ..., commit_3 to each of the 3 produced views v_1, v_2, v_3;
6   For j ∈ {1, 2}, compute e_j = H(j, {commit_i}_{i∈[3]});
7   π_u ← π_u ∪ ({e_j}_{j∈{1,2}}, {v_{e_j}}_{j∈{1,2}}, {commit_i}_{i∈[3]});
8 return π_u

Finally, for the server to validate that the x̄ obtained from secure aggregation is the sum of the models that were validated by the ZKP protocol, it collects {r_u}_{u∈U} and verifies that ∏_{u∈U} commit_u = C(∑_{u∈U} r_u, x̄). This is derived from the additive homomorphic property of the Pedersen commitment scheme, where

∏_{u∈U} commit_u = ∏_{u∈U} C(r_u, x_u) = C(∑_{u∈U} r_u, ∑_{u∈U} x_u) = C(∑_{u∈U} r_u, x̄)    (7)

Theorem 4. The proposed zero-knowledge proof protocol satisfies the completeness, zero-knowledge, and soundness properties, with soundness error (2/3)^λ, where λ is a security parameter.

Proof. Refer to Appendix C.
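The aggregate-consistency check in Eq. (7) relies only on the additive homomorphism of Pedersen commitments, C(r_1, x_1) · C(r_2, x_2) = C(r_1 + r_2, x_1 + x_2). A minimal sketch with deliberately tiny, insecure group parameters (and scalar updates in place of model vectors — both assumptions for illustration only):

```python
# Toy Pedersen commitment: p = 2q + 1 with q prime; g and h generate the
# order-q subgroup of quadratic residues mod p. These tiny parameters are
# chosen only to make the homomorphism visible, not for security.
p, q = 23, 11
g, h = 4, 9

def commit(r, x):
    # C(r, x) = g^x * h^r mod p
    return pow(g, x % q, p) * pow(h, r % q, p) % p

# Each user u commits to its (scalar) update x_u with randomness r_u
updates = [(3, 5), (7, 2), (1, 9)]  # (r_u, x_u) pairs

product = 1
for r, x in updates:
    product = product * commit(r, x) % p

r_sum = sum(r for r, _ in updates)
x_bar = sum(x for _, x in updates)
# prod_u C(r_u, x_u) == C(sum_u r_u, x_bar), as in Eq. (7)
assert product == commit(r_sum, x_bar)
```

In the protocol, the server performs exactly this multiplication over the users' commitments and compares it with a single commitment to the aggregate x̄.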
Algorithm 3: Validate zero-knowledge proof
Input: π_u, commit_u
Output: 1 if accept, 0 if reject
1 for pk ∈ π_u do
2   ({e_j}_{j∈{1,2}}, {v_{e_j}}_{j∈{1,2}}, {commit_i}_{i∈[3]}) ← pk;
3   if {commit_i}_{i∈[3]} is invalid then
4     return 0
5   if e_j ≠ H(j, {commit_i}_{i∈[3]}) for j ∈ {1, 2} then
6     return 0
7   if ∃e ∈ {e_1, e_2} : Π_{f,e}(View_{P_e}(x)) ≠ commit_u then
8     return 0
9   if v_{e_1} and v_{e_2} are not consistent with each other then
10    return 0
11 return 1

6 EVALUATION

In this section, we analyze the computation and communication costs of the proposed protocol for both the users and the server. We then implement a proof-of-concept and benchmark its performance.

6.1 Performance analysis

Users. For the communication cost with respect to the secure aggregation, a user has to send out the encrypted model, which costs O(m). Regarding the zero-knowledge proof, the user needs to submit a proof of size O(mkλ), where k is the size of the commitments. As regards the computation cost, the secure aggregation involves packing the data vector into polynomials and performing encryption, which takes O(m + s) in total. The zero-knowledge proof protocol requires λ runs of O(L) model inferences, where L is the size of the validation dataset; thus the total cost is O(λL).

Server. In the secure aggregation scheme, since the server receives data sent from all users, the communication complexity is O(m|U|). As regards the computation cost, the secure aggregation involves aggregating the ciphertexts from all users and performing decryption on the aggregated model, which takes O(m|U|²) in total. Validating the zero-knowledge proofs from all users requires another O(|U|kλ) computations.

6.2 Proof-of-concept and benchmarking

Secure aggregation. To measure the performance, we implement a proof-of-concept in C++ and Python 3. The Damgard-Jurik cryptosystem is used with 1024-, 2048-, 3072-, and 4096-bit keys. To handle big numbers efficiently, we use the GNU Multiple Precision Arithmetic Library [14]. We assume that each element in ∑_{u∈U} x_u can be stored in 3 bytes without overflow. This assumption conforms with the experiments in [7]. All experiments are conducted on a single-threaded machine equipped with an Intel Core i7-8550U CPU and 16GB of RAM, running Ubuntu 20.04.

Wall-clock running times for users are plotted in Fig. 6. We measure the running time with key sizes of 1024, 2048, and 4096 bits. As can be seen, under all settings, the results conform with the running-time complexity analysis. Also note that the computation of each user is independent of the others; hence, the number of users does not affect a user's running time. By packing the data vector into polynomials and optimizing the encryption process, a user can encrypt the entire data vector within seconds.

Fig. 6: Wall-clock running time per user, as the size of the data vector increases.

Fig. 7a shows the bandwidth expansion factor of the model encryption per user as we increase the data vector size. It can be observed that the factor approaches 1 as the model size increases, thereby confirming the analysis of the reduced ciphertext size. Moreover, the number of users does not affect the bandwidth consumption. This result implies that our secure aggregation scheme induces minimal overhead compared to sending the raw data vector, and can scale to a large number of users.

Fig. 7b illustrates the total data transfer of each user. The total data increases linearly with the data vector size, which follows the communication complexity O(m). Furthermore, as shown in Fig. 7a, the amount of transmitted data is very close to the raw data size under all key sizes; therefore, the total data transfer is approximately the vector size multiplied by 3 bytes.

Fig. 7: Bandwidth consumption per user. (a) Total data expansion factor per user, compared to sending the raw data vector to the server; different lines represent different values of k. (b) Total data transfer per user; the amounts of transmitted data at 1024-, 2048-, and 3072-bit keys are nearly identical to the 4096-bit key.

On the server's side, Fig. 8a shows the running time of the server as we increase the number of users. As can be seen, for m = 100000 it takes just over 15 minutes, and for m = 500000 the server finishes within 90 minutes, which is a reasonable running time for machine learning in a distributed system. This result conforms with the running-time complexity of O(m|U|²). Furthermore, Fig. 8b gives the amount of data received by the server. Since we have already optimized the size of the data sent on the user side, the bandwidth consumption of the server is also optimized, which follows the communication complexity O(m|U|).
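The aggregation pattern being benchmarked — each user encrypts its update, the server multiplies the ciphertexts and decrypts only the sum — can be sketched with textbook Paillier, the s = 1 case of Damgard-Jurik. The toy primes below are purely illustrative; the experiments above use 1024- to 4096-bit keys.

```python
import math
import secrets

# Textbook Paillier (the s = 1 case of Damgard-Jurik), with toy parameters.
p, q = 17, 19
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)  # Carmichael function lambda(n)
g = n + 1

def L(u):
    # L(u) = (u - 1) / n
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def enc(m):
    # c = g^m * r^n mod n^2, for random r coprime to n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return L(pow(c, lam, n2)) * mu % n

# Users encrypt their updates; the server multiplies the ciphertexts,
# which adds the plaintexts, and decrypts only the aggregate.
updates = [12, 7, 30]
agg = 1
for u in updates:
    agg = agg * enc(u) % n2
assert dec(agg) == sum(updates)
```

The server thus learns only the sum of the updates, never any individual one — the property the secure aggregation scheme provides at scale.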
Fig. 8: Running time and bandwidth consumption of the server; the key size is fixed to 4096 bits. (a) Wall-clock running time of the server at different data vector sizes, as the number of users increases. (b) Total bandwidth consumption of the server at various numbers of users, as the data vector size increases.

Fig. 9: Wall-clock running time of the zero-knowledge proof protocol on both users and the server, as λ increases. (a) User generating a ZKP. (b) Server validating the ZKPs of 100 users.

Zero-knowledge proof. For the implementation of the zero-knowledge backdoor detection, we train a neural network to use as an input. We use the MNIST dataset and train a 3-layer DNN model in the TensorFlow framework, using ReLU activation at the hidden layers and softmax at the output layer. The training batch size is 128. Then, we implement an MPC protocol that is run among 3 emulated parties based on [26]. The model's accuracy is 93.4% on the test set.

To benchmark the performance of the ZKP protocol, we measure the proving and validation times against the soundness error. As shown before, the soundness error is (2/3)^λ, where λ is the security parameter. Fig. 9a illustrates the proving time to generate a ZKP by each user. We can see that as λ increases, the soundness error decreases; nonetheless, the proving time increases linearly. This is a trade-off, since the protocol runs slower if we need a better soundness error. It takes just over 5 minutes to achieve a soundness error of 0.06, and roughly 7 minutes to achieve 0.03.

Fig. 9b shows the validation time of the ZKP protocol when the server validates the proofs of 100 users. Similar to the proving time, there is a trade-off between the running time and the soundness error. For a soundness error of 0.03, we need about 3.5 minutes to validate 100 users, which is suitable for most FL use cases [6].

7 RELATED WORK

Secure aggregation in FL. Leveraging secret sharing and random masking, Bonawitz et al. [7] propose a secure aggregation protocol and utilize it to aggregate local models from users. However, the protocol relies on users honestly following the secret sharing scheme. Consequently, although the privacy guarantee for honest users still holds, a single malicious user can arbitrarily deviate from the secret sharing protocol and make the server fail to reconstruct the global model. Furthermore, there is no mechanism to identify the attacker if such an attack occurs.

In [2] and [30], the decryption key is distributed to the users, and the server uses homomorphic encryption to blindly aggregate the model updates. However, the authors assume that there is no collusion between the server and the users, so that the server cannot learn the decryption key. Hence, these systems do not work under our security model, in which users can be malicious and collude with the server.

Additionally, there have been studies on how to use generic secure MPC based on secret sharing to securely compute any function among multiple parties [4], [11], [18]. However, with these protocols, each party has to send a secret share of its whole data vector to a subset of the other parties. In order to make the protocols robust, this subset of users should be considerably large. Since each secret share has the same size as the entire data vector, these approaches are not practical in federated learning, where we need to deal with high-dimensional vectors.

Zero-knowledge proofs for detecting poisoned models. Although there have been many studies on developing ZKP protocols based on MPC, they have not been widely used in the machine learning context. The IKOS protocol proposed by Ishai et al. [16] is the first work that leverages secure multi-party computation protocols to devise a zero-knowledge argument. Giacomelli et al. [13] later refine this approach and construct ZKBoo, a zero-knowledge argument system for Boolean circuits using collision-resistant hashes that does not require a trusted setup. Our proposed ZKP protocol is inspired by the IKOS protocol, with λ repetitions to reduce the soundness error to (2/3)^λ. We also demonstrate how such a protocol can be used in the context of machine learning, especially for attesting non-poisoned models while maintaining the privacy guarantees.

Another line of research in ZKP focuses on zkSNARK implementations [5], which have been used in practical applications like Zcash [25]. However, these systems depend on non-standard cryptographic assumptions and have large overheads in terms of memory consumption and computation cost, thereby limiting the statement sizes they can manage. Therefore, it remains a critical challenge whether zkSNARKs could be used in machine learning, where the circuit size would be considerably large.

Defense mechanisms against poisoning attacks. There have been multiple studies on defending against poisoning attacks. Liu et al. propose to remove backdoors by pruning redundant neurons [19]. On the other hand, Bagdasaryan et al. [3] devise several anomaly detection methods to filter out attacks. More recently proposed defense mechanisms [20], [27] detect backdoors by finding differences between normal and infected labels.

Our work leverages such a defense mechanism to construct a ZKP protocol for FL in which the users can run the defense locally on their model updates and attest its output to the server, without revealing any information about the local models. Different defense mechanisms may have different constructions of the ZKP protocols; nonetheless, they must all abide by the properties specified in our framework (Section 3). As shown in Section 5, we construct a ZKP protocol based on the defense proposed by Liu et al. [19].

8 CONCLUSION

In this paper, we have proposed a secure framework for federated learning. Unlike existing research, we integrate both secure aggregation and defense mechanisms against poisoning attacks under the same threat model while maintaining their respective security and privacy guarantees. We have also proposed a secure aggregation protocol that can maintain liveness and privacy for model updates against malicious users. Furthermore, we have designed a ZKP protocol for users to attest non-backdoored models without revealing any information about their models. Our framework combines these two protocols and shows that the server can detect backdoors while preserving the privacy of the model updates. The privacy guarantees for the users' models have been theoretically proven. Moreover, we have presented an analysis of the computation and communication costs and provided benchmarks of the framework's runtime and bandwidth consumption.

REFERENCES

[1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016.
[2] Yoshinori Aono, Takuya Hayashi, Lihua Wang, Shiho Moriai, et al. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security, 13(5):1333–1345, 2017.
[3] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. In International Conference on Artificial Intelligence and Statistics, pages 2938–2948. PMLR, 2020.
[4] Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali, pages 351–371. 2019.
[5] Nir Bitansky, Alessandro Chiesa, Yuval Ishai, Omer Paneth, and Rafail Ostrovsky. Succinct non-interactive arguments via linear interactive proofs. In Theory of Cryptography Conference, pages 315–333. Springer, 2013.
[6] Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, et al. Towards federated learning at scale: System design. arXiv preprint arXiv:1902.01046, 2019.
[7] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
[8] Anne Canteaut, Sergiu Carpov, Caroline Fontaine, Tancrède Lepoint, María Naya-Plasencia, Pascal Paillier, and Renaud Sirdey. Stream ciphers: A practical solution for efficient homomorphic-ciphertext compression. Journal of Cryptology, 31(3):885–916, 2018.
[9] Ivan Damgård, Mads Jurik, and Jesper Buus Nielsen. A generalization of Paillier's public-key system with applications to electronic voting. International Journal of Information Security, 9(6):371–385, 2010.
[10] Ivan Damgård and Maciej Koprowski. Practical threshold RSA signatures without a trusted dealer. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 152–165. Springer, 2001.
[11] Ivan Damgård, Valerio Pastro, Nigel Smart, and Sarah Zakarias. Multiparty computation from somewhat homomorphic encryption. In Annual Cryptology Conference, pages 643–662. Springer, 2012.
[12] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333, 2015.
[13] Irene Giacomelli, Jesper Madsen, and Claudio Orlandi. ZKBoo: Faster zero-knowledge for Boolean circuits. In 25th USENIX Security Symposium (USENIX Security 16), pages 1069–1083, 2016.
[14] Torbjörn Granlund. GNU multiple precision arithmetic library. http://gmplib.org/, 2010.
[15] Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage. Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604, 2018.
[16] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Zero-knowledge from secure multiparty computation. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pages 21–30, 2007.
[17] Jonathan Katz and Yehuda Lindell. Introduction to Modern Cryptography. CRC Press, 2020.
[18] Yehuda Lindell, Benny Pinkas, Nigel P. Smart, and Avishay Yanai. Efficient constant round multi-party computation combining BMR and SPDZ. In Annual Cryptology Conference, pages 319–338. Springer, 2015.
[19] Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International Symposium on Research in Attacks, Intrusions, and Defenses, pages 273–294. Springer, 2018.
[20] Yingqi Liu, Wen-Chuan Lee, Guanhong Tao, Shiqing Ma, Yousra Aafer, and Xiangyu Zhang. ABS: Scanning neural networks for back-doors by artificial brain stimulation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 1265–1282, 2019.
[21] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.
[22] Takashi Nishide and Kouichi Sakurai. Distributed Paillier cryptosystem without trusted dealer. In International Workshop on Information Security Applications, pages 44–60. Springer, 2010.
[23] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 223–238. Springer, 1999.
[24] Torben Pryds Pedersen. Non-interactive and information-theoretic secure verifiable secret sharing. In Annual International Cryptology Conference, pages 129–140. Springer, 1991.
[25] Eli Ben Sasson, Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, and Madars Virza. Zerocash: Decentralized anonymous payments from Bitcoin. In 2014 IEEE Symposium on Security and Privacy, pages 459–474. IEEE, 2014.
[26] Sameer Wagh, Divya Gupta, and Nishanth Chandran. SecureNN: 3-party secure computation for neural network training. Proceedings on Privacy Enhancing Technologies, 2019(3):26–49, 2019.
[27] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. Neural Cleanse: Identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pages 707–723. IEEE, 2019.
[28] Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, and Dimitris Papailiopoulos. Attack of the tails: Yes, you really can backdoor federated learning. In Advances in Neural Information Processing Systems, volume 33, pages 16070–16084. Curran Associates, Inc., 2020.
[29] Chulin Xie, Keli Huang, Pin-Yu Chen, and Bo Li. DBA: Distributed backdoor attacks against federated learning. In International Conference on Learning Representations, 2019.
[30] Chengliang Zhang, Suyi Li, Junzhe Xia, Wei Wang, Feng Yan, and Yang Liu. BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning. In 2020 USENIX Annual Technical Conference (USENIX ATC 20), pages 493–506, 2020.