CHEETAH: An Ultra-Fast, Approximation-Free, and Privacy-Preserved Neural Network Framework based on Joint Obscure Linear and Nonlinear Computations
Qiao Zhang, Cong Wang, Chunsheng Xin, and Hongyi Wu

Qiao Zhang, Chunsheng Xin, and Hongyi Wu are with the Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA, 23529. Cong Wang is with the Department of Computer Science, Old Dominion University, Norfolk, VA, 23529. E-mail: {qzhan002, c1wang, cxin, h1wu}@odu.edu

arXiv:1911.05184v2 [cs.LG] 11 Feb 2021

Abstract—Machine Learning as a Service (MLaaS) is enabling a wide range of smart applications on end devices. However, such convenience comes with a cost of privacy because users have to upload their private data to the cloud. This research aims to provide effective and efficient MLaaS such that the cloud server learns nothing about user data and the users cannot infer the proprietary model parameters owned by the server. This work makes the following contributions. First, it unveils the fundamental performance bottleneck of existing schemes due to the heavy permutations in computing linear transformation and the use of communication-intensive Garbled Circuits for nonlinear transformation. Second, it introduces an ultra-fast secure MLaaS framework, CHEETAH, which features a carefully crafted secret sharing scheme that runs significantly faster than existing schemes without accuracy loss. Third, CHEETAH is evaluated on the benchmark of well-known, practical deep networks such as AlexNet and VGG-16 on the MNIST and ImageNet datasets. The results demonstrate more than 100× speedup over the fastest GAZELLE (Usenix Security'18), 2000× speedup over MiniONN (ACM CCS'17) and five orders of magnitude speedup over CryptoNets (ICML'16). This significant speedup enables a wide range of practical applications based on privacy-preserved deep neural networks.

Index Terms—privacy; machine learning as a service; secure two party computation; joint obscure neural computing

1 INTRODUCTION

From Alexa and Google Assistant to self-driving vehicles and Cyborg technologies, deep learning is rapidly advancing and transforming the way we work and live. It is becoming prevalent and pervasive, embedded in many systems, e.g., for pattern recognition [1], medical diagnosis [2], speech recognition [3] and credit-risk assessment [4]. In particular, the deep Convolutional Neural Network (CNN) has demonstrated superior performance in computer vision tasks such as image classification [5], [6] and facial recognition [7], among many others.

Since training a deep neural network model is resource-intensive, cloud providers have begun to offer Machine Learning as a Service (MLaaS) [8], where a proprietary model is trained and hosted on clouds, and clients make queries (inference) and receive results through a web portal. While this emerging cloud service is embraced as an important tool for efficiency and productivity, the interaction between clients and cloud servers creates new vulnerabilities for unauthorized access to private information. This work focuses on ensuring privacy-preserved yet efficient inference in MLaaS.

Although communication can be readily secured from end to end, privacy still remains a fundamental challenge. On the one hand, the clients must submit their data to the cloud for inference, but they want the data privacy well protected, preventing a curious cloud provider from mining valuable information. In many domains such as health care [9] and finance [10], data are extremely sensitive. For example, when patients transmit their physiological data to the server for medical diagnosis, they do not want anyone (including the cloud provider) to see it. Regulations such as the Health Insurance Portability and Accountability Act (HIPAA) [11] and the recent General Data Protection Regulation (GDPR) in Europe [12] have been put in place to impose restrictions on sharing sensitive user information. On the other hand, cloud providers do not want users to be able to extract their proprietary, valuable model that has been trained with significant resources and effort, as it may turn customers into one-time shoppers [13]. Furthermore, the trained model contains private information about the training data set and can be exploited by malicious users [14], [15], [16]. To this end, there is an urgent need to develop effective and efficient schemes to ensure that, in MLaaS, a cloud server does not have access to users' data and a user cannot learn the server's model.

1.1 Retrospection: Evolvement of Privacy-Preserved Neural Networks

The quest began in 2016 when CryptoNets [17] was proposed to embed Homomorphic Encryption (HE) [29] into CNN. It was the first work that successfully demonstrated the feasibility of computing inference over Homomorphically encrypted data. While the idea is conceptually straightforward, its prohibitively high computation cost renders it
impractical for most applications that rely on non-trivial deep neural networks with a practical size in order to characterize complex feature relations [6]. For instance, CryptoNets takes about 300s to compute inference even on a simple three-layer CNN architecture. With the increase of depth, the computation time grows exponentially. Moreover, several key functions in neural networks (e.g., activation and pooling) are nonlinear. CryptoNets had to use Taylor approximation, e.g., replacing the original activation function with a square function. Such approximation leads to not only degraded accuracy compared with the original model, but also instability and failure in training.

Following CryptoNets, the past two years have seen a multitude of works aiming to improve the computation accuracy and efficiency (as summarized in Table 1). A neural network essentially consists of two types of computations, i.e., linear and nonlinear computations. The former focuses on matrix calculation to compute the dot product (for fully-connected dense layers) and convolution (for convolutional layers). The latter includes nonlinear functions such as activation, pooling and softmax. A series of studies have been carried out to accelerate the linear computation, the nonlinear computation, or both.

TABLE 1
Comparison of Privacy-Preserved Neural Networks.

Scheme | Scheme for Linear Computation | Scheme for Non-Linear Computation | Speedup over [17]
CryptoNets [17] | HE | HE (square approx.) | –
Faster CryptoNets [18] | HE | HE (polynomial approx.) | 10×
GELU-Net [19] | HE | Plaintext (no approx.) | 14×
E2DM [20] | Packed HE & Matrix optimization | HE (square approx.) | 30×
SecureML [21] | HE & Secret share | GC (piecewise linear approx.) | 60×
Chameleon [22] | Secret share | GMW & GC (piecewise linear approx.) | 150×
MiniONN [23] | Packed HE & Secret share | GC (piecewise linear approx.) | 230×
DeepSecure [24] | GC | GC (polynomial approx.) | 527×
SecureNN [25] | Secret share | GMW (piecewise linear approx.) | 1000×
FALCON [26] | Packed HE with FFT | GC (piecewise linear approx.) | 1000×
XONN [27] | GC | GC (piecewise linear approx.) | 1000×
GAZELLE [28] | Packed HE & Matrix optimization | GC (piecewise linear approx.) | 1000×
CHEETAH | Packed HE & Obscure matrix cal. | Obscure HE & SS (no approx.) | 100,000×

For example, Faster CryptoNets [18] leveraged sparse polynomial multiplication to accelerate the linear computation. It achieved about 10 times speedup over CryptoNets. SecureML [21], Chameleon [22] and MiniONN [23] adopted a similar design concept. Among them, MiniONN achieved the highest performance gain. It applied Secret Share (SS) for linear computation, and packed HE [30] to pre-share a noise vector between the client and server offline, in order to cancel the noise during secure online computation. In [23], non-linear functions were approximated by piece-wise linear segments, and computed by using Garbled Circuits (GC), which resulted in 230 times speedup over CryptoNets. DeepSecure [24] took an all-GC approach, i.e., implemented both linear and nonlinear computations using GC. It optimized the gates in the traditional GC module to achieve a speedup of 527 times over CryptoNets. Finally, GAZELLE [28] focused on the linear computation, to accelerate the matrix-vector multiplication based on packed HE, such that Homomorphic computations can be efficiently parallelized on multiple packed ciphertexts. GAZELLE demonstrated an impressive speedup of about 20 times compared with MiniONN and three orders of magnitude faster than CryptoNets. So far, GAZELLE is considered the state-of-the-art framework for secure inference computation.

Two recent works unofficially published on arXiv reported new designs that achieved computation speed at the same order of magnitude as GAZELLE. FALCON [26] leveraged the fast Fourier Transform (FFT) to accelerate linear computation. Its computing speed is similar to GAZELLE, while the communication cost is higher. SecureNN [25] adopted a design philosophy similar to Chameleon and MiniONN, but exploited a 3-party setting to accelerate the secure computation, obtaining a 4 times speedup over GAZELLE at the cost of using a semi-trusted third party. Additionally, XONN [27] worked in line with DeepSecure to explore the GC-based design for Binary Neural Networks (BNN), achieving up to 7 times speedup over GAZELLE, at the cost of an accuracy drop due to the binary quantization in BNN.

In addition, a few approaches were introduced to not just improve computation efficiency but also provide other desirable features. For example, GELU-Net [19] aims to avoid approximation of non-linear functions. It partitioned computation onto non-colluding parties: one party performs the linear computations on encrypted data, and the other executes the nonpolynomial computation in an unencrypted but secure manner. It showed over 14 times speedup over CryptoNets and does not have accuracy loss. E2DM [20] aimed to encrypt both data and neural network models, assuming the latter are uploaded by users to untrusted clouds. It focused on matrix optimization by combining Homomorphic operation and ciphertext permutation, demonstrating 30 times speedup over CryptoNets.

1.2 Contribution of This Work

Despite the fast and promising improvement in computation speed, there is still a significant performance gap to apply privacy-preserved neural networks in practical applications. The time constraints in many real-time applications (such as speech recognition in Alexa and Google Assistant) are within 10 seconds [31], [32]; self-driving cars even demand an immediate response of less than a second [33]. In contrast, our benchmark has shown that GAZELLE, which has achieved the best performance so far in terms of
3 speed among existing schemes, takes 161s and 1731s to run solutions rely on piece-wise or polynomial approximation the well-known practical deep neural networks AlexNet [5] for nonlinear functions such as activation. This leads to and VGG-16 [6], which renders it impractical in real-world degraded accuracy and the accuracy loss is often significant. applications. The proposed scheme takes a secret sharing approach with In this paper, we propose CHEETAH, an ultra-fast, 0-multiplicative-depth packed HE to avoid the use of com- secure MLaaS framework that features a carefully crafted putationally expensive GC. A novel design is developed to secret sharing scheme to enable efficient, joint linear and allow the server and client to each obtain a share of Homo- nonlinear computation, so that it can run significantly faster morphic encrypted nonlinear transformation result based than the state-of-the-art schemes. It eliminates the need on the obscure linear transformation as discussed above. to use approximation for nonlinear computations; hence, This approach eliminates the need to use approximation for unlike the existing schemes, CHEETAH does not have accu- nonlinear functions and achieves enormous speedup. For racy loss. It, for the first time, reduces the computation delay example, it is 1793 times faster than GAZELLE in comput- to milliseconds and thus enables a wide range of practical ing the most common nonlinear ReLu activation function, applications to utilize privacy-preserved deep neural net- under the output dimension of 10K. works. To the best of knowledge, this is also the first work Overall, the proposed CHEETAH is an ultra-fast privacy- that demonstrates privacy-preserved inference based on the preserved neural network inference framework without well-known, practical deep architectures such as AlexNet accuracy loss. It enables obscure neural computing that and VGG. intrinsically merges the calculation of linear and nonlinear The significant performance improvement of CHEETAH transformations and effectively reduces the computation stems from a creative design, called joint obscure neural time. We benchmark the performance of CHEETAH with computing. Computations in neural networks follow a se- well-known deep networks for secure inference. Our results ries of operations alternating between linear and nonlinear show that it is 218 and 334 times faster than GAZELLE, transformations for feature extraction. Each operation takes respectively, for a 3-layer and a 4-layer CNN used in pre- the output from the previous layer as the input. For exam- vious works. It achieves a significant speedup of 130 and ple, the nonlinear activation is computed on the weighted 140 times, respectively, over GAZELLE in the well-known, values of linear transformations (i.e., the dot product or practical deep networks AlexNet and VGG-16. Compared convolution). All existing approaches discussed in Sec. 1.1 with CryptoNets, CHEETAH achieves a speedup of five essentially follow the same framework, aiming to securely orders of magnitudes. compute the results for each layer and then propagate to The rest of the paper is organized as follows. Section 2 in- the next layer. This seemingly logic approach, however, troduces the system and threat models. Section 3 elaborates becomes the fundamental performance hurdle as revealed the system design of CHEETAH, followed by the security by our analysis. analysis in Section 4. 
Experimental results are discussed in First, although matrix computation has been deeply op- Section 5. Finally, Section 6 concludes the paper. timized based on packed HE for the linear transformation in the state-of-the-art GAZELLE, it is still costly. The com- 2 S YSTEM AND T HREAT M ODELS putation time of the linear transformation is dominated by In this section, we introduce the overall system architecture the operation called ciphertext permutation (or Perm) [28], and threat model, as well as the background knowledge which generates the sum based on a packed vector. It is about cryptographic tools used in our design. required in both convolution (for a convolutional layer) and dot product (for a dense layer). From our experiments, one Perm is 56 times slower than one Homomorphic addition 2.1 System Model and 34 times slower than one Homomorphic multiplica- We consider a MLaaS system as shown in Fig. 1. The client is tion. We propose an approach to enable an incomplete the party that generates or owns the private data. The server (or obscure) linear transformation result to propagate to is the party that has a well-trained deep learning model and the next nonlinear transformation as the input to continue provides the inference service based on the client’s data. For the neural computation, reducing the number of ciphertext example, a doctor performs a chest X-ray for her patient and permutations to zero in both convolution and linear dot sends the X-ray image to the server on the cloud, which runs product computation. the neural network model and returns the inference result Second, most existing schemes (including GAZELLE) to assist the doctor’s diagnosis. adopted GC to compute the nonlinear transformation (such While various deep learning techniques can be em- as activation, pooling and softmax), because GC generally ployed to enable MLaaS, we focus on the Convolutional performs better than HE when the multiplicative depth is Private chest X-ray greater than 0 (i.e., nonlinear) [28]. However, the GC-based approach is still costly. The overall network must be repre- Client Server sented as circuits and involves interactive communications between two parties to jointly evaluate neural functions over their private inputs. The time cost is often significant for large and deep networks. Specifically, our benchmark Inference result: shows that it takes about 263 seconds to compute a nonlin- Pneumonia ear ReLu function with 3.2M input values, which is part of the VGG-16 framework [6]. Moreover, all existing GC-based Fig. 1. An overview of the MLaaS system.
Neural Network (CNN), which has achieved wide success and demonstrated superior performance in computer vision tasks such as image classification [5], [6] and face recognition [7]. A CNN consists of a stack of layers that learn a complex relation among the input data, e.g., the relations between pixels of an input image. It operates on a sequence of linear and nonlinear transformations to infer a result, e.g., whether an input medical image indicates that the patient has pneumonia. The linear transformations come in two typical forms: dot product and convolution. The nonlinear transformations leverage activations such as the Rectified Linear Unit (ReLu) to approximate complex functions [34] and pooling (e.g., max pooling and mean pooling) for dimensionality reduction. A CNN repeats the linear and nonlinear transformations recursively to reduce the high-dimensional input data to a low-dimensional feature vector for classification at the fully connected layer. Without losing generality, we use image classification as an example in the following discussion, aiming to provide a lucid understanding of the CNN architecture as illustrated in Fig. 2.

Convolutional Layer. As shown in Fig. 2(b), the input to a convolutional layer has the dimensions wi × hi × ci, where wi and hi are the width and height of the input feature map and ci is the number of feature maps (or channels). For the first layer, the feature maps are simply the input images. Hereafter, we use the subscript i to denote input and o to denote output. The input is convolved with co groups of kernels. The size of each group of kernels is kp × kq × ci, in which kp and kq are the width and height of the kernel. The number of channels of the kernel group must match that of the input, i.e., ci. The convolution produces the feature output, with a size of wo × ho × co. More specifically, the (m, n)-th element in the t-th (1 ≤ t ≤ co) output feature is calculated as follows:

z(m, n, t) = Σ_{j=1}^{ci} Σ_{u=0}^{kp−1} Σ_{v=0}^{kq−1} k(u, v, j, t) x(m − u, n − v, j),   (1)

where k and x are the kernel and input, respectively. For ease of description, we omit the bias in Eq. (1). Nevertheless, it can be easily transformed into the convolution or weight matrix multiplication [35].

Fig. 2. A three-layer CNN: (a) overall network structure, (b) convolutional layer, (c) pooling, (d) fully connected layer.

The last convolutional layer is typically connected with the fully-connected layer, which computes the weighted sum, i.e., a dot product between the weight matrix w of size no × ni and a flattened feature vector of size ni × 1. The output is a vector of size no × 1. Each element of the output vector is calculated below:

z(i) = Σ_{j=1}^{ni} w(i, j) x(j).   (2)

Activation. Nonlinear activation is applied to convolutional and weighted-sum outputs in an elementwise manner. In this work, we mainly target the ReLu activation function, f(x) = max{0, x}, which is widely adopted in state-of-the-art neural networks such as AlexNet [5] and VGG-16 [6].

Pooling. Pooling conducts downsampling to reduce dimensionality. In this work, we consider mean pooling, which is implemented in CryptoNets and also commonly adopted in state-of-the-art CNNs. It splits a feature map into regions and averages the regional elements. Compared to max pooling (another pooling function, which selects the maximum value in each region), the authors in [36] have claimed that while the max and mean pooling functions are rather similar, the use of mean pooling encourages the network to identify the complete extent of the object, which builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image.
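To make the four building blocks above concrete, the following plaintext C++ sketch evaluates a single-channel variant of the convolution in Eq. (1), the weighted sum of Eq. (2), ReLu, and mean pooling on small float matrices. It is purely illustrative and is not taken from the CHEETAH implementation: the function names are ours, there is no encryption, no bias, a single channel, and no zero padding (so the input must be at least as large as the kernel, unlike the padded 2 × 2 / 3 × 3 toy example used later in the paper).

```cpp
#include <vector>
#include <algorithm>

using Mat = std::vector<std::vector<float>>;

// Single-channel, padding-free variant of Eq. (1):
// z[m][n] = sum_{u,v} k[u][v] * x[m+u][n+v]   (input must be >= kernel size).
Mat conv2d(const Mat& x, const Mat& k) {
    int ho = (int)x.size() - (int)k.size() + 1;
    int wo = (int)x[0].size() - (int)k[0].size() + 1;
    Mat z(ho, std::vector<float>(wo, 0.0f));
    for (int m = 0; m < ho; ++m)
        for (int n = 0; n < wo; ++n)
            for (size_t u = 0; u < k.size(); ++u)
                for (size_t v = 0; v < k[0].size(); ++v)
                    z[m][n] += k[u][v] * x[m + u][n + v];
    return z;
}

// Eq. (2): weighted sum of a fully-connected layer, z(i) = sum_j w(i,j) x(j).
std::vector<float> dense(const Mat& w, const std::vector<float>& x) {
    std::vector<float> z(w.size(), 0.0f);
    for (size_t i = 0; i < w.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            z[i] += w[i][j] * x[j];
    return z;
}

// ReLu activation f(x) = max{0, x}, applied elementwise.
float relu(float v) { return std::max(0.0f, v); }

// Mean pooling over non-overlapping s x s regions.
Mat mean_pool(const Mat& a, int s) {
    Mat out(a.size() / s, std::vector<float>(a[0].size() / s, 0.0f));
    for (size_t m = 0; m < out.size(); ++m)
        for (size_t n = 0; n < out[0].size(); ++n) {
            float sum = 0.0f;
            for (int u = 0; u < s; ++u)
                for (int v = 0; v < s; ++v)
                    sum += a[m * s + u][n * s + v];
            out[m][n] = sum / (s * s);
        }
    return out;
}
```

A CNN layer in this plaintext view is simply conv2d (or dense) followed by relu on every element, optionally followed by mean_pool; the rest of the paper is about performing exactly these steps when x is encrypted and k, w are private.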
2.2 Threat Model

Similar to [21], [23], [24], [28], we adopt the semi-honest model, in which both parties try to learn additional information from the messages received (assuming they have bounded computational capability). That is, the client C and server S will follow the protocol, but C wants to learn the model parameters and S attempts to learn the data. Hence, the goal is to make the server oblivious of the private data from the clients, and to prevent the client from learning the model parameters of the server. We prove that the proposed framework is secure under semi-honest corruption using ideal/real security [37]. Our framework targets protecting clients' sensitive data and service providers' models, which have been trained by the service providers with significant resources (e.g., private training data and computing power). Protecting a model is usually sufficient through protecting the model parameters, which are the most critical information for a model. Moreover, many applications are even built on well-known deep network structures such as AlexNet [5], VGG16/19 [6] and ResNet50 [38]. Hence it is typically not necessary to protect the structure (number of layers, kernel size, etc.). In the case that the implemented structure is proprietary and has to be protected, service providers can introduce redundant layers and kernels to hide the real structure [23], [28].

There is also an array of emerging attacks on the security and privacy of neural networks [13], [14], [16], [39], [40], [41]. They can be further classified by the processes that they target: training, inference (model) and input.
5 (1) Training. The attack in [16] attempts to steal the hy- homomorphic addition (Add), multiplication (Mult) and perparameters during training. The membership inference permutation (Perm). Add([x],[y]) outputs a ciphertext [x+y] attack [14] wants to find out whether an input belongs to which encrypts the elementwise sum of x and y . Mult([x],u) the training set based on the similarities between models outputs a ciphertext [x ◦ u] which encrypts the elementwise that are privately trained or duplicated by the attacker. This multiplication of x and plaintext u. It is worth pointing paper focuses on the inference stage and does not consider out that CHEETAH is designed to require multiplication such attacks in training, since the necessary variables for between a ciphertext and a plaintext only, but not the launching these attacks have been released in memory and much more expensive multiplication between two cipher- the training API is not provided. texts. Perm([x]) permutes the n elements in [x] into another (2) Model. The model extraction attack [13] exploits the ciphertext [xπ ], where xπ = (x(π 0 ), x(π 1 ), · · · ) and πi is a linear transformation at the inference stage to extract the permutation of {0, 1, · · · , n − 1}. model parameters and the model inversion attack [39] at- The run-time complexities of Add and Mult are signifi- tempts to deduce the training sets by finding the input that cantly lower than Perm. From our experiments, one Perm is maximizes the classification probability. The success of these 56 times slower than one Add and 34 times slower than one attacks requires full knowledge of the softmax probability Mult. This observation motivates the design of CHEETAH, vectors. To mitigate them, the server can return only the which completely eliminates permutations in convolution predicted label but not the probability vector or limits and dot product transformations, thus substantially reduc- the number of queries from the attacker. The Generative ing the overall computation time. Adversarial Networks (GAN) based attacks [40] can recover It is worth pointing out that neural networks always the training data by accessing the model. In this research, deal with floating point numbers while the PHE is in the since the model parameters are successfully protected from integer domain. Specifically, neural networks typically use the clients, this attack can be defended effectively. real number arithmetic, not modular arithmetic. On the (3) Input. A plethora of attacks adopt adversarial exam- other hand, direct increasing plaintext modulus in PHE ples by adding a small perturbation to the input in order to increases noise budget consumption, and also decreases the cause the neural network to misclassify [41]. Since rational initial noise budget, which causes limited Homomorphic clients pay for prediction services, it is not of their interest to operations. As for the original floating point numbers in obtain an erroneous output. Thus, this attack does not apply neural networks, they are firstly quantized into 8-bit signed in our framework. integers with fix-point encoding. As for the transforma- tion from fix-point number to integer, our implementation 2.3 Cryptographic Tools adopts the highly efficient encoding for BFV in Microsoft The proposed privacy-preserved deep neural network SEAL library [44] to establish a mapping from real numbers framework, i.e., CHEETAH, employs two fundamental in neural network to plaintext elements in PHE. 
This makes cryptographic tools as outlined below. real number arithmetic workable in PHE without data over- (1) Packed Homomorphic Encryption. Homomorphic En- flow. Thereafter, our design is described in floating point cryption (HE) is a cryptographic primitive that supports domain with real number input. meaningful computations on encrypted data without the (2) Secret Sharing. In the secret sharing protocol, a value decryption key. It has found increasing applications in data is shared between two parties, such that combining the two communication, storage and computation [42]. Traditional secrets yields the true value [22]. In order to additively HE operates on individual ciphertext [19], while the packed share a secret m, a random number, s, is selected and two homomorphic encryption (PHE) enables packing of multiple shares are created as hmi0 = s and hmi1 = m − s. Here, values into a single ciphertext and performs component- m can be either plaintext or ciphertext. A party that wants wise homomorphic computation in a Single Instruction to share a secret sends one of the shares to the other party. Multiple Data (SIMD) manner [43] to take the advantages To reconstruct the secret, one needs to only add two shares of parallelism. Among various PHE techniques, our work m = hmi0 + hmi1 . builds on the private-key Brakerski-Fan-Vercauteren (BFV) While the overall idea of secret share (SS) is straightfor- scheme [30], which involves four parameters1: 1) ciphertext ward, creative designs are often required to enable its ef- modulus q , 2) plaintext modulus p, 3) number of ciphertext fective application in practice, because in many applications slots n, and 4) a Gaussian noise with a standard deviation σ . the two parties need to perform complex nonlinear compu- The secure computation involves two parties, i.e., the client tation on their respective shares and thus it is non-trivial to C and server S . reconstruct the final result based on the computed shares. In PHE, the encryption algorithm encrypts a plaintext Due to this fundamental hurdle, the existing approaches message vector x from Zn into a ciphertext [x] with n discussed in Sec. 1.1 predominately chose to use GC, instead slots. We denote [x]C and [x]S as the ciphertexts encrypted of SS, to implement the nonlinear functions. However, GC is by client C and server S , respectively. The decryption al- computationally costly for large input [24], [28], [45]. Specifi- gorithm returns the plaintext vector x from the cipher- cally, our benchmark shows that GC takes about 263 seconds text [x]. Computation can be performed on the cipher- to compute a nonlinear ReLu function with 3.2M input text. In a general sense, an evaluation algorithm inputs values, which is part of the VGG-16 framework [6]. In this several ciphertexts [x1 ], [x2 ], · · · and outputs a ciphertext work, we propose a creative PHE-based SS for CHEETAH [x′ ] = f ([x1 ], [x2 ], · · · ). The function f is constructed by to implement secret nonlinear computation, which requires only 21 round communication for each nonlinear function, 1. The readers are referred to [28] for more detail. thus achieving multiple orders of magnitude reduction of
the computation time. For example, CHEETAH achieves a speedup of 1793 times over GAZELLE in computing the nonlinear ReLu function.

3 DESIGN OF PRIVACY PRESERVED INFERENCE

A neural network is organized into layers. For example, a CNN consists of convolutional layers and fully-connected dense layers. Each layer includes a linear transformation (i.e., a weighted sum for a fully-connected dense layer or a convolution for a convolutional layer), followed by a nonlinear transformation (such as activation and pooling). All existing schemes intend to securely compute the results of the linear transformation first, and then perform the nonlinear computation. Although it appears logical, such a design leads to a fundamental performance bottleneck as discussed in Sec. 1. The proposed approach, CHEETAH, is based on a creative design, named joint obscure neural computing, which only computes a partial linear transformation output and uses it to complete the nonlinear transformation. It achieves several orders of magnitude speedup compared with existing schemes.

We introduce the basic idea of CHEETAH via a simple example based on a two-layer CNN (with a convolutional layer and a dense layer), which can be formulated as follows:

z = w · f(k ∗ x),   (3)

where f(·) is the activation function, x is the 2 × 2 input data, k is a 3 × 3 kernel for the convolutional layer, ∗ stands for convolution and w is the weight matrix for the dense layer:

x = [ x(1,1)  x(1,2) ]
    [ x(2,1)  x(2,2) ],

k = [ k(1,1)  k(1,2)  k(1,3) ]
    [ k(2,1)  k(2,2)  k(2,3) ]
    [ k(3,1)  k(3,2)  k(3,3) ],  and

w = [ w(1,1)  w(1,2)  w(1,3)  w(1,4) ]
    [ w(2,1)  w(2,2)  w(2,3)  w(2,4) ].

Note that while we use the simple two-layer CNN to lucidly describe the main idea, CHEETAH is applicable to any neural network with any layer structure and input data size. In the rest of this section, we first present CHEETAH for a Single Input Single Output (SISO) convolutional layer and then discuss the cases of Multiple Input Multiple Output (MIMO) convolution and fully connected dense layers.

3.1 SISO Convolutional Layer

The process of convolution can be visualized as placing the kernel at different locations of the input data. At each location, an element-wise sum of products is computed between the kernel and the corresponding data values. If the convolution of the above example, i.e., k ∗ x, is computed in plaintext, the result, denoted as Con, should include four elements, Con = [Con1, Con2, Con3, Con4]:

Con1: k(2,2)x(1,1) + k(2,3)x(1,2) + k(3,2)x(2,1) + k(3,3)x(2,2),
Con2: k(2,1)x(1,1) + k(2,2)x(1,2) + k(3,1)x(2,1) + k(3,2)x(2,2),
Con3: k(1,2)x(1,1) + k(1,3)x(1,2) + k(2,2)x(2,1) + k(2,3)x(2,2),
Con4: k(1,1)x(1,1) + k(1,2)x(1,2) + k(2,1)x(2,1) + k(2,2)x(2,2).

In the problem setting of secure MLaaS (as introduced in Sec. 2), the client C owns the data x, while the server S owns the CNN model (including k and w). The goal is to ensure that the server does not have access to x and the client cannot learn the server's model parameters. To this end, in GAZELLE, C encrypts x into [x]C using HE and sends it to S. In the following discussion, both server and client use private-key BFV encryption [30]. The subscript [·]C denotes a ciphertext encrypted by the client's private key, while [·]S denotes a ciphertext encrypted by the private key of the server.

S performs HE computation to calculate the convolution k ∗ [x]C. To accelerate the computation, packed HE is employed. For example, to compute the first element of the convolution (i.e., Con1), a single ciphertext can be created to contain the vector [x(1,1), x(1,2), x(2,1), x(2,2)]C. On the other hand, a packed plaintext vector is created for [k(2,2), k(2,3), k(3,2), k(3,3)]. The packed HE supports the computation of element-wise multiplication between the two vectors in a single operation, yielding a single ciphertext for the vector [k(2,2)x(1,1), k(2,3)x(1,2), · · · , k(3,3)x(2,2)]C. However, we still need to add the vector's elements together to compute Con1. Since the vector is in a single ciphertext, direct addition is not possible. GAZELLE uses permutation (Perm) to compute the sum [28]. For example, given a ciphertext that has four elements, it is first permuted such that the last two elements are moved to the first two slots in the ciphertext. Then the permuted ciphertext is added to the original counterpart, which results in a ciphertext whose first two elements are the partial sums of the four elements. Then that added ciphertext is permuted such that its second element is moved to the first slot. The sum of the four elements is obtained by adding the permuted ciphertext and the non-permuted one. The resultant sum is in the first slot of the final ciphertext.
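The rotate-and-add pattern just described can be mimicked on a plaintext slot vector: each round rotates the vector and adds it to itself, halving the number of partial sums, so a power-of-two block of 4 products needs 2 rotations before the full sum sits in slot 0. In GAZELLE each of these rotations is a ciphertext Perm; the sketch below (ours, not GAZELLE code) only simulates the slot movement with std::rotate to show where the log2(block-size) permutations per output come from.

```cpp
#include <vector>
#include <algorithm>
#include <iostream>

// Simulate GAZELLE-style summation of one packed block (power-of-two length).
// Every round rotates the slot vector by half the remaining block size and adds
// it to the original; on ciphertexts each rotation would be one costly Perm.
float rotate_and_add_sum(std::vector<float> slots) {
    for (size_t step = slots.size() / 2; step >= 1; step /= 2) {
        std::vector<float> rotated(slots);
        std::rotate(rotated.begin(), rotated.begin() + step, rotated.end());
        for (size_t i = 0; i < slots.size(); ++i)
            slots[i] += rotated[i];   // Add() of the original and the permuted copy
    }
    return slots[0];                  // after log2(len) rounds the sum is in slot 0
}

int main() {
    // Elementwise products k(2,2)x(1,1), k(2,3)x(1,2), k(3,2)x(2,1), k(3,3)x(2,2)
    std::vector<float> block = {0.5f, -1.0f, 2.0f, 0.25f};
    std::cout << "Con1 = " << rotate_and_add_sum(block) << "\n";  // prints 1.75
}
```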
However, computing the sum using Perm is costly, with a complexity of O(r²) for convolution and O(log(n/no) + ni·no/n) for the weighted sum in the dense layer, where no, ni and r are the output dimension, input dimension, and kernel size, respectively. From our experiments, one Perm is 56 times slower than one Add and 34 times slower than one Mult.

In this paper, we propose a novel idea to enable an incomplete (or obscure) linear transformation result to propagate to the next nonlinear transformation to continue the neural computation, thus eliminating the need for ciphertext permutations. The overall design is motivated by the double-secret scheme for solving linear systems of equations [46]. Our scheme is illustrated in Fig. 3.

(1) Packed HE Encryption. C and S transform the data x and kernel k into x′ and k′, respectively, as follows:

x′ = [x(1,1), x(1,2), x(2,1), x(2,2), x(1,1), x(1,2), x(2,1), x(2,2),
      x(1,1), x(1,2), x(2,1), x(2,2), x(1,1), x(1,2), x(2,1), x(2,2)],

k′ = [k(2,2), k(2,3), k(3,2), k(3,3), k(2,1), k(2,2), k(3,1), k(3,2),
      k(1,2), k(1,3), k(2,2), k(2,3), k(1,1), k(1,2), k(2,1), k(2,2)].

As illustrated in Fig. 4, four convolutional blocks are computed. For example, the first convolutional block computes x(1,1) × k(2,2) + x(1,2) × k(2,3) + x(2,1) × k(3,2) + x(2,2) × k(3,3). The elements in each convolutional block are sequentially extracted into a packed ciphertext [x′]C.
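The transformation from x and k to x′ and k′ can be written down directly: the client repeats the flattened 2 × 2 input once per convolution output, while the server extracts, for each output position, the kernel entries that overlap the input. The C++ sketch below reproduces the 16-element x′ and k′ listed above; the kernel indices are hard-coded for this specific 2 × 2 / 3 × 3 example (it is our illustration, not a general im2col routine from the CHEETAH code base).

```cpp
#include <array>
#include <vector>

// Client side: x' repeats the flattened 2x2 input once per convolution output.
std::vector<float> build_x_prime(const std::array<std::array<float, 2>, 2>& x) {
    std::vector<float> xp;
    for (int block = 0; block < 4; ++block)   // one block per Con1..Con4
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                xp.push_back(x[i][j]);
    return xp;                                // 16 slots
}

// Server side: k' lists, per block, the kernel entries that overlap the input
// when the 3x3 kernel is placed at each of the four output positions.
std::vector<float> build_k_prime(const std::array<std::array<float, 3>, 3>& k) {
    // 1-based kernel indices copied from the k' layout above.
    const int idx[4][4][2] = {
        {{2, 2}, {2, 3}, {3, 2}, {3, 3}},     // block for Con1
        {{2, 1}, {2, 2}, {3, 1}, {3, 2}},     // block for Con2
        {{1, 2}, {1, 3}, {2, 2}, {2, 3}},     // block for Con3
        {{1, 1}, {1, 2}, {2, 1}, {2, 2}},     // block for Con4
    };
    std::vector<float> kp;
    for (int b = 0; b < 4; ++b)
        for (int e = 0; e < 4; ++e)
            kp.push_back(k[idx[b][e][0] - 1][idx[b][e][1] - 1]);
    return kp;                                // 16 slots, aligned with x'
}
// With these layouts, summing each 4-element block of the elementwise product
// x' ◦ k' yields Con1..Con4 without any slot permutation.
```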
Fig. 3. The overall design of CHEETAH. (Figure: message flow between Client C, holding (x, s1), and Server S, holding (k, w): offline, [s1]C and [ID1]S & [ID2]S are exchanged; online, C sends the encrypted data [x′]C, S returns the obscure linear result [x′ ◦ k′ ◦ v + b]C, C recovers the S-encrypted ReLu via Eq. (6) and returns the secret share for the server [f(k ∗ x) − s1]S, and S recovers the C-encrypted ReLu [f(k ∗ x)]C.)

Meanwhile, S also transforms the kernel k into k′ according to each convolutional block. Note that the transformation is completed offline. C encrypts x′ and sends [x′]C to S.

(2) Perm-free Secure Linear Computation. Upon receiving [x′]C, S performs the linear computation based on the client-encrypted data. A distinguished feature of the proposed design is to eliminate the costly permutations.

Let x′ ◦ k′ denote the elementwise multiplication between x′ and k′. As we can see, the sum of the four elements of each block in x′ ◦ k′ corresponds to one element of the convolution result. For example, the four elements of the first block, i.e., [x(1,1), x(1,2), x(2,1), x(2,2)] and [k(2,2), k(2,3), k(3,2), k(3,3)], correspond to Con1. The next block (i.e., [x(1,1), x(1,2), x(2,1), x(2,2)] and [k(2,1), k(2,2), k(3,1), k(3,2)]) corresponds to Con2, and so on and so forth.

S performs Mult([x′]C, k′) to obtain [x′ ◦ k′]C. The result is the client-encrypted elementwise multiplication between x′ and k′. But S does not intend to calculate the sum of each block to obtain the final convolution result as GAZELLE does, because it would need the costly permutations. Instead, it intends to let C decrypt [x′ ◦ k′]C to compute the sums in plaintext.

However, naively sending [x′ ◦ k′]C to the client would allow the client to obtain the neural network model information, i.e., k. To this end, S disturbs each element of the convolution result with a randomly multiplicative blinding factor. Specifically, S pre-generates a pair of random numbers that satisfy vi1 · vi2 = 1, for each i-th to-be-summed block in [x′ ◦ k′]C, where i ∈ {1, 2, 3, 4} in this example. S constructs the following vector v by using vi1:

v = [v11, v11, v11, v11, v21, v21, v21, v21, v31, v31, v31, v31, v41, v41, v41, v41],

which will be used to scramble [x′ ◦ k′]C by multiplying it with v before it is sent to C. Note that, as each individual element in the i-th four-element block is multiplied with the same factor (since v11, v21, v31, v41 are each repeated four times in v), it would leak the relative magnitude among those four elements in each block. To this end, S further constructs a noise vector as follows:

b = [b11, b12, b13, b14, b21, b22, b23, b24, b31, b32, b33, b34, b41, b42, b43, b44],

where the bij are random numbers subject to Σ_{j=1}^{4} bij = vi1 · δi, with δi uniformly distributed in [−ε, ε], where ε is a model parameter known to the server S.

At the same time, S uses vi2 to create the following vectors:

ID1 = [ID11, ID21, ID31, ID41],
ID2 = [ID12, ID22, ID32, ID42],

where (IDi1, IDi2) is a pair of polar indicators,

(IDi1, IDi2) = { (0, vi2),     if vi1 > 0
               { (vi2, −vi2),  if vi1 < 0.   (4)

S encrypts ID1 and ID2 by using packed HE. The encrypted values, i.e., [ID1]S and [ID2]S, will be sent to C for the nonlinear computation, as discussed later. Note that [ID1]S and [ID2]S can be transmitted to C offline, as vi1 and vi2 are pre-generated by S.
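The server-side masking can be sketched in plaintext C++ as follows: it draws vi1, sets vi2 = 1/vi1, expands v, draws the per-block noise bij so that each block sums to vi1 · δi with δi uniform in [−ε, ε], and fills ID1/ID2 according to Eq. (4). The concrete random-number choices (a normal distribution for vi1 and rejecting values near zero) are our own illustrative assumptions; the paper only requires vi1 · vi2 = 1 and the stated block-sum property of b.

```cpp
#include <vector>
#include <random>
#include <cmath>

struct Masks {
    std::vector<double> v;    // 16 slots: vi1 repeated over each block
    std::vector<double> b;    // 16 slots: noise, each block sums to vi1 * delta_i
    std::vector<double> id1;  // 4 slots: first polar indicator of Eq. (4)
    std::vector<double> id2;  // 4 slots: second polar indicator of Eq. (4)
};

Masks make_masks(double eps, std::mt19937& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::uniform_real_distribution<double> delta_dist(-eps, eps);
    std::uniform_real_distribution<double> unit(-1.0, 1.0);
    Masks m;
    for (int i = 0; i < 4; ++i) {
        double vi1 = 0.0;
        while (std::fabs(vi1) < 0.1) vi1 = gauss(rng);   // keep vi1 away from 0
        double vi2 = 1.0 / vi1;                          // enforce vi1 * vi2 = 1
        for (int j = 0; j < 4; ++j) m.v.push_back(vi1);

        // Noise block: three free terms, the fourth fixes the sum to vi1 * delta_i.
        double delta_i = delta_dist(rng);
        double target = vi1 * delta_i, partial = 0.0;
        for (int j = 0; j < 3; ++j) {
            double bij = unit(rng);
            m.b.push_back(bij);
            partial += bij;
        }
        m.b.push_back(target - partial);

        // Polar indicators of Eq. (4).
        if (vi1 > 0) { m.id1.push_back(0.0);  m.id2.push_back(vi2);  }
        else         { m.id1.push_back(vi2);  m.id2.push_back(-vi2); }
    }
    return m;
}
```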
Now, let us put all pieces together for the secure computation of convolution: C encrypts x′ and sends [x′]C to S. S pre-computes v ◦ k′ in plaintext and then multiplies the result with [x′]C to obtain [x′ ◦ k′ ◦ v]C. As we can see, the i-th convolution element (which corresponds to the sum of the i-th four-element block in [x′ ◦ k′ ◦ v]C) is actually multiplied with a random number vi1. Finally, S adds the noise vector b by Add([x′ ◦ k′ ◦ v]C, b) = [x′ ◦ k′ ◦ v + b]C. In this way, b disturbs each element of the convolution result (the sum of the four elements in each block) with a random noise δi, while v scales each noised element. Next, we will show that, although the convolution result at C is not explicitly calculated, the partial (obscure) result, i.e., [x′ ◦ k′ ◦ v + b]C, is sufficient to compute the nonlinear transformation (e.g., activation and pooling).

Fig. 4. Data transformation at client and server. (Figure: the 2 × 2 input x at C is repeated into the four convolutional blocks forming x′ at C, while the kernel k at S is rearranged into k′ at S.)

(3) PHE-based Secret Share for Non-Linear Transformation. S sends [x′ ◦ k′ ◦ v + b]C, [ID1]S and [ID2]S to C (note that [ID1]S and [ID2]S are transmitted to C offline).

C decrypts [x′ ◦ k′ ◦ v + b]C and sums up each four-element block in plaintext, yielding y = [y(1), y(2), y(3), y(4)]. It is not difficult to show that y(i) is vi1 times the disturbed convolution, i.e., y(i) = vi1 × (Coni + δi).

If C had the true convolution outcome, i.e., Coni, it would compute the ReLu function as follows:

fR(Coni) = { Coni,  if Coni ≥ 0
           { 0,     if Coni < 0.   (5)
However, C only has y(i) = vi1 × (Coni + δi). Since vi1 is a random number that could be positive or negative, it is infeasible to obtain the correct activation directly. Instead, C computes

Add(Mult([ID1]S, y), Mult([ID2]S, fR(y))).   (6)

We can show that the above calculation essentially recovers the server-encrypted ReLu function of Coni + δi, i.e., [f(k ∗ x + δ)]S where δ = {δi}. Since y(i) = vi1 × (Coni + δi), fR(y(i)) may yield four possible outputs, depending on the signs of vi1 and Coni + δi:

fR(y(i)) = { y(i),  if vi1 > 0 and (Coni + δi) ≥ 0
           { y(i),  if vi1 < 0 and (Coni + δi) < 0
           { 0,     if vi1 > 0 and (Coni + δi) < 0
           { 0,     if vi1 < 0 and (Coni + δi) ≥ 0.   (7)

For example, when vi1 > 0 and (Coni + δi) ≥ 0, we have ID1 = {0} according to Eq. (4) and thus Mult([ID1]S, y) = [0]S. On the other hand, ID2 = [v12, v22, v32, v42]. Since y(i) = vi1 × (Coni + δi), we have Mult([ID2]S, fR(y)) = [v12 v11 (Con1 + δ1), · · · , v42 v41 (Con4 + δ4)]S. Note that we have chosen vi1 vi2 = 1. Therefore, Eq. (6) yields [Con1 + δ1, Con2 + δ2, Con3 + δ3, Con4 + δ4]S. This is clearly the server-encrypted ReLu output of Con + δ. Similarly, we can examine the other cases of vi1 and Coni + δi in Eq. (7) and show that Eq. (6) always produces the server-encrypted ReLu outcome f(k ∗ x + δ). We will show in Sec. 5 that the ReLu function of the noised linear result introduces negligible accuracy loss to the neural networks, while δi and vi1 prevent the client from inferring the right Coni.

Subsequently, C creates a ReLu share s1 and computes the server's share as Add([f(k ∗ x + δ)]S, −s1) = [f(k ∗ x + δ) − s1]S. C sends it along with [s1]C (i.e., the client-encrypted share s1, which can be pre-generated by C) to S.

S decrypts [f(k ∗ x + δ) − s1]S to obtain a share of the plaintext activation result, i.e., f(k ∗ x + δ) − s1. It then computes Add([s1]C, f(k ∗ x + δ) − s1) to obtain [a]C = [f(k ∗ x + δ)]C, i.e., the client-encrypted nonlinear transformation result. Note that the introduction of δ does not affect the neural network performance, as shown in Sec. 5.

Till now, the computation of the current layer (including the linear convolution and nonlinear activation) is completed. The output of this layer (i.e., [f(k ∗ x + δ)]C) will serve as the input for the next layer. If the next layer is still a convolution, the server simply repeats the above process. Otherwise, if the next layer is a fully-connected dense layer, a similar approach can be taken, as discussed in Sec. 3.3.

Note that some CNN models employ pooling after activation to reduce its dimensionality. For example, mean pooling takes the activations as the input, which is divided into a number of regions. The averaged value of each region is used to represent that region. Both C and S can respectively average their activation shares (i.e., s1 and f(k ∗ x + δ) − s1) to obtain a share of the mean pooling result. Meanwhile, a similar scheme can be applied if the bias is included.
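The client-side algebra of Eqs. (5)-(7) can be checked with plaintext arithmetic. The sketch below (ours, for illustration) sums each block of the blinded product, applies fR, combines the result with the indicator vectors exactly as in Eq. (6), and then splits the recovered f(Coni + δi) into the two additive shares exchanged above. In the real protocol y and fR(y) are multiplied into the ciphertexts [ID1]S and [ID2]S, so the values below that appear in the clear are only stand-ins for the encrypted counterparts.

```cpp
#include <vector>
#include <random>
#include <algorithm>

// Client: sum each 4-element block of the decrypted x' ◦ k' ◦ v + b,
// giving y(i) = vi1 * (Con_i + delta_i).
std::vector<double> block_sums(const std::vector<double>& masked) {
    std::vector<double> y(masked.size() / 4, 0.0);
    for (size_t t = 0; t < masked.size(); ++t) y[t / 4] += masked[t];
    return y;
}

// Eq. (5): plain ReLu.
double f_r(double v) { return std::max(0.0, v); }

// Eq. (6): ID1 ◦ y + ID2 ◦ fR(y) recovers f(Con_i + delta_i) in all four sign
// cases of Eq. (7). Here id1/id2 are plaintext stand-ins for [ID1]S and [ID2]S.
std::vector<double> recover_relu(const std::vector<double>& y,
                                 const std::vector<double>& id1,
                                 const std::vector<double>& id2) {
    std::vector<double> out(y.size());
    for (size_t i = 0; i < y.size(); ++i)
        out[i] = id1[i] * y[i] + id2[i] * f_r(y[i]);
    return out;
}

// Split the recovered activation into the client share s1 and the server
// share f(Con + delta) - s1, as in the secret-sharing step above.
struct Shares { std::vector<double> client, server; };

Shares split_shares(const std::vector<double>& act, std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    Shares s;
    for (double a : act) {
        double s1 = dist(rng);
        s.client.push_back(s1);
        s.server.push_back(a - s1);   // adding the two shares reconstructs a
    }
    return s;
}
```

Tracing the four cases of Eq. (7) through recover_relu reproduces the argument above: when vi1 > 0, the id1 slot is 0 and the id2 slot is vi2, so the output is vi2 · fR(y(i)); when vi1 < 0, the two indicator terms either cancel or reduce to vi2 · y(i). In every case the result equals max{0, Coni + δi}.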
3.2 MIMO Convolutional Layer

The above SISO method can be readily extended to a MIMO convolutional layer in order to process multiple inputs simultaneously. Assume there are ci input data (i.e., x). Let cn be the number of input data that can be packed into one ciphertext. Recall that each x must be transformed to x′ as discussed in Sec. 3.1. Let co denote the number of kernels and r the size of each kernel. After transformation, the size of x′ is r² times that of the original x. Therefore, each ciphertext can hold cn/r² such transformed input data. Accordingly, the ci input data are transformed and encrypted into ci·r²/cn ciphertexts.

The remaining process for linear and nonlinear computation is similar to SISO, except that the computation on a ciphertext actually calculates multiple input data simultaneously and that the convolutions of all input ciphertexts based on one kernel are combined into one output ciphertext, yielding a total of co output ciphertexts. MIMO is obviously more efficient in processing batches of input data.

3.3 Fully-connected Dense Layer

In a fully-connected dense layer, S uses the output of the previous layer (i.e., [a]C) to compute the weighted sum. Taking the simple two-layer CNN as an example, the weighted sum computes

c1 = w(1,1)[a(1)]C + w(1,2)[a(2)]C + w(1,3)[a(3)]C + w(1,4)[a(4)]C,
c2 = w(2,1)[a(1)]C + w(2,2)[a(2)]C + w(2,3)[a(3)]C + w(2,4)[a(4)]C.

The computation of c1 and c2 is intrinsically the same as the computation of each convolution element (i.e., Con1, . . . , Con4) as discussed above.

3.4 Complexity Analysis

In this subsection, we analyze the computation and communication cost of CHEETAH and compare it with other schemes.

(1) Computation Complexity. The analysis of the computation complexity focuses on the number of ciphertext permutations (Perm), multiplications (Mult), and additions (Add). The notations to be used in the analysis are summarized as follows:

• n is the number of slots in a ciphertext.
• q is the ciphertext space.
• n log q is the number of bits of a ciphertext.
• ni is the input dimension of a fully connected layer.
• no is the output dimension of a fully connected layer.
• r is the kernel size.
• ci is the number of input data (channels) in MIMO.
• co is the number of kernels or the number of output feature maps in MIMO.
• cn is the number of input data that can be packed into one ciphertext.

In SISO, recall that a ciphertext [x′]C is first sent to S. S conducts one ciphertext multiplication and one addition to get [v ◦ k′ ◦ x′ + b]C. Then C receives [v ◦ k′ ◦ x′ + b]C, performs the decryption, and gets the summed convolution y in plaintext, which is followed by 2 multiplications and 1 addition to get the encrypted ReLu, according to Eq. (6). Finally, C does another addition, namely Add([f(k ∗ x + δ)]S, −s1), to generate S's ReLu share. S finally recovers the encrypted nonlinear result with another addition. Therefore, a total of 3 multiplications and 4 additions are required in SISO. The complexity is O(1).
9 In MIMO, C sends S ci r2 /cn ciphertexts. Then S per- TABLE 2 forms ci r2 /cn Mult and (ci r2 /cn − 1) Add to get an in- Comparison of computation complexity. complete ciphertext for each of co kernels. After that, each Method Perm Mult Add of co incomplete ciphertext is added with noise vector by GA-SISO O(r ) 2 2 O(r ) O(r 2 ) one addition. Then S sends those co cipheretxts to C , which CH-SISO 0 O(1) O(1) ci co r 2 ci co r 2 decrypts them and obtain co output features, creating co /cn IR-MIMO O(ci r 2 ) O( cn ) O( cn ) plaintext. Based on Eq. (6), C gets the encrypted ReLu with c c r2 ci co r 2 c c r2 OR-MIMO O( i con ) O( cn ) O( i con ) 2co /cn multiplications and co /cn additions, because each CH-MIMO 0 c co r 2 O( i cn ) c co r 2 O( i cn ) of co /cn plaintext associates with 2 multiplications and 1 NA-FC [28] O(no log ni ) O(no ) O(no log ni ) addition. Finally, C performs another addition on each of HS-FC [47] O(ni ) O(ni ) O(ni ) n n n n n n co /cn ReLu ciphertexts to generate the ReLu share for S . GA-FC O(log nno + in o ) O( in o ) O(log nno + in o ) n n n n CH-FC 0 O( in o ) O( in o ) S then gets its ReLu share by decryption and recovers the nonlinear result by co r2 /cn Add. Therefore, MIMO needs 2 (ci +1)co r 2 ( ci cconr + 2c cn ) multiplications and ( o cn + 2c cn ) addi- o changed. In the second transmission, since S can simulta- tions, both with the complexity of 2 O( ci cconr ). neously send each of co cipheretexts after each calculation, the actual communication cost is on transmitting the last In a fully-connected (FC) dense layer, S conducts ni no /n one of co ciphertexts. Thus, CHEETAH has a pipelined multiplications to get ni no /n intermediate ciphertext, where communication cost as ( ccni + 1)n log q bits. n is usually much larger than ni and no . After that, the In the FC layer, the two transmissions are 1) C sends S an zero-sum vector is added on each of ni no /n intermediate input ciphertext; 2) S sends C ni no /n cipheretexts. As each ciphertext to form [x′ ◦ w′ ◦ v + b′ ]C 2 which is sent to C . C of ni no /n cipheretexts can be simultaneously transmitted does the decryption and gets the summed result in plaintext. after each calculation, the actual communication cost is the Then C calculates the encrypted ReLu with 2 multiplications and 1 addition by Eq. (6). Finally, one addition is performed last one of ni no /n ciphertexts. The total pipelined cost is thus 2n log q bits. The quantitative communication compar- to generate the ReLu share for S , and S needs another Add ison to other approaches is given in Sec. 5. to recover the encrypted nonlinear result. So the FC layer needs ( ninno + 2) multiplications and ( ninno + 3) additions, resulting in the complexity of O( ninno ). 4 S ECURITY A NALYSIS Table 2 compares the computation complexity between CHEETAH and other schemes. Specifically, In the SISO We follow the ideal/real world paradigm [37], [48], [49] to case, CHEETAH (CH) has a constant complexity without prove the security of CHEETAH. We start with defining the permutation while GAZELLE (GA) has the complexity r2 . ideal functionality f OMI which captures the security prop- In the MIMO case, GAZELLE has two traditional options for erties we want to achieve for Outsourced MLaaS Inference. permutation, i.e., Input Rotation (IR) and Output Rotation Defintion 1. The ideal functionality f OMI of outsourced MLaaS (OR) [28]. 
CHEETAH eliminates the expensive permutation inference consists of the following parts: without incurring more multiplications and additions, thus - Input. The server sends model parameters M , e.g., kernel yielding a considerable gain. In the FC layer, we compare k ∈ M , to f OMI . The client sends private input x to f OMI . CHEETAH with a naive method (NA) in [28] (the base- - Computation. Upon receiving the model parameters from line of GAZELLE), Halevi-Shoup (HS) [47] and GAZELLE. server and the private input x from client, f OMI conducts Through the obscure matrix calculation, obscure HE and MLaaS inference by linear and nonlinear computation with x and secret share, CHEETAH further reduces the complexity of produces the nonlinear result f (x ∗ k) = ReLu(x ∗ k). addition by O(log nno ) compared to GAZELLE. In particular, - Output: The f OMI sends respective share of the nonlinear n is usually much larger than no , which makes this reduc- result f (x ∗ k) = ReLu(x ∗ k) to client and server. As for the tion significant. It is worth pointing out that CHEETAH last layer, the f OMI sends the obscure linear result to client with completes both the linear and nonlinear operations with one random number in v . the above complexity while the existing schemes such as Given the ideal functionality f OMI , we give the formal GAZELLE only finish the linear operation. security definition as follows. (2) Communication Complexity. In the SISO case, CHEE- TAH has two transmissions: 1) C sends the encrypted data Definition 2. A protocol Π securely computes the f OMI in the [x′ ]C to S ; 2) S sends [x′ ◦ k ′ ◦ v + b]C to C . Thus the semi-honest adversary setting with static corruption if it provides communication cost is 2n log q bits. Note that the third the following guarantees: transmission in Fig. 4 where C sends the encrypted ReLu - Corrupted server. We require that a corrupted and semi- share to S is the beginning of the next layer. honest server does not learn any information about the values Similarly, in MIMO, the two transmissions are 1) C sends in the client’s private input x. Formally, there should exist a S ci r2 /cn ciphertexts for ci input images; 2) S sends C co ci- Probabilistic Polynomial Time (PPT) simulator simS such that c pheretexts for co kernels. Note that, in the first transmission, viewSΠ ≈ simS (M , out), where viewSΠ denotes the view of the ci r2 /cn ciphertexts are transmitted at the first convolutional server in the real protocol execution (including the server’s input, layer while only ci /cn ciphertexts are needed in other layers. randomness, and the transcript of the protocol). simS (M , out) This is because the size of S -encrypted ReLu will not be is the simulation based on S ’s input, i.e., M , and its final output c ‘out’, e.g., the share of nonlinear function. The “≈” denotes 2. The structure of b′ is similar with b. “computationally indistinguishable”.
10 - Corrupted client. We require that a corrupted and semi- of simC (x, out) is computationally indistinguishable to honest client does not learn any information about the server’s the viewCΠ of the corrupted client. model parameters beyond some generic meta-parameters, i.e, the b) The case of last layer. simC 1) chooses an uniform number of input and output channels and the number of layers. random tape for the client; 2) sends private input x Formally, there should exist a PPT simulator simC such that to f ODT and gets the obscure linear result as out; 3) c viewCΠ ≈ simC (x, out), where viewCΠ denotes the view of the receives from client the C -encrypted input as [x]C ; 3) client in the real protocol execution (including the client’s input, enceypts out with client’s public key as [out]C ; 4) sends randomness, and the transcript of the protocol). simC (x, out) is [out]C to client and outputs whatever C outputs. Here the simulation based on C ’s input, i.e., x, and its final output the view of client in real protocol execution and the ‘out’, e.g., the share of nonlinear function. simulated counterpart is identical. So the output of simC (x, out) are computationally indistinguishable to Theorem 1. Our protocol provides a secure realization of the ideal the viewCΠ of the corrupted client. The proof of Theorem functionality f ODT according to Definition 2. 1 is completed. Proof. According to our security definition, we need to show a simulator for different corrupted parties i.e., the server and 5 P ERFORMANCE E VALUATION the client. We implement CHEETAH with C++ based on Microsoft - Simulator for the corrupted server: SEAL Library [44], and compare it with the best existing a) The case of intermediate layer. simS 1) chooses an scheme, GAZELLE3 . We use two workstations as the client uniform random tape for the server; 2) sends model and server. Both machines run Ubuntu with Intel i7-8700 parameters M to f ODT and gets the share of the 3.2GHz CPU with 12 threads and 16 GB RAM. The network nonlinear result as out; 2) randomly picks a public link between them is a Gigabit Ethernet. Recall that the four key pk and encrypts all-zero input as [0]simS ; 3) sends parameters in BFV scheme are: 1) ciphertext modulus q ; 2) [0]simS to server and receives the obscure linear result plaintext modulus p; 3) number of ciphertext slots n and 4) from the server; 4) encrypts out with S ’s public key as a Gaussian noise with a standard deviation σ . A larger q/p [out]S ; 5) sends [out]S to server and outputs whatever tolerates more noise. We set p to be a 20-bit number and q S outputs. Here the view of server in real protocol to be a 60-bit psuedo-Mersenne prime. The number of slots execution is the client-encrypted input and the share for the packed encryption is set to 10,000. of nonlinear function, while the simulated view is simS -encrypted input and the same share of nonlinear 5.1 Component-wise Benchmark function. On the one hand, the client-encrypted input and simS -encrypted input are indistinguishable due to We first examine the performance of each functional com- the semantic security of HE. On the other hand, the ponent including Conv, FC and ReLu. share of nonlinear function are identical in real and Convolution Benchmark. We define the time of the con- simulated execution. So the output of simS (M , out) is volution operation as the duration between S receives the computationally indistinguishable to the viewSΠ of the encrypted data or secret share from the previous layer (e.g., corrupted server. 
ReLu) till S completes the convolution computation, just before sending the (partial) convolution results to C . It does b) The case of last layer. simS 1) chooses an uniform not contain the communication time between S and C , such random tape for the server; 2) sends model parameters M to f ODT and gets the None as out; 2) randomly as transmitting the (partial) convolution results to C , or picks a public key pk and encrypts all-zero input as secret share to S , or in the case of GAZELLE, the time [0]simS ; 3) sends [0]simS to server and receives the for the HE to GC transformation between S and C for fair obscure linear result from the server. Here the view of comparison. All such communication time is accounted in ReLu and pooling discussed later. server in real protocol execution is the client-encrypted Table 3 benchmarks the convolution with different in- input while the simulated view is simS -encrypted in- put and kernel sizes. The ‘In rot’ and ‘Out rot’ indicate put. As the client-encrypted input and simS -encrypted two GAZELLE variants with the input or output rotation, input are indistinguishable due to the semantic security of HE, the output of simS (M , out) is computationally from which, one of them has to be used for convolution indistinguishable to the viewSΠ of the corrupted server. (see [28] for details). From Table 3, CHEETAH significantly outperforms GAZELLE. E.g., with the kernel size 5 × 5@5, - Simulator for the corrupted client: both the GAZELLE In rot and Out rot variants need more a) The case of intermediate layer. simC 1) chooses an than 25 Mult, 24 Add and 24 Perm operations to yield the uniform random tape for the client; 2) sends private in- result of convolution. In contrast, CHEETAH needs only 5 put x to f ODT and gets the share of the nonlinear result Mult and 5 Add operations, one for each kernel, to obtain as out; 2) receives from client the C -encrypted input as the (partial) convolution results. Those results are then [x]C ; 3) randomly forms a vector r and encrypts it with sent to C for computing ReLu (to be discussed). Overall, client’s public key as [r]C ; 4) sends [r]C to client and CHEETAH accomplishes a speedup of 247 and 207 times receives the S -encrypted share of nonlinear function for compared with the GAZELLE In rot and Out rot variants, server. Here the view of client in real protocol execution respectively, for the case with the kernel size 5 × 5@5 and is the obscure linear result, e.g., x′ ◦ k ′ ◦ v + b, while input data size 28 × 28@1. the simulated view is r. As the v , b and r are random, x′ ◦ k ′ ◦ v + b and r are indistinguishable. So the output 3. Available at: https://github.com/chiraag/gazelle mpc