SIDE-CHANNEL ANALYSIS OF THE XILINX ZYNQ ULTRASCALE+ ENCRYPTION ENGINE
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
SIDE-CHANNEL ANALYSIS OF THE XILINX ZYNQ ULTRASCALE+ ENCRYPTION ENGINE Benjamin Hettwer1, Sebastien Leger1, Daniel Fennes2 Stefan Gehrer3, and Tim Güneysu2 1Robert Bosch GmbH, Corporate Sector Research, Stuttgart, Germany 2HorstGörtz Institute for IT-Security, Ruhr University Bochum, Germany 3Robert Bosch LLC, Corporate Research, Pittsburgh, USA CHES 2021
Motivation FPGA Security Mostly based on Static Random-Access Memory (SRAM) Configuration file (i.e. bitstream) must be loaded at each power-up Bitstream has to be protected against duplication, manipulation and reverse engineering Bitstream encryption (and authentication) supported by many devices EDA Software FPGA NVM Bitstream 01010 11000 Configuration 01000 10100 110… Enc 000… Dec memory (SRAM) 2 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Motivation SCA on Bitstream Encryption Examples of successful extraction of bitstream decryption key Moradi et al., 2012 [1] => Xilinx Virtex-4, Virtex-5 Moradi et al., 2016 [2] => Xilinx Spartan-6, Kintex-7, Artix-7 Moradi et al., 2013 [3] => Altera Stratix II Swierczynski et al. 2015 [4] => Altera Stratix II, Stratix III General problem: No dedicated SCA countermeasures! 3 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Motivation Xilinx ZYNQ UltraScale+ On-chip encryption engine AES-256 in GCM mode (full hardware) Used for bitstream encryption and authentication Protocol-based countermeasure: key rolling Limits data collection for the adversary Xilinx gives no recommendation about a suitable block size ‒ I.e. how many AES blocks are encrypted under same key Goal: Find appropriate value for key rolling parameter Image Source: Xilinx Zynq UltraScale+ Device, Technical Reference Manual, v1.9 4 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
ZYNQ UltraScale+ Encryption Engine Assumptions Hardware root of trust authentication enabled (RSA) Only decryption of authenticated items (bitstream, software) can be measured No chosen ciphertext Attacker can record one specific bitstream decryption multiple times Averaging of traces => Increases Signal-to-Noise (SnR) ratio No masking countermeasure in AES core Focused on 1st order leakage Access to open copy of device => Profiling attacks possible 5 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
ZYNQ UltraScale+ Encryption Engine Setup Target: ZYNQ UltraScale+ evaluation board ZCU 102 (16 nm technology) Flip-chip packaging => Removed metal cap with a small driller Leakage vector: EM signal induced by on-chip decoupling capacitors Placed EM probe directly to decoupling capacitor related to AES power rail (VCCPSINTLP) 6 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
ZYNQ UltraScale+ Encryption Engine Setup (cont.) Measurement setup Target board clock frequency: 48 MHz Langer EMV probe ICR HV 500-75 placed on 4-axis positioning system Picoscope 6404 with sampling rate of 625 MS/s and 500 MHz bandwidth Averaging factor: 250 7 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
ZYNQ UltraScale+ Encryption Engine AES Hardware Architecture Black-box reverse engineering of AES-256 architecture Correlation with known key Steps Timing behavior of the core (14 clock cycles per enc.) => Round-based implementation Tried different leakage models with varying number of pipeline stages and registers Assumed architecture: 8 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
ZYNQ UltraScale+ Encryption Engine AES Hardware Architecture (cont.) Power Leakage : = [( ( || +1 ) ⊕ ( || +1 )] ‒ : Hamming weight ‒ : State of register in AES round ‒ : Initialization vector ‒ : Counter value after iterations Hamming Distance (HD) between the output of encryption in round , and the output of encryption + 1 in the same round 9 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Attack Procedure CTR Mode and Hardware Specific Attributes Red: Changes in AES state when only LSB of Counter (CTR) changes Only one byte out of 16 bytes is changing at each encryption and is leaking information HD Leakage in 1 only related to one key byte ( 15 ) Several rounds have to be considered to extract full 256-bit AES key 10 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Attack Procedure … over first five AES rounds 1: key byte ( 15 ) with 8-bit hypothesis 2 : Four constants ( 0 − 3 ) related to subkeys with 8-bit hypothesis Constants are only valid for 256 consecutive encryptions => upper bound 3 : 16 constants ( 0 − 15 ) with four attacks with 32-bit hypothesis Enables to calculate input vector of 4 for 256 consecutive encryptions 4 : 16 subkey bytes with four attacks with 32-bit hypothesis 5 : 16 subkey bytes with four attacks with 32-bit hypothesis 11 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Data Set 200.000 traces (after averaging) 75% profiling, 24% validation (to monitor training progress), 1% attack traces Random keys and IVs Each trace includes EM emanations for 256 successive AES-GCM encryptions ( 0 to 255) 15625 samples per trace 12 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Baseline Attacks Correlation Power Analysis (CPA) with Linear Discriminant Analysis (LDA) pre-processing LDA projects traces into a smaller dimension which maximizes intra-class variance Calculated LDA co-effients for each HD leakage per round attack using 50 sample points Location of samples has been determined by correlation with known key Attack traces compressed into single data point per leakage Challenge In theory 256 – 1 = 255 leakages First leakage for = 3 AES pipeline registers are filled with values from r + 3 every 4th clock cycle Only 190 encryptions generate exploitable leakage 13 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Baseline Attacks (cont.) Two power leakage models Regular HD Linear Regression (LR) – aka stochastic approach by Schindler et al. [5] Round 1 Round 2 Round 3 Round 4 Round 5 CPA-LDA 170* - - - - (HD Model) CPA-LDA 130* - - - - (LR Model) *Number of traces to reach a key rank of one 14 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Deep Learning Attacks - Concept Correlation Optimization (CO) scheme by Robyns et al. [6] Idea: Train a Deep Neural Network (DNN) to produce an encoding of the input data (i.e. the traces) that maximizes the Pearson correlation with a hypothetical power leakage Extension of classical CPA with an additional profiling/learning phase Specialized loss function: ℒ ( , ) = 1 − ( , ) DNN Corr. Why CO? Only 190 encryptions available => profiled attack CO automatically encodes traces into single value => faster attack phase (good for 32-bit hypothesis) Imbalanced data set due to binominal distribution produced by HW function ‒ Problem for Gaussian templates and standard deep learning-based attacks with cross-entropy loss function 15 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Deep Learning Attacks – Concept (cont.) CO extensions ( ) Bitwise Correlation Loss (CO-BIT) DNN Corr. ‒ Bit flips in registers are used as leakage labels (no HW) ( ) ( ) ‒ Sum of correlation for each bit used to calculate total loss: ( ) ℒ − , = 1 − σ Weighted-Bit Correlation (CO-W-BIT) DNN Corr. ‒ Adoption of stochastic approach ‒ Weight co-efficient for each bit => approximated by neuron ‒ Creates optimized encoding for leakage that is (1) 1 used in loss function: ℒ − − , = 1 − ( − , ) (2) 2 σ ( ) 16 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Deep Learning Attacks - Models (1) DNN 1 1 1 (2) DNN 2 DNN 2 (1) DNN 2 … 190 3 (190) DNN 190 Individual model per One model output per Triple output model leakage leakage Trade-off between training Shorter traces, but long Faster training, time and complexity training time more complex 17 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Deep Learning Attacks – Models (cont.) Implemented all three schemes Model per leakage: CO, CO-BIT, CO-W-BIT Output per counter leakage (All-CTR): CO Triple output model (3-CTR): CO Neural Network Types Multi-Layer Perceptron (MLP) ‒ 2 hidden layers (10/15 neurons) Convolutional Neural Network (CNN) ‒ 3 blocks of batch_norm + conv + max_pooling Attacks implemented in Python with Keras and Tensorflow frameworks 18 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Results – Round 1 Target: key byte 15 19 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Results – Round 2 Target: Four 8-bit constants ( 0 − 3 ) 20 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Results – Round 3 Target: 16 constants ( 0 − 15 ) with four attacks with 32-bit hypothesis Enables to calculate input vector of 4 for 256 consecutive encryptions Implemented attack phase on GPU using Nvidia CUDA 232 key guesses = 4 294 967 296 Reduced recovery phase from weeks to approx. 6h None of attacks were able to extract constants within 190 encryptions Key rank between 10.000 – 10.000.000 Complexity too high and SnR too low for 32-bit attack > 1000 encryptions needed in our setup 21 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Experiments Results – Round 4/5 Target: 16 subkey bytes with four attacks with 32-bit hypothesis Key recovery not successful Round 3 constants not available ~ 3000 encryptions needed 22 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Recommendation for Key Rolling Parameter Boot Time Effects Key rolling parameter impacts size of configuration image and boot time < 5 increases boot time by more than 400% > 40 increases boot time by less than 10% 23 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
Recommendation for Key Rolling Parameter Conclusion Complete extraction of the key has not been successful within 256 encryptions Why not just using a value of ~200? Parts of the key could be extracted with less than 50 encryptions Worst case scenario: Profiling attack on single device “Attacks always become better, never get worse” – Old NSA saying Lifetime of products can be 20 years or longer (e.g. in automotive) Recommendation: 20 - 30 Security margin against future attacks Reasonable boot time overhead (15 − 25%) 24 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
THANK YOU!
References [1] Moradi, A., Kasper, M., Paar, C.: Black-box side-channel attacks highlight the importance of countermeasures. In: Dunkelman, O. (ed.) CT-RSA 2012. LNCS, vol. 7178, pp. 1–18. Springer, Heidelberg (2012) [2] Moradi, A., Schneider, T.: Improved Side-Channel Analysis Attacks on Xilinx Bitstream Encryption of 5, 6, and 7 Series. In: Standaert, F.X., Oswald, E. (eds.) Constructive Side-Channel Analysis and Secure Design. pp. 71– 87. Springer International Publishing, Cham (2016) [3] Moradi, A., Oswald, D., Paar, C., Swierczynski, P.: Side-channel attacks on the bitstream encryption mechanism of AlteraStratix II: facilitating black-box analysis using software reverse-engineering. In: FPGA, pp. 91–100. ACM (2013) [4] Swierczynski, P., Moradi, A., Oswald, D., Paar, C.: Physical security evaluation of the bitstream encryption mechanism of Altera Stratix II and Stratix III FPGAs. TRETS 7(4), 34:1–34:23 (2015) [5] Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005) [6] Robyns, P., Quax, P., Lamotte, W.: Improving CEMA using Correlation Optimization. IACR Transactions on Cryptographic Hardware and Embedded Systems 2019(1), 1–24 (Nov 2018). https://doi.org/10.13154/tches.v2019.i1.1-24, https://tches.iacr.org/index.php/TCHES/article/view/7332 Hettwer et al. | Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine | CHES 2021 © Robert Bosch GmbH 2021. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.
You can also read