Swivel: Hardening WebAssembly against Spectre
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Swivel: Hardening WebAssembly against Spectre Shravan Narayan† Craig Disselkoen† Daniel Moghimi¶† Sunjay Cauligi† Evan Johnson† Zhao Gang† Anjo Vahldiek-Oberwagner? Ravi Sahita∗ Hovav Shacham‡ Dean Tullsen† Deian Stefan† † UC San Diego ¶ Worcester Polytechnic Institute ? Intel Labs ∗ Intel ‡ UT Austin Abstract in recent microarchitectures [41] (see Section 6.2). In con- arXiv:2102.12730v2 [cs.CR] 20 Mar 2021 We describe Swivel, a new compiler framework for hardening trast, Spectre can allow attackers to bypass Wasm’s isolation WebAssembly (Wasm) against Spectre attacks. Outside the boundary on almost all superscalar CPUs [3, 4, 35]—and, browser, Wasm has become a popular lightweight, in-process unfortunately, current mitigations for Spectre cannot be im- sandbox and is, for example, used in production to isolate plemented entirely in hardware [5, 13, 43, 51, 59, 76, 81, 93]. different clients on edge clouds and function-as-a-service On multi-tenant serverless, edge-cloud, and function as a platforms. Unfortunately, Spectre attacks can bypass Wasm’s service (FaaS) platforms, where Wasm is used as the way to isolation guarantees. Swivel hardens Wasm against this class isolate mutually distursting tenants, this is particulary con- of attacks by ensuring that potentially malicious code can nei- cerning:1 A malicious tenant can use Spectre to break out of ther use Spectre attacks to break out of the Wasm sandbox nor the sandbox and read another tenant’s secrets in two steps coerce victim code—another Wasm client or the embedding (§5.4). First, they mistrain different components of the under- process—to leak secret data. lying control flow prediction—the conditional branch predic- We describe two Swivel designs, a software-only approach tor (CBP), branch target buffer (BTB), or return stack buffer that can be used on existing CPUs, and a hardware-assisted (RSB)—to speculatively execute code that accesses data out- approach that uses extension available in Intel® 11th genera- side the sandbox boundary. Then, they reconstruct the secret tion CPUs. For both, we evaluate a randomized approach that data from the underlying microarchitectural state (typically mitigates Spectre and a deterministic approach that eliminates the cache) using a side channel (cache timing). Spectre altogether. Our randomized implementations impose One way to mitigate such Spectre-based sandbox breakout under 10.3% overhead on the Wasm-compatible subset of attacks is to partition mutually distrusting code into separate SPEC 2006, while our deterministic implementations impose processes. By placing untrusted code in a separate process overheads between 3.3% and 240.2%. Though high on some we can ensure that the attacker cannot access secrets. Chrome benchmarks, Swivel’s overhead is still between 9× and 36.3× and Firefox, for example, do this by partitioning different sites smaller than existing defenses that rely on pipeline fences. into separate processes [25, 64, 68]. On a FaaS platform, we could similarly place tenants in separate processes. 1 Introduction Unfortunately, this would still leave tenants vulnerable to WebAssembly (Wasm) is a portable bytecode originally de- cross-process sandbox poisoning attacks [12, 36, 50]. Specif- signed to safely run native code (e.g., C/C++ and Rust) in ically, attackers can poison hardware predictors to coerce a the browser [27]. Since its initial design, though, Wasm has victim sandbox to speculatively execute gadgets that access been increasingly used to sandbox untrusted code outside the secret data—from their own memory region—and leak it via browser. For example, Fastly and Cloudflare use Wasm to the cache (e.g., by branching on the secret). Moreover, using sandbox client applications running on their edge clouds— process isolation would sacrifice Wasm’s scalability (running where multiple client applications run within a single pro- many sandboxes within a process) and performance (cheap cess [30, 85]. Mozilla uses Wasm to sandbox third-party startup times and context switching) [28, 30, 65, 85]. C/C++ libraries in Firefox [21, 65]. Yet others use Wasm to The other popular approach, removing speculation within isolate untrusted code in serverless computing [28], IoT appli- the sandbox, is also unsatisfactory. For example, using cations [10], games [62], trusted execution environments [17], pipeline fences to restrict Wasm code to sequential execution and even OS kernels [79]. imposes a 1.8×–7.3× slowdown on SPEC 2006 (§5). Con- In this paper, we focus on hardening Wasm against Spec- servatively inserting pipeline fences before every dynamic tre attacks—the class of transient execution attacks which load—an approach inspired by the mitigation available in exploit control flow predictors [49]. Transient execution at- Microsoft’s Visual Studio compiler [61]—is even worse: it tacks which exploit features within the memory subsystem incurs a 7.3×–19.6× overhead on SPEC (§5). (e.g., Meltdown [57], MDS [11, 72, 84], and Load Value In- 1 Thoughour techniques are general, for simplicity we henceforth focus jection [83]) are limited in scope and have already been fixed on Wasm as used on FaaS platforms. 1
In this paper, we take a compiler-based approach to hard- To eliminate host poisoning attacks, we use both Intel® ening Wasm against Spectre, without resorting to process CET and Intel® MPK. In particular, we use Intel® MPK to par- isolation or the use of fences. Our framework, Swivel, ad- tition the application into two domains—the host and Wasm dresses not only sandbox breakout and sandbox poisoning sandbox(es)—and, on context switch, ensure that each domain attacks, but also host poisoning attacks, i.e., Spectre attacks can only access its own memory regions. We use Intel® CET that coerce the process hosting the Wasm sandboxes into leak- forward-edge control-flow integrity to ensure that application ing sensitive data. That is, Swivel ensures that a malicious code cannot jump, sequentially or speculatively, into arbitrary Wasm tenant cannot speculatively access data outside their sandbox code (e.g., due to a poisoned BTB). We do this by sandbox nor coerce another tenant or the host to divulge se- inserting endbranch instructions in Wasm sandboxes to de- crets of other sandboxes via poisoning. We develop Swivel marcate valid jump targets, essentially partitioning the BTB via three contributions: into two domains. Our use of Intel’s MPK and CET ensures that even if the host code runs with poisoned predictors, it 1. Software-only Spectre hardening (§3.2) Our first con- cannot read—and thus leak—sandbox data. tribution, Swivel-SFI, is a software-only approach to harden- ing Wasm against Spectre. Swivel-SFI eliminates sandbox Since Intel® MPK only supports 16 protection regions, we breakout attacks by compiling Wasm code to linear blocks cannot use it to similarly prevent sandbox poisoning attacks: (LBs). Linear blocks are straight-line x86 code blocks that serverless, edge-cloud, and FaaS platforms have thousands satisfy two invariants: (1) all transfers of control, including of co-located tenants. Hence, to address sandbox poisoning function calls, are at the block boundary—to (and from) other attacks, like for Swivel-SFI, we consider and evaluate two linear blocks; and (2) all memory accesses within a linear designs. The first (again) uses ASLR to randomize the loca- block are masked to the sandbox memory. These invariants tion of each sandbox and flushes the BTB on sandbox entry; are necessary to ensure that the speculative control and data we don’t flush on sandbox exit since the host can safely run flow of the Wasm code is restricted to the sandbox boundary. with a poisoned BTB. The second is deterministic and not They are not sufficient though: Swivel-SFI must also be tol- only allows using conditional branches but also avoids BTB erant of possible RSB underflow. We address this by (1) not flushes. It does this using a new technique, register interlock- emitting ret instructions and therefore completely bypassing ing, which tracks the control flow of the Wasm sandbox and the RSB and (2) using a separate stack for return addresses. turns every misspeculated memory access into an access to To address poisoning attacks, Swivel-SFI must still account an empty guard page. Register interlocking allows a tenant for a poisoned BTB or CBP. Since these attacks are more so- to run with a poisoned BTB or CBP since any potentially phisticated, we evaluate two different ways of addressing poisoned speculative memory accesses will be invalidated. them, and allow tenants to choose between them according to their trust model. The first approach uses address space 3. Implementation and evaluation (§4–5) We implement layout randomization (ASLR) to randomize the placement of both Swivel-SFI and Swivel-CET by modifying the Lucet each Wasm sandbox and flushes the BTB on each sandbox compiler’s Wasm-to-x86 code generator (Cranelift) and run- boundary crossing. This does not eliminate poisoning attacks; time. To evaluate Swivel’s security we implement proof of it only raises the bar of Wasm isolation to that of process iso- concept breakout and poisoning attacks against stock Lucet lation. Alternately, tenants can opt to eliminate these attacks (mitigated by Swivel). We do this for all three Spectre variants, altogether; to this end, our deterministic Swivel-SFI rewrites i.e., Spectre attacks that abuse the CBP, BTB, and RSB. conditional branches to indirect jumps—thereby completely We evaluate Swivel’s performance against stock Lucet and bypassing the CBP (which cannot be directly flushed) and several fence-insertion techniques on several standard bench- relying solely on the BTB (which can). marks. On the Wasm compatible subset of the SPEC 2006 2. Hardware-assisted Spectre hardening (§3.3) Our sec- CPU benchmarking suite we find the ASLR variants of ond contribution, Swivel-CET, restores the use of all predic- Swivel-SFI and Swivel-CET impose little overhead—they tors, including the RSB and CBP, and partially obviates the are at most 10.3% and 6.1% slower than Lucet, respectively. need for BTB flushing. It does this by sacrificing backwards Our deterministic implementations, which eliminate all three compatibility and using new hardware security extensions: categories of attacks, incur modest overheads: Swivel-SFI Intel’s Control-flow Enforcement Technology (CET) [39] and and Swivel-CET are respectively 3.3%–86.1% (geomean: Memory Protection Keys (MPK) [39]. 47.3%) and 8.0%–240.2% (geomean: 96.3%) slower than Like Swivel-SFI, Swivel-CET relies on linear blocks to Lucet. These overheads are smaller than the overhead im- address sandbox breakout attacks. But Swivel-CET does not posed by state-of-the-art fence-based techniques. avoid ret instructions. Instead, we use Intel® CET’s hardware shadow stack to ensure that the RSB cannot be misused to Open source and data We make all source and data avail- speculatively return to a location that is different from the able under an open source license at: https://swivel.pro expected function return site on the stack [39]. gramming.systems. 2
Wasm Client A Wasm Client B ... Wasm Client X the host context and resume execution in the host. FaaS host procees Trampoline Trampoline Trampoline Using Wasm in FaaS platforms WebAssembly FaaS plat- Springboard Requests forms like Fastly’s Terrarium allow clents to deploy scal- FaaS Runtime able function-oriented Web and cloud applications written in any langauge (that can be compiled to Wasm). Clients com- Figure 1: FaaS platform using Wasm to isolate mutually distrusting tenants. pile their code to Wasm and upload the resulting module to 2 A brief overview of Wasm and Spectre the platform; the platform handles scaling and isolation. As shown in Figure 1, FaaS platforms place thousands of client In this section, we give a brief overview of WebAssembly’s Wasm modules within a single host process, and distribute use in multi-tenant serverless and edge-cloud computing these processes across thousands of servers and multiple data- platforms—and more generally function as a service (FaaS) centers. This is the key to scaling—it allows any host process, platforms. In particular, we describe how FaaS platforms use in any datacenter, to spawn a fresh Wasm sandbox instance Wasm as an intermediate compilation layer for isolating dif- and run any client function in response to a user request. By ferent tenants today. We then briefly review Spectre attacks using Wasm as an intermediate layer, FaaS platforms isolate and describe how today’s approach to isolating Wasm code is the client for free [6, 30, 60, 65, 66, 88]. Unfortunately, this vulnerable to this class of attacks. isolation does not hold in the presence of Spectre attacks. 2.1 WebAssembly 2.2 Spectre attacks Wasm is a low-level 32-bit machine language explicitly de- signed to embed C/C++ and Rust code in host applications. Spectre attacks exploit hardware predictors to induce mis- Wasm programs (which are simply a collection of functions) predictions and speculatively execute instructions—gadgets— are (1) deterministic and well-typed, (2) follow a structured that would not run sequentially [49]. Spectre attacks are clas- control flow discipline, and (3) separate the heap—the linear sified by the hardware predictor they exploit [12]. We focus memory—from the well-typed stack and program code. These on the three Spectre variants that hijack control flow: properties make it easy for compilers like Lucet to sandbox Wasm code and safely embed it within an application [27]. I Spectre-PHT Spectre-PHT [49] exploits the pattern his- tory table (PHT), which is used as part of the conditional Control flow safety Lucet’s code generator, Cranelift, en- branch predictor (CBP) to guess the direction of a condi- sures that the control flow of the compiled code is restricted to tional branch while the condition is still being evaluated. the sandbox and cannot be bent to bypass bounds checks (e.g., In a Spectre-PHT attack, the attacker (1) pollutes entries via return-oriented programming). This comes directly from in the PHT so that a branch is mispredicted to the wrong preserving Wasm’s semantics during compilation. For ex- path. The attacker can then use this wrong-path execution ample, compiled code preserves Wasm’s safe stack [27, 52], to bypass memory isolation guards or control flow integrity. ensuring that stack frames (and thus return values on the I Spectre-BTB Spectre-BTB [49] exploits the branch target stack) cannot be clobbered. The compiled code also enforces buffer (BTB), which is used to predict the target of an Wasm’s coarse-grained CFI and, for example, matches the indirect jump [94]. In a Spectre-BTB attack, the attacker type of each indirect call site with the type of the target. pollutes entries in the BTB, redirecting speculative control Memory isolation When Lucet creates a Wasm sandbox, flow to an arbitrary target. Spectre-BTB can thus be used it reserves 4GB of virtual memory for the Wasm heap and to speculatively execute gadgets that are not in the normal uses Cranelift to bound all heap loads and stores. To this end, execution path (e.g., to carry out a ROP-style attack). Cranelift (1) explicitly passes a pointer to the base of the I Spectre-RSB Spectre-RSB [50, 58] exploits the return heap as the first argument to each function and (2) masks all stack buffer (RSB), which memorizes the location of re- pointers to be within this 4GB range. Like previous software- cently executed call instructions to predict the targets of based isolation (SFI) systems [88], Cranelift avoids expensive ret instructions. In a Spectre-RSB attack, the attacker uses bounds check operations by using guard pages to trap any chains of call or ret instructions to over- or underflow the offsets that may reach beyond the 4GB heap space. RSB, and redirect speculative control flow in turn. Embedding Wasm An application with embedded Wasm Spectre can be used in-place or out-of-place [12]. In an in- code will typically require context switching between the place attack, the attacker mistrains the prediction for a victim Wasm code and host—e.g., to read data from a socket. Lucet branch by repeatedly executing the victim branch itself. In an allows safe control and data flow across the host-sandbox out-of-place attack, the attacker finds a secondary branch that boundary via springboards and trampolines. Springboards is congruent to the victim branch—predictor entries are in- are used to enter Wasm code—they set up the program context dexed using a subset of address bits—and uses this secondary for Wasm execution—while trampolines are used to restore branch to mistrain the prediction for the victim branch. 3
Branch Wasm address space Guide 1 mov rdx,QWORD PTR [fn_table_len] ; get fn table length Predictor Scenario 1: Malicious sandbox Benign 2 cmp rcx,rdx ; check that rcx is in-bounds Unit redirects itself to a Spectre Malicious prediction 3 jb index_ok Trampoline gadget Sandbox Spectre 4 ud2 ; trap otherwise prediction 5 index_ok: Spectre 6 lea rdx,[fn_table] access Outside 7 mov rcx,QWORD PTR [rdx+rcx*4] x Sandbo Data call rcx Trampoline leakage 8 Scenario 2: Malicious sandbox redirects the benign sandbox Victim Spectre gadget Figure 3: A simplified snippet of the vulnerable code from our Spectre-PHT to a Spectre gadget Sandbox breakout attack. This code is safe during sequential execution (it checks Sensitive data the index rcx before using it to load a function table entry). But, during Springboard speculative execution, control flow may bypass this check and access memory Scenario 3: Malicious sandbox outside the function table bounds. redirects the application to a FaaS Runtime Spectre gadget Swivel-SFI Swivel-CET Figure 2: A malicious tenant can fill branch predictors with invalid state Attack variant ASLR Det ASLR Det (red). In one scenario, the attacker causes its own branches to speculatively execute code that access memory outside of the sandbox. In the second in-place Spectre-PHT and third scenarios, the attacker uses Spectre to respectively target a victim out-of-place sandbox or the host runtime to misspeculate and leak secret data. in-place Spectre-BTB out-of-place in-place 2.3 Spectre attacks on FaaS platforms Spectre-RSB out-of-place A malicious FaaS platform client who can upload arbitrary Wasm code can force the Wasm compiler to emit native code Table 1: Effectiveness of Swivel against different Spectre variants. A full circle indicates that Swivel eliminates the attack while a half circle indicates which is safe during sequential execution, but uses Spectre that Swivel only mitigates the attack. to bypass Wasm’s isolation guarantees during speculative execution. We identify three kinds of attacks (Figure 2): the table, and calls it; otherwise the code traps. I Scenario 1: Sandbox breakout attacks The attacker An attacker can mistrain the conditional branch on line 3 bends the speculative control flow of their own module and cause it to speculatively jump to index_ok even when to access data outside the sandbox region. For example, rcx is out-of-bounds. By controlling the contents of rcx, the they can use Spectre-PHT to bypass conditional bounds attacker can thus execute arbitrary code locations outside checks when accessing the indirect call table. Alternatively, the sandbox. In Section 5.4 we demonstrate several proof they can use Spectre-BTB to transfer the control flow into of concept attacks, including a breakout attack that uses this the middle of instructions to execute unsafe code (e.g., code gadget. These attacks serve to highlight the importance of that bypasses Wasm’s implicit heap bounds checks). hardening Wasm against Spectre. I Scenario 2: Sandbox poisoning attacks The attacker uses an out-of-place Spectre attack to bend the control flow 3 Swivel: Hardening Wasm against Spectre of a victim sandbox and coerce the victim into leaking Swivel extends Lucet—and the underlying Cranelift code their own data. Although this attack is considerably more generator—to address Spectre attacks on FaaS Wasm plat- sophisticated, we were still able to implement proof of forms. We designed Swivel with several goals in mind. First, concept attacks following Canella et al. [12]. Here, the performance: Swivel minimizes the number of pipeline fences attacker finds a (mispredicted) path in the victim sandbox it inserts, allowing Wasm to benefit from speculative execu- that leads to the victim leaking data, e.g., through cache tion as much as possible. Second, automation: Swivel does state. They then force the victim to mispredict this path by not rely on user annotations or source code changes to guide using a congruent branch within their own sandbox. mitigations; we automatically apply mitigations when com- I Scenario 3: Host poisoning attacks Instead of bending piling Wasm. Finally, modularity: Swivel offers configurable the control flow of the victim sandbox, the attacker can use protection, ranging from probabilistic schemes with high per- an out-of-place Spectre attack to bend the control flow of formance to thorough mitigations with strong guarantees. This the host runtime. This allows the attacker to speculatively allows Swivel users to choose the most appropriate mitiga- access data from the host as well as any other sandbox. tions (see Table 1) according to their application domain, Figure 3 gives an example sandbox breakout gadget. The gad- security considerations, or particular hardware platform. get is in the implementation of the Wasm call_indirect in- In the rest of this section, we describe our attacker model struction, which is used to call functions indexed in a module- and introduce a core abstraction: linear blocks. We then show level function table. This code first compares the function how linear blocks, together with several other techniques, are index rcx to the length of the function table (to ensure that used to address both sandbox breakout and poisoning attacks. rcx points to a valid entry). If rcx is valid, it then jumps to These techniques span two Swivel designs: Swivel-SFI, a index_ok, loads the function from the corresponding entry in software-only scheme which provides mitigations compatible 4
with existing CPUs; and Swivel-CET, which uses hardware native code, Swivel ensures that a linear block is safe by: extensions (Intel® CET and Intel® MPK) available in the 11th generation Intel® CPUs. Masking memory accesses Since we cannot make any as- sumptions about the initial contents of registers, Swivel en- Attacker model We assume that the attacker is a FaaS plat- sures that unconditional heap bounds checks (performed via form client who can upload arbitrary Wasm code which the masking) are performed in the same linear block as the heap platform will then compile and run alongside other clients memory access itself. We do this by modifying the Cranelift using Swivel. The goal of the attacker is to read data sensi- optimization passes which could lift bounds checks (e.g., loop tive to another (victim) client using Spectre attacks. In the invariant code motion) to ensure that they don’t move masks Swivel-CET case, we only focus on exfiltration via the data across linear block boundaries. Similarly, since we cannot cache—and thus assume an attacker who can only exploit trust values on the stack, Swivel ensures that any value that gadgets that leak via the data cache. We consider transient at- is unspilled from the stack and used in a bounds check is tacks that exploit the memory subsystem (e.g., Meltdown [57], masked again. We use this mask-after-unspill technique to MDS [11, 72, 84], and LVI [83]) out of scope and discuss this replace Cranelift’s unsafe mask-before-spill approach. in detail in Section 6.2. We assume that our Wasm compiler and runtime are cor- Pinning the heap registers To properly perform bounds rect. We similarly assume the underlying operating system checks for heap memory accesses, a Swivel linear block must is secure and the latest CPU microcode updates are applied. determine the correct value of the heap base. Unfortunately, as We assume hyperthreading is disabled for any Swivel scheme described above, we cannot make any assumptions about the except for the deterministic variant of Swivel-CET. Consis- contents of any register or stack slot. Swivel thus reserves one tent with previous findings [94], we assume BTBs predict the register, which we call the pinned heap register, to store the lower 32-bits of target addresses, while the upper 32-bits are address of the sandbox heap. Furthermore, Swivel prevents inferred from the instruction pointer. any instructions in the sandbox from altering the pinned heap Swivel addresses attackers that intentionally extract infor- register. This allows each linear block to safely assume that mation using Spectre. We do not prevent clients from acciden- the pinned heap register holds the correct value of the heap tally leaking secrets during sequential execution and, instead, base, even when the speculative control flow of the program assume they use techniques like constant-time programming has gone awry due to misprediction. to prevent such leaks [14]. For all Swivel schemes except the deterministic variant of Swivel-CET, we assume that a sand- Hardening jump tables Wasm requires bounds checks on box cannot directly invoke function calls in other sandboxes, each access to indirect call tables and switch tables. Swivel i.e., it cannot control the input to another sandbox to perform ensures that each of these bounds checks is local to the lin- an in-place poisoning attack. We lastly assume that host se- ear block where the access is performed. Moreover, Swivel crets can be protected by placing them in a Wasm sandbox, implements the bounds check using speculative load hard- and discuss this further in Section 6.1. ening [13], masking the index of the table access with the length of the table. This efficiently prevents the attacker from 3.1 Linear blocks: local Wasm isolation speculatively bypassing the bounds check. To enforce Wasm’s isolation sequentially and speculatively, Swivel does not check the indirect jump targets beyond Swivel-SFI and Swivel-CET compile Wasm code to linear what Cranelift does. At the language level, Wasm already blocks (LBs). Linear blocks are straight-line code blocks that guarantees that the targets of indirect jumps (i.e., the entries do not contain any control flow instructions except for their in indirect call tables and switch tables) can only be the tops of terminators—this is in contrast to traditional basic blocks, functions or switch-case targets. Compiled, these correspond which typically do not consider function calls as terminators. to the start of Swivel linear blocks. Thus, an attacker can only This simple distinction is important: It allows us to ensure that train the BTB with entries that point to linear blocks, which, all control flow transfers—both sequential and speculative— by construction, are safe to execute in any context. land on linear block boundaries. Then, by ensuring that indi- vidual linear blocks are safe, we can ensure that whole Wasm Protecting returns Wasm’s execution stack—in particular, programs, when compiled, are confined and cannot violate return addresses on the stack—cannot be corrupted during Wasm’s isolation guarantees. sequential execution. Unfortunately, this does not hold specu- A linear block is safe if, independent of the control flow latively: An attacker can write to the stack (e.g., with a buffer into the block, Wasm’s isolation guarantees are preserved. overflow) and speculatively execute a return instruction which In particular, we cannot rely on safety checks (e.g., bounds will divert the control flow to their location of choice. Swivel checks for memory accesses) performed across linear blocks ensures that return addresses on the stack cannot be corrupted since, speculatively, blocks may not always execute in sequen- as such, even speculatively. We do this using a separate stack tial order (e.g., because of Spectre-BTB). When generating or shadow stack [8], as we detail below. 5
Spectre-unsafe Wasm compilation Software-only Swivel Hardware-assisted Swivel block1: lblock1_1: Linear block lblock1_1: instx instx instx insty insty insty mask store block4, store block4, store block4, call call call block4: instz instx block4: CET endbranch access insty Register interlocking lblock1_2: instx jmp block2 retn lblock1_2: intz insty mask load sstack, register_interlock block4: access jmp intz endbranch block2: jmp block2 mask register_interlock Separate stack instc access instx cmp jmp block2 insty jnz block3 RSB to BTB + retn separate stack block2: Shadow stack instc block2: instd block3: mov 0, register_interlock CET shadow stack jmp inste cmp instc ... cmov Lblock3, cmp jmp CBP to BTB jnz block3 Key Direct branch: jmp/call block? instd block3: Indirect branch: jmp/call instd block3: jmp inste jmp inste ... Conditional Branch: jCC block? ... x86 return instruction Figure 4: Swivel hardens Wasm against spectre via compiler transformations. In Swivel-SFI, we convert basic blocks to linear blocks. Each linear block (e.g., lblock1_1 and lblock1_2) maintains local security guarantees for speculative execution. Then, we protect the backward edge (block4) by replacing the return instructions and using a separate return stack. To eliminate poisoning attacks, in the deterministic version of Swivel-SFI, we further encode conditional branches as indirect jumps. In Swivel-CET, we similarly use linear blocks, but we allow return instruction, and protect returns using the hardware shadow stack. To reduce BTB flushes, we additionally use Intel® CET’s endbranch to ensure that targets of indirect branches land at the beginning of linear blocks. In the deterministic version, we avoid BTB flushing and instead use register interlocking to prevent leakage on misspeculated paths. 3.2 Swivel-SFI 3.2.2 Addressing sandbox and host poisoning attacks Swivel-SFI builds on top of linear blocks to address all three There are two ways for a malicious sandbox to carry out poi- classes of attacks. soning attacks: By poisoning CBP or BTB entries. Since we already flush the BTB to address sandbox breakout attacks, 3.2.1 Addressing sandbox breakout attacks we trivially prevent all BTB poisoning. Addressing CBP poi- Compiling Wasm code to linear blocks eliminates most av- soning is less straightforward. We consider two schemes: enues for breaking out of the sandbox. The only way for an Mitigating CBP poisoning To mitigate CBP-based poison- attacker to break out of the sandbox is to speculatively jump ing attacks, we use ASLR to randomize the layout of sandbox into the middle of a linear block. We prevent this with: code pages. This mitigation is not sound—it is theoretically The separate stack We protect returns by preserving possible for an attacker to influence a specific conditional Wasm’s safe return stack during compilation. Specifically, branch outside of the sandbox. As we discuss in Section 3.4, we create a separate stack in a fixed memory location, but this raises the bar to (at least) that of process isolation: The outside the sandbox stack and heap, to ensure that it cannot be attacker would would have to (1) de-randomize the ASLR of overwritten by sandboxed code. We replace every call instruc- both their own sandbox and the victim’s and (2) find useful tion with an instruction sequence that stores the address of gadgets, which is itself an open problem (§7). the subsequent instruction—the return address—to the next Eliminating CBP poisoning Clients that are willing to tol- entry in this separate stack. Similarly, we replace every return erate modest performance overheads (§5) can opt to eliminate instruction with a sequence that pops the address off the sep- poisoning attacks. We eliminate poisoning attacks by remov- arate stack and jumps to that location. To catch under- and ing conditional branches from Wasm sandboxes altogether. over-flows, we surround the separate stack with guard pages. Following [54], we do this by using the cmov conditional move BTB flushing The other way an attacker can jump into the instruction to encode each conditional branch as an indirect middle of a linear block is via a mispredicted BTB entry. branch with only two targets (Figure 4). Since all indirect jumps inside a sandbox can only point to the tops of linear blocks, any such entries can only be set via 3.3 Swivel-CET a congruent entry outside any sandbox—i.e., an attacker must Swivel-SFI avoids using fences to address Spectre attacks, orchestrate the host runtime into mistraining a particular jump. but ultimately bypasses all but the BTB predictors—and even We prevent such attacks by flushing the BTB on transitions then we flush the BTB on every sandbox transition. Swivel- into and out of the sandbox.2 CET uses Intel® CET [39] and Intel® MPK [39] to restore 2 In practice, BTB predictions are not absolute (as discussed in our attacker To ensure that this does not result in predictions at non linear block bound- model), instead they are 32-bit offsets relative to the instruction pointer [94]. aries, we restrict the sandbox code size to 4GB. 6
the use of the CBP and RSB, and avoid BTB flushing.3 sensitive data. As with Swivel-SFI, we consider both a proba- bilistic and deterministic design to addressing these attacks. 3.3.1 Addressing sandbox breakout attacks Since the probabilistic approach is like Swivel-SFI’s, we de- Like Swivel-SFI, we build on linear blocks to address sandbox scribe only the deterministic design. breakout attacks (Figure 4). Swivel-CET, however, prevents Preventing leaks under poisoned execution Swivel-CET an attacker from speculatively jumping into the middle of a does not eliminate cross-sandbox CBP or BTB poisoning. linear block using: Instead, we ensure that a victim sandbox cannot be coerced The shadow stack Swivel-CET uses Intel® CET’s shadow into leaking data via the cache when executing a mispredicted stack to protect returns. Unlike Swivel-SFI’s separate stack, path. To leak secrets through the cache, the attacker must the shadow stack is a hardware-maintained stack, distinct from maneuver the secret data to a gadget that will use it as an the ordinary data stack and inaccessible via standard load and offset into a memory region. In Cranelift, any such gadget store instructions. The shadow stack allows us to use call and will use the heap, as stack memory is always accessed at return instructions as usual—the CPU uses the shadow stack constant offsets from the stack pointer (which itself cannot be to check the integrity of return addresses on the program stack directly assigned). We thus need only prevent leaks that are during both sequential and speculative execution. via the Wasm heap—we do this using register interlocks. Forward-edge CFI Instead of flushing the BTB, Swivel- Register interlocking Our register interlocking technique CET uses Intel® CET’s coarse-grained control flow integrity tracks the control flow of a Wasm program and prevents it (CFI) [1] to ensure that sandbox code can only jump to the from accessing its stack or heap when the speculative path top of a linear block. We do this by placing an endbranch in- diverges from the sequential path. We first assign each non- struction at the beginning of every linear block that is used as trivial linear block a unique 64-bit block label. We then calcu- an indirect target (e.g., the start of a function that is called in- late the expected block label of every direct or indirect branch directly). During speculative execution, if the indirect branch and assign this value to a reserved interlock register prior to predictor targets an instruction other than an endbranch (e.g., branching. At the beginning of each linear block, we check inside the host runtime), the CPU stops speculating [39]. that the value of the interlock register corresponds to the static block label using cmov instructions. If the two do not match, Conditional BTB flushing When using ASLR to address we zero out the heap base register as well as the stack register. sandbox poisoning attacks, we still need to flush the BTB Finally, we unmap pages from the address space to ensure on transitions into each sandbox. Otherwise, one sandbox that any access from a zero heap or stack base will fault—and could potentially jump to a linear block in another sandbox. thus will not affect cache state. Our deterministic approach to sandbox poisoning (described The register interlock fundamentally introduces a data de- below), however, eliminates BTB flushes altogether. pendency between memory operations and the resolution of 3.3.2 Addressing host poisoning attacks control flow. In doing so, we prevent any memory operations that would result in cache based leaks, but do not prevent all To prevent host poisoning attacks, Swivel-CET uses Intel® speculative execution. In particular, any arithmetic operations MPK. Intel® MPK exposes new user mode instructions that may still be executed speculatively. This is similar to hard- allow a process to partition its memory into sixteen linear ware taint tracking [93], but enforced purely through compiler regions and to selectively enable/disable read/write access changes. to any of those regions. Swivel-CET uses only two of these Finally, Wasm also stores certain data (e.g., globals vari- protection domains—one for the host and one shared by all ables and internal structures) outside the Wasm stack or heap. sandboxes—and switches domains during the transitions be- To secure these memory accesses with the register interlock, tween host and sandbox. When the host creates a new sandbox, we introduce an artificial heap load in the address computation Swivel-CET allocates the heap memory for the new sandbox for this data. with the sandbox protection domain, and then relinquishes its own access to that memory so that it is no longer accessible by 3.4 Security and performance trade-offs the host. This prevents host poisoning attacks by ensuring that Swivel offers two design points for protecting Wasm modules the host cannot be coerced into leaking secrets from another from Spectre attacks: Swivel-SFI and Swivel-CET. For each sandbox. We describe how we safely copy data across the of these schemes we further consider probabilistic (ASLR) boundary later (§4). and deterministic techniques. In this section, we discuss the performance and security trade-offs when choosing between 3.3.3 Addressing sandbox poisoning attacks these various Swivel schemes. By poisoning CBP or BTB entries, a malicious sandbox can 3.4.1 Probabilistic or deterministic? coerce a victim sandbox into executing a gadget that leaks Table 1 summarizes Swivel’s security guarantees. Swivel’s 3 Appendix A.1 gives a brief introduction to these new hardware features. deterministic schemes eliminate Spectre attacks, while the 7
Swivel-SFI Swivel-CET only to mitigate sandbox poisoning attacks—and unlike sand- Swivel protection and technique ASLR Det ASLR Det box breakout attacks, these attacks are significantly more Sandbox breakout protections challenging for an attacker to carry out: They must conduct - Linear blocks [CBP, BTB, RSB] an out-of-place attack on a specific target while accounting - BTB flush in springboard [BTB] for the unpredictable mapping of the branch predictor. To our - Separate control stack [RSB] knowledge, such an attack has not been demonstrated, even - CET endbranch [BTB] without the additional challenges of defeating ASLR. - CET shadow stack [RSB] Furthermore, on a FaaS platform, these attacks are even Sandbox poisoning protections - BTB flush in springboard [BTB] harder to pull off, as the attacker has only a few hundred - Code page ASLR [CBP] milliseconds to land an attack on a victim sandbox instance - Direct branches to indirect [CBP] before it finishes—and the next victim instance will have - Register interlock [CBP, BTB] entirely new mappings. Previous work suggests that such an Host poisoning protections attack is not practical in such a short time window [18]. - Separate control stack [RSB] For other application domains, the overhead of the deter- - Code page ASLR [CBP] ministic Swivel variants may yet be reasonable. As we show - BTB flush in trampoline [BTB] in Section 5, the average (geometric) overhead of Swivel-SFI - Direct branches to indirect [CBP] is 47.3% and that of Swivel-CET is 96.3%. Moreover, users - Two domain MPK [CBP] can choose to use Swivel-SFI and Swivel-CET according to Table 2: Breakdown of Swivel’s individual protection techniques which, their trust model—Swivel allows sandboxes of both designs when combined, address the thee different class of attacks on Wasm (§2.3). to coexist within a single process. For each technique we also list (in brackets) the underlying predictors. 3.4.2 Software-only or hardware-assisted? probabilistic schemes eliminate Spectre attacks that exploit the BTB and RSB, but trade-off security for performance Swivel-SFI and Swivel-CET present two design points that when it comes to the CBP (§5): Our probabilistic schemes have different trade-offs beyond backwards compatibility. We only mitigate Spectre attacks that exploit the CBP. discuss their trade-offs, focusing on the deterministic variants. To this end, (probabilistic) Swivel hides branch offsets by Swivel-SFI eliminates Spectre attacks by allowing spec- randomizing code pages. Previously, similar fine-grain ap- ulation only via the BTB predictor and by controlling BTB proaches to address randomization have been proposed to mit- entries through linear blocks and BTB flushing. Swivel-CET, igate attacks based on return-oriented programming [15, 22]. on the other hand, allows the other predictors. To do this Specifically, when loading a module, Swivel copies the code safely though, we use register interlocking to create data pages of the Wasm module to random destinations, random- dependencies (and thus prevent speculation) on certain op- izing all but the four least significant bits (LSBs) to keep erations after branches. Our interlock implementation only 16-byte alignment. This method is more fine-grained than guards Wasm memory operations—this means that, unlike page remapping, which would fail to randomize the lower 12 Swivel-SFI, Swivel-CET only prevents cache-based sand- bits for 4KB instruction pages. box poisoning attacks. While non-memory instructions (e.g., Unfortunately, only a subset of address bits are typically arithmetic operations) can still speculatively execute, register used by hardware predictors. Zhang et. al [94], for example, interlocks sink performance: Indeed, the overall performance found that only the 30 LSBs of the instruction address are overhead of Swivel-CET is higher than Swivel-SFI (§5). used as input for BTB predictors. Though a similar study has At the same time, Swivel-CET can be used to handle a not been conducted for the CBP, if we pessimistically assume more powerful attacker model (than our FaaS model). First, that 30 LSBs are used for prediction then our randomization Swivel-CET eliminates poisoning attacks even in the pres- offers at least 26 bits of entropy. Since the attacker must de- ence of attacker-controlled input. This is a direct corollary of randomize both their module and the victim module, this is being able to safely execute code with poisoned predictors. likely higher in practice. Second, Swivel-CET (in the deterministic scheme) is safe in As we show in Section 5, the ASLR variants of Swivel are the presence of hyperthreading; our other Swivel schemes faster than the deterministic variants. Using code page ASLR assume that hyperthreading is disabled (§3). Swivel-CET al- imposes less overhead than the deterministic techniques (sum- lows hyperthreading because it doesn’t rely on BTB flushing; marized in Table 2). This is not surprising: CBP conversion it uses register interlocking to eliminate sandbox poisoning (in Swivel-SFI) and register interlocking (in Swivel-CET) are attacks. In contrast, our SFI schemes require the BTB to be the largest sources of performance overhead. isolated for the host and each Wasm sandbox—an invariant For many application domains, this security-performance that may not hold if, for example, hyperthreading interleaves trade-off is reasonable. Our probabilistic schemes use ASLR host application and Wasm code on sibling threads. 8
4 Implementation transition functions to switch between the application and sandbox Intel® MPK domains. We implement Swivel on top of the Lucet Wasm compiler Since Intel® MPK blocks the application from accessing and runtime [30, 60]. In this section, we describe our modifi- sandbox memory, we add primitives that briefly turn off Intel® cations to Lucet. MPK to copy memory into and out of sandboxes. We imple- We largely implement Swivel-SFI and Swivel-CET as ment these primitives using the rep instruction prefix instead passes in Lucet’s code generator Cranelift. For both schemes, of branching code, ensuring that the primitives are not vulner- we add support for pinned heap registers and add direct jmp able to Spectre attacks during this window. instructions to create linear block boundaries. We modify Finally, we add Intel® CET support to both the Rust Cranelift to harden switch-table and indirect-call accesses: compiler—used to compile Lucet—and to Lucet itself so Before loading an entry from either table, we truncate the that the resulting binaries are compatible with the hardware. index to the length of the table (or the next power of two) using a bitwise mask. We also modify Lucet’s stack overflow 5 Evaluation protection: Lucet emits conditional checks to ensure that the We evaluate Swivel by asking four questions: stack does not overflow; these checks are rare and we simply use lfences. I What is the overhead of Wasm execution? (§5.1) We modify the springboard and trampoline transition func- Swivel’s hardening schemes make changes to the code tions in the Lucet runtime. Specifically, we add a single generated by the Lucet Wasm compiler. We examine lfence to each transition function since we must disallow the performance impact of these changes on Lucet’s speculation from crossing the host-sandbox boundary. Sightglass benchmark suite [9] and Wasm-compatible The deterministic defenses for both Swivel-SFI and Swivel- SPEC 2006 [29] benchmarks. CET—CBP conversion and register interlocks—increase the I What is the overhead of transitions? (§5.2) Swivel mod- cost of conditional control flow. To reduce the number of ifies the trampolines and springboards used to transition conditional branches, we thus enable explicit loop unrolling into and out of Wasm sandboxes. The changes vary across flags when compiling the deterministic schemes.4 This is our different schemes—from adding lfences, or flushing not necessary for the ASLR-based variants since they do not the BTB during one or both transition directions, to switch- modify conditional branches. Indeed, the ASLR variants are ing Intel® MPK domains. We measure the impact of these straightforward modifications to the dynamic library loader changes on transition costs using a microbenchmark. used by the Lucet runtime: Since all sandbox code is position I What is the end-to-end overhead of Swivel? (§5.3) We independent, we just copy a new sandbox instance’s code and examine the impact of Wasm execution overhead and transi- data pages to a new randomized location in memory. tion overhead on a webserver that runs Wasm services. We We also made changes specific to each Swivel scheme: measure the impact of Swivel protections on five different Wasm workloads running on this webserver. Swivel-SFI We augment the Cranelift code generation pass I Does Swivel eliminate Spectre attacks? (§5.4) We eva- to replace call and return instructions with the Swivel-SFI lute the security of Swivel, i.e., whether Swivel prevents separate stack instruction sequences and we mask pointers sandbox breakout and poisoning attacks, by implementing when they are unspilled from the stack. For the deterministic several proof-of-concept Spectre attacks. variant of Swivel-SFI, we also replace conditional branches with indirect jump instructions, as described in Section 3.2. Machine setup We run our benchmarks on a 4-core, To protect against sandbox poisoning attacks (§2.3), we 8-thread Tigerlake CPU software development platform flush the BTB during the springboard transition into any sand- (2.7GHz with a turbo boost of 4.2GHz) supporting the Intel® box. Since this is a privileged operation, we implement this CET extension. The machine has 16 GB of RAM and using a custom Linux kernel module. runs 64-bit Fedora 32 with the 5.7.0 Linux kernel modified to include Intel® CET support [34]. Our Swivel modifica- Swivel-CET In the Swivel-CET code generation pass, we tions are applied to Lucet version 0.7.0-dev, which includes place endbranch instructions at each indirect jump target in Cranelift version 0.62.0. We perform benchmarks on stan- Wasm to enable Intel® CET protection. These indirect jump dard SPEC CPU 2006, and Sightglass version 0.1.0. Our targets include switch table entries and functions which may webserver macrobenchmark relies on the Rocket webserver be called indirectly. We also use this pass to emit the register version 0.4.4, and we use wrk version 4.1.0 for testing. interlocks for the deterministic variant of Swivel-CET. We adapt the springboard and trampoline transition func- 5.1 Wasm execution overhead tions to ensure that all uses of jmp, call and return conform We measure the impact of Swivel’s Spectre mitigations on to the requirements of Intel® CET. We furthermore use these Wasm performance in Lucet using two benchmark suites: 4 Forsimplicity, we do this in the Clang compiler when compiling appli- I The Sightglass benchmark suite [9], used by the Lucet cations to Wasm and not in Lucet proper. compiler, includes small functions such as cryptographic 9
32.8× 348% 468% 333% 392% 349% 476% LoadLfence Blade 300% Stock-Unrolled SFI-Det 25× Strawman Relative execution time 250% SFI-ASLR CET-Det Execution overhead CET-ASLR 20× 200% 15× 150% 100% 10× 50% 5× 0% 1× -50% se n se n k m maatr k rc t st at2 rc t st at2 st chr st chr m 20 m 20 ed ct 64 25 ype fi 9 he g b2 G ach 20 ed ct 64 51e fi 9 he g b2 G ch 20 ke so i ke so i c rt em tr ix m ix2 c rt em tr ix ne m ov2 nd om lim2 si it si it st ve st len sw it k xb i ch h bla 2 neest ini ve neste dlo sv st dlo op ra oo 2 ra nd p3 ra do m te m2 n st ve st len swwit k xcbla itchh ha bla 2 ne st ini e neste dlo sv st dlo op ra oo 2 ra nd p3 n l l st rca st rca m ca m ca ap im ap im sw to s rto ba an ba an 25 yp 51 l p xc la tch ea m ix l p te m c ea lim m maatr n o e c e c e e n m o eo a eo a ed o ed o r r r r r ra o rm rm ke ke x ac ac (a) Fence scheme overhead on Sightglass (b) Swivel scheme overhead on Sightglass Figure 5: Performance overhead of Swivel on the Sightglass benchmarks. (a) On Sightglass, the baseline schemes LoadLfence, Strawman, and Mincut incur geomean overheads of 8.7×, 6.9×, and 2.4× respectively. (b) In contrast, the Swivel schemes perform much better where the ASLR versions of Swivel-SFI and Swivel-CET incur geomean overheads of 5.5% and 4.2% respectively. With deterministic sandbox poisoning mitigations, these overheads are 61.9% and 99.7%. 20× 250% Relative execution time Execution overhead LoadLfence 200% Stock-Unrolled SFI-Det 15× Strawman SFI-ASLR CET-Det Mincut 150% CET-ASLR 10× 100% 50% 5× 0% 1× -50% r 2 cf ilc d m bm n r 2 cf ilc d um m n ta ta ip m ea ip m ea m m tu lb m m as as bz bz na l na nt m 9_ 0_ m 9_ 0_ an 3_ 3_ 3_ 3_ a 1_ 1_ eo 4_ eo 4_ 42 47 42 47 43 43 qu qu 47 47 40 40 44 44 G G lib lib 2_ 2_ 46 46 (a) Fence scheme overhead on SPEC 2006 (b) Swivel scheme overhead on SPEC 2006 Figure 6: Performance overhead of Swivel on SPEC 2006 benchmarks. (a) On SPEC 2006, the baseline schemes LoadLfence, Strawman, and Mincut incur overheads of 7.3×–19.6×, 1.8×–7.3×, and 1.2×–5.4× respectively. (b) In contrast, the Swivel schemes perform much better where the ASLR versions of Swivel-SFI and Swivel-CET incur overheads of at most 10.3% and 6.1% respectively. With deterministic sandbox poisoning mitigations, these overheads are 3.3%–86.1% and 8.0%–240.2% respectively. primitives (ed25519, xchacha20), mathematical functions run them with the default settings. Sightglass repeats each test (ackermann, sieve), and common programming utilities at least 10 times or for a total of 100ms (whichever occurs (heapsort, strcat, strtok). first) and reports the median runtime. We compare Swivel’s I SPEC CPU 2006 is a popular performance benchmark performance overhead with respect to the performance of the that includes various integer and floating point workloads same benchmarks when using the stock Lucet compiler. For from typical programs. We evaluate on only the subset increased measurement consistency on short-running bench- of the benchmarks from SPEC 2006 that are compatible marks, while measuring Sightglass we pin execution to a with Wasm and Lucet. This excludes programs written single core and disable CPU frequency scaling.6 in Fortran, programs that rely on dynamic code rewriting, Baseline schemes In addition to our comparison against programs that require more than 4GB of memory which Stock Lucet, we also implement three known Spectre miti- Wasm does not support, and programs that use exceptions, gations using lfences and compare against these as a refer- longjmp, or multithreading.5 We note that Swivel does not ence point. First, we implement LoadLfence, which places an introduce new incompatibilities with SPEC 2006 bench- lfence after every load, similar to Microsoft’s Visual Studio marks; all of Swivel’s schemes are compatible with the compiler’s “Qspectre-load” mitigation [61]. Next, we imple- same benchmarks as stock Lucet. ment Strawman, a scheme which restricts code to sequential Setup We compile both Sightglass and the SPEC 2006 execution by placing an lfence at all control flow targets (at benchmarks with our modified Lucet Wasm compiler and the start of all linear blocks)—this is similar to the Intel com- piler’s “all-fix-lfence” mitigation [40]. Finally, we implement 5 Some Web-focused Wasm platforms support some of these features. Mincut, an lfence insertion algorithm suggested by Vassena Indeed, previous academic work evaluates these benchmarks on Wasm [42], but non-Web Wasm platforms including Lucet do not support them. 6 Tests were performed on June 18, 2020; see testing disclaimer A.2. 10
et al. [86] which uses a min-cut algorithm to minimize the Transition Type Function Callback number of required lfences. We further augment Mincut’s Invoke Invoke lfence insertion with several of our own optimizations, in- Stock 2.14µs 0.07µs Swivel-SFI (lfence + BTB flush both ways) 4.5µs 1.26µs cluding (1) only inserting a single lfence per linear block; Swivel-CET ASLR (lfence + BTB flush 4.08µs 0.79µs and (3) unrolling loops to minimize branches, as we do for one way + MPK) the register interlock scheme and CBP conversions (§3.2.2). Swivel-CET deterministic (lfence + MPK) 2.29µs 0.08µs Finally, to ensure that unrolling loops does not provide an unfair advantage, we also present results for Stock-Unrolled, Table 3: Time taken for transitions between the application and sandbox—for function calls into the sandbox and callback invocations from the sandbox. which is stock Lucet with the same loop unrolling as used in Swivel overheads are generally modest, with the deterministic variant of Swivel’s schemes. Swivel-CET in particular imposing very low overheads. Results We present the Wasm execution overhead of the var- that similar data-dependent loops are largely the cause for ious protection options on the Sightglass benchmarks in Fig- the slowdowns on SPEC benchmarks, including 429.mcf and ure 5b, and on the SPEC 2006 benchmarks in Figure 6b. The 401.bzip2. Some of the other large overheads in Sightglass overheads of the ASLR versions of Swivel-SFI and Swivel- (e.g., fib2 and nestedloop3) are largely artifacts of the bench- CET are small: 5.5% and 4.2% geomean overheads respec- marking suite: These microbenchmarks test simple constructs tively on Sightglass, and at most 10.3% (geomean: 3.4%) and like loops—and CBP-to-BTB conversion naturally makes at most 6.1% (geomean: 2.6%) respectively on SPEC. The (almost empty) loops slow. deterministic versions of Swivel introduce modest overheads: 61.9% and 99.7% on Sightglass, and 3.3%–86.1% (geomean: 5.2 Sandbox transition overhead 47.3%) and 8.0%–240.2% (geomean: 96.3%) on SPEC. All four configurations outperform the baseline schemes by or- We evaluate the overhead of context switching. As described ders of magnitude: Strawman incurs geomean overheads of in Section 3, Swivel adds an lfence instruction to host- 6.9× and 4.3× on Sightglass and SPEC respectively, Load- sandbox transitions to mitigate sandbox breakout attacks. In Lfence incurs 8.7× and 12.5× overhead respectively, while addition to this: Swivel-SFI flushes the BTB during each tran- Mincut incurs 2.4× and 2.8× respectively. sition; Swivel-CET, in deterministic mode, switches Intel® MPK domains during each transition; and Swivel-CET, in Breakdown Addressing CBP poisoning (CBP-to-BTB con- ASLR mode, flushes the BTB in one direction and switches version in Swivel-SFI and register interlocks in Swivel-CET) Intel® MPK domain in each transition. dominates the performance overhead of our deterministic im- We measure the time required for the host application to plementations. We confirm this hypothesis with a microbench- invoke a simple no-op function call in the sandbox, as well mark: We measure the overheads of these techniques individ- as the time required for the sandboxed code to invoke a per- ually on stock Lucet (with our loop unrolling flags). We find mitted function in the application (i.e., perform a callback). that the average (geomean) overhead of CBP conversion is We compare the time required for Wasm code compiled by 52.9% on Sightglass and 38.5% on SPEC. The corresponding stock Lucet with the time required for code compiled with our overheads for register interlocks are 93.2% and 53.4%. various protection schemes. We measure the average perfor- Increasing loop unrolling thresholds does not significantly mance overhead across 1000 such function call invocations.7 improve the performance of stock Lucet (e.g., the speedup These measurements are presented in Table 3. of our loop unrolling on stock Lucet is 0.0% and 5.9% on First, we briefly note that function calls in stock Lucet take Sightglass and SPEC, respectively). It does impact the perfor- much longer than callbacks. This is because the Lucet runtime mance of our deterministic Swivel variants though (e.g., we has not fully optimized the function call transition, as these find that it contributes to a 15%-20% speed up). This is not are relatively rare compared to callback transitions, which surprising since loop unrolling results in fewer conditional occur during every syscall. branches (and thus reduces the effect of CBP conversions and In general, Swivel’s overheads are modest, with the deter- register interlocking). ministic variant of Swivel-CET in particular imposing very To understand the outliers in Figure 5 and Figure 6, low overheads. Flushing the BTB does increase transition we inspect the source of the benchmarks. Some of the costs, but the overall effect of this increase depends on how largest overheads in Figure 5 are on Sightglass’ string ma- frequently transitions occur between the application and sand- nipulation benchmarks, e.g., strcat and strlen. These mi- box. In addition, flushing the BTB affects not only transi- crobenchmarks have lots of data-dependent loops—tight tion performance but also the performance of both the host loops with data-dependent conditions—that cannot be un- application and sandboxed code. Fully understanding these rolled at compile-time. Since our register interlocking inserts overheads requires that we evaluate the overall performance a data dependence between the pinned heap base register and impact on real world applications, which we do next. the loop condition, this prevents the CPU from speculatively executing instructions from subsequent iterations. We believe 7 Tests were performed on June 18, 2020; see testing disclaimer A.2. 11
You can also read