Swivel: Hardening WebAssembly against Spectre

Page created by Derek Buchanan
 
CONTINUE READING
Swivel: Hardening WebAssembly against Spectre

                                                              Shravan Narayan† Craig Disselkoen† Daniel Moghimi¶†
                                                                   Sunjay Cauligi† Evan Johnson† Zhao Gang†
                                             Anjo Vahldiek-Oberwagner? Ravi Sahita∗ Hovav Shacham‡ Dean Tullsen†                                                       Deian Stefan†
                                                           † UC   San Diego    ¶ Worcester   Polytechnic Institute        ? Intel   Labs   ∗ Intel     ‡ UT   Austin

                                                                   Abstract                                          in recent microarchitectures [41] (see Section 6.2). In con-
arXiv:2102.12730v2 [cs.CR] 20 Mar 2021

                                         We describe Swivel, a new compiler framework for hardening                  trast, Spectre can allow attackers to bypass Wasm’s isolation
                                         WebAssembly (Wasm) against Spectre attacks. Outside the                     boundary on almost all superscalar CPUs [3, 4, 35]—and,
                                         browser, Wasm has become a popular lightweight, in-process                  unfortunately, current mitigations for Spectre cannot be im-
                                         sandbox and is, for example, used in production to isolate                  plemented entirely in hardware [5, 13, 43, 51, 59, 76, 81, 93].
                                         different clients on edge clouds and function-as-a-service                     On multi-tenant serverless, edge-cloud, and function as a
                                         platforms. Unfortunately, Spectre attacks can bypass Wasm’s                 service (FaaS) platforms, where Wasm is used as the way to
                                         isolation guarantees. Swivel hardens Wasm against this class                isolate mutually distursting tenants, this is particulary con-
                                         of attacks by ensuring that potentially malicious code can nei-             cerning:1 A malicious tenant can use Spectre to break out of
                                         ther use Spectre attacks to break out of the Wasm sandbox nor               the sandbox and read another tenant’s secrets in two steps
                                         coerce victim code—another Wasm client or the embedding                     (§5.4). First, they mistrain different components of the under-
                                         process—to leak secret data.                                                lying control flow prediction—the conditional branch predic-
                                            We describe two Swivel designs, a software-only approach                 tor (CBP), branch target buffer (BTB), or return stack buffer
                                         that can be used on existing CPUs, and a hardware-assisted                  (RSB)—to speculatively execute code that accesses data out-
                                         approach that uses extension available in Intel® 11th genera-               side the sandbox boundary. Then, they reconstruct the secret
                                         tion CPUs. For both, we evaluate a randomized approach that                 data from the underlying microarchitectural state (typically
                                         mitigates Spectre and a deterministic approach that eliminates              the cache) using a side channel (cache timing).
                                         Spectre altogether. Our randomized implementations impose                      One way to mitigate such Spectre-based sandbox breakout
                                         under 10.3% overhead on the Wasm-compatible subset of                       attacks is to partition mutually distrusting code into separate
                                         SPEC 2006, while our deterministic implementations impose                   processes. By placing untrusted code in a separate process
                                         overheads between 3.3% and 240.2%. Though high on some                      we can ensure that the attacker cannot access secrets. Chrome
                                         benchmarks, Swivel’s overhead is still between 9× and 36.3×                 and Firefox, for example, do this by partitioning different sites
                                         smaller than existing defenses that rely on pipeline fences.                into separate processes [25, 64, 68]. On a FaaS platform, we
                                                                                                                     could similarly place tenants in separate processes.
                                         1   Introduction                                                               Unfortunately, this would still leave tenants vulnerable to
                                         WebAssembly (Wasm) is a portable bytecode originally de-                    cross-process sandbox poisoning attacks [12, 36, 50]. Specif-
                                         signed to safely run native code (e.g., C/C++ and Rust) in                  ically, attackers can poison hardware predictors to coerce a
                                         the browser [27]. Since its initial design, though, Wasm has                victim sandbox to speculatively execute gadgets that access
                                         been increasingly used to sandbox untrusted code outside the                secret data—from their own memory region—and leak it via
                                         browser. For example, Fastly and Cloudflare use Wasm to                     the cache (e.g., by branching on the secret). Moreover, using
                                         sandbox client applications running on their edge clouds—                   process isolation would sacrifice Wasm’s scalability (running
                                         where multiple client applications run within a single pro-                 many sandboxes within a process) and performance (cheap
                                         cess [30, 85]. Mozilla uses Wasm to sandbox third-party                     startup times and context switching) [28, 30, 65, 85].
                                         C/C++ libraries in Firefox [21, 65]. Yet others use Wasm to                    The other popular approach, removing speculation within
                                         isolate untrusted code in serverless computing [28], IoT appli-             the sandbox, is also unsatisfactory. For example, using
                                         cations [10], games [62], trusted execution environments [17],              pipeline fences to restrict Wasm code to sequential execution
                                         and even OS kernels [79].                                                   imposes a 1.8×–7.3× slowdown on SPEC 2006 (§5). Con-
                                            In this paper, we focus on hardening Wasm against Spec-                  servatively inserting pipeline fences before every dynamic
                                         tre attacks—the class of transient execution attacks which                  load—an approach inspired by the mitigation available in
                                         exploit control flow predictors [49]. Transient execution at-               Microsoft’s Visual Studio compiler [61]—is even worse: it
                                         tacks which exploit features within the memory subsystem                    incurs a 7.3×–19.6× overhead on SPEC (§5).
                                         (e.g., Meltdown [57], MDS [11, 72, 84], and Load Value In-                     1 Thoughour techniques are general, for simplicity we henceforth focus
                                         jection [83]) are limited in scope and have already been fixed              on Wasm as used on FaaS platforms.

                                                                                                              1
In this paper, we take a compiler-based approach to hard-               To eliminate host poisoning attacks, we use both Intel®
ening Wasm against Spectre, without resorting to process               CET and Intel® MPK. In particular, we use Intel® MPK to par-
isolation or the use of fences. Our framework, Swivel, ad-             tition the application into two domains—the host and Wasm
dresses not only sandbox breakout and sandbox poisoning                sandbox(es)—and, on context switch, ensure that each domain
attacks, but also host poisoning attacks, i.e., Spectre attacks        can only access its own memory regions. We use Intel® CET
that coerce the process hosting the Wasm sandboxes into leak-          forward-edge control-flow integrity to ensure that application
ing sensitive data. That is, Swivel ensures that a malicious           code cannot jump, sequentially or speculatively, into arbitrary
Wasm tenant cannot speculatively access data outside their             sandbox code (e.g., due to a poisoned BTB). We do this by
sandbox nor coerce another tenant or the host to divulge se-           inserting endbranch instructions in Wasm sandboxes to de-
crets of other sandboxes via poisoning. We develop Swivel              marcate valid jump targets, essentially partitioning the BTB
via three contributions:                                               into two domains. Our use of Intel’s MPK and CET ensures
                                                                       that even if the host code runs with poisoned predictors, it
1. Software-only Spectre hardening (§3.2) Our first con-               cannot read—and thus leak—sandbox data.
tribution, Swivel-SFI, is a software-only approach to harden-
ing Wasm against Spectre. Swivel-SFI eliminates sandbox                   Since Intel® MPK only supports 16 protection regions, we
breakout attacks by compiling Wasm code to linear blocks               cannot use it to similarly prevent sandbox poisoning attacks:
(LBs). Linear blocks are straight-line x86 code blocks that            serverless, edge-cloud, and FaaS platforms have thousands
satisfy two invariants: (1) all transfers of control, including        of co-located tenants. Hence, to address sandbox poisoning
function calls, are at the block boundary—to (and from) other          attacks, like for Swivel-SFI, we consider and evaluate two
linear blocks; and (2) all memory accesses within a linear             designs. The first (again) uses ASLR to randomize the loca-
block are masked to the sandbox memory. These invariants               tion of each sandbox and flushes the BTB on sandbox entry;
are necessary to ensure that the speculative control and data          we don’t flush on sandbox exit since the host can safely run
flow of the Wasm code is restricted to the sandbox boundary.           with a poisoned BTB. The second is deterministic and not
They are not sufficient though: Swivel-SFI must also be tol-           only allows using conditional branches but also avoids BTB
erant of possible RSB underflow. We address this by (1) not            flushes. It does this using a new technique, register interlock-
emitting ret instructions and therefore completely bypassing           ing, which tracks the control flow of the Wasm sandbox and
the RSB and (2) using a separate stack for return addresses.           turns every misspeculated memory access into an access to
   To address poisoning attacks, Swivel-SFI must still account         an empty guard page. Register interlocking allows a tenant
for a poisoned BTB or CBP. Since these attacks are more so-            to run with a poisoned BTB or CBP since any potentially
phisticated, we evaluate two different ways of addressing              poisoned speculative memory accesses will be invalidated.
them, and allow tenants to choose between them according
to their trust model. The first approach uses address space            3. Implementation and evaluation (§4–5) We implement
layout randomization (ASLR) to randomize the placement of              both Swivel-SFI and Swivel-CET by modifying the Lucet
each Wasm sandbox and flushes the BTB on each sandbox                  compiler’s Wasm-to-x86 code generator (Cranelift) and run-
boundary crossing. This does not eliminate poisoning attacks;          time. To evaluate Swivel’s security we implement proof of
it only raises the bar of Wasm isolation to that of process iso-       concept breakout and poisoning attacks against stock Lucet
lation. Alternately, tenants can opt to eliminate these attacks        (mitigated by Swivel). We do this for all three Spectre variants,
altogether; to this end, our deterministic Swivel-SFI rewrites         i.e., Spectre attacks that abuse the CBP, BTB, and RSB.
conditional branches to indirect jumps—thereby completely                We evaluate Swivel’s performance against stock Lucet and
bypassing the CBP (which cannot be directly flushed) and               several fence-insertion techniques on several standard bench-
relying solely on the BTB (which can).                                 marks. On the Wasm compatible subset of the SPEC 2006
2. Hardware-assisted Spectre hardening (§3.3) Our sec-                 CPU benchmarking suite we find the ASLR variants of
ond contribution, Swivel-CET, restores the use of all predic-          Swivel-SFI and Swivel-CET impose little overhead—they
tors, including the RSB and CBP, and partially obviates the            are at most 10.3% and 6.1% slower than Lucet, respectively.
need for BTB flushing. It does this by sacrificing backwards           Our deterministic implementations, which eliminate all three
compatibility and using new hardware security extensions:              categories of attacks, incur modest overheads: Swivel-SFI
Intel’s Control-flow Enforcement Technology (CET) [39] and             and Swivel-CET are respectively 3.3%–86.1% (geomean:
Memory Protection Keys (MPK) [39].                                     47.3%) and 8.0%–240.2% (geomean: 96.3%) slower than
   Like Swivel-SFI, Swivel-CET relies on linear blocks to              Lucet. These overheads are smaller than the overhead im-
address sandbox breakout attacks. But Swivel-CET does not              posed by state-of-the-art fence-based techniques.
avoid ret instructions. Instead, we use Intel® CET’s hardware
shadow stack to ensure that the RSB cannot be misused to               Open source and data We make all source and data avail-
speculatively return to a location that is different from the          able under an open source license at: https://swivel.pro
expected function return site on the stack [39].                       gramming.systems.

                                                                   2
Wasm Client A   Wasm Client B
                                                           ...   Wasm Client X                  the host context and resume execution in the host.
      FaaS host procees

                           Trampoline      Trampoline             Trampoline
                                                                                                Using Wasm in FaaS platforms WebAssembly FaaS plat-
                                             Springboard
                                                                                 Requests       forms like Fastly’s Terrarium allow clents to deploy scal-
                                            FaaS Runtime
                                                                                                able function-oriented Web and cloud applications written in
                                                                                                any langauge (that can be compiled to Wasm). Clients com-
Figure 1: FaaS platform using Wasm to isolate mutually distrusting tenants.                     pile their code to Wasm and upload the resulting module to
2         A brief overview of Wasm and Spectre                                                  the platform; the platform handles scaling and isolation. As
                                                                                                shown in Figure 1, FaaS platforms place thousands of client
In this section, we give a brief overview of WebAssembly’s                                      Wasm modules within a single host process, and distribute
use in multi-tenant serverless and edge-cloud computing                                         these processes across thousands of servers and multiple data-
platforms—and more generally function as a service (FaaS)                                       centers. This is the key to scaling—it allows any host process,
platforms. In particular, we describe how FaaS platforms use                                    in any datacenter, to spawn a fresh Wasm sandbox instance
Wasm as an intermediate compilation layer for isolating dif-                                    and run any client function in response to a user request. By
ferent tenants today. We then briefly review Spectre attacks                                    using Wasm as an intermediate layer, FaaS platforms isolate
and describe how today’s approach to isolating Wasm code is                                     the client for free [6, 30, 60, 65, 66, 88]. Unfortunately, this
vulnerable to this class of attacks.                                                            isolation does not hold in the presence of Spectre attacks.
2.1                       WebAssembly
                                                                                                2.2    Spectre attacks
Wasm is a low-level 32-bit machine language explicitly de-
signed to embed C/C++ and Rust code in host applications.                                       Spectre attacks exploit hardware predictors to induce mis-
Wasm programs (which are simply a collection of functions)                                      predictions and speculatively execute instructions—gadgets—
are (1) deterministic and well-typed, (2) follow a structured                                   that would not run sequentially [49]. Spectre attacks are clas-
control flow discipline, and (3) separate the heap—the linear                                   sified by the hardware predictor they exploit [12]. We focus
memory—from the well-typed stack and program code. These                                        on the three Spectre variants that hijack control flow:
properties make it easy for compilers like Lucet to sandbox
Wasm code and safely embed it within an application [27].                                       I Spectre-PHT Spectre-PHT [49] exploits the pattern his-
                                                                                                  tory table (PHT), which is used as part of the conditional
Control flow safety Lucet’s code generator, Cranelift, en-                                        branch predictor (CBP) to guess the direction of a condi-
sures that the control flow of the compiled code is restricted to                                 tional branch while the condition is still being evaluated.
the sandbox and cannot be bent to bypass bounds checks (e.g.,                                     In a Spectre-PHT attack, the attacker (1) pollutes entries
via return-oriented programming). This comes directly from                                        in the PHT so that a branch is mispredicted to the wrong
preserving Wasm’s semantics during compilation. For ex-                                           path. The attacker can then use this wrong-path execution
ample, compiled code preserves Wasm’s safe stack [27, 52],                                        to bypass memory isolation guards or control flow integrity.
ensuring that stack frames (and thus return values on the                                       I Spectre-BTB Spectre-BTB [49] exploits the branch target
stack) cannot be clobbered. The compiled code also enforces                                       buffer (BTB), which is used to predict the target of an
Wasm’s coarse-grained CFI and, for example, matches the                                           indirect jump [94]. In a Spectre-BTB attack, the attacker
type of each indirect call site with the type of the target.                                      pollutes entries in the BTB, redirecting speculative control
Memory isolation When Lucet creates a Wasm sandbox,                                               flow to an arbitrary target. Spectre-BTB can thus be used
it reserves 4GB of virtual memory for the Wasm heap and                                           to speculatively execute gadgets that are not in the normal
uses Cranelift to bound all heap loads and stores. To this end,                                   execution path (e.g., to carry out a ROP-style attack).
Cranelift (1) explicitly passes a pointer to the base of the                                    I Spectre-RSB Spectre-RSB [50, 58] exploits the return
heap as the first argument to each function and (2) masks all                                     stack buffer (RSB), which memorizes the location of re-
pointers to be within this 4GB range. Like previous software-                                     cently executed call instructions to predict the targets of
based isolation (SFI) systems [88], Cranelift avoids expensive                                    ret instructions. In a Spectre-RSB attack, the attacker uses
bounds check operations by using guard pages to trap any                                          chains of call or ret instructions to over- or underflow the
offsets that may reach beyond the 4GB heap space.                                                 RSB, and redirect speculative control flow in turn.
Embedding Wasm An application with embedded Wasm                                                Spectre can be used in-place or out-of-place [12]. In an in-
code will typically require context switching between the                                       place attack, the attacker mistrains the prediction for a victim
Wasm code and host—e.g., to read data from a socket. Lucet                                      branch by repeatedly executing the victim branch itself. In an
allows safe control and data flow across the host-sandbox                                       out-of-place attack, the attacker finds a secondary branch that
boundary via springboards and trampolines. Springboards                                         is congruent to the victim branch—predictor entries are in-
are used to enter Wasm code—they set up the program context                                     dexed using a subset of address bits—and uses this secondary
for Wasm execution—while trampolines are used to restore                                        branch to mistrain the prediction for the victim branch.

                                                                                            3
Branch                                        Wasm address space                      Guide                1    mov    rdx,QWORD PTR [fn_table_len] ; get fn table length
 Predictor     Scenario 1: Malicious sandbox                                            Benign               2    cmp    rcx,rdx ; check that rcx is in-bounds
   Unit        redirects itself to a Spectre               Malicious                    prediction           3    jb     index_ok

                                                                          Trampoline
               gadget                                      Sandbox
                                                                                        Spectre              4    ud2 ; trap otherwise
                                                                                        prediction
                                                                                                             5   index_ok:
                                                                                        Spectre              6    lea    rdx,[fn_table]
                                                                                        access
                                                                       Outside                               7    mov    rcx,QWORD PTR [rdx+rcx*4]
                                                                              x
                                                                       Sandbo
                                                                                        Data
                                                                                                                  call   rcx

                                                                          Trampoline
                                                                                        leakage              8
               Scenario 2: Malicious sandbox
               redirects the benign sandbox                   Victim                    Spectre
                                                                                        gadget
                                                                                                         Figure 3: A simplified snippet of the vulnerable code from our Spectre-PHT
               to a Spectre gadget                           Sandbox
                                                                                                         breakout attack. This code is safe during sequential execution (it checks
                                                                                        Sensitive
                                                                                        data             the index rcx before using it to load a function table entry). But, during

                                                                          Springboard
                                                                                                         speculative execution, control flow may bypass this check and access memory
               Scenario 3: Malicious sandbox                                                             outside the function table bounds.
               redirects the application to a        FaaS Runtime
               Spectre gadget

                                                                                                                                                 Swivel-SFI           Swivel-CET
Figure 2: A malicious tenant can fill branch predictors with invalid state                                   Attack variant
                                                                                                                                                ASLR     Det         ASLR     Det
(red). In one scenario, the attacker causes its own branches to speculatively
execute code that access memory outside of the sandbox. In the second                                                         in-place
                                                                                                             Spectre-PHT
and third scenarios, the attacker uses Spectre to respectively target a victim                                                out-of-place
sandbox or the host runtime to misspeculate and leak secret data.                                                             in-place
                                                                                                             Spectre-BTB
                                                                                                                              out-of-place
                                                                                                                              in-place
2.3          Spectre attacks on FaaS platforms                                                               Spectre-RSB
                                                                                                                              out-of-place
A malicious FaaS platform client who can upload arbitrary
Wasm code can force the Wasm compiler to emit native code                                                Table 1: Effectiveness of Swivel against different Spectre variants. A full
                                                                                                         circle indicates that Swivel eliminates the attack while a half circle indicates
which is safe during sequential execution, but uses Spectre                                              that Swivel only mitigates the attack.
to bypass Wasm’s isolation guarantees during speculative
execution. We identify three kinds of attacks (Figure 2):                                                the table, and calls it; otherwise the code traps.
I Scenario 1: Sandbox breakout attacks The attacker                                                         An attacker can mistrain the conditional branch on line 3
  bends the speculative control flow of their own module                                                 and cause it to speculatively jump to index_ok even when
  to access data outside the sandbox region. For example,                                                rcx is out-of-bounds. By controlling the contents of rcx, the
  they can use Spectre-PHT to bypass conditional bounds                                                  attacker can thus execute arbitrary code locations outside
  checks when accessing the indirect call table. Alternatively,                                          the sandbox. In Section 5.4 we demonstrate several proof
  they can use Spectre-BTB to transfer the control flow into                                             of concept attacks, including a breakout attack that uses this
  the middle of instructions to execute unsafe code (e.g., code                                          gadget. These attacks serve to highlight the importance of
  that bypasses Wasm’s implicit heap bounds checks).                                                     hardening Wasm against Spectre.
I Scenario 2: Sandbox poisoning attacks The attacker uses
  an out-of-place Spectre attack to bend the control flow                                                3       Swivel: Hardening Wasm against Spectre
  of a victim sandbox and coerce the victim into leaking                                                 Swivel extends Lucet—and the underlying Cranelift code
  their own data. Although this attack is considerably more                                              generator—to address Spectre attacks on FaaS Wasm plat-
  sophisticated, we were still able to implement proof of                                                forms. We designed Swivel with several goals in mind. First,
  concept attacks following Canella et al. [12]. Here, the                                               performance: Swivel minimizes the number of pipeline fences
  attacker finds a (mispredicted) path in the victim sandbox                                             it inserts, allowing Wasm to benefit from speculative execu-
  that leads to the victim leaking data, e.g., through cache                                             tion as much as possible. Second, automation: Swivel does
  state. They then force the victim to mispredict this path by                                           not rely on user annotations or source code changes to guide
  using a congruent branch within their own sandbox.                                                     mitigations; we automatically apply mitigations when com-
I Scenario 3: Host poisoning attacks Instead of bending                                                  piling Wasm. Finally, modularity: Swivel offers configurable
  the control flow of the victim sandbox, the attacker can use                                           protection, ranging from probabilistic schemes with high per-
  an out-of-place Spectre attack to bend the control flow of                                             formance to thorough mitigations with strong guarantees. This
  the host runtime. This allows the attacker to speculatively                                            allows Swivel users to choose the most appropriate mitiga-
  access data from the host as well as any other sandbox.                                                tions (see Table 1) according to their application domain,
Figure 3 gives an example sandbox breakout gadget. The gad-                                              security considerations, or particular hardware platform.
get is in the implementation of the Wasm call_indirect in-                                                   In the rest of this section, we describe our attacker model
struction, which is used to call functions indexed in a module-                                          and introduce a core abstraction: linear blocks. We then show
level function table. This code first compares the function                                              how linear blocks, together with several other techniques, are
index rcx to the length of the function table (to ensure that                                            used to address both sandbox breakout and poisoning attacks.
rcx points to a valid entry). If rcx is valid, it then jumps to                                          These techniques span two Swivel designs: Swivel-SFI, a
index_ok, loads the function from the corresponding entry in                                             software-only scheme which provides mitigations compatible

                                                                                                     4
with existing CPUs; and Swivel-CET, which uses hardware                 native code, Swivel ensures that a linear block is safe by:
extensions (Intel® CET and Intel® MPK) available in the 11th
generation Intel® CPUs.                                                 Masking memory accesses Since we cannot make any as-
                                                                        sumptions about the initial contents of registers, Swivel en-
Attacker model We assume that the attacker is a FaaS plat-
                                                                        sures that unconditional heap bounds checks (performed via
form client who can upload arbitrary Wasm code which the
                                                                        masking) are performed in the same linear block as the heap
platform will then compile and run alongside other clients
                                                                        memory access itself. We do this by modifying the Cranelift
using Swivel. The goal of the attacker is to read data sensi-
                                                                        optimization passes which could lift bounds checks (e.g., loop
tive to another (victim) client using Spectre attacks. In the
                                                                        invariant code motion) to ensure that they don’t move masks
Swivel-CET case, we only focus on exfiltration via the data
                                                                        across linear block boundaries. Similarly, since we cannot
cache—and thus assume an attacker who can only exploit
                                                                        trust values on the stack, Swivel ensures that any value that
gadgets that leak via the data cache. We consider transient at-
                                                                        is unspilled from the stack and used in a bounds check is
tacks that exploit the memory subsystem (e.g., Meltdown [57],
                                                                        masked again. We use this mask-after-unspill technique to
MDS [11, 72, 84], and LVI [83]) out of scope and discuss this
                                                                        replace Cranelift’s unsafe mask-before-spill approach.
in detail in Section 6.2.
   We assume that our Wasm compiler and runtime are cor-
                                                                        Pinning the heap registers To properly perform bounds
rect. We similarly assume the underlying operating system
                                                                        checks for heap memory accesses, a Swivel linear block must
is secure and the latest CPU microcode updates are applied.
                                                                        determine the correct value of the heap base. Unfortunately, as
We assume hyperthreading is disabled for any Swivel scheme
                                                                        described above, we cannot make any assumptions about the
except for the deterministic variant of Swivel-CET. Consis-
                                                                        contents of any register or stack slot. Swivel thus reserves one
tent with previous findings [94], we assume BTBs predict the
                                                                        register, which we call the pinned heap register, to store the
lower 32-bits of target addresses, while the upper 32-bits are
                                                                        address of the sandbox heap. Furthermore, Swivel prevents
inferred from the instruction pointer.
                                                                        any instructions in the sandbox from altering the pinned heap
   Swivel addresses attackers that intentionally extract infor-
                                                                        register. This allows each linear block to safely assume that
mation using Spectre. We do not prevent clients from acciden-
                                                                        the pinned heap register holds the correct value of the heap
tally leaking secrets during sequential execution and, instead,
                                                                        base, even when the speculative control flow of the program
assume they use techniques like constant-time programming
                                                                        has gone awry due to misprediction.
to prevent such leaks [14]. For all Swivel schemes except the
deterministic variant of Swivel-CET, we assume that a sand-
                                                                        Hardening jump tables Wasm requires bounds checks on
box cannot directly invoke function calls in other sandboxes,
                                                                        each access to indirect call tables and switch tables. Swivel
i.e., it cannot control the input to another sandbox to perform
                                                                        ensures that each of these bounds checks is local to the lin-
an in-place poisoning attack. We lastly assume that host se-
                                                                        ear block where the access is performed. Moreover, Swivel
crets can be protected by placing them in a Wasm sandbox,
                                                                        implements the bounds check using speculative load hard-
and discuss this further in Section 6.1.
                                                                        ening [13], masking the index of the table access with the
                                                                        length of the table. This efficiently prevents the attacker from
3.1    Linear blocks: local Wasm isolation                              speculatively bypassing the bounds check.
To enforce Wasm’s isolation sequentially and speculatively,                Swivel does not check the indirect jump targets beyond
Swivel-SFI and Swivel-CET compile Wasm code to linear                   what Cranelift does. At the language level, Wasm already
blocks (LBs). Linear blocks are straight-line code blocks that          guarantees that the targets of indirect jumps (i.e., the entries
do not contain any control flow instructions except for their           in indirect call tables and switch tables) can only be the tops of
terminators—this is in contrast to traditional basic blocks,            functions or switch-case targets. Compiled, these correspond
which typically do not consider function calls as terminators.          to the start of Swivel linear blocks. Thus, an attacker can only
This simple distinction is important: It allows us to ensure that       train the BTB with entries that point to linear blocks, which,
all control flow transfers—both sequential and speculative—             by construction, are safe to execute in any context.
land on linear block boundaries. Then, by ensuring that indi-
vidual linear blocks are safe, we can ensure that whole Wasm            Protecting returns Wasm’s execution stack—in particular,
programs, when compiled, are confined and cannot violate                return addresses on the stack—cannot be corrupted during
Wasm’s isolation guarantees.                                            sequential execution. Unfortunately, this does not hold specu-
   A linear block is safe if, independent of the control flow           latively: An attacker can write to the stack (e.g., with a buffer
into the block, Wasm’s isolation guarantees are preserved.              overflow) and speculatively execute a return instruction which
In particular, we cannot rely on safety checks (e.g., bounds            will divert the control flow to their location of choice. Swivel
checks for memory accesses) performed across linear blocks              ensures that return addresses on the stack cannot be corrupted
since, speculatively, blocks may not always execute in sequen-          as such, even speculatively. We do this using a separate stack
tial order (e.g., because of Spectre-BTB). When generating              or shadow stack [8], as we detail below.

                                                                    5
Spectre-unsafe Wasm compilation           Software-only Swivel                                              Hardware-assisted Swivel

               block1:                                     lblock1_1:            Linear block                                 lblock1_1:
               instx                                       instx                                                              instx
               insty                                       insty                                                              insty
               mask                              store block4,                                                 store block4, 
               store block4,                          call                                                          call 
               call                     block4:
               instz                         instx
                                                                                  block4:                                                                                       CET endbranch
               access               insty                                                                                                   Register interlocking
                                                           lblock1_2:             instx
               jmp block2                    retn                                                                             lblock1_2:
                                                           intz                   insty
                                                           mask         load sstack,                           register_interlock                  block4:
                                                           access        jmp                                    intz                                endbranch
                   block2:                                 jmp block2                                                         mask                      register_interlock
                                                                                        Separate stack
                   instc                                                                                                      access                     instx
                   cmp                                                                                                  jmp block2                          insty
                   jnz block3                                                                       RSB to BTB +                                                  retn
                                                                                                    separate stack
                                                           block2:                                                                                                     Shadow stack
                                                           instc                                                               block2:
               instd          block3:                      mov 0,                                                         register_interlock
                                                                                                                                                                              CET shadow stack
               jmp       inste                        cmp                                                           instc
                              ...                          cmov Lblock3,                                                  cmp 
                                                           jmp               CBP to BTB                                   jnz block3
             Key
                   Direct branch: jmp/call block?                                                                           instd          block3:
                   Indirect branch: jmp/call        instd        block3:                                               jmp       inste
                                                         jmp     inste                                                                ...
                   Conditional Branch: jCC block?                     ...
                   x86 return instruction

Figure 4: Swivel hardens Wasm against spectre via compiler transformations. In Swivel-SFI, we convert basic blocks to linear blocks. Each linear block (e.g.,
lblock1_1 and lblock1_2) maintains local security guarantees for speculative execution. Then, we protect the backward edge (block4) by replacing the
return instructions and using a separate return stack. To eliminate poisoning attacks, in the deterministic version of Swivel-SFI, we further encode conditional
branches as indirect jumps. In Swivel-CET, we similarly use linear blocks, but we allow return instruction, and protect returns using the hardware shadow stack.
To reduce BTB flushes, we additionally use Intel® CET’s endbranch to ensure that targets of indirect branches land at the beginning of linear blocks. In the
deterministic version, we avoid BTB flushing and instead use register interlocking to prevent leakage on misspeculated paths.

3.2      Swivel-SFI                                                                              3.2.2               Addressing sandbox and host poisoning attacks
Swivel-SFI builds on top of linear blocks to address all three                                  There are two ways for a malicious sandbox to carry out poi-
classes of attacks.                                                                             soning attacks: By poisoning CBP or BTB entries. Since we
                                                                                                already flush the BTB to address sandbox breakout attacks,
3.2.1     Addressing sandbox breakout attacks
                                                                                                we trivially prevent all BTB poisoning. Addressing CBP poi-
Compiling Wasm code to linear blocks eliminates most av-                                        soning is less straightforward. We consider two schemes:
enues for breaking out of the sandbox. The only way for an
                                                                                                 Mitigating CBP poisoning To mitigate CBP-based poison-
attacker to break out of the sandbox is to speculatively jump
                                                                                                 ing attacks, we use ASLR to randomize the layout of sandbox
into the middle of a linear block. We prevent this with:
                                                                                                 code pages. This mitigation is not sound—it is theoretically
The separate stack We protect returns by preserving                                              possible for an attacker to influence a specific conditional
Wasm’s safe return stack during compilation. Specifically,                                       branch outside of the sandbox. As we discuss in Section 3.4,
we create a separate stack in a fixed memory location, but                                       this raises the bar to (at least) that of process isolation: The
outside the sandbox stack and heap, to ensure that it cannot be                                  attacker would would have to (1) de-randomize the ASLR of
overwritten by sandboxed code. We replace every call instruc-                                    both their own sandbox and the victim’s and (2) find useful
tion with an instruction sequence that stores the address of                                     gadgets, which is itself an open problem (§7).
the subsequent instruction—the return address—to the next
                                                                                                 Eliminating CBP poisoning Clients that are willing to tol-
entry in this separate stack. Similarly, we replace every return
                                                                                                 erate modest performance overheads (§5) can opt to eliminate
instruction with a sequence that pops the address off the sep-
                                                                                                 poisoning attacks. We eliminate poisoning attacks by remov-
arate stack and jumps to that location. To catch under- and
                                                                                                 ing conditional branches from Wasm sandboxes altogether.
over-flows, we surround the separate stack with guard pages.
                                                                                                 Following [54], we do this by using the cmov conditional move
BTB flushing The other way an attacker can jump into the                                         instruction to encode each conditional branch as an indirect
middle of a linear block is via a mispredicted BTB entry.                                        branch with only two targets (Figure 4).
Since all indirect jumps inside a sandbox can only point to
the tops of linear blocks, any such entries can only be set via                                  3.3            Swivel-CET
a congruent entry outside any sandbox—i.e., an attacker must                                     Swivel-SFI avoids using fences to address Spectre attacks,
orchestrate the host runtime into mistraining a particular jump.                                 but ultimately bypasses all but the BTB predictors—and even
We prevent such attacks by flushing the BTB on transitions                                       then we flush the BTB on every sandbox transition. Swivel-
into and out of the sandbox.2                                                                    CET uses Intel® CET [39] and Intel® MPK [39] to restore
   2 In practice, BTB predictions are not absolute (as discussed in our attacker                To ensure that this does not result in predictions at non linear block bound-
model), instead they are 32-bit offsets relative to the instruction pointer [94].               aries, we restrict the sandbox code size to 4GB.

                                                                                         6
the use of the CBP and RSB, and avoid BTB flushing.3                                 sensitive data. As with Swivel-SFI, we consider both a proba-
                                                                                     bilistic and deterministic design to addressing these attacks.
3.3.1   Addressing sandbox breakout attacks
                                                                                     Since the probabilistic approach is like Swivel-SFI’s, we de-
Like Swivel-SFI, we build on linear blocks to address sandbox                        scribe only the deterministic design.
breakout attacks (Figure 4). Swivel-CET, however, prevents                           Preventing leaks under poisoned execution Swivel-CET
an attacker from speculatively jumping into the middle of a                          does not eliminate cross-sandbox CBP or BTB poisoning.
linear block using:                                                                  Instead, we ensure that a victim sandbox cannot be coerced
The shadow stack Swivel-CET uses Intel® CET’s shadow                                 into leaking data via the cache when executing a mispredicted
stack to protect returns. Unlike Swivel-SFI’s separate stack,                        path. To leak secrets through the cache, the attacker must
the shadow stack is a hardware-maintained stack, distinct from                       maneuver the secret data to a gadget that will use it as an
the ordinary data stack and inaccessible via standard load and                       offset into a memory region. In Cranelift, any such gadget
store instructions. The shadow stack allows us to use call and                       will use the heap, as stack memory is always accessed at
return instructions as usual—the CPU uses the shadow stack                           constant offsets from the stack pointer (which itself cannot be
to check the integrity of return addresses on the program stack                      directly assigned). We thus need only prevent leaks that are
during both sequential and speculative execution.                                    via the Wasm heap—we do this using register interlocks.
Forward-edge CFI Instead of flushing the BTB, Swivel-                                Register interlocking Our register interlocking technique
CET uses Intel® CET’s coarse-grained control flow integrity                          tracks the control flow of a Wasm program and prevents it
(CFI) [1] to ensure that sandbox code can only jump to the                           from accessing its stack or heap when the speculative path
top of a linear block. We do this by placing an endbranch in-                        diverges from the sequential path. We first assign each non-
struction at the beginning of every linear block that is used as                     trivial linear block a unique 64-bit block label. We then calcu-
an indirect target (e.g., the start of a function that is called in-                 late the expected block label of every direct or indirect branch
directly). During speculative execution, if the indirect branch                      and assign this value to a reserved interlock register prior to
predictor targets an instruction other than an endbranch (e.g.,                      branching. At the beginning of each linear block, we check
inside the host runtime), the CPU stops speculating [39].                            that the value of the interlock register corresponds to the static
                                                                                     block label using cmov instructions. If the two do not match,
Conditional BTB flushing When using ASLR to address                                  we zero out the heap base register as well as the stack register.
sandbox poisoning attacks, we still need to flush the BTB                            Finally, we unmap pages from the address space to ensure
on transitions into each sandbox. Otherwise, one sandbox                             that any access from a zero heap or stack base will fault—and
could potentially jump to a linear block in another sandbox.                         thus will not affect cache state.
Our deterministic approach to sandbox poisoning (described                              The register interlock fundamentally introduces a data de-
below), however, eliminates BTB flushes altogether.                                  pendency between memory operations and the resolution of
3.3.2   Addressing host poisoning attacks                                            control flow. In doing so, we prevent any memory operations
                                                                                     that would result in cache based leaks, but do not prevent all
To prevent host poisoning attacks, Swivel-CET uses Intel®                            speculative execution. In particular, any arithmetic operations
MPK. Intel® MPK exposes new user mode instructions that                              may still be executed speculatively. This is similar to hard-
allow a process to partition its memory into sixteen linear                          ware taint tracking [93], but enforced purely through compiler
regions and to selectively enable/disable read/write access                          changes.
to any of those regions. Swivel-CET uses only two of these                              Finally, Wasm also stores certain data (e.g., globals vari-
protection domains—one for the host and one shared by all                            ables and internal structures) outside the Wasm stack or heap.
sandboxes—and switches domains during the transitions be-                            To secure these memory accesses with the register interlock,
tween host and sandbox. When the host creates a new sandbox,                         we introduce an artificial heap load in the address computation
Swivel-CET allocates the heap memory for the new sandbox                             for this data.
with the sandbox protection domain, and then relinquishes its
own access to that memory so that it is no longer accessible by                      3.4     Security and performance trade-offs
the host. This prevents host poisoning attacks by ensuring that                      Swivel offers two design points for protecting Wasm modules
the host cannot be coerced into leaking secrets from another                         from Spectre attacks: Swivel-SFI and Swivel-CET. For each
sandbox. We describe how we safely copy data across the                              of these schemes we further consider probabilistic (ASLR)
boundary later (§4).                                                                 and deterministic techniques. In this section, we discuss the
                                                                                     performance and security trade-offs when choosing between
3.3.3   Addressing sandbox poisoning attacks
                                                                                     these various Swivel schemes.
By poisoning CBP or BTB entries, a malicious sandbox can                             3.4.1   Probabilistic or deterministic?
coerce a victim sandbox into executing a gadget that leaks
                                                                                     Table 1 summarizes Swivel’s security guarantees. Swivel’s
   3 Appendix   A.1 gives a brief introduction to these new hardware features.       deterministic schemes eliminate Spectre attacks, while the

                                                                                 7
Swivel-SFI         Swivel-CET           only to mitigate sandbox poisoning attacks—and unlike sand-
 Swivel protection and technique
                                        ASLR     Det       ASLR     Det          box breakout attacks, these attacks are significantly more
 Sandbox breakout protections                                                    challenging for an attacker to carry out: They must conduct
 - Linear blocks [CBP, BTB, RSB]                                                 an out-of-place attack on a specific target while accounting
 - BTB flush in springboard [BTB]                                                for the unpredictable mapping of the branch predictor. To our
 - Separate control stack [RSB]                                                  knowledge, such an attack has not been demonstrated, even
 - CET endbranch [BTB]
                                                                                 without the additional challenges of defeating ASLR.
 - CET shadow stack [RSB]
                                                                                    Furthermore, on a FaaS platform, these attacks are even
 Sandbox poisoning protections
 - BTB flush in springboard [BTB]
                                                                                 harder to pull off, as the attacker has only a few hundred
 - Code page ASLR [CBP]                                                          milliseconds to land an attack on a victim sandbox instance
 - Direct branches to indirect [CBP]                                             before it finishes—and the next victim instance will have
 - Register interlock [CBP, BTB]                                                 entirely new mappings. Previous work suggests that such an
 Host poisoning protections                                                      attack is not practical in such a short time window [18].
 - Separate control stack [RSB]                                                     For other application domains, the overhead of the deter-
 - Code page ASLR [CBP]                                                          ministic Swivel variants may yet be reasonable. As we show
 - BTB flush in trampoline [BTB]
                                                                                 in Section 5, the average (geometric) overhead of Swivel-SFI
 - Direct branches to indirect [CBP]
                                                                                 is 47.3% and that of Swivel-CET is 96.3%. Moreover, users
 - Two domain MPK [CBP]
                                                                                 can choose to use Swivel-SFI and Swivel-CET according to
Table 2: Breakdown of Swivel’s individual protection techniques which,           their trust model—Swivel allows sandboxes of both designs
when combined, address the thee different class of attacks on Wasm (§2.3).       to coexist within a single process.
For each technique we also list (in brackets) the underlying predictors.
                                                                                 3.4.2   Software-only or hardware-assisted?
probabilistic schemes eliminate Spectre attacks that exploit
the BTB and RSB, but trade-off security for performance                          Swivel-SFI and Swivel-CET present two design points that
when it comes to the CBP (§5): Our probabilistic schemes                         have different trade-offs beyond backwards compatibility. We
only mitigate Spectre attacks that exploit the CBP.                              discuss their trade-offs, focusing on the deterministic variants.
   To this end, (probabilistic) Swivel hides branch offsets by                      Swivel-SFI eliminates Spectre attacks by allowing spec-
randomizing code pages. Previously, similar fine-grain ap-                       ulation only via the BTB predictor and by controlling BTB
proaches to address randomization have been proposed to mit-                     entries through linear blocks and BTB flushing. Swivel-CET,
igate attacks based on return-oriented programming [15, 22].                     on the other hand, allows the other predictors. To do this
Specifically, when loading a module, Swivel copies the code                      safely though, we use register interlocking to create data
pages of the Wasm module to random destinations, random-                         dependencies (and thus prevent speculation) on certain op-
izing all but the four least significant bits (LSBs) to keep                     erations after branches. Our interlock implementation only
16-byte alignment. This method is more fine-grained than                         guards Wasm memory operations—this means that, unlike
page remapping, which would fail to randomize the lower 12                       Swivel-SFI, Swivel-CET only prevents cache-based sand-
bits for 4KB instruction pages.                                                  box poisoning attacks. While non-memory instructions (e.g.,
   Unfortunately, only a subset of address bits are typically                    arithmetic operations) can still speculatively execute, register
used by hardware predictors. Zhang et. al [94], for example,                     interlocks sink performance: Indeed, the overall performance
found that only the 30 LSBs of the instruction address are                       overhead of Swivel-CET is higher than Swivel-SFI (§5).
used as input for BTB predictors. Though a similar study has                        At the same time, Swivel-CET can be used to handle a
not been conducted for the CBP, if we pessimistically assume                     more powerful attacker model (than our FaaS model). First,
that 30 LSBs are used for prediction then our randomization                      Swivel-CET eliminates poisoning attacks even in the pres-
offers at least 26 bits of entropy. Since the attacker must de-                  ence of attacker-controlled input. This is a direct corollary of
randomize both their module and the victim module, this is                       being able to safely execute code with poisoned predictors.
likely higher in practice.                                                       Second, Swivel-CET (in the deterministic scheme) is safe in
   As we show in Section 5, the ASLR variants of Swivel are                      the presence of hyperthreading; our other Swivel schemes
faster than the deterministic variants. Using code page ASLR                     assume that hyperthreading is disabled (§3). Swivel-CET al-
imposes less overhead than the deterministic techniques (sum-                    lows hyperthreading because it doesn’t rely on BTB flushing;
marized in Table 2). This is not surprising: CBP conversion                      it uses register interlocking to eliminate sandbox poisoning
(in Swivel-SFI) and register interlocking (in Swivel-CET) are                    attacks. In contrast, our SFI schemes require the BTB to be
the largest sources of performance overhead.                                     isolated for the host and each Wasm sandbox—an invariant
   For many application domains, this security-performance                       that may not hold if, for example, hyperthreading interleaves
trade-off is reasonable. Our probabilistic schemes use ASLR                      host application and Wasm code on sibling threads.

                                                                             8
4     Implementation                                                              transition functions to switch between the application and
                                                                                  sandbox Intel® MPK domains.
We implement Swivel on top of the Lucet Wasm compiler
                                                                                     Since Intel® MPK blocks the application from accessing
and runtime [30, 60]. In this section, we describe our modifi-
                                                                                  sandbox memory, we add primitives that briefly turn off Intel®
cations to Lucet.
                                                                                  MPK to copy memory into and out of sandboxes. We imple-
   We largely implement Swivel-SFI and Swivel-CET as
                                                                                  ment these primitives using the rep instruction prefix instead
passes in Lucet’s code generator Cranelift. For both schemes,
                                                                                  of branching code, ensuring that the primitives are not vulner-
we add support for pinned heap registers and add direct jmp
                                                                                  able to Spectre attacks during this window.
instructions to create linear block boundaries. We modify
                                                                                     Finally, we add Intel® CET support to both the Rust
Cranelift to harden switch-table and indirect-call accesses:
                                                                                  compiler—used to compile Lucet—and to Lucet itself so
Before loading an entry from either table, we truncate the
                                                                                  that the resulting binaries are compatible with the hardware.
index to the length of the table (or the next power of two)
using a bitwise mask. We also modify Lucet’s stack overflow                       5     Evaluation
protection: Lucet emits conditional checks to ensure that the
                                                                                  We evaluate Swivel by asking four questions:
stack does not overflow; these checks are rare and we simply
use lfences.                                                                      I What is the overhead of Wasm execution? (§5.1)
   We modify the springboard and trampoline transition func-                        Swivel’s hardening schemes make changes to the code
tions in the Lucet runtime. Specifically, we add a single                           generated by the Lucet Wasm compiler. We examine
lfence to each transition function since we must disallow                           the performance impact of these changes on Lucet’s
speculation from crossing the host-sandbox boundary.                                Sightglass benchmark suite [9] and Wasm-compatible
   The deterministic defenses for both Swivel-SFI and Swivel-                       SPEC 2006 [29] benchmarks.
CET—CBP conversion and register interlocks—increase the                           I What is the overhead of transitions? (§5.2) Swivel mod-
cost of conditional control flow. To reduce the number of                           ifies the trampolines and springboards used to transition
conditional branches, we thus enable explicit loop unrolling                        into and out of Wasm sandboxes. The changes vary across
flags when compiling the deterministic schemes.4 This is                            our different schemes—from adding lfences, or flushing
not necessary for the ASLR-based variants since they do not                         the BTB during one or both transition directions, to switch-
modify conditional branches. Indeed, the ASLR variants are                          ing Intel® MPK domains. We measure the impact of these
straightforward modifications to the dynamic library loader                         changes on transition costs using a microbenchmark.
used by the Lucet runtime: Since all sandbox code is position                     I What is the end-to-end overhead of Swivel? (§5.3) We
independent, we just copy a new sandbox instance’s code and                         examine the impact of Wasm execution overhead and transi-
data pages to a new randomized location in memory.                                  tion overhead on a webserver that runs Wasm services. We
   We also made changes specific to each Swivel scheme:                             measure the impact of Swivel protections on five different
                                                                                    Wasm workloads running on this webserver.
Swivel-SFI We augment the Cranelift code generation pass
                                                                                  I Does Swivel eliminate Spectre attacks? (§5.4) We eva-
to replace call and return instructions with the Swivel-SFI
                                                                                    lute the security of Swivel, i.e., whether Swivel prevents
separate stack instruction sequences and we mask pointers
                                                                                    sandbox breakout and poisoning attacks, by implementing
when they are unspilled from the stack. For the deterministic
                                                                                    several proof-of-concept Spectre attacks.
variant of Swivel-SFI, we also replace conditional branches
with indirect jump instructions, as described in Section 3.2.                     Machine setup We run our benchmarks on a 4-core,
   To protect against sandbox poisoning attacks (§2.3), we                        8-thread Tigerlake CPU software development platform
flush the BTB during the springboard transition into any sand-                    (2.7GHz with a turbo boost of 4.2GHz) supporting the Intel®
box. Since this is a privileged operation, we implement this                      CET extension. The machine has 16 GB of RAM and
using a custom Linux kernel module.                                               runs 64-bit Fedora 32 with the 5.7.0 Linux kernel modified
                                                                                  to include Intel® CET support [34]. Our Swivel modifica-
Swivel-CET In the Swivel-CET code generation pass, we                             tions are applied to Lucet version 0.7.0-dev, which includes
place endbranch instructions at each indirect jump target in                      Cranelift version 0.62.0. We perform benchmarks on stan-
Wasm to enable Intel® CET protection. These indirect jump                         dard SPEC CPU 2006, and Sightglass version 0.1.0. Our
targets include switch table entries and functions which may                      webserver macrobenchmark relies on the Rocket webserver
be called indirectly. We also use this pass to emit the register                  version 0.4.4, and we use wrk version 4.1.0 for testing.
interlocks for the deterministic variant of Swivel-CET.
   We adapt the springboard and trampoline transition func-                       5.1    Wasm execution overhead
tions to ensure that all uses of jmp, call and return conform                     We measure the impact of Swivel’s Spectre mitigations on
to the requirements of Intel® CET. We furthermore use these                       Wasm performance in Lucet using two benchmark suites:
    4 Forsimplicity, we do this in the Clang compiler when compiling appli-       I The Sightglass benchmark suite [9], used by the Lucet
cations to Wasm and not in Lucet proper.                                            compiler, includes small functions such as cryptographic

                                                                              9
32.8×                                                                                                                        348%                         468% 333% 392%
                                                                                                                                                                                                    349%                     476%
                                      LoadLfence              Blade                                                                            300%         Stock-Unrolled              SFI-Det
                          25×         Strawman
Relative execution time

                                                                                                                                               250%         SFI-ASLR                    CET-Det

                                                                                                                          Execution overhead
                                                                                                                                                            CET-ASLR
                          20×
                                                                                                                                               200%

                          15×                                                                                                                  150%

                                                                                                                                               100%
                          10×
                                                                                                                                               50%
                          5×
                                                                                                                                                0%
                          1×                                                                                                                   -50%
                                   se n

                                                                                                                                                         se n
                                        k

                                                                                                                                                 m maatr k
                                   rc t
                                  st at2

                                                                                                                                                         rc t
                                                                                                                                                        st at2
                                  st chr

                                                                                                                                                        st chr
                                  m 20

                                                                                                                                                        m 20
                             ed ct 64
                                 25 ype

                                     fi 9
                             he g b2

                            G ach 20

                                                                                                                                                   ed ct 64
                                                                                                                                                           51e
                                                                                                                                                           fi 9
                                                                                                                                                   he g b2

                                                                                                                                                  G ch 20
                                ke so i

                                                                                                                                                      ke so i
                                    c rt

                             em tr ix
                                  m ix2

                                                                                                                                                          c rt

                                                                                                                                                   em tr ix
                                                                                                                                                 ne m ov2

                                                                                                                                                      nd om

                                                                                                                                                          lim2
                                   si it

                                                                                                                                                         si it
                                  st ve

                                  st len
                               sw it k
                            xb i ch
                              h bla 2
                          neest ini ve
                          neste dlo sv
                            st dlo op

                               ra oo 2
                             ra nd p3
                              ra do m
                                 te m2

                                        n

                                                                                                                                                        st ve

                                                                                                                                                        st len
                                                                                                                                                     swwit k
                                                                                                                                                 xcbla itchh
                                                                                                                                                    ha bla 2
                                                                                                                                                ne st ini e
                                                                                                                                                neste dlo sv
                                                                                                                                                  st dlo op

                                                                                                                                                     ra oo 2
                                                                                                                                                   ra nd p3

                                                                                                                                                              n
                                        l

                                                                                                                                                               l
                                st rca

                                                                                                                                                      st rca
                                 m ca

                                                                                                                                                       m ca
                                 ap im

                                                                                                                                                       ap im
                                 sw to

                                                                                                                                                       s rto
                                ba an

                                                                                                                                                      ba an

                                                                                                                                                       25 yp
                                     51

                                  l p

                           xc la tch

                                     ea

                                                                                                                                                        m ix

                                                                                                                                                        l p

                                                                                                                                                       te m

                                                                                                                                                             c

                                                                                                                                                           ea
                                    lim
                           m maatr

                                n o
                                e c

                                                                                                                                                      e c
                                     e

                                                                                                                                                           e
                           n m o

                              eo a

                                                                                                                                                    eo a
                               ed o

                                                                                                                                                     ed o
                                     r

                                                                                                                                                           r
                                     r

                                                                                                                                                           r
                                     r

                                                                                                                                                    ra o
                                                                                                                                                       rm
                                 rm

                                                                                                                                               ke
                    ke

                                                                                                                                                  x
                                                                                                                                      ac
    ac

                                                 (a) Fence scheme overhead on Sightglass                                                                               (b) Swivel scheme overhead on Sightglass

   Figure 5: Performance overhead of Swivel on the Sightglass benchmarks. (a) On Sightglass, the baseline schemes LoadLfence, Strawman, and Mincut incur
   geomean overheads of 8.7×, 6.9×, and 2.4× respectively. (b) In contrast, the Swivel schemes perform much better where the ASLR versions of Swivel-SFI and
   Swivel-CET incur geomean overheads of 5.5% and 4.2% respectively. With deterministic sandbox poisoning mitigations, these overheads are 61.9% and 99.7%.

                          20×                                                                                                                  250%
Relative execution time

                                                                                                                          Execution overhead
                                        LoadLfence                                                                                             200%                                                      Stock-Unrolled         SFI-Det
                          15×           Strawman                                                                                                                                                         SFI-ASLR               CET-Det
                                        Mincut                                                                                                 150%                                                      CET-ASLR

                          10×                                                                                                                  100%

                                                                                                                                               50%
                          5×
                                                                                                                                                0%
                          1×
                                                                                                                                               -50%
                                                                                                           r
                                 2

                                            cf

                                                        ilc

                                                                       d

                                                                                      m

                                                                                               bm

                                                                                                                      n

                                                                                                                                                                                                                                  r
                                                                                                                                                       2

                                                                                                                                                                  cf

                                                                                                                                                                             ilc

                                                                                                                                                                                          d

                                                                                                                                                                                                     um

                                                                                                                                                                                                                    m

                                                                                                                                                                                                                                            n
                                                                                                           ta

                                                                                                                                                                                                                                ta
                                 ip

                                                                      m

                                                                                                                     ea

                                                                                                                                                      ip

                                                                                                                                                                                          m

                                                                                                                                                                                                                                          ea
                                           m

                                                                                                                                                                 m
                                                                                  tu

                                                                                                                                                                                                                   lb
                                                        m

                                                                                                                                                                             m
                                                                                                         as

                                                                                                                                                                                                                               as
                                bz

                                                                                                                                                    bz
                                                                    na

                                                                                              l

                                                                                                                                                                                        na

                                                                                                                                                                                                    nt
                                                                                                                    m
                                        9_

                                                                                           0_

                                                                                                                                                                                                                                          m
                                                                                                                                                                9_

                                                                                                                                                                                                                0_
                                                                                 an
                                                     3_

                                                                                                                                                                         3_
                                                                                                     3_

                                                                                                                                                                                                                           3_
                                                                                                                                                                                                     a
                           1_

                                                                                                                                                 1_
                                                                                                                eo
                                                                4_

                                                                                                                                                                                                                                      eo
                                                                                                                                                                                    4_
                                      42

                                                                                          47

                                                                                                                                                           42

                                                                                                                                                                                                              47
                                                   43

                                                                                                                                                                        43
                                                                             qu

                                                                                                                                                                                                  qu
                                                                                                    47

                                                                                                                                                                                                                          47
                          40

                                                                                                                                               40
                                                              44

                                                                                                                                                                                   44
                                                                                                                G

                                                                                                                                                                                                                                      G
                                                                           lib

                                                                                                                                                                                              lib
                                                                        2_

                                                                                                                                                                                           2_
                                                                      46

                                                                                                                                                                                         46
                                               (a) Fence scheme overhead on SPEC 2006                                                                                (b) Swivel scheme overhead on SPEC 2006

   Figure 6: Performance overhead of Swivel on SPEC 2006 benchmarks. (a) On SPEC 2006, the baseline schemes LoadLfence, Strawman, and Mincut incur
   overheads of 7.3×–19.6×, 1.8×–7.3×, and 1.2×–5.4× respectively. (b) In contrast, the Swivel schemes perform much better where the ASLR versions of
   Swivel-SFI and Swivel-CET incur overheads of at most 10.3% and 6.1% respectively. With deterministic sandbox poisoning mitigations, these overheads are
   3.3%–86.1% and 8.0%–240.2% respectively.

  primitives (ed25519, xchacha20), mathematical functions                                                                                       run them with the default settings. Sightglass repeats each test
  (ackermann, sieve), and common programming utilities                                                                                          at least 10 times or for a total of 100ms (whichever occurs
  (heapsort, strcat, strtok).                                                                                                                   first) and reports the median runtime. We compare Swivel’s
I SPEC CPU 2006 is a popular performance benchmark                                                                                              performance overhead with respect to the performance of the
  that includes various integer and floating point workloads                                                                                    same benchmarks when using the stock Lucet compiler. For
  from typical programs. We evaluate on only the subset                                                                                         increased measurement consistency on short-running bench-
  of the benchmarks from SPEC 2006 that are compatible                                                                                          marks, while measuring Sightglass we pin execution to a
  with Wasm and Lucet. This excludes programs written                                                                                           single core and disable CPU frequency scaling.6
  in Fortran, programs that rely on dynamic code rewriting,                                                                                     Baseline schemes In addition to our comparison against
  programs that require more than 4GB of memory which                                                                                           Stock Lucet, we also implement three known Spectre miti-
  Wasm does not support, and programs that use exceptions,                                                                                      gations using lfences and compare against these as a refer-
  longjmp, or multithreading.5 We note that Swivel does not
                                                                                                                                                ence point. First, we implement LoadLfence, which places an
  introduce new incompatibilities with SPEC 2006 bench-                                                                                         lfence after every load, similar to Microsoft’s Visual Studio
  marks; all of Swivel’s schemes are compatible with the                                                                                        compiler’s “Qspectre-load” mitigation [61]. Next, we imple-
  same benchmarks as stock Lucet.                                                                                                               ment Strawman, a scheme which restricts code to sequential
   Setup We compile both Sightglass and the SPEC 2006                                                                                           execution by placing an lfence at all control flow targets (at
   benchmarks with our modified Lucet Wasm compiler and                                                                                         the start of all linear blocks)—this is similar to the Intel com-
                                                                                                                                                piler’s “all-fix-lfence” mitigation [40]. Finally, we implement
      5 Some Web-focused Wasm platforms support some of these features.                                                                         Mincut, an lfence insertion algorithm suggested by Vassena
   Indeed, previous academic work evaluates these benchmarks on Wasm [42],
   but non-Web Wasm platforms including Lucet do not support them.                                                                                    6 Tests   were performed on June 18, 2020; see testing disclaimer A.2.

                                                                                                                          10
et al. [86] which uses a min-cut algorithm to minimize the              Transition Type                                    Function     Callback
number of required lfences. We further augment Mincut’s                                                                    Invoke       Invoke
lfence insertion with several of our own optimizations, in-              Stock                                              2.14µs       0.07µs
                                                                         Swivel-SFI (lfence + BTB flush both ways)           4.5µs       1.26µs
cluding (1) only inserting a single lfence per linear block;             Swivel-CET ASLR (lfence + BTB flush                4.08µs       0.79µs
and (3) unrolling loops to minimize branches, as we do for               one way + MPK)
the register interlock scheme and CBP conversions (§3.2.2).              Swivel-CET deterministic (lfence + MPK)            2.29µs       0.08µs
Finally, to ensure that unrolling loops does not provide an
unfair advantage, we also present results for Stock-Unrolled,          Table 3: Time taken for transitions between the application and sandbox—for
                                                                       function calls into the sandbox and callback invocations from the sandbox.
which is stock Lucet with the same loop unrolling as used in           Swivel overheads are generally modest, with the deterministic variant of
Swivel’s schemes.                                                      Swivel-CET in particular imposing very low overheads.

Results We present the Wasm execution overhead of the var-
                                                                       that similar data-dependent loops are largely the cause for
ious protection options on the Sightglass benchmarks in Fig-
                                                                       the slowdowns on SPEC benchmarks, including 429.mcf and
ure 5b, and on the SPEC 2006 benchmarks in Figure 6b. The
                                                                       401.bzip2. Some of the other large overheads in Sightglass
overheads of the ASLR versions of Swivel-SFI and Swivel-
                                                                       (e.g., fib2 and nestedloop3) are largely artifacts of the bench-
CET are small: 5.5% and 4.2% geomean overheads respec-
                                                                       marking suite: These microbenchmarks test simple constructs
tively on Sightglass, and at most 10.3% (geomean: 3.4%) and
                                                                       like loops—and CBP-to-BTB conversion naturally makes
at most 6.1% (geomean: 2.6%) respectively on SPEC. The
                                                                       (almost empty) loops slow.
deterministic versions of Swivel introduce modest overheads:
61.9% and 99.7% on Sightglass, and 3.3%–86.1% (geomean:                5.2     Sandbox transition overhead
47.3%) and 8.0%–240.2% (geomean: 96.3%) on SPEC. All
four configurations outperform the baseline schemes by or-             We evaluate the overhead of context switching. As described
ders of magnitude: Strawman incurs geomean overheads of                in Section 3, Swivel adds an lfence instruction to host-
6.9× and 4.3× on Sightglass and SPEC respectively, Load-               sandbox transitions to mitigate sandbox breakout attacks. In
Lfence incurs 8.7× and 12.5× overhead respectively, while              addition to this: Swivel-SFI flushes the BTB during each tran-
Mincut incurs 2.4× and 2.8× respectively.                              sition; Swivel-CET, in deterministic mode, switches Intel®
                                                                       MPK domains during each transition; and Swivel-CET, in
Breakdown Addressing CBP poisoning (CBP-to-BTB con-                    ASLR mode, flushes the BTB in one direction and switches
version in Swivel-SFI and register interlocks in Swivel-CET)           Intel® MPK domain in each transition.
dominates the performance overhead of our deterministic im-               We measure the time required for the host application to
plementations. We confirm this hypothesis with a microbench-           invoke a simple no-op function call in the sandbox, as well
mark: We measure the overheads of these techniques individ-            as the time required for the sandboxed code to invoke a per-
ually on stock Lucet (with our loop unrolling flags). We find          mitted function in the application (i.e., perform a callback).
that the average (geomean) overhead of CBP conversion is               We compare the time required for Wasm code compiled by
52.9% on Sightglass and 38.5% on SPEC. The corresponding               stock Lucet with the time required for code compiled with our
overheads for register interlocks are 93.2% and 53.4%.                 various protection schemes. We measure the average perfor-
   Increasing loop unrolling thresholds does not significantly         mance overhead across 1000 such function call invocations.7
improve the performance of stock Lucet (e.g., the speedup              These measurements are presented in Table 3.
of our loop unrolling on stock Lucet is 0.0% and 5.9% on                  First, we briefly note that function calls in stock Lucet take
Sightglass and SPEC, respectively). It does impact the perfor-         much longer than callbacks. This is because the Lucet runtime
mance of our deterministic Swivel variants though (e.g., we            has not fully optimized the function call transition, as these
find that it contributes to a 15%-20% speed up). This is not           are relatively rare compared to callback transitions, which
surprising since loop unrolling results in fewer conditional           occur during every syscall.
branches (and thus reduces the effect of CBP conversions and              In general, Swivel’s overheads are modest, with the deter-
register interlocking).                                                ministic variant of Swivel-CET in particular imposing very
   To understand the outliers in Figure 5 and Figure 6,                low overheads. Flushing the BTB does increase transition
we inspect the source of the benchmarks. Some of the                   costs, but the overall effect of this increase depends on how
largest overheads in Figure 5 are on Sightglass’ string ma-            frequently transitions occur between the application and sand-
nipulation benchmarks, e.g., strcat and strlen. These mi-              box. In addition, flushing the BTB affects not only transi-
crobenchmarks have lots of data-dependent loops—tight                  tion performance but also the performance of both the host
loops with data-dependent conditions—that cannot be un-                application and sandboxed code. Fully understanding these
rolled at compile-time. Since our register interlocking inserts        overheads requires that we evaluate the overall performance
a data dependence between the pinned heap base register and            impact on real world applications, which we do next.
the loop condition, this prevents the CPU from speculatively
executing instructions from subsequent iterations. We believe             7 Tests   were performed on June 18, 2020; see testing disclaimer A.2.

                                                                  11
You can also read