LibrettOS: A Dynamically Adaptable Multiserver-Library OS∗

Ruslan Nikolaev, Mincheol Sung, Binoy Ravindran
Bradley Department of Electrical and Computer Engineering, Virginia Tech
{rnikola, mincheol, binoy}@vt.edu

arXiv:2002.08928v1 [cs.OS] 20 Feb 2020

Abstract

We present LibrettOS, an OS design that fuses two paradigms to simultaneously address issues of isolation, performance, compatibility, failure recoverability, and run-time upgrades. LibrettOS acts as a microkernel OS that runs servers in an isolated manner. LibrettOS can also act as a library OS when, for better performance, selected applications are granted exclusive access to virtual hardware resources such as storage and networking. Furthermore, applications can switch between the two OS modes with no interruption at run-time. LibrettOS has a uniquely distinguishing advantage in that the two paradigms seamlessly coexist in the same OS, enabling users to simultaneously exploit their respective strengths (i.e., greater isolation, high performance). Systems code, such as device drivers, network stacks, and file systems, remains identical in the two modes, enabling dynamic mode switching and reducing development and maintenance costs.

To illustrate these design principles, we implemented a prototype of LibrettOS using rump kernels, allowing us to reuse existing, hardened NetBSD device drivers and a large ecosystem of POSIX/BSD-compatible applications. We use hardware (VM) virtualization to strongly isolate different rump kernel instances from each other. Because the original rumprun unikernel targeted a much simpler model for uniprocessor systems, we redesigned it to support multicore systems. Unlike kernel-bypass libraries such as DPDK, applications need not be modified to benefit from direct hardware access. LibrettOS also supports indirect access through a network server that we have developed. Instances of the TCP/IP stack always run directly inside the address space of applications. Unlike the original rumprun or monolithic OSes, applications remain uninterrupted even when network components fail or need to be upgraded. Finally, to efficiently use hardware resources, applications can dynamically switch between the indirect and direct modes based on their I/O load at run-time. We evaluate LibrettOS with 10GbE and NVMe using Nginx, NFS, memcached, Redis, and other applications. LibrettOS's performance typically exceeds that of NetBSD, especially when using direct access.

Keywords: operating system, microkernel, multiserver, network server, virtualization, Xen, IOMMU, SR-IOV, isolation

1 Introduction

Core components and drivers of a general purpose monolithic operating system (OS) such as Linux or NetBSD typically run in privileged mode. However, this design is often inadequate for modern systems [13, 30, 43, 50, 62]. On the one hand, a diverse and ever growing kernel ecosystem requires better isolation of individual drivers and other system components to localize security threats due to the increasingly large attack surface of OS kernel code. Better isolation also helps with tolerating component failures and thereby increases reliability. Microkernels achieve this goal, specifically in multiserver OS designs [13, 30, 33].1 On the other hand, to achieve better device throughput and resource utilization, some applications need to bypass the system call and other layers so that they can obtain exclusive access to device resources such as a network adapter's (NIC) Tx/Rx queues. This is particularly useful in recent hardware with SR-IOV support [63], which can create virtual PCIe functions: NICs or NVMe storage partitions. Library OSes and kernel-bypass libraries [77, 84] achieve this goal. Multiserver-inspired designs, too, can outperform traditional OSes on recent hardware [54, 56].

Microkernels, though initially slow in adoption, have gained more traction in recent years. Google's Fuchsia OS [24] uses the Zircon microkernel. Intel Management Engine [37] has used MINIX 3 [30] since 2015. A multiserver-like networking design was recently revisited by Google [56] to improve performance and upgradability when using their private (non-TCP) messaging protocol. However, general-purpose application and device driver support for microkernels is limited, especially for high-end hardware such as 10GbE+.

∗ ©2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '20), March 17, 2020, Lausanne, Switzerland, http://dx.doi.org/10.1145/3381052.3381316. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
1 As microkernels are defined broadly in the literature, we clarify that we consider multiserver OSes as those implementing servers to isolate core OS components, e.g., MINIX 3 [30].
Kernel-bypass techniques, e.g., DPDK [84] and SPDK [77], are also increasingly gaining traction, as they eliminate OS kernels from the critical data path, thereby improving performance. Unfortunately, these techniques lack standardized high-level APIs and require massive engineering effort to use, especially to adapt existing applications [89]. Additionally, driver support in kernel-bypass libraries such as DPDK [84] is good only for high-end NICs from certain vendors. Re-implementing drivers and high-level OS stacks from scratch in user space involves significant development effort.

Oftentimes, it is overlooked that "no one size fits all." In other words, no single OS model is ideal for all use cases. Depending upon the application and its security or reliability requirements, it is desirable to employ multiple OS paradigms in the same OS. In addition, applications may need to switch between different OS modes based on their I/O loads. In Figure 1, we illustrate an ecosystem of a web-driven server running on the same physical or virtual host. The server uses tools for logging (rsyslog), clock synchronization (ntp), NFS shares, and SSH for remote access. The server also runs Python and Java applications. None of these applications are performance-critical, but due to the complexity of the network and other stacks, it is important to recover from temporary failures or bugs without rebooting, which is impossible in monolithic OSes. One way to meet this goal is to have a network server as in the multiserver paradigm, which runs system components in separate user processes for better isolation and failure recoverability. This approach is also convenient when network components need to be upgraded and restarted at run-time [56] by triggering an artificial fault.

Core applications such as an HTTP server, database, and key-value store are more I/O performance-critical. When having a NIC and NVMe storage with SR-IOV support (or multiple devices), selected applications can access hardware resources directly as in library OSes or kernel-bypass libraries, i.e., by bypassing the system call layer or inter-process communication (IPC). Due to finite resources, PCIe devices restrict the number of SR-IOV interfaces; e.g., the Intel 82599 adapter [38] supports up to 16 virtual NICs with 4 Tx/Rx queues each, and for other adapters this number can be even smaller. Thus, it is important to manage available hardware I/O resources efficiently. Since I/O load is usually non-constant and changes for each application based on external circumstances (e.g., the number of clients connected to an HTTP server during peak and normal hours), conservative resource management is often desired: use network server(s) until I/O load increases substantially, at which point migrate to the library OS mode (for direct access) at run-time with no interruption. This is especially useful for recent bare metal cloud systems, e.g., when one Amazon EC2 bare metal instance is shared by several users.

Figure 1. Server ecosystem example (rsyslog, ntp, NFS, ssh, python, JDK, and a network server share the host with performance-critical HTTP (Nginx), DB (mySQL), and key-value (Redis) applications that access NVMe storage and NICs over the PCIe bus).

In this paper, we present a new OS design, LibrettOS, that not only reconciles the library and multiserver OS paradigms while retaining their individual benefits (of better isolation, failure recoverability, and performance), but also overcomes their downsides (of driver and application incompatibility). Moreover, LibrettOS enables applications to switch between these two paradigms at run-time. While high performance can be obtained with specialized APIs, which can also be adopted in LibrettOS, they incur high engineering effort. In contrast, with LibrettOS, existing applications can already benefit from more direct access to hardware while still using POSIX.

We present a realization of the LibrettOS design through a prototype implementation. Our prototype leverages rump kernels [42] and reuses a fairly large NetBSD driver collection. Since the user space ecosystem is also inherited from NetBSD, the prototype retains excellent compatibility with existing POSIX and BSD applications as well. Moreover, in the two modes of operation (i.e., library OS mode and multiserver OS mode), we use an identical set of drivers and software. In our prototype, we focus only on networking and storage. However, the LibrettOS design principles are more general, as rump kernels can potentially support other subsystems; e.g., NetBSD's sound drivers [18] can be reused. The prototype builds on rumprun instances, which execute rump kernels atop a hypervisor. Since the original rumprun did not support multiple cores, we redesigned it to support symmetric multiprocessing (SMP) systems. We also added 10GbE and NVMe drivers, and made other improvements to rumprun. As we show in Section 5, the prototype outperforms the original rumprun and NetBSD, especially when employing direct hardware access. In some tests, the prototype also outperforms Linux, which is often better optimized for performance than NetBSD.

The paper's research contribution is the proposed OS design and its prototype. Specifically, LibrettOS is the first OS design that simultaneously supports the multiserver and library OS paradigms; it is also the first design that can dynamically switch between these two OS modes at run-time with no interruption to applications. Our network server, designed for the multiserver mode, accommodates both paradigms by using fast L2 frame forwarding from appli-
cations. Finally, LibrettOS’s design requires only one set of 2.2 NetBSD, rump kernels, and rumprun device drivers for both modes and mostly retains POSIX/BSD Rump kernels are a concept introduced and popularized by compatibility. Antti Kantee [42] and the larger NetBSD [59] community. Our work also confirms an existent hypothesis that ker- NetBSD is a well known monolithic OS; along with FreeBSD nel bypass is faster than a monolithic OS approach. How- and OpenBSD, it is one of the most popular BSD systems ever, unlike prior works [7, 56, 64], we show this without in use today. NetBSD provides a fairly large collection of resorting to optimized (non-POSIX) APIs or specialized de- device drivers, and its latest 8.0 version supports state-of- vice drivers. the-art 10GbE networking, NVMe storage, and XHCI/USB 3.0. Unlike in other OSes, NetBSD’s code was largely re- designed to factor out its device drivers and core compo- 2 Background nents into anykernel components. As shown in Figure 2, a special rump kernel glue layer enables execution of anyker- For greater clarity and completeness, in this section, we dis- nels outside of the monolithic NetBSD kernel. For example, cuss various OS designs, microkernels, and hypervisors. We using POSIX threads (pthreads), rump kernels can run anyk- also provide background information on rump kernels and ernels in user space on top of existing OSes. In fact, NetBSD their practical use for hypervisors and bare metal machines. already uses rump kernels to facilitate its own device driver development and testing process, which is done much easier in user space. 2.1 Multiserver and Library OSes rump kernel anykernel calls Traditionally, monolithic OSes run all critical components Program in a privileged CPU mode within a single kernel address USB VFS TCP/IP Program libc libc space that is isolated from user applications running in XHCI NVMe NIC function an unprivileged mode. The isolation of systems software components such as device drivers in separate address spaces is the fundamental principle behind the microker- rump rumpglue gluecode code(**) (**) nel model [1, 27, 30, 52], which provides stronger security and reliability [22, 31, 46] guarantees to applications com- rumprun rumprun(*) (*) pared to the classical monolithic kernel designs. To support existing applications and drivers, microkernel systems ei- Figure 2. Rump software stack, (*) indicates components ther implement emulation layers [26] or run a multiserver substantially and (**) partially modified in LibrettOS. OS [13, 20, 29, 30, 33] on top of the microkernel. In the multi- server design, servers are typically created for specific parts Rump kernels may serve as a foundation for new OSes, as of the system, such as device drivers and network and stor- in the case of the rumprun unikernel. As shown in Figure 2, age stacks. Applications communicate with the servers us- systems built from rump kernels and rumprun are, effec- ing IPC to access hardware resources. tively, library OSes. In contrast to NetBSD, rumprun-based The system call separation layer between user applica- systems substitute system calls with ordinary function calls tions and critical components in monolithic OSes is some- from libc. The original rumprun, unfortunately, lacks SMP what arbitrary, chosen for historical reasons such as rough support. correspondence to the POSIX standard. 
However, more ker- Rump kernels were also used to build servers as in the nel components can be moved into the application itself, rump file system server [41]. However, prior solutions sim- bringing it closer to the hardware. This concept was ini- ply rely on user-mode pthreads, targeting monolithic OSes. tially proposed by Tom Anderson [3], later implemented The design of such servers is very different from a more in the exokernel model [15], and named the “library OS.” low-level, microkernel-based architecture proposed in this This model is often advocated for performance reasons and paper. OS flexibility [6]. Subsequent works pointed out the secu- rity benefits of the library OS model [66, 85, 86] due to the 2.3 Linux-based Library OSes strong isolation provided by the reduced interactions be- tween the application and the privileged layer. Specialized We also considered Linux Kernel Library (LKL) [67], which library OSes such as unikernels [10, 43, 45, 50, 55, 90] do is based on the Linux source. However, LKL is not as flexible not isolate OS components from applications and are de- as rump kernels yet. Additionally, rump kernels have been signed to run just a single application in virtualized envi- upstreamed into the official NetBSD repository, whereas ronments typical of cloud infrastructures. As unikernels are LKL is still an unofficial fork of the Linux source branch. application-oriented, they may provide a significant reduc- rump kernels also support far more applications at the mo- tion in software stacks compared to full-fledged OSes. ment. 3
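As Section 2.2 explains, the anykernel design lets rump kernels host unmodified NetBSD kernel components inside an ordinary user-space process on top of pthreads. The sketch below illustrates that usage with the standard rump interfaces (rump_init() and the rump_sys_* syscall wrappers); it assumes a system where NetBSD's rump headers are installed and librump plus the relevant rumpnet component libraries are linked in. It is a minimal illustration of the rump kernel concept, not LibrettOS code.

/* Minimal sketch: hosting the NetBSD TCP/IP anykernel inside a plain
 * user-space process. Assumes librump and the rumpnet networking
 * component libraries are available and linked. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <rump/rump.h>
#include <rump/rump_syscalls.h>

int main(void)
{
    /* Bootstrap the rump kernel inside this process. */
    if (rump_init() != 0) {
        fprintf(stderr, "rump_init failed\n");
        return EXIT_FAILURE;
    }

    /* System calls become ordinary function calls into the in-process
     * kernel components rather than traps into the host kernel. */
    int s = rump_sys_socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0) {
        fprintf(stderr, "rump_sys_socket failed\n");
        return EXIT_FAILURE;
    }
    printf("socket %d created by the in-process TCP/IP stack\n", s);
    rump_sys_close(s);
    return EXIT_SUCCESS;
}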
System API Generality TCP Driver Paradigms Dynamic Direct Failure Stack Base Switch Access Recovery DPDK [84] Low-level Network 3rd party Medium N/A No Yes No SPDK [77] Low-level Storage N/A NVMe N/A No Yes No IX [7] Non-standard Network App DPDK Library OS No Almost No Arrakis [64] POSIX/Custom Network App Limited Library OS No Yes No Snap [56] Non-standard Network unavailable Limited Multiserver No No Yes VirtuOS [62] POSIX/Linux Universal Server Large Multiserver No No Yes MINIX 3 [30] POSIX Universal Server Limited Multiserver No No Yes HelenOS [13] POSIX Universal Servers Limited Multiserver No No Yes Linux POSIX/Linux Universal Kernel Large Monolithic No No No NetBSD [59] POSIX/BSD Universal Kernel Large Monolithic No No No LibrettOS POSIX/BSD Universal App Large Multiserver, Yes Yes Yes (NetBSD) Library OS Table 1. Comparison of LibrettOS with libraries and frameworks as well as monolithic, multiserver, and library OSes. Unikernel Linux (UKL) [69] uses the unikernel model for where rumprun instances execute as ordinary microkernel Linux by removing CPU mode transitions. UKL is in an processes. The seL4 version of rumprun still lacks SMP sup- early stage of development; its stated goals are similar to port, as it only changes its platform abstraction layer. that of rumprun. However, it is unclear whether it will even- tually match rump kernels’ flexibility. Aside from technical issues, due to incompatibility, 2.5 PCI Passthrough and Input-Output Linux’s non-standard GPLv2-only license may create le- Memory Management Unit (IOMMU) gal issues2 when linking proprietary3 or even GPLv3+4 VMs can get direct access to physical devices using their open-source applications. Since licensing terms cannot be hypervisor’s PCI passthrough capabilities. IOMMU [2, 39] adapted easily without explicit permission from all numer- hardware support makes this process safer by remapping in- ous Linux contributors, this can create serious hurdles in terrupts and I/O addresses used by devices. When IOMMU the wide adoption of Linux-based library OS systems. In is enabled, faulty devices are unable to inject unassigned contrast, rump kernels and NetBSD use a permissive 2-BSD interrupts or perform out-of-bound DMA operations. Sim- license. ilarly, IOMMU prevents VMs from specifying out-of-bound addresses, which is especially important when VM appli- 2.4 Hypervisors and Microkernels cations are given direct hardware access. We use both PCI passthrough and IOMMU in the LibrettOS prototype to ac- Hypervisors are typically used to isolate entire guest OSes cess physical devices. in separate Virtual Machines (VMs). In contrast, microker- nels are typically used to reduce the Trusted Computing Base (TCB) of a single OS and provide component isolation us- 2.6 Single-Root I/O Virtualization (SR-IOV) ing process address space separation. Although hypervisors To facilitate device sharing across isolated entities such as and microkernels have different design goals, they are inter- VMs or processes, self-virtualizing hardware [68] can be related and sometimes compared to each other [26, 28]. used. SR-IOV [63] is an industry standard that allows split- Moreover, hypervisors can be used for driver isolation in ting one physical PCIe device into logically separate vir- VMs [17], whereas microkernels can be used as virtual ma- tual functions (VFs). Typically, a host OS uses the device’s chine monitors [16]. 
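Section 2.6 notes that the PF is used to create and delete VFs but need not be involved in the VF data path. As a concrete host-side illustration only (LibrettOS itself consumes VFs or the PF through Xen PCI passthrough with IOMMU, per Section 2.5), a Linux host typically instantiates VFs by writing a count to the PF's sriov_numvfs sysfs attribute; the PCI address used below is hypothetical.

/* Illustration only: asking a PF driver to instantiate N virtual
 * functions via the Linux sysfs interface. Not part of LibrettOS. */
#include <stdio.h>

int create_vfs(const char *pf_addr, int num_vfs)
{
    char path[256];
    snprintf(path, sizeof(path),
             "/sys/bus/pci/devices/%s/sriov_numvfs", pf_addr);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    /* Writing N tells the PF driver to create N VFs (write 0 first
     * if VFs already exist). */
    fprintf(f, "%d\n", num_vfs);
    fclose(f);
    return 0;
}

int main(void)
{
    /* e.g., an Intel 82599 PF at a hypothetical PCI address */
    return create_vfs("0000:3b:00.0", 4) ? 1 : 0;
}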
physical function (PF) to create and delete VFs, but the PF The original uniprocessor rumprun executes as a par- need not be involved in the data path of the VF users. SR- avirtualized (PV) [88] guest OS instance on top of the Xen IOV is already widely used for high-end network adapters, hypervisor [4] or KVM [44]. Recently, rumprun was also where VFs represent logically separate devices with their ported [14] to seL4 [46], a formally verified microkernel, own MAC/IP addresses and Tx/Rx hardware buffers. SR- 2 This is just an opinion of this paper’s authors and must not be inter- IOV is also an emerging technology for NVMe storage, preted as an authoritative legal advice or guidance. where VFs represent storage partitions. 3 Due to imprecise language, Linux’s syscall exception falls into a gray legal area when the system call layer is no longer present or clearly defined. VFs are currently used by VMs for better network perfor- 4 Without the syscall exception, GPLv3+ is incompatible with GPLv2 mance, as each VM gets its own slice of the physical device. (GPLv2+ only) [83]. Moreover, most user-space code has moved to GPLv3+. Alternatively, VFs (or even the PF) can be used by the DPDK 4
library [84], which implements lower layers of the network Application 1, libOS Application 2 Network (direct access) (uses servers) Server stack (L2/L3) for certain NIC adapters. High-throughput M:N network applications use DPDK for direct hardware access. scheduler M:N libc libc scheduler 2.7 POSIX and Compatibility FS TCP/IP TCP/IP IPC Although high-throughput network applications can bene- NVMe NIC VIF NIC fit from custom APIs [7], POSIX compatibility remains crit- rumprun rumprun rumprun ical for many legacy applications. Moreover, modern server applications require asynchronous I/O beyond POSIX, such μ-kernel or hypervisorkernel or hypervisor scheduler (Xen, seL4, etc) as Linux’s epoll_wait(2) or BSD’s kqueue(2) equivalent. PCI passthrough (with IOMMU) Typically, library OSes implement their own APIs [7, 64]. PCIe bus (PF or VF through SR-kernel or hypervisorIOV) Some approaches [34, 85] support POSIX but still run on top of a monolithic OS with a large TCB. LibrettOS provides full Figure 3. LibrettOS’s architecture. POSIX and BSD compatibility (except fork(2) and pipe(2) as discussed in Section 4.2), while avoiding a large TCB in the underlying kernel. Since an OS typically runs several applications, hardware resources need to be shared. In this model, devices are 2.8 Summary shared using hardware capabilities such as SR-IOV which splits one NVMe device into partitions and one NIC into Table 1 summarizes OS design features and compares Li- logically separate NICs with dedicated Tx/Rx buffers. We brettOS against existing monolithic, library, and multiserver assume that the NIC firmware does not have fatal security OSes. Since LibrettOS builds on NetBSD’s anykernels, it in- vulnerabilities – i.e., it prevents applications from adversely herits a very large driver code base. Applications do not affecting one another. (VF isolation is already widely used generally need to be modified as the user space environ- by guest VMs where similar security concerns exist.) ment is mostly compatible with that of NetBSD, except for scenarios described in Section 4.2. LibrettOS supports two While some high-end NICs expose VFs using SR-IOV, the paradigms simultaneously: applications with high perfor- total number of VFs is typically limited, and only a small mance requirements can directly access hardware resources number of applications that need faster network access will as with library OSes (or DPDK [84] and SPDK [77]), while benefit from this hardware capability. When there are no other applications can use servers that manage correspond- available VFs for applications, or their direct exposure to ap- ing resources as with multiserver OSes. Similar to MINIX plications is undesirable due to additional security require- 3 [30], HelenOS [13], VirtuOS [62], and Snap [56], LibrettOS ments, we use network servers. As in library OS mode, supports failure recovery of servers, which is an important Application 2’s data (see Figure 3) goes through libc sock- feature of the multiserver OS paradigm. Additionally, Li- ets and the entire network stack to generate L2 (data link brettOS’s applications can dynamically switch between the layer) network frames. However, unlike Application 1, the library and multiserver OS modes at run-time. network frames are not passed to/from the NIC driver di- In Section 6, we provide a detailed discussion of the past rectly, but rather relayed through a virtual interface (VIF). and related work and contrast them with LibrettOS. 
The VIF creates an IPC channel with a network server where frames are additionally verified and rerouted to/from the NIC driver. 3 Design LibrettOS is mostly agnostic to the kind of hypervisor or microkernel used by rumprun. For easier prototyping One of the defining features of LibrettOS is its ability to use while preserving rich IPC capabilities and microkernel-like the same software stacks and device drivers irrespective of architecture, we used Xen, which runs rumprun instances in whether an application accesses hardware devices directly VMs. Unlike KVM, Xen has a microkernel-style design and or via servers. In this section, we describe LibrettOS’s archi- does not use any existing monolithic OS for the hypervisor tecture, its network server, and dynamic switching capabil- implementation. Since rumprun has already been ported to ity. the seL4 microkernel [14], and our core contributions are mostly orthogonal to the underlying hypervisor, the Libret- tOS design principles can be adopted in seL4 as well. 3.1 LibrettOS’s Architecture Scheduling in LibrettOS happens at both the hypervi- Figure 3 presents LibrettOS’s architecture. In this illus- sor and user application levels. The hypervisor schedules tration, Application 1 runs in library OS mode, i.e., it ac- per-VM virtual CPUs (VCPUs) as usual. Whereas the orig- cesses hardware devices such as NVMe and NIC directly inal rumprun implemented a simple non-preemptive N : 1 by running corresponding drivers in its own address space. uniprocessor scheduler, we had to replace it entirely to sup- 5
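Section 3.1 notes that each rumprun instance schedules M threads on top of N VCPUs, replacing the original non-preemptive N:1 uniprocessor scheduler, and Section 4.1 adds that the new scheduler is lock-free with global and per-VCPU queues. The fragment below sketches one possible shape of such a dispatch step; it is a simplified, hypothetical illustration (a Treiber-style shared stack standing in for the global queue, with ABA handling and fairness omitted), not the prototype's actual scheduler.

/* Hypothetical sketch of an M:N, non-preemptive dispatch step: each
 * VCPU serves its private run queue first and falls back to a shared
 * global queue. The shared queue is deliberately simplified. */
#include <stdatomic.h>
#include <stddef.h>

struct thread { struct thread *next; /* ... saved context ... */ };

struct vcpu { struct thread *local_head; /* touched only by this VCPU */ };

static _Atomic(struct thread *) global_head;

void global_push(struct thread *t)
{
    struct thread *old = atomic_load(&global_head);
    do {
        t->next = old;
    } while (!atomic_compare_exchange_weak(&global_head, &old, t));
}

static struct thread *global_pop(void)
{
    struct thread *old = atomic_load(&global_head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&global_head, &old, old->next))
        ;   /* retry: 'old' is refreshed by the failed CAS */
    return old;
}

/* Pick the next runnable thread for this VCPU; NULL means the VCPU
 * should idle until a virtual interrupt (VIRQ) arrives. */
struct thread *pick_next(struct vcpu *v)
{
    struct thread *t = v->local_head;
    if (t != NULL) {
        v->local_head = t->next;
        return t;
    }
    return global_pop();
}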
port SMP – i.e., multiple VCPUs per instance. Each rumprun Application mbuf Network mbuf Application 1 exchange Server exchange 2 instance in LibrettOS runs its own M : N scheduler that ex- ecutes M threads on top of N VCPUs. libc Rx handler 1 Rx libc LibrettOS does not have any specific restrictions on the TCP/IP Rx:threads handler 2 Rx:threads TCP/IP type of hardware used for direct access; we tested and eval- Tx Tx VIF NIC driver VIF uated both networking and storage devices. We only im- Tx:threads Tx:threads plemented a network server to demonstrate indirect access. rumprun rumprun rumprun VIRQ1 VIRQ2 However, storage devices and file systems can also be shared using an NFS server. The NFS rumprun instance runs on top Figure 4. Network server. of an NVMe device initialized with the ext3 file system and exports file system shares accessible through network. In Network Sever Application the long term, an existing rump kernel-based approach [41] for running file systems in the user space of monolithic OSes Packet forward service libc READ can be adopted in the rumprun environment as well. RO-Mapped Pages TCP/UDP LibrettOS does not currently support anything outside of Portmap Table the POSIX/BSD APIs. Admittedly, POSIX has its downsides – e.g., copying overheads which are undesirable for high- Xen performance networking. Nonetheless, LibrettOS already Portmap Table Portbind Hypercall eliminates the system call layer, which can open the way for further optimizations by extending the API in the future. Figure 5. Portmap table. 3.2 Network Server routing based on TCP/UDP ports. When applications re- Due to flexibility of rump kernels, their components can quest socket bindings, the network server allocates corre- be chosen case by case when building applications. Unlike sponding ports (both static and dynamic) by issuing a spe- more traditional OSes, LibrettOS runs the TCP/IP stack di- cial portbind hypercall as shown in Figure 5 which updates rectly in the application address space. As we discuss be- a per-server 64K port-VM (portmap) table in the hypervi- low, this does not compromise security and reliability. In sor. We store the portmap in the hypervisor to avoid its fact, compared to typical multiserver and monolithic OSes, loss when the network server is restarted due to failures. where buggy network stacks affect all applications, Libret- While this mechanism is specific to the network server, the tOS’s network stack issues are localized to one application. added code to the hypervisor is very small and is justified The network server IPC channel consists of Tx/Rx ring for omnipresent high-speed networking. Alternatively, the buffers which relay L2 network frames in mbufs through a portmap table can be moved to a dedicated VM. virtual network interface (VIF) as shown in Figure 4. We use a lock-free ring buffer implementation from [61]. When re- Since packet routing requires frequent reading of the laying mbufs, the receiving side, if inactive, must be notified portmap entries, we read-only map the portmap pages into using a corresponding notification mechanism. For the Xen the network server address space, so that the network server hypervisor, this is achieved using a virtual interrupt (VIRQ). can directly access these entries. Since a stream of packets is generally transmitted without The network server does not trust applications for se- delays, the number of such notifications is typically small. curity and reliability reasons. 
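The sketch below illustrates the Tx path just described: the application-side VIF pushes an mbuf descriptor onto the shared ring and raises a VIRQ only when the server's threads counter shows no active worker, and the per-application handler on the server side consults the read-only-mapped portmap before forwarding a frame. All types and helpers here (ring_push(), send_virq(), nic_xmit()) are hypothetical stand-ins; the prototype uses the lock-free ring buffer of [61] and Xen event channels.

/* Simplified sketch of the application -> network server Tx path. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct frame { void *mbuf; uint32_t len; uint16_t src_port; };

struct shared_ring {
    _Atomic unsigned threads;   /* server workers currently processing
                                   requests; mapped read-only by senders */
    /* ... slots and producer/consumer indices ... */
};

extern bool ring_push(struct shared_ring *r, const struct frame *f);
extern void send_virq(int evtchn);
extern void nic_xmit(const struct frame *f);

/* Read-only mapped portmap page: 64K entries of TCP/UDP port -> owning
 * VM, filled in by the portbind hypercall (Figure 5). */
extern const uint16_t portmap[65536];

/* Application (VIF) side. */
bool vif_transmit(struct shared_ring *r, int evtchn, const struct frame *f)
{
    if (!ring_push(r, f))
        return false;           /* ring full: caller retries or drops */
    /* Notify only if no worker is active; for a steady stream of
     * frames this VIRQ is rarely needed. */
    if (atomic_load_explicit(&r->threads, memory_order_acquire) == 0)
        send_virq(evtchn);
    return true;
}

/* Network server side: the per-application handler verifies that the
 * source port was bound by this application's VM before forwarding. */
void server_forward(uint16_t app_domid, const struct frame *f)
{
    if (portmap[f->src_port] != app_domid)
        return;                 /* port not bound by this VM: drop */
    nic_xmit(f);
}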
We enforce extra metadata The sending side detects that the receiving side is inactive verification while manipulating ring buffers. Since applica- by reading a corresponding shared (read-only) status vari- tions may still transmit incorrect data (e.g., spoof MAC/IP able denoted as threads, which indicates how many threads source addresses), either automatic hardware spoofing de- are processing requests. When this variable is zero, a VIRQ tection must be enabled, or address verification must be en- is sent to wake a worker thread up. forced by verifying MAC/IP addresses in outbound frames. The network server side has a per-application handler Similarly, ports need to be verified by the server. which forwards mbufs to the NIC driver. Frames do not Although, unlike POSIX, invalid packets may be gener- need to be translated, as applications and servers are built ated by (malicious) applications at any point, they can only from the same rump kernel source base. appear in local (per-application) ring buffers. When the net- An important advantage of this design is that an applica- work server copies packets to the global medium, it verifies tion can recover from any network server failures, as TCP MAC, IP, and port metadata which must already be valid for state is preserved inside the application. The network server a given application. Races are impossible as prior users of operates at the very bottom of the network stack and sim- the same MAC, IP, and port must have already transmitted ply routes inbound and outbound mbufs. However, rout- all packets to the global medium and closed their connec- ing across different applications requires some knowledge tions. of the higher-level protocols. We support per-application We also considered an alternative solution based on the 6
netfront/netback model [4] used by Xen. The netback driver as connections are closed. When launching an application typically runs in an OS instance with a physical device that needs dynamic switching (Application 2), the network (Driver Domain). The netfront driver is used by guest OSes server allocates an IP address that will be used by this ap- as a virtual NIC to establish network connectivity through plication. Applications use virtual interfaces (VIF) which the netback driver. Rumprun already implements a very are initially connected to the network server (i.e., similar to simple netfront driver so that it can access a virtual NIC Application 1). The network server adds the IP to the list of when running on top of the Xen hypervisor. However, configured addresses for its IF (PF or VF). it lacks the netback driver that would be required for the As I/O load increases, an application can be decoupled server. This model is also fundamentally different from from the network server to use direct access for better our approach: transmitted data must traverse additional OS throughput and CPU utilization. The NIC interface used by layers on the network server side and all traffic needs to the network server deactivates application’s IP (which re- go through network address translation (NAT) or bridging, mains unchanged throughout the lifetime of an application). as each application (VM instance) gets its own IP address. The application then configures an available NIC interface While the Xen approach is suitable when running guest (typically, VF) with its dedicated IP. Changing IP on an in- OSes, it introduces unnecessary layers of indirection and terface triggers ARP invalidation. A following ARP probe overheads as we show in Section 5.3. detects a new MAC address for the interface-assigned IP. These events are masked from applications, as they typi- cally use higher-level protocols such as TCP/IP. No pause- 3.3 Dynamic Mode Switch and-drain is needed, and on-the-fly requests are simply ig- LibrettOS’s use of the same code base for both direct and nored because TCP automatically resends packets without server modes is crucial for implementing the dynamic mode ACKs. switch. We implement this feature for networking, which is When running in this mode, all traffic from the NIC in- more or less stateless. As NVMe’s emerging support for SR- terface (direct access) is redirected through VIF, as all appli- IOV becomes publicly available, our technique can also be cation socket bindings belong to this VIF. This special VIF considered for storage devices. However, storage devices is very thin: mbufs are redirected to/from IF without copy- are more stateful and additional challenges need to be over- ing, i.e., avoiding extra overhead. When the load decreases, come. applications can return to the original state by performing Figure 6 shows LibrettOS’s dynamic switching mecha- the above steps in the reverse order. If applications always nism for the network server. LibrettOS is the only OS that use direct access (Application 3), VIFs are not needed, i.e., supports this mechanism. Moreover, applications do not all traffic goes directly through IF. need to be modified to benefit from this feature. As we fur- Dynamic switching helps conserve SR-IOV resources, i.e., ther discuss in Section 5, dynamic switching does not incur it enables more effective use of available hardware resources large overhead, and switching latency is small (56-69ms). across different applications. 
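A condensed sketch of the server-to-direct transition described above follows. The helper functions are hypothetical stand-ins for the prototype's VIF glue and NetBSD interface configuration; the key point is that the application's IP address and TCP state never change, so no pause-and-drain is required.

/* Hypothetical sketch of the server-mode -> direct-mode transition of
 * Section 3.3. */
struct app;

extern void server_deactivate_ip(struct app *a);  /* drop app IP on server IF  */
extern void vf_configure_ip(struct app *a);       /* bring same IP up on the VF */
extern void vif_attach(struct app *a, int vf_if); /* splice VIF over the VF     */

void switch_to_direct(struct app *a, int vf_if)
{
    /* 1. The network server stops answering for this application's IP
     *    (the address itself stays with the application). */
    server_deactivate_ip(a);

    /* 2. The application configures its dedicated VF with that IP; the
     *    IP change triggers ARP invalidation, and the next ARP probe
     *    learns the VF's MAC address. */
    vf_configure_ip(a);

    /* 3. Socket bindings stay on the thin VIF, which now redirects
     *    mbufs to/from the VF without copying. Packets lost in flight
     *    are retransmitted by TCP, so the application sees no
     *    interruption. Switching back performs these steps in reverse. */
    vif_attach(a, vf_if);
}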
Policies for switching (e.g., for managing resources, security, etc.) are beyond the scope of Application Network Application this paper, as our focus is on the switching mechanism. In Application 1 Server 2 3 our evaluation, users manually trigger dynamic switching. IP 1 IP 1 MAC/IP IP 2 2 IP 3 VIF 1 IP 2 Dynamic VIF VF2 IF 4 Implementation switch IF IF Direct access Uses server only While NetBSD code and its rump kernel layer are SMP- aware (except rump kernel bugs that we fixed), none of the Figure 6. Dynamic switching mechanism. existing rumprun versions (Xen PV, KVM, or seL4) support SMP. The biggest challenge for dynamic switching is that ap- plication migration must be completely transparent. When 4.1 Effort using direct access (VF), we need a dedicated IP address for each NIC interface. On the other hand, our network server We redesigned low-level parts of rumprun (SMP BIOS shares the same IP address across all applications. A com- initialization, CPU initialization, interrupt handling) as plicating factor is that once connections are established, IP well as higher-level components such as the CPU sched- addresses used by applications cannot be changed easily. uler. Our implementation avoids coarse-grained locks To allow dynamic switching, we leverage NetBSD’s sup- on performance-critical paths. Additionally, the original port for multiple IPs per an interface (IF). Unlike hard- rumprun was only intended for paravirtualized Xen. Thus, ware Rx/Tx queues, IPs are merely logical resources; they we added support for SMP-safe, Xen’s HVM-mode event are practically unlimited when using NAT. A pool of avail- channels (VIRQs) and grant tables (shared memory pages) able IPs is given to a server, and addresses are recycled when implementing our network server. We implemented 7
Component Description LOC Processor 2 x Intel Xeon Silver 4114, 2.20GHz Number of cores 10 per processor, per NUMA node SMP support SMP BIOS init/trampoline, 2092 HyperThreading OFF (2 per core) M:N scheduler, interrupts TurboBoost OFF HVM support Xen grant tables and VIRQ 1240 L1/L2 cache 64 KB / 1024 KB per core PV clock SMP-safe Xen/KVM clock 148 L3 cache 14080 KB Network server mbuf routing, 1869 Main Memory 96 GB dynamic switching Network Intel x520-2 10GbE (82599ES) Xen Network server registrar 516 Storage Intel DC P3700 NVMe 400 GB NetBSD, network Network server forwarder 51 NetBSD, kqueue EVFILT_USER support 303 NetBSD, rump Race conditions in SMP 45 Table 3. Experimental setup. NetBSD, glue New drivers (NVMe, ixgbe) 234 NFS, rpcbind, Porting from NetBSD, 154 and mountd avoiding fork(2) Moreover, typical applications that rely on fork [82] can be Total: 6652 configured to use the multi-threading mode instead. We did not implement POSIX’s IPCs such as pipe(2), but Xen-based mechanisms can be used with no restrictions. Table 2. Added or modified code. As fork(2) is not currently supported, we did not consider sharing socket file descriptors or listen(2)/accept(2) handoff across processes. This may create additional challenges in the Xen HVM interrupt callback mechanism by binding it to the future. We believe that the problem is tractable, but re- the per-CPU interrupt vectors as it is implemented in Linux quires modification of the network stack. Also, Rx/Tx ring and FreeBSD. The HVM mode uses hardware-assisted virtu- buffers will have to be created per connection rather than alization and is more preferable nowadays for security rea- per process to allow sharing per-socket/connection buffers sons [53]. Moreover, it has better IOMMU support when us- across processes while not violating process isolation re- ing device PCI passthrough. We also fixed race conditions quirements. in the rump layer, and added NVMe and 10GbE ixgbe driver As previously mentioned, we did not develop policies for glue code which was previously unavailable for rumprun. dynamic switching as our focus was on mechanism. Sec- Additionally, we ported a patch that adds support for the tion 5 uses a simple policy wherein a user triggers an event. EVFILT_USER extension in kqueue(2), which is required by some applications and supported by FreeBSD but is, for some reason, still unsupported by NetBSD. 5 Evaluation Since rumprun does not currently support preemption, we designed a non-preemptive, lock-free M:N scheduler We evaluate LibrettOS using a set of micro- and macro- with global and per-VCPU queues. A lock-free scheduler benchmarks and compare it against NetBSD and Linux. has an advantage that, it is not adversely affected by pre- Both of these baselines are relevant since LibrettOS is di- emption which still takes place in the hypervisor. (When the rectly based on the NetBSD code base, and Linux is a hypervisor de-schedules a VCPU holding a lock, other VC- popular server OS. LibrettOS typically has better perfor- PUs cannot get the same lock and make further progress.) mance, especially when using direct mode. Occasionally, Lock-free data structures for OS schedulers were already ad- Linux demonstrates better performance since it currently vocated in [45]. has more optimizations than NetBSD in its drivers and net- In Table 2, we present the effort that was required to work stack, especially for 10GbE+, as we further discuss in implement our current LibrettOS prototype. Most changes Section 5.1. 
pertain to rumprun and the network server implementation, In the evaluation, we only consider standard POSIX com- whereas drivers and applications did not require substantial patible applications, as compatibility is one of our impor- changes. tant design goals. Since kernel-bypass libraries and typical library OSes [7, 77, 84] are not compatible with POSIX, they cannot be directly compared. Moreover, kernel-bypass li- 4.2 Limitations braries require special TCP stacks [40] and high engineer- In our current prototype, applications cannot create pro- ing effort to adopt existing applications [89]. Likewise, com- cesses through fork(2), as its implementation would require parison against existing multiserver designs [30, 56] is chal- complex inter-VM interactions which Xen is not designed lenging due to very limited application and device driver for. We consider such an implementation to be realistic support. for microkernels such as seL4. Each application, however, Table 3 shows our experimental setup. We run Xen can already support virtually unlimited number of threads. 4.10.1, and Ubuntu 17.10 with Linux 4.13 as Xen’s Dom0 8
(for system initialization only). We use the same version of 5.2 Sysbench/CPU Linux as our baseline. Finally, we use NetBSD 8.0 with the NET_MPSAFE feature5 enabled for better network scalabil- To verify our new scheduler and SMP support, we ran the ity [58]. LibrettOS uses the same NetBSD code base, also Sysbench/CPU [80] multi-threaded test which measures the with NET_MPSAFE enabled. We set the maximum trans- elapsed time for finding prime numbers. Overall, LibrettOS, mission unit (MTU) to 9000 in all experiments to follow gen- Linux, and NetBSD show roughly similar performance. eral recommendations [51] for the optimal 10GbE perfor- mance. We measure each data point 10 times and present 5.3 NetPIPE the average; the relative standard deviation is mostly less than 1%. LibrettOS not only outperforms NetBSD, but also outper- 8 Linux Throughput (Gbps) forms Linux in certain tests. This is despite the fact that 7 LibrettOS-Direct 6 LibrettOS-Server NetBSD can be slower than Linux. LibrettOS’s advantage NetBSD comes from its superior multiserver-library OS design and 5 Rumprun-PV 4 better resource management (compared to NetBSD). 3 2 1 5.1 NetBSD and Linux performance 0 0.25 1 4 16 64 256 Since neither the original rumprun nor other unikernels Message Size (KB) support 10GbE+ or NVMe physical devices, one of our goals (a) Throughput is to show how this cutting-edge hardware works in Libret- 100 tOS. CPU utilization (%) 80 Although NetBSD, Linux, and rumprun’s 1GbE are known to be on par with each other [14], 10GbE+ network- 60 ing puts more pressure on the system, and we found occa- 40 sional performance gaps between NetBSD and Linux. How- ever, they are mostly due to specific technical limitations, 20 most of which are already being addressed by the NetBSD 0 community. 0.25 1 4 16 64 256 One contributing factor is the network buffer size. Message Size (KB) NetBSD uses mbuf chains; each mbuf is currently limited (b) CPU utilization (1 CPU is 100%) to 512 bytes in x86-64. In contrast, Linux has larger (frame- sized) sk_buff primitives. Since 512-byte units are too small for 10GbE+ jumbo frames, we increased the mbuf size to Figure 7. NetPIPE (log2 scale). 4K; this improved performance by ≈10% on macrobench- marks. mbuf cannot be increased beyond the 4K page size To measure overall network performance, we use Net- easily without bigger changes to the network stack, but this PIPE, a popular ping pong benchmark. It exchanges a fixed is likely to be addressed by the NetBSD community sooner size message between servers to measure the latency and or later. bandwidth of a single flow. We evaluate Linux, LibrettOS Another contributing factor is the use of global SMP (direct access mode), LibrettOS (using our network server), locks by NetBSD. Since NET_MPSAFE was introduced very NetBSD, and the original rumprun with the netfront driver. recently, and the effort to remove the remaining coarse- On the client machine, Linux is running for all the cases. We grained locks is still ongoing, Linux’s network stack and configure NetPIPE to send 10,000 messages from 64 bytes network drivers may still have superior performance in to 512KB. Figure 7(a) shows the throughput for different some tests which use high-end 10GbE+ networking. message sizes. LibrettOS (direct access) achieves 7.4Gbps We also found that the adaptive interrupt moderation with 256KB. It also has a latency of 282µs for 256KB mes- option in NetBSD’s ixgbe driver adversely affected perfor- sages. 
LibrettOS (network server) achieves 6.4Gbps with mance in some tests such as NetPIPE (e.g., NetBSD was only 256KB, and the latency of 256KB messages is 326µs. Linux able to reach half of the throughput). Thus, we disabled it. achieves 6.5Gbps with 256KB messages and the latency of 318.9µs. NetBSD reaches 6.4Gbps in this test with the la- tency of 326µs. The original rumprun with the netfront 5 NET_MPSAFE, recently introduced in NetBSD, reduces global SMP driver reaches only 1.0Gbps with 256KB messages and has locks in the network stack. By and large, the network stack is the only major place where global locks still remain. The ongoing effort will elimi- a large latency of 1,983µs. nate them. Overall, all systems except the original rumprun have 9
comparable performance. With smaller messages (< Figure 9 shows the throughput with different concurrency 256KB), LibrettOS-Direct can be a bit faster than Linux or levels and file sizes. LibrettOS (direct access and server) NetBSD. We attribute this improvement to the fact that is always faster than NetBSD. Except very large blocks for smaller messages, the cost of system calls is non- (128KB), LibrettOS-Direct also outperforms Linux. Partic- negligible [76]. In LibrettOS, all system calls are replaced ularly, for 8KB/50, LibrettOS (direct access) is 66% faster with regular function calls. Even LibrettOS-Server avoids than NetBSD and 27% faster than Linux. Even LibrettOS- IPC costs, as the server and application VMs typically run Server outperforms Linux in many cases due to its opti- on different CPU cores. mized, fast IPC mechanism. We note that LibrettOS is able to outperform Linux even though the original NetBSD is LibrettOS Linux NetBSD slower than Linux in this test. We generally attribute this 5 improvement to the fact that LibrettOS removes the system Throughput (GB/s) 4 call layer. Furthermore, compared to NetBSD, LibrettOS im- 3 proves scheduling and resource utilization. For very large 2 blocks (128KB), NetBSD’s network stack current limitations 1 outweigh other gains. We also evaluated dynamic switching when running Ng- 0 4 8 16 32 64 128 256 512 inx. In Figure 10, LibrettOS-Hybrid represents a test with Block Size (KB) runtime switching from the server to direct mode and back to the server mode (the concurrency level is 20). Overall, Figure 8. NFS server. Nginx spends 50% of time in the server mode and 50% in the direct mode. LibrettOS-Hybrid’s average throughput shows LibrettOS-Direct is more efficient not just on throughput benefits of using this mode compared to LibrettOS-Server, but, more importantly, on CPU utilization (Figure 7(b)). The Linux, and NetBSD. The relative gain (Figure 10) over Linux, saved CPU resources can be used in myriad ways – e.g., let alone NetBSD, is quite significant in this test. We also running more applications, obtaining higher overall perfor- measured the cost of dynamic switching to be 56ms from mance. For LibrettOS-Server, there is a 1.8x gap in overall the server to direct mode, and 69ms in the opposite direc- CPU consumption due to an extra layer where packets are tion. transferred from an application to the server. The same ar- gument applies to other experiments in this section – e.g., 5.6 Memcached dynamic switching. We also tested LibrettOS’s network performance with mem- cached [57], a well-known distributed memory caching sys- 5.4 NFS Server tem, which caches objects in RAM to speed up database ac- To evaluate NVMe storage, we run an NFS server mac- cesses. We use memcached 1.4.33 on the server side. On the robenchmark by using Sysbench/FileIO [80] from the client. client side, we run memtier_benchmark 1.2.15 [71] which We measure mixed file I/O composed of page cache and stor- acts as a load generator. The benchmark performs SET and age I/O because users benefit from the page cache in gen- GET operations using the memcache_binary protocol. We eral. We mount an NVMe partition initialized with the ext3 use the default ratio of operations 1:10 to ensure a typi- file system and export it through the NFS server. For Li- cal, read-dominated workload. In Figure 11, we present the brettOS, we use direct NIC and NVMe access. 
We mount number of operations per second for a small and large block the provided NFS share on the client side and run the Sys- sizes and using a different number of threads (1-20). Each bench test with different block sizes (single-threaded). Fig- thread runs 10 clients, and each client performs 100,000 op- ure 8 shows that for all block sizes, LibrettOS outperforms erations. both NetBSD and Linux. LibrettOS is 9% consistently faster LibrettOS-Direct and Linux show similar overall perfor- than both of them. This is because, LibrettOS avoids the mance. LibrettOS-Server reveals a runtime overhead since it system call layer which is known to cause a non-negligible has to run a server in a separate VM. Finally, NetBSD shows performance overhead [76]. the worst performance as in the previous test. 5.5 Nginx HTTP Server 5.7 Redis To run a network-bound macrobenchmark, we choose Ng- Finally, we ran Redis, an in-memory key-value store used inx 1.8.0 [60], a popular web server. We use the Apache by many cloud applications [70]. We use Redis-server 4.0.9 Benchmark [81] to generate server requests from the client and measure the throughput of 1,000,000 SET/GET opera- side. The benchmark is set to send 10,000 requests with var- tions. In the pipeline mode, the Redis benchmark sends re- ious levels of concurrency on the client side. (In this bench- quests without waiting for responses which improves per- mark, concurrency is the number of concurrent requests.) formance. We set the pipeline size to 1000. We measure 10
the performance of Redis for various number of concurrent 4KB 8KB connections (1-20). We show results for the SET/GET oper- 90 180 ation (128 bytes); trends for other sizes are similar. Libret- 80 160 Throughput (MB/s) tOS’s GET throughput (Figure 12) is close to Linux, but the 70 140 60 120 SET throughput is smaller. NetBSD reveals very poor per- 50 100 formance in this test, much slower than that of LibrettOS 40 80 and Linux. We suspect that the current Redis implementa- 30 60 LibrettOS-Direct 20 LibrettOS-Server 40 tion is suboptimal for certain BSD systems, which results in 10 Linux 20 NetBSD worse performance that we did not observe in other tests. 0 0 0 10 20 30 40 50 60 0 10 20 30 40 50 60 5.8 Failure Recovery and Software Up- 16KB 32KB 300 500 grades 250 Throughput (MB/s) 400 LibrettOS can recover from any faults occurring in its 200 servers, including memory access violations, deadlocks, and 300 150 interrupt handling routine failures. Our model assumes that 200 100 failures are localized to one server, i.e., do not indirectly 50 100 propagate elsewhere. Applications that use our network server do not necessarily fail: the portmap state is kept in 0 0 0 10 20 30 40 50 60 0 10 20 30 40 50 60 the hypervisor, and if recovery happens quickly, TCP can still simply resend packets without timing out (applications 64KB 128KB keep their TCP states). Additionally, when upgrading the 700 1000 Throughput (MB/s) server, an artificial fault can be injected so that an applica- 600 800 tion can use a newer version of the server without restarting 500 400 600 the application itself. 300 400 To demonstrate LibrettOS’s failure recovery (or transpar- 200 ent server upgrades), we designed two experiments. In the 200 100 first experiment, an application has two jobs: one uses net- 0 0 working (Nginx) and the other one accesses a file system. 0 10 20 30 40 50 60 0 10 20 30 40 50 60 When LibrettOS’s network server fails, the Nginx server is Concurrency Concurrency unable to talk to the NIC hardware, and we report outage Figure 9. Nginx HTTP server. that is observable on the client side. This failure, nonethe- less, does not affect the file system job. In Figure 13(a), we LibrettOS-Direct LibrettOS-Hybrid show the corresponding network and file system transfer LibrettOS-Server Linux Absolute Throughput (MB/s) speeds. After rebooting the network server, Nginx can again NetBSD Gain (% of NetBSD’s) 50 reach the network, and clients continue their data transfers. 400 350 40 In the other test, Figure 13(b), we run two applications. Ng- 300 250 30 inx uses direct access while Redis uses our network server. 200 150 20 We show that a failure of Nginx does not affect Redis. Then 100 10 we also show that a network server failure does not impact 50 0 0 Nginx. 8 16 32 8 16 32 File Size (KB) File Size (KB) 6 Related Work Figure 10. Dynamic switch (Nginx). LibrettOS’s design is at the intersection of three OS research topics: system component isolation for security, fault-tole- and reliability guarantees [22, 31, 46]. With microkernels, rance, and transparent software upgrades (multiserver OS); only essential functionalities (scheduling, memory manage- application-specialized OS components for performance, ment, IPC) are implemented within the kernel. L4 [52] security, and software bloat reduction (library OS); and ker- is a family of microkernels, which is known to be used nel bypassing with direct hardware access to applications for various purposes. A notable member of this family for performance. 
is seL4 [46], a formally verified microkernel. Multiserver Isolation of system software components in separate ad- OSes such as MINIX 3 [30], GNU Hurd [8], Mach-US [79], dress spaces is the key principle of the microkernel model [1, and SawMill [20] are a specialization of microkernels where 13, 27, 30, 33, 36, 52]. Compared to the classical mono- OS components (e.g., network and storage stacks, device lithic OS design, microkernels provide stronger security drivers) run in separate user processes known as “servers.” 11
LibrettOS-Direct Linux network stacks such as mTCP [40], MegaPipe [25], and LibrettOS-Server NetBSD 32 bytes 8KB OpenOnload [65] have been designed to bypass the OS and avoid various related overheads. While it is not their pri- 250000 200000 mary objective, such user-space network stacks do provide Operations / s 200000 150000 a moderate degree of reliability: since they do not reside in 150000 100000 the kernel, a fault in the network stack will not affect the 100000 50000 applications that are not using it. 50000 In its multiserver mode, LibrettOS obtains the security 0 0 0 5 10 15 20 0 5 10 15 20 and reliability benefits of microkernels by decoupling the Threads (each runs 10 clients) Threads (each runs 10 clients) application from system components such as drivers that are particularly prone to faults [23, 32] and vulnerabili- Figure 11. Memcached (distributed object system). ties [9]. IX [7] and Arrakis [64] bypass traditional OS layers to im- LibrettOS-Direct Linux prove network performance compared to commodity OSes. LibrettOS-Server NetBSD 128 bytes (Set) 128 bytes (Get) IX implements its own custom API while using the DPDK 1.6 library [84] to access NICs. Arrakis supports POSIX but Million transact / s Million transact / s 1 1.4 builds on top of the Barrelfish OS [5], for which device 0.8 1.2 1 driver support is limited. LibrettOS gains similar perfor- 0.6 0.8 mance benefits when using direct hardware access (library 0.4 0.6 0.4 OS) mode, while reducing code development and mainte- 0.2 0.2 nance costs as it reuses a very large base of drivers and soft- 0 0 0 5 10 15 20 0 5 10 15 20 ware from a popular OS, NetBSD. Additionally, in contrast Concurrency Concurrency to LibrettOS, these works do not consider recoverability of OS servers. Figure 12. Redis key-value store. Certain aspects of the multiserver design can be em- ployed using existing monolithic OSes. VirtuOS [62] is a fault-tolerant multiserver design based on Linux that pro- Throughput (MB/s) Throughput (MB/s) 1200 1200 NIC Nginx (Direct) 1000 FS 1000 Redis (Server) vides strong isolation of the kernel components by running 800 800 600 600 them in virtualized service domains on top of Xen. Snap [56] 400 400 implements a network server in Linux to improve perfor- 200 200 0 0 mance and simplify system upgrades. However, Snap uses 0 5 10 15 20 25 30 0 10 20 30 40 50 its private (non-TCP) protocol which cannot be easily inte- Time (sec) Time (sec) grated into existent applications. Moreover, Snap is limited (a) Single application (Nginx) (b) Multiple applications to networking and requires re-implementing all network device drivers. Figure 13. Failure recovery and transparent upgrades. On-demand virtualization [47] implements full OS migra- tion, but does not specifically target SR-IOV devices. In con- trast, LibrettOS allows dynamic device migrations. Decomposition and isolation have proved to be benefi- To optimize I/O in VMs, researchers have proposed the cial for various layers of system software in virtualized en- sidecore approach [19, 48] which avoids VM exits and of- vironments: hypervisor [75], management toolstack [11], floads the I/O work to sidecores. Since this approach is and guest kernel [62]. Security and reliability are partic- wasteful when I/O activity reduces, Kuperman et al. [49] ularly important in virtualized environments due to their proposed to consolidate sidecores from different machines multi-tenancy characteristics and due to the increased re- onto a single server. 
liance on cloud computing for today’s workloads. In the Exokernel [15] was among the first to propose compiling desktop domain, Qubes OS [73] leverages Xen to run ap- OS components as libraries and link them to applications. plications in separate VMs and provides strong isolation for Nemesis [72] implemented a library OS with an extremely local desktop environments. LibrettOS also uses Xen to play lightweight kernel. Drawbridge [66], Graphene [85], and the role of the microkernel. However, the LibrettOS design Graphene-SGX [86] are more recent works that leverage is independent of the microkernel used and can be applied the security benefits of library OSes due to the increased as-is to other microkernels supporting virtualization, with isolation resulting from the lesser degree of interaction be- potentially smaller code bases [46, 78, 87]. tween applications and the privileged layer. Bascule [6] The network stack is a crucial OS service, and its reliabil- demonstrated OS-independent extensions for library OSes. ity has been the subject of multiple studies based on micro- EbbRT [74] proposed a framework for building per-app- kernels/multiserver OSes using state replication [12], par- lication library OSes for performance. titioning [35], and checkpointing [21]. Several user-space Library OSes specialized for the cloud – unikernels [10, 12