Serverless Predictions: 2021-2030 - arXiv.org
Pedro Garcia Lopez, Universitat Rovira i Virgili/IBM Watson Research, Spain/USA
Aleksander Slominski, IBM Watson Research, USA
Michael Behrendt, IBM Deutschland Research & Development GmbH, DE
Bernard Metzler, IBM Research Zurich, CH

arXiv:2104.03075v1 [cs.DC] 7 Apr 2021
Published by the IEEE Computer Society © IEEE

Abstract: Within the next 10 years, advances in resource disaggregation will enable full transparency for most Cloud applications: running unmodified single-machine applications over effectively unlimited remote computing resources. In this article, we present five serverless predictions for the next decade that will realize this vision of transparency, equivalent to Tim Wagner’s Serverless SuperComputer or AnyScale’s Infinite Laptop proposals.

SERVERLESS gained popularity in industry and in academia in the last few years [1]. It attracted companies and developers with a simple Function-as-a-Service (FaaS) programming model that realized the original promise of the Cloud: elasticity and fine-grained pay-as-you-go pricing for actual usage. When code runs in FaaS, developers do not control where it runs and do not need to worry about how to scale it: serverless cloud providers create transparency by removing servers, or at least by making them more transparent.

Transparency is an archetypal challenge in distributed systems that has not yet been adequately solved. Transparency implies the concealment from the user and the application programmer of the complexities of distributed systems. According to Coulouris [2], access transparency enables local and remote resources to be accessed using identical operations.

Nevertheless, Waldo et al. [3] explain that the goal of merging the programming and computational models of local and remote computing is not new. They state that around every ten years “a furious bout of language and protocol design takes place and a new distributed computing paradigm is announced”. In every iteration, a new wave of software modernization is generated, and applications are ported to the newest and hottest paradigm.

We believe the Serverless Compute paradigm, as emerging today [1], [9], will converge at the needed level of resource abstraction to enable transparency. What we call The Serverless End Game is the process of mapping this principle onto emerging disaggregated computing resources (compute, storage, memory), eventually enabling unlimited flexible scaling.

The major hypothesis of this paper is that full transparency will become possible in the next years thanks to predicted advances in latency reduction in distributed systems [4], [6], [16]. This will put an end to the aforementioned cycles
of software modernization. The consequences for the field will be enormous, considerably simplifying the development and maintenance of software systems for the majority of users.

BACKGROUND

Latency improvements [4], [6] are boosting resource disaggregation in the Cloud, which is the definitive catalyst to achieve transparency. As we can see in Table 1, current data center networks already enable disk storage disaggregation, where reads from local disk are comparable (10ms) to reads over the network. In contrast, creating a thread in Linux takes about 10µs, still far better than the 15ms/100ms (warm/cold) achieved today in Function-as-a-Service (FaaS) settings. The level of resource disaggregation possible today is specifically utilized by Serverless Platforms, and is the focus of research in Disaggregated Data Centers (DDC) in general [8].

Table 1. Latencies for remote resource access

Resource   Local              Remote today      Remote soon
Storage    10ms               10ms              0.05ms (NVM)
Compute    0.01ms (Thread)    15-100ms (FaaS)   0.001-0.1ms (RPC)
Memory     0.0001ms           0.25ms (Redis)    0.002-0.01ms (PMEM)

Providing access transparency over DDC resources is the aim of LegoOS: A disseminated, distributed OS for hardware resource disaggregation [13]. LegoOS exposes a distributed set of virtual nodes (vNode) to users. Each vNode is like a virtual machine managing its own disaggregated processing, memory and storage resources. LegoOS achieves transparency and backwards compatibility by supporting the Linux system call interface, so that unmodified Linux applications can run on top of it. For example, LegoOS executes two unmodified applications: Phoenix (a single-node multi-threaded implementation of MapReduce) and TensorFlow.

A good example of providing access transparency over serverless resources is Lithops [14]. Lithops intercepts Python language libraries (multiprocessing) in order to access remote serverless resources in a transparent way. Lithops is, however, limited to running Python applications that use that library.

Another example of transparency in a serverless context is Faasm [19]. Faasm exposes a specialized system interface which includes some POSIX syscalls, serverless-specific tasks, and frameworks such as OpenMP and MPI. Faasm transparently intercepts calls to this interface to automatically distribute unmodified applications, and to execute existing HPC applications over serverless compute resources.

Faasm allows colocated functions to share pages of memory and synchronizes these pages across hosts to provide distributed state. However, this is done through a custom API where the user must have knowledge of the underlying system, hence breaking full transparency. Furthermore, when functions are widely distributed, this approach exhibits performance similar to traditional distributed shared memory (DSM), which has proven to be poor without hardware support.

Nevertheless, resource disaggregation is still in its infancy, and there is no current solution to provide flexible scaling and access transparency over remote shared memory. Container instantiation is slow compared to local threads, and even fast NVMs [16] are an order of magnitude slower than local memory accesses, which are in the nanosecond range [6].

Besides the aforementioned constraints of current resource disaggregation, serverless computing has a number of well-known limitations [9], such as: a focus on stateless computations, lack of efficient communication between executed tasks or functions, maximum code runtime limitations, and deficiencies in the transparent integration of hardware accelerators.

PREDICTIONS

The major hypothesis of this paper is that transparency will be achieved in the next ten years thanks to novel advances in networking, disaggregation, and middleware services. The huge consequence is the unification of local and remote paradigms, which will democratize distributed programming for a majority of users. This will realize the old and ultimate goal of hiding the complexity of distributed systems.

The projected developments to reach the ultimate goal (Serverless End Game) include the following:

• Prediction 1: Serverless Clusters (Multi-tenant Kubernetes) will overcome the current limitations of direct communication among functions, hardware acceleration, and time limits.
• Prediction 2: Serverless Granular Computing will offer 1-10 µs latencies for remote functions thanks to lightweight virtualization and fast RPCs.
• Prediction 3: Serverless memory disaggregation will offer shared mutable state and coordination at 2-10 µs latencies over persistent memory.
• Prediction 4: Serverless Edge Computing platforms leveraging 6G’s ms latencies and AI optimizations will facilitate a Cloud Continuum for remote applications.
• Prediction 5: Transparency will become the dominant software paradigm for most applications, when computing resources become standardized utilities.

Figure 1. Serverless End Game

We will discuss the basis of these predictions as well as technical challenges and risks. As we can see in Figure 1, the predictions are mapped to phases to reach the final goal. Let’s review the proposed forecasts.

Prediction 1

In less than two years, a second generation of Serverless platforms will overcome major limitations of previous FaaS offerings.

This is already starting to happen around the so-called serverless container platforms like Google Cloud Run, IBM Code Engine, or Amazon Fargate. Multi-tenant Kubernetes (K8s) clusters may become the necessary glue to build multi-Cloud API-agnostic applications.

As a consequence, a number of Cluster toolkits will accelerate the transition to K8s, like Web frameworks (Flask, Rails, Node.js), Microservices (Quarkus, Spring, Micronaut), Analytics (Spark, Ray, Dask), Actors (Akka), and even High Performance Computing (MPI). There will be intense efforts in cluster scheduling and workload analysis using machine learning techniques to optimize the deployment of different applications and frameworks.

Nevertheless, container instantiation times will still be slow (tens to hundreds of milliseconds) and similar to current FaaS technologies. The lack of memory disaggregation will also preclude the transition for many applications.

Prediction 2

Serverless Granular Computing will drastically reduce instantiation and execution times thanks to more lightweight virtualization and execution technologies [7] and fast RPCs [5]. Microsecond latencies will also propagate to other middleware services such as messaging and collective communication.

Accessing disaggregated resources in a transparent manner requires a form of lightweight, flexible virtualization that does not currently exist. This new virtualization must intercept computation and memory management to provide access to disaggregated resources, and must do so with native-like performance and no input from the programmer. Several alternatives for lightweight virtualization will emerge around micro-VMs, software-based virtualization (Krustlet and WebAssembly) and platform-independent runtimes like GraalVM.

In summary, serverless granular computing will represent a performance boost for most applications in the Cloud, provoking a massive redesign of many toolkits (Web, games, Desktop).

Prediction 3

In the next years, public Cloud providers will start offering Serverless Memory Disaggregation at microsecond speeds. We leverage here previous work on memory disaggregation like [20], [11], [10]. Services like Serverless Redis will arrive soon with limited consistency requirements.

Nevertheless, disaggregated memory at microsecond latencies will never reach the nanosecond latencies of local memory. As a consequence, the combination of local memory with remote consistent shared memory will then create truly Serverless Mutable State and Coordination services. This will open transparency avenues for most stateful applications like: Serverless In-Memory Databases, Analytics (TensorFlow, PyTorch, Phoenix, OpenMP), Game engines and Multi-user Worlds (Unity, Minecraft), Desktop applications (Image editing, Video Editing), and Collaborative applications (games, conferencing).

When Serverless Granular Computing and Disaggregated Memory become available, the three types of resources (compute, storage, memory) may be offered through virtual interfaces against disaggregated Serverless services.

This will imply parallel work in different layers of the software stack. The first approach, as followed by Lithops [14], will consist of intercepting language libraries and toolkits to transparently interact with and combine local and remote resources.

The second approach will adapt polyglot execution runtimes like GraalVM, WASM, or the .NET Common Language Runtime, among others. This second phase is more ambitious, and it will consistently reach all languages and applications built on top of them.

Finally, the more advanced but also more complex approach is to redesign the Kernel Space to accommodate Serverless Remote Resources in a seamless way. This is more in line with initial DDC works like LegoOS [13] or Arrakis [18], but it will certainly require hard engineering efforts to cover the entire kernel APIs, and probably novel co-designs and OS improvements at all levels. We outline here the virtual file system and network device drivers, virtual memory management, and also process/thread management layers.

In this line, Arrakis [18] comes from previous efforts aimed at optimizing the kernel code paths to improve data transfer and latency in the OS. In Arrakis, applications have direct access to virtualized I/O devices, which allows most I/O operations to bypass the kernel entirely without compromising process isolation. The Arrakis virtualized control plane approach allows storage solutions to be integrated with applications (co-design), even allowing the development of higher-level abstractions like persistent data structures. Arrakis shows us how its control plane may become a first step towards integration with a serverless data center resource allocator.

Redesign of OS kernels will then enable the final goal of full transparency for any native application using the OS public call interfaces. This will require the collaboration of DDC and OS researchers, but also the research work of the broader Systems community.

Prediction 4

The computing power at the edge, along with the 1-2 millisecond latencies facilitated by 6G, will produce Serverless Edge platforms that will alter the operating systems of mobile phones and terminals at the edge. The smooth integration with Cloud resources will create a Cloud Continuum where transparent applications directly access data in remote resources.

Microsecond latencies will also produce a flourishing of remote user interfaces. In line with remote displays (X-Windows, VNC), many applications could use edge devices only for user interface interaction, moving all computing logic to remote execution. This could provoke a revival of dumb terminals that delegate their execution entirely to the cloud.

Nevertheless, dumb terminals cannot address more sophisticated use cases where edge computing resources are relevant. In cases where the computing resources at the edge cannot be neglected, local and remote resources should be intelligently combined in the Edge/Cloud Continuum.

This will certainly become the last frontier for software modernization, which will then define how to write local applications that use remote resources in a transparent way. A plethora of novel applications (tele-presence, games, collaborative work, augmented/mixed reality, virtual reality) will emerge leveraging the novel programming abstractions, and existing ones will be re-engineered or improved considerably.

Prediction 5

In ten years, no application domain will remain unaltered by transparency as a consequence of the Serverless End Game. The convergence of Granular Computing, Disaggregated Memory, and the Edge-Cloud continuum will definitely unify the local and remote paradigms. Computing will finally become a utility thanks to standards accepted by all providers.
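The first approach described under Prediction 3, intercepting a language library so that the same calls reach local or remote resources, can be sketched with Python's standard `concurrent.futures` interface. This is only a minimal illustration under stated assumptions and not how Lithops is implemented: `SimulatedRemoteExecutor`, `word_count`, and `total_words` are hypothetical names, and the "remote" side is simulated with a thread pool plus pickle marshaling of arguments and results.

```python
import pickle
from concurrent.futures import Executor, ThreadPoolExecutor


class SimulatedRemoteExecutor(Executor):
    """Hypothetical stand-in for a serverless backend (an assumption).

    It exposes the standard Executor interface, but marshals arguments and
    results with pickle, as a real remote invocation would have to.
    """

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, fn, *args, **kwargs):
        payload = pickle.dumps((args, kwargs))  # "send" the call

        def invoke():
            call_args, call_kwargs = pickle.loads(payload)
            result = fn(*call_args, **call_kwargs)
            return pickle.loads(pickle.dumps(result))  # "receive" the result

        return self._pool.submit(invoke)

    def shutdown(self, wait=True, **kwargs):
        self._pool.shutdown(wait=wait)


def word_count(line):
    """Unit of work; knows nothing about where it runs."""
    return len(line.split())


def total_words(executor, lines):
    """Application code: identical for local and remote executors."""
    return sum(executor.map(word_count, lines))


lines = ["serverless end game", "access transparency", "unlimited flexible scaling"]

with ThreadPoolExecutor(max_workers=2) as local:
    local_total = total_words(local, lines)

with SimulatedRemoteExecutor(max_workers=2) as remote:
    remote_total = total_words(remote, lines)

print(local_total, remote_total)
```

The design point is that `total_words` is written once against the library interface; only the object passed in decides whether work stays local or is shipped elsewhere, which is the access transparency property described by Coulouris [2].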
IMPACT

Achieving transparency will have a profound impact on the computing field and will drastically change how applications are built in the future.

On the technical side, transparency has been the ultimate goal of distributed systems for many decades now. Developers will still care about parallel programming, object-oriented programming, user interfaces, event-based systems, or functional programming. However, most developers will no longer worry about middleware, transport layers and marshaling, web-based protocols, or Cloud-specific APIs, among others.

On the technical side, the disaggregation (and standardization) of remote computing resources will also have a profound impact on the edge: in the redesign of Operating Systems for computers, mobile phones, and edge devices. Novel development environments will emerge to cover the entire life cycle of applications in the Cloud/Edge continuum.

On the economic side, transparency will also imply a revolution that will shake the entire industry. The major economic impact will be in developer costs and productivity for most applications. If Serverless technologies transparently handle the distributed systems complexity, programming remote applications will become as simple as programming local ones with infinite resources.

Transparency will boost the consumerization of software, facilitating the creation of many applications in different domains. More users will be able to customize and develop software, benefiting from novel visual and no-code tools.

The economic impact will also affect all Cloud providers in a first wave, in a fierce competition to be the most efficient disaggregated backend for applications. It will also affect Operating Systems at the edge (like Apple, Windows, Linux, Android, or iOS), which will seamlessly integrate remote resources.

In this integration of Cloud and Edge, some providers with a presence on both sides (Google, Microsoft, Apple) will have a certain advantage over the rest of the industry. Nevertheless, open source giants like IBM/RedHat could also offer integrated solutions in the Cloud and Edge based on Linux technologies.

In any case, microsecond latencies enabled in the data center, and 1-5 ms latencies of edge resources, will profoundly change how we interact with technology. Analogously to the unification of the local and remote paradigms in programming, we will see a blurring and unification of real and virtual environments using Augmented and Virtual Reality.

TRADEOFFS OF DISAGGREGATION AND TRANSPARENCY

Blurring the borders between local and remote resources may have important advantages, but it also implies some tradeoffs and limitations that we must be aware of.

Waldo et al. [3] already considered 25 years ago that future hardware improvements could make the difference in latency irrelevant, and that differences between local and remote memory could be masked. But they still claimed that concurrency and partial failures preclude the unification of local and remote computing. Let's study the major challenges and tradeoffs:

1) Partial Failure: Existing applications are a black box for the Cloud, but the transition will imply a “compile to the Cloud” process. In this case, the Cloud will have access to the applications’ life cycle and will be able to offer failure transparency, and to optimize their execution performance and cost. This means that Cloud providers can perform static analysis to predict resource requirements, failures, dependencies and potential for hardware acceleration. In the past, extensive optimizations have been applied to declarative dataflows, but there are also opportunities to improve imperative programs with static analysis [15].

   Transparency efforts for different types of applications will also require customizable control planes. In particular, customized co-designs [17] may be necessary to optimize resource usage. Such customization will be based on advanced observability and fast orchestration mechanisms relying on standard services and protocols. Monitoring and interception of the different resources (compute, storage, memory, network) should be available and even integrated into the data center, enabling coordinated actuators at
different levels.

2) Concurrency: Another important limitation when developing concurrent applications is scaling transparency, which means that applications can expand in scale without changes to the system structure or the application algorithms. If the local programming model was designed to use a fixed amount of resources, there is no magic way of transparently achieving scalability, not to mention elasticity.

   Some workloads that do not need elasticity, such as enterprise batch jobs or scientific simulations, can just port existing code to the Cloud. If the application is configured to run on a fixed number of resources (threads or processes), it may run unmodified over the same fixed amount of disaggregated resources. However, for more user-driven and interactive services, such as internal enterprise web applications, simple porting of the executables (sometimes referred to as “lift-and-shift”) is rarely enough. The unchanged code is not able to take advantage of the elasticity of disaggregated resources.

   We will then need elastic programming models that can be used without change when running over Cloud resources. Such elastic models should take care of providing the different transparency types (scaling, failure, replication, location, access) and other aspects of application behavior when it is moved between local and distributed environments. The local executable APIs may need to be expanded to include elastic programming abstractions for processes, memory, and storage.

3) Locality: Advances in datacenter networking and NVMs have reduced access to networked storage to a few tens of µs; however, this is still an order of magnitude slower than local memory accesses, which are in the nanosecond range [6] (100ns), and local cache accesses in the 4ns-30ns range. Existing efforts in memory disaggregation [11], [10] strive to play in the µs range, which can be a limiting factor for some applications. This means that local memory cannot be neglected, since it will always offer an order of magnitude faster access than remote memory. Future approaches should smartly leverage local memory and combine it with remote memory.

   Locality still plays a key role in stateful distributed applications. For example: (i) where huge data movements represent a penalty and memory locality can be useful; (ii) where specialized hardware like GPUs must be used; (iii) in some iterative machine-learning algorithms; (iv) in simulators, interactive agents or actors.

   Future serverless middleware addressing transparency will have to provide affinity and grouping requirements for stateful entities. Serverless Stateful services will support very different requirements of coordination, consistency, scalability and fault tolerance.

4) Cost: Another limiting factor today for current serverless disaggregated technologies is cost. For example, Amazon Lambda compute time is 2x more expensive than on-demand VMs, and around 6x more expensive than Spot Instances. For problems like batch analytics that have 100% utilization, VMs are now a better solution for intensive computing applications.

   We foresee that cost limitations will be overcome in the future, when Cloud providers improve scheduling of lightweight micro-VMs, reducing both start time and overall costs thanks to locality. Furthermore, sophisticated cost planes should be available to users in order to have full control of costs derived from remote resources.

CONCLUSIONS

We argue that full transparency will be possible soon thanks to low latency and resource disaggregation. The Serverless End Game will unify local and remote programming paradigms, completely changing the way we currently create distributed applications. This is the ultimate goal of distributed systems: to become invisible using transparent middleware, and to simplify how users access remote resources.
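The cost tradeoff (item 4 of the tradeoffs) can be made concrete with a little arithmetic. The sketch below rests on a simplifying assumption that is not taken from the cited sources: a serverless platform bills only for busy time at `price_ratio` times the VM rate, while a VM is billed for the whole period regardless of load. Under that assumption, pay-per-use is cheaper only while utilization stays below 1/price_ratio. The function names are hypothetical, and cold starts, per-request fees and memory sizing are ignored.

```python
def break_even_utilization(price_ratio):
    """Utilization below which pay-per-use beats an always-on VM.

    price_ratio: serverless price per busy second divided by the VM price
    per second (for example 2.0 for the on-demand figure quoted above).
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return 1.0 / price_ratio


def relative_serverless_cost(utilization, price_ratio):
    """Serverless cost divided by VM cost over the same period."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return utilization * price_ratio


on_demand_ratio = 2.0  # "2x more expensive than on-demand VMs"
spot_ratio = 6.0       # "around 6x more expensive than Spot Instances"

for name, ratio in (("on-demand VM", on_demand_ratio), ("Spot Instance", spot_ratio)):
    threshold = break_even_utilization(ratio)
    print(f"{name}: serverless is cheaper below {threshold:.1%} utilization")
```

With the quoted ratios, a batch job at 100% utilization costs twice as much on the serverless platform as on an on-demand VM, which matches the observation that VMs are currently the better fit for fully utilized workloads, while bursty services with low average utilization fall on the other side of the threshold.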
ACKNOWLEDGMENT

This work has been partially supported by the EU Horizon2020 programme under grant agreement No 825184.

REFERENCES

1. P. Castro, V. Ishakian, V. Muthusamy, A. Slominski, “The rise of serverless computing”, Communications of the ACM, 2019.
2. G. Coulouris, J. Dollimore, T. Kindberg, G. Blair, “Distributed Systems: Concepts and Design”, Addison-Wesley Publishing Company, 2011. (book)
3. J. Waldo, G. Wyant, A. Wollrath, S. Kendall, “A note on distributed computing”, International Workshop on Mobile Object Systems, 1996.
4. S. Rumble, D. Ongaro, R. Stutsman, M. Rosenblum, J.K. Ousterhout, “It’s Time for Low Latency”, Proceedings of the Workshop on Hot Topics in Operating Systems, 2011.
5. A. Kalia, M. Kaminsky, D. Andersen, “Datacenter RPCs can be general and fast”, USENIX Symposium on Networked Systems Design and Implementation, 2019.
6. L. Barroso, M. Marty, D. Patterson, P. Ranganathan, “Attack of the killer microseconds”, Communications of the ACM, pp. 48-54, 2017.
7. C. Lee, J. Ousterhout, “Granular Computing”, Proceedings of the Workshop on Hot Topics in Operating Systems, 2019.
8. P. Gao, A. Narayan, S. Karandikar, J. Carreira, S. Han, R. Agarwal, S. Ratnasamy, S. Shenker, “Network requirements for resource disaggregation”, USENIX Symposium on Operating Systems Design and Implementation, 2016.
9. E. Jonas et al., “Cloud programming simplified: A Berkeley view on serverless computing”, [Online]. Available: https://arxiv.org/abs/1902.03383 (URL).
10. J. Gu, Y. Lee, Y. Zhang, M. Chowdhury, K. Shin, “Efficient memory disaggregation with Infiniswap”, USENIX Symposium on Networked Systems Design and Implementation, 2017.
11. A. Dragojević, D. Narayanan, M. Castro, O. Hodson, “FaRM: Fast remote memory”, USENIX Symposium on Networked Systems Design and Implementation, 2014.
12. S. Peter et al., “Arrakis: The operating system is the control plane”, ACM Transactions on Computer Systems (TOCS), 2015. (journal)
13. S. Angel, M. Nanavati, S. Sen, “Disaggregation and the Application”, USENIX Workshop on Hot Topics in Cloud Computing, 2020.
14. J. Sampe, P. Garcia-Lopez, M. Sanchez, “Towards Multicloud Access Transparency in Serverless Computing”, IEEE Software, 2020.
15. M. Kiran, A. Hashim, L. Kuan, T. Jiun, “Execution time prediction of imperative paradigm tasks for grid scheduling optimization”, Int J Comput Sci Netw Secur, 2009.
16. T. Coughlin, “Nonvolatile Memory Express: The Link That Binds Them”, IEEE Computer, 2019.
17. S. Angel, M. Nanavati, S. Sen, “Disaggregation and the Application”, USENIX Workshop on Hot Topics in Cloud Computing, 2020.
18. S. Peter et al., “Arrakis: The operating system is the control plane”, ACM Transactions on Computer Systems (TOCS), volume 33, pages 1-30, 2015.
19. S. Shillaker, P. Pietzuch, “FAASM: Lightweight Isolation for Efficient Stateful Serverless Computing”, USENIX Annual Technical Conference (USENIX ATC 19), 2019.
20. A. Klimovic et al., “Pocket: Elastic ephemeral storage for serverless analytics”, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018.

Pedro Garcia Lopez is full professor at Universitat Rovira i Virgili (Spain) and visiting scientist at IBM Watson Research. He leads the “Cloud and Distributed Systems Lab” research group and coordinates the European research project “CloudButton: Serverless Data Analytics”. Contact him at pedro.garcia@urv.cat.

Aleksander Slominski is research staff member in the Serverless Group in Cloud Platform, Cognitive Systems and Services Department at IBM T.J. Watson Research Center in Yorktown Heights, NY, USA. Contact him at https://aslom.net

Michael Behrendt is a Distinguished Engineer and technical executive at IBM Deutschland Research. He leads key Serverless services like IBM Cloud Functions and IBM Code Engine. Contact him at michaelbehrendt@de.ibm.com.

Bernard Metzler is a Principal Research Staff Member and Technical Leader at IBM Zurich Research Laboratory. His main research interests are in enhancing network and storage IO of distributed systems, and the integration of modern high performance IO hardware with distributed applications. Contact him at bmt@zurich.ibm.com.