Towards Disaggregating the SDN Control Plane - arXiv
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
1 Towards Disaggregating the SDN Control Plane Douglas Comer, and Adib Rastegarnia Abstract—Current SDN controllers have been designed provide control logic. In the current SDN paradigm, based on a monolithic approach that integrates all of management functionality crosses three key layers, services and applications into one single, huge program. including data plane forwarding mechanisms, con- The monolithic design of SDN controllers restricts pro- trol plane functions, and management applications grammers who build management applications to the specific programming interfaces and services that a given [1]. A centralized control plane is implemented by arXiv:1902.00581v2 [cs.NI] 22 Apr 2019 SDN controller provides, making application develop- an SDN controller. A controller uses two types of ment dependent on the controller, and thereby restricting Application Program Interfaces (APIs) to connect portability of management applications across controllers. to other entities: a Northbound (NB) API and a Furthermore, the monolithic approach means an SDN Southbound (SB) API. A NB API defines commu- controller must be recompiled whenever a change is made, nication between external management applications and does not provide an easy way to add new functionality or scale to handle large networks. To overcome the and the SDN controller; a SB API defines com- weaknesses inherent in the monolithic approach, the next munication between the controller and underlying generation of SDN controllers must use a distributed, network devices [1]. One of the most widely used microservice architecture that disaggregates the control SB APIs, known as the OpenFlow [2] protocol, plane by dividing the monolithic controller into a set allows a controller to insert, modify, and update of cooperative microservices. In this paper, we explain flow table rules, and to specify associated actions the steps that are required to migrate from a monolithic architecture to a microservice architecture, and consider to be performed for each of the flows that pass two potential designs that achieve the goal. Finally, the through a given network device. The architecture paper reports the results of testbed measurements that we of current software defined management systems use to evaluate the proposed disaggregated architecture exhibits several weaknesses as follows: from multiple perspectives, including functionality and performance. • Monolithic and Proprietary: Current SDN con- Index Terms—Software Defined Networking, Control trollers employ a monolithic architecture that Plane Disaggregation, OpenFlow, Network Operating Sys- aggregates all control plane subsystems into a tem, gRPC, Apache Kafka. single, gigantic, monolithic program. In the ag- gregated control plane model, each controller I. I NTRODUCTION defines its own set of programming interfaces and services; when writing management applica- Software defined networking (SDN) is an emerg- tions, a programmer can only use the facilities ing trend for the design of Internet management the controller provides. An aggregated controller systems that decouples vertical integration of the design makes application development dependent control plane and data plane and provides flexibility on a particular SDN controller and the specific that allows software to program the data plane hard- programming language that has been used to ware directly according to a set of network policies. implement the controller. Consequently it restricts Thus, the control functions used to configure a portability across multiple controllers by making device no longer need to be integrated with the each application depend on a specific controller. functions that perform data forwarding. SDN offers In the monolithic approach, the vendor that cre- flexibility to program and monitor computer net- ates a controller chooses all functionality and works directly using a controller that runs software ensures the cohesion of all pieces. However, the known as a Network Operating System (NOS) to approach does not provide an easy way for users to incorporate new services or adapt the controller Douglas Comer and Adib Rastegarnia are with the Department of Computer Science, Purdue University, West Lafayette, IN,47906, quickly. USA, e-mail: (comer@cs.purdue.edu, arastega@purdue.edu • Lack of a Uniform Set of NB APIs: Even if
2 they use the same general form of interaction event processing. and Section V compares the two (e.g., RESTful interface), SDN controllers, such event distribution systems by assessing implementa- as ONOS [3] and OpenDayLight [4], each offer tion tradeoffs. Section VI explains an experimental a NB APIs that differs from the APIs offered by setup used to assess performance, and reports the other controllers in terms of syntax, naming con- results of measurements. Finally, Section VII con- ventions, and resources. The lack of uniformity cludes the paper. among NB APIs makes each external manage- ment application dependent on the NB API of a II. R ELATED W ORK specific SDN controller. • Lack of Reusability of Software Modules: In Umbrella [5] is a unified software defined net- the current SDN architecture, the dependency work programming framework that provides a new between a management application and a specific set of APIs for the implementation of SDN applica- type of controller limits reuse of SDN software. tions independent of the NB APIs used by specific In many cases, when porting an application from SDN controllers. Umbrella uses OFtee [6] as a tool one controller to another, a programmer must to provide OpenFlow PACKET IN messages to the completely recode all modules, including basic external applications to support external reactive modules that collect topology information, gen- SDN applications as well as proactive applications. erate and install flow rules, monitor topology This article is an extension of [7], which presents the changes, and collect flow rule statistics. idea of externalization of packet processing in SDN, • Lack of Scalability and Reliability: In current one of the first steps towards control plane disaggre- SDN controllers, there are strong dependencies gation. This article generalizes the idea of external between the subsystems. If one of the subsystems packet processing to external event processing, a fails, the failure affects the operation of other design that allows complete disaggeration of control susbsystems. In addition, a monolithic SDN con- plane services. In addition, this article focuses on troller is a gigantic and complex software that required steps to migrate from a monolithic SDN is difficult to scale because a given subsystem control plane architecture to a disaggregated control cannot be changed or scaled independently. plane. To overcome the above weaknesses, we propose a new architecture for the design and implementation III. SDN C ONTROL P LANE D ISAGGREGATION of next generation SDN control plane systems that As Figure 1 illustrates, a monolithic SDN con- splits the current monolithic controller software into troller can be disaggregated into a suite of coop- a set of cooperating microservices. This article erating microservices, such that a controller core makes the following contributions: provides the minimum required functionality, and • Explains the concept of SDN Control Plane microservices, that exist outside of the controller Disaggregation and introduce a distributed ar- core provide all other services. Migrating from a chitecture for next generation of SDN con- monolithic controller to a microservice architecture trollers. for control planes offers the following benefits: • Describes steps taken towards disaggregating • Flexibility In Scaling: One of the main ad- the SDN control plane, considers potential vantages of the microservice architecture arises ways to achieve the goal, and discuss the from its ability to scale a given service hori- advantages and disadvantages of each. zontally, independent of other subsystems and • Evaluates the approaches and consequent services. tradeoffs from multiple perspectives, such as • Freedom In Choosing a Programming Lan- performance and implementation effort. guage: In the disaggregated control plane The rest of the paper is organized as follows: The model, programmers have the opportunity to next section summarizes related work. Section III choose an arbitrary programming language, explains the concept of SDN control plane disaggre- programming technology, and third-party li- gation. Section IV considers two event distribution brary function when building a given SDN systems that can potentially be used to externalize management application. The approach makes
3 the application development process more flex- vides a standard API used to stream events ible, and allows a programmer to choose a to the external processes (i.e. microservices); language that is appropriate to a given app. to permit multilingual apps, the API must be • Fault Isolation: In current SDN controllers, if independent of the programming language one of the subsystems fails it can affect the used to implement each microservice. entire controller. A disaggregated control plane – SB Protocols: SB protocols define the com- means that the failure of a given microservice munication between underlying network de- will not affect other microservices. Moreover, vices and the core of the controller. The a microservice can be repaired and restarted event distribution system communicates uses without recompiling the controller and without SB protocols to receive events from under- restarting other microservices. lying devices, and then uses the distribution • A Controller Core With Minimal Com- mechanism to propagate event notifications ponents: In the monolithic approach, every to external microservices. instance of a controller includes all services • Disaggregated Control Plane Services: In and apps, even if the instance only needs a the disaggregated model, each of the core small subset. In the disaggregated model, the control plane services (e.g. topology service, controller core contains the minimum viable flow service, etc) is a microservice that can set of components and functions, and only the run in a container. The microservices (i.e. the required apps and services need to be deployed containers in which the service runs) can be outside of the controller core. orchestrated using a conventional container or- • A Disaggregated Code Base: In the mono- chestrator technology, such as Kubernetes [8]. lithic architecture, a single, large code base Each control plane service provides three sets includes all services and apps. Consequently, of APIs: NB APIs that define the communica- changes to even a small, seldom-used service tion between control plane services and appli- requires changing the controller code base. In cations, inter-service APIs that define the com- the disaggregated model, each services and munication between each microservice, and SB each applications can be in a separate code APIs that define the communication between base, isolating changes. each microservice and controller core compo- The key components of a disaggregated control nents. plane architecture consist of: • Management Plane: Management plane com- prises set of management applications that use • Minimum Viable Controller Components: disaggregated control plane services to imple- The first step towards SDN control plane dis- ment network control functions, policies, and aggregation involves identifying a minimal set strategies, such as routing, network monitoring, of viable controller core components. When it firewall rules, network address translation, and receives a request from a management app or load balancing. To access microservice, man- from a control plane service, a control plane agement applications use the NB APIs that the core service uses a SB protocol to communi- microservices offer. In fact, each management cate with the underlying network devices, and application forms a microservice that runs in a then uses the information to respond to the container exactly like other microservices. request. Consequently, a controller with min- imum functionality needs the following two major subsystems: IV. A N OVERVIEW OF E VENT D ISTRIBUTION M ECHANISMS – An Event Distribution System: This sub- system notifies microservices (i.e. control We define an event as a change in the network. plane services and management applica- For example, consider the following network events tions) about events that occur in the under- that occur commonly: lying network devices, such as flow rule. • Packet Exception Event. A switch is config- network link, device, or packet exception ured to send any packet to the controller core if events. The event distribution system pro- the packet does not match any of the installed
4 Management Plane SDN App SDN App SDN App Intent Management Topology Management Service Service Disaggregated Inter-service APIs Control Plane Telemetry Flow Management NB APIs Service Services SB APIs Inventory Minimum Viable Event Distribution System Controller Southbound Protocols Components Switch Data Plane Switch Switch Switch Fig. 1. The Disaggregated SDN Control Plane Architecture forwarding rules. A packet exception event changes in connectivity from link and device events. occurs when a packet arrives at the controller Consequently, the topology microservice and man- core. agement apps can each react to the network changes. • Topology Event. Whenever the network topol- The next subsections consider the implementation ogy changes (e.g., a link fails or a link is of event distribution mechanisms using publish- placed back in service), the controller core subscribe and point-to-point messaging paradigms. receives a topology event. Topology events can be categorized according to type: link, device, A. An Event Distribution Mechanism Using A port, and so on. Publish-Subscribe Model • Flow Rule Event: Flow rule event occurs As Figure 2 illustrates, the event distribution whenever the flow rules change (e.g, when a system that runs inside the controller listens for rules if added, removed, or updated). events that occur in the controller core, and pushes As the previous section explains, disaggregating each events into a topic in an event broker where control plane requires external event processing. external processes and applications can consume A typical SDN controller provides several built- the event by pulling it from a topic in the event in services, such as a topology discovery service. broker. In essence, the event distribution system Provided that event processing can be externalized, acts as a producer by pushing items to brokers, and most of the control plane services in an SDN con- external applications and services act as consumers troller can be implemented as microservices outside that receive events from the broker. In the publish- of the controller core. For example, to implement subscribe model, each type of event is associated a topology discovery microservice, the event dis- with a Topic. Thus, an external application or a tribution mechanism can capture and forward both service specifies a topic, and then receives events link layer discovery protocol (LLDP) and Address sent to the topic. If an app wishes to receive packet Resolution Protocol (ARP) packets to an external events, the app subscribes to the packet event topic. microservice that uses the packets to build a map of Once an app subscribes to a topic, the app will re- links between switches and end-hosts. Both a topol- ceive a notification each time the event distribution ogy service and management apps can learn about systems publishes an event on the topic.
5 V. I MPLEMENTATION O PTIONS AND App Service App T RADE - OFFS Various technologies and frameworks that can Network Events be used to implement microservice architectures, Topic 1 Topic 2 Topic 3 Inter-service APIs including container technologies such as Docker, NB APIs container orchestrators such as Kubernetes, and SB APIs microservice frameworks. In a microservice archi- Event Distribution System tecture, microservices communicate with each other SB Protocols via standard APIs such as gRPC or a REST API. In addition, one can use frameworks such as Apache Switch Kafka, RabbitMQ, and ActiveMQ to implement an Switch event distribution system that follows the publish- Switch Switch subscribe model; frameworks, such as gRPC, can be used to implement an event distribution system that follows the point-to-point model. Studying and Fig. 2. The architecture of the event distribution system using comparing of all potential implementation technolo- publish-subscribe paradigm gies falls outside the scope of this paper. How- ever, to investigate the feasibility of the proposed disaggregated control plane architecture, we report B. An Event Distribution Mechanism Using A some preliminary results based on Kafka [9] and Point-to-Point Model gRPC [10], two well-known frameworks. In both As Figure 3 illustrates, an event distribution sys- implementations, we use protocol buffers [11] that tem that follows a point-to-point model uses a point- are a language-neutral, platform-neutral extensible to-point channel to send event notifications to each mechanism for serializing structured data to encode of the applications and services. The event distribu- network events. For the purpose of experimental tion system employs a push-notification mechanism results, we use a gRPC API to return the packet that provides a copy of each incoming events to to the controller core, which will send the packet to each external services or app that has subscribed a switch. Furthermore, If an application or service to the type of the event. In essence, the the event needs to install flow rules on network devices it has distribution system streams events to each of the a choice of using a REST API or gRPC. subscribers. A. A Comparison Of Kafka and gRPC Event Dis- tribution Mechanisms App Service App The advantages and disadvantages of the two event distribution mechanisms can be summarized: • Programming Language Agnosticism: Each of Network Events the proposed event distribution mechanisms Inter-service APIs provide a facility that can be used to implement NB APIs services and applications in arbitrary program- Event Distribution System SB APIs ming languages. From the client side perspec- SB Protocols tive, however, gRPC makes it easier than Kafka to expand a distribution system to include apps Switch written in new programming languages. To use Switch Switch gRPC with a new language, a programmer only Switch needs to compile the set of protobuf messages used for communication between the event distribution system and each microservice. The Fig. 3. The architecture of the event distribution system using point- gRPC technology automatically generates the to-point paradigm gRPC stubs that external SDN applications and
6 services need. In contrast, to use Kafka with the time it takes to process the same packet even a new programming language, a programmer inside the monolithic version of ONOS, and to un- must implement a set of high level abstractions derstand the effect on overall response time. In the that applications use to receive Kafka events. internal ONOS packet processing, whenever a host • Loose Coupling: Kafka, and the publish- sends a ping request for which no forwarding rules subscribe paradigm in general, provides a more have been established, each switch along the path loosely coupled communication system than sends the incoming packet to the controller core, gRPC and point-to-point messaging technolo- which processes the packet internally and returns gies. In Kafka’s publish-subscribe paradigm, the packet to the switch to be forwarded. In the producers and consumers do not need to know disaggregated architecture, whenever a host sends the existence of other producers and con- a ping request for which no forwarding rules have sumers. In gRPC, however, a client needs to been established, each switch along the path sends know about the services that the event distribu- the incoming packet to the controller core, and the tion system provides. Using a loosely coupled event distribution system in the core (using either event distribution system offers the advantages Kafka or gRPC) distributes the incoming packet of scalability, resilience, and maintainability. to the set of external apps and services that have • Fault Isolation and Code Base Disaggrega- subscribed to packet events. The external process tion: Disaggregating the control plane services then uses gRPC to return the packet to the switch for and using an event distribution mechanism forwarding. To focus measurements on the control increases fault isolation and allows code base plane overhead, we did not install flow rules. Thus, disaggregation regardless of the language used, each packet causes a packet event at each switch by keeping components independent of one along the path. We ran the experiment 500 times and another. measured the ping response time between two end hosts that are 5 hops apart in our SDN testbed. Ex- VI. E XPERIMENTAL R ESULTS ternalization of packet processing introduces over- head that increases the overall response time. As a This section presents an evaluation of the Kafka baseline, we measured the average response time for and gRPC distribution mechanisms using various internal processing in ONOS as 24 ms. The average performance metrics, primarily response time and response time for a gRPC system is 29 ms, and throughput. the average time for a Kafka system is 35 ms. We observe that using gRPC introduces less overhead A. Experimental Setup than the Kafka event distribution mechanism. We implemented an early of version of the pro- 2) Throughput: To assess the impact of exter- posed event distribution systems as an application nalized packet processing and the use of a REST for the ONOS [3] SDN controller. To measure API for flow rule installation on throughput, we the two distribution mechanisms, we used an SDN compare external reactive forwarding applications testbed that consists of 10 OpenFlow switches that that use gRPC and Kafka with the same reactive logically define 5 interconnected sites. We use the forwarding application compiled into ONOS. In all virtualized mode feature on the network switches three cases, we use a hard time-out of 10 seconds to divide each physical switch into 10 independent to remove all flow rules (i.e.,the flow rules installed smaller switches. Each emulated site implements a in a switch disappear every 10 seconds. Following Fat-tree network topology. removal of the flow rules, the next packet to arrive will causes a packet event, and the subsequent re- installation of the rules). We use the iperf3 tool B. Experimental Scenarios to generate TCP traffic, varying the the number of 1) Response Time: This experiment measures concurrent TCP connections, and running each mea- response time, and provides an evaluation of the surement for 150 seconds. As the results in Figure overhead of external event processing. Our goal is 4 show, the effect of externalized packet processing to compare the amount of time that an external app on throughput is negligible. Moreover, the overhead or service needs to process a packet event with of using a REST API to install flow rules is larger
7 1000 omy,” IEEE Communications Surveys Tutorials, vol. 18, no. 4, External Reactive Forwarding Using Kafka External Reactive Forwarding Using gRPC pp. 2687–2712, 2016. ONOS Reactive Forwarding [2] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, 980 L. Peterson, J. Rexford, S. Shenker, and J. Turner, “Openflow: Throughput (megabit per second) Enabling innovation in campus networks,” SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 69–74, Mar. 2008. [Online]. 960 Available: http://doi.acm.org/10.1145/1355734.1355746 [3] P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O’Connor, P. Radoslavov, W. Snow, 940 and G. Parulkar, “Onos: Towards an open, distributed sdn os,” in Proceedings of the Third Workshop on Hot Topics 920 in Software Defined Networking, ser. HotSDN ’14. New York, NY, USA: ACM, 2014, pp. 1–6. [Online]. Available: http://doi.acm.org/10.1145/2620728.2620744 900 [4] “OpenDayLight,” https://www.opendaylight.org/, Accessed on 2019. 5 15 25 35 45 Number of TCP Connections [5] D. Comer, R. H. Karandikar, and A. Rastegarnia, “Umbrella: A unified software defined development framework,” in Fig. 4. Throughput vs. number of TCP connections for external and Proceedings of the 2018 Symposium on Architectures for internal ONOS reactive forwarding applications Networking and Communications Systems, ser. ANCS ’18. New York, NY, USA: ACM, 2018, pp. 148–150. [Online]. Available: http://doi.acm.org/10.1145/3230718.3233546 than overhead introduced by externalization. [6] “OFtee: An OpenFlow Proxy,” https://github.com/ciena/oftee/, Accessed on 2019. [7] D. Comer and A. Rastegarnia, “Externalization of packet VII. C ONCLUSION processing in software defined networking,” 2019. [Online]. Available: https://arxiv.org/abs/1901.02585 A monolithic architecture for an SDN controller [8] “kubernetes:Production-Grade Container Orchestration,” https: aggregates all control plane subsystems into a //www.kubernetes.io, Accessed on 2019. single, gigantic program. An SDN controller that [9] “Apache Kafka:A Distributed Streaming Platform,” https:// kafka.apache.org/, Accessed on 2019. adopts the monolithic approach restricts program- [10] “gRPC: A high performance, open-source universal RPC frame- mers who write management applications to the pro- work,” https://www.grpc.io/, Accessed on 2019. gramming interfaces and services that the controller [11] “Protocol Buffers,” https://developers.google.com/ protocol-buffers/, Accessed on 2019. provides. To overcome the limitations inherent in a the monolithic architecture, we propose a distributed Douglas Comer is an internationally recog- architecture that disaggregates controller software nized expert on computer networking and the into a small controller core and a set of cooperative TCP/IP protocols. He has been working with microservices. Disaggregation allows a programmer TCP/IP and the Internet since the late 1970s. Comer established his reputation as a principal to choose a programming language that is appro- investigator on several early Internet research priate for each microservice. To migrate from a projects. He served as chairman of the CSNET monolithic approach to a disaggregated microser- technical committee, chairman of the DARPA Distributed Systems Architecture Board, and vice architecture, a mechanism must be devised to was a member of the Internet Activities Board (the group of re- distribute events to external processes. In this paper, searchers who built the Internet). Professor Comer is well-known for we evaluate two candidate distribution mechanisms: his series of ground breaking textbooks on computer networks, the Internet, computer operating systems, and computer architecture. His Kafka and gRPC. Our experimental results show books have been translated into sixteen languages, and are widely that externalizing event processing introduces some used in both industry and academia. overhead, and the overhead resulting from gRPC is lower than the overhead resulting from Kafka. Adib Rastegarnia is a PhD candidate at Com- puter Science Department of Purdue Univer- Externalizing event processing has a negligible ef- sity under the supervision of Prof. Douglas fect on throughput, and the cost is considered small Comer. He earned his Masters degree in Com- when compared with the advantages of portability, puter Science from Purdue University in 2018. His current research interests span the areas of flexibility, and support for multilanguage manage- computer networks, operating systems, SDN, ment applications. and Internet of Things (IoT) with a focus on designing and implementing working proto- types of large, complex software systems and conducting real world R EFERENCES measurements. He is also a member of Systems Research Group of [1] C. Trois, M. D. D. Fabro, L. C. E. de Bona, and M. Martinello, Computer Science Department at Purdue University. “A survey on sdn programming languages: Toward a taxon-
You can also read