The time for serverless is now! Serverless Architecture Whitepaper Up in the Cloud: Step by step towards serverless applications, platforms and a cloud-native ecosystem @ServerlessCon # SLA_con www.serverless-architecture.io
Content

Serverless Development
  First things first
    Your first step towards serverless application development
  Quarkus: Modernizing Java to keep pace in a cloud-native world
    Scaling the modern app world

Serverless Architecture & Design
  Why platform as a service is such a great model
    Looking into the future of PaaS
  The time for serverless is now – tips for getting started
    If not now, when?
  Building a data platform on Google Cloud Platform
    Laying the groundwork for big data
  Migrating big data workloads to Azure HDInsight – Smoothing the path to the cloud with a plan
    Strategies for big data migration

Serverless Engineering & Operations
  Cloud-Native DevOps
    The driving force behind the digital transformation of modern enterprises
  Serverless Security
    Basic considerations on the subject of serverless architecture security
WHITEPAPER Serverless Development

Your first step towards serverless application development

First things first

In this article, Kamesh Sampath shows us how to master the first steps on the journey towards a serverless application. He shows how to set up the right environment and takes us through its deployment.

by Kamesh Sampath

In the first part of this article, we will set up a development environment that is suitable for Knative version 0.6.0. The second part deals with the deployment of your first serverless microservice. The basic requirement for using Knative to create serverless applications is a solid knowledge of Kubernetes. If you are still inexperienced, you should complete the official basic Kubernetes tutorial [1].

Before we get down to business, a few tools and utilities have to be installed:

• Minikube [2]
• kubectl [3]
• kubens [4]

For Windows users, WSL [5] has proven to be quite useful, so I recommend installing that as well.

Setting up Minikube

Minikube is a single-node Kubernetes cluster that is ideal for everyday development with Kubernetes. After the setup, the following steps must be performed to make Minikube ready for deployment with Knative Serving. Listing 1 shows the commands.

Listing 1

    minikube profile knative

    minikube start -p knative --memory=8192 --cpus=6 \
      --kubernetes-version=v1.12.0 \
      --disk-size=50g \
      --extra-config=apiserver.enable-admission-plugins="LimitRanger,NamespaceExists,NamespaceLifecycle,ResourceQuota,ServiceAccount,DefaultStorageClass,MutatingAdmissionWebhook"

The first command creates a Minikube profile. The second command sets up a Minikube instance with 8 GB RAM, 6 CPUs and 50 GB hard disk space. The start command also contains a few additional settings for the Kubernetes cluster that are necessary to get Knative up and running. The Kubernetes version used must not be older than 1.12.0, otherwise Knative will not work. If Minikube doesn't start immediately, that is completely normal; the initial startup can take a few minutes, so a little patience is needed.

Setting up an Istio Ingress Gateway

Knative requires an Ingress Gateway to route requests to Knative Services. In addition to Istio [6], Gloo [7] is also supported as an Ingress Gateway. For our example, we will use Istio. The following command performs a lightweight installation of Istio that contains only the Ingress Gateway:

    curl -L https://raw.githubusercontent.com/knative/serving/release-0.6/third_party/istio-1.1.3/istio-lean.yaml \
      | sed 's/LoadBalancer/NodePort/' \
      | kubectl apply --filename -

Like the setup of Minikube, the deployment of the Istio pods takes a few minutes. With the command kubectl --namespace istio-system get pods --watch you can follow the status; the watch is ended with Ctrl + C. Whether the deployment was successful can be easily determined with the command kubectl --namespace istio-system get pods. If everything went well, the output should look like Listing 2.

Listing 2

    NAME                                     READY   STATUS    RESTARTS   AGE
    cluster-local-gateway-7989595989-9ng8l   1/1     Running   0          2m14s
    istio-ingressgateway-6877d77579-fw97q    2/2     Running   0          2m14s
    istio-pilot-5499866859-vtkb8             1/1     Running   0          2m14s

Installing Knative Serving

The installation of Knative Serving [8] allows us to run serverless workloads on Kubernetes. It also provides automatic scaling and tracking of revisions. You can install Knative Serving with the following commands:

    kubectl apply --selector knative.dev/crd-install=true \
      --filename https://github.com/knative/serving/releases/download/v0.6.0/serving.yaml

    kubectl apply --filename https://github.com/knative/serving/releases/download/v0.6.0/serving.yaml \
      --selector networking.knative.dev/certificate-provider!=cert-manager

Again, it will probably take a few minutes until the Knative pods are deployed; with the command kubectl --namespace knative-serving get pods --watch you can check the status. As before, the watch can be ended with Ctrl + C. With the command kubectl --namespace knative-serving get pods you can check whether everything is running. If this is the case, an output like Listing 3 should be displayed.

Listing 3

    NAME                               READY   STATUS    RESTARTS   AGE
    activator-54f7c49d5f-trr82         1/1     Running   0          27m
    autoscaler-5bcd65c848-2cpv8        1/1     Running   0          27m
    controller-c795f6fb-r7bmz          1/1     Running   0          27m
    networking-istio-888848b88-bkxqr   1/1     Running   0          27m
    webhook-796c5dd94f-phkxw           1/1     Running   0          27m

Deploy demo application

The application we want to create for demonstration is a simple greeting machine that outputs "Hi". For this we use an existing Linux container image, which can be found on the Quay website [9].

The first step is to create a traditional Kubernetes deployment that can then be modified to use serverless functionality. This will make clear where the actual differences lie and how to make existing deployments serverless using Knative.

Create a Kubernetes resource file

To create a Kubernetes resource file, first create a new file called app.yaml and copy the code in Listing 4 into it.

Listing 4

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: greeter
    spec:
      selector:
        matchLabels:
          app: greeter
      template:
        metadata:
          labels:
            app: greeter
        spec:
          containers:
          - name: greeter
            image: quay.io/rhdevelopers/knative-tutorial-greeter:quarkus
            resources:
              limits:
                memory: "32Mi"
                cpu: "100m"
            ports:
            - containerPort: 8080
            livenessProbe:
              httpGet:
                path: /healthz
                port: 8080
            readinessProbe:
              httpGet:
                path: /healthz
                port: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: greeter-svc
    spec:
      selector:
        app: greeter
      type: NodePort
      ports:
      - port: 8080
        targetPort: 8080

Create the deployment and service

By applying the previously created YAML file, we can create the deployment and service. This is done using the kubectl apply --filename app.yaml command. At this point, too, the command kubectl get pods --watch can be used to follow the status of the application, and Ctrl + C ends the watch. If all went well, we should now have a deployment called greeter and a service called greeter-svc (Listing 5).

Listing 5

    $ kubectl get deployments
    NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    greeter   1         1         1            1           16s

    $ kubectl get svc
    NAME          TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    greeter-svc   NodePort   10.110.164.179                 8080:31633/TCP   50s

To access the service, you can use a Minikube shortcut such as minikube service greeter-svc, which opens the service URL in your browser. If you prefer to use curl to open the same URL, use the command curl $(minikube service greeter-svc --url). You should now see a text that looks something like this:

    Hi greeter => '9861675f8845' : 1

Migrating the traditional Kubernetes deployment to serverless with Knative

The migration starts by simply copying the app.yaml file, naming the copy serverless-app.yaml and updating it to the lines shown in Listing 6.

Listing 6

    ---
    apiVersion: serving.knative.dev/v1alpha1
    kind: Service
    metadata:
      name: greeter
    spec:
      template:
        metadata:
          labels:
            app: greeter
        spec:
          containers:
          - image: quay.io/rhdevelopers/knative-tutorial-greeter:quarkus
            resources:
              limits:
                memory: "32Mi"
                cpu: "100m"
            ports:
            - containerPort: 8080
            livenessProbe:
              httpGet:
                path: /healthz
            readinessProbe:
              httpGet:
                path: /healthz

If we compare the traditional Kubernetes application (app.yaml) with the serverless application (serverless-app.yaml), we find three things. Firstly, no additional Kubernetes service is needed, as Knative will automatically create and route the service. Secondly, since the service is no longer defined by hand, there is no need for selectors anymore, so the following lines are omitted:

    selector:
      matchLabels:
        app: greeter

Lastly, under template | spec | containers, name: is omitted because the name is automatically generated by Knative. In addition, no ports need to be defined for the liveness and readiness probes.

Deploying the serverless app

The deployment follows the same pattern as before, using the command kubectl apply --filename serverless-app.yaml. After the successful deployment of the serverless application, the following objects should exist: a new deployment should have been added (Listing 7). A few new services should also be available (Listing 8), including the ExternalName service, which points to istio-ingressgateway.istio-system.svc.cluster.local. There should also be a Knative service available with a URL to which requests can be sent (Listing 9).

Listing 7

    $ kubectl get deployments
    NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    greeter                    1         1         1            1           30m
    greeter-bn8cm-deployment   1         1         1            1           59s

Listing 8

    $ kubectl get services
    NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                                           PORT(S)    AGE
    greeter                 ExternalName                    istio-ingressgateway.istio-system.svc.cluster.local              114s
    greeter-bn8cm           ClusterIP      10.110.208.72                                                          80/TCP     2m21s
    greeter-bn8cm-metrics   ClusterIP      10.100.237.125                                                         9090/TCP   2m21s
    greeter-bn8cm-priv      ClusterIP      10.107.104.53                                                          80/TCP     2m21s

Listing 9

    kubectl get services.serving.knative.dev
    NAME      URL                                  LATESTCREATED   LATESTREADY     READY   REASON
    greeter   http://greeter.default.example.com   greeter-bn8cm   greeter-bn8cm   True

Attention
In a Minikube deployment we will have neither a LoadBalancer nor DNS to resolve anything to *.example.com or a service URL like http://greeter.default.example.com. To call a service, the Host header must be used with http/curl.

To be able to call a service, the request must go through the ingress or gateway (in our case Istio). To find out the address of the Istio gateway we have to use in the http/curl call, the following command can be used:

    IP_ADDRESS="$(minikube ip):$(kubectl get svc istio-ingressgateway --namespace istio-system --output 'jsonpath={.spec.ports[?(@.port==80)].nodePort}')"

The command retrieves the NodePort of the service istio-ingressgateway in the namespace istio-system. Once we have the NodePort of the istio-ingressgateway, we can call the greeter service via $IP_ADDRESS by passing the Host header with http/curl calls:

    curl -H "Host:greeter.default.example.com" $IP_ADDRESS

Now you should get the same answer as with the traditional Kubernetes deployment (Hi greeter => '9861675f8845' : 1). If the deployment stays idle for about 90 seconds, it is terminated. At the next call, the scaled-down deployment is reactivated and the request is answered.

Congratulations, you have successfully deployed and called your first serverless application!

Kamesh is a Principal Software Engineer at Red Hat. As part of his additional role as Director of Developer Experience at Red Hat, he actively educates on Kubernetes/OpenShift, Service Mesh, and serverless technologies. In a career spanning close to two decades, Kamesh has spent most of his time in the services industry, helping various enterprise customers build Java-based solutions. Kamesh has been a contributor to open source projects for more than a decade and now actively contributes to projects like Knative, Quarkus and Eclipse Che. As part of his developer philosophy he strongly believes in: "Learn more, do more and share more!"

Session: From Monolith to Serverless: Rethinking your Architecture
Michael Dowden
It's easy to understand the benefits of serverless but it's not always easy to understand how this will impact our software architecture. In this talk we will deconstruct a set of requirements and walk through the architecture of both a traditional service-oriented architecture and a modern serverless architecture. You'll leave with a better understanding of how to design event-driven systems and serverless APIs, along with some alternatives to the traditional RESTful API layer.

Session: Serverless Development Lifecycle: How to Design, Structure and Maintain Serverless Projects
Jorge Vargas
We've been working with AWS Lambda for more than 2 years in production; we started with a small and iterative process, which helped us to quickly adapt and iterate our projects, but we also made a lot of mistakes along the way. We didn't know how to automate our serverless deployments, how to manage serverless tasks or how to structure our projects. This led to manual work, maintainability and orchestration issues. Luckily, after 2 years we've learned a lot about what to do and what not to do when it comes to serverless systems on AWS. We would like to show our initial mistakes, how we overcame them and what has worked for our team. We will also talk about how to be as lean as possible while maintaining flexibility and robustness in a serverless system. We will show how to use, configure and extend a serverless framework and how to combine it with Terraform. Additionally, we will show how to easily structure, develop, build and deploy AWS components, Lambda functions, and lambda@edge functions.

Links & Literature
[1] https://kubernetes.io/docs/tutorials/kubernetes-basics/
[2] https://kubernetes.io/docs/tasks/tools/install-minikube/
[3] https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-linux
[4] https://github.com/ahmetb/kubectx/blob/master/kubens/
[5] https://docs.microsoft.com/en-us/windows/wsl/install-win10
[6] https://istio.io
[7] https://gloo.solo.io
[8] https://knative.dev/docs/serving/
[9] https://quay.io/rhdevelopers/knative-tutorial-greeter
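The differences between app.yaml and serverless-app.yaml described in the article above (no separate Kubernetes service, no selector, no container name, no port on the probes) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name to_knative_service and the idea of handling the manifest as a Python dictionary are made up here and are not part of the original tutorial, where the file is simply edited by hand.

```python
import copy

# The Deployment from app.yaml, written as a Python dict for illustration.
DEPLOYMENT = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "greeter"},
    "spec": {
        "selector": {"matchLabels": {"app": "greeter"}},
        "template": {
            "metadata": {"labels": {"app": "greeter"}},
            "spec": {
                "containers": [{
                    "name": "greeter",
                    "image": "quay.io/rhdevelopers/knative-tutorial-greeter:quarkus",
                    "resources": {"limits": {"memory": "32Mi", "cpu": "100m"}},
                    "ports": [{"containerPort": 8080}],
                    "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
                    "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
                }]
            },
        },
    },
}


def to_knative_service(deployment: dict) -> dict:
    """Hypothetical helper: derive a Knative Service from a Deployment dict."""
    # Only the pod template is carried over, so the selector is dropped.
    template = copy.deepcopy(deployment["spec"]["template"])
    for container in template["spec"]["containers"]:
        # Knative generates the container name itself.
        container.pop("name", None)
        # The probes keep their path but no longer declare a port.
        for probe in ("livenessProbe", "readinessProbe"):
            http_get = container.get(probe, {}).get("httpGet")
            if http_get is not None:
                http_get.pop("port", None)
    return {
        "apiVersion": "serving.knative.dev/v1alpha1",
        "kind": "Service",
        "metadata": {"name": deployment["metadata"]["name"]},
        "spec": {"template": template},
    }


service = to_knative_service(DEPLOYMENT)
```

The resulting dictionary has the same shape as Listing 6. The separate greeter-svc service from app.yaml has no counterpart, since Knative creates and routes the service on its own.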
Scaling the modern app world

Quarkus: Modernizing Java to keep pace in a cloud-native world

Java is no spring chicken and some are even referring to it as a "vintage language". Despite its popularity, there are some complaints about it. In our new cloud-native world, why does Java need to evolve? In order to evolve to keep up with modern, cloud-native apps, Java needs to keep all of what makes it so dependable, while also being able to function in new app environments.

by Rich Sharples

Don't worry, you are not the only one who feels old when you hear Java being described as a "vintage" programming language. While Java has been around since 1995, it is certainly not ready to retire (or rather, be retired), and continues to rank among the top languages on the TIOBE index [1]. In fact, no other language has been so popular for so long.

However, it is not without its issues: it is sometimes too clunky to keep up with some of the newer programming languages, not agile and flexible enough to work in this new world of containers, and not really relevant in applications that are not coded to be Java first. While they say you can't teach an old dog new tricks, you can rethink how it performs the tricks it already knows.

This piece will discuss what the community can do to help the language keep up with modern application development trends, to ensure that it continues to have a place in the new cloud-native programming world.

Why has Java stood the test of time?

It has been said that Java is having a "Renaissance Moment" where the programming language keeps evolving. In fact, there is so much demand for new innovations that release cycles have been shortened to every six months, and Java 13 was just recently announced at this year's Oracle OpenWorld. In addition to never dipping below number two on the TIOBE index, SlashData [2] has predicted that there will be 7.6 million Java developers by the end of 2019.

Java has many advantages, including being designed for ease of use, and it is often said that it is easier to write, compile and debug in Java than in any other programming language. This, coupled with the fact that it ranks among the top programming languages used by companies in the Fortune 25, means that it continues to remain relevant, even as shiny new programming languages like Rust, Elixir, and Swift come on to the scene.

Why does Java need to evolve?

The disconnect between modern application development and Java is that the apps built on newer programming languages tend to be more lightweight, agile and flexible, often running in containers, which Java has traditionally not been well-equipped for.

Common complaints include:

• Java is too fat, often starting with libraries that are not used. This does not bode well for microservices architectures but does work when the Java application is being used to solve a more complex problem.
• It still follows the "write once, run anywhere" principle, meaning that any device that has a Java Virtual Machine (JVM) should be able to run a Java app successfully, that is, without the app being altered. While this is generally a good feature, it is not as important when targeting containers.
• Java has a longer start-up time when compared with newer apps, which goes back to it being really good at having everything it needs to solve complex problems, but leaves something to be desired in terms of simpler processes.
• Having too many libraries, and therefore having a large package size, slows down the start-up time and makes the Java app less agile.
• Some also say that Java is too verbose and that more modern languages can do the same thing with less code.
• Java is a very dynamic language, which is part of what makes it so productive and agile but can also cause some frameworks to abuse the dynamic capabilities, resulting in longer startup time and large memory overhead.
• It is not always the best equipped language to handle event-driven architectures where concurrency and throughput are more important. Java's plan to address this is through Fibers.

In order to evolve to keep up with modern, cloud-native apps, Java needs to keep all of what makes it so dependable, while also being able to function in new app environments. Part of Java's renaissance moment is that developers are beginning to realize that, and are doing what they can to modernize Java while not straying too far from the tried and true benefits of the language. This can allow the millions of current Java developers to expand the work they can do without having to learn an entirely new language and shift how they work.

Java in the modern application development world

When I say modern application development, I am referring to environments like Kubernetes and Serverless, both of which rely on containers for deploying code into production, and with which Java was, up until very recently, incompatible.

Long-time Java leaders like Red Hat are aiming to make it a key player in these environments, through initiatives like Quarkus, which is a Kubernetes-native Java framework tailored for GraalVM and OpenJDK HotSpot. By offering developers the ability to use Java in a unified reactive and imperative programming model, Quarkus aims to enable developers to work within Kubernetes and serverless environments without having to learn a new paradigm. It can deliver new runtime efficiencies to try to tackle some of what currently makes Java stuck in the past, including faster startup time, lower memory utilization and a smaller application and container image footprint.

Through frameworks like Quarkus, I believe Java will be better equipped to scale in the modern application development landscape and continue to not only evolve but also innovate. Because that is what is key here: creating a path to the future for cloud-native Java and, in doing so, keeping Java at the center of enterprise innovation.

Rich Sharples is the Senior Director of Product Management in the Application Platforms Business Group at Red Hat. He has spent the last twenty years evangelizing, using, and designing Enterprise Middleware. He previously worked for Forte Software and Sun Microsystems and as an independent software developer and consultant building large distributed software systems for the space, transport, and energy sectors. He also serves on the Node.js Foundation Board of Directors. In his spare time he enjoys running, cycling, and anything that gets him outdoors.

Session: The Real Cost of Pay-Per-Use in Serverless
Ran Ribenzaft
Pay-per-use is one of the major drivers of serverless adoption. Small startups love it because their monthly bill is almost zero. Large organizations are attracted to improving their IT spending when the old servers have very low utilization and are mostly idle. While it sounds promising, it also produces a massive challenge: paying per use means that you don't know how much you are going to pay. Why? Because most of us don't know exactly how much we are going to use. In addition, new and unique challenges arise: a bug in the code can suddenly lead to a very high cloud bill, as can an external API that has a very slow response, causing us to pay for this additional time.

Links & Literature
[1] https://www.tiobe.com/tiobe-index/
[2] https://slashdata-website-cms.s3.amazonaws.com/sample_reports/ZAamt00SbUZKwB9j.pdf
WHITEPAPER Serverless Architecture & Design

Looking into the future of PaaS

Why platform as a service is such a great model

By now, we have all become used to Software as a Service (SaaS), but for many, the idea of platform as a service is still fairly new. PaaS offers enterprise-ready infrastructure, systems, and tools and is the logical next step of SaaS. What can we expect from the future of platform as a service? For one, the requirements to use PaaS have only gotten cheaper and easier.

by MicroStartups

Platform as a service (PaaS) involves providing an instantly-deployable set of cloud computing services that can scale to meet the user's needs and is typically priced by resource use. Though such services have been around for some time, their everyday practicality has steadily increased over the years as systems have been updated and the fundamental requirement to use them (fast and stable internet access) has become cheaper and easier.

There are currently two suites dominating the cloud computing field, AWS (Amazon Web Services) and Microsoft Azure [1], and they're changing how the business world approaches digital activity in general. It's clear that there's a huge long-term market for this type of system: Microsoft and Amazon, two giants of the business world, wouldn't have made such commitments to theirs otherwise.

So, what is it about the platform as a service (PaaS) model that makes it so compelling? How is it changing how business is done, and what might the future hold? Let's discuss.

What PaaS involves

What we've seen over the last decade is the steady online migration of local tasks, ultimately constituting the biggest portent of the still-nascent PaaS industry: SaaS (software as a service). We first saw the advent of internet-supported suites such as Adobe's Creative Cloud, and everyone got used to that (by now, it's fair to say that Creative Cloud has some serious clout [2]). Then came the advent of software solutions accessible only through online portals, with Salesforce being one of the earliest success stories.

People jumped on SaaS because it skirts the need to invest heavily in expensive hardware such as server equipment, as well as the awkward setup phase of getting software up and running across different devices. Provided you can reach a business-level internet connection, you can use SaaS. The pricing keeps going down alongside bandwidth costs, and since the location doesn't matter, you can work from anywhere without needing an office.

PaaS takes this concept to its logical zenith. Instead of simply offering software that can be accessed through the internet, it offers enterprise-ready infrastructure plus all the systems you need to put it to good use: tools for development, design, testing, CRM, CRO, and essentially anything on the market the provider has created (or licenced).

How PaaS brings simplicity and efficiency to business

No matter what you want to achieve as a business with a keen eye on the digital world, you can drive it through a PaaS solution with no need to plan ahead, go through cumbersome setup procedures, or painstakingly pick out integration-capable tools. The tools and options offered through a PaaS system are guaranteed to work together.

The advantages don't stop there. Think about how costly it can be to acquire licences for high-level software suites [3], particularly if you're a small business. Because PaaS companies work in bulk, tying tools together and making them available to many clients, it typically works out as significantly cheaper to use a PaaS service. What's more, since PaaS pricing is determined by resource use, you don't lose out if you don't use it for a while.

There are some reasons why people dislike PaaS services, of course: they're wary about cloud-based services because they want full control over their data, or they don't have sufficiently-stable internet connections, or they fear getting invested in a system only to see the pricing increase exponentially and leave them in a tough spot (replatforming is tough enough in the SaaS-driven ecommerce world [4], so imagine attempting the same thing for an industry-spanning enterprise company).

On the whole, though, there's every reason to be positive: if you can get good internet access, the other fears are unfounded. The implementation of GDPR has changed how data security is viewed, and as long as viable PaaS alternatives keep appearing, we'll avoid a duopoly.

The role of Azure in Microsoft's business model

Where Amazon derives all of its cloud computing revenue from AWS, Microsoft has Office 365 and Dynamics alongside Azure, and together they recently achieved an annual revenue of $9.6 billion (as announced in the latest quarterly earnings). This is a significant portion of Microsoft's bottom line, and it stands to keep getting bigger.

This makes a lot of sense when you think about Microsoft's identity today. While it doesn't really get to compete at the high end as far as consumer products go (the Surface line has never threatened Apple's MacBooks and iPads, for example), it maintains a stranglehold on business operating systems that makes it a compelling choice in the PaaS world. The more it can offer online, the more people will be inclined to invest in the Microsoft ecosystem: if you're going to migrate everything to Azure [5], you might as well go all-in on the company.

What can we expect in the future?

For users, we should see the convenience of IT resources continue to rise as the PaaS model becomes further entrenched in everyday business. Just as SaaS seemed novel one minute and near-mandatory the next, expectations will soon have adjusted to the extent that using on-site platforms will seem antiquated and cumbersome.

For PaaS in general, the best thing for the digital landscape will be an influx of competitors to give people realistic alternatives to Azure and Amazon's AWS. Narrowed options lead to stagnation and profiteering, after all. Ten years from now, every business should have a wide range of PaaS options from which it can select an ecosystem that perfectly suits its needs.

MicroStartups is a business community that celebrates inspiring startups, small businesses, and entrepreneurs. Whether you're a solopreneur or a startup making your way in the business world, we're here to help. For the latest news, inspiring stories and actionable advice, follow us on Twitter @getmicrostarted.

Session: Let a thousand flowers bloom? Scaling Serverless SaaS in the enterprise
Holger Reinhardt
How to scale and converge serverless microservice architecture and patterns using engineering blueprints. Using developer evangelism and Inner Source to drive adoption of common patterns across the organisation, create composable applications and accelerate time to market.

Links & Literature
[1] https://www.zdnet.com/article/top-cloud-providers-2019-aws-microsoft-azure-google-cloud-ibm-makes-hybrid-move-salesforce-dominates-saas/
[2] https://www.vice.com/en_us/article/3kgw83/is-adobes-creative-cloud-too-powerful-for-its-own-good
[3] https://www.financialdirector.co.uk/2018/07/31/the-hidden-costs-of-software-licensing-and-how-to-avoid-them/
[4] https://www.shopify.com/enterprise/ecommerce-replatforming-guide
[5] https://jaxenter.com/cloud-big-data-azure-161270.html
If not now, when?
The time for serverless is now: tips for getting started

Just because everyone is talking about something, that doesn't mean it's actually worth your time. Chris Wahl shares his experiences getting to grips with serverless technology, what he learned throughout the process, and whether, ultimately, serverless is something worth considering.

by Chris Wahl

The world of IT operations is rife with all sorts of "hot new things" being lauded by thought leaders and vendors alike. When your job is to reliably and consistently deliver services and applications to engage and delight your users, it's tough to absorb the idea of serverless. Even the name makes it sound like you're going to be tricked into throwing away your servers somehow!

With that said, the ability to deploy code via functions that can be run anywhere with the underlying layers abstracted sounded interesting to my team of engineers based on our collective experiences. In this post, I'll go over some of the initial serverless sticker shock items from the past few years to help you prepare to bring IT operations into the world of serverless to drive higher value and ultimately do less manual work.

Where to start?
There are a number of places you can start. My team's first use case with serverless was tackling a long list of cron jobs and scheduled tasks that were often just soaking up idle CPU time on a job or batch server somewhere. This is often the case with on-premises infrastructure; you will find a little pizza box server or set of virtual machines that spend most of their time just waiting to run code based on dates and times.

In one specific example, a series of data center environments used for demonstration purposes were being reset to their base configurations on a nightly basis. Rather than constructing a container or server somewhere within the data center to do this, all of the baseline functions were migrated over to AWS Lambda functions and triggered based on a CloudWatch Event set to a nightly datetime value. This had two immediate benefits:

1. The team no longer had to maintain the cron servers. This eliminated thankless and time-intensive patching, securing, and maintenance tasks.
2. The total cost of ownership (TCO) was reduced in both CapEx (more resources available for other workloads, which avoids hardware spend) and OpEx (we can focus on other things). Now, it only costs a few pennies a month to run the majority of our cron functions.
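The nightly reset described above can be pictured as a small scheduled function. The following is only a minimal sketch in Python, not the team's actual code: the environment names and the `reset_environment` helper are hypothetical placeholders, and the schedule itself (for example a rule such as `cron(0 3 * * ? *)`) is assumed to be configured outside the function.

```python
from datetime import datetime, timezone

# Hypothetical demo environments; the article does not name them.
DEMO_ENVIRONMENTS = ["demo-east", "demo-west"]


def reset_environment(name):
    """Placeholder for the real reset logic (e.g. API calls into the data center)."""
    return {"environment": name, "status": "reset"}


def handler(event, context):
    """Entry point the scheduler invokes; `event` is the scheduled-event payload."""
    results = [reset_environment(name) for name in DEMO_ENVIRONMENTS]
    return {
        "triggered_by": event.get("source", "unknown"),
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }
```

Because the handler is a plain function, it can be exercised locally with a hand-written event such as `handler({"source": "aws.events"}, None)` before it is ever deployed, which fits the "operationalize first, optimize later" approach.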
Operationalize first, optimize later
However, things weren't just magical and rosy from the first pass. Due to the iterative nature of constructing a function, you'll often want to start by getting your workflow operational and understanding all of the various requirements, permissions, and non-obvious caveats. Later, after having learned more about how serverless operates, you can make more passes across the functional code to streamline your workflow. For example, once our team had the on-premises workflow operating in serverless functions, we then took another pass to optimize, refactor, and slim things down.

In addition, we learned a few things the hard way:

1. Longer timeouts for functions become necessary when calling back to environments that require the construction of a network that lives within a VPC. This is often due to the cold start period where the default of 30 seconds is not quite enough.
2. Python or Go can offer a much better user experience compared to other scripting languages, such as PowerShell, and can help solve the problem of hitting a timeout due to cold starts.
3. Encrypted environment variables are good for storing tokens and keys. However, that's not enough: you'll still want to rotate these objects on a regular basis. Our goal was to leverage a rotation service along with an encrypted vault in the future, but the use of environment variables helped kick-start things in the beginning.

The biggest investment with serverless
As you seek to adopt serverless, you might ask: What was the biggest time investment for the move towards serverless? The answer to that would easily be documentation. The need for design documentation is critical. Because your serverless functions are typically standalone, loosely organized, and triggered based on a variety of different inputs (API gateways, CloudWatch Events, Lex inputs, and so on), it's a good idea to maintain a high-level workflow that programmatically describes the triggers, functions, and outputs. We use a combination of git-backed repositories that contain JSON files along with Confluence pages that have embedded documentation and links to functions.

Today, some of our functions can be called via Slack commands that are triggered via an API Gateway. One, for example, is used to construct new GitHub repositories that have the correct naming standards, contributor file, code of conduct, labels (tags), and license file. An authorized user simply uses the slash command to answer a few questions (without any authority in GitHub) and a new, compliant repository is generated and handed over. This offers the ability to:

1. Pass along workflows that are complicated or security sensitive to authorized users in a ChatOps style.
2. Collaborate using a messaging platform where the team can see what is being done in real time and offer input, troubleshooting, or guidance.

The time for serverless is now
If you're on the fence about diving into serverless technologies, now is the time to make your move. Getting your hands on the workflows presented by serverless options such as AWS Lambda will put you ahead of the pack in the world of IT operations and is actually quite fun to learn, configure, and operate.

Session: Serverless Patterns Made Simple with Real-World Use Cases
Sheen Brisals
Patterns are common in all walks of life, including in serverless computing! Software architectural patterns are often viewed as complex constructs that are beyond the grasp of many engineers. When mixed with cloud computing and serverless, the perception takes it even further. Textbook-style narratives of serverless patterns often fail to connect with engineers. Sharing a true serverless journey with the community brings authenticity and acts as a great learning platform. This is exactly what this talk aims to achieve. The Shopper and Consumer Technology team at LEGO have successfully migrated their legacy eCommerce platform onto a cloud-based serverless solution on AWS. This employed a number of technologies, best practices and of course serverless patterns that helped to accelerate the overall process. In this talk, I will touch upon the experience of the migration journey and focus mainly on the patterns that we came across and implemented. Each of those architectural patterns will be associated with a use case from our journey that everyone can easily relate to. What better way to equip the generation of serverless engineers to be efficient and innovative than sharing the knowledge?

Chris Wahl is Chief Technologist at Cloud Data Management company Rubrik. He has acquired over two decades of IT experience in enterprise infrastructure design, service orchestration, and building policy-based automation. As an independent author of the award-winning Wahl Network blog and host of the Datanauts Podcast, Chris focuses on creating content that revolves around next generation data centers, workflow automation, building operational excellence, and evangelizing solutions that benefit the technology community.
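The article's documentation advice, a high-level workflow that describes triggers, functions, and outputs in JSON files kept in git, could look roughly like the sketch below. The schema, the key names and the function names are assumptions made for illustration; the article does not show the team's real format.

```python
import json

# Hypothetical workflow description for the Slack-driven repository creation.
WORKFLOW_JSON = """
{
  "name": "create-github-repo",
  "trigger": {"type": "api_gateway", "source": "slack_slash_command"},
  "functions": ["validate_request", "create_repository"],
  "outputs": ["github_repository", "slack_reply"]
}
"""

# Keys every workflow file is assumed to need in this sketch.
REQUIRED_KEYS = {"name", "trigger", "functions", "outputs"}


def load_workflow(text):
    """Parse a workflow description and reject files with missing sections."""
    workflow = json.loads(text)
    missing = REQUIRED_KEYS - workflow.keys()
    if missing:
        raise ValueError(f"workflow is missing keys: {sorted(missing)}")
    return workflow


def describe(workflow):
    """Render the trigger -> functions -> outputs chain as one readable line."""
    chain = " -> ".join(workflow["functions"])
    outputs = ", ".join(workflow["outputs"])
    return f'{workflow["name"]}: {workflow["trigger"]["type"]} -> {chain} -> {outputs}'
```

A check like `load_workflow` could run in a repository's CI so that a function never ships without its trigger and outputs being written down, and `describe` gives a one-line summary that can be pasted into a wiki page.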
Laying the groundwork for big data
Building a data platform on Google Cloud Platform

At the moment, big data is very popular and there is a wide variety of products available for handling data. In this article, read a case study about how a German startup tackled their data problems and built a common data platform into their architecture. The data platform consists of four components: ingestion, storage, processing, and provisioning.

by Claire Fautsch

Introduction
Joblift is a Germany-based startup. Its core business is a meta-search engine for job seekers in Germany, the Netherlands, France, the UK and, since 2018, the US. We aggregate jobs from various job boards, agencies and companies. Our aim is to provide job seekers with a single page listing all jobs instead of having to browse multiple websites.

Since its beginnings, Joblift has focused on the so-called arbitrage business, i.e., buying traffic via various marketing channels and forwarding it to its own customers. The profit is made from the difference in the price. As this is a successfully running business right now, the time has come to take the platform to the next level. Our vision is to become the job board of the future. Instead of helping our users search for a job, we want to recommend them their perfect match. The foundation for this new line of business is data.

Challenges
In order to be able to provide the perfectly matching job to a user, we need to gain as much knowledge and insight about them as possible. This means every move of a user on our platform needs to be tracked, analyzed, put into context and modeled into information.

However, especially for new users, we do not have much insight yet. Thus, we want to predict their behaviour based on other users showing a similar user journey. In our case, this could be, for example, another user simply searching for the same job title. Or, in a more complex scenario, we could also consider users attracted via the same marketing campaigns.

To solve this, we want to put data from different users in relation. One possibility to do this is using machine learning algorithms. They can be used to predict possible relations with a certain level of confidence. Together with facts we already know about a user (e.g., "User clicked on Job XY" or "Users having clicked on Job XY also clicked on Job AB"), we can store those predictions in a knowledge graph ("User is likely to click on Job AB").

The foundation of all this is big and fast data. The arising challenge is consequently how to ingest, store, process and provide the accumulated data.

Solution
To tackle this challenge, we decided to build a common data platform into our architecture. The data platform is a set of infrastructure components, tools and processes with the sole aim of collecting data, storing it, and providing it to all possible use cases in an efficient way. As Joblift's existing architecture is deployed on the
Google Cloud Platform (GCP) [1], we decided to stay with GCP for our data platform as well. Google offers a wide range of managed products, and wherever fitting, we stuck to those managed products. For us, the big advantage is that once we decide to get started with a technology, we do not need to invest much time setting up the infrastructure. The drawback is that we might not have all the flexibility we have with self-managed products, and we have a vendor lock-in. However, in our case, velocity clearly wins over flexibility.

In those cases where we decided not to use a managed product, we deployed the respective components directly to our Kubernetes cluster, also running on GCP.

Architecture
We aimed to clearly separate the data platform from the use cases using it. Our data platform consists of four separate components, which we will outline in more detail below, namely:

• Ingestion
• Storage
• Processing
• Provisioning

Ingestion
In a first step, we need to get the data into the platform. We have a wide range of different data sources. They provide either continuous data (e.g., in the form of transactional events) or batched data (e.g., daily job updates from various job boards). Looking at the GCP managed products, the obvious choice for the ingestion component would have been Cloud Pub/Sub. This is GCP's managed solution for message and event ingestion.

However, at the time it was not possible to replay or re-read messages on Pub/Sub, which is a crucial feature for us (that has changed in the meantime). Also, Pub/Sub does not guarantee a certain ordering. For those reasons, and because we already had a fully set-up Kafka cluster on Kubernetes in GCP, we chose Kafka for ingesting streaming data.

By setting the key properly, we have a guaranteed order on the partitions, and because of the retention we are able to replay messages. Additionally, we can have several consumers for the same data, without having to do fan-outs as on Pub/Sub (which increases costs). If at some point we need the additional features Cloud Pub/Sub provides, we can take advantage of the available connectors to connect Kafka to Cloud Pub/Sub via Kafka Connect.

For getting batches of data into our system we mainly use FTP or HTTP.

Session: Serverless Integration Architectures
Samuel Vandecasteele
Serverless is changing the way we architect our cloud environments. It leverages all the advantages of the cloud without the operational overhead. By gradually moving towards more serverless paradigms, big steps can be taken in offering more reliable, scalable and cost-efficient IT services. With FaaS (function as a service), cloud-native messaging and serverless API management, the major building blocks for a new generation of enterprise integration architectures are available, making integration use cases ideal candidates as early adopters for this serverless movement. In this session, we go deeper into how serverless is changing the integration landscape. We'll cover serverless integration architectures. We investigate the ecosystem of available tools and frameworks, cover best practices and answer questions like: "Are the famous enterprise integration patterns still relevant? Will FaaS replace my ESB? How to choose between iPaaS and FaaS? How do different public vendors and ecosystems relate to one another?"

Storage
Once the data is in the system, we want to store it in its raw format. We opted here for the concept of a two-tier storage architecture. On one side, we have a long-term storage, referred to as cold storage. Data on the cold storage has an unlimited retention with slower access times. On the other side, there is the short-term storage, or hot storage, with limited retention but faster access times.

For the hot storage, we opted for Cloud Bigtable, which is GCP's managed NoSQL wide-column database service, using the HBase API. The drawback of Bigtable/HBase, compared to Cassandra, for example, is that there is no possibility for a secondary index. We can live with this drawback. In case we need to have different row keys, we duplicate our data with appropriate row keys. This also guarantees us fast access.

For the cold storage, we simply use Google Cloud Storage (GCS), which is designed for durable storage.

Data processing
For the data processing that happens inside our data platform, we have various approaches.

For streaming data, we mostly rely on Java microservices using Kafka streaming technology as input and output. This gives us a maximum of flexibility, paired with high scalability and low complexity. Using GitLab CI/CD pipelines, we build Docker images and deploy them to our GCP-hosted Kubernetes cluster using Helm charts. Adhering to the microservice principle and using asynchronous communication via Kafka, we can ensure a high level of continuous integration and deployment. New requirements can thus be implemented with an appropriate speed.
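The ordering and replay properties mentioned under Ingestion can be illustrated with a toy model. This is a sketch only and does not use a Kafka client: the in-memory `log`, the partition count and the event names are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keyed records.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(key: str) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def produce(log, key, value):
    """Append the event to the partition chosen by its key; return that partition."""
    partition = partition_for(key)
    log[partition].append((key, value))
    return partition


def replay(log, key):
    """Re-read all retained events for one key, in the order they were written."""
    return [value for k, value in log[partition_for(key)] if k == key]


log = defaultdict(list)
for key, value in [("user-1", "search"), ("user-2", "search"),
                   ("user-1", "click"), ("user-1", "apply")]:
    produce(log, key, value)
```

Because every event with the same key lands in the same partition, `replay(log, "user-1")` returns that user's events in write order, and reading does not remove them, so any number of consumers can read the same data again. Order across different keys is not guaranteed, which is why choosing the key carefully matters.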
For modeling our more advanced ETL data pipelines, we opted for Apache Airflow, or more precisely the GCP-managed version of it, Cloud Composer. It allows us to define and schedule pipelines in the form of a directed acyclic graph (DAG). One of the main advantages of using Cloud Composer is that it neatly integrates with other GCP products, such as GCS. The DAGs defined in Python can simply be uploaded to GCS and are then automatically picked up by Cloud Composer's scheduling component.

Data provisioning
Gathering all that data is well and good, but it is worthless if we are not able to provide our use cases with the right data in an efficient way. Consequently, the aim of our data platform is not only to gather and store data, but also to provide the correct data in the right way.

For providing streaming access to our data, we opted again for Kafka. Combined with Avro and the Kafka schema registry, this allows us to have a clearly defined interface contract including proper versioning.

For batch access, the provisioning depends on the use case and is optimized for the respective use case.

Use cases
Having all this data available and being able to provide it, we now want to use it to fulfill our business needs. In the following sections, we outline several of our use cases.

Analytics
Before even tackling our new vision, one of the first use cases for our data is analytics. To provide our business analysts with an easy way of creating business reports on one side, and ad-hoc analysis on the other side, we chose to build an analytics data warehouse. In the landing zone of our data warehouse, we store relevant raw and pre-aggregated data, which can then be used for reporting and analysis.

We opted for Google BigQuery for our data warehouse. It is fully managed and can be queried using simple SQL. Additionally, with its BI Engine it comes with all that is needed to build reports and dashboards. On top of BigQuery, we use Google Data Studio to create interactive dashboards to support our business and salespeople in their daily decision making.

Knowledge graph
In view of our vision, our goal is to provide the best matching job to every user. The foundation of Joblift is a search engine. Users can search for jobs. The current flow is a user coming to our page and searching either for any job in a given city (location search) or for a specific job (expert search). The executed search is based on the keywords entered in the search mask. Unfortunately, the results provided by such a simple search often lack some relevance. Thus, we want to enhance the results in order to provide the best matching job.

One way to do so is via a knowledge graph. A knowledge graph is a way of representing data in relation. It quickly allows us to extract connections between different users or jobs (e.g., "users that applied for this job also applied to jobs XYZ").

The technical foundation for our knowledge graph is a graph database. When this article was written, GCP did not offer any managed graph database. We thus opted for a self-managed JanusGraph deployed to GCP, as it integrates nicely with Google Bigtable as backend. For the indexes, we use Elasticsearch.

Machine learning
To fill our knowledge graph not only with facts but also with probable relations ("user is likely to click on job XY"), we want to make use of machine learning algorithms. We want to predict a user's journey based on the knowledge we have from other users. Another use for machine learning algorithms is clustering and categorization of our jobs. Machine learning algorithms need data, ideally lots of historic data, and this data is provided via the cold storage previously described.

Session: Modern Architecture in the Cloud of 2020
Marius Zaharia
Today, the large public clouds, Azure and AWS, deploy a diversity of services and features at high speed. Between Azure Functions, Lambda, Event Grid, Simple Workflow Service or Logic Apps, which one should I choose? Should I go for microservices? Event-driven? Lambda architecture? Deploy on serverless? Containers? Modern Compute? Let's put a bit of order in all that. Enter the Modern Architecture, the foundation for the new wave of cloud services and much more. This session will be focused on application and infrastructure architecture, with live examples based on the cloud, perspectives and roadmap of the corresponding services at Amazon and Microsoft.

Takeaways
We have successfully laid the groundwork to operate a big and fast data platform, and while it clearly showed its dependability and the proof of concept was successful, we also had some key learnings.

Big data being very popular at the moment, there is a wide range of open- and closed-source, managed and unmanaged products available out there. Having such an overwhelming number of choices, we learned that sometimes you just have to take a decision on which product to use. After some time of evaluation, we decided that, for the sake of velocity, we would use GCP managed products wherever possible. Those products are optimized for a dedicated purpose and fulfill that purpose very well. Some of them are simply the managed
version of popular open source products; for example, Cloud Composer is the managed version of Apache Airflow. Also, still being a startup, it is quite a relief not to have the additional effort of operating a self-managed component.

Right with that decision came the second learning: managed products often feel like a black box. While for self-deployed services you have all the liberty of configuration and full insights, a managed product usually comes with a limited set of configurable parameters.

For example, with Cloud Composer we had the problem that its logs get fed into Stackdriver (GCP's monitoring solution). For our existing microservices, however, we use the complete ELK stack for storing log files, and Prometheus and Grafana for monitoring and alerting. While Stackdriver integrates with Grafana, it just doesn't run as smoothly as our existing setup. Especially when analyzing issues, the Stackdriver setup proves less flexible for us.

Finally, an additional learning is to fail fast. Documentation is often sparse, and every use case is different. Despite all theoretical knowledge, it can still be that your chosen solution just doesn't work out in the end, for reasons you simply weren't aware of or didn't consider. Thus, we opted for an agile development process, starting with building prototypes quickly to prove feasibility (or the contrary). When something doesn't work out, don't try by all means to stick to the initial idea just because it reads great on paper. Learn from what you did and move on with an improved solution tackling the encountered issues. Even if this sometimes feels like throwing two weeks' work away, it proves to be far more efficient in the end, and the result is a stable and dependable solution.

Conclusion and outlook
The combination of managed and self-deployed services has so far proven to be a good choice for our data platform. Having proven its dependability with the first set of use cases, the next step is implementing further use cases. Additionally, we would like to set up metadata management, to get a full data lineage of our data and gain even more insights. Here we could possibly imagine another one of Google's products coming into play, namely Data Catalog.

Dr. Claire Fautsch is an Engineering Manager at Joblift GmbH where she is leading a team of backend developers while still enjoying getting hands-on coding time as well. Previously she worked at Goodgame Studios as a Java developer and as an IT consultant. Dr. Fautsch obtained her PhD in Computer Science on the topic of Information Retrieval as well as her bachelor's and master's degree in mathematics at the University of Neuchâtel (Switzerland). She enjoys exploring new technologies and never says no to a nice challenge.

Links & Literature
[1] https://cloud.google.com
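The kind of fact the Joblift article wants in its knowledge graph ("Users having clicked on Job XY also clicked on Job AB") and the derived prediction ("User is likely to click on Job AB") can be sketched with plain co-occurrence counting. This is an illustrative stand-in, not the machine learning approach or the JanusGraph model the author used; the function name, the input shape and the confidence threshold are assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_click_predictions(clicks, min_confidence=0.5):
    """clicks maps a user id to the set of job ids that user clicked.

    Returns job -> list of (other_job, confidence), where confidence is the
    share of users who clicked `job` and also clicked `other_job`.
    """
    job_counts = Counter()
    pair_counts = Counter()
    for jobs in clicks.values():
        job_counts.update(jobs)
        # Count every ordered pair of jobs clicked by the same user.
        pair_counts.update(permutations(sorted(jobs), 2))

    predictions = defaultdict(list)
    for (job, other), together in pair_counts.items():
        confidence = together / job_counts[job]
        if confidence >= min_confidence:
            predictions[job].append((other, confidence))
    return predictions
```

With three users who clicked job "XY", two of whom also clicked "AB", the edge XY -> AB gets a confidence of 2/3. In a graph database such a result would be stored as a weighted edge next to the observed click facts, so that a new user who clicks "XY" can be shown "AB" as a likely match.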
Strategies for big data migration
Migrating big data workloads to Azure HDInsight: smoothing the path to the cloud with a plan

Migrating to the cloud isn't the easiest task; however, you can limit its complexity. Smooth out the migration of big data to the cloud with a step-by-step plan. Learn the correct questions to ask yourself before migrating big data workloads to Azure HDInsight in order to ensure a perfect, error-free migration.

by Shivnath Babu

Migrating big data workloads to the cloud remains a key priority as well as a challenge for business leaders. Many are looking to AI and predictive analytics to increase performance and throughput and to reduce application, data, and processing costs as a way out of the complexities of the big data operations landscape.

Planning is key, and there are some sensible questions to ask to ensure the planning phase runs smoothly and sets the project up for success. The organisation must understand its current environment, determine high priority applications to migrate, and set a performance baseline to be able to measure and compare on-premises clusters versus Azure HDInsight clusters.

• What does my current on-premises cluster look like, and how does it perform?
• How much disk, compute, and memory am I using today?
• Which of my workloads are best suited for migration to the cloud?
• What are my HDInsight resource requirements?
• Should I use manual scaling or auto-scaling HDInsight clusters, and with what VM sizes?

Overall, organisations that understand that the true path to the cloud isn't paved with rainbows know the need to reduce the complexity of delivering reliable application performance when migrating data from on-premises or a different cloud platform onto HDInsight. Application Performance Management (APM) solutions have a vital role in bringing a host of services that should provide unified visibility and operational intelligence to plan and optimise the migration process. It's strongly recommended to make use of such solutions in order not to suffer some of the common challenges that crop up time and again.

An APM will automate and optimise some of these major areas to simplify the overall project:

• Identify the current big data landscape and platforms for baselining performance and usage
• Make use of AI and predictive analytics to increase performance and throughput and to reduce the application, data, and processing costs from an elastic cloud environment
• Automatically size cluster nodes and tune configurations for the best throughput for big data workloads
• Find, tier, and optimise storage choices in HDInsight for hot, warm, and cold data

An organisation must understand its current environment, determine high priority applications to migrate, and set a performance baseline to be able to measure and compare its on-premises clusters versus its Azure HDInsight clusters.

In the on-premises environment
• What does my current on-premises cluster look like, and how does it perform?
• How much disk, compute, and memory am I using today?
• Who is using it, and what apps are they running?
• Which of my workloads are best suited for migration to the cloud?
• Which big data services (Spark, Hadoop, Kafka, etc.) are installed?
• Which datasets should I migrate?

Azure HDInsight environment
• What are my HDInsight resource requirements?
• How do my on-premises resource requirements map to HDInsight?
• How much and what type of storage would I need on HDInsight, and how will my storage requirements evolve with time?
• Would I be able to meet my current SLAs or better them once I've migrated to HDInsight?
• Should I use manual scaling or auto-scaling HDInsight clusters, and with what VM sizes?

Baselining on-premises performance and resource usage
To effectively migrate big data pipelines from physical to virtual data centres, one needs to understand the dynamics of on-premises workloads, usage patterns, resource consumption, dependencies and a host of other factors.

It's vital to get detailed reports of on-premises clusters, including total memory, disk, number of hosts, and number of cores used. A cluster discovery report also delivers insights on cluster topology, running services, operating system version and more. Resource usage heatmaps can be used to determine any unique needs for Azure.

It's also key to gain app usage insights from cluster workload analytics and data insights. When the business can highlight application workload seasonality by user, department, application type, etc., it helps calibrate and make the best use of Azure resources. This type of reporting can greatly aid in HDInsight cluster design choices (size, scale, storage, scalability options, etc.) to maximise the ROI on Azure expenses.

Don't neglect searching for the best strategy for storage in the cloud by looking at specific metrics on usage patterns of tables and partitions in the on-premises cluster.

Next, consider identifying unused or 'cold' data. Once identified, one can then decide on the appropriate layout for the data in the cloud accordingly, and make the best use of the Azure budget. Based on this information, one can distribute datasets most effectively across HDInsight storage options.

Session: Event-Driven Serverless Microservices in Azure
Rainer Stropek
With Azure Function Apps, Microsoft has been supporting serverless microservices for quite a while. In this session, Rainer Stropek focuses on the current version of Microsoft's Functions SDK and demonstrates how it integrates with the rest of the Azure platform to design and implement event-driven software. This will be a demo-heavy session (C#, .NET Core) with only a few slides.

Data migration
Migrate on-premises data to Azure
There are two main options to migrate data from on-premises to Azure.

1. Transfer data over the network with TLS
2. Ship data offline