Netflix: Building Up and Scaling Out on Open Source
Presenters

Adrian Cockcroft is the director of architecture for the Cloud Systems team at Netflix. He is focused on availability, resilience, performance, and measurement of the Netflix cloud platform, and has presented at many conferences, including QCon San Francisco, Beijing and Tokyo. Adrian is also well known as the author of several books written while he was a Distinguished Engineer at Sun Microsystems: Sun Performance and Tuning; Resource Management; and Capacity Planning for Web Services. From 2004 to 2007 he was a founding member of eBay Research Labs. He graduated with a BSc in Applied Physics from The City University, London.

Andrew Aitken is the founder and GM of Olliance Consulting, the leading open source business and strategy consultancy and a division of Black Duck. With 15+ years of industry experience, Andrew is a recognized expert on strategies for FOSS commercialization and a leader in the open source community. He is the founder of the industry’s only “think tank” on the future of commercial open source, a bi-annual event held in Napa, CA and Paris, France, and regularly attended by the leading CEOs and visionaries. He has served as an expert witness on the issues of open source and has been an invited guest lecturer at Stanford’s Entrepreneur program. Andrew has chaired and spoken internationally at multiple industry conferences, sits on the Board of Advisors of SugarCRM, DotNetNuke, and Funambol, and has personally worked with companies such as IBM, Microsoft, Intel and the U.S. Navy.

© Black Duck 2013
Olliance Consulting, a division of Black Duck
Open Source Strategy: Our Experience, Your Success
The world’s leading organizations turn to Olliance Consulting to create and implement open source strategies to achieve business success. With more than a decade of experience and hundreds of engagements assisting companies ranging from start-ups to the world’s largest corporations, Olliance creates innovative strategies to leverage the strategic, financial and technological advantages of open source software and methods.
Profile
– Open Source Software Industry’s leading business consultancy
– Over 700 engagements to date
– Trusted Advisor to leading Fortune 2000 companies
Open Source Think Tank
The Open Source Think Tank is an invitation-only conference for 140 CEOs, CIOs, CTOs, legal experts, investors and other senior executives engaged in open source software. It is an annual event held in Napa, CA, and regularly attended by the industry’s leading CEOs and visionaries.
Visit osthinktank.com
Cloud Native Open Source at Netflix June 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft
We are Engineers We solve hard problems We build amazing and complex things We fix things when they break
But perfection takes too long… So we compromise Time to market vs. Quality Utopia remains out of reach
Where time to market wins big Web services Agile infrastructure - cloud Continuous deployment
How Soon? Code features in days instead of months Hardware in minutes instead of weeks Incident response in seconds instead of hours
Tipping the Balance Utopia Dystopia
A new engineering challenge Construct a highly agile and highly available service from ephemeral and often broken components
Inspiration
Netflix Streaming A Cloud Native Application
Netflix Member Web Site Home Page Personalization Driven – How Does It Work?
How Netflix Streaming Works
[Diagram: three areas and their components]
• Consumer Electronics – Customer Device (PC, PS3, TV…)
• AWS Cloud Services – Web Site or Discovery API, User Data, Personalization, Streaming API, DRM, QoS Logging, CDN Management and Steering, Content Encoding
• CDN Edge Locations – OpenConnect CDN Boxes
Content Delivery Service Open Source Hardware Design + FreeBSD, bird, nginx
Nov 2012 Streaming Bandwidth 18x March 2013 Mean Bandwidth +39% 6mo 25x Amazon Video 1.31%
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Each icon is three to a few hundred instances across three AWS zones
[Diagram labels: Start Here; Web service; Cassandra; memcached; S3 bucket; Personalization movie group choosers (for US, Canada and Latam)]
New Anti-Fragile Patterns Micro-services and Chaos engines Highly available systems composed from ephemeral components Open Source is the default
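The idea of a chaos engine acting on a service built from ephemeral components can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the NetflixOSS Chaos Monkey: the `ServiceGroup` class, the `chaos_round` function and the group size of three are all hypothetical names and values chosen for illustration. It shows one instance being terminated at random each round while a reconcile step brings the group back to its desired size.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ServiceGroup:
    """Hypothetical group of interchangeable instances behind one service."""
    name: str
    desired: int
    instances: set = field(default_factory=set)
    _next_id: int = 0

    def launch(self) -> str:
        """Start a fresh instance; ids are never reused."""
        self._next_id += 1
        instance_id = f"{self.name}-{self._next_id}"
        self.instances.add(instance_id)
        return instance_id

    def reconcile(self) -> None:
        """Replace missing instances until the group is back at desired size."""
        while len(self.instances) < self.desired:
            self.launch()


def chaos_round(group: ServiceGroup, rng: random.Random) -> str:
    """Terminate one randomly chosen instance and return its id."""
    victim = rng.choice(sorted(group.instances))
    group.instances.remove(victim)
    return victim


def run_experiment(rounds: int = 5, seed: int = 42) -> ServiceGroup:
    """Alternate random terminations with reconciliation (illustrative only)."""
    rng = random.Random(seed)
    group = ServiceGroup(name="api", desired=3)
    group.reconcile()
    for _ in range(rounds):
        chaos_round(group, rng)
        # The service remains reachable while at least one instance survives.
        assert len(group.instances) >= 1
        group.reconcile()
    return group
```

The point of the sketch is that no individual instance matters: after any number of rounds the group is back at its desired size, with entirely different members.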
Cloud Native Master copies of data are cloud resident Everything is dynamically provisioned All services are ephemeral
How to get to Cloud Native Freedom and Responsibility for Developers Decentralize and Automate Ops Activities Integrate DevOps into the Business Organization
Netflix BusDevOps Organization
Chief Product Officer
• VP Product Management – Directors Product
• VP UI Engineering – Directors Development – Developers + DevOps – UI Data Sources – AWS
• VP Discovery Engineering – Directors Development – Developers + DevOps – Discovery Data Sources – AWS
• VP Platform – Directors Platform – Developers + DevOps – Platform Data Sources – AWS
Layers
• Code, independently updated continuous delivery
• Denormalized, independently updated and scaled data
• Cloud, independently updated and scaled infrastructure
Four Transitions
• Management: Integrated Roles in a Single Organization
  – Business, Development, Operations -> BusDevOps
• Developers: Denormalized Data – NoSQL
  – Decentralized, scalable, available, polyglot
• Responsibility from Ops to Dev: Continuous Delivery
  – Decentralized small daily production updates
• Responsibility from Ops to Dev: Agile Infrastructure - Cloud
  – Hardware in minutes, provisioned directly by developers
Cost reduction -> Slow down developers -> Less competitive -> Less revenue -> Lower margins
Process reduction -> Speed up developers -> More competitive -> More revenue -> Higher margins
What’s Different?
Get out of the way of innovation
Best of breed, provisioned by the hour
Choices based on features and scale
Almost everything is Open Source
Decentralized Deployment
Asgard http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Ephemeral Instances
• Largest services are autoscaled
• Average lifetime of an instance is 36 hours
[Chart labels: Push; Autoscale Up; Autoscale Down]
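A scale-up or scale-down decision of the kind named on this slide can be sketched as a proportional rule. This is an illustrative assumption, not the policy Netflix or AWS Auto Scaling actually applies: the function name `desired_capacity`, its parameters and the per-instance load metric are hypothetical. The rule sizes the group so that the observed load per instance moves back toward a target, clamped to configured bounds.

```python
import math


def desired_capacity(current: int, observed_load: float, target_load: float,
                     minimum: int, maximum: int) -> int:
    """Return the instance count that would bring per-instance load to target.

    current       -- instances running now
    observed_load -- average load per instance (any unit, e.g. requests/s)
    target_load   -- load per instance the group should run at
    minimum/maximum -- hard bounds on group size
    """
    if current <= 0 or target_load <= 0:
        raise ValueError("current and target_load must be positive")
    # Total load is current * observed_load; divide by the per-instance target.
    wanted = math.ceil(current * observed_load / target_load)
    # Never scale outside the configured bounds.
    return max(minimum, min(maximum, wanted))
```

For example, ten instances each seeing 90 units of load against a target of 60 would be scaled up to fifteen, and the same group seeing 20 units each would be scaled down to four.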
Global Deployment
Cross Region Use Cases
• Geographic Isolation
  – US to Europe replication of subscriber data
  – Read intensive, low update rate
  – Production use since late 2011
• Redundancy for regional failover
  – US East to US West replication of everything
  – Includes write intensive data, high update rate
  – Testing now
Managing Multi-Region Availability
[Diagram: Denominator sits in front of three DNS providers (AWS Route53, DynECT DNS, UltraDNS). DNS directs traffic to Regional Load Balancers in each of two regions; each region has Zone A, Zone B and Zone C, and each zone holds Cassandra Replicas.]
Denominator – manage traffic via multiple DNS providers
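Steering traffic away from a failed region through several DNS providers at once can be sketched as follows. This is not Denominator's API (Denominator is a Java library and its interfaces are not shown in these slides); the `DnsProvider` protocol, the `InMemoryProvider` stand-in, the `fail_over` function and the record name used in the usage note are all hypothetical. The sketch only illustrates the idea of applying one routing decision uniformly across providers.

```python
from typing import Dict, List, Protocol


class DnsProvider(Protocol):
    """Hypothetical minimal interface that every provider adapter implements."""

    def set_weights(self, record: str, weights: Dict[str, int]) -> None:
        ...


class InMemoryProvider:
    """Stand-in for a real provider adapter; stores weights in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, Dict[str, int]] = {}

    def set_weights(self, record: str, weights: Dict[str, int]) -> None:
        self.records[record] = dict(weights)


def fail_over(providers: List[DnsProvider], record: str,
              weights: Dict[str, int], failed_region: str) -> Dict[str, int]:
    """Set the failed region's weight to zero on every provider."""
    if failed_region not in weights:
        raise KeyError(f"unknown region: {failed_region}")
    new_weights = {region: (0 if region == failed_region else weight)
                   for region, weight in weights.items()}
    if not any(new_weights.values()):
        raise RuntimeError("no healthy region left to receive traffic")
    # Apply the same decision everywhere so providers never disagree.
    for provider in providers:
        provider.set_weights(record, new_weights)
    return new_weights
```

Calling `fail_over` with two providers, a record such as `api.example.com` and equal weights for `us-east-1` and `us-west-2` leaves both providers sending all traffic to the surviving region.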
Benchmarking Global Cassandra
Write intensive test of cross region capacity
16 x hi1.4xlarge SSD nodes per zone = 96 total
[Diagram: US-West-2 Region (Oregon) and US-East-1 Region (Virginia), each with Zone A, Zone B and Zone C holding Cassandra Replicas]
• Test Load: 1 Million writes, CL.ONE
• Validation Load: 1 Million reads, CL.ONE, with no data loss
• Inter-Zone Traffic
• Inter-Region Traffic: up to 9Gbits/s, 83ms
• 18TB S3
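The node total on this slide follows from the topology in the diagram: 16 nodes per zone, three zones per region, two regions. A short worked check, with function names that are purely illustrative:

```python
NODES_PER_ZONE = 16    # from the slide: 16 x hi1.4xlarge per zone
ZONES_PER_REGION = 3   # Zone A, Zone B, Zone C
REGIONS = 2            # US-West-2 (Oregon) and US-East-1 (Virginia)


def nodes_per_region(nodes_per_zone: int, zones_per_region: int) -> int:
    """Nodes in one region: every zone holds the same number of nodes."""
    return nodes_per_zone * zones_per_region


def total_nodes(nodes_per_zone: int, zones_per_region: int, regions: int) -> int:
    """Nodes across all regions in the benchmark cluster."""
    return nodes_per_region(nodes_per_zone, zones_per_region) * regions
```

With the values above this gives 48 nodes per region and 96 in total, matching the figure on the slide.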
Cloud Native Big Data
Netflix Dataoven
[Diagram: Data Pipelines feeding the Data Warehouse]
• Ursula – from cloud services, ~100 Billion Events/day
• Aegisthus – from C*, Terabytes of Dimension data
• RDS – Metadata
• Gateways
• Tools
• Data Warehouse – Over 2 Petabytes
• Hadoop Clusters – AWS EMR: 1300 nodes, 800 nodes, multiple 150 nodes nightly
A Cloud Native Open Source Platform
Beware of Geeks Bearing Gifts: Strategies for an Increasingly Open Economy Simon Wardley - Researcher at the Leading Edge Forum
How did Netflix get ahead?
Netflix BusDevOps Org
• Doing it since 2009
• SaaS Applications
• PaaS for agility
• Public IaaS for AWS features
• Big data in the cloud
• Integrating many APIs
• FOSS from github
• Renting hardware for 1hr
• Coding in Java/Groovy/Scala
Traditional IT Operations
• Taking their time
• Pilot private cloud projects
• Beta quality installations
• Small scale
• Integrating several vendors
• Paying big $ for software
• Paying big $ for consulting
• Buying hardware for 3yrs
• Hacking at scripts
Netflix Platform Evolution
• 2009-2010: Bleeding Edge Innovation
• 2011-2012: Common Pattern
• 2013-2014: Shared Pattern
Netflix ended up several years ahead of the industry, but it’s becoming commoditized now
Making it easy to follow Exploring the wild west each time vs. laying down a shared route
Goals
• Establish our solutions as Best Practices / Standards
• Hire, Retain and Engage Top Engineers
• Build up Netflix Technology Brand
• Benefit from a shared ecosystem
How does it all fit together?
Example Application – RSS Reader
Zuul Traffic Processing and Routing
Zuul Architecture http://techblog.netflix.com/2013/06/announcing-zuul-edge-service-in-cloud.html
Zuul Components
What’s Coming Next? Better portability Higher availability More Features Easier to deploy Contributions from end users Contributions from vendors More Use Cases
Vendor Driven Portability
Interest in using NetflixOSS for Enterprise Private Clouds
“It’s done when it runs Asgard”
• Functionally complete
• Demonstrated March
• Released June in V3.3
• Growing vendor interest
• Some vendor interest
• Openstack “Heat” getting there
• Needs AWS compatible Autoscaler
Another very large vendor planning to demo NetflixOSS at July 17th Meetup
AWS 2009 Baseline features needed to support NetflixOSS Eucalyptus 3.3
Boosting the @NetflixOSS Ecosystem
Judges
• Aino Corry – Program Chair for Qcon/GOTO
• Martin Fowler – Chief Scientist Thoughtworks
• Simon Wardley – Strategist
• Werner Vogels – CTO Amazon
• Yury Izrailevsky – VP Cloud Netflix
• Joe Weinman – SVP Telx, Author “Cloudonomics”
[Diagram: prize timeline and judging flow]
• Github Registration Opened – March 13
• Github Apache Licensed Contributions
• Github Close Entries – September 15
• AWS Re:Invent Award Ceremony Dinner – November
• Entrants -> Nominations (Conforms to Rules, Working Code, Community Traction) -> Netflix Engineering -> Categories -> Six Judges -> Winners
• Ten Prize Categories: $10K cash, $5K AWS, Trophy, AWS Re:Invent Tickets
Functionality and scale now, portability coming Moving from parts to a platform in 2013 Netflix is fostering a cloud native ecosystem Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
Slideshare NetflixOSS Details
• Lightning Talks Feb S1E1
  – http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks
• Asgard In Depth Feb S1E1
  – http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house
• Lightning Talks March S1E2
  – http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap
• Security Architecture
  – http://www.slideshare.net/jason_chan/
• Cost Aware Cloud Architectures – with Jinesh Varia of AWS
  – http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix
Takeaway NetflixOSS makes it easier for everyone to become Cloud Native Open Source is not just the default, it’s a strategic weapon @adrianco #netflixcloud @NetflixOSS
Q&A
Amazon Cloud Terminology Reference
See http://aws.amazon.com/
This is not a full list of Amazon Web Service features
• AWS – Amazon Web Services (common name for Amazon cloud)
• AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
• EC2 – Elastic Compute Cloud
  – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
  – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
  – Reserved Instances – pre-paid to reduce cost for long term usage
  – Availability Zone – datacenter with own power and cooling hosting cloud instances
  – Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
• ASG – Auto Scaling Group (instances booting from the same AMI)
• S3 – Simple Storage Service (http access)
• EBS – Elastic Block Store (network disk filesystem can be mounted on an instance)
• RDS – Relational Database Service (managed MySQL master and slaves)
• DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
• SQS – Simple Queue Service (http based message queue)
• SNS – Simple Notification Service (http and email based topics and messages)
• EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
• ELB – Elastic Load Balancer
• EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
• VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
• DirectConnect – secure pipe from AWS VPC to external datacenter
• IAM – Identity and Access Management (fine grain role based security keys)