Where Do You "Tube"? Uncovering YouTube Server Selection Strategy

Vijay Kumar Adhikari, Sourabh Jain, Zhi-Li Zhang
University of Minnesota - Twin Cities

Abstract

YouTube is one of the most popular video sharing websites in the world. In order to serve its globally distributed users, it requires a massive-scale video delivery system. We analyze DNS resolutions and video playback traces collected by playing half a million YouTube videos from geographically distributed PlanetLab nodes to uncover the load-balancing and server selection strategies used by YouTube. Our preliminary results indicate that YouTube is aggressively deploying cache servers at many different locations around the world, and that several of them are co-located with ISP PoPs to reduce cost and improve end-user performance. We also find that YouTube tries to use local "per-cache" load sharing before resorting to redirecting a user to bigger/central cache locations.

1 Introduction

YouTube, which started as a "garage project" to share videos online in 2005, has seen explosive growth in its popularity. Today it is indisputably the world's largest video sharing site, serving millions of users across the world every day. However, due to its ever-increasing popularity and demand, it undergoes continual expansion to accommodate the growing demand. As shown in a recent study [2], in 2008 it used six large data centers located within the United States to serve videos to users (while the LimeLight CDN was used to push the most popular videos to users). However, these data centers were not enough to meet the increasing global demand, and sometime after it was bought by Google, YouTube started expanding its video distribution network by using Google's infrastructure. In a recent work [1], Adhikari et al. used a reverse-engineering-based methodology to uncover the basic principles behind the design of the YouTube video delivery cloud. It showed that the YouTube video delivery cloud consists of three components: the video-id space, hierarchical logical video server DNS namespaces, and physical video cache servers. YouTube then uses a combination of static and dynamic load-balancing approaches to distribute the demand to its physical resources. As shown in that work, the video-id space consists of fixed-length unique identifiers referring to each YouTube video, which are then mapped to hierarchical video server namespaces using static hashing. In addition, YouTube DNS servers map these server hostnames to IP addresses corresponding to physical hosts in a client-location-aware manner.

In this paper, we use the same active measurement infrastructure used in [1] to provide further insights into the YouTube video delivery network. In particular, we provide additional details on how various strategies are used by YouTube to distribute video requests to its geographically distributed global cache servers, how these strategies interact with each other, and how this interaction impacts the performance observed by the user.

Our study shows that YouTube uses three different approaches to distribute the load among various servers. (a) Static load sharing using a hash-based mechanism: As noted in [1], YouTube maps each video-id to a unique hostname in each of the namespaces in the hierarchical DNS-based host namespaces. (b) Semi-dynamic approach using location-aware DNS resolutions: YouTube maps each DNS hostname to an IP address, which represents a physical video cache, based upon the user location and current demand. As seen in our experiments, YouTube directs the user to a geographically close cache location during "normal" hours. However, during "busy" hours it uses DNS-based resolutions to direct the user to a slightly farther location, which helps in avoiding geographical hot-spots. (c) Dynamic load sharing using HTTP redirections: Finally, to further balance the load on different physical servers, YouTube caches use HTTP redirections to direct a user from a busy server to a less busy video server. This helps in smoothing the skewed load distribution caused by the combination of video popularity and spontaneous video demands.

Our findings also show that YouTube caches are present in more than 45 cities in 25 different countries around the world. These trends suggest that Google is aggressively pushing its content close to users by placing a large number of "cache servers" at various geographical locations around the world. Moreover, several of these caches are co-located with ISP PoPs, which not only helps in reducing the bandwidth cost for both the ISP and YouTube, but also improves the performance for the ISP users.

The remainder of the paper is organized as follows. We present background and related work in Section 2. Section 3 describes and characterizes various YouTube cache locations. We discuss YouTube's server selection and load-balancing strategies in Section 4 and conclude the paper in Section 5.

2 Background & Related Work

In this section we first summarize the related work. Next, we provide an architectural overview of the YouTube video delivery cloud based on the findings from a recent study [1].

2.1 Related Work

The most relevant to our work is the recent study carried out in [1]. In that study the authors used an active measurement testbed comprising several PlanetLab nodes and open recursive DNS servers. Using the testbed the authors played a large number of videos and analyzed the detailed video playback logs to distill the basic principles behind the YouTube video delivery cloud. However, the work mostly focused on the organization of video servers and provides very limited analysis of the mechanisms used to perform load balancing and how they impact video delivery performance. In this work, we use the same testbed to extract the key load-sharing mechanisms used by YouTube. In addition, we provide a detailed charting of various YouTube cache servers, their locations and other characteristics. In another recent study [2], the authors utilize Netflow traffic data passively collected at various locations within a tier-1 ISP to uncover the locations of YouTube data centers, and infer the load-balancing strategy employed by YouTube at that time. The focus of that study was on the impact of YouTube load balancing on the ISP traffic dynamics, from the perspective of the tier-1 ISP. As the data used in the study is from spring 2008, the results reflect the YouTube delivery infrastructure before the Google restructuring. There are several other existing studies of YouTube, which mainly focus on user behaviors or system performance. For instance, the authors in [3] examined the YouTube video popularity distribution, popularity evolution, and key elements that shape the popularity distribution using data-driven analysis. The authors in [4] investigate the (top 100 most viewed) YouTube video file characteristics and usage patterns, such as the number of users and requests, as seen from the perspective of an edge network. Another study [6] analyzed network traces for YouTube traffic at a campus network to understand the benefits of alternative content distribution strategies. A more recent work [5] studies the impact of the YouTube video recommendation system on the popularity of videos. Our study complements and advances these works by shedding light on the multi-step load-balancing and server selection strategy employed by YouTube to serve video content from geographically diverse cache locations.

2.2 YouTube Video Delivery Cloud

The YouTube video delivery cloud consists of three major components: the video id space, hierarchical logical video servers represented using multiple anycast (or logical) DNS namespaces, and a 3-tier physical server cache hierarchy. (By an anycast DNS namespace we mean that each DNS name is by design mapped to multiple IP addresses, i.e., "physical" video servers.) In the following we present a brief overview of these components; a detailed description can be found in [1].

YouTube Video Id Space: Each YouTube video is uniquely identified using a fixed-length flat identifier. These identifiers constitute the video id space.

Hierarchical Cache Server DNS Namespaces: YouTube defines multiple (anycast) DNS namespaces, each representing a collection of logical video servers with certain roles. Together, these (anycast) DNS namespaces form a hierarchical organization of logical video servers. Logical video servers at each level of this organization are mapped to IP addresses (of "physical" video servers residing at various locations) within a particular tier of the physical cache hierarchy. There are a total of three sets of anycast namespaces, which we refer to as the primary, secondary and tertiary namespaces; each namespace has a specific format. For example, hosts in the primary namespace use the following format: v[1-24].lscache[1-8].c.youtube.com. As seen in the above example, there are a total of 24 × 8 or 192 such hostnames in the primary namespace. Similarly, there are 192 hostnames in the secondary namespace and 64 hostnames in the tertiary namespace. These layered logical DNS namespaces play an important role in the dynamic HTTP request redirection mechanism employed by YouTube. In general, only the DNS names belonging to the primary namespace are visible in the URLs or HTML pages referencing videos, whereas DNS names belonging to the other two namespaces appear mostly in the URLs used in dynamic HTTP request redirections during video playback.
Physical Cache Servers: Although there are three unique logical namespaces used to represent the DNS names for the hosts acting as logical cache servers, each anycast DNS hostname may map to a large set of IP addresses. For example, the hostname v1.lscache1.c.youtube.com in the primary namespace maps to 75 different IP addresses depending on time and the location of the user.

Datasets. For this study we used several datasets that were collected using our active measurement platform comprising 500 PlanetLab nodes and 1000 open recursive DNS servers. We obtained video playback traces by playing approximately half a million videos used in [1]. We also obtained IP geolocation data from the same study. Additionally, we collected DNS resolution data for the YouTube hostnames from all the PlanetLab nodes continuously for a month. During that period we also monitored the change in view counts for all 500K videos.

3 Charting YouTube Global Caches

In this section we provide a detailed charting of YouTube physical video servers in terms of the IP addresses used and their geographical locations. For these analyses we extract the YouTube server IP addresses by resolving YouTube video server hostnames from various PlanetLab nodes and open recursive DNS servers continuously for more than a month. Next, we utilize the IP-to-city mapping data used in [1] to chart the geographic distribution of YouTube caches.

3.1 IP addresses for YouTube servers

Using the DNS resolutions of YouTube servers' logical DNS namespaces we extracted a total of 6000 unique IP addresses. In terms of the namespace hierarchy, there are 5000 IP addresses corresponding to the primary logical server namespace, and 636 and 320 IP addresses for the secondary and tertiary namespaces respectively.

Next, we performed WHOIS queries for all the extracted IP prefixes to find their corresponding owner organizations. Our results show that although most of the IP prefixes (/24s) belong either to Google or YouTube, approximately 20% of the prefixes belong to several other ISPs. The distribution of the organizations that own these prefixes is shown in Figure 1. In this figure the "others" category includes several regional ISPs such as Bell-Canada and Comcast, and some Internet Exchange Points. In addition, our results show that YouTube caches corresponding to IP addresses from other ISPs are mostly co-located with ISP PoPs, and only serve the client IP addresses belonging to those ISPs or their customers. We verified this by trying to access a number of videos from host IP addresses within those ISPs and outside those ISPs. In all the cases, we found that only the client IP addresses within the corresponding ISP were able to successfully access the videos; requests coming from outside hosts were not served.

Figure 1: Organizations where the YouTube video-serving IP prefixes come from (Google 69.5%, Others 19.5%, YouTube 11%).

3.2 YouTube Cache Locations and Sizes

Using the IP-address-to-location mapping data we identified a total of 45 cities where YouTube video caches are present. Out of these 45 locations, 43 distinct locations correspond to primary caches, while secondary and tertiary caches mapped to only 8 and 5 distinct locations respectively. The geographical span of these locations is shown in Figure 2. The primary, secondary and tertiary cache locations are indicated by the letters P, S and T respectively in that figure. We note that at some locations, more than one tier of the cache hierarchy is present.

Figure 2: Three tiers of YouTube video cache locations (P=primary, S=secondary, T=tertiary).

Although there are 45 unique locations where YouTube caches are present, not all the locations are the same in terms of available resources. To understand the diversity in the size of different cache locations, we plot the distribution of YouTube cache sizes in Figure 3. We use the number of IP addresses present at each location as a proxy for the size of that cache location. In this figure the x-axis represents the YouTube locations, and the y-axis represents the size in terms of the number of IP addresses. As seen in this figure, the sizes of the cache locations vary widely. In particular, there are locations with a large number of IP addresses, e.g., the Frankfurt cache location has more than 600 IP addresses, while the cache located in Athens has merely 9 IP addresses.

Figure 3: Distribution of size of YouTube cache locations (cities) in terms of number of IP addresses seen.

4 Physical Server Selection & Load Balancing

We saw that YouTube aggressively pushes its content closer to users using a large number of geographically distributed video cache servers. In this section, we use that information to uncover the several mechanisms used by YouTube to serve each user request, so as to maximize user performance and also to evenly distribute load on its cache servers. When a user wants to watch a video, the video ID gets mapped to a unique hostname. The hostname then maps to different IP addresses based upon multiple factors. YouTube's server selection strategy can therefore be understood by examining this hostname-to-IP mapping. In this section we investigate the spatial and temporal factors that influence which IP address the hostname gets mapped to.

4.1 Location Awareness in Server Selection

In order to understand the role the user's location plays in the mapping of YouTube cache server hostnames to IP addresses, we resolved the hostnames corresponding to primary logical servers from all the vantage points. We find that each primary logical server hostname maps to approximately 75 unique IP addresses. We also computed latencies between the vantage points and all the mapped IP addresses. These round-trip delay measurements show that the hostname-to-IP-address mapping provided by YouTube DNS servers tries to direct the user to a nearby physical cache server.

To demonstrate this, we plot the percentile of the latency between the vantage point and the mapped IP address, shown in Figure 4. In this figure, the x-axis represents the percentile corresponding to the latency. This indicates that although the hostnames could have been mapped to any of the 75 IP addresses, in almost all cases YouTube maps them to a physical cache server (or IP address) that is close to the client requesting a video.

Figure 4: CDF plot showing which decile mapped IPs belong to in terms of ICMP ping latency.

We also visualize the location awareness in DNS resolution using a geographic map, as shown in Figure 5. In this figure we tag each PlanetLab node's location with the one of the 5 YouTube tertiary cache locations that it is mapped to. Each balloon represents a PlanetLab node and the letter indicates the tertiary cache location that node gets redirected to via DNS-based resolutions. We can clearly see that in almost all cases, the nodes pick the closest tertiary cache location.

Figure 5: Locations of PlanetLab nodes indicating their tertiary-cache location choices (B=BRU, C=CBF, F=FRA, I=IAD, N=NUQ).

These results show that YouTube tries to send users to one of the closest cache locations.

4.2 Temporal Patterns in DNS Mappings

To understand temporal changes in DNS resolutions, we continuously resolved the logical server hostnames from all the vantage points for a month-long period.
Our analysis of these DNS resolutions revealed that we can group the vantage points into two distinct groups based on how frequently the mappings change. For the first group of vantage points the mappings change during a certain time of the day, and the pattern repeats every day. For the vantage points in the second group, however, the changes are more continuous.

To illustrate this we plot the IP address mappings for one primary logical server's hostname. Figure 6 shows an example plot for the second group. In this figure, the x-axis represents time, divided into intervals of 5 minutes each, and the y-axis represents the mapped IP address. As seen for this group, the mapped IP address changes in almost every interval. This suggests that YouTube is using 4 distinct physical servers to represent the instance of one particular primary logical server, and changes the mapped IP address to divide the client requests among these physical servers. On the other hand, there are other locations where new IP addresses only show up at specific times of the day. Our analysis of the DNS resolution pattern for these locations shows that each logical server hostname maps to a fixed IP address most of the time; however, during certain hours of the day we see a large number of distinct IP addresses for the same hostname. We show an example of such a location in Figure 7. In this figure, we see that DNS servers primarily map the hostname to IP 3. However, at "busy" hours of the day, the DNS server also starts mapping the hostname to other IP addresses. In this week-long data, we can clearly see 7 specific periods (one every day) in which we see additional IP addresses other than IP 3.

4.3 Need for Dynamic Load Balancing

We have seen that YouTube uses a static hash-based scheme to map videos to hostnames. However, this static mapping can lead to uneven load on the logical servers because videos differ in popularity. For instance, if a video becomes popular, the logical server corresponding to that video will experience more traffic than other servers. These uneven loads on logical servers can then translate into an uneven load distribution across the corresponding physical servers.

To understand this uneven load among different servers, we tracked the global video view counts for approximately half a million videos over a month. We see that even in aggregate view counts over a month, including views from all over the world, different logical servers receive widely differing numbers of video requests. To demonstrate this, we plot the distribution of video requests on each logical server during a month for our sample of half a million videos in Figure 8. As seen in this figure, some logical servers are responsible for approximately 4 times more video requests than others. In fact, we can expect that within smaller time windows and at individual locations this distribution is even more skewed, which can result in a highly uneven load distribution on different physical servers.

Figure 8: Distribution of video requests per primary cache hostname.

We can see that in this setting, it is very hard to achieve even load sharing among different physical servers at a given cache location. In addition, the static hash-based load distribution and the DNS-based dynamic mappings, by themselves, cannot ensure that a location will be able to handle a sudden increase in video demand. To deal with these problems, YouTube uses a more dynamic approach based on HTTP redirection.

To understand this HTTP-redirection-based dynamic load-sharing approach, we extract all the instances where a user is redirected to a different physical server using HTTP redirections. Our analysis of these redirect logs reveals two distinct patterns. In the first set of instances, a logical server in one namespace redirects to another logical server in the same namespace. We refer to these cases as "intra-tier" redirections. In the rest of the cases, a server in one namespace redirects to a server in a higher tier of the cache hierarchy (such as a server from the primary namespace redirecting to a server in the secondary namespace). These cases are referred to as "inter-tier" redirections.

In the case of intra-tier redirections, we looked at the locations of the servers involved. A careful inspection of these redirections showed that the IP addresses of the two servers (the one that redirects the client and the one that receives and handles the redirected requests) are in the same location (city). This means that for intra-tier redirections, YouTube is redirecting a user to another physical server at the same location. This suggests that YouTube is trying to distribute the load locally, i.e., a busy server in one location redirects a client to a less busy server in the same cache location. To see if this is indeed the case, we look at the individual IP addresses involved in redirections.
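The intra-/inter-tier distinction above can be sketched as a small classifier over redirect log entries. Only the primary-namespace pattern comes from the paper; the secondary and tertiary patterns below are hypothetical placeholders, since the text does not spell out their exact formats.

```python
import re

# Namespace patterns: only the primary format is given in the text;
# the secondary and tertiary patterns are illustrative stand-ins.
TIER_PATTERNS = [
    ("primary", re.compile(r"v\d+\.lscache\d+\.c\.youtube\.com")),
    ("secondary", re.compile(r"v\d+\.cache\d+\.c\.youtube\.com")),    # assumed format
    ("tertiary", re.compile(r"v\d+\.altcache\d+\.c\.youtube\.com")),  # assumed format
]

def tier_of(hostname):
    # Return the namespace tier a hostname belongs to, or None.
    for tier, pattern in TIER_PATTERNS:
        if pattern.fullmatch(hostname):
            return tier
    return None

def classify_redirect(src_host, dst_host):
    # "intra-tier": both endpoints sit in the same namespace;
    # "inter-tier": the redirect climbs the cache hierarchy.
    return "intra-tier" if tier_of(src_host) == tier_of(dst_host) else "inter-tier"
```

Applied over a redirect log, such a classifier yields exactly the two buckets the analysis in this section is built on.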
Figure 6: Example plot showing a hostname-to-IP mapping changing continuously (x-axis: time slots of 5 minutes each; y-axis: IP address identifiers).

Figure 7: Example plot showing a hostname-to-IP mapping changing only during peak hours (x-axis: time slots of 5 minutes each; y-axis: IP address identifiers).

We see that at each location and at any given time period, there are servers that redirect to a different set of servers at the same location.

The use of HTTP redirects helps YouTube achieve a more fine-grained load distribution among servers. The use of intra-tier and inter-tier redirections requires no global coordination among resources. The local load sharing only requires sharing load information among nodes in the same location. Redirecting to a higher tier also does not need any global coordination, since the target hostname can easily be obtained by the same static hashing mechanism.

However, the use of redirections also has its own drawbacks. First of all, since YouTube first tries to load-balance locally, it can take multiple redirections if the host that receives the redirection cannot serve the video. In addition, since each redirection requires the client to make an additional HTTP request, it also leads to higher delays before the video starts playing back. Moreover, inter-tier redirections generally lead a client to a distant cache location, because the higher-tier caches are present at only a small number of locations.

5 Conclusion

We examined YouTube's video distribution architecture to uncover key aspects that determine which video servers, at which locations, serve users who try to watch YouTube videos. We found that YouTube uses a large number of data centers and caches that vary in size, geographic location and other characteristics. We also uncovered key ideas behind the application-level redirection mechanism that YouTube uses and how it impacts the performance observed by its users. These results demonstrate how one of the best content distribution networks works, with implications for how future systems of this kind can be developed and deployed.

In the future, we plan to build upon these preliminary findings to investigate other aspects of the YouTube video delivery system, such as the effects of video popularity and cache location size on server selection.

References

[1] Adhikari, V. K., Jain, S., Chen, Y., and Zhang, Z.-L. How Do You "Tube"? Tech Report, http://www-users.cs.umn.edu/~viadhi/resources/youtube-tech-report.pdf (2010).

[2] Adhikari, V. K., Jain, S., and Zhang, Z. YouTube Traffic Dynamics and Its Interplay with a Tier-1 ISP: An ISP Perspective. In IMC '10 (2010), ACM.

[3] Cha, M., Kwak, H., Rodriguez, P., Ahn, Y.-Y., and Moon, S. I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system. In IMC '07 (2007), ACM.

[4] Gill, P., Arlitt, M., Li, Z., and Mahanti, A. YouTube traffic characterization: a view from the edge. In IMC '07 (2007), ACM.

[5] Zhou, R., Khemmarat, S., and Gao, L. The Impact of YouTube Recommendation System on Video Views. In IMC '10 (2010), ACM.

[6] Zink, M., Suh, K., Gu, Y., and Kurose, J. Characteristics of YouTube network traffic at a campus network - measurements, models, and implications. Comput. Netw. (2009).