Opening Up the Sky: A Comparison of Performance-Enhancing Features in SkyDrive and Dropbox

Herman Slatman
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands
H.Slatman@student.utwente.nl

ABSTRACT
Cloud storage services are increasing in popularity and are using a growing amount of bandwidth on the Internet. Insight into how much traffic is generated is needed for a number of reasons. Cloud storage providers are interested in serving their clients efficiently and effectively, and they want to know how their product is performing and how they can improve their service. Internet Service Providers need an indication of the amount of traffic generated by cloud storage. Lastly, users of cloud storage services might want to know how their favorite service performs. At the moment not much is known about the performance of different cloud storage providers, but this paper aims at getting a thorough understanding of those services and their impact on the Internet.

This paper focuses on Microsoft SkyDrive, as this is the second most popular cloud storage service [1] and because it has been neatly integrated in the Microsoft Windows operating system. Microsoft SkyDrive will be compared to Dropbox in terms of performance-enhancing features. As shown in [1], Dropbox storage servers are all located in the United States, which is not an optimal solution for clients spread around the world. Also, the way SkyDrive manages and transfers its files will be analyzed to determine whether SkyDrive has deployed more efficient synchronization strategies than Dropbox.

This research contributes to knowing which technologies state-of-the-art cloud storage services have or have not deployed to increase performance, and to a thorough understanding of the performance of Microsoft SkyDrive compared to Dropbox’s.

Keywords
Cloud Storage, Performance, SkyDrive

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
18th Twente Student Conference on IT, January 25, 2013, Enschede, The Netherlands.
Copyright 2013, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.

1. INTRODUCTION
Recent developments show an increased interest in the use of cloud storage services. Both individuals and enterprises are increasingly making use of cloud storage services, like Dropbox, Google Drive and Microsoft SkyDrive, to store and share files with great ease. Those cloud storage services already generate quite some traffic on the Internet (an educated guess¹ puts the total amount of traffic generated by uploading files to Dropbox at about 54 Gbps), and it is to be expected that the amount of traffic due to cloud storage services will further increase in the future. To maintain the quality of the Internet in terms of available bandwidth and latency, predicting the impact cloud storage services have and will have on the Internet is important. To gain a better understanding of the impact of cloud storage services, knowledge of the internals and the performance of those services is necessary. Not much is known about the internals of cloud storage services, but [1] gives great insight into the internals of Dropbox, which is shown to be the most popular cloud storage provider.

¹ http://www.extremetech.com/computing/129183-how-big-is-the-cloud, accessed on 03-10-2012

The goal of this paper is to get a thorough understanding of the Microsoft SkyDrive service, its internals and, specifically, its performance. The main reasons Microsoft SkyDrive has been chosen as the research topic are that it is the second largest service [1] in terms of traffic generated and that it has the potential to grow substantially in the near future. The latter is because SkyDrive can be accessed via the Web, because several client applications are available for different operating systems, and because it has been neatly integrated in the Microsoft Windows operating system.

To say something about the performance of SkyDrive, a comparison against Dropbox will be performed. The main research question reads as follows:

How does SkyDrive compare to Dropbox, in terms of the presence of performance-enhancing features?

The main research question is focused on generated traffic and the efficiency with which SkyDrive handles files on its service. To answer this research question, the research is split up in three parts, answering the following questions:

- How are the administration and transfer of files controlled in SkyDrive?
- How are the servers in SkyDrive distributed over the world?
- Does SkyDrive deploy specific technologies to enhance its performance, and how do these compare to Dropbox?

The first question was included to gain an understanding of the operation of SkyDrive. Information gained from this was used to set up the experiments for the two remaining research questions. Together these questions give insight into the performance of the Microsoft SkyDrive service compared to Dropbox in terms of features.

An overview of SkyDrive will be given in Section 2. Section 3 describes the methodology used to conduct this research. In Section 4 technologies that enhance the performance of cloud services are introduced. Section 5 compares SkyDrive and Dropbox. Section 6 introduces related work. Section 7, lastly, summarizes the conclusions of this paper.

2. A BIRD’S EYE VIEW OF SKYDRIVE
SkyDrive was initially released by Microsoft in 2007 and has since then been known under a few different names. At the time of writing, it offers 7 GB of storage for free to new users, whereas early users could opt in for a free 25 GB if they had used the service before the 22nd of April 2012.

Client applications are available for Windows Vista and Windows 7, which can be used to integrate SkyDrive inside those operating systems. In Windows 8, Microsoft’s newest iteration of the operating system, SkyDrive has been integrated natively. Client applications are also available for the OS X, iOS, Windows Phone and Android operating systems, covering a broad spectrum of devices. This paper focuses on the desktop client for Windows 7.

A web interface to the SkyDrive service is also available, which is built on HTML5 technologies². Amongst other things, it supports email integration and integration with Microsoft Office, and it features Office Web Apps, in which users can create, view and edit documents right in the browser and store them on SkyDrive.

² http://bit.ly/SD-Modern-Web, Introducing SkyDrive for the modern web, built using HTML5, accessed on 29-10-2012

Users can log in to the service with their Microsoft Account, which is used in all other services provided by Microsoft. Files on the service can be shared with other people that have a Microsoft Account, but it is also possible to share files on social networks, such as Twitter, LinkedIn and Facebook. SkyDrive maintains an Access Control List (ACL) for every file and folder³, which is used to grant users the privileges needed to execute the associated operations on a file. It is possible, for example, to create a URL for a file such that everyone is allowed to read the file, but not change it. It is also possible to require users to be logged in before access is granted.

³ http://bit.ly/Rebuilding-Permissions, Designing app-centric sharing for SkyDrive, accessed on 07-11-2012

On the 15th of November 2012, Microsoft introduced selective sync, enabling users to control which files are synchronized amongst their devices. Updates to the SkyDrive applications for Windows Phone 8 and Android were also rolled out. According to Microsoft, on the 15th of November the amount of SkyDrive storage had doubled since the introduction of the desktop and mobile applications on the 22nd of April 2012.

3. METHODOLOGY
Active and passive measurements have been carried out to assess the performance of SkyDrive and to compare it with Dropbox. These included uploading files to the SkyDrive servers and measurements to determine the location of the servers.

Before conducting any active or passive experiments, a lab environment suitable for those experiments was set up. This lab environment consisted of a host PC running Debian GNU/Linux version 6.0, kernel 2.6.32-5-amd, on which Wireshark, a popular packet sniffer and network protocol analyzer, was installed. Windows 7 was installed as a virtual machine. On this virtual machine, the SkyDrive client application was installed, together with Charles, a web debugging proxy. Charles is a shareware application that allows for setting up a local proxy to capture, for example, all data that is sent via SSL/TLS encrypted connections.

The setup described above allowed for capturing and analyzing all traffic that was exchanged during the various experiments, including the encrypted traffic.

3.1 File Administration and Transfers
Files differing in size and containing random text were uploaded to SkyDrive to determine the way SkyDrive handles file administration and transfers. At first these uploads were analyzed using only Wireshark, which showed that all transfers and administration of files were carried over encrypted connections. Charles was then used to gain a more thorough understanding of the information sent over those encrypted connections. Hostnames used in the service were recorded, and the corresponding IP-addresses were added together with the functionality they provide.

3.2 Distribution of Servers
Active measurements were performed to assess the geographical distribution of servers in the SkyDrive service. This was done in two consecutive steps. The first step was to find out what hostnames the SkyDrive application in the lab environment would connect to. Wireshark was used to analyze the relevant Internet traffic, which showed some of the hostnames that SkyDrive connects to. Some online investigation turned up more hostnames⁴ ⁵ to incorporate in this research.

⁴ http://bit.ly/Upload-Issues-For-ISP, Microsoft Answers, accessed on 29-10-2012
⁵ http://bit.ly/Low-Bandwidth-Areas, Microsoft Answers, accessed on 29-10-2012

The second step involved setting up a test bed of Planet-Lab servers spread over the world. On those machines the traceroute and dig commands were executed against the hostnames found during the first step, to determine whether the hostnames would always resolve to the same IP-address. The IP-addresses that resulted from this step were all queried against the databases on MaxMind.com and Route.IM to get information on their geographical location.

The results gained from querying those two websites were not taken for granted though, as research [5] shows these GeoIP services are not always precise, especially at the city level. Instead, the results of the queries on those websites have been complemented by traceroute timings, to further establish the outcomes.
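The resolution check in the second step of Section 3.2 can be illustrated with a minimal sketch. It is not the tooling used in this research (which ran dig and traceroute on Planet-Lab nodes); it only assumes that a hostname is resolved repeatedly and that every IPv4 address seen is recorded. The function names `resolve_ipv4`, `collect_addresses` and `unstable_hostnames` are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable, List, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Resolve a hostname to its IPv4 addresses with the system resolver."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def collect_addresses(
    hostnames: Iterable[str],
    rounds: int = 3,
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> Dict[str, Set[str]]:
    """Resolve every hostname `rounds` times and remember all addresses seen."""
    names = list(hostnames)
    seen: Dict[str, Set[str]] = {name: set() for name in names}
    for _ in range(rounds):
        for name in names:
            try:
                seen[name] |= resolver(name)
            except OSError:
                # Resolution failures (socket.gaierror is an OSError) are skipped.
                continue
    return seen


def unstable_hostnames(seen: Dict[str, Set[str]]) -> List[str]:
    """Hostnames that were observed with more than one IP address."""
    return sorted(name for name, ips in seen.items() if len(ips) > 1)
```

Passing the resolver as a parameter is a design choice made for this sketch only: it allows the same loop to be driven by results gathered on several vantage points instead of the local resolver.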
3.3 Comparison With Dropbox
The comparison between Dropbox and SkyDrive has been based both on a literature survey and on active measurements. The literature survey was performed first to create an understanding of features that in general improve the performance of cloud services. Google Scholar was used primarily to search for relevant sources. The starting points were the very generic terms "cloud storage" and "cloud service". Then some more terms were introduced in the search queries, for example "performance" and "infrastructure".

Active measurements were then conducted to determine whether the features that were found during the literature survey are present in SkyDrive. This involved uploading a series of different files that were carefully crafted to ensure the features would be exercised if they were present in the service. The files that were uploaded as part of these measurements are described in Section 5.3 and can also be found in Table 3.

Another part of the comparison is the assessment of the popularity of SkyDrive compared to Dropbox. This is not part of the research questions, but was included to be able to say something about the usage of the service. The dataset that was analyzed for this was produced by capturing flow data from a building on the campus of the University of Twente, in which 982 unique IP-addresses were present. These IP-addresses are assigned statically. The number of unique IP-addresses that connected to a storage server in the SkyDrive service was recorded. This was compared with the number of IP-addresses that connected to a Dropbox storage server. Also, the amount of traffic generated in flows was captured.

4. CLOUD STORAGE TECHNOLOGIES
A literature survey was conducted to gain an understanding of what features affect the performance of a cloud service and, more specifically, a cloud storage service. A selection of those features has been made, and they are discussed and elaborated on in the following subsections.

4.1 Data Deduplication
Many users store a lot of files in the cloud nowadays. It is quite possible that the same file is uploaded to a cloud storage facility by two or more different users, or that it is stored two or more times by a single user. This could be the case for an e-book, for example; it is then unnecessary to save more than one copy of the e-book in the storage service. This kind of administration is known as data deduplication, in this case server-side data deduplication [3], [4]. Data deduplication allows for less Internet traffic to be generated, as files will not have to be uploaded when they are already present on the cloud storage facility. In this paper, only client-side data deduplication will be considered. Client-side data deduplication can be implemented by creating a mechanism that checks if a file is already stored on the service, and only uploads a file when it is not already present. This saves bandwidth, as files will not be uploaded unnecessarily.

4.2 Delta Updates
When a file is created to be stored on a cloud storage facility, it can in general be assumed that all bytes have to be transferred over the Internet. In general, files will change over time, and those changes have to be synchronized to the cloud storage facility. Cloud storage providers can implement a feature called delta updates, with which it becomes possible to upload only a chunk of data that has been changed, while leaving the unchanged chunks of data untouched [1]. An example of an algorithm that can be used to implement delta updates is the rsync algorithm [7]. Less Internet traffic is generated when delta updates are implemented in a cloud storage service, as there is no need to upload an entire file when only a small part is changed.

4.3 Data Compression
Data compression is the act of encoding data in such a way that the encoded data takes fewer bytes to store the same information that is present in the original data [4]. When data compression is deployed on the client side of a cloud storage service, files that are exchanged with the cloud storage facility are compressed before they are sent over the Internet. This allows for less Internet traffic to be generated as, in general, files can indeed be compressed. RFC2616 describes the HTTP 1.1 specification [2], which includes a section on compression of files sent via HTTP. Compression is in fact in widespread use⁶ by websites, saving their users bandwidth and time.

⁶ http://w3techs.com/technologies/details/ce-compression/all/all

4.4 Server Distribution
In general, services on the web perform faster and more efficiently when the client connecting to such a service is close to the server [8]. Services that are used on a worldwide scale should therefore, ideally, deploy servers distributed all over the world, to guarantee good performance and quick response for all users, wherever they are. This is no different in cloud storage services, in which a large amount of data has to be uploaded and downloaded, and therefore server distribution is an important part of the performance of those services.

4.5 Storage Protocol
The storage protocol is at the heart of a cloud storage service, and can therefore severely impact the performance of the service [4]. At a high level, the protocols may be implemented in an Application Programming Interface (API). Several options are available, such as Web- and File-based APIs. Figure 1 shows a diagram with some of the available options categorized by access method, and some technologies corresponding to those options. The most popular APIs are REST and SOAP, which are employed by Amazon S3 and Windows Azure, for example. The APIs provide ways to connect to services via a specific interface and specify how systems have to communicate with each other, including how data is exchanged between the entities and how data is saved on the cloud storage servers. Other APIs include Block-based access to cloud storage.

Another part of the storage protocol is the transport protocol that is used to transfer the files from a client to the storage servers. An example is the TCP/IP stack of protocols, which also carries the HTTP traffic that powers the Web. Dropbox, for example, uses the HTTP and HTTPS application layer protocols to transfer its files [1]. The use of HTTP(S) introduces round-trip times, as messages are acknowledged upon receipt. The duration of those round-trip times also influences the performance of cloud storage services.
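The three traffic-saving features of Sections 4.1 to 4.3 can be illustrated with a minimal sketch. It makes several assumptions that do not come from any of the services discussed: content is identified by its SHA-256 digest, delta updates compare fixed-size blocks at fixed offsets (a simplification, not the rolling-checksum matching of the rsync algorithm [7]), and zlib stands in for whatever compression a client might use. All names, including `BLOCK_SIZE`, are hypothetical.

```python
import hashlib
import zlib
from typing import List, Set

BLOCK_SIZE = 4  # deliberately tiny, hypothetical block size for readability


def digest(data: bytes) -> str:
    """Content fingerprint used by the deduplication and delta checks."""
    return hashlib.sha256(data).hexdigest()


def needs_upload(content: bytes, stored_digests: Set[str]) -> bool:
    """Client-side deduplication: upload only content the service lacks."""
    return digest(content) not in stored_digests


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut data into consecutive blocks of at most `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Delta update: indices of blocks in `new` that differ from `old`.

    Blocks are compared position by position, so inserting bytes near the
    start shifts, and therefore marks, every later block. Truncation of the
    file would have to be signalled separately.
    """
    old_digests = [digest(block) for block in split_blocks(old, block_size)]
    changed = []
    for index, block in enumerate(split_blocks(new, block_size)):
        if index >= len(old_digests) or digest(block) != old_digests[index]:
            changed.append(index)
    return changed


def compressed_size(content: bytes) -> int:
    """Data compression: number of bytes sent if content is deflated first."""
    return len(zlib.compress(content))
```

With this sketch, appending data to a file marks only the last partial block and the new blocks as changed, which is the behavior that makes delta updates cheaper than re-uploading the whole file.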
Figure 1: Cloud storage access methods showing Web-, File- and Block-based APIs and others. Figure taken and slightly adapted from [4].

5. SKYDRIVE VS. DROPBOX
This section compares SkyDrive and Dropbox. Subsection 5.1 describes SkyDrive internals. In subsection 5.2 the geographical distribution of servers in SkyDrive will be assessed. In subsection 5.3 a comparison of data deduplication, delta updates and data compression is performed. In subsection 5.4 the popularity of both services is featured. Lastly, subsection 5.5 shows how SkyDrive stacks up against Dropbox in a concluding summary. It also discusses the results and introduces future work.

5.1 SkyDrive In-Depth
This section describes technical details of SkyDrive that were of interest during the research and is in its totality an answer to the first research question. Knowledge about the internals of SkyDrive was needed to set up experiments for the other two research questions.

Users are identified by a 16-character identifier. This identifier is also used for identifying every single file or folder that is stored on the service. When used as an identifier for files and folders, a numerical suffix is added to identify the right entity. An example is B222AADFECF84486!1514, where the exclamation mark separates the user- and file-identifier.

The application stores a local database in which file metadata is kept. This metadata includes filename, client-identifier, file-identifier and a 32-character hash value. When a file is added, it is assigned a provisional file-identifier. These look like #b18dd088-9f1f-4bb9-aba1-1206. The file is then uploaded to the storage server and, as soon as the upload is finished, the file is assigned a final file-identifier, which looks like the one described in the previous paragraph.

When a file gets altered, its hash value is checked. When this value is not the same as the one that is present in the database, the file is uploaded to the storage server again.

Files stored using the native application on Windows are split up in blocks. The maximum block size is set in a configuration file (ClientPolicy.ini), which can be found in the local application data folder. The currently assigned block size is 1MB. The SkyDrive application periodically checks online whether the policies that are set in ClientPolicy.ini have to be updated, so the block size might be subject to change.

The transfer of files is carried out via HTTPS. Analyzing the headers of the packets using Charles showed that SkyDrive uses special headers that are defined in Microsoft’s Background Intelligent Transfer Service (BITS)⁷. BITS defines new headers on top of the standard HTTP headers. In BITS a new session is started for every file that has to be uploaded, via a Create-Session packet. Files are uploaded in Fragment packets, which contain information on the part of the file that is being uploaded and the data itself. The Fragment packets contain the blocks that were described in the previous paragraph and, as such, are around 1MB in size in the SkyDrive service. Although SkyDrive uses the BITS headers, it does not seem to run on the BITS protocol. Connections to the storage servers use remote port 443, and data is sent encrypted over the network. Connections to the storage server are closed shortly after the file transfer is completed.

⁷ http://bit.ly/Microsoft-BITS, Microsoft TechNet, accessed on 14-01-2013.

A continuous connection is present whenever the SkyDrive application is running. This connection also uses remote port 443. It periodically polls a notification server for notifications the application has subscribed to. These notifications include the amount of disk space used, the disk space quota and information on files that have been uploaded or updated.

When the SkyDrive application is started, authentication is performed via login.live.com, based on a Windows Live ID. After successfully authenticating, the application registers itself for notifications on act-3.blu.mesh.com. Notifications are sent by a host suffixed with wns.windows.com. Storage operations are all performed against a host suffixed with storage.msn.com, except in the case of storage via the web interface, where they are performed against hosts suffixed with storage.live.com. Other hostnames associated with the web interface have been omitted for brevity.

Table 1 shows the hostnames that are in use by SkyDrive, together with the services that are provided by those hostnames.

Table 1: Hostnames and their use in SkyDrive

Hostname                          Service
login.live.com                    Authentication
*.mesh.com                        Notification subscription
*.wns.windows.com                 Notifications
skydrivesync.policies.live.net    Client Policy updates
skyapi.live.net                   API functions
ssw.live.com                      Debug/Statistics
*.storage.msn.com                 Storage
*.storage.live.com                Storage via web

5.2 Server Distribution
Table 2 shows the hostnames that are in use by SkyDrive to store files. The client application uploads to exactly one of those hostnames; the one that is used can change over time though, as the hostname that should be used by the service is explicitly stated in the ClientPolicy.ini file. Every hostname is associated with at least two distinct IP-addresses. Together with the option to change the storage server at runtime through a ClientPolicy.ini update, this indicates that load balancing is performed in the service. All hostnames have been traced to the United States, using MaxMind.com data, Route.IM data and by running traceroute. Most of them resolve to the state of Washington, whereas two IP-addresses were traced to the state of California. The region the hosts behind dm1.storage.msn.com originate from could not be resolved on MaxMind.com, but
response times on Route.IM suggest that they are closer to California than Washington.

Table 2: Hostnames, associated IP-addresses and locations for storage servers in SkyDrive

Hostname(s)             IP address        Ctry.   Rgn.
by1.storage.msn.com     65.54.191.46      US      WA
by2.storage.msn.com     65.54.191.47      US      WA
blu1.storage.msn.com    65.55.195.238     US      WA
blu2.storage.msn.com    65.55.195.239     US      WA
dm1.storage.msn.com     157.55.246.46     US      -
                        157.55.246.47     US      -
                        157.55.241.174    US      -
                        157.55.241.175    US      -
sn2.storage.msn.com     207.46.0.174      US      CA
                        207.46.0.175      US      CA

As the above table shows, all storage servers are located in the United States. This means files from all over the world need to be sent there to be stored. As SkyDrive uses TCP at the transport layer, and closes the connection to the storage server after a file transfer is completed, this might cripple performance for users that are not close to the United States. This is because TCP employs a slow-start mechanism: every new connection starts with a small congestion window that grows only as acknowledgements return, so the time needed to reach full speed depends on the round-trip time between the client application and the storage server.

5.3 Technology Comparison
Our experiments showed that SkyDrive does not employ data deduplication, delta updates or data compression. The latter can be established from inspecting Figure 2. The file sizes on the horizontal axis correspond to the file sizes in Table 3. The files all contained a specific number of paragraphs of ‘Lorem Ipsum’ (text that is often used as dummy text on websites when designing page layouts⁸), according to the number that is present in their filename. The reason ‘regular’ text rather than random data was put inside the files is the possible data compression on files in the service. When a file contains random data, the compression rate might well be 0%, which is not the case when regular text is used. Files were built in a modular manner to exercise the features of data deduplication and delta updates.

⁸ http://www.lipsum.com/

Table 3: Files and their size as used during measurements

Filename       File size (Bytes)   File size Rounded (MB)
LI3000.txt     1,866,358           1.9
LI6000.txt     3,732,718           3.7
LI9000.txt     5,599,089           5.6
LI12000.txt    7,465,438           7.5
LI15000.txt    9,331,798           9.3
LI18000.txt    11,198,158          11.2
LI21000.txt    13,064,518          13.1
LI24000.txt    14,930,878          14.9
LI27000.txt    16,797,238          16.8
LI30000.txt    18,663,598          18.7

The graph shows a linear increase in upload traffic when the size of the file that is being uploaded increases. The amount of upload traffic is bigger than the file size. This overhead contains the information needed to administer the upload of the file. The absence of data compression can be concluded from the fact that the number of bytes uploaded is bigger than the number of bytes the files consist of. The slope of the line in Figure 2 is about equal to 1.006, whereas employment of data compression would have shown a slope smaller than 1.0. In Dropbox, according to [1], data compression is present. This can also be established from inspecting Figure 2.

Figure 2: Upload traffic observed when uploading the ‘Lorem Ipsum’ files to SkyDrive.

To discover whether SkyDrive employs delta updates, an experiment was set up that consisted of four stages:

Stage 1: LI6000.txt, containing 6000 paragraphs of ‘Lorem Ipsum’, was uploaded. This resulted in approximately 3.7 megabytes being uploaded to the SkyDrive storage server.

Stage 2: The contents of LI6000.txt were copied, appended to the original LI6000.txt and saved, basically doubling the size of the file. This resulted in 7.5 megabytes being transferred to the SkyDrive storage server, which is about equal to the file size of LI12000.txt.

Stage 3: The resulting file from Stage 2 was again appended with 6000 paragraphs of ‘Lorem Ipsum’ and saved. The file now contained 18000 paragraphs. This resulted in 11.2 megabytes being sent to the SkyDrive storage server, which is about equal to the file size of LI18000.txt.

Stage 4: The last 15000 paragraphs were cut off from the Stage 3 file and the file was saved, resulting in 1.9 megabytes being sent to the SkyDrive storage server, which is about equal to the file size of LI3000.txt.

The above experiment shows SkyDrive does not use delta updates. The same measurement was performed with Dropbox as the storage service. Figure 3 shows the results of both measurements. From the figure it can be concluded that Dropbox does indeed employ delta updates, as the amount of upload traffic does not double when doubling the amount of data in the file, and that SkyDrive does not employ delta updates, as every single byte is sent when a file is changed.

Figure 3: Amount of uploaded bytes under common file operations, e.g. appending and deleting text.

The absence of client-side data deduplication in the SkyDrive service has been established by uploading LI3000.txt to the storage servers five times, each time to a different folder. Analysis of the traffic generated showed that the file was uploaded to the storage servers in its entirety each time. From this it can be concluded that SkyDrive does not keep track of files that are already present on the storage servers for a specific user and so does not perform client-side data deduplication to save upload bandwidth. Dropbox does employ client-side data deduplication on a per-user basis. The results of this experiment are shown in Figure 4. Note that the (compressed) bytes are uploaded to Dropbox only during the first upload, whereas the entire file is sent uncompressed every time to SkyDrive.

Figure 4: Upload traffic observed when uploading LI3000.txt to five different folders.

5.4 Service Popularity
The popularity of the SkyDrive service was measured by monitoring the unique IP-addresses that connected to a storage server each day, in a building on the campus of the University of Twente. This gives a good indication of popularity, as clients would only connect to a storage server when they upload a file. The same was done with Dropbox as the storage service.

The top part of Figure 5 shows the measurement for the period from the 1st of June till the 5th of July. It shows more unique IP-addresses connecting to a Dropbox server than unique IP-addresses connecting to a SkyDrive storage server.

The bottom part of Figure 5 shows the number of unique IP-addresses that connected to a storage server in the SkyDrive and Dropbox services in the period spanning from the 19th of September till the 22nd of October. The graph shows SkyDrive is roughly at 1/6th of the unique IP-addresses of Dropbox. A decrease of about 10.7% was observed. The number of IP-addresses connecting to a Dropbox storage server remained fairly stable; a decrease of 2.3% was observed.

Figure 5: Number of unique IP-addresses connecting to a SkyDrive or Dropbox storage server during two different timespans

Figure 6: Amount of traffic generated during two different timespans
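The popularity measure of Section 5.4, the number of unique IP-addresses per day that contact a storage server of each service, can be sketched as follows. The flow record layout (day, client IP-address, service name) and the function names are assumptions made for this illustration; they do not describe the format of the flow data captured on campus.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical flow record: (day, client IP-address, storage service contacted)
Flow = Tuple[str, str, str]


def unique_clients_per_day(flows: Iterable[Flow]) -> Dict[Tuple[str, str], int]:
    """Count distinct client IP-addresses per (day, service) pair."""
    clients: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for day, client_ip, service in flows:
        # A set ignores repeated flows from the same client on the same day.
        clients[(day, service)].add(client_ip)
    return {key: len(ips) for key, ips in clients.items()}


def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after` in percent; negative means a decrease."""
    return (after - before) * 100.0 / before
```

Because the IP-addresses in the monitored building are assigned statically, counting distinct addresses in this way approximates counting distinct client machines.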
The amount of traffic generated during flows was also measured. Figure 6 shows the total traffic, in MB, downloaded from and uploaded to SkyDrive and Dropbox storage servers. Two different timespans were used again, and they roughly correspond to the timespans in Figure 5. The amount of traffic generated by interacting with SkyDrive storage servers during the September/October timespan increased by approximately 178.8% compared to the measurements from June. This includes both uploaded and downloaded bytes. The amount of traffic generated by interacting with Dropbox storage servers decreased by approximately 14.0% during that same timespan.

5.5 Discussion
Table 4 lists the described features and briefly summarizes the findings of Sections 5.2 and 5.3. As written before, SkyDrive does not employ client-side data deduplication, delta updates or data compression, as opposed to Dropbox. Possible reasons for this, and these are conjectures only, include that the development team of SkyDrive was under the impression that the current state of the Internet provides enough bandwidth to handle the operation of the service in its current form. Also, Microsoft owns an infrastructure that provides a lot of storage space and bandwidth and can therefore offer SkyDrive in its current form. Dropbox, in contrast, has to squeeze out every single bit of bandwidth, as it pays Amazon S3 for the storage space it rents and for the bandwidth used to upload to these servers, which would explain why Dropbox employs various technologies to reduce the amount of traffic generated and the number of bytes stored.

Neither SkyDrive nor Dropbox employs geographical distribution of the user’s data on a world-wide scale, as both services store files in the United States. A reason for this could be that Microsoft’s infrastructure is based there, and they felt no need to distribute the data geographically. As explained, the distance packets have to travel influences the speed with which this happens, and thus the speed with which files can be uploaded to the service.

Table 4: Technologies and their presence in SkyDrive and Dropbox

                            Cloud Storage Provider
Technology                  SkyDrive     Dropbox
Client-Side Data Dedupl.    No           Yes
Delta Updates               No           Yes
Data Compression            No           Yes
Server Distribution         US           US
Storage Protocol            Via HTTPS    Via HTTPS

Future work in this field could be conducted on other providers of cloud storage, to determine whether other technologies have been deployed to enhance the performance of those services. The usage of the web interface to SkyDrive could also be investigated, to get a more thorough understanding of the service and how it performs compared to Dropbox. Finally, the way cloud storage services are being utilized by clients could be investigated, to gain a better understanding of the typical usage of cloud storage services.

6. RELATED WORK
As written before, [1] provides a thorough understanding of the Dropbox service. Its performance is clearly discussed in the paper. SkyDrive is also covered in that paper, where it is shown to be the second most popular cloud storage provider.

Another paper on the performance of cloud storage is [3]. In this paper Dropbox is discussed amongst three other cloud storage providers. The performance was measured while making and restoring an online backup. The methodology is very similar to the one in this research, but the SkyDrive service is used and examined in this research. Also, we address some features that enhance the performance of cloud storage providers.

A paper in which the optimization of cloud storage systems is discussed is [6]. This paper describes which factors influence the performance of cloud storage systems and current issues with existing services. These were used to understand the performance of Microsoft SkyDrive and compare it effectively with Dropbox.

In [4], [9] and [10] a couple of features that make cloud storage services perform more efficiently are introduced and discussed. These features were used in the literature survey on the performance comparison between SkyDrive and Dropbox.

7. CONCLUSIONS
We have performed an analysis of the SkyDrive application to gain an understanding of its internals. We have established that the service stores files over HTTPS using headers available in the Microsoft BITS service and maintains a local database of files stored online. As soon as a file is changed, the file is sent encrypted to the storage server. This answers the question of how SkyDrive administers and handles its files.

The measurements have also shown that the storage servers, with which most traffic in the SkyDrive service is exchanged, are all located in the United States. This does not differ from Dropbox, however, as neither service employs the strategy that Content Delivery Networks use to increase upload and download speed by deploying servers close to clients. This answers the question of how the servers in the SkyDrive service are distributed over the world.

Experiments conducted during this research have shown that SkyDrive does not employ client-side data deduplication, data compression or delta updates, as opposed to Dropbox. This answers the third research question.

From those three conclusions we conclude that the Microsoft SkyDrive service is inferior to Dropbox in terms of the presence of performance-enhancing features. The distribution of storage servers in SkyDrive is set up in the same way as in Dropbox. However, none of the performance-enhancing features that are available in Dropbox is available in SkyDrive. This results in quite some bandwidth being squandered by SkyDrive.

8. ACKNOWLEDGEMENTS
This paper has been written as part of the ‘Broadband for All’ track of the Bachelorreferaat course at the University of Twente, which is supervised by the ‘Design and Analysis of Communication Systems’-group (DACS) of the University of Twente. I would like to thank my supervisor, I. Drago, for his continuing support and insights during my work.

9. REFERENCES
[1] Drago, I., Mellia, M., Munafò, M.M., Sperotto, A., Sadre, R. and Pras, A. 2012. Inside Dropbox: Understanding Personal Cloud Storage Services. In Proceedings of the 12th ACM SIGCOMM Conference on Internet Measurement. IMC ’12. Pages 481-494. DOI=http://dx.doi.org/10.1145/2398776.2398827
[2] Fielding, R., et al. 1999. RFC2616 - Hypertext Transfer Protocol – HTTP/1.1. Available on http://www.ietf.org/rfc/rfc2616.txt
[3] Hu, W., Yang, T., Matthews, J.N. 2010. The good, the bad and the ugly of consumer cloud storage. ACM SIGOPS Operating Systems Review, Vol 44, Issue 3, July 2010, pages 110-115. DOI=http://dx.doi.org/10.1145/1842733.1842751
[4] Jones, M. T., 2010. Anatomy of a cloud storage infrastructure. IBM developerWorks. Available on http://www.ibm.com/developerworks/cloud/library/cl-cloudstorage/. Also available as PDF.
[5] Poese, I., Uhlig, S., Kaafar, M.A., Donnet, B., Gueye, B. 2011. IP geolocation databases: unreliable? ACM SIGCOMM Computer Communication Review, Vol 41, Issue 2, April 2011, pages 53-56. DOI=http://dx.doi.org/10.1145/1971162.1971171
[6] Spillner, J., Müller, J., Schill, A. 2012. Creating optimal cloud storage systems. Future Generation Computer Systems. 16 June 2012. DOI=http://dx.doi.org/10.1016/j.future.2012.06.004
[7] Tridgell, A., Mackerras, P. 1996. The rsync algorithm. Joint Computer Science Technical Report Series, TR-CS-96-05
[8] Vakali, A., Pallis, G. 2003. Content delivery networks: status and trends. IEEE Internet Computing, Vol 7, Issue 6, Nov-Dec 2003, pages 68-74. DOI=http://dx.doi.org/10.1109/MIC.2003.1250586
[9] Wang, L., et al. 2010. Cloud Computing: a Perspective Study. New Generation Computing, Vol 28, Issue 2, April 2010, pages 137-146. DOI=http://dx.doi.org/10.1007/s00354-008-0081-5
[10] Zeng, W., Zhao, Y., Ou, K., Song, W. 2009. Research on cloud storage architecture and key technologies. Proceedings ICIS ’09, pages 1044-1048. DOI=http://dx.doi.org/10.1145/1655925.1656114