TRANSPARENT ENCRYPTION FOR CLOUD-BASED SERVICES - Gergő Ládi - Dr. Levente Buttyán - HTE
Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics
Department of Networked Systems and Services

Gergő Ládi
TRANSPARENT ENCRYPTION FOR CLOUD-BASED SERVICES

ADVISOR
Dr. Levente Buttyán

BUDAPEST, 2017
Table of Contents
Összefoglaló (Summary)
Abstract
1 Introduction
1.1 Definitions
1.2 Problem Statement
1.3 Challenges
1.4 Outline
2 Goals, Tasks, Objectives, and Strategies
2.1 Goals
2.2 Tasks
2.3 Objectives and Strategies
3 Initial Research
3.1 Related Work
3.1.1 Publications
3.1.2 Similar Software
3.1.2.1 Boxcryptor
3.1.2.2 Cipherdocs
3.1.2.3 CloudFogger
3.1.2.4 SeaFile
3.1.2.5 Cryptomator
3.1.2.6 Tresorit
3.1.3 Summary of Related Work
3.2 Enumerating Potential Services for Encryption
4 Picking and Analysing Services
4.1 Detailed Analysis of Evernote's Communication Protocol
4.1.1 Protocol Analysis
4.1.2 Message Analysis
4.1.2.1 Initial Page Load
4.1.2.2 Reading Note Contents
4.1.2.3 Creating Notes
4.1.2.4 Editing Notes
4.1.2.5 Editing Reminders
4.1.2.6 Deleting Notes
4.1.3 Analysis Summary
4.2 Detailed Analysis of Google Calendar’s Communication Protocol
4.2.1 Protocol Analysis
4.2.2 Message Analysis
4.2.2.1 Initial Page Load
4.2.2.2 Dynamic Loading
4.2.2.3 Creating Events
4.2.2.4 Editing Events
4.2.2.5 Deleting Events
4.2.3 Analysis Summary
4.3 Quick Analyses
4.3.1 Dropbox
4.3.2 Dynalist
4.3.3 OneNote (Online)
4.3.4 SimpleNote
4.4 Analysis Summary
5 Designing a Transparent Encryption Layer
5.1 Intercepting Traffic
5.1.1 Hijacking DNS Queries
5.1.2 Proxying Connections
5.1.3 Handling Certificates
5.1.3.1 The “Problem” with Certificates
5.1.3.2 Becoming a Trusted Root Certificate Authority
5.1.3.3 Validating the Provider’s Certificate
5.2 Inspecting and Altering Traffic
5.3 Encrypting/Decrypting Messages
5.3.1 Key Management
5.3.2 Using Format Preserving Encryption
5.3.2.1 Format-Preserving Encryption for Text
5.3.2.2 Format-Preserving Encryption for Date and Time
5.4 Design Summary
6 Implementing a Prototype
6.1 Intercepting traffic
6.1.1 DNS Hijacking
6.1.2 Creating Certificates
6.1.3 Implementing the Proxy
6.2 Inspecting and Altering Traffic
6.3 Encrypting/Decrypting Messages
6.3.1 Key Management
6.3.2 Initialization Vectors
6.3.3 Format preserving encryption
6.3.3.1 Format-Preserving Encryption for Text
6.3.3.2 Format-Preserving Encryption for Date and Time
7 Testing the Prototype
7.1 Smoke Testing
7.1.1 Smoke Testing the DNS Hijacking Component
7.1.2 Smoke Testing the TLS Proxy
7.1.3 Smoke Testing the FPE Module
7.2 Unit Testing
7.2.1 Unit Testing the Filters
7.2.2 Unit Testing the FPE Module
7.3 Integration Testing
8 Conclusion
9 Further Considerations
9.1 Possible Threats
9.1.1 Ever-Changing APIs
9.1.2 New Security Measures
9.2 Plans for Improvement
9.2.1 Supporting Multiple Users
9.2.2 Usage in Enterprise Environments
9.2.3 More Services
9.2.4 Linux Compatibility
9.2.5 User Experience
References
Appendix
A. Table of Abbreviations
B. Table of Figures
C. Table of Exhibits
D. Exhibits
STUDENT DECLARATION
I, the undersigned Gergő Ládi, a final-year student, declare that I prepared this thesis myself without any unauthorized assistance. I used only the sources indicated (literature, tools, etc.). Every part that I have taken from another source, whether verbatim or rephrased with the same meaning, is clearly marked with a reference to its source. I consent to BME VIK publishing the basic data of this work (author(s), title, English and Hungarian abstracts, year of completion, name(s) of the advisor(s)) in a publicly accessible electronic form, and the full text of the work via the university's internal network (or to authenticated users). I declare that the submitted work and its electronic version are identical. For theses classified with the Dean's permission, the text of the thesis becomes accessible only after 3 years.
Dated: Budapest, 17 December 2017
Gergő Ládi
Összefoglaló (Summary)
According to this year's analyses, cloud-based services are becoming increasingly popular, both in corporate settings and among home users. Whether for file storage, e-mail, calendar and time management, note taking, or even password management, we readily use online services. From a security standpoint, this carries a number of risks: our data may be lost or damaged or, even worse, may fall into the hands of unauthorized parties. In the past few years, the news has reported dozens of security incidents. The victims included small and large, famous and lesser-known companies alike. In these incidents, the data of several hundred thousand users was often leaked and made public on the internet. This is dangerous not only because potentially confidential information (such as private messages or trade secrets) may fall into the hands of competitors, but also because the data may include passwords or password-equivalent data. With such data, an attacker may be able to access the victims' accounts with other services and obtain even more potentially sensitive data. One possible solution to this problem is transparent encryption. Its principle is that data is encrypted locally before being forwarded to the provider, so it reaches the provider only in encrypted form. Later, before the client processes the data, it is decrypted, again locally. This way, even if a provider falls victim to a break-in, or an attacker simply logs in with a username and password obtained elsewhere, the attacker sees only encrypted data that is worthless to them.
In this thesis, I choose a cloud-based service and analyse the communication protocol it uses. I then design and build a piece of software that secures the traffic sent to the provider by transparently encrypting the relevant messages.
Abstract
Recent surveys have shown that cloud services are becoming more and more popular, both in the enterprise sector and among individuals. Be it online file storage, e-mail, calendar and time management, note taking, or even password management, we rely heavily on online services. From a security standpoint, this poses several risks: data might be lost or corrupted or, even worse, accessed by unauthorized individuals. In the past few years, dozens of security incidents were covered in the news, hitting big and small, little-known and famous companies alike. Many of these breaches resulted in several hundred thousand user records being leaked and made available on the internet, some exclusively on the black market, others to the general public. This is dangerous not only because potentially confidential information (such as private messages or trade secrets) might get into the hands of competitors, but also because user records may contain passwords or password equivalents. Using such information, an adversary might be able to get into the victims' accounts for other services and gain access to even more potentially sensitive information.
One possible solution to this issue is transparent encryption. Its principle is to encrypt information locally, before it is sent to the cloud service provider, where it is stored in encrypted form. Upon reception, the information is decrypted before the local client processes it. This way, even if the cloud service provider itself is compromised, or is accessed using stolen credentials, the attacker can obtain nothing but encrypted pieces of information. In this thesis, I choose a cloud service and analyse the communication protocol it uses. I then design and implement a piece of software that performs transparent encryption by identifying and modifying the relevant messages in transit.
1 Introduction
1.1 Definitions
For the purposes of this document, unless otherwise noted, the terms cloud service, cloud service provider, service provider, and provider refer to a service that is available through the internet and lets its users upload and store user data, as well as to the company that provides that service. User data refers to documents, images, and other files, as well as personal information (including but not limited to names, addresses, telephone numbers, and dates of birth) that the provider does not need to know in order to fulfil its purpose. For example, in a web shop, a shopper's address and telephone number are not considered user data, as they are needed to ensure delivery. In a contact manager application, however, they are.
1.2 Problem Statement
Over the past couple of years, public cloud-based services have gained ground over traditional self-hosted or purely local (serverless) solutions. This shift can be observed not only in the enterprise sector, but also among home users. Gartner's forecast titled Public Cloud Services, Worldwide, 2014-2020 corroborates these observations. It adds that this process is not expected to stop in the following years, although its pace may begin to slow from 2020 [1]. A publication by RightScale points out that, as of 2017, a typical respondent uses 1.8 public cloud services on average and is experimenting with a further 1.8 [2]. This growing interest led several existing companies to adapt their software and services to the cloud, and new companies to enter the market. These companies promise easy-to-use applications that give access to your information regardless of which of your devices you are using. Services appeared for online file storage and synchronization, calendar management, image sharing, note taking, and even password management.
Security analyses were often skipped, either to speed up development in order to be first on the market, or simply to cut costs. As was later revealed, the lack of security measures and improper security design were the two main causes of most data breaches in recent years [3]. These breaches affected big and small companies alike and cost them reputation, revenue, and their users' trust. To make matters worse, some of these incidents go undiscovered for several months, or even years. Using public cloud services carries five kinds of risk:
1) Data loss: If a provider ceases operations without notifying its clients, they lose their files unless they have other copies. An example of this is the sudden closure of MegaUpload, a file storage and sharing service, in 2012.
2) Direct data theft: If a provider that stores potentially sensitive information (e.g. files, images, or notes) is breached, unauthorized individuals could access this information.
3) Indirect data theft: If a site or service is breached where the user has not stored anything valuable, the attacker may still gain access to the users' e-mail addresses and passwords or equivalent authentication information (such as password hashes). If the users used the same usernames and passwords for accounts with other providers, the attackers can log in there and access the users' sensitive information, even if those services themselves were reasonably secure.
4) Insider access: The provider itself may access the users' sensitive information without their knowledge or permission. This could be a malicious employee, or a system that mines user data for features deemed interesting in order to build user profiles that are sold or otherwise used for profit.
5) Nation-state attackers: The provider may be forced to, or may decide to, hand over user data to nation states or law enforcement agencies, which could put the users' lives at risk. Examples include some Arab countries, where the internet is heavily regulated and people whose views oppose the ruling party are often persecuted.
The first risk may be eliminated by having a proper backup plan in place, such as the 3-2-1 strategy¹. The remaining risks could also be eliminated, or at least greatly reduced, by employing transparent encryption. The principle of transparent encryption is as follows. Data that a client application would send over the network unencrypted is first intercepted on the client computer (or on a trusted device on the home network), where a separate piece of software encrypts it. The provider then receives and stores this encrypted data. When the client application later needs the data, it requests it from the server, which sends it back to the client, still encrypted. Before the client application can process the data, the same software intercepts it again and decrypts it on the fly. Finally, the client application receives the data unencrypted, just as it expects, and processes it as needed. If an attacker obtains a file encrypted this way (in any of the manners detailed above), he gains nothing but a meaningless blob of data. This approach can be extended beyond files to other kinds of potentially sensitive information, such as text fields, dates, or credit card numbers. Transparent encryption has the advantage that neither the client application nor the server-side code has to be modified in any way. It can therefore be used even if the service provider does not support such extra security measures, and the provider does not have to spend resources implementing them. In addition, if independent developers implement the encryption layer, the provider does not have to be trusted, which matters because it might otherwise implement intentionally weak or flawed encryption, or leave backdoors.
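The following is a minimal Python sketch of the interception flow described above. It assumes that each message is a flat dictionary of string fields and that the sensitive field names are already known. `TransparentLayer`, `outgoing`, `incoming`, and the field names `title` and `body` are all hypothetical. Python's standard library has no AES, so this sketch substitutes HMAC-SHA256 in counter mode as the keystream and adds an HMAC tag (encrypt-then-MAC). This is an illustrative stand-in only; a real layer would use a vetted authenticated cipher such as AES-GCM.

```python
import base64
import hashlib
import hmac
import secrets

NONCE_LEN = 16
TAG_LEN = 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, standing in for a real stream cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


class TransparentLayer:
    """Hypothetical filter between the client application and the provider."""

    def __init__(self, master_key: bytes, sensitive_fields: set):
        # Derive separate keys for encryption and for authentication.
        self.enc_key = hmac.new(master_key, b"enc", hashlib.sha256).digest()
        self.mac_key = hmac.new(master_key, b"mac", hashlib.sha256).digest()
        self.sensitive_fields = set(sensitive_fields)

    def _encrypt(self, plaintext: str) -> str:
        nonce = secrets.token_bytes(NONCE_LEN)  # fresh random nonce per value
        data = plaintext.encode("utf-8")
        ks = _keystream(self.enc_key, nonce, len(data))
        ct = bytes(a ^ b for a, b in zip(data, ks))
        tag = hmac.new(self.mac_key, nonce + ct, hashlib.sha256).digest()
        return base64.b64encode(nonce + ct + tag).decode("ascii")

    def _decrypt(self, token: str) -> str:
        raw = base64.b64decode(token)
        nonce, ct, tag = raw[:NONCE_LEN], raw[NONCE_LEN:-TAG_LEN], raw[-TAG_LEN:]
        expected = hmac.new(self.mac_key, nonce + ct, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("ciphertext was modified or the key is wrong")
        ks = _keystream(self.enc_key, nonce, len(ct))
        return bytes(a ^ b for a, b in zip(ct, ks)).decode("utf-8")

    def outgoing(self, message: dict) -> dict:
        """Client -> provider: encrypt only the sensitive fields."""
        return {k: self._encrypt(v) if k in self.sensitive_fields else v
                for k, v in message.items()}

    def incoming(self, message: dict) -> dict:
        """Provider -> client: decrypt the same fields on the way back."""
        return {k: self._decrypt(v) if k in self.sensitive_fields else v
                for k, v in message.items()}


# Usage: a dict plays the role of the provider's storage.
layer = TransparentLayer(secrets.token_bytes(32), {"title", "body"})
provider_storage = {}
note = {"id": "42", "title": "Passwords", "body": "bank PIN is 1234"}
provider_storage[note["id"]] = layer.outgoing(note)  # what the provider stores
restored = layer.incoming(provider_storage["42"])    # what the client receives
print(provider_storage["42"]["body"])  # base64 blob, unreadable to the provider
print(restored == note)                # True
```

Neither the "client" (the `note` dictionary) nor the "provider" (the storage dictionary) is aware of the layer, which is the point of transparency. The stored value no longer resembles the original text, though: it is longer and uses the base64 alphabet. This matters whenever the provider validates field formats.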
Furthermore, because transparent encryption is implemented as a separate piece of software, it can be licensed under a different model than the original software. This is advantageous because the encryption software can be made open source even if the underlying client application is closed source.
¹ The 3-2-1 strategy: keep 3 copies of your critical data, stored on at least 2 different kinds of media (e.g. two on hard disks, one on an optical disk), with 1 copy kept off-site.
1.3 Challenges
When designing and implementing a transparent encryption layer, one may face several challenges. The most crucial part is analysing how the target service works and how the client application communicates with the web service. This is followed by designing encryption methods whose output passes any validation checks the provider makes.
Firstly, if the communication channel itself is encrypted, the encryption method has to be understood, circumvented, and then reimplemented in the transparent encryption layer. If the service uses a non-standard or proprietary encryption mechanism, these steps may turn out to be rather time-consuming.
Secondly, in addition to being encrypted, the channel may also be authenticated or otherwise tamper-proofed. This, again, needs to be circumvented, and may even make it impossible to create a truly transparent proxy.
Thirdly, the protocol spoken by the parties also has to be understood, at least partially. It may turn out to be a custom, undocumented, proprietary binary protocol, which could take several days to reverse engineer.
Fourthly, some legal agreements or national laws may not permit the disassembly and/or analysis of the client application.
Finally, in certain cases, the cloud service expects the data to conform to a certain format and/or to fall within a specific range of values. If encryption is used, the ciphertext will most likely not meet these requirements.
1.4 Outline
In the following sections, I will choose and introduce a cloud-based service, find a way of intercepting its messages, and analyse them. I will then design, implement, and test a prototype that can perform transparent encryption on the relevant messages. In section 2, I will explain in detail the problem to be solved and the goals to be reached. In section 3, I will research and give an overview of similar works and results, and identify services that could be candidates for this project. In section 4, I will analyse at least one cloud-based service, its channels of communication, and its message types, then identify which of these (or which parts of them) need to be encrypted to ensure the security of user data. In section 5, I will design a system that can intercept and encrypt/decrypt the messages identified in the previous section. Afterwards, in section 6, I will implement a prototype based on this design, and in section 7, I will test the prototype. In section 8, I will summarize my work. Finally, in section 9, I will explain how this prototype could be further improved and list possible issues that might make such a solution difficult to use or maintain.
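The last challenge in section 1.3, ciphertext that no longer satisfies the provider's format or range checks, is what format-preserving encryption (FPE) addresses. The Python sketch below is a hedged illustration and not necessarily the scheme used later in this thesis. It builds a 10-round Feistel network over decimal digit strings with HMAC-SHA256 as the round function. Its structure resembles NIST FF1, but it is not FF1 and has not been reviewed. It also uses cycle walking to keep an encrypted date inside a fixed window. All function names, the base date, and the 100-year span are hypothetical.

```python
import datetime
import hashlib
import hmac

DIGITS = "0123456789"


def _split(digits: str, rounds: int) -> tuple:
    """Validate the input and return the lengths (u, v) of the two halves."""
    if len(digits) < 2 or any(ch not in DIGITS for ch in digits) or rounds % 2:
        raise ValueError("need at least 2 decimal digits and an even round count")
    u = len(digits) // 2
    return u, len(digits) - u


def _round_value(key: bytes, rnd: int, half: str, n: int) -> int:
    """Round function: HMAC-SHA256 over (total length, round index, half)."""
    msg = f"{n}|{rnd}|{half}".encode("ascii")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def fpe_encrypt_digits(key: bytes, digits: str, rounds: int = 10) -> str:
    """Encrypt a decimal digit string into another digit string of equal length."""
    u, v = _split(digits, rounds)
    a, b = digits[:u], digits[u:]
    for i in range(rounds):
        m = u if i % 2 == 0 else v  # length of the half being replaced
        c = (int(a) + _round_value(key, i, b, len(digits))) % 10 ** m
        a, b = b, str(c).zfill(m)
    return a + b


def fpe_decrypt_digits(key: bytes, digits: str, rounds: int = 10) -> str:
    """Invert fpe_encrypt_digits by running the rounds in reverse order."""
    u, v = _split(digits, rounds)
    a, b = digits[:u], digits[u:]
    for i in reversed(range(rounds)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _round_value(key, i, a, len(digits))) % 10 ** m
        a, b = str(c).zfill(m), a
    return a + b


# Dates: map a day to an index in a fixed window, then "cycle walk" so the
# encrypted index (and hence the encrypted date) stays inside the window.
BASE = datetime.date(2000, 1, 1)  # hypothetical start of the valid range
SPAN = 36525                      # hypothetical window of about 100 years
WIDTH = len(str(SPAN - 1))        # 5 digits for indices 0..36524


def _walk(key: bytes, day: datetime.date, step) -> datetime.date:
    x = (day - BASE).days
    if not 0 <= x < SPAN:
        raise ValueError("date outside the supported range")
    while True:  # re-apply the permutation until the result falls in range
        x = int(step(key, str(x).zfill(WIDTH)))
        if x < SPAN:
            return BASE + datetime.timedelta(days=x)


def encrypt_date(key: bytes, day: datetime.date) -> datetime.date:
    return _walk(key, day, fpe_encrypt_digits)


def decrypt_date(key: bytes, day: datetime.date) -> datetime.date:
    return _walk(key, day, fpe_decrypt_digits)


key = b"demo key, not for real use"
phone = "0612345678"  # hypothetical 10-digit field
enc_phone = fpe_encrypt_digits(key, phone)
print(enc_phone, fpe_decrypt_digits(key, enc_phone) == phone)
print(encrypt_date(key, datetime.date(2017, 12, 17)))
```

A digit string stays a digit string of the same length, and a date stays a valid date inside the chosen window, so simple server-side validation would accept both. The trade-off is that this sketch is deterministic. It uses no random IV, so equal plaintexts always produce equal ciphertexts.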
2 Goals, Tasks, Objectives, and Strategies
2.1 Goals
Goals describe what must be achieved for the project to be considered successful. As defined in the thesis assignment, there are six top-level goals:
1) Choosing a cloud-based service that can be used to demonstrate transparent encryption
2) Analysing the communication protocol used by the service
3) Identifying the relevant protocol elements that should be encrypted in order to provide confidentiality
4) Identifying a transparent encryption scheme that could be used to provide confidentiality and, if needed, adapting it to better suit the task at hand
5) Designing and implementing the encryption layer, and integrating it with the client-side application
6) Testing the implementation and summarizing the results
In addition to the above, I have added an extra goal as a zeroth step: researching similar or related existing solutions.
2.2 Tasks
Tasks are breakdowns of goals.
Goal 1) can be broken down into two main tasks:
a) Making a list of potential services, possibly with the help of search engines and peers
b) Choosing a service that is expected to use a limited set of relatively simple messages. Ideally, this service should not be completely unknown to me.
Goal 2) consists of four distinct tasks:
a) Identifying the transport layer (OSI² layer 4) protocol used by the application
b) Identifying whether the application uses a well-known higher-layer (e.g. OSI layer 7) protocol
c) Identifying whether the protocol messages are encrypted and/or integrity-protected
d) Identifying the structure of the protocol messages (if any)
Goal 3) comprises the following three tasks:
a) Identifying the data types that need to be protected (e.g. files, text, images, phone numbers)
b) Identifying the messages in which these are transmitted
c) Identifying the location of these data elements within the messages
Goal 4) includes another three tasks:
a) Enumerating the algorithms that could be used
b) Choosing the most suitable group of algorithms
c) Identifying how the keys and other necessary parameters will be managed
Goal 5) can be broken down into another three:
a) Planning the architecture of the prototype, defining its components and how they interact with each other
b) Choosing a paradigm and a language for implementing the prototype
c) Implementing each component in the chosen language and paradigm
Finally, goal 6) includes the following four tasks:
a) Determining the type and number of tests needed
b) Planning the tests and writing test cases
c) Performing the tests
d) Evaluating the results and drawing conclusions
In addition, the zeroth goal mentioned earlier consists of two tasks:
a) Looking for related publications in conference archives and publication indexing services
b) Using search engines to find related software
² Open Systems Interconnection model: an abstract model that can be used to describe the means and methods of network communication between hosts. It consists of 7 layers, each responsible for a different role, such as physical or logical addressing, or the retransmission of lost data.
2.3 Objectives and Strategies
Objectives define the deadlines, while strategies define the order of completion. In my case, the tasks are mostly linear, with each task depending on the previous ones.
• The literature research phase should be done first to avoid duplicating existing work. It is expected to take one or two weeks, depending on the findings.
• Goal 1) must be completed next, as all other goals depend on it. This should take at most one week.
• Goal 2) must follow, as no other goal can be started at this point. This task is expected to take one week.
• Goals 3) and 4) may be done in parallel, although it would be preferable to finish goal 3) before starting goal 4). These are expected to take two and three weeks, respectively.
• Goal 5) depends on all previous goals and is the only one that can be completed next. It is expected to take four weeks.
• Finally, goal 6) can be completed. This should take two weeks.
3 Initial Research
3.1 Related Work
3.1.1 Publications
I searched the IEEE Xplore Digital Library and Google Scholar, two of the biggest research databases. According to these searches, over 2,000 papers related to both cloud and encryption have been published to date. Approximately 200 papers are connected to transparent encryption, while only a fifth of those are also related to cloud computing. Solutions exist for specific use cases, such as transparently encrypting data stored in a local file system [4][5], in MongoDB databases [6], or in HDFS³ [7][8], or data transmitted between virtual machines and their hosts [9][10]. Application Layer Encryption for Cloud by Saxena et al. [11] describes a similar, but not necessarily transparent, method that the authors call user-layer encryption.
In 2012, Diallo et al. published CloudProtect [12], a middleware written in Java. It can transparently encrypt certain data fields in Google Docs documents and Google Calendar items, while also making it possible to share encrypted items with others. However, the solution presented in this paper only seems to consider unencrypted HTTP⁴ sessions, which are becoming less and less common.
More recently, in 2017, Newport et al. described a system in their paper A Secure Cloud Storage System for Small and Medium Enterprises [13]. This system can encrypt files on the fly and then store them in Dropbox, a cloud-based file storage service. The paper does not specify the exact details of the method. From the examples, however, I would infer that the solution creates a local folder and watches for changes in the files and folders within it. It then encrypts the changed files and puts them in an actual Dropbox folder, from where they are uploaded to the cloud.
³ Hadoop Distributed File System: a distributed file system that stores data on commodity machines and is often used in clustered environments.
⁴ Hypertext Transfer Protocol: a text-based application layer (OSI L7) protocol that is most commonly used by browsers to access web-based content.
3.1.2 Similar Software

The next phase of my research consisted of discovering software similar in functionality to what I aim to achieve within this thesis.

3.1.2.1 Boxcryptor

The first solution I found was Boxcryptor. It is closed-source software that creates a virtual drive for secure storage. When a file is written to this drive, it is encrypted, then stored in one of the supported cloud storage services (this approach is often called the overlay method, as it overlays an existing solution). The free version only supports Dropbox as its back-end, while the paid version also supports Google Drive, OneDrive, Box, Cubby, and several others [14]. It uses AES5 with a key length of 256 bits in CBC6 mode to encrypt files, with each file being encrypted with a different key. The AES keys are encrypted with the user's 4096-bit RSA7 key and are appended to the files [15]. Boxcryptor also supports master keys and sharing files with other users or groups. It works on Windows and Linux, as well as on several mobile platforms.

3.1.2.2 Cipherdocs

The next solution I came across was Cipherdocs. It is an open-source project, available on GitHub8. According to the project documentation [16], it works with most cloud storage solutions, including Dropbox, Google Drive and OneDrive. Instead of the overlay method, it changes the extensions of the encrypted files to .gpg; when such a file is opened, it is decrypted into a temporary (local) folder, then re-encrypted and moved back if it has changed. It uses an OpenPGP9 implementation to encrypt files. As it is designed for single-user mode, sharing encrypted files is not supported. It works on Windows only.

3.1.2.3 CloudFogger

A third provider I intended to check out was CloudFogger, but as of 4 Oct 2017, only a notice appears on their website10, stating that the service is no longer available and recommending that users try Boxcryptor instead.

3.1.2.4 SeaFile

The next solution investigated was SeaFile, an open-source, Git11-based storage system. It is self-hosted, meaning that the server has to be installed, and the storage space provided, by the user, which requires more knowledge of computer systems than the previously introduced solutions. It supports sharing files among users and groups [17]. Neither the website nor the project's GitHub page mentions any specifics of the encryption process. It supports Windows and Linux.

3.1.2.5 Cryptomator

Another application I tried was Cryptomator. It is free and open-source, and acts as an overlay above Dropbox, Google Drive, and other services, just like Boxcryptor. It uses AES with 256-bit keys, and encrypts file names as well as the folder structure. Written in Java, it works on both Windows and Linux.

3.1.2.6 Tresorit

Last but not least, I checked out Tresorit. It turned out to be different: it is a stand-alone service with its own client, and it does not overlay an existing service. It supports two-factor authentication, keeps a version history, and the more expensive plans include file sharing. It uses AES-256 for encryption. It is closed-source and has no free plan.

5 Advanced Encryption Standard: a commonly used symmetric encryption algorithm for which no feasible attacks are known as of today.
6 Cipher Block Chaining: a mode of operation of a block cipher in which the input of the nth encryption is not just the nth plaintext block, but the nth plaintext block XORed (bitwise eXclusive OR) with the ciphertext of the (n-1)th block. For the first block, the plaintext block is XORed with an initialization vector.
7 Rivest-Shamir-Adleman: a commonly used asymmetric encryption algorithm whose security is based on the hardness of the factoring problem (factorizing the product of two large primes).
8 GitHub: a development platform that makes it easy to share code, enable other programmers to propose changes to the code, track issues, publish releases, automate testing, and manage projects.
9 An e-mail encryption framework based on Phil Zimmermann's software titled Pretty Good Privacy (PGP).
10 https://www.cloudfogger.com
11 A distributed version control system for files, often used by programmers.
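Footnote 6 describes in words the CBC mode that Boxcryptor uses with AES (Section 3.1.2.1). Written as equations, with P_i the ith plaintext block, C_i the ith ciphertext block, E_K and D_K the block cipher's encryption and decryption under key K, and IV the initialization vector, the mode can be summarized as follows (standard notation, not taken from the Boxcryptor documentation):

```latex
\begin{aligned}
C_0 &= IV \\
C_i &= E_K\left(P_i \oplus C_{i-1}\right), \qquad i = 1, \dots, n \\
P_i &= D_K\left(C_i\right) \oplus C_{i-1}, \qquad i = 1, \dots, n
\end{aligned}
```

Since each ciphertext block depends on all preceding plaintext blocks, encryption is inherently sequential, whereas decrypting block i only requires C_i and C_{i-1}.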
3.1.3 Summary of Related Work

Based on the above, it can be concluded that the need for encryption in cloud-based services has already been recognized, resulting in several publications and implementations.

As for the papers, while many are related to my project, some of them employ non-transparent methods, some focus on enterprise-level solutions rather than home users, and some only describe what should be done, but not how. It is apparent, however, that by combining existing proposals and extending the result with some of my own ideas, it is possible to build a system that solves the problems detailed in the problem statement.

As for the implementations, none of them are truly transparent, as they all require the end user to change how they use the underlying cloud service (for example, by having to store and manage files on a new virtual drive instead of in Dropbox). Furthermore, all of the solutions seem to focus on securing files, and there are no well-known implementations that secure note-taking applications or calendars.

3.2 Enumerating Potential Services for Encryption

I used three different approaches to find cloud-based services that could be made more secure using transparent encryption. First, I listed the services that I currently use, have used before, or have at least heard about. Second, I asked some of my colleagues and friends which services they used. Third, I used three different search engines, Google, Yahoo and Bing, querying typical search terms such as online file storage, online calendar, note management, and alternatives, to see if I had missed anything. After compiling the lists from all the sources, I ended up with Table 3.1. The table shows the services found in alphabetical order, the type of each service, and the source of each entry.
Service              Type                   Source
Apple Cloud          file storage           colleagues
Box.com              file storage           heard about
Dropbox              file storage           used before
Dynalist.io          note/task management   Yahoo
Evernote             note management        used before
Google Calendar      calendar management    used before
Google Drive         file storage           used before
Google Keep          calendar management    heard about
Note Taking Express  note management        Bing
OneDrive             file storage           using it
OneNote (online)     note management        colleagues
Outlook (online)     calendar management    colleagues
SimpleNote           note management        Google
SpiderOak            file storage           colleagues
Sync.com             file storage           Google
Zoho Notebook        note management        Google

Table 3.1 – A list of services that could potentially be encrypted transparently
4 Picking and Analysing Services

Having found that there are several solutions that focus on securing online file storage services, but none that aim to secure calendar applications and note-taking services, I have chosen to focus on these latter two categories. Even though the assignment description only requires me to analyse one service, if time permits, I would like to examine more of them in order to gain insight into current trends. I expect this to help me design the transparent encryption layer more efficiently later on.

The first service I chose to analyse was Evernote. Evernote, as its name suggests, is a note-taking application that also supports reminders, task lists, and interactive content. Even though Evernote is not the most popular service on the list, I have previously worked with its API12 as part of a related project. The API is not exactly what I would call simple, but being at least somewhat familiar with it makes Evernote an ideal candidate to start with.

4.1 Detailed Analysis of Evernote's Communication Protocol

4.1.1 Protocol Analysis

Evernote has a web-based, a desktop, and a mobile client. This makes it logical to assume that all of these clients use a common API, which is likely to be web-based. To confirm this hypothesis, I opened the Firefox web browser, loaded the landing page (https://www.evernote.com), then pressed F12 to show the browser's Developer Tools window. Although it may be called differently, all modern browsers have this feature today. It makes it possible to inspect and manipulate the DOM13 of websites, run or inject arbitrary JavaScript code locally, perform benchmarks, test the layout with various screen sizes and aspect ratios, and, of course, take a peek at what is being sent over the network (see Figure 4.1).

12 Application Programming Interface: a set of data structures and function declarations that describe how a service that implements this API may be consumed (interacted with).
13 Document Object Model: a model that treats an (X)HTML document as a tree, where each node represents an element in the document.
Figure 4.1 – Evernote's messages in Firefox's Developer Tools

It can be seen that the web client of Evernote uses HTTP over SSL14/TLS15 over TCP16 to send and receive messages. To further confirm the hypothesis, I downloaded and installed the Evernote application from the Play Store, Google's official application store. After installation, I made sure that my Android-based phone was connected to the same Wi-Fi network as my laptop, then opened Wireshark, a network packet capture and analysis application. After logging in to Evernote on the phone, I could see in Wireshark that it was connecting to the same IP address as the browser, and that it was also using TLS. I could not see what was being transmitted, but this is expected, since the traffic itself is encrypted.

With the hypothesis confirmed, I proceeded with the analysis. As part of it, I logged in, created two notes, set their titles and contents, added a reminder to one of them, changed the contents, changed the date of the reminder, created a third note, deleted it, and then examined the results:
• For all of these actions, AJAX17 calls are made to the Evernote API. The HTTP method (verb) used for these calls is always POST, regardless of whether the action is a read, update, or delete operation. From this, it can be concluded that the Evernote API is not a RESTful18 API.
• The endpoint for the relevant messages is always either https://www.evernote.com/shard/s###/enweb/notestore or https://www.evernote.com/shard/s###/enweb/notestore/ext, where ### is a three-digit number, possibly referring to the specific server on which my session exists.
• The response always has the MIME19 type text/json, although the response body is never valid JSON20. It always begins with two forward slashes and the letters O and K (i.e. //OK). In JavaScript, // starts a comment, but comments are not valid in JSON. This is most likely a security measure against object hijacking attacks via script inclusion: if the response is included maliciously, the two forward slashes make the entire response behave like a comment. Client implementations can simply ignore the first four characters of each response and parse the rest as a JavaScript array literal. Strictly speaking, even the remainder is not valid JSON, since string values may be enclosed in single quotes (e.g. 'VjcO6_o') and special characters are escaped with \x sequences (e.g. \x3C for <), so a somewhat more lenient parser is needed.

14 Secure Sockets Layer: a set of cryptographic protocols that aim to secure communication channels. Typical services include encryption, integrity protection and authentication.
15 Transport Layer Security: a (more secure) successor of SSL.
16 Transmission Control Protocol: a connection-oriented transport layer (OSI L4) protocol that offers reliability via the retransmission of lost segments. It also reorders out-of-order segments and supports flow control.
17 Asynchronous JavaScript and XML: a means of performing asynchronous calls in web applications, typically used to load page contents dynamically without having to reload the entire page.
18 Representational State Transfer: a REST (or RESTful) API is a stateless API that identifies the resource to be queried or manipulated in the URL, and uses HTTP verbs to specify the action to be carried out on the target resource.
19 Multipurpose Internet Mail Extensions: originally designed to describe the types and contents of files that are sent as e-mail attachments, MIME has been adopted by other protocols as well.
20 JavaScript Object Notation: a JavaScript-like data format that is commonly used in modern web APIs, especially object-oriented ones.
• The request body always consists of several tens of data fields separated by pipe symbols (|). Some of the fields contain values such as java.lang.String/2004016611, hinting that they may be Java objects serialized into strings. It took me a while to identify the framework that serializes data like this: it is GWT21, and the protocol being spoken is the GWT-RPC22 wire protocol. While there is no official documentation for it, the protocol was reverse-engineered in 2012 [18]. Even though GWT-RPC parsers are only available in Java, this does not limit the languages that can be used to implement a transparent proxy for Evernote, since there is no need to parse and interpret every field in a message, only specific ones that are always in the same position.
• The messages have no replay or integrity protection. I was able to replay an API call using Firefox's Edit and Resend function: I changed several letters in the content of a note that was to be edited, then replayed the message. The server responded with 200 OK, and after a reload, the content of the note had been successfully modified.

4.1.2 Message Analysis

4.1.2.1 Initial Page Load

When the site is first loaded after logging in, a call to the findNotesMetadata function is issued. As shown below, this function returns a list of the notes owned by the logged-in user.

7|0|10|https://www.evernote.com/focusclient/|78E137D3512D195F071EB9037436542A|com.evernote.web.shared.GWTNoteStoreInterface|findNotesMetadata|com.evernote.edam.notestore.NoteFilter/3387378272|I|com.evernote.edam.notestore.NotesMetadataResultSpec/2285571585|[Z/1413617015|Etc/GMT-1||1|2|3|4|4|5|6|6|7|5|8|4|1|1|1|0|0|0|0|0|0|0|2|0|9|10|0|50|7|8|11|1|0|1|1|0|0|1|0|1|1|0|1|0|1|0|1|0|1|0|1|0|1|

This request does not contain anything sensitive, but it shows that the version of the GWT-RPC protocol being used is 7 (the first field). The response looks as follows:

//OK[150,2,0,0,0,'VjcO6_o',0,11,0,8,0,0,10,'A','VjcO30g',0,'A',0,0,0,0,'A','A','A','A',0,0,0.0,0.0,0,0,0,0,0,0,6,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,12,2,5,0,0,0,1,1,0,6,2,4,'VjcPCUY',0,9,0,8,0,0,7,'A','VjcO$Kw',0,'A',0,0,0,0,'A','A','A','A',0,0,0.0,0.0,0,0,0,0,0,0,6,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,12,2,5,0,0,0,1,1,0,6,2,4,2,3,1,1,1,3,2,1,["com.evernote.edam.notestore.NotesMetadataList/3192047339","[Z/1413617015","java.util.ArrayList/4159755760","com.evernote.edam.notestore.NoteMetadata/2574771094","com.evernote.edam.type.NoteAttributes/74627218","me4626","b1dc0239-b0b1-47ee-a17e-b69b66ba70e4","My second note","9a8dddfd-0d87-4e7c-b81d-086a5aa3f9ee","My first note"],0,7]

Here, I can see that I have two notes, each title preceded by what appears to be a GUID23. From this, I can infer that notes are identified by GUIDs in the system. One has the GUID b1dc0239-b0b1-47ee-a17e-b69b66ba70e4 and the title My second note, the other the GUID 9a8dddfd-0d87-4e7c-b81d-086a5aa3f9ee and the title My first note. The note titles are of interest here, since they may contain sensitive information.

4.1.2.2 Reading Note Contents

The contents of a note are retrieved using the getHtmlNoteContent function:

7|0|11|https://www.evernote.com/focusclient/|304280BD765DCDBF4A79609E10928B10|com.evernote.web.shared.GWTNoteStoreExtensions|getHtmlNoteContent|java.lang.String/2004016611|java.util.List|S|Z|b1dc0239-b0b1-47ee-a17e-b69b66ba70e4|java.util.ArrayList/4159755760|/shard/s652/res/|1|2|3|4|5|5|6|5|7|8|9|10|0|11|-1|0|

The request contains the GUID of the note whose contents should be retrieved, while the response carries the contents of the note:

//OK[1,["\x3Cbody class\x3D\"ennote\"\x3E\x3Cdiv\x3ESecond note content.\x3Cbr clear\x3D\"none\"/\x3E\x3C/div\x3E\x3C/body\x3E"],0,7]

21 Google Web Toolkit: an open-source framework that makes it possible to create JavaScript-based web applications in Java.
22 Remote Procedure Call: a means of implementing inter-process communication, in which the caller can invoke methods on remote servers, then process the results as if they were the results of local function calls.
23 Globally Unique Identifier: a 128-bit number that is supposed to uniquely identify an object. GUIDs are usually represented as 32 hexadecimal digits in groups of 8-4-4-4-12, e.g. 12345678-90ab-cdef-1234-567890abcdef.
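A transparent proxy for Evernote would have to handle both kinds of messages shown so far: pipe-separated requests that carry a string table, and //OK-prefixed responses that are not strict JSON. The Python sketch below shows one possible way to split them up; it is not the method used later in this thesis. It assumes the layout inferred from the captured messages: a request starts with the protocol version, a flags field and the number of strings, followed by the string table and a payload whose tokens refer to strings by 1-based index (the third and fourth tokens name the service interface and the method in every captured request); a response ends with the string table, the flags and the version. All function names are hypothetical, and the parser assumes that no string contains a literal | and that there is no whitespace between tokens.

```python
import re

# Minimal sketch, not a complete GWT-RPC implementation. The message layout is
# inferred from the captured Evernote traffic; all names are hypothetical.

_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|.)", re.S)
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_request(body: str) -> dict:
    """Split a request body into header, string table and payload tokens."""
    fields = body.split("|")  # assumes no string contains a literal '|'
    version, flags, count = int(fields[0]), int(fields[1]), int(fields[2])
    strings = fields[3:3 + count]
    payload = fields[3 + count:]
    if payload and payload[-1] == "":
        payload.pop()  # the body ends with a trailing '|'
    # Payload tokens 3 and 4 are 1-based indices of the service and method names.
    return {
        "version": version,
        "flags": flags,
        "strings": strings,
        "payload": payload,
        "service": strings[int(payload[2]) - 1],
        "method": strings[int(payload[3]) - 1],
    }


def _unescape(s: str) -> str:
    # Decode JavaScript-style escapes such as \x3C (hex), \u003C and \".
    def repl(m):
        esc = m.group(1)
        if len(esc) > 1:  # \xHH or \uHHHH
            return chr(int(esc[1:], 16))
        return {"n": "\n", "r": "\r", "t": "\t"}.get(esc, esc)
    return _ESCAPE.sub(repl, s)


def _parse_value(s: str, i: int):
    """Parse an array, a quoted string or a number at s[i]; return (value, next index)."""
    if s[i] == "[":
        items, i = [], i + 1
        while s[i] != "]":
            item, i = _parse_value(s, i)
            items.append(item)
            if s[i] == ",":
                i += 1
        return items, i + 1
    if s[i] in "'\"":  # both single- and double-quoted strings occur
        quote, j = s[i], i + 1
        while s[j] != quote:
            j += 2 if s[j] == "\\" else 1  # skip over escaped characters
        return _unescape(s[i + 1:j]), j + 1
    m = _NUMBER.match(s, i)
    if m is None:
        raise ValueError(f"unexpected character {s[i]!r} at position {i}")
    text = m.group()
    return (float(text) if "." in text else int(text)), m.end()


def parse_response(body: str) -> dict:
    """Drop the //OK prefix, then split off the string table, flags and version."""
    if not body.startswith("//OK"):
        raise ValueError("not a successful GWT-RPC response")
    data, _ = _parse_value(body, 4)
    return {"payload": data[:-3], "strings": data[-3],
            "flags": data[-2], "version": data[-1]}
```

Applied to the getHtmlNoteContent request above, parse_request reports getHtmlNoteContent as the method and lists the note's GUID among the strings, while parse_response recovers the decoded body <body class="ennote"><div>Second note content.<br clear="none"/></div></body> from the response. A proxy could then locate the title and content strings in the same way and replace them with their encrypted counterparts.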
It may be observed that note bodies are treated and handled as HTML24, with special characters (such as < and >) escaped. In a more readable format, the above note body would be:

<body class="ennote"><div>Second note content.<br clear="none"/></div></body>

Obviously, note contents are prime targets for encryption.

4.1.2.3 Creating Notes

Notes are created by calling the createNote function:

7|0|14|https://www.evernote.com/focusclient/|304280BD765DCDBF4A79609E10928B10|com.evernote.web.shared.GWTNoteStoreExtensions|createNote|com.evernote.edam.type.Note/4071998839|java.lang.String/2004016611|java.util.List|[Z/1413617015|com.evernote.edam.type.NoteAttributes/74627218|me4626|Untitled||java.util.ArrayList/4159755760|1|2|3|4|3|5|6|7|5|8|6|0|1|1|0|1|0|1|9|8|12|0|0|0|0|0|0|0|0|0|0|0|1|0|0|10|0|0|0|0|0|0|0|0|0|0|A|A|A|A|0|0|0|0|A|0|0|0|VjcXc9v|A|0|0|11|0|0|0|0|0|12|0|VjcXnr6|13|14|0|

The request creates a new note object, with its title set to Untitled and its content set to an empty string. There is nothing sensitive here. The response:

//OK['VjcXnoo',155,8,0,0,0,0,0,7,0,6,'A','VjcXc5I',154,72,77,-51,-102,-43,36,23,-32,115,44,30,-68,-25,63,34,73,16,5,0,'A',0,0,0,0,'A','A','A','A',0,0,0.0,0.0,0,0,0,0,0,0,4,0,0.0,1,0,0,0,0,0,0,0,0,0,0,0,12,2,3,1,1,1,0,1,1,1,6,2,1,["com.evernote.edam.type.Note/4071998839","[Z/1413617015","com.evernote.edam.type.NoteAttributes/74627218","me4626","[B/3308590456","b3ba7ec9-9b0a-4ec3-8bba-fea2d9d07537","Untitled"],0,7]

The response shows that the new, third note was created and assigned the GUID b3ba7ec9-9b0a-4ec3-8bba-fea2d9d07537. There is nothing sensitive here, either.

4.1.2.4 Editing Notes

Changes to note titles and contents are sent using the updateNoteIfUsnMatches function.
7|0|16|https://www.evernote.com/focusclient/|304280BD765DCDBF4A79609E10928B10|com.evernote.web.shared.GWTNoteStoreExtensions|updateNoteIfUsnMatches|com.evernote.edam.type.Note/4071998839|java.lang.String/2004016611|java.util.List|[Z/1413617015|com.evernote.edam.type.NoteAttributes/74627218|me4626|[B/3308590456|b1dc0239-b0b1-47ee-a17e-b69b66ba70e4|My second note|Second note content with updates.|java.util.ArrayList/4159755760|1|2|3|4|3|5|6|7|5|8|6|1|1|1|0|1|1|1|9|8|12|0|0|0|0|0|0|0|0|0|0|0|0|0|0|10|0|0|0|0|0|0|0|0|0|0|A|A|A|A|0|0|0|0|A|0|11|16|-63|-38|103|95|59|-46|120|35|67|-28|-12|55|80|-97|87|-57|173|VjcO$Kw|A|12|0|13|0|0|0|0|0|14|150|VjcQlIZ|15|16|0|

The request contains the GUID of the note to be edited, the new title, and the new contents. The title and the contents are the fields to be encrypted here. The response:

//OK[1,'VjcQk84',151,9,0,0,0,0,0,8,0,7,'A','VjcO$Kw',186,56,-126,-45,-106,15,-115,-53,-4,75,84,33,48,-128,-12,-95,125,16,6,0,'A',0,0,0,0,'A','A','A','A',0,0,0.0,0.0,0,0,0,0,0,0,5,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,12,2,4,1,1,1,0,1,1,1,6,2,3,1,1,2,1,["com.evernote.edam.notestore.UpdateNoteIfUsnMatchesResult/226232967","[Z/1413617015","com.evernote.edam.type.Note/4071998839","com.evernote.edam.type.NoteAttributes/74627218","me4626","[B/3308590456","b1dc0239-b0b1-47ee-a17e-b69b66ba70e4","My second note"],0,7]

The GUID and the note title are echoed back. The function name contains the conditional clause IfUsnMatches, where USN most likely stands for Update Sequence Number. USNs are used in multi-client systems to ensure that an update cannot accidentally overwrite changes made by another update in the meantime.

4.1.2.5 Editing Reminders

Reminders are set using the updateNote function:

7|0|12|https://www.evernote.com/focusclient/|78E137D3512D195F071EB9037436542A|com.evernote.web.shared.GWTNoteStoreInterface|updateNote|com.evernote.edam.type.Note/4071998839|[Z/1413617015|com.evernote.edam.type.NoteAttributes/74627218|me4626|[B/3308590456|b1dc0239-b0b1-47ee-a17e-b69b66ba70e4|cd3306ff-56c9-4ae8-9def-75ee96af1ee8|My second note|1|2|3|4|1|5|5|6|6|1|1|1|0|1|1|1|7|6|12|0|0|0|0|0|1|0|1|0|0|0|0|0|0|8|0|0|0|0|0|0|0|0|0|0|A|VjcRAIZ|Vky2QMA|A|0|0|0|0|A|0|9|16|125|-95|12|-128|48|33|84|75|-4|-53|-115|15|-106|-45|-126|56|186|VjcO$Kw|A|10|0|11|0|0|0|0|0|12|153|VjcQk84|

The GUID of the note (and, for some reason, its title) is sent to the server. The request also contains several alphanumeric fields of length 7 (highlighted in blue or red). I suspected that these fields contained dates and times in some form, but I was not sure how they were encoded. By changing the reminder date multiple times, I figured out that the one highlighted in red stores the date of the reminder. Smaller changes to the date resulted in smaller changes in the value, identical dates resulted in identical values, and a later date resulted in a value that came after the previous values in alphabetical order (in other words, I got a string that was "greater" if the date was also "greater" than the previous one). I had a feeling these were

24 Hypertext Markup Language: the markup language in which the layout of websites and web applications is written.