Implementation and Evaluation of an Emulated Permission System for VS Code Extensions using Abstract Syntax Trees
Linköping University | Department of Computer and Information Science
Master’s thesis, 30 ECTS | Computer Science and Engineering
2021 | LIU-IDA/LITH-EX-A--21/054--SE

Implementation and Evaluation of an Emulated Permission System for VS Code Extensions using Abstract Syntax Trees
(Swedish title: Implementation och Utvärdering av ett Emulerat Behörighetssystem för Extensions i VS Code med hjälp av Abstrakta Syntaxträd)

David Åström

Supervisor: Rouhollah Mahfouzi
Examiner: Ahmed Rezine
External supervisor: Magnus Kraft

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se
Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.
For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/. © David Åström
Abstract

Permission systems are a common security feature in browser extensions and mobile applications to limit their access to resources outside their own process. IDEs such as Visual Studio Code, however, have no such features implemented, and therefore leave extensions with full user permissions. This thesis explores how VS Code extensions access external resources and presents a proof-of-concept tool that emulates a permission system for extensions. This is done through static analysis of extension source code using abstract syntax trees, scanning for usage of Extension API methods and Node.js dependencies. The tool is evaluated and applied to 56 popular VS Code extensions to determine which resources are most prevalently accessed and how. The study concludes that most extensions use minimal APIs, but often rely on Node.js libraries rather than the API for external functionality. This leads to the conclusion that the inclusion of Node.js dependencies and npm packages is the largest hurdle to implementing a permission system for VS Code.
Acknowledgments

First of all, I would like to thank Cybercom Stockholm for the opportunity to conduct my thesis at their Innovation Zone and their constant support throughout the project. I would especially like to thank my supervisor Magnus Kraft as well as the other thesis students at the company for all the help and moral support during the distance work situation. I also want to thank my university supervisor Rouhollah Mahfouzi and examiner Ahmed Rezine for their feedback and aid with the project.
Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Listings
1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Delimitations
2 Theory
  2.1 Software Extensions
  2.2 Permission systems
  2.3 Visual Studio Code
  2.4 TypeScript
  2.5 Static Analysis
  2.6 Related work
3 Method
  3.1 Pre-study
  3.2 Implementation
  3.3 Evaluation
4 Results
  4.1 Pre-study
  4.2 Implementation
  4.3 Evaluation
5 Discussion
  5.1 Results
  5.2 Method
  5.3 Source criticism
  5.4 The work in a wider context
6 Conclusion
  6.1 Future work
Bibliography
A Tested Extensions
B Non-namespace permission messages
C Testing results - Permissions
D Testing results - Dependencies
List of Figures

3.1 Overview of the analyser architecture
3.2 AST generated from the line let variable = object.function(prop1, prop2)
3.3 Visualisation of the traversal algorithm
3.4 Visualisation of the visitor pattern
List of Tables

2.1 VS Code API namespaces, with description if applicable, as cited from the official documentation
3.1 Permission messages related to each namespace
4.1 Permission messages detected in Bracket Pair Colorizer 2
4.2 Permission messages detected in Path Intellisense
4.3 Permission messages detected in Live Server
4.4 Precision and Recall Metrics
4.5 Commonly occurring and interesting permissions found during testing
4.6 Commonly occurring and interesting dependencies found during testing
Listings

3.1 Tree traversal algorithm
3.2 Default import
3.3 Namespace import
3.4 Named import
3.5 Combined type import
3.6 Require calls
3.7 Variable declaration and property assignment
3.8 Variable assignment binary expression
3.9 Function declaration
3.10 Method call expression
3.11 Example of the ImportReference JSON structure
3.12 Example of the usedProps JSON structure
3.13 Example of the output data JSON structure
3.14 Bash command to extract installed extensions
4.1 Bash command to run the analysis tool
4.2 Code line triggering Interact with the editor environment
4.3 Code line triggering Access all visible extensions
4.4 Code line triggering Access the active editor
1 Introduction

1.1 Motivation

Many popular Integrated Development Environments (IDEs) and text editors use extensions or plugins as a way for the user to extend functionality and tune the system to their liking. This offers customisation to the user but could also pose risks. Extensions are external software, often open-source, which are installed onto a system and given rights to control certain parts of the application and/or system. They could therefore introduce new security vulnerabilities.

Most web browsers implement similar systems for extensions, and research has been done on the security of browser extensions. Results have shown that extensions can introduce vulnerabilities, or even function as malware. However, little to no formal research seems to have been done on the security of IDE extensions, and it is rarely discussed in internet communities. Despite this, many developers tend to install such plugins without much consideration.

In today’s work environment, which is becoming more and more remote, the dependency on these digital tools and extensions becomes even more important. At the same time, the ability of companies to control and monitor them decreases, which makes this issue interesting to explore.

In this thesis, the security of extensions in the popular open-source IDE Visual Studio Code (VS Code) will be explored and evaluated. The focus will be on the resources and services the extensions use, and what security implications that entails. In contrast to most browsers, VS Code does not provide a system for the user to view or control permissions for extensions. In a solution similar to those implemented by browsers, the user would be able to choose what permissions the extension is granted, and thereby which resources and services it can access.
1.2 Aim

As a first step towards this functionality, this thesis aims to investigate what resources a VS Code extension may have permission to access, and how access to these is implemented today. Based on this investigation, the possibility of automatically detecting instances of these accesses, by statically analysing an extension, will be explored. This aims to result in a proof-of-concept tool, which will then be validated by using it to evaluate a number of popular VS Code extensions. The aim of this evaluation is to gain a better understanding of what resources and services are more commonly accessed in the current landscape of extensions, and what security implications that may have.

1.3 Research questions

Through these studies, this thesis aims to answer the following research questions:

RQ1. What external resources can be accessed in a VS Code extension and how is this implemented?
RQ2. Is it possible to detect occurrences of accessing external resources through static analysis of the extension source code?
RQ3. What resources and services are more prevalently accessed in practice in popular extensions?

1.4 Delimitations

The study will focus only on VS Code extensions and will not touch upon extensions for other IDEs unless they are relevant as related work. It will also focus on providing a proof-of-concept rather than a complete solution. Because of this, the study will not necessarily cover all possible resources that an extension may be able to access, nor all supported language constructs in TypeScript, but rather focus on a limited set of common resources that are deemed relevant to examine. Lastly, the thesis only focuses on VS Code extensions written in TypeScript, as these are the most common. Therefore, it will not attempt to analyse extensions written directly in JavaScript, extensions that do not utilise scripting, or extension components written in other languages.
2 Theory

This chapter establishes the theoretical ground on which the study is built. First, it covers background which introduces the themes relevant for the study. Following this, the chapter covers related work which has been published in scientific conferences and journals.

2.1 Software Extensions

Software extensions are a broad concept. Also known as plugins or add-ons, they are a type of smaller program that builds upon another host program to expand the functionality of the latter [12]. Extension systems are common in many types of software as a way to provide modularity to the user experience. Instead of the program developers having to design all possible features a user may want, third party developers, or the users themselves, can extend the program with the features they need [48].

Besides the benefits from a feature standpoint, designing a program for extensibility can have other positive implications. Wagner et al. [56] mention that designing for extensibility requires high modularity in the host program, something that is generally seen as an important aspect when designing software systems [18]. Designing for this modularity can often help reduce the complexity of the program, as each module is responsible for a specific task. This modularity, and the possibility to even design intended main features of a program as extensions, can allow the host program to include only the central features. Most other features can instead be made optional to the user, something that greatly improves the possibility to customise the entire program to the individual user. This also makes deployment of new features a simpler task, as these can simply be plugged in as an extension [56]. By providing the extension functionality as an Application Programming Interface (API), the extensibility can be provided without the programmer even needing comprehensive knowledge of the host program.

Wagner et al. [56] also mention a drawback of providing extensibility, however: the extension system needs to be robust enough to handle the complexity of several extensions, which may or may not conflict with each other.
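The relationship between a host program, its extension API, and conflicting extensions can be made concrete with a small sketch. Everything below is hypothetical and chosen for illustration only: the `Host` class, the `HostApi` interface, and the `greeter.hello` command identifier do not come from any real program. The sketch assumes the host resolves conflicts in the simplest possible way, by rejecting a second registration of the same identifier.

```typescript
// Hypothetical host/extension sketch; none of these names belong to a real API.
type CommandHandler = (...args: string[]) => string;

// The narrow surface an extension is allowed to see.
interface HostApi {
  registerCommand(id: string, handler: CommandHandler): void;
}

class Host implements HostApi {
  private readonly commands = new Map<string, CommandHandler>();

  // Assumption: conflicting registrations are rejected rather than overwritten.
  registerCommand(id: string, handler: CommandHandler): void {
    if (this.commands.has(id)) {
      throw new Error(`Command "${id}" is already registered`);
    }
    this.commands.set(id, handler);
  }

  execute(id: string, ...args: string[]): string {
    const handler = this.commands.get(id);
    if (handler === undefined) {
      throw new Error(`Unknown command "${id}"`);
    }
    return handler(...args);
  }
}

// An extension is written against HostApi only, without knowledge of Host internals.
function activateGreeter(api: HostApi): void {
  api.registerCommand("greeter.hello", (name) => `Hello, ${name}!`);
}

const host = new Host();
activateGreeter(host);
const greeting = host.execute("greeter.hello", "world");
```

Activating the same extension twice would raise an error in this sketch, which is one simple way a host could surface the kind of conflict described above.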
Browser extensions

A common domain of extensible programs is web browsers. Most popular web browsers support extensions in some way: Google provides a platform for extensions for the Chrome browser through its Chrome Web Store [14], Apple Safari has extension support through the App Store [45], and Mozilla Firefox and Microsoft Edge use the term add-ons and provide them through their respective stores [2, 39].

2.2 Permission systems

Permission systems are a common security feature of many client-based software domains. A permission system is a type of access control that aims to restrict an application’s access to sensitive resources [44]. While there is no standard method to implement these, they are generally based on the principle of least privilege [47]. Permission systems generally base their functionality on running applications in isolated environments with minimised privileges, where the application cannot directly access user-owned resources [44]. These could include sensitive data, access to hardware and sensors, access to the file system etc., all of which could include some kind of private information. To provide access to required resources, the host exposes some kind of API for the application to use. However, in order to use these, the application needs to request which APIs it requires access to, and the user must manually approve the request before access is granted.

In the Android operating system [5], applications are isolated into individual processes called sandboxes [6]. Applications cannot interact with restricted data or actions by default, which includes resources that may include sensitive information. Android divides permissions into Install-time permissions and Runtime permissions. Install-time permissions are presented before installation of the application and are granted automatically upon installation.
These represent data and actions outside of the sandbox which are generally deemed of less risk to the user’s privacy or other applications [43]. Runtime permissions represent more sensitive resources. These need to be actively requested before accessing the resource, which triggers a prompt that the user must approve before the permission is granted [43]. Both permission types require the permission to be declared in the application’s manifest file [43].

Browser extension permission systems

All major browsers have a permission system implemented for their extensions [42]. While these have been developed separately for many years, a standard has emerged, called the WebExtensions API [31]. Extensions have very limited functionality by default, and in order to access more powerful functionality and resources, the browsers expose a set of JavaScript APIs. Each API corresponds to a permission which the extension must request access to in an extension manifest file called manifest.json, which is also standardised. As with Android permissions, extension permissions can be declared as required or optional. The required permissions are granted at install-time, while optional permissions must be granted by the user during runtime.

In addition to permissions for APIs, extensions can also declare host-permissions, which define what web hosts the extension may interact with. These declarations are also made in the manifest.json and can be made required or optional. Declaration is done through pattern matching, so extensions can request access either to specific hosts, or broader patterns covering some general domains, or even any host [19].

Permissions are divided based on how powerful they are. If the permission is less powerful, and therefore of lower risk, no action is required by the user, and the extension is stated to require no special permissions. Using more powerful, higher risk permissions, however, results in a warning message being presented to the user. Each message is mapped to one or more permissions in a table and describes in a short sentence what the implications of allowing the permissions are [20].

While the overall structure and APIs are more or less standardised between browsers, the actual implementation is up to the individual browsers, which has led to a few differences in compatibility and how the APIs are implemented [13]. In Chrome, Opera, and other Chromium based browsers, the extension API is implemented under the chrome namespace, using callbacks rather than promises. In Firefox and Edge, the API is instead implemented under the browser namespace; in Firefox the APIs are implemented using promises, while Edge uses callbacks. Compatibility varies between browsers: some APIs are not supported at all by some browsers, while others are only partly supported [9].

2.3 Visual Studio Code

Visual Studio Code, often abbreviated as VS Code, is a free code editor developed by Microsoft [54]. By default, it provides a relatively small core with support for JavaScript, TypeScript and Node.js [41], as well as features such as a built-in Git client [22]. This core is also available as open-source, mainly implemented in TypeScript, which allows companies and teams to implement their own version of the editor, or incorporate the editor into other software [40].

VS Code extensions

While the default functionality in VS Code is relatively small, one of its main features is its extensibility: further features can be added through extensions [35] to customise the editor for the individual user’s needs and preferences. VS Code uses an Extension host, a separate Node.js [41] process which hosts and manages extensions [23].
Through this, the functionality of extensions is separated from the main VS Code process, which limits extensions’ ability to interact with the program in unintended ways, for both stability and security reasons [26]. Similarly to browser extensions, the Extension host exposes a set of API endpoints which the extension can use to access and interact with VS Code itself, its environment, workspaces, editors etc. [55]. The VS Code API is divided into 11 namespaces with different functionality. These contain the main resources and methods related to the host program. A list of these together with a brief description of each namespace can be seen in table 2.1.

Each VS Code extension consists of a few basic components. At its core, an extension is also a separate Node.js package and process. Therefore, it must contain a package.json [51] file in its root folder. This is a manifest file that defines various properties of a Node.js package but is also used as an extension manifest by VS Code. While it may contain several different properties, the VS Code extension documentation [23] lists a few base properties that are the most important for extensions. The name and publisher fields are used as a unique ID for VS Code to identify the extension. The main field declares the main entry file of the extension. This is the file that contains the activate and deactivate functions that the Extension host invokes when an activation event related to the extension is fired. Activation events are the events that cause the extension to activate, such as certain commands or opening documents of a certain file type [1]. These are declared in the activationEvents field. In addition to these, the extension also declares how it contributes to VS Code in the contributes field. These contribution points could for example be commands, themes, or languages [16]. Lastly, the extension manifest must also declare the lowest supported VS Code version through engines.vscode. This property also allows the extension to use the VS Code API in its source files.
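As a rough illustration of the manifest fields described above, the following sketch models them as a TypeScript interface and fills in a made-up extension. All values (`hello-sample`, `example-publisher`, the command name, and the version range) are hypothetical, and the interface covers only the fields mentioned in the text rather than the full package.json format. The `extensionId` helper assumes the identifier is formed as `publisher.name`.

```typescript
// Only the manifest fields discussed in the text; a real package.json has many more.
interface ExtensionManifest {
  name: string;
  publisher: string;
  main: string;
  activationEvents: string[];
  contributes: { commands?: { command: string; title: string }[] };
  engines: { vscode: string };
}

// Hypothetical example values, not taken from any real extension.
const manifest: ExtensionManifest = {
  name: "hello-sample",
  publisher: "example-publisher",
  main: "./out/extension.js",
  activationEvents: ["onCommand:helloSample.sayHello"],
  contributes: {
    commands: [{ command: "helloSample.sayHello", title: "Say Hello" }],
  },
  engines: { vscode: "^1.50.0" },
};

// Assumption: the unique ID is the publisher and name joined by a dot.
function extensionId(m: ExtensionManifest): string {
  return `${m.publisher}.${m.name}`;
}

// Checks whether the manifest declares a given activation event.
function activatesOn(m: ExtensionManifest, event: string): boolean {
  return m.activationEvents.indexOf(event) !== -1;
}

const id = extensionId(manifest);
const activatesOnCommand = activatesOn(manifest, "onCommand:helloSample.sayHello");
```

Note that nothing in such a manifest states which resources the extension intends to access, which is the gap the emulated permission system in this thesis addresses.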
Table 2.1: VS Code API namespaces, with description if applicable, as cited from the official documentation.

Namespace        Description
authentication   Namespace for authentication.
commands         Namespace for dealing with commands. In short, a command is a function with a unique identifier. The function is sometimes also called command handler.
comments         -
debug            Namespace for debug functionality.
env              Namespace describing the environment the editor runs in.
extensions       Namespace for dealing with installed extensions. Extensions are represented by an extension-interface which enables reflection on them.
languages        Namespace for participating in language-specific editor features, like IntelliSense, code actions, diagnostics etc.
scm              -
tasks            Namespace for tasks functionality.
window           Namespace for dealing with the current window of the editor. That is visible and active editors, as well as, UI elements to show messages, selections, and asking for user input.
workspace        Namespace for dealing with the current workspace. A workspace is the collection of one or more folders that are opened in a VS Code window (instance).

Extension language

As VS Code itself is built on a TypeScript code base [40], TypeScript is also the endorsed language for building extensions. Both official and bundled extensions are generally built on TypeScript, as are the extension examples in the official documentation [25]. However, since TypeScript is compiled to JavaScript before runtime, as described in section 2.4, it is also possible to develop extensions directly in JavaScript. In addition, some extension functionality does not require scripting, and can simply be added with JSON files. Examples of this are adding colour themes [15] or syntax highlighting support for new languages [49].

2.4 TypeScript

TypeScript is a programming language that is built as a superset of JavaScript [53].
This means that it is a language that builds upon JavaScript, supporting the full JavaScript language, while adding features on top of it. The main feature offered by TypeScript is adding a static type system to JavaScript. This system allows developers to define object types, but also implements type inference, where the type of a variable is inferred from the value it is assigned. This type system is said to help developers increase the structure of the code, and help find errors earlier in the development phase [53]. Typing in TypeScript is optional, in order to simplify adoption. This allows developers to convert parts of the codebase of a project to TypeScript at a time, only adding typing where necessary [53].

TypeScript is not traditionally compiled, but rather transformed into JavaScript before runtime. This simplifies combining TypeScript and JavaScript in a project, but also adds another quirk. Since JavaScript does not support typing, this is lost in transformation. Therefore, the typing and type checks only exist during development and analysis of the TypeScript code, but are not enforced during runtime [53].

2.5 Static Analysis

Static analysis is a general term for programmatically analysing software without executing the code. This is in contrast to dynamic analysis, which analyses programs during runtime. Static analysis normally analyses the non-compiled source code but may also analyse compiled bytecode [34].

Abstract Syntax Trees

Abstract syntax trees (AST), or syntax trees, are an abstract representation of a program’s syntax, commonly used in compiler design and derived during compilation [3]. An AST represents syntax in a hierarchical tree, where each node and leaf represents a syntactical construct. In addition to its use in compilers, it is also often a central part of static code analysis and code checking. As it represents the source code in the context of the language, without taking for example white-spaces, dots, and commas into account, it allows for structured traversal and analysis of the source code. It also allows for modification of the source code by inserting or deleting nodes in the tree.

One example of such a use case is in linters. These static analysis programs use the AST of source files to analyse the code [30]. They look for common coding errors and oversights that the programmer may miss and notify the user or directly correct them before the code is compiled. These errors may be things such as unused variables or imports, unreachable code, or assignment of dereferenced pointers [32].

2.6 Related work

This section introduces previous research done on the security of extensions, as well as the implementation and effectiveness of permission systems as a security feature. This related work is introduced to put the work of this thesis in a larger context.
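To make the abstract syntax trees of section 2.5 more tangible, the sketch below hand-builds a tree for the line `let variable = object.function(prop1, prop2)` and walks it depth-first with a visitor callback. The `AstNode` shape and the node kind names are simplified assumptions for illustration; a real parser produces richer node types, and this is not presented as the traversal used by the tool in this thesis.

```typescript
// Simplified, hypothetical node shape; real parsers expose far more detail.
interface AstNode {
  kind: string;
  text: string; // identifier name, or "" for non-leaf nodes
  children: AstNode[];
}

const ident = (name: string): AstNode => ({ kind: "Identifier", text: name, children: [] });

// Hand-built tree for: let variable = object.function(prop1, prop2)
const ast: AstNode = {
  kind: "VariableDeclaration",
  text: "",
  children: [
    ident("variable"),
    {
      kind: "CallExpression",
      text: "",
      children: [
        {
          kind: "PropertyAccessExpression",
          text: "",
          children: [ident("object"), ident("function")],
        },
        ident("prop1"),
        ident("prop2"),
      ],
    },
  ],
};

type Visitor = (node: AstNode, depth: number) => void;

// Depth-first, pre-order traversal: visit the node, then each child in order.
function traverse(node: AstNode, visit: Visitor, depth = 0): void {
  visit(node, depth);
  for (const child of node.children) {
    traverse(child, visit, depth + 1);
  }
}

const identifiers: string[] = [];
const calledMembers: string[] = [];

traverse(ast, (n) => {
  if (n.kind === "Identifier") {
    identifiers.push(n.text);
  }
  // A call whose callee is a property access, e.g. object.function(...)
  const callee = n.children[0];
  if (n.kind === "CallExpression" && callee !== undefined && callee.kind === "PropertyAccessExpression") {
    calledMembers.push(callee.children.map((c) => c.text).join("."));
  }
});
```

Because whitespace and punctuation are absent from the tree, a check such as "is a method called on this object?" reduces to matching node kinds, which is what makes ASTs convenient for the kind of scanning described in this thesis.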
Browser extension security

While the risks of IDE extensions have not been explored, the area of browser extensions has been thoroughly researched. Several studies have evaluated popular extensions in search of vulnerabilities.

For example, Carlini et al. [10] evaluated 100 Google Chrome extensions by analysing and modifying their network traffic during runtime, as well as through manual static taint analysis. They found that at least 40% of the analysed extensions contained one or more vulnerabilities. When evaluating these vulnerabilities against the security mechanisms in Chrome (isolated environments, privilege separation, and permissions), they found the mechanisms to be effective overall at preventing vulnerabilities, provided that developers utilise them properly. It was not uncommon that developers actively circumvented them when implementing features.

Bauer et al. [8] present a set of stealthier attacks involving Chrome extensions. These range from tracking and stealing user data and behaviour, to privilege escalation by having extensions share data and state, thereby making data from an extension with limited permissions available to another extension with a different set of permissions. Some of these could be solved by websites implementing content security policies restricting the use of data from the website, as well as enforcing policies for information flow in the browser. More fine-grained permissions could solve some of the problems as well. Finally, they propose methods for increasing user awareness when installing extensions and note the importance of providing the user with clear information about an extension. This allows users to make informed decisions and could help steer them to extensions requiring fewer permissions.

Wang et al. [57] used a modified browser based on Firefox to dynamically analyse 2,465 Firefox extensions for vulnerable behaviours. They categorise vulnerable behaviours in groups of high, medium, low, and none, based on their severity. The behaviours of highest severity include arbitrary file access, functionality to download, install, and launch processes, access to network functionality, and injecting DOM objects. In their evaluation, they found occurrences of all these behaviours, and while they are not necessarily malicious, they may expose some vulnerabilities.

Effectiveness of permissions

The effectiveness of permissions as a security feature is another subject that has been well covered in literature. Felt et al. [28] evaluated the effectiveness of install-time permissions in Chrome extensions and Android applications. They find that install-time permissions are an effective security feature, but one that is often compromised by shortcomings in its design. Most evaluated applications request at least some dangerous permission. Several of these were deemed to be over-privileged, i.e., they request permissions that they do not actually need to fulfil their functionality. This was attributed both to developer errors and to the available permissions not being granular enough, requiring the application to request a permission of which only a small part is needed. They also discuss the issue of dangerous permissions being too common or too broad. This prevalence causes desensitisation in users, where the warning messages lose their perceived importance.
Users simply accept warnings without paying closer attention to them.

Marouf et al. [36] studied the Chrome extension permission model and come to many of the same conclusions as Felt et al. They propose a solution using optional runtime-permissions, which gives the user more fine-grained control of the extension’s permissions during usage, something that has since been implemented in both major browsers and Android applications [31, 43].

Over-privilege detection

The issue of over-privilege mentioned above has also been studied further. Here, the focus is on detecting over-privileged applications and extensions. A common approach is the use of various machine-learning technologies. Khazaei et al. [33] propose a system called OPEXA, Over-Privileged EXtension Analyser, which uses natural language processing on extension descriptions to detect what permissions the described functionality requires. These results were compared to the actual declared permissions to detect over-privilege. Shezan et al. [46] used a similar methodology but increased the dataset by creating a model that maps permissions from different domains with similar functionality. By doing this, browser extensions, mobile applications, and internet of things services can be analysed with the same tool.

On the other hand, Wu et al. [58] detected over-privileged Android applications through data mining. They divided applications based on their store category and compared applications’ declared permissions to a set of permissions commonly needed by applications in the same category to detect over-privilege.
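The approaches above differ in how they estimate the permissions an application needs, but the final comparison step is essentially a set difference between declared and needed permissions. The sketch below shows only that step, under the assumption that both sets are already known; the `comparePermissions` function and the permission names in the example are illustrative and are not taken from any of the cited tools.

```typescript
interface PrivilegeReport {
  overPrivileged: string[];  // declared but not needed
  underPrivileged: string[]; // needed but not declared
}

// Compares declared permissions with those estimated to be needed or used.
function comparePermissions(declared: string[], needed: string[]): PrivilegeReport {
  const declaredSet = new Set(declared);
  const neededSet = new Set(needed);
  return {
    overPrivileged: Array.from(declaredSet).filter((p) => !neededSet.has(p)).sort(),
    underPrivileged: Array.from(neededSet).filter((p) => !declaredSet.has(p)).sort(),
  };
}

// Illustrative input: the permission names are placeholders.
const report = comparePermissions(
  ["tabs", "storage", "history"],
  ["storage", "downloads"],
);
```

In this example, `history` and `tabs` would be reported as over-privilege and `downloads` as under-privilege. The hard part in practice is producing a trustworthy `needed` set, which is where the cited approaches (description analysis, category statistics, or code analysis) differ.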
The issue of over-privilege is similar to the problem of this thesis, as it concerns analysing what permissions the extension or application actually uses, or should reasonably need to use, compared to what it declares. The machine-learning-based approaches discussed above do, however, require an implemented permission system to compare against, which makes those methods infeasible for VS Code extensions at the moment.

Tang et al. [50] take a static analysis approach to the problem in Android applications. They decompile application binaries and check the resulting source code for instances of permission-related method usages. After detecting what the application actually uses, they also apply the semantic approach and compare the application description to the result of the analysis to decide what represents over-privilege. Dennis et al. [21] present the tool P-Lint, a linter that reverse engineers Android application code and detects use of permission methods, focusing especially on improper or vulnerable use patterns. In another study, Chester et al. [11] present M-Perm, which combines static and dynamic analysis to detect both under- and over-permission. They also use decompiled applications to statically compare the declared permissions to those being used in each file. In addition, a call graph of the application is used to deduce the reachability of each permission from each entry point of the application. With this, they can detect which permissions are active at each application state.

These studies show that using static analysis to check the code for permission usages is a valid approach. In Android applications, the fact that applications are published in compiled form and need decompilation before analysis makes static analysis a more difficult task. Bartel et al. [7] mention a few of these difficulties, which make naive static analysis insufficient in many situations.
One example is the difficulty of deciding which permission a permission use is connected to, due to string literals not necessarily being preserved during decompilation. In the case of VS Code extensions, however, this should not be an issue, as extensions generally have their source code fully available as open source.

Evaluation of static analysis tools

While security tools assess the quality of other software, the tools themselves need to be evaluated in order to verify their effectiveness. Soundness is a commonly used property of static analysers which states that the analyser gives a guarantee: if there exists a vulnerability or piece of code that the analyser should report, it will report that item [38]. While soundness is often a property to strive for, one issue is that it may result in a large number of false positives, which in the end results in the true positives losing their value.

In contrast to soundness, there is the concept of completeness. If an analyser is complete, it is provable that all items reported are true positives [38]. Just as soundness often brings many false positives, completeness often brings false negatives, that is, items that should be reported but are missed by the analyser.

Both soundness and completeness are, by definition, all or nothing [38]. Therefore, in order to establish soundness, for example, a formal proof is often needed, which is often not practical in real contexts. Instead of aiming to prove soundness, aiming for a probably sound analysis is often more practical. This can be done through, for example, exhaustive testing or, as done by Andreasen et al. [4], through comparing the results of the static analysis to those seen in dynamic analysis.

Evaluating static analysis tools often looks at detected and non-detected instances. These are classified into four different categories:
• True Positives - Vulnerabilities detected as such.
• False Positives - Non-vulnerabilities detected as vulnerabilities.
• True Negatives - Non-vulnerabilities not detected by the tool.
• False Negatives - Vulnerabilities missed by the tool.

These categories can be compared in various ways to calculate metrics related to the effectiveness in detecting vulnerabilities. In order to measure soundness and completeness, two metrics are commonly introduced: Recall and Precision.

Recall measures the level of soundness of an analyser. A higher recall means a higher level of soundness. Recall is calculated by comparing the number of true positives to the total number of items the analyser should report [38]. The formula is therefore:

$$\mathit{Recall} = \frac{\mathit{TruePositives}}{\mathit{TruePositives} + \mathit{FalseNegatives}}$$

Precision, on the other hand, is a metric that relates the number of true positives to the number of false positives, that is, it measures the level of completeness of the analyser [38]. This is calculated as:

$$\mathit{Precision} = \frac{\mathit{TruePositives}}{\mathit{TruePositives} + \mathit{FalsePositives}}$$

Tang et al. [50] compare the result of their static and semantic analysis to a human reading the description of an app. Based on the text, the reader makes an assumption of what permissions would be needed for this functionality. Based on these results, they calculate precision and recall, and also introduce two additional metrics:

$$\mathit{F\text{-}measure} = \frac{2 \cdot \mathit{Precision} \cdot \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}}$$

$$\mathit{Accuracy} = \frac{\mathit{TruePositives} + \mathit{TrueNegatives}}{\mathit{TruePositives} + \mathit{FalsePositives} + \mathit{TrueNegatives} + \mathit{FalseNegatives}}$$

F-measure aims to find the best compromise between precision and recall, while accuracy gives a general score on how well the analysis performs with regard to both true positives and true negatives. In their study, they especially focus on the F-measure in order to maximise soundness while minimising the number of false positives.
All of these could be interesting to explore, and a real-life setting could also introduce aspects such as the speed of the analysis as a relevant metric to measure. In this study, however, recall and precision are chosen as the metrics to evaluate.
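As an illustration of the four metrics defined above, the following minimal sketch computes them from raw counts. The interface and function names, the handling of empty denominators, and the example counts are assumptions made for this sketch and are not taken from the evaluation in this thesis.

```typescript
// Counts of the four outcome classes for one evaluation run.
interface ConfusionCounts {
  truePositives: number;
  falsePositives: number;
  trueNegatives: number;
  falseNegatives: number;
}

// Divide, returning 0 when the denominator is 0 (an assumption of this sketch).
function safeDivide(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

// Share of the items that should be reported that actually were reported.
function recall(c: ConfusionCounts): number {
  return safeDivide(c.truePositives, c.truePositives + c.falseNegatives);
}

// Share of the reported items that are true positives.
function precision(c: ConfusionCounts): number {
  return safeDivide(c.truePositives, c.truePositives + c.falsePositives);
}

// Harmonic mean of precision and recall.
function fMeasure(c: ConfusionCounts): number {
  const p = precision(c);
  const r = recall(c);
  return safeDivide(2 * p * r, p + r);
}

// Share of all classified items that were classified correctly.
function accuracy(c: ConfusionCounts): number {
  const total =
    c.truePositives + c.falsePositives + c.trueNegatives + c.falseNegatives;
  return safeDivide(c.truePositives + c.trueNegatives, total);
}

// Hypothetical counts, chosen only to illustrate the arithmetic.
const example: ConfusionCounts = {
  truePositives: 8,
  falsePositives: 2,
  trueNegatives: 5,
  falseNegatives: 2,
};
```

With these hypothetical counts, recall and precision are both 8/10, the F-measure is therefore also 0.8, and accuracy is 13/17.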
3 Method

This chapter documents the methodology used throughout the thesis. First, the pre-study is described in section 3.1, followed by the implementation in section 3.2, which describes how the static analysis for detecting method usage in extensions was implemented. Finally, section 3.3 describes how the effectiveness of the tool was evaluated by analysing a larger set of extensions.

3.1 Pre-study

A pre-study was conducted in order to gather enough information about VS Code extensions, permission systems, and static analysis to be able to answer RQ1 and determine the possibility of implementing a static analysis tool that detects method usage in VS Code extensions. This pre-study was divided into two main phases. First, information on VS Code extensions and their anatomy and architecture was gathered. This was mainly done by studying the official documentation, which describes the inner workings of extensions and how to develop your own [24].

The second phase consisted of gathering theoretical knowledge and related work on the subjects relevant to the thesis. This was done by searching for scientific articles, mainly on Google Scholar, using keywords such as extension security, permission models, static analysis, and over-privilege. From the articles found with this method, further material was gathered through their respective referenced articles and citations. In those cases where no published material could be found, less formal material such as documentation and developer blogs was used to gain a better understanding of the subject.

Most of the results from this study can be read in the Theory chapter, while answers to RQ1 are presented in the Results chapter.
3.2 Implementation

The implementation phase was conducted with the goal of creating a static analysis tool that, given the source code of a VS Code extension as input, can analyse the program and return a list of the extension's capabilities from a permission point of view. The tool itself is built in TypeScript to simplify extracting ASTs from the extensions' TypeScript source files. It is implemented as a Node.js application to allow local execution and is currently used through a command line interface (CLI), although measures have been taken to simplify connecting the application to a web service or similar. An overview of the system can be seen in figure 3.1, and the individual components and tasks of the tool are presented in detail below.

Figure 3.1: Overview of the analyser architecture

Defining the extension root

The first task of the analyser is to find and define the root directory of the extension within the provided directory. The input directory is assumed to be cloned or downloaded directly from a Git repository, and while most extensions are published by themselves, with the extension root as the repository root, there might be exceptions to this. The extension MetaGo [37] is an example of this, as it also publishes its sub-extensions MetaJump and MetaWord under the same Git repository. To accommodate this, the possible extension roots of the provided source code are identified by recursively searching through all subdirectories. The goal is to find directories that contain both a package.json file, which indicates an extension, and a tsconfig.json file, which indicates that TypeScript is used throughout the extension. Each such occurrence is stored to be analysed separately as an individual extension.
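The root search described above can be sketched as follows. To keep the example self-contained, it walks a small in-memory directory model instead of the file system; the RepoDirectory type, the function names, and the example layout are all hypothetical and only illustrate the rule that a root must contain both package.json and tsconfig.json. The sketch also assumes that the search continues into the subdirectories of a directory that has already been identified as a root.

```typescript
// Minimal in-memory stand-in for a directory (an assumption of this sketch;
// the tool itself works on a cloned repository on disk).
interface RepoDirectory {
  path: string;
  files: string[];
  subdirectories: RepoDirectory[];
}

// True when the directory directly contains a file with the given name.
function hasFile(dir: RepoDirectory, fileName: string): boolean {
  return dir.files.some((file) => file === fileName);
}

// A directory counts as an extension root when it holds both marker files.
function isExtensionRoot(dir: RepoDirectory): boolean {
  return hasFile(dir, "package.json") && hasFile(dir, "tsconfig.json");
}

// Recursively collect the paths of every directory that qualifies as a root.
function findExtensionRoots(dir: RepoDirectory): string[] {
  const roots: string[] = isExtensionRoot(dir) ? [dir.path] : [];
  for (const sub of dir.subdirectories) {
    roots.push(...findExtensionRoots(sub));
  }
  return roots;
}

// Hypothetical repository layout with two extensions and some other folders.
const repository: RepoDirectory = {
  path: "repo",
  files: ["README.md"],
  subdirectories: [
    { path: "repo/ext-a", files: ["package.json", "tsconfig.json"], subdirectories: [] },
    { path: "repo/docs", files: ["index.md"], subdirectories: [] },
    {
      path: "repo/packages",
      files: [],
      subdirectories: [
        { path: "repo/packages/ext-b", files: ["package.json", "tsconfig.json"], subdirectories: [] },
        { path: "repo/packages/lib", files: ["package.json"], subdirectories: [] },
      ],
    },
  ],
};

const extensionRoots = findExtensionRoots(repository);
```

For this layout, extensionRoots contains repo/ext-a and repo/packages/ext-b, while repo/packages/lib is skipped because it lacks a tsconfig.json file.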
Abstract Syntax Trees

In order to analyse each extension, an AST representation of each source file in the project is extracted. TypeScript supplies a compiler API through the TypeScript npm package. This is the same functionality used in the TypeScript compiler, and the API can therefore directly provide an AST representation of an imported .ts source file. An example of the AST structure provided by the compiler API can be seen in figure 3.2.

Figure 3.2: AST generated from the line let variable = object.function(prop1, prop2)

ts-morph

While the AST and methods provided by the compiler API could be used directly for the analysis, ts-morph is an open-source library that wraps the compiler API in order to simplify navigation and manipulation of TypeScript ASTs [52]. ts-morph is used throughout this project to import and navigate the source file ASTs. Examples of functionality that ts-morph adds and that is used in this project are the ability to find AST nodes of certain kinds and the mapping of identifiers to references of that identifier. This simplifies the task of finding uses of imported methods.

AST traversal

The analysis is done by traversing the AST of each project source file. Traversal is done depth first, using both preorder and postorder tree walks [17]. By doing this, actions and analysis of nodes can be done both on entry and on exit of the node, i.e., before or after visiting the children of the node. This strategy is essentially the same as the one used by estraverse in ESLint [30]. The basic traversal algorithm is presented in listing 3.1 and is visualised further in figure 3.3.

Listing 3.1: Tree traversal algorithm

    function traverse(node) {
      // Handle the node on entry, before its children (preorder).
      before(node)

      for (const childNode of node.children) {
        traverse(childNode)
      }

      // Handle the node on exit, after its children (postorder).
      after(node)
    }

Figure 3.3: Visualisation of the traversal algorithm

Node Visitors

Visiting the nodes during the traversal is implemented using a variant of the visitor pattern [29]. However, as the node types were not modifiable, the usual double dispatch functionality was not possible to implement, which is why the current solution resorts to a list of if statements for type checking. A UML diagram of the implemented pattern can be seen in figure 3.4.

The visitor structure is based around an abstract NodeVisitor class. This class defines three methods. The before and after methods correspond to the methods of the same names in the algorithm above. These take a node as input and define the visit behaviour for different node types, to be handled either before or after visiting the node's children. Generally, most functionality is implemented in the before method, but after is used in certain situations. The provideResults method is intended to be called after the analysis is completed. It summarises and returns the results of the visitor's analysis. The NodeVisitor class is purposefully designed to be modular to allow for extension of the analyser in the future if needed.

Figure 3.4: Visualisation of the visitor pattern
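The interplay between the traversal and the visitor can be sketched as below. This is a minimal illustration and not the implementation used in the tool: the TreeNode type is a hypothetical stand-in for the ts-morph node classes, and TraceVisitor is an invented example visitor. Only the method names before, after, and provideResults follow the description above.

```typescript
// Minimal stand-in for an AST node (hypothetical; the tool uses ts-morph nodes).
interface TreeNode {
  kind: string;
  text?: string;
  children: TreeNode[];
}

// Mirrors the three methods described for the abstract NodeVisitor class.
abstract class NodeVisitor<Result> {
  abstract before(node: TreeNode): void;
  abstract after(node: TreeNode): void;
  abstract provideResults(): Result;
}

// Depth-first walk that calls the visitor on entry and on exit of each node.
function traverse(node: TreeNode, visitor: NodeVisitor<unknown>): void {
  visitor.before(node);
  for (const child of node.children) {
    traverse(child, visitor);
  }
  visitor.after(node);
}

interface TraceResult {
  trace: string[];
  identifiers: string[];
}

// Example visitor: records entry/exit order and collects identifier names.
class TraceVisitor extends NodeVisitor<TraceResult> {
  private trace: string[] = [];
  private identifiers: string[] = [];

  before(node: TreeNode): void {
    this.trace.push("enter " + node.kind);
    // A type check replaces double dispatch, as the node classes are fixed.
    if (node.kind === "Identifier" && node.text !== undefined) {
      this.identifiers.push(node.text);
    }
  }

  after(node: TreeNode): void {
    this.trace.push("exit " + node.kind);
  }

  provideResults(): TraceResult {
    return { trace: this.trace, identifiers: this.identifiers };
  }
}

// Hand-built tree loosely modelled on the expression `object.method()`.
const tree: TreeNode = {
  kind: "CallExpression",
  children: [
    {
      kind: "PropertyAccessExpression",
      children: [
        { kind: "Identifier", text: "object", children: [] },
        { kind: "Identifier", text: "method", children: [] },
      ],
    },
  ],
};

const traceVisitor = new TraceVisitor();
traverse(tree, traceVisitor);
const traceResult = traceVisitor.provideResults();
```

The recorded trace starts with the entry of the CallExpression node and ends with its exit, with every child fully entered and exited in between. This ordering is what allows a visitor to set up state on entry and finalise it on exit.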
AbstractMethodVisitor

For this thesis, a single subtype of NodeVisitor is implemented, AbstractMethodVisitor. This subtype is also abstract and contains functionality for searching source files for imported method calls. In turn, two concrete subclasses of this abstract class have been implemented, ImportMethodVisitor and VsCodeMethodVisitor. VsCodeMethodVisitor is responsible for specifically analysing methods from the VS Code API, while ImportMethodVisitor is more general and analyses other imported packages.

This subtype of NodeVisitor is based around individual source files. It requires analysis to be started with a complete source file AST, i.e., an AST with a SourceFile node as root. Before entering a source file, it stores a new SourceFileData object. This object contains basic information about the source file, and the dependencies and methods found during analysis are stored in it. Upon exiting the same node, the current SourceFileData object is pushed to a separate list for later extraction.

Implemented language constructs

For the implementation, support for a reduced set of language constructs was implemented in the analyser. These were chosen to represent a base set of the most common ways to access dependencies, properties, and methods, as well as to store subreferences to these. In the following sections, these constructs and their implemented behaviour are described in more detail.

• Import declarations are the main method for importing external packages and modules into a TypeScript source file. When visiting an import declaration, the analyser stores a reference to the imported package for each imported module. Imports in TypeScript come in three main types: Default, Namespace, and Named.

  – Default imports the default export from the package.

    Listing 3.2: Default import

        import Module from 'package';

  – Namespace imports the entire package namespace into a single variable.

    Listing 3.3: Namespace import

        import * as Alias from 'package';

  – Named imports one or more individual modules from the package.

    Listing 3.4: Named import

        import { Module1, Module2 } from 'package';

  In addition, all imports can be declared with aliases, where the actual variable name differs from the module name. It is also possible to combine different import types in a single import statement, which is also handled by the visitor.

    Listing 3.5: Combined type import

        import DefaultModule, { Module1 } from 'package';
• Require calls are another way to import packages and modules. These function similarly to import statements but are assigned as a regular variable declaration and are therefore processed slightly differently from import statements, although the resulting reference is the same.

    Listing 3.6: Require calls

        const module = require('package');
        const { module1, module2 } = require('package');

• Variable declarations and property assignments are treated in much the same way. If a property of an imported package is assigned to a variable or object property, the assigned variable is stored with a reference to the imported property it was assigned with. However, if the variable is assigned the result of a method call, it is not stored, as the analyser does not handle return objects at this moment. For a further description of the handling of method calls, see Call expressions.

    Listing 3.7: Variable declaration and property assignment

        let variable = module.property;
        object.property = module.property;

• Binary expressions represent, among other types of expressions, variable assignments in the AST, i.e., assignment of values to already declared variables. These are identifiable by an Identifier node as the left node, followed by an equals sign. In contrast to variable declarations and property assignments, these are not represented by a unique node type in the AST, which is why this special handling is needed. Apart from that, these instances are treated in the same way.

    Listing 3.8: Variable assignment binary expression

        variable = module.property;

• Function declarations are partially handled by the analyser. While functions themselves are currently not handled, function parameters may represent assignment of an imported type or interface. Therefore, parameters with imported types are also stored as references to imported modules. This allows method calls on these parameters inside the function to be tracked as well.

    Listing 3.9: Function declaration

        function handle(parameter: ImportedModule) {}

• Call expressions are the node representation of function and method calls. If the property that contains the method being called is a reference to an imported module, the method is tracked as used and added to the result of the analysis.

    Listing 3.10: Method call expression

        module.method();

Reference Handling

Upon finding an instance of an imported method or property being referenced, that instance needs to be stored, either to be able to detect further references to the referenced symbol, or to later extract all used properties and method calls. In each source file object created by a
NodeVisitor, a set of dependencies is stored, either as a single vsCodeDependency or as a list of importDependencies. These each represent an imported npm package. Each dependency contains the name of the dependency, a usedProps property, and an importReferences list.

importReferences

References to a dependency throughout the analysis are stored in a nested list of ImportReference objects. At the top level, DirectImportReference objects are stored, which represent references directly connected to the package. These are references that are added through import statements or require calls. Each reference also stores subReferences. These are assignments that refer to this reference. References are identified using Identifier nodes. When adding a new reference to the structure, all Identifier nodes that reference this node are found using ts-morph functionality and stored in the usages field. When, for example, a call expression or variable assignment is found during analysis, the expression is mapped against existing reference usages to identify whether the expression references an imported package or one of its subreferences. In addition to these fields, each reference also contains the name, any potential alias of the reference (i.e., alias imports or variable names), its declaration Identifier, as well as a reference to the parent node of the tree, which is either a reference or a dependency. The structure of these objects can be seen in listing 3.11.

Listing 3.11: Example of the ImportReference JSON structure.

    DirectImportReference {
      name: "module",
      identifier: Identifier,
      usages: [Identifier1, Identifier2, ...],
      subReferences: [
        SecondaryImportReference {
          name: "property",
          identifier: Identifier1,
          usages: [Identifier3, Identifier4, ...],
          subReferences: [...],
          aliasReference: "variableName",
          reference: Parent,
        },
        ...
      ],
      aliasReference: "variableName",
      dependency: Parent
    }

usedProps

When imported methods are called or properties accessed, these are also pushed to a separate set called usedProps. This is stored as a nested object, but in contrast to importReferences, it only contains unique values and only the name of the resource as defined by the VS Code API. This is the list of props that is extracted from the source file after analysis. An example can be seen in listing 3.12.

Listing 3.12: Example of the usedProps JSON structure

    usedProps: {
      module1: {
        property1: {
          method1: {}
        },
        method2: {}
      },
      module2: {
        method3: {}
      }
    }

Used Prop extraction

Once analysis of all source files of an extension is finished, all usedProps are combined into a single object for the entire extension. This is done in the provideResults method in each NodeVisitor. The method iterates through each source file and dependency, merging the usedProps into a single object per dependency. The resulting list of dependencies and used props is then saved to the analyser.

Permissions

As no permission system is implemented for VS Code, there exists no mapping between API methods and potential warning messages. This mapping therefore has to be created manually.

API extraction

The VS Code API is available from the documentation and is generated from a vscode.d.ts type declaration file [55]. In order to work with the API, it is first extracted to a JSON file. This is also done using the ts-morph library. Using the AST of the file, all namespace and interface/class names can be easily extracted and stored. For each namespace and interface/class, all properties and methods are then extracted and stored. If a property is of a type defined by the API, the type name is also stored in the property to map the property to any method usage found during analysis. Finally, the extracted API is stored as a JSON file to be imported during analysis.

Permission categories

When designing permission categories for the Extension API, a strategy similar to the system implemented in the WebExtensions API [31] is used. Each namespace in the Extension API is treated like an API in the WebExtensions API. These are therefore each mapped to a permission message as seen in table 3.1.
These messages are formulated to be short and concise so as not to overload a potential user with information, while also assuming that the user has some computer knowledge and is familiar with basic concepts in VS Code such as workspaces and commands.

However, as many studies on browser extension permissions suggest, the granularity of grouping permissions by namespace or API is often not enough, leading to over-privilege issues [8, 28, 36]. In an attempt to combat this, additional messages are mapped to properties and methods of namespaces. In the case of properties being of types defined in the API, some interfaces have messages mapped to properties and methods as well, so as to define the specific types of actions being done to the properties. While these more granular permissions could be added to all possible actions, focus has been put on those that would be deemed to be of a more sensitive nature, based on their corresponding permissions in browser extensions, as described in section 2.6. These are methods and properties related to file access and modification, script execution, interactions with external URIs, and system-specific resources such as machine IDs and the clipboard. A full list of all messages and the corresponding methods and properties they are mapped to can be seen in appendix B. Each message is stored using an enum mapping to the message. The enums were in turn added manually to each relevant instance in the API JSON file.

Table 3.1: Permission messages related to each namespace.

    Namespace        Permission message
    authentication   Interact with and handle third-party authentication providers
    commands         Interact with VS Code commands
    comments         Interact with the Comments interface
    debug            Interact with the Debug interface
    env              Interact with the editor environment
    extensions       Interact with installed extensions
    languages        Interact with language features
    scm              Interact with Source Control Managers
    tasks            Interact with VS Code Tasks
    window           Interact with the editor window
    workspace        Interact with the current workspace

Permission mapping

As both the used methods and properties and the extracted API are stored as hierarchical trees identified by strings, the used methods and properties are mapped recursively directly to the API. If any namespace, property, method, etc. in the recursive chain maps to an instance in the API that contains a permission message, that message is added to a separate list of triggered permissions for the application. In the case of a method being applied to a namespace property, this method permission is mapped as a child of the property permission. This allows the analyser to see, for example, that it is specifically the document in the active editor that is being edited, rather than any arbitrary document, increasing the granularity further.
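The recursive mapping can be sketched as follows. This is a simplified illustration under stated assumptions rather than the implementation in the tool: the ApiNode and PermissionMessage types and the mapPermissions function are hypothetical, the API tree is a hand-written fragment instead of the extracted JSON file, messages are stored as plain strings rather than enums, and the step of following a property to the interface of its type is left out. The message "Edit the document" is invented for the example.

```typescript
// Nested record of used namespaces, properties and methods (as in usedProps).
interface UsedProps {
  [name: string]: UsedProps;
}

// Node of the API tree; `permission` is set wherever a message is mapped.
interface ApiNode {
  permission?: string;
  members?: { [name: string]: ApiNode };
}

// A triggered permission together with its more specific child permissions.
interface PermissionMessage {
  message: string;
  subMessages: PermissionMessage[];
}

// Walk usedProps and the API tree in parallel, collecting mapped messages.
function mapPermissions(
  used: UsedProps,
  api: { [name: string]: ApiNode }
): PermissionMessage[] {
  const messages: PermissionMessage[] = [];
  for (const name of Object.keys(used)) {
    const apiNode = api[name];
    if (apiNode === undefined) {
      continue; // Not part of the API tree, so there is nothing to map.
    }
    const children = mapPermissions(used[name], apiNode.members || {});
    if (apiNode.permission !== undefined) {
      // Messages found deeper in the chain become children of this message.
      messages.push({ message: apiNode.permission, subMessages: children });
    } else {
      // No message at this level: pass any child messages up unchanged.
      messages.push(...children);
    }
  }
  return messages;
}

// Hand-written API fragment; "Edit the document" is a made-up message.
const apiFragment: { [name: string]: ApiNode } = {
  window: {
    permission: "Interact with the editor window",
    members: {
      activeTextEditor: {
        permission: "Access the active editor",
        members: { edit: { permission: "Edit the document" } },
      },
      showInformationMessage: {},
    },
  },
};

// Hypothetical usedProps object for a single extension.
const usedPropsExample: UsedProps = {
  window: {
    activeTextEditor: { edit: {} },
    showInformationMessage: {},
  },
};

const triggered = mapPermissions(usedPropsExample, apiFragment);
```

In this example, the call to edit on the active editor yields a three-level chain of messages, while showInformationMessage contributes nothing beyond the namespace-level message because no message is mapped to it.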
Data extraction

The analysis program is designed for the resulting data to be processed by some other program for visualisation, for example a web client, which is why the resulting data is exported in JSON format. This format was chosen because of two advantages. First, it is easily readable both by humans and by most programming languages, as it stores data as readable key-value pairs. Second, it is easy to extract the data from TypeScript objects using built-in serialisation methods. An example of the output data format can be seen in listing 3.13.

Listing 3.13: Example of the output data JSON structure

    {
      "extension_1": {
        "messages": [
          {
            "message": "Interact with the editor window",
            "subMessages": [
              {
                "message": "Access the active editor",
                "subMessages": []