Implementation and Evaluation of an Emulated Permission System for VS Code Extensions using Abstract Syntax Trees

Linköping University | Department of Computer and Information Science

               Master’s thesis, 30 ECTS | Computer Science and Engineering

                                     2021 | LIU-IDA/LITH-EX-A--21/054--SE

Implementation and Evaluation of an Emulated Permission System
for VS Code Extensions using Abstract Syntax Trees
Implementation och Utvärdering av ett Emulerat Behörighetssystem
för Extensions i VS Code med hjälp av Abstrakta Syntaxträd

David Åström

Supervisor : Rouhollah Mahfouzi
Examiner : Ahmed Rezine

External supervisor : Magnus Kraft

                                                      Linköpings universitet
                                                        SE–581 83 Linköping
                                                +46 13 28 10 00 , www.liu.se
Copyright
The publishers will keep this document online on the Internet ‐ or its possible replacement ‐ for a
period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download,
or to print out single copies for his/her own use and to use it unchanged for non‐commercial
research and educational purpose. Subsequent transfers of copyright cannot revoke this permission.
All other uses of the document are conditional upon the consent of the copyright owner. The publisher
has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is
accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures
for publication and for assurance of document integrity, please refer to its www home page:
http://www.ep.liu.se/.

© David Åström
Abstract

Permission systems are a common security feature in browser extensions and mobile
applications to limit their access to resources outside their own process. IDEs such as Visual
Studio Code, however, have no such features implemented, and therefore leave extensions
with full user permissions. This thesis explores how VS Code extensions access external
resources and presents a proof-of-concept tool that emulates a permission system for
extensions. This is done through static analysis of extension source code using abstract
syntax trees, scanning for usage of Extension API methods and Node.js dependencies.
The tool is evaluated and applied to 56 popular VS Code extensions to determine which
resources are most prevalently accessed and how. The study concludes that most extensions
use minimal APIs, but often rely on Node.js libraries rather than the API for external
functionality. This leads to the conclusion that the inclusion of Node.js dependencies and npm
packages is the largest hurdle to implementing a permission system for VS Code.
Acknowledgments

First of all, I would like to thank Cybercom Stockholm for the opportunity to conduct my
thesis at their Innovation Zone and their constant support throughout the project. I would
especially like to thank my supervisor Magnus Kraft as well as the other thesis students at
the company for all the help and moral support during the distance work situation. I also want
to thank my university supervisor Rouhollah Mahfouzi and examiner Ahmed Rezine for
their feedback and aid with the project.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Listings

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Delimitations

2 Theory
  2.1 Software Extensions
  2.2 Permission systems
  2.3 Visual Studio Code
  2.4 TypeScript
  2.5 Static Analysis
  2.6 Related work

3 Method
  3.1 Pre-study
  3.2 Implementation
  3.3 Evaluation

4 Results
  4.1 Pre-study
  4.2 Implementation
  4.3 Evaluation

5 Discussion
  5.1 Results
  5.2 Method
  5.3 Source criticism
  5.4 The work in a wider context

6 Conclusion
  6.1 Future work

Bibliography

A Tested Extensions
B Non-namespace permission messages
C Testing results - Permissions
D Testing results - Dependencies

List of Figures

3.1   Overview of the analyser architecture
3.2   AST generated from the line let variable = object.function(prop1, prop2)
3.3   Visualisation of the traversal algorithm
3.4   Visualisation of the visitor pattern

List of Tables

2.1   VS Code API namespaces, with description if applicable, as cited from the official
      documentation.
3.1   Permission messages related to each namespace.
4.1   Permission messages detected in Bracket Pair Colorizer 2.
4.2   Permission messages detected in Path Intellisense.
4.3   Permission messages detected in Live Server.
4.4   Precision and Recall Metrics.
4.5   Commonly occurring and interesting permissions found during testing
4.6   Commonly occurring and interesting dependencies found during testing

Listings

 3.1    Tree traversal algorithm
 3.2    Default import
 3.3    Namespace import
 3.4    Named import
 3.5    Combined type import
 3.6    Require calls
 3.7    Variable declaration and property assignment
 3.8    Variable assignment binary expression
 3.9    Function declaration
 3.10   Method call expression
 3.11   Example of the ImportReference JSON structure.
 3.12   Example of the usedProps JSON structure
 3.13   Example of the output data JSON structure
 3.14   Bash command to extract installed extensions
 4.1    Bash command to run the analysis tool
 4.2    Code line triggering Interact with the editor environment
 4.3    Code line triggering Access all visible extensions
 4.4    Code line triggering Access the active editor

1 Introduction

1.1 Motivation
Many popular Integrated Development Environments (IDEs) and text editors use extensions or
plugins as a way for the user to extend functionality and tune the system to their liking. This
offers customisation to the user but could also pose risks. Extensions are external software,
often open-source, which are installed onto a system and given rights to control certain parts
of the application and/or system. It is therefore very possible that they could introduce new
security vulnerabilities.

Most web browsers implement similar systems for extensions, and research has been done on
the security of browser extensions. Results have shown that extensions can introduce
vulnerabilities, or even function as malware. However, little to no formal research
seems to have been done on the security of IDE extensions, and it is rarely discussed in
internet communities. Despite this, many developers tend to install such plugins without
much consideration. In today’s work environment, which is becoming more and more remote,
the dependency on these digital tools and extensions becomes even more important. At the
same time, the possibility for companies to control and monitor them decreases,
which makes this issue worth exploring.

In this thesis, the security of extensions in the popular open-source IDE Visual Studio Code
(VS Code) will be explored and evaluated. The focus will be on the resources and services the
extensions use, and what security implications that entails. In contrast to most browsers, VS
Code does not provide a system for the user to view or control permissions for extensions to
address this. In a solution similar to those implemented by browsers, the user would be able
to choose what permissions the extension is granted, and thereby which resources and services
it can access.

1.2 Aim
As a first step towards this functionality, this thesis aims to investigate what resources a
VS Code extension may have permission to access, and how access to these is implemented
today. Based on this investigation, the possibility of detecting instances of these accesses
automatically, by statically analysing an extension, will be explored. This aims to result in a
proof-of-concept tool, which will then be validated by using it to evaluate a number of popular
VS Code extensions. By doing this evaluation, the aim is to gain a better understanding of
what resources and services are more commonly accessed in the current landscape of extensions,
and what security implications that may have.

1.3 Research questions
Through these studies, this thesis aims to answer the following research questions:

   RQ1. What external resources can be accessed in a VS Code extension and how is this
        implemented?

   RQ2. Is it possible to detect occurrences of accessing external resources through static
        analysis of the extension source code?

   RQ3. What resources and services are more prevalently accessed in practice in popular
        extensions?

1.4 Delimitations
The study will focus only on VS Code extensions and will not touch upon extensions for other
IDEs unless they are relevant as related work.

It will also focus on providing a proof-of-concept rather than a complete solution. Because of
this, the study will not necessarily cover all possible resources that an extension may be able
to access, nor all supported language constructs in TypeScript, but rather focus on a limited
set of common resources that are deemed relevant to examine.

Lastly, the thesis only focuses on VS Code extensions written in TypeScript, as these are
the most common. Therefore, it will not attempt to analyse extensions written directly in
JavaScript, extensions that do not utilise scripting, or extension components written in other
languages.

2 Theory

This chapter establishes the theoretical ground on which the study is built. First,
it covers background which introduces the themes relevant for the study. Following this,
the chapter covers related work which has been published in scientific conferences and
journals.

2.1 Software Extensions
Software extension is a broad concept. Also known as plugins or add-ons, extensions are a type
of smaller program that builds upon another host program to expand the functionality of the
latter [12]. Extension systems are common in many types of software as a way to provide
modularity to the user experience. Instead of the program developers having to design all
possible features a user may want, third-party developers, or the users themselves, can extend
the program with the features they need [48].

Besides the benefits from a feature standpoint, designing a program for extensibility can have
other positive implications. Wagner et al. [56] mention that designing for extensibility requires
high modularity in the host program, something that is generally seen as an important aspect
when designing software systems [18]. Designing for this modularity can often help reduce the
complexity of the program as each module is responsible for a specific task. This modularity,
and the possibility to even design intended main features of a program as extensions, can allow
the host program to include only the central features. Instead, most features can be made
optional to the user, something that greatly improves the possibility to customise the entire
program to the individual user. This also makes deployment of new features a simpler task, as
these can simply be plugged in as an extension [56]. By providing the extension functionality
as an Application Programming Interface (API), the extensibility can be provided without the
programmer even needing comprehensive knowledge of the host program.

Wagner et al. [56], however, also mention a drawback of providing extensibility: the extension
system needs to be robust enough to handle the complexity of several extensions, which may
or may not conflict with each other.


Browser extensions
A common domain of extensible programs is web browsers. Most popular web browsers
support extensions in some way: Google provides a platform for extensions for the Chrome
Browser through its Chrome Web Store [14], Apple Safari has extension support through the
App Store [45], and Mozilla Firefox and Microsoft Edge use the term add-ons and provide them
through their respective stores [2, 39].

2.2 Permission systems
Permission systems are a common security feature of many client-based software domains. A
permission system is a type of access control that aims to restrict an application’s access to
sensitive resources [44]. While there is no standard method to implement these, they are
generally based on the principle of least privilege [47]. Permission systems generally base
their functionality on running applications in isolated environments with minimised privileges,
where the application cannot directly access user-owned resources [44]. These could include
sensitive data, access to hardware and sensors, access to the file system etc., all of which
could include some kind of private information. To provide access to required resources, the
host exposes some kind of API for the application to use. However, in order to use these, the
application needs to request which APIs it requires access to, which the user must manually
approve before access is granted.

In the Android operating system [5], applications are isolated into individual processes called
sandboxes [6]. Applications cannot interact with restricted data or actions by default, which
includes resources that may include sensitive information. Android divides permissions
into Install-time permissions and Runtime permissions. Install-time permissions are presented
before installation of the application and are granted automatically upon installation. These
represent data and actions outside of the sandbox which are generally deemed to pose less risk
to the user’s privacy or to other applications [43]. Runtime permissions represent more sensitive
resources. These need to be actively requested before accessing the resource and trigger a
prompt which the user must approve before the permission is granted [43]. Both permission
types require the permission to be declared in the application’s manifest file [43].

Browser extension permission systems
All major browsers have a permission system implemented for their extensions [42]. While
these have been developed separately for many years, an emerging standard has appeared,
called the WebExtensions API [31]. Extensions have very limited functionality by default,
and in order to access more powerful functionality and resources, the browsers expose a set
of JavaScript APIs. Each API corresponds to a permission which the extension must request
access to in an extension manifest file called manifest.json, which is also standardised. As
with Android permissions, extension permissions can be declared as required or
optional. The required permissions are granted at install-time, while optional permissions
must be granted by the user during runtime.

In addition to permissions for APIs, extensions can also declare host-permissions, which define
what web hosts the extension may interact with. These declarations are also made in the
manifest.json and can be made required or optional. Declaration is done through pattern
matching, so extensions can request access either to specific hosts, or broader patterns
covering some general domains, or even any host [19].
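
To illustrate how pattern-based host permissions could be checked, the following is a minimal
sketch. It assumes a simplified pattern syntax in which * matches any run of characters and
the special token <all_urls> matches every host. The function names patternToRegExp and
hostAllowed are hypothetical, and actual browsers apply stricter matching rules than shown here.

```typescript
// Simplified illustration only; real browsers implement stricter match-pattern rules.
function patternToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters (except "*"), then turn each "*" into ".*".
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

// Returns true if the URL is covered by at least one granted host pattern.
function hostAllowed(url: string, grantedPatterns: string[]): boolean {
  return grantedPatterns.some(
    (p) => p === "<all_urls>" || patternToRegExp(p).test(url)
  );
}

// A specific domain pattern versus a request to an unrelated host.
const granted = ["https://*.example.com/*"];
const allowedRequest = hostAllowed("https://api.example.com/v1/users", granted);
const blockedRequest = hostAllowed("https://evil.test/", granted);
```

The broader the declared pattern, the more requests pass the check, which is why a pattern
covering any host is the most powerful declaration an extension can make.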

Permissions are divided based on how powerful they are. If a permission is less powerful, and
therefore of lower risk, no action is required from the user, and the extension is
stated to require no special permissions. Using more powerful, higher-risk permissions, however,
results in a warning message being presented to the user. Each message is mapped to one or
more permissions in a table and describes in a short sentence what the implications of allowing
the permissions are [20].

While the overall structure and APIs are more or less standardised between browsers, the
actual implementation is up to the individual browsers, which has led to a few differences
in compatibility and how the APIs are implemented [13]. In Chrome, Opera, and other
Chromium-based browsers, the extension API is implemented under the chrome namespace,
using callbacks rather than promises. In Firefox and Edge, the API is instead
implemented under the browser namespace; in Firefox the APIs are implemented using promises,
while Edge uses callbacks. Compatibility also varies between browsers: some APIs are
not supported at all by some browsers, while some are only partly supported [9].
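
The difference between the callback style and the promise style can be sketched as follows.
The Tab type and the functions queryTabsWithCallback and queryTabs are hypothetical stand-ins
and not part of any browser API. The stub calls back immediately to keep the example
self-contained, whereas a real API would respond asynchronously.

```typescript
// Hypothetical stand-in types and functions; not a real browser extension API.
type Tab = { id: number; url: string };

// Callback style: the result is delivered to a function passed by the caller.
function queryTabsWithCallback(callback: (tabs: Tab[]) => void): void {
  callback([{ id: 1, url: "https://example.com/" }]);
}

// Promise style: the same operation wrapped so that the result can be awaited.
function queryTabs(): Promise<Tab[]> {
  return new Promise((resolve) => queryTabsWithCallback(resolve));
}

// Callback usage.
queryTabsWithCallback((tabs) => console.log("callback:", tabs.length));

// Promise usage.
queryTabs().then((tabs) => console.log("promise:", tabs.length));
```

An extension targeting several browsers typically needs a thin wrapper of this kind, since the
same operation is exposed in different styles and under different namespaces.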

2.3 Visual Studio Code
Visual Studio Code, often abbreviated as VS Code, is a free code editor developed by
Microsoft [54]. By default, it provides a relatively small core with support for JavaScript,
TypeScript and Node.js [41], as well as features such as a built-in Git client [22]. This core is
also available open-source, mainly implemented in TypeScript, which allows companies and teams
to implement their own version of the editor, or incorporate the editor into other software [40].

VS Code extensions
While the default functionality in VS Code is relatively small, one of its main features is its
extensibility: further features can be added through extensions [35] to customise
the editor for the individual user’s needs and preferences. VS Code uses an Extension host,
a separate Node.js [41] process which hosts and manages extensions [23]. Through this, the
functionality of extensions is separated from the main VS Code process, which limits
extensions’ ability to interact with the program in unintended ways, for both stability and
security reasons [26].

Similarly to browser extensions, the Extension host exposes a set of API endpoints which the
extension can use to access and interact with VS Code itself, its environment, workspaces,
editors etc. [55]. The VS Code API is divided into 11 namespaces with different functionality.
These contain the main resources and methods related to the host program. A list of these
together with a brief description of each namespace can be seen in table 2.1.

Each VS Code extension consists of a few basic components. At its core, an extension is also
a separate Node.js package and process. Therefore, it must contain a package.json [51] file
in its root folder. This is a manifest file that defines various properties of a Node.js package
but is also used as an extension manifest by VS Code. While it may contain several different
properties, the VS Code extension documentation [23] lists a few base properties that are the
most important for extensions. The name and publisher fields are used as a unique ID for VS
Code to identify the extension. main declares the main entry file of the extension. This is the
file that contains the activate and deactivate functions that the Extension host invokes when
an activation event related to the extension is fired. Activation events are the events that cause
the extension to activate, such as certain commands or opening documents of a certain file
type [1]. These are declared in the activationEvents field. In addition to these, the extension
also declares how it contributes to VS Code in the contributes field. These contribution points
could for example be commands, themes, or languages [16]. Lastly, the extension manifest must
also declare the lowest supported VS Code version through engines.vscode. This property also
allows the extension to use the VS Code API in its source files.
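
The manifest properties described above can be summarised in a short sketch. The field names
follow the properties named in the text, but the ExtensionManifest interface, the extensionId
helper, and all sample values are hypothetical and cover only a small part of what a
package.json may contain.

```typescript
// Field names follow the properties described in the text; the interface itself
// and the sample values are hypothetical.
interface ExtensionManifest {
  name: string;
  publisher: string;
  main: string;                          // entry file exporting activate/deactivate
  activationEvents: string[];            // events that cause the extension to activate
  contributes: Record<string, unknown>;  // contribution points (commands, themes, ...)
  engines: { vscode: string };           // lowest supported VS Code version
}

const manifest: ExtensionManifest = {
  name: "sample-extension",
  publisher: "sample-publisher",
  main: "./out/extension.js",
  activationEvents: ["onCommand:sample.hello", "onLanguage:typescript"],
  contributes: { commands: [{ command: "sample.hello", title: "Hello" }] },
  engines: { vscode: "^1.50.0" },
};

// The publisher and name fields together form the unique extension ID.
function extensionId(m: ExtensionManifest): string {
  return `${m.publisher}.${m.name}`;
}

console.log(extensionId(manifest));
```

Note that nothing in this manifest declares which resources the extension intends to access,
in contrast to the permission declarations in a browser extension's manifest.json.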


Table 2.1: VS Code API namespaces, with description if applicable, as cited from the official
documentation.
 Namespace          Description
 authentication     Namespace for authentication.
 commands           Namespace for dealing with commands. In short, a command is a function
                    with a unique identifier. The function is sometimes also called command
                    handler.
 comments           -
 debug              Namespace for debug functionality.
 env                Namespace describing the environment the editor runs in.
 extensions         Namespace for dealing with installed extensions. Extensions are represented
                    by an extension-interface which enables reflection on them.
 languages          Namespace for participating in language-specific editor features, like
                    IntelliSense, code actions, diagnostics etc.
 scm                -
 tasks              Namespace for tasks functionality.
 window             Namespace for dealing with the current window of the editor. That
                    is visible and active editors, as well as, UI elements to show messages,
                    selections, and asking for user input.
 workspace          Namespace for dealing with the current workspace. A workspace is the
                    collection of one or more folders that are opened in a VS Code window
                    (instance).

Extension language
As VS Code itself is built on a TypeScript code base [40], TypeScript is also the endorsed
language for building extensions. Both official and bundled extensions are generally built on
TypeScript, as are the extension examples in the official documentation [25]. However, since
TypeScript is compiled to JavaScript before runtime, as described in section 2.4, it is also
possible to develop extensions directly in JavaScript.

In addition, some extension functionality does not require scripting, and can simply be added
with JSON files. Examples of such are adding colour themes [15] or syntax highlighting support
for new languages [49].

2.4 TypeScript
TypeScript is a programming language that is built as a superset of JavaScript [53]. This means
that it is a language that builds upon JavaScript, supporting the full JavaScript language,
while adding features on top of it. The main feature offered by TypeScript is adding a static
type system to JavaScript. This system allows developers to define object types, but also
implements type inference, where the type of a variable is inferred from the value it is
assigned. This type system is said to help developers increase the structure of the code,
and help find errors earlier in the development phase [53].
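
A minimal sketch of both aspects, explicitly defined object types and type inference, is given
here. The Point interface and the function distanceFromOrigin are illustrative names only.

```typescript
// An object type explicitly defined by the developer.
interface Point {
  x: number;
  y: number;
}

// No annotation needed: `count` is inferred as number from its initial value.
let count = 3;
count += 1;

const origin: Point = { x: 0, y: 0 };

function distanceFromOrigin(p: Point): number {
  return Math.sqrt((p.x - origin.x) ** 2 + (p.y - origin.y) ** 2);
}

// `d` is inferred as number from the function's declared return type.
const d = distanceFromOrigin({ x: 3, y: 4 });

// The following line would be rejected at compile time, since a string
// is not assignable to a variable inferred as number:
// count = "three";
```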

TypeScript is implemented to be optional, in order to simplify adoption. It allows developers
to convert parts of the codebase of a project to TypeScript at a time, only adding typing where
necessary [53].

TypeScript is not traditionally compiled, but rather transformed into JavaScript before runtime.
This simplifies combining TypeScript and JavaScript in a project, but also adds another quirk.
Since JavaScript does not support static typing, the type information is lost in the
transformation. Therefore, the typing and type checks only exist during development and analysis
of the TypeScript code, and are not enforced during runtime [53].
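
The loss of type information at runtime can be illustrated with a minimal sketch. The User
interface and the greet function are hypothetical names introduced only for this example, and
the cast stands in for any value whose shape the type checker cannot verify.

```typescript
// Hypothetical interface, used only for illustration.
interface User {
  name: string;
}

function greet(user: User): string {
  // The type checker validates this access against User, but the emitted
  // JavaScript contains no trace of the interface.
  return "Hello, " + user.name;
}

// Data that arrives at runtime (here parsed JSON) has no static type.
const raw: unknown = JSON.parse('{"name": 42}');

// The cast satisfies the type checker, yet nothing verifies it at runtime:
// name is a number here and the call still succeeds.
const greeting = greet(raw as User);
console.log(greeting);
```

The call goes through even though the value does not match the declared type, which is the
behaviour described above: the checks exist during development only.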

2.5 Static Analysis
Static analysis is a general term for programmatically analysing software without executing
the code. This is in contrast to dynamic analysis, which analyses programs during runtime.
Static analysis normally analyses the non-compiled source code but may also analyse compiled
bytecode [34].

Abstract Syntax Trees
Abstract syntax trees (ASTs), or syntax trees, are an abstract representation of a program’s
syntax, commonly used in compiler design and derived during compilation [3]. An AST represents
syntax as a hierarchical tree, where each node and leaf represents a syntactical construct.

In addition to their use in compilers, ASTs are often a central part of static code analysis and
code checking. As an AST represents the source code in the context of the language, without
taking, for example, whitespace, dots, and commas into account, it allows for structured traversal
and analysis of the source code. It also allows for modification of the source code by inserting
or deleting nodes in the tree. One example of such a use case is linters. These static analysis
programs use the AST of source files to analyse the code [30]. They look for common coding
errors and oversights that the programmer may miss, and either notify the user or directly
correct them before the code is compiled. Such errors may be unused variables or imports,
unreachable code, or assignment of dereferenced pointers [32].
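
As a minimal sketch of how a linter-style check can operate on an AST, the example below walks
a small hand-written tree and reports declared variables that are never referenced. The AstNode
shape and the node kind names are simplifying assumptions made for this example and do not
correspond to the output of any particular parser.

```typescript
// Simplified, hypothetical node shape; real ASTs carry far more detail.
interface AstNode {
  kind: string;
  text?: string;
  children: AstNode[];
}

// Hand-written tree for: let unused = 1; let used = 2; print(used);
const program: AstNode = {
  kind: "Program",
  children: [
    {
      kind: "VariableDeclaration",
      text: "unused",
      children: [{ kind: "NumericLiteral", text: "1", children: [] }],
    },
    {
      kind: "VariableDeclaration",
      text: "used",
      children: [{ kind: "NumericLiteral", text: "2", children: [] }],
    },
    {
      kind: "CallExpression",
      children: [
        { kind: "Identifier", text: "print", children: [] },
        { kind: "Identifier", text: "used", children: [] },
      ],
    },
  ],
};

function findUnusedVariables(root: AstNode): string[] {
  const declared: string[] = [];
  const referenced: string[] = [];

  // Depth-first walk that records declarations and identifier references.
  const walk = (node: AstNode): void => {
    if (node.kind === "VariableDeclaration" && node.text !== undefined) {
      declared.push(node.text);
    }
    if (node.kind === "Identifier" && node.text !== undefined) {
      referenced.push(node.text);
    }
    node.children.forEach(walk);
  };
  walk(root);

  // A declared name that never appears as an identifier is reported.
  return declared.filter((name) => referenced.indexOf(name) === -1);
}

const unusedVariables = findUnusedVariables(program);
console.log(unusedVariables);
```

Because the check works on node kinds rather than on raw text, formatting differences in the
source file do not affect the result.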

2.6 Related work
This section introduces previous research done on the security of extensions, as well as the
implementation and effectiveness of permission systems as a security feature. This related
work is introduced to put the work of this thesis in a larger context.

Browser extension security
While the risks of IDE extensions have not been explored, the area of browser extensions has
been thoroughly researched. Several studies have evaluated popular extensions in search of
vulnerabilities.

For example, Carlini et al. [10] evaluated 100 Google Chrome extensions by analysing and
modifying their network traffic during runtime, as well as through manual static taint
analysis. They found that at least 40% of the analysed extensions contained one or more
vulnerabilities. When these vulnerabilities were evaluated against the security mechanisms in
Chrome (isolated environments, privilege separation, and permissions), the mechanisms were
overall effective at preventing vulnerabilities, but they also require developers to utilise them
properly. It was not uncommon for developers to actively circumvent them when implementing
features.

Bauer et al. [8] present a set of stealthier attacks to Chrome extensions. These range from
tracking and stealing user data and behaviour, to privilege escalation by having extensions
share data and state, thereby making data from an extension with limited permissions avail-
able to another with another set of permissions. Some of these could be solved by websites
implementing content security policies restricting the use of data from the website, as well
as enforcing policies for information flow in the browser. More fine-grained permissions could
solve some of the problems as well. Finally, they propose methods for increasing user awareness
when installing extensions and note the importance of providing the user with clear
information about an extension. This allows users to make informed decisions and could help
steer them to extensions requiring fewer permissions.

Wang et al. [57] used a modified browser based on Firefox to dynamically analyse 2,465 Firefox
extensions for vulnerable behaviours. They categorise vulnerable behaviours in groups of high,
medium, low, and none, based on their severity. The behaviours of highest severity include
arbitrary file access, functionality to download, install, and launch processes, access to network
functionality as well as injecting DOM objects. In their evaluation, they found occurrences
of all these behaviours, and while they are not necessarily malicious, they may expose some
vulnerabilities.

Effectiveness of permissions
The effectiveness of permissions as a security feature is another subject that has been well
covered in literature.

Felt et al. [28] evaluated the effectiveness of install-time permissions in Chrome extensions and
Android applications. They found that install-time permissions are an effective security feature,
but one that is often compromised by shortcomings in its design. Most evaluated applications
requested at least one dangerous permission. Several of these were deemed to be over-privileged,
i.e., they requested permissions that they did not actually need to fulfil their functionality. This
was due both to developer errors and to the available permissions not being granular enough,
requiring the application to request a permission where only a small part of it is needed. They
also discuss the issue of dangerous permissions being too common or too broad. This prevalence
causes desensitisation in users, where the warning messages lose their perceived importance.
Users simply accept warnings without paying closer attention to them.

Marouf et al. [36] studied the Chrome extension permission model and came to many of the
same conclusions as Felt et al. They propose a solution using optional runtime permissions,
which give the user more fine-grained control of the extension’s permissions during usage,
something that has since been implemented in both major browsers and Android applications [31, 43].

Over-privilege detection
The issue of over-privilege mentioned above has also been studied further. Here, the focus is
on detecting over-privileged applications and extensions.

A common approach is the use of various machine-learning technologies. Khazaei et al. [33]
propose a system called OPEXA, Over-Privileged EXtension Analyser, which uses natural language
processing on extension descriptions to detect what permissions the described functionality
requires. These results were compared to the actual declared permissions to detect over-privilege.
Shezan et al. [46] used a similar methodology but increased the dataset by creating a model
that maps permissions from different domains with similar functionality. By doing this,
browser extensions, mobile applications, and internet of things services can be analysed with
the same tool. On the other hand, Wu et al. [58] detected over-privileged Android applications
through data mining. They divided applications based on their store category and compared
applications’ declared permissions to a set of permissions commonly needed by applications in
the same category to detect over-privilege.

The issue of over-privilege is similar to the problem of this thesis, as it concerns analysing
what permissions the extension or application actually uses, or should reasonably need to use,
compared to what it declares. The machine-learning-based approaches discussed above do,
however, require a permission system to be implemented to compare against, which makes
those methods infeasible for VS Code extensions at the moment.

Tang et al. [50] take a static analysis approach to the problem in Android applications.
They decompile application binaries and check the resulting source code for instances of
permission-related method usages. After detecting what the application actually uses, they
also apply the semantic approach and compare the application description to the result of the
analysis to decide what represents over-privilege. Dennis et al. [21] present the tool P-Lint, a
linter that reverse engineers Android application code and detects use of permission methods,
with a particular focus on improper or vulnerable use patterns. In another study, Chester et
al. [11] present M-Perm, which combines static and dynamic analysis to detect both under-
and over-permission. They also use decompiled applications to statically compare the declared
permissions to those being used in each file. In addition, a call graph of the application is used
to deduce the reachability of each permission from each entry-point of the application. With
this, they can detect which permissions are active at each application state.

This approach shows that using static analysis to check the code for permission usages is a
valid approach. In Android applications, the fact that applications are published in compiled
form and need decompilation before analysis makes static analysis more difficult. Bartel
et al. [7] mention a few of these difficulties, which make naive static analysis insufficient in
many situations. One example is the difficulty of deciding which permission a permission use is
connected to, due to string literals not necessarily being preserved during decompilation. In
the case of VS Code extensions however, this should not be an issue due to the extensions
generally having their source code fully available as open-source.

Evaluation of static analysis tools
While security tools assess the quality of other software, the tools themselves need to be
evaluated in order to verify their effectiveness.

Soundness is a commonly used property of static analysers. It relates to whether the analyser
is guaranteed to report every item it should, i.e., if there exists a vulnerability or piece of code
that the analyser should be reporting, it will report that item [38]. While soundness can often be
a property to strive for, one issue is that it may result in a large number of false positives,
which in the end results in the true positives losing their value.

In contrast to soundness there is the concept of completeness. If an analyser is complete, it is
provable that all items reported are true positives [38]. As soundness often brings a lot of false
positives, completeness instead often brings false negatives - that is when items that should
be reported are missed by the analyser.

Both soundness and completeness are, by definition, all or nothing [38]. Therefore, in order
to prove soundness for example, a formal proof is often needed, which is often not practical in
real contexts. Instead of aiming for proving soundness, aiming for a probably sound analysis is
often more practical. This can be done through for example exhaustive testing or as done by
Andreasen et al. [4], through comparing the static analysis to that seen in dynamic analysis.

Evaluating static analysis tools often looks at detected and non-detected instances. These are
classified as four different categories:

  • True Positives - Vulnerabilities detected as such.

  • False Positives - Non-vulnerabilities detected as vulnerabilities.

  • True Negatives - Non-vulnerabilities not detected by the tool.

  • False Negatives - Vulnerabilities missed by the tool.

These categories can be compared in various ways to calculate various metrics related to the
effectiveness in detecting vulnerabilities. In order to measure soundness and completeness, two
metrics are commonly introduced - Recall and Precision.

Recall measures the level of soundness of an analyser. A high recall metric means a higher
soundness. Recall is calculated by comparing the number of true positives to the total number
of items the analyser should report [38]. The formula therefore is:

\[
\text{Recall} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}}
\]

Precision, on the other hand, measures the proportion of reported items that are true positives,
that is, the level of completeness of the analyser [38]. This is calculated as:

\[
\text{Precision} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalsePositives}}
\]

Tang et al. [50] compare the result of their static and semantic analysis to a human reading the
description of an app. Based on the text, the reader makes an assumption of what permissions
would be needed for this functionality. Based on these results, they calculate precision and
recall, and also introduce two additional metrics:
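\[
\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
\[
\text{Accuracy} = \frac{\text{TruePositives} + \text{TrueNegatives}}{\text{TruePositives} + \text{FalsePositives} + \text{TrueNegatives} + \text{FalseNegatives}}
\]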

F-measure aims to find the best compromise between precision and recall, while accuracy
gives a general score on how well the analysis performs in regard to both true positives and
true negatives. In their study, they especially focus on the F-measure in order to maximise
soundness, while minimising the number of false positives.
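
The four metrics can be computed directly from the four categories. The sketch below is only an
illustration: the counts are made up and are not results from this thesis or from Tang et al.,
and returning 0 for an empty denominator is a convention assumed here for convenience.

```typescript
interface DetectionCounts {
  truePositives: number;
  falsePositives: number;
  trueNegatives: number;
  falseNegatives: number;
}

// Returns 0 for an empty denominator; this convention is an assumption of
// the sketch, as the formulas themselves leave that case undefined.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function recall(c: DetectionCounts): number {
  return ratio(c.truePositives, c.truePositives + c.falseNegatives);
}

function precision(c: DetectionCounts): number {
  return ratio(c.truePositives, c.truePositives + c.falsePositives);
}

function fMeasure(c: DetectionCounts): number {
  const p = precision(c);
  const r = recall(c);
  return ratio(2 * p * r, p + r);
}

function accuracy(c: DetectionCounts): number {
  const total =
    c.truePositives + c.falsePositives + c.trueNegatives + c.falseNegatives;
  return ratio(c.truePositives + c.trueNegatives, total);
}

// Made-up counts, used only to exercise the formulas.
const counts: DetectionCounts = {
  truePositives: 6,
  falsePositives: 2,
  trueNegatives: 9,
  falseNegatives: 3,
};

const metrics = {
  recall: recall(counts),
  precision: precision(counts),
  fMeasure: fMeasure(counts),
  accuracy: accuracy(counts),
};
console.log(metrics);
```

With these counts, precision is 6/8 = 0.75 and recall is 6/9, so the analyser reports few
incorrect items but misses a third of the items it should report.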

All these could be interesting to explore, and a real-life setting could also introduce aspects
such as the speed of the analysis as relevant metrics to measure. In this study, however, recall
and precision are chosen as the metrics to evaluate.

3 Method

This chapter documents the methodology used throughout the thesis. First, the pre-study is
described in section 3.1, followed by the implementation in section 3.2. This section describes
how the static analysis for detecting method usage in extensions was implemented. Finally,
section 3.3 describes how the effectiveness of the tool was evaluated by analysing a larger set
of extensions.

3.1 Pre-study
In order to gather enough information about VS Code extensions, permission systems, and
static analysis, to be able to answer RQ1 and determine the possibility of implementing a static
analysis tool that detects method usage in VS Code extensions, a pre-study was conducted.
This pre-study was divided into two main phases.

First, information on VS Code extensions and their anatomy and architecture was gathered.
This was mainly done through studying the official documentation which describes the inner
workings of extensions and how to develop your own [24].

The second phase consisted of gathering theoretical knowledge and related work on the subjects
relevant to the thesis. This was done by searching for scientific articles, mainly on
Google Scholar, using keywords such as extension security, permission models, static analysis,
and over-privilege. Through articles found with this method, further material was gathered through
their respective referenced articles and citations. In those cases where no published material
could be found, less formal material such as documentation and developer blogs was used to
gain a better understanding of the subject.

Most of the results from this study can be read in the Theory chapter, while answers to RQ1
are presented in the Results chapter.

3.2 Implementation
The implementation phase was conducted with the goal of creating a static analysis tool that,
with the input of the source code of a VS Code extension, can analyse the program and
return a list of the extension’s capabilities from a permission point of view. The tool itself
is built in TypeScript to simplify extracting ASTs for extension TypeScript source files. It is
implemented as a Node.js application to allow local execution and is currently implemented for
use as a command line interface (CLI), although measures have been taken to simplify connecting
the application to a web service or similar. An overview of the system can be seen in figure 3.1
and the individual components and tasks of the tool will be presented in detail below.

                       Figure 3.1: Overview of the analyser architecture

Defining the extension root
The first task of the analyser is to find and define the root directory of the extension within the
provided directory. The input directory is assumed to be cloned or downloaded directly from
a Git repository, and while most extensions are published by themselves, with the extension
root as the repository root, there might be exceptions to this. The extension MetaGo [37] is
an example of this, as it also publishes its sub-extensions MetaJump and MetaWord under
the same Git repository. To accommodate this, the possible extension roots of the provided
source code are identified by recursively searching through all subdirectories. The goal is to
find directories that contain both a package.json file, which indicates an extension, and a
tsconfig.json file, which indicates that TypeScript is used throughout the extension. Each
such occurrence is stored to be analysed separately as an individual extension.
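
The recursive search can be sketched as follows. To keep the example self-contained, the file
system is replaced by an in-memory Directory structure, which is an assumption of this sketch.
The repository layout and the function name findExtensionRoots are likewise hypothetical.

```typescript
// In-memory stand-in for a directory on disk (assumption of this sketch);
// a real tool would read these listings from the file system.
interface Directory {
  path: string;
  files: string[];
  subdirectories: Directory[];
}

function findExtensionRoots(dir: Directory): string[] {
  const roots: string[] = [];

  // A directory counts as an extension root when it holds both files.
  const hasManifest = dir.files.indexOf("package.json") !== -1;
  const hasTsConfig = dir.files.indexOf("tsconfig.json") !== -1;
  if (hasManifest && hasTsConfig) {
    roots.push(dir.path);
  }

  // Keep searching below a match: one repository may hold several extensions.
  for (const sub of dir.subdirectories) {
    roots.push(...findExtensionRoots(sub));
  }
  return roots;
}

// Hypothetical repository with a main extension and one sub-extension.
const repository: Directory = {
  path: "repo",
  files: ["package.json", "tsconfig.json", "README.md"],
  subdirectories: [
    {
      path: "repo/sub-extension",
      files: ["package.json", "tsconfig.json"],
      subdirectories: [],
    },
    { path: "repo/docs", files: ["README.md"], subdirectories: [] },
  ],
};

const extensionRoots = findExtensionRoots(repository);
console.log(extensionRoots);
```

Each returned path would then be analysed separately as an individual extension.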

    Abstract Syntax Trees
    In order to analyse each extension, an AST representation of each source file in the project is
    extracted. TypeScript supplies a compiler API through the TypeScript npm package. This
    is the same functionality used in the TypeScript compiler, and the API can therefore directly
    provide an AST representation of an imported .ts source file. An example of the AST structure
    provided by the compiler API can be seen in figure 3.2.

    Figure 3.2: AST generated from the line let variable = object.function(prop1, prop2)

    ts-morph
    While the AST and methods provided by the compiler API could be used directly for the
    analysis, ts-morph is an open-source library that wraps the compiler API in order to simplify
    navigation and manipulation of TypeScript ASTs [52]. ts-morph is used throughout this
    project to import and navigate the source file ASTs. Some examples of functionality ts-morph
    adds, that are used in this project, are the possibility of finding AST nodes of certain kinds,
    and also the mapping of identifiers to references of that identifier. This simplifies the task of
    finding uses of imported methods.

    AST traversal
    The analysis is done through traversing the AST of each project source file. Traversal is done
    depth first, using both preorder and postorder tree walk [17]. By doing this, actions and
    analysis of nodes can be done both on entry and exit of the node, i.e., before or after visiting
    each child of the node. This strategy is essentially the same as the one used by estraverse in
    ESLint [30]. The basic traversal algorithm is presented in listing 3.1 and is visualised further
    in figure 3.3.

                                 Listing 3.1: Tree traversal algorithm
    function traverse(node) {
      before(node)   // preorder: act on the node before visiting its children

      for (const childNode of node.getChildren()) {
        traverse(childNode)
      }

      after(node)    // postorder: act on the node after visiting its children
    }

                         Figure 3.3: Visualisation of the traversal algorithm

    Node Visitors
    Visiting the nodes during the traversal is implemented using a variant of a visitor pattern [29].
    However, as the node types were not modifiable, the usual double dispatch functionality could
    not be implemented, which is why the current solution resorts to a list of if statements for type
    checking. A UML diagram of the implemented pattern can be seen in figure 3.4.

    The visitor structure is based around an abstract NodeVisitor class. This class defines three
    methods. The before and after methods correspond to the same methods in the algorithm
    above. These take a node as input and define the visit behaviour for different node types,
    either to be handled before or after visiting the node’s children. Generally, most function-
    ality is implemented in the before method, but after is used in certain situations. The
    provideResults method is meant to be called after analysis is completed. It summarises
    and returns the results of the visitor’s analysis. The NodeVisitor class is purposefully
    designed to be modular to allow for extension of the analyser in the future if needed.

                            Figure 3.4: Visualisation of the visitor pattern

AbstractMethodVisitor
For this thesis, a single subtype of NodeVisitor is implemented, AbstractMethodVisitor.
This subtype is also abstract and contains functionality for searching source files for imported
method calls. In turn, two concrete classes of this abstract class have been implemented,
ImportMethodVisitor and VsCodeMethodVisitor. VsCodeMethodVisitor is responsible for
specifically analysing methods from the VS Code API, while ImportMethodVisitor is more
general and analyses other imported packages.

This subtype of NodeVisitor is based around individual source files. It requires analysis to
be started with a complete source file AST, i.e., an AST with a SourceFile node as root.
Before entering a source file, it stores a new SourceFileData object. This object contains
basic information about the source file, and dependencies and methods found during analysis
are stored in this object. Upon exiting the same node, the current SourceFileData object is
pushed to a separate list for later extraction.
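
The visitor structure and the per-file bookkeeping described above can be sketched as follows.
This is a simplified illustration and not the code of the tool: SimpleNode stands in for the
compiler's node types, and CallCollector and FileRecord are hypothetical names playing the roles
of a concrete visitor and of SourceFileData.

```typescript
// Simplified stand-in for the compiler's AST nodes (assumption).
interface SimpleNode {
  kind: "SourceFile" | "Block" | "CallExpression";
  text: string;
  children: SimpleNode[];
}

// Hypothetical per-file record, playing the role of SourceFileData.
interface FileRecord {
  fileName: string;
  calls: string[];
}

abstract class NodeVisitor<Result> {
  abstract before(node: SimpleNode): void;
  abstract after(node: SimpleNode): void;
  abstract provideResults(): Result;
}

class CallCollector extends NodeVisitor<FileRecord[]> {
  private current: FileRecord | undefined;
  private finished: FileRecord[] = [];

  before(node: SimpleNode): void {
    // Type checks through if statements take the place of double dispatch.
    if (node.kind === "SourceFile") {
      this.current = { fileName: node.text, calls: [] };
    } else if (node.kind === "CallExpression" && this.current !== undefined) {
      this.current.calls.push(node.text);
    }
  }

  after(node: SimpleNode): void {
    // On leaving a source file, its record is moved to the result list.
    if (node.kind === "SourceFile" && this.current !== undefined) {
      this.finished.push(this.current);
      this.current = undefined;
    }
  }

  provideResults(): FileRecord[] {
    return this.finished;
  }
}

function traverse(node: SimpleNode, visitor: NodeVisitor<unknown>): void {
  visitor.before(node);
  for (const child of node.children) {
    traverse(child, visitor);
  }
  visitor.after(node);
}

const tree: SimpleNode = {
  kind: "SourceFile",
  text: "extension.ts",
  children: [
    { kind: "CallExpression", text: "showInformationMessage", children: [] },
    {
      kind: "Block",
      text: "",
      children: [
        { kind: "CallExpression", text: "openTextDocument", children: [] },
      ],
    },
  ],
};

const collector = new CallCollector();
traverse(tree, collector);
const fileRecords = collector.provideResults();
console.log(JSON.stringify(fileRecords));
```

Keeping the traversal separate from the visitors means that additional visitors can be added
without changing how the tree is walked.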

Implemented language constructs
For the implementation, support for a reduced set of language constructs was implemented
in the analyser. These were chosen to represent a base set of the most common ways to
access dependencies, properties, and methods, as well as to store subreferences to these. In
the following sections, these constructs and their implemented behaviour will be described in
more detail.

  • Import declarations are the main method for importing external packages and mod-
    ules into a TypeScript source file. When visiting an import declaration, the analyser
    stores a reference to the imported package for each imported module. Imports in Type-
    Script come in three main types: Default, Namespace, and Named.

        – Default imports the default export from the package.

                                       Listing 3.2: Default import
            import Module from 'package';

        – Namespace imports the entire package namespace into a single variable.

                                     Listing 3.3: Namespace import
            import * as Alias from 'package';

        – Named imports one or more individual modules from the package.

                                       Listing 3.4: Named import
            import { Module1, Module2 } from 'package';

      In addition, it is possible for all imports to be declared with aliases, where the actual
      variable name is changed from the module name. It is also possible to combine different
      import types in a single import statement which is also handled by the visitor.

                                Listing 3.5: Combined type import
      import DefaultModule, { Module1 } from 'package';

  • Require calls are another way to import packages and modules. These function sim-
    ilarly to import statements but are assigned as a regular variable declaration and are
    therefore processed slightly different to import statements, although the resulting refer-
    ence is the same.

                                     Listing 3.6: Require calls
      const module = require('package');
      const { module1, module2 } = require('package');

  • Variable declarations and property assignments are treated in much the same
    way. If a property of an imported package is assigned to a variable or object property,
    the assigned variable is stored with a reference to the imported property it was assigned
    with. However, if the variable is assigned the result of a method call, it is not stored, as the
    analyser does not handle return objects at this moment. For a further description of the
    handling of method calls, see Call expressions.

                    Listing 3.7: Variable declaration and property assignment
      let variable = module.property;
      object.property = module.property;

  • Binary expressions represent, among other types of expressions, variable assignments
    in the AST, i.e., assignment of values to already declared variables. These are identifiable
    by an Identifier node as the left operand, followed by an equals sign. In contrast to variable
    declarations and property assignments, these are not represented by a unique node type
    in the AST, which is why this special handling is needed. Apart from that, these instances are
    treated in the same way.

                        Listing 3.8: Variable assignment binary expression
      variable = module.property;

  • Function declarations are partially handled by the analyser. While functions themselves
    are currently not handled, function parameters may represent assignment of an
    imported type or interface. Therefore, parameters with imported types are also stored
    as references to imported modules. This allows method calls on these parameters inside
    the function to be tracked as well.

                                 Listing 3.9: Function declaration
      function example(parameter: ImportedModule) {}

  • Call expressions are the node representation of function and method calls. If the
    property that contains the method being called is a reference to an imported module, the
    method is tracked as used, and added to the result of the analysis.

                               Listing 3.10: Method call expression
      module.method();
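
How these constructs interact can be sketched with a strongly simplified model. Instead of AST
nodes, the example uses a hypothetical Statement type that covers imports, assignments of the
form target = source.property, and calls of the form receiver.method(). Names are matched as
plain strings, so scoping and the identifier-based matching of the actual analyser are left out.

```typescript
// Hypothetical, flattened stand-ins for the AST constructs listed above.
type Statement =
  | { kind: "import"; local: string; packageName: string }
  | { kind: "assign"; target: string; source: string; property: string }
  | { kind: "call"; receiver: string; method: string };

function collectUsedMethods(statements: Statement[]): string[] {
  // Maps a local name to the path it refers to, starting at the package.
  // Object.create(null) avoids accidental hits on inherited properties.
  const references: { [local: string]: string[] } = Object.create(null);
  const used: string[] = [];

  for (const statement of statements) {
    switch (statement.kind) {
      case "import":
        references[statement.local] = [statement.packageName];
        break;
      case "assign": {
        // Only assignments from a tracked reference create a subreference.
        const source = references[statement.source];
        if (source !== undefined) {
          references[statement.target] = source.concat(statement.property);
        }
        break;
      }
      case "call": {
        // A call on a tracked reference marks the method as used.
        const receiver = references[statement.receiver];
        if (receiver !== undefined) {
          used.push(receiver.concat(statement.method).join("."));
        }
        break;
      }
    }
  }
  return used;
}

// Models: import * as vscode from 'vscode';
//         let ws = vscode.workspace;
//         ws.openTextDocument();
//         helper.run();   (helper is not imported, so it is ignored)
const usedMethods = collectUsedMethods([
  { kind: "import", local: "vscode", packageName: "vscode" },
  { kind: "assign", target: "ws", source: "vscode", property: "workspace" },
  { kind: "call", receiver: "ws", method: "openTextDocument" },
  { kind: "call", receiver: "helper", method: "run" },
]);
console.log(usedMethods);
```

The call through the variable ws is attributed to the imported package, while the call on the
untracked name is ignored.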

Reference Handling
Upon finding an instance of an imported method or property being referenced, that instance
needs to be stored, either to be able to detect further references to the referenced symbol, or
to later extract all used properties and method calls. In each source file object created by a
NodeVisitor, a set of dependencies is stored, either as a single vsCodeDependency or a list
of importDependencies. These each represent an imported npm package. Each dependency
contains the name of the dependency, a usedProps property, and an importReferences list.

     importReferences
     References to a dependency throughout the analysis are stored in a nested list of
     ImportReference objects. At the top level, DirectImportReference objects are stored, which
     represent references directly connected to the package. These are references that are added
     through import statements or require calls. Each reference also stores subReferences. These
     are assignments that refer to this reference. References are identified using Identifier
     nodes.

     When adding a new reference to the structure, all Identifier nodes that reference this node
     are found using ts-morph functionality and stored in the usages field. When, for example,
     a call expression or variable assignment is found during analysis, the expression is mapped
     against existing reference usages to identify whether the expression references an imported
     package or its subreferences. In addition to these fields, each reference also contains the name, any
     potential alias of the reference, i.e., alias imports or variable names, its declaration Identifier,
     as well as a reference to the parent node of the tree, either a reference or a dependency. The
     structure of these objects can be seen in listing 3.11.

                    Listing 3.11: Example of the ImportReference JSON structure.
     DirectImportReference {
       name: "module",
       identifier: Identifier,
       usages: [Identifier1, Identifier2, ...],
       subReferences: [
         SecondaryImportReference {
           name: "property",
           identifier: Identifier1,
           usages: [Identifier3, Identifier4, ...],
           subReferences: [...],
           aliasReference: "variableName",
           reference: Parent,
         },
         ...
       ],
       aliasReference: "variableName",
       dependency: Parent
     }
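
Mapping an expression against the stored usages amounts to a recursive search through the
nested references. The sketch below assumes a reduced ImportReference shape in which Identifier
nodes are replaced by string ids and the parent link is omitted. The function name
findReferenceByUsage and the sample data are hypothetical.

```typescript
// Reduced stand-in for the reference objects (assumption of this sketch).
interface ImportReference {
  name: string;
  alias?: string;
  identifier: string; // stands in for the declaration Identifier node
  usages: string[]; // stands in for Identifier nodes referencing it
  subReferences: ImportReference[];
}

// Returns the reference whose usages contain the given identifier id,
// searching subreferences depth first.
function findReferenceByUsage(
  references: ImportReference[],
  usage: string
): ImportReference | undefined {
  for (const reference of references) {
    if (reference.usages.indexOf(usage) !== -1) {
      return reference;
    }
    const nested = findReferenceByUsage(reference.subReferences, usage);
    if (nested !== undefined) {
      return nested;
    }
  }
  return undefined;
}

// Hypothetical data for: import * as vscode from 'vscode';
//                        let ws = vscode.workspace;
const importReferences: ImportReference[] = [
  {
    name: "vscode",
    identifier: "decl0",
    usages: ["use1", "use2"],
    subReferences: [
      {
        name: "workspace",
        alias: "ws",
        identifier: "use1",
        usages: ["use3", "use4"],
        subReferences: [],
      },
    ],
  },
];

const directHit = findReferenceByUsage(importReferences, "use2");
const nestedHit = findReferenceByUsage(importReferences, "use3");
const noHit = findReferenceByUsage(importReferences, "unknown");
console.log(directHit?.name, nestedHit?.name, noHit);
```

An identifier that is not found in any usages list does not refer to an imported package and
can be ignored by the analysis.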

     usedProps
     When imported methods are called or properties accessed, these are also pushed to a separate
     set called usedProps. This is stored as a nested object, but in contrast to importReferences,
     this only contains unique values and only the name of the resource as defined by the VS Code
     API. This is the list of props that is extracted from the source file after analysis. An example
     can be seen in listing 3.12.

                        Listing 3.12: Example of the usedProps JSON structure
     usedProps: {
       module1: {
         property1: {
           method1: {}
         },
         method2: {}
       },
       module2: {
         method3: {}
       }
     }
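
One way to build such a structure is to insert each used method as a path of names. The sketch
below is an assumption about how this could be done and not the tool's actual code. The helper
name addUsedProp is hypothetical.

```typescript
// Nested object in which every key maps to a further level of names.
interface UsedProps {
  [name: string]: UsedProps;
}

// Inserts a path such as ["module1", "property1", "method1"], creating
// missing levels and leaving existing ones untouched.
function addUsedProp(usedProps: UsedProps, path: string[]): void {
  let level = usedProps;
  for (const name of path) {
    if (!Object.prototype.hasOwnProperty.call(level, name)) {
      level[name] = {};
    }
    level = level[name];
  }
}

const usedProps: UsedProps = {};
addUsedProp(usedProps, ["module1", "property1", "method1"]);
addUsedProp(usedProps, ["module1", "method2"]);
addUsedProp(usedProps, ["module2", "method3"]);
// Adding the same path again does not create a duplicate entry.
addUsedProp(usedProps, ["module1", "property1", "method1"]);

console.log(JSON.stringify(usedProps, null, 2));
```

Since object keys are unique, repeated uses of the same method collapse into one entry, which
matches the requirement that the structure only contains unique values.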

     Used Prop extraction
     Once analysis of all source files of an extension is finished, all usedProps are combined into
     a single object for the entire extension. This is done in the provideResults method in each
     NodeVisitor. The method iterates through each source file and dependency, merging the
     usedProps into a single object per dependency. The resulting list of dependencies and used
     props are then saved to the analyser.
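
The merging step can be sketched as a recursive merge of nested objects. In this illustration
the dependency name is used as the top-level key of each per-file object, which is a
simplification. The function name mergeUsedProps and the sample data are hypothetical.

```typescript
interface UsedProps {
  [name: string]: UsedProps;
}

// Merges source into target level by level and returns target.
function mergeUsedProps(target: UsedProps, source: UsedProps): UsedProps {
  for (const name of Object.keys(source)) {
    if (!Object.prototype.hasOwnProperty.call(target, name)) {
      target[name] = {};
    }
    mergeUsedProps(target[name], source[name]);
  }
  return target;
}

// Hypothetical per-file results, keyed by dependency name.
const perFileProps: UsedProps[] = [
  { vscode: { window: { showInformationMessage: {} } } },
  {
    vscode: {
      window: { showErrorMessage: {} },
      workspace: { openTextDocument: {} },
    },
    fs: { readFileSync: {} },
  },
];

// Fold all source files into one object for the whole extension.
const extensionProps = perFileProps.reduce(
  (merged, fileProps) => mergeUsedProps(merged, fileProps),
  {} as UsedProps
);
console.log(JSON.stringify(extensionProps, null, 2));
```

Methods used in several files appear once in the merged object, and methods used in only one
file are preserved.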

     Permissions
     As no existing permission system is implemented for VS Code, there exists no mapping between
     API methods and potential warning messages. This mapping therefore has to be created
     manually.

     API extraction
     The VS Code API is available from the documentation and is generated from a vscode.d.ts
     type declaration file [55]. In order to work with the API, it is first extracted to a JSON file.
     This is also done using the ts-morph library. Using the AST of the file, all namespace and
     interface/class names can be easily extracted and stored. For each namespace and
     interface/class, all properties and methods are then extracted and stored. If a property is of a type
     defined by the API, the type name is also stored in the property to map the property to any
     method usage found during analysis. Finally, the extracted API is stored as a JSON file to be
     imported during analysis.

     Permission categories
     When designing permission categories for the Extension API, a similar strategy to the system
     implemented in the WebExtensions API [31] is used. Each namespace in the Extension API is
     treated as an API in the WebExtensions API. These are therefore each mapped to a permission
     message as seen in table 3.1. These messages are formulated to be short and concise so as to
     not overload a potential user with information, while also assuming that the user has some
     computer knowledge and is familiar with basic concepts in VS Code such as workspaces and
     commands.

     However, as many studies on browser extension permissions suggest, the granularity of
     grouping permissions by namespace or API is often not enough, leading to over-privilege
     issues [8, 28, 36]. In an attempt to combat this, additional messages are mapped to properties
     and methods of namespaces. In the case of properties being of types defined in the API, some
     interfaces have messages mapped to properties and methods as well, in order to define the
     specific types of actions being done to the properties. While these more granular permissions
     could be added to all possible actions, the focus has been on those deemed to be of a more
     sensitive nature, based on their corresponding permissions in browser extensions, as
     described in section 2.6. These are methods and properties related to file access and
     modification, script execution, interactions with external URIs, and system-specific resources
     such as machine IDs and the clipboard. A full list of all messages and the corresponding
     methods and properties they are mapped to can be seen in appendix B.

    Each message is stored as an enum member that maps to the message text. The enum members
    are in turn added manually to each relevant entry in the API JSON file.

                      Table 3.1: Permission messages related to each namespace.
     Namespace          Permission message
     authentication     Interact with and handle third-party authentication providers
     commands           Interact with VS Code commands
     comments           Interact with the Comments interface
     debug              Interact with the Debug interface
     env                Interact with the editor environment
     extensions         Interact with installed extensions
     languages          Interact with language features
     scm                Interact with Source Control Managers
     tasks              Interact with VS Code Tasks
     window             Interact with the editor window
     workspace          Interact with the current workspace
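As an illustration, three of the messages in table 3.1 could be stored in an enum and referenced from the API JSON as follows. The enum name, the permission field and the helper messageFor are assumptions; the thesis states only that an enum maps to each message and that the enums are added manually to the JSON file.

```typescript
// String enum: each member maps to the message text shown to the user.
enum PermissionMessage {
  Commands = "Interact with VS Code commands",
  Window = "Interact with the editor window",
  Workspace = "Interact with the current workspace",
}

type PermissionKey = keyof typeof PermissionMessage;

// Hypothetical fragment of the API JSON file after manual annotation.
// Entries without a mapped message simply omit the `permission` field.
const annotatedApi: Record<string, { permission?: PermissionKey }> = {
  commands: { permission: "Commands" },
  window: { permission: "Window" },
  workspace: { permission: "Workspace" },
  Uri: {},
};

// Look up the message text for an API entry, if one has been mapped.
function messageFor(entry: string): string | undefined {
  const key = annotatedApi[entry]?.permission;
  return key === undefined ? undefined : PermissionMessage[key];
}
```

Keeping only the enum key in the JSON file means a message can be reworded in one place without touching every annotated entry.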

    Permission mapping
    As both the used methods and properties and the extracted API are stored as hierarchical
    trees identified with strings, the used methods and properties are mapped recursively
    directly to the API. If any namespace, property or method in the recursive chain maps to an
    entry in the API that contains a permission message, that message is added to a separate
    list of triggered permissions for the application. When a method is applied to a namespace
    property, the method permission is mapped as a child of the property permission. This
    allows the analyser to see, for example, that it is specifically the document in the active
    editor that is being edited, rather than any arbitrary document, increasing the granularity
    further.
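The recursive mapping described above can be sketched as follows. The node shapes and function names are assumptions, and the message attached to edit is hypothetical; the other two messages are taken from table 3.1 and listing 3.13.

```typescript
// Assumed tree of used names, keyed by string.
interface UsedNode {
  [name: string]: UsedNode;
}

// Assumed API entry; all field names are illustrative.
interface ApiNode {
  message?: string; // permission message mapped to this entry, if any
  typeName?: string; // API-defined type of a property, if any
  children?: ApiTree;
}

type ApiTree = Record<string, ApiNode>;

interface Triggered {
  message: string;
  subMessages: Triggered[];
}

// Children of an entry: its own members, or the members of its API-defined type.
function childrenOf(node: ApiNode, types: ApiTree): ApiTree {
  if (node.children !== undefined) return node.children;
  if (node.typeName !== undefined) return types[node.typeName]?.children ?? {};
  return {};
}

// Walk the used tree and the API tree together, collecting triggered messages.
function mapPermissions(used: UsedNode, api: ApiTree, types: ApiTree): Triggered[] {
  const triggered: Triggered[] = [];
  for (const key of Object.keys(used)) {
    const node = api[key];
    if (node === undefined) continue; // not part of the extracted API
    const nested = mapPermissions(used[key], childrenOf(node, types), types);
    if (node.message !== undefined) {
      triggered.push({ message: node.message, subMessages: nested });
    } else {
      triggered.push(...nested); // no message here: pass child permissions up
    }
  }
  return triggered;
}

const apiNamespaces: ApiTree = {
  window: {
    message: "Interact with the editor window",
    children: {
      activeTextEditor: { message: "Access the active editor", typeName: "TextEditor" },
    },
  },
};
const apiTypes: ApiTree = {
  // Hypothetical message, used only for this example.
  TextEditor: { children: { edit: { message: "Edit the document" } } },
};

// Corresponds to an extension calling vscode.window.activeTextEditor.edit(...).
const usedTree: UsedNode = { window: { activeTextEditor: { edit: {} } } };
const triggeredPermissions = mapPermissions(usedTree, apiNamespaces, apiTypes);
```

In this sketch the edit permission ends up nested under the active editor permission, which in turn sits under the window permission, mirroring the parent and child relation described in the text.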

    Data extraction
    The analysis program is designed so that its results can be processed by another program
    for visualisation, for example a web client, which is why the resulting data is exported in
    JSON format. JSON was chosen for two reasons. First, it is easily read both by humans and
    by most programming languages, as it stores data as readable key-value pairs. Second, the
    data is easy to extract from TypeScript objects using built-in serialisation methods. An
    example of the output data format can be seen in listing 3.13.
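A minimal sketch of the serialisation step, assuming the built-in JSON.stringify is the serialisation method meant. The type names OutputMessage and ExtensionResult are hypothetical; the keys and message texts follow listing 3.13.

```typescript
// Assumed types matching the keys visible in listing 3.13.
interface OutputMessage {
  message: string;
  subMessages: OutputMessage[];
}

interface ExtensionResult {
  messages: OutputMessage[];
}

const analysisResults: Record<string, ExtensionResult> = {
  extension_1: {
    messages: [
      {
        message: "Interact with the editor window",
        subMessages: [{ message: "Access the active editor", subMessages: [] }],
      },
    ],
  },
};

// Serialise with two-space indentation so the file stays readable for humans.
const outputJson: string = JSON.stringify(analysisResults, null, 2);

// A consumer, such as a web client, can restore the same structure.
const restoredResults = JSON.parse(outputJson) as Record<string, ExtensionResult>;
```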

                      Listing 3.13: Example of the output data JSON structure
    {
        "extension_1": {
            "messages": [
                {
                    "message": "Interact with the editor window",
                    "subMessages": [
                        {
                            "message": "Access the active editor",
                            "subMessages": []
