Degree project

Web Site Security Maturity of the European Union and its Member States
A survey study on the compliance with best practices of DNSSEC, HSTS, HTTPS, TLS-version, and Certificate Validation Types

Bachelor Degree Project in Information Technology with a Specialisation towards Network and System Administration
G2E (IT610G)
Date of examination: 2021-06-13
First Cycle 22.5 credits
Spring term 2021

Student: Axel Rapp (a18axera@student.his.se)
Supervisor: Johan Zaxmy
Examiner: Jianguo Ding
Acknowledgement

I would like to begin this report by thanking everyone who assisted me in the process of completing this bachelor thesis. First and foremost, my supervisor Johan Zaxmy, who was constantly present and able to guide me in the right direction every step of the way. I would also like to thank my examiner Jianguo Ding, who provided feedback and answers when questions arose. Finally, a thank you to my peers, friends, and family. Thank you!
Abstract

With e-governance steadily growing, citizen-to-state communication via Web sites is as well, placing enormous trust in the protocols designed to handle this communication in a secure manner. Since breaching any of the protocols enabling Web site communication could yield benefits to a malicious attacker and bring harm to end-users, the battle between hackers and information security professionals is ongoing and never-ending. This phenomenon is the main reason why it is of importance to adhere to the latest best practices established by specialized independent organizations. Best practice compliance is important for any organization, but maybe most of all for our governing authorities, which we should hold to the highest standard possible due to the nature of their societal responsibility to protect the public. This report aims to, by conducting a quantitative survey, study the Web sites of the governments and government agencies of the member states of the European Union, as well as Web sites controlled by the European Union, to assess to what degree their domains comply with the current best practices of DNSSEC, HSTS, HTTPS, SSL/TLS, and certificate validation types. The findings presented in this paper show that there are significant differences in compliance level between the different parameters measured, where HTTPS best practice deployment was the highest (96%) and HSTS best practice deployment was the lowest (3%). Further, when comparing the average best practice compliance by country, Denmark and the Netherlands performed the best, while Cyprus had the lowest average.

Keywords: Web Site Security, Information Security, E-governance, Best Practice, DNSSEC, HSTS, HTTPS, SSL, TLS, Certificate Validation
Table of Contents

1 Introduction
  1.1 Disposition
  1.2 Terminology
2 Background
  2.1 DNSSEC
  2.2 HSTS
  2.3 HTTPS
  2.4 TLS & SSL
  2.5 Certificate Validation
  2.6 Related Work
3 Motivation & Problem Definition
  3.1 Thesis Statement
  3.2 Objectives
4 Methodology
  4.1 Explanation of methodology
  4.2 Selection of Scope
  4.3 Data Analysis Methodology
  4.4 Validity
5 Implementation of Methodology
  5.1 Finding Domains
    5.1.1 Governments
    5.1.2 Government Agencies
    5.1.3 European Union Web Sites
  5.2 Finding Best Practices
    5.2.1 DNSSEC
    5.2.2 HSTS
    5.2.3 HTTPS
    5.2.4 TLS
    5.2.5 Certificate Validation
  5.3 Finding Compliance with Best Practices of the Domains
    5.3.1 DNSSEC
    5.3.2 HSTS
    5.3.3 HTTPS
    5.3.4 TLS
    5.3.5 Certificate Validation
  5.4 Automatic Data Gathering and Parsing
6 Results
  6.1 Best Practice Compliance of the Whole Population per Parameter
    6.1.1 DNSSEC
    6.1.2 HSTS
    6.1.3 HTTPS
    6.1.4 TLS
    6.1.5 Certificate Validation Type
  6.2 Best Practice Compliance per Population Group per Parameter
    6.2.1 DNSSEC
    6.2.2 HSTS
    6.2.3 HTTPS
    6.2.4 TLS
    6.2.5 Certificate Validation Type
  6.3 Best Practice Compliance per Country per Parameter
    6.3.1 DNSSEC
    6.3.2 HSTS
    6.3.3 HTTPS
    6.3.4 TLS
    6.3.5 Certificate Validation Type
  6.4 Best Practice Compliance Comparisons
    6.4.1 Comparison between Population Groups
    6.4.2 Comparison between Agency Types
    6.4.3 Comparison between Countries
  6.5 Conclusion
    6.5.1 Comparison with Related Work
    6.5.2 Contributions
7 Discussion and Future Work
  7.1 Result Validity
  7.2 Ethical and Societal Aspects
  7.3 Future Work
References
Appendix A. List of Government Domains
Appendix B. List of Armed Forces Domains
Appendix C. List of National Civil Police Agency Domains
Appendix D. List of Prison Agency Domains
Appendix E. List of Public Employment Service Domains
Appendix F. List of Taxation Agency Domains
Appendix G. List of europa.eu Domains
Appendix H. HTTPS Data Collection Script
Appendix I. DNSSEC Data Collection Script
Appendix J. TLS Data Collection Script
Appendix K. HSTS Data Collection Script
Appendix L. Certificate Type Data Collection Script
Appendix M. Data Parsing Script
Appendix N. Data Gathering and Parsing Overview
1 Introduction

It seems like every other day we hear news reports of breaches or hacking events that have taken place, leaving trusting users in the line of fire for criminals. In a time where e-governance is steadily increasing, and citizens expect to find information and services available online, it is paramount that government-affiliated web services lead by example and provide a high level of security for their users. A pillar of democracy is that all power should be transparent and scrutinized, which leads to the question: are the EU and the member states' governments leading by example in web security? If not, how can we expect the private sector to follow?

Even though sufficient solutions exist for many security threats, their implementation in the real world is not a given. For example, HTTPS support increased from 2016 to 2017; however, the support varied by region and popularity of the Web site (Felt et al., 2017), leaving more to wish for. This is further confirmed when looking at web security features in the light of the DigiNotar security breach in 2011, where the certificate authority was hacked and, as a result, issued fraudulent certificates. Techniques developed after the incident, such as Certificate Transparency (CT), which makes CA systems auditable; header additions to HTTPS; preventing protocol downgrade attacks with SCSV; and DNS-based extensions controlling certificate issuing with CAA, could all protect against a multitude of attacks, even mitigating the effect of DigiNotar-like breaches. Their deployment, however, was found to be disappointing (Amann et al., 2017). So, there is cause for security concern in the Internet ecosystem as a whole.

What about government security? When Thompson et al. (2020) compared a highly developed e-government state (Australia) with a developing nation as a contrast (Thailand), it was found that not much separated the two nations' Web site security levels. In an example from the study, only half of the Australian sites were configured to enforce HTTPS via HSTS, compared to Thailand's one third. Government agencies have the ability to set a mandate for the private sector to follow. But are they?

Certainly, there are many different ways of judging a Web site's security level. The focus of this study is Web site configuration regarding data exchange between the user and the web server within the EU. This project aims to look at a selection of security parameters for Web sites as part of a security chain and analyse the configuration of the selected parameters of the Web sites of governments, government agencies, and EU domains, using tools to collect the necessary data. The parameters considered in this paper are:

- DNSSEC (DNS Security Extensions) – Ensuring authenticated DNS lookups
- HSTS (HTTP Strict Transport Security) – Enforcing use of HTTPS and blocking insecure redirects
- HTTPS (HTTP over TLS) – Secure communication via encryption and server authentication
- SSL/TLS version (Secure Sockets Layer & Transport Layer Security) – The encryption layer of the HTTPS protocol
- Certificate Validation type – To what degree certificate ownership is validated

1.1 Disposition

The contents of this report begin with a background section intended to present the necessary information to provide the reader with an understanding of the protocols studied, as well as a summarization of related work.
Next, a section that motivates why this subject is of interest for a study is presented, and a thesis statement is defined. The methodology selected is presented and argued for in section four, together with a discussion on validity. Section five elaborates in detail on how the study was conducted, before the findings of the study are presented in a results section which also includes the conclusions drawn from the findings. Finally, a discussion of possible validity concerns, ethical and societal aspects of the study, and ideas for future work is found in section seven.

1.2 Terminology

This section provides an easily accessible list of the terminology and abbreviations used throughout the report. The list is in alphabetical order.

CA – Certificate Authorities are organizations that can issue digital certificates and store the information of public keys and their owners.
CN – Canonical Name is a record used in DNS to create an alias from one domain to another.
CSV – A Comma Separated Value file is a text file using (mainly) commas to separate values. A CSV file usually stores tabular data.
DNS – The Domain Name System is the system used to match domain names to IP addresses.
DNSSEC – Domain Name System Security Extensions. A detailed explanation of DNSSEC is available in section 2.1.
E-governance – Electronic governance is the act of using Information and Communication Technology to provide government services or information sharing.
HSTS – HTTP Strict Transport Security. A detailed explanation of HSTS is available in section 2.2.
HTTP – Hypertext Transfer Protocol is the protocol used to transfer data over the Web.
HTTPS – Hypertext Transfer Protocol Secure. A detailed explanation of HTTPS is available in section 2.3.
IANA – The Internet Assigned Numbers Authority is a department of ICANN responsible for registries of unique identifiers, such as domain names, protocol parameters, and IP addresses.
ICANN – The Internet Corporation for Assigned Names and Numbers is a non-profit organization responsible for, among other things, the IP protocols and address space.
IETF – The Internet Engineering Task Force is a non-profit standards organization creating standards to maintain and improve the usability and interoperability of the Internet.
MITM attack – A Man-in-the-Middle attack is an attack where the attacker is placed between an end-user and a service in order to control the information flowing between them. This attack can be used in different ways, for example, to eavesdrop on or manipulate information.
RFC – A Request for Comments is a document published by the IETF used to develop standards.
TCP – Transmission Control Protocol is the most common transmission protocol in IP networks.
TLS/SSL – Transport Layer Security/Secure Sockets Layer. A detailed explanation of TLS/SSL is available in section 2.4.
URL – Uniform Resource Locator is the network identification for any resource connected to the Web and is used to specify addresses on the World Wide Web network.
2 Background

This section of the report aims to provide the reader with information and an understanding of central concepts of web security. All parameters discussed are part of a linked security chain, meaning that each link directly affects the overall security of a Web site.

2.1 DNSSEC

The Internet addressing system, DNS, is arguably the most critical part of Internet infrastructure and has been in use since the 1980s. Like most protocols developed in the early days of the Internet, it was not designed with security in mind, leaving it vulnerable to attacks where users can be redirected to fraudulent sites and have valuable information stolen from them. Developed to combat these sorts of MITM attacks, DNSSEC makes use of public-key cryptography to authenticate and validate DNS data (Arends et al., 2005a). The idea is built upon a chain of trust: once a DNS response is given to a local DNS server, a public key from the responding DNS server is sent along with the signed response. The local DNS server uses this public key to validate the response's authenticity by querying its parent zone, usually a Top-Level Domain (TLD), which can vouch for the child's signature as well as sign its own response. The local DNS server then proceeds to query the root zone, which can vouch for the TLD zone's signature. If this process does not output any errors, the local DNS server is assured of the DNS data's authenticity and integrity. As an example, when resolving the IP address of www.example.com, the root zone (.) will verify the .com zone's signature and the .com zone will verify the example.com zone's signature. If any link in this chain is broken, it is enough cause for concern for the local DNS server not to accept the resolved IP address.

2.2 HSTS

Based upon Jackson & Barth's (2008) prototype approach of ForceHTTPS, which intended to enforce HTTPS in browser-client communication through a browser extension, HSTS instead has the websites declare themselves as HTTPS-only, mainly in an HTTP response header field (Hodges et al., 2012). Hodges et al. (2012) continue by explaining that with HSTS implemented on the server, the client will dynamically change insecure links to secure ones before accessing the web host (e.g., http://his.se to https://his.se). Further, if security errors occur with regard to TLS, or the certificate is not trusted, the connection will be terminated and the client will not be able to access the web application. If applied to the top domain name, HSTS can also be configured to apply to all subdomains, thus blocking any HTTP redirects within the domain.

2.3 HTTPS

A fundamental part of Internet security relates to the use of HTTPS, which was first introduced in 2000 in an RFC by the IETF (Rescorla, 2000). The communication protocol builds upon the application layer protocol HTTP, which makes up the foundation of World Wide Web communication. The incentive for implementing HTTPS is that the HTTP protocol makes no significant efforts towards security, thus exposing the confidentiality and integrity of network traffic. As explained by Rescorla (2000), the concept of HTTPS is simple: use HTTP over TLS as you would over TCP. Packets are then no longer sent in plain text but encrypted with TLS, achieving secure communication.
The HTTPS protocol fulfils the following features:
• Confidentiality – the message is encrypted.
• Integrity – the message has not been altered in transit.
• Server Authentication – the message is received from the correct sender.

An HTTPS session usually occurs between a web browser (client) and a web server, where the former initiates the process of agreeing upon what cryptographic parameters to use for the session. This process is called "the TLS handshake". There are multiple ways to configure a server and a client as to what this handshake should entail; however, according to Rescorla (2000), the most important and relevant steps are the following:

1. The client sends a ClientHello message with information regarding what TLS versions and encryption algorithms (also called cipher suites) it supports, along with a preference list for what to use if it can be matched by the server. A session ID is also present.
2. The server responds with a ServerHello message where the selected TLS version and cipher suite are established. It also passes along its server certificate.
3. The client responds by sending a premaster secret, which both sides use to derive the keys for a symmetric encryption channel.
4. The two sides confirm that the encryption is valid and working, and if it is, application data can be securely sent both ways.

Dierks and Rescorla (2008) give insight into the following issue: an HTTPS session is initiated by the client, but it is the server that dictates the terms of the session. A web server can be configured not to support certain TLS versions and cipher suites in order to avoid establishing insecure communication channels with clients. There is a balance here that is of interest: a web server which only supports the newest and most secure cipher suites might find itself denying most clients. Therefore, some support for older (and perhaps insecure) versions and suites may need to be present. A conflict of interest can occur for the web server owner between wanting to provide access to all clients and wanting to provide only secure access.

2.4 TLS & SSL

SSL is the now obsolete predecessor of TLS, the encryption protocol used in HTTPS communication. Originally developed by Netscape, SSL version 1.0 was never published due to security flaws. Its successor, SSL 2.0, released in 1995, was not free of security flaws either and forced a rather quick redesign of the protocol, released the following year as SSL version 3.0. SSL 3.0 is the protocol that the newer TLS versions build upon. Control of the protocol was given to the IETF which, in 1999, released TLS version 1.0 as an update to SSL 3.0. Since then, three more versions of TLS have been published, adding more security features and protections against newer attacks. In 2011 and 2015, SSL versions 2.0 and 3.0 respectively were deprecated, leaving only the TLS suite available for use (Barnes et al., 2015; Turner & Polk, 2011).

2.5 Certificate Validation

Another parameter important to web security is certificates, or more specifically, certificate validation levels. The main idea of a certificate is its ability to certify a public key's owner in some manner. If the public key is trusted, then clients can also trust that public key's corresponding private key. This trust can be accomplished only by a trusted third party – a Certificate Authority.
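To make the handshake outcome described above concrete, the following minimal sketch uses Python's standard ssl and socket libraries to open a TLS connection and print the negotiated protocol version, cipher suite, and the certificate's subject and issuer. It is an illustration only, not one of the data collection scripts used in this study, and the domain shown is just a placeholder.

```python
import socket
import ssl

def inspect_tls(hostname: str, port: int = 443) -> None:
    """Open a TLS connection and print what the server negotiated."""
    context = ssl.create_default_context()  # system CA store, modern defaults
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            print("Negotiated protocol:", tls.version())   # e.g. 'TLSv1.3'
            print("Negotiated cipher:  ", tls.cipher())    # (name, protocol, bits)
            cert = tls.getpeercert()
            print("Certificate subject:", cert.get("subject"))
            print("Certificate issuer: ", cert.get("issuer"))

if __name__ == "__main__":
    inspect_tls("example.com")  # placeholder target domain
```

Running the sketch against a server shows, in one connection, the outcome of the ClientHello/ServerHello negotiation (sections 2.3 and 2.4) and the certificate presented by the server (section 2.5).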
As explained by CABF (n.d.), when a CA issues certificates there are a few different ways of validating the entity to which the certificate is applied: (1) Domain Validated certificates, (2) Organization Validated certificates, and (3) Extended Validated certificates, each differing in cost and in how thorough the validation process is.

(1) Domain Validation (DV) is the lowest level of validation, where the CA essentially only validates that the purchaser has authority over the domain in question. A DV certificate can usually be obtained within a few minutes or hours from purchase and is the cheapest alternative since no human interaction is needed. It still provides the ability to establish an HTTPS connection to the Web site.

(2) Organization Validation (OV) is a step up, and a more expensive solution, in which the CA needs to validate the organization's identity before issuing the certificate. This process can be completed within a few days.

(3) Extended Validation (EV) is the strictest alternative, where the CA requires the organization to provide documentation of ownership, physical location, the legal existence of the organization, etc. before issuing the certificate. Naturally, with this level of thoroughness, the certificate takes longer (usually a few weeks) and costs more to obtain.

2.6 Related Work

This section aims to explain how this study fits into the environment of previously published work in the area and why it is a necessary contribution to the total body of literature on the subject of Web site security for governmental Web sites.

E-governance and security are two closely intertwined areas. Alharbi et al. (2014) identify security as a main factor for end-user adoption of e-governance and attribute similar importance to perceived risk. Further, their study finds that half of the respondents had privacy concerns when utilizing e-government services. The researcher believes this study, among others, highlights the importance of transparency in the development of Web security related to governments.

Another approach seen in the literature is the deep-dive analysis of a specific country's conditions regarding e-governance, or audits of specific services, often performed on developing nations. For instance, the vulnerability assessment of Burkina Faso's government-controlled Web sites showed known vulnerabilities on half of the inspected sites (Bissyandé et al., 2016). While this type of work is particularly important for the ability of these nations to better their infrastructure and overall security, it does not provide a general status of larger regions, such as Europe.

Studies similar to this one, where the adoption rates of certain security features or protocols are researched, are available; however, these also focus on specific countries. As mentioned in section 1, Thompson et al. (2020) investigated, as part of their study, the adoption rate of the HTTPS-enforcing protocol HSTS, thus providing a glimpse of the best practice implementation status of that protocol for the two countries included in the study at that specific point in time. This type of information has not been identified collectively for the government agencies of the European countries. A lot of data is available, however, only for the usage of certain protocols on the Internet as a whole, or for individual countries.
This type of data is often provided by non-scientific sources such as browser providers or certificate authorities, but it still gives a good estimate of the ecosystem's adoption as a whole. In contrast, this study aims to provide similar data, but specific to governmental Web sites within the EU and gathered in a scientific manner.

Another non-scientific resource that deserves to be mentioned in this section is Balter's (2021) analysis of federal .gov domains in the US. His work is closely related to the aims of this research study. The prevalence of a set of parameters has been analysed for all US federally controlled domains to serve as a status update on the current adoption of those parameters. It was shown that HTTPS was supported on 95% of domains, and 75% returned DNSSEC records. Further, 69% of domains supported HSTS in some capacity, whereof 44% were also present on the HSTS preload list (Balter, 2021).
3 Motivation & Problem Definition

As the use of the Internet increases, so does the adoption of e-governance by government agencies. Government agencies use the Internet to communicate with their citizens mainly through their Web sites and Web services, and citizens are to a greater extent expected to use the Internet in their communication with the government agencies, even more so in developed parts of the world, such as Europe (Thompson et al., 2020). As identified by Alharbi et al. (2014), security and perceived security are two important factors for this state-to-citizen relationship to work. A prerequisite for this type of communication to work as intended is for the communication to be secure. The security of the communication between Web sites and end-users is not always clear to the end-user, and not complying with best practices regarding Web site security can lead to a multitude of issues affecting the confidentiality and the integrity of the end-user. For example, by not supporting HTTPS and providing Web site communication over plain HTTP, the end-user is at risk of information theft and manipulation due to the loss of confidentiality, integrity and server authentication as compared to HTTPS communication (Rescorla, 2000). An intention of this report is to assist network and system administrators in maintaining secure Web sites by establishing the best practices and highlighting possible weak areas.

The goal of this study is to perform a collective data gathering from the Web sites of the governments and the government agencies of the member states of the European Union (EU), as well as EU-controlled Web sites. By analysing the collected data and presenting the results, the study will act as a snapshot of the current implementation adoption of relevant security protocols and security features, something which is not readily available at the moment. The benefit of such information is to provide a fair insight into the e-government security level of a large geographic area affecting a large demographic. Further, a snapshot of the current state can also serve as a benchmark when measuring improvement over time.

The selection of parameters to gather data upon is derived from the reasons explained in section 2. Essentially, to be able to assess the security maturity of multiple Web sites there is a trade-off between how thorough a security analysis to perform on each Web site and how many Web sites you can assess. This project aims to assess many different domains in multiple countries and, because of this, an in-depth analysis of each Web site is not feasible given the resource availability of this final year project. Rather, the parameters to be included must be able to be validated in an automated fashion without human interaction. A few key metrics have been chosen as part of a Web security chain that also fulfil the previously mentioned requirement of automated validation. Each security feature's relevance and importance as part of this chain are discussed in section 2, where their interplay is also clarified. The chosen parameters are:

- Prevalence of DNSSEC
- Prevalence of HSTS
- Prevalence of HTTPS
- SSL/TLS version support
- Type of Certificate Validation

The selection of subjects, where this study exclusively explores the government agencies and governments of EU member states and the EU itself, is explained by the interest in scrutinizing those whose power and trust is the greatest, to ensure that they comply with best practices. Since e-governance is an ongoing driving force, it is of importance that it is handled correctly.
Further, the researcher has chosen not to look at specific countries or regions within the EU; instead, all countries within the EU will be included in the study.

3.1 Thesis Statement

Considering the reasons explained in the previous section, the thesis statement of this study is:

This study aims to establish to what extent the European Union controlled Web sites, and the Web sites of the governments and a selection of the government agencies of the member states of the European Union, comply with the best practices of Web site security in regard to DNSSEC, HSTS, HTTPS, TLS, and Certificate Validation.

3.2 Objectives

Three subordinate objectives have been formulated and need to be answered in order to reach the aim of the thesis statement:

Objective 1: What Web sites should be included in the study?
To reach the aim, this first objective is essential as a starting point. Since there are several different countries with different polities to be included in the study, as well as an international union, a selection of relevant and comparable Web sites to be included must be found.

Objective 2: What are the best practices for the relevant security features?
To be able to establish to what extent the relevant Web sites comply with best practice, there needs to be a standard to which to compare the sites. This standard for best practices needs to be defined before any relevant conclusions or comparisons can be made from the material.

Objective 3: Do the Web sites in objective 1 comply with the best practices established in objective 2?
Objective 3 could be argued to be the core question in this study since it builds on top of objective 1 and objective 2 and relates directly to the thesis statement. If each Web site's compliance with the best practices can be assessed, the entire population of assessed Web sites will make up the total body of results needed to establish the extent of best practice use.

Answers to these objectives, and establishing the extent of best practice use as explained by the thesis statement, could be valuable not only to the organizations included in the study but to any organization evaluating its Web site security. Further, the findings of this study will provide a transparent security view of the organizations included in the study, which can be used by individuals communicating with them through their Web sites. Finally, the findings of this study would be beneficial for any research intended to measure Web site security feature implementation over time, by providing a snapshot of the current status.
4 Methodology

This section of the report aims to provide the reader with an explanation of how the study is to be conducted, and why the chosen methodology is the most applicable in this particular circumstance.

4.1 Explanation of methodology

The general process of this methodology is separated into three different steps, each corresponding to a specific objective. The first step is to decide on which entities are to be included in the study. The second is to map the current best practices of the parameters chosen to be included in the study. The third is to gather empirical data on the parameters for the entities and, further, to analyse the data to evaluate compliance with the best practices determined in step 2.

According to Wohlin et al. (2012), there are three different strategies for empirical studies such as this one, namely survey, case study, and experiment. A survey approach is useful for collecting information to be able to describe, explain or compare variables. This is suitable for this study since the objective is to describe Web site security and to compare implementation between different groups. A case study would not suit this study to the same extent since case studies aim to explain phenomena that are not well understood, which is not the case when it comes to Web site security. Although Web site security might be complex, the features looked at are inherently well understood and the best practices agreed upon. Further, the thesis statement can only be answered by looking at the whole population to which it relates, not by deep diving into only a selected few organizations. Finally, experiments manipulate a variable in a controlled setting to examine what effect it has on a subject. This study is not performed in a controlled setting but in a real-world setting, meaning that the researcher cannot control certain elements in the environment. An experiment also makes use of a hypothesis and tries to verify or falsify it. This is not applicable in this study. (Berndtsson et al., 2008)

Given the above reasons, the chosen methodology is to perform a survey-based study focused on gathering the same types of data from each subject in a larger population. In the context of this study, the data refer to the Web site security implementations of governments and government agencies of member states of the EU, as well as EU-controlled Web sites. Unlike a classical survey, this survey will be conducted on computers rather than people, which entails certain advantages over classical surveys worth mentioning (see the example after this list):

(1) The standardization of the questions used in the survey is important when dealing with people, to ensure all subjects interpret the questions in the same way. This becomes a non-issue when you query a Web site server: the use of standardized protocols ensures that the question is understood and that an expected response is given.

(2) As highlighted by Berndtsson et al. (2008), a common issue for surveys is that the motivation for participation often is low and that high response rates are difficult to achieve. By performing the survey on Web site servers, an answer is guaranteed as long as the subject is online.

(3) Responder bias, where respondents tend to want to show a positive image of themselves, or tend to favour a neutral position on a question, is also a non-issue concerning this survey. Any bias from answers being influenced to be pleasing or unpleasing is non-existent when the responder is a computer.
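As a concrete illustration of such a machine-readable "survey question", the sketch below sends one standardized HTTP request and prints the structured response metadata the server returns. It only illustrates the principle of point (1), it is not one of the study's actual collection scripts, and the domain used is a placeholder.

```python
import urllib.request

# A standardized "survey question": the same HTTP request is sent to every
# subject, and the protocol guarantees a structured, comparable answer.
def ask_server(url: str) -> None:
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        print("Final URL:    ", response.geturl())   # where redirects ended up
        print("Status code:  ", response.status)
        print("Server header:", response.headers.get("Server", "not disclosed"))

if __name__ == "__main__":
    ask_server("https://example.eu")  # placeholder domain
```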
4.2 Selection of Scope

According to Robson & McCartan (2016), the population refers to all cases within the scope of the research question. The sample (the entities included in the survey) is selected from that population using a sampling technique. The population of this study consists of all Web sites of governments of member states of the EU, all agencies under those governments, and all EU-controlled Web sites. As stated by Robson & McCartan (2016), the larger the sample is in relation to the population, the lower the risk of error in generalization. This is the reason for striving for an inclusive survey, where as much of the population as possible is included in the study.

The first group in the population – governments – allows for total inclusion since they are not very numerous. The second group – government agencies of member states – is a larger group that is not as easily available. If all member states of the EU had registries available displaying all national agencies, total inclusion could be achieved here as well. However, not all relevant states have public registries of this kind, and the resources available for this final year project for a bachelor's degree are not sufficient to locate all national agencies for all the EU member states. The non-probability sampling technique of purposive sampling was therefore utilized to achieve a sample within this population. It is described by Robson & McCartan (2016) as a technique where the researcher chooses the sample based on satisfying the needs of the study. The population is divided into groups – each member state is a group – and from each group representative agencies which are comparable to one another were chosen. The goal is to find five agency types expected to exist in all member states of the EU to include in the study. To achieve this, the researcher's starting point was to find five well-funded agency types. It is not necessary for the study that the selected agencies are the most well-funded in each state. Instead, a generalization and an assumption have been made that even though the countries in question might distribute their spending differently between their agencies, it is still fair ground for comparison if the same types of agencies are selected. By referencing the Swedish Financial Management Authority (Utgifter i Statens Budget, 2021), the allocation of state resources in Sweden was used to find five agency types that are well-funded and have counterparts in all member states of the European Union. The implications of this selection are discussed in section 7.1. The agency types selected were:

1. Armed Forces
2. National Civil Police Agency
3. Prison Agencies
4. Public Employment Services
5. Taxation Agencies

Finally, the third group, consisting of EU-controlled Web sites, will not be handled in precisely the same way as the government agencies of each member state, since the EU as an international union simply is not comparable to individual nations. However, an approach similar to the first group of governments can be taken, including all EU institutions, agencies, and bodies. This is an appropriate measure since all EU institutions, agencies, and bodies reside on the same second-level domain – namely europa.eu – along with a few other types of Web sites relevant to the thesis statement:

1. Inter-institutional cooperation entities or services.
2. Sites providing access to the information and services of an official programme.
3. Sites providing access to a service or database with a well-established brand name.
4. Sites requiring high visibility for promotional purposes.
(The Europa Domain, 2021)
A total inclusion of all third-level domains under the europa.eu domain will be selected, thus including the entire population.

4.3 Data Analysis Methodology

The data gathering process aims to extract data from each Web site that can be compared to a best practice standard, clarified in section 5.2, and also give the ability to compare results between different subject groups (e.g., comparing countries with one another). The different metrics included in the study all generate nominal data, since the result of all queries to the Web site will be constructed similarly to yes and no questions. For example: Are you configured with HTTPS? Yes. Or: Do you support TLS version 1.3? No. Of course, this is not an interview with a server; details of which tools and queries will be used to extract this information are presented in section 5. The analysis of this study is focused on measuring the gathered data for frequency, for example, how many domains have HSTS implemented on their Web sites.

4.4 Validity

For the trustworthiness of the result, this section will explore the specific validity concerns in regard to this specific study by using the classification presented by Wohlin et al. (2012) to discuss to what extent the results are true and not biased by the researcher's subjective perspective.

The first validity threat concerns the reliability of measures, which Wohlin et al. (2012) explain to be of great importance for any study which includes measurement of some sort. This reliability can suffer if, for example, bad questioning is used. As further explained by Wohlin et al. (2012), as a general principle, the less human interaction included in measurements, the more reliable the measure usually is. If something is measured twice, the outcome should be the same. For this study, human error is avoided by using scripts and tools that ensure that the queries are constructed in a standardized manner, ensuring that all subjects are treated the same way. This also allows for the automation of certain tasks within the process. Testing of the scripts and tools used for data gathering will be conducted by comparing the output with recognized tools and methods to ensure functionality ahead of data gathering. This will be performed by choosing a number of domains included in the study and running them through external testing tools to validate the results. The standardization and automation of data gathering also mitigate any threat to the reliability of treatment implementation, which refers to test subjects being treated differently.

Furthermore, it is important to acknowledge that even though the scripts and tools produce identical responses when run directly after one another, the results will not be repeatable over time due to the constant change of the studied environment. Not only are adoption rates of protocols likely to change, but new consensus on best practices is almost certain to emerge over time as the threats, and the security protocols used to combat them, change the dynamic of the Internet. However, this is not a threat to validity since it is to be expected when analysing public Web sites available on the Internet; it is an ever-changing environment. Therefore, due to the nature of the posed thesis statement, the result is only presented as a current state as of 2021-05-11.
No in-depth statistical analysis will be performed since the data are only nominal. However, the sample used should be a true representation of the total population, since all entities are being tested for the population groups of governments and europa.eu domains, and generalization errors were mitigated with the sampling technique explained in section 4.2.

Other validity threats include maturation, history, repeated testing, responsiveness, and statistical regression. These types of human changes in responses have been briefly touched on in section 4. With the use of standardized protocols and the respondents being Web site servers, these types of validity threats do not affect this study.

Construct validity relates to what extent the operational measures that are studied really represent what the researcher has in mind and what is investigated according to the research questions. If, for example, the constructs discussed in interview questions are not interpreted in the same way by the researcher and the interviewed persons, there is a threat to construct validity. As mentioned before, this is mitigated by the standardization of the protocols used when querying the Web site servers. This type of validity threat could also include the studied parameters not providing sufficient information to be able to answer the thesis statement. Web site security is a broad subject, and best practices extend beyond the scope of this study. Hence, it has been decided to present the results as compliance with the best practices of the parameters included in the study, not as holistic compliance with Web site security best practices.

As previously mentioned in section 4.1, surveying computers guarantees an answer as long as the subject is online. However, there can be a multitude of reasons why a server is unresponsive (downtime, etc.) and it is possible that unresponsiveness could impact the result of the study. This is important to keep in mind when consuming the results of this study, since an unresponsive server yields a failed test as the scripts are designed. Due to the constraints of resource availability of this bachelor thesis, no mitigation of this validity threat was achieved.
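As an illustration of the frequency-based analysis described in section 4.3, the sketch below counts best-practice compliance for one parameter from a results file. The CSV layout (columns such as domain and hsts, with yes/no values) is an assumption made for this example and does not necessarily match the format produced by the collection and parsing scripts referenced in the appendices.

```python
import csv
from collections import Counter

def compliance_frequency(path: str, parameter: str) -> float:
    """Return the share of surveyed domains whose nominal result for `parameter` is 'yes'."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[row[parameter].strip().lower()] += 1  # e.g. 'yes' / 'no'
    total = sum(counts.values())
    return counts["yes"] / total if total else 0.0

if __name__ == "__main__":
    # Hypothetical results file with one row per surveyed domain.
    share = compliance_frequency("results.csv", "hsts")
    print(f"HSTS best practice compliance: {share:.0%}")
```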
5 Implementation of Methodology

This section aims to provide the reader with a more detailed explanation of the methodology described in section 4, as well as the outcome of the questions presented as objectives to the thesis statement, where objective 1 correlates to section 5.1, objective 2 correlates to section 5.2, and objective 3 correlates to section 5.3.

5.1 Finding Domains

This section explains how the researcher conducted the task of finding the domains, providing the answer to objective 1 for all population groups.

5.1.1 Governments

As discussed in section 4.2, a total inclusion of the first population of governments is used in the study. To find the Web sites of all governments of the member states of the EU, a premade list from an official EU Web site was utilized. Each Web site was manually extracted and visited by the researcher to verify that the link was valid and current. The full list of government URLs is available in Appendix A (List of National Governments, 2021; The 27 Member Countries of the EU, 2021).

5.1.2 Government Agencies

The sample of government agencies included in the study was established to consist of (1) Armed Forces, (2) National Civil Police Agencies, (3) Prison Agencies, (4) Public Employment Services, and (5) Taxation Agencies. The process of finding the URLs of these agencies was to search for premade lists already containing links to the agency in question, preferably lists residing on official EU Web sites. Where no such list could be found, other sources were utilized to locate the correct Web sites. All resources used are referenced accordingly.

(1) Armed Forces
No official EU Web site was found containing a premade list holding this information. Instead, a Wikipedia page listing military and paramilitary personnel was utilized to locate the Web sites of the national armed forces of the EU member states. Each Web site was manually extracted and visited by the researcher to verify that the link was valid and current. Where no general agency was found for a nation, the ministry of defence was selected instead. The full list of Armed Forces URLs is available in Appendix B (List of Countries by Number of Military and Paramilitary Personnel, 2021).

(2) National Civil Police Agencies
No official EU Web site was found containing a premade list holding this information. Instead, two Wikipedia pages listing law enforcement agencies of different countries were utilized. Differences in policing structure are obvious between different countries; the objective was to, as accurately as possible, locate an agency equivalent to a national civil police. The lists were cross-checked against one another to find the accurate URL. Each Web site was manually extracted and visited by the researcher to verify that the link was valid and current.
The full list of National Civil Police Agency URLs is available in Appendix C (Law Enforcement by Country, 2021; List of Law Enforcement Agencies, 2021).

(3) Prison Agencies
No official EU Web site was found containing a premade list holding this information. However, EuroPris, an organization dedicated to promoting professional prison practice and officially supported by the Justice Programme of the European Union, provides just that through its web tool EPIS. Using this tool, each Web site was manually extracted and visited by the researcher to verify that the link was valid and current. In cases where no general prison agency was listed, the Web site of the ministry of justice was used instead. The full list of Prison Agency URLs is available in Appendix D (European Prison Information System, 2021).

(4) Public Employment Services
A list residing on the European Commission's Web site was located holding information on all the member states' public employment services. Each Web site was manually extracted from this list and visited by the researcher to verify that the link was valid and current. An exception was made regarding Belgium, where four different agencies were listed. Instead of selecting only one, not possessing the ability to make a fair choice, all four Web sites were included in the list. The full list of Public Employment Service URLs is available in Appendix E (Public Employment Services, n.d.).

(5) Taxation Agencies
A list residing on the EU's official Web site was utilized to find the Web sites of all national taxation agencies. Each Web site was manually extracted and visited by the researcher to verify that the link was valid and current. In cases where no general taxation agency was listed, the Web site of the ministry of finance was used instead. The full list of taxation agency URLs is available in Appendix F (Tax Authorities Contact List, 2021).

5.1.3 European Union Web Sites

As for the governments in section 5.1.1, the European Union controlled Web sites have a total inclusion of the population in the study. To find all subdomains of europa.eu, a web tool by Wolfram Alpha was utilized which presents all subdomains of a given domain along with visiting statistics for each domain. The tool provided the researcher with 78 individual subdomains of europa.eu, all of which were included in the study. Each Web site was manually visited by the researcher to verify that it was valid and current. This resulted in the removal of a few entries from the list due to duplicates and domains no longer being active. In total, 78 domains are included in this population in the study. The full list of europa.eu URLs is available in Appendix G (WolframAlpha, n.d.).
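Once collected, URL lists like those in the appendices need to be reduced to bare host names before they can be fed to measurement scripts. The sketch below shows one way this could be done with Python's standard library; the input file name and its one-URL-per-line format are assumptions made for the example, not a description of the author's actual workflow.

```python
from urllib.parse import urlparse

def to_hostname(url: str) -> str:
    """Reduce a collected URL (e.g. 'https://www.government.example/en/') to its host name."""
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to identify the host part
    return urlparse(url).hostname or ""

if __name__ == "__main__":
    # Hypothetical input: one collected URL per line.
    with open("collected_urls.txt", encoding="utf-8") as handle:
        hostnames = sorted({to_hostname(line.strip()) for line in handle if line.strip()})
    for host in hostnames:
        print(host)
```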
5.2 Finding Best Practices

This section presents the concluded best practices for each parameter included in the study. To establish the best practices of the parameters included in the study, a multitude of sources were consulted to find a consensus on the matter. The open standards organizations the Internet Engineering Task Force and the Internet Assigned Numbers Authority provided a significant basis in this regard. Further, industry bodies, eminent organizations in the industry, and research articles regarding the subjects were also consulted in establishing the current best practices.

5.2.1 DNSSEC

The first published standard of DNSSEC was made public by the IETF in 1997 (Eastlake & Kaufman, 1997). This was followed by a revision of the initial RFC in 1999 (Eastlake, 1999). The protocol suite was rewritten in 2005, spanning RFC 4033-4035 (Arends et al., 2005a, 2005b, 2005c). At this point, any holistic implementation of DNSSEC was still limited due to the fact that the root zone had not yet been signed. This was achieved in July of 2010 when ICANN signed the root zone, greatly simplifying the deployment of DNSSEC resolvers. As of February 2021, the Root Zone Database of IANA (Internet Assigned Numbers Authority) contains 1589 TLDs adopting DNSSEC, proving its worth as a best practice (IANA & IANA, 2021).

5.2.2 HSTS

The importance of using HTTPS is covered in sections 2.3 and 5.2.3, and the HSTS protocol helps ensure the use of HTTPS and mitigates mainly SSL-stripping man-in-the-middle attacks, where a secure HTTPS connection is converted into a plain text HTTP connection. However, HSTS does not come without limitations. The same idea of SSL-stripping can be used by an attacker if the user is accessing the Web site for the very first time: by stripping the HSTS header, the attacker essentially prevents the protocol from being activated. This limitation is addressed by the browser vendors by distributing HSTS preload lists within their commercial browsers. An HSTS preload list contains known HSTS-supporting Web sites for which HTTPS is used for the initial request. A simple workaround indeed; however, it is unwieldy due to its scalability issues, as this list cannot cover the entire Internet. There are ongoing discussions of a more scalable solution with HSTS policies announced via DNS, secured using DNSSEC, but no consensus has been established. The best practice is to have HSTS configured on the main domain, covering redirects to subdomains, and to have the domain present on HSTS preload lists (Hodges et al., 2012).

5.2.3 HTTPS

The importance of offering HTTPS support and moving away from HTTP without TLS is manifested by the many promotion efforts invested in this development. In 2014, Google started using HTTPS support as a ranking parameter in search results, an effort to encourage websites to implement HTTPS support (Ait Bahajii & Illyes, 2014). It is now common for web browsers to warn their users when visiting a website over HTTP (Schechter, 2016; Vyas & Dolanjski, 2017). Further underlining the efforts made by the industry to make the Web a more secure place is the work of Let's Encrypt, a free Certificate Authority started in 2013 and currently serving close to a quarter-billion Web sites (Let's Encrypt Stats, 2021). It is clear from the industry that HTTPS support is best practice.
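In the spirit of the best practices above (and of the author's collection scripts referenced in the appendices, which are not reproduced here), the following minimal sketch checks two of the measured properties for a single domain: whether a plain HTTP request ends up on HTTPS, and whether the final response carries a Strict-Transport-Security header. It uses only Python's standard library, treats the domain as a placeholder, and is an illustration rather than the methodology actually used in the study.

```python
import urllib.request

def check_https_and_hsts(domain: str) -> dict:
    """Follow redirects from http://<domain> and report HTTPS enforcement and HSTS presence."""
    request = urllib.request.Request("http://" + domain, method="GET")
    with urllib.request.urlopen(request, timeout=15) as response:
        final_url = response.geturl()  # URL after any redirects
        hsts_header = response.headers.get("Strict-Transport-Security")
    return {
        "domain": domain,
        "redirects_to_https": final_url.startswith("https://"),
        "hsts_header": hsts_header,  # e.g. 'max-age=31536000; includeSubDomains; preload'
        "hsts_present": hsts_header is not None,
    }

if __name__ == "__main__":
    print(check_https_and_hsts("example.eu"))  # placeholder domain
```

Note that a sketch like this only observes the header; checking presence on the browsers' HSTS preload lists, as the best practice in section 5.2.2 requires, would need a separate lookup against those lists.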