Masaryk University
Faculty of Informatics

Toward an Automated Feedback System in Educational Cybersecurity Games

Ph.D. Thesis Proposal

Mgr. Valdemar Švábenský
Advisor: doc. Ing. Pavel Čeleda, Ph.D.

Brno, January 2019
Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Mgr. Valdemar Švábenský

Advisor: doc. Ing. Pavel Čeleda, Ph.D.
Acknowledgements

The majority of research and published papers was supported by the Security Research Programme of the Czech Republic 2015–2020 (BV III / 1 – VS) granted by the Ministry of the Interior of the Czech Republic under No. VI20162019014 – Simulation, detection, and mitigation of cyber threats endangering critical infrastructure. Computational resources were provided by the European Regional Development Fund Project CERIT Scientific Cloud (No. CZ.02.1.01/0.0/0.0/16_013/0001802).

I'd like to sincerely thank the whole CSIRT-MU group. The team leader and my advisor, Pavel Čeleda, has always found time to help me with any question about my Ph.D. studies. His dedication and hard work have motivated me to do my best. My consultant, Jan Vykopal, has drawn me into the area of cybersecurity education and has been a great "partner in crime" ever since. All the other members of the team are really good colleagues as well, and I appreciate working with them.

Another big thank you goes to Petr Jirásek and AFCEA, who made it possible for me to attend the European Cyber Security Challenge 2018 in the role of the coach of the Czech team. It was a great responsibility and learning experience to face all the practical challenges of a large international Capture the Flag competition.

Next, I learned an awful lot from Martin Ukrop and Ondráš Přibyla as the leaders of the Teaching Lab initiative. Vlasta Šťavová was a big help with her advice on Ph.D. studies and, along with Tomáš Effenberger, provided useful comments on some of the preliminary stages of this thesis proposal. I also thank Peter Hladký for our discussions on (working) life and for providing another perspective on my ideas.

Finally, I thank mum and dad, my girlfriend Pavlínka, the FASKOM+ crew, and all my family and friends who have been supporting me and helping in various ways throughout my journey. I dedicate this work to the loving memory of my brother Gabriel.
Abstract

Educational games feature practice tasks with multiple approaches to the solution. An example of such a game is Capture the Flag, a popular type of training activity for exercising cybersecurity skills. During the game, regular feedback is crucial to support learning of novice players. However, providing feedback manually requires an expert instructor and is time-consuming. Therefore, our goal is to automate the process by creating a virtual learning assistant.

To achieve this goal, we need to address three research topics. The first is defining a formal model for tasks in educational games within the cybersecurity domain. The second is leveraging the data of players to improve the accuracy of the model. The third is employing the model to provide personalized feedback.

To address the first topic, we propose a transition graph describing how a single player can progress through the game tasks and what actions (s)he can perform. Next, we will develop data analysis methods for deriving information about the progress of each player. It is possible to collect data about players' in-game actions, such as typed commands or solution attempts, along with their timing. We will leverage this data to improve the accuracy of the model for a particular game. Finally, we will employ the model to provide automated, personalized formative feedback. The feedback will include hints and a debriefing of the player's strategy to support learning. To evaluate our research, we will test the hypothesis that players who receive the feedback will perform significantly better than others on the same game.

We aim to contribute both theoretical and applied results. First, we will develop a general approach to model tasks in cybersecurity games. Second, we will provide associated methods for data analysis and explore patterns in solving the game tasks. Third, we will create software that will assist players by providing feedback. Although there are automated feedback systems for learning programming, the novelty of our approach lies in the context of educational games. The data logged from these games are more heterogeneous than data from programming assignments, and cybersecurity tasks require using domain-specific tools. Our research will increase the educational impact of the games and reduce their dependency on human experts. As a result, more students will be able to learn cybersecurity skills at an individual pace.
Keywords

cybersecurity games, capture the flag, active learning, adult education, educational data mining, learning analytics, formative assessment, intelligent tutoring systems
Contents

1 Introduction
  1.1 Overview of the Research Problem
  1.2 Expected Contributions and Impact
  1.3 Structure of the Thesis Proposal

2 State of the Art
  2.1 Capture the Flag Games
    2.1.1 Games in Education
    2.1.2 Origins and Definition of CTF
    2.1.3 Attack-defense CTF
    2.1.4 Jeopardy CTF
    2.1.5 Technical Infrastructure for CTF
    2.1.6 Advantages and Disadvantages of Competitive CTF
    2.1.7 Towards an Educational Use of CTF
    2.1.8 Using CTF for Cybersecurity Research
  2.2 Computer Science Education Research
    2.2.1 General Approaches
    2.2.2 Formative Feedback
    2.2.3 Intelligent Tutoring Systems
    2.2.4 Challenges to Providing Feedback in Cybersecurity
    2.2.5 Employing Command Line History for Feedback
    2.2.6 Ideal Feedback
    2.2.7 Related Research in Cybersecurity Education
    2.2.8 Education Research in Other Domains

3 Research Aims and Methods
  3.1 Research Questions and Expected Results
  3.2 Research Environment
    3.2.1 Cybersecurity Game Format
    3.2.2 Target Audience of the Games
    3.2.3 Technical Infrastructure
    3.2.4 Data Collection
    3.2.5 Properties of the Data
    3.2.6 User Privacy and Ethical Concerns
  3.3 Research Methods
    3.3.1 Modeling Game Levels (RQ1)
    3.3.2 Exploring Interaction Patterns of Players (RQ2)
    3.3.3 Providing Hints and Feedback (RQ3)
    3.3.4 Evaluation of the Research
    3.3.5 Limitations of the Approaches
  3.4 Schedule of the Research
  3.5 Publication Venues
    3.5.1 Conferences
    3.5.2 Journals

4 Achieved Results
  4.1 Predicting Performance of Players
  4.2 Cybersecurity Course Report and Evaluation
  4.3 Analysis of Game Events
  4.4 Academic Achievements
    4.4.1 Presentations at International Conferences
    4.4.2 Participation in Research Projects
    4.4.3 Teaching and Student Supervision

5 Author's Publications
  5.1 Accepted and Released Publications
    5.1.1 Challenges Arising from Prerequisite Testing in Cybersecurity Games
    5.1.2 Enhancing Cybersecurity Skills by Creating Serious Games
    5.1.3 Gathering Insights from Teenagers' Hacking Experience with Authentic Cybersecurity Tools
  5.2 Accepted Publications to Appear
    5.2.1 Reflective Diary for Professional Development of Novice Teachers
    5.2.2 Towards Learning Analytics in Cybersecurity Capture the Flag Games
    5.2.3 Analyzing User Interactions with Cybersecurity Games
1 Introduction

More than 16,500 new security vulnerabilities were discovered in 2018 [1]. To give an example, one of the most prominent exploits targeted Facebook in September 2018 [2]. Through a combination of software bugs, attackers were able to obtain an access token for an arbitrary user account. This breach, which was arguably the largest in Facebook's history, exposed personal information of 50 million users. In the light of similar cyber attacks threatening enterprises all over the globe, it is startling that 41% of companies leave sensitive data completely unprotected [3]. What is more, by 2021, the annual damages from cybercrime will cost the world a staggering $6 trillion [4], a huge increase compared to the $1 trillion cost in 2012 [5].

Cybersecurity is defined as "a computing-based discipline involving technology, people, information, and processes to enable assured operations in the context of adversaries" [6]. With the globally rising importance of combating cyber threats, the cybersecurity workforce shortage is growing as well. It is estimated that by 2022, 1.8 million jobs that require cybersecurity expertise will be unfilled [7]. Other sources estimate an even higher demand for 3.5 million experts by 2021 [4]. At the time of writing this thesis proposal, more than 310,000 of the open positions are in the USA [8]. Educational institutions, governmental organizations, and private companies are all aware that in a situation like this, training more cybersecurity professionals is crucial. As a result, they are continually developing curricula, courses, and training materials to fight the skill gap.

An increasing trend in cybersecurity education is to complement theoretical knowledge and concepts with their practical applications. This is done by employing active learning methods [9] such as cybersecurity games. These are software applications that allow learners to exercise their cybersecurity knowledge and skills by completing training tasks in a game-like context. The games simulate a broad spectrum of practical, real-world situations in a controlled environment. The players can attack and defend computer systems, analyze network traffic, or disassemble binaries without any negative consequences.

Employing cybersecurity games in educational settings or competitions carries numerous benefits. The games can engage learners,
spark interest in cybersecurity, and motivate learners to explore the field further [10, 11, 12]. Next, the games allow learners to apply security tools [12] and practice theoretical concepts [13], thereby increasing competence, creativity, and problem-solving skills [14]. Many games feature assignments that resemble authentic cybersecurity work tasks, which might otherwise be problematic to simulate in a classroom [15]. This especially applies to exercising adversarial thinking. This term refers to adopting a "hacker perspective" [14] on how to force a computer system to fail. Such a skill is crucial for cybersecurity professionals, since it enables them to understand cyber attacks and set up effective defenses [14, 16]. Finally, apart from their value in promoting learning [10, 17], ranking well in competitive games often leads to public recognition, (monetary) prizes, or job opportunities [18].

Because of their benefits, cybersecurity games have become one of the most prevalent and attractive methods of hands-on learning and competing. Games of various difficulty levels and focus grow in numbers and spread widely [10, 12, 15, 19, 20], from informal online hacking communities to universities, security conferences, and professional training events. What is more, the number of participants in cybersecurity games is rising exponentially [11].

1.1 Overview of the Research Problem

The most popular format of a cybersecurity game is Capture the Flag (CTF). In a CTF game, the player completes practical security-related tasks while exercising technical skills and adversarial thinking. Finishing each task yields a unique textual flag that the player submits to confirm the solution. If the solution is correct, the player is immediately awarded points and continues with the next task.

Although this game format has a vast educational potential for learners at all skill levels, it is currently employed mostly in competitions that target advanced players. CTF games usually require a substantial knowledge of the cybersecurity domain, as well as practical expertise. Therefore, these games are effective only for already skilled players [11] and offer little educational value to less experienced learners [18, 20]. Even worse, an unsuccessful attempt in such a game can frustrate beginners and diminish their motivation to learn [18].
[Figure 1.1: The research problem lies in the intersection of three areas: cybersecurity, education, and data analysis.]

To reduce the participation barrier for novice learners and increase the educational impact of cybersecurity games, players need to receive in-depth feedback on their approach. This feedback must be personalized for each player, explaining whether their approach is correct and why, what they do well, and what they should improve and how. Providing such guidance has "more effect on student achievement than any other single factor" [21, p. 480]. Without it, beginners miss learning goals and take a longer time to learn [22]. However, to the best of our knowledge, no CTF to date provides detailed feedback to players, and research of such methods in the context of cybersecurity games is almost non-existent. We see this as an open research problem.

CTF games allow collecting data about the actions of players and the corresponding metadata, such as the timing of these actions. Although researchers have analyzed such data to study computer security [12, 17, 23], very few have focused on facilitating learning. To address this issue, we want to develop and evaluate methods for providing players with automated personalized feedback about their progress. For this, we need to define a model of the game levels, understand how players interact with the game and security tools, and create the feedback system. As Figure 1.1 shows, the research problem combines hands-on cybersecurity education with data analysis techniques.
1.2 Expected Contributions and Impact

Enhancing CTF games with real-time automated personalized feedback will improve the effectiveness of cybersecurity training on multiple levels. The feedback system can complement or even partially replace human teachers. Since automated interventions scale better than manual ones, more people, especially novice and intermediate learners, will be able to develop cybersecurity skills. Each learner will proceed at an individual pace and receive feedback with higher accuracy than from teachers, who act based on limited data. Moreover, the feedback system will address the needs of players who may require help, for example, by providing hints, explanations, or relevant study materials. This would allow learners to recognize mistakes, learn from them, and then improve their approach. Since cybersecurity tools are becoming increasingly complex, relevant feedback will ultimately help learners accomplish practical work tasks.

Apart from helping students at all learning levels, our results will have a broad impact also on cybersecurity instructors, game designers, and educational researchers. Instructors will gain deeper insight into the difficulties of the learners, enabling them to support learners more effectively. Game designers will gather evidence for improving the games and enhance the experience of future players. Finally, researchers will be able to explore trends in the game data across groups of players or different games. What is more, since many other cybersecurity games involve similar types of player interactions, our methods could be generalized and applied also in other domains.

1.3 Structure of the Thesis Proposal

This thesis proposal is divided into five chapters. Chapter 2 describes the current state of the art in cybersecurity education, analysis of educational data, and related areas. Chapter 3 defines the research problem, research questions, and methods. It also presents the plan of the work and lists relevant publication venues. Chapter 4 summarizes the results already achieved. Finally, Chapter 5 lists my published papers, three of which are included in the appendix.
2 State of the Art

This chapter provides the necessary background and a survey of related research findings. Section 2.1 covers in depth the topic of CTF as the core of this thesis proposal. Section 2.2 deals with approaches to educational data analysis, the main research area for this work.

It is important to note that cybersecurity education is a relatively young field. The ACM/IEEE Computer Science Curricula [24] included Information Assurance and Security as a knowledge area only in 2013. Research in cybersecurity education is fragmented, and there is no single comprehensive resource, such as a monograph or journal series. Therefore, when writing this chapter, I read papers published at the related conferences and journals (see Section 3.5). I focus especially on the most recent publications from 2013 to 2018.

2.1 Capture the Flag Games

Since this work is interdisciplinary and overlaps with educational research, this section starts with a broader context of using games in education. It narrows down to cybersecurity as it continues with a brief history and definition of CTF, its typology along with examples, and a discussion of the required technical infrastructure. Finally, the section examines the use of CTFs for competitions, education, and research.

2.1.1 Games in Education

When it comes to gaming approaches, education can be enhanced by gamification or serious games. The former is defined as "the use of game design elements in non-game contexts" [25]. The latter refers to full-fledged games designed for a primary purpose other than entertainment [26] (in this context, to teach knowledge or skills). Using gamification or serious games in education is a form of active learning [27], and the latter is sometimes called (digital) game-based learning [28].

Cybersecurity educators have applied gamification and serious games with great success. Enhancing an undergraduate cybersecurity course with game elements such as a storyline, real-time scoring, and badges deepened student interest and motivation [29]. In another course,
gamification reinforced student experience and engagement [30]. Multiple case studies of serious cybersecurity games report their positive effects on learning. Apart from CTFs described later, these games include Netsim, a web-based game to teach network routing [31]; CyberCIEGE, a game with 3D graphics to simulate cybersecurity decision-making at a managerial level [32]; and Werewolves, a text-based game to demonstrate information flow and covert channels [33]. Even board and card games have been developed to teach cybersecurity principles. These games include Elevation of Privilege, a game to teach threat modeling [34]; Control-Alt-Hack, a game to promote the cybersecurity field [35]; and [d0x3d!], a game to teach cybersecurity principles [36].

The benefits of gaming approaches, which were mentioned above and in Chapter 1, are also supported by pedagogical theory and research. Studies confirm that students who play serious games exhibit higher flow and attainment compared to lectures [37, 38]. Generally, games promote learning and motivation, but their positive effects also depend on the context and the target audience [39]. For the advantages of serious games to manifest, elements of story, interactivity, and adequate delivery of educational content are vital [40].

2.1.2 Origins and Definition of CTF

The term Capture the Flag originally refers to a traditional outdoor game for two teams. The goal of each team is to steal a physical flag from the other team's base while defending its own flag at the same time. This game format later inspired the organizers of the hacker conference DEF CON [41] when creating a virtual playground for exercising cybersecurity skills. In 1996, DEF CON started a tradition of cybersecurity CTFs that is still evolving today.

Since then, the label CTF has been used to denote a broad spectrum of events [23] with a different scope, structure, and variations of rules. This sometimes led to ambiguous interpretations of the term. Therefore, based on a thorough literature review below, we propose the following definition to unify the terminology. CTF is a remote or on-site training activity in which participants exercise their cybersecurity skills by solving various technical tasks in a limited time. Completing each task results in finding ("capturing") a text string called a flag. The flag serves as a proof of solution that is worth points. Therefore, the flags are usually
long and random to prevent cheating. The team with the most points at the end of the game wins. CTF games can run in one of three modes, Attack-defense, Attack-only, or Jeopardy, which are detailed below.

2.1.3 Attack-defense CTF

In an Attack-defense CTF [17, 23], the organizers prepare a (virtual) network of hosts that run intentionally vulnerable services. Each participating team controls an instance of this network with identical services. The goal is to attack the networks of the other teams and defend one's own network at the same time. Attacking involves exploiting the vulnerabilities in the services of other teams, which results in gaining access to secret flags. Defending involves patching the same services on one's own hosts without breaking their functionality. A scoring bot regularly grades the teams based on a combination of submitting correct flags, applying defensive countermeasures, and maintaining the availability of the services. Examples of scoring systems are detailed in [20, 23, 42].

Attack-defense was the first type of CTF [43]. Since its inception in 1996, DEF CON CTF [41] has been hosted annually as an on-site event. Next, iCTF [17] is the largest Attack-defense CTF focused on cybersecurity education, which has been running online every year since 2003. Apart from the US-based events, the Russian RuCTFE is one of the biggest online Attack-defense CTFs held annually [16].

An Attack-only CTF is a subcategory of Attack-defense CTF. The defensive elements are removed from the game, and the teams focus only on exploiting services in the given network infrastructure. The term Defense-only CTF has appeared in the literature [23, 44] but is rare in practice. Instead, Cyber Defense Exercises serve for defense training. Still, offensive and defensive skills are closely related, often blurring the line between attacking and defending [45, 46].

2.1.4 Jeopardy CTF

A Jeopardy CTF imitates the format of the popular television game show "Jeopardy!" [15]. It features an online game board with many different standalone assignments called challenges [13, 47]. (Since the assignments are usually of an offensive nature, some authors regard Jeopardy CTFs as a subcategory of Attack-only CTFs [23]. However, the following distinction is more practical: the tasks in Attack-defense and Attack-only CTFs are carried out in an underlying network infrastructure, whereas in Jeopardy CTFs, the tasks are predefined in a web interface or a virtual machine.) The challenges are
divided into categories; the five most common are cryptography, web security, reverse engineering, forensics, and pwn (hacker jargon for gaining ownership of a service) [48]. Each challenge has a different difficulty and a corresponding score value. At any time, each team can choose to attempt any challenge, which typically includes downloadable files [49] and is solved locally. (All challenges are usually released at the start of the game. However, unlocking new ones at a predefined time or based on solved challenges is not uncommon.) A successful completion yields a flag that is submitted to a scoring server to confirm the solution.

Jeopardy CTFs are a part of informal competitions, academic events, and professional courses. One of the first Jeopardy CTFs arose again within the DEF CON community [41]. Every year since 2002 [50], DEF CON CTF Quals has determined advancement to the Attack-defense finale. This event inspired a multitude of other informal CTFs, such as Plaid CTF [51], running since 2011. Even Google started its annual CTF in 2016 [52]. Inter-ACE and C2C target university students, and an extensive experience report from the organizers is available [53]. Next, CSAW CTF [54] is a well-established entry-level CTF hosted by academics. Since 2007, it has offered challenges for undergraduates who are beginners in cybersecurity and CTF [55]. Another introductory CTF is PicoCTF [56], running since 2013 for middle- and high-school students. Last but not least, private companies, such as SANS, create Jeopardy CTFs for certified security training [57]. The vast majority of Jeopardy CTFs, including all those previously named, are held online.

2.1.5 Technical Infrastructure for CTF

Regarding platforms for Attack-defense CTFs, the iCTF framework [17] is an open-source tool to build virtual machines (VMs) for the games. It was later developed into a website that offers on-demand creation of CTFs [16]. The service runs in a cloud and features a library of vulnerabilities that can be included in the VMs. Alternative approaches include Git-based CTF [58], an open-source Attack-defense CTF platform, or using Docker application containers to create the game infrastructure [59].
There are many open-source platforms for Jeopardy CTFs. A well-established one is CTFd [47], which allows creating and administering Jeopardy challenges via a web browser. The framework is documented and customizable with plugins. PicoCTF [56] developed its own platform similar to CTFd. It was later enhanced with the generation of unique flags or problem instances [60]. The former serves to prevent and detect flag sharing, while the latter allows creating practice problems. Finally, it is possible to share offline VMs with Jeopardy challenges [13].

2.1.6 Advantages and Disadvantages of Competitive CTF

The original purpose of CTF was competitive [47], and most CTFs remain "highly focused on competition" [12]. Similarly to programming contests [61], their goal is to showcase and evaluate the performance of already skilled participants [62]. Competitive CTFs cover many cybersecurity topics [10] and offer recruitment opportunities, reputation building [10], and the enjoyment of competing [13] to the participants. Next, a competitive setting can motivate and engage students [29, 13], especially those who are attracted to cybersecurity, have extensive prior experience, or possess skills required by the competition [11]. By solving the competition tasks, participants deepen their understanding of cybersecurity [22] and practice creative approaches to both known problems and those outside the traditional curriculum [56]. Moreover, competitions offer considerable learning benefits also before and after the event. Preparing for a CTF involves developing new tools, studying vulnerabilities, and discussing strategies [17], which exposes participants to new skills [15]. After a CTF, the competitors or organizers publish write-ups: walkthroughs that report solutions and explain the vulnerabilities involved in the game. Both writing and reading these is beneficial [16].

Some authors argue that the effectiveness of cybersecurity competitions is not researched thoroughly [11] and that the evidence of engagement and motivation for learning is often anecdotal [10]. Although CTFs have vast educational potential, their competitive setting might discourage or even alienate other students [12, 63], especially beginners [11], for three main reasons. First, the tasks are usually too difficult for less-experienced participants [20]. Second, some of the tasks are also intentionally ambiguous, require a lot of guessing, or include artificial obstacles to make them harder to solve [55]. Third, the participants
receive limited individual feedback about their progress. They are often unsure whether they are on the right track and usually receive only information about whether the submitted flag was correct or wrong [55].

While the properties mentioned above are often suitable for competitions, they also create a large barrier to entry. Competitions do not attract many new participants, possibly leaving many talents unidentified and undeveloped [10]. The features of competitions can even deter novices from pursuing cybersecurity knowledge [64]. Even if less experienced players decide to participate, they may quickly become discouraged [55] or frustrated because of performing poorly [11]. Finally, although the unguided progress inherent to competitions suits advanced learners and can lead to creative solutions [65], it is highly ineffective for beginners [66]. Without guidance, novice students miss essential learning goals and take longer to learn a concept [22].

2.1.7 Towards an Educational Use of CTF

Only one can win a competition, but everyone can meet a challenge [21]. Therefore, educators can leverage the format of CTF for a self-paced, individual, hands-on practice of cybersecurity skills without the overly competitive setting. This would preserve most of the advantages mentioned above without alienating beginners [12]. Some educators host CTFs with simpler tasks that are more suitable for beginners [67, 64]. However, to further unfold the educational potential of CTFs, learners must receive formative feedback (see Section 2.2.2) in the game.

2.1.8 Using CTF for Cybersecurity Research

Apart from their value in competitions and education, CTFs can generate realistic datasets for research [68]. These data were employed to study cybersecurity itself [12, 17, 23] (not cybersecurity education, which is discussed in Section 2.2). Examples of such cybersecurity research include measuring network traffic during attacks, exploring the attack mechanics, or testing prototype tools. Moreover, the iCTF team leveraged this data to measure the effectiveness of attacks [42] and analyze the risks and rewards associated with the players' strategies [69]. The datasets from iCTF are public [70], as are those from DEF CON CTF [41].
2.2 Computer Science Education Research

This section discusses approaches to educational data mining (EDM) [71] and learning analytics (LA) [72]. These are applied computer science disciplines that leverage student data to better understand learning and teaching, and ultimately, improve it [73]. EDM and LA significantly overlap, and the differences in their philosophies are minor [74]. Both are interdisciplinary fields that combine educational theory and practice with inferential statistics, data analysis, and machine learning. They allow a shift from subjective teaching and learning to evidence-based, data-driven approaches [75]. We examine motivation, foundations, and recent findings related to the goals of this thesis. However, EDM/LA research in the domain of cybersecurity is sparse. Therefore, this section also mentions works from other domains: most notably programming, which comprises the majority of EDM/LA studies in computer science [76].

2.2.1 General Approaches

EDM/LA studies usually build a model of student data for further analysis. These models can be descriptive or predictive [77]. Descriptive models aim at explaining structure, patterns, and relationships within the data to address student modeling, support, and feedback. They usually apply unsupervised learning algorithms (such as clustering), inferential statistics, association rules, or instance-based learning. Predictive models aim at estimating "unknown or future values of dependent variables based on the features of related independent variables" [77] to address student behavior modeling and assessment. They usually apply supervised learning algorithms (such as regression and classification), decision trees, or Bayesian networks.

Traditionally, EDM/LA studies often relied on collecting additional information about learners, such as their demographics, previous experience, or academic performance, via questionnaires [78]. This paradigm is apparent also in studies that evaluated some aspect of CTFs, such as participant learning or engagement. The evaluation was almost exclusively based on informal and often self-reported participant data [12] from surveys and focus group interviews, as in [20, 62, 79, 80, 81, 82, 83]. Another traditional approach was comparing pre-test and post-test scores, as in [64, 84]. While both these approaches have merit in educational
research, they also have major shortcomings. Self-reported data can be inaccurate (CTF participants may report behavior not reflected in the game logs, and conversely, report not behaving in a way that the logs nevertheless show [85]), while test scores strongly depend on the test design.

Nowadays, it is becoming increasingly common to examine student data produced while solving assignments [78], such as program code, and the corresponding metadata, such as the time spent on a task. This type of research involves four cyclical steps: collect data from a learning platform, analyze it, design a learning intervention, and deliver it within the platform [73]. We will employ a similar approach in a rigorous analysis of data available from CTFs.

2.2.2 Formative Feedback

Formative feedback (also called formative assessment) is "information communicated to the learner... to modify his or her thinking or behavior to improve learning" [86]. An example of formative feedback is informing students who struggle with a task about their misunderstandings and recommending concrete steps for improvement. Unlike summative assessment, which refers to evaluating a student's performance with a grade or points after finishing a task, formative feedback is useful while the student is still learning [21, p. 480].

Although most practical computer science courses involve students in extensive problem-solving, completing as many assignments as possible does not necessarily promote learning [87]. Pedagogical theory [88] and cybersecurity educators [76] agree that formative feedback is another crucial element. It helps students to deeply engage with the subject [13], correct misconceptions, and improve understanding. Perhaps surprisingly, students are more motivated to improve their work when they receive formative feedback without the summative one [89]. Formative feedback is especially vital in serious games [90], since it deepens understanding and separates educational games from mere play [18].

Nevertheless, assessing student performance on cybersecurity assignments is a difficult task [76] that involves collecting and analyzing evidence about student achievement [91]. Quality feedback requires domain experts, few of whom are available for this task [92]. Even then, providing feedback manually is laborious, time-consuming [13], and costly [89], and thus becomes impossible even in moderately large classes.
If instructors assess learners manually, the feedback is often sparse or delayed. Therefore, there is a great need to automate the process.

2.2.3 Intelligent Tutoring Systems

An intelligent tutoring system (ITS) provides automated feedback to learners while they solve computer-based assignments [73]. The system is based on domain knowledge, since the feedback results from comparing the learner's problem-solving process with the expert's solution. An ITS aims to automate teaching by replacing instructor feedback [73], and in STEM (Science, Technology, Engineering, and Mathematics) disciplines, an ITS can be as effective as human tutors [93] and increase student achievement [94]. However, supplementing the instructor is not always needed. Instead, having an ITS analyze educational data can enhance classroom instruction by providing teachers with greater insight into students' problem-solving processes [75].

2.2.4 Challenges to Providing Feedback in Cybersecurity

Cybersecurity games could incorporate an ITS, since they allow gathering rich data that can be automatically analyzed. This data includes information about the game network (such as the status of the services), player interactions with the game systems (such as typed shell commands), or generic game events (such as flag submissions). However, achieving the desired level of feedback automation is extremely complex, because educational game logs consist of vast numbers of possible actions, observable variables, and their relationships to student performance [95]. Another challenge is that the game tasks have multiple paths to the correct solution [96, 76].

Since providing detailed feedback is complex, students receive only summative feedback in CTFs and most cybersecurity exercises. They are informed whether they reached the correct answer or not, and are possibly awarded points. Although this feedback is easy to automate, it is insufficient for educational purposes [76]. It disregards important information about the process of finding the answer, that is, how the student approached a particular task. Without this information, instructors or computer systems are unable to provide formative feedback.
What is more, negative binary feedback can demotivate beginners, mainly because it does not explain what was wrong and how to fix it [97].

2.2.5 Employing Command Line History for Feedback

Gathering the command-line history of learners solving cybersecurity tasks is essential to provide automated formative feedback. Collection of Bash history (including timestamps of commands, their arguments, and exit status) is implemented, to the best of our knowledge, only in the EDURange platform for cybersecurity exercises [98]. The platform can automatically generate an oriented graph that visualizes the Bash history. The vertices of the graph represent the executed commands. The edges represent the sequence of commands, that is, an edge from a command c1 to a command c2 means that c2 was executed after c1. Instructors can use the graphs in real time to check how the students progress, what mistakes they make, and whether they need extra guidance. A post-exercise use case would be to compare the graphs to each other, examine the pros and cons of different approaches, or compare them to a sample solution. This helps students to understand what they did well, identify misconceptions, and discover better approaches.

Creating the graphs is explained in [76]. Starting from the raw Bash history log, the instructors identify primary commands essential to solving the particular exercise (e.g., nmap), along with secondary commands that are informative about the student's progress but are not specific to the exercise (e.g., grep). Then, to reduce the complexity of the lengthy log, they group commands with similar arguments into a single vertex in the graph. Therefore, a subgraph can correspond to a particular high-level task. Lastly, the authors compare the students' quantitative results (to what extent they reached the solution) with a qualitative analysis of patterns in the corresponding command graphs.
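To make the idea concrete, the following sketch builds a small command graph from a timestamped history log. It is an illustration only, not EDURange's actual implementation; the log format mirrors the example in Section 3.2.4, and reducing each entry to its command name is a simplifying assumption.

from collections import defaultdict

def parse_history(lines):
    """Parse lines like '2018-08-20 16:53:02 # hydra -l yoda -P pass.txt'
    into (timestamp, command_name) pairs."""
    entries = []
    for line in lines:
        stamp, _, command = line.partition(" # ")
        name = command.split()[0]  # reduce each entry to the command name
        entries.append((stamp, name))
    return entries

def build_command_graph(entries):
    """Create an oriented graph: an edge (c1, c2) with a count means
    that command c2 was executed right after command c1."""
    edges = defaultdict(int)
    for (_, c1), (_, c2) in zip(entries, entries[1:]):
        edges[(c1, c2)] += 1
    return edges

history = [
    "2018-08-20 16:53:02 # hydra -l yda -P pass.txt",
    "2018-08-20 16:53:34 # hydra -h",
    "2018-08-20 16:57:28 # nmap -sS 172.18.1.14",
]
for (src, dst), count in build_command_graph(parse_history(history)).items():
    print(f"{src} -> {dst} (x{count})")  # hydra -> hydra, hydra -> nmap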
2.2.6 Ideal Feedback

Formative feedback in a cybersecurity game should include a personalized breakdown [15] of the player's actions and an explanation of their effects. This would allow learners to recognize mistakes and learn from them. The feedback can also include hints, for example, in the form of explanations of concepts, links to reference materials, or information about the flag format [55]. Similarly helpful are indicators that encourage the player to continue in a correct approach [55] or prevent him/her from pursuing a blind path. All these aspects can act as an instructional scaffolding [98, 22] that guides the player to maximize learning. At the same time, it is necessary to keep in mind that too much guidance can degenerate into "cookbook instructions", which the students can blindly follow without understanding, and learn nothing [98].

2.2.7 Related Research in Cybersecurity Education

An open research area is exploring the tools and methods learners use to solve CTF tasks [12]. Only a few studies have addressed it to date. One study explored players' behavioral patterns in a Jeopardy CTF. Participant success positively correlated with time to solve the challenges and negatively with challenge abandonment, Internet searching, and switching between tools or challenges [99]. Recognizing this behavior can be a basis for alerting instructors about students who experience difficulties.

2.2.8 Education Research in Other Domains

The largest body of research and development in computing education is carried out in the domain of introductory programming. The focus is on summative assessment; especially automated grading has received attention in scientific studies [100] as well as commercial software [101, 102, 103]. However, online programming tutorials and environments lack personalized formative feedback [96]. They usually provide only shallow feedback [104] focused on program correctness based on automated tests [89, 96, 105, 106].

Exploring Errors and Correct Solutions

Educators call for exploring ways of providing in-depth feedback and guidance to beginners [104]. A step in this direction is examining students' errors and mistakes. A typology of errors in Java code was used to analyze student compilations and determine the most common types of errors, their time-to-fix, repetitions, and spread [107]. In [108], the authors collected multiple correct student solutions to the same programming problems. They then used thematic analysis to categorize differences in syntax, structure, and style of the correct solutions. In [95], clustering identified player solution strategies. It showed actions that contributed
to the solution and also revealed error patterns. If more correct solutions were possible, the authors calculated student preference for particular solutions.

Generating Hints and Feedback

Misconception-Driven Feedback [92] is a model for which instructors predefine common programming misconceptions. These can be discovered in interviews [109], students' solutions to assignments [100], or recorded incremental changes in code [110]. The model then analyzes a student's code to detect syntax and semantic errors (based on compiler error messages) and logical errors (based on output checking by unit tests or code pattern matching to common errors). For each error, instructors define feedback messages shown to students to explain where and why the misconception occurred. In [111], similar feedback messages were displayed directly in the programming environment.

A related research area is automated hint generation, which was studied in the domain of introductory programming [112, 113]. A graph with all solution paths of previous students was created in [114]. A hint corresponded to the path from a given node towards the goal. What is more, instructors can annotate hints [110] to provide higher-quality feedback [92]. However, these data-driven approaches suffer from a typical slow-start problem [92]. An exception is employing historical data of previous students: in [96], data from only ten students sufficed to generate hints for more than 90% of students.
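To illustrate the path-based idea, the sketch below derives a hint as the next state on the shortest path from a student's current state to the goal in a graph aggregated from previous students' solutions. The graph encoding, the state names, and the breadth-first strategy are our assumptions for the example, not the exact method of [114].

from collections import deque

# Hypothetical solution graph: each state maps to states that previous
# students reached next. "goal" represents the solved assignment.
solution_graph = {
    "start": ["scanned_ports", "read_docs"],
    "read_docs": ["scanned_ports"],
    "scanned_ports": ["found_ssh"],
    "found_ssh": ["goal"],
}

def next_step_hint(current, goal="goal"):
    """Breadth-first search for the shortest path to the goal;
    the hint is the first state on that path."""
    queue = deque([(current, [])])
    visited = {current}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path[0] if path else None
        for nxt in solution_graph.get(state, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None  # no known path: fall back to a generic hint

print(next_step_hint("start"))  # -> "scanned_ports"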
3 Research Aims and Methods

This chapter starts with an overview of the research questions and expected results in Section 3.1. Then, it explains the research environment in Section 3.2. This establishes the ground for Section 3.3, which proposes a method for each research question. It also discusses the evaluation of the results and the limitations of the approaches. Section 3.4 presents a time plan for the research steps. Finally, Section 3.5 lists conferences and journals relevant for publishing the results.

3.1 Research Questions and Expected Results

The research aim of this work is to explore pathways to providing automated formative feedback to players of cybersecurity games. Specifically, we want to develop and evaluate a virtual learning assistant for CTF games. To achieve this goal, we will explore the following three research questions.

RQ1: How to model paths to the solution of a game level?

RQ2: How to improve the accuracy of the model by extending it with interaction patterns of past players?

RQ3: How to employ the model to provide automated formative feedback to current players?

We expect three main contributions from answering these questions. First, we will describe a general approach to modeling game levels and apply it in practice. Second, we will propose a taxonomy of errors and perform an exploratory study to discover common issues that learners face. Third, we will create a system for providing automated personalized hints and feedback. These results will improve the players' learning experience, the instructors' insight, and the games themselves.

Although we focus on CTF, logs in other educational games can be analyzed similarly. Therefore, our results will have a broad impact in being applicable also in other domains. At the same time, we will explore the domain-specific question of using cybersecurity tools in the context of CTF. Since this research is applied, we will identify and address practical issues and test the results in a realistic use case.
3.2 Research Environment

This section provides a starting point for the research to set the context for understanding the research methods in Section 3.3.

3.2.1 Cybersecurity Game Format

We develop and run training activities in the form of Attack-only CTFs for practicing offensive security skills. Each game is played by a single player who initially controls an attacker machine in a network with several vulnerable hosts. The player gradually receives security-related assignments structured into a linear sequence of levels. Each level is finished by finding the correct flag. There is always only one correct solution; however, there might be several correct pathways to reaching it. Finishing a level is rewarded with a specified number of points that contribute to the player's total score. The game ends when the player enters the last flag or when (s)he decides to quit.

The game provides optional scaffolding by offering static hints, which are predefined by the game's author. If the player struggles with a level, (s)he can reveal these hints in exchange for points. It is also possible to display the complete solution recommended by the game's author, in which case the player receives zero points for the level. Since the game focuses on practicing and learning, the players are usually allowed to use any materials, and sometimes even discuss their approach with other players. This setting mimics real-life situations where external knowledge bases and outside help are available [85].

3.2.2 Target Audience of the Games

Our CTF games are intended for adults with an IT background who want to broaden their technical skills. Specifically, we focus on computer science university students and cybersecurity professionals. In the vast majority of cases, we have no additional information about the educational background or experience of the learners.

3.2.3 Technical Infrastructure

The games are hosted in the KYPO cyber range [115, 116], a virtual training environment based on the computational resources of the CERIT
Scientific Cloud [117]. KYPO can emulate arbitrary networks of virtual hosts, each of which can run a broad range of operating systems and applications [85]. The hosts are sandboxed, which provides an isolated and controlled environment for the safe execution of cyber attacks [118]. The KYPO environment and the games in it were created by CSIRT-MU, the Computer Security Incident Response Team of Masaryk University [119].

3.2.4 Data Collection

The generic game format allows us to collect generic events from the game portal, regardless of the topic of the particular game. The game events describe the player's interaction with the game interface. There are seven types of game events: starting the game, ending the game, starting a level, ending a level (by submitting a correct flag), submitting an incorrect flag (and its content), taking a hint (and its number), and displaying a solution to the level. Each game event is logged with the corresponding player ID and a timestamp.

In the vast majority of games, the players work with command-line tools, mostly with the penetration testing toolkit in Kali Linux [120]. In addition to the game events, the KYPO cyber range allows retrieving the command history from the sandboxes of individual players. Below is an example of a command log. The player attempts to use the hydra tool to crack the password of the user yoda and makes several errors before reaching the solution. The commands are sequentially executed "one-liners", and we can determine the timestamp of each.

2018-08-20 16:53:02 # hydra -l yda -P pass.txt
2018-08-20 16:53:15 # hydra -l yoda -P pass.txt
2018-08-20 16:53:34 # hydra -h
2018-08-20 16:57:28 # hydra -l yoda -P pass.txt ssh:172.18.1.14
2018-08-20 16:57:54 # hydra -l yoda -P pass.txt ssh://172.18.1.14
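For illustration, one way to represent and analyze the logged game events is sketched below. The field names and the event encoding are assumptions made for the sake of the example; they are not the actual KYPO log schema.

from collections import Counter
from dataclasses import dataclass

# Hypothetical representation of one logged game event; the actual
# KYPO field names and serialization may differ.
@dataclass
class GameEvent:
    player_id: str      # anonymized ID, not tied to a real identity
    timestamp: str      # ISO 8601 time of the action
    event_type: str     # one of the seven types, e.g., "incorrect_flag"
    detail: str = ""    # flag content, hint number, etc.

events = [
    GameEvent("player-07", "2018-08-20T16:50:11", "level_started", "level 3"),
    GameEvent("player-07", "2018-08-20T16:58:02", "incorrect_flag", "yoda123"),
    GameEvent("player-07", "2018-08-20T16:59:40", "hint_taken", "hint 1"),
]

# Example analysis: count incorrect flag submissions per player.
attempts = Counter(e.player_id for e in events if e.event_type == "incorrect_flag")
print(attempts)  # Counter({'player-07': 1})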
3.2.5 Properties of the Data

A single session of a KYPO CTF game is intended for up to 30 people due to resource constraints. The games feature tasks whose solution is often unclear at first and requires completing several sub-tasks. As a result, we can gather a rich and detailed dataset in each CTF game that we can explore deeply. Each player interacts with the game for as long as two hours, generates dozens of game events, and enters up to a hundred commands.

Compared to systems for learning basic programming, CTF games are arguably more heterogeneous. Introductory programming tasks combine only several syntactic blocks, such as conditionals and loops. Cybersecurity tasks, on the other hand, require applying a multitude of specialized tools, several of which may fit the task. Both systems are similar in allowing the collection of data about solutions and the associated metadata.

3.2.6 User Privacy and Ethical Concerns

The data we collect neither contain any personally identifiable information, nor can such information be inferred from them. All player IDs are either randomly generated or replaced with sequentially increasing numbers before further analysis. We do not associate any personal data with these IDs. As a result, player records are completely anonymized, and tracking the same player across different sessions is impossible. We purposefully do not store individual keystrokes, since typing patterns can identify users [121]. Similarly, we avoid collecting other sensitive data, such as eye movements [122], audio recordings, or heart rate [123], all of which have been employed in computing education research.

3.3 Research Methods

To address the research questions posed in Section 3.1, we will first define a model of the expected paths to the solution of a game level. Second, we will update the model with data about the actions of previous players who completed the level. Third, we will provide dynamic hints and feedback to future players based on their position in the model. Now, let us examine the research methods in detail.

3.3.1 Modeling Game Levels (RQ1)

We need to define a comprehensive representation of all the feasible progressions through a game level that lead to the solution. The model must allow tracking the progress of individual players so that if a player is stuck, (s)he can receive helpful automated hints. The challenge
is to define a model that is abstract enough to suppress unimportant details but preserves interesting properties for analysis [124].

Theoretical Specification of the Model

We will start by having a game designer model the solution to a particular game level. The model will be a transition graph G = (V, E) that is finite, directed, and acyclic. The vertices V represent the current state of a player. There are two types of states: a knowledge state and an action state.

A knowledge state k ∈ K represents the current knowledge of a player. The first knowledge state k1 represents the information given to the player at the start of the level. The last knowledge state kn represents finding the flag. The ones between represent partial knowledge resulting from completing the sub-tasks.

An action state a ∈ A represents a tool that is relevant for completing a task. The player can use it to advance to another knowledge state. We will not assume an arbitrary tool, but limit the model to the tools that are pre-installed in the attacker virtual machine. In the early stages of the research, we will also constrain ourselves to command-line tools. To maintain a reasonable complexity of the model, we will not store the arguments of the commands directly in the model, but instead associate them with the action state (see later in this section).

Together, the knowledge and action states form the whole set of states, that is, V = K ∪ A. They capture situations that the player can reach during the game. Finally, the edges E represent transitions between states. A direct transition between two knowledge states is not permissible, as there must always be at least one action performed.

Example Model

To show an example, assume the following simplified game level. In the beginning, the player is informed about a server with weak login credentials, and the objective is to access it. To accomplish this task, the player must scan the server's ports to reveal that TCP port 22 is open on the workstation, running the SSH service. Having a list of common usernames and passwords, the player must then execute a dictionary attack, crack the password, and access the workstation.
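As a concrete illustration, this example level could be encoded as the following minimal sketch. The dictionary-and-edge-list representation and the state names are our illustrative choices, not a prescribed implementation of the model.

# A minimal encoding of the example level as a transition graph.
knowledge_states = ["knows_server", "knows_ssh_port", "knows_password"]
action_states = ["nmap", "metasploit", "hydra", "medusa", "john"]

# Directed edges: knowledge -> action -> knowledge (never knowledge -> knowledge).
edges = [
    ("knows_server", "nmap"), ("knows_server", "metasploit"),
    ("nmap", "knows_ssh_port"), ("metasploit", "knows_ssh_port"),
    ("knows_ssh_port", "hydra"), ("knows_ssh_port", "medusa"),
    ("knows_ssh_port", "john"),
    ("hydra", "knows_password"), ("medusa", "knows_password"),
    ("john", "knows_password"),
]

def valid_edges(edges, knowledge):
    """Check the model constraint: no direct knowledge-to-knowledge transition."""
    return all(not (src in knowledge and dst in knowledge) for src, dst in edges)

assert valid_edges(edges, set(knowledge_states))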
[Figure 3.1: An example model for the simplified game level. The knowledge states ("Knows server", "Knows SSH port", "Knows password") are marked with a blue full line. The action states (Nmap, Metasploit, Hydra, Medusa, John) are marked with a red dashed line.]

The game designer can model this level as shown in Figure 3.1. In the initial state, the player knows about the existence of the vulnerable server from the level description. After performing a successful port scan, for which the player may use Nmap or Metasploit, (s)he discovers the open ports. Finally, the player executes a dictionary attack using Hydra, Medusa, or John the Ripper. Accomplishing this task reveals the password that the player uses to log in.

Representation of Shell Commands

The executable commands are largely heterogeneous. Therefore, we need to convert them to a suitable abstract representation. This will ensure a normalized format that disregards whitespace and the order of the arguments. For example, a command nmap -sS -T4 123.45.67.89 can be represented in the following JSON structure:

{
  "command_name": "nmap",
  "options": ["sS", "T4"],
  "parameters": ["123.45.67.89"]
}
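A possible implementation of this normalization is sketched below. The tokenization rules (tokens starting with "-" are options, everything else is a parameter, and options are sorted to disregard their order) are simplifying assumptions that a real parser would need to refine for tools whose options take values.

import json
import shlex

def normalize_command(command_line):
    """Convert a raw shell command into the normalized structure:
    name, sorted options (without leading dashes), and parameters."""
    tokens = shlex.split(command_line)
    name, rest = tokens[0], tokens[1:]
    options = sorted(t.lstrip("-") for t in rest if t.startswith("-"))
    parameters = [t for t in rest if not t.startswith("-")]
    return {"command_name": name, "options": options, "parameters": parameters}

print(json.dumps(normalize_command("nmap -sS -T4 123.45.67.89"), indent=2))
# {"command_name": "nmap", "options": ["T4", "sS"], "parameters": ["123.45.67.89"]}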
We will then associate these structures with the corresponding action states to fully capture the author's solution.

3.3.2 Exploring Interaction Patterns of Players (RQ2)

To provide valuable feedback, modeling only the expert's sample solution is not enough. The game designer might not capture all the possible states. Therefore, we need to gradually update the initial model with data from players' interactions. After we collect the logs of players who completed a level, we will group them by the player ID and, for each player, filter the actions that either contribute to a solution or indicate an error. This will allow us to discover both new actions (new solution strategies) and problematic parts. We will then update the model accordingly.

Errors of Players

We are especially interested in the errors that players make when using cybersecurity tools. Our motivation is that observing the types of errors and their repetition is a reliable indicator of performance [125]. Based on the collected data, we will propose a classification of errors. We will then statistically analyze and compare the occurrence of errors to create a knowledge base of common issues associated with action states. This will help us understand the thinking of learners [126] and thus provide more relevant feedback on their errors. It might also indicate poorly designed game levels. Although there are many possible errors, describing the most frequent ones has proved to be sufficient [127]. Similarly to [126], we expect that the most common error types will cover the majority of errors. Moreover, we will examine which errors repeat and how often. One-time errors are most likely accidental mistakes, whereas repeated errors might indicate a misconception that needs to be addressed [128]. Repeated errors in a short time may indicate guessing or brute-forcing.

Approaches to Analysis

Although an analysis model exists for programming data [73], most of its features do not apply in the context of CTF. This is because the model heavily relies on specifics of programming, such as code editing data, compilation data, and debugging data. Nevertheless, we will employ