Investigation on how presentation attack detection can be used to increase security for face recognition as biometric identification ...
Investigation on how presentation attack detection can be used to increase security for face recognition as biometric identification
Improvements on traditional locking system
Fredrik Öberg
Independent degree project, second cycle (Master thesis)
Main field of study: Department of Information Systems and Technology
Credits: 30 hp
Semester, year: 10, 2021
Supervisor: Sebastian Försth (Dewire), Luca Beltramelli (Mid Sweden University)
Examiner: Mikael Gidlund, mikael.gidlund@miun.se
Degree programme: civil engineering computer science, 300 credits
Investigation on how presentation attack detection can be used to increase security for face recognition as biometric identification
Fredrik Öberg 2021–06–15

Abstract
Biometric identification is already in everyday use: today's mobile phones unlock with fingerprints, the iris, or the face itself. With the growth of technologies such as computer vision, the Internet of Things, and artificial intelligence, the use of face recognition as biometric identification on ordinary doors has become increasingly common. This thesis examines the possibility of replacing regular door locks with face recognition, or supplementing the locks to increase security, by using a pre-trained state-of-the-art face recognition method based on a convolutional neural network. A subsequent investigation concluded that neural-network-based face recognition is highly vulnerable to presentation attacks. This study investigates protection mechanisms against such attacks by developing a presentation attack detection (PAD) system and analyzing its performance. The results from the proof of concept showed that local binary patterns histograms (LBPH) used as presentation attack detection could help the state-of-the-art face recognition block up to 88% of the attacks that the convolutional neural network approved without presentation attack detection. However, to replace traditional locks, more work is needed, both to raise the percentage of attacks blocked by the system and to cover more types of attack. Nevertheless, face recognition is a promising technology for supplementing traditional door locks, enhancing their security by complementing the authorization with biometric authentication.
The main contribution is that, according to the test results, a simple, older method such as LBPH can help modern state-of-the-art face recognition detect presentation attacks. This study also adapted the PAD to run on low-end edge devices, so that it fits an environment where modern solutions are used; LBPH meets this requirement.

Keywords: Face Recognition, Presentation Attacks, Convolutional Neural Network
Acknowledgements
First, I want to thank Dewire by Knightec, who gave me the opportunity to do this thesis with them, and my supervisor Sebastian Försth. Secondly, this thesis could never have been completed without the help of my supervisor Luca Beltramelli at Mid Sweden University, who helped me when I needed it and gave excellent feedback that improved the thesis.
Table of Contents
Abstract
Acknowledgements
List of Figures
List of Tables
Terminology / Notation
1 Introduction
  1.1 Background and problem motivation
  1.2 Overall aim
  1.3 Scope
  1.4 Research question
  1.5 Concrete and verifiable goals
  1.6 Outline
  1.7 Contributions
2 Theory
  2.1 Face Detection
    2.1.1 Haar-like cascade
    2.1.2 Histogram of Oriented Gradients
  2.2 Face Recognition
  2.3 Spoofing and Presentation Attack
  2.4 Face classification
  2.5 Methods
    2.5.1 Convolutional Neural Network
    2.5.2 Local Binary Pattern
    2.5.3 Principal component analysis
  2.6 Databases
    2.6.1 Face recognition
    2.6.2 Spoofing attacks databases
  2.7 Related work
3 Methodology
  3.1 Research area and strategy
  3.2 Proposed solution
  3.3 Dataset structure
  3.4 Choice of algorithms
    3.4.1 Face detection
    3.4.2 Face recognition
    3.4.3 Image classification
    3.4.4 Presentation attack detection
  3.5 Evaluation
4 Implementation
  4.1 Testing framework
    4.1.1 Presentation attacks
  4.2 Face recognition system
    4.2.1 Face detection
    4.2.2 Face recognition with CNN
    4.2.3 Image classification
  4.3 Presentation attack detection with LBPH
    4.3.1 Face detection
    4.3.2 LBPH training
    4.3.3 Image classification
5 Result
  5.1 Investigation of methods
    5.1.1 Face recognition
    5.1.2 Presentation attacks
  5.2 Implementation of systems
  5.3 Evaluation against the database
    5.3.1 Case one FR
    5.3.2 Case two PAD
    5.3.3 Case three PAD + FR
6 Discussion
  6.1 Development of system
    6.1.1 CNN face recognition
    6.1.2 Presentation attack detection
  6.2 Framework discussion
  6.3 Evaluation of results
  6.4 Ethical aspects
  6.5 Future work
7 Conclusions
  7.1 Concrete and verifiable goals
  7.2 Conclusion research question
  7.3 Overall conclusion and lessons learned
  7.4 Main contributions
References
List of Figures
Figure 1: Cloud based face recognition system
Figure 2: Illustration Haar-like features
Figure 3: Face recognition process
Figure 4: Standardization of weak point in ISO/IEC DIS 30107-1, 2016
Figure 5: Convolutional neural network
Figure 6: Max-pooling
Figure 7: Local Binary Pattern
Figure 8: Framework
Figure 9: Folder structure
Figure 10: CNN detection of a face
Figure 11: Attack and real histogram distribution

List of Tables
Table 1: Database protocols
Table 2: CNN baseline confusion matrix
Table 3: CNN baseline results obtained from matrix
Table 4: PAD confusion matrix
Table 5: PAD baseline results obtained from matrix
Terminology
CNN: Convolutional Neural Network
CRISP-DM: Cross Industry Standard Process for Data Mining
DNN: Deep Neural Network
FR: Face Recognition
IoT: Internet of Things
KNN: k-nearest neighbor
LBP: Local Binary Pattern
PA: Presentation Attack
PAD: Presentation Attack Detection
SVM: Support Vector Machine
1 Introduction
This chapter explains the background of this study, which examines the security of facial recognition as biometric identification, as a replacement for or supplement to key systems. It then presents concrete and verifiable goals for achieving the overall aim of the project. Finally, it describes how the report is structured, what limitations the work had, and the author's contribution.

1.1 Background and problem motivation
In recent years, biometric identification has become the preferred solution to a wide range of problems involving identity checking because it can provide more secure identification and verification [1]. A very common method of biometric identification today is face recognition. Biometric identification as an alternative to traditional locks has already been applied in society: today's mobile phones use fingerprints, and other methods exist as well, such as the iris and the face itself. Of these three, face recognition is the least intrusive in daily use because of how easy it is to analyze images of faces. A survey [2] published in 2019 identified and categorized over 330 contributions to deep-learning-based face recognition, a testament to the significant interest surrounding this area in academia. One large part of the survey concerned the identification process, which is simply the process of someone claiming to be a specific person. After this process, an authentication process is needed to verify or prove the claimed identity. Today this happens in the form of a traditional locking system that uses a key or password.
Traditional locks lead to many accounts, passwords, and more, and keeping track of these is becoming increasingly complex, especially for systems that require high security. To solve these problems with traditional systems, biometric identification can be used, which is a process where parts of a person's body are analyzed to identify the person. The use of this type of identification has started to increase. It is used more and more in devices such as smartphones, laptops, and tablets to secure data and other sensitive information, because of the uniqueness of biometric characteristics [3].

One of the identification methods mentioned earlier is face recognition. Some challenges that need to be addressed for this method are low resolution, pose variation, complex illumination, and motion blur. Face recognition methods based on more traditional algorithms such as support vector machine (SVM), Eigenfaces, Fisherfaces, Metaface, and Bayesian faces do not handle these problems well. Furthermore, none of these methods can handle unconstrained face matching, such as different lighting and background every time. All of these problems are described in [2]. The survey also focused on convolutional neural networks (CNN): of the 330 contributions in the survey, 61% were based on CNNs to solve different face recognition problems. These methods showed good results on verification with face recognition, with up to 96% accuracy [2].

Biometric identification methods can be combined with technologies such as computer vision, the Internet of Things, artificial intelligence, and cloud solutions. An ideal system that uses these technologies was designed for this study and is shown in figure 1 as a reference picture, which is explained in detail later. Given that picture, and that phones and computers already use face recognition to identify users, a question arises: is it possible to use FR to solve the access problem of traditional key systems?

This question has already been tackled in research on how ordinary doors with face recognition work, for example [4] [5] [6], in which intelligent doors with face recognition are realized. However, face recognition brings other problems, such as presentation attacks, in which the face recognition wrongfully gives access when attacked. The article [7] states that if a deep neural network (DNN) face recognition is used, the method is highly vulnerable to presentation attacks if the model has higher than 90% accuracy. Since security is always a hot topic, this study focuses on security regarding facial recognition.
Furthermore, to explain why we can look at today’s people who use large parts of key systems or code to access places. However, this does not con- firm whether it is the physical owner of the key who accesses the site be- cause there is no guarantee that when using keys and codes, the people who do not have the authorization to enter will enter. One way to solve this problem is by developing a face recognition system that can unlock doors with people’s faces. Nevertheless, still many open questions remain to be answered. Would this system be more secure than regular locks, and will it be safe to use as an everyday use? How resilient is it to Presentation Attacks like replay attacks? What pros and cons does a system like this have? Alternatively, can it be used as a face recognition supplement to the already existing key system?. All of the above questions and thoughts will this study try to answer. 2
Figure 1: Cloud based face recognition system

1.2 Overall aim
This study aims to examine the security of facial recognition as biometric identification, as a replacement for or supplement to key systems in restricted areas where not everyone is authorized to have access. This is done by building a proof of concept that uses LBPH from the Python library OpenCV, which was originally made for face recognition but in this case acts as a PAD. It is compared with a CNN FR system to see how that system handles PAs. The aim assumes that the developed system will be used in the reference architecture in figure 1, motivated by popular technologies such as cloud computing and IoT that industry is trying to adopt. The figure can be read as follows. A full-fledged door locking system has an edge device that captures and detects faces and checks whether the picture is a presentation attack. Because this step runs on an edge device, the developed system must run on lower-end edge devices. The edge device then sends the picture to the cloud, where a face recognition process begins and a decision is made on whether the person is allowed to enter the door. To do this, a face recognition model with high accuracy is used. With the architecture in figure 1 as a reference when developing the FR and the presentation attack detection (PAD), this study investigates how facial recognition as biometric identification can replace or supplement traditional systems.

More practically, the aim is to develop a PAD that protects against PAs, based on related work, and to take a pre-trained CNN model and evaluate its vulnerability to PAs. This creates two systems, one for the PAD and one for the FR, which work together to classify whether an input is a PA and whether the person is allowed to enter. All of this is evaluated on accuracy and on how well it can restrain PAs from the Replay-Attack database [8].

1.3 Scope
This thesis has several limitations in scope. Because there are many ways of implementing a face recognition system, this thesis focuses on convolutional neural networks as the face recognition method, since they are considered state of the art. To investigate and evaluate whether a face recognition system can replace a traditional lock or serve as a supplement, the thesis focuses on how to protect against presentation attacks and how that protection affects the baseline of the state-of-the-art face recognition. Furthermore, the study is based on the architecture in figure 1, which means that the PAD must be able to run on a low-end edge device. Due to the time span of the thesis, this study only looks at high-resolution replay attacks, because this type of attack is simple for a regular user to carry out.
1.4 Research question
The main research questions in this thesis are as follows:
• How can a PAD be used to increase the security of a state-of-the-art face recognition model, such as a CNN model with high accuracy, in a locking system?
• Can traditional locking systems be replaced with face recognition, or can face recognition be used as a supplement to increase the security of an existing locking system?
1.5 Concrete and verifiable goals
The concrete goals of the project are as follows:
• Investigate what methods exist for face recognition and what methods exist to protect these face recognition methods against presentation attacks.
• Implement a face recognition system using a CNN model and a PAD suitable for running on a low-end edge device.
• Implement a test environment to evaluate the PAD and the FR.
• Evaluate the CNN model and the PAD against the presentation attack database.
• Evaluate the PAD and the CNN model together against the presentation attack database.
• From the results, evaluate the possibility of increased security by replacing or supplementing a traditional locking system with face recognition, to see the strengths and weaknesses of facial recognition.

1.6 Outline
Chapter 1 describes the general background and the purpose of this project. Chapter 2 explains the necessary theory and presents the related work. Chapter 3 explains the method used to carry out this project and to test and validate the created systems. Chapter 4 describes the implementation of the system. Chapter 5 presents the results. Finally, Chapters 6 and 7 present the discussion and conclusions.

1.7 Contributions
The thesis was performed by Fredrik Öberg, under the supervision of Sebastian Försth (Dewire by Knightec) and Luca Beltramelli (Mid Sweden University).
2 Theory
This chapter describes the theoretical elements of this thesis. It gives the reader the theory needed to understand the coming chapters. Significant parts are deep learning and facial recognition. Another part is previous work in the areas of this study.

2.1 Face Detection
Face detection is a technology in computer science that aims to detect and locate faces in an image or a video stream. There are different methods to accomplish this task. One of these is the CNN. Developing these networks from scratch requires vast amounts of data and can be complex. If this is a problem, a pre-trained model trained on millions of faces can be used instead. Other commonly used methods are the Haar-like cascade, Histogram of Oriented Gradients (HOG) with SVM, and the LBP cascade. A comparison of these methods in [9] showed that the HOG+SVM approach is more robust and accurate than the LBP and Haar approaches, with an average detection rate of 92.68%.

2.1.1 Haar-like cascade
The Viola-Jones algorithm, developed in 2001 by Paul Viola and Michael Jones [10], is the basis of the Haar-like cascade. It is an object-recognition framework that allows the detection of image features in real time. Despite being an older framework, Viola-Jones is quite powerful, and its application has proven to be exceptionally notable in real-time face detection.

Figure 2: Illustration Haar-like features
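A Haar-like feature of the kind illustrated in figure 2 is the difference between the pixel sums of adjacent rectangles. The sketch below is a minimal illustration in plain Python, not the implementation used in this thesis: it computes a two-rectangle feature with an integral image (the summed-area table that the Viola-Jones framework relies on) and slides a window across a toy image. The function names and the 4x4 image are assumptions made for the example.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a window (responds to vertical edges)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


def scan(img, height, width):
    """Slide the window over the image and yield (top, left, feature value)."""
    ii = integral_image(img)
    for top in range(len(img) - height + 1):
        for left in range(len(img[0]) - width + 1):
            yield top, left, two_rect_feature(ii, top, left, height, width)


# Toy 4x4 grayscale image (hypothetical data): bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
responses = list(scan(image, 2, 4))
```

A real cascade evaluates many such features per window and rejects most windows early; the sketch only shows how one feature value is obtained cheaply at every window position.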
Detection works by outlining a box on the image and then iterating through the image with this box. While the box moves through the image, it searches for Haar-like features. These are features that the system can see in pictures based on the distribution of black and white regions. By combining these features, a face can be detected. Figure 2 shows what these features look like.

2.1.2 Histogram of Oriented Gradients
A feature descriptor is an algorithm that takes an image and outputs feature descriptors (feature vectors). It encodes the information into a series of numbers that act as a numerical "fingerprint" that can differentiate one feature from another. This is the basis of how the Histogram of Oriented Gradients works. Furthermore, the method aims to represent images with as little data as possible while still capturing what the picture represents.

HOG focuses on the structure or shape of an object and provides the edge directions by extracting the gradient and orientation of the edges. The picture is divided into small regions, and for each region the gradients and orientations are calculated from the pixel values. Finally, HOG generates a histogram for each of these regions separately, using the gradients and orientations.

2.2 Face Recognition
Face recognition is a digital technology that began to be developed in the 1970s [11] and has since developed at a tremendous rate, essentially because computers have become more powerful. Face recognition identifies or verifies a person based on a digital image or video frame. It does so by comparing a given image against a model generated from the images in a database; this model knows all images in the database. Figure 3 illustrates a typical system.

Figure 3: Face recognition process
Several factors make face recognition difficult:
• Occlusion and partial occlusion: part of the face is hidden. This is one of the significant challenges of face recognition, since it is difficult to recognize a face if some part of it is missing.
• Low resolution: for example, pictures taken from surveillance video cameras contain tiny faces.
• Digital noise: images are prone to several types of noise, which leads to poor detection and recognition accuracy.
• Illumination: variations in illumination can drastically degrade the performance of the face recognition system. The reasons for these variations could be background light, shadow, brightness, or contrast.
• Pose variation: frontal face reconstruction is required to match the face in the image with the face in the database.
• Expressions: facial expressions, which we use to express our feelings, can affect the FR.
• Aging: a natural factor that changes the face.
• Plastic surgery: after plastic surgery, a person's face will be unknown to the existing face recognition framework.

Based on these factors, there are a couple of methods to conduct facial recognition. These can be summarized as geometry-based methods, holistic methods, feature-based methods, hybrid methods, and deep learning methods.
• Geometry-based methods: one of the first proposed methods for face recognition. The method works by finding a set of facial landmarks and measuring the position of and distance between them.
• Holistic methods: represent faces using the entire face region. Many of these methods work by projecting face images onto a low-dimensional space.
• Feature-based methods: leverage local features extracted at different locations in a face image.
• Hybrid methods: combine techniques from holistic and feature-based methods. Solutions such as holistic and feature-based methods were state of the art before deep learning became widespread.
• Deep learning methods: CNNs are the most common type of deep learning method for face recognition, because of their capability to handle an unconstrained environment. One drawback of CNNs is the amount of training data they need and how long they take to train.
2.3 Spoofing and Presentation Attack
Face recognition as a method for biometric identification has been used in publicly available devices since 2009 or earlier by large companies such as Lenovo, Asus, and Toshiba. The paper [12] concluded that it was possible to bypass the face recognition of all three companies. As mentioned in chapter 1, state-of-the-art face recognition such as CNNs tends to have very high accuracy. High accuracy alone does not make such a system ideal, because it must also protect itself against attacks, and there are many types of attack. The authors of [13] tackle this problem by developing a secure framework that protects the privacy of the data by offloading the data from the edge to the cloud.

Figure 4: Standardization of weak point in ISO/IEC DIS 30107-1, 2016

A more general view is shown in figure 4 (ISO/IEC DIS 30107-1, 2016), which marks the weak points of a biometric system. Attacks on the biometric sensor (point 1) are called direct attacks, or PAs; attacks at points 2 to 9 are called indirect attacks. A presentation attack thus uses biometric data as the attack on the system: the attacker presents biometric data to the sensor so that the system wrongfully accepts it. The data can be obtained directly from the person, online, or from existing databases.

Protecting against PAs means developing countermeasures that identify whether the presented biometric sample is a false presentation. Such a system is called PAD (presentation attack detection). PADs come in several variations. Frame-based PADs use only a single image to classify face samples and can output a decision quickly. Video-based PADs require a video recording of a certain length to classify the samples. Other methods, such as challenge-response, require human interaction.
There are multiple ways to carry out a PA. Morphed face attacks are one way of attacking a system. The authors of [14] investigated the vulnerability of biometric systems to such morphed face attacks. The work resulted in two new databases, created by printing and scanning digitally morphed images using two different scanners, and in an evaluation of the techniques proposed to detect morphed face images. Other databases have also been created to test and train PAD and FR systems to handle this type of attack. One of them, REPLAY-ATTACK, is used in this thesis. Another paper [6313548] studied the effectiveness of local binary patterns in face anti-spoofing and used REPLAY-ATTACK for evaluation. It was also used for image-based object spoofing detection in [15], which tries to improve spoofing detection by concatenating multiple color schemes when training the model, and which shows promising results compared with other PADs.

The state-of-the-art method of developing PADs is to build CNN models. A problem with CNN-based PADs is that they need large amounts of data to train properly; as [16] notes, the numerous parameters in these deep-learning-based detection methods cannot be learned as well as they could be when data is limited.

2.4 Face classification
Face classification is the step of classifying the facial features that the recognition step has extracted from a person, by comparing them with the features stored in the database. More precisely, person A has one set of features and person B has another, and the classifier must decide which person a given set of features belongs to. To achieve this, a classification algorithm can be applied. Some classification algorithms are SVM, k-NN, and Gaussian Naïve Bayes.
• SVM: the objective of the support vector machine algorithm is to find a hyperplane in N-dimensional space that distinctly classifies the data points.
• k-NN: the k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can solve classification and regression problems.
• Gaussian Naïve Bayes: Naïve Bayes classifiers are based on Bayesian classification methods and rely on Bayes's theorem, an equation describing the relationship of conditional probabilities of statistical quantities. In Bayesian classification, we are interested in finding the probability of a label given some observed features.
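As an illustration of the classification step, the sketch below implements k-NN in plain Python: a query feature vector receives the majority label of its k nearest training vectors. It is a minimal example under stated assumptions, not the classifier configuration used in this thesis; the function name `knn_predict`, the two-dimensional feature vectors, and the labels are made up, and real face embeddings have many more dimensions.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    # Sort the training samples by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest samples and take the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D "face features" for two enrolled people.
train = [
    ((0.10, 0.20), "person_A"),
    ((0.15, 0.25), "person_A"),
    ((0.20, 0.10), "person_A"),
    ((0.90, 0.80), "person_B"),
    ((0.85, 0.95), "person_B"),
    ((0.80, 0.90), "person_B"),
]

print(knn_predict(train, (0.12, 0.18)))  # nearest neighbors are all person_A
print(knn_predict(train, (0.88, 0.85)))  # nearest neighbors are all person_B
```

An odd k avoids ties in the two-class case. In an access-control setting a distance threshold would also be needed, so that a face far from every enrolled person is rejected instead of being assigned the nearest label.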
2.5 Methods

There are many ways to perform face recognition, so this chapter explains a couple of these methods.

2.5.1 Convolutional Neural Network

A convolutional neural network (CNN) is a method in the field of deep learning. It is a common and well-known method for image classification, object classification, and face classification. A CNN takes an input image and runs it through several different layers. An example of this setup is shown in Figure 5. What the layers do is explained below.

Figure 5: Convolutional neural network

• Input Layer: This layer takes an image, which has a basic two-dimensional structure. If the colors are taken into account, the image can be represented in three dimensions. Images are encoded into color channels, typically RGB, so the image data is represented as the intensity of each color channel. The intensity of each color channel over the width and height of the image makes the data three-dimensional. To use the image in the CNN, it needs to be reshaped into a single column. As an example, a 28x28 image (28x28 = 784 values) is converted into a 784x1 column. So, if the training data contains n images, the input will be (784, n).

• Convolution Layer: The main focus of this layer is to extract features. The layer takes the input image and performs a convolution operation, which means it cycles through the image with a filter of a set size. As an example, if the image is 4x4 and the filter is 3x3, the filter is applied at four positions and produces a 2x2 matrix. Equation (1) is the general formula for this operation, where N is the image size and F is the filter size. If the size of the output needs to be controlled, padding can be added to the equation. Equation (2) shows a version that adds p as padding.

\[ (N \times N) * (F \times F) = (N - F + 1) \times (N - F + 1) \tag{1} \]

\[ (N + 2p - F + 1) \times (N + 2p - F + 1) \tag{2} \]

• Pooling Layer: This layer reduces the spatial size of the image representation and is usually placed between two convolution layers. One of the more popular pooling layers is max pooling, which means that the maximum value in each window is kept in the reduction. Figure 6 shows a 2x2 max-pooling process. Pooling reduces the computational cost that the network would otherwise have.

Figure 6: Max-pooling

• Fully Connected Layer: A fully connected layer involves weights, biases, and neurons. It connects neurons in one layer to neurons in another layer. It is used to classify images into different categories by training. In place of fully connected layers, conventional classifiers like SVM can be used as well. However, a fully connected layer is generally added to make the model end-to-end trainable.

2.5.2 Local Binary Pattern

Local Binary Pattern (LBP) is a simple yet very efficient texture operator which labels the pixels of an image by thresholding the neighborhood of each pixel and considers the result as a binary number, as shown in Figure 7. The general way to describe this process is equation (3), where S is defined as in equation (4). The obtained values can then be used to create a histogram of the features, which is then combined with other feature histograms. This histogram serves as the input for different classification methods.
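Looking back at Section 2.5.1, the output-size formulas (1) and (2) and the 2x2 max-pooling step can be checked numerically. The sketch below is plain Python with hypothetical helper names, written only to illustrate the arithmetic; a real CNN would use an optimized library, and the "convolution" here is the unflipped sliding-window product that CNN frameworks normally compute.

```python
def conv_output_size(n, f, p=0):
    """Output side length from equation (2); with p = 0 this is equation (1)."""
    return n + 2 * p - f + 1


def convolve2d_valid(image, kernel):
    """Slide a square kernel over a square image without padding."""
    n, f = len(image), len(kernel)
    out = conv_output_size(n, f)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            # Sum of element-wise products for the window starting at (i, j).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(f) for b in range(f)))
        result.append(row)
    return result


def max_pool_2x2(image):
    """Keep the maximum of each non-overlapping 2x2 window."""
    return [[max(image[i][j], image[i][j + 1],
                 image[i + 1][j], image[i + 1][j + 1])
             for j in range(0, len(image[0]) - 1, 2)]
            for i in range(0, len(image) - 1, 2)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]

features = convolve2d_valid(image, kernel)  # 4x4 image, 3x3 filter -> 2x2 output
pooled = max_pool_2x2(image)                # 4x4 image -> 2x2 output
```

With the 4x4 image and 3x3 filter from the text, the filter fits at four positions, which matches the 2x2 result predicted by equation (1). Adding one pixel of padding (p = 1) keeps the output at 4x4.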
Due to its discriminative power and computational simplicity, the LBP texture operator has become a popular approach in various applications. It can be seen as a unifying approach to the traditionally divergent statistical and structural models of texture analysis. Perhaps the most important property of the LBP operator in real-world applications is its robustness to monotonic gray-scale changes caused, for example, by illumination variations. Another important property is its computational simplicity, which makes it possible to analyze images in challenging real-time settings.

\[ LBP(g_{px}, g_{py}) = \sum_{p=0}^{P-1} S(g_p - g_c) \times 2^p \tag{3} \]

\[ S(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases} \tag{4} \]

Here g_c is the gray value of the center pixel and g_p is the gray value of the p-th of its P neighbors.
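To make the LBP operator concrete, the sketch below computes the code of a single pixel from its 3x3 neighborhood and builds the histogram of codes for a small grayscale image. It uses the common thresholding convention S(x) = 1 for x ≥ 0 and 0 otherwise. The function names, the clockwise neighbor ordering, and the pixel values are assumptions made for illustration; OpenCV's LBPH implementation additionally splits the image into a grid of cells and concatenates one histogram per cell.

```python
def lbp_code(patch):
    """8-bit LBP code of the center pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbors visited clockwise from the top-left corner.
    # The ordering is a convention; it only has to be used consistently.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(offsets):
        s = 1 if patch[r][c] - center >= 0 else 0  # thresholding function S
        code += s * (2 ** p)
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of an image."""
    hist = [0] * 256
    for i in range(1, len(image) - 1):
        for j in range(1, len(image[0]) - 1):
            patch = [row[j - 1:j + 2] for row in image[i - 1:i + 2]]
            hist[lbp_code(patch)] += 1
    return hist


patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)

# Adding a constant brightness offset does not change the code, which
# illustrates the robustness to monotonic gray-scale changes.
brighter = [[value + 10 for value in row] for row in patch]
same_code = lbp_code(brighter)
```

Two such histograms can then be compared with a distance measure, which is how an LBPH-based classifier decides how close an input face is to a stored one.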
Step 5: Compute the covariance matrix.
Step 6: Calculate the eigenvectors with their related eigenvalues.
Step 7: Select K eigenvectors.

For face detection and recognition, the Eigenface approach is considered by many to be the first working facial recognition technology. It served as the basis for one of the top commercial face recognition technology products. Since its initial development and publication, there have been many extensions to the original method and many new automatic face recognition systems. Eigenfaces are often used as a baseline comparison method to demonstrate a system's minimum expected performance.

2.6 Databases

A face recognition model can be trained using a database of pictures. This chapter discusses the different types of databases that can be used to create FR models such as CNNs. To get the best result, the database needs to contain different kinds of data depending on the situation. For example, age, lighting, and the environment can significantly impact the result. Another essential part is the problem of presentation attacks, which also requires training data depending on the PAD type. This chapter therefore also addresses some of the databases for different kinds of attacks on systems.

2.6.1 Face recognition

To develop a successful FR system, one must consider what kind of problems the system has to deal with, and this requires a database to train the model. The choice of database must fit the purpose of the model. For example, if the purpose of the model is to build an FR system for children, then the database must contain images, and variations of images, of children. Other factors like occlusion, low resolution, noise, plastic surgery, aging, illumination, pose, and expressions can affect the result.
2.6.2 Spoofing attacks databases

To handle presentation attacks, databases must be available to test whether the system can handle several types of attacks. This chapter explains what types of databases exist and their different purposes. Some are for latex masks, some are for print attacks, and others are for replay attacks. Later, these databases can be used to test the created system against metrics for face recognition.

• MOBIO: This database consists of bi-modal (audio and video) data from 152 people, 100 males and 52 females. It was recorded from 2008 to 2010 at six different sites in five different countries [17].

• Replay-Attack: This is a 2012 database made up of 1300 video clips of photo and video attack attempts on 50 clients. It is divided into four groups: training data ("train"), to be used for training the anti-spoof classifier; development data ("devel"), to be used for threshold estimation; test data ("test"), with which to report error figures; and enrollment data ("enroll"), which can be used to verify spoofing sensitivity on the face detection algorithms [8].

• Replay-Mobile: This database is similar to Replay-Attack. It consists of 1190 video clips of photo and video attack attempts on 40 clients, under different lighting conditions, and has the same groups as Replay-Attack [18].

• SWAN: The SWAN-Idiap dataset comprises 150 subjects captured in six different sessions reflecting real-life scenarios of smartphone-assisted authentication. One of the unique features of this dataset is that it was collected in four different geographic locations, representing a diverse population and ethnicity. Additionally, it contains a multi-modal Presentation Attack (PA) or spoofing dataset using low-cost Presentation Attack Instruments (PAI) such as print and electronic display attacks [19].

• WMCA: The Wide Multi-Channel Presentation Attack (WMCA) database consists of 1941 short video recordings of both bona fide and presentation attacks from 72 different identities. The data is recorded from several channels, including color, depth, infra-red, and thermal [20].

2.7 Related work

As mentioned in the introduction of this thesis, much work has been done on face recognition in recent years. Of about 330 contributions analyzed in a 2019 survey, 61% were based on CNNs to solve different face recognition problems.
These methods show good results on face verification, with up to 96% accuracy. One big part of this survey focused on what problems FR must overcome to achieve good face recognition, and some of them play a big role depending on the purpose of the model. One of these problems is still-image-based face recognition, where considerable progress has been made in recent years in constrained environments. More recently, researchers have focused on unconstrained face recognition, where various poses, illuminations, expressions, blur, ages, and occlusions are problems [2].

However, with high-accuracy FR models, other problems are discovered. One example is studied in the article "Deeply vulnerable: a study of the robustness of face recognition to presentation attacks", which investigates the vulnerability of DNN-based FR models to PAs. As mentioned earlier, DNN-based FR methods, such as CNNs, have recently outperformed other methods by a significant margin. Nevertheless, maximizing recognition performance alone is not sufficient. The system should also be capable of resisting various kinds of attacks, including PAs. The study shows that high accuracy in DNN-based FR is strongly correlated with vulnerability to PAs once the accuracy reaches 90% or more [7]. This is also shown in [21], which summarizes the lessons learned about spoofing and anti-spoofing in face biometrics and highlights open issues and future directions. The authors state that "Without spoofing counter-measures, most of the state-of-the-art facial biometric systems are indeed vulnerable to attacks since they try to maximize the discriminability between identities without regards to whether the presented trait originates from a legitimate living client or not."

As for systems developed specifically for door access, one article focused on developing a low-cost embedded facial recognition system for door access control using deep learning, intended to run on an edge device. However, one vulnerability of this system is, as mentioned earlier, that a phone displaying a face can be used to open the door [4]. Other papers have also developed smart door systems, such as [5] and [6].
3 Methodology

This chapter describes the methods used to fulfill the concrete and verifiable objectives described in Chapter 1.5. It first explains the work strategy used to achieve the goal of this study, which was followed during the thesis period until the project was completed. The last part describes the testing and validation of the system.

3.1 Research area and strategy

During the work, conclusive research with experimental data has been conducted, using a mixture of the two agile work strategies Scrum and XP. Scrum was chosen because it is well suited for development projects where the requirements often change during the work. With Scrum, it is in these cases easy to change the requirements set at the beginning of the project. The method is also suitable when there is uncertainty about which parts the project will have. XP was used together with Scrum to enable backlog changes during an ongoing sprint, as rapid changes and varying requirements could have occurred. Scrum has a sprint length of at least two weeks, while XP has a length of one to two weeks.

Furthermore, the mix between Scrum and XP has meant that the work has been focused on a product backlog. This backlog has constantly been changing based on need. These changes could also take place during an ongoing sprint, something that Scrum alone would not have allowed.

In the initial stage of the work, a feasibility study was carried out. This helped to produce information that creates a solid foundation to work from. In meetings with Knightec Dewire and Mid Sweden University, the scope and area of the project were clarified. This information has since been of great importance for the collection of the requirements on which the work is based.

During the feasibility study, information was obtained from similar existing theses. This was done to investigate how these solutions work and whether, and how, they could affect the project's direction. The feasibility study showed that similar systems and software exist today, but with some differences.
3.2 Proposed solution

The proposed approach to investigate the aim of this study is to build a proof of concept that uses LBPH from the OpenCV library (cv2). LBPH was originally made for face recognition, but in this case it will act as a PAD. The proof of concept examines whether it is possible to replace regular locking systems with state-of-the-art systems like CNN-based face recognition.

The study is based on the CRoss Industry Standard Process for Data Mining (CRISP-DM), where the first focus is on finding the purpose of the project through business understanding. The second step is to understand what type of data is needed, in this case which database to use and what types of attacks to consider. This leads to preparing the data to be used. In the end, modeling and testing are carried out so that the result can be evaluated.

To be able to do this, an investigation was first carried out to complete one of the concrete and verifiable goals. This investigation showed that CNN face recognition has very high accuracy, which makes it a good candidate for the study. Through further research on related topics, the concept of presentation attacks was introduced. These are attacks on the FR system, and a couple of articles show a high correlation between high-accuracy FR systems and vulnerability to PAs. Based on that information, the study looks into a special case, shown in Chapter 1: an IoT and cloud-based solution for a locking system.

With this in mind, the proof of concept includes an FR and a PAD, together with a testing framework to see whether this PAD can efficiently protect the high-accuracy CNN model from PAs. The PAD must be suitable to run on low-end edge devices. The framework also defines what type of data the system is tested on.
All of the algorithm choices are explained further in this chapter: first the dataset, then the choice of algorithms for face detection, face recognition, image classification, and presentation attack detection.

3.3 Dataset structure

The database chosen for PAs is the REPLAY-ATTACK database, because the related article [15] uses LBP, which is similar to this study, and the PAD chosen in that article gave good results. Another reason is how easy it is to perform a replay attack on a system. The database is divided into four different types of data (train, dev, test, and enrollment), which gives the user a comprehensive ability to construct a PAD. Furthermore, the dataset comes with protocols for different types of attacks, listed in Table 1. The PAD and FR will train on these protocols. As seen in the table, a good variation of training and testing data is available. The PAD and the CNN model will each have their own collection of training data, but both will use the same database.
Table 1: Database protocols

                Hand-Attack          Fixed-Support        All Supports
Protocol        train  dev   test    train  dev   test    train  dev   test
Print              30   30     40       30   30     40       60   60     80
Mobile             60   60     80       60   60     80      120  120    160
Highdef            60   60     80       60   60     80      120  120    160
Digitalphoto       60   60     80       60   60     80      120  120    160
Photo              90   90    120       90   90    120      180  180    240
Video              60   60     80       60   60     80      120  120    160
Grandtest         150  150    200      150  150    200      300  300    400

3.4 Choice of algorithms

In this study, a couple of choices have been made because of how broad the range of options is. This chapter addresses the choices for the critical parts of the system: which face detection, face recognition, and classification methods are used, and which methods are used to detect presentation attacks.

3.4.1 Face detection

This study focuses on face recognition and the detection of PAs, not on the detection of the face itself. The choice of face detection method is therefore based on the architecture presented in Chapter 1, which means it must be able to run on a low-end edge device. To make the system more accessible, it uses OpenCV and the Python face recognition library. The CNN pipeline uses Histogram of Oriented Gradients (HOG) to detect the face, and the PAD uses the cv2 CascadeClassifier.

3.4.2 Face recognition

For face recognition, two choices have to be made: one for the CNN face recognition and one for the PAD. For the CNN, there are many different pre-trained models that can be used. As mentioned in Section 2.5.1, researchers have developed different kinds of CNN architectures. In this study, dlib's face recognition, which is available in Python, is used. dlib's model is a version of the ResNet-34 network developed by [22], but with fewer layers and the number of filters reduced by half. This version was made by Davis King and was trained on several different datasets, including images scraped from the internet, the FaceScrub dataset [23], and the VGG dataset [24]. On the Labeled Faces in the Wild (LFW) [25] dataset, the network is comparable to other state-of-the-art methods, reaching 99.38% accuracy [26].

3.4.3 Image classification

The last part is image classification. The CNN model uses regular Euclidean distance with a specific confidence threshold. The results of the Euclidean distance comparison then go into a voting system that decides which of the faces has the highest confidence. The LBPH uses the histogram generated for each face to compare it to the input and, with a calculated confidence, decides how close the face is to the real one.

3.4.4 Presentation attack detection

The PAD uses LBPH because of the promising results in [15], which worked with LBP on different color schemes. Another reason for using an LBP-based method is that it is not computationally demanding, which makes it well suited to edge devices. More state-of-the-art PADs that use a CNN are problematic because of the amount of training data they need. As article [16] mentions, the available PA databases are not well suited because they are small relative to what a CNN needs. The article also states that CNN and LBP have a similar structure, which can make LBP a good choice. In that article, LBP was used to reduce the CNN so that it did not need as much data, which, as mentioned earlier, is a problem.

3.5 Evaluation

To understand how good or bad the created system is, it can be evaluated against performance metrics. In this chapter, some evaluation methods for biometric recognition systems are explained.

The generic way of evaluating this kind of system is with metrics for binary classification systems. The idea is to identify whether a sample is positive or negative. Equation (6) defines a label as positive or negative depending on the function M(x), which returns the score of the face model and is then compared against a certain threshold r.
\[ \text{label} = \begin{cases} \text{positive} & \text{if } M(x) \geq r \\ \text{negative} & \text{if } M(x) < r \end{cases} \tag{6} \]

These metrics for binary classification systems have four possible outcomes, listed below.

• true positive (TP): when x is a positive sample and is labeled as a positive sample.
• true negative (TN): when x is a negative sample and is labeled as a negative sample.
• false positive (FP): when x is a negative sample and is labeled as a positive sample.
• false negative (FN): when x is a positive sample and is labeled as a negative sample.

Based on these values, the following scores can be computed.

• sensitivity, recall, hit rate, or true positive rate (TPR): TPR = TP / (TP + FN)
• specificity, selectivity, or true negative rate (TNR): TNR = TN / (TN + FP)
• precision or positive predictive value (PPV): PPV = TP / (TP + FP)
• negative predictive value (NPV): NPV = TN / (TN + FN)
• False Rejection Rate (FRR): see equation (9)
• False Acceptance Rate (FAR): see equation (8)
• half total error rate (HTER): see equation (7)

To test a spoofing detection system, two types of errors must be handled: either a genuine access is rejected (false rejection), or an attack is accepted (false acceptance). The performance of a spoofing detection system is measured with the Half Total Error Rate (HTER), which combines the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) and is defined in equation (7).

\[ HTER(\%) = \frac{FAR + FRR}{2} \times 100 \tag{7} \]

\[ FAR = \frac{FP}{FP + TN} \tag{8} \]

\[ FRR = \frac{FN}{TP + FN} \tag{9} \]

In an ideal spoofing detection system, both FAR and FRR should be 0. Another metric commonly used to evaluate a biometric system is the Equal Error Rate (EER). This error rate is obtained at the threshold that gives the same FAR and FRR.
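The evaluation can be sketched in a few lines of Python. The example below applies the threshold rule of equation (6), counts the four outcomes, and computes FAR, FRR, and HTER. It assumes that "positive" means a genuine access and uses the standard definition FAR = FP / (FP + TN), that is, the share of attacks that are accepted. The scores, the threshold, and the function names are invented for illustration and are not results from this study.

```python
def label(score, threshold):
    """Equation (6): positive if the model score reaches the threshold r."""
    return "positive" if score >= threshold else "negative"


def confusion_counts(scores, is_genuine, threshold):
    """Count TP, TN, FP, FN, where positive means a genuine access."""
    tp = tn = fp = fn = 0
    for score, genuine in zip(scores, is_genuine):
        accepted = label(score, threshold) == "positive"
        if genuine and accepted:
            tp += 1
        elif genuine:
            fn += 1  # genuine access rejected
        elif accepted:
            fp += 1  # attack accepted
        else:
            tn += 1
    return tp, tn, fp, fn


def error_rates(tp, tn, fp, fn):
    """Return (FAR, FRR, HTER in percent); assumes both classes are present."""
    far = fp / (fp + tn)
    frr = fn / (tp + fn)
    hter = (far + frr) / 2 * 100
    return far, frr, hter


# Made-up scores: the first three samples are genuine, the last three are attacks.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
is_genuine = [True, True, True, False, False, False]

tp, tn, fp, fn = confusion_counts(scores, is_genuine, threshold=0.5)
far, frr, hter = error_rates(tp, tn, fp, fn)
```

Sweeping the threshold over the development set and picking the value where FAR and FRR are closest gives an approximation of the EER operating point.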
4 Implementation

This chapter covers the CNN face recognition implementation in Python, which uses dlib, a C++ toolkit containing machine learning algorithms. As mentioned earlier, a pre-trained and modified version of ResNet-34 is used for the FR. The developed PAD uses LBPH to train on the faces in the database. The chapter also covers how these two models, for PAD and FR, can be used to evaluate whether a face is real or an attack. Furthermore, the system is evaluated by developing a framework that can attack the system with the specific protocols that the database provides.

4.1 Testing framework

The framework created to test the CNN and the PAD is based on testing three different cases, illustrated in Figure 8. The first case represents the FR result without the PAD, and the second is the result of the PAD alone. This is to evaluate the two systems separately. The third case is when the PAD is applied to the system. This shows how the PAD affects the FR when faces labeled as PAs are removed before the FR.

Figure 8: Framework

A trained CNN model and a trained PAD have been created with the corresponding data, which is the Replay-Attack database. The CNN is trained on the actual videos in the dataset. The PAD can be trained in several ways depending on which system is being tested, because of the several types of attacks that can be performed on the system, which is explained in more detail later. The framework itself is a terminal application that works with arguments, which are listed below.

• Testing-specific arguments
– test-method: which test to run: CNN only, PAD only, or both together
– protocal: which protocol to run
• CNN-specific arguments
– detection-method: face detection model to use, either 'hog' or 'cnn'
– encodings: path to the serialized database of facial encodings for the CNN
– dataset: path to the input directory of faces and images
– Image: path to the input image
– Inputvideo: path to the input video
– display: whether or not to display the output frame on screen
– outputvideo: path to the output video
• LBPH-specific arguments
– lbphcascade: path to the face detection cascade for the LBPH
– lbphyml: path to the yml file that contains the trained data for the PAD
– lbphlabels: path to the pickle file that contains the associated names
– lbphsavecapture: save path for the PAD model

4.1.1 Presentation attacks

As mentioned earlier in this chapter, the system is evaluated against a set of protocols. These were created by the developers of the database explained in Section 3.3 and define what types of attacks can be performed on the system. The protocols act as attacks on the system in the three cases explained earlier. First, the FR runs without the PAD to see the baseline protection of the CNN FR. The same protocol is then run through the PAD to detect as many PAs as possible. Case three runs last and is a combination of cases 1 and 2. In these three cases, three attacks are thus performed, and each is evaluated against the HTER value explained in Section 3.5. After the system has obtained results from the CNN and the PAD, both separately and together, conclusions and discussion are presented in the later chapters.
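The argument list and the three test cases could be wired together with Python's standard argparse module roughly as follows. This is a sketch and not the thesis code: the exact flag spellings (taken from the listing, written in lowercase), the choice values "cnn", "pad", and "both", the defaults, and the case descriptions are assumptions.

```python
import argparse


def build_parser():
    """Command-line interface mirroring the argument list of Section 4.1."""
    parser = argparse.ArgumentParser(description="FR and PAD testing framework")

    # Testing-specific arguments (choice values are assumed).
    parser.add_argument("--test-method", choices=["cnn", "pad", "both"],
                        default="both", help="which test to run")
    parser.add_argument("--protocal", default="grandtest",
                        help="which database protocol to run")

    # CNN-specific arguments.
    parser.add_argument("--detection-method", choices=["hog", "cnn"],
                        default="hog", help="face detection model")
    parser.add_argument("--encodings", help="serialized facial encodings")
    parser.add_argument("--dataset", help="input directory of face images")
    parser.add_argument("--image", help="path to input image")
    parser.add_argument("--inputvideo", help="path to input video")
    parser.add_argument("--display", type=int, default=0,
                        help="1 to show output frames on screen")
    parser.add_argument("--outputvideo", help="path to output video")

    # LBPH-specific arguments.
    parser.add_argument("--lbphcascade", help="face detection cascade file")
    parser.add_argument("--lbphyml", help="yml file with trained PAD data")
    parser.add_argument("--lbphlabels", help="pickle file with label names")
    parser.add_argument("--lbphsavecapture", help="save path for the PAD model")
    return parser


# The three cases of the testing framework, keyed by the assumed choice values.
CASES = {
    "cnn": "case 1: FR without PAD",
    "pad": "case 2: PAD only",
    "both": "case 3: FR with faces labeled as PA removed by the PAD",
}

args = build_parser().parse_args(["--test-method", "pad", "--protocal", "print"])
selected_case = CASES[args.test_method]
```

Keeping the case selection in a single dictionary makes it easy to run the same protocol through all three cases in turn and to compare their HTER values.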
4.2 Face recognition system

One of the concrete and verifiable goals was to implement a face recognition system using a CNN model. The Python face recognition library was used to accomplish this goal. This library uses the modified version of the ResNet-34 CNN model, trained on over 3 million faces. The rest of Section 4.2 explains the implementation steps of the FR system created with this CNN model.

4.2.1 Face detection

The first step in training the CNN model is to detect faces in the pictures. The face detection part of the CNN model uses the Histogram of Oriented Gradients, which was explained in detail in Chapter 2.1.2, to speed up the process. This method is used because testing showed a noticeable increase in execution time when, for example, a CNN was used instead of HOG.

Furthermore, when developing the CNN FR, several use cases were developed: one where the system recognizes faces in a single image, one where it recognizes faces in a live video stream from the webcam and outputs a video, and one where it recognizes faces in a video file residing on disk and writes the processed video to disk. The step-by-step process below explains how the detection works.

• Step 1: Regardless of what type of media the user is using, the detection of the face is the same. The idea is to store the known encodings and the known names in two lists. These two lists contain the face encodings and the corresponding names for each person in the dataset.
• Step 2: Depending on how many people the user wants to train on, the system needs to iterate through them and detect the faces. This depends on the structure illustrated in Figure 9. The process iterates N times if N people are in the dataset. From there, the system extracts the name of the person from the image path. An important step is to convert the images to RGB because dlib expects it, so a channel swap needs to be done before proceeding.
• Step 3: In this step, for each iteration, the face_locations method of the Python face recognition library is used. It takes the RGB image and the detection method, which, as mentioned earlier, is HOG.
• Step 4: In this last step, the face_encodings method of the library is used to convert the image to a numerical encoding, and the name and the encoding are appended to the two lists (known encodings and known names).
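Steps 1 to 4 above can be sketched as follows. The directory layout dataset/<person name>/<image file>, the helper names, and the way the encoder is passed in as a function are assumptions made to keep the example self-contained. The calls load_image_file, face_locations, and face_encodings belong to the face_recognition library, which is imported only when the real encoder is used.

```python
import os


def person_name_from_path(image_path):
    """Step 2: the parent directory name is taken as the person's name."""
    return os.path.basename(os.path.dirname(image_path))


def build_known_faces(image_paths, encode_faces):
    """Steps 1 and 4: fill the two parallel lists of encodings and names."""
    known_encodings, known_names = [], []
    for path in image_paths:
        name = person_name_from_path(path)
        # One image may contain several faces, so there can be several encodings.
        for encoding in encode_faces(path):
            known_encodings.append(encoding)
            known_names.append(name)
    return known_encodings, known_names


def encode_with_face_recognition(image_path, detection_method="hog"):
    """Steps 2 to 4 using the face_recognition library (third-party)."""
    import face_recognition  # imported lazily so the sketch loads without it

    # load_image_file returns an RGB array; an image read with cv2.imread is
    # BGR and would first need cv2.cvtColor(image, cv2.COLOR_BGR2RGB).
    image = face_recognition.load_image_file(image_path)
    boxes = face_recognition.face_locations(image, model=detection_method)
    return face_recognition.face_encodings(image, boxes)


def fake_encoder(image_path):
    """Stand-in encoder so the list handling can be tried without images."""
    return [[0.1, 0.2, 0.3]]


paths = ["dataset/alice/1.jpg", "dataset/alice/2.jpg", "dataset/bob/1.jpg"]
encodings, names = build_known_faces(paths, fake_encoder)
```

With real images, build_known_faces(paths, encode_with_face_recognition) would produce the two lists, which can then be serialized (for example with pickle) and passed to the recognition step.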