Unmasking Face Embeddings by Self-restrained Triplet Loss for Accurate Masked Face Recognition
Unmasking Face Embeddings by Self-restrained Triplet Loss for Accurate Masked Face Recognition

Fadi Boutros (a,b), Naser Damer (a,b,*), Florian Kirchbuchner (a), Arjan Kuijper (a,b)
(a) Fraunhofer Institute for Computer Graphics Research IGD, Darmstadt, Germany
(b) Mathematical and Applied Visual Computing, TU Darmstadt, Darmstadt, Germany
Email: fadi.boutros@igd.fraunhofer.de
arXiv:2103.01716v1 [cs.CV] 2 Mar 2021

Abstract—Using the face as a biometric identity trait is motivated by the contactless nature of the capture process and the high accuracy of the recognition algorithms. After the outbreak of the COVID-19 pandemic, wearing a face mask has been imposed in public places to keep the pandemic under control. However, face occlusion due to wearing a mask presents an emerging challenge for face recognition systems. In this paper, we present a solution to improve masked face recognition performance. Specifically, we propose the Embedding Unmasking Model (EUM), which operates on top of existing face recognition models. We also propose a novel loss function, the Self-restrained Triplet (SRT) loss, which enables the EUM to produce embeddings similar to those of unmasked faces of the same identities. The evaluation results on two face recognition models and two real masked face datasets show that our proposed approach significantly improves the performance in most experimental settings.

Index Terms—COVID-19, Biometric recognition, Identity verification, Masked face recognition.

I. INTRODUCTION

Face recognition is one of the preferred biometric recognition solutions due to its contactless nature and the high accuracy achieved by face recognition algorithms. Face recognition systems have been widely deployed in many application scenarios such as automated border control, surveillance, as well as convenience applications [1], [2]. However, these systems are mostly designed to operate on non-occluded faces. After the outbreak of the COVID-19 pandemic, wearing a protective face mask has been imposed in public places by many governments to reduce the rate of COVID-19 spread. This new situation poses a serious and unusual challenge for current face recognition systems. Recently, several studies have evaluated the effect of wearing a face mask on face recognition accuracy [3], [4], [5], [6]. These studies reported the negative impact of masked faces on face recognition performance. Their main conclusion [3], [4], [5], [6] is that the accuracy of face recognition algorithms on masked faces is significantly degraded in comparison to unmasked faces.

Motivated by this new circumstance, we propose in this paper a new approach to reduce the negative impact of wearing a facial mask on face recognition performance. The presented solution is designed to operate on top of existing face recognition models, and thus avoids retraining existing solutions used for unmasked faces. Recent works either proposed to train face recognition models with simulated masked faces [7], or to train a model to learn exclusively from the periocular area of the face images [8]. Unlike these, our proposed solution does not require any modification or training of the existing face recognition model. We achieve this goal by proposing the Embedding Unmasking Model (EUM), which operates in the embedding space. The input to the EUM is a feature embedding extracted from a masked face, and its output is a new feature embedding that is similar to an embedding of an unmasked face of the same identity, while being dissimilar from any embedding of any other identity. To achieve that, we propose a novel loss function, the Self-restrained Triplet (SRT) loss, to guide the EUM during the training phase. The SRT shares the same learning objective as the triplet loss, i.e., it enables the model to minimize the distance between genuine pairs and maximize the distance between imposter pairs. Nonetheless, unlike the triplet loss, the SRT can dynamically self-adjust its learning objective by focusing on minimizing the distance between genuine pairs when the distance between imposter pairs is deemed sufficient.

The presented approach is evaluated on top of two face recognition models, ResNet-50 [9] and MobileFaceNet [10], both trained with the ArcFace loss [11], to validate the feasibility of adopting our solution on top of different deep neural network architectures. With a detailed evaluation of the proposed EUM and SRT, we report the verification performance gain achieved by the proposed approach on two real masked face datasets [7], [3]. We further experimentally support the theoretical motivation behind our SRT loss by comparing its performance with the conventional triplet loss. The overall verification results show that our proposed approach improves the performance in most experimental settings. For example, when the probes are masked, the FMR100 measures (the lowest false non-match rate (FNMR) for a false match rate (FMR) ≤ 1.0%) achieved by our approach on top of MobileFaceNet are reduced by around 28% and 26% on the two evaluated datasets.

In the rest of the paper, we first discuss the related works focusing on masked face recognition in Section II. Then, we present our proposed EUM architecture and our SRT loss in Section III. In Section IV, we present the experimental setups and implementation details applied in this work. Section V presents and discusses the achieved results. Finally, a set of conclusions is drawn in Section VI.
II. RELATED WORK

In recent years, significant progress has been made in improving face verification performance on essentially non-occluded faces. Several previous works [12], [13] addressed general face occlusion, e.g., wearing sunglasses or a scarf. Nonetheless, they did not directly address facial mask occlusion (before the current COVID-19 situation).

Since the beginning of the COVID-19 situation, four major studies have evaluated the effect of wearing a facial mask on face recognition performance [3], [4], [5], [6]. The National Institute of Standards and Technology (NIST) has published two specific studies on the effect of masked faces on the performance of face recognition solutions submitted by vendors, using pre-COVID-19 [5] and post-COVID-19 [6] algorithms. These studies are part of the ongoing Face Recognition Vendor Test (FRVT). The NIST studies concluded that wearing a face mask has a negative effect on face recognition performance. However, the evaluation by NIST was conducted using synthetically generated masks, which may not fully reflect the actual effect of wearing a protective face mask on face recognition performance. The recent study by Damer et al. [3] tackled this limitation by evaluating the effect of wearing a mask on two academic face recognition algorithms and one commercial solution, using a database specifically collected for this purpose from 24 participants over three collaborative sessions. The study indicates the significant effect of wearing a face mask on face recognition performance. A similar study was carried out by the Department of Homeland Security (DHS) [4]. In this study, several face recognition systems (using six face acquisition systems and 10 matching algorithms) were evaluated on a specifically collected database of 582 individuals. The main conclusion from this study is that the accuracy of the best-performing face recognition systems degrades from 100% to 96% when the subject is wearing a facial mask.

Li et al. [8] proposed to use an attention-based method to train a face recognition model to learn from the periocular area of masked faces. The presented method shows improvement in masked face recognition performance; however, the proposed approach is only tested on simulated masked face datasets and essentially only maps the problem into a periocular recognition problem. A recent preprint by [7] presented a small dataset of 269 unmasked and masked face images of 53 identities crawled from the internet. The work proposed to fine-tune the FaceNet model [14] using simulated masked face images to improve the recognition accuracy. However, the proposed solution is only tested using a small database (269 images). Wang et al. [15] presented three datasets crawled from the internet for face recognition, detection, and simulated masked faces. The face recognition dataset contains 5000 masked face images of 525 identities, and 90000 unmasked face images of the same 525 identities. The authors claim to improve the verification accuracy from 50% to 95% on masked faces. However, they did not provide any information about the evaluation protocol, proposed solution, or implementation details. Moreover, the published part of the dataset (https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset) does not contain pairs of unmasked-masked images for most of the identities. Thus, such a dataset could be more suitable for face mask detection [16], [17], [18] than for face recognition purposes. Recently, a rapidly growing number of works have been published addressing the detection of wearing a face mask [16], [17], [18]. These studies did not directly address the effect of wearing a mask on the performance of face recognition, nor did they present a solution to improve masked face recognition. As reported in previous studies [3], [4], face recognition systems might fail in detecting a masked face. Thus, a face recognition system could benefit from face mask detection to improve face detection, alignment, and cropping, as these are essential preprocessing steps for feature extraction.

Motivated by the recent evaluation efforts on the negative effect of wearing a facial mask on face recognition performance [3], [4], [5], [6], and driven by the need for an effective solution to improve masked face recognition, we present in this work a novel approach to improve masked face recognition performance. The proposed solution is designed to run on top of existing face recognition models. Thus, it does not require any retraining of the existing face recognition models, as presented in the next Section III.

III. METHODOLOGY

In this section, we present our proposed approach to improve the verification performance of masked face recognition. The presented solution is designed to be built on top of existing face recognition models. To achieve this goal, we propose the EUM. The input to our proposed model is a face embedding extracted from a masked face image, and the output is what we call an "unmasked face embedding", which aims at being more similar to the embedding of the same identity when not wearing a mask. Thus, the proposed solution does not require any modification or training of the existing face recognition solution. Figure 1 shows an overview of the proposed approach workflow in training and in operation modes.

[Fig. 1: The workflow of the face recognition model with our proposed EUM. The top part of the figure (inside the dashed rectangle) shows the proposed solution in operation mode. Given an embedding obtained from the masked face, the proposed model is trained with the SRT loss to output a new embedding that is similar to the one of the unmasked face of the same identity and different from the masked face embeddings of all other identities.]

Furthermore, we propose the SRT loss to control the model during the training phase. Similar to the well-known triplet-based learning, the SRT loss has two learning objectives: 1) minimizing the intra-class variation, i.e., minimizing the distance between genuine pairs of unmasked and masked face embeddings, and 2) maximizing the inter-class variation, i.e., maximizing the distance between imposter pairs of masked face embeddings. However, unlike the traditional triplet loss, the proposed SRT loss function can self-adjust its learning objective by focusing only on optimizing the intra-class variation when the inter-class variation is deemed sufficient. When the margin on the inter-class variation is violated, our proposed loss behaves as the traditional triplet loss. The theoretical motivation behind our SRT loss is presented along with its formulation later in this section. In the following, this section presents our proposed EUM architecture and the SRT loss.

A. Embedding Unmasking Model Architecture

The architecture of the EUM is based on a Fully Connected Neural Network (FCNN). Having fully connected layers, where all neurons in two consecutive layers are connected to each other, enables us to demonstrate a generalized EUM design, because this structure is easily adaptable to different input shapes and can thus be adapted on top of different face recognition models, motivating our choice of the FCNN. The model input is a masked feature embedding (i.e., resulting from a masked face image) of size d (d depends on the used face recognition network), and the model output is a feature vector of the same size d. The proposed model consists of four fully connected (FC) layers: one input layer, two hidden layers, and one output layer. The input size of all FC layers is d. Each of the layers 1, 2, and 3 is followed by batch normalization (BN) [19] and a Leaky ReLU non-linear activation function [20]. The last FC layer is followed by BN.
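For concreteness, the following is a minimal PyTorch sketch of an EUM with the structure described above: four FC layers of width d, each followed by BN, with a Leaky ReLU after the first three. The negative slope of the Leaky ReLU is not specified in the text and is an assumption here.

```python
import torch
from torch import nn


class EUM(nn.Module):
    """Embedding Unmasking Model: four fully connected layers of width d.
    Layers 1-3 are followed by BatchNorm and Leaky ReLU; the last layer
    is followed by BatchNorm only, as described in Section III-A.
    The Leaky ReLU negative slope is an assumption (not given in the text)."""

    def __init__(self, d: int = 512, negative_slope: float = 0.01):
        super().__init__()
        layers = []
        for i in range(4):
            layers += [nn.Linear(d, d), nn.BatchNorm1d(d)]
            if i < 3:
                layers.append(nn.LeakyReLU(negative_slope))
        self.net = nn.Sequential(*layers)

    def forward(self, masked_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, d) masked-face embedding -> (batch, d) "unmasked" embedding
        return self.net(masked_embedding)
```

For the models considered later, d would be 512 (ResNet-50) or 128 (MobileFaceNet).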
B. Unmasked Face Embedding Learning

The learning objective of our model is to reduce the FNMR of genuine unmasked-masked pairs. The main motivation behind this learning goal is inspired by the latest reports evaluating the effect of masked faces on face recognition performance by the National Institute of Standards and Technology (NIST) [5] and by the recent work of Damer et al. [3]. The NIST report [5] stated that the false non-match rates (FNMR) increased in all evaluated algorithms when the probes are masked. For the most accurate algorithms, the FNMR increased from 0.3% to 5% at an FMR of 0.001% when the probes are masked. On the other hand, the NIST report concluded that the FMR seems to be less affected when the probes are masked. A similar observation in the study by Damer et al. [3] is that the genuine score distributions are significantly affected by masked probes. The study reported that the genuine score distribution strongly shifts towards the imposter score distribution, while the imposter score distributions do not seem to be strongly affected by masked face probes.

In summary, one of the main observations of the previous studies [3], [5] is that wearing a face mask significantly increases the FNMR, whereas the FMR seems to be less affected by wearing a mask. Similar remarks are also visible in our results (see Section V). Based on these observations, we design our proposed SRT loss function to focus on increasing the similarity between genuine pairs of unmasked and masked face embeddings, while maintaining the imposter similarity at an acceptable level. In the following, we briefly present the naive triplet loss followed by our proposed SRT loss.

1) Self-restrained Triplet Loss: Previous works [14], [21] indicated that utilizing triplet-based learning is beneficial for learning discriminative face embeddings. Let x ∈ X represent a batch of training samples, and f(x) the face embedding obtained from the face recognition model. Training with the triplet loss requires triplets of samples of the form {x_i^a, x_i^p, x_i^n} ∈ X, where the anchor x_i^a and the positive x_i^p are two different samples of the same identity, and the negative x_i^n is a sample belonging to a different identity.
The learning objective of the triplet loss is that the distance between f(x_i^a) and f(x_i^p) (genuine pair), with the addition of a fixed margin value m, is smaller than the distance between f(x_i^a) and any face embedding f(x_i^n) of any other identity (imposter pair). In FaceNet [14], the triplet loss was proposed to learn face embeddings using the Euclidean distance applied on normalized face embeddings. Formally, the triplet loss ℓ_t for a mini-batch of N samples is defined as follows:

ℓ_t = (1/N) Σ_{i=1}^{N} max{d(f(x_i^a), f(x_i^p)) − d(f(x_i^a), f(x_i^n)) + m, 0},   (1)

where m is a margin applied to impose separability between genuine and imposter pairs, and d is the Euclidean distance applied on normalized features, given by:

d(x_i, y_i) = ||x_i − y_i||_2^2.   (2)
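As a reference point for the SRT loss introduced next, a minimal sketch of the naive triplet loss of Equations (1) and (2), computed on L2-normalized embeddings, could look as follows; the margin value is an assumption, as the paper does not report it.

```python
import torch
import torch.nn.functional as F


def triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, m: float = 1.0) -> torch.Tensor:
    """Naive triplet loss (Equation 1) with the squared Euclidean distance
    on L2-normalized embeddings (Equation 2), averaged over the mini-batch.
    The margin m is an assumed value, not reported in the paper."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    n = F.normalize(negative, dim=1)
    d_ap = (a - p).pow(2).sum(dim=1)  # genuine-pair distances d(f(x^a), f(x^p))
    d_an = (a - n).pow(2).sum(dim=1)  # imposter-pair distances d(f(x^a), f(x^n))
    return torch.clamp(d_ap - d_an + m, min=0).mean()
```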
Figure 3 visualizes two triplet loss learning scenarios. Figure 3.a shows the initial training triplet, and Figures 3.b and 3.c illustrate two scenarios that can be learned using the triplet loss. In both scenarios, the goal of the triplet loss is achieved, i.e., d(f(x_i^a), f(x_i^n)) > d(f(x_i^a), f(x_i^p)) + m. In Figure 3.b (scenario 1), both distances are optimized. However, in this scenario, the optimization of the d2 distance is greater than the optimization of the d1 distance. In Figure 3.c (scenario 2), the triplet loss instead enforces the model to focus on minimizing the distance between the anchor and the positive. The optimal state of the triplet loss is achieved when both distances are fully optimized, i.e., d(f(x_i^a), f(x_i^p)) equals zero and d(f(x_i^a), f(x_i^n)) is greater than the predefined margin. However, achieving such a state may not be feasible, and it requires a huge number of training triplets, with large computational resources for selecting the optimal triplets for training.

[Fig. 3: The triplet loss guides the model to maximize the distance between the anchor and the negative such that it is greater than the distance between the anchor and the positive with the addition of a fixed margin value. Panels: a) training triplet, b) scenario 1, c) scenario 2. One can clearly notice the high similarity between the anchor and positive (d1) learned in scenario 2, in comparison to the d1 learned in scenario 1, whereas the distance d2 between the anchor and the negative (imposter pair) in scenario 1 is considerably larger than the d2 in scenario 2.]

Given a masked face embedding, our model is trained to generate a new embedding such that it is similar to the unmasked face embedding of the same identity and dissimilar from the face embeddings of any other identity. As discussed earlier in this section, the distance between imposter pairs has been found to be less affected by wearing a mask [3], [5]. Thus, we aim to ensure that our proposed loss focuses on minimizing the distance between the genuine pairs (similar to scenario 2) while maintaining the distance between imposter pairs.

Training the EUM with the SRT loss requires a triplet defined as follows: f(x_i^a) is an anchor masked face embedding, EUM(f(x_i^a)) is the anchor output of the EUM, f(x_i^p) is a positive unmasked embedding, and f(x_i^n) is a negative embedding of an identity different from the anchor and positive. This triplet is illustrated in Figure 1. We want to ensure that the distance (d1) between EUM(f(x_i^a)) and f(x_i^p), with the addition of a predefined margin, is smaller than the distance (d2) between EUM(f(x_i^a)) and f(x_i^n). Our goal is to train the EUM to focus more on minimizing d1, as d2 is less affected by the mask.

Under the assumption that the distance (d3) between the positive and the negative embeddings is close to optimal, since it does not contribute to the back-propagation of the EUM model, we propose to use this distance as a reference to control the triplet loss. The main idea is to train the model with a naive triplet loss when d2 (anchor-negative distance) is smaller than d3 (positive-negative distance). In this case, the SRT guides the model to maximize the d2 distance and to minimize the d1 distance. When d2 is equal to or greater than d3, we replace d2 by d3 in the loss calculation. This distance swapping allows the SRT, at this point, to learn only to minimize the d1 distance. At any point of the training, when the condition on d2 is violated, i.e., d2 < d3, the SRT behaves again as a naive triplet loss.
We opt to compare the d2 and d3 distances on the batch level to avoid swapping the distances on every minor update of the distance between the imposter pairs. In this case, we want to ensure that the d1 distance, with the addition of a margin m, is smaller than the mean of the d3 distances calculated over the mini-batch of triplets. Thus, our loss is less sensitive to outliers resulting from comparing imposter pairs. We define our SRT loss for a mini-batch of size N as follows:

ℓ_SRT = (1/N) Σ_{i=1}^{N} max{d(f(x_i^a), f(x_i^p)) − d(f(x_i^a), f(x_i^n)) + m, 0},   if µ(d2) < µ(d3),
ℓ_SRT = (1/N) Σ_{i=1}^{N} max{d(f(x_i^a), f(x_i^p)) − µ(d3) + m, 0},   otherwise,   (3)

where µ(d2) is the mean of the distances between the anchor and negative pairs calculated on the mini-batch level, given as (1/N) Σ_{i=1}^{N} d(f(x_i^a), f(x_i^n)), µ(d3) is the mean of the distances between the positive and negative pairs calculated on the mini-batch level, given as (1/N) Σ_{i=1}^{N} d(f(x_i^p), f(x_i^n)), and d is the Euclidean distance computed on normalized feature embeddings (Equation 2).

[Fig. 2: Naive triplet loss vs. SRT loss distances over training iterations, for (a) ResNet-50 and (b) MobileFaceNet. The plots show the learned d1 (distance between genuine pairs) and d2 (distance between imposter pairs) by each loss over training iterations. It can be clearly noticed that the anchor (model output) of the model trained with the SRT loss is more similar to the positive than the anchor of the model trained with the naive triplet loss.]

Figure 2 illustrates the optimization of d1 (distance between genuine pairs) and d2 (distance between imposter pairs) by the naive triplet loss and the SRT loss over the training iterations of two EUM models. In Figure 2a, the EUM model is trained on top of feature embeddings obtained from a ResNet-50 face recognition model. In Figure 2b, the EUM model is trained on top of feature embeddings obtained from a MobileFaceNet face recognition model. Details on the training are presented in Section IV. It can be clearly noticed that the d1 distance (anchor-positive distance) learned by SRT is significantly smaller than the one learned by the naive triplet loss. This indicates that the output embedding of the EUM trained with SRT is more similar to the embedding of the same identity (the positive) than the output embedding of the EUM trained with the triplet loss.
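The batch-level branching of Equation (3) can be sketched in PyTorch as follows. Here `eum_anchor` stands for EUM(f(x_i^a)), while `positive` and `negative` come from the frozen face recognition model, so d3 carries no gradient with respect to the EUM by construction. The margin value is an assumption.

```python
import torch
import torch.nn.functional as F


def srt_loss(eum_anchor: torch.Tensor, positive: torch.Tensor,
             negative: torch.Tensor, m: float = 1.0) -> torch.Tensor:
    """Self-restrained Triplet loss (Equation 3) on L2-normalized embeddings.
    `eum_anchor` is the EUM output for the masked embedding; `positive` and
    `negative` are embeddings from the frozen face recognition model."""
    a = F.normalize(eum_anchor, dim=1)
    p = F.normalize(positive, dim=1)
    n = F.normalize(negative, dim=1)
    d1 = (a - p).pow(2).sum(dim=1)  # EUM(anchor) vs. positive (genuine)
    d2 = (a - n).pow(2).sum(dim=1)  # EUM(anchor) vs. negative (imposter)
    d3 = (p - n).pow(2).sum(dim=1)  # positive vs. negative (reference)
    if d2.mean() < d3.mean():
        # Condition violated on the batch level: act as a naive triplet loss,
        # maximizing d2 and minimizing d1.
        return torch.clamp(d1 - d2 + m, min=0).mean()
    # Otherwise swap d2 for the batch mean of d3, so that only d1 is minimized.
    return torch.clamp(d1 - d3.mean() + m, min=0).mean()
```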
IV. EXPERIMENTAL SETUP

This section presents the experimental setups and the implementation details applied in the paper.

A. Face Recognition Model

To provide a deep evaluation of the performance of the proposed solution, we evaluate it on top of two face recognition models: ResNet-50 [9] and MobileFaceNet [10]. ResNet is one of the most widely used Convolutional Neural Network (CNN) architectures, employed in several face recognition models, e.g., ArcFace [11] and VGGFace2 [22]. MobileFaceNet is a compact model designed for devices with low computational power. The MobileFaceNet architecture is based on the residual bottlenecks proposed by MobileNetV2 [23] and depth-wise separable convolution layers, which allow building a CNN model with a much smaller set of parameters in comparison to standard CNNs. To provide fair and comparable evaluation results, both models are trained using the same loss function, the ArcFace loss [11], and the same training database, MS1MV2 [11]. MS1MV2 is a refined version of the MS-Celeb-1M [24] dataset. We chose the ArcFace loss as it achieved state-of-the-art performance on several face recognition testing datasets such as Labeled Faces in the Wild (LFW) [25]. The accuracies achieved on LFW by ResNet-50 and MobileFaceNet trained with the ArcFace loss on the MS1MV2 dataset are 99.80% and 99.55%, respectively. The considered face recognition models are evaluated with the cosine distance for comparison. The Multi-task Cascaded Convolutional Networks (MTCNN) solution [26] is employed to detect and align the input face image. Both models process aligned and cropped face images of size 112 × 112 pixels, producing a 512-D feature embedding for ResNet-50 and a 128-D feature embedding for MobileFaceNet.

B. Synthetic Mask Generation

As there is no large-scale database with pairs of unmasked and masked face images, we opted to use synthetically generated masks to train our proposed approach. Specifically, we employ the synthetic mask generation method proposed by NIST [5]. This method depends on the Dlib [27] toolkit to detect and extract 68 facial landmarks from a face image. Based on the extracted landmark points, face masks of different shapes, heights, and colors can be drawn on the face images. The detailed implementation of the synthetic mask generation method is described in [5]. The method provides six mask types of different height and coverage: A) wide-high coverage, B) round-high coverage, C) wide-medium coverage, D) round-medium coverage, E) wide-low coverage, and F) round-low coverage. Figure 4 shows samples of simulated face masks of different types and coverage. The mask color is randomly selected from the RGB color space. To synthetically generate a masked face image, we first extract the facial landmarks of the input face image. Then, a mask of a specific color and type is drawn on the face image using the x, y coordinates of the facial landmark points.

[Fig. 4: Samples of the synthetically generated face masks of different shapes and coverage: (a) wide-high coverage, (b) round-high coverage, (c) wide-medium coverage, (d) round-medium coverage, (e) wide-low coverage, (f) round-low coverage.]
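To illustrate the landmark-based drawing step, the sketch below uses dlib's 68-point predictor and OpenCV. The exact polygons of the six NIST mask types are defined in [5]; the vertex choice here (jawline points plus a nose-bridge point) and the model file path are illustrative assumptions, not the reference implementation.

```python
import random

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# The standard dlib 68-landmark model file is assumed to be available locally.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")


def draw_synthetic_mask(image):
    """Draw one mask-like polygon of a random color over the lower face.
    Returns None when no face/landmarks are found, mirroring the failure
    cases that were excluded from the training set (see Section IV-C)."""
    faces = detector(image, 1)
    if not faces:
        return None
    shape = predictor(image, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
    # Jawline points 2..14 bound the lower face; landmark 29 on the nose
    # bridge is an assumed top vertex for a "high coverage" mask.
    polygon = np.concatenate([pts[2:15], pts[29:30]]).astype(np.int32)
    color = tuple(random.randint(0, 255) for _ in range(3))  # random RGB color
    cv2.fillPoly(image, [polygon], color)
    return image
```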
C. Database

We use the MS1MV2 dataset [11] to train our proposed approach. MS1MV2 is a refined version of the MS-Celeb-1M [24] dataset and contains 5.8M images of 85K different identities. We generated a masked version of MS1MV2, denoted MS1MV2-Masked, using the method described in Section IV-B. The mask type and color are randomly selected for each image to add more diversity of mask color and coverage to the training dataset. Dlib failed to extract the facial landmarks from 426K images; these images are excluded from the training database. A subset of 5K images is randomly selected from MS1MV2-Masked to validate the model during the training phase.

The proposed solution is evaluated using two real masked face datasets: Masked Faces in Real World for Face Recognition (MFR2) [7] and the extended masked face recognition (MFR) dataset [3]. To the best of our knowledge, these are the only datasets available for research that include pairs of real masked and unmasked face images. Table I summarizes the evaluation datasets used in this work.

TABLE I: An overview of the evaluation datasets employed in this work. We evaluate our solution on real (not simulated) masked datasets in two different capture scenarios, collaborative and in-the-wild.

Dataset     Multi-sessions   No. images   No. identities   Capture scenario
MFR [3]     Yes              4320         48               Collaborative
MFR2 [7]    No               269          53               In-the-wild

MFR2 contains 269 images of 53 identities crawled from the internet. Therefore, the images in the MFR2 dataset can be considered to be captured under in-the-wild conditions. The database contains images of masked and unmasked faces with an average of 5 images per identity. The baseline evaluation of the considered face recognition models on MFR2 is done by performing N:N comparisons between the unmasked faces, noted as F-F, between the unmasked and masked faces, noted as F-M, and between the masked faces, noted as M-M, to evaluate the influence of wearing a facial mask on face recognition performance. Finally, to evaluate our proposed solution, we report the verification performance of the M-M and F-M settings based on our proposed EUM model trained with the SRT loss and with the triplet loss, noted as F-M(SRT) and M-M(SRT), and F-M(T) and M-M(T), respectively.

We also deploy an extended version of the MFR dataset [3]. The extended MFR is collected from 48 participants using their webcams over three different sessions: session 1 (reference) and sessions 2 and 3 (probes). The sessions were captured on three different days. Each session contains data captured in three videos. In each session, the first video is recorded when the subject is not wearing a facial mask, in daylight without additional electric lighting. The second and third videos are recorded when the subject is wearing a facial mask, with no additional electric lighting in the second video and with electric lighting in the third video (room light turned on). The baseline reference (BLR) contains 480 images from the first video of the first session (day), as described in [3]. The mask reference (MR) contains 960 images from the second and third videos of the first session. The baseline probe (BLP) contains 960 images from the first video of the second and third sessions, and contains face images with no mask. The mask probe (MP) contains 1920 images from the second and third videos of the second and third sessions. The baseline evaluation of face recognition performance on the MFR database is done by performing N:N comparisons between BLR and BLP (unmasked faces), noted as BLR-BLP. To evaluate the effect of wearing a facial mask on face recognition performance, we perform N:N comparisons between the BLR and MP data splits, noted as BLR-MP, and N:N comparisons between the MR and MP data splits, noted as MR-MP. Finally, we evaluate and report the verification performance of the MR-MP and BLR-MP settings achieved by our EUM model trained with SRT and with the naive triplet loss (as a baseline for our SRT loss), noted as MR-MP(SRT) and BLR-MP(SRT), and MR-MP(T) and BLR-MP(T), respectively.
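A minimal sketch of how the N:N comparison scores behind these settings can be produced is given below, assuming embeddings were already extracted for a reference split and a probe split and using the cosine similarity stated in Section IV-A; the function and variable names are illustrative.

```python
import numpy as np


def nn_comparison_scores(ref_emb, ref_ids, probe_emb, probe_ids):
    """All-vs-all cosine comparison between a reference split and a probe
    split (e.g. BLR vs. MP). Embedding matrices are (n, d) arrays and the
    identity labels are (n,) arrays; returns genuine and imposter scores."""
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    p = probe_emb / np.linalg.norm(probe_emb, axis=1, keepdims=True)
    scores = r @ p.T                                  # cosine similarity matrix
    genuine_mask = ref_ids[:, None] == probe_ids[None, :]
    return scores[genuine_mask], scores[~genuine_mask]
```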
D. Model Training Setup

We trained four instances of the EUM model. The first and second instances, model1 and model2, are trained with the SRT loss using feature embeddings obtained from ResNet-50 and MobileFaceNet, respectively. The third and fourth instances, model3 and model4, are trained with the triplet loss using feature embeddings obtained from ResNet-50 and MobileFaceNet, respectively, as an ablation study for our proposed SRT. The proposed EUM models in this paper are implemented in PyTorch and trained on an Nvidia GeForce RTX 2080 GPU. All models are trained using the SGD optimizer with an initial learning rate of 1e-1 and a batch size of 512. The learning rate is divided by 10 at 30K, 60K, and 90K training iterations. The early-stopping patience parameter is set to 3 (around 30K training iterations), causing model1, model2, model3, and model4 to stop after 80K, 70K, 60K, and 10K training iterations, respectively.

E. Evaluation Metric

The verification performance is reported as the Equal Error Rate (EER), FMR100, and FMR1000, the latter two being the lowest FNMR for FMR ≤ 1.0% and ≤ 0.1%, respectively. We also report the mean of the genuine scores (G-mean) and the mean of the imposter scores (I-mean) to analyze the shifts in the genuine and imposter score distributions induced by wearing a face mask, and to demonstrate the improvement in verification performance achieved by our proposed solution. For each of the evaluation settings, we plot the receiver operating characteristic (ROC) curves. Also, for each of the conducted experiments, we report the failure-to-extract rate (FTX) to capture the effect of wearing a face mask on face detection. The FTX measure is the proportion of comparisons where feature extraction was not possible. Further, we enrich our reported evaluation results with the Fisher Discriminant Ratio (FDR) to provide an in-depth analysis of the separability of genuine and imposter scores for the different experimental settings. The FDR is a class separability criterion described in [28], and it is given by:

FDR = (µ_G − µ_I)^2 / (σ_G^2 + σ_I^2),

where µ_G and µ_I are the genuine and imposter score mean values, and σ_G and σ_I are their standard deviations. The larger the FDR value, the higher the separation between the genuine and imposter scores, and thus the better the expected verification performance.
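The following is a minimal sketch of how these measures can be computed from the genuine and imposter score arrays of one experimental setting; the threshold sweep over the observed scores is an implementation assumption, while the EER, FMR100, FMR1000, and FDR definitions follow the text above.

```python
import numpy as np


def verification_metrics(genuine, imposter):
    """EER, FMR100/FMR1000 (lowest FNMR at FMR <= 1% / 0.1%), and FDR,
    from raw comparison scores where higher means a better match."""
    # Sweep all observed scores (plus +inf, so that FMR reaches 0).
    thresholds = np.unique(np.concatenate([genuine, imposter, [np.inf]]))
    fmr = np.array([(imposter >= t).mean() for t in thresholds])
    fnmr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(fmr - fnmr))
    eer = (fmr[i] + fnmr[i]) / 2
    fmr100 = fnmr[fmr <= 0.01].min()    # lowest FNMR for FMR <= 1.0%
    fmr1000 = fnmr[fmr <= 0.001].min()  # lowest FNMR for FMR <= 0.1%
    fdr = (genuine.mean() - imposter.mean()) ** 2 / (genuine.var() + imposter.var())
    return eer, fmr100, fmr1000, fdr
```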
V. RESULTS

In this section, we present and discuss our achieved results. We first experimentally present the negative impact of wearing a face mask on face recognition performance. Then, we present and discuss the impact of our EUM trained with SRT on enhancing the separability between the genuine and imposter comparison scores, followed by the gain in masked face verification performance achieved by our proposed EUM trained with SRT in the collaborative and in-the-wild masked face recognition settings. Finally, we present an ablation study on SRT to experimentally support our theoretical motivation behind the SRT loss by comparing its performance with the triplet loss.

A. Impact of Wearing a Protective Face Mask on Face Recognition Performance

Tables II, III, IV, and V present a comparison between the baseline evaluation where both reference and probe are unmasked (F-F, BLR-BLP), the case where only the probe is masked (F-M, BLR-MP), and the case where reference and probe are masked (M-M, MR-MP). Using unmasked images, the considered face recognition models, ResNet-50 and MobileFaceNet, achieve very high verification performance. This is demonstrated by EERs of 0.0% and 0.0% on the MFR database and 0% and 0.0106% on the MFR2 database, achieved by ResNet-50 and MobileFaceNet, respectively. The verification performance of the considered models is substantially degraded when evaluated on masked face images. This is indicated by the degradation in the verification performance measures and FDR values, in comparison to the case where probe and reference are unmasked. Furthermore, the considered models achieve lower verification performance when a masked probe is compared to a masked reference than when only the probe is masked, as seen in Tables II, III, IV, and V. For example, on the MFR dataset, when the probe is masked, the EER achieved by the ResNet-50 model is 1.2492% (BLR-MP). This error rate rises to 1.2963% when both probe and reference are masked (MR-MP), as seen in Table II. This result is also supported by the G-mean, I-mean, and FDR values shown in Tables II, III, IV, and V.

We also make two general observations. 1) The compact model, MobileFaceNet, achieves lower verification performance than the ResNet-50 model when comparing masked probes to unmasked/masked references. One of the reasons for this performance degradation might be the smaller embedding size of MobileFaceNet (128-D), in comparison to
the 512-D embedding of ResNet-50. 2) The considered models achieve lower performance when evaluated on the MFR2 dataset than on the MFR dataset. This result was expected, as the images in the MFR2 dataset are crawled from the internet with large variations in facial expression, pose, and illumination, whereas the images in the MFR dataset are collected in a collaborative environment.

To summarize, wearing a protective face mask has a negative effect on face recognition verification performance. This result supports and complements (by evaluating masked-to-masked pairs in this work) the previous findings of the studies [3], [4], [5], [6] that evaluated the impact of wearing a mask on face recognition performance.

TABLE II: The verification performance of the different experimental settings achieved by the ResNet-50 model, along with the EUM trained with the triplet loss and the EUM trained with the SRT loss, reported on the MFR dataset. The significant improvement in verification performance induced by our proposed approach (SRT) can be clearly noticed.

ResNet-50      EER %    FMR100 %   FMR1000 %   G-mean   I-mean   FDR       FTX %
BLR-BLP        0.0      0.0        0.0         0.8538   0.0349   55.9594   0.0
BLR-MP         1.2492   1.4251     3.778       0.5254   0.0251   12.6189   4.4237
BLR-MP(T)      1.9789   2.9533     7.9988      0.4401   0.0392   9.4412    4.4237
BLR-MP(SRT)    0.9611   0.946      2.5652      0.5447   0.0272   13.4045   4.4237
MR-MP          1.2963   1.4145     2.6311      0.7232   0.0675   15.1356   4.4736
MR-MP(T)       1.3091   1.456      2.8259      0.8269   0.4169   13.0528   4.4736
MR-MP(SRT)     1.1207   1.1367     2.4523      0.7189   0.0557   15.1666   4.4736

TABLE III: The verification performance of the different experimental settings achieved by the MobileFaceNet model, along with the EUM trained with the triplet loss and the EUM trained with the SRT loss, reported on the MFR dataset. The significant improvement in verification performance induced by our proposed approach (SRT) can be clearly noticed.

MobileFaceNet  EER %    FMR100 %   FMR1000 %   G-mean   I-mean   FDR       FTX %
BLR-BLP        0.0      0.0        0.0         0.8432   0.0488   37.382    0.0
BLR-MP         3.4939   6.507      20.564      0.468    0.0307   7.1499    4.4237
BLR-MP(T)      5.2759   12.7835    28.8175     0.3991   0.0501   5.9623    4.4237
BLR-MP(SRT)    2.8805   4.6331     13.4384     0.5013   0.0383   8.6322    4.4237
MR-MP          3.506    6.8842     17.3479     0.6769   0.1097   7.9614    4.4736
MR-MP(T)       4.2947   7.9124     16.3772     0.8082   0.4716   6.6455    4.4736
MR-MP(SRT)     3.1866   5.6166     13.529      0.6636   0.0837   8.0905    4.4736
B. Impact of our EUM with SRT on the Separability between Genuine and Imposter Comparison Scores

The proposed approach significantly enhances the separability between the genuine and imposter comparison scores for both considered face recognition models and on both evaluated datasets. This improvement is noticeable from the increase in the FDR separability measure achieved by our proposed EUM trained with SRT, in comparison to the FDR measures achieved by the considered face recognition models, as shown in Tables II, III, IV, and V. This indicates a generally expected improvement in the verification performance of face recognition and an enhancement of the general trust in the verification decision. For example, when the ResNet-50 model is evaluated on the MFR dataset and the probe is masked, the FDR increases from 12.6189 (BLR-MP) to 13.4045 (BLR-MP(SRT)) using our proposed approach. This improvement in the separability between the genuine and the imposter samples by our proposed approach is consistent across all reported results.

C. Impact of our EUM with SRT on Collaborative Masked Face Recognition

When the considered models are evaluated on the MFR dataset, it can be observed that our proposed approach significantly enhances the masked face verification performance, as shown in Tables II and III. For example, when comparing masked probes to unmasked references, the EER achieved by the ResNet-50 model is 1.2492% (BLR-MP). This error rate decreases to 0.9611% with our proposed approach (BLR-MP(SRT)), indicating a clear improvement in the verification performance, as shown in Table II. A similar enhancement in the verification performance is observed when comparing masked probes to masked references: in this case, the EER decreases from 1.2963% (MR-MP) to 1.1207% (MR-MP(SRT)).
When the probes are masked, the EER achieved by the MobileFaceNet model is 3.4939% (BLR-MP). This error is reduced to 2.8805% using our proposed approach (BLR-MP(SRT)). Also, when comparing masked probes to masked references, the EER decreases from 3.506% (MR-MP) to 3.1866% (MR-MP(SRT)) with our approach. The improvement in the masked face verification performance is also noticeable from the improvement in the FMR100 and FMR1000 measures.

[Fig. 5: ROC curves (1−FNMR over FMR) for the considered face recognition models, the EUM trained with the naive triplet loss, and the EUM trained with the SRT loss, on the MFR dataset for the experimental settings BLR-MP (probe is masked) and MR-MP (reference and probe are masked). AUC values: (a) ResNet-50, BLR-MP: 0.9975, BLR-MP(T): 0.9960, BLR-MP(SRT): 0.9971; (b) ResNet-50, MR-MP: 0.9968, MR-MP(T): 0.9969, MR-MP(SRT): 0.9968; (c) MobileFaceNet, BLR-MP: 0.9908, BLR-MP(T): 0.9878, BLR-MP(SRT): 0.9949; (d) MobileFaceNet, MR-MP: 0.9916, MR-MP(T): 0.9875, MR-MP(SRT): 0.9918. The improvement achieved by our proposed SRT is noticeable, and is very clear in the cases where masked probes significantly affected the verification performance.]
D. Impact of our EUM with SRT on In-the-wild Masked Face Recognition

The evaluation results achieved on the MFR2 dataset by the ResNet-50 and MobileFaceNet models are presented in Tables IV and V, respectively. Using masked probes, the EER achieved by the ResNet-50 model is 4.3895% (F-M), as shown in Table IV. Only in this case, the EER did not improve with our proposed approach, which achieved an EER of 4.7274% (F-M(SRT)). Nonetheless, a notable improvement in the FMR1000 and FDR separability measures can be observed from the reported results. The increase in FDR points to the possibility that, given larger and more representative evaluation data, the consistent enhancement in verification accuracy would be apparent here as well. A significant improvement in the verification performance is achieved by our approach when comparing masked probes to masked references. In this case, the EER decreases from 6.8831% (M-M) to 6.2578% (M-M(SRT)). A similar conclusion can be drawn from the improvement in the other verification performance measures and the FDR measure.

TABLE IV: The verification performance of the different experimental settings achieved by the ResNet-50 model, along with the EUM trained with the triplet loss and the EUM trained with the SRT loss, reported on the MFR2 dataset. Our proposed approach (SRT) achieves competitive performance in the F-M experimental setting and the best performance in the M-M experimental setting.

ResNet-50   EER %    FMR100 %   FMR1000 %   G-mean   I-mean    FDR       FTX %
F-F         0.0      0.0        0.0         0.7477   0.0038    37.9345   0.0
F-M         4.3895   6.7568     10.473      0.4263   0.0005    8.2432    0.9497
F-M(T)      6.4169   7.7703     12.1622     0.3567   -0.0066   6.8853    0.9497
F-M(SRT)    4.7274   7.4324     9.4595      0.4553   0.0014    8.4507    0.9497
M-M         6.8831   10.0156    13.7715     0.6496   0.0301    4.7924    1.203
M-M(T)      6.8831   9.7027     14.0845     0.7759   0.3663    4.8791    1.203
M-M(SRT)    6.2578   9.0767     11.8936     0.6488   0.0144    4.9381    1.203

[Fig. 6: ROC curves (1−FNMR over FMR) for the considered face recognition models, the EUM trained with the naive triplet loss, and the EUM trained with the SRT loss, on the MFR2 dataset for the experimental settings F-M (probe is masked) and M-M (reference and probe are masked). AUC values: (a) ResNet-50, F-M: 0.9937, F-M(T): 0.9869, F-M(SRT): 0.9904; (b) ResNet-50, M-M: 0.9766, M-M(T): 0.9836, M-M(SRT): 0.9780; (c) MobileFaceNet, F-M: 0.9717, F-M(T): 0.9669, F-M(SRT): 0.9753; (d) MobileFaceNet, M-M: 0.9609, M-M(T): 0.9671, M-M(SRT): 0.9675. The improvement achieved by our proposed SRT is noticeable in plots (c) and (d), in comparison to the performance of the base face recognition models.]
TABLE V: The verification performance of the different experimental settings achieved by the MobileFaceNet model, along with the EUM trained with the triplet loss and the EUM trained with the SRT loss, reported on the MFR2 dataset. In both the F-M and M-M experimental settings, our proposed approach (SRT) achieves the best performance.

MobileFaceNet   EER %    FMR100 %   FMR1000 %   G-mean   I-mean    FDR       FTX %
F-F             0.0106   0.0        0.0         0.7318   0.0078    26.4276   0.0
F-M             6.4169   16.8919    24.3243     0.3803   -0.0019   4.6457    0.9497
F-M(T)          7.7685   15.8784    34.4595     0.3304   -0.0027   4.2067    0.9497
F-M(SRT)        6.079    12.5       21.9595     0.4157   -0.0018   5.2918    0.9497
M-M             8.4777   18.1534    28.795      0.6087   0.0509    3.2505    1.203
M-M(T)          8.7634   17.5274    26.2911     0.7638   0.3966    3.5408    1.203
M-M(SRT)        7.8232   15.0235    22.5352     0.6087   0.0241    3.5815    1.203

Using masked probes, the verification performance achieved by MobileFaceNet is significantly enhanced by our proposed approach. A similar improvement in the verification performance is achieved by our approach when comparing masked probes to masked references, as shown in Table V. For example, when the probes and references are masked, the EER achieved by MobileFaceNet is 8.4777%. This error rate is reduced to 7.8232% using our proposed approach.

E. Ablation Study on the Self-restrained Triplet Loss

In this subsection we experimentally prove, and theoretically discuss, the advantage of our proposed SRT solution over the common naive triplet loss. Using the masked face datasets, we first explore the validity of training the EUM model with the triplet loss. It is noticeable that training the EUM with the naive triplet loss is inefficient for learning from masked face embeddings, as presented in Tables II, III, IV, and V. For example, when the probe is masked, the EER achieved by the EUM with the triplet loss on top of ResNet-50 is 1.9789%, in comparison to the 0.9611% EER achieved by the EUM with our SRT, as shown in Table II. It is crucial for learning with the triplet loss that the input triplets violate the condition d(f(x_i^a), f(x_i^n)) > d(f(x_i^a), f(x_i^p)) + m. Only then can the model learn to minimize the distance between the genuine pairs and maximize the distance between the imposter pairs. When the previous condition is not violated, the loss value will be close to zero, and the model will not be able to further optimize the distances of the genuine pairs and imposter pairs. This motivated our SRT solution.

Given that our proposed EUM solution is built on top of a pre-trained face recognition model, the feature embeddings of genuine pairs are similar (to a large degree), and those of imposter pairs are dissimilar. However, this similarity is affected (to some degree) when the faces are masked, and our main goal is to reduce this effect. This can be observed from the results presented in Tables II, III, IV, and V. For example, on the MFR dataset, the G-mean and I-mean achieved by ResNet-50 are 0.8538 and 0.0349, respectively. When the probe is masked, the G-mean and I-mean shift to 0.5254 and 0.0251, respectively, as shown in Table II. The shift in G-mean points out that the similarity between the genuine pairs is reduced (to some degree) when the probe is masked. Training the EUM with the naive triplet loss requires selecting triplets of embeddings that violate the triplet condition. As we discussed earlier, the masked anchor is similar (to some degree) to the positive (unmasked embedding) and dissimilar (to some degree) from the negative. Therefore, finding triplets that violate the triplet condition is not trivial, and it may not be possible for many of the triplets in the training dataset. This explains the poor result achieved when the EUM model is trained with the triplet loss, as there are only few triplets violating the triplet loss condition. One could assume that using a larger margin value would allow the EUM model to further optimize the genuine-pair and imposter-pair distances, since the triplet condition can be violated by increasing the margin value. However, by increasing the margin value, we increase the upper bound of the loss function and ignore the fact that the distance between imposter pairs is already sufficient with respect to the distance between genuine pairs in the embedding space. For example, using unmasked data, the mean of the imposter scores achieved by ResNet-50 on the MFR dataset is 0.0349. When the probe is masked, the mean of the imposter scores is 0.0251, as shown in Table II. Therefore, any further optimization of the distance between the imposter pairs would affect the discriminative features learned by the base face recognition model, and there is no restriction on the learning process that ensures the model will maintain the distance between the imposter pairs.

Alternatively, training the EUM model with our SRT loss achieves a significant improvement in minimizing the distance between the genuine pairs while, simultaneously, maintaining the distance between the imposter pairs close to the one learned by the base face recognition model. It is noticeable from the reported results that the I-mean achieved by our SRT is closer to the I-mean achieved when the model is evaluated on unmasked data, in comparison to the one achieved by the naive triplet loss, as shown in Tables II, III, IV, and V. The achieved results point out the efficiency of our proposed EUM trained with SRT in improving masked face recognition, in comparison to the considered face recognition models. They also support our theoretical motivation behind SRT, as training the EUM with SRT significantly outperformed training the EUM with the naive triplet loss.

VI. CONCLUSION

In this paper, we presented and evaluated a novel solution to reduce the negative impact of wearing a protective face mask on face recognition performance.
This work was motivated by the recent evaluation efforts on the effect of masked faces on face recognition performance [3], [4], [5], [6]. The presented solution is designed to operate on top of existing face recognition models, thus avoiding the need to retrain existing face recognition solutions used for unmasked faces. This goal has been accomplished by proposing the EUM, which operates in the embedding space. The learning objective of our EUM is to increase the similarity between genuine unmasked-masked pairs and to decrease the similarity between imposter pairs. We achieved this learning objective by proposing a novel loss function, the SRT. Through an ablation study and experiments on two real masked face datasets and two face recognition models, we demonstrated that our proposed EUM with SRT significantly improves the masked face verification performance in most experimental settings.

ACKNOWLEDGMENT

This research work has been funded by the German Federal Ministry of Education and Research and the Hessen State Ministry for Higher Education, Research and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE.

REFERENCES

[1] D. Gorodnichy, S. Yanushkevich, and V. Shmerko, "Automated border control: Problem formalization," in IEEE Symposium on Computational Intelligence in Biometrics and Identity Management (CIBIM). IEEE, 2014, pp. 118–125.
[2] G. Lovisotto, R. Malik, I. Sluganovic, M. Roeschlin, P. Trueman, and I. Martinovic, "Mobile biometrics in financial services: A five factor framework," Tech. Rep.
[3] N. Damer, J. H. Grebe, C. Chen, F. Boutros, F. Kirchbuchner, and A. Kuijper, "The effect of wearing a mask on face recognition performance: an exploratory study," in BIOSIG 2020 - Proceedings of the 19th International Conference of the Biometrics Special Interest Group, ser. LNI, vol. P-306. Gesellschaft für Informatik e.V., 2020, pp. 1–10. [Online]. Available: https://dl.gi.de/20.500.12116/34316
[4] Department of Homeland Security, "Biometric Technology Rally at MdTF," https://mdtf.org/Rally2020, 2020, last accessed: March 3, 2021.
[5] M. L. Ngan, P. J. Grother, and K. K. Hanaoka, "Ongoing face recognition vendor test (FRVT) part 6A: Face recognition accuracy with masks using pre-COVID-19 algorithms," NIST, 2020.
[6] ——, "Ongoing face recognition vendor test (FRVT) part 6B: Face recognition accuracy with face masks using post-COVID-19 algorithms," NIST, 2020.
[7] A. Anwar and A. Raychowdhury, "Masked face recognition for secure authentication," 2020.
[8] Y. Li, K. Guo, Y. Lu, and L. Liu, "Cropping and attention based approach for masked face recognition," Applied Intelligence, pp. 1–14, 2021.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2016, pp. 770–778. [Online]. Available: https://doi.org/10.1109/CVPR.2016.90
[10] S. Chen, Y. Liu, X. Gao, and Z. Han, "MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile devices," CoRR, vol. abs/1804.07573, 2018. [Online]. Available: http://arxiv.org/abs/1804.07573
[11] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Computer Vision Foundation / IEEE, 2019, pp. 4690–4699.
[12] L. Song, D. Gong, Z. Li, C. Liu, and W. Liu, "Occlusion robust face recognition based on mask learning with pairwise differential siamese network," in IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 773–782. [Online]. Available: https://doi.org/10.1109/ICCV.2019.00086
[13] M. Opitz, G. Waltner, G. Poier, H. Possegger, and H. Bischof, "Grid loss: Detecting occluded faces," CoRR, vol. abs/1609.00129, 2016. [Online]. Available: http://arxiv.org/abs/1609.00129
[14] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823. [Online]. Available: https://doi.org/10.1109/CVPR.2015.7298682
[15] Z. Wang, G. Wang, B. Huang, Z. Xiong, Q. Hong, H. Wu, P. Yi, K. Jiang, N. Wang, Y. Pei et al., "Masked face recognition dataset and application," arXiv preprint arXiv:2003.09093, 2020.
[16] M. Loey, G. Manogaran, M. H. N. Taha, and N. E. M. Khalifa, "A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic," Measurement, vol. 167, p. 108288, 2021.
[17] Z. Wang, P. Wang, P. C. Louis, L. E. Wheless, and Y. Huo, "WearMask: Fast in-browser face mask detection with serverless edge computing for COVID-19," arXiv preprint arXiv:2101.00784, 2021.
[18] B. Qin and D. Li, "Identifying facemask-wearing condition using image super-resolution with classification network to prevent COVID-19," Sensors, vol. 20, no. 18, p. 5236, 2020.
[19] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proceedings of the 32nd International Conference on Machine Learning (ICML), ser. JMLR Workshop and Conference Proceedings, vol. 37. JMLR.org, 2015, pp. 448–456. [Online]. Available: http://proceedings.mlr.press/v37/ioffe15.html
[20] A. L. Maas, "Rectifier nonlinearities improve neural network acoustic models," 2013.
[21] Y. Feng, H. Wang, H. R. Hu, L. Yu, W. Wang, and S. Wang, "Triplet distillation for deep face recognition," in IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 808–812. [Online]. Available: https://doi.org/10.1109/ICIP40778.2020.9190651
[22] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, "VGGFace2: A dataset for recognising faces across pose and age," in 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG). IEEE Computer Society, 2018, pp. 67–74. [Online]. Available: https://doi.org/10.1109/FG.2018.00020
[23] M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation," CoRR, vol. abs/1801.04381, 2018. [Online]. Available: http://arxiv.org/abs/1801.04381
[24] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao, "MS-Celeb-1M: A dataset and benchmark for large-scale face recognition," in Computer Vision - ECCV 2016, ser. Lecture Notes in Computer Science, vol. 9907. Springer, 2016, pp. 87–102. [Online]. Available: https://doi.org/10.1007/978-3-319-46487-9_6
[25] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," University of Massachusetts, Amherst, Tech. Rep. 07-49, October 2007.
[26] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
[27] D. E. King, "Dlib-ml: A machine learning toolkit," Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009. [Online]. Available: https://dl.acm.org/citation.cfm?id=1755843
[28] N. Poh and S. Bengio, "A study of the effects of score normalisation prior to fusion in biometric authentication tasks," IDIAP, Tech. Rep., 2004.