Analysis of Feature Extraction and Anti-Interference of Face Image under Deep Reconstruction Network Algorithm
Hindawi Complexity, Volume 2021, Article ID 8391973, 15 pages. https://doi.org/10.1155/2021/8391973

Research Article

Analysis of Feature Extraction and Anti-Interference of Face Image under Deep Reconstruction Network Algorithm

Jin Yang, Yuxuan Zhao, Shihao Yang, Xinxin Kang, Xinyan Cao, and Xixin Cao

School of Software & Microelectronics, Peking University, Beijing 102600, China

Correspondence should be addressed to Jin Yang; yang.jin@pku.edu.cn

Received 19 July 2020; Revised 11 September 2020; Accepted 23 September 2020; Published 20 March 2021

Academic Editor: Zhihan Lv

Copyright © 2021 Jin Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In face recognition systems, highly robust facial feature representation and good classification algorithm performance determine the quality of face recognition under unrestricted conditions. To explore the anti-interference performance of a convolutional neural network (CNN) reconstructed under a deep learning (DL) framework in face image feature extraction (FE) and recognition, this paper first combines the inception structure of the GoogleNet network with the residual structure of the ResNet network to construct a new deep reconstruction network algorithm, with stochastic gradient descent (SGD) and the triplet loss function as the model optimizer and classifier, respectively, and applies it to face recognition on the Labeled Faces in the Wild (LFW) face database. Then, portrait pyramid segmentation and local feature point segmentation are applied to extract the features of face images, and the matching of face feature points is achieved using the Euclidean distance and the joint Bayesian method. Finally, Matlab software is used to simulate the proposed algorithm and compare it with other algorithms. The results show that the proposed algorithm achieves its best face recognition performance when the learning rate is 0.0004, the attenuation coefficient is 0.0001, the training method is SGD, and dropout is 0.1 (accuracy: 99.03%, loss: 0.047, training time: 352 s, and overfitting rate: 1.006), and that it has the largest mean average precision among the compared CNN algorithms. Its correct rate of face feature matching is 84.72%, which is 6.94%, 2.5%, and 1.11% higher than the LetNet-5, VGG-16, and VGG-19 algorithms, respectively, but lower than the GoogleNet, AlexNet, and ResNet algorithms. At the same time, the proposed algorithm has a faster matching time (206.44 s) and a higher correct matching rate (88.75%) than the joint Bayesian method alone, indicating that the proposed deep reconstruction network algorithm can be used in face image recognition, FE, and matching, and that it has strong anti-interference capability.

1. Introduction

FR technology has been extensively adopted in identity recognition; it mainly detects biological features in the face for recognition, with strong uniqueness and security [1]. However, a dataset made up of face images presents a highly nonlinear distribution, and if a simple classification method is applied, high classification error occurs due to individual differences [2]. Moreover, a face image is disturbed by lighting, decorations, scenes, and other factors during shooting, which makes FR extremely difficult. At present, the methods commonly used for face detection include principal component analysis, support vector machines, CNNs, and active deformation models [3].

The depth model has been applied in many fields, and its application in image recognition was the first to receive attention. Moreover, the depth model is used for FE in images, where it is far better than manual FE, and it can be effectively applied in fields where manual FE is not perfect [4]. The common feature of DL and most machine learning is the ability to extract features. In DL, a multilayer network structure is mostly applied, which can fuse the bottom features in an image to form the top features [5]. ImageNet's annual ImageNet Large-Scale Visual Recognition Challenge is the largest and most advanced image recognition competition in the world. Since 2012, when AlexNet used a DCNN to reduce the top-5 recognition error rate to 16.4%, the DCNN algorithm has been adopted in the subsequent champion recognition models [6, 7].
The rest of the paper is organized as follows. Section 2 discusses related work on the DCNN network and image FE technology. Then, in Section 3, a deep reconstruction network is proposed, and the face recognition and facial FE algorithm is constructed based on the optimized deep reconstruction network. Section 4 verifies the face recognition, FE, and matching algorithms proposed in this paper and compares them with other DCNN algorithms. Section 5 discusses and analyzes the results obtained in this paper and compares them with other relevant research results. Section 6 concludes the paper with a summary and points out future research directions.

2. Related Work

NN is a common computer model, mainly based on the traditional statistical modeling of complex relationships, used to explore the relationship between different datasets. The NN structure differs across application fields, but the NN structure most widely applied in image-related fields is the CNN. Seeliger et al. applied a DCNN to the detection of magnetoencephalogram signal maps and reconstructed a cerebral cortex activity model based on the extracted resolution source [8]. Hoang and Kang proposed a method for fault diagnosis of rolling bearings based on a CNN and vibration images built on the depth structure and found that this method needs no separate FE technology and has high diagnostic accuracy and robustness in noisy environments [9]. At present, deep learning has been widely used in the field of image analysis, especially in facial image recognition, where it has obtained excellent results. Zhang et al. proposed a convolutional neural network model based on three-stage filtering and HOG, applied it to the recognition of color face images, and obtained excellent results [10]. Goswami et al. proposed a framework for face recognition based on deep neural networks and used a characteristic anomaly filter to detect singularities; after verification on public data, they found that the model had strong face recognition robustness [11]. Isogawa et al. indicated that there was no denoising method capable of parameter adjustment in the DCNN model; they used the proportional coefficient of a soft shrinkage threshold to optimize the DCNN, which was found to be applicable to noise removal in images [12]. Nakazawa and Kulkarni proposed a method to detect and segment abnormal defect patterns using a deep convolutional encoder-decoder NN architecture and found that these models could detect unseen defect patterns in real images [13].

The basic principle of image FE is to use a computer to extract the information in the image and then judge whether the differences in the image belong to its features. An and Zhou proposed an image FE algorithm based on two-dimensional empirical mode decomposition and scale-invariant feature change and, after verification, found that it could effectively improve the speed and accuracy of FE [14]. Xu et al. proposed a model for automatically extracting noise-level perception feature vectors based on the CNN model and found that the proposed algorithm had high extraction speed and accuracy at different noise levels [15]. Fielding and Zhang used enhanced adaptive particle swarm optimization to evolve deep CNNs and, after training and verification, found that the extraction error rate was 4.78% [16].

To sum up, DCNNs are widely used in image recognition, and improved DCNNs can effectively extract the features in an image. However, there are few studies on the anti-interference of DCNNs in face image recognition. Therefore, a DCNN based on the Caffe depth framework is proposed. Matlab simulation software is used to explore the influence of different parameter settings on FR performance by constructing the DCNN model. Then, the LFW database and the self-constructed face database in this paper are combined to explore the anti-interference performance of the constructed DCNN model for face image recognition in different scenarios. The results are intended to lay a foundation for improving the efficiency of FR.

3. Methodology

3.1. Design of Deep Reconstruction Network Structure. Existing studies have shown that, as the CNN structure gradually deepens, the CNN training results become better, but the improvement also increases the amount of network computation [17]. Therefore, the inception network structure is proposed in the GoogleNet network, which can make full use of the features extracted by each layer of the network and can increase the depth and complexity of the network while keeping its computational complexity under control. In this paper, some network parameter adjustments are proposed for the basic structure of the GoogleNet network: (1) the size of the feature map input to the network should be reduced slowly through the network to avoid a bottleneck in the feature representation of the image; (2) high-dimensional features are easier to obtain than low-dimensional features and can accelerate the training speed of the model; (3) adjacent neurons in the network are closely correlated and can be aggregated into spatial relations in low-dimensional spaces.

In a convolutional neural network, a 5 ∗ 5 convolution kernel costs 25/9 times as much as a 3 ∗ 3 convolution. If a larger convolution kernel can be replaced with a superposition of multiple 3 ∗ 3 kernels, the network has fewer parameters for the same field of view. Therefore, a hypothesis is proposed in this study. First, a 3 ∗ 3 convolution filter is adopted to process a 5 ∗ 5 block to obtain a 3 ∗ 3 output feature map; after another convolution with a 3 ∗ 3 kernel, a 1 ∗ 1 output is obtained. The final amount of computation is about 18/25 of that of directly using the 5 ∗ 5 kernel, saving 28% of the computation, though the network also needs the ReLu activation function to correct the output. After that, replacing the 3 ∗ 3 kernel with 2 ∗ 2 kernels can save 11% of the computation, but using asymmetric convolution kernels is better: in this study, cascaded 1 ∗ 3 and 3 ∗ 1 kernels are used to replace 3 ∗ 3 kernels, saving 33% of the computation, and larger kernels can be decomposed in the same way. Based on this, an improved Inception structure is obtained, as shown in Figure 1.
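As a quick check of the savings quoted above, the following sketch (ours, not the authors' code; it counts multiplies per output element only, ignoring channel counts and padding) reproduces the 18/25 and 33% figures:

```python
# Per-output-element multiply counts for the kernel factorizations above.
# Assumption: single channel, costs compared per output element.

cost_5x5 = 5 * 5                 # direct 5x5 convolution: 25 multiplies
cost_two_3x3 = 3 * 3 + 3 * 3     # two stacked 3x3 convolutions: 18 multiplies
cost_1x3_3x1 = 1 * 3 + 3 * 1     # cascaded 1x3 and 3x1 kernels: 6 multiplies

print(cost_two_3x3 / cost_5x5)   # 0.72 = 18/25, i.e., a 28% saving
print(cost_1x3_3x1 / (3 * 3))    # 0.667, i.e., about a 33% saving vs one 3x3
```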
Figure 1: Improved inception structure. (The block combines branches of 1 ∗ 1 convolutions, cascaded n ∗ 1 and 1 ∗ n convolutions, and 3 ∗ 3 max pooling, merged by filter concatenation.)

The structure of ResNet can have up to 152 layers [18]. This model adds an identity high-speed channel to a simply stacked shallow network, as shown in Figure 2. On the left is the ordinary network, whose output value H(x) represents an arbitrary ideal feature transformation. The residual network on the right outputs H(x) = F(x) + x, and what the network needs to fit is the residual F(x). Adding an identity shortcut connection lets the mapping of the input and the input itself be superimposed. The output formula of the residual unit is y = F(x, {W_i}) + x, where x is the input, y is the output, and F(x, {W_i}) is the residual mapping obtained by learning. Theoretically, both networks can approximate any function, but if the optimal solution of the network is close to the identity mapping, it is easier to optimize using the residual network; moreover, if gradient dispersion appears in the convolution branch, the features and gradients in the network can still flow through the identity-mapping branch, which ensures reliable information transmission.

Figure 2: Differences in the structure of different networks. The left is an ordinary network, and the right is a residual network in ResNet.

The convolution layers in the convolution branch F(x, {W_i}) of the residual unit can be composed of multiple layers, and the residual unit can also be modified by scaling the mapping branch, the convolution kernel size, dropout, and the location of the activation function. Therefore, a new DCNN structure is constructed based on the inception structure of the GoogleNet network and the residual structure of the ResNet network, to improve the network's computational efficiency and information circulation. The combined network structure is shown in Figure 3. The basic parameters of the deep reconstruction network constructed in this paper are shown in Table 1, where the dropout parameter is x; this parameter is adjusted continuously later to select the optimal dropout value for the final test.
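To make the residual computation concrete, here is a minimal sketch (ours; dense layers and random weights stand in for the paper's convolutional branches) of a unit computing y = F(x, {W_i}) + x:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def residual_unit(x, w1, w2):
    # residual branch F(x, {W1, W2}): two weight layers with ReLu in between
    f = relu(x @ w1) @ w2
    # identity shortcut: add the input back, then apply the final ReLu
    return relu(f + x)

x = rng.normal(size=(4, 64))                 # batch of 4 feature vectors
w1 = rng.normal(scale=0.1, size=(64, 64))    # placeholder weights
w2 = rng.normal(scale=0.1, size=(64, 64))
print(residual_unit(x, w1, w2).shape)        # (4, 64): the shortcut preserves shape
```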
Deep neural networks often use Softmax as a classifier; the triplet loss function is a classifier function proposed by Google for face recognition CNNs. Studies have shown that its recognition rate on the LFW dataset has reached 99.63% and that its recognition rate on the YouTube dataset has reached 95.12% [19]. Therefore, the impact of different classifiers on the model's face recognition rate is analyzed here.

In this paper, a triplet is composed of a reference sample (anchor) drawn at random from the training set, a sample of the same category (positive), and a sample of a different category (negative). Assuming that the features extracted from the three samples of a triplet are f(x_i^1), f(x_i^2), and f(x_i^3), respectively, the goal of optimization is as follows:

  ‖f(x_i^1) − f(x_i^2)‖² + α < ‖f(x_i^1) − f(x_i^3)‖²,  (1)

in which α is the minimum interval between the Euclidean distance of sample features of the same category and the Euclidean distance of sample features of different categories. At this time, the mathematical expression of the triplet loss is as follows:

  L = Σ_{i=1}^{N} [ ‖f(x_i^1) − f(x_i^2)‖² − ‖f(x_i^1) − f(x_i^3)‖² + α ]_+,  (2)

where [·]_+ denotes the positive part. In the training process of the neural network, the gradient corresponding to each feature is as follows:

  ∂L/∂f(x_i^1) = 2 (f(x_i^3) − f(x_i^2)),  (3)

  ∂L/∂f(x_i^2) = 2 (f(x_i^2) − f(x_i^1)),  (4)

  ∂L/∂f(x_i^3) = 2 (f(x_i^1) − f(x_i^3)).  (5)
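A minimal sketch of Eqs. (2)-(5) for a single triplet (ours; f_a, f_p, and f_n stand for the anchor, positive, and negative features f(x_i^1), f(x_i^2), and f(x_i^3), and the margin value is illustrative):

```python
import numpy as np

def triplet_loss_and_grads(f_a, f_p, f_n, alpha=0.2):
    # hinge term of Eq. (2) for one triplet
    loss = np.sum((f_a - f_p) ** 2) - np.sum((f_a - f_n) ** 2) + alpha
    if loss <= 0.0:
        # triplet already satisfies Eq. (1): no loss, no gradient
        z = np.zeros_like(f_a)
        return 0.0, z, z, z
    g_a = 2.0 * (f_n - f_p)   # Eq. (3): gradient w.r.t. the anchor feature
    g_p = 2.0 * (f_p - f_a)   # Eq. (4): gradient w.r.t. the positive feature
    g_n = 2.0 * (f_a - f_n)   # Eq. (5): gradient w.r.t. the negative feature
    return loss, g_a, g_p, g_n

rng = np.random.default_rng(0)
f_a, f_p, f_n = (rng.normal(size=160) for _ in range(3))
print(triplet_loss_and_grads(f_a, f_p, f_n)[0])
```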
Figure 3: (a) ResNet-Inception-1 structure; (b) ResNet-Inception-2 structure; (c) ResNet-Inception-3 structure. (Each block combines 1 ∗ 1, 3 ∗ 3, 1 ∗ 7/7 ∗ 1, or 1 ∗ 3/3 ∗ 1 convolution branches with an identity sum, following Figures 1 and 2.)

In order to improve the optimization efficiency of the model, an online method is used to select the combination of triplets. At the same time, in order to prevent the model from falling into a local optimum, α is removed when selecting the negative sample, so x_i^n only needs to satisfy

  ‖f(x_i^1) − f(x_i^2)‖² < ‖f(x_i^1) − f(x_i^3)‖².  (6)

Center loss reduces the distance between features of the same category [20]. Center loss can be defined as follows:

  L_C = (1/2) Σ_{i=1}^{m} ‖x_i − c_{y_i}‖²,  (7)

where L_C is the center loss, c_{y_i} is the feature center of the category of the i-th sample, x_i is the feature before the full connection layer, and m is the number of samples in a training batch.
Table 1: Basic parameters of the ResNet-Inception network structure.

Structure            Parameter   Output image size
Conv-1               3 ∗ 3       147 ∗ 147 ∗ 32
Conv-2               3 ∗ 3       145 ∗ 145 ∗ 32
Conv-3               3 ∗ 3       145 ∗ 145 ∗ 64
Pool-1               —           73 ∗ 73 ∗ 64
Conv-4               1 ∗ 1       73 ∗ 73 ∗ 80
Conv-5               3 ∗ 3       70 ∗ 70 ∗ 190
Conv-6               3 ∗ 3       33 ∗ 33 ∗ 256
ResNet-Inception-1   5           33 ∗ 33 ∗ 256
Reduction-1          —           16 ∗ 16 ∗ 512
ResNet-Inception-2   10          16 ∗ 16 ∗ 512
Reduction-2          —           8 ∗ 8 ∗ 1024
ResNet-Inception-3   5           8 ∗ 8 ∗ 1024
Pool-2               —           1792
Dropout              x           1792

The center c_{y_i} is updated as the sample set fed to the network is updated, and the update rule is as follows:

  c_u^{v+1} = c_u^v − [ Σ_{i=1}^{n} δ(y_i = u) · (c_{y_i} − x_i) ] / [ 1 + Σ_{i=1}^{n} δ(y_i = u) ],  (8)

in which δ(·) takes the value 0 or 1.

Then, the error function of the network can be defined as follows:

  LOSS = LOSS_s + λ LOSS_c,  (9)

in which λ is the weight of the center loss in the total loss function. When λ is 0, the classifier of the network reduces to Softmax.
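The following sketch (ours; random features stand in for network outputs) implements the center loss of Eq. (7), the center update of Eq. (8), and the combined objective of Eq. (9):

```python
import numpy as np

def center_loss(feats, labels, centers):
    # Eq. (7): half the summed squared distance of each feature to its class center
    return 0.5 * np.sum((feats - centers[labels]) ** 2)

def update_center(centers, feats, labels, u):
    # Eq. (8): pull center u toward the class-u features seen in this batch;
    # the boolean mask plays the role of delta(y_i = u)
    mask = labels == u
    shift = np.sum(centers[u] - feats[mask], axis=0) / (1.0 + mask.sum())
    centers[u] = centers[u] - shift
    return centers

def total_loss(loss_softmax, loss_center, lam):
    # Eq. (9): LOSS = LOSS_s + lambda * LOSS_c; lam = 0 recovers plain Softmax
    return loss_softmax + lam * loss_center

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 160))        # batch of 8 features
labels = rng.integers(0, 4, size=8)      # 4 classes
centers = np.zeros((4, 160))
print(center_loss(feats, labels, centers))
centers = update_center(centers, feats, labels, u=0)
```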
3.2. Face FE and Matching Based on Deep Reconstruction Network. In the process of taking photos, people may not fully expose their faces because of the camera angle, light, mood, and equipment. In order to detect faces at different positions or angles in the image, the scale of the sliding window or of the input image must first be changed. The sliding window also needs to scan the input image at a certain step size w; as w increases, the number of judgment windows and the amount of computation in the model network decrease geometrically. When the image is input into the DCNN, there is no need to preprocess it. However, in order to better detect faces of different sizes, the original input image is subsampled and scaled at different ratios, yielding an image pyramid, as shown in Figure 4.

In this paper, the DCNN processes each face image in the image pyramid in input order, marking the potential regions of the detected face image and recording the coordinates and size of the specific face area. After all images are processed, all regions of response points in the detected images are reversely mapped to the original face image given to the input layer, and the overlapping areas of the image are fused to obtain the final result. Subsequently, face straightening is performed at the face alignment stage, and the face features in different poses are extracted, mainly partial images around the eyes, the corners of the mouth, and the tip of the nose, as shown in Figure 5.

Subsequently, the Euclidean distance [21] and the learned feature matching algorithm are applied to feature matching in the face image; the matching process is shown in Figure 6. In the first step, the sample pairs that are easy to judge must be judged quickly. The Euclidean distance is used for the similarity between samples, calculated as the absolute distance between two points in space:

  d(u, v) = sqrt( Σ_{i=1}^{n} (u_i − v_i)² ).  (10)

In the second step, on the premise of ensuring the feature recognition rate in the face image, a learning algorithm with a higher recognition rate is used to model the features in the face image, and the similarity of the complete face features is matched to realize the refined judgment of the sample. As Figure 5 shows, five local features are extracted from the face image, giving a total of six 160-dimensional feature vectors, which are spliced into a 960-dimensional facial feature vector; the spliced feature is then reduced to 160 dimensions again by PCA dimension reduction. Besides, in the second step, the joint Bayesian method is used to calculate the similarity between samples. Assuming that the two faces follow the joint distribution of x1 and x2, respectively, and that both obey a zero-mean Gaussian distribution, the covariance between the two face images is expressed as follows:

  cov(u_i, v_j) = cov(μ_i, μ_j) + cov(ε_i, ε_j),  (11)

in which μ is the identity of the person and ε is the intrinsic variation of the face itself (light, expression, and posture).
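A sketch of the first, Euclidean stage and the splice-and-reduce step (ours; the PCA projection is a random stand-in for one fitted on training features, and the decision threshold is illustrative):

```python
import numpy as np

def euclidean(u, v):
    # Eq. (10): absolute distance between two feature vectors
    return np.sqrt(np.sum((u - v) ** 2))

def splice_and_reduce(face_feats, pca_w):
    # six 160-d vectors (whole face plus five local patches) -> 960-d -> 160-d
    spliced = np.concatenate(face_feats)   # shape (960,)
    return spliced @ pca_w                 # shape (160,)

rng = np.random.default_rng(0)
feats_a = [rng.normal(size=160) for _ in range(6)]
feats_b = [rng.normal(size=160) for _ in range(6)]
pca_w = rng.normal(scale=0.03, size=(960, 160))  # placeholder PCA projection

d = euclidean(splice_and_reduce(feats_a, pca_w),
              splice_and_reduce(feats_b, pca_w))
print("same person" if d < 1.0 else "different person")  # illustrative threshold
```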
Figure 4: Flow chart of face detection in the face image.

Figure 5: Extraction of face feature points and subfigures of each area.

Figure 6: Matching process of face feature points.
3.3. Evaluation Indexes for the Testing and Identification of the Deep Reconstruction Network Algorithm. In this paper, Matlab software is used to simulate the deep reconstruction network, and the LFW dataset is used for training and testing. The experiments are carried out on the TensorFlow platform with Linux as the operating system and 2 ∗ TITAN X as the GPU. The LFW dataset has a total of 13233 face images of 5749 people, each image labeled with the corresponding name; most people have only one picture. The size of each image is 250 × 250; most are color images, but there are also a few black-and-white face pictures. The face images in the dataset are randomly divided into a training set of 10,000 face images and a verification set of 3233 face images. When training the model, the effects of different parameter settings (learning rate η, attenuation coefficient λ, training optimization algorithm, and dropout) on the training effect of the model are analyzed first, and the loss value, accuracy value, overfitting rate, and training time are adopted to evaluate the training effect. The model with the optimal parameter settings is then applied to the verification set, and the mean average precision (mAP) value is used to evaluate the recognition effect.

In order to evaluate the effect of the proposed algorithm on face FE, 10140 images of 1150 people from the LFW dataset are selected, and 180 pairs of positive face samples and 180 pairs of negative face samples are randomly selected as test sample pairs. Then, the constructed network is used to perform face FE, and the simplest Euclidean distance is used to match feature similarity; classification accuracy and matching accuracy are used to evaluate the effectiveness of face FE. Subsequently, these 10140 images are segmented further: the images of 1050 people are used to train the facial feature matching algorithm, and, for the images of the remaining 100 individuals, 200 pairs of positive samples and 200 pairs of negative samples are randomly selected for verification. The recognition accuracy rate, positive sample error rate, negative sample error rate, pairing correct rate, and average pairing time are used to evaluate the matching effect of face features.

In the field of target detection and recognition, mAP is often used to evaluate the effect of algorithm detection and recognition [22]. In this paper, basic concepts such as error rate, accuracy, precision, recall, and average precision are first introduced. The error rate and accuracy rate are the most commonly used measurement terms in classification evaluation: the error rate is the ratio of wrongly classified samples to the total number of samples, and the accuracy rate is the ratio of correctly classified samples to the total number of samples. Assuming that the sample dataset is S, a sample is defined as u_i, the type of the sample is v_i, and the algorithm's prediction result is f(u_i), the error rate and accuracy rate are calculated as follows:

  Error(f; S) = (1/m) Σ_{i=1}^{m} I(f(u_i) ≠ v_i),  (12)

  Acc(f; S) = (1/m) Σ_{i=1}^{m} I(f(u_i) = v_i) = 1 − Error(f; S).  (13)

The precision rate (Pre) and recall rate (Recall) are calculated as follows:

  Pre = TP / (TP + FP),  (14)

  Recall = TP / (TP + FN),  (15)

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives. Using the precision rate as the vertical axis and the recall rate as the horizontal axis, a P-R curve is obtained. The area under the P-R curve is the average precision (AP), and mAP is the average of the AP values over each classification task. In addition, the overfitting ratio (OR) is used to evaluate the algorithm. The calculation equation of OR is as follows:

  OR = TrainAcc / ValAcc,  (16)

in which TrainAcc is the training accuracy rate and ValAcc is the verification accuracy rate.
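A minimal sketch of Eqs. (12)-(16) (ours; the indicator I(·) is realized by boolean comparisons, AP is computed as the area under a threshold-swept P-R curve, and the printed values are example inputs):

```python
import numpy as np

def error_rate(pred, y):                 # Eq. (12)
    return np.mean(pred != y)

def accuracy(pred, y):                   # Eq. (13)
    return 1.0 - error_rate(pred, y)

def precision_recall(pred, y):           # Eqs. (14) and (15), positive class = 1
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    pre = tp / (tp + fp) if tp + fp > 0 else 0.0
    rec = tp / (tp + fn) if tp + fn > 0 else 0.0
    return pre, rec

def average_precision(scores, y):
    # area under the P-R curve obtained by sweeping a threshold over the scores
    order = np.argsort(-scores)
    hits = y[order]
    tp = np.cumsum(hits)
    pre = tp / np.arange(1, len(hits) + 1)
    rec = tp / max(hits.sum(), 1)
    return np.trapz(pre, rec)

def overfitting_ratio(train_acc, val_acc):   # Eq. (16)
    return train_acc / val_acc

print(overfitting_ratio(0.996, 0.990))       # example values, roughly 1.006
```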
4. Results

4.1. Comparison of Recognition Rate Based on the Deep Reconstruction Network Algorithm. By adjusting the learning rate η, attenuation coefficient λ, training method, and dropout, the impact of different parameters on the recognition accuracy of the DCNN model is explored; the database is used to train the DCNN model, and the maximum number of iterations is set to 1000. In the experiment, the activation function of the network is always kept as ReLU. First, the effects of different learning rates η on the model recognition rate are compared; the results are shown in Figures 7 and 8. Figure 7 shows that the recognition accuracy of the algorithm is highest when η = 0.0004 and lowest when η = 0.001. Figure 8 shows that the loss value is smallest when η = 0.0004 and largest when η = 0.001. As η gradually increases, the accuracy value gradually decreases and the loss value gradually increases.

Then, the effects of different learning rates on the face recognition performance of the algorithm are quantitatively compared. Table 2 shows that when η = 0.0004, the algorithm has the largest accuracy value (99.03%), the smallest loss value (0.047), and the shortest training time (352 s); when η = 0.0007, the algorithm has the lowest overfitting rate (1.005). In summary, the algorithm performs best when η = 0.0004, so follow-up tests are conducted on this basis.

Next, the effect of different attenuation coefficients λ on the model recognition rate is compared; the results are shown in Figures 9 and 10. Figure 9 shows that the recognition accuracy is highest when λ = 0.0001 and lowest when λ = 0.09. Figure 10 shows that the loss value is smallest when λ = 0.0001 and largest when λ = 0.09. As λ gradually increases, the accuracy value gradually decreases and the loss value gradually increases.

Then, the effects of different attenuation coefficients on the face recognition performance of the algorithm are quantitatively compared. Table 3 shows that when λ = 0.0001, the accuracy value of the algorithm is the largest (99.03%) and the loss value is the smallest (0.047); when λ = 0.0009, the training time of the algorithm is the shortest (342 s); and when λ = 0.0001 and 0.0009, the overfitting rate of the algorithm is the smallest (1.006). In summary, the algorithm performs best when λ = 0.0001, so follow-up tests are conducted on this basis, and the classifier
used in this model is determined to be the triplet loss function.

Figure 7: Comparison of recognition accuracy for different learning rates.

Figure 8: Comparison of training loss values for different learning rates.

Table 2: Effects of different learning rates on the training effect of the deep reconstruction network algorithm.

η        Acc (%)   Loss    Training time (s)   Overfitting rate
0.0004   99.03     0.047   352                 1.006
0.0005   96.74     0.059   354                 1.006
0.0006   93.15     0.062   368                 1.082
0.0007   90.31     0.120   370                 1.005
0.0008   88.07     0.115   400                 1.011
0.0009   86.41     0.170   398                 1.026
0.001    85.05     0.206   405                 1.058

Figure 9: Comparison of recognition accuracy for different attenuation coefficients.

Figure 10: Comparison of training loss values for different attenuation coefficients.

Table 3: Influence of different attenuation coefficients on the training effect of the deep reconstruction network algorithm.

λ        Acc (%)   Loss    Training time (s)   Overfitting rate
0.0001   99.03     0.047   352                 1.006
0.0009   97.63     0.048   342                 1.006
0.001    95.17     0.102   358                 1.037

After comparing the effects of different training methods on the model recognition rate, the results are shown in Figures 11 and 12. Figure 11 shows that the recognition accuracy of the algorithm is highest when the training method is SGD and lowest when the training method is Adagrad. Figure 12 shows that the loss value is smallest with SGD and largest with Adagrad.

Then, the effects of different training methods on the face recognition performance of the algorithm are quantitatively compared. Table 4 shows that with SGD the algorithm has the largest Acc value (99.03%), the smallest loss value (0.047), and the lowest overfitting rate (1.006); with RMSprop, the training time of the algorithm is the shortest (351 s). In summary, the algorithm performs best with SGD, so follow-up tests are performed on this basis.

Finally, the impact of different dropout values on the model recognition rate is compared; the results are shown in Figures 13 and 14. Figure 13 shows that the recognition accuracy is highest when dropout = 0.1 and lowest when dropout = 0.6. Figure 14 shows that the loss value is smallest when dropout = 0.1 and largest when dropout = 0.6. As the dropout value gradually increases, the accuracy value of the algorithm gradually decreases and the loss value gradually increases.

Then, the effects of different dropout values on the face recognition performance of the algorithm are quantitatively compared. Table 5 shows that when dropout = 0.1, the algorithm has the largest Acc value (99.03%) and the smallest loss value (0.047), with the lowest overfitting rate (1.006) shared with dropout = 0.2; when dropout = 0.6, the training time of the algorithm is the shortest (337 s). In summary, the algorithm performs best when dropout = 0.1, so follow-up tests are conducted on this basis.

With η = 0.0004 and λ = 0.0001 set in the construction of the deep reconstruction network, SGD selected as the training method, and dropout = 0.1, the effects of different activation functions on the model recognition rate are then compared, as shown in Figures 15 and 16. Figure 15 shows that the recognition accuracy rate is highest when the activation function is ReLu and lowest when the activation function is sigmoid.
Figure 16 shows that the loss value is lowest when the activation function is ReLu and highest when the activation function is sigmoid.

Then, the effects of different activation functions on the face recognition performance of the algorithm are quantitatively compared. Table 6 shows that when the activation function is ReLu, the Acc value of the algorithm is the largest (99.03%), the loss value is the smallest (0.047), and the overfitting rate is the lowest (1.006); when the activation function is Tanh, the training time of the algorithm is the shortest (341 s). In summary, the algorithm performs best when the
activation function is ReLu, so follow-up tests are carried out on this basis.

Based on the results obtained above, the deep reconstruction network constructed in this paper is set to η = 0.0004, λ = 0.0001, SGD as the training method, and dropout = 0.1 for face recognition verification, and its recognition effect is compared with the LetNet-5, GoogleNet, AlexNet, VGG-16, VGG-19, and ResNet networks. It is evident from Figure 17 that the area under the P-R curve of the algorithm constructed in this paper is the largest; that is, its mAP value is the largest. Ranking the mAP values of the different models gives: the algorithm proposed in this paper > ResNet > AlexNet > VGG-19 > GoogleNet > VGG-16 > LetNet-5.

4.2. Face FE and Matching Verification Based on the Deep Reconstruction Network Algorithm. The face recognition and FE algorithm proposed in this paper is applied to the LFW dataset to extract and match face features; the results are shown in Figure 18. It is evident that the proposed algorithm can effectively extract the features in the face image and complete the matching of the same features, thereby realizing the recognition and detection of the same face.

Then, the deep reconstruction network algorithm constructed in this paper is applied to FE and feature matching in face images, and its performance is compared with other models. It is evident from Table 7 that the construction method of this paper achieves a classification accuracy of 97.94% on the training set, which is 2.1%, 1.16%, and 0.94% higher than the LetNet-5, VGG-16, and VGG-19 algorithms, respectively, but 0.14%, 0.58%, and 0.46% lower than the GoogleNet, AlexNet, and ResNet algorithms, respectively. On the verification set, the correct matching rate of the algorithm constructed in this paper is 84.72%, which is 6.94%, 2.5%, and 1.11% higher than the LetNet-5, VGG-16, and VGG-19 algorithms, respectively, but 0.56%, 1.39%, and 1.95% lower than the GoogleNet, AlexNet, and ResNet algorithms, respectively.

The matching effect of the features extracted from the face image is then compared; the comparison results of the total recognition rate are shown in Table 8.
It is evident that, after the joint Bayesian method is used in the algorithm proposed in this paper, the correct pair recognition rate on the verification set is the highest (88.75%), which is 4%, 1.25%, 0.75%, 3.75%, 3.25%, and 0.25% higher than the LetNet-5, GoogleNet, AlexNet, VGG-16, VGG-19, and ResNet algorithms, respectively.

The effects of using only the joint Bayesian method and of the algorithm in this paper on the matching of face features are compared, and the results are shown in Table 9. It is evident that the algorithm proposed in this paper has a faster matching time (206.44 s) and a higher correct matching rate (88.75%) than the joint Bayesian method.
Figure 11: Comparison of recognition accuracy of six training methods.

Figure 12: Comparison of training loss values of six training methods.

Figure 13: Comparison of recognition accuracy with different dropout values.

Figure 14: Comparison of training loss values with different dropout values.

Table 4: Influence of different training methods on the training effect of the deep reconstruction network algorithm.

Training method   Acc (%)   Loss    Training time (s)   Overfitting rate
Adadelta          97.45     0.062   357                 1.010
Adam              97.05     0.087   388                 1.012
Nadam             96.87     0.132   438                 1.016
RMSprop           96.14     0.147   351                 1.129
SGD               99.03     0.047   352                 1.006
Adagrad           94.91     0.187   422                 1.158

Table 5: Effect of different dropout values on the training effect of the deep reconstruction network algorithm.

Dropout   Acc (%)   Loss    Training time (s)   Overfitting rate
0.1       99.03     0.047   352                 1.006
0.2       98.83     0.057   347                 1.006
0.3       97.20     0.087   348                 1.009
0.4       97.05     0.112   351                 1.017
0.5       96.23     0.127   340                 1.027
0.6       95.93     0.153   337                 1.518

5. Discussion

Previous studies have shown that different parameter settings have a certain impact on the recognition rate of the DCNN model.
Figure 15: Comparison of recognition accuracy of six activation functions.

Figure 16: Comparison of training loss of six activation functions.

Table 6: Influence of different activation functions on the training effect of the deep reconstruction network algorithm.

Activation function   Acc (%)   Loss    Training time (s)   Overfitting rate
Sigmoid               95.17     0.091   382                 1.007
ReLu                  99.03     0.047   352                 1.006
Tanh                  98.52     0.053   341                 1.007
LeakyReLu             97.11     0.073   350                 1.006
ELU                   96.78     0.087   349                 1.007
MaxOut                97.45     0.082   367                 1.006

In this paper, the effects of different learning rates, training methods, attenuation coefficients, and dropout on the recognition performance of the algorithm are compared. Studies have shown that too large a learning rate makes the algorithm unstable during the learning process, which in turn affects its performance [23]; this is consistent with the result of this paper that the learning rate is negatively correlated with the performance of the proposed algorithm. Then, the effects of different training methods on the recognition rate of the DCNN model are compared. The RMSprop-based training method increases the probability of overfitting, and the Nadam-based training method adds the Nesterov momentum parameter, so training takes more time. Previous studies have shown that adjusting the dropout discarding rate among the many network parameters can solve the problem of overfitting in the model, but excessively increasing the dropout discarding rate reduces the recognition accuracy of image classification and increases the overfitting probability [24]; this is basically consistent with the result of this paper that dropout is negatively correlated with the recognition performance of the constructed face recognition algorithm. At the same time, the effect of different attenuation coefficient settings on the recognition performance of the proposed algorithm is compared, and the attenuation coefficient is found to be negatively correlated with the recognition rate of the proposed algorithm. This is because an excessive weight attenuation coefficient destroys the stability of the algorithm during learning, so choosing the smallest possible attenuation coefficient ensures the stability of the face recognition algorithm [25].

Facial features are mainly divided into color (skin color), contour (ellipse), illumination (hair, eyes, and jaw), template
(mean and variance), transformation domain (feature representation), structure (facial symmetry), inlay (mosaic rule), and histogram features (grey distribution) [26].

Figure 17: Comparison of P-R curves of face recognition performance of different models.

Figure 18: Face image FE and matching. Different colored circles in the same picture indicate different extracted features, and the same color in different pictures of the same group indicates matching similar features.
Table 7: Comparison of facial FE performance of different models.

Algorithm    Correct training classifications   Training classification accuracy (%)   Correctly matched verification pairs   Verification correct match rate (%)
LetNet-5     9028                               95.84                                  280                                    77.78
VGG-16       9117                               96.78                                  296                                    82.22
VGG-19       9137                               97.00                                  301                                    83.61
GoogleNet    9239                               98.08                                  307                                    85.28
AlexNet      9281                               98.52                                  310                                    86.11
ResNet       9269                               98.40                                  312                                    86.67
Our method   9226                               97.94                                  305                                    84.72

Table 8: Comparison of total recognition rates of facial features of different models.

Algorithm    Correctly recognized pairs   Correct pair recognition rate (%)   Positive-pair error rate (%)   Negative-pair error rate (%)
LetNet-5     339                          84.75                               13.5                           17
VGG-16       340                          85                                  14.5                           15.5
VGG-19       342                          85.5                                15                             14
GoogleNet    350                          87.5                                14                             11
AlexNet      352                          88                                  11.5                           12.5
ResNet       354                          88.5                                10.5                           12.5
Our method   355                          88.75                               11                             11.5

Table 9: Comparison of the matching effect of the joint Bayesian method and the proposed algorithm.

Algorithm     Matching time (s)   Correctly matched pairs   Correct match rate (%)
Joint Bayes   240.15              350                       87.5
Our method    206.44              355                       88.75

In the process of face feature detection, classification errors are caused by factors such as the highly nonlinear distribution (e.g., facial expression and color difference), ornaments (beard, glasses, and hats), expression (facial muscle movement), light (brightness and angles), image quality (resolution), and complex scenarios (the number of faces and the gaps between them) [27]. Xu et al. proposed a semisupervised method for FR on the LFW and YTF databases and found that the recognition rates of the algorithm were 98.63% and 91.76%, respectively [28]. In this paper, the constructed deep reconstruction network is applied to the FE of face images in the LFW data; compared to the LetNet-5 [29], VGG-16 [30], and VGG-19 [31] algorithms, the proposed algorithm has a higher matching accuracy rate, though it is slightly lower than the GoogleNet [32], AlexNet [33], and ResNet [34] algorithms. However, the algorithm proposed in this paper can reduce the complexity of model generation while preventing overfitting. Finally, the joint Bayesian method [35] is used to match face features. The correct matching rate of the proposed algorithm for face features is found to be higher than that of LetNet-5, VGG-16, VGG-19, GoogleNet, AlexNet, and ResNet. At the same time, compared with the joint Bayes method alone, it can reduce the time consumption of feature matching [36] and improve the matching accuracy [37], indicating that the algorithm proposed in this paper can quickly perform large-scale face image feature matching and has certain advantages compared with other DCNN models.

6. Conclusion

In this paper, a new deep reconstruction network is constructed using the Inception structure of the GoogleNet network and the residual structure of the ResNet network, and it is applied to face recognition for optimal parameter selection of the algorithm. Based on this algorithm, portrait pyramid and local feature segmentation are applied to construct a face FE algorithm, which is found to have a better FE effect than the general DCNN models. Finally, based on the algorithm, face feature matching is achieved using the joint Bayesian method, and the results are verified. The results show that the model constructed in this paper can effectively identify face images with different interference factors. However, the model is trained and verified only on the images in the database; a specific FR system needs to be developed to explore its application effect in video portrait recognition. In conclusion, the results support the subsequent development of FR systems based on DCNNs and the improvement of the efficiency of FR.
Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also form part of an ongoing study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] E. Viccaro, E. Sands, and C. Springer, "Spaced retrieval using static and dynamic images to improve face-name recognition: alzheimer's dementia and vascular dementia," American Journal of Speech-Language Pathology, vol. 28, no. 3, pp. 1184–1197, 2019.
[2] L. Li, P. L. Correia, and A. Hadid, "Face recognition under spoofing attacks: countermeasures and research directions," IET Biometrics, vol. 7, no. 1, pp. 3–14, 2018.
[3] R. Starrfelt, S. K. Klargaard, A. Petersen, and C. Gerlach, "Reading in developmental prosopagnosia: evidence for a dissociation between word and face recognition," Neuropsychology, vol. 32, no. 2, pp. 138–147, 2018.
[4] H. Qin and M. A. El-Yacoubi, "Deep representation-based feature extraction and recovering for finger-vein verification," IEEE Transactions on Information Forensics and Security, vol. 12, no. 8, pp. 1816–1829, 2017.
[5] Z. Wang, C. Song, and T. Chen, "Deep learning based monitoring of furnace combustion state and measurement of heat release rate," Energy, vol. 131, pp. 106–112, 2017.
[6] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 23, no. 2, pp. 1097–1105, 2012.
[8] K. Seeliger, M. Fritsche, U. Güçlü et al., "Convolutional neural network-based encoding and decoding of visual object recognition in space and time," NeuroImage, vol. 180, pp. 253–266, 2017.
[9] D. T. Hoang and H. J. Kang, "Rolling element bearing fault diagnosis using convolutional neural network and vibration image," Cognitive Systems Research, vol. 53, pp. 42–50, 2018.
[10] M. Zhang, S. Khan, and H. Yan, "Deep eigen-filters for face recognition: feature representation via unsupervised multi-structure filter learning," Pattern Recognition, vol. 100, p. 107176, 2020.
[11] G. Goswami, A. Agarwal, N. Ratha, R. Singh, and M. Vatsa, "Detecting and mitigating adversarial perturbations for robust face recognition," International Journal of Computer Vision, vol. 127, no. 6-7, pp. 719–742, 2019.
[12] K. Isogawa, T. Ida, T. Shiodera et al., "Deep shrinkage convolutional neural network for adaptive noise reduction," IEEE Signal Processing Letters, vol. 25, no. 2, pp. 224–228, 2017.
[13] T. Nakazawa and D. V. Kulkarni, "Anomaly detection and segmentation for wafer defect patterns using deep convolutional encoder-decoder neural network architectures in semiconductor manufacturing," IEEE Transactions on Semiconductor Manufacturing, vol. 32, no. 2, pp. 250–256, 2019.
[14] F.-P. An and X.-W. Zhou, "BEMD-SIFT feature extraction algorithm for image processing application," Multimedia Tools and Applications, vol. 76, no. 11, pp. 13153–13172, 2017.
[15] S. P. Xu, T. Liu, G. Z. Zhang et al., "A two-stage noise level estimation using automatic feature extraction and mapping model," IEEE Signal Processing Letters, vol. 26, no. 1, pp. 179–183, 2018.
[16] B. Fielding and L. Zhang, "Evolving image classification architectures with enhanced particle swarm optimisation," IEEE Access, vol. 6, pp. 68560–68575, 2018.
[17] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: residual learning of deep cnn for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2016.
[18] E. Kang, W. Chang, J. Yoo, and J. C. Ye, "Deep convolutional framelet denosing for low-dose ct via wavelet residual network," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1358–1369, 2018.
[19] D. S. Trigueros, L. Meng, and M. Hartnett, "Enhancing convolutional neural networks for face recognition with occlusion maps and batch triplet loss," Image and Vision Computing, vol. 79, 2018.
[20] J. Alan and F. Guo, "Spectral-spatial feature extraction and classification by ann supervised with center loss in hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 3, pp. 1755–1767, 2019.
[21] X. Gan and Z. Haitao, "A noise-robust semi-supervised dimensionality reduction method for face recognition," Optik International Journal for Light & Electron Optics, vol. 157, pp. 858–865, 2018.
[22] U. D. Dixit and M. S. Shirdhonkar, "Face-based document image retrieval system," Procedia Computer Science, vol. 132, pp. 659–668, 2018.
[23] H. Zhao, F. Liu, H. Zhang, and Z. Liang, "Research on a learning rate with energy index in deep learning," Neural Networks, vol. 110, pp. 225–231, 2019.
[24] G. H. de Rosa, J. P. Papa, and X. S. Yang, "Handling dropout probability estimation in convolution neural networks using metaheuristics," Soft Computing, vol. 4, pp. 6147–6156, 2017.
[25] M. Tzelepi and A. Tefas, "Improving the performance of lightweight cnns for binary classification using quadratic mutual information regularization," Pattern Recognition, vol. 106, p. 107407, 2020.
[26] T. Amina, A. T. Ahmed, S. Djamel et al., "Deep learning features for robust facial kinship verification," IET Image Processing, vol. 12, no. 12, pp. 2336–2345, 2018.
[27] M. Kraemer, Q. B. Huynh, D. Wieczorek et al., "Distinctive facial features in idiopathic Moyamoya disease in Caucasians: a first systematic analysis," PeerJ, vol. 6, no. 4, Article ID e4740, 2018.
[28] W. Xu, J. Wu, S. Ding, L. Lian, and H. Chao, "Enhancing face recognition from massive weakly labeled data of new domains," Neural Processing Letters, vol. 49, no. 3, pp. 939–950, 2019.
[29] D. Wang, H. Lu, and M.-H. Yang, "Kernel collaborative face recognition," Pattern Recognition, vol. 48, no. 10, pp. 3025–3037, 2015.
[30] F. Zhu, X. Kong, H. Fu, and Q. Tian, "Pseudo-positive regularization for deep person re-identification," Multimedia Systems, vol. 24, no. 4, pp. 477–489, 2018.
[31] M. F. Hansen, M. L. Smith, L. N. Smith et al., "Towards on-farm pig face recognition using convolutional neural networks," Computers in Industry, vol. 98, pp. 145–152, 2018.
[32] N. Bi, J. Chen, and J. Tan, "The handwritten Chinese character recognition uses convolutional neural networks with the googlenet," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 11, p. 1940016, 2019.
[33] A. Aslam, K. Hayat, A. I. Umar et al., "Wavelet-based convolutional neural networks for gender classification," Journal of Electronic Imaging, vol. 28, no. 1, p. 1, 2019.
[34] Z. Lu, X. Jiang, and A. Kot, "Deep coupled resnet for low-resolution face recognition," IEEE Signal Processing Letters, vol. 25, no. 4, pp. 526–530, 2018.
[35] Z. Huang, X. Xu, J. Ni, H. Zhu, and C. Wang, "Multimodal representation learning for recommendation in internet of things," IEEE Internet of Things Journal, vol. 6, no. 6, pp. 10675–10685, 2019.
[36] M. Zhang, D. Zhang, H. Yao, and K. Zhang, "A probabilistic model of human error assessment for autonomous cargo ships focusing on human–autonomy collaboration," Safety Science, vol. 130, Article ID 104838, 2020.
[37] F. Hu and G. Wu, "Distributed error correction of EKF algorithm in multi-sensor fusion localization model," IEEE Access, vol. 8, pp. 93211–93218, 2020.