Analysis of Feature Extraction and Anti-Interference of Face Image under Deep Reconstruction Network Algorithm
Hindawi Complexity, Volume 2021, Article ID 8391973, 15 pages. https://doi.org/10.1155/2021/8391973

Research Article

Analysis of Feature Extraction and Anti-Interference of Face Image under Deep Reconstruction Network Algorithm

Jin Yang, Yuxuan Zhao, Shihao Yang, Xinxin Kang, Xinyan Cao, and Xixin Cao

School of Software & Microelectronics, Peking University, Beijing 102600, China

Correspondence should be addressed to Jin Yang; yang.jin@pku.edu.cn

Received 19 July 2020; Revised 11 September 2020; Accepted 23 September 2020; Published 20 March 2021

Academic Editor: Zhihan Lv

Copyright © 2021 Jin Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In face recognition systems, highly robust facial feature representation and good classification algorithm performance determine the quality of face recognition under unrestricted conditions. To explore the anti-interference performance of a convolutional neural network (CNN) reconstructed under a deep learning (DL) framework in face image feature extraction (FE) and recognition, this paper first combines the inception structure of the GoogleNet network with the residual structure of the ResNet network to construct a new deep reconstruction network algorithm, with stochastic gradient descent (SGD) and the triplet loss function as the model optimizer and classifier, respectively, and applies it to face recognition on the Labeled Faces in the Wild (LFW) face database. Then, portrait pyramid segmentation and local feature point segmentation are applied to extract the features of face images, and the matching of face feature points is achieved using the Euclidean distance and the joint Bayesian method. Finally, Matlab software is used to simulate the proposed algorithm and compare it with other algorithms. The results show that the proposed algorithm achieves its best face recognition performance when the learning rate is 0.0004, the attenuation coefficient is 0.0001, the training method is SGD, and dropout is 0.1 (accuracy: 99.03%, loss: 0.047, training time: 352 s, and overfitting rate: 1.006), and that it has the largest mean average precision among the compared CNN algorithms. Its correct rate of face feature matching is 84.72%, which is 6.94%, 2.5%, and 1.11% higher than the LetNet-5, VGG-16, and VGG-19 algorithms, respectively, but lower than the GoogleNet, AlexNet, and ResNet algorithms. At the same time, the proposed algorithm has a faster matching time (206.44 s) and a higher correct matching rate (88.75%) than the joint Bayesian method alone, indicating that the proposed deep reconstruction network algorithm can be used in face image recognition, FE, and matching, and that it has strong anti-interference capability.

1. Introduction

FR technology has been extensively adopted in identity recognition; it mainly detects biological features in the face for recognition, with strong uniqueness and security [1]. However, a dataset made up of face images presents a highly nonlinear distribution, and if a simple classification method is applied, high classification error occurs due to individual differences [2]. Moreover, a face image is disturbed by lighting, decorations, scenes, and other factors during shooting, which makes FR extremely difficult. At present, the methods commonly used for face detection include principal component analysis, support vector machines, CNNs, and active deformation models [3].

The depth model has been applied in many fields, and its application in image recognition was the first to receive attention. Moreover, the depth model is used for FE in images, where it is far better than manual FE, and it can be effectively applied in fields where manual FE is not perfect [4]. The common feature of DL and most machine learning is the ability to extract features. In DL, a multilayer network structure is mostly applied, which can fuse the bottom features in an image to form the top features [5]. ImageNet's annual ImageNet Large-Scale Visual Recognition Challenge is the largest and most advanced image recognition competition in the world. Since 2012, when AlexNet used a DCNN to reduce the top-5 recognition error rate to 16.4%, the DCNN algorithm has been adopted in the subsequent champion recognition models [6, 7].
The rest of the paper is organized as follows. Section 2 discusses related work on the DCNN network and image FE technology. Then, in Section 3, a deep reconstruction network is proposed, and the face recognition and facial FE algorithm is constructed based on the optimized deep reconstruction network. Section 4 verifies the face recognition, FE, and matching algorithms proposed in this paper and compares them with other DCNN algorithms. Section 5 discusses and analyzes the results obtained in this paper and compares them with other relevant research results. Section 6 concludes the paper with a summary and points out future research directions.

2. Related Work

NN is a common computer model, mainly based on the traditional statistical modeling of complex relationships, used to explore the relationship between different datasets. The NN structure differs across application fields, but the NN structure most widely applied in image-related fields is the CNN. Seeliger et al. applied a DCNN to the detection of magnetoencephalogram signal maps and reconstructed a cerebral cortex activity model based on the extracted resolution source [8]. Hoang and Kang proposed a method for fault diagnosis of rolling bearings based on a CNN and vibration images built on the depth structure and found that this method needs no separate FE technology and has high diagnostic accuracy and robustness in noisy environments [9]. At present, deep learning has been widely used in the field of image analysis, especially in facial image recognition, where it has obtained excellent results. Zhang et al. proposed a convolutional neural network model based on three-stage filtering and HOG, applied it to the recognition of color face images, and obtained excellent results [10]. Goswami et al. proposed a framework for face recognition based on deep neural networks and used a characteristic anomaly filter to detect singularities; after verification on public data, they found that the model had strong face recognition robustness [11]. Isogawa et al. indicated that there was no denoising method capable of parameter adjustment in the DCNN model; they used the proportional coefficient of a soft shrinkage threshold to optimize the DCNN, which was found to be applicable to noise removal in images [12]. Nakazawa and Kulkarni proposed a method to detect and segment abnormal defect patterns using a deep convolutional encoder-decoder NN architecture and found that these models could detect unseen defect patterns in real images [13].

The basic principle of image FE is to use a computer to extract the information in the image and then judge whether the differences in the image belong to its features. An and Zhou proposed an image FE algorithm based on two-dimensional empirical mode decomposition and scale-invariant feature change and, after verification, found that it could effectively improve the speed and accuracy of FE [14]. Xu et al. proposed a model for automatically extracting noise-level perception feature vectors based on the CNN model and found that the proposed algorithm had high extraction speed and accuracy at different noise levels [15]. Fielding and Zhang used enhanced adaptive particle swarm optimization to evolve deep CNNs and, after training and verification, found that the extraction error rate was 4.78% [16].

To sum up, DCNNs are widely used in image recognition, and improved DCNNs can effectively extract the features in an image. However, there are few studies on the anti-interference of DCNNs in face image recognition. Therefore, a DCNN based on the Caffe depth framework is proposed. Matlab simulation software is used to explore the influence of different parameter settings on FR performance by constructing the DCNN model. Then, the LFW database and the self-constructed face database in this paper are combined to explore the anti-interference performance of the constructed DCNN model for face image recognition in different scenarios. The results are intended to lay a foundation for improving the efficiency of FR.

3. Methodology

3.1. Design of Deep Reconstruction Network Structure. Existing studies have shown that, as the CNN structure gradually deepens, the CNN training results become better, but the improvement also increases the amount of network computation [17]. Therefore, the inception network structure is proposed in the GoogleNet network, which can make full use of the features extracted by each layer of the network and can increase the depth and complexity of the network while keeping its computational complexity under control. In this paper, some network parameter adjustments are proposed for the basic structure of the GoogleNet network: (1) the size of the feature map input to the network should be reduced slowly through the network to avoid a bottleneck in the feature representation of the image; (2) high-dimensional features are easier to obtain than low-dimensional features and can accelerate the training speed of the model; (3) adjacent neurons in the network are closely correlated and can be aggregated into spatial relations in low-dimensional spaces.

In a convolutional neural network, a 5 ∗ 5 convolution kernel costs 25/9 times as much as a 3 ∗ 3 convolution. If a larger convolution kernel can be replaced with a superposition of multiple 3 ∗ 3 kernels, the network has fewer parameters for the same field of view. Therefore, a hypothesis is proposed in this study. First, a 3 ∗ 3 convolution filter is adopted to process a 5 ∗ 5 block to obtain a 3 ∗ 3 output feature map; after another convolution with a 3 ∗ 3 kernel, a 1 ∗ 1 output is obtained. The final amount of computation is about 18/25 of that of directly using the 5 ∗ 5 kernel, saving 28% of the computation, though the network also needs the ReLu activation function to correct the output. After that, replacing the 3 ∗ 3 kernel with 2 ∗ 2 kernels can save 11% of the computation, but using asymmetric convolution kernels is better: in this study, cascaded 1 ∗ 3 and 3 ∗ 1 kernels are used to replace 3 ∗ 3 kernels, saving 33% of the computation, and larger kernels can be decomposed in the same way. Based on this, an improved Inception structure is obtained, as shown in Figure 1.
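As a quick check of the savings quoted above, the following sketch (ours, not the authors' code; it counts multiplies per output element only, ignoring channel counts and padding) reproduces the 18/25 and 33% figures:

```python
# Per-output-element multiply counts for the kernel factorizations above.
# Assumption: single channel, costs compared per output element.

cost_5x5 = 5 * 5                 # direct 5x5 convolution: 25 multiplies
cost_two_3x3 = 3 * 3 + 3 * 3     # two stacked 3x3 convolutions: 18 multiplies
cost_1x3_3x1 = 1 * 3 + 3 * 1     # cascaded 1x3 and 3x1 kernels: 6 multiplies

print(cost_two_3x3 / cost_5x5)   # 0.72 = 18/25, i.e., a 28% saving
print(cost_1x3_3x1 / (3 * 3))    # 0.667, i.e., about a 33% saving vs one 3x3
```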
Figure 1: Improved inception structure. (The block combines branches of 1 ∗ 1 convolutions, cascaded n ∗ 1 and 1 ∗ n convolutions, and 3 ∗ 3 max pooling, merged by filter concatenation.)

The structure of ResNet can have up to 152 layers [18]. This model adds an identity high-speed channel to a simply stacked shallow network, as shown in Figure 2. On the left is the ordinary network, whose output value H(x) represents an arbitrary ideal feature transformation. The residual network on the right outputs H(x) = F(x) + x, and what the network needs to fit is the residual F(x). Adding an identity shortcut connection lets the mapping of the input and the input itself be superimposed. The output formula of the residual unit is y = F(x, {W_i}) + x, where x is the input, y is the output, and F(x, {W_i}) is the residual mapping obtained by learning. Theoretically, both networks can approximate any function, but if the optimal solution of the network is close to the identity mapping, it is easier to optimize using the residual network; moreover, if gradient dispersion appears in the convolution branch, the features and gradients in the network can still flow through the identity-mapping branch, which ensures reliable information transmission.

Figure 2: Differences in the structure of different networks. The left is an ordinary network, and the right is a residual network in ResNet.

The convolution layers in the convolution branch F(x, {W_i}) of the residual unit can be composed of multiple layers, and the residual unit can also be modified by scaling the mapping branch, the convolution kernel size, dropout, and the location of the activation function. Therefore, a new DCNN structure is constructed based on the inception structure of the GoogleNet network and the residual structure of the ResNet network, to improve the network's computational efficiency and information circulation. The combined network structure is shown in Figure 3. The basic parameters of the deep reconstruction network constructed in this paper are shown in Table 1, where the dropout parameter is x; this parameter is adjusted continuously later to select the optimal dropout value for the final test.
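To make the residual computation concrete, here is a minimal sketch (ours; dense layers and random weights stand in for the paper's convolutional branches) of a unit computing y = F(x, {W_i}) + x:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def residual_unit(x, w1, w2):
    # residual branch F(x, {W1, W2}): two weight layers with ReLu in between
    f = relu(x @ w1) @ w2
    # identity shortcut: add the input back, then apply the final ReLu
    return relu(f + x)

x = rng.normal(size=(4, 64))                 # batch of 4 feature vectors
w1 = rng.normal(scale=0.1, size=(64, 64))    # placeholder weights
w2 = rng.normal(scale=0.1, size=(64, 64))
print(residual_unit(x, w1, w2).shape)        # (4, 64): the shortcut preserves shape
```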
Deep neural networks often use Softmax as a classifier; the triplet loss function is a classifier function proposed by Google for face recognition CNNs. Studies have shown that its recognition rate on the LFW dataset has reached 99.63% and that its recognition rate on the YouTube dataset has reached 95.12% [19]. Therefore, the impact of different classifiers on the model's face recognition rate is analyzed here.

In this paper, a triplet is composed of a reference sample (anchor) drawn at random from the training set, a sample of the same category (positive), and a sample of a different category (negative). Assuming that the features extracted from the three samples of a triplet are f(x_i^1), f(x_i^2), and f(x_i^3), respectively, the goal of optimization is as follows:

  ‖f(x_i^1) − f(x_i^2)‖² + α < ‖f(x_i^1) − f(x_i^3)‖²,  (1)

in which α is the minimum interval between the Euclidean distance of sample features of the same category and the Euclidean distance of sample features of different categories. At this time, the mathematical expression of the triplet loss is as follows:

  L = Σ_{i=1}^{N} [ ‖f(x_i^1) − f(x_i^2)‖² − ‖f(x_i^1) − f(x_i^3)‖² + α ]_+,  (2)

where [·]_+ denotes the positive part. In the training process of the neural network, the gradient corresponding to each feature is as follows:

  ∂L/∂f(x_i^1) = 2 (f(x_i^3) − f(x_i^2)),  (3)

  ∂L/∂f(x_i^2) = 2 (f(x_i^2) − f(x_i^1)),  (4)

  ∂L/∂f(x_i^3) = 2 (f(x_i^1) − f(x_i^3)).  (5)
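A minimal sketch of Eqs. (2)-(5) for a single triplet (ours; f_a, f_p, and f_n stand for the anchor, positive, and negative features f(x_i^1), f(x_i^2), and f(x_i^3), and the margin value is illustrative):

```python
import numpy as np

def triplet_loss_and_grads(f_a, f_p, f_n, alpha=0.2):
    # hinge term of Eq. (2) for one triplet
    loss = np.sum((f_a - f_p) ** 2) - np.sum((f_a - f_n) ** 2) + alpha
    if loss <= 0.0:
        # triplet already satisfies Eq. (1): no loss, no gradient
        z = np.zeros_like(f_a)
        return 0.0, z, z, z
    g_a = 2.0 * (f_n - f_p)   # Eq. (3): gradient w.r.t. the anchor feature
    g_p = 2.0 * (f_p - f_a)   # Eq. (4): gradient w.r.t. the positive feature
    g_n = 2.0 * (f_a - f_n)   # Eq. (5): gradient w.r.t. the negative feature
    return loss, g_a, g_p, g_n

rng = np.random.default_rng(0)
f_a, f_p, f_n = (rng.normal(size=160) for _ in range(3))
print(triplet_loss_and_grads(f_a, f_p, f_n)[0])
```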
Figure 3: (a) ResNet-Inception-1 structure; (b) ResNet-Inception-2 structure; (c) ResNet-Inception-3 structure. (Each block combines 1 ∗ 1, 3 ∗ 3, 1 ∗ 7/7 ∗ 1, or 1 ∗ 3/3 ∗ 1 convolution branches with an identity sum, following Figures 1 and 2.)

In order to improve the optimization efficiency of the model, an online method is used to select the combination of triplets. At the same time, in order to prevent the model from falling into a local optimum, α is removed when selecting the negative sample, so x_i^n only needs to satisfy

  ‖f(x_i^1) − f(x_i^2)‖² < ‖f(x_i^1) − f(x_i^3)‖².  (6)

Center loss reduces the distance between features of the same category [20]. Center loss can be defined as follows:

  L_C = (1/2) Σ_{i=1}^{m} ‖x_i − c_{y_i}‖²,  (7)

where L_C is the center loss, c_{y_i} is the feature center of the category of the i-th sample, x_i is the feature before the full connection layer, and m is the number of samples in a training batch.
Table 1: Basic parameters of the ResNet-Inception network structure.

Structure            Parameter   Output image size
Conv-1               3 ∗ 3       147 ∗ 147 ∗ 32
Conv-2               3 ∗ 3       145 ∗ 145 ∗ 32
Conv-3               3 ∗ 3       145 ∗ 145 ∗ 64
Pool-1               —           73 ∗ 73 ∗ 64
Conv-4               1 ∗ 1       73 ∗ 73 ∗ 80
Conv-5               3 ∗ 3       70 ∗ 70 ∗ 190
Conv-6               3 ∗ 3       33 ∗ 33 ∗ 256
ResNet-Inception-1   5           33 ∗ 33 ∗ 256
Reduction-1          —           16 ∗ 16 ∗ 512
ResNet-Inception-2   10          16 ∗ 16 ∗ 512
Reduction-2          —           8 ∗ 8 ∗ 1024
ResNet-Inception-3   5           8 ∗ 8 ∗ 1024
Pool-2               —           1792
Dropout              x           1792

The center c_{y_i} is updated as the sample set fed to the network is updated, and the update rule is as follows:

  c_u^{v+1} = c_u^v − [ Σ_{i=1}^{n} δ(y_i = u) · (c_{y_i} − x_i) ] / [ 1 + Σ_{i=1}^{n} δ(y_i = u) ],  (8)

in which δ(·) takes the value 0 or 1.

Then, the error function of the network can be defined as follows:

  LOSS = LOSS_s + λ LOSS_c,  (9)

in which λ is the weight of the center loss in the total loss function. When λ is 0, the classifier of the network reduces to Softmax.
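The following sketch (ours; random features stand in for network outputs) implements the center loss of Eq. (7), the center update of Eq. (8), and the combined objective of Eq. (9):

```python
import numpy as np

def center_loss(feats, labels, centers):
    # Eq. (7): half the summed squared distance of each feature to its class center
    return 0.5 * np.sum((feats - centers[labels]) ** 2)

def update_center(centers, feats, labels, u):
    # Eq. (8): pull center u toward the class-u features seen in this batch;
    # the boolean mask plays the role of delta(y_i = u)
    mask = labels == u
    shift = np.sum(centers[u] - feats[mask], axis=0) / (1.0 + mask.sum())
    centers[u] = centers[u] - shift
    return centers

def total_loss(loss_softmax, loss_center, lam):
    # Eq. (9): LOSS = LOSS_s + lambda * LOSS_c; lam = 0 recovers plain Softmax
    return loss_softmax + lam * loss_center

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 160))        # batch of 8 features
labels = rng.integers(0, 4, size=8)      # 4 classes
centers = np.zeros((4, 160))
print(center_loss(feats, labels, centers))
centers = update_center(centers, feats, labels, u=0)
```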
3.2. Face FE and Matching Based on Deep Reconstruction Network. In the process of taking photos, people may not fully expose their faces because of the camera angle, light, mood, and equipment. In order to detect faces at different positions or angles in the image, the scale of the sliding window or of the input image must first be changed. The sliding window also needs to scan the input image at a certain step size w; as w increases, the number of judgment windows and the amount of computation in the model network decrease geometrically. When the image is input into the DCNN, there is no need to preprocess it. However, in order to better detect faces of different sizes, the original input image is subsampled and scaled at different ratios, yielding an image pyramid, as shown in Figure 4.

In this paper, the DCNN processes each face image in the image pyramid in input order, marking the potential regions of the detected face image and recording the coordinates and size of the specific face area. After all images are processed, all regions of response points in the detected images are reversely mapped to the original face image given to the input layer, and the overlapping areas of the image are fused to obtain the final result. Subsequently, face straightening is performed at the face alignment stage, and the face features in different poses are extracted, mainly partial images around the eyes, the corners of the mouth, and the tip of the nose, as shown in Figure 5.

Subsequently, the Euclidean distance [21] and the learned feature matching algorithm are applied to feature matching in the face image; the matching process is shown in Figure 6. In the first step, the sample pairs that are easy to judge must be judged quickly. The Euclidean distance is used for the similarity between samples, calculated as the absolute distance between two points in space:

  d(u, v) = sqrt( Σ_{i=1}^{n} (u_i − v_i)² ).  (10)

In the second step, on the premise of ensuring the feature recognition rate in the face image, a learning algorithm with a higher recognition rate is used to model the features in the face image, and the similarity of the complete face features is matched to realize the refined judgment of the sample. As Figure 5 shows, five local features are extracted from the face image, giving a total of six 160-dimensional feature vectors, which are spliced into a 960-dimensional facial feature vector; the spliced feature is then reduced to 160 dimensions again by PCA dimension reduction. Besides, in the second step, the joint Bayesian method is used to calculate the similarity between samples. Assuming that the two faces follow the joint distribution of x1 and x2, respectively, and that both obey a zero-mean Gaussian distribution, the covariance between the two face images is expressed as follows:

  cov(u_i, v_j) = cov(μ_i, μ_j) + cov(ε_i, ε_j),  (11)

in which μ is the identity of the person and ε is the intrinsic variation of the face itself (light, expression, and posture).
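A sketch of the first, Euclidean stage and the splice-and-reduce step (ours; the PCA projection is a random stand-in for one fitted on training features, and the decision threshold is illustrative):

```python
import numpy as np

def euclidean(u, v):
    # Eq. (10): absolute distance between two feature vectors
    return np.sqrt(np.sum((u - v) ** 2))

def splice_and_reduce(face_feats, pca_w):
    # six 160-d vectors (whole face plus five local patches) -> 960-d -> 160-d
    spliced = np.concatenate(face_feats)   # shape (960,)
    return spliced @ pca_w                 # shape (160,)

rng = np.random.default_rng(0)
feats_a = [rng.normal(size=160) for _ in range(6)]
feats_b = [rng.normal(size=160) for _ in range(6)]
pca_w = rng.normal(scale=0.03, size=(960, 160))  # placeholder PCA projection

d = euclidean(splice_and_reduce(feats_a, pca_w),
              splice_and_reduce(feats_b, pca_w))
print("same person" if d < 1.0 else "different person")  # illustrative threshold
```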
Figure 4: Flow chart of face detection in the face image.

Figure 5: Extraction of face feature points and subfigures of each area.

Figure 6: Matching process of face feature points.
3.3. Evaluation Indexes for the Testing and Identification of the Deep Reconstruction Network Algorithm. In this paper, Matlab software is used to simulate the deep reconstruction network, and the LFW dataset is used for training and testing. The experiments are carried out on the TensorFlow platform with Linux as the operating system and 2 ∗ TITAN X as the GPU. The LFW dataset has a total of 13233 face images of 5749 people, each image labeled with the corresponding name; most people have only one picture. The size of each image is 250 × 250; most are color images, but there are also a few black-and-white face pictures. The face images in the dataset are randomly divided into a training set of 10,000 face images and a verification set of 3233 face images. When training the model, the effects of different parameter settings (learning rate η, attenuation coefficient λ, training optimization algorithm, and dropout) on the training effect of the model are analyzed first, and the loss value, accuracy value, overfitting rate, and training time are adopted to evaluate the training effect. The model with the optimal parameter settings is then applied to the verification set, and the mean average precision (mAP) value is used to evaluate the recognition effect.

In order to evaluate the effect of the proposed algorithm on face FE, 10140 images of 1150 people from the LFW dataset are selected, and 180 pairs of positive face samples and 180 pairs of negative face samples are randomly selected as test sample pairs. Then, the constructed network is used to perform face FE, and the simplest Euclidean distance is used to match feature similarity; classification accuracy and matching accuracy are used to evaluate the effectiveness of face FE. Subsequently, these 10140 images are segmented further: the images of 1050 people are used to train the facial feature matching algorithm, and, for the images of the remaining 100 individuals, 200 pairs of positive samples and 200 pairs of negative samples are randomly selected for verification. The recognition accuracy rate, positive sample error rate, negative sample error rate, pairing correct rate, and average pairing time are used to evaluate the matching effect of face features.

In the field of target detection and recognition, mAP is often used to evaluate the effect of algorithm detection and recognition [22]. In this paper, basic concepts such as error rate, accuracy, precision, recall, and average precision are first introduced. The error rate and accuracy rate are the most commonly used measurement terms in classification evaluation: the error rate is the ratio of wrongly classified samples to the total number of samples, and the accuracy rate is the ratio of correctly classified samples to the total number of samples. Assuming that the sample dataset is S, a sample is defined as u_i, the type of the sample is v_i, and the algorithm's prediction result is f(u_i), the error rate and accuracy rate are calculated as follows:

  Error(f; S) = (1/m) Σ_{i=1}^{m} I(f(u_i) ≠ v_i),  (12)

  Acc(f; S) = (1/m) Σ_{i=1}^{m} I(f(u_i) = v_i) = 1 − Error(f; S).  (13)

The precision rate (Pre) and recall rate (Recall) are calculated as follows:

  Pre = TP / (TP + FP),  (14)

  Recall = TP / (TP + FN),  (15)

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives. Using the precision rate as the vertical axis and the recall rate as the horizontal axis, a P-R curve is obtained. The area under the P-R curve is the average precision (AP), and mAP is the average of the AP values over each classification task. In addition, the overfitting ratio (OR) is used to evaluate the algorithm. The calculation equation of OR is as follows:

  OR = TrainAcc / ValAcc,  (16)

in which TrainAcc is the training accuracy rate and ValAcc is the verification accuracy rate.
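A minimal sketch of Eqs. (12)-(16) (ours; the indicator I(·) is realized by boolean comparisons, AP is computed as the area under a threshold-swept P-R curve, and the printed values are example inputs):

```python
import numpy as np

def error_rate(pred, y):                 # Eq. (12)
    return np.mean(pred != y)

def accuracy(pred, y):                   # Eq. (13)
    return 1.0 - error_rate(pred, y)

def precision_recall(pred, y):           # Eqs. (14) and (15), positive class = 1
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    pre = tp / (tp + fp) if tp + fp > 0 else 0.0
    rec = tp / (tp + fn) if tp + fn > 0 else 0.0
    return pre, rec

def average_precision(scores, y):
    # area under the P-R curve obtained by sweeping a threshold over the scores
    order = np.argsort(-scores)
    hits = y[order]
    tp = np.cumsum(hits)
    pre = tp / np.arange(1, len(hits) + 1)
    rec = tp / max(hits.sum(), 1)
    return np.trapz(pre, rec)

def overfitting_ratio(train_acc, val_acc):   # Eq. (16)
    return train_acc / val_acc

print(overfitting_ratio(0.996, 0.990))       # example values, roughly 1.006
```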
4. Results

4.1. Comparison of Recognition Rate Based on the Deep Reconstruction Network Algorithm. By adjusting the learning rate η, attenuation coefficient λ, training method, and dropout, the impact of different parameters on the recognition accuracy of the DCNN model is explored; the database is used to train the DCNN model, and the maximum number of iterations is set to 1000. In the experiment, the activation function of the network is always kept as ReLU. First, the effects of different learning rates η on the model recognition rate are compared; the results are shown in Figures 7 and 8. Figure 7 shows that the recognition accuracy of the algorithm is highest when η = 0.0004 and lowest when η = 0.001. Figure 8 shows that the loss value is smallest when η = 0.0004 and largest when η = 0.001. As η gradually increases, the accuracy value gradually decreases and the loss value gradually increases.

Then, the effects of different learning rates on the face recognition performance of the algorithm are quantitatively compared. Table 2 shows that when η = 0.0004, the algorithm has the largest accuracy value (99.03%), the smallest loss value (0.047), and the shortest training time (352 s); when η = 0.0007, the algorithm has the lowest overfitting rate (1.005). In summary, the algorithm performs best when η = 0.0004, so follow-up tests are conducted on this basis.

Next, the effect of different attenuation coefficients λ on the model recognition rate is compared; the results are shown in Figures 9 and 10. Figure 9 shows that the recognition accuracy is highest when λ = 0.0001 and lowest when λ = 0.09. Figure 10 shows that the loss value is smallest when λ = 0.0001 and largest when λ = 0.09. As λ gradually increases, the accuracy value gradually decreases and the loss value gradually increases.

Then, the effects of different attenuation coefficients on the face recognition performance of the algorithm are quantitatively compared. Table 3 shows that when λ = 0.0001, the accuracy value of the algorithm is the largest (99.03%) and the loss value is the smallest (0.047); when λ = 0.0009, the training time of the algorithm is the shortest (342 s); and when λ = 0.0001 and 0.0009, the overfitting rate of the algorithm is the smallest (1.006). In summary, the algorithm performs best when λ = 0.0001, so follow-up tests are conducted on this basis, and the classifier
used in this model is determined to be the triplet loss function.

Figure 7: Comparison of recognition accuracy for different learning rates.

Figure 8: Comparison of training loss values for different learning rates.

Table 2: Effects of different learning rates on the training effect of the deep reconstruction network algorithm.

η        Acc (%)   Loss    Training time (s)   Overfitting rate
0.0004   99.03     0.047   352                 1.006
0.0005   96.74     0.059   354                 1.006
0.0006   93.15     0.062   368                 1.082
0.0007   90.31     0.120   370                 1.005
0.0008   88.07     0.115   400                 1.011
0.0009   86.41     0.170   398                 1.026
0.001    85.05     0.206   405                 1.058

Figure 9: Comparison of recognition accuracy for different attenuation coefficients.

Figure 10: Comparison of training loss values for different attenuation coefficients.

Table 3: Influence of different attenuation coefficients on the training effect of the deep reconstruction network algorithm.

λ        Acc (%)   Loss    Training time (s)   Overfitting rate
0.0001   99.03     0.047   352                 1.006
0.0009   97.63     0.048   342                 1.006
0.001    95.17     0.102   358                 1.037

After comparing the effects of different training methods on the model recognition rate, the results are shown in Figures 11 and 12. Figure 11 shows that the recognition accuracy of the algorithm is highest when the training method is SGD and lowest when the training method is Adagrad. Figure 12 shows that the loss value is smallest with SGD and largest with Adagrad.

Then, the effects of different training methods on the face recognition performance of the algorithm are quantitatively compared. Table 4 shows that with SGD the algorithm has the largest Acc value (99.03%), the smallest loss value (0.047), and the lowest overfitting rate (1.006); with RMSprop, the training time of the algorithm is the shortest (351 s). In summary, the algorithm performs best with SGD, so follow-up tests are performed on this basis.

Finally, the impact of different dropout values on the model recognition rate is compared; the results are shown in Figures 13 and 14. Figure 13 shows that the recognition accuracy is highest when dropout = 0.1 and lowest when dropout = 0.6. Figure 14 shows that the loss value is smallest when dropout = 0.1 and largest when dropout = 0.6. As the dropout value gradually increases, the accuracy value of the algorithm gradually decreases and the loss value gradually increases.

Then, the effects of different dropout values on the face recognition performance of the algorithm are quantitatively compared. Table 5 shows that when dropout = 0.1, the algorithm has the largest Acc value (99.03%) and the smallest loss value (0.047), with the lowest overfitting rate (1.006) shared with dropout = 0.2; when dropout = 0.6, the training time of the algorithm is the shortest (337 s). In summary, the algorithm performs best when dropout = 0.1, so follow-up tests are conducted on this basis.

With η = 0.0004 and λ = 0.0001 set in the construction of the deep reconstruction network, SGD selected as the training method, and dropout = 0.1, the effects of different activation functions on the model recognition rate are then compared, as shown in Figures 15 and 16. Figure 15 shows that the recognition accuracy rate is highest when the activation function is ReLu and lowest when the activation function is sigmoid.
Figure 16 shows that the loss value is lowest when the activation function is ReLu and highest when the activation function is sigmoid.

Then, the effects of different activation functions on the face recognition performance of the algorithm are quantitatively compared. Table 6 shows that when the activation function is ReLu, the Acc value of the algorithm is the largest (99.03%), the loss value is the smallest (0.047), and the overfitting rate is the lowest (1.006); when the activation function is Tanh, the training time of the algorithm is the shortest (341 s). In summary, the algorithm performs best when the
activation function is ReLu, so follow-up tests are carried out on this basis.

Based on the results obtained above, the deep reconstruction network constructed in this paper is set to η = 0.0004, λ = 0.0001, SGD as the training method, and dropout = 0.1 for face recognition verification, and its recognition effect is compared with the LetNet-5, GoogleNet, AlexNet, VGG-16, VGG-19, and ResNet networks. It is evident from Figure 17 that the area under the P-R curve of the algorithm constructed in this paper is the largest; that is, its mAP value is the largest. Ranking the mAP values of the different models gives: the algorithm proposed in this paper > ResNet > AlexNet > VGG-19 > GoogleNet > VGG-16 > LetNet-5.

4.2. Face FE and Matching Verification Based on the Deep Reconstruction Network Algorithm. The face recognition and FE algorithm proposed in this paper is applied to the LFW dataset to extract and match face features; the results are shown in Figure 18. It is evident that the proposed algorithm can effectively extract the features in the face image and complete the matching of the same features, thereby realizing the recognition and detection of the same face.

Then, the deep reconstruction network algorithm constructed in this paper is applied to FE and feature matching in face images, and its performance is compared with other models. It is evident from Table 7 that the construction method of this paper achieves a classification accuracy of 97.94% on the training set, which is 2.1%, 1.16%, and 0.94% higher than the LetNet-5, VGG-16, and VGG-19 algorithms, respectively, but 0.14%, 0.58%, and 0.46% lower than the GoogleNet, AlexNet, and ResNet algorithms, respectively. On the verification set, the correct matching rate of the algorithm constructed in this paper is 84.72%, which is 6.94%, 2.5%, and 1.11% higher than the LetNet-5, VGG-16, and VGG-19 algorithms, respectively, but 0.56%, 1.39%, and 1.95% lower than the GoogleNet, AlexNet, and ResNet algorithms, respectively.

The matching effect of the features extracted from the face image is then compared; the comparison results of the total recognition rate are shown in Table 8.
It is evident that, after the joint Bayesian method is used in the algorithm proposed in this paper, the correct pair recognition rate on the verification set is the highest (88.75%), which is 4%, 1.25%, 0.75%, 3.75%, 3.25%, and 0.25% higher than the LetNet-5, GoogleNet, AlexNet, VGG-16, VGG-19, and ResNet algorithms, respectively.

The effects of using only the joint Bayesian method and of the algorithm in this paper on the matching of face features are compared, and the results are shown in Table 9. It is evident that the algorithm proposed in this paper has a faster matching time (206.44 s) and a higher correct matching rate (88.75%) than the joint Bayesian method.
Figure 11: Comparison of recognition accuracy of six training methods.

Figure 12: Comparison of training loss values of six training methods.

Figure 13: Comparison of recognition accuracy with different dropout values.

Figure 14: Comparison of training loss values with different dropout values.

Table 4: Influence of different training methods on the training effect of the deep reconstruction network algorithm.

Training method   Acc (%)   Loss    Training time (s)   Overfitting rate
Adadelta          97.45     0.062   357                 1.010
Adam              97.05     0.087   388                 1.012
Nadam             96.87     0.132   438                 1.016
RMSprop           96.14     0.147   351                 1.129
SGD               99.03     0.047   352                 1.006
Adagrad           94.91     0.187   422                 1.158

Table 5: Effect of different dropout values on the training effect of the deep reconstruction network algorithm.

Dropout   Acc (%)   Loss    Training time (s)   Overfitting rate
0.1       99.03     0.047   352                 1.006
0.2       98.83     0.057   347                 1.006
0.3       97.20     0.087   348                 1.009
0.4       97.05     0.112   351                 1.017
0.5       96.23     0.127   340                 1.027
0.6       95.93     0.153   337                 1.518

5. Discussion

Previous studies have shown that different parameter settings have a certain impact on the recognition rate of the DCNN model.
Figure 15: Comparison of recognition accuracy of six activation functions.

Figure 16: Comparison of training loss of six activation functions.

Table 6: Influence of different activation functions on the training effect of the deep reconstruction network algorithm.

Activation function   Acc (%)   Loss    Training time (s)   Overfitting rate
Sigmoid               95.17     0.091   382                 1.007
ReLu                  99.03     0.047   352                 1.006
Tanh                  98.52     0.053   341                 1.007
LeakyReLu             97.11     0.073   350                 1.006
ELU                   96.78     0.087   349                 1.007
MaxOut                97.45     0.082   367                 1.006

In this paper, the effects of different learning rates, training methods, attenuation coefficients, and dropout on the recognition performance of the algorithm are compared. Studies have shown that too large a learning rate makes the algorithm unstable during the learning process, which in turn affects its performance [23]; this is consistent with the result of this paper that the learning rate is negatively correlated with the performance of the proposed algorithm. Then, the effects of different training methods on the recognition rate of the DCNN model are compared. The RMSprop-based training method increases the probability of overfitting, and the Nadam-based training method adds the Nesterov momentum parameter, so training takes more time. Previous studies have shown that adjusting the dropout discarding rate among the many network parameters can solve the problem of overfitting in the model, but excessively increasing the dropout discarding rate reduces the recognition accuracy of image classification and increases the overfitting probability [24]; this is basically consistent with the result of this paper that dropout is negatively correlated with the recognition performance of the constructed face recognition algorithm. At the same time, the effect of different attenuation coefficient settings on the recognition performance of the proposed algorithm is compared, and the attenuation coefficient is found to be negatively correlated with the recognition rate of the proposed algorithm. This is because an excessive weight attenuation coefficient destroys the stability of the algorithm during learning, so choosing the smallest possible attenuation coefficient ensures the stability of the face recognition algorithm [25].

Facial features are mainly divided into color (skin color), contour (ellipse), illumination (hair, eyes, and jaw), template
(mean and variance), transformation domain (feature representation), structure (facial symmetry), inlay (mosaic rule), and histogram features (grey distribution) [26].

Figure 17: Comparison of P-R curves of face recognition performance of different models.

Figure 18: Face image FE and matching. Different colored circles in the same picture indicate different extracted features, and the same color in different pictures of the same group indicates matching similar features.
Table 7: Comparison of facial FE performance of different models.

Algorithm    Correct training classifications   Training classification accuracy (%)   Correctly matched verification pairs   Verification correct match rate (%)
LetNet-5     9028                               95.84                                  280                                    77.78
VGG-16       9117                               96.78                                  296                                    82.22
VGG-19       9137                               97.00                                  301                                    83.61
GoogleNet    9239                               98.08                                  307                                    85.28
AlexNet      9281                               98.52                                  310                                    86.11
ResNet       9269                               98.40                                  312                                    86.67
Our method   9226                               97.94                                  305                                    84.72

Table 8: Comparison of total recognition rates of facial features of different models.

Algorithm    Correctly recognized pairs   Correct pair recognition rate (%)   Positive-pair error rate (%)   Negative-pair error rate (%)
LetNet-5     339                          84.75                               13.5                           17
VGG-16       340                          85                                  14.5                           15.5
VGG-19       342                          85.5                                15                             14
GoogleNet    350                          87.5                                14                             11
AlexNet      352                          88                                  11.5                           12.5
ResNet       354                          88.5                                10.5                           12.5
Our method   355                          88.75                               11                             11.5

Table 9: Comparison of the matching effect of the joint Bayesian method and the proposed algorithm.

Algorithm     Matching time (s)   Correctly matched pairs   Correct match rate (%)
Joint Bayes   240.15              350                       87.5
Our method    206.44              355                       88.75

In the process of face feature detection, classification errors are caused by factors such as the highly nonlinear distribution (e.g., facial expression and color difference), ornaments (beard, glasses, and hats), expression (facial muscle movement), light (brightness and angles), image quality (resolution), and complex scenarios (the number of faces and the gaps between them) [27]. Xu et al. proposed a semisupervised method for FR on the LFW and YTF databases and found that the recognition rates of the algorithm were 98.63% and 91.76%, respectively [28]. In this paper, the constructed deep reconstruction network is applied to the FE of face images in the LFW data; compared to the LetNet-5 [29], VGG-16 [30], and VGG-19 [31] algorithms, the proposed algorithm has a higher matching accuracy rate, though it is slightly lower than the GoogleNet [32], AlexNet [33], and ResNet [34] algorithms. However, the algorithm proposed in this paper can reduce the complexity of model generation while preventing overfitting. Finally, the joint Bayesian method [35] is used to match face features. The correct matching rate of the proposed algorithm for face features is found to be higher than that of LetNet-5, VGG-16, VGG-19, GoogleNet, AlexNet, and ResNet. At the same time, compared with the joint Bayes method alone, it can reduce the time consumption of feature matching [36] and improve the matching accuracy [37], indicating that the algorithm proposed in this paper can quickly perform large-scale face image feature matching and has certain advantages compared with other DCNN models.

6. Conclusion

In this paper, a new deep reconstruction network is constructed using the Inception structure of the GoogleNet network and the residual structure of the ResNet network, and it is applied to face recognition for optimal parameter selection of the algorithm. Based on this algorithm, portrait pyramid and local feature segmentation are applied to construct a face FE algorithm, which is found to have a better FE effect than the general DCNN models. Finally, based on the algorithm, face feature matching is achieved using the joint Bayesian method, and the results are verified. The results show that the model constructed in this paper can effectively identify face images with different interference factors. However, the model is trained and verified only on the images in the database; a specific FR system needs to be developed to explore its application effect in video portrait recognition. In conclusion, the results support the subsequent development of FR systems based on DCNNs and the improvement of the efficiency of FR.
Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also form part of an ongoing study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] E. Viccaro, E. Sands, and C. Springer, "Spaced retrieval using static and dynamic images to improve face-name recognition: alzheimer's dementia and vascular dementia," American Journal of Speech-Language Pathology, vol. 28, no. 3, pp. 1184–1197, 2019.
[2] L. Li, P. L. Correia, and A. Hadid, "Face recognition under spoofing attacks: countermeasures and research directions," IET Biometrics, vol. 7, no. 1, pp. 3–14, 2018.
[3] R. Starrfelt, S. K. Klargaard, A. Petersen, and C. Gerlach, "Reading in developmental prosopagnosia: evidence for a dissociation between word and face recognition," Neuropsychology, vol. 32, no. 2, pp. 138–147, 2018.
[4] H. Qin and M. A. El-Yacoubi, "Deep representation-based feature extraction and recovering for finger-vein verification," IEEE Transactions on Information Forensics and Security, vol. 12, no. 8, pp. 1816–1829, 2017.
[5] Z. Wang, C. Song, and T. Chen, "Deep learning based monitoring of furnace combustion state and measurement of heat release rate," Energy, vol. 131, pp. 106–112, 2017.
[6] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 23, no. 2, pp. 1097–1105, 2012.
[8] K. Seeliger, M. Fritsche, U. Güçlü et al., "Convolutional neural network-based encoding and decoding of visual object recognition in space and time," NeuroImage, vol. 180, pp. 253–266, 2017.
[9] D. T. Hoang and H. J. Kang, "Rolling element bearing fault diagnosis using convolutional neural network and vibration image," Cognitive Systems Research, vol. 53, pp. 42–50, 2018.
[10] M. Zhang, S. Khan, and H. Yan, "Deep eigen-filters for face recognition: feature representation via unsupervised multi-structure filter learning," Pattern Recognition, vol. 100, p. 107176, 2020.
[11] G. Goswami, A. Agarwal, N. Ratha, R. Singh, and M. Vatsa, "Detecting and mitigating adversarial perturbations for robust face recognition," International Journal of Computer Vision, vol. 127, no. 6-7, pp. 719–742, 2019.
[12] K. Isogawa, T. Ida, T. Shiodera et al., "Deep shrinkage convolutional neural network for adaptive noise reduction," IEEE Signal Processing Letters, vol. 25, no. 2, pp. 224–228, 2017.
[13] T. Nakazawa and D. V. Kulkarni, "Anomaly detection and segmentation for wafer defect patterns using deep convolutional encoder-decoder neural network architectures in semiconductor manufacturing," IEEE Transactions on Semiconductor Manufacturing, vol. 32, no. 2, pp. 250–256, 2019.
[14] F.-P. An and X.-W. Zhou, "BEMD-SIFT feature extraction algorithm for image processing application," Multimedia Tools and Applications, vol. 76, no. 11, pp. 13153–13172, 2017.
[15] S. P. Xu, T. Liu, G. Z. Zhang et al., "A two-stage noise level estimation using automatic feature extraction and mapping model," IEEE Signal Processing Letters, vol. 26, no. 1, pp. 179–183, 2018.
[16] B. Fielding and L. Zhang, "Evolving image classification architectures with enhanced particle swarm optimisation," IEEE Access, vol. 6, pp. 68560–68575, 2018.
[17] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: residual learning of deep cnn for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2016.
[18] E. Kang, W. Chang, J. Yoo, and J. C. Ye, "Deep convolutional framelet denosing for low-dose ct via wavelet residual network," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1358–1369, 2018.
[19] D. S. Trigueros, L. Meng, and M. Hartnett, "Enhancing convolutional neural networks for face recognition with occlusion maps and batch triplet loss," Image and Vision Computing, vol. 79, 2018.
[20] J. Alan and F. Guo, "Spectral-spatial feature extraction and classification by ann supervised with center loss in hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 3, pp. 1755–1767, 2019.
[21] X. Gan and Z. Haitao, "A noise-robust semi-supervised dimensionality reduction method for face recognition," Optik International Journal for Light & Electron Optics, vol. 157, pp. 858–865, 2018.
[22] U. D. Dixit and M. S. Shirdhonkar, "Face-based document image retrieval system," Procedia Computer Science, vol. 132, pp. 659–668, 2018.
[23] H. Zhao, F. Liu, H. Zhang, and Z. Liang, "Research on a learning rate with energy index in deep learning," Neural Networks, vol. 110, pp. 225–231, 2019.
[24] G. H. de Rosa, J. P. Papa, and X. S. Yang, "Handling dropout probability estimation in convolution neural networks using metaheuristics," Soft Computing, vol. 4, pp. 6147–6156, 2017.
[25] M. Tzelepi and A. Tefas, "Improving the performance of lightweight cnns for binary classification using quadratic mutual information regularization," Pattern Recognition, vol. 106, p. 107407, 2020.
[26] T. Amina, A. T. Ahmed, S. Djamel et al., "Deep learning features for robust facial kinship verification," IET Image Processing, vol. 12, no. 12, pp. 2336–2345, 2018.
[27] M. Kraemer, Q. B. Huynh, D. Wieczorek et al., "Distinctive facial features in idiopathic Moyamoya disease in Caucasians: a first systematic analysis," PeerJ, vol. 6, no. 4, Article ID e4740, 2018.
[28] W. Xu, J. Wu, S. Ding, L. Lian, and H. Chao, "Enhancing face recognition from massive weakly labeled data of new domains," Neural Processing Letters, vol. 49, no. 3, pp. 939–950, 2019.
[29] D. Wang, H. Lu, and M.-H. Yang, "Kernel collaborative face recognition," Pattern Recognition, vol. 48, no. 10, pp. 3025–3037, 2015.
[30] F. Zhu, X. Kong, H. Fu, and Q. Tian, "Pseudo-positive regularization for deep person re-identification," Multimedia Systems, vol. 24, no. 4, pp. 477–489, 2018.
[31] M. F. Hansen, M. L. Smith, L. N. Smith et al., "Towards on-farm pig face recognition using convolutional neural networks," Computers in Industry, vol. 98, pp. 145–152, 2018.
[32] N. Bi, J. Chen, and J. Tan, "The handwritten Chinese character recognition uses convolutional neural networks with the googlenet," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 11, p. 1940016, 2019.
[33] A. Aslam, K. Hayat, A. I. Umar et al., "Wavelet-based convolutional neural networks for gender classification," Journal of Electronic Imaging, vol. 28, no. 1, p. 1, 2019.
[34] Z. Lu, X. Jiang, and A. Kot, "Deep coupled resnet for low-resolution face recognition," IEEE Signal Processing Letters, vol. 25, no. 4, pp. 526–530, 2018.
[35] Z. Huang, X. Xu, J. Ni, H. Zhu, and C. Wang, "Multimodal representation learning for recommendation in internet of things," IEEE Internet of Things Journal, vol. 6, no. 6, pp. 10675–10685, 2019.
[36] M. Zhang, D. Zhang, H. Yao, and K. Zhang, "A probabilistic model of human error assessment for autonomous cargo ships focusing on human–autonomy collaboration," Safety Science, vol. 130, Article ID 104838, 2020.
[37] F. Hu and G. Wu, "Distributed error correction of EKF algorithm in multi-sensor fusion localization model," IEEE Access, vol. 8, pp. 93211–93218, 2020.