Label Embedded Dictionary Learning for Image Classification
Shuai Shao^a, Yan-Jiang Wang^{a,∗}, Bao-Di Liu^{a,∗∗}, Weifeng Liu^a, Rui Xu^a

^a College of Information and Control Engineering, China University of Petroleum, Qingdao 266580, China

arXiv:1903.03087v2 [cs.CV] 17 Apr 2019

Abstract

Recently, the label consistent K-SVD (LC-KSVD) algorithm has been successfully applied to image classification. The objective function of LC-KSVD consists of a reconstruction error, a classification error and a discriminative sparse codes error, together with an $\ell_0$-norm sparse regularization term. The $\ell_0$-norm, however, leads to an NP-hard problem. Although some methods such as orthogonal matching pursuit can help solve this problem to some extent, it is quite difficult to find the optimal sparse solution. To overcome this limitation, we propose a label embedded dictionary learning (LEDL) method that uses the $\ell_1$-norm as the sparse regularization term, so that we can avoid the hard-to-optimize problem by solving a convex optimization problem. The alternating direction method of multipliers and the blockwise coordinate descent algorithm are then exploited to optimize the corresponding objective function. Extensive experimental results on six benchmark datasets illustrate that the proposed algorithm achieves superior performance compared to some conventional classification algorithms.

Keywords: Dictionary learning, sparse representation, label embedded dictionary learning, image classification

∗ PaperID: NEUCOM-S-19-01925. Corresponding author.
∗∗ PaperID: NEUCOM-S-19-01925. Corresponding author.
Email addresses: shuaishao@s.upc.edu.cn (Shuai Shao), yjwang@upc.edu.cn (Yan-Jiang Wang), thu.liubaodi@gmail.com (Bao-Di Liu), liuwf@upc.edu.cn (Weifeng Liu), xddxxr@126.com (Rui Xu)

Preprint submitted to Neurocomputing, April 18, 2019
Figure 1: The scheme of LEDL is on the right, while that of LC-KSVD is on the left. The difference between the two methods is the sparse regularization term: LEDL uses the $\ell_1$-norm regularization term and LC-KSVD uses the $\ell_0$-norm regularization term. Compared with the $\ell_0$-norm, the sparsity constraint factor of the $\ell_1$-norm is not fixed, so the basis vectors can be selected freely for linear fitting. Thus, our proposed LEDL method can obtain smaller errors than LC-KSVD.

1. Introduction

In recent years, image classification has been a classical problem in computer vision. Many successful algorithms Yu et al. (2012b); Shi et al. (2018); Wright et al. (2009); Yu et al. (2014); Song et al. (2018); Yu et al. (2013); Wang et al. (2018); Yu et al. (2012a); Yang et al. (2009, 2017); Liu et al. (2014a, 2017); Jiang et al. (2013); Hao et al. (2017); Chan et al. (2015); Nakazawa and Kulkarni (2018); Ji et al. (2014); Xu et al. (2019); Yuan et al. (2016) have been proposed to solve the problem. Among these algorithms, one category that contributes greatly to image classification is the sparse representation based method. Sparse representation is capable of expressing the input sample features as a linear combination of atoms in an overcomplete basis set. Wright et al. (2009) proposed the sparse representation based classification (SRC) algorithm
which uses the $\ell_1$-norm regularization term to achieve impressive performance. SRC is the most representative of the sparse representation based methods. However, in traditional sparse representation based methods, training sample features are exploited directly without considering the discriminative information, which is crucial in real applications. That is to say, sparse representation based methods can gain better performance if the discriminative information is properly harnessed.

To handle this problem, the dictionary learning (DL) method is introduced to preprocess the training sample features before classification. DL is a generative model for sparse representation whose concept was first proposed by Mallat and Zhang (1993). A few years later, Olshausen and Field (1996, 1997) applied DL to natural images, and it has since been widely used in many fields such as image denoising Chang et al. (2000); Li et al. (2012, 2018), image super-resolution Yang et al. (2010); Wang et al. (2012b); Gao et al. (2018), and image classification Liu et al. (2017); Jiang et al. (2013); Chang et al. (2016). A well learned dictionary can give a significant boost in classification accuracy. Therefore, DL based methods for classification have become increasingly popular in recent years.

Specifically, two strategies have been proposed to utilise the discriminative information: i) class specific dictionary learning and ii) class shared dictionary learning. The first strategy is to learn a specific dictionary for each class, as in Wang et al. (2012a); Yang et al. (2014); Liu et al. (2016). The second strategy is to learn a shared dictionary for all classes. For example, Zhang and Li (2010) proposed the discriminative K-SVD (D-KSVD) algorithm to directly add the discriminative information into the objective function. Furthermore, Jiang et al.
(2013) proposed the label consistent K-SVD (LC-KSVD) method, which adds a label consistency term to the objective function of D-KSVD. The motivation for adding this term is to encourage training samples from the same class to have similar sparse codes and those from different classes to have dissimilar sparse codes. Thus, the discriminative ability of the learned dictionary is effectively improved. However, the sparse regularization term in LC-KSVD is the $\ell_0$-norm, which leads to an NP-hard problem Natarajan (1995). Although some greedy methods such as orthogonal matching pursuit (OMP) Tropp and Gilbert (2007) can help solve this problem to some extent, they usually find a suboptimal sparse solution instead of the optimal one. More specifically, greedy methods approach the global optimization problem by selecting basis vectors in order of reconstruction error, from small to large, repeating the selection T (the sparsity constraint factor)
times. Thus, the initialized values are crucial. As a result, the $\ell_0$-norm based sparse constraint is not conducive to finding a global minimum and hence the optimal sparse solution.

In this paper, we propose a novel dictionary learning algorithm named label embedded dictionary learning (LEDL). This method introduces the $\ell_1$-norm regularization term to replace the $\ell_0$-norm regularization of LC-KSVD. Thus, we can freely select the basis vectors for linear fitting to obtain the optimal sparse solution. In addition, $\ell_1$-norm sparse representation is widely used in many fields, so our proposed LEDL method can be extended and applied easily. We show the difference between our proposed LEDL and LC-KSVD in Figure 1. We adopt the alternating direction method of multipliers (ADMM) Boyd et al. (2011) framework and the blockwise coordinate descent (BCD) Liu et al. (2014b) algorithm to optimize LEDL. Our main contributions are threefold.

• We propose a novel dictionary learning algorithm named label embedded dictionary learning, which introduces the $\ell_1$-norm regularization term as the sparse constraint. The $\ell_1$-norm sparse constraint makes it easier to find the optimal sparse solution.
• We propose to utilize the alternating direction method of multipliers (ADMM) Boyd et al. (2011) algorithm and the blockwise coordinate descent (BCD) Liu et al. (2014b) algorithm to optimize the dictionary learning task.
• We verify the superior performance of our method on six benchmark datasets.

The rest of the paper is organized as follows. Section 2 reviews two conventional methods, SRC and LC-KSVD. Section 3.1 presents the LEDL method for image classification. The optimization approach and the convergence are elaborated in Section 3.2. Section 4 shows experimental results on six well-known datasets. Finally, we conclude this paper in Section 5.

2. Related Work

In this section, we review two related algorithms: sparse representation based classification (SRC) and label consistent K-SVD (LC-KSVD).
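Algorithm 1 Sparse representation based classification
Input: $\mathbf{X} \in \mathbb{R}^{D \times N}$, $\mathbf{y} \in \mathbb{R}^{D \times 1}$, $\alpha$
Output: $id(\mathbf{y})$
1: Code $\mathbf{y}$ with the dictionary $\mathbf{X}$ via $\ell_1$-minimization.
2: $\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \left\{ \|\mathbf{y} - \mathbf{X}\mathbf{s}\|_2^2 + 2\alpha\|\mathbf{s}\|_1 \right\}$
3: for $c = 1$; $c \le C$; $c$++ do
4: Compute the residual $e_c(\mathbf{y}) = \|\mathbf{y} - \mathbf{X}_c\hat{\mathbf{s}}_c\|_2^2$
5: end for
6: $id(\mathbf{y}) = \arg\min_c \{e_c\}$
7: return $id(\mathbf{y})$

2.1. Sparse representation based classification (SRC)

SRC was proposed by Wright et al. (2009). Assume that we have $C$ classes of training samples, denoted by $\mathbf{X}_c$, $c = 1, 2, \cdots, C$, where $\mathbf{X}_c$ is the training sample matrix of class $c$. Each column of the matrix $\mathbf{X}_c$ is a training sample feature from the $c$th class. The whole training sample matrix can be denoted as $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_C] \in \mathbb{R}^{D \times N}$, where $D$ is the dimension of the sample features and $N$ is the number of training samples. Supposing that $\mathbf{y} \in \mathbb{R}^{D \times 1}$ is a testing sample vector, the sparse representation algorithm aims to solve the following objective function:

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \left\{ \|\mathbf{y} - \mathbf{X}\mathbf{s}\|_2^2 + 2\alpha\|\mathbf{s}\|_1 \right\} \tag{1}$$

where $\alpha$ is the regularization parameter that controls the tradeoff between goodness of fit and sparseness. Sparse representation based classification then finds the class with the minimum residual error:

$$id(\mathbf{y}) = \arg\min_{c} \|\mathbf{y} - \mathbf{X}_c\hat{\mathbf{s}}_c\|_2^2 \tag{2}$$

where $id(\mathbf{y})$ represents the predicted label of $\mathbf{y}$ and $\hat{\mathbf{s}}_c$ is the sparse code of the $c$th class. The procedure of SRC is shown in Algorithm 1. Obviously, the residual $e_c$ is associated with only a few images in class $c$.

2.2. Label Consistent K-SVD (LC-KSVD)

Jiang et al. (2013) proposed LC-KSVD to encourage similarity among the representations of samples belonging to the same class in D-KSVD. The authors proposed to combine the discriminative sparse codes error with the reconstruction error and the classification error to form a unified objective function, given the discriminative sparse codes matrix $\mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2, \cdots, \mathbf{q}_N] \in \mathbb{R}^{K \times N}$, the label matrix $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \cdots, \mathbf{h}_N] \in \mathbb{R}^{C \times N}$ and the training sample matrix $\mathbf{X}$. The objective function is defined as follows:

$$\begin{aligned} \langle \mathbf{B}, \mathbf{W}, \mathbf{A}, \mathbf{S} \rangle &= \arg\min_{\mathbf{B},\mathbf{W},\mathbf{A},\mathbf{S}} \|\mathbf{X} - \mathbf{B}\mathbf{S}\|_F^2 + \lambda\|\mathbf{H} - \mathbf{W}\mathbf{S}\|_F^2 + \omega\|\mathbf{Q} - \mathbf{A}\mathbf{S}\|_F^2 \quad \text{s.t. } \|\mathbf{s}_i\|_0 < T \ (i = 1, 2, \cdots, N) \\ &= \arg\min_{\mathbf{B},\mathbf{W},\mathbf{A},\mathbf{S}} \left\| \begin{pmatrix} \mathbf{X} \\ \sqrt{\omega}\,\mathbf{Q} \\ \sqrt{\lambda}\,\mathbf{H} \end{pmatrix} - \begin{pmatrix} \mathbf{B} \\ \sqrt{\omega}\,\mathbf{A} \\ \sqrt{\lambda}\,\mathbf{W} \end{pmatrix} \mathbf{S} \right\|_F^2 \quad \text{s.t. } \|\mathbf{s}_i\|_0 < T \ (i = 1, 2, \cdots, N) \end{aligned} \tag{3}$$

where $T$ is the sparsity constraint factor, which ensures that $\mathbf{s}_i$ has no more than $T$ nonzero entries. The dictionary is $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \cdots, \mathbf{b}_K] \in \mathbb{R}^{D \times K}$, where $K > D$ is the number of atoms in the dictionary, and $\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \cdots, \mathbf{s}_N] \in \mathbb{R}^{K \times N}$ contains the sparse codes of the training sample matrix $\mathbf{X}$. $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_K] \in \mathbb{R}^{C \times K}$ is a classifier learned from the given label matrix $\mathbf{H}$; we expect $\mathbf{W}$ to return the most probable class that a sample belongs to. $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_K] \in \mathbb{R}^{K \times K}$ is a linear transformation that relies on $\mathbf{Q}$. $\lambda$ and $\omega$ are the regularization parameters that balance the contributions of the classification error and the discriminative sparse codes error to the overall objective, respectively. The procedure is shown in Algorithm 2. Here, we denote $m$ ($m = 0, 1, 2, \cdots$) as the iteration number, and $(\bullet)_m$ means the value of matrix $(\bullet)$ after the $m$th iteration.

Algorithm 2 Label Consistent K-SVD
Input: $\mathbf{X} \in \mathbb{R}^{D \times N}$, $\mathbf{H} \in \mathbb{R}^{C \times N}$, $\mathbf{Q} \in \mathbb{R}^{K \times N}$, $\lambda$, $\omega$, $T$, $K$
Output: $\mathbf{B} \in \mathbb{R}^{D \times K}$, $\mathbf{W} \in \mathbb{R}^{C \times K}$, $\mathbf{A} \in \mathbb{R}^{K \times K}$, $\mathbf{S} \in \mathbb{R}^{K \times N}$
1: Compute $\mathbf{B}_0$ by combining class-specific dictionary items for each class using K-SVD Aharon et al. (2006);
2: Compute $\mathbf{S}_0$ for $\mathbf{X}$ and $\mathbf{B}_0$ using sparse coding;
3: Compute $\mathbf{A}_0$ using $\mathbf{A} = \mathbf{Q}\mathbf{S}^T\left(\mathbf{S}\mathbf{S}^T + \mathbf{I}\right)^{-1}$;
4: Compute $\mathbf{W}_0$ using $\mathbf{W} = \mathbf{H}\mathbf{S}^T\left(\mathbf{S}\mathbf{S}^T + \mathbf{I}\right)^{-1}$;
5: Solve Eq. (3); use $\begin{pmatrix} \mathbf{B}_0 \\ \sqrt{\omega}\,\mathbf{A}_0 \\ \sqrt{\lambda}\,\mathbf{W}_0 \end{pmatrix}$ to initialize the dictionary.
6: Normalize $\mathbf{B}$, $\mathbf{A}$, $\mathbf{W}$:
   $\mathbf{B} \leftarrow \left\{ \frac{\mathbf{b}_1}{\|\mathbf{b}_1\|_2}, \frac{\mathbf{b}_2}{\|\mathbf{b}_2\|_2}, \cdots, \frac{\mathbf{b}_K}{\|\mathbf{b}_K\|_2} \right\}$
   $\mathbf{A} \leftarrow \left\{ \frac{\mathbf{a}_1}{\|\mathbf{b}_1\|_2}, \frac{\mathbf{a}_2}{\|\mathbf{b}_2\|_2}, \cdots, \frac{\mathbf{a}_K}{\|\mathbf{b}_K\|_2} \right\}$
   $\mathbf{W} \leftarrow \left\{ \frac{\mathbf{w}_1}{\|\mathbf{b}_1\|_2}, \frac{\mathbf{w}_2}{\|\mathbf{b}_2\|_2}, \cdots, \frac{\mathbf{w}_K}{\|\mathbf{b}_K\|_2} \right\}$
7: return $\mathbf{B}$, $\mathbf{W}$, $\mathbf{A}$, $\mathbf{S}$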
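To make the inputs H and Q of Equation (3) concrete, the following NumPy sketch builds both matrices from integer class labels. It assumes the convention of Jiang et al. (2013), in which every dictionary atom is associated with one class and column q_i has ones at the atoms that share the class of sample i. The helper name `build_label_matrices` and the even, class-by-class split of the K atoms are illustrative assumptions rather than details given in this paper.

```python
import numpy as np

def build_label_matrices(labels, num_classes, atoms_per_class):
    """Build the label matrix H (C x N) and the discriminative code matrix Q (K x N)."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    # H: one-hot class indicator, one column per training sample.
    H = np.zeros((num_classes, n))
    H[labels, np.arange(n)] = 1.0
    # Hypothetical atom-to-class assignment: atoms are grouped class by class,
    # so K = num_classes * atoms_per_class.
    atom_class = np.repeat(np.arange(num_classes), atoms_per_class)
    # Q[k, i] = 1 when atom k and sample i share the same class, else 0.
    Q = (atom_class[:, None] == labels[None, :]).astype(float)
    return H, Q

# Four samples from three classes, two atoms per class (K = 6).
H, Q = build_label_matrices([0, 0, 1, 2], num_classes=3, atoms_per_class=2)
```

With this layout each column of Q contains exactly `atoms_per_class` ones, which is what encourages samples of the same class to use the same group of atoms.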
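Because the LC-KSVD algorithm exploits the $\ell_0$-norm regularization term to control the sparseness, it is difficult to find the optimal sparse solution for a general image recognition task. The reason is that LC-KSVD uses the OMP method to optimise the objective function, which usually yields a suboptimal sparse solution unless perfect initial values are found.

3. Methodology

In this section, we first present our proposed label embedded dictionary learning algorithm. Then we elaborate on the optimization of the objective function.

3.1. Proposed Label Embedded Dictionary Learning (LEDL)

Motivated by the fact that the optimal sparse solution cannot be found easily with the $\ell_0$-norm regularization term, we propose a novel dictionary learning method named label embedded dictionary learning (LEDL) for image classification. This method introduces the $\ell_1$-norm regularization term to replace the $\ell_0$-norm regularization of LC-KSVD. Thus, we can freely select the basis vectors for linear fitting to obtain the optimal sparse solution. The objective function is as follows:

$$\begin{aligned} \langle \mathbf{B}, \mathbf{W}, \mathbf{A}, \mathbf{S} \rangle = \arg\min_{\mathbf{B},\mathbf{W},\mathbf{A},\mathbf{S}} \ & \|\mathbf{X} - \mathbf{B}\mathbf{S}\|_F^2 + \lambda\|\mathbf{H} - \mathbf{W}\mathbf{S}\|_F^2 + \omega\|\mathbf{Q} - \mathbf{A}\mathbf{S}\|_F^2 + 2\varepsilon\|\mathbf{S}\|_{\ell_1} \\ \text{s.t. } & \|\mathbf{B}_{\bullet k}\|_2^2 \le 1,\ \|\mathbf{W}_{\bullet k}\|_2^2 \le 1,\ \|\mathbf{A}_{\bullet k}\|_2^2 \le 1 \quad (k = 1, 2, \cdots, K) \end{aligned} \tag{4}$$

where $(\bullet)_{\bullet k}$ denotes the $k$th column vector of matrix $(\bullet)$. The $\ell_1$-norm regularization term is utilized to enforce sparsity, and $\varepsilon$ is the regularization parameter, which plays the same role as $\alpha$ in Equation (1).

3.2. Optimization of Objective Function

Although the optimization problem (4) is not jointly convex in $\mathbf{S}$, $\mathbf{B}$, $\mathbf{W}$ and $\mathbf{A}$, it is convex in each of $\mathbf{S}$ (with $\mathbf{B}$, $\mathbf{W}$, $\mathbf{A}$ fixed), $\mathbf{B}$ (with $\mathbf{S}$, $\mathbf{W}$, $\mathbf{A}$ fixed), $\mathbf{W}$ (with $\mathbf{S}$, $\mathbf{B}$, $\mathbf{A}$ fixed) and $\mathbf{A}$ (with $\mathbf{S}$, $\mathbf{B}$, $\mathbf{W}$ fixed) separately. The optimization problem can therefore be treated as four subproblems: finding the sparse codes ($\mathbf{S}$) and learning the three bases ($\mathbf{B}$, $\mathbf{W}$, $\mathbf{A}$). Here, we employ the alternating direction method of multipliers (ADMM) Boyd et al. (2011) framework to solve the first subproblem and the blockwise coordinate descent (BCD) Liu et al. (2014b) algorithm for the remaining subproblems. The complete process of LEDL is shown in Figure 2.

Figure 2: The complete process of the LEDL algorithm.

3.2.1. ADMM for finding sparse codes

While fixing $\mathbf{B}$, $\mathbf{W}$ and $\mathbf{A}$, we introduce an auxiliary variable $\mathbf{Z}$ and reformulate the LEDL problem as a linear equality-constrained problem in which each iteration has a closed-form solution. The objective function is as follows:

$$\begin{aligned} \langle \mathbf{B}, \mathbf{W}, \mathbf{A}, \mathbf{C}, \mathbf{Z} \rangle = \arg\min_{\mathbf{B},\mathbf{W},\mathbf{A},\mathbf{C},\mathbf{Z}} \ & \|\mathbf{X} - \mathbf{B}\mathbf{C}\|_F^2 + 2\varepsilon\|\mathbf{Z}\|_{\ell_1} + \lambda\|\mathbf{H} - \mathbf{W}\mathbf{C}\|_F^2 + \omega\|\mathbf{Q} - \mathbf{A}\mathbf{C}\|_F^2 \\ \text{s.t. } & \mathbf{C} = \mathbf{Z},\ \|\mathbf{B}_{\bullet k}\|_2^2 \le 1,\ \|\mathbf{W}_{\bullet k}\|_2^2 \le 1,\ \|\mathbf{A}_{\bullet k}\|_2^2 \le 1 \quad (k = 1, 2, \cdots, K) \end{aligned} \tag{5}$$

Within the ADMM framework, with $\mathbf{B}$, $\mathbf{W}$ and $\mathbf{A}$ fixed, the augmented Lagrangian function of problem (5) is written as:

$$\begin{aligned} \langle \mathbf{C}, \mathbf{Z}, \mathbf{L} \rangle = \arg\min_{\mathbf{C},\mathbf{Z},\mathbf{L}} \ & \|\mathbf{X} - \mathbf{B}\mathbf{C}\|_F^2 + \lambda\|\mathbf{H} - \mathbf{W}\mathbf{C}\|_F^2 + \omega\|\mathbf{Q} - \mathbf{A}\mathbf{C}\|_F^2 \\ & + 2\varepsilon\|\mathbf{Z}\|_{\ell_1} + 2\mathbf{L}^T(\mathbf{C} - \mathbf{Z}) + \rho\|\mathbf{C} - \mathbf{Z}\|_F^2 \end{aligned} \tag{6}$$

where $\mathbf{L} = [\mathbf{l}_1, \mathbf{l}_2, \cdots, \mathbf{l}_N] \in \mathbb{R}^{K \times N}$ is the augmented Lagrangian multiplier and $\rho > 0$ is the penalty parameter. After fixing $\mathbf{B}$, $\mathbf{W}$ and $\mathbf{A}$, we initialize $\mathbf{C}_0$, $\mathbf{Z}_0$ and $\mathbf{L}_0$ to be zero matrices. Equation (6) can be solved as follows: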
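(1) Updating $\mathbf{C}$ while fixing $\mathbf{Z}$, $\mathbf{L}$, $\mathbf{B}$, $\mathbf{W}$ and $\mathbf{A}$:

$$\mathbf{C}_{m+1} = \langle \mathbf{B}_m, \mathbf{W}_m, \mathbf{A}_m, \mathbf{C}_m, \mathbf{Z}_m, \mathbf{L}_m \rangle \tag{7}$$

The closed-form solution of $\mathbf{C}$ is

$$\mathbf{C}_{m+1} = \left( \mathbf{B}_m^T\mathbf{B}_m + \lambda\mathbf{W}_m^T\mathbf{W}_m + \omega\mathbf{A}_m^T\mathbf{A}_m + \rho\mathbf{I} \right)^{-1} \left( \mathbf{B}_m^T\mathbf{X} + \lambda\mathbf{W}_m^T\mathbf{H} + \omega\mathbf{A}_m^T\mathbf{Q} + \rho\mathbf{Z}_m - \mathbf{L}_m \right) \tag{8}$$

(2) Updating $\mathbf{Z}$ while fixing $\mathbf{C}$, $\mathbf{L}$, $\mathbf{B}$, $\mathbf{W}$ and $\mathbf{A}$:

$$\mathbf{Z}_{m+1} = \langle \mathbf{B}_m, \mathbf{W}_m, \mathbf{A}_m, \mathbf{C}_{m+1}, \mathbf{Z}_m, \mathbf{L}_m \rangle \tag{9}$$

The closed-form solution of $\mathbf{Z}$ is

$$\mathbf{Z}_{m+1} = \max\left\{ \mathbf{C}_{m+1} + \frac{\mathbf{L}_m}{\rho} - \frac{\varepsilon}{\rho}\mathbf{I},\ \mathbf{0} \right\} + \min\left\{ \mathbf{C}_{m+1} + \frac{\mathbf{L}_m}{\rho} + \frac{\varepsilon}{\rho}\mathbf{I},\ \mathbf{0} \right\} \tag{10}$$

where $\mathbf{I}$ is the identity matrix and $\mathbf{0}$ is the zero matrix.

(3) Updating the Lagrangian multiplier $\mathbf{L}$:

$$\mathbf{L}_{m+1} = \mathbf{L}_m + \rho\left( \mathbf{C}_{m+1} - \mathbf{Z}_{m+1} \right) \tag{11}$$

where the $\rho$ in Equation (11) acts as the step size of a gradient descent (GD) update and is not tied to the $\rho$ in Equation (6). To make better use of the ADMM framework, the $\rho$ in Equation (11) is therefore rewritten as $\theta$:

$$\mathbf{L}_{m+1} = \mathbf{L}_m + \theta\left( \mathbf{C}_{m+1} - \mathbf{Z}_{m+1} \right) \tag{12}$$

3.2.2. BCD for learning bases

Without considering the sparseness regularization term in Equation (5), the constrained minimization problem (4) with respect to a single column has a closed-form solution, which can be obtained by the BCD method. The objective function can be rewritten as follows:

$$\begin{aligned} \langle \mathbf{B}, \mathbf{W}, \mathbf{A} \rangle = \arg\min_{\mathbf{B},\mathbf{W},\mathbf{A}} \ & \|\mathbf{X} - \mathbf{B}\mathbf{C}\|_F^2 + \lambda\|\mathbf{H} - \mathbf{W}\mathbf{C}\|_F^2 + \omega\|\mathbf{Q} - \mathbf{A}\mathbf{C}\|_F^2 \\ & + 2\varepsilon\|\mathbf{Z}\|_{\ell_1} + 2\mathbf{L}^T(\mathbf{C} - \mathbf{Z}) + \rho\|\mathbf{C} - \mathbf{Z}\|_F^2 \\ \text{s.t. } & \|\mathbf{B}_{\bullet k}\|_2^2 \le 1,\ \|\mathbf{W}_{\bullet k}\|_2^2 \le 1,\ \|\mathbf{A}_{\bullet k}\|_2^2 \le 1 \quad (k = 1, 2, \cdots, K) \end{aligned} \tag{13}$$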
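The three ADMM updates of Section 3.2.1, Equations (8), (10) and (12), can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `soft_threshold` and `admm_step` are hypothetical, the toy sizes and parameter values are arbitrary, and the term (ε/ρ)I in Equation (10) is read as applying the threshold ε/ρ to every entry, since C is K × N.

```python
import numpy as np

def soft_threshold(V, tau):
    """Elementwise soft-thresholding: max(V - tau, 0) + min(V + tau, 0)."""
    return np.maximum(V - tau, 0.0) + np.minimum(V + tau, 0.0)

def admm_step(X, H, Q, B, W, A, Z, L, lam, omega, eps, rho, theta):
    """One pass of the C, Z and L updates with B, W and A held fixed."""
    K = B.shape[1]
    # Equation (8): solve the linear system instead of forming the inverse.
    gram = B.T @ B + lam * (W.T @ W) + omega * (A.T @ A) + rho * np.eye(K)
    rhs = B.T @ X + lam * (W.T @ H) + omega * (A.T @ Q) + rho * Z - L
    C = np.linalg.solve(gram, rhs)
    # Equation (10): shrink C + L / rho by eps / rho (assumed elementwise).
    Z_new = soft_threshold(C + L / rho, eps / rho)
    # Equation (12): multiplier update with step size theta.
    L_new = L + theta * (C - Z_new)
    return C, Z_new, L_new

# Toy problem; all sizes and parameter values are chosen only for illustration.
rng = np.random.default_rng(0)
D, N, K, n_cls = 8, 20, 12, 3
X, H, Q = rng.standard_normal((D, N)), rng.random((n_cls, N)), rng.random((K, N))
B, W, A = (rng.standard_normal((D, K)), rng.standard_normal((n_cls, K)),
           rng.standard_normal((K, K)))
Z, L = np.zeros((K, N)), np.zeros((K, N))   # zero initialization as in the text
C, Z, L = admm_step(X, H, Q, B, W, A, Z, L,
                    lam=0.1, omega=0.01, eps=0.05, rho=1.0, theta=0.5)
```

Solving the K × K system with `np.linalg.solve` is a design choice of this sketch; it is numerically equivalent to the inverse in Equation (8) and is well posed because the added ρI makes the matrix positive definite.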
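We initialize $\mathbf{B}_0$, $\mathbf{W}_0$ and $\mathbf{A}_0$ to be random matrices and normalize each of them. After that, we use the BCD method to update $\mathbf{B}$, $\mathbf{W}$ and $\mathbf{A}$.

(1) Updating $\mathbf{B}$ while fixing $\mathbf{C}$, $\mathbf{L}$, $\mathbf{Z}$, $\mathbf{W}$ and $\mathbf{A}$:

$$\mathbf{B}_{m+1} = \langle \mathbf{B}_m, \mathbf{W}_m, \mathbf{A}_m, \mathbf{C}_{m+1}, \mathbf{Z}_{m+1}, \mathbf{L}_{m+1} \rangle \tag{14}$$

The closed-form solution of a single column of $\mathbf{B}$ is

$$(\mathbf{B}_{\bullet k})_{m+1} = \frac{\mathbf{X}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - (\tilde{\mathbf{B}}^k)_m \mathbf{C}_{m+1}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T}{\left\| \mathbf{X}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - (\tilde{\mathbf{B}}^k)_m \mathbf{C}_{m+1}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T \right\|_2} \tag{15}$$

where $\tilde{\mathbf{B}}^k = \begin{cases} \mathbf{B}_{\bullet p}, & p \ne k \\ \mathbf{0}, & p = k \end{cases}$ and $(\bullet)_{k\bullet}$ denotes the $k$th row vector of matrix $(\bullet)$.

(2) Updating $\mathbf{W}$ while fixing $\mathbf{C}$, $\mathbf{L}$, $\mathbf{Z}$, $\mathbf{B}$ and $\mathbf{A}$:

$$\mathbf{W}_{m+1} = \langle \mathbf{B}_{m+1}, \mathbf{W}_m, \mathbf{A}_m, \mathbf{C}_{m+1}, \mathbf{Z}_{m+1}, \mathbf{L}_{m+1} \rangle \tag{16}$$

The closed-form solution of a single column of $\mathbf{W}$ is

$$(\mathbf{W}_{\bullet k})_{m+1} = \frac{\mathbf{H}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - (\tilde{\mathbf{W}}^k)_m \mathbf{C}_{m+1}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T}{\left\| \mathbf{H}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - (\tilde{\mathbf{W}}^k)_m \mathbf{C}_{m+1}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T \right\|_2} \tag{17}$$

where $\tilde{\mathbf{W}}^k = \begin{cases} \mathbf{W}_{\bullet p}, & p \ne k \\ \mathbf{0}, & p = k \end{cases}$.

(3) Updating $\mathbf{A}$ while fixing $\mathbf{C}$, $\mathbf{L}$, $\mathbf{Z}$, $\mathbf{B}$ and $\mathbf{W}$:

$$\mathbf{A}_{m+1} = \langle \mathbf{B}_{m+1}, \mathbf{W}_{m+1}, \mathbf{A}_m, \mathbf{C}_{m+1}, \mathbf{Z}_{m+1}, \mathbf{L}_{m+1} \rangle \tag{18}$$

The closed-form solution of a single column of $\mathbf{A}$ is

$$(\mathbf{A}_{\bullet k})_{m+1} = \frac{\mathbf{Q}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - (\tilde{\mathbf{A}}^k)_m \mathbf{C}_{m+1}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T}{\left\| \mathbf{Q}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - (\tilde{\mathbf{A}}^k)_m \mathbf{C}_{m+1}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T \right\|_2} \tag{19}$$

where $\tilde{\mathbf{A}}^k = \begin{cases} \mathbf{A}_{\bullet p}, & p \ne k \\ \mathbf{0}, & p = k \end{cases}$.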
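Because Equations (15), (17) and (19) share one form, a single routine can serve all three bases. The NumPy sketch below is an illustration under assumptions rather than the authors' code: the name `bcd_update` is hypothetical, the columns are updated in place so that later columns see earlier updates (the usual blockwise coordinate descent ordering), and a column is left unchanged when its update direction vanishes, a case the text does not discuss.

```python
import numpy as np

def bcd_update(T, C, M, tiny=1e-12):
    """Update every column of M for the term ||T - M C||_F^2 with unit-norm columns.

    Pass (X, B), (H, W) or (Q, A) as (T, M) to obtain the three updates.
    """
    M = M.copy()
    TCt = T @ C.T                 # column k equals T (C_k.)^T
    G = C @ C.T                   # Gram matrix of the rows of C
    np.fill_diagonal(G, 0.0)      # zero diagonal removes atom k, like M-tilde^k
    for k in range(M.shape[1]):
        v = TCt[:, k] - M @ G[:, k]      # numerator of Equation (15)
        norm = np.linalg.norm(v)
        if norm > tiny:                  # assumption: keep the old column otherwise
            M[:, k] = v / norm
    return M

# Toy data; sizes are arbitrary and chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 20))
C = rng.standard_normal((5, 20))
B0 = rng.standard_normal((8, 5))
B0 /= np.linalg.norm(B0, axis=0)         # random, column-normalized initialization
B1 = bcd_update(X, C, B0)
```

Zeroing the diagonal of the Gram matrix reproduces the product with the matrix whose kth column is set to zero without building that matrix for every k; the same trick appears as D = CCᵀ ⊙ (1 − I) in Algorithm 3.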
Figure 3: Convergence curves of the LEDL algorithm on four datasets.

3.2.3. Convergence Analysis

Assume that the value of the objective function after the $m$th iteration is defined as $f(\mathbf{C}_m, \mathbf{Z}_m, \mathbf{L}_m, \mathbf{B}_m, \mathbf{W}_m, \mathbf{A}_m)$. Since the minimum point is obtained by the ADMM and BCD methods, each method monotonically decreases the corresponding objective function after about 100 iterations. Considering that the objective function is bounded below and satisfies Equation (20), it converges. Figure 3 shows the convergence curves of the proposed LEDL algorithm on four well-known datasets. The results demonstrate that our proposed LEDL algorithm converges quickly and has low complexity.

$$\begin{aligned} f(\mathbf{C}_m, \mathbf{Z}_m, \mathbf{L}_m, \mathbf{B}_m, \mathbf{W}_m, \mathbf{A}_m) &\ge f(\mathbf{C}_{m+1}, \mathbf{Z}_{m+1}, \mathbf{L}_{m+1}, \mathbf{B}_m, \mathbf{W}_m, \mathbf{A}_m) \\ &\ge f(\mathbf{C}_{m+1}, \mathbf{Z}_{m+1}, \mathbf{L}_{m+1}, \mathbf{B}_{m+1}, \mathbf{W}_{m+1}, \mathbf{A}_{m+1}) \end{aligned} \tag{20}$$
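To monitor the behaviour described by Equation (20) in practice, the objective can be evaluated after every iteration. The sketch below assumes that f is the augmented Lagrangian of Equation (6) and reads 2Lᵀ(C − Z) as the matrix inner product 2·trace(Lᵀ(C − Z)); the names `ledl_objective` and `has_converged`, and the relative tolerance used as a stopping rule, are hypothetical and not part of the paper.

```python
import numpy as np

def ledl_objective(X, H, Q, B, W, A, C, Z, L, lam, omega, eps, rho):
    """Evaluate the augmented Lagrangian of Equation (6) at the current iterate."""
    fit = np.linalg.norm(X - B @ C) ** 2             # reconstruction error
    cls = lam * np.linalg.norm(H - W @ C) ** 2       # classification error
    disc = omega * np.linalg.norm(Q - A @ C) ** 2    # discriminative sparse codes error
    sparsity = 2.0 * eps * np.abs(Z).sum()           # l1 penalty on the auxiliary variable
    # Assumed reading of 2 L^T (C - Z): twice the inner product of L and C - Z.
    coupling = 2.0 * np.sum(L * (C - Z)) + rho * np.linalg.norm(C - Z) ** 2
    return fit + cls + disc + sparsity + coupling

def has_converged(history, tol=1e-6):
    """Hypothetical stopping rule: small relative change between the last two values."""
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return abs(prev - curr) <= tol * max(abs(prev), 1.0)
```

A typical use would append `ledl_objective(...)` to a list once per outer iteration and stop when `has_converged` returns True or the iteration limit is reached.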
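3.2.4. Overall Algorithm

The overall updating procedure of the proposed LEDL algorithm is summarized in Algorithm 3. Here, maxiter is the maximum number of iterations, $\mathbf{1} \in \mathbb{R}^{K \times K}$ is a square matrix with all elements equal to 1, and $\odot$ indicates the element-wise product. By updating $\mathbf{C}$, $\mathbf{Z}$, $\mathbf{L}$, $\mathbf{B}$, $\mathbf{W}$ and $\mathbf{A}$ alternately, the sparse codes are obtained and the corresponding bases are learned.

Algorithm 3 Label Embedded Dictionary Learning
Input: $\mathbf{X} \in \mathbb{R}^{D \times N}$, $\mathbf{H} \in \mathbb{R}^{C \times N}$, $\mathbf{Q} \in \mathbb{R}^{K \times N}$, $\lambda$, $\omega$, $\varepsilon$, $\rho$, $\theta$, $K$
Output: $\mathbf{B} \in \mathbb{R}^{D \times K}$, $\mathbf{W} \in \mathbb{R}^{C \times K}$, $\mathbf{A} \in \mathbb{R}^{K \times K}$, $\mathbf{C} \in \mathbb{R}^{K \times N}$
1: $\mathbf{C}_0 \leftarrow \mathrm{zeros}(K, N)$, $\mathbf{Z}_0 \leftarrow \mathrm{zeros}(K, N)$, $\mathbf{L}_0 \leftarrow \mathrm{zeros}(K, N)$
2: $\mathbf{B}_0 \leftarrow \mathrm{rand}(D, K)$, $\mathbf{W}_0 \leftarrow \mathrm{rand}(C, K)$, $\mathbf{A}_0 \leftarrow \mathrm{rand}(K, K)$
3: $\mathbf{B}_{\bullet k} = \frac{\mathbf{B}_{\bullet k}}{\|\mathbf{B}_{\bullet k}\|_2}$, $\mathbf{W}_{\bullet k} = \frac{\mathbf{W}_{\bullet k}}{\|\mathbf{W}_{\bullet k}\|_2}$, $\mathbf{A}_{\bullet k} = \frac{\mathbf{A}_{\bullet k}}{\|\mathbf{A}_{\bullet k}\|_2}$, $(k = 1, 2, \cdots, K)$
4: $m = 0$
5: while $m \le$ maxiter do
6: $m \leftarrow m + 1$
7: Update $\mathbf{C}$:
8: $\mathbf{C}_{m+1} = \left( \mathbf{B}_m^T\mathbf{B}_m + \lambda\mathbf{W}_m^T\mathbf{W}_m + \omega\mathbf{A}_m^T\mathbf{A}_m + \rho\mathbf{I} \right)^{-1}$
9: $\quad \times \left( \mathbf{B}_m^T\mathbf{X} + \lambda\mathbf{W}_m^T\mathbf{H} + \omega\mathbf{A}_m^T\mathbf{Q} + \rho\mathbf{Z}_m - \mathbf{L}_m \right)$
10: Update $\mathbf{Z}$:
11: $\mathbf{Z}_{m+1} = \max\left\{ \mathbf{C}_{m+1} + \frac{\mathbf{L}_m}{\rho} - \frac{\varepsilon}{\rho}\mathbf{I},\ \mathbf{0} \right\}$
12: $\quad + \min\left\{ \mathbf{C}_{m+1} + \frac{\mathbf{L}_m}{\rho} + \frac{\varepsilon}{\rho}\mathbf{I},\ \mathbf{0} \right\}$
13: Update $\mathbf{L}$:
14: $\mathbf{L}_{m+1} = \mathbf{L}_m + \theta\left( \mathbf{C}_{m+1} - \mathbf{Z}_{m+1} \right)$
15: Update $\mathbf{B}$, $\mathbf{W}$, $\mathbf{A}$:
16: Compute $\mathbf{D}_{m+1} = \mathbf{C}_{m+1}\mathbf{C}_{m+1}^T \odot (\mathbf{1} - \mathbf{I})$
17: for $k = 1$; $k \le K$; $k$++ do
18: $(\mathbf{B}_{\bullet k})_{m+1} = \frac{\mathbf{X}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - \mathbf{B}_m(\mathbf{D}_{\bullet k})_{m+1}}{\left\| \mathbf{X}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - \mathbf{B}_m(\mathbf{D}_{\bullet k})_{m+1} \right\|_2}$
19: $(\mathbf{W}_{\bullet k})_{m+1} = \frac{\mathbf{H}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - \mathbf{W}_m(\mathbf{D}_{\bullet k})_{m+1}}{\left\| \mathbf{H}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - \mathbf{W}_m(\mathbf{D}_{\bullet k})_{m+1} \right\|_2}$
20: $(\mathbf{A}_{\bullet k})_{m+1} = \frac{\mathbf{Q}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - \mathbf{A}_m(\mathbf{D}_{\bullet k})_{m+1}}{\left\| \mathbf{Q}\left[(\mathbf{C}_{k\bullet})_{m+1}\right]^T - \mathbf{A}_m(\mathbf{D}_{\bullet k})_{m+1} \right\|_2}$
21: end for
22: Update the objective function:
23: $f = \|\mathbf{X} - \mathbf{B}\mathbf{C}\|_F^2 + \lambda\|\mathbf{H} - \mathbf{W}\mathbf{C}\|_F^2 + \omega\|\mathbf{Q} - \mathbf{A}\mathbf{C}\|_F^2 + 2\varepsilon\|\mathbf{Z}\|_{\ell_1} + 2\mathbf{L}^T(\mathbf{C} - \mathbf{Z}) + \rho\|\mathbf{C} - \mathbf{Z}\|_F^2$
24: end while
25: return $\mathbf{B}$, $\mathbf{W}$, $\mathbf{A}$, $\mathbf{C}$

In the testing stage, the constraint term is again the $\ell_1$-norm sparse constraint. Here, we use the learned dictionary $\mathbf{B}$ to fit the testing sample $\mathbf{y}$ and obtain its sparse code $\mathbf{s}$. Then, we use the trained classifier $\mathbf{W}$ to predict the label of $\mathbf{y}$ by calculating $\max\{\mathbf{W}\mathbf{s}\}$.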
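The testing stage described above can be sketched as follows. The paper does not say which solver computes the $\ell_1$-regularized code of a test sample, so this example swaps in plain ISTA (iterative soft-thresholding) for the problem min_s ‖y − Bs‖² + 2ε‖s‖₁, which mirrors the form of Equation (1). The function names, the fixed iteration count and the assumption that B is nonzero are all illustrative choices.

```python
import numpy as np

def sparse_code_ista(y, B, eps, n_iter=200):
    """Approximately solve min_s ||y - B s||_2^2 + 2 * eps * ||s||_1 with ISTA."""
    lip = np.linalg.norm(B, 2) ** 2          # squared spectral norm of B (assumed > 0)
    s = np.zeros(B.shape[1])
    for _ in range(n_iter):
        v = s - B.T @ (B @ s - y) / lip      # gradient step on the quadratic term
        s = np.sign(v) * np.maximum(np.abs(v) - eps / lip, 0.0)  # soft-thresholding
    return s

def predict_label(y, B, W, eps):
    """Return the index of the largest entry of W s as the predicted class of y."""
    s = sparse_code_ista(y, B, eps)
    return int(np.argmax(W @ s))
```

Any other $\ell_1$ solver, including the ADMM updates of Section 3.2.1 restricted to the reconstruction term, could replace `sparse_code_ista` without changing `predict_label`.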
Table 1: Classification rates (%) on different datasets

Dataset          SRC    CRC    CSDL-SRC   LC-KSVD   LEDL
Extended YaleB   79.1   79.2   80.2       73.5      81.3
CMU PIE          73.7   73.3   77.4       67.1      77.7
UC-Merced        80.4   80.7   80.5       79.4      80.7
AID              71.6   72.6   71.6       70.2      72.9
Caltech101       89.4   89.4   89.4       88.3      90.1
USPS             78.4   77.9   78.8       71.1      81.1

4. Experimental results

In this section, we use several datasets (Extended YaleB Georghiades et al. (2001), CMU PIE Sim et al. (2002), UC Merced Land Use Yang and Newsam (2010), AID Xia et al. (2017), Caltech101 Fei-Fei et al. (2007) and USPS Hull (1994)) to evaluate the performance of our algorithm and compare it with other state-of-the-art methods such as SRC Wright et al. (2009), LC-KSVD Jiang et al. (2013), CRC Zhang et al. (2011) and CSDL-SRC Liu et al. (2016). In the following subsections, we first give the experimental settings. Then the experiments on these six datasets are analyzed. Finally, some discussion is given.

4.1. Experimental settings

For all the datasets, in order to reduce the effect of randomness, we carry out every experiment 8 times and report the mean of the classification rates. We randomly select 5 samples per class for training in all the experiments. For the Extended YaleB and CMU PIE datasets, each image is cropped to 32 × 32, reshaped into a column vector, and $\ell_2$ normalized to form the raw $\ell_2$ normalized features. For the UC Merced Land Use and AID datasets, we use the ResNet model He et al. (2016) to extract the features. Specifically, the layer pool5 is used to extract 2048-dimensional vectors for them. For the Caltech101 dataset, we use the layer pool5 of the ResNet model and spatial pyramid matching (SPM) with two layers (the second layer includes five parts: upper left, upper right, lower left, lower right and center) to extract 12288-dimensional vectors. Finally, each image in the USPS dataset is resized to 16 × 16 and converted into a vector.
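The raw-feature pipeline used for the face datasets (reshape each cropped image into a column vector, then $\ell_2$ normalize it) can be sketched as below. The function name `images_to_features` and the small guard against all-zero images are assumptions added for illustration; cropping and resizing are assumed to have been done beforehand.

```python
import numpy as np

def images_to_features(images):
    """Flatten images of shape (N, h, w) into a D x N matrix with unit l2-norm columns."""
    images = np.asarray(images, dtype=float)
    X = images.reshape(images.shape[0], -1).T          # one column per image, D = h * w
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    return X / np.maximum(norms, 1e-12)                # guard against all-zero images

# Two synthetic 32 x 32 "images" stand in for cropped face images.
imgs = np.arange(2 * 32 * 32, dtype=float).reshape(2, 32, 32) + 1.0
X = images_to_features(imgs)
```

The resulting matrix has one 1024-dimensional column per image, matching the D × N layout of the training matrix X used throughout Section 3.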
For convenience, the dictionary size ($K$) is fixed to twice the number of training samples. In addition, we set $\rho = 1$ and the initial $\theta = 0.5$, then decrease $\theta$ in each iteration. Moreover, three other parameters ($\lambda$, $\omega$ and $\varepsilon$) need to be adjusted to achieve the highest classification rates. The details are given in the following subsections.

4.2. Extended YaleB Dataset

The Extended YaleB dataset contains 2,432 face images from 38 individuals, each having 64 frontal images under varying illumination conditions. Figure 4 shows some images of the dataset.

Figure 4: Examples of the Extended YaleB dataset

In addition, we set $\lambda = 2^{-3}$, $\omega = 2^{-11}$, $\varepsilon = 2^{-8}$ in our experiment. The experimental results are summarized in Table 1. We can see that our proposed LEDL algorithm achieves superior performance to the other classical classification methods, with an improvement of at least 1.1%. Compared with the $\ell_0$-norm sparsity constraint based dictionary learning algorithm LC-KSVD, our proposed $\ell_1$-norm sparsity constraint based dictionary learning algorithm LEDL is better by 7.8%. The reason for the large gap between LC-KSVD and LEDL is that the $\ell_0$-norm sparsity constraint leads to an NP-hard problem, which is not conducive to finding the optimal sparse solution for the dictionary. To further illustrate the performance of our method, we choose the samples of the first 20 classes as a sub-dataset and show the confusion matrices in Figure 5. As can be seen, our method achieves higher classification rates than LC-KSVD in all 20 chosen classes. In particular, in class1, class2, class3, class10 and class16, LEDL achieves a performance gain of at least 10.0% over LC-KSVD.
Figure 5: Confusion matrices on Extended YaleB dataset

4.3. CMU PIE Dataset

The CMU PIE dataset consists of 41,368 images of 68 individuals with 43 different illumination conditions. Each individual is captured under 13 different poses and with 4 different expressions. In Figure 6, we list several samples from this dataset.

Figure 6: Examples of the CMU PIE dataset

The comparison results are shown in Table 1. We can see that our proposed LEDL algorithm outperforms the other well-known methods, with an improvement of at least 0.5%. Notably, LEDL exceeds LC-KSVD by 10.6% on this dataset. The optimal parameters are $\lambda = 2^{-3}$, $\omega = 2^{-11}$, $\varepsilon = 2^{-8}$.
4.4. UC Merced Land Use Dataset

The UC Merced Land Use dataset is widely used for aerial image classification. It consists of 2,100 land-use images in 21 classes. Some samples are shown in Figure 7.

Figure 7: Examples of the UC Merced dataset

In Table 1, we can see that our proposed LEDL algorithm performs on par with CRC and outperforms the other methods. Compared with LC-KSVD, LEDL achieves higher accuracy, with an improvement of 1.3%. Here, we set $\lambda = 2^{0}$, $\omega = 2^{-9}$, $\varepsilon = 2^{-6}$ to obtain the optimal result. The confusion matrices of the UC Merced Land Use dataset for all classes are shown in Figure 8. We can see that, in all classes except tennis, LEDL almost always achieves better results than LC-KSVD. In several classes such as building, freeway, river and sparse, our method achieves superior performance to LC-KSVD, with an improvement of at least 0.5%.

4.5. AID Dataset

The AID dataset is a new large-scale aerial image dataset whose images come from Google Earth imagery. It contains 10,000 images from 30 aerial scene types. In Figure 9, we show several images of this dataset.

Table 1 illustrates the effectiveness of LEDL for classifying these images. With $\lambda = 2^{-6}$, $\omega = 2^{-14}$, $\varepsilon = 2^{-12}$, LEDL achieves the highest accuracy among the five algorithms, with an improvement of at least 0.3%. Compared with LC-KSVD, LEDL achieves an improvement of 2.7%.
Figure 8: Confusion matrices on UCMerced dataset

Figure 9: Examples of the AID dataset

4.6. Caltech101 Dataset

The Caltech101 dataset includes 9,144 images of 102 classes in total, which consist of cars, faces, flowers and so on. Each category has about 40 to 800 images, and most of them have about 50 images. In Figure 10, we show several images of this dataset.

Figure 10: Examples of the Caltech101 dataset

As can be seen in Table 1, our proposed LEDL algorithm outperforms all the competing approaches with $\lambda = 2^{-4}$, $\omega = 2^{-13}$, $\varepsilon = 2^{-14}$ and achieves improvements of 1.8% and 0.7% over LC-KSVD and the other methods, respectively. Here, we also choose the first 20 classes to build the confusion matrices. They are shown in Figure 11.

Figure 11: Confusion matrices on Caltech101 dataset

4.7. USPS Dataset

The USPS dataset contains 9,298 handwritten digit images from 0 to 9, which come from the U.S. Postal System. We list several samples from this dataset in Figure 12.
Figure 12: Examples of the USPS dataset

Table 1 shows the comparison results of the five algorithms. It is easy to see that our proposed LEDL algorithm outperforms the other well-known methods, with an improvement of at least 2.3%, and achieves an improvement of 10.0% over the LC-KSVD method. The optimal parameters are $\lambda = 2^{-4}$, $\omega = 2^{-8}$, $\varepsilon = 2^{-5}$.

4.8. Discussion

From the experimental results on the six datasets, we can draw the following conclusions.

(1) All the above experimental results illustrate that our proposed LEDL algorithm is an effective and general classifier that can achieve superior performance to state-of-the-art methods on various datasets, especially on the Extended YaleB, CMU PIE and USPS datasets.

(2) Our proposed LEDL method introduces the $\ell_1$-norm regularization term to replace the $\ell_0$-norm regularization of LC-KSVD. Compared with the LC-KSVD algorithm, the LEDL method is better on all six datasets. Moreover, on the two face datasets and the USPS dataset, our method exceeds LC-KSVD by nearly 10.0%.

(3) Confusion matrices of LEDL and LC-KSVD on three datasets are shown in Figures 5, 8 and 11. They clearly illustrate the superiority of our method. Specifically, for the Extended YaleB dataset, our method achieves outstanding performance in five classes (class1, class2, class3, class10, class16). For the UC Merced dataset, LEDL achieves better classification rates than LC-KSVD in almost all classes except the tennis class. For the Caltech101 dataset, our proposed LEDL method performs much better than the LC-KSVD method in
some classes such as beaver, binocular, brontosaurus, cannon and ceiling fan.

5. Conclusion

In this paper, we propose a Label Embedded Dictionary Learning (LEDL) algorithm. Specifically, we introduce the $\ell_1$-norm regularization term to replace the $\ell_0$-norm regularization term of LC-KSVD, which helps to avoid the NP-hard problem and to find the optimal solution more easily. Furthermore, we propose to adopt the ADMM algorithm to solve the $\ell_1$-norm optimization problem and the BCD algorithm to update the dictionary. In addition, extensive experiments on six well-known benchmark datasets demonstrate the superiority of our proposed LEDL algorithm.

6. Acknowledgment

This research was funded by the National Natural Science Foundation of China (Grant No. 61402535, No. 61671480), the Natural Science Foundation for Youths of Shandong Province, China (Grant No. ZR2014FQ001), the Natural Science Foundation of Shandong Province, China (Grant No. ZR2018MF017), Qingdao Science and Technology Project (No. 17-1-1-8-jch), the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China) (Grant No. 16CX02060A, 17CX02027A), and the Innovation Project for Graduate Students of China University of Petroleum (East China) (No. YCX2018063).

References

Aharon, M., Elad, M., Bruckstein, A., et al., 2006. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on signal processing 54 (11), 4311–4322.

Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3 (1), 1–122.

Chan, T.-H., Jia, K., Gao, S., Lu, J., Zeng, Z., Ma, Y., 2015. PCANet: A simple deep learning baseline for image classification? IEEE Transactions on image processing 24 (12), 5017–5032.
Chang, H., Yang, M., Yang, J., 2016. Learning a structure adaptive dictionary for sparse representation based classification. Neurocomputing 190 (19), 124–131.

Chang, S. G., Yu, B., Vetterli, M., 2000. Adaptive wavelet thresholding for image denoising and compression. IEEE Transactions on image processing 9 (9), 1532–1546.

Fei-Fei, L., Fergus, R., Perona, P., 2007. Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. Computer vision and Image understanding 106 (1), 59–70.

Gao, D., Hu, Z., Ye, R., 2018. Self-dictionary regression for hyperspectral image super-resolution. Remote Sensing 10 (10), 1574–1596.

Georghiades, A. S., Belhumeur, P. N., Kriegman, D. J., 2001. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on pattern analysis and machine intelligence 23 (6), 643–660.

Hao, S., Wang, W., Yan, Y., Bruzzone, L., 2017. Class-wise dictionary learning for hyperspectral image classification. Neurocomputing 220 (12), 121–129.

He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition (CVPR), 2016 IEEE conference on. IEEE, pp. 770–778.

Hull, J. J., 1994. A database for handwritten text recognition research. IEEE Transactions on pattern analysis and machine intelligence 16 (5), 550–554.

Ji, R., Gao, Y., Hong, R., Liu, Q., Tao, D., Li, X., 2014. Spectral-spatial constraint hyperspectral image classification. IEEE Transactions on geoscience and remote sensing 52 (3), 1811–1824.

Jiang, Z., Lin, Z., Davis, L. S., 2013. Label consistent K-SVD: Learning a discriminative dictionary for recognition. IEEE Transactions on pattern analysis and machine intelligence 35 (11), 2651–2664.
Li, H., He, X., Tao, D., Tang, Y., Wang, R., 2018. Joint medical image fusion, denoising and enhancement via discriminative low-rank sparse dictionaries learning. Pattern Recognition 79, 130–146.

Li, S., Fang, L., Yin, H., 2012. An efficient dictionary learning algorithm and its application to 3-d medical image denoising. IEEE Transactions on biomedical engineering 59 (2), 417–427.

Liu, B.-D., Gui, L., Wang, Y., Wang, Y.-X., Shen, B., Li, X., Wang, Y.-J., 2017. Class specific centralized dictionary learning for face recognition. Multimedia Tools and Applications 76 (3), 4159–4177.

Liu, B.-D., Shen, B., Gui, L., Wang, Y.-X., Li, X., Yan, F., Wang, Y.-J., 2016. Face recognition using class specific dictionary learning for sparse representation and collaborative representation. Neurocomputing 204 (5), 198–210.

Liu, B.-D., Shen, B., Wang, Y.-X., 2014a. Class specific dictionary learning for face recognition. In: Security, Pattern Analysis, and Cybernetics (SPAC), 2014 IEEE International conference on. IEEE, pp. 229–234.

Liu, B.-D., Wang, Y.-X., Shen, B., Zhang, Y.-J., Wang, Y.-J., 2014b. Blockwise coordinate descent schemes for sparse representation. In: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International conference on. IEEE, pp. 5267–5271.

Mallat, S. G., Zhang, Z., 1993. Matching pursuit with time-frequency dictionaries. IEEE Transactions on signal processing 41 (12), 3397–3415.

Nakazawa, T., Kulkarni, D. V., 2018. Wafer map defect pattern classification and image retrieval using convolutional neural network. IEEE Transactions on semiconductor manufacturing 31 (2), 309–314.

Natarajan, B. K., 1995. Sparse approximate solutions to linear systems. SIAM journal on computing 24 (2), 227–234.

Olshausen, B. A., Field, D. J., 1996. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381 (6583), 607–609.
Olshausen, B. A., Field, D. J., 1997. Sparse coding with an overcomplete basis set: a strategy employed by v1? Vision Research 37 (23), 3311–3325.
Shi, H., Zhang, Y., Zhang, Z., Ma, N., Zhao, X., Gao, Y., Sun, J., 2018. Hypergraph-induced convolutional networks for visual classification. IEEE Transactions on neural networks and learning systems.
Sim, T., Baker, S., Bsat, M., 2002. The cmu pose, illumination, and expression (pie) database. In: Automatic Face and Gesture Recognition (FG), 2002 IEEE International conference on. IEEE, pp. 53–58.
Song, Y., Liu, Y., Gao, Q., Gao, X., Nie, F., Cui, R., 2018. Euler label consistent k-svd for image classification and action recognition. Neurocomputing 310 (8), 277–286.
Tropp, J. A., Gilbert, A. C., 2007. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on information theory 53 (12), 4655–4666.
Wang, H., Yuan, C., Hu, W., Sun, C., 2012a. Supervised class-specific dictionary learning for sparse modeling in action recognition. Pattern Recognition 45 (11), 3902–3911.
Wang, N., Zhao, X., Jiang, Y., Gao, Y., BNRist, K., 2018. Iterative metric learning for imbalance data classification. In: International Joint Conference on Artificial Intelligence (IJCAI), 2018 Morgan Kaufmann conference on. Morgan Kaufmann, pp. 2805–2811.
Wang, S., Zhang, L., Liang, Y., Pan, Q., 2012b. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE conference on. IEEE, pp. 2216–2223.
Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., Ma, Y., 2009. Robust face recognition via sparse representation. IEEE Transactions on pattern analysis and machine intelligence 31 (2), 210–227.
Xia, G.-S., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., Zhang, L., Lu, X., 2017. Aid: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on geoscience and remote sensing 55 (7), 3965–3981.
Xu, J., An, W., Zhang, L., Zhang, D., 2019. Sparse, collaborative, or nonnegative representation: Which helps pattern classification? Pattern Recognition 88, 679–688.
Yang, J., Wright, J., Huang, T. S., Ma, Y., 2010. Image super-resolution via sparse representation. IEEE Transactions on image processing 19 (11), 2861–2873.
Yang, J., Yu, K., Gong, Y., Huang, T., 2009. Linear spatial pyramid matching using sparse coding for image classification. In: Computer Vision and Pattern Recognition (CVPR), 2009 IEEE conference on. IEEE, pp. 1794–1801.
Yang, M., Chang, H., Luo, W., 2017. Discriminative analysis-synthesis dictionary learning for image classification. Neurocomputing 219 (5), 404–411.
Yang, M., Zhang, L., Feng, X., Zhang, D., 2014. Sparse representation based fisher discrimination dictionary learning for image classification. International Journal of Computer Vision 109 (3), 209–232.
Yang, Y., Newsam, S., 2010. Bag-of-visual-words and spatial extensions for land-use classification. In: Advances in Geographic Information Systems (GIS), 2010 ACM International conference on. ACM, pp. 270–279.
Yu, J., Feng, L., Seah, H. S., Li, C., Lin, Z., 2012a. Image classification by multimodal subspace learning. Pattern Recognition Letters 33 (9), 1196–1204.
Yu, J., Rui, Y., Tang, Y. Y., Tao, D., 2014. High-order distance-based multiview stochastic learning in image classification. IEEE Transactions on cybernetics 44 (12), 2431–2442.
Yu, J., Tao, D., Rui, Y., Cheng, J., 2013. Pairwise constraints based multiview features fusion for scene classification. Pattern Recognition 46 (2), 483–496.
Yu, J., Tao, D., Wang, M., 2012b. Adaptive hypergraph learning and its application in image classification. IEEE Transactions on image processing 21 (7), 3262–3272.
Yuan, L., Liu, W., Li, Y., 2016. Non-negative dictionary based sparse representation classification for ear recognition with occlusion. Neurocomputing 171 (1), 540–550.
Zhang, L., Yang, M., Feng, X., 2011. Sparse representation or collaborative representation: Which helps face recognition? In: Computer Vision (ICCV), 2011 IEEE International conference on. IEEE, pp. 471–478.
Zhang, Q., Li, B., 2010. Discriminative k-svd for dictionary learning in face recognition. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE conference on. IEEE, pp. 2691–2698.