Comparative Assessment of Independent Component Analysis (ICA) for Face Recognition
Appears in the Second International Conference on Audio- and Video-based Biometric Person Authentication, AVBPA'99, Washington D.C., USA, March 22-24, 1999.

Chengjun Liu and Harry Wechsler
Department of Computer Science, George Mason University,
4400 University Drive, Fairfax, VA 22030-4444, USA
{cliu, wechsler}@cs.gmu.edu

Abstract

This paper addresses the relative usefulness of Independent Component Analysis (ICA) for face recognition. Comparative assessments are made regarding (i) ICA sensitivity to the dimension of the space where it is carried out, and (ii) ICA discriminant performance alone or when combined with other discriminant criteria such as the Bayesian framework or Fisher's Linear Discriminant (FLD). Sensitivity analysis suggests that for enhanced performance ICA should be carried out in a compressed and whitened Principal Component Analysis (PCA) space where the small trailing eigenvalues are discarded. The reason for this finding is that during whitening the eigenvalues of the covariance matrix appear in the denominator and that the small trailing eigenvalues mostly encode noise. As a consequence the whitening component, if used in an uncompressed image space, would fit misleading variations and thus generalize poorly to new data. Discriminant analysis shows that the ICA criterion, when carried out in the properly compressed and whitened space, performs better than the eigenfaces and Fisherfaces methods for face recognition, but its performance deteriorates when augmented by additional criteria such as the Maximum A Posteriori (MAP) rule of the Bayes classifier or the FLD. The reason for the last finding is that the Mahalanobis distance embedded in the MAP classifier duplicates to some extent the whitening component, while using FLD runs counter to the independence criterion intrinsic to ICA.

1. Introduction

Face recognition is important not only because it has many potential applications in research fields such as Human-Computer Interaction (HCI), biometrics and security, but also because it is a typical Pattern Recognition (PR) problem whose solution would help solve other classification problems. A successful face recognition methodology depends heavily on the particular choice of the features used by the (pattern) classifier [4], [17], [3]. Feature selection in pattern recognition involves the derivation of salient features from the raw input data in order to reduce the amount of data used for classification and simultaneously provide enhanced discriminatory power.

One popular technique for feature selection and dimensionality reduction is Principal Component Analysis (PCA) [8], [6]. PCA is a standard decorrelation technique, and following its application one derives an orthogonal projection basis which directly leads to dimensionality reduction, and possibly to feature selection. PCA was first applied to reconstruct human faces by Kirby and Sirovich [10] and to recognize faces by Turk and Pentland [19]. The recognition method, known as eigenfaces, defines a feature space, or "face space", which drastically reduces the dimensionality of the original space; face detection and identification are then carried out in the reduced space.

Independent Component Analysis (ICA) has emerged recently as a powerful solution to the problem of blind source separation [5], [9], [7], while its possible use for face recognition has been shown by Bartlett and Sejnowski [1]. ICA searches for a linear transformation to express a set of random variables as linear combinations of statistically independent source variables [5]. The search criterion involves the minimization of the mutual information expressed as a function of high-order cumulants. Basically, PCA considers the 2nd-order moments only and it uncorrelates the data, while ICA accounts for higher-order statistics and it identifies the independent source components from their linear mixtures (the observables). ICA thus provides a more powerful data representation than PCA [9]. As PCA derives only the most expressive features for face reconstruction rather than face classification, one would usually use some subsequent discriminant analysis to enhance PCA performance [18].

This paper makes a comparative assessment of the use of ICA as a discriminant analysis criterion whose goal is to enhance PCA stand-alone performance. Experiments in support of our comparative assessment of ICA for face recognition are carried out using a large data set consisting of 1,107 images drawn from the FERET database [16]. The comparative assessment suggests that for enhanced face recognition performance ICA should be carried out in a compressed and whitened space, and that ICA performance deteriorates when it is augmented by additional decision rules such as the Bayes classifier or the Fisher's linear discriminant analysis.
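To make the uncorrelated-versus-independent distinction above concrete, the following sketch (ours, not part of the original paper; the mixing matrix and the uniform sources are arbitrary choices) mixes two independent non-Gaussian sources and shows that PCA whitening leaves the mixture uncorrelated yet still statistically dependent, as revealed by a non-vanishing fourth-order cross-cumulant, which is exactly the kind of higher-order statistic the ICA criterion operates on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (uniform) sources and a linear mixture X = A S.
n = 100_000
S = rng.uniform(-1.0, 1.0, size=(2, n))    # independent sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # mixing matrix (arbitrary)
X = A @ S                                  # observed mixtures

# PCA whitening: decorrelate and scale to unit variance (2nd-order statistics only).
X = X - X.mean(axis=1, keepdims=True)
lam, phi = np.linalg.eigh(np.cov(X))                 # eigenvalues / eigenvectors
U = np.diag(lam ** -0.5) @ phi.T @ X                 # whitened data

print(np.cov(U).round(3))                            # ~identity: components are uncorrelated

# The 4th-order cross-cumulant cum(u1,u1,u2,u2) = E[u1^2 u2^2] - 1 - 2 E[u1 u2]^2
# is generally nonzero after whitening alone: uncorrelated is weaker than independent.
u1, u2 = U
cross_cum = np.mean(u1**2 * u2**2) - 1.0 - 2.0 * np.mean(u1 * u2) ** 2
print(f"4th-order cross-cumulant after whitening: {cross_cum:.4f}")
```

Driving this residual cross-cumulant toward zero is what the rotation step of ICA does on top of whitening.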
2. Background

PCA provides an optimal signal representation technique in the mean square error sense. The motivation behind using PCA for human face representation and recognition comes from its optimal and robust image compression and reconstruction capability [10], [15]. PCA yields projection axes based on the variations from all the training samples, hence these axes are fairly robust for representing both training and testing images (not seen during training). PCA does not distinguish, however, the different roles of the variations (within- and between-class variations) and it treats them equally. This leads to poor performance when the distributions of the face classes are not separated by the mean-difference but by the covariance-difference [6].

Swets and Weng [18] point out that PCA derives only the most expressive features, which are unrelated to actual face recognition, and that in order to improve performance one needs additional discriminant analysis. One such discriminant criterion, the Bayes classifier, yields the minimum classification error when the underlying probability density functions (pdf) are known. The use of the Bayes classifier is conditioned on the availability of an adequate amount of representative training data in order to estimate the pdf. As an example, Moghaddam and Pentland [13] developed a Probabilistic Visual Learning (PVL) method which uses the eigenspace decomposition as an integral part of estimating the pdf in high-dimensional image spaces. To address the lack of sufficient training data, Liu and Wechsler [12] introduced the Probabilistic Reasoning Models (PRM), where the conditional pdf for each class is modeled using the pooled within-class scatter and the Maximum A Posteriori (MAP) Bayes classification rule is implemented in the reduced PCA subspace.

The Fisher's Linear Discriminant (FLD) is yet another popular discriminant criterion. By applying first PCA for dimensionality reduction and then FLD for discriminant analysis, Belhumeur, Hespanha, and Kriegman [2] and Swets and Weng [18] developed similar methods (Fisherfaces and the Most Discriminant Features (MDF) method) for face recognition. Methods that combine PCA and the standard FLD, however, lack generalization ability as they overfit the training data. To address the overfitting problem, Liu and Wechsler [11] introduced the Enhanced FLD Models (EFM) to improve on the generalization capability of standard FLD based classifiers such as Fisherfaces [2].

3. Independent Component Analysis (ICA)

As PCA considers the 2nd-order moments only, it lacks information on higher-order statistics. ICA accounts for higher-order statistics and it identifies the independent source components from their linear mixtures (the observables). ICA thus provides a more powerful data representation than PCA [9], as its goal is that of providing an independent rather than an uncorrelated image decomposition and representation.

ICA of a random vector searches for a linear transformation which minimizes the statistical dependence between its components [5]. In particular, let X \in \mathbb{R}^N be a random vector representing an image, where N is the dimensionality of the image space. The vector is formed by concatenating the rows or the columns of the image, which may be normalized to have a unit norm and/or an equalized histogram. The covariance matrix of X is defined as

    \Sigma_X = \mathcal{E}\{ [X - \mathcal{E}(X)] [X - \mathcal{E}(X)]^T \}    (1)

where \mathcal{E}(\cdot) is the expectation operator, ^T denotes the transpose operation, and \Sigma_X \in \mathbb{R}^{N \times N}. The ICA of X factorizes the covariance matrix \Sigma_X into the following form

    \Sigma_X = F \Delta F^T    (2)

where \Delta is diagonal real positive and F transforms the original data X into Z,

    Z = F^{-1} X    (3)

such that the components of the new data Z are independent or "the most independent possible" [5].

To derive the ICA transformation F, Comon [5] developed an algorithm which consists of three operations: whitening, rotation, and normalization. First, the whitening operation transforms a random vector X into another one, U, that has a unit covariance matrix:

    U = \Lambda^{-1/2} \Phi^T X    (4)

where \Phi and \Lambda are derived by solving the following eigenvalue equation

    \Sigma_X = \Phi \Lambda \Phi^T    (5)

where \Phi is an orthonormal eigenvector matrix and \Lambda is a diagonal eigenvalue matrix of \Sigma_X.
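Equations (1), (4), and (5) describe standard PCA whitening: eigendecompose the sample covariance matrix and rescale the eigenvector projections by the inverse square roots of the eigenvalues. The sketch below (ours, not from the paper; the array shapes, variable names, and toy data are arbitrary) implements this step in NumPy and checks that the whitened data has unit covariance.

```python
import numpy as np

def whiten(X):
    """PCA whitening as in Eqs. (4)-(5): U = Lambda^{-1/2} Phi^T (X - mean).

    X : (N, n) array, each column a vectorized image of dimension N.
    Returns the whitened data U together with the eigenvectors (phi) and
    eigenvalues (lam) of the sample covariance matrix Sigma_X.
    """
    X_centered = X - X.mean(axis=1, keepdims=True)
    sigma = np.cov(X_centered)                      # Sigma_X, Eq. (1)
    lam, phi = np.linalg.eigh(sigma)                # Sigma_X = Phi Lambda Phi^T, Eq. (5)
    order = np.argsort(lam)[::-1]                   # sort eigenpairs, largest first
    lam, phi = lam[order], phi[:, order]
    U = np.diag(1.0 / np.sqrt(lam)) @ phi.T @ X_centered    # Eq. (4)
    return U, phi, lam

# Toy usage on random data standing in for vectorized face images.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 200))                      # 64-dimensional "images", 200 samples
U, phi, lam = whiten(X)
print(np.allclose(np.cov(U), np.eye(64), atol=1e-8))   # unit covariance after whitening
```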
One notes that whitening, an integral ICA component, counteracts the fact that the Mean Square Error (MSE) preferentially weighs low frequencies [14]. The rotation operations then perform source separation (to derive the independent components) by minimizing the mutual information approximated using higher-order cumulants. Finally, the normalization operation derives unique independent components in terms of orientation, unit norm, and order of projections [5].

4. Sensitivity Analysis of ICA

Eq. (4) can be rearranged into the following form

    U = (\Phi \Lambda^{1/2})^{-1} X,  i.e.,  u_i = \phi_i^T X / \sqrt{\lambda_i}    (6)

where \Phi and \Lambda are the eigenvector and eigenvalue matrices, respectively (see Eq. (5)). Eq. (6) shows that during whitening the eigenvalues appear in the denominator. The trailing eigenvalues, which tend to capture noise as their values are fairly small, thus cause the whitening step to fit misleading variations and make the ICA criterion generalize poorly when it is exposed to new data. As a consequence, if the whitening step is preceded by a dimensionality reduction procedure, ICA performance would be enhanced and its computational complexity reduced. Specifically, space dimensionality is first reduced to discard the small trailing eigenvalues so that we can obtain more robust whitening results. Toward that end, during the whitening transformation we should keep a balance between the need that the selected eigenvalues account for most of the spectral energy of the raw data and the requirement that the trailing eigenvalues of the covariance matrix are not too small. As a result, the eigenvalue spectrum of the training data supplies useful information for deciding the dimensionality of the subspace.
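The argument above is that the 1/\sqrt{\lambda_i} factors in Eq. (6) blow up exactly in the directions where \lambda_i is smallest, so whitening should be restricted to the leading eigenvectors. A small sketch of that compressed whitening follows (again ours, with made-up data and an arbitrary choice of m); it prints the largest whitening gain 1/\sqrt{\lambda} when only the top eigenvalues are kept versus when all of them are.

```python
import numpy as np

def compressed_whiten(X, m):
    """Whiten in the m-dimensional leading-eigenvector subspace only.

    Keeping the top-m eigenpairs before applying Eq. (4) avoids dividing by the
    small trailing eigenvalues, which mostly encode noise.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    lam, phi = np.linalg.eigh(np.cov(Xc))
    order = np.argsort(lam)[::-1][:m]               # keep the m largest eigenvalues
    lam_m, phi_m = lam[order], phi[:, order]
    U = np.diag(1.0 / np.sqrt(lam_m)) @ phi_m.T @ Xc
    return U, phi_m, lam_m

# Data whose variance decays strongly across dimensions, mimicking a PCA spectrum
# with small trailing eigenvalues.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 300)) * np.linspace(10.0, 0.01, 100)[:, None]

_, _, lam_top = compressed_whiten(X, m=40)
_, _, lam_all = compressed_whiten(X, m=100)
print("largest whitening gain, top 40 eigenvalues kept:", (1 / np.sqrt(lam_top)).max())
print("largest whitening gain, all 100 eigenvalues kept:", (1 / np.sqrt(lam_all)).max())
```

The gain explodes when the trailing eigenvalues are retained, which is the noise-amplification effect the compression is meant to avoid.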
[...] to decreased ICA performance (see Fig. 3 and Fig. 4) as they amplify the effects of noise. So we set the dimension of the reduced space as m = 40.

In order to assess the sensitivity of ICA in terms of the dimension of the compressed and whitened space where it is implemented, we carried out a comparative assessment for whitened subspaces of different dimension m. Fig. 3 and Fig. 4 show the ICA face recognition performance when a different m is chosen during the whitening transformation (see Eq. 9), and the number of features used can range up to the dimension of the compressed space. The curve corresponding to m = 40 performs better than all the other curves obtained for the other m values. Fig. 4 shows that ICA performance deteriorates quite severely as m gets larger. The reason for this deterioration is that for large values of m, more small trailing eigenvalues (see Fig. 2) are included in the whitening step and this leads to an unstable transformation. Note that smaller values of m lead to decreased ICA performance as well because the [...]

[Figure 3 / Figure 4: ICA correct retrieval rate vs. number of features for m = 30, 40, 50, and 70; the captions are garbled in the transcription.]
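The transcription does not preserve the classifier or distance measure behind these retrieval rates, so the following skeleton is only a guess at the protocol: it sweeps the whitening dimension m over the same values as the figure legend, uses synthetic stand-in data, a Euclidean nearest-neighbor rule, and omits the rotation and normalization steps of Comon's algorithm, exercising only the compressed whitening whose dimension is varied.

```python
import numpy as np

def compress_and_whiten(train, test, m):
    """Project onto the m leading eigenvectors of the training covariance and whiten."""
    mean = train.mean(axis=1, keepdims=True)
    lam, phi = np.linalg.eigh(np.cov(train - mean))
    order = np.argsort(lam)[::-1][:m]
    w = np.diag(1.0 / np.sqrt(lam[order])) @ phi[:, order].T
    return w @ (train - mean), w @ (test - mean)

def retrieval_rate(train, train_ids, test, test_ids):
    """Fraction of test images whose nearest (Euclidean) training image shares the same id."""
    d = ((test[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    return float((train_ids[d.argmin(axis=1)] == test_ids).mean())

# Synthetic stand-in data: 20 "subjects", 6 noisy samples each, 5 for training and 1 for testing.
rng = np.random.default_rng(4)
ids = np.repeat(np.arange(20), 6)
faces = np.repeat(rng.normal(size=(20, 256)), 6, axis=0) + 0.3 * rng.normal(size=(120, 256))
train_mask = np.tile([True] * 5 + [False], 20)
train, test = faces[train_mask].T, faces[~train_mask].T          # columns are images
train_ids, test_ids = ids[train_mask], ids[~train_mask]

for m in (30, 40, 50, 70):                                        # same sweep as the figure legend
    tr, te = compress_and_whiten(train, test, m)
    print(m, retrieval_rate(tr.T, train_ids, te.T, test_ids))
```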
[...] consequence overfitting takes place. To improve the generalization ability of Fisherfaces, we implemented it in the reduced PCA subspace. The comparative performance of eigenfaces, Fisherfaces, and ICA is plotted in Fig. 5, with ICA implemented in the compressed (m = 40) and whitened PCA subspace. Fig. 5 shows that the ICA criterion performs better than both eigenfaces and Fisherfaces.

We also assessed the ICA discriminant criterion against two other popular discriminant criteria: the MAP rule of the Bayes classifier and the Fisher's linear discriminant, as they are embedded within the Probabilistic Reasoning Models (PRM) [12] and the Enhanced FLD Models (EFM) [11]. Fig. 6 plots the comparative performance of PRM-1 and EFM-1 against the ICA method, with m again being set to 40. ICA is shown to have comparable face recognition performance with the MAP rule of the Bayes classifier and the Fisher's linear discriminant as embedded within PRM [...]

[Figure 5: comparative performance (correct retrieval rate vs. number of features) of eigenfaces, Fisherfaces, and ICA; caption garbled in the transcription.]

[Figure 6: comparative performance (correct retrieval rate vs. number of features) of ICA, PRM-1, and EFM-1; caption garbled in the transcription.]
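One explanation offered in the abstract and conclusions for the lack of improvement under the MAP rule is that its Mahalanobis distance duplicates to some extent the whitening already performed by ICA. The toy computation below (ours; the eigenvalues and vectors are arbitrary) shows the mechanism: with a diagonal covariance \Lambda, the Mahalanobis distance between two vectors equals the Euclidean distance between their \Lambda^{-1/2}-whitened versions, so the MAP rule's distance re-implements the scaling that whitening has already applied.

```python
import numpy as np

rng = np.random.default_rng(3)

# Diagonal covariance in a PCA eigenspace (Lambda from Eq. (5)) and two feature vectors.
lam = np.array([9.0, 4.0, 1.0, 0.25])           # eigenvalues (variances along eigenvectors)
x = rng.normal(size=4) * np.sqrt(lam)
y = rng.normal(size=4) * np.sqrt(lam)

# Mahalanobis distance with covariance diag(lam) ...
d_mahalanobis = np.sqrt((x - y) @ np.diag(1.0 / lam) @ (x - y))

# ... equals the Euclidean distance after whitening each vector by Lambda^{-1/2} (Eq. (4)).
x_w, y_w = x / np.sqrt(lam), y / np.sqrt(lam)
d_euclid_whitened = np.linalg.norm(x_w - y_w)

print(d_mahalanobis, d_euclid_whitened)          # identical values
```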
[Figure: correct retrieval rate vs. number of features for ICA and ICA+FLD; caption garbled in the transcription.]

6. Conclusions

This paper addresses the relative usefulness of Independent Component Analysis for face recognition. Comparative assessments are made regarding (i) ICA sensitivity to the dimension of the space where it is carried out, and (ii) ICA discriminant performance alone or when combined with other discriminant criteria such as the MAP criterion of the Bayes classifier or the Fisher's linear discriminant. The sensitivity analysis suggests that for enhanced performance ICA should be carried out in a compressed and whitened space where most of the representative information of the original data is preserved and the small trailing eigenvalues are discarded. The dimensionality of the compressed subspace is decided based on the eigenvalue spectrum of the training data. The discriminant analysis shows that the ICA criterion, when carried out in the properly compressed and whitened space, performs better than the eigenfaces and Fisherfaces methods for face recognition, but its performance deteriorates significantly when augmented by an additional discriminant criterion such as the FLD.

Acknowledgments: This work was partially supported by the DoD Counterdrug Technology Development Program, with the U.S. Army Research Laboratory as Technical Agent, under contract DAAL01-97-K-0118.

References

[1] M. Bartlett and T. Sejnowski. Independent components of face images: A representation for face recognition. In Proc. the 4th Annual Joint Symposium on Neural Computation, Pasadena, CA, May 17, 1997.
[2] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):711-720, 1997.
[3] R. Brunelli and T. Poggio. Face recognition: Features vs. templates. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(10):1042-1053, 1993.
[4] R. Chellappa, C. Wilson, and S. Sirohey. Human and machine recognition of faces: A survey. Proc. IEEE, 83(5):705-740, 1995.
[5] P. Comon. Independent component analysis, a new concept? Signal Processing, 36:287-314, 1994.
[6] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, second edition, 1991.
[7] A. Hyvarinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483-1492, 1997.
[8] I. T. Jolliffe. Principal Component Analysis. Springer, New York, 1986.
[9] J. Karhunen, E. Oja, L. Wang, R. Vigario, and J. Joutsensalo. A class of neural networks for independent component analysis. IEEE Trans. on Neural Networks, 8(3):486-504, 1997.
[10] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(1):103-108, 1990.
[11] C. Liu and H. Wechsler. Enhanced Fisher linear discriminant models for face recognition. In Proc. the 14th International Conference on Pattern Recognition, Queensland, Australia, August 17-20, 1998.
[12] C. Liu and H. Wechsler. Probabilistic reasoning models for face recognition. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, California, USA, June 23-25, 1998.
[13] B. Moghaddam and A. Pentland. Probabilistic visual learning for object representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997.
[14] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996.
[15] P. Penev and J. Atick. Local feature analysis: A general statistical theory for object representation. Network: Computation in Neural Systems, 7:477-500, 1996.
[16] P. Phillips, H. Moon, P. Rauss, and S. Rizvi. The FERET September 1996 database and evaluation procedure. In Proc. First Int'l Conf. on Audio and Video-based Biometric Person Authentication, pages 12-14, Switzerland, 1997.
[17] A. Samal and P. Iyengar. Automatic recognition and analysis of human faces and facial expression: A survey. Pattern Recognition, 25(1):65-77, 1992.
[18] D. L. Swets and J. Weng. Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(8):831-836, 1996.
[19] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.