ICA and Committee Machine-Based Algorithm for Cursor Control in a BCI System
Jianzhao Qin(1), Yuanqing Li(1,2), and Andrzej Cichocki(3)

1 Institute of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China
2 Institute for Infocomm Research, Singapore 119613
3 Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama 351-0198, Japan

In: J. Wang, X. Liao, and Z. Yi (Eds.): ISNN 2005, LNCS 3496, pp. 973–978, 2005. © Springer-Verlag Berlin Heidelberg 2005

Abstract. In recent years, brain-computer interface (BCI) technology has developed very rapidly. Brain-computer interfaces (BCIs) provide a new communication technology that translates brain activities into control signals for devices such as computers and robots. The preprocessing of the electroencephalographic (EEG) signal and the translation algorithm play an important role in EEG-based BCIs. In this study, we employed an independent component analysis (ICA)-based preprocessing method and a committee machine-based translation algorithm for the offline analysis of a cursor control experiment. The results show that ICA is an efficient preprocessing method and that the committee machine is a good choice of translation algorithm.

1 Introduction

BCIs give their users a communication and control channel that does not depend on the brain's normal output pathways (i.e., peripheral nerves and muscles). These new communication systems can improve the quality of life of people with severe motor disabilities, and they provide a new way for able-bodied people to control computers or other devices (e.g., a robot arm).

EEG-based BCIs record EEG at the scalp to control cursor movement or to select letters or icons. Since the EEG signal contains noise such as eye movements, eye blinks and EMG, a BCI should include a preprocessing procedure that separates the useful EEG signal from the noise (including artifacts). A good preprocessing method can greatly improve the information transfer rate (ITR) of a BCI. ICA has been widely used in blind source separation [1], [2], [3] and in biomedical signal analysis, including EEG analysis [4]. In the offline analysis of a cursor control experiment, we used an ICA-based preprocessing method. The results show that the accuracy rate improved considerably after ICA preprocessing.

A translation algorithm transforms the EEG features derived in the signal preprocessing stage into actual device control commands. In the offline case without feedback, the translation algorithm primarily performs a pattern recognition task (we extract features from the preprocessed EEG signal and then classify them into several classes that indicate the users' different intentions).
In supervised learning, if the amount of training data is small (as is usual in BCIs), the overfitting problem may arise, so a good translation algorithm should also have good generalization performance. In our analysis, we designed a simple and efficient committee machine as the translation algorithm to handle the overfitting problem.

2 Methods

In this section, we first describe the experimental data set and illustrate the framework of our offline analysis, then introduce the ICA preprocessing and the feature extraction. Finally, the structure of the committee machine and the classification procedure are presented.

2.1 Data Description

The EEG-based cursor control experiment was carried out at the Wadsworth Center. The recorded data set was provided in the BCI Competition 2003. The data set and the details of the experiment are available at http://ida.first.fraunhofer.de/projects/bci/competition. The data set was recorded from three subjects (AA, BB, CC). The framework of our offline analysis is depicted in Fig. 1.

Fig. 1. Framework diagram

2.2 Independent Component Analysis

Independent component analysis is a method for solving the blind source separation problem [5]. A random source vector S(n) is defined by

    S(n) = [S_1(n), S_2(n), ..., S_m(n)]^T                                (1)

where the m components are a set of independent sources and the argument n denotes discrete time. A nonsingular m-by-m matrix A, called the mixing matrix, relates the observed mixture vector X(n) to S(n) as follows:

    X(n) = AS(n)                                                          (2)

The source vector S(n) and the mixing matrix A are both unknown. The task of blind source separation is to find a demixing matrix C such that the original source vector S(n) can be recovered as

    Y(n) = CX(n)                                                          (3)
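To make the mixing and demixing model of Eqs. (1)–(3) concrete, here is a minimal sketch in Python/NumPy. It uses scikit-learn's FastICA purely as a readily available stand-in for the natural gradient-flexible ICA algorithm the authors actually applied (discussed just below); the synthetic Laplacian sources, the 9-channel dimension, and the sample count are illustrative assumptions, not the real EEG data.

```python
# Minimal sketch of the blind source separation model in Eqs. (1)-(3).
# Assumption: FastICA (scikit-learn) stands in for the natural
# gradient-flexible ICA algorithm used in the paper; the data are synthetic.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
m, T = 9, 40000                      # 9 channels, 40000 samples (as in Sect. 4)

# Synthetic independent sources S(n) and a random nonsingular mixing matrix A
S = rng.laplace(size=(m, T))         # super-Gaussian toy sources
A = rng.normal(size=(m, m))          # unknown mixing matrix
X = A @ S                            # observed multichannel "EEG", Eq. (2)

# Estimate a demixing matrix so that Y(n) = C X(n) has independent components
ica = FastICA(n_components=m, random_state=0)
Y = ica.fit_transform(X.T).T         # recovered components, Eq. (3)
C = ica.components_                  # estimated demixing matrix (applied to mean-centered data)

print(Y.shape)                       # (9, 40000): one recovered component per channel
```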
The ICA method is based on the assumption that the original sources are statistically independent, and the objective of an ICA algorithm is to find a demixing matrix C such that the components of Y are statistically independent. We assume that the multichannel EEG can be modelled by (2), where X(n) is the recorded multichannel EEG at time n, A is the mixing matrix, and S(n) is the source vector at time n.

There are many algorithms that implement ICA. Bell and Sejnowski [6] proposed an infomax algorithm, and the natural gradient was proposed and applied to ICA by Amari et al. [7]. In our analysis, we applied a natural gradient-flexible ICA algorithm [8], which can separate mixtures of sub- and super-Gaussian source signals. We expected ICA preprocessing to separate the useful EEG components from the noise (including artifacts).

2.3 Feature Extraction

In the analysis, we extracted and combined two kinds of features from the preprocessed EEG: the power feature and the CSP feature.

The data include 64 EEG channels, but we used only 9 channels (channel numbers 8, 9, 10, 15, 16, 17, 48, 49, 50) for the ICA preprocessing and power feature extraction. These 9 channels cover the left sensorimotor cortex, which is the most relevant area when the subjects used their EEG to control the cursor in this experiment. During each trial, with a trial length of 368 samples (subjects AA and BB) or 304 samples (subject CC), we assumed that the cursor position was updated once in every time window of 160 adjacent samples, with two consecutive windows overlapping by 106 samples (subjects AA and BB) or 124 samples (subject CC). Thus there were 5 updates of the cursor position in each trial. Only the best component, i.e., the one with the highest recognition rate on the training sets (sessions 1–6), was used for power feature extraction. For each trial, the power feature is defined as

    PF = [PF_1, PF_2, PF_3, PF_4, PF_5]                                   (4)

    PF_n = Σ_{f ∈ [11,14]} P_n(f) · w_1 + Σ_{f ∈ [22,26]} P_n(f) · w_2    (5)

where P_n(f) is the power spectrum of the n-th time window. The parameters w_1 and w_2 are determined experimentally; the criterion for choosing these two parameters is similar to that for choosing the best component.

CSP is a technique that has been applied to EEG analysis to find spatial structures of event-related (de-)synchronization [9]. Our CSP feature is defined as in [9]. The CSP analysis consists of calculating a matrix W and a diagonal matrix D such that

    W Σ_1 W^T = D    and    W Σ_4 W^T = I − D                             (6)

where Σ_1 and Σ_4 are the normalized covariance matrices of the trial-concatenated matrices for targets 1 and 4, respectively. W can be obtained by a joint diagonalization method. Prior to calculating the CSP features, common average referencing [10] was applied, and the re-referenced EEG was then band-pass filtered in the 10–15 Hz band. The CSP feature for each trial consists of the 6 most discriminative main-diagonal elements of the transformed covariance matrix of the trial, followed by a log-transformation [9].
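As a rough illustration of the two feature types, the sketch below computes the band-power feature of Eqs. (4)–(5) for one ICA component and the CSP filters of Eq. (6) via whitening plus eigendecomposition, one standard joint-diagonalization route. The 160 Hz sampling rate is inferred from the 250 s / 40000 samples mentioned in Sect. 4, and the window step equals 160 minus the stated overlap (54 samples for AA/BB, 36 for CC); the unit weights w1 = w2 = 1 and the choice of the 6 "most discriminative" rows from both ends of the eigenvalue spectrum are assumptions for illustration, not values taken from the paper.

```python
# Sketch of the two feature types for a single trial, under stated assumptions:
# fs = 160 Hz, step = 160 - overlap, w1 = w2 = 1 are illustrative defaults.
import numpy as np
from scipy.signal import periodogram

FS = 160.0            # assumed sampling rate of the Wadsworth recordings
WIN = 160             # window length in samples (one cursor-position update)

def power_feature(component, step, w1=1.0, w2=1.0):
    """Eqs. (4)-(5): band power of one ICA component in 5 overlapping windows."""
    feats = []
    for k in range(5):                              # 5 updates per trial
        seg = component[k * step : k * step + WIN]
        f, p = periodogram(seg, fs=FS)              # power spectrum P_n(f)
        band1 = p[(f >= 11) & (f <= 14)].sum()      # mu band, [11, 14] Hz
        band2 = p[(f >= 22) & (f <= 26)].sum()      # beta band, [22, 26] Hz
        feats.append(w1 * band1 + w2 * band2)
    return np.asarray(feats)                        # PF = [PF_1, ..., PF_5]

def csp_filters(X1, X4):
    """Eq. (6): diagonalize the normalized covariances of the trial-concatenated
    data for targets 1 and 4 (each of shape channels x samples)."""
    def ncov(X):
        C = X @ X.T
        return C / np.trace(C)
    S1, S4 = ncov(X1), ncov(X4)
    d, U = np.linalg.eigh(S1 + S4)                  # whiten the composite covariance
    P = np.diag(d ** -0.5) @ U.T
    _, V = np.linalg.eigh(P @ S1 @ P.T)             # diagonalize the whitened S1
    W = V.T @ P                                     # W S1 W^T = D, W S4 W^T = I - D
    return W

def csp_feature(W, trial, n_keep=6):
    """Log of the n_keep most discriminative diagonal variances of one trial."""
    Z = W @ trial
    var = np.var(Z, axis=1)
    var /= var.sum()
    # Assumption: the most discriminative rows sit at both ends of the spectrum.
    idx = np.r_[np.arange(n_keep // 2), np.arange(-(n_keep // 2), 0)]
    return np.log(var[idx])
```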
3 Committee Machine-Based Translation Algorithm

The multi-layer perceptron is a powerful tool for supervised pattern recognition, but when the number of training samples is small relative to the number of network parameters, the overfitting problem may arise. In this section, based on the features described above, we present a committee machine consisting of several small-scale multi-layer perceptrons that addresses the overfitting problem.

In our analysis, the data from sessions 1–6 (about 1200 samples) were used for training. A statistical theory of the overfitting phenomenon [11] suggests that overfitting may occur when N < 30W, where N is the number of training samples and W denotes the number of network parameters. According to this theory, the maximum number of parameters of each network should be less than 40. To satisfy this requirement, we designed a committee machine that divides the task into two simpler tasks, so that the structure and training of each network in the committee machine can be simplified.

Fig. 2. The structure of a committee machine

The structure of the committee machine is depicted in Fig. 2. The units of this committee machine are several small-scale three-layer (including the input layer) perceptrons with nonlinear activation functions. We call these networks "experts" and divide them into two groups: one group of experts makes its decisions from the power features, while the other group's decisions come from the CSP features. The experts within a group share the same inputs but are trained differently, starting from different initial values. Each network has four output neurons corresponding to the four target positions. The final decision of a group is made by averaging the outputs of all its experts, and the final outputs of the two groups are then linearly combined to produce the overall output of the machine.
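The following is a minimal sketch of such a committee machine using scikit-learn MLPClassifier experts. The number of experts per group, the hidden-layer size, and the equal combination weight between the two groups are illustrative assumptions; the paper does not report these values here.

```python
# Minimal sketch of the committee machine, under stated assumptions: the number
# of experts per group, the hidden-layer size, and the group-combination weight
# are illustrative choices, not values reported in the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier

class CommitteeMachine:
    def __init__(self, n_experts=3, hidden=3, alpha_power=0.5, seed=0):
        # Two groups of small three-layer perceptrons ("experts").
        self.power_experts = [MLPClassifier(hidden_layer_sizes=(hidden,),
                                            max_iter=2000, random_state=seed + i)
                              for i in range(n_experts)]
        self.csp_experts = [MLPClassifier(hidden_layer_sizes=(hidden,),
                                          max_iter=2000, random_state=100 + seed + i)
                            for i in range(n_experts)]
        self.alpha = alpha_power      # linear combination weight of the power group

    def fit(self, PF, CSPF, y):
        for net in self.power_experts:
            net.fit(PF, y)            # experts in a group share the same inputs
        for net in self.csp_experts:
            net.fit(CSPF, y)          # but start from different initial weights
        return self

    def predict(self, PF, CSPF):
        # Group decision: average the outputs of the group's experts.
        p_power = np.mean([net.predict_proba(PF) for net in self.power_experts], axis=0)
        p_csp = np.mean([net.predict_proba(CSPF) for net in self.csp_experts], axis=0)
        # Overall output: linear combination of the two group outputs.
        p = self.alpha * p_power + (1.0 - self.alpha) * p_csp
        return self.power_experts[0].classes_[np.argmax(p, axis=1)]
```

With three hidden units, a power-feature expert (5 inputs, 4 outputs) has 34 weights and biases and a CSP-feature expert (6 inputs, 4 outputs) has 37, so each network stays under the roughly 40-parameter budget implied by [11].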
4 Results

We trained ICA on 250 s (40000 samples) of EEG recordings randomly chosen from sessions 1–6. All the trials in sessions 7–10 were used to test our method. For comparison, we performed feature extraction and classification under three conditions:
1) ICA was used for preprocessing, and the committee machine was used for classification.
2) No ICA preprocessing was used; the best channel of the raw EEG signal was chosen for power feature extraction, and the committee machine was used for classification.
3) ICA was used for preprocessing, but the committee machine was replaced by a conventional multi-layer network for classification.
The results obtained under these three conditions for the three subjects are shown in Table 1.

Table 1. Accuracy rates (%) for the three subjects obtained under the above three conditions

Subject  Condition  Session 7  Session 8  Session 9  Session 10  Average accuracy
AA       1          71.20      71.20      66.49      69.63       69.63
AA       2          68.06      68.06      65.45      68.59       67.54
AA       3          70.68      64.40      63.35      59.69       64.53
BB       1          63.87      62.30      47.12      54.97       57.07
BB       2          62.30      61.78      42.41      48.17       53.67
BB       3          62.30      57.07      46.07      46.60       53.01
CC       1          66.67      72.82      54.36      81.54       68.85
CC       2          63.59      70.26      50.77      72.82       64.36
CC       3          61.54      66.15      54.87      68.21       62.69

5 Discussion and Conclusion

Table 1 shows that the accuracy of the offline analysis was improved considerably by the ICA preprocessing method. Furthermore, comparing the results of conditions 1 and 3 shows that the committee machine has better generalization performance than a conventional multi-layer network.

In the analysis, we used ICA as the preprocessing method for the BCI. This method has several advantages. First, ICA preprocessing can separate useful source components from noise, so we can choose one or two components that contain more useful information for extracting power features than the raw channels did before preprocessing. Second, since we choose a small number of ICA components for feature extraction, the computational burden is reduced. Furthermore, the dimensionality of the feature space is reduced; as a consequence, not only can the structure of the classifier be simplified, but the overfitting problem is also mitigated to some extent.

Meanwhile, a committee machine was used as the translation algorithm, which can also improve the performance of BCIs.
The key point of the committee machine is to divide a complex computational task into a number of simpler tasks. Owing to the simple network structure, the constituent experts of the machine are easy to train, and the generalization performance is improved.

Acknowledgements

This study was supported by the National Natural Science Foundation of China (No. 60475004, No. 60325310), the Guangdong Province Science Foundation for Research Team Program (No. 04205789), and the Excellent Young Teachers Program of MOE, China.

References

1. Li, Y., Wang, J., Zurada, J.M.: Blind Extraction of Singularly Mixed Source Signals. IEEE Trans. on Neural Networks, 11 (2000) 1413–2000
2. Li, Y., Wang, J.: Sequential Blind Extraction of Linearly Mixed Sources. IEEE Trans. on Signal Processing, 50 (2002) 997–1006
3. Cichocki, A., Amari, S.: Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications. John Wiley, New York (2002)
4. Makeig, S., Bell, A.J., Jung, T.-P., Sejnowski, T.J.: Independent Component Analysis of Electroencephalographic Data. Advances in Neural Information Processing Systems, 8 (1996) 145–151
5. Comon, P.: Independent Component Analysis - A New Concept? Signal Processing, 36 (1994) 287–314
6. Bell, A.J., Sejnowski, T.J.: An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation, 7 (1995) 1129–1159
7. Amari, S., Cichocki, A., Yang, H.H.: A New Learning Algorithm for Blind Signal Separation. Advances in Neural Information Processing Systems, 8 (1996) 757–763
8. Choi, S., Cichocki, A., Amari, S.: Flexible Independent Component Analysis. Proc. of the 1998 IEEE Workshop on NNSP (1998) 83–92
9. Ramoser, H., Gerking, J.M., Pfurtscheller, G.: Optimal Spatial Filtering of Single Trial EEG during Imagined Hand Movement. IEEE Trans. Rehab. Eng., 8 (2000) 441–446
10. McFarland, D.J., McCane, L.M., David, S.V., Wolpaw, J.R.: Spatial Filter Selection for EEG-based Communication. Electroenceph. Clin. Neurophysiol., 103 (1997) 386–394
11. Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective? Advances in Neural Information Processing Systems, Vol. 8. MIT Press, Cambridge, MA (1996) 176–182