Automated Detection of Rehabilitation Exercise by Stroke Patients Using 3-Layer CNN-LSTM Model
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Hindawi Journal of Healthcare Engineering Volume 2022, Article ID 1563707, 12 pages https://doi.org/10.1155/2022/1563707 Research Article Automated Detection of Rehabilitation Exercise by Stroke Patients Using 3-Layer CNN-LSTM Model Zia Ur Rahman ,1 Syed Irfan Ullah,1 Abdus Salam ,1 Taj Rahman ,2 Inayat Khan ,3 and Badam Niazi 4 1 Department of Computing and Technology Abasyn University, Peshawar 25000, Pakistan 2 Qurtuba University of Science and Technology Peshawar, Peshawar 25000, Pakistan 3 Department of Computer Science, University of Buner, Buner 19290, Pakistan 4 Department of Computer Science, University of Nangarhar, Jalalabad 2600, Afghanistan Correspondence should be addressed to Badam Niazi; niazi5.48@gmail.com Received 10 December 2021; Accepted 12 January 2022; Published 4 February 2022 Academic Editor: Liaqat Ali Copyright © 2022 Zia Ur Rahman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. According to statistics, stroke is the second or third leading cause of death and adult disability. Stroke causes losing control of the motor function, paralysis of body parts, and severe back pain for which a physiotherapist employs many therapies to restore the mobility needs of everyday life. This research article presents an automated approach to detect different therapy exercises performed by stroke patients during rehabilitation. The detection of rehabilitation exercise is a complex area of human activity recognition (HAR). Due to numerous achievements and increasing popularity of deep learning (DL) techniques, in this research article a DL model that combines convolutional neural network (CNN) and long short-term memory (LSTM) is proposed and is named as 3-Layer CNN-LSTM model. The dataset is collected through RGB (red, green, and blue) camera under the supervision of a physiotherapist, which is resized in the preprocessing stage. The 3-layer CNN-LSTM model takes preprocessed data at the convolutional layer. The convolutional layer extracts useful features from input data. The extracted features are then processed by adjusting weights through fully connected (FC) layers. The FC layers are followed by the LSTM layer. The LSTM layer further processes this data to learn its spatial and temporal dynamics. For comparison, we trained CNN model over the prescribed dataset and achieved 89.9% accuracy. The conducted experimental examination shows that the 3-Layer CNN-LSTM outperforms CNN and KNN algorithm and achieved 91.3% accuracy. 1. Introduction cerebrum paralysis [2], trauma and paralytic injury [3], posttraumatic stiffness [4], congenital deformity [5], and Stroke is a worldwide healthcare problem which causes due Guillain–barre syndrome [6]. Injuries to the cervical spinal to heart failure or malfunctioning of blood vessels. It is a cord usually result in loosened leg and arms functions where common, dangerous, and disabling health disease that af- hip flexors and legs are degraded by lumbar and spinal cord fects people all around the world. Stroke is the second or injuries. The survivors of a stroke have a similar condition third leading cause of death in most regions, as well as one of since they must relearn the lost skills when their brain is hit the leading causes of acquired adult disability [1]. Over the by a stroke. next couple of decades, the frequency of stroke-related A physiotherapist employs many therapies, including burden is predicted to rise. Stroke causes losing control of nerve reeducation, task coaching, and muscle strengthening the motor function, incoordination or paralysis of all body to restore the mobility needs of everyday life. Different parts, and severe back pain. Due to stroke, patients will have physiotherapy and rehabilitation programs are needed to muscle and neurological trauma and disorders such as restore the function of the upper extremity and increase their
2 Journal of Healthcare Engineering quality of life. Some exercises such as motor training physiotherapy exercises, (3) extracting and classifying ac- (movement exercise), mobility training (restriction-in- tivity features automatically, (5) and having better or at least duced), motion therapy (flow therapy), and repetitive task same accuracy as of the existing DL approaches laterally with training (workout training) are very effective for learning fast convergence speed and good generalization ability. and taking control of the body [7]. Both for upper and lower The manuscript is organized as follows: in Section 2 we limbs, balancing exercises are of considerable benefit to take a look at some current techniques for HAR which is increase the balance after spinal cord injury. It is obvious using machine learning and deep learning approaches. that serious health problems may lead to death or acquired Section 3 explains CNN and LSTM algorithms and data physical impairment due to injury to the backbone. Different preprocessing for the proposed model. Moreover, it contains neuroplastic results have shown that it can be recovered a detailed overview of the 3-Layer CNN-LSTM model and its partly through adequate rehabilitation exercises [8]. implementation. In Section 4, the performance of the 3- Motor function controls mobility and muscle movement Layer CNN-LSTM model is explained along with their and is a commonly recognized impairment due to stroke. To experimental results. Section 5, concludes the research work reestablish motor function, the most important technique is with a brief summary. to perform rehabilitation exercises under the direction of a physiotherapist. The financial requirement to receive the 2. Literature Review treatment is not easy, so the family can suffer from financial burden. The resolution of this is a new virtual reality re- Machine learning models are used to learn the fundamental habilitation problem, which uses sensor tools to capture and connections in data through experience while performing recognize movements. The rehabilitation program requires some tasks and making decisions without explicit instruc- physiological exercises like flexion, extension, abduction, tions [18]. For a very long time, ML models have been used adduction, enlargement, sleeves, dorsiflexion, plantar flex- widely for HAR. Different types of models which can apply ion, and rotation of various joints in patients with muscular for HAR depend on data type, the volume of data, number of and neurological trauma and disorder. In the existing lit- activities, similarities among activities, and number of ac- erature studies, most of the researchers have focused on the tivity classes. The existing ML models such as hidden detection of human activities like standing, sitting, sleeping, Markov model (HMM) [19], linear discriminant analysis walking up and down stairs, etc., but very less attention was (LDA) [20], random forest (RnF) [21], logistic regression focused on the recognition and classification of rehabilita- [22], support vector machine (SVM) [23], decision tree (DT) tion physiotherapy exercises which is a multifaceted area of [24], histogram oriented gradient (HOG) [25], and HAR. K-nearest neighbour (KNN) [26] are used for human ac- Hitherto, HAR has been widely used in numerous ap- tivity classification. Nevertheless, for precision and accu- plications, like gesture recognition gait analysis, human- racies of the abovementioned algorithms, the selection of computer interaction, home behavior analysis, personal different parameters like the method of distance calculation health system, video surveillance, and antiterrorism moni- and the number of neighbors for KNN, choice of the kernel toring [9–16]. It has the ability to learn in advance from raw for SVM, the tolerance value for LDA, and the number of data around human activities. Currently, HAR is a popular trees for RnF plays a significant role which should be research track, due to progression in the field of human- considered carefully [27–29]. These algorithms have computer interaction. Generally, there are two types of achieved remarkable classification accuracies, nonetheless, it HAR: sensor-based and video-based. Sensor-based HAR requires a lot of hand-tuning to formulate the data, feature depends on the data learned through keen sensors. Due to engineering, preprocessing, and domain knowledge the development of ubiquitous computing and sensor amongst others. These methods are not suitable in scenarios technology, sensor-based HAR is more frequently used. To like indoor environments where confidentiality is required. improve recognition accuracy, researchers have developed Some of the other approaches are highly vulnerable to il- various types of sensing technologies such as techniques lumination disparities and background changes which are based on static and dynamic sensors. The video-based HAR restraining their practical use. takes advantage of the data acquired through various kinds A unique biometric system for detecting human actions of cameras to determine human activities [17], which is in 3D space is proposed in [30] in which joint skeleton angles becoming popular due to the reduced complexity and ease of recorded through an RGB depth sensor are used as input availability of different kinds of cameras. In this research features. The angle information is stored using the sliding article, for the detection of rehabilitation physiotherapy kernel method. Before lowering the data dimension, the exercise, the dataset is collected through an RGB camera, Haar wavelet transform (HWT) is used to maintain feature and then a 3-Layer CNN-LSTM algorithm is applied for the information. For dimensionality reduction, an averaging detection of rehabilitation physiotherapy exercise. The 3- technique is applied to reduce computing costs and result in Layer CNN-LSTM algorithm seeks to leverage the power of faster convergence. Moreover, the system may be used for merging both CNN and LSTM and address the deficiencies elderly care and video surveillance, but there are a few of existing approaches, laterally with the following charac- drawbacks to the suggested approach. First, improper teristics: (1) the model is robust enough to perform equally skeletal detection leads to inaccurate angle calculations, well or better on input data, (2) it is evaluated on our self- which causes the classification to be automatically misled. As created complex dataset, having rehabilitation the system is trained on activities in two directions at various
Journal of Healthcare Engineering 3 angles and positions, there may be some confusion when maintenance, robot-assisted techniques are difficult to be attempting to recognize an activity due to probable simi- used. Moreover, due to the shortage of healthcare providers, larities in the positions and angles of different activities. artificial form of rehabilitation is also hard to be accessed. As RGB-D images are beneficial for action recognition; how- we know that rehabilitation of stroke patients is a long- ever, the computational complexity of the learning model lasting process due to which both robot-assisted and arti- grows rapidly as the number of frames grows. As a result, the ficial forms of rehabilitation are not feasible. Therefore, an system becomes more complex and slower. Thus, instead of automated rehabilitation training method is needed to ad- an RGB-D camera, a simple RGB camera could be explored dress this issue. Consequently, we designed an automated to broaden the applications of the HAR system. Activity model to recognize the physiotherapy exercise while keeping sequence recording, function extraction, model imple- in view the existing HAR approaches. The architecture of the mentation, and finally identification are the four essential proposed algorithm is shown in Figure 1. steps in vision-based HAR [31, 32]. The Kinect-based re- The 3-Layer CNN-LSTM model seeks to leverage the habilitation training system has arisen recently, which is power of merging both CNN and LSTM. The main approach is utilized in most research studies. Some researchers focus on divided into two sections, the detection of physiotherapy ex- skeleton data while most of the researchers use RGB-D data, ercise and its classification. The first section contains data which is due to the fact that Kinect is based on depth sensors collection, preprocessing of images, dimensionality reduction, and uses structured light, which is not accurate. On the basis and data augmentation. The second section is using a com- of depth data, it obtains the skeleton data, which is of low bination of DL models to evaluate the features and classify the quality and high noise. Thus, the data obtained through physiotherapy exercises efficiently. To test the model’s effi- skeleton joints is not accurate and has outliers which are ciency, various experiments were performed for the detection declining the model performance. and classification of physiotherapy exercises. The main com- Researchers have turned towards DL techniques for the ponents of the algorithm are discussed briefly below. detection of complex human activities. DL techniques extract features automatically from raw data during the training phase 3.1. Convolutional Neural Network. It is a type of deep neural and have produced remarkable results in many activity rec- network (DNN) having multiple layers. Its concept is origi- ognition tasks. It has tremendous applications in the field of nated from the receptive field and neural cognitive machine HAR, as the process of extracting features and classification are which is more sophisticated than a traditional neural network. performed simultaneously. The RNN-LSTM approach used in In the presence of additional deep layers, the DNN model [33, 34] has achieved outstanding performances and shown learns deeply compared to other shallow neural networks. excellent results when compared to traditional hand-crafted Dealing with the problem of image classification and recog- practices, nevertheless, it exaggerates the temporal and un- nition, the CNN has high distortion tolerance due to its spatial derstates the spatial information of input data. In many tasks structure and weight sharing mechanism [36]. The funda- such as NLP and speech recognition amongst others, CNN mental structures of the CNN are the amalgamation of weight achieved better or at least similar performance to that of re- sharing, subsampling, local receptive field, and dimensionality current neural network (RNN). Due to this trend, recently reduction during feature extraction. To reduce the complexity CNN is widely used in the literature to grab the HAR problems. of the model, enhance their performance and efficiently reg- Numerous studies showed that CNN-based approaches are far ulate the number of weights, the weight sharing mechanism is better than traditional hand-crafted approaches since CNN has used. CNN map input image data to an output variable i.e., it the ability to learn complex motion features [35]. The presented takes the input image data, processes the data, and predict the DL models for HAR are having simple architecture and great image class efficiently. Input data is in the form of a two-di- accuracy. However, these models were tested on simple mensional vector and CNN deals with it in a better way. In 3- datasets like standing, sitting, sleeping, walking upstairs and layer CNN-LSTM model, we used CNN to extract useful downstairs, and do not have a good generalization ability. features from the input image matrix. In this step, the phys- The main goal of this research work is to design a DL- iotherapy exercise image is taken as input which is processed to based algorithm for automated detection of rehabilitation extract features from it. In this work, the LSTM is used to physiotherapy exercises. In the suggested 3-Layer CNN- classify the input data under certain categories. The process of LSTM model, data is fed to the convolutional layer for the CNN for feature extraction is described in detail in Section 3.3. extraction of useful features, and for classification, it is The learning and classification technique of CNN is described passed to LSTM to recognize the rehabilitation exercise. mathematically through (1). In equation (1), Zi is the set of Batch normalization (BN) is applied to stabilize the learning inputs, Wi is the set of weights, and B is the bias operation. process and reduce internal covariate shift. The model au- tomatically detects the set of rehabilitation physiotherapy N exercises and classify them under certain categories. P � f⎝ ⎛ Zi × Wi + B⎠ ⎞. (1) i�1 3. Methodology For rehabilitation of stroke patients mostly two conventional 3.2. Long Short-Term Memory. LSTM is used to evade the techniques are used: artificially and robot-assisted. However, problem of gradient vanishing or gradient exploding during due to the cost of intelligent robots and their supplementary training. The back propagation (BP) algorithm is used to
4 Journal of Healthcare Engineering .... .... .... SoftMax and output Batch layer Input data Pre-processing Conv + ReLU + Maxpooling LSTM Normalization + Dropout layer Fully connected layer Figure 1: Architecture of 3-Layer CNN-LSTM model. update the weights of the neural network. The BP algorithm The σ and Tanh is a nonlinear activation function and UI, first calculates the gradient using the chain rule and then WI, UF, WF, UO, WO, UL, and WL are the respective weights updates the weights of the network on the base of the calculated which are having M × 2N dimensions, where M shows the loss. The BP starts from the output layer and the whole network number of memory cells and N shows the dimension of the is traversed towards the input layer which is facing vanishing input vector. The mathematics of LSTM behind the whole gradient or exploding gradient problems while updating the process is formalized in [38] as shown in equations (2) to (7). weights in DNN. So, to avoid the said problem of gradient explosion or gradient vanishing during training traditional RNNs, an LSTM algorithm is proposed. Furthermore, RNNs 3.3. Data Representation and Feature Extraction. Data rep- are unable to memorize long sequences of data, while LSTM resentation is the first section of the suggested approach efficiently deals with it. LSTM is a type of RNN and the building which contains data collection, preprocessing, and feature block of artificial neural network which is having additional extraction. Data for this research study were collected memory cells for time steps and remembers the past infor- through an RGB camera, from participants performing mation. The process diagram of LSTM is shown in Figure 2. It exercises under the direction of a physiotherapist. Data is capable to remember and learn long-term sequences. LSTM augmentation is used to reduce overfitting and enlarge the consists of 4 different components: input gate (It), output gate dataset artificially as shown in Figure 3. After the collection (Ot), forget gate (Ft), and cell state (Ct) at time step (t) [37]. The of raw images, the next step is to preprocess them prior to past information is stored in the state vector of Ct−1 . The It the implementation of any proceeding functionalities. Data decides how to update the state vector using the current input preprocessing involves data cleaning such as noise removal, information. The data which are added to the state from the resizing and filling, or removing null values. current input is represented through Lt vector. Zt represents Data for each rehabilitation physiotherapy exercise are input vector at time step t, Ht, and Ht−1 is the current and combined in one file named as “Categories” and pre- previous cell output, Ct and Ct−1 is the current and previous processed by resizing each image to reduce complexity. This memory cell, (x) is elementwise multiplication, and W, U whole process is shown in Algorithm 1. represent weights of the four gates i.e. It, Ot, Ft, and Ct. Due to To minimize the set of features and enhance the effi- this structure of LSTM, it is applied to learn efficiently complex ciency of the classification algorithm, useful features are sequences of data. extracted from the data. The extraction of correct features is an exciting job for the recognition of physiotherapy exercises Lt � tanh Zt × WL + Ht−1 xUL , (2) for which CNN is used. In feature extraction phase con- volution operation, pooling operation and ReLU (rectified Ft � σ Zt xWF + Ht−1 xUF , (3) linear unit) activation function is applied as shown in equations (8) and (9), respectively It � σ Zt xWI + Ht−1 xUI , (4) The physiotherapy exercise data named as, New_array is taken as input and passed through convolutional layer followed by ReLU, pooling, and dropout layer to extract Ot � σ Zt xWo + Ht−1 xUO , (5) useful features from it. The input is of order 2 matrix with HxW (rows, columns), H and W are indexed as (m, n), Ct � Ft xCt−1 + It xLt , (6) where 0 ≤ m ≤ H, 0 ≤ n ≤ W. The final and useful feature map values i.e., F_ym, n is obtained through convolutional op- Ht � Ot x tanh Ct . (7) eration layer 3
Journal of Healthcare Engineering 5 Ht x Ct Ct-1 Tanh x Ft It Lt Ot δ δ Tanh δ x U Ht-1 Ht Zt W Figure 2: Process diagram of LSTM. Figure 3: Data augmentation. Om,n � f[j, k]Z[m − j, n − k], BN is applied to stabilize the learning process and reduce (8) internal covariate shift. During the training period, at each m n middle layer, the BN calculates the mean and variance as F Ym,n � max 0, F 0m,n . (9) shown through equations (10) and (11). For each layer, the normalized input is gained from the previously calculated The activation function is applied at every layer to make mean and variance as shown in equation (12). the model capable of solving nonlinear problems as shown in 1 n equation (9), while to minimize the computational load, max mean � Z , (10) pooling and dropout technique is used. m i�1 i 1 n var � Z − mean 2 , (11) 3.4. Classification. The 3-layer CNN-LSTM is used as a m i�1 i learning algorithm for feature extraction and its classifica- tion. The whole process of feature extraction and its clas- Zi − mean sification under certain categories are shown in Algorithm 2. Z′i � √������ . (12) var + ε For classification of rehabilitation physiotherapy exercise pretrained LSTM is applied as explained in detail in Section During the training of the network, c (standard devia- 3.2. The pretrained LSTM is followed by fully connected tion) and β (mean parameter) along other parameters are (FC) layers, batch normalization (BN) layer, and SoftMax learned. The final mean and variance equations for testing function. The proposed algorithm makes it possible to see the model are shown in equation (13) to (15), respectively that during the training phase, the accuracy is much more [39]. In these equations’ “j” shows the number of batches than the testing phase, which is because of overfitting and where each batch has “m” samples. The final mean and outcome a complication in the regularization and balancing variance are assessed from the previous mean, variance of hyperparameters. formulas calculated for each batch during training.
6 Journal of Healthcare Engineering Input: rehabilitation exercise, labelled dataset categories Preprocessing: (1) Categories ⟵ Dataset categories (2) Apply data augmentation on Categories (3) for each_Image in Categories a: Image_array ⟵ (Image, “gray”) b: Drop null values c: New_array � Resize (Image_array, (256, 256)) end for Preparation: Training_data � [ ] (4) For each_class in Categories a: class_num ⟵ Categories (each_class) b: Training_data ⟵ (New_array, class_num) end for (5) Create Training_data (6) Shuffle randomly (Training_data) ALGORITHM 1: Data preprocessing for training. Consequently, after all in BN, the layers are normalized To increase the number of convolution filters, the through final mean and variance procedures. model learns more complex features which ultimately j increases the number of parameters and causes overfitting 1 issues. So, the accurate and balanced selection of filters at f mean � mean(i) , (13) m i�1 each layer is important. At the first layer, we used 64 filters. In the second convolutional layer, the number of filters is 1 m j doubled compared to the first convolution layer and so on, f var � var(i), (14) to cope with downsampling caused by the pooling layer. m m − 1 i�1 The selected number of filters shown in Table 2 outper- forms the other combination of filters at different layers. In c c × f mean the start, a reduced size filter is used to learn low-level f yi � �������� (Z) + β + �������� . (15) f var + ε f var + ε features, nevertheless, for high level and specific features, large size filters perform better. In layer 1, kernel size of The purpose of training the model is to adjust the filter (5,5) is selected, whereas in layers 2 and 3, kernel size is weights such that the predicted class should be as close as reduced. The idea behind selecting a large filter size at the possible to the actual class. During training, the network runs in start is that it read generic features in one value and its the forward direction to get the resultant predicted value. The effect is more globally on the whole image, but missing loss function is calculated to evaluate how well our proposed local features. In layer 2 and layer 3, a small filter size is model is working. To compare the predicted value with the used to learn local and specific features. The number of corresponding target value through continuous forward pass filters and size of the filter are selected by the hit and trial running, the total loss at the last layer is obtained. The loss is method. guiding the model to update parameters to reduce the error We used different batch sizes and monitor the model rate. The relative probability of real values at the output layer is performance. By selecting 32 batch sizes, the highest ac- calculated through the SoftMax function to recognize the re- curacy is achieved. An optimal learning rate of 0.1 along with habilitation exercise. A short comparison of the proposed 36 epochs is used in the training stage to improve the fitting model to the current state-of-the-art models is given in Table 1. ability of the model. The impact of changing learning rate and the number of epochs was studied and it was concluded 4. Results and Discussion that by reducing the learning rate, the process takes a long time to converge while the high learning rate results in the The model is trained in a fully supervised manner, and the process to converge quickly. It is observed that the learning gradient is backpropagated from the SoftMax to CNN layer rate and the number of epochs have an inter-relationship to reduce the loss. Through randomly selected values the bias with each other and affect model performance. During and weights are initialized at each layer. training the 3-layer CNN-LSTM model, Adam optimizer is used which has the best fitting effect on model performance 4.1. Hyperparameter Selection. During classification, the and gives the highest accuracy. For training purposes, nu- model performance is greatly affected by the selection of merous combinations of hyperparameters are used and hyperparameters. The impact of different hyperparameters tested by using the hit and trial method for parameters such as the number of convolution filters, batch size, learning selection, and finally, the best parameters giving the highest rate, kernel size, pooling size, epochs, and type of optimizer is performance results are selected. The list of selected observed on model performance and explained as follows. hyperparameters is shown in Table 2.
Journal of Healthcare Engineering 7 Input: unobserved exercise image Initialization: (1) Zm,n array of image (m rows, n column) at convolution layer 1 (2) Fi, j filter (i rows, j column) (3) Om,n resultant array obtained after convolution (4) Ym,n output array after removing negative values (5) ∗ sum of product operation (6) FM feature map function (7) Qm,n output array at 2nd convolution layer (8) F_Ym, n output at 3rd convolution layer Preparation: (1) Load CNN model (2) Load trained LSTM model Steps: (3) CNN ⟵ New_array (Zi) (4) Load FM in Conv1: (convolution layer 1), filter_size (5, 5) Number of rows and columns (m, n) a: Om,n ⟵Fi, j × Zm,n No. of filters ⟵ 64 b: Ym, n ⟵ max(0, Om,n ) c: Max_pooling (4, 4) d: Dropout (0.5) (5) Load FM in Conv2: (convolution layer 2), filter size (3, 3) a: Pm,n ⟵ Fi, j × Ym,n No. of filters ⟵ 128 b: Qm, n ⟵ max(0, Pm,n ) c: Max_pooling (2, 2) d: Dropout (0.5) (6) : Load FM in Conv3: (convolution layer 3), filter size (3, 3) a: F Om,n ⟵ Fi, j × Qm,n No. of filters ⟵ 256 b: F_Ym, n ⟵ max (0, F_0m, n) c: Max_pooling ⟵ 2 × 2 (7) LSTM (64) ⟵ F_Ym, n after Conv3 (8) FC Layer ⟵ Dense (64) (9) Predicted values ⟵ Dense (64) (10) Pass the predicted values through the BN layer (11) Calculate Loss ⟵ (ground_truth_value–predicted_value) (12) SoftMax function ⟵ predicted exercise Output: display and label the predicted exercise ALGORITHM 2: 3-layer CNN-LSTM model for the detection of rehabilitation exercise. Table 1: Comparison of 3-layer CNN-LSTM model with other standard models. S. No Other standard models Proposed CNN-LSTM model RNN-LSTM model used in [33] for HAR system and achieved CNN is used for feature extraction and selection of useful great accuracy, but this model exaggerates the temporal and 1 features, while LSTM is used for exercise recognition. This model understates the spatial information as both of the models best fit maintains a balance between spatial and temporal information. for temporal data. In the proposed model, 3-layer CNN is applied first to process In [40], LSTM-CNN model is used for activity recognition. The spatial input data. The data is then fed to the LSTM layer to 2 LSTM is used before CNN to process input data which is not further refine the extracted data and detect the rehabilitation efficient for the processing of spatial input data. exercise. The CNN model for exercise recognition was tested and The model learned about 392,765 parameters which conclude 3 observed that CNN learn too many complex parameters of about that CNN-LSTM model is lightweight which has reduced 2,575,753 during training. complexity and achieved better accuracy. The 3-layer CNN-LSTM model learns activity feature In [30], KNN is applied to recognize human activity which fails automatically and handles these issues efficiently. We used an 4 to address occlusion, deformation, and viewpoint variation, as RGB camera instead of Kinect sensors to reduce complexity and KNN is using hand-crafted techniques. processing time.
8 Journal of Healthcare Engineering Table 2: List of hyperparameters selected for training. 4.4.1. Precision. It is the ratio of true positive (TP) to TP and Processing stage Hyperparameters Values selected false positive (FP) observation, which is predicted positive and is calculated as Filters 64 Kernel size 5 TP Convolution_1 Stride 1 Precision � . (16) (TP + FP) Max pooling 4 Filters 128 Convolution_2 Kernel size 3 4.4.2. Recall. The recall is the ratio of the predicted true Max pooling 2 positive observation to true positive and false negative (FN) Filters 256 observation, which is actually positive and is calculated as Convolution_3 Kernel size 3 Max pooling 2 follows: Learning rate 0.1 TP Recall � . (17) Epochs 36 (TP + FN) Training parameters Batch size 32 Optimizer Adam 4.4.3. F1-Score. F1-score is the weighted average of recall and precision and calculated as 4.2. Dataset of Rehabilitation Exercises. A physiotherapist employs many complex exercises for the rehabilitation of 2 × recall × precision F1-score � . (18) stroke patients depending on the type and severity of stroke. recall + precision Rehabilitation exercises include but are not limited to flexion and extension of the neck, flexion and extension of the trunk, flexion and extension of the knee joint, flexion and extension 4.4.4. Accuracy. It is the ratio of correctly classified activities of the hip joint, flexion and extension of the wrist, abduction to the total number of classified activities. and adduction of the upper limb, and dorsiflexion and plantar flexion of the foot as shown in Figure 4. (TP + TN) Accuracy � . (19) In this research work, we generated our own dataset (TP + FP + TN + FN) under the direction of a physiotherapist consisting of 2250 different samples. The data is recorded from participants performing different rehabilitation exercises through an 4.5. Results. The confusion matrix of rehabilitation exercises RGB camera. The participants include males, females, and is shown in Figure 5, which shows the true label at the y-axis children having up to 40 years of age. The description of the and the predicted label at the x-axis. The numbering from 0 dataset is given in Table 3. to 6 shows the set of different exercises e.g. dorsiflexion, neck exercise, plantar flexion, trunk extension, trunk flexion, wrist extension, and wrist flexion, respectively. The classi- 4.3. Training. The model training is performed on a Dell fication report of 3-layer CNN-LSTM along with the CNN laptop with an Intel Core i7 processor and 16 GB RAM model is shown in Table 4. equipped with 64 bits operating system The classification The performance evaluation of the 3-layer CNN-LSTM model is implemented in Python 2.7.0 with Jupyter note- model is obtained in terms of precision, recall, f1-score, and book. The main theme of the training is to adjust the filter accuracy which is calculated according to equations (16) to weights such that the predicted class should be as close as (19). The performance of the model for discrete rehabili- possible to the actual class. The dataset is divided into two tation exercises is evaluated through Figure 6. To validate the parts. The first part is having 80% of data which is used for dominance of the suggested 3-layer CNN-LSTM model, it is training purposes while the second part is having 20% of compared with KNN [30] and CNN models. The overall data and is used for testing the efficiency of the model. The accuracy achieved by KNN, CNN, and 3-layer CNN-LSTM Adam optimization algorithm is used for adjusting the model is given in Table 5 while represented graphically in weights in such a way to move from a large loss point to a Figure 7. Consequently, we see from Table 5, a gradual small loss point using the error backpropagation method for decrease in test errors and an increase in accuracy. The optimization. According to the activation function, the model achieved the highest recall of 96%, precision of 95%, weights are then updated. f1-score of 95% for dorsiflexion of the foot, and lowest recall of 90%, precision of 84%, and f1-score of 87% for the wrist flexion as shown in Figure 6. 4.4. Performance Evaluation Metrics. To scrutinize the The precision, recall, and f1-scores are calculated performance of the model various evaluation metrics are to validate the performance, in case the dataset in a class used, which shows the reliability of the model in examining is imbalanced and accuracy may produce deceptive the rehabilitation exercise. The most common metrics used results. for performance evaluation are recall, f1-score, precision, In 3-layer CNN-LSTM model, the LSTM which is a and accuracy [41–44]. variant of RNN is the primary learning element and
Journal of Healthcare Engineering 9 (a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) Figure 4: List of rehabilitation exercises, from left to right: (a) extension of the neck; (b) rotation of neck; (c) flexion of the trunk side view; (d) flexion of the trunk front view; (e), (f ) extension of the elbow joint (front and side view); (g) dorsiflexion of the foot; (h) plantar flexion of the foot; (i), (j) extension and flexion of the knee joint; (k) extension of the trunk; and (l) flexion of the wrist. Table 3: Description of the dataset. 398 0 4 0 0 14 0 0 Total samples of rehabilitation exercise: 2250 0 146 1 5 4 2 7 320 1 Number of participants: 20 7 6 184 2 4 4 4 240 2 Rehabilitation exercise No. of samples Flexion and extension of neck 332 4 2 5 146 3 3 5 3 Flexion and extension of trunk 328 160 2 5 0 0 148 3 2 4 Flexion of knee joint 348 Flexion and extension of wrist 489 6 5 0 2 3 300 9 80 5 Dorsiflexion and plantar flexion 419 Abduction of upper limb 334 4 3 1 1 4 4 147 6 0 0 1 2 3 4 5 6 Figure 5: Confusion matrix of 3-layer CNN-LSTM model. produced better or at least the same accuracy compared to other state-of-the-art models on the prescribed datasets. The model was tested on different types of data, which the the model is not overfitted and is performing better in model had not seen before, and observed that the CNN- situations like that of occlusion, viewpoint variation, and LSTM model has the same best accuracy. It confirms that deformation.
10 Journal of Healthcare Engineering Table 4: Classification report of 3-layer CNN-LSTM and CNN model. CNN-LSTM model CNN model Exercises Precision Recall F1-score Precision Recall F1-score Dorsiflexion 0.95 0.96 0.95 0.93 0.99 0.96 Neck exercise 0.87 0.88 0.88 0.87 0.87 0.87 Plantar flexion 0.94 0.87 0.91 0.91 0.86 0.89 Trunk extension 0.94 0.87 0.90 0.93 0.81 0.86 Trunk flexion 0.89 0.93 0.91 0.95 0.84 0.89 Wrist extension 0.91 0.92 0.92 0.85 0.90 0.88 Wrist flexion 0.84 0.90 0.87 0.84 0.90 0.87 98 96 94 92 Performance (%) 90 88 86 84 82 80 78 wrist Exercise NECK Exercise wrist Flexion Trunk Exercise DorsiFlexion PlanterFlexion Trunk Flexion List of Exercises precision Recall f1-score Figure 6: Performance in terms of precision, recall, and f1-score. Table 5: Performance comparison of different models. S. No Model Accuracy (%) 1 KNN [30] 86.1 2 CNN 89.9 3 3-layer CNN-LSTM 91.3 3 -Layer CNN -LSTM Model CNN Model 100 100 90 90 80 80 Accuracy and Loss Accuracy and Loss 70 70 60 60 50 50 40 40 30 30 20 20 10 10 0 0 0 3 6 9 12 15 18 21 24 27 30 33 36 0 3 6 9 12 15 18 21 24 27 30 33 36 Number of Epochs Number of Epochs Loss Loss Accuracy Accuracy Figure 7: Line graph showing the average accuracy of CNN-LSTM and CNN model.
Journal of Healthcare Engineering 11 5. Conclusion and Future Work Therapy : Official Publication of the American Occupational Therapy Association, vol. 70, no. 3, pp. 7003290040–11, 2016. Deep learning models have powerful learning abilities in [9] S. Skaria, A. Al-Hourani, M. Lech, and R. J. Evans, “Hand- dealing with deformation, viewpoint variation, occlusion, gesture recognition using two-antenna Doppler radar with and background switches. In the suggested model, a DNN deep convolutional neural networks,” IEEE Sensors Journal, algorithm is implemented that combines CNN and LSTM as vol. 19, no. 8, pp. 3041–3048, 2019. 3-layer CNN-LSTM for the detection of rehabilitation ex- [10] W. Tao, T. Liu, R. Zheng, and H. Feng, “Gait analysis using wearable sensors,” Sensors, vol. 12, no. 2, pp. 2255–2283, 2012. ercises. After the fully connected layer, a BN layer is added to [11] F. Karray, M. Alemzadeh, J. Abou Saleh, and M. Nours Arab, reduce the internal covariate shift and speed up the con- “Human-computer interaction: overview on state of the art,” vergence procedure of the model. In the proposed archi- International Journal on Smart Sensing and Intelligent Sys- tecture, the data collected by an RGB camera under the tems, vol. 1, no. 1, pp. 137–159, 2008. direction of a physiotherapist is fed into a 3-layer CNN [12] J. Reyes-Campos, G. Alor-Hernández, I. Machorro-Cano, followed by an LSTM layer. The CNN along with LSTM is J. O. Olmedo-Aguirre, J. L. Sánchez-Cervantes, and making the model proficient in learning the spatial and L. Rodrı́guez-Mazahua, “Discovery of resident behavior temporal dynamics at various time slots. The parameters are patterns using machine learning techniques and IoT para- learned over CNNs and further classified by LSTM to attain digm,” Mathematics, vol. 9, no. 3, pp. 1–25, 2021. better accuracy and preserve a high recognition rate. In the [13] L. Verde, G. De Pietro, and G. Sannino, “Voice disorder future, the model shall be trained on more complex datasets identification by using machine learning techniques,” IEEE Access, vol. 6, pp. 16246–16255, 2018. to detect complex rehabilitation physiotherapy exercises. [14] M. Elhoseny, “Multi-object detection and tracking (MODT) Moreover, the algorithm will be updated in such a way that machine learning model for real-time video surveillance can be used at home for a patient to carry out the prescribed systems,” Circuits, Systems, and Signal Processing, vol. 39, rehabilitation exercises without direct in-person supervision no. 2, pp. 611–630, 2020. of physical therapists. [15] M. Xi, N. Lingyu, and S. Jiapeng, “RETRACTED: research on urban anti-terrorism intelligence perception system from the Data Availability perspective of Internet of things application,” International Journal of Electrical Engineering Education, vol. 58, no. 2, The data that support the findings of this study are available pp. 248–257, 2021. upon request from the first author. [16] I. Ullah, M. Jian, S. Hussain et al., “Global context-aware multi-scale feature aggregative network for salient object detection,” Neurocomputing, vol. 455, pp. 139–153, 2021. Conflicts of Interest [17] J. P. Zhu, H. Q. Chen, and W. Bin Ye, “Classification of human activities based on radar signals using 1D-CNN and The authors declare that they have no conflicts of interest. LSTM,” in Proceedings of the IEEE Int. Symp. Circuits Syst, no. 1–5, Seville, Spain, October 2020. [18] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, “A survey References of the recent architectures of deep convolutional neural [1] J. Bernhardt, P. Langhorne, and G. Kwakkel, “Stroke care 2 : networks,” Artificial Intelligence Review, vol. 53, no. 8, stroke rehabilitation,” Lancet, vol. 377, no. 9778, pp. 1693– pp. 5455–5516, 2020. 1702, 2011. [19] B. Schuster-Böckler and A. Bateman, “An introduction to [2] I. E. Nikityuk, G. A. Ikoeva, and O. I. Kivoenko, “The vertical hidden Markov models,” Curr. Protoc. Bioinforma, vol. 18, balance management system is more synchronized in children no. 1, 2007. with cerebral paralysis than in healthy children,” Pediatric [20] S. J. D. Prince and J. H. Elder, “Probabilistic linear dis- Traumatology, Orthopaedics and Reconstructive Surgery, criminant analysis for inferences about identity,” in Pro- vol. 5, no. 3, pp. 49–57, 2017. ceedings of the IEEE Int. Conf. Comput. Vis., no. 2, pp. 1–8, [3] D. Nizamutdinov and L. A. Shapiro, “Overview of traumatic Seoul, Republic of Korea, October 2007. brain injury: an immunological context,” Brain Sciences, [21] M. Schonlau and R. Y. Zou, “The random forest algorithm for vol. 7, no. 1, pp. 1–11, 2017. statistical learning,” STATA Journal, vol. 20, no. 1, pp. 3–29, [4] L. Adolfsson, “Post-traumatic stiff elbow,” EFORT Open 2020. Reviews, vol. 3, no. 5, pp. 210–216, 2018. [22] M. Maalouf, “Logistic regression in data analysis: an over- [5] R. J. Oskouian, C. A. Sansur, and C. I. Shaffrey, “Congenital view,” International Journal of Data Analysis Techniques and abnormalities of the thoracic and lumbar spine,” Neurosur- Strategies, vol. 3, no. 3, pp. 281–299, 2011. gery Clinics of North America, vol. 18, no. 3, pp. 479–498, 2007. [23] S. Huang, C. A. I. Nianguang, P. Penzuti Pacheco, [6] S. Esposito and M. R. Longo, “Guillain-Barré syndrome,” S. Narandes, Y. Wang, and X. U. Wayne, “Applications of Autoimmunity Reviews, vol. 16, no. 1, pp. 96–101, 2017. support vector machine (SVM) learning in cancer genomics,” [7] K. N. Borschmann and K. S. Hayward, “Recovery of upper CANCER GENOMICS and PROTEOMICS, vol. 15, no. 1, limb function is greatest early after stroke but does continue to pp. 41–51, 2018. improve during the chronic phase: a two-year, observational [24] Y. Y. Song and Y. Lu, “Decision tree methods: applications for study,” Physiotherapy, vol. 107, pp. 216–223, 2020. classification and prediction,” Shanghai Arch. Psychiatry, [8] C. A. Doman, K. J. Waddell, R. R. Bailey, J. L. Moore, and vol. 27, no. 2, pp. 130–135, 2015. C. E. Lang, “Changes in upper-extremity functional capacity [25] A. V. Vokhmintcev, I. V. Sochenkov, V. V. Kuznetsov, and and daily performance during outpatient occupational ther- D. V. Tikhonkikh, “Face recognition based on a matching apy for people with stroke,” American Journal of Occupational algorithm with recursive calculation of oriented gradient
12 Journal of Healthcare Engineering histograms,” Doklady Mathematics, vol. 93, no. 1, pp. 37–41, [41] V. Bijalwan, V. B. Semwal, and V. Gupta, “Wearable sensor- 2016. based pattern mining for human activity recognition: deep [26] J. S. Raikwal and K. Saxena, “Performance evaluation of SVM learning approach,” Industrial Robot, pp. 1–10, 2021. and K-nearest neighbor algorithm over medical data set,” [42] I. Khan, S. Ali, and S. Khusro, “Smartphone-based lifelogging: International Journal of Computing and Applications, vol. 50, an investigation of data volume generation strength of no. 14, pp. 35–39, 2012. smartphone sensors,” in Proceeding of the International [27] A. Das Antar, M. Ahmed, and M. A. R. Ahad, “Challenges in Conference on Simulation Tools and Techniques, pp. 63–73, sensor-based human activity recognition and a comparative Springer, Chengdu, China, July 2019. analysis of benchmark datasets: a review,” in Proceeding of the [43] I. Khan, S. Khusro, S. Ali, and A. U. Din, “Daily life activities 2019 Jt. 8th Int. Conf. Informatics, Electron. Vision, ICIEV on smartphones and their effect on battery life for better 2019 3rd Int. Conf. Imaging, Vis. Pattern Recognition, icIVPR personal information management,” Proceedings of the 2019 with Int. Conf. Act. Behav. Comput., pp. 134–139, Pakistan Academy of Sciences: A. Physical and Computational Washington, DC, USA, June 2019. Sciences, vol. 53, no. 1, pp. 61–74, 2016. [28] B. K. Yousafzai, S. A. Khan, T. Rahman et al., “Student- [44] I. Khan, S. Khusro, and I. Alam, “Smartphone distractions and performulator: student academic performance using hybrid its effect on driving performance using vehicular lifelog deep neural network,” Sustainability, vol. 13, no. 17, p. 9775, dataset,” in Proceeding of the 2019 International Conference on Electrical, Communication, and Computer Engineering 2021. (ICECCE), pp. 1–6, IEEE, Swat, Pakistan, July 2019. [29] Z. Ali, G. Qi, K. Muhammad, P. Kefalas, and S. Khusro, “Global citation recommendation employing generative adversarial network,” Expert Systems with Applications, vol. 180, 2021. [30] Ö. F. İnce, I. F. Ince, M. E. Yıldırım, J. S. Park, J. K. Song, and B. W. Yoon, “Human activity recognition with analysis of angles between skeletal joints using a RGB-depth sensor,” ETRI Journal, vol. 42, no. 1, pp. 78–89, 2020. [31] M. Ye, Q. Zhang, L. Wang, and J. Zhu, “A survey on human motion analysis,” Time-of-Flight Depth Imaging. Sensors, Algorithms, Appl.vol. 8200, pp. 149–187, 2013. [32] B. J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, and R. M. Cook, “Moore “real-time human pose from single depth images,” Proceeding of the. 2011 Conf. Comput. Vis. Pattern Recognit, vol. 56, pp. 1295–1304, Denver, CO, USA, June 2013. [33] P. Agarwal and M. Alam, “A lightweight deep learning model for human activity recognition on edge devices,” Procedia Computer Science, vol. 167, no. 2019, pp. 2364–2373, 2020. [34] W. A. Abro, G. Qi, Z. Ali, Y. Feng, and M. Aamir, “Multi-turn intent determination and slot filling with neural networks and regular expressions,” Knowledge-Based Systems, vol. 208, 2020. [35] R. Bibi, Y. Saeed, A. Zeb et al., “Edge AI-based automated detection and classification of road anomalies in VANET using Deep Learning,” Computational Intelligence and Neu- roscience, 2021. [36] H. Salman, J. Grover, and T. Shankar, Hierarchical Rein- forcement Learning for Sequencing Behaviors, vol. 2733, pp. 2709–2733, 2018. [37] J. Zhu, H. Chen, and W. Ye, A Hybrid CNN-LSTM Network for the Classification of Human Activities Based on Micro- doppler Radar, IEEE Access, vol. 8, pp. 24713–24720, 2020. [38] M. Milenkoski, K. Trivodaliev, S. Kalajdziski, M. Jovanov, and B. R. Stojkoska, “Real time human activity recognition on smartphones using LSTM networks,” in Proceeding of the 2018 41st Int. Conv. Inf. Commun. Technol. Electron. Microelectron. MIPRO 2018 - Proc., pp. 1126–1131, Opatija, Croatia, May 2018. [39] P. Luo, X. Wang, W. Shao, and Z. Peng, “Towards under- standing regularization in batch normalization,” in Proceeding of the 7th Int. Conf. Learn. Represent. ICLR 2019, pp. 1–23, New Orleans, LA, USA, May 2019. [40] K. Xia, J. Huang, and H. Wang, “LSTM-CNN architecture for human activity recognition,” IEEE Access, vol. 8, pp. 56855– 56866, 2020.
You can also read