Automated Detection of Rehabilitation Exercise by Stroke Patients Using 3-Layer CNN-LSTM Model

Page created by Christopher Klein

Health & Fitness

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Hindawi
Journal of Healthcare Engineering
Volume 2022, Article ID 1563707, 12 pages
https://doi.org/10.1155/2022/1563707

Research Article
Automated Detection of Rehabilitation Exercise by Stroke
Patients Using 3-Layer CNN-LSTM Model

Zia Ur Rahman ,1 Syed Irfan Ullah,1 Abdus Salam ,1 Taj Rahman ,2 Inayat Khan ,3
and Badam Niazi 4
1
Department of Computing and Technology Abasyn University, Peshawar 25000, Pakistan
2
Qurtuba University of Science and Technology Peshawar, Peshawar 25000, Pakistan
3
Department of Computer Science, University of Buner, Buner 19290, Pakistan
4
Department of Computer Science, University of Nangarhar, Jalalabad 2600, Afghanistan

Correspondence should be addressed to Badam Niazi; niazi5.48@gmail.com

Received 10 December 2021; Accepted 12 January 2022; Published 4 February 2022

Academic Editor: Liaqat Ali

Copyright © 2022 Zia Ur Rahman et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
According to statistics, stroke is the second or third leading cause of death and adult disability. Stroke causes losing control of the
motor function, paralysis of body parts, and severe back pain for which a physiotherapist employs many therapies to restore the
mobility needs of everyday life. This research article presents an automated approach to detect diﬀerent therapy exercises
performed by stroke patients during rehabilitation. The detection of rehabilitation exercise is a complex area of human activity
recognition (HAR). Due to numerous achievements and increasing popularity of deep learning (DL) techniques, in this research
article a DL model that combines convolutional neural network (CNN) and long short-term memory (LSTM) is proposed and is
named as 3-Layer CNN-LSTM model. The dataset is collected through RGB (red, green, and blue) camera under the supervision of
a physiotherapist, which is resized in the preprocessing stage. The 3-layer CNN-LSTM model takes preprocessed data at the
convolutional layer. The convolutional layer extracts useful features from input data. The extracted features are then processed by
adjusting weights through fully connected (FC) layers. The FC layers are followed by the LSTM layer. The LSTM layer further
processes this data to learn its spatial and temporal dynamics. For comparison, we trained CNN model over the prescribed dataset
and achieved 89.9% accuracy. The conducted experimental examination shows that the 3-Layer CNN-LSTM outperforms CNN
and KNN algorithm and achieved 91.3% accuracy.

1. Introduction cerebrum paralysis [2], trauma and paralytic injury [3],
posttraumatic stiffness [4], congenital deformity [5], and
Stroke is a worldwide healthcare problem which causes due Guillain–barre syndrome [6]. Injuries to the cervical spinal
to heart failure or malfunctioning of blood vessels. It is a cord usually result in loosened leg and arms functions where
common, dangerous, and disabling health disease that af- hip flexors and legs are degraded by lumbar and spinal cord
fects people all around the world. Stroke is the second or injuries. The survivors of a stroke have a similar condition
third leading cause of death in most regions, as well as one of since they must relearn the lost skills when their brain is hit
the leading causes of acquired adult disability [1]. Over the by a stroke.
next couple of decades, the frequency of stroke-related A physiotherapist employs many therapies, including
burden is predicted to rise. Stroke causes losing control of nerve reeducation, task coaching, and muscle strengthening
the motor function, incoordination or paralysis of all body to restore the mobility needs of everyday life. Different
parts, and severe back pain. Due to stroke, patients will have physiotherapy and rehabilitation programs are needed to
muscle and neurological trauma and disorders such as restore the function of the upper extremity and increase their

2 Journal of Healthcare Engineering

quality of life. Some exercises such as motor training physiotherapy exercises, (3) extracting and classifying ac-
(movement exercise), mobility training (restriction-in- tivity features automatically, (5) and having better or at least
duced), motion therapy (flow therapy), and repetitive task same accuracy as of the existing DL approaches laterally with
training (workout training) are very effective for learning fast convergence speed and good generalization ability.
and taking control of the body [7]. Both for upper and lower The manuscript is organized as follows: in Section 2 we
limbs, balancing exercises are of considerable benefit to take a look at some current techniques for HAR which is
increase the balance after spinal cord injury. It is obvious using machine learning and deep learning approaches.
that serious health problems may lead to death or acquired Section 3 explains CNN and LSTM algorithms and data
physical impairment due to injury to the backbone. Different preprocessing for the proposed model. Moreover, it contains
neuroplastic results have shown that it can be recovered a detailed overview of the 3-Layer CNN-LSTM model and its
partly through adequate rehabilitation exercises [8]. implementation. In Section 4, the performance of the 3-
Motor function controls mobility and muscle movement Layer CNN-LSTM model is explained along with their
and is a commonly recognized impairment due to stroke. To experimental results. Section 5, concludes the research work
reestablish motor function, the most important technique is with a brief summary.
to perform rehabilitation exercises under the direction of a
physiotherapist. The financial requirement to receive the 2. Literature Review
treatment is not easy, so the family can suffer from financial
burden. The resolution of this is a new virtual reality re- Machine learning models are used to learn the fundamental
habilitation problem, which uses sensor tools to capture and connections in data through experience while performing
recognize movements. The rehabilitation program requires some tasks and making decisions without explicit instruc-
physiological exercises like flexion, extension, abduction, tions [18]. For a very long time, ML models have been used
adduction, enlargement, sleeves, dorsiflexion, plantar flex- widely for HAR. Different types of models which can apply
ion, and rotation of various joints in patients with muscular for HAR depend on data type, the volume of data, number of
and neurological trauma and disorder. In the existing lit- activities, similarities among activities, and number of ac-
erature studies, most of the researchers have focused on the tivity classes. The existing ML models such as hidden
detection of human activities like standing, sitting, sleeping, Markov model (HMM) [19], linear discriminant analysis
walking up and down stairs, etc., but very less attention was (LDA) [20], random forest (RnF) [21], logistic regression
focused on the recognition and classification of rehabilita- [22], support vector machine (SVM) [23], decision tree (DT)
tion physiotherapy exercises which is a multifaceted area of [24], histogram oriented gradient (HOG) [25], and
HAR. K-nearest neighbour (KNN) [26] are used for human ac-
Hitherto, HAR has been widely used in numerous ap- tivity classification. Nevertheless, for precision and accu-
plications, like gesture recognition gait analysis, human- racies of the abovementioned algorithms, the selection of
computer interaction, home behavior analysis, personal different parameters like the method of distance calculation
health system, video surveillance, and antiterrorism moni- and the number of neighbors for KNN, choice of the kernel
toring [9–16]. It has the ability to learn in advance from raw for SVM, the tolerance value for LDA, and the number of
data around human activities. Currently, HAR is a popular trees for RnF plays a significant role which should be
research track, due to progression in the field of human- considered carefully [27–29]. These algorithms have
computer interaction. Generally, there are two types of achieved remarkable classification accuracies, nonetheless, it
HAR: sensor-based and video-based. Sensor-based HAR requires a lot of hand-tuning to formulate the data, feature
depends on the data learned through keen sensors. Due to engineering, preprocessing, and domain knowledge
the development of ubiquitous computing and sensor amongst others. These methods are not suitable in scenarios
technology, sensor-based HAR is more frequently used. To like indoor environments where confidentiality is required.
improve recognition accuracy, researchers have developed Some of the other approaches are highly vulnerable to il-
various types of sensing technologies such as techniques lumination disparities and background changes which are
based on static and dynamic sensors. The video-based HAR restraining their practical use.
takes advantage of the data acquired through various kinds A unique biometric system for detecting human actions
of cameras to determine human activities [17], which is in 3D space is proposed in [30] in which joint skeleton angles
becoming popular due to the reduced complexity and ease of recorded through an RGB depth sensor are used as input
availability of different kinds of cameras. In this research features. The angle information is stored using the sliding
article, for the detection of rehabilitation physiotherapy kernel method. Before lowering the data dimension, the
exercise, the dataset is collected through an RGB camera, Haar wavelet transform (HWT) is used to maintain feature
and then a 3-Layer CNN-LSTM algorithm is applied for the information. For dimensionality reduction, an averaging
detection of rehabilitation physiotherapy exercise. The 3- technique is applied to reduce computing costs and result in
Layer CNN-LSTM algorithm seeks to leverage the power of faster convergence. Moreover, the system may be used for
merging both CNN and LSTM and address the deficiencies elderly care and video surveillance, but there are a few
of existing approaches, laterally with the following charac- drawbacks to the suggested approach. First, improper
teristics: (1) the model is robust enough to perform equally skeletal detection leads to inaccurate angle calculations,
well or better on input data, (2) it is evaluated on our self- which causes the classification to be automatically misled. As
created complex dataset, having rehabilitation the system is trained on activities in two directions at various

Journal of Healthcare Engineering 3

angles and positions, there may be some confusion when maintenance, robot-assisted techniques are difficult to be
attempting to recognize an activity due to probable simi- used. Moreover, due to the shortage of healthcare providers,
larities in the positions and angles of different activities. artificial form of rehabilitation is also hard to be accessed. As
RGB-D images are beneficial for action recognition; how- we know that rehabilitation of stroke patients is a long-
ever, the computational complexity of the learning model lasting process due to which both robot-assisted and arti-
grows rapidly as the number of frames grows. As a result, the ficial forms of rehabilitation are not feasible. Therefore, an
system becomes more complex and slower. Thus, instead of automated rehabilitation training method is needed to ad-
an RGB-D camera, a simple RGB camera could be explored dress this issue. Consequently, we designed an automated
to broaden the applications of the HAR system. Activity model to recognize the physiotherapy exercise while keeping
sequence recording, function extraction, model imple- in view the existing HAR approaches. The architecture of the
mentation, and finally identification are the four essential proposed algorithm is shown in Figure 1.
steps in vision-based HAR [31, 32]. The Kinect-based re- The 3-Layer CNN-LSTM model seeks to leverage the
habilitation training system has arisen recently, which is power of merging both CNN and LSTM. The main approach is
utilized in most research studies. Some researchers focus on divided into two sections, the detection of physiotherapy ex-
skeleton data while most of the researchers use RGB-D data, ercise and its classification. The first section contains data
which is due to the fact that Kinect is based on depth sensors collection, preprocessing of images, dimensionality reduction,
and uses structured light, which is not accurate. On the basis and data augmentation. The second section is using a com-
of depth data, it obtains the skeleton data, which is of low bination of DL models to evaluate the features and classify the
quality and high noise. Thus, the data obtained through physiotherapy exercises efficiently. To test the model’s effi-
skeleton joints is not accurate and has outliers which are ciency, various experiments were performed for the detection
declining the model performance. and classification of physiotherapy exercises. The main com-
Researchers have turned towards DL techniques for the ponents of the algorithm are discussed briefly below.
detection of complex human activities. DL techniques extract
features automatically from raw data during the training phase
3.1. Convolutional Neural Network. It is a type of deep neural
and have produced remarkable results in many activity rec-
network (DNN) having multiple layers. Its concept is origi-
ognition tasks. It has tremendous applications in the field of
nated from the receptive field and neural cognitive machine
HAR, as the process of extracting features and classification are
which is more sophisticated than a traditional neural network.
performed simultaneously. The RNN-LSTM approach used in
In the presence of additional deep layers, the DNN model
[33, 34] has achieved outstanding performances and shown
learns deeply compared to other shallow neural networks.
excellent results when compared to traditional hand-crafted
Dealing with the problem of image classification and recog-
practices, nevertheless, it exaggerates the temporal and un-
nition, the CNN has high distortion tolerance due to its spatial
derstates the spatial information of input data. In many tasks
structure and weight sharing mechanism [36]. The funda-
such as NLP and speech recognition amongst others, CNN
mental structures of the CNN are the amalgamation of weight
achieved better or at least similar performance to that of re-
sharing, subsampling, local receptive field, and dimensionality
current neural network (RNN). Due to this trend, recently
reduction during feature extraction. To reduce the complexity
CNN is widely used in the literature to grab the HAR problems.
of the model, enhance their performance and efficiently reg-
Numerous studies showed that CNN-based approaches are far
ulate the number of weights, the weight sharing mechanism is
better than traditional hand-crafted approaches since CNN has
used. CNN map input image data to an output variable i.e., it
the ability to learn complex motion features [35]. The presented
takes the input image data, processes the data, and predict the
DL models for HAR are having simple architecture and great
image class efficiently. Input data is in the form of a two-di-
accuracy. However, these models were tested on simple
mensional vector and CNN deals with it in a better way. In 3-
datasets like standing, sitting, sleeping, walking upstairs and
layer CNN-LSTM model, we used CNN to extract useful
downstairs, and do not have a good generalization ability.
features from the input image matrix. In this step, the phys-
The main goal of this research work is to design a DL-
iotherapy exercise image is taken as input which is processed to
based algorithm for automated detection of rehabilitation
extract features from it. In this work, the LSTM is used to
physiotherapy exercises. In the suggested 3-Layer CNN-
classify the input data under certain categories. The process of
LSTM model, data is fed to the convolutional layer for the
CNN for feature extraction is described in detail in Section 3.3.
extraction of useful features, and for classification, it is
The learning and classification technique of CNN is described
passed to LSTM to recognize the rehabilitation exercise.
mathematically through (1). In equation (1), Zi is the set of
Batch normalization (BN) is applied to stabilize the learning
inputs, Wi is the set of weights, and B is the bias operation.
process and reduce internal covariate shift. The model au-
tomatically detects the set of rehabilitation physiotherapy N
exercises and classify them under certain categories. P � f⎝
⎛ Zi × Wi + B⎠
⎞. (1)
i�1

3. Methodology
For rehabilitation of stroke patients mostly two conventional 3.2. Long Short-Term Memory. LSTM is used to evade the
techniques are used: artiﬁcially and robot-assisted. However, problem of gradient vanishing or gradient exploding during
due to the cost of intelligent robots and their supplementary training. The back propagation (BP) algorithm is used to

4 Journal of Healthcare Engineering

....
....

....
SoftMax
and output
Batch layer
Input data Pre-processing Conv + ReLU + Maxpooling LSTM Normalization
+ Dropout layer Fully connected
layer
Figure 1: Architecture of 3-Layer CNN-LSTM model.

update the weights of the neural network. The BP algorithm The σ and Tanh is a nonlinear activation function and UI,
first calculates the gradient using the chain rule and then WI, UF, WF, UO, WO, UL, and WL are the respective weights
updates the weights of the network on the base of the calculated which are having M × 2N dimensions, where M shows the
loss. The BP starts from the output layer and the whole network number of memory cells and N shows the dimension of the
is traversed towards the input layer which is facing vanishing input vector. The mathematics of LSTM behind the whole
gradient or exploding gradient problems while updating the process is formalized in [38] as shown in equations (2) to (7).
weights in DNN. So, to avoid the said problem of gradient
explosion or gradient vanishing during training traditional
RNNs, an LSTM algorithm is proposed. Furthermore, RNNs 3.3. Data Representation and Feature Extraction. Data rep-
are unable to memorize long sequences of data, while LSTM resentation is the first section of the suggested approach
efficiently deals with it. LSTM is a type of RNN and the building which contains data collection, preprocessing, and feature
block of artificial neural network which is having additional extraction. Data for this research study were collected
memory cells for time steps and remembers the past infor- through an RGB camera, from participants performing
mation. The process diagram of LSTM is shown in Figure 2. It exercises under the direction of a physiotherapist. Data
is capable to remember and learn long-term sequences. LSTM augmentation is used to reduce overfitting and enlarge the
consists of 4 different components: input gate (It), output gate dataset artificially as shown in Figure 3. After the collection
(Ot), forget gate (Ft), and cell state (Ct) at time step (t) [37]. The of raw images, the next step is to preprocess them prior to
past information is stored in the state vector of Ct−1 . The It the implementation of any proceeding functionalities. Data
decides how to update the state vector using the current input preprocessing involves data cleaning such as noise removal,
information. The data which are added to the state from the resizing and filling, or removing null values.
current input is represented through Lt vector. Zt represents Data for each rehabilitation physiotherapy exercise are
input vector at time step t, Ht, and Ht−1 is the current and combined in one file named as “Categories” and pre-
previous cell output, Ct and Ct−1 is the current and previous processed by resizing each image to reduce complexity. This
memory cell, (x) is elementwise multiplication, and W, U whole process is shown in Algorithm 1.
represent weights of the four gates i.e. It, Ot, Ft, and Ct. Due to To minimize the set of features and enhance the effi-
this structure of LSTM, it is applied to learn efficiently complex ciency of the classification algorithm, useful features are
sequences of data. extracted from the data. The extraction of correct features is
an exciting job for the recognition of physiotherapy exercises
Lt � tanh Zt × WL + Ht−1 xUL , (2) for which CNN is used. In feature extraction phase con-
volution operation, pooling operation and ReLU (rectified
Ft � σ Zt xWF + Ht−1 xUF , (3) linear unit) activation function is applied as shown in
equations (8) and (9), respectively
It � σ Zt xWI + Ht−1 xUI , (4) The physiotherapy exercise data named as, New_array is
taken as input and passed through convolutional layer
followed by ReLU, pooling, and dropout layer to extract
Ot � σ Zt xWo + Ht−1 xUO , (5)
useful features from it. The input is of order 2 matrix with
HxW (rows, columns), H and W are indexed as (m, n),
Ct � Ft xCt−1 + It xLt , (6) where 0 ≤ m ≤ H, 0 ≤ n ≤ W. The final and useful feature map
values i.e., F_ym, n is obtained through convolutional op-
Ht � Ot x tanh Ct . (7) eration layer 3

Journal of Healthcare Engineering 5

 Ht

 x
 Ct
 Ct-1
 Tanh
 x
 Ft It Lt
 Ot
 δ δ Tanh δ x
 U

 Ht-1
 Ht

 Zt W

 Figure 2: Process diagram of LSTM.

 Figure 3: Data augmentation.

 Om,n � f[j, k]Z[m − j, n − k], BN is applied to stabilize the learning process and reduce
 (8) internal covariate shift. During the training period, at each
 m n
 middle layer, the BN calculates the mean and variance as
 F Ym,n � max 0, F 0m,n . (9) shown through equations (10) and (11). For each layer, the
 normalized input is gained from the previously calculated
 The activation function is applied at every layer to make mean and variance as shown in equation (12).
the model capable of solving nonlinear problems as shown in
 1 n
equation (9), while to minimize the computational load, max mean � Z , (10)
pooling and dropout technique is used. m i�1 i

 1 n
 var � Z − mean 2 , (11)
3.4. Classiﬁcation. The 3-layer CNN-LSTM is used as a m i�1 i
learning algorithm for feature extraction and its classiﬁca-
tion. The whole process of feature extraction and its clas- Zi − mean 
siﬁcation under certain categories are shown in Algorithm 2. Z′i � √������ . (12)
 var + ε
For classiﬁcation of rehabilitation physiotherapy exercise
pretrained LSTM is applied as explained in detail in Section During the training of the network, c (standard devia-
3.2. The pretrained LSTM is followed by fully connected tion) and β (mean parameter) along other parameters are
(FC) layers, batch normalization (BN) layer, and SoftMax learned. The ﬁnal mean and variance equations for testing
function. The proposed algorithm makes it possible to see the model are shown in equation (13) to (15), respectively
that during the training phase, the accuracy is much more [39]. In these equations’ “j” shows the number of batches
than the testing phase, which is because of overﬁtting and where each batch has “m” samples. The ﬁnal mean and
outcome a complication in the regularization and balancing variance are assessed from the previous mean, variance
of hyperparameters. formulas calculated for each batch during training.

6 Journal of Healthcare Engineering

 Input: rehabilitation exercise, labelled dataset categories
 Preprocessing:
 (1) Categories ⟵ Dataset categories
 (2) Apply data augmentation on Categories
 (3) for each_Image in Categories
 a: Image_array ⟵ (Image, “gray”)
 b: Drop null values
 c: New_array � Resize (Image_array, (256, 256)) end for
 Preparation: Training_data � [ ]
 (4) For each_class in Categories
 a: class_num ⟵ Categories (each_class)
 b: Training_data ⟵ (New_array, class_num) end for
 (5) Create Training_data
 (6) Shuﬄe randomly (Training_data)

 ALGORITHM 1: Data preprocessing for training.

Consequently, after all in BN, the layers are normalized To increase the number of convolution ﬁlters, the
through ﬁnal mean and variance procedures. model learns more complex features which ultimately
 j increases the number of parameters and causes overﬁtting
 1 issues. So, the accurate and balanced selection of ﬁlters at
f mean � mean(i) , (13)
 m i�1 each layer is important. At the ﬁrst layer, we used 64 ﬁlters.
 In the second convolutional layer, the number of ﬁlters is
 1 m
 j doubled compared to the ﬁrst convolution layer and so on,
 f var � var(i), (14) to cope with downsampling caused by the pooling layer.
 m m − 1 i�1 The selected number of ﬁlters shown in Table 2 outper-
 forms the other combination of ﬁlters at diﬀerent layers. In
 c c × f mean the start, a reduced size ﬁlter is used to learn low-level
 f yi � �������� (Z) + β + �������� . (15)
 f var + ε f var + ε features, nevertheless, for high level and speciﬁc features,
 large size ﬁlters perform better. In layer 1, kernel size of
 The purpose of training the model is to adjust the ﬁlter (5,5) is selected, whereas in layers 2 and 3, kernel size is
weights such that the predicted class should be as close as reduced. The idea behind selecting a large ﬁlter size at the
possible to the actual class. During training, the network runs in start is that it read generic features in one value and its
the forward direction to get the resultant predicted value. The eﬀect is more globally on the whole image, but missing
loss function is calculated to evaluate how well our proposed local features. In layer 2 and layer 3, a small ﬁlter size is
model is working. To compare the predicted value with the used to learn local and speciﬁc features. The number of
corresponding target value through continuous forward pass ﬁlters and size of the ﬁlter are selected by the hit and trial
running, the total loss at the last layer is obtained. The loss is method.
guiding the model to update parameters to reduce the error We used diﬀerent batch sizes and monitor the model
rate. The relative probability of real values at the output layer is performance. By selecting 32 batch sizes, the highest ac-
calculated through the SoftMax function to recognize the re- curacy is achieved. An optimal learning rate of 0.1 along with
habilitation exercise. A short comparison of the proposed 36 epochs is used in the training stage to improve the ﬁtting
model to the current state-of-the-art models is given in Table 1. ability of the model. The impact of changing learning rate
 and the number of epochs was studied and it was concluded
4. Results and Discussion that by reducing the learning rate, the process takes a long
 time to converge while the high learning rate results in the
The model is trained in a fully supervised manner, and the
 process to converge quickly. It is observed that the learning
gradient is backpropagated from the SoftMax to CNN layer
 rate and the number of epochs have an inter-relationship
to reduce the loss. Through randomly selected values the bias
 with each other and aﬀect model performance. During
and weights are initialized at each layer.
 training the 3-layer CNN-LSTM model, Adam optimizer is
 used which has the best ﬁtting eﬀect on model performance
4.1. Hyperparameter Selection. During classiﬁcation, the and gives the highest accuracy. For training purposes, nu-
model performance is greatly aﬀected by the selection of merous combinations of hyperparameters are used and
hyperparameters. The impact of diﬀerent hyperparameters tested by using the hit and trial method for parameters
such as the number of convolution ﬁlters, batch size, learning selection, and ﬁnally, the best parameters giving the highest
rate, kernel size, pooling size, epochs, and type of optimizer is performance results are selected. The list of selected
observed on model performance and explained as follows. hyperparameters is shown in Table 2.

Journal of Healthcare Engineering 7

 Input: unobserved exercise image
 Initialization:
 (1) Zm,n array of image (m rows, n column) at convolution layer 1
 (2) Fi, j ﬁlter (i rows, j column)
 (3) Om,n resultant array obtained after convolution
 (4) Ym,n output array after removing negative values
 (5) ∗ sum of product operation
 (6) FM feature map function
 (7) Qm,n output array at 2nd convolution layer
 (8) F_Ym, n output at 3rd convolution layer
 Preparation:
 (1) Load CNN model
 (2) Load trained LSTM model
 Steps:
 (3) CNN ⟵ New_array (Zi)
 (4) Load FM in Conv1: (convolution layer 1), ﬁlter_size (5, 5)
 Number of rows and columns (m, n)
 a: Om,n ⟵Fi, j × Zm,n No. of ﬁlters ⟵ 64
 b: Ym, n ⟵ max(0, Om,n )
 c: Max_pooling (4, 4)
 d: Dropout (0.5)
 (5) Load FM in Conv2: (convolution layer 2), ﬁlter size (3, 3)
 a: Pm,n ⟵ Fi, j × Ym,n No. of ﬁlters ⟵ 128
 b: Qm, n ⟵ max(0, Pm,n )
 c: Max_pooling (2, 2)
 d: Dropout (0.5)
 (6) : Load FM in Conv3: (convolution layer 3), ﬁlter size (3, 3)
 a: F Om,n ⟵ Fi, j × Qm,n No. of ﬁlters ⟵ 256
 b: F_Ym, n ⟵ max (0, F_0m, n)
 c: Max_pooling ⟵ 2 × 2
 (7) LSTM (64) ⟵ F_Ym, n after Conv3
 (8) FC Layer ⟵ Dense (64)
 (9) Predicted values ⟵ Dense (64)
 (10) Pass the predicted values through the BN layer
 (11) Calculate Loss ⟵ (ground_truth_value–predicted_value)
 (12) SoftMax function ⟵ predicted exercise
 Output: display and label the predicted exercise

 ALGORITHM 2: 3-layer CNN-LSTM model for the detection of rehabilitation exercise.

 Table 1: Comparison of 3-layer CNN-LSTM model with other standard models.
S. No Other standard models Proposed CNN-LSTM model
 RNN-LSTM model used in [33] for HAR system and achieved
 CNN is used for feature extraction and selection of useful
 great accuracy, but this model exaggerates the temporal and
1 features, while LSTM is used for exercise recognition. This model
 understates the spatial information as both of the models best ﬁt
 maintains a balance between spatial and temporal information.
 for temporal data.
 In the proposed model, 3-layer CNN is applied ﬁrst to process
 In [40], LSTM-CNN model is used for activity recognition. The
 spatial input data. The data is then fed to the LSTM layer to
2 LSTM is used before CNN to process input data which is not
 further reﬁne the extracted data and detect the rehabilitation
 eﬃcient for the processing of spatial input data.
 exercise.
 The CNN model for exercise recognition was tested and The model learned about 392,765 parameters which conclude
3 observed that CNN learn too many complex parameters of about that CNN-LSTM model is lightweight which has reduced
 2,575,753 during training. complexity and achieved better accuracy.
 The 3-layer CNN-LSTM model learns activity feature
 In [30], KNN is applied to recognize human activity which fails
 automatically and handles these issues eﬃciently. We used an
4 to address occlusion, deformation, and viewpoint variation, as
 RGB camera instead of Kinect sensors to reduce complexity and
 KNN is using hand-crafted techniques.
 processing time.

8 Journal of Healthcare Engineering

Table 2: List of hyperparameters selected for training. 4.4.1. Precision. It is the ratio of true positive (TP) to TP and
Processing stage Hyperparameters Values selected
false positive (FP) observation, which is predicted positive
and is calculated as
Filters 64
Kernel size 5 TP
Convolution_1
Stride 1
Precision � . (16)
(TP + FP)
Max pooling 4
Filters 128
Convolution_2 Kernel size 3
4.4.2. Recall. The recall is the ratio of the predicted true
Max pooling 2
positive observation to true positive and false negative (FN)
Filters 256
observation, which is actually positive and is calculated as
Convolution_3 Kernel size 3
Max pooling 2
follows:
Learning rate 0.1 TP
Recall � . (17)
Epochs 36 (TP + FN)
Training parameters
Batch size 32
Optimizer Adam
4.4.3. F1-Score. F1-score is the weighted average of recall
and precision and calculated as
4.2. Dataset of Rehabilitation Exercises. A physiotherapist
employs many complex exercises for the rehabilitation of 2 × recall × precision
F1-score � . (18)
stroke patients depending on the type and severity of stroke. recall + precision
Rehabilitation exercises include but are not limited to flexion
and extension of the neck, flexion and extension of the trunk,
flexion and extension of the knee joint, flexion and extension 4.4.4. Accuracy. It is the ratio of correctly classified activities
of the hip joint, flexion and extension of the wrist, abduction to the total number of classified activities.
and adduction of the upper limb, and dorsiflexion and
plantar flexion of the foot as shown in Figure 4. (TP + TN)
Accuracy � . (19)
In this research work, we generated our own dataset (TP + FP + TN + FN)
under the direction of a physiotherapist consisting of 2250
different samples. The data is recorded from participants
performing different rehabilitation exercises through an 4.5. Results. The confusion matrix of rehabilitation exercises
RGB camera. The participants include males, females, and is shown in Figure 5, which shows the true label at the y-axis
children having up to 40 years of age. The description of the and the predicted label at the x-axis. The numbering from 0
dataset is given in Table 3. to 6 shows the set of different exercises e.g. dorsiflexion, neck
exercise, plantar flexion, trunk extension, trunk flexion,
wrist extension, and wrist flexion, respectively. The classi-
4.3. Training. The model training is performed on a Dell fication report of 3-layer CNN-LSTM along with the CNN
laptop with an Intel Core i7 processor and 16 GB RAM model is shown in Table 4.
equipped with 64 bits operating system The classification The performance evaluation of the 3-layer CNN-LSTM
model is implemented in Python 2.7.0 with Jupyter note- model is obtained in terms of precision, recall, f1-score, and
book. The main theme of the training is to adjust the filter accuracy which is calculated according to equations (16) to
weights such that the predicted class should be as close as (19). The performance of the model for discrete rehabili-
possible to the actual class. The dataset is divided into two tation exercises is evaluated through Figure 6. To validate the
parts. The first part is having 80% of data which is used for dominance of the suggested 3-layer CNN-LSTM model, it is
training purposes while the second part is having 20% of compared with KNN [30] and CNN models. The overall
data and is used for testing the efficiency of the model. The accuracy achieved by KNN, CNN, and 3-layer CNN-LSTM
Adam optimization algorithm is used for adjusting the model is given in Table 5 while represented graphically in
weights in such a way to move from a large loss point to a Figure 7. Consequently, we see from Table 5, a gradual
small loss point using the error backpropagation method for decrease in test errors and an increase in accuracy. The
optimization. According to the activation function, the model achieved the highest recall of 96%, precision of 95%,
weights are then updated. f1-score of 95% for dorsiflexion of the foot, and lowest recall
of 90%, precision of 84%, and f1-score of 87% for the wrist
flexion as shown in Figure 6.
4.4. Performance Evaluation Metrics. To scrutinize the The precision, recall, and f1-scores are calculated
performance of the model various evaluation metrics are to validate the performance, in case the dataset in a class
used, which shows the reliability of the model in examining is imbalanced and accuracy may produce deceptive
the rehabilitation exercise. The most common metrics used results.
for performance evaluation are recall, f1-score, precision, In 3-layer CNN-LSTM model, the LSTM which is a
and accuracy [41–44]. variant of RNN is the primary learning element and

Journal of Healthcare Engineering 9

 (a) (b) (c) (d) (e)

 (f) (g) (h) (i) (j)

 (k) (l)

Figure 4: List of rehabilitation exercises, from left to right: (a) extension of the neck; (b) rotation of neck; (c) ﬂexion of the trunk side view;
(d) ﬂexion of the trunk front view; (e), (f ) extension of the elbow joint (front and side view); (g) dorsiﬂexion of the foot; (h) plantar ﬂexion of
the foot; (i), (j) extension and ﬂexion of the knee joint; (k) extension of the trunk; and (l) ﬂexion of the wrist.

 Table 3: Description of the dataset.
 398 0 4 0 0 14 0
 0

 Total samples of rehabilitation exercise: 2250
 0 146 1 5 4 2 7 320
 1

 Number of participants: 20
 7 6 184 2 4 4 4 240
 2

Rehabilitation exercise No. of samples
Flexion and extension of neck 332 4 2 5 146 3 3 5
 3

Flexion and extension of trunk 328 160
 2 5 0 0 148 3 2
 4

Flexion of knee joint 348
Flexion and extension of wrist 489 6 5 0 2 3 300 9 80
 5

Dorsiﬂexion and plantar ﬂexion 419
Abduction of upper limb 334 4 3 1 1 4 4 147
 6

 0
 0 1 2 3 4 5 6
 Figure 5: Confusion matrix of 3-layer CNN-LSTM model.
produced better or at least the same accuracy compared to
other state-of-the-art models on the prescribed datasets.
The model was tested on diﬀerent types of data, which the the model is not overﬁtted and is performing better in
model had not seen before, and observed that the CNN- situations like that of occlusion, viewpoint variation, and
LSTM model has the same best accuracy. It conﬁrms that deformation.

10 Journal of Healthcare Engineering

 Table 4: Classiﬁcation report of 3-layer CNN-LSTM and CNN model.
 CNN-LSTM model CNN model
Exercises Precision Recall F1-score Precision Recall F1-score
Dorsiﬂexion 0.95 0.96 0.95 0.93 0.99 0.96
Neck exercise 0.87 0.88 0.88 0.87 0.87 0.87
Plantar ﬂexion 0.94 0.87 0.91 0.91 0.86 0.89
Trunk extension 0.94 0.87 0.90 0.93 0.81 0.86
Trunk ﬂexion 0.89 0.93 0.91 0.95 0.84 0.89
Wrist extension 0.91 0.92 0.92 0.85 0.90 0.88
Wrist ﬂexion 0.84 0.90 0.87 0.84 0.90 0.87

 98
 96
 94
 92
 Performance (%)

 90
 88
 86
 84
 82
 80
 78

 wrist Exercise
 NECK Exercise

 wrist Flexion
 Trunk Exercise
 DorsiFlexion

 PlanterFlexion

 Trunk Flexion

 List of Exercises

 precision
 Recall
 f1-score
 Figure 6: Performance in terms of precision, recall, and f1-score.

 Table 5: Performance comparison of diﬀerent models.
S. No Model Accuracy (%)
1 KNN [30] 86.1
2 CNN 89.9
3 3-layer CNN-LSTM 91.3

 3 -Layer CNN -LSTM Model CNN Model
 100 100
 90 90
 80 80
 Accuracy and Loss

 Accuracy and Loss

 70 70
 60 60
 50 50
 40 40
 30 30
 20 20
 10 10
 0 0
 0 3 6 9 12 15 18 21 24 27 30 33 36 0 3 6 9 12 15 18 21 24 27 30 33 36
 Number of Epochs Number of Epochs

 Loss Loss
 Accuracy Accuracy
 Figure 7: Line graph showing the average accuracy of CNN-LSTM and CNN model.

Journal of Healthcare Engineering 11

5. Conclusion and Future Work Therapy : Oﬃcial Publication of the American Occupational
 Therapy Association, vol. 70, no. 3, pp. 7003290040–11, 2016.
Deep learning models have powerful learning abilities in [9] S. Skaria, A. Al-Hourani, M. Lech, and R. J. Evans, “Hand-
dealing with deformation, viewpoint variation, occlusion, gesture recognition using two-antenna Doppler radar with
and background switches. In the suggested model, a DNN deep convolutional neural networks,” IEEE Sensors Journal,
algorithm is implemented that combines CNN and LSTM as vol. 19, no. 8, pp. 3041–3048, 2019.
3-layer CNN-LSTM for the detection of rehabilitation ex- [10] W. Tao, T. Liu, R. Zheng, and H. Feng, “Gait analysis using
 wearable sensors,” Sensors, vol. 12, no. 2, pp. 2255–2283, 2012.
ercises. After the fully connected layer, a BN layer is added to
 [11] F. Karray, M. Alemzadeh, J. Abou Saleh, and M. Nours Arab,
reduce the internal covariate shift and speed up the con- “Human-computer interaction: overview on state of the art,”
vergence procedure of the model. In the proposed archi- International Journal on Smart Sensing and Intelligent Sys-
tecture, the data collected by an RGB camera under the tems, vol. 1, no. 1, pp. 137–159, 2008.
direction of a physiotherapist is fed into a 3-layer CNN [12] J. Reyes-Campos, G. Alor-Hernández, I. Machorro-Cano,
followed by an LSTM layer. The CNN along with LSTM is J. O. Olmedo-Aguirre, J. L. Sánchez-Cervantes, and
making the model proﬁcient in learning the spatial and L. Rodrı́guez-Mazahua, “Discovery of resident behavior
temporal dynamics at various time slots. The parameters are patterns using machine learning techniques and IoT para-
learned over CNNs and further classiﬁed by LSTM to attain digm,” Mathematics, vol. 9, no. 3, pp. 1–25, 2021.
better accuracy and preserve a high recognition rate. In the [13] L. Verde, G. De Pietro, and G. Sannino, “Voice disorder
future, the model shall be trained on more complex datasets identiﬁcation by using machine learning techniques,” IEEE
 Access, vol. 6, pp. 16246–16255, 2018.
to detect complex rehabilitation physiotherapy exercises.
 [14] M. Elhoseny, “Multi-object detection and tracking (MODT)
Moreover, the algorithm will be updated in such a way that
 machine learning model for real-time video surveillance
can be used at home for a patient to carry out the prescribed systems,” Circuits, Systems, and Signal Processing, vol. 39,
rehabilitation exercises without direct in-person supervision no. 2, pp. 611–630, 2020.
of physical therapists. [15] M. Xi, N. Lingyu, and S. Jiapeng, “RETRACTED: research on
 urban anti-terrorism intelligence perception system from the
Data Availability perspective of Internet of things application,” International
 Journal of Electrical Engineering Education, vol. 58, no. 2,
The data that support the ﬁndings of this study are available pp. 248–257, 2021.
upon request from the ﬁrst author. [16] I. Ullah, M. Jian, S. Hussain et al., “Global context-aware
 multi-scale feature aggregative network for salient object
 detection,” Neurocomputing, vol. 455, pp. 139–153, 2021.
Conflicts of Interest [17] J. P. Zhu, H. Q. Chen, and W. Bin Ye, “Classiﬁcation of
 human activities based on radar signals using 1D-CNN and
The authors declare that they have no conﬂicts of interest. LSTM,” in Proceedings of the IEEE Int. Symp. Circuits Syst,
 no. 1–5, Seville, Spain, October 2020.
 [18] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, “A survey
References of the recent architectures of deep convolutional neural
 [1] J. Bernhardt, P. Langhorne, and G. Kwakkel, “Stroke care 2 : networks,” Artiﬁcial Intelligence Review, vol. 53, no. 8,
 stroke rehabilitation,” Lancet, vol. 377, no. 9778, pp. 1693– pp. 5455–5516, 2020.
 1702, 2011. [19] B. Schuster-Böckler and A. Bateman, “An introduction to
 [2] I. E. Nikityuk, G. A. Ikoeva, and O. I. Kivoenko, “The vertical hidden Markov models,” Curr. Protoc. Bioinforma, vol. 18,
 balance management system is more synchronized in children no. 1, 2007.
 with cerebral paralysis than in healthy children,” Pediatric [20] S. J. D. Prince and J. H. Elder, “Probabilistic linear dis-
 Traumatology, Orthopaedics and Reconstructive Surgery, criminant analysis for inferences about identity,” in Pro-
 vol. 5, no. 3, pp. 49–57, 2017. ceedings of the IEEE Int. Conf. Comput. Vis., no. 2, pp. 1–8,
 [3] D. Nizamutdinov and L. A. Shapiro, “Overview of traumatic Seoul, Republic of Korea, October 2007.
 brain injury: an immunological context,” Brain Sciences, [21] M. Schonlau and R. Y. Zou, “The random forest algorithm for
 vol. 7, no. 1, pp. 1–11, 2017. statistical learning,” STATA Journal, vol. 20, no. 1, pp. 3–29,
 [4] L. Adolfsson, “Post-traumatic stiﬀ elbow,” EFORT Open 2020.
 Reviews, vol. 3, no. 5, pp. 210–216, 2018. [22] M. Maalouf, “Logistic regression in data analysis: an over-
 [5] R. J. Oskouian, C. A. Sansur, and C. I. Shaﬀrey, “Congenital view,” International Journal of Data Analysis Techniques and
 abnormalities of the thoracic and lumbar spine,” Neurosur- Strategies, vol. 3, no. 3, pp. 281–299, 2011.
 gery Clinics of North America, vol. 18, no. 3, pp. 479–498, 2007. [23] S. Huang, C. A. I. Nianguang, P. Penzuti Pacheco,
 [6] S. Esposito and M. R. Longo, “Guillain-Barré syndrome,” S. Narandes, Y. Wang, and X. U. Wayne, “Applications of
 Autoimmunity Reviews, vol. 16, no. 1, pp. 96–101, 2017. support vector machine (SVM) learning in cancer genomics,”
 [7] K. N. Borschmann and K. S. Hayward, “Recovery of upper CANCER GENOMICS and PROTEOMICS, vol. 15, no. 1,
 limb function is greatest early after stroke but does continue to pp. 41–51, 2018.
 improve during the chronic phase: a two-year, observational [24] Y. Y. Song and Y. Lu, “Decision tree methods: applications for
 study,” Physiotherapy, vol. 107, pp. 216–223, 2020. classiﬁcation and prediction,” Shanghai Arch. Psychiatry,
 [8] C. A. Doman, K. J. Waddell, R. R. Bailey, J. L. Moore, and vol. 27, no. 2, pp. 130–135, 2015.
 C. E. Lang, “Changes in upper-extremity functional capacity [25] A. V. Vokhmintcev, I. V. Sochenkov, V. V. Kuznetsov, and
 and daily performance during outpatient occupational ther- D. V. Tikhonkikh, “Face recognition based on a matching
 apy for people with stroke,” American Journal of Occupational algorithm with recursive calculation of oriented gradient

12 Journal of Healthcare Engineering

 histograms,” Doklady Mathematics, vol. 93, no. 1, pp. 37–41, [41] V. Bijalwan, V. B. Semwal, and V. Gupta, “Wearable sensor-
 2016. based pattern mining for human activity recognition: deep
[26] J. S. Raikwal and K. Saxena, “Performance evaluation of SVM learning approach,” Industrial Robot, pp. 1–10, 2021.
 and K-nearest neighbor algorithm over medical data set,” [42] I. Khan, S. Ali, and S. Khusro, “Smartphone-based lifelogging:
 International Journal of Computing and Applications, vol. 50, an investigation of data volume generation strength of
 no. 14, pp. 35–39, 2012. smartphone sensors,” in Proceeding of the International
[27] A. Das Antar, M. Ahmed, and M. A. R. Ahad, “Challenges in Conference on Simulation Tools and Techniques, pp. 63–73,
 sensor-based human activity recognition and a comparative Springer, Chengdu, China, July 2019.
 analysis of benchmark datasets: a review,” in Proceeding of the [43] I. Khan, S. Khusro, S. Ali, and A. U. Din, “Daily life activities
 2019 Jt. 8th Int. Conf. Informatics, Electron. Vision, ICIEV on smartphones and their eﬀect on battery life for better
 2019 3rd Int. Conf. Imaging, Vis. Pattern Recognition, icIVPR personal information management,” Proceedings of the
 2019 with Int. Conf. Act. Behav. Comput., pp. 134–139, Pakistan Academy of Sciences: A. Physical and Computational
 Washington, DC, USA, June 2019. Sciences, vol. 53, no. 1, pp. 61–74, 2016.
[28] B. K. Yousafzai, S. A. Khan, T. Rahman et al., “Student- [44] I. Khan, S. Khusro, and I. Alam, “Smartphone distractions and
 performulator: student academic performance using hybrid its eﬀect on driving performance using vehicular lifelog
 deep neural network,” Sustainability, vol. 13, no. 17, p. 9775, dataset,” in Proceeding of the 2019 International Conference on
 Electrical, Communication, and Computer Engineering
 2021.
 (ICECCE), pp. 1–6, IEEE, Swat, Pakistan, July 2019.
[29] Z. Ali, G. Qi, K. Muhammad, P. Kefalas, and S. Khusro,
 “Global citation recommendation employing generative
 adversarial network,” Expert Systems with Applications,
 vol. 180, 2021.
[30] Ö. F. İnce, I. F. Ince, M. E. Yıldırım, J. S. Park, J. K. Song, and
 B. W. Yoon, “Human activity recognition with analysis of
 angles between skeletal joints using a RGB-depth sensor,”
 ETRI Journal, vol. 42, no. 1, pp. 78–89, 2020.
[31] M. Ye, Q. Zhang, L. Wang, and J. Zhu, “A survey on human
 motion analysis,” Time-of-Flight Depth Imaging. Sensors,
 Algorithms, Appl.vol. 8200, pp. 149–187, 2013.
[32] B. J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, and
 R. M. Cook, “Moore “real-time human pose from single
 depth images,” Proceeding of the. 2011 Conf. Comput. Vis.
 Pattern Recognit, vol. 56, pp. 1295–1304, Denver, CO,
 USA, June 2013.
[33] P. Agarwal and M. Alam, “A lightweight deep learning model
 for human activity recognition on edge devices,” Procedia
 Computer Science, vol. 167, no. 2019, pp. 2364–2373, 2020.
[34] W. A. Abro, G. Qi, Z. Ali, Y. Feng, and M. Aamir, “Multi-turn
 intent determination and slot ﬁlling with neural networks and
 regular expressions,” Knowledge-Based Systems, vol. 208,
 2020.
[35] R. Bibi, Y. Saeed, A. Zeb et al., “Edge AI-based automated
 detection and classiﬁcation of road anomalies in VANET
 using Deep Learning,” Computational Intelligence and Neu-
 roscience, 2021.
[36] H. Salman, J. Grover, and T. Shankar, Hierarchical Rein-
 forcement Learning for Sequencing Behaviors, vol. 2733,
 pp. 2709–2733, 2018.
[37] J. Zhu, H. Chen, and W. Ye, A Hybrid CNN-LSTM Network
 for the Classiﬁcation of Human Activities Based on Micro-
 doppler Radar, IEEE Access, vol. 8, pp. 24713–24720, 2020.
[38] M. Milenkoski, K. Trivodaliev, S. Kalajdziski, M. Jovanov, and
 B. R. Stojkoska, “Real time human activity recognition on
 smartphones using LSTM networks,” in Proceeding of the 2018
 41st Int. Conv. Inf. Commun. Technol. Electron. Microelectron.
 MIPRO 2018 - Proc., pp. 1126–1131, Opatija, Croatia, May
 2018.
[39] P. Luo, X. Wang, W. Shao, and Z. Peng, “Towards under-
 standing regularization in batch normalization,” in Proceeding
 of the 7th Int. Conf. Learn. Represent. ICLR 2019, pp. 1–23,
 New Orleans, LA, USA, May 2019.
[40] K. Xia, J. Huang, and H. Wang, “LSTM-CNN architecture for
 human activity recognition,” IEEE Access, vol. 8, pp. 56855–
 56866, 2020.

You can also read