Northumbria Research Link
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Northumbria Research Link Citation: McCay, Kevin D., Ho, Edmond, Sakkos, Dimitris, Woo, Wai Lok, Marcroft, Claire, Dulson, Patricia and Embleton, Nicholas D. (2021) Towards Explainable Abnormal Infant Movements Identification: A Body-part Based Prediction and Visualisation Framework. In: IEEE BHI 2021: IEEE International Conference on Biomedical and Health Informatics (BHI) : Reshaping healthcare through advanced AI-enabled health informatics for a better quality of life, 27-30 Jul 2021, Virtual. (In Press) URL: This version was downloaded from Northumbria Research Link: http://nrl.northumbria.ac.uk/id/eprint/46441/ Northumbria University has developed Northumbria Research Link (NRL) to enable users to access the University’s research output. Copyright © and moral rights for items on NRL are retained by the individual author(s) and/or other copyright owners. Single copies of full items can be reproduced, displayed or performed, and given to third parties in any format or medium for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided the authors, title and full bibliographic details are given, as well as a hyperlink and/or URL to the original metadata page. The content must not be changed in any way. Full items must not be sold commercially in any format or medium without formal permission of the copyright holder. The full policy is available online: http://nrl.northumbria.ac.uk/policies.html This document may differ from the final, published version of the research and has been made available online in accordance with publisher policies. To read and/or cite from the published version of the research, please visit the publisher’s website (a subscription may be required.)
Towards Explainable Abnormal Infant Movements Identification: A Body-part Based Prediction and Visualisation Framework Kevin D. McCay1 , Edmond S. L. Ho1 , Dimitrios Sakkos1 , Wai Lok Woo1 , Claire Marcroft2 , Patricia Dulson2 , and Nicholas D. Embleton2 1 Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, UK 2 Newcastle Neonatal Service, NUTH NHS Foundation Trust, Newcastle upon Tyne, UK Abstract—Providing early diagnosis of cerebral palsy (CP) is Currently, the General Movements Assessment (GMA) is key to enhancing the developmental outcomes for those affected. used to evaluate infant movement by manually observing Diagnostic tools such as the General Movements Assessment spontaneous infant movements at a specific stage in develop- (GMA), have produced promising results in early diagnosis, however these manual methods can be laborious. ment. In a typically developing infant “Fidgety Movements” In this paper, we propose a new framework for the automated (FMs) are detectable from 3 to 5 months post term [14] and classification of infant body movements, based upon the GMA, consistently have a similar appearance. The absence of these which unlike previous methods, also incorporates a visualization movement characteristics consequently allows for abnormal framework to aid with interpretability. Our proposed framework FM patterns to be identified and classified [5]. However, the segments extracted features to detect the presence of Fidgety Movements (FMs) associated with the GMA spatiotemporally. challenges associated with applying these assessments in prac- These features are then used to identify the body-parts with the tice depends upon the availability of appropriately trained clin- greatest contribution towards a classification decision and high- icians. In order to address the issues surrounding manual clin- light the related body-part segment providing visual feedback to ical assessment, several studies have been carried out which the user. attempt to assess the viability of automating assessments to We quantitatively compare the proposed framework’s classi- predict motor impairment based upon observed motion quality fication performance with several other methods from the liter- ature and qualitatively evaluate the visualization’s veracity. Our using computer vision-based approaches [10]. Examples such experimental results show that the proposed method performs as [2] [1] [3] explore a per frame background subtraction more robustly than comparable techniques in this setting whilst method for analysis, whereas more recent methods [17] [13] simultaneously providing relevant visual interpretability. [15], propose the use of Optical Flow-based methods to track Index Terms—infants, cerebral palsy, general movements as- and assess infant movements. Whilst reasonable results are sessment, machine learning, explainable AI, visualization obtained, these methods typically struggle to deal with intra- class variation, as well as anomalies within the recorded I. I NTRODUCTION video footage such as illumination changes, camera movement, Cerebral palsy (CP) is the term for a group of lifelong subject-scaling, and resolution inconsistencies. This makes it neurological conditions which can cause difficulties with mo- difficult for these approaches to be adopted in a real-world bility, posture and coordination. CP can also cause problems clinical setting. with swallowing, speech articulation, vision, and has been On the other hand, with the advancement of pose estimation associated with a diminished ability to learn new skills. techniques, high-quality skeletal poses can be extracted from There is significant variance in the severity of CP, with some video automatically. Recent work such as [11], [12] proposed individuals showing very minor symptoms whilst others may using histogram-based pose features to automate GMA by be severely disabled [6]. CP is attributed to non-progressive classifying infant movements into FM+ (normal) and FM- damage to the brain in early infancy [4], [16] and is one of (abnormal). The pose-based features, namely Histograms of the most common physical disabilities in childhood. However, Joint Orientation 2D (HOJO2D) [11] and Histograms of Joint early diagnosis of CP can be difficult, with a confirmed Displacement 2D (HOJD2D) [11] are computed from the diagnosis rarely made before 18 months of age [10]. The orientation of the body segment and the displacement of the difficulty in providing an early diagnosis is problematic, as joints, respectively. Encouraging classification performance early intervention care is considered particularly important for on traditional classifiers [11] and deep learning frameworks those with emerging and diagnosed CP. [12] were demonstrated. Wu et al. [18] proposed Movement Complexity Index which determines the complexity of the This project is supported in part by the Royal Society (Ref: IES/R1/191147). body movements of the infant by computing the correlations Corresponding author: Edmond S. L. Ho (e.ho@northumbria.ac.uk) between the movements of the joints using the Spearman
Correlation Coefficient Matrix (SCCM). Although the method the effectiveness of using histogram-based motion features, focuses on analyzing the features to predict the risk level of namely Histograms of Joint Orientation 2D (HOJO2D) and CP of the infant without the need of the training process Histograms of Joint Displacement 2D (HOJD2D), extracted as in machine learning based approaches, the features are from 2D skeletal poses in detecting FMs from videos. In this computed from 3D skeletal data which requires specialized paper, an early fusion (i.e. concatenation) of the HOJO2D and image sensing devices to capture those data. Furthermore, the HOJD2D is used as the input motion features, since better method requires the user to specify a threshold level of the performance has been demonstrated [11], [12]. computed index to separate normal/abnormal, and it is unclear if this can be generalized to other datasets. B. Spatiotemporal Fidgety Movement Detection The aforementioned studies suggest that an automated sys- In order to detect the presence of FMs spatiotemporally, the tem could potentially help to reduce the time and cost associ- motion features have to be extracted from 1) different body- ated with current manual clinical assessments, and also assist parts and 2) different temporal segments individually. Inspired clinicians in making earlier and more confident diagnoses by this, we propose motion feature extraction from 5 different by providing additional information about the assessed infant body-parts in the spatial domain, namely left arm, right arm, movements. However, these methods are also not without their left leg, right leg, and head-torso. For the temporal domain, setbacks. One of the main issues with using machine-learning we compute HOJO2D and HOJD2D features (8 bins) for the approaches in the medical domain is the problem of inter- 5 body parts from every 100-frame segment. In doing so, pretable AI. Models are often seen as ‘black boxes’ in which each video is represented by multiple histogram-based motion the underlying structures can be difficult to understand. There features accordingly. For example, a 1000-frame video will be is an increasing requirement for the mechanisms behind why represented by 50 fused features of HOJO2D and HOJD2D. systems are making decisions to be transparent, understandable In this work, we formulate the FMs detection problem as a and explainable [8]. As such, we propose a new motion clas- binary classification. Since each video is annotated with FM+ sification and visualization framework, which takes an RGB or FM-, we label all the fused features extracted according video as the input and analyzes the movement of individual to the holistic annotation of the video. When training the body parts to determine if FMs are present (FM+) or absent classifier all features are used, while the temporal location (FM-), subsequently identifying normal or abnormal general information is not used. In other words, no matter whether movements from segments of the sequence. To make our the features are extracted from the beginning or near the end proposed framework fully interpretable, an important aspect of the video, they will be used to train a single classifier. This is the integration of an automatically generated visualization proposed approach provides distinct advantages over previous capable of relaying pertinent information to the assessor. methods, i.e. 1) the classifier will be trained by more data The visualization highlights body-parts which are showing samples rather than using only one histogram representation movement abnormalities, and are subsequently providing the for the whole video as proposed in [11], [12], and 2) a focus most significant contribution towards the classification result. on the presence/absence of FMs while ignoring the temporal As such, our proposed contributions are summarized as: information when training the classifier. • A new body-part based classification framework for the We follow McCay et al. [11] on using an ensemble clas- automated prediction of CP based upon body movement sifier on MATLAB R2020a that consists of a wide range extracted from videos. of classifiers to boost the performance of the classification • A visualization feature to highlight pertinent body-parts results. Given the multiple fused features extracted from a in the video to improve the model interpretability. video, all the features will be classified as FM+ or FM- Experimental results showed that out proposed fidgety , this information is then used in visualizing the results movements prediction framework achieved 100% accuracy and (Section II-D). As the features were extracted in sequential outperforms the existing work on the benchmark MINI-RGBD order in the temporal domain, the classification result on each [7] dataset. The details of our proposed classification and visu- histogram-based motion feature is essentially detecting FMs alisation framework are discussed in Section II. Our evaluation spatiotemporally. is discussed in Section III. Our hope is that this contribution will aid in the adoption of such technologies in this domain, C. Late Fusion for Cerebral Palsy Prediction through accurate, quantifiable and explainable results. A demo video is available on https://youtu.be/6CZZmWnT4mo While the method presented in Section II-B provides precise information on the presence/absence of FMs spatiotemporally, II. M ETHODOLOGY directly using all motion features as a cerebral palsy prediction In this section, we discuss the proposed classification and for the whole video will result in sub-optimal performance visualization framework illustrated in Figure 1. since the temporal ordering is less important in the GMA than the presence/absence of sustained FMs at any point in A. Pose-based motion features the sequence. To tackle this problem, we propose representing The first step of our proposed framework is extracting each of the 5 body parts using a single scalar score s, with features from input video data. McCay et al. [11] demonstrated this being the average score of the classification result (FM+
Fig. 1: The overview of the proposed prediction and visualization framework. Method Accuracy Sensitivity Specificity as 0 and FM- as 1) across all temporal segments for each [11] w/ LDA 66.67% 50.00% 75.00% body-part. Therefore, the range of s will be between 0 and 1. [11] w/ SVM 83.33% 50.00% 100.00% Here, we propose to use a late fusion approach to train an [11] w/ Decision Tree 75.00% 50.00% 87.50% [11] w/ kNN (k=1) 75.00% 25.00% 100.00% ensemble classifier for cerebral palsy prediction. Specifically, [11] w/ kNN (k=3) 50.00% 00.00% 75.00% each video is represented by using the 5 scores obtained from [11] w/ Ensemble 66.67% 50.00% 75.00% the body parts. The binary classifier will predict whether the FCNet [12] 83.33% 75% 87.5% motion in the video is considered normal or abnormal. Conv1D-1 [12] 83.33% 75% 87.5% Conv1D-2 [12] 91.67% 75% 100.00% Conv2D-1 [12] 83.33% 75% 87.5% D. Visualization Conv2D-2 [12] 83.33% 75% 87.5% While machine learning-based frameworks have obtained Movement Complexity Index [18] 91.67% 100.00% 87.5% excellent performance in a wide range of visual understanding Our method 100.00% 100.00% 100.00% tasks, most of the existing frameworks can be considered black-box approaches since most of the classification frame- TABLE I: Classification accuracy comparison between our works only output the predicted label without specifying ex- proposed framework and baseline methods. actly what influences the classification decision. Whilst this is acceptable in typical computer vision tasks, it is less preferable fidgety movement detection with baselines methods in Section in healthcare applications, since it is essential for the clinicians III-A. Next, we present the visualization results as qualitative to verify the prediction as well. analysis in Section III-B. We follow the standard protocol as To extract body part information from an input image, the in [11], [12], [18] to conduct a leave-one-subject out cross- CDCL [9] pre-trained body segmentation model is used in this validation to ensure the results presented in this section are work. The body is segmented into 6 parts; head, torso, upper obtained base on unseen data during the training process. arms, lower arms, pelvis and upper legs, and lower legs. An example of the segmentation result is illustrated in Figure 1 A. Quantitative Evaluation on the Fidgety Movement Detec- (bottom left-hand corner). Specifically, given an input infant tion Results image, CDCL [9] returns an image mask for segmentation. To demonstrate the overall performance of our proposed To align with those 5 body parts to be used in this work, we framework, we first evaluate the cerebral palsy prediction of separate the segmentation masks for the arms and legs into the whole input video as explained in Section II-C. We com- the left and right masks. Here, k-means clustering is used to pared with the existing methods and the results are presented in divide the pixels on each segment mask into two groups. Table I. Using our framework, we achieved a perfect prediction In order to make our proposed framework more inter- with 100% accuracy. This highlights the effectiveness of our pretable, we include a visualization module that highlights the proposed framework over the previous work ( [11], [12], [18]). body-parts that are contributing to the classification decision. Our proposed method highlights the body-parts in red to B. Qualitative Evaluation on the Visualization Results indicate the absence of fidgety movements based on the scores We further provide qualitative results to demonstrate the computed in the body part abnormality detection explained in effectiveness of our proposed framework. As presented in Sec- Section II-B, providing clinicians with an intuitive visualiza- tion II-D, we detect the absence (FM-) or presence (FM+) of tion such as the examples illustrated in Figure 2. fidgety movements of each body part in each temporal segment (see Section II-B). The body parts with a prediction of FM- III. E VALUATION will be highlighted in red. An example is illustrated in Figure In this section, we evaluate the effectiveness of our proposed 2. Readers are referred to https://youtu.be/6CZZmWnT4mo to method using the public dataset MINI-RGBD [7] with Fidgety evaluate the visual quality of the results. From the results, it movement annotation by an experienced GMs assessor in can be seen that the highlighted body-parts generally show less [11]. We first compare the performance of our method on complex or more repetitive movements in the videos annotated
R EFERENCES [1] Lars Adde, Jorunn Helbostad, Alexander R. Jensenius, Mette Langaas, and Ragnhild Støen. Identification of fidgety movements and prediction of CP by the use of computer-based video analysis is more accurate when based on two video recordings. Physiotherapy Theory and Practice, 29(6):469–475, 2013. [2] Lars Adde, Jorunn L. Helbostad, Alexander R. Jensenius, Gunnar Tarald- (a) Positive (abnormal) example sen, Kristine H. Grunewaldt, and Ragnhild StØen. Early prediction of cerebral palsy by computer-based video analysis of general movements: A feasibility study. Developmental Medicine and Child Neurology, 52(8):773–778, 2010. [3] Lars Adde, Hong Yang, Rannei Sæther, Alexander Refsum Jensenius, Espen Ihlen, Jia yan Cao, and Ragnhild Støen. Characteristics of general movements in preterm infants assessed by computer-based video analysis. Physiotherapy Theory and Practice, 34(4):286–292, 2018. PMID: 29064734. [4] Martin Bax, Murray Goldstein, Peter Rosenbaum, Alan Leviton, Nigel (b) Negative (normal) example Paneth, Bernard Dan, Bo Jacobsson, and Diane Damiano. Proposed definition and classification of cerebral palsy, april 2005. Developmental Medicine and Child Neurology, 47(8):571–576, 2005. [5] Christa Einspieler, Arend F. Bos, Melissa E. Libertus, and Peter B. Marschik. The general movement assessment helps us to identify preterm infants at risk for cognitive dysfunction. Frontiers in Psychol- ogy, 7, Mar 2016. [6] National Institute for Health and Care Excellence. NICE seeks to improve diagnosis and treatment of cerebral palsy, Jan 2017. (c) Positive (abnormal) example [7] Nikolas Hesse, Christoph Bodensteiner, Michael Arens, Ulrich G. Hof- mann, Raphael Weinberger, and A. Sebastian Schroeder. Computer Fig. 2: Examples of the video generated by our visualization vision for medical infant motion analysis: State of the art and RGB- module. Body-parts without fidgety movements are high- D data set. In Computer Vision - ECCV 2018 Workshops. Springer lighted in red. International Publishing, 2018. [8] Andreas Holzinger, Chris Biemann, Constantinos S. Pattichis, and Douglas B. Kell. What do we need to build explainable ai systems for the medical domain? arXiv preprint arXiv:1712.09923, 2017. as FM-. As shown in Figure 2 (a), the arms are showing a lack [9] Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, and Ming-Ting Sun. Cross-domain complementary learning using pose for of movement and are subsequently predicted as FM- using multi-person part segmentation. IEEE Transactions on Circuits and our framework. On the other hand, the legs are predicted as Systems for Video Technology, 2020. FM+ as they are showing some movements in that temporal [10] Claire Marcroft, Aftab Khan, Nicholas D. Embleton, Michael Trenell, and Thomas Plötz. Movement recognition technology as a method of segment. Figure 2 (c) show an example with monotonous arm assessing spontaneous general movements in high risk infants. Frontiers and leg movements and our method highlights those body parts in Neurology, 6(JAN):284, 2015. as FM- accordingly. For the videos annotated as FM+, such [11] K. D. McCay, E. S. L. Ho, C. Marcroft, and N. D. Embleton. Establish- ing pose based features using histograms for the detection of abnormal as the example shown in Figure 2 (b), it can be seen that infant movements. In Proceedings of IEEE EMBC, pages 5469–5472, a much wider variety of movements can be observed. The July 2019. visualization provides effective visual feedback to the user, [12] K. D. McCay, E. S. L. Ho, H. P. H. Shum, G. Fehringer, C. Marcroft, and N. D. Embleton. Abnormal infant movements classification with as such clinicians can pay greater attention to the highlighted deep learning on pose-based features. IEEE Access, pages 1–1, 2020. segments for further analysis. [13] S. Orlandi, K. Raghuram, C. R. Smith, D. Mansueto, P. Church, V. Shah, M. Luther, and T. Chau. Detection of atypical and typical infant IV. C ONCLUSION movements using computer-based video analysis. In Proceedings of IEEE EMBC, pages 3598–3601, 2018. In this paper, we present a new framework for detecting [14] Heinz FR Prechtl, Christa Einspieler, Giovanni Cioni, Arend F. Bos, fidgety movements of infants spatiotemporally using the pose- Fabizi Ferrari, and Dieter Sontheimer. An early marker for neurological based features extracted from RGB videos. Experimental deficits after perinatal brain lesions. The Lancet, 349(9062):1361–1363, May 1997. results demonstrated that the new method not only achieves [15] H. Rahmati, H. Martens, O. M. Aamo, Ø. Stavdahl, R. Støen, and perfect prediction with 100% accuracy, but also provides the L. Adde. Frequency analysis and feature reduction method for prediction user with visualization on how the machine-learning based of cerebral palsy in young infants. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 24(11):1225–1234, Nov 2016. framework made the overall prediction relating to the abnor- [16] Peter Rosenbaum, Nigel Paneth, Alan Leviton, Murray Goldstein, Martin mality of the infant’s movement. Whilst our system is able Bax, Diane Damiano, Bernard Dan, and Bo Jacobsson. A report: The to provide rudimentary visual feedback to the user, additional definition and classification of cerebral palsy april 2006. Developmental medicine and child neurology. Supplement, 109:8–14, 03 2007. visualization tools would be useful to exploit the extracted [17] Annette Stahl, Christian Schellewald, Øyvind Stavdahl, Ole Morten spatiotemporal information and provide additional predictive Aamo, Lars Adde, and Harald Kirkerod. An optical flow-based method aid to clinicians for this complex diagnostic task. We intend to to predict infantile cerebral palsy. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(4):605–614, 2012. further implement our method on data gathered in a real-world [18] Q. Wu, G. Xu, F. Wei, L. Chen, and S. Zhang. Rgb-d videos-based early clinical setting, as well as explore other relevant visualisation prediction of infant cerebral palsy via general movements complexity. methods to provide meaningful and explainable feedback to IEEE Access, 9:42314–42324, 2021. the user.
You can also read