2021 Vector Institute Research Symposium - February 16th & 17th, 2021
Poster Sessions
Tuesday, February 16th, 2021
Day 1 Poster Presentations (11:40 AM - 1:15 PM)

Poster # | Poster Title | Presenter and Affiliation
1 | Learning Differential Equations that are Easy to Solve | Jacob Kelly, University of Toronto, Student of Vector Faculty Member
2 | Improving the Classification Parity of Machine Learning Models through Subgroup Thresholds Optimization (STO) | Cecilia Ying, Queen’s University, Student of Vector Faculty Affiliate
3 | Contrastive Learning for Sports Video: Unsupervised Player Classification | Maria Koshkina, York University, Student of Vector Faculty Affiliate
4 | Learning from Unexpected Events in the Neocortical Microcircuit | Jason Pina, York University, Student of Vector Faculty Affiliate
5 | Regularized Linear Autoencoders Recover the Principal Components, Eventually | Xuchan Bao, University of Toronto, Student of Vector Faculty Affiliate
6 | Dataset Inference: Ownership Resolution in Machine Learning | Mohammad Yaghini, University of Toronto, Student of Vector Faculty Affiliate
7 | A Transfer Learning Based Active Learning Framework for Brain Tumor Classification | Ernest Namdar, University of Toronto, Student of Vector Faculty Affiliate
8 | Can Future Wireless Networks Detect Fires? | David Radke, University of Waterloo, Student of Vector Faculty Affiliate
9 | Domain Adaptation and Self-Supervised Learning for Surgical Margin Detection | Alice Santilli, Queen’s University, Student of Vector Faculty Affiliate
10 | Experience Selection Using Dynamics Similarity for Efficient Multi-Source Transfer Learning Between Robots | SiQi Zhou, University of Toronto, Student of Vector Faculty Member
11 | Multi-agent Correlated Deep Q-learning for Microgrid Energy Management | Hao Zhou, University of Ottawa, Student of Vector Faculty Affiliate
12 | A Learning-based Algorithm to Quickly Compute Good Primal Solutions for Stochastic Integer Programs | Rahul Patel, University of Toronto, Student of Vector Faculty Affiliate
13 | Classification of Spinal Curvature from 3D Spine CT Images using a Convolutional Neural Network | Geoff Klein, University of Toronto, Student of Vector Faculty Affiliate
14 | A Computational Framework for Slang Generation | Zhewei Sun, University of Toronto, Student of Vector Faculty Affiliate
15 | Self-supervised Driven Consistency Training for Annotation Efficient Histopathology Image Analysis | Chetan Srinidhi, University of Toronto, Postdoctoral Fellow of Vector Faculty Affiliate
16 | Identifying and Interpreting Tuning Dimensions in Deep Networks | Nolan Dey, University of Waterloo, Student of Vector Faculty Member
17 | Neural Response Time Analysis: XAI Using Only a Stopwatch | Eric Taylor, Vector Institute Postdoctoral Fellow
18 | Solving First Passage Problems in Nanofluidic Devices with Deep Neural Networks | Martin Magill, Ontario Tech University, Vector Postgraduate Affiliate
19 | Adversarial Robustness through the Lens of Fourier Analysis | Avery Ma, University of Toronto, Student of Vector Faculty Member
20 | Understanding and Mitigating Exploding Inverses in Invertible Neural Networks | Paul Vicol, University of Toronto, Student of Vector Faculty Member
21 | Artificial Intelligence for Public Health: Priorities for Successful Use by Public Health Organizations | Jungtaek Kim, Pohang University of Science and Technology, Student of Vector Faculty Member
22 | Representation of Non-local Shape Information in Deep Neural Networks | Shaiyan Keshvari, York University, Postdoctoral Fellow of Vector Faculty Affiliate
23 | Learning Permutation Invariant Representations Using Memory Networks | Mohammed Adnan, University of Waterloo, Vector Scholarship in AI Recipient
24 | Evaluation Metrics for Deep Learning Imputation Models in Healthcare and Finance | Omar Boursalie, McMaster University, Vector Postgraduate Affiliate
25 | Detecting fMRI-based Intrinsic Connectivity Networks using EEG alone | Saurabh Shaw, McMaster University, Vector Postgraduate Affiliate
26 | Understanding Public Sentiments about COVID-19 Non-Pharmaceutical Interventions through Event Studies | Jingcheng Niu, University of Toronto, Student of Vector Faculty Affiliate
27 | Partitioning FPGA-optimized Systolic Arrays for Fun and Profit | Long Chan, University of Waterloo, Student of Vector Faculty Affiliate
28 | Learning Personalized Models of Human Behavior in Chess | Reid McIlroy-Young, University of Toronto, Student of Vector Faculty Affiliate
29 | Learning a Universal Template for Few-shot Dataset Generalization | Eleni Triantafillou, University of Toronto, Student of Vector Faculty Member
30 | Towards Robustness in Deep Learning | Reza Samavi, Ryerson University, Vector Faculty Affiliate
31 | Siamese ResNet is capable of deconvolving the epigenome by learning from the transcriptome | Mehran Karimzadeh, Vector Institute Postdoctoral Fellow
32 | On the Sample Complexity of Privately Learning Unbounded High-Dimensional Gaussians | Ishaq Aden-Ali, McMaster University, Student of Vector Faculty Affiliate
33 | FAB: The French Absolute Beginner Corpus for Pronunciation Training | Sean Robertson, University of Toronto, Student of Vector Faculty Affiliate
34 | Writing Can Predict AI Papers Acceptance, but Not Their Impact | Zining Zhu, University of Toronto, Vector Scholarship in AI Recipient
35 | Autoregressive Models for Offline Policy Evaluation and Optimization | Michael Zhang, University of Toronto, Student of Vector Faculty Member
36 | Interpretable Sequence Classification via Discrete Optimization | Maayan Shvo, University of Toronto, Student of Vector Faculty Member
37 | Planning from Pixels using Inverse Dynamics Models | Keiran Paster, University of Toronto, Student of Vector Faculty Member
38 | The Act of Remembering: A Study in Partially Observable Reinforcement Learning | Toryn Klassen, Postdoctoral Fellow of Vector Faculty Member
39 | Building LEGO Using Deep Generative Models of Graphs | Rylee Thompson, University of Guelph, Student of Vector Faculty Member
40 | Evaluating Curriculum Learning Strategies in Neural Combinatorial Optimization | Michal Lisicki, University of Guelph, Student of Vector Faculty Affiliate
41 | Predicting Dreissenid Mussel Abundance in Nearshore Waters using Artificial Intelligence | Angus Galloway, University of Guelph, Student of Vector Faculty Member
42 | Exchanging Lessons Between Algorithmic Fairness and Domain Generalization | Elliot Creager, University of Toronto, Student of Vector Faculty Member
Poster Sessions
Wednesday, February 17th, 2021
Day 2 Poster Presentations (12:00 - 1:30 PM)

Poster # | Poster Title | Presenter and Affiliation
1 | Proportionally Fair Clustering Revisited | Evi Micha, University of Toronto, Student of Vector Faculty Affiliate
2 | Unsupervised Representation Learning for Time Series with Temporal Neighbourhood Coding | Sana Tonekaboni, University of Toronto, Student of Vector Faculty Member
3 | Controlled Online Optimization Learning | Kyle Mills, Ontario Tech University, Vector Postgraduate Affiliate
4 | How to Accelerate Attention-based Models Using GOBO | Ali Hadi Zadeh, University of Toronto, Student of Vector Faculty Affiliate
5 | An Analysis of Mortality in Ontario using Cremation Data: Rise in Cremations during the COVID-19 Pandemic | Gemma Postill, Western University, Student of Vector Faculty Affiliate
6 | Reinforcement Learning in Large, Structured Action Spaces: A Simulation Study of Decision Support for Spinal Cord Injury Rehabilitation | Nathan Phelps, Western University, Student of Vector Faculty Affiliate
7 | A Biologically-inspired Neural Implementation of Affect Control Theory | Aarti Malhotra, University of Waterloo, Student of Vector Faculty Affiliate
8 | CLAR: Contrastive Learning of Auditory Representations | Haider Al-Tahan, Western University, Student of Vector Faculty Affiliate
9 | Asynchronous Multi-view Simultaneous Localization and Mapping | Joyce Yang, University of Toronto, Student of Vector Faculty Member
10 | Learning Agent Representations for Ice Hockey | Guiliang Liu, University of Waterloo, Postdoctoral Fellow of Vector Faculty Member
11 | Inverse Reinforcement Learning for Team Sports: Valuing Actions and Players | Yudong Luo, University of Waterloo, Student of Vector Faculty Member
12 | Machine Learning Approaches to Piecewise Linear Interface Construction (PLIC) | Mohammadmehdi Ataei, University of Toronto, Vector Postgraduate Affiliate
13 | Swept Volumes: To be Continued | Silvia Sellán, University of Toronto, Student of Vector Faculty Affiliate
14 | Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks | Amin Rakhsha, University of Toronto, Student of Vector Faculty Member
15 | Are Wider Nets better Given the Same Number of Parameters? | Anna Golubeva, University of Waterloo, Vector Postgraduate Affiliate
16 | Automatic Whole Cell Segmentation for Multiplexed Images of Ovarian Cancer Tissue Section | Wenchao Han, University of Toronto, Student of Vector Faculty Affiliate
17 | Robotic Assessment of Stroke Patients using a Deep Learning Framework | Faranak Akbarifar, Queen’s University, Student of Vector Faculty Affiliate
18 | Reveal the Hidden Graph in RIEMS Mass Spectra: Margin Assessment in Cancer Surgery with Graph Neural Networks | Amoon Jamzad, Queen’s University, Postdoctoral Fellow of Vector Faculty Affiliate
19 | Configural Processing in Humans and Deep Convolutional Neural Networks | Xingye Fan, York University, Student of Vector Faculty Affiliate
20 | Auto-Tuning Structured Light by Optical Stochastic Gradient Descent | Parsa Mirdehghan, University of Toronto, Student of Vector Faculty Affiliate
21 | POSTER WITHDRAWN |
22 | Online Bayesian Moment Matching based SAT Solver Heuristics | Haonan Duan, University of Waterloo, Student of Vector Faculty Member
23 | miRNA-Based Deep Learning for Cancer Classification | Emily Kaczmarek, Queen’s University, Vector Scholarship in AI Recipient
24 | Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP | John Chen, University of Toronto, Student of Vector Faculty Member
25 | Utilizing Voxel Deep Neural Networks in Orbital-free Density Functional Theory | Kevin Ryczko, University of Ottawa, Vector Postgraduate Affiliate
26 | Data Augmentation Using GANs for GAN Based Detection of Pneumonia and COVID-19 in X-ray Images | Saman Motamed, University of Toronto, Student of Vector Faculty Affiliate
27 | Fast Inverse Mapping of Face GANs | Nicky Bayat, Western University, Student of Vector Faculty Affiliate
28 | Vec2int: Applications of the Chinese Remainder Theorem in Word Embedding Compression and Arithmetic | Patricia Thaine, University of Toronto, Vector Postgraduate Affiliate
29 | Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes | Jake Snell, University of Toronto, Student of Vector Faculty Member
30 | Estimating Severity of Depression from Acoustic Features and Embeddings of Natural Speech | Sri Harsha Dumpala, Dalhousie University, Student of Vector Faculty Member
31 | Fall Risk Assessment in the Wild Using Egocentric Vision and Wearable IMU data | Mina Nouredanesh, University of Waterloo, Vector Postgraduate Affiliate
32 | No MCMC for me: Amortized Sampling for Fast and Stable Training of Energy-Based Models | Jacob Kelly, University of Toronto, Student of Vector Faculty Member
33 | A Modified AUC for Training Convolutional Neural Networks: Taking Confidence into Account | Ernest Namdar, University of Toronto, Student of Vector Faculty Affiliate
34 | Flexible Few-Shot Learning of Contextual Similarity | Mengye Ren, University of Toronto, Student of Vector Faculty Member
35 | Instance Selection for GANs | Terrance DeVries, University of Guelph, Student of Vector Faculty Member
36 | Explicit and Implicit Regularization in Overparameterized Least Squares Regression | Denny Wu, University of Toronto, Student of Vector Faculty Member
37 | AMINN: Autoencoder-based Multiple Instance Neural Network for Outcome Prediction of Multifocal Cancer | Jianan Chen, University of Toronto, Student of Vector Faculty Affiliate
38 | Folk Theories, Machine Learning, and XAI | Michael Ridley, Western University, Vector Postgraduate Affiliate
39 | Adaptive Gradient Quantization for Data-Parallel SGD | Fartash Faghri, University of Toronto, Student of Vector Faculty Member
40 | Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding | Yangjun Ruan & Daniel Severo, University of Toronto, Students of Vector Faculty Member
41 | Musical Speech: A Transformer-based Composition Tool | Jason d'Eon, Dalhousie University, Student of Vector Faculty Member
42 | EEG Source Space Analysis for Brain-computer Interfaces | Leila Mousapour, McMaster University, Student of Vector Faculty Affiliate
Tuesday, February 16th
Day One Poster Presentations

Poster #1
Learning Differential Equations that are Easy to Solve
Presenter: Jacob Kelly, University of Toronto
Collaborators: Jesse Bettencourt (University of Toronto, Vector Institute), Matthew James Johnson (Google Brain), and David Duvenaud (University of Toronto, Vector Institute)

Neural ODEs become expensive to solve numerically as training progresses. We introduce a differentiable surrogate for the time cost of standard numerical solvers using higher-order derivatives of solution trajectories. These derivatives are efficient to compute with Taylor-mode automatic differentiation. Optimizing this additional objective trades model performance against the time cost of solving the learned dynamics.

Poster #2
Improving the Classification Parity of Machine Learning Models through Subgroup Thresholds Optimization (STO)
Presenter: Cecilia Ying, Queen’s University

We introduce a new post-processing algorithm to reduce the bias, and hence, improve the classification parity of machine learning model predictions against any protected subgroup without the need to: (a) preprocess the input training data; or (b) change the underlying machine learning algorithm. Our algorithm, the Subgroup Threshold Optimizer (STO), optimizes the classification thresholds for individual subgroups in order to minimize the overall discrimination score between all subgroups. Since our algorithm works at the post-processing stage, it does not require any changes to the training data and is machine learning model-agnostic. We evaluated the effectiveness of our algorithm in achieving both group fairness and subgroup fairness and show improvement for group fairness and for subgroup fairness by reducing the overall post adjustment discrimination score.
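The idea of tuning one threshold per subgroup after training can be sketched in a few lines. This is only an illustrative sketch, not the STO algorithm itself: the abstract does not define its discrimination score, so the code assumes it is the largest gap in positive-prediction rate between subgroups, breaks ties by accuracy, and uses a plain grid search. The function names, the toy scores, and the labels are all hypothetical.

```python
from itertools import product

def positive_rate(scores, threshold):
    """Fraction of samples predicted positive at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def count_correct(scores, labels, threshold):
    """Number of samples whose thresholded prediction matches the label."""
    return sum(int(s >= threshold) == y for s, y in zip(scores, labels))

def optimize_thresholds(data, grid):
    """Grid-search one threshold per subgroup.

    Assumed objective (not taken from the poster): minimize the largest gap
    in positive-prediction rate between subgroups, then maximize accuracy.
    """
    groups = sorted(data)
    best_key, best_thresholds = None, None
    for combo in product(grid, repeat=len(groups)):
        rates = [positive_rate(data[g][0], t) for g, t in zip(groups, combo)]
        correct = sum(count_correct(*data[g], t) for g, t in zip(groups, combo))
        key = (max(rates) - min(rates), -correct)
        if best_key is None or key < best_key:
            best_key, best_thresholds = key, dict(zip(groups, combo))
    return best_key[0], best_thresholds

# Hypothetical model scores and true labels for two subgroups.
data = {
    "A": ([0.9, 0.8, 0.7, 0.4, 0.2], [1, 1, 1, 0, 0]),
    "B": ([0.6, 0.5, 0.3, 0.2, 0.1], [1, 1, 1, 0, 0]),
}
grid = [k / 10 for k in range(1, 10)]

# Gap in positive rate when both subgroups share the default 0.5 threshold.
shared = [positive_rate(scores, 0.5) for scores, _ in data.values()]
baseline_gap = max(shared) - min(shared)

best_gap, thresholds = optimize_thresholds(data, grid)
print(baseline_gap, best_gap, thresholds)
```

Because the search touches only scores and thresholds, it leaves the training data and the model unchanged, which is the property the abstract emphasizes. An exhaustive grid grows exponentially with the number of subgroups, so it is practical only for a handful of them.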
Poster #3
Contrastive Learning for Sports Video: Unsupervised Player Classification
Presenter: Maria Koshkina, York University
Collaborators: Hemanth Pidaparthy (York University) and James Elder (York University)

We address the problem of unsupervised classification of players in a team sport according to their team affiliation, when jersey colours and design are not known a priori. We adopt a contrastive learning approach in which an embedding network learns to maximize the distance between representations of players on different teams relative to players on the same team, in a purely unsupervised fashion, without any labelled data. We evaluate the approach using a new hockey dataset and find that it outperforms prior unsupervised approaches by a substantial margin, particularly for real-time application when only a small number of frames are available for unsupervised learning before team assignments must be made. Remarkably, we show that our contrastive method achieves 94% accuracy after unsupervised training on only a single frame, with accuracy rising to 97% within 500 frames (17 seconds of game time). We further demonstrate how accurate team classification allows accurate team-conditional heat maps of player positioning to be computed.

Poster #4
Learning from Unexpected Events in the Neocortical Microcircuit
Presenter: Jason Pina, York University
Collaborators: Colleen Gillon (University of Toronto, Mila), Jerome Lecoq (Allen Institute for Brain Science), Joel Zylberberg (Vector Institute, York University, CIFAR), and Blake Richards (McGill University, Mila, CIFAR)

Scientists have long conjectured that the neocortex learns the structure of the environment in a predictive, hierarchical manner. To do so, expected, predictable features are differentiated from unexpected ones by comparing bottom-up and top-down streams of data. It is theorized that the neocortex then changes the representation of incoming stimuli, guided by differences in the responses to expected and unexpected events. Such differences in cortical responses have been observed; however, it remains unknown whether these unexpected event signals govern subsequent changes in the brain’s stimulus representations, and, thus, govern learning. Here, we show that unexpected event signals predict subsequent changes in responses to expected and unexpected stimuli in individual neurons and distal apical dendrites that are tracked over a period of days. These findings were obtained by observing layer 2/3 and layer 5 pyramidal neurons in primary visual cortex of awake, behaving mice using two-photon calcium imaging. We found that many neurons in both layers 2/3 and 5 showed large differences between their responses to expected and unexpected events. These unexpected event signals also determined how the responses evolved over subsequent days, in a manner that was different between the somata and distal apical dendrites. This difference between the somata and distal apical dendrites may be important for hierarchical computation, given that these two compartments tend to receive bottom-up and top-down information, respectively. Together, our results provide novel evidence that the neocortex indeed instantiates a predictive hierarchical model in which unexpected events drive learning.

Poster #5
Regularized Linear Autoencoders Recover the Principal Components, Eventually
Presenter: Xuchan Bao, University of Toronto
Collaborators: James Lucas, Sushant Sachdeva, and Roger Grosse (all with Vector Institute, University of Toronto)

Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs). We show that when trained with proper regularization, LAEs can directly learn the optimal representation - ordered, axis-aligned principal components. We analyze two such regularization schemes: non-uniform ℓ2 regularization and a deterministic variant of nested dropout [Rippel et al., ICML 2014]. Though both regularization schemes converge to the optimal representation, we show that this convergence is slow due to ill-conditioning that worsens with increasing latent dimension. We show that the inefficiency of learning the optimal representation is not inevitable -- we present a simple modification to the gradient descent update that greatly speeds up convergence empirically.

Poster #6
Dataset Inference: Ownership Resolution in Machine Learning
Presenter: Mohammad Yaghini, University of Toronto
Collaborators: Pratyush Maini (Indian Institute of Technology Delhi), and Nicolas Papernot (Vector Institute, University of Toronto)
Upcoming Poster Spotlight at International Conference on Learning Representations (ICLR) 2021

With increasingly more data and computation involved in their training, machine learning models constitute valuable intellectual property.
This has spurred interest in model stealing attacks, which are made more practical by advances in learning with partial, little, or no supervision. Existing defenses focus on inserting unique watermarks in the model's decision surface, but this is insufficient: since the watermarks are not sampled from the training distribution, they are not always preserved during model stealing. In this paper, we make the key observation that knowledge contained in the stolen model's training set is what is common to all stolen copies. The adversary's goal, irrespective of the attack employed, is always to extract this knowledge or its by-products. This gives the original model's owner a strong advantage over the adversary: model owners have access to the original training data. We thus introduce dataset inference, the process of identifying whether a suspected model copy has private knowledge from the original model's dataset, as a defense against model stealing. We develop an approach for dataset inference that combines statistical testing with the ability to estimate the distance of multiple data points to the decision boundary. Our experiments on CIFAR10 and CIFAR100 show that model owners can claim with confidence greater than 99% that their model (or dataset, for that matter) was stolen, despite only exposing 50 of the stolen model's training points. Dataset inference defends against state-of-the-art attacks, even when the adversary is adaptive. Unlike prior work, it also does not require retraining or overfitting the defended model.

Poster #7
A Transfer Learning Based Active Learning Framework for Brain Tumor Classification
Presenter: Ernest Namdar, University of Toronto
Collaborators: Ruqian Hao (University of Electronic Science and Technology of China), Lin Liu (University of Electronic Science and Technology of China), and Farzad Khalvati (University of Toronto)

Brain tumors are among the leading causes of cancer-related death globally among children and adults. Precise classification of brain tumor grade (low-grade and high-grade glioma) at an early stage plays a key role in successful prognosis and treatment planning. With recent advances in deep learning, Artificial Intelligence-enabled brain tumor grading systems can assist radiologists in the interpretation of medical images within seconds. The performance of deep learning techniques is, however, highly dependent on the size of the annotated dataset. It is extremely challenging to label a large quantity of medical images given the complexity and volume of medical data.
In this work, we propose a novel transfer learning based active learning framework to reduce the annotation cost while maintaining stability and robustness of the model performance for brain tumor classification. We employed a 2D slice-based approach to train and fine-tune our model on a Magnetic Resonance Imaging (MRI) training dataset of 203 patients and a validation dataset of 66 patients, which was used as the baseline. With our proposed method, the model achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of 82.89% on a separate test dataset of 66 patients, which was 2.92% higher than the baseline AUC while saving at least 40% of labeling cost. In order to further examine the robustness of our method, we created a balanced dataset, which underwent the same procedure. The model achieved an AUC of 82% compared with an AUC of 78.48% for the baseline, which supports the robustness and stability of our proposed transfer learning augmented with active learning framework while significantly reducing the size of training data.
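One common way an active learning loop cuts annotation cost is to request labels only for the samples the current model is least sure about. The abstract does not say which query strategy the framework uses, so the sketch below assumes entropy-based uncertainty sampling purely for illustration; the slice identifiers and predicted probabilities are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` most uncertain unlabeled samples.

    `predictions` maps a sample id to the model's predicted probability of
    the positive class (here, hypothetically, high-grade glioma).
    """
    ranked = sorted(predictions,
                    key=lambda sid: binary_entropy(predictions[sid]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical predictions of a pretrained model on an unlabeled pool.
pool = {
    "slice_01": 0.97,
    "slice_02": 0.52,
    "slice_03": 0.10,
    "slice_04": 0.61,
    "slice_05": 0.35,
}
chosen = select_for_labeling(pool, budget=2)
print(chosen)  # the two slices whose predictions are closest to 0.5
```

In a full loop the chosen samples would be annotated, added to the training set, and the model fine-tuned again before the next round of selection.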
Poster #8
Can Future Wireless Networks Detect Fires?
Presenter: David Radke, University of Waterloo
Collaborators: Omid Abari (UCLA), Tim Brecht (University of Waterloo), and Kate Larson (University of Waterloo)
Oral at ACM International Conference on Systems for Energy-Efficient Built Environments (BuildSys) 2021

Latencies, operating ranges, and false positive rates for existing indoor fire detection systems like smoke detectors and sprinkler systems are far from ideal. This paper explores the use of wireless radio frequency (RF) signals to detect indoor fires with low latency, through walls and other occlusions. We build on past research focused on wireless sensing, and introduce RFire, a system which uses millimeter wave technology and deep learning to extract instances of fire. We perform line-of-sight (LoS) and occluded non-LoS experiments with fire at different distances, and find that RFire achieves a best-result mean latency of 24 seconds when trained and tested in multiple environments. RFire yields at least a 4 times improvement in mean alarm latency over today's alarms.

Poster #9
Domain Adaptation and Self-supervised Learning for Surgical Margin Detection
Presenter: Alice Santilli, Queen’s University
Collaborators: Amoon Jamzad (Computing, Queen's University), Alireza Sedghi (Computing, Queen's University), Martin Kaufmann (Surgery, Queen's University), Kathryn Logan (Pathology, Queen's University), Julie Wallis (Pathology, Queen's University), Kevin Y.M. Ren (Pathology, Queen's University), Natasja Janssen (Computing, Queen's University), Shaila Merchant (Surgery, Queen's University), Jay Engel (Surgery, Queen's University), Doug McKay (Surgery, Queen's University), Sonal Varma (Pathology, Queen's University), Ami Wang (Pathology, Queen's University), Gabor Fichtinger (Computing, Queen's University), John F. Rudan (Surgery, Queen's University), and Parvin Mousavi (Computing, Queen's University)

Purpose: One in five women who undergo breast conserving surgery will need a second revision surgery due to tumor tissue that has been left behind. The iKnife is a mass spectrometry modality that produces real-time margin information based on the signatures of metabolites in surgical smoke. Using this modality and real-time tissue classification, surgeons could remove all cancerous tissue during the initial surgery, which would improve survival, mental health and cosmetic outcomes for patients. An obstacle in developing the iKnife breast cancer recognition model is the destructive, time-consuming and sensitive nature of the data collection that limits the size of the datasets.
Methods: We propose to address these obstacles by first building a self-supervised learning model from limited, weakly labeled data. By doing so, the model can learn to contextualize the general features of iKnife data with a more accessible cancer tissue type. Second, the trained model can then be applied to a cancer classification task on breast data. This domain adaptation allows for the transfer of learnt weights from models of one tissue type to another.

Results: Our datasets contained 320 skin burns (129 tumor burns, 191 normal burns) from 51 patients and 144 breast tissue burns (41 tumor and 103 normal) from 11 patients. We investigate the effect of different hyperparameters on the performance of the final classifier and show that the proposed two-step configuration achieves an accuracy, sensitivity and specificity of 92%, 88% and 92%, respectively.

Conclusion: We showed that having a limited number of breast data samples for training a classifier can be compensated for by self-supervised domain adaptation on a set of unlabelled skin data.

Poster #10
Experience Selection Using Dynamics Similarity for Efficient Multi-source Transfer Learning Between Robots
Presenter: SiQi Zhou, University of Toronto
Collaborators: Michael J. Sorocky (Vector Institute, University of Toronto), and Angela P. Schoellig (Vector Institute, University of Toronto)
Oral at International Conference on Robotics and Automation (ICRA) 2020

In the robotics literature, different knowledge transfer approaches have been proposed to leverage the experience from a source task or robot -- real or virtual -- to accelerate the learning process on a new task or robot. A commonly made but infrequently examined assumption is that incorporating experience from a source task or robot will be beneficial. In practice, inappropriate knowledge transfer can result in negative transfer or unsafe behaviour.
In this work, inspired by a system gap metric from robust control theory, the nu-gap, we present a data-efficient algorithm for estimating the similarity between pairs of robot systems. In a multi-source inter-robot transfer learning setup, we show that this similarity metric allows us to predict relative transfer performance and thus informatively select experiences from a source robot before knowledge transfer. We demonstrate our approach with quadrotor experiments, where we transfer an inverse dynamics model from a real or virtual source quadrotor to enhance the tracking performance of a target quadrotor on arbitrary hand-drawn trajectories. We show that selecting experiences based on the proposed similarity metric effectively facilitates the learning of the target quadrotor, improving performance by 62% compared to a poorly selected experience.
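To make the underlying metric concrete: for two single-input single-output linear systems, the nu-gap equals the largest pointwise chordal distance between their frequency responses, provided a winding-number condition holds (otherwise it is 1). The sketch below evaluates that textbook frequency-domain formula on a grid and picks the closest source system. It is not the data-efficient estimator the poster proposes, it skips the winding-number check, and the first-order plants and their parameters are hypothetical stand-ins for robot dynamics.

```python
import math

def first_order(gain, tau):
    """Frequency response of the hypothetical plant P(s) = gain / (tau*s + 1)."""
    return lambda w: gain / complex(1.0, tau * w)

def chordal_distance(p1, p2):
    """Pointwise chordal distance between two complex frequency-response values."""
    return abs(p1 - p2) / (math.sqrt(1 + abs(p1) ** 2) * math.sqrt(1 + abs(p2) ** 2))

def nu_gap_estimate(plant_a, plant_b, frequencies):
    """Largest chordal distance over a frequency grid.

    Assumption: the winding-number condition of the nu-gap holds, so the
    supremum over frequency is the metric; that check is omitted here.
    """
    return max(chordal_distance(plant_a(w), plant_b(w)) for w in frequencies)

# DC plus a logarithmic grid from 1e-3 to 1e3 rad/s.
frequencies = [0.0] + [10 ** (k / 10) for k in range(-30, 31)]

target = first_order(1.0, 1.0)
sources = {
    "similar": first_order(1.1, 1.0),
    "dissimilar": first_order(3.0, 0.2),
}

gaps = {name: nu_gap_estimate(target, plant, frequencies)
        for name, plant in sources.items()}
best_source = min(gaps, key=gaps.get)
print(gaps, best_source)
```

A smaller value means more similar dynamics, so ranking candidate sources by this number mirrors the selection step described in the abstract.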
Poster #11
Multi-agent Correlated Deep Q-learning for Microgrid Energy Management
Presenter: Hao Zhou, University of Ottawa
Collaborators: Melike Erol-Kantarci (University of Ottawa)

Microgrid (MG) energy management is an important part of MG operation. Various entities are generally involved in the energy management of an MG, e.g., energy storage system (ESS), renewable energy resources (RER) and the load of users, and it is crucial to coordinate these entities. The main contribution of this paper is a correlated deep Q-learning (CDQN) method for MG energy management, where each agent runs the DQN independently and the correlated equilibrium is used for coordination. Our simulation results show that CDQN yields 40.9% and 9.62% higher profit for the ESS agent and the PV agent, respectively.

Poster #12
A Learning-based Algorithm to Quickly Compute Good Primal Solutions for Stochastic Integer Programs
Presenter: Rahul Patel, University of Toronto
Collaborators: Yoshua Bengio (Mila/University of Montreal), Andrea Lodi (Polytechnique Montreal), Emma Frejinger (University of Montreal), and Sriram Sankaranarayanan (Polytechnique Montreal)

We propose a novel approach using supervised learning to obtain near-optimal primal solutions for two-stage stochastic integer programming (2SIP) problems with constraints in the first and second stages. The goal of the algorithm is to predict a representative scenario (RS) for the problem such that deterministically solving the 2SIP with the random realization equal to the RS gives a near-optimal solution to the original 2SIP. Predicting an RS, instead of directly predicting a solution, ensures first-stage feasibility of the solution. If the problem is known to have complete recourse, second-stage feasibility is also guaranteed.
For computational testing, we learn to find an RS for a two-stage stochastic facility location problem with integer variables and linear constraints in both stages and consistently provide near-optimal solutions. Our computing times are very competitive with those of general-purpose integer programming solvers to achieve a similar solution quality.
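The role of a representative scenario can be illustrated on a toy problem small enough to solve by enumeration. In this sketch the first stage chooses an integer capacity and the second stage pays a penalty for unmet demand. The problem, its costs, and the demand distribution are all hypothetical, and the RS is simply the expected demand, a placeholder for the learned predictor described in the abstract.

```python
def second_stage_cost(capacity, demand, penalty=3.0):
    """Recourse cost: pay `penalty` per unit of unmet demand."""
    return penalty * max(demand - capacity, 0)

def expected_cost(capacity, scenarios, unit_cost=1.0):
    """True two-stage objective: capacity cost plus expected recourse cost."""
    recourse = sum(p * second_stage_cost(capacity, d) for d, p in scenarios)
    return unit_cost * capacity + recourse

def solve_with_scenario(demand, candidates, unit_cost=1.0):
    """Deterministic problem: treat `demand` as the only possible realization."""
    return min(candidates,
               key=lambda x: unit_cost * x + second_stage_cost(x, demand))

# Hypothetical demand scenarios as (demand, probability) pairs.
scenarios = [(1, 0.5), (2, 0.3), (4, 0.2)]
candidates = range(6)  # feasible integer first-stage decisions

# Reference solution of the full stochastic problem by enumeration.
x_opt = min(candidates, key=lambda x: expected_cost(x, scenarios))

# Placeholder RS: the expected demand (the poster learns this prediction).
mean_demand = sum(d * p for d, p in scenarios)
x_mean = solve_with_scenario(mean_demand, candidates)

# A poorly chosen RS for comparison: the worst-case demand.
x_worst = solve_with_scenario(4, candidates)

print(x_opt, x_mean, x_worst)
print(expected_cost(x_mean, scenarios), expected_cost(x_worst, scenarios))
```

Whichever scenario is used, the decision comes out of a solve over the feasible first-stage set, so it is feasible by construction; the quality of the RS only affects how close its expected cost is to the optimum.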
Poster #13
Classification of Spinal Curvature from 3D Spine CT Images using a Convolutional Neural Network
Presenter: Geoff Klein, University of Toronto
Collaborators: Michael Hardisty (Sunnybrook Research Institute, University of Toronto), Isaac Carreno (Sunnybrook Research Institute, University of Toronto), Joel Finkelstein (Sunnybrook Research Institute, University of Toronto), Young Lee (University of Toronto), Arjun Sahgal (Sunnybrook Research Institute, University of Toronto), Cari Whyne (Sunnybrook Research Institute, University of Toronto), and Anne Martel (Sunnybrook Research Institute, University of Toronto)

Introduction: Approximately two-thirds of cancer patients develop bone metastases, with the spine being the most common location. Vertebral metastases can lead to biomechanical instability, pain, and neurological compromise. Stereotactic body radiation therapy (SBRT) delivers high-dose focal treatment to tumours and this treatment has been rapidly expanding because of its effectiveness for local tumour control. A significant side effect of SBRT is vertebral compression fracture, occurring in 10% to 40% of patients following SBRT. The broad goal of this work is to build automated, quantitative tools to aid clinical decision making related to mechanical stability and fracture risk in metastatically involved vertebrae. Spinal malalignment (scoliotic deformity characterised as abnormal lateral spinal curvature) is one measure that has been used in predicting vertebral fracture and progression following SBRT. However, current evaluation of spinal malalignment can be time consuming (requiring Cobb angle measurement) with significant inter-observer variation. As such, an automated algorithm to evaluate Cobb angle in 3D Computed Tomography (CT) scans was developed and applied to patients with spinal metastasis treated with SBRT.
Methods: A 3D U-Net model predicted a Gaussian heatmap for spine localization, from which spline curves were calculated and projected into the coronal plane. Angles were calculated from the gradient of the spline curves, and a sliding window was used to take the median angle along the spline curve, from which the overall Cobb angle for the spine was determined.

Data: The VerSe 2019 dataset was used to train the 3D U-Net to predict the Gaussian heatmaps of the spine, with the target heatmaps generated from ground truth vertebral body centroids. Augmentation during training was done through random isotropic resampling (between 3.5 and 6 mm), and affine and deformation transformations. An in-house dataset from diagnostic imaging (45 CT scans) was then used for parameter tuning to calculate Cobb angles from the predicted heatmaps. The overall pipeline was then validated on an additional spine dataset collected for SBRT treatment planning (63 CT scans). The in-house data was a retrospective dataset of patients who underwent SBRT for spinal metastases, where dosimetry calculations were done on the SBRT treatment planning images and diagnostic images were follow-ups after treatment. The in-house data showed a mix of malalignment in terms of
existence and severity, surgical intervention (screws), metastases and vertebral fractures. Malalignment frequently extended beyond the field-of-view of the scan and pelvic involvement was common in the scans. Ground truth Cobb angles and scoliosis classification for the in-house datasets were provided by a Spine Fellow. Ground truth and predicted angles above 10° were classified as scoliotic.

Results: The model was able to predict scoliosis with an accuracy of 79.5% and 76.2% on the diagnostic imaging and SBRT planning datasets, respectively. The mean ground truth and predicted Cobb angles in the SBRT treatment planning dataset were 8.8° ± 7.0° (ranging from 0.8° to 28.0°) and 9.5° ± 7.5° (ranging from 1.2° to 35.6°), respectively. The mean ground truth and predicted Cobb angles in the diagnostic imaging dataset were 8.5° ± 6.7° (ranging from 0.2° to 28.6°) and 12.0° ± 12.6° (ranging from 0.4° to 51.4°), respectively.

Conclusion: A fully automated model was constructed to predict scoliotic spinal curvature in 3D CT spine scans by evaluating the Cobb angle. Spinal curvature (scoliosis deformity) is a contributing parameter for the SINS classification to determine instability. This algorithm can be used in clinical decision making to aid in spinal curvature classification and scoliosis severity assessment. Future work will focus on improving accuracy, expansion to kyphotic deformity, and combining with other image features related to fracture risk.

Poster #14 A Computational Framework for Slang Generation
Presenter: Zhewei Sun, University of Toronto
Collaborators: Richard Zemel (Vector Institute, University of Toronto) and Yang Xu (Vector Institute, University of Toronto)

Slang is a common type of informal language, but its flexible nature and paucity of data present challenges for existing natural language systems. We take an initial step toward machine generation of slang by developing a framework that models the speaker's word choice in slang context.
Our framework encodes novel slang meaning by relating the conventional and slang senses of a word while incorporating syntactic and contextual knowledge in slang usage. We construct the framework using a combination of probabilistic inference and neural contrastive learning. We perform rigorous evaluations on three slang dictionaries and show that our approach not only outperforms state-of-the-art language models, but also better predicts the historical emergence of slang word usages from the 1960s to the 2000s. We interpret the proposed models and find that the contrastively learned semantic space is sensitive to the similarities between slang and conventional senses of words. Our work creates opportunities for the automated generation and interpretation of informal language.
Poster #15 Self-supervised Driven Consistency Training for Annotation Efficient Histopathology Image Analysis
Presenter: Chetan Srinidhi, University of Toronto
Collaborators: Seung Wook Kim (Dept. CSE, University of Toronto), Fu-Der Chen (Dept. of ECE, University of Toronto)

Training a neural network with a large labeled dataset is still a dominant paradigm in computational histopathology. However, obtaining such exhaustive manual annotations is often expensive, laborious, and prone to inter- and intra-observer variability. While recent self-supervised and semi-supervised methods can alleviate this need by learning unsupervised feature representations, they still struggle to generalize well to downstream tasks when the number of labeled instances is small. In this work, we overcome this challenge by leveraging both task-agnostic and task-specific unlabeled data based on two novel strategies: i) a self-supervised pretext task that harnesses the underlying multi-resolution contextual cues in histology whole-slide images to learn a powerful supervisory signal for unsupervised representation learning; and ii) a new teacher-student semi-supervised consistency paradigm that learns to effectively transfer the pretrained representations to downstream tasks based on prediction consistency with the task-specific unlabeled data. We carry out extensive validation experiments on three histopathology benchmark datasets across two classification tasks and one regression-based task, i.e., tumor metastasis detection, tissue type classification, and tumor cellularity quantification. Under limited-label data, the proposed method yields tangible improvements, approaching or even outperforming other state-of-the-art self-supervised and supervised baselines. Furthermore, we empirically show that the idea of bootstrapping the self-supervised pretrained features is an effective way to improve the task-specific semi-supervised learning on standard benchmarks.
Besides, we also show that our pretrained representations are more generic and agnostic to images trained with different tissue types or organs and resolution protocols.

Poster #16 Identifying and Interpreting Tuning Dimensions in Deep Networks
Presenter: Nolan Dey, University of Waterloo
Collaborators: J. Eric Taylor (Vector Institute, University of Guelph), Bryan P. Tripp (University of Waterloo), Alexander Wong (University of Waterloo), and Graham W. Taylor (Vector Institute, University of Guelph)
Poster #17 Neural Response Time Analysis: XAI Using Only a Stopwatch
Presenter: Eric Taylor, Vector Institute
Collaborators: Shashank Shekhar (University of Guelph), and Graham Taylor (University of Guelph)
Oral at Conference on Computer Vision and Pattern Recognition (CVPR) 2020

How would you describe the features that a deep learning model composes if you were restricted to measuring observable behaviours? Explainable artificial intelligence (XAI) methods rely on privileged access to model architecture and parameters, which is often not available to users, practitioners, and regulators. Inspired by cognitive psychology research on humans, we present a case for measuring response times (RTs) of a forward pass using only the system clock as a technique for XAI. Our method applies to the growing class of models that use input-adaptive dynamic inference, and we also extend our approach to standard models that are converted to dynamic inference post hoc. The experimental logic is simple: if the researcher can contrive a stimulus set where variability among input features is tightly controlled, differences in response time for those inputs can be attributed to the way the model composes those features.
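The measurement idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' setup: `toy_early_exit_model` is a hypothetical stand-in for an input-adaptive model (it runs more blocks for "harder" inputs), and `measure_rt` is a hypothetical helper that times a forward pass with the system clock and takes the median over a few repeats to damp timing noise.

```python
import statistics
import time


def toy_early_exit_model(difficulty, max_blocks=20, work_per_block=20000):
    """Hypothetical input-adaptive model: harder inputs pass through more blocks."""
    blocks_run = 0
    acc = 0.0
    for block in range(max_blocks):
        # Fixed amount of dummy computation standing in for one network block.
        for k in range(work_per_block):
            acc += (k % 7) * 1e-9
        blocks_run += 1
        # Toy exit rule: stop once "confidence" reaches 1.0.
        confidence = (block + 1) / difficulty
        if confidence >= 1.0:
            break
    return blocks_run


def measure_rt(model, model_input, repeats=5):
    """Median wall-clock time (seconds) of a forward pass, using only the system clock."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        model(model_input)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    for difficulty in (1, 10, 20):
        rt = measure_rt(toy_early_exit_model, difficulty)
        print(f"difficulty={difficulty:2d}  RT={rt * 1e3:.2f} ms")
```

Only the elapsed time is observed here; the number of blocks executed is never read by `measure_rt`, which mirrors the black-box constraint described above.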
First, we show that RT is sensitive to difficult, complex features by comparing RTs from ObjectNet and ImageNet. Next, we make specific a priori predictions about RT for abstract features present in the SCEGRAM dataset, where object recognition in humans depends on complex intra-scene object-object relationships. Finally, we show that RT profiles bear specificity for class identity, and therefore the features that define classes. These results cast light on the model's feature space without opening the black box.

Poster #18 Solving First Passage Problems in Nanofluidic Devices with Deep Neural Networks
Presenter: Martin Magill, Ontario Tech University
Collaborators: Andrew M. Nagel (Ontario Tech University), and Hendrick W. de Haan (Ontario Tech University)

A major theme in the deep learning revolution has been the ability of deep models to overcome the curse of dimensionality in a wide variety of settings. One such application is the solution of high-dimensional partial differential equations (PDEs). PDEs are powerful mathematical tools used throughout physics and the mathematical sciences. The computational costs of traditional PDE solvers grow exponentially with problem dimension, so high-dimensional PDEs are typically considered intractable. However, the use of deep neural networks is rapidly opening up new avenues of research in this area. This work looks at the use of deep neural networks to solve PDEs from biophysics that describe complex molecular motion. Specifically, the PDEs are formulated to understand the mean first passage time of molecules passing through a microfluidic sorting device. Such devices are designed with complex geometries to enable single-molecule detection, analysis, and manipulation, enabling a variety of biotechnologies (e.g., personalized medicine). The use of deep neural networks could enable faster and more efficient design of these complicated devices.
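For readers unfamiliar with mean first passage time (MFPT) problems, a minimal sketch of the simplest textbook case may help. It is not the device model from the poster and uses a classical finite-difference solver rather than a neural network. It assumes pure diffusion with coefficient D on the interval [0, L] with absorbing ends, where the MFPT T(x) satisfies D T''(x) = -1 with T(0) = T(L) = 0 and has the closed form T(x) = x(L - x) / (2D). The function names are hypothetical.

```python
def analytic_mfpt(x, length, diffusion):
    """Closed-form MFPT for 1D diffusion with absorbing ends at 0 and `length`."""
    return x * (length - x) / (2.0 * diffusion)


def solve_mfpt_1d(length, diffusion, n_interior):
    """Solve D * T'' = -1, T(0) = T(L) = 0 on a uniform grid of interior points.

    Central differences give the tridiagonal system
        -T[i-1] + 2*T[i] - T[i+1] = h**2 / D,
    which is solved here with the Thomas algorithm.
    """
    h = length / (n_interior + 1)
    rhs = [h * h / diffusion] * n_interior

    # Forward sweep (sub- and super-diagonal are -1, main diagonal is 2).
    c_prime = [0.0] * n_interior
    d_prime = [0.0] * n_interior
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, n_interior):
        denom = 2.0 + c_prime[i - 1]
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom

    # Back substitution.
    mfpt = [0.0] * n_interior
    mfpt[-1] = d_prime[-1]
    for i in range(n_interior - 2, -1, -1):
        mfpt[i] = d_prime[i] - c_prime[i] * mfpt[i + 1]
    return mfpt


if __name__ == "__main__":
    L, D, n = 2.0, 0.5, 9
    numeric = solve_mfpt_1d(L, D, n)
    for i, t in enumerate(numeric):
        x = L * (i + 1) / (n + 1)
        print(f"x={x:.1f}  numeric={t:.6f}  analytic={analytic_mfpt(x, L, D):.6f}")
```

Grid-based solvers like this one need a number of unknowns that grows exponentially with the dimension of the problem, which is the cost that motivates neural-network solvers for high-dimensional PDEs.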
Poster #19 Adversarial Robustness through the Lens of Fourier Analysis
Presenter: Avery Ma, University of Toronto
Collaborators: Simona Meng (University of Toronto), and Amir-massoud Farahmand (Vector Institute, University of Toronto)

How is a robustified model different from a non-robustified one from the Fourier perspective? Our work investigates the problem of adversarial robustness by empirically studying different defense and attack approaches in the frequency domain. Motivated by the widely-used assumption that natural images are primarily represented in low frequencies, we demonstrate in a simple logistic regression setting that standard training focuses on optimizing low-frequency
components of the weights, making the model vulnerable to high-frequency adversarial perturbations. In our preliminary results, we show that attenuating the high-frequency components of the weights during training leads to improved adversarial robustness of the network.

Poster #20 Understanding and Mitigating Exploding Inverses in Invertible Neural Networks
Presenter: Paul Vicol, University of Toronto
Collaborators: Jens Behrmann (University of Bremen), Kuan-Chieh Wang (University of Toronto & Vector Institute), Roger Grosse (University of Toronto & Vector Institute), and Jorn-Henrik Jacobsen (University of Toronto & Vector Institute)

Invertible neural networks (INNs) have been used to design generative models, implement memory-saving gradient computation, and solve inverse problems. In this work, we show that commonly-used INN architectures suffer from exploding inverses and are thus prone to becoming numerically non-invertible. Across a wide range of INN use-cases, we reveal failures including the non-applicability of the change-of-variables formula on in- and out-of-distribution (OOD) data, incorrect gradients for memory-saving backprop, and the inability to sample from normalizing flow models. We further derive bi-Lipschitz properties of atomic building blocks of common architectures. These insights into the stability of INNs then provide ways forward to remedy these failures. For tasks where local invertibility is sufficient, like memory-saving backprop, we propose a flexible and efficient regularizer. For problems where global invertibility is necessary, such as applying normalizing flows on OOD data, we show the importance of designing stable INN building blocks.
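A toy numerical example can make "numerically non-invertible" concrete. The sketch below is an assumption-laden illustration, not an experiment from the poster: it uses a single scalar affine coupling transform y2 = x2 * exp(s) + t with hand-picked, hypothetical values of the log-scale s and shift t, and simulates float32 storage with the standard `struct` module. The transform is invertible on paper for every s, but the inverse multiplies rounding errors by exp(-s).

```python
import math
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def coupling_forward(x2, log_scale, shift):
    """Affine coupling on the transformed half: y2 = x2 * exp(s) + t."""
    return to_float32(x2 * math.exp(log_scale) + shift)


def coupling_inverse(y2, log_scale, shift):
    """Analytic inverse: x2 = (y2 - t) * exp(-s)."""
    return to_float32((y2 - shift) * math.exp(-log_scale))


def reconstruction_error(log_scale, x2=1.0, shift=1000.0):
    """Absolute error after a forward pass followed by the analytic inverse."""
    y2 = coupling_forward(x2, log_scale, shift)
    return abs(x2 - coupling_inverse(y2, log_scale, shift))


if __name__ == "__main__":
    for s in (0.0, -5.0, -20.0):
        print(f"log-scale {s:6.1f}: inverse amplification {math.exp(-s):.3e}, "
              f"reconstruction error {reconstruction_error(s):.3e}")
```

With these hypothetical values, the s = -20 case adds roughly 2e-9 to 1000, which is far below float32 resolution at that magnitude, so the input cannot be recovered by the inverse at all.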
Poster #21 Brick-by-Brick: Sequential 3D Object Construction with Deep Reinforcement Learning
Presenter: Jungtaek Kim, Pohang University of Science and Technology (POSTECH)
Collaborators: Hyunsoo Chung (POSTECH), Boris Knyazev (University of Guelph, Vector Institute), Graham Taylor (University of Guelph, Vector Institute), Jinhwi Lee (POSTECH), Jaesik Park (POSTECH), and Minsu Cho (POSTECH)

3D object construction is a challenging problem requiring understanding of objects' compositional and relational structure. Humans solve this problem using their natural ability to imagine a decomposition of a target object into its constituent parts and then sequentially build the object part by part. Remarkably, humans often do so without strong supervision on the order in which, or where, to place each part. Our method models human behavior by constructing an object component-wise in a combinatorial manner. As the basis for learning, we utilize a volumetric unit primitive as the building block of 3D objects. In this regime, we formulate a reinforcement learning-based model without strong supervision of intermediate target object information or building instructions. Our approach employs graph-structured inputs, where the nodes and edges of the graph express the pose of primitives and the connection between them,
respectively. We introduce a reinforcement learning environment for construction based on OpenAI Gym and demonstrate that our approach successfully learns to construct objects within diverse evaluation scenarios conditioned on a single image or multiple views of a target object, even when the target information of unseen categories is given.

Poster #22 Representation of Non-local Shape Information in Deep Neural Networks
Presenter: Shaiyan Keshvari, York University
Collaborators: Ingo Fründ and James Elder (York University)

It is uncertain how explicitly deep convolutional neural networks (DCNNs) represent shape. While neurons in primate visual areas such as V4 and IT are known to be selective for global shape, some studies suggest that DCNNs rely primarily on local texture cues. Here we employ a set of novel shape stimuli to explicitly test for the representation of non-local shape information in DCNNs. We employ a set of animal silhouettes as well as matched controls generated by two distinct generative models of shape. The first model generates silhouettes that are matched for local curvature statistics, but are otherwise maximally random, containing no global regularities. The second model generates sparse shape components that contain many of the global symmetries seen in animal shapes but are otherwise not identifiable. To assess the selectivity of DCNNs for non-local shape information, we train a linear classifier to distinguish animal shapes from control shapes based on the activations in each layer. For both AlexNet and VGG16, discriminability improved monotonically from early to late convolutional layers, reaching 90-100% accuracy. These results show that DCNNs do represent non-local shape information, that this information becomes more explicit in later layers, and that it goes beyond simple global geometric regularities.
Poster #23 Learning Permutation Invariant Representations using Memory Networks
Presenter: Mohammed Adnan, University of Waterloo
Collaborators: Shivam Kalra (KIMIA Lab, University of Waterloo), Graham Taylor (University of Guelph, Vector Institute), and H.R. Tizhoosh (KIMIA Lab, University of Waterloo)

Many real-world tasks such as classification of digital histopathology images and 3D object detection involve learning from a set of instances. In these cases, only a group of instances or a
set, collectively, contains meaningful information and therefore only the sets have labels, and not individual data instances. In this work, we present a permutation invariant neural network called Memory-based Exchangeable Model (MEM) for learning set functions. The MEM model consists of memory units that embed an input sequence to high-level features, enabling the model to learn inter-dependencies among instances through a self-attention mechanism. We evaluated the learning ability of MEM on various toy datasets, point cloud classification, and classification of lung whole slide images (WSIs) into two subtypes of lung cancer: lung adenocarcinoma and lung squamous cell carcinoma. We systematically extracted patches from lung WSIs downloaded from The Cancer Genome Atlas (TCGA) dataset, the largest public repository of WSIs, achieving a competitive accuracy of 84.84% for classification of the two subtypes of lung cancer. The results on other datasets are promising as well, and demonstrate the efficacy of our model.

Poster #24 Evaluation Metrics for Deep Learning Imputation Models in Healthcare and Finance
Presenter: Omar Boursalie, McMaster University
Collaborators: Reza Samavi (Ryerson University, Vector Institute) and Thomas E. Doyle (McMaster University, Vector Institute)
Oral at AAAI Conference on Artificial Intelligence 2021

There is growing interest in imputing missing data in tabular datasets using deep learning. A commonly used metric in evaluating the performance of a deep learning-based imputation model is root mean square error (RMSE), which is a prediction evaluation metric. In this study, we demonstrate the limitations of RMSE for evaluating deep learning-based imputation performance by conducting a comparative analysis between RMSE and alternative metrics in the statistical literature, including qualitative assessment, predictive accuracy, and statistical distance.
To minimize model and dataset biases, we use two different deep learning imputation models (denoising autoencoders and generative adversarial nets) and a regression imputation model. We also use two tabular datasets, from the healthcare and financial sectors, with increasing amounts of missing data. Our results show that, in contrast to the commonly used RMSE metric, the statistical metric of Jensen-Shannon distance best assessed the imputation models' performance. The regression model also ranked higher than deep learning when evaluated using the Jensen-Shannon metric. This study was presented at the 5th International Workshop on Health Intelligence (W3PHIAI-21) co-located with the 35th AAAI Conference on AI. The paper will appear in Studies in Computational Intelligence (SCI).
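The difference between a prediction metric and a distributional metric can be seen on a small synthetic example. The sketch below is illustrative only and uses made-up data, not the study's datasets or models: a bimodal column is "imputed" once by its mean and once by values drawn from the correct distribution but assigned to the wrong rows. The helper names (`rmse`, `histogram`, `js_distance`) are hypothetical, and the Jensen-Shannon distance is computed as the square root of the base-2 Jensen-Shannon divergence between histograms.

```python
import math


def rmse(truth, imputed):
    """Root mean square error between paired values."""
    return math.sqrt(sum((t - v) ** 2 for t, v in zip(truth, imputed)) / len(truth))


def histogram(values, lo, hi, bins):
    """Normalized histogram of `values` over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance (sqrt of base-2 JS divergence) between two histograms."""
    def kl(a, m):
        return sum(ai * math.log2(ai / mi) for ai, mi in zip(a, m) if ai > 0)

    mix = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return math.sqrt(0.5 * kl(p, mix) + 0.5 * kl(q, mix))


# Synthetic bimodal "ground truth" column (made-up values).
true_values = [0.0, 0.0, 10.0, 10.0] * 25
# Mean imputation: every value replaced by the column mean.
mean_imputed = [5.0] * len(true_values)
# Right distribution, wrong rows: the true values rotated by one position.
shuffled_imputed = true_values[1:] + true_values[:1]

p_true = histogram(true_values, 0.0, 10.0, 5)
for name, imputed in (("mean", mean_imputed), ("shuffled", shuffled_imputed)):
    p_imp = histogram(imputed, 0.0, 10.0, 5)
    print(f"{name:9s} RMSE={rmse(true_values, imputed):.3f}  "
          f"JS distance={js_distance(p_true, p_imp):.3f}")
```

In this toy case the two metrics rank the imputations in opposite order: mean imputation has the lower RMSE while destroying the shape of the distribution, which is the kind of disagreement the comparative analysis above is concerned with.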
Poster #25 Detecting fMRI-based Intrinsic Connectivity Networks using EEG alone
Presenter: Saurabh Shaw, McMaster University
Collaborators: Margaret McKinnon (St. Joseph's, McMaster University, Homewood Research Institute), Jennifer Heisz (McMaster University), Amabilis Harrison (Hamilton Health Sciences), John Connolly (Vector Institute, McMaster University), and Suzanna Becker (Vector Institute, McMaster University)

Dysfunctional intrinsic connectivity network (ICN) dynamics have been discovered in a number of psychopathologies. However, despite their potential use as biomarkers for clinical applications, major barriers have prevented their widespread adoption. These include the high operational costs and low temporal resolution of fMRI, the most commonly used modality for this purpose. This study addresses this shortcoming by developing a machine learning pipeline capable of tracking ICNs using a cheaper and more widely accessible modality such as EEG. EEG-based features of three cognitively-relevant ICNs were found using feature engineering on simultaneous EEG-fMRI data. These features were used to train three classifiers, emulating different scenarios of data availability. The highest test-set classification accuracies of 97% were achieved using fully supervised classifiers that were trained on both EEG and fMRI data from the same participant. On the other hand, classification accuracies of 60% were achieved using traditional leave-one-subject-out cross validation on the EEG data only, and were boosted up to 75% by utilizing semi-supervised learning. In conclusion, this study validates a machine learning framework to detect ICN activation using EEG data alone, improving the feasibility of using brain network-based biomarkers in clinical applications.
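For readers unfamiliar with the evaluation protocol, leave-one-subject-out cross validation can be sketched as follows. This is a generic illustration with a hypothetical helper name and made-up subject labels, not the study's pipeline: each fold holds out every sample from one participant, so the classifier is always tested on a person it has never seen.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Made-up example: six EEG epochs recorded from three participants.
    subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
    for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
        print(f"held out {held_out}: train={train_idx} test={test_idx}")
```

This is a stricter setting than training and testing on data from the same participant, which is consistent with the gap between the two accuracy figures reported above.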
Poster #26 Understanding Public Sentiments about COVID-19 Non-pharmaceutical Interventions through Event Studies
Presenter: Jingcheng Niu, University of Toronto
Collaborators: Gerald Penn (University of Toronto), Victoria Ng (Public Health Agency of Canada), and Erin E. Rees (Public Health Agency of Canada)

Attributing shifts in social media sentiment to real-world events is now an important aspect of public policy. Especially as the whole world is combating the COVID-19 pandemic, a better understanding of the public's opinion of their government's responses is crucial for balancing the demand for public health resources against the potential for economic devastation and the public's own compliance with draconian measures. Early publications about public sentiment towards interventions against SARS-CoV-2 transmission, especially those not in computational linguistics (CL) conferences and journals, have already drawn some highly suspect conclusions because they lack a method for properly attributing sentiment changes to events. As yet, they have no ability to distinguish the influence of various events across time, no possibility of conducting
significance tests, and no coherent model for predicting the public's opinion of future events of the same sort. Dealing in sentiment analysis components without providing some clear, task-specific guidance about how to use and evaluate them is simply asking for this sort of abuse. This paper argues that we can bring the potential of this urgently needed CL application to fruition by looking outside CL, because in fact, the required evaluation methods already do exist. In the financial sector, event studies of the fluctuation in a publicly traded company's stock price are commonplace for determining the effect of earnings announcements, product placements, etc. We argue that the same method is suitable for analysing temporal sentiment variation in the light of policy-level, non-pharmaceutical interventions (NPIs). We provide a case study of Twitter sentiment towards policy-level NPIs in Canada. Our results confirm a generally positive connection between the announcements of NPIs and Twitter sentiment, and we document a promising correlation between the results of this study and a public-health survey of popular compliance with NPIs.

Poster #27 Partitioning FPGA-optimized Systolic Arrays for Fun and Profit
Presenter: Long Chan, University of Waterloo
Collaborators: Gurshaant Malik (University of Waterloo), and Nachiket Kapre (University of Waterloo)

We can improve the inference throughput of deep convolutional networks mapped to FPGA-optimized systolic arrays, at the expense of latency, with array partitioning and layer pipelining. Modern convolutional networks have a growing number of layers, such as the 58 separable layer GoogleNetv1, with varying compute, storage, and data movement requirements. At the same time, modern high-end FPGAs, such as the Xilinx UltraScale+ VU37P, can accommodate high-performance, 650 MHz, layouts of large 1920x9 systolic arrays. These can stay underutilized if the network layer requirements do not match the array size.
We formulate an optimization problem for improving array utilization and boosting inference throughput that determines how to partition the systolic array on the FPGA chip, and how to slice the network layers across the array partitions in a pipelined fashion. We adopt a two-phase approach where: 1) we identify layer assignment for each partition using an Evolutionary Strategy; and 2) we adopt a greedy-but-optimal approach for resource allocation to select the systolic array dimensions of each partition. When compared to state-of-the-art systolic architectures, we show throughput improvements in the range 1.3-1.5x and latency improvements in the range 0.5-1.8x against Multi-CLP and Xilinx SuperTile.