2021 Vector Institute Research Symposium - February 16th & 17th, 2021

Poster Sessions

Tuesday, February 16th, 2021
Day 1 Poster Presentations (11:40 AM — 1:15 PM)

Poster # | Poster Title | Presenter and Affiliation

1 | Learning Differential Equations that are Easy to Solve | Jacob Kelly, University of Toronto, Student of Vector Faculty Member
2 | Improving the Classification Parity of Machine Learning Models through Subgroup Thresholds Optimization (STO) | Cecilia Ying, Queen’s University, Student of Vector Faculty Affiliate
3 | Contrastive Learning for Sports Video: Unsupervised Player Classification | Maria Koshkina, York University, Student of Vector Faculty Affiliate
4 | Learning from Unexpected Events in the Neocortical Microcircuit | Jason Pina, York University, Student of Vector Faculty Affiliate
5 | Regularized Linear Autoencoders Recover the Principal Components, Eventually | Xuchan Bao, University of Toronto, Student of Vector Faculty Affiliate
6 | Dataset Inference: Ownership Resolution in Machine Learning | Mohammad Yaghini, University of Toronto, Student of Vector Faculty Affiliate
7 | A Transfer Learning Based Active Learning Framework for Brain Tumor Classification | Ernest Namdar, University of Toronto, Student of Vector Faculty Affiliate
8 | Can Future Wireless Networks Detect Fires? | David Radke, University of Waterloo, Student of Vector Faculty Affiliate
9 | Domain Adaptation and Self-Supervised Learning for Surgical Margin Detection | Alice Santilli, Queen’s University, Student of Vector Faculty Affiliate
10 | Experience Selection Using Dynamics Similarity for Efficient Multi-Source Transfer Learning Between Robots | SiQi Zhou, University of Toronto, Student of Vector Faculty Member
11 | Multi-agent Correlated Deep Q-learning for Microgrid Energy Management | Hao Zhou, University of Ottawa, Student of Vector Faculty Affiliate
12 | A Learning-based Algorithm to Quickly Compute Good Primal Solutions for Stochastic Integer Programs | Rahul Patel, University of Toronto, Student of Vector Faculty Affiliate
13 | Classification of Spinal Curvature from 3D Spine CT Images using a Convolutional Neural Network | Geoff Klein, University of Toronto, Student of Vector Faculty Affiliate
14 | A Computational Framework for Slang Generation | Zhewei Sun, University of Toronto, Student of Vector Faculty Affiliate
15 | Self-supervised Driven Consistency Training for Annotation Efficient Histopathology Image Analysis | Chetan Srinidhi, University of Toronto, Postdoctoral Fellow of Vector Faculty Affiliate
16 | Identifying and Interpreting Tuning Dimensions in Deep Networks | Nolan Dey, University of Waterloo, Student of Vector Faculty Member
17 | Neural Response Time Analysis: XAI Using Only a Stopwatch | Eric Taylor, Vector Institute Postdoctoral
18 | Solving First Passage Problems in Nanofluidic Devices with Deep Neural Networks | Martin Magill, Ontario Tech University, Vector Postgraduate Affiliate
19 | Adversarial Robustness through the Lens of Fourier Analysis | Avery Ma, University of Toronto, Student of Vector Faculty Member
20 | Understanding and Mitigating Exploding Inverses in Invertible Neural Networks | Paul Vicol, University of Toronto, Student of Vector Faculty Member
21 | Artificial Intelligence for Public Health: Priorities for Successful Use by Public Health Organizations | Jungtaek Kim, Pohang University of Science and Technology, Student of Vector Faculty Member
22 | Representation of Non-local Shape Information in Deep Neural Networks | Shaiyan Keshvari, York University, Postdoctoral Fellow of Vector Faculty
23 | Learning Permutation Invariant Representations Using Memory Networks | Mohammed Adnan, University of Waterloo, Vector Scholarship in AI Recipient
24 | Evaluation Metrics for Deep Learning Imputation Models in Healthcare and Finance | Omar Boursalie, McMaster University, Vector Postgraduate Affiliate
25 | Detecting fMRI-based Intrinsic Connectivity Networks using EEG alone | Saurabh Shaw, McMaster University, Vector Postgraduate Affiliate
26 | Understanding Public Sentiments about COVID-19 Non-Pharmaceutical Interventions through Event Studies | Jingcheng Niu, University of Toronto, Student of Vector Faculty Affiliate
27 | Partitioning FPGA-optimized Systolic Arrays for Fun and Profit | Long Chan, University of Waterloo, Student of Vector Faculty Affiliate
28 | Learning Personalized Models of Human Behavior in Chess | Reid McIlroy-Young, University of Toronto, Student of Vector Faculty Affiliate
29 | Learning a Universal Template for Few-shot Dataset Generalization | Eleni Triantafillou, University of Toronto, Student of Vector Faculty Member
30 | Towards Robustness in Deep Learning | Reza Samavi, Ryerson University, Vector Faculty Affiliate
31 | Siamese ResNet is capable of deconvolving the epigenome by learning from the transcriptome | Mehran Karimzadeh, Vector Institute Postdoctoral Fellow
32 | On the Sample Complexity of Privately Learning Unbounded High-Dimensional Gaussians | Ishaq Aden-Ali, McMaster University, Student of Vector Faculty Affiliate
33 | FAB: The French Absolute Beginner Corpus for Pronunciation Training | Sean Robertson, University of Toronto, Student of Vector Faculty Affiliate
34 | Writing Can Predict AI Papers Acceptance, but Not Their Impact | Zining Zhu, University of Toronto, Vector Scholarship in AI Recipient
35 | Autoregressive Models for Offline Policy Evaluation and Optimization | Michael Zhang, University of Toronto, Student of Vector Faculty Member
36 | Interpretable Sequence Classification via Discrete Optimization | Maayan Shvo, University of Toronto, Student of Vector Faculty Member
37 | Planning from Pixels using Inverse Dynamics Models | Keiran Paster, University of Toronto, Student of Vector Faculty Member
38 | The Act of Remembering: A Study in Partially Observable Reinforcement Learning | Toryn Klassen, Postdoctoral Fellow of Vector Faculty Member
39 | Building LEGO Using Deep Generative Models of Graphs | Rylee Thompson, University of Guelph, Student of Vector Faculty Member
40 | Evaluating Curriculum Learning Strategies in Neural Combinatorial Optimization | Michal Lisicki, University of Guelph, Student of Vector Faculty Affiliate
41 | Predicting Dreissenid Mussel Abundance in Nearshore Waters using Artificial Intelligence | Angus Galloway, University of Guelph, Student of Vector Faculty Member
42 | Exchanging Lessons Between Algorithmic Fairness and Domain Generalization | Elliot Creager, University of Toronto, Student of Vector Faculty Member

Wednesday, February 17th, 2021
Day 2 Poster Presentations (12:00 — 1:30 PM)

Poster # | Poster Title | Presenter and Affiliation

1 | Proportionally Fair Clustering Revisited | Evi Micha, University of Toronto, Student of Vector Faculty Affiliate
2 | Unsupervised Representation Learning for Time Series with Temporal Neighbourhood Coding | Sana Tonekaboni, University of Toronto, Student of Vector Faculty Member
3 | Controlled Online Optimization Learning | Kyle Mills, Ontario Tech University, Vector Postgraduate Affiliate
4 | How to Accelerate Attention-based Models Using GOBO | Ali Hadi Zadeh, University of Toronto, Student of Vector Faculty Affiliate
5 | An Analysis of Mortality in Ontario using Cremation Data: Rise in Cremations during the COVID-19 Pandemic | Gemma Postill, Western University, Student of Vector Faculty Affiliate
6 | Reinforcement Learning in Large, Structured Action Spaces: A Simulation Study of Decision Support for Spinal Cord Injury Rehabilitation | Nathan Phelps, Western University, Student of Vector Faculty Affiliate
7 | A Biologically-inspired Neural Implementation of Affect Control Theory | Aarti Malhotra, University of Waterloo, Student of Vector Faculty Affiliate
8 | CLAR: Contrastive Learning of Auditory Representations | Haider Al-Tahan, Western University, Student of Vector Faculty Affiliate
9 | Asynchronous Multi-view Simultaneous Localization and Mapping | Joyce Yang, University of Toronto, Student of Vector Faculty Member
10 | Learning Agent Representations for Ice Hockey | Guiliang Liu, University of Waterloo, Postdoctoral Fellow of Vector Faculty Member
11 | Inverse Reinforcement Learning for Team Sports: Valuing Actions and Players | Yudong Luo, University of Waterloo, Student of Vector Faculty Member
12 | Machine Learning Approaches to Piecewise Linear Interface Construction (PLIC) | Mohammadmehdi Ataei, University of Toronto, Vector Postgraduate Affiliate
13 | Swept Volumes: To be Continued | Silvia Sellán, University of Toronto, Student of Vector Faculty Affiliate
14 | Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks | Amin Rakhsha, University of Toronto, Student of Vector Faculty Member
15 | Are Wider Nets better Given the Same Number of Parameters? | Anna Golubeva, University of Waterloo, Vector Postgraduate Affiliate
16 | Automatic Whole Cell Segmentation for Multiplexed Images of Ovarian Cancer Tissue Section | Wenchao Han, University of Toronto, Student of Vector Faculty Affiliate
17 | Robotic Assessment of Stroke Patients using a Deep Learning Framework | Faranak Akbarifar, Queen’s University, Student of Vector Faculty Affiliate
18 | Reveal the Hidden Graph in RIEMS Mass Spectra: Margin Assessment in Cancer Surgery with Graph Neural Networks | Amoon Jamzad, Queen’s University, Postdoctoral Fellow of Vector Faculty Affiliate
19 | Configural Processing in Humans and Deep Convolutional Neural Networks | Xingye Fan, York University, Student of Vector Faculty Affiliate
20 | Auto-Tuning Structured Light by Optical Stochastic Gradient Descent | Parsa Mirdehghan, University of Toronto, Student of Vector Faculty Affiliate
22 | Online Bayesian Moment Matching based SAT Solver Heuristics | Haonan Duan, University of Waterloo, Student of Vector Faculty Member
23 | miRNA-Based Deep Learning for Cancer Classification | Emily Kaczmarek, Queen’s University, Vector Scholarship in AI Recipient
24 | Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP | John Chen, University of Toronto, Student of Vector Faculty Member
25 | Utilizing Voxel Deep Neural Networks in Orbital-free Density Functional Theory | Kevin Ryczko, University of Ottawa, Vector Postgraduate Affiliate
26 | Data Augmentation Using GANs for GAN Based Detection of Pneumonia and COVID-19 in X-ray Images | Saman Motamed, University of Toronto, Student of Vector Faculty Affiliate
27 | Fast Inverse Mapping of Face GANs | Nicky Bayat, Western University, Student of Vector Faculty Affiliate
28 | Vec2int: Applications of the Chinese Remainder Theorem in Word Embedding Compression and Arithmetic | Patricia Thaine, University of Toronto, Vector Postgraduate Affiliate
29 | Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes | Jake Snell, University of Toronto, Student of Vector Faculty Member
30 | Estimating Severity of Depression from Acoustic Features and Embeddings of Natural Speech | Sri Harsha Dumpala, Dalhousie University, Student of Vector Faculty Member
31 | Fall Risk Assessment in the Wild Using Egocentric Vision and Wearable IMU data | Mina Nouredanesh, University of Waterloo, Vector Postgraduate Affiliate
32 | No MCMC for me: Amortized Sampling for Fast and Stable Training of Energy-Based Models | Jacob Kelly, University of Toronto, Student of Vector Faculty Member
33 | A Modified AUC for Training Convolutional Neural Networks: Taking Confidence into Account | Ernest Namdar, University of Toronto, Student of Vector Faculty Affiliate
34 | Flexible Few-Shot Learning of Contextual Similarity | Mengye Ren, University of Toronto, Student of Vector Faculty Member
35 | Instance Selection for GANs | Terrance DeVries, University of Guelph, Student of Vector Faculty Member
36 | Explicit and Implicit Regularization in Overparameterized Least Squares Regression | Denny Wu, University of Toronto, Student of Vector Faculty Member
37 | AMINN: Autoencoder-based Multiple Instance Neural Network for Outcome Prediction of Multifocal Cancer | Jianan Chen, University of Toronto, Student of Vector Faculty Affiliate
38 | Folk Theories, Machine Learning, and XAI | Michael Ridley, Western University, Vector Postgraduate Affiliate
39 | Adaptive Gradient Quantization for Data-Parallel SGD | Fartash Faghri, University of Toronto, Student of Vector Faculty Member
40 | Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding | Yangjun Ruan & Daniel Severo, University of Toronto, Students of Vector Faculty
41 | Musical Speech: A Transformer-based Composition Tool | Jason d'Eon, Dalhousie University, Student of Vector Faculty Member
42 | EEG Source Space Analysis for Brain-computer Interfaces | Leila Mousapour, McMaster University, Student of Vector Faculty Affiliate

Tuesday, February 16th
Day One Poster Presentations

Poster #1
Learning Differential Equations that are Easy to Solve
Presenter: Jacob Kelly, University of Toronto
Collaborators: Jesse Bettencourt (University of Toronto, Vector Institute), Matthew James
Johnson (Google Brain), and David Duvenaud (University of Toronto, Vector Institute)

Neural ODEs become expensive to solve numerically as training progresses. We introduce a
differentiable surrogate for the time cost of standard numerical solvers using higher-order
derivatives of solution trajectories. These derivatives are efficient to compute with Taylor-mode
automatic differentiation. Optimizing this additional objective trades model performance against
the time cost of solving the learned dynamics.
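
One way to write such a surrogate is to penalize the squared norm of the K-th total time derivative of the solution and add it to the task loss. This is a sketch under assumptions: the dynamics dz/dt = f(z(t), t, θ) on [t_0, t_1], the task loss L, the order K, and the weight λ are illustrative symbols that do not appear in the abstract.

```latex
\mathcal{R}_K(\theta) = \int_{t_0}^{t_1} \left\| \frac{d^K z(t)}{dt^K} \right\|_2^2 \, dt,
\qquad
\min_{\theta} \; \mathcal{L}(\theta) + \lambda \, \mathcal{R}_K(\theta)
```

Under this reading, a larger λ favours dynamics that are cheaper to solve at some cost in task performance, which is the trade-off described above.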

Poster #2
Improving the Classification Parity of Machine Learning Models through Subgroup
Thresholds Optimization (STO)
Presenter: Cecilia Ying, Queen’s University

We introduce a new post-processing algorithm that reduces bias, and hence improves the
classification parity of machine learning model predictions, against any protected subgroup
without the need to: (a) preprocess the input training data; or (b) change the underlying machine
learning algorithm. Our algorithm, the Subgroup Threshold Optimizer (STO), optimizes the
classification thresholds for individual subgroups in order to minimize the overall discrimination
score between all subgroups. Because it works at the post-processing stage, it requires no
changes to the training data and is machine learning model-agnostic. We evaluated the
effectiveness of our algorithm in achieving both group fairness and subgroup fairness, and it
improved both by reducing the overall post-adjustment discrimination score.
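
A minimal sketch of per-subgroup threshold tuning follows. It is not the authors' STO implementation: the abstract does not define the discrimination score, so the gap in positive-prediction rate between subgroups is assumed here, and the function names, the threshold grid, and the synthetic scores are all hypothetical. The sketch also ignores accuracy, which a practical method would need to balance against parity.

```python
import numpy as np

def discrimination_score(y_pred, groups):
    """Assumed score: largest gap in positive-prediction rate between subgroups."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

def optimize_subgroup_thresholds(scores, groups, grid=None, sweeps=3):
    """Coordinate search that tunes one subgroup's threshold at a time."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # hypothetical candidate thresholds
    labels = np.unique(groups)
    thresholds = {g: 0.5 for g in labels}   # start from a single shared threshold

    def predict(th):
        cut = np.array([th[g] for g in groups])
        return (scores >= cut).astype(int)

    for _ in range(sweeps):
        for g in labels:
            best_t = thresholds[g]
            best_d = discrimination_score(predict(thresholds), groups)
            for t in grid:
                d = discrimination_score(predict({**thresholds, g: t}), groups)
                if d < best_d:              # accept only strict improvements
                    best_t, best_d = t, d
            thresholds[g] = best_t
    return thresholds, discrimination_score(predict(thresholds), groups)

# Synthetic model scores in which subgroup 1 receives systematically higher scores.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 500)
scores = np.clip(rng.normal(0.4 + 0.2 * groups, 0.15), 0.0, 1.0)

baseline = discrimination_score((scores >= 0.5).astype(int), groups)
thresholds, adjusted = optimize_subgroup_thresholds(scores, groups)
```

Because the search only touches the thresholds applied to existing scores, the trained model and its training data are left unchanged, in line with the post-processing setting described above.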

Poster #3
Contrastive Learning for Sports Video: Unsupervised Player Classification
Presenter: Maria Koshkina, York University
Collaborators: Hemanth Pidaparthy (York University) and James Elder (York University)

We address the problem of unsupervised classification of players in a team sport according to
their team affiliation, when jersey colours and design are not known a priori. We adopt a
contrastive learning approach in which an embedding network learns to maximize the distance
between representations of players on different teams relative to players on the same team, in a
purely unsupervised fashion, without any labelled data. We evaluate the approach using a new
hockey dataset and find that it outperforms prior unsupervised approaches by a substantial
margin, particularly for real-time application when only a small number of frames
are available for unsupervised learning before team assignments must be made. Remarkably,
we show that our contrastive method achieves 94% accuracy after unsupervised training on
only a single frame, with accuracy rising to 97% within 500 frames (17 seconds of game time).
We further demonstrate how accurate team classification allows accurate team-conditional heat
maps of player positioning to be computed.
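
The objective of pushing different-team embeddings apart relative to same-team embeddings can be illustrated with a triplet margin loss. This is a generic stand-in, not the loss used in the poster: the abstract does not name its contrastive objective or say how same-team and different-team pairs are formed without labels, so the function name, the margin, and the toy embeddings are assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss over rows: small when each anchor is closer to its
    positive (assumed same team) than to its negative (assumed other team)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Toy 2-D embeddings for a single triplet.
anchor = np.array([[0.0, 0.0]])
near = np.array([[0.0, 0.0]])
far = np.array([[3.0, 4.0]])

well_separated = triplet_loss(anchor, near, far)  # positive near, negative far
badly_separated = triplet_loss(anchor, far, near) # positive far, negative near
```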

Poster #4
Learning from Unexpected Events in the Neocortical Microcircuit
Presenter: Jason Pina, York University
Collaborators: Colleen Gillon (University of Toronto, Mila), Jerome Lecoq (Allen Institute for
Brain Science), Joel Zylberberg (Vector Institute, York University, CIFAR), and Blake Richards
(McGill University, Mila, CIFAR)

Scientists have long conjectured that the neocortex learns the structure of the environment in a
predictive, hierarchical manner. To do so, expected, predictable features are differentiated from
unexpected ones by comparing bottom-up and top-down streams of data. It is theorized that the
neocortex then changes the representation of incoming stimuli, guided by differences in the
responses to expected and unexpected events. Such differences in cortical responses have
been observed; however, it remains unknown whether these unexpected event signals govern
subsequent changes in the brain’s stimulus representations, and, thus, govern learning. Here,
we show that unexpected event signals predict subsequent changes in responses to expected
and unexpected stimuli in individual neurons and distal apical dendrites that are tracked over a
period of days. These findings were obtained by observing layer 2/3 and layer 5 pyramidal
neurons in primary visual cortex of awake, behaving mice using two-photon calcium imaging.
We found that many neurons in both layers 2/3 and 5 showed large differences between their
responses to expected and unexpected events. These unexpected event signals also
determined how the responses evolved over subsequent days, in a manner that was different
between the somata and distal apical dendrites. This difference between the somata and distal
apical dendrites may be important for hierarchical computation, given that these two
compartments tend to receive bottom-up and top-down information, respectively. Together, our
results provide novel evidence that the neocortex indeed instantiates a predictive hierarchical
model in which unexpected events drive learning.

Poster #5
Regularized Linear Autoencoders Recover the Principal Components, Eventually
Presenter: Xuchan Bao, University of Toronto
Collaborators: James Lucas, Sushant Sachdeva, and Roger Grosse (all with Vector Institute,
University of Toronto)

Our understanding of learning input-output relationships with neural nets has improved rapidly in
recent years, but little is known about the convergence of the underlying representations, even
in the simple case of linear autoencoders (LAEs). We show that when trained with proper
regularization, LAEs can directly learn the optimal representation: ordered, axis-aligned
principal components. We analyze two such regularization schemes: non-uniform ℓ2
regularization and a deterministic variant of nested dropout [Rippel et al., ICML 2014]. Though
both regularization schemes converge to the optimal representation, we show that this
convergence is slow due to ill-conditioning that worsens with increasing latent dimension. We
show that the inefficiency of learning the optimal representation is not inevitable: we present a
simple modification to the gradient descent update that greatly speeds up convergence.
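
A minimal sketch of a linear autoencoder trained with a non-uniform ℓ2 penalty is given below. The abstract does not spell out the objective, so the form used here (mean squared reconstruction error plus a separate penalty weight for each latent dimension on both the encoder and the decoder) is an assumption, as are the function name, the penalty values, and the synthetic data. It uses plain gradient descent, not the modified update mentioned above.

```python
import numpy as np

def lae_loss_and_grads(W1, W2, X, lam):
    """X: (m, n) data with samples as columns; W1: (k, m) encoder;
    W2: (m, k) decoder; lam: (k,) assumed per-latent-dimension penalties."""
    n = X.shape[1]
    R = W2 @ W1 @ X - X                               # reconstruction residual
    loss = ((R ** 2).sum() / n
            + (lam[:, None] * W1 ** 2).sum()          # penalty on encoder rows
            + (W2 ** 2 * lam[None, :]).sum())         # penalty on decoder columns
    g1 = (2.0 / n) * (W2.T @ R @ X.T) + 2.0 * lam[:, None] * W1
    g2 = (2.0 / n) * (R @ (W1 @ X).T) + 2.0 * W2 * lam[None, :]
    return loss, g1, g2

rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.25])         # synthetic per-feature scales
X = rng.normal(size=(5, 200)) * scales[:, None]
lam = np.array([0.1, 0.2])                            # distinct penalty per latent
W1 = 0.1 * rng.normal(size=(2, 5))
W2 = 0.1 * rng.normal(size=(5, 2))

initial_loss, _, _ = lae_loss_and_grads(W1, W2, X, lam)
for _ in range(500):                                  # plain gradient descent
    _, g1, g2 = lae_loss_and_grads(W1, W2, X, lam)
    W1 -= 0.01 * g1
    W2 -= 0.01 * g2
final_loss, _, _ = lae_loss_and_grads(W1, W2, X, lam)
```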

Poster #6
Dataset Inference: Ownership Resolution in Machine Learning
Presenter: Mohammad Yaghini, University of Toronto
Collaborators: Pratyush Maini (Indian Institute of Technology Delhi), and Nicolas Papernot
(Vector Institute, University of Toronto)

Upcoming Poster Spotlight at International Conference on Learning Representations
(ICLR) 2021

With increasingly more data and computation involved in their training, machine learning models
constitute valuable intellectual property. This has spurred interest in model stealing attacks,
which are made more practical by advances in learning with partial, little, or no supervision.
Existing defenses focus on inserting unique watermarks in the model's decision surface, but this
is insufficient: since the watermarks are not sampled from the training distribution, they are not
always preserved during model stealing. In this paper, we make the key observation that
knowledge contained in the stolen model's training set is what is common to all stolen copies.
The adversary's goal, irrespective of the attack employed, is always to extract this knowledge or
its by-products. This gives the original model's owner a strong advantage over the adversary:
model owners have access to the original training data. We thus introduce dataset inference,
the process of identifying whether a suspected model copy has private knowledge from the
original model's dataset, as a defense against model stealing. We develop an approach for
dataset inference that combines statistical testing with the ability to estimate the distance of
multiple data points to the decision boundary. Our experiments on CIFAR10 and CIFAR100
show that model owners can claim with confidence greater than 99% that their model (or
dataset, for that matter) was stolen, despite only exposing 50 of the stolen model's training
points. Dataset inference defends against state-of-the-art attacks, even when the adversary is
adaptive. Unlike prior work, it also does not require retraining or overfitting the defended model.
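
The statistical-testing step can be sketched as a comparison of boundary distances measured on the suspect model for the owner's private training points against distances for unseen public points. The abstract does not specify the test, so a one-sided permutation test is used here as a stand-in, and the distance values are synthetic. The function name, sample sizes, and distributions are assumptions.

```python
import numpy as np

def permutation_p_value(private_dist, public_dist, n_perm=5000, seed=0):
    """One-sided permutation test of whether private points sit farther
    from the decision boundary, on average, than public points."""
    rng = np.random.default_rng(seed)
    observed = private_dist.mean() - public_dist.mean()
    pooled = np.concatenate([private_dist, public_dist])
    n = len(private_dist)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)                # shuffle group membership
        if perm[:n].mean() - perm[n:].mean() >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)                 # add-one correction

# Synthetic distances for 50 private and 50 public points (illustrative only).
rng = np.random.default_rng(1)
private_dist = rng.normal(2.0, 0.5, size=50)
public_dist = rng.normal(1.0, 0.5, size=50)
p_value = permutation_p_value(private_dist, public_dist)
```

A small p-value would support the claim that the suspect model carries knowledge of the private training set. A large one would leave the ownership question unresolved.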

Poster #7
A Transfer Learning Based Active Learning Framework for Brain Tumor Classification
Presenter: Ernest Namdar, University of Toronto
Collaborators: Ruqian Hao (University of Electronic Science and Technology of China), Lin Liu
(University of Electronic Science and Technology of China), and Farzad Khalvati (University of

Brain tumors are one of the leading causes of cancer-related death globally among children and
adults. Precise classification of brain tumor grade (low-grade and high-grade glioma) at an early
stage plays a key role in successful prognosis and treatment planning. With recent advances in
deep learning, Artificial Intelligence-enabled brain tumor grading systems can assist radiologists
in the interpretation of medical images within seconds. The performance of deep learning
techniques is, however, highly dependent on the size of the annotated dataset. It is extremely
challenging to label a large quantity of medical images given the complexity and volume of
medical data. In this work, we propose a novel transfer learning based active learning
framework to reduce the annotation cost while maintaining stability and robustness of the model
performance for brain tumor classification. We employed a 2D slice-based approach to train and
finetune our model on the Magnetic Resonance Imaging (MRI) training dataset of 203 patients
and a validation dataset of 66 patients which was used as the baseline. With our proposed
method, the model achieved Area Under Receiver Operating Characteristic (ROC) Curve (AUC)
of 82.89% on a separate test dataset of 66 patients, which was 2.92% higher than the baseline
AUC while saving at least 40% of labeling cost. In order to further examine the robustness of
our method, we created a balanced dataset, which underwent the same procedure. The model
achieved AUC of 82% compared with AUC of 78.48% for the baseline, which supports the
robustness and stability of our proposed transfer learning augmented with active learning
framework while significantly reducing the size of training data.

Poster #8
Can Future Wireless Networks Detect Fires?
Presenter: David Radke, University of Waterloo
Collaborators: Omid Abari (UCLA), Tim Brecht (University of Waterloo), and Kate Larson
(University of Waterloo)

Oral at ACM International Conference on Systems for Energy-Efficient Built
Environments (BuildSys) 2021

Latencies, operating ranges, and false positive rates for existing indoor fire detection systems
like smoke detectors and sprinkler systems are far from ideal. This paper explores the use of
wireless radio frequency (RF) signals to detect indoor fires with low latency, through walls and
other occlusions. We build on past research focused on wireless sensing, and introduce RFire,
a system which uses millimeter wave technology and deep learning to extract instances of fire.
We perform line-of-sight (LoS) and occluded non-LoS experiments with fire at different
distances, and find that RFire achieves a best-result mean latency of 24 seconds when trained
and tested in multiple environments. RFire yields at least a 4 times improvement in mean alarm
latency over today's alarms.

Poster #9
Domain Adaptation and Self-supervised Learning for Surgical Margin Detection
Presenter: Alice Santilli, Queen’s University
Collaborators: Amoon Jamzad (Computing, Queen's University), Alireza Sedghi (Computing,
Queen's University), Martin Kaufmann (Surgery, Queen's University), Kathryn Logan
(Pathology, Queen's University), Julie Wallis (Pathology, Queen's University), Kevin Y.M Ren
(Pathology, Queen's University), Natasja Janssen (Computing, Queen's University), Shaila
Merchant (Surgery, Queen's University), Jay Engel (Surgery, Queen's University), Doug McKay
(Surgery, Queen's University), Sonal Varma (Pathology, Queen's University), Ami Wang
(Pathology, Queen's University), Gabor Fichtinger (Computing, Queen's University), John F.
Rudan (Surgery, Queen's University), and Parvin Mousavi (Computing, Queen's University)

Purpose: One in five women who undergo breast conserving surgery will need a second
revision surgery due to tumor tissue that has been left behind. The iKnife is a mass
spectrometry modality that produces real time margin information based on the signatures of
metabolites in surgical smoke. Using this modality and real-time tissue classification, surgeons
could remove all cancerous tissue during the initial surgery which would improve survival,
mental health and cosmetic outcomes for patients. An obstacle in developing the iKnife breast
cancer recognition model is the destructive, time consuming and sensitive nature of the data
collection that limits the size of the datasets.

Methods: We propose to address these obstacles by first building a self-supervised learning
model from limited, weakly-labeled data. By doing so, the model can learn to contextualize the
general features of iKnife data using a more accessible cancer tissue type. Second, the trained
model can then be applied to a cancer classification task on breast data. This domain adaptation
allows for the transfer of learnt weights from models of one tissue type to another.

Results: Our datasets contained 320 skin burns (129 tumor burns, 191 normal burns) from 51
patients and 144 breast tissue burns (41 tumor and 103 normal) from 11 patients. We investigate
the effect of different hyperparameters on the performance of the final classifier and show that
the proposed two-step configuration achieves an accuracy, sensitivity and specificity of 92%,
88% and 92%, respectively.

Conclusion: We showed that having a limited number of breast data samples for training a
classifier can be compensated for by self-supervised domain adaptation on a set of unlabelled
skin data.

Poster #10
Experience Selection Using Dynamics Similarity for Efficient Multi-source Transfer
Learning Between Robots
Presenter: SiQi Zhou, University of Toronto
Collaborators: Michael J. Sorocky (Vector Institute, University of Toronto), and Angela P.
Schoellig (Vector Institute, University of Toronto)

Oral at International Conference on Robotics and Automation (ICRA) 2020

In the robotics literature, different knowledge transfer approaches have been proposed to
leverage the experience from a source task or robot -- real or virtual -- to accelerate the learning
process on a new task or robot. A commonly made but infrequently examined assumption is
that incorporating experience from a source task or robot will be beneficial. In practice,
inappropriate knowledge transfer can result in negative transfer or unsafe behaviour. In this
work, inspired by a system gap metric from robust control theory, the nu-gap, we present a
data-efficient algorithm for estimating the similarity between pairs of robot systems. In a multi-
source inter-robot transfer learning setup, we show that this similarity metric allows us to predict
relative transfer performance and thus informatively select experiences from a source robot
before knowledge transfer. We demonstrate our approach with quadrotor experiments, where
we transfer an inverse dynamics model from a real or virtual source quadrotor to enhance the
tracking performance of a target quadrotor on arbitrary hand-drawn trajectories. We show that
selecting experiences based on the proposed similarity metric effectively facilitates the learning
of the target quadrotor, improving performance by 62% compared to a poorly selected

Poster #11
Multi-agent Correlated Deep Q-learning for Microgrid Energy Management
Presenter: Hao Zhou, University of Ottawa
Collaborators: Melike Erol-Kantarci (University of Ottawa)

Microgrid (MG) energy management is an important part of MG operation. Various entities are
generally involved in the energy management of an MG, e.g., energy storage system (ESS),
renewable energy resources (RER) and the load of users, and it is crucial to coordinate these
entities. The main contribution of this paper is that we propose a correlated deep Q-learning
(CDQN) method for the MG energy management, where each agent runs the DQN
independently, and the correlated equilibrium is used for coordination. Our simulation results
demonstrate the success of CDQN by having 40.9% and 9.62% higher profit for ESS agent and
PV agent, respectively.

Poster #12
A Learning-based Algorithm to Quickly Compute Good Primal Solutions for Stochastic
Integer Programs
Presenter: Rahul Patel, University of Toronto
Collaborators: Yoshua Bengio (Mila/University of Montreal), Andrea Lodi (Polytechnique
Montreal), Emma Frejinger (University of Montreal), and Sriram Sankaranarayanan
(Polytechnique Montreal)

We propose a novel approach using supervised learning to obtain near-optimal primal solutions
for two-stage stochastic integer programming (2SIP) problems with constraints in the first and
second stages. The goal of the algorithm is to predict a representative scenario (RS) for the
problem such that deterministically solving the 2SIP with the random realization equal to the
RS gives a near-optimal solution to the original 2SIP. Predicting an RS, instead of directly
predicting a solution, ensures first-stage feasibility of the solution. If the problem is known to
have complete recourse, second-stage feasibility is also guaranteed. For computational testing,
we learn to find an RS for a two-stage stochastic facility location problem with integer variables
and linear constraints in both stages and consistently provide near-optimal solutions. Our
computing times are very competitive with those of general-purpose integer programming
solvers to achieve a similar solution quality.

Poster #13
Classification of Spinal Curvature from 3D Spine CT Images using a Convolutional Neural
Network
Presenter: Geoff Klein, University of Toronto
Collaborators: Michael Hardisty (Sunnybrook Research Institute, University of Toronto), Isaac
Carreno (Sunnybrook Research Institute, University of Toronto), Joel Finkelstein (Sunnybrook
Research Institute, University of Toronto), Young Lee (University of Toronto), Arjun Sahgal
(Sunnybrook Research Institute, University of Toronto), Cari Whyne (Sunnybrook Research
Institute, University of Toronto), and Anne Martel (Sunnybrook Research Institute, University of

Introduction: Approximately two-thirds of cancer patients develop bone metastases, with the
spine being the most common location. Vertebral metastases can lead to biomechanical
instability, pain, and neurological compromise. Stereotactic body radiation therapy (SBRT)
delivers high-dose focal treatment to tumours and this treatment has been rapidly expanding
because of its effectiveness for local tumour control. A significant side effect of SBRT is
vertebral compression fracture, occurring in 10% to 40% of patients following SBRT. The broad
goal of this work is to build automated, quantitative tools to aid clinical decision making related
to mechanical stability and fracture risk in metastatically involved vertebrae. Spinal
malalignment (scoliotic deformity characterised as abnormal lateral spinal curvature) is one
measure that has been used in predicting vertebral fracture and progression following SBRT.
However, current evaluation of spinal malalignment can be time-consuming (requiring Cobb
angle measurement), with significant inter-observer variation. As such, an automated algorithm
to evaluate Cobb angle in 3D Computed Tomography (CT) scans was developed and applied to
patients with spinal metastasis treated with SBRT.
Methods: Using a 3D U-Net model, which determined a Gaussian heatmap for spine
localization, spline curves were calculated and projected into the coronal plane. Angles were
calculated from the gradient of the spline curves and a sliding window was used to determine
the median angle along the spline curve to determine the overall Cobb angle for the spine.
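The angle step described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the spine centreline has already been projected into the coronal plane and sampled as (x, z) points, and the function name, the window size, and the rule of taking the spread between windowed medians are all hypothetical choices made for the example. The 10° threshold is the one stated in the abstract.

```python
import numpy as np

SCOLIOSIS_THRESHOLD_DEG = 10.0  # threshold stated in the abstract


def cobb_angle_from_centreline(x, z, window=5):
    """Estimate a Cobb-like angle (degrees) from a coronal centreline.

    x: lateral position at each sample; z: cranio-caudal position.
    The window size and the max-minus-min rule are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    # Tilt of the curve relative to the vertical axis, from the gradient dx/dz.
    tilt = np.degrees(np.arctan(np.gradient(x, z)))
    # A sliding-window median suppresses local noise in the tilt profile.
    medians = np.array([
        np.median(tilt[i:i + window])
        for i in range(len(tilt) - window + 1)
    ])
    # Assumed rule: overall angle is the spread between the two most
    # oppositely tilted segments of the curve.
    return float(medians.max() - medians.min())


# Synthetic centrelines (not patient data): a straight and a curved spine.
z = np.linspace(0.0, 100.0, 101)
straight = cobb_angle_from_centreline(np.zeros_like(z), z)
curved = cobb_angle_from_centreline(10.0 * np.sin(np.pi * z / 100.0), z)
is_scoliotic = curved > SCOLIOSIS_THRESHOLD_DEG
```

On these synthetic curves the straight centreline gives an angle of zero, while the sinusoidal one exceeds the 10° threshold.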
Data: The VerSe 2019 dataset was used to train the 3D U-Net to predict Gaussian heatmaps
of the spine, with the target heatmaps generated from ground truth vertebral body centroids.
Augmentation during training was done through random isotropic resampling
(between 3.5- and 6-mm), and affine and deformation transformations. An in-house dataset
from diagnostic imaging (45 CT scans) was then used for parameter tuning to calculate Cobb
angles from the predicted heatmaps. The overall pipeline was then validated on an additional
spine dataset collected for SBRT treatment planning (63 CT scans). The in-house data was a
retrospective dataset of patients who underwent SBRT for spinal metastases, where dosimetry
calculations were done on the SBRT treatment planning images and diagnostic images were
follow-ups after treatment. The in-house data showed a mix of malalignment in terms of
existence and severity, surgical intervention (screws), metastases and vertebral fractures.
Malalignment frequently extended beyond the field-of-view of the scan and pelvic involvement
was common in the scans. Ground truth Cobb angles and scoliosis classifications for the
in-house datasets were determined by a Spine Fellow. Ground truth and predicted angles above
10° were classified as scoliotic.
Results: The model was able to predict scoliosis with an accuracy of 79.5% and 76.2% on the
diagnostic imaging and SBRT planning datasets, respectively. The mean ground truth and
predicted Cobb angles in the SBRT treatment planning were 8.8° ± 7.0° (ranging from 0.8° to
28.0°) and 9.5° ± 7.5° (ranging from 1.2° to 35.6°), respectively. The mean ground truth and
predicted Cobb angles in the diagnostic imaging dataset were 8.5° ± 6.7° (ranging from 0.2° to
28.6°) and 12.0° ± 12.6° (ranging from 0.4° to 51.4°), respectively.
Conclusion: A fully automated model was constructed to predict scoliotic spinal curvature in 3D
CT spine scans by evaluating the Cobb angle. Spinal curvature (scoliosis deformity) is a
contributing parameter for the SINS classification to determine instability. This algorithm can be
used in clinical decision making to aid in spinal curvature classification and scoliosis severity
assessment. Future work will focus on improving accuracy, expansion to kyphotic deformity,
and combining with other image features related to fracture risk.

Poster #14
A Computational Framework for Slang Generation
Presenter: Zhewei Sun, University of Toronto
Collaborators: Richard Zemel (Vector Institute, University of Toronto) and Yang Xu (Vector
Institute, University of Toronto)

Slang is a common type of informal language, but its flexible nature and paucity of data present
challenges for existing natural language systems. We take an initial step toward machine
generation of slang by developing a framework that models the speaker's word choice in slang
context. Our framework encodes novel slang meaning by relating the conventional and slang
senses of a word while incorporating syntactic and contextual knowledge in slang usage. We
construct the framework using a combination of probabilistic inference and neural contrastive
learning. We perform rigorous evaluations on three slang dictionaries and show that our
approach not only outperforms state-of-the-art language models, but it also better predicts the
historical emergence of slang word usages from the 1960s to the 2000s. We interpret the proposed
models and find that the contrastively learned semantic space is sensitive to the similarities
between slang and conventional senses of words. Our work creates opportunities for the
automated generation and interpretation of informal language.

Poster #15
Self-supervised Driven Consistency Training for Annotation Efficient Histopathology
Image Analysis
Presenter: Chetan Srinidhi, University of Toronto
Collaborators: Seung Wook Kim (Dept. CSE, University of Toronto), Fu-Der Chen (Dept. of
ECE, University of Toronto)

Training a neural network with a large labeled dataset is still a dominant paradigm in
computational histopathology. However, obtaining such exhaustive manual annotations is often
expensive, laborious, and prone to inter- and intra-observer variability. While recent
self-supervised and semi-supervised methods can alleviate this need by learning unsupervised
feature representations, they still struggle to generalize well to downstream tasks when the
number of labeled instances is small.
In this work, we overcome this challenge by leveraging both task-agnostic and task-specific
unlabeled data based on two novel strategies: i) a self-supervised pretext task that harnesses
the underlying multi-resolution contextual cues in histology whole-slide images to learn a
powerful supervisory signal for unsupervised representation learning; and ii) a new teacher-
student semi-supervised consistency paradigm that learns to effectively transfer the pretrained
representations to downstream tasks based on prediction consistency with the task-specific
unlabeled data.
We carry out extensive validation experiments on three histopathology benchmark datasets
across two classification and one regression based tasks, i.e., tumor metastasis detection,
tissue type classification, and tumor cellularity quantification. Under limited-label data, the
proposed method yields tangible improvements, performing close to or even better than other
state-of-the-art self-supervised and supervised baselines. Furthermore, we empirically show
that the idea of bootstrapping the self-supervised pretrained features is an effective way to
improve the task-specific semi-supervised learning on standard benchmarks. We also
show that our pretrained representations are more generic and agnostic to images trained with
different tissue types or organs and resolution protocols.

Poster #16
Identifying and Interpreting Tuning Dimensions in Deep Networks
Presenter: Nolan Dey, University of Waterloo
Collaborators: J. Eric Taylor (Vector Institute, University of Guelph), Bryan P. Tripp (University
of Waterloo), Alexander Wong (University of Waterloo), and Graham W. Taylor (Vector Institute,
University of Guelph)


Poster #17
Neural Response Time Analysis: XAI Using Only a Stopwatch
Presenter: Eric Taylor, Vector Institute
Collaborators: Shashank Shekhar (University of Guelph), and Graham Taylor (University of

Oral at the Conference on Computer Vision and Pattern Recognition (CVPR) 2020

How would you describe the features that a deep learning model composes if you were
restricted to measuring observable behaviours? Explainable artificial intelligence (XAI) methods
rely on privileged access to model architecture and parameters that is not always feasible for
most users, practitioners, and regulators. Inspired by cognitive psychology research on humans,
we present a case for measuring response times (RTs) of a forward pass using only the system
clock as a technique for XAI. Our method applies to the growing class of models that use input-
adaptive dynamic inference and we also extend our approach to standard models that are
converted to dynamic inference post hoc. The experimental logic is simple: If the researcher can
contrive a stimulus set where variability among input features is tightly controlled, differences in
response time for those inputs can be attributed to the way the model composes those features.

First, we show that RT is sensitive to difficult, complex features by comparing RTs from
ObjectNet and ImageNet. Next, we make specific a priori predictions about RT for abstract
features present in the SCEGRAM dataset, where object recognition in humans depends on
complex intra-scene object-object relationships. Finally, we show that RT profiles bear
specificity for class identity, and therefore the features that define classes. These results cast
light on the model's feature space without opening the black box.
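The stopwatch idea can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the authors' models or stimuli: `toy_dynamic_model` is a hypothetical input-adaptive network that exits early once enough "evidence" has accumulated, and `median_response_time` treats it as a black box, measuring only wall-clock time with the system clock.

```python
import statistics
import time

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)) / 8.0  # hypothetical layer weights


def toy_dynamic_model(x, max_layers=200):
    """Toy input-adaptive model: stops computing once evidence reaches 1."""
    h = x.copy()
    evidence = 0.0
    for _ in range(max_layers):
        h = np.tanh(W @ h)                    # one "layer" of work
        evidence += float(np.abs(x).mean())   # stronger inputs exit sooner
        if evidence >= 1.0:
            break
    return int(h[0] > 0)


def median_response_time(model, x, repeats=30):
    """Time repeated forward passes using only the system clock."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        model(x)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


easy = np.ones(64)          # strong input: exits after one layer
hard = np.full(64, 0.005)   # weak input: needs roughly 200 layers
rt_easy = median_response_time(toy_dynamic_model, easy)
rt_hard = median_response_time(toy_dynamic_model, hard)
```

Because the two inputs differ only in magnitude, the gap between `rt_hard` and `rt_easy` can be attributed to how the toy model processes that one controlled feature, which mirrors the experimental logic described above.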

Poster #18
Solving First Passage Problems in Nanofluidic Devices with Deep Neural Networks
Presenter: Martin Magill, Ontario Tech University
Collaborators: Andrew M. Nagel (Ontario Tech University), and Hendrick W. de Haan (Ontario
Tech University)

A major theme in the deep learning revolution has been the ability of deep models to overcome
the curse of dimensionality in a wide variety of settings. One such application is the solution of
high-dimensional partial differential equations (PDEs). PDEs are powerful mathematical tools
used throughout physics and the mathematical sciences. The computational costs of traditional
PDE solvers grow exponentially with problem dimension, so high-dimensional PDEs are
typically considered intractable. However, deep neural networks are rapidly opening up new
avenues of research in this area.
This work looks at the use of deep neural networks to solve PDEs from biophysics that describe
complex molecular motion. Specifically, the PDEs are formulated to understand the mean first
passage time of molecules passing through a microfluidic sorting device. Such devices are
designed with complex geometries to enable single-molecule detection, analysis, and
manipulation, enabling a variety of biotechnologies (e.g., personalized medicine). The use of
deep neural networks could enable faster and more efficient design of these complicated

Poster #19
Adversarial Robustness through the Lens of Fourier Analysis
Presenter: Avery Ma, University of Toronto
Collaborators: Simona Meng (University of Toronto), and Amir-massoud Farahmand (Vector
Institute, University of Toronto)

How is a robustified model different from a non-robustified one from the Fourier perspective?
Our work investigates the problem of adversarial robustness by empirically studying different
defense and attack approaches in the frequency domain. Motivated by the widely-used
assumption that natural images are primarily represented in low frequencies, we demonstrate in
a simple logistic regression setting that standard training focuses on optimizing low-frequency
components of the weights, making the model vulnerable to high-frequency adversarial
perturbations. In our preliminary results, we show that attenuating the high-frequency
components of the weights during training leads to improved adversarial robustness of the

Poster #20
Understanding and Mitigating Exploding Inverses in Invertible Neural Networks
Presenter: Paul Vicol, University of Toronto
Collaborators: Jens Behrmann (University of Bremen), Kuan-Chieh Wang (University of Toronto &
Vector Institute), Roger Grosse (University of Toronto & Vector Institute), and Jorn-Henrik Jacobsen
(University of Toronto & Vector Institute)

Invertible neural networks (INNs) have been used to design generative models, implement memory-
saving gradient computation, and solve inverse problems. In this work, we show that commonly-used
INN architectures suffer from exploding inverses and are thus prone to becoming numerically non-
invertible. Across a wide range of INN use-cases, we reveal failures including the non-applicability of
the change-of-variables formula on in- and out-of-distribution (OOD) data, incorrect gradients for
memory-saving backprop, and the inability to sample from normalizing flow models. We further derive
bi-Lipschitz properties of atomic building blocks of common architectures. These insights into the
stability of INNs then provide ways forward to remedy these failures. For tasks where local invertibility
is sufficient, like memory-saving backprop, we propose a flexible and efficient regularizer. For
problems where global invertibility is necessary, such as applying normalizing flows on OOD data, we
show the importance of designing stable INN building blocks.

Poster #21
Brick-by-Brick: Sequential 3D Object Construction with Deep Reinforcement Learning
Presenter: Jungtaek Kim, Pohang University of Science and Technology (POSTECH)
Collaborators: Hyunsoo Chung (POSTECH), Boris Knyazev (University of Guelph, Vector Institute),
Graham Taylor (University of Guelph, Vector Institute), Jinhwi Lee (POSTECH), Jaesik Park
(POSTECH), and Minsu Cho (POSTECH)

3D object construction is a challenging problem requiring understanding of objects' compositional
and relational structure. Humans solve this problem using their natural ability to imagine a
decomposition of a target object into its constituent parts and then sequentially building the
object part-by-part. Remarkably, to do so humans often do not rely on strong supervision specifying
the order in which, or where, to place each of the parts. Our method models human behavior by constructing an
object component-wise in a combinatorial manner. As the basis for learning, we utilize a
volumetric unit primitive as the building block of 3D objects. In this regime, we formulate a
reinforcement learning-based model without strong supervision of intermediate target object
information or building instructions. Our approach employs graph-structured inputs, where the
nodes and edges of the graph express the pose of primitives and the connection between them,
respectively. We introduce a reinforcement learning environment for construction based on
OpenAI Gym and demonstrate that our approach successfully learns to construct objects
within diverse evaluation scenarios conditioned on a single image or multiple views of a target
object, even when the target information of unseen categories is given.

Poster #22
Representation of Non-local Shape Information in Deep Neural Networks
Presenter: Shaiyan Keshvari, York University
Collaborators: Ingo Fründ and James Elder (York University)

It is uncertain how explicitly deep convolutional neural networks (DCNNs) represent shape.
While neurons in primate visual areas such as V4 and IT are known to be selective for global
shape, some studies suggest that DCNNs rely primarily on local texture cues. Here we employ
a set of novel shape stimuli to explicitly test for the representation of non-local shape
information in DCNNs.
We employ a set of animal silhouettes as well as matched controls generated by two distinct
generative models of shape. The first model generates silhouettes that are matched for local
curvature statistics, but are otherwise maximally random, containing no global regularities. The
second model generates sparse shape components that contain many of the global symmetries
seen in animal shapes but are otherwise not identifiable.
To assess the selectivity of DCNNs for non-local shape information, we train a linear classifier to
distinguish animal shapes from control shapes based on the activations in each layer. For both
AlexNet and VGG16, discriminability improved monotonically from early to late convolutional
layers, reaching 90-100% accuracy. These results show that DCNNs do represent non-local
shape information, that this information becomes more explicit in later layers, and that it goes
beyond simple global geometric regularities.
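The layer-wise linear probe described above can be sketched as follows. This is only an illustration under stated assumptions: the "activations" are synthetic Gaussian stand-ins rather than AlexNet or VGG16 features, the separation values are arbitrary, and the probe is a plain logistic regression trained by gradient descent; the function names are hypothetical.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def probe_accuracy(w, X, y):
    """Fraction of samples whose predicted class matches the label."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(((Xb @ w > 0).astype(int) == y).mean())


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)  # 1 = animal shape, 0 = control shape


def fake_activations(separation):
    """Synthetic layer activations; `separation` sets class separability."""
    direction = np.ones(16) / 4.0  # unit vector in 16 dimensions
    noise = rng.standard_normal((400, 16))
    return noise + separation * np.outer(2 * y - 1, direction)


# Assumed values: weak separability early, strong separability late.
layers = {"early": fake_activations(0.2), "late": fake_activations(3.0)}
accuracy = {}
for name, X in layers.items():
    w = train_linear_probe(X[:300], y[:300])          # train split
    accuracy[name] = probe_accuracy(w, X[300:], y[300:])  # held-out split
```

Comparing `accuracy` across layers is the measurement of interest: a probe that succeeds only on later layers indicates that the information is linearly decodable there but not earlier.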

Poster #23
Learning Permutation Invariant Representations using Memory Networks
Presenter: Mohammed Adnan, University of Waterloo
Collaborators: Shivam Kalra (KIMIA Lab, University of Waterloo), Graham Taylor (University of
Guelph, Vector Institute), and H.R. Tizhoosh (KIMIA Lab, University of Waterloo)

Many real-world tasks such as classification of digital histopathology images and 3D object
detection involve learning from a set of instances. In these cases, only a group of instances, or
set, collectively contains meaningful information, and therefore only the sets have labels, and
not individual data instances. In this work, we present a permutation invariant neural network
called Memory-based Exchangeable Model (MEM) for learning set functions. The MEM model
consists of memory units that embed an input sequence to high-level features enabling the
model to learn inter-dependencies among instances through a self-attention mechanism. We
evaluated the learning ability of MEM on various toy datasets, point cloud classification, and
classification of lung whole slide images (WSIs) into two subtypes of lung cancer: lung
adenocarcinoma and lung squamous cell carcinoma. We systematically extracted patches
from lung WSIs downloaded from The Cancer Genome Atlas (TCGA) dataset, the largest
public repository of WSIs, achieving a competitive accuracy of 84.84% for classification of two
subtypes of lung cancer. The results on other datasets are promising as well, and demonstrate
the efficacy of our model.
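To make the notion of a permutation invariant set function concrete, here is a minimal sketch. It is not the MEM architecture: it assumes a single self-attention layer with random, hypothetical weights followed by mean pooling, which is one simple way to obtain an output that does not depend on the order of the instances.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # feature dimension (arbitrary choice for the example)
# Hypothetical projection weights for queries, keys, and values.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def set_embedding(X):
    """Embed a set of instances (rows of X) into a single vector."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))  # each instance attends to all others
    attended = attn @ V                   # permutation-equivariant output
    return attended.mean(axis=0)          # pooling makes it invariant


X = rng.standard_normal((5, d))           # a set of 5 instances
perm = rng.permutation(5)
same = bool(np.allclose(set_embedding(X), set_embedding(X[perm])))
```

Self-attention alone only reorders its outputs when the inputs are reordered; it is the order-independent pooling step that yields one embedding per set.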

Poster #24
Evaluation Metrics for Deep Learning Imputation Models in Healthcare and Finance
Presenter: Omar Boursalie, McMaster University
Collaborators: Reza Samavi (Ryerson University, Vector Institute) and Thomas E. Doyle
(McMaster University, Vector Institute)

Oral at AAAI Conference on Artificial Intelligence 2021

There is growing interest in imputing missing data in tabular datasets using deep learning. A
commonly used metric in evaluating the performance of a deep learning-based imputation
model is root mean square error (RMSE), which is a prediction evaluation metric. In this study,
we demonstrate the limitations of RMSE for evaluating deep learning-based imputation
performance by conducting a comparative analysis between RMSE and alternative metrics from
the statistical literature, including qualitative assessment, predictive accuracy, and statistical distance. To
minimize model and dataset biases, we use two different deep learning imputation models
(denoising autoencoders and generative adversarial nets) and a regression imputation model.
We also use two tabular datasets with growing amounts of missing data from different industry
sectors: healthcare and financial. Our results show that, in contrast to the commonly used RMSE
metric, the statistical metric of Jensen-Shannon distance best assessed the imputation models'
performance. The regression model also ranked higher than deep learning when evaluated
using the Jensen-Shannon metric. This study was presented at the 5th International Workshop
on Health Intelligence (W3PHIAI-21) co-located with the 35th AAAI Conference on AI. The
paper will appear in Studies in Computational Intelligence (SCI).
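A small synthetic example shows why RMSE and a distributional metric can rank imputations differently. This sketch does not reproduce the study's models, datasets, or results: the data are random draws, the two "imputers" are deliberately simplistic, and the histogram-based Jensen-Shannon distance (with an assumed 20 bins) is one of several ways to estimate it.

```python
import numpy as np


def rmse(true, imputed):
    """Root mean square error between true and imputed values."""
    diff = np.asarray(true) - np.asarray(imputed)
    return float(np.sqrt(np.mean(diff ** 2)))


def js_distance(a, b, bins=20):
    """Jensen-Shannon distance (base 2) between histograms of a and b."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(u, v):
        mask = u > 0  # terms with u == 0 contribute nothing
        return np.sum(u[mask] * np.log2(u[mask] / v[mask]))

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(divergence, 0.0)))


rng = np.random.default_rng(0)
true_values = rng.standard_normal(1000)           # the "missing" entries
mean_imputed = np.full(1000, true_values.mean())  # impute with a constant
draw_imputed = rng.standard_normal(1000)          # impute with random draws

scores = {
    "mean": (rmse(true_values, mean_imputed), js_distance(true_values, mean_imputed)),
    "draw": (rmse(true_values, draw_imputed), js_distance(true_values, draw_imputed)),
}
```

Here constant imputation scores better on RMSE, yet it collapses the distribution into a single value, so its Jensen-Shannon distance to the true values is much larger than that of the random draws.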

Poster #25
Detecting fMRI-based Intrinsic Connectivity Networks using EEG alone
Presenter: Saurabh Shaw, McMaster University
Collaborators: Margaret McKinnon (St. Joseph's, McMaster University, Homewood Research
Institute), Jennifer Heisz (McMaster University), Amabilis Harrison (Hamilton Health Sciences),
John Connolly (Vector Institute, McMaster University), and Suzanna Becker (Vector Institute,
McMaster University)

Dysfunctional intrinsic connectivity network (ICN) dynamics have been discovered in a number
of psychopathologies. However, despite their potential use as biomarkers for clinical applications,
major barriers have prevented their widespread adoption. These include high operational costs
and low temporal resolution of fMRI, the most commonly used modality for this purpose. This
study addresses this shortcoming by developing a machine learning pipeline capable of tracking
ICNs using a cheaper and more widely accessible modality such as EEG. EEG-based features
of three cognitively-relevant ICNs were found using feature engineering on simultaneous EEG-
fMRI data. These features were used to train three classifiers, emulating different scenarios of
data availability. The highest test-set classification accuracies of 97% were achieved using fully
supervised classifiers that were trained on both EEG and fMRI data from the same participant.
On the other hand, classification accuracies of 60% were achieved using traditional leave-one-
subject-out cross validation on the EEG data only, and were boosted up to 75% by utilizing
semi-supervised learning. In conclusion, this study validates a machine learning framework to
detect ICN activation using EEG data alone, improving the feasibility of using brain network-
based biomarkers in clinical applications.
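Leave-one-subject-out cross validation, as mentioned above, holds out all samples from one participant per fold. A minimal sketch of the splitting logic follows; the function name and the subject labels are hypothetical, and no EEG features or classifiers are modelled here.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per fold."""
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test = np.flatnonzero(subject_ids == held_out)
        train = np.flatnonzero(subject_ids != held_out)
        yield held_out, train, test


# One subject label per sample (illustrative values only).
subjects = np.array(["s1", "s1", "s2", "s2", "s2", "s3"])
folds = list(leave_one_subject_out(subjects))
```

Because no samples from the held-out participant appear in the training split, the resulting accuracy estimates how well a classifier transfers to a person it has never seen.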

Poster #26
Understanding Public Sentiments about COVID-19 Non-pharmaceutical Interventions
through Event Studies
Presenter: Jingcheng Niu, University of Toronto
Collaborators: Gerald Penn (University of Toronto), Victoria Ng (Public Health Agency of
Canada), and Erin E. Rees (Public Health Agency of Canada)

Attributing shifts in social media sentiment to real-world events is now an important aspect of
public policy. Especially as the whole world is combating the COVID-19 pandemic, a better
understanding of the public's opinion of their government's responses is crucial for balancing the
demand for public health resources against the potential for economic devastation and the
public's own compliance with draconian measures. Early publications about public sentiment
towards interventions against SARS-CoV-2 transmission, especially those not in CL
conferences and journals, have already drawn some highly suspect conclusions because they
lack a method for properly attributing sentiment changes to events. As yet, they have no ability
to distinguish the influence of various events across time, no possibility of conducting
significance tests, and no coherent model for predicting the public's opinion of future events of
the same sort. Dealing in sentiment analysis components without providing some clear, task-
specific guidance about how to use and evaluate them is simply asking for this sort of abuse.
This paper argues that we can bring the potential of this urgently needed CL application to
fruition by looking outside CL, because in fact, the required evaluation methods already do exist.
In the financial sector, event studies of the fluctuation in a publicly traded company's stock price
are commonplace for determining the effect of earnings announcements, product placements,
etc. We argue that the same method is suitable for analysing temporal sentiment variation in the
light of policy-level, non-pharmaceutical interventions (NPIs). We provide a case study of Twitter
sentiment towards policy-level NPIs in Canada. Our results confirm a generally positive
connection between the announcements of NPIs and Twitter sentiment, and we document a
promising correlation between the results of this study and a public-health survey of popular
compliance with NPIs.
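The event-study logic borrowed from finance can be sketched on a daily sentiment series. This is an illustration under stated assumptions, not the paper's method or data: it uses a constant-mean model for expected sentiment, assumes independent daily noise for the standard error, and runs on a synthetic series; the function name, window lengths, and effect size are hypothetical.

```python
import numpy as np


def event_study(series, event_idx, est_len=30, event_len=5):
    """Return the cumulative abnormal sentiment and its t-statistic.

    Expected sentiment is the mean over the estimation window that
    immediately precedes the event (a constant-mean model).
    """
    series = np.asarray(series, dtype=float)
    estimation = series[event_idx - est_len:event_idx]
    window = series[event_idx:event_idx + event_len]
    abnormal = window - estimation.mean()
    cumulative = abnormal.sum()
    # Standard error of the sum, assuming i.i.d. daily noise.
    std_err = estimation.std(ddof=1) * np.sqrt(event_len)
    return float(cumulative), float(cumulative / std_err)


# Synthetic daily sentiment with a shift after an announcement on day 40.
rng = np.random.default_rng(0)
sentiment = rng.normal(0.0, 0.1, size=60)
sentiment[40:] += 0.3
car, t_stat = event_study(sentiment, event_idx=40)
```

The t-statistic is what allows a significance test on the attribution: a large value indicates that the post-announcement shift is unlikely under the pre-event baseline.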

Poster #27
Partitioning FPGA-optimized Systolic Arrays for Fun and Profit
Presenter: Long Chan, University of Waterloo
Collaborators: Gurshaant Malik (University of Waterloo), and Nachiket Kapre (University of

We can improve the inference throughput of deep convolutional networks mapped to FPGA-
optimized systolic arrays, at the expense of latency, with array partitioning and layer pipelining.
Modern convolutional networks have a growing number of layers, such as the 58 separable
layer GoogleNetv1, with varying compute, storage, and data movement requirements. At the
same time, modern high-end FPGAs, such as the Xilinx UltraScale+ VU37P, can accommodate
high-performance, 650 MHz, layouts of large 1920x9 systolic arrays. These can stay
underutilized if the network layer requirements do not match the array size. We formulate an
optimization problem, for improving array utilization, and boosting inference throughput, that
determines how to partition the systolic array on the FPGA chip, and how to slice the network
layers across the array partitions in a pipelined fashion. We adopt a two-phase approach where:
1) we identify layer assignment for each partition using an Evolutionary Strategy; and 2) we
adopt a greedy-but-optimal approach for resource allocation to select the systolic array
dimensions of each partition. When compared to state-of-the-art systolic architectures, we show
throughput improvements in the range 1.3-1.5x and latency improvements in the range 0.5-1.8x
against Multi-CLP and Xilinx SuperTile.
