Journal of Low Power Electronics and Applications

Article
CondenseNeXtV2: Light-Weight Modern Image Classifier
Utilizing Self-Querying Augmentation Policies
Priyank Kalgaonkar and Mohamed El-Sharkawy *

 Department of Electrical and Computer Engineering, Purdue School of Engineering and Technology,
 Indianapolis, IN 46254, USA; pkalgaon@purdue.edu
 * Correspondence: melshark@purdue.edu

Abstract: Artificial Intelligence (AI) combines computer science and robust datasets to mimic natural intelligence demonstrated by human beings to aid in problem-solving and decision-making involving consciousness up to a certain extent. From Apple's virtual personal assistant, Siri, to Tesla's self-driving cars, research and development in the field of AI is progressing rapidly, along with privacy concerns surrounding the usage and storage of user data on external servers, which has further fueled the need for modern ultra-efficient AI networks and algorithms. The scope of the work presented within this paper focuses on introducing a modern image classifier which is a light-weight and ultra-efficient CNN intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. This work is an extension of the award-winning paper entitled 'CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems' published at the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). The proposed neural network, dubbed CondenseNeXtV2, utilizes a new self-querying augmentation policy technique on the target dataset along with adaptation to the latest version of the PyTorch framework and activation functions, resulting in improved efficiency in image classification computation and accuracy. Finally, we deploy the trained weights of CondenseNeXtV2 on NXP BlueBox, an edge device designed to serve as a development platform for self-driving cars, and conclusions will be extrapolated accordingly.

Keywords: CondenseNeXt; convolutional neural network; computer vision; embedded systems; edge devices; image classification; CNN; PyTorch

Citation: Kalgaonkar, P.; El-Sharkawy, M. CondenseNeXtV2: Light-Weight Modern Image Classifier Utilizing Self-Querying Augmentation Policies. J. Low Power Electron. Appl. 2022, 12, 8. https://doi.org/10.3390/jlpea12010008

Academic Editor: Andrea Calimera
Received: 9 December 2021; Accepted: 21 January 2022; Published: 3 February 2022

1. Introduction

Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN), which is a subset of Machine Learning (ML), designed to realize and harness the power of Artificial Intelligence (AI) to simulate natural intelligence demonstrated by living creatures of the Kingdom Animalia. The first CNN algorithm was introduced by Alexey G. Ivakhnenko and V. G. Lapa in 1967 for supervised, deep, feed-forward, multi-layer perceptrons [1]. By the year 2012, data and computation intensive CNN architectures began dominating accuracy benchmarks. On 30 September 2012, the AlexNet CNN architecture, developed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton [2,3], won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [4], which caught the world's attention, eventually becoming the benchmark for newer CNNs utilizing GPUs to accelerate deep learning [5].

In recent years, CNNs have become a popular cornerstone in the field of computer vision research and development, a multidisciplinary field of computer science that focuses on developing innovative techniques and algorithms to enable machines to perform complex tasks such as image classification and object detection, analogous to a human visual system, for real-world applications such as advanced surveillance systems for malicious activity recognition [6], self-driving cars, AI robots, and Unmanned Aerial Vehicles (UAV) such as drones [7].
In this paper, we propose a modern image classifier which is light-weight and ultra-efficient and is intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. This work is an extension of an award-winning paper entitled 'CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems' published at the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC) [8]. The proposed neural network, dubbed CondenseNeXtV2, utilizes a new self-querying augmentation policy technique on a given target dataset along with adaptation to the latest version of the PyTorch framework at the time of writing this paper, resulting in an improved top-1 accuracy in image classification performance, which we shall observe through experiments by deploying the trained weights of CondenseNeXtV2 for real-time image classification on the NXP BlueBox development platform [9].

2. Background and Literature Review
2.1. Artificial Neural Networks

Artificial Neural Networks (ANN) take inspiration from the human brain. A human brain is a complex and non-linear processing system capable of retraining and reorganizing its crucial structural components, known as neurons, to perform complex operations such as image classification and object detection much faster than any general-purpose computer in existence today. ANNs are developed using a common programming language, trained, and then deployed on computer hardware capable of executing objective-specific tasks, or deployed in simulation on a computer. ANNs function in a similar way by:
1. Extracting knowledge using neurons and,
2. Storing the extracted information with the help of inter-neuron connection strengths known as synaptic weights.

This process of learning by reorganizing synaptic weights in an artificial neural network is called a learning algorithm. Figure 1 below provides a visual representation of a nonlinear model of a neuron in an ANN.

Figure 1. A visual representation of a nonlinear model of a neuron in an ANN.

In Figure 1, synaptic weights are given as inputs to the neurons of an ANN and may have positive or negative values. A summing junction functions as an adder to linearly combine all inputs with respect to the corresponding synaptic weights of the neuron, and a continuously differentiable linear or non-linear activation function such as Sigmoid, Tanh, ReLU, etc. is used to decide whether the neuron should be activated or not.
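As a minimal sketch of this nonlinear neuron model (the input values, weights, and the choice of a Sigmoid activation below are illustrative assumptions, not values taken from Figure 1), the summing junction followed by an activation function can be written in a few lines of Python:

```python
import numpy as np

def neuron(inputs, synaptic_weights, bias):
    """Nonlinear model of a single neuron: summing junction + activation."""
    # Summing junction: linearly combine the inputs with their synaptic weights.
    v = np.dot(synaptic_weights, inputs) + bias
    # Continuously differentiable activation function (Sigmoid shown here).
    return 1.0 / (1.0 + np.exp(-v))

# Synaptic weights may take positive or negative values.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, -0.4, 0.1])
print(neuron(x, w, bias=0.1))  # activation of the neuron for this input
```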
2.2. Representation Learning in Multi-Layer Feed-Forward Neural Networks

Multi-layer feed-forward neural networks are one of the most popular and widely used types of artificial neural network architectures, in which the neurons make use of a learning algorithm to train on a target dataset.

Multi-layer feed-forward neural networks can have one or more hidden layers, which determines the depth of a neural network. Here, the expression hidden signifies that a particular part of an ANN is not directly accessible to the input or output nodes of the ANN. Therefore, as the depth increases, the ANN will possess even more synaptic connections between neurons to extract meaningful information and adjust its weights. Such a network architecture is also commonly known as a multilayer perceptron, which utilizes a backpropagation algorithm during the training phase.

The work proposed within this paper is based on fundamental principles of multi-layer feed-forward neural networks for supervised learning in the OpenCV realm. In supervised machine learning, the mapping between input variables (x) and output variables (y) is learned. It infers a function from a labeled dataset known as ground truth. The idea is to train and learn hundreds of classes from a variety of different images containing noise and irregularities, which can be a challenge for traditional image classification algorithms. Figure 2 below provides a visual representation of these images. The work presented within this paper utilizes the AutoAugment [10] data augmentation technique to overcome this issue, as discussed in the following sections of this paper.

Figure 2. Sample of 32 × 32 resolution images containing noise and other irregularities.

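As an illustrative sketch of such a multi-layer feed-forward network (the hidden-layer width and the use of a single hidden layer are arbitrary assumptions, not the architecture proposed in this paper), a PyTorch module for 32 × 32 RGB inputs can be written as follows; during training, PyTorch's autograd performs the backpropagation step described above:

```python
import torch
import torch.nn as nn

class FeedForwardNet(nn.Module):
    """A small multi-layer feed-forward network (multilayer perceptron)."""
    def __init__(self, in_features=3 * 32 * 32, hidden=128, num_classes=10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),                    # 32 x 32 RGB image -> flat vector
            nn.Linear(in_features, hidden),  # input nodes -> hidden layer
            nn.ReLU(),                       # non-linear activation
            nn.Linear(hidden, num_classes),  # hidden layer -> output nodes
        )

    def forward(self, x):
        return self.layers(x)

model = FeedForwardNet()
logits = model(torch.randn(4, 3, 32, 32))  # a batch of four 32 x 32 RGB images
print(logits.shape)                        # torch.Size([4, 10])
```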
To address the challenges of performance and efficiency of deep neural network architectures, in 2016, a team of researchers and engineers at the Facebook AI Research (FAIR) lab introduced PyTorch and released it under the Modified BSD license for fair use [11]. PyTorch is a scientific computing framework and an open-source machine learning library built upon the Torch library, which is also a scientific computing framework and an open-source machine learning library, introduced in 2002 [12]. Some of the important features of PyTorch include:
1. Tensor computation accelerated by GPU(s).
2. Automatic differentiation using tape-based autograd.

PyTorch is a popular choice of deep learning framework for commercial applications such as autonomous vehicles, AI applications, etc. Tesla Motors, a leader in the autonomous (self-driving) vehicle industry, utilizes PyTorch for its famous Autopilot system, a suite of SAE Level 2 Advanced Driver-Assistance System (ADAS) features, by utilizing advanced computing hardware for deep learning, computer vision and sensor fusion [13]. The proposed CNN within this paper is based on the latest stable release of PyTorch version 1.10.0 and CUDA toolkit version 10.2 [14].
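The second feature listed above, tape-based automatic differentiation, can be illustrated with a short sketch (the differentiated function below is an arbitrary example):

```python
import torch

# Autograd records every operation on tensors with requires_grad=True onto a
# "tape" and replays the tape backwards to compute gradients automatically.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # reverse-mode differentiation over the recorded tape
print(x.grad)        # tensor([4., 6.]), i.e., dy/dx = 2x
```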

 2.3. Evolution of CondenseNeXt

Xie et al. introduced the ResNeXt [15] CNN architecture, which utilizes an innovative technique to skip one or more layers, known as identity shortcut connections. This allowed the ResNeXt CNN to train on large amounts of data with unparalleled efficiency at the time.

In the following year, in 2017, Huang et al. introduced the DenseNet [16] CNN architecture, which utilizes a novel technique to connect each convolutional layer to the succeeding layers in a feed-forward fashion. This facilitates the reuse of synaptic weights by providing the outputs of the current layer as inputs to every succeeding layer, resulting in extraction of information at different levels of coarseness. DenseNet provides the baseline for the CondenseNet CNN architecture.

In 2018, Huang et al. introduced CondenseNet [17], which was an improvement over the DenseNet CNN. In this CNN architecture, the authors proposed pruning several layers and blocks and replacing standard convolutions with group convolutions (G). Furthermore, they proposed a technique to learn mapping patterns between channels and groups by connecting layers from different groups to each other directly, thus preventing individual groups of channels from training separately. The authors collectively called this technique learned group convolutions. Figure 3 below provides a visual comparison of dense blocks in DenseNet vs. CondenseNet CNN.

Figure 3. Comparison of dense blocks in DenseNet vs. CondenseNet CNN.

CondenseNet performs a 1 × 1 standard group convolution. The major drawback of implementing such a convolution is that it does not parse through each channel's spatial dimensions and therefore misses out on fine details, leading to a loss in efficiency and overall accuracy of the network. To address these issues, Kalgaonkar and El-Sharkawy introduced CondenseNeXt [8] in 2021. CondenseNeXt refers to the next dimension of cardinality. In this CNN architecture, the authors propose replacing standard grouped convolutions with depthwise separable convolutions [18], which allows the network to extract information by transforming an image only once and then elongating this transformed image to n number of channels, instead of simply transforming the image n number of times, resulting in a boost in computational efficiency and a fewer number of Floating-Point Operations (FLOPs) at the training stage.
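The efficiency argument can be made concrete with a small sketch comparing learnable parameter counts of a standard convolution, a group convolution, and a depthwise separable convolution in PyTorch (the channel sizes and group count below are arbitrary assumptions chosen for illustration):

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

cin, cout, k = 64, 128, 3
standard = nn.Conv2d(cin, cout, k, padding=1)           # standard convolution
grouped = nn.Conv2d(cin, cout, k, padding=1, groups=4)  # group convolution, G = 4
depthwise_separable = nn.Sequential(
    nn.Conv2d(cin, cin, k, padding=1, groups=cin),  # depthwise: one filter per channel
    nn.Conv2d(cin, cout, kernel_size=1),            # pointwise: 1 x 1 combination
)

print(n_params(standard))             # 73856
print(n_params(grouped))              # 18560
print(n_params(depthwise_separable))  # 8960
```

The single spatial (depthwise) transformation followed by a cheap 1 × 1 elongation is what yields the reduction in FLOPs and parameters noted above.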

 3. CondenseNeXtV2 Architecture
In this section, a new CNN architecture, called CondenseNeXtV2, is being proposed. CondenseNeXtV2 is inspired by and is an improvement over the CondenseNeXt CNN. The primary goal of the CondenseNeXtV2 CNN is to further improve performance and top-1 accuracy of the network. This section describes the architecture in detail.

3.1. Convolutional Layers

CondenseNeXtV2 utilizes depthwise separable convolution, which enhances the efficiency of the network without having to transform an image over and over again, resulting in a reduction in the number of FLOPs and an increase in computational efficiency in comparison to standard group convolutions. Depthwise separable convolution is comprised of two distinct layers:

• Depthwise convolution: This layer acts like a filtering layer which applies a depthwise convolution to a single input channel. Consider an image of size $X \times X \times I$ given as an input, with $I$ number of input channels and kernels ($\hat{K}$) of size $K \times K \times 1$; the output of this depthwise convolution operation will be $Y \times Y \times I$, resulting in a reduction of spatial dimensions whilst preserving the $I$ number of channels (depth) of the input image. The cost of computation for this operation can be mathematically represented as follows:

$$\hat{Y}_{k,l,m} = \sum_{i,j} \hat{K}_{i,j,m} \ast X_{k+i-1,\, l+j-1,\, m} \tag{1}$$

• Pointwise convolution: This layer acts like a combining layer which performs a linear combination of all outputs generated by the previous layer, since the depth of the input image from the previous operation has not changed. The pointwise convolution operation is performed using a 1 × 1 kernel, i.e., $1 \times 1 \times K$, resulting in an output image of size $Y \times Y \times K$. The cost of computation for this operation can be mathematically represented as follows:

$$Y_{k,l,n} = \sum_{m} \tilde{K}_{m,n} \ast \hat{Y}_{k-1,\, l-1,\, m} \tag{2}$$
A depthwise separable convolution splits a kernel into two distinct layers for filtering and combining operations, resulting in an improvement in overall computational efficiency. Figure 4 provides a 3D visual representation of the depthwise separable convolution operation, where a single image is transformed only once and then this transformed image is elongated to over 128 channels. This allows the CNN to extract information at different levels of coarseness. For example, consider the green channel from the RGB channels being processed. Depthwise separable convolution will extract different shades of the color green from this channel.

Figure 4. A 3D visual representation of depthwise separable convolution for a given example.

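A minimal PyTorch sketch of the two-stage operation in Equations (1) and (2) is given below; the input size and the elongation to 128 channels mirror the example of Figure 4, while the 3 × 3 kernel size is an illustrative assumption:

```python
import torch
import torch.nn as nn

I, K = 3, 128                  # input channels; channels after elongation
x = torch.randn(1, I, 32, 32)  # one X-by-X input image with I channels

# Depthwise convolution (filtering layer): one K x K x 1 kernel per channel,
# reducing spatial dimensions while preserving the I input channels, Eq. (1).
depthwise = nn.Conv2d(I, I, kernel_size=3, groups=I)
y_hat = depthwise(x)
print(y_hat.shape)             # torch.Size([1, 3, 30, 30]) -> Y x Y x I

# Pointwise convolution (combining layer): a 1 x 1 x K kernel that linearly
# combines the depthwise outputs and elongates them to K channels, Eq. (2).
pointwise = nn.Conv2d(I, K, kernel_size=1)
y = pointwise(y_hat)
print(y.shape)                 # torch.Size([1, 128, 30, 30]) -> Y x Y x K
```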
 3.2. Model Compression
 3.2.1. Network Pruning

Network pruning is one of the effective ways to reduce computational costs by discarding redundant and insignificant weights that do not affect the overall performance of a DNN. Figure 5 provides a visual comparison of weight matrices for different pruning approaches as follows:
1. Fine-grained pruning: using this approach, redundant weights that will have minimum influence on accuracy are pruned. However, this approach results in irregular structures.
2. Coarse-grained pruning: using this approach, an entire filter which will have minimal influence on overall accuracy is discarded. This approach results in regular structures. However, if the network is highly compressed, there may be a significant loss in the overall accuracy of the DNN.
3. 2D group-level pruning: this approach maintains a balance of the benefits and trade-offs of both fine-grained and coarse-grained pruning.

Figure 5. Comparison of different pruning methods.

CondenseNeXtV2 CNN utilizes 2D group-level pruning to discard insignificant filters, which is complemented by L1-normalization [19] to facilitate the group-level pruning process.
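A simplified sketch of how L1 scores can drive filter pruning is shown below; ranking whole filters by the L1 norm of their weights is a simplification of the 2D group-level scheme actually used, and the pruning ratio is an arbitrary assumption:

```python
import torch
import torch.nn as nn

def l1_filter_mask(conv: nn.Conv2d, prune_ratio: float = 0.5):
    """Boolean mask marking which output filters to keep, ranking filters
    by the L1 norm of their weights (a simplified stand-in for the
    group-level pruning used by CondenseNeXtV2)."""
    # L1 norm of each output filter's weights: shape (out_channels,)
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(conv.out_channels * (1.0 - prune_ratio)))
    mask = torch.zeros(conv.out_channels, dtype=torch.bool)
    mask[torch.topk(scores, n_keep).indices] = True
    return mask

conv = nn.Conv2d(16, 32, kernel_size=3)
mask = l1_filter_mask(conv, prune_ratio=0.5)
print(mask.sum().item())  # 16 of 32 filters kept; the rest are discarded
```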
3.2.2. Class-Balanced Focal Loss (CFL) Function

Pruning networks in order to discard insignificant and redundant elements of the network can still have an overall harsh effect caused by imbalanced weights. Therefore, in order to mitigate any such issues, CondenseNeXtV2 CNN incorporates a weighting factor inversely proportional to the total number of samples, called the Class-Balanced Focal Loss (CFL) function [20]. This approach can be mathematically represented as follows:

$$\mathrm{CFL}(z, y) = -\sum_{i=1}^{C} \left(1 - p_i^t\right)^{\gamma} \ast \log\left(p_i^t\right) \tag{3}$$
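A sketch of this loss in PyTorch is given below; the class-balanced weighting factor is computed from per-class sample counts following [20], and the values of beta, gamma, and the sample counts are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def class_balanced_focal_loss(logits, target, samples_per_class,
                              beta=0.9999, gamma=2.0):
    """Sketch of Equation (3): a focal term (1 - p_t)^gamma * log(p_t),
    weighted by a factor inversely proportional to the (effective)
    number of samples per class [20]."""
    # Class-balanced weighting factor: (1 - beta) / (1 - beta^n_c) per class.
    effective_num = 1.0 - torch.pow(beta, samples_per_class.float())
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * len(samples_per_class)

    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)  # log p_t
    focal = -((1.0 - log_pt.exp()) ** gamma) * log_pt
    return (weights[target] * focal).mean()

logits = torch.randn(8, 10)             # predictions for 8 samples
target = torch.randint(0, 10, (8,))     # ground-truth class indices
counts = torch.randint(50, 500, (10,))  # hypothetical samples per class
print(class_balanced_focal_loss(logits, target, counts))
```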
3.2.3. Cardinality

A new dimension, called cardinality and denoted by D, is added to the existing spatial dimensions of the CondenseNeXtV2 network to prevent a loss in overall accuracy during the pruning process. Increasing the cardinality rate is a more effective way to obtain a boost in accuracy than going wider or deeper within the network. Cardinality can be mathematically represented as follows:

$$G \ast G_x = X \ast D - p \cdot X \tag{4}$$

 3.3. Data Augmentation
 Data augmentation is a popular technique to increase the amount of data and its
 diversity by modifying already existing data and adding newly created data to existing data
 in order to improve the accuracy of modern image classifiers. CondenseNeXtV2 utilizes
 AutoAugment data augmentation technique to pre-process target dataset to determine the
 best augmentation policy with the help of Reinforcement Learning (RL). AutoAugment
primarily consists of the following two parts:
 • Search Space: Each policy consists of five sub-policies and each sub-policy consists of
 two operations that can be performed on an image in a sequential order. Furthermore,
 each image transformation operation consists of two distinct parameters as follows:
 o The probability of applying the image transformation operation to an image.
 o The magnitude with which the image transformation operation should be
 applied to an image.
 There are a total of 16 image transformation operations. 14 operations belong to
 the PIL (Python Imaging Library) which provides the python interpreter with image
 editing capabilities such as rotating, solarizing, color inverting, etc. and the remaining two
 operations are Cutout [21] and SamplePairing [22]. Table 6 in the ‘Supplementary materials’
 section of [10] provides a comprehensive list of all 16 image transformation operations
 along with default range of magnitudes.
 • Search Algorithm: AutoAugment utilizes a controller RNN, a one-layer LSTM [23]
 containing 100 hidden units and 2 × 5B softmax predictions, to determine the best
 augmentation policy. It examines the generalization ability of a policy by performing
 child model (a neural network being trained during the search process) experiments
 on a subset of the corresponding dataset. Upon completion of these experiments, a
 reward signal is sent to the controller to update validation accuracy using a popular
 policy gradient method called Proximal Policy Optimization algorithm (PPO) [24]
 with a learning rate of 0.00035 and an entropy penalty with a weight of 0.00001. The
 controller RNN provides a decision in terms of a softmax function which is then fed
 into the controller’s next stage as an input, thus allowing it to determine magnitude
 and probability of the image transformation operation.
 At the end of this search process, sub-policies from top five best policies are merged
 into one best augmentation policy with 25 sub-policies for the target dataset.
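The policies learned by this search for CIFAR-10 ship with torchvision, so a hedged sketch of applying a learned policy during pre-processing looks as follows (torchvision >= 0.11, the release paired with PyTorch 1.10, is assumed; note that this applies a pre-learned AutoAugment policy and is a stand-in for, not an implementation of, the self-querying technique proposed in this paper):

```python
from torchvision import datasets, transforms

# Each learned sub-policy applies two sequential operations, each with its
# own probability and magnitude, as described above.
train_transform = transforms.Compose([
    transforms.AutoAugment(transforms.AutoAugmentPolicy.CIFAR10),
    transforms.ToTensor(),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
img, label = train_set[0]
print(img.shape, label)  # torch.Size([3, 32, 32]) and the class index
```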

 3.4. Activation Function
 An activation function in a neural network plays a crucial role in determining the
 output of a node by limiting the output’s amplitude. These functions are also known as
 transfer functions. Activation functions help the neural network learn complex patterns of
the input data more efficiently whilst requiring fewer neurons.

CondenseNeXtV2 CNN utilizes the ReLU6 activation function along with batch normalization before each layer. The ReLU6 activation function caps units at 6, which helps the neural network learn sparse features quickly, in addition to preventing gradients from increasing infinitely, as seen in Figure 6. The ReLU6 activation function can be mathematically defined as follows:

$$f(x) = \min(\max(0, x), 6) \tag{5}$$
Figure 6. ReLU vs. ReLU6 activation functions.

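A short sketch of Equation (5), shown both as a direct clamp and via PyTorch's built-in module (the sample inputs are arbitrary):

```python
import torch
import torch.nn as nn

def relu6_manual(x):
    # f(x) = min(max(0, x), 6): negatives clamp to 0 and units cap at 6,
    # which keeps activations (and hence gradients) from growing unbounded.
    return torch.clamp(x, min=0.0, max=6.0)

x = torch.tensor([-3.0, 2.5, 8.0])
print(relu6_manual(x))  # tensor([0.0000, 2.5000, 6.0000])
print(nn.ReLU6()(x))    # identical result with the built-in module
```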
4. NXP BlueBox 2.0 Embedded Development Platform

This section provides a brief introduction to the NXP BlueBox 2.0 (Gen2) embedded development platform for Automotive High-Performance Compute (AHPC). It is a series of development platforms designed for self-driving (autonomous) vehicles.

4.1. System Design of NXP BlueBox Gen2 Family

The BlueBox 2.0 embedded development platform series is designed and developed by NXP Semiconductors. First launched in 2016 as a generation 1 series, the target application has been self-driving (autonomous) vehicles ever since. It was introduced to the general public in Austin, Texas, USA. Shortly thereafter, NXP introduced an improved series of BlueBox, which was called BlueBox 2.0, the second generation of this series. It features the following three new ARM-based automotive processors [9]:

1. S32V234 (S32V), dedicated to computer vision processing tasks: This automotive processor is based on a four-core ARM Cortex A53 architecture and operates at 1.0 GHz. An additional ARM Cortex M4 CPU provides further support and functionality for reliability and workload sharing. It is complemented by 4 MB SRAM and an additional 32-bit LPDDR3 controller to support additional memory if required by the developers. It also boasts an Image Signal Processor (ISP) on-board the SoC for noise filtering, noise reduction, etc.
2. LS2084A (LS2), dedicated to high performance computing tasks: This automotive processor is based on an eight-core ARM Cortex A72 architecture and operates at 1.80 GHz. It is complemented by 144 bytes of DDR4 RAM with support enabled for the LayerScape LX2 family. NXP claims LS2 to provide 15 years of reliable service and meets Q100 grade 3 automotive standards.
3. S32R274 (S32R), dedicated to radar information processing tasks: This automotive processor is based on a dual-core Freescale PowerPC e200z4 architecture and operates at 120 MHz, in addition to a checker core on-board this System-on-Chip (SoC). It features 2.0 MB of flash memory and 1.5 MB of SRAM.

NXP BlueBox 2.0 adheres to automotive compliance and safety standards such as ASIL-B/C and D to ensure operability and reliability in all real-world conditions. Figure 7 below provides a high-level system architecture view of NXP BlueBox 2.0 [9].
 below provides a high-level system architecture view of NXP BlueBox 2.0 [9].
Figure 7. A high-level system architecture view of NXP BlueBox 2.0.

Figure 8 below provides a top-frontal view of the NXP BlueBox 2.0 automotive embedded development platform.

Figure 8. NXP BlueBox 2.0.
 4.2. RTMaps (Real-Time Multi-Sensor Applications) Remote Studio Software
 RTMaps (Real-time Multi-sensor Applications) Remote Studio Software is a component-
 based software development and execution environment made by Intempora to facilitate
 development of autonomous driving and related automotive applications [25]. It allows
 users to link various sensors and hardware components by providing a host of component
 libraries for automotive sensors, buses, and perception algorithm design as well as support
 for almost any type of sensors and any number of those sensors.
 RTMaps can run on Windows and Linux operating systems and provides support for
 PyTorch and TensorFlow deep learning frameworks. In RTMaps, programming is done
 using Python scripting language and then deployed for real-time inference on NXP BlueBox
 2.0.

 5. Cyberinfrastructure
 This section provides details on the hardware and software utilized for training
 and testing the proposed CondenseNeXtV2 CNN architecture from which results are
 extrapolated accordingly in the following sections.

 5.1. Cyberinfrastructure for Training the Proposed CNN
 Cyberinfrastructure for training is provided and managed by the Research Technolo-
 gies division at the Indiana University in part by Shared University Research grants from
 IBM Inc. to Indiana University and Lilly Endowment Inc. through its support for the
 Indiana University Pervasive Technology Institute [26]. It is as follows:
 • Intel Xeon Gold 6126 12-core CPU with 32 GB RAM.
 • NVIDIA Tesla V100 GPU.
 • CUDA Toolkit 10.2.
 • Python version 3.7.9.
 • PyTorch version 1.10.

 5.2. Cyberinfrastructure for Testing the Proposed CNN
 Cyberinfrastructure for testing is provided and managed by the Internet of Things (IoT)
 Collaboratory at the Purdue School of Engineering and Technology at Indiana University
 Purdue University at Indianapolis as follows:
 • NXP BlueBox 2.0 embedded development platform.
 • Intempora RTMaps Remote Studio version 4.9.0.
 • Python version 3.6.7.
 • PyTorch version 1.10.

 6. Experiments and Results
 This section provides information on experiment results based on extensive training
 and testing of the proposed convolutional neural network, CondenseNeXtV2, on CIFAR-10
 and CIFAR-100 benchmarking datasets to verify real-time image classification performance
 on NXP BlueBox 2.0. CondenseNeXtV2 CNN has been adapted to be compatible with the
 latest stable release of PyTorch version 1.10.0 and CUDA toolkit version 10.2 [14] whereas
 CondenseNet and CondenseNeXt CNNs are based on PyTorch version 1.1.0 and 1.9.0
 respectively.

 6.1. CIFAR-10 Dataset
 CIFAR-10 dataset is a popular computer-vision dataset used for image classification
 and object detection. It consists of 60,000 32 × 32 RGB color images equally divided into 10
 object classes with 6,000 images per class. CIFAR-10 dataset is split into two sets: 50,000
 images for training a CNN and 10,000 images for testing the performance of trained model
 weights. The training and testing sets are mutually exclusive to obtain fair evaluation
 results.

CondenseNeXtV2 CNN was trained for 200 epochs with a batch size of 64 input images. Since the learning algorithms for CondenseNet, CondenseNeXt and CondenseNeXtV2 are unique, hyperparameters such as growth rate and learning rate for training these CNNs differ. The Python scripting language was utilized to develop an image classification algorithm for the CIFAR-10 dataset, and the trained weights were then deployed onto the NXP BlueBox 2.0 embedded development platform using RTMaps Remote Studio software for image classification performance evaluation.
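A simplified sketch of such a top-1 evaluation loop is shown below; `model` is a placeholder for the trained CondenseNeXtV2 weights loaded elsewhere, and the data path and batch size are arbitrary assumptions:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def top1_accuracy(model, device="cpu"):
    """Top-1 accuracy on the mutually exclusive 10,000-image CIFAR-10 test split."""
    test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                                transform=transforms.ToTensor())
    loader = DataLoader(test_set, batch_size=64)
    model.eval().to(device)
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            # Top-1: the single class predicted with the highest probability.
            predictions = logits.argmax(dim=1).cpu()
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total
```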
Table 1 provides an overview of testing results in terms of FLOPs, number of trainable parameters, Top-1 accuracy and inference time when evaluated on NXP BlueBox 2.0. Top-1 accuracy corresponds to the one class predicted by the CNN with the highest probability of matching the expected answer, also known as the ground truth. Figure 9 provides an overview of model performance curves and Figure 10 provides a screenshot of single image classification performance of CondenseNeXtV2 when evaluated on NXP BlueBox 2.0 using RTMaps Remote Studio software on a Linux PC.

Table 1. Comparison of CIFAR-10 single image classification performance.

CNN              FLOPs (in Million)   Parameters (in Million)   Top-1 Accuracy   Inference Time (in Seconds)
CondenseNet      65.81                0.52                      94.69%           0.1346
CondenseNeXt     23.80                0.16                      92.28%           0.0659
CondenseNeXtV2   23.80                0.16                      93.57%           0.0528

Figure 9. Performance curves of CondenseNeXtV2 vs. CondenseNet and CondenseNeXt. These CNN models are trained on CIFAR-10 dataset for 200 epochs each.

Figure 10. NXP BlueBox 2.0 single image classification prediction of a horse as an input image being observed in RTMaps Remote Studio software.

6.2. CIFAR-100 Dataset

CIFAR-100 dataset is another popular computer-vision dataset used for image classification and object detection. It also consists of 60,000 32 × 32 RGB color images, equally divided into 100 object classes with 600 images per class. CIFAR-100 dataset is split into two sets: 50,000 images for training a CNN and 10,000 images for testing the performance of trained model weights. As with the CIFAR-10 dataset, the training and testing sets are also mutually exclusive in this dataset to obtain fair evaluation results.

CondenseNeXtV2 CNN was trained for 240 epochs with a batch size of 64 input images. Hyperparameters such as growth rate and learning rate for training these CNNs differ because the learning algorithms for CondenseNet, CondenseNeXt and CondenseNeXtV2 are unique. The Python scripting language was utilized to develop an image classification algorithm for the CIFAR-100 dataset, and the trained weights were then deployed onto the NXP BlueBox 2.0 embedded development platform using RTMaps Remote Studio software for image classification performance evaluation.

Table 2 provides an overview of testing results in terms of FLOPs, number of trainable parameters, Top-1 accuracy and inference time when evaluated on NXP BlueBox 2.0. Figure 11 provides an overview of model performance curves and Figure 12 provides a screenshot of single image classification performance of CondenseNeXtV2 when evaluated on NXP BlueBox 2.0 using RTMaps Remote Studio software on a Linux PC.

Table 2. Comparison of CIFAR-100 single image classification performance.

CNN              FLOPs (in Million)   Parameters (in Million)   Top-1 Accuracy   Inference Time (in Seconds)
CondenseNet      65.85                0.55                      76.65%           0.2483
CondenseNeXt     26.38                0.22                      74.87%           0.1125
CondenseNeXtV2   26.38                0.22                      75.54%           0.0972

Figure 11. Performance curves of CondenseNeXtV2 vs. CondenseNet and CondenseNeXt. These CNN models are trained on CIFAR-100 dataset for 240 epochs each.

Figure 12. NXP BlueBox 2.0 single image classification prediction of a train as an input image being observed in RTMaps Remote Studio software.
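The per-image inference times reported in Tables 1 and 2 can be reproduced in spirit with a small timing sketch; `model` is again a placeholder for the trained network, and timings measured on a desktop will naturally differ from those obtained on NXP BlueBox 2.0:

```python
import time
import torch

def mean_inference_time(model, n_runs=100, device="cpu"):
    """Average single-image inference time in seconds."""
    model.eval().to(device)
    x = torch.randn(1, 3, 32, 32, device=device)  # one CIFAR-sized input image
    with torch.no_grad():
        model(x)  # warm-up pass so one-time setup costs are not measured
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        return (time.perf_counter() - start) / n_runs
```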
 7. Conclusions
The scope of the work presented within this paper focuses on introducing a modern image classifier named CondenseNeXtV2, a light-weight and ultra-efficient CNN intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. The proposed CNN utilizes a new self-querying augmentation policy technique on a target dataset along with adaptation to the latest version of the PyTorch framework and activation functions, resulting in improved efficiency in image classification computation and accuracy.

CondenseNeXtV2 CNN was extensively trained from scratch and tested on two benchmarking datasets, CIFAR-10 and CIFAR-100, in order to verify its single image classification performance. The trained weights of CondenseNeXtV2 were then deployed on NXP BlueBox, an edge device designed to serve as a development platform for self-driving cars, and the results were extrapolated and compared to its baseline architecture, CondenseNet, accordingly. CondenseNeXtV2 demonstrates an increase in Top-1 accuracy over CondenseNeXt, and it utilizes fewer FLOPs and fewer total trainable parameters in comparison to CondenseNet, as observed during the training and testing phases.

 Author Contributions: Conceptualization, P.K. and M.E.-S.; methodology, P.K.; software, P.K.; vali-
 dation, P.K. and M.E.-S.; writing—original draft preparation, P.K.; writing—review and editing, P.K.
 and M.E.-S.; supervision, M.E.-S. All authors have read and agreed to the published version of the
 manuscript.
 Funding: This research received no external funding.
 Data Availability Statement: The data presented in this study are available on request from the
 corresponding author. The data are not publicly available due to the fact that all the authors work at
 the same institution and there is no necessity to create repository for data exchange.
Acknowledgments: The authors would like to acknowledge the Indiana University Pervasive Technology Institute for providing supercomputing and storage resources, as well as the Internet of Things (IoT) Collaboratory at the Purdue School of Engineering and Technology at Indiana University Purdue University at Indianapolis for providing the autonomous vehicle development platform, both of which have contributed to the research results reported within this paper.
 Conflicts of Interest: The authors declare no conflict of interest.

References
1. Berners-Lee, C.M. Cybernetics and Forecasting. Nature 1968, 219, 202–203. [CrossRef]
2. Gershgorn, D. ImageNet: The Data that Spawned the Current AI Boom, QUARTZ. Available online: https://qz.com/1034972/
 the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/ (accessed on 10 January 2022).
3. Alex, K.; Sutskever, I.; Geoffrey, H. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60,
 84–90. [CrossRef]
4. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al.
 ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [CrossRef]
5. Deshpande, A. Engineering at Forward|UCLA CS ’19. Available online: https://adeshpande3.github.io/ (accessed on 10 January
 2022).
6. Dimitriou, N.; Kioumourtzis, G.; Sideris, A.; Stavropoulos, G.; Taka, E.; Zotos, N.; Leventakis, G.; Tzovaras, D. An integrated
 framework for the timely detection of petty crimes. In Proceedings of the 2017 European Intelligence and Security Informatics
 Conference, Athens, Greece, 11–13 September 2017. [CrossRef]
7. Lotfi, A.; Bouchachia, H.; Gegov, A.; Langensiepen, C.; McGinnity, M. (Eds.) Advances in Computational Intelligence Systems;
 Springer: Berlin/Heidelberg, Germany, 2018; p. 840. [CrossRef]
8. Kalgaonkar, P.; El-Sharkawy, M. CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems. In Proceedings
 of the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference, CCWC 2021, Las Vegas, NV, USA,
 27–30 January 2021; pp. 524–528. [CrossRef]
9. Cureton, C.; Douglas, M. Bluebox Deep Dive—NXP's AD Processing Platform; NXP: Eindhoven, The Netherlands, 2019; p. 28.
10. Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Policies from Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 113–123.

11. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch:
 An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Processing Syst. 2019, 32, 8026–8037.
12. Collobert, R.; Kavukcuoglu, K.; Farabet, C. Torch7: A Matlab-like Environment for Machine Learning. NIPS. 2011. Available
 online: https://infoscience.epfl.ch/record/192376/files/Collobert_NIPSWORKSHOP_2011.pdf (accessed on 10 January 2022).
13. Fridman, L.; Ing, L.D.; Jenik, B.; Reimer, B. Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-
 Critical Decisions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
 Workshops, Long Beach, CA, USA, 16–20 June 2019. [CrossRef]
14. PyTorch. PyTorch 1.10 Release, including CUDA Graphs APIs, Frontend and Compiler Improvements. Available online:
 https://pytorch.org/blog/pytorch-1.10-released/ (accessed on 10 January 2022).
15. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings
 of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp.
 1492–1500. [CrossRef]
16. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the
 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
 [CrossRef]
17. Huang, G.; Liu, S.; van der Maaten, L.; Weinberger, K.Q. CondenseNet: An Efficient DenseNet using Learned Group Convolutions.
 In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI,
 USA, 21–26 July 2017; pp. 2752–2761. [CrossRef]
18. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 30th IEEE Conference on
 Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [CrossRef]
19. Wu, S.; Li, G.; Deng, L.; Liu, L.; Wu, D.; Xie, Y.; Shi, L. L1-Norm Batch Normalization for Efficient Training of Deep Neural
 Networks. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2043–2051. [CrossRef] [PubMed]
20. Cui, Y.; Jia, M.; Lin, T.Y.; Song, Y.; Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the
 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp.
 9268–9277. [CrossRef]
21. DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552.
22. Inoue, H. Data Augmentation by Pairing Samples for Images Classification. arXiv 2018, arXiv:1801.02929.
23. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef] [PubMed]
24. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Openai, O.K. Proximal Policy Optimization Algorithms. arXiv 2017,
 arXiv:1707.06347.
25. IUPUI. Speed Is Key to Safety—dSPACE. Available online: https://www.dspace.com/en/inc/home/applicationfields/stories/
 iupui-speed-is-key-to-safety.cfm (accessed on 10 January 2022).
26. Stewart, C.A.; Welch, V.; Plale, B.; Fox, G.; Pierce, M.; Sterling, T. Indiana University Pervasive Technology Institute. IUScholar-
 Works. Available online: https://scholarworks.iu.edu/dspace/handle/2022/21675 (accessed on 10 January 2022).