Journal of Low Power Electronics and Applications

Article
CondenseNeXtV2: Light-Weight Modern Image Classifier
Utilizing Self-Querying Augmentation Policies
Priyank Kalgaonkar and Mohamed El-Sharkawy *

 Department of Electrical and Computer Engineering, Purdue School of Engineering and Technology,
 Indianapolis, IN 46254, USA; pkalgaon@purdue.edu
 * Correspondence: melshark@purdue.edu

Abstract: Artificial Intelligence (AI) combines computer science and robust datasets to mimic natural intelligence demonstrated by human beings to aid in problem-solving and decision-making involving consciousness up to a certain extent. From Apple's virtual personal assistant, Siri, to Tesla's self-driving cars, research and development in the field of AI is progressing rapidly, along with privacy concerns surrounding the usage and storage of user data on external servers, which has further fueled the need for modern ultra-efficient AI networks and algorithms. The scope of the work presented within this paper focuses on introducing a modern image classifier which is a light-weight and ultra-efficient CNN intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. This work is an extension of the award-winning paper entitled 'CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems' published at the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). The proposed neural network, dubbed CondenseNeXtV2, utilizes a new self-querying augmentation policy technique on the target dataset along with adaptation to the latest version of the PyTorch framework and activation functions, resulting in improved efficiency in image classification computation and accuracy. Finally, we deploy the trained weights of CondenseNeXtV2 on NXP BlueBox, an edge device designed to serve as a development platform for self-driving cars, and conclusions will be extrapolated accordingly.

Keywords: CondenseNeXt; convolutional neural network; computer vision; embedded systems; edge devices; image classification; CNN; PyTorch

Citation: Kalgaonkar, P.; El-Sharkawy, M. CondenseNeXtV2: Light-Weight Modern Image Classifier Utilizing Self-Querying Augmentation Policies. J. Low Power Electron. Appl. 2022, 12, 8. https://doi.org/10.3390/jlpea12010008

Academic Editor: Andrea Calimera
Received: 9 December 2021; Accepted: 21 January 2022; Published: 3 February 2022

1. Introduction

Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN), which is a subset of Machine Learning (ML), designed to realize and harness the power of Artificial Intelligence (AI) to simulate natural intelligence demonstrated by living creatures of the Kingdom Animalia. The first CNN algorithm was introduced by Alexey G. Ivakhnenko and V. G. Lapa in 1967 for supervised, deep, feed-forward, multi-layer perceptrons [1]. By the year 2012, data and computation intensive CNN architectures began dominating accuracy benchmarks. On 30 September 2012, the AlexNet CNN architecture, developed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton [2,3], won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [4], which caught the world's attention, eventually becoming the benchmark for newer CNNs utilizing GPUs to accelerate deep learning [5].

In recent years, CNNs have become a popular cornerstone in the field of computer vision research and development, a multidisciplinary field of computer science that focuses on developing innovative techniques and algorithms to enable machines to perform complex tasks such as image classification and object detection, analogous to a human visual system, for real-world applications such as advanced surveillance systems for malicious activity recognition [6], self-driving cars, AI robots, and Unmanned Aerial Vehicles (UAV) such as drones [7].
In this paper, we propose a modern image classifier which is light-weight and ultra-efficient and is intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. This work is an extension of an award-winning paper entitled 'CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems' published at the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC) [8]. The proposed neural network, dubbed CondenseNeXtV2, utilizes a new self-querying augmentation policy technique on a given target dataset along with adaptation to the latest version of the PyTorch framework at the time of writing this paper, resulting in an improved top-1 accuracy in image classification performance, which we shall observe through experiments by deploying the trained weights of CondenseNeXtV2 for real-time image classification on the NXP BlueBox development platform [9].

2. Background and Literature Review
2.1. Artificial Neural Networks

Artificial Neural Networks (ANN) take inspiration from the human brain. A human brain is a complex and non-linear processing system capable of retraining and reorganizing its crucial structural components, known as neurons, to perform complex operations such as image classification and object detection much faster than any general-purpose computer in existence today. ANNs are developed using a common programming language, trained, and then deployed on computer hardware capable of executing objective-specific tasks, or deployed in simulation on a computer. ANNs function in a similar way by:
1. Extracting knowledge using neurons and,
2. Storing the extracted information with the help of inter-neuron connection strengths known as synaptic weights.

This process of learning by reorganizing synaptic weights in an artificial neural network is called a learning algorithm. Figure 1 below provides a visual representation of a nonlinear model of a neuron in an ANN.

Figure 1. A visual representation of a nonlinear model of a neuron in an ANN.

In Figure 1, synaptic weights are given as inputs to the neurons of an ANN and may have positive or negative values. A summing junction functions as an adder to linearly combine all inputs with respect to the corresponding synaptic weights of the neuron, and a continuously differentiable linear or non-linear activation function such as Sigmoid, Tanh, ReLU, etc. is used to decide whether the neuron should be activated or not.
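As a minimal sketch of this nonlinear neuron model (the input values, weights, and the choice of a Sigmoid activation below are illustrative assumptions, not values taken from Figure 1), the summing junction followed by an activation function can be written in a few lines of Python:

```python
import numpy as np

def neuron(inputs, synaptic_weights, bias):
    """Nonlinear model of a single neuron: summing junction + activation."""
    # Summing junction: linearly combine the inputs with their synaptic weights.
    v = np.dot(synaptic_weights, inputs) + bias
    # Continuously differentiable activation function (Sigmoid shown here).
    return 1.0 / (1.0 + np.exp(-v))

# Synaptic weights may take positive or negative values.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, -0.4, 0.1])
print(neuron(x, w, bias=0.1))  # activation of the neuron for this input
```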
2.2. Representation Learning in Multi-Layer Feed-Forward Neural Networks

Multi-layer feed-forward neural networks are one of the most popular and widely used types of artificial neural network architectures, in which the neurons make use of a learning algorithm to train on a target dataset.

Multi-layer feed-forward neural networks can have one or more hidden layers, which determines the depth of a neural network. Here, the expression hidden signifies that a particular part of an ANN is not directly accessible to the input or output nodes of the ANN. Therefore, as the depth increases, the ANN will possess even more synaptic connections between neurons to extract meaningful information and adjust its weights. Such a network architecture is also commonly known as a multilayer perceptron, which utilizes a backpropagation algorithm during the training phase.

The work proposed within this paper is based on fundamental principles of multi-layer feed-forward neural networks for supervised learning in the OpenCV realm. In supervised machine learning, the mapping between input variables (x) and output variables (y) is learned. It infers a function from a labeled dataset known as ground truth. The idea is to train and learn hundreds of classes from a variety of different images containing noise and irregularities, which can be a challenge for traditional image classification algorithms. Figure 2 below provides a visual representation of these images. The work presented within this paper utilizes the AutoAugment [10] data augmentation technique to overcome this issue, as discussed in the following sections of this paper.

Figure 2. Sample of 32 × 32 resolution images containing noise and other irregularities.

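As an illustrative sketch of such a multi-layer feed-forward network (the hidden-layer width and the use of a single hidden layer are arbitrary assumptions, not the architecture proposed in this paper), a PyTorch module for 32 × 32 RGB inputs can be written as follows; during training, PyTorch's autograd performs the backpropagation step described above:

```python
import torch
import torch.nn as nn

class FeedForwardNet(nn.Module):
    """A small multi-layer feed-forward network (multilayer perceptron)."""
    def __init__(self, in_features=3 * 32 * 32, hidden=128, num_classes=10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),                    # 32 x 32 RGB image -> flat vector
            nn.Linear(in_features, hidden),  # input nodes -> hidden layer
            nn.ReLU(),                       # non-linear activation
            nn.Linear(hidden, num_classes),  # hidden layer -> output nodes
        )

    def forward(self, x):
        return self.layers(x)

model = FeedForwardNet()
logits = model(torch.randn(4, 3, 32, 32))  # a batch of four 32 x 32 RGB images
print(logits.shape)                        # torch.Size([4, 10])
```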
To address the challenges of performance and efficiency of deep neural network architectures, in 2016, a team of researchers and engineers at the Facebook AI Research (FAIR) lab introduced PyTorch and released it under the Modified BSD license for fair use [11]. PyTorch is a scientific computing framework and an open-source machine learning library built upon the Torch library, which is also a scientific computing framework and an open-source machine learning library, introduced in 2002 [12]. Some of the important features of PyTorch include:
1. Tensor computation accelerated by GPU(s).
2. Automatic differentiation using tape-based autograd.

PyTorch is a popular choice of deep learning framework for commercial applications such as autonomous vehicles, AI applications, etc. Tesla Motors, a leader in the autonomous (self-driving) vehicle industry, utilizes PyTorch for its famous Autopilot system, a suite of SAE Level 2 Advanced Driver-Assistance System (ADAS) features, by utilizing advanced computing hardware for deep learning, computer vision and sensor fusion [13]. The proposed CNN within this paper is based on the latest stable release of PyTorch version 1.10.0 and CUDA toolkit version 10.2 [14].
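The second feature listed above, tape-based automatic differentiation, can be illustrated with a short sketch (the differentiated function below is an arbitrary example):

```python
import torch

# Autograd records every operation on tensors with requires_grad=True onto a
# "tape" and replays the tape backwards to compute gradients automatically.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # reverse-mode differentiation over the recorded tape
print(x.grad)        # tensor([4., 6.]), i.e., dy/dx = 2x
```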

 2.3. Evolution of CondenseNeXt

Xie et al. introduced the ResNeXt [15] CNN architecture, which utilizes an innovative technique to skip one or more layers, known as identity shortcut connections. This allowed the ResNeXt CNN to train on large amounts of data with unparalleled efficiency at the time.

In the following year, in 2017, Huang et al. introduced the DenseNet [16] CNN architecture, which utilizes a novel technique to connect each convolutional layer to the succeeding layers in a feed-forward fashion. This facilitates the reuse of synaptic weights by providing the outputs of the current layer as inputs to every succeeding layer, resulting in extraction of information at different levels of coarseness. DenseNet provides the baseline for the CondenseNet CNN architecture.

In 2018, Huang et al. introduced CondenseNet [17], which was an improvement over the DenseNet CNN. In this CNN architecture, the authors proposed pruning several layers and blocks and replacing standard convolutions with group convolutions (G). Furthermore, they proposed a technique to learn mapping patterns between channels and groups by connecting layers from different groups to each other directly, thus preventing individual groups of channels from training separately. The authors collectively called this technique learned group convolutions. Figure 3 below provides a visual comparison of dense blocks in DenseNet vs. CondenseNet CNN.

Figure 3. Comparison of dense blocks in DenseNet vs. CondenseNet CNN.

CondenseNet performs a 1 × 1 standard group convolution. The major drawback of implementing such a convolution is that it does not parse through each channel's spatial dimensions and therefore misses out on fine details, leading to a loss in efficiency and overall accuracy of the network. To address these issues, Kalgaonkar and El-Sharkawy introduced CondenseNeXt [8] in 2021. CondenseNeXt refers to the next dimension of cardinality. In this CNN architecture, the authors propose replacing standard grouped convolutions with depthwise separable convolutions [18], which allows the network to extract information by transforming an image only once and then elongating this transformed image to n number of channels, instead of simply transforming the image n number of times, resulting in a boost in computational efficiency and a fewer number of Floating-Point Operations (FLOPs) at the training stage.
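The efficiency argument can be made concrete with a small sketch comparing learnable parameter counts of a standard convolution, a group convolution, and a depthwise separable convolution in PyTorch (the channel sizes and group count below are arbitrary assumptions chosen for illustration):

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

cin, cout, k = 64, 128, 3
standard = nn.Conv2d(cin, cout, k, padding=1)           # standard convolution
grouped = nn.Conv2d(cin, cout, k, padding=1, groups=4)  # group convolution, G = 4
depthwise_separable = nn.Sequential(
    nn.Conv2d(cin, cin, k, padding=1, groups=cin),  # depthwise: one filter per channel
    nn.Conv2d(cin, cout, kernel_size=1),            # pointwise: 1 x 1 combination
)

print(n_params(standard))             # 73856
print(n_params(grouped))              # 18560
print(n_params(depthwise_separable))  # 8960
```

The single spatial (depthwise) transformation followed by a cheap 1 × 1 elongation is what yields the reduction in FLOPs and parameters noted above.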

 3. CondenseNeXtV2 Architecture
In this section, a new CNN architecture, called CondenseNeXtV2, is being proposed. CondenseNeXtV2 is inspired by and is an improvement over the CondenseNeXt CNN. The primary goal of the CondenseNeXtV2 CNN is to further improve performance and top-1 accuracy of the network. This section describes the architecture in detail.

3.1. Convolutional Layers

CondenseNeXtV2 utilizes depthwise separable convolution, which enhances the efficiency of the network without having to transform an image over and over again, resulting in a reduction in the number of FLOPs and an increase in computational efficiency in comparison to standard group convolutions. Depthwise separable convolution is comprised of two distinct layers:

• Depthwise convolution: This layer acts like a filtering layer which applies a depthwise convolution to a single input channel. Consider an image of size $X \times X \times I$ given as an input, with $I$ number of input channels and kernels ($\hat{K}$) of size $K \times K \times 1$; the output of this depthwise convolution operation will be $Y \times Y \times I$, resulting in a reduction of spatial dimensions whilst preserving the $I$ number of channels (depth) of the input image. The cost of computation for this operation can be mathematically represented as follows:

$$\hat{Y}_{k,l,m} = \sum_{i,j} \hat{K}_{i,j,m} \ast X_{k+i-1,\, l+j-1,\, m} \tag{1}$$

• Pointwise convolution: This layer acts like a combining layer which performs a linear combination of all outputs generated by the previous layer, since the depth of the input image from the previous operation has not changed. The pointwise convolution operation is performed using a 1 × 1 kernel, i.e., $1 \times 1 \times K$, resulting in an output image of size $Y \times Y \times K$. The cost of computation for this operation can be mathematically represented as follows:

$$Y_{k,l,n} = \sum_{m} \tilde{K}_{m,n} \ast \hat{Y}_{k-1,\, l-1,\, m} \tag{2}$$
A depthwise separable convolution splits a kernel into two distinct layers for filtering and combining operations, resulting in an improvement in overall computational efficiency. Figure 4 provides a 3D visual representation of the depthwise separable convolution operation, where a single image is transformed only once and then this transformed image is elongated to over 128 channels. This allows the CNN to extract information at different levels of coarseness. For example, consider the green channel from the RGB channels being processed. Depthwise separable convolution will extract different shades of the color green from this channel.

Figure 4. A 3D visual representation of depthwise separable convolution for a given example.

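A minimal PyTorch sketch of the two-stage operation in Equations (1) and (2) is given below; the input size and the elongation to 128 channels mirror the example of Figure 4, while the 3 × 3 kernel size is an illustrative assumption:

```python
import torch
import torch.nn as nn

I, K = 3, 128                  # input channels; channels after elongation
x = torch.randn(1, I, 32, 32)  # one X-by-X input image with I channels

# Depthwise convolution (filtering layer): one K x K x 1 kernel per channel,
# reducing spatial dimensions while preserving the I input channels, Eq. (1).
depthwise = nn.Conv2d(I, I, kernel_size=3, groups=I)
y_hat = depthwise(x)
print(y_hat.shape)             # torch.Size([1, 3, 30, 30]) -> Y x Y x I

# Pointwise convolution (combining layer): a 1 x 1 x K kernel that linearly
# combines the depthwise outputs and elongates them to K channels, Eq. (2).
pointwise = nn.Conv2d(I, K, kernel_size=1)
y = pointwise(y_hat)
print(y.shape)                 # torch.Size([1, 128, 30, 30]) -> Y x Y x K
```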
 3.2. Model Compression
 3.2.1. Network Pruning

Network pruning is one of the effective ways to reduce computational costs by discarding redundant and insignificant weights that do not affect the overall performance of a DNN. Figure 5 provides a visual comparison of weight matrices for different pruning approaches as follows:
1. Fine-grained pruning: using this approach, redundant weights that will have minimum influence on accuracy are pruned. However, this approach results in irregular structures.
2. Coarse-grained pruning: using this approach, an entire filter which will have minimal influence on overall accuracy is discarded. This approach results in regular structures. However, if the network is highly compressed, there may be a significant loss in the overall accuracy of the DNN.
3. 2D group-level pruning: this approach maintains a balance of the benefits and trade-offs of both fine-grained and coarse-grained pruning.

Figure 5. Comparison of different pruning methods.

CondenseNeXtV2 CNN utilizes 2D group-level pruning to discard insignificant filters, which is complemented by L1-normalization [19] to facilitate the group-level pruning process.
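A simplified sketch of how L1 scores can drive filter pruning is shown below; ranking whole filters by the L1 norm of their weights is a simplification of the 2D group-level scheme actually used, and the pruning ratio is an arbitrary assumption:

```python
import torch
import torch.nn as nn

def l1_filter_mask(conv: nn.Conv2d, prune_ratio: float = 0.5):
    """Boolean mask marking which output filters to keep, ranking filters
    by the L1 norm of their weights (a simplified stand-in for the
    group-level pruning used by CondenseNeXtV2)."""
    # L1 norm of each output filter's weights: shape (out_channels,)
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(conv.out_channels * (1.0 - prune_ratio)))
    mask = torch.zeros(conv.out_channels, dtype=torch.bool)
    mask[torch.topk(scores, n_keep).indices] = True
    return mask

conv = nn.Conv2d(16, 32, kernel_size=3)
mask = l1_filter_mask(conv, prune_ratio=0.5)
print(mask.sum().item())  # 16 of 32 filters kept; the rest are discarded
```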
3.2.2. Class-Balanced Focal Loss (CFL) Function

Pruning networks in order to discard insignificant and redundant elements of the network can still have an overall harsh effect caused by imbalanced weights. Therefore, in order to mitigate any such issues, CondenseNeXtV2 CNN incorporates a weighting factor inversely proportional to the total number of samples, called the Class-Balanced Focal Loss (CFL) function [20]. This approach can be mathematically represented as follows:

$$\mathrm{CFL}(z, y) = -\sum_{i=1}^{C} \left(1 - p_i^t\right)^{\gamma} \ast \log\left(p_i^t\right) \tag{3}$$
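A sketch of this loss in PyTorch is given below; the class-balanced weighting factor is computed from per-class sample counts following [20], and the values of beta, gamma, and the sample counts are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def class_balanced_focal_loss(logits, target, samples_per_class,
                              beta=0.9999, gamma=2.0):
    """Sketch of Equation (3): a focal term (1 - p_t)^gamma * log(p_t),
    weighted by a factor inversely proportional to the (effective)
    number of samples per class [20]."""
    # Class-balanced weighting factor: (1 - beta) / (1 - beta^n_c) per class.
    effective_num = 1.0 - torch.pow(beta, samples_per_class.float())
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * len(samples_per_class)

    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)  # log p_t
    focal = -((1.0 - log_pt.exp()) ** gamma) * log_pt
    return (weights[target] * focal).mean()

logits = torch.randn(8, 10)             # predictions for 8 samples
target = torch.randint(0, 10, (8,))     # ground-truth class indices
counts = torch.randint(50, 500, (10,))  # hypothetical samples per class
print(class_balanced_focal_loss(logits, target, counts))
```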
3.2.3. Cardinality

A new dimension, called cardinality and denoted by D, is added to the existing spatial dimensions of the CondenseNeXtV2 network to prevent a loss in overall accuracy during the pruning process. Increasing the cardinality rate is a more effective way to obtain a boost in accuracy than going wider or deeper within the network. Cardinality can be mathematically represented as follows:

$$G \ast G_x = X \ast D - p \cdot X \tag{4}$$

 3.3. Data Augmentation
 Data augmentation is a popular technique to increase the amount of data and its
 diversity by modifying already existing data and adding newly created data to existing data
 in order to improve the accuracy of modern image classifiers. CondenseNeXtV2 utilizes
 AutoAugment data augmentation technique to pre-process target dataset to determine the
 best augmentation policy with the help of Reinforcement Learning (RL). AutoAugment
primarily consists of the following two parts:
 • Search Space: Each policy consists of five sub-policies and each sub-policy consists of
 two operations that can be performed on an image in a sequential order. Furthermore,
 each image transformation operation consists of two distinct parameters as follows:
 o The probability of applying the image transformation operation to an image.
 o The magnitude with which the image transformation operation should be
 applied to an image.
 There are a total of 16 image transformation operations. 14 operations belong to
 the PIL (Python Imaging Library) which provides the python interpreter with image
 editing capabilities such as rotating, solarizing, color inverting, etc. and the remaining two
 operations are Cutout [21] and SamplePairing [22]. Table 6 in the ‘Supplementary materials’
 section of [10] provides a comprehensive list of all 16 image transformation operations
 along with default range of magnitudes.
 • Search Algorithm: AutoAugment utilizes a controller RNN, a one-layer LSTM [23]
 containing 100 hidden units and 2 × 5B softmax predictions, to determine the best
 augmentation policy. It examines the generalization ability of a policy by performing
 child model (a neural network being trained during the search process) experiments
 on a subset of the corresponding dataset. Upon completion of these experiments, a
 reward signal is sent to the controller to update validation accuracy using a popular
 policy gradient method called Proximal Policy Optimization algorithm (PPO) [24]
 with a learning rate of 0.00035 and an entropy penalty with a weight of 0.00001. The
 controller RNN provides a decision in terms of a softmax function which is then fed
 into the controller’s next stage as an input, thus allowing it to determine magnitude
 and probability of the image transformation operation.
 At the end of this search process, sub-policies from top five best policies are merged
 into one best augmentation policy with 25 sub-policies for the target dataset.
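The policies learned by this search for CIFAR-10 ship with torchvision, so a hedged sketch of applying a learned policy during pre-processing looks as follows (torchvision >= 0.11, the release paired with PyTorch 1.10, is assumed; note that this applies a pre-learned AutoAugment policy and is a stand-in for, not an implementation of, the self-querying technique proposed in this paper):

```python
from torchvision import datasets, transforms

# Each learned sub-policy applies two sequential operations, each with its
# own probability and magnitude, as described above.
train_transform = transforms.Compose([
    transforms.AutoAugment(transforms.AutoAugmentPolicy.CIFAR10),
    transforms.ToTensor(),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
img, label = train_set[0]
print(img.shape, label)  # torch.Size([3, 32, 32]) and the class index
```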

 3.4. Activation Function
 An activation function in a neural network plays a crucial role in determining the
 output of a node by limiting the output’s amplitude. These functions are also known as
 transfer functions. Activation functions help the neural network learn complex patterns of
the input data more efficiently whilst requiring fewer neurons.

CondenseNeXtV2 CNN utilizes the ReLU6 activation function along with batch normalization before each layer. The ReLU6 activation function caps units at 6, which helps the neural network learn sparse features quickly, in addition to preventing gradients from increasing infinitely, as seen in Figure 6. The ReLU6 activation function can be mathematically defined as follows:

$$f(x) = \min(\max(0, x), 6) \tag{5}$$
Figure 6. ReLU vs. ReLU6 activation functions.

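A short sketch of Equation (5), shown both as a direct clamp and via PyTorch's built-in module (the sample inputs are arbitrary):

```python
import torch
import torch.nn as nn

def relu6_manual(x):
    # f(x) = min(max(0, x), 6): negatives clamp to 0 and units cap at 6,
    # which keeps activations (and hence gradients) from growing unbounded.
    return torch.clamp(x, min=0.0, max=6.0)

x = torch.tensor([-3.0, 2.5, 8.0])
print(relu6_manual(x))  # tensor([0.0000, 2.5000, 6.0000])
print(nn.ReLU6()(x))    # identical result with the built-in module
```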
4. NXP BlueBox 2.0 Embedded Development Platform

This section provides a brief introduction to the NXP BlueBox 2.0 (Gen2) embedded development platform for Automotive High-Performance Compute (AHPC). It is a series of development platforms designed for self-driving (autonomous) vehicles.

4.1. System Design of NXP BlueBox Gen2 Family

The BlueBox 2.0 embedded development platform series is designed and developed by NXP Semiconductors. First launched in 2016 as a generation 1 series, the target application has been self-driving (autonomous) vehicles ever since. It was introduced to the general public in Austin, Texas, USA. Shortly thereafter, NXP introduced an improved series of BlueBox, which was called BlueBox 2.0, the second generation of this series. It features the following three new ARM-based automotive processors [9]:

1. S32V234 (S32V), dedicated to computer vision processing tasks: This automotive processor is based on a four-core ARM Cortex A53 architecture and operates at 1.0 GHz. An additional ARM Cortex M4 CPU provides further support and functionality for reliability and workload sharing. It is complemented by 4 MB SRAM and an additional 32-bit LPDDR3 controller to support additional memory if required by the developers. It also boasts an Image Signal Processor (ISP) on-board the SoC for noise filtering, noise reduction, etc.
2. LS2084A (LS2), dedicated to high performance computing tasks: This automotive processor is based on an eight-core ARM Cortex A72 architecture and operates at 1.80 GHz. It is complemented by 144 bytes of DDR4 RAM with support enabled for the LayerScape LX2 family. NXP claims LS2 to provide 15 years of reliable service and meets Q100 grade 3 automotive standards.
3. S32R274 (S32R), dedicated to radar information processing tasks: This automotive processor is based on a dual-core Freescale PowerPC e200z4 architecture and operates at 120 MHz, in addition to a checker core on-board this System-on-Chip (SoC). It features 2.0 MB of flash memory and 1.5 MB of SRAM.

NXP BlueBox 2.0 adheres to automotive compliance and safety standards such as ASIL-B/C and D to ensure operability and reliability in all real-world conditions. Figure 7 below provides a high-level system architecture view of NXP BlueBox 2.0 [9].
 below provides a high-level system architecture view of NXP BlueBox 2.0 [9].
Figure 7. A high-level system architecture view of NXP BlueBox 2.0.

Figure 8 below provides a top-frontal view of the NXP BlueBox 2.0 automotive embedded development platform.

Figure 8. NXP BlueBox 2.0.
 4.2. RTMaps (Real-Time Multi-Sensor Applications) Remote Studio Software
 RTMaps (Real-time Multi-sensor Applications) Remote Studio Software is a component-
 based software development and execution environment made by Intempora to facilitate
 development of autonomous driving and related automotive applications [25]. It allows
 users to link various sensors and hardware components by providing a host of component
 libraries for automotive sensors, buses, and perception algorithm design as well as support
 for almost any type of sensors and any number of those sensors.
 RTMaps can run on Windows and Linux operating systems and provides support for
 PyTorch and TensorFlow deep learning frameworks. In RTMaps, programming is done
 using Python scripting language and then deployed for real-time inference on NXP BlueBox
 2.0.

 5. Cyberinfrastructure
 This section provides details on the hardware and software utilized for training
 and testing the proposed CondenseNeXtV2 CNN architecture from which results are
 extrapolated accordingly in the following sections.

 5.1. Cyberinfrastructure for Training the Proposed CNN
 Cyberinfrastructure for training is provided and managed by the Research Technolo-
 gies division at the Indiana University in part by Shared University Research grants from
 IBM Inc. to Indiana University and Lilly Endowment Inc. through its support for the
 Indiana University Pervasive Technology Institute [26]. It is as follows:
 • Intel Xeon Gold 6126 12-core CPU with 32 GB RAM.
 • NVIDIA Tesla V100 GPU.
 • CUDA Toolkit 10.2.
 • Python version 3.7.9.
 • PyTorch version 1.10.

 5.2. Cyberinfrastructure for Testing the Proposed CNN
 Cyberinfrastructure for testing is provided and managed by the Internet of Things (IoT)
 Collaboratory at the Purdue School of Engineering and Technology at Indiana University
 Purdue University at Indianapolis as follows:
 • NXP BlueBox 2.0 embedded development platform.
 • Intempora RTMaps Remote Studio version 4.9.0.
 • Python version 3.6.7.
 • PyTorch version 1.10.

 6. Experiments and Results
 This section provides information on experiment results based on extensive training
 and testing of the proposed convolutional neural network, CondenseNeXtV2, on CIFAR-10
 and CIFAR-100 benchmarking datasets to verify real-time image classification performance
 on NXP BlueBox 2.0. CondenseNeXtV2 CNN has been adapted to be compatible with the
 latest stable release of PyTorch version 1.10.0 and CUDA toolkit version 10.2 [14] whereas
 CondenseNet and CondenseNeXt CNNs are based on PyTorch version 1.1.0 and 1.9.0
 respectively.

 6.1. CIFAR-10 Dataset
 CIFAR-10 dataset is a popular computer-vision dataset used for image classification
 and object detection. It consists of 60,000 32 × 32 RGB color images equally divided into 10
 object classes with 6,000 images per class. CIFAR-10 dataset is split into two sets: 50,000
 images for training a CNN and 10,000 images for testing the performance of trained model
 weights. The training and testing sets are mutually exclusive to obtain fair evaluation
 results.

CondenseNeXtV2 CNN was trained for 200 epochs with a batch size of 64 input images. Since the learning algorithms for CondenseNet, CondenseNeXt and CondenseNeXtV2 are unique, hyperparameters such as growth rate and learning rate for training these CNNs differ. The Python scripting language was utilized to develop an image classification algorithm for the CIFAR-10 dataset, and the trained weights were then deployed onto the NXP BlueBox 2.0 embedded development platform using RTMaps Remote Studio software for image classification performance evaluation.
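A simplified sketch of such a top-1 evaluation loop is shown below; `model` is a placeholder for the trained CondenseNeXtV2 weights loaded elsewhere, and the data path and batch size are arbitrary assumptions:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def top1_accuracy(model, device="cpu"):
    """Top-1 accuracy on the mutually exclusive 10,000-image CIFAR-10 test split."""
    test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                                transform=transforms.ToTensor())
    loader = DataLoader(test_set, batch_size=64)
    model.eval().to(device)
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            # Top-1: the single class predicted with the highest probability.
            predictions = logits.argmax(dim=1).cpu()
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total
```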
Table 1 provides an overview of testing results in terms of FLOPs, number of trainable parameters, Top-1 accuracy and inference time when evaluated on NXP BlueBox 2.0. Top-1 accuracy corresponds to the one class predicted by the CNN with the highest probability of matching the expected answer, also known as the ground truth. Figure 9 provides an overview of model performance curves and Figure 10 provides a screenshot of single image classification performance of CondenseNeXtV2 when evaluated on NXP BlueBox 2.0 using RTMaps Remote Studio software on a Linux PC.

Table 1. Comparison of CIFAR-10 single image classification performance.

CNN              FLOPs (in Million)   Parameters (in Million)   Top-1 Accuracy   Inference Time (in Seconds)
CondenseNet      65.81                0.52                      94.69%           0.1346
CondenseNeXt     23.80                0.16                      92.28%           0.0659
CondenseNeXtV2   23.80                0.16                      93.57%           0.0528

Figure 9. Performance curves of CondenseNeXtV2 vs. CondenseNet and CondenseNeXt. These CNN models are trained on CIFAR-10 dataset for 200 epochs each.

Figure 10. NXP BlueBox 2.0 single image classification prediction of a horse as an input image being observed in RTMaps Remote Studio software.

6.2. CIFAR-100 Dataset

CIFAR-100 dataset is another popular computer-vision dataset used for image classification and object detection. It also consists of 60,000 32 × 32 RGB color images, equally divided into 100 object classes with 600 images per class. CIFAR-100 dataset is split into two sets: 50,000 images for training a CNN and 10,000 images for testing the performance of trained model weights. As with the CIFAR-10 dataset, the training and testing sets are also mutually exclusive in this dataset to obtain fair evaluation results.

CondenseNeXtV2 CNN was trained for 240 epochs with a batch size of 64 input images. Hyperparameters such as growth rate and learning rate for training these CNNs differ because the learning algorithms for CondenseNet, CondenseNeXt and CondenseNeXtV2 are unique. The Python scripting language was utilized to develop an image classification algorithm for the CIFAR-100 dataset, and the trained weights were then deployed onto the NXP BlueBox 2.0 embedded development platform using RTMaps Remote Studio software for image classification performance evaluation.

Table 2 provides an overview of testing results in terms of FLOPs, number of trainable parameters, Top-1 accuracy and inference time when evaluated on NXP BlueBox 2.0. Figure 11 provides an overview of model performance curves and Figure 12 provides a screenshot of single image classification performance of CondenseNeXtV2 when evaluated on NXP BlueBox 2.0 using RTMaps Remote Studio software on a Linux PC.

Table 2. Comparison of CIFAR-100 single image classification performance.

CNN              FLOPs (in Million)   Parameters (in Million)   Top-1 Accuracy   Inference Time (in Seconds)
CondenseNet      65.85                0.55                      76.65%           0.2483
CondenseNeXt     26.38                0.22                      74.87%           0.1125
CondenseNeXtV2   26.38                0.22                      75.54%           0.0972

Figure 11. Performance curves of CondenseNeXtV2 vs. CondenseNet and CondenseNeXt. These CNN models are trained on CIFAR-100 dataset for 240 epochs each.

Figure 12. NXP BlueBox 2.0 single image classification prediction of a train as an input image being observed in RTMaps Remote Studio software.
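The per-image inference times reported in Tables 1 and 2 can be reproduced in spirit with a small timing sketch; `model` is again a placeholder for the trained network, and timings measured on a desktop will naturally differ from those obtained on NXP BlueBox 2.0:

```python
import time
import torch

def mean_inference_time(model, n_runs=100, device="cpu"):
    """Average single-image inference time in seconds."""
    model.eval().to(device)
    x = torch.randn(1, 3, 32, 32, device=device)  # one CIFAR-sized input image
    with torch.no_grad():
        model(x)  # warm-up pass so one-time setup costs are not measured
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        return (time.perf_counter() - start) / n_runs
```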
 7. Conclusions
The scope of the work presented within this paper focuses on introducing a modern image classifier named CondenseNeXtV2, a light-weight and ultra-efficient CNN intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. The proposed CNN utilizes a new self-querying augmentation policy technique on a target dataset along with adaptation to the latest version of the PyTorch framework and activation functions, resulting in improved efficiency in image classification computation and accuracy.

CondenseNeXtV2 CNN was extensively trained from scratch and tested on two benchmarking datasets, CIFAR-10 and CIFAR-100, in order to verify its single image classification performance. The trained weights of CondenseNeXtV2 were then deployed on NXP BlueBox, an edge device designed to serve as a development platform for self-driving cars, and the results were extrapolated and compared to its baseline architecture, CondenseNet, accordingly. CondenseNeXtV2 demonstrates an increase in Top-1 accuracy over CondenseNeXt, and it utilizes fewer FLOPs and fewer total trainable parameters in comparison to CondenseNet, as observed during the training and testing phases.

 Author Contributions: Conceptualization, P.K. and M.E.-S.; methodology, P.K.; software, P.K.; vali-
 dation, P.K. and M.E.-S.; writing—original draft preparation, P.K.; writing—review and editing, P.K.
 and M.E.-S.; supervision, M.E.-S. All authors have read and agreed to the published version of the
 manuscript.
 Funding: This research received no external funding.
 Data Availability Statement: The data presented in this study are available on request from the
 corresponding author. The data are not publicly available due to the fact that all the authors work at
 the same institution and there is no necessity to create repository for data exchange.
Acknowledgments: The authors would like to acknowledge the Indiana University Pervasive Technology Institute for providing supercomputing and storage resources, as well as the Internet of Things (IoT) Collaboratory at the Purdue School of Engineering and Technology at Indiana University Purdue University at Indianapolis for providing the autonomous vehicle development platform, both of which have contributed to the research results reported within this paper.
 Conflicts of Interest: The authors declare no conflict of interest.

References
1. Berners-Lee, C.M. Cybernetics and Forecasting. Nature 1968, 219, 202–203. [CrossRef]
2. Gershgorn, D. ImageNet: The Data that Spawned the Current AI Boom, QUARTZ. Available online: https://qz.com/1034972/
 the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/ (accessed on 10 January 2022).
3. Alex, K.; Sutskever, I.; Geoffrey, H. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60,
 84–90. [CrossRef]
4. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al.
 ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [CrossRef]
5. Deshpande, A. Engineering at Forward|UCLA CS ’19. Available online: https://adeshpande3.github.io/ (accessed on 10 January
 2022).
6. Dimitriou, N.; Kioumourtzis, G.; Sideris, A.; Stavropoulos, G.; Taka, E.; Zotos, N.; Leventakis, G.; Tzovaras, D. An integrated
 framework for the timely detection of petty crimes. In Proceedings of the 2017 European Intelligence and Security Informatics
 Conference, Athens, Greece, 11–13 September 2017. [CrossRef]
7. Lotfi, A.; Bouchachia, H.; Gegov, A.; Langensiepen, C.; McGinnity, M. (Eds.) Advances in Computational Intelligence Systems;
 Springer: Berlin/Heidelberg, Germany, 2018; p. 840. [CrossRef]
8. Kalgaonkar, P.; El-Sharkawy, M. CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems. In Proceedings
 of the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference, CCWC 2021, Las Vegas, NV, USA,
 27–30 January 2021; pp. 524–528. [CrossRef]
9. Cureton, C.; Douglas, M. Bluebox Deep Dive—NXP's AD Processing Platform; NXP: Eindhoven, The Netherlands, 2019; p. 28.
10. Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Policies from Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 113–123.

11. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch:
 An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Processing Syst. 2019, 32, 8026–8037.
12. Collobert, R.; Kavukcuoglu, K.; Farabet, C. Torch7: A Matlab-like Environment for Machine Learning. NIPS. 2011. Available
 online: https://infoscience.epfl.ch/record/192376/files/Collobert_NIPSWORKSHOP_2011.pdf (accessed on 10 January 2022).
13. Fridman, L.; Ing, L.D.; Jenik, B.; Reimer, B. Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-
 Critical Decisions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
 Workshops, Long Beach, CA, USA, 16–20 June 2019. [CrossRef]
14. PyTorch. PyTorch 1.10 Release, including CUDA Graphs APIs, Frontend and Compiler Improvements. Available online:
 https://pytorch.org/blog/pytorch-1.10-released/ (accessed on 10 January 2022).
15. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings
 of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp.
 1492–1500. [CrossRef]
16. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the
 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
 [CrossRef]
17. Huang, G.; Liu, S.; van der Maaten, L.; Weinberger, K.Q. CondenseNet: An Efficient DenseNet using Learned Group Convolutions.
 In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI,
 USA, 21–26 July 2017; pp. 2752–2761. [CrossRef]
18. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 30th IEEE Conference on
 Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [CrossRef]
19. Wu, S.; Li, G.; Deng, L.; Liu, L.; Wu, D.; Xie, Y.; Shi, L. L1-Norm Batch Normalization for Efficient Training of Deep Neural
 Networks. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2043–2051. [CrossRef] [PubMed]
20. Cui, Y.; Jia, M.; Lin, T.Y.; Song, Y.; Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the
 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp.
 9268–9277. [CrossRef]
21. DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552.
22. Inoue, H. Data Augmentation by Pairing Samples for Images Classification. arXiv 2018, arXiv:1801.02929.
23. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef] [PubMed]
24. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Openai, O.K. Proximal Policy Optimization Algorithms. arXiv 2017,
 arXiv:1707.06347.
25. IUPUI. Speed Is Key to Safety—dSPACE. Available online: https://www.dspace.com/en/inc/home/applicationfields/stories/
 iupui-speed-is-key-to-safety.cfm (accessed on 10 January 2022).
26. Stewart, C.A.; Welch, V.; Plale, B.; Fox, G.; Pierce, M.; Sterling, T. Indiana University Pervasive Technology Institute. IUScholar-
 Works. Available online: https://scholarworks.iu.edu/dspace/handle/2022/21675 (accessed on 10 January 2022).