CondenseNeXtV2: Light-Weight Modern Image Classifier Utilizing Self-Querying Augmentation Policies
Journal of Low Power Electronics and Applications — Article

Priyank Kalgaonkar and Mohamed El-Sharkawy *
Department of Electrical and Computer Engineering, Purdue School of Engineering and Technology, Indianapolis, IN 46254, USA; pkalgaon@purdue.edu
* Correspondence: melshark@purdue.edu

Citation: Kalgaonkar, P.; El-Sharkawy, M. CondenseNeXtV2: Light-Weight Modern Image Classifier Utilizing Self-Querying Augmentation Policies. J. Low Power Electron. Appl. 2022, 12, 8. https://doi.org/10.3390/jlpea12010008

Academic Editor: Andrea Calimera. Received: 9 December 2021; Accepted: 21 January 2022; Published: 3 February 2022.

Abstract: Artificial Intelligence (AI) combines computer science and robust datasets to mimic the natural intelligence demonstrated by human beings, aiding in problem-solving and decision-making involving consciousness up to a certain extent. From Apple's virtual personal assistant, Siri, to Tesla's self-driving cars, research and development in the field of AI is progressing rapidly, while privacy concerns surrounding the usage and storage of user data on external servers have further fueled the need for modern, ultra-efficient AI networks and algorithms. The scope of the work presented within this paper focuses on introducing a modern image classifier, a light-weight and ultra-efficient CNN intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. This work is an extension of the award-winning paper entitled 'CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems' published at the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). The proposed neural network, dubbed CondenseNeXtV2, utilizes a new self-querying augmentation policy technique on the target dataset along with adaptation to the latest version of the PyTorch framework and activation functions, resulting in improved efficiency in image classification computation and accuracy. Finally, we deploy the trained weights of CondenseNeXtV2 on NXP BlueBox, an edge device designed to serve as a development platform for self-driving cars, and conclusions are extrapolated accordingly.

Keywords: CondenseNeXt; convolutional neural network; computer vision; embedded systems; edge devices; image classification; CNN; PyTorch

1. Introduction

A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN), which is a subset of Machine Learning (ML), designed to realize and harness the power of Artificial Intelligence (AI) to simulate the natural intelligence demonstrated by living creatures of the Kingdom Animalia. The first CNN algorithm was introduced by Alexey G. Ivakhnenko and V. G. Lapa in 1967 for supervised, deep, feed-forward, multi-layer perceptrons [1]. By the year 2012, data- and computation-intensive CNN architectures began dominating accuracy benchmarks. On 30 September 2012, the AlexNet CNN architecture developed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton [2,3] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [4], which caught the world's attention and eventually became the benchmark for newer CNNs utilizing GPUs to accelerate deep learning [5].
In recent years, CNNs have become a popular cornerstone in the field of computer vision research and development, a multidisciplinary field of computer science that focuses on developing innovative techniques and algorithms to enable machines to perform complex tasks such as image classification and object detection, analogous to the human visual system, for real-world applications such as advanced surveillance systems for malicious activity recognition [6], self-driving cars, AI robots, and Unmanned Aerial Vehicles (UAVs) such as drones [7].
In this paper, we propose a modern image classifier that is light-weight and ultra-efficient, intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. This work is an extension of an award-winning paper entitled 'CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems' published at the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC) [8]. The proposed neural network, dubbed CondenseNeXtV2, utilizes a new self-querying augmentation policy technique on a given target dataset along with adaptation to the latest version of the PyTorch framework at the time of writing this paper, resulting in improved top-1 accuracy in image classification performance, which we observe through experiments by deploying the trained weights of CondenseNeXtV2 for real-time image classification on the NXP BlueBox development platform [9].

2. Background and Literature Review

2.1. Artificial Neural Networks

Artificial Neural Networks (ANNs) take inspiration from the human brain. A human brain is a complex and non-linear processing system capable of retraining and reorganizing its crucial structural components, known as neurons, to perform complex operations such as image classification and object detection much faster than any general-purpose computer in existence today. ANNs are developed using a common programming language, trained, and then deployed on computer hardware capable of executing objective-specific tasks, or deployed to run in simulation on a computer.
ANNs function in a similar way by:
1. Extracting knowledge using neurons and,
2. Storing the extracted information with the help of inter-neuron connection strengths known as synaptic weights.

This process of learning by reorganizing synaptic weights in an artificial neural network is called a learning algorithm. Figure 1 below provides a visual representation of a nonlinear model of a neuron in an ANN.

Figure 1. A visual representation of a nonlinear model of a neuron in an ANN.

In Figure 1, synaptic weights are given as inputs to the neurons of an ANN and may have positive or negative values. A summing junction functions as an adder to linearly combine all inputs with respect to the corresponding synaptic weights of the neuron, and a continuously differentiable linear or non-linear activation function such as Sigmoid, Tanh, ReLU, etc. is used to decide whether the neuron should be activated or not.
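As a concrete illustration of the neuron model in Figure 1, the short sketch below computes a weighted sum of the inputs, adds a bias, and passes the result through an activation function. The weights, bias, and input values are arbitrary example numbers and are not taken from the paper.

```python
import torch

def neuron(x, w, b, activation=torch.sigmoid):
    """Nonlinear neuron model: summing junction followed by an activation function."""
    v = torch.dot(w, x) + b          # linear combination of inputs and synaptic weights
    return activation(v)             # activation function decides the neuron's output

# Example values (arbitrary, for illustration only)
x = torch.tensor([0.5, -1.2, 3.0])   # inputs
w = torch.tensor([0.8, 0.1, -0.4])   # synaptic weights
b = torch.tensor(0.2)                # bias term
print(neuron(x, w, b))               # the neuron's activation, a value in (0, 1)
```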
2.2. Representation Learning in Multi-Layer Feed-Forward Neural Networks

Multi-layer feed-forward neural networks are one of the most popular and widely used types of artificial neural network architectures, in which the neurons make use of a learning algorithm to train on a target dataset.

Multi-layer feed-forward neural networks can have one or more hidden layers, which determine the depth of a neural network. Here, the expression hidden signifies that a particular part of an ANN is not directly accessible to the input or output nodes of the ANN. Therefore, as the depth increases, the ANN will possess even more synaptic connections between neurons to extract meaningful information and adjust its weights. Such a network architecture is also commonly known as a multilayer perceptron, which utilizes a backpropagation algorithm during the training phase.

The work proposed within this paper is based on fundamental principles of multi-layer feed-forward neural networks for supervised learning in the OpenCV realm. In supervised machine learning, the mapping between input variables (x) and output variables (y) is learned. It infers a function from a labeled dataset known as the ground truth. The idea is to train and learn hundreds of classes from a variety of different images containing noise and irregularities, which can be a challenge for traditional image classification algorithms. Figure 2 below provides a visual representation of these images. The work presented within this paper utilizes the AutoAugment [10] data augmentation technique to overcome this issue, as discussed in the following sections of this paper.

Figure 2. Sample of 32 × 32 resolution images containing noise and other irregularities.

To address the challenges of performance and efficiency of deep neural network architectures, in 2016, a team of researchers and engineers of the Facebook AI Research (FAIR) lab introduced PyTorch and released it under the Modified BSD license for fair use [11].
PyTorch is a scientific computing framework and an open-source machine learning library built upon the Torch library, which is also a scientific computing framework and an open-source machine learning library, introduced in 2002 [12]. Some of the important features of PyTorch include:
1. Tensor computation accelerated by GPU(s).
2. Automatic differentiation using tape-based autograd.

PyTorch is a popular choice of deep learning framework for commercial applications such as autonomous vehicles, AI applications, etc. Tesla Motors, a leader in the autonomous (self-driving) vehicle industry, utilizes PyTorch for their famous Autopilot system, which is a suite of SAE Level 2 Advanced Driver-Assistance System (ADAS) features, by utilizing advanced computing hardware for deep learning, computer vision and sensor fusion [13]. The proposed CNN within this paper is based on the latest stable release of PyTorch version 1.10.0 and CUDA toolkit version 10.2 [14].
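The two features listed above can be seen in a few lines of PyTorch; the snippet below is a minimal illustration (the tensors and the function being differentiated are arbitrary examples, not part of the paper).

```python
import torch

# Tensor computation, optionally accelerated by a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, device=device, requires_grad=True)

# Tape-based autograd: operations on x are recorded and differentiated automatically
y = (x ** 2).sum()
y.backward()
print(x.grad)  # dy/dx = 2x, computed by the autograd tape
```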
2.3. Evolution of CondenseNeXt

Xie et al. introduced the ResNeXt [15] CNN architecture, which utilizes an innovative technique to skip one or more layers, known as identity shortcut connections. This allowed ResNeXt to train on large amounts of data with unparalleled efficiency at the time.

In the following year, in 2017, Huang et al. introduced the DenseNet [16] CNN architecture, which utilizes a novel technique to connect each convolutional layer to the other succeeding layers in a feed-forward fashion. This facilitates the reuse of synaptic weights by providing the outputs of the current layer as inputs to every succeeding layer, resulting in extraction of information at different levels of coarseness. DenseNet provides a baseline for the CondenseNet CNN architecture.

In 2018, Huang et al. introduced CondenseNet [17], which was an improvement over the DenseNet CNN. In this CNN architecture, the authors proposed pruning several layers and blocks and replacing standard convolutions with group convolutions (G). Furthermore, they proposed a technique to learn mapping patterns between channels and groups by connecting layers from different groups to each other directly, thus preventing individual groups of channels from training separately. The authors collectively called this technique learned group convolutions. Figure 3 below provides a visual comparison of dense blocks in DenseNet vs. CondenseNet CNN.

Figure 3. Comparison of dense blocks in DenseNet vs. CondenseNet CNN.

CondenseNet performs a 1 × 1 standard group convolution. The major drawback of implementing such a convolution is that it does not parse through each channel's spatial dimensions and therefore misses out on fine details, leading to a loss in efficiency and overall accuracy of the network.
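To make the drawback concrete, the snippet below shows a 1 × 1 group convolution of the kind CondenseNet uses: each group of input channels is combined independently, and because the kernel is 1 × 1 it never looks at a channel's spatial neighborhood. The channel and group counts here are illustrative only.

```python
import torch
import torch.nn as nn

# 1 x 1 group convolution: 32 input channels split into 4 groups of 8,
# each group mapped to 16 of the 64 output channels independently.
group_conv = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1, groups=4, bias=False)

x = torch.randn(1, 32, 8, 8)
print(group_conv(x).shape)        # torch.Size([1, 64, 8, 8])
print(group_conv.weight.shape)    # torch.Size([64, 8, 1, 1]) -- 8 = 32 / 4 channels per group
```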
To address these issues, Kalgaonkar and El-Sharkawy introduced CondenseNeXt [8] in 2021. CondenseNeXt refers to the next dimension of cardinality. In this CNN architecture, the authors propose replacing standard grouped convolutions with depthwise separable convolutions [18], which allows the network to extract information by transforming an image only once and then elongating this transformed image to n number of channels, instead of simply transforming the image n number of times, resulting in a boost in computational efficiency and fewer Floating-Point Operations (FLOPs) at the training stage.
3. CondenseNeXtV2 Architecture

In this section, a new CNN architecture, called CondenseNeXtV2, is proposed. CondenseNeXtV2 is inspired by and is an improvement over the CondenseNeXt CNN. The primary goal of CondenseNeXtV2 is to further improve the performance and top-1 accuracy of the network. This section describes the architecture in detail.

3.1. Convolutional Layers

CondenseNeXtV2 utilizes depthwise separable convolution, which enhances the efficiency of the network without having to transform an image over and over again, resulting in a reduction in the number of FLOPs and an increase in computational efficiency in comparison to standard group convolutions. Depthwise separable convolution is comprised of two distinct layers:

• Depthwise convolution: This layer acts like a filtering layer which applies a depthwise convolution to a single input channel. Consider an image of size X × X × I given as an input, with I the number of input channels of the input image and kernels (K) of size K × K × 1; the output of this depthwise convolution operation will be Y × Y × I, resulting in a reduction of spatial dimensions whilst preserving the I number of channels (depth) of the input image. The cost of computation for this operation can be mathematically represented as follows:

Ŷ_(k,l,m) = Σ_(i,j) K̂_(i,j,m) ∗ X_(k+i−1, l+j−1, m)   (1)
• Pointwise convolution: This layer acts like a combining layer which performs a linear combination of all outputs generated by the previous layer, since the depth of the input image from the previous operation hasn't changed. The pointwise convolution operation is performed using a 1 × 1 kernel, i.e., 1 × 1 × K, resulting in an output image of size Y × Y × K. The cost of computation for this operation can be mathematically represented as follows:

Y_(k,l,n) = Σ_m K̃_(m,n) ∗ Ŷ_(k−1, l−1, m)   (2)

A depthwise separable convolution splits a kernel into two distinct layers for filtering and combining operations, resulting in an improvement in overall computational efficiency. Figure 4 provides a 3D visual representation of the depthwise separable convolution operation, where a single image is transformed only once and then this transformed image is elongated to over 128 channels. This allows the CNN to extract information at different levels of coarseness. For example, consider the green channel of an RGB image being processed: depthwise separable convolution will extract different shades of the color green from this channel.

Figure 4. A 3D visual representation of depthwise separable convolution for a given example.
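For readers who want to map the two operations above onto code, the following is a minimal PyTorch sketch of a depthwise separable convolution block: a depthwise convolution with groups equal to the number of input channels, followed by a 1 × 1 pointwise convolution. The channel sizes and kernel size are illustrative and are not the exact CondenseNeXtV2 configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (filtering) + pointwise (combining) convolution, as in Equations (1) and (2)."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # Depthwise: one K x K filter per input channel (groups=in_channels keeps channels separate)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels, bias=False)
        # Pointwise: 1 x 1 convolution linearly combines the filtered channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Example: elongate a 3-channel 32 x 32 image to 128 channels (illustrative sizes)
block = DepthwiseSeparableConv(3, 128)
out = block(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```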
3.2. Model Compression

3.2.1. Network Pruning

Network pruning is one of the effective ways to reduce computational costs by discarding redundant and insignificant weights that do not affect the overall performance of a DNN. Figure 5 provides a visual comparison of weight matrices for different pruning approaches, as follows:

1. Fine-grained pruning: using this approach, redundant weights that have minimal influence on accuracy are pruned. However, this approach results in irregular structures.
2. Coarse-grained pruning: using this approach, an entire filter which has minimal influence on overall accuracy is discarded. This approach results in regular structures. However, if the network is highly compressed, there may be a significant loss in the overall accuracy of the DNN.
3. 2D group-level pruning: this approach maintains a balance of the benefits and trade-offs of both fine-grained and coarse-grained pruning.

Figure 5. Comparison of different pruning methods.

CondenseNeXtV2 utilizes 2D group-level pruning to discard insignificant filters, complemented by L1-normalization [19] to facilitate the group-level pruning process.

3.2.2. Class-Balanced Focal Loss (CFL) Function

Pruning networks in order to discard insignificant and redundant elements of the network can still have an overall harsh effect caused by imbalanced weights. Therefore, in order to mitigate any such issues, CondenseNeXtV2 incorporates a weighting factor inversely proportional to the total number of samples, called the Class-Balanced Focal Loss (CFL) function [20]. This approach can be mathematically represented as follows:

CFL(z, y) = −Σ_(i=1)^(C) (1 − p_i^t)^γ ∗ log(p_i^t)   (3)
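A sketch of how such a loss can be computed is given below. It implements the focal term of Equation (3) together with a per-class weight inversely proportional to the class sample count; the exact weighting scheme used by CondenseNeXtV2 is described in [20], so treat the class counts, gamma value, and weighting formula here as illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def class_balanced_focal_loss(logits, targets, samples_per_class, gamma=2.0):
    """Focal loss weighted inversely to per-class sample counts (illustrative sketch)."""
    # Per-class weights: classes with fewer samples receive larger weights
    weights = 1.0 / torch.as_tensor(samples_per_class, dtype=torch.float32)
    weights = weights / weights.sum() * len(samples_per_class)   # normalize around 1

    log_p = F.log_softmax(logits, dim=1)                 # log p_i
    p_t = log_p.gather(1, targets.unsqueeze(1)).exp()    # probability of the true class
    focal = (1.0 - p_t).pow(gamma).squeeze(1)            # (1 - p_t)^gamma term of Eq. (3)
    ce = F.nll_loss(log_p, targets, weight=weights, reduction="none")
    return (focal * ce).mean()

# Example: 10 classes with imbalanced sample counts (arbitrary numbers)
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
counts = [5000, 5000, 2000, 2000, 1000, 1000, 500, 500, 100, 100]
print(class_balanced_focal_loss(logits, targets, counts))
```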
3.2.3. Cardinality

A new dimension, called cardinality and denoted by D, is added to the existing spatial dimensions of the CondenseNeXtV2 network to prevent loss in overall accuracy during the pruning process. Increasing the cardinality rate is a more effective way to obtain a boost in accuracy than going wider or deeper within the network. Cardinality can be mathematically represented as follows:

G ∗ Gx = X ∗ D − p · X   (4)
3.3. Data Augmentation

Data augmentation is a popular technique to increase the amount of data and its diversity by modifying already existing data and adding newly created data to existing data in order to improve the accuracy of modern image classifiers. CondenseNeXtV2 utilizes the AutoAugment data augmentation technique to pre-process the target dataset and determine the best augmentation policy with the help of Reinforcement Learning (RL). AutoAugment primarily consists of the following two parts:

• Search Space: Each policy consists of five sub-policies and each sub-policy consists of two operations that can be performed on an image in sequential order. Furthermore, each image transformation operation consists of two distinct parameters as follows:
  o The probability of applying the image transformation operation to an image.
  o The magnitude with which the image transformation operation should be applied to an image.
There are a total of 16 image transformation operations. Fourteen operations belong to PIL (Python Imaging Library), which provides the Python interpreter with image editing capabilities such as rotating, solarizing, color inverting, etc., and the remaining two operations are Cutout [21] and SamplePairing [22]. Table 6 in the 'Supplementary materials' section of [10] provides a comprehensive list of all 16 image transformation operations along with the default range of magnitudes.
• Search Algorithm: AutoAugment utilizes a controller RNN, a one-layer LSTM [23] containing 100 hidden units and 2 × 5B softmax predictions, to determine the best augmentation policy. It examines the generalization ability of a policy by performing child model (a neural network being trained during the search process) experiments on a subset of the corresponding dataset. Upon completion of these experiments, a reward signal is sent to the controller to update validation accuracy using a popular policy gradient method called the Proximal Policy Optimization (PPO) algorithm [24] with a learning rate of 0.00035 and an entropy penalty with a weight of 0.00001. The controller RNN provides a decision in terms of a softmax function which is then fed into the controller's next stage as an input, thus allowing it to determine the magnitude and probability of the image transformation operation. At the end of this search process, sub-policies from the top five best policies are merged into one best augmentation policy with 25 sub-policies for the target dataset.

3.4. Activation Function

An activation function in a neural network plays a crucial role in determining the output of a node by limiting the output's amplitude. These functions are also known as transfer functions. Activation functions help the neural network learn complex patterns of the input data more efficiently whilst requiring fewer neurons.

CondenseNeXtV2 utilizes ReLU6 activation functions along with batch normalization before each layer. The ReLU6 activation function caps units at 6, which helps the neural network learn sparse features quickly in addition to preventing gradients from increasing infinitely, as seen in Figure 6. The ReLU6 activation function can be mathematically defined as follows:

f(x) = min(max(0, x), 6)   (5)
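Since PyTorch ships ReLU6 as a built-in module, Equation (5) can be checked in a couple of lines, as in the sketch below (the input values are arbitrary).

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, 0.5, 4.0, 9.0])
relu6 = nn.ReLU6()
print(relu6(x))                   # tensor([0.0000, 0.5000, 4.0000, 6.0000])
print(torch.clamp(x, 0.0, 6.0))   # same result: min(max(0, x), 6) written as a single clamp
```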
Figure 6. ReLU vs. ReLU6 activation functions.

4. NXP BlueBox 2.0 Embedded Development Platform

This section provides a brief introduction to the NXP BlueBox 2.0 (Gen2) embedded development platform for Automotive High-Performance Compute (AHPC). It is a series of development platforms designed for self-driving (autonomous) vehicles.

4.1. System Design of the NXP BlueBox Gen2 Family

The BlueBox 2.0 embedded development platform series is designed and developed by NXP Semiconductors. First launched in 2016 as a generation 1 series, the target application has been self-driving (autonomous) vehicles ever since. It was introduced to the general public in Austin, Texas, USA. Shortly thereafter, NXP introduced an improved series of BlueBox called BlueBox 2.0, the second generation of this series. It features the following three ARM-based automotive processors [9]:

1. S32V234 (S32V), dedicated to computer vision processing tasks: This automotive processor is based on a four-core ARM Cortex A53 architecture and operates at 1.0 GHz. An additional ARM Cortex M4 CPU provides further support and functionality for reliability and workload sharing. It is complemented by 4 MB of SRAM and an additional 32-bit LPDDR3 controller to support additional memory if required by the developers. It also boasts an Image Signal Processor (ISP) on-board the SoC for noise filtering, noise reduction, etc.
2. LS2084A (LS2), dedicated to high-performance computing tasks: This automotive processor is based on an eight-core ARM Cortex A72 architecture and operates at 1.80 GHz. It is complemented by 144 bytes of DDR4 RAM with support enabled for the LayerScape LX2 family. NXP claims LS2 provides 15 years of reliable service and meets Q100 grade 3 automotive standards.
3. S32R274 (S32R), dedicated to radar information processing tasks: This automotive processor is based on a dual-core Freescale PowerPC e200z4 architecture and operates at 120 MHz, in addition to a checker core on-board this System-on-Chip (SoC). It features 2.0 MB of flash memory and 1.5 MB of SRAM.

NXP BlueBox 2.0 adheres to automotive compliance and safety standards such as ASIL-B/C and D to ensure operability and reliability in all real-world conditions. Figure 7 below provides a high-level system architecture view of NXP BlueBox 2.0 [9].
Figure 7. A high-level system architecture view of NXP BlueBox 2.0.

Figure 8 below provides a top-frontal view of the NXP BlueBox 2.0 automotive embedded development platform.

Figure 8. NXP BlueBox 2.0.
4.2. RTMaps (Real-Time Multi-Sensor Applications) Remote Studio Software

RTMaps (Real-Time Multi-Sensor Applications) Remote Studio Software is a component-based software development and execution environment made by Intempora to facilitate development of autonomous driving and related automotive applications [25]. It allows users to link various sensors and hardware components by providing a host of component libraries for automotive sensors, buses, and perception algorithm design, as well as support for almost any type of sensor and any number of those sensors.

RTMaps can run on Windows and Linux operating systems and provides support for the PyTorch and TensorFlow deep learning frameworks. In RTMaps, programming is done using the Python scripting language and then deployed for real-time inference on NXP BlueBox 2.0.

5. Cyberinfrastructure

This section provides details on the hardware and software utilized for training and testing the proposed CondenseNeXtV2 CNN architecture, from which results are extrapolated accordingly in the following sections.

5.1. Cyberinfrastructure for Training the Proposed CNN

Cyberinfrastructure for training is provided and managed by the Research Technologies division at Indiana University, in part by Shared University Research grants from IBM Inc. to Indiana University and Lilly Endowment Inc. through its support for the Indiana University Pervasive Technology Institute [26]. It is as follows:
• Intel Xeon Gold 6126 12-core CPU with 32 GB RAM.
• NVIDIA Tesla V100 GPU.
• CUDA Toolkit 10.2.
• Python version 3.7.9.
• PyTorch version 1.10.

5.2. Cyberinfrastructure for Testing the Proposed CNN

Cyberinfrastructure for testing is provided and managed by the Internet of Things (IoT) Collaboratory at the Purdue School of Engineering and Technology at Indiana University Purdue University at Indianapolis, as follows:
• NXP BlueBox 2.0 embedded development platform.
• Intempora RTMaps Remote Studio version 4.9.0.
• Python version 3.6.7.
• PyTorch version 1.10.

6. Experiments and Results

This section provides information on experiment results based on extensive training and testing of the proposed convolutional neural network, CondenseNeXtV2, on the CIFAR-10 and CIFAR-100 benchmarking datasets to verify real-time image classification performance on NXP BlueBox 2.0. CondenseNeXtV2 has been adapted to be compatible with the latest stable release of PyTorch version 1.10.0 and CUDA toolkit version 10.2 [14], whereas the CondenseNet and CondenseNeXt CNNs are based on PyTorch versions 1.1.0 and 1.9.0, respectively.

6.1. CIFAR-10 Dataset

CIFAR-10 is a popular computer-vision dataset used for image classification and object detection. It consists of 60,000 32 × 32 RGB color images equally divided into 10 object classes with 6,000 images per class. The CIFAR-10 dataset is split into two sets: 50,000 images for training a CNN and 10,000 images for testing the performance of trained model weights. The training and testing sets are mutually exclusive to obtain fair evaluation results.
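The paper does not spell out the exact data pipeline, but a typical way to combine CIFAR-10 with the AutoAugment policy described in Section 3.3 using torchvision (the companion library of PyTorch) is sketched below; the normalization constants and auxiliary crop/flip transforms follow common practice and are assumptions, while the batch size of 64 matches the training setup described next.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# AutoAugment with the learned CIFAR-10 policy, followed by tensor conversion and normalization.
# The normalization statistics are the commonly used CIFAR-10 values, assumed here for illustration.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.AutoAugment(transforms.AutoAugmentPolicy.CIFAR10),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)
```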
The CondenseNeXtV2 CNN was trained for 200 epochs with a batch size of 64 input images. Since the learning algorithm for CondenseNet, CondenseNeXt and CondenseNeXtV2 is unique to each network, hyperparameters such as the growth and learning rate for training these CNNs differ. The Python scripting language was utilized to develop an image classification algorithm for the CIFAR-10 dataset, and the trained weights were then deployed onto the NXP BlueBox 2.0 embedded development platform using the RTMaps Remote Studio software for image classification performance evaluation.

Table 1 provides an overview of the testing results in terms of FLOPs, number of trainable parameters, Top-1 accuracy and inference time when evaluated on NXP BlueBox 2.0. Top-1 accuracy corresponds to the one class predicted by the CNN with the highest probability of matching the expected answer, also known as the ground truth. Figure 9 provides an overview of the model performance curves and Figure 10 provides a screenshot of the single image classification performance of CondenseNeXtV2 when evaluated on NXP BlueBox 2.0 using RTMaps Remote Studio software on a Linux PC.

Table 1. Comparison of CIFAR-10 single image classification performance.

CNN | FLOPs (in Million) | Parameters (in Million) | Top-1 Accuracy | Inference Time (in Seconds)
CondenseNet | 65.81 | 0.52 | 94.69% | 0.1346
CondenseNeXt | 23.80 | 0.16 | 92.28% | 0.0659
CondenseNeXtV2 | 23.80 | 0.16 | 93.57% | 0.0528

Figure 9. Performance curves of CondenseNeXtV2 vs. CondenseNet and CondenseNeXt. These CNN models are trained on the CIFAR-10 dataset for 200 epochs each.

Figure 10. NXP BlueBox 2.0 single image classification prediction of a horse as an input image, observed in RTMaps Remote Studio software.
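The inference times in Table 1 are for a single image. The paper runs this step inside an RTMaps Python component on the BlueBox; the RTMaps-specific scaffolding is omitted here, but the core of such a measurement in plain PyTorch might look like the sketch below, where the checkpoint file name, input image, and model object are hypothetical placeholders rather than artifacts released with the paper.

```python
import time
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical checkpoint containing the trained CondenseNeXtV2 model object
model = torch.load("condensenextv2_cifar10.pth", map_location="cpu")
model.eval()

preprocess = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
image = preprocess(Image.open("horse.png").convert("RGB")).unsqueeze(0)  # hypothetical input image

with torch.no_grad():
    start = time.perf_counter()
    logits = model(image)                    # single-image forward pass
    elapsed = time.perf_counter() - start    # inference time for one image

classes = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
print(f"Predicted: {classes[logits.argmax(dim=1).item()]}  ({elapsed:.4f} s)")
```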
6.2. CIFAR-100 Dataset

CIFAR-100 is another popular computer-vision dataset used for image classification and object detection. It also consists of 60,000 32 × 32 RGB color images, equally divided into 100 object classes with 600 images per class. The CIFAR-100 dataset is split into two sets: 50,000 images for training a CNN and 10,000 images for testing the performance of trained model weights. As with the CIFAR-10 dataset, the training and testing sets are also mutually exclusive in this dataset to obtain fair evaluation results.

The CondenseNeXtV2 CNN was trained for 240 epochs with a batch size of 64 input images. Hyperparameters such as the growth and learning rate for training these CNNs differ because the learning algorithm for CondenseNet, CondenseNeXt and CondenseNeXtV2 is unique to each network. The Python scripting language was utilized to develop an image classification algorithm for the CIFAR-100 dataset, and the trained weights were then deployed onto the NXP BlueBox 2.0 embedded development platform using the RTMaps Remote Studio software for image classification performance evaluation.

Table 2 provides an overview of the testing results in terms of FLOPs, number of trainable parameters, Top-1 accuracy and inference time when evaluated on NXP BlueBox 2.0. Figure 11 provides an overview of the model performance curves and Figure 12 provides a screenshot of the single image classification performance of CondenseNeXtV2 when evaluated on NXP BlueBox 2.0 using RTMaps Remote Studio software on a Linux PC.
Table 2. Comparison of CIFAR-100 single image classification performance.

CNN | FLOPs (in Million) | Parameters (in Million) | Top-1 Accuracy | Inference Time (in Seconds)
CondenseNet | 65.85 | 0.55 | 76.65% | 0.2483
CondenseNeXt | 26.38 | 0.22 | 74.87% | 0.1125
CondenseNeXtV2 | 26.38 | 0.22 | 75.54% | 0.0972

Figure 11. Performance curves of CondenseNeXtV2 vs. CondenseNet and CondenseNeXt. These CNN models are trained on the CIFAR-100 dataset for 240 epochs each.

Figure 12. NXP BlueBox 2.0 single image classification prediction of a train as an input image, observed in RTMaps Remote Studio software.
7. Conclusions

The scope of the work presented within this paper focuses on introducing a modern image classifier named CondenseNeXtV2, which is light-weight and ultra-efficient and is intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. The proposed CNN utilizes a new self-querying augmentation policy technique on a target dataset along with adaptation to the latest version of the PyTorch framework and activation functions, resulting in improved efficiency in image classification computation and accuracy.

CondenseNeXtV2 was extensively trained from scratch and tested on two benchmarking datasets, CIFAR-10 and CIFAR-100, in order to verify its single image classification performance. The trained weights of CondenseNeXtV2 were then deployed on NXP BlueBox, an edge device designed to serve as a development platform for self-driving cars, and the results were extrapolated and compared to its baseline architecture, CondenseNet, accordingly. CondenseNeXtV2 demonstrates an increase in Top-1 accuracy over CondenseNeXt, and utilizes fewer FLOPs and fewer trainable parameters in comparison to CondenseNet, as observed during the training and testing phases.

Author Contributions: Conceptualization, P.K. and M.E.-S.; methodology, P.K.; software, P.K.; validation, P.K. and M.E.-S.; writing—original draft preparation, P.K.; writing—review and editing, P.K. and M.E.-S.; supervision, M.E.-S. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the fact that all the authors work at the same institution and there is no necessity to create a repository for data exchange.

Acknowledgments: The authors would like to acknowledge the Indiana University Pervasive Technology Institute for providing supercomputing and storage resources, as well as the Internet of Things (IoT) Collaboratory at the Purdue School of Engineering and Technology at Indiana University Purdue University at Indianapolis for providing the autonomous vehicle development platform, both of which contributed to the research results reported within this paper.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Berners-Lee, C.M. Cybernetics and Forecasting. Nature 1968, 219, 202–203. [CrossRef]
2. Gershgorn, D. ImageNet: The Data that Spawned the Current AI Boom, QUARTZ. Available online: https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/ (accessed on 10 January 2022).
3. Alex, K.; Sutskever, I.; Geoffrey, H. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [CrossRef]
4. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [CrossRef]
5. Deshpande, A. Engineering at Forward | UCLA CS '19. Available online: https://adeshpande3.github.io/ (accessed on 10 January 2022).
6. Dimitriou, N.; Kioumourtzis, G.; Sideris, A.; Stavropoulos, G.; Taka, E.; Zotos, N.; Leventakis, G.; Tzovaras, D. An integrated framework for the timely detection of petty crimes. In Proceedings of the 2017 European Intelligence and Security Informatics Conference, Athens, Greece, 11–13 September 2017. [CrossRef]
7. Lotfi, A.; Bouchachia, H.; Gegov, A.; Langensiepen, C.; McGinnity, M. (Eds.) Advances in Computational Intelligence Systems; Springer: Berlin/Heidelberg, Germany, 2018; p. 840. [CrossRef]
8. Kalgaonkar, P.; El-Sharkawy, M. CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems. In Proceedings of the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference, CCWC 2021, Las Vegas, NV, USA, 27–30 January 2021; pp. 524–528. [CrossRef]
9. Cureton, C.; Douglas, M. Bluebox Deep Dive—NXP's AD Processing Platform; NXP: Eindhoven, The Netherlands, 2019; p. 28.
10. Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Policies from Data. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 113–123.
11. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037.
12. Collobert, R.; Kavukcuoglu, K.; Farabet, C. Torch7: A Matlab-like Environment for Machine Learning. NIPS. 2011. Available online: https://infoscience.epfl.ch/record/192376/files/Collobert_NIPSWORKSHOP_2011.pdf (accessed on 10 January 2022).
13. Fridman, L.; Ing, L.D.; Jenik, B.; Reimer, B. Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–20 June 2019. [CrossRef]
14. PyTorch. PyTorch 1.10 Release, including CUDA Graphs APIs, Frontend and Compiler Improvements. Available online: https://pytorch.org/blog/pytorch-1.10-released/ (accessed on 10 January 2022).
15. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [CrossRef]
16. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [CrossRef]
17. Huang, G.; Liu, S.; van der Maaten, L.; Weinberger, K.Q. CondenseNet: An Efficient DenseNet using Learned Group Convolutions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 2752–2761. [CrossRef]
18. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [CrossRef]
19. Wu, S.; Li, G.; Deng, L.; Liu, L.; Wu, D.; Xie, Y.; Shi, L. L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2043–2051. [CrossRef] [PubMed]
20. Cui, Y.; Jia, M.; Lin, T.Y.; Song, Y.; Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9268–9277. [CrossRef]
21. DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552.
22. Inoue, H. Data Augmentation by Pairing Samples for Images Classification. arXiv 2018, arXiv:1801.02929.
23. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef] [PubMed]
24. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Openai, O.K. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
25. IUPUI. Speed Is Key to Safety—dSPACE. Available online: https://www.dspace.com/en/inc/home/applicationfields/stories/iupui-speed-is-key-to-safety.cfm (accessed on 10 January 2022).
26. Stewart, C.A.; Welch, V.; Plale, B.; Fox, G.; Pierce, M.; Sterling, T. Indiana University Pervasive Technology Institute. IUScholarWorks. Available online: https://scholarworks.iu.edu/dspace/handle/2022/21675 (accessed on 10 January 2022).