TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER

Page created by Samuel Hughes
 
CONTINUE READING
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
Feature

                       TinyML-Enabled
                    Frugal Smart Objects:
                       Challenges and
                        Opportunities
                                                Ramon Sanchez-Iborra and Antonio F. Skarmeta

                                                                                                                                                          ©ISTOCKPHOTO.COM/MONSITJ

4        IEEE CIRCUITS AND SYSTEMS MAGAZINE                                1531-636X/20©2020IEEE                                       THIRD QUARTER 2020

    Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
Frugal objects are independent “things” specialized
      Abstract
                                                                                in perform concrete operations that can be programmed
     The TinyML paradigm proposes to integrate Machine Learning
     (ML)-based mechanisms within small objects powered by Mi-                  with software rather than requiring specific electron-
     crocontroller Units (MCUs). This paves the way for the develop-            ics for each task, so they can be massively produced at
     ment of novel applications and services that do not need the               inexpensive cost. These devices incorporate enough
     omnipresent processing support from the cloud, which is power
                                                                                processing and memory resources, sensors, and con-
     consuming and involves data security and privacy risks. In this
     work, a comprehensive review of the novel TinyML ecosys-                   nectivity capabilities to perform the required operation,
     tem is provided. The related challenges and opportunities are              while being highly energy efficient. A concise definition
     identified and the potential services that will be enabled by the          of types of MCU-based devices according to their hard-
     development of truly smart frugal objects are discussed. As a
                                                                                ware resources is given in [8].
     main contribution of this paper, a detailed survey of the available
     ­TinyML frameworks for integrating ML algorithms within MCUs                  The penetration of frugal objects in our lives due to
      is provided. Besides, aiming at illustrating the given discussion,        the emergence of the Internet of Things (IoT) is skyrock-
      a real case study is presented. Concretely, we propose a multi-           eting, thus provisioning them with cognitive skills opens
      Radio Access Network (RAT) architecture for smart frugal ob-
                                                                                a plethora of opportunities for developing a widespread
      jects. The issue of selecting the most adequate communication
      interface for sending sporadic messages considering both the              web of collective intelligence (Fig. 1) [9]. Different para-
      status of the device and the characteristics of the data to be            digms such as fog or edge computing are fueling this
      sent is addressed. To this end, several TinyML frameworks are             hyper-connected distributed scheme supported by the
      evaluated and the performances of a number of ML algorithms
                                                                                new wave of communication technologies, which permit
      embedded in an Arduino Uno board are analyzed. The attained
      results reveal the validity of the TinyML approach, which suc-            coordinating end-devices efforts to place intelligence
      cessfully enables the integration of techniques such as Neural            close to the users [10]. Nevertheless, given the power
      Networks (NNs), Support Vector Machine (SVM), decision trees,             constraints of MCUs, reducing data transmissions is cru-
      or Random Forest (RF) in frugal objects with constrained hard-
                                                                                cial for their sustainability as this type of operations are
      ware resources. The outcomes also show promising results in
      terms of algorithm’s accuracy and computation performance.                more power demanding than computational tasks [11].
                                                                                   In this line, much data captured by sensors are
                                                                                wasted because of transporting costs, bandwidth limita-
                            I. Introduction                                     tions, or power constraints. For those reasons, perform-

    M
              achine Learning (ML) has revolutionized our                       ing on-device ML processing is a buzz trend today. The
              way of understanding the world by enabling the                    TinyML community [12] is boosting the integration of ML
              inference of valuable information and knowledge                   within MCU-based smart frugal objects. In fact, forecasts
     from huge amount of data, so far invisible to human eye.                   predict that the global edge computing market will reach
     Data are produced by our digital activity and grow at al-                  1.12 trillion dollars by 2023 [13] and some strong compa-
     most unlimited rates [1], hence calling for large pools of                 nies such as Ericsson are already offering TinyML-as-a-
     resources in terms of processing power and storage for                     Service solutions [14]. In this ­paper, we provide a compre-
     their handling and analysis. In this regard, research and                  hensive overview of current challenges and opportunities
     innovation advances during the last years have been
     notable in multiple areas of study [2]–[5]. These efforts
     have focused on obtaining a better response from the
     wide range of available ML techniques, e.g., Deep Learn-
     ing (DL), Neural Networks (NNs), Reinforcement Learning
     (RL), clustering, etc., by exploiting the potential offered
     by state-of-the-art processors, data-centers, and super-
     computers [6]. However, a non-negligible segment of
     processing entities has received scarce attention due to
     their inherent computing constraints: Microcontroller
     Units (MCUs). This tiny processing entities are usually
     embedded in daily-use appliances such as vehicles, medi-
     cal devices, personal gadgets, etc., but they may also con-
     stitute the core processing unit of small frugal objects [7].

     Digital Object Identifier 10.1109/MCAS.2020.3005467
                                                                                  Figure 1. Interconnected smart services.
     Date of current version: 12 August 2020

     Ramon Sanchez-Iborra and Antonio F. Skarmeta are with the University of Murcia, Murcia, Spain, (ramonsanchez@um.es; skarmeta@um.es).

     THIRD QUARTER 2020 		                                                                                       IEEE CIRCUITS AND SYSTEMS MAGAZINE            5

Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
opened by this emerging hot-topic. A dissection of the                     ML algorithms within a MCU powered device, and the
      potential services that will benefit from the introduction                 attained results. Finally, Section V summarizes the most
      of ML-powered mechanisms into frugal objects is also                       important findings and draws future research lines.
      presented. Currently available frameworks that per-
      mit to embed ML schemes into MCUs are explored and                                 II. Machine Learning in Frugal Objects
      reviewed. Although some performance evaluations of                         The society digitalization achieved during recent times
      certain frameworks have been reported [15], [16], to the                   has democratized the daily use of a plethora of frugal
      authors’ knowledge there is not any prior work provid-                     objects, e.g., wearables, environmental data collectors,
      ing a comprehensive survey focused on these valuable                       actuators, etc. The integration of intelligence within
      tools. Besides, we elaborate on a specific use case which                  these small devices brings a series of advantages
      is the intelligent communication interface selection in a                  that outweigh the possible cons. In the following, we dis-
      constrained multi-Radio Access Technology (RAT) de-                        cuss these aspects aiming at exploring the possibilities
      vice, by considering the system status as well the char-                   opened by TinyML from a realistic perspective.
      acteristics of the data to be sent. To this end, an archi-
      tectural design of the proposed frugal object and a real                   A. Benefits
      on-device implementation of a ML-based decision-mak-
      ing mechanism are finally showed. Therefore, the main                      1) Energy Efficiency
      contributions of this work are the following:                              This is the first positive point that comes to mind when adopt-
          ■■ A wide discussion of the TinyML concept as well                     ing MCU-based solutions. Whereas powerful processors and
             as the range of potentially benefited applications.                 Graphics Processing Units (GPUs) demand great amount of
          ■■ A survey of the principal available frameworks for                  power, MCUs present a reduced energy consumption, even
             integrating ML algorithms within MCUs.                              working at their maximum workload level. This permits
          ■■ Insights regarding the real implementation of                       MCUs to rely on batteries or even energy harvesting.
             ML-based schemes over frugal objects to solve                       Therefore, smart frugal objects can be placed almost every-
             the problem of intelligent interface selection in a                 where without the need of being plugged to the power grid,
             multi-RAT set up.                                                   which opens the door for novel cognitive itinerant applica-
          The rest of the paper is organized as follows. Sec-                    tions [17]. Furthermore, having a scarce power consumption
      tion II provides a comprehensive overview of the TinyML                    allows the smart unit to be coupled with larger battery-pow-
      landscape. Section III presents a survey of the available                  ered devices, hence converting them into connected smart
      frameworks for integrating ML within frugal objects.                       entities, e.g., personal mobility devices such as scooters or
      Section IV focuses on the specific use case of devices                     segways. Fig. 2 compares the main characteristics of devices
      with multiple communication interfaces, presenting the                     powered either by microcontroller or microprocessor.
      proposed architecture, the implementation process of
                                                                                 2) Low Cost
                                                                                 The simplicity and hardware constraints of frugal objects
                                 Processing Power
                                          High
                                                                                 leads to a moderate cost per unit. Clearly, this has been
                                                                                 one of the main reasons for the quick and fruitful exploi-
                Simplicity                  Medium         Memory
                                                                                 tation of IoT architectures in a large range of segments:
                                                                                 industry, agriculture, e-health, or entertainment, among
                                            Low
                                                                                 others [18]. These deployments consist of a large number
                                                                                 of frugal objects specialized on performing varied tasks.
          Low-Cost                                              Storage
                                                                                 The heterogeneity of employed end-devices, as well as
                                                                                 their reprogrammability bring endless opportunities for
                                                                                 enriching current deployments with intelligence, hence
                                                                                 enabling their evolution into smart and flexible systems.
                Portability                                Multi-RAT

                                                                                 3) System Reliability and Data Security
                                  Energy Efficiency                              As stated above, wireless transmissions require much
                              Microcontroller-Based Device                       energy in comparison with processing tasks. This
                              Microprocessor-Based Device                        ­involves a real need for limiting communication activi-
                                                                                  ties in frugal objects. Although this could be seen as a
        Figure 2. Feature comparison between MCU and micropro-                    negative aspect, it actually allows to increase system’s
        cessor-powered devices.
                                                                                  reliability as well as data security and privacy. Big data

6 	   IEEE CIRCUITS AND SYSTEMS MAGAZINE 		                                                                                         THIRD QUARTER 2020

 Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
applications require to stream raw data over unpredict-                    sets are suitable for straight-through ML integration, which
     able and lossy wireless channels from end-devices to the                   will permit an easy application development with the sup-
     cloud. This strategy is bandwidth consuming and prone                      port of device-level software. In this regard, it is important
     to transmission errors and cyber-attacks, e.g., man-in-                    that future firmwares could reach high reliability levels and
     the-middle or eavesdropping, so that the transmitted                       tackle key aspects such as in-device data security and op-
     data may be lost or compromised. These issues are                          timized transmission operations. In addition, cloud players
     avoided by performing on-device data processing which                      should develop enhanced platforms and APIs to allow ef-
     permits to perform fewer transmissions with aggregated                     ficient data exchanges between objects and cloud. Finally,
     or meaningless data for an attacker.                                       the adaptation of mainstream data-science frameworks
                                                                                to MCU requirements is crucial for a wide expansion of
     4) Latency                                                                 ­TinyML. Not only programming libraries but also lifecycle
     This is another advantage of performing local pro-                          management tools are needed to simplify solution develop-
     cessing instead of offloading computing tasks onto the                      ment, data handling, etc. As discussed in Section III, a first
     cloud. State-of-the-art IoT RATs, e.g., Low Power Wide                      wave of TinyML frameworks and developing tools is already
     Area Network (LPWAN) [19], have notably improved                            available, which will foster the arrival of novel services and
     energy efficiency and transmission distances at the                         applications to the market by attracting new players.
     expense of reducing data-rate. This fact leads to very
     long message’s times-on-air that add undesired delay                       C. Applications
     to the whole operation. Besides, on-device computation                     Many IoT applications are already integrated in our daily
     efficiency of complex ML algorithms is being deeply in-                    personal and professional activities, although many of
     vestigated in order to achieve real-time execution and                     them are prone to be strengthened with local smart pro-
     notable advances are being achieved [20].                                  cessing capabilities (Fig. 3). Most of current IoT devices
                                                                                make an extensive use of network-related operations,
     B. Challenges                                                              which is not always the best option as discussed previ-
     Given the discussed advantages, integrating ML within                      ously. For instance, wearables such as fitness monitors or
     frugal objects also presents important challenges to be                    “smart” watches are not that smart, as they rely on a Blue-
     overcome. These are mostly related to device heteroge-                     tooth-coupled smartphone which assumes the majority
     neity and MCU’s constraints.                                               of processing tasks. Integrating ML within frugal objects
                                                                                permits their independence from a master device, which
     1) Device Heterogeneity                                                    will notably enrich current applications landscape.
     Regarding this aspect, the enormous range of existing
     devices is a notable obstacle for having a general set                     1) eHealth
     of TinyML tools that could be valid for most of MCU-                       This is one of the segments with greater growth potential.
     based set-ups. These units may present different power                     Health monitoring, visual assistance, hearing aids, per-
     consumption, processing capabilities, memory, storage                      sonal sensing, activity recognition, etc., are just few ex-
     capacity, embedded RATs, peripherals, communication                        amples of applications that will exploit truly-smart end-
     ports and protocols, etc., which also hinders the design                   devices. Beyond the health-care perspective, augmented
     of generic benchmarking methodologies in TinyML sys-                       reality, real-time voice recognition, language transla-
     tems [21]. Therefore, the implementation of universal                      tion, context-aware support systems, or precise indoor/
     development and benchmarking frameworks coping                             outdoor positioning are other wearable-based services
     with this heterogeneity is not an easy task although it is                 to come in a near future [23].
     highly necessary for increasing awareness and adoption.
                                                                                2) Smart Spaces
     2) MCU Constraints                                                         ML-enabled frugal objects will also contribute to the
     So far computation and memory constraints have not been                    collective intelligence already present in scenarios such
     a limitation given the quantity and quality of available re-               as smart cities or cognitive buildings, among many
     sources in modern workstations and data centers. Howev-                    others [24]. Current IoT-based monitoring and surveil-
     er, up-scaling computing system’s resources according to                   lance systems, e.g., traffic, pollution, crowdsensing, etc.,
     the processing needs of ML mechanisms is not a valid ap-                   will evolve to smart and autonomous entities capable of
     proach for MCUs [22]. For that reason, TinyML ecosystem                    making decentralized and quick decisions. The deploy-
     players should coordinate effective efforts for achieving a                ment simplicity and independence from the power grid al-
     functional convergence between ML and frugal objects. In                   low things to be placed in remote and rural areas, hence
     this line, MCU manufacturers should ensure that their chip-                creating smart spaces with seamless roaming among them

     THIRD QUARTER 2020 		                                                                                       IEEE CIRCUITS AND SYSTEMS MAGAZINE            7

Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
4) Industry 4.0
                                                                                                             The potential impact of TinyML is
                                                                                                             enormous in industrial and manu-
                                                                                                             facturing sectors as they are expe-
                                                  Smart City
                                                                                                             riencing a digitalization process
                                                                                                             under the umbrella of the Industry
                                                                                                             4.0 revolution [28]. The processing
                                                                                                             tasks in these fields are highly het-
                                                                                                             erogeneous in terms of computing
                    eHealth                                                 Environmental                    or memory requirements, which
                                                                             Monitoring                      causes that MCUs are not always ca-
                                                                                                             pable of running certain tasks. How-
                                                                                                             ever, the integration of ML-based
                                                                                                             Decision Support Systems (DSSs)
                                                                                                             within MCUs may permit them to
                                                                                                             determine whether to assume a cer-
                                                                                                             tain computation task or offloading
                 Industry and                                                                                it onto upper processing layers such
                                                                              Smart Home
                   Business                                                                                  as edge or cloud [29]. Besides, the
                                                                                                             refinement of production process-
                                                                                                             es, decision making, asset tracking,
                                                                                                             etc. will be driven by the integration
                                                 Vehicular                                                   of intelligence along the production
                                                                                                             chain as well.

                                                                                                         5) Smart Agriculture
                                                                                                         and Farming
        Figure 3. TinyML ecosystem.
                                                                                                         These sectors are also under a
                                                                                                         deep digital transformation [30]. By
      [25]. Besides, the reduced cost of end-devices will also fos-              adopting ML-based smart monitoring and control, both of
      ter their adoption in disadvantaged areas, which may help                  them will increment the efficiency and health of crops and
      to revamp local economy and business activities [26].                      animals, which will lead to greater revenues due to the im-
                                                                                 provement of production quality. Finally, common daily ac-
      3) Vehicular Services                                                      tivities and home appliances will also reach the next level
      Thanks to the alliance between ML and MCUs, the vehicu-                    with the support of ML-enabled devices. For example, bio-
      lar market and, concretely, the Cooperative—Intelligent                    metric e-locks will be generalized as well as context-aware
      Transportation Systems (C-ITS) will be enriched with a seg-                domotic systems and wellbeing applications. Intelligent
      ment of vehicles that has been ignored so far, namely, per-                pet trackers and feeders, smart garden-irrigation control,
      sonal mobility devices. Urban mobility trends are changing                 or advanced safety & security home systems, are other
      during the last times due to the emergence of sustainable                  use cases that will be enabled thanks to the upcoming
      and healthy vehicles such as shared bikes, electric mo-                    ­TinyML revolution [31].
      peds, scooters, etc. However, current vehicular networks
      just consider traditional road transportation means, i.e.,                                  III. Tinyml Frameworks
      cars, buses, trucks, etc., as their equipped On-Board Units                Given the great potential of TinyML to revolutionize mul-
      (OBUs) require great amount of power to operate. Although                  tiple sectors, a number of libraries and tools for ­easing
      some efforts have been done for connecting motorcycles to                  the implementation of ML algorithms on constrained plat-
      C-ITS infrastructures [27], personal vehicles have not been                forms are available in the market. TinyML frameworks
      considered due to their novelty and inherent constraints.                  can be categorized into three different families. The most
      However, coupling these devices with MCU-based OBUs                        common approach relates to the conversion of already
      paves the way for their integration within the C-ITS and                   trained models for adapting them to the restrictions of
      smart-city ecosystems. Therefore, light vehicles will have                 MCUs. Thus, these tools usually take inference engines
      access to basic vehicular services such as driving safety,                 generated by well-known ML libraries such as TensorFlow
      device status monitoring, route planning, etc.                             [32], Scikit-Learn [33], or PyTorch [34], among ­others,

8 	   IEEE CIRCUITS AND SYSTEMS MAGAZINE 		                                                                                         THIRD QUARTER 2020

 Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
and port their code in order to be executed by scarce                      ect compatible with existing ML frameworks such as Ten-
     resource devices. The second approach bets on integrat-                    sorFlow and Caffe [57], and the ONNX format. As ARM-NN
     ing ML libraries within MCUs, hence providing them with                    does not provide support to the Cortex-M CPUs, ARM has
     local training and analysis capabilities. This permits to                  also released the Cortex Microcontroller Software Inter-
     generate models from data retrieved by the own device,                     face Standard-NN (CMSIS-NN) [42], which is a collection of
     which may improve the accuracy of the model and also                       NN kernels optimized to be executed on this processor se-
     enables the implementation of unsupervised learning al-                    ries. CMSIS-NN is able to convert floating point-based NNs
     gorithms. Finally, the last approach relies on the availabil-              generated by popular libraries, e.g., TensorFlow, PyTorch,
     ity of a dedicated co-processor that supports the main                     and Caffe, into fixed-point format that can run on Cortex-M
     computing unit on ML-specific tasks. This strategy per-                    devices. Another electronics manufacturer that has devel-
     mits to enhance the computation performance, although                      oped specific libraries for its devices is STMicroelectron-
     it is the less common approach as the price and complex-                   ics. Concretely, the STM32 Cube.AI toolkit [43] permits
     ity of the processing platforms are notably increased.                     to integrate pre-trained NNs within STM32 ARM Cortex-
         Big technological companies are promoting the expan-                   M-based microcontrollers. It generates STM32-compatible
     sion of the TinyML ecosystem by offering open-source                       C code from NN models generated by Tensorflow and
     development libraries. Google has launched TensorFlow                      Keras [58], or in the standard ONNX format. As an inter-
     Lite [35], which includes a set of tools to adapt Tensor-                  esting feature, STM32Cube.AI permits to run large NNs by
     Flow models aiming at making them to be runnable in mo-                    storing weights and activation buffers in external flash
     bile and embedded devices. This framework is composed                      memory and RAM, respectively.
     of two main elements, namely, the converter, which ports                       Besides open-source initiatives, some institutions
     TensorFlow models to optimized code that can be execut-                    and companies have also launched licensed products.
     ed in constrained platforms, and the interpreter, which                    The Fraunhofer Institute for Microelectronic Circuits and
     actually runs the code generated by the converter. The                     Systems (IMS) has developed the Artificial Intelligence
     optimized models, which cover a number of algorithms                       for Embedded Systems (AIfES) library [44], which is a
     from the NN family, can be executed in a range of plat-                    C-based and platform-independent tool for generating
     forms such as smart-phones, embedded Linux and MCUs.                       NNs compatible with a range of MCUs, e.g., Arduino UNO,
     Focusing on the latter, the optimized code is provided in                  ATMega32U4, or STM32 F4 Series. It can also produce ef-
     C++ 11 and requires 32-bit processors to be run. It has                    ficient code for Windows in the form of Dynamic Link Li-
     been successfully tested in devices with ARM Cortex-M                      brary (DLL) and Linux platforms such as Raspberry Pi.
     series processors, e.g., Arduino nano, and other architec-                 In its current version, it is able to extracts all parameters
     tures such as ESP32. Given the notoriety of Arduino, a                     from models produced by TensorFlow or Keras. Differ-
     specific library for this platform is available through its                ent from the frameworks reviewed above, AIfES permits
     IDE. Besides the development of TensorFlow Lite, another                   to implement the NN training process on the embedded
     related Google’s initiative has been the adaptation of the                 device, i.e., it belongs to the second family of frameworks
     GO programming language to microcontrollers [36]. In                       identified above. Cartesiam’s NanoEdge AI Studio [45] is
     turn, Microsoft has also contributed to the TinyML scene                   another commercial product which permits to create ML
     by releasing its open source Embedded Learning Library                     static libraries to embed them in Cortex-M-based MCUs.
     (ELL) [37]. This framework permits to design and deploy                    Similar to AIfES, it allows to integrate the training process
     pre-trained ML models onto constrained platforms, e.g.,                    within the constrained device but, in addition, it also
     ARM Cortex-A and Cortex-M architectures such as Ardu-                      permits to run unsupervised learning algorithms on the
     ino, Raspberry Pi, and micro:bit. ELL can be understood                    MCU. As a part of this toolkit, a microcontroller emula-
     as an optimizing cross-compiler that runs on a regular                     tor is provided in order to test and debug the generated
     desktop computer and outputs C++ code ready to be ex-                      model prior to its deployment on the device.
     ecuted on the targeted single-board computer. ELL’s API                        Surprisingly, all the frameworks presented so far only
     can be employed either from C++ or Python and makes                        focus on a single type of ML algorithm, namely, NN. How-
     use of pre-trained NN models produced by the Microsoft                     ever, different initiatives released by makers, developers,
     Cognitive Toolkit (CNTK) [38], Darknet [39], or in Open                    and researchers also consider other ML techniques such
     Neural Network Exchange (ONNX) format [40].                                as Naive Bayes classifier, decision trees, k-Nearest Neigh-
         ARM is also pushing for the integration of ML in their                 bors (k-NN), among others. MicroML [46] is a project that
     devices. The ARM-NN toolkit [41] is part of Linaro’s ML                    permits to port Support Vector Machine (SVM) and Rel-
     Initiative [56] and allows to integrate NN workloads within                evance Vector Machine (RVM) algorithms to C code that
     ARM Cortex-A CPUs, ARM Mali GPUs, and Ethos Neural                         can run in a number of MCUs, e.g., Arduino, ESP8266,
     Processing Units (NPUs). ARM-NN is an open-source proj-                    ESP32, and others with C support. It interoperates with the

     THIRD QUARTER 2020 		                                                                                       IEEE CIRCUITS AND SYSTEMS MAGAZINE            9

Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
widely-used Scikit-learn toolkit and transforms the mod-                     Table I summarizes the main characteristics of the
       els generated by this library in order to be executed on                  frameworks reviewed in this section.
       8-bit microcontrollers with 2 kb of RAM. Also compatible
       with Scikit-learn, sklearn-porter [47] permits to transpile                         IV. Use Case: Smart Multi-Rat Device
       trained estimators to C, Java, JavaScript, GO, Ruby, and                  Most of the latest efforts related to the evolution of the
       PHP. Given this variety of supported languages, it is a very              IoT ecosystem have focused on the development of novel
       complete framework as it is also compatible with a range                  communication technologies and protocols to provide
       of ML algorithms. Specifically for C language, sklearn-                   end-devices with ubiquitous connectivity. Thus, it is com-
       porter is able to port Scikit-learn models of SVM, decision               mon to find MCU-based objects with a variety of integrat-
       trees, Random Forest (RF), and AdaBoost classifier. How-                  ed RATs, which present different characteristics in terms
       ever, the versatility of this library hinders it to generate              of data-rate, latency, energy consumption, etc. As dis-
       microcontroller-optimized code in terms of required RAM.                  cussed previously, a convenient management of commu-
       A similar tool is m2cgen [48], which is also able to transpile            nication activities is crucial for ensuring the sustainability
       Scikit-learn-trained models into a native code, e.g., Python,             of the end-device; besides, the Quality of Service (QoS)
       C, Java, Go, JavaScript, Visual Basic, etc. In this case, both            requirements of data traffic should be taken into account
       the number of compatible algorithms as well as the target                 as well. For those reasons, we propose the application of
       programming languages are even greater than those sup-                    ML techniques to smartly decide which is the most ad-
       ported by sklearn-porter. A more limited tool is Weka-por-                equate communication interface for transmitting a given
       ter [49], which coverts decision tree classifiers generated               message by considering both the status of the device and
       by WEKA [59] into C, Java and JavaScript codes. EmbML                     the characteristics of the message to be sent. Adopting a
       [50] is a library that permits to convert models trained by               ML approach provides flexibility to the interface selection
       WEKA or Scikit-learn into C++ source-code files that can                  mechanism in the case of adding new communication in-
       be compiled and executed in constrained platforms. This                   terfaces or removing some of the existing ones. Thus, the
       library has been tested in Arduino and Teensy [60] boards                 multi-RAT frugal object is enriched with awareness capa-
       and it supports different ML algorithms such as decision                  bilities unavailable for current IoT devices. Aiming at pro-
       trees, SVM, and NN.                                                       viding insights regarding the design and implementation
           emlearn [51] is a framework that produces portable C99                processes, firstly we propose and describe a multi-RAT
       code from Python libraries such as Scikit-learn or Keras.                 smart object architecture and, then, we explore the inte-
       This library is compatible with generated models of a range               gration process of TinyML-based algorithms on real hard-
       of algorithms, e.g., RF, decision trees, Naive Bayes, or linear           ware in order to tackle the mentioned decision problem.
       models, and has been tested in different platforms, namely,
       AVR Atmega, ESP8266 and Linux. Focused on extending the                   A. Architecture
       compatibility of TensorFlow models, uTensor [52] and Ti-                  The proposed architecture is shown in Fig. 4. The core
       nyMLgen [53] are two libraries that produce C code ready                  element of the device is the MCU which gathers the pro-
       to be integrated within different families of MCUs. While the             cessing unit as well as program and data memories. In this
       former is focused on Mbed boards [61], the latter exports                 regard, we consider constrained processing platforms as
       the trained model to a plain C array, hence ready to be used              those described in [8]. Focusing on storage, it is usual to
       in a wider range of microcontrollers.                                     find commercial devices with SD-card m    ­ odules to expand
           Finally, two proposals from the Academia are focused                  their capacity, so this is not a big limitation in current de-
       on optimizing the implementation of different types of NNs                velopment boards. As observed in the diagram, the device
       on ARM Cortex-M processors. CMix-NN [54] is an open-                      is also equipped with a series of embedded sensors as well
       source mixed-precision library for quantized NNs deploy-                  as digital/analog pins for enabling the connection of exter-
       ment on MCUs. This tool supports convolutional kernels                    nal elements. To this end, a range of communication proto-
       with precision of 2, 4, or 8 bits, for any of the operands.               cols are frequently employed in MCUs, e.g., Universal Asyn-
       In their work, authors made use of the convolutional NN                   chronous Receiver/Transmitter (UART), Serial Peripheral
       models produced by MobileNets [62]. In turn, FANN-on-                     Interface (SPI), or Inter-Integrated Circuit (I2C). Although
       MCU [55] is another open-source framework which is built                  displays are highly consuming, they are useful to show im-
       upon the Fast Artificial Neural Network (FANN) library [63]               portant events or status messages. Besides, recently small
       aiming at running lightweight NNs on microcontrollers.                    e-Paper displays have arrived to the MCU market, which
       Concretely, this toolkit is able to process Multi-Layer Per-              present insignificant energy consumption levels [65].
       ceptrons (MLPs) trained with FANN and produces code                           The availability of multiple RATs embedded in the de-
       that, besides the mentioned ARM Cortex-M processor, can                   vice is also one of its key characteristics. WiFi or Blue-
       run on Parallel Ultra-Low Power Platforms (PULP) [64].                    tooth are the most usual technologies as they are very

10 	   IEEE CIRCUITS AND SYSTEMS MAGAZINE 		                                                                                        THIRD QUARTER 2020

 Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
Table I.
        TinyML frameworks.
                                                                                           Interoperable
                                                  Compatible             Output            External              Publicly
        Framework           Algorithms            Platforms              Languages         Libraries             Available      Main Developer
        TensorFlow          Neural                ARM Cortex-M           C++ 11            TensorFlow            Yes            Google
        Lite [35]           networks
        ELL [37]            Neural                ARM Cortex-M           C                 CNTK                  Yes            Microsoft
                            networks              ARM Cortex-A           C++               Darknet
                                                  Arduino                                  ONNX
                                                  micro:bit
        ARM-NN [41]         Neural                ARM Cortex-A           C                 TensorFlow            Yes            ARM
                            networks              ARM Mali                                 Caffe
                                                  Graphics                                 ONNX
                                                  Processors
                                                  ARM Ethos
                                                  Processor
        CMSIS-NN            Neural                ARM Cortex-M           C99               TensorFlow            Yes            ARM
        [42]                networks                                                       Caffe
                                                                                           PyTorch
        STM 32Cube.         Neural                STM32                  C                 Keras                 Yes            STMicroelectronics
        AI [43]             networks                                                       TensorFlow
                                                                                           Lite Caffe
                                                                                           ConvNetJs
                                                                                           Lasagne
        AIfES [44]          Neural                Windows (DLL)          C                 TensorFlow            No             Fraunhofer IMS
                            networks              Raspberry Pi                             Keras
                                                  Arduino
                                                  ATMega32U4
                                                  STM32 F4
                                                  Series
                                                  ARM Cortex-M4
        NanoEdge            Unsupervised          ARM Cortex-M           C                 —                     No             Cartesiam
        AI Studio [45]      learning
        MicroMLGen          SVM                   Arduino                C                 Scikit-learn          Yes            Particular
        [46]                RVM                   ESP32                                                                         developer
                                                  ESP8266
        sklearnporter       SVM                   Multiple               C                 Scikit-learn          Yes            Particular
        [47]                Decision tree         constrained &          GO                                                     developer
                            Random Forest         non-                   Java
                            Ada Boost             constrained            JavaScript
                            Classifier            platforms              PHP
                            k-NN                                         Ruby
                            Naive Bayes
                            Neural
                            networks
        m2cgen [48]         Linear                Multiple               C                 Scikit-learn          Yes            Particular
                            regression            constrained &          C#                                                     developer
                            Logistic              non-                   Dart
                            regression            constrained            Go
                            Neural                platforms              Java
                            networks                                     JavaScript
                            SVM                                          PHP
                            Decision tree                                PowerShell
                            Random Forest                                Python
                            LGBM                                         R
                            Classifier                                   Visual Basic                                                    (Continued )

     THIRD QUARTER 2020 		                                                                                       IEEE CIRCUITS AND SYSTEMS MAGAZINE            11

Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
Table I.
         TinyML frameworks. (Cotinued)
                                                                                            Interoperable
                                                    Compatible            Output            External              Publicly
         Framework           Algorithms             Platforms             Languages         Libraries             Available      Main Developer
         weka-porter         Decision trees         Multiple              C                 Weka                  Yes            Particular
         [49]                                       constrained &         Java                                                   developer
                                                    non-                  JavaScript
                                                    constrained
                                                    platforms
         EmbML [50]          Decision trees         Arduino               C++               Scikit-learn          No             Research group
                             Neural                 Teensy                                  Weka
                             networks
                             SVM
         emlearn [51]        Decision trees         AVR Atmega            C                 Keras                 Yes            Particular
                             Neural                 ESP8266                                 Scikit-learn                         developer
                             networks               Linux
                             Naive Gaussian
                             Bayes
                             Random forest
         uTensor [52]        Neural                 mBed boards           C++11             TensorFlow            Yes            Particular
                             networks                                                                                            developer
         TinyMLgen           Neural                 ARM Cortex-M          C                 TensorFlow            Yes            Particular
         [53]                networks               ESP32                                   Lite                                 developer
         CMix-NN [54]        Neural                 ARM Cortex-M          C                 Mobilenet             Yes            Research group
                             networks
         FANN-on-            Neural                 ARM Cortex-M          C                 FANN                  Yes            Research group
         MCU [55]            networks               PULP

                                          TinyML
                          Sensors                        Storage
                          Inertial
                          Weather                   MC
                                                     +
                          Health                   Memory      RAT 1
                             ...                                 ...
                            GPS
                                            Display            RAT
                                                                 ... n

                            I/O
                           Ports
                                             Battery                                                          APPS

       Figure 4. Multi-RAT smart object architecture.

12 	   IEEE CIRCUITS AND SYSTEMS MAGAZINE 		                                                                                        THIRD QUARTER 2020

 Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
TINYML-ENABLED FRUGAL SMART OBJECTS: CHALLENGES AND OPPORTUNITIES - IOT CRAWLER
­ ell-known standards that can reach high data-rates.
     w                                                                          ing the device status, several aspects are considered. The
     Other options based on cellular communications such as                     percentage of remaining battery (B) is one of the key factors
     4 G/5G may be also considered. However, all these RATs                     to evaluate as the power consumption of both interfaces
     require great amount of energy to work. For that reason,                   clearly differs. The coverage level (C) of each RAT {c1, c2},
     LPWAN-based solutions such as LoRaWAN, Sigfox, or                          is also taken into account in order to avoid transmissions
     Narrow Band—Internet of Things (NB-IoT) [19] are cur-                      over a lossy link. Besides, LoRaWAN makes use of license-
     rently being adopted by the IoT market as they provide                     free bands, e.g., Industrial, Scientific and Medical (ISM), so
     very long transmission distances with highly reduced en-                   the access to this wireless medium is normally limited by
     ergy consumption, although with the limitation of severely                 international regulations. Thus, we model the availability
     limiting the transmission rate. Given the heterogeneous                    of the LoRaWAN interface with the boolean feature L. We
     characteristics of the available RATs, it is crucial to precise-           consider that NB-IoT does not present this restriction as it
     ly select the most adequate technology to guarantee both                   makes use of licensed spectrum. We also assume that there
     the energy efficiency of the device as well as the QoS of                  are not dedicated queues for each interface; instead, we
     the transmitted packets. At this point, TinyML comes into                  consider a One in—One out message-forwarding system.
     play with the aim of smartly supporting this decision. As                  This assumption is made given the memory constraints of
     discussed in next Section, a number of ML techniques may                   MCUs, which prevent them of storing great amount of data
     be employed to this end, thanks to the algorithm adapta-                   in RAM. Two message features are also evaluated in the de-
     tions provided by the afore-reviewed TinyML frameworks.                    cision process. The data size (S), in bytes, is considered as
         This multi-R AT smart board can be integrated                          LoRaWAN has a maximum payload length of 240 B. Finally,
     within a plethora of devices and gadgets, e.g., vehicles,                  the urgency level of the message (U) is also contemplated
     wearables, sensors, robots, etc., which will be enriched                   with the aim of supporting different classes of services.
     with intelligence and communication capabilities. These                        With the features identified above, the current
     elements are connected to the cloud or other devices                       system status is defined by the following vector
     through the available communication interfaces in or-                      (B, " c 1, c 2 ,, L, S, U ), which is employed for determining
     der to gain access to a range of next-generation services,                 the most adequate action A ! " a 0, a 1, a 3 , . We tackle this
     which are finally accessed by me a n s of e nd - u se r’s                  issue as a supervised multi-class classification problem.
     personal devices such as smartphones or tablets.                           To this end, a dataset has been generated to train a num-
                                                                                ber of algorithms that have been embedded in a real con-
     B. Problem Statement                                                       strained device as explained in the following.
     We consider the problem of intelligently selecting the most
     adequate communication interface in a multi-RAT set up C. Dataset
     when certain data is generated to be transmitted in a non- A synthetic training dataset of 10,000 samples, i.e., fea-
     connection-oriented fashion. Concretely, given the architec- tures vectors, has been generated and each of the
     ture presented above, we consider an end-device producing ­features composing the input vectors has received a
     sporadic messages and that is equipped with two commu- value randomly calculated. The values for the message
     nication interfaces, namely, LoRaWAN and NB-IoT. As both urgency (U) and the coverage of each RAT (c1, c2) have
     of them are LPWAN-based solutions, they provide long been generated by assigning them uniformly distribut-
     range communications with reduced power consumption, ed values in the range [0%, 100%]. The values for the
     which are highly valued characteristics for moving devices. device’s remaining battery (B) and the packet size (S),
     However, given these common general attributes, as shown have been randomly calculated in the ranges [1%, 100%]
     in Table II, both present unique characteristics in terms of and [50 bytes, 500 bytes], respectively, following a uni-
     bandwidth, uplink/downlink latency, duty-cycle, etc. There- form distribution as well. Finally, the LoRaWAN interface
     fore, by considering these RAT features as well as the device availability (L), which was modeled as a boolean input,
     status and the message characteristics, it should be decid- has received true/false values uniformly distributed.
     ed whether to transmit a message or not, and if so, the most      Each of the 10,000 samples has been labelled with the
     adequate communication interface should be selected.           most adequate action considering its input features. To
        The set of all possible ac-
     tions (A) is {a0, a1, a2}, where     Table II.
     a0 represents the action of          Main characteristics of RATs under consideration.
     dropping the message and {a1,        RAT              Throughput     Latency      RX Sensitivity   Power Consumption
     a2} indicates sending the data
                                          LoRaWAN          Low            High         High             Low
     using LoRaWAN or NB-IoT in-
     terfaces, respectively. Regard-      NB-IoT           High           Low          Medium           Medium

     THIRD QUARTER 2020 		                                                                                       IEEE CIRCUITS AND SYSTEMS MAGAZINE            13

Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
this end, some hard restrictions have been established                    mized code in terms of memory allocation. We proved this
       in order to select a communication interface for sending                  fact by porting to C code a number of models produced by
       a given message (Table III). With these restrictions and                  Scikit-learn, namely, RF, decision trees, and SVM. In every
       considering the characteristics of each RAT (Table II), we                case, m2cgen generated non-compilable code for Arduino
       have developed an automatic labelling scheme as follows.                  Uno due to unallowed memory and object manipulations.
       LoRaWAN interface is preferred for small packets with low                     MicroMLGen is specialized in translating Scikit-
       urgency, given its highly reduced power consumption. Nev-                 learn’s SVM models to Arduino-like platforms. It is able
       ertheless, it is not the best option for transmitting urgent              to handle different SVM kernels such as linear, polyno-
       packets because of its limited throughput and the long la-                mial, and Radial Basis Function (rbf). SVMs are highly
       tency introduced. In turn, NB-IoT interface is more suitable              sensitive to the number of training samples as they de-
       for transporting urgent messages, given its lower latency                 termine the number of generated support vectors which
       and higher data-rate. However, its more limited sensitivity               have a direct impact on the required memory. Table IV
       causes that its usage is not recommended with a coverage                  summarizes the attained results in terms of accuracy
       level below 10%. The same applies to the battery level,                   and F1-score provided by Scikit-learn as well as program
       when the remaining energy is below 10%, using the NB-IoT                  memory footprint and classification time in Arduino
       interface is discarded due to its higher power consump-                   Uno. Regarding the latter, we present the average run-
       tion. When one of these situations happen, short urgent                   ning time for 100 random input vectors as well as the
       packets are sent via the LoRaWAN interface if it is available             confidence interval ^a = 0.05 h . The alpha SVM param-
       (L = true). Of course, considering the mentioned restric-                 eter has been fixed to 0.0001. It can be seen how the clas-
       tions, some combinations of the input features may lead to                sifier accuracy is improved by increasing the number of
       the decision of dropping messages.                                        training samples, no matter the kernel employed. The lin-
                                                                                 ear kernel is the fastest one and shows better accuracy
       D. TinyML integration and evaluation                                      than the rbf kernel, which presents the greatest memory
       Given the notoriety and mass adoption of Arduino Uno,                     footprint. Observe that both of them surpass the maxi-
       this board has been chosen as the testbench for evaluat-                  mum MCU’s program memory when employing 1000 and
       ing the performance of TinyML frameworks. We consider                     1250 training e­ xamples due to the large number of gen-
       it as a good candidate for conducting this benchmarking                   erated support vectors. Finally, the polynomial kernel is
       due to its processing and memory constraints, as it is                    the most accurate one and also presents the most mod-
       equipped with an 8 bit processor at 16 MHz, flash mem-                    erate memory footprint. However, it is the slowest kernel
       ory of 32 KB, and 2 kB of Static RAM (SRAM). According                    although always under the threshold of 200 ms.
       to the device classification defined in [8], this platform                    sklearnporter has been evaluated when dealing with
       belongs to the most constrained type of MCUs (Class 0).                   SVMs as well. However, the produced C code from the
           Taking a look at Table I, the only publicly available                 Scikit-learn’s models is not optimized for constrained
       frameworks that can produce code for Arduino Uno                          devices as mentioned previously, hence it provoked an
       are MicroMLGen [46], sklearnporter [47], m2cgen [48],                     unstable behaviour of the Arduino unit. Given the incon-
       and emlearn [51]. The workflow followed by these tools                    sistent performance obtained, we could not collect reli-
       is similar in all of them: firstly, a selected algorithm is               able outcomes to present here.
       trained using a Python-based ML library; thereafter, the                      As discussed above, the family of NN-based algo-
       model generated by this library is ported to C code by                    rithms is the one that has received greater attention
       the corresponding TinyML translation tool; finally, the                   from the TinyML community. In this case, we have
       produced code is integrated within an Arduino sketch                      evaluate the performance of the MLP implementation
       by importing it as an external library.                                   provided by the emlearn framework. We have employed
           Regarding m2cgen, as discussed above, although this                   Scikit-learn for producing trained networks with differ-
       framework is highly versatile as it supports many algo-                   ent number of hidden layers and neurons per layer, us-
       rithms and output languages, it does not generate opti-                   ing the sigmoid activation function. As the number of

         Table III.
         Feature restrictions for generating the dataset.
                               Battery                  Coverage (C)              LoRaWAN                   Packet Size (S)           Urgency (U)
         RAT                   Level (B) (%)            (%)                       Availability (L)          (bytes)                   (%)
         LoRaWAN               >5                       >5                        True                      10                       —                         —                         —

14 	   IEEE CIRCUITS AND SYSTEMS MAGAZINE 		                                                                                        THIRD QUARTER 2020

 Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
training samples does not impact on the weight of the                      for training and 20% for evaluating the produced classifier.
     produced model, the 80% of the dataset has been em-                        Observe how these algorithms capture the logic employed
     ployed for training and the other 20% for test. The ob-                    for labelling our synthetic dataset, which is evidenced by
     tained outcomes are shown in Table V. It can be seen                       the good accuracy obtained. By increasing the depth of the
     that in the MLP case the limiting factor is the Arduino                    decision tree, we improve the classifier performance up to
     Uno’s SRAM, as the network’s weights and biases are                        a perfect accuracy of 1 with decision trees of 6 levels. Re-
     stored there. Thus, when the number of neurons per lay-                    garding the RF classifier, we have evaluated one model con-
     er is 10, the maximum number of hidden layers for not                      sisting of 10 estimators (trees) and we also obtain a perfect
     consuming all the unit’s SRAM is 3. Regarding the algo-                    accuracy although the memory footprint and processing
     rithm’s performance, a trade-off between accuracy and                      time are notably greater for this scheme. Comparing both
     network complexity is evidenced. However, employing                        frameworks, sklearnporter provides faster and lighter deci-
     one hidden layer with 4 neurons permits an accuracy of                     sion trees in comparison with emlearn. However, the oppo-
     0.97, which is the best figure for this simple configura-                  site behaviour is obtained for the RF algorithm.
     tion. It is clear the direct relationship between the over-                    In the light of the obtained results, the performance
     all number of neurons in the network and the process-                      of decision trees and RF are notably better than that
     ing time for each classification operation, which should                   of SVM and MLP for solving the intelligent communica-
     be also considered when selecting the most adequate                        tion interface selection issue under consideration. They
     system configuration. However, observe that the times                      achieve greater accuracy with reduced memory foot-
     employed by the MLP classifier are notably shorter than                    print and computation delay. Therefore, for the specific
     those needed by the MicroMLGen’s SVMs. The achieved                        case of smart multi-RAT configuration management,
     accuracy is also better for the MLP algorithm.                             this family of algorithms would be the most adequate to
         The adaptation of other well-known ML mechanisms                       be integrated within the MCU.
     such as decision trees and RF has been evaluated as well.                      To sum up, it can be concluded that great advances have
     To this end, we have employed the sklearn-porter and em-                   been made so far for integrating ML algorithms within fru-
     learn frameworks to port the models generated by Scikit-                   gal devices. These achievements were unimaginable few
     learn. In this case, sklearnporter produced suitable code                  time ago and they pave the way for the arrival of smart ap-
     for Arduino Uno, given that decision trees and RF are light-               plications and services that will not require a continuous
     er than SVMs. A similar performance analysis to those                      processing support from the cloud. However, we are just in
     presented previously is shown in Table VI. The dataset                     the inception of this movement and more efforts are need-
     has been split as in the case of MLP evaluation, i.e., 80%                 ed for the integral development of the TinyML ecosystem.

        Table IV.
        SVM generated by MicroMLGen.
        Samples/Kernel                   Linear                                Polynomial                            rbf
       250                              Accuracy = 0.7816                      Accuracy = 0.7992                    Accuracy = 0.7355
                                        F1 = 0.7119                            F1 = 0.7643                          F1 = 0.6298
                                        Memory = 10678 B                       Memory = 10264 B                     Memory = 12172 B
                                        Time = 11.1±0.2 ms                     Time = 73.7±0.3 ms                   Time = 34.7±0.03 ms
       500                              Accuracy = 0.7871                      Accuracy = 0.8552                    Accuracy = 0.7472
                                        F1 = 0.7161                            F1 = 0.8074                          F1 = 0.6661
                                        Memory = 17782 B                       Memory = 15636 B                     Memory = 20574 B
                                        Time = 22.3±0.1 ms                     Time = 73.2±0.7 ms                   Time = 72±0.047 ms
       750                              Accuracy = 0.7922                      Accuracy = 0.8561                    Accuracy = 0.75
                                        F1 = 0.723                             F1 = 0.8347                          F1 = 0.6815
                                        Memory = 25298 B                       Memory = 20642 B                     Memory = 26644 B
                                        Time = 34.1±0.2 ms                     Time = 107.2±0.3 ms                  Time = 98.9±0.07 ms
       1000                             Accuracy = 0.7921                      Accuracy = 0.8688                    Accuracy = 0.7757
                                        F1 = 0.7232                            F1 = 0.8443                          F1 = 0.7402
                                        Memory = 30156 B                       Memory = 25530 B                     Memory = 35184 B
                                        Time = 41.8±0.2 ms                     Time = 140.1±0.2 ms                  (out-of-range)
        1250                            Accuracy = 0.7939                      Accuracy = 0.8757                    Accuracy = 0.7818
                                        F1 = 0.725                             F1 = 0.845                           F1 = 0.7484
                                        Memory = 37930 B                       Memory = 31968 B                     Memory = 44058 B
                                        (out-of-range)                         Time = 183.6±0.4 ms                  (out-of-range)

     THIRD QUARTER 2020 		                                                                                       IEEE CIRCUITS AND SYSTEMS MAGAZINE            15

Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
Table V.
         MLP generated by emlearn.

         Hidden
         layers/
         Neurons        2                          4                         6                         8                         10

         1              Accuracy = 0.8835          Accuracy = 0.9735         Accuracy = 0.961          Accuracy = 0.9705         Accuracy = 0.968
                        F1 = 0.8393                F1 = 0.9737               F1 = 0.9605               F1 = 0.9701               F1 = 0.9679
                        Memory = 6750 B            Memory = 6830 B           Memory = 6910 B           Memory = 6990 B           Memory = 7070 B
                        SRAM = 392 B               SRAM = 472 B              SRAM = 552 B              SRAM = 648 B              SRAM = 744 B
                        Time = 1.9+0.006           Time = 2.7+0.009          Time = 3.5+0.01           Time = 4.3+0.01           Time = 4.9+0.03
                        ms                         ms                        ms                        ms                        ms

         2              Accuracy = 0.8045          Accuracy = 0.907          Accuracy = 0.9544         Accuracy = 0.9742         Accuracy = 0.9691
                        F1 = 0.7319                F1 = 0.878                F1 = 0.9541               F1 = 0.9742               F1 = 0.9691
                        Memory = 6790 B            Memory = 6926 B           Memory = 7094 B           Memory = 7294 B           Memory = 7526 B
                        SRAM = 430 B               SRAM = 566 B              SRAM = 734 B              SRAM = 950 B              SRAM = 1198 B
                        Time = 2.4+0.005           Time = 3.8+0.009          Time = 5.5+0.01           Time = 7.3+0.02           Time = 9.2+0.02
                        ms                         ms                        ms                        ms                        ms

         3              Accuracy = 0.8903          Accuracy = 0.9746         Accuracy = 0.9543         Accuracy = 0.9705         Accuracy = 0.9795
                        F1 = 0.8431                F1 = 0.9745               F1 = 0.9532               F1 = 0.9709               F1 = 0.9797
                        Memory = 6826 B            Memory = 7018 B           Memory = 7274 B           Memory = 7594 B           Memory = 7978 B
                        SRAM = 468 B               SRAM = 660 B              SRAM = 916 B              SRAM = 1252 B             SRAM = 1652 B
                        Time = 2.9+0.01            Time = 5+0.01 ms          Time = 7.4+0.02           Time = 10.1+0.02          Time = 13.2+0.03
                        ms                                                   ms                        ms                        ms

         4              Accuracy = 0.888           Accuracy = 0.922          Accuracy = 0.924          Accuracy = 0.945          Accuracy = 0.944
                        F1 = 0.8392                F1 = 0.9242               F1 = 0.9238               F1 = 0.9453               F1 = 0.9422
                        Memory = 6864 B            Memory = 7112 B           Memory = 7456 B           Memory = 7896 B           Memory = 8432 B
                        SRAM = 506 B               SRAM = 754 B              SRAM = 1098 B             SRAM = 1154 B             SRAM = 2106 B
                        Time = 3.6+0.009           Time = 6.2+0.03           Time = 9.3+0.03           Time = 13+0.06 ms         (out-of-range)
                        ms                         ms                        ms

         5              Accuracy = 0.715           Accuracy = 0.882          Accuracy = 0.9285         Accuracy = 0.931          Accuracy = 0.9245
                        F1 = 0.6108                F1 = 0.8385               F1 = 0.9269               F1 = 0.9297               F1 = 0.9183
                        Memory = 6902 B            Memory = 7206 B           Memory = 7638 B           Memory = 8198 B           Memory = 8886 B
                        SRAM = 544 B               SRAM = 848 B              SRAM = 1280 B             SRAM = 1856 B             SRAM = 2560 B
                        Time = 3.9+0.03            Time = 7.3+0.03           Time = 11.1+0.07          Time = 15.8+0.08          (out-of-range)
                        ms                         ms                        ms                        ms

         Table VI.
         Decision trees and RF generated by sklearnporter and emlearn.
         Library/              Decision Tree            Decision Tree            Decision Tree             Decision Tree           RF Estimators
         Algorithm             Depth = 3                Depth = 4                Depth = 5                 Depth = 6               = 10
         sklearnporter         Accuracy =               Accuracy =               Accuracy =                Accuracy = 1            Accuracy = 1
                               0.9155                   0.9155                   0.9935                    F1 = 1                  F1 = 1
                               F1 = 0.8886              F1 = 0.8886              F1 = 0.9935               Memory = 3770 B         Memory = 27814 B
                               Memory = 3610 B          Memory = 3670 B          Memory = 3730 B           Time = 48.75±5.6        Time = 252.24±7
                               Time =                   Time =                   Time =                    μs                      μs
                               40.56±0.63 μs            45.28±1.55 μs            48.32±9.48 μs
         emlearn               Accuracy =               Accuracy =               Accuracy =                Accuracy = 1            Accuracy = 1
                               0.9155                   0.9155                   0.9935                    F1 = 1                  F1 = 1
                               F1 = 0.8886              F1 = 0.8886              F1 = 0.9935               Memory = 40016 B        Memory = 13160 B
                               Memory = 3840 B          Memory = 3904 B          Memory = 3952 B           Time =                  Time =
                               Time =                   Time =                   Time =                    104.08±3.87 μs          193.6±6.06 μs
                               103.28±3.87 μs           103.58±7.18 μs           103.96±83.82 μs

16 	   IEEE CIRCUITS AND SYSTEMS MAGAZINE 		                                                                                          THIRD QUARTER 2020

 Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
V. Conclusions                                     services, management of wireless mobile networks, green
     The integration of ML within IoT devices will provide them                 networking techniques, and IoT/M2M architectures.
     with a flexibility and processing capabilities never seen be-
     fore. The TinyML paradigm proposes to adapt advanced                                         Antonio Skarmeta received the B.S. de-
     ML techniques to the constraints of MCUs in order to con-                                    gree (Hons.) from the University of Mur-
     vert them in real smart objects. In this work, a comprehen-                                  cia, Spain, the M.S. degree from the Uni-
     sive review and discussion of this novel and vibrating eco-                                  versity of Granada, and the Ph.D. degree
     system have been presented. The related challenges and                                       from the University of Murcia, all in com-
     opportunities have been identified as well as the potential                                  puter science. He has been a Full Profes-
     services that will be enabled by these intelligent small el-               sor with the University of Murcia, since 2009. He has been
     ements. The TinyML frameworks available in the market                      part of many EU FP projects and even coordinated some of
     that currently permit the integration of complex ML mecha-                 them. He has published more than 200 international arti-
     nisms within MCUs have been also reviewed and their ap-                    cles. His main interests include the integration of security
     plicability has been discussed. Finally, a realistic use case              services, identity, the IoT, 5G, and smart cities.
     has been explored. Concretely, a multi-RAT smart object ar-
     chitecture has been proposed and the ML integration pro-                                                References
     cess has been explained. A number of TinyML frameworks                     [1] L. Columbus, “10 Charts that will change your perspective of big
                                                                                data’s growth,” Tech. Rep., 2018. [Online]. Available: https://software​
     have been employed for solving the problem of selecting
                                                                                strategiesblog.com/2018/06/02/10 - charts-that-will- change -your
     the most adequate communication interface considering                      -perspective-of-big-datas-growth/
     both the status of the device and the characteristics of the               [2] Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao, “Application of ma-
                                                                                chine learning in wireless networks: Key techniques and open issues,”
     data to be sent. We have demonstrated that several classifi-               IEEE Commun. Surveys Tut., vol. 21, no. 4, pp. 3072–3108, 2019. doi:
     cation algorithms can be integrated within an Arduino Uno                  10.1109/COMST.2019.2924243. [Online]. Available: https://ieeexplore
                                                                                .ieee.org/document/8743390/
     board for tackling this issue, namely, SVM, MLP, decision
                                                                                [3] T. Baltrusaitis, C. Ahuja, and L.-P. Morency, “Multimodal ma-
     trees, and RF. From the attained outcomes, we can conclude                 chine learning: A survey and taxonomy,” IEEE Trans. Pattern Anal.
     that decisions trees and RF present the best performance                   Mach. Intell., vol. 41, no. 2, pp. 423 – 4 43, Feb. 2019. doi: 10.1109/
                                                                                TPAMI.2018.2798607. [Online]. Available: https://ieeexplore.ieee.org/
     in terms of accuracy, memory footprint, and classification                 document/8269806/
     speed. Other algorithms also showed good results in terms                  [4] Q. Chen et al., “A survey on an emerging area: Deep learning for
     of classification accuracy but their implementation should                 smart city data,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 3, no. 5,
                                                                                pp. 392–410, Oct. 2019. doi: 10.1109/TETCI.2019.2907718. [Online]. Avail-
     be carefully addressed given the memory restrictions of                    able: https://ieeexplore.ieee.org/document/8704334/
     the considered MCU. As future work, we plan to consider                    [5] G. Li, R. Gomez, K. Nakamura, and B. He, “Human-centered rein-
                                                                                forcement learning: A survey,” IEEE Trans. Human-Mach. Syst., vol. 49,
     other constrained devices in order to benchmark additional                 no. 4, pp. 337–349, Aug. 2019. doi: 10.1109/THMS.2019.2912447. [Online].
     TinyML frameworks. Besides, other relevant aspects such                    Available: https://ieeexplore.ieee.org/document/8708686/
     as decision safety and model correctness in sensitive fields               [6] T. Luo et al., “DaDianNao: A neural network supercomputer,” IEEE Trans.
                                                                                Comput., vol. 66, no. 1, pp. 73–88, Jan. 2017. doi: 10.1109/TC.2016.2574353.
     of application, e.g., eHealth, will be considered.                         [Online]. Available: http://ieeexplore.ieee.org/document/7480791/
                                                                                [7] B. Garbinato, R. Guerraoui, J. Hulaas, M. Monod, and J. H. Spring, “Per-
                                                                                vasive computing with frugal objects,” in Proc. IEEE 21st Int. Conf. Advanced
                       Acknowledgment
                                                                                Information Networking and Applications Workshops (AINAW’07), 2007, pp.
     This work has been supported by the European Commis-                       13–18. [Online]. Available: http://ieeexplore.ieee.org/document/4224076/
     sion, under the projects IoTCrawler (Grant No. 779852),                    [8] C. Bormann, M. Ersue, and A. Keranen, Terminology for Constrained-
                                                                                Node Networks, IETF RFC 7228, 2014.
     CyberSec4Europe (Grant No. 830929) and Fed4IoT (Grant                      [9] M. S. Hadj Sassi, F. G. Jedidi, and L. C. Fourati, “A new architecture
     No. 814918), and by the Spanish Ministry of Science, In-                   for cognitive internet of things and big data,” Proc. Comput. Sci., vol. 159,
                                                                                pp. 534–543, 2019. doi: 10.1016/j.procs.2019.09.208. [Online]. Available:
     novation and Universities, under the project PERSEIDES
                                                                                https://linkinghub.elsevier.com/retrieve/pii/S1877050919313924
     (Grant No. TIN2017-86885-R with ERDF funds).                               [10] R. Roman, J. Lopez, and M. Mambo, “Mobile edge computing, Fog
                                                                                et al.: A survey and analysis of security threats and challenges,” Fu-
                                                                                ture Gener. Comput. Syst., vol. 78, pp. 680–698, Jan. 2018. doi: 10.1016/j.
                      Ramon Sanchez-Iborra received the                         future.2016.11.009. [Online]. Available: https://linkinghub.elsevier.com/
                      B.Sc. degree in telecommunication engi-                   retrieve/pii/S0167739X16305635
                      neering and the M.Sc. and Ph.D. degrees                   [11] B. Martinez, M. Monton, I. Vilajosana, and J. D. Prades, “The power of
                                                                                models: modeling power consumption for IoT devices,” IEEE Sensors J.,
                      in information and communication tech-                    vol. 15, no. 10, pp. 5777–5789, Oct. 2015. doi: 10.1109/JSEN.2015.2445094.
                      nologies from the Technical University                    [Online]. Available: http://ieeexplore.ieee.org/document/7122861/
                                                                                [12] TinyML. 2020. [Online]. Available: https://www.tinyml.org/
                      of Cartagena, in 2007, 2013, and 2016, re-                [13] C. MacGillivray and M. Torchia, “Internet of Things: Spending
     spectively. He is currently an Assistant Professor and a                   trends and outlook,” Tech. Rep., 2019. [Online]. Available: https://www
     Researcher with the Information and Communications                         .idc.com/getdoc.jsp?containerId=US45161419
                                                                                [14] H. Doyu, “TinyML as a Service and the challenges of machine learn-
     Engineering Department, University of Murcia. His main                     ing at the edge,” Ericsson, Tech. Rep., 2019. [Online]. Available: https://
     research interests are evaluation of QoE in multimedia                     www.ericsson.com/en/blog/2019/12/tinyml-as-a-service

     THIRD QUARTER 2020 		                                                                                         IEEE CIRCUITS AND SYSTEMS MAGAZINE           17

Authorized licensed use limited to: Universidad Politecnica de Cartagena. Downloaded on August 18,2020 at 08:22:56 UTC from IEEE Xplore. Restrictions apply.
You can also read