SENSOR NUMERICAL PREDICTION BASED ON LONG-TERM AND SHORT-TERM MEMORY NEURAL NETWORK - DIVA


Sensor numerical prediction based on
long-term and short-term memory neural
network
Yangyang Wen

Type of document -- Computer Engineering BA(C),Final Project,Examensarbete
Main field of study: Computer Engineering
Credits: 15 credits
Semester/Year: Spring,VT2020
Supervisor: Dr. Forsström Stefan,stefan.forsstrom@miun.se
Examiner: Dr. Ulf Jennehag, e-mail ulf.jennehag@miun.se
Course code/registration number: DT099G
Degree programme: Exchange Student

Sensor numerical prediction based on LSTM neural network
Yangyang Wen                                                   2020-06-05

Abstract
Sensor networks consist of many scattered sensor nodes, which are used
in all aspects of life because of their small size, low power
consumption, and multiple functions. With the advent of the Internet of
Things (IoT), more small sensor devices will appear in our lives.
Research on deep learning neural networks is generally carried out on
large and medium-sized devices such as servers and computers, and
research on neural networks running on small IoT devices is rare. In
this study, IoT devices are divided into three types, large, medium,
and small, according to device size, running speed, and computing
power. More concretely, I classify a laptop as a medium-sized device, a
device with more computing power than a laptop, such as a server, as a
large IoT device, and a mobile IoT device smaller than a laptop as a
small IoT device. The purpose of this thesis is to explore the
feasibility, usefulness, and effectiveness of value prediction with a
long short-term memory (LSTM) neural network model on small IoT
devices. A control experiment on small and medium-sized IoT devices
gives the following results: the error curves of the training set and
verification set show the same downward trend on both devices, with
similar accuracy and errors, but the small device takes about 12 times
as long as the medium-sized device. It can therefore be concluded that
LSTM value prediction on small IoT devices is feasible, and that the
results are useful and effective. One of the main problems encountered
when the LSTM model is moved to small devices is its time consumption.

Keywords: deep learning, long-and-short-term memory neural
networks, sensor networks, Internet of Things, value prediction.


Acknowledgements
While conducting this research and writing this thesis, I was
participating in the Chinese-Swedish Exchange Program as an exchange
student in Sweden. I am very grateful to my teacher Peihai Zhao for his
guidance from home during this period. I am also grateful to my
supervisor Forsström Stefan for providing the equipment needed for the
experiments and for helping me revise the thesis. I thank Tingting
Zhang for providing the industrial sensor data set.


Table of Contents
1 Introduction
1.1 Background and problem motivation
1.2 Overall aim
1.3 Detailed problem statement
1.4 Scope
1.5 Outline
1.6 Contributions
2 Theory
2.1 Sensor networks
2.2 Neural Network
2.2.1 Simple Neural Network
2.2.2 Recurrent neural network
2.2.3 Long and short-term memory neural network model
2.2.4 The working process of long and short-term memory neural network model
2.3 Data circulation process in LSTM Model
2.4 Error estimation method
2.5 Tensor
2.6 Related research work
2.6.1 LSTM-based stock returns prediction method: Take the Chinese stock market as an example
2.6.2 Travel time prediction of LSTM neural network
2.6.3 Development and application of deep neural network soft sensor based on LSTM
2.6.4 Analysis of industrial IoT devices based on LSTM
3 Methodology
3.1.1 Pycharm
3.1.2 Anaconda
3.2 LSTM univariate value prediction process
3.2.1 Data set
3.2.2 Data preprocessing
3.2.3 Data segmentation
3.2.4 Loss Function
3.2.5 Multivariate experiment
3.3 Value prediction control experiment
3.3.1 Univariate value prediction experiment based on laptop
3.3.2 Univariate value prediction experiment based on Raspberry Pi
4 Implementation and Results
4.1 Environment and tools
4.1.1 Computer environment configuration
4.1.2 Environment configuration of Raspberry Pi
4.1.3 Version of equipment and environment
4.2 Multivariate experiment
4.2.1 Experimental results of laptop
4.2.2 Experimental results of Raspberry Pi
4.3 Controlled experiment
4.3.1 The optimal model based on minimum error
4.3.2 The optimal model based on minimum time consumption
5 Summary and Analysis of Results
5.1 Experimental summary of Multivariate experiment
5.2 Experimental summary of Control experiment
6 Conclusions
6.1 Ethical and Societal Discussion
6.2 Future Work


Terminology
Acronyms

Adam       Adaptive moment estimation, a method for stochastic
           optimization

ARIMA      Differential autoregressive moving average model

BP         Back Propagation

KNN        K nearest neighbor classification algorithm

LSTM       Long Short-Term Memory

MAE        Mean Absolute Error

MSE        Mean Square Error

RMSE       Root Mean Square Error

RNN        Recurrent neural network

Mathematical notation

b          bias

bh         bias of hidden layer

C          Cell state

fa         activation function

Fo         Forget gate in LSTM

Ht         the result of the hidden layer in RNN

In         Input gate in LSTM

Ot         the result of the output layer in RNN

Out        Output gate in LSTM

S          sigmoid activation function

tanh       hyperbolic tangent activation function

W          weight

Wih        the weight from the input layer to the hidden layer

Z          input module


1     Introduction
      The concept of Industry 4.0 means that people are using information
      technology to drive an industrial revolution and enter the era of
      intelligence. Data collection is the most practical and most frequent
      demand in intelligent manufacturing, and it is also a prerequisite for
      Industry 4.0. Traditional data collection methods include manual entry,
      questionnaires, and telephone follow-up. Nowadays, sensor-based data
      collection is one of the methods that directly changes the
      application scenarios of big data.

      It is estimated that by 2030, the number of small IoT devices will reach
      one trillion, and many of them will be small embedded devices with
      sensors and actuators. Because of the scale of the equipment and the
      large number of sensors and actuators, a large amount of information
      is generated every day. Sensor-based data collection overcomes the
      error-prone and inefficient nature of traditional data collection
      methods, but storing and processing the resulting massive data is a
      major difficulty. “Dirty data” not only occupies storage space but
      also has to be processed, which adds technical difficulty.

      Therefore, research on sensor value prediction based on LSTM has
      important practical and innovative significance. For countries,
      sensor value prediction can effectively avoid the storage of dirty
      and duplicate data, save cloud storage, and optimize the layout of
      network space resources. For engineering construction, sensor value
      prediction can safeguard the quality of construction and data
      analysis. For enterprises and the public, accurate sensor value
      prediction can provide data support for enterprise development
      strategies and major public decisions, and improve the feasibility
      and reliability of those decisions.

1.1   Background and problem motivation
      As early as 1998, a new idea for applying neural networks to problems
      in the field of sensors was proposed: a method for diagnosing sensor
      failure[1] based on a neural network time series predictor. Its
      principle is to use the difference between the predicted value of the
      neural network and the actual output value of the sensor to judge
      whether the sensor has failed. At present, many scholars around the
      world have applied mature models to the field of value prediction,
      including the K nearest neighbor (KNN) classification algorithm, the
      differential autoregressive moving average (ARIMA) model, and the BP
      neural network.

In detail, the moving average method[2] has been used to forecast the
demand for car sales in 4S stores[3], and the KNN[4] method has been
used to predict the current category based on K historical data points:
the category of the current data is the category to which most of the
data from the past K days belongs. However, the prediction results of
this method depend only on a very small number of adjacent samples, and
when one class has many samples and the other classes have few, the
method cannot guarantee the accuracy of the prediction result. The
algorithm also requires a large amount of calculation: for each sample
to be classified, the distance to all known samples must be calculated
to obtain its K nearest neighbors.

The ARIMA method[5] has been used to predict short-term expressway
passenger traffic flow. The gray system theory prediction model[6] and
the time series ARIMA prediction model have also each been used to
predict traffic flow, and the model combining the two achieved higher
prediction accuracy than either the gray prediction model or the time
series analysis model, with the advantages of model simplicity and
strong interpretability. However, because the monitoring data are
non-linear and uncertain, a parametric model cannot describe their
properties well, resulting in a larger prediction error than a
non-parametric model.

The temperature compensation of a neural network humidity sensor based
on the PSO-BP (Particle Swarm Optimization) algorithm[7] improves the
compensation accuracy of the original BP neural network, but it can
still become trapped in local extrema. Short-term prediction of urban
passenger traffic based on the GSO-BP[8] (Glowworm Swarm Optimization)
neural network learns slowly and easily falls into a local minimum. It
performs better when the time series is short; for long time series,
earlier data has less influence on the current prediction. It can be
seen that choosing an appropriate optimization algorithm can improve
the performance of the prediction model to a certain extent, but
breaking through these limitations requires a more appropriate model.

      The LSTM neural network model can overcome this "forgetting"
      phenomenon and effectively alleviate the vanishing gradient problem.
      Since its introduction, it has attracted great attention at home and
      abroad.

      The use of LSTM for continuous prediction[9], which enriched the
      traditional recurrent neural network (RNN), can solve many problems
      that traditional RNN learning algorithms cannot solve.

      The combination of LSTM and GRU neural networks for traffic flow
      prediction[10] shows that deep learning methods using recurrent
      neural networks, such as long short-term memory (LSTM) and gated
      recurrent unit (GRU) networks, perform better than the
      autoregressive integrated moving average (ARIMA) model.

      Moreover, the LSTM neural network has been used to predict stock
      price trends[11]. Compared with other machine learning methods, when
      predicting whether a particular stock will rise in the near future,
      the accuracy of the LSTM model reaches 55.9%.

      Because of the satisfactory performance of the long short-term memory
      model in time series research, this study selects the LSTM model as
      the machine learning method for sensor value prediction.

1.2   Overall aim
      Problems such as data duplication and data loss occur when sensors
      collect data, which wastes storage space and increases sensor
      consumption. Neural network value prediction makes it possible to
      reduce the frequency of sensor data collection without affecting the
      actual results, and to fill in missing data with predicted values,
      which can reduce labor costs and the cost of repeated machine
      operation.


      Currently, the hardware on which deep learning systems run requires
      large memory capacity, powerful GPUs, or strong CPU computing power.
      Neural network-based image recognition often requires server support.
      For deep learning on mobile devices, the neural network model is
      hosted directly in the cloud, and data is uploaded through the mobile
      application to obtain the prediction results. This shows that the
      application of deep learning on small IoT devices is still very
      limited.

      To understand more clearly the main difficulties encountered when
      applying deep learning neural networks to small IoT devices, and to
      explore the future joint development of small IoT devices and deep
      learning, this project aims to compare LSTM on a small IoT device
      with LSTM on large or medium-sized IoT devices, as an analysis of the
      performance of LSTM prediction experiments on different IoT devices.
      It also aims to study and evaluate the feasibility, usefulness, and
      effectiveness of the LSTM value prediction deep learning model
      applied to small IoT devices, and to provide a reference for the
      future development of sensors and neural networks.

1.3   Detailed problem statement
      This study has the following three goals:

      1. Choose suitable tools and libraries to build a deep learning
      environment.

      2. Propose and introduce the LSTM model for the univariate value
      prediction process.

      3. Use the trained LSTM model for sensor value prediction research,
      and set up control experiments on the laptop and the Raspberry Pi to
      evaluate and compare the training set loss, the testing set loss,
      and the time consumption.

1.4   Scope
      The study focuses on comparing the performance of LSTM value
      prediction experiments on large, medium, and small IoT devices. In
      the experiments, the hyperparameters are chosen to be as moderate as
      possible so that the effect of overfitting on the experimental
      results can be ignored. The prediction results on different devices
      are compared according to the loss value of the training set, the
      loss value of the verification set, and the time consumption. In the
      multivariate experiment, only the batch size and the data set
      division among the hyperparameters were selected as independent
      variables, and the other parameters were left at their default
      values. For details, see the multivariate experiments in Chapters 3
      and 4.

1.5   Outline
      Chapter 2 describes the data and basic theory used in LSTM value
      prediction research. Chapter 3 designs the method for achieving the
      value prediction goal. Chapter 4 describes the implementation of the
      design from Chapter 3. Chapter 5 summarizes the experimental results.
      Chapter 6 presents the conclusions, discusses the ethical and
      societal aspects of LSTM neural network value prediction, and
      outlines future research.

1.6   Contributions
      The work in this thesis was completed independently. I determined the
      research purpose, designed the research plan, carried out the
      research process, and analyzed the research results. All tables, flow
      charts, and data charts used in the thesis were drawn by me or
      obtained from my own experiments.


2     Theory
      This chapter introduces the basic theory used in the rest of the
      thesis. It also analyzes some related work and draws conclusions
      from it.

2.1   Sensor networks
      The development of wireless communication and electronic technology
      has promoted the development of low-power, low-cost, and
      multi-functional sensors. The sensor network is composed of many
      sensor nodes, and the sensor nodes communicate with each other to
      jointly respond to the surrounding environment or phenomenon [12].
      These tiny sensors can communicate freely within a certain range, can
      perform simple processing and calculation on local data, and can
      transmit the required data or locally calculated data to the network.

      To monitor a certain area or phenomenon, a sensor network deploys a
      large number of sensors in the area or on the surface of the
      phenomenon. These tiny sensor nodes have the following
      characteristics: they are numerous, densely deployed, prone to
      failure, subject to frequent network topology changes, and
      communicate by broadcast. Most importantly, sensor nodes are limited
      in power, computing power, and memory. The dense deployment of
      sensors makes their monitoring ranges overlap, so a large amount of
      redundant information is collected and storage space is wasted. In
      addition, problems such as fast power consumption and node failures
      can cause data loss, which in turn affects the sensor's monitoring of
      the environment or phenomena.

      The sensor network has the characteristics of easy deployment, self-
      organization, fault tolerance and wide application range. On the
      military side, sensor networks can be used for reconnaissance, targeting,
      intelligent control, and computing. In terms of health, the collected data
      can be used to monitor various indicators of patients. In terms of life,
      the sensor network can calculate the changes in the surrounding
      environment through the collected data, control the switching of
      household appliances, and regulate the temperature and humidity.
      Today, the very popular smart home technology is also inseparable
      from sensor networks.


2.2     Neural Network
        The inspiration of the neural network model comes from simulating the
        processing of external information by the human brain. Every day our
        brain processes various stimuli from the external environment to guide
        our behavior. Three subjects are involved in this process: external
        stimuli, brain, and guidance scheme.

2.2.1   Simple Neural Network

        Figure 2-1 is a schematic diagram of a simple fully connected neural
        network. It consists of an input layer, a hidden layer, and an
        output layer, which correspond to the external stimuli, the brain,
        and the guidance scheme when the human brain processes external
        information. There is only one input layer and one output layer,
        but there can be multiple hidden layers to deal with more complex
        practical problems.

        The human brain is composed of many neurons, which connect to each
        other and transmit messages through neurotransmitters. Each neuron
        in the model, drawn as a circle, represents a computing center. It
        processes the incoming data and checks whether the result of the
        activation function satisfies the passing condition. If so, the
        message is passed on to the next computing center.

                               Figure 2-1 Neural network


        The topological diagram of the neural network in Figure 2-1 shows that
        one neuron in the hidden layer will be stimulated by the three neurons
        in the input layer and decide whether to stimulate the two neurons in
        the output layer. The influence of input layer neurons on hidden layer
        neurons can be measured by weights. Whether the neurons in the
        hidden layer are activated to deliver a message to the next neuron
        depends on the calculation result of the activation function. Figure 2-2
        details the complete process of a neuron in the hidden layer of
        Figure 2-1, from being stimulated to delivering a message to the next
        neuron.

                     Figure 2-2 The calculation process of neurons
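
The calculation of a single hidden-layer neuron described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the experiments: the sigmoid is assumed as the activation function, and the input values, weights, and bias are made-up numbers.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the incoming signals, then the activation function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)

# Three input-layer neurons stimulate one hidden-layer neuron (Figure 2-1).
# All numbers are hypothetical and chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, 0.1]
bias = 0.1

activation = neuron_output(inputs, weights, bias)
print(activation)
```

The weights express how strongly each input neuron influences the hidden neuron, and the value returned by the activation function is what the neuron passes on to the next layer.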

2.2.2   Recurrent neural network

        RNN is a machine learning model for sequence data. It can
        effectively handle nonlinear time series problems. Figure 2-3 shows
        how the RNN model processes sequence data. A given sequence I (I1,
        I2, I3 ... In) is fed into the model element by element, and the
        model uses formulas (1) and (2) to iteratively calculate each
        element of sequence I, finally producing a hidden layer sequence H
        (H1, H2, H3 ... Hn) and an output layer sequence O (O1, O2, O3 ...
        On):

                    $H_t = f_a(W_{ih} I_t + W_{hh} H_{t-1} + b_h)$          (1)
                    $O_t = W_{ho} H_t + b_o$                                (2)


              (a) RNN model             (b) Hidden layer neuron structure

         Figure 2-3 Recurrent neural network model and hidden layer neuron
                                      structure

        For formulas (1) to (2), Ht represents the calculation result of the hidden
        layer, Ot represents the predicted result of the output, fa represents the
        activation function, W represents the weight, and b represents the bias.
        The order of the subscripts indicates direction: Wih represents the
        weight from the input layer I to the hidden layer H, bh represents
        the bias of the hidden layer, and so on.
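
As a minimal sketch of formulas (1) and (2), the loop below processes a sequence one element at a time. It assumes scalar inputs and a single hidden unit to keep the example short, uses tanh as the activation function fa, and the weight values are made up for illustration.

```python
import math

def rnn_forward(inputs, w_ih, w_hh, w_ho, b_h, b_o, h0=0.0):
    """Apply formulas (1) and (2) to every element of the input sequence."""
    hidden, outputs = [], []
    h_prev = h0
    for i_t in inputs:
        # Formula (1): new hidden result from the current input and the
        # previous hidden result (fa is assumed to be tanh here).
        h_t = math.tanh(w_ih * i_t + w_hh * h_prev + b_h)
        # Formula (2): output computed from the hidden result.
        o_t = w_ho * h_t + b_o
        hidden.append(h_t)
        outputs.append(o_t)
        h_prev = h_t
    return hidden, outputs

# Hypothetical sequence and parameters.
H, O = rnn_forward([0.1, 0.2, 0.3],
                   w_ih=0.5, w_hh=0.8, w_ho=1.5, b_h=0.0, b_o=0.1)
print(H)
print(O)
```

Because each hidden result depends on the previous one, the same weights Wih, Whh and Who are reused at every position of the sequence.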

        However, when the input sequence is too long, the influence of
        earlier elements on the weight updates decreases exponentially with
        time, and the gradient decays exponentially during back
        propagation; that is, problems such as vanishing and exploding
        gradients appear, which affect the accuracy of the model training
        results. In this case, the LSTM neural network model shows its
        unique advantages. The LSTM neural network has a long-term memory
        function, which largely overcomes these gradient problems and lets
        it deal effectively with time series.

2.2.3   Long and short-term memory neural network model

        To better understand the LSTM model, this section introduces the
        structure and functional steps of the LSTM model neuron. The
        training process of the LSTM model is divided into four steps [13]:
        input the time series data for forward calculation; calculate the
        error between the predicted output sequence and the true value
        sequence; use the back propagation algorithm to calculate the
        gradient of each weight; and finally, use a gradient-based
        parameter optimization algorithm to update the weights. The forward
        calculation formulas of the LSTM model and the structure of the
        LSTM neuron are as follows:

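        $\mathrm{In}_t = S(W_{xIn} X_t + W_{hIn} H_{t-1} + W_{cIn} C_{t-1} + b_{In})$          (3)

        $\mathrm{Fo}_t = S(W_{xFo} X_t + W_{hFo} H_{t-1} + W_{cFo} C_{t-1} + b_{Fo})$          (4)

        $C_t = \mathrm{Fo}_t \, C_{t-1} + \mathrm{In}_t \tanh(W_{xC} X_t + W_{hC} H_{t-1} + b_C)$   (5)

        $\mathrm{Out}_t = S(W_{xOut} X_t + W_{hOut} H_{t-1} + W_{cOut} C_t + b_{Out})$         (6)

        $H_t = Y_t = \mathrm{Out}_t \tanh(C_t)$                                                (7)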

                Figure 2-4 LSTM model neuron structure

        Similarly, given the input sequence X = (X_1, X_2, ..., X_t, ...), the
        forward calculation formulas (3) to (7) are evaluated iteratively to
        obtain the hidden layer sequence H = (H_1, H_2, ..., H_t, ...) and the
        output sequence Y = (Y_1, Y_2, ..., Y_t, ...). Figure 2-4 helps to
        clarify the meaning of the formulas. In the formulas and the structure
        diagram, In, Out, Fo, S, tanh, Z, and C respectively represent the
        input gate, output gate, forget gate, sigmoid activation function,
        tanh activation function, input module, and cell state of the neural
        network. X_t represents an element of the input sequence, H_{t-1}
        represents the calculation result of the previous hidden layer, and
        H_t represents the calculation result of the current hidden layer.
        C_{t-1} represents the cell state of the previous layer, and C_t
        represents the cell state of the current layer.
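The forward formulas (3) to (7) can be sketched for a single time step in NumPy. This is a minimal illustration, not the implementation used in the experiments: the dictionary layout, the helper names make_params and lstm_forward_step, the hidden size of 4, and the choice to apply the W_c weights element-wise are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    """Logistic function S(x) used by the three gates."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward_step(x_t, h_prev, c_prev, p):
    """Evaluate equations (3)-(7) once.

    p is a dict of weights. The W_c* weights are assumed to be vectors
    applied element-wise to the cell state (an assumption of this sketch).
    """
    in_t = sigmoid(p["W_xIn"] @ x_t + p["W_hIn"] @ h_prev
                   + p["W_cIn"] * c_prev + p["b_In"])        # (3) input gate
    fo_t = sigmoid(p["W_xFo"] @ x_t + p["W_hFo"] @ h_prev
                   + p["W_cFo"] * c_prev + p["b_Fo"])        # (4) forget gate
    z_t = np.tanh(p["W_xC"] @ x_t + p["W_hC"] @ h_prev + p["b_C"])
    c_t = fo_t * c_prev + in_t * z_t                         # (5) cell state
    out_t = sigmoid(p["W_xOut"] @ x_t + p["W_hOut"] @ h_prev
                    + p["W_cOut"] * c_t + p["b_Out"])        # (6) output gate
    h_t = out_t * np.tanh(c_t)                               # (7) hidden output
    return h_t, c_t


def make_params(n_in, n_hidden, seed=0):
    """Small random weights for illustration only."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("In", "Fo", "C", "Out"):
        p[f"W_x{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"W_h{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    for gate in ("In", "Fo", "Out"):
        p[f"W_c{gate}"] = rng.normal(scale=0.1, size=n_hidden)
    return p


# Iterate the step over a short univariate input sequence.
params = make_params(n_in=1, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for value in [0.1, 0.2, 0.3]:
    h, c = lstm_forward_step(np.array([value]), h, c, params)
```

Each call consumes one element X_t together with the previous H_{t-1} and C_{t-1}, which is the iteration described above.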

        The parameter optimization algorithm is used to update the model
        parameters. A good optimization algorithm makes the model converge
        faster and completes the parameter updates sooner. A gradient-based
        parameter optimization algorithm uses the gradient direction to move
        the weights toward the optimum. This article selects the Adam
        optimization algorithm to update the model parameters. Adam (adaptive
        moment estimation) is a gradient-based parameter optimization
        algorithm that is widely used because of its easy implementation,
        high computational efficiency, and low memory footprint.
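As an illustration of what a gradient-based update looks like, the following sketch applies the standard Adam update rule to a single scalar weight. The function name adam_step, the toy objective (w - 3)^2, and the step size of 0.1 are assumptions chosen for the example; they are not the settings used in this thesis.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight w at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # first moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)

print(round(w, 2))
```

Only two running averages per weight are stored, which is the reason for the low memory footprint mentioned above.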

2.2.4   The working process of the long short-term memory neural
        network model

        In order to better understand how the LSTM model carries out
        prediction, and how the input layer, hidden layer and output layer of
        the LSTM work together as a whole, this section introduces the LSTM
        working process in detail.

                   Figure 2-5 LSTM working process

The work of the LSTM model can be divided into two parts: network
training and network prediction. Network training produces an LSTM
prediction model whose weight parameters are fitted to the specific
experimental purpose and which predicts well. The data set is divided
into a training set and a test set. The training set is used to train
the network, and the test set is used to make network predictions that
verify how effective the trained model is and whether it underfits or
overfits. The usual basis for this judgment is the loss value calculated
by the loss function. The training work can be subdivided into original
data description and cleaning, data set division, standardization, and
data segmentation (which can also be understood as data format
conversion, that is, the standardized data sequence is divided into
tensors that can be processed directly by the LSTM).

A. Network training
Network training focuses on the hidden layer. The input sequence X of
the input layer needs to meet the data format requirements of the hidden
layer, and the format of the output sequence P of the output layer
depends on the calculation result of the hidden layer. LSTM is a special
kind of recurrent neural network (RNN). The basic principles of the two
are the same, and LSTM is a further improvement of the RNN.

Compared with the RNN model, the advantage of the LSTM model is that, in
addition to addressing the problems of vanishing and exploding
gradients, it is not necessary to fix the window length (or,
equivalently, the number of windows) in advance. In real life, long-term
data sets are large and their length cannot be estimated in advance, so
the LSTM model has more practical significance. Starting from the input
cutting process used for window prediction with an RNN, the cutting
process of the LSTM model is explored below.

First, in the input layer, define the sensor time series as t = {t_1,
t_2, ..., t_n}, and divide it into training set t_train = {t_1, t_2, ...,
t_m} and test set t_test = {t_{m+1}, t_{m+2}, ..., t_n}, satisfy the
constraint m


each window is i to m-L + i-1. The input and theoretical output of the
model are a subset of the training set, then the corresponding
theoretical output is

       Y = {Y_1, Y_2, ..., Y_L}                                        (13)

       Y_i = {T_{i+1}, T_{i+2}, ..., T_{m-L+i}}                        (14)

Input the sequence X into the hidden layer. From the recurrent neural
network described in the theoretical section, we know that the sequence
is processed by L LSTM neurons, which iteratively evaluate the forward
calculation formulas (3) to (7) to obtain the predicted sequence P. The
output of X after passing through the hidden layer can be expressed as:

       P = {P_1, P_2, ..., P_L}                                        (15)

       P_p = LSTM_forward(X_p, C_{p-1}, H_{p-1})                       (16)

C_{p-1} and H_{p-1} respectively represent the cell state and the output
of the previous neuron. LSTM_forward represents the forward calculation
formulas. It can be seen that the hidden layer output sequence P, the
model input sequence X and the theoretical output sequence Y can each be
represented by a two-dimensional tensor of shape (m-L, L). In this
experiment, the mean absolute error (MAE) is selected as the loss
function. The elements of the theoretical output are y_i and the
elements of the prediction result are p_i. The loss function is
expressed as:

       Loss = \frac{1}{(m-L)L} \sum_{i=1}^{(m-L)L} (p_i - y_i)^2       (17)

Fix the random seed so that the model results can be reproduced, and set
the number of training steps to save time. Select the gradient-based
Adam optimization algorithm to update the network weights, and finally
obtain a trained LSTM hidden layer network.
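The effect of fixing a random seed can be shown with the Python standard library alone. This is only a sketch: init_weights is a hypothetical helper, and a real training script would seed the framework it uses instead (in TensorFlow 2.x the analogous call is tf.random.set_seed).

```python
import random


def init_weights(n, seed):
    """Draw n small initial weights from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


first_run = init_weights(5, seed=42)
second_run = init_weights(5, seed=42)
other_seed = init_weights(5, seed=43)

# The same seed reproduces the same initial weights.
print(first_run == second_run)
```

With identical initial weights and identical data order, two training runs start from the same point, which makes their results comparable.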

(2) LSTM model cutting training Ttrain = {T1, T2, ..., Tm}

LSTM needs to set the window size, and the number of windows can be
calculated automatically. Set the size of the split window to s, then the
split model input is:

      X = {X_1, X_2, ..., X_{n-s}}                                    (18)

      X_i = {T_i, T_{i+1}, T_{i+2}, ..., T_{s+i-1}}                   (19)

      1 \le i \le n-s, i \in N                                        (20)

The size of each window is s, which means that there will be n-s
windows and therefore n-s prediction results. The data in window i run
from index i to s+i-1, a total of s elements. The input and theoretical
output of the model are subsets of the training set, so the
corresponding theoretical output is:

      Y = {Y_1, Y_2, ..., Y_{n-s}}                                    (21)

      Y_i = {T_{i+1}, T_{i+2}, ..., T_{s+i}}                          (22)
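The cutting rule of equations (18) to (22) can be sketched in a few lines of Python. The function name make_windows and the toy series are assumptions for the example; the indices are zero-based in the code, so window i of the text corresponds to list index i-1.

```python
def make_windows(series, s):
    """Cut a series of length n into n - s input and target windows.

    X_i = {T_i, ..., T_{s+i-1}} and Y_i = {T_{i+1}, ..., T_{s+i}}.
    """
    n = len(series)
    inputs = [series[i:i + s] for i in range(n - s)]
    targets = [series[i + 1:i + s + 1] for i in range(n - s)]
    return inputs, targets


T = [1, 2, 3, 4, 5, 6]
X, Y = make_windows(T, s=3)
print(X)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(Y)  # [[2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

Each target window is the input window shifted by one time step, so the last element of Y_i is the value that follows X_i.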

Input the sequence X into the hidden layer. From the recurrent neural
network described in the theoretical section, we know that the sequence
is processed by L LSTM neurons, which iteratively evaluate the forward
calculation formulas (3) to (7) to obtain the predicted sequence P. The
output of X after passing through the hidden layer can be expressed as:

      P = {P_1, P_2, ..., P_{n-s}}                                    (23)

      P_p = LSTM_forward(X_p, C_{p-1}, H_{p-1})                       (24)

C_{p-1} and H_{p-1} respectively represent the cell state and the output
of the previous neuron. LSTM_forward represents the forward calculation
formulas. It can be seen that the hidden layer output sequence P, the
model input sequence X and the theoretical output sequence Y can each be
represented by a two-dimensional tensor of shape (s, n-s). In this
experiment, the mean absolute error (MAE) is selected as the loss
function. The elements of the theoretical output are y_i and the
elements of the prediction result are p_i. The loss function is
expressed as:
      Loss = \frac{\sum_{i=1}^{s(n-s)} (p_i - y_i)^2}{s(n-s)}         (25)

Fix the random seed so that the model results can be reproduced, and set
the number of training steps to save time. Select the gradient-based
Adam optimization algorithm to update the network weights, and finally
obtain a trained LSTM hidden layer network.

B.   Network prediction

The network prediction part mainly consists of iterative prediction,
denormalization of the prediction results, and calculation of the error
between the prediction results and the corresponding theoretical set.
The prediction process is iterative: each prediction result is used as
the last element of the next model input sequence to predict the next
result. The process of network prediction is as follows:

In the first step, the last input sequence of the training set X is X_f =
T_f = {T_{m-n}, T_{m-n+1}, ..., T_{m-1}}, and the final theoretical output
is Y_f = {T_{m-n+1}, T_{m-n+2}, ..., T_m}. This marks the end of model
training and the start of model prediction.

In the second step, Y_f is fed into the LSTM model as an input sequence.
An output sequence P_f is obtained, and the last element of this sequence
is the first prediction result of the model: P_f = LSTM(Y_f) = {P_{m-n+2},
P_{m-n+3}, ..., P_{m+1}}, that is, the predicted value at time m+1 is
P_{m+1}. A new sequence is then formed as Y_{f+1} = {T_{m-n+2}, T_{m-n+3},
..., T_m, P_{m+1}}. This sequence is fed into the model in turn, and the
step is iterated until the length of the prediction sequence equals the
length of the test set. The prediction sequence is then P = {P_{m+1},
P_{m+2}, ..., P_n}.
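The iterative scheme of the second step can be sketched as a short loop. The function iterative_forecast and the stand-in toy_model are hypothetical: the toy model simply adds one to every element so that the behaviour of the loop is easy to follow, whereas the real experiment would call the trained LSTM at that point.

```python
def iterative_forecast(model, last_window, horizon):
    """Predict horizon values, feeding each prediction back as input."""
    window = list(last_window)
    predictions = []
    for _ in range(horizon):
        output = model(window)                # output sequence for this window
        next_value = output[-1]               # last element is the new prediction
        predictions.append(next_value)
        window = window[1:] + [next_value]    # drop oldest value, append prediction
    return predictions


def toy_model(window):
    """Hypothetical stand-in for a trained LSTM: shifts every value up by one."""
    return [value + 1 for value in window]


forecast = iterative_forecast(toy_model, last_window=[1, 2, 3], horizon=3)
print(forecast)  # [4, 5, 6]
```

Because every new input window contains earlier predictions, errors can accumulate as the horizon grows.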

The accuracy of the model is measured by the error, which represents the
difference between the predicted value and the actual value. Now that
the sequence of predicted values has been obtained, it only remains to
compare it with the corresponding set of theoretical values. However, it
should be noted that in the previous step normalization mapped the
training set data into the range 0 to 1, so each value in the current
predicted sequence also lies between 0 and 1. The prediction set is
denormalized as:

      p_i = P_i \sqrt{\frac{1}{n} \sum_{j=1}^{n} \Big(t_j - \frac{1}{n} \sum_{k=1}^{n} t_k\Big)^2} + \frac{1}{n} \sum_{k=1}^{n} t_k        (26)

      m+1 \le i \le n, i \in N
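A minimal sketch of standardization and its inverse is given below. It assumes that equation (26) multiplies each prediction by the standard deviation of the series and adds the mean, with the standard deviation computed by dividing by n; the function names and the toy values are made up for the example.

```python
import math


def fit_scaler(values):
    """Return the mean and the population standard deviation of values."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in values) / n)
    return mean, std


def standardize(values, mean, std):
    return [(v - mean) / std for v in values]


def denormalize(values, mean, std):
    """Inverse of standardize: p_i = P_i * std + mean."""
    return [v * std + mean for v in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = fit_scaler(train)
scaled = standardize(train, mean, std)
restored = denormalize(scaled, mean, std)
print(mean, std)  # 5.0 2.0
```

The statistics are computed once and reused for the predictions, so the predicted values are mapped back to the original sensor units.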

      Error calculation. The errors Error(P_train, T_train) and
      Error(P_test, T_test) of the training set and the test set are
      calculated using the loss function of equation (25).

2.3   Data circulation process in LSTM Model
      This section describes how the original data passes through the LSTM
      model, sequentially and without duplication, and explains the
      technical terms involved in the process. As shown in Figure 2-6
      below, the entire process from reading the original data from the
      file to feeding the LSTM model includes:

           1. Read the data into the buffer and divide it into batches;

           2. Standardize and segment the data of the first batch;

           3. Input the batch into the LSTM neural network;

           4. Calculate the error and the weight gradients in the backward pass;

           5. Update the weight parameters;

           6. Continue with the next batch until all data has been processed.

Figure 2-6 The whole process of data passing through the LSTM model

Some technical terms that are often used with neural networks are
introduced here: buffer, batch, batch_size, iteration, steps, and epoch.

As the name implies, the buffer is an ordinary data buffer. The original
sensor data needs to pass through the buffer before entering the LSTM
network model. When the data set is very small, only a few hundred or a
few thousand values, the buffer can read all sensor data at once. When
the data set is large, with tens of millions of values, the buffer
cannot read all the data at once, so only part of the data is read in
first. After data has been consumed and has left the buffer, the
following data immediately enters the buffer to fill the gap. The
parameter buffer_size represents the size of the buffer. As shown in the
data pipeline diagram in the following figure, the data in the buffer is
cut into batches, each batch is standardized and segmented in turn, and
the result is then sent to the LSTM model.
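The buffer and batch mechanism can be sketched with a Python generator. This is a simplification made for illustration: the buffer is refilled in whole chunks rather than element by element, and the function name batches_from_stream and the sizes used are assumptions.

```python
def batches_from_stream(stream, buffer_size, batch_size):
    """Fill a buffer from the stream, then cut the buffer into batches."""
    buffer = []
    for value in stream:
        buffer.append(value)
        if len(buffer) == buffer_size:
            for start in range(0, buffer_size, batch_size):
                yield buffer[start:start + batch_size]
            buffer = []                       # buffer is empty again, refill
    # Emit whatever is left when the stream ends.
    for start in range(0, len(buffer), batch_size):
        yield buffer[start:start + batch_size]


readings = list(range(10))
batches = list(batches_from_stream(iter(readings), buffer_size=4, batch_size=2))
print(batches)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

At no point does the generator hold more than buffer_size readings, which is what allows a large file to be processed on a device with little memory.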

                        Figure 2-7 Data pipeline

A batch can be pictured as 10 cartons of milk packed into one box: the
box is one batch of milk. batch_size is the size of a batch, so in this
example batch_size equals 10. Looking back at the working process of the
LSTM model in Figure 2-5, a batch corresponds to the training set of the
input layer, which can only enter the LSTM model for training after
standardization and data segmentation. The main purposes of batching
data are to (1) improve memory utilization and (2) increase the number
of model iterations and parameter updates, so that the model converges
better toward its optimal performance.

An iteration is one pass of a single batch through the network: the
batch performs the forward calculation through the LSTM network, and the
weight parameters of the LSTM network are then updated in the backward
pass. The number of iterations in an epoch is therefore the number of
batches processed in that epoch. Steps refers to the number of training
steps of the network and has the same meaning as iteration, so both are
called iteration in the rest of this article.

An epoch can be understood as one cycle: when all data in the training
set has passed through the model once, one epoch is complete. Assuming
that the size of the data set is S, the number of batches and the number
of iterations in one epoch satisfy the following relationship:

         Batch = Iteration = \frac{S}{batch\_size}                                   (27)

      Sometimes it is not enough to train the model once with the complete
      data set, and the data set has to be used repeatedly for several
      epochs. In this case the total number of batches processed is greater
      than the number of iterations in one epoch, while the number of
      iterations per epoch is unchanged.
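The relationship in equation (27) and its extension to several epochs can be checked with a small calculation. The batch size of 256 and the five epochs are example values only, and rounding up with ceil assumes that a final, smaller batch is kept rather than dropped.

```python
import math


def iterations_per_epoch(dataset_size, batch_size):
    """Number of batches (and therefore iterations) in one epoch."""
    return math.ceil(dataset_size / batch_size)


def total_iterations(dataset_size, batch_size, epochs):
    """Total number of batches processed over all epochs."""
    return iterations_per_epoch(dataset_size, batch_size) * epochs


per_epoch = iterations_per_epoch(100_000, 256)
total = total_iterations(100_000, 256, epochs=5)
print(per_epoch, total)  # 391 1955
```

The per-epoch count depends only on S and batch_size, while the total grows linearly with the number of epochs.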

2.4   Error estimation method
      The error estimation method is what is usually called the loss
      function. It is used to calculate the deviation of the model's
      prediction results from the actual results and thus to measure the
      model's prediction performance. Generally, the smaller the loss
      value, the higher the model's prediction accuracy and the better its
      performance; conversely, the larger the loss value, the worse the
      performance. Commonly used loss functions are Root Mean Square Error
      (RMSE), Mean Square Error (MSE), and Mean Absolute Error (MAE).

      When observing an experimental phenomenon, let the set of true values
      be [x_1, x_2, ..., x_n], the corresponding set of prediction results
      be [y_1, y_2, ..., y_n], and the error be E_i = y_i - x_i. The three
      loss functions are then:

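             RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} E_i^2}      (28)

             MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)^2 = \frac{1}{n} \sum_{i=1}^{n} E_i^2                    (29)

             MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i| = \frac{1}{n} \sum_{i=1}^{n} |E_i|                      (30)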

      RMSE is often used as a standard measure of error in machine learning
      models. MSE is the mean of the squared errors and is often used as a
      loss function. MAE is the mean of the absolute errors and reflects
      the actual size of the prediction error well. This article chooses
      MAE as the loss function of the model.
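Equations (28) to (30) translate directly into code. The sketch below uses plain Python lists; the function names and the three sample values are chosen only for illustration.

```python
import math


def mse(predicted, actual):
    """Mean square error, equation (29)."""
    return sum((y - x) ** 2 for y, x in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error, equation (28)."""
    return math.sqrt(mse(predicted, actual))


def mae(predicted, actual):
    """Mean absolute error, equation (30)."""
    return sum(abs(y - x) for y, x in zip(predicted, actual)) / len(actual)


actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mse(predicted, actual), rmse(predicted, actual), mae(predicted, actual))
```

In this example a single error of 2 gives an MSE of 4/3 but an MAE of only 2/3, which shows that the squared measures weight large errors more heavily.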

2.5   Tensor
      Tensors are containers used to store data of different dimensions in
      neural networks. A tensor can have multiple dimensions, and each
      dimension can hold multiple elements. The input sequence of the LSTM
      neural network is usually a two-dimensional tensor of shape (samples,
      features). If an input time series contains 100 samples, and each
      sample records the values of feature i and feature j at that time,
      then the shape of the tensor for this time series is (100, 2).

                                      Table 2-1 Tensor table

       Rank   Example              Python output
       0      Scalar               S=234
       1      Vector               V=[32.1,25.2,3.3]
       2      Matrix               M=[[1,2,3],[4,5,6],[7,8,9]]
       3      3rd order tensor     T=[[[1],[2],[3]],[[4],[5],[6]],[[7],[8],[9]]]
       n      nth order tensor     ......

      TensorFlow represents data as tensors. A tensor has three key
      attributes: the number of axes, the shape, and the data type. The
      number of axes is also called the rank: a tensor with 0 axes is
      0-dimensional, a tensor with 1 axis is 1-dimensional, a tensor with 2
      axes is 2-dimensional, and so on. Each axis can hold multiple
      elements. The data type is the easiest attribute to understand;
      examples are float32, uint8 and float64. The third attribute is the
      shape. Knowing the flat array of values and the shape is enough to
      restore the tensor, and the shape of a tensor can be obtained with
      the shape() function. The shape has one entry per axis, and each
      entry gives the number of elements along that axis. For example,
      shape = (2,3,2) describes a three-dimensional tensor with two
      elements along the first axis, three along the second, and two along
      the third. The tensor [[[1,2], [3,4], [5,6]], [[7,8], [9,10],
      [11,12]]] satisfies these conditions. The reshape() function can
      rearrange an array into a tensor of any compatible shape. If you
      define an array a = np.array([1,2,3,4,5,6,7,8,9,10,11,12]), then
      a = np.reshape(a, (2,3,-1)) gives [[[1,2], [3,4], [5,6]], [[7,8],
      [9,10], [11,12]]]. The -1 stands for the size of the last axis, which
      is inferred automatically. Accessing a two-dimensional tensor is also
      simple and works like accessing a two-dimensional array. For example,
      t = [[1,2,3], [4,5,6], [7,8,9]] is a second-order tensor, and the
      statement t[i, j] accesses any element in it; with zero-based
      indices, t[1,2] returns 6. Similarly, for third-order tensors,
      t[i, j, k] can be used to access any element.
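The shape, reshape and indexing operations described in this section can be tried out with NumPy arrays, which behave like the tensors discussed here. The variable names are chosen for the example.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# -1 lets NumPy infer the size of the last axis: 12 / (2 * 3) = 2.
t3 = np.reshape(a, (2, 3, -1))
print(t3.ndim, t3.shape)   # 3 (2, 3, 2)

# A second-order tensor and zero-based element access.
t2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(t2[1, 2])            # 6

# Third-order access with three indices.
print(t3[1, 0, 1])         # 8
```

The number of axes is available as ndim, the shape as shape, and the data type as dtype on every NumPy array.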

2.6     Related research work
        In this paper, the LSTM model is used to model and predict industrial
        sensor data. The data set is composed of the sensor values collected
        by multiple small sensors from 2016/2/18 12:28:34 to 2016/2/18
        15:20:19. It contains the values of multiple sensors over a continuous
        period of 2 hours, 51 minutes and 45 seconds, with one sensor value
        collected every 0.1 seconds. The nominal number of sensor readings is
        100,000. The sensor history is transformed into multiple input
        sequences, each with a length of 20 historical values, and the 21st
        value (that is, the value 0.1 seconds later) is stored as an element
        of the label array and used as the theoretical output of the
        prediction. Each sequence has only one feature, namely the value the
        current sensor recorded for its surrounding environment. The
        historical data is split into training and test sets at ratios of
        8:2 and 7:3 respectively to fit the LSTM model, the ratio with the
        best performance is selected for the Raspberry Pi and the laptop, and
        their prediction results are compared.
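The conversion into input sequences of length 20 with the 21st value as label, followed by an 8:2 split, can be sketched as follows. The function names, the integer toy series, and the choice to split in time order without shuffling are assumptions made for the example.

```python
def make_samples(series, window=20):
    """Return input sequences of length window and the value that follows each."""
    count = len(series) - window
    inputs = [series[i:i + window] for i in range(count)]
    labels = [series[i + window] for i in range(count)]
    return inputs, labels


def split(inputs, labels, ratio=(8, 2)):
    """Split samples in time order according to an integer ratio."""
    cut = len(inputs) * ratio[0] // (ratio[0] + ratio[1])
    return (inputs[:cut], labels[:cut]), (inputs[cut:], labels[cut:])


series = list(range(100))                 # stand-in for one sensor's readings
inputs, labels = make_samples(series, window=20)
(train_x, train_y), (test_x, test_y) = split(inputs, labels, ratio=(8, 2))
print(len(train_x), len(test_x))          # 64 16
```

Calling split with ratio=(7, 3) gives the second division mentioned above.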

2.6.1    LSTM-based stock returns prediction method: Take the Chinese
        stock market as an example

        The study used the LSTM model to predict China's stock returns,
        collecting the daily highs, lows, opening and closing records, and the
        Shanghai Composite Index of the Shanghai and Shenzhen stock markets
        from 1990 to 2015 [14]. The training set and test set were divided at
        a ratio of 4:1, and five sets of control experiments were set up. The
        results give preliminary evidence that LSTM value prediction is
        effective for the Chinese stock market. What that study has in common
        with the LSTM sensor value prediction experiment in this article is
        that the data is divided at a 4:1 ratio and control experiments are
        set up; the difference is that the stock prediction experiment is a
        multivariate value prediction and sets up more control experiments.

2.6.2   Travel time prediction of LSTM neural network

        Travel time is one of the most important concerns for travelers. It
        is the time spent from the beginning to the end of a completed
        journey, and once recorded it becomes historical data. Predicting the
        travel time of the next trip from historical times provides important
        guidance for planning visits to attractions and for choosing roads,
        and can help save time. The LSTM travel time prediction study
        constructed 66 LSTM neural network sequence prediction models for a
        highway data set consisting of 66 links [15], and selected the optimal
        settings for each model through training and testing. In addition,
        for the time spent on each link, the travel time prediction model
        performs multi-step prediction for 1 to 5 time steps into the future.

        The LSTM travel time prediction study is similar to this article in
        that both make an effort to select the model parameter combination
        with the best performance. The difference is that the study does not
        only predict 1 time step into the future but also carries out
        multi-step prediction experiments, from which it concludes that the
        longer the prediction horizon, the greater the error.

2.6.3   Development and application of deep neural network soft sensor
        based on LSTM

        With the popularity of wearable devices, the comfort and elasticity
        of soft sensors have become the main factors to consider when
        designing sensors for the human body or for clothing. Soft sensors can
        accurately measure quality variables or important process variables
        [17]. These key quality variables usually have dynamic and non-linear
        characteristics and are sometimes difficult to measure. Applying the
        LSTM model to soft sensors [16] makes it possible to measure variables
        with strong nonlinearity and dynamics, and the method is especially
        suitable for dynamic soft sensor modeling. This is very similar to the
        design concept of the research method in this article. Industrial
        sensors produce a large amount of data. Combining long short-term
        memory neural networks with sensors makes it possible not only to
        estimate lost data but also to predict complex data and thereby
        reduce costs.

2.6.4   Analysis of industrial IoT devices based on LSTM

The research starts from modeling and predicting the operating state of
equipment using historical data of the Industrial Internet of Things,
and proposes a method [18] that uses long short-term memory, the LSTM
model, to predict the operating state of equipment. The similarity
between that study and this article is that both use the LSTM model to
analyze the status of industrial Internet of Things devices, and both
set up a control group. The difference is that the cited study
demonstrates the superiority of LSTM for value prediction from the
differences between the compared results, whereas this article starts
from the good value prediction performance of LSTM and aims to find the
similarity of the prediction results of LSTM models on large and small
IoT devices, showing the prospects of LSTM models applied to small IoT
devices.

3       Methodology
        The experiment is divided into three parts, as shown in Figure 3-1.
        The first part introduces the environment in which the model runs and
        the tools used. The second part describes the LSTM neural network
        univariate prediction process, including the data set and data
        preprocessing. In the third part, the experimental data set is fed
        into the trained LSTM model and the prediction results obtained on
        the Raspberry Pi and on the laptop are compared.

                        Figure 3-1 Research steps and objectives

3.1     Environment and tools

        The first goal is to find tools and libraries suitable for
        configuring deep learning networks on laptops and on the Raspberry
        Pi. A review of the available information identified two Python
        development platforms for building neural networks: the PyCharm
        integrated development environment and the free Python distribution
        Anaconda. The installation and configuration steps of both were
        worked out, both were installed on the experimental equipment, and
        the better environment configuration was chosen according to how the
        installation went.

3.1.1   Pycharm

        Building a deep learning framework in PyCharm takes four steps. First,
        install Python 3.0 or later. Second, install PyCharm and select the
        Python installed in the previous step as the interpreter of the
        automatically created PyCharm virtual environment. Third, install
        TensorFlow and Keras. TensorFlow serves as the back-end library for
        Keras and must be installed before Keras can be used to develop and
        train deep learning models. You can use the statement import
        tensorflow as tf to verify that TensorFlow has been installed
        successfully. If no error is reported, the installation was
        successful. If an error is reported, check whether there is a version
        mismatch. Finally, install the library packages needed for the
        experiment, such as numpy and matplotlib. Whether these packages
        install successfully affects the success of the experiment.
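The installation check described above can be extended to all required packages in one script. The sketch below only tests whether each package can be found, without importing it; the helper name check_packages and the list of package names are assumptions for the example.

```python
import importlib.util


def check_packages(names):
    """Map each package name to True if it can be found, False otherwise."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(["tensorflow", "keras", "numpy", "matplotlib"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as installed can still fail on import if its version does not match the others, so running import tensorflow as tf remains the decisive test.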

3.1.2       Anaconda

        As a Python distribution, Anaconda provides package management and
        environment management. Anaconda ships with many versions of the
        packages and libraries needed to build neural networks, such as
        numpy, conda, python, and pandas. It matches the versions of the
        various packages and libraries automatically, which largely avoids
        environment setup errors caused by version mismatches. The
        environment management function supports the creation of separate
        virtual environments for different projects. Jupyter Notebook, a
        web-based application for interactive computing, covers code editing,
        running, result display and saving in the browser, and it supports
        dozens of programming languages such as Python, R and Julia. The
        setup steps are: first install Anaconda; after successful
        installation, create a virtual environment and install TensorFlow,
        Keras, and the other required packages; after the installation is
        complete, open the Jupyter interactive notebook to write code.

3.2         LSTM univariate value prediction process
        The second goal of this experiment is to describe the process of
        univariate value prediction with the LSTM model. To achieve this
        goal, this section describes the entire process of LSTM neural
        network value prediction. The forecasting process is divided into
        four steps: data set introduction, data preprocessing, data
        segmentation, and parameter tuning through multivariate experiments.

                   Figure 3-2 LSTM univariate value prediction process

        As shown in Figure 3-2 above, the univariate value prediction
        process of this experiment starts by introducing the sensor data set.
        The data is then preprocessed, which includes data cleaning and data
        standardization. Next comes data segmentation, which produces a
        tensor of input samples and an array of labels. Finally, multi-variable
        experiments are used to select the parameter combination that gives
        the best model performance, and that performance is evaluated.
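
The standardization and segmentation steps can be sketched as follows. This is an illustrative sketch, not the code used in the experiment: it assumes z-score standardization, a sliding window whose length is chosen freely, and the (samples, time steps, features) input layout that Keras LSTM layers commonly expect. The function names standardize and make_windows are hypothetical.

```python
import numpy as np


def standardize(values):
    """Z-score standardization (assumed variant); returns data, mean, and std."""
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    return (values - mean) / std, mean, std


def make_windows(series, window):
    """Split a 1-D series into input windows and next-step labels.

    Returns X with shape (samples, window, 1) and y with shape (samples,).
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    y = series[window:]
    return X[..., np.newaxis], y


# Toy series standing in for one sensor's readings.
toy = np.arange(10, dtype=float)
scaled, mean, std = standardize(toy)
X, y = make_windows(scaled, window=3)
print(X.shape, y.shape)
```

Each label is the value that immediately follows its window, so the model learns a one-step-ahead prediction. The mean and standard deviation are returned so that predictions can be mapped back to the original scale.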

3.2.1   Data set

        The data used in the experiment comes from actual industrial sensor
        data. From 2016/2/18 12:28:34 to 2016/2/18 15:20:19, multiple small
        sensors recorded values continuously over a period of 2 hours,
        51 minutes and 45 seconds. The nominal number of values per sensor
        is 100,000, with one value collected every 0.1 seconds. All sensor
        data is stored in the file industrial_sensor_data.csv in the form of
        numbers and text.
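
The recording span quoted above can be checked with a few lines of arithmetic. The sketch below uses only the timestamps and the nominal sample count stated in the text; the variable names are illustrative.

```python
from datetime import datetime

# Start and end of the recording as stated in the text.
start = datetime(2016, 2, 18, 12, 28, 34)
end = datetime(2016, 2, 18, 15, 20, 19)

span = end - start
span_seconds = span.total_seconds()

# Nominal number of values per sensor, as stated in the text.
nominal_samples = 100_000
mean_spacing = span_seconds / nominal_samples

print(span)          # 2:51:45
print(span_seconds)  # 10305.0
print(mean_spacing)  # about 0.103 seconds per value
```

The span matches the stated 2 hours, 51 minutes and 45 seconds, and the resulting average spacing is close to the nominal 0.1 seconds.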

                         Figure 3-3 Industrial_sensor_data.csv

        In the figure above, the first row gives the name of each sensor,
        and each column contains the time series of values recorded by that
        sensor. During recording, a sensor may be affected by extreme
        weather, emergencies, man-made damage, and other problems,
        which lead to missing and redundant data values. Because this
        experiment is a univariate time series forecast, it only needs the
        data of one sensor, selected from all the sensors, for prediction.

3.2.2   Data preprocessing

        The first step in data preprocessing is data cleaning. Data cleaning
        mainly consists of selecting a suitable sensor and its data from the
        large number of sensors, and handling redundant and duplicate
        data. The sensor most suitable for the experiment is chosen, and its
        values are checked for problems. Missing values can be filled in
        manually, using the average of the values immediately before and
        after the gap; redundant and abnormal values can be deleted. In this
        experiment, sensors with many redundant values or many missing
        values were excluded from the original data, and the sensor named
        "C01424REG403RW" was finally selected. Its timestamps and data
        values were saved to a separate CSV file named Sensor_values1.csv,
        as shown below.


                      Figure 3-4 Sensor-value1.csv
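
The cleaning rules described above can also be applied programmatically rather than by hand. The following is a minimal sketch under stated assumptions: the timestamp column is called "Time" (the real column name is not given in the text), the toy values are invented, and the helper names are hypothetical. Only the sensor name and the neighbour-average rule come from the text.

```python
import pandas as pd


def fill_with_neighbour_mean(series):
    """Fill each gap with the mean of the nearest valid values before and after.

    Existing values are unchanged; a gap at the very start or end stays NaN
    because it has only one neighbour.
    """
    return (series.ffill() + series.bfill()) / 2


def extract_sensor(frame, time_col, sensor_col):
    """Keep one sensor, drop rows with duplicate timestamps, fill gaps."""
    out = frame[[time_col, sensor_col]].drop_duplicates(subset=time_col).copy()
    out[sensor_col] = fill_with_neighbour_mean(out[sensor_col])
    return out.reset_index(drop=True)


# Invented toy data: one duplicated timestamp and one missing value.
raw = pd.DataFrame({
    "Time": ["12:28:34.0", "12:28:34.1", "12:28:34.1", "12:28:34.2"],
    "C01424REG403RW": [1.0, None, None, 3.0],
    "OTHER_SENSOR": [5.0, 6.0, 6.0, 7.0],
})
clean = extract_sensor(raw, "Time", "C01424REG403RW")
print(clean)
# To save the result as in the text: clean.to_csv("Sensor_values1.csv", index=False)
```

Averaging the forward-filled and backward-filled series leaves existing values untouched, so the same expression handles both the gaps and the valid readings.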

Pandas is a software library for data processing and analysis. It provides
a variety of high-level data structures with powerful indexing and
processing capabilities. pandas.read_csv(filepath_or_buffer) reads a file
in CSV format directly and returns a DataFrame object;
filepath_or_buffer gives the path of the file. The DataFrame, one of the
main data objects of pandas, can be translated as "data frame". It is a
two-dimensional tabular structure, similar to an Excel spreadsheet. Its
vertical series are called columns, and its rows are identified by the
index. The position of an element value is determined by its column
and its index. The pandas.DataFrame.head function returns the first n
rows of a DataFrame, where n can be chosen freely and defaults to
five.
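
A short sketch of these calls is given below. It reads a small in-memory CSV instead of the real file so that it runs on its own; the "Time" header and all values are invented for illustration, and only the sensor name comes from the text.

```python
import io

import pandas as pd

# Invented sample standing in for the sensor CSV file; one value is missing.
csv_text = (
    "Time,C01424REG403RW\n"
    "2016/2/18 12:28:34,1.5\n"
    "2016/2/18 12:28:34,1.6\n"
    "2016/2/18 12:28:34,\n"
    "2016/2/18 12:28:35,1.8\n"
)

# read_csv accepts a file path or a file-like buffer and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(3)          # first three rows; head() alone returns five
null_counts = df.isnull().sum()  # number of missing values per column

print(first_rows)
print(null_counts)
```

With the real data, the buffer would be replaced by the path to the CSV file, and a non-zero entry in null_counts would indicate values that still need to be filled.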

 Figure 3-5 DataFrame data object        Figure 3-6 Check for null values

First, the pandas.read_csv function reads the CSV file and returns a
DataFrame object. The pandas.DataFrame.head function is then used
to display the contents of the data object. As shown in Figure 3-5, the
row index ranges from 0 to 99998, and the object consists of two
columns with a total of 99999 rows. The first column holds the
timestamp, and the second column holds the data values recorded by
the sensor named "C01424REG403RW". The file contains a total of 99999 data values,
