Universidad Politécnica de Madrid Escuela Técnica Superior de Ingenieros de Telecomunicación DESIGN AND IMPLEMENTATION OF A DATA AGGREGATION SYSTEM FOR THE 5G NETWORK MANAGEMENT PLANE TRABAJO FIN DE MÁSTER José Luis Mendoza Sánchez 2021
Universidad Politécnica de Madrid Escuela Técnica Superior de Ingenieros de Telecomunicación Máster Universitario en Ingeniería de Redes y Servicios Telemáticos TRABAJO FIN DE MÁSTER DESIGN AND IMPLEMENTATION OF A DATA AGGREGATION SYSTEM FOR THE 5G NETWORK MANAGEMENT PLANE Autor José Luis Mendoza Sánchez Tutor Ignacio Domínguez Martínez-Casanueva Departamento de Ingeniería de Sistemas Telemáticos 2021
Resumen Las redes 5G han provocado un aumento de la complejidad del plano de gestión. Para lograr una gestión inteligente y eficiente de las redes 5G, se necesitan nuevos mecanismos de monitorización. El principal problema es el enorme número de fuentes de datos que existen y su heterogeneidad, cada una con sus propios mecanismos y formatos para representar la información. Utilizaremos el estándar ETSI de gestión de información de contexto (ETSI ISG CIM), que permitirá una alta interoperabilidad con otros sistemas de monitorización. En este Trabajo de Fin de Máster, proponemos una arquitectura genérica que permite agregar y unificar todos estos datos para distribuirlos entre los suscriptores interesados con diferentes propósitos (por ejemplo: análisis de datos, visualización…). Hemos desarrollado dos prototipos que implementan dicha arquitectura: uno de ellos utiliza un NGSI-LD Context Broker del estándar ETSI ISG CIM, y el otro utiliza Apache Kafka como bróker de distribución, una plataforma de intermediación de mensajes. Hemos realizado un experimento con los dos prototipos para compararlos y comprobar cuál se adapta mejor a un escenario de monitorización de red. Los resultados han demostrado que, mientras que los Context Brokers de NGSI-LD son viables en escenarios con poca carga de datos, la solución de Kafka se ajusta mejor en escenarios con alta carga de datos, como podría ser el caso de la monitorización de redes. Finalmente, hemos desarrollado otra implementación de la arquitectura propuesta para ser utilizada en un escenario de uso real de la monitorización de la red 5G. Para el desarrollo de todas las soluciones mencionadas, utilizaremos Apache NiFi, una herramienta de ingeniería de datos. Palabras clave: ETSI ISG Context Information Management, NGSI-LD, monitorización de redes 5G, Orion-LD, Scorpio, Apache Nifi i
Abstract
5G networks have led to an increase in the complexity of the management plane. In order to achieve smart and efficient management of 5G networks, new monitoring mechanisms are required. The main problem is the huge number of data sources that exist and their heterogeneity, each with its own mechanisms and formats to represent the information. We will leverage the ETSI ISG Context Information Management (CIM) standard, which will enable high interoperability with other monitoring systems. In this work, we propose a generic architecture which allows us to aggregate and unify all this data and distribute it among interested subscribers with different purposes (e.g., data analysis, visualization). We have developed two prototypes that implement the aforementioned architecture: one of them uses an NGSI-LD Context Broker from the ETSI ISG CIM standard, and the other uses Apache Kafka, an event streaming platform, as the distribution broker. We have conducted an experiment involving the two prototypes to compare them and check which fits better in a network monitoring scenario. The results have shown that, while NGSI-LD Context Brokers are feasible in scenarios with low data loads, the Kafka solution fits better in scenarios with high data loads, as is the case in network monitoring. Finally, we have developed another implementation of the proposed architecture to be used in a real use case of 5G network monitoring. For the development of all the mentioned solutions, we use Apache NiFi, a data engineering tool.
Keywords: ETSI ISG Context Information Management, NGSI-LD, 5G network monitoring, Orion-LD, Scorpio, Apache NiFi
Table of Contents
Resumen ........................................................................ i
Abstract ......................................................................... iii
Table of Contents .......................................................... v
List of Figures ................................................................ vii
List of Acronyms ........................................................... ix
1 Introduction ................................................................ 1
1.1 Objectives of the project ......................................... 3
1.2 Structure of this document ..................................... 3
2 State of the art ............................................................. 5
2.1 Context Information Management ........................ 5
2.2 Prometheus ............................................................... 10
2.3 Apache NiFi .............................................................. 12
2.3.1 The ETL procedure ............................................... 13
2.3.2 Core components .................................................. 15
2.3.3 Record-oriented data ............................................ 17
2.4 Apache Kafka ............................................................ 18
2.4.1 Concepts ................................................................. 19
2.4.2 Architecture ........................................................... 20
2.5 Apache Avro ............................................................. 21
3 The monitoring data aggregation system ................. 23
4 Experiments and results ............................................. 27
4.1 The systems under test ............................................ 27
4.1.1 The NGSI-LD adapter .......................................... 27
4.1.2 The Kafka adapter ................................................ 32
4.2 Experiments .............................................................. 34
4.3 Results ....................................................................... 35
4.4 Experiment conclusions .......................................... 38
5 Real use case ................................................................ 39
6 Conclusions and future work ..................................... 43
6.1 Conclusions ............................................................... 43
6.2 Future work .............................................................. 44
Bibliography .................................................................... 47
APPENDIX A: ETHICAL, ECONOMIC, SOCIAL AND ENVIRONMENTAL ASPECTS ........ 51
A.1 INTRODUCTION .................................................... 51
A.2 DESCRIPTION OF RELEVANT IMPACTS RELATED TO THE PROJECT ................. 51
A.3 DETAILED ANALYSIS OF SOME OF THE MAIN IMPACTS ..................................... 52
A.4 CONCLUSIONS ....................................................... 53
APPENDIX B: ECONOMIC BUDGET .......................... 55
List of Figures
Figure 2-1. Central Broker architecture .............................................................. 7
Figure 2-2. Distributed Broker architecture ....................................................... 7
Figure 2-3. NGSI-LD Information Model layers ............................................... 8
Figure 2-4. NGSI-LD Data Model example use case ........................................ 9
Figure 2-5. Prometheus common architecture .................................................. 11
Figure 2-6. GUI of Apache NiFi .......................................................................... 13
Figure 2-7. Traditional scenario (ETL) vs modern world techniques (ELT) ... 14
Figure 2-8. Architecture of Apache NiFi ............................................................ 16
Figure 2-9. Clients publishing events to the same topic. .................................. 20
Figure 2-10. Topic-partition closer look. ............................................................ 20
Figure 2-11. Architecture of Apache Kafka ....................................................... 21
Figure 3-1. Generic architecture for a monitoring data aggregation system . 23
Figure 3-2. Prometheus source NGSI-LD data model ...................................... 24
Figure 3-3. Example of node_cpu_seconds_total metric .................................. 25
Figure 4-1. The NGSI-LD adapter ....................................................................... 28
Figure 4-2. TimeSeries NGSI-LD data model. ................................................... 29
Figure 4-3. Parsed example of node_cpu_seconds_total metric ...................... 29
Figure 4-4. Flow chart of the NGSI-LD Producer ............................................. 30
Figure 4-5. Subscription content. ........................................................................ 31
Figure 4-6. Notification content. ......................................................................... 31
Figure 4-7. The Kafka adapter ............................................................................. 32
Figure 4-8. Avro schema of the Kafka-Avro adapter. ....................................... 33
Figure 4-9. Experiment results for latency measurements ............................... 35
Figure 4-10. Experiment results for CPU / RAM measurements: Context Brokers .. 36
Figure 4-11. Experiment results for CPU / RAM measurements: Kafka Broker ...... 37
Figure 5-1. Real use case monitoring data aggregation system flow chart. .... 40
Figure 5-2. Example of Grafana dashboard: average idle seconds for CPU_0. ... 41
List of Acronyms
API - Application Programming Interface
ASN.1 - Abstract Syntax Notation One
AWS - Amazon Web Services
BI - Business Intelligence
BIDW - Business Intelligence and Data Warehousing
C3IM - Cross-cutting Context Information Management
ELT - Extract, Load, and Transform
ETL - Extract, Transform, and Load
ETSI - European Telecommunications Standards Institute
ETSI ISG - ETSI Industry Specification Group
ETSI ISG CIM - Cross-cutting Context Information Management
FBP - Flow-Based Programming
GUI - Graphical User Interface
IoT - Internet of Things
JVM - Java Virtual Machine
NCM - NiFi Cluster Manager
NOC - Network Operations Center
OLAP - On-Line Analytical Processing
OLTP - On-Line Transaction Processing
OS - Operating System
RPC - Remote Procedure Call
SDN - Software Defined Network
UI - User Interface
VNF - Virtual Network Functions
1 Introduction
Telecommunications networks have always presented a big challenge for their operators, who must track and monitor them in order to detect anomalies, failures or even cyberattacks. This challenge has grown exponentially in the case of the 5G network due to its particularities: virtualization, the increasing need for automation, and the variety of domains that it covers in comparison with older mobile technologies. We are going to comment on each of these particularities.
• The 5G network has been developed considering the advantages of cloud computing technology in terms of scalability, reliability, elasticity, and economics. This combination materializes in the virtualization of the network functions, giving rise to Virtual Network Functions (VNF), and of most of the nodes of the network, giving rise to Software Defined Networks (SDN). These new elements no longer require highly specialized (and therefore expensive) hardware, but general-purpose servers able to run software for the virtualization of the network.
• The adoption of virtualized environments has led to a new need to deploy services on demand. The manual configuration of these new services is a specialized, time-consuming and error-prone task. Network management automation is focused on the creation of automated mechanisms for the deployment and tuning of these services.
• The 5G network has a broad range of monitoring data sources, each with its own characteristics: data models, access mechanisms, etc. This variety translates directly into a huge heterogeneity that we have to overcome. To this end, it is important to collect the metadata of a data source (its context) to capture its characteristics at the same time as we collect its monitoring data.
So, we have a challenge to solve: the monitoring of the 5G cloud-based network, taking into account all its particularities and new challenges. When a new problem like this one is presented, a typical approach is to look at other technological fields and find similar problems. Following this advice, we are going to inspect three areas, namely DevOps, Data Engineering and IoT.
• The software running on top of the 5G network servers often consists of containers orchestrated by a tool like Kubernetes, OpenStack, or Docker Swarm.
The monitoring of these containers is a challenge deeply studied in the DevOps area. DevOps engineers face a problem similar to ours when their web application (one of the numerous work areas of DevOps nowadays), which consists of a great number of containerized micro-services, fails and they have to track the state of its containers to locate the problem.
• For its part, in the Data Engineering field, we find that the extraction, transformation and loading of large amounts of data is handled by Big Data architectures, for which a handful of tools are used. Monitoring data can also be handled by these architectures so that it benefits from the advantages of Big Data, such as scalability and elasticity.
• In the IoT field, we find that ETSI has created a framework for the sharing of important information between heterogeneous IoT devices, the "Cross-cutting Context Information Management" framework (the ETSI ISG CIM). This framework is used for the collection of the metadata, as well as the data itself, from the data source. In the ETSI ISG CIM, the management of these two is done at the same level. This is very helpful in the area of IoT, where the context (e.g., device, building entities) is mixed with the actual monitoring data (e.g., sensor measurements). In network infrastructures this mixture is not feasible due to the huge amount of data compared to IoT and the complex structure of the monitoring data (it is not just sensor measurements). In any case, the interoperability of the framework is very interesting for our problem, and that is why we are going to leverage the standard and prove its compatibility with the management of the network.
This project has been carried out under the umbrella of the 5GROWTH project (reference [1]). This European project is coordinated by Carlos J. Bernardos from Universidad Carlos III de Madrid in partnership with other educational institutions and private companies, and it aims to validate 5G technology both technically and economically from the verticals' point of view. Its mission is to empower vertical industries, such as Industry 4.0, Transportation, and Energy, with 5G end-to-end solutions.
In this thesis, we are going to design a generic architecture for a network monitoring system. This system will consist of a Data Source, from which we will collect monitoring data (e.g., metrics, logs), a data pipeline to process the data, a Broker (the intermediary) and a Consumer which will receive the monitoring data. To this end, we will be using state-of-the-art technologies brought from the DevOps, Data Engineering and IoT fields.
In fact, after proposing the reference architecture, we will compare two specific implementations of such architecture to decide which one adapts better to the particularities of the 5G network. Finally, we will create a last system to be used in a real use case scenario of 5G network monitoring. In the following subsections we provide more information about the purpose of this project and the structure of this document.
1.1 Objectives of the project
As introduced earlier, the purposes of this thesis are:
• Objective 1: To detail the enabling technologies that we can use for the monitoring of the 5G network.
• Objective 2: To design a generic architecture behind a monitoring data aggregation solution for the monitoring of the 5G network.
• Objective 3: To propose and develop distinct implementations of this architecture that use different enabling technologies, compare them, and select which one is better in terms of efficiency and latency.
• Objective 4: To develop a real use case scenario that can help us in our purpose of monitoring the 5G network.
1.2 Structure of this document
This document has 6 chapters and 2 appendices, which are:
• Chapter 1 – Introduction: the chapter in which we are now. It details the motivation of the project and its purpose, as well as introducing the rest of the document.
• Chapter 2 – State of the art: in this chapter we will describe the technologies that we mention in the rest of the document and use in the project itself.
• Chapter 3 – The monitoring data aggregation system: in this chapter we will explain the idea behind a data aggregation system to enable the monitoring of the 5G network.
• Chapter 4 – Experiments and results: we will describe the conducted experiments, the prototypes we have developed and the methodology followed. We will also show the results and explain our conclusions.
• Chapter 5 – Real use case: in this chapter we will depict a new system, an extension of a previously described one, that could fit in a real use case scenario for the monitoring of the 5G network.
• Chapter 6 – Conclusions and future work: we will review the fulfilment of the objectives stated in Chapter 1, give the thesis a final conclusion and introduce future lines of work.
• Appendix A – Ethical, economic, social, and environmental aspects: we will analyze the impact of this project.
• Appendix B – Economic budget: the total cost of the project.
2 State of the art
In this chapter we will discuss the enabling technologies that have made the execution of this project possible. We will take a closer look at the IoT ecosystem and its Context Information Management. We will also provide details about Prometheus, a broadly used monitoring tool. Finally, we will inspect some of the most used Data Engineering tools, namely Apache NiFi, Apache Kafka, and Apache Avro.
2.1 Context Information Management1
In the last decade, we have seen an explosion in the number of devices that surround us, a handful of which have become a technology trend all by themselves. That is the case of the IoT ("Internet of Things"), whose use cases range from the digitizing of industrial processes to the creation of smart services for citizens. But just because a thing is connected to the Internet does not make it IoT, and the optimal management of these devices has become a new necessity. Along these lines, the Cross-cutting Context Information Management framework (the ETSI ISG CIM framework, or just C3IM2) was born, which adopts a common way to collect, process and distribute Context Information from different devices. This management purpose is fulfilled by a C3IM platform, which collects data from heterogeneous devices and provides it as consolidated context information to applications via an API. The information described and the diagrams displayed in this section about C3IM and the associated protocol have been taken from references [2], [3], [4], [5] and [6].
We can define the context of an application as all the relevant aspects of its operating environment that are required for it to work properly. In the ETSI ISG CIM framework, context information is any relevant information about entities, their properties (temperature, location, or any other such parameter), and their relationships with other entities. Entities may be representations of real-world objects but may also be more abstract notions such as a legal entity, corporation, nation state, or groups of entities. Therefore, an example of context information may be the temperature measured by a thermometer in a given office or the number of packets transmitted per hour from one 5G node to another.
1 If the reader is fluent in Spanish, there is a video in which I explain Context Information Management, NGSI-LD, and Linked Data: https://youtu.be/VYbyT3eRKIk
2 As stated in reference [4], the abbreviation is C3IM and not CIM to avoid confusion with other acronyms such as the ISO/IEC Common Information Model.
Context information is exchanged among applications, Context Producers, Context Sources, Context Consumers and Context Brokers following a publish-subscribe pattern. We are going to state the differences between these components:
• Context Broker: in the C3IM framework, a broker is a mediator between context consumers and producers. For that purpose, it exposes an HTTP REST API named "ETSI NGSI-LD API" for Context Producers to send their data and for Context Consumers to subscribe to notifications. There are, at the time of publication of this document, three different Context Broker implementations3 supporting the ETSI NGSI-LD API, namely the Orion-LD Context Broker4, the Scorpio Broker5 and the Stellio Context Broker6.
• Context Producer: a system that delivers context information to Context Brokers. A specific implementation of a Context Producer can be, for example, a sensor gateway. The data that the original devices produce often needs some processing (e.g., a format change) before being sent to the Context Broker.
• Context Source: while a Context Producer must actively send its information to a Context Broker, a Context Source is a more complex system which can be consulted about the context information it collects. The consulting mechanism implies that the Context Source registers itself in a component called the NGSI-LD Registry, providing the types of Entities, Properties, and Relationships about which the Context Source can be queried. For example, a database can be registered as a Context Source that can provide the CPU usage of a set of servers placed in a datacenter.
• Context Consumer: any application interested in some context information collected (via Context Producers) or accessible (via Context Sources) by the Context Broker. A Context Consumer usually creates a subscription on the Context Broker so that it gets notified in case of an update on the context information it is interested in.
The arrangement of these components leads us to the three main architectures that can be observed in any C3IM Platform:
3 Full catalogue can be found in: https://www.fiware.org/developers/catalogue/
4 Orion-LD Context Broker repository in GitHub: https://github.com/FIWARE/context.Orion-LD
5 Scorpio Broker repository in GitHub: https://github.com/ScorpioBroker/ScorpioBroker
6 Stellio Context Broker repository in GitHub: https://github.com/stellio-hub/stellio-context-broker
• Central Broker: in this architecture, a single Context Broker is placed as the endpoint of all the subscriptions of the Context Consumers and as the aggregator (in terms of storage and collection) of all the context information generated by the Context Producers. This architecture is commonly used due to its simplicity.
Figure 2-1. Central Broker architecture
• Distributed Broker: this is a more complex architecture that involves the existence of one or more Context Sources which can be consulted, when necessary, by the Context Consumers. To make this work, Context Sources must register the information they can provide, along with geographical and temporal constraints if needed, with the Context Registry. Context Consumers can also directly discover Context Sources via the Context Registry.
Figure 2-2. Distributed Broker architecture
• Federation Broker: this is a model for the aggregation of multiple NGSI-LD systems. In this model, applications can query or subscribe within a wider geographical scope. When a query or subscription reaches a Federation Broker, it forwards the request to the domain Context Brokers identified as able to provide useful information for that operation and aggregates the result. In fact, the Context Consumers will not notice any difference between this architecture and the others mentioned. Federation is broadly used to improve scalability.
Now we are going to focus on the Context Broker, which uses the NGSI-LD data exchange protocol. The purpose of this protocol is to facilitate the interconnection of the Context Information with several platforms, like open databases, mobile apps and IoT platforms, by adding semantics via the Linked Data technology. The protocol includes two key components: the NGSI-LD HTTP RESTful API and the NGSI-LD data model.
Beginning with the latter, the NGSI-LD Information Model is based on Linked Data standards. Therefore, the NGSI-LD API only accepts JSON-LD, a flavor of JSON used for Linked Data which links the fields of the document with URIs (Uniform Resource Identifiers) from which more information about the data presented in that field can be extracted. Linked Data allows different systems to use the same URI to reference the same concept, enriching the Context Information and enhancing interoperability across multiple domains (Smart Cities, Smart Agriculture, banking, healthcare, etc.).
The NGSI-LD Information Model is layered to adapt it to the nature of real-world entities and semantics. Thus, we can differentiate the Core Meta-Model, the Cross Domain Ontology and the Domain-Specific Ontologies:
Figure 2-3. NGSI-LD Information Model layers
• The NGSI-LD Core Meta-Model is the foundation of the whole Information Model. It is represented by Entities, Relationships and Properties. It contains the core terms needed to uniquely represent the key concepts of the NGSI-LD Information Model as well as the terms that define the API-related Data Types.
• The NGSI-LD Cross Domain models provide widely used structures such as the time and place of an entity. These apply broadly to several domains, and the normalization of their representation offers useful interoperability between different domains.
• The Domain-Specific Ontologies can be created by extending the other two layers with specialized terms taken from other ontologies.
The following figure is a graphic example of NGSI-LD data modelling for a simple use case. It represents the purchase of a bottle of wine by a customer in a local store. The example only uses the NGSI-LD Core Meta-Model (Entities, Relationships and Properties).
Figure 2-4. NGSI-LD Data Model example use case
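To make this data model more tangible, the following minimal sketch (not taken from the thesis) shows how one entity of a use case like the one in Figure 2-4 could be expressed in JSON-LD and created on a Context Broker through the NGSI-LD API. The entity id, type, property and relationship names are invented for illustration, and the broker address and core @context URL are assumptions that may differ between deployments.

```python
# Minimal sketch: one hypothetical Entity of the wine-purchase use case expressed
# in JSON-LD and pushed to a Context Broker via the NGSI-LD API. Entity ids, types
# and property names are invented; the broker URL assumes a local Orion-LD/Scorpio
# instance listening on its default port.
import json
import requests

entity = {
    "id": "urn:ngsi-ld:WineBottle:001",           # URN uniquely identifying the Entity
    "type": "WineBottle",                          # Entity type (a Domain-Specific term)
    "price": {"type": "Property", "value": 12.5},  # a Property with its value
    "soldBy": {                                    # a Relationship to another Entity
        "type": "Relationship",
        "object": "urn:ngsi-ld:Store:001",
    },
    # The @context maps the short names above to URIs (Linked Data); here only
    # the NGSI-LD core context is referenced.
    "@context": ["https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"],
}

# Create the Entity on the Context Broker (Context Producer role).
resp = requests.post(
    "http://localhost:1026/ngsi-ld/v1/entities",
    data=json.dumps(entity),
    headers={"Content-Type": "application/ld+json"},
)
print(resp.status_code)  # 201 Created is expected if the Entity did not exist yet
```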
The other component of the NGSI-LD protocol is the NGSI-LD API. This API, implemented mainly in the Context Broker, is presented as a RESTful HTTP API to be used by the other elements of the platform. Specifically, the permitted operations are:
• Context Producers: create or update Context Information on the Context Broker following the NGSI-LD Information Model.
• Context Sources: in this case, the registration is done against the Context Registry as described above, following the same API.
• Context Consumers: query or subscribe to specific Context Information on the Context Broker.
2.2 Prometheus
Prometheus (docs in reference [7] and explanatory video in reference [8]) is a tool for monitoring both traditional and highly dynamic container environments, like Kubernetes or Docker Swarm. The more complex modern DevOps becomes, the more necessary it is to support automated monitoring of the underlying infrastructure to prevent a domino effect when any of the deployed services fails. For that purpose, Prometheus helps us by providing insight into what is happening in our environment at both the hardware and container levels and by triggering alerts to the system administrators when measurements fall below or exceed the provided thresholds. In 5G scenarios, as mentioned in the introduction of this text, there is a lot of virtualization under the hood. Prometheus fits perfectly in this situation to provide automated monitoring and alerts about the usage of the containers or the load of the network.
Clarifying the terminology, Prometheus monitors a particular target that can be, for example, a Linux or Windows server, a service, an application, etc. Each target has one or more pieces of monitoring information, e.g., CPU status, memory or disk usage, number of requests or the duration of those requests. When Prometheus monitors and stores this information, it becomes a metric. Those metrics will be stored inside Prometheus in a human-readable format along with at most three attributes: help, type, and unit. Help is basically an optional description of what the metric is. The available types are counters (to keep count of how many times an event has happened), gauges (that can go up and down to store the value of a metric at each moment) and histograms (to track how long or how big an event has been). Unit is optional too, and it refers to the magnitude of the metric (seconds, meters, grams, etc.).
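As a brief illustration of these metric types, the sketch below (not part of the thesis) uses the official Prometheus client library for Python to expose a counter and a gauge from a custom application over HTTP, so that Prometheus can later pull them as described next. The metric names, the simulated values and the port are arbitrary choices for the example.

```python
# Minimal sketch of a custom application exposing Prometheus metrics with the
# official Python client library (pip install prometheus-client). Metric names,
# simulated values and the port are arbitrary.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# A counter only goes up: how many requests we have served so far.
REQUESTS = Counter("demo_requests_total", "Total number of handled requests")
# A gauge can go up and down: the value of something at this moment.
QUEUE_SIZE = Gauge("demo_queue_size", "Number of items currently queued")

if __name__ == "__main__":
    # Expose the metrics at http://localhost:8000/metrics so that the
    # Prometheus Data Retrieval Worker can scrape them periodically.
    start_http_server(8000)
    while True:
        REQUESTS.inc()                         # one more request handled
        QUEUE_SIZE.set(random.randint(0, 10))  # simulated queue length
        time.sleep(5)
```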
Prometheus pulls metrics from the targets, in contrast to the pushing mechanisms that other monitoring systems use. This reduces the load of network traffic considerably and avoids having to set up processes on the services to push their metrics to the server. It also comes with other advantages, but it is important to note that not all services are suitable to be scraped by this pulling mechanism. That is the case of the "short-lived" jobs, services that are awake only for a short period and need to push their metrics within that time window. For those cases, there exists a component called the Pushgateway, which we will comment on later.
The core component of a Prometheus architecture is the Prometheus Server, which does the actual monitoring work and consists of three parts:
• Data Retrieval Worker: a process that pulls metrics data from applications, services, servers, or any other targeted sources.
• Time Series Database: responsible for the storage of the metrics data.
• HTTP API: accepts queries in the PromQL language for the stored data.
In addition to the Prometheus Server, other components complete the functionality of the platform:
Figure 2-5. Prometheus common architecture
• For the pulling of the metrics, targets must have an HTTP endpoint, allocated by default at "address/metrics", that exposes their monitoring information in a format that Prometheus can understand. Many services expose their metrics at "address/metrics" natively, but for those that do not, an extra component is required. This component is called an exporter, namely a script that fetches metrics from our target, converts them to the correct format and exposes them via "address/metrics" so that Prometheus can scrape this endpoint. Prometheus has a list of official exporters for commonly used services that can be run either as traditional services or as Docker images. In the case of monitoring a custom application, a Prometheus client library is available on its webpage for multiple programming languages.
• As commented before, there exists an element called the Pushgateway used by those processes that cannot hold an HTTP server because of their short life period. The Pushgateway is an endpoint that gathers all the metrics pushed by those processes and keeps them until the Data Retrieval Worker scrapes it.
• The Service Discovery subsystem helps the Data Retrieval Worker locate the targets that it has been commanded to scrape in its configuration file (prometheus.yml).
• For visualization purposes, we can choose between a built-in solution provided by Prometheus (Prometheus Web UI), some off-the-shelf software with the proper adapter (the most common one is Grafana) or any other client able to request an HTTP API (like Postman or a custom solution).
• The Alertmanager is the last component to describe. It is responsible for firing alerts via different channels (for example, emails). The Alertmanager is triggered by the Prometheus server, which periodically evaluates the alert rules defined in its configuration file.
2.3 Apache NiFi
Apache NiFi (docs in reference [9]) is an open-source tool to automate the flow of data between systems. It was born as a project of the National Security Agency and was donated to the Apache Software Foundation in 2014. Its software development and commercial support are currently offered by Cloudera, a US-based company whose data analytics platform ships with a copy of Apache NiFi (among other tools).
Written in Java, it is a member of the so-called ETL tools used in many industries like IoT or Big Data. It has a Web-based UI and follows a "drag and drop" visual framework to create dataflows. Furthermore, since its version 1.2.0, NiFi is prepared for working with record-oriented data.
to create dataflows. Furthermore, from its version 1.2.0, Nifi is prepared for working with record-oriented data. Figure 2-6. GUI of Apache NiFi In the following subsections, we go in depth with the key concepts around Apache NiFi. 2.3.1 The ETL procedure According to references [10] and [11], ETL (Extract, Transform and Load) is the process by which data is extracted from data sources (that are not optimized for analytics) and moved to a central host (which is), performing such actions as changing the data types or applying calculations. The exact steps in that process might differ from one ETL tool to the next, but the result is the same. Traditionally, data used to be extracted from a very reduced group of sources and loaded into the main analytics server of the organization in question. Those sources are the OLTP databases (On-Line Transaction Processing), like sales, inventory, and customer records. The data was then transformed with cleansing and optimizing to be incorporated later into an OLAP database (On-Line Analytical Processing). All these processes were done for a very specific BI (Business Intelligence) analysis. If the analysis 13
If the analysis had to change, it was very common that the previous techniques could not support the new analysis, so all the processes needed to run again, this time with different transformations.
Modern technologies have changed the ETL process radically, beginning with the sources. What used to be simple ad-hoc applications created specifically for our company are now dozens or thousands of sources from both inside and outside our organization. The amount and variety of data sources have grown explosively, and the traditional paradigm used for structured data is no longer sufficient. Furthermore, business decisions now must be taken in real time, leaving typical batch processing in the background because it is too slow. The same reason applies to the analysis, which can vary from one hour to the next, and that is why modern ETL infrastructures no longer transform their data before storing it; there is a quick shift towards "ELT" (note the change in the order of the Transform and Load stages).
Figure 2-7. Traditional scenario (ETL) vs modern world techniques (ELT)
Luckily, today we count on powerful Cloud-based tools in which to load all the information. These tools are called Data Lakes and Data Warehouses. A Data Warehouse is a database optimized for analytics, e.g., Apache Druid or Snowflake. As such, it imposes structure on the data. On the other hand, a Data Lake is a storage substrate where data is dumped without any prior transformation. Examples are Amazon S3 or HDFS.
For the analysis of this massive amount of data we also have Cloud-based software, the so-called Business Intelligence tools. This is where data is analyzed and visualized, e.g., with Superset or Tableau. Sometimes Data Warehouses and Business Intelligence tools come together as a single piece of software, called the Analytics Warehouse, the BIDW tool or the BI/DW tool. Apache NiFi is a key component to populate Data Lakes and Data Warehouses from a vast variety of sources.
2.3.2 Core components
NiFi's fundamental design concepts closely relate to the main ideas of FBP (Flow-Based Programming). According to its Wikipedia article (reference [12]), FBP is a programming paradigm in which applications are defined as interconnections of "black box" processes that exchange data. The main concepts of NiFi are inspired by the FBP paradigm, and they are:
• FlowFile: it represents an object moving throughout our network. FlowFiles are composed of two parts: attributes and content. Attributes are a key-value dictionary associated with the FlowFile that helps NiFi track it. The content is a string of zero or more bytes representing the data inside the FlowFile.
• Processor: processors perform some operation like routing, transforming, fetching, or storing FlowFiles. Visually, a processor is a configurable box that can be selected from a catalogue.
• Connection: connections link processors. Internally, they model a queue so that different processes can interact at different rates if needed.
• Flow Controller: it is the brain of the operation. It acts as the broker facilitating the exchange of FlowFiles between processors.
• Process Group: a set of processors and their connections, with input and output ports if needed. This allows the creation of new components by composition of others.
As NiFi is written in Java, it runs on a JVM hosted by the OS, with the following components:
• Web Server: allows us to control NiFi using its API.
• Flow Controller: its purpose is to provide threads for extensions to run on and to manage them.
• Extensions: NiFi provides several extension points to give developers the ability to add functionality to the application to meet their needs. They are executed within the JVM.
• Processors: mentioned before, they are also executed within the JVM.
• FlowFile Repository: NiFi keeps track of all active FlowFiles, their attributes and other metadata about them in this repository.
• Content Repository: it is where the contents of the FlowFiles live.
• Provenance Repository: where all the data-related events are stored. The tracking of all these events from beginning to end is often called Data Provenance.
• Controller Services (reference [13]): they are components running in the background with the ability to support or control large groups of processors. An example is the "AWS Credentials Provider Service", which extracts the credentials from AWS so that they can be queried by the AWS-related processors, instead of manually setting the key inside the configuration of all these processors.
The three repositories are pluggable, meaning that they can be located on a separate drive if needed.
Figure 2-8. Architecture of Apache NiFi
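As a quick illustration of the Web Server component listed above, the following sketch (not part of the thesis) polls NiFi's REST API for the overall flow status. The endpoint path and the field names read from the response are assumptions based on recent NiFi versions and may differ in other releases or in secured deployments.

```python
# Hedged sketch: querying the REST API exposed by NiFi's Web Server component.
# The endpoint path (/nifi-api/flow/status) and the response fields used below
# are assumptions; check the REST API documentation of the deployed version and
# add authentication if the instance is secured.
import requests

NIFI_API = "http://localhost:8080/nifi-api"  # default unsecured NiFi address (assumption)

resp = requests.get(f"{NIFI_API}/flow/status", timeout=10)
resp.raise_for_status()
status = resp.json().get("controllerStatus", {})

print("Active threads:  ", status.get("activeThreadCount"))
print("FlowFiles queued:", status.get("flowFilesQueued"))
print("Bytes queued:    ", status.get("bytesQueued"))
```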
Apart from this architecture, NiFi also gives the possibility to operate as a cluster, maintained with Apache ZooKeeper and a NiFi Cluster Manager (NCM), to increase scalability and performance.
Finally, we are going to detail the types of processors there are and give more examples of them (more details in reference [14]):
• Data Ingestion processors: they fetch data from sources. E.g.: GetFile, GetHTTP, ListenTCP, GetMongo…
• Routing and Mediation processors: responsible for routing FlowFiles to different processors depending on the information in their attributes or content. E.g.: RouteOnAttribute, RouteOnContent…
• Database Access processors: they can execute statements against different databases. E.g.: ExecuteSQL, PutSQL, ListDatabaseTables…
• Attribute Extraction processors: responsible for extracting, analyzing or changing FlowFile attributes. E.g.: UpdateAttribute, ExtractText, EvaluateJSONPath…
• System Interaction processors: they can interact with the operating system below. This type of processor makes NiFi's power virtually unlimited, as we can execute any programming language with an OS call. E.g.: ExecuteStreamCommand, ExecuteScript, ExecuteProcess…
• Data Transformation processors: they receive an input FlowFile and generate another one at their output. E.g.: UpdateAttribute, ExtractText, JoltTransformJSON…
• Sending Data processors: they store a FlowFile into an external system. E.g.: PutFile, PutEmail…
• Splitting Data processors: they allow us to split or merge the content of FlowFiles. E.g.: SplitText, SplitJson, MergeContent…
• Specific technology processors: for example, the family of HTTP processors (e.g.: InvokeHTTP, ListenHTTP…) or the AWS processors.
2.3.3 Record-oriented data
As we have seen, one of NiFi's greatest strengths is that the framework is data agnostic, treating data like a vector of bytes with associated attributes. That makes sense in a lot of different scenarios, but not in many others.
That is the case of record-oriented data, in which all the data we treat comes with the same structure or schema. That is why, since version 1.2.0 of NiFi (May 2017)7, a series of processors and controller services were released to make this scenario much simpler, as introduced in the Apache NiFi blog article of reference [15].
Record-oriented data can express its schema in different formats, and NiFi supports a bunch of them, e.g., Avro, CSV, JSON, or XML. For each supported record-oriented format, NiFi has a Record Reader and a Record Writer controller service. In order to make sense of the data, Record Readers and Writers need to know the schema that is associated with the data. Some Readers (e.g., the Avro Reader) allow the schema to be read from the data itself.
Once the corresponding controller service is up and running, we can use record-oriented processors that make some work easier than before. For example, with a CSV Reader and Writer, we can merge many FlowFiles with the same CSV schema into a single one or store them into a SQL database effortlessly.
2.4 Apache Kafka
For many years, we have been using databases to store records representing real-world things. Although this was the main rule for a long time, an increasing need to store events - rather than things - has brought about the birth of some solutions. One of these solutions is Apache Kafka (docs in reference [16] and introduction video in reference [17]), which is an event streaming platform.
The key to understanding Kafka is to visualize it as the manager of many queues (or logs, or software buses), which are called topics in Kafka's terms. Kafka's topics are nothing but ordered collections of events that are stored in a durable way, with real emphasis on the word durable, because it means that events are replicated on multiple devices so that the system is fault tolerant.
There are a huge number of reasons why Kafka has become so popular; here are some of them:
• The simplicity of the concept of topic makes Kafka really scalable, which has become a must in this Big Data world.
• In a world quickly shifting to Service-Oriented Architectures, the rise of microservices has positioned Kafka as one of the best options when it comes to the message exchange between all these components.
7 Release notes of version 1.2.0 in https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.2.0
• There is an increasing need for organizations to implement real-time analytics, and Kafka's topics can deliver events in real time in a reliable manner.
• The integration with legacy event databases is possible and very simple thanks to Kafka Connect, a Java library API to get data in and out of these older systems by adding so-called connectors, most of which are already implemented and ready to be reused in a declarative way.
• It also has another tool called Kafka Streams, a Java library API that handles common higher-level operations like grouping, aggregating, counting, filtering… so that we can focus on functions that cover the specific needs of our business.
2.4.1 Concepts
Kafka's most basic unit is the event, also called record or message. The anatomy of an event is simple: it is composed of three or four fields: an event key, an event value, an event timestamp, and optional metadata headers. Events are published by producers and subscribed to by consumers. Producers and consumers are fully decoupled and agnostic of each other, which is a key design element to achieve high scalability.
Events are organized in topics, as we already know. Events in a topic are not deleted after consumption, so they can be read as often as needed. Topics are partitioned, meaning that two events of the same topic can be physically stored in two different Kafka servers (which are called brokers, as explained later) and also replicated to add fault tolerance and high availability. This allows clients to both read and write data from or to many brokers at the same time. Nevertheless, events with the same key are always written to the same topic partition. Kafka guarantees event ordering inside each partition (events can be read in the same order they were written) with an algorithm inspired by Lamport's clocks, the idempotent producer algorithm (more information in reference [18]). Inside each topic partition there is an ordered sequence of events numbered with a sequential identifier called the offset. Each consumer is responsible for keeping track of the offset of the events it has already read, leaving the possibility of reading the event at a specific offset if needed. That characteristic is called cherry-picking and is a common concept in queues and version control systems.
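Before looking at the figures below, the following minimal sketch (not taken from the thesis, whose prototypes are built with Apache NiFi) shows these concepts with the kafka-python client: a producer publishing events to a topic and a consumer reading them back with their partition and offset. The broker address, topic name, keys and values are arbitrary examples.

```python
# Minimal sketch using the kafka-python client (pip install kafka-python).
# Broker address, topic name, keys and values are arbitrary; events with the
# same key always land in the same partition of the topic.
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(3):
    producer.send(
        "monitoring-metrics",                       # topic
        key=b"cpu_0",                               # same key -> same partition
        value=f"node_cpu_seconds_total {i}".encode(),
    )
producer.flush()  # make sure the events have reached the broker

consumer = KafkaConsumer(
    "monitoring-metrics",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # start from the beginning of the partition
    consumer_timeout_ms=5000,       # stop iterating when no new events arrive
)
for event in consumer:
    # Each event carries its partition and offset, as described above.
    print(event.partition, event.offset, event.key, event.value)
```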
Figure 2-9. Clients publishing events to the same topic. Blocks with the same color denote events with the same keys. Figure from reference [19]
Figure 2-10. Topic-partition closer look. Consumers can read different events and perform cherry-picking. Figure from reference [19]
Kafka has a command line tool for the creation, inspection and management of topics and other administration tasks. Apart from it, we count on three more Java APIs (in addition to the Kafka Streams API and Kafka Connect API commented on before): the Admin API, the Producer API and the Consumer API to integrate programs with Kafka. All five Java APIs are also interoperable with Scala.
2.4.2 Architecture
Kafka Broker is the name given to each server that runs an instance of Kafka in a Kafka cluster. Brokers use Apache ZooKeeper under the hood, a software for the management and coordination of the cluster. Each broker can be responsible for partitions of one or more topics.
For each partition of a topic that is replicated (each copy is called a replica), a broker must take one of two roles: Leader or Follower. Leaders deal with client requests for their topic partition, while Followers only passively replicate the topic partition of the Leader to stay up to date with it. If a Leader fails, a voting process will elect a new Leader from the list of Followers that were in sync with the former one. Apart from Brokers, Kafka also has Producers and Consumers, roles we have already commented on. Consumers with a common task can be included in a Consumer Group to improve parallelism, so that each partition is read by only a single consumer within the group. The whole picture is presented in this figure extracted from reference [19]:
Figure 2-11. Architecture of Apache Kafka
2.5 Apache Avro
Apache Avro (docs in reference [20]) is a data serialization system developed within the Apache Hadoop project. It is used to define the data schema in record-oriented data. The procedure is as follows: Avro uses a certain schema to serialize records into a compact binary format, and it uses a compatible schema to deserialize them. By having the schema present in both the writing and the reading of each datum, the process becomes faster and the serialization lighter. Schemas are defined with JSON, and they can be stored within the serialized object, making it slightly heavier, but allowing any program to read it.
Apache Avro has APIs written in many programming languages like C, Go, Java, Python or Scala. Avro is widely used in RPCs (Remote Procedure Calls), where the client and the server exchange schemas in the connection handshake. As of today, there have been multiple systems that provide similar functionality (for example, Apache Thrift, Google Protocol Buffers or the ASN.1 standard with its Basic Encoding Rules). Apache Avro differs from those systems in these aspects:
• Dynamic typing: for reading and writing, all that Avro needs is a schema, no code generation is required.
• Untagged data: data can be written in a more compact manner, as the schema is present when it is read.
• No manually-assigned field IDs: when a schema changes, both the old and new schemas are always present when processing data, so differences may be resolved symbolically, using field names.
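The short sketch below (not from the thesis) shows how a record schema could be declared and used to serialize and deserialize records with the fastavro Python library. The schema name and fields are invented for illustration; they loosely resemble the kind of time-series records handled later in this work.

```python
# Minimal sketch using the fastavro library (pip install fastavro). The schema and
# its fields are invented for illustration; they loosely resemble a time-series
# sample (metric name, labels, timestamp, value).
import io

from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "namespace": "example.monitoring",
    "name": "TimeSeriesSample",
    "type": "record",
    "fields": [
        {"name": "metric", "type": "string"},
        {"name": "labels", "type": {"type": "map", "values": "string"}},
        {"name": "timestamp", "type": "double"},
        {"name": "value", "type": "double"},
    ],
})

records = [{
    "metric": "node_cpu_seconds_total",
    "labels": {"cpu": "0", "mode": "idle"},
    "timestamp": 1624976543.2,
    "value": 12345.6,
}]

buffer = io.BytesIO()
writer(buffer, schema, records)   # serialize: the schema travels with the data

buffer.seek(0)
for record in reader(buffer):     # deserialize: the embedded schema is used to read
    print(record["metric"], record["value"])
```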
3 The monitoring data aggregation system
In the introduction we mentioned that we were going to create a system for the monitoring of 5G networks adapted to all their particularities. In this chapter, we are going to define the components of a monitoring data aggregation system in general. This figure represents the components of the generic schema:
Figure 3-1. Generic architecture for a monitoring data aggregation system
Now, we are going to describe them:
• Data Source: the Data Source is where the monitoring data originally comes from. The one we are going to use is Prometheus. This means that we have to deploy a Prometheus instance with the proper exporter. This will provide an endpoint (address/metrics, as we commented in Chapter 2) from which to poll metrics via a REST API request. This endpoint will be consulted for metrics periodically. As commented in the introduction of this document, it is important to collect not only the monitoring data, but also the associated metadata, which, as we already know, can be called the context of the data source. To characterize the complex data source that is Prometheus, we have created an NGSI-LD data model that summarizes all the data and metadata that can be collected from it.
Figure 3-2. Prometheus source NGSI-LD data model
In the data model we can observe the following entities:
o Prometheus: this entity represents the instance of Prometheus. For that reason, it has two properties: the chosen name for the server and the specific version. Distinguishing one Prometheus entity from another can be very important in some scenarios in order to verify the available capabilities of that specific instance.
o Metric: the Metric entity is the representation of an actual metric from Prometheus. It is described by a name property and has a set of labels which identify a specific time series of the metric. For example, the frequency of each CPU processor is described with the same metric name, but with a different set of labels.
o MetricFamily: this entity provides information about the type of the metric (counter, gauge or histogram), as well as optional information like the unit of measurement of the metric and a brief description.
o Endpoint: the Endpoint entity of the data model describes the connection point by which we have reached Prometheus. It has a name and a URI.
o Credentials: describes the authentication method used for the collection of the metric (e.g., user-password, certificate).
• Producer: the Producer will scrape Prometheus, transform the incoming data, and send it to the Broker. For the transformation part, we must consider that the metrics collected from Prometheus will be formatted in JSON following the Prometheus data model generated when the query in PromQL is executed. In our case, both producers will be implemented in Apache NiFi. Here is an example of a metric retrieved from the Prometheus API (a sketch of how such a query could be issued programmatically is shown at the end of this chapter). The metric we can see here is node_cpu_seconds_total, which tells us how many seconds each CPU spent doing each type of work. For more information about this specific metric, please consult reference [21].
Figure 3-3. Example of node_cpu_seconds_total metric
As we can see, the JSON in the response comes in a very specific format. For example, in the field value we have a vector whose first element represents the timestamp of the metric and whose second element is the actual value of the metric.
• Broker: the Broker is an intermediary component that will distribute the data pushed by the Producer among all the interested Consumers.
• Consumers: they will subscribe to the distribution service that the Broker offers for the metrics they are interested in. After receiving a notification and processing it, they will be able to dump the monitoring data into the Data Sink.
• Data Sink: it is the place where the monitoring data is stored (e.g., text files or databases) until its value is harvested8. For the harvesting of the value we could use, for example, the displaying of a dashboard with the evolution of the time series, and trigger alerts when a certain metric surpasses a pre-defined threshold. A new component, not represented in the diagram, would be responsible for doing these kinds of tasks.
In the following chapter, we are going to compare two monitoring data aggregation system prototypes. First, we will depict their specifications; then, we will describe the methodology we are going to use; and lastly, we will comment on the results derived from the experiments.
8 The harvesting of the value of the data is a concept drawn from the Data Value Pyramid we can see in the book Agile Data Science from reference [22]
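As referenced in the description of the Producer above, the following minimal sketch (not part of the thesis implementation, which uses Apache NiFi) shows how the Prometheus HTTP API could be queried for node_cpu_seconds_total and how the [timestamp, value] pairs of the response could be extracted. The Prometheus address is an assumption corresponding to its default port.

```python
# Hedged sketch of the first stage of the pipeline: querying the Prometheus HTTP
# API and extracting the [timestamp, value] pairs described above. In the thesis
# this role is played by an Apache NiFi flow; the Prometheus address is an assumption.
import requests

PROMETHEUS = "http://localhost:9090"  # default Prometheus port (assumption)

resp = requests.get(
    f"{PROMETHEUS}/api/v1/query",
    params={"query": "node_cpu_seconds_total"},  # PromQL instant query
    timeout=10,
)
resp.raise_for_status()
body = resp.json()

# For an instant vector, each result carries a label set ("metric") and a
# "value" field shaped as [timestamp, value-as-string].
for series in body["data"]["result"]:
    labels = series["metric"]
    timestamp, value = series["value"]
    print(labels.get("cpu"), labels.get("mode"), timestamp, float(value))
```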