Extracting Knowledge and Computable Models from Data - Needs, Expectations, and Experience

Page created by Ellen Lowe

Style & Fashion

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Extracting Knowledge and Computable Models
from Data - Needs, Expectations, and Experience

Thomas Natschläger, Felix Kossak, and Mario Drobics
Software Competence Center Hagenberg, A-4232 Hagenberg, Austria

Email: {Thomas.Natschlaeger, Felix.Kossak, Mario.Drobics}@scch.at

Abstract In modern industrial manufac- ble – a large quantity of high-quality prod-
turing, a great amount of data is gathered ucts in a (cost-) efficient way. To tackle
to monitor and analyze a given production this challenge, major requirements are an
process. Intelligent analysis of such data optimized production process and constant
helps to reveal as much information about quality control. Obviously it is desirable
the production process as possible. This that each of these issues is supported by
information is most useful if it is available suitable intelligent data analysis (IDA) tools
in the form of interpretable and predictive which operate on the process data, i.e. sig-
models. Such models can be generated nals which are measured by appropriate
from data by means of (fuzzy logic based) sensors connected to the production pro-
machine learning methods. In this contri- cess.
bution we will describe industrial applica-
tions in the areas of process optimization Fig. 1 shows a possible scenario where
and quality control where we have success- IDA tools are used in an offline manner:
fully established machine-learning methods the knowledge gained by IDA operating on
as intelligent data analysis tools. large amounts of stored data influences
“online” analyses like quality control, trend
analysis, and fault detection, as well as the
production process itself, i.e. process op-
1 Introduction timization. It turns out that the knowledge
extracted by the IDA is particularly useful
In order to be competitive on the market, if it comes in the form of mathematically
the major goal of any industrial production well defined models of certain dependen-
process is to produce – as fast as possi- cies within the data. A potentially success-

production process                      A   sensor s1

                                                                                                     sensor s2
                                                     sensor s2
                                                                                                                                                             prediction
                                                                 sensor s3

                                                                                 sensor sn
                                        sensor s1

                                                                                                                         computational
       knowledge, models, predictions

                                                                                                     sensor s3
                                                                                                                                                             M(s1,...,sd)
                                                                                                                           model M                         for sensor sp

                                                                                                     sensor sd

                                        online visualization
                                          quality control                                        B     class_Is_Iris-setosa
                                          trend analysis                     storage                   class_Is_Iris-versicol
                                          fault detection                                              class_Is_Iris-virginica

                                                                                                                     petal_width_IsAtLeast_M H150L

                                                                                                                 T                      petal_width_IsAtMost_M H100L

                                                                                                                                    T                  class_Is_Iris-setosa H53
                                                     Intelligent                                                                    F                  class_Is_Iris-virginica
                                                    Data Analysis                                                F                  class_Is_Iris-versicol H50L

Figure 1: Possible scenario of itelligent data C                                                     class_Is_Iris-setosa                        petal_width_Is_M
analysis (IDA)                                                                                       class_Is_Iris-versicol                      petal_width_Is_L
                                                                                                     class_Is_Iris-virginica                     petal_width_Is_H

ful approach is to generate such models                                                          Figure 2: Computational models. In con-
by means of machine learning methods;                                                            trast to “black box models” (A) such as neu-
see e.g. [1]. Typically a model “learned”                                                        ral networks, descriptive models such as
from the data predicts a sensor signal sp                                                        decision trees (B) or rule bases (C) are easy
based on a set of different sensor signals                                                       to interpret. The drawings in panels B and
s1 , s2 , ..., sd , i.e. sp = M (s1 , s2 , ..., sd ) (see                                        C are obtained with mlf (see Sec. II) applied
also Fig. 2A).                                                                                   to the well-known iris data set.
   The use of such predictive models can be
twofold: one can exploit the predictive capa-
bilities of a model on the one hand and the                                                      ily interpretable, like it is the case for deci-
structure of the model on the other hand.                                                        sion trees and rule bases (see Fig. 2B), this
The prediction obtained from a given (and                                                        structure can help to understand the pro-
previously learned) model can for example                                                        cess in more depth. In consequence, such
be used to detect anomalous working con-                                                         enhanced knowledge can often be used to
ditions of the production process [7] or to                                                      optimize the production process.
get an idea about the quality of the product                                                        Even from this short introduction, one can
if certain process parameters are varied. In                                                     already specify two general requirements
such applications, the accuracy of the pre-                                                      of machine learning methods when applied
diction is of primary interest while the struc-                                                  as intelligent data analysis tools: the re-
ture of the model plays a minor role. How-                                                       sulting models should have good predictive
ever, if the structure of the model is eas-                                                      capabilities while the structure of a model

                                                                                             2

should be easily interpretable. In Sec. 2, ical capabilities of Mathematica (see Fig. 3).
we will describe a framework for machine
learning, called mlf, which tries to meet this Descriptive models via fuzzy logic
requirements. mlf has been successfully
used as an intelligent data analysis tool in The design of mlf was strongly influenced
several applications. Two particular appli- by the requirements of being able to gen-
cations are described in Sec. 3 and Sec. 4. erate descriptive and highly predictive com-
Throughout the sections 2, 3, and 4, we will putational models. To a large part, this was
draw our attention to the features of the ap- achieved by integrating fuzzy logic wher-
plied machine learning tools which turned ever possible. As a result, the models gen-
out to be important to successfully establish erated by mlf contain easily interpretable
them in an intelligent data analysis tool. phrases like “the value of sensor s1 is large”
while still maintaining numeric accuracy and
predictive capabilities. In addition, smooth
2 mlf : A Machine Learn- results very often model the underlying de-
pendencies within the data more realistic
ing Framework for Math- than crisp ones.
ematica
Wide range of algorithms
The machine learning framework for Mathe-
matica1 (mlf ) is a collection of powerful ma- As it is a matter of fact that for a given prob-
chine learning algorithms integrated into a lem, it is not clear in advance which learning
framework for the main purpose of intelli- algorithm will yield the best results, it is al-
gent data analysis [2]. mlf 2 combines an ways a good advice to try different machine
optimized computational kernel - the core learning algorithms or to combine them to
engine - realized in C++ with the manipu- solve one single problem. Such combi-
lation, descriptive programming, and graph- nations of distinct algorithms may give the
user unforeseen insights into their data.
1
Mathematica is a registered trademark of Wol- Therefore mlf covers a wide range of ma-
fram Research Inc. (www.wolfram.com). chine learning algorithms, which are listed
2
mlf is developed and supported by the Knowl-
edge Based Technology area of the Software
in Tab. 1 together with the corresponding
Competence Center Hagenberg GmbH (A-4232 references.
Hagenberg, Austria, http://www.scch.at). mlf
is owned and distributed by uni software plus
GmbH (Kreuzstrasse 15a, A-4040 Linz, Aus- Visualization and structure of data
tria, http://www.unisoftwareplus.com/) which has ten
years of experience in distributing Mathematica and In addition to supervised learning algo-
Mathematica-based solutions and collaborates with rithms which generate the kind of compu-
Wolfram Research, Inc., for worldwide distribution. tational models depicted in Fig. 2A, the

Table 1: Algorithms implemented in mlf

                Supervised Analysis
                    Fuzzy decision tree learning (FS-ID3) [3]
                    Fuzzy rule base learning (FS-FOIL, FS-MINER) [4]
                    Numerical optimization of arbitrary rule bases (RENO) [5]

                Unsupervised Analysis
                    Clustering algorithms (WARD, FCM) [8, 1]
                    Self-organizing maps (SOM) [6]

machine learning framework also contains           Powerful architecture and interface
several unsupervised learning algorithms.
Such algorithms can help to understand             The machine learning framework for Math-
how data is structured (e.g. which groups          ematica is used from the Mathematica front
of customers a company has) or to visu-            end. By using the Mathematica front end,
alize the often very high dimensional data         one has access to all Mathematica func-
(e.g. by using SOMs [6]). Apart from ma-           tions, including the powerful graphical ma-
chine learning algorithms, mlf includes a          nipulation tools, and, with the Mathemat-
wide range of visualization methods using          ica programming language, one has access
all the power from the graphics capabilities       to an elegant scripting language. How-
of Mathematica.                                    ever, computationally intensive algorithms
                                                   are implemented in an optimized compu-
                                                   tational kernel (the core engine realized in
                                                   C++) to yield fast response times. The
                                                   C++ kernel is completely independent of
                                                   the Mathematica front end and can be uti-
High speed                                         lized from any environment capable of call-
                                                   ing C++ functions (see Fig. 3).

All algorithms are highly parameterizable to
be able to adjust them to a particular prob-
lem. Given this parameterizeability and the
                                                3 Process optimization in
range of learning algorithms combined with            paper production
the efficient core engine (realized in C++)
of mlf, the user is able to “look at their data In this section, we will describe an applica-
from different points of view” in real time.    tion of mlf for process optimization in paper

                                               4

production.                                                                    results
                                                        Excel sheets                                                 GUI
   Paper production is a complex flow pro-                              auxiliary information
cess with several process levels and hun-
                                                                   da                              nd
                                                                                                        s                   els
dreds of process parameters which poten-                               ta                      a                       od
                                                                                            mm                       dm
                                                                                         co                     ze
tially influence the quality of the produced                                                           u    a li
                                                                                                   vis
paper, assessed by dozens of quality mea-
surements. Hence a major goal is to adjust                                  Mathematica
the process parameters for optimum paper
quality under tough economic constraints.
Knowledge about dependencies within the                    XML               C++ core
data, i.e. the process parameters and the                                                      mlf
quality measurements, is valuable informa-
tion in assisting the optimum adjustment of         Figure 3: The architecture of PaperMiner
parameters which needs enormous expert              which, is based on mlf.
knowledge and experience. Unfortunately
the process of paper production is so com-
                                                    ality must be accessible for users without a
plex that the creation of explicit models for
                                                    special mathematical background. On the
such dependencies is hardly possible. In
                                                    other hand, experts should be able to dig
such a case, intelligent data analysis by
                                                    deeper using the same tools (e.g. with re-
means of machine learning methods can
                                                    spect to co-operative work). This results in
help to discover such required knowledge
                                                    the following requirements:
from previously measured data.
   This approach turned out to be very fruit-
ful and culminated in a project together with       GUI for standard problems
VOITH Paper (www.voith.com) and SCA
Graphic Laakirchen AG (www.sca.at) where            A user should be able to obtain useful re-
a machine learning application called Pa-           sults for standard problems by following a
perMiner was (and is still being) developed.        rather short series of simple steps. The
PaperMiner is heavily used in practice and          PaperMiner GUI represents the required
has led to a number of surprising insights          steps in a hierarchical way which makes
by the users.                                       the whole workflow visible at first sight; see
   A major requirement for the PaperMiner           Fig. 4.
from the onset was a graphical user inter-
face (GUI) for accessing some of the func-          Expert interface
tionality of mlf (see Fig. 3 for a block dia-
gram of PaperMiner). The requirement for            In many cases, however, standard pro-
a GUI reflects the more general require-            cedures will not yield the desired results.
ment that basic machine learning function-          In these cases, the user should be able,

                                                5

Figure 5: PaperMiner’s interactive decision
tree. The screen shot shows a decision tree
Figure 4: Graphical user interface (GUI) of
generated for the well known iris data set.
PaperMiner (screen shot)

data.
for instance, to optimize parameters of the During the application of PaperMiner to
model-generating algorithms. With the cur- practical problems in paper production, it
rent architecture of the application, this is turned out that whereas the resulting mod-
possible through the Mathematica interface els often describe highly interesting de-
of mlf. pendencies, sometimes the chosen at-
Another primary requirement for the Pa- tributes/measurements are judged by the
perMiner was that it can be used in con- domain experts not to be the “best” ones.
nection with Microsoft Excel, for the follow- To examine whether the original measure-
ing reasons: a) Excel is a program which ment suggested by the learning algorithm
potential users are typically already familiar can be replaced by a “better” measurement,
with; b) Excel is capable of displaying and PaperMiner features an interactive tool for
transforming data in a convenient way; and the construction of a decision tree. This
a) It is relatively easy to extract data from a tool allows a user to modify individual nodes
database into an Excel file (and respective of a given (most commonly a previously
scripts were already available for the pilot learned) decision tree; see Fig. 5.
application). From the requirement to use A further requirement is that the standard
Excel as the data source, one can derive presentation of the output should be as ap-
the more general requirement that applying pealing and intuitive as possible. This is
the machine learning algorithms is only half most easily achieved by graphical represen-
of the story about intelligent data analysis. tations. We experienced that people could
One also needs flexible, powerful and easy- easily interpret decision trees with color en-
to-use tools to access and preprocess the coding of key information at the nodes with-

out knowing anything about the theoretical
background. Simple diagrams and similar
graphical depictions can be interpreted by
humans intuitively, at first sight, while the
information conveyed by such pictures is
still sufficient for most applications. Still, for
transport module processing module
fine-grained process optimization, more de- finished
tailed output must be available as well. parts flow of workpiece holders feeding module

Figure 6: Schematic of a loosely coupled
4 Machine Learning in Dis- assembly line.

crete Manufacturing
an overall equipment efficiency (OEE) as
In this section, we will describe applications high as possible and to analyze the causes
of machine learning in the are of discrete if the OEE is not satisfactory. The OEE is
manufacturing. The process of assembling a product of equipment availability (EA),
discrete parts from its discrete components production efficiency (P E), and quality rate
(produced elsewhere) has different charac- (QR). Therefore each of the three factors
teristics from a flow process like paper pro- must be as high as possible.
duction (see previous section). In a typical If one of those factors is strongly varying
assembly line, the components to be as- over time, it is of interest to figure out the
sembled (provided by certain feeder mod- factors which cause low values. For exam-
ules) are fixed at a workpiece holder which ple, one may investigate whether the peo-
goes through several processing modules ple operating the assembly line and the type
until the part is finished either as “o.k.” or of the produced parts influence the aver-
“not o.k.” (cf. Fig. 6). The whole assembly age time needed to produce one part. A
process is monitored by a shop floor man- result of such analysis of a hypothetical as-
agement system which collects the process sembly line which produces different types
data and provides statistical information to of footwear is shown in Fig. 7. From such
operators and the management. a decision tree, one can not only conclude
In a project together with AMS En- that the average time indeed depends on
gineering Sticht GmbH (www.ams- the type and operator but one can also see
engineering.com), we set out to apply groupings of operators and types. Similar
machine learning tools like mlf to gain results were obtained for real data (from a
more knowledge out of the information given assembly line with more then 30 pro-
provided by the shop floor management cessing modules made available by AMS
system. The high level goal is to achieve engineering Sticht GmbH) when analyzing

if operator ∈ {Sue, Pam, Tim, Sam, Bob, Kim} then
if type ∈ {casuals, sandals} then T = 13.7
else T = 12.7
else
if type ∈ {sandals, galoshes, slippers} then T = 13.05
else T = 11.34

Figure 7: A binary decision tree (written as if-then-else statements) which shows how the
average production time T depends on the produced type and the operator of a hypothet-
ical footwear assembly line.

EA, P E, and QR in this manner. synchronously,3 and it is rather straightfor-
Obviously there are much more prob- ward to detect which module caused the
lems regarding the analysis of EA, P E, and standstill. In contrast, a loosely coupled
QR which can potentially be addressed by system consists of asynchronously work-
means of machine learning tools. Below, ing processing modules, and the individual
we describe for each of the three factors workpiece holders are moved forward by
a typical issue which was not yet solved separate transport modules which also act
in a satisfactory way in the analysis tools as buffers (cf. Fig. 6). In this case, it is not
which are currently integrated in the shop trivial to always detect the correct “guilty”
floor management system. modules. Having a model which relates
standstills of the assembly line and stand-
stills of modules to each other will help to
find the reasons for low EA. In the current
project, first promising results have been
Equipment availability obtained by a combination of intelligent data
visualization and machine learning tools.
Equipment availability (EA) measures how
much time is lost by (technical or organiza-
tional) standstills of the assembly line. Min- Production efficiency
imizing the number of standstills maximizes
EA. Since a standstill of the assembly line Production efficiency (P E) describes how
is caused by the standstill of one or more efficient/fast the individual parts are pro-
processing modules, one must detect the duced. On the level of individual modules,
“guilty” modules and remedy deficiencies in 3
To ensure that all the workpiece holders are at
those modules. In a rigidly coupled assem- the right place at the right time they are usually at-
bly line, all the processing modules work tached to some kind of transportation belt.

P E measures how efficient/fast the pro- quality rate of the machine but also help
cessing step of a given module is. To op- to determine whether a measured quan-
timize the efficiency, it is, of course, inter- tity is “important” or “useless” (and may be
esting to know what parameters influence skipped at all), i.e. attribute selection. In-
the processing speed (and thus P E) of a teresting results of this kind for real data
given module in which way. In the con- were obtained by using so called model or
text of optimizing the P E of the whole as- regression trees for more details).
sembly line, it is of special interest to ana-
lyze whether parameters which directly con-
trol the behavior of a given module influ- 5 Conclusions
ence the speed of any successor module.
Making extensive use of machine learning In this paper we have described two areas
tools, one can, for example, construct a dia- of industrial applications (paper production
gram where it is shown how strong the pro- and discrete manufacturing) where we have
cessing speed of a given module depends successfully applied machine learning tools
on parameters associated with predeces- (in particular mlf ) in an offline manner for
sor modules. With such an overview dia- process optimization and quality control. In
gram, one can dig deeper to look at the in- fact, the knowledge gained by applying this
dividual models relating the parameters to type of intelligent data analysis helped to
the processing speed. This approach was optimize the production processes.
successfully applied in analyzing real data The currently running projects are by no
(made available by AMS engineering Sticht means unidirectional. We are constantly
GmbH). receiving feedback from our industry part-
ners which often results in improvements
Quality rate of the used methods. Each particular im-
provement effects one ore more aspects of
The quality rate (QR) is the ratio between the used methods. According to our experi-
the number of “o.k.”-parts and the total num- ence gained in the running projects, several
ber of produced parts. The quality (“o.k.” or aspects turned out to be important for ma-
“not o.k.”) of a part is determined by means chine learning methods to be a useful tool
of a great number of measurements (taken to analyze a company’s crucial data:
at several modules) such as force, length, • Interpretability and prediction accu-
and torque referred to as quality criteria. If racy of computational models gener-
any of these quality criteria is outside its ated from data (e.g. fuzzy decision
allowed range, the part is marked as “not trees combine these two requirements)
o.k.”. A central issue is to analyze relation-
ships between such measurements. Such • Integration of methods in existing anal-
knowledge will not only help to increase the ysis and preprocessing tools (e.g. Pa-

perMiner is built as an add-on to MS-         Developer Conf., Champaign, IL, 2003.
    Excel)                                        Wolfram Research Inc.

   • Possibility to compare the results of [3] M. Drobics and U. Bodenhofer. Fuzzy
      several methods (with different param-      modeling with decision trees. In Proc.
      eters) in reasonable time.                  2002 IEEE Int. Conf. on Systems, Man
                                                  and Cybernetics, pages 90–95, Ham-
   • Appealing and easy-to-comprehend             mamet, Tunisia, October 2002.
      graphical representation
                                              [4] M. Drobics, U. Bodenhofer, and E. P.
   • Different levels of analysis; i.e. GUI       Klement. FS-FOIL: An inductive learn-
      (e.g. PaperMiner) versus programming        ing method for extracting interpretable
      language level (e.g. Mathematica).          fuzzy descriptions. Internat. J. Approx.
                                                  Reason., 32(2–3):131–152, 2003.
If all these requirements are met, it is very
likely that people start to “trust” in those [5] J. Haslinger, U. Bodenhofer, and
methods of data analysis.                         M. Burger. Data-driven construction
                                                  of Sugeno controllers: Analytical as-
                                                  pects and new numerical methods. In
Acknowledgements                                  Proc. Joint 9th IFSA World Congress
                                                  and 20th NAFIPS Int. Conf., pages 239–
This work has been done in the framework          244, Vancouver, July 2001.
of the Kplus Competence Center Program
which is funded by the Austrian Govern- [6] T. Kohonen. Self-organized formation of
ment, the Province of Upper Austria, and          topologically correct feature maps. Biol.
the Chamber of Commerce of Upper Aus-             Cyb., 43:59–69, 1982.
tria.                                         [7] E. Lughofer and E. P. Klement. Model-
                                                  based fault detection in multi-sensor
                                                  measurement systems. Technical Re-
References                                        port FLLL-TR-0303, Fuzzy Logic Labo-
                                                  ratorium Linz-Hagenberg, 2003.
[1] J. C. Bezdek. Pattern Recognition with
    Fuzzy Objective Function Algorithms. [8] J. H. Ward Jr. Hierarchical grouping to
    Plenum Press, New York, 1981.            optimize an objective function. J. Amer.
                                             Statist. Assoc., 58:236–244, 1963.
[2] M. Drobics. machine learning frame-
    work for Mathematica — creating under-
    standable computational models from
    data. In Proc. of the 2003 Mathematica

                                            10

You can also read