Extracting Knowledge and Computable Models from Data - Needs, Expectations, and Experience

Page created by Ellen Lowe
 
CONTINUE READING
Extracting Knowledge and Computable Models
  from Data - Needs, Expectations, and Experience

                                      Thomas Natschläger, Felix Kossak, and Mario Drobics
                         Software Competence Center Hagenberg, A-4232 Hagenberg, Austria

                      Email: {Thomas.Natschlaeger, Felix.Kossak, Mario.Drobics}@scch.at

Abstract In modern industrial manufac-              ble – a large quantity of high-quality prod-
turing, a great amount of data is gathered          ucts in a (cost-) efficient way. To tackle
to monitor and analyze a given production           this challenge, major requirements are an
process. Intelligent analysis of such data          optimized production process and constant
helps to reveal as much information about           quality control. Obviously it is desirable
the production process as possible. This            that each of these issues is supported by
information is most useful if it is available       suitable intelligent data analysis (IDA) tools
in the form of interpretable and predictive         which operate on the process data, i.e. sig-
models. Such models can be generated                nals which are measured by appropriate
from data by means of (fuzzy logic based)           sensors connected to the production pro-
machine learning methods. In this contri-           cess.
bution we will describe industrial applica-
tions in the areas of process optimization     Fig. 1 shows a possible scenario where
and quality control where we have success-  IDA tools are used in an offline manner:
fully established machine-learning methods  the knowledge gained by IDA operating on
as intelligent data analysis tools.         large amounts of stored data influences
                                            “online” analyses like quality control, trend
                                            analysis, and fault detection, as well as the
                                            production process itself, i.e. process op-
1 Introduction                              timization. It turns out that the knowledge
                                            extracted by the IDA is particularly useful
In order to be competitive on the market, if it comes in the form of mathematically
the major goal of any industrial production well defined models of certain dependen-
process is to produce – as fast as possi- cies within the data. A potentially success-

                                                1
production process                      A   sensor s1

                                                                                                     sensor s2
                                                     sensor s2
                                                                                                                                                             prediction
                                                                 sensor s3

                                                                                 sensor sn
                                        sensor s1

                                                                                                                         computational
       knowledge, models, predictions

                                                                                                     sensor s3
                                                                                                                                                             M(s1,...,sd)
                                                                                                                           model M                         for sensor sp

                                                                                                     sensor sd

                                        online visualization
                                          quality control                                        B     class_Is_Iris-setosa
                                          trend analysis                     storage                   class_Is_Iris-versicol
                                          fault detection                                              class_Is_Iris-virginica

                                                                                                                     petal_width_IsAtLeast_M H150L

                                                                                                                 T                      petal_width_IsAtMost_M H100L

                                                                                                                                    T                  class_Is_Iris-setosa H53
                                                     Intelligent                                                                    F                  class_Is_Iris-virginica
                                                    Data Analysis                                                F                  class_Is_Iris-versicol H50L

Figure 1: Possible scenario of itelligent data C                                                     class_Is_Iris-setosa                        petal_width_Is_M
analysis (IDA)                                                                                       class_Is_Iris-versicol                      petal_width_Is_L
                                                                                                     class_Is_Iris-virginica                     petal_width_Is_H

ful approach is to generate such models                                                          Figure 2: Computational models. In con-
by means of machine learning methods;                                                            trast to “black box models” (A) such as neu-
see e.g. [1]. Typically a model “learned”                                                        ral networks, descriptive models such as
from the data predicts a sensor signal sp                                                        decision trees (B) or rule bases (C) are easy
based on a set of different sensor signals                                                       to interpret. The drawings in panels B and
s1 , s2 , ..., sd , i.e. sp = M (s1 , s2 , ..., sd ) (see                                        C are obtained with mlf (see Sec. II) applied
also Fig. 2A).                                                                                   to the well-known iris data set.
   The use of such predictive models can be
twofold: one can exploit the predictive capa-
bilities of a model on the one hand and the                                                      ily interpretable, like it is the case for deci-
structure of the model on the other hand.                                                        sion trees and rule bases (see Fig. 2B), this
The prediction obtained from a given (and                                                        structure can help to understand the pro-
previously learned) model can for example                                                        cess in more depth. In consequence, such
be used to detect anomalous working con-                                                         enhanced knowledge can often be used to
ditions of the production process [7] or to                                                      optimize the production process.
get an idea about the quality of the product                                                        Even from this short introduction, one can
if certain process parameters are varied. In                                                     already specify two general requirements
such applications, the accuracy of the pre-                                                      of machine learning methods when applied
diction is of primary interest while the struc-                                                  as intelligent data analysis tools: the re-
ture of the model plays a minor role. How-                                                       sulting models should have good predictive
ever, if the structure of the model is eas-                                                      capabilities while the structure of a model

                                                                                             2
should be easily interpretable. In Sec. 2,                 ical capabilities of Mathematica (see Fig. 3).
we will describe a framework for machine
learning, called mlf, which tries to meet this             Descriptive models via fuzzy logic
requirements. mlf has been successfully
used as an intelligent data analysis tool in               The design of mlf was strongly influenced
several applications. Two particular appli-                by the requirements of being able to gen-
cations are described in Sec. 3 and Sec. 4.                erate descriptive and highly predictive com-
Throughout the sections 2, 3, and 4, we will               putational models. To a large part, this was
draw our attention to the features of the ap-              achieved by integrating fuzzy logic wher-
plied machine learning tools which turned                  ever possible. As a result, the models gen-
out to be important to successfully establish              erated by mlf contain easily interpretable
them in an intelligent data analysis tool.                 phrases like “the value of sensor s1 is large”
                                                           while still maintaining numeric accuracy and
                                                           predictive capabilities. In addition, smooth
2       mlf : A Machine Learn-                             results very often model the underlying de-
                                                           pendencies within the data more realistic
        ing Framework for Math-                            than crisp ones.
        ematica
                                                           Wide range of algorithms
The machine learning framework for Mathe-
matica1 (mlf ) is a collection of powerful ma-             As it is a matter of fact that for a given prob-
chine learning algorithms integrated into a                lem, it is not clear in advance which learning
framework for the main purpose of intelli-                 algorithm will yield the best results, it is al-
gent data analysis [2]. mlf 2 combines an                  ways a good advice to try different machine
optimized computational kernel - the core                  learning algorithms or to combine them to
engine - realized in C++ with the manipu-                  solve one single problem. Such combi-
lation, descriptive programming, and graph-                nations of distinct algorithms may give the
                                                           user unforeseen insights into their data.
    1
      Mathematica is a registered trademark of Wol-        Therefore mlf covers a wide range of ma-
fram Research Inc. (www.wolfram.com).                      chine learning algorithms, which are listed
    2
      mlf is developed and supported by the Knowl-
edge Based Technology area of the Software
                                                           in Tab. 1 together with the corresponding
Competence Center Hagenberg GmbH (A-4232                   references.
Hagenberg, Austria, http://www.scch.at).         mlf
is owned and distributed by uni software plus
GmbH (Kreuzstrasse 15a, A-4040 Linz, Aus-                  Visualization and structure of data
tria, http://www.unisoftwareplus.com/) which has ten
years of experience in distributing Mathematica and        In addition to supervised learning algo-
Mathematica-based solutions and collaborates with          rithms which generate the kind of compu-
Wolfram Research, Inc., for worldwide distribution.        tational models depicted in Fig. 2A, the

                                                       3
Table 1: Algorithms implemented in mlf

                Supervised Analysis
                    Fuzzy decision tree learning (FS-ID3) [3]
                    Fuzzy rule base learning (FS-FOIL, FS-MINER) [4]
                    Numerical optimization of arbitrary rule bases (RENO) [5]

                Unsupervised Analysis
                    Clustering algorithms (WARD, FCM) [8, 1]
                    Self-organizing maps (SOM) [6]

machine learning framework also contains           Powerful architecture and interface
several unsupervised learning algorithms.
Such algorithms can help to understand             The machine learning framework for Math-
how data is structured (e.g. which groups          ematica is used from the Mathematica front
of customers a company has) or to visu-            end. By using the Mathematica front end,
alize the often very high dimensional data         one has access to all Mathematica func-
(e.g. by using SOMs [6]). Apart from ma-           tions, including the powerful graphical ma-
chine learning algorithms, mlf includes a          nipulation tools, and, with the Mathemat-
wide range of visualization methods using          ica programming language, one has access
all the power from the graphics capabilities       to an elegant scripting language. How-
of Mathematica.                                    ever, computationally intensive algorithms
                                                   are implemented in an optimized compu-
                                                   tational kernel (the core engine realized in
                                                   C++) to yield fast response times. The
                                                   C++ kernel is completely independent of
                                                   the Mathematica front end and can be uti-
High speed                                         lized from any environment capable of call-
                                                   ing C++ functions (see Fig. 3).

All algorithms are highly parameterizable to
be able to adjust them to a particular prob-
lem. Given this parameterizeability and the
                                                3 Process optimization in
range of learning algorithms combined with            paper production
the efficient core engine (realized in C++)
of mlf, the user is able to “look at their data In this section, we will describe an applica-
from different points of view” in real time.    tion of mlf for process optimization in paper

                                               4
production.                                                                    results
                                                        Excel sheets                                                 GUI
   Paper production is a complex flow pro-                              auxiliary information
cess with several process levels and hun-
                                                                   da                              nd
                                                                                                        s                   els
dreds of process parameters which poten-                               ta                      a                       od
                                                                                            mm                       dm
                                                                                         co                     ze
tially influence the quality of the produced                                                           u    a li
                                                                                                   vis
paper, assessed by dozens of quality mea-
surements. Hence a major goal is to adjust                                  Mathematica
the process parameters for optimum paper
quality under tough economic constraints.
Knowledge about dependencies within the                    XML               C++ core
data, i.e. the process parameters and the                                                      mlf
quality measurements, is valuable informa-
tion in assisting the optimum adjustment of         Figure 3: The architecture of PaperMiner
parameters which needs enormous expert              which, is based on mlf.
knowledge and experience. Unfortunately
the process of paper production is so com-
                                                    ality must be accessible for users without a
plex that the creation of explicit models for
                                                    special mathematical background. On the
such dependencies is hardly possible. In
                                                    other hand, experts should be able to dig
such a case, intelligent data analysis by
                                                    deeper using the same tools (e.g. with re-
means of machine learning methods can
                                                    spect to co-operative work). This results in
help to discover such required knowledge
                                                    the following requirements:
from previously measured data.
   This approach turned out to be very fruit-
ful and culminated in a project together with       GUI for standard problems
VOITH Paper (www.voith.com) and SCA
Graphic Laakirchen AG (www.sca.at) where            A user should be able to obtain useful re-
a machine learning application called Pa-           sults for standard problems by following a
perMiner was (and is still being) developed.        rather short series of simple steps. The
PaperMiner is heavily used in practice and          PaperMiner GUI represents the required
has led to a number of surprising insights          steps in a hierarchical way which makes
by the users.                                       the whole workflow visible at first sight; see
   A major requirement for the PaperMiner           Fig. 4.
from the onset was a graphical user inter-
face (GUI) for accessing some of the func-          Expert interface
tionality of mlf (see Fig. 3 for a block dia-
gram of PaperMiner). The requirement for            In many cases, however, standard pro-
a GUI reflects the more general require-            cedures will not yield the desired results.
ment that basic machine learning function-          In these cases, the user should be able,

                                                5
Figure 5: PaperMiner’s interactive decision
                                            tree. The screen shot shows a decision tree
Figure 4: Graphical user interface (GUI) of
                                            generated for the well known iris data set.
PaperMiner (screen shot)

                                                      data.
for instance, to optimize parameters of the              During the application of PaperMiner to
model-generating algorithms. With the cur-            practical problems in paper production, it
rent architecture of the application, this is         turned out that whereas the resulting mod-
possible through the Mathematica interface            els often describe highly interesting de-
of mlf.                                               pendencies, sometimes the chosen at-
   Another primary requirement for the Pa-            tributes/measurements are judged by the
perMiner was that it can be used in con-              domain experts not to be the “best” ones.
nection with Microsoft Excel, for the follow-         To examine whether the original measure-
ing reasons: a) Excel is a program which              ment suggested by the learning algorithm
potential users are typically already familiar        can be replaced by a “better” measurement,
with; b) Excel is capable of displaying and           PaperMiner features an interactive tool for
transforming data in a convenient way; and            the construction of a decision tree. This
a) It is relatively easy to extract data from a       tool allows a user to modify individual nodes
database into an Excel file (and respective           of a given (most commonly a previously
scripts were already available for the pilot          learned) decision tree; see Fig. 5.
application). From the requirement to use                A further requirement is that the standard
Excel as the data source, one can derive              presentation of the output should be as ap-
the more general requirement that applying            pealing and intuitive as possible. This is
the machine learning algorithms is only half          most easily achieved by graphical represen-
of the story about intelligent data analysis.         tations. We experienced that people could
One also needs flexible, powerful and easy-           easily interpret decision trees with color en-
to-use tools to access and preprocess the             coding of key information at the nodes with-

                                                  6
out knowing anything about the theoretical
background. Simple diagrams and similar
graphical depictions can be interpreted by
humans intuitively, at first sight, while the
information conveyed by such pictures is
still sufficient for most applications. Still, for
                                                                  transport module            processing module
fine-grained process optimization, more de- finished
tailed output must be available as well.           parts          flow of workpiece holders   feeding module

                                                     Figure 6: Schematic of a loosely coupled
4    Machine Learning in Dis-                        assembly line.

     crete Manufacturing
                                                     an overall equipment efficiency (OEE) as
In this section, we will describe applications       high as possible and to analyze the causes
of machine learning in the are of discrete           if the OEE is not satisfactory. The OEE is
manufacturing. The process of assembling             a product of equipment availability (EA),
discrete parts from its discrete components          production efficiency (P E), and quality rate
(produced elsewhere) has different charac-           (QR). Therefore each of the three factors
teristics from a flow process like paper pro-        must be as high as possible.
duction (see previous section). In a typical            If one of those factors is strongly varying
assembly line, the components to be as-              over time, it is of interest to figure out the
sembled (provided by certain feeder mod-             factors which cause low values. For exam-
ules) are fixed at a workpiece holder which          ple, one may investigate whether the peo-
goes through several processing modules              ple operating the assembly line and the type
until the part is finished either as “o.k.” or       of the produced parts influence the aver-
“not o.k.” (cf. Fig. 6). The whole assembly          age time needed to produce one part. A
process is monitored by a shop floor man-            result of such analysis of a hypothetical as-
agement system which collects the process            sembly line which produces different types
data and provides statistical information to         of footwear is shown in Fig. 7. From such
operators and the management.                        a decision tree, one can not only conclude
   In a project together with AMS En-                that the average time indeed depends on
gineering      Sticht    GmbH      (www.ams-         the type and operator but one can also see
engineering.com), we set out to apply                groupings of operators and types. Similar
machine learning tools like mlf to gain              results were obtained for real data (from a
more knowledge out of the information                given assembly line with more then 30 pro-
provided by the shop floor management                cessing modules made available by AMS
system. The high level goal is to achieve            engineering Sticht GmbH) when analyzing

                                                 7
if operator ∈ {Sue, Pam, Tim, Sam, Bob, Kim} then
                       if type ∈ {casuals, sandals} then T = 13.7
                                                    else T = 12.7
                    else
                       if type ∈ {sandals, galoshes, slippers} then T = 13.05
                                                               else T = 11.34

Figure 7: A binary decision tree (written as if-then-else statements) which shows how the
average production time T depends on the produced type and the operator of a hypothet-
ical footwear assembly line.

EA, P E, and QR in this manner.                      synchronously,3 and it is rather straightfor-
   Obviously there are much more prob-               ward to detect which module caused the
lems regarding the analysis of EA, P E, and          standstill. In contrast, a loosely coupled
QR which can potentially be addressed by             system consists of asynchronously work-
means of machine learning tools. Below,              ing processing modules, and the individual
we describe for each of the three factors            workpiece holders are moved forward by
a typical issue which was not yet solved             separate transport modules which also act
in a satisfactory way in the analysis tools          as buffers (cf. Fig. 6). In this case, it is not
which are currently integrated in the shop           trivial to always detect the correct “guilty”
floor management system.                             modules. Having a model which relates
                                                     standstills of the assembly line and stand-
                                                     stills of modules to each other will help to
                                                     find the reasons for low EA. In the current
                                                     project, first promising results have been
Equipment availability                               obtained by a combination of intelligent data
                                                     visualization and machine learning tools.
Equipment availability (EA) measures how
much time is lost by (technical or organiza-
tional) standstills of the assembly line. Min-       Production efficiency
imizing the number of standstills maximizes
EA. Since a standstill of the assembly line          Production efficiency (P E) describes how
is caused by the standstill of one or more           efficient/fast the individual parts are pro-
processing modules, one must detect the              duced. On the level of individual modules,
“guilty” modules and remedy deficiencies in             3
                                                         To ensure that all the workpiece holders are at
those modules. In a rigidly coupled assem-           the right place at the right time they are usually at-
bly line, all the processing modules work            tached to some kind of transportation belt.

                                                 8
P E measures how efficient/fast the pro-              quality rate of the machine but also help
cessing step of a given module is. To op-             to determine whether a measured quan-
timize the efficiency, it is, of course, inter-       tity is “important” or “useless” (and may be
esting to know what parameters influence              skipped at all), i.e. attribute selection. In-
the processing speed (and thus P E) of a              teresting results of this kind for real data
given module in which way. In the con-                were obtained by using so called model or
text of optimizing the P E of the whole as-           regression trees for more details).
sembly line, it is of special interest to ana-
lyze whether parameters which directly con-
trol the behavior of a given module influ-            5 Conclusions
ence the speed of any successor module.
Making extensive use of machine learning              In this paper we have described two areas
tools, one can, for example, construct a dia-         of industrial applications (paper production
gram where it is shown how strong the pro-            and discrete manufacturing) where we have
cessing speed of a given module depends               successfully applied machine learning tools
on parameters associated with predeces-               (in particular mlf ) in an offline manner for
sor modules. With such an overview dia-               process optimization and quality control. In
gram, one can dig deeper to look at the in-           fact, the knowledge gained by applying this
dividual models relating the parameters to            type of intelligent data analysis helped to
the processing speed. This approach was               optimize the production processes.
successfully applied in analyzing real data              The currently running projects are by no
(made available by AMS engineering Sticht             means unidirectional. We are constantly
GmbH).                                                receiving feedback from our industry part-
                                                      ners which often results in improvements
Quality rate                                          of the used methods. Each particular im-
                                                      provement effects one ore more aspects of
The quality rate (QR) is the ratio between            the used methods. According to our experi-
the number of “o.k.”-parts and the total num-         ence gained in the running projects, several
ber of produced parts. The quality (“o.k.” or         aspects turned out to be important for ma-
“not o.k.”) of a part is determined by means          chine learning methods to be a useful tool
of a great number of measurements (taken              to analyze a company’s crucial data:
at several modules) such as force, length,              • Interpretability and prediction accu-
and torque referred to as quality criteria. If            racy of computational models gener-
any of these quality criteria is outside its              ated from data (e.g. fuzzy decision
allowed range, the part is marked as “not                 trees combine these two requirements)
o.k.”. A central issue is to analyze relation-
ships between such measurements. Such                   • Integration of methods in existing anal-
knowledge will not only help to increase the              ysis and preprocessing tools (e.g. Pa-

                                                  9
perMiner is built as an add-on to MS-         Developer Conf., Champaign, IL, 2003.
    Excel)                                        Wolfram Research Inc.

   • Possibility to compare the results of [3] M. Drobics and U. Bodenhofer. Fuzzy
      several methods (with different param-      modeling with decision trees. In Proc.
      eters) in reasonable time.                  2002 IEEE Int. Conf. on Systems, Man
                                                  and Cybernetics, pages 90–95, Ham-
   • Appealing and easy-to-comprehend             mamet, Tunisia, October 2002.
      graphical representation
                                              [4] M. Drobics, U. Bodenhofer, and E. P.
   • Different levels of analysis; i.e. GUI       Klement. FS-FOIL: An inductive learn-
      (e.g. PaperMiner) versus programming        ing method for extracting interpretable
      language level (e.g. Mathematica).          fuzzy descriptions. Internat. J. Approx.
                                                  Reason., 32(2–3):131–152, 2003.
If all these requirements are met, it is very
likely that people start to “trust” in those [5] J. Haslinger, U. Bodenhofer, and
methods of data analysis.                         M. Burger. Data-driven construction
                                                  of Sugeno controllers: Analytical as-
                                                  pects and new numerical methods. In
Acknowledgements                                  Proc. Joint 9th IFSA World Congress
                                                  and 20th NAFIPS Int. Conf., pages 239–
This work has been done in the framework          244, Vancouver, July 2001.
of the Kplus Competence Center Program
which is funded by the Austrian Govern- [6] T. Kohonen. Self-organized formation of
ment, the Province of Upper Austria, and          topologically correct feature maps. Biol.
the Chamber of Commerce of Upper Aus-             Cyb., 43:59–69, 1982.
tria.                                         [7] E. Lughofer and E. P. Klement. Model-
                                                  based fault detection in multi-sensor
                                                  measurement systems. Technical Re-
References                                        port FLLL-TR-0303, Fuzzy Logic Labo-
                                                  ratorium Linz-Hagenberg, 2003.
[1] J. C. Bezdek. Pattern Recognition with
    Fuzzy Objective Function Algorithms. [8] J. H. Ward Jr. Hierarchical grouping to
    Plenum Press, New York, 1981.            optimize an objective function. J. Amer.
                                             Statist. Assoc., 58:236–244, 1963.
[2] M. Drobics. machine learning frame-
    work for Mathematica — creating under-
    standable computational models from
    data. In Proc. of the 2003 Mathematica

                                            10
You can also read