Extracting Knowledge and Computable Models from Data - Needs, Expectations, and Experience
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Extracting Knowledge and Computable Models from Data - Needs, Expectations, and Experience Thomas Natschläger, Felix Kossak, and Mario Drobics Software Competence Center Hagenberg, A-4232 Hagenberg, Austria Email: {Thomas.Natschlaeger, Felix.Kossak, Mario.Drobics}@scch.at Abstract In modern industrial manufac- ble – a large quantity of high-quality prod- turing, a great amount of data is gathered ucts in a (cost-) efficient way. To tackle to monitor and analyze a given production this challenge, major requirements are an process. Intelligent analysis of such data optimized production process and constant helps to reveal as much information about quality control. Obviously it is desirable the production process as possible. This that each of these issues is supported by information is most useful if it is available suitable intelligent data analysis (IDA) tools in the form of interpretable and predictive which operate on the process data, i.e. sig- models. Such models can be generated nals which are measured by appropriate from data by means of (fuzzy logic based) sensors connected to the production pro- machine learning methods. In this contri- cess. bution we will describe industrial applica- tions in the areas of process optimization Fig. 1 shows a possible scenario where and quality control where we have success- IDA tools are used in an offline manner: fully established machine-learning methods the knowledge gained by IDA operating on as intelligent data analysis tools. large amounts of stored data influences “online” analyses like quality control, trend analysis, and fault detection, as well as the production process itself, i.e. process op- 1 Introduction timization. It turns out that the knowledge extracted by the IDA is particularly useful In order to be competitive on the market, if it comes in the form of mathematically the major goal of any industrial production well defined models of certain dependen- process is to produce – as fast as possi- cies within the data. A potentially success- 1
production process A sensor s1 sensor s2 sensor s2 prediction sensor s3 sensor sn sensor s1 computational knowledge, models, predictions sensor s3 M(s1,...,sd) model M for sensor sp sensor sd online visualization quality control B class_Is_Iris-setosa trend analysis storage class_Is_Iris-versicol fault detection class_Is_Iris-virginica petal_width_IsAtLeast_M H150L T petal_width_IsAtMost_M H100L T class_Is_Iris-setosa H53 Intelligent F class_Is_Iris-virginica Data Analysis F class_Is_Iris-versicol H50L Figure 1: Possible scenario of itelligent data C class_Is_Iris-setosa petal_width_Is_M analysis (IDA) class_Is_Iris-versicol petal_width_Is_L class_Is_Iris-virginica petal_width_Is_H ful approach is to generate such models Figure 2: Computational models. In con- by means of machine learning methods; trast to “black box models” (A) such as neu- see e.g. [1]. Typically a model “learned” ral networks, descriptive models such as from the data predicts a sensor signal sp decision trees (B) or rule bases (C) are easy based on a set of different sensor signals to interpret. The drawings in panels B and s1 , s2 , ..., sd , i.e. sp = M (s1 , s2 , ..., sd ) (see C are obtained with mlf (see Sec. II) applied also Fig. 2A). to the well-known iris data set. The use of such predictive models can be twofold: one can exploit the predictive capa- bilities of a model on the one hand and the ily interpretable, like it is the case for deci- structure of the model on the other hand. sion trees and rule bases (see Fig. 2B), this The prediction obtained from a given (and structure can help to understand the pro- previously learned) model can for example cess in more depth. In consequence, such be used to detect anomalous working con- enhanced knowledge can often be used to ditions of the production process [7] or to optimize the production process. get an idea about the quality of the product Even from this short introduction, one can if certain process parameters are varied. In already specify two general requirements such applications, the accuracy of the pre- of machine learning methods when applied diction is of primary interest while the struc- as intelligent data analysis tools: the re- ture of the model plays a minor role. How- sulting models should have good predictive ever, if the structure of the model is eas- capabilities while the structure of a model 2
should be easily interpretable. In Sec. 2, ical capabilities of Mathematica (see Fig. 3). we will describe a framework for machine learning, called mlf, which tries to meet this Descriptive models via fuzzy logic requirements. mlf has been successfully used as an intelligent data analysis tool in The design of mlf was strongly influenced several applications. Two particular appli- by the requirements of being able to gen- cations are described in Sec. 3 and Sec. 4. erate descriptive and highly predictive com- Throughout the sections 2, 3, and 4, we will putational models. To a large part, this was draw our attention to the features of the ap- achieved by integrating fuzzy logic wher- plied machine learning tools which turned ever possible. As a result, the models gen- out to be important to successfully establish erated by mlf contain easily interpretable them in an intelligent data analysis tool. phrases like “the value of sensor s1 is large” while still maintaining numeric accuracy and predictive capabilities. In addition, smooth 2 mlf : A Machine Learn- results very often model the underlying de- pendencies within the data more realistic ing Framework for Math- than crisp ones. ematica Wide range of algorithms The machine learning framework for Mathe- matica1 (mlf ) is a collection of powerful ma- As it is a matter of fact that for a given prob- chine learning algorithms integrated into a lem, it is not clear in advance which learning framework for the main purpose of intelli- algorithm will yield the best results, it is al- gent data analysis [2]. mlf 2 combines an ways a good advice to try different machine optimized computational kernel - the core learning algorithms or to combine them to engine - realized in C++ with the manipu- solve one single problem. Such combi- lation, descriptive programming, and graph- nations of distinct algorithms may give the user unforeseen insights into their data. 1 Mathematica is a registered trademark of Wol- Therefore mlf covers a wide range of ma- fram Research Inc. (www.wolfram.com). chine learning algorithms, which are listed 2 mlf is developed and supported by the Knowl- edge Based Technology area of the Software in Tab. 1 together with the corresponding Competence Center Hagenberg GmbH (A-4232 references. Hagenberg, Austria, http://www.scch.at). mlf is owned and distributed by uni software plus GmbH (Kreuzstrasse 15a, A-4040 Linz, Aus- Visualization and structure of data tria, http://www.unisoftwareplus.com/) which has ten years of experience in distributing Mathematica and In addition to supervised learning algo- Mathematica-based solutions and collaborates with rithms which generate the kind of compu- Wolfram Research, Inc., for worldwide distribution. tational models depicted in Fig. 2A, the 3
Table 1: Algorithms implemented in mlf Supervised Analysis Fuzzy decision tree learning (FS-ID3) [3] Fuzzy rule base learning (FS-FOIL, FS-MINER) [4] Numerical optimization of arbitrary rule bases (RENO) [5] Unsupervised Analysis Clustering algorithms (WARD, FCM) [8, 1] Self-organizing maps (SOM) [6] machine learning framework also contains Powerful architecture and interface several unsupervised learning algorithms. Such algorithms can help to understand The machine learning framework for Math- how data is structured (e.g. which groups ematica is used from the Mathematica front of customers a company has) or to visu- end. By using the Mathematica front end, alize the often very high dimensional data one has access to all Mathematica func- (e.g. by using SOMs [6]). Apart from ma- tions, including the powerful graphical ma- chine learning algorithms, mlf includes a nipulation tools, and, with the Mathemat- wide range of visualization methods using ica programming language, one has access all the power from the graphics capabilities to an elegant scripting language. How- of Mathematica. ever, computationally intensive algorithms are implemented in an optimized compu- tational kernel (the core engine realized in C++) to yield fast response times. The C++ kernel is completely independent of the Mathematica front end and can be uti- High speed lized from any environment capable of call- ing C++ functions (see Fig. 3). All algorithms are highly parameterizable to be able to adjust them to a particular prob- lem. Given this parameterizeability and the 3 Process optimization in range of learning algorithms combined with paper production the efficient core engine (realized in C++) of mlf, the user is able to “look at their data In this section, we will describe an applica- from different points of view” in real time. tion of mlf for process optimization in paper 4
production. results Excel sheets GUI Paper production is a complex flow pro- auxiliary information cess with several process levels and hun- da nd s els dreds of process parameters which poten- ta a od mm dm co ze tially influence the quality of the produced u a li vis paper, assessed by dozens of quality mea- surements. Hence a major goal is to adjust Mathematica the process parameters for optimum paper quality under tough economic constraints. Knowledge about dependencies within the XML C++ core data, i.e. the process parameters and the mlf quality measurements, is valuable informa- tion in assisting the optimum adjustment of Figure 3: The architecture of PaperMiner parameters which needs enormous expert which, is based on mlf. knowledge and experience. Unfortunately the process of paper production is so com- ality must be accessible for users without a plex that the creation of explicit models for special mathematical background. On the such dependencies is hardly possible. In other hand, experts should be able to dig such a case, intelligent data analysis by deeper using the same tools (e.g. with re- means of machine learning methods can spect to co-operative work). This results in help to discover such required knowledge the following requirements: from previously measured data. This approach turned out to be very fruit- ful and culminated in a project together with GUI for standard problems VOITH Paper (www.voith.com) and SCA Graphic Laakirchen AG (www.sca.at) where A user should be able to obtain useful re- a machine learning application called Pa- sults for standard problems by following a perMiner was (and is still being) developed. rather short series of simple steps. The PaperMiner is heavily used in practice and PaperMiner GUI represents the required has led to a number of surprising insights steps in a hierarchical way which makes by the users. the whole workflow visible at first sight; see A major requirement for the PaperMiner Fig. 4. from the onset was a graphical user inter- face (GUI) for accessing some of the func- Expert interface tionality of mlf (see Fig. 3 for a block dia- gram of PaperMiner). The requirement for In many cases, however, standard pro- a GUI reflects the more general require- cedures will not yield the desired results. ment that basic machine learning function- In these cases, the user should be able, 5
Figure 5: PaperMiner’s interactive decision tree. The screen shot shows a decision tree Figure 4: Graphical user interface (GUI) of generated for the well known iris data set. PaperMiner (screen shot) data. for instance, to optimize parameters of the During the application of PaperMiner to model-generating algorithms. With the cur- practical problems in paper production, it rent architecture of the application, this is turned out that whereas the resulting mod- possible through the Mathematica interface els often describe highly interesting de- of mlf. pendencies, sometimes the chosen at- Another primary requirement for the Pa- tributes/measurements are judged by the perMiner was that it can be used in con- domain experts not to be the “best” ones. nection with Microsoft Excel, for the follow- To examine whether the original measure- ing reasons: a) Excel is a program which ment suggested by the learning algorithm potential users are typically already familiar can be replaced by a “better” measurement, with; b) Excel is capable of displaying and PaperMiner features an interactive tool for transforming data in a convenient way; and the construction of a decision tree. This a) It is relatively easy to extract data from a tool allows a user to modify individual nodes database into an Excel file (and respective of a given (most commonly a previously scripts were already available for the pilot learned) decision tree; see Fig. 5. application). From the requirement to use A further requirement is that the standard Excel as the data source, one can derive presentation of the output should be as ap- the more general requirement that applying pealing and intuitive as possible. This is the machine learning algorithms is only half most easily achieved by graphical represen- of the story about intelligent data analysis. tations. We experienced that people could One also needs flexible, powerful and easy- easily interpret decision trees with color en- to-use tools to access and preprocess the coding of key information at the nodes with- 6
out knowing anything about the theoretical background. Simple diagrams and similar graphical depictions can be interpreted by humans intuitively, at first sight, while the information conveyed by such pictures is still sufficient for most applications. Still, for transport module processing module fine-grained process optimization, more de- finished tailed output must be available as well. parts flow of workpiece holders feeding module Figure 6: Schematic of a loosely coupled 4 Machine Learning in Dis- assembly line. crete Manufacturing an overall equipment efficiency (OEE) as In this section, we will describe applications high as possible and to analyze the causes of machine learning in the are of discrete if the OEE is not satisfactory. The OEE is manufacturing. The process of assembling a product of equipment availability (EA), discrete parts from its discrete components production efficiency (P E), and quality rate (produced elsewhere) has different charac- (QR). Therefore each of the three factors teristics from a flow process like paper pro- must be as high as possible. duction (see previous section). In a typical If one of those factors is strongly varying assembly line, the components to be as- over time, it is of interest to figure out the sembled (provided by certain feeder mod- factors which cause low values. For exam- ules) are fixed at a workpiece holder which ple, one may investigate whether the peo- goes through several processing modules ple operating the assembly line and the type until the part is finished either as “o.k.” or of the produced parts influence the aver- “not o.k.” (cf. Fig. 6). The whole assembly age time needed to produce one part. A process is monitored by a shop floor man- result of such analysis of a hypothetical as- agement system which collects the process sembly line which produces different types data and provides statistical information to of footwear is shown in Fig. 7. From such operators and the management. a decision tree, one can not only conclude In a project together with AMS En- that the average time indeed depends on gineering Sticht GmbH (www.ams- the type and operator but one can also see engineering.com), we set out to apply groupings of operators and types. Similar machine learning tools like mlf to gain results were obtained for real data (from a more knowledge out of the information given assembly line with more then 30 pro- provided by the shop floor management cessing modules made available by AMS system. The high level goal is to achieve engineering Sticht GmbH) when analyzing 7
if operator ∈ {Sue, Pam, Tim, Sam, Bob, Kim} then if type ∈ {casuals, sandals} then T = 13.7 else T = 12.7 else if type ∈ {sandals, galoshes, slippers} then T = 13.05 else T = 11.34 Figure 7: A binary decision tree (written as if-then-else statements) which shows how the average production time T depends on the produced type and the operator of a hypothet- ical footwear assembly line. EA, P E, and QR in this manner. synchronously,3 and it is rather straightfor- Obviously there are much more prob- ward to detect which module caused the lems regarding the analysis of EA, P E, and standstill. In contrast, a loosely coupled QR which can potentially be addressed by system consists of asynchronously work- means of machine learning tools. Below, ing processing modules, and the individual we describe for each of the three factors workpiece holders are moved forward by a typical issue which was not yet solved separate transport modules which also act in a satisfactory way in the analysis tools as buffers (cf. Fig. 6). In this case, it is not which are currently integrated in the shop trivial to always detect the correct “guilty” floor management system. modules. Having a model which relates standstills of the assembly line and stand- stills of modules to each other will help to find the reasons for low EA. In the current project, first promising results have been Equipment availability obtained by a combination of intelligent data visualization and machine learning tools. Equipment availability (EA) measures how much time is lost by (technical or organiza- tional) standstills of the assembly line. Min- Production efficiency imizing the number of standstills maximizes EA. Since a standstill of the assembly line Production efficiency (P E) describes how is caused by the standstill of one or more efficient/fast the individual parts are pro- processing modules, one must detect the duced. On the level of individual modules, “guilty” modules and remedy deficiencies in 3 To ensure that all the workpiece holders are at those modules. In a rigidly coupled assem- the right place at the right time they are usually at- bly line, all the processing modules work tached to some kind of transportation belt. 8
P E measures how efficient/fast the pro- quality rate of the machine but also help cessing step of a given module is. To op- to determine whether a measured quan- timize the efficiency, it is, of course, inter- tity is “important” or “useless” (and may be esting to know what parameters influence skipped at all), i.e. attribute selection. In- the processing speed (and thus P E) of a teresting results of this kind for real data given module in which way. In the con- were obtained by using so called model or text of optimizing the P E of the whole as- regression trees for more details). sembly line, it is of special interest to ana- lyze whether parameters which directly con- trol the behavior of a given module influ- 5 Conclusions ence the speed of any successor module. Making extensive use of machine learning In this paper we have described two areas tools, one can, for example, construct a dia- of industrial applications (paper production gram where it is shown how strong the pro- and discrete manufacturing) where we have cessing speed of a given module depends successfully applied machine learning tools on parameters associated with predeces- (in particular mlf ) in an offline manner for sor modules. With such an overview dia- process optimization and quality control. In gram, one can dig deeper to look at the in- fact, the knowledge gained by applying this dividual models relating the parameters to type of intelligent data analysis helped to the processing speed. This approach was optimize the production processes. successfully applied in analyzing real data The currently running projects are by no (made available by AMS engineering Sticht means unidirectional. We are constantly GmbH). receiving feedback from our industry part- ners which often results in improvements Quality rate of the used methods. Each particular im- provement effects one ore more aspects of The quality rate (QR) is the ratio between the used methods. According to our experi- the number of “o.k.”-parts and the total num- ence gained in the running projects, several ber of produced parts. The quality (“o.k.” or aspects turned out to be important for ma- “not o.k.”) of a part is determined by means chine learning methods to be a useful tool of a great number of measurements (taken to analyze a company’s crucial data: at several modules) such as force, length, • Interpretability and prediction accu- and torque referred to as quality criteria. If racy of computational models gener- any of these quality criteria is outside its ated from data (e.g. fuzzy decision allowed range, the part is marked as “not trees combine these two requirements) o.k.”. A central issue is to analyze relation- ships between such measurements. Such • Integration of methods in existing anal- knowledge will not only help to increase the ysis and preprocessing tools (e.g. Pa- 9
perMiner is built as an add-on to MS- Developer Conf., Champaign, IL, 2003. Excel) Wolfram Research Inc. • Possibility to compare the results of [3] M. Drobics and U. Bodenhofer. Fuzzy several methods (with different param- modeling with decision trees. In Proc. eters) in reasonable time. 2002 IEEE Int. Conf. on Systems, Man and Cybernetics, pages 90–95, Ham- • Appealing and easy-to-comprehend mamet, Tunisia, October 2002. graphical representation [4] M. Drobics, U. Bodenhofer, and E. P. • Different levels of analysis; i.e. GUI Klement. FS-FOIL: An inductive learn- (e.g. PaperMiner) versus programming ing method for extracting interpretable language level (e.g. Mathematica). fuzzy descriptions. Internat. J. Approx. Reason., 32(2–3):131–152, 2003. If all these requirements are met, it is very likely that people start to “trust” in those [5] J. Haslinger, U. Bodenhofer, and methods of data analysis. M. Burger. Data-driven construction of Sugeno controllers: Analytical as- pects and new numerical methods. In Acknowledgements Proc. Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf., pages 239– This work has been done in the framework 244, Vancouver, July 2001. of the Kplus Competence Center Program which is funded by the Austrian Govern- [6] T. Kohonen. Self-organized formation of ment, the Province of Upper Austria, and topologically correct feature maps. Biol. the Chamber of Commerce of Upper Aus- Cyb., 43:59–69, 1982. tria. [7] E. Lughofer and E. P. Klement. Model- based fault detection in multi-sensor measurement systems. Technical Re- References port FLLL-TR-0303, Fuzzy Logic Labo- ratorium Linz-Hagenberg, 2003. [1] J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. [8] J. H. Ward Jr. Hierarchical grouping to Plenum Press, New York, 1981. optimize an objective function. J. Amer. Statist. Assoc., 58:236–244, 1963. [2] M. Drobics. machine learning frame- work for Mathematica — creating under- standable computational models from data. In Proc. of the 2003 Mathematica 10
You can also read