Validation of design methods: lessons from medicine
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Research in Engineering Design (2006) 17: 45–57 DOI 10.1007/s00163-006-0016-4 O R I GI N A L P A P E R Daniel D. Frey Æ Clive L. Dym Validation of design methods: lessons from medicine Received: 17 March 2005 / Revised: 15 January 2006 / Accepted: 10 April 2006 / Published online: 25 May 2006 Springer-Verlag London Limited 2006 Abstract This paper discusses the validation of design airplane manufacturers and in their government coun- methods. The challenges and opportunities in validation terparts (e.g., NASA). Early-stage design decisions are are illustrated by drawing an analogy to medical re- characterized by uncertainty and ambiguity and, espe- search and development. Specific validation practices cially in large organizations, later design decisions are such as clinical studies and use of models of human made through processes involving teams with a variety disease are discussed, including specific ways to adapt of experiences, skill, and information. In such circum- them to engineering design. The implications are ex- stances, it can be challenging to establish the benefits of plored for three active areas of design research: robust new methods. design, axiomatic design, and design decision making. It This paper explores the topic of validation through is argued that medical research and development has an analogy with medical research and development. The highly-developed, well-documented validation methods principal audience is design researchers although it is and that many specific practices such as natural experi- hoped that policy makers and industry practitioners may ments and model-based evaluations can profitably be also find the discussion useful. Section 2 reviews the adapted for use in engineering design research. literature on validation in the context of design meth- odology. Section 3 introduces an analogy between design Keywords Validation Æ Design methodology Æ Robust methods and medical treatments. Section 4 considers this design Æ Design decisions analogy in the context of three active research topics in engineering design, and some concluding remarks are presented in Sect. 5. 1 Introduction The validation of design methods is important for the continuing advancement of both design theory and the 2 Literature review professional practice of engineering. Researchers in de- sign theory require validation processes to guide the 2.1 Validation of design knowledge development and evaluation of new methods. Profes- sional practitioners need validation processes to deter- This paper is concerned with the validation of claims to mine which methods to employ, as well as when and knowledge in and about engineering design. This subject how to employ them. The latter is especially important can be viewed as a specialized topic within epistemol- in large, complex organizations such as automobile and ogy—the branch of philosophy concerned with the nature of knowledge, the justification of knowledge, and the nature of rationality. There are three prominent D. D. Frey (&) contemporary views of the justification of knowledge Department of Mechanical Engineering and Engineering Systems Division, Massachusetts Institute of Technology, claims (Audi 1995): 77 Massachusetts Avenue, Cambridge, MA 02139, USA E-mail: danfrey@mit.edu • Foundationalism holds that some instances of Tel.: +1-617-3246133 knowledge are basic and that the remaining instances are justified by relating them to basic beliefs (e.g., by C. L. Dym deduction from axioms); Department of Engineering, Harvey Mudd College, • Relativism argues that knowledge cannot be validated 301 Platt Boulevard, Claremont, CA 91711–5990, USA E-mail: clive_dym@hmc.edu in an objective way and that individual, subjective Tel.: +1-909-6218853 preferences and rules of fraternal behavior among
46 scientists must be considered a part of validation behaviors. This concept is sometimes called ecological processes; and rationality because it implies that knowledge—as in- • Naturalistic epistemology promotes empirical study of ferred from behavior—should be judged by a fit with the how subjects convert sensory data into theories. environment (Todd and Gigerenzer 2003). Simon’s view of rationality is generally aligned with naturalistic epis- Schön suggested that engineering design and related temology since it has a basis in empirical studies of professions such as architecture and management have agents. created a demand for an epistemology of practice (Schön 1983). Schön conducted field studies of engineers and other professionals and noted that skilled practitioners 2.2 Validation of methods frequently rely on tacit knowledge that cannot easily be codified. Dym (1994a, b) has made similar arguments Engineers frequently seek to evaluate a design method or about the difficulty of representing and articulating de- software tool for a specified use or range of uses. The sign knowledge. Argyris called the set of tacit knowledge Institute of Electrical and Electronics Engineers (IEEE driving one’s professional work a theory-in-action, 1998) defines validation as the ‘‘confirmation by exami- which, he noted, often failed to match the espoused nation and provision of objective evidence that the par- theory of the discipline or practice (Argyris 1991). Schön ticular requirements for on intended use are fulfilled.’’ As and Argyris (1975) proposed a framework for evaluating will be discussed in Sect. 3, this paper will seek to build theories related to professional practice that included upon the IEEE definition showing it is closely related to checks on: (1) internal consistency, (2) congruence with that of the Food and Drug Administration. the espoused theory, (3) testability of the theory, and, Olewnik and Lewis (2005) propose an alternative ultimately, (4) effectiveness of the theory. They further definition of validation (applicable to decision support argue that a theory-in-action is: methods and design methods) wherein, for a method to be valid, it must: • Testable if one can ‘‘specify the situation, the desired result, and the action through which the result is to be 1. Be logical; achieved,’’ and 2. Use meaningful reliable information; and • Effective when ‘‘action according to the theory tends 3. Not bias the designer. to achieve its governing variables.’’ One desirable property of this definition is that it Pedersen et al. (2000) proposed a similar framework in reveals the ways that some design methods impose which design theories are subjected to a validation preferences on the designer. However, a potential square of four quadrants, each representing one of the drawback of Olewnik and Lewis’ definition is that following design dimensions: invalidity ‘‘does not imply that the methods are inef- fective.’’ By contrast, the IEEE definition emphasizes a 1. Theoretical structural validity. link between validation and an assurance of effectiveness 2. Empirical structural validity. for its specific intended uses. 3. Empirical performance validity, and Todd and Gigerenzer (2003) have proposed still a 4. Theoretical performance validity. different way of validating decision methods (generally aligned with naturalistic epistemology) comprised of the Thus, Pedersen et al. (2000) and Argyris (1991) and following steps: Schön (1983) all suggest a balanced approach that in- cludes the evaluation of internal consistency and effec- 1. Proposing computational models of candidate tiveness. One key difference is that the framework of methods that are realistically based on human com- Pedersen et al. is more nearly aligned with relativist petences, and testing whether they work via simula- epistemology: they define scientific knowledge within the tion; field of design as socially justifiable belief. The validation 2. Mathematically analyzing when and how the meth- square does balance this relativistic definition of ods work with particular environmental structures; knowledge with use of empirically-based notions of and validity. An objective of the present paper is to explore a 3. Experimentally testing when people use these methods. more objectivist foundation for validation of design methods. This approach to validation was applied to many A distinctly different framework was suggested by decision scenarios resulting in the conclusion that ‘‘there Simon, who proposed that ‘‘human rational behavior is is a point where increasing information and information shaped by a scissors whose two blades are the structure processing can actually do harm.’’ One specific example of task environments and the computational capabilities is that a ‘‘Take The Best’’ heuristic equals or outper- of the actor’’ (Simon 1990). Simon emphasized the fit forms any linear decision strategy because decision cues between problem-solving behaviors and the problem are frequently non-compensatory, that is, the potential environment, rather than the internal consistency of the contribution of each new cue falls off rapidly so that
47 combinations of later cues cannot outweigh earlier ones 2.4 Validation of design research methodology (Todd and Gigerenzer 2003). This empirically observed effect, sometimes called the ‘‘less is more’’ effect, is The engineering design research community also has a particularly of interest in light of Olewink and Lewis’ need to evaluate research programs and their results. proposed criteria of validity (Olewnik and Lewis 2005). Therefore critical analysis of research methodology has Todd and Gigerenzer (2003) have provided well-docu- periodically been pursued and has been a healthy part of mented examples in which inclusion of valid information our community’s development. Reich (1994) analyzed into a decision process causes worse decision outcomes the status of research methodology in artificial intelli- rather than better ones. gence for engineering design. His conclusion was that the status is ‘‘poor’’ partly because ‘‘it makes claims far beyond what other disciplines dream of making’’ and 2.3 Validation of models that ‘‘the need for reflection is not taken seriously in AI.’’ As an antidote to these ills, Reich proposed a Engineers frequently seek to evaluate and apply a model layered model of research methodology. The first layer for a specified use or a range of uses. The American differentiates research methods according to their Institute of Aeronautics and Astronautics (AIAA 1998) metaphysical stance. The two options in this layer were defines model validation as ‘‘the process of determining scientism which is based on an objectivist epistemology the degree to which a model is an accurate representa- and practicism which is based on a subjectivist episte- tion of the real world from the perspective of the in- mology, tempered to avoid misuse. The second layer tended uses of the model.’’ Much work has been done to concerns research heuristics that characterize different practically implement a system of model validation communities, such as cognitive science, decision science, consistent with the AIAA definition. For example, and software engineering. The third layer concerns the Hasselman has proposed a technique for evaluating most specific issues of hypothesis evaluation including models based on bodies of empirical data (Hasselman statistical testing and assessment of parsimony. The 2001; Hasselman et al. 1998). goals of the present paper are quite similar to those of Hazelrigg (2003) proposed an alternative definition of Reich (1994) but the position taken here and to be model validity from the perspective of decision the- developed subsequently is substantially different. Reich ory—a model is valid to the extent that it supports the says of the scientist/objectivist view ‘‘we have witnessed conclusion that ‘‘design point O will produce an out- its demise in philosophy.’’ Our view is that it is pre- come that is preferred to the outcome that would be mature to dismiss an objective basis for evaluating de- produced by design point C with essentially probability sign research, methods, or models. At the same time, the 1.’’ This definition of validity prizes resolution over concept of layered evaluations and many of the specific accuracy. One desirable property of Hazelrigg’s defini- suggestions Reich makes are strongly endorsed, espe- tion is that a relatively inaccurate model may be viewed cially ‘‘if the purpose is improving practice, then a study as valid for making choices among alternatives when one that illustrates such improvement should be furnished’’ of the alternatives is vastly preferred to the others. Ha- (Reich 1994). zelrigg’s view of model validation is, in some ways, foundationalist, since it seeks to maintain traceability to basic knowledge, in this case to classical decision theory. 2.5 Needs analysis However, the framework also embraces some elements of subjectivity since, according to Hazelrigg, ‘‘a model is A review of the literature reveals a substantial range of valid when, in the mind of the decision maker, it’s up to opinion regarding validation in design theory. On the the task.’’ The AIAA definition, by contrast, emphasizes other hand, engineering’s professional societies have an objective correspondence of a model with data. been fairly consistent in other matters in seeking vali- McAdams and Dym (2004) have also discussed the dation based on the provision of objective evidence. validation of models in engineering design, drawing on However, these same professional societies have not yet analogies and procedures used in mathematical model- explicitly applied these concepts to design methods and ing (Reich 1994), while also observing that ‘‘design theories. If the engineering profession does choose to models operate on information to produce informa- extend an objective concept of validation to design tion.’’ This, of course, leads one to ask, ‘‘What is methods and tools, it will need a supporting set of information?’’, a question to which many answers have practices and standards for the provision of evidence. been offered (e.g., see McAdams and Dym 2004). Ha- Questions such as the following will have to be an- zelrigg defined information as ‘‘what [a] decision is made swered: on,’’ and noted that it can be quantified in terms of probabilistic outcomes of specific decisions. But, as • Can theoretical arguments alone serve to validate a McAdams and Dym noted, there are many aspects of design method or theory, or must they ultimately be design (e.g., concept generation and synthesis) that are validated experimentally? poorly modeled when modeled only as decisions • What kinds of experiments provide valid evidence of (McAdams and Dym 2004). effectiveness for an intended use?
48 • Can design method validation be made economically in the labeling or proposed labeling thereof’’ (U. S. viable by improvements in speed and efficiency? Federal Food, Drug, and Cosmetic Act, Chapter 9.V, Sec. 355(d), http://www.access.gpo.gov/uscode/title21/ In connection with these questions, it is worth noting the chapter9_.html). recent study by Dorst and Vermass (2005), which ana- The primary goal of design research is to develop lyzes Gero’s function–behavior–structure model. Dorst design methods to be learned and used by designers to and Vermass (2005) point out that many difficulties arise create engineering artifacts, usually within commercial when one tries to match empirical (or experimental) re- engineering enterprises. The purpose of a design method sults against models and theories, including the lack of is to achieve specific design outcomes (e.g., improved clear criteria for matching the ‘‘quality’’ of models with profitability or better product quality and reliability). ‘‘validating’’ empirical data, and indeed, whether models The methods are developed by academic researchers, can be defined or specified sufficiently to be observed in industry practitioners, or by both. Engineering compa- the laboratory (or in practice.) nies look to a variety of professionals (e.g., engineers. The principal goal of this paper is to provide some statisticians, and managers) for advice about design practical answers to these (and other) questions by methods and for help in implementing them. Although drawing on the experience of another profession, medi- lacking the force of law, the IEEE definition of valida- cine, which has faced similar questions and answered tion entails ‘‘confirmation by examination and provision them with a substantial degree of success. In subsequent of objective evidence that the particular requirements for sections, it is argued that medical research and devel- an intended use are fulfilled’’ (Institute of Electrical and opment has highly-developed, well-documented valida- Electronics Engineers 1998). tion methods and that many specific practices such as Table 1 summarizes the analogy just outlined, based natural experiments and model-based evaluations can on the fact that both areas include professionals seeking profitably be adapted for use in engineering design re- outcomes by applying treatments or methodologies that search. might require validation. An important caution arises from inspection of Table 1. If medical treatments are to be compared with design methods, the most salient 3 The medical treatment–design method analogy differences between the two must be openly acknowl- edged. Design methods almost invariably require sub- 3.1 Introducing the analogy stantial interpretation and judgment in their application. This is true to a lesser extent with medical treatments. The professional community involved in medical re- Many drugs, for example, are either administered search and development has developed a set of well- according to the suggested dosage or not at all, leaving documented validation processes. The processes used in little room for interpretation of what administration of medicine are far from perfect and many significant the treatment entails. Surgical procedures, on the other mistakes are still made, both by releasing unsafe or hand, may vary substantially depending on the judg- ineffective drugs or by withholding effective treatments ment of the practitioner implementing them. Design too long. Still the practices applied in medicine possess methods, it must be admitted, lie at an extreme of the many positive attributes from which much can be spectrum: the application of design methods depends learned. In an effort to draw out those lessons, this section introduces an extensive analogy between medical Table 1 An analogy between medical research and development research and development, on the one hand, and design and design theory and methodology theory and methodology, on the other, here called the Medical research Design theory and medical treatment–design method analogy. and development methodology The primary goal of medical research and develop- ment is to develop treatments to be administered to hu- What is validated Medical treatments Design methods man patients. The purpose of the treatment is to achieve Entity affected Human patient Engineering clinical outcomes related to improved health (e.g., low- organization Outcomes Health, side Quality, time ering blood pressure). The treatments are developed evaluated effects, etc. to market, through academic research, or at pharmaceutical com- profitability, etc. panies, or both. Patients seek advice about treatments Developers Academic Academic researchers, from medical professionals and often gain access to such researchers, industry practitioners, pharmaceutical consultants, etc. treatments exclusively through them. Before a claim can companies, etc. be made about a treatment’s effects, the 1962 amendment Professions Medical doctors, Engineers, of the Food, Drug, and Cosmetics (FDC) Act requires involved nurses, statisticians, provision of ‘‘evidence consisting of adequate and well technicians managers controlled investigations . . . that the drug will have the Standards for Food, Drug, IEEE definition validation and Cosmetics of validation, effect it purports or is represented to have under the Act, and so on and so on conditions of use prescribed, recommended, or suggested
49 strongly on judgment of the designer who applies them, must be acknowledged, the subjects in most clinical tri- and their effectiveness is almost assuredly compromised als do not get to choose their treatment(s). The health of by poor implementation. the subjects in the clinical trial is affected by a multitude While the association among the entities compared is of factors that may not be controlled and/or may not be not perfect, it is worth continuing the effort to extend the monitored. For this reason, the data from clinical trials medical treatment–design method analogy to compare is subject to careful statistical analysis (Department of validation of medical treatments and validation of de- Health and Human Services—Food and Drug Admin- sign methods. Validation requires that evidence be istration 1998b), and adequate sample sizes are required provided. The types of evidence provided in medical to reach conclusions with a satisfactory degree of con- research and development are rich and varied. Table 2 fidence. It should be noted that the requirement for (below) provides a partial list of the types of evidence in clinical trials does have negative consequences, such as used in medical research. The list is roughly structured when treatments are withheld from patients that might downward from the most comprehensive evi- otherwise benefit from them. The point is not that the dence—clinical trials—through varying layers of sup- methods are perfect nor that they are perfectly executed, porting evidence, each requiring more assumptions and only that they are highly-developed and that much can abstractions than the one above. All of the layers of be learned from them. evidence are useful, but as will be discussed, there is a Now, to explore the medical treatment–design hierarchy to the levels of evidence. The next five sub- method analogy for validation, a ‘‘clinical trial’’ might sections discuss each layer in detail and flesh out the proceed as follows. Design problems would be identified, analogy with design theory. perhaps as benchmark design problems, and then allo- cated to different design methods or tools, including the specific design method to be studied and a comparable 3.2 Clinical trials tool. (It is hard to identify a design equivalent of a placebo here, although perhaps there is a design equiv- A clinical trial is the ultimate standard for validating the alent of the medical admonition to do no harm that effectiveness of medical treatments. The U.S. Food and could be brought into play). The design methods are Drug Administration (FDA) has developed detailed applied, the results are reviewed, and the design out- guidance for industry on the conduct of such trials comes are recorded. To avoid bias, the application of the (Department of Health and Human Services—Food and design methods ought to be done in some blinded Drug Administration 1996, 1998a). Human subjects are fashion, and while the ‘‘patient’’ or the design problem identified and allocated to different medical treatments can be oblivious to the ‘‘treatment’’ or the design including the specific treatment to be studied and a method being applied, the ‘‘physician-researcher’’ or comparator (sometimes a placebo). The medical treat- designer would have to know which method she was ments are administered, the effects are monitored, and applying, else she could not apply it! (On the other hand, the outcomes are recorded. To avoid bias, the adminis- whereas subjects in most clinical trials do not get to tration of the treatments is usually blinded, often double choose their treatment, designers—and analysts—often blinded so that neither the patient nor the physician- recognize that some problems ‘‘beg for’’ certain solu- researcher knows which treatments are given to which tions or treatments.) The outcomes or solutions to the subjects. Further, even though patients may volunteer design problem in the clinical trial may be affected by for or asked to be included in clinical trials, an other factors that cannot be readily controlled, although increasingly common experience in cancer treatments it is more conceivable that problems and their solution wherein there are other ethical issues whose presence methods can here be more readily isolated. Nevertheless, adequate sample sizes would still be required to reach conclusions with a satisfactory degree of confidence. Table 2 Types of evidence used to develop and validate medical Thus, a clinical trial for design methods could be an treatments and design methods experiment in which different methods are allocated to Evidence used to Evidence used to organizations or to specific tasks therein. The resultant develop/validate develop/validate effects on quality, profitability, or time to market might medical treatments design methods be monitored and analyzed statistically. Such studies are sometimes conducted and a recent example (Kunert Clinical trials Controlled field evaluation 2004) will be discussed in Sect. 4.1. However, as fore- of design methods Natural experiments Studies of industry practice shadowed just above, analogies of clinical studies for In vitro experiments Laboratory experiments engineering design raise several questions, including: with human subjects Animal models Detailed simulations • How can appropriate (benchmark) design problems of design methods be identified, articulated and represented. Moreover, Theory (organic chemistry, Theory (probability, decision who should do that? genetics, molecular biology, science, cognitive science, cellular biology, etc.) organizational behavior) • Since designers—unlike their physician-researcher counterparts—must be familiar with the design
50 method they are implementing, does it matter that the that ‘‘smoking is a factor, and an important factor, in process of blinding is likely impossible in this context? the production of carcinoma of the lung’’ (Doll and Hill If it does matter, how much? 1950). There was subsequent debate over the validity of • Since many engineering firms are unlikely to accept the study: since it was a natural—rather than con- and implement a design method not of their own trolled—experiment, correlation had been statistically choosing, does it matter if the allocation of treatments established, but causation was less certain. The choice to or design methods cannot be randomized? If it does smoke or not was made by the subjects, and those who matter, how much? choose to smoke are not an unbiased, random sample of • Can the costs of doing such research—over many the population at large. The influence of such consid- companies, designers, methods and products—be kept erations on inference should not be underestimated. sufficiently bounded that adequate sample sizes are They prompted the statistician R. A. Fisher to con- enabled? clude—regarding the cancer research—that ‘‘it is more likely that a common cause supplies the explanation’’ The last question suggests that ways to make clinical (Fisher 1958). Subsequent studies have substantially re- validation more efficient should be sought. One way to moved these doubts, establishing the causal link between reduce the cost—or shorten the time—in a clinical trial is smoking and lung cancer without actually resorting to a to use surrogate variables, that is, parameters that are clinical study in which human subjects were forced to observed in place of the relevant clinical outcomes when smoke. Today, most medical scientists accept that cau- it is difficult to make a direct observation of effective- sation can be established from natural experiments by ness. For example, it might take an overly long time to including (Hill 1966): demonstrate clinically that a drug reduces the long-term • An analysis of temporal relationship; risk of heart disease. Therefore, drugs are sometimes • Plausibility based on prior knowledge; and approved if they can be shown to have an effect on • Coherence with other known facts. variables thought to be related to risk of heart disease (e.g. blood cholesterol levels). However, ‘‘there have It is interesting to consider what ‘‘natural experiments’’ been many instances where treatments showing a highly might entail in design methodology. Every time a new positive effect on a proposed surrogate have ultimately method becomes widely adopted by industry, there is an been shown to be detrimental to the subjects’ clinical opportunity for the design research community to learn outcome’’ (U. S. Department of Health and Human about its effectiveness and its side effects. For example, Services - Food and Drug Administration, 1998, Pro- many U.S. companies adopted design methods from viding Clinical Evidence of Effectiveness for Human Japanese companies in the 1980s (e.g., Quality Function Drug and Biological Products, http://www.fda.gov/cder/ Deployment aka, QFD). Many believe that these com- guidance/idex.htm). Surrogate variables can be mis- panies derived benefits in quality, cost, and profitability. leading indicators of clinical effectiveness and no general A careful study of this ‘‘natural experiment’’ might have procedure for evaluating surrogate variables has yet more clearly established the effects of this ‘‘treatment’’ been broadly accepted in the medical community. on the relevant outcomes. This idea will be explored Similarly, surrogate variables may also be somewhat more fully in Sect. 4.3. helpful in validating engineering design methods. For A useful augmentation to natural experiments and example, Olewnik and Lewis’ (2005) proposed criteria clinical trials is meta-analysis: the combination of data for evaluating decision support methods could be viewed from several studies to produce a single estimate. Meta- as surrogate variables. It seems reasonable that a analysis employs statistical, multi-factorial methods in method is more likely to be effective if it is logical, uses which the treatment is one predictor variable and the meaningful, reliable information, and does not bias the study is another predictor variable (Bland 1987). A designer. However, as Olewnik and Lewis (2005) principal difficulty with meta-analysis is so-called pub- acknowledged, these criteria do not ensure effectiveness lication bias. Studies which produce significant differ- in practice. ences are more likely to be published than those which do not (Easterbrook et al. 1991). Also, unfavorable re- 3.3 Natural experiments sults are left unpublished more frequently than favorable results. Thus, in medical research it is advised that all One alternative to a clinical trial, which is a controlled studies be used in any meta-analysis, both published and experiment, is the so-called natural experiment in which unpublished, demanding that researchers seek out the consequences of different treatments are ob- studies through personal knowledge and intensive served—but without controlled administration of the investigation. treatments. For example, in the first major study linking Design methods can also, in principle, be studied by smoking and lung cancer, 1,500 patients who had been meta-analysis. As will be discussed in Sect. 4.2, meta- diagnosed with lung cancer and a similar-sized sample of analysis is used in design research, but more extensive patients not diagnosed with lung cancer were surveyed studies may be possible. Applications of different design about their smoking habits. The result of the study was methods are frequently published and they could be
51 collected and subject to statistical analysis. However, laboratory research, especially decision support tools, just as in medical research, publication bias is a real often degrade performance, rather than improve it concern. Many case studies are developed to illustrate (Klein et al. 2003). the successful use of a design method. This is a reason- able practice as new methods are much more useful to practitioners when accompanied by examples illustrating 3.5 Animal models successful outcomes. Publication bias is a natural con- sequence of the dynamics of the publication process and Animal models are another important tool in medical generally not evidence of deliberate attempts at decep- research and development. An animal model is an tion. Nevertheless, and while difficult to accomplish, organism—often a mouse—selected or developed to adequate safeguards against bias must be made in meta- bear specific physiological similarities to humans with analysis of case studies of engineering design method- specific medical disorders. A variety of animal models ology. In most cases, it is not possible to collect all of the exist for studying disorders of the heart, blood, brain, published and unpublished applications of a design eyes, and so on (Center for Modeling Human Disease, method. Further, if a subset of unpublished cases is http://www.cmhd.ca). The precise form of an appropri- sampled, it would be difficult to avoid bias since engi- ate animal model is strongly determined by the medical neering firms will justifiably be concerned about dis- disorder being modeled. For example, to study attention seminating evidence of unsuccessful outcomes. deficit hyperactivity disorder (ADHD), mice lacking a specific gene encoding a mechanism regulating brain chemistry, a dopamine transporter or DAT, have been 3.4 In vitro experiments developed, thus the mice are called DAT knockout or DAT-KO mice: ‘‘The preponderance of common sym- In many cases, applying an experimental medical treat- ptomatologies between the DAT-KO mice and individ- ment to a human subject is not justifiable, but it may be uals with ADHD suggests that these mice may... serve as possible to apply the treatment to human tissues and a useful animal model and as resource to test new cells outside the human body. This type of experiment is therapies’’ (Gainetdinov et al. 1999). described as in vitro as opposed to in vivo. This ap- The use of animal models entails uncertainty due to proach avoids the risk of harming subjects, and some- the inevitable differences between the animal and the times enables closer observation of the effects and human. For example, of the DAT-KO mice, it is noted mechanisms of the treatment. However, because the that ‘‘despite the similarities between the mutant mice tissues or cells are outside the normal context of their and humans with ADHD-HKD, it is unlikely that their existence, great care must be exercised in drawing phenotypes are completely identical’’ (Gainetdinov et al. inferences about clinical effectiveness. 1999). In this regard, an animal model for medical re- In design theory, human subjects are sometimes used search and development is similar to an engineering as subjects in laboratory experiments (e.g., Chakrabarti model for use in design. The model is intended to rep- et al 2004; Cacciabue and Hollnagel 1995). These are resent another entity, but does not represent that entity analogous to in vitro experiments in medicine in the in all regards. In fact, there are no universally accepted following ways: procedures for validating animal models of human dis- ease because the validity of the models is so closely • The subject of the experiment is removed from the linked to the investigations to be conducted (S.L. Ad- usual context, in this case, the corporation where most amson, personal communication). authentic engineering practice takes place. Simon proposed a way to go about validation of • Cooperation of an entire engineering enterprise is design methods that is similar to the practice of using generally not needed. The human subjects can vol- animal models in medicine (Simon 1996): unteer individually. • Closer observation and control of experimental con- ‘‘. . . there exist today a considerable number of ditions may be possible. examples of actual design processes, of many dif- ferent kinds, that have been defined fully . . . in the Laboratory experiments with human subjects are cur- form of running computer programs . . . Because rently providing insights into engineering design. How- these computer programs describe complex design ever, there is a risk in extending results from laboratory processes in complete, painstaking detail, they are experiments to make inferences about engineering open to full inspection and analysis, or to trial by practice. Macrocognition is a term that has been coined simulation. They constitute a body of empirical to describe the cognitive functions performed in natu- phenomena . . . There is no question, since these ral—versus artificial, laboratory—settings (Klein et al. programs exist, of the design process hiding behind 2003). Real-world settings require such activities as the cloak of ‘‘judgment’’ or ‘‘experience.’’ Whatever problem setting, attention management, planning, and judgment or experience was used in creating the adaptation/re-planning in ways that laboratories can programs must now be incorporated in them and rarely simulate. As a result, tools developed based on hence be observable.’’
52 In effect, Simon is proposing that a computer simulation prescription of psycho-stimulants to ADHD patients of a design process stand in for the actual design process. continued despite the paradox. Meanwhile, animal To be more precise, the argument is that a computer models were (and still are) being used in detailed inves- simulated design scenario to which design methods tigation of the effects of psycho-stimulants on the brain would be applied is a more apt analogy for an animal functions of DAT knockout mice, which may ‘‘provide model as used in medical research. In fact, the animal insights into the basic mechanisms that underlie the model analogy suggests that a full correspondence be- etiology of this and other hyperkinetic disorders’’ tween the simulation and the reality being modeled is (Gainetdinov et al. 1999). These investigations may not required. Animal models such as DAT-KO mice eventually lead to drugs that more specifically target the bear only a few selected similarities to humans with relevant mechanisms to ADHD and therefore are more ADHD. Similarly, simulations used in engineering de- effective and/or have fewer side effects. sign may bear only certain key similarities to real-world The phenomenon of practical uses preceding design and still reveal interesting insights. This practice underlying theory is also common in engineering. The is not uncommon for research in engineering design; founding of thermodynamics as a field was substan- many papers use an approach in which computer sim- tially accelerated by the study of working steam en- ulations of design processes are used as a source of gines. The ability to make workable steam engines epirical knowledge about design (Moss and Cagan preceded understanding of thermodynamics, not the 2004). Such studies do not necessarily require precise other way around as it is frequently assumed. Similarly, models of cognitive processes of engineering designers. many of the most useful innovations in design meth- Schön has proposed another means to test and use odology including robust design, lean manufacturing, methods, a practicum, rather than a computer simula- and QFD emerged from industry practice. Only later tion (Schön 1983): were these practices studied by theorists. In some cases, the innovations were refined based on the theory, but ‘‘. . . a practicum ... is really a virtual world. A virtual just as often, theories had to be extended or even world in the sense that it represents the world of revolutionized to accommodate the understanding of practice, but is not the world of practice. . . ’’ the practice. The interplay of theory and applica- tion—with applications often leading the way—is long Just as Simon proposed the computer simulation as a known, but too often forgotten. Design researchers model of real design, Schön has proposed another kind should certainly keep in mind the approach of medical of entity a bit closer to professional design practice. A researchers. If theoretical investigations of a design key difference in Schön’s practicum is that an actual method seem to conflict with field reports about that person has to carry out the design. Therefore a practi- method, then the design methodology may still be valid cum can assess a design method and the degree to which and perhaps the theory should be extended to include a it fits human cognitive and psychological attributes. richer variety of considerations. Using practica (or something very similar) is common in research in engineering design; for example, researchers often use classroom settings to evaluate design methods 4 Implications of the analogy (e.g., Wilkening and Sobek 2004; Reich et al. 2006). This section considers validation in three different areas of design theory and methodology—robust design, axi- 3.6 Theory omatic design, and decision-based design—in order to explore more thoroughly the medical treatment–design The underlying theories of medical science such as method analogy. chemistry and biology play a pivotal role in the devel- opment of new treatments. Because basic research cat- 4.1 Robust design alyzes new innovations, the developers of medical treatments pay careful attention to research and even Robust design is a set of design methods in which make substantial investments in privately funded basic products and processes are made less sensitive to man- research. However, in validation of medical treatments, ufacturing variations, customer use conditions, and theory plays a surprisingly small role. Drugs are often degradation over time. The techniques were first pio- administered widely without full understanding of their neered by Taguchi who, through his extensive work with underlying mechanisms—as long as their effectiveness Japanese industry, developed a methodology (Taguchi can be established clinically. For example, it is known 1987; Phadke 1989). Later, researchers and practitio- that psycho-stimulants have a calming effect on humans ners—especially in statistics—developed new ap- with ADHD. This effect appeared so logically inexpli- proaches intended to be more tightly linked to cable that medical researchers referred to these calming mathematical theory, and especially to design of experi- effects as paradoxical: seemingly self-contradictory, yet ments (DOE). nonetheless true (Gainetdinov et al 1999). Because the Design of experiments is a body of knowledge and value of the treatments was established clinically, the techniques for planning a set of experiments, analyzing
53 the resulting data, and drawing conclusions from the resulted in a standard deviation of the response sub- analysis (Wu and Hamada 2000; Box et al. 1978). DOE stantially smaller than that provided by a single-array in general—and fractional factorial design in particu- design, although both methods were much better than a lar—have been key theoretical foundations for devel- ‘‘placebo’’ (selecting control factor settings at random) opment of robust design methods. Both Taguchi (Frey and Li 2004). These model-based results tend to methods and newer techniques use fractional factorial corroborate Kunert’s ‘‘clinical’’ study showing that the experiments, but in different ways. Recent theoretical single instance in the field was probably typical of the developments have led statisticians to discourage the use population of realistic scenarios in the field. of crossed arrays and encourage the use of a single array The results in this section suggest that it is a mistake including both control and noise factors (Wu and Ha- to infer too much about robust design methods based mada 2000; Borror and Montgomery 2000). The justi- solely on mathematical theory. A similar conclusion has fication is that ‘‘. . . some of the single arrays . . . are been voiced by Box and Liu (1999). In recent decades, uniformly better than the cross arrays in terms of the the DOE research community has emphasized the number of clear main effects and two-factor interac- development of mathematically optimal experimental tions’’ (Wu and Hamada 2000). designs which tend to be ‘‘one-shot’’ procedures. Such In robust design, the ‘‘clinical endpoint’’—what the designs are optimal within a formal axiomatic frame- engineering organization wants—is improved system work, but Box has argued that, in practice, they performance in the presence of noise. Single-array de- undermine the experimenter’s need to alternate between signs have been justified on the basis of the number of forming hypotheses and conducting experiments (Box clear effects and of control by nose interactions. One and Liu 1999). Box further argued that this resulted in may say that these properties are being used as surrogate less improvement of systems than would have been variables, and that the clinical endpoints should be achieved by iterative procedures (even though they are evaluated in a clinical trial. Such a trial was recently partly heuristic). Box’s findings, if accepted, provide an conducted by Kunert et al. (2004). They used crossed- instance of overemphasis on mathematical theory lead- array designs and single-array designs to improve the ing to ineffective professional practice due to neglect of consistency of a sheet metal spinning process. In effect, relevant human factors. this was a paired comparison experiment where the experimental ‘‘treatment’’ was the design method em- ployed. The result of the experiment was that the cros- 4.2 Axiomatic design sed-array method led to process settings with a more consistent profile of the sheet metal parts when com- Axiomatic design posits two ‘‘axioms’’: the indepen- pared with the single-array method. dence axiom and the information axiom (Suh 1990). The The ‘‘clinical trial’’ by Kunert et al. provides some full theory of axiomatic design, including theorems and objective evidence validating the crossed-array method corollaries, is based on theorems whose validity, it is and disconfirming the single-array approach to robust said, relies on the validity of the axioms. Thus, axiomatic design. However, this evidence is not conclusive. First of design theory employs foundationalist epistemology. As all, the experiment included only one replication of the long as the axioms are valid and the other elements of paired comparison. If the exact same experiment were the theory are deduced from the axioms, it is proposed repeated, the single array might have beaten the crossed that the theory is sound and useful for professional array simply due to random variations. Further, if the practitioners. In particular, it is argued that the theory is same two methods were applied to some other engi- useful in the design of large scale systems along almost neering system, say a lithography process rather than a all dimensions of their design and operation. That is, sheet metal spinning process, the single array may have axiomatic design claims to improve design, construction, prevailed over the crossed array. A paired comparison in operation, modification, maintenance, documentation, a single instance of application does not provide as much and diagnosis of system failures (Suh 1998). information as one would like to validate a design Suh (1990) defines axioms as ‘‘truths that that cannot method. be derived or proved to be true except that there are no Now, let us consider how validation methods analo- counter-examples or exceptions’’. This suggests that the gous to those in medicine can be applied in robust de- means to validate the axioms is to look for counter- sign. As an example of meta-analysis, Li and Frey (2005) examples. However, Suh’s axioms are in the imperative conducted a study of experiments carried out in many form. The independence axiom is ‘‘maintain the inde- fields of engineering to ascertain the degree that system pendence of functional requirements.’’ It is not clear regularities such as effect sparsity, hierarchy, and what a counter-example to an imperative might entail; inheritance were evident in the resulting data. Frey and the axiom is not testable—as defined by Schön (1983) Li (2004) then conducted a model-based investigation of and Argyris (1991) and described in Sect. 2.1—because it crossed-array and single-array designs using a computer does not ‘‘specify the situation, the desired result, and simulation of robust design on a large number of sim- the action through which the result is to be achieved.’’ ulated responses drawn from a third order polynomial The clinical trial part of the medical treatment–design model of the system’s response. A crossed-array design method analogy suggests one testable hypothesis about
54 axiomatic design. If training in axiomatic design is from medical research and development may help avoid viewed as a treatment, the hypothesis might be put forth such difficulties. Data should be collected from ongoing that providing training in axiomatic design to a group of natural experiments and laboratory experiments should engineers will enable them to produce designs having a be used to further investigate the implications of the higher ‘‘probability of success’’ than a group that re- theory. ceives training in another method (a comparator). By devising a study to test this hypothesis, axiomatic design might be objectively assessed. It might also enable the 4.3 Making design decisions assessment of how well axiomatic design theory can be learned and implemented in practice. Such an investi- Designers frequently find themselves having to rank gation would be very costly. objectives against one another, choose among alterna- The natural experiment part of the medical treat- tive means for achieving functions, select from alterna- ment–design method analogy suggests another ap- tive design options, and generally make a wide variety of proach. A software product, Acclaro, has been design decisions (Dym and Little 2004). Many of the developed to support axiomatic design and is currently tools used to help designers make such decisions have being marketed to engineering companies (Axiomatic become widely used, as well as often criticized. For Design Solutions Inc. website http://www.axiomaticde- example, QFD is a method frequently used in the early sign.com/). In effect, a natural experiment is currently phases of product design (Hauser and Clausing 1988). underway. The company that sells the software lists QFD was first developed at the Kobe ship yard in Japan, among its clients Ford Motor Company, Lockheed then adopted by major Japanese companies and subse- Martin, Hewlett Packard, Saab Rosemount, and others. quently by a broad variety of companies in North Also listed is ASML, which acquired Silicon Valley America and Europe. One primary goal of QFD is to Group Incorporated—a company that was among the help companies align their designs with the ‘‘voice of the first to use the software. A study of these companies customer.’’ One claim often made about QFD is that the might reveal how this treatment has affected various method encourages teamwork among engineers and measures of their performance, such as product quality marketing professionals by providing immersion in the ratings, time to market, profitability, market share, and specifications prior to design efforts and by setting so on. Such a study could not provide a definitive appropriate technical targets for mature products (Dym evaluation of the theory, but is essential to forming a and Little 2004). There is some empirical evidence that complete assessment. suggests that QFD has not been correlated in the short The in vitro experiment part of the medical treat- term with quality improvements, but that users never- ment–design method analogy suggests still another theless feel it has longer term benefits (Griffin 1989). means of evaluating Axiomatic Design. A simulation of Meanwhile, QFD and similar decision-making tools a design process could be created to test a hypothesis such as the pairwise comparison chart (PCC) and the such as, ‘‘if a system design does not maintain the analytical hierarchy process (AHP) have been taken to independence of functional requirements, then the set of task by some mathematicians and design theorists. Saari requirements cannot be satisfied.’’ In fact, just such an (2001) has investigated the mathematics of different investigation was carried out. Human subjects were forms of voting and of related decision support proce- asked to satisfy a set of n functional requirements using dures. These investigations have brought to light unde- a set of n design parameters for both uncoupled systems sirable properties the methods exhibit under certain and coupled systems of modest size (from n = 2 to circumstances. Hazelrigg (1998, 1999) has argued that n = 5) (Hirschi and Frey 2002). The experiment showed all of these methods are simply wrong because they that human subjects could succeed in simultaneously violate tenets of decision theory. One specific conclusion satisfying all of the requirements of both uncoupled and drawn by Hazelrigg is that industry should stop using coupled systems, but that the task completion time QFD. However, the framework for validation used in scaled linearly for uncoupled systems and geometrically medicine suggests that QFD should not be abandoned for coupled systems. This investigation does not support so easily because the existing field data show some po- the idea that coupling must be avoided, but it does sitive effects of QFD (Kuppuraju et al. 1985). Also, it is provide empirical support for the idea that coupling has not obvious that decision theory is an adequate model some negative implications in system design. for many forms of design decision making (Dym et al. The results in this section suggest that an axiomatic 2002). Rejecting QFD because of Arrow’s theorem in approach to design theory development will, in some decision theory is akin to telling patients to reject med- instances, lead to logical or practical difficulties. If axi- icines whose effects cannot be explained using current oms are stated regarding design, they may not be test- knowledge of chemistry and biology. No medical treat- able unless they are stated as empirical hypotheses. As ment can work except by a means that may ultimately be theorems are derived from the axioms, one must ensure subject to scientific explanation. However, in practice, not only consistency with the axioms, but also consis- many effective medical treatments are discovered before tency with every fact of reality that bears on the theo- their relevant mechanisms are understood. Under cur- rem, including human cognitive capabilities. Practices rently accepted medical practices in the developed world,
55 it is acceptable to market a treatment without a full biology, in vivo experiments with animal models, clinical understanding of its underlying mechanisms as long as trials with human subjects, and natural experiments. the treatment’s safety and effectiveness can be clinically Model-based validation is one of the techniques we established. Design researchers might conclude, at least most strongly encourage design researchers to adopt. In tentatively and cautiously, that QFD and similar tools medicine, tremendous investments are made in creating are currently the subject of field evaluation. At the same animals that reasonably model aspects of human dis- time, design researchers, in collaboration with design ease. Similar efforts might be made to create models of practitioners, should continue to investigate the mech- engineering design processes for the explicit purpose of anisms that provide perceptible benefits and simulta- evaluating design methodologies. As discussed in Sect. neously use theory to seek improvements. Reports from 4.1, some small scale examples have been carried out in industry practice suggest that the long term value of robust design, but much more may be possible with QFD derives from factors such as cross-functional concerted effort. coordination and information flow (Griffin 1989). The Another practice in medicine worthy of consideration medical analogy suggests that models should be devel- is the careful collection and analysis of field data as new oped that adequately represent these mechanisms, and treatments are introduced. Similar natural experiments then these models should be used to test and refine our in engineering design are ongoing every time a new decision making methods. method is adopted widely in the field. It is essential that Although the medical analogy seems to suggest QFD design researchers work to collect data and include may continue to be used, the analogy also raises an analysis of such data in evaluations of methods such as important issue concerning labeling. Over-the-counter QFD, Pugh controlled convergence, Taguchi methods, medications must carry carefully defined instructions for and Axiomatic Design. All of these methods have seen their use. Without such labels, medicines would be less substantial use in the authentic context of industry useful and more dangerous. Such labeling has been practice. At this point, any broad statements about ei- suggested for QFD through research in engineering de- ther their validity or inherent flaws must be interpreted sign. Dym et al. (2002) suggest that QFD and related in light of what has actually transpired. methods can be used cautiously at early stages of design, In addition to these concrete suggestions for design but should not be used as the sole means to establish a research practice, a more philosophical point may also single numerical rating that can then be used to choose a be made regarding the role of foundations in our field. single alternative. Further, specific cautions are made In both medicine and in engineering design it has fre- against over-interpretation of numerical differences be- quently been observed that as the many layers of evi- cause so much of design knowledge is not amenable to dence are traversed, surprising discoveries are made. straightforward mathematical modeling (Dym and Little Developments based on theory alone may prove to be 2004; Dym et al. 2002). This paper suggests that ineffective in practice. Despite this fact, the field of de- researchers in engineering design should pay close sign research has, at times, taken an exaggerated stand attention to such instructions since they are an essential regarding theoretical foundations—that as long as con- part of the ‘‘treatments’’ being developed. sistency with first principles can be maintained, then good methodology is likely to result. Leonard Savage, in developing an axiomatic basis for statistics and decision making, provided a caution against this approach and 5 Closing remarks suggested a more balanced view of the interactions be- tween foundations and professional practice: The principal argument of this paper is that many of the highly-developed validation techniques found in medi- ‘‘It is often argued academically that no science can cine can profitably be used in engineering design re- be more secure than its foundations, and that, if there search. Validation of design methods is an important is controversy about the foundations, there must be topic for engineering design theory and practice. Every even more controversy about the higher parts of the engineer that designs products or processes uses some science. As a matter of fact, the foundations are the design method or composite of methods. The design most controversial part of many, if not all, sciences... methods now in use have an impact on all aspects of As in other sciences, controversies about the foun- success, including quality, performance, cost, and time dations of statistics reflect themselves to some extent to market. Therefore, it seems reasonable to seek vali- in everyday practice, but not nearly so catastrophi- dation procedures for design methods that are as good cally as one might imagine. I believe that here, as as the ones applied in developing medical treatments. elsewhere, catastrophe is avoided, primarily because Inasmuch as medical treatment is so widely viewed as in practical situations common sense generally saves important, the associated validation procedures are well- all but the most pedantic of us from flagrant error... documented, objective, and evidence-based. New treat- Although study of the foundations of a science does ments must provide proof of their effectiveness before not have the role that would be assigned to it by naı̈ve they can be deployed widely. This proof is built up of first-things-firstism, it certainly has a continuing many layers, including basic theory in chemistry and importance as the science develops, influencing, and
You can also read