Validation of design methods: lessons from medicine

Page created by Joan Richards

Health & Fitness

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Research in Engineering Design (2006) 17: 45–57
DOI 10.1007/s00163-006-0016-4

 O R I GI N A L P A P E R

Daniel D. Frey Æ Clive L. Dym

Validation of design methods: lessons from medicine

Received: 17 March 2005 / Revised: 15 January 2006 / Accepted: 10 April 2006 / Published online: 25 May 2006
 Springer-Verlag London Limited 2006

Abstract This paper discusses the validation of design            airplane manufacturers and in their government coun-
methods. The challenges and opportunities in validation           terparts (e.g., NASA). Early-stage design decisions are
are illustrated by drawing an analogy to medical re-              characterized by uncertainty and ambiguity and, espe-
search and development. Speciﬁc validation practices              cially in large organizations, later design decisions are
such as clinical studies and use of models of human               made through processes involving teams with a variety
disease are discussed, including speciﬁc ways to adapt            of experiences, skill, and information. In such circum-
them to engineering design. The implications are ex-              stances, it can be challenging to establish the beneﬁts of
plored for three active areas of design research: robust          new methods.
design, axiomatic design, and design decision making. It              This paper explores the topic of validation through
is argued that medical research and development has               an analogy with medical research and development. The
highly-developed, well-documented validation methods              principal audience is design researchers although it is
and that many speciﬁc practices such as natural experi-           hoped that policy makers and industry practitioners may
ments and model-based evaluations can proﬁtably be                also ﬁnd the discussion useful. Section 2 reviews the
adapted for use in engineering design research.                   literature on validation in the context of design meth-
                                                                  odology. Section 3 introduces an analogy between design
Keywords Validation Æ Design methodology Æ Robust                 methods and medical treatments. Section 4 considers this
design Æ Design decisions                                         analogy in the context of three active research topics in
                                                                  engineering design, and some concluding remarks are
                                                                  presented in Sect. 5.
1 Introduction

The validation of design methods is important for the
continuing advancement of both design theory and the              2 Literature review
professional practice of engineering. Researchers in de-
sign theory require validation processes to guide the             2.1 Validation of design knowledge
development and evaluation of new methods. Profes-
sional practitioners need validation processes to deter-          This paper is concerned with the validation of claims to
mine which methods to employ, as well as when and                 knowledge in and about engineering design. This subject
how to employ them. The latter is especially important            can be viewed as a specialized topic within epistemol-
in large, complex organizations such as automobile and            ogy—the branch of philosophy concerned with the
                                                                  nature of knowledge, the justiﬁcation of knowledge, and
                                                                  the nature of rationality. There are three prominent
D. D. Frey (&)                                                    contemporary views of the justiﬁcation of knowledge
Department of Mechanical Engineering and Engineering
Systems Division, Massachusetts Institute of Technology,          claims (Audi 1995):
77 Massachusetts Avenue, Cambridge, MA 02139, USA
E-mail: danfrey@mit.edu
                                                                  • Foundationalism holds that some instances of
Tel.: +1-617-3246133                                                knowledge are basic and that the remaining instances
                                                                    are justiﬁed by relating them to basic beliefs (e.g., by
C. L. Dym                                                           deduction from axioms);
Department of Engineering, Harvey Mudd College,                   • Relativism argues that knowledge cannot be validated
301 Platt Boulevard, Claremont, CA 91711–5990, USA
E-mail: clive_dym@hmc.edu                                           in an objective way and that individual, subjective
Tel.: +1-909-6218853                                                preferences and rules of fraternal behavior among

scientists must be considered a part of validation behaviors. This concept is sometimes called ecological
processes; and rationality because it implies that knowledge—as in-
• Naturalistic epistemology promotes empirical study of ferred from behavior—should be judged by a fit with the
how subjects convert sensory data into theories. environment (Todd and Gigerenzer 2003). Simon’s view
of rationality is generally aligned with naturalistic epis-
Schön suggested that engineering design and related temology since it has a basis in empirical studies of
professions such as architecture and management have agents.
created a demand for an epistemology of practice (Schön
1983). Schön conducted field studies of engineers and
other professionals and noted that skilled practitioners 2.2 Validation of methods
frequently rely on tacit knowledge that cannot easily be
codified. Dym (1994a, b) has made similar arguments Engineers frequently seek to evaluate a design method or
about the difficulty of representing and articulating de- software tool for a specified use or range of uses. The
sign knowledge. Argyris called the set of tacit knowledge Institute of Electrical and Electronics Engineers (IEEE
driving one’s professional work a theory-in-action, 1998) defines validation as the ‘‘confirmation by exami-
which, he noted, often failed to match the espoused nation and provision of objective evidence that the par-
theory of the discipline or practice (Argyris 1991). Schön ticular requirements for on intended use are fulfilled.’’ As
and Argyris (1975) proposed a framework for evaluating will be discussed in Sect. 3, this paper will seek to build
theories related to professional practice that included upon the IEEE definition showing it is closely related to
checks on: (1) internal consistency, (2) congruence with that of the Food and Drug Administration.
the espoused theory, (3) testability of the theory, and, Olewnik and Lewis (2005) propose an alternative
ultimately, (4) effectiveness of the theory. They further definition of validation (applicable to decision support
argue that a theory-in-action is: methods and design methods) wherein, for a method to
be valid, it must:
• Testable if one can ‘‘specify the situation, the desired
result, and the action through which the result is to be 1. Be logical;
achieved,’’ and 2. Use meaningful reliable information; and
• Effective when ‘‘action according to the theory tends 3. Not bias the designer.
to achieve its governing variables.’’
One desirable property of this definition is that it
Pedersen et al. (2000) proposed a similar framework in reveals the ways that some design methods impose
which design theories are subjected to a validation preferences on the designer. However, a potential
square of four quadrants, each representing one of the drawback of Olewnik and Lewis’ definition is that
following design dimensions: invalidity ‘‘does not imply that the methods are inef-
fective.’’ By contrast, the IEEE definition emphasizes a
1. Theoretical structural validity.
link between validation and an assurance of effectiveness
2. Empirical structural validity.
for its specific intended uses.
3. Empirical performance validity, and
Todd and Gigerenzer (2003) have proposed still a
4. Theoretical performance validity.
different way of validating decision methods (generally
aligned with naturalistic epistemology) comprised of the
Thus, Pedersen et al. (2000) and Argyris (1991) and following steps:
Schön (1983) all suggest a balanced approach that in-
cludes the evaluation of internal consistency and effec- 1. Proposing computational models of candidate
tiveness. One key difference is that the framework of methods that are realistically based on human com-
Pedersen et al. is more nearly aligned with relativist petences, and testing whether they work via simula-
epistemology: they define scientific knowledge within the tion;
field of design as socially justifiable belief. The validation 2. Mathematically analyzing when and how the meth-
square does balance this relativistic definition of ods work with particular environmental structures;
knowledge with use of empirically-based notions of and
validity. An objective of the present paper is to explore a 3. Experimentally testing when people use these methods.
more objectivist foundation for validation of design
methods. This approach to validation was applied to many
A distinctly different framework was suggested by decision scenarios resulting in the conclusion that ‘‘there
Simon, who proposed that ‘‘human rational behavior is is a point where increasing information and information
shaped by a scissors whose two blades are the structure processing can actually do harm.’’ One specific example
of task environments and the computational capabilities is that a ‘‘Take The Best’’ heuristic equals or outper-
of the actor’’ (Simon 1990). Simon emphasized the fit forms any linear decision strategy because decision cues
between problem-solving behaviors and the problem are frequently non-compensatory, that is, the potential
environment, rather than the internal consistency of the contribution of each new cue falls off rapidly so that

47

combinations of later cues cannot outweigh earlier ones       2.4 Validation of design research methodology
(Todd and Gigerenzer 2003). This empirically observed
eﬀect, sometimes called the ‘‘less is more’’ eﬀect, is        The engineering design research community also has a
particularly of interest in light of Olewink and Lewis’       need to evaluate research programs and their results.
proposed criteria of validity (Olewnik and Lewis 2005).       Therefore critical analysis of research methodology has
Todd and Gigerenzer (2003) have provided well-docu-           periodically been pursued and has been a healthy part of
mented examples in which inclusion of valid information       our community’s development. Reich (1994) analyzed
into a decision process causes worse decision outcomes        the status of research methodology in artiﬁcial intelli-
rather than better ones.                                      gence for engineering design. His conclusion was that
                                                              the status is ‘‘poor’’ partly because ‘‘it makes claims far
                                                              beyond what other disciplines dream of making’’ and
2.3 Validation of models                                      that ‘‘the need for reﬂection is not taken seriously in
                                                              AI.’’ As an antidote to these ills, Reich proposed a
Engineers frequently seek to evaluate and apply a model       layered model of research methodology. The ﬁrst layer
for a speciﬁed use or a range of uses. The American           diﬀerentiates research methods according to their
Institute of Aeronautics and Astronautics (AIAA 1998)         metaphysical stance. The two options in this layer were
deﬁnes model validation as ‘‘the process of determining       scientism which is based on an objectivist epistemology
the degree to which a model is an accurate representa-        and practicism which is based on a subjectivist episte-
tion of the real world from the perspective of the in-        mology, tempered to avoid misuse. The second layer
tended uses of the model.’’ Much work has been done to        concerns research heuristics that characterize diﬀerent
practically implement a system of model validation            communities, such as cognitive science, decision science,
consistent with the AIAA deﬁnition. For example,              and software engineering. The third layer concerns the
Hasselman has proposed a technique for evaluating             most speciﬁc issues of hypothesis evaluation including
models based on bodies of empirical data (Hasselman           statistical testing and assessment of parsimony. The
2001; Hasselman et al. 1998).                                 goals of the present paper are quite similar to those of
    Hazelrigg (2003) proposed an alternative deﬁnition of     Reich (1994) but the position taken here and to be
model validity from the perspective of decision the-          developed subsequently is substantially diﬀerent. Reich
ory—a model is valid to the extent that it supports the       says of the scientist/objectivist view ‘‘we have witnessed
conclusion that ‘‘design point O will produce an out-         its demise in philosophy.’’ Our view is that it is pre-
come that is preferred to the outcome that would be           mature to dismiss an objective basis for evaluating de-
produced by design point C with essentially probability       sign research, methods, or models. At the same time, the
1.’’ This deﬁnition of validity prizes resolution over        concept of layered evaluations and many of the speciﬁc
accuracy. One desirable property of Hazelrigg’s deﬁni-        suggestions Reich makes are strongly endorsed, espe-
tion is that a relatively inaccurate model may be viewed      cially ‘‘if the purpose is improving practice, then a study
as valid for making choices among alternatives when one       that illustrates such improvement should be furnished’’
of the alternatives is vastly preferred to the others. Ha-    (Reich 1994).
zelrigg’s view of model validation is, in some ways,
foundationalist, since it seeks to maintain traceability to
basic knowledge, in this case to classical decision theory.   2.5 Needs analysis
However, the framework also embraces some elements
of subjectivity since, according to Hazelrigg, ‘‘a model is   A review of the literature reveals a substantial range of
valid when, in the mind of the decision maker, it’s up to     opinion regarding validation in design theory. On the
the task.’’ The AIAA deﬁnition, by contrast, emphasizes       other hand, engineering’s professional societies have
an objective correspondence of a model with data.             been fairly consistent in other matters in seeking vali-
    McAdams and Dym (2004) have also discussed the            dation based on the provision of objective evidence.
validation of models in engineering design, drawing on        However, these same professional societies have not yet
analogies and procedures used in mathematical model-          explicitly applied these concepts to design methods and
ing (Reich 1994), while also observing that ‘‘design          theories. If the engineering profession does choose to
models operate on information to produce informa-             extend an objective concept of validation to design
tion.’’ This, of course, leads one to ask, ‘‘What is          methods and tools, it will need a supporting set of
information?’’, a question to which many answers have         practices and standards for the provision of evidence.
been oﬀered (e.g., see McAdams and Dym 2004). Ha-             Questions such as the following will have to be an-
zelrigg deﬁned information as ‘‘what [a] decision is made     swered:
on,’’ and noted that it can be quantiﬁed in terms of
probabilistic outcomes of speciﬁc decisions. But, as          • Can theoretical arguments alone serve to validate a
McAdams and Dym noted, there are many aspects of                design method or theory, or must they ultimately be
design (e.g., concept generation and synthesis) that are        validated experimentally?
poorly modeled when modeled only as decisions                 • What kinds of experiments provide valid evidence of
(McAdams and Dym 2004).                                         eﬀectiveness for an intended use?

• Can design method validation be made economically in the labeling or proposed labeling thereof’’ (U. S.
viable by improvements in speed and efficiency? Federal Food, Drug, and Cosmetic Act, Chapter 9.V,
Sec. 355(d), http://www.access.gpo.gov/uscode/title21/
In connection with these questions, it is worth noting the chapter9_.html).
recent study by Dorst and Vermass (2005), which ana- The primary goal of design research is to develop
lyzes Gero’s function–behavior–structure model. Dorst design methods to be learned and used by designers to
and Vermass (2005) point out that many difficulties arise create engineering artifacts, usually within commercial
when one tries to match empirical (or experimental) re- engineering enterprises. The purpose of a design method
sults against models and theories, including the lack of is to achieve specific design outcomes (e.g., improved
clear criteria for matching the ‘‘quality’’ of models with profitability or better product quality and reliability).
‘‘validating’’ empirical data, and indeed, whether models The methods are developed by academic researchers,
can be defined or specified sufficiently to be observed in industry practitioners, or by both. Engineering compa-
the laboratory (or in practice.) nies look to a variety of professionals (e.g., engineers.
The principal goal of this paper is to provide some statisticians, and managers) for advice about design
practical answers to these (and other) questions by methods and for help in implementing them. Although
drawing on the experience of another profession, medi- lacking the force of law, the IEEE definition of valida-
cine, which has faced similar questions and answered tion entails ‘‘confirmation by examination and provision
them with a substantial degree of success. In subsequent of objective evidence that the particular requirements for
sections, it is argued that medical research and devel- an intended use are fulfilled’’ (Institute of Electrical and
opment has highly-developed, well-documented valida- Electronics Engineers 1998).
tion methods and that many specific practices such as Table 1 summarizes the analogy just outlined, based
natural experiments and model-based evaluations can on the fact that both areas include professionals seeking
profitably be adapted for use in engineering design re- outcomes by applying treatments or methodologies that
search. might require validation. An important caution arises
from inspection of Table 1. If medical treatments are to
be compared with design methods, the most salient
3 The medical treatment–design method analogy differences between the two must be openly acknowl-
edged. Design methods almost invariably require sub-
3.1 Introducing the analogy stantial interpretation and judgment in their application.
This is true to a lesser extent with medical treatments.
The professional community involved in medical re- Many drugs, for example, are either administered
search and development has developed a set of well- according to the suggested dosage or not at all, leaving
documented validation processes. The processes used in little room for interpretation of what administration of
medicine are far from perfect and many significant the treatment entails. Surgical procedures, on the other
mistakes are still made, both by releasing unsafe or hand, may vary substantially depending on the judg-
ineffective drugs or by withholding effective treatments ment of the practitioner implementing them. Design
too long. Still the practices applied in medicine possess methods, it must be admitted, lie at an extreme of the
many positive attributes from which much can be spectrum: the application of design methods depends
learned. In an effort to draw out those lessons, this
section introduces an extensive analogy between medical Table 1 An analogy between medical research and development
research and development, on the one hand, and design and design theory and methodology
theory and methodology, on the other, here called the
Medical research Design theory and
medical treatment–design method analogy. and development methodology
The primary goal of medical research and develop-
ment is to develop treatments to be administered to hu- What is validated Medical treatments Design methods
man patients. The purpose of the treatment is to achieve Entity affected Human patient Engineering
clinical outcomes related to improved health (e.g., low- organization
Outcomes Health, side Quality, time
ering blood pressure). The treatments are developed evaluated effects, etc. to market,
through academic research, or at pharmaceutical com- profitability, etc.
panies, or both. Patients seek advice about treatments Developers Academic Academic researchers,
from medical professionals and often gain access to such researchers, industry practitioners,
pharmaceutical consultants, etc.
treatments exclusively through them. Before a claim can companies, etc.
be made about a treatment’s effects, the 1962 amendment Professions Medical doctors, Engineers,
of the Food, Drug, and Cosmetics (FDC) Act requires involved nurses, statisticians,
provision of ‘‘evidence consisting of adequate and well technicians managers
controlled investigations . . . that the drug will have the Standards for Food, Drug, IEEE definition
validation and Cosmetics of validation,
effect it purports or is represented to have under the Act, and so on and so on
conditions of use prescribed, recommended, or suggested

49

strongly on judgment of the designer who applies them,            must be acknowledged, the subjects in most clinical tri-
and their eﬀectiveness is almost assuredly compromised            als do not get to choose their treatment(s). The health of
by poor implementation.                                           the subjects in the clinical trial is aﬀected by a multitude
   While the association among the entities compared is           of factors that may not be controlled and/or may not be
not perfect, it is worth continuing the eﬀort to extend the       monitored. For this reason, the data from clinical trials
medical treatment–design method analogy to compare                is subject to careful statistical analysis (Department of
validation of medical treatments and validation of de-            Health and Human Services—Food and Drug Admin-
sign methods. Validation requires that evidence be                istration 1998b), and adequate sample sizes are required
provided. The types of evidence provided in medical               to reach conclusions with a satisfactory degree of con-
research and development are rich and varied. Table 2             ﬁdence. It should be noted that the requirement for
(below) provides a partial list of the types of evidence in       clinical trials does have negative consequences, such as
used in medical research. The list is roughly structured          when treatments are withheld from patients that might
downward from the most comprehensive evi-                         otherwise beneﬁt from them. The point is not that the
dence—clinical trials—through varying layers of sup-              methods are perfect nor that they are perfectly executed,
porting evidence, each requiring more assumptions and             only that they are highly-developed and that much can
abstractions than the one above. All of the layers of             be learned from them.
evidence are useful, but as will be discussed, there is a             Now, to explore the medical treatment–design
hierarchy to the levels of evidence. The next ﬁve sub-            method analogy for validation, a ‘‘clinical trial’’ might
sections discuss each layer in detail and ﬂesh out the            proceed as follows. Design problems would be identiﬁed,
analogy with design theory.                                       perhaps as benchmark design problems, and then allo-
                                                                  cated to diﬀerent design methods or tools, including the
                                                                  speciﬁc design method to be studied and a comparable
3.2 Clinical trials                                               tool. (It is hard to identify a design equivalent of a
                                                                  placebo here, although perhaps there is a design equiv-
A clinical trial is the ultimate standard for validating the      alent of the medical admonition to do no harm that
eﬀectiveness of medical treatments. The U.S. Food and             could be brought into play). The design methods are
Drug Administration (FDA) has developed detailed                  applied, the results are reviewed, and the design out-
guidance for industry on the conduct of such trials               comes are recorded. To avoid bias, the application of the
(Department of Health and Human Services—Food and                 design methods ought to be done in some blinded
Drug Administration 1996, 1998a). Human subjects are              fashion, and while the ‘‘patient’’ or the design problem
identiﬁed and allocated to diﬀerent medical treatments            can be oblivious to the ‘‘treatment’’ or the design
including the speciﬁc treatment to be studied and a               method being applied, the ‘‘physician-researcher’’ or
comparator (sometimes a placebo). The medical treat-              designer would have to know which method she was
ments are administered, the eﬀects are monitored, and             applying, else she could not apply it! (On the other hand,
the outcomes are recorded. To avoid bias, the adminis-            whereas subjects in most clinical trials do not get to
tration of the treatments is usually blinded, often double        choose their treatment, designers—and analysts—often
blinded so that neither the patient nor the physician-            recognize that some problems ‘‘beg for’’ certain solu-
researcher knows which treatments are given to which              tions or treatments.) The outcomes or solutions to the
subjects. Further, even though patients may volunteer             design problem in the clinical trial may be aﬀected by
for or asked to be included in clinical trials, an                other factors that cannot be readily controlled, although
increasingly common experience in cancer treatments               it is more conceivable that problems and their solution
wherein there are other ethical issues whose presence             methods can here be more readily isolated. Nevertheless,
                                                                  adequate sample sizes would still be required to reach
                                                                  conclusions with a satisfactory degree of conﬁdence.
Table 2 Types of evidence used to develop and validate medical        Thus, a clinical trial for design methods could be an
treatments and design methods                                     experiment in which diﬀerent methods are allocated to
Evidence used to                  Evidence used to                organizations or to speciﬁc tasks therein. The resultant
develop/validate                  develop/validate                eﬀects on quality, proﬁtability, or time to market might
medical treatments                design methods                  be monitored and analyzed statistically. Such studies are
                                                                  sometimes conducted and a recent example (Kunert
Clinical trials                   Controlled ﬁeld evaluation      2004) will be discussed in Sect. 4.1. However, as fore-
                                   of design methods
Natural experiments               Studies of industry practice    shadowed just above, analogies of clinical studies for
In vitro experiments              Laboratory experiments          engineering design raise several questions, including:
                                   with human subjects
Animal models                     Detailed simulations            • How can appropriate (benchmark) design problems
                                   of design methods                be identiﬁed, articulated and represented. Moreover,
Theory (organic chemistry,        Theory (probability, decision     who should do that?
genetics, molecular biology,       science, cognitive science,
cellular biology, etc.)            organizational behavior)
                                                                  • Since designers—unlike their physician-researcher
                                                                    counterparts—must be familiar with the design

method they are implementing, does it matter that the that ‘‘smoking is a factor, and an important factor, in
process of blinding is likely impossible in this context? the production of carcinoma of the lung’’ (Doll and Hill
If it does matter, how much? 1950). There was subsequent debate over the validity of
• Since many engineering firms are unlikely to accept the study: since it was a natural—rather than con-
and implement a design method not of their own trolled—experiment, correlation had been statistically
choosing, does it matter if the allocation of treatments established, but causation was less certain. The choice to
or design methods cannot be randomized? If it does smoke or not was made by the subjects, and those who
matter, how much? choose to smoke are not an unbiased, random sample of
• Can the costs of doing such research—over many the population at large. The influence of such consid-
companies, designers, methods and products—be kept erations on inference should not be underestimated.
sufficiently bounded that adequate sample sizes are They prompted the statistician R. A. Fisher to con-
enabled? clude—regarding the cancer research—that ‘‘it is more
likely that a common cause supplies the explanation’’
The last question suggests that ways to make clinical (Fisher 1958). Subsequent studies have substantially re-
validation more efficient should be sought. One way to moved these doubts, establishing the causal link between
reduce the cost—or shorten the time—in a clinical trial is smoking and lung cancer without actually resorting to a
to use surrogate variables, that is, parameters that are clinical study in which human subjects were forced to
observed in place of the relevant clinical outcomes when smoke. Today, most medical scientists accept that cau-
it is difficult to make a direct observation of effective- sation can be established from natural experiments by
ness. For example, it might take an overly long time to including (Hill 1966):
demonstrate clinically that a drug reduces the long-term
• An analysis of temporal relationship;
risk of heart disease. Therefore, drugs are sometimes
• Plausibility based on prior knowledge; and
approved if they can be shown to have an effect on
• Coherence with other known facts.
variables thought to be related to risk of heart disease
(e.g. blood cholesterol levels). However, ‘‘there have
It is interesting to consider what ‘‘natural experiments’’
been many instances where treatments showing a highly
might entail in design methodology. Every time a new
positive effect on a proposed surrogate have ultimately
method becomes widely adopted by industry, there is an
been shown to be detrimental to the subjects’ clinical
opportunity for the design research community to learn
outcome’’ (U. S. Department of Health and Human
about its effectiveness and its side effects. For example,
Services - Food and Drug Administration, 1998, Pro-
many U.S. companies adopted design methods from
viding Clinical Evidence of Effectiveness for Human
Japanese companies in the 1980s (e.g., Quality Function
Drug and Biological Products, http://www.fda.gov/cder/
Deployment aka, QFD). Many believe that these com-
guidance/idex.htm). Surrogate variables can be mis-
panies derived benefits in quality, cost, and profitability.
leading indicators of clinical effectiveness and no general
A careful study of this ‘‘natural experiment’’ might have
procedure for evaluating surrogate variables has yet
more clearly established the effects of this ‘‘treatment’’
been broadly accepted in the medical community.
on the relevant outcomes. This idea will be explored
Similarly, surrogate variables may also be somewhat
more fully in Sect. 4.3.
helpful in validating engineering design methods. For
A useful augmentation to natural experiments and
example, Olewnik and Lewis’ (2005) proposed criteria
clinical trials is meta-analysis: the combination of data
for evaluating decision support methods could be viewed
from several studies to produce a single estimate. Meta-
as surrogate variables. It seems reasonable that a
analysis employs statistical, multi-factorial methods in
method is more likely to be effective if it is logical, uses
which the treatment is one predictor variable and the
meaningful, reliable information, and does not bias the
study is another predictor variable (Bland 1987). A
designer. However, as Olewnik and Lewis (2005)
principal difficulty with meta-analysis is so-called pub-
acknowledged, these criteria do not ensure effectiveness
lication bias. Studies which produce significant differ-
in practice.
ences are more likely to be published than those which
do not (Easterbrook et al. 1991). Also, unfavorable re-
3.3 Natural experiments sults are left unpublished more frequently than favorable
results. Thus, in medical research it is advised that all
One alternative to a clinical trial, which is a controlled studies be used in any meta-analysis, both published and
experiment, is the so-called natural experiment in which unpublished, demanding that researchers seek out
the consequences of different treatments are ob- studies through personal knowledge and intensive
served—but without controlled administration of the investigation.
treatments. For example, in the first major study linking Design methods can also, in principle, be studied by
smoking and lung cancer, 1,500 patients who had been meta-analysis. As will be discussed in Sect. 4.2, meta-
diagnosed with lung cancer and a similar-sized sample of analysis is used in design research, but more extensive
patients not diagnosed with lung cancer were surveyed studies may be possible. Applications of different design
about their smoking habits. The result of the study was methods are frequently published and they could be

collected and subject to statistical analysis. However, laboratory research, especially decision support tools,
just as in medical research, publication bias is a real often degrade performance, rather than improve it
concern. Many case studies are developed to illustrate (Klein et al. 2003).
the successful use of a design method. This is a reason-
able practice as new methods are much more useful to
practitioners when accompanied by examples illustrating 3.5 Animal models
successful outcomes. Publication bias is a natural con-
sequence of the dynamics of the publication process and Animal models are another important tool in medical
generally not evidence of deliberate attempts at decep- research and development. An animal model is an
tion. Nevertheless, and while difficult to accomplish, organism—often a mouse—selected or developed to
adequate safeguards against bias must be made in meta- bear specific physiological similarities to humans with
analysis of case studies of engineering design method- specific medical disorders. A variety of animal models
ology. In most cases, it is not possible to collect all of the exist for studying disorders of the heart, blood, brain,
published and unpublished applications of a design eyes, and so on (Center for Modeling Human Disease,
method. Further, if a subset of unpublished cases is http://www.cmhd.ca). The precise form of an appropri-
sampled, it would be difficult to avoid bias since engi- ate animal model is strongly determined by the medical
neering firms will justifiably be concerned about dis- disorder being modeled. For example, to study attention
seminating evidence of unsuccessful outcomes. deficit hyperactivity disorder (ADHD), mice lacking a
specific gene encoding a mechanism regulating brain
chemistry, a dopamine transporter or DAT, have been
3.4 In vitro experiments developed, thus the mice are called DAT knockout or
DAT-KO mice: ‘‘The preponderance of common sym-
In many cases, applying an experimental medical treat- ptomatologies between the DAT-KO mice and individ-
ment to a human subject is not justifiable, but it may be uals with ADHD suggests that these mice may... serve as
possible to apply the treatment to human tissues and a useful animal model and as resource to test new
cells outside the human body. This type of experiment is therapies’’ (Gainetdinov et al. 1999).
described as in vitro as opposed to in vivo. This ap- The use of animal models entails uncertainty due to
proach avoids the risk of harming subjects, and some- the inevitable differences between the animal and the
times enables closer observation of the effects and human. For example, of the DAT-KO mice, it is noted
mechanisms of the treatment. However, because the that ‘‘despite the similarities between the mutant mice
tissues or cells are outside the normal context of their and humans with ADHD-HKD, it is unlikely that their
existence, great care must be exercised in drawing phenotypes are completely identical’’ (Gainetdinov et al.
inferences about clinical effectiveness. 1999). In this regard, an animal model for medical re-
In design theory, human subjects are sometimes used search and development is similar to an engineering
as subjects in laboratory experiments (e.g., Chakrabarti model for use in design. The model is intended to rep-
et al 2004; Cacciabue and Hollnagel 1995). These are resent another entity, but does not represent that entity
analogous to in vitro experiments in medicine in the in all regards. In fact, there are no universally accepted
following ways: procedures for validating animal models of human dis-
ease because the validity of the models is so closely
• The subject of the experiment is removed from the
linked to the investigations to be conducted (S.L. Ad-
usual context, in this case, the corporation where most
amson, personal communication).
authentic engineering practice takes place.
Simon proposed a way to go about validation of
• Cooperation of an entire engineering enterprise is
design methods that is similar to the practice of using
generally not needed. The human subjects can vol-
animal models in medicine (Simon 1996):
unteer individually.
• Closer observation and control of experimental con- ‘‘. . . there exist today a considerable number of
ditions may be possible. examples of actual design processes, of many dif-
ferent kinds, that have been defined fully . . . in the
Laboratory experiments with human subjects are cur- form of running computer programs . . . Because
rently providing insights into engineering design. How- these computer programs describe complex design
ever, there is a risk in extending results from laboratory processes in complete, painstaking detail, they are
experiments to make inferences about engineering open to full inspection and analysis, or to trial by
practice. Macrocognition is a term that has been coined simulation. They constitute a body of empirical
to describe the cognitive functions performed in natu- phenomena . . . There is no question, since these
ral—versus artificial, laboratory—settings (Klein et al. programs exist, of the design process hiding behind
2003). Real-world settings require such activities as the cloak of ‘‘judgment’’ or ‘‘experience.’’ Whatever
problem setting, attention management, planning, and judgment or experience was used in creating the
adaptation/re-planning in ways that laboratories can programs must now be incorporated in them and
rarely simulate. As a result, tools developed based on hence be observable.’’

In effect, Simon is proposing that a computer simulation prescription of psycho-stimulants to ADHD patients
of a design process stand in for the actual design process. continued despite the paradox. Meanwhile, animal
To be more precise, the argument is that a computer models were (and still are) being used in detailed inves-
simulated design scenario to which design methods tigation of the effects of psycho-stimulants on the brain
would be applied is a more apt analogy for an animal functions of DAT knockout mice, which may ‘‘provide
model as used in medical research. In fact, the animal insights into the basic mechanisms that underlie the
model analogy suggests that a full correspondence be- etiology of this and other hyperkinetic disorders’’
tween the simulation and the reality being modeled is (Gainetdinov et al. 1999). These investigations may
not required. Animal models such as DAT-KO mice eventually lead to drugs that more specifically target the
bear only a few selected similarities to humans with relevant mechanisms to ADHD and therefore are more
ADHD. Similarly, simulations used in engineering de- effective and/or have fewer side effects.
sign may bear only certain key similarities to real-world The phenomenon of practical uses preceding
design and still reveal interesting insights. This practice underlying theory is also common in engineering. The
is not uncommon for research in engineering design; founding of thermodynamics as a field was substan-
many papers use an approach in which computer sim- tially accelerated by the study of working steam en-
ulations of design processes are used as a source of gines. The ability to make workable steam engines
epirical knowledge about design (Moss and Cagan preceded understanding of thermodynamics, not the
2004). Such studies do not necessarily require precise other way around as it is frequently assumed. Similarly,
models of cognitive processes of engineering designers. many of the most useful innovations in design meth-
Schön has proposed another means to test and use odology including robust design, lean manufacturing,
methods, a practicum, rather than a computer simula- and QFD emerged from industry practice. Only later
tion (Schön 1983): were these practices studied by theorists. In some cases,
the innovations were refined based on the theory, but
‘‘. . . a practicum ... is really a virtual world. A virtual just as often, theories had to be extended or even
world in the sense that it represents the world of revolutionized to accommodate the understanding of
practice, but is not the world of practice. . . ’’ the practice. The interplay of theory and applica-
tion—with applications often leading the way—is long
Just as Simon proposed the computer simulation as a known, but too often forgotten. Design researchers
model of real design, Schön has proposed another kind should certainly keep in mind the approach of medical
of entity a bit closer to professional design practice. A researchers. If theoretical investigations of a design
key difference in Schön’s practicum is that an actual method seem to conflict with field reports about that
person has to carry out the design. Therefore a practi- method, then the design methodology may still be valid
cum can assess a design method and the degree to which and perhaps the theory should be extended to include a
it fits human cognitive and psychological attributes. richer variety of considerations.
Using practica (or something very similar) is common in
research in engineering design; for example, researchers
often use classroom settings to evaluate design methods 4 Implications of the analogy
(e.g., Wilkening and Sobek 2004; Reich et al. 2006).
This section considers validation in three different areas
of design theory and methodology—robust design, axi-
3.6 Theory omatic design, and decision-based design—in order to
explore more thoroughly the medical treatment–design
The underlying theories of medical science such as method analogy.
chemistry and biology play a pivotal role in the devel-
opment of new treatments. Because basic research cat- 4.1 Robust design
alyzes new innovations, the developers of medical
treatments pay careful attention to research and even Robust design is a set of design methods in which
make substantial investments in privately funded basic products and processes are made less sensitive to man-
research. However, in validation of medical treatments, ufacturing variations, customer use conditions, and
theory plays a surprisingly small role. Drugs are often degradation over time. The techniques were first pio-
administered widely without full understanding of their neered by Taguchi who, through his extensive work with
underlying mechanisms—as long as their effectiveness Japanese industry, developed a methodology (Taguchi
can be established clinically. For example, it is known 1987; Phadke 1989). Later, researchers and practitio-
that psycho-stimulants have a calming effect on humans ners—especially in statistics—developed new ap-
with ADHD. This effect appeared so logically inexpli- proaches intended to be more tightly linked to
cable that medical researchers referred to these calming mathematical theory, and especially to design of experi-
effects as paradoxical: seemingly self-contradictory, yet ments (DOE).
nonetheless true (Gainetdinov et al 1999). Because the Design of experiments is a body of knowledge and
value of the treatments was established clinically, the techniques for planning a set of experiments, analyzing

the resulting data, and drawing conclusions from the resulted in a standard deviation of the response sub-
analysis (Wu and Hamada 2000; Box et al. 1978). DOE stantially smaller than that provided by a single-array
in general—and fractional factorial design in particu- design, although both methods were much better than a
lar—have been key theoretical foundations for devel- ‘‘placebo’’ (selecting control factor settings at random)
opment of robust design methods. Both Taguchi (Frey and Li 2004). These model-based results tend to
methods and newer techniques use fractional factorial corroborate Kunert’s ‘‘clinical’’ study showing that the
experiments, but in different ways. Recent theoretical single instance in the field was probably typical of the
developments have led statisticians to discourage the use population of realistic scenarios in the field.
of crossed arrays and encourage the use of a single array The results in this section suggest that it is a mistake
including both control and noise factors (Wu and Ha- to infer too much about robust design methods based
mada 2000; Borror and Montgomery 2000). The justi- solely on mathematical theory. A similar conclusion has
fication is that ‘‘. . . some of the single arrays . . . are been voiced by Box and Liu (1999). In recent decades,
uniformly better than the cross arrays in terms of the the DOE research community has emphasized the
number of clear main effects and two-factor interac- development of mathematically optimal experimental
tions’’ (Wu and Hamada 2000). designs which tend to be ‘‘one-shot’’ procedures. Such
In robust design, the ‘‘clinical endpoint’’—what the designs are optimal within a formal axiomatic frame-
engineering organization wants—is improved system work, but Box has argued that, in practice, they
performance in the presence of noise. Single-array de- undermine the experimenter’s need to alternate between
signs have been justified on the basis of the number of forming hypotheses and conducting experiments (Box
clear effects and of control by nose interactions. One and Liu 1999). Box further argued that this resulted in
may say that these properties are being used as surrogate less improvement of systems than would have been
variables, and that the clinical endpoints should be achieved by iterative procedures (even though they are
evaluated in a clinical trial. Such a trial was recently partly heuristic). Box’s findings, if accepted, provide an
conducted by Kunert et al. (2004). They used crossed- instance of overemphasis on mathematical theory lead-
array designs and single-array designs to improve the ing to ineffective professional practice due to neglect of
consistency of a sheet metal spinning process. In effect, relevant human factors.
this was a paired comparison experiment where the
experimental ‘‘treatment’’ was the design method em-
ployed. The result of the experiment was that the cros- 4.2 Axiomatic design
sed-array method led to process settings with a more
consistent profile of the sheet metal parts when com- Axiomatic design posits two ‘‘axioms’’: the indepen-
pared with the single-array method. dence axiom and the information axiom (Suh 1990). The
The ‘‘clinical trial’’ by Kunert et al. provides some full theory of axiomatic design, including theorems and
objective evidence validating the crossed-array method corollaries, is based on theorems whose validity, it is
and disconfirming the single-array approach to robust said, relies on the validity of the axioms. Thus, axiomatic
design. However, this evidence is not conclusive. First of design theory employs foundationalist epistemology. As
all, the experiment included only one replication of the long as the axioms are valid and the other elements of
paired comparison. If the exact same experiment were the theory are deduced from the axioms, it is proposed
repeated, the single array might have beaten the crossed that the theory is sound and useful for professional
array simply due to random variations. Further, if the practitioners. In particular, it is argued that the theory is
same two methods were applied to some other engi- useful in the design of large scale systems along almost
neering system, say a lithography process rather than a all dimensions of their design and operation. That is,
sheet metal spinning process, the single array may have axiomatic design claims to improve design, construction,
prevailed over the crossed array. A paired comparison in operation, modification, maintenance, documentation,
a single instance of application does not provide as much and diagnosis of system failures (Suh 1998).
information as one would like to validate a design Suh (1990) defines axioms as ‘‘truths that that cannot
method. be derived or proved to be true except that there are no
Now, let us consider how validation methods analo- counter-examples or exceptions’’. This suggests that the
gous to those in medicine can be applied in robust de- means to validate the axioms is to look for counter-
sign. As an example of meta-analysis, Li and Frey (2005) examples. However, Suh’s axioms are in the imperative
conducted a study of experiments carried out in many form. The independence axiom is ‘‘maintain the inde-
fields of engineering to ascertain the degree that system pendence of functional requirements.’’ It is not clear
regularities such as effect sparsity, hierarchy, and what a counter-example to an imperative might entail;
inheritance were evident in the resulting data. Frey and the axiom is not testable—as defined by Schön (1983)
Li (2004) then conducted a model-based investigation of and Argyris (1991) and described in Sect. 2.1—because it
crossed-array and single-array designs using a computer does not ‘‘specify the situation, the desired result, and
simulation of robust design on a large number of sim- the action through which the result is to be achieved.’’
ulated responses drawn from a third order polynomial The clinical trial part of the medical treatment–design
model of the system’s response. A crossed-array design method analogy suggests one testable hypothesis about

axiomatic design. If training in axiomatic design is from medical research and development may help avoid
viewed as a treatment, the hypothesis might be put forth such difficulties. Data should be collected from ongoing
that providing training in axiomatic design to a group of natural experiments and laboratory experiments should
engineers will enable them to produce designs having a be used to further investigate the implications of the
higher ‘‘probability of success’’ than a group that re- theory.
ceives training in another method (a comparator). By
devising a study to test this hypothesis, axiomatic design
might be objectively assessed. It might also enable the 4.3 Making design decisions
assessment of how well axiomatic design theory can be
learned and implemented in practice. Such an investi- Designers frequently find themselves having to rank
gation would be very costly. objectives against one another, choose among alterna-
The natural experiment part of the medical treat- tive means for achieving functions, select from alterna-
ment–design method analogy suggests another ap- tive design options, and generally make a wide variety of
proach. A software product, Acclaro, has been design decisions (Dym and Little 2004). Many of the
developed to support axiomatic design and is currently tools used to help designers make such decisions have
being marketed to engineering companies (Axiomatic become widely used, as well as often criticized. For
Design Solutions Inc. website http://www.axiomaticde- example, QFD is a method frequently used in the early
sign.com/). In effect, a natural experiment is currently phases of product design (Hauser and Clausing 1988).
underway. The company that sells the software lists QFD was first developed at the Kobe ship yard in Japan,
among its clients Ford Motor Company, Lockheed then adopted by major Japanese companies and subse-
Martin, Hewlett Packard, Saab Rosemount, and others. quently by a broad variety of companies in North
Also listed is ASML, which acquired Silicon Valley America and Europe. One primary goal of QFD is to
Group Incorporated—a company that was among the help companies align their designs with the ‘‘voice of the
first to use the software. A study of these companies customer.’’ One claim often made about QFD is that the
might reveal how this treatment has affected various method encourages teamwork among engineers and
measures of their performance, such as product quality marketing professionals by providing immersion in the
ratings, time to market, profitability, market share, and specifications prior to design efforts and by setting
so on. Such a study could not provide a definitive appropriate technical targets for mature products (Dym
evaluation of the theory, but is essential to forming a and Little 2004). There is some empirical evidence that
complete assessment. suggests that QFD has not been correlated in the short
The in vitro experiment part of the medical treat- term with quality improvements, but that users never-
ment–design method analogy suggests still another theless feel it has longer term benefits (Griffin 1989).
means of evaluating Axiomatic Design. A simulation of Meanwhile, QFD and similar decision-making tools
a design process could be created to test a hypothesis such as the pairwise comparison chart (PCC) and the
such as, ‘‘if a system design does not maintain the analytical hierarchy process (AHP) have been taken to
independence of functional requirements, then the set of task by some mathematicians and design theorists. Saari
requirements cannot be satisfied.’’ In fact, just such an (2001) has investigated the mathematics of different
investigation was carried out. Human subjects were forms of voting and of related decision support proce-
asked to satisfy a set of n functional requirements using dures. These investigations have brought to light unde-
a set of n design parameters for both uncoupled systems sirable properties the methods exhibit under certain
and coupled systems of modest size (from n = 2 to circumstances. Hazelrigg (1998, 1999) has argued that
n = 5) (Hirschi and Frey 2002). The experiment showed all of these methods are simply wrong because they
that human subjects could succeed in simultaneously violate tenets of decision theory. One specific conclusion
satisfying all of the requirements of both uncoupled and drawn by Hazelrigg is that industry should stop using
coupled systems, but that the task completion time QFD. However, the framework for validation used in
scaled linearly for uncoupled systems and geometrically medicine suggests that QFD should not be abandoned
for coupled systems. This investigation does not support so easily because the existing field data show some po-
the idea that coupling must be avoided, but it does sitive effects of QFD (Kuppuraju et al. 1985). Also, it is
provide empirical support for the idea that coupling has not obvious that decision theory is an adequate model
some negative implications in system design. for many forms of design decision making (Dym et al.
The results in this section suggest that an axiomatic 2002). Rejecting QFD because of Arrow’s theorem in
approach to design theory development will, in some decision theory is akin to telling patients to reject med-
instances, lead to logical or practical difficulties. If axi- icines whose effects cannot be explained using current
oms are stated regarding design, they may not be test- knowledge of chemistry and biology. No medical treat-
able unless they are stated as empirical hypotheses. As ment can work except by a means that may ultimately be
theorems are derived from the axioms, one must ensure subject to scientific explanation. However, in practice,
not only consistency with the axioms, but also consis- many effective medical treatments are discovered before
tency with every fact of reality that bears on the theo- their relevant mechanisms are understood. Under cur-
rem, including human cognitive capabilities. Practices rently accepted medical practices in the developed world,

it is acceptable to market a treatment without a full biology, in vivo experiments with animal models, clinical
understanding of its underlying mechanisms as long as trials with human subjects, and natural experiments.
the treatment’s safety and effectiveness can be clinically Model-based validation is one of the techniques we
established. Design researchers might conclude, at least most strongly encourage design researchers to adopt. In
tentatively and cautiously, that QFD and similar tools medicine, tremendous investments are made in creating
are currently the subject of field evaluation. At the same animals that reasonably model aspects of human dis-
time, design researchers, in collaboration with design ease. Similar efforts might be made to create models of
practitioners, should continue to investigate the mech- engineering design processes for the explicit purpose of
anisms that provide perceptible benefits and simulta- evaluating design methodologies. As discussed in Sect.
neously use theory to seek improvements. Reports from 4.1, some small scale examples have been carried out in
industry practice suggest that the long term value of robust design, but much more may be possible with
QFD derives from factors such as cross-functional concerted effort.
coordination and information flow (Griffin 1989). The Another practice in medicine worthy of consideration
medical analogy suggests that models should be devel- is the careful collection and analysis of field data as new
oped that adequately represent these mechanisms, and treatments are introduced. Similar natural experiments
then these models should be used to test and refine our in engineering design are ongoing every time a new
decision making methods. method is adopted widely in the field. It is essential that
Although the medical analogy seems to suggest QFD design researchers work to collect data and include
may continue to be used, the analogy also raises an analysis of such data in evaluations of methods such as
important issue concerning labeling. Over-the-counter QFD, Pugh controlled convergence, Taguchi methods,
medications must carry carefully defined instructions for and Axiomatic Design. All of these methods have seen
their use. Without such labels, medicines would be less substantial use in the authentic context of industry
useful and more dangerous. Such labeling has been practice. At this point, any broad statements about ei-
suggested for QFD through research in engineering de- ther their validity or inherent flaws must be interpreted
sign. Dym et al. (2002) suggest that QFD and related in light of what has actually transpired.
methods can be used cautiously at early stages of design, In addition to these concrete suggestions for design
but should not be used as the sole means to establish a research practice, a more philosophical point may also
single numerical rating that can then be used to choose a be made regarding the role of foundations in our field.
single alternative. Further, specific cautions are made In both medicine and in engineering design it has fre-
against over-interpretation of numerical differences be- quently been observed that as the many layers of evi-
cause so much of design knowledge is not amenable to dence are traversed, surprising discoveries are made.
straightforward mathematical modeling (Dym and Little Developments based on theory alone may prove to be
2004; Dym et al. 2002). This paper suggests that ineffective in practice. Despite this fact, the field of de-
researchers in engineering design should pay close sign research has, at times, taken an exaggerated stand
attention to such instructions since they are an essential regarding theoretical foundations—that as long as con-
part of the ‘‘treatments’’ being developed. sistency with first principles can be maintained, then
good methodology is likely to result. Leonard Savage, in
developing an axiomatic basis for statistics and decision
making, provided a caution against this approach and
5 Closing remarks
suggested a more balanced view of the interactions be-
tween foundations and professional practice:
The principal argument of this paper is that many of the
highly-developed validation techniques found in medi- ‘‘It is often argued academically that no science can
cine can profitably be used in engineering design re- be more secure than its foundations, and that, if there
search. Validation of design methods is an important is controversy about the foundations, there must be
topic for engineering design theory and practice. Every even more controversy about the higher parts of the
engineer that designs products or processes uses some science. As a matter of fact, the foundations are the
design method or composite of methods. The design most controversial part of many, if not all, sciences...
methods now in use have an impact on all aspects of As in other sciences, controversies about the foun-
success, including quality, performance, cost, and time dations of statistics reflect themselves to some extent
to market. Therefore, it seems reasonable to seek vali- in everyday practice, but not nearly so catastrophi-
dation procedures for design methods that are as good cally as one might imagine. I believe that here, as
as the ones applied in developing medical treatments. elsewhere, catastrophe is avoided, primarily because
Inasmuch as medical treatment is so widely viewed as in practical situations common sense generally saves
important, the associated validation procedures are well- all but the most pedantic of us from flagrant error...
documented, objective, and evidence-based. New treat- Although study of the foundations of a science does
ments must provide proof of their effectiveness before not have the role that would be assigned to it by naı̈ve
they can be deployed widely. This proof is built up of first-things-firstism, it certainly has a continuing
many layers, including basic theory in chemistry and importance as the science develops, influencing, and

You can also read