Using Real-World Data for Coverage and Payment Decisions: The ISPOR Real-World Data Task Force Report

Page created by Alfredo Valdez
 
CONTINUE READING
Volume 10 • Number 5 • 2007
                                                                                                               VA L U E I N H E A LT H

Using Real-World Data for Coverage and Payment Decisions:
The ISPOR Real-World Data Task Force Report

Louis P. Garrison Jr., PhD (cochair),1 Peter J. Neumann, ScD (cochair),2 Pennifer Erickson, PhD,3
Deborah Marshall, PhD,4 C. Daniel Mullins, PhD5
1
    University of Washington, Seattle, WA, USA; 2Tufts-New England Medical Center, Boston, MA, USA; 3O.L.G.A., State College, PA, USA;
4
    i3Innovus, Burlington, ON, Canada; 5University of Maryland, Baltimore, MD, USA

A B S T R AC T

Objectives: Health decision-makers involved with coverage                  according to the strength of research design), and by type of
and payment policies are increasingly developing policies that             data source (supplementary data collection alongside RCTs,
seek information on “real-world” (RW) outcomes. Moti-                      large simple trials, patient registries, administrative claims
vated by these initiatives, the International Society for Phar-            database, surveys, and medical records). Our report discusses
macoeconomics and Outcomes Research (ISPOR) created a                      eight key issues: 1) the importance of RW data; 2) limitations
Task Force on Real-World Data to develop a framework to                    of RW data; 3) the fact that the level of evidence required
assist health-care decision-makers in dealing with RW data,                depends on the circumstance; 4) the need for good research
especially related to coverage and payment decisions.                      practices for collecting and reporting RW data; 5) the need
Methods: Task Force cochairs were selected by the ISPOR                    for good process in using RW data in coverage and reim-
Board of Directors. Cochairs selected chairs for four working              bursement decisions; 6) the need to consider costs and ben-
groups on: clinical outcomes, economic outcomes, patient-                  efits of data collection; 7) the ongoing need for modeling; and
reported outcomes, and evidence hierarchies. Task Force                    8) the need for continued stakeholder dialogue on these
members included representatives from academia, the phar-                  topics.
maceutical industry, and health insurers. The Task Force met               Conclusions: Real-world data are essential for sound cover-
on several occasions, conducted frequent correspondence and                age and reimbursement decisions. The types and applications
exchanges of drafts, and solicited comments on three drafts                of such data are varied, and context matters greatly in deter-
from a core group of external reviewers and from the ISPOR                 mining the value of a particular type in any circumstance. It
membership.                                                                is critical that policymakers recognize the benefits, limita-
Results: We defined RW data as data used for decision-                     tions, and methodological challenges in using RW data, and
making that are not collected in conventional randomized                   the need to consider carefully the costs and benefits of differ-
controlled trials (RCTs). We considered several characteriza-              ent forms of data collection in different situations.
tions: by type of outcome (clinical, economic, and patient-                Keywords: methodology, outcomes research, real-world
reported), by hierarchies of evidence (which rank evidence                 data, research design.

Why a Real-World Data Task Force?                                          by regulatory authorities for marketing approval. It is
                                                                           broadly acknowledged that while RCTs provide a
Growing Use of Evidence Syntheses and
                                                                           “gold standard” in the sense that they provide solid
Outcomes Research
                                                                           evidence of product efficacy under carefully controlled
Health decision-makers involved with coverage and                          conditions, RCTs are carried out using selected popu-
payment policies are increasingly seeking information                      lations under idealized conditions. In addition, they
on “real-world” (RW) outcomes on which to base their                       are expensive to conduct. Other sources of data can
decisions. Many of them are developing policies that                       contribute in important ways to the evidence base
integrate evidence from different sources. These poli-                     (e.g., demonstrating how a drug works in populations
cies recognize the importance of evidence that goes                        or under conditions not studied in the trial, or relative
beyond information collected during clinical develop-                      to another drug not included in the study).
ment in randomized controlled trials (RCTs) required

Address correspondence to: Louis P. Garrison, Jr., Department of           Policy Developments
Pharmacy, University of Washington, Health Sciences Building,
H375, Box 357630, Seattle, WA 98195, USA. E-mail:                          Recent policy initiatives highlight payers’ attempts to
lgarrisn@u.washington.edu                                                  collect and use such data. The Medicare Modernization
  10.1111/j.1524-4733.2007.00186.x                                         Act (MMA) of 2003 illustrates the US government’s

326                            © 2007, International Society for Pharmacoeconomics and Outcomes Research (ISPOR)   1098-3015/07/326   326–335
ISPOR Real-World Task Force Report                                                                                  327

attempt to bolster the evidence base on which private          RW health-care decision-making, especially related to
health plans participating in the new Medicare drug            coverage and payment decisions.
benefit can base their coverage decisions. The MMA
contains a provision (Section 1013) calling on the
Agency for Healthcare Research and Quality (AHRQ)              Scope
to conduct research on the “outcomes, comparative
clinical effectiveness, and appropriateness of health          Defining RW data. From the outset, the Task Force
care, including prescription drugs” [1]. The AHRQ has          grappled with the definition and appropriate charac-
recently launched the Effective Health Care Program to         terization of RW data. It seemed self-evident that RW
synthesize, generate, and translate knowledge to aid           outcomes data should come from RW situations. Part
stakeholders in grappling with the often difficult deci-       of the Task Force’s charge was to consider uses and
sions they must make [2]. The AHRQ has also                    limitations of evidence not obtained from RCTs. On
announced that it is developing a “how-to” reference           the other hand, it was also clear that decision-making
guide to help health-care organizations in creating            is a highly integrative process of synthesizing informa-
patient registries to track the outcomes of medical            tion from different sources—both “laboratory” and
treatments, including drugs [3]. The US Centers for            real-world.
Medicare and Medicaid Services (CMS) has also                      Some Task Force members questioned the appropri-
recently issued a revised Coverage with Evidence Devel-        ateness of the term “RW data” in the first place. In the
opment Guidance document that focuses on the need              end, we decided to adhere to the term, not only
for RW data to help inform national coverage decisions         because it reflected the charge from the ISPOR Board,
for new technology [4]. Many private payers are adopt-         but also because the term has gained currency in some
ing the Academy of Managed Care Pharmacy Format,               policy circles.
which calls for health plans to request formally that              We settled on a definition that reflects data used for
drug companies present a standardized “dossier,” con-          decision-making that are not collected in conventional
taining detailed information not only on the drug’s            RCTs. This is not to say that data from RCTs are
efficacy and safety, but also on its projected effectiveness   irrelevant or not used by decision-makers; indeed, they
and economic value relative to alternative therapies           remain the critical foundation for almost all initial
[5,6].                                                         coverage and payment decisions. For if there is not a
    Other efforts to review evidence systematically have       belief in the plausibility of the underlying biological
also gained momentum, including those by private and           mechanism or hypothesis, why should anyone seek
public health plans. A notable initiative in the United        further evidence of effectiveness or cost impact in the
States is the Drug Effectiveness Review Project                real world? Yet, efficacy evidence in a particular group
(DERP), an alliance of 15 state Medicaid programs              or subgroup is typically insufficient to project the size
and two nonprofit organizations, to synthesize and             of the effectiveness impact in the population that
judge clinical evidence for drug class reviews [7].            would actually use a product. Decision-makers there-
Around the globe, national reimbursement authorities           fore seek additional types and sources of data.
and the organizations that assist them, such as the
National Institute for Health and Clinical Excellence          Data versus evidence. Our Task Force also deliberated
(NICE), and the German Institut für Qualität und               distinctions between the terms “real-world data” and
Wirtschaftlichkeit im Gesundheitswesen (IQWiG), are            “real-world evidence.” Some in our group favored the
struggling with similar issues [8,9]. Both DERP and            latter term, or at least raised questions about whether
IQWiG emphasize RCTs for clinical evidence as                  we meant evidence when we employed the term data.
opposed to synthetic modeling approaches, such as              The notion was that “data” conjures the idea of simple
those used by NICE to project RW effectiveness.                factual information, whereas “evidence” connotes the
                                                               organization of the information to inform a conclusion
                                                               or judgment. Evidence is generated according to a
Task Force Objectives and Scope                                research plan and interpreted accordingly, whereas
                                                               data is but one component of the research plan. Evi-
Objectives                                                     dence is shaped, while data simply are raw materials
Motivated by the MMA and other efforts, and recog-             and alone are noninformative.
nizing the lack of a framework for considering RW                 In the end, we adhered to the term “real-world
data in coverage and reimbursement decisions, the              data” for reasons noted above. Nevertheless, we try to
International Society for Pharmacoeconomics and                remain sensitive to the distinctions throughout, and
Outcomes Research (ISPOR) created a Task Force on              emphasize that thorough knowledge of the quality of
Real-World Data. The mission of the Task Force was             the data source is necessary when deciding whether a
to develop a framework to assist health-care decision-         specific set of data can be used to answer a research
makers in dealing with RW data and information in              question or shed light on a decision.
328                                                                                                Garrison et al.

Focus on coverage and payment decisions. We focus           core group of external reviewers and more broadly
our report on coverage and payment (i.e., both pricing      from the membership of ISPOR.
and reimbursement) decisions. This sets it apart from
other contexts, such as direct patient care or regulatory
approval for marketing.                                     Types and Sources of RW Data
   We recognize the tension that sometimes arises           Characterizing RW Data
between what is good for an individual patient and          There are several ways in which one might characterize
what is good for the population at large. Evidence-         RW data. One is by type of outcome: clinical, eco-
based medicine (EBM) has been defined as “the con-          nomic, and patient-reported outcomes (the focus of
scientious, explicit and judicious use of current best      three of our working groups). An advantage of this
evidence in making decisions about the care of indi-        approach is that it corresponds to the way in which
vidual patients” [10]. While physicians retain a pivotal    many decision-makers conceive of data. A downside is
role in patient care decisions, other bodies such as        that it provides broad categories, each of which com-
pharmacy and therapeutic (P&T) committees and               bines many types and sources of evidence.
national reimbursement bodies (e.g., in Canada, Aus-            A second characterization involves traditional hier-
tralia, and most European countries) are taking on          archies of evidence (the focus of our fourth working
greater importance in medical decision-making.              group), which rank evidence according to the strength
                                                            of the research design. Typically, data from RCTs sit
Drugs versus other interventions. While     most of our     atop the hierarchy followed by data from nonrandom-
examples are drawn from the world of        prescription    ized intervention studies, followed by epidemiological
drugs, our report has implications for      all types of    studies and so forth [11]. Evidence hierarchies provide
interventions, including drugs, devices,     procedures,    a useful ranking based on the rigor of the research
and health programs.                                        design; however, they do not provide a complete
                                                            picture of RW data. The results from many RCTs are
US versus global focus. The motivation for the Task         not generalizable to a broader population. Conversely,
Force came largely from the MMA and US policy               a well-conducted observational study may prove
arena, but debates about types and sources of data are,     highly useful in certain situations provided that poten-
of course, present around the world—as suggested in         tial biases have been adequately addressed. Indeed,
the different approaches taken, for example, by NICE        some would argue that observational data can often
and IQWiG. Although many of our examples are taken          provide more relevant evidence regarding patient out-
from US contexts, our findings are global in reach.         comes in actual clinical practice than can a registration
                                                            RCT.
                                                                Finally, one might consider RW data by types of
Task Force Participation                                    data sources. The value of this classification is that it
Task Force cochairs were selected by the ISPOR              identifies tangible sources of information. A potential
Board of Directors. Cochairs selected chairs for four       drawback is that it represents a simplification that does
working groups on: clinical outcomes, economic out-         not capture important design issues within each source
comes, patient-reported outcomes (PROs), and evi-           of evidence.
dence hierarchies. The first three groups correspond to         Each of the three characterizations provides a dif-
the three major areas of outcomes research that con-        ferent perspective on RW data. Collectively, we believe
tribute to the studies of comparative effectiveness and     they provide a useful portrait of the strengths, weak-
economic evaluation sought by decision-makers. The          nesses, and complexities inherent in the topic.
working group on evidence hierarchies addresses the
cross-cutting, foundational issue of the quality of evi-    Types of Outcomes
dence. This paper reflects an integration of the various
working group reports.                                      Clinical outcomes. Clinical outcomes include bio-
   Task Force members included representatives from         logical measures of morbidity (e.g., blood pressure,
academia, the pharmaceutical industry, and health           cholesterol level, symptoms, and side effects) and mor-
insurance plans, and brought varied experiences in          tality. Clinical outcomes include both surrogate (inter-
using RW data. An outline and draft report developed        mediate) and long-term measures. For purposes of this
by the panel were presented at the 2005 International       discussion, we differentiate clinical outcomes from
and European ISPOR Meetings. The manuscript was             health outcomes and especially PROs and health-
then submitted to a reference group of interested           related quality of life (HRQoL).
ISPOR members for review and comment. The Task                  Much of the data collected in phase III registration
Force met on several occasions, conducted frequent          trials involves clinical outcomes. Clinical outcome data
correspondence and exchanges of drafts by electronic        are also found in many other sources, such as patient
mail, and solicited comments on three drafts from a         registries or observational databases. A key issue, dis-
ISPOR Real-World Task Force Report                                                                                 329

cussed in greater detail in Section 3.4, involves the       movement is to ground clinical practice in rigorous
validity of clinical data from such sources.                research. EBM proponents emphasize that traditional
                                                            medical practice incorporated local practices and
Economic outcomes. “Economic outcomes” are nar-
                                                            expert opinion that were not tested in controlled
rowly defined here to include estimates of medical and
                                                            studies [11]. They stress the need for clinical research-
nonmedical resource utilization and their associated
                                                            ers to document all study protocols, utilize appropriate
costs. Such data are used to project the expected cost
                                                            analytical techniques, and strive for internal consis-
of an intervention in the real world—e.g., in the
                                                            tency [16]. Studies are to be considered externally valid
numerator of a cost-effectiveness ratio. As discussed
                                                            when findings are generalizable beyond local clinical
below, many sources of RW data are useful in provid-
                                                            practices. A scientific body of evidence became reliable
ing use and cost information. Of course, we recognize
                                                            and generalizable when similar results were reported
that the term economic outcomes can be more broadly
                                                            by different researchers across a range of study designs
construed since economic evaluations consider both
                                                            and patient populations. For these reasons, RCTs were
benefits and costs, but keep this narrow definition in
                                                            placed at the top of the evidence hierarchy.
the interest of continuity with the original charter of
                                                                Decision-makers, however, quickly recognized the
the Task Force.
                                                            impracticality of basing all of medicine on RCTs. For
Patient-reported outcomes/quality of life. Patient-         one thing, RCTs are expensive. For another, even the
reported outcome is the term adopted by the Food and        best RCT reflects a limited controlled experiment that
Drug Administration (FDA) and internationally to            may not generalize to populations, settings, or condi-
encompass any report coming directly from patients          tions not reflected in the trial. The need for non-RCT
about a health condition and its treatment, including       information became apparent, raising the question of
symptoms, functional status, HRQoL, treatment satis-        how to grade information that by definition was of
faction, preference, and adherence. Researchers have        “poorer quality.”
long recognized that self-reports of outcomes related           A number of groups have developed evidence hier-
to disease, injury, treatment or policy are important       archies over the years that reflect the primacy of data
because they provide the only direct voice that an          from RCTs, and grade other types of evidence by the
individual has in the health decision-making process.       rigor of the research design. For example, the hierar-
This information has been widely used for decision-         chy adopted by AHRQ grades evidence in order from
making in the United States since the mid-1950s with        most to least rigorous as follows [17]: 1) systematic
the establishment of the annual national health             reviews and meta-analyses of RCTs; 2) nonrandom-
surveys; other countries have subsequently developed        ized intervention studies; 3) observational studies; 4)
similar data collection mechanisms. The CMS adop-           nonexperimental studies; and 5) expert opinion.
tion of the Medicare Health Outcome Survey is recog-            The U.S. Preventive Services Task Force (USPSTF)
nition of the importance of PRO data for under-             took the approach a step further recognizing that
standing outcomes of the Medicare program. As               decision-makers require information not only on the
people live longer with chronic conditions, PROs have       rigor of the research design but also on the magnitude
become increasingly important to pharmaceutical             of the net benefit in support of a particular technology
manufacturers in assessing the impact of emerging           or health service [18]. Recent work developed by the
chronic treatments and in communicating the benefits        Scottish Intercollegiate Guidelines Network and the
of these drug treatments in label and promotional           Oxford Center for Evidence-Based Medicine also rec-
claims.                                                     ognized limitations of traditional grading systems [19].
   Interested parties have actively debated diverse con-    Strict use of evidence hierarchies may not account for
ceptual and methodological issues related to all types      the methodological quality of studies or may fail to
of PROs, resulting in the publication of various “best      reflect the overall strength of the evidence base. Users
practices” documents [12,13]. The level of under-           may misinterpret the grade of recommendation or they
standing of both the scientific rigor and its application   may fail to properly weigh lower-grade recommenda-
to the reporting of these subjective measures has           tions [20]. Other groups maintain their own evidence
improved. Like any science, however, questions              grading and classification systems that combine judg-
remain. FDA and regulatory agencies globally have           ments about evidence quality with judgments about
recently developed statements to guide the develop-         the usefulness of the intervention [21].
ment and use of these measures, especially by the phar-
maceutical industry in the drug approval process
[14,15].                                                    Sources of RW Data
                                                            Real-world data can also be categorized by type of
Evidence Hierarchies                                        data source. Our Task Force defined six such sources:
Historically, evidence hierarchies have been linked to      1) supplements to traditional registration RCTs; 2)
“evidence-based medicine.” The thrust of the EBM            large simple trials (also called practical clinical trials);
330                                                                                                    Garrison et al.

3) registries; 4) administrative data; 5) health surveys;    clinical trials are: identification of where subjects
and 6) electronic health records (EHRs) and medical          receive care may be more difficult (less in a closed
chart reviews.                                               system); data collection systems of community-based
                                                             settings may be less sophisticated than those of aca-
Supplements to RCTs. To provide additional data              demic settings (e.g., more likely to use paper rather
alongside standard clinically focused RCTs, research-        than electronic records, thus increasing the likelihood
ers often gather information on variables such as            of data entry errors); there is more likely to be a lack of
PROs, medical resource use, and costs. Such efforts          standardization in financial and billing systems across
can add valuable evidence on treatment patterns for          different settings of care; and more study coordinators
common events, e.g., such as the doses of drugs used to      will be involved in the data collection effort.
treat rejection in kidney transplantation [22].
   Limitations to such data are also well-known: their       Registries. Registries are prospective, observational
primary aim is to measure a key clinical efficacy end        cohort studies of patients who have a particular
point in a carefully limited population and clinical         disease and/or are receiving a particular treatment
setting. Furthermore, trials are not usually powered         or intervention. They can be used for understanding
statistically to measure precisely the probability of rare   natural history, assessing or monitoring RW safety and
adverse or other events and hence are of limited use         effectiveness, assessing quality of care and provider
in measuring the associated resource utilization and         performance, and assessing cost-effectiveness [24].
costs. RCTs are generally conducted over a shorter              Registries involve prospective data collection of
time frame than what is relevant for determining the         clinical, economic, and PRO information, and are
overall clinical and economic impact of an interven-         increasingly relying on real-time data capture. They
tion, and resource use is often protocol-driven.             typically include a larger and more diverse group of
                                                             patients than what is generally studied in phase III
Large simple trials. Large simple trials (also called        RCTs; therefore, they better reflect RW patients, man-
practical or pragmatic clinical trials) involve prospec-     agement practices, and outcomes. Patients are often
tive, randomized assignment but aimed at larger more         followed over a longer time frame, allowing for an
diverse RW population [23]. Large simple trials have         assessment of longer-term outcomes. Most registries
the important strength of randomization, which mini-         have very few, if any, required visits, evaluations, or
mizes bias in the estimation of treatment effects. These     procedures; therefore, the treatment patterns reflect
trials are by design larger than conventional RCTs. For      the everyday clinical decision-making that is most rel-
this reason, they are more likely to have sufficient         evant to providers and payers. Disease registries enable
power to capture significant differences in key out-         providers and payers to gain insight into the most
comes of interest, such as hospitalizations.                 cost-effective treatment approaches.
    Because the focus is on obtaining policy-relevant           Because registries do not involve random assign-
outcomes, costs and cost-effectiveness are more likely       ment to treatment, care must be taken in analyzing and
to be central end points, and the results can be more        interpreting the results due to the inherent limitations
readily generalized to the relevant treatment popula-        of observational studies. There is no guarantee that
tion than those obtained from conventional RCTs:             patient groupings are comparable; therefore, registries
costs are less likely to reflect protocol-driven health-     may not be suitable to test hypotheses, but are useful
care use; well-documented variations in resource use         to generate them. Furthermore, there are limitations in
across various ethnic, racial, age groups, and sexes can     terms of the amount of data that can be collected, and
be better captured by opening the trial to a more            because visit schedules are not required, data cannot
diverse population; people more at risk for adverse          necessarily be obtained at fixed intervals. Registries
events are less likely to be excluded from the trial, and    sometimes include study sites that are not experienced
the related economic effects are more likely to be cap-      in conducting research, and without appropriate over-
tured; and resource use and costs are more likely to         sight, data integrity could be in question. Nevertheless,
reflect those observed in community-based settings           the use of real-time data capture is likely to improve
where most people obtain their care, especially since        data monitoring and integrity. Registries are, in some
study drugs in phase III trials are generally provided       cases, established to collect postmarketing safety data,
for free.                                                    either in response to specific safety concerns or to fulfill
    Nevertheless, the large size of a practical clinical     regulatory obligations established as a condition of
trial increases the cost of data collection and raises       marketing approval.
some concerns about the quality of data collected.
Costs are increased not only because a larger number         Administrative data. Administrative data (typically
of patients are enrolled, but also because a larger          retrospective or real-time, if possible) are collected pri-
number of settings are involved. Some of the issues          marily for reimbursement, but contain some clinical
raised by economic data collection within practical          diagnosis and procedure use with detailed information
ISPOR Real-World Task Force Report                                                                                331

on charges. Claims databases lend themselves to retro-      not just those who are participating in a given RCT, or
spective longitudinal and cross-sectional analyses of       members of a particular health plan. As a result, health
clinical and economic outcomes at patient, group, or        survey data can make unique contributions about gen-
population levels. Such analyses can be performed at        eralizability of treatments and their impacts and about
low overall cost and in a short period of time. Given       use of and expenditures for health services.
the sheer size of claims databases, researchers can iden-      The major limitation of health survey data for
tify outcomes of patients with rare events more easily,     initial coverage and reimbursement decisions is the
assess economic impact of various interventions, and        lack on relevant data on specific products. Survey data
gain insight into possible association between interven-    are also subject to issues of subjectivity and recall bias.
tions and outcomes.
    Administrative claims databases can prove very
                                                            Electronic health records and medical chart review.
useful in measuring resource use and costs, provided
                                                            Finally, we note that EHRs (and other technologies
some basic principles are met. A clear research ques-
                                                            capturing real-time clinical treatment and outcomes)
tion needs to be defined and addressed by an appro-
                                                            are important sources for RW data for a wide range
priate design from a well-defined perspective. Available
                                                            of clinical settings throughout the world. The expan-
statistical tools can be used to help control for some of
                                                            sion of electronic data capture is essentially lowering
the potential biases. Methods and results should be
                                                            the cost of the medical chart reviews that have been
reported in a clear and transparent fashion, so that
                                                            widely used in the past to produce specific informa-
other researchers are able to understand and reproduce
                                                            tion on the RW use of specific tests or drugs for
the analyses.
                                                            particular conditions. EHRs—such as the UK General
    It is worth noting that with appropriate Institu-
                                                            Practice Research Database—contain more detailed,
tional Review Board (IRB) approvals and patient
                                                            longitudinal information including disease-specific
consent, some health plan databases can be linked with
                                                            symptoms at the personal level and should greatly
PROs, lab results, medical records, and physician
                                                            expand the use of this type of information. Neverthe-
surveys—effectively becoming a blend of traditional
                                                            less, transforming the information for research pur-
registries and claims data.
                                                            poses requires high-end statistical analysis tools and
    Beyond challenges posed by privacy issues, the
                                                            remains a challenge.
validity of retrospective claims database analyses
has been challenged on several fronts: data quality
(missing data, coding errors—whether random or
                                                            Key Findings
“intended”—and the lack of comprehensive data
across health-care settings); the lack of or very limited   Recognizing the Importance of RW Data
clinical information on inpatient stays, health out-        We conclude with a strong affirmation of the need for
comes, health status, and symptoms; limited valida-         RW data. As we have emphasized, RCTs have many
tion; absence of a population denominator; and the          advantages: their prospective design, prespecified well-
lack of distinction between costs and charges. Of           defined end points, randomization and control groups,
course, the large size of these databases may be able to    and blinding all work to provide unbiased measures of
overcome the issue of missing data if they are missing      impact in the trial population; however, this strong
at random. If data quality can be ascertained and           internal validity can limit their external validity and
privacy issues addressed, then treatment selection bias     generalizability about which interventions work best
in the sample is the most common and challenging            when implemented in different settings.
methodological issue. Estimates of the effects and costs       Decision-makers rely on multiple sources of RW
can be biased because of a correlation between unob-        data that must be integrated or synthesized in some
served factors associated with treatment selection and      fashion. While RCTs remain the gold standard for
outcomes, such as baseline health status.                   demonstrating clinical efficacy in restricted trial set-
                                                            tings, other designs contribute to the evidence base. In
Health surveys. Health surveys are designed to collect      some situations, RW data may provide clear advantage
descriptions of health status and well-being, health-       for understanding outcomes of treatment, for example,
care utilization, treatment patterns, and health-care       for patients excluded from trials, patients in actual
expenditures from patients, providers, or individuals       clinical practice settings (vs. research settings), and
in the general population. Health surveys typically         patients whose treatment is not determined by trial
collect information on representative individuals in        protocol or practice guidelines.
the target population, whether patients, physicians or         Among the benefits of RW data is that they can
general population, and are methodologically rigor-         provide:
ous, for example, relying on complex sample survey
designs. With these designs, surveys can provide infor-     •   Estimates of effectiveness rather than efficacy in a
mation about all members of the target population,              variety of typical practice settings;
332                                                                                                     Garrison et al.

•     Comparison of multiple alternative interventions          will relate to the question at hand. It is important to
      (e.g., older vs. newer drugs) or clinical strategies to   recognize the variable quality of all data (whether pro-
      inform optimal therapy choices beyond placebo             spective or retrospective, or experimental or observa-
      comparators;                                              tional). The extent to which data provide good or bad
•     Estimates of the evolving risk–benefit profile of a       evidence depends on the research design, the quality of
      new intervention, including long-term (and rare)          the information collected, and how the data are used.
      clinical benefits and harms;                              The optimal solution will depend on the circum-
•     Examination of clinical outcomes in a diverse             stances. Decisions typically rely on multiple sources,
      study population that reflects the range and distri-      and are best thought of as conditional—to be revisited
      bution of patients observed in clinical practice;         as additional evidence is generated.
•     Results on a broader range of outcomes (e.g.,                Ongoing work in evidence hierarchies recognizes
      PROs, HRQoL, and symptoms) than have tradi-               this complexity. As noted, new approaches for grading
      tionally been collected in RCTs (i.e., major mor-         evidence depend on the quality of the evidence and the
      bidity and short-term mortality);                         magnitude of the net benefit. The approaches are more
•     Data on resource use for the costing of health-care       explicit about which studies provide stronger evidence
      services and economic evaluation;                         in support of the use of a particular health service or
•     Information on how a product is dosed and                 drug.
      applied in clinical practice and on levels of com-           In all likelihood, we need clinical trials that are
      pliance and adherence to therapy;                         more practical and “RW data” that are more statisti-
•     Data in situations where it is not possible to            cally rigorous from a design and analysis standpoint
      conduct an RCT (e.g., narcotic abuse);                    [26]. For the collection of economic data alongside
•     Substantiation of data collected in more con-             RCTs, this suggests that data collection efforts focus
      trolled settings;                                         on “big ticket” items, rather then trying to capture a
•     Data in circumstances where there is an urgency           complete picture of resource use. To ensure that the
      to provide reimbursement for some therapies               data collected in these trials are of high quality, a
      because it is the only therapy available and may be       greater emphasis is needed on centralized training of
      life-saving;                                              study coordinators and records abstractors and on the
•     Interim evidence—in the absence of RCT                    centralized quality control mechanisms.
      data—upon which preliminary decisions can be                 The high cost associated with data collection may
      made; and                                                 further necessitate linkage of trial data to claims data,
•     Data on the net clinical, economic, and PRO               which requires collection of sensitive patient identifiers
      impacts following implementation of coverage or           such as Social Security numbers. This may raise addi-
      payment policies or other health management               tional patient confidentiality concerns with Institu-
      programs (e.g., the kind of data CMS expects to           tional Review Boards and may require collaboration
      collect under its coverage with evidence develop-         and cooperation of a government agency, or private
      ment policy) [25].                                        insurers to ensure patient-level claims can be made
                                                                available.
Recognizing the Limitations of RW Data
We also recognize important limitations of RW data.             The Need for Good Research Practices for Collecting
For all nonrandomized data, the most significant                and Reporting RW Data
concern is the potential for bias. Retrospective or pro-        Our review underscores the need for good practices for
spective observational or database studies do not meet          collecting and reporting RW data. In terms of data
the methodological rigor of RCTs, despite the avail-            collection, it is important that efforts follow well-
ability of sophisticated statistical approaches to adjust       established research practices. These include posing
for selection bias in observational data (covariate             well-defined questions, specifying time frames for the
adjustment, propensity scores, instrumental variables,          duration of data collection, conducting periodic moni-
etc.). Observational studies need to be evaluated rig-          toring to ensure quality and responsiveness to research
orously to identify sources of bias and confounding,            questions, and limiting sample sizes to the minimum
and adjusted for these before estimating the impact             necessary. These good practices should also ensure that
of interventions on health outcomes. Observational              informed consent and human subject protections are in
or database studies may also require substantial                place.
resources.                                                         Registries, for example, should be carefully
                                                                planned, beginning with clear and achievable objec-
The Level of Evidence Required Depends on                       tives and extensive clinical input. A protocol or study
the Circumstance                                                plan should guide the conduct of the registry and the
The complexity of data collection underscores the fact          data analysis. Data collection tools should be carefully
that the level of evidence required in any circumstance         designed and tested, and site coordinators thoroughly
ISPOR Real-World Task Force Report                                                                                 333

trained to ensure data quality. Quality control mecha-       and value of RW data. Ideally, there will be a mecha-
nisms should be in place to identify and resolve study       nism for challenge and change, which contributes to
issues. There is also a need to balance research needs       democratic governance.
with privacy concerns as it becomes easier to link data
across multiple sources.                                     The Need to Consider the Costs and Benefits of
    Researchers should draw inferences from observa-         Data Collection
tional data with caution. For example, a widely recog-       Two critical questions are how much RW data should
nized challenge with the analysis of RW data is the          be collected and who should pay for it. Evidence costs
need to correct for sample selection bias—defined as         money. Inevitably, there are questions about whether
the bias introduced by unobserved variables that are         resources devoted to the endeavor are worthwhile.
correlated with both treatment selection and patient         There is a need to prioritize decisions about RW data
outcomes (e.g., unobserved illness severity). On the         such that the benefits of collecting additional informa-
other hand, it is less well-recognized that selection bias   tion are expected to outweigh the costs.
is just one of a variety of factors including missing           The tool of “value-of-information (VOI) analysis”
variables, measurement error, incorrect functional           offers a formal approach to deciding when and what
form and two-way causation (“structural simultane-           types of data to collect. Formal use of decision analysis
ity”) that can introduce bias into treatment estimates.      and VOI analysis can help determine whether an inter-
All of these problems have the common statistical            vention should be adopted, whether additional evi-
result of introducing a correlation between the treat-       dence to further inform that decision is worth
ment variable and the error term, which is the source        gathering, and what kind of information is of greatest
of the bias. Overcoming this “endogeneity bias” is the       value [30].
most common and vexing challenge that arises in the             The VOI analysis evaluates the extent to which new
analysis of RW data. Statistical tests for endogeneity       evidence might improve expected benefits by reducing
bias exist, along with methods for its correction            the chance for error, and compares that improvement
although operationalizing these methods is often a sig-      to the cost of the information. Evidentiary consider-
nificant challenge: in many circumstances, our ability       ations will depend on the particular circumstances of a
to minimize this bias is limited by the lack of exog-        decision (the consequences of an error, what can be
enous variation in the available data [27,28].               learned from additional evidence, how new knowledge
                                                             will change and improve the option identified as
Good Process in Using RW Data in Coverage and                optimal), rather than on predetermined specifications
Reimbursement Decisions                                      about the type of study design (e.g., RCT). Explicit
We also recognize the importance of the processes            consideration is given to the potential positive health
implemented by decision-makers in using RW data              outcomes forgone due to delays in coverage as well as
in coverage and reimbursement. Observers point to            the potential adverse consequences of too rapid uptake
several conditions for establishing good process,            when the risk–benefit ratio is highly uncertain.
including transparency (the decision and the rationale
for making them must be publicly accessible) and rel-        The Need for Modeling
evance (there must be a reasonable explanation for a         A previous ISPOR Task Force on modeling studies
decision’s rationale) [29]. They should also be fair in      emphasized that the purpose of modeling is to “struc-
the sense that RW data will be used in similar fashion       ture evidence on clinical and economic outcomes in a
across technologies, or if situations demand a different     form that can help inform decisions” [31]. They stated:
rationale, the circumstances or principles would be          “Models synthesize evidence on health consequences
known. Decisions should not be “bureaucratically             and costs from many different sources, including data
arbitrary,” or based on reasons that people do not view      from clinical trials, observational studies, insurance
as meaningful or just.                                       claim databases, case registries, public health statistics,
   Typically, the amount of RW data available for the        and preference surveys.”
initial coverage and reimbursement decision is very             Our field has adopted bioclinical cost-effectiveness
limited: instead, effectiveness projections are modeled      models as an integrative framework, incorporating
from efficacy data. Good process would encourage the         biological, clinical outcomes, and economic data into
subsequent gathering of RW data, in part by updating         a decision-making structure. Within this general
coverage and reimbursement decisions based on those          structure, there are several different types of possible
data.                                                        pharmacoeconomic evaluations, such as cost–
   Processes should also allow opportunity for stake-        consequences, cost–utility, and cost–benefit analyses.
holder participation. Different stakeholders (physi-         Such models and analyses are the primary vehicle for
cians, patients, family caregivers, payers, hospitals,       combining RCT and RW data to support coverage and
regulatory agencies, employers, manufacturers, and           reimbursement decision-making. At the same time,
researchers) may have different perspectives on the use      expanded use of RW data in assessing postlaunch cost-
334                                                                                                       Garrison et al.

effectiveness is needed to update the modeling pro-           site at http://www.ispor.org/workpaper/RWD_TF/
jections made to support initial coverage and                 MemberComments.pdf. One recurring theme was a
reimbursement decisions.                                      perceived need for more guidance on precisely how
   In terms of good research practice, the previous           decision-makers will actually use RW data after the
Task Force emphasized the need for transparency of            launch of a product. Other themes included calls for
assumptions, including the point that “conclusions are        more discussion of limitations of various aspects of
conditional upon the assumptions and data on which            RW data collection, calls for more attention to the uses
the model is built.”                                          of EHRs, and comments that our report could have
                                                              used more emphasis on international data collection
The Need for Ongoing Dialog                                   and on medical devices. Although we made numerous
Finally, our review highlights the need for ongoing           changes to this document in light of these comments,
stakeholder dialog on all of these issues. Implicit in        we still refer the interested reader to the comments on
much of our discussion is a central policy question           the ISPOR website, as several of the important broader
about the appropriate role of the public sector in pro-       themes and limitations they raise should help to define
ducing and judging evidence. Who collects and evalu-          the agenda for the next steps in this continuing inquiry.
ates RW data, which pays for these efforts, and what
magnitude of resources is provided are key questions
                                                              The members of the Task Force Working Groups contributed
for policymakers worldwide. There is no general regu-         significantly to the preparation of this report: Evidence Hier-
latory requirement for collecting RW data.                    archies: Joe Jackson, Phil Sarocco, Jennifer Elston-Lafata;
    In the United States, CMS has called for data that        Clinical Outcomes: Marc Berger, Bruce Carleton, Gurvaneet
reflect RW practice. In addition, FDA is requiring the        Randhawa, Anne Smith; Economic Outcomes: Bill Crown,
implementation of mandatory registries in instances           Jens Grueger, Penny Mohr, Les Noe; Patient-Reported Out-
where there is any concern regarding long-term safety         comes: Jamie Banks, Rick Chapman, Mary Cifaldi, Andreas
of a therapy. The opportunity for funding observa-            Pleil. The Task Force is grateful to Marilyn Dix Smith, and
tional studies of therapy is limited compared with tra-       Erin Gilli of ISPOR for administrative support. The views
ditional epidemiological studies and therapeutic trials.      expressed reflect those of Task Force members and do not
                                                              necessarily reflect the views of the broader ISPOR members,
It is critical that we have an intelligent and vigorous
                                                              nor their employers.
public discussion on these and other issues.

                                                              References
Conclusions
                                                               1 Medicare Prescription Drug, Improvement, and Mod-
Real-world data are essential for sound coverage,                ernization Act of 2003, Pub. L. no. 108-173, 117 Stat
payment, and reimbursement decisions. The types and              206 (2003).
applications of such are varied, and context matters           2 Agency for Healthcare Research and Quality. Effec-
greatly in determining the value of a particular type of         tive Heath Care Program. 2006. Available from:
evidence in any circumstance. Different study designs            http : // effectivehealthcare . ahrq . gov / aboutUS / index.
can provide useful information in different situations.          cfm [Accessed July 13, 2006].
RCTs remain the gold standard for demonstrating                3 Agency for Healthcare Research and Quality. AHRQ
clinical efficacy in restricted trial setting, but other         Launches Project to Develop Guide for Patient Regis-
designs—such as observational registries, claims data-           tries. Press release, December 7, 2005. Available from:
                                                                 http://www.ahrq.gov/news/press/pr2005/ptregpr.htm
bases, and practical clinical trials—can contribute to
                                                                 [Accessed December 21, 2006].
the evidence base needed for coverage and payment              4 Centers for Medicare and Medicaid Services. National
decisions.                                                       Coverage Determinations with Data Collection as a
   It is critical that policymakers recognize the benefits,      Condition of Coverage: Coverage with Evidence
limitations, and methodological challenges in using RW           Development. 2006. Available from: http://
data, and the need to carefully consider the costs and           www.cms .hhs. gov / mcd / ncpc _ view _ document . asp?
benefits of different forms of data collection in different      id=8 [Accessed December 21, 2006].
situations. In this report, we have attempted to identify      5 Neumann PJ. Evidence-based and value-based formu-
the salient issues in developing and using RW data. We           lary guidelines. Health Aff (Millwood) 2004;23:124–
acknowledge many challenges ahead and view this                  34.
effort as a starting point for future debates.                 6 Academy of Managed Care Pharmacy. AMCP Format
                                                                 for Formulary Submissions, Version 2.1. 2005. Avail-
                                                                 able from: http://www.amcp.org/amcp.ark?c=pr&sc=
Postscript                                                       link [Accessed July 13, 2006].
                                                               7 Oregon Health & Science University. Drug Effect-
Prior to finalizing this report, we reviewed more                iveness Review Project, 2006. Available from:
than 70 comments from ISPOR members on our                       http: // www.ohsu.edu / drugeffectiveness / reports/final.
draft report, which are accessible at the ISPOR web-             cfm [Accessed July 13, 2006].
ISPOR Real-World Task Force Report                                                                                    335

 8 National Institute for Health and Clinical Excellence.   19 Scottish Intercollegiate Guidelines Network. 2006.
   2006. Available from: http://www.nice.org.uk/ [Ac-          Available from: http://www.sign.ac.uk/ [Accessed July
   cessed July 13, 2006].                                      17, 2006].
 9 IQWIG. German Institut für Qualität und                  20 Harbour R, Miller J. A new system for grading rec-
   Wirtschaftlichkeit im Gesundheitswesen. 2006. Avail-        ommendations in evidence based guidelines. BMJ
   able from: http://www.iqwig.de/ [Accessed July 13,          2001;323:334–6.
   2006].                                                   21 American College of Cardiology Foundation.
10 Sackett DL, Rosenberg WM, Gray JA, et al. Evidence          Clinical Statements and Guidelines. 2006. Available
   based medicine: what it is and what it isn’t. BMJ           from: http://www.acc.org/qualityandscience/clinical/
   1996;312:71–2.                                              statements.htm [Accessed April 15, 2007].
11 Woolf SH. Evidence-based medicine and practice           22 O’Sullivan AK, Thompson D, Drummond MF. Col-
   guidelines: an overview. Cancer Control 2000;               lection of health-economic data alongside clinical
   7:362–7.                                                    trials: is there a future for piggyback evaluations?
12 Wild D, Grove A, Martin M, et al. Principles of             Value Health 2005;8:67–79.
   good practice for the translation and cultural adap-     23 Tunis SR, Stryer DB, Clancy CM. Practical clinical
   tation process for patient-reported outcomes (pro)          trials: increasing the value of clinical research for deci-
   measures: report of the ISPOR task force for trans-         sion making in clinical and health policy. JAMA
   lation and cultural adaptation. Value Health                2003;290:1624–32.
   2005;8:94–104.                                           24 Agency for Healthcare Research and Quality. Effec-
13 Acquadro C, Berzon R, Dubois D, et al. Incorporat-          tive HealthCare, DEcIDE: Registries for Evaluating
   ing the patient’s perspective into drug development         Patient Outcomes. 2006. Available from: http://
   and communication: an ad hoc task force report of           effectivehealthcare . ahrq . gov / decide / registryOutline.
   the Patient-Reported Outcomes (PRO) Harmoniza-              cfm [Accessed July 18, 2006].
   tion Group meeting at the Food and Drug Admin-           25 Centers for Medicare and Medicaid Services. Guidance
   istration, February 16, 2001. Value Health 2003;6:          for the Public, Industry, and CMS Staff: National
   522–31.                                                     Coverage Determinations with Data Collection as a
14 US Department of Health and Human Services, Food            Condition of Coverage: Coverage with Evidence
   and Drug Administration. Guidance for Industry              Development. 2005. Available from: http://www .
   Patient Reported Outcome Measures: Use in Medical           cms . hhs . gov / mcd / ncpc _ view _ document . asp? id = 8
   Product Development to Support Labeling Claims.             [Accessed July 23, 2006].
   2006. Available from: http://www.fda.gov/cber/gdlns/     26 Glasgow RE, Magid DJ, Beck A, et al. Practical clini-
   prolbl.pdf [Accessed July 13, 2006].                        cal trials for translating research to practice: design
15 Szende A, Leidy NK, Revicki D. Health-related               and measurement recommendations. Med Care 2005;
   quality of life and other patient-reported outcomes in      43:551–7.
   the European centralized drug regulatory process: a      27 D’Agostino RB Jr. Propensity score methods for bias
   review of guidance documents and performed autho-           reduction in the comparison of a treatment to a non-
   rizations of medicinal products 1995 to 2003. Value         randomized control group. Stat Med 1998;17:2265–
   Health 2005;8:534–48.                                       81.
16 Guyatt GH, Sackett DL, Sinclair JC, et al. Users’        28 Crown WH, Obenchain RL, Englehart L, et al. The
   guides to the medical literature. IX. A method for          application of sample selection models to outcomes
   grading health care recommendations. Evidence-              research: the case of evaluating the effects of antide-
   Based Medicine Working Group. JAMA 1995;                    pressant therapy on resource utilization. Stat Med
   274:1800–4.                                                 1998;17:1943–58.
17 United States Department of Health and Human Ser-        29 Daniels N, Sabin J. Setting Limits Fairly: Can We
   vices, Agency for Health Care Policy and Research.          Learn to Share Medical Resources? New York:
   Acute Pain Management: Operative or Medical                 Oxford University Press, 2002.
   Procedures and Trauma. 92-0023, 107. Rockville,          30 Claxton K, Cohen JT, Neumann PJ. When is evidence
   MD: AHCPR, 1993. Clinical Practical Guideline               sufficient? Health Aff (Millwood) 2005;24:93–101.
   No. 1.                                                   31 Weinstein MC, O’Brien B, Hornberger J, et al.
18 Harris RP, Helfand M, Woolf SH, et al. Current              Principles of good practice for decision analytic mod-
   methods of the US Preventive Services Task Force: a         eling in health-care evaluation: report of the ISPOR
   review of the process. Am J Prev Med 2001;20(3              Task Force on Good Research Practices—Modeling
   Suppl.):S21–35.                                             Studies. Value Health 2003;6:9–17.
You can also read