Using Real-World Data for Coverage and Payment Decisions: The ISPOR Real-World Data Task Force Report
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Volume 10 • Number 5 • 2007 VA L U E I N H E A LT H Using Real-World Data for Coverage and Payment Decisions: The ISPOR Real-World Data Task Force Report Louis P. Garrison Jr., PhD (cochair),1 Peter J. Neumann, ScD (cochair),2 Pennifer Erickson, PhD,3 Deborah Marshall, PhD,4 C. Daniel Mullins, PhD5 1 University of Washington, Seattle, WA, USA; 2Tufts-New England Medical Center, Boston, MA, USA; 3O.L.G.A., State College, PA, USA; 4 i3Innovus, Burlington, ON, Canada; 5University of Maryland, Baltimore, MD, USA A B S T R AC T Objectives: Health decision-makers involved with coverage according to the strength of research design), and by type of and payment policies are increasingly developing policies that data source (supplementary data collection alongside RCTs, seek information on “real-world” (RW) outcomes. Moti- large simple trials, patient registries, administrative claims vated by these initiatives, the International Society for Phar- database, surveys, and medical records). Our report discusses macoeconomics and Outcomes Research (ISPOR) created a eight key issues: 1) the importance of RW data; 2) limitations Task Force on Real-World Data to develop a framework to of RW data; 3) the fact that the level of evidence required assist health-care decision-makers in dealing with RW data, depends on the circumstance; 4) the need for good research especially related to coverage and payment decisions. practices for collecting and reporting RW data; 5) the need Methods: Task Force cochairs were selected by the ISPOR for good process in using RW data in coverage and reim- Board of Directors. Cochairs selected chairs for four working bursement decisions; 6) the need to consider costs and ben- groups on: clinical outcomes, economic outcomes, patient- efits of data collection; 7) the ongoing need for modeling; and reported outcomes, and evidence hierarchies. Task Force 8) the need for continued stakeholder dialogue on these members included representatives from academia, the phar- topics. maceutical industry, and health insurers. The Task Force met Conclusions: Real-world data are essential for sound cover- on several occasions, conducted frequent correspondence and age and reimbursement decisions. The types and applications exchanges of drafts, and solicited comments on three drafts of such data are varied, and context matters greatly in deter- from a core group of external reviewers and from the ISPOR mining the value of a particular type in any circumstance. It membership. is critical that policymakers recognize the benefits, limita- Results: We defined RW data as data used for decision- tions, and methodological challenges in using RW data, and making that are not collected in conventional randomized the need to consider carefully the costs and benefits of differ- controlled trials (RCTs). We considered several characteriza- ent forms of data collection in different situations. tions: by type of outcome (clinical, economic, and patient- Keywords: methodology, outcomes research, real-world reported), by hierarchies of evidence (which rank evidence data, research design. Why a Real-World Data Task Force? by regulatory authorities for marketing approval. It is broadly acknowledged that while RCTs provide a Growing Use of Evidence Syntheses and “gold standard” in the sense that they provide solid Outcomes Research evidence of product efficacy under carefully controlled Health decision-makers involved with coverage and conditions, RCTs are carried out using selected popu- payment policies are increasingly seeking information lations under idealized conditions. In addition, they on “real-world” (RW) outcomes on which to base their are expensive to conduct. Other sources of data can decisions. Many of them are developing policies that contribute in important ways to the evidence base integrate evidence from different sources. These poli- (e.g., demonstrating how a drug works in populations cies recognize the importance of evidence that goes or under conditions not studied in the trial, or relative beyond information collected during clinical develop- to another drug not included in the study). ment in randomized controlled trials (RCTs) required Address correspondence to: Louis P. Garrison, Jr., Department of Policy Developments Pharmacy, University of Washington, Health Sciences Building, H375, Box 357630, Seattle, WA 98195, USA. E-mail: Recent policy initiatives highlight payers’ attempts to lgarrisn@u.washington.edu collect and use such data. The Medicare Modernization 10.1111/j.1524-4733.2007.00186.x Act (MMA) of 2003 illustrates the US government’s 326 © 2007, International Society for Pharmacoeconomics and Outcomes Research (ISPOR) 1098-3015/07/326 326–335
ISPOR Real-World Task Force Report 327 attempt to bolster the evidence base on which private RW health-care decision-making, especially related to health plans participating in the new Medicare drug coverage and payment decisions. benefit can base their coverage decisions. The MMA contains a provision (Section 1013) calling on the Agency for Healthcare Research and Quality (AHRQ) Scope to conduct research on the “outcomes, comparative clinical effectiveness, and appropriateness of health Defining RW data. From the outset, the Task Force care, including prescription drugs” [1]. The AHRQ has grappled with the definition and appropriate charac- recently launched the Effective Health Care Program to terization of RW data. It seemed self-evident that RW synthesize, generate, and translate knowledge to aid outcomes data should come from RW situations. Part stakeholders in grappling with the often difficult deci- of the Task Force’s charge was to consider uses and sions they must make [2]. The AHRQ has also limitations of evidence not obtained from RCTs. On announced that it is developing a “how-to” reference the other hand, it was also clear that decision-making guide to help health-care organizations in creating is a highly integrative process of synthesizing informa- patient registries to track the outcomes of medical tion from different sources—both “laboratory” and treatments, including drugs [3]. The US Centers for real-world. Medicare and Medicaid Services (CMS) has also Some Task Force members questioned the appropri- recently issued a revised Coverage with Evidence Devel- ateness of the term “RW data” in the first place. In the opment Guidance document that focuses on the need end, we decided to adhere to the term, not only for RW data to help inform national coverage decisions because it reflected the charge from the ISPOR Board, for new technology [4]. Many private payers are adopt- but also because the term has gained currency in some ing the Academy of Managed Care Pharmacy Format, policy circles. which calls for health plans to request formally that We settled on a definition that reflects data used for drug companies present a standardized “dossier,” con- decision-making that are not collected in conventional taining detailed information not only on the drug’s RCTs. This is not to say that data from RCTs are efficacy and safety, but also on its projected effectiveness irrelevant or not used by decision-makers; indeed, they and economic value relative to alternative therapies remain the critical foundation for almost all initial [5,6]. coverage and payment decisions. For if there is not a Other efforts to review evidence systematically have belief in the plausibility of the underlying biological also gained momentum, including those by private and mechanism or hypothesis, why should anyone seek public health plans. A notable initiative in the United further evidence of effectiveness or cost impact in the States is the Drug Effectiveness Review Project real world? Yet, efficacy evidence in a particular group (DERP), an alliance of 15 state Medicaid programs or subgroup is typically insufficient to project the size and two nonprofit organizations, to synthesize and of the effectiveness impact in the population that judge clinical evidence for drug class reviews [7]. would actually use a product. Decision-makers there- Around the globe, national reimbursement authorities fore seek additional types and sources of data. and the organizations that assist them, such as the National Institute for Health and Clinical Excellence Data versus evidence. Our Task Force also deliberated (NICE), and the German Institut für Qualität und distinctions between the terms “real-world data” and Wirtschaftlichkeit im Gesundheitswesen (IQWiG), are “real-world evidence.” Some in our group favored the struggling with similar issues [8,9]. Both DERP and latter term, or at least raised questions about whether IQWiG emphasize RCTs for clinical evidence as we meant evidence when we employed the term data. opposed to synthetic modeling approaches, such as The notion was that “data” conjures the idea of simple those used by NICE to project RW effectiveness. factual information, whereas “evidence” connotes the organization of the information to inform a conclusion or judgment. Evidence is generated according to a Task Force Objectives and Scope research plan and interpreted accordingly, whereas data is but one component of the research plan. Evi- Objectives dence is shaped, while data simply are raw materials Motivated by the MMA and other efforts, and recog- and alone are noninformative. nizing the lack of a framework for considering RW In the end, we adhered to the term “real-world data in coverage and reimbursement decisions, the data” for reasons noted above. Nevertheless, we try to International Society for Pharmacoeconomics and remain sensitive to the distinctions throughout, and Outcomes Research (ISPOR) created a Task Force on emphasize that thorough knowledge of the quality of Real-World Data. The mission of the Task Force was the data source is necessary when deciding whether a to develop a framework to assist health-care decision- specific set of data can be used to answer a research makers in dealing with RW data and information in question or shed light on a decision.
328 Garrison et al. Focus on coverage and payment decisions. We focus core group of external reviewers and more broadly our report on coverage and payment (i.e., both pricing from the membership of ISPOR. and reimbursement) decisions. This sets it apart from other contexts, such as direct patient care or regulatory approval for marketing. Types and Sources of RW Data We recognize the tension that sometimes arises Characterizing RW Data between what is good for an individual patient and There are several ways in which one might characterize what is good for the population at large. Evidence- RW data. One is by type of outcome: clinical, eco- based medicine (EBM) has been defined as “the con- nomic, and patient-reported outcomes (the focus of scientious, explicit and judicious use of current best three of our working groups). An advantage of this evidence in making decisions about the care of indi- approach is that it corresponds to the way in which vidual patients” [10]. While physicians retain a pivotal many decision-makers conceive of data. A downside is role in patient care decisions, other bodies such as that it provides broad categories, each of which com- pharmacy and therapeutic (P&T) committees and bines many types and sources of evidence. national reimbursement bodies (e.g., in Canada, Aus- A second characterization involves traditional hier- tralia, and most European countries) are taking on archies of evidence (the focus of our fourth working greater importance in medical decision-making. group), which rank evidence according to the strength of the research design. Typically, data from RCTs sit Drugs versus other interventions. While most of our atop the hierarchy followed by data from nonrandom- examples are drawn from the world of prescription ized intervention studies, followed by epidemiological drugs, our report has implications for all types of studies and so forth [11]. Evidence hierarchies provide interventions, including drugs, devices, procedures, a useful ranking based on the rigor of the research and health programs. design; however, they do not provide a complete picture of RW data. The results from many RCTs are US versus global focus. The motivation for the Task not generalizable to a broader population. Conversely, Force came largely from the MMA and US policy a well-conducted observational study may prove arena, but debates about types and sources of data are, highly useful in certain situations provided that poten- of course, present around the world—as suggested in tial biases have been adequately addressed. Indeed, the different approaches taken, for example, by NICE some would argue that observational data can often and IQWiG. Although many of our examples are taken provide more relevant evidence regarding patient out- from US contexts, our findings are global in reach. comes in actual clinical practice than can a registration RCT. Finally, one might consider RW data by types of Task Force Participation data sources. The value of this classification is that it Task Force cochairs were selected by the ISPOR identifies tangible sources of information. A potential Board of Directors. Cochairs selected chairs for four drawback is that it represents a simplification that does working groups on: clinical outcomes, economic out- not capture important design issues within each source comes, patient-reported outcomes (PROs), and evi- of evidence. dence hierarchies. The first three groups correspond to Each of the three characterizations provides a dif- the three major areas of outcomes research that con- ferent perspective on RW data. Collectively, we believe tribute to the studies of comparative effectiveness and they provide a useful portrait of the strengths, weak- economic evaluation sought by decision-makers. The nesses, and complexities inherent in the topic. working group on evidence hierarchies addresses the cross-cutting, foundational issue of the quality of evi- Types of Outcomes dence. This paper reflects an integration of the various working group reports. Clinical outcomes. Clinical outcomes include bio- Task Force members included representatives from logical measures of morbidity (e.g., blood pressure, academia, the pharmaceutical industry, and health cholesterol level, symptoms, and side effects) and mor- insurance plans, and brought varied experiences in tality. Clinical outcomes include both surrogate (inter- using RW data. An outline and draft report developed mediate) and long-term measures. For purposes of this by the panel were presented at the 2005 International discussion, we differentiate clinical outcomes from and European ISPOR Meetings. The manuscript was health outcomes and especially PROs and health- then submitted to a reference group of interested related quality of life (HRQoL). ISPOR members for review and comment. The Task Much of the data collected in phase III registration Force met on several occasions, conducted frequent trials involves clinical outcomes. Clinical outcome data correspondence and exchanges of drafts by electronic are also found in many other sources, such as patient mail, and solicited comments on three drafts from a registries or observational databases. A key issue, dis-
ISPOR Real-World Task Force Report 329 cussed in greater detail in Section 3.4, involves the movement is to ground clinical practice in rigorous validity of clinical data from such sources. research. EBM proponents emphasize that traditional medical practice incorporated local practices and Economic outcomes. “Economic outcomes” are nar- expert opinion that were not tested in controlled rowly defined here to include estimates of medical and studies [11]. They stress the need for clinical research- nonmedical resource utilization and their associated ers to document all study protocols, utilize appropriate costs. Such data are used to project the expected cost analytical techniques, and strive for internal consis- of an intervention in the real world—e.g., in the tency [16]. Studies are to be considered externally valid numerator of a cost-effectiveness ratio. As discussed when findings are generalizable beyond local clinical below, many sources of RW data are useful in provid- practices. A scientific body of evidence became reliable ing use and cost information. Of course, we recognize and generalizable when similar results were reported that the term economic outcomes can be more broadly by different researchers across a range of study designs construed since economic evaluations consider both and patient populations. For these reasons, RCTs were benefits and costs, but keep this narrow definition in placed at the top of the evidence hierarchy. the interest of continuity with the original charter of Decision-makers, however, quickly recognized the the Task Force. impracticality of basing all of medicine on RCTs. For Patient-reported outcomes/quality of life. Patient- one thing, RCTs are expensive. For another, even the reported outcome is the term adopted by the Food and best RCT reflects a limited controlled experiment that Drug Administration (FDA) and internationally to may not generalize to populations, settings, or condi- encompass any report coming directly from patients tions not reflected in the trial. The need for non-RCT about a health condition and its treatment, including information became apparent, raising the question of symptoms, functional status, HRQoL, treatment satis- how to grade information that by definition was of faction, preference, and adherence. Researchers have “poorer quality.” long recognized that self-reports of outcomes related A number of groups have developed evidence hier- to disease, injury, treatment or policy are important archies over the years that reflect the primacy of data because they provide the only direct voice that an from RCTs, and grade other types of evidence by the individual has in the health decision-making process. rigor of the research design. For example, the hierar- This information has been widely used for decision- chy adopted by AHRQ grades evidence in order from making in the United States since the mid-1950s with most to least rigorous as follows [17]: 1) systematic the establishment of the annual national health reviews and meta-analyses of RCTs; 2) nonrandom- surveys; other countries have subsequently developed ized intervention studies; 3) observational studies; 4) similar data collection mechanisms. The CMS adop- nonexperimental studies; and 5) expert opinion. tion of the Medicare Health Outcome Survey is recog- The U.S. Preventive Services Task Force (USPSTF) nition of the importance of PRO data for under- took the approach a step further recognizing that standing outcomes of the Medicare program. As decision-makers require information not only on the people live longer with chronic conditions, PROs have rigor of the research design but also on the magnitude become increasingly important to pharmaceutical of the net benefit in support of a particular technology manufacturers in assessing the impact of emerging or health service [18]. Recent work developed by the chronic treatments and in communicating the benefits Scottish Intercollegiate Guidelines Network and the of these drug treatments in label and promotional Oxford Center for Evidence-Based Medicine also rec- claims. ognized limitations of traditional grading systems [19]. Interested parties have actively debated diverse con- Strict use of evidence hierarchies may not account for ceptual and methodological issues related to all types the methodological quality of studies or may fail to of PROs, resulting in the publication of various “best reflect the overall strength of the evidence base. Users practices” documents [12,13]. The level of under- may misinterpret the grade of recommendation or they standing of both the scientific rigor and its application may fail to properly weigh lower-grade recommenda- to the reporting of these subjective measures has tions [20]. Other groups maintain their own evidence improved. Like any science, however, questions grading and classification systems that combine judg- remain. FDA and regulatory agencies globally have ments about evidence quality with judgments about recently developed statements to guide the develop- the usefulness of the intervention [21]. ment and use of these measures, especially by the phar- maceutical industry in the drug approval process [14,15]. Sources of RW Data Real-world data can also be categorized by type of Evidence Hierarchies data source. Our Task Force defined six such sources: Historically, evidence hierarchies have been linked to 1) supplements to traditional registration RCTs; 2) “evidence-based medicine.” The thrust of the EBM large simple trials (also called practical clinical trials);
330 Garrison et al. 3) registries; 4) administrative data; 5) health surveys; clinical trials are: identification of where subjects and 6) electronic health records (EHRs) and medical receive care may be more difficult (less in a closed chart reviews. system); data collection systems of community-based settings may be less sophisticated than those of aca- Supplements to RCTs. To provide additional data demic settings (e.g., more likely to use paper rather alongside standard clinically focused RCTs, research- than electronic records, thus increasing the likelihood ers often gather information on variables such as of data entry errors); there is more likely to be a lack of PROs, medical resource use, and costs. Such efforts standardization in financial and billing systems across can add valuable evidence on treatment patterns for different settings of care; and more study coordinators common events, e.g., such as the doses of drugs used to will be involved in the data collection effort. treat rejection in kidney transplantation [22]. Limitations to such data are also well-known: their Registries. Registries are prospective, observational primary aim is to measure a key clinical efficacy end cohort studies of patients who have a particular point in a carefully limited population and clinical disease and/or are receiving a particular treatment setting. Furthermore, trials are not usually powered or intervention. They can be used for understanding statistically to measure precisely the probability of rare natural history, assessing or monitoring RW safety and adverse or other events and hence are of limited use effectiveness, assessing quality of care and provider in measuring the associated resource utilization and performance, and assessing cost-effectiveness [24]. costs. RCTs are generally conducted over a shorter Registries involve prospective data collection of time frame than what is relevant for determining the clinical, economic, and PRO information, and are overall clinical and economic impact of an interven- increasingly relying on real-time data capture. They tion, and resource use is often protocol-driven. typically include a larger and more diverse group of patients than what is generally studied in phase III Large simple trials. Large simple trials (also called RCTs; therefore, they better reflect RW patients, man- practical or pragmatic clinical trials) involve prospec- agement practices, and outcomes. Patients are often tive, randomized assignment but aimed at larger more followed over a longer time frame, allowing for an diverse RW population [23]. Large simple trials have assessment of longer-term outcomes. Most registries the important strength of randomization, which mini- have very few, if any, required visits, evaluations, or mizes bias in the estimation of treatment effects. These procedures; therefore, the treatment patterns reflect trials are by design larger than conventional RCTs. For the everyday clinical decision-making that is most rel- this reason, they are more likely to have sufficient evant to providers and payers. Disease registries enable power to capture significant differences in key out- providers and payers to gain insight into the most comes of interest, such as hospitalizations. cost-effective treatment approaches. Because the focus is on obtaining policy-relevant Because registries do not involve random assign- outcomes, costs and cost-effectiveness are more likely ment to treatment, care must be taken in analyzing and to be central end points, and the results can be more interpreting the results due to the inherent limitations readily generalized to the relevant treatment popula- of observational studies. There is no guarantee that tion than those obtained from conventional RCTs: patient groupings are comparable; therefore, registries costs are less likely to reflect protocol-driven health- may not be suitable to test hypotheses, but are useful care use; well-documented variations in resource use to generate them. Furthermore, there are limitations in across various ethnic, racial, age groups, and sexes can terms of the amount of data that can be collected, and be better captured by opening the trial to a more because visit schedules are not required, data cannot diverse population; people more at risk for adverse necessarily be obtained at fixed intervals. Registries events are less likely to be excluded from the trial, and sometimes include study sites that are not experienced the related economic effects are more likely to be cap- in conducting research, and without appropriate over- tured; and resource use and costs are more likely to sight, data integrity could be in question. Nevertheless, reflect those observed in community-based settings the use of real-time data capture is likely to improve where most people obtain their care, especially since data monitoring and integrity. Registries are, in some study drugs in phase III trials are generally provided cases, established to collect postmarketing safety data, for free. either in response to specific safety concerns or to fulfill Nevertheless, the large size of a practical clinical regulatory obligations established as a condition of trial increases the cost of data collection and raises marketing approval. some concerns about the quality of data collected. Costs are increased not only because a larger number Administrative data. Administrative data (typically of patients are enrolled, but also because a larger retrospective or real-time, if possible) are collected pri- number of settings are involved. Some of the issues marily for reimbursement, but contain some clinical raised by economic data collection within practical diagnosis and procedure use with detailed information
ISPOR Real-World Task Force Report 331 on charges. Claims databases lend themselves to retro- not just those who are participating in a given RCT, or spective longitudinal and cross-sectional analyses of members of a particular health plan. As a result, health clinical and economic outcomes at patient, group, or survey data can make unique contributions about gen- population levels. Such analyses can be performed at eralizability of treatments and their impacts and about low overall cost and in a short period of time. Given use of and expenditures for health services. the sheer size of claims databases, researchers can iden- The major limitation of health survey data for tify outcomes of patients with rare events more easily, initial coverage and reimbursement decisions is the assess economic impact of various interventions, and lack on relevant data on specific products. Survey data gain insight into possible association between interven- are also subject to issues of subjectivity and recall bias. tions and outcomes. Administrative claims databases can prove very Electronic health records and medical chart review. useful in measuring resource use and costs, provided Finally, we note that EHRs (and other technologies some basic principles are met. A clear research ques- capturing real-time clinical treatment and outcomes) tion needs to be defined and addressed by an appro- are important sources for RW data for a wide range priate design from a well-defined perspective. Available of clinical settings throughout the world. The expan- statistical tools can be used to help control for some of sion of electronic data capture is essentially lowering the potential biases. Methods and results should be the cost of the medical chart reviews that have been reported in a clear and transparent fashion, so that widely used in the past to produce specific informa- other researchers are able to understand and reproduce tion on the RW use of specific tests or drugs for the analyses. particular conditions. EHRs—such as the UK General It is worth noting that with appropriate Institu- Practice Research Database—contain more detailed, tional Review Board (IRB) approvals and patient longitudinal information including disease-specific consent, some health plan databases can be linked with symptoms at the personal level and should greatly PROs, lab results, medical records, and physician expand the use of this type of information. Neverthe- surveys—effectively becoming a blend of traditional less, transforming the information for research pur- registries and claims data. poses requires high-end statistical analysis tools and Beyond challenges posed by privacy issues, the remains a challenge. validity of retrospective claims database analyses has been challenged on several fronts: data quality (missing data, coding errors—whether random or Key Findings “intended”—and the lack of comprehensive data across health-care settings); the lack of or very limited Recognizing the Importance of RW Data clinical information on inpatient stays, health out- We conclude with a strong affirmation of the need for comes, health status, and symptoms; limited valida- RW data. As we have emphasized, RCTs have many tion; absence of a population denominator; and the advantages: their prospective design, prespecified well- lack of distinction between costs and charges. Of defined end points, randomization and control groups, course, the large size of these databases may be able to and blinding all work to provide unbiased measures of overcome the issue of missing data if they are missing impact in the trial population; however, this strong at random. If data quality can be ascertained and internal validity can limit their external validity and privacy issues addressed, then treatment selection bias generalizability about which interventions work best in the sample is the most common and challenging when implemented in different settings. methodological issue. Estimates of the effects and costs Decision-makers rely on multiple sources of RW can be biased because of a correlation between unob- data that must be integrated or synthesized in some served factors associated with treatment selection and fashion. While RCTs remain the gold standard for outcomes, such as baseline health status. demonstrating clinical efficacy in restricted trial set- tings, other designs contribute to the evidence base. In Health surveys. Health surveys are designed to collect some situations, RW data may provide clear advantage descriptions of health status and well-being, health- for understanding outcomes of treatment, for example, care utilization, treatment patterns, and health-care for patients excluded from trials, patients in actual expenditures from patients, providers, or individuals clinical practice settings (vs. research settings), and in the general population. Health surveys typically patients whose treatment is not determined by trial collect information on representative individuals in protocol or practice guidelines. the target population, whether patients, physicians or Among the benefits of RW data is that they can general population, and are methodologically rigor- provide: ous, for example, relying on complex sample survey designs. With these designs, surveys can provide infor- • Estimates of effectiveness rather than efficacy in a mation about all members of the target population, variety of typical practice settings;
332 Garrison et al. • Comparison of multiple alternative interventions will relate to the question at hand. It is important to (e.g., older vs. newer drugs) or clinical strategies to recognize the variable quality of all data (whether pro- inform optimal therapy choices beyond placebo spective or retrospective, or experimental or observa- comparators; tional). The extent to which data provide good or bad • Estimates of the evolving risk–benefit profile of a evidence depends on the research design, the quality of new intervention, including long-term (and rare) the information collected, and how the data are used. clinical benefits and harms; The optimal solution will depend on the circum- • Examination of clinical outcomes in a diverse stances. Decisions typically rely on multiple sources, study population that reflects the range and distri- and are best thought of as conditional—to be revisited bution of patients observed in clinical practice; as additional evidence is generated. • Results on a broader range of outcomes (e.g., Ongoing work in evidence hierarchies recognizes PROs, HRQoL, and symptoms) than have tradi- this complexity. As noted, new approaches for grading tionally been collected in RCTs (i.e., major mor- evidence depend on the quality of the evidence and the bidity and short-term mortality); magnitude of the net benefit. The approaches are more • Data on resource use for the costing of health-care explicit about which studies provide stronger evidence services and economic evaluation; in support of the use of a particular health service or • Information on how a product is dosed and drug. applied in clinical practice and on levels of com- In all likelihood, we need clinical trials that are pliance and adherence to therapy; more practical and “RW data” that are more statisti- • Data in situations where it is not possible to cally rigorous from a design and analysis standpoint conduct an RCT (e.g., narcotic abuse); [26]. For the collection of economic data alongside • Substantiation of data collected in more con- RCTs, this suggests that data collection efforts focus trolled settings; on “big ticket” items, rather then trying to capture a • Data in circumstances where there is an urgency complete picture of resource use. To ensure that the to provide reimbursement for some therapies data collected in these trials are of high quality, a because it is the only therapy available and may be greater emphasis is needed on centralized training of life-saving; study coordinators and records abstractors and on the • Interim evidence—in the absence of RCT centralized quality control mechanisms. data—upon which preliminary decisions can be The high cost associated with data collection may made; and further necessitate linkage of trial data to claims data, • Data on the net clinical, economic, and PRO which requires collection of sensitive patient identifiers impacts following implementation of coverage or such as Social Security numbers. This may raise addi- payment policies or other health management tional patient confidentiality concerns with Institu- programs (e.g., the kind of data CMS expects to tional Review Boards and may require collaboration collect under its coverage with evidence develop- and cooperation of a government agency, or private ment policy) [25]. insurers to ensure patient-level claims can be made available. Recognizing the Limitations of RW Data We also recognize important limitations of RW data. The Need for Good Research Practices for Collecting For all nonrandomized data, the most significant and Reporting RW Data concern is the potential for bias. Retrospective or pro- Our review underscores the need for good practices for spective observational or database studies do not meet collecting and reporting RW data. In terms of data the methodological rigor of RCTs, despite the avail- collection, it is important that efforts follow well- ability of sophisticated statistical approaches to adjust established research practices. These include posing for selection bias in observational data (covariate well-defined questions, specifying time frames for the adjustment, propensity scores, instrumental variables, duration of data collection, conducting periodic moni- etc.). Observational studies need to be evaluated rig- toring to ensure quality and responsiveness to research orously to identify sources of bias and confounding, questions, and limiting sample sizes to the minimum and adjusted for these before estimating the impact necessary. These good practices should also ensure that of interventions on health outcomes. Observational informed consent and human subject protections are in or database studies may also require substantial place. resources. Registries, for example, should be carefully planned, beginning with clear and achievable objec- The Level of Evidence Required Depends on tives and extensive clinical input. A protocol or study the Circumstance plan should guide the conduct of the registry and the The complexity of data collection underscores the fact data analysis. Data collection tools should be carefully that the level of evidence required in any circumstance designed and tested, and site coordinators thoroughly
ISPOR Real-World Task Force Report 333 trained to ensure data quality. Quality control mecha- and value of RW data. Ideally, there will be a mecha- nisms should be in place to identify and resolve study nism for challenge and change, which contributes to issues. There is also a need to balance research needs democratic governance. with privacy concerns as it becomes easier to link data across multiple sources. The Need to Consider the Costs and Benefits of Researchers should draw inferences from observa- Data Collection tional data with caution. For example, a widely recog- Two critical questions are how much RW data should nized challenge with the analysis of RW data is the be collected and who should pay for it. Evidence costs need to correct for sample selection bias—defined as money. Inevitably, there are questions about whether the bias introduced by unobserved variables that are resources devoted to the endeavor are worthwhile. correlated with both treatment selection and patient There is a need to prioritize decisions about RW data outcomes (e.g., unobserved illness severity). On the such that the benefits of collecting additional informa- other hand, it is less well-recognized that selection bias tion are expected to outweigh the costs. is just one of a variety of factors including missing The tool of “value-of-information (VOI) analysis” variables, measurement error, incorrect functional offers a formal approach to deciding when and what form and two-way causation (“structural simultane- types of data to collect. Formal use of decision analysis ity”) that can introduce bias into treatment estimates. and VOI analysis can help determine whether an inter- All of these problems have the common statistical vention should be adopted, whether additional evi- result of introducing a correlation between the treat- dence to further inform that decision is worth ment variable and the error term, which is the source gathering, and what kind of information is of greatest of the bias. Overcoming this “endogeneity bias” is the value [30]. most common and vexing challenge that arises in the The VOI analysis evaluates the extent to which new analysis of RW data. Statistical tests for endogeneity evidence might improve expected benefits by reducing bias exist, along with methods for its correction the chance for error, and compares that improvement although operationalizing these methods is often a sig- to the cost of the information. Evidentiary consider- nificant challenge: in many circumstances, our ability ations will depend on the particular circumstances of a to minimize this bias is limited by the lack of exog- decision (the consequences of an error, what can be enous variation in the available data [27,28]. learned from additional evidence, how new knowledge will change and improve the option identified as Good Process in Using RW Data in Coverage and optimal), rather than on predetermined specifications Reimbursement Decisions about the type of study design (e.g., RCT). Explicit We also recognize the importance of the processes consideration is given to the potential positive health implemented by decision-makers in using RW data outcomes forgone due to delays in coverage as well as in coverage and reimbursement. Observers point to the potential adverse consequences of too rapid uptake several conditions for establishing good process, when the risk–benefit ratio is highly uncertain. including transparency (the decision and the rationale for making them must be publicly accessible) and rel- The Need for Modeling evance (there must be a reasonable explanation for a A previous ISPOR Task Force on modeling studies decision’s rationale) [29]. They should also be fair in emphasized that the purpose of modeling is to “struc- the sense that RW data will be used in similar fashion ture evidence on clinical and economic outcomes in a across technologies, or if situations demand a different form that can help inform decisions” [31]. They stated: rationale, the circumstances or principles would be “Models synthesize evidence on health consequences known. Decisions should not be “bureaucratically and costs from many different sources, including data arbitrary,” or based on reasons that people do not view from clinical trials, observational studies, insurance as meaningful or just. claim databases, case registries, public health statistics, Typically, the amount of RW data available for the and preference surveys.” initial coverage and reimbursement decision is very Our field has adopted bioclinical cost-effectiveness limited: instead, effectiveness projections are modeled models as an integrative framework, incorporating from efficacy data. Good process would encourage the biological, clinical outcomes, and economic data into subsequent gathering of RW data, in part by updating a decision-making structure. Within this general coverage and reimbursement decisions based on those structure, there are several different types of possible data. pharmacoeconomic evaluations, such as cost– Processes should also allow opportunity for stake- consequences, cost–utility, and cost–benefit analyses. holder participation. Different stakeholders (physi- Such models and analyses are the primary vehicle for cians, patients, family caregivers, payers, hospitals, combining RCT and RW data to support coverage and regulatory agencies, employers, manufacturers, and reimbursement decision-making. At the same time, researchers) may have different perspectives on the use expanded use of RW data in assessing postlaunch cost-
334 Garrison et al. effectiveness is needed to update the modeling pro- site at http://www.ispor.org/workpaper/RWD_TF/ jections made to support initial coverage and MemberComments.pdf. One recurring theme was a reimbursement decisions. perceived need for more guidance on precisely how In terms of good research practice, the previous decision-makers will actually use RW data after the Task Force emphasized the need for transparency of launch of a product. Other themes included calls for assumptions, including the point that “conclusions are more discussion of limitations of various aspects of conditional upon the assumptions and data on which RW data collection, calls for more attention to the uses the model is built.” of EHRs, and comments that our report could have used more emphasis on international data collection The Need for Ongoing Dialog and on medical devices. Although we made numerous Finally, our review highlights the need for ongoing changes to this document in light of these comments, stakeholder dialog on all of these issues. Implicit in we still refer the interested reader to the comments on much of our discussion is a central policy question the ISPOR website, as several of the important broader about the appropriate role of the public sector in pro- themes and limitations they raise should help to define ducing and judging evidence. Who collects and evalu- the agenda for the next steps in this continuing inquiry. ates RW data, which pays for these efforts, and what magnitude of resources is provided are key questions The members of the Task Force Working Groups contributed for policymakers worldwide. There is no general regu- significantly to the preparation of this report: Evidence Hier- latory requirement for collecting RW data. archies: Joe Jackson, Phil Sarocco, Jennifer Elston-Lafata; In the United States, CMS has called for data that Clinical Outcomes: Marc Berger, Bruce Carleton, Gurvaneet reflect RW practice. In addition, FDA is requiring the Randhawa, Anne Smith; Economic Outcomes: Bill Crown, implementation of mandatory registries in instances Jens Grueger, Penny Mohr, Les Noe; Patient-Reported Out- where there is any concern regarding long-term safety comes: Jamie Banks, Rick Chapman, Mary Cifaldi, Andreas of a therapy. The opportunity for funding observa- Pleil. The Task Force is grateful to Marilyn Dix Smith, and tional studies of therapy is limited compared with tra- Erin Gilli of ISPOR for administrative support. The views ditional epidemiological studies and therapeutic trials. expressed reflect those of Task Force members and do not necessarily reflect the views of the broader ISPOR members, It is critical that we have an intelligent and vigorous nor their employers. public discussion on these and other issues. References Conclusions 1 Medicare Prescription Drug, Improvement, and Mod- Real-world data are essential for sound coverage, ernization Act of 2003, Pub. L. no. 108-173, 117 Stat payment, and reimbursement decisions. The types and 206 (2003). applications of such are varied, and context matters 2 Agency for Healthcare Research and Quality. Effec- greatly in determining the value of a particular type of tive Heath Care Program. 2006. Available from: evidence in any circumstance. Different study designs http : // effectivehealthcare . ahrq . gov / aboutUS / index. can provide useful information in different situations. cfm [Accessed July 13, 2006]. RCTs remain the gold standard for demonstrating 3 Agency for Healthcare Research and Quality. AHRQ clinical efficacy in restricted trial setting, but other Launches Project to Develop Guide for Patient Regis- designs—such as observational registries, claims data- tries. Press release, December 7, 2005. Available from: http://www.ahrq.gov/news/press/pr2005/ptregpr.htm bases, and practical clinical trials—can contribute to [Accessed December 21, 2006]. the evidence base needed for coverage and payment 4 Centers for Medicare and Medicaid Services. National decisions. Coverage Determinations with Data Collection as a It is critical that policymakers recognize the benefits, Condition of Coverage: Coverage with Evidence limitations, and methodological challenges in using RW Development. 2006. Available from: http:// data, and the need to carefully consider the costs and www.cms .hhs. gov / mcd / ncpc _ view _ document . asp? benefits of different forms of data collection in different id=8 [Accessed December 21, 2006]. situations. In this report, we have attempted to identify 5 Neumann PJ. Evidence-based and value-based formu- the salient issues in developing and using RW data. We lary guidelines. Health Aff (Millwood) 2004;23:124– acknowledge many challenges ahead and view this 34. effort as a starting point for future debates. 6 Academy of Managed Care Pharmacy. AMCP Format for Formulary Submissions, Version 2.1. 2005. Avail- able from: http://www.amcp.org/amcp.ark?c=pr&sc= Postscript link [Accessed July 13, 2006]. 7 Oregon Health & Science University. Drug Effect- Prior to finalizing this report, we reviewed more iveness Review Project, 2006. Available from: than 70 comments from ISPOR members on our http: // www.ohsu.edu / drugeffectiveness / reports/final. draft report, which are accessible at the ISPOR web- cfm [Accessed July 13, 2006].
ISPOR Real-World Task Force Report 335 8 National Institute for Health and Clinical Excellence. 19 Scottish Intercollegiate Guidelines Network. 2006. 2006. Available from: http://www.nice.org.uk/ [Ac- Available from: http://www.sign.ac.uk/ [Accessed July cessed July 13, 2006]. 17, 2006]. 9 IQWIG. German Institut für Qualität und 20 Harbour R, Miller J. A new system for grading rec- Wirtschaftlichkeit im Gesundheitswesen. 2006. Avail- ommendations in evidence based guidelines. BMJ able from: http://www.iqwig.de/ [Accessed July 13, 2001;323:334–6. 2006]. 21 American College of Cardiology Foundation. 10 Sackett DL, Rosenberg WM, Gray JA, et al. Evidence Clinical Statements and Guidelines. 2006. Available based medicine: what it is and what it isn’t. BMJ from: http://www.acc.org/qualityandscience/clinical/ 1996;312:71–2. statements.htm [Accessed April 15, 2007]. 11 Woolf SH. Evidence-based medicine and practice 22 O’Sullivan AK, Thompson D, Drummond MF. Col- guidelines: an overview. Cancer Control 2000; lection of health-economic data alongside clinical 7:362–7. trials: is there a future for piggyback evaluations? 12 Wild D, Grove A, Martin M, et al. Principles of Value Health 2005;8:67–79. good practice for the translation and cultural adap- 23 Tunis SR, Stryer DB, Clancy CM. Practical clinical tation process for patient-reported outcomes (pro) trials: increasing the value of clinical research for deci- measures: report of the ISPOR task force for trans- sion making in clinical and health policy. JAMA lation and cultural adaptation. Value Health 2003;290:1624–32. 2005;8:94–104. 24 Agency for Healthcare Research and Quality. Effec- 13 Acquadro C, Berzon R, Dubois D, et al. Incorporat- tive HealthCare, DEcIDE: Registries for Evaluating ing the patient’s perspective into drug development Patient Outcomes. 2006. Available from: http:// and communication: an ad hoc task force report of effectivehealthcare . ahrq . gov / decide / registryOutline. the Patient-Reported Outcomes (PRO) Harmoniza- cfm [Accessed July 18, 2006]. tion Group meeting at the Food and Drug Admin- 25 Centers for Medicare and Medicaid Services. Guidance istration, February 16, 2001. Value Health 2003;6: for the Public, Industry, and CMS Staff: National 522–31. Coverage Determinations with Data Collection as a 14 US Department of Health and Human Services, Food Condition of Coverage: Coverage with Evidence and Drug Administration. Guidance for Industry Development. 2005. Available from: http://www . Patient Reported Outcome Measures: Use in Medical cms . hhs . gov / mcd / ncpc _ view _ document . asp? id = 8 Product Development to Support Labeling Claims. [Accessed July 23, 2006]. 2006. Available from: http://www.fda.gov/cber/gdlns/ 26 Glasgow RE, Magid DJ, Beck A, et al. Practical clini- prolbl.pdf [Accessed July 13, 2006]. cal trials for translating research to practice: design 15 Szende A, Leidy NK, Revicki D. Health-related and measurement recommendations. Med Care 2005; quality of life and other patient-reported outcomes in 43:551–7. the European centralized drug regulatory process: a 27 D’Agostino RB Jr. Propensity score methods for bias review of guidance documents and performed autho- reduction in the comparison of a treatment to a non- rizations of medicinal products 1995 to 2003. Value randomized control group. Stat Med 1998;17:2265– Health 2005;8:534–48. 81. 16 Guyatt GH, Sackett DL, Sinclair JC, et al. Users’ 28 Crown WH, Obenchain RL, Englehart L, et al. The guides to the medical literature. IX. A method for application of sample selection models to outcomes grading health care recommendations. Evidence- research: the case of evaluating the effects of antide- Based Medicine Working Group. JAMA 1995; pressant therapy on resource utilization. Stat Med 274:1800–4. 1998;17:1943–58. 17 United States Department of Health and Human Ser- 29 Daniels N, Sabin J. Setting Limits Fairly: Can We vices, Agency for Health Care Policy and Research. Learn to Share Medical Resources? New York: Acute Pain Management: Operative or Medical Oxford University Press, 2002. Procedures and Trauma. 92-0023, 107. Rockville, 30 Claxton K, Cohen JT, Neumann PJ. When is evidence MD: AHCPR, 1993. Clinical Practical Guideline sufficient? Health Aff (Millwood) 2005;24:93–101. No. 1. 31 Weinstein MC, O’Brien B, Hornberger J, et al. 18 Harris RP, Helfand M, Woolf SH, et al. Current Principles of good practice for decision analytic mod- methods of the US Preventive Services Task Force: a eling in health-care evaluation: report of the ISPOR review of the process. Am J Prev Med 2001;20(3 Task Force on Good Research Practices—Modeling Suppl.):S21–35. Studies. Value Health 2003;6:9–17.
You can also read