Chapter 25
SYMPTOMS, HEALTH-RELATED QUALITY OF LIFE AND PATIENT SATISFACTION: USING THESE PATIENT-REPORTED OUTCOMES IN PEOPLE WITH GASTROESOPHAGEAL REFLUX DISEASE
S. Wood-Dauphinee1 and D. Korolija2
1 School of Physical and Occupational Therapy, Department of Epidemiology and Biostatistics, Department of Medicine, McGill University, Montreal, Quebec, Canada
2 University Surgical Clinic, Clinical Hospital Center Zagreb, Zagreb, Croatia

Introduction

Patient-reported outcomes, defined as any report coming directly from the person whose life is affected by a health problem, are becoming increasingly important in helping professionals determine the impact of their treatments. This represents a change in focus. Traditionally, clinicians and researchers were primarily interested in outcomes related to morbidity and mortality. This approach was consistent with the biomedical model of disease that relied on tests to identify pathology or changes in physiological processes. A treatment was judged to be successful if the biologic test returned to normal, as was often the case for acute conditions. A more recent approach, termed an outcomes model [1], suggests that medical care is designed to focus on how people feel and how they are able to function as well as how long they live. Measures of symptoms, health-related quality of life (HRQL) and patient satisfaction are considered to be appropriate outcomes. While there is clearly overlap between the models, the outcomes model focuses attention on the determinants of patient outcomes and relies primarily on reports by patients to judge the result of the treatment. It is particularly appropriate for diseases that are chronic in nature. Currently, it is believed that both types of outcomes are important. The physiological measures reflect the value system of the professionals and provide information that helps confirm their clinical impression. The patient-reported measures reflect the subjective evaluation and reporting of the illness experience and its treatment [2]. These measures reflect the patients’ value system.

A recent study from Austria [3] illustrates the different points of view of clinicians and patients in rating the importance of different outcomes. This study examined the expectations of patients with gastroesophageal reflux disease (GERD) in terms of the outcomes of laparoscopic anti-reflux surgery. Responses of 70 patients to open-ended questions provided the following information. Relief of GERD symptoms was expected by 92.8%; 84.3% anticipated a return to usual daily and work-related activities; and for 72.9% an improved quality of life was important. Successful surgery without complications was named by 52.9% of patients and protection from a future Barrett’s esophagus or cancer was noted by 48.6%. Only two patients expected normalization of pH values and healing of the esophagus.

These results demonstrate that patients’ expectations are generally different from those of the clinician. In fact, patients primarily seek medical care because of bothersome symptoms. It should, thus, be anticipated that symptom relief would be their top expectation. The ability to assume usual patterns of daily activities, including work, as well as customary roles, was also important. These latter factors are well-accepted components of quality of life as it relates to health, in other words HRQL, and this was also a priority of the patients. Satisfaction with the process and end result of surgery is another patient-reported outcome that reflects their expectations and the degree to which they were met.

The purposes of this chapter are to describe the rationale for using these patient-reported outcomes, the criteria by which the measures should be selected and how they can be used in both clinical research and daily surgical practice. Information will also be provided about available measures to assess these constructs in individuals with GERD.

Why are patient-reported outcomes useful in the care of individuals with GERD and in clinical studies of GERD?

Including patient-reported outcomes in clinical practice and in clinical studies of GERD provides several important benefits. First, as noted previously, it allows the clinician or the investigator to characterize the impact of GERD and its treatment in terms that are of value to and understood by the patients. In fact, patient-reported measures specifically reflect their point of view, as most often, patients provided input as to the content of the measures. Second, certain components of these patient-reported measures may be independent predictors of surgical outcomes. For example, measures of HRQL usually contain a component that assesses mental health in terms of anxiety and depression. It has been shown that people with these problems are less satisfied with the outcome of surgery than those without such psychiatric comorbidity [4], [5]. A change in symptoms may forecast increasing severity and this information may provide insight into the progression of GERD, or decreasing symptoms may denote treatment adherence or recovery. For instance, the resolution of heartburn is highly correlated with post-operative return to normal 24-hr pH monitoring [6]. Third, measures of symptoms are excellent indicators of the severity of the disease, and those assessing HRQL portray the impact on functioning, engagement in daily activities and participation in life events. Finally, data specifically related to patient satisfaction provide information on the quality of care provided. A well-developed measure of patient satisfaction will indicate both the patients’ perceptions of the process of care and the outcomes of treatment. Surgeons, in particular, have long been concerned with the results of surgical care that reflect the patient’s subsequent health state [7]. This information is also useful to administrators and payers concerned with the quality of care.

Which patient-reported outcomes are important to measure?

Symptoms

Prior to surgery for GERD, patients undergo a number of objective tests including 24-hr pH monitoring, esophageal manometry, esophagogastroduodenoscopy with a biopsy and histological examination of the gastroesophageal junction, as well as a careful history and assessment of symptoms. Traditionally, these pre-operative tests were repeated post-operatively at various points in time, and physiologic changes demonstrating normalization of pH values and lower esophageal sphincter pressure, along with the elimination of reflux that signified esophagitis healing, were used as indicators of operative success. Today, while the evaluations are similar, the overt surgical objective is mainly focused on the alleviation of symptoms.

Symptoms have been defined by the Patient-reported Outcomes Harmonization Group in 2002 as “the subjective experience of abnormal function, sensations, or appearance, generally indicating disorder or disease” [8]. Surgeons aim to decrease the presence, severity, frequency and duration of symptoms, as they portray the sensory changes perceived by the patient [9]. Operative success is often judged by patient reports of few remaining or new symptoms, negligible complications and a limited need for further medications [10], [11].

A number of findings have led to this focus on symptoms. First, severity of the symptoms has not been found to be strongly associated with the pathological extent of the reflux or other physiological parameters. For example, studies of esophageal pH monitoring or manometry are not highly correlated with reported symptom severity [12], [13]. Symptoms and esophageal lesions do not always correlate strongly [14]. There are few differences in symptoms between patients with Barrett’s esophagus, erosive and non-erosive GERD [15]. Some studies have found that surgery is of value for people with severe symptoms regardless of the endoscopic appearance of the esophageal mucosa [16], [17]. Finally, as mentioned previously, stress-related symptoms and psychiatric diagnoses are independent predictors of the surgical outcome [4], [5], [18], [19].

GERD may be associated with many symptoms but the primary one is heartburn. Others, including acid regurgitation, epigastric pain, belching, bloating, nausea, vomiting and dysphagia, may range from causing mild impairment to severe disability [20]. Despite the long-standing interest in symptoms and the recent reaffirmation of their importance as outcomes, there appears to be no unified approach to the assessment of symptoms or an in-depth knowledge of the best way
to do so, either in daily practice or research [21]. There is some agreement that self-report is the most appropriate approach [8], [22], partially because there is no strong correlation between patient and clinician reporting. In some circumstances, clinicians tend to underestimate the presence of symptoms and their severity compared to those who actually have GERD [23], and at other times, particularly when estimating treatment response, investigators report more positive results than do the patients [24].

Self-completed symptom questionnaires have been developed and validated and some examples will be presented later in the Chapter. There is, however, a tendency in the literature to use traditional, clinical, ordinal scales asking questions about the presence, frequency and severity of common symptoms, rather than standardized measures. In fact, surgical investigators frequently report the use of a “standard scale” but the meaning of the term is unclear. Information about the origin and psychometric properties is seldom provided.

In a recent international, multidisciplinary workshop [21], the issue of symptom reporting in trials was extensively addressed. Using heartburn as an example of a common, salient symptom in GERD, Bytzer [22] discussed issues related to its assessment that ranged from problems in defining heartburn itself, its severity, and frequency, to when, how and by whom heartburn and other symptoms should be measured. While a large number of symptom-response measures have been reported in the literature, there is little consistency of approach and a general lack of validation studies. In connection to this workshop, Wyrwich and Staebler Tardino [25] provided a blueprint for creating symptom scales that uses a cognitive psychology framework approach to development. They also gave general information about developing optimal scales and interpreting their results.

In sum, there is considerable evidence that relief of symptoms caused by GERD is at times more complex than simply correcting the pathologic lesion. How patients perceive the sensations and respond to them must be taken into consideration. Because of the lack of physiological and pathologic markers for differentiating disease severity, an individual’s description of his or her symptoms is a predominant source of information for the surgeon in making a diagnosis, monitoring a patient and assessing the outcomes of surgery [8], [21], [26]. Given the increasing importance of their use in outcomes assessment, increased attention to the development and validation of symptom scales is warranted.

Health-related quality-of-life

Patients seek medical care because of symptoms. Often, however, it is not because of their presence, or even their severity, but because of the distress they cause the patient by intruding on daily activities and life in general. In other words, it is how the symptoms impact on their HRQL. In GERD it is evident how problems related to eating, drinking, sleeping, pain, and reduced vitality impair life’s quality. In fact, it has been well documented that people with GERD have a lower quality of life than those without this disorder [27], [28].

While no one definition has received universal acceptance, there is a general consensus that measures of HRQL are multi-dimensional and should assess physical, mental, social and role-functioning, a person’s perception of overall well-being and symptoms to a greater or lesser extent depending on the type of measure [29]. As noted by Guyatt and colleagues [30], HRQL is a concept that embraces the World Health Organization’s definition of health [31] by incorporating both personal health status and social well-being. It reflects people’s subjective perceptions of how they feel and function.

There are two main types of HRQL measures [32]. Generic measures cover the full range of domains and can be used across different patient populations to compare the impact of various diseases. Some generic measures have normative data, by age and sex, from ostensibly healthy populations. When available, these data make it possible to compare people with GERD, for example, to those without the condition, perhaps prior to and after surgery. Moreover, because generic measures are broad in scope, they sometimes help identify previously undisclosed problems that are not tapped by a measure specifically for people with GERD. This latter type of measure, known as disease-specific, focuses on the specific feelings, dysfunctions and symptoms associated with a given condition. They, therefore, are able to detect treatment effects and mirror changes in patient status.

Generic and disease-specific measures may be health profiles, which usually yield sub-scales for each domain allowing the assessment of interventions on the different components of HRQL, or utility measures, derived from economic and decision theory. These measures have preference weights incorporated in their scoring. Utility measures provide a single numerical estimate of HRQL that includes patient choices about both duration of life and its quality. Some profiles also generate a single number for analysis. Today, many studies use a combination of a generic and a disease-specific measure so that response to change is captured, but no important aspect of the person’s HRQL is missed.

Finally, clinicians and investigators also use a single item to evaluate HRQL. While common, particularly in clinical practice, these single-item ratings have not usually been tested for their measurement properties, and are known not to be very reliable. Moreover, they provide no help in explaining why patients respond the way they do.

Patient satisfaction

Patient satisfaction is the patient’s perception of both the quality of treatment provided and its effectiveness. A measure of satisfaction is one that documents patients’ assessments or affective responses to different dimensions of the treatment experience [33]–[35]. Typically, it compares the process and outcomes of the treatment experience with prior expectations that may or may not have been met or surpassed [36], [37]. Although individual patients may have different expectations for the distinct components of treatment or care, their individual expectations and satisfaction with the various components are independent predictors of overall satisfaction [38].

Different conceptual frameworks for understanding patient satisfaction have been proposed [37], [39], and used as a basis for the development of measures. In general terms, the frameworks include sociodemographic, personal, medical and functional characteristics of the patients, their values, preferences and expectations of treatment outcome, prior experiences with treatment for the current and other disorders, the way treatment is delivered and experienced, as well as its impact on symptoms, function and HRQL. In some models, access, cost and convenience are incorporated. There has been at least one suggestion that satisfaction with the treatment process should be assessed separately from that of the outcome of the treatment [33].

In recent years, most surgical investigators evaluating the impact of various surgical procedures and approaches in GERD have selected patient satisfaction as one of the outcomes. In the majority of studies, the degree of satisfaction reported one to five years after the surgery was high. A few reports were not as glowing. For example, Bessell and colleagues [40] found that 27% of those patients who exchanged severe preoperative heartburn for severe dysphagia after the surgery would not have the surgery again. Another study [41], assessing surgical outcomes in routine clinical practice rather than in a referral centre, reported similar overall outcomes in the face of less positive data about complications, symptoms, and medication use after surgery as well as the need for post-surgical dilatation or repeat operations. These investigators attributed the positive global response regarding satisfaction to a type of measure that fails to include specific components of the process or outcomes of care [42]. This is not an uncommon finding. Global ratings of satisfaction tend to be skewed toward positive ratings [43]. Patients rate high levels of satisfaction in the face of other negative information [43], [44]. Additionally, they tend to be less satisfied if asked about specific areas [44]. In fact, there are many problems with global single-item ratings, although they are easy to use and intuitively appealing. Because the dimensions within the satisfaction construct are not named, it is not known what factors the patient took into consideration or excluded when making the rating, why elements received the assigned ratings, or how they were combined [45].

Other investigators [17], [46] used “standard” series of questions about such areas as the success of the surgery, whether or not the patient would again decide to undergo the surgery, and difficulties experienced since the operation. Each question was treated individually and provided descriptive information about the patients’ responses. While this approach is somewhat more informative, it is unknown if all salient aspects that are important to patients were included, and it is still difficult to form a concrete picture of the patient’s judgement of the process and outcomes of care.

In summary, the single-item ratings or questionnaires with only a few items used in surgical studies to assess patient satisfaction have not been carefully developed and examined for their psychometric properties. In other words, they have not been developed in the currently accepted rigorous manner [39].

To the best of our knowledge, only one group of researchers [47] has developed and validated a measure of patient satisfaction for GERD patients, the Treatment Satisfaction Questionnaire for Gastroesophageal reflux disease (TSQ-G). The measure was developed using input from patient focus groups, physicians and the literature and it was tested appropriately for reliability and validity. Unfortunately, the measure is targeted for GERD patients being managed by medications and, thus, is not appropriate as an outcome of surgery. Nonetheless, it is a model for the development of such a measure for use with patients undergoing surgery.

Our concerns about the assessment of treatment satisfaction mirror those of Revicki [48]. He pointed out that considerable attention needs to be given to the psychometric properties of satisfaction measures including the theoretical model that underpins the instrument, reliability, all types of validity and the interpretability of the numerical scores.

It should be noted that surgical investigators working in GERD are not alone in their difficulty assessing satisfaction. A few years ago an analysis of 195 studies found that little attention had been given to the development of satisfaction measures and this in itself cast doubt on the credibility of the satisfaction findings [49]. It is an area that needs immediate attention. Specifically, we need to develop measures of satisfaction that reflect the components of global satisfaction such as personal expectations, indicators of quality of treatment as well as the outcomes of care as judged by the patients.

Issues in selecting patient-reported outcomes for use in clinical practice and research

First, we are interested in selecting standardized measures that are “evaluative” in purpose [50]. A standardized measure is one that has been published along with information about its psychometric properties, and instructions as to with whom, when and how it is to be administered, scored and interpreted. Conversely, “ad hoc” measures, most often created by clinicians, are those without formal testing or established measurement properties. An evaluative measure is designed to assess an individual at a baseline point, and again at one or more points later in time, principally to determine if change has occurred. Evaluative measures may also discriminate between different groups and predict future events as well.

The content of the instrument is of primary interest. While content may be influenced by the literature and information from clinicians, as alluded to previously, most content should come directly from patients and reflect their issues and concerns. There is a growing consensus that the “content” validity, or the adequacy with which the items sample the construct being assessed by the measure, can only be judged by the persons being evaluated [2]. It is, thus, important that patients with the specific health problem have had major input. Assuring that this is the case is an early step in the selection process.

In general terms, validity refers to the ability of an instrument to measure what it is supposed to measure. Beyond “content”, information on criterion and construct validity may be available. Criterion validity evaluates the relationship between the measure of interest and a criterion measure or “gold standard”, concurrently or in the future. For concurrent criterion validity a new disease-specific measure of HRQL for people with GERD should correlate moderately with a well-known disease-specific measure of HRQL. While the criterion measure may not be “gold” it should at least be “silver”! For predictive criterion validity one may test if the score on a measure of HRQL taken two weeks after surgery will forecast return to work. Given the difficulty of finding gold standards for patient-reported measures, construct validity is more often reported. Construct validation examines if the measure performs according to theoretical expectations by examining the direction and magnitude of relationships with other variables. For example, one might hypothesize and test if a generic measure of HRQL can discriminate among groups of patients who have different levels of symptoms, or if it will negatively correlate with a measure of pain. A measure of patient satisfaction following an anterior partial fundoplication for GERD should be positively correlated with the degree of symptom resolution or an objective outcome such as results from a 24-hr gastric pH monitoring.

Reliability reflects the extent to which a measure is free from random error and it refers to the reproducibility or stability of the measure over time. This is termed test-retest reliability. It also includes estimates of internal consistency, or how well the items in a scale relate to each other and to the total score. While there are several test statistics to assess reliability, the reliability coefficients are interpreted similarly. A coefficient of 0.89 means that 89% of the variance is true variance, related to the patients in the sample, and 11% is the amount of random error. For groups, as in research, a coefficient of 0.70 is the minimum level, but for use in clinical practice the minimum has been set between 0.85 and 0.90 [51].

The last psychometric property is responsiveness, or the ability of the measure to accurately detect patient change when it has occurred [52]. Most approaches to test responsiveness depend on assessing patients periodically over time during a period of anticipated change, and evaluating the change that occurs [53], [54]. While various approaches to quantifying responsiveness exist, clinical studies primarily report one of the variants of “effect size”. Coined by Cohen [55], this term simply means a standardized, unitless measure of change. Today such variants are termed “effect sizes” [56], “standardized response means” [57] or “responsiveness statistics” [58].

Potential users of measures also need direction in how to interpret the score. By “interpretability” we mean the capacity to assign a qualitative meaning to a quantitative score [59]. One approach to the interpretation of change that is “distribution based” employs effect sizes [60]. Cohen [55] suggested that 0.2, 0.5 and 0.8 represent small, medium and large effect sizes. While these values are somewhat arbitrary [61], they are used in the literature. The second approach, termed “anchor-based” [60], examines the relationship between the change score on the instrument being tested to that on another measure that is well-known, associated with the test measure and clinically meaningful [61]. Population norms, severity classifications, symptom scores and global ratings of change by patients or physicians as well as the minimum important difference (MID) have all been used. The MID, the smallest change that patients perceive as beneficial [62], is another useful piece of information for potential users of a measure as it not only has implications for sample size in investigations, but it can provide some direction about how an individual patient is doing. Readers wishing more information on the psychometric properties of measures are referred to work by the Scientific Advisory Committee of the Medical Outcomes Trust [63].

In addition to knowledge about psychometric properties, the potential user of an instrument needs other information before making a choice. For example, does the timeframe associated with the questions or items fit the intended use? Patients can be asked to consider their responses in terms of the past 24 hours, a week, month or even a year. The choice depends on the typical illness or recovery trajectory of the patients or on the design of the study. Which response options are provided for the patient? Are they dichotomous (yes/no), made up of several ordinal categories (poor – fair – good – excellent) or presented as a visual analog scale? Are population norms available for the country of the study which can be used for comparison purposes? This is particularly useful for generic measures of HRQL so clinicians or investigators can compare their patients’ or study subjects’ scores to age- and sex-matched population values. What is the burden on subjects? More specifically, how long does the instrument take to complete, and does it include questions that are potentially upsetting for the patient? What is the burden on the professional? How easily is the measure scored? Can the scoring be automated? Does one have to obtain permission to use the measure, and if so, is there an associated cost? All of this information needs to be ascertained prior to selecting a measure.

Moreover, today, clinical research is conducted in countries around the world, and thus, the demand for instruments that can be used internationally has risen dramatically. By now many instruments have been culturally adapted, translated into different languages and then retested psychometrically to ensure that the language, meaning and performance of the instrument remain consistent. There are different methods to enhance cross-cultural comparability, and while guidelines are available [64]–[66], it is a time-consuming process. Investigators or clinicians planning to use a patient-reported measure in their clinical practice or research project should determine if the measure they select has undergone such a process and is available for use.

Brief information about the psychometric properties of patient-reported measures used in people with
GERD and answers to some of the questions raised in this section of the Chapter are presented in Table 1, but such information accumulates over time, and so a potential user should refer to recent literature.

Patient-reported outcomes currently used in people with GERD

A number of articles have reviewed the development, psychometric performance and applications of patient-reported measures of symptoms and HRQL for people with GERD [67]–[71]. Table 1 revisits this information and presents those measures appearing in the surgical literature, along with information on the different domains tapped in each measure, the number of items per domain, the time-frame within which patients are to consider their responses and how the measures are scored. Additional information is provided about the approach to content development (specifically if patient input had been sought), other aspects of validity, estimates of reliability and how responsiveness has been examined. Some measures have information about the minimal important difference in score that patients can detect as well. When known, the languages in which the measure is available are stated in the text. It is acknowledged, however, that other language versions, unknown to the authors, may exist in the international literature.

The Gastrointestinal Symptom Rating Scale (GSRS) [72] and the Gastroesophageal Reflux Disease Health-related Quality of Life (GERD-HRQL) scale [73] have been available for a number of years and appear frequently in surgical investigations. Both these measures were recently recommended for use by the European Association for Endoscopic Surgery [74]. The GSRS has been used in Scandinavian, UK and US samples. The Symptom Questionnaire for Gastroesophageal Reflux Disease [75] is more recent and has been employed in one study of the long-term follow-up of patients after laparoscopic Nissen fundoplication [76].

The gastrointestinal-specific and the GERD-specific measures of HRQL have also been widely used in surgical studies. Both the Gastrointestinal Quality of Life Index (GIQLI) [77] and the Quality of Life in Reflux and Dyspepsia (QOLRAD) [78] were recommended by the European Association for Endoscopic Surgery [74], and the GIQLI [77] was recommended specifically for outcome assessment by the European Study Group for Antireflux Surgery [79]. It is available in English [77], French [80], German [81] and Spanish [82]. The QOLRAD was developed in French and English [78].

In terms of generic HRQL measures, people with GERD have mainly been assessed using two well-known measures – the Psychological General Well Being Index (PGWB Index) and the Medical Outcomes Study Short Form-36 (SF-36). These measures were recommended by the European Association for Endoscopic Surgery [74] partially because individuals with GERD score lower on these measures than ostensibly healthy individuals and their scores decrease as symptoms become more severe [83]–[85].

The PGWB Index was developed as a measure of subjective well-being or distress [86]. The Index is comprised of six domains, including anxiety, depressed mood, positive well-being, self control, general health and vitality. The domains contain 3–5 items, each of which is scored on a 6-point ordinal scale. Domain scores and a total score can be calculated. Higher values denote better quality of life. Internal consistency and test-retest reliability as well as construct and criterion validity were moderate to strong [86]–[89]. PGWB total scores were able to discriminate between individuals with and without heartburn [83]. Moreover, sensitivity to change in response to treatment has been demonstrated in patients with upper gastrointestinal symptoms [88]–[91] and a change of 4 points on the Index is a clinically meaningful difference in people with GERD [83]. Swedish norms are available [89].

The SF-36 is a generic measure of perceived health status that incorporates behavioural functioning, subjective well-being and perceptions of health, by assessing eight health concepts: limitations in physical activities due to health problems; limitations in role activities due to physical health problems; pain; limitations in social activities due to health problems; general mental health; limitations in usual role activities due to emotional problems; vitality; and general health perceptions [92]. The questionnaire is made up of 36 items that are divided into the 8 scales. The scores on all scales range from 0–100, with higher scores reflecting better health. There is also a computerized method of scoring two major components, physical and mental health. Each component has been standardized
Table 1. Patient-reported measures of symptoms and health-related quality of life used in surgical studies of people with GERD

1 Symptoms

Gastrointestinal Symptom Rating Scale (GSRS) [72]
  Domains (# of items): Reflux syndrome (2); Abdominal pain (3); Indigestion (4); Diarrhea (3); Constipation (3)
  Time frame: Past 1–2 weeks
  Scoring: 7-point ordinal scale [1–7], No Discomfort – Severe Discomfort. Sum domain scores; calculate domain means. Higher score: greater severity
  Reliability: Internal consistency: Alphas – Moderate to Moderately High [115]–[116]. Test-retest: ICCs – Moderate to Moderately High [116]–[117]
  Validity: Content: Developed using literature and professional input [72]. Construct: Adequate; with SF-36 & PGWB Scales [116], [118]. Discriminative: Adequate; between different symptom severities & responses to treatment [83], [115], [116]
  Responsiveness: Effect Sizes and Standardized Response Means: Adequate [115]–[118]. Minimal Important Difference: 0.5 per item [117]

Gastroesophageal Reflux Disease Health-related Quality of Life Scale (GERD-HRQL) [73]
  Domains (# of items): Heartburn (6); Dysphagia (2); Bloating (1); Medication impact (1); Satisfaction with condition (1)
  Time frame: Current
  Scoring: 6-point ordinal scale [0–5], No symptoms – Incapacitating Symptoms. Sum 10 symptom items. Higher score: greater severity. Satisfaction: 3-point categorical scale (Satisfied – Neutral – Unsatisfied)
  Reliability: none listed
  Validity: Content: Face validity judged by clinicians. Construct: Adequate; correlates with degree of esophagitis [12]. Discriminative: Adequate; between satisfied/unsatisfied patients and medical/surgical treatments [73]
  Responsiveness: Pre-post treatment changes evident on symptom scale [73]

Symptom Questionnaire for Gastroesophageal Reflux Disease [75]
  Domains (# of items): Heartburn (1); Regurgitation (1); Epigastric / chest pain (1); Epigastric fullness (1); Dysphagia (1); Cough (1)
  Time frame: Current
  Scoring: Severity: 4-point ordinal scale [0–3]. Frequency: 5-point ordinal scale [0–4]. Severity × Frequency: [0–12] points per item. Total Score [0–72]. Higher score: greater symptom impact
  Reliability: Test-retest: ICC – High [75]
  Validity: Content: Face validity judged by clinicians [75]. Construct: Adequate; correlates with disease activity measures and SF-36 cross-sectionally and longitudinally [75]
  Responsiveness: Responsiveness Index: Adequate [75]. Minimal Important Difference: [5–10] points [75]

2 Health-related quality of life: gastrointestinal and disease-specific

Gastrointestinal Quality of Life Index (GIQLI) [77]
  Domains (# of items): Symptoms (19); Emotional status (5); Physical status (7); Social activities (4); Treatment stress (1)
  Time frame: Past 2 weeks
  Scoring: 5-point ordinal scale [0–4]; severity or frequency of symptoms, dysfunctions. Total score [0–144]. Higher score: better HRQL
  Reliability: Internal consistency: Alpha – High [77]. Test-retest: ICC – High [77]
  Validity: Content: Developed by interviews with patients and clinicians and from the literature [77]. Construct: Adequate; correlates with measures of QoL & well-being [77]. Data on normal people available [77]
  Responsiveness: Demonstrated expected gradient pre-post surgery and follow-up [119], [120]

Quality of Life in Reflux and Dyspepsia (QOLRAD) [78]
  Domains (# of items): Emotional distress (5); Sleep disturbance (5); Food/drink problems (6); Physical/social functioning (5); Vitality (4)
  Time frame: Past week
  Scoring: 7-point ordinal scale [1–7]. Severity: none at all – a great deal. Frequency: none of the time – all of the time. Total score and domain scores. Higher score: better HRQL
  Reliability: Internal consistency: Alphas – High (total and domain scores) [78]. Test-retest: ICC – Moderately High [117]
  Validity: Content: Developed by focus groups with patients (N.A., Australia and Europe), clinicians, and a literature review [78]. Construct: Adequate; correlates with SF-36 and GSRS [78]. Discriminative: Adequate; better with symptom severity than frequency [78]
  Responsiveness: Effect Sizes and Standardized Response Means: Adequate [117]. Minimal Important Difference: 0.5 per item [15]

Abbreviations: ICC Intraclass Correlation Coefficient; Alpha Cronbach's alpha; QoL Quality of Life
Reliability coefficients: ≥ 0.80 high; 0.60–0.79 moderately high; 0.40–0.59 moderate
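The scoring rules listed in Table 1 can be made concrete with a small sketch. What follows is illustrative only and is not the instrument developers' scoring software: the function and dictionary names are hypothetical, the GSRS item counts and 1–7 range are taken from the table, and the Symptom Questionnaire item score is assumed to be severity multiplied by frequency, which is what the 0–12 per-item range and 0–72 total in the table imply.

```python
from statistics import mean

# Item counts per GSRS domain, copied from Table 1.
GSRS_ITEMS = {
    "reflux_syndrome": 2,
    "abdominal_pain": 3,
    "indigestion": 4,
    "diarrhea": 3,
    "constipation": 3,
}


def gsrs_domain_means(responses):
    """Return the mean rating for each GSRS domain.

    `responses` maps each domain name to its list of item ratings,
    each on the 7-point scale from 1 (no discomfort) to 7 (severe discomfort).
    """
    means = {}
    for domain, n_items in GSRS_ITEMS.items():
        ratings = responses[domain]
        if len(ratings) != n_items:
            raise ValueError(f"{domain}: expected {n_items} ratings")
        if any(not 1 <= r <= 7 for r in ratings):
            raise ValueError(f"{domain}: ratings must lie between 1 and 7")
        means[domain] = mean(ratings)
    return means


def symptom_questionnaire_total(severity, frequency):
    """Return the total score (0-72) over the six listed symptoms.

    Assumption: each item score is severity (0-3) times frequency (0-4),
    inferred from the 0-12 per-item range shown in Table 1.
    """
    if len(severity) != 6 or len(frequency) != 6:
        raise ValueError("expected ratings for six symptoms")
    if any(not 0 <= s <= 3 for s in severity):
        raise ValueError("severity ratings must lie between 0 and 3")
    if any(not 0 <= f <= 4 for f in frequency):
        raise ValueError("frequency ratings must lie between 0 and 4")
    return sum(s * f for s, f in zip(severity, frequency))
```

For both instruments a higher value indicates greater symptom severity or impact, so scores from these measures run in the opposite direction to the HRQL indices in the second half of the table.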
to have a mean of 50 and a standard deviation of 10 [93]. One version of the SF-36 asks people to think about their health over the past four weeks and another version uses a one-week recall period.

Good to excellent internal consistency and test-retest reliability have been demonstrated in diverse patient groups including those with GERD [88], [94]. Subscales of the SF-36 (pain and general health perceptions) and the component summary scores were able to discriminate between people with GERD reporting no heartburn and those reporting heartburn symptoms [83]. Responsiveness to treatment has also been demonstrated in people with GERD [83], [88]. As part of an international initiative that used a standard protocol, the SF-36 has been translated, culturally adapted and revalidated in over 50 languages. Norms for many countries are available [95].

Issues in using patient-reported outcomes in clinical research

Using measures of symptoms, HRQL and patient satisfaction in surgical studies requires additional considerations in both the planning and execution of the investigation. Specific guidelines for selecting measures have already been discussed. This section will focus on the successful use of these measures in a study.

When stating the objectives in the study protocol it is important to identify that symptom resolution, improved HRQL or high satisfaction with the treatment are defined outcomes, each with a hypothesis attached to it, and that they will be evaluated as rigorously as the more traditional outcomes. This is crucial in a multi-centered study so that these outcomes are not considered as "add-ons" by co-investigators who may treat them with less rigour than is used with traditional assessments.

When patients are asked to participate in a study and informed consent is sought, they should also be told about the study and what participation will entail [96]. In studies using patient-reported outcomes this means that patients should agree to complete questionnaires or be interviewed face-to-face or over the telephone at specific points in time. Some trials have actually asked patients to complete a set of forms as part of the eligibility criteria. Providing this information up front and making sure that patients understand the commitment will help ensure continued involvement. Moreover, patient-reported measures rely on the ability of the patient to provide the answers. The patient must, therefore, have sufficient reading ability or someone must read the questions to him or her to obtain the response. This is an acceptable practice, but ad hoc translation of the question by a family member, a researcher or even a qualified translator is not permitted, as a bias may be introduced by the way the question is translated and asked [96]. While the use of proxy respondents has a place in research, proxy reports are not patient-reported measures [2].

When designing the study, the timing of the assessments should be planned within the context of the surgery and the recovery trajectory [97]. Baseline assessments of symptoms and HRQL are essential in both observational and controlled studies. In both types, one comparison will be between pre-surgery and post-surgery at various points in the recovery trajectory. In a controlled trial the baseline assessment should be administered prior to randomization so as to eliminate any possible bias resulting from knowledge of the allocation either on the part of the patient or the individual administering the measure. This baseline assessment in a controlled situation also provides data for group comparisons at study entry as well as allowing between-group comparisons over time.

Pre-operative assessments are therefore important for providing baseline data. Yet asking patients to complete questionnaires while they are waiting for imminent surgery is probably not the best way to obtain reflective responses. Completion at an earlier point in time, perhaps at the last visit to the doctor, or through a telephone interview a few days prior to surgery, might yield more considered answers.

Another issue to think about in terms of appropriate timing is that the immediate effects, particularly when an open approach to surgery for GERD is used, will be negative on most HRQL domains. Moreover, there will be after-effects and possibly new symptoms with which the patients must deal. However, by four weeks after the operation, patients will likely associate positive changes in eating or level of pain with an improved quality of life. If one wants information on the patient's perceptions of the care process, assessments of treatment satisfaction are best made directly after discharge when details are fresh in patients' minds.
Satisfaction with the outcome of the operation, however, must wait a sufficient time until the person is fully recovered from the surgery itself and probably until the long-term effects are apparent.

The investigators should also plan where and how the assessments will be made. Where might be in the doctor's office, in a clinic or hospital or in the home [96]. Ideally assessment should take place in a consistent location, but often this is impractical. A professional setting provides a milieu in which conditions are more controllable and personnel responsible for administering the questionnaires can make sure that the patient completes them without input from family or friends [96]. Telephone interviews, however, are widely used, provide data similar to face-to-face interviews [98] and control for timing and patient completion. If an external person is involved in administering the questionnaire, that individual should not be part of the treatment team and preferably should be unaware of the objective of the study and of the group assignment if it is a controlled trial. Providing questionnaires for patients to complete at a later date, or mailing questionnaires for completion, are other accepted approaches but ones that often result in considerable missing data.

Several points are important to remember. We know that data obtained from self-completed forms are slightly different from those obtained through interviews, so it is preferable to select one approach [99]–[101]. Feasibility may dictate, however, that administrative modes are mixed. In any case, detailed instructions must be provided to personnel responsible for collecting the data and procedures should be established that facilitate compliance with questionnaire completion. It is also essential to clarify what should be done with the questionnaire when completed. Most often direct entry using computers and electronic transmission is used. Sometimes patients respond directly on a computer. Instructions on preserving confidentiality are also essential.

Detailed descriptions of analytic methods are clearly beyond the scope of this chapter, and so only a few general points will be made. First, it is important to have statistical expertise when the study is being planned. HRQL or symptom scores are seldom the primary endpoints upon which sample size is calculated, and therefore investigators need to be sure that they have sufficient subjects to make the comparisons they plan. Moreover, most HRQL measures are multidimensional and made up of subscales. This again may increase the number of endpoints. Not only do we select one or more multidimensional measures, but we make measurements at several points in time. An outline for data analysis should, thus, be made in the planning stages. All these issues are within the purview of the statistician or someone very familiar with multi-level and multivariate analyses.

Finally, procedures to contend with missing items within measures, or missing data forms, need to be defined. Missing data within a measure are generally dealt with according to the following process. If at least 50% of the questions or items in a subscale have been completed, a mean score calculated for that subscale can be imputed to replace the missing values. While this may decrease the variance in the data, it will probably not have a major impact on the results [102], [103]. Missing forms are more of a problem. If they are missing at random because someone forgot to mail the questionnaire to the patient or the patient missed a follow-up visit because he or she was on a holiday, it is not too serious. Forms not missing at random, which is the more common scenario, may be telling us that the patient is sicker (or healthier) or perhaps more upset with the results of treatment than the average patient. In other words there may be a health-related reason that the questionnaire was not completed. For these cases it is important that a protocol is developed to handle the situation. Several options are available and all rely on statistical expertise and use of appropriate statistical packages [103].

In clinical practice

While symptoms have traditionally been assessed, advocates of patient-reported outcomes have supported the use of other such measures in daily clinical practice. In particular, the assessment of HRQL has been seen as an aid to screening for unidentified problems, making decisions about treatment, monitoring patient status and response to treatment, as well as a mechanism for quality assurance [104]. Barriers to routine use, however, were identified for conceptual, methodological, practical and attitudinal reasons [105]. Scepticism about the importance of the measures was voiced. Practitioners preferred traditional, pathologic or physiologic tests and
did not understand the usefulness of information from both types of measures. They cited time and resource constraints, and were concerned about the costs of administering the tools, collecting the information, compiling it rapidly, interpreting and using it.

Over the years a number of these concerns have been addressed. Studies have shown that assessing HRQL in different practice settings is feasible and is easily incorporated into the office or clinic routine [106]. Briefer and more precise disease-specific measures have been developed and computer-assisted technology is available to provide instant scoring and feedback to the clinicians. Information about population values, and about the amount of change in patient status required to reflect an important difference as perceived by the patient, has made the scores easier to interpret.

A number of studies, both controlled trials and other designs, have investigated the impact of the use of HRQL information on doctor-patient communication. To summarize, the provision of information to the clinician seems to have an impact on the process of care. It increases the identification of previously unrecognized problems [107]–[109], improves doctor-patient communication and facilitates more emotional support for patients [107], [110], and increases physicians' awareness of their patients' problems and concerns [106], [107]. Moreover, the process was perceived as useful by most physicians and it was acceptable to patients and office and clinic staff [108], [109]. Finally, it did not significantly increase the time of the doctor-patient interaction [110]. Providing HRQL information to clinicians appears to have had less impact on the outcomes of care. With the exception of the systematic review by Espallargues and colleagues [109], there was no reported impact on patient satisfaction, which was generally high [107], [111]. There was also little evidence of change in management decisions as the result of HRQL knowledge [107], [111], and most studies did not find that it influenced health-related quality of life. There is, however, some recent evidence in people with cancer that providing HRQL information to the clinician positively impacted the patient's HRQL, particularly in mental health and role performance areas [112].

The studies referenced in the previous paragraph were all conducted using patients with problems other than GERD. While the use of patient-reported outcomes in follow-up after an operation was often reported in the GERD literature, articles related to the value of these tools in clinical decision-making were very scarce. One study conducted in Montreal [113] had been presented at the 58th Annual Meeting of the Central Surgical Association in 2001. The ensuing discussion included questions to the presenting author, and these questions and their answers were provided at the end of the article. One question was about the practical use of continued assessment of HRQL and why a simple satisfaction rating scale was not sufficient. The author's responses indicated that the 5-point satisfaction measure varied little across patients and so was not sufficient as a sole endpoint, but that the HRQL scores yielded practical information. For example, if there were unexpected responses on the questionnaire, patients were asked to return to the clinic for re-studies. As was pointed out by the author, after you start to use HRQL questionnaires and "you get a feel for what is normal and abnormal, they help drive decision-making in a practical way". Hopefully, patient-reported measures will be seen as an adjunct to traditional care in the future. Their use can be seen as formalizing what clinicians have been implicitly doing forever when they ask a patient "How are you?"

Conclusions

Patient-reported outcomes have been advocated following surgery for GERD for the past several years [114]. Those outlined in this chapter, as well as others such as "adherence to treatment", are important for measuring the impact of GERD and its treatment. Clinicians and researchers who use these measures should select them carefully according to their reliability, validity and responsiveness, as well as to information about how the score is interpreted. It is also important to think about how traditional, objective tests are related to patient-reported measures, and to use each type as appropriate. Objective tests and measures provide information about the medical status of the patient that is essential for management of the disease. Patient-reported measures give information about the individual's perception of the symptoms and dysfunctions and how they impact on the quality of their lives before and in response to treatment. Both are important.
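The rule for missing items described in the section on clinical research (impute the subscale mean when at least 50% of the items in a subscale have been completed) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a substitute for the statistical expertise the chapter calls for: the function name and the `min_completed` parameter are hypothetical, and unanswered items are assumed to be recorded as `None`.

```python
def impute_subscale(items, min_completed=0.5):
    """Fill missing items of one subscale with the mean of the answered items.

    `items` is a list of numeric item scores in which unanswered items are
    None. If fewer than `min_completed` (by default half) of the items were
    answered, the subscale is treated as missing and None is returned.
    """
    answered = [x for x in items if x is not None]
    if not items or len(answered) / len(items) < min_completed:
        return None
    subscale_mean = sum(answered) / len(answered)
    # Replace only the missing entries; answered items are left untouched.
    return [subscale_mean if x is None else x for x in items]
```

For example, `impute_subscale([4, None, 2, None])` returns `[4, 3.0, 2, 3.0]` because exactly half of the items were answered, whereas `impute_subscale([4, None, None])` returns `None`. The sketch covers only items missing within a completed form; forms that are missing altogether, particularly when not missing at random, need the protocol-level handling discussed in the text.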
References

[1] Kaplan RM (2002) Quality of life: An outcomes perspective. Arch Phys Med Rehabil 83 (Suppl 2): S44–S50
[2] Patrick DL (2003) Patient-reported outcomes (PROs): an organizing tool for concepts, measures and applications. Qual Life Newsletter 31: 1–5
[3] Kamolz T, Pointner R (2002) Expectations of patients with gastroesophageal reflux disease for the outcome of laparoscopic antireflux surgery. Surg Laparosc Endo Percutan Tech 12: 389–392
[4] Velanovich V, Karmy-Jones R (2001) Psychiatric disorders affect outcomes of antireflux operations for gastroesophageal reflux disease. Surg Endosc 15: 171–175
[5] Velanovich V (2003) The effect of chronic pain syndromes and psychoemotional disorders in symptomatic and quality of life outcomes of antireflux surgery. J Gastrointest Surg 7: 53–58
[6] Eubanks TR, Omelanczuk P, Richards C et al (2000) Outcome of laparoscopic antireflux procedures. Am J Surg 179: 391–395
[7] Codman EA (1914) The product of a hospital. Surg Gyn Obst 18: 491–496
[8] McColl E (2004) Best practice in symptom assessment: a review. Gut 53 (Suppl IV): IV49–IV54
[9] Price DD, Harkins SW, Baker C (1987) Sensory-affective relationships among different types of clinical and experimental pain. Pain 28: 297–307
[10] Stein HJ, Feussner H, Siewert JR (1998) Antireflux surgery: a current comparison of open and laparoscopic approaches. Hepatogastroenterology 45: 1328–1337
[11] Pope CE II (1992) The quality of life following antireflux surgery. World J Surg 16: 355–358
[12] Velanovich V, Karmy-Jones R (1998) Measuring gastroesophageal reflux disease: relationship between health-related quality-of-life scores and physiologic parameters. Am Surg 64: 649–653
[13] Shi G, Tatum RP, Joehl RJ et al (1999) Esophageal sensitivity and symptom perception in gastroesophageal reflux disease. Curr Gastroenterol Rep 1: 214–219
[14] Katzka DA (1999) Digestive system disorders: gastroesophageal reflux disease. Clin Evidence 1: 145–153
[15] Kulig M, Leodolter A, Vieth M et al (2003) Quality of life in relation to symptoms in patients with gastro-oesophageal reflux disease – an analysis based on the ProGERD initiative. Aliment Pharmacol Ther 18: 767–776
[16] Watson DI, Foreman D, Devitt PG (1997) Preoperative endoscopic grading of esophagitis versus outcome after laparoscopic Nissen fundoplication. Am J Gastroenterol 92: 222–225
[17] Bammer T, Freeman M, Shahriari A et al (2002) Outcome of laparoscopic antireflux surgery in patients with nonerosive reflux disease. J Gastrointest Surg 6: 730–737
[18] Kamolz T, Bammer T, Granderath FA et al (2001) Laparoscopic antireflux surgery in gastro-oesophageal reflux disease patients with concomitant anxiety disorders. Dig Liver Dis 33: 659–664
[19] Kamolz T, Granderath FA, Pointner R (2003) Does major depression affect the outcome of laparoscopic antireflux surgery? Surg Endosc 17: 55–60
[20] Falk GW (2001) Gastroesophageal reflux disease and Barrett's esophagus. Endoscopy 33: 109–118
[21] Dent J, Armstrong D, Delaney B et al (2004) Symptom evaluation in reflux disease: workshop background, processes, terminology, recommendations and discussion outputs. Gut 53 (Suppl IV): IV1–IV24
[22] Bytzer P (2004) Assessment of reflux symptom severity: methodological options and their attributes. Gut 53 (Suppl IV): IV28–IV34
[23] Stephens RJ, Hopwood P, Girling DJ et al (1997) Randomized trials with quality of life end-points: are doctors' ratings of patients' physical symptoms interchangeable with patients' self-ratings? Qual Life Res 6: 225–236
[24] Sandmark S, Carlsson R, Fausa O et al (1988) Omeprazole or ranitidine in the treatment of reflux esophagitis. Results of a double-blind, randomized Scandinavian multicenter study. Scand J Gastroenterol 23: 625–632
[25] Wyrwich KW, Staebler Tardino VM (2004) A blueprint for symptom scales and responses: measurement and reporting. Gut 53 (Suppl IV): IV45–IV48
[26] Shaw M (2004) Diagnostic utility of reflux disease symptoms. Gut 53 (Suppl IV): IV25–IV27
[27] Revicki DA, Wood M, Maton PN et al (1998) The impact of gastroesophageal reflux disease on health-related quality of life. Am J Med 104: 252–258
[28] Eloubeidi MA, Provenzale D (2000) Health-related quality of life and severity of symptoms in patients with Barrett's esophagus and gastroesophageal reflux disease patients without Barrett's esophagus. Am J Gastroenterol 95: 1881–1887
[29] Berzon RA, Hays RD, Shumaker SA (1993) International use, application and performance of health-related quality of life instruments. Qual Life Res 2: 367–368
[30] Guyatt GH, Feeny DH, Patrick DL (1993) Measuring health-related quality of life. Ann Intern Med 118: 622–629
[31] World Health Organization (1948) WHO Constitution. Geneva WHO
[32] Guyatt G, Veldhuyzen Van Zanten S, Feeny D et al (1989) Measuring quality of life in clinical trials: a taxonomy and review. Can Med Assoc J 140: 1441–1448
[33] Hudak PL, Wright JG (2000) The characteristics of patient satisfaction measures. Spine 25: 3167–3177
[34] Linder-Pelz S (1982) Toward a theory of patient satisfaction. Soc Sci Med 16: 577–582
[35] Ware JE, Davies-Avery A, Stewart AL (1978) The measurement and meaning of patient satisfaction. Health Med Care Serv Rev 1: 13–15
[36] Kravitz RL (1996) Patients' expectations for medical care: an expanded formulation based on review of the literature. Med Care Res Review 53: 3–27
[37] Patrick DL, Martin ML, Bushnell DM et al (2003) Measuring satisfaction with migraine treatment: expectations, importance, outcomes and global ratings. Clin Ther 25: 2920–2935
[38] Locker D, Dunt D (1978) Theoretical and methodological issues in sociological studies of consumer satisfaction with medical care. Soc Sci Med 12: 283–292
[39] Weaver M, Patrick DL, Markson PD et al (1997) Issues in the measurement of treatment satisfaction. Am J Manag Care 3: 579–594
[40] Bessell JR, Finch R, Gotley DC et al (2000) Chronic dysphagia following laparoscopic fundoplication. Br J Surg 87: 1341–1345
[41] Vakil N, Shaw M, Kirby R (2003) Clinical effectiveness of laparoscopic fundoplication in a U.S. community. Am J Med 114: 1–5
[42] Cleary P, McNeil B (1998) Patient satisfaction as an indicator of quality of care. Inquiry 22: 25–36
[43] Patrick DL, Erickson P (1993) Health status and health policy: quality of life in health care evaluation and resource allocation. Oxford University Press, New York
[44] Dougall A, Russel A, Rutin G et al (2000) Rethinking patient satisfaction: patient experiences of an open access flexible sigmoidoscopy service. Soc Sci Med 50: 53–62
[45] Feinstein AR (1987) Global indexes and scales. In: Clinimetrics. Yale University Press, New Haven, pp 91–103
[46] Granderath FA, Kamolz T, Schweiger M et al (2002) Long term follow-up after laparoscopic refundoplication for failed antireflux surgery: quality of life, symptomatic outcome, and patient satisfaction. J Gastrointest Surg 6: 812–818
[47] Coyne KS, Wiklund I, Schmier J et al (2003) Development and validation of a disease-specific treatment satisfaction questionnaire for gastro-oesophageal reflux disease. Aliment Pharmacol Ther 18: 907–915
[48] Revicki DA (2004) Patient assessment of treatment satisfaction: methods and practical issues. Gut 53 (Suppl IV): IV40–IV44
[49] Sitzia J (1999) How valid and reliable are patient satisfaction data? An analysis of 195 studies. Int J Qual Health Care 11: 319–328
[50] Kirshner B, Guyatt G (1985) A methodological framework for assessing health indices. J Chron Dis 38: 27–36
[51] Portney LG, Watkins MP (2000) Reliability. In: Foundations of clinical research: applications to practice (2nd ed), Prentice Hall Health, New Jersey, pp 61–77
[52] de Bruin AF, Diederiks JPM, de Witte LP et al (1997) Assessing the responsiveness of a functional status measure: the Sickness Impact Profile versus the SIP68. J Clin Epidemiol 50: 529–540
[53] Husted JA, Cook RJ, Farewell VT et al (2000) Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol 53: 459–468
[54] Terwee CB, Dekker FW, Wiersinga WM et al (2003) On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res 12: 349–362
[55] Cohen J (1988) Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Lawrence Erlbaum
[56] Kazis L, Anderson J, Meenan R (1989) Effect sizes for interpreting changes in health status. Med Care 27 (Suppl): 178–189
[57] Liang MH, Fossel AH, Larson MG (1990) Comparisons of five health status instruments for orthopaedic evaluation. Med Care 28: 632–642
[58] Guyatt G, Walter S, Norman G (1987) Measuring change over time: assessing the usefulness of evaluative instruments. J Chron Dis 40: 171–178
[59] Ware JE, Keller SD (1996) Interpreting general health measures. In: Quality of life and pharmacoeconomics in clinical trials (Spilker B, ed). Philadelphia: Lippincott-Raven, pp 445–460
[60] Lydick EG, Epstein RS (1993) Interpretation of quality of life changes. Qual Life Res 2: 221–226
[61] Guyatt G, Osoba D, Wu A et al (2002) Methods to explain the clinical significance of health status measures. Mayo Clin Proc 77: 371–383