Challenges of Accurately Measuring and Using BMI and Other Indicators of Obesity in Children
SUPPLEMENT ARTICLE

CONTRIBUTOR: John H. Himes, PhD, MPH
Division of Epidemiology and Community Health, University of Minnesota, School of Public Health, Minneapolis, Minnesota

abstract

BMI is an important indicator of overweight and obesity in childhood and adolescence. When measurements are taken carefully and compared with appropriate growth charts and recommended cutoffs, BMI provides an excellent indicator of overweight and obesity that is sufficient for most clinical, screening, and surveillance purposes. Accurate measurements of height and weight require that adequate attention be given to data collection and management. Choosing appropriate equipment and measurement protocols and providing regular training and standardization of data collectors are critical aspects that apply to all settings in which BMI will be measured and used. Proxy measures for directly measured BMI, such as self-reports or parental reports of height and weight, are much less preferred and should only be used with caution and cognizance of the limitations, biases, and uncertainties attending these measures. There is little evidence that other measures of body fat such as skinfolds, waist circumference, or bioelectrical impedance are sufficiently practicable or provide appreciable added information to be used in the identification of children and adolescents who are overweight or obese. Consequently, for most clinical, school, or community settings these measures are not recommended for routine practice. These alternative measures of fatness remain important for research and perhaps in some specialized screening situations that include a specific focus on risk factors for cardiovascular or diabetic disease. Pediatrics 2009;124:S3–S22

KEY WORDS
body mass index, overweight, obesity, child obesity, measurement

ABBREVIATIONS
CDC: Centers for Disease Control and Prevention
IOTF: International Obesity Taskforce
WHO: World Health Organization
CI: confidence interval
BIA: bioelectrical impedance analysis

www.pediatrics.org/cgi/doi/10.1542/peds.2008-3586D
doi:10.1542/peds.2008-3586D

Accepted for publication Apr 29, 2009

Address correspondence to John H. Himes, PhD, MPH, University of Minnesota, School of Public Health, Division of Epidemiology and Community Health, 1300 S 2nd St, Suite 300, Minneapolis, MN 55454. E-mail: himes001@umn.edu

PEDIATRICS (ISSN Numbers: Print, 0031-4005; Online, 1098-4275).
Copyright © 2009 by the American Academy of Pediatrics

FINANCIAL DISCLOSURE: The author has indicated he has no financial relationships relevant to this article to disclose.

Downloaded from www.aappublications.org/news by guest on October 10, 2021
PEDIATRICS Volume 124, Supplement 1, September 2009
BMI (weight [kg]/height² [m²]) has probably become the most common indicator used to assess overweight and obesity in a wide variety of settings, including clinical, public health, and community-based programs. Although it is certainly not a perfect surrogate for total body fatness and not without its technical limitations,1 BMI has been recommended as the most appropriate single indicator of overweight and obesity in children and adolescents outside of research settings.2–4

One of the attractive features of BMI is that it is derived from measurements of height and weight. These 2 anthropometric dimensions are the ones most commonly collected on children worldwide. These 2 measurements are noninvasive, relatively inexpensive to obtain, and relatively easily understood by health practitioners, the individuals being measured, and their families.

Mentioning child measurements of height and weight, individuals may be reminded of their own marks on the door sills and the bathroom scales of their childhood homes. So, although wide familiarity with height and weight enhances the use and understanding of a measure such as BMI, it also may desensitize health professionals to the need to give adequate attention to issues concerning how height and weight data are collected. Accordingly, one may hear the comment, “Anyone can measure height and weight.” Although one must actually agree with the language, if not the intent, of this easy declaration, many health professionals are unaware that there are consequences for the usefulness and accurate interpretation of BMI data that follow from decisions made concerning data collection.

In this article, challenges surrounding the measurement of BMI in US children (2–18 years of age) and the implications of these issues for the appropriate collection, use, and interpretation of BMI as it is used as an indicator of child and adolescent overweight and obesity are considered. Also, chief measurement issues related to other selected anthropometric indicators of overweight and obesity are briefly discussed.

SOME BASIC CONCEPTS FROM MEASUREMENT THEORY

Classical measurement theory includes some concepts that are helpful for understanding issues surrounding measurement of height, weight, and, therefore, BMI. Detailed explanations of measurement theory are available in standard textbooks concerning measurement and psychometrics.5,6 Different academic disciplines may use different terms to refer to the same concepts, but for the present discussion the terms usually found in the biomedical and epidemiologic literature will be used.

It is important to know that all measurements are imperfect and always measured with some error, whether the measurements be height, weight, skinfolds, or bioelectric impedance. Accordingly, an index such as BMI, which is derived from 2 other measurements, will include the components of measurement error inherent in the constituent height and weight measurements. The nature and magnitude of these measurement errors have some fairly predictable consequences related to the usefulness and interpretation of the measurements.

Some measurement errors are random, with the same probability of being smaller than or greater than the true value (a theoretical value measured without error). Consequently, the average or mean of random errors across a series of measurements is 0. For example, nurse Brown measured heights on a group of 4-year-old girls on Monday, and the mean height was 100 cm. She measured the same children a second time on Tuesday, again with a mean height of 100 cm. Nevertheless, for some girls there were small differences in height measurements between Monday and Tuesday, although the mean height of all girls remained the same for the 2 days. The differences between measured heights on Monday and Tuesday for the individual girls are examples of random errors of measurement.

Random errors of measurement are a concern, because they always add to the variability of the true measurements; their presence and extent are usually considered the measurement’s “reliability.” Poor measurement reliability is a concern because it may cause incorrect clinical judgments for individual children (misclassification) and alter conclusions for statistical analysis for groups of children. Because most inferential statistical tests use a measure of variation (eg, SD) as a denominator, statistical tests of differences between means, analysis of variance, correlations, regressions, and odds ratios are all attenuated (ie, less statistically significant) as the measurement reliability decreases and the variability term in the denominator increases. Random errors are usually reported in terms of a measurement error variance or a measurement error SD, or summarized in reliability coefficients (interclass or intraclass correlations) from replicate measurements of the same children.

In a second example, nurse Brown measured the same group of girls on Monday, again with a mean height of 100 cm. This time on Tuesday nurse Jones measured them for a second time and recorded heights exactly 1.0 cm taller than did nurse Brown for every girl. Now the mean height for all the girls on Tuesday was 101 cm. If we consider nurse Brown to be our gold
standard of measurement, this systematic measurement error (ie, all in 1 direction) of nurse Jones is an example of measurement bias.

Measurement bias is a concern because it may cause misclassification of individual children or groups of children. Nevertheless, as long as the bias is not differential among groups, pure measurement bias will not affect the results of statistical tests between or among groups, such as differences between means, analysis of variance, correlations (interclass), regressions, and odds ratios. In practice, differences between individual observers who measure the same children will also have a component of random measurement error between them. Not surprisingly, observers tend to measure more like themselves than like others, so interobserver errors are almost always larger than intraobserver errors.

Measurement theory usually specifies that measurement errors are independent and additive, that is, that the total measurement error variance is the sum of error variances from all sources.5 Also, when increments or differences between successive measurements are used, the measurement errors attending each of the 2 constituent measurements are included with the increment. So an increment has twice the random measurement error (variance) of an attained value and lower measurement reliability. Obviously, if measurement biases change over time, the accuracy of increments becomes questionable.

CHIEF SOURCES OF MEASUREMENT ERRORS FOR HEIGHT AND WEIGHT

When a child’s height and weight are measured, there are several possible sources of measurement error. A simplified theoretical model would say that the total variance of measurement error is the sum of that associated with the instrument used to measure, that associated with the child being measured, and that associated with the observer(s) doing the measuring. In most settings, however, errors associated with the child and with the observer(s) are the chief sources of measurement error in measurements of height and weight. Obviously, it is still important to have appropriate measuring equipment, but once it is installed and calibrated, little measurement error usually is due to the instruments per se.

Measurement Errors Due to Child Variation

The normal day-to-day variation within a child leads to a component of measurement error. This variation probably results from many sources including hydration, gastrointestinal and urinary bladder contents, diurnal hormonal fluctuations, saltatory growth, fidgeting, alterations in position, and fatigue.7,8

As early as 1724, Wasse recognized appreciable variation in stature during the day and concluded that “[t]he alteration in the human stature . . . proceeds from the yielding of the cartilages between the vertebrae to the weight of the body in an erect posture.”9 MRI studies have since confirmed that the diurnal variation in stature primarily results from increases in water content in the soft central portion of the intervertebral discs (nucleus pulposus) while at rest and water loss while standing or during other weight-bearing activities.10 For children, one can expect a mean height difference of ~1.5 cm (SD: 0.46 cm) between rising and late afternoon,11 with most of the change probably occurring during the first 2 to 3 hours of the day.12

In practice, it is helpful to understand the expected diurnal variation in child height but probably impractical to try to time height measurements to accommodate it, unless one is engaged in a rigorous research protocol that requires serial measurements on a small number of individual children. For the data included in the 2000 Centers for Disease Control and Prevention (CDC) growth charts,13 heights were measured from mornings through evenings so that the reference percentiles represent something like heights averaged throughout the day, and the associated within-child variation is included in the total variance in height captured in the published percentiles or z scores at an age.

For body weight, the within-child variation is related to the size of the child and should usually be within 1.5% of the measured weight (SD: 0.5%).14 Accordingly, the expected maximum within-child weight variation for children who weigh 25 and 50 kg should be ~375 g (0.83 lb) and 750 g (1.65 lb), respectively. In practice, it is difficult to standardize this physiologic within-child weight variation when children are measured, so it is usually ignored for most purposes.

Seasonal Variation

It has been known for a long time that in some environments children may grow differentially according to season of the year.15 It is important, however, to understand the contexts of these findings to determine the implications for current studies of height, weight, and BMI.

In developing countries with prevalent poverty, undernutrition, and infection, reduced seasonal patterns of average growth in height and weight are often linked to the rainy season(s), along with accompanying factors including reduced food availability and increased infection.16,17 In developed countries the evidence is mixed, but when seasonal patterns are present, they usually indicate relatively greater
growth in height and linear dimensions during the spring and summer and relatively greater growth in weight and fatness during the fall and winter.18,19 When seasonal fluctuations exist in developed countries, they are smaller and less common than those seen in children living in developing countries.

In studies in both Japan20 and the United States,21 seasonal fluctuations in growth were observed in earlier generations of children but disappeared within the same populations over 20 to 40 years because general health and nutrition conditions improved through time. Accordingly, for almost all children now living in the United States, there should be little if any seasonal variation in growth that would require accounting for it in the design of studies or data-collection protocols.

Excess growth in BMI has been observed over summer vacation between kindergarten and first grade for children in the Early Childhood Longitudinal Survey.22 Nevertheless, this should probably be viewed as a school/no-school effect rather than seasonal variation per se.

Measurement Errors Due to Observer Variation

An important goal in measurement of height and weight should always be to collect the data with as little measurement error as possible, given the practical and financial constraints of the local situation.

In a highly controlled research laboratory with experienced anthropometrists, the mean interobserver (absolute) differences for standing height and weight are 0.3 cm and 0.02 kg, respectively, with corresponding SDs of 0.2 cm and 0.03 kg.23 These values should be viewed as close to the minimum values possible using current methods. In most situations there is more concern about observer reliability in measurements of height rather than weight, because height measurements include more “opportunities” for within-child and observer variation than do weight measurements.

Often, height and weight measurements for BMI are collected in clinical or other settings in which data collection may be hurried and observers may not have been trained as rigorously as observers in research settings. Actually, there are few studies available concerning measurement variation among those who probably collect most of the data used for BMI evaluation and screening. Ahmed et al24 evaluated the measurement variation among 2 sets of health visitors who each measured each of 10 children at ages 3 and 4.5 years 3 times with a portable stadiometer. The average value for the SD of measurement was 0.47 cm. In a small comparison trial on height of 5- and 6-year-old British children, school nurses had a pooled interobserver measurement SD of 0.32 cm, which compared favorably to that of a trained auxologist (0.35 cm) on the study.25 The nurses in this study had been trained in measuring height. Importantly, training can improve the precision of length and height measurements.8,26

Given the above-listed principles, it follows that when a large number of data collectors are required the interobserver measurement errors increase as well.27 Consequently, one would prefer to have as few individuals measuring height and weight as is practicable in the particular setting, especially if the resulting data will be used for research purposes or if serial measurements on the same children are being made.

Another strategy for reducing measurement errors is to take the measurements more than once and then use the mean of the replicates. The theory here is that a mean of replicates is a better estimate of the “true” measurement, because the random errors of measurement are reduced.28 The usefulness of taking replicate measurements depends on the reliability of the single measurement in question and how the data will be used.

Routinely obtaining replicates benefits most those measurements that have the lowest initial reliability, and the corresponding improvements in reliability are predictable.28 Measurement-reliability coefficients (R) express the percentage of the total observed variation that is captured by the “true” measurement variation. For single measurements of height and weight in a nonresearch setting, a reasonable expectation for values of R should be ~0.93 and 0.97, respectively. At these levels of measurement reliability, collecting a second measurement and using the mean raises the values of R to 0.963 and 0.984, respectively. These are not dramatic improvements in measurement reliability using a duplicate, because the initial levels of measurement reliability started out rather high.

Contrast these possible improvements when using replicate measurements with those for skinfold thicknesses, for which the measurement reliability for a single measurement in nonresearch settings is probably ~0.8. For successive numbers of replicate skinfold measurements and using the mean, the R values would be 0.88 for 2 measurements, 0.92 for 3 measurements, and 0.94 for 4 measurements.

The errors of measurement with low measurement reliability are usually assumed to be largely random. Consequently, how the data are to be used is a consideration in deciding whether the extra time and trouble should be spent routinely collecting replicate measurements. Purely random errors
will not affect the group means of height, weight, and BMI, although they will increase the SDs because of the added error variance. Similarly, the prevalence of children with a BMI above percentile cutoffs for age and gender will not be affected by the added random error because as many children should be misclassified above and below the cutoff value. If the BMI data are to be used for these purposes, routinely taking replicate measurements is probably not worthwhile.

For some uses of BMI data, however, routinely taking replicate measurements is recommended. If the BMI data will be used to make clinical decisions regarding treatment or referral of individual children, or for assessing changes in individuals over time, a second measurement of height and weight will reduce misclassification of current status and increase the ability to detect changes from one occasion to another. In research settings that include height, weight, and BMI as important variables, duplicate measurements of height and weight are recommended. If the height and weight replicates are averaged before calculating BMI, the latter calculation only needs to occur once.

CHALLENGES OF USING APPROPRIATE REFERENCE DATA AND CUTOFFS

Which Reference Data?

Usually, BMI will be evaluated in children relative to reference data or growth charts. The main challenge to the investigator is to choose the set of growth charts that is most appropriate for the intended purposes for which the BMI data will be used. For height, weight, and BMI, US investigators have the benefit of recent recommendations from an expert committee.4,29

For most purposes, US children aged 2 to 18 years should be evaluated relative to the 2000 CDC growth charts.13 These are high-quality growth charts that present selected percentiles and allow calculation of z scores of attained height, weight, and BMI for age and gender and in metric and English units. The primary data were collected in national surveys by using rigorous measurement protocols, and state-of-the-art statistical methods were used to derive and smooth the percentiles and z scores across the ages. More detailed technical information on methods and development are available elsewhere.13 Earlier sets of BMI reference data for US children (eg, Must et al30) should not be used because the cutoff values are slightly different, which will serve to complicate comparisons across studies.

Some other countries have developed and use their own growth charts, but 2 sets designed for international applications should be briefly mentioned, particularly relative to BMI. The International Obesity Taskforce (IOTF) sponsored a workshop with a goal of establishing a standard definition for child overweight and obesity worldwide.31 As a result, high-quality BMI data from 6 countries (Brazil, Great Britain, Hong Kong, Netherlands, Singapore, and the United States) were combined to develop age- and gender-specific cutoffs for children (birth to 20 years of age) corresponding to the locations of the BMI values of 25 and 30 kg/m² in the statistical distribution of adults.19 These latter BMI cutoffs are the conventional criteria that identify overweight and obesity in adults.32

The IOTF cutoffs that define overweight and obesity correspond approximately to percentiles 82 to 84 and 96 to 97, respectively, on the 2000 CDC growth charts for BMI for age, not very different from the 85th- and 95th-percentile cutoffs used customarily in the United States. Prevalences of overweight and obesity in children in countries outside the United States are now being reported in the literature rather frequently using the IOTF criteria,33,34 which has been useful in standardizing BMI criteria. Nevertheless, it should be noted that the IOTF charts contain no percentile or z-score curves other than the 2 cutoff lines, because they were specifically designed for reporting population prevalences of overweight and obesity. Accordingly, the IOTF charts should not be used to monitor BMI growth in individual children.

In 2006 the Department of Nutrition and Health at the World Health Organization (WHO) released a new growth standard for children from birth to 5 years of age based on longitudinal and cross-sectional data collected in 6 countries (Brazil, Ghana, India, Norway, Oman, and United States).35 The new attained growth curves, including BMI, were designed to represent how all children ought to grow under ideal circumstances. Accordingly, the mothers and children were carefully selected so that there were no known constraints to healthy growth, including exclusive breastfeeding and appropriate introduction of solid foods.36 Because of the homogeneous nature of the WHO samples and some choices made to exclude the heaviest children, the upper BMI percentiles and z scores are somewhat restricted (ie, narrower) at an age compared with those in the 2000 CDC growth charts. Consequently, using the same percentile cutoff for BMI at an age (eg, ≥95th), the WHO standards will yield a higher prevalence of children than if the ≥95th percentile for age were used from the 2000 CDC growth charts.37 The opposite is true at the other end of the BMI distribution so that thinness defined by a low BMI percentile on the WHO standards will identify fewer children with low BMI compared with using the same percentile cutoff on the 2000 CDC growth charts.38
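As a brief aside on the replicate measurements discussed earlier, the quoted reliabilities for the mean of 2, 3, or 4 replicates agree to about two decimal places with the Spearman-Brown formula, R_k = kR / (1 + (k − 1)R). The text does not name the formula, so treating it as the underlying calculation is an assumption. The sketch below, with hypothetical function names and illustrative replicate readings, shows that calculation together with the suggestion to average height and weight replicates first and compute BMI once.

```python
from statistics import mean


def reliability_of_mean(r_single: float, k: int) -> float:
    """Spearman-Brown reliability of the mean of k replicate measurements.

    Assumption: this is the formula behind the reliabilities quoted in the
    text; the article itself does not name it.
    """
    if not 0.0 <= r_single <= 1.0:
        raise ValueError("r_single must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1.0 + (k - 1) * r_single)


def bmi_from_replicates(heights_cm, weights_kg) -> float:
    """Average the replicates first, then compute BMI once (kg/m^2)."""
    height_m = mean(heights_cm) / 100.0
    weight_kg = mean(weights_kg)
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    # Single-measurement reliabilities taken from the text.
    for label, r in (("height", 0.93), ("weight", 0.97), ("skinfold", 0.80)):
        values = ", ".join(
            f"k={k}: {reliability_of_mean(r, k):.3f}" for k in (1, 2, 3, 4)
        )
        print(f"{label:8s} {values}")
    # Hypothetical duplicate readings for one child (illustrative only).
    bmi = bmi_from_replicates([100.2, 100.6], [16.1, 16.3])
    print(f"BMI from averaged replicates: {bmi:.1f} kg/m^2")
```

The formula makes the article's point visible: gains from a second measurement are small when single-measurement reliability is already above 0.9, and larger when it starts near 0.8.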
One concern about using these new WHO growth standards is the interpretation in terms of the health or growth of children who are in the extremes of the percentiles (eg, <5th, >95th) on the basis of a standard that purportedly only included healthy children. Nevertheless, the WHO standards are so new that there are no data documenting whether the new cutoffs are better at identifying children at health risk than the 2000 CDC growth charts.

In 2007, the WHO released a growth reference for height, weight, and BMI for children aged 5 to 19 years that was designed to align with the 2006 WHO growth standards at 5 years and to be used internationally.39 The WHO reanalyzed the data comprising the US National Center for Health Statistics growth curves, published in 1977,40 and proposed that they be used as a single growth reference for screening, surveillance, and monitoring of school-aged children worldwide. As with children older than 24 months included in the new WHO birth to 5 years reference,35 BMI values of >2 SDs were excluded as unhealthy for the 2007 5 to 19 years reference.39 Because the heaviest children were excluded, the upper percentiles of BMI for the WHO 2007 reference are substantially below the corresponding levels for the 2000 CDC growth charts, especially in later adolescence when high BMI values are more common.

There has been much informal discussion about the use of the IOTF and WHO references. Unfortunately, there have been no formal recommendations from agencies or professional organizations in the United States regarding their routine or partial use (eg, at certain ages or for certain purposes). This institutional silence is unfortunate, because it will likely lead to at least ambiguity and perhaps even confusion among health practitioners and in the scientific literature.

As a personal recommendation for health practitioners in the United States, the 2000 CDC growth charts should be used for routine screening, surveillance, and monitoring of BMI because they have been widely evaluated and adopted, and they have been recommended by recent expert committees.4,29 If investigators wish to communicate with international colleagues in presentations and in the scientific literature by citing the IOTF or WHO criteria, they should also include at least prevalence results relative to the 2000 CDC growth charts so that their findings can be compared with those of other US studies. Hopefully, as further research becomes available, more specific recommendations can be made on the basis of studies of sensitivity/specificity and differential risk among the various BMI criteria currently available.

A Rose by Any Other Name

Before 1994 the scientific literature on overweight and obesity included a wide range of defining criteria (eg, percent ideal weight, skinfold thickness, ponderal index, BMI) and many descriptive names to refer to the children and adolescents who were considered the fattest. This variation in reporting made it difficult to compare findings because different indicators may actually identify different children as the fattest,41 and the differences in terminology were sometimes confusing. An expert committee considered these issues, and their proceedings, published in 1994,2 had considerable effect toward standardizing the criteria (BMI for age) and the nomenclature for referring to the fattest children and adolescents. Subsequently, these definitions became preferred in describing weight status.3,42,43

In the 1994 report,2 children with a BMI that exceeded 30 kg/m² or ≥95th percentile for age and gender (whichever was smaller) were considered overweight. Children or adolescents with a BMI at ≥85th percentile but <95th percentile were considered at risk of overweight. At that time, the term “obese” was avoided, because obesity was technically defined in terms of body fat per se, and BMI was derived only from height and weight.

In 2005, the Institute of Medicine (IOM) consciously departed from the terminology discussed above and elected to define children with a BMI at ≥95th percentile for age and gender as obese rather than overweight.44 The IOM report expressed the seriousness, urgency, and medical nature of childhood obesity and deliberately sought to express this concern by using the term “obese” to refer to the children and adolescents with the highest BMI.

A recent expert committee endorsed the IOM position and recommended to replace the terms “at risk of overweight” and “overweight” with the terms “overweight” and “obese,” respectively.4,29 Accordingly, the expert committee recommended that individuals 2 to 18 years of age with a BMI of >30 kg/m² or ≥95th percentile for age and gender (whichever is smaller) should be considered obese. Individuals with a BMI at ≥85th percentile but <95th percentile or 30 kg/m² (whichever is smaller) should be considered overweight.

The expert committee believed that the terms “overweight” and “obese” better convey the seriousness and importance of the obesity epidemic to health providers, parents, and children and in a less ambiguous manner than the previous terms, although no specific literature was cited to support this view. Because BMI identifies the fattest individuals with acceptable accuracy, especially at the highest levels of BMI,45,46 the expert committee believed that choosing more direct terms that may provide additional impetus for
treatment and change was to be preferred to parsing technical concepts that would be unlikely to aid understanding. Finally, the new terminology comports with that from the IOTF BMI criteria for children and adolescents,47 with conventional terminology for adults,32,48 and with the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM).

Nomenclature really does matter; it is a sine qua non with standardized definitions of health conditions. Standardized nomenclature increases precision in scientific and public communication and provides improved understanding in health guidance.

Precision of Percentile Estimates

Often, health providers and researchers use the exact BMI-for-age cutoffs that define overweight and obesity as ironclad diagnostic criteria. Although standardized definitions are essential, as discussed above, the actual measurements on the child will always vary somewhat as a result of child and observer factors. In addition, the actual percentile cutoffs themselves are statistical estimates of points that are also subject to errors.

Let us assume that the basic data that were used to construct the 2000 CDC growth charts13 are truly representative of the US population of children and adolescents, and that the statistical procedures used to smooth the percentile values across age were appropriate and unbiased. There still remains a degree of uncertainty regarding the point estimates of the final percentile values related to the number of children that were included in the samples within each 6-month age group used to estimate the percentiles. Simply put, the larger the sample, the more precise the percentile estimates, especially at the extremes of the distribution.

FIGURE 1
The 85th, 95th, and 99th percentiles for BMI in girls (straight horizontal lines) and 95% CIs calculated from the number of subjects included at each age in the 2000 CDC growth charts.

As an example, Fig 1 presents the 85th, 95th, and 99th percentiles of BMI for girls as straight lines and the respective 95% confidence intervals (CIs) calculated by using the method of Wilson49,50 and the unweighted sample sizes within age groups (15–20 years) used for the 2000 CDC growth charts.13 At younger ages the sample sizes range from 400 to 639, and the 95% CIs are quite stable and similar to those at 15 and 16 years. The sample sizes and corresponding 95% CIs for boys are similar to those for girls. The 99th percentile of BMI for age was not originally published with the growth charts but has been suggested as a useful cutoff for identifying children at added health risk.51

On the basis of sample sizes, the 95% CIs around the 85th BMI percentiles include values approximately between the 81st and 88th percentiles until ~17.5 years, when the sample sizes decrease and the 95% CIs become wider. These CIs mean that at 20 years of age (the most extreme case), a girl whose BMI percentile corresponds to the 85th percentile on the CDC chart may actually have a BMI anywhere between the 78th and 90th percentiles because of the imprecision of the percentile estimates.

For the 95th BMI percentile estimates before ~17.5 years, the 95% CIs range from approximately the 92nd to the 97th percentiles but increase in spread, reaching from the 90th to 98th percentiles at the older ages. The 95% CIs around the 99th BMI percentiles for girls include from approximately the 97th to effectively just less than the 100th percentiles (because no point can exceed percentile 100).

After ~18 years of age, the upper 95% confidence limit for the 85th percentiles and the lower 95% confidence limit for the 95th percentiles are approximately coincident, and the upper limit of the 95th and the lower limit of the 99th percentiles actually overlap. This means, for example, that a 19-year-old girl with a BMI identified as being at the 99th percentile by the 2000 CDC growth charts (or by computer programs that calculate the exact percentiles) will probably have a BMI somewhere between the 96th and 100th percentiles.

The precision of the upper percentile cutoffs for BMI can be viewed from several different perspectives. First, the samples and CDC growth charts are as they are, and no revisions are anticipated in the near future. Consequently, those who use the growth charts should understand their limitations in interpreting findings and not wait for more precise estimates. Actually, additional imprecision beyond that related to sample size probably also occurs at some ages in adolescence because of differences in maturational status.52

Given the range of CIs surrounding the BMI percentiles at all ages, health providers and investigators should be a little less stringent in defining the exact location of a child or groups of children in the BMI distribution relative to the growth charts. Accordingly, BMI values just below or just above recommended cutoffs should be interpreted as only 1 indicator and not the only diagnostic criterion for clinical decisions. Follow-up visits and repeated assessments on other occasions should reduce the uncertainty of the child’s BMI status.

The fairly wide confidence limits around the percentiles do not invalidate the recommended BMI cutoffs for standardized reporting of population prevalences or for analyses of the associated risk profiles of groups of children.51 Nevertheless, investigators should be cautious drawing inferences from risk ratios comparing the observed and expected prevalences beyond a given BMI cutoff because of the imprecision of the percentile cutoffs.

When Should z (SD) Scores Be Used?

[. . .] the percentile charts cease to be useful for differentiating their growth status. Accordingly, cutoffs of less than −2 z for height for age and weight for age have become conventional definitions for stunting and wasting, respectively.53

In a similar fashion, for overweight and obesity in children and adolescents, z scores can be useful for characterizing individuals with a high BMI that exceeds the percentile levels available on the growth charts. For example, if the progress of a girl with a BMI that far exceeds the 97th percentile for age (currently the highest percentile available on the CDC charts) is [. . .]

[. . .] age, calculating the percentage excess of a BMI value or percentage overweight beyond a percentile value is inappropriate, because it will have inconsistent meaning from age to age.

CHALLENGES OF MEASURING HEIGHT, WEIGHT, AND BMI

Summary reminders concerning data collection and management are listed in Table 1. The particular setting in which data for BMI assessments will be collected has implications for how or whether the recommended practices can be implemented.

Equipment and Space
monitored, her attained BMI on the If possible, height should be measured A BMI z or SD score is the BMI of a child growth chart is difficult to evaluate to the nearest 0.1 cm (1⁄4 in) by using transformed into a scale comprising and impossible to meaningfully quan- a stadiometer mounted on the wall or the number of SD units it is away from tify. On the other hand, by converting a portable stadiometer that allows the mean of the referent population of her BMI to a z score, her progress can the child to be positioned properly the same age and gender. The 2000 be monitored and changes in subse- with his or her back against a vertical CDC growth charts13 were constructed quent z scores have a direct interpre- surface. A second choice are models in such a way to allow calculation of z tation relative to the referent popula- that measure the child freely stand- scores for BMI. tion of her age. Because z scores are ing, but the measurement errors for There are several advantages to using calculated relative to age, noting a these latter instruments tend to be z scores compared with using the cor- change in z score is an appropriate larger than when the measurements responding percentiles, although they way to evaluate changes in BMI across are taken with the child standing both describe a child’s status relative ages relative to what is expected in the against a surface.55 The height mea- to the same reference data set. The pe- referent population. surements for the 2000 CDC growth diatric applications of z scores that An alternative to using z scores to charts13 were taken by using wall- are most common are probably in en- evaluate change in individual children mounted stadiometers. Many brands docrinology or nutrition where chil- with elevated BMI is to just use change of acceptable stadiometers are avail- dren who are very small relative to the in BMI itself. 
These changes are un- able, and searching on-line will pro- growth charts are seen and z scores derstandable to practitioners, adoles- vide several good choices. Stadiom- provide a more useful and manage- cents, and families, and they allow set- eters attached to scales that do not able metric than percentiles to evalu- ting of goals and monitoring of progress. allow the child to be positioned cor- ate and monitor status or treat- Using z scores is currently the only rectly are not recommended. ment.53,54 For example, a 3-year-old boy appropriate way available to quantify Weight should be measured by using a with a height-for-age z score of ⫺3.6 the severity of obesity in children who good-quality scale to the nearest 100 g has a height that is 3.6 SDs lower than have BMI levels that exceed the avail- (1⁄4 lb). In the past, balance-beam scales the age- and gender-specific mean for able percentiles for age and gender. were routinely recommended because him on the growth charts; his corre- Unfortunately, z scores require a com- the only alternatives were spring scales sponding height-for-age percentile is puter program to calculate them that were less dependable. Now there 0.013. readily, and the SD-related metric is are many good electric scales avail- When a high proportion of children not familiar to many practitioners. Be- able that are also quite portable. The have heights and weights less than the cause the total variation in BMI (eg, the more expensive scales have multiple lowest percentiles (eg, 3rd, 5th), as distance between the 5th and 95th per- pressure transducers under the weigh- found in many developing countries, centiles) progressively increases with ing platform, so they are less sensitive Downloaded from www.aappublications.org/news by guest on October 10, 2021 S10 HIMES
to variation in the child's position and shifting of weight from one leg to the other. Again, an Internet-based search will yield many good alternatives.

TABLE 1 Data-Collection and Management Practices for Reducing Errors for Height, Weight, and BMI

Equipment and space
  Choose appropriate equipment
  Check and calibrate equipment regularly
  Keep extra batteries for scales
  Provide a private area for child measurements, if possible
Measurement protocols
  Choose a protocol that matches that used in the growth charts
  Have written copies of measurement protocols available for review
  Train and standardize data collectors
  Make sure data are recorded in the appropriate units (eg, kilograms, pounds)
  Make sure data are measured and recorded to the nearest unit specified in the protocol (eg, 0.1 cm for height, 0.1 kg for weight)
  Collect some replicate measurements for assessment of reliability, if feasible
Personnel
  Use as few observers as is feasible to take measurements, especially for research studies
  Identify observers on data-collection forms or data-entry programs
Data management
  Use as exact ages as possible
  Have unique identifiers for children
  Calculate BMI, percentiles, and z scores by using tables or computer programs

In a research setting, obviously, the best-quality equipment should be chosen for maximum consistency over time and for reliability among observers taking the measurements. In clinical or community settings, cheaper alternatives are often used, but given the heavy utilization in a busy clinic, for example, investing in sturdy anthropometric equipment that can be calibrated if necessary will prove worthwhile and increase confidence in the measurements. Cheaper models of stadiometers tend to have less-rigid parts that wobble or bend with frequent use.

With repeated use or if equipment is moved about fairly often, stadiometers and scales should be checked to determine if they are calibrated correctly. It is important to develop a regular schedule for calibration (eg, daily in research, weekly in clinic) and assign someone to be responsible for these duties. Depending on the installation, good stadiometers usually can be calibrated by using a metal rod of a fixed length.

Good electric scales can be calibrated or "zeroed." In most areas of the United States, state agencies in departments of commerce, standards, or agriculture have representatives who calibrate and certify scales in grocery stores and in other commercial venues. In some cases, these representatives can be called on to routinely check and calibrate scales at permanent sites. Alternatively, scales can be calibrated by using weights of known size. If models of electric scales are used in clinic or in the field, ensuring a supply of batteries of the appropriate size should be on the checklist for routine equipment maintenance.

Often, in busy clinic or school situations, stadiometers and scales are relegated to hallways or even reception areas. Children and adolescents may find it embarrassing to be measured, and even more so to have witnesses to the procedures.⁵⁶ Having a private or partially screened area for the height and weight measurements will increase child cooperation and enhance the patient confidentiality sought by institutional human subjects committees.

Measurement Protocols

Because health providers and others who use BMI data will almost always compare them to the growth charts, it makes sense to strive to collect the height and weight measurements that comprise BMI by using protocols that match those used in the reference data as closely as possible. The measurement procedures used in the collection of the height, weight, and BMI data for the 2000 CDC growth charts¹³ are currently available as a downloadable file at the CDC National Health and Nutrition Examination Survey (NHANES) Web site (www.cdc.gov/nchs/data/nhanes/bm.pdf). These measurement protocols follow closely those recommended by a US consensus group.⁵⁷ This publication has become the gold-standard reference in the United States for anthropometry methods related to health issues, although slight differences exist for some measurements customarily used internationally.⁵⁸

It is important to train data collectors in the appropriate methods for measuring height and weight. Again, the goal is to use the same measurement protocols that were used for the derivation of the growth charts. Sometimes, experienced clinic staff may take offense because they have been measuring height and weight for a long time. Often, however, "the way we do it here" includes some bad habits or deviations from the prescribed protocols. Standardizing all data collectors to a gold-standard trainer ensures that a single protocol is followed and that departures from the trainer are within acceptable limits.⁵⁹

For extended research protocols or for ongoing surveillance or clinical activities, having a gold-standard trainer periodically visit and observe measurements or take some replicate
measurements will help prevent "drift" in the measurement techniques. Also, these opportunities can be used to correct and recertify data collectors, if necessary.

Laminated copies of the measurement protocols on-site provide a readily available reminder for data collectors concerning child position, measurement landmarks, and local policies regarding calibration, clothing, exclusion criteria, data recording, data flow, etc.

In research settings a certain proportion of the measurements should be repeated to evaluate measurement reliability. The proportion required depends on the number of different observers concerned, the number of children usually measured, and the period over which reliability will be assessed. In general, there should be enough replicates to capture the variation among data collectors and study design features and to capture a fairly stable estimate of the mean differences between replicates and the accompanying SD. The SD of differences between replicates is really a measure of variance, and the CIs for a variance begin to stabilize at sample sizes larger than 20 (in our case, 20 pairs of measurements).⁶⁰

As an example, for a hypothetical study in school-aged children, a 3-person measurement team will visit 4 different schools during a month of data collection. Each school has an average of 30 children, and the team will average ~10 children measured per day. So, each school will require 3 days of data collection, and ~120 children will be measured. If a target of 25 replicates is sought for assessing measurement reliability, that amounts to an ~20% sample. One simple approach is to specify that the data collectors remeasure 2 children per day and that a different data collector from the one who measured the child the first time take the measurements. Over the course of the month of data collection, the variation among observers, schools, and any study drift will be captured in the final reliability sample, which should include data on ~25 children. For complicated protocols that involve many measurements or administration of other instruments, children may contribute only a replicate for one of the measurements so that the burden on any one child is small and the total of 25 replicates may represent many more individual children. The calculation of the relevant measurement-reliability statistics has been explained elsewhere.²⁷,⁶¹

If the measurement protocols specify that duplicate measurements be routinely collected for all subjects (as recommended above), then these replicates can be used for assessing measurement reliability as long as all the different data collectors involved in the study take the replicates. If the mean of replicate measurements will be used in statistical analyses, the measurement reliability should take this into account.⁶¹ If different data collectors usually work on different days, then special scheduling may be required to accommodate fully capturing the interobserver variation in the reliability sample.

Personnel

Experience shows that an advanced formal education is not required to take high-quality anthropometric measurements. Willing adults who will give adequate attention to detail and who meet the requirements for employment are usually satisfactory. Members of the community who are familiar with the local ethos and jargon may be excellent data collectors. In some situations, like-gender observers may make children and adolescents more comfortable with the touching required for anthropometric measurements.

As mentioned previously, having as few data collectors as is feasible for other practical demands will minimize interobserver measurement variation. Ensuring that unique observer codes are included on the data-collection forms or data-entry computer programs will aid in quality-assurance activities and can even be used in the statistical analyses if consistent observer measurement bias becomes apparent.

Data Management

Having chronological ages as exact as possible is important for the accurate calculation of percentiles and z scores, and exact ages will aid in minimizing age-related variance in statistical analyses when children are grouped according to age. Chronological ages in years expressed to at least 2 decimal places are sufficient for most applications; this will capture exact ages to the nearest 3 days. For children less than 5 or 6 years of age it may be more convenient to express age in months to 1 decimal place, or in exact days.

Actual values for BMI, BMI percentiles, and BMI z scores are best calculated by using computer programs to avoid computational errors. There are many Web sites with BMI calculators that can be found easily by using Internet searches, including those provided by the CDC (http://apps.nccd.cdc.gov/dnpabmi/Calculator.aspx) and National Institutes of Health (www.nhlbisupport.com/bmi/bminojs.htm).

In some settings where immediate patient feedback or charting is conducted, calculating BMI by using tables may be preferred. Again, many Web sites provide such tables; the only caution is that some BMI tables are designed for adults and may not include the low heights and weights observed in children.²
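The Data Management steps (exact decimal age, BMI, and a z score relative to reference data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CDC's software: the function names are hypothetical, and the L, M, and S values in the example are made-up placeholders rather than CDC reference values. In practice those three parameters would be looked up in the published reference tables for the child's sex and exact age. The z-score function uses the LMS transformation, which is the form in which the 2000 CDC growth-chart reference data are distributed.

```python
from datetime import date
from math import log
from statistics import NormalDist


def decimal_age_years(birth: date, visit: date) -> float:
    """Chronological age in years, rounded to 2 decimal places."""
    return round((visit - birth).days / 365.25, 2)


def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI in kg/m^2 from weight in kilograms and height in centimeters."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def lms_z_score(value: float, l: float, m: float, s: float) -> float:
    """z score from LMS parameters (Box-Cox power L, median M, coefficient S)."""
    if l == 0:
        return log(value / m) / s
    return ((value / m) ** l - 1.0) / (l * s)


def percentile_from_z(z: float) -> float:
    """Percentile (0 to 100) of a z score under the standard normal curve."""
    return 100.0 * NormalDist().cdf(z)


if __name__ == "__main__":
    age = decimal_age_years(date(2000, 3, 15), date(2009, 9, 1))
    child_bmi = bmi(weight_kg=32.0, height_cm=135.0)
    # Placeholder L, M, S values for illustration only; NOT from any CDC table.
    z = lms_z_score(child_bmi, l=-2.0, m=16.5, s=0.12)
    print(f"age={age} y, BMI={child_bmi:.1f} kg/m^2, "
          f"z={z:.2f}, percentile={percentile_from_z(z):.1f}")
```

Keeping the unit conversion inside `bmi` (centimeters to meters) reflects the Table 1 reminder to record data in the units the protocol specifies; a data set recorded in pounds and inches would need its own conversion before this function is called.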
Exact BMI percentiles and BMI z scores can be calculated by using Epi Info, a free, user-friendly and downloadable computer program developed by the CDC (www.cdc.gov/epiinfo). At the CDC Web site, researchers can download a program for SAS statistical analysis software that generates a data set containing the percentiles and z scores for all the anthropometric measurements (including BMI) in the 2000 CDC growth charts (www.cdc.gov/nccdphp/dnpa/growthcharts/resources/sas.htm).

SELF-REPORTED HEIGHT, WEIGHT, AND BMI

Having older children and adolescents report their height and weight rather than having someone directly measure them is attractive economically and logistically. Costs of direct anthropometric measurements include additional time, personnel, training, and equipment. Logistically, direct measurements require an in-person examination, space, and additional time for participants. If direct measurements of height and weight are required, some study designs and data-collection strategies are summarily inadequate or eliminated (eg, mail surveys, classroom surveys, telephone surveys). Of course, the appropriateness of using self-reports of height, weight, and BMI depends on the reliability, bias, validity, and specific applications of these measures. In some cases, self-reported data may be all that exist, so it is important to understand when and how such data might be used appropriately.⁶²

Measurement Reliability

No published data are available on reliability in self-reported height and weight as narrowly defined previously (ie, the random error associated with the same measurement being repeated). Such data would comprise the same children being asked for their reported height and weight at least twice over a period of time insignificant for growth.

Reliability in self-reports has been evaluated in adolescents, considering reliability as the random errors associated with the differences between self-reported height, weight, and BMI and the corresponding measured dimensions. A good summary measure of this reliability is the Pearson or interclass correlation coefficient. Correlation coefficients between reported and measured height, weight, and BMI are presented in Table 2 for some selected studies that reported the correlations according to gender.

Overall, the correlations for reported and measured dimensions are relatively high, indicating that self-reported values are generally reasonable proxies for the corresponding measured values. On the basis of the correlation coefficients, boys generally do a little better than girls, and weight is usually more reliably reported than is height. Because self-reported BMI combines the random errors in both height and weight, self-reported BMI generally has lower correlations with measured BMI than corresponding associations observed for reported and measured height and weight.

TABLE 2 Selected Studies With Interclass Correlation Coefficients for Reported and Measured Height, Weight, and BMI According to Gender

Source               Group/Location             Age or Grade Level       n, Male/Female  Height r      Weight r      BMI r
                                                                                         Male  Female  Male  Female  Male  Female
Davis and Gergen⁶³   Mexican American/US        12–19 y                  392/437         0.86  0.86    0.95  0.93    0.87  0.85
Himes and Faricy⁶⁴   All/US                     12–16 y                  759/876         0.89  0.79    0.97  0.93    0.93  0.87
Himes and Story⁶⁵    American Indian/Minnesota  12–19 y                  41/28           0.91  0.71    0.96  0.91    0.90  0.80
Hauck et al⁶⁶        American Indian/US         12–19 y                  536             0.83  0.62    0.95  0.90    0.88  0.79
Brener et al⁶⁷       20 states/US               Grades 9–12              957/1075        0.87  0.82    0.92  0.94    0.89  0.89
Himes et al⁶⁸        Minnesota/US               12–18 y                  1936/1861       0.90  0.80    0.96  0.94    0.89  0.85
Tsigilis⁶⁹           Trikala, Greece            Middle and high schools  141/159         0.94  0.93    0.97  0.97    0.90  0.94
Median r                                                                                 0.89  0.80    0.96  0.93    0.89  0.85

The youngest-aged children included in these studies were 11 to 12 years old, and correlations between self-reported and measured dimensions, especially height, are usually lower at these ages than they are later in adolescence.⁶⁴,⁷⁰,⁷¹

A slightly different concern about young adolescents is that they are often unable or decline to report their heights and weights.⁶²,⁷² In a study based on US national-level data, 41% of 12-year-olds and 25% of 13-year-olds had missing data for weight.⁶⁴ These rates compared with 4% missing reported weights in 15- and 16-year-olds. It may be that for youth aged 11 to 13 years their height has not yet become as important to them as it will be as they get older, and they may not have regular opportunities to have their height measured.

Measurement Bias

Although Pearson correlation coefficients are useful indicators of reliability, they only provide average associations, and they only account for random errors between reported and measured values. Pearson correlations are blind to systematic errors or bias. Several different sources of bias
in self-reports of height and weight have been investigated, and they were recently reviewed for studies on US adolescents.⁶²

For our discussion, it is important to recognize that the mean values of self-reported height are usually overestimated by ~1 to 2 cm, and mean self-reported weight is usually underestimated by 2 to 4 kg, especially so in girls.⁶²,⁷² Thus, with overestimated height and underestimated weight, mean BMI values calculated from the self-reported data are usually less by 2 to 3 BMI units (kg/m²) than if they were measured.

Another source of bias that is important for understanding how self-reported data might be used in evaluating overweight and obesity is related to the body size of the children and adolescents providing the self-reports. The mean differences for self-reported values less measured values for height, weight, and BMI are presented in Fig 2 relative to categories of the measured dimensions for a sample of 3797 Minnesota youth aged 12 to 18 years.⁶⁸

FIGURE 2 Mean differences between self-reported and measured body size adjusted for age, socioeconomic status, and race/ethnicity, plotted against categories of the measured dimension: A, height; B, weight; C, BMI.⁶⁸

For height, the errors in self-reporting are largely positive because most of the youth overestimated their heights (mean differences: boys, 1.2 cm; girls, 2.4 cm). Nevertheless, a strong negative relationship between the errors in reporting height and the actual measured heights is evident so that the only group actually underestimating height was the very tallest boys. For self-reported weight and BMI, the errors in self-reports became increasingly negative (indicating underestimates) as categories of measured weight and BMI increased, with steeper slopes in girls than in boys.

This pattern of underestimation means that the greatest impact of the bias in self-reported BMI will be to underestimate prevalences of overweight and obesity defined by the upper percentiles (eg, 85th, 95th). For example, in a separate study of high school students by Brener et al,⁶⁷ the prevalences for overweight (≥85th percentile) were 47.4% for directly measured BMI and 29.7% for self-reported BMI. Corresponding prevalences for obesity (≥95th percentile) were 26.0% for measured BMI and 14.9% for self-reported BMI. Unfortunately, there is no easy conversion from a prevalence based on self-reported BMI to what it would have been if height and weight were measured.

The biases in self-reports are entangled in idiosyncratic differences among samples in gender, age, underlying distributions of BMI, and perhaps race, mental health, and socioeconomic status.⁶²,⁶⁸

Measurement Validity

From the evidence for bias discussed above, it is not surprising that considerable misclassification occurs when children and adolescents are identified as overweight or obese on the basis of self-reports and the BMI-percentile criteria. In the Brener et al⁶⁷ study, the sensitivity and specificity of self-reported BMI for identifying overweight adolescents were 60.5% and 98.0%, respectively. Corresponding values for sensitivity and specificity for identifying obese individuals were 54.9% and 99.2%, respectively. So, as few as 55% (the sensitivity) of those who are truly overweight or obese will be correctly identified as such when using BMI calculated from self-reported heights and weights. Results from other studies of validity are not much more encouraging.⁶²

The validity of BMI using self-reported data relative to total body fat has not been evaluated. Nevertheless, given the modest validity relative to measured BMI, BMI derived from self-reported data must be even poorer than measured BMI in its ability to correctly identify the fattest individuals on the basis of laboratory methods.

When Is It Appropriate to Use Self-reported BMI?

In some situations, BMI values derived from self-reported data are the only data available (eg, the CDC Youth Risk Behavior Surveillance System,⁷³ which collects data through telephone interviews from a national sample). In other cases, the complexity and size of the survey make direct measurements impractical.⁷⁴ Nevertheless, any use of