The Utility of Alternative Fit Indices in Tests of Measurement Invariance
Meade, A. W., Johnson, E. C., & Braddy, P. W. (2006, August). The Utility of Alternative Fit Indices in Tests of Measurement Invariance. Paper presented at the annual Academy of Management conference, Atlanta, GA.

Adam W. Meade
Emily C. Johnson
Phillip W. Braddy
North Carolina State University

Confirmatory factor analytic tests of measurement invariance (MI) based on the chi-square statistic are known to be sensitive to sample size. For this reason, Cheung and Rensvold (2002) recommended using alternative fit indices in MI investigations. However, previous studies have not established the power of fit indices to detect data with a lack of invariance. In this study, we investigated the performance of fit indices with simulated data known to not be invariant. Our results indicate that alternative fit indices can be successfully used in MI investigations. Specifically, we suggest reporting McDonald's noncentrality index along with CFI and Gamma-hat.

Measurement invariance (MI) can be considered the degree to which measurements conducted under different conditions yield measures of the same attributes (Drasgow, 1984; Horn & McArdle, 1992). These different conditions include stability of measurement over time (Chan, 1998; Chan & Schmitt, 2000), across different populations (e.g., cultures, Riordan & Vandenberg, 1994; gender, Marsh, 1985, 1987; age groups, Marsh & Hocevar, 1985), rater groups (e.g., Facteau & Craig, 2001), or over different mediums of measurement administration (Chan & Schmitt, 1997; Ployhart, Weekley, Holtz, & Kemp, 2003). Recently, there has been a substantial increase in research involving tests of MI, due in part to an increased awareness of the importance of comparing equivalent measures, as well as increased access to and understanding of the methodology used to perform tests of MI (Meade & Lautenschlager, 2004; Vandenberg, 2002).

Though multiple methods of establishing MI exist, multiple-group confirmatory factor analysis (CFA) has been the most commonly used method in organizational research (Vandenberg & Lance, 2000). With these tests, constrained and free CFA models are typically compared using a chi-square-based likelihood ratio test (LRT; sometimes called a chi-square difference test). However, like chi-square tests of overall model fit, the LRT has been shown to be sensitive to sample size (Brannick, 1995; Kelloway, 1995; Meade & Lautenschlager, 2004). Thus, in large samples, power to detect even trivial differences in the properties of a measure between groups is extremely high, potentially leading to over-identification of a lack of invariance (LOI). For this reason, Cheung and Rensvold (2002) examined the potential use of change in alternative fit indices in MI investigations. As with overall model fit, these alternative fit indices (AFIs) are less strongly affected by sample size than is chi-square in measurement invariance tests.

While their groundbreaking work is extremely promising, one crucial omission from Cheung and Rensvold's study is that they only examined the performance of AFIs under the null hypothesis of perfect MI between groups. Thus, while they recommended the use of some AFIs, the power of these indices to detect a lack of invariance between groups is unknown. This deficiency in the literature precludes more widespread use of AFIs in MI studies, as researchers have no indication that AFIs are sensitive to an LOI. One reason for this omission from their study is that no standard measure or amount of effect size has been established in MI research. Thus, it is difficult to justify the simulation of any one level of LOI between groups. This study overcomes this limitation by generating data with many levels of a lack of invariance (from trivial to severe) in order to examine the performance of changes in AFIs (ΔAFIs) in MI tests of equal factor loadings.
AFIs in Measurement Invariance 2

CFA Tests of MI

Measurement invariance can be technically defined in terms of probabilities such that in order for MI to exist, the probability of observed responses conditioned upon latent scores must be unaffected by group membership (Meredith & Millsap, 1992; Millsap, 1995). Commonly used CFA tests of MI involve simultaneously fitting a measurement model to two or more data samples. The multi-group CFA measurement model between p observed variables and m latent factors is given by the equation:

$$X^{g} = \tau^{g} + \Lambda^{g}\xi^{g} + \delta^{g}, \tag{1}$$

where $X$ is a p x 1 vector of observed scores, $\tau$ is a p x 1 vector of intercepts, $\Lambda$ is a p x m matrix of factor loadings, $\xi$ is an m x 1 vector of latent variable scores, $\delta$ is a p x 1 vector of unique factor scores, and the superscript $g$ denotes that these parameters are group specific. Observed variable covariances are then given as:

$$\Sigma^{g} = \Lambda^{g}\Phi^{g}\Lambda'^{g} + \Theta^{g}, \tag{2}$$

where $\Sigma^{g}$ is a p x p matrix of observed score covariances, $\Phi$ is an m x m latent variance/covariance matrix, and $\Theta$ is a p x p diagonal matrix of unique variances.

MI can therefore exist for multiple parts of the CFA model. For instance, if $\Lambda^{g} = \Lambda^{g'}$ for all groups, metric invariance is said to exist (Horn & McArdle, 1992); if $\tau^{g} = \tau^{g'}$ for all groups, scalar invariance is indicated (Meredith, 1993); and if $\Theta^{g} = \Theta^{g'}$ for all groups, uniqueness invariance exists. If all three types of invariance are found, strict factorial invariance is indicated (Meredith, 1993) such that differences in observed score means or covariances are a product of differences in latent means (sometimes called impact; Holland & Wainer, 1993) or latent covariances.

Typically, when conducting CFA MI tests, a sequence of nested multi-group models is examined in order to detect an LOI across samples. In the first model, both data sets (representing groups, time periods, etc.) are examined simultaneously, holding only the pattern of factor loadings invariant. This model serves two functions. First, it serves as a test of configural invariance (Horn & McArdle, 1992); that is, poor fit of this model indicates that either the same factor structure does not hold for the two samples, or that the model is misspecified in one or both samples. Second, it serves as a baseline of model fit for comparison to other, more restrictive models. Once adequate fit is established for this model, tests of equality of parameters in the CFA model are conducted in a series of sequential models in which, typically, factor loadings, intercepts, and uniqueness terms or other model parameters are constrained in sequence. Once a statistically significant decrement in model fit is witnessed, an LOI is indicated in those parameters most recently constrained (see Vandenberg & Lance, 2000 for a review). While there is some disagreement as to how many model parameters must be equal before MI is established, the most commonly investigated aspect of the MI model is the equality of factor loadings (Vandenberg & Lance, 2000). Moreover, factor loadings and item intercepts are generally considered to be the most important aspects of the model essential for MI to be established (Meade & Kroustalis, in press). For this reason, we focused on MI tests of factor loadings (metric invariance) for this initial investigation.

Alternative Fit Indices (AFIs)

We could locate only one published study that has simulated data in order to determine the feasibility of using differences in AFIs to establish measurement invariance. In this study, Cheung and Rensvold (2002) achieved several important goals. First, they specified three criteria desirable in an AFI used for establishing MI. These include (1) independence between the overall fit in the baseline model and the change in AFI witnessed with the imposed model constraints (ΔAFI), (2) insensitivity of the AFI to model complexity, and (3) a lack of redundancy with other AFIs. The first of these criteria is important because the degree to which sampling error is present in the data should influence the baseline and constrained models to the same degree. The extent to which this is true will be manifest via a lack of correlation between the initial AFI value and the ΔAFI associated with the additional constraints on the model. Cheung and Rensvold investigated the performance of twenty AFIs with regard to these three criteria. These twenty included χ², χ²/df (Wheaton, Muthen, Alwin, & Summers, 1977), Root Mean Squared Error of Approximation (RMSEA; Steiger, 1989), the Noncentrality Parameter (NCP; Steiger, Shapiro, & Browne, 1985), Akaike's Information Criterion (AIC; Akaike, 1987), Browne and Cudeck's (1989) Criterion, the Expected Cross-Validation Index (ECVI; Browne & Cudeck, 1993), Normed Fit Index (NFI; Bentler & Bonett, 1980), Relative Fit Index (RFI; Bollen, 1986), Incremental Fit Index (IFI; Bollen, 1989), Tucker-Lewis Index (TLI; Tucker & Lewis, 1973), Comparative Fit Index (CFI; Bentler, 1990), Relative Non-Centrality Index (RNI; McDonald & Marsh, 1990), Parsimony-Adjusted NFI (James, Mulaik, & Brett, 1982), Parsimonious CFI (Arbuckle & Wothke, 1999), Gamma-hat (Steiger, 1989), rescaled AIC (Cudeck & Browne, 1983), Cross-Validation Index (CVI; Browne & Cudeck, 1989), McDonald's (1989) Non-Centrality Index, and Critical N (Hoelter, 1983).
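To make three of the indices listed above more concrete, the following sketch computes CFI, Gamma-hat, and McDonald's NCI from chi-square values and then takes their change between a baseline and a constrained model. The formulas are the commonly cited single-sample forms and are an assumption here, since this paper does not state them; software differs on whether N or N - 1 (or a multi-group adjustment) is used as the scaling constant. All function names and input values are hypothetical.

```python
import math

def mcdonald_nci(chisq, df, n):
    """McDonald's noncentrality index, assumed form exp(-0.5 * (chisq - df) / n)."""
    return math.exp(-0.5 * (chisq - df) / n)

def gamma_hat(chisq, df, n, p):
    """Gamma-hat, assumed form p / (p + 2 * (chisq - df) / n), p = observed variables."""
    return p / (p + 2.0 * (chisq - df) / n)

def cfi(chisq, df, chisq_null, df_null):
    """CFI, assumed form 1 - max(d_model, 0) / max(d_model, d_null, 0), d = chisq - df."""
    num = max(chisq - df, 0.0)
    den = max(chisq - df, chisq_null - df_null, 0.0)
    return 1.0 if den == 0.0 else 1.0 - num / den

def delta_afis(base, cons, null, n, p):
    """Change in each index, coded baseline minus constrained, so that a loss
    of fit under the equality constraints gives a positive value.
    base, cons, and null are (chisq, df) pairs."""
    return {
        "cfi": cfi(*base, *null) - cfi(*cons, *null),
        "gamma_hat": gamma_hat(*base, n, p) - gamma_hat(*cons, n, p),
        "mcdonald_nci": mcdonald_nci(*base, n) - mcdonald_nci(*cons, n),
    }

# Hypothetical chi-square values for a 16-item, two-group comparison.
changes = delta_afis(base=(250.0, 206), cons=(290.0, 222),
                     null=(3000.0, 240), n=400, p=16)
for name, value in changes.items():
    print(f"delta {name}: {value:.4f}")
```

Coding the change as baseline minus constrained is a design choice that keeps the values positive when the constraints worsen fit; the opposite sign convention is equally valid as long as cutoffs are flipped to match.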
In order to assess the AFIs, Cheung and Rensvold (2002) simulated data under a variety of conditions varying the number of factors, factor variances, correlations between factors, number of items per factor, factor loadings, and sample size. Importantly, they only simulated data that had no LOI in the population. They then conducted ANOVAs in order to determine the effect of the number of items, the number of factors, and the interaction between the two on the ΔAFIs. Of the AFIs, only RMSEA was unaffected by all simulated factors. They also examined the correlation between the initial AFI value and the ΔAFI. Using this criterion, only NCP, IFI, CFI, RNI, Gamma-hat, McDonald's NCI, and Critical N showed nonsignificant correlations. Moreover, using a six-way ANOVA, they found that of the indices mentioned above, only NCP and Critical N showed a dependence on sample size accounting for more than 5% of the variance in the change in the fit index. Given their results, they suggested reporting only ΔCFI, ΔGamma-hat, and ΔMcDonald's NCI, as IFI and RNI correlated extremely highly with CFI.

The current study

In this study, we expand on the work of Cheung and Rensvold (2002) by assessing the utility of differences in AFIs (ΔAFIs) for detecting a lack of MI in item factor loadings. In order to achieve this goal, we simulated data under a constant factor model in two groups. Several conditions of sample size and differential functioning (DF) of item factor loadings between groups were then simulated.

METHOD

In order to evaluate the performance of ΔAFIs for detecting an LOI, we simulated item-level data for one group, then modified the properties of these data in several ways for some items (our DF items) in order to simulate item-level data for another hypothetical group. We decided to investigate the potential of ΔAFIs for detecting an LOI in factor loadings. While there is some consensus that tests of item intercepts are also necessary for establishing MI, tests of factor loadings always occur before tests of item intercepts (Vandenberg & Lance, 2000) and thus seemed a good starting point in this initial investigation of the feasibility of these indices to evaluate MI.

Initial Data Properties

An initial structural model was developed for two correlated eight-item scales representing "Group 1." Several conditions of "Group 2" data were created by modifying Group 1 data to simulate DF in factor loadings for some items. Group 1 item intercepts were set at zero for all data and uniqueness terms were created so that item variance was equal to unity. Moreover, a population correlation of .3 between the latent factors was constant across all study conditions (cf. Cheung & Rensvold, 2002). Factor loadings for Group 1 and Group 2 can be seen in Table 1. Once population data were simulated, sampling error was introduced into simulated sample data. Three hundred sample replications, each containing sampling error, were simulated for each of the study conditions.

------------------------------------
Insert Table 1 about here
------------------------------------

The study design constituted a 5 (sample size) x 20 (magnitude of DF) fully crossed design. Sample sizes from 100 to 500 were simulated in increments of 100 for both Group 1 and Group 2 data. Sample sizes were always equal in Group 1 and Group 2 MI comparisons. We simulated DF for 4 of 16 items, with two DF items per factor. The amount of DF in item factor loadings varied from a difference between groups of .02 to .40 in increments of .02. These differences in factor loadings were created by subtracting the amount of DF from the Group 1 factor loading in order to create the Group 2 factor loading for items indicated as DF in Table 1. The magnitude of DF across the DF items was uniform in all conditions.

Analyses

A CFA baseline model was estimated in which the correct factor structure (see Table 1) was specified for both Group 1 and Group 2. Next, a constrained model was estimated in which the entire factor loading matrix was constrained to be equal for the Group 1 and Group 2 data. Correlation matrices were analyzed and factor variances were standardized in order to achieve model identification for all conditions. Results from models with standardized latent variances are equal to those using referent indicators when latent variances are known to be invariant across groups. A probability value of .05 was used in computing LRTs; LISREL 8.54 (Jöreskog & Sörbom, 1996) was used for all analyses.

We also examined the change in several AFIs between baseline and constrained models, focusing on the AFIs found to be most promising by Cheung and Rensvold (2002). Their study revealed that many ΔAFIs had the disadvantageous property of being correlated with initial model fit; thus, we focused on those AFIs found not to have this property. Specifically, we concentrated our investigation and reporting of results on the ΔCFI, ΔGamma-hat, ΔMcDonald's NCI, ΔNCP, ΔIFI, ΔRNI, and ΔCritical N. We also examined ΔRMSEA, as that index was found by Cheung and Rensvold to be independent of model complexity.

We were primarily concerned with identifying AFIs that were both (1) sensitive to the magnitude of DF, and (2) not sensitive to sample size. Thus, we assessed the suitability of each AFI by conducting ANOVAs using SAS's Proc GLM. In each model, the ΔAFI was entered as the dependent variable, with sample size and magnitude of DF as predictors. We then calculated ω² effect size measures for the magnitude of DF, sample size, and the interaction between the two. Optimal AFIs are identified by large ω² values for level of DF and small ω² values for both sample size and the interaction between sample size and level of DF.

We also graphed the relationship between ΔAFIs and the amount of DF simulated. Such graphs provide a visual indication of the relationship between the AFIs and the amount of DF present. Moreover, they present information much more succinctly than a series of large tables. These graphs feature the amount of DF simulated on the x-axis and the value of the change in the fit statistic on the y-axis.

RESULTS

None of the 60,000 analyses resulted in convergence errors or inadmissible solutions. The ω² effect size estimates for level of DF, sample size, and the interaction between the two are presented in Tables 2 and 3. While the data in these tables are the same, Table 2 sorts fit indices by the effect of DF while Table 3 sorts the AFIs by the effect of sample size and the interaction between sample size and level of DF.

------------------------------------------
Insert Tables 2 and 3 about here
------------------------------------------

As can be seen in Tables 2 and 3, all AFIs outperform chi-square both in being responsive to DF and in being insensitive to sample size, with the exception of the NCP and Critical N. Because the degrees of freedom of the baseline and constrained models were the same in all study conditions, the NCP (defined as chi-square minus degrees of freedom) and chi-square had equal effect size estimates. As can be seen in the tables, no one index was superior to the others on both criteria (maximum sensitivity to DF and minimum sensitivity to sample size). Gamma-hat, McDonald's NCI, IFI, and RNI were somewhat more sensitive to DF than the other indices. CFI and RMSEA showed considerably lower effects of sample size, though RMSEA showed a sizable effect due to the interaction between DF and sample size. Conversely, IFI, RNI, McDonald's NCI, and Gamma-hat showed almost no effect of the interaction between sample size and DF, but small effects of sample size. Interestingly, Critical N showed considerably worse properties than did chi-square. These patterns can be seen in Figure 1, in which the levels of the ΔAFIs are plotted by DF and sample size (chi-square is plotted for comparison). Based on these results, it appears that Gamma-hat, McDonald's NCI, IFI, RNI, and CFI are among the most promising AFIs for establishing MI.

------------------------------------------
Insert Figure 1 about here
------------------------------------------

We also examined the correlation between the ΔAFIs, as highly correlated indices provide little unique information. As can be seen in Table 4, we found that McDonald's NCI, RNI, IFI, and Gamma-hat were very highly correlated. Thus, like Cheung and Rensvold (2002), our results suggest that reporting all four AFIs would provide largely redundant information.

------------------------------------------
Insert Table 4 about here
------------------------------------------

In order for the ΔAFIs to be of utility to applied researchers, cutoff values need to be established so the indices can be used in practice. Based on their simulation work, Cheung and Rensvold (2002) suggested values of .01 for ΔCFI, .001 for ΔGamma-hat, and .02 for ΔMcDonald's NCI.¹ They did not provide recommendations for other indices because, as in this study, they found that those indices correlated so highly as to not provide unique useful information. Based on our analyses, we concur with Cheung and Rensvold (2002) and also recommend reporting ΔCFI, ΔGamma-hat, and ΔMcDonald's NCI. As such, we evaluated the cutoff scores recommended by Cheung and Rensvold by applying their cutoff values to the ΔAFIs for these indices and plotting the percentage of samples in which an LOI was detected (out of 300 replications) at each level of DF for these three AFIs and the LRT. These plots can be seen in Figure 2.

------------------------------------------
Insert Figure 2 about here
------------------------------------------

¹ Note that Cheung and Rensvold report negative values for these indices. In this study, we calculated the ΔAFIs so as to keep ΔAFI values (generally) positive. We have changed the sign on the recommended cutoff values from Cheung and Rensvold to be consistent with our coding.
As can be seen in Figure 2, it appears that Cheung and Rensvold suggested a cutoff value for the ΔCFI that is somewhat out of line with those of the other fit indices, in that their ΔCFI value is considerably less sensitive to DF than the others. Figure 3 plots these same data, organized by fit index to allow a better visualization of the effects of sample size. As can be seen in Figure 3, none of the AFIs were unaffected by sample size. This is to be expected, however, because although the mean of the fit indices may not vary by sample size, their sampling distributions will still be affected (Marsh, Balla, & McDonald, 1988). In other words, when comparing two models that fit equally well in the population, larger samples will be associated with less variation around the mean AFI than will smaller samples due to less sampling error. Thus, when comparing a constrained and baseline model with a given level of DF, larger sample sizes lead to less variation in the difference between the model AFIs and thus to a higher percentage of the 300 replications in which an LOI is deemed significant than with smaller sample sizes.

------------------------------------------
Insert Figure 3 about here
------------------------------------------

DISCUSSION

As recognition of MI as an important [...] their earlier work in several important ways. First, we examined the extent to which the ΔAFIs are sensitive to DF. Second, we examined insensitivity to sample size when an LOI exists. Third, we have demonstrated the relationship between several AFIs and many levels of DF for several sample size conditions. Fourth, we evaluated the power (% significant analyses) for the AFIs using Cheung and Rensvold's (2002) recommended cutoff values.

The results of our study largely concur with those of Cheung and Rensvold (2002) in that we found that ΔCFI, ΔGamma-hat, and ΔMcDonald's NCI were among the most promising AFIs in that they (1) were less sensitive to sample size than was chi-square, (2) were more sensitive to DF than chi-square, and (3) generally provided non-redundant information with other AFIs. However, we found that Cheung and Rensvold's recommended cutoff values affected the performance of the AFIs for detecting an LOI. In particular, the recommended value for ΔCFI seems excessively large. For example, when 4 of 16 items showed DF with factor loading differences of .3, power to detect this difference was below 50% in all sample size conditions with the ΔCFI (see Figure 3). In contrast, the ΔGamma-hat, ΔMcDonald's NCI, and the LRT all showed power near 100% for sample sizes of 200 and larger for these data. In this study, the ΔMcDonald's NCI cutoff of .02 seemed to perform optimally of the four indices. The LRT and ΔGamma-hat seemed overly sensitive to small ([...] 5000) as a condition. At these large sample sizes, power to detect an LOI is very high with the LRT, and it may well be that researchers dealing with these large sample sizes are the most likely to pursue using a ΔAFI to evaluate MI. Third, we simulated data that were somewhat idealized as compared to those simulated by Cheung and Rensvold (2002). Our factor model was simulated to be 'clean'. In other words, the population model used to derive our sample replications had zero values for cross-loadings. While our choice of factor model was no more arbitrary than was Cheung and Rensvold's (2002), the better fit associated with our model may be less likely to be encountered in practice.

We consider our study to be an initial expansion of earlier work on ΔAFIs for evaluating MI. Future research needs to address the performance of these indices in identifying an LOI in item intercepts, uniqueness terms, factor variances and covariances, and latent means. Also, the effects of model misspecification and model complexity on AFIs need to be examined under conditions in which MI does not hold. Importantly, a follow-up study that included very large sample sizes would also be valuable.

In sum, it appears that examining ΔAFIs may be a valuable tool for establishing MI. These indices could supplement or replace the LRT for some data conditions. However, further study is needed before widespread implementation should proceed.

REFERENCES

Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317-322.

Arbuckle, J. L., & Wothke, W. (1999). Amos 4.0 user's guide. Chicago: SmallWaters.

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Bollen, K. A. (1986). Sample size and Bentler and Bonett's nonnormed fit index. Psychometrika, 51, 375-377.

Bollen, K. A. (1989). A new incremental fit index for general structural equation models. Sociological Methods & Research, 17, 303-316.

Brannick, M. T. (1995). Critical comments on applying covariance structure modeling. Journal of Organizational Behavior, 16(3), 201-213.

Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24, 445-455.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.

Chan, D. (1998). The conceptualization and analysis of change over time: An integrative approach incorporating longitudinal mean and covariance structures analysis (LMACS) and multiple indicator latent growth modeling (MLGM). Organizational Research Methods, 1(4), 421-483.

Chan, D., & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied Psychology, 82(1), 143-159.

Chan, D., & Schmitt, N. (2000). Interindividual differences in intraindividual changes in proactivity during organizational entry: A latent growth modeling approach to understanding newcomer adaptation. Journal of Applied Psychology, 85(2), 190-210.

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233-255.

Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behavioral Research, 18(2), 147-168.

Drasgow, F. (1984). Scrutinizing psychological tests: Measurement equivalence and equivalent relations with external variables are the central issues. Psychological Bulletin, 95(1), 134-135.

Facteau, J. D., & Craig, S. B. (2001). Are performance appraisal ratings from different rating sources comparable? Journal of Applied Psychology, 86(2), 215-227.

Hoelter, J. W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods & Research, 11, 325-344.

Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.

Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement invariance in aging research. Experimental Aging Research, 18(3-4), 117-144.
James, L. R., Mulaik, S. A., & Brett, J. M. (1982). Causal analysis: Assumptions, models and data. Beverly Hills: Sage.

Jöreskog, K., & Sörbom, D. (1996). LISREL 8: User's reference guide. Chicago: Scientific Software International.

Kelloway, E. K. (1995). Structural equation modeling in perspective. Journal of Organizational Behavior, 16(3), 215-224.

Marsh, H. W. (1985). The structure of masculinity/femininity: An application of confirmatory factor analysis to higher-order factor structures and factorial invariance. Multivariate Behavioral Research, 20(4), 427-449.

Marsh, H. W. (1987). The factorial invariance of responses by males and females to a multidimensional self-concept instrument: Substantive and methodological issues. Multivariate Behavioral Research, 22(4), 457-480.

Marsh, H. W., & Hocevar, D. (1985). Application of confirmatory factor analysis to the study of self-concept: First- and higher order factor models and their invariance across groups. Psychological Bulletin, 97(3), 562-582.

Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103(3), 391-410.

McDonald, R. P. (1989). An index of goodness-of-fit based on noncentrality. Journal of Classification, 6, 97-103.

McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin, 107, 247-255.

Meade, A. W., & Lautenschlager, G. J. (2004). A Monte-Carlo study of confirmatory factor analytic tests of measurement equivalence/invariance. Structural Equation Modeling, 11(1), 60-72.

Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525-543.

Meredith, W., & Millsap, R. E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57(2), 289-311.

Millsap, R. E. (1995). Measurement invariance, predictive invariance, and the duality paradox. Multivariate Behavioral Research, 30(4), 577-605.

Ployhart, R. E., Weekley, J. A., Holtz, B. C., & Kemp, C. (2003). Web-based and paper-and-pencil testing of applicants in a proctored setting: Are personality, biodata and situational judgment tests comparable? Personnel Psychology, 56(3), 733-752.

Riordan, C. M., & Vandenberg, R. J. (1994). A central question in cross-cultural research: Do employees of different cultures interpret work-related measures in an equivalent manner? Journal of Management, 20(3), 643-671.

Steiger, J. H. (1989). EzPATH: Causal modeling. Evanston, IL: SYSTAT.

Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253-263.

Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.

Vandenberg, R. J. (2002). Toward a further understanding of and improvement in measurement invariance methods and procedures. Organizational Research Methods, 5(2), 139-158.

Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4-69.

Wheaton, B., Muthen, B., Alwin, D. F., & Summers, G. F. (1977). Assessing reliability and stability in panel models. In D. R. Heise (Ed.), Sociological methodology (pp. 84-136). San Francisco: Jossey-Bass.

Author Contact Info:

Adam W. Meade
Department of Psychology
North Carolina State University
Campus Box 7650
Raleigh, NC 27695-7650
Phone: 919-513-4857
Fax: 919-515-1716
E-mail: awmeade@ncsu.edu
TABLE 1
Population Factor Loadings for Group 1 and Group 2 Data

            Group 1               Group 2
Item    Factor 1  Factor 2    Factor 1  Factor 2
  1       .80       -           .80       -
  2       .70       -           .70       -
  3       .60       -           .60       -
  4       .50       -           .50       -
  5       .80       -           XX        -
  6       .70       -           XX        -
  7       .60       -           .60       -
  8       .50       -           .50       -
  9        -       .80           -       .80
 10        -       .70           -       .70
 11        -       .60           -       XX
 12        -       .50           -       XX
 13        -       .80           -       .80
 14        -       .70           -       .70
 15        -       .60           -       .60
 16        -       .50           -       .50

Note: XX indicates DF item with variable magnitude of DF. Numeric Group 2 loadings are equal to their Group 1 counterparts (i.e., are not DF items).
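As an illustration of how the Table 1 loadings translate into population covariance matrices, the sketch below builds the loading matrix for each group and applies the model-implied covariance formula (loadings times factor covariance times loadings transposed, plus unique variances). It assumes standardized factors correlated .3 and unique variances chosen so that every item has unit variance; the text states this for Group 1, and applying it to Group 2 as well is an assumption. The function names and the example DF amount of .10 are hypothetical.

```python
# Items flagged XX in Table 1 (two per factor).
DF_ITEMS = {5, 6, 11, 12}
# Group 1 loadings for items 1-16: the pattern .80, .70, .60, .50 repeats.
GROUP1 = [0.80, 0.70, 0.60, 0.50] * 4
FACTOR_CORR = 0.3

def loadings(df_amount=0.0):
    """16 x 2 loading matrix; df_amount is subtracted from the DF items."""
    rows = []
    for item, lam in enumerate(GROUP1, start=1):
        if item in DF_ITEMS:
            lam -= df_amount
        # Items 1-8 load on Factor 1, items 9-16 on Factor 2.
        rows.append((lam, 0.0) if item <= 8 else (0.0, lam))
    return rows

def implied_covariance(lam_rows, phi):
    """Sigma = Lambda * Phi * Lambda' + Theta, with diagonal Theta set so
    that each item variance equals 1 (assumption for both groups)."""
    p, m = len(lam_rows), len(phi)
    sigma = [[sum(lam_rows[i][a] * phi[a][b] * lam_rows[j][b]
                  for a in range(m) for b in range(m))
              for j in range(p)] for i in range(p)]
    for i in range(p):
        theta_i = 1.0 - sigma[i][i]   # unique variance for item i
        sigma[i][i] += theta_i
    return sigma

phi = [[1.0, FACTOR_CORR], [FACTOR_CORR, 1.0]]
sigma_g1 = implied_covariance(loadings(0.0), phi)    # Group 1
sigma_g2 = implied_covariance(loadings(0.10), phi)   # Group 2, DF = .10
print(round(sigma_g1[0][4], 3), round(sigma_g2[0][4], 3))
```

The printed pair is the population correlation between items 1 and 5 in each group, which shrinks in Group 2 because item 5 is a DF item.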
TABLE 2
Omega-Squared Effect Size Estimates for the Amount of DF and Sample Size on AFI Indices; Sorted by the Effect of Amount of DF.

AFI               Amount of DF (DF)   Sample Size (N)   DF*N
Gamma Hat              0.824              0.007         0.000
McDonald's NCI         0.824              0.007         0.000
IFI                    0.812              0.005         0.000
RNI                    0.811              0.006         0.000
CFI                    0.722              0.002         0.002
RMSEA                  0.651              0.001         0.022
χ²                     0.588              0.010         0.130
NCP                    0.588              0.010         0.130
Critical-N             0.389              0.013         0.198
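The omega-squared values tabulated above came from SAS's Proc GLM. As a rough illustration of what such an estimate involves, the sketch below computes omega-squared for a balanced two-way fixed-effects design using the standard formula (SS_effect - df_effect * MS_error) / (SS_total + MS_error). Whether this matches the exact computation used in the study is an assumption, and the function name and toy data are hypothetical.

```python
def omega_squared_two_way(cells):
    """cells[i][j] holds the replicate values for level i of factor A
    (e.g., amount of DF) and level j of factor B (e.g., sample size).
    Assumes a balanced design with at least two replicates per cell."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    values = [y for row in cells for cell in row for y in cell]
    grand = sum(values) / len(values)
    cell_mean = [[sum(c) / n for c in row] for row in cells]
    mean_a = [sum(row) / b for row in cell_mean]
    mean_b = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]

    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_cells = n * sum((cell_mean[i][j] - grand) ** 2
                       for i in range(a) for j in range(b))
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((y - cell_mean[i][j]) ** 2
                 for i in range(a) for j in range(b) for y in cells[i][j])
    ss_total = ss_cells + ss_err
    ms_err = ss_err / (a * b * (n - 1))

    def omega(ss_effect, df_effect):
        return (ss_effect - df_effect * ms_err) / (ss_total + ms_err)

    return {"A": omega(ss_a, a - 1),
            "B": omega(ss_b, b - 1),
            "AxB": omega(ss_ab, (a - 1) * (b - 1))}

# Toy 2 x 2 design with two replicates per cell.
toy = [[[1.0, 2.0], [3.0, 4.0]],
       [[5.0, 6.0], [7.0, 8.0]]]
effects = omega_squared_two_way(toy)
print({k: round(v, 3) for k, v in effects.items()})
```

Estimates can come out slightly negative when an effect is absent, as the interaction does in the toy data; such values are often reported as zero.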
TABLE 3
Omega-Squared Effect Size Estimates for the Amount of DF and Sample Size on AFI Indices; Sorted by the Effects of Sample Size.

AFI               Amount of DF (DF)   Sample Size (N)   DF*N
CFI                    0.722              0.002         0.002
IFI                    0.812              0.005         0.000
RNI                    0.811              0.006         0.000
McDonald's NCI         0.824              0.007         0.000
Gamma Hat              0.824              0.007         0.000
RMSEA                  0.651              0.001         0.022
χ²                     0.588              0.010         0.130
NCP                    0.588              0.010         0.130
Critical-N             0.389              0.013         0.198

Note: Table sorted by the sum of the effects of N and DF*N.
TABLE 4
Correlations Between AFIs

                  χ²    CFI   Critical N  G-hat   IFI   McD NCI  RMSEA   RNI    NCP
χ²               1.00
CFI              0.83   1.00
Critical N       0.94   0.64    1.00
Gamma-hat        0.86   0.94    0.70      1.00
IFI              0.85   0.96    0.68      0.99    1.00
McDonald's NCI   0.87   0.93    0.71      1.00    0.99    1.00
RMSEA            0.87   0.88    0.78      0.89    0.87    0.89    1.00
RNI              0.85   0.96    0.68      0.99    1.00    0.99    0.87   1.00
NCP              1.00   0.83    0.94      0.86    0.85    0.87    0.87   0.85   1.00
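The redundancy argument can be checked directly against the correlations in Table 4. The sketch below stores the lower triangle as printed and lists every pair of indices whose correlation is at least .99. The .99 threshold is a hypothetical choice made for illustration and is not a criterion proposed in this paper.

```python
NAMES = ["chi-square", "CFI", "Critical N", "Gamma-hat", "IFI",
         "McDonald's NCI", "RMSEA", "RNI", "NCP"]

# Lower triangle of Table 4, one row per index in the order of NAMES.
LOWER = [
    [1.00],
    [0.83, 1.00],
    [0.94, 0.64, 1.00],
    [0.86, 0.94, 0.70, 1.00],
    [0.85, 0.96, 0.68, 0.99, 1.00],
    [0.87, 0.93, 0.71, 1.00, 0.99, 1.00],
    [0.87, 0.88, 0.78, 0.89, 0.87, 0.89, 1.00],
    [0.85, 0.96, 0.68, 0.99, 1.00, 0.99, 0.87, 1.00],
    [1.00, 0.83, 0.94, 0.86, 0.85, 0.87, 0.87, 0.85, 1.00],
]

def redundant_pairs(threshold=0.99):
    """Return (index_a, index_b, r) for off-diagonal entries >= threshold."""
    pairs = []
    for i, row in enumerate(LOWER):
        for j in range(i):            # skip the diagonal
            if row[j] >= threshold:
                pairs.append((NAMES[j], NAMES[i], row[j]))
    return pairs

pairs = redundant_pairs()
for first, second, r in pairs:
    print(f"{first} ~ {second}: r = {r:.2f}")
```

With this threshold the flagged pairs fall into two clusters: chi-square with NCP, and Gamma-hat, IFI, McDonald's NCI, and RNI with one another.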
FIGURE 1
Changes in AFIs by Level of DF and Sample Size
[Figure not reproduced. Nine panels: Change in Chi-Square, Change in Gamma Hat, Change in McDonald's NCI, Change in IFI, Change in RNI, Change in CFI, Change in RMSEA, Change in NCP, and Change in Critical N. In each panel the x-axis is Amount of DF, the y-axis is the change in the index, and separate lines are plotted for sample sizes of 100, 200, 300, 400, and 500.]
FIGURE 2
Percentage of Significant Analyses by Level of DF and Sample Size
[Figure not reproduced. Five panels, one per sample size (N = 100, 200, 300, 400, and 500). In each panel the x-axis is Amount of DF, the y-axis is % Significant (plotted as a proportion), and separate lines are plotted for χ², Gamma-hat, McDonald's NCI, and CFI.]
FIGURE 3
Percentage of Significant Analyses by Level of DF and Sample Size
[Figure not reproduced. Four panels: Change in Chi-Square, Change in Gamma-hat, Change in McDonald's NCI, and Change in CFI. In each panel the x-axis is Amount of DF, the y-axis is % Significant (plotted as a proportion), and separate lines are plotted for sample sizes of 100, 200, 300, 400, and 500.]