NEPS SURVEY PAPERS

Anna Hawrot

PSYCHOMETRIC PROPERTIES OF THE "TEACHING QUALITY" SCALE IN STARTING COHORTS 2, 3, AND 4

NEPS Survey Paper No. 82
Bamberg, February 2021
Survey Papers of the German National Educational Panel Study (NEPS) at the Leibniz Institute for Educational Trajectories (LIfBi) at the University of Bamberg

The NEPS Survey Paper series provides articles with a focus on methodological aspects and data handling issues related to the German National Educational Panel Study (NEPS). They are of particular relevance for the analysis of NEPS data as they describe data editing and data collection procedures as well as the instruments and tests used in the NEPS survey. Papers that appear in this series fall into the category of 'grey literature' and may also appear elsewhere.

The NEPS Survey Papers are edited by a review board consisting of the scientific management of LIfBi and NEPS. They are available at www.neps-data.de (see section "Publications") and at www.lifbi.de/publications.

Editor-in-Chief: Thomas Bäumer, LIfBi
Review Board: Board of Directors, Heads of LIfBi Departments, and Scientific Management of NEPS Working Units
Contact: German National Educational Panel Study (NEPS) – Leibniz Institute for Educational Trajectories – Wilhelmsplatz 3 – 96047 Bamberg – Germany – contact@lifbi.de
Psychometric Properties of the "Teaching Quality" Scale for Teachers in Starting Cohorts 2, 3, and 4

Anna Hawrot
Leibniz Institute for Educational Trajectories, Bamberg, Germany

E-mail address of lead author: anna.hawrot@lifbi.de

Bibliographic data:
Hawrot, A. (2021). Psychometric properties of the "Teaching Quality" scale for teachers in Starting Cohorts 2, 3, and 4 (NEPS Survey Paper No. 82). Leibniz Institute for Educational Trajectories, National Educational Panel Study. https://doi.org/10.5157/NEPS:SP82:1.0
Abstract

This paper presents information on the source, theoretical background, and psychometric properties of the "Teaching quality" scale used in Wave 5 of Starting Cohort 2, Waves 2 and 7 of Starting Cohort 3, and Wave 3 of Starting Cohort 4. The scale measures teacher perceptions of the quality of their own instruction. We ran an item-level analysis and checked the scale's reliability and internal structure. The items had low to moderate discriminatory power, and the internal consistencies of the subscales were low. The analyses did not confirm the expected three-factor structure of the scale. A two-factor structure emerged, but it was not fully consistent across waves and starting cohorts. Overall, the scale needs major refinements before it can be used as a measure of teaching quality.

Keywords
teaching quality, teachers, psychometric properties, Three Basic Dimensions

Acknowledgments

This paper uses data from the National Educational Panel Study (NEPS): Starting Cohort 2 (doi: 10.5157/NEPS:SC2:7.0.0), Starting Cohort 3 (doi: 10.5157/NEPS:SC3:7.0.1), and Starting Cohort 4 (doi: 10.5157/NEPS:SC4:9.1.1). From 2008 to 2013, NEPS data was collected as part of the Framework Program for the Promotion of Empirical Educational Research funded by the German Federal Ministry of Education and Research (BMBF). As of 2014, NEPS is carried out by the Leibniz Institute for Educational Trajectories (LIfBi) at the University of Bamberg in cooperation with a nationwide network. I would like to thank Katharina Loos for her assistance with German-language literature and Fenja Schaupp for formatting the paper.
1. Introduction

The National Educational Panel Study aims at tracking the development of various competencies, describing their patterns, and better understanding how they unfold across the lifespan (Blossfeld et al., 2011). To this end, information is gathered about various potential sources of influence, including the home environment, educational institutions, and the workplace. However, all these factors need to be measured in a stage-sensitive way, that is, in a way that is adjusted to the participants' age, stage of development, and educational or professional path.

Teaching quality is considered one of the key factors affecting student learning and engagement in the school learning environment (e.g., Hattie, 2009; Seidel & Shavelson, 2007). Consequently, teaching quality was measured in the study from both the teacher and the student perspective. This report focuses on the teacher perspective. Information on the psychometric properties of a student scale used in Starting Cohort 3 can be found in Hawrot (2021).

This paper presents information on the source, theoretical background, and psychometric properties of the "Teaching quality" scale used in Starting Cohorts 2, 3, and 4 to assess teacher perceptions of the quality of their own instruction. Its goal is to document the scale and provide data users with the basic information necessary to make an informed decision about using the scale in analyses based on NEPS data or in their own research.[1]

2. Description of the Scale

The "Teaching quality" scale consists of 12 items divided into three subscales: Cognitive Activation (7 items), Student Support (2 items), and Classroom Management (3 items). Subjects are asked to rate to what extent each item applies to their own teaching. The response categories are labelled as follows: 1 = does not apply, 2 = does rather not apply, 3 = does rather apply, 4 = does apply. The items come from various scales used in three German educational studies: "Bildungsprozesse, Kompetenzentwicklung und Selektionsentscheidungen im Vorschul- und Schulalter" (Codebuch zum Lehrerfragebogen Welle 1 [BiKS-8-14 Grundschule], n.d.), "Pädagogische EntwicklungsBilanzen" (Steinert et al., 2003), and "Studie zur Entwicklung von Ganztagsschulen" (Furthmüller, 2014). They were adapted and pooled to form subscales reflecting the three generic dimensions of teaching quality posited by the model of Three Basic Dimensions (Praetorius et al., 2018). According to this model, Classroom Management is a prerequisite for learning and refers to an orderly and disruption-free classroom environment with clear rules and expectations. Student Support captures whether students are provided with assistance with learning that is adjusted to their individual needs and interests. Cognitive Activation refers to promoting conceptual understanding in class and to stimulating higher-order thinking (Praetorius et al., 2018). Table 1 contains the item wording and the corresponding variable names used in the Scientific Use Files. The original German-language wording is available on the project's website (www.neps-data.de).

[1] Please note that we assumed the items to build a scale; we therefore neither focused on the possibility of using single items nor tested the properties (e.g., retest reliability) or validity of single items.
Table 1
Items of the "Teaching Quality" Scale

Variable name | Subscale | To what extent do the following statements apply to your teaching?
------------- | -------- | -------------------------------------------------------------------
e22540a | CA | a) I give students assignments of different levels of difficulty based on their abilities.
e22440b | SS | b) I quickly notice when a student is having trouble.
e22340c | CM | c) Everyone in my class knows the "rules of the game."
e22540d | CA | d) In my classes, the types of tasks are repeated to solidify what my students have learned.
e22540e | CA | e) I discuss general and current topics with my students even if it puts my lesson plan behind schedule.
e22540f | CA | f) I see it as my job in the classroom to present and teach proven concepts.
e22340g | CM | g) I summarize the material so that my students will remember it better.
e22540h | CA | h) I often ask students to justify their answers with arguments.
e22440i | SS | i) There's a friendly, trusting relationship between me and my students.
e22340j | CM | j) I think absolute quiet in the classroom is important.
e22540k | CA | k) In my class, the students should find out for themselves why something is wrong.
e22540l | CA | l) I like to give the faster students extra tasks to challenge them.

Note. CA = Cognitive Activation; CM = Classroom Management; SS = Student Support.
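For data users who want to form simple subscale scores, the following minimal sketch encodes the item-subscale assignment from Table 1. It is not part of the NEPS materials; in particular, treating every value outside the valid range 1-4 as missing is an assumption made for illustration only.

```python
import numpy as np
import pandas as pd

# Item-subscale assignment taken from Table 1.
SUBSCALES = {
    "CA": ["e22540a", "e22540d", "e22540e", "e22540f",
           "e22540h", "e22540k", "e22540l"],
    "SS": ["e22440b", "e22440i"],
    "CM": ["e22340c", "e22340g", "e22340j"],
}

def score_subscales(df: pd.DataFrame) -> pd.DataFrame:
    """Mean score per subscale; values outside 1-4 (e.g., negative NEPS
    missing codes) are treated as missing (an assumption of this sketch)."""
    scores = {}
    for subscale, items in SUBSCALES.items():
        block = df[items].where(df[items].isin([1, 2, 3, 4]))
        scores[subscale] = block.mean(axis=1)  # mean over available items
    return pd.DataFrame(scores)

# Toy usage with fabricated responses (real data come from the SUF):
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(1, 5, size=(5, 12)),
                   columns=[i for items in SUBSCALES.values() for i in items])
print(score_subscales(toy))
```

Mean scores over available items are only one option; given the low internal consistencies reported in Section 4.3, such scores should be used with caution.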
3. Method

3.1 Data and Sample

We used data gathered during the National Educational Panel Study (NEPS; Blossfeld et al., 2011) from Starting Cohort 2 (SC2), Starting Cohort 3 (SC3), and Starting Cohort 4 (SC4). In the three cohorts, selected teachers who were teaching sampled students were surveyed as context persons. The scale was administered to teachers as part of a larger questionnaire using the standard testing procedure for a given wave. Information on the procedures is available in the data manuals (Skopek et al., 2012a, 2012b, 2013) and the interviewer manuals.[2]

Table 2 shows the waves of each starting cohort in which the scale was administered to teachers, together with the grade attended by the students participating in NEPS at that time. Please note that the sample did not include teachers from special schools.

Table 2
The Scale Administration in Starting Cohorts 2, 3, and 4

Wave              | 1 | 2  | 3   | 4 | 5  | 6 | 7
----------------- | - | -- | --- | - | -- | - | ---
Starting Cohort 2 |   |    |     |   | G3 |   |
Starting Cohort 3 |   | G6 |     |   |    |   | G10
Starting Cohort 4 |   |    | G10 |   |    |   |

Note. G = grade.

The "Teaching quality" scale was administered four times: once in SC2 (Wave 5, school year 2014/15), once in SC4 (Wave 3, school year 2011/12), and twice in SC3 (Waves 2 and 7, school years 2011/12 and 2015/16). In SC3, the same teacher could therefore fill in the scale twice, in Waves 2 and 7. This report, however, uses data from first administrations only. This was necessary because inconsistencies appeared in the gender and birth date of teachers who were surveyed in different waves but had the same identification number. These inconsistencies suggested that identification numbers might not be fully consistent across waves, making it difficult to identify dependent data. Because only data from first administrations were used, the teacher samples in the two SC3 waves did not overlap. We used variable ex80211, which contains information about the questionnaire administered to each teacher (first-time or panel interviewee questionnaire), to identify and exclude all repeated administrations.

[2] https://www.neps-data.de/Data-Center/Data-and-Documentation/Starting-Cohort-Kindergarten/Documentation; https://www.neps-data.de/Data-Center/Data-and-Documentation/Starting-Cohort-Grade-5/Documentation; https://www.neps-data.de/Data-Center/Data-and-Documentation/Starting-Cohort-Grade-9/Documentation
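This exclusion step can be illustrated with a short, hypothetical sketch. The variable name ex80211 is taken from the text above, but the concrete value coding used below is an assumption for illustration only, not taken from the SUF documentation.

```python
import pandas as pd

FIRST_TIME_CODE = 1  # assumption: code of the first-time interviewee questionnaire

def keep_first_administrations(teachers: pd.DataFrame) -> pd.DataFrame:
    """Drop repeated administrations, identified via variable ex80211,
    so that the SC3 Wave 2 and Wave 7 teacher samples do not overlap."""
    return teachers.loc[teachers["ex80211"] == FIRST_TIME_CODE].copy()
```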
Table 3 presents the sample sizes at all administration time points. The samples include teachers who responded to at least one item of the scale; thus, the number of teachers who filled in at least one question anywhere in the teacher questionnaire may differ.

Table 3
Sample Sizes at the Administration Time Points

Wave              | 1 | 2   | 3    | 4 | 5   | 6 | 7
----------------- | - | --- | ---- | - | --- | - | ---
Starting Cohort 2 |   |     |      |   | 651 |   |
Starting Cohort 3 |   | 564 |      |   |     |   | 280
Starting Cohort 4 |   |     | 1075 |   |     |   |

Note. Samples include first administrations of the scale only.

3.2 Analytical Procedure

In the first step, we analyzed missing response rates per person and per item, calculated the items' discriminatory power, and estimated the reliabilities of the subscale scores. Next, we inspected item distributions to identify potential problems with the response scale, for example, range restrictions. The second step involved analyzing the internal structure of the scale and was divided into several sub-steps: first, confirmatory factor analyses (CFA) were performed separately for each wave of each starting cohort; then, if the CFA models did not provide an adequate fit to the data, exploratory factor analyses (EFA) were performed.

The analyses of the internal structure were performed with Mplus 7.4 (Muthén & Muthén, 1998-2015) using delta parameterization and the WLSMV estimator. This estimator is recommended for ordered categorical data, especially when item response distributions are skewed and the number of response categories is small (e.g., Beauducel & Herzberg, 2006; Flora & Curran, 2004). The scales of the CFA factors were set by fixing the factor variances at one. The EFA models used oblique Geomin rotation. All models accounted for the non-independence of teachers clustered within schools by adjusting the standard errors with a sandwich estimator (CLUSTER option). Model fit was assessed with three commonly used (McDonald & Ho, 2002) fit indices: the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the Tucker-Lewis index (TLI). We assumed that CFI and TLI values of at least .95 and RMSEA values of at most .06 indicated a good fit (Hu & Bentler, 1999).

4. Results

4.1 Missing Responses

NEPS datasets include several codes for missing data. In this case, two types of missing values occurred: implausible values and unspecific missing values. Both types refer to nonresponse, with implausible values denoting invalid responses and unspecific missing values denoting nonresponse for which the cause is unknown.
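The per-person tabulation underlying the next subsection can be sketched as follows. The sketch assumes that implausible and unspecific missing responses are recognizable by two distinct sets of codes; the concrete codes below are placeholders, not the actual SUF codes.

```python
import pandas as pd

ITEMS = ["e22540a", "e22440b", "e22340c", "e22540d", "e22540e", "e22540f",
         "e22340g", "e22540h", "e22440i", "e22340j", "e22540k", "e22540l"]
IMPLAUSIBLE = [-52]        # placeholder code for implausible values
UNSPECIFIC = [-97, -98]    # placeholder codes for unspecific missingness

def missing_per_person(df: pd.DataFrame) -> pd.DataFrame:
    """Counts of implausible, unspecific, and total missing responses
    per teacher, as tabulated in Table 4."""
    block = df[ITEMS]
    out = pd.DataFrame({
        "implausible": block.isin(IMPLAUSIBLE).sum(axis=1),
        "unspecific": block.isin(UNSPECIFIC).sum(axis=1),
    })
    out["total"] = out["implausible"] + out["unspecific"]
    return out

# A frequency table in the style of Table 4 could then be built with, e.g.:
# missing_per_person(sc2_w5).apply(pd.Series.value_counts).fillna(0).astype(int)
```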
4.1.1 Missing responses per person

Table 4 reports the numbers and percentages of respondents with a given number of implausible values, unspecific missing values, and total missing values. The majority of missing values were unspecific. The number of implausible values per person was low and only incidentally exceeded 1; under 2% of the respondents participating in a given wave of a given starting cohort provided at least one implausible response. The number of unspecific missing values per person was higher than the number of implausible values. The percentage of respondents with at least one unspecific missing value ranged from 7.5% in Wave 7 of SC3 to 10.8% in Wave 5 of SC2. Respondents most often omitted a single item, with rates of single-item omission ranging from 5.0% (Wave 7, SC3) to 7.7% (Wave 5, SC2). Under 0.8% of respondents failed to respond to over 25% of the items. Because implausible values made up only a small share of all missing values, the results for total missing values hardly differed from those for unspecific missing values and are therefore not described separately.

Table 4
Rates of Implausible, Unspecific, and Total Missing Values per Person

Missing values per person | SC2: W5 Freq. | % | SC3: W2 Freq. | % | SC3: W7 Freq. | % | SC4: W3 Freq. | %
------------------------- | ------------- | - | ------------- | - | ------------- | - | ------------- | -
Implausible values        |      |       |     |       |     |       |      |
0     | 638 | 98.00 | 558 | 98.94 | 279 | 99.64 | 1065 | 99.07
1     | 11  | 1.69  | 6   | 1.06  | 1   | 0.36  | 10   | 0.93
2     | 2   | 0.31  | 0   | 0     | 0   | 0     | 0    | 0
3     | 0   | 0     | 0   | 0     | 0   | 0     | 0    | 0
Total | 651 | 100   | 564 | 100   | 280 | 100   | 1075 | 100
>= 1  | 13  | 2.00  | 6   | 1.06  | 1   | 0.36  | 10   | 0.93
Unspecific missing values |      |       |     |       |     |       |      |
0     | 581 | 89.25 | 511 | 90.60 | 259 | 92.50 | 980  | 91.16
1     | 50  | 7.68  | 40  | 7.09  | 14  | 5.00  | 71   | 6.60
2     | 10  | 1.54  | 9   | 1.60  | 6   | 2.14  | 16   | 1.49
3     | 5   | 0.77  | 2   | 0.35  | 0   | 0     | 1    | 0.09
4     | 2   | 0.31  | 2   | 0.35  | 1   | 0.36  | 4    | 0.37
5     | 0   | 0     | 0   | 0     | 0   | 0     | 0    | 0
6     | 2   | 0.31  | 0   | 0     | 0   | 0     | 3    | 0.28
7     | 1   | 0.15  | 0   | 0     | 0   | 0     | 0    | 0
8     | 0   | 0     | 0   | 0     | 0   | 0     | 0    | 0
Total | 651 | 100   | 564 | 100   | 280 | 100   | 1075 | 100
>= 1  | 70  | 10.75 | 53  | 9.40  | 21  | 7.50  | 95   | 8.84
Total missing values      |      |       |     |       |     |       |      |
0     | 572 | 87.86 | 506 | 89.72 | 258 | 92.14 | 972  | 90.42
1     | 53  | 8.14  | 44  | 7.80  | 15  | 5.36  | 77   | 7.16
2     | 16  | 2.46  | 10  | 1.77  | 6   | 2.14  | 18   | 1.67
3     | 5   | 0.77  | 2   | 0.35  | 0   | 0     | 1    | 0.09
4     | 2   | 0.31  | 2   | 0.35  | 1   | 0.36  | 4    | 0.37
5     | 0   | 0     | 0   | 0     | 0   | 0     | 0    | 0
6     | 2   | 0.31  | 0   | 0     | 0   | 0     | 3    | 0.28
7     | 1   | 0.15  | 0   | 0     | 0   | 0     | 0    | 0
8     | 0   | 0     | 0   | 0     | 0   | 0     | 0    | 0
Total | 651 | 100   | 564 | 100   | 280 | 100   | 1075 | 100
>= 1  | 79  | 12.14 | 58  | 10.28 | 22  | 7.86  | 103  | 9.58

Note. SC2 = Starting Cohort 2; SC3 = Starting Cohort 3; SC4 = Starting Cohort 4; W = Wave.
4.1.2 Missing responses per item

Table 5 contains information about implausible, unspecific missing, and total missing values per item. Again, implausible values appeared only incidentally. Unspecific missing value rates per item ranged from 0% to 7.1%, depending on the item and the wave/starting cohort. Item e22540f had the highest rates (4.3% or more in all waves and starting cohorts). Because implausible values made up only a small share of all missing values, the total missing values per item hardly differed from the unspecific missing values per item and are therefore not described separately.

Table 5
Rates of Implausible, Unspecific, and Total Missing Values per Item

Item    | SC2: W5 Freq. | % | SC3: W2 Freq. | % | SC3: W7 Freq. | % | SC4: W3 Freq. | %
------- | ------------- | - | ------------- | - | ------------- | - | ------------- | -
Implausible values |    |      |    |      |    |      |    |
e22540a | 0  | 0    | 1  | 0.18 | 0  | 0    | 0  | 0
e22440b | 2  | 0.31 | 0  | 0    | 0  | 0    | 1  | 0.09
e22340c | 0  | 0    | 0  | 0    | 0  | 0    | 0  | 0
e22540d | 2  | 0.31 | 0  | 0    | 0  | 0    | 0  | 0
e22540e | 0  | 0    | 1  | 0.18 | 0  | 0    | 2  | 0.19
e22540f | 1  | 0.15 | 0  | 0    | 0  | 0    | 1  | 0.09
e22340g | 1  | 0.15 | 0  | 0    | 0  | 0    | 2  | 0.19
e22540h | 0  | 0    | 0  | 0    | 0  | 0    | 0  | 0
e22440i | 1  | 0.15 | 0  | 0    | 0  | 0    | 1  | 0.09
e22340j | 6  | 0.92 | 3  | 0.53 | 0  | 0    | 2  | 0.19
e22540k | 2  | 0.31 | 0  | 0    | 1  | 0.36 | 0  | 0
e22540l | 0  | 0    | 1  | 0.18 | 0  | 0    | 1  | 0.09
Unspecific missing values |    |      |    |      |    |      |    |
e22540a | 6  | 0.92 | 5  | 0.89 | 2  | 0.71 | 5  | 0.47
e22440b | 6  | 0.92 | 2  | 0.35 | 1  | 0.36 | 9  | 0.84
e22340c | 1  | 0.15 | 0  | 0    | 0  | 0    | 2  | 0.19
e22540d | 6  | 0.92 | 6  | 1.06 | 4  | 1.43 | 17 | 1.58
e22540e | 7  | 1.08 | 3  | 0.53 | 0  | 0    | 5  | 0.47
e22540f | 46 | 7.07 | 28 | 4.96 | 12 | 4.29 | 55 | 5.12
e22340g | 11 | 1.69 | 10 | 1.77 | 3  | 1.07 | 19 | 1.77
e22540h | 3  | 0.46 | 2  | 0.35 | 2  | 0.71 | 5  | 0.47
e22440i | 4  | 0.61 | 4  | 0.71 | 3  | 1.07 | 5  | 0.47
e22340j | 9  | 1.38 | 5  | 0.89 | 0  | 0    | 5  | 0.47
e22540k | 7  | 1.08 | 6  | 1.06 | 1  | 0.36 | 7  | 0.65
e22540l | 6  | 0.92 | 1  | 0.18 | 2  | 0.71 | 6  | 0.56
Total missing values |    |      |    |      |    |      |    |
e22540a | 6  | 0.92 | 6  | 1.06 | 2  | 0.71 | 5  | 0.47
e22440b | 8  | 1.23 | 2  | 0.35 | 1  | 0.36 | 10 | 0.93
e22340c | 1  | 0.15 | 0  | 0.00 | 0  | 0.00 | 2  | 0.19
e22540d | 8  | 1.23 | 6  | 1.06 | 4  | 1.43 | 17 | 1.58
e22540e | 7  | 1.08 | 4  | 0.71 | 0  | 0.00 | 7  | 0.65
e22540f | 47 | 7.22 | 28 | 4.96 | 12 | 4.29 | 56 | 5.21
e22340g | 12 | 1.84 | 10 | 1.77 | 3  | 1.07 | 21 | 1.95
e22540h | 3  | 0.46 | 2  | 0.35 | 2  | 0.71 | 5  | 0.47
e22440i | 5  | 0.77 | 4  | 0.71 | 3  | 1.07 | 6  | 0.56
e22340j | 15 | 2.30 | 8  | 1.42 | 0  | 0.00 | 7  | 0.65
e22540k | 9  | 1.38 | 6  | 1.06 | 2  | 0.71 | 7  | 0.65
e22540l | 6  | 0.92 | 2  | 0.35 | 2  | 0.71 | 7  | 0.65

Note. SC2 = Starting Cohort 2; SC3 = Starting Cohort 3; SC4 = Starting Cohort 4; W = Wave.
In summary, the implausible value rates per item were low. In conjunction with the very low rates per person, this result suggests that respondents did not experience major difficulties with using the scale's response format. The unspecific missing value rates per item and per person were acceptable, although item e22540f showed elevated rates compared to the other items.

4.2 Item Distributions

Figures 1 and 2 present the item response distributions in all samples. The distributions of multiple items were skewed or severely skewed towards higher values. Moreover, ceiling effects appeared (see, e.g., e22340c, e22540h, e22440i), and range restrictions were present. For five items, no responses were recorded for the category does not apply (e.g., e22340c, e22540h, e22540k) in at least one sample. For seven other items, although the respondents used all of the categories, under 2.5% of responses were recorded for the category does not apply (e.g., e22540d, e22540f, e22540l). Moreover, for some items hardly any responses were recorded for the second lowest category (does rather not apply; e.g., items e22340c, e22440i). These problems were present in all starting cohorts.
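Such range restrictions can also be screened for programmatically. The sketch below computes, for each item, the share of valid responses falling into each category; the 2.5% threshold mirrors the one used in the text. It is an illustration, not the procedure actually used in this paper.

```python
import pandas as pd

def category_usage(df: pd.DataFrame, items) -> pd.DataFrame:
    """Proportion of valid responses (1-4) per response category and item."""
    valid = df[items].where(df[items].isin([1, 2, 3, 4]))
    usage = {item: valid[item].value_counts(normalize=True)
                              .reindex([1, 2, 3, 4], fill_value=0.0)
             for item in items}
    return pd.DataFrame(usage).T  # rows: items; columns: categories 1-4

def restricted_items(df: pd.DataFrame, items, threshold=0.025) -> list:
    """Items whose lowest category attracts fewer than `threshold` of responses."""
    usage = category_usage(df, items)
    return list(usage.loc[usage[1] < threshold].index)
```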
[Figure 1. Item response distributions: items e22540a-e22540f. Note. SC2 = Starting Cohort 2; SC3 = Starting Cohort 3; SC4 = Starting Cohort 4; W = Wave.]
[Figure 2. Item response distributions: items e22340g-e22540l. Note. SC2 = Starting Cohort 2; SC3 = Starting Cohort 3; SC4 = Starting Cohort 4; W = Wave.]
4.3 Discriminatory Power and Reliability

To assess the discriminatory power of the items, we calculated item-rest correlations within each subscale. Please note that the Student Support subscale includes only two items, so the reported value is simply the correlation between the two items. The discriminatory power of the items ranged from low (.147) to moderate (.428); only one item (e22540l) reached moderate values (> .3) in all samples. The reliabilities were also unsatisfactory, even for the longest subscale: they oscillated around .4 for Classroom Management and Student Support, and around .55 for Cognitive Activation. The results are presented in Table 6.

Table 6
Item-Rest Correlations and Reliability Coefficients by Starting Cohort and Wave

Subscale/item | SC2: W5 | SC3: W2 | SC3: W7 | SC4: W3
------------- | ------- | ------- | ------- | -------
Classroom Management (α = .361 / .388 / .441 / .442) |  |  |  |
e22340c | .248 | .239 | .270 | .293
e22340g | .168 | .184 | .255 | .223
e22340j | .236 | .300 | .291 | .299
Student Support (α = .408 / .317 / .402 / .506) |  |  |  |
e22440b / e22440i | .258 | .189 | .257 | .343
Cognitive Activation (α = .544 / .565 / .534 / .605) |  |  |  |
e22540a | .147 | .310 | .327 | .349
e22540d | .252 | .205 | .260 | .285
e22540e | .337 | .286 | .203 | .261
e22540f | .247 | .270 | .232 | .331
e22540h | .280 | .210 | .197 | .270
e22540k | .339 | .328 | .243 | .322
e22540l | .317 | .390 | .369 | .428

Note. Cronbach's α is given in parentheses after each subscale name, in the order SC2: W5 / SC3: W2 / SC3: W7 / SC4: W3. For Student Support, the item-rest correlation equals the correlation between the two items and is therefore reported once.
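For reference, the two statistics reported in Table 6 can be computed as in the following plain sketch. It works on complete cases and ignores the clustered sampling design; as noted above, for the two-item Student Support subscale the item-rest correlation reduces to the inter-item correlation.

```python
import pandas as pd

def item_rest_correlations(subscale: pd.DataFrame) -> pd.Series:
    """Correlation of each item with the sum of the remaining items."""
    data = subscale.dropna()
    total = data.sum(axis=1)
    return pd.Series({col: data[col].corr(total - data[col])
                      for col in data.columns})

def cronbach_alpha(subscale: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the sum)."""
    data = subscale.dropna()
    k = data.shape[1]
    item_variances = data.var(axis=0, ddof=1).sum()
    total_variance = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```

For example, `item_rest_correlations(df[["e22340c", "e22340g", "e22340j"]])` corresponds to the Classroom Management rows of Table 6.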
4.4 Internal Structure

Next, we tested the measure's internal structure. To increase the chances that the sample consisted of respondents who were committed to filling in the scale and provided valid responses, we excluded teachers with four or more missing values on the scale (over 25% of the items).

First, we ran confirmatory factor analyses to test whether the expected three-factor structure held in independent samples. The models did not include any cross-loadings, but the factors were allowed to correlate. The fit of the three-factor model was poor in all samples, indicating that the expected structure did not hold. Detailed information is presented in Table 7.

Second, because the CFA models did not fit, we ran exploratory factor analyses for each wave of each starting cohort separately to explore the scale's internal structure. We tested models with up to three factors. The scree test, the Kaiser criterion, model fit, and the interpretability of the results served as selection criteria. However, we did not find any solution that satisfied the criteria: the models either fitted poorly or were difficult to interpret. Moreover, the extracted factors differed between waves in their salient loadings.

Table 7
Fit of the Three-Factor CFA Models

Sample  | n    | Npar | χ²     | df | p      | RMSEA | CFI  | TLI
------- | ---- | ---- | ------ | -- | ------ | ----- | ---- | ----
SC2: W5 | 646  | 46   | 321.27 | 51 | < .001 | .091  | .760 | .690
SC3: W2 | 562  | 46   | 356.18 | 51 | < .001 | .103  | .699 | .611
SC3: W7 | 279  | 46   | 153.38 | 51 | < .001 | .085  | .765 | .696
SC4: W3 | 1068 | 46   | 717.23 | 51 | < .001 | .111  | .765 | .696

Note. SC2 = Starting Cohort 2; SC3 = Starting Cohort 3; SC4 = Starting Cohort 4; W = Wave; Npar = number of free parameters. Response categories does not apply (1) and does rather not apply (2) were merged in items e22440b, e22540h, e22440i, and e22540k to ensure the same number of categories in each sample.

As a consequence, we decided to exclude items e22440b, e22340c, e22540h, and e22440i from further analyses. The items were discarded due to their severely skewed response distributions (e22340c, e22540h, e22440i), dissimilar content (e22440b, e22340c, e22440i), or low factor loadings in Wave 5 of SC2 (e22440b, e22340c, e22540h, e22440i). All excluded items met at least two of these three criteria. The items discarded because of their dissimilar content tapped teacher perceptions of student behavior (e22440b, e22340c) or the teacher-student relationship (e22440i), whereas the remaining items referred to teaching practices or to the way of organizing instruction. We used information on factor loadings from one sample rather than all samples to inform the item selection in order to avoid overfitting the final model.

Next, we reran the EFA on the SC2 sample. Due to the satisfactory properties of the two-factor solution in Wave 5 of SC2, we then ran EFAs on the remaining samples. The scree test and the Kaiser criterion again pointed to the two-factor solution, and the models fitted the data well. Detailed results are presented in Table 8, and the corresponding factor loadings in Table 9.

Table 8
Fit of the Two-Factor EFA Models

Sample  | n    | Npar | χ²    | df | p    | RMSEA | CFI   | TLI
------- | ---- | ---- | ----- | -- | ---- | ----- | ----- | -----
SC2: W5 | 646  | 15   | 33.46 | 13 | .002 | .049  | .964  | .923
SC3: W2 | 562  | 15   | 17.88 | 13 | .162 | .026  | .993  | .985
SC3: W7 | 279  | 15   | 8.98  | 13 | .775 | .000  | 1.000 | 1.033
SC4: W3 | 1068 | 15   | 26.62 | 13 | .014 | .031  | .992  | .983

Note. SC2 = Starting Cohort 2; SC3 = Starting Cohort 3; SC4 = Starting Cohort 4; W = Wave; Npar = number of free parameters. Items e22440b, e22340c, e22540h, and e22440i were excluded. Response categories does not apply (1) and does rather not apply (2) were merged in item e22540k to ensure the same number of categories in each sample.

Table 9
Factor Loadings in the Two-Factor EFA Models

Item     | SC2: W5 F1 | F2     | SC3: W2 F1 | F2      | SC3: W7 F1 | F2     | SC4: W3 F1 | F2
-------- | ---------- | ------ | ---------- | ------- | ---------- | ------ | ---------- | ------
e22540a  | -0.101     | 0.421* | 0.032      | 0.578*  | 0.123      | 0.559* | 0.001      | 0.644*
e22540d  | 0.532*     | 0.102  | 0.654*     | -0.082  | 0.416*     | 0.136  | 0.676*     | -0.010
e22540e  | 0.269*     | 0.293* | 0.335*     | 0.159*  | 0.358*     | 0.039  | 0.362*     | 0.078*
e22540f  | 0.779*     | -0.002 | 0.666*     | 0.003   | 0.651*     | -0.002 | 0.697*     | 0.016
e22340g  | 0.637*     | -0.032 | 0.603*     | 0.037   | 0.629*     | -0.040 | 0.624*     | -0.008
e22340j  | 0.427*     | 0.047  | 0.383*     | -0.038  | 0.460*     | 0.096  | 0.390*     | 0.002
e22540k  | 0.053      | 0.545* | 0.164*     | 0.334*  | 0.088      | 0.198  | 0.096*     | 0.390*
e22540l  | -0.003     | 0.757* | -0.005*    | 0.991*  | -0.004     | 0.918* | -0.002     | 0.950*
F. corr. | .183*      |        | .195*      |         | .262*      |        | .259*      |

Note. SC2 = Starting Cohort 2; SC3 = Starting Cohort 3; SC4 = Starting Cohort 4; W = Wave; F. corr. = factor correlation. Items e22440b, e22340c, e22540h, and e22440i were excluded. Response categories does not apply (1) and does rather not apply (2) were merged in item e22540k to ensure the same number of categories in each sample. * p < .05
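To make the factor-extraction step concrete, the following sketch illustrates the Kaiser criterion and the fit cutoffs on the eight retained items. Note that the analyses reported above were run in Mplus with the WLSMV estimator on ordered-categorical data and cluster-robust standard errors; the Pearson-correlation approximation below is for illustration only.

```python
import numpy as np
import pandas as pd

RETAINED = ["e22540a", "e22540d", "e22540e", "e22540f",
            "e22340g", "e22340j", "e22540k", "e22540l"]

def kaiser_number_of_factors(df: pd.DataFrame) -> int:
    """Eigenvalues of the item correlation matrix; the Kaiser criterion
    retains those above 1, and the sorted values serve as scree-plot input."""
    corr = df[RETAINED].dropna().corr()
    eigenvalues = np.linalg.eigvalsh(corr.values)[::-1]  # descending order
    print("Scree input:", np.round(eigenvalues, 2))
    return int((eigenvalues > 1).sum())

def good_fit(cfi: float, tli: float, rmsea: float) -> bool:
    """Cutoffs adopted in Section 3.2 (Hu & Bentler, 1999)."""
    return cfi >= .95 and tli >= .95 and rmsea <= .06
```

Applying `good_fit` to the values in Table 7, for instance, flags all three-factor CFA models as inadequate, in line with the conclusion above.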
Despite the good fit, the extracted factors did not resemble the expected ones. For example, the items originally assigned to Cognitive Activation loaded on both factors, and factor F1 gathered items belonging to both Cognitive Activation and Classroom Management. Moreover, the pattern of factor loadings was not fully consistent across samples. For example, item e22540k did not load saliently on F2 in Wave 7 of SC3, whereas it did in the other samples; item e22540e loaded on both factors in Wave 5 of SC2, but saliently on F1 only in the other samples. Wave 2 of SC3 and Wave 3 of SC4 had the most similar factor structures.

In summary, the results did not support the expected structure of the scale. We did not find three factors corresponding to the dimensions of instructional quality. Instead, two factors emerged that resembled direct transmission (F1) and constructivist (F2) beliefs about teaching, although they referred to teaching behaviors or teaching strategies rather than convictions. Therefore, the scale needs a substantial revision. Possible changes include rewording or replacing the items with unclear assignment to factors and balancing the number of items across subscales.

5. Summary

This paper documents the "Teaching quality" scale used in Starting Cohort 2, Starting Cohort 3, and Starting Cohort 4 to assess teacher perceptions of the quality of their own instruction. Besides providing information about the scale's source and theoretical background, the paper reports basic information about its psychometric properties. The scale was administered four times: in Wave 5 of SC2, Waves 2 and 7 of SC3, and Wave 3 of SC4.

The analyses showed that the rates of missing values per person and per item were acceptable in all waves, although item e22540f had the highest rates in all samples. Under 2% of teachers provided at least one implausible response, whereas from 7.5% (Wave 7 of SC3) to 10.8% (Wave 5 of SC2) omitted one or more items. The item response distributions were skewed, and the percentages of responses recorded for the lowest category were very low in all samples. Item e22340c showed particularly severe skewness, including a restricted response range. The items' discriminatory power was low except for one item (e22540l). The subscale reliabilities were also low: they did not exceed .44 for Classroom Management, .51 for Student Support, and .61 for Cognitive Activation. The expected three-factor structure did not hold. Exploratory factor analyses revealed that the scale was two-dimensional; however, this required excluding four items. Moreover, the extracted factors, which resembled direct transmission and constructivist approaches to teaching, differed to some extent between the samples.

The results suggest that the scale does not provide the expected information about instructional quality and, as a consequence, needs a major revision. Possible changes include modifying or replacing severely skewed items and items with low discriminatory power or unclear assignment to subscales, modifying the response scale, and balancing the number of items across subscales.
References

Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling, 13(2), 186-203.

Blossfeld, H.-P., Rossbach, H.-G., & von Maurice, J. (Eds.). (2011). Education as a lifelong process: The German National Educational Panel Study (NEPS) [Special issue]. Zeitschrift für Erziehungswissenschaft, 14.

Codebuch zum Lehrerfragebogen Welle 1 [BiKS-8-14 Grundschule]. (n.d.). Otto-Friedrich-Universität Bamberg, DFG.

Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9(4), 466-491. https://doi.org/10.1037/1082-989X.9.4.466

Furthmüller, P. (2014). Skalenverzeichnis. Skalen und Indizes der Scientific-Use-Files 2005 bis 2009. Studie zur Entwicklung von Ganztagsschulen (StEG).

Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.

Hawrot, A. (2021). Psychometric properties of the "Quality of instruction" scale in Starting Cohort 3 (NEPS Survey Paper No. 81). Leibniz Institute for Educational Trajectories, National Educational Panel Study.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55. https://doi.org/10.1080/10705519909540118

McDonald, R. P., & Ho, M.-H. R. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7(1), 64-82. https://doi.org/10.1037/1082-989X.7.1.64

Muthén, L. K., & Muthén, B. O. (1998-2015). Mplus user's guide (7th ed.). Muthén & Muthén.

Praetorius, A.-K., Klieme, E., Herbert, B., & Pinger, P. (2018). Generic dimensions of teaching quality: The German framework of Three Basic Dimensions. ZDM, 50(3), 407-426. https://doi.org/10.1007/s11858-018-0918-4

Seidel, T., & Shavelson, R. J. (2007). Teaching effectiveness research in the past decade: The role of theory and research design in disentangling meta-analysis results. Review of Educational Research, 77(4), 454-499. https://doi.org/10.3102/0034654307310317

Skopek, J., Pink, S., & Bela, D. (2012a). Data manual. Starting Cohort 2: From kindergarten to elementary school. NEPS SC2 1.0.0 (NEPS Research Data Paper). University of Bamberg.

Skopek, J., Pink, S., & Bela, D. (2012b). Data manual. Starting Cohort 3: From lower to upper secondary school. NEPS SC3 1.0.0 (NEPS Research Data Paper). University of Bamberg.

Skopek, J., Pink, S., & Bela, D. (2013). Starting Cohort 4: 9th grade (SC4). SUF version 1.1.0. Data manual (NEPS Research Data Paper). University of Bamberg.

Steinert, B., Gerecht, M., Klieme, E., & Doebrich, P. (2003). Skalen zur Schulqualität: Dokumentation der Erhebungsinstrumente. ArbeitsPlatzUntersuchung (APU), Pädagogische EntwicklungsBilanzen (PEB) (Materialien zur Bildungsforschung No. 10). GFPF.