SEXUAL ASSAULT EDUCATION PROGRAMS: A META-ANALYTIC EXAMINATION OF THEIR EFFECTIVENESS

Psychology of Women Quarterly, 29 (2005), 374–388. Blackwell Publishing. Printed in the USA.
Copyright © 2005 Division 35, American Psychological Association. 0361-6843/05


                                         Linda A. Anderson                      Susan C. Whiston
                                       Oregon State University                 Indiana University

        Meta-analyses of the effectiveness of college sexual assault education programs on seven outcome measure categories
        were conducted using 69 studies that involved 102 treatment interventions and 18,172 participants. Five of the outcome
        categories had significant average effect sizes (i.e., rape attitudes, rape-related attitudes, rape knowledge, behavioral
        intent, and incidence of sexual assault), while the outcome areas of rape empathy and rape awareness behaviors did
        not have average effect sizes that differed from zero. A significant finding of this study is that longer interventions are
        more effective than brief interventions in altering both rape attitudes and rape-related attitudes. Moderator analyses
        also suggest that the content of programming, type of presenter, gender of the audience, and type of audience may also
        be associated with greater program effectiveness. Implications for research and practice are discussed.

The disturbingly high incidence of sexual assault expe-                   dissertations and publications in this area can be a daunting
rienced by college women has been widely documented                       task. Consequently, despite increases in recent research, lit-
over the last few decades (e.g., Abbey, Ross, McDuffie, &                 tle is actually known about the overall effectiveness of these
McAuslan, 1996; Brener, McMahon, Warren, & Douglas,                       programs and whether they produce lasting attitudinal or
1999; Koss, Gidycz, & Wisniewski, 1987). Consequently,                    behavioral changes (Heppner, Neville, Smith, Kivlighan, &
the need for sexual assault prevention on college campuses                Gershuny, 1999).
nationwide has become increasingly apparent. The federal                      In an attempt to understand the value of these programs,
government has acknowledged the importance of this issue                  several narrative reviews of the literature have been pub-
by mandating that sexual assault prevention efforts be con-               lished (e.g., Bachar & Koss, 2001; Breitenbecher, 2000;
ducted on campuses that receive federal funding (Neville                  Gidycz, Rich, & Marioni, 2002; Lonsway, 1996; Schewe
& Heppner, 2002). As a result, college education programs                 & O’Donohue, 1993a; Yeater & O’Donohue, 1999). While
have emerged as one of the more popular strategies for                    several reviewers have concluded that most programs dis-
sexual assault prevention.                                                play short-term effectiveness in altering rape-supportive at-
   Although interventions have been developed and im-                     titudes, there is little understanding of the impact of these
plemented at various colleges and universities across the                 interventions beyond this point. Unfortunately, narrative
United States since the 1980s, few of these programs have                 review of such a broad range of findings has several limita-
been empirically evaluated (Schewe & O’Donohue, 1993a;                    tions. First, past narrative reviews of the sexual assault ed-
Yeater & O’Donohue, 1999). McCall (1993) summarized                       ucation literature typically have not included unpublished
this situation by contending, “[S]exual assault prevention                studies, and thus may tend to inflate the effectiveness of pro-
programming remains a confused, scattered, and sporadic                   gramming (Brecklin & Forde, 2001; Breitenbecher, 2000).
enterprise with little scientific underpinning” (p. 277). For-            Furthermore, narrative reviews do not provide quantita-
tunately, in recent years the research on these programs has              tive indices of the degree to which particular program ap-
expanded; however, drawing conclusions from the myriad of                 proaches are effective, nor do they typically systematically
                                                                          identify factors that may moderate or influence program
                                                                          effectiveness. For many involved in sexual assault educa-
                                                                          tion, it is not sufficient to know if these interventions are
Linda A. Anderson, University Counseling and Psychological Ser-
vices, Oregon State University; Susan Whiston, Department of
                                                                          generally effective because they are interested in develop-
Counseling & Educational Psychology, Indiana University.                  ing programs that have the most significant impact on par-
  Address correspondence and reprint requests to: Linda A.                ticipants. Hence, identification and analysis of moderator
Anderson, University Counseling and Psychological Services,               variables may be particularly important.
Oregon State University, 500 Snell Hall, Corvallis, OR 97331.                 Meta-analysis is a technique that overcomes some of the
E-mail: Linda.Anderson@oregonstate.edu                                    limitations of narrative reviews and provides a numerical


indicator of program effectiveness (i.e., effect size) that al-   behavioral categories and were adapted from construct
lows individuals to determine the degree to which interven-       categories offered by Breitenbecher (2000). Although these
tions are effective (Lipsey & Wilson, 2001). Furthermore,         outcome categories are likely to be correlated, they were
meta-analytic techniques can be used to identify variables        analyzed separately to obtain information about the differ-
that influence effect size (i.e., moderator variables). There     ential impact of programming on various constructs, partic-
have been two previous meta-analytic reviews of sexual as-        ularly in light of concerns about the tenuous link between
sault education programs. The first meta-analytic review of       attitudes and behaviors. Lipsey and Wilson (2001) recom-
the sexual assault education literature (Flores & Hartlaub,       mended this approach, as averaging effect sizes across all of
1998) yielded an average effect size of .30. This effect size     these constructs would result in more ambiguous and less
indicates that those attending a sexual assault education         meaningful results.
program would have outcomes almost a third of a standard              Attitude measurements employed in the sexual assault
deviation better than participants who had not attended a         education literature represent a diverse range and thus
program. Flores and Hartlaub’s meta-analysis included only        were divided into three categories. For the purpose of
11 studies that used rape-myth acceptance as the sole out-        this investigation, dependent measures that may be cat-
come measure. The second meta-analysis found an overall           egorized as rape attitudes were those that assessed at-
mean effect size of .35, based on 45 studies (Brecklin &          titudes specific to rape, such as rape myth acceptance,
Forde, 2001). Brecklin and Forde (2001) were also able to         attitudes toward rape, and rape victim blame. This cat-
identify several variables that moderated effect size. They       egory is similar to the outcomes analyzed in prior meta-
found that programs were more effective for men in single-        analyses. Rape empathy was the second outcome category.
gender than in mixed-gender groups. They also found that          It included scales designed to measure empathy specifi-
published studies had larger effect sizes, supporting the         cally related to rape and the degree to which participants
need to include both published and unpublished research           identified with either rape victims or perpetrators. This out-
in future reviews. However, Brecklin and Forde’s (2001)           come was differentiated because reviewers have specifi-
meta-analysis considered only one category of outcome,            cally identified empathy as a construct targeted in educa-
rape attitudes. The current study was designed to expand          tional programming (Lonsway, 1996; Schewe, 2002). The
on previous analyses of the effectiveness of sexual assault       third attitudinal construct was rape-related attitudes. The
education programs by (a) examining a more diverse set of         scales incorporated into this category did not measure rape-
outcomes and (b) exploring whether a wider spectrum of            specific attitudes, but assessed attitudes thought to pro-
program factors (e.g., type of presenter, content of program)     mote the occurrence of sexual assault. This category in-
may influence program effectiveness.                              cluded measures of sex-role stereotyping, attitudes toward
    Concerning the first goal mentioned above, there are          women, and adversarial sexual beliefs. These measures
a variety of outcome measures that have been used by              were not included in past meta-analyses; thus, this category
individual researchers to examine the impact of sexual            may contribute additional information about the impact of
assault education programs. Until recently, most studies          programming.
have relied exclusively upon attitudinal measures to as-              The fourth outcome, rape-related knowledge, consisted
sess the effectiveness of sexual assault education. Several       of measures of participants’ factual knowledge about sexual
researchers have questioned this restricted focus on at-          assault. The final three outcome categories reflected vary-
titudes as the only indicator of change (e.g., Heppner,           ing dimensions of behavior, which are typically defined and
Humphrey, Hillenbrand-Gunn, & DeBord, 1995; Lonsway,              assessed differently for women and men. Behavioral inten-
1996; Schewe & O’Donohue, 1993a) because there is still           tions included participants’ self-reported intent to rape or to
debate on whether a reduction in rape-supportive attitudes        engage in certain dating behaviors. Rape awareness behav-
will reduce the actual incidence of sexual assault. Conse-        iors referred to the actual self-reported or observed behav-
quently, investigators have begun to utilize measures that        iors of participants that may reflect heightened awareness
assess other outcomes, such as knowledge about sexual as-         about sexual assault (e.g., differences in dating behaviors or
sault, behaviors thought to be associated with sexual assault,    willingness to volunteer for rape prevention efforts). The
and incidence of perpetration or victimization. Due to the        final outcome category included in this study was the actual
recent expansion of outcome measurement in the literature,        incidence of sexual assault perpetration and victimization
it was considered important to incorporate diverse outcome        following an intervention.
measures in the current meta-analysis.                                The second major goal of the current investigation was
    In this investigation, seven different outcome variables      to examine the impact of several potential moderating vari-
were analyzed separately: rape attitudes, rape empathy,           ables. Descriptive characteristics about the study (e.g., pub-
rape-related attitudes, rape knowledge, behavioral inten-         lished or unpublished) and its methods and procedures
tions, rape awareness behaviors, and incidence of sex-            (e.g., sample size, random assignment, and time of follow-
ual assault. In essence, seven separate meta-analyses were        up measure) were incorporated in the moderator anal-
conducted, one for each distinct outcome variable. These          ysis to examine if the type and quality of the research
outcomes represent various attitudinal, knowledge, and            influence effect size. In addition, it was deemed critical to

examine both characteristics of the participants and the in-                              METHOD
terventions to delineate the attributes of effective programs.
                                                                 Selection of Studies
Attending to issues of gender is particularly important in an-
alyzing sexual assault education programs because rape is        Several strategies were used to ensure that virtually all per-
primarily a crime committed by males against females, and        tinent data, both published and unpublished, were used
therefore there are likely to be gender differences on the       in this meta-analysis. Initially, seven computerized refer-
outcome variables. The gender of the audience was targeted       ence database systems were searched: PsycINFO, ERIC,
for moderator analysis because the question of whether           MEDLINE, Dissertation Abstracts Online, Criminal Jus-
all-male, all-female, or mixed groups are more effective         tice Abstracts, Sociological Abstracts, and the Social Science
has been frequently debated. In a narrative review, Breit-       Citation Index. A number of combinations of key words
enbecher (2000) tentatively concluded that single-gender         (e.g., rape, sexual assault, prevention, intervention) were
programs appeared more effective. Similarly, Berkowitz           used to identify pertinent studies, dissertations or theses,
(2002) and Rozee and Koss (2001) argued that mixed-              and program evaluation reports. The second step for iden-
gender programs are less effective because men may be-           tifying relevant studies was to examine the references of the
come defensive in the presence of women. Brecklin and            articles obtained and of previous reviews. The third step in-
Forde (2001) explored this issue in their meta-analysis and      volved searching by hand the last 5 years of journals that
concluded that single-gender programs are indeed more            have traditionally published articles relating to sexual as-
effective at reducing men’s rape-supportive attitudes. This      sault to find articles that were comparatively recent and not
investigation sought to replicate prior conclusions concern-     in databases or cited by other researchers. The final strategy
ing rape attitudes, while also exploring the impact of single-   was to contact authors who have published multiple articles
and mixed-gender programming on additional outcome               concerning sexual assault education and to request their as-
constructs.                                                      sistance in locating studies. For more detailed information
    Based on the recommendations of previous review-             concerning methodological procedures, refer to Anderson
ers (e.g., Breitenbecher, 2000; Lonsway, 1996; Yeater &          (2003).
O’Donohue, 1999), additional moderators examined in this             To have been eligible for inclusion, studies must have
study included status of the intervention facilitator (e.g.,     examined an intervention intended to reduce negative at-
peer, graduate student, or professional), type of popula-        titudes and/or behaviors associated with sexual assault.
tion that received the intervention (e.g., fraternity mem-       Eligible studies must have measured the effectiveness of
bers), length of the program, and intervention content.          an intervention using one or more quantitative dependent
Surprisingly, previous reviews have not explored in de-          measures designed to assess one of seven different outcome
tail the degree to which content influences program ef-          categories: rape attitudes, rape empathy, rape-related atti-
fectiveness; thus, practitioners have little information con-    tudes, rape knowledge, behavioral intentions, rape aware-
cerning what type of content to include to maximize a            ness behaviors, and/or incidence of sexual assault. Because
program’s impact. Based on a comprehensive review of             the prevalence, definition, and general understanding of
the literature and an analysis of over 100 program de-           sexual assault may vary by culture, only studies that involved
scriptions, four general types of rape education programs        North American college students as participants were in-
were identified. Although some interventions exhibited el-       cluded in this meta-analysis. Furthermore, a study had to
ements of more than one type of program, an attempt was          compare one or more interventions with a control group.
made to categorize the content of each program based             Control groups could either be placebo, wait-list, minimal
upon its primary focus. The nature of coding this partic-        treatment, or no-treatment groups. Either random assign-
ular item was somewhat subjective; however, this strat-          ment of participants or pretests had to be given to ensure
egy provided at least an initial exploration of the effect of    equivalence of groups on outcome measures prior to the
differences in content to guide practitioners in program         intervention. Consistent with the suggestions of Hedges
development.                                                     and Olkin (1985), neither treatment-versus-treatment nor
    In sum, this meta-analysis expands on previous reviews       pretest–posttest comparisons were included so as to cal-
by focusing on additional characteristics of the participants,   culate the least biased estimate of the effectiveness of rape
facilitators, and content of the interventions and how these     education programs. Finally, each study also needed to pro-
factors may moderate outcome. Furthermore, because pre-          vide the necessary information to calculate effect sizes, such
vious reviews have examined the effect of sexual assault         as means and standard deviations or other statistical test
education programs on attitudinal variables only, this meta-     information.
analysis examined additional outcomes related to knowl-              Of the 120 studies identified as research on sexual as-
edge and behavior. Finally, additional statistical procedures    sault education programs, 51 of those studies could not be
as recommended by Hedges and Olkin (1985) and Lipsey             used for this meta-analysis. Many of those studies (n = 19)
and Wilson (2001) were implemented to calculate weighted         were eliminated because they lacked a control group. Other
effect sizes and to consider methodological factors of stud-     studies (n = 11) were not included because they dupli-
ies in drawing conclusions.                                      cated another study (e.g., a dissertation later published as a

journal article), and some studies (n = 10) did not pro-         domly selected portion of articles (n = 12) so that relia-
vide sufficient information to calculate effect sizes. Other     bility estimates could be obtained. The rate of agreement
studies were eliminated because the participants were not        between the two coders was calculated according to the
college students (n = 4) or because the studies included no      four types of moderator variables. The rate of agreement
relevant dependent measures (n = 2), lacked pretest or ran-      for source descriptors combined with methods and proce-
dom assignment (n = 4), or did not involve an intervention       dures was 95.5%, for participant characteristics 97.6%, and
(n = 1). After being screened, 69 studies met the criteria       for intervention characteristics 92.1%.
for inclusion in this review.
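
The screening counts reported above can be tallied as a quick consistency check. The sketch below is illustrative only: the dictionary keys are paraphrased labels, not the authors' coding categories, and the numbers are copied from the text.

```python
# Studies identified as research on sexual assault education programs.
identified = 120

# Exclusion counts as reported in the text (labels are paraphrased).
excluded = {
    "no control group": 19,
    "duplicate of another study": 11,
    "insufficient information for effect sizes": 10,
    "participants not college students": 4,
    "no relevant dependent measures": 2,
    "no pretest or random assignment": 4,
    "no intervention": 1,
}

total_excluded = sum(excluded.values())
included = identified - total_excluded

print(f"excluded: {total_excluded}, included: {included}")
```

The tally agrees with the 51 excluded and 69 included studies stated in the text.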
                                                                 Analysis
Coding Procedures
                                                                 Statistical analysis was conducted in accordance with the
Consistent with the recommendations of Stock (1994), a           guidelines provided by Lipsey and Wilson (2001). To iden-
manual was developed to guide coding. Attention was given        tify effects on behavior, knowledge, and attitudes, separate
to coding of moderator variables to determine whether fac-       meta-analyses were conducted for each of the seven out-
tors such as methodological sophistication or content of the     come categories. Each effect size (g) was calculated by sub-
intervention moderated the effect size of an investigation.      tracting the mean of the control group from the mean of
As suggested by Lipsey and Wilson (2001), moderator vari-        the experimental group and dividing by the pooled standard
ables were divided into three categories: source descriptors,    deviation. When means and standard deviations were not
methods and procedures, and substantive issues (i.e., char-      available from primary research studies, procedures sum-
acteristics of the participants and the intervention).           marized by Lipsey and Wilson (2001) to estimate standard-
    Source descriptors coded for this study included publi-      ized mean difference effect sizes from statistics (e.g., t tests)
cation form (journal, dissertation/thesis, unpublished) and      were utilized. Following the calculation of effect sizes, these
year of publication. Information concerning methods and          values were then transformed to d to correct for small sam-
procedures was coded as follows: unit and type of assign-        ple bias (Hedges & Olkin, 1985). Once effect sizes were
ment to conditions (individual random, group random, in-         calculated for each study, adjusted for bias and error, and
dividual nonrandom, group nonrandom), nature of control          combined within each construct, the standard error of the
group (placebo, no treatment, minimal information), time         estimate was calculated and each d was then weighted by
of follow-up measure, and overall quality of study. Guide-       its inverse variance, which produced the aggregated effect
lines were developed to standardize evaluation of the over-      size estimate d+ (Hedges & Olkin, 1985). This procedure
all quality of the study (see Anderson, 2003). We further        also corrects for bias because it requires each effect size
coded for attempt to control for social desirability or de-      to be weighted by a value (inverse variance weight) that
mand characteristics, sample attrition, standardization of       represents its precision (Lipsey & Wilson, 2001).
measures, and reliability of measures; however, these fac-           Lipsey and Wilson (2001) and Rosenthal (1991) argued
tors were not included in the moderator analyses because         that effect sizes within a distribution need to be statisti-
the information was inconsistently reported.                     cally independent and suggested using only one effect size
    Information coded about the participants included gen-       for each construct examined. One method for ensuring in-
der, mean age, race, gender of audience (all-female, women       dependence in outcome measures is to not use data from
from a mixed-gender group, all-male, men from a mixed-           the same measure more than one time. For example, some
gender group, women and men combined), and type of au-           studies will assess outcome at the end of treatment and then
dience (Greek members, general students, high risk). Infor-      later in a follow-up analysis to determine the long-term ef-
mation about the intervention was coded as follows: status of    fect of the intervention. We opted to calculate effect sizes
facilitator (peer, graduate student, or professional), length    in this meta-analysis using measures from the last follow-
of program, and content of program. The content of the           up analysis conducted for each study. Given that effect size
program was coded as follows: (a) informative (providing         tends to decrease with time, this conservative method was
factual information and statistics, review of myths and facts,   chosen to enhance our understanding of the long-term ef-
discussion of consequences of rape, identification of char-      fectiveness of programming, which may not be reflected
acteristics of rape scenarios); (b) empathy focused (helping     in an immediate posttest. As another way to ensure inde-
participants develop empathy for rape victims); (c) socializa-   pendence of data, we averaged the effect sizes within each
tion focused (examining gender-role stereotyping, societal       category to produce one effect size per outcome. Finally,
messages that influence rape, oppression); (d) risk reducing     when there was more than one treatment intervention within
(teaching specific strategies to reduce one’s risk of rape);     a single study, effect sizes were calculated for each treat-
(e) more than one content; (f) cannot determine; and (g)         ment relative to the control group, to avoid losing valuable
other. Theoretical foundation was also coded, but could not      information about different types of programs.
be analyzed as a moderator due to the dearth of studies that         Due to our interest in examining gender issues, effect
included this variable.                                          sizes for women and men were coded separately whenever
    The first author coded each article, and the second au-      possible, in an effort to determine whether women and men
thor (a professor of counseling psychology) coded a ran-         in single-gender groups benefit more from programming

than those in mixed-gender groups. However, some authors                  men whenever possible, the results are based on 262 effect
reported the results only for women and men combined.                     sizes. Approximately 43.9% of these effect sizes represent
    Following the calculation of mean weighted effect sizes               rape attitudes, 7.6% rape empathy, 21.4% rape-related at-
for each outcome construct, 95% confidence intervals were                 titudes, 6.9% knowledge, 9.5% behavioral intentions, 5.7%
computed to determine the significance of each mean effect                rape awareness behaviors, and 5% incidence of sexual as-
size. If the confidence interval does not include zero, then              sault. Before discussing the results of the seven separate
the mean effect is significantly different from zero (p < .05).           meta-analyses, we will provide a brief overview of all of the
Next, a homogeneity analysis was conducted to determine                   studies included.
whether each effect size represented a common popula-
tion mean. Homogeneity testing was performed according                    Studies Overview
to procedures described by Hedges and Olkin (1985) and                    The studies included in this meta-analytic review involved
was based upon the Q statistic, which follows a chi-square                18,172 participants, of which 48.7% were women (SD =
distribution with k – 1 degrees of freedom (k being the num-              35). The participants’ average age was 20.3 (SD = 2.3).
ber of effect sizes). If the effect size distribution was found           Race could be determined in 71% of the studies as fol-
to be heterogeneous (a significant Q value), this hetero-                 lows: 4.1% African American, 4.3% Asian American, 84%
geneity indicated that the dispersion of effect sizes around              Caucasian, 3.1% Latino/a, and 4.9% other participants. The
their mean was greater than that which would be expected                  studies were authored between 1978 and 2002 (M = 1995.1,
from sampling error alone (Lipsey & Wilson, 2001). In this                SD = 4.7), 58% were published in academic journals, 37.6%
circumstance, analyses of moderator variables should be                   were dissertations or theses, and 4.3% were conference pa-
conducted in order to identify additional sources of vari-                pers or other unpublished works. Regarding methodology,
ance. In addition, as recommended by Hedges and Olkin                     68% of studies used random assignment (of either groups
(1985), outliers were examined in an effort to determine                  or individuals), while 17% of studies attempted to control
if the removal of these values would cause the distribution               for social desirability or demand characteristics. The num-
to achieve homogeneity. For this investigation, an outlier                ber of outcome measures used in each study ranged from
was defined as an effect size that fell two standard devia-               1 to 26 with an average of 4.1 measures per study (SD =
tions above or below the mean of the distribution, which is a             3.7); however, only an average of 3.6 (SD = 2.7) of the mea-
common procedure according to Lipsey and Wilson (2001).                   sures could be coded. Reliability estimates were reported
    Of the 13 variables identified as potential moderators,               for the majority of measures. Regarding the type of control
7 were categorical and were investigated using a proce-                   group employed, 59.4% used a no-treatment control, 10.1%
dure analogous to analyses of variance (ANOVA; Hedges &                   used a wait-list control, 24.6% used a placebo intervention,
Olkin, 1985). Modified weighted least squares regression                  and 5.8% used minimal information groups for comparison
(Hedges & Olkin, 1985) was utilized for analysis of the six               with treatment groups. For additional information about
continuous variables.                                                     the studies included, refer to Anderson (2003).
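
The analysis steps described in this section (standardized mean difference, small-sample correction, inverse-variance weighting, 95% confidence interval, and the Q homogeneity statistic) can be sketched roughly as follows. This is a minimal illustration using the standard textbook formulas associated with Hedges and Olkin (1985) and Lipsey and Wilson (2001), not the authors' actual code; the function names and the input numbers are hypothetical.

```python
import math

def standardized_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """g: (treatment mean - control mean) / pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

def small_sample_correction(g, n_t, n_c):
    """d: g shrunk by the approximate factor 1 - 3 / (4N - 9), N = n_t + n_c."""
    return g * (1 - 3 / (4 * (n_t + n_c) - 9))

def variance_of_d(d, n_t, n_c):
    """Approximate sampling variance of d; its inverse is the weight."""
    return (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))

def pool(effects):
    """effects: list of (d, variance) pairs.
    Returns (d_plus, standard error, 95% CI, Q)."""
    weights = [1 / v for _, v in effects]
    d_plus = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    ci = (d_plus - 1.96 * se, d_plus + 1.96 * se)
    # Q is compared with a chi-square distribution on k - 1 degrees of freedom.
    q = sum(w * (d - d_plus) ** 2 for w, (d, _) in zip(weights, effects))
    return d_plus, se, ci, q

# Hypothetical study: treatment mean 12, control mean 10, both SDs 4, n = 20 each.
g = standardized_mean_difference(12, 10, 4, 4, 20, 20)
d = small_sample_correction(g, 20, 20)

# Hypothetical pair of effect sizes with equal variances.
d_plus, se, ci, q = pool([(0.2, 0.04), (0.4, 0.04)])
```

If the confidence interval returned by `pool` excludes zero, the mean effect would be treated as significant at p < .05, and a Q value above the chi-square critical value would prompt the moderator analyses described in the text.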
                                                                             Concerning the effectiveness of the 102 interventions
                                                                          evaluated, Table 1 provides the overall effect sizes for each
                           RESULTS
                                                                          of the seven outcome categories. As Table 1 indicates,
This meta-analysis included results from 69 articles rep-                 the mean weighted effect size for sexual assault education
resenting 102 treatment interventions. Because some                       programs ranged from .061 for rape awareness behavior
interventions were evaluated using multiple outcome mea-                  measures to .574 for measures of rape knowledge. As re-
sures, and effect sizes were coded separately for women and               flected by the confidence intervals, the effect sizes for rape

                                                                  Table 1
                  Sample Size, Mean Weighted Effect Size, 95% Confidence Interval, Homogeneity Test,
                                    and Fail-Safe N for Seven Outcome Categories
             Outcome Category                 k           d+           95% C.I.           Homogeneity Test            Fail-Safe N

             Rape attitudes                 115         .211∗∗         .175/.246         Q (114) = 215.41∗∗               569
             Rape empathy                    20         .072          −.003/.147          Q (19) = 45.86∗∗                —
             Rape-related attitudes          56         .125∗∗         .076/.174          Q (55) = 79.68∗                  86
             Rape knowledge                  18         .574∗∗         .498/.650          Q (17) = 119.54∗∗               118
             Behavioral intent               25         .136∗∗         .054/.217          Q (24) = 42.47∗                  16
             Awareness behavior              15         .061          −.018/.140          Q (14) = 19.24                  —
             Incidence                       13         .101∗∗         .036/.167          Q (12) = 33.47∗∗                  7
             Note. k = Number of effect sizes, d+ = Mean weighted effect size, C.I. = Confidence interval, and Q = Hedges & Olkin’s
             (1985) measure of homogeneity.
             ∗p < .05. ∗∗p < .01.
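
The columns of Table 1 can be illustrated with a short sketch. The code below is a minimal fixed-effect computation in the style of Hedges and Olkin (1985), assuming that each effect size d_i has a known sampling variance v_i and is weighted by 1/v_i. The function name and the input values are hypothetical and are not data from the studies in this meta-analysis.

```python
import math

def weighted_summary(d, v):
    """Fixed-effect summary of effect sizes d with sampling variances v."""
    w = [1.0 / vi for vi in v]                               # inverse-variance weights
    sum_w = sum(w)
    d_plus = sum(wi * di for wi, di in zip(w, d)) / sum_w    # mean weighted effect size
    se = math.sqrt(1.0 / sum_w)                              # standard error of d_plus
    ci = (d_plus - 1.96 * se, d_plus + 1.96 * se)            # 95% confidence interval
    # Homogeneity statistic; referred to chi-square with k - 1 degrees of freedom.
    q = sum(wi * (di - d_plus) ** 2 for wi, di in zip(w, d))
    return d_plus, ci, q

# Hypothetical inputs, for illustration only.
d_plus, ci, q = weighted_summary([0.10, 0.30], [0.01, 0.01])
```

A confidence interval that excludes zero corresponds to a significant mean effect size, and a Q value that exceeds the chi-square critical value on k − 1 degrees of freedom suggests that the effect sizes do not share a single population value.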
knowledge, rape attitudes, behavioral intent, rape-related attitudes, and incidence of sexual
assault were all significant at the .05 level. For the five significant effect sizes, fail-safe
Ns (Rosenthal, 1991) are also provided in Table 1. This statistic provides an estimate of the
number of unpublished studies with null results required to reduce the mean weighted effect
size to a value that is no longer statistically significant.
    The five outcome categories that had significant effect sizes were examined to determine if
the average effect sizes represented a homogeneous group of effect sizes or a heterogeneous
group that should be further explored through moderator analysis. Although rape knowledge and
incidence had significant mean effect sizes and significant tests of homogeneity, there were
too few studies in these categories to ascertain reliable findings; thus, moderator analyses
were not conducted for these outcomes. Moderator analyses were performed on the outcome
categories of rape attitudes, rape-related attitudes, and behavioral intentions because these
tests of homogeneity were significant, and a sufficient number of studies were available in
these categories for analysis.

Rape Attitudes

There were 115 effect sizes in this outcome category, based on 89 treatment interventions
conducted within 57 studies, which produced an average effect size of .211. On average, 41% of
the participants were women (SD = 29.5). The most common outcome measure was the Rape Myth
Acceptance Scale (Burt, 1980), and the average time for follow-up assessment was 34.9 days. The
average intervention lasted 142.6 minutes, but there was substantial variation in length of
programs (SD = 362.1). For the homogeneity analysis, results (Q = 215.41, p < .001) indicated
moderator analyses were warranted. Outlier analysis revealed the presence of six outliers.
However, deletion of these outliers only minimally reduced the overall effect size (i.e., from
.211 to .209) and did not impact its significance. Furthermore, the Q statistic retained
significance. These outliers were consequently retained for moderator analysis.
    The results of moderator analysis for the seven categorical variables are presented in
Table 2. When moderator variables are categorical, the interpretation is analogous to ANOVA,
where QB reflects the portion explained by the categorical variable and QW indicates the
residual pooled-within-groups portion. Journal articles were found to have a greater mean
effect size than unpublished works. An analysis of the gender of the audience also revealed
significant between-group differences. Women who received an intervention in an all-female
group displayed the largest effect sizes, although only three studies were found, and this
value was not significant. Finally, the content of the intervention appeared to moderate mean
effect size, and interventions categorized as “risk reducing” resulted in the largest mean
effect size, while empathy-focused programs and programs in which the content was unable to be
determined did not produce significant change. However, because of significant heterogeneity
within most of these categories, caution should be exercised in interpreting these results.
Random assignment, nature of control group, type of population, and status of facilitator were
not found to be significant moderators of effect size.
    When a significant overall test of homogeneity (Q) exists and moderator variables are
continuous (as compared to categorical in the previous analysis), then it is common to use
regression to test which of the moderator variables contributes uniquely to variation in effect
size. Two indices are used to assess the overall fit of the regression models, with QR
reflecting a partitioning of the variability into the portion associated with the regression
model and QE indicating the variability unaccounted for by the model. Examination of the six
predictor variables (see Table 3) revealed that only two of them (length of intervention and
number of participants) made significant contributions to variation among rape attitude effect
sizes. The positive direction of each B-weight indicates that longer interventions and larger
sample sizes are both associated with more positive change in rape attitudes.

Rape-Related Attitudes

This outcome category contained 56 effect sizes, based upon 43 treatment interventions from 26
studies, that produced an effect size of .125. On average, 38% of the participants were women
(SD = 33). The average length of the intervention was 240.1 (SD = 526.0) minutes, and the
follow-up assessment was conducted an average of 36.7 (SD = 111.5) days after the termination
of treatment. The homogeneity test revealed significant variation among effect sizes
(Q = 79.68, p < .05). Three effect sizes met the criteria as outliers; however, because
deletion of these outliers did not reduce the Q statistic beyond significant heterogeneity and
did not substantially change the average effect size, these effect sizes were retained in the
moderator analysis.
    Concerning rape-related attitudes, the moderator analyses for the categorical variables
appear in Table 4. Nonrandom assignment and nature of the control group appeared to impact the
magnitude of effect size. Once again it should be noted that some of the categories did not
obtain within-group homogeneity. In addition, only one study used a wait-list control group;
this category was dropped from analysis. Average effect sizes also differed significantly by
the type of population that received the intervention, the status of the facilitator, and the
content of the intervention. However, these significant between-group findings are tempered by
the lack of homogeneity within several of the categories.
    Modified weighted least squares regression analysis (see Table 5) was conducted for the six
continuous variables in the same manner as previously described for the rape attitudes outcome,
and the regression model was statistically

                                                                Table 2
                                      Categorical Moderators of Rape Attitudes Outcome
              Variable & Class                    k           d+               95% C.I.              QW               QB

              Type of publication                                                                                    4.00∗
                Journal                          60           .239              .19/.28           129.43∗∗
                Diss/thesis/unpub                55           .164              .11/.22            81.98∗∗
              Random assignment                                                                                      1.74
                Individual random                47           .215              .16/.27            94.92∗∗
                Group random                     32           .241              .17/.31            27.33
                Nonrandom                        36           .181              .12/.24            91.42∗∗
              Nature of control group                                                                                7.15
                No treatment                     71           .245             .20/.29            133.50∗∗
                Wait-list                         5           .081            −.06/.22             10.59∗
                Attention placebo                33           .202             .13/.27             47.27∗
                Minimal information               6           .140             .03/.25             16.90∗∗
              Type of population                                                                                     7.49
                General students                 94           .205             .17/.25            154.21∗∗
                Greek members                    13           .290             .20/.38             40.22∗∗
                High-risk                         5           .011            −.22/.24             10.39∗
                Other                             3           .045            −.21/.30              3.10
              Gender of audience                                                                                   14.83∗∗
                Female/female group               3           .287          −.013/.587              2.43
                Female/mixed group               27           .236           .163/.310             50.52∗∗
                Male/male group                  24           .111           .027/.196             38.46∗
                Male/mixed group                 29           .124           .039/.209             36.25
                Female/male combined             32           .273           .217/.329             72.93∗∗
              Status/facilitator                                                                                     3.59
                Peer                             19           .246              .17/.33            29.81∗
                Graduate student                 30           .172              .10/.25            30.22
                Professional                     28           .233              .17/.30            68.28∗∗
                Combination                      16           .154              .05/.26            38.94∗∗
                Unknown-N/A                      22           .222              .12/.32            44.58∗∗
              Content/intervention                                                                                 29.48∗∗
                Information                      39          .257              .20/.31             69.78∗∗
                Empathy                           9          .130             −.02/.28             18.69∗
                Socialization                    13          .327              .19/.46             19.17
                Risk reduction                    5          .435              .28/.59             12.53∗
                More than one                    36          .165              .10/.23             56.34∗
                Other                            13         −.019             −.14/.11              9.41
              Note. k = Number of effect sizes, d+ = Mean weighted effect size, C.I. = Confidence interval, QW = Homogeneity
              within each class, and QB = Homogeneity between classes.
              ∗p < .05. ∗∗p < .01.
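
The QW and QB columns of Table 2 partition the total homogeneity statistic in the same way that ANOVA partitions a sum of squares. For the type-of-publication rows, for example, the within-class values 129.43 and 81.98 plus the between-class value 4.00 sum to 215.41, the total Q reported for rape attitudes in Table 1. The sketch below shows this partition under the assumption of inverse-variance weights; the function names, class labels, and data are hypothetical.

```python
def q_statistic(d, v):
    """Weighted sum of squared deviations of d about its weighted mean."""
    w = [1.0 / vi for vi in v]
    mean = sum(wi * di for wi, di in zip(w, d)) / sum(w)
    return sum(wi * (di - mean) ** 2 for wi, di in zip(w, d))

def partition_q(classes):
    """Split total Q into within-class (QW) and between-class (QB) parts.

    classes maps a class label to a pair (effect sizes, sampling variances).
    """
    all_d, all_v, q_within = [], [], 0.0
    for d, v in classes.values():
        q_within += q_statistic(d, v)   # pooled within-class heterogeneity
        all_d.extend(d)
        all_v.extend(v)
    q_total = q_statistic(all_d, all_v)
    q_between = q_total - q_within      # heterogeneity explained by the moderator
    return q_total, q_within, q_between

# Hypothetical two-class moderator, for illustration only.
q_total, q_within, q_between = partition_q({
    "class_a": ([0.10, 0.30], [0.01, 0.01]),
    "class_b": ([0.50, 0.70], [0.01, 0.01]),
})
```

QB is referred to a chi-square distribution with one fewer degree of freedom than the number of classes, and each class's QW to a chi-square distribution with one fewer degree of freedom than its number of effect sizes.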

significant. The overall rating of the quality of the study and the time of measurement each
contributed significantly to the prediction of effect size. More specifically, studies with
higher overall quality ratings and studies with a greater time lapse between the end of the
intervention and the time of measurement were both associated with smaller effect sizes.
Furthermore, consistent with the outcome category rape attitudes, length of intervention was
significant, with longer interventions being associated with larger effect sizes.

Behavioral Intentions

Based upon 25 effect sizes obtained from 22 treatment interventions within 16 studies, the
overall d+ for studies using measures of behavioral intentions was .136. In this outcome
category, 31% of the participants were women. The average length of the intervention was 215
(SD = 386.2) minutes, and the follow-up assessment was conducted an average of 19.4 (SD = 43.9)
days later. The homogeneity test revealed significant variation among effect sizes. Outlier
analysis revealed the presence of one outlier (d+ = −.89), and removal of this outlier from the
distribution and recalculation of the average weighted effect size increased the value from
.136 to .15; however, the Q statistic dropped from 42.47 to 33.77, which borders on statistical
significance (p = .068). This outlier was excluded from further moderator analysis because,
unlike previously identified outliers, the study displayed several outstanding
                               Table 3
          Continuous Moderators for Rape Attitudes Outcome

Predictor                           B        SEB       ß          z

Year of publication              −.0091     .0049    −.147     −1.8548
Overall quality                   .0021     .0192     .009       .1083
Length of intervention            .0003     .0001     .227      2.7921∗∗
Percent attrition                −.0009     .0010    −.068      −.8993
Time of measurement              −.0002     .0004    −.045      −.5195
N of sample (after attrition)     .0002     .0001     .218      2.9792∗∗

Note. QR (6) = 19.134∗∗, QE (82) = 196.279∗∗, and R2 = .09. B = Unstandardized regression
coefficient, SEB = Corrected standard error value, ß = Standardized regression coefficient,
and z = z-test of significance.
∗∗p < .01.

methodological characteristics that may have impacted the results and was not typical of the
general pool of studies. Although this exclusion caused the Q statistic to drop below
significance, moderator analysis was nevertheless conducted to explore variations in effect
sizes.
    As Table 6 reflects, three variables appeared to moderate effect size for this outcome
category: nature of the control group, gender of audience, and status of the facilitator.
Modified weighted least squares regression analysis was conducted for the six continuous
variables for this outcome as well. The regression model was not statistically significant,
while the residual also did not attain statistical significance. This finding indicated that
the regression model was not correctly specified for this outcome; thus, it was not appropriate
to conclude that any of the regression coefficients was significantly different from zero.

                           DISCUSSION

The primary purpose of this meta-analysis was to investigate the effectiveness of sexual
assault education programs on college campuses using both published and unpublished studies.
Specifically, we were interested in whether these programs influenced attitudinal outcomes,
knowledge measures, and behavioral indices. Seven separate meta-analyses were conducted to
determine the impact programming has on these distinct outcome categories. The results of this
investigation indicate that the efficacy of sexual assault education programming on college
campuses appears to differ depending on which types of outcomes are considered.
    The outcome category that evidenced the most positive change was rape knowledge, which
produced a mean effect size of .57. This finding indicates that those who participated in a
sexual assault education program displayed greater factual knowledge about rape than those who
did not attend a program. The positive effect size for rape knowledge could be considered to
produce a “medium” effect using the guidelines suggested by Cohen (1988). The second largest
effect size was found for the rape attitudes category (.21), which suggests that sexual assault
education programming has a small but positive influence on rape attitudes. This mean effect
size is somewhat lower than the previous meta-analyses of Brecklin and Forde (2001, p. 35) and
Flores and Hartlaub (1998; .30). Possible reasons for these differences are the larger number
of studies included in this investigation, implementation of controls for data dependence, and
weighting of effect sizes as suggested by Hedges and Olkin (1985) and Lipsey and Wilson (2001).
Although the effect sizes for behavioral intentions, rape-related attitudes, and incidence of
sexual assault were statistically significant (.14, .12, and .10, respectively), the influence
of sexual assault programs on these outcomes may be of little clinical significance, as these
effect sizes do not reach the criteria for a small effect size (i.e., .20 to .40) as suggested
by Cohen (1988). Sexual assault education programming did not appear to have any impact on rape
empathy or rape awareness behaviors because the studies produced overall mean effect sizes that
were not significantly different from zero.
    Consequently, the answer to the question “Are sexual assault education programs effective?”
indeed depends upon the criteria used to define effectiveness (Breitenbecher, 2000). If
effectiveness is defined solely as a decrease in sexual assault, then there is little support
available from the current pool of studies. Although a decline in incidence may be the ultimate
goal of education programming, the extreme difficulty in obtaining accurate long-term
statistics regarding involvement with sexual assault following an intervention (Schewe &
O’Donohue, 1993a) indicates that additional outcomes should be considered. Our findings do
indicate that sexual assault education programs are somewhat effective in changing attitudes
toward rape and increasing rape knowledge. However, due to the dearth of studies using
behavioral outcomes, more research using behavioral indices is needed before definitive
conclusions can be reached.
    In addition, the effect sizes of certain outcome constructs were most likely influenced by
characteristics of the intervention, participants, and research methodology. For example, rape
attitudes may be more subject to demand characteristics than other outcomes because these
attitudes are often overtly discussed and disputed in sexual assault education workshops
(Lonsway, 1996), while other outcome measures may be less directly associated with program
content. A factor that may have influenced the effect sizes for behavioral variables pertains
to when the outcome was measured. In general, effect sizes tend to decrease when there is a
longer time between when the intervention is delivered and when the outcome is measured.
Outcomes such as rape awareness behaviors had longer average follow-up measurement times
(89 days) compared to rape attitudes (35 days). Hence, it is difficult to determine whether the
lower effect sizes associated with behavioral outcomes relative to attitudinal measures are due
to program impact or to time of assessment.
    An advantage of meta-analytic methodology is the ability to examine variables that
influence program outcome

                                                                  Table 4
                                 Categorical Moderators for Rape-Related Attitudes Outcome
                Variable & Class                     k            d+              95% C.I.             QW              QB

                Type of publication                                                                                     .81
                  Journal                           25           .139             .08/.20            48.85∗∗
                  Diss/thesis/unpub                 31           .090             .00/.18            30.03
                Random assignment                                                                                      9.03∗
                  Individual random                 25           .056           −.02/.13             23.65
                  Group random                      14           .077           −.04/.19              5.08
                  Nonrandom                         17           .214            .14/.29             41.91∗∗
                Nature of control group                                                                              14.92∗∗
                  No treatment                      38          .200             .14/.26             53.26∗
                  Wait-list                          1            —                 —                 —
                  Attention placebo                 11          .053            −.06/.17              7.66
                  Minimal information                6         −.028            −.14/.08              3.83
                Type of population                                                                                     6.38∗
                  General students                  43           .098            .04/.16             53.62
                  Greek members                      8           .242            .13/.35             14.68∗
                  High-risk                          5           .011           −.22/.24              5.00
                  Other                              0             —                —                 —
                Gender of audience                                                                                     1.99
                  Female/female group                1             —                —                 —
                  Female/mixed group                14           .114            .032/.196           33.09∗∗
                  Male/male group                   17           .169            .066/.273           19.20
                  Male/mixed group                  16           .094           −.002/.191           17.98
                  Female/male combined               8           .123           −.001/.247            7.42
                Status/facilitator                                                                                   10.56∗
                  Peer                               6           .157            .02/.29             15.41∗∗
                  Graduate student                  21           .029           −.06/.12             12.23
                  Professional                      12           .209            .13/.29             27.83∗∗
                  Combination                       10           .171            .02/.32              5.64
                  Unknown-N/A                        7           .032           −.12/.18              8.01
                Content/intervention                                                                                 14.73∗
                  Information                       19           .217            .13/.30             20.95
                  Empathy                            4           .094           −.22/.41              1.60
                  Socialization                      8           .300            .11/.48              6.00
                  Risk reduction                     2           .203           −.15/.56              2.33
                  More than one                     17           .030           −.04/.10             18.27
                  Other                              5           .125           −.03/.28             15.79∗∗
                Note. k = Number of effect sizes, d+ = Mean weighted effect size, C.I. = Confidence interval, QW = Homogeneity
                within each class, and QB = Homogeneity between classes.
                ∗p < .05. ∗∗p < .01.
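
The continuous moderators, such as length of intervention, were tested with modified weighted least squares regression rather than the class comparisons shown in Tables 2 and 4. The sketch below is a simplified single-predictor version, assuming inverse-variance weights and the corrected standard error described by Hedges and Olkin (1985); the reported analyses entered six predictors together, and the function name and data here are hypothetical.

```python
import math

def meta_regression(x, d, v):
    """Weighted least squares regression of effect sizes d on one moderator x."""
    w = [1.0 / vi for vi in v]                     # inverse-variance weights
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    d_bar = sum(wi * di for wi, di in zip(w, d)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (di - d_bar) for wi, xi, di in zip(w, x, d))
    b = sxy / sxx                                  # unstandardized slope B
    se_b = math.sqrt(1.0 / sxx)                    # corrected standard error of B
    q_total = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))
    q_r = b * sxy                                  # variability explained by the model
    q_e = q_total - q_r                            # residual variability
    return {"B": b, "SE_B": se_b, "z": b / se_b, "Q_R": q_r, "Q_E": q_e}

# Hypothetical lengths of intervention (minutes) and effect sizes.
result = meta_regression([60.0, 120.0, 180.0], [0.1, 0.2, 0.3], [0.01, 0.01, 0.01])
```

A positive B with a significant z would indicate, as reported for length of intervention, that larger values of the moderator go with larger effect sizes. A significant Q_E would indicate that heterogeneity remains after the moderator is taken into account.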

through moderator analysis. A significant finding of this meta-analysis is that longer
interventions (i.e., length of time exposed to material in minutes) seemed to be more effective
in altering both rape attitudes and rape-related attitudes. Interestingly, the range in length
of interventions was substantial (7 to 2,520 minutes); it seems sensible to conclude that a
7-minute intervention would be less effective than a much longer intervention. Although we did
not specifically test single- versus multi-session programming, these findings suggest that
semester-long courses or possibly multi-session workshops may be more effective in promoting
positive change. Flores and Hartlaub (1998) and Brecklin and Forde (2001), however, did not
find an association between the length of the intervention and program effectiveness. This
difference may in part be due to their analysis of this variable as categorical, whereas we
analyzed length of intervention as a continuous variable. We believe that, given the larger
number of studies in our meta-analyses, our finding that longer interventions were more
effective may more accurately represent the research in this area. Hence, we would encourage
those designing educational programs to institute longer, more thorough interventions rather
than brief programs. Because the attention span of students may be limited during one sitting,
an educator might consider multi-session programming.
    This study also found that the status of the facilitator appears to influence changes in
rape-related attitudes
                               Table 5
      Continuous Moderators for Rape-Related Attitudes Outcome

Predictor                           B        SEB       ß          z

Year of publication               .0066     .0072     .123       .9192
Overall quality                  −.0665     .0264    −.348     −2.5186∗
Length of intervention            .0004     .0001     .481      3.1913∗∗
Percent attrition                 .0012     .0014     .109       .8195
Time of measurement              −.0014     .0005    −.410     −2.7104∗∗
N of sample (after attrition)     .00002    .0001    −.024      −.1870

Note. QR (6) = 24.757∗∗, QE (49) = 54.779 (ns), and R2 = .31. B = Unstandardized regression
coefficient, SEB = Corrected standard error value, ß = Standardized regression coefficient,
and z = z-test of significance.
∗p < .05. ∗∗p < .01.

                               Table 6
        Categorical Moderators for Behavioral Intent Outcome

Variable & Class               k       d+       95% C.I.       QW         QB

Type of publication                                                       .03
  Journal                     12      .142      .01/.27      13.66
  Diss/thesis/unpub           12      .157      .05/.26      20.08∗
Random assignment                                                        4.47
  Individual random           14      .074     −.04/.19      14.90
  Group random                 5      .278      .12/.44       8.07
  Nonrandom                    5      .187     −.01/.38       6.32
Nature of control group                                                 10.38∗∗
  No treatment                16      .235      .12/.35      21.73
  Attention placebo            6      .009     −.13/.14       1.67
Type of population                                                       1.69
  General students            14      .178      .07/.28      26.62∗
  Greek members                4      .090     −.07/.25       2.38
  High-risk                    4      .207     −.11/.52       3.07
Gender of audience                                                      10.77∗
  Female/female group          2      .552      .15/.96        .43
  Female/mixed group           2     −.095     −.33/.14        .00
  Male/male group             14      .133      .02/.24      18.98
  Male/mixed group             5      .265      .10/.43       3.59
  Female/male combined         1       —          —            —
Status/facilitator                                                      12.51∗
  Peer                         8      .043     −.07/.16       2.13
  Graduate student             5      .124     −.07/.32       4.98
  Professional                 3      .449      .09/.81       1.04
  Combination                  3      .168     −.08/.41       4.89
  Unknown-N/A                  5      .427      .21/.64       8.22
Content/intervention                                                     5.27
  Information                  6      .095     −.05/.24       8.33
  Empathy                      5      .074     −.13/.27       1.96
  Socialization                4      .359      .13/.58       1.25
  Risk reduction               4      .105     −.07/.28       7.28

and behavioral intentions. Professional presenters were more successful, while graduate
students and peer presenters were generally less successful in promoting positive changes.
Although there should be some caution in interpreting these results, these findings do raise
questions about the common practice of employing peer facilitators. Peer education is popular
not only in rape education but also in a number of other health-related educational programs
(e.g., substance abuse, HIV, sexuality); however, both Backett-Milburn and Wilson (2000) and
Parkin and McKeganey (2000) have questioned whether there is sufficient research to support
this prevalent approach. Walker and Avis (1999) suggested several reasons why peer intervention
might fail, including a lack of investment in peer education (viewing peers as “cheap labor”);
lack of appreciation of the complexity of the peer education process and the need for highly
skilled personnel; and inadequate supervision, training, and                    More than one                5    .232     .03/.44    9.68∗
support. Consequently, it may be beneficial to address these                  Note. k = Number of effect sizes, d+ = Mean weighted effect size, C.I. =
concerns in future research before any conclusions can be                     Confidence interval, QW = Homogeneity within each class, and QB =
offered concerning the effectiveness of peer educators.                       Homogeneity between classes.
                                                                              ∗
                                                                                p < .05. ∗∗ p < .01.
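As a brief illustration of the statistics named in the table notes (d+, C.I., QW, QB, and the
regression R-squared), the following Python sketch computes them with fixed-effects,
inverse-variance weighting in the style of Hedges and Olkin (1985). It is a minimal sketch under
stated assumptions, not the authors' analysis code: the function names are illustrative, the effect
sizes and variances are hypothetical rather than data from this meta-analysis, and the ratio
QR / (QR + QE) is an assumed definition of R-squared that happens to be consistent with the values
reported in the regression note.

```python
import math


def weighted_mean(effects, variances):
    """Return (d_plus, standard_error) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    d_plus = sum(w * d for w, d in zip(weights, effects)) / total
    return d_plus, math.sqrt(1.0 / total)


def confidence_interval(effects, variances, z=1.96):
    """95% confidence interval around the weighted mean effect size."""
    d_plus, se = weighted_mean(effects, variances)
    return d_plus - z * se, d_plus + z * se


def q_within(effects, variances):
    """Homogeneity statistic: weighted squared deviations from d_plus."""
    d_plus, _ = weighted_mean(effects, variances)
    return sum((d - d_plus) ** 2 / v for d, v in zip(effects, variances))


def q_between(groups):
    """Between-class homogeneity: total Q minus the sum of within-class Q."""
    pooled_d = [d for effects, _ in groups.values() for d in effects]
    pooled_v = [v for _, variances in groups.values() for v in variances]
    q_total = q_within(pooled_d, pooled_v)
    q_w = sum(q_within(e, v) for e, v in groups.values())
    return q_total - q_w


def r_squared(q_regression, q_error):
    """Assumed definition: share of total Q explained by the regression."""
    return q_regression / (q_regression + q_error)


if __name__ == "__main__":
    # Hypothetical effect sizes and sampling variances for two presenter classes.
    groups = {
        "peer": ([0.05, 0.10, 0.00], [0.020, 0.030, 0.025]),
        "professional": ([0.40, 0.50], [0.040, 0.050]),
    }
    for name, (effects, variances) in groups.items():
        d_plus, _ = weighted_mean(effects, variances)
        low, high = confidence_interval(effects, variances)
        print(name, round(d_plus, 3), round(low, 2), round(high, 2),
              round(q_within(effects, variances), 2))
    print("QB:", round(q_between(groups), 2))
    # QR = 24.757 and QE = 54.779 are the values reported in the regression note.
    print("R-squared:", round(r_squared(24.757, 54.779), 2))  # 0.31
```

Each QB in the table would be compared against a chi-square distribution with (number of classes
minus 1) degrees of freedom, and each QW against one with (k minus 1) degrees of freedom; that step
is omitted here to keep the sketch free of external dependencies.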
    Another significant moderator of effect size for both rape attitudes and rape-related
attitudes was the content of the intervention. The results suggest that interventions that focus on
gender-role socialization, provide general information about rape, discuss rape myths/facts, and
address risk-reduction strategies have a more positive impact on participants’ attitudes than rape
empathy programs and interventions with unspecified contents. However, some considerations should
be addressed before concluding that rape empathy interventions are ineffective. First, the
difference in effectiveness for these programs could be associated with the types of outcome
measures utilized to assess positive change. These attitudinal measures tend to assess concepts
discussed in myth/fact and socialization-focused programs, while these concepts may not be as
directly addressed in empathy programs. In addition, these findings include only attitudinal data;
thus, whether programs with different content have any differential impact upon the behavior of
participants is unknown. Consequently, more empirical research examining the content of programming
is needed.
    Another pertinent finding was that programs that included more than one topic appeared to be
less effective than more focused programs, which may indicate that more in-depth programming
produces better outcomes than sessions that cover multiple topics more superficially. Furthermore,
this finding may be related to our finding that longer interventions are more effective and that
attempting to cover information too quickly may result in weak effects that have little long-term
impact. A final issue to consider when evaluating the content of sexual assault education
interventions is that the type of program offered may vary depending upon the gender of the
participants. Women are more likely to receive a risk-reduction intervention, while men may be more
likely to receive an empathy intervention.
Due to gender differences in rape attitudes and behaviors, the gender of participants may influence
findings of overall effectiveness within a particular content category.
    Type of audience was also a significant moderator of effect size for rape-related attitudes.
Greek members appeared to be the most positively impacted by educational programming, which is of
interest because it is often thought that fraternity and sorority members are at greater risk to
experience sexual assault (e.g., Copenhaver & Grauerholz, 1991; Sandy, 1990; Schwartz & DeKeseredy,
1997). Although high-risk populations did not appear to demonstrate positive changes in attitudes,
this finding must be observed with caution because this category included only five studies and
consisted of heterogeneous groups. Consequently, more research is needed to explore the difference
in responsiveness to education among specific high-risk groups.
    Another important moderator is the gender of the audience. For women, a significant positive
effect size was found for rape attitudes when the program was conducted with mixed-gender groups.
Although a relatively high effect size (.29) was found for women in all-female groups, this value
was not significant and was based on only three studies. In contrast, tentative findings for
behavioral intentions suggest that women may have a better outcome in an all-female setting and
that mixed-gender programming may not be effective. However, these findings are based on only four
studies, and thus further research is needed before any conclusion is drawn. Surprisingly, there
was no evidence from these data that men are more likely to benefit from programming administered
in all-male groups as compared to men in mixed-gender groups. Although there was no significant
difference between the two groups, it is surprising to note that men from mixed-gender groups
displayed a larger effect size for behavioral intentions.
    These results contradict Brecklin and Forde’s (2001) findings that single-gender programs were
more effective for men than mixed-gender programs. Differences between our study and Brecklin and
Forde’s may provide some insights into reasons for the conflicting findings. It should be noted
that Brecklin and Forde’s results were related only to rape attitudes and their meta-analysis did
not include behavioral measures. In addition, the current investigation included a larger number of
studies and controlled for data dependency. Because we included effect sizes only from the last
follow-up evaluation, our study may also offer a more accurate indication of the longer-term
effectiveness of programming. Considering the significance of this issue and the recent support for
single-gender programs above mixed-gender programs (e.g., Berkowitz, 2002; Gidycz et al., 2002;
Rozee & Koss, 2001; Schewe, 2002), more empirical research on this question is necessary.
    Schewe and O’Donohue (1993a) and Yeater and O’Donohue (1999) have voiced concerns about the
lack of methodological sophistication in sexual assault education research and the potential to
create programs based on misleading findings; therefore, particular attention was focused in this
meta-analysis on incorporating research methodology and design variables into our analyses. Our
findings suggest that studies that are published, are rated as lower quality, lack random
assignment, have larger sample sizes, and employ no-treatment control groups have larger effect
sizes. Collectively, these results suggest that low methodological standards may lead to
potentially erroneous conclusions about the effectiveness of sexual assault education
interventions. Although these findings are not consistent across every methodological
characteristic and varied across outcome variables, methodological rigor is necessary in future
research to provide more precise findings. Brecklin and Forde (2001) also found publication bias in
their meta-analysis, which suggests that unpublished studies should continue to be included in
future reviews of this literature. Consistent with Brecklin and Forde (2001) and Flores and
Hartlaub (1998), we found that for rape-related attitudes, the length of time between the end of
treatment and the assessment of the impact of the program was a significant moderator of treatment
effectiveness. Therefore, there are consistent findings that the positive effects of treatment tend
to diminish over time.

Limitations

There are several limitations to this study that must be acknowledged. First, the results of any
meta-analytic review are only as sound as the studies included in the analysis (Lipsey & Wilson,
2001). Although criteria were specified a priori to exclude studies with serious methodological
problems, it should be noted that many studies contained some limitations, which in turn restricted
the conclusions of this meta-analysis. Another limitation concerns the amount of unexplained
variance found in many of the univariate moderator analyses, as well as the underspecification of
the regression model for the rape attitudes outcome. These conditions suggest that several of the
findings from the moderator analyses should be viewed with caution, because there were additional
sources of variance that remain unexplained. In particular, the possibility of interaction effects
must be considered because the findings of moderator analyses may be influenced by other
potentially related variables. Moreover, the small number of studies included in the behavioral
intentions moderator analysis also limits the generalizability of these findings. Although attempts
were made to limit the number of moderator analyses, another issue concerns a potential Type I
error due to the number of univariate analyses that were conducted for each outcome. However, given
our adherence to the procedures suggested by Hedges and Olkin (1985) and Lipsey and Wilson (see
Anderson, 2003 for details), the probability of a Type I error was reduced.
    Although attention was given to systematic and objective coding, certain moderator variables
were more sensitive to coder subjectivity. In particular, it was challenging to code