Do Experts and Novices Evaluate Movies the Same Way?
Jonathan A. Plucker, Indiana University
James C. Kaufman and Jason S. Temple, California State University, San Bernardino
Meihua Qian, Indiana University

ABSTRACT

Do experts and novices evaluate creativity the same way? This question is particularly relevant to the study of critical and public response to movies. How do public opinions differ from movie critic opinions? This study assessed college student (i.e., novice) ratings of movies released from 2001 to 2005 and compared them to expert opinions and those of self-declared novices on major movie rating Web sites. Results suggest that the student ratings overlapped considerably—but not overwhelmingly—with those of the self-described novices; that student ratings correlated at a lower magnitude with critic ratings; and that the ratings of students who saw the most movies correlated more highly with both critics and self-described novices than those of students who saw the fewest movies. The results suggest a continuum of creative evaluation in which the distinctions between categories such as "novice," "amateur," and "expert" are blurry and often overlap—yet the categories of expertise are not without importance.

© 2009 Wiley Periodicals, Inc. Psychology & Marketing, Vol. 26(5): 470–478 (May 2009). Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/mar.20283
Psychologists studying creativity, with a long history of applied research in the arts and literature, have recently turned their empirical attention to the study of film. For example, Simonton (2004a, 2004b, 2005a, 2005b, 2006, 2007a, 2007b, 2007c) has conducted a series of increasingly complex studies examining movie awards, critic ratings, box office, and their relationships, among many other phenomena. Delmestri, Montanari, and Usai (2005) have studied the artistic merit and commercial success of Italian films, and Plucker, Holden, and Neustadter (2008) have investigated the psychometric properties of various film ratings and the ratings' correlation with specific box office data. Zickar and Slaughter (1999) have used the careers of film directors to investigate patterns of longitudinal creative production.

The reasons for this sudden growth in interest include the large, readily available pool of extant data on movies, the relatively thin pool of studies on creativity in film relative to other popular art forms, and the traditional, popular fascination with movies. However, perhaps the major motivation for studying cinematic creativity is the ability to influence the creation and, especially, the marketing of movies. Much of the existing empirical research on film marketing comes, not surprisingly, from marketing researchers. But creativity researchers, with their appreciation for both the creative product and process, may have insights into how all aspects of film—writing, directing, acting, production, distribution, marketing—can be better understood (and perhaps improved) by the application of models and theories from the broader literature on creativity and innovation.

THE CRITERION PROBLEM

A problem encountered by researchers interested in cinematic creativity is a version of the traditional criterion problem that affects all creativity research.
MacKinnon (1978) argued that "the starting point, indeed the bedrock of all studies of creativity, is an analysis of creative products, a determination of what it is that makes them different from more mundane products" (p. 187). Csikszentmihalyi (1999) wrote,

    If creativity is to have a useful meaning, it must refer to a process that results in an idea or product that is recognized and adopted by others. Originality, freshness of perception, divergent-thinking ability are all well and good in their own right, as desirable personal traits. But without some sort of public recognition they do not constitute creativity. . . . The underlying assumption [in all creativity research] is that an objective quality called "creativity" is revealed in the products, and that judges and raters can recognize it. (p. 314)

Many psychologists have shared MacKinnon's and Csikszentmihalyi's belief in the importance of the creative product. This emphasis emerged in response to perceived needs for external criteria against which researchers could compare other methods of measuring creativity for the purpose of establishing evidence of validity. However, an absolute and indisputable criterion of creativity is not readily available, hence the criterion problem (Plucker & Renzulli, 1999).

Although approaches to the study of creative products vary greatly (see Plucker & Renzulli, 1999), the most common method for the measurement of creative products utilizes the ratings of external judges, especially the use of consensual assessment techniques (Amabile, 1983; Baer, Kaufman, & Gentile, 2004; Hennessey & Amabile, 1988; Kaufman, Baer, & Gentile, 2004). The Consensual Assessment Technique (CAT) is based on the belief that future creativity is best
predicted by past creativity, and that the best measure of creativity is the combined assessment of experts in that field (Kaufman, Plucker, & Baer, 2008).

EVALUATING FILM QUALITY

Although no pure measure of a movie's creativity has been created, a significant amount of research has focused on identifying existing, proxy measures. Simonton (2002, 2004c) has made progress in defining and refining select criteria such as awards and reviews, but a much larger range of quantitative data, such as critics' and fan Web site users' ratings, is available for use by researchers.

A major implication of this work is the examination of critic and layperson ratings of movies. Marketing efforts are obviously most effective when they target characteristics of the creative product that consumers consider when they decide whether to purchase the product. If layperson ratings diverge sharply from those of critics, the use of professional film critic reviews in marketing efforts would appear to be less than ideal.1 And if data provide evidence that layperson ratings are similar to critics' evaluations under certain conditions (e.g., romantic comedies without major stars released mid-winter), marketing for specific films released under those same conditions can be tailored accordingly.

The research on critic and layperson agreement on film quality is relatively thin. Boor (1990, 1992) found evidence that inter-rater agreement for critics was high (correlations ranging from around 0.50 to 0.75), with a slightly lower range of correlations between critics and non-critics. Holbrook (1999) found a correlation of only 0.25 between critic and non-critic ratings. Yet somewhat surprisingly, Plucker, Holden, and Neustadter (2008) found evidence that professional critics' ratings did not differ significantly from those of laypeople.
But the possibility exists that the layperson raters in that study—i.e., users of popular Web sites devoted to movies—may not truly be laypeople. They could include, for example, ratings from people who work in the film industry, ratings from people who are so devoted to movies that their opinions and aesthetic tastes have begun to mirror those of critics, ratings from non-critics who use the Internet a great deal, or perhaps all of the above.

A similar controversy exists in other areas of creativity research. In the fine arts, the opinions of non-professional artists about their own work were more accurately reflected by their artistic peers than by the judgments of professional artists (but not, it should be noted, art critics) (Runco, McCarthy, & Svenson, 1994). In creative writing, Kaufman, Gentile, and Baer (2005) found evidence of a high correlation between expert ratings and the ratings of gifted novices. However, the same research team, studying expert and non-gifted novice judgments of creativity in poetry, found only a moderate degree of agreement between expert and non-expert judges, with lower ratings by the expert judges (Kaufman et al., 2008). Indeed, when the number of raters was adjusted, the non-expert raters also showed less agreement among themselves than the experts did.

1 Although we acknowledge that one use of critics' opinions in film marketing is to get people to buy tickets on opening weekend, regardless of whether laypeople are likely to agree with the critics, this strategy is only effective if laypeople believe that their assessments will be similar to those of critics. If not, both the short-term and long-term impacts of using critics' ratings in marketing efforts are questionable.
No solid consensus exists about the relationship among expert and non-expert evaluators of creative products, whether in the fine arts, creative writing, or movies. As a result, within the context of film, the degree to which critics' ratings are valid and influential measures of popular opinion about cinematic creativity has not been determined. The purpose of this paper is to extend the research of Plucker, Holden, and Neustadter (2008) by comparing the ratings of critics, self-proclaimed novices (who may have a greater interest in movies than most novices), and true novices (i.e., college students), who are major consumers of movies.

METHOD

Measures and Data Sources

The primary sample for this study included all films that played on at least 1,000 screens in the United States during the calendar years 2001–2005. This resulted in a sample of 680 films. Three types of rating sources were utilized.

First, ratings from professional critics were drawn from publicly available data on metacritic.com. Metacritic ratings are a normalized, weighted average of critics' reviews from 42 major publications (newspapers, weekly and monthly magazines, and Web sites), reported on a 0–100 scale. The Metacritic database includes ratings data on nearly every movie released in the United States since 1999.

Second, ratings from self-defined novices were collected from publicly available data on boxofficemojo.com and imdb.com. These sites were chosen based on the recommendation of Plucker, Holden, and Neustadter (2008), due primarily to the psychometric quality of the available data. We consider these ratings to come from "self-defined" novices because little is known about the users of each site who choose to submit their ratings: They could be a broad cross section of moviegoers, or they could be a highly self-selected sample, such as people working in the film industry itself.
Third, ratings for movies were collected from 169 students at a large public university in the southwestern United States. The university is mid-sized (approximately 14,000 students) and serves a student body that comprises a substantial proportion of nontraditional students. The mean age of the student body is 25.9 years. The student population is ethnically diverse (32.75% Hispanic, 11.74% African American, 4% Asian-Pacific Islander), and the university is recognized by the Department of Education as a Hispanic-serving institution.

Data Analyses

The data were analyzed in three steps. First, means and standard deviations were calculated for all four groups of raters, and analysis of variance was used to examine differences among the mean ratings of the four groups. Second, a correlation matrix for the four sets of ratings was created and examined. Third, the student ratings were calculated separately for students who saw a low, average, or high number of movies, and these separate student means were compared to those of the critics and self-defined novices.
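The three analysis steps above can be sketched as follows. This is an illustrative sketch only: the ratings below are synthetic random draws seeded from the group means and standard deviations later reported in Table 1, and all variable names are our own, not the study's actual dataset or code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_films = 680  # films shown on at least 1,000 U.S. screens, 2001-2005

# One array of per-film ratings per rater group, on a common 0-10 scale
# (synthetic normal draws; means/SDs taken from Table 1 for illustration)
groups = {
    "students":    rng.normal(6.49, 0.93, n_films),
    "critics":     rng.normal(5.01, 1.70, n_films),
    "novice_imdb": rng.normal(6.16, 1.41, n_films),
    "novice_bom":  rng.normal(6.09, 1.19, n_films),
}

# Step 1: descriptive statistics for each rater group
for name, x in groups.items():
    print(f"{name:12s} mean={x.mean():.2f} sd={x.std(ddof=1):.2f}")

# Step 2: F statistic for mean differences among the four groups.
# (The paper reports a Greenhouse-Geisser-adjusted repeated-measures
# ANOVA; a simpler between-groups one-way F is computed here.)
data = np.column_stack(list(groups.values()))   # shape (680, 4)
grand = data.mean()
ss_between = n_films * ((data.mean(axis=0) - grand) ** 2).sum()
ss_within = ((data - data.mean(axis=0)) ** 2).sum()
df_b, df_w = data.shape[1] - 1, data.size - data.shape[1]
F = (ss_between / df_b) / (ss_within / df_w)

# Step 3: Pearson correlation matrix for the four sets of ratings
corr = np.corrcoef(data, rowvar=False)          # shape (4, 4)
```

Because the synthetic draws are independent across groups, the off-diagonal correlations here hover near zero; with the real paired-by-film ratings, this same third step produces the correlations reported in Table 2.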
Table 1. Mean Movie Ratings by Type of Rater.

Rating Source    Mean    SD
Students         6.49    0.93
Critics          5.01    1.70
Novice 1         6.16    1.41
Novice 2         6.09    1.19

Note: Films rated on a scale of 1 to 10, 10 being highest. Critics = ratings from metacritic.com; Students = ratings from undergraduate sample; Novice 1 = ratings from imdb.com; Novice 2 = ratings from boxofficemojo.com.

Table 2. Correlations Between Movie Ratings of Students, Critics, and Novices.

Rating Source            Critics    Novice Critics (IMDB)    Novice Critics (BOM)
Students                 0.43**     0.65**                   0.65**
Critics                             0.72**                   0.77**
Novice critics (IMDB)                                        0.86**

**p < 0.01.

RESULTS

Descriptive statistics for the four groups of raters appear in Table 1. Ratings were converted to a 0–10 scale. The data included in Table 1 suggest that significant differences existed among the four groups' ratings, and a subsequent ANOVA [F(2.04) = 352.98, p < 0.001, partial eta squared = 0.40 (Greenhouse-Geisser adjusted)] provided evidence of this difference. The critics' ratings were significantly lower than those of the three other groups, by perhaps as much as a full pooled standard deviation.

However, the lack of mean differences among the student and two novice groups does not necessarily mean that students and novices rated movies similarly. For this reason, Pearson product–moment correlations were calculated for the complete set of ratings (Table 2). The correlation between critic and student ratings was moderate in magnitude but much lower than the correlations between critics and novices. Indeed, the magnitude of the correlations between critics and novices was at the high end of the range noted in previous research for agreement among critics.

Student Ratings by Number of Movies Seen

To investigate whether a "familiarity effect" influences the correlations among rating groups, we split the student ratings into three new groups: low, average, and high exposure to films.
The number of movies seen by the average group of student raters was between −1 and +1 standard deviations of the mean number of movies viewed by the sample. In the same vein, the number of films seen by the low group of student raters was more than one standard deviation below the mean; the converse was true for the high group of student raters.
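The low/average/high split defined above can be sketched in a few lines. The movie counts here are made-up Poisson draws purely for illustration (the study does not report the students' actual viewing distribution), and the variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)
movies_seen = rng.poisson(20, 169)  # 169 students; counts are hypothetical

mean, sd = movies_seen.mean(), movies_seen.std(ddof=1)
low = movies_seen < mean - sd        # more than 1 SD below the mean
high = movies_seen > mean + sd       # more than 1 SD above the mean
average = ~low & ~high               # within +/- 1 SD of the mean

print(f"low={low.sum()} average={average.sum()} high={high.sum()}")
```

The three boolean masks partition the sample, so each student's ratings contribute to exactly one of the three subgroup means compared in Tables 3 and 4.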
Table 3. Mean Movie Ratings of Students by Number of Movies Rated.

Rating Source       Mean    SD
Students low        7.30    1.63
Students average    6.69    0.93
Students high       6.36    1.04

Note: Films rated on a scale of 1 to 10, 10 being highest.

Table 4. Correlations Between Movie Ratings of Critics, Novices, and Students by Number of Movies Rated.

Rating Source       Critics    Novice Critics (IMDB)    Novice Critics (BOM)
Students low        0.22**     0.30**                   0.29**
Students average    0.39**     0.63**                   0.62**
Students high       0.44**     0.58**                   0.59**

Note: Students low = ratings from undergraduates who saw more than 1 SD fewer movies than the mean; Students average = ratings from undergraduates who saw within ±1 SD of the mean number of movies; Students high = ratings from undergraduates who saw more than 1 SD more movies than the mean. **p < 0.01.

The means and standard deviations for the three groups are presented in Table 3. An analysis of variance provided evidence that the scores were significantly different [F(1.32) = 103.94, p < 0.001, partial eta squared = 0.19 (Greenhouse-Geisser adjusted)], with the low group of students giving significantly higher ratings (and showing greater variability within those ratings). Ratings for the average and high student groups were roughly a standard deviation lower than those of the low student group but still higher than those of the novices and critics (see Table 1).

Table 4 includes correlations between the three student groups' ratings and the critic and novice ratings used earlier. The low group of students had low correlations with all three non-student groups, yet the average and high groups had moderate correlations with the critics' and novices' ratings. Interestingly, the correlations for the average and high groups of students were very similar to those of the entire student group included in Table 2.
DISCUSSION

The results of this study indicate that the novice ratings used in studies such as Plucker, Holden, and Neustadter (2008) are indeed different from those of laypeople—but they are also different from the ratings of professional critics. In general, critics give lower ratings than non-critics (i.e., novices submitting ratings to film Web sites), who in turn give lower ratings than college students. College students who see relatively few movies appear to give especially high ratings and, predictably, show greater variation in their ratings compared to the other groups.
We find it especially interesting that the film Web site users appear to be different from both critics and college students. In a sense, the Web site users are firmly in between the two other groups: Their mean ratings roughly split the difference between the mean ratings of the other two groups, and their ratings correlate with critics at a magnitude (r > 0.70) that is at the high end of how critics tend to correlate with each other (see Boor, 1990, 1992). In this sense, they are truly "novice critics"—not quite laypeople, but not quite professional critics, either.

We would argue that the Web site novice critics are akin to the gifted novices studied by Kaufman, Gentile, and Baer (2005), as opposed to true novices. Indeed, most of the users of imdb.com (for example) have much more experience with and interest in movies than a typical novice. Many aspiring actors become members of the site hoping to network with industry professionals. It is impossible to determine how many of the Web site novice ratings were actually given by experts.

Given these findings and the past research by Kaufman, Baer, Runco, and others, it seems that there truly is some type of difference between novice and expert ratings, but the key is determining who is genuinely an expert and who is genuinely a novice. "True" novices—college students who are very unlikely to work in the film industry—were not only significantly different from professional critics (a clear example of "expert" raters) but were also different from supposed novices from movie Web sites. We wonder whether it may be beneficial to approach the novice–expert Consensual Assessment Technique ratings question as a continuum. Perhaps instead of strict categories such as "novice" and "expert," we need more labels that reflect the vast diversity of experience that can be found in nearly all domains.
Future research on the marketing of movies should recognize the continuum of evaluation expertise that exists in the available film data.

One model of creativity that includes a developmental trajectory is Beghetto and Kaufman's (2007) Four-C model. This model traces creativity from the personal discoveries of mini-c to the everyday innovations of little-c to the professional accomplishments of Pro-c to the eminent genius of Big-C. A similar arc of expertise categories may be needed. Just as Beghetto and Kaufman argue that creativity is too diffuse to be constrained to the little-c and Big-C labels, so too may expertise be unable to be pigeonholed into "novices" and "experts."

Viewing expertise—or, perhaps more accurately, experience evaluating the quality of creative products—on a continuum has other advantages. For example, Csikszentmihalyi (1988), in his popular model of creativity, discusses the role of creative gatekeepers who help decide which products within a specific field are to be judged as creative. He allows for contexts in which the stature of the gatekeepers may make it easier or harder for a creative product to be accepted, and the results of the present study suggest that the role of "gatekeeper" be viewed broadly, at least in the case of film.

For example, the continuum implies that a gatekeeper for one person may be a well-known critic; for another, novice critics on the most popular film Web sites; for yet another, a next-door neighbor or best friend. The continuum perspective also suggests that gatekeepers may occupy hybrid positions between categories, as in the case of a novice critic who starts her or his own movie blog, then adds a weekly review column on a film Web site, then begins to review occasionally for a local paper, et cetera.
This person is clearly moving along the continuum, primarily in the gray areas that separate laypeople from novice critics
from professional critics. We anticipate this sort of movement to occur increasingly with the explosion in avenues for expression provided by information technology. Indeed, with the new avenues for communication provided by technological advances, these transitory experts may be the "tastemakers" or, as Gladwell (2000) describes them, the Mavens, Connectors, and Salespeople whom marketers seek to reach.

Limitations

This study has a number of important limitations. First and foremost, the sample chosen to represent laypeople consisted of a very specific student population that is probably more diverse than typical college student samples. At the same time, research with college students often involves samples lacking in diversity, so the present sample is not a major weakness—but it does reinforce the need to replicate these results with different student and layperson (i.e., non-student) populations.

Second, although we collected novice critic ratings based on recommendations from previous research, other sites and data collection strategies may produce different patterns of results.

Third, data were collected at one point in time, at least 1–2 years after each film was released in theaters. For this reason, we could not control for when a movie was viewed (for all subsamples) or when a rating was published or released (for the professional and novice critics). Future research should consider the impact of time on the nature of movie ratings.

REFERENCES

Amabile, T. M. (1983). The social psychology of creativity. New York: Springer-Verlag.

Baer, J., Kaufman, J. C., & Gentile, C. A. (2004). Extension of the consensual assessment technique to nonparallel creative products. Creativity Research Journal, 16, 113–117.

Beghetto, R. A., & Kaufman, J. C. (2007). Toward a broader conception of creativity: A case for "mini-c" creativity. Psychology of Aesthetics, Creativity, and the Arts, 1, 73–79.

Boor, M. (1990).
Reliability of ratings of movies by professional movie critics. Psychological Reports, 67, 243–257.

Boor, M. (1992). Relationships among ratings of motion pictures by viewers and six professional movie critics. Psychological Reports, 70, 1011–1021.

Csikszentmihalyi, M. (1988). Society, culture, and person: A systems view of creativity. In R. J. Sternberg (Ed.), The nature of creativity: Contemporary psychological perspectives (pp. 325–339). New York: Cambridge University Press.

Csikszentmihalyi, M. (1999). Implications of a systems perspective for the study of creativity. In R. J. Sternberg (Ed.), Handbook of creativity (pp. 313–335). New York: Cambridge University Press.

Delmestri, G., Montanari, F., & Usai, A. (2005). Reputation and strength of ties in predicting commercial success and artistic merit of independents in the Italian feature film industry. Journal of Management Studies, 42, 975–1002.

Gladwell, M. (2000). The tipping point: How little things can make a big difference. Boston: Little, Brown.

Hennessey, B. A., & Amabile, T. M. (1988). The conditions of creativity. In R. J. Sternberg (Ed.), The nature of creativity: Contemporary psychological perspectives (pp. 11–38). New York: Cambridge University Press.

Holbrook, M. B. (1999). Popular appeal versus expert judgments of motion pictures. Journal of Consumer Research, 26, 144–155.
Kaufman, J. C., Baer, J., Cole, J. C., & Sexton, J. D. (2008). A comparison of expert and nonexpert raters using the consensual assessment technique. Creativity Research Journal, 20, 171–178.

Kaufman, J. C., Baer, J., & Gentile, C. A. (2004). Differences in gender and ethnicity as measured by ratings of three writing tasks. Journal of Creative Behavior, 39, 56–69.

Kaufman, J. C., Gentile, C. A., & Baer, J. (2005). Do gifted student writers and creative writing experts rate creativity the same way? Gifted Child Quarterly, 49, 260–265.

Kaufman, J. C., Plucker, J. A., & Baer, J. (2008). Essentials of creativity assessment. New York: Wiley.

MacKinnon, D. W. (1978). In search of human effectiveness: Identifying and developing creativity. Buffalo, NY: The Creative Education Foundation.

Plucker, J. A., & Renzulli, J. S. (1999). Psychometric approaches to the study of human creativity. In R. J. Sternberg (Ed.), Handbook of creativity (pp. 35–61). New York: Cambridge University Press.

Plucker, J. A., Holden, J., & Neustadter, D. (2008). The criterion problem and creativity in film: Psychometric characteristics of various measures. Psychology of Aesthetics, Creativity, and the Arts, 2, 190–196.

Runco, M. A., McCarthy, K. A., & Svenson, E. (1994). Judgments of the creativity of artwork from students and professional artists. Journal of Psychology, 128, 23–31.

Simonton, D. K. (2002). Collaborative aesthetics in the feature film: Cinematic components predicting the differential impact of 2,323 Oscar-nominated movies. Empirical Studies of the Arts, 20, 115–125.

Simonton, D. K. (2004a). Group artistic creativity: Creative clusters and cinematic success in feature films. Journal of Applied Social Psychology, 34, 1494–1520.

Simonton, D. K. (2004b). The "Best Actress" paradox: Outstanding feature films versus exceptional women's performances. Sex Roles, 50, 781–794.

Simonton, D. K. (2004c).
Film awards as indicators of cinematic creativity and achievement: A quantitative comparison of the Oscars and six alternatives. Creativity Research Journal, 16, 163–172.

Simonton, D. K. (2005a). Cinematic creativity and production budgets: Does money make the movie? Journal of Creative Behavior, 39, 1–15.

Simonton, D. K. (2005b). Film as art versus film as business: Differential correlates of screenplay characteristics. Empirical Studies of the Arts, 23, 93–117.

Simonton, D. K. (2006). Cinematic creativity and aesthetics: Empirical analyses of movie awards. In P. Locher, C. Martindale, & L. Dorfman (Eds.), New directions in aesthetics, creativity and the arts: Foundations and frontiers in aesthetics (pp. 123–136). Amityville, NY: Baywood.

Simonton, D. K. (2007a). Cinema composers: Career trajectories for creative productivity in film music. Psychology of Aesthetics, Creativity, and the Arts, 1, 160–169.

Simonton, D. K. (2007b). Film music: Are award-winning scores and songs heard in successful motion pictures? Psychology of Aesthetics, Creativity, and the Arts, 1, 53–60.

Simonton, D. K. (2007c). Is bad art the opposite of good art? Positive versus negative cinematic assessments of 877 feature films. Empirical Studies of the Arts, 25, 143–161.

Zickar, M. J., & Slaughter, J. E. (1999). Examining creative performance over time using hierarchical linear modeling: An illustration using film directors. Human Performance, 12, 211–230.

The authors appreciate the assistance of Gary Zhou, Haiying Long, and Minjing Duan, all of whom helped with gathering extant data and constructing the final database for the study.

Correspondence regarding this article should be sent to: Jonathan A. Plucker, Center for Evaluation and Education Policy, 1900 E. 10th Street, Bloomington, IN 47406 (jplucker@indiana.edu).