How Scholarly Is Google Scholar? A Comparison to Library Databases
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
How Scholarly Is Google Scholar? A Comparison to Library Databases Jared L. Howland, Thomas C. Wright, Rebecca A. Boughan, and Brian C. Roberts Google Scholar was released as a beta product in November of 2004. Since then, Google Scholar has been scrutinized and questioned by many in academia and the library field. Our objectives in undertaking this study were to determine how scholarly Google Scholar is in comparison with traditional library resources and to determine if the scholarliness of materials found in Google Scholar varies across disciplines. We found that Google Scholar is, on average, 17.6 percent more scholarly than materials found only in library databases and that there is no statistically significant difference between the scholarliness of materials found in Google Scholar across disciplines. Originally presented June 30, 2008, at the as a beta version, has not only become a American Library Association’s Annual common fixture in library literature but is Conference in Anaheim, California. also becoming ubiquitous in information- seeking behavior of users. Google Scholar oogle Scholar was introduced was initially met with curiosity and skep- to the world in November of ticism.1 This was followed by a period of 2004 as a beta product. It has systematic study.2 More recently, there has been embraced by students, been optimism about Google Scholar’s scholars, and librarians alike. However, potential to move us toward Kilgour’s Google Scholar has received criticism goal of 100 percent availability of infor- regarding the breadth and scope of mation.3 Librarians now find themselves available content. We undertook this acknowledging users’ preferences for study to answer two questions regarding one-stop information shopping by giving these common criticisms: (1) Are Google Google Scholar ever-increasing visibility Scholar result sets more or less scholarly on their Web pages.4 Even as librarians than licensed library database result sets? begin to promote Google Scholar, the and (2) Does the scholarliness of Google debate continues within the information Scholar vary across disciplines? community as to the advisability of guid- ing users to this tool. The view of critics Literature Review like Péter Jascó, who use terms such as Google Scholar, which is still branded “shallowness” and “artificial unintel- Jared L. Howland is Electronic Resources Librarian, Thomas C. Wright is Collection Development Coordina- tor, Rebecca A. Boughan is Electronic Resources Assistant, and Brian C. Roberts is Process Improvement Specialist, all at Brigham Young University. E-mail: jared_howland@byu.edu, tom_wright@byu.edu, rebecca_boughan@byu.edu, brian_roberts@byu.edu. 227
228 College & Research Libraries May 2009 ligence” to describe the program,5 seems be to include each citation in a scholarly to be giving way to a landscape where research paper at an academic institution. respected publishers (like Cambridge) This notion of scholarliness, that we and platforms (for instance, JSTOR) are attempted to encapsulate in the rubric, now offering links out to Google Scholar uses a common collection-assessment tool for more citations. as outlined by Kapoun.9 This model con- Early studies of Google Scholar tried to siders many factors, including accuracy, match citations “hit to hit” in comparison authority, objectivity, currency, and cov- with traditional library databases. Jascó erage. For the purposes of the study, we even provided a Web site where the curi- added relevancy because materials that ous could compare search results between met those five criteria were not always Google Scholar and the likes of Nature, relevant to the research topic. Wiley, or Blackwell.6 More recently, stud- The Kapoun model of evaluating ies have appeared that track the “value- scholarliness was based on his experience added” open-access citations that appear in evaluating print resources but was uniquely in Google Scholar versus other expanded for evaluating Web resources. sources.7 However, is comprehensive- Because this study was constructed ness of content the primary indicator of to compare library database results to a resource’s usefulness? Google Scholar results, we had no way Every title from every database may of knowing the breadth of materials the not be in Google Scholar, but that should rubric would be required to evaluate. not be an indictment of Google Scholar’s Google Scholar alone references materi- inability to return scholarly results across als in any format whether it is in print disciplines. The algorithms Google Schol- or electronic only and includes journals, ar uses to return result sets cannot really books, syllabi, and conference proceed- be compared to library database algo- ings. These are just a few examples of the rithms. However, what is returned can be disparate types of materials the rubric judged for its relevancy and scholarliness. would need to handle. By using a model Up to this point, studies of Google Scholar flexible enough to evaluate materials in have followed the example of Neuhaus any format, we have attempted to inject a et al., which compared Google Scholar qualitative value of Google Scholar results content to forty-seven other databases.8 to the ongoing debate. This title-by-title and citation-by-citation comparison is a pure numerical measure, Methodology but it neglects to address the efficacy of We selected seven subject librarians any particular search or the scholarly from Brigham Young University to cover nature of content or algorithms in discov- various academic disciplines: humani- ering that content. ties, sciences, and social sciences. Each We felt that a different approach was specialist was blind to the purpose of the needed. Rather than measuring what has study. We requested that they provide us gone into the database, we have sought, to (1) a sample question that they typically some degree, to evaluate what comes out receive from students, (2) a structured as a result of search queries. We have done query to search a library database, and this by involving subject librarians with (3) the library database they would use knowledge of typical reference questions for that particular query. and using those questions to query both We then used their data in two differ- Google Scholar and discipline-specific ent ways. First, we translated the library databases. We then asked the same librar- database query into an equivalent search ians to judge the search results using a ru- string used by Google Scholar. Using the bric of scholarliness. In short, we wanted original query and the query translated to to determine how appropriate it would work with Google Scholar, we searched
How Scholarly Is Google Scholar? 229 Table 1 academic Representation in This Study academic Database Query GS Query library Database Discipline Science (ACL or “anterior cruci- ACL OR “anterior cruci- SportDiscus ate ligament*”) and in- ate ~ligament” ~in- jur* and (athlet* or sport jury ~athlete OR sport or sports) and (therap* ~therapy OR ~treatment or treat* or rehab*) OR ~rehabilitation Science lung cancer and (etiol* lung cancer ~etiology Medline or caus*) and (cigarette* OR ~cause ~cigarette OR or smok* or nicotine*) ~smoking OR ~nicotine Science “dark matter” and “dark matter” evidence Applied Science evidence and Technology Abstracts Social (“fast food” or mcdon- “fast food” OR mc- Business Source Science ald’s or wendy’s or donald’s OR wendy’s Premier “burger king” or restau- OR “burger king” OR rant) and franchis* and restaurant ~franchise (knowledge n3 transfer “knowledge transfer” OR or “knowledge manage- “knowledge management” ment” or train*) OR ~train Social (“standardized test*” or “standardized ~test” PsycINFO Science “high stakes test*”) and OR “high stakes ~test” (“learning disabilit*” “learning ~disability” OR or dyslexia or “learning dyslexia OR “learning problem”) and accom- problem” ~accommoda- modat* tion Humanities (bilingual* or L2) and ~bilingual OR L2 ~child Linguistics and (child* or toddler) and OR toddler “cognitive Language Behavior “cognitive development” development” Abstracts Humanities (memor* or remem- ~memor OR remembrance JSTOR brance or memoir*) and OR ~memoir holocaust (holocaust) and (Spiegel- Spiegelman OR Maus man or Maus) both the library database and Google in the library database. This allowed us to Scholar and retrieved the citations and calculate the overlap of citations between full text for the first thirty results. We the library databases and Google Scholar. selected thirty results because research We standardized the formatting of the has shown that less than one percent of citations and inserted them randomly into all users ever go beyond a third page of a spreadsheet, which contained a rubric results and most search engines return that was used to assign a scholarliness about ten results per page.10 score to each of the citations. The rubric Next, we took the citations from the contained six criteria, based on Kapoun’s library databases and determined if they model of evaluating resources, to judge could also be found using Google Scholar scholarliness: (1) accuracy, (2) authority, and took the citations from Google (3) objectivity, (4) currency, (5) coverage, Scholar to see if they could also be found and (6) relevancy.11 These criteria were
230 College & Research Libraries May 2009 Table 2 Rubric for Grading Scholarliness 1 = Below Average Quality; 2 = Average Quality; 3 = Above Average Quality Citation References accuracy authority Objectivity Currency Coverage Relevancy Number 1 Barnes, J.E., & 123 123 123 123 123 123 Hernquist, L.E. (1993) Computer models of col- liding galaxies. Physics Today, 46, 54–61. 2 Bergstrom, L. 123 123 123 123 123 123 (2000) Nonbary- onic dark matter: Observational evidence and de- tection methods. Reports on Prog- ress in Physics, 63(5), 793–841. graded on a scale of 1 (below average) to E = Effect due to “exclusivity” (i = 1, 2, 3) 3 (above average) and summed to create a L = Effect due to librarian (j = 1, 2, …7) total scholarliness score for each citation. EL = Interaction between “exclusivity” We provided the subject librarians with and librarian the full text of each of the citations and ε = Error term (k = degrees of freedom asked them to use the rubric to evaluate associated with the error term) the scholarliness of the individual cita- Within the context of this formula, tions. After the grading was completed, E controls for any effect due to the we were able to group each citation from “exclusivity” of the citation, where i the subject librarian into one of three cat- represents each of the three categories egories: (1) the citation was available only of “exclusivity” (that is, the citation in the library database, (2) the citation was was found only in the database, it was available only in Google Scholar, or (3) the found only in Google Scholar, or it was citation was available in both the library found in both the database and Google database and Google Scholar. We have Scholar). Each librarian (L) also played a used the term “exclusivity” to describe role in the total scholarliness score. One the three categories. librarian could have provided consis- Once we had grouped the citations by tently low scores, with another having category, we ran a statistical analysis that a tendency toward higher scores. To controlled for the effect of the individual account for this disparity, each librar- librarian on the total scholarliness score, ian was treated as a factor in the total for the effect of “exclusivity” and for any scholarliness score, where j represents interaction there may have been between each of the seven participants. In short, both librarian and “exclusivity”: this formula allowed us to calculate a measure of scholarliness while account- total scholarliness score = µ + Ei + Lj + ELij + εijk ing for differences in the location of Where: citations as well as differences between µ = Average total score librarians.
How Scholarly Is Google Scholar? 231 Table 3 Scholarliness based on “exclusivity” (Maximum Scholarliness Score Is 18) Participant Found Only Found Only Percent Change in Found in both in Database in GS average Scholarliness Score between average Score average Score the Database and GS Score 1 11.7 16.1 36.8% 13.5 2 13.2 13.8 4.5% 14.6 3 N/A 12.0 N/A 15.6 4 10.0 13.5 35.0% 14.3 5 10.0 11.6 16.0% 11.5 6 11.7 12.8 8.5% 14.3 7 16.5 14.4 –12.7% 13.9 least Squares 11.9 14.0 17.6% 14.2 Mean Results and licensed library databases had a The mean scholarliness score of citations higher average score than citations found found only in Google Scholar was 17.6 only in one or the other. Finally, there percent higher than the score for citations was no statistically significant difference found only in licensed library databases. found between the scholarliness score In fact, across all but one of the tested across disciplines within Google Scholar. disciplines, citations found only in Google Searching for either a humanities topic or Scholar had a higher average scholarliness a science topic yielded no difference in the score than citations found only in licensed scholarliness score of citations discovered library databases. The one discipline with in Google Scholar. a lower score, however, had only two unique citations in the library database, Discussion of Results so the exact significance of the scores for It is interesting to note that there was very that discipline is imprecise. Additionally, little overlap between the initial thirty the citations found in both Google Scholar citations returned by the databases and the initial thirty citations returned by Table 4 Google Scholar. In fact, only one query Overlap of Citations of the seven had any overlapping cita- tions between Google Scholar and the Participant Percent of Percent of database—an overlap of five citations Database GS Citations from JSTOR that appeared within the first Citations in Database thirty results in Google Scholar. in GS However, during the second phase of 1 76.7% 0.0% the study, when we began to search for 2 83.3% 43.3% specific citations, we found that Google Scholar actually contained 76 percent of all 3 100.0% 96.7% the citations found in the library databases, 4 96.7% 80.0% while the library databases contained only 5 93.3% 28.0% 47 percent of the citations found in Google 6 0.0% 46.7% Scholar. Despite the initial lack of overlap in the search results, it was clear that 7 81.8% 34.5% Google Scholar included a large portion of Average 76.0% 47.0% the citations available in library databases.
232 College & Research Libraries May 2009 This seems to validate the decision evancy search options seems to indicate of many students to use Google first to that Google Scholar got it right in the first look for information. If Google Scholar place. It appears that Google Scholar has contains much of the content available in done a better job of both precision and library databases, why shouldn’t students recall than library databases have. begin where the most content exists? The Many studies have compared content argument is made that Google Scholar in library databases to content in Google will return millions of hits, many of Scholar and found inconsistencies. The which are spurious at best, while a library purpose of both search systems, however, database will only return a few thousand is to discover relevant, scholarly content. results that are more focused to the query. Using our scholarliness model, we found However, the power of ordering results that, across disciplines, Google Scholar by relevancy, combined with the fact that is generally superior to individual data- very few people ever go beyond the third bases in retrieving appropriate citations. page of results, creates a searcher-imposed As more publishers share their content higher level of precision for any search with Google Scholar, we would expect the engine. This is particularly true of Google effectiveness of a Google Scholar search Scholar, where the most relevant and more to increase. scholarly material floats to the top of the list, while the less precise material falls Future Studies to the bottom, where it is rarely seen. Hit The statistical results from this study can counts are of secondary importance in a be extrapolated only to the specific topics Google Scholar search; the key to Google and subject librarians that were involved Scholar’s success is relevancy ranking and in the study. A more comprehensive sta- a large universe of information. tistical methodology would need to be A database is limited to its defined constructed to make the results generally title list of content, whereas Google applicable. However, our results were Scholar, by its very nature, is open to a compelling enough to make us believe that much broader set of content that aids the the results would hold up to more strenu- researcher. Business Source Premier, one ous tests. Additionally, the rubric we used of the library databases, was the only in our study was only a three-point Likert library database where we found more scale. Finding statistically significant dif- Google Scholar citations in the database ferences would have been easier had we than database citations in Google Scholar. selected a seven or more point Likert scale. However, even in this one instance, the Additionally, our analysis used a vet- scholarly score for citations found only ted approach to evaluating scholarliness in Google Scholar was higher than the of resources. A more objective view of score for citations found only in the data- scholarliness could be obtained by using base. The citations found in both Google some variation of citation analysis (such Scholar and the database received even as citation counts or ISI impact factor). We higher scores, and these citations were started to do such an analysis but decided only exposed through the first thirty hits there were too many trade-offs to be ap- in Google Scholar. This seems to indicate propriate, given the methodology we that, even when Google Scholar is return- used for this study. For example, citation ing fewer titles, as in this case with Busi- counts are difficult to come by for materi- ness Source Premier, it still returns cita- als other than journal articles, and impact tions that are more scholarly to the top. factors are calculated for journals only Up to this point, many library data- and not for specific articles.12 Alternate bases have defaulted to sorting by date methodologies might be able to overcome rather than by relevancy. The fact that or account for the shortcomings of using many databases are now adding rel- citation analysis to judge scholarliness.
How Scholarly Is Google Scholar? 233 Our study used skilled librarians to discovering even more scholarly materi- create search queries and to judge the als than what is currently discovered in quality of the citations retrieved. Unlike Google Scholar. Comparing Google Schol- most students, the librarians used com- ar to a future system that has completely plex search queries to find more relevant indexed all local content and content results. Students would be more likely available to libraries but provided by third to use natural language queries to find parties would be the ultimate comparison. citations. Complex search queries could return very different results from natural Conclusion language queries. Future studies will Typical arguments against Google Scholar need to address the potential differences focus on citation counts and point to in- to find out if the results we found hold consistent coverage between disciplines. across different types of searches. We felt the more appropriate analysis Finally, future studies need to look at was to compare the scholarliness of re- the appropriateness of comparing Google sources discovered using Google Scholar Scholar to individual library databases. with resources found in library data- It is probable that federated searching is bases. This analysis showed that Google more comparable to Google Scholar than Scholar yielded more scholarly content are individual library databases. Howev- than library databases, with no statisti- er, how users and librarians select which cally significant difference in scholarliness resources to use in a federated search and across disciplines. Despite these findings, how the federated search engine returns Google Scholar is not in competition with the results would still impact the discov- library databases. In truth, without the erability of scholarly resources. Some cooperation of database vendors and pub- studies have already started down this lishers, Google Scholar would not exist as road,13 but Google Scholar result sets have it does today. Google Scholar is simply a still not been carefully compared to result discovery tool for finding scholarly infor- sets from federated search products. mation, while databases still perform the Libraries have begun to build local function of providing access to the content Google Scholars, using tools such as Primo unearthed by a Google Scholar search. (Ex Libris), AquaBrowser (Medialab Solu- The enhanced discoverability of informa- tions), and Encore (Innovative Interfaces), tion in Google Scholar makes it a great that have the potential to aid users in tool for librarians as well as library users. Notes 1. Jan Brophy and David Bawden, “Is Google Enough? Comparison of an Internet Search Engine with Academic Library Resources,” Aslib Proceedings 57, no. 6 (2005): 498–512; Susan Gardner and Susanna Eng, “Gaga Over Google? Scholar in the Social Sciences,” Library Hi Tech News 22, no. 8 (2005): 42–45; Martin Kesselman and Sarah Barbara Watstein, “Google Scholar and Libraries: Point/Counterpoint,” Reference Services Review 33, no. 4 (2005): 380–87. 2. Nisa Bakkalbasi et al., “Three Options for Citation Tracking: Google Scholar, Scopus and Web of Science,” Biomedical Digital Libraries 3, no. 7 (2006), available online at www.bio-diglib. com/content/3/1/7 [accessed 24 March 2008]; Kayvan Kousha and Mike Thelwall, “Google Scholar and Google Web/URL Citations: A Multi-Discipline Exploratory Analysis,” Journal of the American Society for Information Science and Technology 58, no. 7 (2007): 1055–65; Mary L. Robinson and Judith Wusteman, “Putting Google Scholar to the Test: A Preliminary Study,” Program: Electronic Library & Information Systems 41, no. 1 (2007): 71–80; Chris Neuhaus et al., “The Depth and Breadth of Google Scholar: An Empirical Study,” Portal: Libraries and the Academy 6, no. 2 (2006): 127–41. 3. Jeffrey Pomerantz, “Google Scholar and 100 Percent Availability of Information,” Informa- tion Technology and Libraries 25 (June 2006): 52–56. 4. Laura Bowering Mullen and Karen A. Hartman, “Google Scholar and the Library Web Site: The Early Response by ARL Libraries,” College and Research Libraries 67, no. 2 (2006): 106–22. 5. Péter Jascó, “Deflated, Inflated and Phantom Citation Counts,” Online Information Review
234 College & Research Libraries May 2009 30, no. 3 (2006): 297–309; Péter Jascó, “As We May Search: Comparison of Major Features of the Web of Science, Scopus, and Google Scholar Citation-Based and Citation-Enhanced Databases,” Current Science 89, no. 9 (2005): 1537–47; Péter Jascó, “Google Scholar: The Pros and the Cons,” Online Information Review 29, no. 2 (2005): 208–14. 6. Péter Jascó, “Side-by-side2: Native Search Engines vs Google Scholar” (2005). Available online at www2.hawaii.edu/~jacso/scholarly/side-by-side2.htm. [Accessed 24 March 2008]. 7. Kayvan Kousha and Mike Thelwall, “How Is Science Cited on the Web? A Classification of Google Unique Web Citations,” Journal of the American Society for Information Science and Technology 58, no. 11 (2007): 1631–44. 8. Neuhaus et al., “Depth and Breadth of Google Scholar.” 9. Jim Kapoun, “Teaching Undergrads Web Evaluation: A Guide for Library Instruction,” College and Research Libraries News 59, no. 2 (1998): 522–23. 10. Jakob Nielsen and Hoa Loranger, Prioritizing Web Usability (Berkeley, Calif.: New Riders, 2006). 11. Kapoun, “Teaching Undergrads Web Evaluation.” 12. Per O. Seglen, “Why the Impact Factor of Journals Should Not Be Used for Evaluating Research,” BMJ 314, no. 7079 (1997): 498–502. 13. Xiaotian Chen, “Metalib, Webfeat, and Google: The Strengths and Weaknesses of Federated Search Engines Compared With Google,” Online Information Review 30, no. 4 (2006): 413–27. 50th Annual RBMS Preconference Seas of Change: Navigating the Cultural and Institutional Contexts of Special Collections June 17-20, 2009 | Charlottesville, Virginia With major sponsorship from the University of Virginia Library and the Antiquarian Booksellers Association of America As the 50th anniversary of the RBMS preconference, this year represents an important moment in the history of RBMS and its affiliated professions. In addition to celebrating our achievements, we will also look broadly at how special collections librarianship has evolved over the past half century with respect to changes in social, cultural, technological, economic, and academic environments, and – more importantly – how we will need to respond to such changes in the future. A variety of workshops, seminars, short papers, discussion topics, tours, receptions and a booksellers’ showcase will complement the main program. For complete schedule, registration, and housing information, please visit: http://rbms.info Image of the H.M.S. Beagle from an original watercolor by Conrad Martens in the PaulVictorius Evolution Collection, courtesy of Special Collections, University of Virginia Library
You can also read