A Comparison of Two Desktop Search Engines: Google Desktop Search (Beta) vs. Windows XP Search Companion

Paula A. Farina

ABSTRACT
The beta version of “Google Desktop Search” (GDS) made its public debut in October 2004, touted by the folks at Google as being “how our brains would work if we had photographic memories” [3]. Similar to the way Google searches the web, GDS searches your computer’s personal index and digs up files you may have forgotten you even had. This paper reveals the strengths and weaknesses of GDS and compares its performance to Windows XP’s built-in “Search Companion” (WSC). Though GDS performs searches with lightning-fast speed, it does so only for a limited selection of file types because it is still in its beta stage. WSC, on the other hand, can search for any file type, but it tends to be slower because it searches in more locations than GDS does. This paper discusses the differences in performance between the two desktop search engines.

Keywords
Search engine, desktop search, precision, indexing service, Google Desktop Search, Windows Search Companion.

1. INTRODUCTION
For years, I have been what I call a “virtual pack rat”: a computer owner with a habit of hoarding files on my PC in random folders and on whatever drive space I can find. Saving a file is far less worrisome than emptying the recycling bin. I am a frequent user of Windows XP’s “search” function (a.k.a. Search Companion), which I find slow and cumbersome. For the longest time, I have wished that I could just “google” my hard drive for instantaneous results. I guess I was not the only one wishing for this, as Google released the beta of its “Google Desktop” search engine on October 14, 2004 [4].

Hard drives are getting bigger and bigger, enabling those of us who already exhibit pack rat tendencies to indulge in this habit even more. Thus, there is an increasingly urgent need for a better search tool to scour our computers and find that obscure file or email that we might otherwise never find. Over the years, a few companies have met this call and released an assortment of desktop search products, some available for free download and others available for a fee. The popularity of Google’s online search engine will no doubt catapult its desktop search engine ahead of the other free desktop engines on the market. People loyal to Googling the web will inevitably want to try Google Desktop. Though in its beta stage, Google Desktop marks an important point in the evolution of the relationship between man and computer, promising to make man the victor in the struggle against misplaced electronic files.

Google Desktop Search (Beta) is available for free download on Google’s website [3]. Other popular web search engines have come out of the woodwork in recent months as well with free desktop search products, including Yahoo Desktop Search (Beta) and MSN Desktop Search [1], and a few have been around for some time now, including HotBot [1] and Copernic [4]. Google stands to be the biggest contender due to the overwhelming popularity of its online search engine. Google has legions of users who will find the look of its desktop search browser (almost identical to its web search browser) familiar and easy to use [4].

While each of these products promises to be better than any other out there, few have compared any of them to the desktop search function that has been a part of the Windows operating system for years, right under users’ noses. Indeed, Microsoft has included a search function in the start menu of each of its operating system releases since Windows 95. Windows XP’s Search Companion (WSC) has a “Windows Indexing Service” that may be switched on or off. Without indexing, WSC grinds away slowly through each file, searching word by word for the intended target. With indexing switched on, searches are notably faster. Narrow down the folders that are searched, excluding hidden files and folders and system folders, and the search is that much faster.

This paper aims to determine which is the overall superior desktop search mechanism in terms of performance: Google Desktop Search or Windows XP’s Search Companion. Due to limited time and resources, no other products will be examined.

For the purposes of this study, performance is defined as an amalgamation of the following characteristics:
•   Search time: the lower, the better;
•   Number of total hits: the more, the better;
•   Hit rate: the higher, the better;
•   Precision of hits: the higher, the better.

Thus, this paper tests the following series of null hypotheses:
1) There is no statistically significant difference in search time between GDS and WSC when conducting a single-word search on file contents.
2) There is no statistically significant difference in total number of hits between GDS and WSC when conducting a single-word search on file contents.
3) There is no statistically significant difference in hit rate between GDS and WSC when conducting a single-word search on file contents.
4) There is no statistically significant difference in precision of search results between GDS and WSC when conducting a single-word search on file contents.

                                                           21st Computer Science Seminar
                                                                    SA3-T3-1
5) There is no statistically significant difference in search time between GDS and WSC when conducting a multiple-word (phrase) search on file contents.
6) There is no statistically significant difference in total number of hits between GDS and WSC when conducting a multiple-word (phrase) search on file contents.
7) There is no statistically significant difference in hit rate between GDS and WSC when conducting a multiple-word (phrase) search on file contents.
8) There is no statistically significant difference in precision of search results between GDS and WSC when conducting a multiple-word (phrase) search on file contents.
9) There is no statistically significant difference in search time between GDS and WSC when conducting a search by file name and type.
10) There is no statistically significant difference in total number of hits between GDS and WSC when conducting a search by file name and type.
11) There is no statistically significant difference in hit rate between GDS and WSC when conducting a search by file name and type.
12) There is no statistically significant difference in precision of search results between GDS and WSC when conducting a search by file name and type.

In addition, this paper discusses the differences in interface usability between GDS and WSC, which on a subjective level might affect performance.

2. PROCEDURE
Difficulties presented themselves almost immediately in trying to compare GDS and WSC. Because GDS is in its beta stage, it lacks scope. The current version can search for the following file types: MS Word documents (.doc), MS Excel documents (.xls), MS PowerPoint documents (.ppt), Outlook / Outlook Express emails, Web sites (Web history), text documents (.txt), and AOL chats. GDS only looks in folders where a user typically stores documents, i.e., it does not look in system folders [3]. WSC, on the other hand, is too broad in scope and tends to search every single folder on the hard drive, unless told otherwise. Nothing is off-limits to WSC (though it does not seem to search through emails), and this lack of focus is what slows it down, even with the indexing service enabled. While much can be said about the benefits of a comprehensive search, one that is too comprehensive will dig up obscure system files that no average user would understand, much less find useful.

In light of these differences, and to make the comparison fair and ensure its validity, both search engines’ preferences were adjusted to level the playing field, more or less. This paper focuses on the retrieval of documents including Word, Excel, PowerPoint, text, and the like; retrieval of emails, Web sites, and chats is outside the realm of this study. As already mentioned, GDS does not have the ability to search system folders, and this omission was intentional. According to the folks at Google, GDS indexes and searches “the files and folders on your hard drive (the ones you actually look at, not the system files only your computer uses)” [3]. Moreover, GDS preferences were adjusted to exclude the searching of the following folders: c:\I386, c:\DRIVERS, and c:\DELL. They were also adjusted such that the following items were not included in searches: Outlook emails, Outlook Express emails, Web history, and AOL chats. This was accomplished fairly effortlessly by single-clicking on the GDS “swirl” icon in the Windows system tray and selecting “Preferences” (which in turn brought up the GDS browser, a regular Internet Explorer browser), making the appropriate changes, and clicking on the “save preferences” button.

As already mentioned, WSC can look for just about anything under the sun, so setting up its preferences required a little more work. The folders designated not-to-be-searched in GDS were also designated as such in WSC. This meant going to each undesirable folder and marking it “hidden” before even conducting a search, and then within the Search Companion window, making sure the boxes next to “search system folders” and “search hidden files and folders” were both deselected. Searching “all files and folders” was appropriate for the filename/type searches but too broad for the single-word and multiple-word searches; instead, these searches were performed on “documents (word processing, spreadsheets, etc.)”.

Three experiments were conducted in the course of this study. One examined each search engine’s performance in searching the contents of files for a single word, another examined each engine’s performance in searching file contents for a two-or-three-word phrase, and the third examined each engine’s performance in searching for specific files by name and extension (type). The experiments were performed on a Dell Inspiron 9200 laptop computer possessing 512MB of RAM. While each search was conducted, no other windows or applications were open. GDS features a built-in search timer. WSC, however, lacks this tool, so for WSC, a basic stopwatch (featuring 0.01-second precision) was used to measure search time.

Both GDS and WSC had to perform an initial sweep of the hard drive to form an index. Though it was unclear how long the initial WSC indexing process took, the GDS process took approximately 10 minutes of idle CPU time. (According to Google, this process could take several hours, depending on how much material is on the user’s hard drive [3].) Once each index was established, it was updated on a continual basis as the computer was put to everyday use.

Sample generation for the experiment consisted of coming up with three random lists: a list of 30 words, a list of 30 phrases, and a list of 30 filenames (with extensions). A sample size of n = 30 was selected as reasonable (again, given time and resource constraints) for a population ranging from 46 (total file names) to 1928 (total words) in size. The population from which the samples were drawn was generated by a thorough examination of each file, located “out in the open” on the laptop, that fell into one of the following filetype categories: Word documents, text files, Word Perfect documents, PowerPoint files, PDF files, and Excel spreadsheets. An item “out in the open” meant that it was on the desktop or in an unhidden subfolder of “Documents and Settings”. This population, a master list of words, was run through the random number generator available at www.random.org [6].

For each search performed, search time, total number of hits, and precision were recorded. The precision of a given hit was gauged by whether the file contained the precise term upon which the search was performed. If not, the hit was deemed imprecise. As such, precision was measured and recorded as a proportion of precise hits to total hits [7].
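The per-search measures described in the procedure can be sketched in a few lines of Python. Precision follows the definition given in the text (precise hits divided by total hits). The surviving text does not define hit rate, so the sketch assumes it means hits returned per second of search time; that reading, the `SearchRun` record, and its field names are all illustrative assumptions rather than the author's stated method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchRun:
    """One search on one engine (hypothetical record layout)."""
    term: str
    seconds: float      # measured search time
    total_hits: int     # number of files returned
    precise_hits: int   # returned files that contain the exact term


def precision(run: SearchRun) -> Optional[float]:
    """Proportion of precise hits to total hits; None if nothing was returned."""
    if run.total_hits == 0:
        return None
    return run.precise_hits / run.total_hits


def hit_rate(run: SearchRun) -> Optional[float]:
    """ASSUMPTION: hit rate taken as hits per second of search time."""
    if run.seconds <= 0:
        return None
    return run.total_hits / run.seconds


# Made-up numbers, for illustration only.
run = SearchRun("budget", seconds=0.5, total_hits=8, precise_hits=6)
print(precision(run))  # 0.75
print(hit_rate(run))   # 16.0
```

Returning `None` for a search with no hits keeps an undefined ratio from being silently recorded as zero.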

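The sample-generation step in the procedure (drawing 30 items at random from a master list) could be reproduced along the following lines. This is only a sketch: the study used the random number service at random.org, whereas the code below substitutes Python's built-in pseudo-random generator, and the function name and the example file names are hypothetical.

```python
import random


def draw_sample(population, n=30, seed=None):
    """Draw n distinct items from the master list, without replacement.

    ASSUMPTION: Python's pseudo-random generator stands in for the
    random.org service used in the study.
    """
    items = list(population)
    if len(items) < n:
        raise ValueError("population is smaller than the requested sample")
    rng = random.Random(seed)
    return rng.sample(items, n)


# Hypothetical master list of 46 file names (the smallest population size reported).
filenames = [f"file{i:02d}.doc" for i in range(46)]
sample = draw_sample(filenames, n=30, seed=2005)
print(len(sample))  # 30
```

Passing a fixed seed makes the draw repeatable, which a true random source such as random.org does not allow.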
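The procedure notes that both engines first sweep the hard drive to form an index. The paper does not describe how either index is built, so the following is only a generic illustration of why an index helps: a word-by-word scan must reread every document on every query, while an inverted index pays that cost once and then answers each query with a single lookup. All function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_search(docs, word):
    """No index: read every document word by word on every query."""
    word = word.lower()
    return sorted(name for name, text in docs.items() if word in tokenize(text))


def build_index(docs):
    """One initial sweep: map each word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def index_search(index, word):
    """With an index: a single dictionary lookup per query."""
    return sorted(index.get(word.lower(), set()))


docs = {
    "notes.txt": "Quarterly budget notes",
    "memo.doc": "Budget memo for the seminar",
    "todo.txt": "Buy milk",
}
index = build_index(docs)
print(scan_search(docs, "budget"))    # ['memo.doc', 'notes.txt']
print(index_search(index, "budget"))  # ['memo.doc', 'notes.txt']
```

Both functions return the same files; the difference lies only in how much work each query repeats.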
3. RESULTS
Descriptive and inferential statistics were used to analyze the raw data. Because the sample was the same between the GDS and the WSC search runs, the correlated t-Test was the statistical analysis tool of choice for the data in this study. It was determined a priori to data collection that a significance level of α = 0.05 would be used to limit the risk of Type I errors and that the testing would be two-tailed, since the null hypotheses were non-directional. These standards are generally recommended by the scientific research community in order to distinguish between chance and a statistically significant effect

Table 2. Correlated t-Test for significance – Single-Word Search – Total Hits

Statistic    Total Hits GDS    Total Hits WSC
mean         6.266666667       12.1
var          40.89195402       128.162069
st dev       6.394681698       11.32086874
n            30                30
df           29
tobs         -3.466102336
p(tobs
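For readers unfamiliar with the correlated (paired) t-test named above, the statistic can be computed as in the sketch below: take the difference between the two engines' measurements for each search term, then divide the mean difference by its standard error, with n - 1 degrees of freedom. The function name and the sample numbers are illustrative assumptions; the study's raw data are not reproduced here.

```python
import math
import statistics


def correlated_t(xs, ys):
    """Return (t_obs, df) for paired samples xs and ys.

    t_obs = mean(d) / (sd(d) / sqrt(n)), where d holds the pairwise
    differences and sd is the sample standard deviation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t_obs = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t_obs, n - 1


# Made-up paired measurements: differences are 1, 2, 3.
t_obs, df = correlated_t([2, 4, 6], [1, 2, 3])
print(round(t_obs, 4), df)  # 3.4641 2
```

With 30 search terms per experiment, the same calculation yields the 29 degrees of freedom shown in the tables.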
3.2 Multiple-Word Search
A correlated t-Test (n = 30) was conducted to evaluate whether search time for a phrase differed significantly between GDS and WSC. Table 5 shows that the absolute value of the t statistic calculated from the data (tobs = -692.3642539) is much greater than the critical t value (tcrit = 2.045230758), and the two-tailed probability is much less than the α-level (p = 1.00391E-62 < 0.05). Thus, the mean search time for a phrase differed significantly between GDS and WSC, with the GDS mean search time being significantly less than the WSC mean search time.

Table 7. Correlated t-Test for significance – Phrase Search – Hit Rate

Statistic    Hit Rate GDS    Hit Rate WSC
mean         150             0.319663732
var          21379.31034     0.024182933
st dev       146.2166555     0.155508628
n            30              30
df           29
tobs         5.611655452
p(tobs
A correlated t-Test (n = 30) was conducted to evaluate whether the total number of hits from a filename/type search differed significantly between GDS and WSC. Table 10 illustrates that the absolute value of the t statistic calculated from the data (tobs = 0.921043264) is less than the critical t value (tcrit = 2.045230758), and the two-tailed probability is greater than the α-level (p = 0.364626879 > 0.05). Thus, the mean total number of hits from a filename/type search did not differ significantly between GDS and WSC. Though the mean number of hits garnered by GDS was greater than that of WSC, the degree to which it was greater was insignificant.

Table 12. Correlated t-Test for significance – Filename/type Search – Precision

Statistic    Precision GDS    Precision WSC
mean         0.500555556      0.977777778
var          0.218400064      0.014814815
st dev       0.467332926      0.121716124
n            30               30
df           29
tobs         -5.639825052
p(tobs
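The decision rule applied in each of these comparisons can be written out explicitly. The sketch below rejects a null hypothesis when the absolute value of the observed t exceeds the critical value and the two-tailed probability falls below α. The function name is hypothetical; the numeric inputs are the values quoted in the text for the phrase search-time test and the filename/type total-hits test.

```python
ALPHA = 0.05
T_CRIT = 2.045230758  # critical t for df = 29, two-tailed, as quoted in the text


def reject_null(t_obs, p_value, t_crit=T_CRIT, alpha=ALPHA):
    """True when the difference is statistically significant at the alpha level."""
    return abs(t_obs) > t_crit and p_value < alpha


# Phrase search, search time: values quoted in the text.
print(reject_null(-692.3642539, 1.00391e-62))  # True

# Filename/type search, total hits: values quoted in the text.
print(reject_null(0.921043264, 0.364626879))   # False
```

For a two-tailed test the two conditions agree with each other, so checking both serves only as a consistency check on the reported figures.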
being searched for, GDS was not as consistently precise. In some cases, notably in file names containing spaces and/or symbols/punctuation, GDS failed to find the target file altogether. No doubt this is a weakness due to the engine’s beta status. GDS and WSC performed equally well in terms of searches by phrase, pulling up only those files containing the phrases being sought.

In terms of usability of interface, GDS is the clear winner. To start a new search, the user simply double-clicks on the GDS icon in the bottom right of the XP taskbar to bring up the search browser, types in the search term, and clicks the “Search” button. In the blink of an eye, the results appear, listed in order of Google-judged precision or date, depending on how the user would rather have results ranked. Within the browser main page, the user can click on “Preferences” to adjust the search preferences. The search browser even displays the search time and lists a brief “abstract” of the items found in the search, with the search term in boldface.

WSC, on the other hand, generally needs to be started from the Start menu, unless the user knows the “magic” shortcut that consists of pressing the Windows key and F-key simultaneously. Either action brings up the WSC window and the “Search Companion”, which can be chosen from a variety of annoying animated characters, including a dog that “fetches” items for you, known in some circles as “Fido the Time-Killing Windows Dog” [5]. (Fortunately, the user can opt to turn this feature off, and it was turned off for the purposes of this study.) At that point, the user must then determine whether to search for “pictures, music, or video”, “documents (word processing, spreadsheet, etc.)”, “all files and folders”, or “computers or people”. Upon selecting one of these options, the user must then type in the search term, with the option of narrowing the search by clicking on “more advanced options”. In other words, there is no way to go to a convenient, centralized place to change the preferences in WSC. All the clicking around is tedious and time-consuming. WSC, moreover, does not give a preview of file contents the way GDS does, forcing the user to open each file to determine whether it is the intended target.

In light of the results of this study, it is difficult to make a summative statement that declares which search engine is best in terms of overall performance as defined earlier in this paper. Each has its own strengths and weaknesses. Though GDS is incredibly fast, it lacks the depth and precision of WSC, which in turn is slow and cumbersome in comparison to GDS. By and large, GDS has the potential to surpass WSC’s performance in all facets when it matures beyond the beta stage. There are a few things that Google needs to work out prior to achieving this, but the current product puts the company in a very good position and seems to be based on a highly efficient algorithm. Google gained its popularity as a web search engine, and undoubtedly its followers will want to try its desktop tool. For now, GDS and WSC each seem to pick up where the other leaves off, so perhaps until a “final release” is available, the GDS beta should be used as an adjunct tool to WSC.

5. FURTHER RESEARCH
Further research must be done to compare other desktop search tools to GDS and WSC. Since GDS was released, other players known for their web search engines have entered or will soon be entering the desktop search market, including Yahoo, MSN, HotBot, and Ask Jeeves [1]. America Online is also rumored to be working on a desktop search product. Besides these competitors, GDS is also up against smaller companies already offering desktop search products (some not available for free download), including X1, Vivisimo, and Copernic [4].

The community of PC users would also benefit from a repeat of this study once the full-blown (i.e., non-beta) version of GDS is released, though it is unclear when that will actually happen. Google claims that the full-blown version will be capable of searching more types of files.

In addition, future plans at Google include developing a Mac version of GDS [3]. When and if that happens, research should be done to compare GDS to the desktop search tools available to Mac users, most notably Apple’s upcoming Spotlight for the Tiger operating system, a desktop search tool that promises to “find anything on your computer as quickly as you type” [8].

6. REFERENCES
[1] P. Boutin, “Keeper Finders: Five new programs that let you search your hard drive without having a seizure,” Slate, 31 Dec. 2004; http://slate.msn.com/id/2111643/.
[2] R.B. Burns, Introduction to Research Methods, Sage Publications, 2000.
[3] “Google Desktop Beta,” 22 Oct. 2004; http://desktop.google.com/.
[4] S. Olsen, “Google Unveils Desktop Search,” CNET News.com, 14 Oct. 2004; http://news.com.com/2100-1024_3-5408765.html.
[5] D. Pogue, “Google Takes On Your Desktop,” The New York Times On The Web, 21 Oct. 2004; http://www.nytimes.com/2004/10/21/technology/circuits/21stat.html.
[6] “Random.org – True Random Number Service,” Jan. 2005; http://random.org/nform.html.
[7] D. Sullivan, “Search Engine Glossary,” Search Engine Watch, 17 Jan. 2005; http://searchenginewatch.com/facts/article.php/2156001.
[8] “Tiger Preview – Spotlight,” 17 Jan. 2005; http://www.apple.com/macosx/tiger/spotlight.html.
