1 Introduction
The Web is now the primary source of information for many people (Cole, Suman, Schramm, Lunn, & Aquino, 2003; Fox, 2002). Over 80% of Web searchers use Web search engines to locate online information or services (Nielsen Media, 1997). There is a critical need to understand how people use Web search engines. Amichai-Hamburger (2002) presents a review of the effect of the Web and the lack of awareness of the user in the design of Web systems and site content. The research reported in this article attempts to contribute to such a dialogue. Most research on Web searching provides little longitudinal, regional, or cross-system analysis. We need a clearer understanding of emerging Web searching trends across different global regions and between different Web search engines in order to design better searching systems.
This important research area directly impacts pay-per-click marketing, Web-site-optimization strategies, and Web and Intranet search engine design. It complements research such as that conducted by Liaw and Huang (2003), who showed that individual experience, individual motivation, search engine quality, and user perceptions of technology acceptance are all factors affecting individual desire to use Web search engines.
In this paper, we present a comparison of nine major Web searching studies, four of European-based and five of US-based Web search engines, over a seven-year period. We provide a temporal comparison of differences in Web searching among and between US and European-based Web search engines, as one might expect some divergence due to linguistic and interface factors (Spink, Ozmutlu, Ozmutlu, & Jansen, 2002b). We specifically investigate the interactivity between searchers and Web search engines, identifying changes in the complexity of Web search interactions. In addition, we present a longitudinal analysis of the types of information people are searching for on the Web.
We center our research analysis on the interactions between the user and the search engine. Interaction has several meanings in information searching, although the definitions generally encompass query formulation, query modification, and inspection of the list of results, among other actions. Belkin, Cool, Stein, and Thiel (1995) have extensively explored user interaction within an information session. Efthimiadis and Robertson (1989) present and categorize interaction at various stages in the information retrieval process from information seeking research. Bates (1990) presents four levels of interaction, which are move, tactic, stratagem, and strategy. Lalmas and Ruthven (1999) present two groups of interaction, that which occurs across sessions and that which occurs within a session.
This within-session category is the type of interaction that we examine in this study. We consider an interaction as any specific exchange between the searcher and the system (e.g., submitting a query or clicking a hyperlink). We define a searching episode as a series of interactions within a limited duration to address one or more information needs. This duration is typically short, with Web researchers using between 5 and 120 minutes to define a session duration (cf. He, Göker, & Harper, 2002; Montgomery & Faloutsos, 2001; Silverstein, Henzinger, Marais, & Moricz, 1999). The searcher may be multitasking (Spink, 2004) within a searching episode, or the episode may be an instance of the searcher engaged in successive searching (Lin, 2002; Spink, Wilson, Ellis, & Ford, 1998).
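As a minimal sketch of how such a duration cutoff might be applied to a transaction log, the following Python groups each searcher's timestamped queries into searching episodes, starting a new episode whenever the gap between consecutive interactions exceeds a threshold. This is an illustration under stated assumptions and not the procedure of any study cited here: the record layout (user identifier, timestamp, query), the function name split_into_episodes, the reading of the cutoff as an inactivity gap, the 30-minute default, and the sample log are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_into_episodes(records, max_gap_minutes=30):
    """Group (user_id, timestamp, query) records into searching episodes.

    A new episode starts whenever the time between two consecutive
    interactions by the same user exceeds max_gap_minutes. The default of
    30 minutes is an arbitrary choice inside the 5 to 120 minute range.
    """
    max_gap = timedelta(minutes=max_gap_minutes)

    # Collect each user's interactions, then order them by time.
    by_user = defaultdict(list)
    for user_id, timestamp, query in records:
        by_user[user_id].append((timestamp, query))

    episodes = []
    for user_id, events in by_user.items():
        events.sort(key=lambda event: event[0])
        current = [events[0]]
        for previous, following in zip(events, events[1:]):
            if following[0] - previous[0] > max_gap:
                # Gap too long: close the current episode, open a new one.
                episodes.append((user_id, current))
                current = []
            current.append(following)
        episodes.append((user_id, current))
    return episodes


# Hypothetical log entries, for illustration only.
log = [
    ("u1", datetime(2002, 5, 28, 9, 0), "jaguar"),
    ("u1", datetime(2002, 5, 28, 9, 4), "jaguar car"),
    ("u1", datetime(2002, 5, 28, 13, 0), "weather"),
    ("u2", datetime(2002, 5, 28, 9, 1), "maps"),
]
episodes = split_into_episodes(log)
```

The choice of threshold changes the resulting counts: a shorter cutoff splits the same log into more, shorter episodes, which matters when comparing session lengths across studies that used different cutoffs.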
We begin with an extensive review of literature concerning the rapidly growing area of Web search engine research. We then present the datasets used in this study. We discuss the analysis, results, and implications of the results for the design of Web searching systems.
2 Related studies
There have been a few review articles on Web searching. Jansen and Pooch (2001) provide a review of Web transaction log research of Web search engines and individual Web sites through 2000. Hsieh-Yee (2001) reviews studies conducted between 1995 and 2000 on Web search behaviors. The researcher reports that many studies investigate the effects of certain factors on search behavior, including information organization and presentation, type of search task, Web experience, cognitive abilities, and affective states. Hsieh-Yee (2001) also notes that many studies lack external validity. Bar-Ilan (2004) presents an extensive and integrative overview of Web search engines and the use of Web search engines in information science research. Bar-Ilan (2004) provides a variety of perspectives including user studies, social aspects, Web structure, and search-engine evaluation. We extend these review articles in this section, setting the stage for our analysis.
Web searching studies fall into three categories: (1) those that primarily use transaction-log analysis, (2) those that incorporate users in a laboratory survey or other experimental setting, and (3) those that examine issues related to or affecting Web searching. In this paper, we focus on studies using transaction log analysis. Romano, Donovan, Chen, and Nunamaker (2003) present a methodology for general qualitative analysis of transaction log data. Wang, Berry, and Yang (2003) and Spink and Jansen (2004) also present detailed explanations of approaches to transaction log analysis.
In investigations of single Web sites, Yu and Apps (2000) use transaction log data to examine user behavior in the SuperJournal project. For 23 months (February 1997 to December 1998), the researchers recorded 102,966 logged actions and related these actions to four subject clusters, 49 journals, 838 journal issues, 15,786 articles, and three Web search engines. In another study covering the period from 1 January to 18 September 2000, Ke, Kwakkelaar, Tai, and Chen (2002) examined user behavior in Elsevier's ScienceDirect, which hosts the bibliographic information and full-text articles of more than 1300 journals with an estimated 625,000 users. Loken, Radlinski, Crespi, Millet, and Cushing (2004) examined the transaction log data of the online self-directed studying of more than 100,000 students using a Web-based system to prepare for US college admissions tests over several months of use. The researchers noted several nonoptimal behaviors, including a tendency toward deferring study and a preference for short-answer verbal questions. The researchers discuss the relevance of their findings for online learning.
3 Discussion
As the Web becomes a worldwide phenomenon, we need to better understand the emerging trends in Web searching, given the tremendous influence Web search engines have on directing traffic to online information and services. Our findings indicate that the interactions between Web search engines and searchers are not becoming more complex, and in some respects, are becoming less complex. Our comparative analysis also indicates that findings from a study focusing on one Web search engine cannot be applied wholesale to all Web search engines.
Session lengths are not increasing as measured by number of queries. The percentage of one-query sessions is remaining stable over time and across Web search engines. There was a difference with the 1998 AltaVista study, but this appears to be caused by an artificially short session duration that the researchers used. Query lengths are also not increasing as measured by number of terms. There was a statistical difference in the percentage of one-term queries on the German Fireball Web search engine, which may be due to linguistic differences with the other Web search engines. The percentage of single-term queries is holding steady, and the use of query operators is also remaining stable. Web search engines in the future may better leverage the implicit feedback from this interaction to provide more personalized results (Callan & Smeaton, 2003). However, the use of query operators varies significantly between Web search engines, so in this area findings from one study cannot necessarily be applied to predict behaviors on other Web search engines.
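To make the measures in this paragraph concrete, the following is a minimal sketch of how session length (queries per session), query length (terms per query), and query-operator usage might be computed from sessions that have already been extracted from a transaction log. The whitespace tokenization, the operator set (AND, OR, NOT, a leading + or -, and double quotes), the function names, and the sample sessions are simplifying assumptions for illustration and not necessarily the definitions used in the studies compared here.

```python
# Assumed operator set; real Web search engines differ in what they support.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def query_terms(query):
    """Split a query into terms on whitespace (a simplifying assumption).

    Operators such as AND are counted as terms in this sketch.
    """
    return query.split()


def uses_operator(query):
    """Return True if the query contains any of the assumed operators."""
    if '"' in query:  # phrase searching
        return True
    for term in query_terms(query):
        if term in BOOLEAN_OPERATORS:
            return True
        if len(term) > 1 and term[0] in "+-":  # must-appear / must-not-appear
            return True
    return False


def summarize(sessions):
    """Compute aggregate measures over a non-empty list of sessions.

    Each session is a non-empty list of query strings.
    """
    queries = [query for session in sessions for query in session]
    if not sessions or not queries:
        raise ValueError("at least one session with one query is required")
    n_sessions = len(sessions)
    n_queries = len(queries)
    term_counts = [len(query_terms(query)) for query in queries]
    return {
        "pct_one_query_sessions":
            100.0 * sum(len(s) == 1 for s in sessions) / n_sessions,
        "pct_one_term_queries":
            100.0 * sum(count == 1 for count in term_counts) / n_queries,
        "pct_queries_with_operators":
            100.0 * sum(uses_operator(q) for q in queries) / n_queries,
        "mean_terms_per_query": sum(term_counts) / n_queries,
    }


# Hypothetical sessions, for illustration only.
sessions = [
    ["jaguar"],
    ["cheap flights", '"new york" hotels'],
    ["music", "music AND download", "+mp3 -video"],
    ["weather"],
]
stats = summarize(sessions)
```

Because each engine supports a different operator syntax, a fixed operator set like the one above would need adjusting per engine before any cross-engine comparison of operator usage.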
The viewing of only the first page of results is extremely high, and it significantly increased over time on the Excite Web search engine. This may indicate increasing simplicity in interactions. It may also be an indication of the increasing ability of Web search engines to retrieve and rank Web documents more effectively. There is certainly a need for more studies that focus on the Web document and virtual document (Watters, 1999) level of analysis.
The trend toward viewing fewer result pages among Excite users may be related to a changing user base during the study period, as the Web population increased dramatically during this time. Excite was the second most popular Web site in 1997 (Munarriz, 1997), and was the fifth most popular in 1999 and 2001 as measured by number of unique visitors (Cyber Atlas, 1999, 2001).
There are both similarities and differences between usage on US and European-based Web search engines. Searchers on both are similar in session length, query length, and number of results pages viewed. Additionally, the use of Web query operators on both is fairly stable. However, the usage of these advanced Web-query operators is much higher on US-based Web search engines than on their European counterparts. In investigating this difference, we ruled out size of content collections (they are all immense), user bases (they all number in the millions), or algorithmic sophistication (they are all similar in performance tests). Fireball and BWIE did not prominently display the advanced Web searching options; however, it may be that users of these Web search engines just do not use query operators. This increases the criticality of keyword and phrase selection for Web providers targeting these users.
Fireball is a general purpose Web search engine, but BWIE is also a search directory. A search directory supplements query matching of the entire content collection with directory-based search (cf. Yahoo http://www.yahoo.com or Open Directory http://dmoz.org/). The idea behind directory services is to provide additional organization to the content. However, some research has shown that directory-based searching does not improve searching performance and also takes longer (Dennis, Bruza, & McArthur, 2002). There are variations of the search directory, including specialized or niche Web search engines that provide content within a specific domain, including computer science literature (CiteSeer http://www.researchindex.com), e-commerce (Froogle http://froogle.google.com/), or personal information (cf. http://www.switchboard.com). Some Web search engines provide clustering (Vivisimo http://vivisimo.com/), which one can view as an automated, real time, and virtual directory service.
AlltheWeb.com, however, has extensive advanced Web search features. Additionally, the results of the 2002 AlltheWeb.com dataset do not conform to the results from studies of the other European-based Web search engines. One possible reason may be that AlltheWeb.com is attracting searchers outside of its traditional European market. From our analysis of the AlltheWeb.com transaction log, nearly 90% of the query requests are in English, with 6% in French, 1% each in Spanish, German, and Italian, and a variety of other languages making up the rest. Further research will be needed to isolate the effects of linguistic differences.
Web searching topics are changing. There was a decrease in sexual searching as a percentage of overall Web searching on both European and US-based Web search engines. The overall trend is towards using the Web as a tool for information or commerce, rather than entertainment. This trend is more pronounced with US as opposed to European searchers. This analysis certainly confirms survey and other data that the Web is now a major source of information for most people (Cole et al., 2003; Fox, 2002). There is increased use of the Web as an economic resource and tool (Lawrence & Giles, 1999; Spink et al., 2002a), and people use the Web for an increasing variety of information tasks (Fox, 2002; National Telecommunications & Information Administration, 2002).
The decreased level of interaction in Web searches may be unwelcome news for Web search engine developers and for those providing Web-based information content, products, and services. Web users appear unwilling to invest additional effort to locate relevant Web content. The trend towards viewing only the first results page is a challenge for those seeking to draw visitors to their Web sites or for Web search engines attempting to generate revenue via ad impressions. Users have a low tolerance for viewing any results past the first page. They prefer to reformulate the Web query rather than wade through result listings. Placement of an accurate abstract within the first page of Web search engine results appears to be a determining factor in drawing traffic to a particular Web site.
We continue to analyze Web searching trends to provide insight into this critical area of human-computer interaction and electronic commerce.