Article

Categorizing web queries according to geographical locality

Authors:
Luis Gravano, Columbia University
Vasileios Hatzivassiloglou, Columbia University
Richard Lichtenstein, Harvard University
CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management, November 2003, Pages 325–333. https://doi.org/10.1145/956863.956925

Published: 3 November 2003

ABSTRACT

Web pages (and resources, in general) can be characterized according to their geographical locality. For example, a web page with general information about wildflowers could be considered a global page, likely to be of interest to a geographically broad audience. In contrast, a web page with listings of houses for sale in a specific city could be regarded as a local page, likely to be of interest only to an audience in a relatively narrow region. Similarly, some search engine queries (implicitly) target global pages, while other queries are after local pages. For example, the best results for query [wildflowers] are probably global pages about wildflowers such as the one discussed above. However, local pages that are relevant to, say, San Francisco are likely to be good matches for a query [houses for sale] that was issued by a San Francisco resident or by somebody moving to that city. Unfortunately, search engines do not analyze the geographical locality of queries and users, and hence often produce sub-optimal results. Thus, query [wildflowers] might return pages that discuss wildflowers in specific U.S. states (and not general information about wildflowers), while query [houses for sale] might return pages with real estate listings for locations other than the one of interest to the person who issued the query. Deciding whether an unseen query should produce mostly local or global pages, without placing this burden on the search engine users, is an important and challenging problem, because queries are often ambiguous or underspecify the information they are after. In this paper, we address this problem by first defining how to categorize queries according to their (often implicit) geographical locality. We then introduce several alternatives for automatically and efficiently categorizing queries in our scheme, using a variety of state-of-the-art machine learning tools. We report a thorough evaluation of our classifiers using a large sample of queries from a real web search engine, and conclude by discussing how our query categorization approach can help improve query result quality.
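
To make the local/global distinction concrete, the following is a minimal sketch of how a query category might be used once it is known. It is not the classifiers evaluated in the paper, which the abstract says are built with machine learning tools. The cue-word list, the names `Locality`, `categorize`, and `localize`, and the idea of appending the user's city to a local query are all assumptions made here purely for illustration.

```python
from enum import Enum


class Locality(Enum):
    """The two query categories described in the abstract."""
    LOCAL = "local"
    GLOBAL = "global"


# Hypothetical cue words, chosen only for illustration. The paper relies on
# machine learning tools rather than a fixed word list like this one.
LOCAL_CUES = frozenset({"houses", "sale", "rent", "restaurants", "plumber"})


def categorize(query: str) -> Locality:
    """Label a query LOCAL if any token is a cue word, else GLOBAL."""
    tokens = query.lower().split()
    if any(token in LOCAL_CUES for token in tokens):
        return Locality.LOCAL
    return Locality.GLOBAL


def localize(query: str, user_city: str) -> str:
    """Append the user's city to local queries; leave global ones unchanged."""
    if categorize(query) is Locality.LOCAL:
        return f"{query} {user_city}"
    return query
```

Under these assumptions, `localize("houses for sale", "San Francisco")` rewrites the query to include the city, while `localize("wildflowers", "San Francisco")` leaves it untouched, mirroring the two examples in the abstract.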

Index Terms

Categorizing web queries according to geographical locality
1. Information systems
  1. Information retrieval
  2. Information storage systems

Published in
CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
November 2003
592 pages
ISBN: 1581137230
DOI: 10.1145/956863
General Chair: Donald Kraft (Louisiana State University)
Program Chairs: Ophir Frieder (Illinois Institute of Technology), Joachim Hammer (University of Florida), Sajda Qureshi (University of Nebraska, Omaha), Len Seligman (The MITRE Corporation)
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information retrieval
query classification
query modification
search engines
web search
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%