article

Query enrichment for web-query classification

Authors:
Dou Shen

Hong Kong University of Science and Technology, Hong Kong, China

Hong Kong University of Science and Technology, Hong Kong, China
View Profile

,
Rong Pan

Hong Kong University of Science and Technology, Hong Kong, China

Hong Kong University of Science and Technology, Hong Kong, China
View Profile

,
Jian-Tao Sun

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

,
Jeffrey Junfeng Pan

Hong Kong University of Science and Technology, Hong Kong, China

Hong Kong University of Science and Technology, Hong Kong, China
View Profile

,
Kangheng Wu

Hong Kong University of Science and Technology, Hong Kong, China

Hong Kong University of Science and Technology, Hong Kong, China
View Profile

,
Jie Yin

Hong Kong University of Science and Technology, Hong Kong, China

Hong Kong University of Science and Technology, Hong Kong, China
View Profile

,
Qiang Yang

Hong Kong University of Science and Technology, Hong Kong, China

Hong Kong University of Science and Technology, Hong Kong, China
View Profile

Authors Info & Claims

ACM Transactions on Information Systems Volume 24 Issue 3pp 320–352https://doi.org/10.1145/1165774.1165776

Published:01 July 2006Publication History

ACM Transactions on Information Systems

Abstract

Web-search queries are typically short and ambiguous. To classify these queries into certain target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (precision, F1 measure, which combines precision and recall, and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this article, we show that, despite the difficulty of an abundance of ambiguous queries and lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best result for F1 and precision is 42.4% and 44.4%, respectively, which is 9.6% and 24.3% higher than those from the runner-ups, respectively.

References

Alpaydin, E. 2004. Introduction to Machine Learning. MIT Press, Cambridge, MA.]] Google Scholar
Bauer, E. and Kohavi, R. 1999. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. J. Mach. Learn. 36, 1--2, 105--139.]] Google Scholar
Beeferman, D. and Berger, A. 2000. Agglomerative clustering of a search engine query log. In KDD '00: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, 407--416.]] Google Scholar
Beitzel, S. M., Jensen, E. C., Frieder, O., Grossman, D., Lewis, D. D., Chowdhury, A., and Kolcz, A. 2005. Automatic web query classification using labeled and unlabeled training data. In SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 581--582.]] Google Scholar
Cann, A. J. 2003. Maths from Scratch for Biologists. John Wiley & Sons, New York, NY.]]Google Scholar
Caruana, R., Niculescu-Mizil, A., Crew, G., and Ksikes, A. 2004. Ensemble selection from libraries of models. In ICML '04: Proceedings of the 21st International Conference on Machine Learning. ACM Press, New York, 18.]] Google Scholar
Chang, C.-H. and Hsu, C.-C. 1998. Integrating query expansion and conceptual relevance feedback for personalized web information retrieval. In WWW7: Proceedings of the 7th International Conference on World Wide Web 7. Elsevier Science Publishers B. V., Amsterdam, 621--623.]] Google Scholar
Chekuri, C., Goldwasser, M., Raghavan, P., and Upfal, E. 1997. Web search using automated classification. 6th International World Wide Web Conference (WWW6). Poster presentation.]]Google Scholar
Chen, H. and Dumais, S. 2000. Bringing order to the web: Automatically categorizing search results. In CHI '00: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press, New York, 145--152.]] Google Scholar
Dietterich, T. G. 2000. Ensemble methods in machine learning. In Multiple Classifier Systems. 1--15.]] Google Scholar
Fan, W., Stolfo, S. J., and Zhang, J. 1999. The application of adaboost for distributed, scalable and on-line learning. In KDD '99: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, 362--366.]] Google Scholar
Hansen, L. K. and Salamon, P. 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, 10, 993--1001.]] Google Scholar
HITEC. http://categorizer.tmit.bmr.hu.]]Google Scholar
Hoel, P. G. 1966. Elementary Statistics, 2nd ed. Wiley, New York, NY.]]Google Scholar
Howe, A. E. and Dreilinger, D. 1997. SAVVYSEARCH: A metasearch engine that learns which search engines to query. AI Mag. 18, 2, 19--25.]]Google Scholar
Jansen, B. J. 2000. The effect of query complexity on web searching results. Inf. Res. 6, 1.]]Google Scholar
Joachims, T. 1998. Text categorization with suport vector machines: Learning with many relevant features. In ECML'98 Proceedings of the 10th European Conference on Machine Learning, 137--142.]] Google Scholar
Joachims, T. 1999. Transductive inference for text classification using support vector machines. In ICML '99: Proceedings of the 16th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, 200--209.]] Google Scholar
Jones, K. S. 1971. Automatic Keyword Classifications for Information Retrieval. Butterworth, London, UK.]]Google Scholar
Kang, I.-H. and Kim, G. 2003. Query type classification for web document retrieval. In SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval. ACM Press, New York, 64--71.]] Google Scholar
Kardkovács, Z. T., Tikk, D., and Bánsághi, Z. 2005. The ferrety algorithm for the kdd cup 2005 problem. SIGKDD Explor. Newsl. 7, 2, 111--116.]] Google Scholar
Kittler, J., Hatef, M., Duin, R. P. W., and Matas, J. 1998. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20, 3, 226--239.]] Google Scholar
Lewis, D. D. and Gale, W. A. 1994. A sequential algorithm for training text classifiers. In SIGIR, W. B. Croft and C. J. van Rijsbergen, Eds. Springer Verlag, Berlin, Germany, 3--12.]] Google Scholar
Li, Y., Zheng, Z., and Dai, H. K. 2005. Kdd cup-2005 report: Facing a great challenge. SIGKDD Explor. Newsl. 7, 2, 91--99.]] Google Scholar
McCallum, A. and Nigam, K. 1998. A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization.]]Google Scholar
Meyer, D. A. and Brown, T. A. 1998. Statistical mechanics of voting. Phys. Revei. Lett. 81, 8, 1718--1721.]]Google Scholar
Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. 1990. Introduction to wordnet: An on-line lexical database. Int. J. Lexicography 3, 4, 23--244.]]Google Scholar
Page, L., Brin, S., Motwani, R., and Winograd, T. 1998. The pagerank citation ranking: Bringing order to the web. Tech. rep., Stanford Digital Library Technologies Project.]]Google Scholar
Selberg, E. and Etzioni, O. 1995. Multi-service search and comparison using the MetaCrawler. In Proceedings of the 4th International World-Wide web Conference. Darmstadt, Germany.]]Google Scholar
Shen, D., Pan, R., Sun, J.-T., Pan, J. J., Wu, K., Yin, J., and Yang, Q. 2005. Q2c@ust: Our winning solution to query classification in kddcup 2005. SIGKDD Explor. Newsl. 7, 2, 100--110.]] Google Scholar
Shen, D., Sun, J.-T., Yang, Q., and Chen, Z. 2006. Building bridges for web query classification. In SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.]] Google Scholar
Silverstein, C., Marais, H., Henzinger, M., and Moricz, M. 1999. Analysis of a very large web search engine query log. SIGIR Forum 33, 1, 6--12.]] Google Scholar
Tikk, D., Biró, Gy., and Yang, J. D. 2005. Experiments with a hierarchial text categorization method on WIPO patent collections. In Applied Research in Uncertainty Modelling and Analysis, N. O. Attok-Okine and B. M. Ayyub, Eds. International Series in Intelligent Technologies, vol. 20, Springer-Verlag, 283--302.]]Google Scholar
Van, R. C. 1979. Information Retrieval, 2nd ed. Butterworth, London, UK.]]Google Scholar
Vogel, D., Bickel, S., Haider, P., Schimpfky, R., Siemen, P., Bridges, S., and Scheffer, T. 2005. Classifying search engine queries using the web as background knowledge. SIGKDD Explor. Newsl. 7, 2, 117--122.]] Google Scholar
Voorhees, E. M. 1994. Query expansion using lexical-semantic relations. In SIGIR '94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Springer Verlag, Berlin, Germany. 61--69.]] Google Scholar
Wen, J.-R., Nie, J.-Y., and Zhang, H.-J. 2002. Query clustering using user logs. ACM Trans. Inf. Syst. 20, 1, 59--81.]] Google Scholar
Yang, Y. 1999. An evaluation of statistical approaches to text categorization. Inf. Retr. 1, 1--2, 69--90.]] Google Scholar
Yang, Y. and Pedersen, J. O. 1997. A comparative study on feature selection in text categorization. In ICML '97: Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA, 412--420.]] Google Scholar

Index Terms

Query enrichment for web-query classification
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining

Recommendations

Automatic web query classification using labeled and unlabeled training data
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Accurate topical categorization of user queries allows for increased effectiveness, efficiency, and revenue potential in general-purpose web search systems. Such categorization becomes critical if the system is to return results not just from a general ...
Read More
Context-aware query classification
SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Understanding users'search intent expressed through their search queries is crucial to Web search and online advertisement. Web query classification (QC) has been widely studied for this purpose. Most previous QC algorithms classify individual queries ...
Read More
Q²C@UST: our winning solution to query classification in KDDCUP 2005

In this paper, we describe our ensemble-search based approach, Q²C@UST (http://webprojectl.cs.ust.hk/q2c/), for the query classification task for the KDDCUP 2005. There are two aspects to the key difficulties of this problem: one is that the meaning of ...
Read More

Reviews

Reviewer: George Pallis

Web query classification is a rather interesting area of research, since Web users' queries are typically short, noisy, and ambiguous. This paper presents a new approach to classifying Web search queries, made up of two phases. First, the classifiers are learned by including synonym-based and statistics-based techniques. Then, the queries are classified into one or more categories, based on the classifiers that were previously learned. The proposed approach is quite detailed, and is supported by a wide range of examples, which help in understanding the underlying issues. Furthermore, the paper is quite strong in its experimental presentation, which shows that the proposed approach works well in practice, and has proven useful for classifying Web queries. As a computer science paper, however, this work is lacking in coverage of computational aspects. Although the authors provide a running time analysis, it would be useful to see a further discussion of the complexity of the tasks included in their approach. To sum up, this is an interesting, well-written, and easy-to-follow paper; readers will not need to be tech-savvy to understand its contents. This work is well worth reading by anyone interested in the area of information retrieval. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Information Systems Volume 24, Issue 3
July 2006
110 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/1165774
Issue’s Table of Contents

Copyright © 2006 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 July 2006
Published in tois Volume 24, Issue 3

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
KDDCUP2005
Query classification
ensemble learning
query enrichment
synonym-based classifier
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 84
  Total Citations
  View Citations
- 2,034
  Total Downloads
- Downloads (Last 12 months)22
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Query enrichment for web-query classification

ACM Transactions on Information Systems

Abstract

References

Cited By

Index Terms

Recommendations

Automatic web query classification using labeled and unlabeled training data

Context-aware query classification

Q²C@UST: our winning solution to query classification in KDDCUP 2005

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Query enrichment for web-query classification

ACM Transactions on Information Systems

Abstract

References

Cited By

Index Terms

Recommendations

Automatic web query classification using labeled and unlabeled training data

Context-aware query classification

Q2C@UST: our winning solution to query classification in KDDCUP 2005

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media

Q²C@UST: our winning solution to query classification in KDDCUP 2005