skip to main content
article

Query enrichment for web-query classification

Authors Info & Claims
Published:01 July 2006Publication History
Skip Abstract Section

Abstract

Web-search queries are typically short and ambiguous. To classify these queries into certain target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (precision, F1 measure, which combines precision and recall, and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this article, we show that, despite the difficulty of an abundance of ambiguous queries and lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best result for F1 and precision is 42.4% and 44.4%, respectively, which is 9.6% and 24.3% higher than those from the runner-ups, respectively.

References

  1. Alpaydin, E. 2004. Introduction to Machine Learning. MIT Press, Cambridge, MA.]] Google ScholarGoogle Scholar
  2. Bauer, E. and Kohavi, R. 1999. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. J. Mach. Learn. 36, 1--2, 105--139.]] Google ScholarGoogle Scholar
  3. Beeferman, D. and Berger, A. 2000. Agglomerative clustering of a search engine query log. In KDD '00: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, 407--416.]] Google ScholarGoogle Scholar
  4. Beitzel, S. M., Jensen, E. C., Frieder, O., Grossman, D., Lewis, D. D., Chowdhury, A., and Kolcz, A. 2005. Automatic web query classification using labeled and unlabeled training data. In SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 581--582.]] Google ScholarGoogle Scholar
  5. Cann, A. J. 2003. Maths from Scratch for Biologists. John Wiley & Sons, New York, NY.]]Google ScholarGoogle Scholar
  6. Caruana, R., Niculescu-Mizil, A., Crew, G., and Ksikes, A. 2004. Ensemble selection from libraries of models. In ICML '04: Proceedings of the 21st International Conference on Machine Learning. ACM Press, New York, 18.]] Google ScholarGoogle Scholar
  7. Chang, C.-H. and Hsu, C.-C. 1998. Integrating query expansion and conceptual relevance feedback for personalized web information retrieval. In WWW7: Proceedings of the 7th International Conference on World Wide Web 7. Elsevier Science Publishers B. V., Amsterdam, 621--623.]] Google ScholarGoogle Scholar
  8. Chekuri, C., Goldwasser, M., Raghavan, P., and Upfal, E. 1997. Web search using automated classification. 6th International World Wide Web Conference (WWW6). Poster presentation.]]Google ScholarGoogle Scholar
  9. Chen, H. and Dumais, S. 2000. Bringing order to the web: Automatically categorizing search results. In CHI '00: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press, New York, 145--152.]] Google ScholarGoogle Scholar
  10. Dietterich, T. G. 2000. Ensemble methods in machine learning. In Multiple Classifier Systems. 1--15.]] Google ScholarGoogle Scholar
  11. Fan, W., Stolfo, S. J., and Zhang, J. 1999. The application of adaboost for distributed, scalable and on-line learning. In KDD '99: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, 362--366.]] Google ScholarGoogle Scholar
  12. Hansen, L. K. and Salamon, P. 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, 10, 993--1001.]] Google ScholarGoogle Scholar
  13. HITEC. http://categorizer.tmit.bmr.hu.]]Google ScholarGoogle Scholar
  14. Hoel, P. G. 1966. Elementary Statistics, 2nd ed. Wiley, New York, NY.]]Google ScholarGoogle Scholar
  15. Howe, A. E. and Dreilinger, D. 1997. SAVVYSEARCH: A metasearch engine that learns which search engines to query. AI Mag. 18, 2, 19--25.]]Google ScholarGoogle Scholar
  16. Jansen, B. J. 2000. The effect of query complexity on web searching results. Inf. Res. 6, 1.]]Google ScholarGoogle Scholar
  17. Joachims, T. 1998. Text categorization with suport vector machines: Learning with many relevant features. In ECML'98 Proceedings of the 10th European Conference on Machine Learning, 137--142.]] Google ScholarGoogle Scholar
  18. Joachims, T. 1999. Transductive inference for text classification using support vector machines. In ICML '99: Proceedings of the 16th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, 200--209.]] Google ScholarGoogle Scholar
  19. Jones, K. S. 1971. Automatic Keyword Classifications for Information Retrieval. Butterworth, London, UK.]]Google ScholarGoogle Scholar
  20. Kang, I.-H. and Kim, G. 2003. Query type classification for web document retrieval. In SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval. ACM Press, New York, 64--71.]] Google ScholarGoogle Scholar
  21. Kardkovács, Z. T., Tikk, D., and Bánsághi, Z. 2005. The ferrety algorithm for the kdd cup 2005 problem. SIGKDD Explor. Newsl. 7, 2, 111--116.]] Google ScholarGoogle Scholar
  22. Kittler, J., Hatef, M., Duin, R. P. W., and Matas, J. 1998. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20, 3, 226--239.]] Google ScholarGoogle Scholar
  23. Lewis, D. D. and Gale, W. A. 1994. A sequential algorithm for training text classifiers. In SIGIR, W. B. Croft and C. J. van Rijsbergen, Eds. Springer Verlag, Berlin, Germany, 3--12.]] Google ScholarGoogle Scholar
  24. Li, Y., Zheng, Z., and Dai, H. K. 2005. Kdd cup-2005 report: Facing a great challenge. SIGKDD Explor. Newsl. 7, 2, 91--99.]] Google ScholarGoogle Scholar
  25. McCallum, A. and Nigam, K. 1998. A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization.]]Google ScholarGoogle Scholar
  26. Meyer, D. A. and Brown, T. A. 1998. Statistical mechanics of voting. Phys. Revei. Lett. 81, 8, 1718--1721.]]Google ScholarGoogle Scholar
  27. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. 1990. Introduction to wordnet: An on-line lexical database. Int. J. Lexicography 3, 4, 23--244.]]Google ScholarGoogle Scholar
  28. Page, L., Brin, S., Motwani, R., and Winograd, T. 1998. The pagerank citation ranking: Bringing order to the web. Tech. rep., Stanford Digital Library Technologies Project.]]Google ScholarGoogle Scholar
  29. Selberg, E. and Etzioni, O. 1995. Multi-service search and comparison using the MetaCrawler. In Proceedings of the 4th International World-Wide web Conference. Darmstadt, Germany.]]Google ScholarGoogle Scholar
  30. Shen, D., Pan, R., Sun, J.-T., Pan, J. J., Wu, K., Yin, J., and Yang, Q. 2005. Q2c@ust: Our winning solution to query classification in kddcup 2005. SIGKDD Explor. Newsl. 7, 2, 100--110.]] Google ScholarGoogle Scholar
  31. Shen, D., Sun, J.-T., Yang, Q., and Chen, Z. 2006. Building bridges for web query classification. In SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.]] Google ScholarGoogle Scholar
  32. Silverstein, C., Marais, H., Henzinger, M., and Moricz, M. 1999. Analysis of a very large web search engine query log. SIGIR Forum 33, 1, 6--12.]] Google ScholarGoogle Scholar
  33. Tikk, D., Biró, Gy., and Yang, J. D. 2005. Experiments with a hierarchial text categorization method on WIPO patent collections. In Applied Research in Uncertainty Modelling and Analysis, N. O. Attok-Okine and B. M. Ayyub, Eds. International Series in Intelligent Technologies, vol. 20, Springer-Verlag, 283--302.]]Google ScholarGoogle Scholar
  34. Van, R. C. 1979. Information Retrieval, 2nd ed. Butterworth, London, UK.]]Google ScholarGoogle Scholar
  35. Vogel, D., Bickel, S., Haider, P., Schimpfky, R., Siemen, P., Bridges, S., and Scheffer, T. 2005. Classifying search engine queries using the web as background knowledge. SIGKDD Explor. Newsl. 7, 2, 117--122.]] Google ScholarGoogle Scholar
  36. Voorhees, E. M. 1994. Query expansion using lexical-semantic relations. In SIGIR '94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Springer Verlag, Berlin, Germany. 61--69.]] Google ScholarGoogle Scholar
  37. Wen, J.-R., Nie, J.-Y., and Zhang, H.-J. 2002. Query clustering using user logs. ACM Trans. Inf. Syst. 20, 1, 59--81.]] Google ScholarGoogle Scholar
  38. Yang, Y. 1999. An evaluation of statistical approaches to text categorization. Inf. Retr. 1, 1--2, 69--90.]] Google ScholarGoogle Scholar
  39. Yang, Y. and Pedersen, J. O. 1997. A comparative study on feature selection in text categorization. In ICML '97: Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA, 412--420.]] Google ScholarGoogle Scholar

Index Terms

  1. Query enrichment for web-query classification

          Recommendations

          Reviews

          George Pallis

          Web query classification is a rather interesting area of research, since Web users' queries are typically short, noisy, and ambiguous. This paper presents a new approach to classifying Web search queries, made up of two phases. First, the classifiers are learned by including synonym-based and statistics-based techniques. Then, the queries are classified into one or more categories, based on the classifiers that were previously learned. The proposed approach is quite detailed, and is supported by a wide range of examples, which help in understanding the underlying issues. Furthermore, the paper is quite strong in its experimental presentation, which shows that the proposed approach works well in practice, and has proven useful for classifying Web queries. As a computer science paper, however, this work is lacking in coverage of computational aspects. Although the authors provide a running time analysis, it would be useful to see a further discussion of the complexity of the tasks included in their approach. To sum up, this is an interesting, well-written, and easy-to-follow paper; readers will not need to be tech-savvy to understand its contents. This work is well worth reading by anyone interested in the area of information retrieval. Online Computing Reviews Service

          Access critical reviews of Computing literature here

          Become a reviewer for Computing Reviews.

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader