ABSTRACT
Query spelling correction is an important component of modern search engines: it helps users express an information need more accurately and thus improves search quality. In this work we propose and implement an end-to-end spelling correction system, namely CloudSpeller. The CloudSpeller system uses a hidden Markov model to model the major types of spelling errors in a unified framework, in which we integrate a large-scale lexicon constructed from Wikipedia, an error model trained on high-confidence correction pairs, and the Microsoft Web N-gram service. Our system performs well on two search query spelling correction datasets, reaching F1 scores of 0.960 on the TREC dataset and 0.937 on the MSN dataset.
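To make the idea of HMM-based query correction concrete, the following is a minimal sketch and not the CloudSpeller implementation. It assumes a tiny hand-written lexicon, a hypothetical bigram table standing in for the Web N-gram service, and an edit-distance penalty standing in for the trained error model; all names and numbers are illustrative. Hidden states are candidate lexicon words, emissions score how likely the typed token is given the intended word, and Viterbi decoding picks the best word sequence. It handles only in-word errors, not word splitting or merging.

```python
import math

# Hypothetical toy resources (assumptions, not from the paper).
LEXICON = ["new", "york", "times", "time", "now", "yolk"]
BIGRAM_LOGP = {
    ("<s>", "new"): math.log(0.5),
    ("new", "york"): math.log(0.6),
    ("york", "times"): math.log(0.5),
}
DEFAULT_LOGP = math.log(1e-4)  # back-off score for unseen bigrams


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def emission_logp(observed, intended, penalty=3.0):
    """Stand-in error model: each edit costs a fixed log-penalty."""
    return -penalty * edit_distance(observed, intended)


def candidates(token, max_dist=2):
    """Lexicon words within max_dist edits; fall back to the token itself."""
    cands = [w for w in LEXICON if edit_distance(token, w) <= max_dist]
    return cands or [token]


def viterbi_correct(query):
    """Return the highest-scoring word sequence for a whitespace-split query."""
    best = {"<s>": (0.0, [])}  # state -> (log score, best path ending there)
    for tok in query.split():
        nxt = {}
        for cand in candidates(tok):
            emit = emission_logp(tok, cand)
            score, path = max(
                (s + BIGRAM_LOGP.get((prev, cand), DEFAULT_LOGP) + emit, p)
                for prev, (s, p) in best.items()
            )
            nxt[cand] = (score, path + [cand])
        best = nxt
    return " ".join(max(best.values())[1])


print(viterbi_correct("nwe yrok tmes"))
```

In this toy setup the language-model transitions break ties between candidates that are equally close in edit distance (for example "york" versus "yolk"), which is the role a web-scale n-gram model would play in a real system.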
Index Terms
- CloudSpeller: query spelling correction by using a unified hidden markov model with web-scale resources