poster

CloudSpeller: query spelling correction by using a unified hidden markov model with web-scale resources

Authors:
Yanen Li, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Huizhong Duan, University of Illinois at Urbana-Champaign, Urbana, IL, USA
ChengXiang Zhai, University of Illinois at Urbana-Champaign, Urbana, IL, USA

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web, April 2012, Pages 561–562
https://doi.org/10.1145/2187980.2188130

Published: 16 April 2012

ABSTRACT

Query spelling correction is an important component of modern search engines: it helps users express an information need more accurately and thus improves search quality. In this work we propose and implement an end-to-end query spelling correction system, CloudSpeller. CloudSpeller uses a Hidden Markov Model to model the major types of spelling errors in a unified framework that integrates a large-scale lexicon constructed from Wikipedia, an error model trained on high-confidence correction pairs, and the Microsoft Web N-gram service. The system reaches F1 scores of 0.960 on the TREC dataset and 0.937 on the MSN dataset, two search query spelling correction datasets.
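
To make the HMM formulation concrete, the following is a minimal sketch of Viterbi decoding for query correction. It is not the authors' implementation. Everything in it is an assumption made for illustration: the tiny `LEXICON` stands in for the Wikipedia-derived lexicon, the hand-set `BIGRAM_LOGP` table stands in for the Web N-gram service, and `emission_logp` is a crude edit-distance penalty rather than an error model trained on correction pairs. The sketch maps each query token to one corrected word, so it does not cover errors that split or merge words.

```python
import math

# Hypothetical toy resources (assumptions, not the paper's data).
LEXICON = ["britney", "spears", "brittany", "spares", "music"]
BIGRAM_LOGP = {
    ("<s>", "britney"): math.log(0.6),
    ("<s>", "brittany"): math.log(0.3),
    ("britney", "spears"): math.log(0.8),
    ("brittany", "spears"): math.log(0.1),
    ("britney", "spares"): math.log(0.01),
}
UNSEEN_LOGP = math.log(1e-4)  # back-off score for bigrams missing from the table


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def emission_logp(observed, intended, per_edit_logp=math.log(0.05)):
    """Error model stand-in: each edit costs a fixed log-probability."""
    return edit_distance(observed, intended) * per_edit_logp


def transition_logp(prev_word, word):
    """Language model stand-in: bigram lookup with a flat back-off."""
    return BIGRAM_LOGP.get((prev_word, word), UNSEEN_LOGP)


def candidates(token, max_dist=2):
    """Hidden states for one token: lexicon words within max_dist edits."""
    found = [w for w in LEXICON if edit_distance(token, w) <= max_dist]
    return found or [token]  # keep unknown tokens unchanged


def correct(query):
    """Viterbi search for the most probable corrected word sequence."""
    tokens = query.lower().split()
    if not tokens:
        return ""
    best = {"<s>": (0.0, [])}  # state -> (log score, best path so far)
    for tok in tokens:
        nxt = {}
        for cand in candidates(tok):
            emit = emission_logp(tok, cand)
            score, path = max(
                ((s + transition_logp(p, cand) + emit, pth)
                 for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            nxt[cand] = (score, path + [cand])
        best = nxt
    return " ".join(max(best.values(), key=lambda item: item[0])[1])


print(correct("britny spers"))
```

In this toy setup the hidden states are the intended words, the observations are the typed tokens, and the decoder trades off how plausible a word sequence is (transition score) against how far each typed token is from its candidate (emission score).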


Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web
April 2012
1250 pages
ISBN: 9781450312301
DOI: 10.1145/2187980
General Chairs: Alain Mille (Université de Lyon, France), Fabien Gandon (INRIA, France), Jacques Misselis (HP, France)
Program Chairs: Michael Rabinovich (Case Western Reserve University, USA), Steffen Staab (University of Koblenz-Landau, Germany)
Copyright © 2012 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cloudspeller
query spelling correction
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%