ABSTRACT
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine similarity), because two short text snippets often have few, if any, terms in common. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) that leverages web search results to provide greater context for the short texts. In this paper, we define such a similarity kernel function, mathematically analyze some of its properties, and provide examples of its efficacy. We also show how this kernel function is used in a large-scale system for suggesting related queries to search engine users.
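The abstract's central contrast, that term-overlap measures such as cosine give zero similarity for short texts sharing no terms while search-result context can still relate them, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical `search` function (stubbed with a fixed dictionary of made-up snippets, `HYPOTHETICAL_SNIPPETS`) in place of a real web search engine, uses raw term frequencies rather than any particular weighting scheme, and defines the kernel as the dot product of unit-length centroids of the snippet vectors. The helper names `expand`, `web_kernel`, and `suggest` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical stand-in for a web search engine: each query maps to a few
# made-up result snippets. A real system would call a search API here.
HYPOTHETICAL_SNIPPETS = {
    "car": [
        "a car is a wheeled motor vehicle",
        "buy a used car or a new vehicle",
    ],
    "automobile": [
        "an automobile is a motor vehicle",
        "automobile insurance for your vehicle",
    ],
    "banana": [
        "banana bread recipe",
        "ripe yellow banana fruit",
    ],
}


def search(query):
    """Return result snippets for a query (empty list if none are known)."""
    return HYPOTHETICAL_SNIPPETS.get(query, [])


def tf_vector(text):
    """Raw term-frequency vector from lowercase whitespace tokenization."""
    return Counter(text.lower().split())


def l2_normalize(vec):
    """Scale a sparse vector to unit Euclidean length (empty stays empty)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cosine(u, v):
    """Plain cosine similarity: zero whenever u and v share no terms."""
    return dot(l2_normalize(u), l2_normalize(v))


def expand(query, search_fn):
    """Context vector: unit-length centroid of the normalized snippet vectors."""
    snippets = search_fn(query)
    centroid = Counter()
    for snippet in snippets:
        for term, weight in l2_normalize(tf_vector(snippet)).items():
            centroid[term] += weight / len(snippets)
    return l2_normalize(centroid)


def web_kernel(x, y, search_fn):
    """Similarity of two short texts via their search-result context vectors."""
    return dot(expand(x, search_fn), expand(y, search_fn))


def suggest(query, candidates, search_fn, k=3):
    """Rank candidate queries by kernel score and keep the top k."""
    ranked = sorted(candidates, key=lambda c: web_kernel(query, c, search_fn),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print("cosine:", cosine(tf_vector("car"), tf_vector("automobile")))
    print("kernel:", web_kernel("car", "automobile", search))
    print("suggest:", suggest("car", ["banana", "automobile"], search, k=1))
```

Here "car" and "automobile" share no terms, so their direct cosine is 0, yet their expanded context vectors overlap (e.g., on "motor" and "vehicle") and the kernel score is positive, while "banana" scores 0 and is ranked below "automobile" by `suggest`. In this toy version part of the overlap also comes from common words such as "a" and "is"; a practical variant would presumably weight terms (for example with TF-IDF) to discount them.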