skip to main content
10.1145/1076034.1076039acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article

Orthogonal locality preserving indexing

Published: 15 August 2005 Publication History

Abstract

We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from Latent Semantic Indexing which is optimal in the sense of global Euclidean structure, LPI is optimal in the sense of local manifold structure. However, LPI is extremely sensitive to the number of dimensions. This makes it difficult to estimate the intrinsic dimensionality, while inaccurately estimated dimensionality would drastically degrade its performance. One reason leading to this problem is that LPI is non-orthogonal. Non-orthogonality distorts the metric structure of the document space. In this paper, we propose a new algorithm called Orthogonal LPI. Orthogonal LPI iteratively computes the mutually orthogonal basis functions which respect the local geometrical structure. Moreover, our empirical study shows that OLPI can have more locality preserving power than LPI. We compare the new algorithm to LSI and LPI. Extensive experimental results show that Orthogonal LPI obtains better performance than both LSI and LPI. More crucially, it is insensitive to the number of dimensions, which makes it an efficient data preprocessing method for text clustering, classification, retrieval, etc.

References

[1]
R. Ando. Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement. In Proceedings of ACM SIGIR, 2000.
[2]
R. Ando and L. Lee. Iterative residual rescaling: An analysis and generalization. In Proceedings of ACM SIGIR, 2001.
[3]
B. T. Bartell, G. W. Cottrell, and R. K. Belew. Latent semantic indexing is an optimal special case of multidimensional scaling. In Proceedings of ACM SIGIR, 1992.
[4]
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems 14, 2001.
[5]
M. Belkin, P. Niyogi, and V. Sindhwani. On maniold regularization. Technical report tr-2004-05, Computer Science Department, The University of Chicago, 2004.
[6]
F. R. K. Chung. Spectral Graph Theory, volume 92 of Regional Conference Series in Mathematics. 1997.
[7]
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
[8]
C. H. Ding. A similarity-based probability model for latent semantic indexing. In Proceedings of ACM SIGIR, 1999.
[9]
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, Hoboken, NJ, 2nd edition, 2000.
[10]
G. H. Golub and C. F. V. Loan. Matrix computations. Johns Hopkins University Press, 3rd edition, 1996.
[11]
X. He, D. Cai, H. Liu, and W.-Y. Ma. Locality preserving indexing for document representation. In Proceedings of ACM SIGIR, 2004.
[12]
T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of ACM SIGIR, 1999.
[13]
B. Kegl. Intrinsic dimension estimation using packing numbers. In Advances in Neural Information Processing Systems 15, 2002.
[14]
E. Kokiopoulou and Y. Saad. Polynomial filtering in latent semantic indexing for information retrieval. In Proceedings of ACM SIGIR, 2004.
[15]
L. Lovasz and M. Plummer. Matching Theory. Akadémiai Kiadó, North Holland, Budapest, 1986.
[16]
P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Technical report tr-2004-08, Department of Computer Science, University of Chicago, 2004.
[17]
C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: a probabilistic analysis. In Proc. 17th ACM Symp. Principles of Database Systems, 1998.
[18]
S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323--2326, 2000.
[19]
C. Tang, S. Dwarkadas, and Z. Xu. On scaling latent semantic indexing for large peer-to-peer systems. In Proceedings of ACM SIGIR, 2004.
[20]
J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319--2323, 2000.
[21]
U. von Luxburg, O. Bousquet, and M. Belkin. Limits of spectral clustering. In Advances in Neural Information Processing Systems 17, 2004.
[22]
W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In Proceedings of ACM SIGIR, 2003.

Cited By

View all
  • (2024)Image Classification Using Graph Regularized Independent Constraint Low-Rank RepresentationAdvanced Intelligent Computing Technology and Applications10.1007/978-981-97-5663-6_2(15-24)Online publication date: 1-Aug-2024
  • (2023)Adaptive Graph Embedded Preserving Projection Learning for Feature Extraction and SelectionIEEE Transactions on Systems, Man, and Cybernetics: Systems10.1109/TSMC.2022.319313153:2(1060-1073)Online publication date: Feb-2023
  • (2023)Adaptive affinity matrix learning for dimensionality reductionInternational Journal of Machine Learning and Cybernetics10.1007/s13042-023-01881-y14:12(4063-4077)Online publication date: 21-Jun-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN:1595930345
DOI:10.1145/1076034
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 August 2005

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. dimensionality reduction
  2. document representation and indexing
  3. locality preserving indexing
  4. orthogonal locality preserving indexing
  5. similarity measure
  6. vector space model

Qualifiers

  • Article

Conference

SIGIR05
Sponsor:

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)1
Reflects downloads up to 30 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Image Classification Using Graph Regularized Independent Constraint Low-Rank RepresentationAdvanced Intelligent Computing Technology and Applications10.1007/978-981-97-5663-6_2(15-24)Online publication date: 1-Aug-2024
  • (2023)Adaptive Graph Embedded Preserving Projection Learning for Feature Extraction and SelectionIEEE Transactions on Systems, Man, and Cybernetics: Systems10.1109/TSMC.2022.319313153:2(1060-1073)Online publication date: Feb-2023
  • (2023)Adaptive affinity matrix learning for dimensionality reductionInternational Journal of Machine Learning and Cybernetics10.1007/s13042-023-01881-y14:12(4063-4077)Online publication date: 21-Jun-2023
  • (2023)Robust Subspace Learning with Double Graph EmbeddingPattern Recognition and Computer Vision10.1007/978-981-99-8540-1_11(126-137)Online publication date: 25-Dec-2023
  • (2022)An Evolutionary Orthogonal Component Analysis Method for Incremental Dimensionality ReductionIEEE Transactions on Neural Networks and Learning Systems10.1109/TNNLS.2020.302785233:1(392-405)Online publication date: Jan-2022
  • (2022)A Scalable Algorithm for Large-Scale Unsupervised Multi-View Partial Least SquaresIEEE Transactions on Big Data10.1109/TBDATA.2020.30149378:4(1073-1083)Online publication date: 1-Aug-2022
  • (2022)Weighted Graph Embedded Low-Rank Projection Learning for Feature ExtractionICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)10.1109/ICASSP43922.2022.9746167(1501-1505)Online publication date: 23-May-2022
  • (2022)A class-driven approach to dimension embeddingExpert Systems with Applications: An International Journal10.1016/j.eswa.2022.116650195:COnline publication date: 1-Jun-2022
  • (2021)Robust and accurate prediction of protein–protein interactions by exploiting evolutionary informationScientific Reports10.1038/s41598-021-96265-z11:1Online publication date: 19-Aug-2021
  • (2021)Subspace learning for facial expression recognition: an overview and a new perspectiveAPSIPA Transactions on Signal and Information Processing10.1017/ATSIP.2020.2710Online publication date: 14-Jan-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media