DOI:10.1145/1076034.1076080

Multi-label informed latent semantic indexing

Published: 15 August 2005

Abstract

Latent semantic indexing (LSI) is a well-known unsupervised approach for dimensionality reduction in information retrieval. However, if output information (i.e., category labels) is available, it is often beneficial to derive the indexing from both the inputs and the target values in the training data set. This is particularly important in applications with multiple labels, in which each document can belong to several categories simultaneously. In this paper we introduce the multi-label informed latent semantic indexing (MLSI) algorithm, which preserves the information of the inputs while capturing the correlations between the multiple outputs. The recovered "latent semantics" thus incorporate the human-annotated category information and can be used to greatly improve prediction accuracy. An empirical study on two data sets, Reuters-21578 and RCV1, demonstrates very encouraging results.
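The contrast drawn in the abstract, between indexing from the inputs alone and indexing that also uses the labels, can be illustrated with a small NumPy sketch. The abstract does not specify the MLSI algorithm, so this is not the authors' method. `lsi_embed` is plain LSI via truncated SVD. `label_informed_embed` is a hypothetical stand-in that takes the top eigenvectors of a blend of the input and label Gram matrices. The function names, the trade-off parameter `beta`, and the trace normalization are all assumptions made for illustration.

```python
import numpy as np


def lsi_embed(X, k):
    """Plain (unsupervised) LSI on a documents-by-terms matrix X.

    Returns the k-dimensional document coordinates U_k * s_k obtained
    from the truncated singular value decomposition of X.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def label_informed_embed(X, Y, k, beta=0.5):
    """Illustrative label-informed projection (an assumption, not MLSI itself).

    X: documents-by-terms matrix; Y: documents-by-labels indicator matrix.
    The latent document coordinates are the top-k eigenvectors of
    (1 - beta) * Kx + beta * Ky, where Kx and Ky are the input and label
    Gram matrices, each scaled to unit trace so that beta is comparable
    across data sets. beta = 0 spans the same subspace as plain LSI.
    """
    Kx = X @ X.T
    Ky = Y @ Y.T
    Kx = Kx / np.trace(Kx)  # assumes X is not all zeros
    Ky = Ky / np.trace(Ky)  # assumes at least one label is set
    K = (1.0 - beta) * Kx + beta * Ky
    eigvals, eigvecs = np.linalg.eigh(K)    # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return eigvecs[:, top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((6, 4))                   # toy term weights
    Y = np.array([[1, 0], [1, 1], [0, 1],
                  [0, 1], [1, 0], [1, 1]], dtype=float)  # toy multi-labels
    print(lsi_embed(X, 2).shape)
    print(label_informed_embed(X, Y, 2, beta=0.5).shape)
```

The sketch only embeds the training documents. Mapping unseen documents into the latent space would require an explicit projection from term space, which is omitted here.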

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN:1595930345
DOI:10.1145/1076034
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dimensionality reduction
  2. latent semantic indexing
  3. multi-label classification
  4. supervised projection

Qualifiers

  • Article

Conference

SIGIR05
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 3
Reflects downloads up to 17 Feb 2025

Cited By

  • (2024) A Study Regarding Machine Unlearning on Facial Attribute Data. 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-5. DOI: 10.1109/FG59268.2024.10581972. Online publication date: 27-May-2024
  • (2024) Multi-label feature selection via spectral clustering-based label enhancement and manifold distribution consistency. International Journal of Machine Learning and Cybernetics, 15:10, pp. 4669-4693. DOI: 10.1007/s13042-024-02181-9. Online publication date: 9-May-2024
  • (2024) Dynamic multi-label feature selection algorithm based on label importance and label correlation. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-024-02098-3. Online publication date: 13-Mar-2024
  • (2023) Semi-Supervised Multi-Label Dimensionality Reduction Learning by Instance and Label Correlations. Mathematics, 11:3, article 782. DOI: 10.3390/math11030782. Online publication date: 3-Feb-2023
  • (2023) Distributed Online Multi-Label Learning with Privacy Protection in Internet of Things. Applied Sciences, 13:4, article 2713. DOI: 10.3390/app13042713. Online publication date: 20-Feb-2023
  • (2023) A Theoretical Analysis of Out-of-Distribution Detection in Multi-Label Classification. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 275-282. DOI: 10.1145/3578337.3605116. Online publication date: 9-Aug-2023
  • (2023) A Multi-Objective online streaming Multi-Label feature selection using mutual information. Expert Systems with Applications, 216, article 119428. DOI: 10.1016/j.eswa.2022.119428. Online publication date: Apr-2023
  • (2023) Granular ball-based label enhancement for dimensionality reduction in multi-label data. Applied Intelligence, 53:20, pp. 24008-24033. DOI: 10.1007/s10489-023-04771-6. Online publication date: 17-Jul-2023
  • (2022) Robust Multi-Label Relief Feature Selection Based on Fuzzy Margin Co-Optimization. IEEE Transactions on Emerging Topics in Computational Intelligence, 6:2, pp. 387-398. DOI: 10.1109/TETCI.2020.3044679. Online publication date: Apr-2022
  • (2022) Saliency-Based Multilabel Linear Discriminant Analysis. IEEE Transactions on Cybernetics, 52:10, pp. 10200-10213. DOI: 10.1109/TCYB.2021.3069338. Online publication date: Oct-2022
