skip to main content
article

Laplacian Linear Discriminant Analysis Approach to Unsupervised Feature Selection

Published:01 October 2009Publication History
Skip Abstract Section

Abstract

Until recently, numerous feature selection techniques have been proposed and found wide applications in genomics and proteomics. For instance, feature/gene selection has proven to be useful for biomarker discovery from microarray and mass spectrometry data. While supervised feature selection has been explored extensively, there are only a few unsupervised methods that can be applied to exploratory data analysis. In this paper, we address the problem of unsupervised feature selection. First, we extend Laplacian linear discriminant analysis (LLDA) to unsupervised cases. Second, we propose a novel algorithm for computing LLDA, which is efficient in the case of high dimensionality and small sample size as in microarray data. Finally, an unsupervised feature selection method, called LLDA-based Recursive Feature Elimination (LLDA-RFE), is proposed. We apply LLDA-RFE to several public data sets of cancer microarrays and compare its performance with those of Laplacian score and SVD-entropy, two state-of-the-art unsupervised methods, and with that of Fisher score, a supervised filter method. Our results demonstrate that LLDA-RFE outperforms Laplacian score and shows favorable performance against SVD-entropy. It performs even better than Fisher score for some of the data sets, despite the fact that LLDA-RFE is fully unsupervised.

References

  1. U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine, "Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays," Proc. Nat'l Academy of Sciences USA, vol. 96, pp. 6745-6750, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  2. O. Alter, P.O. Brown, and D. Botstein, "Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling," Proc. Nat'l Academy of Sciences USA, vol. 97, pp. 10101-10106, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  3. S.A. Armstrong, J.E. Staunton, L.B. Silverman, R. Pieters, M.L. den Boer, M.D. Minden, S.E. Sallan, E.S. Lander, T.R. Golub, and S.J. Korsmeyer, "MLL Translocations Specify a Distinct Gene Expression Profile That Distinguishes a Unique Leukemia," Nature Genetics, vol. 30, pp. 41-47, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  4. D.G. Beer, S.L.R. Kardia, C.-C. Huang, T.J. Giordano, A.M. Levin, D.E. Misek, L. Lin, G. Chen, T.G. Gharib, D.G. Thomas, M.L. Lizyness, R. Kuick, S. Hayasaka, J.M.G. Taylor, M.D. Iannettoni, M.B. Orringer, and S. Hanash, "Gene-Expression Profiles Predict Survival of Patients with Lung Adenocarcinoma," Nature Medicine, vol. 8, no. 8, pp. 816-824, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  5. D. Cai, X. He, and J. Han, "Document Clustering Using Locality Preserving Indexing," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 12, pp. 1624-1637, Dec. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. F.R.K. Chung, Spectral Graph Theory, Regional Conf. Series in Math., no. 92, Am. Math. Soc., 1997.Google ScholarGoogle Scholar
  7. C.H.Q. Ding, "Unsupervised Feature Selection via Two-Way Ordering in Gene Expression Analysis," Bioinformatics, vol. 19, no. 10, pp. 1259-1266, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  8. S. Dudoit, J. Fridlyand, and T. Speed, "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," J. Am. Statistical Assoc., vol. 97, pp. 77-87, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  9. K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring," Science, vol. 286, pp. 531-537, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  11. G.H. Golub and C.F. Van Loan, Matrix Computations, third ed. Johns Hopkins Univ. Press, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," J. Machine Learning Research, vol. 3, pp. 1157-1182, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene Selection for Cancer Classification Using Support Vector Machines," Machine Learning, vol. 46, pp. 389-422, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. Hastie, R. Tibshirani, M.B. Eisen, A. Alizadeh, R. Levy, L. Staudt, W.C. Chan, D. Botstein, and P.O. Brown, "'Gene Shaving' as a Method for Identifying Distinct Sets of Genes with Similar Expression Patterns," Genome Biology, vol. 1, no. 2, research0003, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  15. X. He, D. Cai, and P. Niyogi, "Laplacian Score for Feature Selection," Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, and J. Platt, eds., pp. 507-514, MIT Press, 2006.Google ScholarGoogle Scholar
  16. X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, "Face Recognition Using Laplacianfaces," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 328-340, Mar. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Khan, J.S. Wei, M. Ringné r, L.H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. Antonescu, C. Peterson, and P.S. Meltzer, "Classification and Diagnostic Prediction of Cancers Using Gene Expression Profiling and Artificial Neural Networks," Nature Medicine, vol. 7, no. 6, pp. 673-679, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  18. H. Li, T. Jiang, and K. Zhang, "Efficient and Robust Feature Extraction by Maximum Margin Criterion," IEEE Trans. Neural Networks, vol. 17, no. 1, pp. 157-165, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. F. Li and Y. Yang, "Analysis of Recursive Gene Selection Approaches from Microarray Data," Bioinformatics, vol. 21, no. 19, pp. 3741-3747, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Q. Liu, X. Tang, H. Lu, and S. Ma, "Face Recognition Using Kernel Scatter-Difference-Based Discriminant Analysis," IEEE Trans. Neural Networks, vol. 17, no. 4, pp. 1081-1085, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Loog, "On an Alternative Formulation of the Fisher Criterion That Overcomes the Small Sample Problem," Pattern Recognition, vol. 40, pp. 1753-1755, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Michiels, S. Koscielny, and C. Hill, "Prediction of Cancer Outcome with Microarrays: A Multiple Random Validation Strategy," Lancet, vol. 365, pp. 488-492, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  23. A.Y. Ng, M.I. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," Advances in Neural Information Processing Systems 14, T. Dietterich, S. Becker, and Z. Ghahramani, eds., pp. 849-856, MIT Press, 2002.Google ScholarGoogle Scholar
  24. S. Niijima and S. Kuhara, "Recursive Gene Selection Based on Maximum Margin Criterion: A Comparison with SVM-RFE," BMC Bioinformatics, vol. 7, 543, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  25. S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y.H Kim, L.C. Goumnerova, P.M. Black, C. Lau, J.C. Allen, D. Zagzag, J.M. Olson, T. Curran, C. Wetmore, J.A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D.N. Louis, J.P. Mesirov, E.S. Lander, and T.R. Golub, "Prediction of Central Nervous System Embryonal Tumor Outcome Based on Gene Expression," Nature, vol. 415, pp. 436-442, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  26. H. Tang, T. Fang, and P.-F. Shi, "Laplacian Linear Discriminant Analysis," Pattern Recognition, vol. 39, pp. 136-139, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. L.J. van 't Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend, "Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer," Nature, vol. 415, pp. 530-536, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  28. R. Varshavsky, A. Gottlieb, M. Linial, and D. Horn, "Novel Unsupervised Feature Filtering of Biological Data," Bioinformatics, vol. 22, no. 14, pp. e507-e513, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. L.F.A. Wessels, M.J.T. Reinders, A.A.M. Hart, C.J. Veenman, H. Dai, Y.D. He, and L.J. van't Veer, "A Protocol for Building and Evaluating Predictors of Disease State Based on Microarray Data," Bioinformatics, vol. 21, no. 19, pp. 3755-3762, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. L. Wolf and A. Shashua, "Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach," J. Machine Learning Research, vol. 6, pp. 1855- 1887, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph Embedding and Extensions: A General Framework for Dimensionality Reduction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40-51, Jan. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. Ye, "Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems," J. Machine Learning Research, vol. 6, pp. 483-502, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Ye, T. Li, T. Xiong, and R. Janardan, "Using Uncorrelated Discriminant Analysis for Tissue Classification with Gene Expression Data," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 1, no. 4, pp. 181-190, Oct.-Dec. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Zhu and T. Hastie, "Classification of Gene Microarrays by Penalized Logistic Regression," Biostatistics, vol. 5, no. 3, pp. 427- 443, 2004.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Laplacian Linear Discriminant Analysis Approach to Unsupervised Feature Selection

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader