Optimal Aggregation of Binary Classifiers for Multiclass Cancer Diagnosis Using Gene Expression Profiles

Published: 01 April 2009

Abstract

Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the “optimal coding problem,” has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
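As a rough illustration of what aggregating weighted binary classifiers can look like, the sketch below decodes a combined one-versus-the-rest and one-versus-one code matrix with one weight per binary classifier. It is a generic weighted output-code decoder written under stated assumptions, not the weight tuning method proposed in the paper: the function names, the example classifier outputs, and the hand-picked weights are all hypothetical, and the weights here are fixed by hand rather than tuned from observed data.

```python
from itertools import combinations


def one_vs_rest_code(k):
    """Code matrix for 1R: rows are classes, columns are binary problems.
    Entry is +1 if the class is the target of that problem, else -1."""
    return [[1 if c == j else -1 for j in range(k)] for c in range(k)]


def one_vs_one_code(k):
    """Code matrix for 11: one column per class pair (a, b).
    Entry is +1 for class a, -1 for class b, 0 for classes not involved."""
    pairs = list(combinations(range(k), 2))
    return [[1 if c == a else -1 if c == b else 0 for (a, b) in pairs]
            for c in range(k)]


def weighted_decode(code, outputs, weights):
    """Score each class by the weighted agreement between its code row and
    the binary classifier outputs, then return (best class, all scores)."""
    scores = [sum(w * m * f for w, m, f in zip(weights, row, outputs))
              for row in code]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


k = 3
# Concatenate the 1R and 11 columns so both codings are candidates.
code = [r1 + r2 for r1, r2 in zip(one_vs_rest_code(k), one_vs_one_code(k))]

# Hypothetical real-valued outputs of the six binary classifiers, in column
# order: 0-vs-rest, 1-vs-rest, 2-vs-rest, 0-vs-1, 0-vs-2, 1-vs-2.
outputs = [0.2, 0.3, -0.9, -0.4, 0.8, 0.7]

# Uniform weights correspond to simple unweighted voting over all columns.
uniform_pred, uniform_scores = weighted_decode(code, outputs, [1.0] * 6)

# Hand-picked (assumed) weights that trust only two of the classifiers.
picked_pred, picked_scores = weighted_decode(
    code, outputs, [1.0, 0.0, 0.0, 0.0, 1.0, 0.0])

print(uniform_pred, picked_pred)
```

With these made-up outputs, the two weight vectors select different classes, which is the point of the sketch: the choice of weights effectively chooses the coding. Setting the weights of all one-versus-one columns to zero recovers plain 1R decoding, and vice versa.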
