Abstract
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies based on gene expression profiling. Many studies have aggregated binary classifiers into a multiclass classifier using one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, and several studies have compared these strategies. However, those comparisons found that the best coding depends on the situation. This raises a new problem, which we call the “optimal coding problem”: how can we determine which coding is optimal in a given situation? To approach this problem, we propose a novel framework for constructing a multiclass classifier in which each binary classifier to be aggregated has a weight that is optimally tuned on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can provide a consistent answer to it. We apply this method to various classification problems, including a synthesized data set and several cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
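The idea of attaching a tunable weight to each binary classifier can be illustrated with a small sketch. It assumes a coding matrix M whose rows are classes and whose columns are binary classifiers (+1 and -1 mark the two sides of a split, 0 marks a class the classifier ignores), that each binary classifier returns a real-valued output f_l(x) in [-1, 1], and that class k is scored as s_k(x) = sum_l w_l M[k][l] f_l(x). The abstract does not state how the weights are tuned; here they are fitted by gradient descent on a softmax log-loss, which is an illustrative assumption, not the paper's procedure. All function names and the toy outputs are hypothetical.

```python
import itertools
import math


def one_vs_rest_code(n_classes):
    """1R coding matrix M (n_classes x n_classes): column l is +1 for class l, -1 otherwise."""
    return [[1 if k == l else -1 for l in range(n_classes)] for k in range(n_classes)]


def one_vs_one_code(n_classes):
    """11 coding matrix: one column per class pair (i, j); +1 for i, -1 for j, 0 = unused."""
    pairs = list(itertools.combinations(range(n_classes), 2))
    return [[1 if k == i else -1 if k == j else 0 for (i, j) in pairs]
            for k in range(n_classes)]


def class_scores(code, weights, outputs):
    """Weighted vote s_k(x) = sum_l w_l * M[k][l] * f_l(x); all w_l = 1 is plain voting."""
    return [sum(w * m * f for w, m, f in zip(weights, row, outputs)) for row in code]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(code, weights, outputs_list, labels):
    """Average negative log-likelihood of P(y = k | x) = softmax(s(x))_k (assumed model)."""
    return -sum(math.log(softmax(class_scores(code, weights, f))[y])
                for f, y in zip(outputs_list, labels)) / len(labels)


def fit_weights(code, outputs_list, labels, lr=0.1, epochs=200):
    """Tune one weight per binary classifier by batch gradient descent on mean_nll."""
    n_cols = len(code[0])
    weights = [1.0] * n_cols  # start from unweighted voting
    for _ in range(epochs):
        grad = [0.0] * n_cols
        for outputs, y in zip(outputs_list, labels):
            probs = softmax(class_scores(code, weights, outputs))
            for l in range(n_cols):
                # d(-log p_y)/d w_l = f_l * (sum_k p_k M[k][l] - M[y][l])
                expected = sum(p * row[l] for p, row in zip(probs, code))
                grad[l] += outputs[l] * (expected - code[y][l])
        weights = [w - lr * g / len(labels) for w, g in zip(weights, grad)]
    return weights


def predict(code, weights, outputs):
    """Return the class index with the highest weighted score."""
    scores = class_scores(code, weights, outputs)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example with hypothetical binary outputs: the third 1R classifier always
# outputs 0.9, so it carries no information, yet it misleads unweighted voting.
code = one_vs_rest_code(3)
X = [[0.6, -0.6, 0.9], [-0.6, 0.6, 0.9], [-0.6, -0.6, 0.9]]
y = [0, 1, 2]
unit = [1.0, 1.0, 1.0]
w = fit_weights(code, X, y)
print([predict(code, unit, f) for f in X])  # [2, 2, 2] with plain voting
print([predict(code, w, f) for f in X])     # [0, 1, 2] after weight tuning
```

In this toy case, plain voting assigns every sample to class 2, while the fitted weights shrink the uninformative third classifier and recover all three labels. The same functions accept `one_vs_one_code(n)` or any other ±1/0 coding matrix, which is the sense in which weight tuning lets the data, rather than a fixed heuristic, decide how much each binary classifier counts.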
Index Terms
- Optimal Aggregation of Binary Classifiers for Multiclass Cancer Diagnosis Using Gene Expression Profiles