ABSTRACT
The use of penalized logistic regression for cancer classification from microarray expression data is presented. Penalized logistic regression is combined with each of two dimension reduction methods in turn, which improves both classification accuracy and computational speed. Two other machine-learning methods, support vector machines and least-squares regression, are used for comparison. Our methods achieve results that are equal to or better than those of the comparison methods. They also have the advantage that the output probability is given explicitly and the regression coefficients are easier to interpret. Several other aspects pertinent to applying our methods to cancer classification, such as the selection of penalty parameters and components, are also discussed.
CITED BY
Arthur Tenenhaus, Alain Giron, Emmanuel Viennet, Michel Béra, Gilbert Saporta, Bernard Fertil, Kernel logistic PLS: A tool for supervised nonlinear dimensionality reduction and binary classification, Computational Statistics & Data Analysis, v.51 n.9, p.4083-4100, May 2007