ABSTRACT
Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling. It allows thousands of transcripts to be analyzed simultaneously without prior knowledge of their structure or function. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data, and among pattern discovery methods, clustering techniques have received particular attention. However, because of the statistical nature of SAGE data (i.e., its underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. This paper presents two new clustering algorithms for SAGE data analysis, PoissonS and PoissonHC, which adapt and improve Self-Organizing Maps (SOMs) and hierarchical clustering, respectively. Tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating the statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid neuro-hierarchical approach that combines them, offer significant improvements in pattern discovery and visualization for SAGE data. A user-friendly platform that may improve and accelerate SAGE data mining was also implemented. The system is freely available on request from the authors for nonprofit use.
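The abstract does not spell out how PoissonS and PoissonHC incorporate the statistical properties of SAGE data. As an illustration only, the sketch below assumes that tag counts follow a Poisson model and uses a chi-square statistic as the dissimilarity between a tag's counts and a cluster profile. The functions `poisson_chi2_distance` and `nearest_profile` are hypothetical names, and this is not the authors' implementation. The sketch also ignores differences in library size.

```python
import math
from typing import Sequence


def poisson_chi2_distance(counts: Sequence[float], profile: Sequence[float]) -> float:
    """Chi-square dissimilarity between one tag's counts and a cluster profile.

    Assumed model (illustrative only): counts[t] ~ Poisson(lam * theta[t]),
    where theta is the profile rescaled to sum to 1 and lam is estimated
    by the tag's total count, its maximum-likelihood estimate under this model.
    Library-size differences are ignored in this sketch.
    """
    if len(counts) != len(profile):
        raise ValueError("counts and profile must have the same length")
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must have a positive sum")
    lam = sum(counts)  # MLE of the tag's overall abundance given the profile
    dist = 0.0
    for y, p in zip(counts, profile):
        expected = lam * (p / total)  # expected count in this library
        if expected > 0:
            # Poisson variance equals the mean, so scale the squared residual by it
            dist += (y - expected) ** 2 / expected
        elif y > 0:
            return math.inf  # a positive count where the model expects none
    return dist


def nearest_profile(counts: Sequence[float], profiles: Sequence[Sequence[float]]) -> int:
    """Index of the profile with the smallest chi-square distance to counts."""
    return min(range(len(profiles)),
               key=lambda k: poisson_chi2_distance(counts, profiles[k]))


# Hypothetical example: a tag rising across three libraries matches the rising profile.
print(nearest_profile([5, 10, 15], [[1, 1, 1], [1, 2, 3]]))
```

Unlike a Euclidean distance, this measure scales each squared residual by its expected count. Under the Poisson assumption, variance grows with the mean, so a deviation of a few tags counts for more in a rare transcript than in an abundant one.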