ABSTRACT
Despite the plethora of gene-expression-based cancer biomarkers in the scientific literature, few make their way to the clinic. Several efforts have been made in the past to predict cancer biomarkers, with very limited success so far. One of the challenges in cancer biology is to predict cancer at an early stage. The success of the various therapies used to treat cancer patients depends on correctly identifying the stage or progression of the cancer. Despite tremendous progress in genomics and proteomics, the performance of stage classification has not improved substantially. Our group recently developed CancerCSP, a server with prediction models for discriminating early-stage and late-stage clear cell renal cancer (ccRCC) samples based on their gene expression profiles. We achieved a maximum accuracy of 72.64% with an ROC value of 0.81, even though we tried state-of-the-art techniques to improve the performance of our models. This raises the question of why the models fail to discriminate early-stage from late-stage ccRCC patients with high accuracy. In this poster, we analyze ccRCC samples obtained from The Cancer Genome Atlas (TCGA) data portal to understand the reasons for the failure of the stage classification models. Firstly, we performed a bin-wise analysis of the top 20 genes that discriminate early-stage and late-stage samples with the highest ROC (single-gene models using an expression threshold). A significant overlap was observed in the expression of each gene between early-stage and late-stage samples. Although the number of early-stage and late-stage samples varied across gene expression bins, the difference was not sufficient to classify both types of samples with high accuracy. As an example, the gene NR3C2 had a maximum ROC of 0.67 at an expression (log RSEM) threshold of 7.61. Nearly 70% of early-stage patients were above this threshold, which made the gene an average expression marker, but nearly 55% of late-stage patients were also above it, which increased the false positives. Secondly, hierarchical clustering of ccRCC samples using 64 gene expression features selected with Weka showed weak concordance with pathological stage. The k-means clustering of patients into four groups produced four separable clusters, but these clusters were not associated with pathological stage. These observations led to the conclusion that molecular parameters do not always comply with histopathological features. The third analysis identified patients who were not predicted correctly by any of the four machine-learning algorithms (SVM, Random Forest, SMO and Naïve Bayes). Many samples were misclassified by all four methods. The false positives and false negatives fell into specific clusters obtained through clustering. This further points to the interspersed nature of the data, which makes it difficult to differentiate between histopathological stages of cancer. We conclude that the gene expression profile alone is not adequate to classify cancer samples into different stages.
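The single-gene threshold analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline. The function names are hypothetical, the expression values are synthetic rather than TCGA data, and two details the abstract does not specify are assumed here: the ROC value is computed as a rank-based AUC, and the threshold is chosen by maximizing balanced accuracy.

```python
from itertools import product


def auc_single_gene(early, late):
    """Rank-based AUC: probability that a random early-stage sample has
    higher expression than a random late-stage sample (ties count half)."""
    wins = 0.0
    for e, l in product(early, late):
        if e > l:
            wins += 1.0
        elif e == l:
            wins += 0.5
    return wins / (len(early) * len(late))


def fraction_above(values, threshold):
    """Fraction of samples whose expression is strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def best_threshold(early, late):
    """Scan every observed value as a candidate threshold for the rule
    'expression above threshold => early stage' and return the pair
    (threshold, balanced accuracy) with the highest balanced accuracy.
    Assumption: the selection criterion is ours, not the source's."""
    best = (None, -1.0)
    for t in sorted(set(early) | set(late)):
        sensitivity = fraction_above(early, t)        # early called early
        specificity = 1.0 - fraction_above(late, t)   # late called late
        balanced = (sensitivity + specificity) / 2
        if balanced > best[1]:
            best = (t, balanced)
    return best


# Synthetic log-expression values (hypothetical, NOT TCGA data), chosen so
# that the two stage groups overlap heavily.
early = [7.2, 7.5, 7.7, 7.9, 8.0, 8.1, 8.3, 8.4, 8.6, 8.8]
late = [6.9, 7.1, 7.4, 7.6, 7.8, 8.0, 8.2, 8.3, 8.5, 8.7]

auc = auc_single_gene(early, late)
threshold, balanced_accuracy = best_threshold(early, late)
print("AUC:", auc)
print("threshold:", threshold, "balanced accuracy:", balanced_accuracy)
print("early above:", fraction_above(early, threshold))
print("late above:", fraction_above(late, threshold))
```

When the two expression distributions overlap this heavily, any threshold that keeps most early-stage samples above it also keeps a large share of late-stage samples above it, which is the false-positive pattern reported for NR3C2.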
Index Terms
Challenges in Prediction of different Cancer Stages using Gene Expression Profile of Cancer Patients