The large amount of available microarray data provides a great opportunity to identify gene expression profiles (GEPs) in different tissues or disease states. Disease-specific biomarker genes likely share GEPs that are distinct in disease samples as compared with normal samples. The similarity of GEPs may be evaluated by the Pearson correlation coefficient (PCC), and the distinctness of GEPs may be assessed by the Kolmogorov-Smirnov distance (KSD). In this study, we used the PCC and KSD metrics for GEPs to identify disease-specific (cancer-specific) biomarkers. We first analyzed and compared GEPs using microarray datasets for smoking and lung cancer. We found that the number of genes with highly different GEPs between the compared groups was much larger in the smoking dataset than in the lung cancer dataset; this observation was further verified when we compared GEPs in the smoking dataset with those in prostate cancer datasets. Moreover, our Gene Ontology analysis revealed that the top-ranked biomarker candidate genes for prostate cancer were highly enriched in molecular function categories such as 'cytoskeletal protein binding' and biological process categories such as 'muscle contraction'. Finally, we used two genes, ACTC1 (encoding an actin subunit) and HPN (encoding hepsin), to demonstrate the feasibility of diagnosing and monitoring prostate cancer using the expression intensity histograms of marker genes. In summary, our results suggest that this approach may prove promising and powerful for diagnosing and monitoring patients who come to the clinic for screening or evaluation of a disease state, including cancer.
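The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a GEP can be represented as a plain list of expression intensities (for KSD) or as two equal-length numeric vectors (for PCC). The function names and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance.

    This is the largest absolute gap between the two empirical
    cumulative distribution functions, checked at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Made-up expression intensities for one hypothetical gene (not real data).
normal = [5.1, 5.4, 5.0, 5.6, 5.3]
tumor = [7.9, 8.2, 7.5, 8.0, 8.4]

print("KSD, normal vs tumor:", ks_distance(normal, tumor))
print("PCC, two proportional profiles:", pearson_correlation([1, 2, 3], [2, 4, 6]))
```

In this toy setting, a KSD near 1 means the two intensity distributions barely overlap, and a PCC near 1 means two profiles rise and fall together.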
Keywords
- Cancer biomarker
- Cancer diagnosis and prognosis
- Gene expression profile
- Kolmogorov-Smirnov distance
- Pearson correlation coefficient
ASJC Scopus subject areas
- Cancer Research