Advancements in Gene Expression Analysis Through Distinguishability-Based Feature Selection

Nur Ernawan Salim, Malaysia

Keywords:

Gene expression analysis, Microarray data, Feature selection, Distinguishability, Weighted feature selection, Classification accuracy, Bioinformatics, Disease biomarkers

Abstract

Gene expression analysis using microarray data has become a cornerstone of bioinformatics, supporting the study of disease mechanisms and the discovery of biomarkers. However, gene expression data are high-dimensional and noisy, with thousands of genes measured on comparatively few samples, which makes accurate classification difficult. This study introduces a novel feature selection algorithm that combines distinguishability with weighted feature assessment to improve classification accuracy. The proposed method measures how well each gene distinguishes between classes and assigns it a weight accordingly, so that highly discriminative genes are prioritized. Experiments on benchmark microarray datasets show that our approach significantly improves classification performance compared with traditional methods. The results suggest that distinguishability-based weighted feature selection is a promising approach for refining gene expression analysis, which may ultimately support more accurate disease diagnosis and treatment planning.
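The abstract does not state how distinguishability is computed or how weights are derived from it. The following is a minimal sketch of the general pipeline (score each gene, convert scores to weights, keep the top-ranked genes, classify with those weights). It assumes a Fisher-style ratio of between-class to within-class variance as the distinguishability score and a weighted nearest-centroid classifier. Both are well-known stand-ins, not necessarily the author's choices, and all function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance

EPS = 1e-12  # guards against division by zero for constant genes


def distinguishability_scores(X, y):
    """Assumed distinguishability measure (Fisher-style ratio) per gene:
    size-weighted spread of class means around the overall mean, divided by
    the pooled within-class variance. X is a list of samples (rows), each a
    list of gene expression values; y holds the class label of each row."""
    classes = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        overall = mean([row[g] for row in X])
        between = within = 0.0
        for c in classes:
            vals = [row[g] for row, label in zip(X, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        scores.append(between / (within + EPS))
    return scores


def distinguishability_weights(scores):
    """Normalize scores to weights summing to 1 (uniform if all scores are 0)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def select_top_genes(weights, k):
    """Indices of the k highest-weighted genes (stable sort keeps ties in order)."""
    return sorted(range(len(weights)), key=lambda g: weights[g], reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Per-class mean expression of the selected genes."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [mean([row[g] for row in rows]) for g in genes]
    return centroids


def predict(sample, centroids, genes, weights):
    """Assign the class whose centroid has the smallest weighted squared distance."""
    def distance(centroid):
        return sum(weights[g] * (sample[g] - mu) ** 2 for g, mu in zip(genes, centroid))
    return min(centroids, key=lambda c: distance(centroids[c]))


# Made-up toy data: gene 0 separates the classes, gene 1 is noise,
# gene 2 is constant across all samples.
X = [
    [1.0, 5.0, 3.0], [1.2, 4.0, 3.0], [0.9, 6.0, 3.0],
    [4.0, 5.5, 3.0], [4.2, 4.5, 3.0], [3.9, 5.0, 3.0],
]
y = ["normal"] * 3 + ["tumor"] * 3

scores = distinguishability_scores(X, y)
weights = distinguishability_weights(scores)
genes = select_top_genes(weights, k=1)
centroids = fit_centroids(X, y, genes)
print(predict([1.1, 9.0, 3.0], centroids, genes, weights))  # prints "normal"
```

Because the weights also enter the distance, a noisy gene that survives selection still contributes little to the decision. Other distinguishability criteria (for example t-statistics or mutual information) could replace `distinguishability_scores` without changing the rest of the sketch.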
