A Comprehensive Study of Feature Selection Methods for Enhancing Predictive Accuracy in Multidimensional Datasets

Authors

  • Surendra Pradeep Kumar, Research Associate, India

Keywords:

Feature selection, predictive accuracy, multidimensional datasets, machine learning, data preprocessing

Abstract

Feature selection is an essential step in the data preprocessing pipeline that directly impacts the performance of predictive models. In high-dimensional datasets, redundant and irrelevant features can dilute the efficacy of machine learning algorithms, leading to reduced accuracy, increased computational costs, and overfitting. This paper explores various feature selection techniques, their impact on predictive accuracy, and their applicability in diverse domains. We provide a systematic review of traditional, statistical, and machine learning-based methods, offering a comparative analysis supported by empirical data. The results demonstrate the significant role of feature selection in optimizing computational resources and improving model generalizability.
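
As a rough illustration of the kind of filter-style method the abstract refers to, the sketch below ranks features by their absolute Pearson correlation with the target and keeps the top k. It is a minimal sketch, not the paper's procedure: the toy data, the feature names, and the helpers `pearson` and `select_top_k` are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


def select_top_k(columns, target, k):
    """Return the names of the k features with the largest |correlation| to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: one informative feature, a noisy copy of it,
# and one feature that is unrelated to the target.
rng = random.Random(0)
n = 200
signal = [rng.gauss(0, 1) for _ in range(n)]
columns = {
    "informative": signal,
    "noisy_copy": [s + rng.gauss(0, 0.5) for s in signal],
    "irrelevant": [rng.gauss(0, 1) for _ in range(n)],
}
target = [2.0 * s + rng.gauss(0, 0.1) for s in signal]

selected = select_top_k(columns, target, k=2)
print(selected)
```

On this toy data the unrelated feature is discarded, but the noisy copy is kept even though it is redundant with the informative feature. A univariate ranking of this kind scores relevance only; removing redundancy would need a method that also compares features with each other.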

How to Cite

Surendra Pradeep Kumar. (2024). A Comprehensive Study of Feature Selection Methods for Enhancing Predictive Accuracy in Multidimensional Datasets. JOURNAL OF RECENT TRENDS IN COMPUTER SCIENCE AND ENGINEERING (JRTCSE), 9(2), 24-30. https://jrtcse.com/index.php/home/article/view/JRTCSE.2021.2.3