Hybrid machine learning approach for performance estimation on diabetes dataset
Pallavi Kumari and Surjeet Gautam
Abstract
A hybrid machine learning approach combining k-nearest neighbors (kNN) and decision tree (DT) algorithms was proposed to enhance diabetes prediction on the PIMA Indian diabetes dataset. The hybrid model leverages kNN’s ability to capture non-linear patterns and DT’s interpretability and feature selection capabilities, mitigating the limitations of each algorithm used alone. The dataset was pre-processed by handling missing values, standardizing features, and addressing class imbalance using the synthetic minority oversampling technique (SMOTE). The predictions of kNN and DT were integrated through a weighted voting mechanism, with weights based on each model's validation performance. The hybrid model was evaluated using accuracy, precision, recall, and F1-score, and it demonstrated improved predictive accuracy and generalization. A comparison of distance metrics showed that Chebyshev distance performed best, achieving over 96% accuracy and excelling in recall and F1-score. This study highlights the potential of hybrid machine learning approaches in healthcare, providing scalable and interpretable solutions for complex tasks such as diabetes prediction.
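The abstract does not give the exact voting formula, so the following is only a minimal sketch of how an accuracy-weighted vote between kNN and DT could look, together with the Chebyshev distance used for the neighbor search. It assumes weights proportional to validation accuracy and a 0.5 decision threshold; the function names (`chebyshev`, `knn_positive_fraction`, `weighted_vote`) and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Chebyshev (L-infinity) distance: largest per-feature absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_positive_fraction(train_X, train_y, query, k: int = 3) -> float:
    """Fraction of the k nearest training points (by Chebyshev distance) labeled 1."""
    order = sorted(range(len(train_X)), key=lambda i: chebyshev(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def weighted_vote(p_knn: float, p_dt: float,
                  acc_knn: float, acc_dt: float,
                  threshold: float = 0.5) -> Tuple[int, float]:
    """Blend two positive-class scores, weighting each model by its
    validation accuracy (an assumed weighting scheme, not the paper's)."""
    total = acc_knn + acc_dt
    w_knn, w_dt = acc_knn / total, acc_dt / total
    score = w_knn * p_knn + w_dt * p_dt
    return int(score >= threshold), score


# Toy, made-up data: two standardized features, binary label.
train_X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
train_y = [0, 0, 1, 1]

p_knn = knn_positive_fraction(train_X, train_y, query=[5.0, 6.0], k=3)
# p_dt stands in for a decision tree's positive-class probability (hypothetical value).
label, score = weighted_vote(p_knn, p_dt=1.0, acc_knn=0.8, acc_dt=0.6)
print(label, round(score, 3))
```

In practice the two scores would come from fitted kNN and DT models evaluated on a held-out validation split; the hand-rolled neighbor search here only keeps the sketch dependency-free.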
Keywords
Diabetes prediction, Hybrid machine learning, K-nearest neighbors, Decision tree, PIMA Indian diabetes dataset.
Cite this article
Kumari P, Gautam S. Hybrid machine learning approach for performance estimation on diabetes dataset. ACCENTS Transactions on Image Processing and Computer Vision. 2024;10(29):20-25. DOI:10.19101/TIPCV.2024.1026005