Improving medical diagnostics with machine learning: a study on data classification algorithms
Abhishek Kumar and Sujeet Gautam
Abstract
This paper investigates the effectiveness of the logistic regression (LR) and random forest (RF) algorithms for classifying breast cancer using the Breast Cancer Wisconsin Dataset, which consists of 699 instances and 10 attributes. After pre-processing the data and performing feature extraction to retain the relevant information, the dataset is split into training, validation, and test portions to evaluate the LR and RF algorithms. LR achieves an accuracy of 96% to 97% across different split ratios, and its error rate decreases as the training set grows. RF achieves an accuracy of 96% to 98% across different split ratios. The results indicate that both algorithms classify the data effectively, and the figures show how the choice of split ratio affects accuracy and error rate. Selecting the split ratio carefully is therefore essential for obtaining reliable results.
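The evaluation protocol described above (split the data into training, validation, and test portions, fit a classifier, then compare accuracy and error rate across split ratios) can be sketched as follows. This is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation: the data are synthetic (699 rows of nine features clamped to the range 1 to 10, chosen only to loosely resemble the dataset's size), logistic regression is fitted by plain batch gradient descent with hypothetical settings (`lr=0.5`, `epochs=100`), the split ratios are illustrative, and all function names are hypothetical. Random forest is omitted for brevity; a library implementation such as scikit-learn's `RandomForestClassifier` would normally cover that half of the comparison.

```python
import math
import random


def make_toy_data(n=699, n_features=9, seed=42):
    """Hypothetical synthetic stand-in for the Wisconsin data: two classes whose
    feature values are drawn around different centres and clamped to [1, 10]."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.35 else 0
        center = 6.5 if label else 3.0
        X.append([min(10.0, max(1.0, rng.gauss(center, 2.0))) for _ in range(n_features)])
        y.append(label)
    return X, y


def split(X, y, train_frac, val_frac, seed=0):
    """Shuffle once, then cut into training, validation and test portions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_train = int(train_frac * len(idx))
    n_val = int(val_frac * len(idx))
    parts = (idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:])
    return [([X[i] for i in p], [y[i] for i in p]) for p in parts]


def fit_standardizer(X):
    """Learn per-feature mean and std on the training rows only."""
    d = len(X[0])
    means = [sum(r[j] for r in X) / len(X) for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / len(X)) or 1.0
            for j in range(d)]
    return lambda rows: [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=100):
    """Batch gradient descent on the mean log-loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of rows whose thresholded prediction (p >= 0.5) matches the label."""
    hits = sum(int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) == yi
               for xi, yi in zip(X, y))
    return hits / len(y)


def evaluate_split_ratios(X, y, ratios):
    """For each (train, validation) fraction, return (val acc, test acc, test error)."""
    results = {}
    for train_frac, val_frac in ratios:
        (Xtr, ytr), (Xva, yva), (Xte, yte) = split(X, y, train_frac, val_frac)
        scale = fit_standardizer(Xtr)
        w, b = train_logistic_regression(scale(Xtr), ytr)
        val_acc = accuracy(w, b, scale(Xva), yva)
        test_acc = accuracy(w, b, scale(Xte), yte)
        results[(train_frac, val_frac)] = (val_acc, test_acc, 1.0 - test_acc)
    return results


if __name__ == "__main__":
    X, y = make_toy_data()
    ratios = [(0.5, 0.2), (0.6, 0.2), (0.7, 0.15), (0.8, 0.1)]
    for (tr, va), (val_acc, test_acc, err) in evaluate_split_ratios(X, y, ratios).items():
        print(f"train {tr:.0%} / val {va:.0%} / test {1 - tr - va:.0%}: "
              f"val acc {val_acc:.3f}, test acc {test_acc:.3f}, error rate {err:.3f}")
```

Because the standardizer is fitted on the training portion only, no information from the validation or test rows leaks into training. The validation accuracy is only reported here; in a fuller pipeline it would be used to tune settings such as the learning rate before the test portion is touched. The numbers this sketch prints come from toy data and are not comparable to the accuracies reported in the paper.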
Keywords
LR, RF, Machine learning, Data selection.
Cite this article
Kumar A, Gautam S. Improving medical diagnostics with machine learning: a study on data classification algorithms. International Journal of Advanced Computer Research. 2022;12(61):31-42. DOI:10.19101/IJACR.2021.1152067