Machine learning techniques with ANOVA for the prediction of breast cancer
Bharti Thakur, Nagesh Kumar and Gaurav Gupta
Abstract
Breast cancer is one of the most common cancers among females. In this paper, machine learning techniques are applied to the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset to extract the most informative clinical attributes. Analysis of variance (ANOVA) is used for clinical feature selection. Five machine learning algorithms are implemented: support vector machine (SVM), decision tree, random forest, AdaBoost and artificial neural network (ANN). Among these classifiers, ANN gives the highest accuracy, 87.43%. This statistical approach can support the detection of breast cancer, which may in turn help improve survival rates among women.
Keywords
Breast cancer, Genes, ANOVA, ANN, SVM, Machine learning, Healthcare.
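The ANOVA-based feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the toy values and the helper functions `anova_f` and `select_top_k` are hypothetical and chosen only for illustration. The sketch computes the one-way ANOVA F-statistic of each feature against the class label and keeps the highest-scoring features, which could then be passed to a classifier such as SVM or ANN.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-group sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(features, labels, k):
    """Rank features by F-statistic and return the k best names plus all scores."""
    scores = {name: anova_f(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data (made up, not taken from METABRIC): six samples, two classes.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "feature_a": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # class means differ strongly
    "feature_b": [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],  # identical class means
    "feature_c": [1.0, 3.0, 5.0, 4.0, 6.0, 8.0],  # class means differ weakly
}

top, scores = select_top_k(features, labels, 2)
print(top)
print(scores)
```

A feature whose class means are far apart relative to its within-class spread receives a large F-statistic, so `feature_a` ranks first here and `feature_b`, whose class means coincide, scores zero.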
Cite this article
Thakur B, Kumar N, Gupta G. Machine learning techniques with ANOVA for the prediction of breast cancer. International Journal of Advanced Technology and Engineering Exploration. 2022; 9(87):232-245. DOI:10.19101/IJATEE.2021.874555