International Journal of Advanced Technology and Engineering Exploration (IJATEE) ISSN (P): 2394-5443 ISSN (O): 2394-7454 Vol - 8, Issue - 82, September 2021
A comparative performance of breast cancer classification using hyper-parameterized machine learning models

Kristoffersen Edward Mayce R. Lomboy and Rowell M. Hernandez

Abstract

Breast cancer is the second most common cancer and has the second-highest mortality rate among all cancer types in women. Accurate diagnosis plays a major part in breast cancer treatment. Machine learning methods have become popular for cancer classification and can accurately distinguish malignant (cancerous) from benign (non-cancerous) breast tumors. This paper applies three machine learning methods to that classification task: Support Vector Machine (SVM), Logistic Regression (LR), and Neural Network (NN). For each method, multiple models were tested, each with a unique set of parameter values. The study used the Breast Cancer Wisconsin Diagnostic (BCWD) dataset. Model performance was evaluated using k-fold cross-validation and a confusion matrix. The results show that SVM outperformed both LR and NN in classification accuracy, precision, recall, and specificity under k-fold cross-validation. When a train-test split was used to validate the proposed model instead, the NN outperformed both SVM and LR, achieving an accuracy of 99.4%.
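
The evaluation scheme named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `k_fold_indices`, `confusion_counts`, and `classification_metrics` are hypothetical, the folds are unshuffled, and treating the malignant class as the positive label (`1`) is an assumption. Under those assumptions, the sketch shows how k-fold test partitions are formed and how accuracy, precision, recall, and specificity follow from the four confusion-matrix counts.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering 0..n_samples-1, unshuffled."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Fold sizes differ by at most one; the first (n_samples % k) folds are larger.
    base, extra = divmod(n_samples, k)
    folds: List[Tuple[List[int], List[int]]] = []
    start = 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, int]:
    """Count true/false positives and negatives (assumed positive = malignant)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def classification_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive the four metrics named in the abstract from confusion counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]

    def ratio(num: int, den: int) -> float:
        # Report 0.0 when a denominator is empty instead of raising.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

In a comparison like the one described, each candidate parameter set would be trained on the `train` indices and scored on the `test` indices of every fold, with the per-fold metrics averaged so that all models are judged on the same partitions.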

Keywords

Breast cancer, Breast cancer Wisconsin (diagnostic) data set, Support vector machines, Logistic regression, Neural network.

Cite this article

Lomboy KE, Hernandez RM
