International Journal of Advanced Computer Research (IJACR) ISSN (P): 2249-7277 ISSN (O): 2277-7970 Vol - 10, Issue - 49, July 2020
Predictive and perspective analysis of cancer image data set using machine learning algorithms

Divya Chauhan and Kishori Lal Bansal

Abstract

Classifying and predicting images is a fairly easy task for humans, but it takes more effort for a machine to do the same. Machine learning helps to attain this goal. It automates the task of classifying a large collection of images into different classes by labelling the incoming data and recognizing patterns in it, which are subsequently translated into valuable insights. The aim of this paper is to classify an image data set of five cancer types, namely Osteosarcoma, Prostate Cancer, Brain Cancer, Breast Cancer and Acute Myeloid Leukaemia. Furthermore, each Osteosarcoma case is to be assigned to one of four tumor classes, namely Non-tumor, Non-Viable tumor, Viable tumor and Viable: Non-Viable tumor. The quantitative analysis is done using various machine learning libraries of Python. The three classification algorithms used for image analysis are random forest, SVM and logistic regression. The metrics used for the perspective analysis are precision, recall and F1 score. The results show that the random forest algorithm performed best among the three classification algorithms in the less complicated scenario, with prediction accuracy, precision, recall and F1 score of 100%. However, the performance of every classification algorithm degrades on the Osteosarcoma cases, which have a more complicated scatter graph. Even so, logistic regression retains its performance, predicting tumor cases with 99% accuracy.
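
The abstract names precision, recall and F1 score as its evaluation metrics without defining them. As a minimal sketch, and not the authors' implementation, the standard per-class definitions can be written in plain Python as follows. The function name `per_class_scores`, the shortened class labels and the six example predictions are all hypothetical and made up for illustration; they are not data or results from the paper.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[guess] += 1      # predicted this class, but it was wrong
            fn[truth] += 1      # missed an instance of the true class

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical labels for illustration only (not taken from the paper).
y_true = ["non-tumor", "non-tumor", "viable", "viable", "non-viable", "non-viable"]
y_pred = ["non-tumor", "viable", "viable", "viable", "non-viable", "non-tumor"]

scores = per_class_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label, (precision, recall, f1) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Reporting the scores per class rather than as a single accuracy figure shows which tumor classes a classifier confuses, which matters when some classes are harder to separate than others.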

Keywords

Data mining, Big data, Hadoop, Mahout, Clustering, Health care.

Cite this article

Chauhan D, Bansal KL
