International Journal of Advanced Computer Research (IJACR) ISSN (P): 2249-7277 ISSN (O): 2277-7970 Vol - 9, Issue - 42, May 2019
Machine learning approach for reducing students dropout rates

Neema Mduma, Khamisi Kalegele and Dina Machuve

Abstract

School dropout is a widely recognized and serious issue in developing countries, and machine learning techniques have gained considerable attention for addressing it. This paper presents a thorough analysis of four supervised learning classifiers, representing linear, ensemble, instance-based, and neural network approaches, on the Uwezo Annual Learning Assessment datasets for Tanzania as a case study. The goal of the study is to provide data-driven algorithm recommendations to researchers working on this topic. Using three metrics, geometric mean, F-measure, and adjusted geometric mean, we assessed and quantified the effect of different sampling techniques on the imbalanced dataset for model selection. We further show the significance of hyperparameter tuning in improving predictive performance. The results indicate that two classifiers, logistic regression and the multilayer perceptron, achieve the highest performance when an over-sampling technique is employed. Furthermore, hyperparameter tuning improves each algorithm's performance over its baseline settings, and stacking these classifiers improves the overall predictive performance.
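A minimal sketch of the workflow the abstract describes, not the authors' code: over-sampling an imbalanced dropout dataset, tuning a logistic regression pipeline, and stacking it with a multilayer perceptron, scored by F-measure and geometric mean. The synthetic dataset, the parameter grid, and the use of scikit-learn and imbalanced-learn are assumptions for illustration; the adjusted geometric mean is omitted as it has no standard library implementation.

```python
# Illustrative sketch only: synthetic data stands in for the Uwezo dataset,
# and the hyperparameter grid is a placeholder, not the grid used in the paper.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from imblearn.metrics import geometric_mean_score
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an imbalanced dropout dataset (~10% minority class).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Over-sampling (SMOTE) sits inside the pipeline so it is applied
# only to the training folds during cross-validation.
lr_pipe = Pipeline([("scale", StandardScaler()),
                    ("smote", SMOTE(random_state=0)),
                    ("clf", LogisticRegression(max_iter=1000))])
mlp_pipe = Pipeline([("scale", StandardScaler()),
                     ("smote", SMOTE(random_state=0)),
                     ("clf", MLPClassifier(max_iter=1000, random_state=0))])

# Hyperparameter tuning for the logistic regression pipeline.
grid = GridSearchCV(lr_pipe, {"clf__C": [0.1, 1.0, 10.0]},
                    scoring="f1", cv=5)
grid.fit(X_train, y_train)

# Stack the two classifiers that the abstract reports as best-performing.
stack = StackingClassifier(
    estimators=[("lr", grid.best_estimator_), ("mlp", mlp_pipe)],
    final_estimator=LogisticRegression(max_iter=1000), cv=5)
stack.fit(X_train, y_train)
pred = stack.predict(X_test)

print("F-measure:", f1_score(y_test, pred))
print("G-mean:   ", geometric_mean_score(y_test, pred))
```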

Keywords

Machine learning (ML), Imbalanced learning classification, Secondary education, Evaluation metrics.

Cite this article

Mduma N, Kalegele K, Machuve D. Machine learning approach for reducing students dropout rates. International Journal of Advanced Computer Research. 2019; 9(42).
