Optimal feature selection for cricket talent identification
Naveed Jeelani Khan, Gulfam Ahamad, Nahida Reyaz and Mohd Naseem
Abstract
Cricket talent identification (TiD) is a methodical process that aims to find, at an early age, young athletes with the potential to excel in cricket. Sports scientists have identified a set of twenty-eight parameters that determine cricket TiD. To improve computational efficiency by reducing this feature set, we perform optimal feature selection for cricket TiD using nine feature selection techniques, viz. mutual information, information gain ratio, correlation, chi-square, univariate root mean square error (RMSE), receiver operating characteristic (ROC) with a decision tree classifier, ReliefF, Boruta and OneR. The results obtained from each feature selection technique are reported along with the corresponding feature ranking. We aggregate the results using two rank aggregation techniques, namely average ranking aggregation and majority vote ranking aggregation. The aggregation results show significant agreement between the two schemes. Fourteen of the twenty-eight features are selected using a threshold of 0.52, a value chosen on the recommendation of four domain experts. Of the selected features, 71.4% are sport-centric and only 28.6% are from the cognitive ability category. To the best of our knowledge, this is the first attempt to identify talent in cricket using this methodology.
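The two aggregation schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the 0.52 threshold is applied, so the normalisation below (average rank mapped to a score between 0 and 1) is an assumption made only for illustration. The feature names f1 to f4, the toy rankings, and the helper names are likewise hypothetical and are not data or code from the study.

```python
from statistics import mean


def average_rank_aggregation(rankings):
    """Mean position of each feature across methods (1 = best)."""
    features = sorted(next(iter(rankings.values())))
    return {
        f: mean(order.index(f) + 1 for order in rankings.values())
        for f in features
    }


def majority_vote_aggregation(rankings, top_k):
    """Count how many methods place each feature in their top_k."""
    features = sorted(next(iter(rankings.values())))
    return {
        f: sum(f in order[:top_k] for order in rankings.values())
        for f in features
    }


def select_by_threshold(avg_rank, threshold):
    """Assumed normalisation: rank 1 scores 1.0, the last rank scores 0.0."""
    n = len(avg_rank)
    scores = {f: (n - r) / (n - 1) for f, r in avg_rank.items()}
    return sorted(f for f, s in scores.items() if s >= threshold)


# Hypothetical rankings (best feature first) from three of the nine methods.
rankings = {
    "mutual_information": ["f1", "f2", "f3", "f4"],
    "chi_square": ["f2", "f1", "f4", "f3"],
    "reliefF": ["f1", "f3", "f2", "f4"],
}

avg = average_rank_aggregation(rankings)
votes = majority_vote_aggregation(rankings, top_k=2)

# Keep features whose normalised average-rank score reaches the threshold.
selected_by_average = select_by_threshold(avg, threshold=0.52)

# Keep features that more than half of the methods put in their top_k.
selected_by_vote = sorted(f for f, v in votes.items() if v > len(rankings) / 2)

print(selected_by_average)
print(selected_by_vote)
```

On this toy input both schemes keep the same two features, which mirrors, on a very small scale, the kind of agreement between schemes that the abstract reports.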
Keywords
Sports talent identification, Feature selection, Cricket talent identification, Applied decision sciences, Feature reduction.
Cite this article
Khan NJ, Ahamad G, Reyaz N, Naseem M. Optimal feature selection for cricket talent identification. International Journal of Advanced Technology and Engineering Exploration. 2023; 10(98):67-86. DOI: 10.19101/IJATEE.2021.876300