ACCENTS Journals


Optimal feature selection for cricket talent identification

Naveed Jeelani Khan, Gulfam Ahamad, Nahida Reyaz and Mohd Naseem

Abstract

Cricket talent identification (TiD) is a methodical process that aims to find, at an early age, young athletes with the potential to excel in cricket. Sports scientists have identified a set of twenty-eight parameters that determine cricket TiD. To improve computational efficiency by reducing this feature set, we perform optimal feature selection for cricket TiD using nine different feature selection techniques, viz. mutual information, information gain ratio, correlation, chi-square, univariate root mean square error (RMSE), receiver operating characteristic (ROC) with a decision tree classifier, ReliefF, Boruta and OneR. The results obtained from each feature selection technique are reported along with the corresponding feature ranking. We aggregate the results using two rank aggregation techniques, namely average ranking aggregation and majority vote ranking aggregation. The aggregation results show significant agreement between the two schemes. Fourteen of the twenty-eight features are selected using a threshold of 0.52, a value chosen on the recommendation of four domain experts. Of the selected features, 71.4% are sport-centric and only 28.6% are from the cognitive ability category. To the best of our knowledge, this is the first attempt to identify talent in cricket using this methodology.
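
The two aggregation schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so everything below is an assumption made for illustration: the feature names (`f1` to `f4`), the rank values, the `top_k` cut-off and the use of 0.52 as a cut-off on the share of votes are all hypothetical, and the function names are not taken from the paper.

```python
from statistics import mean

# Hypothetical ranks (1 = most important) assigned to four made-up features
# by three of the nine selection techniques. These numbers are illustrative
# only and are not results from the paper.
ranks = {
    "mutual_information": {"f1": 1, "f2": 2, "f3": 3, "f4": 4},
    "chi_square":         {"f1": 2, "f2": 1, "f3": 4, "f4": 3},
    "relieff":            {"f1": 1, "f2": 3, "f3": 2, "f4": 4},
}


def average_rank(ranks):
    """Average ranking aggregation: mean rank of each feature across techniques."""
    features = next(iter(ranks.values())).keys()
    return {f: mean(r[f] for r in ranks.values()) for f in features}


def majority_vote(ranks, top_k):
    """Majority vote aggregation (assumed form): share of techniques that
    place the feature among their top_k ranked features."""
    features = next(iter(ranks.values())).keys()
    n_techniques = len(ranks)
    return {
        f: sum(r[f] <= top_k for r in ranks.values()) / n_techniques
        for f in features
    }


def select(vote_share, threshold):
    """Keep features whose vote share meets the threshold (assumed usage)."""
    return sorted(f for f, share in vote_share.items() if share >= threshold)


avg = average_rank(ranks)
order_by_avg = sorted(avg, key=avg.get)        # lower mean rank is better
votes = majority_vote(ranks, top_k=2)
selected = select(votes, threshold=0.52)

print(order_by_avg)
print(selected)
```

In this toy input both schemes favour the same two features, which mirrors the kind of agreement between schemes that the abstract reports; how the paper actually applies its 0.52 threshold may differ from the vote-share reading assumed here.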

Keywords

Sports talent identification, Feature selection, Cricket talent identification, Applied decision sciences, Feature reduction.

Cite this article

Khan NJ, Ahamad G, Reyaz N, Naseem M. Optimal feature selection for cricket talent identification. International Journal of Advanced Technology and Engineering Exploration. 2023; 10(98):67-86. DOI:10.19101/IJATEE.2021.876300
