International Journal of Advanced Computer Research (IJACR) ISSN (P): 2249-7277 ISSN (O): 2277-7970 Vol - 13, Issue - 63, June 2023
A review and analysis for the text-based classification

Prince Kumar and Animesh Kumar Dubey

Abstract

In the current information-rich era, efficient retrieval and classification of text-based documents have become crucial tasks. With the exponential growth of digital content, retrieving the most relevant documents has become a pressing concern. Effective document retrieval not only saves time and effort but also supports knowledge discovery and decision-making. To address these challenges, various text-based classification techniques have been developed and implemented. This paper provides a comprehensive review and analysis of these techniques. Its objectives are to evaluate existing methods, identify their strengths and limitations, and suggest potential avenues for future research. The paper analyzes the algorithms, feature extraction techniques, and evaluation metrics employed in text-based classification. It also investigates the impact of factors such as document size, language, and domain specificity on classification performance.

Keywords

Text-based classification, Knowledge discovery, Inherent ambiguity, Extraction mechanism.

Cite this article

Kumar P, Dubey AK
