A review of feature selection in sentiment analysis using information gain and domain specific ontology
Ibrahim Said Ahmad, Azuraliza Abu Bakar and Mohd Ridzwan Yaakub
Abstract
There is continued interest in understanding people's interests through the content they share online. However, the data generated are massive and full of textual jargon and tokens that carry no sentiment or opinion value. Feature selection is one way to reduce the dimensionality of the data and prune irrelevant features, but existing feature selection approaches are still inefficient. Two prominent feature selection methods in sentiment analysis are information gain and ontology-based methods. Information gain does not account for redundancy between features, while the ontology-based approach requires considerable human intervention. This paper reviews these two methods. The review shows that combining them in a two-step approach can overcome their limitations and provide an optimal feature set for sentiment analysis.
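To make the two methods named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes information gain for the presence or absence of a term in labelled documents, then keeps only the top-ranked terms that also appear in a set of domain terms. The function names, the representation of documents as token sets, the plain set `domain_terms` standing in for a domain ontology, and the order of the two steps are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of a term's presence/absence for the class labels.

    docs is a list of token sets, labels the matching list of classes.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def select_features(docs, labels, k, domain_terms):
    """Hypothetical two-step selection (an assumption, not the paper's method).

    Step 1: rank the vocabulary by information gain and keep the top k.
    Step 2: keep only terms found in domain_terms, a plain set used here
    as a stand-in for a domain-specific ontology.
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return [t for t in ranked[:k] if t in domain_terms]
```

Information gain scores each term independently, which is why two near-synonymous terms can both rank highly; the second step is one simple way a domain resource could be used to prune the ranked list.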
Keywords
Sentiment analysis, Feature selection, Information gain, Ontology.
Cite this article
Ahmad IS, Bakar AA, Yaakub MR. A review of feature selection in sentiment analysis using information gain and domain specific ontology. International Journal of Advanced Computer Research. 2019; 9(44):283-292. DOI:10.19101/IJACR.PID90