International Journal of Advanced Computer Research (IJACR) ISSN (P): 2249-7277 ISSN (O): 2277-7970 Vol - 10, Issue - 46, January 2020
Latest trends in emotion recognition methods: case study on EmotiW challenge

Huma Naz and Sachin Ahuja

Abstract

Emotion recognition is becoming an increasingly active field of research. In the recent past, it has emerged as a milestone in software engineering, website customization, education, and gaming. Moreover, emotion recognition models are used by more and more intelligent systems to improve multimodal interaction. Therefore, this paper reviews the recent literature on the emotion recognition methods presented at the Emotion Recognition in the Wild (EmotiW) challenge. EmotiW is a grand challenge organized every year at the ACM International Conference on Multimodal Interaction. A number of methods for emotion analysis are presented at EmotiW every year; this paper surveys them on the basis of emotion categorization in different areas. This work presents a broad methodical analysis of the EmotiW challenge for sentiment analysis, which can help researchers, IT professionals, and academia find a worthy technique for emotion grouping in several areas. It would also aid in selecting the most suitable emotion recognition technique on the basis of the target application.

Keywords

Emotion recognition, Audio-video emotion recognition, Emotion recognition methods, EmotiW case study, Emotion analysis.

Cite this article

Naz H, Ahuja S. Latest trends in emotion recognition methods: case study on EmotiW challenge. International Journal of Advanced Computer Research. 2020; 10(46).

References

[1] Zhao M, Adib F, Katabi D. Emotion recognition using wireless signals. In proceedings of the 22nd annual international conference on mobile computing and networking 2016 (pp. 95-108). ACM.

[2] https://www.mordorintelligence.com/industry-reports/emotion-detection-and-recognition-edr-market. Accessed 20 August 2019.

[3] Dhall A, Goecke R, Joshi J, Wagner M, Gedeon T. Emotion recognition in the wild challenge 2013. In proceedings of the ACM on international conference on multimodal interaction 2013 (pp. 509-16). ACM.

[4] Valstar M, Gratch J, Schuller B, Ringeval F, Lalanne D, Torres Torres M, et al. Depression, mood, and emotion recognition workshop and challenge. In proceedings of the international workshop on audio/visual emotion challenge 2016 (pp. 3-10). ACM.

[5] Swain M, Routray A, Kabisatpathy P. Databases, features and classifiers for speech emotion recognition: a review. International Journal of Speech Technology. 2018; 21(1):93-120.

[6] Kahou SE, Pal C, Bouthillier X, Froumenty P, Gülçehre Ç, Memisevic R, et al. Combining modality specific deep neural networks for emotion recognition in video. In proceedings of the ACM on international conference on multimodal interaction 2013 (pp. 543-50). ACM.

[7] Sikka K, Dykstra K, Sathyanarayana S, Littlewort G, Bartlett M. Multiple kernel learning for emotion recognition in the wild. In proceedings of the ACM on international conference on multimodal interaction 2013 (pp. 517-24). ACM.

[8] Liu M, Wang R, Huang Z, Shan S, Chen X. Partial least squares regression on grassmannian manifold for emotion recognition. In proceedings of the ACM on international conference on multimodal interaction 2013 (pp. 525-30). ACM.

[9] Dhall A, Goecke R, Joshi J, Sikka K, Gedeon T. Emotion recognition in the wild challenge 2014: baseline, data and protocol. In proceedings of the international conference on multimodal interaction 2014 (pp. 461-6). ACM.

[10] Liu M, Wang R, Li S, Shan S, Huang Z, Chen X. Combining multiple kernel methods on riemannian manifold for emotion recognition in the wild. In proceedings of the international conference on multimodal interaction 2014 (pp. 494-501). ACM.

[11] Sun B, Li L, Zuo T, Chen Y, Zhou G, Wu X. Combining multimodal features with hierarchical classifier fusion for emotion recognition in the wild. In proceedings of the international conference on multimodal interaction 2014 (pp. 481-6). ACM.

[12] Chen J, Chen Z, Chi Z, Fu H. Emotion recognition in the wild with feature fusion and multiple kernel learning. In proceedings of the international conference on multimodal interaction 2014 (pp. 508-13). ACM.

[13] Dhall A, Ramana Murthy OV, Goecke R, Joshi J, Gedeon T. Video and image based emotion recognition challenges in the wild: EmotiW 2015. In proceedings of the international conference on multimodal interaction 2015 (pp. 423-6). ACM.

[14] Yao A, Shao J, Ma N, Chen Y. Capturing au-aware facial features and their latent relations for emotion recognition in the wild. In proceedings of the ACM on international conference on multimodal interaction 2015 (pp. 451-8). ACM.

[15] Kaya H, Gürpinar F, Afshar S, Salah AA. Contrasting and combining least squares based learners for emotion recognition in the wild. In proceedings of the ACM on international conference on multimodal interaction 2015 (pp. 459-66). ACM.

[16] Ebrahimi Kahou S, Michalski V, Konda K, Memisevic R, Pal C. Recurrent neural networks for emotion recognition in video. In proceedings of the ACM on international conference on multimodal interaction 2015 (pp. 467-74). ACM.

[17] Kim BK, Lee H, Roh J, Lee SY. Hierarchical committee of deep cnns with exponentially-weighted decision fusion for static facial expression recognition. In proceedings of the international conference on multimodal interaction 2015 (pp. 427-34). ACM.

[18] Yu Z, Zhang C. Image based static facial expression recognition with multiple deep network learning. In proceedings of the international conference on multimodal interaction 2015 (pp. 435-42). ACM.

[19] Ng HW, Nguyen VD, Vonikakis V, Winkler S. Deep learning for emotion recognition on small datasets using transfer learning. In proceedings of the international conference on multimodal interaction 2015 (pp. 443-9). ACM.

[20] Dhall A, Goecke R, Joshi J, Hoey J, Gedeon T. Video and group-level emotion recognition challenges. In proceedings of the international conference on multimodal interaction 2016 (pp. 427-32). ACM.

[21] Dhall A, Goecke R, Lucey S, Gedeon T. Collecting large, richly annotated facial-expression databases from movies. IEEE Multimedia. 2012; 19(3):34-41.

[22] Dhall A, Goecke R, Gedeon T. Automatic group happiness intensity analysis. IEEE Transactions on Affective Computing. 2015; 6(1):13-26.

[23] Yao A, Cai D, Hu P, Wang S, Sha L, Chen Y. Holonet: towards robust emotion recognition in the wild. In proceedings of the international conference on multimodal interaction 2016 (pp. 472-8). ACM.

[24] Fan Y, Lu X, Li D, Liu Y. Video-based emotion recognition using CNN-RNN and C3D hybrid networks. In proceedings of the international conference on multimodal interaction 2016 (pp. 445-50). ACM.

[25] Bargal SA, Barsoum E, Ferrer CC, Zhang C. Emotion recognition in the wild from videos using images. In proceedings of the international conference on multimodal interaction 2016 (pp. 433-6). ACM.

[26] Yan J, Zheng W, Cui Z, Tang C, Zhang T, Zong Y. Multi-cue fusion for emotion recognition in the wild. Neurocomputing. 2018; 309:27-35.

[27] Li J, Roy S, Feng J, Sim T. Happiness level prediction with sequential inputs via multiple regressions. In proceedings of the international conference on multimodal interaction 2016 (pp. 487-93). ACM.

[28] Vonikakis V, Yazici Y, Nguyen VD, Winkler S. Group happiness assessment using geometric features and dataset balancing. In proceedings of the international conference on multimodal interaction 2016 (pp. 479-86). ACM.

[29] Sun B, Wei Q, Li L, Xu Q, He J, Yu L. LSTM for dynamic emotion and group emotion recognition in the wild. In proceedings of the international conference on multimodal interaction 2016 (pp. 451-7). ACM.

[30] Dhall A, Goecke R, Ghosh S, Joshi J, Hoey J, Gedeon T. From individual to group-level emotion recognition: EmotiW 5.0. In proceedings of the international conference on multimodal interaction 2017 (pp. 524-8). ACM.

[31] Dhall A, Joshi J, Sikka K, Goecke R, Sebe N. The more the merrier: analysing the affect of a group of people in images. In international conference and workshops on automatic face and gesture recognition (FG) 2015 (pp. 1-8). IEEE.

[32] Knyazev B, Shvetsov R, Efremova N, Kuharenko A. Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video. arXiv preprint arXiv:1711.04598. 2017.

[33] Hu P, Cai D, Wang S, Yao A, Chen Y. Learning supervised scoring ensemble for emotion recognition in the wild. In proceedings of the international conference on multimodal interaction 2017 (pp. 553-60). ACM.

[34] Vielzeuf V, Pateux S, Jurie F. Temporal multimodal fusion for video emotion classification in the wild. In proceedings of the international conference on multimodal interaction 2017 (pp. 569-76). ACM.

[35] Tan L, Zhang K, Wang K, Zeng X, Peng X, Qiao Y. Group emotion recognition with individual facial emotion CNNs and global image based CNNs. In proceedings of the international conference on multimodal interaction 2017 (pp. 549-52). ACM.

[36] Guo X, Polanía LF, Barner KE. Group-level emotion recognition using deep models on image scene, faces, and skeletons. In proceedings of the international conference on multimodal interaction 2017 (pp. 603-8). ACM.

[37] Wei Q, Zhao Y, Xu Q, Li L, He J, Yu L, et al. A new deep-learning framework for group emotion recognition. In proceedings of the international conference on multimodal interaction 2017 (pp. 587-92). ACM.

[38] Yang J, Wang K, Peng X, Qiao Y. Deep recurrent multi-instance learning with spatio-temporal features for engagement intensity prediction. In proceedings of the international conference on multimodal interaction 2018 (pp. 594-8). ACM.

[39] Niu X, Han H, Zeng J, Sun X, Shan S, Huang Y, et al. Automatic engagement prediction with GAP feature. In proceedings of the international conference on multimodal interaction 2018 (pp. 599-603). ACM.

[40] Vielzeuf V, Kervadec C, Pateux S, Lechervy A, Jurie F. An Occam's razor view on learning audiovisual emotion recognition with small training sets. In proceedings of the international conference on multimodal interaction 2018 (pp. 589-93). ACM.

[41] Thomas C, Nair N, Jayagopi DB. Predicting engagement intensity in the wild using temporal convolutional network. In proceedings of the international conference on multimodal interaction 2018 (pp. 604-10). ACM.

[42] Chang C, Zhang C, Chen L, Liu Y. An ensemble model using face and body tracking for engagement detection. In proceedings of the international conference on multimodal interaction 2018 (pp. 616-22). ACM.

[43] Guo X, Zhu B, Polanía LF, Boncelet C, Barner KE. Group-level emotion recognition using hybrid deep models based on faces, scenes, skeletons and visual attentions. In proceedings of the international conference on multimodal interaction 2018 (pp. 635-9). ACM.

[44] Wang K, Zeng X, Yang J, Meng D, Zhang K, Peng X, et al. Cascade attention networks for group emotion recognition with face, body and image cues. In proceedings of the international conference on multimodal interaction 2018 (pp. 640-5). ACM.

[45] Khan AS, Li Z, Cai J, Meng Z, O'Reilly J, Tong Y. Group-level emotion recognition using deep models with a four-stream hybrid network. In proceedings of the international conference on multimodal interaction 2018 (pp. 623-9). ACM.

[46] Gupta A, Agrawal D, Chauhan H, Dolz J, Pedersoli M. An attention model for group-level emotion recognition. In proceedings of the international conference on multimodal interaction 2018 (pp. 611-5). ACM.

[47] Liu C, Tang T, Lv K, Wang M. Multi-feature based emotion recognition for video clips. In proceedings of the international conference on multimodal interaction 2018 (pp. 630-4). ACM.

[48] Fan Y, Lam JC, Li VO. Video-based emotion recognition using deeply-supervised neural networks. In proceedings of the international conference on multimodal interaction 2018 (pp. 584-8). ACM.

[49] Ghosh S, Dhall A, Sebe N, Gedeon T. Predicting group cohesiveness in images. In international joint conference on neural networks 2019 (pp. 1-8). IEEE.

[50] Poria S, Hazarika D, Majumder N, Naik G, Cambria E, Mihalcea R. MELD: a multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508. 2018.

[51] Wankhade VA, Kukade RV. Categorization and analysis of emotion from speech signals. Themed Section: Engineering and Technology. 2018; 4(7):395-8.

[52] Zhang LM. Genetic deep neural networks using different activation functions for financial data mining. In international conference on big data 2015 (pp. 2849-51). IEEE.

[53] Chen YL, Chang CL, Yeh CS. Emotion classification of youtube videos. Decision Support Systems. 2017; 101:40-50.

[54] Afshar S, Ali Salah A. Facial expression recognition in the wild using improved dense trajectories and fisher vector encoding. In proceedings of the conference on computer vision and pattern recognition workshops 2016 (pp. 1517-25). IEEE.

[55] Hossain MS, Muhammad G. Emotion recognition using deep learning approach from audio–visual emotional big data. Information Fusion. 2019; 49:69-78.

[56] Kim Y. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882. 2014.

[57] Huang KY, Wu CH, Hong QB, Su MH, Chen YH. Speech emotion recognition using deep neural network considering verbal and nonverbal speech sounds. In international conference on acoustics, speech and signal processing 2019 (pp. 5866-70). IEEE.

[58] Barakat N, Bradley AP, Barakat MN. Intelligible support vector machines for diagnosis of diabetes mellitus. IEEE Transactions on Information Technology in Biomedicine. 2010; 14(4):1114-20.

[59] Huang GB, Zhu QY, Siew CK. Extreme learning machine: a new learning scheme of feedforward neural networks. In international joint conference on neural networks 2004 (pp. 985-90). IEEE.

[60] Xiong X, De la Torre F. Global supervised descent method. In proceedings of the conference on computer vision and pattern recognition 2015 (pp. 2664-73). IEEE.