ResNet50-deep affinity network for object detection and tracking in videos
Nandeeswar Sampigehalli Basavaraju and Pallavi Hallappanavar Basavaraja
Abstract
Multiple-object tracking (MOT) plays a crucial role in addressing many fundamental challenges within the fields of computer vision and video analysis. Most MOT methods rely on two primary processes: object detection and data association. Each video frame is first analyzed to detect objects, and the detected objects are then associated across frames to generate their tracks. However, the data association step often relies on manually defined criteria such as motion, appearance, grouping, and spatial proximity. In this study, the ResNet50-deep affinity network (DAN) was introduced for detecting and tracking objects in videos, including objects that appear and disappear between frames. The proposed method was evaluated on the widely recognized MOT17 dataset to address MOT challenges. During the preprocessing phase, photometric distortion correction, frame expansion, and cropping were performed. The ResNet50 model was utilized to extract features, and the DAN was employed to identify object appearances in video frames and to compute their cross-frame affinities (CFA). To validate its efficiency, the ResNet50-DAN method was compared with existing approaches, including DAN, ByteTrack, the graph neural network for simultaneous detection and tracking (GSDT), the reptile search optimization algorithm with deep learning-based multiple object detection and tracking (RSOADL-MODT), the center graph network (CGTracker), the hybrid motion model, FlowNet2-deep learning (FlowNet2-DL), and the super chained tracker (SCT). The ResNet50-DAN method achieved superior results, with a multiple-object tracking accuracy (MOTA) of 84.2%, an identification F1 score (IDF1) of 80.3%, 10,352 false positives (FP), and 1,284 identity switches (ID-Sw), demonstrating higher MOTA than all of the existing approaches considered.
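A minimal PyTorch sketch of the pipeline summarized above: ResNet50 supplies per-frame features, and a DAN-style affinity head scores every pair of objects from two frames to produce the cross-frame affinity (CFA) matrix used for association. The layer widths, the centre-sampling of object features, and the two-layer affinity head are illustrative assumptions, not the authors' exact configuration.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class ResNet50AffinityExtractor(nn.Module):
        """Sketch of a ResNet50 backbone feeding a DAN-style affinity head.
        All sizes are hypothetical and chosen only for illustration."""

        def __init__(self, feat_dim=512):
            super().__init__()
            backbone = models.resnet50(weights=None)  # random init for the sketch
            # Keep the convolutional stages only; drop average pooling and the classifier.
            self.backbone = nn.Sequential(*list(backbone.children())[:-2])
            self.reduce = nn.Conv2d(2048, feat_dim, kernel_size=1)
            # Affinity estimator: maps a pair of object feature vectors to one score.
            self.affinity_head = nn.Sequential(
                nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def object_features(self, frame, centers):
            """Sample one feature vector per detected object centre.
            frame: (1, 3, H, W) tensor; centers: iterable of (x, y) pixel coords."""
            fmap = self.reduce(self.backbone(frame))          # (1, C, h, w)
            _, _, h, w = fmap.shape
            scale_x = w / frame.shape[3]
            scale_y = h / frame.shape[2]
            feats = []
            for cx, cy in centers:
                ix = min(int(cx * scale_x), w - 1)
                iy = min(int(cy * scale_y), h - 1)
                feats.append(fmap[0, :, iy, ix])
            return torch.stack(feats)                         # (N, C)

        def cross_frame_affinity(self, feats_t, feats_t1):
            """Score every pairing of objects in frame t and frame t+1,
            yielding an (N_t x N_t1) cross-frame affinity (CFA) matrix."""
            n_t, n_t1 = feats_t.shape[0], feats_t1.shape[0]
            pairs = torch.cat([
                feats_t.unsqueeze(1).expand(n_t, n_t1, -1),
                feats_t1.unsqueeze(0).expand(n_t, n_t1, -1),
            ], dim=-1)                                        # (N_t, N_t1, 2C)
            return self.affinity_head(pairs).squeeze(-1)      # (N_t, N_t1)

In use, object_features would be called on two consecutive frames with their detected object centres, and cross_frame_affinity would yield the CFA matrix; an assignment step such as Hungarian matching on the negated matrix would then link detections across frames into tracks.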
Keywords
Deep tracking, Multiple object tracking, Object detection, Online tracking, Tracking challenge, Video surveillance.
Cite this article
Basavaraju NS, Basavaraja PH. ResNet50-deep affinity network for object detection and tracking in videos. International Journal of Advanced Technology and Engineering Exploration. 2024;11(111):190-204. DOI:10.19101/IJATEE.2023.10102017
References
[1]Pramanik A, Pal SK, Maiti J, Mitra P. Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Transactions on Emerging Topics in Computational Intelligence. 2021; 6(1):171-81.
[2]Deng J, Pan Y, Yao T, Zhou W, Li H, Mei T. Single shot video object detector. IEEE Transactions on Multimedia. 2020; 23:846-58.
[3]Mao H, Chen Y, Li Z, Chen P, Chen F. SCTracker: multi-object tracking with shape and confidence constraints. IEEE Sensors Journal. 2023; 24(3):3123-30.
[4]Ariza-sentís M, Baja H, Vélez S, Valente J. Object detection and tracking on UAV RGB videos for early extraction of grape phenotypic traits. Computers and Electronics in Agriculture. 2023; 211:108051.
[5]Gao X, Wang Z, Wang X, Zhang S, Zhuang S, Wang H. DetTrack: an algorithm for multiple object tracking by improving occlusion object detection. Electronics. 2023; 13(1):1-16.
[6]Li J, Piao Y. Multi-object tracking based on re-identification enhancement and associated correction. Applied Sciences. 2023; 13(17):1-16.
[7]Azimjonov J, Özmen A. A real-time vehicle detection and a novel vehicle tracking systems for estimating and monitoring traffic flow on highways. Advanced Engineering Informatics. 2021; 50:101393.
[8]Lu X, Ma C, Ni B, Yang X. Adaptive region proposal with channel regularization for robust object tracking. IEEE Transactions on Circuits and Systems for Video Technology. 2019; 31(4):1268-82.
[9]Fernández-sanjurjo M, Mucientes M, Brea VM. Real-time multiple object visual tracking for embedded GPU systems. IEEE Internet of Things Journal. 2021; 8(11):9177-88.
[10]Yu E, Li Z, Han S, Wang H. Relationtrack: relation-aware multiple object tracking with decoupled representation. IEEE Transactions on Multimedia. 2022; 25: 2686-97.
[11]Gu S, Ma J, Hui G, Xiao Q, Shi W. STMT: spatio-temporal memory transformer for multi-object tracking. Applied Intelligence. 2023; 53(20):23426-41.
[12]Hassaballah M, Kenk MA, Muhammad K, Minaee S. Vehicle detection and tracking in adverse weather using a deep learning framework. IEEE Transactions on Intelligent Transportation Systems. 2020; 22(7):4230-42.
[13]Baisa NL. Occlusion-robust online multi-object visual tracking using a GM-PHD filter with CNN-based re-identification. Journal of Visual Communication and Image Representation. 2021; 80:103279.
[14]Wang W, Shen J, Lu X, Hoi SC, Ling H. Paying attention to video object pattern understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020; 43(7):2413-28.
[15]Hou J, Li B. Swimming target detection and tracking technology in video image processing. Microprocessors and Microsystems. 2021; 80:103535.
[16]Razzok M, Badri A, El MI, Ruichek Y, Sahel A. Pedestrian detection and tracking system based on deep-SORT, YOLOv5, and new data association metrics. Information. 2023; 14(4):1-16.
[17]Feng W, Bai L, Yao Y, Yu F, Ouyang W. Towards frame rate agnostic multi-object tracking. International Journal of Computer Vision. 2023:1-20.
[18]Wang S, Li WX, Wang L, Xu LS, Deng QX. VGT-MOT: visibility-guided tracking for online multiple-object tracking. Machine Vision and Applications. 2023; 34(4):1-6.
[19]Wang G, Wang Y, Gu R, Hu W, Hwang JN. Split and connect: a universal tracklet booster for multi-object tracking. IEEE Transactions on Multimedia. 2022; 25:1256-68.
[20]Zhou Y, Chen J, Wang D, Zhu X. Multi-object tracking using context-sensitive enhancement via feature fusion. Multimedia Tools and Applications. 2023:1-20.
[21]Boragule A, Jang H, Ha N, Jeon M. Pixel-guided association for multi-object tracking. Sensors. 2022; 22(22):1-14.
[22]Elhoseny M. Multi-object detection and tracking (MODT) machine learning model for real-time video surveillance systems. Circuits, Systems, and Signal Processing. 2020; 39:611-30.
[23]Jha S, Seo C, Yang E, Joshi GP. Real time object detection and tracking system for video surveillance system. Multimedia Tools and Applications. 2021; 80:3981-96.
[24]Liu D, Cui Y, Chen Y, Zhang J, Fan B. Video object detection for autonomous driving: motion-aid feature calibration. Neurocomputing. 2020; 409:1-11.
[25]Yu H, Huang Y, Pi L, Zhang C, Li X, Wang L. End-to-end video text detection with online tracking. Pattern Recognition. 2021; 113:107791.
[26]Sun S, Akhtar N, Song H, Mian A, Shah M. Deep affinity network for multiple object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019; 43(1):104-19.
[27]Ji Y, Zhang H, Jie Z, Ma L, Wu QJ. CASNet: a cross-attention siamese network for video salient object detection. IEEE Transactions on Neural Networks and Learning Systems. 2020; 32(6):2676-90.
[28]Zhang Y, Sun P, Jiang Y, Yu D, Weng F, Yuan Z, et al. Bytetrack: multi-object tracking by associating every detection box. In European conference on computer vision 2022 (pp. 1-21). Cham: Springer Nature Switzerland.
[29]Wang Y, Kitani K, Weng X. Joint object detection and multi-object tracking with graph neural networks. In international conference on robotics and automation 2021 (pp. 13708-15). IEEE.
[30]Alagarsamy R, Muneeswaran D. Multi-object detection and tracking using reptile search optimization algorithm with deep learning. Symmetry. 2023; 15(6):1-17.
[31]Feng X, Wu HM, Yin YH, Lan LB. CGTracker: center graph network for one-stage multi-pedestrian-object detection and tracking. Journal of Computer Science and Technology. 2022; 37(3):626-40.
[32]Qureshi SA, Hussain L, Chaudhary QU, Abbas SR, Khan RJ, Ali A, et al. Kalman filtering and bipartite matching based super-chained tracker model for online multi object tracking in video sequences. Applied Sciences. 2022; 12(19):1-19.
[33]Wu Y, Sheng H, Zhang Y, Wang S, Xiong Z, Ke W. Hybrid motion model for multiple object tracking in mobile devices. IEEE Internet of Things Journal. 2022; 10(6):4735-48.
[34]Singh D, Srivastava R. An end to end trained hybrid CNN model for multi-object tracking. Multimedia Tools and Applications. 2022; 81(29):42209-21.
[35]Xuan S, Li S, Zhao Z, Zhou Z, Zhang W, Tan H, et al. Rotation adaptive correlation filter for moving object tracking in satellite videos. Neurocomputing. 2021; 438:94-106.
[36]Suljagic H, Bayraktar E, Celebi N. Similarity based person re-identification for multi-object tracking using deep Siamese network. Neural Computing and Applications. 2022; 34(20):18171-82.
[37]Ma L, Zhong Q, Zhang Y, Xie D, Pu S. Associative affinity network learning for multi-object tracking. Frontiers of Information Technology & Electronic Engineering. 2021; 22(9):1194-206.
[38]Xu Y, Ban Y, Delorme G, Gan C, Rus D, Alameda-pineda X. TransCenter: transformers with dense representations for multiple-object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022; 45(6):7820-35.
[39]Wu H, Nie J, Zhu Z, He Z, Gao M. Learning task-specific discriminative representations for multiple object tracking. Neural Computing and Applications. 2023; 35(10):7761-77.
[40]Hu X, Jeon Y. FFTransMOT: feature-fused transformer for enhanced multi-object tracking. IEEE Access. 2023; 11:130060-71.
[41]Chen S, Hu X, Jiang W, Zhou W, Ding X. Novel learning framework for optimal multi-object video trajectory tracking. Virtual Reality & Intelligent Hardware. 2023; 5(5):422-38.
[42]Xiang X, Ren W, Qiu Y, Zhang K, Lv N. Multi-object tracking method based on efficient channel attention and switchable atrous convolution. Neural Processing Letters. 2021; 53(4):2747-63.
[43]Lee J, Jeong M, Ko BC. Graph convolution neural network-based data association for online multi-object tracking. IEEE Access. 2021; 9:114535-46.
[44]Li Y, Wu L, Chen Y, Wang X, Yin G, Wang Z. Motion estimation and multi-stage association for tracking-by-detection. Complex & Intelligent Systems. 2023:1-4.
[45]Chen M, Banitaan S, Maleki M. Enhancing pedestrian group detection and tracking through zone-based clustering. IEEE Access. 2023; 11:132162-79.
[46]Liang H, Wu T, Zhang Q, Zhou H. Non-maximum suppression performs later in multi-object tracking. Applied Sciences. 2022; 12(7):1-11.
[47]Li Y, Liu Y, Zhou C, Xu D, Tao W. A lightweight scheme of deep appearance extraction for robust online multi-object tracking. The Visual Computer. 2023:1-7.
[48]https://motchallenge.net/data/MOT17/. Accessed 15 December 2023.
[49]Walia IS, Kumar D, Sharma K, Hemanth JD, Popescu DE. An integrated approach for monitoring social distancing and face mask detection using stacked Resnet-50 and YOLOv5. Electronics. 2021; 10(23):1-15.
[50]Li B, Lima D. Facial expression recognition via ResNet-50. International Journal of Cognitive Computing in Engineering. 2021; 2:57-64.