ResNet50-deep affinity network for object detection and tracking in videos
Nandeeswar Sampigehalli Basavaraju and Pallavi Hallappanavar Basavaraja
Abstract
Multiple-object tracking (MOT) plays a crucial role in addressing many fundamental challenges within the fields of computer vision and video analysis. Most MOT methods rely on two primary processes: object detection and data association. Each video frame is first analyzed to detect objects, and the detected objects are then associated across frames to generate their tracks. However, the data association step often relies on manually defined criteria such as motion, appearance, grouping, and spatial proximity. In this study, the ResNet50-deep affinity network (DAN) was introduced for detecting and tracking objects in videos, including objects that appear and disappear between frames. The proposed method was evaluated on the widely recognized MOT17 dataset to address MOT challenges. During the preprocessing phase, photometric distortion correction, frame expansion, and cropping were performed. The ResNet50 model was utilized to extract features, and the DAN was employed to identify object appearances in video frames and to compute their cross-frame affinities (CFA). To validate its efficiency, the ResNet50-DAN method was compared with existing approaches, including DAN, ByteTrack, the graph neural network for simultaneous detection and tracking (GSDT), the reptile search optimization algorithm with deep learning-based multiple object detection and tracking (RSOADL-MODT), the center graph network (CGTracker), the hybrid motion model, FlowNet2-deep learning (FlowNet2-DL), and the super chained tracker (SCT). The ResNet50-DAN method achieved superior results, with a multiple-object tracking accuracy (MOTA) of 84.2%, an identification F1 score (IDF1) of 80.3%, 10,352 false positives (FP), and 1,284 identity switches (ID-Sw), demonstrating higher MOTA than all of the existing approaches considered.
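A minimal PyTorch sketch of the pipeline summarized above: ResNet50 supplies per-frame features, and a DAN-style affinity head scores every pair of objects from two frames to produce the cross-frame affinity (CFA) matrix used for association. The layer widths, the centre-sampling of object features, and the two-layer affinity head are illustrative assumptions, not the authors' exact configuration.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class ResNet50AffinityExtractor(nn.Module):
        """Sketch of a ResNet50 backbone feeding a DAN-style affinity head.
        All sizes are hypothetical and chosen only for illustration."""

        def __init__(self, feat_dim=512):
            super().__init__()
            backbone = models.resnet50(weights=None)  # random init for the sketch
            # Keep the convolutional stages only; drop average pooling and the classifier.
            self.backbone = nn.Sequential(*list(backbone.children())[:-2])
            self.reduce = nn.Conv2d(2048, feat_dim, kernel_size=1)
            # Affinity estimator: maps a pair of object feature vectors to one score.
            self.affinity_head = nn.Sequential(
                nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def object_features(self, frame, centers):
            """Sample one feature vector per detected object centre.
            frame: (1, 3, H, W) tensor; centers: iterable of (x, y) pixel coords."""
            fmap = self.reduce(self.backbone(frame))          # (1, C, h, w)
            _, _, h, w = fmap.shape
            scale_x = w / frame.shape[3]
            scale_y = h / frame.shape[2]
            feats = []
            for cx, cy in centers:
                ix = min(int(cx * scale_x), w - 1)
                iy = min(int(cy * scale_y), h - 1)
                feats.append(fmap[0, :, iy, ix])
            return torch.stack(feats)                         # (N, C)

        def cross_frame_affinity(self, feats_t, feats_t1):
            """Score every pairing of objects in frame t and frame t+1,
            yielding an (N_t x N_t1) cross-frame affinity (CFA) matrix."""
            n_t, n_t1 = feats_t.shape[0], feats_t1.shape[0]
            pairs = torch.cat([
                feats_t.unsqueeze(1).expand(n_t, n_t1, -1),
                feats_t1.unsqueeze(0).expand(n_t, n_t1, -1),
            ], dim=-1)                                        # (N_t, N_t1, 2C)
            return self.affinity_head(pairs).squeeze(-1)      # (N_t, N_t1)

In use, object_features would be called on two consecutive frames with their detected object centres, and cross_frame_affinity would yield the CFA matrix; an assignment step such as Hungarian matching on the negated matrix would then link detections across frames into tracks.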
Keywords
Deep tracking, Multiple object tracking, Object detection, Online tracking, Tracking challenge, Video surveillance.
Cite this article
Basavaraju NS, Basavaraja PH. ResNet50-deep affinity network for object detection and tracking in videos. International Journal of Advanced Technology and Engineering Exploration. 2024;11(111):190-204. DOI:10.19101/IJATEE.2023.10102017
References
[1]Pramanik A, Pal SK, Maiti J, Mitra P. Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Transactions on Emerging Topics in Computational Intelligence. 2021; 6(1):171-81.
[2]Deng J, Pan Y, Yao T, Zhou W, Li H, Mei T. Single shot video object detector. IEEE Transactions on Multimedia. 2020; 23:846-58.
[3]Mao H, Chen Y, Li Z, Chen P, Chen F. SCTracker: multi-object tracking with shape and confidence constraints. IEEE Sensors Journal. 2023; 24(3):3123-30.
[4]Ariza-sentís M, Baja H, Vélez S, Valente J. Object detection and tracking on UAV RGB videos for early extraction of grape phenotypic traits. Computers and Electronics in Agriculture. 2023; 211:108051.
[5]Gao X, Wang Z, Wang X, Zhang S, Zhuang S, Wang H. DetTrack: an algorithm for multiple object tracking by improving occlusion object detection. Electronics. 2023; 13(1):1-16.
[6]Li J, Piao Y. Multi-object tracking based on re-identification enhancement and associated correction. Applied Sciences. 2023; 13(17):1-16.
[7]Azimjonov J, Özmen A. A real-time vehicle detection and a novel vehicle tracking systems for estimating and monitoring traffic flow on highways. Advanced Engineering Informatics. 2021; 50:101393.
[8]Lu X, Ma C, Ni B, Yang X. Adaptive region proposal with channel regularization for robust object tracking. IEEE Transactions on Circuits and Systems for Video Technology. 2019; 31(4):1268-82.
[9]Fernández-sanjurjo M, Mucientes M, Brea VM. Real-time multiple object visual tracking for embedded GPU systems. IEEE Internet of Things Journal. 2021; 8(11):9177-88.
[10]Yu E, Li Z, Han S, Wang H. Relationtrack: relation-aware multiple object tracking with decoupled representation. IEEE Transactions on Multimedia. 2022; 25: 2686-97.
[11]Gu S, Ma J, Hui G, Xiao Q, Shi W. STMT: spatio-temporal memory transformer for multi-object tracking. Applied Intelligence. 2023; 53(20):23426-41.
[12]Hassaballah M, Kenk MA, Muhammad K, Minaee S. Vehicle detection and tracking in adverse weather using a deep learning framework. IEEE Transactions on Intelligent Transportation Systems. 2020; 22(7):4230-42.
[13]Baisa NL. Occlusion-robust online multi-object visual tracking using a GM-PHD filter with CNN-based re-identification. Journal of Visual Communication and Image Representation. 2021; 80:103279.
[14]Wang W, Shen J, Lu X, Hoi SC, Ling H. Paying attention to video object pattern understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020; 43(7):2413-28.
[15]Hou J, Li B. Swimming target detection and tracking technology in video image processing. Microprocessors and Microsystems. 2021; 80:103535.
[16]Razzok M, Badri A, El MI, Ruichek Y, Sahel A. Pedestrian detection and tracking system based on deep-SORT, YOLOv5, and new data association metrics. Information. 2023; 14(4):1-16.
[17]Feng W, Bai L, Yao Y, Yu F, Ouyang W. Towards frame rate agnostic multi-object tracking. International Journal of Computer Vision. 2023:1-20.
[18]Wang S, Li WX, Wang L, Xu LS, Deng QX. VGT-MOT: visibility-guided tracking for online multiple-object tracking. Machine Vision and Applications. 2023; 34(4):1-6.
[19]Wang G, Wang Y, Gu R, Hu W, Hwang JN. Split and connect: a universal tracklet booster for multi-object tracking. IEEE Transactions on Multimedia. 2022; 25:1256-68.
[20]Zhou Y, Chen J, Wang D, Zhu X. Multi-object tracking using context-sensitive enhancement via feature fusion. Multimedia Tools and Applications. 2023:1-20.
[21]Boragule A, Jang H, Ha N, Jeon M. Pixel-guided association for multi-object tracking. Sensors. 2022; 22(22):1-14.
[22]Elhoseny M. Multi-object detection and tracking (MODT) machine learning model for real-time video surveillance systems. Circuits, Systems, and Signal Processing. 2020; 39:611-30.
[23]Jha S, Seo C, Yang E, Joshi GP. Real time object detection and tracking system for video surveillance system. Multimedia Tools and Applications. 2021; 80:3981-96.
[24]Liu D, Cui Y, Chen Y, Zhang J, Fan B. Video object detection for autonomous driving: motion-aid feature calibration. Neurocomputing. 2020; 409:1-11.
[25]Yu H, Huang Y, Pi L, Zhang C, Li X, Wang L. End-to-end video text detection with online tracking. Pattern Recognition. 2021; 113:107791.
[26]Sun S, Akhtar N, Song H, Mian A, Shah M. Deep affinity network for multiple object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019; 43(1):104-19.
[27]Ji Y, Zhang H, Jie Z, Ma L, Wu QJ. CASNet: a cross-attention siamese network for video salient object detection. IEEE Transactions on Neural Networks and Learning Systems. 2020; 32(6):2676-90.
[28]Zhang Y, Sun P, Jiang Y, Yu D, Weng F, Yuan Z, et al. Bytetrack: multi-object tracking by associating every detection box. In European conference on computer vision 2022 (pp. 1-21). Cham: Springer Nature Switzerland.
[29]Wang Y, Kitani K, Weng X. Joint object detection and multi-object tracking with graph neural networks. In international conference on robotics and automation 2021 (pp. 13708-15). IEEE.
[30]Alagarsamy R, Muneeswaran D. Multi-object detection and tracking using reptile search optimization algorithm with deep learning. Symmetry. 2023; 15(6):1-17.
[31]Feng X, Wu HM, Yin YH, Lan LB. CGTracker: center graph network for one-stage multi-pedestrian-object detection and tracking. Journal of Computer Science and Technology. 2022; 37(3):626-40.
[32]Qureshi SA, Hussain L, Chaudhary QU, Abbas SR, Khan RJ, Ali A, et al. Kalman filtering and bipartite matching based super-chained tracker model for online multi object tracking in video sequences. Applied Sciences. 2022; 12(19):1-19.
[33]Wu Y, Sheng H, Zhang Y, Wang S, Xiong Z, Ke W. Hybrid motion model for multiple object tracking in mobile devices. IEEE Internet of Things Journal. 2022; 10(6):4735-48.
[34]Singh D, Srivastava R. An end to end trained hybrid CNN model for multi-object tracking. Multimedia Tools and Applications. 2022; 81(29):42209-21.
[35]Xuan S, Li S, Zhao Z, Zhou Z, Zhang W, Tan H, et al. Rotation adaptive correlation filter for moving object tracking in satellite videos. Neurocomputing. 2021; 438:94-106.
[36]Suljagic H, Bayraktar E, Celebi N. Similarity based person re-identification for multi-object tracking using deep Siamese network. Neural Computing and Applications. 2022; 34(20):18171-82.
[37]Ma L, Zhong Q, Zhang Y, Xie D, Pu S. Associative affinity network learning for multi-object tracking. Frontiers of Information Technology & Electronic Engineering. 2021; 22(9):1194-206.
[38]Xu Y, Ban Y, Delorme G, Gan C, Rus D, Alameda-pineda X. TransCenter: transformers with dense representations for multiple-object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022; 45(6):7820-35.
[39]Wu H, Nie J, Zhu Z, He Z, Gao M. Learning task-specific discriminative representations for multiple object tracking. Neural Computing and Applications. 2023; 35(10):7761-77.
[40]Hu X, Jeon Y. FFTransMOT: feature-fused transformer for enhanced multi-object tracking. IEEE Access. 2023; 11:130060-71.
[41]Chen S, Hu X, Jiang W, Zhou W, Ding X. Novel learning framework for optimal multi-object video trajectory tracking. Virtual Reality & Intelligent Hardware. 2023; 5(5):422-38.
[42]Xiang X, Ren W, Qiu Y, Zhang K, Lv N. Multi-object tracking method based on efficient channel attention and switchable atrous convolution. Neural Processing Letters. 2021; 53(4):2747-63.
[43]Lee J, Jeong M, Ko BC. Graph convolution neural network-based data association for online multi-object tracking. IEEE Access. 2021; 9:114535-46.
[44]Li Y, Wu L, Chen Y, Wang X, Yin G, Wang Z. Motion estimation and multi-stage association for tracking-by-detection. Complex & Intelligent Systems. 2023:1-4.
[45]Chen M, Banitaan S, Maleki M. Enhancing pedestrian group detection and tracking through zone-based clustering. IEEE Access. 2023; 11:132162-79.
[46]Liang H, Wu T, Zhang Q, Zhou H. Non-maximum suppression performs later in multi-object tracking. Applied Sciences. 2022; 12(7):1-11.
[47]Li Y, Liu Y, Zhou C, Xu D, Tao W. A lightweight scheme of deep appearance extraction for robust online multi-object tracking. The Visual Computer. 2023:1-7.
[48]https://motchallenge.net/data/MOT17/. Accessed 15 December 2023.
[49]Walia IS, Kumar D, Sharma K, Hemanth JD, Popescu DE. An integrated approach for monitoring social distancing and face mask detection using stacked Resnet-50 and YOLOv5. Electronics. 2021; 10(23):1-15.
[50]Li B, Lima D. Facial expression recognition via ResNet-50. International Journal of Cognitive Computing in Engineering. 2021; 2:57-64.