International Journal of Advanced Technology and Engineering Exploration (IJATEE) ISSN (P): 2394-5443 ISSN (O): 2394-7454 Vol - 9, Issue - 86, January 2022
Feature-driven label generation for congestion detection in smart cities under big data

Aamish Izhar, Ajay Rastogi, Syed Shafat Ali, S. M. K. Quadri and S. A. M. Rizvi

Abstract

Due to rapid urbanization and the emergence of smart cities, traffic congestion has become a major issue for smart city planners. Traffic congestion prediction is therefore needed to reduce congestion effectively and enhance road capacity. Various studies have tried to solve the problem of traffic congestion. However, it is difficult to judge the effectiveness of such studies properly, given the absence of properly labeled datasets. Additionally, current studies use datasets with relatively few data instances, which do not reflect the big data nature of traffic data. Motivated by these problems and challenges, in this paper we study the problem of traffic congestion with respect to effective label generation from a big data perspective. Essentially, we provide two sound and intuitive techniques for label generation which help in the correct annotation of unlabeled data. One technique is based on the number of vehicles plying on the road, and the other is based on a combination of average speed and number of vehicles. For this purpose, we consider the publicly available CityPulse traffic dataset with 13.5 million data instances. Using our techniques, we generate “congested” and “not-congested” labels indicating whether or not there is congestion on the road. To tackle the class imbalance problem, besides using random undersampling and oversampling, we also introduce a mixture of the two techniques to negate any bias inherent to either individual sampling technique. To test the effectiveness of our label generation approaches, we make extensive use of various machine learning techniques, and for performance evaluation we use the standard classification evaluation metrics. Finally, we compare our techniques with a previous work which considered only average speed for label generation.
Our results demonstrate the effectiveness of the proposed approaches against the comparison method. For example, with random undersampling, the F1-score of every classifier under the proposed techniques is close to 1, whereas under the comparison method the F1-score is as low as 0.70 for the multinomial naïve Bayes (MNB) classifier and 0.88 for the support vector machine (SVM). Similarly, with oversampling, our approaches achieve an F1-score close to 1 across all the classifiers, whereas the comparison method falls as low as 0.70 for MNB. The same trend can be seen with the mixture of both sampling techniques.
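
The two labeling ideas and the mixed sampling step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the thresholds, the exact rule for combining average speed with vehicle count, or the proportions of the undersampling and oversampling mixture. The threshold parameters, the "slow and crowded" rule, the midpoint target in `mixed_resample`, and the toy readings are all assumptions made for illustration.

```python
import random
from collections import Counter

CONGESTED = "congested"
NOT_CONGESTED = "not-congested"


def label_by_count(vehicle_count, count_threshold):
    """Label from vehicle count alone (count_threshold is a hypothetical parameter)."""
    return CONGESTED if vehicle_count >= count_threshold else NOT_CONGESTED


def label_by_speed_and_count(avg_speed, vehicle_count, speed_threshold, count_threshold):
    """Label from both features. Assumed rule: many vehicles moving slowly."""
    slow = avg_speed <= speed_threshold
    crowded = vehicle_count >= count_threshold
    return CONGESTED if slow and crowded else NOT_CONGESTED


def mixed_resample(records, labels, seed=0):
    """Mix of random under- and oversampling (assumed scheme): both classes
    are brought to the midpoint of the two original class sizes."""
    rng = random.Random(seed)
    by_label = {}
    for record, label in zip(records, labels):
        by_label.setdefault(label, []).append(record)
    if len(by_label) != 2:
        raise ValueError("expected exactly two classes")
    target = sum(len(group) for group in by_label.values()) // 2
    out_records, out_labels = [], []
    for label, group in by_label.items():
        if len(group) >= target:
            # Majority class: random undersampling without replacement.
            chosen = rng.sample(group, target)
        else:
            # Minority class: keep all records, then duplicate random ones.
            chosen = group + rng.choices(group, k=target - len(group))
        out_records.extend(chosen)
        out_labels.extend([label] * target)
    return out_records, out_labels


# Toy readings as (average speed, vehicle count); the values are made up.
readings = [(60, 2), (55, 3), (50, 1), (58, 4), (12, 30), (45, 5), (10, 25), (62, 0)]
labels = [
    label_by_speed_and_count(speed, count, speed_threshold=20, count_threshold=20)
    for speed, count in readings
]
balanced_records, balanced_labels = mixed_resample(readings, labels, seed=42)
print(Counter(labels))           # class counts before resampling
print(Counter(balanced_labels))  # class counts after resampling
```

Balancing toward the midpoint discards fewer majority records than pure undersampling and duplicates fewer minority records than pure oversampling, which is one plausible reading of how a mixture could offset the bias of each technique.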

Keywords

Smart cities, Big data, Label generation, Classification, Traffic congestion.

Cite this article

Izhar A, Rastogi A, Ali SS, Quadri SM, Rizvi SA
