Text independent voiceprint recognition model based on I-vector
Jing Zhang and Minfeng Yao
Abstract
The commonly used text-independent voiceprint recognition models are the Gaussian mixture model (GMM) and the GMM with a universal background model (GMM-UBM). The mean supervector of a GMM contains both speaker information and channel information, which makes recognition systems based on the GMM and GMM-UBM models unstable. In addition, their cross-channel recognition ability is poor, and both models are limited by the maximum likelihood criterion, so their ability to discriminate between classes is weak. The i-vector, also known as the identity vector, has been proposed in recent years on the basis of the Gaussian supervector. The method uses a single space in place of two separate spaces; this single space contains both the differences between speakers and the differences between channels. It is regarded as the most advanced speaker modeling technique currently available. Therefore, this paper adopts the i-vector framework as the speaker recognition model and studies the main problems that need to be addressed. The recognition performance of the GMM-UBM model and the i-vector model is also compared experimentally. The comparison experiments verify that the i-vector recognition model achieves a lower error rate and is more efficient. In the recognition phase, only two seconds of recorded speech are needed to quickly recognize the speaker's identity, and the system recognition accuracy reaches 97%.
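The contrast between two spaces and one space can be written compactly. The following is a sketch in the notation commonly used in the joint factor analysis and i-vector literature; none of the symbols are defined in this abstract, so they should be read as assumptions: M is the speaker- and channel-dependent GMM mean supervector, m is the UBM mean supervector, V and U are the speaker and channel subspace matrices, D is a diagonal residual matrix, T is the total variability matrix, and w is the i-vector.

```latex
% Two-space model (joint factor analysis): separate speaker and channel subspaces
M = m + Vy + Ux + Dz

% One-space model (i-vector): a single total variability space
M = m + Tw, \qquad w \sim \mathcal{N}(0, I)
```

Because T captures speaker and channel variability together, channel effects are usually compensated afterwards on the low-dimensional vector w rather than inside the model.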
Keywords
Speaker recognition, Text-independent, I-vector, EER.
Cite this article
Zhang J, Yao M. Text independent voiceprint recognition model based on I-vector. International Journal of Advanced Technology and Engineering Exploration. 2020; 7(62):1-10. DOI:10.19101/IJATEE.2019.650076