PSSM amino-acid composition based rules for gene identification
Heena Farooq Bhat and M. Arif Wani
Abstract
A major aspect of understanding the molecular mechanisms of the cell is determining the function of each protein encoded in the genome. Genome annotation supports this goal, and one of its essential phases is gene prediction. Several methods have been developed to locate or predict gene patterns in a genome sequence, yet gene recognition remains a difficult problem: recognizing the gene that corresponds to a given protein sequence with conventional tools is error prone. In this paper, we first discuss the problem of gene prediction and its challenges. We then present a new method for identifying genes that follows a two-step procedure. First, new features are extracted from protein sequences; these features are derived from a position specific scoring matrix (PSSM), and the PSSM profiles are converted into a uniform numeric representation. Second, a new structured approach is applied to the PSSM vector, using a decision tree based technique to obtain rules. The rules derived by the algorithm correspond to genes. The method has been demonstrated on the genome DNAset dataset, and the experimental results show that the new approach produces better results.
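As a rough illustration of the first step, the sketch below shows one possible way to turn a variable-length PSSM into a fixed-length numeric vector. It is not the authors' implementation: the function name `pssm_to_vector`, the logistic scaling, the column averaging, and the assumed PSI-BLAST column order are all assumptions made for this example.

```python
import math

# Assumed column order of a PSI-BLAST ASCII PSSM (hypothetical for this sketch).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_to_vector(pssm):
    """Collapse an L x 20 PSSM (list of rows) into a 20-value vector.

    Each raw score is squashed into (0, 1) with a logistic function,
    then every column is averaged over the sequence length, so proteins
    of different lengths map to vectors of the same size.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    if any(len(row) != len(AMINO_ACIDS) for row in pssm):
        raise ValueError("each PSSM row must hold 20 scores")

    # Logistic scaling of every score (an assumed normalisation choice).
    scaled = [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]

    # Column-wise mean over all sequence positions.
    n_rows = len(scaled)
    return [
        sum(row[col] for row in scaled) / n_rows
        for col in range(len(AMINO_ACIDS))
    ]
```

Vectors produced this way all have the same length, so they could be passed to any decision tree learner to induce rules; which learner and which splitting criterion the paper uses is not stated in the abstract.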
Keywords
Gene prediction, Classification, Feature extraction, Binding proteins, Rule induction, PSSM.
Cite this article
Bhat HF, Wani MA. PSSM amino-acid composition based rules for gene identification. International Journal of Advanced Technology and Engineering Exploration. 2018; 5(46):318-325. DOI:10.19101/IJATEE.2018.546018