A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition
Date
2014-12
Type
Article
Publisher
SCIENCE & INFORMATION SAI ORGANIZATION LTD
Series Info
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS; Volume: 5 Issue: 12 Pages: 97-106
Abstract
Protein fold recognition plays an important role in computational protein analysis because it can determine the function of proteins whose structure is unknown. In this paper, a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF) is proposed. The CSPF technique consists of two main phases: the sequential pattern mining phase and the fold recognition phase. In the sequential pattern mining phase, which serves as the training phase, the Mix & Test algorithm is developed based on Grammatical Inference. The Mix & Test algorithm minimizes I/O costs by scanning the database only once, discovers subsequence combinations directly from sequences in memory without searching the whole sequence file, requires no database projection, handles gaps, and works with variable-length sequences without having to align them. In addition, a parallelized version of the Mix & Test algorithm is applied to speed it up. In the fold recognition phase, the folds of unknown proteins are predicted via a proposed testing function. To test the performance, 36 SCOP protein folds are used; the accuracy rate is 75.84% for training data and 59.7% for testing data.
Description
Accession Number: WOS:000219167700014
Keywords
protein fold recognition, sequential mining, grammatical inference, data mining