Computing Reviews
Accurate prediction of protein disordered regions by mining protein structure data
Cheng J., Sweredoski M., Baldi P. Data Mining and Knowledge Discovery 11(3): 213-222, 2005. Type: Article
Date Reviewed: Oct 11 2006

Disordered regions play an essential role in proteins, both in their biological properties and behavior and in the methods used to investigate them. It is therefore important to locate and predict these regions. The authors report on their method, program, and results for predicting intrinsically disordered regions in proteins.

The method is based on the structural Protein Data Bank (PDB) available at the University of California. This is not the authors' first project of this kind: the program DISpro is one of several protein data mining tools developed at the Institute for Genomics and Bioinformatics at the University of California, Irvine. These tools are available over the Internet for nonprofit use. Tutorials on bioinformatics topics are also provided, and some of them can help newcomers understand this paper.

The authors have developed a sophisticated method based on recursive neural networks, which uses long-range contextual information when determining a fixed number of weights during learning. Starting from the structural properties, predicted secondary structure class, and predicted relative solvent accessibility of nonredundant protein chains selected from the PDB, the network was trained and tested by ten-fold cross-validation. The resulting network was then tested on CASP5, which contains proteins essentially different from those in the PDB. On CASP5, the prediction precision of DISpro surpasses that of the other predictors tested.
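For readers unfamiliar with ten-fold cross-validation, the following is a minimal sketch of how a set of protein chains might be partitioned so that each chain is held out for testing exactly once. It is not the authors' code: the function name `ten_fold_splits`, the chain identifiers, and the random shuffling are assumptions made purely for illustration.

```python
import random


def ten_fold_splits(chain_ids, k=10, seed=0):
    """Yield (train, test) lists so that every chain is tested exactly once.

    Hypothetical helper: the paper's actual splitting procedure is not
    described in this review.
    """
    ids = list(chain_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    folds = [ids[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, test


# Made-up chain identifiers, standing in for nonredundant PDB chains.
chains = [f"chain_{n:03d}" for n in range(25)]
splits = list(ten_fold_splits(chains))

for train, test in splits:
    # A model would be trained on `train` and evaluated on `test` here.
    assert not set(train) & set(test)
```

Splitting at the level of whole chains, rather than individual residues, keeps residues from the same protein out of both the training and test sets of a given fold.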

Finally, the paper lists some ideas for refining the method, such as treating short and long disordered regions separately (the authors show, using DISpro, that the two behave differently), and also presents predictions for homologous proteins. Further variations of the method could be incorporated into protein tertiary structure prediction. The method might also be used to cross-relate different types of protein databases: structural, pathway, and protein interaction.

Reviewers: K. Balogh, Zsofia Balogh
Review #: CR133422 (0708-0813)
