
MATCHING OF STRUCTURAL MOTIFS USING HASHING ON RESIDUE LABELS AND GEOMETRIC FILTERING FOR PROTEIN FUNCTION PREDICTION

Mark Moll

Department of Computer Science, Rice University, Houston, TX 77005, USA


and

Lydia E. Kavraki

Department of Computer Science, Rice University, Houston, TX 77005, USA

Department of Bioengineering, Rice University, Houston, TX 77005, USA

Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX 77005, USA


https://doi.org/10.1142/9781848162648_0014

Abstract:

There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Our focus is on methods that determine binding site similarity. Although several such methods exist, quickly finding all functionally related matches for structural motifs in large data sets with high specificity remains a challenging problem. In this context, a structural motif is a set of 3D points annotated with physicochemical information that characterizes a molecular function. We propose a new method called LabelHash that creates hash tables of n-tuples of residues for a set of targets. Using these hash tables, we can quickly look up partial matches to a motif and expand those partial matches to complete matches. We show that by applying only very mild geometric constraints we can find statistically significant matches with extremely high specificity in very large data sets and for very general structural motifs. We demonstrate that our method requires a reasonable amount of storage when employing a simple geometric filter, and that it further improves on the specificity of our previous work while maintaining very high sensitivity. We evaluate our algorithm on 20 homolog classes, using a non-redundant version of the Protein Data Bank as the background data set. We use cluster analysis to investigate why certain classes of homologs are more difficult to classify than others. The LabelHash algorithm is implemented on a web server at http://kavrakilab.org/labelhash/.
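The abstract describes the hash-then-expand pipeline only at a high level. As a rough illustration of the general idea (hash residue-label n-tuples, look up partial matches for the first n motif points, then extend them under pairwise-distance constraints), here is a minimal Python sketch. It is not the LabelHash implementation: the data layout, all function names (`build_hash_table`, `partial_matches`, `expand`, `match`), the tuple size `n=3`, the cutoff `d_max` standing in for the "simple geometric filter", and the distance tolerance `eps` are hypothetical choices for illustration, and the sketch omits the statistical significance analysis described in the paper.

```python
from itertools import combinations, permutations, product
from math import dist

# Hypothetical data model (an assumption, not LabelHash's input format):
#   target = list of (residue_label, (x, y, z)) tuples
#   motif  = list of (set_of_allowed_labels, (x, y, z)) tuples


def build_hash_table(targets, n=3, d_max=12.0):
    """Map each sorted label n-tuple to a list of (target_id, residue indices).

    Only n-tuples whose residues lie pairwise within d_max are stored; this
    assumed geometric filter limits how many tuples the table must hold."""
    table = {}
    for tid, residues in targets.items():
        for idx in combinations(range(len(residues)), n):
            coords = [residues[i][1] for i in idx]
            if all(dist(p, q) <= d_max for p, q in combinations(coords, 2)):
                key = tuple(sorted(residues[i][0] for i in idx))
                table.setdefault(key, []).append((tid, idx))
    return table


def consistent(motif, residues, mapping, eps):
    """Check that every mapped pair preserves its motif distance within eps."""
    for (m1, r1), (m2, r2) in combinations(mapping.items(), 2):
        d_motif = dist(motif[m1][1], motif[m2][1])
        d_target = dist(residues[r1][1], residues[r2][1])
        if abs(d_motif - d_target) > eps:
            return False
    return True


def partial_matches(motif, targets, table, n, eps):
    """Yield (target_id, {motif_point: residue_index}) for motif points 0..n-1."""
    seen_keys = set()
    for labels in product(*(motif[i][0] for i in range(n))):
        key = tuple(sorted(labels))
        if key in seen_keys:
            continue
        seen_keys.add(key)
        for tid, idx in table.get(key, []):
            residues = targets[tid]
            # The key is order-free, so try each residue order for points 0..n-1.
            for perm in permutations(idx):
                mapping = dict(zip(range(n), perm))
                if all(residues[r][0] in motif[m][0] for m, r in mapping.items()) \
                        and consistent(motif, residues, mapping, eps):
                    yield tid, mapping


def expand(motif, residues, mapping, eps):
    """Depth-first extension of a partial match to every motif point."""
    m = len(mapping)  # next unmatched motif point (points are matched in order)
    if m == len(motif):
        yield dict(mapping)
        return
    used = set(mapping.values())
    for r, (label, _) in enumerate(residues):
        if r in used or label not in motif[m][0]:
            continue
        mapping[m] = r
        if consistent(motif, residues, mapping, eps):
            yield from expand(motif, residues, mapping, eps)
        del mapping[m]  # backtrack


def match(motif, targets, table, n=3, eps=1.0):
    """Return sorted (target_id, residue indices) for all complete matches."""
    results = set()
    for tid, partial in partial_matches(motif, targets, table, n, eps):
        for full in expand(motif, targets[tid], partial, eps):
            results.add((tid, tuple(full[i] for i in range(len(motif)))))
    return sorted(results)


# Toy example: T1 contains a translated copy of the motif; T2 does not.
targets = {
    "T1": [("HIS", (0, 0, 0)), ("ASP", (3, 0, 0)), ("SER", (0, 4, 0)),
           ("GLY", (10, 10, 10)), ("SER", (20, 20, 20))],
    "T2": [("HIS", (0, 0, 0)), ("ASP", (6, 0, 0)), ("SER", (0, 4, 0))],
}
motif = [({"HIS"}, (1, 1, 1)), ({"ASP", "GLU"}, (4, 1, 1)),
         ({"SER"}, (1, 5, 1)), ({"GLY"}, (11, 11, 11))]
table = build_hash_table(targets)
result = match(motif, targets, table)
print(result)  # [('T1', (0, 1, 2, 3))]
```

Sorting the labels makes each key independent of residue order, so one table entry covers every ordering of a tuple, at the price of a permutation check during lookup. In this sketch, raising `d_max` stores more tuples, while lowering it can miss motifs whose first `n` points are spread farther apart than the cutoff.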

Computational Systems Bioinformatics
