Chapter 13: Databases of Protein Structure and Function Predictions at the Amino Acid Level
The number of known protein sequences is growing far faster than these proteins can be annotated functionally and structurally. Accurate and fast computational predictors can help close the resulting large and widening gap in the amino acid (AA)-level annotations of protein structure and function. Hundreds of sequence-based predictors of AA-level annotations have been developed, which makes it challenging for end users to identify suitable predictors and collect their results. One convenient solution is to obtain pre-computed predictions from large-scale databases, which include MobiDB, D2P2 and DescribePROT. These databases provide access to a diverse set of structural and functional characteristics, such as domains, secondary structures, solvent accessibility, intrinsic disorder, posttranslational modifications (PTMs), protein/DNA/RNA-binding AAs, disordered linkers and signal peptides. We motivate and introduce these databases, discuss and compare their contents, and comment on their applications and limitations. We find that their scope and services are complementary: D2P2 delivers comprehensive annotations of domains and PTMs, MobiDB focuses on intrinsic disorder and is highly connected to other resources, and DescribePROT covers the most diverse set of structural and functional features. We briefly examine practical applications of some of the structural predictions covered by these databases. We also concisely discuss modern predictive webservers that users can turn to when they need AA-level annotations for proteins that are not included in these databases.