Chapter 10: Machine Learning Methods for Predicting Protein-Nucleic Acids Interactions
Protein-nucleic acid interactions drive many key cellular functions, such as regulation of gene expression, transcription, and translation. Experimental characterization of the molecular-level details of these interactions is relatively expensive and time-consuming because it requires complex and labor-intensive methods, such as X-ray crystallography and/or NMR spectroscopy. Given the relatively low coverage of experimental molecular-level data on protein-nucleic acid interactions, many computational methods have been developed that predict these interactions from readily available protein sequences. We introduce and describe a comprehensive collection of 51 methods that predict nucleic acid-interacting amino acids in protein sequences. These methods include 20 DNA-binding predictors, 20 RNA-binding predictors, and 11 methods that predict both DNA- and RNA-binding residues. We briefly summarize their inputs, predictive architectures, outputs, and availability. We find that most of these methods were trained on structures of protein-nucleic acid complexes, whereas only a limited number of methods predict these interactions in intrinsically disordered regions. We observe that these methods rely almost exclusively on classical (shallow) machine learning and deep learning algorithms. Finally, we recommend five recent, readily available, and arguably more useful predictors.