Chapter 12: Designing Effective Predictors of Protein Post-Translational Modifications Using iLearnPlus
Post-translational modifications (PTMs) play vital roles in a myriad of biological processes, such as metabolism, DNA damage response, transcriptional regulation, protein-protein interactions, cell death, immune response, signaling pathways and aging. Identification of PTM sites is a crucial first step for biochemical, pathological and pharmaceutical studies associated with the functional characterization of proteins. However, experimental approaches for identifying PTM sites are relatively expensive, labor-intensive and time-consuming, partly due to the dynamics and reversibility of PTMs. In this context, computational methods that accurately predict PTMs serve as a useful alternative, especially for large-scale, whole-proteome annotation. We briefly summarize and review existing predictors of PTM sites in protein sequences. Moreover, we introduce the iLearnPlus platform, which facilitates the development of new predictive methods, and apply it to generate a new PTM predictor. We describe a detailed procedure for the development of predictive models, with a particular focus on deep learning (DL) techniques. We assess the predictive performance of the developed DL model and demonstrate how to compare it against other machine learning algorithms. While we use iLearnPlus in the context of PTM prediction, we emphasize that this platform can be used to design predictive systems for a broad spectrum of related problems, including the prediction of structural and functional characteristics of proteins and nucleic acids from their sequences.