Chapter 8: Machine Learning for Intrinsic Disorder Prediction
Intrinsic disorder in proteins is manifested by regions that lack stable structure under physiological conditions. Proteins with disordered regions are common across all kingdoms of life. These proteins facilitate many essential cellular functions and contribute to the dark proteome. They are associated with a wide spectrum of human diseases and, consequently, are considered promising drug targets. Disordered regions have unique sequence signatures that distinguish them from structured protein sequences. Computational disorder prediction, a vibrant research area with over 40 years of history, relies heavily on machine learning (ML) algorithms and on innovations such as meta-learning and deep learning. We summarize a comprehensive collection of 73 ML-based disorder predictors, detail several of the most successful methods, and survey related resources that predict disorder and disorder functions. We discuss historical trends in the development of disorder predictors, highlighting the shift in focus from traditional ML methods to meta-predictors and, most recently, to deep neural networks. We introduce a wide range of useful resources that support the prediction of disorder and disorder functions, including databases, webservers, and methods that assess the quality of disorder predictions. The availability of these numerous high-quality methods and resources ensures that computational disorder prediction will continue to make a substantial impact on key research areas, including rational drug design, structural genomics, and medicine.