ProBAPred: Inferring protein–protein binding affinity by incorporating protein sequence and structural features

School of Computer, Electronic and Information, and State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, 100 Daxue Road, 530004 Nanning, P. R. China

Chen Li

Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia

Qingfeng Chen

E-mail Address: qingfeng@gxu.edu.cn

Corresponding author.

Jiangning Song

http://orcid.org/0000-0001-8031-9086

Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia

Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia

ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, VIC 3800, Australia

E-mail Address: Jiangning.Song@monash.edu

Corresponding author.

https://doi.org/10.1142/S0219720018500117

Abstract

Protein–protein binding is the most prevalent biological activity and mediates a great variety of biological processes. The increasing availability of experimental protein–protein interaction data allows the systematic construction of protein–protein interaction networks, contributing significantly to a better understanding of protein functions and their roles in cellular pathways and human diseases. Whereas the classification of protein–protein interactions (PPIs) is well established, comparatively little work has addressed the estimation of protein–protein binding free energy, a real-valued regression problem that quantitatively characterizes protein–protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein–protein Binding Affinity Predictor), for the quantitative estimation of protein–protein binding affinity. A large number of sequence and structural features, including physical–chemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection with the WEKA package was performed to identify and characterize the most informative feature subsets. On the independent test set, our ensemble method achieved the lowest mean absolute error (MAE = 1.657 kcal/mol) and the second-highest correlation coefficient (R = 0.467) among the compared methods. The datasets, source code and supplementary materials of this study can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the ProBAPred regression models will facilitate computational characterization and experimental studies of protein–protein binding affinity.
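As a rough illustration of the quantities reported above, the Python sketch below (not ProBAPred's code) converts a dissociation constant into a binding free energy via ΔG = RT ln K_d, combines several base regressors by simple averaging, and scores the result with the mean absolute error and a correlation coefficient. Several points are assumptions: the averaging rule is a hypothetical stand-in because the abstract does not say how ProBAPred combines its base models, the temperature is taken as 298.15 K, the reported R is assumed to be Pearson's correlation, and all function names are illustrative.

```python
import math
from statistics import mean

# Gas constant in kcal/(mol*K); temperature assumed to be 298.15 K (25 °C).
R_KCAL = 1.987204e-3
T_KELVIN = 298.15


def dg_from_kd(kd_molar: float, temperature: float = T_KELVIN) -> float:
    """Binding free energy in kcal/mol from a dissociation constant in mol/L
    (standard state 1 M). Tighter binding (smaller Kd) gives a more negative dG."""
    return R_KCAL * temperature * math.log(kd_molar)


def ensemble_average(predictions: list[list[float]]) -> list[float]:
    """Hypothetical ensemble: per-complex mean of several base regressors.
    Each inner list holds one model's predictions for the same complexes."""
    return [mean(values) for values in zip(*predictions)]


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute deviation between measured and predicted dG."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # A 1 nM complex corresponds to roughly -12.3 kcal/mol at 25 °C.
    print(round(dg_from_kd(1e-9), 2))
```

Because MAE is expressed in kcal/mol, an MAE of 1.657 kcal/mol corresponds to an average error of about 1.657 / (RT ln 10) ≈ 1.2 orders of magnitude in K_d at 25 °C.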
