MDL AND THE STATISTICAL MECHANICS OF PROTEIN POTENTIALS
The combination of a wealth of structural data and impressive computational power provides detailed information on the structure and dynamics of biomacromolecules. A natural step is to incorporate this information into models to improve predictions of protein folding and stability. There has been considerable recent interest in developing “knowledge-based” potentials to describe internal interactions in proteins. In these approaches, probability distribution functions are inferred from existing structural knowledge. A common assumption has been the “quasi-chemical approximation”, or “Boltzmann device”, which relates statistical mechanical probabilities to frequencies observed in known structures. Here the validity of this approach is examined in detail from the standpoint of statistical mechanics. Because statistical mechanics is a form of statistical inference based on a lack of knowledge of the system, the “Boltzmann device” has no rigorous theoretical justification. In the present work, a statistical mechanics based on partial knowledge of the system is employed instead. This statistical mechanical scheme uses the minimum description length (MDL) of phase space as its main tool. With this approach, “knowledge-based” potentials can be derived rigorously. In practical calculations, these potentials are best obtained using Bayesian inference methods similar to those used in image reconstruction.
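For readers unfamiliar with the “Boltzmann device” discussed above, it is commonly written as an inverse Boltzmann relation, E_i = -k_B T ln(f_obs,i / f_ref,i), where f_obs,i is the frequency of a structural feature i (for example, a residue-pair distance bin) in known structures and f_ref,i is its frequency in some reference state. The Python sketch below illustrates that conventional relation only; it is not the MDL-based scheme proposed here. The reference-state counts, the pseudocount, the temperature, and the function name `boltzmann_device` are all illustrative assumptions, not choices taken from this work.

```python
import math

# Boltzmann constant in kcal/(mol*K); the choice of energy units is an assumption.
K_B = 0.0019872041


def boltzmann_device(observed_counts, reference_counts, temperature=300.0, pseudocount=1.0):
    """Hypothetical sketch of the inverse-Boltzmann ("quasi-chemical") estimate.

    For each bin i, returns E_i = -k_B T ln(f_obs_i / f_ref_i), where the
    frequencies are estimated from counts. A pseudocount is added to every bin
    (an assumed regularization) so that empty bins do not give log(0).
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("count vectors must have equal length")
    n_bins = len(observed_counts)
    # Normalizers include the pseudocounts so that each frequency vector sums to 1.
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    kt = K_B * temperature
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        f_obs = (obs + pseudocount) / obs_total
        f_ref = (ref + pseudocount) / ref_total
        # Bins over-represented relative to the reference state get negative energies.
        energies.append(-kt * math.log(f_obs / f_ref))
    return energies
```

For example, `boltzmann_device([30, 10], [20, 20])` assigns a negative energy to the first (enriched) bin and a positive energy to the second (depleted) bin. The result depends entirely on the assumed reference state and on treating observed frequencies as Boltzmann probabilities, which is precisely the assumption questioned above.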