AN ANALYSIS OF INFORMATION CONTENT PRESENT IN PROTEIN-DNA INTERACTIONS
Understanding the role proteins play in regulating DNA replication is essential to forming a complete picture of how the genome manifests itself. In this work, we examine the feasibility of predicting the residues of a protein essential to binding by analyzing protein-DNA interactions from an information theoretic perspective. Through the lens of mutual information, we explore which properties of protein sequence and structure are most useful in determining binding residues with a particular focus on sequence features. We find that the quantity of information carried in most features is small with respect to DNA-contacting residues, the bulk being provided by sequence features along with a select few structural features. Supplemental information for this article is available at http://www.cs.umn.edu/~kauffman/supplements/psb2008