Electron cryo-microscopy is a fast-advancing biophysical technique for deriving three-dimensional structures of large protein complexes. Using this technique, many density maps have been generated at intermediate resolution, such as 6–10 Å. Although it is challenging to derive the backbone of the protein directly from such density maps, secondary structure elements such as helices and β-sheets can be computationally detected. Our work in this paper provides an approach to enumerate the top-ranked possible topologies instead of enumerating the entire population of topologies. This approach is particularly practical for large proteins. We developed a directed weighted graph, the topology graph, to represent the secondary structure assignment problem. We prove that the problem of finding the valid topology with the minimum cost is NP-hard. We developed an O(N^2 · 2^N) dynamic programming algorithm to identify the topology with the minimum cost. A test on 15 proteins suggests that our dynamic programming approach can handle much larger proteins than was previously feasible. The largest protein in the test contains 33 helices, of which 18 were detected as helical sticks in the density map.
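To make the shape of such a subset dynamic program concrete, the following is a minimal sketch and not the authors' implementation. It assumes, for simplicity, that the number of sequence segments equals the number N of detected sticks, that each stick can be traversed in one of two directions, and that the costs are supplied by the caller. The names `min_cost_topology`, `node_cost`, and `edge_cost` are hypothetical. A state records which sticks are already used plus the last stick and its direction, which gives on the order of N · 2^N states with N transitions each.

```python
def min_cost_topology(node_cost, edge_cost):
    """Held-Karp style search over stick subsets (illustrative sketch).

    node_cost[i][j][d]: assumed cost of placing sequence segment i on
        stick j with direction d (0 or 1).
    edge_cost(i, j, d, k, e): assumed cost of the loop joining segment i
        on stick j (direction d) to segment i + 1 on stick k (direction e).
    Returns (minimum cost, [(stick, direction) for each segment in order]).
    """
    n = len(node_cost)
    inf = float("inf")
    best = {}    # (used-stick bitmask, last stick, last direction) -> cost
    parent = {}  # same key -> previous state, for backtracking

    # Segment 0 may start on any stick, in either direction.
    for j in range(n):
        for d in (0, 1):
            best[(1 << j, j, d)] = node_cost[0][j][d]
            parent[(1 << j, j, d)] = None

    # Masks are visited in increasing order; adding a stick always gives a
    # larger mask, so every state is final before it is extended.
    for mask in range(1, 1 << n):
        i = bin(mask).count("1") - 1  # index of the last segment placed
        if i + 1 >= n:
            continue
        for j in range(n):
            if not mask & (1 << j):
                continue
            for d in (0, 1):
                cur = best.get((mask, j, d))
                if cur is None:
                    continue
                for k in range(n):
                    if mask & (1 << k):
                        continue  # stick k is already assigned
                    for e in (0, 1):
                        cand = cur + edge_cost(i, j, d, k, e) + node_cost[i + 1][k][e]
                        key = (mask | (1 << k), k, e)
                        if cand < best.get(key, inf):
                            best[key] = cand
                            parent[key] = (mask, j, d)

    full = (1 << n) - 1
    end = min(
        ((full, j, d) for j in range(n) for d in (0, 1)),
        key=lambda s: best.get(s, inf),
    )

    # Walk the parent links back to the first segment.
    path, state = [], end
    while state is not None:
        path.append((state[1], state[2]))
        state = parent[state]
    path.reverse()
    return best[end], path


# Toy input: segment i fits stick target[i] in direction 0 at cost 0,
# every other placement costs 5, and every loop costs 1.
target = [2, 0, 1]
toy_node_cost = [
    [[0 if (j == target[i] and d == 0) else 5 for d in (0, 1)] for j in range(3)]
    for i in range(3)
]
toy_cost, toy_assignment = min_cost_topology(toy_node_cost, lambda i, j, d, k, e: 1)
```

On the toy input, the only assignment with no placement penalty is the one given by `target`, so the search returns it with a total cost of 2 (two loops at cost 1 each). Handling more sequence segments than detected sticks, as in the 18-of-33 case mentioned in the abstract, would need an extra state dimension that this sketch leaves out.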
The determination of the secondary structure topology is a critical step in deriving the atomic structure from a protein density map obtained by electron cryo-microscopy. This step often relies on matching two sources of information. One source is the set of secondary structures detected from the protein density map at medium resolution, such as 5–10 Å. The other source is the set of secondary structures predicted from the amino acid sequence. Because either source of information can be inaccurate, a pool of possible secondary structure positions needs to be sampled. This paper studies how to reduce the computation of the mapping when the inaccuracy of the secondary structure predictions is considered. We present a method that combines the concept of a dynamic graph with our previous work on using a constrained shortest path to identify the topology of the secondary structures. We show a 34.55% reduction in run-time compared with the naïve way of handling the inaccuracies. We also show improved accuracy when the potential secondary structure errors are explicitly sampled versus when a single consensus prediction is used. Our framework demonstrates the potential for developing computationally effective exact algorithms to identify the optimal topology of the secondary structures when the inaccuracy of the predicted data is considered.
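To illustrate why sampling a pool of positions is costly, here is a small sketch of what a naïve baseline could look like. It assumes that the naïve approach reruns the topology search once for every combination of candidate positions; the abstract does not spell this out. The pools, the residue ranges, and the name `enumerate_position_choices` are made up for illustration.

```python
from itertools import product
from math import prod


def enumerate_position_choices(pools):
    """Yield every way of picking one candidate (start, end) per helix."""
    yield from product(*pools)


# Hypothetical pools: alternative predicted residue ranges for three helices.
pools = [
    [(5, 18), (6, 18)],
    [(30, 41), (30, 43), (31, 41)],
    [(60, 75)],
]

# Under the stated assumption, the naive baseline needs one full topology
# search per combination, i.e. the product of the pool sizes.
n_runs = prod(len(pool) for pool in pools)
choices = list(enumerate_position_choices(pools))
```

The number of combinations grows multiplicatively with each additional alternative, which is the kind of repeated work that a shared, incrementally updated graph would aim to avoid.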