Protein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure, but the problem of predicting reliable contact maps is far from solved. One of the main pitfalls of existing contact map predictors is that they generally predict unphysical maps, i.e. maps that cannot be embedded into three-dimensional structures or, at best, violate a number of basic constraints observed in real protein structures, such as the maximum number of contacts for a residue. Here, we focus on the problem of learning to predict more "physical" contact maps. We do so by first predicting contact maps through a traditional system (XXStout), and then filtering these maps with an ensemble of artificial neural networks. The filter receives as input not only the bare predicted map, but also a number of global or long-range features extracted from it. In a rigorous cross-validation test, we show that the filter greatly improves the predicted maps it is given. CASP7 results, on which we report here, corroborate this finding. Importantly, since the approach we present here is fully modular, it may be beneficial to any other ab initio contact map predictor.
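To make the notion of a contact map and of the per-residue contact constraint concrete, the following is a minimal Python sketch. It is not the XXStout predictor or the neural network filter described in the abstract. The 8 Å distance threshold, the use of one coordinate per residue, and the function names `contact_map` and `overcrowded_residues` are assumptions chosen for illustration only.

```python
import math

def contact_map(coords, threshold=8.0):
    """Build a symmetric boolean contact map from one 3D point per residue.

    Two residues are marked as in contact when their points lie within
    `threshold` (assumed here to be 8 angstroms). The diagonal stays False.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap

def overcrowded_residues(cmap, max_contacts):
    """Return indices of residues whose contact count exceeds max_contacts.

    A predicted map with such residues violates the kind of basic
    constraint the abstract calls "unphysical".
    """
    return [i for i, row in enumerate(cmap) if sum(row) > max_contacts]

# Toy chain: four points on a line, 3.8 angstroms apart (hypothetical data).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
print(overcrowded_residues(toy_map, max_contacts=2))
```

On the toy chain, the two inner residues each have three contacts, so they are the ones flagged when the hypothetical limit is set to two.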
We show that finitely differentiable diffeomorphisms that are either symplectic, volume-preserving, or contact can be approximated with analytic diffeomorphisms that are, respectively, symplectic, volume-preserving, or contact…
Note from Publisher: This article contains the abstract only.
Data mining is the semi-automatic discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in data. Traditional data analysis is assumption driven in the sense that a hypothesis is formed and validated against the data. Data mining, in contrast, is data driven in the sense that patterns are automatically extracted from data. The goal of this tutorial is to provide an introduction to data mining techniques. The focus will be on methods appropriate for mining massive datasets using techniques from scalable and high performance computing. The techniques covered include association rules, sequence mining, decision tree classification, and clustering. Some aspects of preprocessing and postprocessing are also covered. The problem of predicting contact maps for protein sequences is used as a detailed case study.
The material presented here is compiled by LW based on the original tutorial slides of MJZ at the 2002 Post-Genome Knowledge Discovery Programme in Singapore.
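Of the techniques the tutorial abstract lists, association rules are the simplest to illustrate. The sketch below computes the standard support and confidence measures for a rule over a small set of transactions. It is not taken from the tutorial; the function names and the toy transactions are hypothetical, and a scalable miner for massive datasets would not scan every transaction for each query as this does.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent and consequent) / support(antecedent);
    returns 0.0 when the antecedent never occurs.
    """
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_antecedent

# Hypothetical toy transactions.
toy_transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
print(support(toy_transactions, {"a", "b"}))
print(confidence(toy_transactions, {"a"}, {"b"}))
```

Here the itemset {a, b} appears in two of the four transactions, and two of the three transactions containing a also contain b.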