This volume contains 25 peer-reviewed papers based on the presentations at the 8th Annual International Workshop on Bioinformatics and Systems Biology (IBSB 2008), held at the Teikyo Hotel, Zeuten Lake, near Berlin, on June 9 and 10, 2008. The workshop started in 2001 as an event for doctoral students and young researchers to present and discuss their research results and approaches in bioinformatics and systems biology. It is part of a collaborative educational program involving the following programs, partner institutions, and their leaders:
• Boston (Gary Benson) — Graduate Program in Bioinformatics, Boston University
• Berlin (Herman-Georg Holzhütter) — The International Research Training Group (IRTG) “Genomics and Systems Biology of Molecular Networks”
• Kyoto/Tokyo (Minoru Kanehisa/Satoru Miyano) — Joint Bioinformatics Education Program of Kyoto University and University of Tokyo.
https://doi.org/10.1142/9781848163003_fmatter
https://doi.org/10.1142/9781848163003_0001
Metabolism is one of the best studied fields of biochemistry, but its regulation involves processes on many different levels, some of which are still not understood well enough to allow for quantitative modeling and prediction. Glycolysis in yeast is a good example: although high-quality quantitative data are available, well-established mathematical models typically only cover direct regulation of the involved enzymes by metabolite binding. The effect of various metabolites on the enzyme kinetics is summarized in carefully developed mathematical formulae. However, this approach implicitly assumes that the enzyme concentrations themselves are constant, thus neglecting other regulatory levels (e.g. transcriptional and translational regulation) involved in the regulation of enzyme activities. It is believed, however, that different experimental conditions result in different enzyme activities regulated by the above mechanisms. Detailed modeling of all regulatory levels is still out of reach since some of the necessary data (e.g. quantitative large-scale enzyme concentration data sets) are lacking or rare. Nevertheless, a viable approach is to include the regulation of enzyme concentrations in an established model and to investigate whether this improves its predictive capabilities. Proteome data are usually hard to obtain, but levels of mRNA transcripts may be used instead as clues for changes in enzyme concentrations. Here we investigate whether including mRNA data in an established model of yeast glycolysis makes it possible to predict the steady-state metabolite concentrations for different experimental conditions. To this end, we modified an established ODE model for the glycolytic pathway of yeast to include changes of enzyme concentrations. The presumed changes were inferred from mRNA transcript level measurements.
We investigate how this approach can be used to predict metabolite concentrations for steady-state yeast cultures at five different oxygen levels ranging from anaerobic to fully aerobic conditions. We were partly able to reproduce the experimental data and present a number of changes that were necessary to improve the modeling result.
https://doi.org/10.1142/9781848163003_0002
Ca2+ is the most important second messenger controlling a variety of intracellular processes by oscillations of the cytosolic Ca2+ concentration. These oscillations occur by Ca2+ release from the endoplasmic reticulum (ER) into the cytosol through channels and the re-uptake of Ca2+ into the ER by pumps. A common channel type present in many cell types is the inositol trisphosphate receptor (IP3R), which is activated by IP3 and Ca2+ itself, leading to Ca2+-induced Ca2+ release (CICR). We have shown in an experimental study [15] that Ca2+ oscillations are sequences of random spikes that occur by wave nucleation. Here we use our recently developed model for Ca2+ dynamics in three dimensions to illuminate the role of IP3R clustering within spatially extended systems.
https://doi.org/10.1142/9781848163003_0003
A system-dynamics-centered ontology, called the Cell System Ontology (CSO), has been developed for the representation of diverse biological pathways. Many of the pathway data based on the ontology have been created from databases via data conversion or curated by expert biologists. It is essential to validate the pathway data, which may otherwise cause unexpected issues such as semantic inconsistency and incompleteness. This paper discusses three criteria for validating the pathway data based on CSO: (1) structurally correct models in terms of Petri nets, (2) biologically correct models that capture biological meaning, and (3) systematically correct models that reflect biological behaviors. In parallel, we have investigated how logic-based rules can be used with the ontology to extend its expressiveness and to complement it by reasoning, with the aim of qualifying pathway knowledge. Finally, we show how the proposed approach helps in exploring dynamic modeling and simulation tasks without prior knowledge.
https://doi.org/10.1142/9781848163003_0004
Recently, a nonlinear vector autoregressive (NVAR) model based on Granger causality was proposed to infer nonlinear gene regulatory networks from time series gene expression data. Since NVAR requires a large number of parameters due to the basis expansion, the length of time series microarray data is insufficient for accurate parameter estimation, and the size of the gene set must be strongly limited. To address this limitation, we employ an L1 regularization technique to estimate NVAR. Under L1 regularization, the direct parents of each gene can be selected efficiently even when the number of parameters exceeds the number of data samples. We can thus estimate larger gene regulatory networks more accurately than with existing methods. Through a simulation study, we verify the effectiveness of the proposed method by comparing its limitation in the number of genes to that of the existing NVAR. The proposed method is also applied to time series microarray data of the human HeLa cell cycle.
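The parent-selection effect of L1 regularization can be illustrated with a minimal sketch. It is not the authors' NVAR estimator: it is a plain lasso fitted by coordinate descent, and the data, the function names and the penalty value are hypothetical choices made for illustration. Each column stands for one basis term of a candidate parent gene, and there are more columns (4) than time points (3).

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, target, penalty, sweeps=50):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n = len(target)
    coef = [0.0] * len(columns)
    residual = list(target)  # y - Xb for the current coefficients
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(x * x for x in col) / n
            if scale == 0.0:
                continue
            # Correlation of column j with the residual that ignores column j.
            rho = sum(x * (r + x * coef[j]) for x, r in zip(col, residual)) / n
            new = soft_threshold(rho, penalty) / scale
            delta = new - coef[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                coef[j] = new
    return coef


# Hypothetical toy data: 3 time points, 4 candidate basis terms (p > n).
columns = [
    [1.0, 0.0, -1.0],  # basis term of the true parent
    [1.0, -2.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
]
target = [2.0 * x for x in columns[0]]  # the target gene depends on term 0 only

coefficients = lasso_coordinate_descent(columns, target, penalty=0.1)
selected = [j for j, c in enumerate(coefficients) if c != 0.0]
```

In this toy case only the first term keeps a nonzero coefficient, and its value is shrunk below the generating value of 2, which is the usual price of the L1 penalty.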
https://doi.org/10.1142/9781848163003_0005
Mathematical modeling of biological systems usually involves implementing, simulating, and discriminating several candidate models that represent alternative hypotheses. Generating and managing these candidate models is a tedious and difficult task and can easily lead to errors. ModelMage is a tool that facilitates the management of candidate models. It is designed for the easy and rapid development, generation, simulation, and discrimination of candidate models. The main idea of the program is to automatically create a defined set of model alternatives from a single master model. The user provides only one SBML model and a set of directives from which the candidate models are created by leaving out species, modifiers or reactions. After generating the models, the software can automatically fit all of them to the data and provide a ranking for model selection, if data are available. In contrast to other model generation programs, ModelMage aims at generating only a limited set of models that the user can precisely define. ModelMage uses COPASI as its simulation and optimization engine. Thus, all simulation and optimization features of COPASI are readily incorporated. ModelMage can be downloaded from http://sysbio.molgen.mpg.de/modelmage and is distributed as free software.
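The idea of deriving candidates from one master model by leaving out elements can be sketched as follows. This does not use ModelMage's directive syntax or SBML: the reaction names and the `candidate_models` helper are hypothetical, and a model is reduced to a plain list of reaction names.

```python
from itertools import combinations


def candidate_models(master_reactions, optional):
    """Yield every model obtained by removing a subset of the optional reactions."""
    for size in range(len(optional) + 1):
        for removed in combinations(optional, size):
            yield [r for r in master_reactions if r not in removed]


# Hypothetical master model; only two of its reactions may be left out.
master = ["uptake", "conversion", "feedback", "degradation"]
optional = ["feedback", "degradation"]

candidates = list(candidate_models(master, optional))
```

Restricting removal to a user-defined list keeps the candidate set small: two optional reactions give four candidates, the first of which is the master model itself.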
https://doi.org/10.1142/9781848163003_0006
Microarrays are high-throughput technologies whose data are known to be noisy. In this work, we propose a graph-based method which first identifies the extent to which a single microarray experiment is noisy and then applies an error function to clean individual expression levels. These two steps are unified within a framework based on a graph representation of a separate data set from some repository. We demonstrate the utility of our method by comparing our results against statistical methods by applying both techniques to simulated microarray data. Our results are encouraging and indicate one potential use of microarray data from past experiments.
https://doi.org/10.1142/9781848163003_0007
We present a model of osmoadaptation in S. cerevisiae based on existing experimental and theoretical work. In order to investigate the impact of osmoadaptation on glycolysis, this model focuses on the interactions between glycolysis and osmoadaptation, namely the production of glycerol and its influence on flux towards pyruvate. Evaluation of this model shows that, depending on initial relations between glycerol and pyruvate production, the increased glycerol production can have a substantial negative effect on the pyruvate production rate. Existing experimental data and a detailed analysis of the model lead to the suggestion of an interaction between activated Hog1 and activators of glycolysis such as Pfk26.
https://doi.org/10.1142/9781848163003_0008
The producibility of metabolites from available resources is investigated systematically using flux balance analysis (FBA) and network expansion. Calculations are performed for the genome-scale metabolic networks of Escherichia coli and Methanosarcina barkeri. Strict biological interpretation of the results obtained with FBA leads to the concept of sustainability, which reduces the set of producible metabolites by assuming a growing and dividing cell. A systematic comparison showed that applying network expansion in many cases results in exactly the set of all sustainable metabolites. The purely heuristic approach of allowing for certain cofactors to facilitate reactions during the process of network expansion dramatically helps to improve agreement of the results from the two different approaches. In conclusion, we state that network expansion, due to its enormous advantages in computational speed, is a valuable alternative to determining producible metabolites with FBA.
https://doi.org/10.1142/9781848163003_0009
In this paper we investigate a new framework for graph partitioning using decision trees to search for sub-graphs within a graph adjacency matrix. Graph partitioning by a decision tree seeks to optimize a specified graph partitioning index such as ratio cut by recursively applying decision rules found within nodes of the graph. The key advantages of tree models for graph partitioning are that they provide a predictive framework for evaluating the quality of the solution, determining the number of sub-graphs and assessing overall variable importance. We evaluate the performance of tree-based graph partitioning on a benchmark dataset for multiclass classification of tumor diagnosis based on gene expression. Three graph cut indices (ratio cut, normalized cut and network modularity) are compared and assessed in terms of their classification accuracy, power to estimate the optimal number of sub-graphs and ability to extract known important variables within the dataset.
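As a small illustration of one of the indices named above, the sketch below evaluates the ratio cut of a given partition. It assumes the common definition in which each part contributes the total weight of edges leaving it divided by its number of nodes (some texts add a factor of 1/2). The graph and the `ratio_cut` helper are hypothetical and unrelated to the paper's tree-based search.

```python
def ratio_cut(adjacency, labels):
    """Sum over parts of (weight of edges leaving the part) / (nodes in the part)."""
    total = 0.0
    for part in set(labels):
        inside = [i for i, label in enumerate(labels) if label == part]
        outside = [i for i, label in enumerate(labels) if label != part]
        cut = sum(adjacency[i][j] for i in inside for j in outside)
        total += cut / len(inside)
    return total


# Hypothetical graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1

by_triangle = ratio_cut(adjacency, [0, 0, 0, 1, 1, 1])   # cuts one edge
single_node = ratio_cut(adjacency, [0, 1, 1, 1, 1, 1])   # isolates node 0
```

The balanced split along the bridging edge scores lower (better) than isolating a single node, which is the behaviour the size normalisation is meant to produce.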
https://doi.org/10.1142/9781848163003_0010
Non-linear correlations based on mutual information are evaluated as a measure of statistical dependencies among metabolic data points in two-dimensional space. While the Pearson correlation coefficient is only rigorously applicable to characterize strictly linear correlations with Gaussian noise, the mutual information coefficient is more generally valid. Here, we use recent distribution-free (non-parametric) mutual information estimators based on k-nearest neighbor distances. The mutual information algorithm of Kraskov et al. is found to yield estimates with low systematic and statistical error. The significance of the different methods is probed for artificial sets of tens to hundreds of data points, a size currently typical for metabolomic data. We analyze experimental data on metabolite concentrations from Arabidopsis thaliana using these procedures. The mutual information was able to detect additional non-linear correlations that the Pearson coefficient could not detect.
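Why a dependence can be invisible to the Pearson coefficient is easiest to see on a toy example. The sketch below does not implement the k-nearest-neighbor estimator of Kraskov et al.; it substitutes a simple plug-in estimate for discrete values, and the data are hypothetical. The second variable is an exact quadratic function of the first.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def plug_in_mutual_information(xs, ys):
    """Mutual information in nats from empirical frequencies of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # fully determined by xs, but not linearly

r = pearson(xs, ys)
mi = plug_in_mutual_information(xs, ys)
```

Here the Pearson coefficient is zero because the relationship is symmetric around the mean, while the mutual information is clearly positive (about 1.05 nats).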
https://doi.org/10.1142/9781848163003_0011
Constraint-based models of cellular metabolism, such as flux balance analysis (FBA), use convex analysis and optimization to study metabolic networks at a genome scale. The availability of reaction lists for numerous organisms, along with a variety of network analysis and optimization tools, is making these approaches increasingly popular for metabolic engineering and biomedical applications, as well as for addressing fundamental biological questions. It is therefore very important to assess the predictive capacity of these models and to understand how to interpret them in a biologically relevant manner. Typically, model assessment is limited to gauging the ability to predict phenotypes, such as viability under different environmental and genetic conditions. These types of assessments, for the most part, focus only on the growth phenotype of the cells, but ignore the underlying flux predictions. While this may be sufficient for certain types of study, the question of whether flux balance models can reliably predict intracellular and transport fluxes is crucial for more detailed analysis, and remains largely unanswered. Here we compare FBA model predictions of yeast metabolic fluxes to a previously published set of experimentally determined fluxes for 13 different single gene deletion mutants across a variety of possible objective functions. We find that the specific optimization criteria used to determine fluxes have a significant impact on the accuracy of the predicted fluxes. Interestingly, while different optimization methods provide very different levels of agreement relative to experimental fluxes, they tend to provide similar predictions with respect to the effect of the perturbation on growth. This demonstrates that assessment of models at the level of flux predictions is a critical step in assessing the biological validity of different models and optimization criteria.
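For orientation, the standard FBA optimization problem can be written as the linear program below. The notation is the usual textbook one and is an assumption here, not taken from the paper: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the objective weights, and v^min, v^max the flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

In this picture, choosing a different optimization criterion amounts to changing the objective (for example, which reactions receive weight in c) while keeping the steady-state constraint S v = 0 and the bounds fixed, so several objectives can agree on the growth value yet pick different flux vectors from the same feasible set.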
https://doi.org/10.1142/9781848163003_0012
Studies of genome-scale metabolic networks allow for qualitative and quantitative descriptions of an organism's capability to convert nutrients into products. The set of synthesizable products strongly depends on the provided nutrients as well as on the structure of the metabolic network. Here, we apply the method of network expansion and the concept of scopes, which describe the synthesizing capacities of an organism when certain nutrients are provided. We analyze the biosynthetic properties of four species: Arabidopsis thaliana, Saccharomyces cerevisiae, Buchnera aphidicola, and Escherichia coli. Matthäus et al. [12] have recently developed a method to identify clusters of scopes, reflecting specific biological functions and exhibiting a hierarchical arrangement, using the network comprising all reactions in KEGG. We extend this method by considering random sets of nutrients on well-curated networks of the investigated species from BioCyc. We identify structural properties of the networks that allow us to differentiate their biosynthetic capabilities. Furthermore, we evaluate the quality of the clustering of scopes applied to the species-specific networks. Our study provides a novel assessment of the biosynthetic properties of different species.
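The scope of a nutrient set can be computed with a simple fixed-point iteration, sketched here under the usual reading of network expansion: a reaction fires once all its substrates are available, and its products then become available too. The three-reaction network and the `scope` helper are hypothetical; real analyses run on genome-scale reaction lists.

```python
def scope(reactions, seed):
    """Return all metabolites reachable from the seed by repeatedly firing reactions."""
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction is usable only if every substrate is already available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


# Hypothetical toy network: A -> B, B + C -> D, E -> F.
reactions = [
    ({"A"}, {"B"}),
    ({"B", "C"}, {"D"}),
    ({"E"}, {"F"}),
]

scope_a = scope(reactions, {"A"})
scope_ac = scope(reactions, {"A", "C"})
```

Adding the single nutrient C enlarges the scope from {A, B} to {A, B, C, D}, which shows how strongly the set of synthesizable products can depend on the provided nutrients.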
https://doi.org/10.1142/9781848163003_0013
Prediction of unknown enzymatic reactions is useful for understanding biological processes such as reactions to external substances like endocrine disrupters. To make accurate predictions, we need to define a similarity measure between reactions. We have developed the KEGG RPAIR database, which is a collection of chemical structure transformation patterns, called RDM patterns, for substrate-product pairs of enzymatic reactions. In this study, we compared RDM patterns with EC numbers, a well-known hierarchical classification scheme for enzymes. Additionally, we performed hierarchical clustering of RDM patterns using the information on whether or not each EC sub-subclass has a particular RDM pattern. To represent the variation of RDM patterns in a cluster, we generalized the RDM patterns in the same cluster using the hierarchy of KEGG Atomtypes, which are the components of RDM patterns. Using this generalized pattern, we can predict which cluster includes a given RDM pattern even if the reaction of the pattern has not been assigned any EC number. Thus we will be able to define the similarity between enzymatic reactions by using this cluster information.
https://doi.org/10.1142/9781848163003_0014
Regulation of metabolic enzymes plays a crucial role in the maintenance of metabolic homeostasis, and in the capacity of living systems to undergo physiological adaptation under multiple environmental conditions. Metabolic regulation is achieved through a complex interplay of transcriptional and post-transcriptional mechanisms, some of which have been experimentally characterized for specific pathways and organisms. Many of the details, however, including the values of most kinetic parameters, have proven difficult to elucidate. Hence, understanding the principles that underlie metabolic regulation strategies constitutes an ongoing challenge. In the context of genome-scale steady state models of metabolic networks, it has been shown that evolution may drive metabolic networks towards reaching computationally predictable optimal states, such as maximal growth capacity. Here we develop a new computational approach based on the hypothesis that the regulatory systems operating on metabolic networks have evolved towards an optimal architecture as well. Specifically, we hypothesize that the topology of metabolic regulation networks has been selected for optimally maintaining the system balanced around one or more steady states. Based on these hypotheses, we use methods related to flux balance analysis to construct a model of metabolic regulation based primarily on a metabolic network's topology, bypassing the requirement for the details of all kinetic parameters. This model predicts an optimal regulatory network of metabolic interactions that can resolve perturbations to a given steady state in a metabolic system. We explore the ability of the model to predict optimal regulatory responses in both a simple toy network and in a fragment of the well-described glycolysis pathway.
https://doi.org/10.1142/9781848163003_0015
Flux Balance Analysis (FBA) has been successfully applied to facilitate the understanding of cellular metabolism in model organisms. Standard formulations of FBA can be applied to large systems, but the accuracy of predictions may vary significantly depending on environmental conditions, genetic perturbations, or complex unknown regulatory constraints. Here we present an FBA-based approach to infer the biomass compositions that best describe multiple physiological states of a cell. Specifically, we seek to use experimental data (such as flux measurements or mRNA expression levels) to infer the best matching stoichiometrically balanced fluxes and metabolite sinks. Our algorithm is designed to provide predictions based on the comparative analysis of two metabolic states (e.g. wild-type and knockout, or two different time points), so as to be independent of possible arbitrary scaling factors. We test our algorithm using experimental data for metabolic fluxes in wild-type and gene deletion strains of E. coli. In addition to demonstrating the capacity of our approach to correctly identify known exchange fluxes and biomass compositions, we analyze E. coli central carbon metabolism to show the changes of metabolic objectives and the potential compensation for reducing power due to single enzyme gene deletions in the pentose phosphate pathway.
https://doi.org/10.1142/9781848163003_0016
The RNA Ontology Consortium recently proposed a two-letter representation of the RNA backbone conformation. In this study, we compare this suite notation to a custom string representation that utilizes η-θ pseudotorsion angles. Both representations were used to assess similarity and self-similarity in several RNA structure datasets. To detect similarities between two RNA structures, we use suffix techniques that allow for the detection of substructure similarity within some degree of inexactness. The suite representation as well as the pseudotorsion representation were tested on four diverse RNA datasets. Detecting structural similarities in these datasets allowed us to recover many homologous structural elements that have implications for further understanding of the RNA apparatus in systems biology. The software as well as the datasets used are freely available from http://suiterna.sourceforge.net.
https://doi.org/10.1142/9781848163003_0017
GC content has been shown to be an important aspect of human genomic function. Extending beyond the scope of GC content alone, there is a class of regions in the genome that have especially high GC content and are enriched for the CG dinucleotide—called CpG islands. CpG islands have been linked to biologically functional genomic elements. DNA structure also contributes to biological function. Recent studies found that some DNA structural properties are correlated with CpG island functionality [5, 14]. Here, we use hydroxyl radical cleavage patterns as a measure of DNA structure, to explore the relationship between GC content and fine-scale DNA structure. We show that there is a positive correlation between GC content and the solvent-accessible structural properties of a DNA sequence, and that the strength of this correlation decreases as genomic resolution increases. We demonstrate that regions of the genome that have highly solvent-accessible DNA structure tend to overlap functional genomic elements. Our results suggest that fine-scale DNA structural properties that are encoded in the genome are important for biological function, and that the highly solvent-accessible nature of high GC content regions and some CpG islands may account for some of their functional properties.
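The two sequence statistics mentioned above, GC content and CpG enrichment, can be computed in a few lines. This is only a sketch with hypothetical helper names and toy sequences. The enrichment measure used is the commonly used observed/expected CpG ratio, (number of CG × length) / (number of C × number of G), and the windowing is an illustrative way to look at GC content at different resolutions, not the paper's procedure.

```python
def gc_content(sequence):
    """Fraction of G and C bases in the sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def cpg_observed_expected(sequence):
    """Observed/expected CpG ratio; 0.0 if the sequence has no C or no G."""
    sequence = sequence.upper()
    c, g = sequence.count("C"), sequence.count("G")
    if c == 0 or g == 0:
        return 0.0
    return sequence.count("CG") * len(sequence) / (c * g)


def windowed_gc(sequence, size):
    """GC content in consecutive non-overlapping windows of the given size."""
    return [
        gc_content(sequence[start:start + size])
        for start in range(0, len(sequence) - size + 1, size)
    ]


toy = "CGCGCGAT"  # hypothetical CpG-rich fragment
gc = gc_content(toy)
ratio = cpg_observed_expected(toy)
profile = windowed_gc("ATATGCGC", 4)
```

Smaller windows correspond to a finer genomic resolution, so the same sequence can look uniform at one window size and strongly heterogeneous at another.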
https://doi.org/10.1142/9781848163003_0018
We report various transcription factor binding sites (TFBSs) that are conserved among co-expressed genes in human promoter regions, using expression and genomic data. Assuming that a similar promoter structure induces similar transcriptional regulation, and hence a similar expression profile, we compared promoter structure similarities between co-expressed genes. Comprehensive TFBS predictions for all human genes were conducted for 19,777 promoter regions around the transcription start site (TSS) obtained from DBTSS, and a promoter similarity search was conducted among the co-expressed gene data provided by the newly developed COXPRESdb. Combining Position Weight Matrix (PWM) motif prediction with a bootstrap method, we found that 7,313 genes have at least one statistically significant conserved TFBS. We also applied basket method analysis to look for combinatorial activities of those conserved TFBSs.
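The PWM step can be sketched as a log-odds scan along a promoter sequence. The count matrix, the uniform background, the pseudocount and the helper names are hypothetical choices for illustration; the bootstrap significance test described above is not covered.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(matrix)
    return [
        (start, sum(matrix[k][sequence[start + k]] for k in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical counts from 8 aligned sites of a TATA-like motif.
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_matrix(counts)
hits = scan("GGTATAGG", pwm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
```

The highest-scoring window starts at position 2, where the sequence matches the motif at all four positions. In practice a score threshold or a resampling test decides which windows count as predicted sites.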
https://doi.org/10.1142/9781848163003_0019
Interleukin-2 (IL-2) is one of the first cytokines to be expressed by T helper cells (Th cells) after antigen-specific stimulation. In contrast, regulatory T cells (Treg cells) do not express IL-2, although they are activated via the same pathways. In regulatory T cells the additional transcription factor FoxP3 is expressed. Using intracellular measurements of the transcription factors NFAT and FoxP3 as well as the cytokine IL-2 at the single-cell level, we revealed a small fraction of IL-2-expressing Treg cells. Furthermore, these data enabled us to develop initial mathematical models describing gene expression of IL-2 in individual cells. The models are adapted to data from human regulatory T cells. Based on statistical tests of the available flow cytometric data, it seems reasonable that not only the amount of the transcription factors NFAT and FoxP3 is important but also their concentration ratio. We discuss specific problems of modeling gene expression at the single-cell level, taking IL-2 expression as an example.
https://doi.org/10.1142/9781848163003_0020
In our everyday life we are confronted with a variety of toxic substances. A number of these compounds are already used as lead structures for the development of new drugs, but the large pool of toxic substances remains a rich resource of new bioactive compounds. During the identification and development of new potential drugs, risk estimation of health hazards is an essential and topical subject in the pharmaceutical industry. To face this challenge, an extensive investigation of known toxic compounds will help to estimate the toxicity of potential drugs. "Toxicity properties" found during those investigations will also function as a guideline for the toxicological classification of other unknown substances. We have compiled a dataset of approximately 50,000 toxic compounds from literature and web sources. All compounds were classified according to their toxicity. In this study the collection of toxic compounds was investigated extensively regarding their chemical, functional, and structural properties and compared with a dataset of drugs and natural compounds. We were able to identify differences in properties within the toxic compounds as well as in comparison to drugs and natural compounds. These properties include molecular weight, hydrogen bond donors and acceptors, and functional groups which can be regarded as "toxicity properties", i.e. attributes defining toxicity.
https://doi.org/10.1142/9781848163003_0021
The Vascular Endothelial Growth Factor receptors (VEGF-Rs) play a significant role in tumor development and tumor angiogenesis and are therefore interesting targets in cancer therapy. Targeting the VEGF-R is of special importance because the supply to the tumor has to be reduced. In general, this can be carried out by inhibiting the tyrosine kinase function of the VEGF-R. Nevertheless, some problems arise with the specificity of known kinase inhibitors: they bind to the ATP-binding site and inhibit a number of kinases; moreover, the most specific inhibitors so far act on at least the three major types of VEGF-Rs: Flt-1, Flk-1/KDR, and Flt-4. The goal is a selective VEGF-R-2 (Flk-1/KDR) inhibitor, because this receptor triggers rather unspecific signals from VEGF-A, -C, -D and -E. Here, we describe a protocol that starts from an established inhibitor (Vatalanib) and combines 2D/3D searching and property filtering of the in silico screening hits with the "negative docking approach". With this approach we were able to identify a compound that shows a fourfold higher reduction of the proliferation rate of endothelial cells compared to the reduction effect of the lead structure.
https://doi.org/10.1142/9781848163003_0022
Harmful effects associated with the use of drugs are caused by their side effects and by the combined use of different drugs. These drug interactions result in increased or decreased drug effects, or produce other new unwanted effects, and are serious problems for medical institutions and pharmaceutical companies. In this study, we created a drug-drug interaction network from drug package inserts and characterized drug interactions. The known information about the potential risk of drug interactions is described in drug package inserts. Japanese drug package inserts are stored in the JAPIC (Japan Pharmaceutical Information Center) database, and GenomeNet provides the GenomeNet pharmaceutical products database, which integrates the JAPIC and KEGG databases. We extracted drug interaction data from GenomeNet, where interactions are classified according to risks, contraindications or cautions for coadministration, and some entries include information about enzymes metabolizing the drugs. We defined drug targets and drug-metabolizing enzymes as interaction factors using the information on them in KEGG DRUG, and classified drugs into pharmacological/chemical subgroups. In the resulting drug-drug interaction network, the drugs that are associated with the same interaction factors are closely interconnected. Mechanisms of these interactions were then identified for each interaction factor. To characterize other interactions without interaction factors, we used the ATC classification system and found an association between interaction mechanisms and pharmacological/chemical subgroups.
https://doi.org/10.1142/9781848163003_0023
Protein-protein docking is a major task in structural biology. In general, the geometries of protein pairs are sampled by generating docked conformations, analyzing them with scoring functions and selecting appropriate geometries for further refinement. Here, we present an algorithm in real space to sample geometries of protein pairs. To this end, we first determine uniformly distributed points on the surfaces of the two protein structures to be docked and additionally define a set of uniformly distributed rotations. Then, the sampling method generates structures of protein pairs as follows: (i) we rotate one protein of the protein pair according to a selected rotation and (ii) translate it along a line connecting two surface points belonging to different proteins such that these surface points coincide. The resulting protein pair geometries are then analyzed and selected using a scoring function that considers residues and atom pairs. We applied this approach to a set of 22 enzyme-inhibitor complexes and demonstrate that a discretisation of the rigid-body search in real space provides an efficient and robust sampling scheme. Our method generates decoy sets with a considerable fraction of near-native geometries for all considered enzyme-inhibitor complexes.
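Steps (i) and (ii) amount to one rigid-body transform per sampled pose, which the following sketch makes explicit. The coordinates, the single rotation about the z axis and the helper names are hypothetical. Generating uniformly distributed surface points and rotations, as well as the scoring, are outside its scope.

```python
import math


def rotation_z(angle):
    """3x3 rotation matrix for a rotation by angle (radians) about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotate(point, matrix):
    """Apply a 3x3 matrix to a 3D point."""
    return tuple(sum(matrix[i][k] * point[k] for k in range(3)) for i in range(3))


def place(mobile_atoms, rotation, fixed_point, mobile_point):
    """Rotate the mobile protein, then shift it so mobile_point lands on fixed_point."""
    rotated_point = rotate(mobile_point, rotation)
    shift = tuple(f - r for f, r in zip(fixed_point, rotated_point))
    return [
        tuple(c + s for c, s in zip(rotate(atom, rotation), shift))
        for atom in mobile_atoms
    ]


# Hypothetical two-atom "protein"; its first atom doubles as the surface point.
mobile_atoms = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
fixed_surface_point = (5.0, 5.0, 5.0)

pose = place(mobile_atoms, rotation_z(math.pi / 2), fixed_surface_point, mobile_atoms[0])
```

Because the transform is rigid, interatomic distances within the mobile protein are unchanged. Looping `place` over all combinations of rotations and surface-point pairs would yield the discretised set of candidate geometries to be scored.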
https://doi.org/10.1142/9781848163003_0024
Isotopomer tracer experiments are indispensable for the determination of flux rates in already known pathways as well as for the identification of new pathways. The information gained from such experiments depends on the labeling of the feed tracer metabolite, i.e. the atom positions carrying a label. Here we present an algorithm and a software tool to find an optimal carbon labeling pattern that ensures that the label spreads predominantly into the parts of the network under study. Our implementation is based on carbon fate maps and distinguishes between homotopic and prochiral atoms. In addition, the software can be used to generate carbon transition probability matrices, which can be used for the study of biochemical reaction mechanisms. In this article we present the algorithms and show an application of the software to glycolysis and the TCA cycle.
https://doi.org/10.1142/9781848163003_0025
Reconstructed biological networks are the essence of knowledge originating from experiments, scientific literature, databases and modeling. Proteins are the major players in biological networks. If the function of a protein is not yet known, it can often be deduced from homologous proteins that are already experimentally characterized. As such conclusions are not as reliable as experimental evidence, they should be well documented and reviewed when experimental data become available. Inconsistent operation of the resulting network may indicate invalid functional assignments. Here we present a novel technique to refer to annotated sequence and 3D-structure alignments in terms of Web links. By clicking the Web link, the alignment is viewed in the protein viewer STRAP. References to public protein databases such as EMBL, KEGG, GENBANK, PDB, PFAM, PRODOM and UNIPROT/SWISSPROT are encoded in the Web link, whereas the alignment gaps are computed dynamically. Site-specific annotations and 3D-rendering commands may also be included in the Web link. In contrast, sequence features such as active site residues, phosphorylation sites and ligand binding sites do not need to be specified, as long as they are retrievable from public databases. The method has been developed for an information management system that is used for the reconstruction of metabolic pathways. The alignment viewer may also be of interest for experimentalists, as it can be used to document sites of interest in the proteins under experimental investigation. These alignment Web links may be included in project Web sites. Availability: The STRAP program is published under the GNU license and is automatically downloaded from http://3d-alignment.eu/ or http://www.charite.de/bioinf/strap/ when an alignment reference is clicked.
https://doi.org/10.1142/9781848163003_bmatter