This volume contains 18 peer-reviewed papers based on the presentations at the 10th Annual International Workshop on Bioinformatics and Systems Biology (IBSB 2010) held at Kyoto University from July 26 to July 28, 2010. This workshop started in 2001 as an event for doctoral students and young researchers to present and discuss their research results and approaches in bioinformatics and systems biology. It is part of a collaborative educational program involving leading institutions and leaders committed to the following programs:
Sample Chapter(s)
Chapter 1: Kinetic Modelling of DNA Replication Initiation in Budding Yeast (2,496 KB)
https://doi.org/10.1142/9781848166585_fmatter
The following sections are included:
https://doi.org/10.1142/9781848166585_0001
DNA replication is restricted to a specific time window of the cell cycle, called S phase. Successful progression through S phase requires replication to be properly regulated to ensure that the entire genome is duplicated exactly once, without errors, in a timely fashion. As a result, DNA replication has evolved into a tightly regulated process involving the coordinated action of numerous factors that function in all phases of the cell cycle. Biochemical mechanisms driving the eukaryotic cell division cycle have been the subject of a number of mathematical models. However, cell cycle networks reported in the literature so far have not addressed the steps of DNA replication events. In particular, the assembly of the replication machinery is crucial for the timing of S phase. This event, called "initiation", which occurs in late M / early G1 of the cell cycle, starts with the assembly of the pre-replicative complex (pre-RC) at the origins of replication on the DNA. Its activation depends on the availability of different kinase complexes, cyclin-dependent kinases (CDKs) and Dbf4-dependent kinase (DDK), which phosphorylate specific components of the pre-RC to convert it into the pre-initiation complex (pre-IC). We have developed an ODE-based model of the network responsible for this process in budding yeast by using mass-action kinetics. We considered all steps from the assembly of the first components at the DNA replication origin up to the active replisome that recruits the polymerases, and verified the computational dynamics against the available literature data. Our results highlighted the link between activation of CDK and DDK and the step-by-step formation of both pre-RC and pre-IC, suggesting S-CDK (Cdk1-Clb5,6) to be the main regulator of the process.
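To illustrate what an ODE model with mass-action kinetics looks like in its simplest form, the following is a minimal sketch of a single reversible binding step A + B <-> C integrated with forward Euler. The species names, rate constants, and step size are assumptions chosen for illustration only; they are not taken from the chapter, whose model covers many more species and reactions.

```python
def simulate_binding(a0, b0, c0, k_on, k_off, dt=0.001, steps=20000):
    """Integrate A + B <-> C under mass-action kinetics with forward Euler.

    All species and parameters are hypothetical placeholders.
    """
    a, b, c = a0, b0, c0
    for _ in range(steps):
        # Mass action: forward rate is proportional to [A][B], reverse to [C].
        net_rate = k_on * a * b - k_off * c
        a -= net_rate * dt
        b -= net_rate * dt
        c += net_rate * dt
    return a, b, c


# Toy run: start with unbound A and B, no complex.
a, b, c = simulate_binding(1.0, 1.0, 0.0, k_on=2.0, k_off=1.0)
print(f"A={a:.4f} B={b:.4f} C={c:.4f}")
```

With these toy values the steady state satisfies k_on·[A][B] = k_off·[C], and the total amount of A (free plus bound) is conserved throughout the run.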
https://doi.org/10.1142/9781848166585_0002
Protein-protein interactions play an important role in many cellular processes. However, experimental determination of protein complex structures is difficult and time-consuming. Hence, there is a need for fast and accurate in silico protein docking methods. These methods generally consist of two stages: (i) a sampling algorithm that generates a large number of candidate complex geometries (decoys), and (ii) a scoring function that ranks these decoys such that near-native decoys are ranked higher than other decoys. We have recently developed a neural network-based scoring function that performed better than other state-of-the-art scoring functions on a benchmark of 65 protein complexes. Here, we use similar ideas to develop a method that is based on linear scoring functions. We compare the linear scoring function of the present study with other knowledge-based scoring functions such as ZDOCK 3.0, ZRANK and the previously developed neural network. Despite its simplicity, the linear scoring function performs as well as the compared state-of-the-art methods, and predictions are simple and rapid to compute.
https://doi.org/10.1142/9781848166585_0003
The G-protein coupled receptor (GPCR) superfamily is the largest class of proteins with therapeutic value. More than 40% of present prescription drugs are GPCR ligands. The high therapeutic value of GPCR proteins and recent advancements in virtual screening methods gave rise to many virtual screening studies for GPCR ligands. However, in spite of vast amounts of research studying their functions and characteristics, 3D structures of most GPCRs are still unknown. This makes target-based virtual screening of GPCR ligands extremely difficult, and successful virtual screening techniques rely heavily on ligand information. These virtual screening methods focus on specific features of ligands at the level of individual GPCR proteins, and common features of ligands at higher levels of GPCR classification are yet to be studied. Here we extracted common substructures of the ligands of GPCR protein subfamilies. We used SIMCOMP, a graph-based chemical structure comparison program, and hierarchical clustering to reveal common substructures. We applied our method to 850 GPCR ligands and found 53 common substructures covering 439 ligands. These substructures contribute to a deeper understanding of the structural features of GPCR ligands, which can be used in new drug discovery methods.
https://doi.org/10.1142/9781848166585_0004
In healthy individuals, dehydration of the body leads to release of the hormone vasopressin from the pituitary. Via the bloodstream, vasopressin reaches the collecting duct cells in the kidney, where the water channel Aquaporin-2 (AQP2) is expressed. After stimulation of the vasopressin V2 receptor by vasopressin, intracellular AQP2-containing vesicles fuse with the apical plasma membrane of the collecting duct cells. This leads to increased water reabsorption from the pro-urine into the blood and therefore to enhanced retention of water within the body.
Using existing biological data, we propose a mathematical model of AQP2 trafficking and regulation in collecting duct cells. Our model includes the vasopressin receptor, adenylate cyclase, protein kinase A, and intracellular as well as membrane-located AQP2. To model the chemical reactions we used ordinary differential equations (ODEs) based on mass-action kinetics. We employ known protein concentrations and time series data to estimate the kinetic parameters of our model and demonstrate its validity.
Through generating, testing and ranking different versions of the model, we show that some model versions can describe the data well as soon as important regulatory parts such as the reduction of the signal by internalization of the vasopressin-receptor or the negative feedback loop representing phosphodiesterase activity are included.
We perform time-dependent sensitivity analysis to identify the reactions that have the greatest influence on the cAMP and membrane located AQP2 levels over time. We predict the time courses for membrane located AQP2 at different vasopressin concentrations, compare them with newly generated data and discuss the competencies of the model.
https://doi.org/10.1142/9781848166585_0005
Several technologies are currently used for gene expression profiling, such as Real Time RT-PCR, microarray and CAGE (Cap Analysis of Gene Expression). CAGE is a recently developed method for constructing transcriptome maps, and it has been successfully applied to analyzing gene expression in diverse biological studies. The principle of CAGE has been developed to address specific issues such as determination of transcriptional starting sites, the study of promoter regions and identification of new transcripts. Here, we present both quantitative and qualitative comparisons among three major gene expression quantification techniques, namely CAGE, Illumina microarray and Real Time RT-PCR. We show that the quantitative values of each method are not interchangeable; however, each of them has unique characteristics that render all of them essential and complementary. Understanding the advantages and disadvantages of each technology will be useful in selecting the most appropriate technique for a given purpose.
https://doi.org/10.1142/9781848166585_0006
We address an issue of detecting a switching mechanism in gene expression, where two genes are positively correlated for one experimental condition while they are negatively correlated for another. We compare the performance of existing methods for this issue, roughly divided into two types: interaction test (IT) and the difference of correlation coefficients. Interaction test, currently a standard approach for detecting epistasis in genetics, is the log-likelihood ratio test between two logistic regressions with/without an interaction term, resulting in checking the strength of interaction between two genes. On the other hand, two correlation coefficients can be computed for two experimental conditions and the difference of them shows the alteration of expression trends in a more straightforward manner. In our experiments, we tested three different types of correlation coefficients: Pearson, Spearman and a midcorrelation (biweight midcorrelation). The experiment was performed by using ~2.3 × 10⁹ combinations selected out of the GEO (Gene Expression Omnibus) database. We sorted all combinations according to the p-values of IT or by the absolute values of the difference of correlation coefficients and then visually evaluated the top ranked combinations in terms of the switching mechanism. The result showed that 1) combinations detected by IT included non-switching combinations and 2) Pearson was affected by outliers easily while Spearman and the midcorrelation seemed likely to avoid them.
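The difference-of-correlations idea can be sketched in a few lines: compute a correlation coefficient for a gene pair under each condition and take the absolute difference. The sketch below is only an illustration under simplifying assumptions: the function names and toy expression values are hypothetical, the rank function assumes no tied values, and the biweight midcorrelation is not implemented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1..n (assumption: no ties in the data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def switching_score(x1, y1, x2, y2, corr=spearman):
    """Absolute difference of the correlation under condition 1 and 2."""
    return abs(corr(x1, y1) - corr(x2, y2))


# Toy gene pair: rising together in condition 1, opposed in condition 2.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
cond1_b = [2.0, 4.0, 5.0, 8.0, 10.0]
cond2_b = [10.0, 8.0, 5.0, 4.0, 2.0]
score = switching_score(gene_a, cond1_b, gene_a, cond2_b)
print(round(score, 6))
```

A score near 2 is the largest possible value and corresponds to a full sign flip of the correlation between the two conditions; passing `corr=pearson` gives the outlier-sensitive variant.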
https://doi.org/10.1142/9781848166585_0007
We propose a statistical model for simultaneous estimation of a gene regulatory network and identification of gene modules from time series gene expression data from microarray experiments. Under the assumption that genes in the same module are densely connected, the proposed method detects gene modules based on the variational Bayesian technique. The model can also incorporate existing biological prior knowledge such as protein subcellular localization. We apply the proposed model to time series data from a synthetically generated network and verify its effectiveness. The proposed model is also applied to time series microarray data from HeLa cells. The detected gene module information greatly helps in drawing the estimated gene network.
https://doi.org/10.1142/9781848166585_0008
Motivation: Methods like FBA and kinetic modeling are widely used to calculate fluxes in metabolic networks. For the analysis and understanding of simulation results and experimentally measured fluxes, visualization software within the network context is indispensable.
Results: We present FluxViz, an open-source Cytoscape plug-in for the visualization of flux distributions in molecular interaction networks. FluxViz supports (i) import of networks in a variety of formats (SBML, GML, XGMML, SIF, BioPAX, PSI-MI), (ii) import of flux distributions as CSV, Cytoscape attributes or VAL files, (iii) limitation of views to flux-carrying reactions (flux subnetwork) or network attributes like localization, and (iv) export of generated views (SVG, EPS, PDF, BMP, PNG). Though FluxViz was primarily developed as a tool for the visualization of fluxes in metabolic networks and the analysis of simulation results from FASIMU, a flexible software for batch flux-balance computation in large metabolic networks, it is not limited to biochemical reaction networks and FBA but can be applied to the visualization of arbitrary fluxes in arbitrary graphs.
Availability: The platform-independent program is an open-source project, freely available at http://sourceforge.net/projects/fluxvizplugin/ under GNU public license, including manual, tutorial and examples.
https://doi.org/10.1142/9781848166585_0009
Many cofactors and nucleotides containing sulfur atoms are known to have important functions in a variety of organisms. Recently, the biosynthetic pathways of these sulfur-containing compounds have been revealed, where many enzymes relay sulfur atoms. Increasing evidence also suggests that the prokaryotic sulfur-relay enzymes might be the evolutionary origin of ubiquitination and the related systems that control a wide range of physiological processes in eukaryotic cells. However, these sulfur-relay enzymes have been studied in only a small number of organisms. Here we carried out comparative genomic analysis and examined the presence and absence of sulfurtransferases utilized in the biosynthetic pathways of molybdenum cofactor (Moco), 2-thiouridine (S2U), and 4-thiouridine (S4U), and IscS, a cysteine desulfurase. We found that all eukaryotes and many other organisms lack the intermediate enzymes in S2U biosynthesis. We also found that most genes lack the rhodanese homology domain (RHD), a catalytic domain of sulfurtransferase. Some organisms have a conserved sequence composed of about 100 residues in the C terminus of TusA, different from RHD. Host-associated organisms have a tendency to lose Moco biosynthetic enzymes, and some organisms have a MoaD-MoaE fusion protein. Our findings suggest that sulfur-relay pathways have been so diversified that some putative sulfurtransferases possibly function in other unknown pathways.
https://doi.org/10.1142/9781848166585_0010
Lipid mediator is the collective term for prostanoids, leukotrienes, lysophospholipids, platelet-activating factor, endocannabinoids and other bioactive lipids that are involved in various physiological functions including inflammation, immune regulation and cellular development. They act by binding to their ligand-specific G-protein coupled receptors (GPCRs). Since the 1990s, a number of lipid GPCRs have been cloned in humans, with a few more identified in other vertebrates. However, the conservation of these receptors has been poorly investigated in other eukaryotes. Herein we performed a phylogenetic analysis by collecting their orthologs in 13 eukaryotes with complete genomes. The analysis shows that orthologs for prostanoid receptors are likely to be conserved in the 13 eukaryotes. In contrast, those for lysophospholipid and cannabinoid receptors appear to be conserved only in vertebrates and chordates. Receptors for leukotrienes and other bioactive lipids are limited to vertebrates. These results indicate that the lipid mediators and their receptors have coevolved with the development of highly modulated physiological functions such as immune regulation and the formation of the central nervous system. Accordingly, examining the presence and role of lipid mediator GPCR orthologs in invertebrate species can provide insight into the development of fundamental biological processes across diverse taxa.
https://doi.org/10.1142/9781848166585_0011
UDP glycosyltransferases (UGTs) are the largest glycosyltransferase gene family in higher plants, modifying secondary metabolites, hormones, and xenobiotics. This gene family plays an important role in the vast diversity of species-specific plant secondary metabolites. Experimental data on the biochemical activities and physiological roles of plant UGTs are increasing, but most UGTs have still not been functionally characterized. To understand their catalytic specificity and function from sequence data, phylogenetic analyses have been performed mainly in Arabidopsis, but a large-scale, comprehensive approach covering various species has not yet been applied. In this study, we collected 733 UGT sequences derived from 96 plant species and 252 substrate specificity records. We constructed a phylogenetic tree and divided most of these genes into nine sequence groups, which are characterized by biochemical specificity. Furthermore, we performed genome-wide analysis of the UGTs of seven plant species by mapping them into these groups. We propose this as a first step toward understanding the full set of glycosylated secondary metabolites of each plant species from its genome information.
https://doi.org/10.1142/9781848166585_0012
We develop a general method to identify gene networks from pair-wise correlations between genes in a microarray data set and apply it to a public prostate cancer gene expression data set from 69 primary prostate tumors. We define the degree of a node as the number of genes significantly associated with the node and identify hub genes as those with the highest degree. The correlation network was pruned using transcription factor binding information in VisANT (http://visant.bu.edu/) as a biological filter. The reliability of hub genes was determined using a strict permutation test. Separate networks for normal prostate samples, and prostate cancer samples from African Americans (AA) and European Americans (EA) were generated and compared. We found that the same hubs control disease progression in AA and EA networks. Combining AA and EA samples, we generated networks for low (<7) and high (≥7) Gleason grade tumors. A comparison of their major hubs with those of the network for normal samples identified two types of changes associated with disease: (i) Some hub genes increased their degree in the tumor network compared to their degree in the normal network, suggesting that these genes are associated with gain of regulatory control in cancer (e.g. possible turning on of oncogenes). (ii) Some hubs reduced their degree in the tumor network compared to their degree in the normal network, suggesting that these genes are associated with loss of regulatory control in cancer (e.g. possible loss of tumor suppressor genes). A striking result was that for both AA and EA tumor samples, STAT5a, CEBPB and EGR1 are major hubs that gain neighbors compared to the normal prostate network. Conversely, HIF-1α is a major hub that loses connections in the prostate cancer network compared to the normal prostate network.
We also find that the degree of these hubs changes progressively from normal to low grade to high grade disease, suggesting that these hubs are master regulators of prostate cancer and mark disease progression. STAT5a was identified as a central hub, with ~120 neighbors in the prostate cancer network and only 81 neighbors in the normal prostate network. Of the 120 neighbors of STAT5a, 57 are known cancer-related genes involved in functional pathways associated with tumorigenesis. Our method is general and can easily be extended to identify and study networks associated with any two phenotypes.
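The hub comparison described in this abstract comes down to counting each node's neighbors in two networks and looking at the difference. The following is a minimal sketch of that bookkeeping only, assuming the significant gene pairs have already been determined; the edge lists and node names (`hubA`, `g1`, and so on) are hypothetical placeholders rather than data from the chapter, and the significance testing, VisANT filtering, and permutation test are not reproduced.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_change(normal_edges, tumor_edges):
    """Degree in the tumor network minus degree in the normal network."""
    d_normal, d_tumor = degrees(normal_edges), degrees(tumor_edges)
    # Counter returns 0 for nodes that are absent from one of the networks.
    return {g: d_tumor[g] - d_normal[g] for g in set(d_normal) | set(d_tumor)}


# Hypothetical toy networks: hubA gains neighbors, hubB loses them.
normal = [("hubA", "g1"), ("hubB", "g1"), ("hubB", "g2"), ("hubB", "g3")]
tumor = [("hubA", "g1"), ("hubA", "g2"), ("hubA", "g3"), ("hubB", "g1")]

change = degree_change(normal, tumor)
gained = sorted(g for g, d in change.items() if d > 0)
lost = sorted(g for g, d in change.items() if d < 0)
print(gained, lost)
```

In this toy example `hubA` plays the role of a hub that gains connections in the tumor network and `hubB` the role of one that loses them.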
https://doi.org/10.1142/9781848166585_0013
Coexpressed genes are tentatively translated into proteins that are involved in similar biological functions. Here, we constructed gene coexpression networks from collected microarray data of the organisms Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli. Their degree distributions show the common property of an overrepresentation of highly connected nodes followed by a sudden truncation. In order to analyze this behavior, we present an evolutionary model simulating the genetic evolution. This model assumes that new genes emerge by duplication from a small initial set of primordial genes. Our model does not include the removal of unused genes but selective pressure is indirectly taken into account by preferentially duplicating the old genes. Thus, gene duplication represents the emergence of a new gene and its successful establishment. After a duplication event, all genes are slightly but iteratively mutated, thus altering their expression patterns. Our model is capable of reproducing global properties of the investigated coexpression networks. We show that our model reflects the mean inter-node distances and especially the characteristic humps in the degree distribution that, in the biological examples, result from functionally related genes.
https://doi.org/10.1142/9781848166585_0014
One of the open problems in systems biology is to infer dynamic gene networks describing the underlying biological process with mathematical, statistical and computational methods. The first-order difference equation-based models such as dynamic Bayesian networks and vector autoregressive models were used to infer time-lagged relationships between genes from time-series microarray data. However, two primary problems greatly reduce the effectiveness of current approaches. The first problem is the tacit assumption that time lag is stationary. The second is the inseparability between measurement noise and process noise (unmeasured disturbances that pass through time process).
To address these problems, we propose a stochastic differential equation model for inferring continuous-time dynamic gene networks in the setting where both process noise and observation noise exist. We present a collocation-based sparse estimation method for simultaneous parameter estimation and model selection in the model. The collocation-based approach requires considerably less computational effort than traditional methods for ordinary stochastic differential equation models. The proposed method can also easily incorporate various kinds of biological knowledge to refine the estimation accuracy. The results using simulated data and real time-series expression data of human primary small airway epithelial cells demonstrate that the proposed approach outperforms competing approaches and can identify significant genes influenced by gefitinib.
https://doi.org/10.1142/9781848166585_0015
DNA replication is a fundamental process that is tightly regulated during the cell cycle. In budding yeast it starts from multiple origins of replication and proceeds in a timely fashion according to a reproducible temporal program until the entire DNA is replicated exactly once per cell cycle. In this program an origin seems to have an inherent firing probability at a specific time in S-phase that is conserved over the population. However, what exactly determines the origin initiation time remains obscure. In this work, we analyze the gene content that clusters around replication origins following the assumption that inherent origin properties that determine staggered initiation times could potentially be mirrored in the close origin proximity. We perform a Gene Ontology term enrichment test and find that metabolic genes are significantly over-represented in the regions that are close to the starting points of DNA replication. Furthermore, functional analysis also reveals that catabolic genes cluster around early firing origins, whereas anabolic genes can rather be found in the proximity of late firing origins of replication. We speculate that, in budding yeast, gene function around replication origins correlates with their intrinsic probability to initiate DNA replication at a given point in S-phase.
https://doi.org/10.1142/9781848166585_0016
Signaling pathways are often represented by networks where each node corresponds to a protein and each edge corresponds to a relationship between nodes such as activation, inhibition and binding. However, such signaling pathways in a cell may be affected by genetic and epigenetic alteration. Some edges may be deleted and some edges may be newly added. The current knowledge about known signaling pathways is available on some public databases, but most of the signaling pathways including changes upon the cell state alterations remain largely unknown. In this paper, we develop an integer programming-based method for inferring such changes by using gene expression data. We test our method on its ability to reconstruct the pathway of colorectal cancer in the KEGG database.
https://doi.org/10.1142/9781848166585_0017
Boolean modeling has been successfully applied to the budding yeast cell cycle to demonstrate that both its structure and its timing are robustly designed. However, from these studies few conclusions can be drawn about how robust the cell cycle arrest upon osmotic stress and pheromone exposure might be. We therefore implement a compact Boolean model of the S. cerevisiae cell cycle including its interfaces with the High Osmolarity Glycerol (HOG) and the pheromone pathways. We show that all initial states of our model robustly converge to a cyclic attractor in the absence of stress inputs, whereas pheromone exposure and osmotic stress lead to convergence to singleton states which correspond to G1 and G2 arrest in silico. A comparison with random Boolean networks reveals that cell cycle arrest under osmotic stress is a highly robust property of the yeast cell cycle. We implemented our model using the novel frontend booleannetGUI to the Python software booleannet.
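The notions of cyclic and singleton attractors can be made concrete with a very small synchronous Boolean network. The sketch below is a toy illustration and not the chapter's yeast model: the three nodes, their update rules, and the single `stress` input are assumptions, and it is written in plain Python rather than with the booleannet package.

```python
from itertools import product


def step(state, stress):
    """Synchronous update of a hypothetical 3-node ring: A = NOT C, B = A, C = B.

    When `stress` is True, node A is forced off (toy stand-in for an arrest signal).
    """
    a, b, c = state
    return ((not c) and (not stress), a, b)


def find_attractor(state, stress):
    """Iterate from `state` until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, stress)
    cycle = seen[seen.index(state):]
    # Rotate to a canonical start so the same cycle always compares equal.
    k = cycle.index(min(cycle))
    return tuple(cycle[k:] + cycle[:k])


def attractors(stress):
    """Collect the attractors reached from all 2**3 initial states."""
    return {find_attractor(s, stress) for s in product([False, True], repeat=3)}


free_running = attractors(stress=False)
arrested = attractors(stress=True)
print(sorted(len(cycle) for cycle in free_running))  # cycle lengths without stress
print(arrested)                                      # attractors under stress
```

In this toy network every initial state ends on a cycle when the input is off, while switching the input on collapses all trajectories onto a single fixed state, which mirrors the qualitative distinction the abstract draws between cycling and arrest.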
https://doi.org/10.1142/9781848166585_0018
For several decades, many methods have been developed for predicting organic synthesis paths. However, these methods require non-polynomial computational time. In this paper, we propose a bottom-up dynamic programming algorithm to predict synthesis paths of target tree-structured compounds. In this approach, we transform the synthesis problem of tree-structured compounds into the generation problem of unordered trees by regarding tree-structured compounds and chemical reactions as unordered trees and rules, respectively. In order to represent rules corresponding to chemical reactions, we employ a subclass of NLC (Node Label Controlled) grammars. We also give some computational results on this algorithm.
https://doi.org/10.1142/9781848166585_bmatter