This volume contains 31 peer-reviewed papers based on the presentations at the 7th International Annual Workshop on Bioinformatics and Systems Biology (IBSB 2007) held at the Human Genome Center, Institute of Medical Science, University of Tokyo from July 31 to August 2, 2007. This workshop started in 2001 as an event for doctoral students and young researchers to present and discuss their research results and approaches in bioinformatics and systems biology. It is part of a collaborative educational program involving leading institutions and leaders committed to the following programs and partner institutions:
• Boston (Charles DeLisi) — Graduate Program in Bioinformatics, Boston University
• Berlin (Herman-Georg Holzhütter) — The International Research Training Group (IRTG) “Genomics and Systems Biology of Molecular Networks”
• Kyoto/Tokyo (Minoru Kanehisa/Satoru Miyano) — Joint Bioinformatics Education Program of Kyoto University and University of Tokyo.
This volume is dedicated to the memory of Prof. Dr. Dr. h.c. Reinhart Heinrich, a former Professor at Humboldt University Berlin and a co-founder of this workshop.
Sample Chapter(s)
Chapter 1: Regulatory Elements of Marine Cyanobacteria (755 KB)
https://doi.org/10.1142/9781860949920_fmatter
DEDICATION.
PREFACE.
PROGRAM COMMITTEE.
CONTENTS.
https://doi.org/10.1142/9781860949920_0001
The free-living, oxyphototrophic bacteria of the Prochlorococcus group widely populate the oceans. Genomic information from nine marine cyanobacteria was used to predict signals essential for regulation. We implemented a pipeline that automatically calculates BLASTp alignments of query genomes, selects a representative subset of orthologs and predicts motifs conserved in their upstream sequences. Next, similar motifs are clustered into groups which could contain profiles recognized by different transcription factors. The phylogenetic footprinting pipeline revealed a minimal conserved set of putative transcription factors, binding sites and regulons for the chosen marine cyanobacterial genomes. DNA-binding motifs for NtcA and LexA were correctly identified. The relevance of transcriptional regulation of predicted cis elements was supported experimentally.
https://doi.org/10.1142/9781860949920_0002
Exploiting the ortholog/homolog information now available from the complete genomic sequences of twelve species of Drosophila, we have investigated the ability of regulatory site recognition methods to find regulatory changes for orthologs linked to chromosomal rearrangements. This has made use of the wealth of synteny information among these species. By comparing orthologs in multiple species, we found that the breakpoint of chromosomal rearrangements could have had an impact on regulatory changes of genes next to it with respect to the gene function and location. Extensions of our approach could be used to shed light on the role of gene regulation in the evolutionary adaptation to different environmental conditions.
https://doi.org/10.1142/9781860949920_0003
After the sequencing of the human genome, the publication of the genome of our nearest relative, the chimpanzee (Pan troglodytes), provided groundbreaking data that improve our understanding of recent human evolution. There are about forty million changes, most of them single nucleotide substitutions, which teach us about ourselves, both in terms of similarities and differences with chimpanzees. From a medical point of view, differences in the incidence and severity of diseases are of special importance for pinpointing novel targets and developing innovative therapies. This analysis focuses on the observation that chimpanzees rarely suffer from cancer. To elucidate possible reasons for this finding, we compare differences regarding apoptosis and DNA repair at different levels, from chromosome organization, gene structure, and post-transcriptional and post-translational modifications to functional changes in protein structures. The result is a complex pattern of subtle variances and a few large-scale changes.
https://doi.org/10.1142/9781860949920_0004
The draft genome of Trichomonas vaginalis was recently published, but little is known about why it has such a large genome. In part this size is due to many gene family expansions. For example, we found over 100 members in the adenylyl cyclase family. About half are complete full-length genes, and nearly half are initially confirmed to be pseudogenes; the remainder are either incomplete or the apparent result of assembly or sequencing problems. The family can be divided into two subgroups by sequence similarity. These can then be divided into functional genes and pseudogenes. Among all four of these sets the cyclase domain is very well conserved. We propose three possible hypotheses for that observation: a) sequencing error or stop-codon read-through; b) recency of duplication and mutation; c) the possibility of functional pseudogenes.
https://doi.org/10.1142/9781860949920_0005
Calcium is the most important second messenger in living cells serving as a critical link between a large variety of extracellular stimuli and the intracellular target. Often, the Ca2+ signal is carried by [Ca2+] oscillations. Our recent studies have demonstrated that in contrast to traditional ideas Ca2+ oscillations do not occur by simple synchronization of channel clusters opening and closing in an oscillatory fashion but originate from microscopic fluctuation caused by the stochastic binding of the ligands Ca2+ and IP3 to the receptor's binding sites. They are orchestrated spatially on the cell level by wave nucleation. In this paper we analyze the stochastic data and show how internal properties can be determined from global observations. Further, we analyze the information content of spontaneous and stimulated oscillations.
https://doi.org/10.1142/9781860949920_0006
The coordination of biological activities into daily cycles provides an important advantage for the fitness of diverse organisms. An internal circadian oscillator drives gene expression in an approximately 24-hour rhythm. Circadian clocks are found in most eukaryotes. In prokaryotes only cyanobacteria are known to regulate their activities in a circadian rhythm. In vitro experiments showed that three cyanobacterial proteins, KaiA, KaiB and KaiC, together with ATP are sufficient to generate temperature-compensated circadian oscillations of KaiC protein phosphorylation. Thus, in contrast to eukaryotic clock models, the cyanobacterial core oscillator operates independently of transcription and translation processes. Most previous models of the bacterial circadian clock used complex mathematical descriptions. Here, we suggest a minimal and manageable heuristic system. Even though only four reaction steps were assumed, our model exhibited sustained oscillations of KaiC phosphorylation. The model successfully reproduced known experimental data, and oscillations were maintained even under a concerted increase of Kai protein concentrations. Thus, we provide a useful minimal system of differential equations which might serve as a core module of the holistic cyanobacterial clockwork in the future.
https://doi.org/10.1142/9781860949920_0007
The circadian clock is a biological system providing an internal self-sustained temporal framework and adaptation mechanisms to the daily environmental rhythm. One of its behavioral implications in humans is the sleep-wake cycle. The core mammalian circadian clock is a system composed of interacting regulatory feedback loops present in many tissues throughout the body. The core set of circadian clock genes codes for proteins feeding back to regulate not only their own expression, but also that of clock output genes and regulatory pathways. However, our understanding of processes regulated in a circadian fashion, and of the linkage between the molecular system and behavioral or physiological outputs, is still poor. Our work aims at the identification of clock-controlled genes (CCGs) and their regulatory motifs. We analyzed several microarray measurements of genes with a daily oscillating expression and extracted 2065 of them together with their peak expression phases and oscillation amplitudes. For an in-depth analysis we selected a subset of 167 genes reported by multiple microarray experiments. Gene promoters were scanned for known regulatory motifs of clock genes (E-Box, RRE, D-Box, CRE) as well as other over-represented regulatory motifs. We found an over-representation of E-boxes and D-boxes in the selected subset of 167 CCGs. This over-representation is smaller when the list of 2065 genes is analyzed. The search for other regulatory motifs contained in the TRANSFAC database revealed a strong over-representation of some of them, such as Sp1, AP-2, STAT1, HIF-1 and E2F. The signals found in the promoter sequences indicate possible regulatory mechanisms important for the coordination of circadian rhythms.
https://doi.org/10.1142/9781860949920_0008
Modeling of specification events during development poses new challenges to biochemical modeling. These include data limitations and a notorious absence of homeostasis in developing systems. The sea urchin is one of the best studied model organisms concerning development and a network, the Endomesoderm Network, has been proposed that is presumed to control endoderm and mesoderm specification in the embryo of Strongylocentrotus purpuratus. We have constructed a dynamic model of a subnetwork of the Endomesoderm Network. In constructing the model, we had to resolve the following issues: choice of appropriate subsystem, assignment of embryonic data to cellular model, choice of appropriate kinetics. Although the resulting model is capable of reproducing fractions of the experimental data, it falls short of reproducing specification of cell types. These findings can facilitate the refinement of the Endomesoderm Network.
https://doi.org/10.1142/9781860949920_0009
The understanding of complex biological processes whose function requires the interaction of a large number of components is strongly improved by the construction of mathematical models able to capture the underlying regulatory wirings and to predict the dynamics of the process in a variety of conditions. Iterative rounds of simulations and experimental analysis generate models of increasing accuracy, an approach known as systems biology. The cell cycle is one of the complex biological processes that benefit from this approach, and budding yeast in particular is an established model organism for these studies. A recent publication on the modeling of the G1/S transition of the budding yeast cell cycle under a systems biology analysis has highlighted in particular the implications of cell size determination that impinge on the events driving DNA replication. During the life cycle of eukaryotic cells, DNA replication is restricted to a specific time window, called the S phase, and several control mechanisms ensure that each DNA sequence is replicated once, and only once, in the period from one cell division to the next. Here we extend the analysis of the G1/S transition model by including additional aspects concerning the DNA replication process, in order to provide a reasonable explanation of the experimental dynamics as well as of specific cell cycle mutants. Moreover, we show the mathematical description of the critical cell mass (Ps) that cells have to reach to start DNA replication, whose value is modulated depending on the different activation of the replication origins. The sensitivity analysis of the influence that the kinetic parameters of the G1/S transition model have on the setting of the Ps value is also reported.
https://doi.org/10.1142/9781860949920_0010
Receptor trafficking describes the internalization and recycling processes of receptors in the cell. Considerable efforts in quantitative modeling have been made so far in the study of receptor trafficking networks. To keep the mathematical analysis simple, the canonical receptor trafficking models either ignored the recycling step of receptors or did not consider the trafficking of empty receptors. Here, we revisit the canonical receptor trafficking models and implement steady state analysis for a general model of receptor trafficking networks, which is composed of the de novo appearance of surface receptor, ligand-receptor interaction, internalization, recycling and degradation of both empty and occupied receptors. We present the analytical solution of the two steady states of the receptor trafficking networks before and after the network is exposed to the signal. The results indicate that the distribution of the empty receptor at the cell surface and inside the cell, before the signal is added, is mainly determined by the ratio of the internalization rate and recycling rate of the empty receptor. Furthermore, the steady state analysis demonstrates that classic Scatchard plot analysis is still valid for the steady state of the complicated receptor trafficking network.
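The dependence on the internalization and recycling rates can be illustrated with a deliberately reduced two-pool model of empty receptors. This is a sketch under stated assumptions, not the general model of the chapter: the function name, the parameter names, and the choice that degradation acts only on the internal pool are all hypothetical.

```python
def empty_receptor_steady_state(synthesis, k_int, k_rec, k_deg):
    """Steady state of an assumed two-pool model of empty receptors.

    d(surface)/dt  = synthesis - k_int * surface + k_rec * internal
    d(internal)/dt = k_int * surface - (k_rec + k_deg) * internal

    Assumption: receptors are degraded only from the internal pool.
    """
    # Adding both equations gives synthesis = k_deg * internal at steady state.
    internal = synthesis / k_deg
    # Setting d(internal)/dt = 0 then fixes the surface pool.
    surface = (k_rec + k_deg) * internal / k_int
    return surface, internal


# Illustrative parameter values only (arbitrary units).
surface, internal = empty_receptor_steady_state(
    synthesis=10.0, k_int=0.2, k_rec=0.5, k_deg=0.1
)
print(surface / internal)  # equals (k_rec + k_deg) / k_int
```

In this reduced model the surface-to-internal ratio is (k_rec + k_deg) / k_int, so when degradation is slow compared with recycling the distribution is governed mainly by the ratio of recycling to internalization rates.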
https://doi.org/10.1142/9781860949920_0011
Transcription factors (TFs) bind to the regulatory regions of genes in a cooperative manner. This article describes a method to detect pairs of transcription factor binding sites which co-occur in known regulatory regions more often than expected by mere combination of the individual binding sites. We determine frequently co-occurring TF pairs and evaluate the method using known TF interactions.
Furthermore we use co-occurrence scores to assess the regulatory potential of a sequence region by calculating a graph-based score. We show results for the score on known regulatory regions.
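One simple way to quantify "more often than expected" is an observed-to-expected ratio under an independence assumption. The sketch below is illustrative only: the abstract does not specify the scoring function, and the TF names, the input layout (one set of TF names per regulatory region) and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_ratios(regions):
    """Observed / expected co-occurrence for every TF pair seen together.

    regions: list of sets, each holding the TFs with a predicted binding
    site in one regulatory region. The expected count assumes that the
    two TFs occur independently across regions.
    """
    n = len(regions)
    single = Counter()
    pair = Counter()
    for tfs in regions:
        single.update(tfs)
        pair.update(combinations(sorted(tfs), 2))
    ratios = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n
        ratios[(a, b)] = observed / expected
    return ratios


# Toy data with placeholder TF names.
regions = [{"TF1", "TF2"}, {"TF1", "TF2"}, {"TF3"}, {"TF1", "TF3"}]
for tf_pair, ratio in sorted(cooccurrence_ratios(regions).items()):
    print(tf_pair, round(ratio, 3))
```

A ratio above 1 marks a pair that co-occurs more often than independence would predict; a real analysis would also need a significance test, which is omitted here.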
https://doi.org/10.1142/9781860949920_0012
We propose an approach to identify activated transcription factors from gene expression data using a statistical test. Applying the method, we can obtain a synoptic map of transcription factor activities which helps us to easily grasp the system's behavior. As a real data analysis, we use data from a case-control experiment on mice treated with a Kampo medicine drug that remedies the degraded myelin sheath of nerves in the central nervous system. Kampo medicine is traditional Japanese herbal medicine. Since the drug is not a single chemical compound but an extract of multiple medicinal herbs, the effector sites are possibly multiple. Thus it is hard to understand the mechanism of action and the system's behavior by investigating only a few highly expressed individual genes. Our method gives a summary of the system's behavior with various functional annotations, e.g. TFAs and gene ontology, and thus offers clues to understand it in a more holistic manner.
https://doi.org/10.1142/9781860949920_0013
We describe a new method based on principal component analysis and robust consensus ensemble clustering to identify and elucidate the subtypes of breast cancer disease. The method was applied to microarray gene expression data using micro-dissection of samples from 36 breast cancer patients with at least two of three pathological stages of disease. Controls were normal breast epithelial cells from 3 disease free patients. Our method identified an optimum set of genes and strong, stable clusters which correlated well with clinical classification into Luminal, Basal and Her2+ subtypes based on ER, PR and Her2 status. It also revealed a hierarchical portrait of disease progression through various grades and stages and identified genes and functional pathways for each stage, grade and disease subtype. We found that gene expression heterogeneity across subtypes is much greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes are distinct disease processes. The averaging over data perturbations and clustering methods is critical in the robust identification of subtypes and gene markers for grade and progression.
https://doi.org/10.1142/9781860949920_0014
Protein interactions constitute the backbone of the cellular machinery in living systems. Their biological importance has led to systematic assemblies of large-scale protein-protein interaction maps for various organisms. Recently, the focus of such interactome projects has shifted towards the elucidation of the human interaction network. Several strategies have been employed to gain comprehensive maps of protein interactions occurring in the human body. For their efficient analysis, graph theory has become a favourite tool. It can identify characteristic features of interaction networks which can give us important insights into the general structure of the underlying molecular networks. Although such graph-theoretical analyses have delivered us a variety of interesting results, their general validity remains to be demonstrated. We therefore examined whether independently assembled human interaction networks show common structural features. Remarkably, while some general graph-theoretical features were found, we detected a strong dependency of network structures on the method used to generate the network. Our study strongly indicates that graph-theoretical analysis can be severely compromised by the observed structural divergence and reassessment of earlier results might be warranted.
https://doi.org/10.1142/9781860949920_0015
Amino acid indices are useful tools in bioinformatics. With the appearance of novel theory and technology, and the rapid increase of experimental data, building new indices to cope with new or unsolved old problems is still necessary. In this study, residue networks are constructed from the PDB structures of 640 representative proteins based on the distance between Cα atoms with an 8 Å cutoff. All these networks show typical small world features. New amino acid indices, termed relative connectivity, clustering coefficient, closeness and betweenness, are derived from the corresponding topological parameters of amino acids in the residue networks. The 4 new network based indices are closely clustered together and related to hydrophobicity and β propensity. When compared with related amino acid indices, the new indices show better or comparable performance in protein surface residue prediction. Relative connectivity is the best index and can reach a useful performance with an area under the curve about 0.75. It indicates that the network property based amino acid indices can be useful complements to the existing physicochemical property based amino acid indices.
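The network construction step can be sketched as follows, assuming that two residues are linked when their Cα atoms lie within the 8 Å cutoff. The coordinates are toy values rather than PDB data, the function names are hypothetical, and whether sequence neighbors are treated specially in the chapter is not stated in the abstract.

```python
import math
from itertools import combinations


def residue_network(ca_coords, cutoff=8.0):
    """Adjacency sets for residues whose C-alpha atoms lie within `cutoff` angstroms."""
    neighbors = {i: set() for i in range(len(ca_coords))}
    for i, j in combinations(range(len(ca_coords)), 2):
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def clustering_coefficient(neighbors, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return 2.0 * links / (k * (k - 1))


# Toy coordinates in angstroms; the last residue is isolated.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (20.0, 0.0, 0.0)]
net = residue_network(coords)
print({i: len(n) for i, n in net.items()})  # degree of each residue
print(clustering_coefficient(net, 0))
```

Averaging such per-residue values over all residues of the same amino acid type across many structures is one plausible route to an index per amino acid, although the exact normalization used in the chapter is not given here.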
https://doi.org/10.1142/9781860949920_0016
Protein-protein interactions are operative at almost every level of cell function. In the recent years high-throughput methods have been increasingly used to uncover protein-protein interactions at genome scale resulting in interaction maps for entire organisms. However, biochemical implications of high-throughput interactions are not always obvious. The question arises whether all interactions detected by in vitro experiments also play a functional role in the living cell. In this work we systematically analyze high-throughput protein-protein interactions stored in public databases in the context of metabolic networks. Classifying reaction pairs according to their topological distance revealed a significantly higher frequency of enzyme-enzyme interactions for directly neighbored reactions (distance = 1). To determine possible functional implications for these interactions we examined randomized networks using original enzyme interactions as well as randomly generated interaction data. A functional relevance of enzyme-enzyme interactions could be demonstrated for those reactions that exhibit low connectivity. As this is a characteristic of enzyme pairs in metabolic channeling we systematically searched the literature and indeed recovered a certain fraction of enzyme pairs that has already been implicated in metabolic channeling. However, a substantial number of enzyme pairs uncovered by our large-scale analysis remains that up to now has neither been functionally nor structurally classified and therefore present novel candidates of the metabolic channeling concept.
https://doi.org/10.1142/9781860949920_0017
Although whole-genome sequencing of many organisms has been completed, numerous newly discovered genes are still functionally unknown. Using high-throughput data such as protein-protein interaction (PPI) information to assign putative protein function to the unknown genes has been proposed, since in many cases it is not feasible to annotate the newly discovered genes by sequence-based approaches alone. In addition to PPI data, information such as protein localization within a cell may be employed to improve protein function prediction in two ways: 1) By using such localization information as a direct indicator of protein function (e.g. nucleolus localized proteins might be involved in ribosome biogenesis), and 2) by refining noisy PPI data by localization information. In the latter case, localization information may be used to distinguish different types of PPIs: Namely, interactions between co-localized proteins (more reliable), and interactions between differently localized proteins (potentially less reliable). In this paper, we propose a probabilistic method to predict protein function from PPI data and localization information. A Bayesian network is used to model dependencies between protein function, PPI data and localization information. We showed in our cross-validation experiment that in some cases, our method (conditioning PPI data by localization information) significantly improves prediction precision, as compared to a simple Naive Bayes method that assumes PPI data and localization information are conditionally independent given protein function. Finally, we predicted 57 unknown genes as "ribosome biogenesis" proteins.
https://doi.org/10.1142/9781860949920_0018
We evaluate the performance of common substitution matrices with respect to structural similarities. For this purpose, we apply an all-versus-all pairwise sequence alignment on the ASTRAL40 [7] dataset, consisting of 7290 entries with a pairwise sequence identity of at most 40%. Afterwards, we compare the 100 highest scoring sequence alignments to their corresponding structural alignments, which we obtain from our structure alignment database. Our database consists of about 18.6 million pairwise entries. We calculated these alignments by applying the current version of GANGSTA [1], our non-sequential structural alignment tool, on about 26 million pairs. The results illustrate the difficulty of homology based protein structure prediction in cases of low sequence similarity. Further, the large fraction of structurally similar proteins in the ASTRAL40 dataset is quantitatively measured. Thereby, this investigation yields a new perspective on the topic of sequence and structure relation. Hence, our finding is a large-scale quality measure for any sequence based method, which aims to detect structural similarities.
https://doi.org/10.1142/9781860949920_0019
A method is presented to calculate thermodynamic conformational entropy of a biomolecule from molecular dynamics simulation. Principal component analysis (the quasi-harmonic approximation) provides the first decomposition of the correlations in particle motion. Entropy is calculated analytically as a sum of independent quantum harmonic oscillators. The largest classical eigenvalues tend to be more anharmonic and show statistical dependence beyond correlation. Their entropy is corrected using a numerical method from information theory: the k-nearest neighbor algorithm. The method calculates a tighter upper limit to entropy than the quasi-harmonic approximation and is likewise applicable to large solutes, such as peptides and proteins. Together with an estimate of solute enthalpy and solvent free energy from methods such as MMPB/SA, it can be used to calculate the free energy of protein folding as well as receptor-ligand binding constants.
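The analytical part of the calculation, the entropy of independent quantum harmonic oscillators, can be sketched as below. The frequency assignment follows the standard quasi-harmonic relation ω = sqrt(k_B T / λ) for eigenvalues λ of the mass-weighted covariance matrix; the function names are hypothetical, and the k-nearest-neighbor correction described in the abstract is not reproduced.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
HBAR = 1.054571817e-34  # reduced Planck constant, J*s


def oscillator_entropy(omega, temperature):
    """Entropy (J/K) of one quantum harmonic oscillator with angular frequency omega (rad/s)."""
    alpha = HBAR * omega / (K_B * temperature)
    # S / k_B = alpha / (exp(alpha) - 1) - ln(1 - exp(-alpha))
    return K_B * (alpha / math.expm1(alpha) - math.log1p(-math.exp(-alpha)))


def quasi_harmonic_entropy(eigenvalues, temperature):
    """Sum of oscillator entropies for eigenvalues (kg*m^2) of a mass-weighted covariance matrix."""
    total = 0.0
    for lam in eigenvalues:
        omega = math.sqrt(K_B * temperature / lam)
        total += oscillator_entropy(omega, temperature)
    return total


# Illustrative eigenvalues only, not taken from any simulation.
print(quasi_harmonic_entropy([1.0e-46, 5.0e-47], temperature=300.0))
```

Larger eigenvalues correspond to softer modes with lower frequencies, which contribute more entropy; these are the modes the abstract identifies as most anharmonic and most in need of correction.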
https://doi.org/10.1142/9781860949920_0020
Since protein complex crystallization is expensive and time-consuming, computational docking tools provide a valuable method to investigate protein interactions. While the sampling of possible docked conformers of two proteins can be performed efficiently by Fast Fourier Transform (FFT) methods, the selection of near-native decoys from the pool of thousands of possible decoys is still far from being solved. Here, a new approach for docking decoy selection by Monte Carlo stability analysis is presented. In the course of replica exchange Monte Carlo simulations (REMC), replica from near-native decoys show a significantly lower structural diversity than replica from non-native decoys. The effect is successfully applied to rank docking decoys in a benchmark set of 59 protein complexes.
https://doi.org/10.1142/9781860949920_0021
We demonstrate an approach to automatically generating kinetic models of metabolic networks. In a first step, the metabolic network is characterised by its stoichiometric structure. Then to each reaction a kinetic equation is associated describing the metabolic flux. For the kinetics we use a formula that is universally applicable to reactions with arbitrary numbers of substrates and products. Last, the kinetics of the reactions are assigned parameters. The resulting model in SBML format can be fed into standard simulation tools. The approach is applied to the sulphur-glutathione-pathway in Saccharomyces cerevisiae.
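The three steps (stoichiometric structure, rate law per reaction, parameter assignment) can be sketched in a few lines. Irreversible mass-action kinetics is used here purely as a stand-in for the universally applicable rate formula mentioned in the abstract, which is not reproduced; the function names and the toy pathway are hypothetical, and no SBML export is shown.

```python
def mass_action_rates(stoich, conc, k_forward):
    """Reaction rates from a stoichiometric matrix.

    stoich[i][j] is the coefficient of metabolite i in reaction j
    (negative for substrates, positive for products).
    Assumption: irreversible mass-action kinetics as a placeholder rate law.
    """
    rates = []
    for j, k in enumerate(k_forward):
        v = k
        for i, row in enumerate(stoich):
            if row[j] < 0:
                v *= conc[i] ** (-row[j])
        rates.append(v)
    return rates


def derivatives(stoich, conc, k_forward):
    """dx/dt = N * v(x) for all metabolites."""
    v = mass_action_rates(stoich, conc, k_forward)
    return [sum(row[j] * v[j] for j in range(len(v))) for row in stoich]


# Toy linear pathway A -> B -> C with illustrative parameters.
stoich = [[-1, 0],
          [1, -1],
          [0, 1]]
print(derivatives(stoich, conc=[2.0, 1.0, 0.0], k_forward=[0.5, 1.0]))
```

Because the rate function is derived mechanically from the matrix, swapping in a different rate law only changes `mass_action_rates`, which is the property that makes automatic model generation practical.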
https://doi.org/10.1142/9781860949920_0022
The vast accumulation of biological pathway data scattered in various sources presents challenges in the exchange and integration of these data. Major new standards for representation of pathway data and the ability to check inconsistency in pathways are inevitable for the development of a reliable pathway data repository. Within the context of biological pathways, the cell system ontology (CSO) had been developed as a general framework to model system dynamics and visualization of diverse biological pathways. CSO provides an excellent environment for modeling, visualizing, and simulating complex molecular mechanisms at different levels of details. This paper examines whether CSO addresses the integration capability of pathway data with system dynamics. We present a conversion tool for converting BioPAX to CSO. Transforming the data from BioPAX to CSO not only allows an analysis of the dynamic behaviors in molecular interactions but also allows the results to be stored for further biological investigations, which is not possible in BioPAX. The conversion is done using simple inference algorithms with the addition of view- and simulation-related properties. We demonstrate how CSO can be used to build a complete and consistent pathway repository and enhance the interoperability among applications.
https://doi.org/10.1142/9781860949920_0023
The prediction of glycan structures from gene expression of glycosyltransferases (GTs) is a challenging new area in computational biology because the biosynthesis of glycan chains is under the control of GT expression. In this paper we developed a new method for predicting glycan structures from gene expression data. There are two main original aspects of the proposed method. First, we proposed to increase the number of predictable glycan structure candidates by estimating missing glycans from a global glycan structure map, which enables us to predict new glycan structures that are not stored in the database. Second, we proposed a more general scoring scheme based on real-valued gene expression intensity rather than converting it into binary information. In the result we applied the proposed method to predicting cancer-specific glycan structures from gene expression profiles for patients of acute lymphocytic leukemia (ALL) and acute myelocytic leukemia (AML). We confirmed that several of the predicted glycan structures successfully correspond to known cancer-specific glycan structures according to the literature, and our method outperforms the previous methods at a statistically significant level.
https://doi.org/10.1142/9781860949920_0024
Cigarette smoke is the major cause of lung cancer and chronic obstructive pulmonary disease in the United States. We have previously defined the impact of tobacco smoke on intrathoracic airway gene expression among healthy nonsmokers and smokers using standard 3'-based expression U133A arrays [12]. In this report, we compared the performance of the Affymetrix GeneChip Human Exon 1.0 ST array with the HG-U133A array for detecting smoking-related gene expression changes in large airway epithelium obtained at bronchoscopy. RNA obtained from the same bronchial airway epithelial cell samples of four current smokers and three never smokers was hybridized to both arrays. Out of 22,215 probe sets on HG-U133A, 14,741 RefSeq transcripts were mapped to 17,800 core transcripts on the Exon array and the 2 platforms were compared for this overlapping transcript set. While the reproducibility of both platforms was high, the Exon array had a slightly stronger correlation for technical replicates. A majority of the genes with the largest smoking-related fold changes were tightly correlated between platforms, but there were a number of smoking-related changes in gene expression that were detected only on the Exon arrays. Furthermore, while the HG-U133A study did not have enough power to detect any differentially expressed genes between the 4 current vs. 3 never smokers at a False Discovery Rate (FDR) < 0.05, seventy differentially expressed genes were detected at FDR < 0.05 in the same set of samples using the Exon platform. These findings suggest that the all-Exon array is a more robust platform for measuring airway epithelial gene expression and can serve as an effective tool for exploring host response to and damage from cigarette smoke.
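The abstract does not state which procedure was used to control the FDR. As an illustration of what an FDR threshold of 0.05 means in practice, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its BH threshold.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Made-up p-values for five transcripts.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
```

With few samples per group, per-transcript p-values rarely fall below the rank-scaled thresholds, which is consistent with one platform yielding no significant genes at the same FDR while another yields some.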
https://doi.org/10.1142/9781860949920_0025
We propose a novel method to classify samples where each sample is characterized by a time course gene expression profile. By exploiting the mixture of state space model, the proposed method addresses the following tasks: (1) clustering samples according to temporal patterns of gene expressions, (2) automatic detection of genes that discriminate identified clusters, (3) estimation of a restricted autoregressive coefficient for each cluster. We demonstrate the proposed method along with the cluster analysis of 53 multiple sclerosis patients under recombinant interferon β therapy with the longitudinal time course expression profiles.
https://doi.org/10.1142/9781860949920_0026
We have developed a PubMed article recommendation system, PURE, which is based on content-based filtering. PURE has a web interface through which users can add or delete their preferred articles. Once articles are registered, PURE performs model-based clustering of the preferred articles and recommends highly rated articles predicted by the trained model. PURE updates the PubMed articles and reports its recommendations by email on a daily basis. This system will help biologists reduce the time required for gathering information from PubMed. PURE is downloadable under the GPL license, via http://www.bic.kyoto-u.ac.jp/pathway/mami/out/PURE.tar.gz.
https://doi.org/10.1142/9781860949920_0027
Many N-myristoylated proteins play key roles in regulating cellular structure and function. In a previous study, we applied the machine learning system BONSAI to predict patterns based on which positive and negative examples could be classified. Although BONSAI has helped establish 2 interesting rules regarding the requirements for N-myristoylation, the accuracy rates of these rules are not satisfactory. This paper suggests an enhancement of BONSAI by introducing an "insignificant indexing symbol" and demonstrates the efficiency of this enhancement by showing an improvement in the accuracy rates. We further examine the performance of this enhanced BONSAI by comparing the results of classification obtained with the proposed method and an existing public method for the same sets of positive and negative examples.
https://doi.org/10.1142/9781860949920_0028
To identify pathways of carbon utilization in the metal-reducing marine bacterium Shewanella oneidensis MR-1, we assayed the expression of cells grown with various carbon sources using a high-density oligonucleotide Affymetrix microarray. Our expression profiles reveal genes and regulatory mechanisms which govern the sensing, import, and utilization of the nucleoside inosine, the chitin monomer N-acetylglucosamine, and a casein-derived mixture of amino acids. Our analysis suggests a prominent role for the pentose-phosphate and Entner-Doudoroff pathways in energy metabolism, and regulatory coupling between carbon catabolism and electron acceptor pathways. In sum, these results indicate that S. oneidensis possesses a broader capacity for carbon utilization than previously reported, a view with implications for optimizing its role in microbial fuel cell and bioremediative applications.
https://doi.org/10.1142/9781860949920_0029
With the increase in available post-genomic data and metabolic pathway information, we have been focusing on revealing the biological meaning of higher phenomena such as relationships of metabolic systems in different organisms. Metabolism plays an essential role in all cellular organisms, e.g. in energy transport, signal transduction and structural formation of cell components. The metabolic pathway of each organism has a different landscape from all others because of the different sets of enzymes encoded in the genome. Organisms that are incapable of producing their own essential chemical compounds must acquire them in some way from other organisms that can produce them. For example, several vitamins are required by animals to survive. We can therefore assume that the different availabilities of metabolites may influence the relationship between organisms in nature. In this study, we focus on the differences in available metabolites among organisms. First, we divided 239 species with complete genomes into 9 organism groups in accordance with phylogeny and averaged out the annotation quality and the phylogenetic sparsity. Then, we calculated the commonly used chemical compounds between organism groups and the uniquely used chemical compounds in an organism group. The total number of metabolites we consider in this study is 1,074, which is about one-third of all metabolites that appear in the KEGG metabolic pathways. Finally, we show the differences and the similarities between organism groups on every metabolic pathway map, illustrating the commonly observed substructures within the uniquely used metabolites. These results will help us to better comprehend the architecture of metabolic pathways and the relationships between organisms.
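The comparison of commonly and uniquely used compounds described above can be read as set arithmetic over per-group compound sets. The following is only a minimal sketch under that reading: the function name `common_and_unique`, the group labels and the compound identifiers are hypothetical placeholders, not data or code from the paper.

```python
def common_and_unique(groups):
    """Return (compounds shared by all groups, compounds unique to each group).

    groups: dict mapping a group label to the set of compounds it uses.
    This is an illustrative reading of the abstract, not the authors' code.
    """
    # Compounds used by every group.
    common = set.intersection(*groups.values())
    unique = {}
    for name, compounds in groups.items():
        # Union of the compounds used by all *other* groups.
        others = set().union(*(c for n, c in groups.items() if n != name))
        unique[name] = compounds - others
    return common, unique


# Toy input with made-up identifiers (hypothetical, for illustration only).
toy_groups = {
    "group1": {"c1", "c2", "c3"},
    "group2": {"c1", "c3", "c4"},
    "group3": {"c1", "c5"},
}
common, unique = common_and_unique(toy_groups)
```

Pairwise comparisons between two organism groups follow the same pattern by passing only the two groups of interest.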
https://doi.org/10.1142/9781860949920_0030
Metabolic networks represent a set of reactions and associated metabolites that may occur in a given cell or tissue. They are frequently reconstructed from pure genomic data without thorough biochemical validation. Such genome-scale metabolic networks may thus either lack relevant reactions and metabolites or contain non-existent ones. Filling gaps and removing falsely predicted reactions can be a cumbersome procedure. On the other hand, when using the network to build mathematical models addressing a specific problem (e.g. analyzing changes in the level of cellular ATP at substrate depletion), it may turn out that the network comprises more reactions and metabolites than actually needed or, on the contrary, that essential reactions are missing. Therefore, we propose a method to prune the whole network to a smaller sub-network which contains no dead ends and no blocked reactions, i.e. reactions that can proceed in neither the forward nor the backward direction. Inspection of this reduced network reveals its actual functional capabilities in terms of producible metabolites. We apply our method to a genome-scale metabolic network of E. coli. Depending on the choice of the exchangeable metabolites, the composition of the external medium, and the type of thermodynamic constraints, we obtain different reduced network variants that may serve as a basis for flux balance models.
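To illustrate what removing dead ends means, the sketch below applies a purely topological simplification: it assumes every reaction is irreversible and repeatedly drops reactions that touch a non-exchangeable metabolite which is only produced or only consumed. This is not the authors' method, which also accounts for the external medium and thermodynamic constraints; the function name `prune_dead_ends` and the toy network are hypothetical.

```python
def prune_dead_ends(reactions, exchangeable):
    """Iteratively remove reactions that involve a dead-end metabolite.

    reactions: dict mapping a reaction name to (substrates, products),
               each an iterable of metabolite names; all reactions are
               treated as irreversible (a simplifying assumption).
    exchangeable: metabolites that may cross the system boundary and are
                  therefore never counted as dead ends.
    """
    active = dict(reactions)
    while True:
        produced, consumed = set(), set()
        for subs, prods in active.values():
            consumed |= set(subs)
            produced |= set(prods)
        # Dead end: produced but never consumed, or consumed but never produced.
        dead = (produced ^ consumed) - set(exchangeable)
        blocked = [name for name, (subs, prods) in active.items()
                   if dead & (set(subs) | set(prods))]
        if not blocked:
            return active
        for name in blocked:
            del active[name]


# Toy network (hypothetical): the side branch B -> X -> Y ends in a dead end.
toy_reactions = {
    "r1": (["A_ext"], ["B"]),
    "r2": (["B"], ["C"]),
    "r3": (["C"], ["D_ext"]),
    "r4": (["B"], ["X"]),
    "r5": (["X"], ["Y"]),
}
pruned = prune_dead_ends(toy_reactions, exchangeable={"A_ext", "D_ext"})
```

In the toy network, removing `r5` turns `X` into a new dead end, so `r4` is removed in the next pass; this is why the procedure has to iterate until nothing changes.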
https://doi.org/10.1142/9781860949920_0031
Cooperation between organisms of different species is a widely observed phenomenon in biology, ranging from large-scale systems such as whole ecosystems to more direct interactions like symbiotic relationships. In the present work, we explore interspecies cooperation on the level of metabolic networks.
For our analysis, we extract 447 organism-specific metabolic networks from the KEGG database [7] and assess their biosynthetic capabilities by applying the method of network expansion [5]. We simulate the cooperation of two organisms by unifying their metabolic networks and introduce a measure, the gain Γ, quantifying the amount by which the biosynthetic capability of an organism is enhanced due to the cooperation with another species. For all theoretically possible pairs of organisms, this synergetic effect is determined, and we systematically analyze its dependence on the dissimilarities of the interacting partners. We describe these dissimilarities by two different distance measures, one based on structural differences and the other on evolutionary differences.
With the presented method, we provide a conceptual framework to study the metabolic effects resulting from an interaction of different species. We outline possible enhancements of our analysis: by defining more realistic interacting networks and applying alternative structural investigation methods, our concept can be used to study specific symbiotic and parasitic relationships and may help to understand the global interplay of metabolic pathways across the boundaries of organism-specific systems.
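The idea of network expansion and of a gain from unifying two networks can be sketched as follows. Network expansion starts from a set of seed compounds and repeatedly adds the products of every reaction whose substrates are all available. The exact definition of Γ is not given in this abstract, so the `gain` function below uses an assumed, illustrative definition: the number of compounds producible by the unified network that neither network produces alone. Reactions are treated as irreversible, and all names and toy data are hypothetical.

```python
def scope(reactions, seed):
    """Network expansion: all compounds reachable from the seed compounds.

    reactions: list of (substrates, products) pairs of frozensets,
               treated as irreversible (a simplifying assumption).
    """
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can fire once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gain(net_a, net_b, seed):
    """Assumed illustrative gain: compounds only the unified network can make."""
    joint = scope(net_a + net_b, seed)
    separate = scope(net_a, seed) | scope(net_b, seed)
    return len(joint - separate)


# Toy networks (hypothetical): organism b supplies the step A -> B that a lacks.
net_a = [(frozenset({"S"}), frozenset({"A"})),
         (frozenset({"B"}), frozenset({"C"}))]
net_b = [(frozenset({"A"}), frozenset({"B"}))]
seed = {"S"}
toy_gain = gain(net_a, net_b, seed)
```

In this toy case neither organism alone can produce `B` or `C` from the seed, but the unified network can, which is the kind of synergetic effect the gain is meant to capture.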
https://doi.org/10.1142/9781860949920_bmatter
AUTHOR INDEX.