This volume contains papers presented at the 20th International Conference on Genome Informatics (GIW 2009) held at the Pacifico Yokohama, Japan from December 14 to 16, 2009. The GIW Series provides an international forum for the presentation and discussion of original research papers on all aspects of bioinformatics, computational biology and systems biology. Its scope includes biological sequence analysis, protein structure prediction, genetic regulatory networks, bioinformatic algorithms, comparative genomics, and biomolecular data integration and analysis. Boasting a history of 20 years, GIW is the longest-running international bioinformatics conference.
A total of 18 contributed papers were selected for presentation at GIW 2009 and for inclusion in this book. In addition, this book contains abstracts from the five invited speakers: Sean Eddy (HHMI's Janelia Farm, USA), Minoru Kanehisa (Kyoto University, Japan), Sang Yup Lee (KAIST, Korea), Hideyuki Okano (Keio University, Japan) and Mark Ragan (University of Queensland, Australia).
Sample Chapter(s)
Chapter 1: Predicting Protein-Protein Relationships From Literature Using Latent Topics (1,528 KB)
https://doi.org/10.1142/9781848165632_fmatter
https://doi.org/10.1142/9781848165632_0001
This paper investigates the application of statistical topic models to extracting and predicting relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA), is promising for this purpose; however, it has not been investigated for such a task. In this paper, we apply the state-of-the-art Collapsed Variational Bayesian inference and Gibbs sampling to estimate the LDA model. We also apply probabilistic Latent Semantic Analysis (pLSA) as a baseline for comparison, and compare the methods in terms of log-likelihood, classification accuracy, and retrieval effectiveness. We demonstrate through experiments that Collapsed Variational LDA gives better results than the others, especially in terms of classification accuracy and retrieval effectiveness, in the task of protein-protein relationship prediction.
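As a concrete illustration of the kind of inference the paper compares (not the authors' code), here is a minimal collapsed Gibbs sampler for LDA over toy word-id documents; all names, priors, and iteration counts are assumptions made for this sketch:

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on tokenized documents.

    docs: list of lists of integer word ids.
    Returns per-document topic counts. A toy illustration only.
    """
    rng = random.Random(seed)
    vocab = 1 + max(w for d in docs for w in d)
    ndk = [[0] * n_topics for _ in docs]           # doc-topic counts
    nkw = [[0] * vocab for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                            # topic totals
    z = []                                         # topic of each token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z = t | all other assignments)
                ps = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                      / (nk[t] + vocab * beta) for t in range(n_topics)]
                r = rng.random() * sum(ps)
                acc = 0.0
                for t, p in enumerate(ps):
                    acc += p
                    if r <= acc:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk
```

The returned document-topic counts play the role of the latent-topic features from which entity relationships would be predicted.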
https://doi.org/10.1142/9781848165632_0002
We calculated the intramolecular interaction energies of DNA by threading DNA sequences around crystal structures of nucleosomes. The intramolecular energy oscillated with a period of ~10 bp for dinucleotides, in agreement with previous nucleosome models. The intramolecular energy calculated along the yeast genome correlated positively with experimentally measured nucleosome positioning.
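The threading idea can be sketched as sliding a nucleosome-sized window along a genome and summing position-dependent dinucleotide terms; the weights and the ~10-bp phase function below are illustrative assumptions, not the paper's fitted energy model:

```python
import math

PERIOD = 10.0  # approximate helical repeat, in bp

def dinucleotide_energy(window, offset):
    """Illustrative bending-energy term for one dinucleotide step.

    Assumed rule: flexible AA/TT/TA/AT steps are favored in phase with
    the helical repeat, stiff G/C steps out of phase.
    """
    step = window[offset:offset + 2]
    phase = math.cos(2 * math.pi * offset / PERIOD)
    if step in ("AA", "TT", "TA", "AT"):
        return -phase
    if step in ("GC", "CG", "GG", "CC"):
        return phase
    return 0.0

def thread_energy(genome, window=147):
    """Total threading energy for each window start along the sequence."""
    return [sum(dinucleotide_energy(genome[s:s + window], i)
                for i in range(window - 1))
            for s in range(len(genome) - window + 1)]
```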
https://doi.org/10.1142/9781848165632_0003
Biological experiments are usually set up in technical replicates (duplicates or triplicates) in order to ensure reproducibility and to assess any significant error introduced during the experimental process. The first step in biological data analysis is to check the technical replicates and to confirm that the measurement error is small enough to be of no concern. However, little attention has been paid to this part of the analysis. Here, we propose a general process to estimate the measurement error and, consequently, to provide an interpretable and objective way to ensure the quality of technical replicates. In particular, we illustrate its application to a DNA microarray dataset set up in technical duplicates.
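For technical duplicates, a standard estimator of the per-measurement error follows from Var(x1 - x2) = 2*sigma^2 when both replicates share the same true value; a minimal sketch (not the authors' exact procedure):

```python
import math

def measurement_error_sd(duplicates):
    """Estimate the per-measurement error SD from technical duplicates.

    duplicates: list of (x1, x2) pairs for the same sample.
    If x1 = t + e1 and x2 = t + e2 with independent errors of
    variance s^2, then Var(x1 - x2) = 2 * s^2.
    """
    diffs = [a - b for a, b in duplicates]
    n = len(diffs)
    mean = sum(diffs) / n
    var_diff = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return math.sqrt(var_diff / 2.0)
```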
https://doi.org/10.1142/9781848165632_0004
We have systematically analyzed various topological patterns comprising 1, 2 or 3 nodes in the mammalian metabolic, signal transduction and transcription networks. These patterns were analyzed with regard to their frequency and statistical over-representation in each network, as well as to their topological significance for the coherence of the networks. The latter property was evaluated using the pairwise disconnectivity index, which we recently introduced to quantify how critical network components are for the internal connectedness of a network. The 1-node pattern made up by a vertex with a self-loop was found to exhibit particular properties in all three networks. In general, vertices with a self-loop tend to be topologically more important than other vertices. Moreover, self-loops have been found to be attached to most 2-node and 3-node patterns, thereby emphasizing a particular role of self-loop components in the architectural organization of the networks. For none of the networks could a positive correlation between the mean topological significance and the Z-score of a pattern be observed. That is, in general, motifs are not per se more important for the overall network coherence than patterns that are not over-represented. All 2- and 3-node patterns that are over-represented, and thus qualify as motifs, in all three networks exhibit a loop structure. This intriguing observation can be viewed as an advantage of loop-like structures in building up the regulatory circuits of the whole cell. The transcription network differs from the other networks in that (i) self-loops play an even greater role, (ii) its binary loops are highly enriched with attached self-loops, and (iii) feedback loops are not over-represented. Metabolic networks reveal some particular topological properties, which may reflect the fact that metabolic paths are, to a large extent, reversible.
Interestingly, some of the most important 3-node patterns of both the transcription and the signaling network can be concatenated to subnetworks comprising many genes that play a particular role in the regulation of cell proliferation.
https://doi.org/10.1142/9781848165632_0005
The varDB project (http://www.vardb.org) aims to create and maintain a curated database of antigenic variation sequences as well as a platform for online sequence analysis. Along with the evolution of drug resistance, antigenic variation presents a moving target for public health endeavors and greatly complicates vaccination and eradication efforts. However, careful analysis of a large number of variant forms may reveal structural and functional constraints that can be exploited to identify stable and cross-reactive targets. VarDB attempts to facilitate this effort by providing streamlined interfaces to standard tools to help identify and prepare sequences for various forms of analysis. We have newly implemented such tools for codon usage, selection, recombination, secondary and tertiary structure, and sequence diversity analysis. Just as the adaptive immune system encodes a mechanism for dynamically generating diverse receptors instead of encoding a receptor for every possible epitope, many pathogens take advantage of heritable diversity generating mechanisms to produce progeny able to evade immune recognition. Instead of merely cataloging the observed variation, a major goal of varDB is to characterize and predict the potential range of antigenic variation within a pathogen by investigating the mechanisms by which it attempts to expand its implicit genome. We believe that the new sequence analysis tools will improve the usefulness and range of varDB.
https://doi.org/10.1142/9781848165632_0006
We introduce a new data structure, a localized suffix array, in which occurrence information is dynamically represented as a combination of global positional information and local lexicographic order information in text search applications. For the search of a pair of words within a given distance, many candidate positions that share a coarse-grained global position can be compactly represented in terms of local lexicographic orders, as in the conventional suffix array, and they can be simultaneously examined for violation of the distance constraint at the coarse-grained resolution. The trade-off between the positional and lexicographical information is progressively shifted towards finer positional resolution, and the distance constraint is reexamined accordingly. Thus the paired search can be performed efficiently even if there are a large number of occurrences for each word. The localized suffix array itself is in fact a reordering of bits inside the conventional suffix array, and their memory requirements are essentially the same. We demonstrate an application to genome mapping problems for paired-end short reads generated by new-generation DNA sequencers. When paired reads are highly repetitive, it is time-consuming to naïvely calculate, sort, and compare all of the coordinates. For human genome re-sequencing data with reads of 36 base pairs, speedups of more than 10 times over the naïve method were observed in almost half of the cases where the sums of redundancies (numbers of individual occurrences) of paired reads were greater than 2,000.
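For contrast, the coordinate-based baseline that the localized suffix array is designed to outperform can be sketched as follows: given sorted occurrence lists for the two reads, enumerate all pairs within the distance bound (function and parameter names are ours):

```python
from bisect import bisect_left, bisect_right

def pairs_within(pos_a, pos_b, max_dist):
    """All (a, b) with |a - b| <= max_dist, given sorted position lists.

    This is the straightforward coordinate search that becomes slow
    when both reads are highly repetitive (many occurrences each).
    """
    out = []
    for a in pos_a:
        lo = bisect_left(pos_b, a - max_dist)
        hi = bisect_right(pos_b, a + max_dist)
        out.extend((a, b) for b in pos_b[lo:hi])
    return out
```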
https://doi.org/10.1142/9781848165632_0007
Activities of prokaryotes are pivotal in shaping the environment, and are also greatly influenced by the environment. With the substantial progress in genome and metagenome sequencing and the about-to-be-standardized ecological context information, environment-centric comparative genomics will complement species-centric comparative genomics, illuminating how environments have shaped and maintained prokaryotic diversities. In this paper we report our preliminary studies on the association analysis of a particular duo of genomic and ecological traits of prokaryotes: gene-gene functional association patterns vs. oxygen requirement conditions. We first establish a stochastic model to describe gene arrangements on chromosomes, based on which the functional association between genes is quantified. The gene-gene functional association measures are validated using biological process ontology and KEGG pathway annotations. Student's t-tests are then performed on the aerobic and anaerobic organisms to identify those gene pairs that exhibit different functional association patterns in the two different oxygen requirement conditions. As it is difficult to design and conduct biological experiments to validate those genome-environment association relationships that have resulted from long-term accumulative genome-environment interactions, we finally conduct computational validations to determine whether the oxygen requirement condition of an organism is predictable from gene-gene functional association patterns. The reported study demonstrates the existence and significance of the association relationships between certain gene-gene functional association patterns and oxygen requirement conditions of prokaryotes, as well as the effectiveness of the adopted methodology for such association analysis.
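The comparison step can be illustrated with a pooled two-sample Student's t statistic over the association scores of one gene pair in aerobic vs. anaerobic organisms (a generic sketch, not the paper's pipeline):

```python
import math

def student_t(xs, ys):
    """Pooled two-sample Student's t statistic.

    xs, ys: association scores of a gene pair in two groups of
    organisms (e.g. aerobic vs. anaerobic).
    """
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
```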
https://doi.org/10.1142/9781848165632_0008
We compared the protein-ligand binding free energies (∆G) obtained by the explicit water model, the MM-GB/SA (molecular-mechanics generalized Born surface area) model, and the docking scoring function. The free energies from the explicit water model and the MM-GB/SA model were calculated by the previously developed Smooth Reaction Path Generation (SRPG) method. In the SRPG method, a smooth reaction path is generated by linking two coordinates, one a bound state and the other an unbound state. The free energy surface along the path was calculated by a molecular dynamics (MD) simulation, and the binding free energy was estimated from the free energy surface. We applied these methods to the streptavidin-biotin system. The ∆G value from the explicit water model was close to the experimental value. The ∆G value from the MM-GB/SA model was overestimated, and that from the scoring function was underestimated. The free energy surface from the explicit water model was close to that from the GB/SA model around the bound state (distances of < 6 Å), but a discrepancy appears at distances of > 6 Å. Thus, the difference in the long-range Coulomb interaction is a likely cause of the error in ∆G. The scoring function cannot take into account the entropy change of the protein, so its error in ∆G could depend on the target protein.
https://doi.org/10.1142/9781848165632_0009
A newly identified family of NAD-dependent D-2-hydroxyacid dehydrogenases (D-2-HydDHs) catalyzes the stereo-specific reduction of branched-chain 2-keto acids with bulky hydrophobic side chains to 2-hydroxyacids. They are promising targets for industrial/practical applications, particularly in the stereo-specific synthesis of C3-branched D-hydroxyacids. Comparative modeling and docking studies have been performed to build models of the enzyme-cofactor-substrate complexes and identify key residues for cofactor and substrate recognition. To explore large conformational transitions (domain motions), a normal mode analysis was employed using a simple potential and the protein models. Our analysis suggests that the new D-2-HydDH family members possess the N-terminal NAD(H) binding Rossmann-fold domain and the α-helical C-terminal substrate binding domain. A hinge bending motion between the N- and C-terminal domains was predicted, which would trigger the switch of the conserved essential Lys to form a key hydrogen bond with the C2 ketone of the 2-keto acid substrates. Our findings will be useful for site-directed mutagenesis studies and protein engineering.
https://doi.org/10.1142/9781848165632_0010
Local sequence-structure relationships in the loop regions of proteins were comprehensively estimated using simple prediction tools based on support vector regression (SVR). End-to-end distance was selected as a rough structural property of fragments, and the end-to-end distances of an enormous number of loop fragments from a wide variety of protein folds were directly predicted from sequence information by using SVR. We found that our method was more accurate than random prediction for fragments comprising 5, 9, and 17 amino acids; moreover, extended loop fragments could be successfully distinguished from turn structures on the basis of their sequences, which implies that the sequence-structure relationships are significant for loop fragments with a wide range of end-to-end distances. These results suggest that many loop regions, as well as helices and strands, restrict the conformational space of the entire tertiary structure of proteins to some extent; moreover, our findings shed light on the mechanism of protein folding and the prediction of the tertiary structure of proteins without using structural templates.
https://doi.org/10.1142/9781848165632_0011
In scientific fields such as systems biology, the relationships between network members (vertices) are evaluated using a network structure. In a co-expression network, comprising genes (vertices) and gene-to-gene links (edges) representing co-expression relationships, local modular structures with tight intra-modular connections include genes that are co-expressed with each other. To detect such modules within the whole network, an approach that evaluates the network topology between modules, as well as the intra-modular network topology, is useful. We therefore combined a novel inter-modular index with network density, the representative intra-modular index, instead of using network density alone. We designed an algorithm to optimize the combined index for a module and applied it to Arabidopsis co-expression analysis. To verify the relation between the modules obtained using our algorithm and biological knowledge, we compared it with other tools for co-expression network analysis using the KEGG pathways, showing that our algorithm detected network modules with better associations with the pathways. It is also applicable to large datasets of gene expression profiles, which are difficult to analyze all at once.
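The two kinds of indices can be illustrated as follows: `density` is the standard intra-modular density, while `separability` is our own stand-in for an inter-modular index, not the paper's exact definition:

```python
def density(module, edges):
    """Intra-modular network density: observed / possible edges."""
    nodes = set(module)
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside / (n * (n - 1) / 2)

def separability(module, edges):
    """Illustrative inter-modular index (an assumption, not the paper's):
    fraction of the module's incident edges that stay inside it."""
    nodes = set(module)
    inside = boundary = 0
    for u, v in edges:
        if u in nodes and v in nodes:
            inside += 1
        elif u in nodes or v in nodes:
            boundary += 1
    total = inside + boundary
    return inside / total if total else 0.0
```

A module-detection algorithm would then grow or shrink candidate modules to optimize some combination of the two scores.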
https://doi.org/10.1142/9781848165632_0012
We recently proposed novel kernel functions, called base-pairing profile local alignment (BPLA) kernels, for discrimination and detection of functional RNA sequences using SVMs. We employ STRAL's scoring function, which takes into account sequence similarities as well as upstream and downstream base-pairing probabilities, enabling us to model the secondary structures of RNA sequences. In this paper, we develop a method for optimizing the hyperparameters of BPLA kernels with respect to discrimination accuracy using a gradient-based optimization technique. Our experiments show that the proposed method can find a nearly optimal set of parameters much faster than a grid search over all parameter combinations.
https://doi.org/10.1142/9781848165632_0013
Efficient execution of data-intensive workflows plays an increasingly important role in bioinformatics as the amount of data grows rapidly. The execution of such workflows must take into account the volume and pattern of communication. When orchestrating data-centric workflows, a centralized workflow engine can become a performance bottleneck. To cope with this bottleneck, a hybrid approach with choreography for the data management of workflows has been proposed. However, when a workflow includes many repetitive operations, that approach may not perform well because of the overheads of its additional mechanism. This paper presents and evaluates an improvement of the hybrid approach for managing large amounts of data. The performance of the proposed method is demonstrated by measuring the execution times of example workflows.
https://doi.org/10.1142/9781848165632_0014
An accurate scoring function is required for protein structure prediction. The scoring function should distinguish the native structure among model structures (decoys), and it should also correlate with the quality of the decoys. However, in a previous study we observed a trade-off between these two requirements for seven all-atom distance-dependent potentials: the native structure could be discriminated by examining fine atomic details, whereas the correlation could be improved by examining coarse-grained interactions. To overcome this problem, in this study we constructed an improved scoring function by combining the seven potentials. First, the seven potentials were normalized by the expected energy values of the native and reference states of the target protein. Second, the relationship between the seven normalized energies and the quality (GDT_TS) of the structure was learned using support vector regression, with the decoy sets of CASP6 as the training set. The meta-score was then obtained as the predicted GDT_TS and was tested on the decoys of the CASP7 experiment. The meta-score showed improvement in correlation with GDT_TS and in the Z-score of the native structure. It also showed performance comparable to the best component potentials on the ∆GDT and enrichment criteria. The meta-score can also be used as an absolute quality measure of the structures. Our study suggests the benefit of combining several different scoring functions for model evaluation.
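The normalization-and-combination scheme can be sketched with a linear model standing in for the support vector regression; the weights and reference statistics would be fitted on CASP6 decoys, so all values here are placeholders:

```python
def z_normalize(energy, ref_mean, ref_sd):
    """Normalize a raw potential energy against a reference state."""
    return (energy - ref_mean) / ref_sd

def meta_score(energies, refs, weights, bias=0.0):
    """Linear meta-score over normalized component potentials.

    energies: raw values of the component potentials for one decoy.
    refs: (mean, sd) reference statistics per potential.
    A linear stand-in for the SVR used in the paper; weights and bias
    would be learned so that the output predicts GDT_TS.
    """
    feats = [z_normalize(e, m, s) for e, (m, s) in zip(energies, refs)]
    return bias + sum(w * f for w, f in zip(weights, feats))
```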
https://doi.org/10.1142/9781848165632_0015
Protein complexes are responsible for most of the vital biological processes within the cell. Understanding the machinery behind these biological processes requires the detection and analysis of complexes and their constituent proteins. A wealth of computational approaches to the detection of complexes deal with clustering of protein-protein interaction (PPI) networks. Among these clustering approaches, the Markov Clustering (MCL) algorithm has proved to be reasonably successful, mainly due to its scalability and robustness. However, MCL produces many noisy clusters, which either do not represent any known complexes or contain additional proteins (noise) that reduce the accuracies of correctly predicted complexes. Consequently, the accuracies of these clusters when matched with known complexes are quite low. Refining these clusters to improve the accuracy requires a deeper understanding of the organization of complexes. Recently, experiments on yeast by Gavin et al. (2006) revealed that proteins within a complex are organized in two parts: core and attachment. Based on these insights, we propose our method (MCL-CA), which couples core-attachment based refinement steps to refine the clusters produced by MCL. We evaluated the effectiveness of our approach on two different datasets and compared the quality of our predicted complexes with that produced by MCL. The results show that our approach significantly improves the accuracies of predicted complexes when matched with known complexes. A direct result of this is that MCL-CA is able to cover a larger number of known complexes than MCL. Further, we compare our method with two recently proposed methods, CORE and COACH, which also capitalize on the core-attachment structure. We also discuss several instances showing that our predicted complexes clearly adhere to the core-attachment structure revealed by Gavin et al.
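A minimal sketch of core-attachment refinement of one cluster, assuming a simple degree-based core criterion (the threshold rule is an illustration, not MCL-CA's actual procedure):

```python
def refine_core_attachment(cluster, edges, core_frac=0.5):
    """Split a cluster into a densely connected core plus attachments.

    In the spirit of Gavin et al.'s core-attachment organization:
    vertices with high intra-cluster degree form the core, the rest
    become attachments. core_frac is an illustrative threshold.
    """
    nodes = set(cluster)
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            deg[u] += 1
            deg[v] += 1
    if not nodes:
        return set(), set()
    max_deg = max(deg.values())
    core = {v for v in nodes if deg[v] >= core_frac * max_deg}
    return core, nodes - core
```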
https://doi.org/10.1142/9781848165632_0016
Nucleosome configuration in eukaryotic genomes is an important clue for clarifying the mechanisms regulating various nuclear events. In the past few years, numerous computational tools have been developed for the prediction of nucleosome positioning, but there has been no third-party benchmark of their performance. Here we present a performance evaluation using genome-scale in vivo nucleosome maps of two vertebrates and three invertebrates. In our measurements, two recently updated versions of Segal's model and Gupta's SVM with the RBF kernel, which was not implemented originally, showed higher prediction accuracy, although their performances differ significantly in the prediction of medaka fish and Candida yeast. The cross-species prediction results using Gupta's SVM also suggested rather specific characteristics of nucleosomal DNAs in medaka and budding yeast. Through analyses of the over- and under-representation of DNA oligomers, we found both general and species-specific motifs in nucleosomal and linker DNAs. The only oligomers commonly enriched in all five eukaryotes were CA/TG and AC/GT. Thus, to achieve relatively high performance for a species, it is desirable to prepare the training data from the same species.
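Over-/under-representation of oligomers can be measured as a log-ratio of k-mer frequencies in nucleosomal versus linker DNA; a minimal sketch with a pseudocount (the paper's actual statistics are not reproduced here):

```python
import math
from collections import Counter

def oligomer_enrichment(nucleosomal, linker, k=2):
    """Log2 over/under-representation of k-mers in nucleosomal DNA
    relative to linker DNA, with a pseudocount of 1 to avoid log(0).

    nucleosomal, linker: lists of DNA strings.
    """
    def kmer_freq(seqs):
        counts = Counter()
        for s in seqs:
            for i in range(len(s) - k + 1):
                counts[s[i:i + k]] += 1
        return counts, sum(counts.values())

    nc, nt = kmer_freq(nucleosomal)
    lc, lt = kmer_freq(linker)
    kmers = set(nc) | set(lc)
    return {m: math.log2(((nc[m] + 1) / (nt + 1)) / ((lc[m] + 1) / (lt + 1)))
            for m in kmers}
```

Positive values mark k-mers enriched in nucleosomal DNA, negative values those enriched in linkers.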
https://doi.org/10.1142/9781848165632_0017
We present a method for the classification of cancer based on gene expression profiles using single genes. We select the genes with high class-discrimination capability according to their dependency degree with respect to the classes. We then build classifiers based on the decision rules induced by the selected single genes. We test our single-gene classification method on three publicly available cancer gene expression datasets. In a majority of cases, we obtain relatively accurate classification outcomes by utilizing just one gene. Some genes highly correlated with the pathogenesis of cancer are identified. Our feature selection and classification approaches are both based on rough sets, a machine learning method. In comparison with other methods, our method is simple, effective and robust. We conclude that, if gene selection is implemented reasonably, accurate molecular classification of cancer can be achieved with very simple predictive models based on gene expression profiles.
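In rough set theory, the dependency degree of the class attribute on a single gene is the fraction of samples whose (discretized) expression value determines the class unambiguously; a minimal sketch of that computation:

```python
from collections import defaultdict

def dependency_degree(values, classes):
    """Rough-set dependency degree of the class on one attribute.

    values: discretized expression level of a single gene per sample.
    classes: class label per sample.
    Returns |POS| / |U|: the fraction of samples whose value-based
    equivalence class is consistent in its class labels.
    """
    labels_by_value = defaultdict(set)
    for v, c in zip(values, classes):
        labels_by_value[v].add(c)
    pure = sum(1 for v in values if len(labels_by_value[v]) == 1)
    return pure / len(values)
```

Genes with a high dependency degree would be the ones selected as single-gene classifiers.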
https://doi.org/10.1142/9781848165632_0018
Next-generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately, these technologies also have a non-negligible sequencing error rate, which biases their outputs by introducing false reads and reducing the quantity of the real reads. Although methods developed for SAGE data can reduce these false counts to a considerable degree, until now they have not been implemented in a scalable way. Recently, a program named FREC was developed to address this problem for next-generation sequencing data.
In this paper, we introduce RECOUNT, our implementation of an expectation-maximization algorithm for tag count correction, and compare it to FREC. Using both the reference genome and simulated data, we find that RECOUNT performs as well as or better than FREC, while using much less memory (e.g. 5 GB vs. 75 GB). Furthermore, we report the first analysis of tag count correction with real data in the context of gene expression analysis. Our results show that tag count correction not only increases the number of mappable tags, but can make a real difference in the biological interpretation of next-generation sequencing data. RECOUNT is an open-source C++ program available at http://seq.cbrc.jp/recount.
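The flavor of EM-based tag count correction can be sketched as iteratively reallocating observed counts among candidate true tags under an error model; RECOUNT's actual model is built from sequencing quality values, so everything below is a simplified assumption:

```python
def em_correct(observed, error_model, n_iter=100):
    """EM estimate of true tag counts from error-contaminated counts.

    observed: dict obs_tag -> observed count.
    error_model: dict (true_tag, obs_tag) -> P(obs | true); for each
    true tag the probabilities over observed tags should sum to 1.
    Illustrative only (a Richardson-Lucy-style update).
    """
    true_tags = sorted({t for t, _ in error_model})
    total = sum(observed.values())
    est = {t: total / len(true_tags) for t in true_tags}  # uniform start
    for _ in range(n_iter):
        new = {t: 0.0 for t in true_tags}
        for o, y in observed.items():
            denom = sum(est[t] * error_model.get((t, o), 0.0)
                        for t in true_tags)
            if denom == 0:
                continue
            for t in true_tags:
                # E-step responsibility times observed count
                new[t] += y * est[t] * error_model.get((t, o), 0.0) / denom
        est = new
    return est
```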
https://doi.org/10.1142/9781848165632_0019
Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
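The difference between scoring a single optimal alignment and summing over alignment uncertainty can be seen on a toy HMM: the Forward log-probability is always at least the Viterbi log-probability, because it sums over all paths rather than taking the best one (a generic illustration, not HMMER3 code):

```python
import math

def logsumexp(xs):
    """Numerically stable log of a sum of exponentials."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_and_viterbi(obs, states, log_init, log_trans, log_emit):
    """Forward (sum over all paths) and Viterbi (best single path)
    log-probabilities of an observation sequence under a toy HMM."""
    f = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    v = dict(f)
    for o in obs[1:]:
        f = {s: logsumexp([f[r] + log_trans[r][s] for r in states])
                + log_emit[s][o] for s in states}
        v = {s: max(v[r] + log_trans[r][s] for r in states)
                + log_emit[s][o] for s in states}
    return logsumexp(list(f.values())), max(v.values())
```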
https://doi.org/10.1142/9781848165632_0020
Twenty years ago the Human Genome Project was initiated, aiming to uncover the genetic factors of human diseases and to develop new strategies for diagnosis, treatment, and prevention. Despite the successful sequencing of the human genome and the discovery of many disease-related genes, our understanding of molecular mechanisms is still largely incomplete for the majority of diseases. In the KEGG database project we have been organizing our knowledge on cellular functions and organism behaviors in computable forms, especially in the forms of molecular networks (KEGG pathway maps) and hierarchical lists (BRITE functional hierarchies). The computerized knowledge has been widely used as a reference for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies. Our efforts are now focused on human diseases and drugs. We consider diseases as perturbed states of the molecular system that operates the cell and the organism, and drugs as perturbants to the molecular system. Since the existing disease databases are mostly for humans to read and understand, we are developing a more computable disease information resource in which our knowledge on diseases is represented as molecular networks or gene/molecule lists. When the detail of the molecular system is relatively well characterized, we use the molecular network representation and draw KEGG pathway maps. The Human Diseases category of the KEGG PATHWAY database contains about 40 pathway maps for cancers, immune disorders, neurodegenerative diseases, etc. When the detail is not known but disease genes are identified, we use the gene/molecule list representation and create a KEGG DISEASE entry. The entry contains a list of known disease genes and other relevant molecules, including environmental factors, diagnostic markers, and therapeutic drugs. The list simply defines membership in the underlying molecular system, but is still useful for computational analysis.
In the KEGG DRUG database we capture knowledge on two types of molecular networks. One is the interaction network of drugs with target molecules, metabolizing enzymes, transporters, other drugs, and the pathways involving all these molecules. The other is the chemical structure transformation network representing the biosynthetic pathways of natural products in various organisms, as well as the history of drug development, in which drug structures have been continuously modified by medicinal chemists. KEGG DRUG contains chemical structures and/or chemical components of all prescription and OTC drugs in Japan, including crude drugs and TCM (Traditional Chinese Medicine) formulas, as well as most prescription drugs in the USA and many prescription drugs in Europe. I will report on our strategy to analyze the chemical architecture of natural products derived from enzymatic reactions (and enzyme genes) and the chemical architecture of marketed drugs derived from human-made organic reactions in the history of drug development, towards drug discovery from the genomes of plants and microorganisms.
Note from Publisher: This article contains the abstract only.
https://doi.org/10.1142/9781848165632_0021
Systems biology has been changing the way biological and biotechnological research is performed. Now, systems biological approaches can be taken to develop bioprocesses for the production of valuable drugs, commodity and fine chemicals, fuels, and polymers and other materials; this is termed systems biotechnology. Here I present general strategies for systems biotechnology and several examples of applying systems biotechnological strategies to the development of bioprocesses for the efficient production of chemicals and materials. This also includes a strategy of systems metabolic engineering for the development of microbial strains. It is expected that systems biotechnology will be at the heart of successful industrial biotechnology towards low-carbon, green growth of the world.
https://doi.org/10.1142/9781848165632_0022
Induced pluripotent stem (iPS) cells are pluripotent stem cells directly reprogrammed from cultured mouse fibroblasts by introducing Oct3/4, Sox2, c-Myc, and Klf4. Cells obtained using this technology, which allows the ethical issues and immunological rejection associated with embryonic stem (ES) cells to be avoided, might be a clinically useful source for cell replacement therapies. Here we demonstrate that murine iPS cells formed neurospheres that produced electrophysiologically functional neurons, astrocytes, and oligodendrocytes. Secondary neurospheres (SNSs) generated from various mouse iPS cells showed neural differentiation capacity and teratoma formation after transplantation into the brains of immunodeficient NOD/SCID mice. We found that the origin (source of somatic cells) of the iPS cells is the crucial determinant of the potential tumorigenicity of iPS-derived neural stem/progenitor cells, and that their tumorigenicity results from the persistent presence of undifferentiated cells within the SNSs. Furthermore, transplantation of non-tumorigenic Nanog-iPS-derived SNSs into a mouse spinal cord injury (SCI) model promoted recovery of locomotor function. Surprisingly, SNSs derived from c-Myc-free iPS cells generated without drug selection showed robust tumorigenesis, in spite of their potential to contribute to adult chimeric mice without tumor formation.
https://doi.org/10.1142/9781848165632_0023
Perhaps the most surprising discovery of the genome era has been the extent to which prokaryotic and many eukaryotic genomes incorporate genetic material from sources other than their parent(s). Lateral genetic transfer (LGT) among bacteria was first observed about 100 years ago, and is now accepted to underlie important phenomena including the spread of antibiotic resistance and the ability to degrade xenobiotics. LGT is invoked, perhaps too readily, to explain a breadth of awkward data including compositional heterogeneity of genomes, disagreement among gene-sequence trees, and mismatch between physiology and systematics. At the same time many details of LGT remain unknown or controversial, and some key questions have scarcely been asked. Here I critically review what we think we know about the existence, extent, mechanism and impact of LGT; identify important open questions; and point to research directions that hold particular promise for elucidating the role of LGT in genome evolution.
Evidence for LGT in nature is not only inferential but also direct, and potential vectors are ubiquitous. Genetic material can pass between diverse habitats and be significantly altered during residency in viruses, complicating the inference of donors. In prokaryotes, about twice as many genes are interrupted by LGT as are transferred intact, and about 5[...]. Short protein domains can be privileged units of transfer. Unresolved phylogenetic issues include the correct null hypothesis, and genes as units of analysis. Themes are beginning to emerge regarding the effect of LGT on cellular networks, but I show why generalization is premature. LGT can associate with radical changes in physiology and ecological niche. Better quantitative models of genome evolution are needed, and theoretical frameworks remain to be developed for some observations, including chromosome assembly by LGT.
Note from Publisher: This article contains the abstract and reference only.
https://doi.org/10.1142/9781848165632_bmatter