
  • Article, No Access

    MINIMUM MOSAIC INFERENCE OF A SET OF RECOMBINANTS

    In this paper, we investigate the central problem of finding recombination events. It is commonly assumed that a present population descends from a small number of specific sequences called founders. Given two sequences of equal length, recombination generates a third sequence of the same length by concatenating a prefix of one sequence with a suffix of the other. Due to recombination, a present sequence (called a recombinant) is thus composed of blocks from the founders. A major question related to founder sequences is the so-called Minimum Mosaic problem: using the natural parsimony criterion for the number of recombinations, find the "best" founders. In this article, we prove that the Minimum Mosaic problem, given haplotype recombinants with no missing values, is NP-hard when the number of founders is given as part of the input. We also propose some exact exponential-time algorithms for the problem, which can be considered polynomial provided some extra information is given. Note that Rastas and Ukkonen proved that the Minimum Mosaic problem is NP-hard using a somewhat unrealistic mutation cost function. The aim of this paper is to provide better insight into the complexity of the problem.
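
The recombination operation and the parsimony criterion described above can be illustrated with a small sketch. It is not the paper's algorithm: it assumes the founders are already known (the easy direction of the problem) and counts the fewest founder switches needed to spell one recombinant, using a simple dynamic program. The function names `recombine` and `min_breakpoints` and the binary example sequences are hypothetical.

```python
from typing import List, Optional


def recombine(a: str, b: str, cut: int) -> str:
    """Return the prefix of `a` (first `cut` symbols) followed by the suffix of `b`."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return a[:cut] + b[cut:]


def min_breakpoints(recombinant: str, founders: List[str]) -> Optional[int]:
    """Fewest founder switches needed to spell `recombinant` as a mosaic of
    the given (non-empty) list of founders, or None if no mosaic exists."""
    inf = float("inf")
    # cost[f]: cheapest parse of the prefix read so far that ends in founder f
    cost = [0.0] * len(founders)
    for j, symbol in enumerate(recombinant):
        best_prev = min(cost)
        new_cost = []
        for f, founder in enumerate(founders):
            if founder[j] != symbol:
                new_cost.append(inf)  # founder f cannot supply this symbol
            else:
                # stay in founder f for free, or switch from the cheapest founder
                new_cost.append(min(cost[f], best_prev + 1))
        cost = new_cost
    best = min(cost)
    return None if best == inf else int(best)


founders = ["0000", "1111"]
child = recombine(founders[0], founders[1], 2)   # "0011"
print(child, min_breakpoints(child, founders))   # one switch is needed
```

The Minimum Mosaic problem itself is harder: the founders are unknown and must be chosen so that the total of these switch counts over all recombinants is as small as possible.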

  • Article, No Access

    STATISTICAL MODEL SELECTION METHOD TO ANALYZE COMBINATORIAL EFFECTS OF SNPS AND ENVIRONMENTAL FACTORS FOR BINARY DISEASE

    We propose a model selection method to estimate the relation between multiple SNPs, environmental factors and a binary disease trait. The method combines logistic regression with a genetic algorithm. The logistic regression model can capture the continuous effects of environmental factors without categorization, which would cause a loss of information. To construct an accurate prediction rule for the binary trait, we adopt Akaike's information criterion (AIC) to find the most effective set of SNPs and environmental factors; that is, the set that gives the smallest AIC is chosen as the optimal set. Since the number of combinations of SNPs and environmental factors is usually huge, we propose using the genetic algorithm to choose the optimal set in the sense of AIC. We show the effectiveness of the proposed method through the analysis of case/control populations of diabetes, Alzheimer's disease and obesity patients. We succeeded in finding an efficient set to predict types of diabetes, as well as some SNPs that interact strongly with age although they are not significant as single loci.
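
A minimal sketch of the selection loop described above, under stated assumptions: each candidate set of variables is encoded as a 0/1 mask, a genetic algorithm searches over masks, and the mask with the smallest AIC wins. The abstract does not give the authors' encoding, operators, or parameters, so everything here (`ga_select`, the one-point crossover, the mutation rate) is hypothetical. To keep the example self-contained, `toy_score` stands in for fitting a logistic regression and returns the AIC of a made-up log-likelihood in which only variables 0 and 2 matter.

```python
import random
from typing import Callable, Tuple

Mask = Tuple[int, ...]


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion, 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def toy_score(mask: Mask) -> float:
    """Hypothetical stand-in for a fitted logistic regression:
    only variables 0 and 2 improve the log-likelihood."""
    relevant = mask[0] + mask[2]
    log_lik = -50.0 + 10.0 * relevant
    return aic(log_lik, n_params=sum(mask) + 1)  # +1 for the intercept


def ga_select(n_vars: int, score: Callable[[Mask], float], pop_size: int = 20,
              generations: int = 40, p_mut: float = 0.1,
              seed: int = 0) -> Tuple[Mask, float]:
    """Search 0/1 masks of length n_vars (>= 2) for a low score."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_vars)) for _ in range(pop_size)]
    best = min(pop, key=score)
    for _ in range(generations):
        ranked = sorted(pop, key=score)
        parents = ranked[: pop_size // 2]   # keep the better half as parents
        children = [ranked[0]]              # elitism: carry over the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit independently with probability p_mut
            child = tuple(bit ^ 1 if rng.random() < p_mut else bit for bit in child)
            children.append(child)
        pop = children
        best = min(pop + [best], key=score)
    return best, score(best)


best_mask, best_aic = ga_select(6, toy_score)
print(best_mask, best_aic)
```

In a real analysis the score function would fit a logistic regression on the selected SNPs and environmental factors and return its AIC; the genetic algorithm is a heuristic and is not guaranteed to reach the global minimum.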

  • Chapter, Open Access

    CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs

    Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
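
To make the locus-size feature concrete, here is a rough sketch. The abstract does not specify the locus partitioning procedure, so this example assumes a simple rule: SNPs on the same chromosome belong to one locus when adjacent positions are no more than `max_gap` bases apart. The function name, the gap threshold, and the SNP identifiers and coordinates are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def locus_sizes(snps: Dict[str, Tuple[str, int]], max_gap: int) -> Dict[str, int]:
    """Map each SNP id to the number of SNPs in its positional cluster."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for snp_id, (chrom, pos) in snps.items():
        by_chrom[chrom].append((pos, snp_id))

    sizes: Dict[str, int] = {}
    for members in by_chrom.values():
        members.sort()                      # order SNPs by position
        cluster = [members[0][1]]
        for (prev_pos, _), (pos, snp_id) in zip(members, members[1:]):
            if pos - prev_pos > max_gap:    # gap too large: close the cluster
                for s in cluster:
                    sizes[s] = len(cluster)
                cluster = []
            cluster.append(snp_id)
        for s in cluster:                   # close the last cluster
            sizes[s] = len(cluster)
    return sizes


example = {
    "snpA": ("chr1", 100),
    "snpB": ("chr1", 150),
    "snpC": ("chr1", 5000),
    "snpD": ("chr2", 120),
}
print(locus_sizes(example, max_gap=1000))
```

Each SNP then carries its cluster size as one numeric feature alongside the other inputs to the classifier.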

  • Chapter, Open Access

    Convergent downstream candidate mechanisms of independent intergenic polymorphisms between co-classified diseases implicate epistasis among noncoding elements

    Eighty percent of DNA outside protein-coding regions was shown to be biochemically functional by the ENCODE project, enabling studies of their interactions. Studies have since explored how convergent downstream mechanisms arise from independent genetic risks of one complex disease. However, the cross-talk and epistasis between intergenic risks associated with distinct complex diseases have not been comprehensively characterized. Our recent integrative genomic analysis unveiled downstream biological effectors of disease-specific polymorphisms buried in intergenic regions, and we then validated their genetic synergy and antagonism in distinct GWAS. We extend this approach to characterize convergent downstream candidate mechanisms of distinct intergenic SNPs across distinct diseases within the same clinical classification. We construct a multipartite network consisting of 467 diseases organized in 15 classes, 2,358 disease-associated SNPs, 6,301 SNP-associated mRNAs by eQTL, and mRNA annotations to 4,538 Gene Ontology mechanisms. Functional similarity between two SNPs (similar SNP pairs) is imputed using a nested information theoretic distance model for which p-values are assigned by conservative scale-free permutation of network edges without replacement (node degrees constant). At FDR≤5%, we prioritized 3,870 intergenic SNP pairs associated, among which 755 are associated with distinct diseases sharing the same disease class, implicating 167 intergenic SNPs, 14 classes, 230 mRNAs, and 134 GO terms. Co-classified SNP pairs were more likely to be prioritized than those of distinct classes, confirming a noncoding genetic underpinning to clinical classification (odds ratio ∼3.8; p≤10⁻²⁵). The prioritized pairs were also enriched in regions bound to the same/interacting transcription factors and/or interacting in long-range chromatin interactions suggestive of epistasis (odds ratio ∼2,500; p≤10⁻²⁵).
    This prioritized network implicates complex epistasis between intergenic polymorphisms of co-classified diseases and offers a roadmap for a novel therapeutic paradigm: repositioning medications that target proteins within downstream mechanisms of intergenic disease-associated SNPs. Supplementary information and software: http://lussiergroup.org/publications/disease_class
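
The abstract mentions permuting network edges while holding node degrees constant. One common way to do this, shown here only as an assumption about what such a permutation could look like and not as the authors' procedure, is repeated edge swapping on a bipartite edge list: two edges (a, b) and (c, d) are replaced by (a, d) and (c, b) unless that would create a duplicate edge. The function name and the SNP/mRNA labels are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str]  # (SNP, mRNA)


def degree_preserving_shuffle(edges: List[Edge], n_swaps: int, seed: int = 0) -> List[Edge]:
    """Rewire a list of unique bipartite edges (at least two) by edge swaps,
    keeping every node's degree unchanged."""
    rng = random.Random(seed)
    shuffled = list(edges)
    present = set(shuffled)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(shuffled)), 2)
        (a, b), (c, d) = shuffled[i], shuffled[j]
        new1, new2 = (a, d), (c, b)
        # skip swaps that would duplicate an existing edge (this also covers
        # the degenerate cases a == c or b == d)
        if new1 in present or new2 in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {new1, new2}
        shuffled[i], shuffled[j] = new1, new2
    return shuffled


edges = [("snp1", "mrnaA"), ("snp1", "mrnaB"), ("snp2", "mrnaB"),
         ("snp3", "mrnaC"), ("snp4", "mrnaA"), ("snp4", "mrnaD")]
permuted = degree_preserving_shuffle(edges, n_swaps=100)
print(Counter(s for s, _ in permuted) == Counter(s for s, _ in edges))  # degrees kept
```

Recomputing the similarity score of a SNP pair on many such rewired networks gives an empirical null distribution from which a p-value can be estimated.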

  • Chapter, No Access

    SNP Selection for Psychiatric Disease Association Based on Allele Frequency Plots

    In recent years, studies of the influence of genetic factors on susceptibility to some common diseases have produced satisfactory results. These results contribute to the prevention of these diseases as well as to the design of personalized treatments. The present work introduces a technique based on 2D representations of the genetic data that can help find these disease associations. A real-case application is presented in which we analyze the relation between the allele pair values of 748 Single Nucleotide Polymorphisms (SNPs) and the susceptibility to seven common psychiatric disorders.