Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

SEARCH GUIDE  Download Search Tip PDF File

  • articleNo Access

    Genetic Studies for Diabetes in Asia – The Key to Precision Medicine (Vol. 24, No. 11, Full Issue)

      For the month of November 2020, APBN takes a dive into the genetic links behind the diabetes epidemic in Asia. Looking specifically at genetic markers for diabetes in Asian populations, researchers have discovered unique identifiers that could promote development precision medicine for diabetes in the Asia Pacific region. In the Spotlights, we explore innovative research from the 2020 Tokyo Tech Research Showcase and how they are helping to build a greener and sustainable future. Check out in our columns the future of commercial space flight and how space tourism is becoming a reality.

    • articleNo Access

      A comment on two-locus epistatic interaction models for genome-wide association studies

      Detection of epistatic interactions in genome-wide association studies is a computationally hard problem. Many detection algorithms have been proposed and will continue to be. Most of those algorithms measure their predictive power by running on simulated data many times under various disease models. However, we find that there have been subtle differences in interpreting the meaning of existing disease models among the previous studies on detection of epistatic interactions. We elucidate those differences and suggest that future studies on epistatic interactions in GWAS state explicitly which versions/interpretations are employed. We also provide a way to facilitate setting parameters of disease models.

    • articleOpen Access

      KDSNP: A kernel-based approach to detecting high-order SNP interactions

      Despite the accumulation of quantitative trait loci (QTL) data in many complex human diseases, most of current approaches that have attempted to relate genotype to phenotype have achieved limited success, and genetic factors of many common diseases are yet remained to be elucidated. One of the reasons that makes this problem complex is the existence of single nucleotide polymorphism (SNP) interaction, or epistasis. Due to excessive amount of computation for searching the combinatorial space, existing approaches cannot fully incorporate high-order SNP interactions into their models, but limit themselves to detecting only lower-order SNP interactions. We present an empirical approach based on ridge regression with polynomial kernels and model selection technique for determining the true degree of epistasis among SNPs. Computer experiments in simulated data show the ability of the proposed method to correctly predict the number of interacting SNPs provided that the number of samples is large enough relative to the number of SNPs. For cases in which the number of the available samples is limited, we propose to perform sliding window approach to ensure sufficiently large sample/SNP ratio in each window. In computational experiments using heterogeneous stock mice data, our approach has successfully detected subregions that harbor known causal SNPs. Our analysis further suggests the existence of additional candidate causal SNPs interacting to each other in the neighborhood of the known causal gene. Software is available from https://github.com/HirotoSaigo/KDSNP.

    • articleNo Access

      Constrained maximum entropy models to select genotype interactions associated with censored failure times

      We propose a novel screening method targeting genotype interactions associated with disease risks. The proposed method extends the maximum entropy conditional probability model to address disease occurrences over time. Continuous occurrence times are grouped into intervals. The model estimates the conditional distribution over the disease occurrence intervals given individual genotypes by maximizing the corresponding entropy subject to constraints linking genotype interactions to time intervals. The EM algorithm is employed to handle observations with uncertainty, for which the disease occurrence is censored. Stepwise greedy search is proposed to screen a large number of candidate constraints. The minimum description length is employed to select the optimal set of constraints. Extensive simulations show that five or so quantile-dependent intervals are sufficient to categorize disease outcomes into different risk groups. Performance depends on sample size, number of genotypes, and minor allele frequencies. The proposed method outperforms the likelihood ratio test, Lasso, and a previous maximum entropy method with only binary (disease occurrence, non-occurrence) outcomes. Finally, a GWAS study for type 1 diabetes patients is used to illustrate our method. Novel one-genotype and two-genotype interactions associated with neuropathy are identified.

    • articleNo Access

      An enrichment method for mapping ambiguous reads to the reference genome for NGS analysis

      Mapping short reads to a reference genome is an essential step in many next-generation sequencing (NGS) analyses. In plants with large genomes, a large fraction of the reads can align to multiple locations of the genome with equally good alignment scores. How to map these ambiguous reads to the genome is a challenging problem with big impacts on the downstream analysis. Traditionally, the default method is to assign an ambiguous read randomly to one of the many potential locations. In this study, we explore two alternative methods that are based on the hypothesis that the possibility of an ambiguous read being generated by a location is proportional to the total number of reads produced by that location: (1) the enrichment method that assigns an ambiguous read to the location that has produced the most reads among all the potential locations, (2) the probability method that assigns an ambiguous read to a location based on a probability proportional to the number of reads the location produces. We systematically compared the performance of the proposed methods with that of the default random method. Our results showed that the enrichment method produced better results than the default random method and the probability method in the discovery of single nucleotide polymorphisms (SNPs). Not only did it produce more SNP markers, but it also produced SNP markers with better quality, which was demonstrated using multiple mainstay genomic analyses, including genome-wide association studies (GWAS), minor allele distribution, population structure, and genomic prediction.

    • articleNo Access

      A novel SNP-set analytical method without distinguishing common variants or rare variants in genome-wide association study

      Single nucleotide polymorphism (SNP)-set analysis in genome-wide association studies (GWASs) has become a hot topic. Most existing SNP-set analystic methods are designed and work well according to the different natures of common or rare variants and associated diseases. But the information that the disease associated variants are common or rare cannot be gained in advance. Therefore, in this research, we proposed a new and powerful weighted function method without distinguishing common or rare variants to select tagging SNP-set. We applied our selection method to sequence kernel association test (SKAT) and compared the power with some existing methods. The simulation results showed that our method has higher power not only than SKAT in un-weighted case, but also than SKAT in other weighted functions. Moreover, the power is improved significantly when the minor allele frequency (MAF) of causal SNP is relatively small.

    • chapterOpen Access

      CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs

      Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).

    • chapterOpen Access

      Efficient Differentially Private Methods for a Transmission Disequilibrium Test in Genome Wide Association Studies

      To achieve the provision of personalized medicine, it is very important to investigate the relationship between diseases and human genomes. For this purpose, large-scale genetic studies such as genome-wide association studies are often conducted, but there is a risk of identifying individuals if the statistics are released as they are. In this study, we propose new efficient differentially private methods for a transmission disequilibrium test, which is a family-based association test. Existing methods are computationally intensive and take a long time even for a small cohort. Moreover, for approximation methods, sensitivity of the obtained values is not guaranteed. We present an exact algorithm with a time complexity of 𝒪(nm) for a dataset containing n families and m single nucleotide polymorphisms (SNPs). We also propose an approximation algorithm that is faster than the exact one and prove that the obtained scores’ sensitivity is 1. From our experimental results, we demonstrate that our exact algorithm is 10, 000 times faster than existing methods for a small cohort with 5, 000 SNPs. The results also indicate that the proposed method is the first in the world that can be applied to a large cohort, such as those with 106 SNPs. In addition, we examine a suitable dataset to apply our approximation algorithm. Supplementary materials are available at https://github.com/ay0408/DP-trio-TDT.

    • chapterOpen Access

      Nonlinear post-selection inference for genome-wide association studies

      To address the lack of statistical power and interpretability of genome-wide association studies (GWAS), gene-level analyses combine the p-values of individual single nucleotide polymorphisms (SNPs) into gene statistics. However, using all SNPs mapped to a gene, including those with low association scores, can mask the association signal of a gene.

      We therefore propose a new two-step strategy, consisting in first selecting the SNPs most associated with the phenotype within a given gene, before testing their joint effect on the phenotype. The recently proposed kernelPSI framework for kernel-based post-selection inference makes it possible to model non-linear relationships between features, as well as to obtain valid p-values that account for the selection step.

      In this paper, we show how we adapted kernelPSI to the setting of quantitative GWAS, using kernels to model epistatic interactions between neighboring SNPs, and post-selection inference to determine the joint effect of selected blocks of SNPs on a phenotype. We illustrate this tool on the study of two continuous phenotypes from the UKBiobank.

      We show that kernelPSI can be successfully used to study GWAS data and detect genes associated with a phenotype through the signal carried by the most strongly associated regions of these genes. In particular, we show that kernelPSI enjoys more statistical power than other gene-based GWAS tools, such as SKAT or MAGMA.

      kernelPSI is an effective tool to combine SNP-based and gene-based analyses of GWAS data, and can be used successfully to improve both statistical performance and interpretability of GWAS.

    • chapterOpen Access

      Evaluation of PrediXcan for prioritizing GWAS associations and predicting gene expression

      Genome-wide association studies (GWAS) have been successful in facilitating the understanding of genetic architecture behind human diseases, but this approach faces many challenges. To identify disease-related loci with modest to weak effect size, GWAS requires very large sample sizes, which can be computational burdensome. In addition, the interpretation of discovered associations remains difficult. PrediXcan was developed to help address these issues. With built in SNP-expression models, PrediXcan is able to predict the expression of genes that are regulated by putative expression quantitative trait loci (eQTLs), and these predicted expression levels can then be used to perform gene-based association studies. This approach reduces the multiple testing burden from millions of variants down to several thousand genes. But most importantly, the identified associations can reveal the genes that are under regulation of eQTLs and consequently involved in disease pathogenesis. In this study, two of the most practical functions of PrediXcan were tested: 1) predicting gene expression, and 2) prioritizing GWAS results. We tested the prediction accuracy of PrediXcan by comparing the predicted and observed gene expression levels, and also looked into some potential influential factors and a filter criterion with the aim of improving PrediXcan performance. As for GWAS prioritization, predicted gene expression levels were used to obtain gene-trait associations, and background regions of significant associations were examined to decrease the likelihood of false positives. Our results showed that 1) PrediXcan predicted gene expression levels accurately for some but not all genes; 2) including more putative eQTLs into prediction did not improve the prediction accuracy; and 3) integrating predicted gene expression levels from the two PrediXcan whole blood models did not eliminate false positives. Still, PrediXcan was able to prioritize GWAS associations that were below the genome-wide significance threshold in GWAS, while retaining GWAS significant results. This study suggests several ways to consider PrediXcan’s performance that will be of value to eQTL and complex human disease research.

    • chapterOpen Access

      Leveraging putative enhancer-promoter interactions to investigate two-way epistasis in Type 2 Diabetes GWAS

      We utilized evidence for enhancer-promoter interactions from functional genomics data in order to build biological filters to narrow down the search space for two-way Single Nucleotide Polymorphism (SNP) interactions in Type 2 Diabetes (T2D) Genome Wide Association Studies (GWAS). This has led us to the identification of a reproducible statistically significant SNP pair associated with T2D. As more functional genomics data are being generated that can help identify potentially interacting enhancer-promoter pairs in larger collection of tissues/cells, this approach has implications for investigation of epistasis from GWAS in general.

    • chapterOpen Access

      Res2s2aM: Deep residual network-based model for identifying functional noncoding SNPs in trait-associated regions

      Noncoding single nucleotide polymorphisms (SNPs) and their target genes are important components of the heritability of diseases and other polygenic traits. Identifying these SNPs and target genes could potentially reveal new molecular mechanisms and advance precision medicine. For polygenic traits, genome-wide association studies (GWAS) are preferred tools for identifying trait-associated regions. However, identifying causal noncoding SNPs within such regions is a difficult problem in computational biology. The DNA sequence context of a noncoding SNP is well-established as an important source of information that is beneficial for discriminating functional from nonfunctional noncoding SNPs. We describe the use of a deep residual network (ResNet)-based model—entitled Res2s2aM—that fuses anking DNA sequence information with additional SNP annotation information to discriminate functional from nonfunctional noncoding SNPs. On a ground-truth set of disease-associated SNPs compiled from the Genome-wide Repository of Associations between SNPs and Phenotypes (GRASP) database, Res2s2aM improves the prediction accuracy of functional SNPs significantly in comparison to models based only on sequence information as well as a leading tool for post-GWAS noncoding SNP prioritization (RegulomeDB).

    • chapterOpen Access

      Protecting Genomic Data Privacy with Probabilistic Modeling

      The proliferation of sequencing technologies in biomedical research has raised many new privacy concerns. These include concerns over the publication of aggregate data at a genomic scale (e.g. minor allele frequencies, regression coefficients). Methods such as differential privacy can overcome these concerns by providing strong privacy guarantees, but come at the cost of greatly perturbing the results of the analysis of interest. Here we investigate an alternative approach for achieving privacy-preserving aggregate genomic data sharing without the high cost to accuracy of differentially private methods. In particular, we demonstrate how other ideas from the statistical disclosure control literature (in particular, the idea of disclosure risk) can be applied to aggregate data to help ensure privacy. This is achieved by combining minimal amounts of perturbation with Bayesian statistics and Markov Chain Monte Carlo techniques. We test our technique on a GWAS dataset to demonstrate its utility in practice. An implementation is available at https://github.com/seanken/PrivMCMC.

    • chapterOpen Access

      PENALIZED REGRESSION FOR GENOME-WIDE ASSOCIATION SCREENING OF SEQUENCE DATA

      Whole exome and whole genome sequencing are likely to be potent tools in the study of common diseases and complex traits. Despite this promise, some very difficult issues in data management and statistical analysis must be squarely faced. The number of rare variants identified by sequencing is apt to be much larger than the number of common variants encountered in current association studies. The low frequencies of rare variants alone will make association testing difficult. This article extends the penalized regression framework for model selection in genome-wide association data to sequencing data with both common and rare variants. Previous research has shown that lasso penalties discourage irrelevant predictors from entering a model. The Euclidean penalties dealt with here group variants by gene or pathway. Pertinent biological information can be incorporated by calibrating penalties by weights. The current paper examines some of the tradeoffs in using pure lasso penalties, pure group penalties, and mixtures of the two types of penalty. All of the computational and statistical advantages of lasso penalized estimation are retained in this richer setting. The overall strategy is implemented in the free statistical genetics analysis software MENDEL and illustrated on both simulated and real data.

    • chapterOpen Access

      MIXTURE MODEL FOR SUB-PHENOTYPING IN GWAS

      Genome Wide Association (GWA) studies resulted in discovery of genetic variants underlying several complex diseases including Chron's disease and age-related macular degeneration (AMD). Still geneticists find that in majority of studies the size of the effect even if it is significant tends to be very small. There are several factors contributing to this problem such as rare variants, complex relationships among SNPs (epistatic effect), and heterogeneity of the phenotype. In this work we focus on addressing phenotypic heterogeneity. We introduce the problem of identifying, from GWAS data, separate genotypic markers from overlapping mixtures of clinically indistinguishable phenotypes. We propose a generative model for this scenario and derive an expectation-maximization (EM) procedure to fit the model to data, as well as a novel screening procedure designed to identify skew specific to certain phenotypic regimes. We present results on several simulated datasets as well as preliminary findings in applying the model to type 2 diabetes dataset.

    • chapterOpen Access

      STATISTICAL EPISTASIS NETWORKS REDUCE THE COMPUTATIONAL COMPLEXITY OF SEARCHING THREE-LOCUS GENETIC MODELS

      The rapid development of sequencing technologies makes thousands to millions of genetic attributes available for testing associations with various biological traits. Searching this enormous high-dimensional data space imposes a great computational challenge in genome-wide association studies. We introduce a network-based approach to supervise the search for three-locus models of disease susceptibility. Such statistical epistasis networks (SEN) are built using strong pairwise epistatic interactions and provide a global interaction map to search for higher-order interactions by prioritizing genetic attributes clustered together in the networks. Applying this approach to a population-based bladder cancer dataset, we found a high susceptibility three-way model of genetic variations in DNA repair and immune regulation pathways, which holds great potential for studying the etiology of bladder cancer with further biological validations. We demonstrate that our SEN-supervised search is able to find a small subset of three-locus models with significantly high associations at a substantially reduced computational cost.

    • chapterOpen Access

      USING THE BIPARTITE HUMAN PHENOTYPE NETWORK TO REVEAL PLEIOTROPY AND EPISTASIS BEYOND THE GENE

      With the rapid increase in the quality and quantity of data generated by modern high-throughput sequencing techniques, there has been a need for innovative methods able to convert this tremendous amount of data into more accessible forms. Networks have been a corner stone of this movement, as they are an intuitive way of representing interaction data, yet they offer a full set of sophisticated statistical tools to analyze the phenomena they model. We propose a novel approach to reveal and analyze pleiotropic and epistatic effects at the genome-wide scale using a bipartite network composed of human diseases, phenotypic traits, and several types of predictive elements (i.e. SNPs, genes, or pathways). We take advantage of publicly available GWAS data, gene and pathway databases, and more to construct networks different levels of granularity, from common genetic variants to entire biological pathways. We use the connections between the layers of the network to approximate the pleiotropy and epistasis effects taking place between the traits and the predictive elements. The global graph-theory based quantitative methods reveal that the levels of pleiotropy and epistasis are comparable for all types of predictive element. The results of the magnified “glaucoma” region of the network demonstrate the existence of well documented interactions, supported by overlapping genes and biological pathway, and more obscure associations. As the amount and complexity of genetic data increases, bipartite, and more generally multipartite networks that combine human diseases and other physical attributes with layers of genetic information, have the potential to become ubiquitous tools in the study of complex genetic and phenotypic interactions.

    • chapterOpen Access

      GENOME-WIDE GENETIC INTERACTION ANALYSIS OF GLAUCOMA USING EXPERT KNOWLEDGE DERIVED FROM HUMAN PHENOTYPE NETWORKS

      The large volume of GWAS data poses great computational challenges for analyzing genetic interactions associated with common human diseases. We propose a computational framework for characterizing epistatic interactions among large sets of genetic attributes in GWAS data. We build the human phenotype network (HPN) and focus around a disease of interest. In this study, we use the GLAUGEN glaucoma GWAS dataset and apply the HPN as a biological knowledge-based filter to prioritize genetic variants. Then, we use the statistical epistasis network (SEN) to identify a significant connected network of pairwise epistatic interactions among the prioritized SNPs. These clearly highlight the complex genetic basis of glaucoma. Furthermore, we identify key SNPs by quantifying structural network characteristics. Through functional annotation of these key SNPs using Biofilter, a software accessing multiple publicly available human genetic data sources, we find supporting biomedical evidences linking glaucoma to an array of genetic diseases, proving our concept. We conclude by suggesting hypotheses for a better understanding of the disease.

    • chapterOpen Access

      The Impact of Ancestry on Genome-Wide Association Studies

      Genome-wide association studies (GWAS) are an important tool for the study of complex disease genetics. Decisions regarding the quality control (QC) procedures employed as part of a GWAS can have important implications on the results and their biological interpretation. Many GWAS have been conducted predominantly in cohorts of European ancestry, but many initiatives aim to increase the representation of diverse ancestries in genetic studies. The question of how these data should be combined and the consequences that genetic variation across ancestry groups might have on GWAS results warrants further investigation. In this study, we focus on several commonly used methods for combining genetic data across diverse ancestry groups and the impact these decisions have on the outcome of GWAS summary statistics. We ran GWAS on two binary phenotypes using ancestry-specific, multi-ancestry mega-analysis, and meta-analysis approaches. We found that while multi-ancestry mega-analysis and meta-analysis approaches can aid in identifying signals shared across ancestries, they can diminish the signal of ancestry-specific associations and modify their effect sizes. These results demonstrate the potential impact on downstream post-GWAS analyses and follow-up studies. Decisions regarding how the genetic data are combined has the potential to mask important findings that might serve individuals of ancestries that have been historically underrepresented in genetic studies. New methods that consider ancestry-specific variants in conjunction with the shared variants need to be developed.

    • chapterOpen Access

      Session Introduction: Precision Medicine: Multi-modal and multi-scale methods to promote mechanistic understanding of disease

      Precision medicine focuses on developing treatments and preventative strategies tailored to an individual’s genomic profile, lifestyle, and environmental context. The Precision Medicine sessions at the Pacific Symposium on Biocomputing (PSB) have consistently spotlighted progress in this domain. Our 2025 manuscript collection features algorithmic innovations that integrate data across scales and diverse data modalities, presenting novel techniques to derive clinically relevant insights from molecular datasets. These studies highlight recent advances in technology and analytics and their application toward realizing the potential of precision medicine to enhance human health outcomes and extend lifespan.