Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

SEARCH GUIDE  Download Search Tip PDF File

  • articleNo Access

    APPLICATION OF AUTOMATIC MUTATION–GENE PAIR EXTRACTION TO DISEASES

    To have a better understanding of the mechanisms of disease development, knowledge of mutations and the genes on which the mutations occur is of crucial importance. Information on disease-related mutations can be accessed through public databases or biomedical literature sources. However, information retrieval from such resources can be problematic because of two reasons: manually created databases are usually incomplete and not up to date, and reading through a vast amount of publicly available biomedical documents is very time-consuming. In this paper, we describe an automated system, MuGeX (Mutation Gene eXtractor), that automatically extracts mutation–gene pairs from Medline abstracts for a disease query. Our system is tested on a corpus that consists of 231 Medline abstracts. While recall for mutation detection alone is 85.9%, precision is 95.9%. For extraction of mutation–gene pairs, we focus on Alzheimer's disease. The recall for mutation–gene pair identification is estimated at 91.3%, and precision is estimated at 88.9%. With automatic extraction techniques, MuGeX overcomes the problems of information retrieval from public resources and reduces the time required to access relevant information, while preserving the accuracy of retrieved information.

  • articleNo Access

    Estimation of isoform expression in RNA-seq data using a hierarchical Bayesian model

    Estimation of gene or isoform expression is a fundamental step in many transcriptome analysis tasks, such as differential expression analysis, eQTL (or sQTL) studies, and biological network construction. RNA-seq technology enables us to monitor the expression on genome-wide scale at single base pair resolution and offers the possibility of accurately measuring expression at the level of isoform. However, challenges remain because of non-uniform read sampling and the presence of various biases in RNA-seq data. In this paper, we present a novel hierarchical Bayesian method to estimate isoform expression. While most of the existing methods treat gene expression as a by-product, we incorporate it into our model and explicitly describe its relationship with corresponding isoform expression using a Multinomial distribution. In this way, gene and isoform expression are included in a unified framework and it helps us achieve a better performance over other state-of-the-art algorithms for isoform expression estimation. The effectiveness of the proposed method is demonstrated using both simulated data with known ground truth and two real RNA-seq datasets from MAQC project. The codes are available at http://www.math.pku.edu.cn/teachers/dengmh/GIExp/.

  • articleNo Access

    Representation of DNA sequences in genetic codon context with applications in exon and intron prediction

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulation annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.