DNA methylation patterns have profound impacts on genome stability, gene expression and development. The molecular basis of DNA methylation patterns has long been studied at the level of single CpG sites. Here, we construct a kinetic model of DNA methylation that includes collaboration between CpG sites, from which a correlation function is established based on experimental data. The function consists of three parts that suggest three possible sources of the correlation: movement of enzymes along DNA, collaboration between DNA methylation and nucleosome modification, and global enzyme concentrations within a cell. Moreover, the collaboration strength between DNA methylation and nucleosome modification is universal for mouse early embryo cells. The obtained correlation function provides insight into the mechanisms of inheritance of DNA methylation patterns.
Taiwan Develops New Biochip Technology.
Scientists Develop Cost-effective Nuclear Medicine Kit.
Protein Found Key to Human Flu Transmission.
IMCB Partners with Hitachi for Better Diagnostics.
Creating Safer and Cheaper Therapeutic Options for Stem Cells.
One Step Closer to Predicting Cancer Risk.
China Reveals Plans for National Lab System.
CSA Group Grants Philips Suzhou its First CPC Laboratory Qualification in China.
China-Myanmar Established Biodiversity Research Institute Unveiled in Nay Pyi Taw.
Athenex and Beijing Sciecure Pharmaceutical Announce Binding Agreement to Launch and Market Five Injectable Pharmaceutical Products in North America.
MicuRx Initiates Phase 3 Clinical Trial for MRX-I, A New Potent Oral Antibiotic Against Gram-positive Bacteria, Including MRSA.
WuXi AppTec Receives Inaugural "Company of the Year" Award from BayHelix.
The US FDA Granted Orphan Drug Designation to Yisheng Biopharma's Biological Product for Hepatocellular Carcinoma.
China Promotes Sci-Tech Cooperation Along Belt and Road.
Infinitus launches Joint Laboratory for the Research of Modern Preparation Technology with Huazhong University of Science and Technology.
The high-throughput correlated DNA methylation (DNAmeth) dataset is generated from the Illumina Infinium Human Methylation 27 (IIHM 27K) BeadChip assay. In the DNAmeth data, there are several CpG sites for every gene, and these grouped CpG sites are highly correlated. Most of the current filtering-based ranking (FBR) methods do not consider the group correlation structures. Obtaining the significant features with the FBR methods and applying these features to the classifiers to attain the best classification accuracy in highly correlated DNAmeth data is a challenging task. In this research, we introduce a resampling of group least absolute shrinkage and selection operator (glasso) FBR method capable of ignoring the unrelated features in the data while considering the group correlation among the features. Classifiers such as random forests (RF), Naive Bayes (NB), and support vector machines (SVM), trained on the significant CpGs obtained from the proposed resampling of group lasso-based ranking (RGLR) method, achieved improved classification accuracy. Through simulated and experimental prostate DNAmeth data, we showed that higher accuracy, sensitivity, specificity, and geometric mean are achieved by ignoring the unimportant CpG sites through the RGLR method.
DNA methylation is an epigenetic modification of DNA that adds a methyl group to cytosine. Aberrant DNA methylation in the CpG context is frequently observed in cancer cells, and it is known that aberrant DNA methylation silences tumor suppressor genes. However, the mechanism of DNA methylation is not well understood. A widely accepted hypothesis is that DNA methylation does not occur randomly and may be controlled by some instructive mechanisms. In this paper, we conducted an extensive study on this important question by using proprietary sequencing data from methyl-binding domain protein (MBD)-Cap ChIP sequencing experiments for 30 breast cancer cell lines. The goal of our study is to investigate differences in nucleotide composition around CpG sites where high levels of methylation are observed, and to use this information for modeling DNA methylation susceptibility. First, we observed that DNA methylation is not uniform across the whole genome and that the character composition of CpG flanking sequences is significantly different between hyper- and hypo-methylated groups. In an in-depth study, we used information theoretic approaches such as entropy and relative entropy to delineate character composition features and found enrichment of A (Adenine) and T (Thymine) in specific positions around hyper-methylated sites. As the methylation level increases, the proportions of A and T increase at some specific positions around hypermethylated sites and decrease at others. Second, we built predictive models for methylation susceptibility by using characters flanking CpG sites as features and hyper-/hypo-methylation status as class. Third, we constructed predictive models using a log odds score of two profiles from DNA sequences surrounding CpG sites of hyper- and hypo-methylated groups. This analysis showed that the distributions of profile scores for hyper- and hypo-methylated site sequences are quite distinct.
Our genome-wide CpG methylation study shows that nucleotides around CpG sites carry information for cytosine methylation. This is consistent with the seminal work on the instructive evidence of DNA methylation by Keshet et al. (Nature Genetics, 38(2), 149–153 2006). Our study is on the full genome scale and uses sequencing data, so it differs significantly from the study by Keshet et al. in both data resolution and analysis methods.
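The position-wise relative entropy and the log odds profile score described in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy flanking windows, the pseudocount, and the function names `position_profile`, `relative_entropy` and `log_odds_score` are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def position_profile(seqs, pseudocount=1.0):
    """Per-position nucleotide frequencies for equal-length sequences."""
    profile = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = len(seqs) + pseudocount * len(ALPHABET)
        profile.append(
            {b: (counts.get(b, 0) + pseudocount) / total for b in ALPHABET}
        )
    return profile

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for one position."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in ALPHABET)

def log_odds_score(seq, hyper, hypo):
    """Sum of log2(P_hyper / P_hypo) over positions; positive favours hyper."""
    return sum(math.log2(hyper[i][b] / hypo[i][b]) for i, b in enumerate(seq))

# Hypothetical flanking windows with the CpG at positions 2-3.
hyper_seqs = ["ATCGTA", "AACGTT", "TTCGAA", "ATCGAT"]
hypo_seqs = ["GCCGGC", "CGCGCC", "GGCGCG", "CCCGGG"]

hyper = position_profile(hyper_seqs)
hypo = position_profile(hypo_seqs)

# Relative entropy per position: zero at the shared CpG, positive in the flanks.
divergence = [relative_entropy(hyper[i], hypo[i]) for i in range(len(hyper))]
```

A sequence is then assigned to the hyper-methylated class when its log odds score is positive; in practice the threshold would be tuned on held-out data.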
An increasing number of studies are using beadarrays to measure DNA methylation on a genome-wide basis. The purpose is to identify novel biomarkers in a wide range of complex genetic diseases including cancer. A common difficulty encountered in these studies is distinguishing true biomarkers from false positives. While statistical methods aimed at improving the feature selection step have been developed for gene expression, relatively few methods have been adapted to DNA methylation data, which is naturally beta-distributed. Here we explore and propose an innovative application of a recently developed variational Bayesian beta-mixture model (VBBMM) to the feature selection problem in the context of DNA methylation data generated from a highly popular beadarray technology. We demonstrate that VBBMM offers significant improvements in inference and feature selection in this type of data compared to an Expectation-Maximization (EM) algorithm, at a significantly reduced computational cost. We further demonstrate the added value of VBBMM as a feature selection and prioritization step in the context of identifying prognostic markers in breast cancer. A variational Bayesian approach to feature selection of DNA methylation profiles should thus be of value to any study undergoing large-scale DNA methylation profiling in search of novel biomarkers.
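To make the remark that methylation values are naturally beta-distributed concrete, the sketch below fits a single beta distribution to a set of beta values by the method of moments. It is only an illustration of the distributional assumption, not the variational Bayesian beta-mixture model of the abstract; the function name `fit_beta_moments` and the toy values are assumptions.

```python
from statistics import mean, variance

def fit_beta_moments(values):
    """Method-of-moments estimates (alpha, beta) for data in (0, 1)."""
    m = mean(values)
    v = variance(values)  # sample variance
    if v >= m * (1 - m):
        raise ValueError("variance too large for a beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Hypothetical methylation beta values for one CpG across four samples.
alpha, beta = fit_beta_moments([0.1, 0.2, 0.3, 0.4])
```

The fitted mean `alpha / (alpha + beta)` reproduces the sample mean, which is a quick sanity check on the estimates.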
Whole-genome bisulfite sequencing (WGBS) is an approach of growing importance. It is the only approach that provides a comprehensive picture of the genome-wide DNA methylation profile. However, obtaining a sufficient amount of genome and read coverage typically requires high sequencing costs. Bioinformatics tools can reduce this cost burden by improving the quality of sequencing data. We have developed a statistical method, Adjusted Local Kernel Smoother (AKSmooth), that can accurately and efficiently reconstruct the single CpG methylation estimate across the entire methylome using low-coverage bisulfite sequencing (Bi-Seq) data. We demonstrate the AKSmooth performance on the low-coverage (~ 4×) DNA methylation profiles of three human colon cancer samples and matched controls. Under the best set of parameters, AKSmooth-curated data showed high concordance with the gold standard high-coverage sample (Pearson 0.90), outperforming the popular analogous method. In addition, AKSmooth showed computational efficiency, with a runtime benchmark over 4.5 times better than the reference tool. To summarize, AKSmooth is a simple and efficient tool that can provide an accurate human colon methylome estimation profile from low-coverage WGBS data. The proposed method is implemented in R and is available at https://github.com/Junfang/AKSmooth.
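The general idea of kernel smoothing for low-coverage bisulfite data can be sketched as follows. This is a generic coverage-weighted Gaussian kernel smoother, not the AKSmooth algorithm itself (whose adjustment step is not described here); the function name, the bandwidth of 500 bp and the toy counts are assumptions.

```python
import math

def smooth_methylation(positions, meth_counts, coverages, bandwidth=500.0):
    """Kernel-smoothed methylation level at each CpG.

    Each CpG borrows information from its neighbours: methylated and total
    read counts are summed with Gaussian weights that decay with distance.
    """
    smoothed = []
    for x0 in positions:
        num = 0.0
        den = 0.0
        for x, m, n in zip(positions, meth_counts, coverages):
            w = math.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
            num += w * m  # weighted methylated reads
            den += w * n  # weighted total reads
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed

# Hypothetical low-coverage data: three nearby CpGs and one distant CpG.
positions = [100, 150, 220, 5000]
meth_counts = [3, 0, 4, 0]
coverages = [4, 1, 4, 3]
levels = smooth_methylation(positions, meth_counts, coverages)
```

The CpG at position 150 has a single unmethylated read, so its raw estimate is 0; after smoothing it is pulled toward its well-covered, highly methylated neighbours, while the distant CpG is left essentially unchanged.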
Tumor purity is an intrinsic property of tumor samples and potentially has a severe impact on many types of data analysis. We have previously developed a statistical method, InfiniumPurify, which can infer the purity of a tumor sample given its tumor type (available in TCGA) or a set of informative CpG (iDMC) sites. However, in clinical practice, researchers may focus on a specific tumor type that is not included in TCGA, with too few samples to identify reliable iDMCs. This greatly restricts the application of InfiniumPurify in cancer research. In this paper, we propose an updated version of InfiniumPurify (termed uiInfiniumPurify) by identifying a universal set of iDMCs (uiDMCs) and redesigning the algorithm to determine the hyper- and hypo-methylation status of each uiDMC. Using this method, we estimated the tumor purities of 8830 tumor samples from TCGA. The results show that our estimates are highly consistent with those of other available methods. Consequently, the updated uiInfiniumPurify can be applied to a single sample (or a few samples) of interest whose tumor type is not included in TCGA. This characteristic will greatly broaden the application of uiInfiniumPurify in cancer research.
We present MethyMer, a Python-based tool aimed at selecting primers for amplification of complete CpG islands. These regions are difficult in terms of selecting appropriate primers because of their low-complexity, high GC content. Moreover, bisulfite treatment, in fact, leads to the reduction of the 4-letter alphabet (ATGC) to 3-letter one (ATG, except for methylated cytosines), and this also reduces region complexity and increases mispriming potential. MethyMer has a flexible scoring system, which optimizes the balance between various characteristics such as nucleotide composition, thermodynamic features (melting temperature, dimers ΔG, etc.), the presence of CpG sites and polyN tracts, and primer specificity, which is assessed with aligning primers to the bisulfite-treated genome using bowtie (up to three mismatches are allowed). Users are able to customize desired or limit ranges of various parameters as well as penalties for non-desired values. Moreover, MethyMer allows picking up the optimal combination of PCR primer pairs to perform the amplification of a large genomic locus, e.g. CpG island or other hard-to-study region, with minimal overlap of the individual amplicons. MethyMer incorporates ENCODE genome annotation records (promoter/enhancer/insulator), The Cancer Genome Atlas (TCGA) CpG methylation data derived with Illumina Infinium 450K microarrays, and records on correlations between TCGA RNA-Seq and CpG methylation data for 20 cancer types. These databases are included in the MethyMer release. Our tool is available at https://sourceforge.net/projects/methymer/.
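The alphabet reduction caused by bisulfite treatment can be shown with a small in silico conversion. This sketch is not part of MethyMer: the function name `bisulfite_convert` and the example sequence are assumptions, and only the forward strand is handled.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of the forward strand.

    Unmethylated cytosines read as T after conversion and PCR; cytosines
    whose 0-based index is in `methylated_positions` are protected.
    """
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

# Hypothetical fragment with a methylated CpG at index 1 and an
# unmethylated CpG at index 5.
converted = bisulfite_convert("ACGTCCGA", methylated_positions={1})
fully_unmethylated = bisulfite_convert("ACGTCCGA")
```

With no methylation the converted strand contains only A, T and G, which is why primers designed against bisulfite-treated DNA have lower complexity and a higher risk of mispriming.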
In genetic association studies, regularization methods are often used due to their computational efficiency for analysis of high-dimensional genomic data. DNA methylation data generated from the Infinium HumanMethylation450 BeadChip Kit have a group structure where an individual gene consists of multiple Cytosine–phosphate–Guanine (CpG) sites. Consequently, group-based regularization can precisely detect outcome-related CpG sites. Representative examples are sparse group lasso (SGL) and network-based regularization. The former is powerful when most of the CpG sites within the same gene are associated with a phenotype outcome. In contrast, the latter is preferred when only a few of the CpG sites within the same gene are related to the outcome. In this paper, we propose a new variable selection strategy based on a selection probability that measures the selection frequency of individual variables selected by both SGL and network-based regularization. In an extensive simulation study, we demonstrated that the proposed strategy shows better selection performance than both SGL and network-based regularization in all simulated settings. We also applied the proposed strategy to identify differentially methylated CpG sites and their corresponding genes from ovarian cancer data.
Background: Tumor purity is of great significance for the study of tumor genotyping and the prediction of recurrence, which is significantly affected by tumor heterogeneity. Tumor heterogeneity is the basis of drug resistance in various cancer treatments, and DNA methylation plays a core role in the generation of tumor heterogeneity. Almost all types of cancer cells are associated with abnormal DNA methylation in certain regions of the genome. The selection of tumor-related differential methylation sites, which can be used as an indicator of tumor purity, has important implications for purity assessment. At present, the selection of informative sites mostly focuses on inter-tumor heterogeneity and ignores the spatial heterogeneity of tumor growth, that is, sample specificity.
Results: Considering the specificity of tumor samples and the information gain of an individual tumor sample relative to the normal samples, we present an approach, PESM, to evaluate tumor purity through the sample-specific differentially methylated sites of tumor samples. Applied to more than 200 tumor samples of Prostate adenocarcinoma (PRAD) and Kidney renal clear cell carcinoma (KIRC), it shows that the tumor purity estimated by PESM is highly consistent with other existing methods. In addition, PESM performs better than the method that uses the integrated signal of methylation sites to estimate purity. Therefore, the method used to select informative sites has an important impact on the estimation of tumor purity, and selecting sample-specific informative sites helps to identify the tumor purity of samples accurately.
Cancer is characterized by the dysregulation of alternative splicing (AS). However, the comprehensive regulatory mechanisms of AS in lung adenocarcinoma (LUAD) are poorly understood. Here, we displayed the AS landscape in LUAD based on the integrated analyses of LUAD’s multi-omics data. We identified 13,995 AS events in 6309 genes as differentially expressed alternative splicing events (DEASEs) mainly covering protein-coding genes. These DEASEs were strongly linked to “cancer hallmarks”, such as apoptosis, DNA repair, cell cycle, cell proliferation, angiogenesis, immune response, generation of precursor metabolites and energy, p53 signaling pathway and PI3K-AKT signaling pathway. We further built a regulatory network connecting splicing factors (SFs) and DEASEs. In addition, RNA-binding protein (RBP) mutations that can affect DEASEs were investigated to find some potential cancer drivers. Further association analysis demonstrated that DNA methylation levels were highly correlated with DEASEs. In summary, our results can bring new insight into understanding the mechanism of AS and provide novel biomarkers for personalized medicine of LUAD.
Papillary thyroid carcinoma (PTC) is typically an indolent cancer, yet a minority of cases develop lymph node metastasis. Due to the unclear mechanisms of lymph node metastasis, a considerable number of patients undergo unnecessary surgeries. Currently, the identification of key genetic biomarkers in high-dimensional data presents a significant challenge, thereby limiting research progress in this area. Here, we proposed a hybrid filter-wrapper feature selection strategy for core factor detection and developed MethyAE, a metastasis prediction model based on DNA methylation, utilizing an end-to-end learning auto-encoder. 46 methylated CpG sites were successfully identified as crucial biomarkers for lymph node metastasis. Leveraging 447 PTC samples from the Cancer Genome Atlas (221 with metastasis, 226 without), the MethyAE model achieves 88.9% accuracy and a recall rate of 88.6% in predicting lymph node metastasis, outperforming commonly used machine learning methods like logistic regression and random forest. Furthermore, the MethyAE model exhibits favorable performance in DNA methylation data from colon cancer, bladder cancer, and breast cancer. To the best of our knowledge, this is the first attempt to predict PTC lymph node metastasis through DNA methylation, offering pivotal decision-making criteria for avoiding unnecessary surgeries and selecting appropriate treatment plans for a substantial cohort of PTC patients.
Although the so-called “large p small n” problem is typical in bioinformatics, there are no effective feature selection methods applicable to it. In this chapter, we propose tensor decomposition based unsupervised feature extraction. The proposed method is applied to post-traumatic stress disorder mediated heart diseases and 26 non-small cell lung cancer cell lines, off-target effect of miRNA transfection, in silico drug discovery from gene expression, and social insects with multiple castes. In spite of the variety of targeted problems, the proposed method turns out to work well.
At the onset of post-implantation embryonic development, the specification of human primordial germ cells (hPGCs) marks the preparation to attain the totipotent state through the extraordinary features of extensive epigenetic reprogramming (in terms of DNA demethylation and chromatin reorganization), a mitochondrial bottleneck, and a characteristic gene regulatory network, which are in stark contrast with those of the somatic lineage. Together, they provide for the attribute of immortality to the germ cell lineage, reflecting their significance in transgenerational inheritance. Interestingly, these features are the antithesis of the several hallmark phenotypes of aging, which gradually accumulate in the somatic lineage over time. In this chapter, we discuss the salient features of hPGC development, the hallmarks of aging, the application of cellular reprogramming as a therapeutic route to rejuvenation, and the intriguing translational potential of the lessons learned from the immortal human germline lineage in the field of anti-aging research.
DNA methylation has emerged as a promising epigenetic marker for disease diagnosis. Both the differential mean (DM) and differential variability (DV) in methylation have been shown to contribute to transcriptional aberration and disease pathogenesis. The presence of confounding factors in large scale EWAS may affect the methylation values and hamper accurate marker discovery. In this paper, we propose a flexible framework called methylDMV which allows for confounding factors adjustment and enables simultaneous characterization and identification of CpGs exhibiting DM only, DV only and both DM and DV. The proposed framework also allows for prioritization and selection of candidate features to be included in the prediction algorithm. We illustrate the utility of methylDMV in several TCGA datasets. An R package methylDMV implementing our proposed method is available at http://www.ams.sunysb.edu/~pfkuan/softwares.html#methylDMV.
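The distinction between differential mean (DM) and differential variability (DV) can be illustrated with two simple per-CpG statistics. This is a simplified sketch, not the methylDMV method: it performs no confounder adjustment, and the function names and toy beta values are assumptions.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic comparing group means (a DM-style statistic)."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def variance_ratio(a, b):
    """Ratio of sample variances (a DV-style statistic)."""
    return variance(a) / variance(b)

# Hypothetical beta values for one CpG in cases and controls.
cases = [0.62, 0.70, 0.55, 0.81, 0.48]
controls = [0.30, 0.32, 0.29, 0.31, 0.33]

dm_stat = welch_t(cases, controls)
dv_stat = variance_ratio(cases, controls)
```

In this toy example the CpG shows both a higher mean and a larger spread in cases, so it would be a candidate for the "both DM and DV" category; a real analysis would convert these statistics to p-values and correct for multiple testing.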
Epigenomics involves the global study of mechanisms, such as histone modifications or DNA methylation, that have an impact on development or phenotype, are heritable, but are not directly encoded in the DNA sequence. The recent availability of large epigenomic data sets, coupled with the increasing recognition of the importance of epigenetic phenomena, has spurred a growing interest in computational methods for interpreting the epigenome.
Despite many gains in our understanding of genetic aspects of autism, the etiologic basis of autism in most affected individuals is unknown and there is a very incomplete understanding about the downstream biology occurring in the autistic brain. Epigenetics has emerged as a promising field that may explain how some cases of autism arise, both as a result of and independently from monogenic and chromosomal mutations. Here we discuss the emerging evidence of how epigenetic dysregulation may play a role in autism pathophysiology, as well as the challenges and potential strategies in studying possible epigenetic processes in autism.
Previous studies have established that there is a strong genetic component to the development of ASD, but these genetic risks do not account for all of the heritability of the disorder. This raises the prospect that alternative, epigenetic mechanisms may play a role in ASD development. Epigenetic mechanisms facilitate temporal and spatial regulation of gene expression, but are independent of changes to the underlying DNA sequence. Because epigenetic profiles are labile, they represent an intriguing mechanism whereby environmental influences, which are not severe enough to alter the DNA sequence of a cell, may alter gene expression and cellular response and contribute to ASD. In this chapter, we discuss the role of DNA methylation and histone modifications in the development of ASD.
While autism spectrum disorders (ASD) have a strong genetic basis, the apparent increase in prevalence in recent generations has raised concerns about environmental exposures and potential interactions with genetic risk factors. DNA methylation is an epigenetic layer on top of DNA sequence at the interface of genetic and environmental factors known to overlap with metabolism and nutrition. Here we review and discuss the evidence from human studies pointing to complex interactions between genetic factors and in utero environment. We also examine the impact of chemical exposures on DNA methylation in brain as well as overall effect on development and reproduction. Transgenerational effects of chemical exposures on ASD risk are still an open question, but evidence supporting the need to ask such questions is summarized.