Essential proteins play a critical role in maintaining normal cellular and organismal function and provide important support for understanding disease pathogenesis. Although existing machine learning and deep learning methods have made some progress in predicting essential proteins, model quality depends heavily on the training data, so data augmentation is particularly important for improving robustness and generalization. However, augmenting limited data in a way that improves the accuracy of essential protein prediction remains a challenge. Therefore, we propose an algorithm for essential protein identification based on generative adversarial networks. First, we input the preprocessed gene expression data into a pre-trained generative adversarial network generator to expand the existing gene expression dataset. Second, features are extracted through the adversarial interplay between the generator and the discriminator, and principal component analysis (PCA) is applied to reduce the dimensionality of these features and make them more representative. Subsequently, the Node2vec method is applied to capture the rich features in the protein–protein interaction (PPI) network. Finally, we fuse the extracted gene expression profile features with the PPI network features and input them into a deep neural network for classification. Experimental results show that our proposed method outperforms existing methods for predicting essential proteins.
Drug repurposing is the process of identifying new clinical indications for an existing drug. Some recent studies have used drug response prediction models to identify drugs that can be repurposed. By representing cell-line features as a pathway–pathway interaction network, we can better understand the connections between cellular processes and drug response mechanisms. Existing deep learning models for drug response prediction do not integrate known biological pathway–pathway interactions into the model. This paper presents a drug response prediction model that applies a graph convolution operation to a pathway–pathway interaction network to represent features of cancer cell lines effectively. The model is used to identify potential drug repurposing candidates for non-small cell lung cancer (NSCLC). Experimental results show that including a graph convolutional model applied to a pathway–pathway interaction network makes the proposed model more effective in predicting drug response than state-of-the-art methods. Specifically, the model performs better in terms of root mean squared error, coefficient of determination, and Pearson's correlation coefficient when applied to the GDSC1000 dataset. In addition, most of the drugs that the model predicted as top candidates for NSCLC treatment are either undergoing clinical studies or have some supporting evidence in the PubMed literature database.
We model dependencies between m multivariate continuous-valued information sources by a combination of (i) generalized canonical correlation analysis (gCCA) to reduce dimensionality while preserving dependencies in m - 1 of them, and (ii) associative clustering to summarize dependencies with the remaining one. This new combination of methods avoids multiway associative clustering, which would require a multiway contingency table and hence suffer from the curse of dimensionality of the table. The method is applied to summarizing properties of yeast stress by searching for dependencies (commonalities) between the expression of genes of baker's yeast Saccharomyces cerevisiae under various stressful treatments, and to summarizing stress regulation by finally adding data about transcription factor binding sites.
Recent experiments at the level of a single cell have shown that gene expression occurs in abrupt stochastic bursts. Further, in an ensemble of cells, the levels of proteins produced have a bimodal distribution. In a large fraction of cells, the gene expression is either off or has a high value. We propose a stochastic model of gene expression, the essential features of which are stochasticity and cooperative binding of RNA polymerase. The model can reproduce the bimodal behavior seen in experiments.
The conditions of the chromosomes inside the nucleus in the Rabl configuration have been modelled as self-avoiding polymer chains under restraining conditions. To ensure that the chromosomes remain stretched out and lined up, we fixed their end points to two opposing walls. The number of segments N, the distances d1 and d2 between the fixed points, and the wall-to-wall distance z (as measured in segment lengths) determine an approximate value for the Kuhn segment length kl. We have simulated the movement of the chromosomes using molecular dynamics to obtain the expected distance distribution between the genetic loci in the absence of further attractive or repulsive forces. A comparison to biological experiments on Drosophila melanogaster yields information on the parameters for our model. With the correct parameters it is possible to draw conclusions on the strength and range of the attraction that leads to pairing.
Damp obstruction refers to the stagnation of vital energy (qi) caused by dampness, resulting in dysfunction of body and limb movement as well as impairment of the digestive function of the spleen and stomach. Damp obstruction is a dampness-induced imbalance of the five elements; thus it serves as an ideal model for genomic study using cDNA microarrays. We have performed microarray analyses of the major organs of damp-obstructed rats. Cluster analysis of the expression profiles of the major organs indicated that spleen, stomach, and kidney respond to dampness differently from heart, liver, lung, and brain. Gene expression profiles specific to each element or group of elements were also identified. Our results are consistent with the philosophy of Chinese medicine that the five elements, metal (lung), wood (liver), water (kidney), fire (heart), and earth (spleen and stomach), coordinate by subjugation or restriction to maintain a healthy, physiological state. This is the first time that a powerful genomic tool has been applied to probe the ancient theory of Chinese medicine.
The aim of the present study was to identify differentially expressed genes and to measure the serum levels of a number of biochemical parameters in cold Zheng (CZ) and non-cold Zheng (NCZ) patients receiving hemodialysis (HD). HD patients were randomly selected from the CZ and NCZ groups. The between-group differences in gene expression were assessed using complementary DNA (cDNA) microarray. Differential gene expression was further validated by real-time reverse transcriptase polymerase chain reaction (RT-PCR). Our results demonstrated that, in the CZ group, the inflammation-associated genes ALOX5AP, S100A8 and S100A12 were up-regulated, whereas genes related to immunity (DEFA4), metabolism (GNG11, PYGB, PRKAR2B), and growth/proliferation (HSF2, DDR2, TK1) were down-regulated. Furthermore, the CZ HD patients had significantly lower serum albumin levels than their NCZ counterparts (3.31 ± 0.08 g/dL versus 4.18 ± 0.12 g/dL). It appears reasonable to conclude that up-regulated inflammatory gene expression (ALOX5AP, S100A8 and S100A12) may play an important role in CZ HD patients.
The purported effects of "circulation-improving" herbs used in traditional Chinese medicine (TCM) show striking similarities with the vascular actions of nitric oxide (NO) produced by the endothelial NO synthase (eNOS). We have previously reported that Salviae miltiorrhizae radix and Zizyphi spinosae semen upregulate eNOS expression. In the present study, we studied the effect on eNOS gene expression of 15 Chinese herbs with potential effects on the vasculature, and identified Prunella vulgaris L. (PVL) (flowering spike) as a potent eNOS-upregulating agent. In EA.hy 926 cells, a cell line derived from human umbilical vein endothelial cells (HUVEC), an aqueous extract of PVL increased eNOS promoter activity, eNOS mRNA and protein expressions, as well as NO production in concentration- and time-dependent manners. We have previously shown that ursolic acid (a constituent of Salviae miltiorrhizae radix), betulinic acid (a compound present in Zizyphi spinosae semen), luteolin and cynaroside (ingredients of artichoke, Cynara scolymus L.) are capable of enhancing eNOS gene expression. These compounds are also present in significant quantities in PVL. Thus, PVL contains active principles that stimulate human eNOS gene expression, and such compounds may have therapeutic potential against cardiovascular diseases.
Petiveria alliacea is a plant traditionally known for its anti-inflammatory and anti-tumor activities; however, the molecular and cellular mechanisms of its immunomodulatory properties are still unknown. Dendritic cells (DC) promote adaptive immune response by activating T lymphocytes, inducing an effector response or tolerance depending on the DC differentiation level. Herein, we evaluated the immunomodulatory activity of aqueous and organic plant fractions from P. alliacea using human monocyte-derived dendritic cells. The phenotype, cytokine secretion and gene expression were estimated after treatment with the plant fractions. We found that P. alliacea aqueous fraction induced morphological changes and co-stimulatory expression of CD86, indicating partial DC maturation. In addition, pro-inflammatory cytokines such as IL-1β, IL-6, IL-8, IL-10, IL-12p70, and TNF-α were secreted. The fraction also increased NF-κB gene expression while down-regulating TGFβ gene expression. These results suggest that the aqueous fraction can induce partial DC activation, a situation that can be relevant in tolerance induction. It is important to state that the organic fraction by itself does not show any immunomodulatory activity. This study provides evidence for possible immunomodulatory activity of P. alliacea extracts which has been used in traditional medicine in Colombia.
Cardamonin, a chalcone, is one of the natural components of the spicy herbaceous plant Alpinia conchigera Griff and has anticancer activity in many human cancer cell lines. There is, however, no information showing that cardamonin induces apoptosis and alters apoptosis-associated gene expression in mouse leukemia cells. Thus, we investigated the effects of cardamonin on apoptotic cell death and associated gene expression in mouse leukemia WEHI-3 cells in vitro. Results indicated that cardamonin decreased the total viable cell number by inducing cell morphological changes and apoptotic cell death in WEHI-3 cells, as assayed by phase-contrast microscopy and flow cytometry, respectively. The flow cytometry assay indicated that cardamonin increased reactive oxygen species (ROS) and Ca2+ production, decreased the mitochondrial membrane potential (ΔΨm) and increased caspase-3, -8 and -9 activities in WEHI-3 cells. Western blotting was performed to analyze the expression of relevant pro- and anti-apoptotic proteins, and the results showed that cardamonin decreased the anti-apoptotic protein Bcl-2 but increased the pro-apoptotic protein Bax in WEHI-3 cells. Furthermore, cardamonin increased the release of cytochrome c, AIF and Endo G, increased GRP78 and caspase-12, which are associated with ER stress, and increased Fas, Fas-Ligand and FADD expression. Cardamonin also increased the gene expression of DAP (death-associated protein), TMBIM4 (transmembrane BAX inhibitor motif containing 4) and ATG5 (autophagy related 5) but decreased the gene expression of DDIT3 (DNA-damage-inducible transcript 3), DDIT4 (DNA-damage-inducible transcript 4), BAG6 (BCL2-associated athanogene 6), BCL2L13 [BCL2-like 13 (apoptosis facilitator)] and BRAT1 (BRCA1-associated ATM activator 1), which are associated with apoptosis pathways. Based on these findings, we suggest that cardamonin induces apoptotic cell death through Fas and Fas-Ligand-, caspase- and mitochondria-dependent pathways and also affects apoptotic gene expression in WEHI-3 cells in vitro.
Experiments have shown that in embryonic stem cells, the promoters of many lineage-control genes contain “bivalent domains”, within which the nucleosomes possess both active (H3K4me3) and repressive (H3K27me3) marks. Such bivalent modifications play important roles in maintaining pluripotency in embryonic stem cells. Here, to investigate gene expression dynamics when there are regulations in bivalent histone modifications and random partition in cell divisions, we study how positive feedback to histone methylation/demethylation controls the transition dynamics of the histone modification patterns along with cell cycles. We constructed a computational model that includes dynamics of histone marks, three-stage chromatin state transitions, transcription and translation, feedbacks from protein product to enzymes to regulate the addition and removal of histone marks, and the inheritance of nucleosome state between cell cycles. The model reveals how dynamics of both nucleosome state transition and gene expression are dependent on the enzyme activities and feedback regulations. Results show that the combination of stochastic histone modification at each cell division and the deterministic feedback regulation work together to adjust the dynamics of chromatin state transition in stem cell regenerations.
In this work, the dynamics of fluctuations in gene expression time series are investigated. Using gene expression data collected from yeast and human, we found that the fluctuations of the gene expression level and the average value of gene expression over time are strongly correlated and obey a scaling law. As this feature is found in both yeast and human, it suggests that this coupling is probably a common dynamical organizing property of all living systems. To understand these observations, we propose a stochastic model which can explain these collective fluctuations and predict the scaling exponent. Interestingly, our results indicate that the observed scaling law emerges from the self-similarity symmetry embedded in gene expression fluctuations.
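As a minimal sketch of how a scaling law between fluctuation and mean can be checked, the snippet below fits a power law of the assumed form std = c * mean^alpha by least squares in log-log space. The function name `fit_scaling_exponent`, the power-law form, and the synthetic numbers are illustrative assumptions; they are not taken from the paper, which does not state its exponent or fitting procedure in the abstract.

```python
import math

def fit_scaling_exponent(means, stds):
    """Fit std = c * mean**alpha by ordinary least squares on log-log values.

    Returns (alpha, c). Assumes all inputs are strictly positive.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(s) for s in stds]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx                       # slope = scaling exponent
    c = math.exp(y_bar - alpha * x_bar)     # intercept = prefactor
    return alpha, c

# Synthetic, noise-free data generated with a hypothetical exponent of 0.75.
example_means = [1.0, 10.0, 100.0, 1000.0]
example_stds = [2.0 * m ** 0.75 for m in example_means]
alpha, prefactor = fit_scaling_exponent(example_means, example_stds)
```

On real time series, `means` and `stds` would be the per-gene temporal mean and standard deviation of the expression level, and the fit would be only approximate.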
The effects of the environmental carrying capacity K for degradation (the K effect for short) on constitutive gene expression and on a simple genetic regulation system are investigated by employing a stochastic Langevin equation combined with the corresponding Fokker–Planck equation for the two stochastic systems subjected to internal and external noises. The K effect characterizes the limited degradation ability of the environment for RNA or proteins, for example due to insufficient catabolic enzymes. The K effect can significantly change the distribution of mRNA copy number in constitutive gene expression and, interestingly, it leads to a Fano factor slightly larger than one when only internal noise is present. Therefore, the slight deviation of the Fano factor from one suggested by recent experimental measurements [D. L. Jones, R. C. Brewster and R. Phillips, Science 346 (2014) 1533] probably originates from the K effect. The K effects on the steady-state and transient properties of the genetic regulation system have been investigated in detail. The K effect can significantly enhance the mean first passage time, especially when the noises are weak, and substantially reduce the signal-to-noise ratio in stochastic resonance.
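For reference, the Fano factor is the variance of the copy number divided by its mean, and it equals one for a Poisson distribution. The following is a minimal sketch of how it can be estimated from a list of mRNA counts; the function name `fano_factor` and the example counts are hypothetical and are not data from the paper.

```python
def fano_factor(counts):
    """Return variance / mean of a sample of copy-number counts.

    Uses the population variance (divides by n). The mean must be non-zero.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / n
    return variance / mean

# Hypothetical counts: mean 5, variance 4, so the Fano factor is below one.
example_counts = [2, 4, 4, 4, 5, 5, 7, 9]
example_fano = fano_factor(example_counts)
```

A value above one indicates a distribution broader than Poisson, which is the direction of deviation the abstract attributes to the K effect.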
Microarray technology is a useful tool for monitoring the expression levels of thousands of genes simultaneously. Recently, mixture modeling has been used to extract expression signatures from gene expression profiles. In general, two separate steps are utilized to estimate the number of classes and model parameters, respectively. However, such a method is often time-consuming and leads to suboptimal solutions. In this paper, we therefore apply a one-step approach, namely Rival Penalized Expectation-Maximization (RPEM) algorithm, to analyze the gene expression data. The RPEM algorithm is capable of estimating the parameters of normal mixture model, while determining the number of classes automatically at the same time. Furthermore, we speed up the learning procedure of RPEM by proposing a new mechanism to adjust the learning rate dynamically. The numerical results on real gene expression data demonstrate that our proposed method is indeed effective and efficient.
When a normal cell becomes cancerous, the expression of many genes in that cell changes. Identifying these changes in gene expression in cancer tissue may lead to the development of novel tools for early diagnosis and effective therapeutics. In this paper we present an association rule mining approach to identify associations between genes that are differentially expressed in cancer tissue compared to normal tissue. We design an association rule mining algorithm, GeneExpMiner, for gene expression data mining. Serial Analysis of Gene Expression (SAGE) data related to pancreatic cancer are used to demonstrate the approach. The approach is expected to help in developing better treatment methodologies for cancer and in designing low-cost microarray chips for diagnosing cancer. The results have been validated in terms of Gene Ontology, and the signature genes that we identified match the published data.
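The abstract does not describe how GeneExpMiner works, so the sketch below illustrates only the generic idea of association rules on expression data: each sample is treated as the set of genes called differentially expressed in it, and pairwise rules are kept when their support and confidence pass chosen thresholds. The function names, thresholds, and gene labels are all hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions (gene sets) that contain every gene in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def pairwise_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-gene rules lhs -> rhs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a reliable rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = P(rhs present | lhs present)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical samples: genes called differentially expressed in each one.
samples = [
    {"geneA", "geneB"},
    {"geneA", "geneB", "geneC"},
    {"geneA"},
    {"geneB", "geneC"},
]
rules = pairwise_rules(samples, min_support=0.5, min_confidence=0.6)
```

In this toy input, geneC never appears without geneB, so the rule geneC -> geneB has confidence 1.0 at a support of 0.5.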
The reaction time delay in the transcription process depends on the concentration of the protein because the transport of mRNA from the nucleus to the cytoplasm becomes saturated. Thus the gene regulatory network is a state-dependent delayed model. This study aims to provide mathematical explanations for the dynamics of the system, such as linear stability and periodic oscillation, using mathematical techniques such as formal linearization, linear stability analysis, the method of multiple scales (MMS), and the normal form. First, the Hopf bifurcation of the state-dependent delayed gene regulatory network model of gene expression is analyzed by MMS. The mechanism of periodic oscillations is explained by the Hopf bifurcation. The findings show that when the degradation effects of the mRNA and protein are very strong, the oscillatory gene expression disappears. Then, because a constant time delay exists in the protein degradation process, a more realistic version of the model with both constant and state-dependent time delays is established. Its nonresonant double Hopf bifurcation is found and analyzed using MMS. Interesting complex dynamic phenomena, such as periodic, quasi-periodic, and global period-2 solutions, are also discovered. These observations indicate that both state-dependent delay and constant delay can induce richer dynamics in the system, and that the modified model may describe the real dynamical mechanism of gene expression (both the transcription process and the degradation process) more accurately. The findings may provide important guidance for understanding the real dynamic mechanism of the gene expression process.
A gene regulatory network can be considered a dynamic cellular system which describes the behavior (development) of a living cell and depends completely on the current internal state plus any external inputs, if these exist. Although many details inside a cell are not precisely known, gene expression data on a genome scale provide useful insights into such a cellular system. With gene expression data, a wide variety of models, such as Boolean networks and differential/difference equations, have been proposed to model gene regulatory networks. In these previously proposed models, genes are viewed as the internal state variables of a cellular system. This viewpoint has suffered from the underestimation of the model parameters. In addition, these models ignore an important aspect of gene regulatory networks: time delay. This paper instead proposes a state-space model with time delays for gene regulatory networks. The proposed model views genes as the observation variables, whose expression values depend on the current internal state variables and any external inputs. The Bayesian information criterion (BIC) and probabilistic principal component analysis (PPCA) are used to estimate the number of internal state variables and their expression profiles from gene expression data. By constructing dynamic equations with time delays for the internal state variables and the relationships between the internal state variables and the observation variables (gene expression profiles), state-space models with time delays for gene regulatory networks are constructed. The parameters of the proposed model can be unambiguously identified from time-course gene expression data at a lower computational cost. The proposed model is applied to two time-course gene expression datasets, and a gene regulatory network is inferred from each. The analysis shows that the inferred gene regulatory networks have several features of real gene regulatory networks, such as stability, robustness, and periodicity. Further, compared to state-space models without time delays, the proposed model with time delays has better prediction accuracy.
A question has been raised in several publications as to whether or not the expression levels or their logarithms for different genes are normally distributed. Answering this question requires a large data set where both biological variability and technological noise are present. An earlier attempt to test this assumption was limited to technical replicates and did not take multiplicity of tests into account when assessing the net results of goodness-of-fit testing. Therefore, the problem calls for further exploration. We applied several statistical tests to a large set of high-density oligonucleotide microarray data in order to systematically test for log-normality of expression levels for all the reporter genes. The multiple testing aspect of the problem was addressed by designing a pertinent resampling procedure. The results of testing did not reject normality of log-intensities in the non-normalized data under study. However, the global log-normality hypothesis was rejected beyond all reasonable doubt when the data were normalized by the quantile normalization procedure. Our results are consistent with the hypothesis that non-normalized expression levels of different genes are approximately log-normally distributed. Quantile normalization causes dramatic changes in the shape of the marginal distributions of log-intensities, which may be an indication that this procedure interferes not only with the technological noise but with the true biological signal as well. This possibility invites a special investigation.
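To make the normalization step concrete, the sketch below implements the standard quantile normalization procedure in plain Python: values are ranked within each array, and each value is replaced by the mean of the values holding the same rank across arrays. It is only an illustration of the general procedure, not the software or the goodness-of-fit tests used in the study; the function name and the toy numbers are hypothetical, and ties are handled naively by sort order.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of equally long arrays (one list per sample).

    After normalization every sample contains exactly the same set of values,
    assigned according to each value's rank within its own sample.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")
    sorted_samples = [sorted(s) for s in samples]
    # Mean across samples of the k-th smallest value, for each rank k.
    rank_means = [sum(col) / len(samples) for col in zip(*sorted_samples)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices by ascending value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays measuring the same three reporter genes.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
```

Because every array is forced onto one common set of values, the between-array distribution of each gene is reshaped, which is the kind of change in marginal distributions that the abstract reports.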
Microarray technology is used as a source of data for a wide range of biology studies. Useful biological information can be extracted from the analysis of microarray data, namely, the impact of a particular gene expression on the expression of other genes or the determination of expressed genes under different conditions. The purpose of this paper is to find co-behavioral genes in different data sets for different times and conditions. In other words, genes with similar behavior, same increase or decrease, under different medical, stress, and time conditions in terms of expression are determined. Multi-valued discretization of expression data was used for extracting genes with identical behavior. The algorithm proposed in this study is based on data and methods ensemble. The data ensemble technique was used to extract candidate genes with identical behavior. Other methods were also applied on all the data sets; as a result, many co-behavioral candidate genes with different similarity and correlation values were identified. Finally, the ultimate output was created from the ensemble of different methods. By applying the algorithm on yeast gene expression data, meaningful relations among genes were extracted.
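The abstract does not specify the discretization scheme, so the following is a minimal sketch under an assumed three-level encoding: each step of an expression profile is labelled +1 (increase), -1 (decrease), or 0 (no change within a threshold), and genes with identical label sequences are grouped as co-behavioral candidates. The function names, the threshold, and the example profiles are hypothetical.

```python
def discretize(profile, threshold):
    """Encode consecutive changes in a profile as +1, -1 or 0."""
    symbols = []
    for previous, current in zip(profile, profile[1:]):
        change = current - previous
        if change > threshold:
            symbols.append(1)
        elif change < -threshold:
            symbols.append(-1)
        else:
            symbols.append(0)
    return tuple(symbols)

def group_co_behavioral(profiles, threshold):
    """Group gene names whose discretized profiles are identical."""
    groups = {}
    for gene, profile in profiles.items():
        groups.setdefault(discretize(profile, threshold), []).append(gene)
    # Keep only patterns shared by at least two genes.
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]

# Hypothetical expression profiles over four time points.
profiles = {
    "g1": [1.0, 2.0, 2.1, 0.5],
    "g2": [3.0, 4.5, 4.4, 2.0],
    "g3": [1.0, 0.2, 0.9, 1.0],
}
co_behavioral = group_co_behavioral(profiles, threshold=0.5)
```

Here g1 and g2 share the pattern (increase, no change, decrease) despite different absolute levels, while g3 follows a different pattern and is left out.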
Proteoglycans (PGs) are cell-membrane and extracellular matrix components with a wide variety of different functions. In the matrix, they are mainly of structural importance, although some of them have been ascribed specific regulatory functions, such as in the assembly of collagen fibers. PGs on the cell surface act as essential modulators of specific ligand-binding reactions, involving interactions between adjacent cells and between cells and surrounding matrix. Through these interactions they participate in different processes, including cell proliferation and differentiation. Qualitative and quantitative changes in PG expression can therefore be associated with various physiological and pathological conditions. We have optimized the conditions for semi-quantitative evaluation of proteoglycan expression by RT-PCR reaction, using glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as reference gene. The relative fluorescence of analyte to reference amplimers can — within certain limits — be used to estimate the amount of target RNA and allows direct comparison of multiple samples. The profile of PG expression obtained in this way can be used to extend our current understanding of the possible functions that can be associated with these complex molecules.