Understanding the mechanisms of disease development requires knowledge of mutations and of the genes in which they occur. Information on disease-related mutations can be accessed through public databases or the biomedical literature. However, retrieving information from these resources is problematic for two reasons: manually created databases are usually incomplete and out of date, and reading through the vast amount of publicly available biomedical documents is very time-consuming. In this paper, we describe MuGeX (Mutation Gene eXtractor), a system that automatically extracts mutation–gene pairs from Medline abstracts for a disease query. The system is tested on a corpus of 231 Medline abstracts. For mutation detection alone, recall is 85.9% and precision is 95.9%. For extraction of mutation–gene pairs, we focus on Alzheimer's disease; recall for mutation–gene pair identification is estimated at 91.3% and precision at 88.9%. With automatic extraction techniques, MuGeX overcomes the problems of information retrieval from public resources and reduces the time required to access relevant information, while preserving the accuracy of the retrieved information.
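The abstract above does not say how MuGeX detects mutations, so the following is only a minimal sketch of the general idea, not the MuGeX method. It assumes protein point mutations are written either as `V717I` or as `Ala692Gly`, and it shows how precision and recall, the two figures quoted in the abstract, are computed against a gold standard. The function names, the pattern, and the example sentence are illustrative assumptions.

```python
import re

# One-letter and three-letter codes for the 20 standard amino acids.
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
       "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Assumed notation: wild-type residue, position, mutant residue.
MUTATION_RE = re.compile(
    rf"\b(?:[{AA1}]\d+[{AA1}]|(?:{AA3})\d+(?:{AA3}))\b"
)

def find_mutations(text):
    """Return the set of mutation mentions matched by the pattern."""
    return set(MUTATION_RE.findall(text))

def precision_recall(predicted, gold):
    """Precision = TP / predicted, recall = TP / gold (0.0 if undefined)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical sentence and gold annotations, for illustration only.
sentence = "The V717I and Ala692Gly mutations in APP, and M146L in PSEN1."
found = find_mutations(sentence)
gold = {"V717I", "M146L", "E693Q"}
precision, recall = precision_recall(found, gold)
```

A pattern this simple misses many real notations (nucleotide changes, deletions, free-text descriptions), and pairing each mutation with its gene is a separate step that is not sketched here.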
We have analyzed codon usage in poly-Q stretches of different lengths in the human proteome. First, we found that all long poly-Q stretches in the Protein Data Bank (PDB) belong to disordered regions. Second, we found a codon usage bias for glutamine homo-repeats in the human proteome: when a single codon is used throughout a poly-Q stretch, it is always the CAG triplet. Similar results are obtained for human proteins with glutamine homo-repeats associated with diseases. Moreover, the fraction of proteins in which a single codon is used for glutamine homo-repeats is lower for disease-associated proteins from the HraDis database (22%) than for proteins from the human proteome (26%). We have also shown that, for poly-Q stretches in the human proteome, splicing sites correspond to the homo-repeats in some cases (28), and in 11 cases these sites appear at the C-terminal part of the homo-repeats, with a statistical significance of 10⁻⁸.
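The kind of codon count described in the abstract above can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: it assumes an in-frame coding sequence, treats `CAA` and `CAG` as the two glutamine codons of the standard genetic code, and uses a hypothetical minimum run length of 4 residues (`min_len`) to define a poly-Q stretch.

```python
from collections import Counter

GLN_CODONS = {"CAA", "CAG"}  # glutamine codons in the standard genetic code

def polyq_codon_usage(cds, min_len=4):
    """Find glutamine runs in an in-frame coding sequence.

    Returns a list of (start_codon_index, run_length, codon_counts).
    min_len is an assumed threshold, not a value from the abstract.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    runs = []
    start = None
    # The "END" sentinel closes a run that reaches the end of the sequence.
    for i, codon in enumerate(codons + ["END"]):
        if codon in GLN_CODONS:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start, Counter(codons[start:i])))
            start = None
    return runs

def uses_single_codon(codon_counts):
    """True when every glutamine in the run is encoded by the same codon."""
    return len(codon_counts) == 1

# Toy sequence: a pure CAG run of 5, then a mixed CAG/CAA run of 4.
toy_cds = "ATG" + "CAG" * 5 + "GCT" + "CAGCAACAGCAA" + "TAA"
toy_runs = polyq_codon_usage(toy_cds)
```

Applying `uses_single_codon` to each run and tallying the results per protein would give the kind of fraction the abstract reports, under the assumptions stated above.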
MicroRNAs (miRNAs) are non-coding RNA molecules involved in the formation and progression of many different diseases. Various studies have reported that miRNAs play a major role in the prevention, diagnosis, and treatment of complex human diseases. In recent years, researchers have made a tremendous effort to find potential relationships between miRNAs and diseases. Since the experimental techniques used to find new miRNA-disease relationships are time-consuming and expensive, many computational techniques have been developed. In this study, Weighted K-Nearest Known Neighbors and Network Consistency Projection techniques are proposed to predict new miRNA-disease relationships using various types of knowledge such as known miRNA-disease relationships, miRNA functional similarity, and disease semantic similarity. Our method achieved average AUCs of 0.9037 and 0.9168 under 5-fold and leave-one-out cross-validation, respectively. Case studies of breast, lung, and colon neoplasms were carried out to evaluate the proposed technique, and the results confirmed the predictive reliability of the method. These experimental results show that the proposed method can be used as a reliable computational model to reveal potential relationships between miRNAs and diseases.
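The abstract above names Network Consistency Projection without giving its formulas. The sketch below follows one commonly used formulation, in which a miRNA's similarity profile is projected onto a disease's known-association profile (and vice versa) and the two projections are combined and normalized. The exact normalization, the function names, and the toy matrices are assumptions; the Weighted K-Nearest Known Neighbors preprocessing step is not shown.

```python
import math

def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column(matrix, j):
    return [row[j] for row in matrix]

def ncp_scores(assoc, mirna_sim, disease_sim):
    """Score every miRNA-disease pair.

    assoc:       n_mirna x n_disease matrix of known associations (0/1)
    mirna_sim:   n_mirna x n_mirna similarity matrix
    disease_sim: n_disease x n_disease similarity matrix
    """
    n_mirna, n_disease = len(assoc), len(assoc[0])
    scores = [[0.0] * n_disease for _ in range(n_mirna)]
    for i in range(n_mirna):
        for j in range(n_disease):
            assoc_col = column(assoc, j)      # miRNAs known for disease j
            assoc_row = assoc[i]              # diseases known for miRNA i
            dis_col = column(disease_sim, j)  # similarity of disease j to all
            # Projection in miRNA space: similarity row onto association column.
            proj_m = (dot(mirna_sim[i], assoc_col) / norm(assoc_col)
                      if norm(assoc_col) else 0.0)
            # Projection in disease space: association row onto similarity column.
            proj_d = (dot(assoc_row, dis_col) / norm(assoc_row)
                      if norm(assoc_row) else 0.0)
            denom = norm(mirna_sim[i]) + norm(dis_col)
            scores[i][j] = (proj_m + proj_d) / denom if denom else 0.0
    return scores

# Toy data (hypothetical): 3 miRNAs, 2 diseases.
toy_assoc = [[1, 1], [1, 0], [0, 1]]
toy_mirna_sim = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
toy_disease_sim = [[1.0, 0.3], [0.3, 1.0]]
toy_scores = ncp_scores(toy_assoc, toy_mirna_sim, toy_disease_sim)
```

In the toy data, miRNA 1 is highly similar to miRNA 0, which is already linked to disease 1, so the unknown pair (1, 1) scores higher than the unknown pair (2, 0). Ranking unknown pairs by such scores is what a cross-validated AUC would then evaluate.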
Circular RNAs (circRNAs) are endogenous non-coding RNAs with a covalently closed loop structure. They have many biological functions, mainly regulatory ones, and have been shown to modulate protein-coding genes in the human genome. CircRNAs are linked to various diseases such as Alzheimer’s disease, diabetes, atherosclerosis, Parkinson’s disease, and cancer. Identifying the associations between circRNAs and diseases is essential for disease diagnosis, prevention, and treatment. The proposed model, variational autoencoder and genetic algorithm based circular RNA disease association (VAGA-CDA), predicts novel circRNA-disease associations. First, the experimentally verified circRNA-disease associations are augmented with the synthetic minority oversampling technique (SMOTE) and regenerated using a variational autoencoder, and feature selection is applied to these vectors by a genetic algorithm (GA). The variational autoencoder effectively extracts features from the augmented samples, and the optimized feature selection of the genetic algorithm effectively carries out dimensionality reduction. The extracted feature vectors are then given to a Random Forest classifier to predict new circRNA-disease associations. The proposed model yields AUC values of 0.9644 and 0.9628 under 5-fold and 10-fold cross-validation, respectively. The results of the case studies indicate the robustness of the proposed model.
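Of the steps listed in the abstract above, the SMOTE augmentation is the easiest to illustrate in isolation. The sketch below shows the core idea only: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. The function name, the parameters `k` and `seed`, and the toy vectors are assumptions, and the variational autoencoder, genetic algorithm, and Random Forest stages are not shown.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate n_synthetic samples by interpolating between minority samples.

    minority: list of feature vectors (needs at least two samples)
    k:        number of nearest neighbours considered (assumed default)
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            [b + gap * (n - b) for b, n in zip(base, neighbour)]
        )
    return synthetic

# Toy minority class (hypothetical feature vectors).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote(toy_minority, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the region spanned by the minority class, which is why the technique is used to balance a small set of verified associations against many unlabeled pairs.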