Next-generation sequencing has enabled rapid collection and quantification of ‘big’ biological data. In particular, multi-omics approaches that integrate different molecular data, such as miRNA and mRNA, can provide important insights into disease classification and processes. There is a need for computational methods that can correctly model and interpret these relationships and handle the difficulties of large-scale data. In this study, we develop a novel method of representing miRNA-mRNA interactions to classify cancer. Specifically, graphs are designed to account for the interactions and biological communication between miRNAs and mRNAs, using message-passing and attention mechanisms. Patient-matched miRNA and mRNA expression data are obtained from The Cancer Genome Atlas for 12 cancers, and targeting information is incorporated from TargetScan. A Graph Transformer Network (GTN) is selected to provide high interpretability of classification through self-attention mechanisms. The GTN classifies the 12 different cancers with an accuracy of 93.56% and is compared to a Graph Convolutional Network, Random Forest, Support Vector Machine, and Multilayer Perceptron. While the GTN does not outperform all of the other classifiers in terms of accuracy, its results are highly interpretable. Multi-omics models are compared with their single-omics counterparts and generally outperform them. Extensive analysis of attention identifies important targeting pathways and molecular biomarkers based on integrated miRNA and mRNA expression.
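To make the idea of attention over miRNA-mRNA targeting edges concrete, the following is a minimal sketch of one round of attention-weighted message passing on a toy graph. It is not the GTN described in the abstract: it omits learned projection matrices, multiple heads, and training, and the function name `attention_message_pass`, the node names, and the feature values are all hypothetical and chosen only for illustration.

```python
import math


def attention_message_pass(features, edges):
    """One round of scaled dot-product attention over incoming edges.

    features: dict mapping node name -> list of floats (toy expression features)
    edges:    list of (source, target) pairs, e.g. (miRNA, targeted mRNA)
    Returns (updated_features, attention), where attention maps each edge
    to its softmax weight among the edges entering the same target.
    Simplification (assumption): raw features act as queries, keys and values.
    """
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, []).append(src)

    updated = dict(features)  # nodes without incoming edges keep their features
    attention = {}
    for dst, sources in incoming.items():
        dim = len(features[dst])
        # Score each source against the target (dot product, scaled by sqrt(dim)).
        scores = [
            sum(q * k for q, k in zip(features[dst], features[src])) / math.sqrt(dim)
            for src in sources
        ]
        # Numerically stable softmax over the incoming edges.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The target's new features are the attention-weighted mean of its sources.
        updated[dst] = [
            sum(w * features[src][k] for w, src in zip(weights, sources))
            for k in range(dim)
        ]
        for src, w in zip(sources, weights):
            attention[(src, dst)] = w
    return updated, attention


# Toy example with made-up values: two hypothetical miRNAs targeting one mRNA.
toy_features = {"miR_a": [1.0, 0.0], "miR_b": [0.0, 1.0], "gene_x": [1.0, 0.0]}
toy_edges = [("miR_a", "gene_x"), ("miR_b", "gene_x")]
toy_updated, toy_attention = attention_message_pass(toy_features, toy_edges)
```

In a sketch like this, the per-edge weights in `toy_attention` are what one would inspect to see which targeting edges contribute most to a node's updated representation, which is the sense in which attention supports interpretation.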
Duchenne and Becker muscular dystrophies are allelic X-linked disorders resulting from defects in the gene coding for the muscle protein dystrophin. The dystrophin gene is more than 2300 kb in size and consists of 79 exons. This large size and complexity present a challenge to the direct identification of point mutations and small deletions that cannot be identified by multiplex deletion testing or Southern blotting. One approach to this problem is to analyse the expression of ectopic dystrophin mRNA transcripts. Although the dystrophin gene transcript is distributed only over approximately 0.1% of the genome, analysis of such ectopic lymphocyte dystrophin transcripts can shed light on the pathogenic events at the transcriptional level.
Calculated folding free energies of mRNA sequences from 33 human genes were each compared to ten step-wise partially shuffled versions of the same sequences, with the dinucleotide, or "neighbor", composition preserved. The first ten base-swap steps of the shuffling process were dissected by examining the folding free energy after each base swap. The whole procedure was repeated a hundred times for each gene. Most of the genes showed a tendency for the average free energy to decline dramatically, suggesting that these natural mRNAs have been subject to selection to maximize the folding free energy. For most of the genes, the direction of average change over these first ten steps was accurately predicted by the mRNA folding Z-scores calculated for thoroughly shuffled sequences. Estimates of the number of swap steps required to match the average free energy of fully shuffled sequences ranged from 12 to 69.2.
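The step-wise shuffling and Z-score described above can be illustrated with a short sketch. It rests on assumptions not stated in the abstract: a "base swap" is taken here to mean exchanging two non-adjacent interior bases that share the same left neighbor and the same right neighbor, which is one simple move that leaves every dinucleotide count unchanged. The function names and the example sequence are hypothetical, and the folding free energy calculation itself is not shown, since it would require an RNA folding program.

```python
import random
from collections import Counter
from statistics import mean, stdev


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides ("neighbor" composition) in a sequence."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def neighbor_preserving_swap(seq, rng):
    """Swap two different bases whose left and right neighbors match.

    Because both positions sit in the same flanking context, the multiset
    of dinucleotides is identical before and after the swap. If no valid
    pair exists, the sequence is returned unchanged.
    """
    n = len(seq)
    candidates = [
        (i, j)
        for i in range(1, n - 1)
        for j in range(i + 2, n - 1)  # j >= i + 2 keeps the two bases non-adjacent
        if seq[i] != seq[j]
        and seq[i - 1] == seq[j - 1]
        and seq[i + 1] == seq[j + 1]
    ]
    if not candidates:
        return seq
    i, j = rng.choice(candidates)
    bases = list(seq)
    bases[i], bases[j] = bases[j], bases[i]
    return "".join(bases)


def z_score(native_value, shuffled_values):
    """Standard score of the native value relative to the shuffled set."""
    return (native_value - mean(shuffled_values)) / stdev(shuffled_values)


# Hypothetical example: apply ten swap steps to a made-up RNA sequence.
example_rng = random.Random(0)
example_native = "AUGGCUACGUAGCUAGCUAGGCUAUCGAUCGAUGC"
example_steps = [example_native]
for _ in range(10):
    example_steps.append(neighbor_preserving_swap(example_steps[-1], example_rng))
```

With a folding energy function supplied by the reader, each entry of `example_steps` could be scored in turn to trace how the free energy drifts over the first ten swaps, and `z_score` could compare the native energy against energies of thoroughly shuffled sequences.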