iDNA6mA-Rice-DL: A local web server for identifying DNA N6-methyladenine sites in rice genome by deep learning method
Abstract
Accurate detection of N6-methyladenine (6mA) sites by biochemical experiments helps to reveal their biological functions, but these wet-lab experiments are laborious and expensive. A powerful computational model is therefore needed to identify 6mA sites on a genomic scale, especially in plant genomes. To this end, we propose iDNA6mA-Rice-DL, a deep learning model for the effective identification of 6mA sites in the rice genome. Traditional machine learning methods require features to be prepared by hand before analysis. In contrast, our model automatically encodes and extracts key DNA features through an embedding layer and several groups of dense layers. We evaluated the generalization ability of the model on an independent dataset, obtaining an area under the receiver operating characteristic curve (auROC) of 0.98 and an accuracy of 95.96%. These results demonstrate that the model performs well in predicting 6mA sites in the rice genome. A user-friendly local web server has been established, and its Docker image can be freely downloaded at https://hub.docker.com/r/his1server/idna6ma-rice-dl.
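To make the idea of learned sequence encoding concrete, the following is a minimal sketch, not the published architecture. It maps each nucleotide to an integer index, looks up a vector for it in an embedding table, flattens the result, and passes it through dense layers to a sigmoid output. The class name TinySixMaNet, the layer sizes, the 41-nucleotide window, and the untrained random weights are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random

# Hypothetical vocabulary: one integer index per nucleotide.
NUCLEOTIDE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_sequence(seq):
    """Map a DNA string to a list of integer indices."""
    return [NUCLEOTIDE_INDEX[base] for base in seq.upper()]


def random_matrix(rows, cols, rng):
    """Create a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W @ inputs + b)."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(total))
    return outputs


class TinySixMaNet:
    """Illustrative embedding + dense network with untrained weights."""

    def __init__(self, seq_len, embed_dim=4, hidden=(16, 8), seed=0):
        rng = random.Random(seed)
        self.seq_len = seq_len
        # Embedding table: one embed_dim vector per nucleotide.
        self.embedding = random_matrix(len(NUCLEOTIDE_INDEX), embed_dim, rng)
        # Layer sizes: flattened embeddings -> hidden layers -> 1 output.
        sizes = [seq_len * embed_dim, *hidden, 1]
        self.layers = [
            (random_matrix(n_out, n_in, rng), [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])
        ]

    def predict_proba(self, seq):
        """Return a score in (0, 1) for a sequence of length seq_len."""
        indices = encode_sequence(seq)
        if len(indices) != self.seq_len:
            raise ValueError(f"expected {self.seq_len} nucleotides, got {len(indices)}")
        # Embedding lookup, then flatten into a single feature vector.
        x = [value for i in indices for value in self.embedding[i]]
        # Hidden dense layers use ReLU; the final layer uses a sigmoid.
        for weights, biases in self.layers[:-1]:
            x = dense(x, weights, biases, relu)
        weights, biases = self.layers[-1]
        return dense(x, weights, biases, sigmoid)[0]


if __name__ == "__main__":
    model = TinySixMaNet(seq_len=41)  # window length is an assumption
    window = "ACGT" * 10 + "A"
    print(f"untrained score: {model.predict_proba(window):.3f}")
```

Because the weights are random, the printed score carries no biological meaning; in a real model the embedding table and dense weights would be fitted to labeled 6mA and non-6mA sequences.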