Please login to be able to save your searches and receive alerts for new content matching your search criteria.
In recent years, there have been many studies utilizing DNA methylome data to answer fundamental biological questions. Bisulfite sequencing (BS-seq) has enabled measurement of a genome-wide absolute level of DNA methylation at single-nucleotide resolution. However, due to the ambiguity introduced by bisulfite-treatment, the aligning process especially in large-scale epigenetic research is still considered a huge burden. We present Cloud-BS, an efficient BS-seq aligner designed for parallel execution on a distributed environment. Utilizing Apache Hadoop framework, Cloud-BS splits sequencing reads into multiple blocks and transfers them to distributed nodes. By designing each aligning procedure into separate map and reducing tasks while an internal key-value structure is optimized based on the MapReduce programming model, the algorithm significantly improves alignment performance without sacrificing mapping accuracy. In addition, Cloud-BS minimizes the innate burden of configuring a distributed environment by providing a pre-configured cloud image. Cloud-BS shows significantly improved bisulfite alignment performance compared to other existing BS-seq aligners. We believe our algorithm facilitates large-scale methylome data analysis. The algorithm is freely available at https://paryoja.github.io/Cloud-BS/.