doc.ic.ac.uk/teaching/distinguished-projects/2015/t.kaplan.pdf

Application of GPU, FPGA and software optimisations to a bioinformatics pipeline used in cancer and non-invasive prenatal diagnosis (undergraduate thesis).

Abstract:

DNA methylation is an epigenetic process that is key to numerous cellular phenomena, including embryonic development and disease. With next-generation bisulfite sequencing, it is now possible to conduct whole-genome methylation analysis at single-base resolution. This has recently been used to develop novel methods for cancer and non-invasive prenatal diagnosis. The challenge faced in this research is that the throughput of next-generation bisulfite sequencing machines has been improving at a faster rate than Moore’s law, which poses a significant computational and storage challenge for genomic analysis tools. In this report we optimise key bottlenecks of Methy-Pipe, an integrated bioinformatics pipeline developed for bisulfite sequencing alignment and methylation analysis. The contributions of our work include: FPGA and GPU optimisation of the bisulfite sequencing alignment module based on a novel oversampling method, a high-throughput referential compression algorithm using hardware-accelerated sequence alignment, software optimisation of the methylation calling module, and translation of a downstream analysis script. Results indicate that the runtime of Methy-Pipe’s bottlenecks is reduced from 5.5 hours to 22 minutes, a 15-fold speedup, which could allow potentially life-saving diagnosis techniques to become routine in healthcare applications.