Alignment of short reads to multiple genomes using hashing


Background

Recent advances in biotechnology have enabled high-throughput sequencing of genomes based on large numbers of short reads. Current methods [1,2], however, mostly align reads to only one reference genome at a time, which makes it difficult to differentiate sequencing errors from single-nucleotide variants (SNVs).

Materials and methods

Inspired by [3], we propose a new method that uses multiple genomes and SNV information to align reads. This approach is promising because a mismatch against one genome can be checked against the known variation in the others, which helps distinguish sequencing errors from SNVs. The proposed alignment algorithm uses read fragments to identify seeds and then extends these seeds to find occurrences of reads in the genome. In this study, we developed and implemented an algorithm that captures genomic variation across multiple genomes, indexes those genomes, and performs short-read alignment against the whole collection. Preliminary results were validated on Aspergillus fumigatus.
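The seed-and-extend idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the index layout, the seed length, or how seeds are extended, so the k-mer hash table, the ungapped mismatch-counting extension, and all names here (`build_index`, `align`, `max_mismatches`, the toy strain sequences) are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(genomes, k):
    """Hash every k-mer to the (genome name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def align(read, genomes, index, k, max_mismatches=1):
    """Seed with non-overlapping read fragments, then extend without gaps.

    Returns sorted (genome name, start, mismatches) tuples for every
    placement of the read with at most max_mismatches mismatches.
    """
    seen = set()   # candidate placements already evaluated
    hits = []
    for offset in range(0, len(read) - k + 1, k):
        fragment = read[offset:offset + k]
        for name, pos in index.get(fragment, []):
            start = pos - offset          # implied start of the whole read
            seq = genomes[name]
            if start < 0 or start + len(read) > len(seq):
                continue                  # read would run off the genome
            if (name, start) in seen:
                continue
            seen.add((name, start))
            window = seq[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append((name, start, mismatches))
    return sorted(hits)


# Toy collection: two hypothetical strains differing by one SNV at offset 12.
genomes = {
    "strain_A": "ACGTACGTTAGCCGATAGGCTA",
    "strain_B": "ACGTACGTTAGCTGATAGGCTA",
}
index = build_index(genomes, k=4)
hits = align("TTAGCTGATAGG", genomes, index, k=4)
```

In this toy case the read matches `strain_B` exactly and `strain_A` with one mismatch at the variant position. Against `strain_A` alone, that mismatch could not be told apart from a sequencing error. Seeing an exact match in another genome of the collection is the kind of evidence that suggests an SNV instead.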

Publication Title

BMC Bioinformatics