How genome complexity can explain the hardness of aligning reads to genomes
Abstract
Although it is known that aligning short reads to a reference genome becomes harder when the genome contains complex repeat structures, there has been little effort to quantify this intuition. We investigated several measures of complexity, used 10 popular short-read aligners to align reads to a large number of diverse genomes, and found that, unlike existing notions of complexity, the proposed length-sensitive measures correlated highly with the hardness of short-read alignment. This result enables fast estimation of alignment hardness without aligning millions of reads to unknown genomes.
Publication Title
2014 IEEE 4th International Conference on Computational Advances in Bio and Medical Sciences, ICCABS 2014
Recommended Citation
Phan, V., Gao, S., Tran, Q., & Vo, N. (2014). How genome complexity can explain the hardness of aligning reads to genomes. 2014 IEEE 4th International Conference on Computational Advances in Bio and Medical Sciences, ICCABS 2014. https://doi.org/10.1109/ICCABS.2014.6863916