Faculty Publications

How genome complexity can explain the difficulty of aligning reads to genomes

Vinhthuy Phan, University of Memphis
Shanshan Gao, University of Memphis
Quang Tran, University of Memphis
Nam S. Vo, University of Memphis

Abstract

Background: Although it is frequently observed that aligning short reads to genomes becomes harder if they contain complex repeat patterns, there has not been much effort to quantify the relationship between complexity of genomes and difficulty of short-read alignment. Existing measures of sequence complexity seem unsuitable for the understanding and quantification of this relationship.

Results: We investigated several measures of complexity and found that length-sensitive measures of complexity had the highest correlation to accuracy of alignment. In particular, the rate of distinct substrings of length k, where k is similar to the read length, correlated very highly to alignment performance in terms of precision and recall. We showed how to compute this measure efficiently in linear time, making it useful in practice to estimate quickly the difficulty of alignment for new genomes without having to align reads to them first. We showed how the length-sensitive measures could provide additional information for choosing aligners that would align consistently accurately on new genomes.

Conclusions: We formally established a connection between genome complexity and the accuracy of short-read aligners. The relationship between genome complexity and alignment accuracy provides additional useful information for selecting suitable aligners for new genomes. Further, this work suggests that the complexity of genomes sometimes should be thought of in terms of specific computational problems, such as the alignment of short reads to genomes.
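As an illustration of the length-sensitive measure named in the abstract, the rate of distinct substrings of length k can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `distinct_kmer_rate` is hypothetical, the rate is assumed to be normalized by the number of length-k windows (n - k + 1), and a hash set is used for simplicity. The abstract only states that the measure can be computed in linear time; it does not describe the algorithm, and the set-based approach below costs time and memory proportional to n times k.

```python
def distinct_kmer_rate(genome: str, k: int) -> float:
    """Return the number of distinct length-k substrings of `genome`
    divided by the number of length-k windows (n - k + 1).

    Assumption: this normalization is chosen for illustration; the
    abstract does not spell out the exact definition.
    """
    n = len(genome)
    if k <= 0 or k > n:
        raise ValueError("k must satisfy 1 <= k <= len(genome)")
    total_windows = n - k + 1
    # Collect every length-k window in a set; duplicates collapse.
    distinct = {genome[i:i + k] for i in range(total_windows)}
    return len(distinct) / total_windows


if __name__ == "__main__":
    # A highly repetitive toy sequence versus a less repetitive one.
    repetitive = "ACGT" * 25
    varied = "ACGTTGCAAGCTTAGGCTAACCGGTTAACGATCGATTGCCA"
    print(distinct_kmer_rate(repetitive, 8))
    print(distinct_kmer_rate(varied, 8))
```

A rate close to 1 means nearly every length-k window is unique, so reads of about that length should have few ambiguous placements. A rate close to 0 indicates heavy repetition at that length, which is the situation the abstract associates with harder alignment.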

Publication Title

BMC Bioinformatics

Recommended Citation

Phan, V., Gao, S., Tran, Q., & Vo, N. (2015). How genome complexity can explain the difficulty of aligning reads to genomes. BMC Bioinformatics, 16(17). https://doi.org/10.1186/1471-2105-16-S17-S3
