Faculty Publications

Repeat complexity of genomes as a means to predict the performance of short-read aligners

Quang Tran, University of Memphis
Shanshan Gao, University of Memphis
Nam S. Vo, University of Memphis
Vinhthuy Phan, University of Memphis

Abstract

We investigated the extent to which the complexity of genomic sequences affects the performance of short-read aligners. We demonstrated that a proper measure of sequence complexity was essential in studying the relationship between alignment performance and the abundance of repeats in genomes. In particular, we demonstrated that popular measures of sequence complexity were not suitable and that the right measure of repeat complexity correlated strongly with the performance of many popular short-read aligners. Using genomic sequences from a diverse set of species, we observed that as repeat complexity increased, the performance of these aligners decreased proportionally. This strong negative correlation was observed in all three important aspects of alignment performance: (i) precision, (ii) accuracy and (iii) chromosomal coverage by mapped reads. We took advantage of this strong correlation to construct linear regression models that could accurately predict alignment performance from repeat complexity, without having to align millions of reads to genomes. This finding suggests a novel approach to selecting aligners for new genomes and has great potential for reducing experimental cost.

Publication Title

Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016

Recommended Citation

Tran, Q., Gao, S., Vo, N., & Phan, V. (2016). Repeat complexity of genomes as a means to predict the performance of short-read aligners. Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016, 135-141. Retrieved from https://digitalcommons.memphis.edu/facpubs/3144

This document is currently not available here.
