Faculty Publications

Understand effective coverage by mapped reads using genome repeat complexity

Shanshan Gao, Google LLC
Quang Tran, Roche Sequencing Solutions
Vinhthuy Phan, University of Memphis

Abstract

Sequencing depth, which refers to the expected coverage of nucleotides by reads, is computed on the assumption that reads are synthesized uniformly across chromosomes. In reality, read coverage across genomes is not uniform. Although a coverage of 10x, for example, means that a nucleotide is covered 10 times on average, nucleotides in certain parts of a genome are covered much more or much less often. One factor that influences coverage is the ability of a read aligner to align reads to genomes. If a part of a genome is complex, e.g. it contains many repeats, aligners may have trouble aligning reads to that region, resulting in low coverage. We introduce a systematic approach to predict the effective coverage of genomes by short-read aligners. The effective coverage of a chromosome is defined as the actual number of bases covered by reads. We show that this quantity is highly correlated with the repeat complexity of genomes. Specifically, we show that the more repeats a genome has, the less it is covered by short reads. We demonstrate this strong correlation with five popular short-read aligners in three species: Homo sapiens, Zea mays, and Glycine max. Additionally, we show that, compared to other measures of sequence complexity, repeat complexity is the most appropriate. This work makes it possible to predict the effective coverage of genomes at a given sequencing depth.
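To make the distinction between sequencing depth and effective coverage concrete, the following is a minimal sketch and not the authors' method. It assumes the standard depth formula (number of reads times read length, divided by genome length) and reads "bases covered by reads" as the number of positions covered by at least one aligned read. The function names and the toy alignment intervals are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def sequencing_depth(num_reads: int, read_length: int, genome_length: int) -> float:
    """Expected (nominal) coverage, assuming reads are spread uniformly."""
    return num_reads * read_length / genome_length


def covered_bases(alignments: List[Tuple[int, int]], genome_length: int) -> int:
    """Count positions covered by at least one aligned read.

    Each alignment is a half-open interval [start, end) on the reference.
    This is one possible reading of "bases covered by reads" (an assumption).
    """
    covered = 0
    current_end = 0  # rightmost position already counted
    for start, end in sorted(alignments):
        start = max(start, current_end, 0)  # skip the part already counted
        end = min(end, genome_length)       # clip to the reference length
        if end > start:
            covered += end - start
            current_end = end
    return covered


# Toy example: three reads of length 5 on a 30-base reference.
# Two reads overlap, and positions 8..19 and 25..29 receive no reads,
# as might happen if an aligner fails in a repeat-rich region.
alignments = [(0, 5), (3, 8), (20, 25)]
depth = sequencing_depth(num_reads=3, read_length=5, genome_length=30)
covered = covered_bases(alignments, genome_length=30)
print(f"nominal depth: {depth:.2f}x, bases covered: {covered} of 30")
```

In this toy case the nominal depth says every base is covered 0.5 times on average, while only 13 of the 30 bases are actually covered, which is the kind of gap the abstract attributes to regions that aligners handle poorly.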

Publication Title

Proceedings of 11th International Conference on Bioinformatics and Computational Biology, BiCOB 2019

Recommended Citation

Gao, S., Tran, Q., & Phan, V. (2019). Understand effective coverage by mapped reads using genome repeat complexity. Proceedings of 11th International Conference on Bioinformatics and Computational Biology, BiCOB 2019, 65-73. Retrieved from https://digitalcommons.memphis.edu/facpubs/3302

This document is currently not available here.
