Using 16S rRNA gene as marker to detect unknown bacteria in microbial communities


Background: Quantification and identification of microbial genomes based on next-generation sequencing data is a challenging problem in metagenomics. Although current methods have mostly focused on analyzing bacteria whose genomes have been sequenced, such analyses are, however, complicated by the presence of unknown bacteria or bacteria whose genomes have not been sequence. Results: We propose a method for detecting unknown bacteria in environmental samples. Our approach is unique in its utilization of short reads only from 16S rRNA genes, not from entire genomes. We show that short reads from 16S rRNA genes retain sufficient information for detecting unknown bacteria in oral microbial communities. Conclusion: In our experimentation with bacterial genomes from the Human Oral Microbiome Database, we found that this method made accurate and robust predictions at different read coverages and percentages of unknown bacteria. Advantages of this approach include not only a reduction in experimental and computational costs but also a potentially high accuracy across environmental samples due to the strong conservation of the 16S rRNA gene.

Publication Title

BMC Bioinformatics