DNA chips for species identification and biological phylogenies


The codeword design problem is an important problem in DNA computing and its applications. Several theoretical analyses, as well as practical solutions for short oligonucleotides (up to 20-mers), have been produced recently. These solutions have, in turn, suggested new applications to DNA-based indexing and natural language processing, in addition to the obvious applications to the problems of reliability and scalability that motivated them. Here we continue the exploration of this type of DNA-based indexing for biological applications. We show that DNA noncrosshybridizing (nxh) sets can be successfully applied to infer ab initio phylogenetic trees by providing a way to measure distances among different genomes indexed by sets of short oligonucleotides selected so as to minimize crosshybridization. These phylogenies are solidly established and well accepted in biology. The new technique has two advantages: it is much more effective than current methods in terms of signal-to-noise ratio, cost and time, and it can be scaled up to the newly available universal DNA chips, both in vitro and in silico. In particular, we show how one such recently obtained set of nxh 16-mers can be used as a universal coordinate system in DNA spaces to characterize very large groups (families, genera, and even phyla) of organisms on a uniform reference system, a veritable and comprehensive "Atlas of Life", as it is or as it could be on Earth. © 2009 Springer-Verlag.

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)