New Genomic Information Systems (GenISs): Species Delimitation and IDentification


Genomic Information Systems (GenISs) have recently been proposed to provide a universal framework for feature extraction, dimensionality reduction and more effective processing of genomic data. They are based on methodologies more firmly anchored in biochemical reality, and they exploit a newly discovered structure of DNA spaces to extract and represent genomic data in compact data structures rich enough to answer critical questions about the original organisms, including phylogenies, species identification and, more recently, phenotypic information. They work from DNA sequences alone (possibly including full genomes), in a matter of minutes or hours, and produce answers consistent with well-established and accepted biological knowledge. Here, we introduce a second family of GenISs based on further structural properties of DNA spaces and demonstrate that they could also be used to provide principled, general and intuitive solutions to fundamental questions in biology, such as "What exactly is a biological species?" Current answers to such all-important questions have remained dependent on specific taxa and subject to analyst choices. We further discuss other applications to be explored in the future, including universal biological taxonomies in the quest for a truly universal and comprehensive "Atlas of Life", as it is or as it could be on Earth.

Published in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)