Profiling Environmental Conditions from DNA


DNA is essential for organisms to carry out basic functions, as it encodes the information underlying their metabolome and proteome, among others. In particular, it is now common to use DNA to profile living organisms based on their phenotypic traits. These traits are the outcome of genetic makeup shaped over time by the interaction between organisms and their surrounding environment. Environmental conditions themselves, however, are conventionally assumed to be too random and ephemeral to be encoded in an organism’s DNA. Here, we demonstrate that, to the contrary, genomic DNA may encode enough information about some environmental features of an organism’s habitat for a machine learning model to reveal them. There appear to be exceptions: some environmental features do not seem to be encoded in DNA, unless our methods miss that information. Nevertheless, we demonstrate that these features can be used to train better models for predicting other environmental factors. These results lead directly to the question of whether, over evolutionary history, DNA itself is also a repository of information about the environment in which a lineage has developed, perhaps encoded even more cryptically than phenotypic information.

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)