Electronic Theses and Dissertations

Date

2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Computer Science

Committee Chair

Max Garzon

Committee Member

Russell Deaton

Committee Member

Omar Skalli

Committee Member

Deepak Venugopal

Abstract

Our ability to generate data has far outpaced our ability to analyze it and transform it into useful information. A major tool for addressing this problem is the extraction or selection of informative features in the data. When the data is structured, dimensionality reduction and analysis can be much easier. However, structured data (beyond superficial formatting and cleansing) is rare. It is hopeless to expect for images and even text, and particularly for DNA sequences. Deep networks resolve these issues to some extent, but questions about the explainability and timeliness of their results remain. Recent advances in Genomic Information Systems (GenISs) have shown that the exquisite discriminating ability of DNA hybridization (double helix formation) can be leveraged for smarter data processing. Knowledge of the hybridization property of DNA (discovered by Watson and Crick in the 1950s) has enabled us to uncover Euclidean embeddings of DNA spaces. These embeddings have interesting structural properties (such as centroids and centers of mass) analogous to those of the planets in our solar system (such as Earth and Saturn). In this work, we develop a family of GenISs based on novel coordinate systems (genomic and pmeric) for DNA sequences of arbitrary length. These coordinates are obtained from deep structural properties of the sequences' hybridization patterns and can be leveraged to facilitate analytics of both biotic and abiotic data. We also assess the quality of these GenISs with a number of applications in biology and in computer science at large. The assessment illustrates how DNA is capable of self-organizing unstructured data into semantic clusters meaningful to humans, in addition to supporting complex life processes relevant to phenomics, metabolomics, species identification, pathogenicity, and so on.
Furthermore, these results suggest that they are only the tip of the iceberg regarding DNA's capacity not only to encode but also to process information. This capacity can be leveraged as a powerful tool in this era of big data and data science.

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest
