Electronic Theses and Dissertations Archive

Date

2026

Document Type

Thesis

Degree Name

Master of Science

Department

Computer Science

Committee Chair

Max Garzon

Committee Member

Deepak Venugopal

Committee Member

Max Garzon

Committee Member

Vinhthuy Phan

Abstract

Recent evidence suggests that DNA also encodes information that reflects long-term environmental conditions. This work examines whether an organism’s genome encodes specific environmental information and, if so, what its nature and origin are. Genes from three insect groups, Simuliidae (blackflies), Lepidoptera (butterflies), and Formicidae (ants), were analyzed to address the geographic provenance (latitude, longitude) problem. DNA sequences were converted into numerical signatures using noncrosshybridizing DNA bases, and these signatures were used to train machine learning (ML) models. Predictions achieved errors below the baseline, even for models trained on combined data without region-specific labels, indicating that DNA encodes this information and suggesting that the underlying mechanism is likely shared across insects. The granularity of these encodings varied: organisms with low mobility exhibited more precise genomic encodings, whereas highly migratory insects showed weaker and more variable encodings. Across all insects, nucleotide order was critical to the encodings, though where this information was encoded, whether distributed across the sequence or localized, differed among species.

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest/Clarivate.

Notes

Embargoed until 2026-10-02

Available for download on Friday, October 02, 2026
