Doctor of Philosophy
We propose a new method, called Trinucleotide Usage Profile (TUP), to build a genome-wide phylogenetic tree for a large group of species. The main idea is to summarize a DNA sequence as a matrix with three rows, one per reading frame, where each row is the distribution over the non-overlapping words of length 3 in the corresponding reading frame. An empirical study shows that the TUP method can build phylogenetic trees with strong biological support. We also consider the problem of gene classification: assigning a given DNA sequence of a gene to one of several possible categories. We study the efficacy and efficiency of several DNA feature extraction functions. In particular, we evaluate two newly proposed feature extraction functions, Translational Stop Signal Ratio (TSSR) and Double-strand Translational Stop Signal Ratio (DTSSR), and a well-established feature extraction function, 3-mer, on simulated metagenomic data. Our study shows that TSSR and DTSSR achieve classification accuracies comparable to those of 3-mer in much less time.
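The TUP matrix described in the abstract can be illustrated with a minimal Python sketch. This is one reading of the abstract, not the dissertation's implementation: the function name `trinucleotide_usage_profile`, the lexicographic ordering of the 64 trinucleotides, the choice to skip words containing characters other than A, C, G, T, and the normalization of each row to relative frequencies are all assumptions made here for illustration. How profiles are compared to build a tree is not stated in the abstract and is not shown.

```python
from itertools import product

# All 64 trinucleotides over A, C, G, T in a fixed lexicographic order
# (the ordering is an assumption; any fixed order would do).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def trinucleotide_usage_profile(seq):
    """Return a 3 x 64 list of lists: one row per reading frame.

    Row f holds the relative frequencies of the non-overlapping words of
    length 3 read from offset f. Words containing any character other
    than A, C, G, T are skipped (an assumption for this sketch).
    """
    seq = seq.upper()
    profile = []
    for frame in range(3):
        counts = [0] * len(CODONS)
        # Step by 3 so that the words within a frame do not overlap.
        for start in range(frame, len(seq) - 2, 3):
            idx = CODON_INDEX.get(seq[start:start + 3])
            if idx is not None:
                counts[idx] += 1
        total = sum(counts)
        # A frame with no valid words is left as an all-zero row.
        profile.append([c / total if total else 0.0 for c in counts])
    return profile
```

For the toy sequence `"ATGATGATGA"`, frame 0 reads ATG three times, frame 1 reads TGA three times, and frame 2 reads GAT twice, so each row puts all of its mass on a single trinucleotide.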
Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertation (ETD) Repository.
Chen, Si, "DNA Sequence Analysis for Applications to Phylogenetic Tree Construction and Simulated Metagenomic Binning" (2014). Electronic Theses and Dissertations. 916.