Electronic Theses and Dissertations
Identifier
1081
Date
2014
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Major
Mathematical Sciences
Concentration
Applied Statistics
Committee Chair
Lih-Yuan Deng
Committee Member
Tit-Yee Wong
Committee Member
Ebenezer George
Committee Member
Ramin Homayouni
Abstract
We propose a new method, called Trinucleotide Usage Profile (TUP), to build a genome-wide phylogenetic tree for a large group of species. The main idea is to summarize a DNA sequence as a matrix with three rows, one per reading frame, in which each row is the distribution of the (non-overlapping) words of length 3 in the corresponding reading frame. An empirical study shows that the proposed TUP method can build phylogenetic trees with strong biological support. We also consider the problem of classifying a given gene's DNA sequence into one of several possible categories. We study the efficacy and efficiency of several DNA feature extraction functions. In particular, we evaluate two newly proposed feature extraction functions, Translational Stop Signal Ratio (TSSR) and Double-strand Translational Stop Signal Ratio (DTSSR), and a well-established feature extraction function, 3-mer, on simulated metagenomic data. Our study shows that TSSR and DTSSR can achieve high classification accuracies, comparable to those of 3-mer, in much less time.
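The TUP summary described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The function name `tup_matrix`, the alphabetical ordering of the 64 words, the use of a single strand, the skipping of words that contain ambiguous bases, and the normalization of each row to relative frequencies are all assumptions made for illustration.

```python
from itertools import product

# All 64 trinucleotides in alphabetical order (AAA ... TTT); the ordering is an assumption.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {word: i for i, word in enumerate(CODONS)}


def tup_matrix(seq):
    """Return a 3 x 64 matrix (list of lists) of trinucleotide frequencies.

    Row k holds the relative frequencies of the non-overlapping words of
    length 3 read from offset k (k = 0, 1, 2) of the given strand.
    """
    seq = seq.upper()
    matrix = []
    for frame in range(3):
        counts = [0] * 64
        # Step by 3 so that the words within a frame do not overlap.
        for start in range(frame, len(seq) - 2, 3):
            idx = CODON_INDEX.get(seq[start:start + 3])
            if idx is not None:  # assumption: skip words with ambiguous bases such as N
                counts[idx] += 1
        total = sum(counts)
        # Assumption: normalize counts so each non-empty row sums to 1.
        row = [c / total for c in counts] if total else [0.0] * 64
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    profile = tup_matrix("ATGATGATG")
    for frame, row in enumerate(profile):
        used = {CODONS[i]: f for i, f in enumerate(row) if f > 0}
        print(frame, used)
```

Profiles of this shape could then be compared between genomes with a distance measure of the reader's choice to produce the input for tree construction; the abstract does not state which distance or tree-building procedure the dissertation uses.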
Library Comment
Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertation (ETD) Repository.
Recommended Citation
Chen, Si, "DNA Sequence Analysis for Applications to Phylogenetic Tree Construction and Simulated Metagenomic Binning" (2014). Electronic Theses and Dissertations. 916.
https://digitalcommons.memphis.edu/etd/916
Comments
Data is provided by the student.