Electronic Theses and Dissertations

Identifier

1081

Author

Si Chen

Date

2014

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Mathematical Sciences

Concentration

Applied Statistics

Committee Chair

Lih-Yuan Deng

Committee Member

Tit-Yee Wong

Committee Member

Ebenezer George

Committee Member

Ramin Homayouni

Abstract

We propose a new method, called Trinucleotide Usage Profile (TUP), to build a genome-wide phylogenetic tree for a large group of species. The main idea is to summarize a DNA sequence as a matrix with three rows, one per reading frame, where each row is the distribution of the (non-overlapping) words of length 3 in the corresponding reading frame. An empirical study shows that the proposed TUP method can build phylogenetic trees with strong biological support. We also consider the problem of classifying a gene, given its DNA sequence, into one of several possible categories. We study the efficacy and efficiency of several DNA feature extraction functions. In particular, we evaluate two newly proposed feature extraction functions, Translational Stop Signal Ratio (TSSR) and Double-strand Translational Stop Signal Ratio (DTSSR), and a well-established feature extraction function, 3-mer, on simulated metagenomic data. Our study shows that TSSR and DTSSR can achieve classification accuracies comparable to those of 3-mer in much less time.
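
The TUP matrix described in the abstract can be illustrated with a minimal sketch. It is based only on the abstract's description: three rows, one per reading frame, each holding the relative frequencies of the 64 non-overlapping 3-letter words read in that frame. The function name `trinucleotide_usage_profile`, the alphabetical column order, and the choice to skip words containing ambiguous bases are assumptions made for illustration. The dissertation's actual implementation, and the distance it uses to compare profiles when building trees, are not given here.

```python
from itertools import product

# All 64 possible words of length 3 over the DNA alphabet, in a fixed order.
# The ordering is an assumption; any fixed ordering would serve.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_usage_profile(seq):
    """Return a 3 x 64 matrix (list of lists) of word frequencies.

    Row k holds the distribution of non-overlapping 3-letter words
    read from the sequence starting at offset k (k = 0, 1, 2).
    """
    seq = seq.upper()
    profile = []
    for frame in range(3):
        counts = dict.fromkeys(CODONS, 0)
        # Step through the sequence three bases at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            word = seq[i:i + 3]
            # Assumption: words with ambiguous bases (e.g. N) are skipped.
            if word in counts:
                counts[word] += 1
        total = sum(counts.values())
        # Normalize counts to relative frequencies; an empty frame gives zeros.
        profile.append([counts[c] / total if total else 0.0 for c in CODONS])
    return profile


if __name__ == "__main__":
    tup = trinucleotide_usage_profile("ATGATGATGA")
    for frame, row in enumerate(tup):
        used = {c: f for c, f in zip(CODONS, row) if f > 0}
        print(frame, used)
```

For the toy sequence above, each reading frame contains a single repeated word, so each row places all of its mass on one column.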

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertation (ETD) Repository.
