Electronic Theses and Dissertations





Document Type


Degree Name

Doctor of Philosophy


Computer Science

Committee Chair

Ebenezer George

Committee Member

Ramin Homayouni

Committee Member

Lan Wang

Committee Member

Vinhthuy Phan


High-throughput technologies such as DNA microarrays have been widely used to measure the expression of thousands of genes simultaneously. Analysis of such data often yields high false positive and false negative rates. When multiple datasets addressing the same scientific question are available, integrating them can be more informative, with superior operating characteristics, than any individual study. Pooling the P-values of the statistical tests from the individual studies presents a feasible and proven solution for integrating heterogeneous datasets. However, pooling these P-values with equal weights may result in suboptimal statistical power because the datasets differ in sample size and experimental quality. An approach that weights each dataset according to specified criteria would seem more efficacious. In this dissertation, we developed a procedure for optimally pooling P-values of independent tests from several studies. We propose an approximation of the null distributions of weighted versions of three popular pooling statistics: Fisher’s omnibus method, the logit method, and the Z method. We use the approximate null distributions to directly estimate the P-values of the weighted combination statistics, and compare our procedure with an existing procedure called Pointillist, in which the null distribution of the weighted combination procedure is simulated. We have found that the Pointillist software has several errors. We demonstrate the superiority of our procedure over the Pointillist algorithm by application to a set of six experimental datasets. In addition, when pooling evidence from gene expression data, one-sided P-values rather than two-sided P-values should be used to avoid losing information. We construct an optimally weighted procedure for pooling one-sided P-values. Since the datasets to be pooled are gene expression datasets, a biological perspective can be used to assess the performance of the pooling methods.
We developed an optimally weighted combination procedure to pool gene expression data by maximizing the functional coherence of top-ranked genes. In tests with the sample datasets, the top-ranked genes identified by this method have higher functional coherence than those from any single dataset. We have developed a web tool that implements the optimally weighted combination procedures proposed in this dissertation.
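
As an illustration of the weighted pooling idea described in the abstract, the following is a minimal sketch of the textbook weighted Z method for independent one-sided P-values. It is not the dissertation's procedure: the function name `weighted_z_pool` and the weights passed to it are hypothetical, and the sketch does not attempt the optimal weighting or the approximate null distributions the abstract refers to. For the Z method with fixed weights, the rescaled statistic is standard normal under the null hypothesis, so no approximation is needed in this particular case.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def weighted_z_pool(p_values, weights):
    """Pool independent one-sided P-values with the weighted Z method.

    Hypothetical helper for illustration only. Each P-value is mapped to
    a z-score, the z-scores are combined with the given weights, and the
    sum is rescaled so that it is standard normal under the null.
    """
    if len(p_values) != len(weights):
        raise ValueError("need exactly one weight per P-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")
    if any(w < 0 for w in weights) or not any(w > 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")

    # A small one-sided P-value corresponds to a large positive z-score.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]

    # Weighted sum divided by its null standard deviation.
    pooled_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )

    # Upper-tail probability of the pooled statistic.
    return 1.0 - _STD_NORMAL.cdf(pooled_z)


# Assumed example inputs: three studies, weighted by the square root of
# made-up sample sizes (a common heuristic, not the dissertation's weights).
pooled = weighted_z_pool([0.04, 0.20, 0.01], [sqrt(12), sqrt(6), sqrt(30)])
print(f"pooled one-sided P-value: {pooled:.4g}")
```

Giving a study a weight of zero removes it from the combination, and a single study returns its own P-value unchanged, which makes the function easy to sanity-check before trying other weighting schemes.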


Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertations (ETD) Repository.