Exploiting dependencies of pairwise comparison outcomes to predict patterns of gene response


Background: The analysis of gene expression has played an important role in medical and bioinformatics research. Although a large number of samples is known to be needed to determine patterns of gene expression accurately, practical designs of gene expression studies occasionally have too few samples, making it difficult to ascertain the true response patterns of variantly expressed genes.

Results: We describe an approach to the challenge of predicting the true orders of gene response to treatments. We show that true patterns of gene response must be orderable sets. In experiments with few samples, we modify the conventional pairwise comparison tests and increase the significance level α intelligently to deduce orderable patterns, which are the most likely true orders of gene response. Additionally, motivated by the fact that a gene can be involved in multiple biological functions, our method resamples experimental replicates and predicts multiple response patterns for each gene. Using a gene expression data set of Sprague-Dawley rats treated with chemopreventive chemical compounds, and DAVID to annotate and validate gene sets, we show that this method increases enrichment significantly compared to the conventional method of fixing α. A comparison with hierarchical clustering shows that gene clusters labeled by the response patterns produced by our method are much more enriched. One of the clusters contains 3 transcription factors that have been found to participate in multiple biological networks and that hierarchical clustering failed to place into one cluster. One of these transcription factors is known to play an important role in pathways affected by the studied chemical compounds.

Conclusions: This method can be useful in designing cost-effective experiments with small sample sizes. Patterns of highly variantly expressed genes can be predicted by varying α intelligently. Furthermore, clusters are labeled meaningfully with patterns that describe precisely how the genes in such clusters respond to treatments.
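
The idea of raising α until the pairwise outcomes form an orderable pattern can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the α grid, the rule of stopping at the first orderable pattern, and the example p-values are all assumptions made for illustration. Here a pattern is treated as orderable when every pairwise outcome ("<", "=", ">") can be reproduced by assigning one rank to each treatment group.

```python
def outcomes_at(pvals, diffs, alpha):
    """Label each pair (i, j) as '<', '>' or '=' at significance level alpha.

    pvals[(i, j)] is the p-value of the pairwise test (hypothetical input);
    diffs[(i, j)] is mean_i - mean_j, so a negative value means i < j.
    """
    out = {}
    for pair, p in pvals.items():
        if p < alpha:
            out[pair] = '<' if diffs[pair] < 0 else '>'
        else:
            out[pair] = '='
    return out


def is_orderable(outcomes, k):
    """Return True if the outcomes are consistent with a ranking of k groups."""
    # Rank each group by how many groups are significantly below it.
    below = [0] * k
    for (i, j), o in outcomes.items():
        if o == '<':
            below[j] += 1
        elif o == '>':
            below[i] += 1
    # The pattern is orderable only if these ranks reproduce every outcome.
    for (i, j), o in outcomes.items():
        if below[i] < below[j]:
            expected = '<'
        elif below[i] > below[j]:
            expected = '>'
        else:
            expected = '='
        if o != expected:
            return False
    return True


def first_orderable_pattern(pvals, diffs, k, alphas):
    """Try increasing alpha values; return the first (alpha, pattern) that is orderable."""
    for alpha in alphas:
        pattern = outcomes_at(pvals, diffs, alpha)
        if is_orderable(pattern, k):
            return alpha, pattern
    return None


# Illustrative (made-up) p-values for three treatment groups 0, 1, 2.
pvals = {(0, 1): 0.08, (1, 2): 0.20, (0, 2): 0.01}
diffs = {(0, 1): -1.0, (1, 2): -0.3, (0, 2): -1.3}

# At alpha = 0.05 the outcomes are 0 = 1, 1 = 2, 0 < 2, which is not orderable.
# At alpha = 0.10 they become 0 < 1, 1 = 2, 0 < 2, which is orderable.
result = first_orderable_pattern(pvals, diffs, 3, [0.05, 0.10, 0.15, 0.20])
print(result)
```

In this made-up example, fixing α at 0.05 yields the contradictory pattern 0 = 1, 1 = 2, 0 < 2, while relaxing α to 0.10 yields the consistent order 0 < 1 = 2. How α is actually chosen, and how replicates are resampled to obtain multiple patterns per gene, is described in the paper itself and is not reproduced here.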

Publication Title

BMC Bioinformatics