An empirical CDF approach to estimate the significance of gene ranking for finding differentially expressed genes


This paper proposes a procedure for estimating the significance of gene rankings. Microarray data usually contain a large number of genes that are not differentially expressed across multiple conditions. In microarray analysis, it is common practice to first discard these genes as uninformative based on some filtering criterion. This filtering loses information, because the uninformative genes can be used to construct an empirical distribution of genes under the null hypothesis that a gene is not differentially expressed. The distribution of the non-differentially expressed genes is complex and may be regarded as a mixture of distributions. The significance of the differentially expressed genes may therefore be estimated using the empirical distribution function of the large number of insignificant genes. The proposed method is efficient, less computationally intensive, and may be applied to microarray datasets of any sample size. ©2007 IEEE.
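
The idea of reading significance off the empirical distribution of insignificant genes can be illustrated with a minimal sketch. It is not the paper's procedure: the abstract does not specify the ranking statistic or how the null genes are selected, so the function name `empirical_p_values`, the simulated null scores, the candidate scores, and the add-one correction are all assumptions made for illustration.

```python
import numpy as np

def empirical_p_values(scores, null_scores):
    """Upper-tail p-values read off the empirical CDF of null_scores.

    Hypothetical helper: the abstract does not name a ranking statistic,
    so larger scores are simply assumed to mean stronger evidence.
    """
    null_sorted = np.sort(np.asarray(null_scores, dtype=float))
    n = null_sorted.size
    # Number of null scores strictly below each observed score.
    below = np.searchsorted(null_sorted, np.asarray(scores, dtype=float), side="left")
    # (count of null scores >= observed score, plus one) / (n + 1);
    # the add-one correction (an assumption here) keeps p-values above zero.
    return (n - below + 1) / (n + 1)

# Simulated stand-in for the scores of genes presumed not differentially expressed.
rng = np.random.default_rng(0)
null_scores = np.abs(rng.normal(size=5000))

# Hypothetical ranking scores of three candidate genes.
candidate_scores = np.array([0.5, 2.0, 4.5])
p = empirical_p_values(candidate_scores, null_scores)
print(p)
```

Because the p-values come from sorting the null scores once and doing a binary search per gene, the cost stays low even for large gene sets, which is consistent with the abstract's claim of low computational cost.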

Publication Title

Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE