An improved data integration methodology for system biology


Pooling P-values from independent experiments has been shown to improve the power of statistical tests. Instead of assigning equal weight to each dataset, Hwang et al. proposed a data integration methodology for system biology, labeled Pontillist, which pools data using weighted P-values so as to maximize the number of significant genes discovered. Pontillist relies on a simulated null distribution of the weighted combination statistic. We have found several fatal statistical errors in Pontillist and provide corrections for them. Pontillist is also intrinsically computationally inefficient: it requires substantial, sometimes even prohibitive, computing time to converge at low significance levels. We propose a new approach for the optimal combination of P-values that uses the approximated theoretical distributions of Fisher's, Logit and Z omnibus combination statistics to estimate the P-value of the weighted pooled statistic. Our computationally efficient approach guarantees convergence at any significance level and produces accurate pooled P-values. © 2011 IEEE.
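As an illustration of why a theoretical null distribution removes the need for simulation, the following minimal sketch pools P-values with a weighted Z statistic. It is not the authors' implementation: the function name `weighted_z_pvalue` is hypothetical, the weights are assumed to be given rather than optimized, and the P-values are assumed to be one-sided, independent, and strictly between 0 and 1. Under those assumptions the weighted sum of normal scores, divided by the square root of the sum of squared weights, is standard normal under the null, so the pooled P-value is available in closed form at any significance level.

```python
from math import erfc, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z_pvalue(p_values, weights):
    """Pool independent one-sided P-values with a weighted Z statistic.

    Hypothetical helper for illustration only; the weights are taken
    as given and no weight optimization is performed here.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("each P-value must lie strictly between 0 and 1")
    norm = sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("at least one weight must be non-zero")

    # Convert each P-value to a normal score; a small p gives a large z.
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in p_values]

    # The weighted sum, rescaled by the norm of the weights, is N(0, 1)
    # under the null hypothesis when the inputs are independent.
    combined = sum(w * z for w, z in zip(weights, z_scores)) / norm

    # Upper-tail probability via erfc, which stays accurate in the far tail.
    return 0.5 * erfc(combined / sqrt(2.0))


if __name__ == "__main__":
    # Two datasets, the first trusted twice as much as the second.
    print(weighted_z_pvalue([0.04, 0.20], [2.0, 1.0]))
```

Because the tail probability is evaluated analytically, the cost per gene does not grow as the significance threshold shrinks, whereas a simulated null distribution needs many more draws to resolve small P-values.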

Publication Title

2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011