POST: A framework for set-based association analysis in high-dimensional data


Evaluating the differential expression of a set of genes belonging to a common biological process or ontology has proven to be a very useful tool for biological discovery. However, existing gene-set association methods are limited to applications that evaluate differential expression across k⩾2 treatment groups or biological categories. This limitation precludes researchers from most effectively evaluating the association with other phenotypes that may be more clinically meaningful, such as quantitative variables or censored survival time variables. Projection onto the Orthogonal Space Testing (POST) is proposed as a general procedure that can robustly evaluate the association of a gene-set with several different types of phenotypic data (categorical, ordinal, continuous, or censored). For each gene-set, POST transforms the gene profiles into a set of eigenvectors and then uses statistical modeling to compute a set of z-statistics that measure the association of each eigenvector with the phenotype. The overall gene-set statistic is the sum of squared z-statistics weighted by the corresponding eigenvalues. Finally, bootstrapping is used to compute a p-value. POST may evaluate associations with or without adjustment for covariates. In simulation studies, it is shown that the performance of POST in evaluating the association with a categorical phenotype is similar to or exceeds that of existing methods. In evaluating the association of 875 biological processes with the time to relapse of pediatric acute myeloid leukemia, POST identified the well-known oncogenic WNT signaling pathway as its top hit. These results indicate that POST can be a very useful tool for evaluating the association of a gene-set with a variety of different phenotypes. We have developed an R package named POST which is freely available in Bioconductor.

Publication Title