A multivariate ensemble approach for identification of biomarkers: Application to breast cancer


Advances in high-throughput screening experiments have significantly improved our ability to discover and predict biomarkers for complex diseases. Systems biology approaches have played a critical role in realizing these improvements by providing computational tools for modeling such diseases at the network level. Within these tools, statistical scores such as the two-sample t-statistic (t-score) are commonly used to rank genes/features for downstream analyses. In this paper, we propose an alternative to the t-score, the ensemble sensitivity (ES) metric, which is a multivariate strategy for obtaining feature rankings. To validate our method, we apply the COre Module Biomarker Identification with Network Exploration (COMBINER) tool to publicly available breast cancer gene expression data sets. The top-ranked candidates from the t-score and from the ES method each serve as input to COMBINER, which identifies the candidate biomarkers. Our results, as quantified by the area under the ROC curve (AUC) generated by COMBINER, suggest that the ES approach improves the average AUC relative to the t-score and identifies biomarkers with ∼93% overlap with known cancer-related genes. In addition, the known cancer-related genes identified by the two methods overlap only slightly, which suggests that our proposed approach captures signals missed by methods relying on the t-score.
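For readers unfamiliar with the baseline, the sketch below shows how a t-score ranking of genes might be computed before the top candidates are handed to a downstream tool. This section does not say which variant of the two-sample t-statistic is used, so the sketch assumes Welch's unequal-variance form and ranks genes by |t| (an assumption). The function names `welch_t` and `rank_by_t_score` and the toy data are hypothetical. The ES metric is not reproduced here because its definition is not given in this section.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Two-sample t-statistic, ASSUMING Welch's unequal-variance form."""
    na, nb = len(group_a), len(group_b)
    # Standard error of the difference in means (sample variances, ddof=1).
    se = sqrt(variance(group_a) / na + variance(group_b) / nb)
    return (mean(group_a) - mean(group_b)) / se


def rank_by_t_score(expr, labels, top_k):
    """Rank genes by |t| between two classes, largest first.

    expr:   dict mapping gene name -> expression values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    Returns (top_k gene names, dict of signed t-scores).
    """
    scores = {}
    for gene, values in expr.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = welch_t(cases, controls)
    # Sorting by absolute value keeps both up- and down-regulated genes.
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy data: 3 "tumor" samples (label 1), 3 "normal" (label 0).
expr = {
    "GENE_A": [5.1, 5.3, 4.9, 2.0, 2.2, 1.9],  # strongly up in tumor
    "GENE_B": [3.0, 3.1, 2.9, 3.0, 2.8, 3.2],  # no mean difference
    "GENE_C": [1.0, 1.2, 0.8, 2.5, 2.7, 2.6],  # down in tumor
}
labels = [1, 1, 1, 0, 0, 0]
top, scores = rank_by_t_score(expr, labels, top_k=2)
print(top)
```

Ranking by |t| is one common choice; a signed ranking would instead separate up- and down-regulated genes into two lists. Because each gene is scored independently, this baseline is univariate, which is the limitation a multivariate ranking such as ES is meant to address.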

Publication title: IFAC Proceedings Volumes (IFAC-PapersOnline)