Efficient probabilistic supergraph search over large uncertain graphs


In recent years, with the emergence of a number of new real applications, such as protein-protein interaction (PPI) networks, visual pattern recognition, and intelligent traffic systems, managing huge volumes of uncertain graphs has attracted much attention in the database community. Currently, most existing fundamental queries over graphs only support deterministic (or certain) graphs, although real graph data are often noisy, inaccurate, and incomplete. In this paper, we study a new type of uncertain graph query, probabilistic supergraph containment query over large uncertain graphs. Specifically, given an uncertain graph database UGD which contains a set of uncertain graphs, a deterministic query graph q, and a probabilistic threshold δ, a probabilistic supergraph containment query is to find the set of uncertain graphs from UGD, denoted as UGDq, such that UGDq = {ugi ∈ UGD\Pr(ugi ⊆ q) > δ} where Pr(ugi ⊆ q) means the likelihood that ugi is a subgraph of q. We prove that the computation of Pr(ugi ⊆ q) is #P-hard and design an efficient filtering-and-verification framework to avoid the expensive computation. In particular, we propose an effective filtering strategy and a novel probabilistic inverted index, called PS-Index, to enhance pruning power in the filtering phase. Furthermore, the candidate graphs which pass the filtering phase are tested in the verification phase via an efficient unequal probability sampling-based approximation algorithm. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments.

Publication Title

CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management