Range-max queries on uncertain data


Let P be a set of n uncertain points in ℝd, where each point pi ∈ P is associated with a real value vi and a probability αi ∈ (0,1] of existence, i.e., each pi exists with an independent probability αi. We present algorithms for building an index on P so that for a d-dimensional query rectangle ρ, the expected maximum value or the most-likely maximum value in ρ can be computed quickly. The specific contributions of our paper include the following: (i) The first index of sub-quadratic size to achieve a sub-linear query time in any dimension d ≥ 1. It also provides a trade-off between query time and size of the index. (ii) A conditional lower bound for the most-likely range-max queries, based on the conjectured hardness of the set-intersection problem, which suggests that in the worst case the product (query time)2 x (index size) is Ω(n2/polylog(n)). (iii) A linear-size index for estimating the expected range-max value within approximation factor 1/2 in O(logcn) time, for some constant c > 0; that is, if the expected maximum value is μ then the query procedure returns a value μ′ with μ/2 ≤ μ′ ≤ μ. (iv) Extensions of our algorithm to more general uncertainty models and for computing the top-k values of the range-max.

Publication Title

Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems