Towards efficient join processing over large RDF graph using mapreduce

Abstract

Existing solutions for answering SPARQL queries in a shared-nothing environment using MapReduce fail to fully exploit the scalability and parallelism of the computing framework. In this paper, we propose a cost-model-based RDF join processing solution using MapReduce to minimize query response time. After transforming a SPARQL query into a sequence of MapReduce jobs, we propose a novel index structure, called the All Possible Join tree (APJ-tree), to reduce the search space for the optimal execution plan of a set of MapReduce jobs. To speed up join processing, we employ hybrid join and Bloom filters for performance optimization. Extensive experiments on real data sets demonstrate the effectiveness of our cost model. Our solution achieves as much as an order of magnitude in time savings compared with state-of-the-art solutions. © 2012 Springer-Verlag.
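
The abstract names Bloom filters as one of the join optimizations but does not describe how they are applied. The following is a minimal single-process sketch of the general idea only, not the authors' implementation: a Bloom filter built over the join keys of the smaller input is used to discard non-matching records from the larger input before they would be shuffled to reducers. The names `BloomFilter` and `bloom_join`, the filter size, the number of hash functions, and the sample triples are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_join(small, large):
    """Join two lists of (join_key, value) pairs on join_key.

    Hypothetical helper that simulates the map, shuffle, and reduce
    phases in one process.
    """
    # Build the filter over the join keys of the smaller input.
    bloom = BloomFilter()
    for key, _ in small:
        bloom.add(key)

    # "Map": drop records of the larger input that cannot possibly match.
    survivors = [(key, value) for key, value in large if key in bloom]

    # "Shuffle": group both inputs by join key.
    groups = defaultdict(lambda: ([], []))
    for key, value in small:
        groups[key][0].append(value)
    for key, value in survivors:
        groups[key][1].append(value)

    # "Reduce": emit the per-key cross product. A false positive from the
    # filter has an empty left side here, so it produces no output.
    results = []
    for key in sorted(groups):
        left, right = groups[key]
        for left_value in left:
            for right_value in right:
                results.append((key, left_value, right_value))
    return results


# Bindings for two triple patterns sharing ?x (sample data is made up):
#   ?x worksFor ?org   and   ?x name ?n
works_for = [("alice", "acme")]
names = [("alice", "Alice A."), ("bob", "Bob B."), ("carol", "Carol C.")]
joined = bloom_join(works_for, names)
```

Because a Bloom filter never reports a false negative, the filtering step cannot lose a matching record; a false positive only costs some extra shuffled data, which the reduce step discards.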

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Recommended Citation

Zhang, X., Chen, L., & Wang, M. (2012). Towards efficient join processing over large RDF graph using mapreduce. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7338 LNCS, 250-259. https://doi.org/10.1007/978-3-642-31235-9_16
