Faculty Publications

Efficient multi-way Theta-join processing using MapReduce

Xiaofei Zhang, Hong Kong University of Science and Technology
Lei Chen, Hong Kong University of Science and Technology
Min Wang, Hewlett Packard Laboratories

Abstract

Multi-way Theta-join queries are powerful in describing complex relations and are therefore widely employed in practice. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which has proven able to support OLAP applications over immense data volumes. In this work, we study the problem of efficiently processing multi-way Theta-join queries using MapReduce from a cost-effective perspective. Although some works have used the (key,value) pair-based programming model to support join operations, efficient processing of multi-way Theta-join queries has never been fully explored. The substantial challenge is, given a number of processing units (that can run Map or Reduce tasks), to map a multi-way Theta-join query to a number of MapReduce jobs and execute them in a well-scheduled sequence such that the total processing time span is minimized. Our solution mainly includes two parts: 1) cost metrics for both a single MapReduce job and a number of MapReduce jobs executed in a certain order; 2) the efficient execution of a chain-typed Theta-join with only one MapReduce job. Compared with the query evaluation strategy proposed in [23] and the widely adopted Pig Latin and Hive SQL solutions, our method significantly improves join processing efficiency. © 2012 VLDB Endowment.
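The abstract does not describe how a chain-typed Theta-join fits into a single MapReduce job. As a rough illustration of what that involves, the Python sketch below simulates the map, shuffle, and reduce phases in memory using a generic hypercube partitioning of the Cartesian-product space: each reducer owns one cell of a grid, every tuple is replicated to all cells that share its coordinate, and each cell checks the chain of predicates locally. This is an assumption chosen for illustration, not the authors' algorithm; the function names, the `dims` grid shape, and the round-robin coordinate assignment are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(relations, dims):
    """Emit (cell, (relation_index, tuple)) pairs.

    Hypothetical scheme: tuple number k of relation i gets coordinate
    k % dims[i] on axis i (round-robin) and is replicated to every cell
    whose i-th coordinate matches.
    """
    for i, rel in enumerate(relations):
        for k, t in enumerate(rel):
            coord = k % dims[i]
            # Fixed coordinate on this relation's axis, all values on the others.
            axes = [[coord] if j == i else range(d) for j, d in enumerate(dims)]
            for cell in product(*axes):
                yield cell, (i, t)


def shuffle(pairs):
    """Group mapper output by reducer cell, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for cell, value in pairs:
        groups[cell].append(value)
    return groups


def reduce_cell(values, n_rel, thetas):
    """Join the tuples that reached one cell; thetas[j] links R_j and R_{j+1}."""
    parts = [[] for _ in range(n_rel)]
    for i, t in values:
        parts[i].append(t)
    # A cell missing some relation yields nothing, since product() is empty.
    for combo in product(*parts):
        if all(theta(combo[j], combo[j + 1]) for j, theta in enumerate(thetas)):
            yield combo


def chain_theta_join(relations, thetas, dims):
    """Evaluate R_1 theta_1 R_2 ... theta_{n-1} R_n in one simulated job."""
    groups = shuffle(map_phase(relations, dims))
    out = []
    for cell in sorted(groups):
        out.extend(reduce_cell(groups[cell], len(relations), thetas))
    return out


if __name__ == "__main__":
    r1, r2, r3 = [1, 5, 9], [3, 6], [4, 7, 10]
    less = lambda a, b: a < b
    # Chain predicate: r1 < r2 < r3, over a 2 x 2 x 2 grid of reducers.
    print(sorted(chain_theta_join([r1, r2, r3], [less, less], (2, 2, 2))))
```

Because each result combination is produced only in the single cell matching its tuples' coordinates, no deduplication step is needed. Larger `dims` spread the work over more reducers but replicate each tuple to more cells (here every `r1` tuple is sent to `dims[1] * dims[2]` cells), which is the kind of trade-off a cost model for MapReduce jobs has to weigh.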

Publication Title

Proceedings of the VLDB Endowment

Recommended Citation

Zhang, X., Chen, L., & Wang, M. (2012). Efficient multi-way Theta-join processing using MapReduce. Proceedings of the VLDB Endowment, 5(11), 1184-1195. https://doi.org/10.14778/2350229.2350238
