Fast Compaction Algorithms for NoSQL Databases
Compaction plays a crucial role in NoSQL systems in ensuring high overall read throughput. In this work, we formally define compaction as an optimization problem that attempts to minimize disk I/O. We prove this problem to be NP-hard. We then propose a set of algorithms and mathematically analyze upper bounds on their worst-case cost. We evaluate the proposed algorithms on real-life workloads. Our results show that our algorithms incur low I/O costs and that a compaction approach using a balanced tree is the preferred choice.
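As a rough illustration of the balanced-tree idea mentioned above, the sketch below merges tables pairwise, level by level, and tallies a simple I/O cost. It is not the paper's algorithm or cost model: the function name `balanced_tree_compaction`, the representation of each table as a set of keys, and the cost of one merge (keys read from both inputs plus keys written to the output) are all assumptions made for this example.

```python
from typing import FrozenSet, Iterable, List, Tuple


def balanced_tree_compaction(
    tables: Iterable[Iterable[int]],
) -> Tuple[FrozenSet[int], int]:
    """Merge tables pairwise, level by level (hypothetical sketch).

    Returns the fully merged key set and the total assumed I/O cost.
    """
    level: List[FrozenSet[int]] = [frozenset(t) for t in tables]
    if not level:
        return frozenset(), 0

    cost = 0
    while len(level) > 1:
        next_level: List[FrozenSet[int]] = []
        # Merge neighbours (0,1), (2,3), ... so the merge tree stays balanced.
        for i in range(0, len(level) - 1, 2):
            left, right = level[i], level[i + 1]
            merged = left | right
            # Assumed cost model: read both inputs, write the merged output.
            cost += len(left) + len(right) + len(merged)
            next_level.append(merged)
        # An unpaired last table is carried to the next level unchanged.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0], cost


merged_keys, total_cost = balanced_tree_compaction([{1, 2}, {2, 3}, {4}, {5, 6}])
print(sorted(merged_keys), total_cost)
```

Under this assumed cost model, overlapping keys make a merge cheaper to write, which is one reason the order of merges affects total I/O.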
Proceedings - International Conference on Distributed Computing Systems
Ghosh, M., Gupta, I., Gupta, S., & Kumar, N. (2015). Fast Compaction Algorithms for NoSQL Databases. Proceedings - International Conference on Distributed Computing Systems, 2015-July, 452-461. https://doi.org/10.1109/ICDCS.2015.53