Clustering of defect reports using graph partitioning algorithms


We present in this paper several solutions to the challenging task of clustering software defect reports. Clustering defect reports can be very useful for prioritizing the testing effort and to better understand the nature of software defects. Despite some challenges with the language used and semi-structured nature of defect reports, our experiments on data collected from the open source project Mozilla show extremely promising results for clustering software defect reports using natural language processing and graph partitioning techniques. We report results with three models for representing the textual information in the defect reports and three clustering algorithms: normalized cut, size regularized cut, and k-means. Our data collection method allowed us to quickly develop a proof-of-concept setup. Experiments showed that normalized cut achieved the best performance in terms of average cluster purity, accuracy, and normalized mutual information.

Publication Title

Proceedings of the 21st International Conference on Software Engineering and Knowledge Engineering, SEKE 2009

This document is currently not available here.