Automatic clustering of defect reports


This paper addresses the problem of automatically clustering defect reports. Such clusters can give software testers valuable information: for example, testers could better plan and prioritize their effort by focusing on the features with the most defects, as indicated by the largest clusters. We present results obtained with one clustering algorithm, K-means, and two models of defect reports: one based on the summary field of each report and one based on the description field. We evaluated the clustering with respect to clusters containing defect reports that refer to the same underlying problem. In experiments on defect reports from Mozilla's Bugzilla, a database of defect reports for the open source Mozilla project, clustering on the summary field (average accuracy = 44.2240%) outperformed clustering on the description field (average accuracy = 29.3581%). Both methods outperformed the baseline of randomly picking a cluster (accuracy = 20.0000%).
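The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: represent one text field of each report as a vector, run K-means, and score the clusters against known groupings. The TF-IDF weighting, the whitespace tokenizer, the use of cluster purity as the accuracy measure, and all function names (`tfidf_vectors`, `kmeans`, `purity`) are assumptions made for illustration, not the authors' method.

```python
import math
import random
from collections import Counter, defaultdict

def tfidf_vectors(texts):
    """Map each text to a sparse, L2-normalized TF-IDF dict (assumed representation)."""
    docs = [Counter(t.lower().split()) for t in texts]
    df = Counter(term for d in docs for term in d)  # document frequency per term
    n = len(docs)
    vectors = []
    for d in docs:
        vec = {t: tf * (math.log(n / df[t]) + 1.0) for t, tf in d.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-means (Lloyd's algorithm); returns a cluster index for each vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initialize from k distinct reports
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:  # converged: no report changed cluster
            break
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = centroid(members)
    return assignment

def purity(assignment, labels):
    """Fraction of reports that carry the majority label of their cluster
    (an assumed stand-in for the accuracy reported in the abstract)."""
    by_cluster = defaultdict(Counter)
    for a, label in zip(assignment, labels):
        by_cluster[a][label] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)

if __name__ == "__main__":
    # Hypothetical summaries and problem labels, invented for this sketch.
    summaries = [
        "crash on startup",
        "crash when opening startup page",
        "font rendering blurry",
        "blurry font in menu",
    ]
    problems = ["crash", "crash", "font", "font"]
    clusters = kmeans(tfidf_vectors(summaries), k=2)
    print(clusters, purity(clusters, problems))
```

To compare the two report models under these assumptions, the same functions would be run once on the summary texts and once on the description texts. Assigning each report uniformly at random to one of k clusters gives a baseline of 1/k; the 20% baseline quoted above would be consistent with five clusters, although the abstract does not state the number used.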

Publication Title

20th International Conference on Software Engineering and Knowledge Engineering, SEKE 2008
