Programming education is becoming increasingly popular in universities. However, due to a lack of debugging experience, novices often encounter numerous difficulties in the programming process. Automatic fault localization techniques have emerged as a promising solution to address this issue. Among these techniques, Spectrum-Based Fault Localization (SBFL) and Mutation-Based Fault Localization (MBFL) have been widely used in industrial programs. However, there is a significant difference between industrial and novice programs and the performance of these methods on novice programs has not been extensively studied.
To fill this gap, we conducted an empirical study to evaluate the fault localization performance and execution overhead of SBFL and MBFL in a typical novice programming environment. Our study specifically examined how different program characteristics, including code coverage and mutation score, affect the accuracy of these localization methods. Additionally, during the study, we identified the tie problem in both methods and further investigated its impact on fault localization techniques in novice programs.
To remove the impact of the tie problem, we proposed using PageRank scores as weights for the suspiciousness, sorting, and locating faults based on the weighted suspiciousness. The PageRank algorithm is based on statement coverage information and constructs a directed graph. From the directed graph, a transition matrix generates the weight scores (PageRank scores) for each statement.
Our research demonstrates that both SBFL and MBFL are effective for fault localization in novice programs, with MBFL showing significantly better performance in our tests. In TOP-N(N=1,3,5)N(N=1,3,5), MBFL accurately locates 67, 96 and 114 faults, respectively, indicating superior performance. Additionally, calculating weighted suspiciousness significantly alleviates the tie problem.