TY - GEN
T1 - A heuristic rule reduction approach to software fault-proneness prediction
AU - Monden, Akito
AU - Keung, Jacky
AU - Morisaki, Shuji
AU - Kamei, Yasutaka
AU - Matsumoto, Ken Ichi
PY - 2012/1/1
Y1 - 2012/1/1
N2 - Background: Association rules are more comprehensive and understandable than fault-prone module predictors (such as logistic regression models, random forests and support vector machines). One of the challenges is that rule mining usually extracts too many similar rules. Aim: This paper proposes a rule reduction technique that can eliminate complex (long) and/or similar rules while sacrificing as little prediction performance as possible. Method: The idea of the method is to remove long and similar rules unless their confidence level, used as a heuristic, is sufficiently higher than that of shorter rules. For example, it starts by selecting rules of the shortest length (length=1), then continues with the selection of the second-shortest rules (length=2) based on the current confidence level; this process is repeated for longer rules until no rules are worth including. Result: An empirical experiment was conducted with the Mylyn and Eclipse PDE datasets. The result for the Mylyn dataset showed that the proposed method was able to reduce the number of rules from 1347 down to 13, while the change in prediction performance was only .015 (from .757 down to .742) in terms of the F1 prediction criterion. In the experiment with the Eclipse PDE dataset, the proposed method reduced the number of rules from 398 to 12, while the prediction performance even improved (from .426 to .441). Conclusion: The novel technique introduced resolves the rule explosion problem in association rule mining for software fault-proneness prediction, which is significant and provides a better understanding of the causes of faulty modules.
AB - Background: Association rules are more comprehensive and understandable than fault-prone module predictors (such as logistic regression models, random forests and support vector machines). One of the challenges is that rule mining usually extracts too many similar rules. Aim: This paper proposes a rule reduction technique that can eliminate complex (long) and/or similar rules while sacrificing as little prediction performance as possible. Method: The idea of the method is to remove long and similar rules unless their confidence level, used as a heuristic, is sufficiently higher than that of shorter rules. For example, it starts by selecting rules of the shortest length (length=1), then continues with the selection of the second-shortest rules (length=2) based on the current confidence level; this process is repeated for longer rules until no rules are worth including. Result: An empirical experiment was conducted with the Mylyn and Eclipse PDE datasets. The result for the Mylyn dataset showed that the proposed method was able to reduce the number of rules from 1347 down to 13, while the change in prediction performance was only .015 (from .757 down to .742) in terms of the F1 prediction criterion. In the experiment with the Eclipse PDE dataset, the proposed method reduced the number of rules from 398 to 12, while the prediction performance even improved (from .426 to .441). Conclusion: The novel technique introduced resolves the rule explosion problem in association rule mining for software fault-proneness prediction, which is significant and provides a better understanding of the causes of faulty modules.
KW - association rule mining
KW - data mining
KW - defect prediction
KW - empirical study
KW - software quality
UR - http://www.scopus.com/inward/record.url?scp=84874632768&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874632768&partnerID=8YFLogxK
U2 - 10.1109/APSEC.2012.103
DO - 10.1109/APSEC.2012.103
M3 - Conference contribution
AN - SCOPUS:84874632768
SN - 9780769549224
T3 - Proceedings - Asia-Pacific Software Engineering Conference, APSEC
SP - 838
EP - 847
BT - APSEC 2012 - Proceedings of the 19th Asia-Pacific Software Engineering Conference
PB - IEEE Computer Society
T2 - 19th Asia-Pacific Software Engineering Conference, APSEC 2012
Y2 - 4 December 2012 through 7 December 2012
ER -