TY - GEN
T1 - Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models
AU - Bennin, Kwabena Ebo
AU - Toda, Koji
AU - Kamei, Yasutaka
AU - Keung, Jacky
AU - Monden, Akito
AU - Ubayashi, Naoyasu
N1 - Funding Information:
This research is supported by the City University of Hong Kong research funds (Project No. 7200354, 7004222, 7004474).
Publisher Copyright:
© 2016 IEEE.
PY - 2016/10/12
Y1 - 2016/10/12
N2 - To prioritize quality assurance efforts, various fault prediction models have been proposed. However, the best performing fault prediction model is unknown due to three major drawbacks: (1) comparison of few fault prediction models considering a small number of data sets, (2) use of evaluation measures that ignore testing effort and (3) use of n-fold cross-validation instead of the more practical cross-release validation. To address these concerns, we conducted a cross-release evaluation of 11 fault density prediction models using data sets collected from 2 releases of 25 open source software projects with an effort-aware performance measure known as Norm(Popt). Our results show that, whilst M5 and K∗ had the best performances, they were greatly influenced by the percentage of faulty modules present and the size of the data set. Using Norm(Popt) produced an overall average performance of more than 50% across all the selected models, clearly indicating the importance of considering testing effort in building fault-prone prediction models.
AB - To prioritize quality assurance efforts, various fault prediction models have been proposed. However, the best performing fault prediction model is unknown due to three major drawbacks: (1) comparison of few fault prediction models considering a small number of data sets, (2) use of evaluation measures that ignore testing effort and (3) use of n-fold cross-validation instead of the more practical cross-release validation. To address these concerns, we conducted a cross-release evaluation of 11 fault density prediction models using data sets collected from 2 releases of 25 open source software projects with an effort-aware performance measure known as Norm(Popt). Our results show that, whilst M5 and K∗ had the best performances, they were greatly influenced by the percentage of faulty modules present and the size of the data set. Using Norm(Popt) produced an overall average performance of more than 50% across all the selected models, clearly indicating the importance of considering testing effort in building fault-prone prediction models.
KW - Demsar's significance diagram
KW - cross-version prediction
KW - empirical study
KW - fault-density estimation
KW - open source project
UR - http://www.scopus.com/inward/record.url?scp=84995527116&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84995527116&partnerID=8YFLogxK
U2 - 10.1109/QRS.2016.33
DO - 10.1109/QRS.2016.33
M3 - Conference contribution
AN - SCOPUS:84995527116
T3 - Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016
SP - 214
EP - 221
BT - Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd IEEE International Conference on Software Quality, Reliability and Security, QRS 2016
Y2 - 1 August 2016 through 3 August 2016
ER -