TY - GEN
T1 - Data Smoothing for Software Effort Estimation
AU - Korenaga, Kento
AU - Monden, Akito
AU - Yucel, Zeynep
N1 - Funding Information:
Part of this research was supported by JSPS KAKENHI Grant number 17K00102.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
AB - The goal of this paper is to improve the estimation performance of software development effort by mitigating the problem caused by outliers in a historical software project data set, which is used to construct an effort estimation model. To date, outlier removal methods have been proposed to solve this problem; however, they are not always effective because removing outliers reduces the number of data points (software projects, in our case) in a data set, and a model built from a small data set often suffers from a lack of generality. In such a case, estimation performance can become even worse. In this paper, we propose a method called data smoothing to mitigate the problem of outliers without reducing the number of data points. We consider data points to be outliers if they do not meet the assumption of Analogy-Based Estimation (ABE) that 'projects with similar features require similar development efforts.' The proposed method changes the effort values (person-months or person-hours) in a data set so as to satisfy this assumption; in this way, all outliers become non-outliers without decreasing the number of data points. In an experimental evaluation using 8 software development data sets, we found that the proposed data smoothing showed the same or higher effort estimation accuracy than the non-smoothing case, while a conventional outlier removal method showed worse accuracy on some data sets.
KW - Software project planning
KW - data preprocessing
KW - outlier removal
UR - http://www.scopus.com/inward/record.url?scp=85077960370&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077960370&partnerID=8YFLogxK
U2 - 10.1109/SNPD.2019.8935841
DO - 10.1109/SNPD.2019.8935841
M3 - Conference contribution
AN - SCOPUS:85077960370
T3 - Proceedings - 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2019
SP - 501
EP - 506
BT - Proceedings - 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2019
A2 - Nakamura, Masahide
A2 - Hirata, Hiroaki
A2 - Ito, Takayuki
A2 - Otsuka, Takanobu
A2 - Okuhara, Shun
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2019
Y2 - 8 July 2019 through 11 July 2019
ER -