TY - GEN
T1 - Filter-INC
T2 - 23rd Asia-Pacific Software Engineering Conference, APSEC 2016
AU - Phannachitta, Passakorn
AU - Keung, Jacky
AU - Bennin, Kwabena Ebo
AU - Monden, Akito
AU - Matsumoto, Kenichi
N1 - Funding Information:
This research was supported by JSPS KAKENHI Grant number 26330086, was conducted as a part of the Program for Advancing Strategic International Networks to Accelerate the Circulation of Talented Researchers: Interdisciplinary Global Networks for Accelerating Theory and Practice in Software Ecosystem, and was supported in part by the City University of Hong Kong research fund (Project numbers 7200354, 7004222, 7004474, 7004683, and 9042328).
PY - 2016/7/2
Y1 - 2016/7/2
AB - Effort-inconsistency is a situation where historical software project data used for software effort estimation (SEE) are contaminated by many project cases that have similar characteristics but were completed with significantly different amounts of effort. Using these data for SEE generally produces inaccurate results; however, an effective technique for handling them has yet to be made available. This study approaches the problem differently from common solutions, which typically attempt to remove every project case they detect as an outlier. Instead, we hypothesize that data inconsistency is caused by only a few deviant project cases and that any attempt to remove the other cases will reduce accuracy, largely due to loss of useful information and data diversity. Filter-INC (short for Filtering technique for handling effort-INConsistency in SEE datasets) implements this hypothesis to decide whether a project case detected by an existing technique should be subject to removal. The evaluation compares the performance of 2 filtering techniques before and after Filter-INC is applied. The results, produced from 8 real-world datasets together with 3 machine-learning models and evaluated by 4 performance measures, show a significant accuracy improvement at the 95% confidence level. Based on the results, we recommend our proposed hypothesis as an important instrument for designing data preprocessing techniques that handle effort-inconsistency in SEE datasets, an important step toward more accurate SEE models.
KW - Data preprocessing
KW - Effort-inconsistency
KW - Empirical software engineering
KW - Software effort estimation
UR - http://www.scopus.com/inward/record.url?scp=85018504275&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018504275&partnerID=8YFLogxK
U2 - 10.1109/APSEC.2016.035
DO - 10.1109/APSEC.2016.035
M3 - Conference contribution
AN - SCOPUS:85018504275
T3 - Proceedings - Asia-Pacific Software Engineering Conference, APSEC
SP - 185
EP - 192
BT - Proceedings - 23rd Asia-Pacific Software Engineering Conference, APSEC 2016
A2 - Potanin, Alex
A2 - Murphy, Gail C.
A2 - Reeves, Steve
A2 - Dietrich, Jens
PB - IEEE Computer Society
Y2 - 6 December 2016 through 9 December 2016
ER -