Analyzing the Stationarity Process in Software Effort Estimation Datasets

Michael Franklin Bosu

Centre for Information Technology, Waikato Institute of Technology, Hamilton 3240, New Zealand

E-mail Address: michael.bosu@wintec.ac.nz

Corresponding author.

Stephen G. MacDonell

Department of Information Science, University of Otago, Dunedin 9054, New Zealand

E-mail Address: stephen.macdonell@otago.ac.nz

Peter A. Whigham

Department of Information Science, University of Otago, Dunedin 9054, New Zealand

E-mail Address: peter.whigham@otago.ac.nz

https://doi.org/10.1142/S0218194020400239

Abstract

Software effort estimation models are typically developed based on an underlying assumption that all data points are equally relevant to the prediction of effort for future projects. The dynamic nature of several aspects of the software engineering process could mean that this assumption does not hold in at least some cases. This study employs three kernel estimator functions to test the stationarity assumption in five software engineering datasets that have been used in the construction of software effort estimation models. The kernel estimators are used in the generation of nonuniform weights which are subsequently employed in weighted linear regression modeling. In each model, older projects are assigned smaller weights while the more recently completed projects are assigned larger weights, to reflect their potentially greater relevance to present or future projects that need to be estimated. Prediction errors are compared to those obtained from uniform models. Our results indicate that, for the datasets that exhibit underlying nonstationary processes, uniform models are more accurate than the nonuniform models; that is, models based on kernel estimator functions are worse than the models where no weighting was applied. In contrast, the accuracies of uniform and nonuniform models for datasets that exhibited stationary processes were essentially equivalent. Our analysis indicates that as the heterogeneity of a dataset increases, the effect of stationarity is overridden. The results of our study also confirm prior findings that the accuracy of effort estimation models is independent of the type of kernel estimator function used in model development.
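
The weighting scheme summarized above can be illustrated with a minimal sketch. The abstract does not name the three kernel functions, the bandwidth, or the predictors used, so the Gaussian, Epanechnikov, and triangular kernels, the bandwidth of 4 years, the single size predictor, and the project data below are all assumptions chosen for illustration, not the authors' setup. The sketch turns project age into a weight (newer projects weigh more) and then fits a one-predictor weighted least-squares line.

```python
import math

# Three common kernel shapes (assumed for illustration; the abstract does not name them).
def gaussian(u):
    return math.exp(-0.5 * u * u)

def epanechnikov(u):
    return max(0.0, 1.0 - u * u)

def triangular(u):
    return max(0.0, 1.0 - abs(u))

def age_weights(ages, bandwidth, kernel):
    """Map each project's age (0 = most recent) to a weight; older projects get smaller weights."""
    return [kernel(age / bandwidth) for age in ages]

def weighted_linear_fit(x, y, w):
    """Weighted least squares for effort = b0 + b1 * size; returns (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical projects: size, effort, and age in years (not taken from the paper's datasets).
size = [100.0, 200.0, 300.0, 400.0]
effort = [260.0, 430.0, 700.0, 830.0]
age = [3.0, 2.0, 1.0, 0.0]

uniform_fit = weighted_linear_fit(size, effort, [1.0] * len(size))
kernel_fits = {
    k.__name__: weighted_linear_fit(size, effort, age_weights(age, 4.0, k))
    for k in (gaussian, epanechnikov, triangular)
}
```

Setting every weight to 1 gives the uniform model, so the same fitting routine serves both sides of the comparison; only the weight vector changes.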
