Research Notes

Exploiting Correlation Subspace to Predict Heterogeneous Cross-Project Defects

Ming Cheng

State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, Hubei 430072, P. R. China

Guoqing Wu

State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, Hubei 430072, P. R. China

Hongyan Wan

State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, Hubei 430072, P. R. China

Guoan You

State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, Hubei 430072, P. R. China

Mengting Yuan

State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, Hubei 430072, P. R. China

E-mail Address: ymt@whu.edu.cn

Corresponding author.

Min Jiang

Department of Cognitive Science and Technology, School of Information Science and Engineering, Xiamen University, Xiamen, Fujian 361005, P. R. China

Fujian Key Laboratory of Brain-Like Intelligent Systems, School of Information Science and Engineering, Xiamen University, Xiamen, Fujian 361005, P. R. China

https://doi.org/10.1142/S0218194016710017

Cited by: 7 (Source: Crossref)

Abstract

Cross-project defect prediction trains a prediction model on historical data from source projects and applies it to target projects. Most previous work assumes that source and target data share the same metric set, that is, the same metrics and the same number of metrics. However, this assumption may not hold in practice. In addition, software defect datasets suffer from class imbalance, which makes it harder for a learner to predict defects. In this paper, we advance canonical correlation analysis by deriving a joint feature space that associates cross-project data. We also propose a novel support vector machine algorithm that incorporates the correlation transfer information into classifier design for cross-project prediction. Moreover, we take different misclassification costs into account so that the classifier is inclined to label a module as defective, alleviating the impact of imbalanced data. The experimental results show that our method is more effective than state-of-the-art methods.
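The effect of unequal misclassification costs can be illustrated with a minimal sketch. It is not the support vector machine formulation proposed in the paper; it is the generic expected-cost decision rule, applied to a hypothetical classifier that outputs a defect probability. The function names and the default costs (`cost_fp`, `cost_fn`) are assumptions chosen for illustration only.

```python
def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which predicting 'defective' has the lower expected cost.

    cost_fp: assumed cost of flagging a clean module as defective (false alarm).
    cost_fn: assumed cost of missing a defective module.
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def predict_defective(p_defective: float,
                      cost_fp: float = 1.0,
                      cost_fn: float = 5.0) -> bool:
    """Label a module as defective when that choice minimizes expected cost."""
    # Expected cost of predicting "clean": the module is defective with
    # probability p_defective, and we then pay cost_fn.
    expected_cost_if_clean = p_defective * cost_fn
    # Expected cost of predicting "defective": the module is clean with
    # probability 1 - p_defective, and we then pay cost_fp.
    expected_cost_if_defective = (1.0 - p_defective) * cost_fp
    return expected_cost_if_clean > expected_cost_if_defective


if __name__ == "__main__":
    # With equal costs the threshold is 0.5; raising cost_fn lowers it,
    # so more modules are labeled defective.
    print(cost_sensitive_threshold(1.0, 1.0))
    print(cost_sensitive_threshold(1.0, 5.0))
    print(predict_defective(0.3, cost_fp=1.0, cost_fn=1.0))
    print(predict_defective(0.3, cost_fp=1.0, cost_fn=5.0))
```

Making a missed defect more expensive than a false alarm lowers the decision threshold, which is one simple way a classifier can be inclined toward the minority (defective) class on imbalanced data.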
