Class Imbalance Reduction and Centroid based Relevant Project Selection for Cross Project Defect Prediction


Kiran Kumar Bejjanki
Sai Priyanka Kanchanapally
Mahesh Kumar Thota

Abstract

Cross-Project Defect Prediction (CPDP) is the process of predicting defects in a target project using information from other projects. This can assist developers in prioritizing their testing efforts and finding flaws. Software Defect Prediction (SDP) is a widely studied topic in software engineering that plays a critical role in software quality assurance. Transfer Learning (TL) has frequently been used in CPDP to improve prediction performance by reducing the disparity in data distribution between the source and target projects. In this paper, we use Centroid-based PF-SMOTE for Imbalanced data to address the cross-project class imbalance problem by balancing the datasets, and centroid-based relevant data selection to choose source data for CPDP. Both methods compute the mean of all attributes in each dataset and then calculate the differences between the means of the datasets. For experimentation, the open-source software defect datasets AEEM, Re-Link, and NASA are considered.
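
The centroid comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the function names (`centroid`, `rank_sources_by_centroid`), and the toy metric values are all assumptions made for illustration. Each project is represented by the mean of its attributes, and candidate source projects are ranked by how close their centroid lies to the target's.

```python
import numpy as np


def centroid(features):
    """Mean of every attribute (column) of a project's metric matrix."""
    return np.asarray(features, dtype=float).mean(axis=0)


def rank_sources_by_centroid(target, sources):
    """Rank source projects by centroid distance to the target.

    Euclidean distance is an assumption; the abstract only speaks of
    the "difference between means".
    """
    target_centroid = centroid(target)
    distances = {
        name: float(np.linalg.norm(centroid(data) - target_centroid))
        for name, data in sources.items()
    }
    # Smallest distance first: the most similar source project leads.
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical toy data: rows are modules, columns are two software metrics.
target = np.array([[10.0, 2.0], [12.0, 4.0]])
sources = {
    "project_a": np.array([[11.0, 3.0], [13.0, 3.0]]),
    "project_b": np.array([[50.0, 20.0], [70.0, 40.0]]),
}

ranking = rank_sources_by_centroid(target, sources)
for name, distance in ranking:
    print(f"{name}: {distance:.2f}")
```

In practice the attributes would likely need to be normalized to a common scale before the centroids are compared, since raw software metrics differ widely in range; that step is omitted here.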

Article Details

How to Cite
Bejjanki, K. K., Kanchanapally, S. P., & Thota, M. K. (2023). Class Imbalance Reduction and Centroid based Relevant Project Selection for Cross Project Defect Prediction. International Journal on Recent and Innovation Trends in Computing and Communication, 11(6s), 293–302. https://doi.org/10.17762/ijritcc.v11i6s.6933

References

T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, “Cross-project defect prediction: a large scale experiment on data vs. domain vs. process,” in FSE/ESEC’09. ACM, 2009, pp. 91–100.

B. Turhan, T. Menzies, A. B. Bener, and J. Di Stefano, “On the relative value of cross-company and within-company data for defect prediction,” Empirical Software Engineering, vol. 14, no. 5, pp. 540–578, 2009.

E. N. Akimova, A. Yu. Bersenev, A. A. Deikov, K. S. Kobylkin, A. V. Konygin, I. P. Mezentsev, and V. E. Misilov, “A survey on software defect prediction using deep learning.”

S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, “Benchmarking classification models for software defect prediction: A proposed framework and novel findings,” IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 485–496, 2008.

X.-Y. Jing, S. Ying, Z.-W. Zhang, S.-S. Wu, and J. Liu, “Dictionary learning based software defect prediction,” in ICSE’14. ACM, 2014, pp. 414–423.

T. Wang, Z. Zhang, X.-Y. Jing, and L. Zhang, “Multiple kernel ensemble learning for software defect prediction,” Automated Software Engineering, vol. 23, no. 4, pp. 569–590, 2016.

Z. Xu, J. Liu, X. Luo, Z. Yang, Y. Zhang, P. Yuan, Y. Tang, and T. Zhang, “Software defect prediction based on kernel PCA and weighted extreme learning machine,” Information and Software Technology, vol. 106, pp. 182–200, 2019.

Z. Li, H. Zhang, X.-Y. Jing, J. Xie, M. Guo, and J. Ren, “DSSDPP: Data selection and sampling based domain programming predictor for cross-project defect prediction.”

Z. Li, X.-Y. Jing, and X. Zhu, “Progress on approaches to software defect prediction,” IET Software, vol. 12, no. 3, pp. 161–175, 2018.

M. Shepperd, D. Bowes, and T. Hall, “Researcher bias: The use of machine learning in software defect prediction,” IEEE Transactions on Software Engineering, vol. 40, no. 6, pp. 603–616, 2014.

Y. Zhou, Y. Yang, H. Lu, L. Chen, Y. Li, Y. Zhao, J. Qian, and B. Xu, “How far we have progressed in the journey? An examination of cross-project defect prediction,” ACM Transactions on Software Engineering and Methodology, vol. 27, no. 1, pp. 1–51, 2018.

S. Herbold, A. Trautsch, and J. Grabowski, “A comparative study to benchmark cross-project defect prediction approaches,” IEEE Transactions on Software Engineering, vol. 44, no. 9, pp. 811–833, 2018.

S. Hosseini, B. Turhan, and D. Gunarathna, “A systematic literature review and meta-analysis on cross project defect prediction,” IEEE Transactions on Software Engineering, vol. 45, no. 2, pp. 111–147, 2019.

S. Watanabe, H. Kaiya, and K. Kaijiri, “Adapting a fault prediction model to allow inter language reuse,” in PROMISE’08, 2008, pp. 19–24.

A. E. Camargo Cruz and K. Ochimizu, “Towards logistic regression models for predicting fault-prone code across software projects,” in ESEM’09, 2009, pp. 460–463.

C. Ni, W. S. Liu, X. Chen, Q. Gu, D. X. Chen, and Q. G. Huang, “A cluster based feature selection method for cross-project software defect prediction,” Journal of Computer Science and Technology, vol. 32, no. 6, pp. 1090–1107, 2017.

Y. Zhang, L. O. David, X. Xia, and J. Sun, “Combined classifier for cross-project defect prediction: an extended empirical study,” Frontiers of Computer Science, vol. 12, no. 2, pp. 280–296, 2018.

J. Nam, S. J. Pan, and S. Kim, “Transfer defect learning,” in ICSE’13. IEEE, 2013, pp. 382–391.

Y. Ma, G. Luo, X. Zeng, and A. Chen, “Transfer learning for cross company software defect prediction,” Information and Software Technology, vol. 54, no. 3, pp. 248–256, 2012.

C. Liu, D. Yang, X. Xia, M. Yan, and X. Zhang, “A two-phase transfer learning model for cross-project defect prediction,” Information and Software Technology, vol. 107, pp. 125–136, 2019.

Z. Li, J. Niu, X.-Y. Jing, W. Yu, and C. Qi, “Cross-project defect prediction via landmark selection-based kernelized discriminant subspace alignment,” IEEE Transactions on Reliability, vol. 70, no. 3, pp. 996–1013, 2021.

X. Xia, D. Lo, S. J. Pan, N. Nagappan, and X. Wang, “Hydra: massively compositional model for cross-project defect prediction,” IEEE Transactions on Software Engineering, vol. 42, no. 10, pp. 977–998, 2016.

L. Chen, B. Fang, Z. Shang, and Y. Tang, “Negative samples reduction in cross-company software defects prediction,” Information and Software Technology, vol. 62, pp. 67–77, 2015.

D. Ryu, J.-I. Jang, and J. Baik, “A transfer cost-sensitive boosting approach for cross-project defect prediction,” Software Quality Journal, vol. 25, no. 1, pp. 235–272, 2017.

L. Gong, S. Jiang, and L. Jiang, “An improved transfer adaptive boosting approach for mixed-project defect prediction,” Journal of Software: Evolution and Process, vol. 31, no. 10, pp. 1–28, 2019.

D. Ryu, O. Choi, and J. Baik, “Value-cognitive boosting with a support vector machine for cross-project defect prediction,” Empirical Software Engineering, vol. 21, no. 1, pp. 43–71, 2016.

Shankarpure, M. R., & Patil, D. D. (2023). A Comprehensive Survey on Methods and Techniques for Automated Fruit Plucking. International Journal of Intelligent Systems and Applications in Engineering, 11(1), 156–168. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2454

X.-Y. Jing, F. Wu, X. Dong, and B. Xu, “An improved SDA based defect prediction framework for both within-project and cross-project class-imbalance problems,” IEEE Transactions on Software Engineering, vol. 43, no. 4, pp. 321–338, 2017.

D. Ryu, J. Jang, and J. Baik, “A hybrid instance selection using nearest-neighbor for cross-project defect prediction,” Journal of Computer Science and Technology, vol. 30, no. 5, pp. 969–980, 2015.

Z. Xu, S. Pang, T. Zhang, X. Luo, J. Liu, Y. Tang, X. Yu, and L. Xue, “Cross project defect prediction via balanced distribution adaptation based transfer learning,” Journal of Computer Science and Technology, vol. 34, no. 5, pp. 1039–1062, 2019.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. J. Smola, “A kernel method for the two-sample-problem,” in NIPS’07. MIT Press, 2007, pp. 513–520.

K. K. Bejjanki and S. P. Kanchanapally, “Centroid-based PF-SMOTE for Imbalanced data,” in International Conference on Mathematical Sciences and Emerging Applications in Technology (ICMSEAT-2022) (in collaboration with APTSMS), September 9–11, 2022.