A New Improved Prediction of Software Defects Using Machine Learning-based Boosting Techniques with NASA Dataset

Jayanti Goyal
Ripu Ranjan Sinha

Abstract

Predicting when and where defects will appear in software can help improve quality and reduce software testing costs. Machine learning methods can be used to predict defects in individual software modules. Software defect prediction datasets, however, have two major problems: class imbalance (there are far fewer defective modules than non-defective ones) and noisy attributes (a result of irrelevant features), both of which make accurate prediction difficult. If these two issues are left unaddressed, the performance of the machine learning model suffers greatly: the model overfits and the classification results are biased. In this research, we propose machine learning approaches that improve the performance of the CatBoost and Gradient Boost classifiers for software defect prediction. The Random Over Sampler method addresses the class imbalance problem, and the Mutual info classification method addresses the feature selection problem inherent in software defect prediction. Eleven datasets from NASA's data repository, "Promise," were used in this study. Using 10-fold cross-validation, we classified these 11 datasets and found that the proposed technique outperformed the baseline classifiers by a significant margin on all of them. The proposed methods were evaluated on their ability to predict software defects using the following measures: Accuracy, Precision, Recall, F1 score, ROC values, RMSE, MSE, and MAE. We also compared our model with other defect prediction methods and found that it outperformed them all. The experiments show that the detection rate of the proposed model is higher than that of conventional models.
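To make the class imbalance step and the evaluation measures concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: `random_oversample` and `binary_metrics` are hypothetical helper names, the toy labels are invented for illustration, and the oversampling routine only mirrors the general idea behind a random over-sampler (duplicating randomly chosen minority-class rows until the classes are balanced).

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1, MAE, MSE and RMSE for
    0/1 labels, where 1 marks a defective module (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    mae = sum(abs(t - p) for t, p in pairs) / n
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


# Toy data (invented): 8 non-defective modules (0) and 2 defective ones (1).
toy_rows = [[i, i * 2] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels, seed=42)
print(Counter(balanced_labels))

# Toy predictions (invented) to show the evaluation measures.
scores = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(scores)
```

In a cross-validation setting, resampling of this kind is normally applied to the training folds only, so that duplicated rows do not leak into the fold used for testing; the abstract does not state how this was handled in the study.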

How to Cite
Goyal, J., & Sinha, R. R. (2023). A New Improved Prediction of Software Defects Using Machine Learning-based Boosting Techniques with NASA Dataset. International Journal on Recent and Innovation Trends in Computing and Communication, 11(10s), 492–504. https://doi.org/10.17762/ijritcc.v11i10s.7659
