Reinforcement of the Bank Loan Model using the Feature Selection Method of Machine Learning

Noopur Goel
Durgesh Kumar Singh

Abstract

Do feature selection and machine learning (ML) guarantee the effectiveness of a bank credit system model? This article analyzes that question. In finance, expert-based credit risk models still dominate. In this study, we establish a new benchmark using consumer data and present machine learning methods. A risk prediction that is as accurate as possible is an important requirement for credit scoring models. In addition, regulators expect the models to be auditable and transparent. As a result, the superior predictive power of contemporary machine learning algorithms cannot be fully utilized in credit scoring, because very simple predictive models, such as several ML classifiers, are still widely used. Consequently, significant potential is missed, which increases reserves or the number of credit defaults. This article presents a framework for comparing the scores of transparent, auditable, and explainable machine learning models before and after feature selection, together with the dimensions that need to be considered to make credit scoring models understandable. Following this framework, we give an overview of the models, demonstrate how the framework can be used in credit scoring, and compare the results with the interpretability of scorecards. The model presented demonstrates that machine learning techniques can enhance predictive power while retaining a comparable level of interpretability.
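
To make the idea of comparing scores before and after feature selection concrete, the following is a minimal sketch and not the authors' pipeline. Everything in it is an assumption made for illustration: the synthetic data (two informative and four noise features standing in for loan records), the Pearson-correlation filter used as the selection step, the nearest-centroid classifier, and all function names.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def make_data(n, rng):
    """Hypothetical data: columns 0-1 depend on the label, columns 2-5 are noise."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * y, 1.0), rng.gauss(-1.5 * y, 1.0)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(4)]
        rows.append(informative + noise)
        labels.append(y)
    return rows, labels


def select_top_k(rows, labels, k):
    """Filter-style selection: keep the k columns most correlated with the label."""
    n_cols = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def centroid_accuracy(train_x, train_y, test_x, test_y, cols):
    """Accuracy of a nearest-centroid classifier restricted to the given columns."""
    centroids = {}
    for label in (0, 1):
        members = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [statistics.fmean(r[j] for r in members) for j in cols]
    correct = 0
    for r, y in zip(test_x, test_y):
        dist = {
            lab: sum((r[j] - c) ** 2 for j, c in zip(cols, cen))
            for lab, cen in centroids.items()
        }
        correct += int(min(dist, key=dist.get) == y)
    return correct / len(test_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(400, rng)
test_x, test_y = make_data(200, rng)

all_cols = list(range(6))
selected = select_top_k(train_x, train_y, k=2)  # chosen on training data only

acc_before = centroid_accuracy(train_x, train_y, test_x, test_y, all_cols)
acc_after = centroid_accuracy(train_x, train_y, test_x, test_y, selected)

print("selected columns:", selected)
print("accuracy before selection:", round(acc_before, 3))
print("accuracy after selection:", round(acc_after, 3))
```

The selection step is fitted on the training split only, so the held-out accuracy is not influenced by the test labels. Whether the score improves after selection depends on the data and the classifier; the sketch only shows how the two scores can be placed side by side.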

How to Cite
Goel, N., & Singh, D. K. (2023). Reinforcement of the Bank Loan Model using the Feature Selection Method of Machine Learning. International Journal on Recent and Innovation Trends in Computing and Communication, 11(7s), 126–137. https://doi.org/10.17762/ijritcc.v11i7s.6984
