Improving Phishing Website Detection with Machine Learning: Revealing Hidden Patterns for Better Accuracy

Garlapati Narayana
Uma Devi Manchala
Usikela Naresh
Saggurthi Kiran
Medikonda Asha Kiran
Ravi Kumar Ch

Abstract

Phishing attacks remain a significant threat to internet users worldwide, causing substantial financial losses and exposing personal information. This study evaluates several machine learning models for detecting phishing websites, with a primary focus on accuracy. After extensive analysis, the Random Forest classifier emerged as the most suitable choice for this task. Our methodology uses machine learning to uncover subtle patterns and relationships in the data, going beyond the limits of traditional URL-based and content-based detection. By combining diverse website features, including URL and derived attributes, page source code-based features, HTML and JavaScript-based features, and domain-based features, the proposed approach correctly classified the majority of websites, achieving an accuracy of over 98%, recall exceeding 98%, and a false positive rate of less than 4%. These results demonstrate the effectiveness of machine learning for phishing website detection. This research contributes to the field of cybersecurity and helps provide internet users with improved protection against phishing attempts.
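
The three reported metrics follow from the counts of correct and incorrect predictions. The sketch below is illustrative only: it assumes labels are encoded as 1 for phishing and 0 for legitimate, and the names `ConfusionCounts`, `count_outcomes`, and the toy label lists are hypothetical, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # phishing sites correctly flagged as phishing
    fp: int  # legitimate sites wrongly flagged as phishing
    tn: int  # legitimate sites correctly passed
    fn: int  # phishing sites that were missed


def count_outcomes(y_true, y_pred):
    """Tally outcomes, assuming 1 = phishing and 0 = legitimate."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c):
    """Share of all websites classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def recall(c):
    """Share of phishing websites that were detected."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def false_positive_rate(c):
    """Share of legitimate websites wrongly flagged as phishing."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


# Toy labels for illustration only; these are not the study's data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = count_outcomes(y_true, y_pred)
print(accuracy(counts), recall(counts), false_positive_rate(counts))
```

Reporting the false positive rate alongside recall matters because a detector can reach high recall simply by flagging most websites, which would block legitimate ones.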

How to Cite
Narayana, G., Manchala, U. D., Naresh, U., Kiran, S., Kiran, M. A., & Ch, R. K. (2023). Improving Phishing Website Detection with Machine Learning: Revealing Hidden Patterns for Better Accuracy. International Journal on Recent and Innovation Trends in Computing and Communication, 11(8), 385–392. https://doi.org/10.17762/ijritcc.v11i8.8353
