Autonomous AI Self-Healing Distributed Systems Using Deep Reinforcement Learning (DRL)

Anuraag Mangari Neburi

Abstract

Modern cloud-native and distributed systems produce complex failures that are difficult to detect and recover from manually or with rule-based heuristics. This paper proposes an Autonomous AI Self-Healing Distributed System based on Deep Reinforcement Learning (DRL). The architecture integrates real-time observability, AI-based fault detection, and a DRL-based action engine that autonomously selects and executes recovery actions. The system was evaluated quantitatively using controlled failure injection. The findings indicate a substantial improvement in Mean Time to Repair (MTTR), higher system availability, and a lower rate of false-positive remediation compared with traditional approaches. The results show that DRL enables efficient, sustained, and autonomous system resilience.

Article Details

How to Cite
Neburi, A. M. (2024). Autonomous AI Self-Healing Distributed Systems Using Deep Reinforcement Learning (DRL). International Journal on Recent and Innovation Trends in Computing and Communication, 12(1), 403–409. https://doi.org/10.17762/ijritcc.v12i1.11835
Section
Articles