Autonomous Data Pipeline Orchestration Using Multi-Agent AI Systems: Architecture, Implementation, and Empirical Evaluation

Tejaskumar Patel, Asadullah Saif Mohammed

Abstract

The rise of large language models (LLMs) and multi-agent AI has begun a paradigm shift in data engineering practice. Enterprise data is growing exponentially, and the number of data schemas is growing with it. Conventional Extract, Transform, Load (ETL) methods rely on manually authored scripts, fragile dependency graphs, and reactive maintenance, which makes their operational overhead and maintenance effort prohibitively high. This paper introduces the Autonomous Data Pipeline Orchestration (ADPO) framework, a multi-agent AI system in which specialised agents autonomously generate, deploy, monitor, self-heal, and govern different kinds of data pipelines with minimal human oversight. The ADPO architecture consists of an LLM backend of GPT-4-class models, a LangChain-based ReAct agent, Apache Airflow 2.8 for workflow scheduling, Kubernetes for elastic container orchestration, and Delta Lake for storing state with ACID (atomicity, consistency, isolation, durability) guarantees. In empirical tests on 150 real-world pipeline scenarios, ADPO reduces pipeline generation time by 78.6% (from 36.4 s to 7.8 s), reduces mean time to repair (MTTR) by 88.1% (from 41.2 min to 4.9 min), and improves the data quality composite score from 63.7% to 91.6% compared with manual baselines. ADPO scales near-linearly up to 500 concurrent pipelines and delivers 51,000 records per second, a 3.6× improvement over rule-based pipeline implementations. These findings position ADPO as a leading proprietary solution for autonomous data engineering, with far-reaching implications for enterprise reliability, compliance, and the transformation of engineering teams.
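The headline percentages follow directly from the before/after values quoted in the abstract. The short Python sketch below reproduces that arithmetic as a reader's check; it is not code from the paper. The helper name `percent_reduction` is hypothetical. The implied rule-based throughput is an inference that assumes the 3.6× figure is a simple ratio of records per second.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction from baseline to improved, in percent."""
    return (baseline - improved) / baseline * 100.0


# Before/after values as quoted in the abstract.
generation_reduction = percent_reduction(36.4, 7.8)  # seconds
mttr_reduction = percent_reduction(41.2, 4.9)        # minutes

# The quality score is reported as two percentages, so the gain is
# expressed here in percentage points rather than as a relative change.
quality_gain_points = 91.6 - 63.7

# Assumption: "3.6x improvement" means ADPO throughput / rule-based throughput.
implied_rule_based_throughput = 51_000 / 3.6  # records per second

print(f"Generation time reduction: {generation_reduction:.1f}%")
print(f"MTTR reduction: {mttr_reduction:.1f}%")
print(f"Quality gain: {quality_gain_points:.1f} percentage points")
print(f"Implied rule-based throughput: {implied_rule_based_throughput:.0f} records/s")
```

Rounded to one decimal place, the two computed reductions match the 78.6% and 88.1% stated in the abstract.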

How to Cite
Tejaskumar Patel, Asadullah Saif Mohammed. (2026). Autonomous Data Pipeline Orchestration Using Multi-Agent AI Systems: Architecture, Implementation, and Empirical Evaluation. International Journal on Recent and Innovation Trends in Computing and Communication, 14(1), 99–106. Retrieved from https://www.ijritcc.org/index.php/ijritcc/article/view/12147