Remedy Agent: Autonomous Data Quality Management through Memory-Augmented LLM Reasoning and Deep Q-Network Optimization
DOI: https://doi.org/10.70589/JRTCSE.2026.14.3.1

Keywords: Agentic AI, data quality, deep Q-network, reinforcement learning, fleet analytics, LLM agents, streaming data, autonomous remediation

Abstract
The proliferation of IoT-enabled fleet telematics systems generates unprecedented volumes of streaming data, presenting critical challenges for maintaining data quality at enterprise scale. This paper introduces a novel agentic AI-driven data quality (DQ) framework that leverages Deep Q-Networks (DQN) for autonomous remediation action optimization. Our architecture integrates an 8-component AI Agent pipeline comprising Input Layer, Python/Airflow Orchestration, Claude 4.0 LLM Agent, Memory Store with resolved checks from JIRA and runbooks from Confluence, Alert Classification (False Alert/True with Runbook/True without Runbook), L2/L3 Escalation pathways, JIRA Integration, and Continuous Learning Loop. The framework employs Model Context Protocol (MCP) for AWS tool integration and implements actor-critic reinforcement learning with Proximal Policy Optimization (PPO) for policy gradient updates. Experimental evaluation across 47,832 production alerts over 90 days demonstrates 94.2% detection accuracy (Δ+28.9%), 67% MTTR reduction (7.1→2.3 hours), 64.8% automated resolution rate, and 312% memory store growth. Comparative analysis against rule-based, ML-only, and LLM-only baselines validates the synergistic benefits of the proposed hybrid architecture. The self-service analytics dashboard achieves 78% adoption with 91% backlog reduction, while the system maintains stable operation over 180 days with predefined convergence guarantees.
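The remediation-action optimization described above can be illustrated with a deliberately simplified sketch. The paper uses a Deep Q-Network; the snippet below substitutes a tabular Q-learner so it runs with no dependencies, and the state and action names (`"null_spike"`, `auto_fix_schema`, etc.) are hypothetical placeholders, not taken from the framework itself.

```python
import random
from collections import defaultdict

# Hypothetical remediation actions for illustration only.
ACTIONS = ["auto_fix_schema", "rerun_pipeline", "quarantine_batch", "escalate_l2"]

class RemediationQLearner:
    """Tabular Q-learning stand-in for the paper's DQN remediation policy."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration rate

    def select_action(self, state):
        # Epsilon-greedy: occasionally explore, otherwise pick the best-known action.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step Q-learning target: reward plus discounted best next-state value.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In the full system a neural network replaces the Q-table so the policy generalizes across unseen alert states, and PPO-based actor-critic updates replace the one-step temporal-difference rule shown here.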
Impact Statement—Current data quality management approaches force enterprises to choose between rule-based systems that offer interpretability but lack adaptability, and black-box ML solutions that provide automation without explainable decision-making. This work resolves that tension by demonstrating that agentic AI architectures combining LLM reasoning with reinforcement learning optimization achieve both interpretable decision processes and autonomous adaptation, a combination previously considered architecturally incompatible. Our key intellectual contribution establishes that memory-augmented learning from organizational knowledge artifacts (resolved JIRA tickets, Confluence runbooks) enables closed-loop optimization, achieving 312% knowledge base growth while maintaining 94.2% accuracy. This challenges the prevailing assumption that enterprise AI systems require extensive manual feature engineering, demonstrating instead that contextual reasoning over unstructured organizational memory provides superior signal extraction. The practical implications extend beyond data quality to enterprise AI adoption broadly. By achieving a 67% MTTR reduction and an 83% autonomous handling rate in production fleet analytics processing 47,832 alerts, we provide the first validated blueprint for deploying agentic AI in mission-critical data operations. The framework's three-way alert classification with automated escalation pathways offers immediately transferable patterns for organizations seeking to operationalize LLM-based autonomous systems while maintaining human oversight. This democratization of intelligent automation capabilities represents a significant advancement for data engineering practitioners facing increasing data volumes with constrained operational resources.
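The three-way alert classification with escalation pathways mentioned above can be sketched as a small routing function. This is a minimal illustration of the decision structure, not the framework's implementation; the next-step strings are hypothetical labels standing in for the JIRA, runbook-execution, and L2/L3 escalation integrations described in the abstract.

```python
from enum import Enum

class AlertClass(Enum):
    FALSE_ALERT = "false_alert"
    TRUE_WITH_RUNBOOK = "true_with_runbook"
    TRUE_WITHOUT_RUNBOOK = "true_without_runbook"

def route_alert(is_true_positive: bool, runbook_found: bool):
    """Map the LLM agent's two judgments onto the three-way scheme.

    is_true_positive: does the agent judge the alert to be a real DQ issue?
    runbook_found: did the memory store (e.g. Confluence runbooks) match?
    Returns (classification, next step) as a tuple.
    """
    if not is_true_positive:
        # False alerts are closed and logged to enrich the memory store.
        return (AlertClass.FALSE_ALERT, "close_and_log")
    if runbook_found:
        # A matched runbook enables fully automated remediation.
        return (AlertClass.TRUE_WITH_RUNBOOK, "execute_runbook")
    # No runbook: escalate to L2/L3 humans and open a JIRA ticket,
    # whose eventual resolution feeds the continuous learning loop.
    return (AlertClass.TRUE_WITHOUT_RUNBOOK, "escalate_l2_l3_and_create_jira")
```

The design point this illustrates is that only the third branch requires human effort, and each escalation produces a new resolved artifact that shrinks that branch over time.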
License
Copyright (c) 2026 Pritam Roy (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.