Synthetic Financial Document Generation and Fraud Detection Using Generative AI and Explainable ML

Authors

  • Akash Vijayrao Chaudhari, Senior Associate, Santander Bank, Florham Park, NJ, USA
  • Pallavi Ashokrao Charate, Senior Systems Analyst, Worldpay, Cincinnati, OH, USA

DOI:

https://doi.org/10.70589/JRTCSE.2025.13.2.6

Keywords:

Financial Document Fraud, Generative AI, LayoutLM, Explainable AI (XAI), Synthetic Data Generation

Abstract

Financial document fraud (e.g., falsified invoices or receipts) is a growing challenge that requires automated solutions. However, training data for detecting such fraud is scarce due to privacy and confidentiality concerns¹. In this paper, we propose a system that generates synthetic financial documents using generative models and uses a transformer-based classifier (LayoutLM) for fraud detection, augmented with explainable AI (XAI) techniques for interpretability. Synthetic document generation (with GANs and diffusion models) expands the training dataset while preserving privacy, enabling improved detection of anomalous or fraudulent documents. A LayoutLM-based model is fine-tuned to classify documents as genuine or fraudulent, leveraging both textual content and layout information. We integrate SHAP and LIME explainability tools to highlight the features (e.g., specific text fields or patterns) that drive each fraud prediction, supporting the "right to explanation" and building user trust. Experiments demonstrate that augmenting training with synthetic documents substantially improves fraud recall and F1-score, while explainability techniques provide insights into the model’s decisions. We also discuss ethical implications, including data privacy, fairness (bias mitigation), and transparency. The results indicate that combining generative data augmentation with explainable deep learning offers a promising approach for financial document fraud detection.
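
As a rough illustration of the pipeline the abstract describes, the sketch below generates synthetic invoice records, scores them, and attributes the score to individual features. It is not the authors' implementation: the template generator stands in for the GAN and diffusion models, the hand-weighted `fraud_score` stands in for the fine-tuned LayoutLM classifier, and the attribution computes exact Shapley values (the quantity SHAP approximates) by enumerating feature orderings. All function names, feature names, weights, and the 10% tax rate are assumptions made for this example.

```python
import itertools
import random

TAX_RATE = 0.10  # assumed flat tax rate for the toy invoices
BASELINE = {"total_mismatch": 0.0, "tax_rate_gap": 0.0, "round_total": 0.0}


def make_invoice(rng, fraudulent=False):
    """Template-based synthetic invoice (stand-in for a GAN/diffusion generator)."""
    items = [round(rng.uniform(5, 500), 2) for _ in range(rng.randint(1, 5))]
    subtotal = round(sum(items), 2)
    tax = round(subtotal * TAX_RATE, 2)
    total = round(subtotal + tax, 2)
    if fraudulent:
        # Simulated tampering: inflate the stated total.
        total = round(total + rng.uniform(50, 500), 2)
    return {"subtotal": subtotal, "tax": tax, "total": total,
            "label": int(fraudulent)}


def featurize(inv):
    """Turn an invoice record into hypothetical numeric fraud features."""
    expected = inv["subtotal"] + inv["tax"]
    return {
        "total_mismatch": abs(inv["total"] - expected),
        "tax_rate_gap": abs(inv["tax"] / inv["subtotal"] - TAX_RATE),
        "round_total": 1.0 if inv["total"] == int(inv["total"]) else 0.0,
    }


def fraud_score(x):
    """Toy hand-weighted scorer (stand-in for a fine-tuned classifier)."""
    return (0.01 * x["total_mismatch"]
            + 5.0 * x["tax_rate_gap"]
            + 0.2 * x["round_total"])


def shapley_values(score_fn, x, baseline):
    """Exact Shapley values by averaging marginal gains over all orderings."""
    names = list(x)
    phi = dict.fromkeys(names, 0.0)
    orderings = list(itertools.permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = score_fn(current)
        for name in order:
            current[name] = x[name]  # switch this feature from baseline to actual
            updated = score_fn(current)
            phi[name] += (updated - previous) / len(orderings)
            previous = updated
    return phi


rng = random.Random(0)
invoice = make_invoice(rng, fraudulent=True)
attributions = shapley_values(fraud_score, featurize(invoice), BASELINE)
top_feature = max(attributions, key=attributions.get)
print("most influential feature:", top_feature)
print(attributions)
```

Enumerating orderings is only practical for a handful of features; with the many token and layout inputs of a real document model, sampling-based approximations such as those in the SHAP and LIME libraries would be needed instead.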

References

R. Gricius and I. Belovas, “Generation of Synthetic Invoices for the Training of Machine Learning Models,” in Proc. ICPRAM, 2023 (abstract).

A. Van Driessche, “Realistic Document Generation using Generative Adversarial Networks,” Ixor Blog (Medium), 2021.

A. Carr, “Diffusion models for document synthesis,” Gretel AI Blog, May 2022.

Y. Xu, M. Li, L. Cui, S. Huang, F. Wei, and M. Zhou, “LayoutLM: Pre-training of Text and Layout for Document Image Understanding,” in Proc. ACM SIGKDD, pp. 1192–1200, 2020.

M. T. Ribeiro, S. Singh, and C. Guestrin, “Why Should I Trust You? Explaining the Predictions of Any Classifier,” in Proc. ACM SIGKDD, pp. 1135–1144, 2016.

Akash Vijayrao Chaudhari, Pallavi Ashokrao Charate (2025). AI-Driven Data Warehousing in Real-Time Business Intelligence: A Framework for Automated ETL, Predictive Analytics, and Cloud Integration. International Journal of Research Culture Society, ISSN(O): 2456-6683, Volume 9, Issue 3, pp. 185-189. Available at https://ijrcs.org/

Akash Vijayrao Chaudhari, Pallavi Ashokrao Charate (2025). Federated Learning in Data Warehousing: A Privacy-Preserving Approach for Distributed Analytics. International Journal of Advance Research, Ideas and Innovations in Technology, 11(1). www.IJARIIT.com.

S. M. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” in Proc. NeurIPS, pp. 4765–4774, 2017.

YData, “Synthetic Data for Aligning ML Models to Business Value,” YData Blog, 2023.

IBM Data and AI Team, “Synthetic data generation: Building trust by ensuring privacy and quality,” IBM Blog, Nov. 2023.

The Royal Society, “Synthetic Data: What, Why and How?” April 2022.

A. Amari et al., “An Efficient Deep Learning-Based Approach to Automating Invoice Document Validation,” in Proc. IEEE AICCSA, 2024.

Chaudhari, A. V., & Charate, P. A. (2024). Data Warehousing for IoT Analytics. International Research Journal of Engineering and Technology (IRJET), 11(6), 311–320.

How to Cite

Akash Vijayrao Chaudhari, & Pallavi Ashokrao Charate. (2025). Synthetic Financial Document Generation and Fraud Detection Using Generative AI and Explainable ML. JOURNAL OF RECENT TRENDS IN COMPUTER SCIENCE AND ENGINEERING (JRTCSE), 13(2), 45-59. https://doi.org/10.70589/JRTCSE.2025.13.2.6