
Machine learning approach to synthetic data generation: Uncertainty generative model with neural attention

  • Martin Kang
  • Gary F. Templeton
  • Seong-Jong Joo
  • Han Kyul Kim
  • West Virginia University
  • Air Force Institute of Technology
  • University of Southern California
  • Loyola Marymount University
  • USC Viterbi School of Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

Data scarcity undermines the precision of empirical and analytical research by limiting sample sizes and reducing statistical power. In domains such as business operations, financial management, and information systems, failure data often arise from rare events, introducing substantial aleatoric and epistemic uncertainty. Existing synthetic data generation methods, including interpolation-based oversampling and generative models, face persistent challenges. They often fail to capture rare events, preserve temporal dependencies, or model multiple sources of uncertainty, leading to unrealistic samples and degraded performance in downstream tasks. This study introduces the uncertainty generative model with neural attention (UGMNA), a synthetic data generation approach that integrates attentive neural processes, the Heston stochastic volatility model, and stochastic differential equations within a continuous-time latent framework. UGMNA addresses data scarcity by generating synthetic samples that emulate the distributional characteristics of original datasets while explicitly modeling both aleatoric and epistemic uncertainty. Its design enhances statistical power by augmenting limited datasets and ensures that synthetic data reflect key patterns, temporal dynamics, and complex distributions encountered in real-world scenarios. Experimental results across multiple case studies demonstrate that UGMNA reduces both types of uncertainty while preserving essential data patterns. Compared with conventional baselines and state-of-the-art generators, UGMNA consistently improves predictive accuracy, ranking performance, and model calibration in data-scarce, high-variance environments. These findings establish UGMNA as a robust framework for generating reliable synthetic data, offering practical utility for research and decision-making in contexts where data scarcity and uncertainty hinder model development.
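For readers unfamiliar with the Heston stochastic volatility model named in the abstract, the following is a minimal sketch of the standard textbook model simulated with an Euler-Maruyama scheme. It is not the authors' UGMNA implementation: the function name `simulate_heston_path`, all parameter values, and the "full truncation" handling of negative variance are assumptions chosen for illustration only.

```python
import math
import random


def simulate_heston_path(s0=100.0, v0=0.04, mu=0.05, kappa=2.0, theta=0.04,
                         xi=0.3, rho=-0.7, t_end=1.0, n_steps=252, seed=0):
    """Simulate one path of the standard Heston model (illustrative only).

    dS = mu * S dt + sqrt(v) * S dW1
    dv = kappa * (theta - v) dt + xi * sqrt(v) dW2,  corr(dW1, dW2) = rho

    All default parameter values are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    s, v = s0, v0
    prices, variances = [s0], [v0]
    for _ in range(n_steps):
        # Draw two standard normals and correlate them with coefficient rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Full truncation: use max(v, 0) wherever the variance enters a
        # drift or diffusion term, so the square root is always defined.
        v_pos = max(v, 0.0)
        # Log-Euler step for the level keeps it strictly positive.
        s *= math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
        # Euler step for the mean-reverting variance process.
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        prices.append(s)
        variances.append(max(v, 0.0))
    return prices, variances


if __name__ == "__main__":
    prices, variances = simulate_heston_path()
    print(len(prices), min(variances) >= 0.0)
```

The sketch shows only how time-varying variance can drive a simulated series. How UGMNA combines such dynamics with attentive neural processes in a latent space is described in the article itself.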
Original language: English
Pages (from-to): 66–85
Number of pages: 20
Journal: Decision Sciences
Volume: 57
Issue number: 1
State: Published - Feb 2026

ASJC Scopus Subject Areas

  • General Business, Management and Accounting
  • Strategy and Management
  • Information Systems and Management
  • Management of Technology and Innovation

Keywords

  • aleatoric uncertainty
  • epistemic uncertainty
  • synthetic method
  • uncertainty
