REALM (Realistic AI Learning for Multiphysics) is the first comprehensive benchmark dedicated to evaluating neural surrogates on realistic spatiotemporal multiphysics flows. Unlike existing benchmarks that rely on simplified, low-dimensional problems, REALM tests models on challenging, application-driven reactive-flow scenarios where traditional solvers struggle.
The benchmark features 11 high-fidelity datasets spanning canonical problems, high-Mach reactive flows, propulsion engines, and fire hazards. Each trajectory requires hundreds to thousands of CPU/GPU hours, placing REALM squarely in the regime where acceleration is practically valuable.
We systematically evaluate 12+ neural surrogate families, including spectral operators, convolutional models, Transformers, pointwise operators, and graph/mesh networks, revealing three critical findings across the 2D, 3D, and irregular-mesh case categories.
* Ignition kernels in homogeneous isotropic turbulence: H₂/O₂ premixed flame evolution with turbulent wrinkling and kernel merging.
* Time-evolving CH₄/O₂ shear jet flame with mixing-driven stabilization and strain-induced extinction/reignition.
* Planar cellular detonation with shock-reaction coupling and characteristic cellular structures.
* Reacting Taylor-Green vortex with flame-vortex interaction, extinction, and reignition dynamics.
* Propagating H₂-air flame in homogeneous isotropic turbulence with varying pressure and turbulence intensity.
* Buoyancy-driven CH₄ pool fire with plume entrainment, puffing, and McCaffrey regime transitions.
* Supersonic H₂ cavity flame with shock-shear-flame interactions and recirculation stabilization.
* Detonation diffraction around an obstacle with potential decoupling and re-initiation.
* Single-element CH₄/O₂ rocket combustor with a shear-coaxial injector and nozzle acceleration.
* Seven-element rocket combustor with multi-injector interactions and 3D turbulent mixing.
* Building facade fire with window venting, a buoyant plume, and facade-guided flame attachment.
| Model | Family | Case | Correlation | Test Error | Params (M) | Infer Time (s) | Details |
|---|---|---|---|---|---|---|---|
* Click on any row to view detailed metrics including training and validation errors.
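The leaderboard's Correlation and Test Error columns can be illustrated with a short sketch. The exact definitions used by REALM are not spelled out here, so the choices below (Pearson correlation over flattened fields, relative L2 norm for test error) and the function names are assumptions:

```python
import numpy as np

def pearson_correlation(pred, true):
    """Pearson correlation between flattened predicted and reference fields.
    Assumed metric: mean-centered dot product over joint norms."""
    p = pred.ravel() - pred.mean()
    t = true.ravel() - true.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-12))

def relative_l2_error(pred, true):
    """Assumed test-error metric: relative L2 norm of the field mismatch."""
    return float(np.linalg.norm(pred - true) / (np.linalg.norm(true) + 1e-12))
```

A perfect rollout scores correlation 1.0 and error 0.0; in autoregressive evaluation these metrics are typically tracked per snapshot so that error growth over the trajectory is visible.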
2D Regular Cases: Scatter plot showing correlation vs. inference speed. Bubble size represents test error (smaller is better), color represents parameter count. Click on bubbles to highlight corresponding models in the leaderboard.
3D Regular Cases: Prediction becomes substantially harder with higher dimensionality; models show increased error and reduced correlation compared to the 2D cases.
Irregular Mesh Cases: Pointwise models (especially DeepONet) show better robustness on irregular meshes compared to spectral/convolutional operators.
Radar chart comparing all 16 benchmarked models across key performance dimensions.
Larger enclosed area indicates better overall trade-off.
Metrics (normalized 0-100%):
* Correlation - prediction accuracy
* Accuracy - error metrics
* Inference Speed - throughput
* Parameter Efficiency - fewer parameters is better
* Memory Efficiency - lower memory footprint
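The normalization and the "larger enclosed area is better" reading can be sketched as follows. The min-max scaling, the inversion of lower-is-better metrics, and the function names are assumptions about how such a radar chart is typically built, not REALM's documented procedure:

```python
import numpy as np

def normalize_scores(values, higher_is_better=True):
    """Min-max normalize raw metric values across models to a 0-100% scale.
    Metrics where smaller raw values are better (error, parameter count,
    memory, latency) are inverted so that 100 always means best."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.full_like(v, 100.0)
    s = (v - v.min()) / span * 100.0
    return s if higher_is_better else 100.0 - s

def radar_area(scores):
    """Area of the polygon a model traces on the radar chart, with metrics
    placed at equal angles; a larger area means a better overall trade-off."""
    r = np.asarray(scores, dtype=float)
    theta = 2 * np.pi / len(r)
    # sum of the triangle areas between consecutive spokes
    return float(0.5 * np.sin(theta) * np.sum(r * np.roll(r, -1)))
```

Note that the polygon area depends on the ordering of the axes, so it is a visual summary rather than a rigorous aggregate score.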
Each dot represents one model-case pair. Dot size encodes correlation (larger = higher correlation), color encodes test error (green = low error, red = high error).
Representative snapshots showing OH mass fraction and velocity fields for the IgnitHIT and EvolveJet cases, plus pressure fields for PlanarDet. FFNO consistently preserves fine-scale structures with the lowest error growth.
Vorticity isosurfaces for ReactTGV, temperature isosurfaces for PoolFire showing buoyancy-driven pulsation, and flame propagation structures in PropHIT. 3D cases show markedly faster error accumulation.
Temperature fields for SupCavityFlame, the MultiCoaxFlame rocket combustor, and FacadeFire. DeepONet is comparatively robust on irregular meshes, while graph-based models tend to over-smooth.
Left: Per-category scatter plots of correlation vs. inference efficiency. Middle: Radar chart of mean performance across all cases. Right: Per-case performance summary showing train/test correlation distributions.