The paper presents a preprocessing and feature-engineering pipeline for data‑driven discovery of governing equations in a decaying, nearly incompressible 3D flow, observed on a 128³ periodic grid with only 10 time snapshots. The authors estimate temporal derivatives via second‑order local polynomial regression at each grid point, and benchmark FFT‑based spectral differentiation against a WENO5 finite-difference scheme for spatial derivatives, selecting the spectral method based on slightly better enforcement of incompressibility. They then estimate an effective global kinematic viscosity through linear regression of \(\partial \mathbf{u}/\partial t\) on \(\nabla^2 \mathbf{u}\) and analyze the resulting momentum residuals to infer the importance of unmeasured pressure gradients and motivate additional proxy terms. All primary fields and derived features are independently standardized.
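For readers who want a concrete picture of the two operations summarized above, the following is a minimal sketch and not the authors' implementation. It assumes a cubic periodic domain of side \(2\pi\), a velocity array of shape `(3, n, n, n)`, and a viscosity fit that regresses \(\partial \mathbf{u}/\partial t\) on \(\nabla^2 \mathbf{u}\) alone by ordinary least squares through the origin. The function names `spectral_derivative`, `divergence`, `laplacian`, and `fit_viscosity` are hypothetical.

```python
import numpy as np


def spectral_derivative(f, axis, length=2.0 * np.pi):
    """First derivative of a periodic field along `axis` via the FFT.

    Assumes a uniform grid covering a period of size `length` (an assumption,
    not a detail taken from the manuscript).
    """
    n = f.shape[axis]
    # Angular wavenumbers multiplied by i, i.e. the Fourier symbol of d/dx.
    ik = 2j * np.pi * np.fft.fftfreq(n, d=length / n)
    shape = [1] * f.ndim
    shape[axis] = n
    f_hat = np.fft.fft(f, axis=axis)
    return np.real(np.fft.ifft(ik.reshape(shape) * f_hat, axis=axis))


def divergence(vel):
    """Divergence of a velocity array of shape (3, n, n, n)."""
    return sum(spectral_derivative(vel[i], axis=i) for i in range(3))


def laplacian(f):
    """Laplacian of a scalar field, built from repeated first derivatives."""
    return sum(
        spectral_derivative(spectral_derivative(f, axis=i), axis=i)
        for i in range(f.ndim)
    )


def fit_viscosity(dudt, lap_u):
    """Least-squares slope nu in dudt ~ nu * lap_u (no intercept).

    Advection and pressure terms are deliberately ignored here, so any
    mismatch ends up in the residual dudt - nu * lap_u.
    """
    a = np.ravel(lap_u)
    b = np.ravel(dudt)
    return float(np.dot(a, b) / np.dot(a, a))
```

The maximum of `np.abs(divergence(vel))` gives one possible incompressibility metric for comparing derivative schemes, and the residual left after subtracting `nu * lap_u` is the quantity whose structure would point to unmeasured pressure gradients.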
A 26‑term feature library is constructed to support subsequent sparse-regression-based equation discovery, including advection, viscous, density and density‑gradient terms, divergence gradients, kinetic‑energy gradients, and velocity–density couplings such as components of \(\rho'\nabla \mathbf{u}\). The pipeline is evaluated using RMSE of temporal fits in both smooth and sharp‑gradient regions, incompressibility metrics for the spatial schemes, and qualitative and quantitative analysis of residuals against the incompressible Navier–Stokes momentum balance. The authors emphasize that the outcome is an “optimally conditioned” feature matrix and corresponding temporal derivative targets intended for later sparse regression, which is not carried out in this work.
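The conditioning claim could be checked directly on the assembled feature matrix. The sketch below is illustrative only: it assumes that "standardized" means zero mean and unit variance per column (the summary does not specify this), that the features are stacked as columns of a 2D array, and that the 2-norm condition number is an acceptable diagnostic. The function name `standardize_columns` is hypothetical.

```python
import numpy as np


def standardize_columns(theta):
    """Return a copy of `theta` with each column shifted to zero mean and
    scaled to unit (population) standard deviation.

    `theta` has shape (n_samples, n_features). Constant columns are rejected
    because they cannot be scaled to unit variance.
    """
    theta = np.asarray(theta, dtype=float)
    mean = theta.mean(axis=0)
    std = theta.std(axis=0)
    if np.any(std == 0.0):
        raise ValueError("constant feature column cannot be standardized")
    return (theta - mean) / std
```

Comparing `np.linalg.cond(theta)` with `np.linalg.cond(standardize_columns(theta))` shows how much of the ill-conditioning comes from scale differences between features and how much from genuine collinearity, which standardization does not remove.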
The manuscript is clearly written and provides a coherent end‑to‑end narrative from dataset description and normalization to derivative estimation, residual analysis, and feature construction. However, several key choices (e.g., temporal fitting strategy, viscosity estimation, selection and interpretation of proxy features, and derivative-scheme benchmarking) are only partially justified. In addition, because no equation‑discovery experiment is actually performed, it is difficult to assess the effectiveness and robustness of the proposed pipeline or the physical interpretability of the resulting feature set.