Differentiable centisecond halo-model predictions in ΛCDM and beyond

Boris Bolliet, Claude Code
2026-05-26 14:15:26 AOE Reviewed by Skepthical
4 review section(s)
Official Review by Skepthical · 2026-05-26

The manuscript presents classy_szlite, a pure-JAX halo-model pipeline for fast and differentiable predictions of the thermal SZ angular power spectrum, with emphasis on $C_\ell^{yy}$. It combines CosmoPower ede-v2 neural emulators for cosmology-dependent inputs, FFTLog-based transforms (for $\sigma(R,z)$ and pressure-profile Fourier windows), and a JAX-native integration design that enables automatic differentiation and millisecond-level evaluations (reported as $\sim 5~{\rm ms}$ for fixed-cosmology evaluations and $\sim 20~{\rm ms}$ including cosmology setup; Sec. 2–3). The paper demonstrates (Sec. 5) gradient-based MAP estimation (L-BFGS(-B), Newton), and compares RW-MH (cobaya) versus NUTS (NumPyro) on an 8-bandpower tSZ dataset at two fixed cosmologies, reporting modern diagnostics (R-hat, ESS, divergences, E-BFMI) and validating gradients/Fisher matrices against finite differences. The work is timely and potentially impactful for inference, forecasting, and gradient-based methods in halo-model applications, but key aspects require clarification and stronger empirical support: the precise scope of “end-to-end differentiability” given non-JAX FFTLog components, the specification/reproducibility of the data likelihood and priors, validation against established reference pipelines (e.g. class_sz/CAMB/CLASS), and fair/fully documented benchmarking and sampler comparisons.

Clear motivation and positioning: explains why differentiability (not only speed) enables new inference workflows (Sec. 1, Sec. 6).
Technically coherent architecture: JAX-first design with a practical “factory/closure” pattern to amortize cosmology setup and accelerate nuisance-only sampling (Sec. 3.3).
Strong practical demonstration: MAP optimization + NUTS sampling with comprehensive diagnostics (R-hat, ESS, divergences, E-BFMI) and an informative RW-MH vs NUTS comparison (Sec. 5.1–5.6).
Good numerical self-checks: gradients and Fisher matrices compared to finite differences with near machine-precision agreement (Sec. 5.6, Sec. 5.8).
Open-source/reproducibility-oriented: code and emulator weights are made available, and the manuscript largely ties methodology to implementable components (Data Availability; Sec. 3, Sec. 6).
Forward-looking discussion: outlines extensions to other tracers and differentiable inference paradigms (Fisher forecasts, VI, SBI), and comments on hardware backends (Sec. 6.1–6.3).
  • **The manuscript repeatedly claims an “end-to-end”, “fully differentiable”, “JAX-traceable” pipeline, but Sec. 3.2–3.3 indicate reliance on non-JAX FFTLog / `mcfit.TophatVar` NumPy code paths that are “not safe to JIT-trace”. As written, it is unclear which parts are actually inside the autodiff graph. This affects the scope of supported gradients—especially derivatives with respect to cosmological parameters if $\sigma(R,z)$ (and any downstream quantities depending on it) is computed outside JAX.** *Recommendation:* In Sec. 3.2–3.3, explicitly state what “differentiable” means in the public implementation. Add a compact “differentiability matrix” (table) listing which outputs (e.g. $C_\ell$, bandpowers, likelihood) are differentiable with respect to which parameter blocks (cosmology vs pressure/profile vs nuisance) and whether gradients pass through: (i) emulator calls, (ii) $\sigma(R,z)$/FFTLog, (iii) HMF/bias, (iv) interpolation steps, (v) halo integrals. If $\sigma(R,z)$ is precomputed and treated as a constant inside the closure at fixed cosmology, say so and constrain the “end-to-end” language accordingly in Sec. 1–3 and Sec. 6. If cosmology gradients are intended, describe (even briefly) the concrete plan (JAX-native FFTLog, custom JVP/VJP, or alternative $\sigma$ computation) and how this would affect timings.
  • **The worked-example likelihood is under-specified and therefore hard to reproduce/interpret (Sec. 5.1, Sec. 5.3–5.5). The source of the 8 bandpowers, their exact $\ell$-binning/window functions, whether the data are real or synthetic, how the full $8\times 8$ covariance was obtained (and its correlation structure), and the priors / parameterization for $(P_0,\beta)$ are not fully stated. Because the posterior exhibits long tails, priors and parameter bounds materially affect results and sampler behavior.** *Recommendation:* Expand Sec. 5 with a precise likelihood specification: identify the dataset origin (or state it is synthetic), provide the 8 bandpowers with $\ell$-bin edges/centers and uncertainties (table or appendix), define or reference the bandpower window functions used, and describe covariance estimation (analytic vs simulations; diagonal vs correlated; any conditioning). Write the Gaussian likelihood explicitly in terms of the data vector and covariance. State priors (forms and bounds) and whether parameters are sampled in linear/log space, and clarify whether MAP includes priors (Sec. 5.3). If the full numerical vectors/matrices live in the repository, explicitly point to file paths and ensure they match the manuscript.
  • **Scientific/technical validation against established reference calculations is currently too thin given the central reliance on (i) CosmoPower ede-v2 emulators (Sec. 3.1) and (ii) a new numerical/integration implementation. There is no in-paper end-to-end comparison of $C_\ell^{yy}$ (1h/2h/total) against a CLASS/CAMB-based pipeline (e.g. class_sz) over representative cosmologies, nor a discussion of how emulator errors propagate to downstream halo-model observables and inferred parameters.** *Recommendation:* Add a validation subsection (Sec. 3 or Sec. 5) comparing classy_szlite to a reference pipeline (class_sz/CLASS/CAMB-based) for multiple cosmologies within the ede-v2 training domain and at least one profile setting. Show fractional differences versus $\ell$ (and ideally bandpower-level differences) for 1h, 2h, and total. Summarize emulator accuracy relevant for tSZ inputs (Sec. 3.1) and briefly discuss error propagation to $C_\ell^{yy}$ and to degeneracies (e.g. $\sigma_8$–$P_0$). Also state how points near emulator boundaries are handled.
  • **A central advertised use case is higher-dimensional joint cosmology+astrophysics inference (Sec. 1, Sec. 5.2, Sec. 6.1), yet the main demonstration is restricted to a 2-parameter $(P_0,\beta)$ inference at fixed cosmology (Sec. 5). Scaling claims for NUTS and the differentiable approach at $d\gtrsim 6$–30 are extrapolated rather than empirically shown in this work.** *Recommendation:* Add at least one modest higher-dimensional example using the same pipeline and NUTS (e.g. include $B$ and one additional GNFW shape parameter; or a small joint cosmology+profile run with $d\sim 6$–10, even with informative priors or a simplified/mock likelihood). Report wall time, gradient-evaluation counts, ESS, R-hat, divergences, and (ideally) ESS-per-gradient-evaluation. If infeasible, soften the language in Sec. 1, Sec. 5.2, Sec. 5.5, and Sec. 6.1 to clearly label higher-$d$ performance as an expectation rather than a demonstrated result.
  • **Timing benchmarks and sampler-comparison methodology are not fully normalized or clearly documented (Sec. 2; Sec. 5.1–5.5; Sec. 6.2; Fig. 1). The RW-MH vs NUTS comparison mixes differing parallelism (e.g. RW-MH with “4 MPI walkers” vs NUTS chain execution assumptions), warmup/compilation inclusion is unclear, hardware details differ across locations (laptop CPU vs EPYC vs TPU), and the headline “$\sim 100\times$” statement appears numerically inconsistent in at least one place (Sec. 5.5).** *Recommendation:* For Sec. 5.1–5.5, state a clear protocol: (i) whether reported times include JAX/XLA compilation and warmup; (ii) whether NUTS chains are run sequentially or in parallel; (iii) for RW-MH, proposal tuning/adaptation, warmup, and whether “4 MPI walkers” implies 4 cores used concurrently (and whether times are wall-clock vs CPU-time). Report efficiency in units that factor out parallelism differences (e.g. ESS per forward-model evaluation, ESS per gradient evaluation) in addition to ESS/sec. Correct the inconsistent speedup arithmetic in Sec. 5.5 and reconcile the NUTS wall-time discrepancy between Table 1 and Fig. 4 caption by explicitly stating what differs (sample budget, hardware, warmup inclusion, parallelization). For Fig. 1 / Sec. 6.2, standardize time units, document thread settings (OpenMP/JAX/XLA), and provide reproducible benchmark scripts and variability/error bars.
  • **Core mathematical definitions for the tSZ window function are not fully verifiable from the manuscript (Sec. 3.2; Sec. 4). In particular, $W_y(\ell,M,z)$ uses symbols that are not defined (e.g. $\ell_{500}$, $J_\ell$), the prefactor is dimensionally ambiguous, and consistency with Eq. (1)’s transform convention is difficult to audit.** *Recommendation:* Define $\ell_{500}$ explicitly (e.g. via $\ell_{500}=D_A(z)/r_{500}$ or equivalent) and provide an explicit definition of $J_\ell[\cdot]$ (including whether it contains the $r^2dr$ measure and any projection factors). Re-check and state the dimensional consistency so that Compton-$y$ is dimensionless, and ensure the $W_y$ definition is demonstrably consistent with Eq. (1) under the substitutions $r=x r_{500}$ and $k=k_\ell$ (Sec. 4).
  • **The profile Fourier-transform integral is written to $r=\infty$ (Eq. (1), Sec. 3.2) while the inference explores outer slopes around $\beta\approx 2.7$ (Sec. 5.3–5.4), for which integrals like $\int r^2 P(r)dr$ can fail to converge without truncation/apodization. The manuscript does not state an $r_{\max}$, truncation scheme, or convergence/regularization strategy, so low-$k$/low-$\ell$ well-posedness is unclear from the PDF alone.** *Recommendation:* State explicitly whether the GNFW/pressure profile is truncated (and at what radius, e.g. multiple of $r_{500}$ or $r_{\rm vir}$), apodized, or otherwise regularized before applying Eq. (1). Document the effective $r$-grid used in the implementation and clarify under what conditions on $\beta$ the transform exists as $k\to 0$. If finite-$r$ limits are used in code, reflect that in the mathematical description (Eq. (1) and surrounding text in Sec. 3.2).
  • **Key numerical settings controlling accuracy and runtime are not systematically documented (Sec. 3.1–3.3; Sec. 5; Sec. 6.2). Important details (mass/redshift grid ranges and resolution, FFTLog configuration, interpolation schemes, $\ell$ sampling, integration rules, and default configuration used to reproduce figures/timings) are scattered or implicit, which limits strict reproducibility and makes it harder to judge accuracy/runtime trade-offs.** *Recommendation:* Add a dedicated “Numerical configuration” subsection (e.g. Sec. 3.4 or a preamble to Sec. 5) that lists: (i) $z$ and $M$ grid definitions (ranges, spacing, sizes); (ii) FFTLog settings for $\sigma(R,z)$ and pressure transforms (grid sizes, bias/padding, extrapolation); (iii) $\ell$ arrays and bandpower windowing; (iv) integration strategy (vectorization, quadrature, any adaptivity); (v) runtime scaling with grid sizes. Point to a single default config file / parameter dictionary in the repository that reproduces the paper’s figures and timings.
  • The general “tracer-agnostic” halo-model template claim (Sec. 4; Sec. 6.1–6.3) is currently more conceptual than demonstrated, since only tSZ $C_\ell^{yy}$ is implemented and shown. Extensions to kSZ$^2$, CIB, galaxy–lensing, cluster counts typically require additional ingredients (HOD/selection functions, mass–observable relations, redshift kernels) beyond merely swapping $W_T$. *Recommendation:* Either (i) temper the generality claims in the Abstract/Sec. 4/Sec. 6 to clearly state that the current public focus is tSZ $C_\ell^{yy}$, or (ii) add one concrete non-tSZ worked example (e.g. a simple $y\times g$ cross-spectrum) including the explicit kernel/window and timing. In Sec. 6.3, briefly enumerate the additional modeling components required per tracer and what is realistically planned for near-term releases.
  • Physical interpretation of the worked-example GNFW parameter shifts and degeneracies is easy to overread as astrophysical tension (Sec. 5.3–5.4), even though many relevant quantities are fixed (e.g. hydrostatic bias $B$, other GNFW parameters, cosmology). *Recommendation:* Add a concise caveat in Sec. 5.3–5.4 listing what is fixed ($B$, $c_{500}$, $\gamma$, $\alpha$, cosmology, etc.) and how these assumptions could shift inferred $(P_0,\beta)$. Optionally include a small sensitivity test (vary $B$ or swap profile model) or point directly to repository scripts enabling such tests, emphasizing that Sec. 5 is primarily methodological.
  • Notation inconsistency in Sec. 5: the heading refers to “tSZ $C_\ell^{\gamma\gamma}$ bandpowers” while the observable elsewhere is $C_\ell^{yy}$. This can be confusing (shear $\gamma$ vs Compton-$y$). *Recommendation:* Standardize Sec. 5 and relevant figure captions to $C_\ell^{yy}$ (or explicitly define $\gamma$ if it is intended to denote $y$).
  • ESS / integrated autocorrelation time definitions are internally inconsistent (Sec. 5.1): ESS is written as $N/(1+2\,\tau_{\rm int})$ but later reasoning uses ESS $\approx N/\tau_{\rm int}$, which yields different numbers for the same $\tau_{\rm int}$. *Recommendation:* Define $\tau_{\rm int}$ precisely (including whether it includes the lag-0 term and whether it is per-chain or pooled) and use one consistent ESS relation throughout; update the text so the reported ESS values match the stated convention.
  • Units conventions are not fully explicit: Sec. 4 gives $dV/(d\Omega dz)=\chi^2/H(z)$ while elsewhere $c$ is explicit (e.g. $m_e c^2$ in $W_y$). Without a statement (e.g. $c=1$ in the cosmological part, or $H$ in ${\rm Mpc}^{-1}$), dimensional consistency is harder to audit. *Recommendation:* Add a brief units convention statement near Sec. 4 (and/or in a notation table), clarifying whether $c=1$ is assumed for $\chi,H$ or how emulator outputs encode units.
  • Sampler-performance conclusions rely heavily on a mean-based “accuracy” metric for one parameter (Fig. 10; Sec. 5.5). Given non-Gaussian/heavy-tailed posteriors, mean error alone may not capture posterior agreement robustly. *Recommendation:* Complement the mean-based metric with at least one additional posterior-distance/summary metric (e.g. error on both $(P_0,\beta)$ mean and covariance; KS/Wasserstein distance for 1D marginals; or bandpower-predictive checks). Also report ESS/sec (or ESS per eval) for both parameters, not only $P_0$.
  • Interpolation and potential non-smoothness: if the implementation uses operations like `searchsorted` for piecewise interpolation on grids, gradients can become non-smooth and may affect HMC/NUTS stability in some regimes (Sec. 3.3; implied by JAX implementation choices). *Recommendation:* Briefly document the interpolation scheme(s) used for emulator outputs and grids, and comment on whether any non-smooth steps exist in the computational graph. If relevant, note tested stability (divergences) and/or consider smoother interpolation alternatives for parameters that move grid indices.
  • Figure set (esp. Fig. 1 and figures in Sec. 5) contains several presentation/reproducibility issues flagged by the structured review: mixed/inconsistent time units (Fig. 1), unclear inclusion/exclusion of compilation/warmup, incomplete axis/legend definitions, and potential accessibility issues (small fonts, color choices). *Recommendation:* Standardize units (prefer ms), ensure captions are self-contained (hardware + protocol + sample sizes), add error bars/variability where appropriate, and adopt colorblind-safe palettes with redundant encodings (markers/linestyles).
  • Emulator interface documentation (Sec. 3.1) mixes standard cosmological parameters with emulator-specific conventions/symbols (e.g. $N_{\mathrm{ut}}$; outputs like $\log_{10}(k^3P_k)$), which may hinder external use. *Recommendation:* Add a small table in Sec. 3.1 listing each emulator, its inputs, outputs, units/conventions, and grid definitions, and define any nonstandard symbols at first use.
  • Tracer bias notation ambiguity: Eq. (3) uses $b_T(M,z)$ while elsewhere a single halo bias model is implied (Sec. 3.3; Tinker-10). *Recommendation:* Clarify whether $b_T\equiv b_{\rm halo}(M,z)$ for all tracers in this paper or whether tracer-specific effective biases are intended in the general template.
  • Table 1 contains at least one ambiguous numeric format (“12.3/6”) and could be clearer about whether entries are $\chi^2$/dof or reduced $\chi^2$, and which timings include warmup/compilation. *Recommendation:* Label columns unambiguously (separate $\chi^2$ and dof, or state explicitly $\chi^2$/dof) and add footnotes specifying timing definitions (warmup, compilation, parallelism assumptions).
  • Typographical/naming inconsistencies: “classy_szLite” vs “classy_szlite”, “cl_yyFACTORY” vs “cl_yyFactory” vs “cl_yy_factory”, “c1_yy” vs “cl_yy”, “TSZ” vs “tSZ”, inconsistent capitalization of hardware names (Sec. 2; Sec. 3.3; Sec. 5; Sec. 6.2). *Recommendation:* Do a final standardization pass across text, figures, and code references: choose one spelling for the package and key API functions, standardize “tSZ” and $C_\ell^{yy}$, and fix small typos (e.g. “c1_yy” $\to$ “cl_yy”).
  • Bibliography issues: duplicated or inconsistent Bolliet et al. 2023a/2023b entries (same title/arXiv), and several incomplete “in prep.” references (References). *Recommendation:* Disambiguate or merge duplicate references; where possible replace “in prep.” with arXiv IDs or add clarifying notes, and fill missing journal/publication metadata.
  • Some long, dense sentences and potentially confusing notation choices (e.g. using $Z$ as a Z-score metric, which can be confused with Bayesian evidence $Z$; Sec. 1; Sec. 5.5). *Recommendation:* Split a few long sentences for readability and explicitly define the Z-score symbol (and state it is not the evidence) where used.
  • Figure readability: some fonts/line widths are small for print; legends sometimes rely on captions; some panels lack labels. *Recommendation:* Increase font sizes/line widths, add panel labels, and include concise in-figure legends so figures remain interpretable when viewed standalone or in grayscale.
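
The interpolation concern raised above (piecewise lookup via `searchsorted`-style indexing) can be illustrated with a toy example. The sketch below is a minimal pure-Python illustration and not the classy_szlite implementation: `interp_linear`, `one_sided_slopes`, and the tabulated grid are hypothetical names and values chosen only for this example. It shows that a piecewise-linear lookup is continuous while its slope jumps at grid knots.

```python
from bisect import bisect_right

def interp_linear(x, xs, ys):
    """Piecewise-linear interpolation on a sorted grid (searchsorted-style lookup)."""
    # Locate the cell [xs[i], xs[i+1]] containing x; clamp to the valid index range.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1.0 - t) * ys[i] + t * ys[i + 1]

def one_sided_slopes(f, x, h=1e-6):
    """Left and right finite-difference slopes of f at x."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right

# Hypothetical toy grid: y = x**2 tabulated at integer knots.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]

def f(x):
    return interp_linear(x, xs, ys)

slopes_at_knot = one_sided_slopes(f, 1.0)    # straddles the knot at x = 1
slopes_mid_cell = one_sided_slopes(f, 1.5)   # interior of the cell [1, 2]
```

On this toy grid the left and right slopes at the knot differ (about 1 versus 3), while inside a cell they agree. Whether such kinks matter for NUTS in the actual pipeline depends on which sampled parameters move quantities across grid cells, which is what the recommendation asks the authors to document.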
Key Statements & References Statement Verification by Skepthical · 2026-05-26
  • The ede-v2 CosmoPower emulators used in classy_szlite, originally trained and validated for the ACT DR6 extended-cosmology analysis, reproduce standard $\Lambda$CDM predictions at the default early-dark-energy parameter value $f_{\rm EDE}=10^{-3}$, with emulator outputs agreeing with CAMB-based references to well under $0.1\sigma$ on ACT DR6 parameter constraints.
  • _Reference(s):_ Spurio Mancini et al. 2022, Bolliet et al. 2023a, Bolliet et al. 2023b
  • The variance $\sigma(R,z)$ entering the Tinker halo mass function and bias, and the Fourier transform $\tilde{u}(k|M,z)$ of the GNFW pressure profile, are computed with FFTLog as implemented in the mcfit library, using the TophatVar transform for $\sigma(R,z)$ and mcfit.SphericalBessel for $\tilde{u}(k|M,z)$, with the $P(k)$ emulator’s log-uniform $k$-grid chosen to satisfy FFTLog’s requirements.
  • _Reference(s):_ Hamilton, 2000, Talman, 1978, Li, 2019
  • classy_szlite adopts the Tinker 2008 halo mass function with the redshift-dependent parameters given in their Table 2 and the Tinker 2010 linear bias expressed in terms of the peak height $\nu=\delta_c/\sigma(M,z)$ at $\Delta_{\rm crit}=500$, thereby matching the halo-population modelling used in class_sz for direct inter-code comparisons.
  • _Reference(s):_ Tinker et al., 2008, Tinker et al., 2010
  • In the worked tSZ bandpower example, a NumPyro implementation of the no-U-turn sampler (NUTS) using exact JAX gradients achieves an effective sample size of $\approx 1400$ with $\hat{R}\leq 1.003$ and zero divergences in $4\times 2000$ post-warmup samples, whereas a cobaya random-walk Metropolis chain tuned to the same posterior requires $\approx 14$–$17$ minutes to reach $n_{\rm eff}\approx 1900$, demonstrating that NUTS is about $100\times$ faster wall-for-wall at matched accuracy on this 2D problem, consistent with theoretical scaling arguments for RW-MH and HMC.
  • _Reference(s):_ Hoffman & Gelman 2014, Phan et al., 2019, Torrado & Lewis, 2021
  • The ede-v2 CosmoPower emulator suite used by classy_szlite covers $\Lambda$CDM, $m_\nu$–$\Lambda$CDM, $w$CDM, $N_{\rm eff}$–$\Lambda$CDM, and early-dark-energy combinations thereof, and has been shown to achieve $\lesssim 0.1\sigma$ accuracy relative to CAMB-based calculations for $\Lambda$CDM and extended cosmologies in recent ACT DR6 and DESI DR2 analyses, making it sufficient for essentially all current CMB and large-scale-structure survey targets without resorting to a full Boltzmann solver.
  • _Reference(s):_ Spurio Mancini et al. 2022, Bolliet et al. 2023a, Calabrese et al. 2025
Mathematical Consistency Audit Mathematics Audit by Skepthical · 2026-05-26

This section audits symbolic/analytic mathematical consistency (algebra, derivations, dimensional/unit checks, definition consistency).

Maths relevance: light

The PDF contains a small number of central analytic expressions: a spherical-Bessel Fourier transform for the pressure profile (Eq. (1)), a generic 1-halo/2-halo Limber-template for angular power spectra (Eqs. (2)–(3)), a stated form for the tSZ window function $W_y$ (inline), and a Fisher-matrix expression (Eq. (4)). Most other content is computational/algorithmic. The core halo-model template is structurally consistent, but key definitions needed to verify the tSZ window normalization and profile-transform well-posedness are missing or ambiguous, and there is an internal inconsistency in the ESS/$\tau_{\rm int}$ formula statements.

### Checked items

  • Alpha-beta sigmoid activation definition (Sec. 3.1, p.2)
  • Claim: Defines the hidden-layer activation as $h_{\rm out} = \left[\beta + \sigma(\alpha z)\,(1 - \beta)\right] z$.
  • Checks: algebra, notation consistency, sanity/limiting cases
  • Verdict: PASS; confidence: high; impact: minor
  • Assumptions/inputs: $\sigma$ denotes the logistic sigmoid; $\alpha$ and $\beta$ are scalar parameters per hidden layer (or broadcastable to $z$)
  • Notes: Expression is algebraically well-formed. Limiting cases behave sensibly: $\beta=1$ gives $h_{\rm out}=z$; $\beta=0$ gives $h_{\rm out}=\sigma(\alpha z) z$.
  • Spherical Fourier transform of pressure profile (Eq. (1), Sec. 3.2, p.3)
  • Claim: Defines $\tilde{u}(k|M,z)=4\pi \int_0^\infty r^2 \left[\frac{\sin(kr)}{kr}\right]P\left(\frac{r}{r_{500}(M,z)}\right) dr$.
  • Checks: algebra, dimensional/units, existence/convergence, definition consistency
  • Verdict: UNCERTAIN; confidence: medium; impact: critical
  • Assumptions/inputs: this is intended as a 3D spherical-Bessel ($j_0$) transform with the author’s chosen Fourier convention; $P$ is the 3D radial pressure profile (or proportional to it)
  • Notes: The form matches a standard $4\pi \int r^2 j_0(kr)f(r)dr$ convention, but the written upper limit $\infty$ raises a well-posedness concern when later-sampled outer slopes are around $\beta\approx 2.7$: the $k\to 0$ limit involves $\int r^2 P(r)dr$, which would require $\beta>3$ (or an explicit truncation/regularization) to converge. The PDF does not state truncation/apodization, so existence at low $k$/low $\ell$ cannot be verified.
  • Dimensionless lookup variable for profile transform (Sec. 3.2, p.3)
  • Claim: Stores $\tilde{u}$ as a 1-D lookup over $s = k r_{500}/c_{500}$ for the Arnaud-10 profile.
  • Checks: dimensional/units, definition consistency
  • Verdict: PASS; confidence: medium; impact: minor
  • Assumptions/inputs: $c_{500}$ is the GNFW concentration-like parameter defining $r_s = r_{500}/c_{500}$; shape dependence is intended to be captured as a function of $k r_s$
  • Notes: $s$ is dimensionless if $k$ is inverse-length. Using $r_s = r_{500}/c_{500}$ is consistent with tabulating transforms for profiles expressed in terms of $r/r_s$.
  • 1-halo Limber template for angular cross-power (Eq. (2), Sec. 4, p.3)
  • Claim: $C_\ell^{XY,{\rm 1h}} = \int dz\, \frac{dV}{d\Omega dz} \int d\ln M\, \frac{dn}{d\ln M} W_X(\ell,M,z) W_Y(\ell,M,z)$.
  • Checks: algebra, notation consistency, dimensional/units (structural)
  • Verdict: PASS; confidence: high; impact: moderate
  • Assumptions/inputs: $W_T$ are the projected Fourier-space window functions appropriate for the defined tracer fields; $dn/d\ln M$ corresponds to the same mass definition used in $W_T$
  • Notes: Structure is internally consistent: the use of $d\ln M$ with $dn/d\ln M$ is consistent, and separating tracer dependence into $W_T$ is coherent.
  • 2-halo Limber template and tracer integrals (Eq. (3), Sec. 4, p.3)
  • Claim: $C_\ell^{XY,{\rm 2h}} = \int dz\, \frac{dV}{d\Omega dz} P_{\rm lin}(k_\ell,z) \prod_{T\in \{X,Y\}} I_T(\ell,z)$, with $I_T \equiv \int d\ln M\, \frac{dn}{d\ln M}\, b_T(M,z) W_T(\ell,M,z)$.
  • Checks: algebra, notation consistency, structural sanity
  • Verdict: PASS; confidence: medium; impact: moderate
  • Assumptions/inputs: linear bias factorization is intended (two-halo term proportional to $P_{\rm lin}$ times products of biased tracer weights); $b_T$ is either the halo bias or an effective tracer bias
  • Notes: The product form is algebraically equivalent to $C\propto I_X I_Y$. Minor clarity gap: $b_T$ notation suggests tracer-specific bias, while elsewhere a single halo bias model is described.
  • Limber mapping and comoving volume element (Sec. 4, p.3)
  • Claim: Uses $k_\ell=(\ell+1/2)/\chi(z)$ and $\frac{dV}{d\Omega dz}=\chi^2/H(z)$.
  • Checks: dimensional/units, definition consistency
  • Verdict: UNCERTAIN; confidence: medium; impact: moderate
  • Assumptions/inputs: $\chi$ is comoving distance and $H$ is the Hubble rate; either $c=1$ is assumed in cosmological distances or $H$ is expressed in inverse-length units
  • Notes: Dimensionally, $dV/d\Omega dz$ typically requires consistency between units of $\chi$ and $H$; the PDF does not state whether $c=1$ is assumed for these terms, while it keeps $c$ explicitly elsewhere (e.g., $m_e c^2$). A units convention is needed to confirm consistency.
  • tSZ window function expression (Inline after Eq. (3), Sec. 4, p.3)
  • Claim: States $W_y(\ell,M,z) = (\sigma_T/m_e c^2)\, (4\pi r_{500}^3/\ell_{500}^2) J_\ell[P_e(x|M,z)]$, with $J_\ell$ the spherical-Bessel projection at $k_\ell$ and $x=r/r_{500}$.
  • Checks: dimensional/units, definition consistency, compatibility with Eq. (1)
  • Verdict: UNCERTAIN; confidence: medium; impact: critical
  • Assumptions/inputs: $\ell_{500}$ is a characteristic angular multipole scale associated with $r_{500}$ and $D_A(z)$; $J_\ell$ is a specific integral operator acting on the radial profile
  • Notes: Key quantities are undefined ($\ell_{500}$, the precise definition of $J_\ell$), preventing verification of the $r_{500}$ and $D_A$ dependence and whether this expression is consistent with Eq. (1) plus the 3D$\to$2D projection. As written, the prefactor’s $r_{500}$ power is ambiguous without knowing what $J_\ell$ includes.
  • Fisher matrix for Gaussian likelihood with fixed covariance (Eq. (4), Sec. 5.8, p.5)
  • Claim: $F_{ij}(\theta) = (\partial_i \mu)^T \Sigma^{-1} (\partial_j \mu)$, with $\mu(\theta)$ the model bandpower vector.
  • Checks: algebra, notation consistency, assumption consistency
  • Verdict: PASS; confidence: high; impact: moderate
  • Assumptions/inputs: Likelihood is Gaussian in the data vector with parameter-independent covariance $\Sigma$
  • Notes: Correct under the stated assumption of fixed $\Sigma$. The transpose/inner-product structure is consistent for $\mu$ as a vector.
  • ESS and integrated autocorrelation time relations (Sec. 5.1, p.4)
  • Claim: States ESS $= N/(1+2 \tau_{\rm int})$ and later states $\tau_{\rm int} \approx 8$ is in agreement with ESS $\approx N/\tau_{\rm int}$.
  • Checks: algebra, definition consistency
  • Verdict: FAIL; confidence: high; impact: minor
  • Assumptions/inputs: $N$ is the total number of draws; $\tau_{\rm int}$ is the integrated autocorrelation time
  • Notes: The two ESS relations are inconsistent unless $\tau_{\rm int}$ is defined differently in each statement. The text uses both as if they were simultaneously valid for the same $\tau_{\rm int}$, which is a mathematical/definition inconsistency that should be corrected by fixing the $\tau_{\rm int}$ convention and corresponding ESS formula.
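
The ESS inconsistency in the last checked item can be made concrete with the numbers quoted from Sec. 5.1. The snippet below is a small sketch with hypothetical function names. It assumes the two standard conventions: convention (a), where $\tau$ sums autocorrelations over lags $\geq 1$ so that ESS $= N/(1+2\tau)$, and convention (b), where $\tau = 1 + 2\sum_{t\geq 1}\rho_t$ already includes the lag-0 term so that ESS $= N/\tau$. Which one the manuscript intends is not stated, so this is only a consistency check.

```python
def ess_one_plus_two_tau(n_draws, tau):
    """Convention (a): ESS = N / (1 + 2*tau), tau = sum of autocorrelations at lags >= 1."""
    return n_draws / (1.0 + 2.0 * tau)

def ess_n_over_tau(n_draws, tau):
    """Convention (b): ESS = N / tau, tau = 1 + 2 * (sum of autocorrelations at lags >= 1)."""
    return n_draws / tau

n_draws = 4 * 1000   # 4 chains x 1000 samples, as quoted for Sec. 5.1
tau = 8.0            # tau_int ~ 8, as quoted

ess_a = ess_one_plus_two_tau(n_draws, tau)   # 4000 / 17
ess_b = ess_n_over_tau(n_draws, tau)         # 4000 / 8

# Under convention (a), the tau that would reproduce the quoted ESS range [466, 504].
tau_needed_low = 0.5 * (n_draws / 504.0 - 1.0)
tau_needed_high = 0.5 * (n_draws / 466.0 - 1.0)
```

With $\tau_{\rm int}=8$, convention (b) gives 500, inside the quoted range, while convention (a) gives about 235. Convention (a) would only match the quoted range for $\tau$ between roughly 3.5 and 3.8, which is what one expects if the quoted $\tau_{\rm int}\approx 8$ already includes the lag-0 term.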

### Limitations

  • Audit is limited to the PDF text provided; several key definitions (e.g., $\ell_{500}$ and $J_\ell$) are not given in the PDF, preventing verification of the tSZ window normalization.
  • The paper frequently references implementation details (code modules, libraries) without fully specifying the analytic conventions (Fourier conventions, truncation of profiles, unit system), which limits purely symbolic verification.
  • No appendices or step-by-step derivations are included for the halo-model/tSZ-specific normalizations; where intermediate definitions are missing, items are marked UNCERTAIN rather than inferred.
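
Although the profile truncation cannot be verified symbolically from the PDF, the convergence concern for Eq. (1) is easy to probe numerically. The sketch below is a hedged illustration, not the manuscript's code: `gnfw_shape` and `volume_integral` are hypothetical helpers, the shape parameters $c_{500}=1.177$, $\gamma=0.31$, $\alpha=1.05$ are illustrative Arnaud-10-like values assumed here, and the normalization is set to $P_0=1$. It evaluates the $k\to 0$ limit $\int_0^{x_{\max}} x^2 p(x)\,dx$ for two outer slopes and compares how the result changes when $x_{\max}$ is increased from $10^2$ to $10^4$.

```python
import math

def gnfw_shape(x, beta, c500=1.177, gamma=0.31, alpha=1.05):
    """Dimensionless GNFW shape p(x) with P0 = 1 and x = r / r500 (illustrative parameters)."""
    cx = c500 * x
    return 1.0 / (cx ** gamma * (1.0 + cx ** alpha) ** ((beta - gamma) / alpha))

def volume_integral(beta, x_max, x_min=1e-4, n=4000):
    """Trapezoid rule in u = ln x for int x^2 p(x) dx from x_min to x_max."""
    u0, u1 = math.log(x_min), math.log(x_max)
    du = (u1 - u0) / (n - 1)
    total = 0.0
    for i in range(n):
        x = math.exp(u0 + i * du)
        w = 0.5 if i in (0, n - 1) else 1.0             # trapezoid end-point weights
        total += w * x ** 3 * gnfw_shape(x, beta) * du  # x^2 dx = x^3 du
    return total

# Growth of the integral when the upper limit is raised from 1e2 to 1e4.
growth_shallow = volume_integral(2.7, 1e4) / volume_integral(2.7, 1e2)    # beta < 3
growth_steep = volume_integral(5.49, 1e4) / volume_integral(5.49, 1e2)    # beta > 3
```

For $\beta=2.7$ the integral keeps growing with the upper limit (by more than a factor of 3 over these two decades, consistent with the $x_{\max}^{3-\beta}$ tail), whereas for $\beta=5.49$ it is essentially unchanged. This is why an explicit $r_{\max}$ or apodization needs to be stated whenever $\beta\leq 3$ is inside the sampled range.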
Numerical Results Audit Numerics Audit by Skepthical · 2026-05-26

This section audits numerical/empirical consistency: reported metrics, experimental design, baseline comparisons, statistical evidence, leakage risks, and reproducibility.

Of 18 audited numeric items: 12 PASS, 2 FAIL, and 4 UNCERTAIN. The main failures are (i) a claimed $\sim 100\times$ wall-time speedup that computes to $\sim 9.36\times$ from the stated times, and (ii) a cross-location inconsistency in reported NUTS wall time (200 s vs $\sim 40$ s). Several other checks are descriptive or heuristic and cannot be strictly verified from the provided numerals alone.
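
The two FAIL items follow directly from the numbers quoted in the manuscript. The snippet below is a minimal arithmetic check; the variable names are illustrative, and the values are those attributed to Sec. 5.5, Table 1, and the cross-location wall-time comparison in this audit.

```python
# Times quoted for reaching |Z| < 0.1 sigma (Sec. 5.5).
t_nuts_s = 11.0     # NUTS
t_rwmh_s = 103.0    # cobaya RW-MH
speedup = t_rwmh_s / t_nuts_s

# NUTS ESS and wall time as audited for Table 1, plus the second quoted wall time.
ess_nuts = 1400.0
wall_table_s = 200.0
wall_other_s = 40.0

rate_table = ess_nuts / wall_table_s        # ESS/s implied by the Table 1 wall time
wall_ratio = wall_table_s / wall_other_s    # size of the wall-time discrepancy
```

The quoted times give a speedup of about 9.4, an order of magnitude below the claimed $\sim 100\times$, and the two NUTS wall times differ by a factor of 5. Neither discrepancy can be resolved without the authors stating what differs between the two settings.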

### Checked items

  • C1 (Page 1, Abstract)
  • Claim: “gradient-based optimisation reaches the MAP in fewer than $\sim 40$ forward-and-gradient evaluations ($\sim 0.4~{\rm s}$ wall)”
  • Checks: wall_time_from_eval_count
  • Verdict: UNCERTAIN
  • Notes: Implied time per evaluation is $0.4/40 = 0.01~{\rm s}$, but this is a heuristic with no directly comparable per-eval number reported in the checked inputs.
  • C2 (Page 3, Section 3.3)
  • Claim: “Reverse-mode autodiff through the closure ... costs $\sim 17~{\rm ms}$ ... The closure performs only the halo-model integration ... is $\sim 5~{\rm ms}$ warm.”
  • Checks: ratio_check
  • Verdict: PASS
  • Notes: Computed ratio $17/5 = 3.4$, consistent with “$\sim 3\times$” within the stated tolerance.
  • C3 (Page 3, Section 3.3)
  • Claim: “The full pipeline including a fresh cosmology costs $\sim 20~{\rm ms}$, with the emulator forward pass contributing $\sim 2$–$3~{\rm ms}$ and the halo-model integration the remainder.”
  • Checks: parts_vs_total_range
  • Verdict: PASS
  • Notes: Remainder is $20-3=17~{\rm ms}$ to $20-2=18~{\rm ms}$; positive and within $[0, {\rm total}]$.
  • C4 (Page 4, Section 5.1)
  • Claim: “ESS $= N/(1+2\tau_{\rm int}) \in [466, 504]$ ... The integrated autocorrelation time is $\tau_{\rm int} \approx 8$ ... in agreement with ESS $\approx N/\tau_{\rm int}$ (Figure 9).” for 4 chains $\times$ 1000 samples
  • Checks: ess_from_N_and_tau
  • Verdict: PASS
  • Notes: With $N=4000$ and $\tau_{\rm int}=8$, $N/\tau_{\rm int}=500$ lies within $[466,504]$. The other formula stated in the same passage, $N/(1+2\tau_{\rm int})=4000/17\approx 235.29$, does not match the stated range, so the PASS applies only under the $N/\tau_{\rm int}$ convention.
  • C5 (Page 5, Section 5.5)
  • Claim: “NUTS reaches $|Z| < 0.1\,\sigma$ at $\sim 11~{\rm s}$, whereas the cobaya RW-MH chain needs $\sim 103~{\rm s}$ ... roughly a $\sim 100\times$ wall-for-wall advantage”
  • Checks: speedup_ratio
  • Verdict: FAIL
  • Notes: Computed speedup $103/11 \approx 9.36$, an order of magnitude short of the claimed $\sim 100\times$.
  • C6 (Page 5, Section 5.5)
  • Claim: “The asymptotic ESS-accumulation rates are $\sim 10$ ESS/s (NUTS) vs $\sim 2.3$ ESS/s (cobaya RW-MH), a factor of $\sim 4$”
  • Checks: ratio_check
  • Verdict: PASS
  • Notes: Computed $10/2.3 \approx 4.35$, consistent with “$\sim 4$” within tolerance.
  • C7 (Page 5, Section 5.5)
  • Claim: “... because the autocorrelation length of the RW-MH chain ($\tau_{\rm int} \sim 20$ ...) is much longer than the NUTS chain’s ($\tau_{\rm int} \approx 8$).”
  • Checks: autocorr_ratio
  • Verdict: PASS
  • Notes: Computed ratio $20/8 = 2.5$.
  • C8 (Page 6, Table 1 caption)
  • Claim: “$\chi^2_{\rm bf}$ is quoted at 6 degrees of freedom (8 bandpowers minus 2 fitted parameters).”
  • Checks: degrees_of_freedom_subtraction
  • Verdict: PASS
  • Notes: $8-2 = 6$ matches reported dof.
  • C9 (Page 6, Table 1 (L-BFGS-B rows))
  • Claim: Bestfit shows “12.3/6” and caption states 6 dof; confirm that the table’s “12.3/6” corresponds to $\chi^2=12.3$ with dof=6 (not a computed ratio).
  • Checks: format_consistency_check
  • Verdict: PASS
  • Notes: Parsed numerator/denominator consistent; reduced $\chi^2$ would be $12.3/6 = 2.05$, but the intended display meaning cannot be verified from arithmetic alone.
  • C10 (Page 6, Table 1 (RW-MH baseline))
  • Claim: Baseline RW-MH: “$n_{\rm eff} \approx 1900$ ($\sim 5300$ accepted steps with acceptance $\sim 13\%$)”
  • Checks: accepted_vs_total_steps
  • Verdict: UNCERTAIN
  • Notes: Implied total proposals $\approx 5300/0.13 \approx 4.1\times 10^{4}$, but no explicit total was provided to confirm.
  • C11 (Page 6, Table 1)
  • Claim: Table 1 gives NUTS ESS $\sim 1400$ at a wall time of 200 s; check the implied ESS rate against the claimed $\sim 10$ ESS/s.
  • Checks: rate_from_total
  • Verdict: PASS
  • Notes: Implied rate $1400/200 = 7$ ESS/s, within the loose tolerance of the claimed $\sim 10$ ESS/s.
  • C12 (Page 5, Section 5.5)
  • Claim: Gold-standard chain: “500 warmup + 4000 samples $\times$ 4 chains ... ESS $\sim 1400$”
  • Checks: ess_upper_bound_check
  • Verdict: PASS
  • Notes: Total post-warmup draws = $4000\times 4 = 16{,}000$; ESS=1400 is below this bound.
  • C13 (Page 1 Abstract; Page 2 Section 2; Page 2 Figure 1 caption)
  • Claim: Cumulative acceleration claim: “from $\sim 30$ s ... to $\sim 5~{\rm ms}$” and “$\sim 6000\times$ acceleration”
  • Checks: speedup_factor
  • Verdict: PASS
  • Notes: $30/0.005 = 6000$ exactly.
  • C14 (Page 2, Section 2 (item iv))
  • Claim: “The $\sim 40\times$ gain over the previous generation ...” comparing $\sim 200~{\rm ms}$ to $\sim 5~{\rm ms}$ (fixed-cosmology closure) or to $\sim 20~{\rm ms}$ (full pipeline).
  • Checks: speedup_factor
  • Verdict: PASS
  • Notes: $200/5 = 40$ matches the claimed $\sim 40\times$ gain; $200/20 = 10$ does not, indicating the claim aligns with the fixed-cosmology timing.
  • C15 (Page 6, Figure 4 caption)
  • Claim: “Wall time per cosmology: $\sim 0.4~{\rm s}$ L-BFGS-B + $\sim 40~{\rm s}$ NUTS (8000 samples $\times$ 4 chains).”
  • Checks: samples_count_multiplication
  • Verdict: PASS
  • Notes: $8000\times 4 = 32{,}000$ total draws.
  • C16 (Page 6, Table 1 vs Page 6, Figure 4 caption)
  • Claim: NUTS wall time: Table 1 lists 200 s, while Figure 4 caption states $\sim 40~{\rm s}$ NUTS (8000 samples $\times$ 4 chains).
  • Checks: cross_reference_consistency
  • Verdict: FAIL
  • Notes: Computed ratio $200/40 = 5$; discrepancy requires contextual reconciliation (e.g., different budgets/settings).
  • C17 (Page 6, Table 1 caption)
  • Claim: “publication-grade budget (500 warmup + 4000 samples, $R$-hat $\leq 1.003$, ESS $\sim 1400$)” and Table 1 NUTS wall is 200 s; compute total post-warmup draws and compare to ESS.
  • Checks: ess_fraction
  • Verdict: UNCERTAIN
  • Notes: Descriptive recomputation: total post-warmup draws = $4000\times 4 = 16{,}000$; ESS fraction = $1400/16{,}000 = 0.0875$. No explicit fraction claim was provided to verify.
  • C18 (Page 3, Section 3.1)
  • Claim: $P_k$ emulator grid: “1000 points spanning $k \in [5 \times 10^{-4}, 10]$ Mpc$^{-1}$, extrapolate to $k_{\min}=10^{-4}$ Mpc$^{-1}$.”
  • Checks: range_order_check
  • Verdict: PASS
  • Notes: Ordering holds: $1\times 10^{-4} < 5\times 10^{-4} < 10$, and $n_{\rm points}=1000$ is positive.
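
The ESS-related items (C4, C11, C17) can be recomputed in the same way. The following is a minimal sketch using only the numerals quoted above; the function names are hypothetical, and the inversion of each formula for the $\tau_{\rm int}$ implied by the stated range is an extra illustration, not a check performed in the audit.

```python
N_DRAWS = 4 * 1000            # C4: 4 chains x 1000 samples
TAU_INT = 8.0                 # C4: quoted integrated autocorrelation time
ESS_RANGE = (466.0, 504.0)    # C4: quoted ESS range


def ess_simple(n: float, tau: float) -> float:
    """ESS ~ N / tau_int, the form the C4 verdict rests on."""
    return n / tau


def ess_alt(n: float, tau: float) -> float:
    """ESS = N / (1 + 2 tau_int), the other form quoted in the same claim."""
    return n / (1.0 + 2.0 * tau)


def in_range(value: float, bounds: tuple) -> bool:
    low, high = bounds
    return low <= value <= high


# tau_int implied by the quoted ESS range under each formula (illustration only).
tau_if_simple = tuple(N_DRAWS / ess for ess in reversed(ESS_RANGE))
tau_if_alt = tuple((N_DRAWS / ess - 1.0) / 2.0 for ess in reversed(ESS_RANGE))

# C11 and C17: implied ESS rate and ESS fraction for the Table 1 NUTS run.
ess_rate = 1400.0 / 200.0               # ESS per second of wall time
ess_fraction = 1400.0 / (4 * 4000)      # ESS per post-warmup draw

print(f"N/tau = {ess_simple(N_DRAWS, TAU_INT):.1f}, "
      f"in range: {in_range(ess_simple(N_DRAWS, TAU_INT), ESS_RANGE)}")
print(f"N/(1+2 tau) = {ess_alt(N_DRAWS, TAU_INT):.1f}, "
      f"in range: {in_range(ess_alt(N_DRAWS, TAU_INT), ESS_RANGE)}")
print(f"implied tau (simple): {tau_if_simple[0]:.2f} to {tau_if_simple[1]:.2f}")
print(f"implied tau (alt):    {tau_if_alt[0]:.2f} to {tau_if_alt[1]:.2f}")
print(f"ESS rate = {ess_rate:.1f} ESS/s, ESS fraction = {ess_fraction:.4f}")
```

Only the $N/\tau_{\rm int}$ form reproduces the quoted range at $\tau_{\rm int} \approx 8$; the $N/(1+2\tau_{\rm int})$ form would need a $\tau_{\rm int}$ well below 8 to do so.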

### Limitations

  • Only parsed text and embedded figure/table text from the provided PDF pages were used; no external data, code, or repositories were accessed.
  • No values were extracted from plotted curves or points in figures (pixel-based extraction disallowed); only textual numerals were audited.
  • Many performance, convergence, and accuracy claims depend on runtime logs, chains, datasets, or implementation details not contained in the PDF; these are listed as unverified.

## Paper Ratings

| Dimension | Score |
|-----------|:-----:|
| Overall | 6/10 ██████░░░░ |
| Soundness | 6/10 ██████░░░░ |
| Novelty | 7/10 ███████░░░ |
| Significance | 7/10 ███████░░░ |
| Clarity | 5/10 █████░░░░░ |
| Evidence Quality | 5/10 █████░░░░░ |

Justification: The work presents a coherent, fast, JAX-native halo-model pipeline with exact autodiff and a compelling NUTS demonstration, a meaningful engineering advance likely to aid inference workflows. However, the audits flag important gaps: ambiguity about true end-to-end differentiability given the non-JAX FFTLog components, under-specified likelihood and priors, missing validation against a CAMB/CLASS-based reference, and benchmarking inconsistencies (a claimed $\sim 100\times$ speedup that the stated timings put at $\sim 9\times$, and conflicting NUTS wall times of 200 s vs $\sim 40$ s). The mathematical checks also mark the tSZ window normalization and the Fourier-transform convergence as critical UNCERTAIN items, and note an inconsistency in the ESS definition. These issues limit confidence and completeness despite strong gradient checks and solid sampler diagnostics.

Full Review Report