This section audits numerical/empirical consistency: reported metrics, experimental design, baseline comparisons, statistical evidence, leakage risks, and reproducibility.
Of 16 numeric candidates checked, 14 PASS and 2 are UNCERTAIN (too few supporting numbers to recompute or compare). No FAIL results were found. The key cross-references (Table 3 vs Figure 7(b); Table 4 vs the rounded values in the narrative and abstract) and the derived computations (percent reduction, ratios, pixel scale, ∆r, percentages from ratios) were consistent within the tolerances stated for each item.
### Checked items
- Claim: Pixel RMS ratio improvement claim: STsep has pixel RMS σe/σt = 1.22 vs ILC’s 3.42, described as “64% lower than ILC’s 3.42”.
- Checks: percent_reduction_from_baseline
- Verdict: PASS
- Notes: Computed reduction = (3.42 − 1.22)/3.42 = 64.33%, consistent with the claimed 64% within a 1-percentage-point rounding tolerance.
- Claim: Extreme-tail improvement claim: “30× at z < −8, 14× at z < −5”. Table 4 later lists ILC vs STsep recovery fractions.
- Checks: ratio_claim_check
- Verdict: PASS
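- Notes: Computed ratios: z < −8: 0.08/0.003 = 26.67 vs claimed 30× (within the 20% relative tolerance; both fractions are quoted to one significant figure, so their rounding ranges allow a ratio of roughly 21× to 34×); z < −5: 0.231/0.017 = 13.59 vs claimed 14× (within the 20% relative tolerance).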
- Claim: Patch pixel count consistency: “256 × 256 pixel resolution” and later “256^2 = 65536 pixels”.
- Checks: integer_product
- Verdict: PASS
- Notes: 256×256 equals 65536 exactly.
- Claim: Pixel scale consistency: “5° × 5° patches at 256 × 256 pixel resolution (pixel scale ≈ 1.17′)”.
- Checks: unit_conversion_and_division
- Verdict: PASS
- Notes: Computed pixel scale = (5×60)/256 = 1.171875 arcmin, consistent with ≈1.17′.
- Claim: Table 1 FWHM values: check that beam sizes quoted in the text match Table 1 and follow the expected ordering across channels (e.g., "150 GHz channel has FWHM 1.4′").
- Checks: cross_reference_value_match
- Verdict: PASS
- Notes: Text FWHM(150) = 1.4′ matches Table 1; also 1.4′ > 1.0′ (217 GHz), as expected since beams narrow with increasing frequency.
- Claim: tSZ SED scaling example: "a_tSZ(150) ≈ −2.60 and y = 10⁻⁵ corresponds to ≈ −26 µK_CMB at 150 GHz."
- Checks: scalar_multiplication_and_power_of_ten
- Verdict: PASS
- Notes: Computed (−2.60 K) × 10⁻⁵ × (10⁶ µK/K) = −26.0 µK, matching the stated example.
- Claim: Prior variance parameter: “V∗ = (4.3 µK)^2”.
- Checks: square_value
- Verdict: PASS
- Notes: Computed (4.3 µK)² = 18.49 µK² for reference; no separately reported numeric value of V∗ was available to compare against.
- Candidate C8 (sources: p4 §3.3 and p3 §3.3)
- Claim: Training/ensemble sizes: contamination ensemble “Nens = 20” and optimisation hyperparameters “150 steps, batch size 4 per step”; compute number of batch items processed.
- Checks: simple_count_multiplication
- Verdict: PASS
- Notes: Computed total batch items = 150×4 = 600, consistent with the implied count.
- Claim: FoCUS coefficient difference: "(a90 − a217) ≈ 1.67". Verify the arithmetic if a90 and a217 are given elsewhere; otherwise the value can only be checked for consistency if it is repeated.
- Checks: repeated_constant_match
- Verdict: UNCERTAIN
- Notes: Cannot recompute without a90 and a217, and the value appears only once in the provided inputs, so consistency across repetitions cannot be assessed.
- Claim: Spectrum binning statement: "binned into 24 log-spaced bins over 500 ≤ ℓ ≤ 6000". Check the bin count and endpoints against any bin edges reported later (none appear in the text).
- Checks: parameter_consistency_across_mentions
- Verdict: UNCERTAIN
- Notes: No explicit bin edges/centers provided to compare against a generated 24-bin logspace between 500 and 6000 (which would imply 25 edges).
- Candidate C11 (sources: p6 Table 3 and p8 Figure 7(b))
- Claim: Three-frequency mean pixel correlations: Table 3 lists r=0.113 (ILC 3-freq), r=0.042 (SED-init), r=0.042 (STsep 3-freq), matching Figure 7(b) labels 0.113, 0.042, 0.042.
- Checks: cross_reference_value_match
- Verdict: PASS
- Notes: All three table values exactly match the corresponding Figure 7(b) numeric annotations in the provided inputs.
- Claim: SED-init and STsep (3-freq) RMS ratios: Table 3 lists σe/σt = 54.707 vs 54.692; check that the statement that the two "converge to the same pixel correlation within numerical precision" holds for r (exactly equal) and that the RMS ratios are close.
- Checks: difference_within_tolerance
- Verdict: PASS
- Notes: Pixel correlations match exactly (0.042 vs 0.042). The RMS ratios differ by 0.015 (about 0.03% of their value), within the stated < 0.02 closeness criterion.
- Candidate C13 (sources: p5 Table 2 and p5 §4.3)
- Claim: S1 correlations: Table 2 gives residual vs CIB S1 = 0.970 and vs CMB S1 = 0.831, identical for ILC and FoCUS; check equality across methods and alignment with text statements (r=0.97 vs 0.83).
- Checks: table_internal_consistency
- Verdict: PASS
- Notes: ILC equals FoCUS for both correlations (0.970 and 0.831), and 0.970 > 0.831 as stated.
- Claim: Residual S1 ratio range statement: “ranges from 1.15 at j=0 to 1.36 at j=2”, implying 15–36% more amplitude; check percent conversion.
- Checks: percent_from_ratio
- Verdict: PASS
- Notes: (1.15−1)×100=15% and (1.36−1)×100=36%, matching the stated 15–36% range.
- Claim: ∆r computation: “ILC residual’s r = 0.970 is only ∆r = 0.013 above the pure-tSZ baseline [r = 0.958].”
- Checks: difference
- Verdict: PASS
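- Notes: Computed ∆r = 0.970 − 0.958 = 0.012. Because both r values are rounded to three decimals, the unrounded difference lies between 0.011 and 0.013, so the reported 0.013 is attainable but sits at the edge of that range; this is treated as a last-digit rounding discrepancy.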
- Candidate C16 (sources: p8 Table 4 vs p7 §4.6 text)
- Claim: Table 4 means vs narrative: verify text-reported metrics match Table 4 (e.g., r: 0.144→0.175; σe/σt: 3.42→1.22; DST: 0.780→0.733; KS: 0.216→0.192).
- Checks: cross_reference_value_match
- Verdict: PASS
- Notes: Table 4 σe/σt values (3.416, 1.215) round to the two-decimal narrative values (3.42, 1.22); the other listed metrics (r, DST, KS) match Table 4 as given in the provided inputs.
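The derived quantities in the items above need only the quoted values, so the audit arithmetic can be re-run in a few lines. A minimal Python sketch, assuming the rounded numbers exactly as quoted; the helper names (`percent_reduction`, `diff_bounds`, `log_bin_edges`) are hypothetical and not taken from the paper's code, and the bin grid assumes edges placed exactly at ℓ = 500 and ℓ = 6000 with uniform logarithmic spacing.

```python
"""Recompute derived quantities from the rounded values quoted in the checked items."""


def percent_reduction(baseline, new):
    """Percent by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline


def diff_bounds(a, b, half_step=0.0005):
    """Range of a - b when a and b are each rounded to the last quoted digit."""
    return (a - half_step) - (b + half_step), (a + half_step) - (b - half_step)


def log_bin_edges(lo, hi, nbins):
    """Edges of `nbins` log-spaced bins from lo to hi (nbins + 1 edges)."""
    ratio = hi / lo
    return [lo * ratio ** (i / nbins) for i in range(nbins + 1)]


results = {
    # "64% lower than ILC's 3.42": STsep 1.22 vs ILC 3.42
    "rms_reduction_pct": percent_reduction(3.42, 1.22),
    # Extreme-tail ratios from the quoted fractions (claimed 30x and 14x)
    "tail_ratio_z_below_-8": 0.08 / 0.003,
    "tail_ratio_z_below_-5": 0.231 / 0.017,
    # 256 x 256 patch and pixel scale of a 5-degree patch in arcmin
    "n_pixels": 256 * 256,
    "pixel_scale_arcmin": 5 * 60 / 256,
    # tSZ example: a_tSZ(150) [K per unit y] * y * 1e6 uK/K
    "tsz_example_uK": -2.60 * 1e-5 * 1e6,
    # Prior variance V* = (4.3 uK)^2 in uK^2
    "v_star_uK2": 4.3 ** 2,
    # 150 optimisation steps x batch size 4
    "batch_items": 150 * 4,
    # SED-init vs STsep (3-freq) sigma_e/sigma_t from Table 3
    "rms_ratio_diff": 54.707 - 54.692,
    # Residual S1 ratios 1.15 and 1.36 expressed as percent excess
    "s1_excess_pct": [(r - 1.0) * 100.0 for r in (1.15, 1.36)],
    # Delta r from the rounded correlations, plus its rounding range
    "delta_r": 0.970 - 0.958,
    "delta_r_bounds": diff_bounds(0.970, 0.958),
}

# Reference grid for "24 log-spaced bins over 500 <= ell <= 6000"
edges = log_bin_edges(500.0, 6000.0, 24)


def show(value):
    """Round floats (and sequences of floats) for display."""
    if isinstance(value, (list, tuple)):
        return [round(v, 6) for v in value]
    if isinstance(value, float):
        return round(value, 6)
    return value


if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name}: {show(value)}")
    print(f"{len(edges)} edges from {show(edges[0])} to {show(edges[-1])}")
```

Running the script reproduces the figures in the Notes fields, e.g., a 64.33% reduction, tail ratios of about 26.67 and 13.59, and ∆r = 0.012 with a rounding range of 0.011 to 0.013. The 25 generated edges could be compared directly with the paper's bins if they were reported.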
### Limitations
- Only the provided parsed text/images from the PDF were used; no external constants, code, or datasets were accessed.
- No checks requiring extraction of numerical values from plotted curves or image pixels were included.
- Several claims are qualitative (e.g., 'factor ~few', 'about 50%') without explicit numbers; only statements with explicit numerics were selected as candidates.
- Some internal consistency checks are limited to rounding-level verification because the paper reports rounded summary statistics (e.g., 1.22 vs 1.215).
- Some statements cannot be recomputed from the provided inputs because they depend on external functions/constants or additional underlying values not given (e.g., utils.jysr2uk(ν); physical constants; a90 and a217; explicit bin edges/centers; data/noise realisations and statistical assumptions).
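Beyond the inputs used here, the a_tSZ(150) ≈ −2.60 figure can also be checked against physical constants. A minimal Python sketch, assuming the paper uses the standard non-relativistic tSZ spectral function in thermodynamic units, ΔT = T_CMB f(x) y with f(x) = x coth(x/2) − 4 and x = hν/(k_B T_CMB), evaluated at a single frequency without bandpass integration, and T_CMB = 2.7255 K. None of these choices is confirmed by the provided inputs, and the function names are hypothetical.

```python
"""Cross-check a_tSZ(150 GHz) with the non-relativistic tSZ spectral function.

Assumption: Delta T = T_CMB * f(x) * y, f(x) = x * coth(x / 2) - 4,
x = h * nu / (k_B * T_CMB), monochromatic frequency, no relativistic terms.
"""
import math

H_PLANCK = 6.62607015e-34  # J s (exact SI value)
K_BOLTZMANN = 1.380649e-23  # J/K (exact SI value)
T_CMB = 2.7255  # K, assumed CMB monopole temperature


def tsz_sed_kelvin(nu_ghz):
    """Return T_CMB * f(x) in K_CMB per unit Compton-y at nu_ghz."""
    x = H_PLANCK * nu_ghz * 1e9 / (K_BOLTZMANN * T_CMB)
    f_x = x / math.tanh(x / 2.0) - 4.0  # x * coth(x/2) - 4
    return T_CMB * f_x


def tsz_delta_t_uk(nu_ghz, y):
    """tSZ temperature shift in uK_CMB for Compton parameter y."""
    return tsz_sed_kelvin(nu_ghz) * y * 1e6


if __name__ == "__main__":
    print(f"a_tSZ(150) = {tsz_sed_kelvin(150.0):.3f} K per unit y")
    print(f"Delta T(150 GHz, y=1e-5) = {tsz_delta_t_uk(150.0, 1e-5):.1f} uK")
    # Sanity check: the tSZ signal nearly vanishes near 217 GHz (tSZ null)
    print(f"a_tSZ(217) = {tsz_sed_kelvin(217.0):.3f} K per unit y")
```

Under these assumptions the sketch gives a_tSZ(150) ≈ −2.598 K per unit y and ΔT ≈ −26.0 µK for y = 10⁻⁵, in line with the quoted example, and shows the tSZ signal nearly vanishing at 217 GHz. Bandpass integration or relativistic corrections would shift these values slightly.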
## Paper Ratings
| Dimension | Score |
|-----------|:-----:|
| Overall | 6/10 ██████░░░░ |
| Soundness | 6/10 ██████░░░░ |
| Novelty | 6/10 ██████░░░░ |
| Significance | 5/10 █████░░░░░ |
| Clarity | 5/10 █████░░░░░ |
| Evidence Quality | 6/10 ██████░░░░ |
Justification: The study follows a careful, realistic-noise protocol (explicit SO/Planck noise, beam handling, split-cross spectra) and reports balanced positive and negative findings, with numerics that are internally consistent. However, several mathematically critical elements are ambiguous (beam/units in multi-frequency constraints and FoCUS residuals, ILC constraint normalisation, SED differences across unmatched beams), and robustness and uncertainty quantification, ablations, and FoCUS characterisation are missing.
The main impact is moderate: a hybrid ILC-initialised STsep improves map-space and tail metrics, but FoCUS gives only marginal gains and the ILC-free three-band variant fails under realistic noise; dependence on simulation-derived priors and limited patch statistics further temper generality. Strengthening definitions, ablations, uncertainty estimates, and harmonic-space diagnostics would significantly raise confidence and impact.