- **Contradictory sample sizes across Methods/Results and downstream analyses undermine interpretability and reproducibility (Sec. 2.1, Sec. 2.5, Sec. 3.1–3.3; Tables 1–2; figure captions).** The manuscript alternates between $N=5,\!890$ and $N=1,\!626$ as the “complete/filtered/modeling” dataset, and Sec. 2.5 describes anomaly detection on $5,\!890$ while Sec. 3.3 and figures appear to use $1,\!626$. This also affects the implied train/test sizes and the computational narrative in Sec. 2.6. *Recommendation:* Provide a single authoritative data-flow description with $N$s at each stage: (1) initial merged rows and unique asteroids; (2) after each join/deduplication rule; (3) after requiring each feature (obliquity target, age, diameter, type, family); (4) after any additional cuts/grouping that lead to the final modeling table. Include a compact flow table/diagram and ensure Sec. 2.1, Sec. 2.5, Sec. 3.1–3.3, Tables 1–2, and all figure captions use consistent $N$s and clearly state which $N$ is used for training/testing and which $N$ is used for “full-sample” prediction/anomaly scoring.
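A minimal sketch of such a stage-by-stage accounting, using hypothetical column names (`asteroid_id`, `cos_obliquity`, `age_myr`) and toy data purely for illustration:

```python
import pandas as pd

def log_n(df, stage, log):
    """Append row count and unique-object count for one pipeline stage."""
    log.append({"stage": stage, "rows": len(df),
                "unique_ids": df["asteroid_id"].nunique()})
    return df

# Toy merged catalog; column names and values are placeholders.
merged = pd.DataFrame({
    "asteroid_id": [1, 1, 2, 3, 4],
    "cos_obliquity": [0.9, 0.9, None, -0.8, 0.2],
    "age_myr": [100, 100, 250, None, 50],
})

stages = []
df = log_n(merged, "initial merge", stages)
df = log_n(df.drop_duplicates("asteroid_id"), "deduplicated", stages)
df = log_n(df.dropna(subset=["cos_obliquity", "age_myr"]),
           "complete features", stages)

# The resulting table is the single authoritative data-flow summary.
flow_table = pd.DataFrame(stages)
print(flow_table)
```

Emitting this table once, and citing its rows from every section and caption, would remove the $N$ ambiguity entirely.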
- **Target definition/handling is internally inconsistent (Sec. 2.1–2.2 vs. Sec. 3.1).** Methods describe converting obliquity angles (degrees) to $\cos(\mathrm{radians}($obliquity$))$, while Results state the ingested “obliquity” column already lies in $[-1, 1]$, implying it is already $\cos($obliquity$)$. Without resolving this, readers cannot know whether the cosine transform was applied, applied twice, or merely described. *Recommendation:* Unify target terminology and preprocessing: explicitly name the raw column(s) and units (e.g., $\text{obliquity}_{\rm deg}$ vs $\cos_{\rm obliquity}$), state precisely what is in the source catalog(s), and document the exact transformation applied (or not applied). Update Sec. 2.1–2.2, Sec. 3.1, tables, and captions so that “obliquity” (degrees) and “$\cos($obliquity$)$” (dimensionless) are never conflated.
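The degrees-versus-cosine ambiguity can also be caught programmatically before modeling; a small sanity check along these lines (function names are illustrative, not from the manuscript):

```python
import numpy as np

def infer_obliquity_units(values):
    """Heuristic: values confined to [-1, 1] are consistent with
    cos(obliquity); values with entries > 1 lying in [0, 180] suggest degrees."""
    v = np.asarray(values, dtype=float)
    if np.nanmax(np.abs(v)) <= 1.0:
        return "cosine"
    if np.nanmin(v) >= 0.0 and np.nanmax(v) <= 180.0:
        return "degrees"
    return "unknown"

def to_cos_obliquity(values, units):
    """Apply the cosine transform exactly once, based on declared units."""
    v = np.asarray(values, dtype=float)
    return np.cos(np.radians(v)) if units == "degrees" else v
```

Running such a check on the raw column and recording its outcome in Sec. 2.1 would settle whether the transform was applied once, twice, or not at all.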
- **Model specification is incomplete and partially corrupted in presentation, limiting reproducibility and making some interpretations unreliable (Sec. 2.3–2.4; Sec. 3.1–3.2).** Table 2 appears corrupted (kernel string replacing numeric Max values), the final kernel is not written cleanly in the text, and key training settings are missing (hyperparameter bounds, restarts, normalize\_y, alpha, optimizer, random\_state). The amplitude/noise interpretation also contains an algebraic inconsistency (Sec. 3.2: “0.1982 $\approx$ 0.039”). *Recommendation:* Fix Table 2 to contain only data summary statistics and move kernel details to Sec. 3.2. In Sec. 2.3–2.4 and Sec. 3.2, report the exact final kernel as printed by the implementation (e.g., ConstantKernel $\times$ RBF $+$ WhiteKernel with numeric values), and list training settings: bounds, n\_restarts\_optimizer, optimizer, alpha, normalize\_y, convergence/termination, and random\_state. Correct the amplitude statement by explicitly defining whether the printed scalar is an amplitude or variance parameter (and avoid squaring unless the kernel form warrants it).
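In scikit-learn, the fitted kernel's own repr already disambiguates amplitude vs variance, because ConstantKernel prints as an amplitude squared; a sketch on synthetic stand-in data (all hyperparameter values are illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = rng.normal(size=50)

kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-10,
                              n_restarts_optimizer=5,
                              normalize_y=True, random_state=42)
gp.fit(X, y)

# gp.kernel_ is the *fitted* kernel; its repr is exactly the string the
# paper should report. ConstantKernel prints as "a**2 * ..." (amplitude
# squared); the underlying variance parameter is accessible directly:
print(gp.kernel_)
variance = gp.kernel_.k1.k1.constant_value   # variance, not amplitude
amplitude = np.sqrt(variance)
```

Reporting `gp.kernel_` verbatim, together with the constructor arguments above, pins down both the kernel and the training settings in one place.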
- **Kernel choice/feature geometry may be mismatched to the mixed continuous + high-dimensional one-hot categorical feature space (Sec. 2.2–2.3; Sec. 3.2).** If a single Euclidean-distance RBF is applied to a concatenated vector containing sparse one-hot family/type indicators, distances can become dominated by categorical mismatches, effectively making most points “far,” inflating uncertainty and encouraging the optimizer to push variance into the WhiteKernel. As written, it is unclear whether the null result reflects astrophysical irreducibility or this representational mismatch. *Recommendation:* Clarify exactly what feature matrix is fed to the GP (continuous $+$ one-hot together, and whether any scaling is applied to one-hot columns). Add at least one robustness experiment better aligned with mixed data types, e.g.: (a) an additive kernel with separate blocks (RBF on standardized [age, diameter] $+$ linear/DotProduct on one-hot or a dedicated categorical kernel); and/or (b) ARD length scales (per-dimension) rather than a single length scale. Report whether conclusions ($R^2$, residual structure, noise dominance) change; if they do not, the null result is much more convincing.
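Option (b) is directly supported in scikit-learn by passing a per-dimension length-scale array to RBF; a toy sketch with a fake continuous $+$ one-hot design matrix (column layout and sizes are assumptions for illustration):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

rng = np.random.default_rng(1)
n, n_cont, n_cat = 80, 2, 4
X = np.hstack([rng.normal(size=(n, n_cont)),          # standardized age, diameter
               rng.integers(0, 2, size=(n, n_cat))])  # toy one-hot block
y = X[:, 0] + 0.1 * rng.normal(size=n)

# ARD RBF: one length scale per input dimension. After fitting, large
# length scales on the one-hot columns mean the kernel is down-weighting
# categorical mismatches rather than being dominated by them.
kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(n_cont + n_cat),
                                   length_scale_bounds=(1e-2, 1e3)) \
         + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                              random_state=0).fit(X, y)
fitted_scales = gp.kernel_.k1.k2.length_scale
print(fitted_scales)  # per-dimension scales: continuous block vs one-hot block
```

Comparing the fitted continuous-block and one-hot-block length scales gives a direct diagnostic of the representational-mismatch concern.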
- **Missing baselines and robustness checks make the “no predictability” conclusion under-supported (Sec. 3.2–3.4).** Only a single GPR configuration and a single 80/20 split are reported. With $N \approx 1,\!626$ (or $5,\!890$; currently unclear), performance estimates can vary, and it is essential to show the GP is not underperforming trivial alternatives. *Recommendation:* Add baseline models in Sec. 3.2–3.4: (1) constant-mean predictor; (2) linear/ridge regression on the same preprocessing; and optionally (3) a nonparametric baseline (random forest or gradient boosting). Report $R^2$/MSE/MAE side-by-side. Replace or supplement the single split with repeated K-fold cross-validation (or repeated train/test splits) and provide mean $\pm$ std of metrics. This will clarify whether the null result is model-independent for the chosen feature set.
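The baseline comparison with repeated K-fold amounts to a few lines of scikit-learn; a sketch on placeholder data (the real feature matrix and target would be substituted):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))   # placeholder for the preprocessed features
y = rng.normal(size=200)        # placeholder for cos(obliquity)

# 5-fold CV repeated 3 times -> 15 scores per model (mean +/- std).
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
for name, model in [("constant mean", DummyRegressor(strategy="mean")),
                    ("ridge", Ridge(alpha=1.0))]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Adding the GPR (and optionally a random forest) to the same loop yields the side-by-side table the recommendation asks for.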
- **Bimodal target distribution and Gaussian-likelihood regression: the modeling framework may be misspecified for the main structure in the data (Sec. 3.1; Sec. 3.5).** The target appears strongly bimodal near $\pm 1$, suggesting a latent “prograde vs retrograde” (or multi-regime) structure. A Gaussian-likelihood regressor will often collapse toward intermediate means ($\approx 0$) with large uncertainties, which can resemble the observed behavior and should be treated as a central modeling concern rather than only a future direction. *Recommendation:* Quantify bimodality briefly (e.g., peak locations/heights or a simple 2-component mixture fit) and explicitly connect it to expected behavior of Gaussian-likelihood regression. Add one complementary experiment: classify $\mathrm{sign}(\cos($obliquity$))$ (or a 3-class version: near $+1$ / near $-1$ / intermediate) and report accuracy/AUC relative to a majority-class baseline. Even if predictability remains poor, this reframes the null result in a way that matches the target’s structure.
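Both checks fit in a few lines of scikit-learn; synthetic bimodal targets stand in for the real $\cos($obliquity$)$ values here:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
# Toy bimodal target near +/-1, mimicking prograde/retrograde clustering.
y = np.clip(np.concatenate([rng.normal(0.9, 0.05, 120),
                            rng.normal(-0.9, 0.05, 80)]), -1, 1)
X = rng.normal(size=(200, 3))  # placeholder features

# (1) Two-component mixture: report peak locations and weights.
gm = GaussianMixture(n_components=2, random_state=0).fit(y.reshape(-1, 1))
print("peaks:", np.sort(gm.means_.ravel()), "weights:", gm.weights_)

# (2) Sign classification vs majority-class baseline.
labels = (y > 0).astype(int)
for name, clf in [("majority", DummyClassifier(strategy="most_frequent")),
                  ("logistic", LogisticRegression())]:
    acc = cross_val_score(clf, X, labels, cv=5, scoring="accuracy")
    print(f"{name}: accuracy = {acc.mean():.3f}")
```

If the logistic model does not beat the majority baseline on the real features either, the null result is stated in terms that match the target's bimodal structure.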
- **Anomaly detection is tightly coupled to (possibly inflated/miscalibrated) predictive uncertainty; “zero anomalies” may be methodological rather than astrophysical (Sec. 2.5; Sec. 3.3).** Using $z = (y-\mu)/\sigma$ with $\sigma$ taken from GaussianProcessRegressor.predict(return\_std=True) will mechanically suppress $|z|$ when the model attributes most variance to noise. The paper does not specify whether $\sigma$ includes observation noise vs latent function uncertainty, nor does it assess calibration (coverage / standardized residual distribution). *Recommendation:* In Sec. 2.5, define $\sigma$ precisely (predictive std including WhiteKernel noise vs latent mean uncertainty) and state the exact code path used. Add calibration diagnostics in Sec. 3.3: (1) histogram of standardized residuals vs $N(0,1)$; (2) empirical coverage of nominal 68\%/95\% predictive intervals. Report sensitivity to threshold choice ($2\sigma$, $2.5\sigma$, $3\sigma$) and also consider an outlier criterion less sensitive to variance inflation (e.g., leave-one-out negative log predictive density / low predictive probability). Reframe the “no anomalies” conclusion as conditional on calibration and the chosen anomaly criterion.
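The coverage diagnostic is cheap to compute; a sketch, assuming `mu` and `sigma` come from `predict(return_std=True)` on held-out data:

```python
import numpy as np
from scipy import stats

def calibration_report(y_true, mu, sigma):
    """Standardized residuals and empirical coverage of nominal intervals.
    sigma must be the *predictive* std (including WhiteKernel noise) if the
    intervals are meant to cover observations."""
    z = (y_true - mu) / sigma
    coverage = {level: float(np.mean(np.abs(z) <= stats.norm.ppf(0.5 + level / 2)))
                for level in (0.68, 0.95)}
    return z, coverage

rng = np.random.default_rng(4)
y = rng.normal(size=500)

# Well-calibrated case: coverage should track the nominal levels.
z, cov = calibration_report(y, mu=np.zeros(500), sigma=np.ones(500))
print(cov)

# Variance inflation mechanically shrinks |z| and over-covers,
# which is exactly how "zero anomalies" can arise as an artifact.
_, cov_inflated = calibration_report(y, np.zeros(500), 2 * np.ones(500))
print(cov_inflated)
```

Empirical coverage far above nominal on the real data would indicate inflated $\sigma$ and would directly support the conditional reading of the "no anomalies" result.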
- **Data provenance and physical meaning of key inputs—especially “age”—are insufficiently defined, limiting scientific interpretability (Sec. 2.1–2.2; Sec. 3.5).** It is unclear whether age is a family-level attribute (shared by many objects) or an object-specific estimate, what method produced it, typical uncertainties, and how conflicting entries across sources were resolved. Similar concerns apply to obliquity derivation consistency across catalogs and to diameter uncertainties. *Recommendation:* Add a concise provenance subsection (Sec. 2.1 or Appendix): for each feature ($\cos($obliquity$)$, age, diameter, type, family), list the source catalog(s), unit conventions, typical uncertainties (if known), and reconciliation rules for duplicates/conflicts. Explicitly state whether age varies within families or is assigned per family, and discuss the implications (effective degrees of freedom; potential leakage/collinearity with family).