Diagnostics

You don’t observe \(\alpha_0\) on real data, so the held-out Riesz loss from cross-validation (cross_val_score, or whatever validation signal a backend exposes during fit) is the only training-time signal. After fitting, .diagnose() reports magnitude statistics and warnings about extreme tail values, near-positivity violations, and single-point extrapolation. Those warnings often matter more for downstream variance than the bulk fit quality.

.diagnose() is defined in rieszreg.diagnostics and works for any RieszEstimator subclass. Backend-specific extensions layer on top of the shared base; KernelDiagnostics, for example, adds the chosen λ, the support size, the effective degrees of freedom, and the condition number of the kernel matrix.

This page walks through .diagnose() on an under-tuned booster (so the warnings fire) and maps each warning to the hyperparameter that fixes it. See Boosting backend and Kernel backend for the backend-specific tuning recipes.

Setup: a low-overlap dataset

We push the propensity close to 0 in part of the covariate space. This is where naive inverse-propensity weighting falls apart.

import numpy as np, pandas as pd
from rieszboost import RieszBooster
from rieszreg import ATE

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 1, n)
pi = 1 / (1 + np.exp(-(8.0 * x - 5.0)))
a = rng.binomial(1, pi)
df = pd.DataFrame({"a": a.astype(float), "x": x})
print(f"min π(x) = {pi.min():.3f}, fraction with π(x) < 0.05 = {(pi < 0.05).mean():.2f}")
min π(x) = 0.007, fraction with π(x) < 0.05 = 0.26
set.seed(0)
n  <- 2000
x  <- runif(n)
pi <- 1 / (1 + exp(-(8 * x - 5)))
a  <- rbinom(n, 1, pi)
df <- data.frame(a = as.numeric(a), x = x)
cat(sprintf("min pi(x) = %.3f, fraction with pi(x) < 0.05 = %.2f\n",
            min(pi), mean(pi < 0.05)))
min pi(x) = 0.007, fraction with pi(x) < 0.05 = 0.26
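Before fitting anything, it is worth seeing how badly raw inverse-propensity weights behave on this design. A pure-numpy check, independent of the library (for the ATE, the true representer is \(a/\pi - (1-a)/(1-\pi)\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 1, n)
pi = 1 / (1 + np.exp(-(8.0 * x - 5.0)))
a = rng.binomial(1, pi)

# the true ATE representer is a/pi - (1-a)/(1-pi);
# even with the TRUE pi plugged in, the weights blow up where pi ~ 0
w = a / pi - (1 - a) / (1 - pi)
print(f"max |w| = {np.abs(w).max():.1f}")
```

Any learned \(\hat\alpha\) has to approximate this spiky target, so large fitted magnitudes are not automatically a bug; the diagnostics below distinguish genuine low overlap from over-fitting.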

A greedy fit

Deep trees, no regularization, no early stopping. This will extrapolate badly in the low-π region.

greedy = RieszBooster(
    estimand=ATE(),
    n_estimators=300,
    learning_rate=0.3,
    max_depth=8,
    reg_lambda=0.0,
).fit(df)

print(greedy.diagnose(df).summary())
Riesz representer diagnostics (n=2000):
  RMS magnitude   : 2.4655
  mean            : -0.1415
  min / max       : -11.5966 / 19.8519
  |alpha| quantiles:
     0.50: 1.1422
     0.90: 2.6658
     0.99: 10.2367
     1.00: 19.8519
  extreme rows    : 0/2000 (0.00%) with |alpha| > 30.0
  held-out Riesz  : -66.7209
greedy <- RieszBooster$new(
  estimand = ATE(),
  n_estimators = 300L,
  learning_rate = 0.3,
  max_depth = 8L,
  reg_lambda = 0.0
)
greedy$fit(df)
cat(greedy$diagnose(df)$summary)
Riesz representer diagnostics (n=2000):
  RMS magnitude   : 2.3614
  mean            : -0.2094
  min / max       : -13.6254 / 17.4702
  |alpha| quantiles:
     0.50: 1.1424
     0.90: 2.6994
     0.99: 9.7480
     1.00: 17.4702
  extreme rows    : 0/2000 (0.00%) with |alpha| > 30.0
  held-out Riesz  : -62.8998

Reading the report

The summary has three parts.

Magnitude block. RMS, mean, min/max, and |alpha| quantiles. The 99th percentile and the max are the load-bearing numbers: in a healthy fit the max is no more than a few times the 99th percentile. A max far above the 99th percentile signals a single-point extrapolation outlier: one row falls in a region with little or no nearby training data, and its prediction has run away.
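The max-to-99th-percentile rule of thumb is easy to check by hand on any vector of fitted values. A minimal numpy sketch with synthetic stand-in values (not a rieszreg API):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = rng.standard_normal(2000)   # stand-in for fitted alpha-hat values
alpha[0] = 50.0                     # one runaway extrapolation point

q99 = np.quantile(np.abs(alpha), 0.99)
ratio = np.abs(alpha).max() / q99
print(f"max/q99 ratio = {ratio:.1f}")  # a ratio >> 10 flags a single outlier
```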

Extreme rows. Rows where \(|\hat\alpha|\) exceeds extreme_threshold (default 30). The variance of the downstream influence-function estimator is dominated by the largest few weights, so even 1% of rows over the threshold can inflate confidence intervals substantially.
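To see why a handful of large weights dominates, here is a toy numpy calculation with stand-in weights and residuals (the 40.0 spike and the 1% fraction are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
alpha = rng.standard_normal(n)     # stand-in Riesz weights
resid = rng.standard_normal(n)     # stand-in outcome residuals

var_clean = np.var(alpha * resid)  # variance of the per-row IF terms

alpha_bad = alpha.copy()
alpha_bad[:20] = 40.0              # 1% of rows pushed to an extreme weight
var_bad = np.var(alpha_bad * resid)

print(f"variance inflation from 1% extreme rows: {var_bad / var_clean:.1f}x")
```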

Warnings. Two automatic checks:

  1. “X% of rows have \(|\hat\alpha| >\) threshold — possible near-positivity violation.” Either the data has a genuine near-positivity violation (truncate \(|\hat\alpha|\) at extreme_threshold and report the resulting loss of overlap), or the estimator is over-fitting (tune as below).
  2. “max \(|\hat\alpha|\) is >10× the 99th percentile — likely a single extrapolation outlier.” The learner found a region with no nearby training data. Tighten model complexity, increase regularization, or stop earlier.
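When the positivity violation is genuine, truncation is the standard remedy. A sketch using plain np.clip (the values and the 30.0 cut-off are illustrative; truncation introduces bias, so report the clipped fraction alongside the estimate):

```python
import numpy as np

alpha_hat = np.array([-35.0, -2.1, 0.8, 1.4, 19.8, 48.0])  # illustrative values
threshold = 30.0                                           # extreme_threshold
clipped = np.clip(alpha_hat, -threshold, threshold)
frac = np.mean(np.abs(alpha_hat) > threshold)
print(clipped, f"| fraction truncated = {frac:.2f}")
```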

A tuned fit

Same data with sensible hyperparameters and early stopping.

tuned = RieszBooster(
    estimand=ATE(),
    n_estimators=2000,
    learning_rate=0.05,
    max_depth=3,
    reg_lambda=1.0,
    early_stopping_rounds=20,
    validation_fraction=0.2,
).fit(df)

print(f"stopped at iter {tuned.best_iteration_}, "
      f"best held-out Riesz loss = {tuned.best_score_:.4f}")
stopped at iter 1999, best held-out Riesz loss = -31.0978
print()
print(tuned.diagnose(df).summary())
Riesz representer diagnostics (n=2000):
  RMS magnitude   : 3.2958
  mean            : 0.0136
  min / max       : -43.5758 / 30.7376
  |alpha| quantiles:
     0.50: 1.1028
     0.90: 2.9091
     0.99: 14.2924
     1.00: 43.5758
  extreme rows    : 2/2000 (0.10%) with |alpha| > 30.0
  held-out Riesz  : -28.1668
tuned <- RieszBooster$new(
  estimand = ATE(),
  n_estimators = 2000L,
  learning_rate = 0.05,
  max_depth = 3L,
  reg_lambda = 1.0,
  early_stopping_rounds = 20L,
  validation_fraction = 0.2
)
tuned$fit(df)
cat(sprintf("stopped at iter %d, best held-out Riesz loss = %.4f\n\n",
            reticulate::py_to_r(tuned$py$best_iteration_),
            reticulate::py_to_r(tuned$py$best_score_)))
stopped at iter 1999, best held-out Riesz loss = -29.3481
cat(tuned$diagnose(df)$summary)
Riesz representer diagnostics (n=2000):
  RMS magnitude   : 3.1136
  mean            : -0.0701
  min / max       : -15.5357 / 43.7492
  |alpha| quantiles:
     0.50: 1.1113
     0.90: 3.2547
     0.99: 10.1967
     1.00: 43.7492
  extreme rows    : 3/2000 (0.15%) with |alpha| > 30.0
  held-out Riesz  : -30.6263

Compare the two reports on the three load-bearing numbers: the held-out Riesz loss, the ratio of max \(|\hat\alpha|\) to its 99th percentile, and the extreme-row count. With regularization and early stopping in place, the automatic warnings stay quiet.

What .diagnose() does not do

It does not estimate \(\alpha_0\) for comparison; a closed form for \(\alpha_0\) is generally unavailable on real data. It does not assess whether the sample is large enough. It checks magnitudes and the two failure modes above, nothing more. For fit quality, track the held-out Riesz loss (best_score_). For downstream confidence intervals, use the bootstrap on the plug-in estimator.
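As a sketch of the last point, here is a naive percentile bootstrap over stand-in per-row plug-in contributions (psi is synthetic; in practice the contributions come from the fitted model, and a full bootstrap would refit the estimator on each resample):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
# stand-in: per-row contributions of the plug-in estimator
psi = rng.standard_normal(n) + 1.5

B = 500
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)    # resample rows with replacement
    boot[b] = psi[idx].mean()      # re-average the contributions

lo, hi = np.quantile(boot, [0.025, 0.975])
print(f"95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```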