Tuning and Cross-Fitting

RieszBooster, KernelRieszRegressor, ForestRieszRegressor, AugForestRieszRegressor, and RieszNet are all subclasses of rieszreg.RieszEstimator, which inherits from sklearn.base.BaseEstimator.

From Python, every sklearn tool works without modification: cross_val_predict, GridSearchCV, Pipeline, clone, set_params, RandomizedSearchCV, HalvingGridSearchCV, and the rest of sklearn.model_selection. From R, use rsample::vfold_cv for splits and a for loop over the R6 estimator’s $fit() / $predict() / $score() per fold.

This page uses RieszBooster for the single-learner sections and adds RieszNet and ForestRieszRegressor to the ensemble-selection example. The same patterns work for KernelRieszRegressor and AugForestRieszRegressor with backend-specific hyperparameters.

Cross-fitting

The plug-in \(\hat\alpha\) fed into a one-step / TMLE / DML estimator must be predicted out-of-fold. Use cross_val_predict:

import os
os.environ.update({"OMP_NUM_THREADS": "1", "MKL_NUM_THREADS": "1"})
import numpy as np, pandas as pd
from sklearn.model_selection import cross_val_predict
from rieszboost import RieszBooster
from rieszreg import ATE

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 1, n)
pi = 1 / (1 + np.exp(-(-0.02 * x - x**2 + 4 * np.log(x + 0.3) + 1.5)))
a = rng.binomial(1, pi)
df = pd.DataFrame({"a": a.astype(float), "x": x})

booster = RieszBooster(
    estimand=ATE(),
    n_estimators=2000,
    early_stopping_rounds=20,
    validation_fraction=0.2,
    learning_rate=0.05,
    max_depth=4,
)

alpha_oof = cross_val_predict(booster, df, cv=5)
alpha_oof.shape, alpha_oof[:5]
((2000,), array([-3.6196816, -1.6876634, -1.0861515, -1.0516312,  1.2505229],
      dtype=float32))

The same setup and cross-fit in R, using rsample for the folds:

suppressPackageStartupMessages(library(rsample))
set.seed(0)
n  <- 2000
x  <- runif(n)
pi <- 1 / (1 + exp(-(-0.02 * x - x^2 + 4 * log(x + 0.3) + 1.5)))
a  <- rbinom(n, 1, pi)
df <- data.frame(a = as.numeric(a), x = x)

make_booster <- function() RieszBooster$new(
  estimand = ATE(),
  n_estimators = 2000L,
  early_stopping_rounds = 20L,
  validation_fraction = 0.2,
  learning_rate = 0.05,
  max_depth = 4L
)

alpha_oof <- numeric(nrow(df))
for (split in vfold_cv(df, v = 5)$splits) {
  booster <- make_booster()
  booster$fit(analysis(split))
  alpha_oof[complement(split)] <- booster$predict(assessment(split))
}
length(alpha_oof); head(alpha_oof)
[1] 2000
[1]  1.805970  6.023205  2.519082 -3.056600  1.106197 -1.436588

Each fold runs its own internal early-stopping split. The R loop calls $fit() / $predict() per fold on a fresh booster, mirroring what cross_val_predict does in Python.
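The out-of-fold \(\hat\alpha\) then plugs into a one-step estimator. A minimal numpy sketch, assuming a hypothetical outcome vector y and hypothetical out-of-fold outcome-model predictions g1 and g0 at a = 1 and a = 0 (none of these are produced by the snippet above):

```python
import numpy as np

def one_step_ate(y, a, alpha_oof, g1, g0):
    """One-step ATE from cross-fit nuisances (sketch; names are hypothetical).

    y: outcome, a: binary treatment, alpha_oof: out-of-fold Riesz representer,
    g1/g0: out-of-fold outcome-model predictions at a=1 and a=0.
    """
    plug_in = g1 - g0                         # m(O; g-hat) for the ATE
    g_obs = np.where(a == 1, g1, g0)          # outcome model at observed treatment
    psi = plug_in + alpha_oof * (y - g_obs)   # per-row one-step term
    theta = psi.mean()                        # debiased point estimate
    se = psi.std(ddof=1) / np.sqrt(len(y))    # plug-in standard error
    return theta, se
```

Because both \(\hat\alpha\) and the outcome model are out-of-fold, the usual cross-fitting guarantees apply.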

Comparing learners with different losses

Two estimators trained with different losses still produce α̂ in the same units (a per-row real value), so the squared yardstick scores them on the same scale and cross_val_score results are directly comparable.

from sklearn.model_selection import cross_val_score
from rieszreg import BoundedSquaredLoss, SquaredLoss, riesz_scorer

booster_bd = RieszBooster(
    estimand=ATE(), loss=BoundedSquaredLoss(lo=-50.0, hi=50.0),
    n_estimators=300, learning_rate=0.05, max_depth=3,
)
booster_sq = RieszBooster(
    estimand=ATE(), loss=SquaredLoss(),
    n_estimators=300, learning_rate=0.05, max_depth=3,
)

s_bd = cross_val_score(booster_bd, df, cv=3)
s_sq = cross_val_score(booster_sq, df, cv=3)
print(f"BoundedSquared-trained: mean score = {s_bd.mean():.4f}")
BoundedSquared-trained: mean score = -20.3700
print(f"squared-trained:        mean score = {s_sq.mean():.4f}")
squared-trained:        mean score = 6.0809

A single GridSearchCV can sweep across losses too. Each candidate is fit with its own loss; all candidates are ranked on the squared yardstick.

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    RieszBooster(
        estimand=ATE(),
        n_estimators=300, learning_rate=0.05, max_depth=3,
    ),
    param_grid={"loss": [BoundedSquaredLoss(lo=-50.0, hi=50.0), SquaredLoss()]},
    cv=3,
    n_jobs=1,
).fit(df)

print("best loss:", grid.best_params_["loss"])
best loss: <rieszreg.losses.squared.SquaredLoss object at 0x7ffacb08db20>
print(f"best score: {grid.best_score_:.4f}")
best score: 6.0809

To rank with a different yardstick, build it with riesz_scorer:

scorer = riesz_scorer(loss=BoundedSquaredLoss(lo=-50.0, hi=50.0))
s_bd_yard = cross_val_score(booster_bd, df, cv=3, scoring=scorer)
s_sq_yard = cross_val_score(booster_sq, df, cv=3, scoring=scorer)
print(f"BoundedSquared-trained on bounded yardstick: {s_bd_yard.mean():.4f}")
BoundedSquared-trained on bounded yardstick: -20.3700
print(f"squared-trained on bounded yardstick:        {s_sq_yard.mean():.4f}")
squared-trained on bounded yardstick:        6.0809

The yardstick must accept the α the estimator predicts. The squared yardstick has unrestricted α-domain, so it works for any learner; the others have restrictions:

Yardstick                    α-domain      Accepts α from learners trained with
SquaredLoss                  unrestricted  any loss (the default)
KLLoss                       (0, ∞)        KL-trained learners
BernoulliLoss                (0, 1)        Bernoulli-trained learners
BoundedSquaredLoss(lo, hi)   (lo, hi)      matching-bounded learners

Pick a yardstick whose domain accepts every candidate’s α. SquaredLoss is always safe.
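For intuition, a squared Riesz yardstick can be written as the sign-flipped empirical loss E_n[α(O)² − 2 m(O; α)], where for the ATE m(O; α) = α(1, X) − α(0, X). The numpy sketch below illustrates that form (an illustration of the idea, not the library's exact implementation):

```python
import numpy as np

def squared_riesz_score(alpha_obs, alpha_1, alpha_0):
    """Sign-flipped empirical squared Riesz loss for the ATE (illustrative).

    alpha_obs: candidate alpha at the observed (a, x) rows
    alpha_1, alpha_0: candidate alpha evaluated at a=1 and at a=0
    Only the candidate's own predictions appear, which is why learners
    trained with any loss can be ranked on this one scale.
    """
    loss = np.mean(alpha_obs**2) - 2 * np.mean(alpha_1 - alpha_0)
    return -loss  # higher is better, matching sklearn's scoring convention
```

The true Riesz representer minimizes this loss in expectation, so better candidates score higher regardless of the loss they were trained with.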

Cross-fit ensemble selection

Going beyond two losses on a single backend, the pattern below selects among five learner configurations — two boosting depths, two neural-net architectures, and a forest — using inner 2-fold CV inside each of three outer cross-fitting folds. The inner CV runs entirely on the training portion of each outer fold, so the model-selection criterion never sees the held-out data that cross_val_predict predicts.

This uses the Pipeline idiom for cross-estimator-type selection: wrap the estimator in a single-step Pipeline named "riesz", then pass a list of fully configured instances as the value for that key. GridSearchCV calls pipe.set_params(riesz=candidate), fits, and scores each one on the canonical squared yardstick.

from sklearn.pipeline import Pipeline
from riesznet import RieszNet
from forestriesz import ForestRieszRegressor

estimand = ATE()

# Five fully configured candidates across three learner families.
candidates = [
    # Boosting — depth 3, early stopping
    RieszBooster(
        estimand=estimand,
        n_estimators=200,
        max_depth=3,
        early_stopping_rounds=10,
        validation_fraction=0.2,
        learning_rate=0.05,
    ),
    # Boosting — depth 5, early stopping
    RieszBooster(
        estimand=estimand,
        n_estimators=200,
        max_depth=5,
        early_stopping_rounds=10,
        validation_fraction=0.2,
        learning_rate=0.05,
    ),
    # Neural net — small MLP with early stopping
    RieszNet(
        estimand=estimand,
        hidden_sizes=(64, 64),
        epochs=100,
        early_stopping_rounds=10,
        validation_fraction=0.2,
    ),
    # Neural net — larger MLP with early stopping
    RieszNet(
        estimand=estimand,
        hidden_sizes=(128, 64),
        epochs=100,
        early_stopping_rounds=10,
        validation_fraction=0.2,
    ),
    # Forest Riesz
    ForestRieszRegressor(
        estimand=estimand,
        n_estimators=100,
        min_samples_leaf=10,
        random_state=0,
    ),
]

# Single-step Pipeline so GridSearchCV can swap in any candidate as the step.
pipe = Pipeline([("riesz", candidates[0])])

# Inner CV: 2 folds, selects the best learner+config from the five candidates.
cv_inner = GridSearchCV(
    pipe,
    param_grid=[{"riesz": [c]} for c in candidates],
    cv=2,
    n_jobs=1,
    refit=True,
)

# Outer CV: 3 cross-fitting folds. Each fold:
#   1. Fits cv_inner on 2/3 of the data (inner CV happens here, on that 2/3).
#   2. Predicts the held-out 1/3 using the winner from step 1.
# The inner model selection never touches the held-out slice.
alpha_oof = cross_val_predict(cv_inner, df, cv=3)
print(f"out-of-fold alpha shape: {alpha_oof.shape}")
out-of-fold alpha shape: (2000,)
print(f"first 5 values:          {alpha_oof[:5]}")
first 5 values:          [-2.79388666 -1.77150512 -1.09701025 -1.09701025  1.24940085]

The call to cross_val_predict is identical to the single-learner case. The only addition is the Pipeline wrapper and the list-valued param_grid entry — everything else is standard sklearn.
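To see that this is plain sklearn machinery, the same Pipeline-swap idiom runs with stock sklearn regressors; here Ridge and a decision tree stand in for the Riesz learners:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, cross_val_predict

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (500, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(0, 0.1, 500)

# Single-step pipeline; GridSearchCV swaps whole estimators into the step.
pipe = Pipeline([("model", Ridge())])
inner = GridSearchCV(
    pipe,
    param_grid={"model": [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3)]},
    cv=2,
)
# Outer cross-fitting: each fold reruns the inner selection on its training part.
yhat_oof = cross_val_predict(inner, X, y, cv=3)
```

Swapping in the Riesz candidates and dropping y recovers the example above.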

The same nested cross-fit with per-fold model selection in R uses rsample (tidymodels) for the fold splits and a small loop that picks the inner-CV winner. Each outer fold runs its own inner CV on its training portion only.

# Load forestriesz and riesznet R wrappers
.project_root <- Sys.getenv("QUARTO_PROJECT_DIR", unset = getwd())
suppressMessages(pkgload::load_all(
  file.path(.project_root, "../packages/forestriesz/r/forestriesz"), quiet = TRUE))
suppressMessages(pkgload::load_all(
  file.path(.project_root, "../packages/riesznet/r/riesznet"),       quiet = TRUE))
suppressPackageStartupMessages(library(rsample))

estimand <- ATE()
make_boost_d3 <- function() RieszBooster$new(
  estimand = estimand, n_estimators = 200L, max_depth = 3L,
  early_stopping_rounds = 10L, validation_fraction = 0.2, learning_rate = 0.05
)
make_boost_d5 <- function() RieszBooster$new(
  estimand = estimand, n_estimators = 200L, max_depth = 5L,
  early_stopping_rounds = 10L, validation_fraction = 0.2, learning_rate = 0.05
)
make_net_64  <- function() RieszNet$new(
  estimand = estimand, hidden_sizes = c(64L, 64L), epochs = 100L,
  early_stopping_rounds = 10L, validation_fraction = 0.2
)
make_net_128 <- function() RieszNet$new(
  estimand = estimand, hidden_sizes = c(128L, 64L), epochs = 100L,
  early_stopping_rounds = 10L, validation_fraction = 0.2
)
make_forest  <- function() ForestRieszRegressor$new(
  estimand = estimand, n_estimators = 100L, min_samples_leaf = 10L,
  random_state = 0L
)
candidates <- list(make_boost_d3, make_boost_d5, make_net_64, make_net_128, make_forest)

cv_outer <- vfold_cv(df, v = 3)
alpha_oof <- numeric(nrow(df))
for (split in cv_outer$splits) {
  tr <- analysis(split); te <- assessment(split); te_idx <- complement(split)
  cv_inner <- vfold_cv(tr, v = 2)
  scores <- vapply(candidates, function(make) {
    mean(vapply(cv_inner$splits, function(s) {
      m <- make(); m$fit(analysis(s)); m$score(assessment(s))
    }, numeric(1)))
  }, numeric(1))
  best_make <- candidates[[which.max(scores)]]
  best <- best_make(); best$fit(tr)
  alpha_oof[te_idx] <- best$predict(te)
}
length(alpha_oof); head(alpha_oof, 5)
[1] 2000
[1]  1.146083  5.361935  2.419404 -2.785415  1.130142

ForestRieszRegressor requires the forestriesz package, and riesznet is needed for the neural candidates; load both R packages alongside rieszreg and rieszboost per the install instructions.

Other sklearn integrations

Every Riesz estimator exposes the standard BaseEstimator interface:

  • clone(estimator) returns a fresh unfitted copy with the same hyperparameters.
  • Pipeline wraps a feature-engineering step in front of the estimator.
  • HalvingGridSearchCV, RandomizedSearchCV, BayesSearchCV (skopt) work without modification.
  • get_params() / set_params() are used by the search classes to introspect and mutate the configuration.
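
These are the standard BaseEstimator semantics; with a stock sklearn estimator standing in, the contract looks like:

```python
from sklearn.base import clone
from sklearn.linear_model import Ridge  # any BaseEstimator subclass behaves the same

est = Ridge(alpha=0.5)
fresh = clone(est)                  # fresh unfitted copy, same hyperparameters
print(fresh.get_params()["alpha"])  # 0.5
fresh.set_params(alpha=2.0)         # how search classes mutate a candidate's config
print(fresh.get_params()["alpha"])  # 2.0
```

Because the Riesz estimators implement this contract, every tool built on it — the search classes, Pipeline, cross_val_predict — composes with them for free.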

See also