RieszBooster, KernelRieszRegressor, ForestRieszRegressor, AugForestRieszRegressor, and RieszNet are all subclasses of rieszreg.RieszEstimator, which inherits from sklearn.base.BaseEstimator.
From Python, every sklearn tool works without modification: cross_val_predict, GridSearchCV, Pipeline, clone, set_params, RandomizedSearchCV, HalvingGridSearchCV, and the rest of sklearn.model_selection. From R, use rsample::vfold_cv for splits and a for loop over the R6 estimator’s $fit() / $predict() / $score() per fold.
This page uses RieszBooster for the single-learner sections and adds RieszNet and ForestRieszRegressor to the ensemble-selection example. The same patterns work for KernelRieszRegressor and AugForestRieszRegressor with backend-specific hyperparameters.
Cross-fitting
The plug-in \(\hat\alpha\) fed into a one-step / TMLE / DML estimator must be predicted out-of-fold. Use cross_val_predict:
Each fold runs its own internal early-stopping split. The R loop calls $fit() / $predict() per fold on a fresh booster, mirroring what cross_val_predict does in Python.
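As a sketch of what that per-fold loop does, here is the same pattern in plain Python with a stand-in sklearn learner (GradientBoostingRegressor with its own internal early-stopping split, not RieszBooster, which is not used here):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_predict

X, y = make_regression(n_samples=300, n_features=5, random_state=0)

# Stand-in learner with its own internal early-stopping split,
# as each per-fold booster would have.
booster = GradientBoostingRegressor(
    n_estimators=200, validation_fraction=0.2, n_iter_no_change=10, random_state=0
)

# Manual per-fold loop: fit a fresh clone on the training folds,
# predict the held-out fold. This is what cross_val_predict automates
# and what the R $fit() / $predict() loop mirrors.
oof = np.empty(len(y))
for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    fold_model = clone(booster).fit(X[train_idx], y[train_idx])
    oof[test_idx] = fold_model.predict(X[test_idx])

# cross_val_predict produces the same out-of-fold structure in one call.
oof_cvp = cross_val_predict(booster, X, y, cv=3)
print(oof.shape, oof_cvp.shape)
```

Every row of `oof` is predicted by a model that never saw that row during training, which is exactly the out-of-fold property cross-fitting requires.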
Hyperparameter search
.score(Z) returns the negative canonical squared Riesz loss — a fixed yardstick independent of the loss the estimator was trained with, in the spirit of R² for regressors. GridSearchCV selects the configuration with the highest score (lowest squared loss). To use a different yardstick, pass scoring=riesz_scorer(loss=...).
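The scoring mechanism is sklearn's standard one, so it can be sketched with a plain regressor and make_scorer; this is an analogue of the fixed-yardstick idea, not the riesz_scorer API itself:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# A fixed yardstick expressed as negative squared error, so that
# "higher is better", the same convention GridSearchCV assumes when
# it picks the configuration with the highest score.
neg_sq = make_scorer(mean_squared_error, greater_is_better=False)

search = GridSearchCV(
    Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, scoring=neg_sq, cv=3
)
search.fit(X, y)
print(search.best_params_)
```

Passing a different scorer object changes the yardstick without touching the loss each candidate was trained with.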
Two estimators trained with different losses produce α̂ in the same units (a per-row real value), so the squared yardstick scores them on the same scale. cross_val_score results are directly comparable.
BoundedSquared-trained on bounded yardstick: -20.3700
print(f"squared-trained on bounded yardstick: {s_sq_yard.mean():.4f}")
squared-trained on bounded yardstick: 6.0809
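The same one-yardstick comparison can be sketched with plain sklearn regressors (an analogue, not the rieszreg API): two learners with different internal objectives are scored on a single fixed criterion, so their cross_val_score results are directly comparable.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# Two learners trained with different internal objectives, scored on
# one fixed yardstick (negative MSE), so the numbers live on one scale,
# like scoring differently-trained alpha-hats on the squared yardstick.
results = {}
for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=0.1))]:
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=3)
    results[name] = scores.mean()
    print(f"{name}: {results[name]:.4f}")
```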
The yardstick must accept the α the estimator predicts. The squared yardstick has unrestricted α-domain, so it works for any learner; the others have restrictions:
| Yardstick | α-domain | Accepts α from learners trained with |
|---|---|---|
| SquaredLoss | ℝ | any loss (the default) |
| KLLoss | (0, ∞) | KL-trained learners |
| BernoulliLoss | (0, 1) | Bernoulli-trained learners |
| BoundedSquaredLoss(lo, hi) | (lo, hi) | matching-bounded learners |
Pick a yardstick whose domain accepts every candidate’s α. SquaredLoss is always safe.
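The domain rule in the table can be expressed as a small check. This helper is hypothetical (not part of rieszreg); the domains are taken from the table above:

```python
import math

# Hypothetical helper, not part of rieszreg: alpha-domains from the
# table above, keyed by yardstick name.
YARDSTICK_DOMAINS = {
    "SquaredLoss": (-math.inf, math.inf),
    "KLLoss": (0.0, math.inf),
    "BernoulliLoss": (0.0, 1.0),
}

def yardstick_accepts(yardstick, alpha_values, lo=None, hi=None):
    """True if every predicted alpha lies inside the yardstick's open domain."""
    if yardstick == "BoundedSquaredLoss":
        domain = (lo, hi)
    else:
        domain = YARDSTICK_DOMAINS[yardstick]
    return all(domain[0] < a < domain[1] for a in alpha_values)

print(yardstick_accepts("SquaredLoss", [-2.7, 1.2]))  # True: any real alpha
print(yardstick_accepts("KLLoss", [-2.7, 1.2]))       # False: negative alpha rejected
```

Running a check like this over every candidate's predictions before a joint comparison guards against silently scoring an unrestricted learner on a restricted yardstick.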
Cross-fit ensemble selection
To go beyond two losses on a single backend, the pattern below selects among five learner configurations — two boosting depths, two neural-net architectures, and a forest — using inner 2-fold CV inside each of three outer cross-fitting folds. The inner CV runs entirely on the training portion of each outer fold, so the model-selection criterion never sees the held-out data that cross_val_predict predicts.
This uses the Pipeline idiom for cross-estimator-type selection: wrap the estimator in a single-step Pipeline named "riesz", then pass a list of fully configured instances as the value for that key. GridSearchCV calls pipe.set_params(riesz=candidate), fits, and scores each one on the canonical squared yardstick.
```python
from sklearn.pipeline import Pipeline
from riesznet import RieszNet
from forestriesz import ForestRieszRegressor

estimand = ATE()

# Five fully configured candidates across three learner families.
candidates = [
    # Boosting — depth 3, early stopping
    RieszBooster(
        estimand=estimand, n_estimators=200, max_depth=3,
        early_stopping_rounds=10, validation_fraction=0.2, learning_rate=0.05,
    ),
    # Boosting — depth 5, early stopping
    RieszBooster(
        estimand=estimand, n_estimators=200, max_depth=5,
        early_stopping_rounds=10, validation_fraction=0.2, learning_rate=0.05,
    ),
    # Neural net — small MLP with early stopping
    RieszNet(
        estimand=estimand, hidden_sizes=(64, 64), epochs=100,
        early_stopping_rounds=10, validation_fraction=0.2,
    ),
    # Neural net — larger MLP with early stopping
    RieszNet(
        estimand=estimand, hidden_sizes=(128, 64), epochs=100,
        early_stopping_rounds=10, validation_fraction=0.2,
    ),
    # Forest Riesz
    ForestRieszRegressor(
        estimand=estimand, n_estimators=100, min_samples_leaf=10, random_state=0,
    ),
]

# Single-step Pipeline so GridSearchCV can swap in any candidate as the step.
pipe = Pipeline([("riesz", candidates[0])])

# Inner CV: 2 folds, selects the best learner+config from the five candidates.
cv_inner = GridSearchCV(
    pipe,
    param_grid=[{"riesz": [c]} for c in candidates],
    cv=2,
    n_jobs=1,
    refit=True,
)

# Outer CV: 3 cross-fitting folds. Each fold:
# 1. Fits cv_inner on 2/3 of the data (inner CV happens here, on that 2/3).
# 2. Predicts the held-out 1/3 using the winner from step 1.
# The inner model selection never touches the held-out slice.
alpha_oof = cross_val_predict(cv_inner, df, cv=3)

print(f"out-of-fold alpha shape: {alpha_oof.shape}")
```
out-of-fold alpha shape: (2000,)
print(f"first 5 values: {alpha_oof[:5]}")
first 5 values: [-2.79388666 -1.77150512 -1.09701025 -1.09701025 1.24940085]
The call to cross_val_predict is identical to the single-learner case. The only addition is the Pipeline wrapper and the list-valued param_grid entry — everything else is standard sklearn.
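Because the idiom is pure sklearn, it can be exercised end to end with ordinary regressors standing in for the Riesz learners (Ridge and RandomForestRegressor here, as assumed stand-ins):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=120, n_features=4, noise=1.0, random_state=0)

# One-step Pipeline whose "model" step is swapped among fully
# configured candidates, the same trick used with the "riesz" step.
candidates = [
    Ridge(alpha=1.0),
    RandomForestRegressor(n_estimators=20, random_state=0),
]
pipe = Pipeline([("model", candidates[0])])

inner = GridSearchCV(pipe, param_grid=[{"model": [c]} for c in candidates], cv=2)

# Outer cross-fitting: each fold re-runs the inner selection on its
# training portion only, then predicts its held-out slice.
preds = cross_val_predict(inner, X, y, cv=3)
print(preds.shape)
```

GridSearchCV calls `pipe.set_params(model=candidate)` for each entry, so the winner can be a different estimator class in different outer folds.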
The same nested cross-fit + per-fold model selection in R uses rsample (tidymodels) for the fold splits and a small loop for the inner-CV winner pick. Each outer fold runs its own inner-CV on its training portion only.
ForestRieszRegressor requires the forestriesz package, and the neural candidates require riesznet; load both alongside rieszreg and rieszboost per the install instructions.
Other sklearn integrations
Every Riesz estimator exposes the standard BaseEstimator interface:
- clone(estimator) returns a fresh, unfitted copy with the same hyperparameters.
- Pipeline wraps a feature-engineering step in front of the estimator.
- HalvingGridSearchCV, RandomizedSearchCV, and BayesSearchCV (skopt) work without modification.
- get_params() / set_params() are used by the search classes to introspect and mutate the configuration.
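These are the standard sklearn BaseEstimator semantics, shown here with Ridge as a stand-in (the Riesz estimators expose the same interface per the section above):

```python
from sklearn.base import clone
from sklearn.linear_model import Ridge

est = Ridge(alpha=2.0)

# get_params / set_params: what the search classes use under the hood
# to introspect and mutate the configuration.
params = est.get_params()
est.set_params(alpha=5.0)

# clone: fresh, unfitted copy carrying the current hyperparameters.
fresh = clone(est)
print(params["alpha"], fresh.get_params()["alpha"])  # 2.0 5.0
```

Note that `clone` copies hyperparameters, never fitted state, which is why the search classes can safely refit clones in parallel.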