rieszboost provides two boosting backends and the RieszBooster convenience class. XGBoostBackend is the default; SklearnBackend runs Friedman gradient boosting against any sklearn-compatible base learner.
The augmentation engine produces per-row quadratic (\(a\)) and linear (\(b\)) coefficients. Each original row contributes \(a=1\), \(b=0\); each \((c, p)\) pair from the linear functional \(m(z, \cdot)\) contributes a counterfactual row at \(p\) with \(a=0\), \(b=-2c\). Each row's loss contribution is then \(a\,\alpha^2 + b\,\alpha\), so the Riesz loss becomes a per-row weighted regression that xgboost handles via its custom-objective interface.
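For intuition, here is a minimal sketch of what the augmentation might look like for the ATE, where \(m(z, \alpha) = \alpha(1, x) - \alpha(0, x)\); the function and the a_coef/b_coef column names are hypothetical, not the library's internals:

```r
# Sketch only: augment an (a, x) data frame for the ATE functional
# m(z, alpha) = alpha(1, x) - alpha(0, x). Each observed row yields one
# original row plus two counterfactual rows with b = -2c.
augment_ate <- function(df) {
  rbind(
    transform(df,        a_coef = 1, b_coef = 0),   # original row: alpha^2 term
    transform(df, a = 1, a_coef = 0, b_coef = -2),  # (c = +1, p = (1, x))
    transform(df, a = 0, a_coef = 0, b_coef =  2)   # (c = -1, p = (0, x))
  )
}
```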
XGBoostBackend
The default. Boosts in η-space; the predictor applies the loss spec’s link to convert to α.
hessian_floor is the lower bound on the per-row Hessian fed to xgboost. Counterfactual rows have \(H = 2a = 0\); without a floor, xgboost's leaf-weight Newton step \(-G/(H+\lambda)\) is degenerate at those leaves. The default of 2.0 matches the natural Hessian on original rows (\(2a = 2\)).
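A minimal sketch of the flooring, illustrative rather than the backend's actual code:

```r
# Per-row grad = 2*a*eta + b and hess = 2*a (identity link assumed). xgboost's
# Newton leaf weight is -G / (H + lambda), with G and H summed over a leaf's
# rows; flooring each row's Hessian keeps the step well-behaved in leaves
# that contain only counterfactual rows (H = 0).
floored_leaf_weight <- function(grad, hess, lambda = 1.0, hessian_floor = 2.0) {
  hess <- pmax(hess, hessian_floor)
  -sum(grad) / (sum(hess) + lambda)
}
```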
gradient_only bypasses the second-order Hessian and uses \(H = 1\) everywhere, i.e. first-order gradient boosting (Friedman 2001 / Lee-Schuler Algorithm 2). Set it to TRUE to reproduce the Lee-Schuler reference implementation exactly.
max_depth, reg_lambda, subsample, learning_rate, n_estimators, and early_stopping_rounds are passed through from the RieszBooster constructor.
Tuning recipe
| Knob | Default | When to change | Effect |
|------|---------|----------------|--------|
| max_depth | 4 | Drop to 2–3 for extrapolation outliers | Shallow trees can't carve out a high-magnitude leaf for a single point. |
| learning_rate | 0.05 | Drop to 0.02 for harder problems | Smaller updates per tree; pair with higher n_estimators. |
| n_estimators + early_stopping_rounds | 200, None | Set n_estimators = 1000–2000 and early_stopping_rounds = 20 | The validation split picks the iteration count. |
| validation_fraction | 0.0 | Set to 0.2 with early_stopping_rounds | Internal split for early stopping. Alternative: pass eval_set= explicitly. |
| reg_lambda | 1.0 | Bump to 5–10 for low-overlap data | xgboost L2 on leaf weights; damps the magnitude of any single leaf. |
| subsample | 1.0 | Try 0.5–0.8 with very large \(n\) | Stochastic boosting. |
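Putting the early-stopping rows of the table together, a hedged sketch; it assumes these knobs are accepted by RieszBooster$new and forwarded to the default XGBoostBackend, per the passthrough note above, and the values are illustrative:

```r
# Sketch of the table's early-stopping recipe. Whether each knob sits on
# RieszBooster or on the backend may differ in your version of the package.
booster <- RieszBooster$new(
  estimand = ATE(),
  learning_rate = 0.02,
  n_estimators = 1500L,
  early_stopping_rounds = 20L,
  validation_fraction = 0.2
)
```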
XGBoostBackend exposes two more knobs:
| Knob | Default | When to change |
|------|---------|----------------|
| gradient_only | FALSE | Set to TRUE to reproduce Lee-Schuler Algorithm 2 / Friedman 2001 exactly. |
| hessian_floor | 2.0 | Lower bound on the per-row Hessian. The default matches the natural Hessian of original-data rows. |
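For example (hypothetical constructor usage, assuming XGBoostBackend takes these knobs as arguments in the same way SklearnBackend is configured below):

```r
# Hypothetical: reproduce the Lee-Schuler first-order reference behaviour.
backend <- XGBoostBackend(gradient_only = TRUE)
booster <- RieszBooster$new(estimand = ATE(), backend = backend)
```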
SklearnBackend
To use a non-tree base learner, pass a base_learner_factory: a zero-argument callable that returns a fresh sklearn-compatible regressor each round.
The backend implements first-order gradient boosting with closed-form line search. Each round: fit the weak learner to the negative gradient of the Riesz loss, solve for the optimal step size under a quadratic surrogate, and update the running prediction.
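A condensed sketch of one such round, with hypothetical helpers fit_learner and predict_learner standing in for the sklearn fit/predict calls:

```r
# Sketch of one first-order round on the augmented objective
# sum(a*eta^2 + b*eta); fit_learner and predict_learner are hypothetical
# stand-ins for the regressor returned by base_learner_factory.
boost_round <- function(eta, x, a, b, fit_learner, predict_learner, lr = 0.05) {
  g <- 2 * a * eta + b                   # per-row gradient of the Riesz loss
  weak <- fit_learner(x, -g)             # regress the negative gradient
  h <- predict_learner(weak, x)          # proposed update direction
  # Closed-form line search: minimize sum(a*(eta + s*h)^2 + b*(eta + s*h)) in s
  s <- -sum(g * h) / (2 * sum(a * h^2))
  eta + lr * s * h                       # damped update of the running prediction
}
```

The full example below fits a kernel-ridge representer for the ATE on simulated data: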
```r
set.seed(0)
n <- 1000
x <- runif(n)
pi <- 1 / (1 + exp(-(8 * x - 4)))
a <- rbinom(n, 1, pi)
df <- data.frame(a = as.numeric(a), x = x)

# Build the Python factory directly via reticulate so each round gets a
# fresh sklearn object.
sk_kr <- reticulate::import("sklearn.kernel_ridge", convert = FALSE)
factory <- reticulate::py_func(
  function() sk_kr$KernelRidge(alpha = 1.0, kernel = "rbf", gamma = 2.0)
)

booster <- RieszBooster$new(
  estimand = ATE(),
  backend = SklearnBackend(
    base_learner_factory = factory,
    n_estimators = 80L,
    learning_rate = 0.05,
    early_stopping_rounds = 10L,
    validation_fraction = 0.2
  )
)
booster$fit(df)
alpha_hat <- booster$predict(df)

true_alpha <- a / pi - (1 - a) / (1 - pi)
cat(sprintf(
  "corr = %.3f, RMSE = %.3f\n",
  cor(alpha_hat, true_alpha),
  sqrt(mean((alpha_hat - true_alpha)^2))
))
```
corr = 0.928, RMSE = 1.579
SklearnBackend is slower than XGBoostBackend (no parallel tree splits). Use it when the data is better fit by a non-tree learner (e.g. low-dimensional problems where kernel ridge dominates), or when xgboost is unavailable in your environment.
Tip: Skipping xgboost
SklearnBackend does not import xgboost. The library lazy-imports xgboost only when XGBoostBackend is used, so you can run rieszboost in xgboost-hostile environments (Alpine containers, builds without OpenMP) by passing backend = SklearnBackend(...) and never installing xgboost.