Forest backend (forestriesz)

forestriesz ships two random-forest backends. They differ in whether the user must make per-estimand modeling choices before fitting.

rieszreg exposes the two entry points as Backend.fit_augmented and MomentBackend.fit_rows respectively.

How they work

Augmentation-style (AugForestRieszRegressor)

For each original row \(i\), Estimand.augment emits the original observation \(Z_i\) and one evaluation point per linear-form term in \(m(\mu)(Z_i, Y_i)\). Each augmented row \(r\) carries an evaluation point \(z_r\), a weight \(D_r \in \{0, 1\}\) indicating whether \(z_r\) is the original observation, and a coefficient \(C_r\) collecting the trace coefficients at \(z_r\). The empirical Bregman-Riesz loss is

\[ \hat L(\alpha) \;=\; \frac{1}{n} \sum_{r=1}^{M} \big[\, D_r\, \tilde h(\alpha(z_r)) \;+\; C_r\, h'(\alpha(z_r)) \,\big], \]

where \(M\) is the total number of augmented rows.

See Estimands for the construction. Each tree partitions the feature space and stores the loss-optimal \(\alpha\) in every leaf \(\ell\):

\[ \hat\alpha_\ell \;=\; \arg\min_{\alpha} \sum_{r \in \ell} \big[\, D_r\, \tilde h(\alpha) \;+\; C_r\, h'(\alpha) \,\big]. \]

For SquaredLoss this has the closed form \(\hat\alpha_\ell = -\sum_{r \in \ell} C_r / \sum_{r \in \ell} D_r\); for KLLoss, BernoulliLoss, and BoundedSquaredLoss the per-leaf optimum has its own closed form (see Tree backend for the derivations). The forest averages the per-tree predictions of \(\hat\alpha\).
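
To make this concrete, here is a minimal NumPy sketch of the SquaredLoss case, reading \(\tilde h(\alpha) = \alpha^2/2\) and \(h'(\alpha) = \alpha\) (the reading consistent with the closed form above). The function names and array layout are illustrative, not the riesztree internals.

import numpy as np

def empirical_loss(alpha_at_z, D, C, n):
    # (1/n) * sum_r [ D_r * alpha(z_r)^2 / 2 + C_r * alpha(z_r) ]
    return float(np.sum(D * alpha_at_z**2 / 2 + C * alpha_at_z) / n)

def leaf_alpha(D, C):
    # Stationarity: sum_r (D_r * alpha + C_r) = 0  =>  alpha = -sum(C) / sum(D)
    return -C.sum() / D.sum()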

Because \(D_r\) and \(C_r\) already vary across augmented rows for every estimand — original observations have \(D_r = 1, C_r = 0\) and counterfactual evaluation points have \(D_r = 0, C_r \ne 0\) — the splitter learns from the full feature space directly. The user does not supply any basis functions.

Moment-style (ForestRieszRegressor)

The user supplies a list of functions \(\varphi = (\varphi_1, \dots, \varphi_p)\) of the data. Inside each leaf \(\ell\) the per-leaf parameter \(\theta_\ell \in \mathbb R^p\) solves the linear-moment equation

\[ \hat\theta_\ell \;=\; \big(\textstyle\sum_{i\in\ell} \varphi(Z_i)\,\varphi(Z_i)^{\top}\big)^{-1} \textstyle\sum_{i\in\ell} m(\varphi)(Z_i, Y_i), \]

and predictions are \(\hat\alpha(z) = \hat\theta_{\ell(z)} \cdot \varphi(z)\), where \(\ell(z)\) is the leaf containing \(z\). The forest is fit on the \(n\) original rows, and the MSE splitting criterion picks splits to minimize the sum of in-leaf residuals against this leaf optimum.
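
A minimal sketch of that per-leaf solve, assuming \(\varphi(Z_i)\) and \(m(\varphi)(Z_i, Y_i)\) have already been evaluated as arrays for the rows in one leaf; the ridge term mirrors the l2 knob documented below, and the function name is hypothetical.

import numpy as np

def leaf_theta(phi_vals, m_phi_vals, l2=0.01):
    # phi_vals:   (n_leaf, p) evaluations of phi(Z_i)
    # m_phi_vals: (n_leaf, p) evaluations of m(phi)(Z_i, Y_i)
    p = phi_vals.shape[1]
    gram = phi_vals.T @ phi_vals + l2 * np.eye(p)  # per-leaf Jacobian plus ridge
    return np.linalg.solve(gram, m_phi_vals.sum(axis=0))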

The list \(\varphi\) must be chosen per estimand. The default riesz_feature_fns="auto" resolves a basis for built-in estimands:

| Estimand | Default list | Effect |
| --- | --- | --- |
| ATE, ATT | [1{A=0}, 1{A=1}] | Forest splits on covariates only; the basis resolves treatment in each leaf. |
| TSM(level=v) | [1{A=v}] | Single-basis fit; intervals supported. |
| AdditiveShift, LocalShift, custom | none | Auto-resolution falls back to a constant function, which is degenerate, so the backend raises with a hint. Use AugForestRieszRegressor instead, or pass a custom list. |

For ATE the leaf solve recovers \(1/\hat P(A = a \mid \ell)\) in each leaf \(\ell\), i.e. the IPW representer.
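
To see why, write the ATE moment as \(m(g)(Z) = g(1, X) - g(0, X)\) and take the default basis \(\varphi = (\mathbb{1}\{A=0\}, \mathbb{1}\{A=1\})\). Inside a leaf \(\ell\) with arm counts \(n_{\ell,0}, n_{\ell,1}\) and \(n_\ell = n_{\ell,0} + n_{\ell,1}\), the Gram matrix is diagonal, so (ignoring the small l2 ridge)

\[ \hat\theta_\ell \;=\; \begin{pmatrix} n_{\ell,0} & 0 \\ 0 & n_{\ell,1} \end{pmatrix}^{-1} \begin{pmatrix} -n_\ell \\ n_\ell \end{pmatrix} \;=\; \begin{pmatrix} -1/\hat P(A=0 \mid \ell) \\ 1/\hat P(A=1 \mid \ell) \end{pmatrix}, \]

and hence \(\hat\alpha(z) = \mathbb{1}\{a=1\}/\hat P(A=1 \mid \ell) - \mathbb{1}\{a=0\}/\hat P(A=0 \mid \ell)\).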

How they differ

|  | AugForestRieszRegressor | ForestRieszRegressor |
| --- | --- | --- |
| Per-estimand setup | none; works on every estimand directly | user supplies riesz_feature_fns (auto-resolved for ATE, ATT, TSM) |
| Loss support | SquaredLoss, KLLoss, BernoulliLoss, BoundedSquaredLoss | SquaredLoss only |
| Confidence intervals | not in v1 | yes: predict_interval on single-basis fits with honest=True, inference=True |
| Splits on | full feature space | covariates only when the basis handles treatment |
| Backend protocol | Backend.fit_augmented | MomentBackend.fit_rows |
| When to pick | default for general use; required for shift-style or custom estimands | when you need CIs on ATE, ATT, or TSM |

Bregman losses

AugForestRieszRegressor accepts every loss rieszreg ships. The per-tree splitter dispatches to a per-loss leaf-loss formula (squared, KL, Bernoulli, bounded-squared) so splits are chosen against the loss the user actually supplied. The forest averages per-tree \(\hat\alpha\) predictions.

import numpy as np, pandas as pd
from forestriesz import AugForestRieszRegressor, TSM
from rieszreg import KLLoss

rng = np.random.default_rng(0)
n = 1500
x = rng.uniform(0, 1, n)
pi = 1 / (1 + np.exp(-(0.5 * x - 0.3)))
a = rng.binomial(1, pi).astype(float)
df_tsm = pd.DataFrame({"a": a, "x": x})

# KLLoss is matched to density-ratio estimands like TSM and IPSI;
# it requires non-negative C-coefficients in the augmented dataset.
est = AugForestRieszRegressor(
    estimand=TSM(level=1, treatment="a", covariates=("x",)),
    loss=KLLoss(),
    n_estimators=200,
    min_samples_leaf=10,
    random_state=0,
)
est.fit(df_tsm)
alpha_hat = est.predict(df_tsm)
print(f"alpha_hat range: [{alpha_hat.min():.3f}, {alpha_hat.max():.3f}]")
alpha_hat range: [0.000, 3.720]

KLLoss and BernoulliLoss reject ATE / ATT / shift-style data at fit time — they require non-negative \(C\)-coefficients, which difference-style estimands violate. BoundedSquaredLoss(lo, hi) accepts any signed coefficients and clips \(\hat\alpha\) into the interval.
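
As a sketch of the bounded option (assuming BoundedSquaredLoss is importable from rieszreg alongside KLLoss; the lo/hi values are illustrative):

from forestriesz import AugForestRieszRegressor, ATE
from rieszreg import BoundedSquaredLoss  # assumed exported next to KLLoss

# ATE emits signed C-coefficients, which BoundedSquaredLoss accepts;
# predictions are clipped into [lo, hi].
est_bounded = AugForestRieszRegressor(
    estimand=ATE(treatment="a", covariates=("x",)),
    loss=BoundedSquaredLoss(-10.0, 10.0),
    n_estimators=200,
    min_samples_leaf=10,
    random_state=0,
)
est_bounded.fit(df_tsm)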

Quickstart — ATE

import numpy as np, pandas as pd
from forestriesz import ForestRieszRegressor, ATE

rng = np.random.default_rng(0)
n = 1500
x = rng.uniform(0, 1, n)
pi = 1 / (1 + np.exp(-(-0.02*x - x**2 + 4*np.log(x + 0.3) + 1.5)))
a = rng.binomial(1, pi).astype(float)
df = pd.DataFrame({"a": a, "x": x})

fr = ForestRieszRegressor(
    estimand=ATE(treatment="a", covariates=("x",)),
    n_estimators=500,
    min_samples_leaf=10,
    random_state=0,
)
fr.fit(df)
alpha_hat  = fr.predict(df)
true_alpha = a / pi - (1 - a) / (1 - pi)
print(f"corr(α̂, α₀) = {np.corrcoef(alpha_hat, true_alpha)[0, 1]:.3f}")
corr(α̂, α₀) = 0.939
print(f"RMSE         = {np.sqrt(np.mean((alpha_hat - true_alpha)**2)):.3f}")
RMSE         = 0.880
The same ATE quickstart from R, via reticulate:

pkgload::load_all("../packages/forestriesz/r/forestriesz")
use_python_forestriesz(file.path(getwd(), "../.venv/bin/python"))

set.seed(0)
n  <- 1500
x  <- runif(n)
pi <- 1 / (1 + exp(-(-0.02 * x - x^2 + 4 * log(x + 0.3) + 1.5)))
a  <- rbinom(n, 1, pi)
df <- data.frame(a = as.numeric(a), x = x)

# ATE / ATT use a multi-basis list, which is Python-only in v1.
# Call into Python directly via reticulate.
fr_py <- reticulate::import("forestriesz", convert = FALSE)
pd_py <- reticulate::import("pandas", convert = FALSE)
py_df <- pd_py$DataFrame(reticulate::r_to_py(list(a = as.numeric(a), x = x)))

fr <- fr_py$ForestRieszRegressor(
  estimand = fr_py$ATE(treatment = "a", covariates = list("x")),
  n_estimators = 500L, min_samples_leaf = 10L, random_state = 0L
)
fr$fit(py_df)
alpha_hat <- as.numeric(reticulate::py_to_r(fr$predict(py_df)))

Quickstart — AdditiveShift with the augmentation-style backend

AdditiveShift has no canonical list of basis functions, so the moment-style backend would require the user to construct one. AugForestRieszRegressor works on it directly.

from forestriesz import AugForestRieszRegressor, AdditiveShift

# Continuous-treatment data
a_cont = rng.normal(0.5*x, 1.0)
df_cont = pd.DataFrame({"a": a_cont, "x": x})

afr = AugForestRieszRegressor(
    estimand=AdditiveShift(delta=0.5, treatment="a", covariates=("x",)),
    n_estimators=500,
    min_samples_leaf=10,
    random_state=0,
)
afr.fit(df_cont)
alpha_hat = afr.predict(df_cont)
print(f"alpha_hat range: [{alpha_hat.min():.3f}, {alpha_hat.max():.3f}]")
alpha_hat range: [-0.949, 3.695]

Quickstart — TSM with confidence intervals

TSM uses a single-basis fit, so v1’s predict_interval works.

from forestriesz import ForestRieszRegressor, TSM

fr = ForestRieszRegressor(
    estimand=TSM(level=1, treatment="a", covariates=("x",)),
    n_estimators=500,
    honest=True,
    inference=True,
    random_state=0,
)
fr.fit(df)
alpha_hat = fr.predict(df)
lb, ub    = fr.predict_interval(df, alpha=0.05)
print(f"mean 95% CI width = {(ub - lb).mean():.3f}")
mean 95% CI width = 0.518
print(f"mean coverage of α̂ in [lb, ub] = {((lb <= alpha_hat) & (alpha_hat <= ub)).mean():.2f}")
mean coverage of α̂ in [lb, ub] = 1.00
The same interval fit from R:

pkgload::load_all("../packages/forestriesz/r/forestriesz")
use_python_forestriesz(file.path(getwd(), "../.venv/bin/python"))

fr <- ForestRieszRegressor$new(
  estimand = TSM(level = 1L, treatment = "a", covariates = "x"),
  n_estimators = 500L,
  honest = TRUE,
  inference = TRUE,
  random_state = 0L
)
fr$fit(df)
alpha_hat <- fr$predict(df)
ci <- fr$predict_interval(df, alpha = 0.05)

Hyperparameters

AugForestRieszRegressor

Mirrors sklearn.ensemble.RandomForestRegressor:

| Knob | Default | Notes |
| --- | --- | --- |
| n_estimators | 100 | Number of trees. |
| max_depth | None | Grow until leaves saturate min_samples_leaf. |
| min_samples_split | 2 | Minimum count of original (\(D > 0\)) augmented rows in a node before considering a split. |
| min_samples_leaf | 1 | Minimum count of original rows in each child of a candidate split. |
| max_features | 1.0 | Per-split feature-subsampling rule. |
| max_leaf_nodes | None | Cap on per-tree leaf count. |
| min_impurity_decrease | 0.0 | Reject splits below this gain. |
| ccp_alpha | 0.0 | Cost-complexity pruning per tree. |
| bootstrap | True | Per-tree resampling at the original-row level (block bootstrap). |
| max_samples | None | Per-tree subsample size as a fraction or count of original rows. |
| n_jobs | None | Parallelism across trees via joblib.Parallel. |
| splitter | "exact" | "exact", "hist", "random", or "python": riesztree splitter dispatch. |
| max_bins | 255 | Histogram-splitter bin count. |
| categorical_features | None | Column indices treated as integer category labels. |

When splitter="hist" and the configuration is “simple” (no categoricals, no max_features subsampling, no ccp_alpha, no leaf-count cap, built-in loss), the bin mapper is fitted once on the full augmented training data and the binned matrix is shared across joblib workers — sklearn-HistGradientBoostingRegressor convention. The win is largest at shallow depths where per-tree binning dominates tree-build wall time (~2× speedup at max_depth=8).

ForestRieszRegressor

Forwarded to EconML’s BaseGRF:

| Knob | Default | Notes |
| --- | --- | --- |
| n_estimators | 100 | Number of trees. With inference=True, must be divisible by subforest_size (default 4). |
| max_depth | None | Grow until leaves saturate min_samples_leaf. |
| min_samples_leaf | 5 | Tighter leaves give lower bias, higher variance. Paper guidance: 5–20. |
| min_samples_split | 10 | |
| max_samples | 0.45 | Per-tree subsample fraction (GRF default). |
| honest | False | Enable for valid-coverage CIs (50/50 sample split). |
| inference | False | Enable for predict_interval; preserves subforest structure. |
| subforest_size | 4 | Trees per subforest when computing the inference variance. |
| l2 | 0.01 | Ridge added to the per-leaf Jacobian. |
| n_jobs | -1 | Parallel tree training. |
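
The inference-related knobs interact; a hedged example wiring them together (values illustrative):

fr_ci = ForestRieszRegressor(
    estimand=TSM(level=1, treatment="a", covariates=("x",)),
    n_estimators=500,   # divisible by subforest_size, required with inference=True
    subforest_size=4,
    honest=True,        # 50/50 sample split for valid-coverage CIs
    inference=True,     # preserves the subforest structure predict_interval needs
    max_samples=0.45,   # GRF-default per-tree subsample fraction
    random_state=0,
)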

Save / load

import tempfile, os
with tempfile.TemporaryDirectory() as tmp:
    save_dir = os.path.join(tmp, "fitted")
    fr.save(save_dir)

    loaded = ForestRieszRegressor.load(save_dir)
    pred_loaded = loaded.predict(df)
    print(f"max abs diff after round-trip: {np.max(np.abs(pred_loaded - alpha_hat)):.2e}")
max abs diff after round-trip: 3.55e-15

For built-in estimands the moment-style basis is auto-resolved on load. For custom bases, repass the callables: ForestRieszRegressor.load(path, riesz_feature_fns=my_basis). AugForestRieszRegressor round-trips without any extra arguments.
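
A sketch of the custom-basis round trip. The callable signature is an assumption here (each entry mapping the input data frame to a 1-d array), so check the Estimands docs for the exact contract; the path is a placeholder.

# Hypothetical custom basis for a binary treatment; signature assumed.
my_basis = [
    lambda d: (d["a"] == 0).to_numpy(dtype=float),
    lambda d: (d["a"] == 1).to_numpy(dtype=float),
]
loaded = ForestRieszRegressor.load("path/to/fitted", riesz_feature_fns=my_basis)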

Sklearn integration

Both regressors are sklearn BaseEstimators, so they compose with clone, GridSearchCV, cross_val_predict, and Pipeline. See Tuning and Cross-Fitting.
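
For instance, clone yields an unfitted copy with identical hyperparameters, and set_params follows the usual BaseEstimator contract:

from sklearn.base import clone

fr_fresh = clone(fr)                      # unfitted copy, same constructor params
fr_fresh.set_params(min_samples_leaf=20)  # standard sklearn get/set_params interface
fr_fresh.fit(df)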

When the constant basis raises (moment-style only)

For built-in estimands the per-row moment \(m(\varphi)(Z_i, Y_i)\) does not depend on \(Z_i\) when \(\varphi \equiv 1\) (for ATE, for example, \(m(1)(Z) = 1 - 1 = 0\) on every row), so the constant basis is degenerate: every leaf would predict the same constant. ForestRieszRegressor detects this and raises with a hint to use riesz_feature_fns="auto" (the default) or supply a custom list; the error fires only when riesz_feature_fns=None is passed explicitly. AugForestRieszRegressor is unaffected, since its splitter does not depend on a user-supplied basis.