An estimand is a scalar summary of the data-generating distribution that you want to estimate. The estimands rieszreg handles all share the form \[
\psi = \mathbb{E}\!\left[\,m(\mu)(Z, Y)\,\right]
\] where \(\mu(z) = \mathbb{E}[Y \mid Z = z]\) is the outcome regression and \(m\) is an operator — a rule that maps the function \(\mu\) to another function \(m(\mu)\) that can also depend on the row \((Z, Y)\). For the average treatment effect (ATE) with binary treatment \(A\), \(m(\mu)(Z, Y) = \mu(1, X) - \mu(0, X)\): here \(m\) ignores \(Y\) and produces a contrast of two evaluations of \(\mu\).
An estimand object packages two things: the column names \(\alpha\) is indexed by (feature_keys) and the operator itself (m). m is curried: it takes a candidate \(\alpha\) and returns a function of one row \((z, y)\). The five built-in factories (ATE, ATT, TSM, AdditiveShift, LocalShift) cover most causal-inference targets; section Custom estimands shows how to write your own.
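To make the curried shape concrete, here is a minimal sketch of what an ATE-style operator looks like, written against the alpha(...) keyword-call convention used in the custom-estimands section below; it is an illustration, not the built-in factory's internals, and it assumes a single covariate column named x.

def m_ate(alpha):
    # m(alpha) is itself a function of one row (z, y); the ATE operator ignores y
    def inner(z, y=None):
        # contrast of two point evaluations of alpha at treatment levels 1 and 0
        return alpha(a=1, x=z["x"]) - alpha(a=0, x=z["x"])
    return inner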
The factories live in rieszreg; rieszboost, krrr, forestriesz, and riesznet all re-export them so from rieszboost import ATE, from krrr import ATE, from forestriesz import ATE, and from riesznet import ATE resolve to the same object.
Quickstart: ATE on the Lalonde NSW data
The NSW (National Supported Work) demonstration is the canonical Lalonde dataset: a job-training experiment for disadvantaged workers, plus a CPS observational comparison group. The goal here is the average treatment effect on 1978 earnings.
The estimand says exactly which dataframe columns feed \(\alpha\): the treatment column and the eight covariates, in the order they appear in feature_keys.
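The cell that constructs the estimand is sketched below. The column names follow the usual Lalonde NSW variable names and are assumptions here, as is the keyword signature (it mirrors the TSM call shown just after); substitute whatever names your dataframe actually uses.

from rieszreg import ATE

# Assumed Lalonde NSW column names; adjust to your dataframe.
est = ATE(
    treatment="treat",
    covariates=("age", "education", "black", "hispanic",
                "married", "nodegree", "re74", "re75"),
)
est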
<rieszreg.estimands.base.ATE object at 0x7fa48dcfacf0>
signature: (alpha)
TSM(level=1, treatment="a", covariates="x")
<rieszreg.estimands.base.TSM object at 0x7fa48dd68f20>
signature: (alpha)
LocalShift(delta=0.3, threshold=0.7)
<rieszreg.estimands.base.LocalShift object at 0x7fa48dd69190>
signature: (alpha)
Mapping your dataframe to an estimand
An estimand object carries two pieces: the column names \(\alpha\) is indexed by (feature_keys) and the operator m(alpha)(z, y).
feature_keys declares the input schema of \(\alpha\) — the columns the underlying learner consumes when scoring a row. The orchestrator slices these columns out of Z per row to build the feature matrix the learner sees.
The outcome Y is not part of feature_keys. It flows in the sklearn-style way: a separate vector passed as the second argument to .fit(Z, y). The orchestrator threads it through to m(alpha)(z, y). Built-in estimands ignore y; pass it anyway when following sklearn convention.
For the Lalonde quickstart above, the wiring looks like this:
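(A sketch rather than a verbatim cell: nsw stands in for the NSW dataframe, est is the ATE estimand from the quickstart, and the backend class, its estimand= keyword, and the outcome column name re78 are assumptions.)

from rieszboost import RieszBooster   # any backend's estimator follows the same pattern

Z = nsw[list(est.feature_keys)]       # treatment + the 8 covariates, in feature_keys order
y = nsw["re78"]                       # 1978 earnings, passed separately sklearn-style

model = RieszBooster(estimand=est)    # the estimand= keyword is an assumption
model.fit(Z, y)                       # y is threaded through to m(alpha)(z, y)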
The Z dataframe must contain every name in feature_keys; extra columns are ignored. With an ndarray, columns are interpreted in feature_keys order.
Theory: how the augmented dataset is derived
This section explains the math that lets rieszreg evaluate the Riesz loss without ever materializing \(\alpha_0\). Skip on a first read.
While the Riesz representer theorem applies to any continuous linear functional, the estimands rieszreg supports are those whose \(m\) reduces to a finite linear combination of point evaluations of \(\mu\). For example, the ATE is \(\mathbb{E}[\mu(1, X) - \mu(0, X)]\), which reduces to two point evaluations \(\mu(1, x)\) and \(\mu(0, x)\) for each row’s covariates \(x\). We call these finite-evaluation estimands.
Formally, an estimand \(\psi = \mathbb{E}[m(\mu)(Z, Y)]\) is finite-evaluation if there exist fixed functions \(a: \mathcal Z \times \mathcal Y \to \mathcal Z^k\) (the point generator) and \(c: \mathcal Z \times \mathcal Y \to \mathbb R^k\) (the weight function) such that \[
m(\mu)(z, y) = c(z, y)^\top \vec\mu(a(z, y)),
\] where \(\vec\mu\) denotes elementwise application of \(\mu\) to the vector-valued input. For ATE, \(k = 2\), \(a(z, y) = ((1, z_x), (0, z_x))\), and \(c(z, y) = (1, -1)\).
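As an illustration only (not library internals), the ATE instance of this decomposition in code, again assuming a single covariate column x:

def a_ate(z, y=None):
    # point generator: k = 2 evaluation points sharing the row's covariates
    return ((1, z["x"]), (0, z["x"]))

def c_ate(z, y=None):
    # weight function: fixed coefficients, independent of the row
    return (1.0, -1.0)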
For these estimands the empirical Bregman-Riesz loss simplifies. Define the \(k\) pseudo-datapoints \(Z_{ij} = a_j(Z_i, Y_i)\) with coefficients \(C_{ij} = c_j(Z_i, Y_i)\) for each row \(i\). Then \[
\begin{aligned}
\mathcal L_{h,n}(\alpha)
&= \sum_{i=1}^n \left[\tilde h(\alpha(Z_i)) - m(h' \circ \alpha)(Z_i, Y_i)\right] \\
&= \sum_{i=1}^n \left[\tilde h(\alpha(Z_i)) - \sum_{j=1}^k C_{ij} \cdot h'(\alpha(Z_{ij}))\right].
\end{aligned}
\]
The loss depends on \(\alpha\) only through its values at \(\{Z_i\} \cup \{Z_{ij}\}\), at most \(n(k+1)\) points. The coefficients \(C_{ij}\) depend on data but not on \(\alpha\), so they can be computed once as a preprocessing step.
The augmented dataset
The cleanest way to keep track of the loss, derivatives, etc. during training is to materialize an augmented dataset of the points \(\{Z_i\} \cup \{Z_{ij}\}\) along with the appropriate coefficients \(C_{ij}\).
Some pseudo-datapoints coincide with the original observation. For ATE, \(Z_{i1} = (1, x_i)\) and \(Z_{i2} = (0, x_i)\). A treated unit (\(a_i = 1\)) has \(Z_{i1} = Z_i\), so two of the three evaluation points are the same. Let \(D_{ij} = \mathbf{1}[Z_{ij} = Z_i]\) indicate overlap. Splitting the \(h'\) sum into overlapping and non-overlapping terms: \[
\mathcal L_{h,n}(\alpha) = \sum_{i=1}^n \left[\tilde h(\alpha(Z_i)) - \sum_{j:\, D_{ij}=1} C_{ij}\, h'(\alpha(Z_i))\right] - \sum_{i=1}^n \sum_{j:\, D_{ij}=0} C_{ij}\, h'(\alpha(Z_{ij})).
\]
At each distinct evaluation point, the loss combines a \(\tilde h\)-weight and an \(h'\)-weight into one term. Estimand.augment collects these distinct points for each row and stores their coefficients. Let \(r\) index an enumeration of the augmented data points. Each augmented row \(r\) corresponds to either a \(Z_{ij}\) or a \(Z_i\) for some original row \(i\); write \(i_r\) for that origin row. The two weights are \[
D_r = \mathbf{1}[Z_r = Z_{i_r}],
\qquad
C_r = -\sum_{j \,:\, Z_{i_r, j} = Z_r} C_{i_r, j}.
\]
The \(\tilde h\)-weight \(D_r\) is \(1\) exactly when \(Z_r\) is the original observation; the \(h'\)-weight \(C_r\) collects \(-C_{ij}\) from every pseudo-datapoint of row \(i_r\) that lands at \(Z_r\), and is \(0\) when \(Z_r\) is the original observation and no pseudo-datapoint coincides with it. The contribution of augmented row \(r\) to the empirical loss is \(L_r = D_r\, \tilde h(\alpha(Z_r)) + C_r\, h'(\alpha(Z_r))\), and \(\mathcal L_{h,n}(\alpha) = \sum_r L_r\).
For ATE with an untreated unit (\(a_i = 0\)): before merging there are three evaluation points — \(Z_i = (0, x_i)\) and pseudo-datapoints \((1, x_i)\), \((0, x_i)\). The pseudo-datapoint \((0, x_i)\) coincides with \(Z_i\) (\(D_{i2} = 1\)), so they merge into one entry with \(\tilde h\)-weight \(1\) and \(h'\)-weight \(-C_{i2} = +1\). The other pseudo-datapoint \((1, x_i)\) stays separate with \(\tilde h\)-weight \(0\) and \(h'\)-weight \(-C_{i1} = -1\). The result is two augmented rows instead of three. A treated unit merges the other pseudo-datapoint instead, also producing two rows.
It can also happen that none of the pseudo-datapoints coincide with \(Z_i\) — the union \(\{Z_i\} \cup \{Z_{ij}\}\) then has \(k+1\) distinct entries and no merging occurs. For example, TSM(level=1) on an untreated row (\(a_i = 0\)) has \(Z_i = (0, x_i)\) and a single pseudo-datapoint \((1, x_i)\); they are distinct, so \(Z_i\) enters the augmented dataset as its own row with \(\tilde h\)-weight \(1\) and \(h'\)-weight \(0\) (no pseudo-datapoint lands there), and \((1, x_i)\) enters as a separate row with \(\tilde h\)-weight \(0\) and \(h'\)-weight \(-C_{i1} = -1\). The general rule is the same in every case: include \(Z_i\) once, then add each pseudo-datapoint \(Z_{ij}\) — merging into an existing entry when \(Z_{ij}\) matches a point already in the dataset for row \(i\), otherwise as a new row.
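As a pure-pandas illustration of this rule for the binary-treatment ATE (not the library's implementation), the merging can be written in a few lines; applied to the three-row example that follows, it reproduces the six-row table worked out below.

import pandas as pd

def augment_ate_by_hand(df, treatment="a", covariate="x"):
    # ATE pseudo-datapoint coefficients: C_{i1} = +1 at level 1, C_{i2} = -1 at level 0
    c = {1: 1.0, 0: -1.0}
    rows = []
    for i, row in df.reset_index(drop=True).iterrows():
        a, x = int(row[treatment]), row[covariate]
        # the pseudo-datapoint at the observed level merges with the original
        # observation: tilde-h weight D = 1, h'-weight C = -C_ij
        rows.append({"i": i + 1, "A": a, "X": x, "D": 1, "C": -c[a]})
        # the counterfactual level stays as its own row: D = 0
        rows.append({"i": i + 1, "A": 1 - a, "X": x, "D": 0, "C": -c[1 - a]})
    return pd.DataFrame(rows)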
To make this concrete, take the ATE estimand on \(n=3\) rows:
\(i\)    \(A_i\)    \(X_i\)
1        0          0.2
2        1          0.5
3        1          0.8
For each row, the three evaluation points \(\{Z_i,\, Z_{i1} = (1, X_i),\, Z_{i2} = (0, X_i)\}\) collapse to two distinct entries after merging. The augmented dataset has \(2n = 6\) rows; each row \(r\) carries the evaluation point \((A_r, X_r)\), the \(\tilde h\)-weight \(D_r\), the \(h'\)-weight \(C_r\), and contributes a term \(L_r = D_r \cdot \tilde h(\alpha(A_r, X_r)) + C_r \cdot h'(\alpha(A_r, X_r))\) to the empirical loss:
from \(i\)    \(A_r\)    \(X_r\)    \(D_r\)    \(C_r\)    \(L_r\)
1             0          0.2        \(1\)      \(+1\)     \(\tilde h(\alpha(0, 0.2)) + h'(\alpha(0, 0.2))\)
1             1          0.2        \(0\)      \(-1\)     \(-h'(\alpha(1, 0.2))\)
2             1          0.5        \(1\)      \(-1\)     \(\tilde h(\alpha(1, 0.5)) - h'(\alpha(1, 0.5))\)
2             0          0.5        \(0\)      \(+1\)     \(+h'(\alpha(0, 0.5))\)
3             1          0.8        \(1\)      \(-1\)     \(\tilde h(\alpha(1, 0.8)) - h'(\alpha(1, 0.8))\)
3             0          0.8        \(0\)      \(+1\)     \(+h'(\alpha(0, 0.8))\)
Rows \(i=2\) and \(i=3\) are treated, so the merging happens on the \((1, X_i)\) pseudo-datapoint instead of \((0, X_i)\), flipping which row carries the \(\tilde h\)-weight. Summing the \(L_r\) column recovers the empirical loss: \[
\mathcal L_{h,n}(\alpha) = \sum_{r=1}^{6} L_r = \sum_{i=1}^{3} \big[\tilde h(\alpha(Z_i)) - h'(\alpha(1, X_i)) + h'(\alpha(0, X_i))\big].
\]
Estimand.augment produces this table directly: it stores the two coefficients as is_original (the \(\tilde h\)-weight \(D_r\)) and potential_deriv_coef (the \(h'\)-weight \(C_r\)) on the returned AugmentedDataset.
import numpy as np
import pandas as pd
from rieszreg import ATE

df = pd.DataFrame({"a": [0, 1, 1], "x": [0.2, 0.5, 0.8]})
df
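The augmentation cell itself is not reproduced here. A minimal sketch of the missing step, assuming ATE takes the same treatment=/covariates= keywords as TSM above and that the AugmentedDataset returned by Estimand.augment can be viewed as the dataframe aug_df used below (to_frame() is a hypothetical accessor name):

est = ATE(treatment="a", covariates="x")   # assumed keywords, mirroring TSM above
aug = est.augment(df)                      # AugmentedDataset carrying is_original / potential_deriv_coef
aug_df = aug.to_frame()                    # hypothetical accessor: a dataframe view of the augmented points
aug_df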
i A X D C
0 1 1.0 0.2 0.0 -1.0
1 2 1.0 0.5 1.0 -1.0
2 3 1.0 0.8 1.0 -1.0
3 1 0.0 0.2 1.0 1.0
4 2 0.0 0.5 0.0 1.0
5 3 0.0 0.8 0.0 1.0
Pick a candidate \(\alpha(a, x) = a + x\) and the squared loss (\(\tilde h(t) = t^2\), \(h'(t) = 2t\)). The per-row contribution becomes \(L_r = D \cdot \alpha^2 + 2 C \cdot \alpha\):
F = aug_df["A"] + aug_df["X"]  # α evaluated at each augmented point
aug_df["L_r"] = aug.is_original * F**2 + 2.0 * aug.potential_deriv_coef * F
aug_df
i A X D C L_r
0 1 1.0 0.2 0.0 -1.0 -2.40
1 2 1.0 0.5 1.0 -1.0 -0.75
2 3 1.0 0.8 1.0 -1.0 -0.36
3 1 0.0 0.2 1.0 1.0 0.44
4 2 0.0 0.5 0.0 1.0 1.00
5 3 0.0 0.8 0.0 1.0 1.60
aug_df["L_r"].sum() # empirical loss
np.float64(-0.46999999999999975)
The same number drops out of the original row-wise form \(\sum_i [\alpha(Z_i)^2 - 2\alpha(1, X_i) + 2\alpha(0, X_i)]\):
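(The verifying cell is sketched here; it uses only df and the candidate α defined above, nothing from the library.)

alpha = lambda a, x: a + x                 # same candidate as above
rowwise = sum(
    alpha(a, x) ** 2 - 2 * alpha(1, x) + 2 * alpha(0, x)
    for a, x in zip(df["a"], df["x"])
)
rowwise                                    # -0.47 up to float rounding, matching aug_df["L_r"].sum()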
Custom estimands
To fit a finite-evaluation functional not covered by the built-ins, write the operator m(alpha)(z, y) and wrap it in a FiniteEvalEstimand. The tracer extracts the linear-form structure automatically.
m must be a finite linear combination of point evaluations of \(\alpha\), built with +, -, scalar *, and calls to alpha(...). Squaring, adding constants, integrating, and other operations outside this algebra raise a TypeError.
Note: Python-only
Custom m() is Python-only. From R, define m in Python via reticulate::py_run_string and pass the resulting FiniteEvalEstimand to your estimator. See the R interface page for the recipe.
from rieszreg import FiniteEvalEstimand
from rieszboost import RieszBooster
import numpy as np, pandas as pd

# A weighted contrast: 2x the ATE at level a=2 minus the level at a=1.
def m_weighted_contrast(alpha):
    def inner(z, y=None):
        return 2.0 * (alpha(a=2, x=z["x"]) - alpha(a=0, x=z["x"])) \
            - alpha(a=1, x=z["x"])
    return inner

est = FiniteEvalEstimand(
    feature_keys=("a", "x"),
    m=m_weighted_contrast,
    name="WeightedContrast",
)
print(est)
<rieszreg.estimands.base.FiniteEvalEstimand object at 0x7fa48dd5b320>
The inner function takes (z, y) even when y is unused — that signature is the contract every custom m follows. Pass the estimand to any backend’s estimator class:
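(A sketch; the estimand= keyword and the commented fit call are assumptions, and est is the WeightedContrast estimand defined above.)

booster = RieszBooster(estimand=est)   # assumed constructor keyword
# booster.fit(Z, y) then trains alpha for the custom target, sklearn-style

The tracer's guard-rail is easy to trip. A hypothetical m that adds a constant offset falls outside the linear-form algebra and produces the error shown next:

def m_with_offset(alpha):
    def inner(z, y=None):
        return alpha(a=1, x=z["x"]) + 1.0   # constant offset: not a linear form in alpha
    return inner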
TypeError: Cannot add a scalar to a LinearForm — m must be linear in alpha (no constant offsets).
For estimands whose \(m\) involves integrals or derivatives of \(\alpha\) (outside the finite-linear-form algebra), approximate them as a finite sum of point evaluations — Monte-Carlo samples for integrals, finite differences for derivatives; see the taxonomy section below.
Initialization
Every RieszEstimator subclass takes an init= constructor arg that controls the α-space starting value of the model. The orchestrator converts init to an η-space base_score via loss.alpha_to_eta; backends use base_score as a prediction offset.
The default — init=None — sets α to the empirical loss-minimizing constant. For any Bregman loss with strictly convex \(h\), the constant \(a^*\) that minimizes the per-row Riesz loss \[
P_n L(a) \;=\; \tilde h(a) \;-\; h'(a) \cdot \bar m, \qquad \bar m = P_n[m(\alpha = 1)(Z, Y)]
\] satisfies the FOC \(\tilde h'(a) = h''(a) \bar m\). Using \(\tilde h(t) = t h'(t) - h(t)\) we get \(\tilde h'(t) = t h''(t)\), so the FOC collapses to \(a^* = \bar m\). The result is loss-independent — the only loss-specific step is projecting \(\bar m\) into the loss’s α-domain when its codomain is restricted. Each Loss exposes best_constant_init(m_bar) for that projection.
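For instance, with the squared loss used in the worked example above (\(\tilde h(t) = t^2\), \(h'(t) = 2t\)), \[
P_n L(a) = a^2 - 2a\,\bar m,
\] which is minimized at \(a^* = \bar m\), as claimed.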
For the built-in estimands \(\bar m\) is structurally constant (doesn’t depend on the data realization):
Estimand                                \(m(\alpha = 1)(Z, Y)\) per row    \(\bar m\)
ATE, ATT, AdditiveShift, LocalShift     \(1 + (-1) = 0\)                   \(0\)
TSM(v)                                  \(1\)                              \(1\)
For custom estimands \(\bar m\) can depend on the data; the orchestrator computes it from the training rows after the validation split (no peeking) by tracing each row and averaging the sum of trace coefficients.
For BernoulliLoss + TSM (or any density-ratio estimand with \(\bar m = 1\)) the loss-minimizing constant sits on the upper boundary, and the projection clips to \(1 - \varepsilon\). The sigmoid is saturated at that init and gradients are tiny, so iterative training may stall. Pass init=0.5 (or another interior value) when this matters. The KLLoss / BoundedSquaredLoss boundaries can hit the same issue when \(\bar m\) lands exactly on a bound.
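(A sketch of the override; init= is documented above, while the estimand= keyword and the TSM arguments are carried over from earlier examples and are assumptions about the constructor.)

from rieszboost import RieszBooster
from rieszreg import TSM

model = RieszBooster(
    estimand=TSM(level=1, treatment="a", covariates="x"),
    init=0.5,   # interior starting value, so the sigmoid is not saturated at initialization
)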
Estimand subclass taxonomy and future work
Today every estimand rieszreg handles is a FiniteEvalEstimand: \(m\) reduces to a finite linear combination of point evaluations of \(\mu\). The taxonomy of \(m\)-operators outside this class — integrals, derivatives, and others — is still being designed. Likely subclasses:
Integrals. \(m(\mu)(z, y) = \int \mu(a', x)\,h(a' \mid z)\,da'\) for a known density \(h\). The classic example is a stochastic-intervention policy. Approximate via Monte Carlo to reduce to the finite-evaluation case (with \(K\) samples per row; see the sketch after this list), or support natively via a backend that integrates \(\alpha\) analytically.
Derivatives. \(m(\mu)(z, y) = \partial_a \mu(a, x)\big|_{a = z_a}\). Finite-difference approximations reduce to the finite-evaluation case; backends with autodiff can support it directly.
Other linear operators on \(\mu\). Any \(m\) that satisfies linearity in \(\mu\) (additivity and scalar homogeneity) but doesn't fit the patterns above.
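As referenced in the integrals item above, here is a sketch of the Monte-Carlo reduction for a stochastic-intervention-style integral. The policy density, the number of draws K, and the keyword-call convention for alpha are illustrative assumptions; the point is that an equal-weight finite sum stays inside the +, scalar *, alpha(...) algebra the tracer accepts.

import numpy as np
from rieszreg import FiniteEvalEstimand

K = 25
rng = np.random.default_rng(0)

def m_mc_policy(alpha):
    # approximate \int alpha(a', x) h(a' | z) da' by an equal-weight finite sum
    def inner(z, y=None):
        draws = rng.normal(loc=z["a"], scale=0.1, size=K)  # assumed policy density h(a' | z)
        terms = [alpha(a=float(a_prime), x=z["x"]) for a_prime in draws]
        acc = terms[0]
        for t in terms[1:]:
            acc = acc + t          # only +, scalar *, and alpha(...) calls are used
        return (1.0 / K) * acc
    return inner

est = FiniteEvalEstimand(feature_keys=("a", "x"), m=m_mc_policy, name="MCPolicy")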
A previously released StochasticIntervention factory is currently stubbed (raises NotImplementedError). A reintroduction will land once the integral-vs-finite-evaluation distinction is wired into the type hierarchy.