A Loss describes the Bregman-Riesz loss the learner minimizes. rieszreg ships four built-in losses (SquaredLoss, KLLoss, BernoulliLoss, BoundedSquaredLoss) and a Loss base class for defining your own.
Bregman-Riesz Losses
Hines & Miles (2025) show that for a convex, differentiable function \(h: \mathbb R \to \mathbb R\) (called \(F\) in their paper), minimizing the average Bregman divergence between \(\alpha\) and \(\alpha_0\) with potential \(h\) is equivalent to minimizing a population loss that does not involve the true Riesz representer \(\alpha_0\):

\[
\mathcal L(\alpha) = \mathbb E\big[\tilde h(\alpha(W)) - m(W;\, h' \circ \alpha)\big],
\]

where \(\tilde h(t) = t h'(t) - h(t)\) and \(m\) is the moment functional that defines the estimand. Kato (2026) and Hines & Miles both give intuition for why this loss makes sense and which choices of \(h\) are sensible in different settings. When \(h(t) = t^2\) this reduces to the standard Riesz loss of Chernozhukov, Newey, Quintas-Martínez, and Syrgkanis (2021). rieszreg implements several Bregman-Riesz losses and also allows users to define their own by supplying the function \(h\).
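To see the reduction (under the loss form written above), take \(h(t) = t^2\): then \(\tilde h(t) = t \cdot 2t - t^2 = t^2\) and \(h' \circ \alpha = 2\alpha\), so

\[
\mathcal L(\alpha) = \mathbb E\big[\alpha(W)^2 - 2\, m(W; \alpha)\big],
\]

which is exactly the Chernozhukov et al. (2021) Riesz loss.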
Loss table
The “Potential \(h\)” column is \(h\) in α-space — the function the user passes as potential= (or overrides). At fit time the learner produces an unconstrained \(\eta\), the link maps \(\eta\) to \(\alpha = g(\eta)\) in the domain, and \(h\) is evaluated at \(\alpha\). The “Effective potential \(h \circ g\)” column is what actually gets evaluated in \(\eta\)-space — useful when comparing losses to gradient-boosting objectives stated in \(\eta\).
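As a concrete sketch of the two columns (assuming a top-level `rieszreg` import path; the KL potential \(h(t) = t \log t\) and exp link are as described in the Links section below):

```python
import numpy as np
from rieszreg import KLLoss  # assumed import path

loss = KLLoss()

# The learner works in unconstrained eta-space; the link maps eta into alpha-space.
eta = np.array([-1.0, 0.0, 1.0])
alpha = loss.link_to_alpha(eta)  # alpha = exp(eta), so alpha lies in (0, inf)

# h is evaluated at alpha; composing with the link gives the effective potential
# in eta-space: (h o g)(eta) = exp(eta) * log(exp(eta)) = eta * exp(eta)
np.allclose(loss.potential(alpha), eta * np.exp(eta))  # expected: True
```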
```r
library(rieszreg)

set.seed(0)
n <- 500
x <- runif(n)
pi <- 1 / (1 + exp(-(4 * x - 2)))
a <- rbinom(n, 1, pi)
df <- data.frame(a = as.numeric(a), x = x)

# Squared loss (default)
RieszBooster$new(estimand = ATE(), n_estimators = 100L)$fit(df)$predict(df)[1:3]
#> [1] 1.169958 -1.336773 1.965481
```
```r
# KL loss for a density-ratio estimand
RieszBooster$new(
  estimand = TSM(level = 1),
  loss = KLLoss(),
  n_estimators = 100L
)$fit(df)$predict(df)[1:3]
#> [1] 1.1560670 0.2848236 1.8298105
```
```r
# Bounded squared with prior bounds
RieszBooster$new(
  estimand = ATE(),
  loss = BoundedSquaredLoss(lo = -5.0, hi = 5.0),
  n_estimators = 100L
)$fit(df)$predict(df)[1:3]
#> [1] 1.187630 -1.338935 2.180158
```
Custom losses
Note: Custom losses are Python-only. From R, define the loss in Python via reticulate::py_run_string and pass the instance via loss=.
Define your own loss the same way the built-ins are defined: subclass Loss, override potential(alpha) (the math symbol \(h\)), potential_deriv(alpha) (\(h'\)), and (if the loss has a non-trivial domain) the link methods. The example below is a pseudo-Huber potential \(h(\alpha) = \sqrt{1 + \alpha^2} - 1\) with the identity link — strictly convex, bounded gradient, no domain restriction.
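A minimal subclass sketch along those lines (the top-level rieszreg import path is an assumption; adjust it to wherever Loss actually lives):

```python
import numpy as np
from rieszreg import Loss  # assumed import path

class PseudoHuberLoss(Loss):
    """Pseudo-Huber potential h(alpha) = sqrt(1 + alpha^2) - 1, identity link."""

    def potential(self, alpha):
        # h: strictly convex and smooth, with no domain restriction
        return np.sqrt(1.0 + alpha**2) - 1.0

    def potential_deriv(self, alpha):
        # h'(alpha) = alpha / sqrt(1 + alpha^2), bounded in (-1, 1)
        return alpha / np.sqrt(1.0 + alpha**2)

    # The identity link is the default, so no link methods are overridden here.
```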
For a quick experiment without writing a class, instantiate Loss directly with the potential and link as constructor arguments:
```python
import numpy as np

loss = Loss(
    potential=lambda a: np.sqrt(1.0 + a**2) - 1.0,
    potential_deriv=lambda a: a / np.sqrt(1.0 + a**2),
    link="identity",
)
```
If you skip potential_deriv, the framework will try to derive it via jax.grad when potential is written using jax.numpy ops. If jax isn’t available or your potential uses stock numpy, it falls back to a central-difference numerical derivative. Subclassing with an analytic potential_deriv is fastest; jax autograd is the next-best option.
```python
import numpy as np
import jax.numpy as jnp

# potential_deriv resolved automatically by jax.grad
loss = Loss(potential=lambda a: jnp.sqrt(1.0 + a**2) - 1.0, link="identity")
loss.potential_deriv(np.array([0.5, 1.0, 2.0]))
```
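And the numerical fallback described above: with stock numpy ops and no potential_deriv, a central-difference derivative is used instead (a sketch; the values shown are the exact \(a/\sqrt{1+a^2}\) up to finite-difference error):

```python
import numpy as np
from rieszreg import Loss  # assumed import path

# Stock numpy ops and no potential_deriv: jax.grad cannot trace this, so the
# framework falls back to a central-difference numerical derivative.
loss = Loss(potential=lambda a: np.sqrt(1.0 + a**2) - 1.0, link="identity")
loss.potential_deriv(np.array([0.5, 1.0, 2.0]))  # approx [0.447, 0.707, 0.894]
```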
A Loss built from the inline constructor or a one-off subclass is not save/load round-trippable: the user’s potential callable doesn’t survive the JSON metadata path. Pass the loss explicitly when reloading the estimator. The four built-ins do round-trip.
Links

A link \(g: \mathbb R \to \mathcal D \subset \mathbb R\) is a differentiable, strictly monotone bijection from the real line onto a target domain. The learner produces an unconstrained \(\eta\); the link maps \(\eta\) to \(\alpha = g(\eta)\) in the loss’s natural domain. Loss exposes the link via link_to_alpha(eta) and its inverse alpha_to_eta(alpha).
A link is mathematically redundant in principle — the composed potential \(h \circ g\) is itself convex and differentiable, so any loss with a non-trivial link can be re-expressed with the identity link by absorbing \(g\) into \(h\). The link exists for convenience: it lets you write \(h\) in its natural form (e.g. \(h(t) = t \log t\) for KL, defined for \(t > 0\)) and then constrain \(\hat\alpha\) to that domain at predict time without re-deriving anything.
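As an illustration of that equivalence, here is the KL-style potential written both ways with the inline constructor (a sketch using only the constructor arguments documented above; the rieszreg import path is an assumption):

```python
import numpy as np
from rieszreg import Loss  # assumed import path

# h(t) = t * log(t) on (0, inf), reached through the "exp" link ...
kl_natural = Loss(
    potential=lambda t: t * np.log(t),
    potential_deriv=lambda t: np.log(t) + 1.0,
    link="exp",
)

# ... is the same objective as the composed potential (h o g)(eta) = eta * exp(eta)
# with the identity link. The difference is purely practical: with the identity
# link, predictions are no longer constrained to (0, inf) at predict time.
kl_absorbed = Loss(
    potential=lambda e: e * np.exp(e),
    potential_deriv=lambda e: (e + 1.0) * np.exp(e),
    link="identity",
)
```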
The inline Loss(...) constructor accepts three named links:
| link | \(\alpha = g(\eta)\) | \(\alpha\) domain |
|---|---|---|
| "identity" (default) | \(\eta\) | \(\mathbb R\) |
| "exp" | \(\exp(\eta)\) | \((0, \infty)\) |
| "sigmoid" | \(1 / (1 + \exp(-\eta))\) | \((0, 1)\) |
Subclasses can implement any link — just override link_to_alpha and alpha_to_eta. BoundedSquaredLoss does this with a sigmoid-scaled link \(\alpha = \mathrm{lo} + (\mathrm{hi} - \mathrm{lo}) \cdot \sigma(\eta)\) to keep predictions in a user-specified box.
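A sketch of such a link override (the class name is hypothetical, the potential is illustrative, and the bounds are hard-coded to avoid assuming anything about the base-class constructor):

```python
import numpy as np
from rieszreg import Loss  # assumed import path

class BoxedSquaredLoss(Loss):
    """Squared potential with a sigmoid-scaled link keeping alpha in (lo, hi)."""

    lo, hi = -5.0, 5.0  # fixed bounds for this sketch

    def potential(self, alpha):
        return alpha**2

    def potential_deriv(self, alpha):
        return 2.0 * alpha

    def link_to_alpha(self, eta):
        # alpha = lo + (hi - lo) * sigmoid(eta), so alpha stays in (lo, hi)
        return self.lo + (self.hi - self.lo) / (1.0 + np.exp(-eta))

    def alpha_to_eta(self, alpha):
        # inverse: eta = logit((alpha - lo) / (hi - lo))
        p = (alpha - self.lo) / (self.hi - self.lo)
        return np.log(p) - np.log1p(-p)
```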
Backend support

- Kernel ridge (krrr): SquaredLoss only; non-quadratic losses need a Newton iteration on the kernel system.
- Random forest, augmentation-style (forestriesz.AugForestRieszRegressor): all four losses. Tree structure comes from the squared-error (MSE) criterion; a per-leaf Newton step then replaces each leaf’s stored θ with the Bregman optimum.
- Random forest, moment-style (forestriesz.ForestRieszRegressor): SquaredLoss only.
- Neural network (riesznet): all four losses. Each loss’s per-row Bregman form is implemented in autograd-friendly torch ops; gradient parity against the analytic loss is checked at 1e-8 tolerance.
Pass an unsupported loss and the backend raises at fit time with a clear error.