krrr provides KernelRidgeBackend and the KernelRieszRegressor convenience class. It implements Singh's Kernel Ridge Riesz Representers (arXiv:2102.11076) for the full set of estimands by piping rieszreg's augmentation engine into a closed-form kernel solve.
The augmented dataset gives per-row coefficients \(a_k\) and \(b_k\). The squared Riesz loss with kernel ridge regularization is

\[
\hat\gamma \;=\; \arg\min_{\gamma \in \mathcal{H}} \;\sum_k \Big( a_k\,\gamma(x_k)^2 \;-\; 2\,b_k\,\gamma(x_k) \Big) \;+\; \lambda\,\|\gamma\|_{\mathcal{H}}^2 .
\]

By the representer theorem \(\gamma(\cdot) = \sum_k \alpha_k K(\cdot, x_k)\), and the first-order condition reduces to the linear system \((A K + \lambda I)\,\alpha = b\) with \(A = \operatorname{diag}(a)\) and \(K\) the kernel matrix on the augmented points.
Partitioning into \(o = \{k : a_k > 0\}\) (original rows) and \(c = \{k : a_k = 0\}\) (counterfactual evaluation points): \(\alpha_c = b_c / \lambda\) is closed form, and \(\alpha_o\) solves a symmetric PSD system on the o-block. A single eigendecomposition of the rescaled kernel matrix \(A_o^{1/2} K_{oo} A_o^{1/2}\) solves the entire λ path.
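A minimal NumPy sketch of this direct solve, following the derivation above (an illustration of the linear algebra, not krrr's actual implementation):

```python
# Sketch of the "direct" tier, assuming the system (A K + lam*I) alpha = b
# derived above, with A = diag(a) and a_k >= 0. Not krrr's code.
import numpy as np

def direct_lambda_path(K, a, b, lambdas):
    """Solve (diag(a) K + lam*I) alpha = b for every lam in `lambdas`."""
    o = a > 0                          # original rows
    c = ~o                             # counterfactual evaluation points (a_k = 0)
    A_o = a[o]
    sqrtA = np.sqrt(A_o)

    # One symmetric PSD eigendecomposition of the rescaled kernel matrix
    # A_o^{1/2} K_oo A_o^{1/2}, reused for every lambda on the path.
    M = sqrtA[:, None] * K[np.ix_(o, o)] * sqrtA[None, :]
    eigvals, U = np.linalg.eigh(M)

    K_oc = K[np.ix_(o, c)]
    alphas = []
    for lam in lambdas:
        alpha = np.empty_like(b, dtype=float)
        alpha[c] = b[c] / lam                            # c-block: closed form
        # o-block RHS picks up the K_oc coupling to the c-block,
        # then gets rescaled by A_o^{-1/2} to symmetrize the system.
        rhs = (b[o] - A_o * (K_oc @ alpha[c])) / sqrtA
        # (M + lam*I) beta = rhs; each lambda costs two O(n^2) projections.
        beta = U @ ((U.T @ rhs) / (eigvals + lam))
        alpha[o] = sqrtA * beta                          # undo the rescaling
        alphas.append(alpha)
    return alphas
```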
The length-scale heuristics "median", "scott", and "silverman" resolve at fit time on the augmented training points, so the bandwidth adapts to the scale of the points the kernel actually sees, originals and counterfactuals alike.
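For reference, the median heuristic is just the median pairwise distance over those points. A minimal sketch, assuming a shift-invariant kernel and SciPy (an illustration, not krrr's internal code):

```python
# Median-heuristic length scale on the augmented points -- illustration only.
import numpy as np
from scipy.spatial.distance import pdist

def median_lengthscale(X_aug):
    """Median pairwise Euclidean distance over the *augmented* points,
    so counterfactual/shifted rows influence the bandwidth too."""
    return np.median(pdist(X_aug))
```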
Solver tier

| Solver | When to use | Cost |
| --- | --- | --- |
| "direct" | n_aug ≤ 3,000 | One eigendecomposition; entire λ-path is O(n²) per λ. Exact. |
| "nystrom_cg" | n_aug ≤ 50,000 | Preconditioned CG on the o-block; m landmarks. |
| "rff" | n_aug very large; shift-invariant kernel | Primal D × D solve via random Fourier features. |
| "falkon" | n_aug very large; GPU available | Wraps the optional falkon package. |
| "auto" | default | Dispatches by n_aug. |
The solver consumes the augmented dataset directly; you never deal with kernel matrices yourself.
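A minimal usage sketch. The class name and the solver keyword are confirmed by this page; the remaining keyword names and the fit/predict signatures are assumptions for illustration:

```python
# Hypothetical usage sketch. `KernelRieszRegressor`, solver="nystrom_cg",
# and the "median" heuristic appear in the docs above; the `lengthscale`
# keyword and the fit/predict signatures are assumptions.
import numpy as np
from krrr import KernelRieszRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))          # toy covariates
T = rng.binomial(1, 0.5, size=500)     # toy treatment

krr = KernelRieszRegressor(
    solver="nystrom_cg",    # mid-size tier from the table above
    lengthscale="median",   # heuristic resolved at fit time on augmented points
)
krr.fit(X, T)               # backend builds the augmented dataset internally
gamma_hat = krr.predict(X)  # fitted Riesz representer values at X
```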
Warning: Falkon limitation
The Falkon backend currently drops the \(K_{oc}\,b_c\) coupling on the o-block — Falkon’s standalone API only solves vanilla KRR, not the modified-RHS system the augmentation produces. For estimands where \(n_c\) is small or λ is moderate the bias is small; for tight overlap or extreme λ it is not. Use solver="nystrom_cg" if exactness matters more than scale.
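In the notation of the system reconstructed above, and assuming that form matches krrr's internals, the exact o-block solve is

\[
(A_o K_{oo} + \lambda I)\,\alpha_o \;=\; b_o \;-\; \tfrac{1}{\lambda}\, A_o K_{oc}\, b_c ,
\]

while the Falkon path solves it with the \(-\tfrac{1}{\lambda} A_o K_{oc} b_c\) term omitted. The \(1/\lambda\) factor on the dropped term is one way to see why the bias stays small for moderate λ but grows as λ shrinks or as the counterfactual block couples more strongly to the o-block.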
λ selection
Pass a grid of λ values; the backend selects by validation Riesz loss when validation_fraction > 0 or eval_set is given.
For consistency theory, λ should scale as \(O(1/n)\). Cross-fitting users should re-tune per fold (sklearn's cross_val_predict does this if KernelRieszRegressor is wrapped in GridSearchCV); see the sketch below.
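A hedged sketch of both selection modes. validation_fraction is named above; lambdas as the grid keyword and a Riesz-loss score() method on the estimator (which GridSearchCV's default scoring would need) are assumptions:

```python
# Hypothetical sketch of lambda selection. `lambdas` as the grid keyword
# and the estimator's score() behavior are assumptions about the API.
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_predict
from krrr import KernelRieszRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
T = rng.binomial(1, 0.5, size=500)

lam_grid = np.geomspace(1e-2, 1e2, 9) / len(X)   # keep the grid on the O(1/n) scale

# Built-in selection: pass the whole grid, hold out a validation split.
est = KernelRieszRegressor(lambdas=lam_grid, validation_fraction=0.2)

# Cross-fitting: wrap in GridSearchCV so lambda is re-tuned inside each fold.
search = GridSearchCV(
    KernelRieszRegressor(validation_fraction=0.0),
    param_grid={"lambdas": [[lam] for lam in lam_grid]},  # one candidate per lambda
)
gamma_cf = cross_val_predict(search, X, T, cv=5)  # out-of-fold representer values
```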
Loss support
KernelRidgeBackend currently supports SquaredLoss only — non-quadratic losses (KLLoss, BernoulliLoss, BoundedSquaredLoss) require Newton iteration on the kernel system, planned for v0.2. Pass an unsupported loss and the backend raises at fit time with a clear error.
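A hypothetical illustration of that fit-time check; the import path for KLLoss and the exception type are assumptions (the docs only promise a clear error):

```python
# The loss keyword is checked when fit() runs, not at construction.
from krrr import KernelRieszRegressor
from rieszreg import KLLoss   # import path is an assumption

est = KernelRieszRegressor(loss=KLLoss())
est.fit(X, T)  # raises at fit time: SquaredLoss is the only supported loss
```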
Median-heuristic bandwidth is computed on the augmented dataset (originals plus counterfactuals from \(m\)); for shift-style estimands this includes the shifted treatment values.
solver="falkon" drops \(K_{oc}\,b_c\). See callout above.