Write \(\Delta_n(\alpha) := \hat L_n(\alpha) - L(\alpha)\).
Step 1 (loss identity). For any \(\alpha \in L^2(P)\), \[
L(\alpha) - L(\alpha_0)
\;=\; \mathbb{E}[\alpha^2] - \mathbb{E}[\alpha_0^2] - 2\, \mathbb{E}[m(\alpha) - m(\alpha_0)]
\;=\; \mathbb{E}[\alpha^2] - \mathbb{E}[\alpha_0^2] - 2\, \mathbb{E}[\alpha_0(\alpha - \alpha_0)]
\;=\; \|\alpha - \alpha_0\|_{L^2(P)}^2,
\] where the second equality uses linearity of \(g \mapsto m(g)\) together with the Riesz identity \(\mathbb{E}[m(g)(Z)] = \mathbb{E}[\alpha_0(Z) g(Z)]\), applied to \(g = \alpha - \alpha_0\). This \(g\) lies in \(L^2(P)\) by (A1) and the hypothesis on \(\alpha\) (in Steps 2 and 4 the identity is applied to elements of \(\mathcal{F}_{K_n, M}\), which satisfy \(\|\alpha\|_\infty \le M\)).
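The final equality in the display is a completion of the square. As a worked check, using only linearity of expectation and the form of the loss implied by the first equality of Step 1, \[
\begin{aligned}
\mathbb{E}[\alpha^2] - \mathbb{E}[\alpha_0^2] - 2\, \mathbb{E}[\alpha_0(\alpha - \alpha_0)]
&= \mathbb{E}[\alpha^2] - \mathbb{E}[\alpha_0^2] - 2\, \mathbb{E}[\alpha_0 \alpha] + 2\, \mathbb{E}[\alpha_0^2] \\
&= \mathbb{E}[\alpha^2] - 2\, \mathbb{E}[\alpha_0 \alpha] + \mathbb{E}[\alpha_0^2] \\
&= \mathbb{E}\big[(\alpha - \alpha_0)^2\big] \;=\; \|\alpha - \alpha_0\|_{L^2(P)}^2.
\end{aligned}
\] In particular the excess loss is nonnegative and vanishes only when \(\alpha = \alpha_0\) \(P\)-almost surely.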
Step 2 (oracle decomposition). Let \(\alpha_n^\star \in \arg\min_{\alpha \in \mathcal{F}_{K_n, M}} L(\alpha)\). By optimality of \(\hat\alpha_n\) for \(\hat L_n\), \(\hat L_n(\hat\alpha_n) - \hat L_n(\alpha_n^\star) \le 0\), so \[
\begin{aligned}
L(\hat\alpha_n) - L(\alpha_0)
&= \big[L(\hat\alpha_n) - \hat L_n(\hat\alpha_n)\big] + \big[\hat L_n(\hat\alpha_n) - \hat L_n(\alpha_n^\star)\big] \\
&\quad + \big[\hat L_n(\alpha_n^\star) - L(\alpha_n^\star)\big] + \big[L(\alpha_n^\star) - L(\alpha_0)\big] \\
&\le -\Delta_n(\hat\alpha_n) + 0 + \Delta_n(\alpha_n^\star) + \inf_{\alpha \in \mathcal{F}_{K_n, M}} \|\alpha - \alpha_0\|_{L^2(P)}^2 \\
&\le 2 \sup_{\alpha \in \mathcal{F}_{K_n, M}} |\Delta_n(\alpha)| + \inf_{\alpha \in \mathcal{F}_{K_n, M}} \|\alpha - \alpha_0\|_{L^2(P)}^2.
\end{aligned}
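\] The third line uses Step 1: since \(L(\alpha) - L(\alpha_0) = \|\alpha - \alpha_0\|_{L^2(P)}^2\) for every \(\alpha \in \mathcal{F}_{K_n, M}\), minimising \(L\) over \(\mathcal{F}_{K_n, M}\) is the same as minimising the \(L^2(P)\)-distance to \(\alpha_0\), so \(L(\alpha_n^\star) - L(\alpha_0) = \inf_{\alpha \in \mathcal{F}_{K_n, M}} \|\alpha - \alpha_0\|_{L^2(P)}^2\). This approximation term tends to zero as \(n \to \infty\) by (A3); it remains to control the supremum.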
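The passage from the third to the fourth line of the display can be made explicit. It relies only on the fact that both \(\hat\alpha_n\) and \(\alpha_n^\star\) are elements of \(\mathcal{F}_{K_n, M}\): \[
-\Delta_n(\hat\alpha_n) + \Delta_n(\alpha_n^\star)
\;\le\; |\Delta_n(\hat\alpha_n)| + |\Delta_n(\alpha_n^\star)|
\;\le\; 2 \sup_{\alpha \in \mathcal{F}_{K_n, M}} |\Delta_n(\alpha)|.
\] The supremum is needed because \(\hat\alpha_n\) depends on the sample, so \(\Delta_n(\hat\alpha_n)\) cannot be handled by a pointwise law of large numbers.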
Step 3 (uniform LLN). Decompose \[
\Delta_n(\alpha) \;=\; \underbrace{\big(\hat E_n \alpha^2 - \mathbb{E}[\alpha^2]\big)}_{=:\, U_n(\alpha)} \;-\; 2\, \underbrace{\big(\hat E_n F_\alpha - \mathbb{E}[F_\alpha]\big)}_{=:\, V_n(\alpha)}, \qquad F_\alpha(z) \,:=\, \sum_{r=1}^{R(z)} c_r(z)\, \alpha(z_r(z)),
\] where \(\hat E_n f := \frac{1}{n}\sum_i f(Z_i)\). By (A1)-(A2), every \(\alpha \in \mathcal{F}_{K_n, M}\) satisfies \(\|\alpha\|_\infty \le M\), \(\|\alpha^2\|_\infty \le M^2\), and \(\|F_\alpha\|_\infty \le BM\).
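The bound on \(F_\alpha\) follows from the triangle inequality. The line below is a sketch that assumes (A2) supplies the uniform bound \(\sum_{r=1}^{R(z)} |c_r(z)| \le B\); (A2) is not restated in this section, so that reading is an assumption made here for illustration. \[
|F_\alpha(z)| \;\le\; \sum_{r=1}^{R(z)} |c_r(z)|\, |\alpha(z_r(z))| \;\le\; M \sum_{r=1}^{R(z)} |c_r(z)| \;\le\; BM.
\]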
We bound the empirical \(L^1\)-covering numbers of \(\mathcal{F}_{K_n, M}\), \(\mathcal{Q}_n := \{\alpha^2 : \alpha \in \mathcal{F}_{K_n, M}\}\), and \(\mathcal{M}_n := \{F_\alpha : \alpha \in \mathcal{F}_{K_n, M}\}\). For a class \(\mathcal{F}\) and a measure \(\mu\) on \(\mathcal{Z}\), write \(N_1(\epsilon, \mathcal{F}, \mu)\) for the smallest \(N\) such that \(\mathcal{F}\) admits an \(\epsilon\)-cover of cardinality \(N\) in \(L^1(\mu)\).
Each \(\alpha \in \mathcal{F}_{K_n, M}\) is determined by a partition \(\Pi \in \mathcal{T}_{K_n}\) and a value vector \((\alpha_\ell)_{\ell=1}^{K_n} \in [-M, M]^{K_n}\). Two functions with the same partition and value vectors \((\alpha_\ell), (\alpha'_\ell)\) satisfy \[
\|\alpha - \alpha'\|_{L^1(P_n)} \;\le\; \max_\ell |\alpha_\ell - \alpha'_\ell|.
\] The number of distinct partitions in \(\mathcal{T}_{K_n}\) that yield distinct restrictions to the data points \(Z_1, \ldots, Z_n\) together with all augmented evaluation points \(\{z_r(Z_i) : i \le n,\, r \le R\}\) (a set of size at most \(nR\)) is at most \((d \cdot nR)^{K_n - 1} \le (dnR)^{K_n}\): each of the \(\le K_n - 1\) splits picks one of \(d\) axes and one of at most \(nR\) achievable thresholds along that axis. Combined with an \(\epsilon\)-cover of \([-M, M]^{K_n}\) in \(L^\infty\) of size \(\lceil 2M/\epsilon \rceil^{K_n}\), \[
N_1\!\big(\epsilon,\, \mathcal{F}_{K_n, M},\, P_n\big) \;\le\; (dnR)^{K_n}\, (2M/\epsilon)^{K_n}.
\] Composition: since \(|a^2 - b^2| \le 2M\, |a - b|\) on \([-M, M]\) and \(|F_\alpha(z) - F_{\alpha'}(z)| \le T(z)\, \max_\ell |\alpha_\ell - \alpha'_\ell| \le B\, \max_\ell |\alpha_\ell - \alpha'_\ell|\) by (A2), \[
N_1\!\big(\epsilon,\, \mathcal{Q}_n,\, P_n\big) \;\le\; (dnR)^{K_n}\, (4M^2/\epsilon)^{K_n},
\qquad
N_1\!\big(\epsilon,\, \mathcal{M}_n,\, P_n\big) \;\le\; (dnR)^{K_n}\, (2BM/\epsilon)^{K_n}.
\]
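The constants in the two composition bounds can be traced as follows. This is a sketch under the construction used above: a cover element \(\alpha'\) induces the same cell assignment as \(\alpha\) on the data and augmented points, and its value vector satisfies \(\max_\ell |\alpha_\ell - \alpha'_\ell| \le \delta\) (the symbol \(\delta\) is introduced here for illustration). Then \[
\|\alpha^2 - \alpha'^2\|_{L^1(P_n)} \;=\; \frac{1}{n} \sum_{i=1}^n |\alpha(Z_i) - \alpha'(Z_i)|\, |\alpha(Z_i) + \alpha'(Z_i)| \;\le\; 2M\delta,
\qquad
\|F_\alpha - F_{\alpha'}\|_{L^1(P_n)} \;\le\; B\delta.
\] Taking \(\delta = \epsilon/(2M)\) in the value-vector cover of size \((2M/\delta)^{K_n}\) gives \((4M^2/\epsilon)^{K_n}\) for \(\mathcal{Q}_n\), and taking \(\delta = \epsilon/B\) gives \((2BM/\epsilon)^{K_n}\) for \(\mathcal{M}_n\); the partition factor \((dnR)^{K_n}\) is unchanged in both cases.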
Pollard’s uniform Vapnik-Chervonenkis inequality (Pollard 1984, Theorem II.24; equivalently Devroye, Györfi and Lugosi 1996, Theorem 12.6) states that for any class \(\mathcal{F}\) of \([-B', B']\)-valued measurable functions and any \(\epsilon > 0\), \[
P\!\left(\sup_{f \in \mathcal{F}} \big|\hat E_n f - \mathbb{E}[f]\big| > \epsilon\right) \;\le\; 8\, \mathbb{E}\!\big[N_1(\epsilon/8, \mathcal{F}, P_n)\big]\, \exp\!\big(-n \epsilon^2 / (128\, B'^2)\big).
\] Apply this to \(\mathcal{Q}_n\) with \(B' = M^2\) and to \(\mathcal{M}_n\) with \(B' = BM\): \[
\begin{aligned}
P\!\left(\sup_{\alpha} |U_n(\alpha)| > \epsilon\right) \;&\le\; 8\, (dnR)^{K_n}\, (32 M^2/\epsilon)^{K_n}\, \exp\!\big(-n \epsilon^2 / (128\, M^4)\big), \\
P\!\left(\sup_{\alpha} |V_n(\alpha)| > \epsilon\right) \;&\le\; 8\, (dnR)^{K_n}\, (16 BM/\epsilon)^{K_n}\, \exp\!\big(-n \epsilon^2 / (128\, B^2 M^2)\big).
\end{aligned}
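\] Taking logarithms, each right-hand side is of the form \[
\log 8 + K_n \log(dnR) + K_n \log(C/\epsilon) - n \epsilon^2 / C',
\] with constants \(C, C' > 0\) depending only on \(B\) and \(M\). For any fixed \(\epsilon > 0\) this expression tends to \(-\infty\) provided \(K_n \log n / n \to 0\), i.e., under (A4), so both probabilities tend to zero. Hence \[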
\sup_{\alpha \in \mathcal{F}_{K_n, M}} |\Delta_n(\alpha)| \;\le\; \sup_\alpha |U_n(\alpha)| + 2 \sup_\alpha |V_n(\alpha)| \;\xrightarrow{P}\; 0.
\]
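The limit behind this conclusion can be spelled out. The split of \(\epsilon\) below is one convenient choice, and the argument assumes that \(d\) and \(R\) do not grow with \(n\), which this section uses implicitly but does not state. For \(\epsilon > 0\), the triangle inequality and a union bound give \[
P\!\left(\sup_{\alpha} |\Delta_n(\alpha)| > \epsilon\right) \;\le\; P\!\left(\sup_{\alpha} |U_n(\alpha)| > \epsilon/2\right) + P\!\left(\sup_{\alpha} |V_n(\alpha)| > \epsilon/4\right),
\] and, writing \(\epsilon'\) for either \(\epsilon/2\) or \(\epsilon/4\), the logarithm of each bound factors as \[
n \left[ \frac{\log 8}{n} + \frac{K_n \log n}{n} + \frac{K_n \log(dR)}{n} + \frac{K_n \log(C/\epsilon')}{n} - \frac{\epsilon'^2}{C'} \right].
\] Since \(K_n \log n / n \to 0\) also forces \(K_n / n \to 0\), the bracket converges to \(-\epsilon'^2 / C' < 0\), so the whole expression tends to \(-\infty\) and both probabilities vanish.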
Step 4 (conclusion). By Step 3 and (A3), the right-hand side of the bound in Step 2 tends to zero in probability. By Step 1, \(L(\hat\alpha_n) - L(\alpha_0) = \|\hat\alpha_n - \alpha_0\|_{L^2(P)}^2 \ge 0\), so \(\|\hat\alpha_n - \alpha_0\|_{L^2(P)}^2 \xrightarrow{P} 0\) and hence \(\|\hat\alpha_n - \alpha_0\|_{L^2(P)} \xrightarrow{P} 0\). \(\blacksquare\)