Kernel Density Estimation (KDE), Part 1
Author: Rotch Date: 2025-08-28 Update: 2025-09-03
1. Algorithm Overview
1.1 What Kernel Density Estimation Does
Let $\mathcal{S} = (x^{(1)}, x^{(2)}, \dots, x^{(n)})$ be a set of independent and identically distributed (i.i.d.) samples drawn from a population. In statistical analysis we often need to infer the probability distribution of the population from such a finite sample, and kernel density estimation (KDE) is a non-parametric method designed for exactly this task.
1.2 Basic Idea
The core idea of kernel density estimation is that every sample point exerts an "influence" on the probability density in its neighbourhood; summing the contributions of all sample points gives a density estimate over the whole data space. Concretely, this breaks into two steps:
Assign each sample point an "influence decay function": it describes the range and the strength of that point's influence on the density around it. Positions close to the sample point are influenced strongly (large function values); positions far away are influenced weakly (small function values).
Superimpose the influence decay functions: adding up the scaled kernel attached to every sample point over the whole data space yields the estimated population density.
The "influence decay function" is usually specified by a kernel function, which is why the method is called kernel density estimation. Intuitively, KDE builds one overall "density mountain" out of many small "density peaks" (one kernel per sample point); the shape of the mountain is determined entirely by the sample layout and the kernel, so the method avoids the distributional assumptions of parametric estimation while effectively smoothing out noise in the data.
2. Kernel Functions
2.1 Definition
Before presenting kernel density estimation we first introduce kernel functions.
$\color{blue}{\textbf{Def 2.1 (Kernel):}}$ A kernel is a real-valued function $K: \mathbb{R} \to [0, +\infty)$ satisfying the following three properties:
Non-negativity: $K(u) \geq 0$ for all $u \in \mathbb{R}$;
Normalization: $\int_{\mathbb{R}} K(u) \mathrm{d}u = 1$;
Symmetry: $K(-u) = K(u)$ for all $u \in \mathbb{R}$.
Non-negativity ensures that a kernel only contributes positively to the density around it; normalization guarantees that each sample point contributes a total mass of $1$ to the overall density; symmetry makes the contribution to the left and to the right of a sample point identical, avoiding directional bias.
Common kernels and their formulas are listed below:
| Kernel | Formula |
| --- | --- |
| Gaussian | $K(u) = \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{u^2}{2}\right)$ |
| Epanechnikov | $K(u) = \begin{cases} \frac{3}{4}(1 - u^2), & \text{if } \vert u \vert < 1 \\ 0, & \text{otherwise}\end{cases}$ |
| Uniform (Rectangular) | $K(u) = \begin{cases} \frac{1}{2}, & \text{if } \vert u \vert < 1 \\ 0, & \text{otherwise}\end{cases}$ |
| Triangular | $K(u) = \begin{cases} 1 - \vert u \vert, & \text{if } \vert u \vert < 1 \\ 0, & \text{otherwise}\end{cases}$ |
| Biweight (Quartic) | $K(u) = \begin{cases}\frac{15}{16}(1 - u^2)^2, & \text{if } \vert u \vert < 1 \\ 0, & \text{otherwise}\end{cases}$ |
| Triweight | $K(u) = \begin{cases} \frac{35}{32}(1 - u^2)^3, & \text{if } \vert u \vert < 1 \\ 0, & \text{otherwise}\end{cases}$ |
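For concreteness, the following is a minimal Python sketch of these kernels (NumPy assumed; the function names are our own), together with a crude Riemann-sum check of the normalization property.

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def epanechnikov(u):
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def uniform(u):
    return np.where(np.abs(u) < 1, 0.5, 0.0)

def triangular(u):
    return np.where(np.abs(u) < 1, 1 - np.abs(u), 0.0)

def biweight(u):
    return np.where(np.abs(u) < 1, 15.0 / 16.0 * (1 - u**2)**2, 0.0)

def triweight(u):
    return np.where(np.abs(u) < 1, 35.0 / 32.0 * (1 - u**2)**3, 0.0)

# Crude numerical check of the normalization property: each kernel integrates to 1.
u = np.linspace(-5, 5, 100001)
du = u[1] - u[0]
for K in (gaussian, epanechnikov, uniform, triangular, biweight, triweight):
    assert abs((K(u) * du).sum() - 1.0) < 1e-3, K.__name__
```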
2.2 Scaled Kernels
The kernels above come with an implicit notion of bandwidth. Take the Epanechnikov kernel as an example: its default bandwidth is $h = 1$, i.e. it influences the region $\vert u \vert < 1$ and has no effect for $\vert u \vert \geq 1$. In practice this causes a scale mismatch: the spread of the sample may not match the bandwidth.
For example, suppose the spacing between neighbouring height measurements is about $5$ (in centimetres, likewise below), while the Epanechnikov kernel's default bandwidth is $1$; the kernel's support then fails to cover neighbouring samples and the estimated density curve "breaks". To fix this we can rescale the bandwidth to $h = 5$, extending the kernel's range of influence to $\pm 5$ so that the contributions of neighbouring samples connect smoothly.
$\color{blue}{\textbf{Def 2.2 (Scaled kernel):}}$ Let $K: \mathbb{R} \to [0, +\infty)$ be a given kernel. For a bandwidth $h > 0$, its scaled kernel is defined as:
\begin{equation}
K_h(u) = \frac{1}{h}K\left(\frac{u}{h}\right).
\end{equation}
Clearly $\int_\mathbb{R} K_h(u) \mathrm{d}u = \frac{1}{h} \int_\mathbb{R} K(\frac{u}{h}) \mathrm{d}u = 1$, so rescaling does not affect the basic properties of the kernel.
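Numerically, the rescaling indeed preserves unit mass; a tiny sketch (NumPy assumed, names hypothetical) for the Epanechnikov kernel with $h = 5$:

```python
import numpy as np

def scaled_kernel(K, h):
    """Return the scaled kernel K_h(u) = K(u / h) / h for a bandwidth h > 0."""
    return lambda u: K(u / h) / h

epanechnikov = lambda u: np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)
u = np.linspace(-10, 10, 200001)          # covers the support [-5, 5] of K_5
mass = (scaled_kernel(epanechnikov, 5.0)(u) * (u[1] - u[0])).sum()
assert abs(mass - 1.0) < 1e-3             # K_h still integrates to 1
```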
3. Kernel Density Estimation
3.1 From Histogram Estimation to Kernel Density Estimation
The most intuitive solution to this kind of problem is the histogram. Suppose we have $n$ one-dimensional sample points $x^{(1)}, x^{(2)}, \dots, x^{(n)}$ with $x^{(i)} \in [a, b]$, drawn from a distribution with probability density $p(x)$. Take an equidistant partition $T: a = a_0 < a_1 < a_2 < \dots < a_n = b$ with $a_i = a_0 + \frac{i(b - a)}{n}$ and $\parallel T \parallel = a_k - a_{k - 1} = \frac{b - a}{n}$. The density can then be estimated by counting the samples that fall into the bin $[a_{k - 1}, a_k)$ containing $x$:
\begin{equation}
\hat{p}(x) = \frac{1}{n \parallel T \parallel} \sum\limits_{i = 1}^{n} \mathbb{1}\left( x^{(i)} \in [a_{k - 1}, a_k) \right).
\end{equation}
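Before analysing this estimator, here is a minimal Python sketch of it (NumPy assumed; `n_bins` is used for the number of bins to keep it distinct from the sample size $n$, which the text denotes by the same letter).

```python
import numpy as np

def histogram_density(x_eval, samples, a, b, n_bins):
    """Histogram estimate: count of samples in the bin containing x,
    divided by n times the bin width ||T||."""
    samples = np.asarray(samples)
    width = (b - a) / n_bins                       # ||T||
    edges = a + width * np.arange(n_bins + 1)
    counts, _ = np.histogram(samples, bins=edges)
    # Index of the bin containing each evaluation point, clipped to [0, n_bins - 1].
    k = np.clip(((np.asarray(x_eval) - a) // width).astype(int), 0, n_bins - 1)
    return counts[k] / (len(samples) * width)
```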
Write $p_k = \mathbb{P} \left[ x^{(i)} \in [a_{k - 1}, a_k) \right]$. If $p \in C\left( \mathbb{R} \right)$, then by the mean value theorem for integrals:
\begin{equation}
p_k = \parallel T \parallel \cdot p(\xi_k), \quad \exists \xi_k \in [a_{k - 1}, a_k).
\end{equation}
The expectation of $\hat{p}$ is:
\begin{align}
\mathbb{E} \left[ \hat{p}(x) \right] &= \frac{1}{n \parallel T \parallel} \sum\limits_{i = 1}^{n} \mathbb{E} \left[ \mathbb{1}\left( x^{(i)} \in [a_{k - 1}, a_k) \right) \right] \nonumber \\
&= \frac{1}{n \parallel T \parallel} \cdot n p_k \nonumber \\
&= p(\xi_k), \quad x \in [a_{k - 1}, a_k).
\end{align}
The variance of $\hat{p}$ is:
\begin{align}
\mathrm{Var} \left[ \hat{p}(x) \right] &= \frac{1}{(n \parallel T \parallel)^2} \sum\limits_{i = 1}^{n} \mathrm{Var} \left[ \mathbb{1}\left( x^{(i)} \in [a_{k - 1}, a_k) \right) \right] \nonumber \\
&= \frac{1}{(n \parallel T \parallel)^2} \cdot n p_k(1 - p_k) \nonumber \\
&= \frac{p(\xi_k) \left(1 - \parallel T \parallel \cdot p(\xi_k) \right)}{n \parallel T \parallel}, \quad x \in [a_{k - 1}, a_k).
\end{align}
As $\parallel T \parallel \rightarrow 0^+$ we have $\xi_k \rightarrow x$ for $x \in [a_{k - 1}, a_k)$, hence:
\begin{equation}
\mathbb{E} \left[ \hat{p}(x) \right] = p(\xi_k) \rightarrow p(x),
\end{equation}
that is, $\hat{p}$ is an asymptotically unbiased estimator of $p$. Note, however, that $\mathrm{Var} \left[ \hat{p}(x) \right]$ then grows; to keep the variance small we also need $n \parallel T \parallel \rightarrow +\infty$, i.e. $n \gg \frac{1}{\parallel T \parallel}$.
Clearly this method has the following drawbacks:
The position and width of the bins strongly affect the estimate: for instance, shifting the bin edges or changing the partition $T$ changes the result;
The estimated density is discontinuous: $\hat{p}(x)$ has jumps and cannot reflect the smoothness of the underlying distribution;
It does not extend well to high dimensions: the number of bins grows exponentially with the dimension, and most bins contain no samples, so their estimated density is $0$.
Kernel density estimation was proposed precisely to address these issues: it replaces the "bin contribution" of the histogram with the "kernel contribution" of each sample point, substituting continuous kernels for discrete bins and thereby producing a smooth, continuous density estimate.
3.2 Derivation
We first tackle the first problem: removing the dependence on $T$. The key idea is to stop tying $x$ to a fixed partition and instead count sample points near $x$ directly. Let $X$ be the population random variable with distribution function $F(x) = P(X \leq x)$ and density $f(x)$. Then:
\begin{align}
f(x) = \frac{\mathrm{d}}{\mathrm{d}x} F(x) &= \frac{F(x + h) - F(x - h)}{2h} + o(h) \nonumber \\
&= \frac{\mathbb{P}(x - h < X \leq x + h)}{2h} + o(h). \quad (h \rightarrow 0^+)
\end{align}
By the law of large numbers, when the sample is large enough the probability can be approximated by the empirical frequency:
\begin{equation}
\mathbb{P}(x - h < X \leq x + h) \approx \frac{1}{n} \cdot \sum\limits_{i = 1}^{n} \mathbb{1} \left( \frac{\vert x - x^{(i)} \vert}{h} \leq 1 \right).
\end{equation}
Substituting $(8)$ into $(7)$ gives:
\begin{align}
f(x) &\approx \frac{1}{2nh} \cdot \sum\limits_{i = 1}^{n} \mathbb{1} \left( \frac{\vert x - x^{(i)} \vert}{h} \leq 1 \right) \nonumber \\
&= \frac{1}{n} \cdot \sum\limits_{i = 1}^{n} \frac{1}{h} \cdot K \left(\frac{x - x^{(i)}}{h} \right) \overset{\text{def}}{=\!=} \hat{f}(x),
\end{align}
where $K(u) = \frac{1}{2} \mathbb{1} ( \vert u \vert \leq 1)$ is exactly the uniform kernel. Let us verify that $\hat{f}(x)$ is a probability density function; indeed:
\begin{equation}
\int_\mathbb{R} \hat{f}(x) \mathrm{d}x = \frac{1}{n} \sum\limits_{i = 1}^{n} \int_\mathbb{R} \frac{1}{h} K\left(\frac{x - x^{(i)}}{h} \right) \mathrm{d}x = 1.
\end{equation}
To solve the second problem it suffices to replace the discontinuous uniform kernel by any other continuous kernel; by the normalization property, the resulting $\hat{f}(x)$ is still a probability density. On this basis the third problem is resolved as well. The resulting method is called kernel density estimation.
3.3 The Kernel Density Estimator
$\color{blue}{\textbf{Algorithm: Kernel Density Estimation}}$ Let $\left( x^{(1)}, x^{(2)}, \dots, x^{(n)} \right)$ be a given simple random sample of i.i.d. observations, $K(x)$ a given kernel, and $h > 0$ a given bandwidth. The kernel density estimate of the probability density is:
\begin{equation}
\hat{f}(x) = \frac{1}{nh} \sum\limits_{i = 1}^{n} K \left( \frac{x - x^{(i)}}{h} \right) = \frac{1}{n} \sum\limits_{i = 1}^{n}K_h(x - x^{(i)}).
\end{equation}
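Before discussing the roles of $K$ and $h$, here is a direct Python sketch of this estimator (Gaussian kernel and NumPy assumed; the bandwidth in the usage example is an arbitrary illustrative value).

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x_grid, samples, h, kernel=gaussian_kernel):
    """Evaluate f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h) on x_grid."""
    u = (np.asarray(x_grid)[:, None] - np.asarray(samples)[None, :]) / h
    return kernel(u).sum(axis=1) / (len(samples) * h)

# Usage: estimate the density of a small i.i.d. sample on a grid.
rng = np.random.default_rng(0)
samples = rng.normal(size=200)
grid = np.linspace(-4, 4, 201)
density = kde(grid, samples, h=0.4)
```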
Here the kernel $K(x)$ plays the role of the "influence decay function" of a sample point, and every sample point uses the same one. Choosing a different kernel or bandwidth changes how a sample point's influence decays and hence changes the density estimate. The rest of this article discusses how to choose a suitable kernel and bandwidth, but first we examine the expectation, variance and asymptotic behaviour of the estimator. To this end we make three assumptions:
Assume $f \in C^2(\mathbb{R})$ and that its second derivative $\nabla^2 f$ is square-integrable; write $R(\nabla^2 f) = \int_\mathbb{R} \left[ \nabla^2 f(x) \right]^2 \mathrm{d}x$;
Assume the kernel $K$ has a finite second moment and is square-integrable; write $\mu_2(K) = \int_{\mathbb{R}} u^2 K(u) \mathrm{d}u$ and $R(K) = \int_{\mathbb{R}} K^2(u) \mathrm{d}u$;
Write $h = h_n$ and assume that as $n \rightarrow \infty$ and $h \rightarrow 0^+$ we have $nh \rightarrow +\infty$.
The expectation of $\hat{f}$ is:
\begin{align}
\mathbb{E}\left[ \hat{f}(x) \right] &= \frac{1}{nh} \sum\limits_{i = 1}^{n} \mathbb{E} \left[ K \left( \frac{x - x^{(i)}}{h} \right) \right] \nonumber \\
&= \frac{1}{h} \int_{\mathbb{R}} K \left( \frac{x - t}{h} \right) f(t) \mathrm{d}t \nonumber \\
&\overset{u = \frac{x - t}{h}}{=\!=\!=\!=} \int_{\mathbb{R}} K(u) f(x - hu) \mathrm{d}u.
\end{align}
Expanding $f(x - hu)$ by Taylor's theorem to second order:
\begin{equation}
f(x - hu) = f(x) - hu \cdot \nabla f(x) + \frac{1}{2} (hu)^2 \nabla^2 f(x) + o(h^2),
\end{equation}
and substituting into $\mathbb{E}\left[ \hat{f}(x) \right]$ yields the asymptotic identity:
\begin{align}
\mathbb{E}\left[ \hat{f}(x) \right] &= \int_{\mathbb{R}} K(u) \left[ f(x) - hu \cdot \nabla f(x) + \frac{1}{2} (hu)^2 \nabla^2 f(x) + o(h^2)\right] \mathrm{d}u \nonumber \\
&= f(x) \int_\mathbb{R} K(u) \mathrm{d}u - h \cdot \nabla f(x) \int_\mathbb{R} u K(u) \mathrm{d}u + \frac{1}{2}h^2 \nabla^2 f(x) \int_\mathbb{R} u^2 K(u) \mathrm{d}u + o(h^2)\nonumber \\
&= f(x) + \frac{1}{2} h^2 \mu_2(K) \nabla^2 f(x) + o(h^2).
\end{align}
Noting that $f(x - hu) = f(x) + O(hu)$, the variance of $\hat{f}$ is:
\begin{align}
\mathrm{Var} \left[ \hat{f}(x) \right] &= \frac{1}{nh^2} \left( \mathbb{E}\left[ K^2 \left( \frac{x - x^{(i)}}{h} \right) \right] - \mathbb{E}^2 \left[ K \left( \frac{x - x^{(i)}}{h} \right) \right] \right) \nonumber \\
&= \frac{1}{nh^2} \left[ h\int_{\mathbb{R}} K^2(u) f(x - hu) \mathrm{d}u - \left( h \int_{\mathbb{R}} K(u) f(x - hu) \mathrm{d}u \right)^2 \right] \nonumber \\
&= \frac{1}{nh^2} \left[ h\int_{\mathbb{R}} K^2(u) [f(x) + O(hu)] \mathrm{d}u - O(h^2) \right] \nonumber \\
&= \frac{1}{nh^2} \left[ h f(x) R(K) + O(h^2) - O(h^2) \right] \nonumber \\
&= \frac{f(x)R(K)}{nh} + O\left(\frac{1}{n}\right) \nonumber \\
&= \frac{f(x)R(K)}{nh} + o\left(\frac{1}{nh}\right).
\end{align}
Here we used the following asymptotic identity:
$\color{blue}{\textbf{Prop 3.1:}}$ $\int_{\mathbb{R}} K^2(u) O(hu) \mathrm{d}u = O(h)$.
$\color{brown}{\textbf{Proof:}}$ For $O(z)$ there exists $C > 0$ such that $\vert O(z) \vert \leq C \vert z \vert$, so:
\begin{align}
\left\lvert \int_{\mathbb{R}} K^2(u) O(hu) \mathrm{d}u \right\rvert &\leq \int_{\mathbb{R}} K^2(u) \vert O(hu) \vert \mathrm{d}u \nonumber \\
&\leq \int_{\mathbb{R}} K^2(u) Ch\vert u \vert \mathrm{d}u \nonumber \\
&= Ch \int_{\mathbb{R}} K^2(u) \vert u \vert \mathrm{d}u \nonumber \\
&= O(h). \qquad \square
\end{align}
Finally, the asymptotic expression for $\mathrm{Bias} \left[ \hat{f}(x) \right]$ is:
\begin{equation}
\mathrm{Bias} \left[ \hat{f}(x) \right] = \mathbb{E} \left[ \hat{f}(x) \right] - f(x) = \frac{1}{2}h^2 \nabla^2 f(x) \mu_2(K) + o(h^2).
\end{equation}
Observe that $\mathrm{Bias} \left[ \hat{f}(x) \right]$ depends on $h^2$ and on $\nabla^2 f(x)$. In regions where $f(x)$ is convex, $\nabla^2 f(x) > 0$ and $\mathrm{Bias} \left[ \hat{f}(x) \right] > 0$, so the KDE overestimates $f$; conversely, in regions where $f(x)$ is concave, $\nabla^2 f(x) < 0$ and $\mathrm{Bias} \left[ \hat{f}(x) \right] < 0$, so the KDE underestimates $f$.
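The sign of the bias can be seen empirically. The following Monte Carlo sketch (NumPy assumed, parameter values arbitrary) repeatedly estimates a standard normal density at its mode $x = 0$, where $\nabla^2 f(0) < 0$; the average estimate falls below $f(0)$, as the bias formula predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, reps = 200, 0.5, 2000
f0 = 1.0 / np.sqrt(2.0 * np.pi)        # true N(0, 1) density at its mode x = 0

def kde_at_zero(sample, h):
    """Gaussian-kernel KDE evaluated at x = 0."""
    u = sample / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).sum() / (len(sample) * h)

est = [kde_at_zero(rng.normal(size=n), h) for _ in range(reps)]
# At a mode the bias (h^2 / 2) * f''(0) * mu_2(K) is negative: the peak is
# underestimated. The printed difference is clearly negative (about -0.04 here).
print(np.mean(est) - f0)
```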
4. Bandwidth Selection
4.1 Criteria for Evaluating a Bandwidth
To select a suitable bandwidth we first need a criterion. As the bandwidth $h$ increases, the influence decay function becomes flatter and the estimated curve smoother: the variance decreases while the bias grows. As $h$ decreases, the kernel becomes more peaked and the curve rougher: the variance grows while the bias shrinks. A useful criterion must therefore combine variance and bias.
$\color{blue}{\textbf{Def 4.1 (Mean squared error):}}$ The function $\mathrm{MSE} \left[ \hat{f}(x) \right] = \mathbb{E}\left[ \left( \hat{f}(x) - f(x) \right)^2 \right]$ is called the mean squared error (MSE).
$\color{blue}{\textbf{Prop 4.2:}}$ $\mathrm{MSE} \left[ \hat{f}(x) \right] = \mathrm{Var} \left[ \hat{f}(x) \right] + \mathrm{Bias}^2 \left[ \hat{f}(x) \right]$.
$\color{brown}{\textbf{Proof:}}$ Note that $\mathbb{E} \left[ \hat{f}(x) - \mathbb{E} \left( \hat{f}(x) \right) \right] = 0$; then:
\begin{align}
\mathrm{MSE} \left[ \hat{f}(x) \right] &= \mathbb{E}\left[ \left( \hat{f}(x) - f(x) \right)^2 \right] \nonumber \\
&= \mathbb{E} \left[ \left[ \left( \hat{f}(x) - \mathbb{E} \left[ \hat{f}(x) \right] \right) + \left( \mathbb{E} \left[ \hat{f}(x) \right] - f(x) \right) \right]^2 \right] \nonumber \\
&= \mathbb{E} \left[ \left( \hat{f}(x) - \mathbb{E} \left[ \hat{f}(x) \right] \right)^2 \right]
+ 2 \mathbb{E} \left[ \left( \hat{f}(x) - \mathbb{E} \left[ \hat{f}(x) \right] \right) \left( \mathbb{E} \left[ \hat{f}(x) \right] - f(x) \right) \right] \nonumber \\
& \quad + \mathbb{E} \left[ \left(\mathbb{E} \left[ \hat{f}(x) \right] - f(x)\right)^2 \right]\nonumber \\
&= \mathrm{Var} \left[ \hat{f}(x) \right]
+ 2 \left( \mathbb{E} \left[ \hat{f}(x) \right] - f(x) \right) \mathbb{E} \left[ \hat{f}(x) - \mathbb{E} \left[ \hat{f}(x) \right] \right]
+ \left(\mathbb{E} \left[ \hat{f}(x) \right] - f(x)\right)^2 \nonumber \\
&= \mathrm{Var} \left[ \hat{f}(x) \right] + \mathrm{Bias}^2 \left[ \hat{f}(x) \right]. \qquad \square
\end{align}
$\color{blue}{\textbf{Def 4.3 (Mean integrated squared error):}}$ The integral $\mathrm{MISE} \left[ \hat{f}(x) \right] = \int_\mathbb{R} \mathrm{MSE} \left[ \hat{f}(x) \right] \mathrm{d}x$ is called the mean integrated squared error (MISE).
The theoretically optimal bandwidth $h_{\mathrm{opt}}$ is the one minimizing the mean integrated squared error, i.e.:
\begin{equation}
h_{\mathrm{opt}} = \arg \mathop{\min}\limits_{h} \mathrm{MISE}\left[ \hat{f}(x) \right].
\end{equation}
4.2 Asymptotically Optimal Bandwidth
We first give the asymptotic forms of $\mathrm{MSE}$ and $\mathrm{MISE}$ for the KDE:
\begin{align}
\mathrm{MSE} \left[ \hat{f}(x) \right] &= \mathrm{Var} \left[ \hat{f}(x) \right] + \mathrm{Bias}^2 \left[ \hat{f}(x) \right] \nonumber \\
&= \frac{f(x)}{nh} R(K) + o\left(\frac{1}{nh}\right) + \frac{1}{4}h^4 [\nabla^2 f(x)]^2 \mu_2^2(K) + o(h^4). \\
\implies & \mathrm{AMSE} \left[ \hat{f}(x) \right] = \frac{f(x)}{nh} R(K) + \frac{1}{4}h^4 [\nabla^2 f(x)]^2 \mu_2^2(K), \\
&\mathrm{AMISE} \left[ \hat{f}(x) \right] = \int_{\mathbb{R}} \mathrm{AMSE}(x) \mathrm{d}x = \frac{1}{4}h^4 \mu_2^2(K) R(\nabla^2 f) + \frac{R(K)}{nh}.
\end{align}
$\color{blue}{\textbf{Prop 4.4:}}$ $\hat{f}(x) \xrightarrow{2} f(x)$, i.e. $\hat{f}(x)$ converges to $f(x)$ in mean square.
$\color{brown}{\textbf{Proof:}}$ Just note that $\mathrm{MSE} \left[ \hat{f}(x) \right] = O(\frac{1}{nh}) + O(h^4) = o(1)$. $\square$
$\color{blue}{\textbf{Cor 4.5:}}$ $\hat{f}(x) \xrightarrow{2, \mathbb{P}, d} f(x)$, i.e. $\hat{f}(x)$ converges to $f(x)$ in mean square, in probability, and in distribution.
We now look for the optimal bandwidth in the asymptotic sense, i.e.:
\begin{equation}
h_{\mathrm{asym-opt}} = \arg \mathop{\min}\limits_{h} \mathrm{AMISE}\left[ \hat{f}(x) \right].
\end{equation}
Differentiating $\mathrm{AMISE} \left[ \hat{f}(x) \right]$ with respect to $h$ gives:
\begin{equation}
\nabla_h \mathrm{AMISE} \left[ \hat{f}(x) \right] = h^3 \mu_2^2(K) R(\nabla^2 f) - \frac{R(K)}{nh^2}.
\end{equation}
Setting $\frac{\mathrm{d}}{\mathrm{d}h} \mathrm{AMISE} \left[ \hat{f}(x) \right] = 0$ yields:
\begin{align}
h_{\mathrm{asym-opt}} &= \sqrt[5]{\frac{R(K)}{\mu_2^2(K)R(\nabla^2 f)} \cdot \frac{1}{n}}, \\
\mathrm{AMISE} \left[ \hat{f}(x) \right]_{\min} &= \frac{5}{4} \left( R^4(K) \cdot \mu_2^2(K) \cdot R(\nabla^2 f) \cdot n^{-4} \right)^{1/5}.
\end{align}
In practice the true $f$ is unknown, so $h_{\mathrm{asym-opt}}$ cannot be computed directly. Several widely used workarounds follow.
4.3 Silverman's Rule
Silverman's method assumes that $f$ is the normal density $\mathcal{N}(\mu, \sigma^2)$, i.e.:
\begin{equation}
f(x) = \frac{1}{\sqrt{2 \pi} \sigma} \mathrm{e}^{-\frac{(x - \mu)^2}{2\sigma^2}}.
\end{equation}
It is easy to check that:
\begin{equation}
\nabla^2 f(x) = f(x) \cdot \left( \frac{(x - \mu)^2}{\sigma^4} - \frac{1}{\sigma^2} \right).
\end{equation}
Computing $R(\nabla^2 f)$:
\begin{align}
R(\nabla^2 f) &= \int_{\mathbb{R}} \left[ \nabla^2 f(x) \right]^2 \mathrm{d}x \nonumber \\
&= \frac{1}{2 \pi \sigma^{10}} \left[ \int_{\mathbb{R}} (x - \mu)^4 \mathrm{e}^{-\frac{(x - \mu)^2}{\sigma^2}} \mathrm{d}x - 2\sigma^2 \int_{\mathbb{R}} (x - \mu)^2 \mathrm{e}^{-\frac{(x - \mu)^2}{\sigma^2}} \mathrm{d}x + \sigma^4 \int_{\mathbb{R}} \mathrm{e}^{-\frac{(x - \mu)^2}{\sigma^2}} \mathrm{d}x \right] \nonumber \\
&= \frac{1}{2 \pi \sigma^{10}} \left[ \frac{3 \sqrt{\pi}}{4} \sigma^5 - 2 \cdot \frac{\sqrt{\pi}}{2} \sigma^5 + \sqrt{\pi} \sigma^5 \right] \nonumber \\
&= \frac{3}{8\sqrt{\pi} \sigma^5}.
\end{align}
Writing $C = \sqrt[5]{\frac{8 \sqrt{\pi} R(K)}{3 \mu_2^2(K)}}$, we obtain:
\begin{equation}
h_{\mathrm{silverman}} = C \cdot \sigma \cdot n^{-1/5}.
\end{equation}
Finally, we estimate $\sigma$. The most direct choice is the sample standard deviation $s$:
\begin{equation}
\hat{\sigma}_s = s.
\end{equation}
This works because $\mathbb{E} \left[ s^2 \right] = \sigma^2$. However, extreme values in the sample can inflate this estimate of $\sigma$ badly. To guard against this, consider the transformation $X = \mu + \sigma Z$ with $Z \sim \mathcal{N}(0, 1)$, sort the sample, and take its upper quartile $Q_{0.75}$ and lower quartile $Q_{0.25}$. By the properties of the normal distribution:
\begin{align}
Q_{0.75} &= \mu + \sigma \cdot \Phi^{-1}(0.75), \\
Q_{0.25} &= \mu + \sigma \cdot \Phi^{-1}(0.25),
\end{align}
where $\Phi$ is the standard normal distribution function (so $\Phi^{-1}$ is its quantile function). Subtracting the two equations gives:
\begin{equation}
\hat{\sigma}_Q = \frac{Q_{0.75} - Q_{0.25}}{\Phi^{-1}(0.75) - \Phi^{-1}(0.25)}.
\end{equation}
Although $\hat{\sigma}_Q$ is far less sensitive to extreme values than $\hat{\sigma}_s$, it uses only $50\%$ of the data, so its use of information is incomplete (note that $\Phi^{-1}(0.75) - \Phi^{-1}(0.25) \approx 1.349$). When the data really are normal, $\hat{\sigma}_s$ is the more efficient estimate; when extreme values are present, $\hat{\sigma}_s$ inflates sharply while $\hat{\sigma}_Q$ is essentially unaffected. This motivates the following empirical rule for estimating $\sigma$:
\begin{equation}
\hat{\sigma} = \min \{ \hat{\sigma}_s, \hat{\sigma}_Q \}.
\end{equation}
This yields the final form of Silverman's bandwidth:
\begin{equation}
h_{\mathrm{silverman}} = C \cdot \hat{\sigma} \cdot n^{-1/5}.
\end{equation}
For the Gaussian kernel $C_\mathrm{G} \approx 1.06$ (the well-known rule-of-thumb bandwidth); for the Epanechnikov kernel $C_\mathrm{E} \approx 2.34$; for the uniform kernel $C_{\mathrm{U}} \approx 1.84$. The detailed computations are omitted.
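A minimal sketch of Silverman's rule for the Gaussian kernel, including the robust scale estimate $\hat{\sigma} = \min\{\hat{\sigma}_s, \hat{\sigma}_Q\}$ (NumPy assumed; the constant $1.349 \approx \Phi^{-1}(0.75) - \Phi^{-1}(0.25)$):

```python
import numpy as np

def silverman_bandwidth(x):
    """h = C * sigma_hat * n**(-1/5) with C = (4/3)**(1/5) ~ 1.06 for a Gaussian kernel."""
    x = np.asarray(x)
    n = len(x)
    s = np.std(x, ddof=1)                                  # sigma_hat_s
    iqr = np.quantile(x, 0.75) - np.quantile(x, 0.25)
    sigma_q = iqr / 1.349                                  # sigma_hat_Q
    sigma_hat = min(s, sigma_q)
    # For the Gaussian kernel: R(K) = 1 / (2 sqrt(pi)), mu_2(K) = 1,
    # so C = (8 sqrt(pi) R(K) / (3 mu_2(K)^2))**(1/5) = (4/3)**(1/5).
    C = (4.0 / 3.0) ** 0.2
    return C * sigma_hat * n ** (-0.2)
```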
4.4 The Plug-in Method
$\color{blue}{\textbf{Def 4.6 (}\boldsymbol{k}\textbf{-th order kernel):}}$ Let $K$ be a kernel and $k \in \mathbb{N}_+$. If $\int_{\mathbb{R}} u^r K(u) \mathrm{d}u = 0$ for every $r = 1, 2, \dots, k - 1$ and $\int_{\mathbb{R}} u^k K(u) \mathrm{d}u > 0$, then $K$ is called a $k$-th order kernel.
$\color{blue}{\textbf{Prop 4.7:}}$ If $K$ is a $k$-th order kernel, then $k$ is even.
$\color{brown}{\textbf{Proof:}}$ Just note that for odd $r < k$, $u^r K(u)$ is an odd function, so its integral vanishes automatically. $\square$
We now introduce an important derivation technique whose simplest form is Lemma $4.8$; its idea will be reused in the later derivations without separate proofs.
$\color{blue}{\textbf{Lem 4.8:}}$ Let $s \in \mathbb{N}_+$, $f \in C^{2s}(\mathbb{R})$, and $\lim\limits_{x \rightarrow \infty} \nabla^r f(x) = 0$ for all $r = 1, 2, \dots, 2s$. Then:
\begin{equation}
\int_{\mathbb{R}} [\nabla^s f(x)]^2 \mathrm{d}x = (-1)^s \int_{\mathbb{R}} \nabla^{2s} f(x) \cdot f(x) \mathrm{d}x.
\end{equation}
$\color{brown}{\textbf{Proof:}}$
\begin{align}
\int_{\mathbb{R}} [\nabla^s f(x)]^2 \mathrm{d}x &= \int_{\mathbb{R}} \nabla^s f(x) \mathrm{d}[\nabla^{s - 1} f(x)] \nonumber \\
&= \left[ \nabla^s f(x) \nabla^{s - 1} f(x) \right]^{+\infty}_{-\infty} - \int_{\mathbb{R}} \nabla^{s - 1} f(x) \mathrm{d}[\nabla^s f(x)] \nonumber \\
&= -\int_{\mathbb{R}} \nabla^{s - 1} f(x) \cdot \nabla^{s + 1} f(x) \mathrm{d}x \nonumber \\
&= \dots \nonumber \\
&= (-1)^s \int_{\mathbb{R}} \nabla^{2s} f(x) \cdot f(x) \mathrm{d}x. \qquad \square
\end{align}
Define:
\begin{equation}
\psi_r = \int_{\mathbb{R}} \nabla^{r} f(x) \cdot f(x) \mathrm{d}x = \mathbb{E} \left[ \nabla^r f(X) \right].
\end{equation}
By Lemma $4.8$, computing $R(\nabla^2 f)$ reduces to computing $\psi_4$. We now give a nonparametric (kernel) estimator of $\psi_r$:
\begin{align}
\hat{\psi}_r(g) &= \frac{1}{n} \sum\limits_{j = 1}^{n} \nabla^r \hat{f}_g \left( x^{(j)} \right) \nonumber \\
&= \frac{1}{n^2} \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \nabla^r L_g \left( x^{(j)} - x^{(i)} \right),
\end{align}
where $g$ is a bandwidth and $L$ is a kernel; note that they need not coincide with the bandwidth and kernel used in the KDE itself. The problem thus becomes choosing $g$ to minimize $\mathrm{AMSE} \left[ \hat{\psi}_r(g) \right]$, where:
\begin{equation}
\mathrm{MSE} \left[ \hat{\psi}_r(g) \right] = \mathbb{E} \left[ \left( \hat{\psi}_r(g) - \psi_r \right)^2 \right].
\end{equation}
Following the same strategy as before, we first compute $\mathbb{E} \left[ \hat{\psi}_r(g) \right]$:
\begin{align}
\mathbb{E} \left[ \hat{\psi}_r(g) \right] &= \mathbb{E} \left[ \frac{1}{n^2} \sum\limits_{i, j = 1}^{n} \nabla^r L_g \left( x^{(j)} - x^{(i)} \right) \right] \nonumber \\
&= \mathbb{E} \left[\frac{1}{n} \nabla^r L_g(0) + \frac{1}{n^2} \sum\limits_{i \not= j} \nabla^r L_g \left( x^{(j)} - x^{(i)} \right) \right] \nonumber \\
&= \frac{1}{n} \nabla^r L_g(0) + \frac{n - 1}{n} \mathbb{E} \left[ \nabla^r L_g \left( x^{(j)} - x^{(i)} \right) \right].
\end{align}
Suppose $L$ is a $k$-th order kernel (so its moments of orders $1$ through $k - 1$ vanish). Then:
\begin{align}
\mathbb{E} \left[ \nabla^r L_g \left( x^{(j)} - x^{(i)} \right) \right] &= \iint_{\mathbb{R}^2} \nabla^r L_g(x - y) f(x) f(y) \mathrm{d}x \mathrm{d}y \nonumber \\
&= \iint_{\mathbb{R}^2} L_g(x - y) f(x) \nabla^r f(y) \mathrm{d}x \mathrm{d}y \qquad (\text{integration by parts in } y,\ r \text{ times}) \nonumber \\
&\overset{u = \frac{x - y}{g}}{=\!=\!=\!=} \iint_{\mathbb{R}^2} L(u) f(y + gu) \nabla^r f(y) \mathrm{d}u \mathrm{d}y \nonumber \\
&= \iint_{\mathbb{R}^2} L(u) \left[ \sum\limits_{l = 0}^{k} \frac{1}{l!} (gu)^l \nabla^l f(y) + o(g^k) \right] \nabla^r f(y) \mathrm{d}u \mathrm{d}y \nonumber \\
&= \int_{\mathbb{R}} f(y) \nabla^r f(y) \mathrm{d}y +
\frac{1}{k!} g^k \mu_k(L) \int_{\mathbb{R}} \nabla^k f(y) \nabla^r f(y) \mathrm{d}y + o(g^k) \nonumber \\
&= \psi_r + \frac{1}{k!} g^k \mu_k(L) \psi_{k + r} + o(g^k),
\end{align}
where the last step uses Lemma $4.8$ and the evenness of $k$ to write $\int_{\mathbb{R}} \nabla^k f(y) \nabla^r f(y) \mathrm{d}y = \psi_{k + r}$. Since $\frac{n - 1}{n} \sim 1$, we obtain:
\begin{align}
\mathbb{E} \left[ \hat{\psi}_r(g) \right] &= \frac{1}{n} \nabla^r L_g(0) + \psi_r + \frac{1}{k!} g^k \mu_k(L) \psi_{r + k} + o(g^k) \nonumber \\
&= \frac{1}{ng^{r + 1}} \nabla^r L(0) + \psi_r + \frac{1}{k!} g^k \mu_k(L) \psi_{r + k} + o(g^k).
\end{align}
Hence the bias is:
\begin{equation}
\mathrm{Bias} \left[ \hat{\psi}_r(g) \right] = \frac{1}{ng^{r + 1}} \nabla^r L(0) + \frac{1}{k!} g^k \mu_k(L) \psi_{r + k} + o(g^k).
\end{equation}
Next we derive $\mathrm{Var} \left[ \hat{\psi}_r(g) \right]$. Write $L_{ij} = \nabla^r L_g(x^{(i)} - x^{(j)})$ and expand:
\begin{align}
\mathrm{Var} \left[ \hat{\psi}_r(g) \right] &= \frac{1}{n^4} \mathrm{Var} \left[ \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \nabla^r L_g \left( x^{(j)} - x^{(i)} \right) \right] \nonumber \\
&= \frac{1}{n^4} \sum\limits_{i, j} \sum\limits_{k, l} \mathrm{Cov}(L_{ij}, L_{kl}).
\end{align}
We treat $\mathrm{Cov}(L_{ij}, L_{kl})$ by cases:
If $i = j$ or $k = l$, then $L_{ij}$ (respectively $L_{kl}$) is the constant $\nabla^r L_g(0)$, so $\mathrm{Cov}(L_{ij}, L_{kl}) = 0$.
If $i \not= j$ and $k \not= l$:
If $\vert \{i, j\} \cap \{k, l\} \vert = 2$ (there are $2 A_n^2 = 2n(n - 1)$ such index tuples), then by the symmetry of the kernel $\mathrm{Cov}(L_{ij}, L_{kl}) = \mathrm{Var}(L_{ij})$.
If $\vert \{i, j\} \cap \{k, l\} \vert = 1$, say $j = l$ (there are $4A_n^3 = 4n(n - 1)(n - 2)$ such tuples), then by symmetry $\mathrm{Cov}(L_{ij}, L_{kl}) = \mathrm{Cov}(L_{ij}, L_{jk})$.
If $\vert \{i, j\} \cap \{k, l\} \vert = 0$, then $L_{ij}$ and $L_{kl}$ are independent, so $\mathrm{Cov}(L_{ij}, L_{kl}) = 0$.
This yields the following expansion of $\mathrm{Var} \left[ \hat{\psi}_r(g) \right]$:
\begin{equation}
\mathrm{Var} \left[ \hat{\psi}_r(g) \right] = \frac{2(n - 1)}{n^3} \mathrm{Var} \left( L_{1, 2} \right) + \frac{4(n - 1)(n - 2)}{n^3} \mathrm{Cov} \left( L_{1, 2}, L_{2, 3} \right).
\end{equation}
Here:
\begin{align}
\mathbb{E} \left( L_{1, 2}^2 \right) &= \iint_{\mathbb{R}^2} \left[ \nabla^r L_g (x - y) \right]^2 f(x) f(y) \mathrm{d}x \mathrm{d}y \nonumber \\
&= \frac{1}{g^{2r + 1}} \iint_{\mathbb{R}^2} \left[ \nabla^r L (u) \right]^2 f(y + ug) f(y) \mathrm{d}u \mathrm{d}y \nonumber \\
&= \frac{1}{g^{2r + 1}} \iint_{\mathbb{R}^2} \left[ \nabla^r L (u) \right]^2 \left[ f(y) + o(1) \right] f(y) \mathrm{d}u \mathrm{d}y \nonumber \\
&= \frac{1}{g^{2r + 1}} \psi_0 R(\nabla^r L) + o\left(\frac{1}{g^{2r + 1}}\right).
\end{align}
\begin{align}
\mathbb{E} \left( L_{1, 2} \cdot L_{2, 3} \right) &= \iiint_{\mathbb{R}^3} \nabla^r L_g (x - y) \nabla^r L_g (y - z) f(x) f(y) f(z) \mathrm{d}x \mathrm{d}y \mathrm{d}z \nonumber \\
&= \iiint_{\mathbb{R}^3} L_g (x - y) L_g (y - z) \nabla^r f(x) f(y) \nabla^r f(z) \mathrm{d}x \mathrm{d}y \mathrm{d}z \nonumber \\
&= \iiint_{\mathbb{R}^3} L (u) L (v) \nabla^r f(y + ug) f(y) \nabla^r f(y - vg) \mathrm{d}u \mathrm{d}v \mathrm{d}y \nonumber \\
&= \iiint_{\mathbb{R}^3} L (u) L (v) f(y) \left[ \nabla^r f(y) + o(1) \right]^2 \mathrm{d}u \mathrm{d}v \mathrm{d}y \nonumber \\
&= \int_{\mathbb{R}} \left[ \nabla^r f(y) \right]^2 f(y) \mathrm{d}y + o(1).
\end{align}
Finally, noting that $\mathbb{E}(L_{1, 2}) = \psi_r + o(1)$, we have:
\begin{align}
\mathrm{Var} \left[ \hat{\psi}_r(g) \right] &= \frac{2}{n^2} \left[ \mathbb{E}(L_{1, 2}^2) - \mathbb{E}^2(L_{1, 2}) \right] + \frac{4}{n} \left[ \mathbb{E}(L_{1, 2} \cdot L_{2, 3}) - \mathbb{E}^2(L_{1, 2}) \right] \nonumber \\
&= \frac{2}{n^2 g^{2r + 1}} \psi_0 R(\nabla^r L) + o\left(\frac{2}{n^2 g^{2r + 1}}\right) - \frac{2}{n^2} \psi_r^2 + o\left(\frac{1}{n^2}\right) \nonumber \\
&\quad + \frac{4}{n} \int_{\mathbb{R}} \left[ \nabla^r f(y) \right]^2 f(y) \mathrm{d}y + o\left(\frac{1}{n}\right) - \frac{4}{n} \psi_r^2 + o\left(\frac{1}{n}\right) \nonumber \\
&= \frac{2}{n^2 g^{2r + 1}} \psi_0 R(\nabla^r L) + \frac{4}{n} \left[ \int_{\mathbb{R}} \left[ \nabla^r f(y) \right]^2 f(y) \mathrm{d}y - \psi_r^2 \right] + o\left(\frac{2}{n^2 g^{2r + 1}} + \frac{1}{n}\right).
\end{align}
This yields the asymptotic mean squared error:
\begin{align}
\mathrm{AMSE} \left[ \hat{\psi}_r(g) \right] &= \left[ \frac{1}{ng^{r + 1}} \nabla^r L(0) + \frac{1}{k!} g^k \mu_k(L) \psi_{r + k} \right]^2 \nonumber \\
&\quad + \frac{2}{n^2 g^{2r + 1}} \psi_0 R(\nabla^r L) + \frac{4}{n} \left[ \int_{\mathbb{R}} \left[ \nabla^r f(x) \right]^2 f(x) \mathrm{d}x - \psi_r^2 \right].
\end{align}
This expression is too complicated to minimize by setting its derivative with respect to $g$ to zero, but an approximately optimal bandwidth can still be obtained. When $g$ grows, the $g^{2k}$ term dominates and $\mathrm{AMSE} \left[ \hat{\psi}_r(g) \right]$ blows up; when $g$ shrinks, the $\frac{1}{g^{2r + 2}}$ term dominates and $\mathrm{AMSE} \left[ \hat{\psi}_r(g) \right]$ also blows up. The key to keeping $\mathrm{AMSE} \left[ \hat{\psi}_r(g) \right]$ small is therefore to pick a $g$ that balances these two contributions. Following this idea, we ignore the non-dominant term $\frac{2}{n^2 g^{2r + 1}} \psi_0 R(\nabla^r L)$ and the $g$-free term $\frac{4}{n} \left[ \int_{\mathbb{R}} \left[ \nabla^r f(x) \right]^2 f(x) \mathrm{d}x - \psi_r^2 \right]$, and set:
\begin{equation}
\frac{1}{ng^{r + 1}} \nabla^r L(0) + \frac{1}{k!} g^k \mu_k(L) \psi_{r + k} = 0,
\end{equation}
which gives:
\begin{equation}
g_{\mathrm{approx-opt}} = \left[ \frac{k! \nabla^r L(0)}{-\mu_k(L) \psi_{r + k} n} \right]^{1/(r + k + 1)}.
\end{equation}
Substituting $g_{\mathrm{approx-opt}}$ back into $(51)$ and keeping only the dominant term in $n$ (all lower-order terms omitted):
\begin{equation}
\inf\limits_{g > 0} \mathrm{AMSE} \sim \begin{cases}
2 R(\nabla^r L) \psi_0 \left[ \frac{\mu_k(L) \psi_{r + k}}{-\nabla^r L(0) k!} \right]^{(2r + 1) / (r + k + 1)} \cdot n^{-(2k + 1) / (r + k + 1)}, & k < r \\
\frac{4}{n} \left[ \int_{\mathbb{R}} \left[ \nabla^r f(x) \right]^2 f(x) \mathrm{d}x - \psi_r^2 \right] = \frac{4}{n} \mathrm{Var} \left[ \nabla^r f(X) \right], & k > r \\
\text{the sum of the above two terms}, & k = r.
\end{cases}
\end{equation}
In practice we almost always use second-order kernels, so from now on we restrict attention to $k = 2$. Substituting $k = 2$ gives:
\begin{equation}
g_{\mathrm{approx-opt}} = \left[ \frac{2 \nabla^r L(0)}{-\mu_2(L) \psi_{r + 2} n} \right]^{1/(r + 3)}.
\end{equation}
Returning to the original problem, our goal is the optimal KDE bandwidth:
\begin{equation}
h_{\mathrm{asym-opt}} = \left[ \frac{R(K)}{\mu_2^2(K) \psi_4 n} \right]^{1/5}.
\end{equation}
Replacing $\psi_4$ by its kernel estimate $\hat{\psi}_4(g)$ gives the plug-in bandwidth:
\begin{equation}
h_{\mathrm{plug-in}} = \left[ \frac{R(K)}{\mu_2^2(K) \hat{\psi}_4 n} \right]^{1/5}.
\end{equation}
However, computing $\hat{\psi}_4(g)$ requires $g_{\mathrm{approx-opt}, 4}$, which by the formula above depends on $\psi_6$, and so on: estimating $\psi_r$ always depends on $\psi_{r + 2}$, leading to an infinite regress. The standard fix is to choose a cutoff $\mathscr{l} \in \mathbb{N}_+$, stop once $\psi_{2\mathscr{l} + 4}$ (or, more generally, $\psi_{k\mathscr{l} + 4}$) is reached, and estimate that quantity by a different method (its derivation is deferred to the end of this section):
\begin{equation}
\hat{\psi}_{\mathrm{NS}, r} = \frac{(-1)^{r / 2} r!}{(2\sigma)^{r + 1}(r / 2)! \pi^{1 / 2}}.
\end{equation}
For example, with $\mathscr{l} = 2$ we first set $\hat{\psi}_8 = \hat{\psi}_{\mathrm{NS}, 8}$, use $\hat{\psi}_8$ to compute $g_{\mathrm{approx-opt}, 6}$ and hence $\hat{\psi}_6$, then use $\hat{\psi}_6$ to compute $g_{\mathrm{approx-opt}, 4}$ and hence $\hat{\psi}_4$, and finally obtain $h_{\mathrm{plug-in}, 2}$. The whole procedure can be summarized (for $k = 2$) as:
\begin{align}
h_{\mathrm{plug-in}, \mathscr{l}} &= \left[ \frac{R(K)}{\mu_2^2(K) \hat{\psi}_4 n} \right]^{1/5}, \\
\hat{\psi}_{2s + 4} &= \frac{1}{n} \sum\limits_{j = 1}^{n} \nabla^{2s + 4} \hat{f}_{g_{2s + 4}} \left( x^{(j)} \right), \\
\hat{\psi}_{2\mathscr{l} + 4} &= \frac{(-1)^{\mathscr{l} + 2} (2\mathscr{l} + 4)!}{(2\sigma)^{2\mathscr{l} + 5}(\mathscr{l} + 2)! \pi^{1 / 2}}, \\
g_{2s + 4} &= \left[ \frac{2 \nabla^{2s + 4} L(0)}{-\mu_2(L) \hat{\psi}_{2s + 6} n} \right]^{1/(2s + 7)}, \\
s &= 0, 1, 2, \dots, \mathscr{l} - 1. \nonumber
\end{align}
The method above is called the $\mathscr{l}$-stage plug-in method. There is no generally accepted objective rule for choosing $\mathscr{l}$; in practice $\mathscr{l} = 2$ is usually sufficient.
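For reference, a sketch of the two-stage ($\mathscr{l} = 2$) procedure with Gaussian $K$ and Gaussian $L$ might look as follows (NumPy assumed); it is only an illustration under those assumptions, not a definitive implementation. It uses $\varphi^{(r)}(u) = (-1)^r \mathrm{He}_r(u)\,\varphi(u)$ for the derivatives of the Gaussian pilot kernel, so that $\nabla^r L_g(t) = g^{-(r+1)} \varphi^{(r)}(t/g)$.

```python
import numpy as np

def phi_deriv(u, r):
    """r-th derivative of the standard normal density (r = 4 or 6 only),
    via phi^(r)(u) = (-1)^r He_r(u) phi(u)."""
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    He = {4: u**4 - 6 * u**2 + 3,
          6: u**6 - 15 * u**4 + 45 * u**2 - 15}[r]
    return (-1) ** r * He * phi

def psi_hat(x, r, g):
    """Kernel estimator of psi_r with Gaussian L and pilot bandwidth g."""
    d = (x[:, None] - x[None, :]) / g
    return phi_deriv(d, r).sum() / (len(x) ** 2 * g ** (r + 1))

def plug_in_bandwidth(x):
    """Two-stage plug-in bandwidth (k = 2, l = 2), Gaussian K and L assumed."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sigma = min(np.std(x, ddof=1),
                (np.quantile(x, 0.75) - np.quantile(x, 0.25)) / 1.349)
    # Normal-scale start: psi_NS,8 = 8! / ((2 sigma)^9 4! sqrt(pi)) = 105 / (32 sqrt(pi) sigma^9).
    psi8 = 105.0 / (32.0 * np.sqrt(np.pi) * sigma ** 9)
    # Stage 1: pilot bandwidth g_6, then psi_hat_6 (expected to be negative).
    g6 = (2.0 * phi_deriv(0.0, 6) / (-psi8 * n)) ** (1.0 / 9.0)
    psi6 = psi_hat(x, 6, g6)
    # Stage 2: pilot bandwidth g_4, then psi_hat_4 (expected to be positive).
    g4 = (2.0 * phi_deriv(0.0, 4) / (-psi6 * n)) ** (1.0 / 7.0)
    psi4 = psi_hat(x, 4, g4)
    # Final bandwidth: h = (R(K) / (mu_2(K)^2 psi_hat_4 n))^(1/5), Gaussian K, mu_2(K) = 1.
    RK = 1.0 / (2.0 * np.sqrt(np.pi))
    return (RK / (psi4 * n)) ** 0.2
```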
We now derive formula $(58)$. Assume $f$ is the normal density $\mathcal{N}(\mu, \sigma^2)$; then:
\begin{equation}
\nabla^r f(x) = f(x) \cdot \frac{(-1)^r}{\sigma^r} \mathrm{He}_r \left( \frac{x - \mu}{\sigma} \right),
\end{equation}
where $\mathrm{He}_r (z)$ is the probabilists' Hermite polynomial (Hermite Polynomial, HP); its background is not covered here. Hence:
\begin{align}
\psi_r = \int_{\mathbb{R}} \nabla^r f(x) f(x) \mathrm{d}x &= \int_{\mathbb{R}} \left[ f(x) \cdot \frac{(-1)^r}{\sigma^r} \mathrm{He}_r \left( \frac{x - \mu}{\sigma} \right) \right] f(x) \mathrm{d}x \nonumber \\
&= \frac{(-1)^r}{\sigma^r} \int_{\mathbb{R}} f^2(x) \cdot \mathrm{He}_r \left( \frac{x - \mu}{\sigma} \right) \mathrm{d}x.
\end{align}
Let $\varphi$ denote the standard normal density. Substituting $z = \frac{x - \mu}{\sigma}$ into $(64)$ gives:
\begin{align}
\psi_r &= \frac{(-1)^r}{\sigma^{r + 1}} \int_{\mathbb{R}} \varphi^2(z) \cdot \mathrm{He}_r \left( z \right) \mathrm{d}z \nonumber \\
&= \frac{(-1)^r}{2\pi \sigma^{r + 1}} \int_{\mathbb{R}} \mathrm{e}^{-z^2} \mathrm{He}_r \left( z \right) \mathrm{d}z \nonumber \\
&\overset{\text{def}}{=\!=} \frac{(-1)^r}{2\pi \sigma^{r + 1}} I_r.
\end{align}
Using the generating function of the Hermite polynomials:
\begin{equation}
\mathrm{e}^{zt - \frac{t^2}{2}} = \sum\limits_{r = 0}^{\infty} \frac{t^r}{r!} \mathrm{He}_r (z),
\end{equation}
on the one hand:
\begin{align}
\int_{\mathbb{R}} \mathrm{e}^{-z^2} \cdot \mathrm{e}^{zt - \frac{t^2}{2}} \mathrm{d}z &= \mathrm{e}^{-\frac{t^2}{4}} \int_{\mathbb{R}} \mathrm{e}^{-\left(z - \frac{t}{2}\right)^2} \mathrm{d}z \nonumber \\
&= \mathrm{e}^{-\frac{t^2}{4}} \cdot \sqrt{\pi} \nonumber \\
&= \sqrt{\pi} \sum\limits_{r = 0}^{\infty} \frac{(-1)^r}{r!} \left( \frac{t^2}{4} \right)^r \nonumber \\
&= \sqrt{\pi} \sum\limits_{r = 0}^{\infty} \frac{(-1)^r}{4^r r!} t^{2r},
\end{align}
and on the other hand:
\begin{align}
\int_{\mathbb{R}} \mathrm{e}^{-z^2} \cdot \mathrm{e}^{zt - \frac{t^2}{2}} \mathrm{d}z &= \int_{\mathbb{R}} \mathrm{e}^{-z^2} \cdot \sum\limits_{r = 0}^{\infty} \frac{t^r}{r!} \mathrm{He}_r (z) \mathrm{d}z \nonumber \\
&= \sum\limits_{r = 0}^{\infty} \left[ \int_{\mathbb{R}} \mathrm{e}^{-z^2} \mathrm{He}_r (z) \mathrm{d}z \right] \cdot \frac{t^r}{r!} \nonumber \\
&= \sum\limits_{r = 0}^{\infty} \frac{t^r}{r!} I_r.
\end{align}
Comparing the coefficients of $t^r$ in $(67)$ and $(68)$ gives:
\begin{equation}
I_r = \begin{cases}
\frac{(-1)^{r / 2} r!}{2^r (r/2)!} \cdot \sqrt{\pi}, & r \text{ is even} \\
0, & r \text{ is odd}.
\end{cases}
\end{equation}
Hence, for even $r$:
\begin{align}
\psi_r = \frac{(-1)^r}{2\pi \sigma^{r + 1}} I_r = \frac{(-1)^{r / 2} r!}{(2\sigma)^{r + 1}(r / 2)! \pi^{1 / 2}}.
\end{align}
4.5 Leave-One-Out Cross-Validation (LOO-CV)
Leave-one-out cross-validation also uses the $\mathrm{MISE}$ as the criterion for the optimal bandwidth. The difference is that it constructs a $\mathrm{LOOCV}$ criterion that can be evaluated from the data alone.
First, recall the formula for the $\mathrm{MISE}$:
\begin{align}
\mathrm{MISE}(\hat{f}(x)) &= \int_{\mathbb{R}} \mathbb{E}\left[ \left( \hat{f}(x) - f(x) \right)^2 \right] \mathrm{d}x \nonumber \\
&= \mathbb{E}\left[ \int_{\mathbb{R}} \hat{f}^2 (x) \mathrm{d}x \right]
- 2 \mathbb{E}\left[ \int_{\mathbb{R}} \hat{f}(x) f(x) \mathrm{d}x \right]
+ \int_{\mathbb{R}} f^2(x) \mathrm{d}x.
\end{align}
The term $\int_{\mathbb{R}} f^2(x) \mathrm{d}x$ does not depend on $h$, so minimizing the $\mathrm{MISE}$ is equivalent to minimizing $\mathbb{E}\left[ \int_{\mathbb{R}} \hat{f}^2 (x) \mathrm{d}x \right] - 2 \mathbb{E}\left[ \int_{\mathbb{R}} \hat{f}(x) f(x) \mathrm{d}x \right]$. Since $f(x)$ is unknown this cannot be evaluated directly, so we approximate it in the spirit of Monte Carlo.
For each $j = 1, 2, \dots, n$, form the training set $\mathcal{S}_{-j} = \mathcal{S} \setminus \{x^{(j)}\}$ and build from it the estimate $\hat{f}_{h, -j}(x)$:
\begin{equation}
\hat{f}_{h, -j}(x) = \frac{1}{(n - 1)h} \sum\limits_{i = 1, i \not= j}^{n} K \left( \frac{x - x^{(i)}}{h} \right).
\end{equation}
Since $\int_{\mathbb{R}} \hat{f}_h(x) f(x) \mathrm{d}x = \mathbb{E}_X\left[ \hat{f}_h(X) \right]$, replacing the expectation by the sample average, with $x^{(j)}$ excluded from the estimate that is evaluated at $x^{(j)}$, we clearly have:
\begin{equation}
\int_{\mathbb{R}} \hat{f}_h(x) f(x) \mathrm{d}x \approx \frac{1}{n} \sum\limits_{j = 1}^{n} \hat{f}_{h, -j}(x^{(j)}).
\end{equation}
Define the $\mathrm{LOOCV}$ criterion:
\begin{equation}
\mathrm{LOOCV}(h) = \int_{\mathbb{R}} \hat{f}_h^2 (x) \mathrm{d}x - \frac{2}{n} \sum\limits_{j = 1}^{n} \hat{f}_{h, -j}(x^{(j)}).
\end{equation}
It is straightforward to verify that:
\begin{equation}
\mathbb{E} \left[ \mathrm{LOOCV}(h) \right] = \mathrm{MISE}(\hat{f}_h(x)) - R(f),
\end{equation}
so minimizing $\mathrm{LOOCV}(h)$ is equivalent to minimizing $\mathrm{MISE}(\hat{f}(x))$, i.e.:
\begin{equation}
h_{\mathrm{LOOCV}} = \arg \mathop{\min}\limits_{h} \mathrm{LOOCV}\left( h \right).
\end{equation}
Finally, we show how to compute the term $\int_{\mathbb{R}} \hat{f}^2 (x) \mathrm{d}x$ appearing in $\mathrm{LOOCV}$:
\begin{align}
\int_{\mathbb{R}} \hat{f}_h(x)^2 \mathrm{d}x &= \frac{1}{n^2 h^2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \int_{\mathbb{R}} K\left( \frac{x - x^{(i)}}{h} \right) K\left( \frac{x - x^{(j)}}{h} \right) \mathrm{d}x \nonumber \\
&\overset{u = \frac{x - x^{(i)}}{h}}{=\!=\!=\!=} \frac{1}{n^2 h^2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \int_{\mathbb{R}} K(u) K \left( u + \frac{x^{(i)} - x^{(j)}}{h} \right) h \mathrm{d}u \nonumber \\
&= \frac{1}{n^2 h} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (K * K) \left( \frac{x^{(i)} - x^{(j)}}{h} \right).
\end{align}
For the Gaussian kernel, for example, this can be evaluated directly and quickly via its self-convolution formula $(K_{\mathrm{G}} * K_{\mathrm{G}})(t) = \frac{1}{2\sqrt{\pi}} e^{-t^2/4}$.
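To make the formulas above concrete, here is a minimal sketch (not code from the original article) that evaluates $\mathrm{LOOCV}(h)$ for a Gaussian-kernel KDE and picks $h_{\mathrm{LOOCV}}$ by grid search; the sample `x` and the bandwidth grid are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def loocv_score(x, h):
    """LOOCV(h) for a Gaussian-kernel KDE, following the formulas above."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = (x[:, None] - x[None, :]) / h           # pairwise (x_i - x_j) / h
    # First term: integral of f_hat^2 = (1/(n^2 h)) * sum_ij (K*K)((x_i - x_j)/h),
    # using the Gaussian self-convolution (K*K)(t) = exp(-t^2/4) / (2*sqrt(pi)).
    term1 = np.exp(-d**2 / 4.0).sum() / (2.0 * np.sqrt(np.pi) * n**2 * h)
    # Second term: (2/n) * sum_j f_hat_{h,-j}(x_j), the leave-one-out density at each sample.
    K = norm.pdf(d)
    np.fill_diagonal(K, 0.0)                    # exclude the i = j terms
    loo = K.sum(axis=1) / ((n - 1) * h)
    return term1 - 2.0 * loo.mean()

# Hypothetical data and a simple grid search for h_LOOCV.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
grid = np.linspace(0.05, 1.0, 200)
h_loocv = grid[np.argmin([loocv_score(x, h) for h in grid])]
print(h_loocv)
```

Both terms reuse the same pairwise-difference matrix, so a single $O(n^2)$ pass per candidate $h$ suffices.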
4.5 Biased Cross-Validation (BCV)
The core idea of biased cross-validation is to estimate $R(\nabla^2 f)$ by $R(\nabla^2 \hat{f})$. The latter is in fact a biased estimator of the former, so we first compute the expectation of $R(\nabla^2 \hat{f})$:
\begin{align}
\mathbb{E} \left[ R(\nabla^2 \hat{f}) \right] &= \mathbb{E} \left[ \int_{\mathbb{R}} \left[ \nabla^2 \hat{f}(x) \right]^2 \mathrm{d}x \right] \nonumber \\
&= \mathbb{E} \left[ \frac{1}{n^2} \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \int_{\mathbb{R}} \nabla^2 K_h \left( x - x^{(i)} \right) \cdot \nabla^2 K_h \left( x - x^{(j)} \right) \mathrm{d}x \right] \nonumber \\
&= \frac{1}{n^2} \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{E} \left[ \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( x^{(i)} - x^{(j)} \right) \right].
\end{align}
If $i = j$, the argument is $0$, and $\left( \nabla^2 K_h \ast \nabla^2 K_h \right)(0) = \int_{\mathbb{R}} \left[ \nabla^2 K_h(u) \right]^2 \mathrm{d}u = h^{-5} R(\nabla^2 K)$, so $\mathbb{E} \left[ \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( x^{(i)} - x^{(j)} \right) \right] = h^{-5} R(\nabla^2 K)$;
if $i \neq j$, we have:
\begin{align}
\mathbb{E} \left[ \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( x^{(i)} - x^{(j)} \right) \right]
&= \iint_{\mathbb{R}^2} \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( s - t \right) f(s) f(t) \,\mathrm{d}s \,\mathrm{d}t \nonumber \\
&= \int_{\mathbb{R}} \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( u \right) \left( f \ast f \right) \left( u \right) \mathrm{d}u.
\end{align}
Substituting these back, we obtain:
\begin{align}
\mathbb{E} \left[ R(\nabla^2 \hat{f}) \right]
&= \frac{1}{nh^5} R(\nabla^2 K) + \frac{n - 1}{n} \int_{\mathbb{R}} \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( u \right) \left( f \ast f \right) \left( u \right) \mathrm{d}u.
\end{align}
We now show that $\lim\limits_{h \rightarrow 0^+} \int_{\mathbb{R}} \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( u \right) \left( f \ast f \right) \left( u \right) \mathrm{d}u = R(\nabla^2 f)$. By Parseval's theorem and the convolution theorem,
\begin{align}
&\quad \space \int_{\mathbb{R}} \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( u \right) \left( f \ast f \right) \left( u \right) \mathrm{d}u \nonumber \\
&= \int_{\mathbb{R}} \mathcal{F} \left[ \nabla^2 K_h \ast \nabla^2 K_h \right] (t) \cdot \mathcal{F} \left[ f \ast f \right] (t) \mathrm{d}t \nonumber \\
&= \int_{\mathbb{R}} \left[ \mathcal{F} \left[ \nabla^2 K_h \right] (t) \right]^2 \cdot \left[ \mathcal{F} \left[ f \right] (t) \right]^2 \mathrm{d}t \nonumber \\
&= \int_{\mathbb{R}} t^4 \left[ \mathcal{F} \left[ K \right] (ht) \right]^2 \cdot \left[ \mathcal{F} \left[ f \right] (t) \right]^2 \mathrm{d}t.
\end{align}
Since $\mathcal{F} [K] (ht) \rightarrow \mathcal{F} [K](0) = 1$ as $h \rightarrow 0^+$, it follows that:
\begin{align}
\lim_{h \rightarrow 0^+} &\int_{\mathbb{R}} \left( \nabla^2 K_h \ast \nabla^2 K_h \right) \left( u \right) \left( f \ast f \right) \left( u \right) \mathrm{d}u \nonumber \\
&= \int_{\mathbb{R}} t^4 \left[ \mathcal{F} \left[ f \right] (t) \right]^2 \mathrm{d}t \nonumber \\
&= \int_{\mathbb{R}} \left[ \mathcal{F} \left[ \nabla^2 f \right] (t) \right]^2 \mathrm{d}t \nonumber \\
&= \int_{\mathbb{R}} \left[ \nabla^2 f(t) \right]^2 \mathrm{d}t \nonumber \\
&= R(\nabla^2 f).
\end{align}
Therefore:
\begin{align}
\mathbb{E} \left[ R(\nabla^2 \hat{f}) \right]
&= \frac{1}{nh^5} R(\nabla^2 K) + R(\nabla^2 f) + o\!\left(\frac{1}{n}\right).
\end{align}
Hence a reasonable, approximately unbiased estimator of $R(\nabla^2 f)$ is:
\begin{equation}
\widetilde{R(\nabla^2 f)} = R(\nabla^2 \hat{f}) - \frac{1}{nh^5} R(\nabla^2 K).
\end{equation}
In the same spirit as the Silverman method, plugging this estimate into the $\mathrm{AMISE}$ expression gives the biased cross-validation function:
\begin{align}
\mathrm{BCV}(h) &= \frac{1}{4}h^4 \mu_2^2(K) \widetilde{R(\nabla^2 f)} + \frac{R(K)}{nh}, \\
h_{\mathrm{BCV}} &= \arg \mathop{\min}\limits_{h} \mathrm{BCV}\left( h \right).
\end{align}
By analogy with the $\mathrm{AMISE}$-optimal bandwidth formula, one readily obtains (an implicit equation, since the right-hand side still depends on $h$):
\begin{equation}
h_{\mathrm{BCV}} = \left( \frac{R(K)}{\mu_2^2(K) \left[ R(\nabla^2 \hat{f}) - \frac{1}{nh^5} R(\nabla^2 K) \right] n} \right)^{1/5}.
\end{equation}
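As an illustration (not the author's implementation), the sketch below evaluates $\mathrm{BCV}(h)$ for the Gaussian kernel and minimizes it by grid search. For the Gaussian kernel, $\left( \nabla^2 K_h \ast \nabla^2 K_h \right)(t)$ is the fourth derivative of the $N(0, 2h^2)$ density, $R(K) = \frac{1}{2\sqrt{\pi}}$ and $\mu_2(K) = 1$; the sample and the bandwidth grid are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def bcv_score(x, h):
    """BCV(h) for a Gaussian-kernel KDE, following the formulas above."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = np.sqrt(2.0) * h                 # K_h * K_h is the N(0, 2h^2) density
    z = (x[:, None] - x[None, :]) / sigma
    # (Laplacian K_h * Laplacian K_h)(x_i - x_j): 4th derivative of the N(0, 2h^2) density.
    conv2 = (z**4 - 6.0 * z**2 + 3.0) * norm.pdf(z) / sigma**5
    np.fill_diagonal(conv2, 0.0)             # keep only i != j terms
    R_tilde = conv2.sum() / n**2             # = R(Laplacian f_hat) - R(Laplacian K)/(n h^5)
    R_K, mu2 = 1.0 / (2.0 * np.sqrt(np.pi)), 1.0   # R(K), mu_2(K) for the Gaussian kernel
    return 0.25 * h**4 * mu2**2 * R_tilde + R_K / (n * h)

# Hypothetical data and grid search for h_BCV.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
grid = np.linspace(0.05, 1.5, 300)
h_bcv = grid[np.argmin([bcv_score(x, h) for h in grid])]
print(h_bcv)
```

Dropping the diagonal of the double sum implements the bias correction $-\frac{1}{nh^5} R(\nabla^2 K)$ exactly, because each $i = j$ term of $R(\nabla^2 \hat{f})$ contributes $\frac{1}{n^2} h^{-5} R(\nabla^2 K)$.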
4.6 The Intuitive Method
The intuitive method is the "experience-driven" approach to bandwidth selection in kernel density estimation. Its core idea is to choose the bandwidth by visualizing the shape of the density curve and subjectively judging the balance between smoothness and detail. It needs no elaborate derivation or iterative computation and suits settings where high estimation accuracy is not required, the data structure is simple, or a quick exploration is all that is needed.
In practice one specifies a set of candidate bandwidths $\mathcal{H} = \{h_k \mid k = 1, 2, \dots, m\}$, generates the corresponding density estimates, and then selects the preferred bandwidth ("preferred" here meaning best suited for presentation or computation, not theoretically optimal), as sketched below. For example, on small samples both the Silverman method and LOO-CV can have sizable errors; the intuitive method can then be used to settle on a reasonable bandwidth. It is also useful in early exploratory analysis, where density curves under several bandwidths help reveal the overall shape of the distribution.
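A minimal sketch of this workflow, assuming a Gaussian kernel and a hypothetical bimodal sample: plot the density estimate for each candidate bandwidth in $\mathcal{H}$ and choose by eye.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

def kde(grid, x, h):
    """Gaussian-kernel KDE evaluated on `grid` with bandwidth h."""
    return norm.pdf((grid[:, None] - x[None, :]) / h).sum(axis=1) / (x.size * h)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(1.0, 1.0, 150)])  # hypothetical sample
grid = np.linspace(x.min() - 1.0, x.max() + 1.0, 400)

for h in (0.1, 0.3, 0.6, 1.0):        # candidate set H, chosen only for illustration
    plt.plot(grid, kde(grid, x, h), label=f"h = {h}")
plt.legend()
plt.show()
```

Small bandwidths expose spurious bumps, while large ones blur the two modes together; the "best" curve is the one that matches the analyst's purpose.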
5. Choosing the Kernel Function
5.1 The Epanechnikov Kernel: the Theoretically Optimal Kernel
Before discussing the optimal kernel, we impose two additional restrictions on the kernel function:
The kernel $K$ must have compact support, i.e. $\mathrm{supp}(K) = \overline{K^{-1}(\mathbb{R} \setminus \{0\})}$ is bounded; without loss of generality we take $\mathrm{supp}(K) = [-1, 1]$.
The kernel must be continuous.
Kernels that violate these restrictions include the Gaussian kernel and the uniform kernel, which we discuss later. Under the optimal bandwidth, the best kernel among those satisfying both restrictions is the Epanechnikov kernel. As before, we work in the framework of minimizing $\mathrm{AMISE}$.
Earlier we computed the minimum of $\mathrm{AMISE}\left[ \hat{f}(x) \right]$ over the bandwidth:
\begin{equation}
\mathrm{AMISE} \left[ \hat{f}(x) \right]_{\min} = \frac{5}{4} \left( R^4(K) \cdot \mu_2^2(K) \cdot R(\nabla^2 f) \cdot n^{-4} \right)^{1/5}.
\end{equation}
This shows that, under the optimal bandwidth, minimizing $\mathrm{AMISE}$ over the kernel is equivalent to minimizing $R^2(K) \mu_2(K)$. We now solve this problem with the calculus of variations.
Since $K$ has compact support, every integral involving $K$ can be taken over $[-1, 1]$; we keep the same notation. Let $\mu_2(K) = A$. Minimizing $R(K)$ subject to the constraints $\mu_2(K) = A$ and $\int_{[-1, 1]} K(u) \mathrm{d}u = 1$, we form the Lagrangian:
\begin{equation}
\mathcal{L}(K,\lambda_1,\lambda_2)=\int_{[-1, 1]} K^2(u) \mathrm{d}u + \lambda_1 \left( 1 - \int_{[-1, 1]} K(u) \mathrm{d}u \right) + \lambda_2 \left( A - \int_{[-1, 1]} u^2 K(u) \mathrm{d}u \right).
\end{equation}
Taking the variation of $\mathcal{L}$ with respect to $K$:
\begin{equation}
\frac{\delta \mathcal{L}}{\delta K(u)} = 2K(u) - \lambda_1 - \lambda_2 u^2.
\end{equation}
Setting this functional derivative to $0$ and solving gives:
\begin{equation}
K(u) = \frac{1}{2} \lambda_1 + \frac{1}{2} \lambda_2 u^2.
\end{equation}
Imposing the boundary constraint $K(-1) = K(1) = 0$ (continuity at the edge of the support) gives:
\begin{equation}
\lambda_1 = -\lambda_2.
\end{equation}
Combining this with the normalization of $K$, we have:
\begin{align}
\int_{[-1, 1]} K(u) \mathrm{d}u = \frac{\lambda_1}{2} \int_{[-1, 1]} \left( 1 - u^2 \right) \mathrm{d}u = 1, \\
\implies \lambda_1 = \frac{3}{2}, \quad K(u) = \frac{3}{4} \left( 1 - u^2 \right).
\end{align}
The kernel $K(u)$ above is exactly the Epanechnikov kernel.
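As a quick numerical sanity check (an illustration, not part of the derivation above), the snippet below estimates $R(K)$, $\mu_2(K)$ and the kernel-dependent factor $R^2(K)\mu_2(K)$ for several common kernels on a fine grid; the Epanechnikov kernel attains the smallest value.

```python
import numpy as np

kernels = {
    "Epanechnikov": lambda u: np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0),
    "Gaussian":     lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi),
    "Uniform":      lambda u: np.where(np.abs(u) < 1, 0.5, 0.0),
    "Triangular":   lambda u: np.where(np.abs(u) < 1, 1 - np.abs(u), 0.0),
}

u = np.linspace(-8, 8, 400001)            # fine grid, wide enough for the Gaussian tails
for name, K in kernels.items():
    Ku = K(u)
    R = np.trapz(Ku**2, u)                # R(K) = integral of K(u)^2
    mu2 = np.trapz(u**2 * Ku, u)          # mu_2(K) = integral of u^2 K(u)
    print(f"{name:12s}  R(K) = {R:.4f}  mu2(K) = {mu2:.4f}  R^2 * mu2 = {R**2 * mu2:.5f}")
```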
5.2 The Gaussian Kernel: the Optimal Kernel in Practice
Because it suits the needs of most practical scenarios, the Gaussian kernel is widely regarded as the "optimal kernel in practice".
We first discuss the effective region of the Gaussian kernel. Although it does not have compact support, for any $h > 0$:
when $\vert x \vert > 3h$, $\exp\left(-\frac{x^2}{2h^2}\right) < \exp\left(-\frac{9}{2}\right) \approx 0.011$, i.e. the kernel has fallen to roughly 1% of its peak value;
when $\vert x \vert > 5h$, $\exp\left(-\frac{x^2}{2h^2}\right) < \exp\left(-\frac{25}{2}\right) \approx 3 \times 10^{-6}$, a contribution that is essentially negligible.
That is, the effective region of the Gaussian kernel is $[-5h, 5h]$: in practical computation the weights of samples outside this region can simply be treated as $0$, so its computational cost is essentially the same as that of a compactly supported kernel (a small numerical check appears at the end of this subsection). The Gaussian kernel's wide adoption also rests on three key properties:
Infinite smoothness: $K_{\mathrm{G}} \in C^{\infty}(\mathbb{R})$, and the resulting density estimate $\hat{f}$ inherits this property, matching the smoothness of typical true densities. Other kernels (e.g. the Epanechnikov kernel) perform poorly when higher-order derivatives are required (e.g. peak analysis, gradient analysis).
Robustness to noise: the Gaussian weights are continuous and smooth, and samples far from the evaluation point receive extremely small weights, so the density estimate is not easily distorted by a small amount of noise or a few outliers. A compactly supported kernel that happens to include an outlier in its support directly affects the local density estimate, so its robustness is comparatively weaker.
Stability in high dimensions: in high-dimensional data the curse of dimensionality makes samples sparse, and a compactly supported kernel may have too few samples inside its support to estimate the density (possibly none at all, yielding an estimate of $0$). The exponential decay of the Gaussian kernel, combined with bandwidth adjustment, adapts flexibly to this sparsity: even with few local samples, the smooth weighting keeps the density estimate continuous and avoids implausible "zero-density regions".
In addition, the Gaussian kernel enjoys further desirable properties, such as strict positive definiteness, and is a standard choice throughout kernel-based analysis.
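The small check below (an illustration) quantifies the effective-region claim made at the start of this subsection: it prints the kernel value relative to its peak and the total probability mass that a Gaussian kernel with bandwidth $h$ places outside $\pm 3h$ and $\pm 5h$; both ratios are independent of $h$.

```python
import numpy as np
from scipy.stats import norm

for c in (3, 5):
    rel_value = np.exp(-c**2 / 2.0)   # K(c*h) / K(0): kernel value relative to its peak
    tail_mass = 2.0 * norm.sf(c)      # probability mass outside [-c*h, c*h]
    print(f"|x| > {c}h:  K/K(0) = {rel_value:.2e},  tail mass = {tail_mass:.2e}")
```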
5.3 The Uniform Kernel
The uniform kernel is neither the theoretically optimal kernel nor blessed with the Gaussian kernel's nice mathematical properties; indeed, it has plenty of drawbacks. It is nevertheless useful in specific situations, for example when one assumes that all samples falling within a point's neighborhood should contribute with exactly equal weight. It is also favored in low-compute settings (e.g. embedded development, real-time data processing) because it is so cheap to evaluate.
In short, every kernel has its own strengths and weaknesses; in practice the kernel should be chosen to match the characteristics of the application.
Changelog
[2025.09.03]:
Adjusted the order of topics and improved the narrative flow;
Added the sections on the plug-in method and biased cross-validation.