Consider a given dataset $S$ containing $n$ samples:
$$S = \big((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\big), \quad \text{where } y_i \in \{-1, +1\}$$
We classify the samples with a hypothesis $h$ taken from a model class $\mathcal{H}$ (for example, $\mathcal{H}$ could be a particular neural network architecture):
$$h: x \mapsto h(x) \in \{-1, +1\}$$
Under the 0-1 loss, the empirical error is
$$
\begin{aligned}
\frac{1}{n} \sum_{i=1}^n \mathbf{1}\big[h(x_i) \neq y_i \big]
&= \frac{1}{n} \sum_{i=1}^n
\begin{cases}
1, & \text{if } (h(x_i), y_i) = (+1, -1) \text{ or } (-1, +1) \\
0, & \text{if } (h(x_i), y_i) = (+1, +1) \text{ or } (-1, -1)
\end{cases}
\\
&= \frac{1}{n} \sum_{i=1}^n \frac{1 - y_i h(x_i)}{2}
= \frac{1}{2} - \frac{1}{2n}\sum_{i=1}^n y_i h(x_i)
\end{aligned}
$$
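The identity between the 0-1 error and the label/prediction product can be checked numerically. The snippet below is a minimal sketch: the labels and "predictions" are random stand-ins generated for illustration, not the output of any trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
y = rng.choice([-1, 1], size=n)      # labels y_i in {-1, +1}
h_x = rng.choice([-1, 1], size=n)    # stand-in predictions h(x_i) (assumption: random)

# Left-hand side: fraction of misclassified samples (0-1 loss).
zero_one_error = np.mean(h_x != y)

# Right-hand side: 1/2 - (1/2n) * sum_i y_i h(x_i).
correlation = np.mean(y * h_x)
identity_rhs = 0.5 - 0.5 * correlation

print(zero_one_error, identity_rhs)  # the two values coincide
```

Because $y_i h(x_i)$ equals $+1$ on a correct prediction and $-1$ on a mistake, the two quantities agree for any choice of labels and predictions.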
From this formula we can define the correlation between the predictions and the true labels: $\text{correlation} = \frac{1}{n}\sum_{i=1}^n y_i h(x_i)$
The larger the correlation, the smaller the empirical error, so our goal is to find the model that maximizes the correlation: $\sup_{h \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^n y_i h(x_i)$
Moreover, different dichotomies (assignments of labels $y \in \{-1, +1\}^n$ to the $n$ points) are not equally hard to fit, so we take the average of this best achievable correlation over all $2^n$ dichotomies:
$$
\frac{1}{2^n}\sum_{y} \sup_{h \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^n y_i h(x_i)
= \mathbb{E}_y \sup_{h \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^n y_i h(x_i)
$$
At this point we replace the true labels with Rademacher variables, i.e. random sample labels:
$$
\sigma_i =
\begin{cases}
+1, & \text{with probability } 0.5 \\
-1, & \text{with probability } 0.5
\end{cases}
$$
This leads to the Empirical Rademacher Complexity.
Let $\mathfrak{g}$ be a family of general functions mapping from $\mathrm{Z}$ to $[0, 1]$, and let $\sigma_1, \ldots, \sigma_n$ be independent Rademacher variables.
The Empirical Rademacher Complexity of $\mathfrak{g}$ on a size-$n$ sample set $\mathcal{S}_n = \{z_1, z_2, \ldots, z_n\}$ is
$$
\hat{\mathcal{R}}_{\mathcal{S}_n}(\mathfrak{g}) =
\mathbb{E}_\sigma \Big[\sup_{g\in \mathfrak{g}}\frac{1}{n}\sum_{i=1}^n \sigma_i g(z_i) \Big]
$$
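For a finite function family and a small sample, the empirical Rademacher complexity can be computed exactly by enumerating all $2^n$ sign vectors. The sketch below assumes the family is given as a matrix of function values on the sample; the helper name `empirical_rademacher` and the two toy families are hypothetical, introduced only for illustration.

```python
import itertools
import numpy as np

def empirical_rademacher(values):
    """Exact empirical Rademacher complexity of a finite family.

    values: array-like of shape (num_functions, n); row k holds g_k(z_1), ..., g_k(z_n).
    Enumerates all 2^n sign vectors, so it is only practical for small n.
    """
    values = np.asarray(values, dtype=float)
    n = values.shape[1]
    total = 0.0
    for sigma in itertools.product([-1.0, 1.0], repeat=n):
        # sup over the family of (1/n) * sum_i sigma_i * g(z_i)
        total += np.max(values @ np.array(sigma)) / n
    return total / 2 ** n

# A family with a single function cannot fit random signs: complexity is 0.
single = empirical_rademacher([[0.2, 0.7, 0.5]])

# The family of all {0, 1}-valued functions on 3 points can match every sign pattern.
all_binary = empirical_rademacher(list(itertools.product([0.0, 1.0], repeat=3)))

print(single, all_binary)  # approximately 0 and 0.5
```

The two extremes illustrate the interpretation: a richer family correlates better with random labels and therefore has a larger complexity.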
Taking the expectation over different sample sets as well gives the Expected Rademacher Complexity:
$$
\mathcal{R}_{n}(\mathfrak{g}) =
\mathbb{E}_{\mathcal{S}_n\sim D^n}\, \hat{\mathcal{R}}_{\mathcal{S}_n}(\mathfrak{g}) =
\mathbb{E}_{\mathcal{S}_n\sim D^n}\mathbb{E}_\sigma
\Big[\sup_{g\in \mathfrak{g}}\frac{1}{n}\sum_{i=1}^n \sigma_i g(z_i) \Big]
$$
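When the sample distribution can be simulated, the expected Rademacher complexity can be approximated by Monte Carlo: draw a sample $\mathcal{S}_n$, draw signs $\sigma$, take the supremum, and average. The sketch below makes several assumptions for illustration only: $D$ is uniform on $[0, 1)$, the family consists of threshold functions $g_t(z) = \mathbf{1}[z \geq t]$ on a fixed grid of thresholds, and the function name `expected_rademacher_mc` is hypothetical.

```python
import numpy as np

def expected_rademacher_mc(n, thresholds, num_trials=500, seed=0):
    """Monte Carlo estimate of R_n for threshold functions g_t(z) = 1[z >= t].

    Assumption: z_i are i.i.d. Uniform[0, 1). Each trial draws one sample and
    one sign vector, which gives an unbiased estimate of the joint expectation.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_trials):
        z = rng.random(n)                        # S_n ~ D^n
        sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        # values[k, i] = g_{t_k}(z_i), shape (num_thresholds, n)
        values = (z[None, :] >= thresholds[:, None]).astype(float)
        total += np.max(values @ sigma) / n      # sup over the family
    return total / num_trials

thresholds = np.linspace(0.0, 1.0, 11)
small = expected_rademacher_mc(5, thresholds)
large = expected_rademacher_mc(200, thresholds)
print(small, large)  # the estimate shrinks as n grows
```

For this simple family the complexity decays as the sample size increases, which is what makes the bound in the next part informative for large $n$.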
Rademacher Complexity Bound
A generalization error bound based on Rademacher complexity
Theorem: Let $\mathfrak{g}$ be a family of general functions mapping from $\mathrm{Z}$ to $[0, 1]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$ over the draw of the sample $\mathcal{S}_n$, each of the following bounds holds for all $g \in \mathfrak{g}$:
$$
\begin{aligned}
\mathbb{E}_{z\sim D}\big[g(z) \big] &\leq \frac{1}{n}\sum_{i=1}^n g(z_i) + 2\mathcal{R}_n(\mathfrak{g}) + \sqrt{\frac{\log(1/\delta)}{2n}}
\\
\mathbb{E}_{z\sim D}\big[g(z) \big] &\leq \frac{1}{n}\sum_{i=1}^n g(z_i) + 2\hat{\mathcal{R}}_{\mathcal{S}_n}(\mathfrak{g}) + 3\sqrt{\frac{\log(2/\delta)}{2n}}
\end{aligned}
$$
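To get a feel for the size of each term, the two right-hand sides can be evaluated for given inputs. This is a minimal sketch: the function names and the numbers passed in are hypothetical, and the second function uses the confidence term $3\sqrt{\log(2/\delta)/(2n)}$ as in the standard statement of this theorem.

```python
import math

def bound_expected(emp_mean, rad_expected, n, delta):
    """Right-hand side of the bound stated with the expected complexity R_n."""
    return emp_mean + 2.0 * rad_expected + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def bound_empirical(emp_mean, rad_empirical, n, delta):
    """Right-hand side of the bound stated with the empirical complexity."""
    return emp_mean + 2.0 * rad_empirical + 3.0 * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Hypothetical inputs: empirical mean 0.1, complexity 0.05, 95% confidence.
b_small_n = bound_expected(0.1, 0.05, n=100, delta=0.05)
b_large_n = bound_expected(0.1, 0.05, n=1000, delta=0.05)
b_emp = bound_empirical(0.1, 0.05, n=1000, delta=0.05)
print(b_small_n, b_large_n, b_emp)
```

With the complexity held fixed, the confidence term shrinks at rate $1/\sqrt{n}$; the empirical version pays a larger constant in exchange for being computable from the sample alone.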
Detailed derivation:
$$
\text{Let}\quad \Phi(\mathcal{S}) = \sup_{g \in \mathfrak{g}}\big(\mathbb{E}_D[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g]\big)\quad \text{where}\quad \mathcal{S}=(z_1,z_2,\ldots,z_n)
$$
Here $\mathbb{E}_D[g] = \mathbb{E}_{z \sim D}[g(z)]$ is the true expectation and $\hat{\mathbb{E}}_{\mathcal{S}}[g] = \frac{1}{n}\sum_{i=1}^n g(z_i)$ is the empirical mean on $\mathcal{S}$.
Recall McDiarmid's inequality.
Let $x_1, x_2, \ldots, x_m$ be $m$ independent random variables, and suppose that for every $1 \leq i \leq m$ the function $f$ satisfies
$$
\sup_{x_1, \ldots, x_m, x'_i} \big|f(x_1, \ldots, x_m) - f(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_m)\big| \leq c_i
$$
Then for any $\epsilon > 0$,
$$
\begin{aligned}
P\big(f(x_1,\ldots,x_m) - \mathbb{E}(f(x_1,\ldots,x_m)) \geq \epsilon \big) &\leq \exp\Big(\frac{-2\epsilon^2}{\sum_{i=1}^m c_i^2}\Big)
\\
P\big(\big| f(x_1,\ldots,x_m) - \mathbb{E}(f(x_1,\ldots,x_m)) \big| \geq \epsilon \big) &\leq 2\exp\Big(\frac{-2\epsilon^2}{\sum_{i=1}^m c_i^2}\Big)
\end{aligned}
$$
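As a quick sanity check of how the inequality is used, consider the special case where $f$ is the sample mean of variables taking values in $[0, 1]$ (an illustrative choice, not part of the derivation that follows). Changing one $x_i$ moves the mean by at most $1/m$, so $c_i = 1/m$ and the one-sided bound reduces to the familiar Hoeffding form:

```latex
f(x_1,\ldots,x_m) = \frac{1}{m}\sum_{i=1}^m x_i,\quad x_i \in [0,1]
\;\Longrightarrow\; c_i = \frac{1}{m},\qquad
\sum_{i=1}^m c_i^2 = m \cdot \frac{1}{m^2} = \frac{1}{m},
\\
P\big(f - \mathbb{E}f \geq \epsilon\big)
\leq \exp\Big(\frac{-2\epsilon^2}{1/m}\Big) = \exp(-2m\epsilon^2).
```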
To apply McDiarmid's inequality we first have to verify its bounded-difference condition. We therefore introduce $\mathcal{S}' = \{z_1, \ldots, z_i', \ldots, z_n\}$, which differs from $\mathcal{S}$ only in the $i$-th element ($z_i' \neq z_i$).
$$
\begin{aligned}
\Phi(\mathcal{S}) - \Phi(\mathcal{S}')
&= \sup_{g \in \mathfrak{g}}\big(\mathbb{E}_D[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g]\big) -
\sup_{g \in \mathfrak{g}}\big(\mathbb{E}_D[g] - \hat{\mathbb{E}}_{\mathcal{S}'}[g]\big)
\quad (1)
\\
&\leq \sup_{g \in \mathfrak{g}}\big\{\big(\mathbb{E}_D[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g]\big) - \big(\mathbb{E}_D[g] - \hat{\mathbb{E}}_{\mathcal{S}'}[g]\big)\big\}\quad (2)
\\
&= \sup_{g \in \mathfrak{g}}\big\{\hat{\mathbb{E}}_{\mathcal{S}'}[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g]\big\}
= \sup_{g \in \mathfrak{g}} \Big\{\frac{1}{n}\sum_{z\in \mathcal{S}'}g(z) - \frac{1}{n}\sum_{z\in \mathcal{S}}g(z)\Big\}\quad (3)
\\
&= \frac{1}{n}\sup_{g \in \mathfrak{g}}\big\{g(z_i') - g(z_i)\big\}\leq \frac{1}{n}\quad (4)
\end{aligned}
$$
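Step $(2)$ follows from $(1)$ by a property of $\sup$, namely $\sup A - \sup B \leq \sup(A - B)$, which closely resembles the triangle inequality.
Since $\mathcal{S}'$ and $\mathcal{S}$ differ in only one element ($z_i' \neq z_i$) and $g$ takes values in $[0, 1]$, we have $g(z_i') - g(z_i) \leq 1$, which gives $(3) \rightarrow (4)$. Swapping the roles of $\mathcal{S}$ and $\mathcal{S}'$ gives the same bound for $\Phi(\mathcal{S}') - \Phi(\mathcal{S})$, so $|\Phi(\mathcal{S}) - \Phi(\mathcal{S}')| \leq \frac{1}{n}$.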
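The $\sup$ property used in step $(2)$ can be justified in two lines. Writing $A(g)$ and $B(g)$ as shorthand (introduced here only for illustration) for the two expressions inside the suprema:

```latex
\text{For every } g \in \mathfrak{g}:\quad
A(g) = \big(A(g) - B(g)\big) + B(g)
\leq \sup_{g' \in \mathfrak{g}}\big(A(g') - B(g')\big) + \sup_{g' \in \mathfrak{g}} B(g').
\\
\text{Taking the supremum over } g \text{ on the left:}\quad
\sup_{g} A(g) - \sup_{g} B(g) \leq \sup_{g}\big(A(g) - B(g)\big).
```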
The condition of McDiarmid's inequality is now satisfied with $c_i = \frac{1}{n}$, so
$$
P\Big(\Phi(\mathcal{S}) - \mathbb{E}_{\mathcal{S}}\Phi(\mathcal{S})\geq \epsilon \Big)
\leq \exp\Big(- \frac{2\epsilon^2}{\sum_{i=1}^n \frac{1}{n^2}} \Big) = \exp(-2n\epsilon^2)
$$
Set the right-hand side equal to $\delta$, i.e. $\delta = \exp(-2n\epsilon^2)$, and solve for $\epsilon = \sqrt{\frac{\log(1/\delta)}{2n}}$.
Then, with probability at least $1 - \delta$: $\Phi(\mathcal{S}) \leq \mathbb{E}_{\mathcal{S}}[\Phi(\mathcal{S})] + \sqrt{\frac{\log(1/\delta)}{2n}}\quad (*)$. Replacing $\delta$ by $\frac{\delta}{2}$ gives the same statement with probability at least $1 - \frac{\delta}{2}$ and confidence term $\sqrt{\frac{\log(2/\delta)}{2n}}$, which is the form needed for the bound stated in terms of the empirical complexity.
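The algebra behind solving for $\epsilon$ can be written out as a worked step. Here $\delta'$ is a placeholder failure probability introduced for illustration; choosing $\delta' = \delta$ or $\delta' = \delta/2$ yields the $\log(1/\delta)$ and $\log(2/\delta)$ forms respectively.

```latex
\exp(-2n\epsilon^2) = \delta'
\;\Longleftrightarrow\; -2n\epsilon^2 = \log \delta'
\;\Longleftrightarrow\; \epsilon^2 = \frac{\log(1/\delta')}{2n}
\;\Longleftrightarrow\; \epsilon = \sqrt{\frac{\log(1/\delta')}{2n}},
\\
\text{so with probability at least } 1 - \delta':\quad
\Phi(\mathcal{S}) \leq \mathbb{E}_{\mathcal{S}}[\Phi(\mathcal{S})] + \sqrt{\frac{\log(1/\delta')}{2n}}.
```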
Building on $(*)$, we next derive an upper bound on $\mathbb{E}_{\mathcal{S}}[\Phi(\mathcal{S})]$:
$$
\begin{aligned}
\mathbb{E}_{\mathcal{S}}[\Phi(\mathcal{S})]
&= \mathbb{E}_{\mathcal{S}}\Big[\sup_{g \in \mathfrak{g}}\big(\mathbb{E}_D[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g] \big) \Big]
\\
&= \mathbb{E}_{\mathcal{S}}\Big[\sup_{g \in \mathfrak{g}}\big(\mathbb{E}_{\mathcal{S}'} \hat{\mathbb{E}}_{\mathcal{S}'}[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g]\big)\Big] \quad (2)
\\
&= \mathbb{E}_{\mathcal{S}}\Big[\sup_{g \in \mathfrak{g}}\mathbb{E}_{\mathcal{S}'}\big( \hat{\mathbb{E}}_{\mathcal{S}'}[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g]\big)\Big]\quad (3)
\\
&\leq \mathbb{E}_{\mathcal{S},\mathcal{S}'}\Big[\sup_{g \in \mathfrak{g}}\big( \hat{\mathbb{E}}_{\mathcal{S}'}[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g]\big)\Big]\quad (4)
\\
&= \mathbb{E}_{\mathcal{S},\mathcal{S}'}\Big[\sup_{g \in \mathfrak{g}} \frac{1}{n} \sum_{i=1}^n \big( g(z_i') - g(z_i)\big)\Big]\quad (5)
\\
&= \mathbb{E}_{\sigma,\mathcal{S},\mathcal{S}'}\Big[\sup_{g \in \mathfrak{g}} \frac{1}{n} \sum_{i=1}^n \sigma_i\big( g(z_i') - g(z_i)\big)\Big]\quad (6)
\\
&\leq \mathbb{E}_{\sigma,\mathcal{S}'}\Big[\sup_{g \in \mathfrak{g}} \frac{1}{n} \sum_{i=1}^n\sigma_i g(z_i') \Big]
+ \mathbb{E}_{\sigma,\mathcal{S}}\Big[\sup_{g \in \mathfrak{g}} \frac{1}{n} \sum_{i=1}^n \sigma_i g(z_i)\Big]\quad (7)
\\
&= 2\,\mathbb{E}_{\sigma,\mathcal{S}}\Big[\sup_{g \in \mathfrak{g}} \frac{1}{n} \sum_{i=1}^n \sigma_i g(z_i)\Big] = 2\mathcal{R}_n(\mathfrak{g}) \quad (**)
\end{aligned}
$$
Step $(2)$ uses resampling: for a second, independent sample $\mathcal{S}' \sim D^n$ we have $\mathbb{E}_D[g] = \mathbb{E}_{\mathcal{S}'\sim D^n}\hat{\mathbb{E}}_{\mathcal{S}'}[g]$.
Step $(3)$: since $\hat{\mathbb{E}}_{\mathcal{S}}[g]$ does not depend on $\mathcal{S}'$, we have $\mathbb{E}_{\mathcal{S}'} \hat{\mathbb{E}}_{\mathcal{S}}[g] = \hat{\mathbb{E}}_{\mathcal{S}}[g]$, so $\mathbb{E}_{\mathcal{S}'}$ can be pulled out in front of the difference.
Step $(4)$ applies Jensen's inequality to $(3)$: the supremum is a convex function, so $\sup \mathbb{E}[\cdot] \leq \mathbb{E}[\sup \cdot]$.
Step $(5)$ expands the empirical means in $(4)$.
Step $(6)$ introduces the Rademacher variables, with the expectation now also taken over $\sigma$. When $\sigma_i = 1$, the $i$-th term is the same as in $(5)$; when $\sigma_i = -1$, the term corresponds to swapping $z_i$ and $z_i'$. Because $z_i$ and $z_i'$ are i.i.d. and the expectation is taken over both $\mathcal{S}$ and $\mathcal{S}'$, this swap leaves the expectation unchanged.
Step $(7)$ applies the sub-additivity of $\sup$, $\sup(A + B) \leq \sup A + \sup B$, together with the fact that $-\sigma_i$ has the same distribution as $\sigma_i$. The final equality holds because $\mathcal{S}$ and $\mathcal{S}'$ are identically distributed.
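The resulting inequality $\mathbb{E}_{\mathcal{S}}[\Phi(\mathcal{S})] \leq 2\mathcal{R}_n(\mathfrak{g})$ can be verified exactly on a tiny example by enumerating every sample and every sign vector. Everything in the sketch below is a hypothetical toy setup chosen for illustration: $D$ is uniform on three points, the family contains three functions given by the rows of `G`, and $n = 2$.

```python
import itertools
import numpy as np

# Toy setup (assumption): D is uniform on 3 points; row k of G is g_k: Z -> [0, 1].
G = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.2, 0.9, 0.5]])
num_points = G.shape[1]
n = 2
true_mean = G.mean(axis=1)  # E_D[g] for each g under the uniform D

# All samples S in Z^n (equally likely under i.i.d. uniform draws) and all sign vectors.
samples = list(itertools.product(range(num_points), repeat=n))
signs = list(itertools.product([-1.0, 1.0], repeat=n))

# E_S[Phi(S)], where Phi(S) = sup_g (E_D[g] - empirical mean of g on S).
lhs = np.mean([np.max(true_mean - G[:, list(s)].mean(axis=1)) for s in samples])

# R_n = E_S E_sigma sup_g (1/n) * sum_i sigma_i g(z_i).
rad = np.mean([np.max(G[:, list(s)] @ np.array(sig)) / n
               for s in samples for sig in signs])

print(lhs, 2 * rad)  # the first value does not exceed the second
```

Exhaustive enumeration is only feasible for very small $n$ and finite supports, but it confirms the symmetrization bound without any sampling noise.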
Finally, combine $(*)$ and $(**)$. Since $\mathbb{E}_D[g] - \hat{\mathbb{E}}_{\mathcal{S}}[g] \leq \Phi(\mathcal{S})$ for every $g \in \mathfrak{g}$, we obtain:
With probability at least $1 - \delta$, for all $g \in \mathfrak{g}$,
$$
\mathbb{E}_{z\sim D}\big[g(z) \big]\leq \frac{1}{n}\sum_{i=1}^n g(z_i) + 2\mathcal{R}_n(\mathfrak{g}) + \sqrt{\frac{\log(1/\delta)}{2n}}
$$