Scenario 1: Specific Gaussian naive Bayes classifiers and logistic regression
Consider a specific class of Gaussian naive Bayes classifier where:
$y$ is a Boolean variable following a Bernoulli distribution, with parameter $\pi=P(y=1)$ and thus $P(y=0)=1-\pi$.
$x=[x_1,\dots,x_D]^T$, with each feature $x_i$ a continuous random variable. For each $x_i$, $P(x_i|y=k)$ is a Gaussian distribution $N(\mu_{ik},\sigma_i)$. Note that $\sigma_i$ is the standard deviation of the Gaussian distribution, which does not depend on $k$.
For all $i\neq j$, $x_i$ and $x_j$ are conditionally independent given $y$ (hence the "naive" classifier).
Question: show that the form of $P(y|x)$ implied by the above specific class of Gaussian naive Bayes classifiers is precisely the form used by logistic regression (a discriminative classifier).
In brief: for $i\neq j$, $x_i$ and $x_j$ are conditionally independent given $y$; $\sigma$ depends on $i$ but not on $k$.
Approach:
Try to transform the Bayes expression into the general form of logistic regression:
$$P(Y=1|X)=\frac{1}{1+\exp(w_0+\sum_{i=1}^n w_i x_i)},\qquad
P(Y=0|X)=\frac{\exp(w_0+\sum_{i=1}^n w_i x_i)}{1+\exp(w_0+\sum_{i=1}^n w_i x_i)}$$
Answer:
By the assumptions of the specific Gaussian naive Bayes classifier above, together with Bayes' rule:
$$\begin{aligned}
P(Y=1|X)
&=\frac{P(Y=1)P(X|Y=1)}{P(Y=1)P(X|Y=1)+P(Y=0)P(X|Y=0)}\\
&=\frac{1}{1+\frac{P(Y=0)P(X|Y=0)}{P(Y=1)P(X|Y=1)}}\\
&=\frac{1}{1+\exp\left(\ln\frac{P(Y=0)P(X|Y=0)}{P(Y=1)P(X|Y=1)}\right)}
\end{aligned}$$
By the conditional independence of $x_i$ and $x_j$ given $y$, we obtain:
$$\begin{aligned}
P(Y=1|X)
&=\frac{1}{1+\exp\left(\ln\frac{P(Y=0)}{P(Y=1)}+\sum_i\ln\frac{P(x_i|Y=0)}{P(x_i|Y=1)}\right)}\\
&=\frac{1}{1+\exp\left(\ln\frac{1-\pi}{\pi}+\sum_i\ln\frac{P(x_i|Y=0)}{P(x_i|Y=1)}\right)}
\end{aligned}$$
Further, since $P(x_i|Y=y_k)$ follows the Gaussian distribution $N(\mu_{ik},\sigma_i)$:
$$\begin{aligned}
\ln\frac{P(x_i|Y=0)}{P(x_i|Y=1)}
&=\ln\frac{\frac{1}{\sqrt{2\pi\sigma_i^2}}\exp\left(\frac{-(x_i-\mu_{i0})^2}{2\sigma_i^2}\right)}{\frac{1}{\sqrt{2\pi\sigma_i^2}}\exp\left(\frac{-(x_i-\mu_{i1})^2}{2\sigma_i^2}\right)}\\
&=\frac{(x_i-\mu_{i1})^2-(x_i-\mu_{i0})^2}{2\sigma_i^2}\\
&=\frac{2x_i(\mu_{i0}-\mu_{i1})+\mu_{i1}^2-\mu_{i0}^2}{2\sigma_i^2}\\
&=\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}x_i+\frac{\mu_{i1}^2-\mu_{i0}^2}{2\sigma_i^2}
\end{aligned}$$
Then:
$$P(Y=1|X)=\frac{1}{1+\exp\left(\ln\frac{1-\pi}{\pi}+\sum_i\left(\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}x_i+\frac{\mu_{i1}^2-\mu_{i0}^2}{2\sigma_i^2}\right)\right)}$$
This is equivalent to the general form of logistic regression:
$$P(Y=1|X)=\frac{1}{1+\exp(w_0+\sum_{i=1}^n w_i x_i)}$$
where:
$$\begin{aligned}
w_0&=\ln\frac{1-\pi}{\pi}+\sum_i\frac{\mu_{i1}^2-\mu_{i0}^2}{2\sigma_i^2}\\
w_i&=\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}
\end{aligned}$$
So this specific form of Gaussian naive Bayes classifier is precisely the form used by logistic regression.
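The derivation can be sanity-checked numerically. Below is a minimal Python sketch, assuming illustrative parameter values (the specific $\pi$, $\mu_{ik}$, $\sigma_i$ are hypothetical, not taken from the text): the posterior computed directly from Bayes' rule coincides with the logistic form using the weights $w_0$, $w_i$ derived above.

```python
import numpy as np

# Hypothetical parameters for a 2-feature example (illustrative only).
pi = 0.3                      # P(y = 1)
mu0 = np.array([0.0, 1.0])    # class-0 means mu_{i0}
mu1 = np.array([2.0, -1.0])   # class-1 means mu_{i1}
sigma = np.array([1.0, 0.5])  # shared per-feature std sigma_i

def gauss(x, mu, s):
    """Univariate Gaussian density, applied feature-wise."""
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / np.sqrt(2 * np.pi * s ** 2)

def posterior_bayes(x):
    """P(Y=1 | x) via Bayes' rule with the naive factorization."""
    p1 = pi * np.prod(gauss(x, mu1, sigma))
    p0 = (1 - pi) * np.prod(gauss(x, mu0, sigma))
    return p1 / (p0 + p1)

# Logistic-regression weights from the derivation above.
w0 = np.log((1 - pi) / pi) + np.sum((mu1 ** 2 - mu0 ** 2) / (2 * sigma ** 2))
w = (mu0 - mu1) / sigma ** 2

def posterior_logistic(x):
    return 1.0 / (1.0 + np.exp(w0 + w @ x))

x = np.array([0.7, -0.2])
assert np.isclose(posterior_bayes(x), posterior_logistic(x))
```

The two functions agree for any $x$, since the algebra above is an identity rather than an approximation.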
Scenario 2: General Gaussian naive Bayes classifiers and logistic regression
Remove the assumption that the standard deviation $\sigma_i$ of $P(x_i|y=k)$ does not depend on $k$. That is, for each $x_i$, $P(x_i|y=k)$ is a Gaussian distribution $N(\mu_{ik},\sigma_{ik})$, where $i=1,\dots,D$ and $k=0,1$.
Question: is the new form of $P(y|x)$ implied by this more general Gaussian naive Bayes classifier still the form used by logistic regression? Derive the new form of $P(y|x)$ to prove your answer.
In brief: for $i\neq j$, $x_i$ and $x_j$ are conditionally independent given $y$; $\sigma$ depends on both $i$ and $k$.
Answer:
From the previous scenario:
$$P(Y=1|X)=\frac{1}{1+\exp\left(\ln\frac{1-\pi}{\pi}+\sum_i\ln\frac{P(x_i|Y=0)}{P(x_i|Y=1)}\right)}$$
Further, since $P(x_i|Y=y_k)$ follows the Gaussian distribution $N(\mu_{ik},\sigma_{ik})$:
$$\begin{aligned}
\ln\frac{P(x_i|Y=0)}{P(x_i|Y=1)}
&=\ln\frac{\frac{1}{\sqrt{2\pi\sigma_{i0}^2}}\exp\left(\frac{-(x_i-\mu_{i0})^2}{2\sigma_{i0}^2}\right)}{\frac{1}{\sqrt{2\pi\sigma_{i1}^2}}\exp\left(\frac{-(x_i-\mu_{i1})^2}{2\sigma_{i1}^2}\right)}\\
&=\ln\frac{1}{\sqrt{2\pi\sigma_{i0}^2}}-\frac{(x_i-\mu_{i0})^2}{2\sigma_{i0}^2}-\ln\frac{1}{\sqrt{2\pi\sigma_{i1}^2}}+\frac{(x_i-\mu_{i1})^2}{2\sigma_{i1}^2}\\
&=\ln\frac{\sigma_{i1}}{\sigma_{i0}}+\frac{(x_i-\mu_{i1})^2}{2\sigma_{i1}^2}-\frac{(x_i-\mu_{i0})^2}{2\sigma_{i0}^2}\\
&=\ln\frac{\sigma_{i1}}{\sigma_{i0}}+\frac{\sigma_{i0}^2\mu_{i1}^2-\sigma_{i1}^2\mu_{i0}^2}{2\sigma_{i0}^2\sigma_{i1}^2}+\frac{\mu_{i0}\sigma_{i1}^2-\mu_{i1}\sigma_{i0}^2}{\sigma_{i0}^2\sigma_{i1}^2}x_i+\frac{\sigma_{i0}^2-\sigma_{i1}^2}{2\sigma_{i0}^2\sigma_{i1}^2}x_i^2
\end{aligned}$$
This expression contains an $x_i^2$ term, so the general Gaussian naive Bayes classifier is not the form used by logistic regression (the quadratic term vanishes only when $\sigma_{i0}=\sigma_{i1}$).
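The same conclusion can be checked numerically. A minimal sketch, assuming hypothetical one-feature parameters: the central second difference of the log-ratio in $x_i$ is nonzero exactly when the class-conditional variances differ, and it matches the $x_i^2$ coefficient derived above.

```python
import numpy as np

# Hypothetical one-feature parameters (illustrative only).
mu0, mu1 = 0.0, 1.0
s0, s1 = 1.0, 2.0  # class-dependent stds: sigma_{i0} != sigma_{i1}

def log_ratio(x, s0, s1):
    """ln P(x|Y=0)/P(x|Y=1) for class-conditional Gaussians."""
    def logpdf(x, mu, s):
        return -0.5 * np.log(2 * np.pi * s ** 2) - (x - mu) ** 2 / (2 * s ** 2)
    return logpdf(x, mu0, s0) - logpdf(x, mu1, s1)

# The central second difference of a linear function is 0; for a
# quadratic a*x^2 + b*x + c it equals exactly 2*a*h^2.
h = 0.1
sec = log_ratio(1 + h, s0, s1) - 2 * log_ratio(1.0, s0, s1) + log_ratio(1 - h, s0, s1)
a = (s0 ** 2 - s1 ** 2) / (2 * s0 ** 2 * s1 ** 2)  # derived x^2 coefficient
assert np.isclose(sec, 2 * a * h ** 2)

# With equal variances the quadratic term vanishes: the form is linear again.
sec_eq = log_ratio(1 + h, s0, s0) - 2 * log_ratio(1.0, s0, s0) + log_ratio(1 - h, s0, s0)
assert np.isclose(sec_eq, 0.0)
```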
Scenario 3: Gaussian Bayes classifiers and logistic regression
Now consider the following assumptions for our Gaussian Bayes classifiers (without "naive").
$y$ is a Boolean variable following a Bernoulli distribution, with parameter $\pi=P(y=1)$ and thus $P(y=0)=1-\pi$.
$x=[x_1,x_2]^T$, i.e., we only consider two features for each sample, with each feature a continuous random variable. $x_1$ and $x_2$ are not conditionally independent given $y$. We assume $P(x_1,x_2|y=k)$ is a bivariate Gaussian distribution $N(\mu_{1k},\mu_{2k},\sigma_1,\sigma_2,\rho)$, where $\mu_{1k}$ and $\mu_{2k}$ are the means of $x_1$ and $x_2$. The density of the bivariate Gaussian distribution is:
$$P(x_1,x_2|y=k)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{\sigma_2^2(x_1-\mu_{1k})^2+\sigma_1^2(x_2-\mu_{2k})^2-2\rho\sigma_1\sigma_2(x_1-\mu_{1k})(x_2-\mu_{2k})}{2(1-\rho^2)\sigma_1^2\sigma_2^2}\right]$$
Question: is the form of $P(y|x)$ implied by such not-so-naive Gaussian Bayes classifiers still the form used by logistic regression? Derive the form of $P(y|x)$ to prove your answer.
In brief: $x_1$ and $x_2$ are not conditionally independent given $y$; $\sigma$ depends on $i$ but not on $k$.
Answer:
By the assumptions of the non-naive Gaussian Bayes classifier above, together with Bayes' rule:
$$P(Y=1|X)=\frac{1}{1+\exp\left(\ln\frac{1-\pi}{\pi}+\ln\frac{P(X|Y=0)}{P(X|Y=1)}\right)}$$
where $X=[x_1,x_2]^T$ follows the bivariate Gaussian distribution:
$$P(x_1,x_2|Y=k)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{\sigma_2^2(x_1-\mu_{1k})^2+\sigma_1^2(x_2-\mu_{2k})^2-2\rho\sigma_1\sigma_2(x_1-\mu_{1k})(x_2-\mu_{2k})}{2(1-\rho^2)\sigma_1^2\sigma_2^2}\right]$$
Then, expanding the ratio (the $x_1^2$, $x_2^2$, and $x_1x_2$ terms cancel because $\sigma_1$, $\sigma_2$, and $\rho$ do not depend on $k$):
$$\begin{aligned}
\ln\frac{P(X|Y=0)}{P(X|Y=1)}
&=\ln\frac{P(x_1,x_2|Y=0)}{P(x_1,x_2|Y=1)}\\
&=\left[\frac{\mu_{10}-\mu_{11}}{(1-\rho^2)\sigma_1^2}+\frac{\rho(\mu_{21}-\mu_{20})}{(1-\rho^2)\sigma_1\sigma_2}\right]x_1\\
&\quad+\left[\frac{\mu_{20}-\mu_{21}}{(1-\rho^2)\sigma_2^2}+\frac{\rho(\mu_{11}-\mu_{10})}{(1-\rho^2)\sigma_1\sigma_2}\right]x_2\\
&\quad+\left[\frac{\mu_{11}^2-\mu_{10}^2}{2(1-\rho^2)\sigma_1^2}+\frac{\mu_{21}^2-\mu_{20}^2}{2(1-\rho^2)\sigma_2^2}+\frac{\rho(\mu_{10}\mu_{20}-\mu_{11}\mu_{21})}{(1-\rho^2)\sigma_1\sigma_2}\right]
\end{aligned}$$
This is equivalent to the general form of logistic regression:
$$P(Y=1|X)=\frac{1}{1+\exp(w_0+\sum_{i=1}^2 w_i x_i)}$$
where:
$$\begin{aligned}
w_0&=\ln\frac{1-\pi}{\pi}+\left[\frac{\mu_{11}^2-\mu_{10}^2}{2(1-\rho^2)\sigma_1^2}+\frac{\mu_{21}^2-\mu_{20}^2}{2(1-\rho^2)\sigma_2^2}+\frac{\rho(\mu_{10}\mu_{20}-\mu_{11}\mu_{21})}{(1-\rho^2)\sigma_1\sigma_2}\right]\\
w_1&=\frac{\mu_{10}-\mu_{11}}{(1-\rho^2)\sigma_1^2}+\frac{\rho(\mu_{21}-\mu_{20})}{(1-\rho^2)\sigma_1\sigma_2}\\
w_2&=\frac{\mu_{20}-\mu_{21}}{(1-\rho^2)\sigma_2^2}+\frac{\rho(\mu_{11}-\mu_{10})}{(1-\rho^2)\sigma_1\sigma_2}
\end{aligned}$$
So the non-naive Gaussian Bayes classifier (with class-independent covariance) is also precisely the form used by logistic regression.
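As before, the result can be sanity-checked numerically. A minimal sketch assuming illustrative values for $\pi$, the means, $\sigma_1$, $\sigma_2$, and $\rho$ (all hypothetical): the posterior from Bayes' rule with the shared-covariance bivariate density equals the logistic form with the weights $w_0$, $w_1$, $w_2$ above.

```python
import numpy as np

# Hypothetical shared-covariance parameters (illustrative only).
pi = 0.4
rho, s1, s2 = 0.5, 1.0, 2.0
m0 = np.array([0.0, 1.0])   # [mu_10, mu_20]
m1 = np.array([1.5, -0.5])  # [mu_11, mu_21]

def biv_pdf(x, m):
    """Bivariate Gaussian density with shared sigma_1, sigma_2, rho."""
    z = (s2**2 * (x[0] - m[0])**2 + s1**2 * (x[1] - m[1])**2
         - 2 * rho * s1 * s2 * (x[0] - m[0]) * (x[1] - m[1]))
    return np.exp(-z / (2 * (1 - rho**2) * s1**2 * s2**2)) / (
        2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

def posterior_bayes(x):
    """P(Y=1 | x) via Bayes' rule with the bivariate densities."""
    p1 = pi * biv_pdf(x, m1)
    p0 = (1 - pi) * biv_pdf(x, m0)
    return p1 / (p0 + p1)

# Weights from the derivation above (c = 1 - rho^2 for brevity).
c = 1 - rho**2
w0 = (np.log((1 - pi) / pi)
      + (m1[0]**2 - m0[0]**2) / (2 * c * s1**2)
      + (m1[1]**2 - m0[1]**2) / (2 * c * s2**2)
      + rho * (m0[0] * m0[1] - m1[0] * m1[1]) / (c * s1 * s2))
w1 = (m0[0] - m1[0]) / (c * s1**2) + rho * (m1[1] - m0[1]) / (c * s1 * s2)
w2 = (m0[1] - m1[1]) / (c * s2**2) + rho * (m1[0] - m0[0]) / (c * s1 * s2)

x = np.array([0.3, 0.8])
assert np.isclose(posterior_bayes(x), 1 / (1 + np.exp(w0 + w1 * x[0] + w2 * x[1])))
```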