
# Getting to Grips with Softmax and Cross-Entropy

## 1. Measuring Information: Entropy

### Information content

$I=\log _a\frac{1}{P\left( x \right)}$

• $a=2$: information is measured in bits (the most common choice)
• $a=e$: information is measured in nats
• $a=10$: information is measured in hartleys
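
As a quick illustration of how the base only changes the unit, here is a minimal sketch using Python's standard `math` module. The function name `information_content` is an illustrative assumption, not something from the original post.

```python
import math

def information_content(p, base=2):
    """Self-information log_base(1 / p) of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log(1 / p, base)

p = 0.125  # an event that happens one time in eight
bits = information_content(p, base=2)          # about 3 bits
nats = information_content(p, base=math.e)     # same quantity in nats
hartleys = information_content(p, base=10)     # same quantity in hartleys
print(bits, nats, hartleys)
```

The three results describe the same event; they differ only by the constant factor that converts one logarithm base to another.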

### Average information content (entropy)

$H\left( X \right) =-\sum_i P\left( x_i \right) \log P\left( x_i \right)$

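| $x_i$ | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| $P\left( x_i \right)$ | 0.375 | 0.25 | 0.25 | 0.125 |

$H\left( X \right) =-P\left( x_0 \right) \log _2P\left( x_0 \right) -P\left( x_1 \right) \log _2P\left( x_1 \right) -P\left( x_2 \right) \log _2P\left( x_2 \right) -P\left( x_3 \right) \log _2P\left( x_3 \right)$
$H\left( X \right) =-0.375\log _20.375-0.25\log _20.25-0.25\log _20.25-0.125\log _20.125=1.90564$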
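
The worked entropy value above can be checked with a few lines of Python. This is a minimal sketch; the helper name `entropy` is an illustrative assumption.

```python
import math

def entropy(probs, base=2):
    """Average information content: -sum(p * log(p)) over all outcomes."""
    # Outcomes with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

probs = [0.375, 0.25, 0.25, 0.125]  # distribution from the example above
h = entropy(probs)
print(round(h, 5))  # 1.90564
```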

## 2. Cross-Entropy

$H\left( p,q \right) =\underset{x}{\Sigma}p\left( x \right) \log \left( \frac{1}{q\left( x \right)} \right)$
$\left( \text{image0} \right) \,\, \text{label}=\left[ \begin{array}{c} 1\\ 0\\ 0\\ 0\\ \end{array} \right] \,\,\text{prediction}=\left[ \begin{array}{c} 0.8\\ 0.1\\ 0.1\\ 0\\ \end{array} \right]$
$\left( \text{image2} \right) \,\, \text{label}=\left[ \begin{array}{c} 0\\ 0\\ 0\\ 1\\ \end{array} \right] \,\,\text{prediction}=\left[ \begin{array}{c} 0.7\\ 0.1\\ 0.1\\ 0.1\\ \end{array} \right]$
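Only the entry where the label is 1 contributes to each image's cross-entropy, so summing over both images gives:
$H\left( p,q \right) =-1\log _20.8-1\log _20.1\approx 3.6439$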
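
The same calculation can be sketched in Python. The function name `cross_entropy` and the variable names are illustrative assumptions; the labels and predictions are the ones from the two images above.

```python
import math

def cross_entropy(p, q, base=2):
    """H(p, q) = sum over x of p(x) * log(1 / q(x))."""
    # Terms with p(x) == 0 contribute nothing and are skipped, which also
    # avoids evaluating log(1 / 0) for classes the label rules out.
    return sum(pi * math.log(1 / qi, base) for pi, qi in zip(p, q) if pi > 0)

label0, pred0 = [1, 0, 0, 0], [0.8, 0.1, 0.1, 0.0]
label2, pred2 = [0, 0, 0, 1], [0.7, 0.1, 0.1, 0.1]

h0 = cross_entropy(label0, pred0)  # -log2(0.8)
h2 = cross_entropy(label2, pred2)  # -log2(0.1)
total = h0 + h2
print(round(total, 4))  # 3.6439
```

The confident, correct prediction for image0 costs about 0.32 bits, while the wrong prediction for image2 costs about 3.32 bits, which is why the second term dominates the sum.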

## 3. softmax

$Y_i=\frac{e^{z_i}}{\sum _{j=1}^{n}e^{z_j}}$
Consider the following example, where the input is $z$, the output is $\mathrm{Y}$, and the probabilities in $\mathrm{Y}$ sum to 1.
$z=\left[ \begin{array}{c} \mathrm{z}_1\\ \mathrm{z}_2\\ \mathrm{z}_3\\ \end{array} \right] =\left[ \begin{array}{c} 3\\ 1\\ -3\\ \end{array} \right]$
Step 1: map every value to a positive number by exponentiating it: $z'=\left[ \begin{array}{c} e^3\\ e^1\\ e^{-3}\\ \end{array} \right]$
$\sum_{i=1}^3{e^{z_{\mathrm{i}}}}=\mathrm{e}^3+\mathrm{e}^1+\mathrm{e}^{-3}=22.8536$
Step 2: divide by the sum so that every value lies between 0 and 1 and the values sum to 1: $\mathrm{Y}=\frac{z'}{\sum_{i=1}^3{e^{z_{\mathrm{i}}}}}=\frac{\left[ \begin{array}{c} e^3\\ e^1\\ e^{-3}\\ \end{array} \right]}{22.8536}=\frac{\left[ \begin{array}{c} 20.0855\\ 2.71828\\ 0.0497871\\ \end{array} \right]}{22.8536}\approx \left[ \begin{array}{c} 0.88\\ 0.12\\ 0\\ \end{array} \right]$
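
The two steps can be written directly in Python. This is a minimal sketch that follows the worked example literally; the function name `softmax` is an illustrative choice.

```python
import math

def softmax(z):
    """Exponentiate every input, then divide by the sum of the exponentials."""
    exps = [math.exp(v) for v in z]   # step 1: all values become positive
    total = sum(exps)                 # e^3 + e^1 + e^-3 for the example
    return [e / total for e in exps]  # step 2: values in (0, 1), summing to 1

z = [3, 1, -3]
y = softmax(z)
print([round(v, 2) for v in y])  # [0.88, 0.12, 0.0]
```

For large inputs `math.exp` can overflow, so practical implementations usually subtract `max(z)` from every element first; the result is mathematically unchanged because the common factor cancels in the ratio.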

## 4. Softmax output as the input to cross-entropy

$\mathrm{Y}=\frac{z'}{\sum_{i=1}^3{e^{z_{\mathrm{i}}}}}\approx \left[ \begin{array}{c} 0.88\\ 0.12\\ 0\\ \end{array} \right]$
$\mathrm{label} =\,\,\left[ \begin{array}{c} 1\\ 0\\ 0\\ \end{array} \right]$
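Taking $p=\mathrm{label}$ and $q=\mathrm{Y}$ (with the first entry rounded to 0.88), only the first term contributes:
$H\left( p,q \right) =-1\cdot \log _2\left( 0.88 \right) =0.184425$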
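
Putting the two pieces together, the sketch below feeds the softmax output straight into the cross-entropy. The helper names are illustrative assumptions. Because the code keeps the unrounded probability (about 0.8789) instead of 0.88, it yields roughly 0.1863 rather than the 0.184425 obtained from the rounded value.

```python
import math

def softmax(z):
    """Exponentiate every input, then normalise so the outputs sum to 1."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q, base=2):
    """H(p, q) = sum over x of p(x) * log(1 / q(x)), skipping p(x) == 0."""
    return sum(pi * math.log(1 / qi, base) for pi, qi in zip(p, q) if pi > 0)

z = [3, 1, -3]     # raw scores from the softmax example
label = [1, 0, 0]  # one-hot label: the first class is correct

y = softmax(z)
loss = cross_entropy(label, y)
print(round(loss, 4))  # 0.1863
```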