Basic Activation Functions and Their Implementations

Notes:

First published: 2024-10-31
References:
- insidelearningmachines.com/neural_netw…
- stackoverflow.com/questions/4…

Neuron

A single neuron with three inputs computes:

$$z = w_1x_1 + w_2x_2 + w_3x_3 + b$$

In vector form:

$$z = \vec{w}^T\vec{x} + b$$

$\vec{x}=\left[\begin{array}{l}x_1 \\ x_2 \\ x_3\end{array}\right]$ : input
$\vec{w}=\left[\begin{array}{l}w_1 \\ w_2 \\ w_3\end{array}\right]$ : weights
$b$ : bias

With an activation function $f$ applied to the pre-activation $z$, the output $y$ is:

$$y = f(z)$$
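As a small illustration of the two steps above (weighted sum, then activation), here is a minimal sketch. The helper name `neuron` and the numeric values are assumptions chosen for the example, not part of the original text.

```python
import numpy as np

def neuron(x: np.ndarray, w: np.ndarray, b: float, f) -> float:
    """Hypothetical helper: compute y = f(w^T x + b) for one neuron."""
    z = np.dot(w, x) + b  # pre-activation: weighted sum plus bias
    return f(z)           # activation applied to z

# Illustrative values (assumed for this example)
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
b = 1.0

# z = 0.5*1 + (-1.0)*2 + 0.25*3 + 1.0 = 0.25
z_only = neuron(x, w, b, f=lambda z: z)          # identity activation returns z itself
y_relu = neuron(x, w, b, f=lambda z: max(z, 0))  # any activation from this post can be passed as f
```

Passing the activation as an argument keeps the weighted sum and the choice of $f$ separate, which mirrors how the rest of this post treats them.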

Activation Functions

Binary Threshold

$$y= \begin{cases} 1 & z \geq k \\ 0 & z<k \end{cases}$$

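import numpy as np

def binary(z: np.ndarray, k: float) -> np.ndarray:
    """
    Function to execute the binary threshold activation
    
    Inputs:
        z : input dot product w*x + b
        k : threshold
    Output:
        y : determined activation (1.0 where z >= k, else 0.0)
    """
    # The comparison yields booleans; cast them to 0.0 / 1.0
    return (z >= k).astype(float)

z = np.linspace(-4,4,num=100)

# The threshold k = 0 is chosen here for illustration
y = binary(z, k=0.0)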

Sigmoid

$$y=\frac{1}{1+e^{-z}}$$

If $z$ is a large positive number, $e^{-z}$ approaches 0, so $y$ approaches 1.
If $z$ is a large negative number, $e^{-z}$ grows without bound, so $y$ approaches 0.
If $z=0$, then $e^{-z}=1$, so $y = \frac{1}{2}$.

def sigmoid(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the sigmoid activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return 1/(1+np.exp(-z))
    
z = np.linspace(-5,5,num=100)
y = sigmoid(z)

The sigmoid activation is commonly used for binary classification problems, where its output in $(0, 1)$ can be read as a probability.
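The three limiting cases listed above can be checked numerically. The sketch below is one possible way to do it. The name `stable_sigmoid` is hypothetical, and the rewrite for negative $z$ (multiplying numerator and denominator by $e^{z}$) is an addition not found in the original post. It is used so that only non-positive exponents are ever evaluated and `np.exp` cannot overflow.

```python
import numpy as np

def stable_sigmoid(z: np.ndarray) -> np.ndarray:
    """Hypothetical variant of sigmoid that never exponentiates a positive number."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1]
    # z >= 0: 1 / (1 + e^{-z})      (e equals e^{-z})
    # z <  0: e^{z} / (1 + e^{z})   (e equals e^{z})
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

# Very negative, zero, and very positive inputs
z = np.array([-1000.0, 0.0, 1000.0])
y = stable_sigmoid(z)  # tends to 0, equals 0.5, tends to 1
```

For moderate inputs this agrees with the direct formula. The difference only matters when $|z|$ is large enough for $e^{-z}$ to overflow.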

Softmax

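The softmax activation is suited to multiclass classification problems.

If there are $k$ output classes:

$$y_i=\frac{e^{z_i}}{\sum_{j=1}^k e^{z_j}}$$

Softmax is a smooth approximation of the argmax function (more precisely, of its one-hot encoding).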

def softmax(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the softmax activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return np.exp(z)/np.sum(np.exp(z))
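To make the "smooth approximation of argmax" remark concrete, the sketch below scales the inputs by a growing factor and compares the result with a one-hot argmax vector. Two things here are assumptions that go beyond the original post: the name `softmax_shifted`, and the subtraction of `np.max(z)` before exponentiating. The shift leaves the result unchanged because it cancels between numerator and denominator, and it keeps `np.exp` from overflowing on large inputs.

```python
import numpy as np

def softmax_shifted(z: np.ndarray) -> np.ndarray:
    """Hypothetical softmax for a 1-D vector, shifted by max(z) for numerical safety."""
    e = np.exp(z - np.max(z))  # largest exponent is 0, so no overflow
    return e / np.sum(e)

z = np.array([1.0, 2.0, 3.0])

# One-hot encoding of argmax(z), for comparison
one_hot = np.eye(len(z))[np.argmax(z)]

# Larger scale factors push softmax toward the one-hot vector
results = {scale: softmax_shifted(scale * z) for scale in (1.0, 10.0, 100.0)}
gap = {scale: np.max(np.abs(y - one_hot)) for scale, y in results.items()}
```

The entries of `gap` shrink as the scale grows, which is the sense in which softmax approaches argmax.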

ReLU

$$y= \begin{cases}z & z \geq 0 \\ 0 & z<0\end{cases}$$

def relu(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the ReLU activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return np.where(z>=0,z,0)

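For $z < 0$, both the output $y$ and the gradient are 0 (at $z = 0$ the derivative is undefined and is usually taken to be 0). A neuron whose inputs always fall in this region receives no gradient, so its weights stop updating during training.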
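The gradient behaviour described above can be written down directly. This is a minimal sketch. The name `relu_grad` is hypothetical, and setting the derivative to 0 at exactly $z = 0$ is a convention assumed here.

```python
import numpy as np

def relu_grad(z: np.ndarray) -> np.ndarray:
    """Hypothetical helper: derivative of ReLU with respect to z."""
    # 1 where z > 0, 0 elsewhere (the value at z == 0 is a convention)
    return np.where(z > 0, 1.0, 0.0)

z = np.array([-2.0, 0.0, 3.0])
g = relu_grad(z)  # [0., 0., 1.]
```

Any weight update is multiplied by this factor during backpropagation, so a zero here blocks the update for that input.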

PReLU

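$$y=\max (0, z)+ a \cdot \min (0, z)$$

Here $a$ is a learnable parameter, updated during training.

Unlike ReLU, the gradient for negative $z$ is $a$ rather than 0, so it does not vanish as long as $a \neq 0$.

When $a = -1$, $y=|z|$, and the activation is called absolute value ReLU.
When $a$ is fixed to a small positive number, typically around 0.01, the activation is called leaky ReLU.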
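The original post gives no code for PReLU, so the following is a minimal sketch written in the same style as the other functions. The names `prelu` and `prelu_grad` are hypothetical. In an actual network $a$ would be updated by the optimizer, whereas here it is passed as a plain argument.

```python
import numpy as np

def prelu(z: np.ndarray, a: float) -> np.ndarray:
    """Hypothetical PReLU: max(0, z) + a * min(0, z)."""
    return np.maximum(0, z) + a * np.minimum(0, z)

def prelu_grad(z: np.ndarray, a: float) -> np.ndarray:
    """Derivative with respect to z: 1 for z > 0, a otherwise (convention at z == 0)."""
    return np.where(z > 0, 1.0, a)

z = np.array([-2.0, -0.5, 0.0, 1.5])

leaky = prelu(z, a=0.01)    # leaky ReLU: [-0.02, -0.005, 0., 1.5]
absval = prelu(z, a=-1.0)   # absolute value ReLU: same values as np.abs(z)
grad = prelu_grad(z, a=0.01)  # nonzero even for negative z
```

The two special cases from the text correspond to fixing `a` at `0.01` and `-1.0` respectively.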

Tanh

$$y=\frac{e^z-e^{-z}}{e^z+e^{-z}}$$

def tanh(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the tanh activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return (np.exp(z) - np.exp(-z))/(np.exp(z)+np.exp(-z))

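When the input $z$ is a large positive number, $e^{-z}$ approaches 0 and $y \approx \frac{e^z}{e^z}$, so $y$ approaches $1$.
When the input $z$ is a large negative number, $e^z$ approaches 0 and $y \approx \frac{-e^{-z}}{e^{-z}}$, so $y$ approaches $-1$.
When the input $z$ is 0, $e^z = e^{-z}=1$, so $y=0$.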
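These limits hold mathematically, but a direct transcription of the formula evaluates $e^{z}$ and $e^{-z}$, one of which overflows in floating point once $|z|$ is large. The sketch below is one possible workaround and is not taken from the original post. Dividing numerator and denominator by $e^{|z|}$ gives $\operatorname{sign}(z)\,\frac{1-e^{-2|z|}}{1+e^{-2|z|}}$, which only needs a non-positive exponent. The name `stable_tanh` is hypothetical, and NumPy's built-in `np.tanh` is used as the reference.

```python
import numpy as np

def stable_tanh(z: np.ndarray) -> np.ndarray:
    """Hypothetical tanh that only exponentiates non-positive numbers."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-2.0 * np.abs(z))  # always in (0, 1]
    return np.sign(z) * (1.0 - e) / (1.0 + e)

# Large negative, moderate, zero, and large positive inputs
z = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
y = stable_tanh(z)

# Compare against NumPy's own implementation
matches_numpy = np.allclose(y, np.tanh(z))
```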

SoftPlus

$$f(z)=\log_e\left(1+e^z\right)$$

SoftPlus can be viewed as a smooth approximation of ReLU.

Its derivative is the sigmoid function:

$$\begin{aligned} & \frac{d y}{d z}=f^{\prime}(z)=\frac{d\left(\log_e \left(1+e^z\right)\right)}{d z} \\ & \Longrightarrow f^{\prime}(z)=\frac{e^z}{1+e^z} \\ & \Longrightarrow f^{\prime}(z)=\frac{\frac{e^z}{e^z}}{\frac{1}{e^z}+\frac{e^z}{e^z}} \\ & \Longrightarrow f^{\prime}(z)=\frac{1}{1+e^{-z}} \\ & \Longrightarrow f^{\prime}(z)=\operatorname{sigmoid}(z) \end{aligned}$$

This uses the derivative of the natural logarithm:

$$\frac{d}{d x}(\ln x)=\frac{1}{x}$$

and the chain rule:

$$\frac{d y}{d x}=\frac{d y}{d u} \frac{d u}{d x}$$
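The result $f^{\prime}(z)=\operatorname{sigmoid}(z)$ can be sanity-checked numerically with a central finite difference. This is only a sketch. The step size `h`, the test range, and the local helper definitions are assumptions made for the check.

```python
import numpy as np

def softplus_direct(z: np.ndarray) -> np.ndarray:
    """Softplus straight from the definition (fine for moderate z)."""
    return np.log(1.0 + np.exp(z))

def sigmoid_direct(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
h = 1e-5  # assumed step size for the finite difference

# Central difference approximation of d softplus / dz
numeric = (softplus_direct(z + h) - softplus_direct(z - h)) / (2.0 * h)

# Largest deviation from the analytic derivative, sigmoid(z)
max_err = np.max(np.abs(numeric - sigmoid_direct(z)))
```

A small `max_err` is consistent with the derivation, though a numerical check like this is of course not a proof.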

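In addition, softplus can be rewritten as:

$$\begin{aligned} f(z)&=\log_e\left(1+e^z\right) \\ &= \log_e\left(1 + e^z\right) - \log_e\left(e^z\right) + z \\ &= \log_e\left(\frac{1 + e^z}{e^z}\right)+ z \\ &= \log_e\left(1 + e^{-z}\right) + z \end{aligned}$$

For $z > 0$ the last form avoids overflow in $e^z$, while for $z \leq 0$ the original form is already safe. The two cases combine into $\log_e\left(1+e^{-|z|}\right)+\max(z, 0)$, which is the expression used in the implementation.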

def softplus(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the softplus activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return np.log(1 + np.exp(-np.abs(z))) + np.maximum(z, 0)
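To see why the implementation uses the rewritten form instead of the definition, the sketch below compares the two. The names `softplus_naive` and `softplus_stable` and the test inputs are assumptions for the example. `np.errstate` is used only to silence the expected overflow warning.

```python
import numpy as np

def softplus_naive(z: np.ndarray) -> np.ndarray:
    """Direct transcription of log(1 + e^z)."""
    return np.log(1.0 + np.exp(z))

def softplus_stable(z: np.ndarray) -> np.ndarray:
    """Same expression as the implementation above: log(1 + e^{-|z|}) + max(z, 0)."""
    return np.log(1.0 + np.exp(-np.abs(z))) + np.maximum(z, 0)

# For moderate inputs the two forms agree
z_small = np.array([-3.0, 0.0, 3.0])
agree = np.allclose(softplus_naive(z_small), softplus_stable(z_small))

# For a large input, e^z overflows to inf in the naive form
z_big = np.array([1000.0])
with np.errstate(over="ignore"):
    naive_big = softplus_naive(z_big)   # inf
stable_big = softplus_stable(z_big)     # 1000.0
```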

Swish

$$f(z)=z \cdot \operatorname{sigmoid}(z)=\frac{z}{1+e^{-z}}$$

def swish(z: np.ndarray) -> np.ndarray:
    """
    Function to execute the swish activation
    
    Inputs:
        z : input dot product w*x + b
    Output:
        y : determined activation
    """
    return z/(1+np.exp(-z))