2020: How Machines Recognize the World (2)


This is day 24 of my participation in the More-Text Challenge; see the event page for details.

In Professor Hung-yi Lee's analysis of AlexNet, the visualized filters are shown to respond to particular patterns and colors. The first layer takes the raw image as input, while the second layer's filters take the first layer's output feature maps as their input, so deeper layers respond to increasingly abstract features.

First-layer feature maps

from keras.applications import VGG16
from keras import backend as K
import matplotlib.pyplot as plt

Keras provides pretrained classic models such as VGG16; most of the networks we build are trained on top of these classic architectures.

Our object of study is the filter with index 0 in the block3_conv1 layer of VGG16. We want to know: what kind of $150 \times 150$ input image makes this filter respond most strongly, i.e., which images is it most sensitive to? To state this mathematically, we take the mean of the filter's activation over all spatial positions of its feature map; the larger that mean, the more strongly the filter is activated by the image. This mean is our objective function, and our task is to find an input image that maximizes it. Writing the image we are looking for as $x$, we seek $\max_x f(x)$. Rather than solving this analytically, we compute the gradient $\frac{\partial f(x)}{\partial x}$ and use gradient ascent. The input image $x$ here is a $150 \times 150 \times 3$ tensor, and we differentiate the objective with respect to each of its components.

model = VGG16(weights='imagenet',include_top=False)
layer_name = 'block3_conv1'
filter_index = 0

Gradient ascent

The update rule is $w_1 \leftarrow w_0 + \eta \frac{\partial L}{\partial w}$. We would like the step to be reasonably large, but a large step can make the ascent unstable, so we apply L2 normalization to the gradient vector before each update.
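As a minimal sketch of this update rule (a toy 1-D objective, not the VGG loss), here is gradient ascent with the same L2-normalization trick applied to the gradient:

```python
import numpy as np

# Toy objective: f(x) = -(x - 3)^2, maximized at x = 3.
def grad_f(x):
    return -2.0 * (x - 3.0)

x = np.zeros(1)          # start from 0, analogous to a gray image
step = 0.1               # learning rate (eta)
for _ in range(40):
    g = grad_f(x)
    # L2-normalize the gradient, as in the article's update
    g /= (np.sqrt(np.mean(np.square(g))) + 1e-5)
    x += step * g        # gradient *ascent*: move uphill

print(x)  # close to the maximizer 3.0
```

Because the normalized gradient has roughly unit magnitude, each step moves a fixed distance of about `step`, which keeps the ascent stable regardless of how steep the objective is.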

layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:,:,:,filter_index])
layer_output
<tf.Tensor 'block1_conv1/Relu:0' shape=(?, ?, ?, 64) dtype=float32>
print(loss)
Tensor("Mean_1170:0", shape=(), dtype=float32)
grads = K.gradients(loss,model.input)[0]
grads
<tf.Tensor 'gradients_585/block1_conv1/convolution_grad/Conv2DBackpropInput:0' shape=(?, ?, ?, 3) dtype=float32>
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
iterate = K.function([model.input],[loss,grads])
import numpy as np
loss_value,grads_value = iterate([np.zeros((1,150,150,3))])
input_img_data = np.random.random((1,150,150,3)) * 20 + 128.
step = 1.
for i in range(40):
    loss_value,grads_value = iterate([input_img_data])
    input_img_data += grads_value * step

This function converts the optimized tensor back into a displayable image. It first standardizes the tensor to zero mean and unit standard deviation, compresses it into a narrow band centered on 0.5, clips it to [0, 1], and finally rescales it to 8-bit pixel values in [0, 255].

def depress_image(x):
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1
    
    x += 0.5
    x = np.clip(x,0,1)
    
    x *= 255
    x = np.clip(x,0,255).astype('uint8')
    return x
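To sanity-check what depress_image produces numerically, here is a standalone sketch (the function body is copied from above; the input uses the same random-gray initialization as the gradient-ascent loop):

```python
import numpy as np

def depress_image(x):
    # standardize, then compress into a narrow band around 0.5
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    x += 0.5
    x = np.clip(x, 0, 1)

    # rescale to 8-bit pixel values
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x

img = np.random.random((150, 150, 3)) * 20 + 128.0  # same init as above
out = depress_image(img)
print(out.dtype, out.min(), out.max())  # uint8, values within [0, 255]
```

The output is always a valid uint8 image clustered around mid-gray (mean near 127), which is exactly what matplotlib's imshow expects.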

This function ties the whole procedure together: given a layer name and a filter index, it builds the mean-activation loss, starts from a random gray-ish image, runs 40 steps of L2-normalized gradient ascent, and returns the post-processed pattern that most strongly activates that filter.

def generate_pattern(layer_name, filter_index, size=150):
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:,:,:,filter_index])
    
    grads = K.gradients(loss,model.input)[0]
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
    
    iterate = K.function([model.input],[loss,grads])

    input_img_data = np.random.random((1,size,size,3)) * 20 + 128.
    step = 1.
    for i in range(40):
        loss_value,grads_value = iterate([input_img_data])
        input_img_data += grads_value * step
    img = input_img_data[0]
    return depress_image(img)
plt.imshow(generate_pattern('block1_conv1',0))
plt.show()

(Figure: pattern that maximally activates block1_conv1, filter 0)

plt.imshow(generate_pattern('block3_conv1',0))
plt.show()

(Figure: pattern that maximally activates block3_conv1, filter 0)

plt.imshow(generate_pattern('block4_conv1',1))
plt.show()

(Figure: pattern that maximally activates block4_conv1, filter 1)

for layer_name in ['block1_conv1','block2_conv1','block3_conv1','block4_conv1']:
    size = 64
    margin = 5
    # 8 x 8 grid of filter patterns, with a margin between tiles
    results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3))
    for i in range(8):
        for j in range(8):
            # pattern for filter number i + j * 8 of this layer
            filter_img = generate_pattern(layer_name, i + (j * 8), size=size)
            horizontal_start = i * size + i * margin
            horizontal_end = horizontal_start + size

            vertical_start = j * size + j * margin
            vertical_end = vertical_start + size

            results[horizontal_start:horizontal_end, vertical_start:vertical_end, :] = filter_img
    plt.figure(figsize=(20, 20))
    plt.imshow(results.astype('uint8'))  # cast to uint8 so imshow renders the 0-255 values correctly
    plt.show()
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________