Weibo FiBiNET: Combining Feature Importance and Bilinear feature Interaction for CTR

Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction

This post reviews SeNet, the paper's now-classic technique for selecting important features.

SeNet

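SeNet essentially applies an attention operation over the features to dynamically pick out the important ones, which makes it a very handy tool for judging feature importance.

In a real recommender system, many feature Embeddings cannot be learned well because the sample data is extremely sparse. If we throw every feature into the alchemy furnace indiscriminately, we inevitably introduce noise and hurt recommendation quality. If instead we screen the ingredients before the alchemy starts, that is, score each feature Embedding by importance and feed each feature in according to its score, the results should improve.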

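Taking the DSSM two-tower model as an example, SeNet fits into the structure as follows: [Figure: SeNet applied to a DSSM two-tower model]

Details

The overall structure is as follows: [Figure: overall structure]. The SENet part breaks down into three steps and, as a whole, uses two fully connected layers (in practice a single layer can be enough).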

Squeeze

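Let the Embeddings of the user's $f$ features be $\mathbf E = (\mathbf e_1, \mathbf e_2, \dots, \mathbf e_f)^T \in R^{f \times d}$. Average each feature Embedding over its $d$ dimensions, which distills each Embedding into a single summary value: $z_i = F_{squeeze}(\mathbf e_i) = \frac{1}{d} \sum_{j=1}^{d} e_{ij}$. This gives the vector $\mathbf z = [z_1, z_2, \dots, z_f]$.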
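As a quick illustration of the Squeeze step, here is a minimal sketch in plain Python (lists instead of tensors, so it runs without any dependencies). The function name `squeeze` and the toy values in `E` are made up for this example and do not come from the paper.

```python
def squeeze(embeddings):
    """Mean-pool each feature embedding into one scalar: z_i = (1/d) * sum_j e_ij."""
    return [sum(e) / len(e) for e in embeddings]


# Toy input (made-up values): f = 3 features, d = 4 dimensions each.
E = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
    [-1.0, 1.0, -1.0, 1.0],
]

z = squeeze(E)
print(z)  # [2.5, 0.0, 0.0]
```

Note that the third feature averages to zero even though its Embedding is not zero: mean pooling throws away a lot of information, which is part of why the learned Excitation step is still needed.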

Excitation

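Apply a non-linear transformation to the summary values to obtain an importance score for each feature: $\mathbf a = F_{excitation}(\mathbf z) = \sigma_2(\mathbf W_2 \sigma_1(\mathbf W_1 \mathbf z))$, where $\mathbf W_1$ and $\mathbf W_2$ are the weights of the two fully connected layers and $\sigma_1$, $\sigma_2$ are their activation functions.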
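The Excitation formula can be sketched in plain Python as two small dense layers. Several things here are assumptions for illustration only: ReLU for $\sigma_1$ and sigmoid for $\sigma_2$, a hidden width of 2, no bias terms, and the hand-picked toy weights `W1` and `W2` (in a real model they are learned).

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights, vec, activation):
    """Compute activation(W @ vec), where weights is a list of rows."""
    return [activation(sum(w * v for w, v in zip(row, vec))) for row in weights]


def excitation(z, w1, w2):
    """a = sigma_2(W_2 sigma_1(W_1 z)), with ReLU and sigmoid assumed here."""
    hidden = dense(w1, z, relu)        # compressed representation
    return dense(w2, hidden, sigmoid)  # one score in (0, 1) per feature


# Toy example: f = 4 features squeezed to z, hidden width 2 (hypothetical).
z = [2.5, 0.0, 0.0, 1.0]
W1 = [[0.5, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, -1.0]]          # shape [2, 4]
W2 = [[1.0, 0.0],
      [0.0, 1.0],
      [-1.0, 0.0],
      [0.0, 0.0]]                      # shape [4, 2]

a = excitation(z, W1, W2)
print(a)  # four scores, one per feature, each strictly between 0 and 1
```

With a sigmoid on the output, every score lands in (0, 1), so the next step can only shrink an Embedding, never amplify it.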

Re-Weight

Use the importance scores to re-weight the feature Embeddings, attention-style: $\mathbf V = F_{reweight}(\mathbf a, \mathbf E) = [a_1 \mathbf e_1, a_2 \mathbf e_2, \dots, a_f \mathbf e_f]$
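Re-Weight is just a per-feature scalar multiplication. A minimal sketch in plain Python, where the function name `reweight` and the toy values of `E` and `a` are invented for the example:

```python
def reweight(a, embeddings):
    """Scale every dimension of embedding e_i by its importance score a_i."""
    return [[a_i * x for x in e_i] for a_i, e_i in zip(a, embeddings)]


# Toy values (made up): 3 features with 2-dimensional embeddings.
E = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
a = [0.5, 1.0, 0.0]

V = reweight(a, E)
print(V)  # [[0.5, 1.0], [3.0, 4.0], [0.0, 0.0]]
```

A score of 1 passes the Embedding through unchanged, while a score of 0 silences that feature completely. The output keeps the same shape as the input, so it can be dropped in front of any downstream layer.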

Simple and effective.

Implementation

import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable, tf.layers)


# inputs: [batch_size, feature_num * emb_size], the concatenated feature embeddings
def se_block(inputs, feature_num, emb_size, name):
    hidden_size = feature_num * emb_size
    # Single-layer variant: one fully connected layer maps the flattened
    # embeddings directly to one score per feature.
    w = tf.get_variable(name='w_se_block_%s' % name, shape=[hidden_size, feature_num], dtype=tf.float32)
    b = tf.get_variable(name='b_se_block_%s' % name, shape=[1, feature_num], dtype=tf.float32)

    cur_layer = tf.matmul(inputs, w) + b  # [batch_size, feature_num]
    cur_layer = tf.layers.batch_normalization(cur_layer, axis=1)
    weight = tf.nn.sigmoid(cur_layer)  # importance scores in (0, 1): [batch_size, feature_num]
    # Re-weight: scale each feature embedding by its importance score.
    embeddings = tf.reshape(inputs, [-1, feature_num, emb_size])
    embeddings = embeddings * tf.expand_dims(weight, axis=2)  # [batch_size, feature_num, emb_size]
    output = tf.layers.flatten(embeddings)  # [batch_size, hidden_size]
    return output
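To sanity-check the shapes and the arithmetic of this single-layer variant without TensorFlow, the same computation for one sample can be sketched in plain Python. This is only a rough reference under stated assumptions: `se_block_reference` is a hypothetical helper, batch normalization is left out, and the weights are fixed toy values rather than learned ones.

```python
import math


def se_block_reference(x, w, b, feature_num, emb_size):
    """Single-sample reference for the single-layer SE block (no batch norm).

    x: flattened embeddings, length feature_num * emb_size
    w: weight matrix as nested lists, shape [feature_num * emb_size][feature_num]
    b: bias, length feature_num
    """
    # One dense layer straight from the flattened input to per-feature logits.
    logits = [
        sum(x[k] * w[k][i] for k in range(len(x))) + b[i]
        for i in range(feature_num)
    ]
    gates = [1.0 / (1.0 + math.exp(-v)) for v in logits]
    # Scale every dimension of feature i by its gate, keeping the flat layout.
    return [
        x[i * emb_size + j] * gates[i]
        for i in range(feature_num)
        for j in range(emb_size)
    ]


x = [1.0, 2.0, 3.0, 4.0]      # 2 features x 2 dimensions
w = [[0.0, 0.0] for _ in x]   # zero weights, so the logits equal the bias
b = [0.0, 0.0]

out = se_block_reference(x, w, b, feature_num=2, emb_size=2)
print(out)  # [0.5, 1.0, 1.5, 2.0]
```

With zero weights and zero bias every gate is sigmoid(0) = 0.5, so each value is simply halved. Unlike the three-step formulation, this variant skips Squeeze and lets the dense layer look at every Embedding dimension directly.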