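SIGIR, Meituan: Advertising Recommendation Algorithms

To see where the industry frontier lies, look at which labs the top-conference papers come from. The paper discussed here appeared in the short-paper track of SIGIR 2021, a CCF-A conference, and comes from Meituan's advertising algorithm team. It belongs to the following research branch: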

graph LR
ComputerScience-->A(DeepLearning x Recommendation)-->AdvertisingRank-->CTRPrediction

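Paper link: arxiv.org/pdf/2106.05…

Preface: A Method for Reading Papers

Reading is said to be beneficial, but what exactly is the benefit? Most people read a paper without grasping its key ideas and forget it as soon as they turn away. Reading with a purpose is therefore more worthwhile. The idea resembles a CNN: we keep a few windows in mind and slide them along as the input tensor flows through.

I therefore split paper reading into four parts: how to find the problem, how to solve it, how to evaluate the method, and how to present it scientifically. In other words, a paper can be mined from four perspectives: distilling the scientific problem, designing the technical solution, designing the experiments, and scientific writing. These four perspectives lead to four basic skills for scientific papers, as shown in the diagram below: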

graph LR
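A(Paper content) --> B(Scientific problem)-->F(Mathematical modeling)
A(Paper content) --> C(Technical method)-->G(Writing code rigorously)
A(Paper content) --> D(Experimental design)-->H(Among countless experiments how to validate the hypothesis)
A(Paper content) --> E(Scientific writing)-->I(Terminology and writing patterns)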

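Step 1: The Scientific Problem

The innovation in one sentence

A recommender system shows the user a list in which every item has its own position: first, second, and so on. Because items in the same list sit at different positions, current research on improving prediction accuracy falls into two classes: one trains with the actual position but infers with a fixed position; the other uses inverse propensity weighted training with no position at inference. However, treating training and inference differently inevitably leads to inconsistency between the two and to sub-optimal model performance. To close this gap, the Meituan team proposes a new method that models position and context information in both training and inference, performing deep feature crossing and preference prediction.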
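To make the train/inference mismatch of the first class of methods concrete, here is a minimal sketch. It assumes a toy logistic model in which position enters as an additive bias term; the weights in `position_bias`, the value of `relevance_logit`, and the choice of `DEFAULT_POSITION` are made-up numbers for illustration, not values from the paper.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical learned weights: one additive bias per display position.
position_bias = {1: 0.8, 2: 0.3, 3: -0.2, 4: -0.7}
DEFAULT_POSITION = 1  # assumed fixed position fed to the model at inference

def predict(relevance_logit: float, position: int) -> float:
    """Toy CTR model: sigmoid(relevance logit + position bias)."""
    return sigmoid(relevance_logit + position_bias[position])

relevance_logit = -1.5  # hypothetical score for one (user, context, item)

# Offline, the model sees the position actually logged for the impression.
train_time = predict(relevance_logit, 4)
# Online, every candidate is scored with the same default position.
serve_time = predict(relevance_logit, DEFAULT_POSITION)

print(f"offline, actual position 4: {train_time:.3f}")
print(f"online, default position {DEFAULT_POSITION}: {serve_time:.3f}")
```

The same item gets a different score offline and online simply because the position input differs, which is the inconsistency the paper sets out to remove.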

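Details of the innovation

In an advertising recommender system, advertisers pay for clicks on their ads; if the user does not click, the platform earns nothing, so revenue per impression = CTR × unit price per click. Predicting ad CTR is therefore central to both revenue and user experience. Current position-based CTR prediction assumes that the click C depends on two latent variables, E and R: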

$p(C=1|u,c,i,k)=p(E=1|k,[s])p(R=1|u,c,i)$

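Here C denotes the click being predicted, R the user's preference for the item, and E the position bias; $p(E) \cdot p(R)$ is the click-through rate when item i is shown at position k to user u with context features c.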
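As a small worked example of this factorization, the sketch below multiplies a per-position examination probability by a position-independent preference probability. The function name `click_probability` and all numbers are illustrative assumptions, not values from the paper.

```python
def click_probability(p_exam: float, p_rel: float) -> float:
    """Factorized assumption: p(C=1) = p(E=1) * p(R=1)."""
    return p_exam * p_rel

# Hypothetical p(E=1|k) for positions 1..4 (made-up numbers).
p_exam_by_position = {1: 0.9, 2: 0.6, 3: 0.4, 4: 0.25}
# Hypothetical p(R=1|u,c,i) for one (user, context, item) triple.
p_rel = 0.2

ctr_by_position = {
    k: click_probability(p_exam, p_rel)
    for k, p_exam in p_exam_by_position.items()
}

for k, ctr in ctr_by_position.items():
    print(f"position {k}: predicted CTR = {ctr:.3f}")
```

Under this assumption the position only rescales the CTR; it cannot interact with the user, the context, or the item, which is the limitation the paper criticizes.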

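This gives the optimization target of the paper, $CTR_k^j=p(C=1|u,i,c,k)$, from which three sub-goals follow: 1) make position features consistent between offline training and online prediction; 2) cross position features with context features; 3) since traditional evaluation metrics cannot assess position bias, design a new evaluation metric, PAUC.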

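Step 2: Technical Method

Architecture diagrams in industry papers tend to be clear and easy to follow, with every computation step shown in the figure. The architecture diagram is as follows:

Sub-goals 1-2: Consistency of position features and feature crossing

The input is a six-tuple (u,c,set(i),set(bk),position,CTR): the user u, the user's context c, the set of recalled items set(i), the user's historical behavior sequences at position k, set(bk), the position, and the CTR used as the training signal.

In the bottom layer, reading from left to right: the base model performs the basic feature crossing and corresponds to computing $p(R=1|u,c,i)$. The middle part is the core innovation of the model: it takes the position feature, the current context feature, and the user's historical click behavior sequence at that position, and computes the user's personalized preference for the position. The rightmost part is the highlight of the figure: it zooms in on the computation logic, such as the attention and transformer mechanisms, and explains the vector represented by each color, so that a reader could reproduce the model from the figure alone.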
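The attention step mentioned above can be illustrated with generic scaled dot-product attention pooling. This is a sketch under assumptions: using the context vector as the query and the behavior embeddings at position k as both keys and values is a simplification chosen for illustration, and the vectors are made up; the paper's exact module may differ.

```python
import math

def attention_pool(query, keys):
    """Scaled dot-product attention; keys double as values.

    Returns the weighted sum of the keys and the attention weights.
    """
    d = len(query)
    scores = [
        sum(q * x for q, x in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * key[j] for w, key in zip(weights, keys))
        for j in range(d)
    ]
    return pooled, weights

# Hypothetical embeddings (made-up numbers).
context_vec = [1.0, 0.0]                    # current context
behaviors_at_k = [[1.0, 0.0], [0.0, 1.0]]   # past behaviors at position k

pooled, weights = attention_pool(context_vec, behaviors_at_k)
print("attention weights:", [round(w, 3) for w in weights])
print("pooled behavior vector:", [round(x, 3) for x in pooled])
```

The behavior that aligns with the current context receives the larger weight, so the pooled vector summarizes what the user did at that position in similar contexts.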

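The middle layer uses the user's preference for the position, the user's preference for the item $p(R=1|u,i,c)$, and the user's preference for the POI $p(E=1|k,c,b)$ to compute the click-through rate of the item at each position. There is a problem here, however: there are no labels for the CTR of the same item i shown to the same user u at different positions. If the user clicked i only at position k=4, that does not mean the user would not click i at other positions.

The top layer produces an item × position two-dimensional matrix, which is used for evaluation.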
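A minimal sketch of how such an item × position matrix can be produced: every item representation is paired with every position representation and scored by a shared function. The single linear layer with a sigmoid, the vector sizes, and the random values are assumptions for illustration; they stand in for whatever network the paper actually uses.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def positionwise_ctr_matrix(item_vecs, pos_vecs, weights, bias=0.0):
    """Return matrix[n][k] = sigmoid(w . concat(item_n, pos_k) + bias)."""
    matrix = []
    for item in item_vecs:
        row = []
        for pos in pos_vecs:
            pair = list(item) + list(pos)  # concatenate the two vectors
            logit = sum(w * x for w, x in zip(weights, pair)) + bias
            row.append(sigmoid(logit))
        matrix.append(row)
    return matrix

random.seed(0)
# Hypothetical representations: 5 items (dim 4) and 3 positions (dim 2).
item_vecs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
pos_vecs = [[random.gauss(0, 1) for _ in range(2)] for _ in range(3)]
weights = [random.gauss(0, 1) for _ in range(6)]

ctr = positionwise_ctr_matrix(item_vecs, pos_vecs, weights)
print(len(ctr), "items x", len(ctr[0]), "positions")
```

Because the item side is computed once and only the cheap pairing is repeated per position, scoring all positions adds little cost on top of scoring the items.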

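Sub-goal 3: The PAUC evaluation metric

The new metric is $\mathrm{PAUC}=\frac{\sum_{k=1}^{K}\#\mathrm{impression}_k\cdot\mathrm{PAUC@}k}{\sum_{k=1}^{K}\#\mathrm{impression}_k}$, where $\#\mathrm{impression}_k$ is the number of items exposed at position k and $\mathrm{PAUC@}k$ is the AUC at position k. As with AUC, the value equals 1 if every position is classified correctly.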
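The formula can be sketched in a few lines of plain Python. Two details are assumptions of this sketch rather than statements from the paper: AUC is computed by direct pairwise comparison with ties counted as 0.5, and positions where only one class is present (so AUC is undefined) are skipped. The sample data is made up.

```python
from collections import defaultdict

def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def pauc(positions, labels, scores):
    """Impression-weighted average of the per-position AUC (PAUC@k)."""
    groups = defaultdict(lambda: ([], []))
    for k, y, s in zip(positions, labels, scores):
        groups[k][0].append(y)
        groups[k][1].append(s)
    num = den = 0.0
    for ys, ss in groups.values():
        auc_k = auc(ys, ss)
        if auc_k is None:
            continue  # assumption: skip positions with undefined AUC
        num += len(ys) * auc_k
        den += len(ys)
    return num / den if den else None

# Made-up impressions: display position, click label, predicted CTR.
positions = [1, 1, 1, 1, 2, 2, 2]
labels = [1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.1, 0.3]

print(f"PAUC = {pauc(positions, labels, scores):.4f}")
```

In this toy data, position 1 has 4 impressions with AUC 0.75 and position 2 has 3 impressions with AUC 0.5, so PAUC = (4 × 0.75 + 3 × 0.5) / 7 ≈ 0.6429.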

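Step 3: Experimental Design

Experiments in CS papers usually have three parts: 1) the SOTA methods compared against; 2) the datasets used; 3) the experimental procedure.

Compared methods

DIN, DIN+PosInWide, DIN+PAL, DIN+ActualPosInWide, DIN+Combination, DPIN-Transformer, DPIN, and DPIN+ItemAction. DIN+PosInWide feeds the position feature vector directly into the wide side of the DIN model; DIN+PAL brings the PAL method (a 2019 paper from Huawei) into DIN; DIN+Combination adds the middle layer described above to the model during evaluation; DPIN-Transformer is DPIN with the transformer structure removed; DPIN+ItemAction additionally crosses position features inside the base model at the bottom left, but cannot go online because of serving performance.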

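Dataset

Four weeks of impression logs from item search advertising were used to train the model, which was evaluated on two test sets: 1) offline regular traffic from the following day; 2) offline top-k randomized traffic from the following day. The top-k randomized traffic, a 5% random sample of traffic, excludes the influence of relevance-based recommendation: regular traffic carries a search-relevance bias, which the randomized experiment removes.

Experimental procedure

The experiments have two parts: 1) offline experiments, training and evaluating on the offline datasets above; 2) online experiments, an A/B test against the baseline model.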

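Offline results:

The results show that DPIN outperforms the compared methods. According to the paper, DPIN gains 2.98% in AUC and 1.07% in PAUC over DIN+PosInWide, the baseline running in the online system.

Online results:

According to the paper, in the online A/B test DPIN improved CTR by 2.25% and RPM by 2.15% over the DIN+PosInWide baseline.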

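Summary: Scientific Writing

Note: （XXX） marks replaceable content, so the sentence patterns can be collected for reuse.

1. Writing the abstract

Stating importance:

（CTR prediction） plays an important role in （online recommender systems）.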

Explaining a term:

In practice, （the training of CTR models） depends on （click data） which is （intrinsically biased towards higher positions） since （higher position has higher CTR by nature）.

Describing existing work:

Existing methods such as （actual position training with fixed position inference） and （inverse propensity weighted training with no position inference） alleviate the bias problem （to some extent）.

Using a contrast to reach the conclusion:

However, the （different） treatment of （position information） between （training and inference） will inevitably lead to （inconsistency and sub-optimal online performance）.

Using "meanwhile" to add an explanation:

Meanwhile, the basic assumption of these methods, （i.e., the click probability is the product of examination probability and relevance probability）, is （oversimplified） and （insufficient） to model the rich interaction between （position and other information）.

Presenting the proposed method:

In this paper, we propose a （Deep Position-wise Interaction Network (DPIN)） to efficiently combine （all candidate items and positions for estimating CTR at each position）, achieving （consistency between offline and online as well as modeling the deep non-linear interaction among position, user, context and item under the limit of serving performance）.

Proposing a new evaluation metric on top of the method:

Following our new treatment to （the position bias in CTR prediction）, we propose a （new evaluation metric） named （PAUC (position-wise AUC) that is suitable for measuring the ranking quality at a given position）.

Describing the experiments and stating the contribution:

Through extensive experiments on a real world dataset, we show empirically that our method is both effective and efficient in （solving position bias problem.）
We have also deployed our method in production and observed statistically significant improvement over a highly optimized baseline in a rigorous A/B test.

Writing the introduction

Overall paragraph structure:

Giving background:

In （cost-per-click (CPC) advertising） systems, （advertisers） are charged for every ad click, and advertisements are ranked by the eCPM (effective cost per mille), which is the product of （click-through rate (CTR) and bid price） (pattern: X is the product of XX and YY).
Hence, （CTR prediction） is a core task and has a direct impact on （the final revenue and user experience）.

Explaining:

The （position bias） happens （as users tend to click on items in higher position） regardless of （the items’ actual relevance） so that （the CTR declines rapidly with the display position）.

Summarizing prior work:

Since （position signal） greatly impacts （the CTR prediction）, there has been a great deal of work on solving （position bias problem）.
Modeling （position as a feature） in neural network [reference] is widely adopted in industrial applications due to its （simplicity and effectiveness）, in which actual （position feature） is added in the wide part of neural network during offline training and a default position value will be used during online inference.

Describing the online experiment:

Online A/B test is also deployed to demonstrate that （DPIN） has a significant improvement over a highly optimized baseline.

Describing experiments

Describing the dataset:

We evaluate our methods on two test sets, which are collected from （regular traffic） and （top-k randomized traffic）（the next day）.

Describing experimental results:

By employing our proposed （position-wise combination） module, the DIN+Combination has 2.74% gain on AUC and 0.63% gain on PAUC compared with DIN+PosInWide, achieving consistency and alleviating position bias further, which shows the position bias is not independent. Furthermore, DPIN models deep non-linear interaction among position, context and user, and eliminates position bias existing in the user sequence by position-wise method, which has 0.24% gain on AUC and 0.44% gain on PAUC compared with DIN+Combination.

Interpreting the results:

The effect of （DPIN-Transformer） explains that it is necessary to adopt （transformer） for interaction among （different positions）. And the comparison between （DPIN） and （DPIN+ItemAction） shows that （DPIN） is close to the brute force method on both AUC and PAUC. As can be seen finally, the （DPIN） has （2.98%） gain on AUC and （1.07%） gain on PAUC relative to the （DIN+PosInWide）, which is a baseline in the （advertising） system online.

Why a randomized experiment:

In order to ensure that our method can learn the （position bias） instead of （overfitting the selection bias） of the system, we further evaluate our methods on （randomized traffic）. The results show that the differences between the different methods on both regular and randomized traffic are consistent.

Serving performance:

We retrieve some requests with different candidate item numbers from the dataset to measure serving performance. As shown in Figure 2, the serving latency of position-wise combination module is negligible compared to the DIN model since user sequence operation has a large proportion of latency. The serving latency of DPIN increases slowly as the number of items increases since the deep position-wise interaction module has nothing to do with items. Compared with DPIN+ItemAction, the DPIN has a great improvement in serving performance with little damage to model performance, which shows that our proposed method is both effective and efficient.

Online results:

Online A/B test was conducted in the sponsored search advertising system from 2021-01-08 to 2021-01-22. For the control group, 10% of users are randomly selected and presented with recommendation generated by DIN+PosInWide. For the experimental group, 10% of users are presented with recommendation generated by DPIN. The A/B test shows that the proposed DPIN has improved CTR by 2.25% and RPM (Revenue Per Mille) by 2.15% compared with baseline. For now, DPIN has been deployed online and serves the main traffic, which contributes a significant business revenue growth.

References

[1] Huang J, Hu K, Tang Q, et al. Deep Position-wise Interaction Network for CTR Prediction[J]. arXiv preprint arXiv:2106.05482, 2021.

SIGIR 2021, Meituan: a reading of "Deep Position-wise Interaction Network for CTR Prediction"