Bias and Debias in Recommender System: A Survey and Future Directions

[5] Jiawei Chen, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He.
2020. Bias and Debias in Recommender System: A Survey and Future Directions.
arXiv preprint arXiv:2010.03240 (2020).

3. Bias in Recommendation

The survey divides data bias into four types: selection bias, conformity bias, exposure bias, and position bias.

3.1 Bias in data

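Many recommendation models assume the data are i.i.d. (independent and identically distributed), i.e., that training and test data are drawn from the same distribution. Under this assumption, a model that fits the training data well is expected to perform well at test time.

Definition: data bias means that the distribution of the collected training data differs from the distribution of the ideal test data, which violates the i.i.d. assumption.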
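To make the definition concrete, the sketch below compares a training distribution over items with a target test distribution using KL divergence. It is a minimal illustration under stated assumptions. The toy interaction log and the uniform test distribution are hypothetical. KL divergence is only one possible way to quantify the mismatch; it is zero only when the two distributions coincide.

```python
import math
from collections import Counter


def empirical_distribution(item_ids, num_items):
    """Empirical probability of each item id in range(num_items)."""
    counts = Counter(item_ids)
    total = len(item_ids)
    return [counts.get(i, 0) / total for i in range(num_items)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps keeps the log finite if q assigns zero mass."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy data: logged training interactions pile up on items 0 and 1,
# while the ideal test distribution is assumed to be uniform over 4 items.
train_items = [0, 0, 0, 0, 1, 1, 1, 2]
p_train = empirical_distribution(train_items, num_items=4)
p_test = [0.25] * 4

print(p_train)  # [0.5, 0.375, 0.125, 0.0]
print(round(kl_divergence(p_train, p_test), 3))  # 0.412
```

Under the i.i.d. assumption, the divergence between the two distributions would be zero. Here it is clearly positive, and item 3 never appears in training at all.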

3.1.1 selection bias

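Definition: selection bias arises because users are free to choose which items to interact with (or rate), and no user can interact with every item (there are far too many). As a result, the observed interactions are not a representative sample of all user-item pairs, and interaction data are inevitably sparse.

Points worth noting:

Items a user has not selected or interacted with may still include items the user likes, so the choice of negative samples matters.
Users tend to select and interact with items they like.
Users are more willing to rate items they find particularly good or particularly bad.
A possible remedy: an adaptive negative sampler.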
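The missing-not-at-random effect behind the last points can be sketched in a few lines. All assumptions here are hypothetical. True ratings are uniform over 1 to 5. The probability that a rating gets observed depends only on its value, with extreme ratings the most likely to be logged. Under these assumptions, the observed mean drifts away from the true mean, and extreme ratings are over-represented.

```python
def observed_distribution(true_dist, observe_prob):
    """Rating distribution among *observed* entries when the chance of observing
    a rating depends on its value (missing-not-at-random)."""
    weights = {r: true_dist[r] * observe_prob[r] for r in true_dist}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def mean_rating(dist):
    """Expected rating under a {rating: probability} distribution."""
    return sum(r * p for r, p in dist.items())


# Hypothetical: true ratings are uniform over 1..5.
true_dist = {r: 0.2 for r in range(1, 6)}
# Hypothetical propensities: users mostly rate items they love or hate.
observe_prob = {1: 0.3, 2: 0.05, 3: 0.05, 4: 0.2, 5: 0.4}

obs_dist = observed_distribution(true_dist, observe_prob)
print(round(mean_rating(true_dist), 2), round(mean_rating(obs_dist), 2))  # 3.0 3.35
print(round(obs_dist[1] + obs_dist[5], 2))  # 0.7, versus 0.4 in the true distribution
```

The propensities are made up, so the specific numbers carry no meaning. The point is that any rating-dependent observation mechanism makes the logged sample unrepresentative.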

3.1.2 exposure bias

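Implicit feedback is widely used in RS, but it has a clear limitation: it only reflects part of the positive signal. An unobserved interaction does not tell us whether the user dislikes the item or simply never saw it.

Definition: exposure bias occurs because users are only exposed to a particular subset of items. Unobserved interactions, which are routinely treated as negative samples, therefore do not necessarily represent negative preference.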
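Here is a toy illustration of why treating every unobserved item as a negative is risky. It assumes, hypothetically, that a user interacts with exactly those items that are both exposed and liked. The helper names are made up for this sketch.

```python
def implicit_labels(num_items, exposed, liked):
    """Label items the way implicit-feedback training typically does:
    1 if an interaction was observed, otherwise 0 (treated as negative)."""
    interacted = exposed & liked  # a user can only interact with what was shown
    return {i: int(i in interacted) for i in range(num_items)}


def false_negatives(labels, liked):
    """Items labelled negative even though the user would have liked them."""
    return sorted(i for i, y in labels.items() if y == 0 and i in liked)


liked = {1, 3, 5, 7}    # hypothetical true preferences
exposed = {0, 1, 2, 3}  # hypothetical items the system actually showed
labels = implicit_labels(10, exposed, liked)
print(false_negatives(labels, liked))  # [5, 7]
```

Half of the liked items end up as negatives purely because they were never shown.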

3.1.3 conformity bias

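Definition: conformity bias occurs when users tend to behave like others in their group, even if doing so goes against their own judgment. Such interactions therefore do not faithfully reflect users' true preferences.

In other words, recommending items directly based on what friends do is unreliable, although it can serve as reference and supplementary information. One option is to use "implicit friends" or higher-order social connections.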
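One simple way to picture conformity bias is a hypothetical linear model. In this model, the logged rating is a weighted mix of the user's own preference and the group's average. The weight `conformity` and both function names are assumptions for illustration; this is not a model taken from the survey.

```python
def observed_rating(true_pref, group_mean, conformity):
    """Hypothetical linear conformity model: the logged rating is pulled
    toward the group's average by a weight `conformity` in [0, 1]."""
    return (1 - conformity) * true_pref + conformity * group_mean


def recover_preference(observed, group_mean, conformity):
    """Invert the same hypothetical model (requires conformity < 1)."""
    return (observed - conformity * group_mean) / (1 - conformity)


# A user who privately rates an item 2/5 sees that the group's average is 4.5.
obs = observed_rating(true_pref=2.0, group_mean=4.5, conformity=0.4)
print(round(obs, 2))  # 3.0
print(round(recover_preference(obs, 4.5, 0.4), 2))  # 2.0
```

The inversion only works if the group influence is known, and real logs do not reveal it. This is one reason social signals are better treated as supplementary evidence.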

3.1.4 position bias

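Definition: users tend to interact with items at higher positions in the recommendation list (the top of the list), regardless of how relevant those items actually are. As a result, the logged interactions may not reflect users' true preferences well.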
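A common way to formalize position bias is a position-based click model. In this model, a click requires the user both to examine the position and to find the item relevant. The sketch assumes a hypothetical examination probability of 1/rank; real examination curves have to be estimated from data.

```python
def examination_prob(rank):
    """Hypothetical examination curve: chance the user looks at 1-based `rank`."""
    return 1.0 / rank


def expected_ctr(relevances):
    """Position-based click model: P(click) = P(examined | rank) * P(relevant)."""
    return [examination_prob(rank) * rel for rank, rel in enumerate(relevances, start=1)]


# The last item is the most relevant, yet it sits at the bottom of the list.
relevances = [0.5, 0.5, 0.5, 0.8]
ctr = expected_ctr(relevances)
print([round(c, 3) for c in ctr])  # [0.5, 0.25, 0.167, 0.2]
```

Ranking by raw click rate would put the first item ahead of the last one, even though the last one is the most relevant.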

3.2 Bias in model

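3.2.1 inductive bias

Definition: the assumptions a model makes in order to learn the target function better and to generalize beyond the training data.

Examples:

(1) The interaction score is computed as the inner product of user and item embeddings;

(2) Adaptive negative sampling;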

[47] Jingtao Ding, Yuhan Quan, Xiangnan He, Yong Li, and Depeng Jin. 2019. Reinforced negative sampling for recommendation with
exposure data. In IJCAI. AAAI Press, 2230–2236.
[130] Dae Hoon Park and Yi Chang. 2019. Adversarial Sampling and Training for Semi-Supervised Information Retrieval. In The World Wide
Web Conference. ACM, 1443–1453.
[137] Steffen Rendle and Christoph Freudenthaler. 2014. Improving pairwise learning for item recommendation from implicit feedback. In
WSDM. ACM, 273–282.
[162] Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. 2017. IRGAN: A minimax
game for unifying generative and discriminative information retrieval models. In SIGIR. ACM, 515–524.

(3) Discrete ranking models (binary codes for users and items);

[201] Hanwang Zhang, Fumin Shen, Wei Liu, Xiangnan He, Huanbo Luan, and Tat-Seng Chua. 2016. Discrete collaborative filtering. In
SIGIR. ACM, 325–334.
[218] Ke Zhou and Hongyuan Zha. 2012. Learning binary codes for collaborative filtering. In KDD. ACM, 498–506.
[101] Defu Lian, Rui Liu, Yong Ge, Kai Zheng, Xing Xie, and Longbing Cao. 2017. Discrete content-aware matrix factorization. In KDD. ACM,
325–334.
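The three examples above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of any cited paper. The hard-negative sampler is a simplified variant in the spirit of dynamic negative sampling: draw a few candidate negatives and keep the one the current model scores highest. The binary codes come from simply taking signs, whereas the cited discrete methods use dedicated learning schemes. All function names and toy vectors are hypothetical.

```python
import random


def score(user_vec, item_vec):
    """(1) Preference modeled as the inner product of user and item embeddings."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def hard_negative(user_vec, item_vecs, positives, num_candidates, rng):
    """(2) Simplified adaptive sampler: draw candidates uniformly from the
    non-positive items, then keep the one the current model scores highest."""
    pool = [i for i in range(len(item_vecs)) if i not in positives]
    candidates = rng.sample(pool, min(num_candidates, len(pool)))
    return max(candidates, key=lambda i: score(user_vec, item_vecs[i]))


def binarize(vec):
    """(3) Discrete representation: keep only the sign of each dimension."""
    return tuple(1 if x >= 0 else -1 for x in vec)


def hamming(a, b):
    """Number of differing bits; for +/-1 codes, dot(a, b) = len(a) - 2 * hamming(a, b)."""
    return sum(x != y for x, y in zip(a, b))


user = [1.0, 0.5]
items = [[1.0, 1.0], [0.9, 0.2], [-1.0, 0.3], [0.2, -1.0]]
rng = random.Random(0)

print(hard_negative(user, items, positives={0}, num_candidates=3, rng=rng))  # 1
codes = [binarize(v) for v in items]
print([hamming(binarize(user), c) for c in codes])  # [0, 0, 1, 1]
```

The sampler returns item 1, the non-positive item the model already scores highest. Training against such hard negatives is itself an assumption about what makes a useful negative. The Hamming-distance identity shows why binary codes are fast: ranking by fewest differing bits is the same as ranking by the inner product of the codes.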

3.3 Bias and unfairness in Results

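Popularity bias

Definition: popular items are recommended even more frequently than their popularity warrants.

Consequence: a long-tail distribution, also known as the Matthew effect: "the rich get richer".

Problems it causes:

(1) It reduces the level of personalization and hurts serendipity;

(2) It reduces the fairness of recommendation results;

(3) Recommending popular items further increases their exposure, which reinforces the Matthew effect.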
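Popularity bias can be quantified by how unevenly exposure is spread over items. The sketch below uses the Gini coefficient; this is one common concentration measure, and the choice is ours rather than the survey's. It also uses a hypothetical most-popular recommender that shows every user the same top-k items. The toy interaction counts are made up.

```python
def gini(values):
    """Gini coefficient: 0 = perfectly even, (n - 1) / n = all mass on one item."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def most_popular_exposure(interaction_counts, k, num_users):
    """Show every user the same top-k most-interacted items; return per-item exposure."""
    ranked = sorted(range(len(interaction_counts)),
                    key=lambda i: interaction_counts[i], reverse=True)
    exposure = [0] * len(interaction_counts)
    for i in ranked[:k]:
        exposure[i] = num_users
    return exposure


counts = [50, 30, 10, 5, 3, 2]  # hypothetical interaction counts for 6 items
exposure = most_popular_exposure(counts, k=2, num_users=100)
print(exposure)  # [100, 100, 0, 0, 0, 0]
print(round(gini(counts), 3), round(gini(exposure), 3))  # 0.543 0.667
```

In this toy setting, the recommendations (Gini about 0.667) are more concentrated than the interactions they were derived from (about 0.543). This is the "recommended more often than popularity warrants" effect in miniature.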

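Unfairness

Definition: the RS systematically and unfairly discriminates against certain individuals or groups of individuals, e.g., based on gender, race, or skin color.

Examples: (1) in job recommendation, women are often recommended lower-paying jobs; (2) in book recommendation, books by female authors are more likely to be recommended.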
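A first diagnostic for the job example is to compare the average salary of the jobs recommended to each user group. The sketch assumes a hypothetical log with two groups, A and B. A gap on its own flags a disparity worth investigating; it does not prove discrimination.

```python
from collections import defaultdict


def mean_item_value_by_group(recs, user_group, item_value):
    """Average value (e.g. salary) of recommended items, split by user group."""
    per_group = defaultdict(list)
    for user, items in recs.items():
        per_group[user_group[user]].extend(item_value[i] for i in items)
    return {group: sum(vals) / len(vals) for group, vals in per_group.items()}


# Hypothetical job-recommendation log; salaries in thousands.
salary = {"j1": 100, "j2": 80, "j3": 60, "j4": 40}
user_group = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
recs = {"u1": ["j1", "j2"], "u2": ["j1", "j3"], "u3": ["j3", "j4"], "u4": ["j2", "j4"]}

by_group = mean_item_value_by_group(recs, user_group, salary)
print(by_group)  # {'A': 85.0, 'B': 55.0}
```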

3.4 Feedback loop amplifies Biases

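Figure 2 in the survey shows the feedback loop in recommendation. Users generate data, the model is trained on that data, and the model's recommendations in turn shape what users interact with next. If these biases are not addressed from the start, the feedback loop amplifies their effects.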
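The amplification can be sketched with a toy loop. A popularity ranking stands in for the trained model, its top-k items are served, and the resulting clicks are logged and used in the next round. Everything here is a hypothetical assumption: the starting counts, `click_prob`, and the uniform `organic` interactions. The values are chosen so that the only difference between items comes from what the model serves.

```python
def run_feedback_loop(counts, rounds, k, num_users, click_prob, organic):
    """Simulate user -> data -> model -> user: re-rank items by logged counts
    each round, serve the top-k, and log the resulting clicks."""
    counts = list(counts)
    top_share = []
    for _ in range(rounds):
        # Model: rank items by logged interactions (stand-in for any learned model).
        served = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)[:k]
        # Data: every item gets the same organic interactions...
        for i in range(len(counts)):
            counts[i] += organic
        # ...and served items also collect expected clicks from exposed users.
        for i in served:
            counts[i] += num_users * click_prob
        top_share.append(max(counts) / sum(counts))
    return counts, top_share


counts, shares = run_feedback_loop([10, 9, 8, 7], rounds=5, k=1,
                                   num_users=10, click_prob=0.5, organic=1)
print([round(s, 3) for s in shares])  # [0.372, 0.423, 0.459, 0.486, 0.506]
```

The top item starts with about 29% of all interactions (10 of 34), and its share grows every round. This happens even though every item receives the same organic interest.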

4. Debiasing methods

See the original survey paper; this part is long.