1 题目描述
动作表: Actions
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| post_id | int |
| action_date | date |
| action | enum |
| extra | varchar |
+---------------+---------+
这张表可能存在重复的行。 action 列的类型是 ENUM,可能的值为 ('view', 'like', 'reaction', 'comment', 'report', 'share')。 extra 列拥有一些可选信息,例如:报告理由(a reason for report)或反应类型(a type of reaction)等。
移除表: Removals
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| post_id | int |
| remove_date | date |
+---------------+---------+
这张表的主键是 post_id(具有唯一值的列)。 这张表的每一行表示一个被移除的帖子,原因可能是由于被举报或被管理员审查。
编写解决方案,统计在被报告为垃圾广告的帖子中,被移除的帖子的每日平均占比,四舍五入到小数点后 2 位。
2 测试用例
输入:
Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra |
+---------+---------+-------------+--------+--------+
| 1 | 1 | 2019-07-01 | view | null |
| 1 | 1 | 2019-07-01 | like | null |
| 1 | 1 | 2019-07-01 | share | null |
| 2 | 2 | 2019-07-04 | view | null |
| 2 | 2 | 2019-07-04 | report | spam |
| 3 | 4 | 2019-07-04 | view | null |
| 3 | 4 | 2019-07-04 | report | spam |
| 4 | 3 | 2019-07-02 | view | null |
| 4 | 3 | 2019-07-02 | report | spam |
| 5 | 2 | 2019-07-03 | view | null |
| 5 | 2 | 2019-07-03 | report | racism |
| 5 | 5 | 2019-07-03 | view | null |
| 5 | 5 | 2019-07-03 | report | racism |
+---------+---------+-------------+--------+--------+
Removals table:
+---------+-------------+
| post_id | remove_date |
+---------+-------------+
| 2 | 2019-07-20 |
| 3 | 2019-07-18 |
+---------+-------------+
输出:
+-----------------------+
| average_daily_percent |
+-----------------------+
| 75.00 |
+-----------------------+
解释: 2019-07-04 的垃圾广告移除率是 50%,因为有两张帖子被报告为垃圾广告,但只有一个得到移除。 2019-07-02 的垃圾广告移除率是 100%,因为有一张帖子被举报为垃圾广告并得到移除。 其余几天没有收到垃圾广告的举报,因此平均值为:(50 + 100) / 2 = 75% 注意,输出仅需要一个平均值即可,我们并不关注移除操作的日期。
3 解题思路
Actions表中统计每天被报告为垃圾广告的帖子去重后的数量,以及对这些确定为垃圾广告帖子中在Removals表中被移除的帖子去重后的数量,然后计算每天被移除的帖子的占比
SELECT COUNT(DISTINCT IF(post_id IN (SELECT post_id FROM Removals), post_id, NULL)) * 100 /
COUNT(DISTINCT post_id) as daily_percent
FROM Actions
WHERE action = 'report'
AND extra = 'spam'
GROUP BY action_date
执行结果
+-------------+
|daily_percent|
+-------------+
|100.0000 |
|50.0000 |
+-------------+
- 统计在被报告为垃圾广告的帖子中,被移除的帖子的每日平均占比,四舍五入到小数点后 2 位
SELECT ROUND(AVG(t.daily_percent), 2) average_daily_percent
FROM (SELECT COUNT(DISTINCT IF(post_id IN (SELECT post_id FROM Removals), post_id, NULL)) * 100 /
COUNT(DISTINCT post_id) daily_percent
FROM Actions
WHERE action = 'report'
AND extra = 'spam'
GROUP BY action_date) as t;
执行结果
+---------------------+
|average_daily_percent|
+---------------------+
|75.00 |
+---------------------+