1 题目描述
事件表: Events
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| business_id | int |
| event_type | varchar |
| occurences | int |
+---------------+---------+
(business_id, event_type) 是这个表的主键 (具有唯一值的列的组合)
表中的每一行记录了某种类型的事件在某些业务中多次发生的信息
平均活动 是指有特定 event_type 的具有该事件的所有公司的 occurences 的均值
活跃业务 是指具有 多个 event_type 的业务, 它们的 occurences 严格大于 该事件的平均活动次数
写一个解决方案, 找到所有活跃业务
以任意顺序 返回结果表
2 测试用例
输入:
Events table:
+-------------+------------+------------+
| business_id | event_type | occurences |
+-------------+------------+------------+
| 1 | reviews | 7 |
| 3 | reviews | 3 |
| 1 | ads | 11 |
| 2 | ads | 7 |
| 3 | ads | 6 |
| 1 | page views | 3 |
| 2 | page views | 12 |
+-------------+------------+------------+
输出:
+-------------+
| business_id |
+-------------+
| 1 |
+-------------+
解释:
每次活动的平均活动可计算如下:
- reviews: (7+3)/2 = 5
- ads: (11+7+6)/3 = 8
- page views: (3+12)/2 = 7.5
id=1 的业务有 7 个 reviews 事件 (多于 5 个) 和 11 个ads 事件 (多于 8 个), 所以它是一个活跃的业务
3 解题思路
- 按照
event_type分组统计occurences的平均值
select distinct event_type, avg(occurences) over (partition by event_type) as avg_occurences from Events
查询结果
+----------+--------------+
|event_type|avg_occurences|
+----------+--------------+
|ads |8.0000 |
|page views|7.5000 |
|reviews |5.0000 |
+----------+--------------+
- 使用内连接
e.event_type = t.event_type, 将按照event_type分别筛选符合条件的数据e.occurences > t.avg_occurences
select *
from Events as e
inner join (select distinct event_type, avg(occurences) over (partition by event_type) as avg_occurences
from Events) as t
on e.event_type = t.event_type
where e.occurences > t.avg_occurences
查询结果
+-----------+----------+----------+----------+--------------+
|business_id|event_type|occurences|event_type|avg_occurences|
+-----------+----------+----------+----------+--------------+
|1 |ads |11 |ads |8.0000 |
|2 |page views|12 |page views|7.5000 |
|1 |reviews |7 |reviews |5.0000 |
+-----------+----------+----------+----------+--------------+
- 根据题目描述, 需要筛选出多个大于平均活动的
event_type, 简单理解就是按照business_id统计, 总数必须大于 1
select e.business_id
from Events as e
inner join (select distinct event_type, avg(occurences) over (partition by event_type) as avg_occurences
from Events) as t
on e.event_type = t.event_type
where e.occurences > t.avg_occurences
group by e.business_id
having count(*) >= 2;
查询结果
+-----------+
|business_id|
+-----------+
|1 |
+-----------+