1. 题目描述
Traffic 表:
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| activity | enum |
| activity_date | date |
+---------------+---------+
该表可能有重复的行 activity 列是 ENUM 类型, 可能取 ('login', 'logout', 'jobs', 'groups', 'homepage') 几个值之一 编写解决方案, 找出从今天起最多90天内, 每个日期该日期首次登录的用户数, 假设今天是2019-06-30
2. 测试用例
输入:
Traffic 表:
+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1 | login | 2019-05-01 |
| 1 | homepage | 2019-05-01 |
| 1 | logout | 2019-05-01 |
| 2 | login | 2019-06-21 |
| 2 | logout | 2019-06-21 |
| 3 | login | 2019-01-01 |
| 3 | jobs | 2019-01-01 |
| 3 | logout | 2019-01-01 |
| 4 | login | 2019-06-21 |
| 4 | groups | 2019-06-21 |
| 4 | logout | 2019-06-21 |
| 5 | login | 2019-03-01 |
| 5 | logout | 2019-03-01 |
| 5 | login | 2019-06-21 |
| 5 | logout | 2019-06-21 |
+---------+----------+---------------+
输出:
+------------+-------------+
| login_date | user_count |
+------------+-------------+
| 2019-05-01 | 1 |
| 2019-06-21 | 2 |
+------------+-------------+
解释:
请注意,我们只关心用户数非零的日期 ID 为 5 的用户第一次登陆于 2019-03-01, 因此他不算在 2019-06-21 的的统计内
3. 解题思路
- 查询每个用户最终登录日期
select user_id, min(activity_date) as first_login_date
from Traffic
where activity = 'login'
group by user_id
查询结果
+-------+----------------+
|user_id|first_login_date|
+-------+----------------+
|1 |2019-05-01 |
|2 |2019-06-21 |
|3 |2019-01-01 |
|4 |2019-06-21 |
|5 |2019-03-01 |
+-------+----------------+
- 查找用户的最早登录日期在
2019-06-30的90天内, 需要使用到*DATEDIFF函数*, 然后按照登录日期分组统计用户数量
select t.first_login_date as login_date, count(t.user_id) as user_count
from (select user_id, min(activity_date) as first_login_date
from Traffic
where activity = 'login'
group by user_id) as t
where DATEDIFF('2019-06-30', t.first_login_date) <= 90
group by t.first_login_date;
查询结果
+----------+----------+
|login_date|user_count|
+----------+----------+
|2019-05-01| 1 |
|2019-06-21| 2 |
+----------+----------+