本题来自牛客:SQL140 未完成率较高的50%用户近三个月答卷情况 未完成率较高的50%用户近三个月答卷情况_牛客题霸_牛客网 (nowcoder.com)
题干
现有用户信息表user_info(uid用户ID,nick_name昵称, achievement成就值, level等级, job职业方向, register_time注册时间):
| id | uid | nick_name | achievement | level | job | register_time |
|---|---|---|---|---|---|---|
| 1 | 1001 | 牛客1号 | 3200 | 7 | 算法 | 2020-01-01 10:00:00 |
| 2 | 1002 | 牛客2号 | 2500 | 6 | 算法 | 2020-01-01 10:00:00 |
| 3 | 1003 | 牛客3号♂ | 2200 | 5 | 算法 | 2020-01-01 10:00:00 |
试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间):
| id | exam_id | tag | difficulty | duration | release_time |
|---|---|---|---|---|---|
| 1 | 9001 | SQL | hard | 60 | 2020-01-01 10:00:00 |
| 2 | 9002 | SQL | hard | 80 | 2020-01-01 10:00:00 |
| 3 | 9003 | 算法 | hard | 80 | 2020-01-01 10:00:00 |
| 4 | 9004 | PYTHON | medium | 70 | 2020-01-01 10:00:00 |
试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
| id | uid | exam_id | start_time | submit_time | score |
|---|---|---|---|---|---|
| 1 | 1001 | 9001 | 2020-01-01 09:01:01 | 2020-01-01 09:21:59 | 90 |
| 15 | 1002 | 9001 | 2020-01-01 18:01:01 | 2020-01-01 18:59:02 | 90 |
| 13 | 1001 | 9001 | 2020-01-02 10:01:01 | 2020-01-02 10:31:01 | 89 |
| 2 | 1002 | 9001 | 2020-01-20 10:01:01 | ||
| 3 | 1002 | 9001 | 2020-02-01 12:11:01 | ||
| 5 | 1001 | 9001 | 2020-03-01 12:01:01 | ||
| 6 | 1002 | 9001 | 2020-03-01 12:01:01 | 2020-03-01 12:41:01 | 90 |
| 4 | 1003 | 9001 | 2020-03-01 19:01:01 | ||
| 7 | 1002 | 9001 | 2020-05-02 19:01:01 | 2020-05-02 19:32:00 | 90 |
| 14 | 1001 | 9002 | 2020-01-01 12:11:01 | ||
| 8 | 1001 | 9002 | 2020-01-02 19:01:01 | 2020-01-02 19:59:01 | 69 |
| 9 | 1001 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:20:01 | 99 |
| 10 | 1002 | 9002 | 2020-02-02 12:01:01 | ||
| 11 | 1002 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:43:01 | 81 |
| 12 | 1002 | 9002 | 2020-03-02 12:11:01 | ||
| 17 | 1001 | 9002 | 2020-05-05 18:01:01 | ||
| 16 | 1002 | 9003 | 2020-05-06 12:01:01 |
请统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,每个月的答卷数目和完成数目。按用户ID、月份升序排序。
由示例数据结果输出如下:
| uid | start_month | total_cnt | complete_cnt |
|---|---|---|---|
| 1002 | 202002 | 3 | 1 |
| 1002 | 202003 | 2 | 1 |
| 1002 | 202005 | 2 | 1 |
解释:各个用户对SQL试卷的未完成数、作答总数、未完成率如下:
| uid | incomplete_cnt | total_cnt | incomplete_rate |
|---|---|---|---|
| 1001 | 3 | 7 | 0.4286 |
| 1002 | 4 | 8 | 0.5000 |
| 1003 | 1 | 1 | 1.0000 |
1001、1002、1003分别排在1.0、0.5、0.0的位置,因此较高的50%用户(排位<=0.5)为1002、1003; 1003不是6级或7级;
有试卷作答记录的近三个月为202005、202003、202002;
这三个月里1002的作答题数分别为3、2、2,完成数目分别为1、1、1。
测试数据
drop table if exists examination_info,user_info,exam_record;
CREATE TABLE examination_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
tag varchar(32) COMMENT '类别标签',
difficulty varchar(8) COMMENT '难度',
duration int NOT NULL COMMENT '时长',
release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE user_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int UNIQUE NOT NULL COMMENT '用户ID',
`nick_name` varchar(64) COMMENT '昵称',
achievement int COMMENT '成就值',
level int COMMENT '用户等级',
job varchar(32) COMMENT '职业方向',
register_time datetime COMMENT '注册时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO user_info(uid,`nick_name`,achievement,`level`,job,register_time) VALUES
(1001, '牛客1', 3200, 7, '算法', '2020-01-01 10:00:00'),
(1002, '牛客2号', 2500, 6, '算法', '2020-01-01 10:00:00'),
(1003, '牛客3号♂', 2200, 5, '算法', '2020-01-01 10:00:00');
INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
(9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
(9002, 'SQL', 'hard', 80, '2020-01-01 10:00:00'),
(9003, '算法', 'hard', 80, '2020-01-01 10:00:00'),
(9004, 'PYTHON', 'medium', 70, '2020-01-01 10:00:00');
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-01 09:01:01', '2020-01-01 09:21:59', 90),
(1002, 9001, '2020-01-20 10:01:01', null, null),
(1002, 9001, '2020-02-01 12:11:01', null, null),
(1003, 9001, '2020-03-01 19:01:01', null, null),
(1001, 9001, '2020-03-01 12:01:01', null, null),
(1002, 9001, '2020-03-01 12:01:01', '2020-03-01 12:41:01', 90),
(1002, 9001, '2020-05-02 19:01:01', '2020-05-02 19:32:00', 90),
(1001, 9002, '2020-01-02 19:01:01', '2020-01-02 19:59:01', 69),
(1001, 9002, '2020-02-02 12:01:01', '2020-02-02 12:20:01', 99),
(1002, 9002, '2020-02-02 12:01:01', null, null),
(1002, 9002, '2020-02-02 12:01:01', '2020-02-02 12:43:01', 81),
(1002, 9002, '2020-03-02 12:11:01', null, null),
(1001, 9001, '2020-01-02 10:01:01', '2020-01-02 10:31:01', 89),
(1001, 9002, '2020-01-01 12:11:01', null, null),
(1002, 9001, '2020-01-01 18:01:01', '2020-01-01 18:59:02', 90),
(1002, 9003, '2020-05-06 12:01:01', null, null),
(1001, 9002, '2020-05-05 18:01:01', null, null);
题干信息解读
题干中的这句话:“SQL试卷上未完成率较高的50%用户中”中的 “未完成率”统计口径为未完成的题目数量除以总作答的数量。
“1001、1002、1003分别排在1.0、0.5、0.0的位置,因此较高的50%用户(排位<=0.5)为1002、1003; 1003不是6级或7级;”这里的排位是需要用窗口函数统计的。
不是根据未完成率来排位!不是根据未完成率来排位!不是根据未完成率来排位! 因为刚好未完成率与排位的结果相近,然后就载在这了。
这道题就前面求解符合要求的用户部分比较困难,后面统计近三个月的作答数量跟完成数目都比较简单。
窗口函数
本题需要用到的窗口函数为percent_rank(), 作用:用于返回某个数值在数据集中的百分比排位,对顺序敏感,取决over里的order by的顺序 具体用法:
select tag,
percent_rank() over(order by tag asc)
from examination_info;
函数解析来源:www.cnblogs.com/xuwinwin/p/…
解题
这里分两步
- 求解出符合要求的用户
- 根据题干条件统计数据
求解出符合要求的用户
未完成率统计
select
tba.uid,
sum(if(tba.submit_time is null, 1, 0)) as incomplete_cnt, -- 未完成数量
count(tba.start_time) as total_cnt, -- 总答题数量
sum(if(tba.submit_time is null, 1, 0))/count(tba.start_time) as incomplete_rate -- 未完成率
from exam_record as tba inner join (select exam_id from examination_info where tag='SQL') as tbb
on tba.exam_id=tbb.exam_id
group by tba.uid
;
根据这个口径排序
select
tba.uid,
percent_rank() over(order by sum(if(tba.submit_time is null, 1, 0))/count(tba.start_time) desc) as pk
from exam_record as tba inner join (select exam_id from examination_info where tag='SQL') as tbb
on tba.exam_id=tbb.exam_id
group by tba.uid
;
取排位小于等于0.5的数据,且用户等级大于等于6的用户
select
uid
from
(select
tba.uid,
percent_rank() over(order by sum(if(tba.submit_time is null, 1, 0))/count(tba.start_time) desc) as pk
from exam_record as tba inner join (select exam_id from examination_info where tag='SQL') as tbb
on tba.exam_id=tbb.exam_id
group by tba.uid) as t
where t.pk<=0.5 and uid in (select uid from user_info where level>=6)
;
统计符合要求的用户的数据
select
uid,
start_month,
count(start_month) as total_cnt,
count(submit_time) as complete_cnt
from
(select
uid,
DATE_FORMAT(start_time,"%Y%m") as start_month,
submit_time,
dense_rank() over(partition by uid order by DATE_FORMAT(start_time,"%Y%m") desc) as ranking
from exam_record) as t
where t.ranking<=3
and uid in
(select
uid
from
(select
tba.uid,
percent_rank() over(order by sum(if(tba.submit_time is null, 1, 0))/count(tba.start_time) desc) as pk
from exam_record as tba inner join (select exam_id from examination_info where tag='SQL') as tbb
on tba.exam_id=tbb.exam_id
group by tba.uid) as t
where t.pk<=0.5 and uid in (select uid from user_info where level>=6)
)
group by uid, start_month
order by uid, start_month
;
最终结果如下: