每日SQL一练#2023102003

135 阅读9分钟

本题来自牛客:SQL140 未完成率较高的50%用户近三个月答卷情况 未完成率较高的50%用户近三个月答卷情况_牛客题霸_牛客网 (nowcoder.com)

题干

现有用户信息表user_info(uid用户ID,nick_name昵称, achievement成就值, level等级, job职业方向, register_time注册时间):

iduidnick_nameachievementleveljobregister_time
11001牛客1号32007算法2020-01-01 10:00:00
21002牛客2号25006算法2020-01-01 10:00:00
31003牛客3号♂22005算法2020-01-01 10:00:00

试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间):

idexam_idtagdifficultydurationrelease_time
19001SQLhard602020-01-01 10:00:00
29002SQLhard802020-01-01 10:00:00
39003算法hard802020-01-01 10:00:00
49004PYTHONmedium702020-01-01 10:00:00

试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):

iduidexam_idstart_timesubmit_timescore
1100190012020-01-01 09:01:012020-01-01 09:21:5990
15100290012020-01-01 18:01:012020-01-01 18:59:0290
13100190012020-01-02 10:01:012020-01-02 10:31:0189
2100290012020-01-20 10:01:01
3100290012020-02-01 12:11:01
5100190012020-03-01 12:01:01
6100290012020-03-01 12:01:012020-03-01 12:41:0190
4100390012020-03-01 19:01:01
7100290012020-05-02 19:01:012020-05-02 19:32:0090
14100190022020-01-01 12:11:01
8100190022020-01-02 19:01:012020-01-02 19:59:0169
9100190022020-02-02 12:01:012020-02-02 12:20:0199
10100290022020-02-02 12:01:01
11100290022020-02-02 12:01:012020-02-02 12:43:0181
12100290022020-03-02 12:11:01
17100190022020-05-05 18:01:01
16100290032020-05-06 12:01:01

请统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,每个月的答卷数目和完成数目。按用户ID、月份升序排序。

由示例数据结果输出如下:

uidstart_monthtotal_cntcomplete_cnt
100220200231
100220200321
100220200521

解释:各个用户对SQL试卷的未完成数、作答总数、未完成率如下:

uidincomplete_cnttotal_cntincomplete_rate
1001370.4286
1002480.5000
1003111.0000

1001、1002、1003分别排在1.0、0.5、0.0的位置,因此较高的50%用户(排位<=0.5)为1002、1003; 1003不是6级或7级;

有试卷作答记录的近三个月为202005、202003、202002;
这三个月里1002的作答题数分别为3、2、2,完成数目分别为1、1、1。

测试数据

drop table if exists examination_info,user_info,exam_record;
CREATE TABLE examination_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
    tag varchar(32) COMMENT '类别标签',
    difficulty varchar(8) COMMENT '难度',
    duration int NOT NULL COMMENT '时长',
    release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE user_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int UNIQUE NOT NULL COMMENT '用户ID',
    `nick_name` varchar(64) COMMENT '昵称',
    achievement int COMMENT '成就值',
    level int COMMENT '用户等级',
    job varchar(32) COMMENT '职业方向',
    register_time datetime COMMENT '注册时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO user_info(uid,`nick_name`,achievement,`level`,job,register_time) VALUES
  (1001, '牛客1', 3200, 7, '算法', '2020-01-01 10:00:00'),
  (1002, '牛客2号', 2500, 6, '算法', '2020-01-01 10:00:00'),
  (1003, '牛客3号♂', 2200, 5, '算法', '2020-01-01 10:00:00');

INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
  (9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
  (9002, 'SQL', 'hard', 80, '2020-01-01 10:00:00'),
  (9003, '算法', 'hard', 80, '2020-01-01 10:00:00'),
  (9004, 'PYTHON', 'medium', 70, '2020-01-01 10:00:00');
	
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-01 09:01:01', '2020-01-01 09:21:59', 90),
(1002, 9001, '2020-01-20 10:01:01', null, null),
(1002, 9001, '2020-02-01 12:11:01', null, null),
(1003, 9001, '2020-03-01 19:01:01', null, null),
(1001, 9001, '2020-03-01 12:01:01', null, null),
(1002, 9001, '2020-03-01 12:01:01', '2020-03-01 12:41:01', 90),
(1002, 9001, '2020-05-02 19:01:01', '2020-05-02 19:32:00', 90),
(1001, 9002, '2020-01-02 19:01:01', '2020-01-02 19:59:01', 69),
(1001, 9002, '2020-02-02 12:01:01', '2020-02-02 12:20:01', 99),
(1002, 9002, '2020-02-02 12:01:01', null, null),
(1002, 9002, '2020-02-02 12:01:01', '2020-02-02 12:43:01', 81),
(1002, 9002, '2020-03-02 12:11:01', null, null),
(1001, 9001, '2020-01-02 10:01:01', '2020-01-02 10:31:01', 89),
(1001, 9002, '2020-01-01 12:11:01', null, null),
(1002, 9001, '2020-01-01 18:01:01', '2020-01-01 18:59:02', 90),
(1002, 9003, '2020-05-06 12:01:01', null, null),
(1001, 9002, '2020-05-05 18:01:01', null, null);

题干信息解读

题干中的这句话:“SQL试卷上未完成率较高的50%用户中”中的 “未完成率”统计口径为未完成的题目数量除以总作答的数量。

“1001、1002、1003分别排在1.0、0.5、0.0的位置,因此较高的50%用户(排位<=0.5)为1002、1003; 1003不是6级或7级;”这里的排位是需要用窗口函数统计的。

不是根据未完成率来排位!不是根据未完成率来排位!不是根据未完成率来排位! 因为刚好未完成率与排位的结果相近,然后就载在这了。

这道题就前面求解符合要求的用户部分比较困难,后面统计近三个月的作答数量跟完成数目都比较简单。

窗口函数

本题需要用到的窗口函数为percent_rank(), 作用:用于返回某个数值在数据集中的百分比排位,对顺序敏感,取决over里的order by的顺序 具体用法:

select tag,
percent_rank() over(order by tag asc)
from examination_info;

image.png

函数解析来源:www.cnblogs.com/xuwinwin/p/…

解题

这里分两步

  • 求解出符合要求的用户
  • 根据题干条件统计数据

求解出符合要求的用户

未完成率统计

select
    tba.uid,
    sum(if(tba.submit_time is null, 1, 0)) as incomplete_cnt,  -- 未完成数量
    count(tba.start_time) as total_cnt, -- 总答题数量
    sum(if(tba.submit_time is null, 1, 0))/count(tba.start_time) as incomplete_rate  -- 未完成率
from exam_record as tba inner join (select exam_id from examination_info where tag='SQL') as tbb
on tba.exam_id=tbb.exam_id
group by tba.uid
;

image.png

根据这个口径排序

select
    tba.uid,
    percent_rank() over(order by sum(if(tba.submit_time is null, 1, 0))/count(tba.start_time) desc) as pk
from exam_record as tba inner join (select exam_id from examination_info where tag='SQL') as tbb
on tba.exam_id=tbb.exam_id
group by tba.uid
;

image.png

取排位小于等于0.5的数据,且用户等级大于等于6的用户

select
    uid
from 
    (select
        tba.uid,
        percent_rank() over(order by sum(if(tba.submit_time is null, 1, 0))/count(tba.start_time) desc) as pk
    from exam_record as tba inner join (select exam_id from examination_info where tag='SQL') as tbb
    on tba.exam_id=tbb.exam_id
    group by tba.uid) as t
where t.pk<=0.5 and uid in (select uid from user_info where level>=6)
;

image.png

统计符合要求的用户的数据

select
    uid,
    start_month,
    count(start_month) as total_cnt,
    count(submit_time) as complete_cnt
from
    (select
        uid,
        DATE_FORMAT(start_time,"%Y%m") as start_month,
        submit_time,
        dense_rank() over(partition by uid order by DATE_FORMAT(start_time,"%Y%m") desc) as ranking
    from exam_record) as t
where t.ranking<=3
and uid in 
    (select
        uid
    from 
        (select
            tba.uid,
            percent_rank() over(order by sum(if(tba.submit_time is null, 1, 0))/count(tba.start_time) desc) as pk
        from exam_record as tba inner join (select exam_id from examination_info where tag='SQL') as tbb
        on tba.exam_id=tbb.exam_id
        group by tba.uid) as t
    where t.pk<=0.5 and uid in (select uid from user_info where level>=6)
    )
group by uid, start_month
order by uid, start_month
;

最终结果如下:

image.png