每日SQL一练#2023102102

99 阅读7分钟

这题也是来源牛客。窗口函数做完了,开始聚合函数的题目了。 对试卷得分做min-max归一化_牛客题霸_牛客网 (nowcoder.com)

题干

现有试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间):

idexam_idtagdifficultydurationrelease_time
19001SQLhard602020-01-01 10:00:00
29002C++hard802020-01-01 10:00:00
39003算法hard802020-01-01 10:00:00
49004PYTHONmedium702020-01-01 10:00:00

试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):

iduidexam_idstart_timesubmit_timescore
6100390012020-01-02 12:01:012020-01-02 12:31:0168
9100190012020-01-02 10:01:012020-01-02 10:31:0189
1100190012020-01-01 09:01:012020-01-01 09:21:5990
12100290022021-05-05 18:01:01(NULL)(NULL)
3100490022020-01-01 12:01:012020-01-01 12:11:0160
2100390022020-01-01 19:01:012020-01-01 19:30:0175
7100190022020-01-02 12:01:012020-01-02 12:43:0181
10100290022020-01-01 12:11:012020-01-01 12:31:0183
4100390022020-01-01 12:01:012020-01-01 12:41:0190
5100290022020-01-02 19:01:012020-01-02 19:32:0090
11100290042021-09-06 12:01:01(NULL)(NULL)
8100190052020-01-02 12:11:01(NULL)(NULL)

在物理学及统计学数据计算时,有个概念叫min-max标准化,也被称为离差标准化,是对原始数据的线性变换,使结果值映射到[0 - 1]之间。

转换函数为:

请你将用户作答高难度试卷的得分在每份试卷作答记录内执行min-max归一化后缩放到[0,100]区间,并输出用户ID、试卷ID、归一化后分数平均值;最后按照试卷ID升序、归一化分数降序输出。(注:得分区间默认为[0,100],如果某个试卷作答记录中只有一个得分,那么无需使用公式,归一化并缩放后分数仍为原分数)。

由示例数据结果输出如下:

uidexam_idavg_new_score
1001900198
100390010
1002900288
1003900275
1001900270
100490020

解释:高难度试卷有9001、9002、9003;

作答了9001的记录有3条,分数分别为68、89、90,按给定公式归一化后分数为:0、95、100,而后两个得分都是用户1001作答的,因此用户1001对试卷9001的新得分为(95+100)/2≈98(只保留整数部分),用户1003对于试卷9001的新得分为0。最后结果按照试卷ID升序、归一化分数降序输出。

测试数据


drop table if exists examination_info,exam_record;
CREATE TABLE examination_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
    tag varchar(32) COMMENT '类别标签',
    difficulty varchar(8) COMMENT '难度',
    duration int NOT NULL COMMENT '时长',
    release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
  (9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
  (9002, 'C++', 'hard', 80, '2020-01-01 10:00:00'),
  (9003, '算法', 'hard', 80, '2020-01-01 10:00:00'),
  (9004, 'PYTHON', 'medium', 70, '2020-01-01 10:00:00'),
  (9005, 'WEB', 'hard', 80, '2020-01-01 10:00:00'),
  (9006, 'PYTHON', 'hard', 80, '2020-01-01 10:00:00'),
  (9007, 'web', 'hard', 80, '2020-01-01 10:00:00'),
  (9008, 'Web', 'medium', 70, '2020-01-01 10:00:00'),
  (9009, 'WEB', 'medium', 70, '2020-01-01 10:00:00'),
  (9010, 'SQL', 'medium', 70, '2020-01-01 10:00:00');
	
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-01 09:01:01', '2020-01-01 09:21:59', 90),
(1003, 9002, '2020-01-01 19:01:01', '2020-01-01 19:30:01', 75),
(1004, 9002, '2020-01-01 12:01:01', '2020-01-01 12:11:01', 60),
(1003, 9002, '2020-01-01 12:01:01', '2020-01-01 12:41:01', 90),
(1002, 9002, '2020-01-02 19:01:01', '2020-01-02 19:32:00', 90),
(1003, 9001, '2020-01-02 12:01:01', '2020-01-02 12:31:01', 68),
(1001, 9002, '2020-01-02 12:01:01', '2020-01-02 12:43:01', 81),
(1001, 9005, '2020-01-02 12:11:01', null, null),
(1001, 9001, '2020-01-02 10:01:01', '2020-01-02 10:31:01', 89),
(1002, 9002, '2020-01-01 12:11:01', '2020-01-01 12:31:01', 83),
(1002, 9004, '2021-09-06 12:01:01', null, null),
(1002, 9002, '2021-05-05 18:01:01', null, null);

题目解析

关于max-min归一化可以参考这个:www.cnblogs.com/chaosimple/…

关于题干说的两次计算都取整,这里只有最后一次计算需要取整,否则最终计算结果会有差异。

解答:

-- 求解每种试卷的最大最小值

select
    distinct exam_id,
    max(score) over(partition by exam_id) as max_score,
    min(score) over(partition by exam_id) as min_score
from exam_record
where exam_id in (select exam_id from examination_info where difficulty='hard')
and score is not null
;

-- 求出每条作答记录归一化后的值

select
    tba.uid,
    tba.exam_id,
    (tba.score-tbb.min_score)/(tbb.max_score-tbb.min_score) * 100 as new_score
from exam_record as tba
inner join
    (select
        distinct exam_id,
        max(score) over(partition by exam_id) as max_score,
        min(score) over(partition by exam_id) as min_score
    from exam_record
    where exam_id in (select exam_id from examination_info where difficulty='hard')
    and score is not null
    ) as tbb
on tba.exam_id=tbb.exam_id
;

image.png

然后是最终代码:

select
    distinct uid, exam_id,
    round(avg(new_score) over(partition by uid, exam_id), 0) as avg_new_score
from 
    (select
        tba.uid,
        tba.exam_id,
        ifnull((tba.score-tbb.min_score)/(tbb.max_score-tbb.min_score) * 100, score) as new_score
    from exam_record as tba
    inner join
        (select
            distinct exam_id,
            max(score) over(partition by exam_id) as max_score,
            min(score) over(partition by exam_id) as min_score
        from exam_record
        where exam_id in (select exam_id from examination_info where difficulty='hard')
        ) as tbb
    on tba.exam_id=tbb.exam_id where tba.score is not null) as tb
group by uid, exam_id, new_score
order by exam_id asc, avg_new_score desc
;

image.png