SQL 进阶挑战(16 - 20)

169 阅读21分钟

持续创作,加速成长!这是我参与「掘金日新计划 · 6 月更文挑战」的第 16 天,点击查看活动详情

文章首发地址:村雨遥

SQL16 得分不小于平均分的最低分

描述

示例数据 exam_record表(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):

iduidexam_idstart_timesubmit_timescore
1100190012020-01-02 09:01:012020-01-02 09:21:0180
2100290012021-09-05 19:01:012021-09-05 19:40:0189
3100290022021-09-02 12:01:01(NULL)(NULL)
4100290032021-09-01 12:01:01(NULL)(NULL)
5100290012021-02-02 19:01:012021-02-02 19:30:0187
6100290022021-05-05 18:01:012021-05-05 18:59:0290
7100390022021-02-06 12:01:01(NULL)(NULL)
8100390032021-09-07 10:01:012021-09-07 10:31:0186
9100490032021-09-06 12:01:01(NULL)(NULL)

examination_info表(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间)

idexam_idtagdifficultydurationrelease_time
19001SQLhard602020-01-01 10:00:00
29002SQLeasy602020-02-01 10:00:00
39003算法medium802020-08-02 10:00:00

示例输出数据:

min_score_over_avg
87

解释:试卷9001和9002为SQL类别,作答这两份试卷的得分有[80,89,87,90],平均分为86.5,不小于平均分的最小分数为87

问题

请从试卷作答记录表中找到SQL试卷得分不小于该类试卷平均得分的用户最低得分。

示例

输入:drop table if exists examination_info;
CREATE TABLE  examination_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
    tag varchar(32) COMMENT '类别标签',
    difficulty varchar(8) COMMENT '难度',
    duration int NOT NULL COMMENT '时长',
    release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

drop table if exists exam_record;
CREATE TABLE  exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
  (9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
  (9002, 'SQL', 'easy', 60, '2020-02-01 10:00:00'),
  (9003, '算法', 'medium', 80, '2020-08-02 10:00:00');

INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-02 09:01:01', '2020-01-02 09:21:01', 80),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 89),
(1002, 9002, '2021-09-02 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', null, null),
(1002, 9001, '2021-02-02 19:01:01', '2021-02-02 19:30:01', 87),
(1002, 9002, '2021-05-05 18:01:01', '2021-05-05 18:59:02', 90),
(1003, 9002, '2021-02-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1004, 9003, '2021-09-06 12:01:01', null, null);
输出:87

解答

要计算得分的平均值,需要用到的知识点是:AVG(),此外,要从连接后的表中找到类别为 SQL 的事件的分数,需要用到条件筛查语句 SELECT ... FROM ... WHERE...。然后注意到试卷类别和得分信息处于两个不同的表,因此需要将两个表进行连接,此时需要用到连接语句 JOIN...ON...

再然后,要找出不小于平均分的最低分,则需要用到聚合函数 MIN()

SELECT MIN(exam_record.score) AS min_score_over_avg FROM exam_record 
JOIN examination_info ON exam_record.exam_id = examination_info.exam_id
WHERE examination_info.tag = "SQL"
AND score >=
(SELECT AVG(exam_record.score) FROM exam_record JOIN examination_info ON exam_record.exam_id = examination_info.exam_id WHERE tag = "SQL")

SQL17 平均活跃天数和月活人数

描述

用户在牛客试卷作答区作答记录存储在表exam_record中,内容如下:

exam_record表(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)

iduidexam_idstart_timesubmit_timescore
1100190012021-07-02 09:01:012021-07-02 09:21:0180
2100290012021-09-05 19:01:012021-09-05 19:40:0181
3100290022021-09-02 12:01:01(NULL)(NULL)
4100290032021-09-01 12:01:01(NULL)(NULL)
5100290012021-07-02 19:01:012021-07-02 19:30:0182
6100290022021-07-05 18:01:012021-07-05 18:59:0290
7100390022021-07-06 12:01:01(NULL)(NULL)
8100390032021-09-07 10:01:012021-09-07 10:31:0186
9100490032021-09-06 12:01:01(NULL)(NULL)
10100290032021-09-01 12:01:012021-09-01 12:31:0181
11100590012021-09-01 12:01:012021-09-01 12:31:0188
12100690022021-09-02 12:11:012021-09-02 12:31:0189
1310079002

问题

请计算2021年每个月里试卷作答区用户平均月活跃天数avg_active_days和月度活跃人数mau,上面数据的示例输出如下:

monthavg_active_daysmau
2021071.502
2021091.254

解释:2021年7月有2人活跃,共活跃了3天(1001活跃1天,1002活跃2天),平均活跃天数1.5;2021年9月有4人活跃,共活跃了5天,平均活跃天数1.25,结果保留2位小数。

注:此处活跃指有交卷行为。

示例

输入:
drop table if exists exam_record;
CREATE TABLE exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-07-02 09:01:01', '2021-07-02 09:21:01', 80),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 81),
(1002, 9002, '2021-09-02 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', null, null),
(1002, 9001, '2021-07-02 19:01:01', '2021-07-02 19:30:01', 82),
(1002, 9002, '2021-07-05 18:01:01', '2021-07-05 18:59:02', 90),
(1003, 9002, '2021-07-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1004, 9003, '2021-09-06 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 81),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1006, 9002, '2021-09-02 12:11:01', '2021-09-02 12:31:01', 89),
(1007, 9002, '2020-09-02 12:11:01', '2020-09-02 12:31:01', 89);
输出:
202107|1.50|2
202109|1.25|4

解答

按照月份分组,则需要用到 GROUP BY 知识点,同时要注意,因为是按月份分组,所以需要用到函数 DATE_FORMAT() 来匹配月份。同时需要筛选出活跃的用户,所以肯定少不了过滤语句 SELECT FROM ... FROM ...。最后,要进行统计,怎么能少的了 COUNT() 函数。

而再观察给出的结果,可以发现活跃天数都是保留的两位小数,所以此时还需要用到 ROUND()

SELECT date_format(submit_time, '%Y%m') AS month,
ROUND((COUNT(DISTINCT uid, date_format(submit_time, '%Y%m%d'))) / COUNT(DISTINCT uid), 2) AS avg_active_days,
COUNT(DISTINCT uid) AS mau
FROM exam_record
WHERE submit_time IS NOT null
AND year(submit_time) = 2021
GROUP BY date_format(submit_time, '%Y%m')

SQL18 月总刷题数和日均刷题数

描述

现有一张题目练习记录表practice_record,示例内容如下:

iduidquestion_idsubmit_timescore
1100180012021-08-02 11:41:0160
2100280012021-09-02 19:30:0150
3100280012021-09-02 19:20:0170
4100280022021-09-02 19:38:0170
5100380022021-08-01 19:38:0180

问题

请从中统计出2021年每个月里用户的月总刷题数month_q_cnt 和日均刷题数avg_day_q_cnt(按月份升序排序)以及该年的总体情况,示例数据输出如下:

submit_monthmonth_q_cntavg_day_q_cnt
20210820.065
20210930.100
2021汇总50.161

解释:2021年8月共有2次刷题记录,日均刷题数为2/31=0.065(保留3位小数);2021年9月共有3次刷题记录,日均刷题数为3/30=0.100;2021年共有5次刷题记录(年度汇总平均无实际意义,这里我们按照31天来算5/31=0.161)

示例

输入:
drop table if exists practice_record;
CREATE TABLE  practice_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    question_id int NOT NULL COMMENT '题目ID',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO practice_record(uid,question_id,submit_time,score) VALUES
(1001, 8001, '2021-08-02 11:41:01', 60),
(1002, 8001, '2021-09-02 19:30:01', 50),
(1002, 8001, '2021-09-02 19:20:01', 70),
(1002, 8002, '2021-09-02 19:38:01', 70),
(1003, 8002, '2021-08-01 19:38:01', 80);
输出:
202108|2|0.065
202109|3|0.100
2021汇总|5|0.161

解答

SELECT
    COALESCE(DATE_FORMAT(submit_time,"%Y%m"),'2021汇总') submit_month,
    COUNT(submit_time) month_q_cnt,
    round(COUNT(submit_time) / MAX(DAY(last_day(submit_time))),3)avg_day_q_cnt
FROM
    practice_record
WHERE
    year(submit_time) = '2021'
GROUP BY
    DATE_FORMAT(submit_time,"%Y%m") WITH ROLLUP;

SQL19 未完成试卷数大于1的有效用户

描述

现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分),示例数据如下:

iduidexam_idstart_timesubmit_timescore
1100190012021-07-02 09:01:012021-07-02 09:21:0180
2100290012021-09-05 19:01:012021-09-05 19:40:0181
3100290022021-09-02 12:01:01(NULL)(NULL)
4100290032021-09-01 12:01:01(NULL)(NULL)
5100290012021-07-02 19:01:012021-07-02 19:30:0182
6100290022021-07-05 18:01:012021-07-05 18:59:0290
7100390022021-07-06 12:01:01(NULL)(NULL)
8100390032021-09-07 10:01:012021-09-07 10:31:0186
9100490032021-09-06 12:01:01(NULL)(NULL)
10100290032021-09-01 12:01:012021-09-01 12:31:0181
11100590012021-09-01 12:01:012021-09-01 12:31:0188
12100590022021-09-01 12:01:012021-09-01 12:31:0188
13100690022021-09-02 12:11:012021-09-02 12:31:0189

还有一张试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间),示例数据如下:

idexam_idtagdifficultydurationrelease_time
19001SQLhard602020-01-01 10:00:00
29002SQLeasy602020-02-01 10:00:00
39003算法medium802020-08-02 10:00:00

问题

请统计2021年每个未完成试卷作答数大于1的有效用户的数据(有效用户指完成试卷作答数至少为1且未完成数小于5),输出用户ID、未完成试卷作答数、完成试卷作答数、作答过的试卷tag集合,按未完成试卷数量由多到少排序。示例数据的输出结果如下:

uidincomplete_cntcomplete_cntdetail
1002242021-09-01:算法;2021-07-02:SQL;2021-09-02:SQL;2021-09-05:SQL;2021-07-05:SQL

解释:2021年的作答记录中,除了1004,其他用户均满足有效用户定义,但只有1002未完成试卷数大于1,因此只输出1002,detail中是1002作答过的试卷{日期:tag}集合,日期和tag间用:连接,多元素间用;连接。

示例

输入:
drop table if exists examination_info;
CREATE TABLE examination_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
    tag varchar(32) COMMENT '类别标签',
    difficulty varchar(8) COMMENT '难度',
    duration int NOT NULL COMMENT '时长',
    release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

drop table if exists exam_record;
CREATE TABLE  exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
  (9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
  (9002, 'SQL', 'easy', 60, '2020-02-01 10:00:00'),
  (9003, '算法', 'medium', 80, '2020-08-02 10:00:00');

INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-07-02 09:01:01', '2021-07-02 09:21:01', 80),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 81),
(1002, 9002, '2021-09-02 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', null, null),
(1002, 9001, '2021-07-02 19:01:01', '2021-07-02 19:30:01', 82),
(1002, 9002, '2021-07-05 18:01:01', '2021-07-05 18:59:02', 90),
(1003, 9002, '2021-07-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1004, 9003, '2021-09-06 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 81),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1005, 9002, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1006, 9002, '2021-09-02 12:11:01', '2021-09-02 12:31:01', 89);
输出:
1002|2|4|2021-07-02:SQL;2021-07-05:SQL;2021-09-01:算法;2021-09-02:SQL;2021-09-05:SQL

解答

SELECT uid, COUNT(incomplete) incomplete_cnt,
COUNT(complete) complete_cnt,
GROUP_CONCAT(DISTINCT concat_ws(':', DATE(start_time), tag) SEPARATOR ';') detail
FROM (
    SELECT uid, tag, start_time,
        IF(submit_time IS null, 1, null) AS incomplete,
        IF(submit_time IS null, null, 1) AS complete
    FROM exam_record 
    JOIN examination_info USING(exam_id)
    WHERE year(start_time)=2021
) AS exam_complete_rec
GROUP BY uid
HAVING complete_cnt >= 1 AND incomplete_cnt BETWEEN 2 AND 4
ORDER BY incomplete_cnt DESC;

SQL20 月均完成试卷数不小于3的用户爱作答的类别

描述

现有试卷作答记录表exam_record(uid:用户ID, exam_id:试卷ID, start_time:开始作答时间, submit_time:交卷时间,没提交的话为NULL, score:得分),示例数据如下:

iduidexam_idstart_timesubmit_timescore
1100190012021-07-02 09:01:01(NULL)(NULL)
2100290032021-09-01 12:01:012021-09-01 12:21:0160
3100290022021-09-02 12:01:012021-09-02 12:31:0170
4100290012021-09-05 19:01:012021-09-05 19:40:0181
5100290022021-07-06 12:01:01(NULL)(NULL)
6100390032021-09-07 10:01:012021-09-07 10:31:0186
7100390032021-09-08 12:01:012021-09-08 12:11:0140
8100390012021-09-08 13:01:01(NULL)(NULL)
9100390022021-09-08 14:01:01(NULL)(NULL)
10100390032021-09-08 15:01:01(NULL)(NULL)
11100590012021-09-01 12:01:012021-09-01 12:31:0188
12100590022021-09-01 12:01:012021-09-01 12:31:0188
13100590022021-09-02 12:11:012021-09-02 12:31:0189

试卷信息表examination_info(exam_id:试卷ID, tag:试卷类别, difficulty:试卷难度, duration:考试时长, release_time:发布时间),示例数据如下:

idexam_idtagdifficultydurationrelease_time
19001SQLhard602020-01-01 10:00:00
29002C++easy602020-02-01 10:00:00
39003算法medium802020-08-02 10:00:00

问题

请从表中统计出 “当月均完成试卷数”不小于3的用户们爱作答的类别及作答次数,按次数降序输出,示例输出如下:

tagtag_cnt
C++4
SQL2
算法1

解释:用户1002和1005在2021年09月的完成试卷数目均为3,其他用户均小于3;然后用户1002和1005作答过的试卷tag分布结果按作答次数降序排序依次为C++、SQL、算法。

示例

输入:
drop table if exists examination_info,exam_record;
CREATE TABLE examination_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
    tag varchar(32) COMMENT '类别标签',
    difficulty varchar(8) COMMENT '难度',
    duration int NOT NULL COMMENT '时长',
    release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
  (9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
  (9002, 'C++', 'easy', 60, '2020-02-01 10:00:00'),
  (9003, '算法', 'medium', 80, '2020-08-02 10:00:00');

INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-07-02 09:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', '2021-09-01 12:21:01', 60),
(1002, 9002, '2021-09-02 12:01:01', '2021-09-02 12:31:01', 70),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 81),
(1002, 9002, '2021-07-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1003, 9003, '2021-09-08 12:01:01', '2021-09-08 12:11:01', 40),
(1003, 9001, '2021-09-08 13:01:01', null, null),
(1003, 9002, '2021-09-08 14:01:01', null, null),
(1003, 9003, '2021-09-08 15:01:01', null, null),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1005, 9002, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1005, 9002, '2021-09-02 12:11:01', '2021-09-02 12:31:01', 89);
输出:
C++|4
SQL|2
算法|1

解答

根据题意可知,本质主要是统计月均完成试卷数不少于 3 的用户所喜爱的类别和作答词数,然后按次数降序输出,将题中的知识点进行分解,就可以大致得到以下小知识点:

  • 首先要进行筛选工作,那么肯定少不了 WHERE

  • 其次,要统计完成次数,那么需要用到分组语句 GROUP BY,然后要统计用户完成试卷的总数,需要用到统计某一列的聚合函数 COUNT(),而要统计完成试卷的月份数,此时需要用到 DATE_FORMAT()

  • 要将试卷作答记录和试卷信息两个表进行关联,需要用到 JOIN 知识点。

  • 最后则是降序输出,需要用到知识点 ORDER BY [列名] DESC

SELECT tag, COUNT(tag) AS tag_cnt 
FROM exam_record
JOIN examination_info USING(exam_id)
WHERE uid in 
(SELECT uid FROM exam_record WHERE submit_time IS NOT NULL GROUP BY uid HAVING COUNT(exam_id) / COUNT(DISTINCT DATE_FORMAT(start_time, "%Y%m")) >= 3)
GROUP BY tag
ORDER BY
tag_cnt
DESC;