1415_Easy_学生们参加各科测试的次数
学生表: Students
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| student_name | varchar |
+---------------+---------+
主键为 student_id(学生ID),该表内的每一行都记录有学校一名学生的信息。
科目表: Subjects
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| subject_name | varchar |
+--------------+---------+
主键为 subject_name(科目名称),每一行记录学校的一门科目名称。
考试表: Examinations
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| student_id | int |
| subject_name | varchar |
+--------------+---------+
这张表压根没有主键,可能会有重复行。
学生表里的一个学生修读科目表里的每一门科目,而这张考试表的每一行记录就表示学生表里的某个学生参加了一次科目表里某门科目的测试。
要求写一段 SQL 语句,查询出每个学生参加每一门科目测试的次数,结果按 student_id 和 subject_name 排序。查询结构格式如下所示:
Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1 | Alice |
| 2 | Bob |
| 13 | John |
| 6 | Alex |
+------------+--------------+
Subjects table:
+--------------+
| subject_name |
+--------------+
| Math |
| Physics |
| Programming |
+--------------+
Examinations table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1 | Math |
| 1 | Physics |
| 1 | Programming |
| 2 | Programming |
| 1 | Physics |
| 1 | Math |
| 13 | Math |
| 13 | Programming |
| 13 | Physics |
| 2 | Math |
| 1 | Math |
+------------+--------------+
Result table:
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1 | Alice | Math | 3 |
| 1 | Alice | Physics | 2 |
| 1 | Alice | Programming | 1 |
| 2 | Bob | Math | 1 |
| 2 | Bob | Physics | 0 |
| 2 | Bob | Programming | 1 |
| 6 | Alex | Math | 0 |
| 6 | Alex | Physics | 0 |
| 6 | Alex | Programming | 0 |
| 13 | John | Math | 1 |
| 13 | John | Physics | 1 |
| 13 | John | Programming | 1 |
+------------+--------------+--------------+----------------+
结果表需包含所有学生和所有科目(即便测试次数为0): Alice 参加了 3 次数学测试, 2 次物理测试,以及 1 次编程测试; Bob 参加了 1 次数学测试, 1 次编程测试,没有参加物理测试; Alex 啥测试都没参加; John 参加了数学、物理、编程测试各 1 次。
建表+ 导数
create table 1415_Students(student_id int, student_name varchar(30)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1415_Students" overwrite into table 1415_Students;
create table 1415_Subjects(subject_name varchar(30)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1415_Subjects" overwrite into table 1415_Subjects;
create table 1415_Examinations(student_id int, subject_name varchar(30)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1415_Examinations" overwrite into table 1415_Examinations;
知识点:笛卡尔积、if用法、group by去重
select
t.student_id, t.student_name, t.subject_name , if(t1.cnt is null, 0, t1.cnt) attended_exams
from
(
select
stu.student_id student_id, stu.student_name student_name, sub.subject_name subject_name
from 1415_Students stu
join 1415_Subjects sub
) t
left join
(
select
exam.student_id, exam.subject_name, count(1) cnt
from 1415_Examinations exam
group by exam.student_id, exam.subject_name
) t1
on t.student_id = t1.student_id and t.subject_name = t1.subject_name
order by t.student_id, t.subject_name
执行结果:
1420_Medium_找到连续区间的开始和结束数字
表:Logs
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| log_id | int |
+---------------+---------+
id 是上表的主键。上表的每一行包含日志表中的一个 ID。
后来一些 ID 从 Logs 表中删除。编写一个 SQL 查询得到 Logs 表中的连续区间的开始数字和结束数字。
将查询表按照 start_id 排序。
查询结果格式如下面的例子:
Logs表:
+------------+
| log_id |
+------------+
| 1 |
| 2 |
| 3 |
| 7 |
| 8 |
| 10 |
+------------+
结果表:
+------------+--------------+
| start_id | end_id |
+------------+--------------+
| 1 | 3 |
| 7 | 8 |
| 10 | 10 |
+------------+--------------+
结果表应包含 Logs 表中的所有区间。
从 1 到 3 在表中。
从 4 到 6 不在表中。
从 7 到 8 在表中。
9 不在表中。
10 在表中。
建表+导数
create table 1420_Logs(log_id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1420_Logs" overwrite into table 1420_Logs;
select * from 1420_Logs;
知识点:rank()排序函数,first_value()开窗函数,over()窗口大小控制
select
t1.start_id, t1.end_id
from
(
select
first_value(log_id) over(partition by stand order by log_id) start_id,
first_value(log_id) over(partition by stand order by log_id desc) end_id
from
(
select
log_id, log_id-(rank() over(order by log_id)) stand
from 1420_Logs
) t
) t1
group by t1.start_id, t1.end_id
1425_Easy_不同国家的天气类型
国家表:Countries
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| country_id | int |
| country_name | varchar |
+---------------+---------+
country_id 是这张表的主键。
该表的每行有 country_id 和 country_name 两列。
天气表:Weather
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| country_id | int |
| weather_state | varchar |
| day | date |
+---------------+---------+
(country_id, day) 是该表的复合主键。
该表的每一行记录了某个国家某一天的天气情况。
写一段 SQL 来找到表中每个国家在 2019 年 11 月的天气类型。
天气类型的定义如下:当 weather_state 的平均值小于或等于15返回 Cold,当 weather_state 的平均值大于或等于 25 返回 Hot,否则返回 Warm。
你可以以任意顺序返回你的查询结果。
查询结果格式如下所示:
Countries table:
+------------+--------------+
| country_id | country_name |
+------------+--------------+
| 2 | USA |
| 3 | Australia |
| 7 | Peru |
| 5 | China |
| 8 | Morocco |
| 9 | Spain |
+------------+--------------+
Weather table:
+------------+---------------+------------+
| country_id | weather_state | day |
+------------+---------------+------------+
| 2 | 15 | 2019-11-01 |
| 2 | 12 | 2019-10-28 |
| 2 | 12 | 2019-10-27 |
| 3 | -2 | 2019-11-10 |
| 3 | 0 | 2019-11-11 |
| 3 | 3 | 2019-11-12 |
| 5 | 16 | 2019-11-07 |
| 5 | 18 | 2019-11-09 |
| 5 | 21 | 2019-11-23 |
| 7 | 25 | 2019-11-28 |
| 7 | 22 | 2019-12-01 |
| 7 | 20 | 2019-12-02 |
| 8 | 25 | 2019-11-05 |
| 8 | 27 | 2019-11-15 |
| 8 | 31 | 2019-11-25 |
| 9 | 7 | 2019-10-23 |
| 9 | 3 | 2019-12-23 |
+------------+---------------+------------+
Result table:
+--------------+--------------+
| country_name | weather_type |
+--------------+--------------+
| USA | Cold |
| Austraila | Cold |
| Peru | Hot |
| China | Warm |
| Morocco | Hot |
+--------------+--------------+
USA 11 月的平均 weather_state 为 (15) / 1 = 15 所以天气类型为 Cold。
Australia 11 月的平均 weather_state 为 (-2 + 0 + 3) / 3 = 0.333 所以天气类型为 Cold。
Peru 11 月的平均 weather_state 为 (25) / 1 = 25 所以天气类型为 Hot。
China 11 月的平均 weather_state 为 (16 + 18 + 21) / 3 = 18.333 所以天气类型为 Warm。
Morocco 11 月的平均 weather_state 为 (25 + 27 + 31) / 3 = 27.667 所以天气类型为 Hot。
我们并不知道 Spain 在 11 月的 weather_state 情况所以无需将他包含在结果中。
建表+导数
create table 1425_Countries(country_id int, country_name varchar(20)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1425_Countries" overwrite into table 1425_Countries;
create table 1425_Weather(country_id int, weather_state varchar(20), `day` date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1425_Weather" overwrite into table 1425_Weather;
知识点:date_format() 时间函数、case-when-then-else-end语句
select
t.country_name,
case
when (sum(cast(t.weather_state as int))/ count(1)) <= 15 then 'Cold'
when (sum(cast(t.weather_state as int))/ count(1)) >= 25 then 'Hot'
else 'Warm'
end as weather_type
from
(
select
c.country_name country_name, w.weather_state weather_state
from
1425_Weather w
left join 1425_Countries c on
w.country_id = c.country_id
where
date_format(w.`day`,'yyyy-MM')="2019-11"
) t
group by
t.country_name
执行结果:
1438_Easy_求团队人数
员工表:Employee
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| employee_id | int |
| team_id | int |
+---------------+---------+
employee_id 字段是这张表的主键,表中的每一行都包含每个员工的 ID 和他们所属的团队。
编写一个 SQL 查询,以求得每个员工所在团队的总人数。
查询结果中的顺序无特定要求。
查询结果格式示例如下:
Employee Table:
+-------------+------------+
| employee_id | team_id |
+-------------+------------+
| 1 | 8 |
| 2 | 8 |
| 3 | 8 |
| 4 | 7 |
| 5 | 9 |
| 6 | 9 |
+-------------+------------+
Result table:
+-------------+------------+
| employee_id | team_size |
+-------------+------------+
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |
| 4 | 1 |
| 5 | 2 |
| 6 | 2 |
+-------------+------------+
ID 为 1、2、3 的员工是 team_id 为 8 的团队的成员,
ID 为 4 的员工是 team_id 为 7 的团队的成员,
ID 为 5、6 的员工是 team_id 为 9 的团队的成员。
建表+导数
create table 1438_Employee(employee_id int, team_id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1438_Employee" overwrite into table 1438_Employee;
知识点:UDAF函数结合over()开窗秒出结果
select
employee_id,
count(1) over(partition by team_id) team_size
from 1438_Employee
1439_Medium_不同性别每日分数总计
表: Scores
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| player_name | varchar |
| gender | varchar |
| day | date |
| score_points | int |
+---------------+---------+
(gender, day)是该表的主键
一场比赛是在女队和男队之间举行的
该表的每一行表示一个名叫 (player_name) 性别为 (gender) 的参赛者在某一天获得了 (score_points) 的分数
如果参赛者是女性,那么 gender 列为 'F',如果参赛者是男性,那么 gender 列为 'M'
写一条SQL语句查询每种性别在每一天的总分,并按性别和日期对查询结果排序
下面是查询结果格式的例子:
Scores表:
+-------------+--------+------------+--------------+
| player_name | gender | day | score_points |
+-------------+--------+------------+--------------+
| Aron | F | 2020-01-01 | 17 |
| Alice | F | 2020-01-07 | 23 |
| Bajrang | M | 2020-01-07 | 7 |
| Khali | M | 2019-12-25 | 11 |
| Slaman | M | 2019-12-30 | 13 |
| Joe | M | 2019-12-31 | 3 |
| Jose | M | 2019-12-18 | 2 |
| Priya | F | 2019-12-31 | 23 |
| Priyanka | F | 2019-12-30 | 17 |
+-------------+--------+------------+--------------+
结果表:
+--------+------------+-------+
| gender | day | total |
+--------+------------+-------+
| F | 2019-12-30 | 17 |
| F | 2019-12-31 | 40 |
| F | 2020-01-01 | 57 |
| F | 2020-01-07 | 80 |
| M | 2019-12-18 | 2 |
| M | 2019-12-25 | 13 |
| M | 2019-12-30 | 26 |
| M | 2019-12-31 | 29 |
| M | 2020-01-07 | 36 |
+--------+------------+-------+
女性队伍:
第一天是 2019-12-30,Priyanka 获得 17 分,队伍的总分是 17 分
第二天是 2019-12-31, Priya 获得 23 分,队伍的总分是 40 分
第三天是 2020-01-01, Aron 获得 17 分,队伍的总分是 57 分
第四天是 2020-01-07, Alice 获得 23 分,队伍的总分是 80 分
男性队伍:
第一天是 2019-12-18, Jose 获得 2 分,队伍的总分是 2 分
第二天是 2019-12-25, Khali 获得 11 分,队伍的总分是 13 分
第三天是 2019-12-30, Slaman 获得 13 分,队伍的总分是 26 分
第四天是 2019-12-31, Joe 获得 3 分,队伍的总分是 29 分
第五天是 2020-01-07, Bajrang 获得 7 分,队伍的总分是 36 分
建表+导数
create table 1439_Scores(player_name varchar(20), gender varchar(20), `day` date, score_points int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1439_Scores" overwrite into table 1439_Scores;
select * from 1439_Scores;
知识点:UDAF函数配合over()开窗,并对窗口大小进行控制。
窗口内首行到当前行:ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
窗口内当前行的前三行到当前行:ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
窗口内当前行的前三行到当前行的下一行:ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING
窗口内当前行的前三行到窗口内最后一行:ROWS BETWEEN 3 PRECEDING AND UNBOUNDED FOLLOWING
此处开窗大小为:窗口内首行到当前行
select
gender, `day`,
sum(score_points) over(partition by gender order by 'day' ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) total
from 1439_Scores
执行结果:
1452_Medium_餐馆营业额变化增长
表: Customer
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| customer_id | int |
| name | varchar |
| visited_on | date |
| amount | int |
+---------------+---------+
(customer_id, visited_on) 是该表的主键
该表包含一家餐馆的顾客交易数据
visited_on 表示 (customer_id) 的顾客在 visited_on 那天访问了餐馆
amount 是一个顾客某一天的消费总额
你是餐馆的老板,现在你想分析一下可能的营业额变化增长(每天至少有一位顾客)
写一条 SQL 查询计算以 7 天(某日期 + 该日期前的 6 天)为一个时间段的顾客消费平均值
查询结果格式的例子如下:
- 查询结果按
visited_on排序 average_amount要 保留两位小数,日期数据的格式为 ('YYYY-MM-DD')
Customer 表:
+-------------+--------------+--------------+-------------+
| customer_id | name | visited_on | amount |
+-------------+--------------+--------------+-------------+
| 1 | Jhon | 2019-01-01 | 100 |
| 2 | Daniel | 2019-01-02 | 110 |
| 3 | Jade | 2019-01-03 | 120 |
| 4 | Khaled | 2019-01-04 | 130 |
| 5 | Winston | 2019-01-05 | 110 |
| 6 | Elvis | 2019-01-06 | 140 |
| 7 | Anna | 2019-01-07 | 150 |
| 8 | Maria | 2019-01-08 | 80 |
| 9 | Jaze | 2019-01-09 | 110 |
| 1 | Jhon | 2019-01-10 | 130 |
| 3 | Jade | 2019-01-10 | 150 |
+-------------+--------------+--------------+-------------+
结果表:
+--------------+--------------+----------------+
| visited_on | amount | average_amount |
+--------------+--------------+----------------+
| 2019-01-07 | 860 | 122.86 |
| 2019-01-08 | 840 | 120 |
| 2019-01-09 | 840 | 120 |
| 2019-01-10 | 1000 | 142.86 |
+--------------+--------------+----------------+
第一个七天消费平均值从 2019-01-01 到 2019-01-07 是 (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86
第二个七天消费平均值从 2019-01-02 到 2019-01-08 是 (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120
第三个七天消费平均值从 2019-01-03 到 2019-01-09 是 (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120
第四个七天消费平均值从 2019-01-04 到 2019-01-10 是 (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86
建表+导数
create table 1452_Customer(customer_id int, name varchar(20), visited_on date, amount int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1452_Customer" overwrite into table 1452_Customer;
select * from 1452_Customer;
知识点:rank()排序函数,over()开窗并控制窗口大小,round()函数控制输出位
select
t1.visited_on,
round(t1.total_amount / 7,2) average_amount
from
(
select
t.visited_on,
sum(t.amount) over(order by t.visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) total_amount,
rk
from
(
select
visited_on,
sum(amount) amount,
rank() over(order by visited_on) rk
from
1452 _Customer
group by
visited_on
) t
) t1
where
t1.rk > 6
执行结果:
1453_Easy_广告效果
表: Ads
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| ad_id | int |
| user_id | int |
| action | enum |
+---------------+---------+
(ad_id, user_id) 是该表的主键
该表的每一行包含一条广告的 ID(ad_id),用户的 ID(user_id) 和用户对广告采取的行为 (action)
action 列是一个枚举类型 ('Clicked', 'Viewed', 'Ignored') 。
一家公司正在运营这些广告并想计算每条广告的效果。
广告效果用点击通过率(Click-Through Rate:CTR)来衡量,公式如下:
写一条SQL语句来查询每一条广告的 ctr ,
ctr 要保留两位小数。结果需要按 ctr 降序、按 ad_id 升序 进行排序。
查询结果示例如下:
Ads 表:
+-------+---------+---------+
| ad_id | user_id | action |
+-------+---------+---------+
| 1 | 1 | Clicked |
| 2 | 2 | Clicked |
| 3 | 3 | Viewed |
| 5 | 5 | Ignored |
| 1 | 7 | Ignored |
| 2 | 7 | Viewed |
| 3 | 5 | Clicked |
| 1 | 4 | Viewed |
| 2 | 11 | Viewed |
| 1 | 2 | Clicked |
+-------+---------+---------+
结果表:
+-------+-------+
| ad_id | ctr |
+-------+-------+
| 1 | 66.67 |
| 3 | 50.00 |
| 2 | 33.33 |
| 5 | 0.00 |
+-------+-------+
对于 ad_id = 1, ctr = (2/(2+1)) * 100 = 66.67
对于 ad_id = 2, ctr = (1/(1+2)) * 100 = 33.33
对于 ad_id = 3, ctr = (1/(1+1)) * 100 = 50.00
对于 ad_id = 5, ctr = 0.00, 注意 ad_id = 5 没有被点击 (Clicked) 或查看 (Viewed) 过
注意我们不关心 action 为 Ingnored 的广告
结果按 ctr(降序),ad_id(升序)排序
建表+导数:
create table 1453_Ads(ad_id int, user_id int, `action` varchar(20)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1453_Ads" overwrite into table 1453_Ads;
select * from 1453_Ads;
知识点:case when then else 语法,round函数, coalesce函数-> 返回第一个不为null的值
select
ad_id,
coalesce(round(total_clicked / total_action,2), 0) ctr
from
(
select
ad_id,
sum( case when `action` = 'Ignored' then 0 else 1 end ) total_action,
sum( case when `action` = 'Clicked' then 1 else 0 end ) total_clicked
from
1453_Ads
group by
ad_id
) t
order by
ctr desc,
ad_id
执行结果:
1462_Easy_列出指定时间段内所有的下单产品
表: Products
+------------------+---------+
| Column Name | Type |
+------------------+---------+
| product_id | int |
| product_name | varchar |
| product_category | varchar |
+------------------+---------+
product_id 是该表主键。
该表包含该公司产品的数据。
表: Orders
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| product_id | int |
| order_date | date |
| unit | int |
+---------------+---------+
该表无主键,可能包含重复行。
product_id 是表单 Products 的外键。
unit 是在日期 order_date 内下单产品的数目。
写一个 SQL 语句,要求获取在 2020 年 2 月份下单的数量不少于 100 的产品的名字和数目。
返回结果表单的顺序无要求。
查询结果的格式如下:
Products 表:
+-------------+-----------------------+------------------+
| product_id | product_name | product_category |
+-------------+-----------------------+------------------+
| 1 | Leetcode Solutions | Book |
| 2 | Jewels of Stringology | Book |
| 3 | HP | Laptop |
| 4 | Lenovo | Laptop |
| 5 | Leetcode Kit | T-shirt |
+-------------+-----------------------+------------------+
Orders 表:
+--------------+--------------+----------+
| product_id | order_date | unit |
+--------------+--------------+----------+
| 1 | 2020-02-05 | 60 |
| 1 | 2020-02-10 | 70 |
| 2 | 2020-01-18 | 30 |
| 2 | 2020-02-11 | 80 |
| 3 | 2020-02-17 | 2 |
| 3 | 2020-02-24 | 3 |
| 4 | 2020-03-01 | 20 |
| 4 | 2020-03-04 | 30 |
| 4 | 2020-03-04 | 60 |
| 5 | 2020-02-25 | 50 |
| 5 | 2020-02-27 | 50 |
| 5 | 2020-03-01 | 50 |
+--------------+--------------+----------+
Result 表:
+--------------------+---------+
| product_name | unit |
+--------------------+---------+
| Leetcode Solutions | 130 |
| Leetcode Kit | 100 |
+--------------------+---------+
2020 年 2 月份下单 product_id = 1 的产品的数目总和为 (60 + 70) = 130 。
2020 年 2 月份下单 product_id = 2 的产品的数目总和为 80 。
2020 年 2 月份下单 product_id = 3 的产品的数目总和为 (2 + 3) = 5 。
2020 年 2 月份 product_id = 4 的产品并没有下单。
2020 年 2 月份下单 product_id = 5 的产品的数目总和为 (50 + 50) = 100 。
建表+导数
create table 1462_Products(product_id int, product_name varchar(20), product_category varchar(20)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1462_Products" overwrite into table 1462_Products;
select * from 1462_Products;
create table 1462_Orders(product_id int, order_date date, unit int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1462_Orders" overwrite into table 1462_Orders;
select * from 1462_Orders;
知识点:date_format时间函数,group by -> having用法,以及right left join
select
p.product_name,
t.total_unit
from
1462 _Products p
right join
(
select
o.product_id,
sum(o.unit) total_unit
from
1462 _Orders o
group by
o.product_id,
date_format(o.order_date,'yyyy-MM')
having date_format(o.order_date,'yyyy-MM')= '2020-02'and total_unit >= 100
) t
on
p.product_id = t.product_id
执行结果:
1467_Hard_每次访问的交易次数
表: Visits
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| visit_date | date |
+---------------+---------+
(user_id, visit_date) 是该表的主键
该表的每行表示 user_id 在 visit_date 访问了银行
表: Transactions
+------------------+---------+
| Column Name | Type |
+------------------+---------+
| user_id | int |
| transaction_date | date |
| amount | int |
+------------------+---------+
该表没有主键,所以可能有重复行
该表的每一行表示 user_id 在 transaction_date 完成了一笔 amount 数额的交易
可以保证用户 (user) 在 transaction_date 访问了银行 (也就是说 Visits 表包含 (user_id, transaction_date) 行)
银行想要得到银行客户在一次访问时的交易次数和相应的在一次访问时该交易次数的客户数量的图表
写一条 SQL 查询多少客户访问了银行但没有进行任何交易,多少客户访问了银行进行了一次交易等等
结果包含两列:
transactions_count:客户在一次访问中的交易次数visits_count:在transactions_count交易次数下相应的一次访问时的客户数量
transactions_count 的值从 0 到所有用户一次访问中的 max(transactions_count)
按 transactions_count 排序
下面是查询结果格式的例子:
Visits 表:
+---------+------------+
| user_id | visit_date |
+---------+------------+
| 1 | 2020-01-01 |
| 2 | 2020-01-02 |
| 12 | 2020-01-01 |
| 19 | 2020-01-03 |
| 1 | 2020-01-02 |
| 2 | 2020-01-03 |
| 1 | 2020-01-04 |
| 7 | 2020-01-11 |
| 9 | 2020-01-25 |
| 8 | 2020-01-28 |
+---------+------------+
Transactions 表:
+---------+------------------+--------+
| user_id | transaction_date | amount |
+---------+------------------+--------+
| 1 | 2020-01-02 | 120 |
| 2 | 2020-01-03 | 22 |
| 7 | 2020-01-11 | 232 |
| 1 | 2020-01-04 | 7 |
| 9 | 2020-01-25 | 33 |
| 9 | 2020-01-25 | 66 |
| 8 | 2020-01-28 | 1 |
| 9 | 2020-01-25 | 99 |
+---------+------------------+--------+
结果表:
+--------------------+--------------+
| transactions_count | visits_count |
+--------------------+--------------+
| 0 | 4 |
| 1 | 5 |
| 2 | 0 |
| 3 | 1 |
+--------------------+--------------+
* 对于 transactions_count = 0, visits 中 (1, "2020-01-01"), (2, "2020-01-02"), (12, "2020-01-01") 和 (19, "2020-01-03") 没有进行交易,所以 visits_count = 4 。
* 对于 transactions_count = 1, visits 中 (2, "2020-01-03"), (7, "2020-01-11"), (8, "2020-01-28"), (1, "2020-01-02") 和 (1, "2020-01-04") 进行了一次交易,所以 visits_count = 5 。
* 对于 transactions_count = 2, 没有客户访问银行进行了两次交易,所以 visits_count = 0 。
* 对于 transactions_count = 3, visits 中 (9, "2020-01-25") 进行了三次交易,所以 visits_count = 1 。
* 对于 transactions_count >= 4, 没有客户访问银行进行了超过3次交易,所以我们停止在 transactions_count = 3 。
如下是这个例子的图表:
建表+导数
create table 1467_Visits(user_id int, visit_date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1467_Visits" overwrite into table 1467_Visits;
select * from 1467_Visits;
create table 1467_Transactions(user_id int, transaction_date date, amount int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1467_Transactions" overwrite into table 1467_Transactions;
select * from 1467_Transactions;
知识点:视图创建、rank函数、基础列创建
create
view tmp as
select
coalesce(t.cnt,0) cnt,count(1) cnt_nums
from
1467 _Visits vis
left join
(
select
tran.user_id,tran.transaction_date,count(1) cnt
from
1467 _Transactions tran
group by
tran.user_id,tran.transaction_date
) t
on vis.visit_date = t.transaction_date and vis.user_id = t.user_id
group by
cnt
select * from tmp
tmp执行结果:
最终执行结果:
select
t0.`rows`-1 transactions_count, coalesce(tmp.cnt_nums, 0) visits_count
from
tmp
right join
(
select
rank() over(order by user_id,transaction_date) `rows`
from
1467 _Transactions
group by
user_id,transaction_date
) t0
on tmp.cnt = t0.`rows`-1
where
t0.`rows`-1 <= 3
败笔:如何拿到transactions_count的最大值3
1480_Medium_电影评分
表:Movies
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| movie_id | int |
| title | varchar |
+---------------+---------+
movie_id 是这个表的主键。
title 是电影的名字。
表:Users
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| name | varchar |
+---------------+---------+
user_id 是表的主键。
表:Movie_Rating
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| movie_id | int |
| user_id | int |
| rating | int |
| created_at | date |
+---------------+---------+
(movie_id, user_id) 是这个表的主键。
这个表包含用户在其评论中对电影的评分 rating 。
created_at 是用户的点评日期。
请你编写一组 SQL 查询:
-
查找评论电影数量最多的用户名。
如果出现平局,返回字典序较小的用户名。
-
查找在 2020 年 2 月 平均评分最高 的电影名称。
如果出现平局,返回字典序较小的电影名称。
查询分两行返回,查询结果格式如下例所示:
Movies 表:
+-------------+--------------+
| movie_id | title |
+-------------+--------------+
| 1 | Avengers |
| 2 | Frozen 2 |
| 3 | Joker |
+-------------+--------------+
Users 表:
+-------------+--------------+
| user_id | name |
+-------------+--------------+
| 1 | Daniel |
| 2 | Monica |
| 3 | Maria |
| 4 | James |
+-------------+--------------+
Movie_Rating 表:
+-------------+--------------+--------------+-------------+
| movie_id | user_id | rating | created_at |
+-------------+--------------+--------------+-------------+
| 1 | 1 | 3 | 2020-01-12 |
| 1 | 2 | 4 | 2020-02-11 |
| 1 | 3 | 2 | 2020-02-12 |
| 1 | 4 | 1 | 2020-01-01 |
| 2 | 1 | 5 | 2020-02-17 |
| 2 | 2 | 2 | 2020-02-01 |
| 2 | 3 | 2 | 2020-03-01 |
| 3 | 1 | 3 | 2020-02-22 |
| 3 | 2 | 4 | 2020-02-25 |
+-------------+--------------+--------------+-------------+
Result 表:
+--------------+
| results |
+--------------+
| Daniel |
| Frozen 2 |
+--------------+
Daniel 和 Monica 都点评了 3 部电影("Avengers", "Frozen 2" 和 "Joker") 但是 Daniel 字典序比较小。
Frozen 2 和 Joker 在 2 月的评分都是 3.5,但是 Frozen 2 的字典序比较小。
建表+导数
create table 1480_Movies(movie_id int, title varchar(20)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1480_Movies" overwrite into table 1480_Movies;
select * from 1480_Movies;
create table 1480_Users(user_id int, name varchar(20)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1480_Users" overwrite into table 1480_Users;
select * from 1480_Users;
create table 1480_Movie_Rating(movie_id int, user_id int, rating int, created_at date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1480_Movie_Rating" overwrite into table 1480_Movie_Rating;
select * from 1480_Movie_Rating;
知识点:union all连接(字段必须相同),rk()函数排序,create view tableXX as Select。。。创建视图表。
- 创建视图表
create
view tmp as select
m.title,
u.name,
mr.rating,
mr.created_at
from
1480_Movie_Rating mr
left join 1480 _Movies m on
mr.movie_id = m.movie_id
left join 1480_Users u on
mr.user_id = u.user_id
tmp表:
- 结果代码
select
t2.name results
from
(
select
t.name,
rank() over(
order by t.cnt desc,
t.name) rk
from
(
select
name,
count(1) cnt
from
tmp
group by
name
) t ) t2
where
t2.rk = 1
union all
select
tt2.title results
from
(
select
tt.title,
rank() over(
order by tt.per_rating desc,
tt.title) rk1
from
(
select
title,
sum(rating)/ count(1) per_rating
from
tmp
where
date_format(created_at,'yyyy-MM') = '2020-02'
group by
title ) tt ) tt2
where
tt2.rk1 = 1
执行结果: