力扣HQL--<614-1215>

164 阅读6分钟

614_Medium_二级关注者

在 facebook 中,表 follow 会有 2 个字段: followee, follower ,分别表示被关注者和关注者。请写一个 sql 查询语句,对每一个关注者,查询关注他的关注者的数目。比方说:

+-------------+------------+
| followee    | follower   |
+-------------+------------+
|     A       |     B      |
|     B       |     C      |
|     B       |     D      |
|     D       |     E      |
+-------------+------------+

应该输出:

+-------------+------------+
| follower    | num        |
+-------------+------------+
|     B       |  2         |
|     D       |  1         |
+-------------+------------+

解释:B 和 D 都在在 follower 字段中出现,作为被关注者,B 被 C 和 D 关注,D 被 E 关注。A 不在 follower 字段内,所以A不在输出列表中。 注意:被关注者永远不会被他 / 她自己关注。将结果按照字典序返回。 


建表+导数

create table 614_follow(followe string, follower string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/614_follow.txt" overwrite into table 614_follow;

思路:先通过左连接求得,followee 也在 follower中的人数,对上述条件做group by order by排序

select t.followe,count(1) num
from 
(
     select 
        f1.followe followe, f1.follower follower
     from 614_follow f1
     join 614_follow f2
     on f1.followe=f2.follower
) t
group by t.followe order by t.followe

执行结果: image.png


中间表t的执行结果;

image.png

615_Hard_平均工资:部门与公司比较

给如下两个表,写一个查询语句,求出在每一个工资发放日,每个部门的平均工资与公司的平均工资的比较结果 (高 / 低 / 相同)。 表: salary

| id | employee_id | amount | pay_date   |
|----|-------------|--------|------------|
| 1  | 1           | 9000   | 2017-03-31 |
| 2  | 2           | 6000   | 2017-03-31 |
| 3  | 3           | 10000  | 2017-03-31 |
| 4  | 1           | 7000   | 2017-02-28 |
| 5  | 2           | 6000   | 2017-02-28 |
| 6  | 3           | 8000   | 2017-02-28 |

 employee_id 字段是表 employee 中 employee_id 字段的外键。 

| employee_id | department_id |
|-------------|---------------|
| 1           | 1             |
| 2           | 2             |
| 3           | 2             |

 对于如上样例数据,结果为: 

| pay_month | department_id | comparison  |
|-----------|---------------|-------------|
| 2017-03   | 1             | higher      |
| 2017-03   | 2             | lower       |
| 2017-02   | 1             | same        |
| 2017-02   | 2             | same        |

 解释 在三月,公司的平均工资是 (9000+6000+10000)/3 = 8333.33... 由于部门 '1' 里只有一个 employee_id 为 '1' 的员工,所以部门 '1' 的平均工资就是此人的工资 9000 。因为 9000 > 8333.33 ,所以比较结果是 'higher'。 第二个部门的平均工资为 employee_id 为 '2' 和 '3' 两个人的平均工资,为 (6000+10000)/2=8000 。因为 8000 < 8333.33 ,所以比较结果是 'lower' 。 在二月用同样的公式求平均工资并比较,比较结果为 'same' ,因为部门 '1' 和部门 '2' 的平均工资与公司的平均工资相同,都是 7000 。 


建表+导数

create table 615_salary(id int, employee_id  int, amount int, pay_date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/615_salary.txt" overwrite into table 615_salary;

create table 615_employee(employee_id int, department_id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/615_employee.txt" overwrite into table 615_employee;

思路:先通过avg结合开窗函数over对公司和部分的工资求平均数,再结合case when语法比较部门和工资平均数大小,对结果distinct去重

select
     distinct date_format(t.pay_date, 'yyyy-MM') 
pay_month,
     t.department_id department_id,
     case
          when dept_avg > com_avg then 'higher'
          when dept_avg < com_avg then 'lower'
          else 'same'
     end as comparison
from
     (
     select
          s.pay_date pay_date,
          s.amount amount,
          e.department_id department_id,
          avg(s.amount) over(partition by s.pay_date) com_avg,
          avg(s.amount) over(partition by s.pay_date,
          e.department_id) dept_avg
     from
          615 _salary s
     left join 615 _employee e on
          s.employee_id = e.employee_id ) t

执行结果:

image.png


中间表t执行结果:

image.png

618_Hard_学生地理信息报告

一所美国大学有来自亚洲、欧洲和美洲的学生,他们的地理信息存放在如下student 表中。

| name   | continent |
|--------|-----------|
| Jack   | America   |
| Pascal | Europe    |
| Xi     | Asia      |
| Jane   | America   |

写一个查询语句实现对大洲(continent)列的透视表 操作,使得每个学生按照姓名的字母顺序依次排列在对应的大洲下面。输出的标题应依次为美洲(America)、亚洲(Asia)和欧洲(Europe)。对于样例输入,它的对应输出是:

| America | Asia | Europe |
|---------|------|--------|
| Jack    | Xi   | Pascal |
| Jane    |      |        |

进阶:如果不能确定哪个大洲的学生数最多,你可以写出一个查询去生成上述学生报告吗?


建表+导数

create table 618_student(name string, continent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/618_student.txt" overwrite into table 618_student;

思路:第一步,对表中的continent字段排序(目的在于将一行数据绑定在一起,排序相同的在结果表中处在一列),第二步,对上诉排序结果rk 分区,那么continent对应的name字段就处在同一个分组中,将这些字段拿出来即可,但是由于使用了group by分组,那么在select条件过滤的时候也需要加入聚合函数,(max(),min()在此处作用都一样,因为一个分组中,只有一组数据中)

select 
min(case when t.continent='America' then t.name else null end) as America,
min(case when t.continent='Asia' then t.name else null end) as Asia,
min(case when t.continent='Europe' then t.name else null end) as Europe
from 
(
     select
     name, continent,
     row_number() over(partition by continent) rk
     from 618_student
) t
group by t.rk

执行结果: image.png

619_Easy_只出现一次的最大数字

表 my_numbers 的 num 字段包含很多数字,其中包括很多重复的数字。你能写一个 SQL 查询语句,找到只出现过一次的数字中,最大的一个数字吗?

+---+
|num|
+---+
| 8 |
| 8 |
| 3 |
| 3 |
| 1 |
| 4 |
| 5 |
| 6 | 

对于上面给出的样例数据,你的查询语句应该返回如下结果:

+---+
|num|
+---+
| 6 |

注意:如果没有只出现一次的数字,输出 null 。 


建表+导数:

create table 619_my_numbers(num  int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/619_my_numbers.txt" overwrite into table 619_my_numbers;

思路:对num开窗求条数,过滤保留条数等于1的数,对过滤的结果取max()

select 
max(num) num
from 
(
     select 
     num,
     count(1) over(partition by num) cnt
     from 619_my_numbers
) t
where t.cnt=1

执行结果:

image.png

620_Easy_有趣的电影

某城市开了一家新的电影院,吸引了很多人过来看电影。该电影院特别注意用户体验,专门有个 LED显示板做电影推荐,上面公布着影评和相关电影描述。作为该电影院的信息部主管,您需要编写一个 SQL查询,找出所有影片描述为非 boring (不无聊) 的并且 id 为奇数 的影片,结果请按等级 rating 排列。 例如,下表 cinema:

+---------+-----------+--------------+-----------+
|   id    | movie     |  description |  rating   |
+---------+-----------+--------------+-----------+
|   1     | War       |   great 3D   |   8.9     |
|   2     | Science   |   fiction    |   8.5     |
|   3     | irish     |   boring     |   6.2     |
|   4     | Ice song  |   Fantacy    |   8.6     |
|   5     | House card|   Interesting|   9.1     |
+---------+-----------+--------------+-----------+

对于上面的例子,则正确的输出是为:

+---------+-----------+--------------+-----------+
|   id    | movie     |  description |  rating   |
+---------+-----------+--------------+-----------+
|   5     | House card|   Interesting|   9.1     |
|   1     | War       |   great 3D   |   8.9     |
+---------+-----------+--------------+-----------+

建表+导数:

create table 620_cinema(id int, movie string, description string, rating decimal(2,1)) ROW FORMAT DELIMITED FIELDS 
TERMINATED BY '\t';

load data inpath "/leetcode/620_cinema.txt" overwrite into table 620_cinema   ;

思路:按条件过滤,再order by 排序即可

select 
    id, movie, description, rating
from 
    620_cinema
where 
    description!='boring' and id%2=1
order by rating desc

执行结果:

image.png

626_Medium_换座位

小美是一所中学的信息科技老师,她有一张 seat 座位表,平时用来储存学生名字和与他们相对应的座位 id。其中纵列的 id 是连续递增的小美想改变相邻俩学生的座位。你能不能帮她写一个 SQL query 来输出小美想要的结果呢? 示例:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Abbot   |
|    2    | Doris   |
|    3    | Emerson |
|    4    | Green   |
|    5    | Jeames  |
+---------+---------+

假如数据输入的是上表,则输出结果如下:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Doris   |
|    2    | Abbot   |
|    3    | Green   |
|    4    | Emerson |
|    5    | Jeames  |
+---------+---------+

注意:如果学生人数是奇数,则不需要改变最后一个同学的座位。


建表+导数


解题思路:对于所有座位 id 是奇数的学生,修改其 id 为 id+1,如果最后一个座位 id 也是奇数,则最后一个座位 id 不修改。对于所有座位 id 是偶数的学生,修改其 id 为 id-1。 其中,最后一个座位id等价于座位总数量值。

采用case语法:

select 
case 
    when t.id=t.cnt then t.id
    when t.id<t.cnt and t.id%2=0 then t.id-1    
    else t.id+1 end as id,
t.student
from
(
     select 
     s.id id, s.student student,
     count(*) over() cnt
     from 626_seat s
) t
order by id

采用if语法:

select 

if(t.id%2=0 and t.id<t.cnt, t.id-1, if(t.id=t.cnt, t.id, 
t.id+1)) as id,
t.student
from
(
     select 
     s.id id, s.student student,
     count(*) over() cnt
     from 626_seat s
) t
order by id

执行结果:

image.png

627_Easy_变更性别

给定一个 salary 表,如下所示,有 m = 男性 和 f = 女性 的值。交换所有的 f 和 m 值(例如,将所有 f 值更改为 m,反之亦然)。要求只使用一个更新(Update)语句,并且没有中间的临时表。注意,您必只能写一个 Update 语句,请不要编写任何 Select 语句。例如:

| id | name | sex | salary |
|----|------|-----|--------|
| 1  | A    | m   | 2500   |
| 2  | B    | f   | 1500   |
| 3  | C    | m   | 5500   |
| 4  | D    | f   | 500    |

运行你所编写的更新语句之后,将会得到以下表:

| id | name | sex | salary |
|----|------|-----|--------|
| 1  | A    | f   | 2500   |
| 2  | B    | m   | 1500   |
| 3  | C    | f   | 5500   |
| 4  | D    | m   | 500    |

建表+导数

create table 627_salary(id int, name string, sex string, salary string) ROW FORMAT DELIMITED FIELDS TERMINATED BY 
'\t';
load data inpath "/leetcode/627_salary.txt" overwrite into table 627_salary   ;

hive从Hive 0.14.版本开始支持update操作 默认关闭,需调整参数设置:set hive.optimize.sort.dynamic.partition=false

set hive.optimize.sort.dynamic.partition=false

update 627_salary set sex=if(sex='m','f','m')

执行结果:

image.png

1135_Medium_买下所有产品的客户 ++++++++++++++++

Customer 表:

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| customer_id | int     |
| product_key | int     |
+-------------+---------+

product_key 是 Customer 表的外键。 Product 表:

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_key | int     |
+-------------+---------+

product_key 是这张表的主键。  写一条 SQL 查询语句,从 Customer 表中查询购买了 Product 表中所有产品的客户的 id。示例:Customer 表:

+-------------+-------------+
| customer_id | product_key |
+-------------+-------------+
| 1           | 5           |
| 2           | 6           |
| 3           | 5           |
| 3           | 6           |
| 1           | 6           |
+-------------+-------------+

Product 表:

+-------------+
| product_key |
+-------------+
| 5           |
| 6           |
+-------------+

Result 表:

+-------------+
| customer_id |
+-------------+
| 1           |
| 3           |
+-------------+

购买了所有产品(5 和 6)的客户的 id 是 1 和 3 。


建表+导数

create table 1135_Customer(customer_id int, product_key int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1135_Customer.txt" overwrite into table 1135_Customer ;

create table 1135_Product (product_key int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1135_Product.txt" overwrite into table 1135_Product ;

知识点:行列互转

针对商品表:由于需要找出购买了所有产品的用户,所以可以将商品表行转列,将所有商品通过concat_ws()函数拼接起来,由于商品属性是int,故需要cast()将int类型转为string类型,collect_set()为去重,同时因为使用了group by聚合函数,所以需要配置聚合函数将字段内容收集起来。 针对用户表:处理方式相同,将每个购买的商品行转列,再通过join on关联上商品表,能关联上的就是最终复合要求的。

select
     t1.customer_id
from
     (
     select
          customer_id,
          concat_ws("|", collect_set(cast(product_key as string))) c_pk
     from
          1135_Customer
     group by
          customer_id 
     ) t1
join (
     select
          conct_ws("|",collect_set(cast(p.product_key as string))) p_pk
     from
          (
          select
              product_key, rank() over() rk
          from
              1135 _Product
          order by
              product_key ) p
     group by
          p.rk 
    ) t2
where
     t1.c_pk = t2.p_pk

执行结果:

image.png


执行中间结果t1 image.png


执行中间结果t2

image.png

1136_Easy_合作过至少三次的演员和导演

ActorDirector 表:

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| actor_id    | int     |
| director_id | int     |
| timestamp   | int     |
+-------------+---------+

timestamp 是这张表的主键.  写一条SQL查询语句获取合作过至少三次的演员和导演的 id 对 (actor_id, director_id)示例:ActorDirector 表:

+-------------+-------------+-------------+
| actor_id    | director_id | timestamp   |
+-------------+-------------+-------------+
| 1           | 1           | 0           |
| 1           | 1           | 1           |
| 1           | 1           | 2           |
| 1           | 2           | 3           |
| 1           | 2           | 4           |
| 2           | 1           | 5           |
| 2           | 1           | 6           |
+-------------+-------------+-------------+

Result 表:

+-------------+-------------+
| actor_id    | director_id |
+-------------+-------------+
| 1           | 1           |
+-------------+-------------+

唯一的 id 对是 (1, 1),他们恰好合作了 3 次。


建表+导数:

create table 1136_ActorDirector(actor_id int, director_id int, `timestamp` int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1136_ActorDirector.txt" overwrite into table 1136_ActorDirector    ;

知识点:开窗函数

对actor_id,director_id分组开窗求条数,再对条数等于3的过滤出结果,最后对结果distinct去重**

select
     distinct t.actor_id,
     t.director_id
from
     (
     select
          actor_id,director_id,
          count(*) over(partition by actor_id,director_id) num
     from
          1136 _ActorDirector 
     ) t
where
     num = 3

执行结果: image.png


中间表t的执行结果: image.png

产品销售分析

销售表 Sales:

+-------------+-------+
| Column Name | Type  |
+-------------+-------+
| sale_id     | int   |
| product_id  | int   |
| year        | int   |
| quantity    | int   |
| price       | int   |
+-------------+-------+

(sale_id, year) 是销售表 Sales 的主键. product_id 是关联到产品表 Product 的外键. 注意: price 表示每单位价格 产品表 Product:

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| product_id   | int     |
| product_name | varchar |
+--------------+---------+

product_id 是表的主键.

示例:Sales 表:

+---------+------------+------+----------+-------+
| sale_id | product_id | year | quantity | price |
+---------+------------+------+----------+-------+ 
| 1       | 100        | 2008 | 10       | 5000  |
| 2       | 100        | 2009 | 12       | 5000  |
| 7       | 200        | 2011 | 15       | 9000  |
+---------+------------+------+----------+-------+

Product 表:

+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 100        | Nokia        |
| 200        | Apple        |
| 300        | Samsung      |
+------------+--------------+

1153_Easy_产品销售分析 I

 写一条SQL 查询语句获取 Sales 表中所有产品对应的 产品名称 product_name 以及该产品的所有 售卖年份 year 和 价格 price 。

Result 表:

+--------------+-------+-------+
| product_name | year  | price |
+--------------+-------+-------+
| Nokia        | 2008  | 5000  |
| Nokia        | 2009  | 5000  |
| Apple        | 2011  | 9000  |
+--------------+-------+-------+

建表+导数

create table 1153_Sales(sale_id int, product_id int, `year` int, quantity int, price int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1153_Sales.txt" overwrite into table 1153_Sales	;

create table 1153_Product(product_id int, product_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1153_Product.txt" overwrite into table 1153_Product	;

知识点:表左连接

select 
    p.product_name, s.`year`, s.price
from 
    1153_Sales s
left join
    1153_Product p
on s.product_id=p.product_id

执行结果:

image.png

1153_Easy_产品销售分析 II

编写一个 SQL 查询,按产品 id product_id 来统计每个产品的销售总量。 Result 表:

+--------------+----------------+
| product_id   | total_quantity |
+--------------+----------------+
| 100          | 22             |
| 200          | 15             |
+--------------+----------------+

建表+导数

create table 1154_Sales(sale_id int, product_id int, `year` int, quantity int, price int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1154_Sales.txt" overwrite into table 1154_Sales ;

create table 1154_Product(product_id int, product_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1154_Product.txt" overwrite into table 1154_Product ;

思路:group by 分区,在sum求和即可

select 
product_id,
sum(quantity) as total_quantity 
from 1154_Sales
group by product_id

执行结果: image.png

1155_Medium_产品销售分析 III

写一条SQL 查询语句获取 Sales 表中所有产品对应的 产品名称 product_name 以及该产品的所有 售卖年份 year 和 价格 price 。 Result 表:

+--------------+-------+-------+
| product_name | year  | price |
+--------------+-------+-------+
| Nokia        | 2008  | 5000  |
| Nokia        | 2009  | 5000  |
| Apple        | 2011  | 9000  |
+--------------+-------+-------+

建表+导数:

create table 1155_Sales(sale_id int, product_id int, `year` int, quantity int, price int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1155_Sales.txt" overwrite into table 1155_Sales   ;

create table 1155_Product(product_id int, product_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1155_Product.txt" overwrite into table 1155_Product ;

select 
p.product_name, s.`year`, s.price
from 1154_Sales s
left join 1155_Product p
on s.product_id=p.product_id

执行结果:

image.png

游戏玩法分析

活动表 Activity:

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+

表的主键是 (player_id, event_date)。 这张表展示了一些游戏玩家在游戏平台上的行为活动。 每行数据记录了一名玩家在退出平台之前,当天使用同一台设备登录平台后打开的游戏的数目(可能是 0 个)。

1179_Easy_游戏玩法分析 I

写一条 SQL 查询语句获取每位玩家 第一次登陆平台的日期。查询结果的格式如下所示:Activity 表:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result 表:

+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1         | 2016-03-01  |
| 2         | 2017-06-25  |
| 3         | 2016-03-02  |
+-----------+-------------+

建表+导数

create table 1179_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1179_Activity.txt" overwrite into table 1179_Activity;

知识点:开窗函数结合分组top n

select 
distinct 
t.player_id, t.event_date
from 
(
     select 
     player_id,
     first_value(event_date) over(partition by player_id order by event_date) event_date
     from 1179_Activity
) t

执行结果

image.png

1180_Easy_游戏玩法分析  II

请编写一个 SQL 查询,描述每一个玩家首次登陆的设备名称查询结果格式在以下示例中:Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1         | 2         |
| 2         | 3         |
| 3         | 1         |
+-----------+-----------+

建表+导数:

create table 1180_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1180_Activity.txt" overwrite into table 1180_Activity;

知识点:开窗函数实现分组top n,并结合条件过滤

思路:取分组数据的第一条 实现:开窗对日期、排序之后取first_value,对该数据过滤关联上driver_id即可

select 
t.player_id, t.device_id
from 
(
     select 
     player_id,
     device_id,
     event_date,
     first_value(event_date) over(partition by player_id order by event_date) first_day
     from 1180_Activity
) t
where t.event_date=t.first_day

执行结果 image.png

1181_Easy_游戏玩法分析  III ++++++++++++++++

编写一个 SQL 查询,同时报告每组玩家和日期,以及玩家到目前为止玩了多少游戏。也就是说,在此日期之前玩家所玩的游戏总数。详细情况请查看示例。查询结果格式如下所示:Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 1         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1         | 2016-03-01 | 5                   |
| 1         | 2016-05-02 | 11                  |
| 1         | 2017-06-25 | 12                  |
| 3         | 2016-03-02 | 0                   |
| 3         | 2018-07-03 | 5                   |
+-----------+------------+---------------------+

对于 ID 为 1 的玩家,2016-05-02 共玩了 5+6=11 个游戏,2017-06-25 共玩了 5+6+1=12 个游戏。 对于 ID 为 3 的玩家,2018-07-03 共玩了 0+5=5 个游戏。 请注意,对于每个玩家,我们只关心玩家的登录日期。


建表+导数:

create table 1181_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1181_Activity.txt" overwrite into table 1181_Activity;

知识点:开窗函数,窗口大小控制

对日期分区,排序之后,计算将截止到当前时间的和** 实现:通过over()开窗,在窗内对日期字段分组并升序排列,使用ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,控制窗口大小,即从组内第一条到前条,再通过sum()计算窗口内的和

select
     player_id,
     event_date,
     sum(games_played) over(partition by player_id
order by
     event_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) games_played_so_far
from
     1181 _Activity

执行结果:

image.png

1182_Medium_游戏玩法分析 IV ?????????

编写一个 SQL 查询,报告在首次登录的第二天再次登录的玩家的比率,四舍五入到小数点后两位。换句话说,您需要计算从首次登录日期开始至少连续两天登录的玩家的数量,然后除以玩家总数。查询结果格式如下所示:Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+

只有 ID 为 1 的玩家在第一天登录后才重新登录,所以答案是 1/3 = 0.33


建表+导数

create table 1182_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1182_Activity.txt" overwrite into table 1182_Activity;

知识点:开窗函数计算连续登陆

两步走:第一步获取连续登陆的人的id,第二步:对player_id去重,得到总人数。将两步join起来。 通过date_sub函数减去rank值,在分组,即可得到连续登陆得人。 最后将在group by中对分组内的条数求和,过滤出条数大于2的,在select中通过round()函数保留小数位,在round函数中使用player_id人数(distinct去重)/ cnt**

select
     round(count(distinct t1.player_id)/ t1.cnt,2)
from
     (
     select
          t.player_id,
          t.event_date,
          date_sub(t.event_date, t.rk) event_date_sub_rk,
          t.cnt
     from
          (
          select
              a.player_id,
              a.event_date,
              rank() over(partition by a.player_id order by a.event_date) rk,
              base2.cnt
          from
              1182 _Activity a
          left join
              (
              select
                   dis_play,
                   count(1) over() cnt
              from
                   (
                   select
                        distinct player_id dis_play
                   from
                        1182 _Activity 
                   ) base
               ) base2 
           on
              a.player_id = base2.dis_play ) t ) t1
group by
     player_id,
     event_date_sub_rk,
     cnt
having
     count(*)= 2

执行结果: image.png


中间表t执行结果:

image.png

1193_Hard_游戏玩法分析V ????????????

编写一个 SQL 查询,报告所有安装日期、当天安装游戏的玩家数量和玩家的第一天留存率。查询结果格式如下所示:Activity 表:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-01 | 0            |
| 3         | 4         | 2016-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result 表:

+------------+----------+----------------+
| install_dt | installs | Day1_retention |
+------------+----------+----------------+
| 2016-03-01 | 2        | 0.50           |
| 2017-06-25 | 1        | 0.00           |
+------------+----------+----------------+

玩家 1 和 3 在 2016-03-01 安装了游戏,但只有玩家 1 在 2016-03-02 重新登录,所以 2016-03-01 的第一天留存率是 1/2=0.50 玩家 2 在 2017-06-25 安装了游戏,但在 2017-06-26 没有重新登录,因此 2017-06-25 的第一天留存率为 0/1=0.00


建表+导数

create table 1193_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1193_Activity.txt" overwrite into table 1193_Activity;

1198_Medium_小众书籍

书籍表 Books:

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| book_id        | int     |
| name           | varchar |
| available_from | date    |
+----------------+---------+

book_id 是这个表的主键。 订单表 Orders:

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| order_id       | int     |
| book_id        | int     |
| quantity       | int     |
| dispatch_date  | date    |
+----------------+---------+

order_id 是这个表的主键。 book_id 是 Books 表的外键。  你需要写一段 SQL 命令,筛选出过去一年中订单总量 少于10本 的 书籍 。注意:不考虑 上架(available from)距今 不满一个月 的书籍。并且 假设今天是 2019-06-23 。 下面是样例输出结果:Books 表:

+---------+--------------------+----------------+
| book_id | name               | available_from |
+---------+--------------------+----------------+
| 1       | "Kalila And Demna" | 2010-01-01     |
| 2       | "28 Letters"       | 2012-05-12     |
| 3       | "The Hobbit"       | 2019-06-10     |
| 4       | "13 Reasons Why"   | 2019-06-01     |
| 5       | "The Hunger Games" | 2008-09-21     |
+---------+--------------------+----------------+

Orders 表:

+----------+---------+----------+---------------+
| order_id | book_id | quantity | dispatch_date |
+----------+---------+----------+---------------+
| 1        | 1       | 2        | 2018-07-26    |
| 2        | 1       | 1        | 2018-11-05    |
| 3        | 3       | 8        | 2019-06-11    |
| 4        | 4       | 6        | 2019-06-05    |
| 5        | 4       | 5        | 2019-06-20    |
| 6        | 5       | 9        | 2009-02-02    |
| 7        | 5       | 8        | 2010-04-13    |
+----------+---------+----------+---------------+

Result 表:

+-----------+--------------------+
| book_id   | name               |
+-----------+--------------------+
| 1         | "Kalila And Demna" |
| 2         | "28 Letters"       |
| 5         | "The Hunger Games" |
+-----------+--------------------+

建表+导数

create table 1198_Books(book_id int, name string, available_from date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1193_Activity.txt" overwrite into table 1193_Activity;

create table 1198_Orders(order_id int, book_id int, quantity int,dispatch_date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1193_Activity.txt" overwrite into table 1193_Activity;

知识点:join过程中的空值处理,datediff,date_sub

思路:对给定条件进行left join,并在join过程中过滤。对于空值,使用nvl()函数设置默认值 datediff('2019-06-23', o.dispatch_date) <= 365 过滤出近一年的数据。 date_sub(o.dispatch_date, 30) > available_from过滤出30天外的数据。

select 
t.name name
from 
(
          select
          b.name,   b.available_from, nvl(o.quantity,0), date_sub(o.dispatch_date,30) sub_dispatch_date
     from
          1198_Books b
     left join 1198_Orders o 
     on b.book_id = o.book_id and datediff('2019-06-23', o.dispatch_date) <= 365
     where
          date_sub(o.dispatch_date, 30) > available_from or quantity is NULL
) t
group by t.name having sum(t.quantity) < 10

执行结果:

image.png


执行中间结果表t:

image.png

1204_Medium_每日新用户统计

Traffic 表:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| activity      | enum    |
| activity_date | date    |
+---------------+---------+

该表没有主键,它可能有重复的行。 activity 列是 ENUM 类型,可能取 ('login', 'logout', 'jobs', 'groups', 'homepage') 几个值之一。  编写一个 SQL 查询,以查询从今天起最多 90 天内,每个日期该日期首次登录的用户数。假设今天是 2019-06-30.查询结果格式如下例所示:Traffic 表:

+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1       | login    | 2019-05-01    |
| 1       | homepage | 2019-05-01    |
| 1       | logout   | 2019-05-01    |
| 2       | login    | 2019-06-21    |
| 2       | logout   | 2019-06-21    |
| 3       | login    | 2019-01-01    |
| 3       | jobs     | 2019-01-01    |
| 3       | logout   | 2019-01-01    |
| 4       | login    | 2019-06-21    |
| 4       | groups   | 2019-06-21    |
| 4       | logout   | 2019-06-21    |
| 5       | login    | 2019-03-01    |
| 5       | logout   | 2019-03-01    |
| 5       | login    | 2019-06-21    |
| 5       | logout   | 2019-06-21    |
+---------+----------+---------------+

Result 表:

+------------+-------------+
| login_date | user_count  |
+------------+-------------+
| 2019-05-01 | 1           |
| 2019-06-21 | 2           |
+------------+-------------+

请注意,我们只关心用户数非零的日期. ID 为 5 的用户第一次登陆于 2019-03-01,因此他不算在 2019-06-21 的的统计内。


建表+导数

create table 1204_Traffic (user_id int, activity string, activity_date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1204_Traffic.txt" overwrite into table 1204_Traffic;

知识点:分组top n

思路,为了取出首次登陆(因为用户有多次登陆),先对use分组,对日期排序,取rank = 1,即是用户首次登陆。

select 
t.activity_date, count(*) user_count  
from 
(
     select 
     user_id, activity_date,activity,
     rank() over(partition by user_id order by activity_date) rk
     from 
     1204_Traffic
     where activity='login' 
) t
where t.rk = 1 and datediff('2019-06-30', activity_date) < 90
group by t.activity_date

执行结果:

image.png

1214_Medium_每位学生的最高成绩

表: Enrollments

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| course_id     | int     |
| grade         | int     |
+---------------+---------+

(student_id, course_id) 是该表的主键。

 编写一个 SQL 查询,查询每位学生获得的最高成绩和它所对应的科目,若科目成绩并列,取 course_id 最小的一门。查询结果需按 student_id 增序进行排序。查询结果格式如下所示:Enrollments 表:

+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 2          | 2         | 95    |
| 2          | 3         | 95    |
| 1          | 1         | 90    |
| 1          | 2         | 99    |
| 3          | 1         | 80    |
| 3          | 2         | 75    |
| 3          | 3         | 82    |
+------------+-----------+-------+

Result 表:

+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 1          | 2         | 99    |
| 2          | 2         | 95    |
| 3          | 3         | 82    |
+------------+-----------+-------+

思路:在最内层中做两个排序,第一个排序按student_id分组,按grade降序排,目的在于找到最高得成绩,第二个排序按student_id, grade分组,按course_id升序排,找到相同成绩时的course_id最小值。最外层按student_id升序排

select 
student_id, course_id, grade
from 
(
     select 
     student_id, course_id, grade,
     rank() over(partition by student_id order by grade desc) rk_grade,
     rank() over(partition by student_id, grade order by course_id) rk_course_id
     from 1214_Enrollments
) t
where rk_grade=1 and rk_course_id=1
order by student_id

执行结果:

image.png

1215_Easy_报告的记录

动作表:Actions

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| post_id       | int     |
| action_date   | date    | 
| action        | enum    |
| extra         | varchar |
+---------------+---------+

此表没有主键,所以可能会有重复的行。 action 字段是 ENUM 类型的,包含:('view', 'like', 'reaction', 'comment', 'report', 'share') extra 字段是可选的信息(可能为 null),其中的信息例如有:1.报告理由(a reason for report) 2.反应类型(a type of reaction)  编写一条SQL,查询每种 报告理由(report reason)在昨天的不同报告数量(post_id)。假设今天是 2019-07-05。查询及结果的格式示例:Actions table:

+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 4       | 2019-07-04  | view   | null   |
| 2       | 4       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-04  | view   | null   |
| 5       | 2       | 2019-07-04  | report | racism |
| 5       | 5       | 2019-07-04  | view   | null   |
| 5       | 5       | 2019-07-04  | report | racism |
+---------+---------+-------------+--------+--------+

Result table:

+---------------+--------------+
| report_reason | report_count |
+---------------+--------------+
| spam          | 1            |
| racism        | 2            |
+---------------+--------------+ 

注意,我们只关心报告数量非零的结果。


建表+导数:

create table 1215_Actions(user_id int, post_id int, action_date date, `action` string, extra varchar(100)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1215_Actions" overwrite into table 1215_Actions;


思路:日期函数datediff的使用

select 
extra,
count(extra)
from 
(
     select 
     datediff("2019-07-05", action_date) ddf,
     action,
     extra
     from 1215_Actions
) t
where ddf=1 and extra!="null"
group by extra