614_Medium_二级关注者

在 facebook 中，表 follow 会有 2 个字段： followee, follower ，分别表示被关注者和关注者。请写一个 sql 查询语句，对每一个关注者，查询关注他的关注者的数目。比方说：

+-------------+------------+
| followee    | follower   |
+-------------+------------+
|     A       |     B      |
|     B       |     C      |
|     B       |     D      |
|     D       |     E      |
+-------------+------------+

应该输出：

+-------------+------------+
| follower    | num        |
+-------------+------------+
|     B       |  2         |
|     D       |  1         |
+-------------+------------+

解释：B 和 D 都在在 follower 字段中出现，作为被关注者，B 被 C 和 D 关注，D 被 E 关注。A 不在 follower 字段内，所以A不在输出列表中。注意：被关注者永远不会被他 / 她自己关注。将结果按照字典序返回。

建表+导数

create table 614_follow(followe string, follower string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/614_follow.txt" overwrite into table 614_follow;

思路：先通过左连接求得，followee 也在 follower中的人数，对上述条件做group by order by排序

select t.followe,count(1) num
from 
(
     select 
        f1.followe followe, f1.follower follower
     from 614_follow f1
     join 614_follow f2
     on f1.followe=f2.follower
) t
group by t.followe order by t.followe

执行结果：

中间表t的执行结果；

615_Hard_平均工资：部门与公司比较

给如下两个表，写一个查询语句，求出在每一个工资发放日，每个部门的平均工资与公司的平均工资的比较结果（高 / 低 / 相同）。表： salary

| id | employee_id | amount | pay_date   |
|----|-------------|--------|------------|
| 1  | 1           | 9000   | 2017-03-31 |
| 2  | 2           | 6000   | 2017-03-31 |
| 3  | 3           | 10000  | 2017-03-31 |
| 4  | 1           | 7000   | 2017-02-28 |
| 5  | 2           | 6000   | 2017-02-28 |
| 6  | 3           | 8000   | 2017-02-28 |

employee_id 字段是表 employee 中 employee_id 字段的外键。

| employee_id | department_id |
|-------------|---------------|
| 1           | 1             |
| 2           | 2             |
| 3           | 2             |

对于如上样例数据，结果为：

| pay_month | department_id | comparison  |
|-----------|---------------|-------------|
| 2017-03   | 1             | higher      |
| 2017-03   | 2             | lower       |
| 2017-02   | 1             | same        |
| 2017-02   | 2             | same        |

解释在三月，公司的平均工资是 (9000+6000+10000)/3 = 8333.33... 由于部门 '1' 里只有一个 employee_id 为 '1' 的员工，所以部门 '1' 的平均工资就是此人的工资 9000 。因为 9000 > 8333.33 ，所以比较结果是 'higher'。第二个部门的平均工资为 employee_id 为 '2' 和 '3' 两个人的平均工资，为 (6000+10000)/2=8000 。因为 8000 < 8333.33 ，所以比较结果是 'lower' 。在二月用同样的公式求平均工资并比较，比较结果为 'same' ，因为部门 '1' 和部门 '2' 的平均工资与公司的平均工资相同，都是 7000 。

建表+导数

create table 615_salary(id int, employee_id  int, amount int, pay_date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/615_salary.txt" overwrite into table 615_salary;

create table 615_employee(employee_id int, department_id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/615_employee.txt" overwrite into table 615_employee;

思路：先通过avg结合开窗函数over对公司和部分的工资求平均数，再结合case when语法比较部门和工资平均数大小，对结果distinct去重

select
     distinct date_format(t.pay_date, 'yyyy-MM') 
pay_month,
     t.department_id department_id,
     case
          when dept_avg > com_avg then 'higher'
          when dept_avg < com_avg then 'lower'
          else 'same'
     end as comparison
from
     (
     select
          s.pay_date pay_date,
          s.amount amount,
          e.department_id department_id,
          avg(s.amount) over(partition by s.pay_date) com_avg,
          avg(s.amount) over(partition by s.pay_date,
          e.department_id) dept_avg
     from
          615 _salary s
     left join 615 _employee e on
          s.employee_id = e.employee_id ) t

执行结果：

中间表t执行结果：

618_Hard_学生地理信息报告

一所美国大学有来自亚洲、欧洲和美洲的学生，他们的地理信息存放在如下student 表中。

| name   | continent |
|--------|-----------|
| Jack   | America   |
| Pascal | Europe    |
| Xi     | Asia      |
| Jane   | America   |

写一个查询语句实现对大洲（continent）列的透视表操作，使得每个学生按照姓名的字母顺序依次排列在对应的大洲下面。输出的标题应依次为美洲（America）、亚洲（Asia）和欧洲（Europe）。对于样例输入，它的对应输出是：

| America | Asia | Europe |
|---------|------|--------|
| Jack    | Xi   | Pascal |
| Jane    |      |        |

进阶：如果不能确定哪个大洲的学生数最多，你可以写出一个查询去生成上述学生报告吗？

建表+导数

create table 618_student(name string, continent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/618_student.txt" overwrite into table 618_student;

思路：第一步，对表中的continent字段排序（目的在于将一行数据绑定在一起，排序相同的在结果表中处在一列），第二步，对上诉排序结果rk 分区，那么continent对应的name字段就处在同一个分组中，将这些字段拿出来即可，但是由于使用了group by分组，那么在select条件过滤的时候也需要加入聚合函数，（max（），min（）在此处作用都一样，因为一个分组中，只有一组数据中）

select 
min(case when t.continent='America' then t.name else null end) as America,
min(case when t.continent='Asia' then t.name else null end) as Asia,
min(case when t.continent='Europe' then t.name else null end) as Europe
from 
(
     select
     name, continent,
     row_number() over(partition by continent) rk
     from 618_student
) t
group by t.rk

执行结果：

619_Easy_只出现一次的最大数字

表 my_numbers 的 num 字段包含很多数字，其中包括很多重复的数字。你能写一个 SQL 查询语句，找到只出现过一次的数字中，最大的一个数字吗？

+---+
|num|
+---+
| 8 |
| 8 |
| 3 |
| 3 |
| 1 |
| 4 |
| 5 |
| 6 |

对于上面给出的样例数据，你的查询语句应该返回如下结果：

+---+
|num|
+---+
| 6 |

注意：如果没有只出现一次的数字，输出 null 。

建表+导数：

create table 619_my_numbers(num  int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/619_my_numbers.txt" overwrite into table 619_my_numbers;

思路：对num开窗求条数，过滤保留条数等于1的数，对过滤的结果取max（）

select 
max(num) num
from 
(
     select 
     num,
     count(1) over(partition by num) cnt
     from 619_my_numbers
) t
where t.cnt=1

执行结果：

620_Easy_有趣的电影

某城市开了一家新的电影院，吸引了很多人过来看电影。该电影院特别注意用户体验，专门有个 LED显示板做电影推荐，上面公布着影评和相关电影描述。作为该电影院的信息部主管，您需要编写一个 SQL查询，找出所有影片描述为非 boring (不无聊) 的并且 id 为奇数的影片，结果请按等级 rating 排列。例如，下表 cinema:

+---------+-----------+--------------+-----------+
|   id    | movie     |  description |  rating   |
+---------+-----------+--------------+-----------+
|   1     | War       |   great 3D   |   8.9     |
|   2     | Science   |   fiction    |   8.5     |
|   3     | irish     |   boring     |   6.2     |
|   4     | Ice song  |   Fantacy    |   8.6     |
|   5     | House card|   Interesting|   9.1     |
+---------+-----------+--------------+-----------+

对于上面的例子，则正确的输出是为：

+---------+-----------+--------------+-----------+
|   id    | movie     |  description |  rating   |
+---------+-----------+--------------+-----------+
|   5     | House card|   Interesting|   9.1     |
|   1     | War       |   great 3D   |   8.9     |
+---------+-----------+--------------+-----------+

建表+导数：

create table 620_cinema(id int, movie string, description string, rating decimal(2,1)) ROW FORMAT DELIMITED FIELDS 
TERMINATED BY '\t';

load data inpath "/leetcode/620_cinema.txt" overwrite into table 620_cinema   ;

思路：按条件过滤，再order by 排序即可

select 
    id, movie, description, rating
from 
    620_cinema
where 
    description!='boring' and id%2=1
order by rating desc

执行结果：

626_Medium_换座位

小美是一所中学的信息科技老师，她有一张 seat 座位表，平时用来储存学生名字和与他们相对应的座位 id。其中纵列的 id 是连续递增的小美想改变相邻俩学生的座位。你能不能帮她写一个 SQL query 来输出小美想要的结果呢？示例：

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Abbot   |
|    2    | Doris   |
|    3    | Emerson |
|    4    | Green   |
|    5    | Jeames  |
+---------+---------+

假如数据输入的是上表，则输出结果如下：

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Doris   |
|    2    | Abbot   |
|    3    | Green   |
|    4    | Emerson |
|    5    | Jeames  |
+---------+---------+

注意：如果学生人数是奇数，则不需要改变最后一个同学的座位。

建表+导数

解题思路：对于所有座位 id 是奇数的学生，修改其 id 为 id+1，如果最后一个座位 id 也是奇数，则最后一个座位 id 不修改。对于所有座位 id 是偶数的学生，修改其 id 为 id-1。其中，最后一个座位id等价于座位总数量值。

采用case语法：

select 
case 
    when t.id=t.cnt then t.id
    when t.id<t.cnt and t.id%2=0 then t.id-1    
    else t.id+1 end as id,
t.student
from
(
     select 
     s.id id, s.student student,
     count(*) over() cnt
     from 626_seat s
) t
order by id

采用if语法：

select 

if(t.id%2=0 and t.id<t.cnt, t.id-1, if(t.id=t.cnt, t.id, 
t.id+1)) as id,
t.student
from
(
     select 
     s.id id, s.student student,
     count(*) over() cnt
     from 626_seat s
) t
order by id

执行结果：

627_Easy_变更性别

给定一个 salary 表，如下所示，有 m = 男性和 f = 女性的值。交换所有的 f 和 m 值（例如，将所有 f 值更改为 m，反之亦然）。要求只使用一个更新（Update）语句，并且没有中间的临时表。注意，您必只能写一个 Update 语句，请不要编写任何 Select 语句。例如：

| id | name | sex | salary |
|----|------|-----|--------|
| 1  | A    | m   | 2500   |
| 2  | B    | f   | 1500   |
| 3  | C    | m   | 5500   |
| 4  | D    | f   | 500    |

运行你所编写的更新语句之后，将会得到以下表:

| id | name | sex | salary |
|----|------|-----|--------|
| 1  | A    | f   | 2500   |
| 2  | B    | m   | 1500   |
| 3  | C    | f   | 5500   |
| 4  | D    | m   | 500    |

建表+导数

create table 627_salary(id int, name string, sex string, salary string) ROW FORMAT DELIMITED FIELDS TERMINATED BY 
'\t';
load data inpath "/leetcode/627_salary.txt" overwrite into table 627_salary   ;

hive从Hive 0.14.版本开始支持update操作 默认关闭，需调整参数设置：set hive.optimize.sort.dynamic.partition=false

set hive.optimize.sort.dynamic.partition=false

update 627_salary set sex=if(sex='m','f','m')

执行结果：

1135_Medium_买下所有产品的客户 ++++++++++++++++

Customer 表：

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| customer_id | int     |
| product_key | int     |
+-------------+---------+

product_key 是 Customer 表的外键。 Product 表：

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_key | int     |
+-------------+---------+

product_key 是这张表的主键。写一条 SQL 查询语句，从 Customer 表中查询购买了 Product 表中所有产品的客户的 id。示例：Customer 表：

+-------------+-------------+
| customer_id | product_key |
+-------------+-------------+
| 1           | 5           |
| 2           | 6           |
| 3           | 5           |
| 3           | 6           |
| 1           | 6           |
+-------------+-------------+

Product 表：

+-------------+
| product_key |
+-------------+
| 5           |
| 6           |
+-------------+

Result 表：

+-------------+
| customer_id |
+-------------+
| 1           |
| 3           |
+-------------+

购买了所有产品（5 和 6）的客户的 id 是 1 和 3 。

建表+导数

create table 1135_Customer(customer_id int, product_key int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1135_Customer.txt" overwrite into table 1135_Customer ;

create table 1135_Product (product_key int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1135_Product.txt" overwrite into table 1135_Product ;

知识点：行列互转

针对商品表：由于需要找出购买了所有产品的用户，所以可以将商品表行转列，将所有商品通过concat_ws（）函数拼接起来，由于商品属性是int，故需要cast（）将int类型转为string类型，collect_set（）为去重，同时因为使用了group by聚合函数，所以需要配置聚合函数将字段内容收集起来。 针对用户表：处理方式相同，将每个购买的商品行转列，再通过join on关联上商品表，能关联上的就是最终复合要求的。

select
     t1.customer_id
from
     (
     select
          customer_id,
          concat_ws("|", collect_set(cast(product_key as string))) c_pk
     from
          1135_Customer
     group by
          customer_id 
     ) t1
join (
     select
          conct_ws("|",collect_set(cast(p.product_key as string))) p_pk
     from
          (
          select
              product_key, rank() over() rk
          from
              1135 _Product
          order by
              product_key ) p
     group by
          p.rk 
    ) t2
where
     t1.c_pk = t2.p_pk

执行结果：

执行中间结果t1

执行中间结果t2

1136_Easy_合作过至少三次的演员和导演

ActorDirector 表：

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| actor_id    | int     |
| director_id | int     |
| timestamp   | int     |
+-------------+---------+

timestamp 是这张表的主键. 写一条SQL查询语句获取合作过至少三次的演员和导演的 id 对 (actor_id, director_id)示例：ActorDirector 表：

+-------------+-------------+-------------+
| actor_id    | director_id | timestamp   |
+-------------+-------------+-------------+
| 1           | 1           | 0           |
| 1           | 1           | 1           |
| 1           | 1           | 2           |
| 1           | 2           | 3           |
| 1           | 2           | 4           |
| 2           | 1           | 5           |
| 2           | 1           | 6           |
+-------------+-------------+-------------+

Result 表：

+-------------+-------------+
| actor_id    | director_id |
+-------------+-------------+
| 1           | 1           |
+-------------+-------------+

唯一的 id 对是 (1, 1)，他们恰好合作了 3 次。

建表+导数：

create table 1136_ActorDirector(actor_id int, director_id int, `timestamp` int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1136_ActorDirector.txt" overwrite into table 1136_ActorDirector    ;

知识点：开窗函数

对actor_id,director_id分组开窗求条数，再对条数等于3的过滤出结果，最后对结果distinct去重**

select
     distinct t.actor_id,
     t.director_id
from
     (
     select
          actor_id,director_id,
          count(*) over(partition by actor_id,director_id) num
     from
          1136 _ActorDirector 
     ) t
where
     num = 3

执行结果：

中间表t的执行结果：

产品销售分析

销售表 Sales：

+-------------+-------+
| Column Name | Type  |
+-------------+-------+
| sale_id     | int   |
| product_id  | int   |
| year        | int   |
| quantity    | int   |
| price       | int   |
+-------------+-------+

(sale_id, year) 是销售表 Sales 的主键. product_id 是关联到产品表 Product 的外键. 注意: price 表示每单位价格产品表 Product：

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| product_id   | int     |
| product_name | varchar |
+--------------+---------+

product_id 是表的主键.

示例：Sales 表：

+---------+------------+------+----------+-------+
| sale_id | product_id | year | quantity | price |
+---------+------------+------+----------+-------+ 
| 1       | 100        | 2008 | 10       | 5000  |
| 2       | 100        | 2009 | 12       | 5000  |
| 7       | 200        | 2011 | 15       | 9000  |
+---------+------------+------+----------+-------+

Product 表：

+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 100        | Nokia        |
| 200        | Apple        |
| 300        | Samsung      |
+------------+--------------+

1153_Easy_产品销售分析 I

写一条SQL 查询语句获取 Sales 表中所有产品对应的产品名称 product_name 以及该产品的所有售卖年份 year 和价格 price 。

Result 表：

+--------------+-------+-------+
| product_name | year  | price |
+--------------+-------+-------+
| Nokia        | 2008  | 5000  |
| Nokia        | 2009  | 5000  |
| Apple        | 2011  | 9000  |
+--------------+-------+-------+

建表+导数

create table 1153_Sales(sale_id int, product_id int, `year` int, quantity int, price int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1153_Sales.txt" overwrite into table 1153_Sales	;

create table 1153_Product(product_id int, product_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1153_Product.txt" overwrite into table 1153_Product	;

知识点：表左连接

select 
    p.product_name, s.`year`, s.price
from 
    1153_Sales s
left join
    1153_Product p
on s.product_id=p.product_id

执行结果：

1153_Easy_产品销售分析 II

编写一个 SQL 查询，按产品 id product_id 来统计每个产品的销售总量。 Result 表：

+--------------+----------------+
| product_id   | total_quantity |
+--------------+----------------+
| 100          | 22             |
| 200          | 15             |
+--------------+----------------+

建表+导数

create table 1154_Sales(sale_id int, product_id int, `year` int, quantity int, price int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1154_Sales.txt" overwrite into table 1154_Sales ;

create table 1154_Product(product_id int, product_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1154_Product.txt" overwrite into table 1154_Product ;

思路：group by 分区，在sum求和即可

select 
product_id,
sum(quantity) as total_quantity 
from 1154_Sales
group by product_id

执行结果：

1155_Medium_产品销售分析 III

写一条SQL 查询语句获取 Sales 表中所有产品对应的产品名称 product_name 以及该产品的所有售卖年份 year 和价格 price 。 Result 表：

+--------------+-------+-------+
| product_name | year  | price |
+--------------+-------+-------+
| Nokia        | 2008  | 5000  |
| Nokia        | 2009  | 5000  |
| Apple        | 2011  | 9000  |
+--------------+-------+-------+

建表+导数：

create table 1155_Sales(sale_id int, product_id int, `year` int, quantity int, price int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1155_Sales.txt" overwrite into table 1155_Sales   ;

create table 1155_Product(product_id int, product_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1155_Product.txt" overwrite into table 1155_Product ;

select 
p.product_name, s.`year`, s.price
from 1154_Sales s
left join 1155_Product p
on s.product_id=p.product_id

执行结果：

游戏玩法分析

活动表 Activity：

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+

表的主键是 (player_id, event_date)。这张表展示了一些游戏玩家在游戏平台上的行为活动。每行数据记录了一名玩家在退出平台之前，当天使用同一台设备登录平台后打开的游戏的数目（可能是 0 个）。

1179_Easy_游戏玩法分析 I

写一条 SQL 查询语句获取每位玩家第一次登陆平台的日期。查询结果的格式如下所示：Activity 表：

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result 表：

+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1         | 2016-03-01  |
| 2         | 2017-06-25  |
| 3         | 2016-03-02  |
+-----------+-------------+

建表+导数

create table 1179_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1179_Activity.txt" overwrite into table 1179_Activity;

知识点：开窗函数结合分组top n

select 
distinct 
t.player_id, t.event_date
from 
(
     select 
     player_id,
     first_value(event_date) over(partition by player_id order by event_date) event_date
     from 1179_Activity
) t

执行结果

1180_Easy_游戏玩法分析 II

请编写一个 SQL 查询，描述每一个玩家首次登陆的设备名称查询结果格式在以下示例中：Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1         | 2         |
| 2         | 3         |
| 3         | 1         |
+-----------+-----------+

建表+导数：

create table 1180_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1180_Activity.txt" overwrite into table 1180_Activity;

知识点：开窗函数实现分组top n，并结合条件过滤

思路：取分组数据的第一条 实现：开窗对日期、排序之后取first_value，对该数据过滤关联上driver_id即可

select 
t.player_id, t.device_id
from 
(
     select 
     player_id,
     device_id,
     event_date,
     first_value(event_date) over(partition by player_id order by event_date) first_day
     from 1180_Activity
) t
where t.event_date=t.first_day

执行结果

1181_Easy_游戏玩法分析 III ++++++++++++++++

编写一个 SQL 查询，同时报告每组玩家和日期，以及玩家到目前为止玩了多少游戏。也就是说，在此日期之前玩家所玩的游戏总数。详细情况请查看示例。查询结果格式如下所示：Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 1         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1         | 2016-03-01 | 5                   |
| 1         | 2016-05-02 | 11                  |
| 1         | 2017-06-25 | 12                  |
| 3         | 2016-03-02 | 0                   |
| 3         | 2018-07-03 | 5                   |
+-----------+------------+---------------------+

对于 ID 为 1 的玩家，2016-05-02 共玩了 5+6=11 个游戏，2017-06-25 共玩了 5+6+1=12 个游戏。对于 ID 为 3 的玩家，2018-07-03 共玩了 0+5=5 个游戏。请注意，对于每个玩家，我们只关心玩家的登录日期。

建表+导数：

create table 1181_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1181_Activity.txt" overwrite into table 1181_Activity;

知识点：开窗函数，窗口大小控制

对日期分区，排序之后，计算将截止到当前时间的和** 实现：通过over（）开窗，在窗内对日期字段分组并升序排列，使用ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW，控制窗口大小，即从组内第一条到前条，再通过sum（）计算窗口内的和

select
     player_id,
     event_date,
     sum(games_played) over(partition by player_id
order by
     event_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) games_played_so_far
from
     1181 _Activity

执行结果：

1182_Medium_游戏玩法分析 IV ？？？？？？？？？

编写一个 SQL 查询，报告在首次登录的第二天再次登录的玩家的比率，四舍五入到小数点后两位。换句话说，您需要计算从首次登录日期开始至少连续两天登录的玩家的数量，然后除以玩家总数。查询结果格式如下所示：Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+

只有 ID 为 1 的玩家在第一天登录后才重新登录，所以答案是 1/3 = 0.33

建表+导数

create table 1182_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1182_Activity.txt" overwrite into table 1182_Activity;

知识点：开窗函数计算连续登陆

两步走：第一步获取连续登陆的人的id，第二步：对player_id去重，得到总人数。将两步join起来。通过date_sub函数减去rank值，在分组，即可得到连续登陆得人。最后将在group by中对分组内的条数求和，过滤出条数大于2的，在select中通过round（）函数保留小数位，在round函数中使用player_id人数（distinct去重）/ cnt**

select
     round(count(distinct t1.player_id)/ t1.cnt,2)
from
     (
     select
          t.player_id,
          t.event_date,
          date_sub(t.event_date, t.rk) event_date_sub_rk,
          t.cnt
     from
          (
          select
              a.player_id,
              a.event_date,
              rank() over(partition by a.player_id order by a.event_date) rk,
              base2.cnt
          from
              1182 _Activity a
          left join
              (
              select
                   dis_play,
                   count(1) over() cnt
              from
                   (
                   select
                        distinct player_id dis_play
                   from
                        1182 _Activity 
                   ) base
               ) base2 
           on
              a.player_id = base2.dis_play ) t ) t1
group by
     player_id,
     event_date_sub_rk,
     cnt
having
     count(*)= 2

执行结果：

中间表t执行结果：

1193_Hard_游戏玩法分析V ？？？？？？？？？？？？

编写一个 SQL 查询，报告所有安装日期、当天安装游戏的玩家数量和玩家的第一天留存率。查询结果格式如下所示：Activity 表：

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-01 | 0            |
| 3         | 4         | 2016-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result 表：

+------------+----------+----------------+
| install_dt | installs | Day1_retention |
+------------+----------+----------------+
| 2016-03-01 | 2        | 0.50           |
| 2017-06-25 | 1        | 0.00           |
+------------+----------+----------------+

玩家 1 和 3 在 2016-03-01 安装了游戏，但只有玩家 1 在 2016-03-02 重新登录，所以 2016-03-01 的第一天留存率是 1/2=0.50 玩家 2 在 2017-06-25 安装了游戏，但在 2017-06-26 没有重新登录，因此 2017-06-25 的第一天留存率为 0/1=0.00

建表+导数

create table 1193_Activity(player_id int, device_id int, event_date date, games_played int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1193_Activity.txt" overwrite into table 1193_Activity;

1198_Medium_小众书籍

书籍表 Books：

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| book_id        | int     |
| name           | varchar |
| available_from | date    |
+----------------+---------+

book_id 是这个表的主键。订单表 Orders：

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| order_id       | int     |
| book_id        | int     |
| quantity       | int     |
| dispatch_date  | date    |
+----------------+---------+

order_id 是这个表的主键。 book_id 是 Books 表的外键。你需要写一段 SQL 命令，筛选出过去一年中订单总量少于10本的书籍。注意：不考虑上架（available from）距今不满一个月的书籍。并且假设今天是 2019-06-23 。下面是样例输出结果：Books 表：

+---------+--------------------+----------------+
| book_id | name               | available_from |
+---------+--------------------+----------------+
| 1       | "Kalila And Demna" | 2010-01-01     |
| 2       | "28 Letters"       | 2012-05-12     |
| 3       | "The Hobbit"       | 2019-06-10     |
| 4       | "13 Reasons Why"   | 2019-06-01     |
| 5       | "The Hunger Games" | 2008-09-21     |
+---------+--------------------+----------------+

Orders 表：

+----------+---------+----------+---------------+
| order_id | book_id | quantity | dispatch_date |
+----------+---------+----------+---------------+
| 1        | 1       | 2        | 2018-07-26    |
| 2        | 1       | 1        | 2018-11-05    |
| 3        | 3       | 8        | 2019-06-11    |
| 4        | 4       | 6        | 2019-06-05    |
| 5        | 4       | 5        | 2019-06-20    |
| 6        | 5       | 9        | 2009-02-02    |
| 7        | 5       | 8        | 2010-04-13    |
+----------+---------+----------+---------------+

Result 表：

+-----------+--------------------+
| book_id   | name               |
+-----------+--------------------+
| 1         | "Kalila And Demna" |
| 2         | "28 Letters"       |
| 5         | "The Hunger Games" |
+-----------+--------------------+

建表+导数

create table 1198_Books(book_id int, name string, available_from date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1193_Activity.txt" overwrite into table 1193_Activity;

create table 1198_Orders(order_id int, book_id int, quantity int，dispatch_date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath "/leetcode/1193_Activity.txt" overwrite into table 1193_Activity;

知识点：join过程中的空值处理，datediff，date_sub

思路：对给定条件进行left join，并在join过程中过滤。对于空值，使用nvl（）函数设置默认值 datediff('2019-06-23', o.dispatch_date) <= 365 过滤出近一年的数据。 date_sub(o.dispatch_date, 30) > available_from过滤出30天外的数据。

select 
t.name name
from 
(
          select
          b.name,   b.available_from, nvl(o.quantity,0), date_sub(o.dispatch_date,30) sub_dispatch_date
     from
          1198_Books b
     left join 1198_Orders o 
     on b.book_id = o.book_id and datediff('2019-06-23', o.dispatch_date) <= 365
     where
          date_sub(o.dispatch_date, 30) > available_from or quantity is NULL
) t
group by t.name having sum(t.quantity) < 10

执行结果：

执行中间结果表t：

1204_Medium_每日新用户统计

Traffic 表：

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| activity      | enum    |
| activity_date | date    |
+---------------+---------+

该表没有主键，它可能有重复的行。 activity 列是 ENUM 类型，可能取 ('login', 'logout', 'jobs', 'groups', 'homepage') 几个值之一。编写一个 SQL 查询，以查询从今天起最多 90 天内，每个日期该日期首次登录的用户数。假设今天是 2019-06-30.查询结果格式如下例所示：Traffic 表：

+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1       | login    | 2019-05-01    |
| 1       | homepage | 2019-05-01    |
| 1       | logout   | 2019-05-01    |
| 2       | login    | 2019-06-21    |
| 2       | logout   | 2019-06-21    |
| 3       | login    | 2019-01-01    |
| 3       | jobs     | 2019-01-01    |
| 3       | logout   | 2019-01-01    |
| 4       | login    | 2019-06-21    |
| 4       | groups   | 2019-06-21    |
| 4       | logout   | 2019-06-21    |
| 5       | login    | 2019-03-01    |
| 5       | logout   | 2019-03-01    |
| 5       | login    | 2019-06-21    |
| 5       | logout   | 2019-06-21    |
+---------+----------+---------------+

Result 表：

+------------+-------------+
| login_date | user_count  |
+------------+-------------+
| 2019-05-01 | 1           |
| 2019-06-21 | 2           |
+------------+-------------+

请注意，我们只关心用户数非零的日期. ID 为 5 的用户第一次登陆于 2019-03-01，因此他不算在 2019-06-21 的的统计内。

建表+导数

create table 1204_Traffic (user_id int, activity string, activity_date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1204_Traffic.txt" overwrite into table 1204_Traffic;

知识点：分组top n

思路，为了取出首次登陆（因为用户有多次登陆），先对use分组，对日期排序，取rank = 1，即是用户首次登陆。

select 
t.activity_date, count(*) user_count  
from 
(
     select 
     user_id, activity_date,activity,
     rank() over(partition by user_id order by activity_date) rk
     from 
     1204_Traffic
     where activity='login' 
) t
where t.rk = 1 and datediff('2019-06-30', activity_date) < 90
group by t.activity_date

执行结果：

1214_Medium_每位学生的最高成绩

表： Enrollments

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| course_id     | int     |
| grade         | int     |
+---------------+---------+

(student_id, course_id) 是该表的主键。

编写一个 SQL 查询，查询每位学生获得的最高成绩和它所对应的科目，若科目成绩并列，取 course_id 最小的一门。查询结果需按 student_id 增序进行排序。查询结果格式如下所示：Enrollments 表：

+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 2          | 2         | 95    |
| 2          | 3         | 95    |
| 1          | 1         | 90    |
| 1          | 2         | 99    |
| 3          | 1         | 80    |
| 3          | 2         | 75    |
| 3          | 3         | 82    |
+------------+-----------+-------+

Result 表：

+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 1          | 2         | 99    |
| 2          | 2         | 95    |
| 3          | 3         | 82    |
+------------+-----------+-------+

思路：在最内层中做两个排序，第一个排序按student_id分组，按grade降序排，目的在于找到最高得成绩，第二个排序按student_id, grade分组，按course_id升序排，找到相同成绩时的course_id最小值。最外层按student_id升序排

select 
student_id, course_id, grade
from 
(
     select 
     student_id, course_id, grade,
     rank() over(partition by student_id order by grade desc) rk_grade,
     rank() over(partition by student_id, grade order by course_id) rk_course_id
     from 1214_Enrollments
) t
where rk_grade=1 and rk_course_id=1
order by student_id

执行结果：

1215_Easy_报告的记录

动作表：Actions

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| post_id       | int     |
| action_date   | date    | 
| action        | enum    |
| extra         | varchar |
+---------------+---------+

此表没有主键，所以可能会有重复的行。 action 字段是 ENUM 类型的，包含:('view', 'like', 'reaction', 'comment', 'report', 'share') extra 字段是可选的信息（可能为 null），其中的信息例如有：1.报告理由(a reason for report) 2.反应类型(a type of reaction) 编写一条SQL，查询每种报告理由（report reason）在昨天的不同报告数量（post_id）。假设今天是 2019-07-05。查询及结果的格式示例：Actions table:

+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 4       | 2019-07-04  | view   | null   |
| 2       | 4       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-04  | view   | null   |
| 5       | 2       | 2019-07-04  | report | racism |
| 5       | 5       | 2019-07-04  | view   | null   |
| 5       | 5       | 2019-07-04  | report | racism |
+---------+---------+-------------+--------+--------+

Result table:

+---------------+--------------+
| report_reason | report_count |
+---------------+--------------+
| spam          | 1            |
| racism        | 2            |
+---------------+--------------+

注意，我们只关心报告数量非零的结果。

建表+导数：

create table 1215_Actions(user_id int, post_id int, action_date date, `action` string, extra varchar(100)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath "/leetcode/1215_Actions" overwrite into table 1215_Actions;

思路：日期函数datediff的使用

select 
extra,
count(extra)
from 
(
     select 
     datediff("2019-07-05", action_date) ddf,
     action,
     extra
     from 1215_Actions
) t
where ddf=1 and extra!="null"
group by extra

力扣HQL--<614-1215>

614_Medium_二级关注者

615_Hard_平均工资：部门与公司比较

618_Hard_学生地理信息报告

619_Easy_只出现一次的最大数字

620_Easy_有趣的电影

626_Medium_换座位

627_Easy_变更性别

1135_Medium_买下所有产品的客户 ++++++++++++++++

知识点：行列互转

1136_Easy_合作过至少三次的演员和导演

知识点：开窗函数

产品销售分析

1153_Easy_产品销售分析 I

知识点：表左连接

1153_Easy_产品销售分析 II

1155_Medium_产品销售分析 III

游戏玩法分析

1179_Easy_游戏玩法分析 I

知识点：开窗函数结合分组top n

1180_Easy_游戏玩法分析 II

知识点：开窗函数实现分组top n，并结合条件过滤

1181_Easy_游戏玩法分析 III ++++++++++++++++

知识点：开窗函数，窗口大小控制

1182_Medium_游戏玩法分析 IV ？？？？？？？？？

知识点：开窗函数计算连续登陆

1193_Hard_游戏玩法分析V ？？？？？？？？？？？？

1198_Medium_小众书籍

知识点：join过程中的空值处理，datediff，date_sub

1204_Medium_每日新用户统计

知识点：分组top n

1214_Medium_每位学生的最高成绩

1215_Easy_报告的记录