LeetCode数据库SQL练习(七)

156 阅读9分钟

报告的记录系列

动作表:Actions

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| post_id       | int     |
| action_date   | date    | 
| action        | enum    |
| extra         | varchar |
+---------------+---------+
此表没有主键,所以可能会有重复的行。
action 字段是 ENUM 类型的,包含:('view', 'like', 'reaction', 'comment',  'report'  , 'share')
extra 字段是可选的信息(可能为 null),其中的信息例如有:1.报告理由(a reason for report)  2.反应类型(a type of reaction)

移除表: Removals

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| post_id       | int     |
| remove_date   | date    | 
+---------------+---------+
这张表的主键是 post_id。
这张表的每一行表示一个被移除的帖子,原因可能是由于被举报或被管理员审查。

1、编写一条SQL,查询每种 报告理由(report reason)在昨天的不同报告数量(post_id)。假设今天是 2019-07-05

查询及结果的格式示例:

Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 4       | 2019-07-04  | view   | null   |
| 2       | 4       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-04  | view   | null   |
| 5       | 2       | 2019-07-04  | report | racism |
| 5       | 5       | 2019-07-04  | view   | null   |
| 5       | 5       | 2019-07-04  | report | racism |
+---------+---------+-------------+--------+--------+

Result table:
+---------------+--------------+
| report_reason | report_count |
+---------------+--------------+
| spam          | 1            |
| racism        | 2            |
+---------------+--------------+ 
注意,我们只关心报告数量非零的结果。
  • SQL:
    • select
      extra report_reason, count(distinct post_id) report_count
      from
      Actions
      where action = 'report' and action_date = '2019-07-04' and extra is not null
      group by extra
      

2、编写一段 SQL 来查找:在被报告为垃圾广告的帖子中,被移除的帖子的每日平均占比,四舍五入到小数点后 2 位

查询结果的格式如下:

Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 2       | 2019-07-04  | view   | null   |
| 2       | 2       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-03  | view   | null   |
| 5       | 2       | 2019-07-03  | report | racism |
| 5       | 5       | 2019-07-03  | view   | null   |
| 5       | 5       | 2019-07-03  | report | racism |
+---------+---------+-------------+--------+--------+

Removals table:
+---------+-------------+
| post_id | remove_date |
+---------+-------------+
| 2       | 2019-07-20  |
| 3       | 2019-07-18  |
+---------+-------------+

Result table:
+-----------------------+
| average_daily_percent |
+-----------------------+
| 75.00                 |
+-----------------------+
2019-07-04 的垃圾广告移除率是 50%,因为有两张帖子被报告为垃圾广告,但只有一个得到移除。
2019-07-02 的垃圾广告移除率是 100%,因为有一张帖子被举报为垃圾广告并得到移除。
其余几天没有收到垃圾广告的举报,因此平均值为:(50 + 100) / 2 = 75%
注意,输出仅需要一个平均值即可,我们并不关注移除操作的日期。
  • SQL1:

    • 子查询
    • select
      round(avg(t.rate) * 100, 2) average_daily_percent
      from
          (select
          action_date,
          count(distinct r.post_id) / count(distinct a.post_id) rate
          from
          Actions a left Join Removals r
          on a.post_id = r.post_id
          where extra = 'spam'
          group by a.action_date) t
      
  • SQL2:

    • Join方法
    • select
      round(avg(ifnull(t2.cnt2, 0) / t1.cnt1) * 100, 2) average_daily_percent
      from
          (select
          action_date, count(distinct post_id) cnt1
          from
          Actions
          where extra = 'spam'
          group by action_date) t1
      left Join
          (select
          action_date, count(distinct post_id) cnt2
          from
          Actions
          where extra = 'spam' and post_id in (select post_id from Removals)
          group by action_date) t2
      on t1.action_date = t2.action_date
      

用户购买平台

支出表: Spending

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| spend_date  | date    |
| platform    | enum    | 
| amount      | int     |
+-------------+---------+
这张表记录了用户在一个在线购物网站的支出历史,该在线购物平台同时拥有桌面端('desktop')和手机端('mobile')的应用程序。
这张表的主键是 (user_id, spend_date, platform)。
平台列 platform 是一种 ENUM ,类型为('desktop', 'mobile')。

写一段 SQL 来查找每天  使用手机端用户、 使用桌面端用户和 同时 使用桌面端和手机端的用户人数和总支出金额。

查询结果格式如下例所示:

Spending table:
+---------+------------+----------+--------+
| user_id | spend_date | platform | amount |
+---------+------------+----------+--------+
| 1       | 2019-07-01 | mobile   | 100    |
| 1       | 2019-07-01 | desktop  | 100    |
| 2       | 2019-07-01 | mobile   | 100    |
| 2       | 2019-07-02 | mobile   | 100    |
| 3       | 2019-07-01 | desktop  | 100    |
| 3       | 2019-07-02 | desktop  | 100    |
+---------+------------+----------+--------+

Result table:
+------------+----------+--------------+-------------+
| spend_date | platform | total_amount | total_users |
+------------+----------+--------------+-------------+
| 2019-07-01 | desktop  | 100          | 1           |
| 2019-07-01 | mobile   | 100          | 1           |
| 2019-07-01 | both     | 200          | 1           |
| 2019-07-02 | desktop  | 100          | 1           |
| 2019-07-02 | mobile   | 100          | 1           |
| 2019-07-02 | both     | 0            | 0           |
+------------+----------+--------------+-------------+ 
 2019-07-01, 用户1 同时 使用桌面端和手机端购买, 用户2  使用了手机端购买,而用户3  使用了桌面端购买。
 2019-07-02, 用户2  使用了手机端购买, 用户3  使用了桌面端购买,且没有用户 同时 使用桌面端和手机端购买。
  • SQL:
    • select
      t1.spend_date, t1.platform,
      coalesce(sum(t2.amount), 0) total_amount,
      coalesce(sum(t2.users), 0) total_users
      from
          (select distinct spend_date,'desktop' as platform from Spending
          union
          select distinct spend_date,'mobile' as platform from Spending
          union
          select distinct spend_date,'both' as platform from Spending) t1
      left join
          (select
          user_id, spend_date,
          case when count(platform) = 2 then 'both' else platform end platform,
          sum(amount) amount,
          count(distinct user_id) users
          from
          Spending
          group by user_id, spend_date) t2
      on t1.spend_date = t2.spend_date and t1.platform = t2.platform
      group by t1.spend_date, t1.platform
      

查询近30天活跃用户数系列

活动记录表:Activity

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| session_id    | int     |
| activity_date | date    |
| activity_type | enum    |
+---------------+---------+
该表是用户在社交网站的活动记录。
该表没有主键,可能包含重复数据。
activity_type 字段为以下四种值 ('open_session', 'end_session', 'scroll_down', 'send_message')。
每个 session_id 只属于一个用户。

1、请写SQL查询出截至 2019-07-27(包含2019-07-27) ,近 30天的每日活跃用户数(当天只要有一条活动记录,即为活跃用户)。

查询结果示例如下:

Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1       | 1          | 2019-07-20    | open_session  |
| 1       | 1          | 2019-07-20    | scroll_down   |
| 1       | 1          | 2019-07-20    | end_session   |
| 2       | 4          | 2019-07-20    | open_session  |
| 2       | 4          | 2019-07-21    | send_message  |
| 2       | 4          | 2019-07-21    | end_session   |
| 3       | 2          | 2019-07-21    | open_session  |
| 3       | 2          | 2019-07-21    | send_message  |
| 3       | 2          | 2019-07-21    | end_session   |
| 4       | 3          | 2019-06-25    | open_session  |
| 4       | 3          | 2019-06-25    | end_session   |
+---------+------------+---------------+---------------+

Result table:
+------------+--------------+ 
| day        | active_users |
+------------+--------------+ 
| 2019-07-20 | 2            |
| 2019-07-21 | 2            |
+------------+--------------+ 
非活跃用户的记录不需要展示。
  • 近30天:

    • where datediff('2019-07-27',activity_date) < 30
    • where activity_date between date_add('2019-07-27', interval -29 day) and '2019-07-27'
  • SQL:

    • select
      activity_date day, count(distinct user_id) active_users
      from
      Activity
      where activity_date between '2019-06-28' and '2019-07-27'
      group by activity_date
      

2、编写SQL查询以查找截至2019年7月27日(含)的30天内每个用户的平均会话数,四舍五入到小数点后两位。我们只统计那些会话期间用户至少进行一项活动的有效会话。

查询结果格式如下例所示:

Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1       | 1          | 2019-07-20    | open_session  |
| 1       | 1          | 2019-07-20    | scroll_down   |
| 1       | 1          | 2019-07-20    | end_session   |
| 2       | 4          | 2019-07-20    | open_session  |
| 2       | 4          | 2019-07-21    | send_message  |
| 2       | 4          | 2019-07-21    | end_session   |
| 3       | 2          | 2019-07-21    | open_session  |
| 3       | 2          | 2019-07-21    | send_message  |
| 3       | 2          | 2019-07-21    | end_session   |
| 3       | 5          | 2019-07-21    | open_session  |
| 3       | 5          | 2019-07-21    | scroll_down   |
| 3       | 5          | 2019-07-21    | end_session   |
| 4       | 3          | 2019-06-25    | open_session  |
| 4       | 3          | 2019-06-25    | end_session   |
+---------+------------+---------------+---------------+

Result table:
+---------------------------+ 
| average_sessions_per_user |
+---------------------------+ 
| 1.33                      |
+---------------------------+ 
User 1  2 在过去30天内各自进行了1次会话,而用户3进行了2次会话,因此平均值为(1 +1 + 2)/ 3 = 1.33
  • SQL:
    • select
      ifnull(round(count(distinct session_id) / count(distinct user_id), 2), 0)average_sessions_per_user
      from
      Activity
      where datediff('2019-07-27', activity_date) < 30
      

文章浏览系列

Views 表:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
此表无主键,因此可能会存在重复行。
此表的每一行都表示某人在某天浏览了某位作者的某篇文章。
请注意,同一人的 author_id 和 viewer_id 是相同的。

1、请编写一条 SQL 查询以找出所有浏览过自己文章的作者,结果按照 id 升序排列。

查询结果的格式如下所示:

Views 表:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+

结果表:
+------+
| id   |
+------+
| 4    |
| 7    |
+------+
  • SQL:
    • select
      distinct author_id id
      from
      Views
      where author_id = viewer_id
      order by author_id
      

2、编写一条 SQL 查询来找出在同一天阅读至少两篇文章的人,结果按照 id 升序排序。

查询结果的格式如下:

Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 3          | 4         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+

Result table:
+------+
| id   |
+------+
| 5    |
| 6    |
+------+
  • SQL:
    • select
      distinct viewer_id id
      from
      Views
      group by view_date, viewer_id
      having count(distinct article_id) >= 2
      order by viewer_id
      

市场分析系列

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| join_date      | date    |
| favorite_brand | varchar |
+----------------+---------+
此表主键是 user_id,表中描述了购物网站的用户信息,用户可以在此网站上进行商品买卖。

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| item_id       | int     |
| buyer_id      | int     |
| seller_id     | int     |
+---------------+---------+
此表主键是 order_id,外键是 item_id 和(buyer_id,seller_id)。

Table: Item

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| item_id       | int     |
| item_brand    | varchar |
+---------------+---------+
此表主键是 item_id。

1、请写出一条SQL语句以查询每个用户的注册日期和在 2019 年作为买家的订单总数。

查询结果格式如下:

Users table:
+---------+------------+----------------+
| user_id | join_date  | favorite_brand |
+---------+------------+----------------+
| 1       | 2018-01-01 | Lenovo         |
| 2       | 2018-02-09 | Samsung        |
| 3       | 2018-01-19 | LG             |
| 4       | 2018-05-21 | HP             |
+---------+------------+----------------+

Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2018-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2018-08-04 | 1       | 4        | 2         |
| 5        | 2018-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |
+----------+------------+---------+----------+-----------+

Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |
+---------+------------+

Result table:
+-----------+------------+----------------+
| buyer_id  | join_date  | orders_in_2019 |
+-----------+------------+----------------+
| 1         | 2018-01-01 | 1              |
| 2         | 2018-02-09 | 2              |
| 3         | 2018-01-19 | 0              |
| 4         | 2018-05-21 | 0              |
+-----------+------------+----------------+
  • 筛选2019年:

    • between '2019-01-01' and '2020-01-01'
    • year(order_date)='2019'
  • SQL1:

    • select
      u.user_id buyer_id, u.join_date, count(o.order_id) orders_in_2019
      from
      Users u left join Orders o
      on u.user_id = o.buyer_id and (o.order_date between '2019-01-01' and '2020-01-01')
      group by u.user_id
      
  • SQL2:

    • select
      u.user_id buyer_id, join_date, ifnull(t.cnt, 0) orders_in_2019
      from
      Users u
      left join
          (select
          buyer_id, count(order_id) cnt
          from
          Orders
          where order_date between '2019-01-01' and '2020-01-01'
          group by buyer_id) t
      on u.user_id = t.buyer_id
      

2、写一个 SQL 查询确定每一个用户按日期顺序卖出的第二件商品的品牌是否是他们最喜爱的品牌。如果一个用户卖出少于两件商品,查询的结果是 no 。
题目保证没有一个用户在一天中卖出超过一件商品

下面是查询结果格式的例子:

Users table:
+---------+------------+----------------+
| user_id | join_date  | favorite_brand |
+---------+------------+----------------+
| 1       | 2019-01-01 | Lenovo         |
| 2       | 2019-02-09 | Samsung        |
| 3       | 2019-01-19 | LG             |
| 4       | 2019-05-21 | HP             |
+---------+------------+----------------+

Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2019-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2019-08-04 | 1       | 4        | 2         |
| 5        | 2019-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |
+----------+------------+---------+----------+-----------+

Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |
+---------+------------+

Result table:
+-----------+--------------------+
| seller_id | 2nd_item_fav_brand |
+-----------+--------------------+
| 1         | no                 |
| 2         | yes                |
| 3         | yes                |
| 4         | no                 |
+-----------+--------------------+

id  1 的用户的查询结果是 no,因为他什么也没有卖出
id为 2  3 的用户的查询结果是 yes,因为他们卖出的第二件商品的品牌是他们自己最喜爱的品牌
id为 4 的用户的查询结果是 no,因为他卖出的第二件商品的品牌不是他最喜爱的品牌
  • MySQL变量: 1、使用 @ 来定义一个变量。比如:@a
    2、使用 := 来给变量赋值。比如: @a := 123,表示变量 a 的值为 123。

  • SQL1:

    • select
      user_id seller_id, if(r2.item_brand is null || r2.item_brand != favorite_brand, "no", "yes") 2nd_item_fav_brand
      from
      Users
      left join
      (select
      r1.seller_id, items.item_brand
      from
          (select
          @rk := if (@seller = t.seller_id, @rk + 1, 1) rk,
          @seller := t.seller_id seller_id,
          t.item_id
          from
              (select
              seller_id, item_id
              from Orders
              order by seller_id, order_date) t,
              (select @seller := -1, @rk := 0) b) r1
      join items
      on r1.item_id = items.item_id
      where r1.rk = 2) r2
      on user_id = r2.seller_id
      
  • SQL2:

    • select
      seller_id, if(item_brand = favorite_brand, 'yes', 'no') 2nd_item_fav_brand
      from
          (select
          u.user_id seller_id, u.join_date, u.favorite_brand, t1.item_id item_id, i.item_brand item_brand
          from
          Users u left join
          (select
          t.seller_id, t.item_id
          from
              (select
              seller_id, item_id,
              RANK() over(partition by seller_id order by order_date) rk
              from
              Orders) t
              where t.rk = 2) t1
          on u.user_id = t1.seller_id
          left join Items i
          on t1.item_id = i.item_id) t2
      

指定日期的产品价格

产品数据表: Products

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| new_price     | int     |
| change_date   | date    |
+---------------+---------+
这张表的主键是 (product_id, change_date)。
这张表的每一行分别记录了 某产品 在某个日期 更改后 的新价格。

写一段 SQL来查找在 2019-08-16 时全部产品的价格,假设所有产品在修改前的价格都是 10。

查询结果格式如下例所示:

Products table:
+------------+-----------+-------------+
| product_id | new_price | change_date |
+------------+-----------+-------------+
| 1          | 20        | 2019-08-14  |
| 2          | 50        | 2019-08-14  |
| 1          | 30        | 2019-08-15  |
| 1          | 35        | 2019-08-16  |
| 2          | 65        | 2019-08-17  |
| 3          | 20        | 2019-08-18  |
+------------+-----------+-------------+

Result table:
+------------+-------+
| product_id | price |
+------------+-------+
| 2          | 50    |
| 1          | 35    |
| 3          | 10    |
+------------+-------+
  • SQL1:

    • select
      t1.product_id, ifnull(t2.new_price, 10) price
      from
          (select
          distinct product_id
          from
          products) t1
      left join
          (select
          product_id, new_price
          from
          products
          where (product_id, change_date) in (select
                                              product_id, max(change_date)
                                              from
                                              products
                                              where change_date <= '2019-08-16'
                                              group by product_id)) t2
      on t1.product_id = t2.product_id
      
  • SQL2:

    • 窗口函数:
    • select
      p1.product_id, ifnull(p2.new_price, 10) price
      from
          (select
          distinct product_id
          from
          Products) p1
      left join
          (select
          product_id, new_price,
          ROW_NUMBER() over(partition by product_id order by change_date DESC) rn
          from
          Products
          where change_date<='2019-08-16') p2
      on p1.product_id = p2.product_id and p2.rn = 1