1.问题描述
- 某一时间段各个商品的销量排行。
- 销量飙升榜。
2.初始化脚本
-- 创建示例表
create table products(
product_id integer not null primary key,
product_name varchar(100) not null unique,
product_subcategory varchar(100) not null,
product_category varchar(100) not null
);
insert into products values(1, 'iPhone 11', '手机', '手机通讯');
insert into products values(2, 'HUAWEI P40', '手机', '手机通讯');
insert into products values(3, '小米10', '手机', '手机通讯');
insert into products values(4, 'OPPO Reno4', '手机', '手机通讯');
insert into products values(5, 'vivo Y70s', '手机', '手机通讯');
insert into products values(6, '海尔BCD-216STPT', '冰箱', '大家电');
insert into products values(7, '康佳BCD-155C2GBU', '冰箱', '大家电');
insert into products values(8, '容声BCD-529WD11HP', '冰箱', '大家电');
insert into products values(9, '美的BCD-213TM(E)', '冰箱', '大家电');
insert into products values(10, '格力BCD-230WETCL', '冰箱', '大家电');
insert into products values(11, '格力KFR-35GW', '空调', '大家电');
insert into products values(12, '美的KFR-35GW', '空调', '大家电');
insert into products values(13, 'TCLKFRd-26GW', '空调', '大家电');
insert into products values(14, '奥克斯KFR-35GW', '空调', '大家电');
insert into products values(15, '海尔KFR-35GW', '空调', '大家电');
create table sales(
product_id integer not null,
sale_time timestamp not null,
quantity integer not null
);
-- 生成模拟销量数据
insert into sales
with recursive s(product_id, sale_time, quantity) as (
select product_id, '2022-04-01 00:00:00', floor(10*rand(0)) from products
union all
select product_id, sale_time + interval 1 minute, floor(10*rand(0))
from s
where sale_time < '2022-04-01 10:00:00'
)
select * from s;
3.思路分析
1.某一时间段各个商品的销量排行
-
首先查询出某一时间段某商品的销售情况,以9点钟为例
select s.product_id ,date_format(s.sale_time,'%Y%m%d%H') ymdh ,sum(s.quantity) from sales s where s.sale_time between '2022-04-01 09:00:00' and '2022-04-01 09:59:59' group by product_id,ymdh -
以此为基础,通过rank()函数进行排行。
with hourly_sales (product_id,ymdh,quantity) as( select s.product_id ,date_format(s.sale_time,'%Y%m%d%H') ymdh ,sum(s.quantity) from sales s where s.sale_time between '2022-04-01 09:00:00' and '2022-04-01 09:59:59' group by product_id,ymdh), hourly_rank as ( select P.product_id,hs .quantity ,p .product_category ,p.product_name ,p.product_subcategory ,rank() over(partition by ymdh,p.product_category ,p.product_subcategory order by hs .quantity desc) AS rk from hourly_sales hs left join products p on p.product_id =hs .product_id ) select * from hourly_rank
2.销量飙升榜
-
在第一个问题的基础上进行改进,例如筛选出8-9点商品销量排行。
with hourly_sales (product_id,ymdh,quantity) as( select s.product_id ,date_format(s.sale_time,'%Y%m%d%H') ymdh ,sum(s.quantity) from sales s where s.sale_time between '2022-04-01 08:00:00' and '2022-04-01 09:59:59' group by product_id,ymdh), hourly_rank as ( select P.product_id,hs .quantity ,p .product_category ,p.product_name ,hs.ymdh ,rank() over(partition by ymdh,p.product_category ,p.product_subcategory order by hs .quantity desc) AS rk from hourly_sales hs left join products p on p.product_id =hs .product_id ) select * from hourly_rank; -
在此基础上可以通过lag函数进行处理,来对比9点某商品的排名是上升还是下降
with hourly_sales (product_id,ymdh,quantity) as( select s.product_id ,date_format(s.sale_time,'%Y%m%d%H') ymdh ,sum(s.quantity) from sales s where s.sale_time between '2022-04-01 08:00:00' and '2022-04-01 09:59:59' group by product_id,ymdh), hourly_rank as ( select P.product_id,hs .quantity ,p .product_category ,p.product_name ,hs.ymdh ,rank() over(partition by ymdh,p.product_category ,p.product_subcategory order by hs .quantity desc) AS rk from hourly_sales hs left join products p on p.product_id =hs .product_id ), rank_gain as( select product_name,product_category,ymdh,quantity,rk, lag(rk,1) over(partition by product_name,product_category order by ymdh) pre_rk, (ifnull(lag(rk,1) over(partition by product_name,product_category order by ymdh),9999)-rk)/rk as gain from hourly_rank ) select * ,rank () over (partition by product_category order by gain desc) gain_rk from rank_gain where pre_rk is not null这里需要注意两个null的处理
- pre_rk值为空时,说明时间段是8点,可以通过pre_rk is not null来进行过滤,从而只得到9点时间段商品的信息。
- gain字段,在该商品八点时间段时使用lag函数得到的pre_rk值为null,在与rk比较时应给一个默认值。