1.问题描述:
如何使用SQL语句基于扫码信息获取用户活动轨迹,基于轨迹交集获取时空伴随者(密接人员)。
2.初始化脚本:
-- 创建示例表和数据
CREATE TABLE trail(
uid varchar(11) NOT NULL,
area varchar(10) NOT NULL,
scan_time timestamp);
INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 09:00:00');
INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 10:00:00');
INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 11:00:00');
INSERT INTO trail VALUES ('13011111111', 'A002', '2022-05-01 11:05:00');
INSERT INTO trail VALUES ('13011111111', 'A002', '2022-05-01 13:00:00');
INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 13:15:00');
INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 14:00:00');
INSERT INTO trail VALUES ('13022222222', 'A001', '2022-05-01 10:30:00');
INSERT INTO trail VALUES ('13022222222', 'A001', '2022-05-01 12:00:00');
INSERT INTO trail VALUES ('13033333333', 'A001', '2022-05-01 11:00:00');
INSERT INTO trail VALUES ('13033333333', 'A001', '2022-05-01 12:00:00');
INSERT INTO trail VALUES ('13033333333', 'A001', '2022-05-01 13:00:00');
3.思路分析
1. 问题一:如何找出用户在每个区域的停留开始时间和结束时间?
-
首先我们可以通过row_number () over () 函数对用户id和用户所在区域进行分组
select uid,area ,scan_time , row_number () over (partition by uid order by scan_time) num1, row_number () over (partition by uid,area order by scan_time)num2 from trail order by uid ,scan_time
不难发现,当num1-num2时出现变化时说明该用户所在区域也发生变化。
-
我们利用num1-num2得
select uid,area ,scan_time , row_number () over (partition by uid order by scan_time) num1, row_number () over (partition by uid,area order by scan_time)num2, row_number () over (partition by uid order by scan_time)-row_number () over (partition by uid,area order by scan_time)diff from trail order by uid ,scan_time -
根据上述分析
with tmp as ( select uid,area ,scan_time , row_number () over (partition by uid order by scan_time)-row_number () over (partition by uid,area order by scan_time)diff from trail ) select uid,area,min(scan_time)start_time ,max(scan_time)end_time from tmp group by uid ,area ,diff order by uid,scan_time
2.问题二:假如某个用户核酸检测为阳性,找出他的伴随人员
伴随规则:在阳性人员停留半小时以上的区域,用户停留半小时以上,并且停留时间和阳性人员有十分钟以上的交集-
假设用户1为阳性人员出现的区域是A001 停留时间为11:00-13:00
在问题1基础上,我们只需将表与自身关联即可
注意一下几点:
- 关联自身时要排除自己。
- 停留时间和阳性人员有十分钟以上的交集,这里需要考虑用户是先到达和后到达的问题,容易漏掉。
with tmp as (
select uid,area ,scan_time ,
row_number () over (partition by uid order by scan_time)-row_number () over (partition by uid,area order by scan_time)diff
from trail
),
tmp2 as (
select uid,area,min(scan_time)start_time ,max(scan_time)end_time from tmp group by uid ,area ,diff
having min(scan_time)+interval 30 minute <= max(scan_time)
)
select * from tmp2 u1,tmp2 u2
where u1.uid<>u2.uid and u1.uid='13011111111'
and u1.area=u2.area
and u1.start_time+interval 10 minute <u2.end_time
and u2.start_time+ interval 10 minute <u1.end_time ;