SQL面试题:基于扫码记录查找密接人员

170 阅读3分钟

1.问题描述:

如何使用SQL语句基于扫码信息获取用户活动轨迹,基于轨迹交集获取时空伴随者(密接人员)。

2.初始化脚本:

 -- 创建示例表和数据
 CREATE TABLE trail(
   uid varchar(11) NOT NULL,
   area varchar(10) NOT NULL,
   scan_time timestamp);
 ​
 INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 09:00:00');
 INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 10:00:00');
 INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 11:00:00');
 INSERT INTO trail VALUES ('13011111111', 'A002', '2022-05-01 11:05:00');
 INSERT INTO trail VALUES ('13011111111', 'A002', '2022-05-01 13:00:00');
 INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 13:15:00');
 INSERT INTO trail VALUES ('13011111111', 'A001', '2022-05-01 14:00:00');
 INSERT INTO trail VALUES ('13022222222', 'A001', '2022-05-01 10:30:00');
 INSERT INTO trail VALUES ('13022222222', 'A001', '2022-05-01 12:00:00');
 INSERT INTO trail VALUES ('13033333333', 'A001', '2022-05-01 11:00:00');
 INSERT INTO trail VALUES ('13033333333', 'A001', '2022-05-01 12:00:00');
 INSERT INTO trail VALUES ('13033333333', 'A001', '2022-05-01 13:00:00');

3.思路分析

1. 问题一:如何找出用户在每个区域的停留开始时间和结束时间?
  1. 首先我们可以通过row_number () over () 函数对用户id和用户所在区域进行分组

     select uid,area ,scan_time ,
     row_number () over (partition by uid order by scan_time) num1, 
     row_number () over (partition by uid,area  order by scan_time)num2
     from trail
     order by uid ,scan_time
    

    2022-7-31-1.PNG

不难发现,当num1-num2时出现变化时说明该用户所在区域也发生变化。

  1. 我们利用num1-num2得

     select uid,area ,scan_time ,
     row_number () over (partition by uid order by scan_time) num1, 
     row_number () over (partition by uid,area  order by scan_time)num2,
     row_number () over (partition by uid order by scan_time)-row_number () over (partition by uid,area  order by scan_time)diff
     from trail
     order by uid ,scan_time
    

    屏幕截图 2022-07-31-2.png

  2. 根据上述分析

     with tmp as (
     select uid,area ,scan_time ,
     row_number () over (partition by uid order by scan_time)-row_number () over (partition by uid,area  order by scan_time)diff
     from trail
     )
     select uid,area,min(scan_time)start_time ,max(scan_time)end_time from tmp group by uid ,area ,diff order by uid,scan_time 
    
2.问题二:假如某个用户核酸检测为阳性,找出他的伴随人员

伴随规则:在阳性人员停留半小时以上的区域,用户停留半小时以上,并且停留时间和阳性人员有十分钟以上的交集-

假设用户1为阳性人员出现的区域是A001 停留时间为11:00-13:00

在问题1基础上,我们只需将表与自身关联即可

注意一下几点:

  1. 关联自身时要排除自己。
  2. 停留时间和阳性人员有十分钟以上的交集,这里需要考虑用户是先到达和后到达的问题,容易漏掉。
 with tmp as (
 select uid,area ,scan_time ,
 row_number () over (partition by uid order by scan_time)-row_number () over (partition by uid,area  order by scan_time)diff
 from trail
 ),
 tmp2 as (
 select uid,area,min(scan_time)start_time ,max(scan_time)end_time from tmp group by uid ,area ,diff
 having min(scan_time)+interval 30 minute  <= max(scan_time)
 )
 select * from tmp2 u1,tmp2 u2 
 where u1.uid<>u2.uid and u1.uid='13011111111' 
 and u1.area=u2.area 
 and u1.start_time+interval 10 minute <u2.end_time 
 and u2.start_time+ interval 10 minute <u1.end_time ;

屏幕截图 2022-07-31-3 .png