一文吃透SQL笔试题

1,092 阅读8分钟

一文吃透笔试SQL

本文主要总结了秋招中常见的SQL使用函数、常见易错点、常见经典题型,以MySql语法为例。

SQL常见函数

加粗为高频函数

1、数字函数

函数名作用示例
abs(x)返回x的绝对值select abs(2); 结果为2
floor(x)返回小于x的最大整数select floor(1.23); 结果为1
ceiling(x)返回大于x的最小整数select ceiling(1.23); 结果是2
mod(x,y)返回x/y的模(余数)select mod(234,10); 结果是4
sqrt(x)返回x的平方根select sqrt(4); 结果是2
pow(x,y)返回x的y幂次方select pow(2,2); 结果是4
exp(x)返回e的幂次方select exp(x); 结果是e2e^{2}
log(x)返回以e为底的对数select log(exp(3)); 结果是3
truncate(x,y)返回数字x截短的y位小数的结果select truncate(1.223,1); 结果是1.2
round(x,y)返回参数x的四舍五入的有y位小数的值select round(1.298,1); 结果是1.3

2、字符串函数

函数名作用示例
concat(s1,s2...,sn)将s1,s2...,sn拼接成字符串select concat('n','e','w'); 结果是 new
concat_ws(sep,s1,s2,...,sn)将s1,s2...,sn拼接成字符串,并用sep字符间隔select concat('s','n','e','w'); 结果是nsesw
left(str,x)返回字符串str最左边的x个字符select left('newdream',4); 结果是newd
right(str,x)返回字符串str最右边的x个字符select right('newdream'); 结果是ream
length(str)返回字符串str的字节数select length('newdream'); 结果是8
trim(str)去除字符串首尾的字符select trim(' newdream '); 结果是newdream
lower(str)返回字符串str所有字符的小写字符select lower('NeW'); 结果是new
upper(str)返回字符串str所有字符的大写字符select upper(neW'); 结果是NEW
substring(s,n,len)获取字符串s的第n位置开始长度为len的字符串select substring('newdream',4,2); 结果是dr
replace(s,s1,s2)将字符串s2代替字符串s中的字符串s1select replace('abc','a','x'); 结果是xbc
cast(数据 as 类型)获取一个类型的值,并将它转化成另一个类型select cast('12efssf' as signed); 结果是12
convert(数据,类型)获取一个类型的值,并将它转化成另一个类型select convert('360.5',signed); 结果是360
二进制:binary 字符型:char浮点型:decimal 整数:signed 无符号整数:unsigned日期:date 时间:time 日期时间:datetime

3、时间函数

函数名作用示例
now()返回当前的日期和时间select now(); 结果是2022-09-02 10:52:19
curdate()返回当前的日期select curdate(); 结果是2022-09-02
curtime()返回当前的时间select curtime(); 结果是10:52:19
year(),month(),day(),week(),quarter(),hour(),minute(),second()年月日时分秒select year(now()); 结果是2022
datediff(date1,date2)返回date1与date2之间相差的天数select datediff(date1,date2) 结果是date1-date2
timestampdiff(单位,date1,date2)返回date2与date1之间相差的单位数select timestampdiff(WEEK,now(),now()) 结果是0
date_add(date,INTERVAL int keyword)返回date加上时间int的结果select date_add("2016-12-31" interval 1 day); 结果是2017-01-01 23:59:59
date_sub(date,INTERVAL int keyword)返回date减去时间int的结果select date_add("2016-12-31" interval 1 day); 结果是2016-12-30 23:59:59
date_format将date数据格式转化为String类型select date_format(now(),'%Y-%m-%d %H:%i:%S'); 结果是string类型的2022-09-02 10:52:19

date_format日期常用格式化 image.png

4、聚合函数

函数名作用
max()计算列的最大值
min()计算列的最小值
count()统计行数量
sum()获取单个列的合计值
avg()计算某个列的平均值

5、条件函数

函数名作用示例
if(condition,A,B)判断condition是否成立,若成立则为A,否则为Bselect if(sex=1,'男','女') from user;
ifnull(A,B)判断A是否为null,如果不为null则A,否则为Bselect ifnull(null,"RUNOOB"); 结果是RUNOOB
coalesce(expession value1,value2,...)判断空值,返回参数中第一个非空表达式select coalesce('苹果汁’,'橙汁','水')
case when then else end如果满足expression=value1则返回key1,以此类推case when condition1 then vlaue1 ... else defaultvaluen end

6、开窗函数

image.png

sql常见易错点

1、判空操作

value == null; value!=null;                          不合法

=,<>,!=,>=,<,<=等操作都不能够判空,判空正确操作:is null,is not null或者<=>

2、运算符like

  • like,rlike都可以做字符匹配,rlike支持正则表达式
  • like和_,%使用,_ 代表单个字符,%代表任意长度字符

3、位运算 & | ~ mysql中所有位运算中,大于-1小于1的数字逻辑被视为逻辑0,其他逻辑1

4、查询limit

select * from 表名 where 字段名 = 0 order by 字段名 asc limit 0, 10
limit(0,1) 错误写法

5、null和零值

insert into my_table (phone) values (NULL);
insert into my_table (phone) values ("");
第一条sql表示暂时不知道电话号码是多少,第二句是电话号码知道并且记录为“”

is null is not null ifnull进行判空

select * from my_table where phone = null
where后面判断的结果永远不会是true

常见题目解析-持续更新

1、留存率 次日留存率、3日留存率、7日留存率
①将device_id分组,每个device_id按照时间进行排序

select 
	distinct date(event_time) as log_day, # 只关心日期,不关注具体的时间。
	device_id as user_id_d0
from role_login_back
group by device_id
order by log_day; a

②取出次日、第3日以及第7日登录的device_id

select 
	distinct log_day,
	a.user_id_d0,
	b.device_id as user_id_d1,
	c.device_id as user_id_d3,
	d.device_id as user_id_d7
from a
left join role_login_back b 
on datediff(date(b.event_time),a.log_day) = 1 
and a.user_id_d0 = b.device_id
left join role_login_back c 
on datediff(date(c.event_time), a.log_day) = 2
and a.user_id_d0 = c.device_id
left join role_login_back d
on datediff(date(d.event_time), a.log_day) = 6
and a.user_id_d0 = d.device_id; temp

③ 计算留存率

select
	log_day '日期',
	count(user_id_d0) '新增数量',
	count(user_id_d1) / count(user_id_d0) '次日留存率',
	count(user_id_d3) / count(user_id_d0) '3日留存率',
	count(user_id_d7) / count(user_id_d0) '7日留存率',
from temp
group by log_day;

④ 综合

select
	log_day '日期',
	count(user_id_d0) '新增数量',
	count(user_id_d1) / count(user_id_d0) '次日留存率',
	count(user_id_d3) / count(user_id_d0) '3日留存率',
	count(user_id_d7) / count(user_id_d0) '7日留存率',
from (
	select 
		distinct log_day,
		a.user_id_d0,
		b.device_id as user_id_d1,
		c.device_id as user_id_d3,
		d.device_id as user_id_d7
	from 
		(select 
			distinct date(event_time) as log_day, # 只关心日期,不关注具体的时间。
			device_id as user_id_d0
		from role_login_back
		group by device_id
		order by log_day) a
	left join role_login_back b 
	on datediff(date(b.event_time),a.log_day) = 1 
	and a.user_id_d0 = b.device_id
	left join role_login_back c 
	on datediff(date(c.event_time), a.log_day) = 2
	and a.user_id_d0 = c.device_id
	left join role_login_back d
	on datediff(date(d.event_time), a.log_day) = 6
	and a.user_id_d0 = d.device_id 
	)
group by log_day;

2、7天内连续3天活跃
① 按用户id分组,按照dt进行排序后使用rank()函数进行排序

select
    user_id,
    dt,
    rank() over (partition by user_id order by dt) rk
from dws_user_action_daycount; t1

② 计算活跃时间dt与排名之间的差值

select
    user_id,
    date_add(dt,-rk) diff
from
t1;t2

③ 按照用户和时间差值进行分组,统计相同时间的次数

select
    user_id,
    diff,count(*)
from 
t2
group by user_id, diff
having count(*)>=3;t3

④ 将用户id进行去重后统计数量

select
    count(distinct user_id)
from
t3

综合

-- 将用户id进行去重后统计数量
select
	count(distinct user_id)
from
(
	    -- 按照用户和时间差值进行分组 统计相同时间的次数
	    select
	        user_id,
	        diff,
	        count(*)
	    from
	    (
	        -- 计算活跃时间dt 与排名之间的差值
	        select
	            user_id,
	            date_add(dt,-rk) diff
	        from
	        (
	            -- 按照用户id分组 按照dt进行排序 后 使用rank() 函数进行排名
	            select
	                user_id,
	                dt,
	                rank() over (partition by user_id order by dt) rk
	            from dws_user_action_daycount
	        )t1
	    )t2
	    group by user_id, diff
	    having count(*)>=3  -- 将少于三次的数据进行过滤掉
)t3

3、每篇文章同一时刻最大再看人数
同一时刻在线/阅读人数问题
每篇文章同一时刻最大在看人数
① 将用户进入时间单独拿出来,记作1;离开时间单独拿出来,记作-1,聚合两张表,按照时间排序。即进入一个加1,离开一个-1。

select
    artical_id,
    in_time dt,
    1 num
from tb_user_log
where artical_id != 0
union all
select
    artical_id,
    out_time dt,
    -1 num
from tb_user_log
where artical_id != 0; a

② 利用窗口函数对计数(1或者-1)求累计和,因为题目规定:同一时间有出有进的话先算进后算出,所以排序按时间排即可

select
    artical_id,
    sum(num) over (partition by artical_id order by dt asc,num desc) cnt
from a;b

③ 在每个分组中求出最大的累积和即为最多同时在线的人数

select
    artical_id,
    max(cnt) max_uv
from b
group by artical_id
order by max_uv desc

综合

select
    artical_id,
    max(cnt) "max_uv"
from
    (SELECT
        artical_id,
        sum(num)over(partition by artical_id order by dt asc,num desc) "cnt"
    from
        (select
            artical_id,
            in_time "dt",
            1 "num"
        from
            tb_user_log
        where artical_id!=0
    union all
        select
            artical_id,
            out_time "dt",
            -1 "num"
        from
        tb_user_log
        where artical_id!=0) as a
    ) as b
 group by artical_id
 order by max_uv desc

4、股票的波峰和波谷
波峰:当前股票价大于前一天和后一天
波谷:当前股票价小于前一天和后一天
① 利用lag()和lead()函数获取到前一天和后一天的股票价格
lag(field,num,defaultvalue)函数查询取出当前行的同意字段(field参数)的前面第num行的数据,如果没有用defultvalue代替

select
    id,
    price,
    dt,
    lag(price,1,price) over (parition by id order by dt) as lag_price,
    lead(price,1,price) over (parition by id order by dt) as lead_prcice
from stock; t

② 判断是否为波峰,波谷

select
*
from
(select
    *,
    case when price>lag_price and price>lead_price then 'crest'
         when price>lag_price and price<lead_price then 'trough'
         esle null
    end res
from t) m
where res is not null

综合

select *
    from(
    select *
          ,case when price>lag_price and price>lead_price then "crest"
                when price<lag_price and price<lead_price then "trough"
           else null
           end as res
    from(
        select id
              ,price
              ,dt
              ,lag(price,1,price) over(partition by id order by dt) as lag_price
              ,lead(price,1,price) over(partition by id order by dt) as lead_price
        from stock
        ) t
    ) m
where res is not null

5、连续增长问题
至少连续3天销售金额连续增长的shop_id
SQL连续增长问题--HQL面试题
至多至少问题描述可以归结于任意、所有都的问题,统称为描述性问题 (1) 所有都大于:所有都大于问题描述min()函数
(2) 所有都小于:所有都小于的问题描述max()函数
(3)至少:本质上所有都大于问题,只要最小值大于该值,那么所有的都将大于
(4)至多:本质上所有都小于问题,只要最大值小于该值,那么所有的都将小于
(5)对于所有都的问题思考,我们往往可以从反面入手,通过补集的思想解决问题

① 求出每天金额的汇总情况

  select
    shop_id,to_date(order_time) as order_date,sum(order_amt) as order_amt
  from
    shop_order
  group by
     shop_id,to_date(order_time);tmp

② 计算连续性增长的各类指标

select *
      ,sum(case when order_amt_diff <= 0 then 1 else 0 end) over(partition by shop_id,flag1 order by order_date) flag2 --增长性分组条件
from(
    select *
         ,sum(if(order_date_diff>1,1,0)) over(partition by shop_id order by order_date) as flag1 --连续性分组条件
    from(
        select shop_id
            ,order_date
            ,order_amt
            ,datediff(order_date,lag(order_date,1,date_sub(order_date,1)) over(partition by shop_id order by order_date)) as order_date_diff --用于判断是否按天连续
            ,order_amt - lag(order_amt,1,0) over(partition by shop_id order by order_date) as order_amt_diff --判断是否增长
    from tmp
    ) t
) m

③ 根据筛选条件,选出符合条件的结果

select
    shop_id
from m
group by shop_id,flag1,flag2
having count(1)>=3

综合

with tmp as(
 select
    shop_id,to_date(order_time) as order_date,sum(order_amt) as order_amt
  from
    shop_order
  group by
     shop_id,to_date(order_time)
) 
select shop_id
from(
    select *
          ,sum(case when order_amt_diff <= 0 then 1 else 0 end) over(partition by shop_id,flag1 order by order_date) flag2 --增长性分组条件
    from(
        select *
             ,sum(if(order_date_diff>1,1,0)) over(partition by shop_id order by order_date) as flag1 --连续性分组条件
        from(
            select shop_id
                ,order_date
                ,order_amt
                ,datediff(order_date,lag(order_date,1,date_sub(order_date,1)) over(partition by shop_id order by order_date)) as order_date_diff --用于判断是否按天连续
                ,order_amt - lag(order_amt,1,0) over(partition by shop_id order by order_date) as order_amt_diff --判断是否增长
        from tmp
        ) t
    ) m
) n
group by shop_id,flag1,flag2
having count(1) >=3