一文吃透笔试SQL
本文主要总结了秋招中常见的SQL使用函数、常见易错点、常见经典题型,以MySql语法为例。
SQL常见函数
加粗为高频函数
1、数字函数
| 函数名 | 作用 | 示例 |
|---|---|---|
| abs(x) | 返回x的绝对值 | select abs(2); 结果为2 |
| floor(x) | 返回小于x的最大整数 | select floor(1.23); 结果为1 |
| ceiling(x) | 返回大于x的最小整数 | select ceiling(1.23); 结果是2 |
| mod(x,y) | 返回x/y的模(余数) | select mod(234,10); 结果是4 |
| sqrt(x) | 返回x的平方根 | select sqrt(4); 结果是2 |
| pow(x,y) | 返回x的y幂次方 | select pow(2,2); 结果是4 |
| exp(x) | 返回e的幂次方 | select exp(x); 结果是 |
| log(x) | 返回以e为底的对数 | select log(exp(3)); 结果是3 |
| truncate(x,y) | 返回数字x截短的y位小数的结果 | select truncate(1.223,1); 结果是1.2 |
| round(x,y) | 返回参数x的四舍五入的有y位小数的值 | select round(1.298,1); 结果是1.3 |
2、字符串函数
| 函数名 | 作用 | 示例 |
|---|---|---|
| concat(s1,s2...,sn) | 将s1,s2...,sn拼接成字符串 | select concat('n','e','w'); 结果是 new |
| concat_ws(sep,s1,s2,...,sn) | 将s1,s2...,sn拼接成字符串,并用sep字符间隔 | select concat('s','n','e','w'); 结果是nsesw |
| left(str,x) | 返回字符串str最左边的x个字符 | select left('newdream',4); 结果是newd |
| right(str,x) | 返回字符串str最右边的x个字符 | select right('newdream'); 结果是ream |
| length(str) | 返回字符串str的字节数 | select length('newdream'); 结果是8 |
| trim(str) | 去除字符串首尾的字符 | select trim(' newdream '); 结果是newdream |
| lower(str) | 返回字符串str所有字符的小写字符 | select lower('NeW'); 结果是new |
| upper(str) | 返回字符串str所有字符的大写字符 | select upper(neW'); 结果是NEW |
| substring(s,n,len) | 获取字符串s的第n位置开始长度为len的字符串 | select substring('newdream',4,2); 结果是dr |
| replace(s,s1,s2) | 将字符串s2代替字符串s中的字符串s1 | select replace('abc','a','x'); 结果是xbc |
| cast(数据 as 类型) | 获取一个类型的值,并将它转化成另一个类型 | select cast('12efssf' as signed); 结果是12 |
| convert(数据,类型) | 获取一个类型的值,并将它转化成另一个类型 | select convert('360.5',signed); 结果是360 |
| 二进制:binary 字符型:char | 浮点型:decimal 整数:signed 无符号整数:unsigned | 日期:date 时间:time 日期时间:datetime |
3、时间函数
| 函数名 | 作用 | 示例 |
|---|---|---|
| now() | 返回当前的日期和时间 | select now(); 结果是2022-09-02 10:52:19 |
| curdate() | 返回当前的日期 | select curdate(); 结果是2022-09-02 |
| curtime() | 返回当前的时间 | select curtime(); 结果是10:52:19 |
| year(),month(),day(),week(),quarter(),hour(),minute(),second() | 年月日时分秒 | select year(now()); 结果是2022 |
| datediff(date1,date2) | 返回date1与date2之间相差的天数 | select datediff(date1,date2) 结果是date1-date2 |
| timestampdiff(单位,date1,date2) | 返回date2与date1之间相差的单位数 | select timestampdiff(WEEK,now(),now()) 结果是0 |
| date_add(date,INTERVAL int keyword) | 返回date加上时间int的结果 | select date_add("2016-12-31" interval 1 day); 结果是2017-01-01 23:59:59 |
| date_sub(date,INTERVAL int keyword) | 返回date减去时间int的结果 | select date_add("2016-12-31" interval 1 day); 结果是2016-12-30 23:59:59 |
| date_format | 将date数据格式转化为String类型 | select date_format(now(),'%Y-%m-%d %H:%i:%S'); 结果是string类型的2022-09-02 10:52:19 |
date_format日期常用格式化
4、聚合函数
| 函数名 | 作用 |
|---|---|
| max() | 计算列的最大值 |
| min() | 计算列的最小值 |
| count() | 统计行数量 |
| sum() | 获取单个列的合计值 |
| avg() | 计算某个列的平均值 |
5、条件函数
| 函数名 | 作用 | 示例 |
|---|---|---|
| if(condition,A,B) | 判断condition是否成立,若成立则为A,否则为B | select if(sex=1,'男','女') from user; |
| ifnull(A,B) | 判断A是否为null,如果不为null则A,否则为B | select ifnull(null,"RUNOOB"); 结果是RUNOOB |
| coalesce(expession value1,value2,...) | 判断空值,返回参数中第一个非空表达式 | select coalesce('苹果汁’,'橙汁','水') |
| case when then else end | 如果满足expression=value1则返回key1,以此类推 | case when condition1 then vlaue1 ... else defaultvaluen end |
6、开窗函数
sql常见易错点
1、判空操作
value == null; value!=null; 不合法
=,<>,!=,>=,<,<=等操作都不能够判空,判空正确操作:is null,is not null或者<=>
2、运算符like
- like,rlike都可以做字符匹配,rlike支持正则表达式
- like和_,%使用,_ 代表单个字符,%代表任意长度字符
3、位运算 & | ~ mysql中所有位运算中,大于-1小于1的数字逻辑被视为逻辑0,其他逻辑1
4、查询limit
select * from 表名 where 字段名 = 0 order by 字段名 asc limit 0, 10
limit(0,1) 错误写法
5、null和零值
insert into my_table (phone) values (NULL);
insert into my_table (phone) values ("");
第一条sql表示暂时不知道电话号码是多少,第二句是电话号码知道并且记录为“”
is null is not null ifnull进行判空
select * from my_table where phone = null
where后面判断的结果永远不会是true
常见题目解析-持续更新
1、留存率
次日留存率、3日留存率、7日留存率
①将device_id分组,每个device_id按照时间进行排序
select
distinct date(event_time) as log_day, # 只关心日期,不关注具体的时间。
device_id as user_id_d0
from role_login_back
group by device_id
order by log_day; a
②取出次日、第3日以及第7日登录的device_id
select
distinct log_day,
a.user_id_d0,
b.device_id as user_id_d1,
c.device_id as user_id_d3,
d.device_id as user_id_d7
from a
left join role_login_back b
on datediff(date(b.event_time),a.log_day) = 1
and a.user_id_d0 = b.device_id
left join role_login_back c
on datediff(date(c.event_time), a.log_day) = 2
and a.user_id_d0 = c.device_id
left join role_login_back d
on datediff(date(d.event_time), a.log_day) = 6
and a.user_id_d0 = d.device_id; temp
③ 计算留存率
select
log_day '日期',
count(user_id_d0) '新增数量',
count(user_id_d1) / count(user_id_d0) '次日留存率',
count(user_id_d3) / count(user_id_d0) '3日留存率',
count(user_id_d7) / count(user_id_d0) '7日留存率',
from temp
group by log_day;
④ 综合
select
log_day '日期',
count(user_id_d0) '新增数量',
count(user_id_d1) / count(user_id_d0) '次日留存率',
count(user_id_d3) / count(user_id_d0) '3日留存率',
count(user_id_d7) / count(user_id_d0) '7日留存率',
from (
select
distinct log_day,
a.user_id_d0,
b.device_id as user_id_d1,
c.device_id as user_id_d3,
d.device_id as user_id_d7
from
(select
distinct date(event_time) as log_day, # 只关心日期,不关注具体的时间。
device_id as user_id_d0
from role_login_back
group by device_id
order by log_day) a
left join role_login_back b
on datediff(date(b.event_time),a.log_day) = 1
and a.user_id_d0 = b.device_id
left join role_login_back c
on datediff(date(c.event_time), a.log_day) = 2
and a.user_id_d0 = c.device_id
left join role_login_back d
on datediff(date(d.event_time), a.log_day) = 6
and a.user_id_d0 = d.device_id
)
group by log_day;
2、7天内连续3天活跃
① 按用户id分组,按照dt进行排序后使用rank()函数进行排序
select
user_id,
dt,
rank() over (partition by user_id order by dt) rk
from dws_user_action_daycount; t1
② 计算活跃时间dt与排名之间的差值
select
user_id,
date_add(dt,-rk) diff
from
t1;t2
③ 按照用户和时间差值进行分组,统计相同时间的次数
select
user_id,
diff,count(*)
from
t2
group by user_id, diff
having count(*)>=3;t3
④ 将用户id进行去重后统计数量
select
count(distinct user_id)
from
t3
综合
-- 将用户id进行去重后统计数量
select
count(distinct user_id)
from
(
-- 按照用户和时间差值进行分组 统计相同时间的次数
select
user_id,
diff,
count(*)
from
(
-- 计算活跃时间dt 与排名之间的差值
select
user_id,
date_add(dt,-rk) diff
from
(
-- 按照用户id分组 按照dt进行排序 后 使用rank() 函数进行排名
select
user_id,
dt,
rank() over (partition by user_id order by dt) rk
from dws_user_action_daycount
)t1
)t2
group by user_id, diff
having count(*)>=3 -- 将少于三次的数据进行过滤掉
)t3
3、每篇文章同一时刻最大再看人数
同一时刻在线/阅读人数问题
每篇文章同一时刻最大在看人数
① 将用户进入时间单独拿出来,记作1;离开时间单独拿出来,记作-1,聚合两张表,按照时间排序。即进入一个加1,离开一个-1。
select
artical_id,
in_time dt,
1 num
from tb_user_log
where artical_id != 0
union all
select
artical_id,
out_time dt,
-1 num
from tb_user_log
where artical_id != 0; a
② 利用窗口函数对计数(1或者-1)求累计和,因为题目规定:同一时间有出有进的话先算进后算出,所以排序按时间排即可
select
artical_id,
sum(num) over (partition by artical_id order by dt asc,num desc) cnt
from a;b
③ 在每个分组中求出最大的累积和即为最多同时在线的人数
select
artical_id,
max(cnt) max_uv
from b
group by artical_id
order by max_uv desc
综合
select
artical_id,
max(cnt) "max_uv"
from
(SELECT
artical_id,
sum(num)over(partition by artical_id order by dt asc,num desc) "cnt"
from
(select
artical_id,
in_time "dt",
1 "num"
from
tb_user_log
where artical_id!=0
union all
select
artical_id,
out_time "dt",
-1 "num"
from
tb_user_log
where artical_id!=0) as a
) as b
group by artical_id
order by max_uv desc
4、股票的波峰和波谷
波峰:当前股票价大于前一天和后一天
波谷:当前股票价小于前一天和后一天
① 利用lag()和lead()函数获取到前一天和后一天的股票价格
lag(field,num,defaultvalue)函数查询取出当前行的同意字段(field参数)的前面第num行的数据,如果没有用defultvalue代替
select
id,
price,
dt,
lag(price,1,price) over (parition by id order by dt) as lag_price,
lead(price,1,price) over (parition by id order by dt) as lead_prcice
from stock; t
② 判断是否为波峰,波谷
select
*
from
(select
*,
case when price>lag_price and price>lead_price then 'crest'
when price>lag_price and price<lead_price then 'trough'
esle null
end res
from t) m
where res is not null
综合
select *
from(
select *
,case when price>lag_price and price>lead_price then "crest"
when price<lag_price and price<lead_price then "trough"
else null
end as res
from(
select id
,price
,dt
,lag(price,1,price) over(partition by id order by dt) as lag_price
,lead(price,1,price) over(partition by id order by dt) as lead_price
from stock
) t
) m
where res is not null
5、连续增长问题
至少连续3天销售金额连续增长的shop_id
SQL连续增长问题--HQL面试题
至多至少问题描述可以归结于任意、所有都的问题,统称为描述性问题
(1) 所有都大于:所有都大于问题描述min()函数
(2) 所有都小于:所有都小于的问题描述max()函数
(3)至少:本质上所有都大于问题,只要最小值大于该值,那么所有的都将大于
(4)至多:本质上所有都小于问题,只要最大值小于该值,那么所有的都将小于
(5)对于所有都的问题思考,我们往往可以从反面入手,通过补集的思想解决问题
① 求出每天金额的汇总情况
select
shop_id,to_date(order_time) as order_date,sum(order_amt) as order_amt
from
shop_order
group by
shop_id,to_date(order_time);tmp
② 计算连续性增长的各类指标
select *
,sum(case when order_amt_diff <= 0 then 1 else 0 end) over(partition by shop_id,flag1 order by order_date) flag2 --增长性分组条件
from(
select *
,sum(if(order_date_diff>1,1,0)) over(partition by shop_id order by order_date) as flag1 --连续性分组条件
from(
select shop_id
,order_date
,order_amt
,datediff(order_date,lag(order_date,1,date_sub(order_date,1)) over(partition by shop_id order by order_date)) as order_date_diff --用于判断是否按天连续
,order_amt - lag(order_amt,1,0) over(partition by shop_id order by order_date) as order_amt_diff --判断是否增长
from tmp
) t
) m
③ 根据筛选条件,选出符合条件的结果
select
shop_id
from m
group by shop_id,flag1,flag2
having count(1)>=3
综合
with tmp as(
select
shop_id,to_date(order_time) as order_date,sum(order_amt) as order_amt
from
shop_order
group by
shop_id,to_date(order_time)
)
select shop_id
from(
select *
,sum(case when order_amt_diff <= 0 then 1 else 0 end) over(partition by shop_id,flag1 order by order_date) flag2 --增长性分组条件
from(
select *
,sum(if(order_date_diff>1,1,0)) over(partition by shop_id order by order_date) as flag1 --连续性分组条件
from(
select shop_id
,order_date
,order_amt
,datediff(order_date,lag(order_date,1,date_sub(order_date,1)) over(partition by shop_id order by order_date)) as order_date_diff --用于判断是否按天连续
,order_amt - lag(order_amt,1,0) over(partition by shop_id order by order_date) as order_amt_diff --判断是否增长
from tmp
) t
) m
) n
group by shop_id,flag1,flag2
having count(1) >=3