Python 与 SQL的互替

328 阅读2分钟

使用Pandas执行各种SQL操作的示例

在数据处理中除了日常使用SQL之外,有时也需要利用python进行数据清洗,下面是一些利用python处理实现与SQL相同功能的示例。

一、select

SQL:

select c1,c2,dt from t1

Py:

f1=pd.read_csv(t1)
f2 =f1[['c1', 'c2','dt']].head(5)
print(f2)

二、where

SQL:

select c1,c2 from t1 where dt='2023-12-09'

Py:

f1=pd.read_csv(f1)
f2=f1[['c1', 'c2','dt']]
f3=f2[f2['dt']=='2023-12-09']
print(f3)

SQL:

select c1,c2,dt from t1 where dt in ('2023-12-09','2023-12-08')

Py:

f1=pd.read_csv(f1)
f2=f1[['c1', 'c2','dt']]
f3=f2[f2['dt'].isin(['2023-12-09','2023-12-08'])]
print(f3)

三、group by

SQL:

select sex,count(*) from t1 group by sex

Py:

分组
f1=pd.read_csv(f1)
f2=f1.groupby('sex').size()
分组聚合
f3=f1.groupby(['sex']).agg({'sale_qtty':sum,'after_prefr_amount_1':sum,'sale_ord_id':pd.Series.nunique}).reset_index()
分组聚合且排序
f4=f1.groupby(['c1','c2']).agg({'sale_qtty':sum,'after_prefr_amount_1':sum,'sale_ord_id':pd.Series.nunique}).reset_index().sort_values(by='after_prefr_amount_1',ascending=False)

四、前N行

SQL:

select c1,c2,dt from t1  limit 100

Py:

f2=f1[['c1', 'c2','dt']].head(100)

五、去重

SQL:

select distinct dt from f1 
select dept_name_1,sale_ord_id from f1 group by dept_name_1,sale_ord_id

Py:

f2=f1['dt'].unique()
f3=f1[['dept_name_1','sale_ord_id']].drop_duplicates()

六、透视表

SQL:

select dept_name_1,dept_name_2,dept_name_3,sum(sale_qtty),sum(after_prefr_amount_1),count(distinct sale_ord_id) from f1 group by dept_name_1,dept_name_2,dept_name_3

Py:

f2 = pd.pivot_table(f1,index=['dept_name_1', 'dept_name_2', 'dept_name_3'],
     values=['sale_qtty', 'after_prefr_amount_1','sale_ord_id'],  aggfunc={'sale_qtty':np.sum,'after_prefr_amount_1':np.sum,'sale_ord_id':pd.Series.nunique}).reset_index()

七、定义函数:

SQL:

CREATE FUNCTION shaixuan(@x varchar(10),@y int) 
RETURNS int 
as 
BEGIN 
DECLARE @z int;
if @x='奢品钟表部' and @y>1
set @z=1
else 
set @z=0
return(@z)
end
select dbo.shaixuan('奢品钟表部',7)

Py:

def shaixuan(x,y):
    if x=='奢品钟表部' and y>1:
        return 1
    else:
        return 0
df1['test']=df1.apply(lambda x:shaixuan(x['dept_name_1'],x['sale_qtty']),axis=1)