Dataframe之join,merge的使用

254 阅读2分钟

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df1 = pd.DataFrame(np.ones((2,4)),columns=list("abcd"), index=list("AB"))

In [4]: df1
Out[4]:
a b c d
A 1.0 1.0 1.0 1.0
B 1.0 1.0 1.0 1.0

In [5]: df2 = pd.DataFrame(np.zeros((3,3)), columns=list("xyz"), index=list("ABC"))

In [6]: df2
Out[6]:
x y z
A 0.0 0.0 0.0
B 0.0 0.0 0.0
C 0.0 0.0 0.0

join就是按照行索引进行相应的合并

In [7]: df1.join(df2)
Out[7]:
a b c d x y z
A 1.0 1.0 1.0 1.0 0.0 0.0 0.0
B 1.0 1.0 1.0 1.0 0.0 0.0 0.0

In [8]: df2.join(df1)
Out[8]:
x y z a b c d
A 0.0 0.0 0.0 1.0 1.0 1.0 1.0
B 0.0 0.0 0.0 1.0 1.0 1.0 1.0
C 0.0 0.0 0.0 NaN NaN NaN NaN
merge:按照指定的列把数据按照一定的方式合并



In [41]: df3 = pd.DataFrame(np.arange(9).reshape(3,3), columns=list("fax"))

In [42]: df3
Out[42]:
f a x
0 0 1 2
1 3 4 5
2 6 7 8

In [44]: df1 = pd.DataFrame(np.ones((2,4)), columns=list("abcd"), index=list("A
...: B"))

In [45]: df1
Out[45]:
a b c d
A 1.0 1.0 1.0 1.0
B 1.0 1.0 1.0 1.0

In [47]: df1.merge(df3, on="a")
Out[47]:
a b c d f x
0 1.0 1.0 1.0 1.0 0 2
1 1.0 1.0 1.0 1.0 0 2




In [49]: df1.loc["A", "a"] = 100

In [50]: df1
Out[50]:
a b c d
A 100.0 1.0 1.0 1.0
B 1.0 1.0 1.0 1.0

In [51]: df1.merge(df3, on="a")
Out[51]:
a b c d f x
0 1.0 1.0 1.0 1.0 0 2
按照列进行合并,df1,和df3这"a"这一列相同的值只有1,所以得到上述的结果

In [52]: df3
Out[52]:
f a x
0 0 1 2
1 3 4 5
2 6 7 8




In [53]: df1
Out[53]:
a b c d
A 100.0 1.0 1.0 1.0
B 1.0 1.0 1.0 1.0

In [54]: df3
Out[54]:
f a x
0 0 1 2
1 3 4 5
2 6 7 8

默认的是内连接
In [55]: df1.merge(df3, on="a", how="inner")
Out[55]:
a b c d f x
0 1.0 1.0 1.0 1.0 0 2




In [56]: df3
Out[56]:
f a x
0 0 1 2
1 3 4 5
2 6 7 8

In [57]: df1
Out[57]:
a b c d
A 100.0 1.0 1.0 1.0
B 1.0 1.0 1.0 1.0


外连接

In [58]: df1.merge(df3, on="a", how="outer")
Out[58]:
a b c d f x
0 100.0 1.0 1.0 1.0 NaN NaN
1 1.0 1.0 1.0 1.0 0.0 2.0
2 4.0 NaN NaN NaN 3.0 5.0
3 7.0 NaN NaN NaN 6.0 8.0





In [59]: df1
Out[59]:
a b c d
A 100.0 1.0 1.0 1.0
B 1.0 1.0 1.0 1.0

In [60]: df3
Out[60]:
f a x
0 0 1 2
1 3 4 5
2 6 7 8

左连接
In [61]: df1.merge(df3, on="a", how="left")
Out[61]:
a b c d f x
0 100.0 1.0 1.0 1.0 NaN NaN
1 1.0 1.0 1.0 1.0 0.0 2.0
右连接

In [62]: df1.merge(df3, on="a", how="right")
Out[62]:
a b c d f x
0 1.0 1.0 1.0 1.0 0 2
1 4.0 NaN NaN NaN 3 5
2 7.0 NaN NaN NaN 6 8




默认的合并方式inner,交集

merge outer,并集,NaN补全

merge left,左边为准,NaN补全

merge right,右边为准,NaN补全