In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df1 = pd.DataFrame(np.ones((2,4)),columns=list("abcd"), index=list("AB"))
In [4]: df1 Out[4]: a b c d A 1.0 1.0 1.0 1.0 B 1.0 1.0 1.0 1.0
In [5]: df2 = pd.DataFrame(np.zeros((3,3)), columns=list("xyz"), index=list("ABC"))
In [6]: df2 Out[6]: x y z A 0.0 0.0 0.0 B 0.0 0.0 0.0 C 0.0 0.0 0.0
join就是按照行索引进行相应的合并
In [7]: df1.join(df2) Out[7]: a b c d x y z A 1.0 1.0 1.0 1.0 0.0 0.0 0.0 B 1.0 1.0 1.0 1.0 0.0 0.0 0.0
In [8]: df2.join(df1) Out[8]: x y z a b c d A 0.0 0.0 0.0 1.0 1.0 1.0 1.0 B 0.0 0.0 0.0 1.0 1.0 1.0 1.0 C 0.0 0.0 0.0 NaN NaN NaN NaN merge:按照指定的列把数据按照一定的方式合并
In [41]: df3 = pd.DataFrame(np.arange(9).reshape(3,3), columns=list("fax"))
In [42]: df3 Out[42]: f a x 0 0 1 2 1 3 4 5 2 6 7 8
In [44]: df1 = pd.DataFrame(np.ones((2,4)), columns=list("abcd"), index=list("A ...: B"))
In [45]: df1 Out[45]: a b c d A 1.0 1.0 1.0 1.0 B 1.0 1.0 1.0 1.0
In [47]: df1.merge(df3, on="a") Out[47]: a b c d f x 0 1.0 1.0 1.0 1.0 0 2 1 1.0 1.0 1.0 1.0 0 2
In [49]: df1.loc["A", "a"] = 100
In [50]: df1 Out[50]: a b c d A 100.0 1.0 1.0 1.0 B 1.0 1.0 1.0 1.0
In [51]: df1.merge(df3, on="a") Out[51]: a b c d f x 0 1.0 1.0 1.0 1.0 0 2 按照列进行合并,df1,和df3这"a"这一列相同的值只有1,所以得到上述的结果
In [52]: df3 Out[52]: f a x 0 0 1 2 1 3 4 5 2 6 7 8
In [53]: df1 Out[53]: a b c d A 100.0 1.0 1.0 1.0 B 1.0 1.0 1.0 1.0
In [54]: df3 Out[54]: f a x 0 0 1 2 1 3 4 5 2 6 7 8
默认的是内连接 In [55]: df1.merge(df3, on="a", how="inner") Out[55]: a b c d f x 0 1.0 1.0 1.0 1.0 0 2
In [56]: df3 Out[56]: f a x 0 0 1 2 1 3 4 5 2 6 7 8
In [57]: df1 Out[57]: a b c d A 100.0 1.0 1.0 1.0 B 1.0 1.0 1.0 1.0
外连接
In [58]: df1.merge(df3, on="a", how="outer") Out[58]: a b c d f x 0 100.0 1.0 1.0 1.0 NaN NaN 1 1.0 1.0 1.0 1.0 0.0 2.0 2 4.0 NaN NaN NaN 3.0 5.0 3 7.0 NaN NaN NaN 6.0 8.0
In [59]: df1 Out[59]: a b c d A 100.0 1.0 1.0 1.0 B 1.0 1.0 1.0 1.0
In [60]: df3 Out[60]: f a x 0 0 1 2 1 3 4 5 2 6 7 8
左连接 In [61]: df1.merge(df3, on="a", how="left") Out[61]: a b c d f x 0 100.0 1.0 1.0 1.0 NaN NaN 1 1.0 1.0 1.0 1.0 0.0 2.0 右连接
In [62]: df1.merge(df3, on="a", how="right") Out[62]: a b c d f x 0 1.0 1.0 1.0 1.0 0 2 1 4.0 NaN NaN NaN 3 5 2 7.0 NaN NaN NaN 6 8
默认的合并方式inner,交集
merge outer,并集,NaN补全
merge left,左边为准,NaN补全
merge right,右边为准,NaN补全
|
|