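Data wrangling means processing data through operations such as merging, grouping, and concatenating, so that it can be analyzed or prepared for use together with another dataset.
Merging Data
The pandas library for Python provides a single function, merge, as the entry point for all standard database-style join operations between DataFrame objects: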
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)
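The how parameter selects the join type ('inner', 'left', 'right' or 'outer'), and left_on / right_on name the key columns when they differ between the two frames. The sketch below uses two small, hypothetical frames (students and scores, with a hypothetical key column 'k' that only partially overlaps) to show which keys survive under each join type. It is an illustration of the parameters, not part of the tutorial's own data.

```python
import pandas as pd

# Hypothetical toy frames: the key column 'k' only partially overlaps
students = pd.DataFrame({'k': [1, 2, 3], 'name': ['a', 'b', 'c']})
scores = pd.DataFrame({'k': [2, 3, 4], 'score': [80, 90, 70]})

# Compare which key values survive under each join type
results = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(students, scores, how=how, on='k')
    results[how] = sorted(merged['k'].tolist())
    print(how, results[how])

# If the key columns are named differently, name each side explicitly
scores_renamed = scores.rename(columns={'k': 'student_k'})
by_name = pd.merge(students, scores_renamed, left_on='k', right_on='student_k')
print(by_name)
```

Here the inner join keeps keys 2 and 3, the left join keeps 1 to 3, the right join keeps 2 to 4, and the outer join keeps 1 to 4. With left_on / right_on, both key columns are kept in the result.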
Now let us create two DataFrames and print them; they serve as the inputs for merge operations.
# import the pandas library
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Learnfk'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

print(left)
print(right)
The output is as follows:
   id     Name  subject_id
0   1     Alex        sub1
1   2      Amy        sub2
2   3    Allen        sub4
3   4    Alice        sub6
4   5  Learnfk        sub5

   id   Name  subject_id
0   1  Billy        sub2
1   2  Brian        sub4
2   3   Bran        sub3
3   4  Bryce        sub6
4   5  Betty        sub5
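The code above only builds and prints the two frames. A minimal sketch of actually merging them follows; it rebuilds the same left and right data so it runs on its own. Merging on id alone keeps the Name and subject_id columns from both sides, which pandas disambiguates with its default _x / _y suffixes. Merging on both id and subject_id keeps only the rows where both keys match.

```python
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Learnfk'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

# Join on a single key: overlapping non-key columns get _x / _y suffixes
by_id = pd.merge(left, right, on='id')
print(by_id)

# Join on two keys: only rows where id AND subject_id both match survive
by_id_subject = pd.merge(left, right, on=['id', 'subject_id'])
print(by_id_subject)
```

Only ids 4 (sub6) and 5 (sub5) have the same subject_id on both sides, so the two-key merge returns two rows.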
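Grouping Data
Grouping a dataset is a frequent need in data analysis: results often have to be computed separately for each group present in the data. In the following example, the data is grouped by year, and the rows belonging to one particular year are then retrieved.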
# import the pandas library
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
                     'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Group the rows by the Year column, then pull out the rows for 2014
grouped = df.groupby('Year')
print(grouped.get_group(2014))
The output is as follows:
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701
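Besides get_group, a grouped object is most often combined with an aggregation. The sketch below rebuilds the same ipl_data so it runs standalone, then computes total points per year, several statistics per year, and the number of rows in each group by iterating over the groups. The particular aggregations chosen are only illustrative.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
                     'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Sum one column per group: a Series indexed by Year
points_per_year = df.groupby('Year')['Points'].sum()
print(points_per_year)

# Several aggregations at once: a DataFrame with one column per function
stats = df.groupby('Year')['Points'].agg(['sum', 'mean', 'max'])
print(stats)

# Iterating over a groupby yields (group key, sub-DataFrame) pairs
counts = {}
for year, group in df.groupby('Year'):
    counts[year] = len(group)
print(counts)
```

For 2014, for example, the four rows shown above sum to 876 + 863 + 741 + 701 = 3181 points.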
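Concatenating Data
Pandas provides several facilities for easily combining Series and DataFrame objects (early versions also offered a Panel type, which has since been removed). In the following example, the concat function concatenates objects along an axis (by default, row-wise along axis 0); let us create two DataFrames and concatenate them.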
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Learnfk'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Stack the rows of `two` below the rows of `one` (axis=0 is the default)
print(pd.concat([one, two]))
The output is as follows:
      Name  subject_id  Marks_scored
1     Alex        sub1            98
2      Amy        sub2            90
3    Allen        sub4            87
4    Alice        sub6            69
5  Learnfk        sub5            78
1    Billy        sub2            89
2    Brian        sub4            80
3     Bran        sub3            79
4    Bryce        sub6            97
5    Betty        sub5            88
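Because concat keeps each input's index labels, the result above contains the labels 1 to 5 twice. As a minimal sketch using the same one and two frames: ignore_index=True renumbers the rows, keys= records which input each row came from as an outer index level (the labels 'one' and 'two' are chosen here purely for illustration), and axis=1 places the frames side by side, aligning rows on their shared index.

```python
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Learnfk'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Renumber rows 0..n-1 instead of keeping the duplicated labels
stacked = pd.concat([one, two], ignore_index=True)
print(stacked)

# Tag each block with an outer index level, then select one block back
labeled = pd.concat([one, two], keys=['one', 'two'])
print(labeled.loc['two'])

# Concatenate column-wise; rows are aligned on the shared index 1..5
side_by_side = pd.concat([one, two], axis=1)
print(side_by_side.shape)
```

With axis=1 the result has 5 rows and 6 columns, and the column names appear twice because both inputs use the same names.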