排版问题
导入包
import numpy as np
import pandas as pd
索引器
表的列索引
- 列索引是最常见的索引形式
- 一般通过
[]
来实现
- 通过
[列名]
可以从DataFrame
中取出一列,返回值为Series
- 取出多列,可通过
列名组成的列表
,返回值为DataFrame
- 取出单列,列名不包含空格,可用
.列名
取出,这和[列名]
等价
df = pd.read_csv('data/learn_pandas.csv',
usecols = ['School','Grade','Name','Gender','Weight','Height'])
df['Name'].head()
0 Gaopeng Yang
1 Changqiang You
2 Mei Sun
3 Xiaojuan Sun
4 Gaojuan You
Name: Name, dtype: object
df[['Gender','Name']].head()

df.Name.head()
0 Gaopeng Yang
1 Changqiang You
2 Mei Sun
3 Xiaojuan Sun
4 Gaojuan You
Name: Name, dtype: object
序列的行索引
- 1.以字符串为索引的
Series
- 取出单个索引的对应元素,可以使用
[item]
Series
只有单个值对应,返回这个标量值
- 多个值对应,返回
Series
- 取出多个索引对应元素,可以使用
[items的列表]
s = pd.Series(np.arange(1,7,1),index=['a','b','a','a','e','f'])
s['a']
a 1
a 3
a 4
dtype: int64
s['b']
2
s[['a','f']]
a 1
a 3
a 4
f 6
dtype: int64
- 取出某2个索引之间的元素,并且这2个索引在整个索引中唯一出现,则可以使用切片
- 这里切片会
包含2个端点
s['b':'e']
b 2
a 3
a 4
e 5
dtype: int64
- 2.以整数为索引的Series
- 默认会生成从0开始的整数索引,任意一组符合长度要求的整数都可以作为索引
- 和字符串一样,如果使用
[int]
或者[int_list]
,可以取出对应索引元素
的值
s = pd.Series(['a','b','c','d','e','f'],index=[1,2,3,1,4,5])
s[1]
1 a
1 d
dtype: object
s[[2,3]]
2 b
3 c
dtype: object
- 如果使用整数切片,则会取出对应索引
位置
的值
- 这里整数切片和python中一样,
不包含右端点
s[1:5:2]
2 b
1 d
dtype: object
loc索引器
- 对行对选取
- 对于表而言,有2种索引器
- 基于元素的
loc
索引
loc
索引形式:loc[*,*]
- 如果省略第2个位置写作
loc[*]
,这里只选择行
*
的位置一共有5类合法对象:单个元素、元素列表、元素切片、布尔列表、函数
- 基于位置的
iloc
索引
df_demo = df.set_index('Name')
df_demo.head(10)

df_demo.loc['Mei Sun']
df_demo.loc['Quan Zhao']
School Shanghai Jiao Tong University
Grade Junior
Gender Female
Height 160.6
Weight 53
Name: Quan Zhao, dtype: object
df_demo.loc['Mei Sun','School']
Name
Mei Sun Shanghai Jiao Tong University
Mei Sun Shanghai Jiao Tong University
Name: School, dtype: object
df_demo.loc[['Mei Sun','Quan Zhao'],['School','Gender']]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Gender |
Name |
|
|
Mei Sun |
Shanghai Jiao Tong University |
Male |
Mei Sun |
Shanghai Jiao Tong University |
Female |
Quan Zhao |
Shanghai Jiao Tong University |
Female |
*
为切片
- Series使用字符串索引,如果是唯一值的起点和终点字符,可以使用切片,且包含2个端点
df_demo.loc['Xiaoli Qian':'Gaoqiang Qian','School':'Gender']
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Name |
|
|
|
Xiaoli Qian |
Tsinghua University |
Freshman |
Female |
Qiang Chu |
Shanghai Jiao Tong University |
Freshman |
Female |
Gaoqiang Qian |
Tsinghua University |
Junior |
Female |
注意⚠️
如果DataFrame
使用整数索引,使用整数切片的时候和上面字符串索引要求一致,都是元素
切片。
包含端点且起点、终点不允许有重复值。
df_demo1 = df_demo.copy()
df_demo1.index = range(df_demo.shape[0],0,-1)
df_demo1.loc[5:3]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
5 |
Fudan University |
Junior |
Female |
153.9 |
46.0 |
4 |
Tsinghua University |
Senior |
Female |
160.9 |
50.0 |
3 |
Shanghai Jiao Tong University |
Senior |
Female |
153.9 |
45.0 |
df_demo1.loc[3:5]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
*
为布尔列表
根据条件筛选,传入loc布尔列表和dataframe长度相同,且列表为True会被选中
df_demo.loc[df_demo.Weight>70].head()
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Mei Sun |
Shanghai Jiao Tong University |
Senior |
Male |
188.9 |
89.0 |
Gaojuan You |
Fudan University |
Sophomore |
Male |
174.0 |
74.0 |
Xiaopeng Zhou |
Shanghai Jiao Tong University |
Freshman |
Male |
174.1 |
74.0 |
Xiaofeng Sun |
Tsinghua University |
Senior |
Male |
170.3 |
71.0 |
Qiang Zheng |
Shanghai Jiao Tong University |
Senior |
Male |
183.9 |
87.0 |
df_demo.loc[df_demo.Grade.isin(['Freshman','Senior'])].head()
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Gaopeng Yang |
Shanghai Jiao Tong University |
Freshman |
Female |
158.9 |
46.0 |
Changqiang You |
Peking University |
Freshman |
Male |
166.5 |
70.0 |
Mei Sun |
Shanghai Jiao Tong University |
Senior |
Male |
188.9 |
89.0 |
Xiaoli Qian |
Tsinghua University |
Freshman |
Female |
158.0 |
51.0 |
Qiang Chu |
Shanghai Jiao Tong University |
Freshman |
Female |
162.5 |
52.0 |
复合条件
c1 = df_demo.School == 'Fudan University'
c2 = df_demo.Weight > 70
c3 = df_demo.Grade == 'Senior'
fudan = c1 & c2 & c3
c4 = df_demo.School == 'Peking University'
c5 = df_demo.Weight > 80
c6 = df_demo.Grade == 'Senior'
qinghua = c4 & c5 & (~c6)
df_demo.loc[fudan | qinghua]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Qiang Han |
Peking University |
Freshman |
Male |
185.3 |
87.0 |
Chengpeng Zhou |
Fudan University |
Senior |
Male |
177.1 |
81.0 |
Changpeng Zhao |
Peking University |
Freshman |
Male |
181.3 |
83.0 |
Chengpeng Qian |
Fudan University |
Senior |
Male |
177.2 |
73.0 |
Name
Gaopeng Yang False
Changqiang You False
Mei Sun False
Xiaojuan Sun False
Gaojuan You False
...
Xiaojuan Sun False
Li Zhao False
Chengqiang Chu False
Chengmei Shen False
Chunpeng Lv False
Length: 200, dtype: bool
*
为函数
这里的函数,必须以前面的四种合法形式之一为返回值,并且函数的输入值为 DataFrame 本身。
def condition(x):
c1 = df_demo.School == 'Fudan University'
c2 = df_demo.Weight > 70
c3 = df_demo.Grade == 'Senior'
fudan = c1 & c2 & c3
c4 = df_demo.School == 'Peking University'
c5 = df_demo.Weight > 80
c6 = df_demo.Grade == 'Senior'
qinghua = c4 & c5 & (~c6)
result = fudan | qinghua
return result
df_demo.loc[condition]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Qiang Han |
Peking University |
Freshman |
Male |
185.3 |
87.0 |
Chengpeng Zhou |
Fudan University |
Senior |
Male |
177.1 |
81.0 |
Changpeng Zhao |
Peking University |
Freshman |
Male |
181.3 |
83.0 |
Chengpeng Qian |
Fudan University |
Senior |
Male |
177.2 |
73.0 |
df_demo.loc[lambda x:'Qiang Han',lambda x:'Gender']
Name
Qiang Han Male
Qiang Han Female
Name: Gender, dtype: object
- 函数无法返回如
start: end: step
的切片形式
- 返回切片时要用
slice
对象进行包装
df_demo.loc[lambda x:slice('Gaojuan You','Gaoqiang Qian')]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Gaojuan You |
Fudan University |
Sophomore |
Male |
174.0 |
74.0 |
Xiaoli Qian |
Tsinghua University |
Freshman |
Female |
158.0 |
51.0 |
Qiang Chu |
Shanghai Jiao Tong University |
Freshman |
Female |
162.5 |
52.0 |
Gaoqiang Qian |
Tsinghua University |
Junior |
Female |
161.9 |
50.0 |
不要使用链式赋值
在对表或者序列赋值时,应当在使用一层索引器后直接进行赋值操作,这样做是由于进行多次索引后赋值是赋在临时返回的 copy 副本上的,而没有真正修改元素从而报出 SettingWithCopyWarning 警告。
df_chain = pd.DataFrame([[0,0],[1,0],[-1,0]],columns=list('AB'))
df_chain
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
import warnings
with warnings.catch_warnings():
warnings.filterwarnings('error')
try:
df_chain[df_chain.A!=0].B = 1
except Warning as w:
Warning_Msg = w
print(Warning_Msg)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_chain
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
df_chain.loc[df_chain.A!=0,'B'] = 1
df_chain
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
iloc索引器
iloc
使用与loc
完全相似,不过是针对位置进行筛选
- 在相应的
*
位置出一种有5类合法对象
- 整数
- 整数列表
- 整数切片
- 布尔列表
- 函数,返回值必须是前面4个合法对象中的一个,输入为
dataframe
本身
df_demo.iloc[1,1]
'Freshman'
df_demo.iloc[[0,1],[0,1]]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Name |
|
|
Gaopeng Yang |
Shanghai Jiao Tong University |
Freshman |
Changqiang You |
Peking University |
Freshman |
df_demo.iloc[1:4,2:4]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
Gender |
Height |
Name |
|
|
Changqiang You |
Male |
166.5 |
Mei Sun |
Male |
188.9 |
Xiaojuan Sun |
Female |
NaN |
df_demo.iloc[lambda x:slice(1,4)]
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Changqiang You |
Peking University |
Freshman |
Male |
166.5 |
70.0 |
Mei Sun |
Shanghai Jiao Tong University |
Senior |
Male |
188.9 |
89.0 |
Xiaojuan Sun |
Fudan University |
Sophomore |
Female |
NaN |
41.0 |
- 使用布尔不能传入
Series
,而必须传入序列的values
- 使用布尔筛选
优先考虑loc
方式
df_demo.iloc[(df_demo.Weight>80).values].head()
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Mei Sun |
Shanghai Jiao Tong University |
Senior |
Male |
188.9 |
89.0 |
Qiang Zheng |
Shanghai Jiao Tong University |
Senior |
Male |
183.9 |
87.0 |
Qiang Han |
Peking University |
Freshman |
Male |
185.3 |
87.0 |
Chengpeng Zhou |
Fudan University |
Senior |
Male |
177.1 |
81.0 |
Feng Han |
Shanghai Jiao Tong University |
Sophomore |
Male |
183.4 |
82.0 |
df_demo.School.iloc[1]
'Peking University'
df_demo.School.iloc[1:5:2]
Name
Changqiang You Peking University
Xiaojuan Sun Fudan University
Name: School, dtype: object
query方法
- 支持以字符串的形式查询表达式传入
query
方法,表达式执行结果必须返回布尔列表
- 进行复杂索引时,无须像普通方法重复使用
Dataframe
作为列名
df_demo.query('((School == "Fudan University")&'
' (Grade == "Senior")&'
' (Weight > 70) |'
' (School == "Peking University")&'
' (Grade != "Senior")&'
' (Weight > 80))')
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Qiang Han |
Peking University |
Freshman |
Male |
185.3 |
87.0 |
Chengpeng Zhou |
Fudan University |
Senior |
Male |
177.1 |
81.0 |
Changpeng Zhao |
Peking University |
Freshman |
Male |
181.3 |
83.0 |
Chengpeng Qian |
Fudan University |
Senior |
Male |
177.2 |
73.0 |
df_demo.query('Weight > Weight.mean()').head()
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Changqiang You |
Peking University |
Freshman |
Male |
166.5 |
70.0 |
Mei Sun |
Shanghai Jiao Tong University |
Senior |
Male |
188.9 |
89.0 |
Gaojuan You |
Fudan University |
Sophomore |
Male |
174.0 |
74.0 |
Xiaopeng Zhou |
Shanghai Jiao Tong University |
Freshman |
Male |
174.1 |
74.0 |
Xiaomei Zhou |
Tsinghua University |
Senior |
Female |
165.3 |
57.0 |
- 对于含有空格的列名,需要使用
col name
的方式进行引用
query
其他常用
query
提供可读性语法,or
、and
、is in
、not in
- 等价替换
- 需要引用外部变量,在变量名前加
@
df_demo.query('(Grade not in ["Freshman","Sophomore"]) and'
'(Gender == "Male")').head()
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Mei Sun |
Shanghai Jiao Tong University |
Senior |
Male |
188.9 |
89.0 |
Xiaoqiang Qin |
Tsinghua University |
Junior |
Male |
170.1 |
68.0 |
Peng Wang |
Tsinghua University |
Junior |
Male |
162.8 |
65.0 |
Xiaofeng Sun |
Tsinghua University |
Senior |
Male |
170.3 |
71.0 |
Xiaopeng Shen |
Shanghai Jiao Tong University |
Senior |
Male |
166.0 |
62.0 |
df_demo.query('Grade == ["Junior","Senior"]').head()
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Mei Sun |
Shanghai Jiao Tong University |
Senior |
Male |
188.9 |
89.0 |
Gaoqiang Qian |
Tsinghua University |
Junior |
Female |
161.9 |
50.0 |
Juan Xu |
Peking University |
Junior |
Female |
164.8 |
NaN |
Xiaoquan Lv |
Tsinghua University |
Junior |
Female |
153.2 |
43.0 |
Peng You |
Shanghai Jiao Tong University |
Senior |
Female |
NaN |
48.0 |
low, high = 70, 80
df_demo.query('Weight.between(@low,@high)').head()
|
School |
Grade |
Gender |
Height |
Weight |
Name |
|
|
|
|
|
Changqiang You |
Peking University |
Freshman |
Male |
166.5 |
70.0 |
Gaojuan You |
Fudan University |
Sophomore |
Male |
174.0 |
74.0 |
Xiaopeng Zhou |
Shanghai Jiao Tong University |
Freshman |
Male |
174.1 |
74.0 |
Xiaofeng Sun |
Tsinghua University |
Senior |
Male |
170.3 |
71.0 |
Gaoli Zhao |
Peking University |
Freshman |
Male |
175.4 |
78.0 |
随机抽样
sample
,对样本/特征进行随机抽样,参数如下:
n
,抽样数量
axis
,抽样的方向(0为行,1为列)
frac
,抽样比例(0.3为从总体中抽出30%的样本)
replace
,是否放回,True,有放回抽样
weights
,每个样本的抽样相对概率
df = pd.DataFrame({'id':list('abcde'),
'value':[1,2,3,4,9]})
df
|
id |
value |
0 |
a |
1 |
1 |
b |
2 |
2 |
c |
3 |
3 |
d |
4 |
4 |
e |
9 |
df.sample(5, replace = True, weights = df.value)
|
id |
value |
2 |
c |
3 |
4 |
e |
9 |
3 |
d |
4 |
4 |
e |
9 |
3 |
d |
4 |
参考