pandas03: Indexing (Part 2)

pandas Indexing

Multi-level Indexes

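Multi-level indexes and the structure of the table

  • Level names: names
  • Values: values
  • Values of a single level: get_level_values
import numpy as np
import pandas as pd

# Build a multi-level index
# df_demo comes from the previous part of this series; it is assumed to have
# the columns School, Grade, Gender, Height and Weight
multi_index = pd.MultiIndex.from_product([list('ABCD'),
              df_demo.Gender.unique()], names=('School', 'Gender'))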
  

multi_column = pd.MultiIndex.from_product([['Height', 'Weight'],
                        df_demo.Grade.unique()], names=('Indicator', 'Grade'))


df_multi = pd.DataFrame(np.c_[(np.random.randn(8,4)*5 + 163).tolist(),
                                       (np.random.randn(8,4)*5 + 65).tolist()],
                       index = multi_index,
                       columns = multi_column).round(1)

df_multi
Indicator     Height                           Weight
Grade         Freshman Senior Sophomore Junior Freshman Senior Sophomore Junior
School Gender
A      Female    171.8  165.0     167.9  174.2     60.6   55.1      63.3   65.8
       Male      172.3  158.1     167.8  162.2     71.2   71.0      63.1   63.5
B      Female    162.5  165.1     163.7  170.3     59.8   57.9      56.5   74.8
       Male      166.8  163.6     165.2  164.7     62.5   62.8      58.7   68.9
C      Female    170.5  162.0     164.6  158.7     56.9   63.9      60.5   66.9
       Male      150.2  166.3     167.3  159.3     62.4   59.1      64.9   67.1
D      Female    174.3  155.7     163.2  162.1     65.3   66.5      61.8   63.2
       Male      170.7  170.3     163.8  164.9     61.6   63.2      60.9   56.4
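  • School and Gender are the names of the first and second row-index levels
  • Indicator and Grade are the names of the first and second column-index levels
# Names of the row-index levels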
df_multi.index.names
FrozenList(['School', 'Gender'])
# Names of the column-index levels
df_multi.columns.names
FrozenList(['Indicator', 'Grade'])
# Row index: the values attribute (one tuple per row)
df_multi.index.values
array([('A', 'Female'), ('A', 'Male'), ('B', 'Female'), ('B', 'Male'),
       ('C', 'Female'), ('C', 'Male'), ('D', 'Female'), ('D', 'Male')],
      dtype=object)
# Column index: the values attribute
df_multi.columns.values
array([('Height', 'Freshman'), ('Height', 'Senior'),
       ('Height', 'Sophomore'), ('Height', 'Junior'),
       ('Weight', 'Freshman'), ('Weight', 'Senior'),
       ('Weight', 'Sophomore'), ('Weight', 'Junior')], dtype=object)
# Get the values of a single level
df_multi.index.get_level_values(1) # row index, second level (Gender)
Index(['Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male'], dtype='object', name='Gender')
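A self-contained sketch of the same three attributes on a tiny made-up frame (df_small, small_index and the heights are hypothetical, not taken from df_demo). It also shows that get_level_values accepts a level name as well as a position.

```python
import pandas as pd

# Hypothetical 4-row frame with a two-level row index (School, Gender)
small_index = pd.MultiIndex.from_product([['A', 'B'], ['Female', 'Male']],
                                         names=('School', 'Gender'))
df_small = pd.DataFrame({'Height': [160.0, 172.5, 158.3, 175.1]}, index=small_index)

print(df_small.index.names)     # level names
print(df_small.index.values)    # one tuple per row
print(df_small.index.nlevels)   # number of levels

# A level can be picked by position or by name; both return the same Index
by_pos = df_small.index.get_level_values(1)
by_name = df_small.index.get_level_values('Gender')
print(by_pos.equals(by_name))
```

Selecting by name is usually safer, because positions shift once levels are swapped or dropped.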

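The loc indexer on a multi-level index

  • Set School and Grade as the index
    • the rows get a multi-level index
    • the columns keep a single-level index
df_multi = df_demo.set_index(['School','Grade'])
# df_multi.head()
df_multi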
                                        Gender Height Weight
School                        Grade
Shanghai Jiao Tong University Freshman  Female  158.9   46.0
Peking University             Freshman    Male  166.5   70.0
Shanghai Jiao Tong University Senior      Male  188.9   89.0
Fudan University              Sophomore Female    NaN   41.0
                              Sophomore   Male  174.0   74.0
...                           ...          ...    ...    ...
                              Junior    Female  153.9   46.0
Tsinghua University           Senior    Female  160.9   50.0
Shanghai Jiao Tong University Senior    Female  153.9   45.0
                              Senior      Male  175.3   71.0
Tsinghua University           Sophomore   Male  155.7   51.0

200 rows × 3 columns

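# Sort the index first: lookups on a sorted MultiIndex are faster, and label slices on an unsorted one can raise UnsortedIndexError
df_sorted = df_multi.sort_index()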
# df_sorted.loc[('Fudan University', 'Junior')].head()
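No output is shown for that lookup, so here is a hedged sketch on hypothetical toy data (the school names, grades and heights are made up) of the common loc patterns on a sorted two-level row index: a full tuple, an outer label only, a list of tuples, and a tuple range.

```python
import pandas as pd

# Hypothetical stand-in for df_demo.set_index(['School', 'Grade'])
toy = pd.DataFrame({
    'School': ['Fudan', 'Peking', 'Fudan', 'Peking', 'Fudan'],
    'Grade': ['Junior', 'Senior', 'Senior', 'Junior', 'Junior'],
    'Height': [158.9, 166.5, 188.9, 174.0, 153.9],
}).set_index(['School', 'Grade'])

# Sort once so multi-level lookups and slices work on a lexsorted index
toy_sorted = toy.sort_index()

# Full tuple: every row whose (School, Grade) matches; duplicates are allowed
fudan_junior = toy_sorted.loc[('Fudan', 'Junior')]
# Outer label only: the whole 'Fudan' block, indexed by the remaining Grade level
fudan = toy_sorted.loc['Fudan']
# List of tuples: several (School, Grade) combinations at once
some = toy_sorted.loc[[('Fudan', 'Senior'), ('Peking', 'Junior')]]
# Tuple range: inclusive slice between two full keys on the sorted index
block = toy_sorted.loc[('Fudan', 'Senior'):('Peking', 'Junior')]
```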

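The IndexSlice object

  • Use cases (on a DataFrame whose index has no duplicates)
    • slice each index level separately
    • mix slices with boolean lists
  • Two forms
    • loc[idx[*,*]]
    • loc[idx[*,*],idx[*,*]]
# Build a DataFrame with a unique (non-duplicated) multi-level index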
np.random.seed(0)
a, b = ['A','B','C'],['a','b','c']
mul_index1 = pd.MultiIndex.from_product([a,b],names=('Upper','Lower'))
c, d = ['D','E','F'],['d','e','f']
mul_index2 = pd.MultiIndex.from_product([c,d],names=('Big','Small'))
df = pd.DataFrame(np.random.randint(-9,10,(9,9)),
                  index = mul_index1,
                  columns=mul_index2)
df
Big          D        E        F
Small        d  e  f  d  e  f  d  e  f
Upper Lower
A     a      3  6 -9 -6 -6 -2  0  9 -5
      b     -3  3 -8 -3 -2  5  8 -4  4
      c     -1  0  7 -4  6  6 -9  9 -6
B     a      8  5 -2 -9 -8  0 -9  1 -6
      b      2  9 -7 -9 -9 -5 -4 -3 -1
      c      8  6 -5  0  1 -8 -8 -2  0
C     a     -6 -3  2  5  9 -9  5 -6  3
      b      1  2 -5 -3 -5  6 -6  3 -5
      c     -1  5  6 -6  6  4  7  8 -4
# Define the slice object
idx = pd.IndexSlice
  • loc[idx[*,*]]
  • the first * selects rows and the second * selects columns, just like a plain loc[rows, cols]
# loc[idx[*,*]]
df.loc[idx['C':,('D','f'):]]
Big          D  E        F
Small        f  d  e  f  d  e  f
Upper Lower
C     a      2  5  9 -9  5 -6  3
      b     -5 -3 -5  6 -6  3 -5
      c      6 -6  6  4  7  8 -4
# Boolean selection is supported too; the callable receives the whole DataFrame
df.loc[idx[:'A', lambda x:x.sum()>0]] # keep columns whose sum is > 0
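A point worth checking: the callable inside IndexSlice is evaluated on the whole DataFrame, so x.sum() is the column sum over all nine rows, not only the rows kept by :'A'. The sketch below rebuilds this section's df and compares the callable with a boolean mask computed up front (the names with_callable, mask and with_mask are ours).

```python
import numpy as np
import pandas as pd

# Rebuild the 9x9 df from this section
np.random.seed(0)
rows = pd.MultiIndex.from_product([list('ABC'), list('abc')], names=('Upper', 'Lower'))
cols = pd.MultiIndex.from_product([list('DEF'), list('def')], names=('Big', 'Small'))
df = pd.DataFrame(np.random.randint(-9, 10, (9, 9)), index=rows, columns=cols)
idx = pd.IndexSlice

# The callable is called with the whole df, so the sums cover all nine rows
with_callable = df.loc[idx[:'A', lambda x: x.sum() > 0]]

# Same selection with the boolean mask computed up front
mask = df.sum() > 0
with_mask = df.loc[idx[:'A', mask]]
```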
  • loc[idx[*,*],idx[*,*]]
  • level-by-level slicing: the first idx applies to the row index, the second idx to the column index
# loc[idx[*,*],idx[*,*]]
df.loc[idx[:'A','b':],idx['E':,'e':]]
Big          E     F
Small        e  f  e  f
Upper Lower
A     b     -2  5 -4  4
      c      6  6  9 -6
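pd.IndexSlice is only syntactic sugar: idx[...] returns the tuple of slice objects it was given, so the same selection can be written with slice() directly. A minimal check that rebuilds this section's df:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
rows = pd.MultiIndex.from_product([list('ABC'), list('abc')], names=('Upper', 'Lower'))
cols = pd.MultiIndex.from_product([list('DEF'), list('def')], names=('Big', 'Small'))
df = pd.DataFrame(np.random.randint(-9, 10, (9, 9)), index=rows, columns=cols)
idx = pd.IndexSlice

# idx[...] simply returns the tuple of slices it was given
row_key = idx[:'A', 'b':]
print(row_key)  # (slice(None, 'A', None), slice('b', None, None))

# Row slices per level first, then column slices per level
with_idx = df.loc[idx[:'A', 'b':], idx['E':, 'e':]]
with_slices = df.loc[(slice(None, 'A'), slice('b', None)),
                     (slice('E', None), slice('e', None))]
print(with_idx.equals(with_slices))
```

Writing idx[...] is just shorter than spelling out slice(None, ...) for every level.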

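Constructing multi-level indexes

  • Use the constructors on pd.MultiIndex
    • from_tuples: build from a list of tuples
    • from_arrays: build from a list of arrays, one per level
    • from_product: build from the Cartesian product of several lists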
# from_tuples
a = [('a','cat'),('a','dog'),('b','cat'),('b','dog')]
pd.MultiIndex.from_tuples(a, names=['First','Second'])
MultiIndex([('a', 'cat'),
            ('a', 'dog'),
            ('b', 'cat'),
            ('b', 'dog')],
           names=['First', 'Second'])
# from_arrays
b = [list('aabb'),['cat','dog']*2]
pd.MultiIndex.from_arrays(b,names=['First','Second'])
MultiIndex([('a', 'cat'),
            ('a', 'dog'),
            ('b', 'cat'),
            ('b', 'dog')],
           names=['First', 'Second'])
# from_product
a = ['a','b']
b = ['cat','dog']
pd.MultiIndex.from_product([a,b],names=['First','Second'])
MultiIndex([('a', 'cat'),
            ('a', 'dog'),
            ('b', 'cat'),
            ('b', 'dog')],
           names=['First', 'Second'])
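The three constructors above describe the same index in different shapes. A quick check that they agree; equals compares only the labels, so the level names are compared separately.

```python
import pandas as pd

names = ['First', 'Second']
m_tuples = pd.MultiIndex.from_tuples([('a', 'cat'), ('a', 'dog'), ('b', 'cat'), ('b', 'dog')],
                                     names=names)
m_arrays = pd.MultiIndex.from_arrays([list('aabb'), ['cat', 'dog'] * 2], names=names)
m_product = pd.MultiIndex.from_product([['a', 'b'], ['cat', 'dog']], names=names)

# equals compares the labels only, so compare the names separately
same_labels = m_tuples.equals(m_arrays) and m_arrays.equals(m_product)
same_names = list(m_tuples.names) == list(m_arrays.names) == list(m_product.names)
print(same_labels, same_names)
```

from_product is the most compact when every combination is present; from_tuples and from_arrays also handle irregular combinations.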

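Common index methods

Swapping and dropping levels

  • Swapping (choose the row or column index with axis)
    • swaplevel: swaps exactly two levels
    • reorder_levels: rearranges any number of levels
  • Dropping: droplevel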
np.random.seed(0)

L1,L2,L3 = ['A','B'],['a','b'],['alpha','beta']

mul_index1 = pd.MultiIndex.from_product([L1,L2,L3],
             names=('Upper', 'Lower','Extra'))


L4,L5,L6 = ['C','D'],['c','d'],['cat','dog']

mul_index2 = pd.MultiIndex.from_product([L4,L5,L6],
             names=('Big', 'Small', 'Other'))
 

df = pd.DataFrame(np.random.randint(-9,10,(8,8)),
                     index=mul_index1,
                     columns=mul_index2)


df
Big                 C               D
Small               c       d       c       d
Other             cat dog cat dog cat dog cat dog
Upper Lower Extra
A     a     alpha   3   6  -9  -6  -6  -2   0   9
            beta   -5  -3   3  -8  -3  -2   5   8
      b     alpha  -4   4  -1   0   7  -4   6   6
            beta   -9   9  -6   8   5  -2  -9  -8
B     a     alpha   0  -9   1  -6   2   9  -7  -9
            beta   -9  -5  -4  -3  -1   8   6  -5
      b     alpha   0   1  -8  -8  -2   0  -6  -3
            beta    2   5   9  -9   5  -6   3   1
# axis=1: operate on the column index
# swap level 0 (Big) with level 2 (Other)
df.swaplevel(0,2,axis=1).head()
Other             cat dog cat dog cat dog cat dog
Small               c   c   d   d   c   c   d   d
Big                 C   C   C   C   D   D   D   D
Upper Lower Extra
A     a     alpha   3   6  -9  -6  -6  -2   0   9
            beta   -5  -3   3  -8  -3  -2   5   8
      b     alpha  -4   4  -1   0   7  -4   6   6
            beta   -9   9  -6   8   5  -2  -9  -8
B     a     alpha   0  -9   1  -6   2   9  -7  -9
# axis=0: operate on the row index
# level order 0 1 2 -> 2 0 1
df.reorder_levels([2,0,1],axis=0).head()
Big                 C               D
Small               c       d       c       d
Other             cat dog cat dog cat dog cat dog
Extra Upper Lower
alpha A     a       3   6  -9  -6  -6  -2   0   9
beta  A     a      -5  -3   3  -8  -3  -2   5   8
alpha A     b      -4   4  -1   0   7  -4   6   6
beta  A     b      -9   9  -6   8   5  -2  -9  -8
alpha B     a       0  -9   1  -6   2   9  -7  -9
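Both methods also accept level names instead of positions, which is easier to read than counting levels. A sketch that rebuilds this section's df; swapping levels only relabels the axis and does not reorder the data, so sort_index(axis=1) is used here to regroup the swapped columns.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
rows = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b'], ['alpha', 'beta']],
                                  names=('Upper', 'Lower', 'Extra'))
cols = pd.MultiIndex.from_product([['C', 'D'], ['c', 'd'], ['cat', 'dog']],
                                  names=('Big', 'Small', 'Other'))
df = pd.DataFrame(np.random.randint(-9, 10, (8, 8)), index=rows, columns=cols)

# Same operations as above, addressed by level name instead of position
swapped = df.swaplevel('Big', 'Other', axis=1)
reordered = df.reorder_levels(['Extra', 'Upper', 'Lower'], axis=0)

# Swapping only relabels; sort the columns to regroup them under the new outer level
regrouped = swapped.sort_index(axis=1)
```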
# Drop one index level
df.droplevel(1,axis=1) # drop column level 1 (Small)
Big                 C               D
Other             cat dog cat dog cat dog cat dog
Upper Lower Extra
A     a     alpha   3   6  -9  -6  -6  -2   0   9
            beta   -5  -3   3  -8  -3  -2   5   8
      b     alpha  -4   4  -1   0   7  -4   6   6
            beta   -9   9  -6   8   5  -2  -9  -8
B     a     alpha   0  -9   1  -6   2   9  -7  -9
            beta   -9  -5  -4  -3  -1   8   6  -5
      b     alpha   0   1  -8  -8  -2   0  -6  -3
            beta    2   5   9  -9   5  -6   3   1
# Drop two row-index levels (Upper and Lower)
df.droplevel([0,1],axis=0)
Big     C               D
Small   c       d       c       d
Other cat dog cat dog cat dog cat dog
Extra
alpha   3   6  -9  -6  -6  -2   0   9
beta   -5  -3   3  -8  -3  -2   5   8
alpha  -4   4  -1   0   7  -4   6   6
beta   -9   9  -6   8   5  -2  -9  -8
alpha   0  -9   1  -6   2   9  -7  -9
beta   -9  -5  -4  -3  -1   8   6  -5
alpha   0   1  -8  -8  -2   0  -6  -3
beta    2   5   9  -9   5  -6   3   1
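droplevel likewise accepts level names, lists of names, and negative positions (counted from the innermost level). A sketch on the rebuilt df; note that dropping levels can leave duplicate labels, as the remaining Extra index shows.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
rows = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b'], ['alpha', 'beta']],
                                  names=('Upper', 'Lower', 'Extra'))
cols = pd.MultiIndex.from_product([['C', 'D'], ['c', 'd'], ['cat', 'dog']],
                                  names=('Big', 'Small', 'Other'))
df = pd.DataFrame(np.random.randint(-9, 10, (8, 8)), index=rows, columns=cols)

no_small = df.droplevel('Small', axis=1)               # by name
extra_only = df.droplevel(['Upper', 'Lower'], axis=0)  # several levels by name
no_extra = df.droplevel(-1, axis=0)                    # -1 is the innermost row level (Extra)

# Only 'alpha'/'beta' remain, each repeated four times
print(extra_only.index.is_unique)
```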

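Modifying index attributes

  • rename_axis: rename the index levels (the level names)
  • rename: change the index values
    • for a multi-level index, specify the level with level
  • the mapper can also be a function
  • to replace every element of a level, use an iterator
  • to modify elements at specific positions, use the map method
    • it makes cross-level changes convenient
    • it can also compress (flatten) a multi-level index
# Rename index levels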
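The map bullets have no example in this section. A minimal sketch, assuming the three-level df built earlier in this section: Index.map receives each full index tuple, so one function can look at every level, and returning plain strings compresses the MultiIndex into a single level. The '-' separator and the variable names are arbitrary choices.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
rows = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b'], ['alpha', 'beta']],
                                  names=('Upper', 'Lower', 'Extra'))
cols = pd.MultiIndex.from_product([['C', 'D'], ['c', 'd'], ['cat', 'dog']],
                                  names=('Big', 'Small', 'Other'))
df = pd.DataFrame(np.random.randint(-9, 10, (8, 8)), index=rows, columns=cols)

# Each x is a full tuple such as ('A', 'a', 'alpha'); returning tuples keeps a MultiIndex
upper_extra = df.index.map(lambda x: (x[0], x[1], x[2].upper()))

# Returning strings compresses the MultiIndex into a single-level Index
flat = df.index.map(lambda x: '-'.join(x))

# Splitting the strings back into tuples restores a three-level index
restored = flat.map(lambda s: tuple(s.split('-')))

df_flat = df.copy()
df_flat.index = flat
```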
df.rename_axis(index={'Upper':'Changed_row'},
               columns={'Other':'Changed_Col'}).head()
Big                       C               D
Small                     c       d       c       d
Changed_Col             cat dog cat dog cat dog cat dog
Changed_row Lower Extra
A           a     alpha   3   6  -9  -6  -6  -2   0   9
                  beta   -5  -3   3  -8  -3  -2   5   8
            b     alpha  -4   4  -1   0   7  -4   6   6
                  beta   -9   9  -6   8   5  -2  -9  -8
B           a     alpha   0  -9   1  -6   2   9  -7  -9
# Rename index values
# level: position of the level to modify
df.rename(columns={'cat':'not_cat'},level=2).head()
Big                     C                       D
Small                   c           d           c           d
Other             not_cat dog not_cat dog not_cat dog not_cat dog
Upper Lower Extra
A     a     alpha       3   6      -9  -6      -6  -2       0   9
            beta       -5  -3       3  -8      -3  -2       5   8
      b     alpha      -4   4      -1   0       7  -4       6   6
            beta       -9   9      -6   8       5  -2      -9  -8
B     a     alpha       0  -9       1  -6       2   9      -7  -9
# The mapper can also be a function; its input is each index element
# Uppercase level 2 (Extra)
df.rename(index=lambda x: x.upper(), level=2).head()
Big                 C               D
Small               c       d       c       d
Other             cat dog cat dog cat dog cat dog
Upper Lower Extra
A     a     ALPHA   3   6  -9  -6  -6  -2   0   9
            BETA   -5  -3   3  -8  -3  -2   5   8
      b     ALPHA  -4   4  -1   0   7  -4   6   6
            BETA   -9   9  -6   8   5  -2  -9  -8
B     a     ALPHA   0  -9   1  -6   2   9  -7  -9
# To replace every element of a level, use an iterator (one value is consumed per row)
a = iter(list('abcdefgh'))
df.rename(index= lambda x:next(a),level=2)
Big                 C               D
Small               c       d       c       d
Other             cat dog cat dog cat dog cat dog
Upper Lower Extra
A     a     a       3   6  -9  -6  -6  -2   0   9
            b      -5  -3   3  -8  -3  -2   5   8
      b     c      -4   4  -1   0   7  -4   6   6
            d      -9   9  -6   8   5  -2  -9  -8
B     a     e       0  -9   1  -6   2   9  -7  -9
            f      -9  -5  -4  -3  -1   8   6  -5
      b     g       0   1  -8  -8  -2   0  -6  -3
            h       2   5   9  -9   5  -6   3   1

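Setting and resetting the index

  • set_index: set the index
    • key parameter append: whether to keep the existing index (the new level is appended to it)
  • reset_index: the inverse of set_index
    • key parameter drop: whether to discard the removed index levels instead of moving them back into columns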
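A short sketch of both functions and their main parameters on a hypothetical toy frame (df_toy and its values are made up):

```python
import pandas as pd

# Hypothetical toy frame
df_toy = pd.DataFrame({'A': list('aacd'), 'B': list('PQRT'), 'C': [1, 2, 3, 4]})

s_single = df_toy.set_index('A')                # column A becomes the index
s_append = df_toy.set_index('A', append=True)   # keep the old RangeIndex, add A as an inner level
s_multi = df_toy.set_index(['A', 'B'])          # two columns -> two-level index

r_keep = s_multi.reset_index('B')               # level B goes back to a column
r_drop = s_multi.reset_index('B', drop=True)    # level B is discarded
r_all = s_multi.reset_index()                   # all levels back to columns, new RangeIndex
```

Resetting every level of s_multi gives back a frame identical to df_toy, which is what makes reset_index the inverse of set_index here.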

Reindexing

  • reindex
  • reindex_like
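Only the method names are listed here. A hedged sketch on a hypothetical frame (the IDs and numbers are made up): reindex conforms the data to a new set of labels, filling labels that did not exist with NaN and dropping labels that were left out, and reindex_like takes both axes from another object.

```python
import pandas as pd

# Hypothetical frame with string IDs as the row index
df_r = pd.DataFrame({'Weight': [60, 70, 80], 'Height': [176, 180, 179]},
                    index=['1001', '1003', '1002'])

# Labels missing from df_r ('1004', 'Gender') are filled with NaN;
# labels left out ('Height') are dropped
r = df_r.reindex(index=['1001', '1002', '1003', '1004'],
                 columns=['Weight', 'Gender'])

# reindex_like copies both axes from another object; only its labels matter
template = pd.DataFrame(index=['1001', '1002', '1003', '1004'],
                        columns=['Weight', 'Gender'])
r_like = df_r.reindex_like(template)
```

Because NaN has to fit in, the integer Weight column becomes float after reindexing.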

Index operations

Set operation rules

General index operations
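These headings have no body in this section. As a minimal sketch under that gap, the named Index methods below perform the usual set operations; the two example indexes are made up and chosen without duplicates to keep the results easy to read.

```python
import pandas as pd

# Two made-up indexes without duplicate labels
id1 = pd.Index(['a', 'b', 'c'])
id2 = pd.Index(['b', 'c', 'd'])

both = id1.intersection(id2)                 # in id1 and id2
either = id1.union(id2)                      # in id1 or id2
only_first = id1.difference(id2)             # in id1 but not id2
exactly_one = id1.symmetric_difference(id2)  # in exactly one of them
```

The named methods are used rather than operators because their meaning is unambiguous.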

References