A Detailed Guide to the pandas .iloc Method

The article from two days ago mentioned the pandas .loc and .iloc methods, and yesterday we went through .loc in detail, so today let's take a proper look at .iloc!

Let's first see what the documentation says:

Pandas provides a suite of methods in order to get purely integer based indexing.

The .iloc attribute is the primary access method. The following are valid inputs:
- An integer e.g. 5
- A list or array of integers [4, 3, 0]
- A slice object with ints 1:7
- A boolean array
- A callable, see Selection By Callable
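
To make the list above concrete, here is a minimal sketch that tries each kind of input on a small Series. The values (10 to 50) and the variable names are made up for illustration and are not the random data used in the rest of this article.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data; the index labels are the even numbers 0..8
s = pd.Series([10, 20, 30, 40, 50], index=list(range(0, 10, 2)))

by_int = s.iloc[3]              # a single integer position
by_list = s.iloc[[4, 3, 0]]     # a list of positions, returned in that order
by_slice = s.iloc[1:4]          # a slice of positions, end excluded
by_mask = s.iloc[np.array([True, False, True, False, True])]  # a boolean array
by_callable = s.iloc[lambda x: [0, len(x) - 1]]  # a callable returning positions

print(by_int)
print(by_list)
print(by_slice)
print(by_mask)
print(by_callable)
```

In every case the numbers refer to positions, not to the labels 0, 2, 4, 6, 8.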

    >>> import numpy as np
    >>> import pandas as pd
    >>> s1 = pd.Series(np.random.randn(5), index=list(range(0, 10, 2)))
    >>> s1
    0    0.960870
    2    0.537314
    4    3.518552
    6   -0.608548
    8   -0.359744
    dtype: float64

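The .iloc method provides purely integer-position-based indexing, much like indexing a built-in Python list. Above we defined a pandas.Series of 5 random numbers whose index labels are the even numbers from 0 up to (but not including) 10. Now let's see how it differs from .loc: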

>>> s1.iloc[:3]
0    0.960870
2    0.537314
4    3.518552
dtype: float64
>>> s1.iloc[3]
-0.608547657174299

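As you can see, 3 is not one of the index labels. Calling s1.iloc[:3] returns the rows at positions 0 through 2, and s1.iloc[3] returns the value at position 3. The same operations with .loc look like this (note that a .loc slice includes its end label):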

>>> s1.loc[:4]
0    0.960870
2    0.537314
4    3.518552
dtype: float64
>>> s1.loc[6]
-0.608547657174299
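
The difference between position and label can be checked with a small reproducible sketch. The Series below uses made-up fixed values instead of random ones, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical fixed values so the result is reproducible
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=[0, 2, 4, 6, 8])

by_position = s.iloc[:3]  # positions 0, 1, 2 (end position 3 is excluded)
by_label = s.loc[:4]      # labels 0, 2, 4 (end label 4 is included)
same_rows = by_position.equals(by_label)

# Position 3 exists, but the index has no label 3
value_at_position_3 = s.iloc[3]
try:
    s.loc[3]
    label_3_exists = True
except KeyError:
    label_3_exists = False

print(same_rows, value_at_position_3, label_3_exists)
```

So the two slices select the same rows here only because the first three labels happen to be 0, 2 and 4.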

We can also assign values through a single position or a slice of positions:

>>> s1.iloc[:3] = 0
>>> s1
0    0.000000
2    0.000000
4    0.000000
6   -0.608548
8   -0.359744
dtype: float64
>>> s1.iloc[4] = 3.3
>>> s1
0    0.000000
2    0.000000
4    0.000000
6   -0.608548
8    3.300000
dtype: float64

For a pandas DataFrame, .iloc works in much the same way, so the code below is shown without much further comment:

>>> df1 = pd.DataFrame(np.random.randn(4, 3),
                       index=list(range(0, 12, 3)),
                       columns=list(range(0, 6, 2)))
>>> df1
          0         2         4
0 -0.152577  0.417328 -0.531741
3  0.109624  0.616024 -0.229882
6  0.355600 -0.633415  0.626256
9  1.208580 -0.672474 -0.141351
>>> df1.iloc[1:4, 1:3]
          2         4
3  0.616024 -0.229882
6 -0.633415  0.626256
9 -0.672474 -0.141351
>>> df1.iloc[[0, 2], [0, 2]]
          0         4
0 -0.152577 -0.531741
6  0.355600  0.626256
>>> df1.iloc[1:4, :]
          0         2         4
3  0.109624  0.616024 -0.229882
6  0.355600 -0.633415  0.626256
9  1.208580 -0.672474 -0.141351
>>> df1.iloc[:, 1:3]
          2         4
0  0.417328 -0.531741
3  0.616024 -0.229882
6 -0.633415  0.626256
9 -0.672474 -0.141351
>>> df1.iloc[1, 1]
0.6160235470931357
>>> df1.iat[1, 1]
0.6160235470931357
>>> df1.iloc[1]
0    0.109624
2    0.616024
4   -0.229882
Name: 3, dtype: float64
>>> df1.xs(3)
0    0.109624
2    0.616024
4   -0.229882
Name: 3, dtype: float64

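As shown above, to get the value at a particular row and column of a DataFrame, we can pass the row and column positions to .iloc, or pass the same positions to .iat for the same result. To get a single row, we can pass its position directly to .iloc, or pass its index label to the xs method.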
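
Here is a small sketch of these equivalences on a DataFrame with the same index and column labels as df1. The cell values are made-up integers, and it also shows the column case through `xs(..., axis=1)`, which the session above does not cover.

```python
import pandas as pd

# Hypothetical fixed values; same labels as df1 in this article
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    index=[0, 3, 6, 9],
    columns=[0, 2, 4],
)

# Single cell: .iloc and .iat both take (row position, column position)
cell_iloc = df.iloc[1, 1]
cell_iat = df.iat[1, 1]

# Single row: position 1 with .iloc, label 3 with .xs
row_by_position = df.iloc[1]
row_by_label = df.xs(3)

# Single column: position 1 with .iloc, label 2 with .xs on axis=1
col_by_position = df.iloc[:, 1]
col_by_label = df.xs(2, axis=1)

print(cell_iloc, cell_iat)
print(row_by_position.equals(row_by_label))
print(col_by_position.equals(col_by_label))
```

For a single scalar, .iat is the more specialised accessor, while .iloc also accepts slices and lists.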

One last thing:

>>> x = list('abcdef')
>>> x
['a', 'b', 'c', 'd', 'e', 'f']
>>> x[4:10]
['e', 'f']
>>> x[8:10]
[]
>>> s = pd.Series(x)
>>> s
0    a
1    b
2    c
3    d
4    e
5    f
dtype: object
>>> s.iloc[4:10]
4    e
5    f
dtype: object
>>> s.iloc[8:10]
Series([], dtype: object)
>>> df1.iloc[:, 3:5]
Empty DataFrame
Columns: []
Index: [0, 3, 6, 9]
>>> df1.iloc[[4, 5, 6]]
IndexError: positional indexers are out-of-bounds

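As shown above, passing an out-of-range slice to .iloc returns only the part that is in range, or an empty pandas.Series or pandas.DataFrame if nothing is in range, just like slicing a Python list. Passing an out-of-range integer, or a list containing out-of-range integers, raises an IndexError.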
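
If you would rather get a fallback value than an exception, one possible approach is to catch the IndexError yourself. The helper `iloc_or_default` below is a hypothetical name made up for this sketch and is not part of pandas.

```python
import pandas as pd

s = pd.Series(list("abcdef"))

# Out-of-range slices are truncated silently
tail = s.iloc[4:10]       # only positions 4 and 5 exist
nothing = s.iloc[8:10]    # no positions in range, so the result is empty


def iloc_or_default(obj, position, default=None):
    """Hypothetical helper: return obj.iloc[position], or default if out of range."""
    try:
        return obj.iloc[position]
    except IndexError:
        return default


in_range = iloc_or_default(s, 2)                 # position 2 exists
out_of_range = iloc_or_default(s, 10, "n/a")     # position 10 does not exist
bad_list = iloc_or_default(s, [4, 5, 6], "n/a")  # position 6 does not exist

print(tail.tolist(), nothing.empty, in_range, out_of_range, bad_list)
```

Whether silently substituting a default is a good idea depends on your data; in many cases letting the IndexError surface is the safer choice.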

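That's all for the pandas .iloc method! All the code in this article can be found on my GitHub. If you spot any mistakes in the article or the code, please don't hesitate to point them out. Comments are welcome!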