Pandas Series Tutorial and Examples

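The Series is one of the most commonly used data structures in pandas. It is a one-dimensional, array-like structure made up of a set of data values and a set of labels, with a one-to-one correspondence between labels and values.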

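A Series can hold any data type, such as integers, strings, floats, and Python objects. Its labels default to integers starting from 0. Index labels make it easier to see where each element sits.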

1. How to create a Series object

  1. Pandas uses the Series() constructor (pd.Series) to create a Series object.

  2. Through this Series object, you can call the corresponding methods and attributes to work with the data.

    import pandas as pd
    
    series = pd.Series(data=data, index=index, dtype=dtype, copy=copy)
    
    data: The input data; can be a list, a scalar constant, a NumPy ndarray, a dict, etc.
    
    index: The labels for the data; must have the same length as data (labels do not have to be unique). If no index is passed, it defaults to a RangeIndex equivalent to np.arange(n).
    
    dtype: The data type of the elements. If it is not provided, it is inferred from the data.
    
    copy: Whether to copy the input data. The default value is False.
    
  3. You can also create a Series object from an array, a dictionary, a scalar value, or Python objects. The sections below show examples of each.
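
As a minimal sketch of how the parameters above fit together, the constructor can be called with keyword arguments. The sample values and the variable names `data` and `labels` are illustrative assumptions, not part of the original tutorial.

```python
import pandas as pd

# Illustrative input data; any list-like object would do.
data = [1, 2, 3]

# Explicit labels; the index must have the same length as the data.
labels = ["a", "b", "c"]

# dtype forces the element type; copy=True asks pandas to copy the input.
s = pd.Series(data=data, index=labels, dtype="float64", copy=True)

print(s)
print(s.dtype)  # float64
```

Passing the arguments by keyword avoids depending on the positional order of the constructor's parameters.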

1.1 Creating an empty Series object

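  1. Creating an empty Series object without specifying a dtype is deprecated in the pandas version shown here: it emits a DeprecationWarning because the default dtype of an empty Series is changing from float64 to object.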

    >>> import pandas as pd
    >>> # create an empty Series object without a dtype; this emits a DeprecationWarning.
    ... s = pd.Series()
    __main__:2: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
    >>> print(s)
    Series([], dtype: float64)
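
The warning text itself suggests the fix: specify a dtype explicitly. A minimal sketch, assuming float64 is the dtype you want for the empty Series:

```python
import pandas as pd

# Passing dtype explicitly avoids relying on the version-dependent default.
s = pd.Series(dtype="float64")

print(s.dtype)  # float64
print(len(s))   # 0
```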
    

1.2 Creating a Series object from a NumPy ndarray

  1. ndarray is NumPy's array type. When the data is an ndarray, the index you pass in must have the same length as the array.

  2. If no index argument is passed, the index values are generated with range(n) by default, where n is the length of the array.

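  3. The following example creates a Series object with the default index. Because no index is passed, labels are assigned starting from 0, so the index ranges from 0 to len(data)-1, that is, 0 to 3. This is called an "implicit index".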

    >>> # import the pandas module.
    ... import pandas as pd
    >>> 
    >>> # import the numpy module.
    ... import numpy as np
    >>> 
    >>> # create a ndarray object.
    ... data = np.array(['python','javascript','java','php'])
    >>> 
    >>> # create a Series object with the above ndarray object.
    ... s = pd.Series(data)
    >>> 
    >>> # print out the Series object.
    ... print (s)
    0        python
    1    javascript
    2          java
    3           php
    dtype: object
    
  4. You can also use an "explicit index" to define the index labels, as in the following example.

    >>> # import pandas, numpy python library.
    ... import pandas as pd
    >>> import numpy as np
    >>> 
    >>> # create a numpy ndarray object. 
    ... data = np.array(['python','javascript','java'])
    >>> 
    >>> # create a list object.
    ... idx = ['language-1', 2, 'language-3'] 
    >>> 
    >>> # set the Series object's index explicitly. 
    >>> s = pd.Series(data=data,index=idx)
    >>> 
    >>> print(s)
    language-1        python
    2             javascript
    language-3          java
    dtype: object
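
The length requirement for an explicit index can be checked directly. The following is a small sketch with made-up labels (`"only"`, `"two"`) chosen to trigger the mismatch; the name `mismatch_error` is only used for illustration.

```python
import numpy as np
import pandas as pd

data = np.array(["python", "javascript", "java"])

# Default (implicit) index: integer labels 0 .. len(data) - 1.
s_default = pd.Series(data)
print(list(s_default.index))  # [0, 1, 2]

# An explicit index must match the data length; otherwise pandas raises ValueError.
mismatch_error = None
try:
    pd.Series(data, index=["only", "two"])
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```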
    
    

1.3 Creating a pandas Series object from a Python dictionary

  1. You can use a Python dictionary as the input data for a Series object.

  2. If no index is passed, the index is built from the dictionary's keys.

    >>> # import the python pandas library
    >>> import pandas as pd
    >>>
    >>> # create the python dictionary object.
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>>
    >>> # create the pandas Series object based on the above python dictionary object, do not pass in the index parameter.
    >>> series_obj = pd.Series(dict_data)
    >>>
    >>> # print out the pandas Series object; its index labels are the dictionary's keys.
    >>> print(series_obj)
    PL     Python
    OS    Windows
    DB      MySQL
    dtype: object
    
    
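  3. Conversely, when an index argument is passed, each index label is matched against the dictionary's keys. A label with no matching key gets NaN as its value.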

    >>> # import the python pandas library
    >>> import pandas as pd
    >>>
    >>> # create the python dictionary object.
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>> 
    >>> # define the Series object's index list, the index list element value should match the above dictionary object keys' value.  
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> # create the pandas Series object based on the above python dictionary object, pass in the index parameter.
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>>
    >>> # print out the pandas Series object; its index labels come from index_data, and the values are looked up by key.
    >>> print(series_obj)
    DB      MySQL
    OS    Windows
    PL     Python
    ISO       NaN # Because the dictionary object's keys value does not contain 'ISO', then this element's value is NaN.
    dtype: object
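
The matching works in both directions: a label with no key yields NaN, and a key that is not listed in the index is left out. A short sketch, using an index chosen for illustration that omits 'PL' and adds 'ISO':

```python
import pandas as pd

dict_data = {"PL": "Python", "OS": "Windows", "DB": "MySQL"}

# 'PL' is left out of the index and 'ISO' is not a key in the dictionary.
s = pd.Series(dict_data, index=["DB", "OS", "ISO"])

print(list(s.index))      # ['DB', 'OS', 'ISO']
print("PL" in s.index)    # False
print(pd.isna(s["ISO"]))  # True
```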
    

1.4 Creating a pandas Series object from a scalar value

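  1. You can create a pandas Series object from a scalar value. In this case an index should be provided so that the value is repeated once per label, as in the example below.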

    >>> # import python pandas module with the pd name alias.
    >>> import pandas as pd
    >>>
    >>> # create a list object.
    >>> index_list=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    >>>
    >>> # create a pandas Series object from a scalar value 10, you must pass in the index array, the scalar values are repeated according to the number of indexes and correspond to them one by one.
    >>> s = pd.Series(10, index = index_list)
    >>>
    >>> # print out the pandas Series object
    >>> print(s)
    1     10
    2     10
    3     10
    4     10
    5     10
    6     10
    7     10
    8     10
    9     10
    10    10
    dtype: int64
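
To make the role of the index concrete, this sketch compares a scalar with and without an index. The labels and the names `s_repeated` and `s_single` are illustrative assumptions.

```python
import pandas as pd

# With an index, the scalar is repeated once per label.
s_repeated = pd.Series(10, index=["a", "b", "c"])
print(len(s_repeated))      # 3
print(s_repeated.tolist())  # [10, 10, 10]

# Without an index, the result holds a single element labelled 0.
s_single = pd.Series(10)
print(len(s_single))        # 1
```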
    

2. How to read a Series object

  1. The previous section covered the various ways to create a pandas Series object, but how do you read the elements in it?

  2. You can read the elements of a Series object by positional (integer) index, by label, or by slice.

  3. Here are some examples:

    >>> # import the python pandas library 
    >>> import pandas as pd 
    >>> 
    >>> # create the python dictionary object. 
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'} 
    >>> 
    >>> # define the Series object's index list, the index list element value should match the above dictionary object keys' value. 
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> # create the pandas Series object based on the above python dictionary object, pass in the index parameter. 
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>> 
    >>> print(series_obj)
    DB       MySQL
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>> print(series_obj[0]) # get series element by number index.
    MySQL
    >>>
    >>> print(series_obj['OS']) # get series element by label index.
    Windows
    >>>
    >>> print(series_obj[:3]) # get series elements by number index slice.
    DB      MySQL
    OS    Windows
    PL     Python
    dtype: object
    >>>
    >>> print(series_obj[-3:])# get series elements by number index slice.
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>>
    >>> print(series_obj[['OS','DB']]) # get series elements by a list of labels.
    OS    Windows
    DB      MySQL
    dtype: object
    >>>
    >>> print(series_obj[['OS','DB','BD']]) # when a label does not exist in the index, a KeyError is raised.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\series.py", line 966, in __getitem__
        return self._get_with(key)
      File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\series.py", line 1006, in _get_with
        return self.loc[key]
      File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 931, in __getitem__
        return self._getitem_axis(maybe_callable, axis=axis)
      File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 1153, in _getitem_axis
        return self._getitem_iterable(key, axis=axis)
      File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 1093, in _getitem_iterable
        keyarr, indexer = self._get_listlike_indexer(key, axis)
      File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer
        self._validate_read_indexer(keyarr, indexer, axis)
      File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 1377, in _validate_read_indexer
        raise KeyError(f"{not_found} not in index")
    KeyError: "['BD'] not in index"
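
The square-bracket syntax mixes positional and label lookups, and some newer pandas releases warn when an integer such as `series_obj[0]` is used positionally on a label-indexed Series. As a hedged alternative, the sketch below uses `.iloc` for positions, `.loc` for labels, and `reindex()` when some labels may be missing. The variable names are illustrative.

```python
import pandas as pd

dict_data = {"PL": "Python", "OS": "Windows", "DB": "MySQL"}
s = pd.Series(dict_data, index=["DB", "OS", "PL", "ISO"])

# .iloc is purely positional; .loc is purely label-based.
first_value = s.iloc[0]
os_value = s.loc["OS"]
print(first_value)  # MySQL
print(os_value)     # Windows

# reindex() fills labels that are missing with NaN instead of raising KeyError.
subset = s.reindex(["OS", "DB", "BD"])
print(subset)
```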
    
    

3. How to delete data from a Series object

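  1. Call the Series object's drop() method to remove elements by label. drop() returns a new Series and leaves the original unchanged.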

    >>> import pandas as pd
    >>>
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>>
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>>
    >>> print(series_obj)
    DB       MySQL
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>>
    >>> series_obj.drop(['DB'])
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
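
Because drop() returns a new object, the result has to be assigned if the change should persist. A minimal sketch, where `"missing-label"` is a made-up label used to show the `errors="ignore"` option:

```python
import pandas as pd

s = pd.Series({"PL": "Python", "OS": "Windows", "DB": "MySQL"})

# drop() returns a new Series; the original keeps all of its elements.
dropped = s.drop(["DB"])
print(list(dropped.index))  # ['PL', 'OS']
print(list(s.index))        # ['PL', 'OS', 'DB']

# Reassign the result to keep the change.
s = s.drop(["DB"])

# errors="ignore" skips labels that are not present instead of raising KeyError.
s = s.drop(["missing-label"], errors="ignore")
print(list(s.index))        # ['PL', 'OS']
```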
    

4. Common methods of pandas Series objects

  1. head(n): returns the first n rows of the data. If n is not given, it defaults to 5, so the first 5 rows are returned.

  2. tail(n): returns the last n rows of the data; by default, the last 5 rows.

    >>> import pandas as pd
    >>>
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>>
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>>
    >>> print(series_obj)
    DB       MySQL
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>>
    >>> print(series_obj.head(3)) # return the first 3 rows in the Series object.
    DB      MySQL
    OS    Windows
    PL     Python
    dtype: object
    >>>
    >>> print(series_obj.tail(3))
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    
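  3. isnull() / notnull(): detect whether the element values of a Series object are missing. If an element's value is missing, isnull() returns True and notnull() returns False for that element.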

    >>> import pandas as pd
    >>>
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>>
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>>
    >>> print(series_obj)
    DB       MySQL
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>>
    >>> print(pd.isnull(series_obj))
    DB     False
    OS     False
    PL     False
    ISO     True
    dtype: bool
    >>>
    >>> print(pd.notnull(series_obj))
    DB      True
    OS      True
    PL      True
    ISO    False
    dtype: bool
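
The examples above always pass n and call the top-level pd.isnull() / pd.notnull() functions. The sketch below, with sample data invented for illustration, shows the default of 5 for head() and tail() and the equivalent Series methods.

```python
import pandas as pd

s = pd.Series(range(10))

# Without an argument, head() and tail() return 5 elements each.
print(len(s.head()))      # 5
print(s.tail().tolist())  # [5, 6, 7, 8, 9]

# isnull() and notnull() are also available as Series methods.
t = pd.Series({"DB": "MySQL", "ISO": None})
missing_count = int(t.isnull().sum())
print(missing_count)         # 1
print(t.notnull().tolist())  # [True, False]
```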
    

5. Common attributes of pandas Series objects

  1. axes: returns a list containing the row index labels.

    >>> import pandas as pd
    >>>
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>>
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>>
    >>> print(series_obj)
    DB       MySQL
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>>
    >>>
    >>> print(series_obj.axes)
    [Index(['DB', 'OS', 'PL', 'ISO'], dtype='object')]
    
  2. dtype: the data type of the Series elements.

    >>> import pandas as pd
    >>>
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>>
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>>
    >>> print(series_obj)
    DB       MySQL
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>>
    >>> print(series_obj.dtype)
    object
    
  3. empty: returns a Boolean value indicating whether the Series object is empty.

    >>> import pandas as pd
    >>>
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>>
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>>
    >>> print(series_obj)
    DB       MySQL
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>>
    >>> print(series_obj.empty)
    False
    
  4. index: returns the Index object that holds the labels of the Series. With the default integer index this is a RangeIndex describing the range of index values.

    >>> import pandas as pd
    >>>
    >>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
    >>>
    >>> index_data= ['DB', 'OS', 'PL', 'ISO']
    >>>
    >>> series_obj = pd.Series(dict_data, index=index_data)
    >>>
    >>> print(series_obj)
    DB       MySQL
    OS     Windows
    PL      Python
    ISO        NaN
    dtype: object
    >>>
    >>> print(series_obj.index)
    Index(['DB', 'OS', 'PL', 'ISO'], dtype='object')
    >>>
    >>> print(series_obj.index.array)
    <PandasArray>
    ['DB', 'OS', 'PL', 'ISO']
    Length: 4, dtype: object
    >>>
    >>> for key in series_obj.index.array:
    ...      print(key, ' = ', series_obj[key])
    ...
    DB  =  MySQL
    OS  =  Windows
    PL  =  Python
    ISO  =  nan
    
  5. ndim: returns the number of dimensions of the Series object. By definition, a Series is a one-dimensional data structure, so it always returns 1.

    >>> print(series_obj.ndim)
    1
    
  6. size: returns the size (length) of the Series object.

    >>> print(series_obj.size)
    4
    
  7. values: returns the data in the Series object as an array.

    >>> print(series_obj.values)
    ['MySQL' 'Windows' 'Python' nan]
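
The attribute examples above all use a Series with string labels. As a complementary sketch on invented numeric data, the following shows the RangeIndex you get with the default index, an empty Series, and items() as an alternative way to loop over label and value pairs.

```python
import pandas as pd

# A Series built without an explicit index gets a RangeIndex.
s = pd.Series([10, 20, 30])
print(s.index)                  # RangeIndex(start=0, stop=3, step=1)
print(s.ndim, s.size, s.empty)  # 1 3 False
print(s.values.tolist())        # [10, 20, 30]

# An empty Series reports empty == True and size == 0.
e = pd.Series(dtype="float64")
print(e.empty, e.size)          # True 0

# items() yields (label, value) pairs, an alternative to looping over the index.
pairs = list(s.items())
print(pairs)
```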