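The Series structure, also referred to as a series sequence, is one of the most commonly used data structures in pandas. It is a one-dimensional, array-like structure made up of a set of data values and a set of labels, with a one-to-one correspondence between labels and values.
A Series can hold any data type, such as integers, strings, floating-point numbers, and Python objects. Its labels default to integers starting from 0. Index labels make it easier to see where each value sits in the data.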
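1. How to create a Series object
Pandas uses the pd.Series() constructor to create a Series object.
Through this Series object, you can call the corresponding methods and attributes to process the data.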
import pandas as pd
series = pd.Series(data, index, dtype, copy)

data: the input data; it can be a list, a constant, an ndarray, and so on.
index: the index labels. They must be hashable and, for array-like data, the same length as the data; they do not have to be unique. If no index is passed, it defaults to np.arange(n).
dtype: the data type. If it is not provided, it is inferred automatically.
copy: whether to copy the input data. The default value is False.
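To make the parameter list above concrete, here is a minimal sketch that passes data, index, and dtype explicitly. The values and labels are made up for illustration, and the second Series only demonstrates that repeated index labels are accepted.

```python
import pandas as pd

# data, index and dtype passed explicitly; the values are illustrative only.
s = pd.Series(data=[1, 2, 3], index=["a", "b", "c"], dtype="float64")

# Index labels do not have to be unique.
dup = pd.Series([10, 20, 30], index=["x", "x", "y"])

# Selecting a repeated label returns a Series with every matching element.
dup_x = dup["x"]

print(s)
print(dup_x)
```

Because a repeated label can return several elements, unique labels are usually easier to work with even though pandas does not require them.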
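We can also create a Series object from an array, a dictionary, a scalar value, or a Python object. The following examples show each approach.
1.1 Create an empty Series object
Creating an empty Series object without specifying a dtype is deprecated in the pandas version used here, as the warning in the output shows.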
>>> import pandas as pd
>>> # create an empty Series object without a dtype; this triggers a DeprecationWarning.
>>> s = pd.Series()
__main__:2: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
>>> print(s)
Series([], dtype: float64)
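The warning text itself suggests the fix: specify a dtype explicitly. A minimal sketch, with variable names chosen only for illustration:

```python
import pandas as pd

# Passing dtype explicitly avoids relying on the default dtype for an empty Series.
empty_obj = pd.Series(dtype="object")
empty_float = pd.Series(dtype="float64")

print(empty_obj)
print(empty_float)
```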
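1.2 Create a Series object from a NumPy ndarray object
ndarray is NumPy's array type. When the data is an ndarray, the index you pass in must have the same length as the array.
If no index argument is passed, the index defaults to range(n), where n is the length of the array.
The following example creates a Series object with the default index. Because no index is passed, labels are assigned starting from 0, so the index runs from 0 to len(data)-1, which is 0 to 3 here. This is called an "implicit index".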
>>> # import the pandas module.
>>> import pandas as pd
>>>
>>> # import the numpy module.
>>> import numpy as np
>>>
>>> # create an ndarray object.
>>> data = np.array(['python','javascript','java','php'])
>>>
>>> # create a Series object from the ndarray above.
>>> s = pd.Series(data)
>>>
>>> # print out the Series object.
>>> print(s)
0        python
1    javascript
2          java
3           php
dtype: object
You can also define the index labels yourself with an "explicit index", as in the following example.
>>> # import the pandas and numpy libraries.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # create a numpy ndarray object.
>>> data = np.array(['python','javascript','java'])
>>>
>>> # create a list object to use as the index.
>>> idx = ['language-1', 2, 'language-3']
>>>
>>> # set the Series object's index explicitly.
>>> s = pd.Series(data=data, index=idx)
>>>
>>> print(s)
language-1        python
2             javascript
language-3          java
dtype: object
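The length requirement for an explicit index can be checked directly. The sketch below passes an index that is deliberately one label too short and captures the resulting ValueError; the labels are made up for illustration.

```python
import numpy as np
import pandas as pd

data = np.array(["python", "javascript", "java"])

# Three values but only two labels: pandas rejects the mismatch.
try:
    pd.Series(data, index=["a", "b"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```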
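1.3 Create a pandas Series object from a Python dictionary object
You can use a Python dictionary object as the input data for a Series object.
If no index is passed, the index is built from the dictionary's keys.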
>>> # import the python pandas library
>>> import pandas as pd
>>>
>>> # create the python dictionary object.
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> # create the pandas Series object from the dictionary without passing the index parameter.
>>> series_obj = pd.Series(dict_data)
>>>
>>> # print out the pandas Series object; the index labels are the dictionary's keys.
>>> print(series_obj)
PL     Python
OS    Windows
DB      MySQL
dtype: object
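Conversely, when an index argument is passed, each index label is matched against the dictionary's keys, and the value stored under that key is used. A label with no matching key gets NaN.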
>>> # import the python pandas library
>>> import pandas as pd
>>>
>>> # create the python dictionary object.
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> # define the Series object's index list; its elements are matched against the dictionary's keys.
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> # create the pandas Series object from the dictionary, passing in the index parameter.
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> # print out the pandas Series object; the dictionary has no 'ISO' key, so that element's value is NaN.
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
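The matching works in both directions: labels without a key become NaN, and dictionary keys that are not listed in the index are left out. A small sketch, reusing the same dictionary with a shorter, illustrative index:

```python
import pandas as pd

dict_data = {"PL": "Python", "OS": "Windows", "DB": "MySQL"}

# Only 'DB' matches a key; 'ISO' has no key, and 'PL' / 'OS' are not requested.
subset = pd.Series(dict_data, index=["DB", "ISO"])

print(subset)
```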
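1.4 Create a pandas Series object from a scalar value
You can create a pandas Series object from a scalar value. In this case you pass an index, and the scalar is repeated once for each index label, as the following example shows.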
>>> # import the python pandas module with the pd alias.
>>> import pandas as pd
>>>
>>> # create a list object to use as the index.
>>> index_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>
>>> # create a pandas Series object from the scalar value 10; the scalar is repeated once per index label.
>>> s = pd.Series(10, index=index_list)
>>>
>>> # print out the pandas Series object
>>> print(s)
1     10
2     10
3     10
4     10
5     10
6     10
7     10
8     10
9     10
10    10
dtype: int64
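As a short sketch of how the index length drives the repetition: three labels give three copies of the scalar. The second line shows what happens when the index is left out, which yields a single-element Series rather than an error. The labels are illustrative only.

```python
import pandas as pd

# Three labels, so the scalar 10 is repeated three times.
repeated = pd.Series(10, index=["a", "b", "c"])

# Without an index, the scalar produces a one-element Series labeled 0.
single = pd.Series(10)

print(repeated)
print(single)
```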
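2. How to read a Series object
The previous section explained the various ways to create a pandas Series object, but how do you read the elements stored in it?
You can read the elements of a Series object by integer position, by label, or by slice.
Here are some examples: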
>>> # import the python pandas library
>>> import pandas as pd
>>>
>>> # create the python dictionary object.
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> # define the Series object's index list; its elements are matched against the dictionary's keys.
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> # create the pandas Series object from the dictionary, passing in the index parameter.
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>> print(series_obj[0])  # get a Series element by integer position.
MySQL
>>>
>>> print(series_obj['OS'])  # get a Series element by label.
Windows
>>>
>>> print(series_obj[:3])  # get Series elements by position slice.
DB      MySQL
OS    Windows
PL     Python
dtype: object
>>>
>>> print(series_obj[-3:])  # get Series elements by position slice.
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>>
>>> print(series_obj[['OS','DB']])  # get Series elements by a list of labels.
OS    Windows
DB      MySQL
dtype: object
>>>
>>> print(series_obj[['OS','DB','BD']])  # when a label does not exist, a KeyError is raised.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\series.py", line 966, in __getitem__
    return self._get_with(key)
  File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\series.py", line 1006, in _get_with
    return self.loc[key]
  File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 931, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 1153, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 1093, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
  File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis)
  File "C:\Users\zhaosong\anaconda3\envs\MyPythonEnv\lib\site-packages\pandas\core\indexing.py", line 1377, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['BD'] not in index"
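The same reads can be written with the explicit .iloc (position) and .loc (label) accessors, which avoid any ambiguity between positions and labels; recent pandas releases also warn when a plain integer is used with [] on a labeled Series. The sketch below additionally shows two ways to tolerate a missing label such as 'BD' instead of raising KeyError. The fallback string "unknown" is an arbitrary choice for illustration.

```python
import pandas as pd

dict_data = {"PL": "Python", "OS": "Windows", "DB": "MySQL"}
series_obj = pd.Series(dict_data, index=["DB", "OS", "PL", "ISO"])

first = series_obj.iloc[0]         # element at position 0
os_value = series_obj.loc["OS"]    # element with label 'OS'
first_three = series_obj.iloc[:3]  # positions 0, 1 and 2

# reindex() fills labels that do not exist with NaN instead of raising KeyError.
safe = series_obj.reindex(["OS", "DB", "BD"])

# get() returns a default value when the label is missing.
fallback = series_obj.get("BD", default="unknown")

print(first, os_value, fallback)
print(safe)
```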
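3. How to delete data from a Series object
Call the Series object's drop() method to remove elements by label. drop() returns a new Series and leaves the original object unchanged.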
>>> import pandas as pd
>>>
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>>
>>> series_obj.drop(['DB'])
OS     Windows
PL      Python
ISO        NaN
dtype: object
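A point that is easy to miss in the session above is that drop() does not modify series_obj; the result has to be assigned to a name to be kept. A minimal sketch, where the label 'missing' is a made-up example of a label that is not in the index:

```python
import pandas as pd

dict_data = {"PL": "Python", "OS": "Windows", "DB": "MySQL"}
series_obj = pd.Series(dict_data, index=["DB", "OS", "PL", "ISO"])

# drop() returns a new Series; the original still contains 'DB'.
dropped = series_obj.drop(["DB"])
still_has_db = "DB" in series_obj.index

# errors="ignore" skips labels that are not present instead of raising KeyError.
unchanged = series_obj.drop(["missing"], errors="ignore")

print(dropped)
print(still_has_db)
```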
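4. Common methods of pandas Series objects
head(n): returns the first n rows of the data. If n is not provided, it defaults to 5, so the first 5 rows are returned.
tail(n): returns the last n rows of the data; by default, the last 5 rows.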
>>> import pandas as pd
>>>
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>>
>>> print(series_obj.head(3))  # return the first 3 rows of the Series object.
DB      MySQL
OS    Windows
PL     Python
dtype: object
>>>
>>> print(series_obj.tail(3))  # return the last 3 rows of the Series object.
OS     Windows
PL      Python
ISO        NaN
dtype: object
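isnull() / notnull(): detect whether each element of the Series object is missing. For a missing element, isnull() returns True and notnull() returns False. Both are available as Series methods and as the top-level functions pd.isnull() and pd.notnull() used in the example.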
>>> import pandas as pd
>>>
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>>
>>> print(pd.isnull(series_obj))
DB     False
OS     False
PL     False
ISO     True
dtype: bool
>>>
>>> print(pd.notnull(series_obj))
DB      True
OS      True
PL      True
ISO    False
dtype: bool
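The boolean results are most useful as masks. The sketch below, which uses the method forms rather than the pd.isnull() / pd.notnull() functions, counts the missing elements and keeps only the ones that are present; it is one possible use of the masks, not the only one.

```python
import pandas as pd

dict_data = {"PL": "Python", "OS": "Windows", "DB": "MySQL"}
series_obj = pd.Series(dict_data, index=["DB", "OS", "PL", "ISO"])

# True counts as 1, so summing the mask counts the missing elements.
missing_count = int(series_obj.isnull().sum())

# Indexing with the notnull() mask keeps only the elements that have a value.
present = series_obj[series_obj.notnull()]

print(missing_count)
print(present)
```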
5. Common attributes of pandas Series objects
axes: returns a list containing the row index labels.
>>> import pandas as pd
>>>
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>>
>>> print(series_obj.axes)
[Index(['DB', 'OS', 'PL', 'ISO'], dtype='object')]
dtype: the data type of the Series elements.
>>> import pandas as pd
>>>
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>>
>>> print(series_obj.dtype)
object
empty: returns a boolean value indicating whether the Series object is empty.
>>> import pandas as pd
>>>
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>>
>>> print(series_obj.empty)
False
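index: returns the Index object that holds the labels of the Series. For a Series created with the default index, this is a RangeIndex object describing the range of index values.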
>>> import pandas as pd
>>>
>>> dict_data = {'PL' : 'Python', 'OS' : 'Windows', 'DB' : 'MySQL'}
>>>
>>> index_data = ['DB', 'OS', 'PL', 'ISO']
>>>
>>> series_obj = pd.Series(dict_data, index=index_data)
>>>
>>> print(series_obj)
DB       MySQL
OS     Windows
PL      Python
ISO        NaN
dtype: object
>>>
>>> print(series_obj.index)
Index(['DB', 'OS', 'PL', 'ISO'], dtype='object')
>>>
>>> print(series_obj.index.array)
<PandasArray>
['DB', 'OS', 'PL', 'ISO']
Length: 4, dtype: object
>>>
>>> for key in series_obj.index.array:
...     print(key, ' = ', series_obj[key])
...
DB  =  MySQL
OS  =  Windows
PL  =  Python
ISO  =  nan
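To illustrate the difference between the two kinds of index, the sketch below compares a default index with an explicit one, and uses items() as an alternative way to loop over label and value pairs. The sample values are made up for illustration.

```python
import pandas as pd

# Default index: a RangeIndex covering 0 .. n-1.
default_idx = pd.Series(["a", "b", "c"]).index

# Explicit labels: a regular Index, not a RangeIndex.
labeled_idx = pd.Series(["a", "b", "c"], index=["x", "y", "z"]).index

dict_data = {"PL": "Python", "OS": "Windows", "DB": "MySQL"}
series_obj = pd.Series(dict_data, index=["DB", "OS", "PL", "ISO"])

# items() yields (label, value) pairs in index order.
pairs = list(series_obj.items())

print(default_idx)
print(labeled_idx)
print(pairs[:3])
```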
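ndim: returns the number of dimensions of the Series object. By definition, a Series is a one-dimensional data structure, so this always returns 1.
>>> print(series_obj.ndim)
1
size: returns the size (length) of the Series object, that is, the number of elements it holds.
>>> print(series_obj.size)
4
values: returns the data in the Series object as an array.
>>> print(series_obj.values)
['MySQL' 'Windows' 'Python' nan]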
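As a closing sketch, the three attributes can be checked together on the same Series. It also uses to_numpy(), a Series method that likewise returns the data as a NumPy array; it is shown here only as an alternative to values, not as something the text above relies on.

```python
import numpy as np
import pandas as pd

dict_data = {"PL": "Python", "OS": "Windows", "DB": "MySQL"}
series_obj = pd.Series(dict_data, index=["DB", "OS", "PL", "ISO"])

# to_numpy() returns the underlying data as a NumPy array.
arr = series_obj.to_numpy()

print(series_obj.ndim, series_obj.size)
print(type(arr).__name__, arr.shape)
```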