How to Run Descriptive Statistics on a Python Pandas DataFrame Object

This article briefly introduces the statistical functions commonly used in pandas and shows examples of applying them to a DataFrame object.

1. Python Pandas Statistical Functions List

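Below is the list of statistical functions in Python pandas.
abs(): gets the absolute value of each element.
corr(): computes the correlation coefficient between Series or columns. The value ranges from -1 to 1, and the closer its absolute value is to 1, the stronger the correlation.
count(): counts the number of non-null values.
cumprod(): computes the cumulative product. With axis=0 it accumulates down each column, row by row; with axis=1 it accumulates across each row, column by column.
cumsum(): computes the cumulative sum. With axis=0 it accumulates down each column, row by row; with axis=1 it accumulates across each row, column by column.
max(): gets the maximum value.
mean(): gets the mean (average) value.
median(): gets the median value.
min(): gets the minimum value.
prod(): gets the product of all values.
std(): gets the standard deviation (the sample standard deviation by default).
sum(): computes the sum of the values.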
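To make the list more concrete, here is a minimal sketch that calls several of these functions on a small numeric DataFrame. The column names a and b and their values are made up for illustration and do not come from the rest of this article.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the functions listed above.
df = pd.DataFrame({'a': [1, -2, 3], 'b': [2, 4, -6]})

abs_df = df.abs()                   # element-wise absolute values
counts = df.count()                 # number of non-null values per column
running_sum = df.cumsum()           # cumulative sum down each column (axis=0)
running_prod = df.cumprod(axis=1)   # cumulative product across each row
medians = df.median()               # median of each column
products = df.prod()                # product of all values in each column
corr_ab = df['a'].corr(df['b'])     # correlation between the two columns

print(abs_df)
print(counts)
print(running_sum)
print(running_prod)
print(medians)
print(products)
print(corr_ab)
```

Each call returns a new object and leaves df unchanged, so the results can be inspected or combined freely.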

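2. Running Aggregate Calculations on a DataFrame Object

From a descriptive statistics point of view, we can run aggregate calculations such as sum() and mean() on a pandas DataFrame.
When you call an aggregation method on a DataFrame, the axis parameter controls the direction of the calculation; it defaults to 0.
The axis can be passed in two ways, as a number or as a name.
To aggregate over the rows, which gives one result per column, pass axis=0 or axis="index".
To aggregate over the columns, which gives one result per row, pass axis=1 or axis="columns".
In other words, axis=0 calculates in the vertical direction, while axis=1 calculates in the horizontal direction.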
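The difference between the two axis values is easiest to see on an all-numeric DataFrame. The sketch below uses made-up scores (the names scores, math, art, ann and bob are hypothetical) and shows that the numeric and named forms of the axis argument give the same result.

```python
import pandas as pd

# Hypothetical numeric data: two rows (ann, bob) and two columns (math, art).
scores = pd.DataFrame({'math': [90, 80], 'art': [70, 60]},
                      index=['ann', 'bob'])

# axis=0 (or 'index'): add up the rows, one total per column.
col_totals = scores.sum(axis=0)
# axis=1 (or 'columns'): add up the columns, one total per row.
row_totals = scores.sum(axis=1)

print(col_totals)   # math 170, art 130
print(row_totals)   # ann 160, bob 140

# The named forms are equivalent to the numeric ones.
same_cols = scores.sum(axis='index').equals(col_totals)
same_rows = scores.sum(axis='columns').equals(row_totals)
print(same_cols, same_rows)
```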

3. DataFrame Aggregate Calculation Examples

3.1 The Basic DataFrame Used in the Examples

First, let us create the DataFrame object that the following examples use.

import pandas as pd
    
def run_statistics_function():
    
    # create the name column data.
    name_series = pd.Series(['Tom', 'Jerry', 'Mike'])
    
    # create the salary column data.
    salary_series = pd.Series([10000, 8000, 12000])
    
    # create the data dictionary object.
    account_dict = {'Name':name_series, 'Salary':salary_series}
    
    # create the DataFrame object based on the above python dictionary object.
    df = pd.DataFrame(account_dict)
    
    # print out the DataFrame object.
    print(df)
    
    # return the DataFrame object.
    return df

if __name__ == '__main__':
    
    run_statistics_function()

========================================================================
When you run the example source code above, you get the following DataFrame output.

    Name  Salary
0    Tom   10000
1  Jerry    8000
2   Mike   12000

3.2 describe()

This function returns a statistical summary of the DataFrame's columns. By default, only the numeric columns are included.

import pandas as pd
    
def run_statistics_function():
    ......
    # return the DataFrame object.
    return df

if __name__ == '__main__':
    
    df = run_statistics_function()
    
    print(df.describe())

========================================================================
Running the code above prints the DataFrame first and then the summary.

    Name  Salary
0    Tom   10000
1  Jerry    8000
2   Mike   12000
        Salary
count      3.0
mean   10000.0
std     2000.0
min     8000.0
25%     9000.0
50%    10000.0
75%    11000.0
max    12000.0

With the include parameter of describe(), we can choose whether the summary covers the string (object) columns or the numeric columns.

    print(df.describe(include=['object']))
==========================================================
Below is the output of this example.

         Name
count       3
unique      3
top     Jerry
freq        1
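As a complement, the sketch below rebuilds the same Name/Salary data in a compact form and shows two other values of the include parameter: include=['number'] keeps only numeric columns, and include='all' summarizes every column in one table. The variable names numeric_summary and full_summary are chosen here for illustration only.

```python
import pandas as pd

# The same data as in section 3.1, built directly from lists.
df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mike'],
                   'Salary': [10000, 8000, 12000]})

# Summarize only the numeric columns (the same columns as the default here).
numeric_summary = df.describe(include=['number'])

# Summarize every column; statistics that do not apply to a column are NaN.
full_summary = df.describe(include='all')

print(numeric_summary)
print(full_summary)
```

With include='all', the result contains both the text statistics (unique, top, freq) and the numeric ones (mean, std, quartiles), so expect NaN in the cells where a statistic does not apply.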

3.3 mean()

Computes the mean (average) value.

import pandas as pd
    
def run_statistics_function():
    
    # create the name column data.
    name_series = pd.Series(['Tom', 'Jerry', 'Mike'])
    
    ......
    
    # print out the DataFrame object.
    print(df)
    
    # return the DataFrame object.
    return df

if __name__ == '__main__':
    
    df = run_statistics_function()
    
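    # numeric_only=True limits the calculation to numeric columns; recent pandas
    # versions raise a TypeError on the text column 'Name' without it.
    print('\r\n\r\n****** df.mean() ******\r\n', df.mean(numeric_only=True))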

=======================================================================

Below is the output of the source code above.

    Name  Salary
0    Tom   10000
1  Jerry    8000
2   Mike   12000

****** df.mean() ******

 Salary    10000.0
dtype: float64
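When a DataFrame mixes text and numeric columns, there are a few ways to make sure only numbers reach mean(). The sketch below shows two common ones, selecting a single column and filtering by dtype with select_dtypes(); the variable names are illustrative.

```python
import pandas as pd

# The same data as in section 3.1, built directly from lists.
df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mike'],
                   'Salary': [10000, 8000, 12000]})

# Option 1: take the mean of one numeric column; the result is a scalar.
salary_mean = df['Salary'].mean()

# Option 2: keep only the numeric columns, then aggregate; the result is a Series.
numeric_means = df.select_dtypes(include='number').mean()

print(salary_mean)
print(numeric_means)
```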

3.4 std()

Computes the standard deviation.

import pandas as pd
    
def run_statistics_function():
    
    # create the name column data.
    name_series = pd.Series(['Tom', 'Jerry', 'Mike'])
    
    ......
    
    # print out the DataFrame object.
    print(df)
    
    # return the DataFrame object.
    return df

if __name__ == '__main__':
    
    df = run_statistics_function()
    
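    # numeric_only=True limits the calculation to numeric columns; recent pandas
    # versions raise a TypeError on the text column 'Name' without it.
    print('\r\n\r\n****** df.std() ******\r\n', df.std(numeric_only=True))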

==========================================================================

Below is the output of the example source code above.

    Name  Salary
0    Tom   10000
1  Jerry    8000
2   Mike   12000

****** df.std() ******

 Salary    2000.0
dtype: float64
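The value 2000.0 is the sample standard deviation, because pandas divides by n - 1 by default (ddof=1). As a small sketch, the code below computes both the sample and the population version for the salary values; the variable names are illustrative.

```python
import pandas as pd

salary = pd.Series([10000, 8000, 12000])

# Default ddof=1: divide the sum of squared deviations by n - 1 (sample std).
sample_std = salary.std()

# ddof=0: divide by n instead (population std).
population_std = salary.std(ddof=0)

print(sample_std)
print(population_std)
```

Here the squared deviations from the mean of 10000 add up to 8,000,000. Dividing by 2 and taking the square root gives 2000, while dividing by 3 gives roughly 1633.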

3.5 sum()

Computes the sum of the values.

import pandas as pd
    
def run_statistics_function():
    
    # create the name column data.
    name_series = pd.Series(['Tom', 'Jerry', 'Mike'])
    
    ......
    
    # print out the DataFrame object.
    print(df)
    
    # return the DataFrame object.
    return df

if __name__ == '__main__':
    
    df = run_statistics_function()
    
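    # axis=0 is the default: one total per column (strings are concatenated).
    print('\r\n\r\n****** df.sum(axis=0) ******\r\n', df.sum(axis=0))
    
    # numeric_only=True skips the text column, so each row total is just its Salary.
    print('\r\n\r\n****** df.sum(axis=1) ******\r\n', df.sum(axis=1, numeric_only=True))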

===============================================================================

Below is the output of this example.

    Name  Salary
0    Tom   10000
1  Jerry    8000
2   Mike   12000

****** df.sum(axis=0) ******

 Name      TomJerryMike
Salary           30000
dtype: object

****** df.sum(axis=1) ******

 0    10000
1     8000
2    12000
dtype: int64
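Note that in the axis=0 result the Name column was "summed" by concatenating its strings, which is rarely what you want. The sketch below shows one way to keep only the numeric total, and also demonstrates cumsum() from the function list as a running total of the salaries. The variable names are illustrative.

```python
import pandas as pd

# The same data as in section 3.1, built directly from lists.
df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mike'],
                   'Salary': [10000, 8000, 12000]})

# Sum only the numeric columns, so the Name strings are not concatenated.
numeric_total = df.sum(numeric_only=True)

# Running total of the salaries, row by row.
running_total = df['Salary'].cumsum()

print(numeric_total)
print(running_total)
```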