pandas Module, Part 3: Data Analysis

Statistical measures

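# assumes df is a DataFrame with a numeric 'amounts' column
print("Maximum:", df['amounts'].max())
print("Minimum:", df['amounts'].min())
print("Variance:", df['amounts'].var())
print("Standard deviation:", df['amounts'].std())
# Series.ptp() was removed in pandas 1.0; compute the range (max - min) directly
print("Range:", df['amounts'].max() - df['amounts'].min())
print("Non-null count:", df['amounts'].count())
print("Mode:", df['amounts'].mode())  # returns a Series: position index and the mode value(s)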
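A self-contained version of the calls above, as a minimal sketch on a small made-up `amounts` column (the data is hypothetical, for illustration only). The range is computed as `max() - min()` because `Series.ptp()` is not available in current pandas.

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN and is skipped by every method below
df = pd.DataFrame({'amounts': [10, 20, 20, 40, None]})

s = df['amounts']
stats = {
    'max': s.max(),
    'min': s.min(),
    'var': s.var(),              # sample variance (ddof=1)
    'std': s.std(),              # sample standard deviation (ddof=1)
    'range': s.max() - s.min(),  # replacement for the removed ptp()
    'count': s.count(),          # counts non-null values only
    'mode': s.mode().tolist(),   # mode() returns a Series; there may be several modes
}
for name, value in stats.items():
    print(name, value)
```

Note that `var()` and `std()` default to the sample versions (`ddof=1`); pass `ddof=0` for the population versions.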

df['column'].describe()

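1. For numeric data

returns 8 statistics: count, mean, std, min, 25%, 50%, 75%, max

2. For non-numeric data

returns 4 statistics: count, unique, top (most frequent value), freq (how often it occurs)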
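To see the two kinds of output side by side, here is a small sketch with one numeric and one text column (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
df = pd.DataFrame({
    'amounts': [10, 20, 20, 40],
    'dishes': ['rice', 'soup', 'rice', 'tea'],
})

num_desc = df['amounts'].describe()  # numeric column: 8 statistics
obj_desc = df['dishes'].describe()   # non-numeric column: 4 statistics
print(num_desc)
print(obj_desc)
```

The `50%` row is the median; `top` and `freq` only appear for non-numeric columns.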

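For numeric data, describe() does not report the most frequent value (the mode) or how often it occurs. To get these from describe(), first convert the column to a non-numeric type such as category: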

df['amounts'] = df['amounts'].astype('category')
print("describe() statistics for amounts:", df['amounts'].describe())
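A minimal sketch of the category trick on hypothetical data, with `value_counts()` shown as an alternative that gives the same top value and frequency without changing the column's dtype:

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({'amounts': [10, 20, 20, 40]})

# After astype('category'), describe() reports count / unique / top / freq
desc = df['amounts'].astype('category').describe()
print(desc)

# Alternative without a dtype change: value_counts() sorts by frequency, highest first
counts = df['amounts'].value_counts()
print(counts)
```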

Dropping columns that are entirely empty

Method 1:

labels = []
for i in df.columns:
    if not df[i].count():  # count() is 0 when every value in the column is missing
        labels.append(i)
df.drop(labels=labels, axis=1, inplace=True)

Method 2:

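col = df.count() == 0  # boolean Series indexed by column name
length = len(col)
for i in range(length):
    if col.iloc[i]:  # positional access; plain col[i] on a label index is deprecated
        df.drop(labels=col.index[i], axis=1, inplace=True)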
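Both methods above loop over the columns. pandas also provides a built-in one-liner, `DataFrame.dropna(axis=1, how='all')`. This is a swapped-in alternative, not part of the original text; the frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'empty' contains only missing values
df = pd.DataFrame({
    'a': [1, 2, 3],
    'empty': [np.nan, np.nan, np.nan],
    'b': ['x', None, 'z'],
})

# how='all' drops a column only if every value in it is missing
cleaned = df.dropna(axis=1, how='all')
print(cleaned.columns.tolist())
```

Column `b` survives because it is only partly empty. Without `inplace=True`, `dropna` returns a new frame and leaves `df` unchanged.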

Datetime types

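# assumes detail is a DataFrame with a 'time' column of date strings
time = detail['time']
# Convert to pandas' default datetime type (datetime64)
# Method 1:
detail['time'] = pd.to_datetime(time)
# Method 2:
detail['time'] = pd.DatetimeIndex(time)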
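Real data often contains values that cannot be parsed. `pd.to_datetime` raises on these by default. With `errors='coerce'`, such values become `NaT` instead. A small sketch on hypothetical strings:

```python
import pandas as pd

# Hypothetical raw strings; the last one is not a valid date
raw = pd.Series(['2016-08-01 11:05:36', '2016-08-02 13:18:09', 'not a date'])

# errors='coerce' turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw, errors='coerce')
print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```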

Time series operations

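# Extract date components from each Timestamp
year = [i.year for i in detail['place_order_time']]    # year
month = [i.month for i in detail['place_order_time']]  # month
day = [i.day for i in detail['place_order_time']]      # day
weekday = [i.weekday() for i in detail['place_order_time']]  # weekday() is a method: Monday=0 ... Sunday=6
is_leap_year = [i.is_leap_year for i in detail['place_order_time']]

# Other attributes: hour, minute, second, quarter (quarter of the year), is_leap_year (whether it is a leap year);
# day_name() returns the weekday name (it replaces weekday_name, which was removed in pandas 1.0)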
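Instead of looping over each Timestamp, the same components can be read in one vectorized step through the `.dt` accessor of a datetime column. Note that on a single Timestamp `weekday` is a method, whereas `Series.dt.weekday` is an attribute. The two order times below are hypothetical stand-ins for `detail['place_order_time']`.

```python
import pandas as pd

# Hypothetical order times standing in for detail['place_order_time']
detail = pd.DataFrame({'place_order_time': pd.to_datetime(
    ['2016-08-01 11:05:36', '2016-12-31 21:30:00'])})

t = detail['place_order_time'].dt  # vectorized datetime accessor
year = t.year
month = t.month
weekday = t.weekday          # Monday=0 ... Sunday=6 (attribute on .dt, no parentheses)
day_name = t.day_name()      # weekday name as a string
quarter = t.quarter
is_leap_year = t.is_leap_year
print(pd.DataFrame({'year': year, 'month': month, 'weekday': weekday,
                    'day_name': day_name, 'quarter': quarter}))
```

Each result is a Series aligned with the original rows, so it can be assigned straight back as a new column.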

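# Adding two Timestamps raises TypeError; add a Timedelta to a Timestamp instead
t = pd.to_datetime('2019-06-19') + pd.Timedelta(days=9)  # Timestamp('2019-06-28 00:00:00')
# Subtracting two Timestamps gives a Timedelta
t = pd.to_datetime('2019-06-19') - pd.to_datetime('2019-06-10')  # Timedelta('9 days 00:00:00')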
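In pandas, subtracting two Timestamps yields a `pd.Timedelta`. Adding two Timestamps is not defined, so to shift a date you add a Timedelta. A short sketch of common follow-up operations:

```python
import pandas as pd

start = pd.to_datetime('2019-06-10')
end = pd.to_datetime('2019-06-19')

delta = end - start                       # Timestamp - Timestamp -> Timedelta
days = delta.days                         # whole days as an int
later = end + pd.Timedelta(days=3)        # Timestamp + Timedelta -> Timestamp
in_hours = delta / pd.Timedelta(hours=1)  # Timedelta / Timedelta -> float
print(delta, days, later, in_hours)
```

Dividing by a unit Timedelta is a convenient way to express a duration in any unit, such as hours or minutes.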