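Summary statistics
print("Max:", df['amounts'].max())
print("Min:", df['amounts'].min())
print("Variance:", df['amounts'].var())
print("Standard deviation:", df['amounts'].std())
print("Range:", df['amounts'].max() - df['amounts'].min())  # Series.ptp() was removed in pandas 1.0
print("Non-null count:", df['amounts'].count())
print("Mode:", df['amounts'].mode())  # returns a Series: positional index, then the mode value(s)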
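df[column label].describe()
1. For numeric data
Returns 8 statistics: count, mean, std, min, 25%, 50%, 75%, max
2. For non-numeric data
Returns 4 statistics: count, unique, top (the most frequent value), freq (how often it occurs)
describe() does not report the mode or its count for numeric data, so convert the column to a non-numeric type first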
df['amounts'] = df['amounts'].astype('category')
print("describe() statistics for amounts:", df['amounts'].describe())
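The statistics above can be tried end to end on a small frame. This is a minimal sketch, assuming a made-up `amounts` column; the sample values and the helper names (`value_range`, `mode_count`, `numeric_summary`, `categorical_summary`) are illustrative and not part of the original notes.

```python
import pandas as pd

# Hypothetical sample data; 'amounts' mirrors the column name used in these notes.
df = pd.DataFrame({'amounts': [10.0, 20.0, 20.0, 40.0, None]})
s = df['amounts']

value_range = s.max() - s.min()        # range, computed without Series.ptp
modes = s.mode()                       # Series holding every most frequent value
mode_count = s.value_counts().iloc[0]  # occurrences of the most frequent value

# describe() on the numeric column: count, mean, std, min, 25%, 50%, 75%, max
numeric_summary = s.describe()
# describe() on a categorical copy: count, unique, top, freq
categorical_summary = s.astype('category').describe()

print("Range:", value_range)
print("Non-null count:", s.count())
print("Mode(s):", modes.tolist(), "occurring", mode_count, "times")
print(numeric_summary)
print(categorical_summary)
```

Calling `astype('category')` on a copy, rather than assigning it back to `df['amounts']`, keeps the original column numeric so that `mean()`, `std()` and similar methods still work afterwards.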
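Dropping columns that are entirely empty
Method 1:
labels = []
for i in df.columns:
    # count() returns the number of non-null values, so 0 means the column is empty
    if not df[i].count():
        labels.append(i)
df.drop(labels=labels, axis=1, inplace=True)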
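Method 2:
col = df.count() == 0  # boolean Series indexed by column name
length = len(col)
for i in range(length):
    # use .iloc for positional access; col[i] on a label-indexed Series is deprecated
    if col.iloc[i]:
        df.drop(labels=col.index[i], axis=1, inplace=True)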
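Both loops can usually be replaced by a single call. The sketch below, on a made-up three-column frame, shows `DataFrame.dropna` with `axis=1, how='all'`, plus a boolean-mask variant of Method 2; the names `cleaned`, `empty_cols` and `cleaned_by_mask` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'b' is entirely empty, 'c' is only partly empty.
df = pd.DataFrame({
    'a': [1, 2],
    'b': [np.nan, np.nan],
    'c': [np.nan, 3],
})

# Drop every column whose values are all missing.
cleaned = df.dropna(axis=1, how='all')

# Same result via the count() == 0 mask from Method 2, without a loop.
empty_cols = df.columns[df.count() == 0]
cleaned_by_mask = df.drop(columns=empty_cols)

print(list(cleaned.columns))
print(list(cleaned_by_mask.columns))
```

Neither call uses `inplace=True`, so `df` itself is left unchanged and the result must be assigned to a name.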
Datetime types
time = pf['time']
# Convert to the default pandas datetime type
Method 1:
detail['time'] = pd.to_datetime(time)
Method 2:
detail['time'] = pd.DatetimeIndex(time)
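A short, self-contained sketch of both conversions, assuming the raw column holds timestamp strings; the sample strings and the names `raw`, `via_to_datetime` and `via_index` are made up for illustration.

```python
import pandas as pd

# Hypothetical raw data: timestamps stored as plain strings.
detail = pd.DataFrame({'time': ['2019-06-19 08:30:00', '2019-06-10 12:00:00']})
raw = detail['time']

via_to_datetime = pd.to_datetime(raw)   # Method 1: returns a Series
via_index = pd.DatetimeIndex(raw)       # Method 2: returns a DatetimeIndex

detail['time'] = via_to_datetime
print(detail['time'].dtype)
print(list(via_to_datetime) == list(via_index))
```

`pd.to_datetime` is generally the more flexible of the two, since it also accepts arguments such as `format` and `errors` for handling messy input.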
Time series operations
year = [i.year for i in detail['place_order_time']]  # year
month = [i.month for i in detail['place_order_time']]  # month
day = [i.day for i in detail['place_order_time']]  # day
weekday = [i.weekday() for i in detail['place_order_time']]  # weekday() is a method; Monday is 0
is_leap_year = [i.is_leap_year for i in detail['place_order_time']]
# Other attributes: hour, minute, second, quarter, is_leap_year
# Methods: weekday() (day of the week as an integer), day_name() (day of the week as a name; replaces weekday_name, removed in pandas 1.0)
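The list comprehensions above can also be written with the vectorized `.dt` accessor, which avoids looping over individual timestamps. A minimal sketch, assuming `place_order_time` already has a datetime dtype; the two sample timestamps and the new column names are made up.

```python
import pandas as pd

# Hypothetical order times, already parsed as datetimes.
detail = pd.DataFrame({
    'place_order_time': pd.to_datetime(['2016-08-01 11:05:00', '2019-06-19 21:40:00'])
})

t = detail['place_order_time'].dt
detail['year'] = t.year
detail['month'] = t.month
detail['day'] = t.day
detail['weekday'] = t.weekday            # integer day of the week, Monday is 0
detail['weekday_name'] = t.day_name()    # e.g. 'Monday'
detail['quarter'] = t.quarter
detail['is_leap_year'] = t.is_leap_year

print(detail)
```

Note that on a Series `.dt.weekday` is a property, whereas on a single `Timestamp` `weekday()` is a method and needs the parentheses.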
t = pd.to_datetime('2019-06-19') + pd.to_datetime('2019-06-10')  # raises TypeError: two Timestamps cannot be added
t = pd.to_datetime('2019-06-19') - pd.to_datetime('2019-06-10')  # Timedelta('9 days 00:00:00')
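A small sketch of which arithmetic is valid between timestamps: subtraction yields a `Timedelta`, a `Timedelta` can be added to a `Timestamp`, and adding two timestamps fails. The names `start`, `end`, `delta`, `later` and `can_add_timestamps` are illustrative.

```python
import pandas as pd

start = pd.to_datetime('2019-06-10')
end = pd.to_datetime('2019-06-19')

delta = end - start                   # Timestamp - Timestamp gives a Timedelta
later = end + pd.Timedelta(days=9)    # Timestamp + Timedelta gives a Timestamp

# Timestamp + Timestamp has no meaning and raises TypeError.
try:
    end + start
    can_add_timestamps = True
except TypeError:
    can_add_timestamps = False

print(delta.days)
print(later)
print(can_add_timestamps)
```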