One-Way ANOVA (Code)

Starting my Juejin growth journey! This is day 2 of my participation in the Juejin "Daily New Plan · December Writing Challenge".

Summary: this is the second post of day 2 of the December writing challenge.

This post works through the code for analysis of variance.

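One-Way ANOVA

One-way analysis of variance extends the comparison of two sample means to several groups. It is a statistical method that tests the differences among multiple means in order to determine whether a factor has a significant effect on the experimental result.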

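Factor: a variable or indicator that influences the object under study.
Level: one of the states a factor can take, or one of the grades or groups into which the factor's variation is divided.
Single-factor experiment: an experiment that considers only one factor.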
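To make these terms concrete, here is a minimal sketch using a small made-up table. The column names `treatment` and `response` and all values are hypothetical and are not taken from the dataset used later in this post. The factor is the `treatment` column, and its levels are the distinct values in that column.

```python
import pandas as pd

# Hypothetical toy data: one factor ("treatment") with three levels.
data = pd.DataFrame({
    "treatment": ["A", "A", "B", "B", "C", "C"],  # the factor
    "response": [4.1, 3.9, 5.2, 5.0, 6.1, 5.9],   # the measured outcome
})

levels = sorted(data["treatment"].unique())                  # levels of the factor
group_sizes = data.groupby("treatment").size()               # observations per level
group_means = data.groupby("treatment")["response"].mean()   # mean response per level

print("levels:", levels)
print(group_sizes)
print(group_means)
```

Because only one factor varies here, this layout is what a single-factor experiment looks like in tabular form.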

# -*- coding: utf-8 -*-
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from scipy import stats
cishu=pd.read_csv('hua.csv',encoding="gbk")
print(cishu.head())
# Mean of First_flowering for each level of the factor Particle_drop_rate
print(cishu[["Particle_drop_rate", "First_flowering"]].groupby("Particle_drop_rate").mean())
# Bartlett's test for homogeneity of variance of First_flowering across the four Particle_drop_rate groups
bartlett_result = stats.bartlett(cishu[cishu['Particle_drop_rate']==1]['First_flowering'],
                                 cishu[cishu['Particle_drop_rate']==2]['First_flowering'],
                                 cishu[cishu['Particle_drop_rate']==3]['First_flowering'],
                                 cishu[cishu['Particle_drop_rate']==4]['First_flowering'])
print(bartlett_result)

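# Keep the intercept: with "-1" (no constant term) the ANOVA F-test would check whether
# all group means are zero rather than whether they are equal.
model = ols('First_flowering ~ C(Particle_drop_rate)', cishu).fit()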
anovat = anova_lm(model)
print(anovat)
print(model.summary2())
from statsmodels.stats.multicomp import pairwise_tukeyhsd  # Tukey HSD multiple comparisons
cishu_anova_post = pairwise_tukeyhsd(cishu['First_flowering'], cishu['Particle_drop_rate'], alpha=0.05)
print(cishu_anova_post.summary())

The output is as follows:

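When the null hypothesis holds, the population means are all equal, so the differences among the sample means should be small, the model (between-group) sum of squares should be small as well, and a very large value of the F statistic should be a rare event.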
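The relationship between the sums of squares and the F statistic can be sketched by hand. The three samples below are hypothetical toy values, not the data from this post. The sketch computes the between-group (model) and within-group (residual) sums of squares directly and compares the resulting F with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical toy samples, one array per level of the factor.
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([6.0, 7.0, 8.0]),
    np.array([9.0, 10.0, 11.0]),
]

k = len(groups)                          # number of levels
n = sum(len(g) for g in groups)          # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group (model) sum of squares: spread of group means around the grand mean
ss_model = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group (residual) sum of squares: spread of observations around their group mean
ss_resid = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between groups / mean square within groups
f_manual = (ss_model / (k - 1)) / (ss_resid / (n - k))
f_scipy, p_scipy = stats.f_oneway(*groups)

print("manual F:", f_manual)
print("scipy F:", f_scipy, "p-value:", p_scipy)
```

If the group means were closer together, `ss_model` would shrink and F would fall toward the small values expected under the null hypothesis.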

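So for a given significance level α ∈ (0, 1), if p = P{F ≥ F0} < α, we reject the null hypothesis H0 (where F0 is the observed value of the F statistic) and may conclude that the factor has a significant effect on the response variable; otherwise we cannot reject H0, and the data give no evidence that the factor has a significant effect on the response variable.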
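This decision rule can be written as a small helper. The function name `anova_decision` and the example numbers (F0 = 19.0 with 2 and 6 degrees of freedom) are hypothetical and chosen only for illustration. The upper-tail probability P{F ≥ F0} comes from the survival function of the F distribution, `scipy.stats.f.sf`.

```python
from scipy import stats

def anova_decision(f0, df_between, df_within, alpha=0.05):
    """Return (p_value, reject_h0) for an observed F statistic f0.

    Hypothetical helper: p_value is P{F >= f0} under H0, where F follows
    an F distribution with (df_between, df_within) degrees of freedom.
    """
    p_value = float(stats.f.sf(f0, df_between, df_within))
    return p_value, p_value < alpha

# Illustrative values only: a large F0 leads to rejection of H0 at alpha = 0.05
p_large, reject_large = anova_decision(19.0, 2, 6)
# A small F0 does not
p_small, reject_small = anova_decision(0.5, 2, 6)

print(p_large, reject_large)
print(p_small, reject_small)
```

In the statsmodels output, the same p-value appears in the `PR(>F)` column of the `anova_lm` table, so in practice the comparison with α can be made directly from that table.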