R语言: 用于多个比较的方差分析（ANOVA）

2013-05-19 21:44阅读：

http://blog.sina.cn/dpool/blog/u/1557328354

方差分析在试验科学中有重要的地位，今天谈谈如何用R做方差分析
前提假设：独立、正态、方差齐次（各水平间）
例子：x<-c(25.6,22.2,28.0,29.8,24.4,30.0,29.0,27.5,25.0,27.7,23.0,32.2,28.8,28.0,31.5,25.9,20.6,21.2,22.0,21.2)

数据集用5个因子水平测量，问是否存在差异
光是这样是无法进行分析的，对数据x进行格式转化
b<-data.frame(x,a=gl(5,4,20))得到结果如下（gl指定因子，5是水平，4是重复次数）
x a
1 25.6 1
2 22.2 1
3 28.0 1
4 29.8 1
5 24.4 2
6 30.0 2
7 29.0 2
8 27.5 2
9 25.0 3
10 27.7 3
11 23.0 3
12 32.2 3
13 28.8 4
14 28.0 4
15 31.5 4
16 25.9 4
17 20.6 5
18 21.2 5
19 22.0 5
20 21.2 5
在进行方差分析之前先对几条假设进行检验，由于随机抽取，假设总体满足独立、正态，考察方差齐次性（用bartlett检验）
> bartlett.test(x~a,data=

b)
Bartlett test of homogeneity of variances
data: x by a
Bartlett's K-squared = 7.0966, df = 4, p-value = 0.1309
方差齐次性符合,因此选用方差分析aov()或者KW检验kruskal.test()均可(统计建模与R软件第七章习题答案(方差分析) )。

下面进行方差分析
m1<-aov(x~a,data=b)
summary(m1)
Df Sum Sq Mean Sq F value Pr(>F)
a 4 132.0 32.99 4.306 0.0162 *
Residuals 15 114.9 7.66
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
从这个结果看出差别显著
接下来考察具体的差异（多重比较），

> TukeyHSD(m1)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = x ~ a, data = b)
$a
diff lwr upr p adj
2-1 1.325 -4.718582 7.3685818 0.9584566
3-1 0.575 -5.468582 6.6185818 0.9981815
4-1 2.150 -3.893582 8.1935818 0.8046644
5-1 -5.150 -11.193582 0.8935818 0.1140537
3-2 -0.750 -6.793582 5.2935818 0.9949181
4-2 0.825 -5.218582 6.8685818 0.9926905
5-2 -6.475 -12.518582 -0.4314182 0.0330240
4-3 1.575 -4.468582 7.6185818 0.9251337
5-3 -5.725 -11.768582 0.3185818 0.0675152
5-4 -7.300 -13.343582 -1.2564182 0.0146983
除了5、2和5、4间外，其他之间的差异是不显著的。
原文链接：http://blog.163.com/jiangfeng_data/blog/static/206414038201249111935/
用于多个比较的方差分析（ANOVA：Analysis of variance）
ANOVA模型能用于比较多个群组之间的均值，这里使用了参数（parametric）的方法，也就是假设这些群组符合Gaussian分布。以下为例子：
----------------------------------------------------
超市连锁店的经理想看看4个店面的耗电量（千瓦）是否相等。他在每个月底收集数据，持续了6个月，结果如下：
Store A: 65, 48, 66, 75, 70, 55
Store B: 64, 44, 70, 70, 68, 59
Store C: 60, 50, 65, 69, 69, 57
Store D: 62, 46, 68, 72, 67, 56
为了使用ANOVA来验证，我们必须首先验证homoskedasticity，也就是方差的同质性检验。R软件提供了两种检验方法：Bartlett检验，和Fligner-Killeen检验。
---------------------------------------------------
我们先看Bartlett检验。首先我们创建4个向量，然后再组合成一个向量：
>a = c(65, 48, 66, 75, 70, 55)
>b = c(64, 44, 70, 70, 68, 59)
>c = c(60, 50, 65, 69, 69, 57)
>d = c(62, 46, 68, 72, 67, 56)
>dati = c(a, b, c, d)
另外我们再创建一个对应用于标示dati分组的4个水平的factor：
>groups = factor(rep(letters[1:4], each = 6))
————————————————————————————————————————————————
【如果各组数据量不等： >groups = factor(c(rep(1,7),rep(2,5), rep(3,8), rep(4,6)))
或
>groups = factor(c(rep(“CD”,7),rep(“WL',5), rep('LT',8), rep('BJ',6)))
注：第1组数据结有7个，第2组数据结有5个，第3组数据结有8个，第4组数据结有6个.

或
> g=factor(rep(1:4, c(8,4,7,6))) > g [1] 1 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 Levels: 1 2 3 4
或

> g=factor(rep(c('cd','wl','lt','bj'), c(8,4,7,6))) > g [1] cd cd cd cd cd cd cd cd wl wl wl wl lt lt lt lt lt lt lt bj bj bj bj bj bj Levels: bj cd lt wl
】
———————————————————————————————————————————————
这样我们就可以进行Bartlett test了：
>bartlett.test(dati, groups)
Bartlett test of homogeneity of variances
data: dati and groups
Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228
这个函数得到了统计检验的值（K squared）和p-value。因为p-value > 0.05，所以我们可以说这些组的方差是同质的。另一方面，我们也可以比较Barlett的K-squared和查表的chi-square值，使用函数 qchisq，其输入包括alpha值和自由度
>qchisq(0.950, 3)
[1] 7.814728
显然，这里的chi-squared 大于上面计算的Bartlett的K-squared，因此我们接受null hypothesis H0，即方差都是同质的。 -------------------------------------------------------------------
现在我们试着用Fligner-Killeen test来检测同质性。调用函数的方法和过程都类似：
>a = c(65, 48, 66, 75, 70, 55)
>b = c(64, 44, 70, 70, 68, 59)
>c = c(60, 50, 65, 69, 69, 57)
>d = c(62, 46, 68, 72, 67, 56)
>dati = c(a, b, c, d)
>groups = factor(rep(letters[1:4], each = 6))
>fligner.test(dati, groups)
Fligner-Killeen test of homogeneity of variances
data: dati and groups
Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878
这里的结论也与Bartlett test类似。 ----------------------------------------------------------------------------------
已验证了4个群组的同质性，我们就可以来处理ANOVA模型了。
首先拟合模型：
>fit = lm(formula = dati ~ groups)
然后分析ANOVA模型：
>anova (fit)
Analysis of Variance Table
Response: dati
Df Sum Sq Mean Sq F value Pr(>F)
groups 3 8.46 2.82 0.0327 0.9918
Residuals 20 1726.50 86.33
函数的输出为经典的ANOVA表，数据如下：

Df = degree of freedom，自由度
Sum Sq = deviance (within groups, and residual)，总方差和（分别有groups和residual的）
Mean Sq = variance (within groups, and residual)，平均方差和（分别有groups和residual的）
F value = the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)，统计检验的值
Pr(>F) = p-value

因为p-value大于0.05，我们接受null hypothesis H0，即4个样本的均值统计相等。我们也可以比较计算的F-vaue和查表的F-value：
>qf(0.950, 20, 3)
[1] 8.66019

查表的F-value大于之前计算的F-value，因此我们接受null hypothesis。
原文链接：http://blog.csdn.net/timothyzh/article/details/7669831

应用实例：当各样本量大小不一时
> rw[1:10,]
r pop
1 0.0870 CD
2 -0.0910 CD
3 0.0719 CD
4 0.2516 CD
5 0.1878 CD
6 -0.3318 CD
7 -0.4095 CD
8 -0.0852 CD
9 -0.1585 CD
10 -0.2265 CD

> str(rw)
'data.frame': 8472 obs. of 2 variables:
$ r : num 0.087 -0.091 0.0719 0.2516 0.1878 ...
$ pop: Factor w/ 5 levels 'BJ','CD','CQ',..: 2 2 2 2 2 2 2 2 2 2 ...
> bartlett.test(r~pop,data=rw)
Bartlett test of homogeneity of variances
data: r by pop
Bartlett's K-squared = 74.8494, df = 4, p-value = 2.144e-15 # p<0.05,方差不齐

————————————————————————————————————
方差不齐用：
> kruskal.test(r~pop,data=rw) Kruskal-Wallis rank sum test data: r by pop Kruskal-Wallis chi-squared = 43.3389, df = 4, p-value = 8.801e-09
————————————————————————————————————————————
> m1<-aov(r~pop,data=rw)
> summary(m1)
Df Sum Sq Mean Sq F value Pr(>F)
pop 4 2.0 0.4980 12.8 2.19e-10 ***
Residuals 8467 329.5 0.0389
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> TukeyHSD(m1)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = r ~ pop, data = rw)
$pop diff lwr upr p adj
CD-BJ 0.025972862 -0.076497522 0.12844325 0.9583788
CQ-BJ 0.048778810 -0.123445811 0.22100343 0.9384799
LT-BJ 0.127254524 0.018965243 0.23554380 0.0117902
WL-BJ 0.032694279 -0.069250584 0.13463914 0.9061475
CQ-CD 0.022805947 -0.116720234 0.16233213 0.9918260
LT-CD 0.101281662 0.062130625 0.14043270 0.0000000
WL-CD 0.006721417 -0.007384981 0.02082782 0.6912301
LT-CQ 0.078475714 -0.065378185 0.22232961 0.5700784
WL-CQ -0.016084530 -0.155225218 0.12305616 0.9978590
WL-LT -0.094560244 -0.132314441 -0.05680605 0.0000000
> re <- TukeyHSD(m1) > plot(re)

R语言: <wbr> <wbr>用于多个比较的方差分析（ANOVA）

举报/Report

我的更多文章

下载客户端阅读体验更佳