If the observations are normally distributed and the 2xk design is balanced, theory requires that the tests for interaction and row effects be independent. In my program, appended below, this would translate to cntT (approx.) = cntR*cntI/N if all R routines were functioning correctly. They aren't.

sim2=function(size,N,p){
  cntR=0
  cntC=0
  cntI=0
  cntT=0
  cntP=0
  for(i in 1:N){
    #generate data
    v=gendata(size)
    #analyze after build(ing) design containing data
    lm.out=lm(yield~c*r,build(size,v))
    av.out=anova(lm.out)
    #if column effect is significant, increment cntC
    if (av.out[[5]][1]<=p) cntC=cntC+1
    #if row effect is significant, increment cntR
    if (av.out[[5]][2]<=p){
      cntR=cntR+1
      tmp=1
    }
    else tmp=0
    if (av.out[[5]][3]<=p){
      #if interaction is significant, increment cntI
      cntI=cntI+1
      #if both interaction and row effect are significant, increment cntT
      cntT=cntT+tmp
    }
  }
  list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
}

build=function(size,v){
  #size is a vector containing the sample sizes
  col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
        rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
  row=c(rep(0,size[1]+size[2]+size[3]+size[4]),
        rep(1,size[5]+size[6]+size[7]+size[8]))
  return(data.frame(c=factor(col), r=factor(row), yield=v))
}

gendata=function(size){
  ssize=sum(size)
  return(rnorm(ssize))
}

#Example
size=c(3,3,3,0,3,3,3,0)
sim2(size,10000,10,.16)

Phillip Good
Huntington Beach CA
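To put rough numbers on that claim: with the intended p = 0.16 and N = 10000, each of cntR and cntI should come out near 0.16 * 10000 = 1600 under the null hypothesis, so independence of the row and interaction tests would put cntT near 1600 * 1600 / 10000, i.e. about 256.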
I have already had off-list email exchanges with Phillip Good pointing out that the independence property he wishes to establish by simulation is a consequence of the orthogonality of the column spans of the row contrasts and the interaction contrasts. If you set the contrasts option to a set of orthogonal contrasts such as contr.helmert or contr.sum (which has no effect on the results of the anova), this is easily established.

> build
function(size, v = rnorm(sum(size)))
{
    col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
          rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
    row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
          +size[7]+size[8]))
    return(data.frame(c=factor(col), r=factor(row),yield=v))
}
> size
[1] 3 3 3 0 3 3 3 0
> set.seed(1234321)
> vv <- build(size)
> vv
   c r      yield
1  0 0  1.2369081
2  0 0  1.5616230
3  0 0  1.8396185
4  1 0  0.3173245
5  1 0  1.0715115
6  1 0 -1.1459955
7  2 0  0.2488894
8  2 0  0.1158625
9  2 0  2.6200816
10 0 1  1.2624048
11 0 1 -0.9862578
12 0 1 -0.3235653
13 1 1  0.2039706
14 1 1 -1.4574576
15 1 1  1.9158713
16 2 1 -2.0333909
17 2 1  1.0050272
18 2 1  0.6789184
> options(contrasts = c('contr.helmert', 'contr.poly'))
> crossprod(model.matrix(~c*r, vv))
            (Intercept) c1 c2 r1 c1:r1 c2:r1
(Intercept)          18  0  0  0     0     0
c1                    0 12  0  0     0     0
c2                    0  0 36  0     0     0
r1                    0  0  0 18     0     0
c1:r1                 0  0  0  0    12     0
c2:r1                 0  0  0  0     0    36
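As a quick check of the parenthetical point that the choice of contrasts has no effect on the anova, one can fit the same model under the default treatment coding and under Helmert coding and compare the tables. This is only a sketch and assumes the build function, the size vector, and the vv data frame created above:

options(contrasts = c("contr.treatment", "contr.poly"))   # R's default coding
a.trt  <- anova(lm(yield ~ c * r, vv))
options(contrasts = c("contr.helmert", "contr.poly"))      # orthogonal coding
a.helm <- anova(lm(yield ~ c * r, vv))
## the sequential F tests (and their p-values) agree under either coding
all.equal(a.trt[["Pr(>F)"]], a.helm[["Pr(>F)"]])

Only the coefficient estimates change with the coding; the column span of each term, and hence the sequential sums of squares, do not.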
A couple more comments on this simulation. Notice that the sim2 function is defined with arguments size, N, and p, but is being called with four arguments; it appears as if the value of p will be 10 in that call. If you decide to do such a simulation yourself, you can save a lot of time by building the model matrix just once and using lm.fit in the subsequent calls. Also, there is no need to do the counting in the body of the sim2 function: just save the three p-values from each replication. The test of independence is then equivalent to showing that the joint distribution of the p-values is uniform over the unit cube.
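A minimal sketch of that approach, assuming the build function from the original post is available (the name fast_sim and the use of the design's QR decomposition are choices made here, not part of the original post):

fast_sim <- function(size, N) {
  ## build the design once; only the factors matter, so any y will do
  dd   <- build(size, rnorm(sum(size)))
  X    <- model.matrix(~ c * r, dd)
  asgn <- attr(X, "assign")          # 0 = intercept, 1 = c, 2 = r, 3 = c:r
  qrX  <- qr(X)                      # assumes a full-rank design (no pivoting)
  n <- nrow(X); p <- ncol(X); df.res <- n - p
  pvals <- matrix(NA_real_, N, 3, dimnames = list(NULL, c("c", "r", "c:r")))
  for (i in seq_len(N)) {
    y   <- rnorm(n)
    eff <- qr.qty(qrX, y)            # the "effects" that anova.lm works from
    rss <- sum(eff[(p + 1):n]^2)     # residual sum of squares
    for (k in 1:3) {
      cols <- which(asgn == k)       # sequential SS for term k
      Fval <- (sum(eff[cols]^2) / length(cols)) / (rss / df.res)
      pvals[i, k] <- pf(Fval, length(cols), df.res, lower.tail = FALSE)
    }
  }
  pvals
}

With pv <- fast_sim(c(3,3,3,0,3,3,3,0), 10000), the claim in the original post corresponds to mean(pv[,"r"] <= 0.16 & pv[,"c:r"] <= 0.16) being close to mean(pv[,"r"] <= 0.16) * mean(pv[,"c:r"] <= 0.16), and something like ks.test(pv[,"r"], "punif") checks the marginal uniformity.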