If the observations are normally distributed and the 2xk design is balanced, theory requires that the tests for interaction and row effects be independent. In my program, appended below, this would translate to cntT (approx.) = cntR*cntI/N if all R routines were functioning correctly. They aren't.

sim2=function(size,N,p){
  cntR=0
  cntC=0
  cntI=0
  cntT=0
  cntP=0
  for(i in 1:N){
    #generate data
    v=gendata(size)
    #analyze after build(ing) design containing data
    lm.out=lm(yield~c*r,build(size,v))
    av.out=anova(lm.out)
    #if column effect is significant, increment cntC
    if (av.out[[5]][1]<=p) cntC=cntC+1
    #if row effect is significant, increment cntR
    if (av.out[[5]][2]<=p){
      cntR=cntR+1
      tmp=1
    }
    else tmp=0
    if (av.out[[5]][3]<=p){
      #if interaction is significant, increment cntI
      cntI=cntI+1
      #if both interaction and row effect are significant, increment cntT
      cntT=cntT+tmp
    }
  }
  list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
}

build=function(size,v){
  #size is a vector containing the sample sizes
  col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
        rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
  row=c(rep(0,size[1]+size[2]+size[3]+size[4]),
        rep(1,size[5]+size[6]+size[7]+size[8]))
  return(data.frame(c=factor(col), r=factor(row), yield=v))
}

gendata=function(size){
  ssize=sum(size)
  return(rnorm(ssize))
}

#Example
size=c(3,3,3,0,3,3,3,0)
sim2(size,10000,10,.16)

Phillip Good
Huntington Beach CA
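To put rough numbers on that claim: with the intended p = 0.16 and N = 10000, each of cntR and cntI should come out near 0.16 * 10000 = 1600 under the null hypothesis, so independence of the row and interaction tests would put cntT near 1600 * 1600 / 10000, i.e. about 256.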
I have already had off-list email exchanges with Phillip Good pointing out that the independence property he wishes to establish by simulation is a consequence of the orthogonality of the column spans of the row contrasts and the interaction contrasts. If you set the contrasts option to a set of orthogonal contrasts such as contr.helmert or contr.sum (which has no effect on the results of the anova), this is easily established.

> build
function(size, v = rnorm(sum(size)))
{
    col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
          rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
    row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
          +size[7]+size[8]))
    return(data.frame(c=factor(col), r=factor(row),yield=v))
}
> size
[1] 3 3 3 0 3 3 3 0
> set.seed(1234321)
> vv <- build(size)
> vv
   c r      yield
1  0 0  1.2369081
2  0 0  1.5616230
3  0 0  1.8396185
4  1 0  0.3173245
5  1 0  1.0715115
6  1 0 -1.1459955
7  2 0  0.2488894
8  2 0  0.1158625
9  2 0  2.6200816
10 0 1  1.2624048
11 0 1 -0.9862578
12 0 1 -0.3235653
13 1 1  0.2039706
14 1 1 -1.4574576
15 1 1  1.9158713
16 2 1 -2.0333909
17 2 1  1.0050272
18 2 1  0.6789184
> options(contrasts = c('contr.helmert', 'contr.poly'))
> crossprod(model.matrix(~c*r, vv))
            (Intercept) c1 c2 r1 c1:r1 c2:r1
(Intercept)          18  0  0  0     0     0
c1                    0 12  0  0     0     0
c2                    0  0 36  0     0     0
r1                    0  0  0 18     0     0
c1:r1                 0  0  0  0    12     0
c2:r1                 0  0  0  0     0    36
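As a quick check of the parenthetical point that the choice of contrasts has no effect on the anova, one can fit the same model under the default treatment coding and under Helmert coding and compare the tables. This is only a sketch and assumes the build function, the size vector, and the vv data frame created above:

options(contrasts = c("contr.treatment", "contr.poly"))   # R's default coding
a.trt  <- anova(lm(yield ~ c * r, vv))
options(contrasts = c("contr.helmert", "contr.poly"))      # orthogonal coding
a.helm <- anova(lm(yield ~ c * r, vv))
## the sequential F tests (and their p-values) agree under either coding
all.equal(a.trt[["Pr(>F)"]], a.helm[["Pr(>F)"]])

Only the coefficient estimates change with the coding; the column span of each term, and hence the sequential sums of squares, do not.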
A couple more comments on this simulation. Notice that the sim2 function is defined with arguments size, N, and p, but is being called with four arguments; it appears as if the value of p will be 10 in that call. If you decide to do such a simulation yourself, you can save a lot of time by building the model matrix just once and using lm.fit in the subsequent calls. Also, there is no need to do the counting in the body of the sim2 function: just save the three p-values from each replication. The test of independence is then equivalent to showing that the joint distribution of the p-values is uniform over the unit cube.
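A minimal sketch of that approach, assuming the build function from the original post is available (the name fast_sim and the use of the design's QR decomposition are choices made here, not part of the original post):

fast_sim <- function(size, N) {
  ## build the design once; only the factors matter, so any y will do
  dd   <- build(size, rnorm(sum(size)))
  X    <- model.matrix(~ c * r, dd)
  asgn <- attr(X, "assign")          # 0 = intercept, 1 = c, 2 = r, 3 = c:r
  qrX  <- qr(X)                      # assumes a full-rank design (no pivoting)
  n <- nrow(X); p <- ncol(X); df.res <- n - p
  pvals <- matrix(NA_real_, N, 3, dimnames = list(NULL, c("c", "r", "c:r")))
  for (i in seq_len(N)) {
    y   <- rnorm(n)
    eff <- qr.qty(qrX, y)            # the "effects" that anova.lm works from
    rss <- sum(eff[(p + 1):n]^2)     # residual sum of squares
    for (k in 1:3) {
      cols <- which(asgn == k)       # sequential SS for term k
      Fval <- (sum(eff[cols]^2) / length(cols)) / (rss / df.res)
      pvals[i, k] <- pf(Fval, length(cols), df.res, lower.tail = FALSE)
    }
  }
  pvals
}

With pv <- fast_sim(c(3,3,3,0,3,3,3,0), 10000), the claim in the original post corresponds to mean(pv[,"r"] <= 0.16 & pv[,"c:r"] <= 0.16) being close to mean(pv[,"r"] <= 0.16) * mean(pv[,"c:r"] <= 0.16), and something like ks.test(pv[,"r"], "punif") checks the marginal uniformity.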