Kazunari Nozue
2012-Feb-24 18:15 UTC
[R] permutation approach to know effect of one factor (factor A) to another (factor B) in each level of factor A (although it is slow)
Hi all, At first I will explain my linear mixed effects model; lme1 <- lme(leaf_length ~ Treatment*Genotype*leaf,random=~Treatment|Set,data=data) or lmer1 <- lmer(leaf~Treatment*Genotype*leaf+(Treatment|Set),data=data) Treatment factor has two conditions: mock (=Mo) and treatment(=Tr) Leaf is position of leaf: 3rd leaf, 4th leaf, .... Genotype is type of plant: wt, mut1, mut2, ....., mut72 Set is experimental replicates: A to J. I would like to ask how to compare leaf length under condition A and one under condition B in each Genotype, i.e. know effect of one factor (factor A, ?Genotype?) to another (factor B, ?Treatment?) in each level of factor A (eg. ?wt?, ?mut1?, ?mut2?, ...). Since my data is unbalanced data, I could not use TukeyHSD() multiple comparison (see its help). My understanding is that pvalue in summary(lme1)$tTable is controversial and pvals.fnc(lmer1) # in languageR package gave me an error (due to (Treatment|Set) because (1|Set) did not gave me an error, but (Treatment\SET) is essential in this model.). As seen below, I would like to know my permutation method is OK or not. I started from a simple permutation approach from Maindonald and Braun (2003) "Data Analysis and Graphics Using R." pg 98. (see scripts below) and I combined my mixed model and this approach with simulated data (omitting leaf factor to simplify, see scripts below). Disadvantage of this method is running time could be long in large data sets with many permutation (eg. 10000), I would like to know my method is OK. If I could have better methods (probably by using multcomp package, such as glht() function), I would really appreciate them. Sorry for long message. Thank you, Kazu ############################## ### Maindonald nad Braun pg. 98 ############################## library(DAAG) data(two65) # from DAAG package x1 <- two65$ambient;x2<-two65$heated;x<-c(x1,x2) n1<-length(x1);n2<-length(x2);n<-n1+n2 dbar<-mean(x2) - mean(x1) z<-array(,2000) for(i in 1:2000) { mn<-sample(n,n2,replace=FALSE) dbardash<-mean(x[mn])-mean(x[-mn]) z[i]<-dbardash } pval<-(sum(z > abs(dbar)) + sum(z< -abs(dbar)))/2000 plot(density(z),yaxs="i") abline(v=dbar) abline(v=-dbar,lty=2) ############################### ### my example with unbalanced data ############################### library(lme4) #simulate data. leaf length in Tr is longer than Mo. wt shows more dramatic response to Tr than mut. setA plants were longer than setB plants (set factor is significant in this model). set.seed(1234) data <- data.frame(Treatment=rep(c("Tr","Mo"),c(9,11)),leaf=c(rnorm(9,11),rnorm(11,10)),Set=rep(c("A","B"),times=10)) data$Genotype <- factor(rep(c("mut","wt"),each=5,length.out=20)) #add setA specific effect for shade and then for sun data$leaf[data$Treatment=="Tr" & data$Set=="A"] <- data$leaf[data$Treatment=="Tr" & data$Set=="A"] + rnorm(length(data$leaf[data$Treatment=="Tr" & data$Set=="A"]),1) data$leaf[data$Treatment=="Tr" & data$Genotype=="mut"] <- data$leaf[data$Treatment=="Tr" & data$Genotype=="mut"] + rnorm(length(data$leaf[data$Treatment=="Tr" & data$Genotype=="mut"]),-0.5) data$leaf[data$Treatment=="Mo" & data$Set=="A"] <- data$leaf[data$Treatment=="Mo" & data$Set=="A"] + rnorm(length(data$leaf[data$Treatment=="Mo" & data$Set=="A"]),-0.25) data # mean from observed data mean.table.obs<-tapply(data$leaf, list(data$Treatment,data$Genotype),mean) # permutation mean.table.PER<-list() nreps<-1000 z<-list() # mean differences for(i in 1:nreps) { new.data<-data.frame() new.wt<-sample(data[data$Genotype=="wt",]$leaf,sum(data$Genotype=="wt",na.rm=TRUE),replace=FALSE) # null hypothesis: leaf in Mo = Tr in wt new.mut<-sample(data[data$Genotype=="mut",]$leaf,sum(data$Genotype=="mut",na.rm=TRUE),replace=FALSE) new.data<-data.frame(leaf=c(new.wt,new.mut),Genotype=data$Genotype,Treatment=data$Treatment,Set=data$Set) # mixed effect model lmer.temp<- lmer(leaf~Treatment*Genotype+(Treatment|Set),data=new.data) #calculate mean of each group mean.table.PER[[i]]<-tapply(fitted(lmer.temp), list(new.data$Treatment,new.data$Genotype),mean) z[[i]]<-mean.table.PER[[i]][2,] - mean.table.PER[[i]][1,] } # calculate p value TF1<-list() TF2<-list() p.value<-vector() for(i in 1:length(levels(data$Genotype))) { for(n in 1:length(z)) { TF1[[n]]<- (z[[n]][i] > abs(mean.table.obs[2,i] - mean.table.obs[1,i]))*1 TF2[[n]]<- (z[[n]][i] < -abs(mean.table.obs[2,i] - mean.table.obs[1,i]))*1 } p.value[i]<-(sum(as.numeric(TF1) + as.numeric(TF2)))/length(z) } names(p.value)<-levels(data$Genotype) p.value p.adjust<-p.adjust(p=p.value,method=?fdr?) p.adjust # graph par(mfcol=c(2,1)) hist(unlist(z)[names(unlist(z))=="mut"],breaks=seq(-5,5,0.2)) abline(v=abs(mean.table.obs[2,1] - mean.table.obs[1,1])) abline(v=-abs(mean.table.obs[2,1] - mean.table.obs[1,1])) hist(unlist(z)[names(unlist(z))=="wt"],breaks=seq(-5,5,0.2)) abline(v=abs(mean.table.obs[2,2] - mean.table.obs[1,2])) abline(v=-abs(mean.table.obs[2,2] - mean.table.obs[1,2]))