Kazunari Nozue
2012-Feb-24 18:15 UTC
[R] permutation approach to know effect of one factor (factor A) to another (factor B) in each level of factor A (although it is slow)
Hi all,
At first I will explain my linear mixed effects model;
lme1 <- lme(leaf_length ~
Treatment*Genotype*leaf,random=~Treatment|Set,data=data)
or
lmer1 <- lmer(leaf~Treatment*Genotype*leaf+(Treatment|Set),data=data)
Treatment factor has two conditions: mock (=Mo) and treatment(=Tr)
Leaf is position of leaf: 3rd leaf, 4th leaf, ....
Genotype is type of plant: wt, mut1, mut2, ....., mut72
Set is experimental replicates: A to J.
I would like to ask how to compare leaf length under condition A and one under
condition B in each Genotype, i.e. know effect of one factor (factor A,
?Genotype?) to another (factor B, ?Treatment?) in each level of factor A (eg.
?wt?, ?mut1?, ?mut2?, ...).
Since my data is unbalanced data, I could not use TukeyHSD() multiple comparison
(see its help).
My understanding is that pvalue in summary(lme1)$tTable is controversial and
pvals.fnc(lmer1) # in languageR package
gave me an error (due to (Treatment|Set) because (1|Set) did not gave me an
error, but (Treatment\SET) is essential in this model.).
As seen below, I would like to know my permutation method is OK or not.
I started from a simple permutation approach from Maindonald and Braun (2003)
"Data Analysis and Graphics Using R." pg 98. (see scripts below) and I
combined my mixed model and this approach with simulated data (omitting leaf
factor to simplify, see scripts below). Disadvantage of this method is running
time could be long in large data sets with many permutation (eg. 10000), I would
like to know my method is OK. If I could have better methods (probably by using
multcomp package, such as glht() function), I would really appreciate them.
Sorry for long message.
Thank you,
Kazu
##############################
### Maindonald nad Braun pg. 98
##############################
library(DAAG)
data(two65) # from DAAG package
x1 <- two65$ambient;x2<-two65$heated;x<-c(x1,x2)
n1<-length(x1);n2<-length(x2);n<-n1+n2
dbar<-mean(x2) - mean(x1)
z<-array(,2000)
for(i in 1:2000) {
mn<-sample(n,n2,replace=FALSE)
dbardash<-mean(x[mn])-mean(x[-mn])
z[i]<-dbardash
}
pval<-(sum(z > abs(dbar)) + sum(z< -abs(dbar)))/2000
plot(density(z),yaxs="i")
abline(v=dbar)
abline(v=-dbar,lty=2)
###############################
### my example with unbalanced data
###############################
library(lme4)
#simulate data. leaf length in Tr is longer than Mo. wt shows more dramatic
response to Tr than mut. setA plants were longer than setB plants (set factor is
significant in this model).
set.seed(1234)
data <-
data.frame(Treatment=rep(c("Tr","Mo"),c(9,11)),leaf=c(rnorm(9,11),rnorm(11,10)),Set=rep(c("A","B"),times=10))
data$Genotype <-
factor(rep(c("mut","wt"),each=5,length.out=20))
#add setA specific effect for shade and then for sun
data$leaf[data$Treatment=="Tr" & data$Set=="A"] <-
data$leaf[data$Treatment=="Tr" & data$Set=="A"] +
rnorm(length(data$leaf[data$Treatment=="Tr" &
data$Set=="A"]),1)
data$leaf[data$Treatment=="Tr" & data$Genotype=="mut"]
<- data$leaf[data$Treatment=="Tr" &
data$Genotype=="mut"] +
rnorm(length(data$leaf[data$Treatment=="Tr" &
data$Genotype=="mut"]),-0.5)
data$leaf[data$Treatment=="Mo" & data$Set=="A"] <-
data$leaf[data$Treatment=="Mo" & data$Set=="A"] +
rnorm(length(data$leaf[data$Treatment=="Mo" &
data$Set=="A"]),-0.25)
data
# mean from observed data
mean.table.obs<-tapply(data$leaf, list(data$Treatment,data$Genotype),mean)
# permutation
mean.table.PER<-list()
nreps<-1000
z<-list() # mean differences
for(i in 1:nreps) {
new.data<-data.frame()
new.wt<-sample(data[data$Genotype=="wt",]$leaf,sum(data$Genotype=="wt",na.rm=TRUE),replace=FALSE)
# null hypothesis: leaf in Mo = Tr in wt
new.mut<-sample(data[data$Genotype=="mut",]$leaf,sum(data$Genotype=="mut",na.rm=TRUE),replace=FALSE)
new.data<-data.frame(leaf=c(new.wt,new.mut),Genotype=data$Genotype,Treatment=data$Treatment,Set=data$Set)
# mixed effect model
lmer.temp<- lmer(leaf~Treatment*Genotype+(Treatment|Set),data=new.data)
#calculate mean of each group
mean.table.PER[[i]]<-tapply(fitted(lmer.temp),
list(new.data$Treatment,new.data$Genotype),mean)
z[[i]]<-mean.table.PER[[i]][2,] - mean.table.PER[[i]][1,]
}
# calculate p value
TF1<-list()
TF2<-list()
p.value<-vector()
for(i in 1:length(levels(data$Genotype))) {
for(n in 1:length(z)) {
TF1[[n]]<- (z[[n]][i] > abs(mean.table.obs[2,i] -
mean.table.obs[1,i]))*1
TF2[[n]]<- (z[[n]][i] < -abs(mean.table.obs[2,i] -
mean.table.obs[1,i]))*1
}
p.value[i]<-(sum(as.numeric(TF1) + as.numeric(TF2)))/length(z)
}
names(p.value)<-levels(data$Genotype)
p.value
p.adjust<-p.adjust(p=p.value,method=?fdr?)
p.adjust
# graph
par(mfcol=c(2,1))
hist(unlist(z)[names(unlist(z))=="mut"],breaks=seq(-5,5,0.2))
abline(v=abs(mean.table.obs[2,1] - mean.table.obs[1,1]))
abline(v=-abs(mean.table.obs[2,1] - mean.table.obs[1,1]))
hist(unlist(z)[names(unlist(z))=="wt"],breaks=seq(-5,5,0.2))
abline(v=abs(mean.table.obs[2,2] - mean.table.obs[1,2]))
abline(v=-abs(mean.table.obs[2,2] - mean.table.obs[1,2]))