R provides a few ways of handling missing values, among others in the context of an ANOVA (aov): two types of exclusion (na.omit and na.exclude), and failure (na.fail).

In some situations I prefer to have missing values replaced by the mean (or the median) for the given combination of factors. The routine included below does that. It works, but is (of course) rather slow. It would be much quicker if sapply() could be used, and I imagine that somewhere in the "innards" of aov or lm the data will have been broken up by factors in such a way that sapply() could be applied.

Is there a good (statistical or other) reason why there is no such option? Alternatively, is there a more efficient solution than my code below?

Thanks again,

-- 
RJV Bertin
NB: return address not valid; use r j v b e r t i n at h o t m a i l dot c o m

df.Missing.Mean.VV1 <- function(df, verbose=FALSE)
{
  ## replace missing values in the dataframe df by the mean of the corresponding
  ## column for each combination of the factors that interest us here.
  ## have to find a more elegant fashion to find the factor columns!
  Subjects <- length(levels(df$Snr))
  ## construct an array to receive the means for each combination of the relevant factors:
  Types <- length(levels(df$Type))
  sizes <- length(levels(df$size))
  Modalities <- length(levels(df$Modality))
  replval <- rep(NA, Subjects*Types*sizes*Modalities)
  dim(replval) <- c(Subjects, Types, sizes, Modalities)
  ## integer codes of the factor levels, used as array indices
  nSS <- as.numeric(df$Snr)
  nT <- as.numeric(df$Type)
  nS <- as.numeric(df$size)
  nM <- as.numeric(df$Modality)
  for( i in 1:ncol(df) ){
    ## only numeric columns with at least one non-missing value are processed
    if( is.numeric(df[,i]) && !is.na(mean(df[,i], na.rm=TRUE)) ){
      ## loop indices are named iT, iS, iM, iSS so that T (shorthand for TRUE) is not masked
      for( iT in 1:Types ){
        for( iS in 1:sizes ){
          for( iM in 1:Modalities ){
            m <- mean( df[,i][ nT==iT & nS==iS & nM==iM ], na.rm=TRUE )
            ## subject-dependency should be redundant!
            for( iSS in 1:Subjects ){
              replval[iSS,iT,iS,iM] <- m
            }
          }
        }
      }
      for( j in 1:length(df[,i]) ){
        if( is.na(df[,i][j]) ){
          iSS <- nSS[j] ; iT <- nT[j] ; iS <- nS[j] ; iM <- nM[j]
          if( verbose ){
            print( paste( "df[,", i, ",", j, "] == NA <-",
#             "mean(Snr=", iSS, ",T=", iT, ",S=", iS, ",M=", iM, ")==",
              "mean(Snr=", df$Snr[j], ",T=", df$Type[j], ",S=", df$size[j], ",M=", df$Modality[j], ")==",
              replval[iSS,iT,iS,iM], sep="" ))
          }
          df[,i][j] <- replval[iSS,iT,iS,iM]
        }
      }
    }
  }
  rm(nSS, nT, nS, nM, replval)
  df
}
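
One possible vectorized answer to the question above, offered as a sketch rather than a drop-in replacement: base R's ave() computes a statistic per cell of one or more factors and returns it aligned with the original rows, so the explicit loops over Type, size and Modality are not needed. The function name fill.cell.means and its arguments factors and stat are hypothetical; the column names are those used in the post, and the small data frame d is made up purely for illustration.

```r
## Sketch only: fill.cell.means, 'factors' and 'stat' are hypothetical names.
## Replace each NA in a numeric column by the mean (or another statistic)
## of the cell defined by the given factor columns.
fill.cell.means <- function(df, factors = c("Type", "size", "Modality"),
                            stat = mean)
{
  ## one grouping factor per occupied cell; drop=TRUE discards empty cells
  cell <- interaction(df[factors], drop = TRUE)
  for (col in setdiff(names(df), factors)) {
    x <- df[[col]]
    ## skip non-numeric columns (e.g. Snr) and columns without missing values
    if (!is.numeric(x) || !any(is.na(x))) next
    ## per-cell statistic, repeated so that it lines up with the rows of df
    cell.stat <- ave(x, cell, FUN = function(v) stat(v, na.rm = TRUE))
    miss <- is.na(x)
    x[miss] <- cell.stat[miss]
    df[[col]] <- x
  }
  df
}

## made-up example data: two cells (Type a and Type b), one NA in each
d <- data.frame(
  Snr      = factor(c("s1", "s2", "s3", "s1", "s2", "s3")),
  Type     = factor(c("a", "a", "a", "b", "b", "b")),
  size     = factor(rep("small", 6)),
  Modality = factor(rep("visual", 6)),
  y        = c(1, NA, 3, 10, 20, NA)
)
filled <- fill.cell.means(d)
print(filled$y)
```

As in the original routine, the subject factor Snr is left out of the cell definition. Passing stat = median gives the median variant mentioned in the post. A cell whose values are all missing yields NaN with stat = mean, so such entries are not filled with a real number.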