S Ellison
2008-Jan-11 16:03 UTC
[R] Behaviour of standard error estimates in lmRob and the like
I am looking at MM-estimates for some interlab comparison work. The usual situation in this particular context is a modest number of results from very expensive methods with abnormally well-characterised performance, so for once we have good "variance" estimates (which can differ substantially for good reason) from most labs. But there remains room for human error or unexpected chemistry which still causes some outliers. MM-estimates from robust regression with variance weights is a suggested approach with promising features. It's a sledgehammer approach to getting a mean value, but .... Now, rlm, lmrob and lmRob all seem to do the job . But one of the things of particular interest to this particular lab community is the standard error on that mean value, and there I am seeing some unexpected anomalies in lmRob. rlm, by contrast, is pretty much rock steady, and lmrob little different to rlm. The practical answer is not to use lmRob, but I'd like to better understand the cause. (It's a confidence thing). Clearly, the scale estimator in lmRob is doing something odd - but what? Any observations/pointers much appreciated. By way of example, try (for exaggerated behaviour): library(robust) #for lmRob library(robustbase) #lmrob - R's own version? library(MASS) #for rlm #The presence of, or ordering, of the three packages does not affect the outcome from lmRob N<-10 z<-matrix(rnorm(N*1000), ncol=N) se.MM.rlm<-apply(z[,1:N],1, function(z) coef(summary(rlm(z~1, method="MM")))[2] ) se.MM<-apply(z[,1:N],1, function(z) coef(summary(lmRob(z~1, control=lmRob.control(efficiency=0.95)) ))[2] ) se.MM.lmrob<-apply(z[,1:N],1, function(z) coef(summary(lmrob(z~1)) )[2] ) #.. and some simpler estimators for comparison, since this is a simple data set. se.med<-apply(z[,1:N],1, function(x,N) mad(x)/sqrt(N),N=N ) se.mean<-apply(z[,1:N],1, function(x,N) sd(x)/sqrt(N),N=N ) se.H<-apply(z[,1:N],1, function(z) hubers(z,k=1.35)$s/sqrt(N) ) #expected 95% efficiency se.vals<-data.frame(Mean=se.mean, Median=se.med, Huber=se.H, MM.rlm=se.MM.rlm, MM.lmrob=se.MM.lmrob, MM=se.MM) se.stk<-stack(se.vals) windows() boxplot(values~ind, data=se.stk, main=paste("STD Errors: N=",N), ylab="Std Err") abline(h=sqrt(1/N)) #Around about .3 for N=10 abline(h=sqrt(1/0.95)*sqrt(1/N),lty=2) #95% efficiency line #Now, I don't mind a difference, but a factor of 200 is a bit much to swallow. #Steve Ellison ******************************************************************* This email contains information which may be confidentia...{{dropped:5}}