ivo welch
2010-May-20 13:34 UTC
[R] esthetics --- extending the lm command to fixed effects?
dear R wizards:

not important. more a curiosity or esthetics question.

is there a way to extend the standard lm command, so that it takes a new argument that handles fixed effects? right now, I have (provided to me from an expert---I would have never figured this one out):

  diffid <- function(h,id) {
    id <- as.factor(id)[, drop=TRUE]
    apply(as.matrix(h), 2, function(x) x - tapply(x,id,mean)[id])
  }

which is used as

  r= lm( diffid(y, firmid) ~ diffid(x, firmid ) )

it works, but it would be much nicer if I could just write

  r= lm( y ~ x + z, fixed.effects=firmid )

does this already exist as a package? or has someone figured out how to program this?

as I wrote---this is a curiosity question, not a substance question.

regards,

/iaw
----
Ivo Welch (ivo.welch@brown.edu, ivo.welch@gmail.com)
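A minimal R sketch on simulated data (the number of firms, the coefficients and the seed are arbitrary assumptions, not from the post) showing that the demeaning helper, with its closing parenthesis in place, gives the same slope on x as the dummy-variable regression lm(y ~ x + factor(firmid)). This is the Frisch-Waugh-Lovell result: sweeping out the firm means is equivalent to including one dummy per firm.

```r
# Simulated panel: 5 hypothetical firms, 20 observations each.
set.seed(1)
firmid <- rep(1:5, each = 20)
x <- rnorm(100) + firmid              # regressor correlated with the firm effect
y <- 2 * x + 3 * firmid + rnorm(100)

# The demeaning helper from the post: subtract each firm's mean from each column.
diffid <- function(h, id) {
  id <- as.factor(id)[, drop = TRUE]  # drop unused factor levels
  apply(as.matrix(h), 2, function(x) x - tapply(x, id, mean)[id])
}

# Within (demeaned) regression vs. least-squares dummy-variable regression.
r_within <- lm(diffid(y, firmid) ~ diffid(x, firmid))
r_dummy  <- lm(y ~ x + factor(firmid))

# Both fits give the same slope on x.
all.equal(unname(coef(r_within)[2]), unname(coef(r_dummy)["x"]))
```

Because diffid() returns a one-column matrix, lm() treats the first fit as a multivariate ("mlm") model, which is why the slope is taken with coef(r_within)[2] rather than by name.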
Thomas Lumley
2010-May-20 15:30 UTC
[R] esthetics --- extending the lm command to fixed effects?
On Thu, 20 May 2010, ivo welch wrote:

> dear R wizards:
>
> not important. more a curiosity or esthetics question.
>
> is there a way to extend the standard lm command, so that it takes a new
> argument that handles fixed effects? right now, I have (provided to me
> from an expert---I would have never figured this one out):
>
> diffid <- function(h,id) {
>    id <- as.factor(id)[, drop=TRUE]
>    apply(as.matrix(h), 2, function(x) x - tapply(x,id,mean)[id])
> }

Simpler would be

  diffid <- function(h,id) { h - ave(h,id) }

> which is used as
>
> r= lm( diffid(y, firmid) ~ diffid(x, firmid ) )
>
> it works, but it would be much nicer if I could just write
>
> r= lm( y ~ x + z, fixed.effects=firmid )
>
> does this already exist as a package? or has someone figured out how to
> program this?

I would just have used lm(y~x+z+factor(firmid)). Admittedly, you get a whole bunch of uninteresting coefficients in the output, but it's not that hard to subset them out.

There are two implementations of this in Bill Venables' course notes on advanced programming. I think they are also in 'S Programming', but I can't find my copy right now. These were motivated by computational problems: the full design matrix for the linear model was too large for memory at the time (last century).

As a final note, I would strongly discourage

  r= lm( y ~ x + z, fixed.effects=firmid )

as a specification, and would argue for

  r= lm( y ~ x + z, fixed.effects=~firmid )

I think the ability to have some subset of the arguments in a modelling call silently treated as formulas was a bad decision, although it must have looked user-friendly at the time.

 -thomas

Thomas Lumley                  Assoc. Professor, Biostatistics
tlumley at u.washington.edu    University of Washington, Seattle
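A minimal sketch of the interface argued for above, with the fixed effects passed as a one-sided formula. The name lm_fe() is hypothetical, chosen here for illustration; it is not a base R function. The sketch assumes that every variable in the model formula is a plain numeric column of data and that there are no missing values. It demeans within groups using ave(), as in the one-line diffid(), drops the intercept because the group means absorb it, and compares the slopes with lm(y ~ x + z + factor(firmid)) after subsetting out the dummy coefficients.

```r
# lm_fe() is a hypothetical helper, not part of base R.
# Assumes: only numeric variables in 'formula', no missing values in 'data'.
lm_fe <- function(formula, data, fixed.effects) {
  # Grouping factor built from the one-sided formula, e.g. ~firmid.
  g <- interaction(model.frame(fixed.effects, data), drop = TRUE)
  # Demean every variable named in the model formula within groups;
  # ave() uses mean by default, as in the one-line diffid().
  dm <- data
  for (v in all.vars(formula)) dm[[v]] <- data[[v]] - ave(data[[v]], g)
  # The intercept is absorbed by the group means, so drop it.
  lm(update(formula, . ~ . - 1), data = dm)
}

# Simulated example data (arbitrary values, for illustration only).
set.seed(2)
d <- data.frame(firmid = rep(1:4, each = 25))
d$x <- rnorm(100) + d$firmid
d$z <- rnorm(100)
d$y <- 1.5 * d$x - 0.5 * d$z + 2 * d$firmid + rnorm(100)

fe  <- lm_fe(y ~ x + z, data = d, fixed.effects = ~firmid)
dum <- lm(y ~ x + z + factor(firmid), data = d)

coef(fe)                  # slopes only
coef(dum)[c("x", "z")]    # same slopes, firm dummies subset out
```

One caveat applies to any demeaning approach, this sketch included: lm() does not know that one mean per firm was swept out, so its residual degrees of freedom are too large by the number of firms and the standard errors from summary() are somewhat too small. The dummy-variable fit gets them right.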