dear R wizards: apologies for two queries in one day. I have a long-form
data set that identifies about 5,000 regressions, each with about 1,000
observations.
unit date y x
1 20060101 <two values>
1 20060102 <two values>
...
5000 20081230 <two values>
5000 20081231 <two values>
I need to run these regressions many, many times, because they are part of an
optimization, so getting my code to be fast is paramount. I will need
to pick off the 5,000 coefficients on x (i.e., the b's) and their standard
errors. I can ignore the 5,000 intercepts.
    by(dataset, dataset$unit, function(d) coef(lm(y ~ x, data = d)))
gives me the coefficients. of course, I could use the summary method for lm
to pick off the coefficient standard errors, too; my guess is that this
would be slow.
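for concreteness, the baseline I have in mind is something like this
(untested sketch; the unit, y, and x names match the layout above):

    ## baseline: summary(lm()) per unit -- simple, but presumably slow
    slow <- by(dataset, dataset$unit, function(d) {
        cf <- summary(lm(y ~ x, data = d))$coefficients
        c(b = cf["x", "Estimate"], se = cf["x", "Std. Error"])
    })
    slow <- do.call(rbind, slow)   # 5000 x 2 matrix of b and se(b)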
I think the alternative would be to delete all NAs first and then use a
building-block function (such as lm.fit(), or solve(qr(X), y) with X the
design matrix). this would be fast for getting the coefficients, but I
wonder whether there is a *FAST* way to obtain the standard error of b.
(I do know slow ways, but these would defeat the purpose.) is this the
right idea? or will I just end up with more code but not more speed than I
would get with summary(lm())? can someone tell me the "fastest" way to
generate b and se(b)?
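to make the idea concrete, here is roughly what I mean (an untested sketch,
assuming d already has no NAs; the residual variance and (X'X)^{-1} both
come from the same QR decomposition that lm.fit() computes anyway):

    ## sketch: lm.fit() plus a hand-rolled standard error
    fast_b_se <- function(d) {
        X   <- cbind(1, d$x)     # design matrix: intercept + x
        fit <- lm.fit(X, d$y)
        ## residual variance estimate, sigma^2 hat
        s2  <- sum(fit$residuals^2) / (length(d$y) - fit$rank)
        ## (X'X)^{-1} from the R factor of the QR (pivoting should be
        ## trivial for this two-column design)
        XtXinv <- chol2inv(qr.R(fit$qr))
        c(b = fit$coefficients[[2]], se = sqrt(XtXinv[2, 2] * s2))
    }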
is there anything else that comes to mind as a recommended way to speed this
up in R, short of writing everything in C?
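one thought I had: because each regression is just y ~ x, the textbook
closed-form formulas could be vectorized across all 5,000 units in a single
pass with rowsum() (again an untested sketch, assuming complete cases):

    ## sketch: closed-form simple-regression b and se(b), all units at once
    g   <- dataset$unit
    n   <- rowsum(rep(1, nrow(dataset)), g)      # obs per unit
    sx  <- rowsum(dataset$x, g)
    sy  <- rowsum(dataset$y, g)
    sxx <- rowsum(dataset$x^2, g) - sx^2 / n     # sum((x - xbar)^2)
    sxy <- rowsum(dataset$x * dataset$y, g) - sx * sy / n
    syy <- rowsum(dataset$y^2, g) - sy^2 / n
    b   <- sxy / sxx                             # slopes
    rss <- syy - sxy^2 / sxx                     # residual sums of squares
    se  <- sqrt(rss / (n - 2) / sxx)             # standard errors of b

but I am unsure whether the centered-sums approach is numerically safe
enough, or whether the QR route is preferable.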
as always, advice highly appreciated.
/iaw
--
Ivo Welch (ivo.welch@brown.edu, ivo.welch@gmail.com)