Folks,

I run a series of regressions (one for each quarter in the dataset) and then extract the residuals from each stored lm object as follows:

    vResiduals <- as.vector(unlist(resid(lQuarterlyRegressions[[i]])));

Here lQuarterlyRegressions is a list of objects returned by lm().

Next, I may look for outliers using identify() on a plot, or do some other analysis that tells me which row of the quarterly data I need to take a closer look at.

However, if I try to match a point in one of the quarters with its residual, I first have to drop the points from my "current data" that have NA's in either the explanatory variables or the explained variable, so that the vector of residuals and the data have the same indexes.

This led to some serious confusion/bugs for me, and I am wondering if it might not be better for lm to put an NA into those rows where the point was dropped because of NA's in the explanatory or explained variables (currently it just returns nothing at that index). Of course, there might be some arguments against this idea, and I would be interested to hear them.

Thank you for your time and attention,

-- 
Vivek Satsangi
Student, Rochester, NY USA
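The mismatch described above can be reproduced in a few lines. This is only a sketch on made-up data (the data frame `d` and its values are hypothetical, not the poster's data), assuming the default setting `options(na.action = "na.omit")`. It also shows that `resid()` keeps the original row labels as names, which `as.vector()` throws away.

```r
# Hypothetical data: row 2 has a missing response, row 3 a missing predictor
d <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                y = c(1.1, NA, 3.2, 3.9, 5.1, 6.2))

# With the default na.action (na.omit), incomplete rows are silently dropped
fit <- lm(y ~ x, data = d)
r <- resid(fit)

length(r)        # 4 residuals for 6 rows of data
names(r)         # row labels of the rows that were actually used
na.action(fit)   # records which rows were dropped (rows 2 and 3)

# Positional indexing is now misleading: r[3] belongs to row 4 of d.
# Indexing by row label still works as long as the names are kept:
r["5"]           # residual for row 5 of d

# as.vector() strips the names, so the link back to the rows is lost
is.null(names(as.vector(r)))
```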
1. Try using lm(...whatever..., na.action = na.exclude)

2. Be sure to read the note on Using Time Series in ?lm

3. The dyn package will accept ts, irts, its and zoo class time series and output time series for the residuals. Just preface lm with dyn$, e.g.

    library(dyn)

    # test data
    set.seed(1)
    x <- ts(1:10, start = 2000, frequency = 4)
    x[5] <- NA
    y <- x + rnorm(10)

    # regress series y against series x
    y.lm <- dyn$lm(y ~ x)
    resid(y.lm) # note that residuals are a time series

On 1/18/06, Vivek Satsangi <vivek.satsangi at gmail.com> wrote:
> [...] I am wondering if it might not be better for lm to put an NA
> into those rows where the point was dropped because of NA's in the
> explanatory or explained variables (currently it just returns nothing
> at that index). [...]
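A minimal sketch of suggestion 1, on made-up data (the data frame `d` and the names `fit_excl` and `fit_omit` are hypothetical). With `na.action = na.exclude` the fit itself is unchanged, but `resid()` and `fitted()` pad the dropped rows with NA, so the result lines up with the rows of the data.

```r
# Hypothetical data: row 2 has a missing response, row 3 a missing predictor
d <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                y = c(1.1, NA, 3.2, 3.9, 5.1, 6.2))

fit_omit <- lm(y ~ x, data = d)                          # default: na.omit
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)  # pad with NA

# Same rows are used for fitting, so the coefficients agree
all.equal(coef(fit_omit), coef(fit_excl))

# The extractor functions now return one value per row of d
r <- resid(fit_excl)
length(r) == nrow(d)
which(is.na(r))          # the rows that were excluded from the fit

# Residuals can be stored next to the data they belong to
d$resid <- r
d

# Caution: the raw component is NOT padded; use resid(), not $residuals
length(fit_excl$residuals)
```

The last line is the main thing to watch for: only the extractor functions apply the padding, so code that reaches into the fitted object directly still sees the shorter vector.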
It seems you are looking for na.action = na.exclude. This is described in all good books on S/R (e.g. MASS4, p. 141) and also in an answer on this list already this week. It is even on the help pages for residuals.lm and fitted.

On Wed, 18 Jan 2006, Vivek Satsangi wrote:
> [...] I am wondering if it might not be better for lm to put an NA
> into those rows where the point was dropped because of NA's in the
> explanatory or explained variables (currently it just returns nothing
> at that index). [...]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
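Applied to the per-quarter workflow from the original question, this might look roughly as follows. It is a sketch only: `lQuarterlyData` and the simulated values are hypothetical stand-ins for the poster's data, while `lQuarterlyRegressions` and `vResiduals` are the names used in the question.

```r
set.seed(42)

# Hypothetical stand-in: one data frame per quarter, each with one NA in x
lQuarterlyData <- lapply(1:3, function(q) {
  d <- data.frame(x = rnorm(8))
  d$y <- 2 * d$x + rnorm(8)
  d$x[q] <- NA            # a different row is incomplete in each quarter
  d
})

# One regression per quarter, keeping a slot for every row of the data
lQuarterlyRegressions <- lapply(lQuarterlyData, function(d) {
  lm(y ~ x, data = d, na.action = na.exclude)
})

i <- 2
vResiduals <- resid(lQuarterlyRegressions[[i]])  # no unlist()/as.vector() needed

# Residuals and data now share the same row index
length(vResiduals) == nrow(lQuarterlyData[[i]])

# Row with the largest absolute residual (NA entries are skipped)
worst <- which.max(abs(vResiduals))
lQuarterlyData[[i]][worst, ]
```

Because the padded residuals have one entry per row, an index returned by identify() on a plot of the quarter's data can be used directly on `vResiduals`.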
I'm afraid you're a day late and a dollar short: see ?na.exclude.

lm() has been around longer than you have, maybe, and is thus pretty well optimized. Not perfect, mind you, but I think it unlikely that "casual" suggestions haven't already been considered.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Vivek Satsangi
> Sent: Wednesday, January 18, 2006 4:08 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Possible improvement in lm
>
> [...] I am wondering if it might not be better for lm to put an NA
> into those rows where the point was dropped because of NA's in the
> explanatory or explained variables (currently it just returns nothing
> at that index). [...]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html