thr3ads.net - R help - [R] Possible improvement in lm [Jan 2006]


Vivek Satsangi

2006-Jan-18 12:08 UTC

[R] Possible improvement in lm

Folks,

I do a series of regressions (one for each quarter in the dataset) and
then go and extract the residuals from each stored lm object that is
returned as follows:

vResiduals <- as.vector(unlist(resid(lQuarterlyRegressions[[i]])))

Here lQuarterlyRegressions is a vector of objects returned by lm().

Next, I may go find outliers using identify() on a plot or do some
other analysis which tells me which row of the quarterly data I need
to take a closer look at.

However, if I try to match some point in one of the quarters with its
residual, I first have to drop the rows of my "current Data" that have
NAs in either the explanatory variables or the explained variable, so
that the vector of residuals and the data have the same indexes.

This led to some serious confusion/bugs for me, and I am wondering if
it might be better for lm to put an NA into those rows where the
point was dropped because of NAs in the explanatory or explained
variables (currently it just returns nothing at that index). Of course,
there might be some arguments against this idea, and I would be
interested to hear them.
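A minimal sketch of the mismatch described above, on a small made-up data frame (the names `d`, `fit` and `r` are illustrative, not from the post). Under R's usual default `na.action` (`na.omit`), the residual vector only has entries for complete rows, although its names still record which rows those were:

```r
# Made-up data: row 2 has a missing response, row 5 a missing regressor
d <- data.frame(x = c(1, 2, 3, 4, NA, 6),
                y = c(1.1, NA, 2.9, 4.2, 5.0, 6.1))

fit <- lm(y ~ x, data = d)   # no na.action given, so the session default applies
r <- resid(fit)

length(r)   # 4: rows 2 and 5 were dropped before fitting
nrow(d)     # 6
names(r)    # "1" "3" "4" "6": row names of the rows that were kept
```

Matching through the names, e.g. `d[names(r), ]`, is one workaround; the replies below point to a cleaner one.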

Thank you for your time and attention,


-- Vivek Satsangi
Student, Rochester, NY USA

Gabor Grothendieck

2006-Jan-18 13:33 UTC


1. Try using

   lm(...whatever..., na.action = na.exclude)

2. Be sure to read the note on Using Time Series in ?lm

3. The dyn package will accept ts, irts, its and zoo class time series
and output time series for the residuals.  Just preface lm with dyn$.
e.g.

library(dyn)

# test data: quarterly series with a missing value at position 5
set.seed(1)
x <- ts(1:10, start = 2000, frequency = 4)
x[5] <- NA
y <- x + rnorm(10)

# regress series y against series x
y.lm <- dyn$lm(y ~ x)
resid(y.lm)  # note that residuals are a time series
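For point 1, a minimal sketch with plain `lm()` and a small made-up data frame (`d`, `fit` and `r` are illustrative names). With `na.action = na.exclude` the incomplete rows are still left out of the fit, but `resid()` pads its result with NA so that it lines up with the original rows:

```r
# Made-up data: row 2 has a missing response, row 5 a missing regressor
d <- data.frame(x = c(1, 2, 3, 4, NA, 6),
                y = c(1.1, NA, 2.9, 4.2, 5.0, 6.1))

# na.exclude drops incomplete rows for fitting, but residuals() and
# fitted() put NA back in their positions
fit <- lm(y ~ x, data = d, na.action = na.exclude)
r <- resid(fit)

length(r)         # 6, same as nrow(d)
which(is.na(r))   # positions 2 and 5
d$resid <- r      # residuals now line up row-for-row with d
```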



Prof Brian Ripley

2006-Jan-18 13:51 UTC


It seems you are looking for na.action = na.exclude.

This is described in all good books on S/R (e.g. MASS4, p. 141) and also 
in an answer on this list already this week.

It is even on the help pages for residuals.lm and fitted.
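Applied to the per-quarter workflow from the original question, a sketch might look like the following. The data frame, its `quarter` column and the `split()`/`lapply()` structure are assumptions for illustration, not the poster's actual code. The point is that with `na.exclude` both `residuals()` and `fitted()` return one value per input row, so they can be bound straight back onto each quarter's data:

```r
set.seed(1)
# Hypothetical panel: 3 quarters x 5 rows, with a few missing values
d <- data.frame(quarter = rep(c("Q1", "Q2", "Q3"), each = 5),
                x = rnorm(15),
                y = rnorm(15))
d$x[c(2, 9)] <- NA
d$y[13] <- NA

# One regression per quarter, excluding (not omitting) incomplete rows
byQuarter <- split(d, d$quarter)
fits <- lapply(byQuarter,
               function(q) lm(y ~ x, data = q, na.action = na.exclude))

# Residuals and fitted values are padded with NA, so each has one entry
# per row of its quarter and can be attached directly
withResid <- do.call(rbind, Map(function(q, f) {
  q$resid  <- residuals(f)
  q$fitted <- fitted(f)
  q
}, byQuarter, fits))

sum(is.na(withResid$resid))   # 3: one per incomplete row
```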

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Berton Gunter

2006-Jan-18 16:13 UTC


I'm afraid you're a day late and a dollar short: see ?na.exclude.
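Besides passing `na.action` to each call, the session default can be changed through `options()`, which `lm()` consults when no `na.action` argument is given (see `?lm` and `?options`). A minimal sketch with made-up data; because the option affects every modelling function in the session, the sketch restores the previous value afterwards:

```r
d <- data.frame(x = c(1, 2, NA, 4, 5),
                y = c(2.1, 3.9, 6.2, NA, 9.8))

old <- options(na.action = "na.exclude")   # change the session default
fit <- lm(y ~ x, data = d)                  # no na.action argument needed
options(old)                                # restore the previous default

# The fit remembers it used na.exclude, so residuals stay padded
length(resid(fit)) == nrow(d)   # TRUE
```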

lm() has been around longer than you have, maybe, and is thus pretty well
optimized. Not perfect, mind you, but I think it unlikely that "casual"
suggestions haven't already been considered.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 
