thr3ads.net - R help - [R] predict.loess and NA/NaN values [Aug 2010]

Philipp Pagel

2010-Aug-27 09:41 UTC

[R] predict.loess and NA/NaN values

Hi!

In a current project, I am fitting loess models to subsets of data in
order to use the loess predictions for normalization (similar to what
is done in many microarray analyses). While working on this I ran into
a problem when I tried to predict from the loess models and the data
contained NAs or NaNs. I tracked down the problem to the fact that
predict.loess will not return a value at all when fed with such
values. A toy example:

x <- rnorm(15)
y <- x + rnorm(15)
model.lm <- lm(y~x)
model.loess <- loess(y~x)
predict(model.lm, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))
predict(model.loess, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))

The behaviour of predict.lm meets my expectation: I get a vector of
length 5 where the unpredictable ones are NA or NaN. predict.loess on the
other hand returns only 3 values quietly skipping the last two.

I was unable to find anything in the manual page that explains this
behaviour or says how to change it. So I'm asking the community: Is
there a way to fix this or do I have to code around it?

This is in R 2.11.1 (Linux), by the way.

Thanks in advance

	Philipp


-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

Liaw, Andy

2010-Aug-28 00:57 UTC


This is not much help, but I did a bit of digging by using

  debug(stats:::predict.loess)

and then stepping through the function line by line.  Apparently the
problem happens before the actual prediction is done.  The code

   as.matrix(model.frame(delete.response(terms(object)), newdata))

already omits the NA and NaN, because that is the default behavior
of model.frame().  Consulting ?model.frame, I see that you can
override this by setting the na.action attribute of the data frame
passed to it.  Thus I tried setting

  na.dat = data.frame(x=c(0.5, Inf, -Inf, NA, NaN))
  attr(na.dat, "na.action") = na.pass

This does make the as.matrix(model.frame()) line retain the NA and
NaN, but it bombs in the prediction at the subsequent step.  I guess
it really doesn't like NA as inputs.
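
The row dropping described above can be reproduced without loess at all. The following is a minimal sketch, assuming the stock setting options(na.action = "na.omit"); the one-sided formula ~ x is chosen here only for illustration and is not the call made inside predict.loess.

```r
na.dat <- data.frame(x = c(0.5, Inf, -Inf, NA, NaN))

# Default na.action (na.omit): rows with NA or NaN are dropped, Inf rows stay.
mf.default <- model.frame(~ x, na.dat)
nrow(mf.default)   # 3

# Explicit na.pass: all five rows are retained.
mf.pass <- model.frame(~ x, na.dat, na.action = na.pass)
nrow(mf.pass)      # 5
```

Note that only NA and NaN are treated as missing here; Inf and -Inf pass through model.frame() either way.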

What you can do is patch the code to add the NAs back after the
prediction step (which many predict() methods do).

Cheers,
Andy

Prof Brian Ripley

2010-Aug-30 12:50 UTC


The underlying problem is your expectations.

R (unlike S) was set up many years ago to use na.omit as the default,
and when fitting, both lm() and loess() silently omit cases with
missing values.  So why should prediction from 'newdata' be different
unless documented to be so (which it is nowadays for predict.lm,
even though you are adding to the evidence that was a mistake)?

loess() is somewhat different from lm() in that it does not in general 
allow extrapolation, and the prediction for Inf and NaN is simply 
undefined.

Nevertheless, take a look at the version in R-devel (pre-2.12.0), which
gives you more options.
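
As an illustration of those options: in R releases from 2.12.0 onward, predict.loess accepts an na.action argument for newdata. The sketch below assumes such a version; the evenly spaced x, the seed, and the object names are made up here so that 0.5 is guaranteed to lie inside the fitted range.

```r
set.seed(1)
x <- seq(-2, 2, length.out = 30)
y <- x + rnorm(30, sd = 0.3)
model.loess <- loess(y ~ x)

newdata <- data.frame(x = c(0.5, Inf, -Inf, NA, NaN))

# na.pass keeps every row of newdata; rows that cannot be predicted come back as NA.
p.pass <- predict(model.loess, newdata, na.action = na.pass)
length(p.pass)

# na.omit drops the NA and NaN rows before predicting, as in the original report.
p.omit <- predict(model.loess, newdata, na.action = na.omit)
length(p.omit)
```

With na.pass the result lines up row by row with newdata, which is what a normalization step usually needs.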

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Philipp Pagel

2010-Aug-30 18:48 UTC


> What you can do is patch the code to add the NAs back after the 
> Prediction step (which many predict() methods do).
Thanks, Andy, for your hints and especially for digging into the problem
like this! I have, in the meantime, written a simple wrapper around
predict.loess that fills in the NAs where I would like to have them.
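
The wrapper itself was not posted. A minimal sketch of what such a wrapper might look like follows; the function name predict_loess_na, its xvar argument, and the single-predictor restriction are assumptions made for illustration, not Philipp's actual code.

```r
# Hypothetical helper: predict from a one-predictor loess model and return
# one value per row of newdata, with NA wherever the predictor is not finite.
predict_loess_na <- function(model, newdata, xvar = "x") {
  x   <- newdata[[xvar]]
  ok  <- is.finite(x)                  # FALSE for NA, NaN, Inf and -Inf
  out <- rep(NA_real_, length(x))
  if (any(ok)) {
    # Predict only for the usable rows, then put the values back in place.
    out[ok] <- predict(model, newdata[ok, , drop = FALSE])
  }
  out
}

set.seed(1)
x <- seq(-2, 2, length.out = 30)
y <- x + rnorm(30, sd = 0.3)
model.loess <- loess(y ~ x)

res <- predict_loess_na(model.loess,
                        data.frame(x = c(0.5, Inf, -Inf, NA, NaN)))
length(res)
```

Filtering before the call means the helper does not depend on how a given R version handles missing values inside predict.loess.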

cu
	Philipp
	
-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
