Hi!

In a current project, I am fitting loess models to subsets of data in order to use the loess predictions for normalization (similar to what is done in many microarray analyses). While working on this I ran into a problem when I tried to predict from the loess models and the data contained NAs or NaNs. I tracked down the problem to the fact that predict.loess will not return a value at all when fed with such values. A toy example:

    x <- rnorm(15)
    y <- x + rnorm(15)
    model.lm <- lm(y~x)
    model.loess <- loess(y~x)
    predict(model.lm, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))
    predict(model.loess, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))

The behaviour of predict.lm meets my expectation: I get a vector of length 5 where the unpredictable ones are NA or NaN. predict.loess, on the other hand, returns only 3 values, quietly skipping the last two.

I was unable to find anything in the manual page that explains this behaviour or says how to change it. So I'm asking the community: Is there a way to fix this or do I have to code around it?

This is in R 2.11.1 (Linux), by the way.

Thanks in advance

Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
From: Philipp Pagel
>
> predict(model.lm, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))
> predict(model.loess, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))
>
> The behaviour of predict.lm meets my expectation: I get a vector of
> length 5 where the unpredictable ones are NA or NaN.
> predict.loess on the
> other hand returns only 3 values quietly skipping the last two.
>
> I was unable to find anything in the manual page that explains this
> behaviour or says how to change it. So I'm asking the community: Is
> there a way to fix this or do I have to code around it?

This is not much help, but I did a bit of digging by running

    debug(stats:::predict.loess)

and then stepping through the function line by line. Apparently the problem happens before the actual prediction is done. By the time the line

    as.matrix(model.frame(delete.response(terms(object)), newdata))

has run, the NA and NaN have already been omitted, because that is the default behavior of model.frame(). Consulting ?model.frame, I see that you can override this by setting the na.action attribute of the data frame passed to it. Thus I tried setting

    na.dat = data.frame(x=c(0.5, Inf, -Inf, NA, NaN))
    attr(na.dat, "na.action") = na.pass

This does make the as.matrix(model.frame()) line retain the NA and NaN, but it bombs in the prediction at the subsequent step. I guess it really doesn't like NAs as inputs.

What you can do is patch the code to add the NAs back after the prediction step (which many predict() methods do).
Cheers,
Andy

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
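The model.frame() behaviour Andy describes can be checked in isolation, without loess. The following is a minimal sketch, assuming a one-sided formula ~ x stands in for delete.response(terms(object)); the names mf.default and mf.pass are made up for illustration, and the options() call only pins the standard default so the result does not depend on session settings.

```r
# Pin the standard default so the example does not depend on the session.
old <- options(na.action = "na.omit")

na.dat <- data.frame(x = c(0.5, Inf, -Inf, NA, NaN))

# Without an "na.action" attribute, model.frame() falls back to
# getOption("na.action"), so the NA and NaN rows are dropped.
# Inf and -Inf are not missing values and are kept.
mf.default <- model.frame(~ x, na.dat)
nrow(mf.default)

# An "na.action" attribute on the data overrides the option.
attr(na.dat, "na.action") <- na.pass
mf.pass <- model.frame(~ x, na.dat)
nrow(mf.pass)

options(old)  # restore the previous setting
```

This only shows why rows disappear before prediction; as noted above, passing the retained NAs on to the loess prediction step in R 2.11.1 still fails.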
The underlying problem is your expectations. R (unlike S) was set up many years ago to use na.omit as the default, and when fitting, both lm() and loess() silently omit cases with missing values. So why should prediction from 'newdata' be different unless documented to be so (which it is nowadays for predict.lm, even though you are adding to the evidence that was a mistake)?

loess() is somewhat different from lm() in that it does not in general allow extrapolation, and the prediction for Inf and NaN is simply undefined.

Nevertheless, take a look at the version in R-devel (pre-2.12.0), which gives you more options.

On Fri, 27 Aug 2010, Philipp Pagel wrote:

> The behaviour of predict.lm meets my expectation: I get a vector of
> length 5 where the unpredictable ones are NA or NaN. predict.loess on the
> other hand returns only 3 values quietly skipping the last two.
>
> I was unable to find anything in the manual page that explains this
> behaviour or says how to change it. So I'm asking the community: Is
> there a way to fix this or do I have to code around it?
>
> This is in R 2.11.1 (Linux), by the way.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
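For readers on a later release: the "more options" mentioned above appear, as far as I can tell, as an na.action argument of predict.loess (documented in ?predict.loess with default na.pass). The following is a sketch under that assumption and will not behave this way in R 2.11.1; set.seed() and the name pred are additions for reproducibility and illustration only.

```r
set.seed(1)  # added for reproducibility; not part of the original example
x <- rnorm(15)
y <- x + rnorm(15)
model.loess <- loess(y ~ x)

newdata <- data.frame(x = c(0.5, Inf, -Inf, NA, NaN))

# Assumes a version of R whose predict.loess accepts na.action
# (2.12.0 or later, per the message above). With na.pass, missing
# inputs are kept and yield NA rather than being dropped.
pred <- predict(model.loess, newdata, na.action = na.pass)
length(pred)
```

With na.pass the result should have one element per row of newdata, which is the behaviour the original question was asking for.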
> What you can do is patch the code to add the NAs back after the
> Prediction step (which many predict() methods do).

Thanks, Andy, for your hints and especially for digging into the problem like this! In the meantime, I have written a simple wrapper around predict.loess that fills in the NAs where I would like to have them.

cu
Philipp
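The wrapper itself was not posted. A minimal sketch of what such a wrapper might look like, assuming a loess fit with a single predictor, is given below; the function name predict_loess_keep_na and its var argument are hypothetical and are not Philipp's actual code.

```r
# Hypothetical helper (not the wrapper from the thread): predict from a
# single-predictor loess fit and return one value per input, leaving NA
# wherever the input is NA or NaN.
predict_loess_keep_na <- function(model, x, var = "x") {
  out <- rep(NA_real_, length(x))
  ok <- !is.na(x)  # is.na() is TRUE for both NA and NaN, FALSE for Inf
  if (any(ok)) {
    # Build newdata with the predictor name used when fitting the model.
    newdata <- setNames(data.frame(x[ok]), var)
    out[ok] <- predict(model, newdata)
  }
  out
}

set.seed(1)  # added for reproducibility
x <- rnorm(15)
y <- x + rnorm(15)
model.loess <- loess(y ~ x)

pred <- predict_loess_keep_na(model.loess, c(0.5, Inf, -Inf, NA, NaN))
length(pred)
```

Only the non-missing inputs are handed to predict(), so the helper sidesteps the row-dropping in model.frame() rather than changing it. Inf and -Inf are still passed through, and what predict.loess returns for them is left untouched.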