Dear altogether, I tried local regression with the following data. These data are a part of a bigger dataset for which loess is no problem. However, the plot shows extreme values and by looking into the fits, it reveals very extreme values (up to 20000 !) although the original data are > summary(cbind(x,y)) x y Min. :1.800 Min. :2.000 1st Qu.:2.550 1st Qu.:2.750 Median :2.800 Median :3.000 Mean :2.779 Mean :3.093 3rd Qu.:3.050 3rd Qu.:3.450 Max. :4.000 Max. :4.000 > As you can see below, the difference lies in the line predict(mod, data.frame(x=X), se=TRUE) # strange values predict(mod, x=X, se=TRUE) # plausible values What is the difference whether predict() is called via data.frame(x=X) or "just" x=X ???? Here are the data + R-code. It can be repoduced. <--- snip ---> # data x <- c(3.4,2.8,2.6,2.2,2.0,2.8,2.6,2.6,2.8,4.0,2.4,2.8,3.0,3.6,3.2,2.8,3.2,2.4,2.2,1.8,2.8,2.0,3.6,2.6,2.8,3.2,3.0,2.6) y <- c(3.0,2.6,2.8,2.6,3.0,4.0,3.6,2.4,3.0,4.0,2.4,3.4,3.0,3.2,2.8,3.4,3.4,3.8,3.8,3.6,3.2,2.4,3.8,3.0,3.0,2.0,2.6,2.8) par(mfrow=c(2,1)) # normal plot plot(x,y) lines(lowess(x,y)) # loess part mod <- loess(y ~ x, span=.5, degree=1) X <- seq(min(x), max(x), length=50) fit <- predict(mod, data.frame(x=X), se=TRUE) zv <- qnorm((1 + .95)/2) lower <- fit$fit - zv*fit$se upper <- fit$fit + zv*fit$se plot(x, y, ylim=range(y, lower, upper)) lines(X, fit$fit) # strange values in fit fit # here is the difference!! predict(mod, data.frame(x=X), se=TRUE) predict(mod, x=X, se=TRUE) <--- end of snip ---> I assume this has some reason but I do not understand this reason. Merci, best regards leo g??rtler
On Tue, 2005-12-06 at 18:09 +0100, Leo G伱伡rtler wrote:> Dear altogether,<snip>> > # here is the difference!! > predict(mod, data.frame(x=X), se=TRUE) > predict(mod, x=X, se=TRUE) > > > <--- end of snip ---> > > I assume this has some reason but I do not understand this reason. > Merci,Not sure if this is the reason, but there is no argument x in predict.loess, and: a <- predict(mod, se = TRUE) gives you the same results as: b <- predict(mod, x=X, se=TRUE) so the x argument appears to be being passed on/in the ... arguments and ignored? As such, you have no newdata, so mod$x is used. Now, when you do: c <- predict(mod, data.frame(x=X), se=TRUE) You have used an un-named argument in position 2. R takes this to be what you want to use for newdata and so works with this data rather than the one in mod$x as in the first case: # now named second argument - gets ignored as in a and b d <- predict(mod, x = data.frame(x=X), se=TRUE) all.equal(a, b) # TRUE all.equal(a, c) # FALSE all.equal(a, d) # TRUE # this time we assign X to x by using (), the result is used as newdata e <- predict(mod, (x=X), se=TRUE) all.equal(c, e) # TRUE If in doubt, name your arguments and check the help! ?predict.loess would have quickly shown you where the problem lay. HTH G> > best regards > > leo g伱伡rtler > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson wrote: Dear list, I am very sorry for being inaccurate in my question. But re-reading the predict.loess help site does not provide a solution. As long as predict is used on a new dataset based on this dataset, the strange values remain and can be reproduced. Adding a new element to both vectors (at the beginning, e.g. "1" for each vector) results in plausible values - but not in every case. Even switching x and y is sufficient (i.e. x as predictor and y as dependent variable). So my question is: Is it normal - or under which conditions does it take place - that predict.loess predicts values that are almost 20000/max(y) ~ 5000 times higher than expected? best, leo g伱伡rtler>On Tue, 2005-12-06 at 18:09 +0100, Leo G伱伡rtler wrote: > > >>Dear altogether, >> >> ><snip> > > >># here is the difference!! >>predict(mod, data.frame(x=X), se=TRUE) >>predict(mod, x=X, se=TRUE) >> >> >><--- end of snip ---> >> >>I assume this has some reason but I do not understand this reason. >>Merci, >> >> > >Not sure if this is the reason, but there is no argument x in >predict.loess, and: > >a <- predict(mod, se = TRUE) > >gives you the same results as: > >b <- predict(mod, x=X, se=TRUE) > >so the x argument appears to be being passed on/in the ... arguments and >ignored? As such, you have no newdata, so mod$x is used. > >Now, when you do: > >c <- predict(mod, data.frame(x=X), se=TRUE) > >You have used an un-named argument in position 2. R takes this to be >what you want to use for newdata and so works with this data rather than >the one in mod$x as in the first case: > ># now named second argument - gets ignored as in a and b >d <- predict(mod, x = data.frame(x=X), se=TRUE) > >all.equal(a, b) # TRUE >all.equal(a, c) # FALSE >all.equal(a, d) # TRUE > ># this time we assign X to x by using (), the result is used as newdata >e <- predict(mod, (x=X), se=TRUE) > >all.equal(c, e) # TRUE > >If in doubt, name your arguments and check the help! ?predict.loess >would have quickly shown you where the problem lay. > >HTH > >G > > > >>best regards >> >>leo g伱伡rtler >> >>______________________________________________ >>R-help at stat.math.ethz.ch mailing list >>https://stat.ethz.ch/mailman/listinfo/r-help >>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >> >>-- email: leog at anicca-vijja.de www: http://www.anicca-vijja.de/
_______________________________________________________________________________________ The problem appears to be in how your original data has several tied values:> table(x)x 1.8 2 2.2 2.4 2.6 2.8 3 3.2 3.4 3.6 4 1 2 2 2 5 7 2 3 1 2 1 IIRC the maths and programming behind loess assume unique values for the predictor. One way to get around this is to jitter your data:> x2 <- jitter(x) > modj <- loess(y ~ x2, span=.5, degree=1) > predict(modj, data.frame(x=X))[1] 3.156192 3.141705 3.126918 3.112996 3.101108 3.087696 3.063471 3.038609 3.024639 3.032585 3.059480 3.091774 [13] 3.115763 3.117743 3.092979 3.040798 2.988283 2.957976 2.950648 3.008358 3.070065 3.127379 3.193501 3.149428 [25] 3.082843 3.010998 2.939407 2.888213 2.841487 2.812815 2.801583 2.807181 2.837887 2.899130 2.978165 3.062088 [37] 3.137995 3.204628 3.271813 3.339450 3.407396 3.475510 3.543843 3.612450 3.681267 3.750227 3.819267 3.888321 [49] 3.957324 4.026212 Another way is to summarise your data using table() and aggregate(), and fit a weighted model where the weights are the counts for each unique x-value:> dtab <- aggregate(data.frame(y=y), by=list(x=x), FUN=mean) > dtab$x <- as.numeric(as.character(dtab$x)) > dtab$w <- table(x) > modt <- loess(y ~ x, span=.5, degree=1, weights=w, data=dtab) > predict(modt, data.frame(x=X))[1] 3.186959 3.163133 3.136244 3.110822 3.091396 3.076705 3.047705 3.018362 3.007143 3.032246 3.069599 3.092369 [13] 3.098049 3.084134 3.053633 3.027429 3.012429 3.013908 3.036517 3.060372 3.076116 3.086870 3.095758 3.097287 [25] 3.073824 3.031238 2.976659 2.917402 2.863489 2.821469 2.796398 2.793336 2.823850 2.892363 2.980322 3.068725 [37] 3.140843 3.208920 3.279124 3.351965 3.427952 3.504330 3.577149 3.647119 3.714984 3.781486 3.847369 3.913375 [49] 3.980249 4.048733 There's probably a way to make the aggregate and table calls neater. -- Hong Ooi Senior Research Analyst, IAG Limited 388 George St, Sydney NSW 2000 +61 (2) 9292 1566 -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Leo G??rtler Sent: Wednesday, 7 December 2005 8:10 AM To: r-help at stat.math.ethz.ch Cc: gavin.simpson at ucl.ac.uk Subject: Re: [R] strange behavior of loess() & predict() Gavin Simpson wrote: Dear list, I am very sorry for being inaccurate in my question. But re-reading the predict.loess help site does not provide a solution. As long as predict is used on a new dataset based on this dataset, the strange values remain and can be reproduced. Adding a new element to both vectors (at the beginning, e.g. "1" for each vector) results in plausible values - but not in every case. Even switching x and y is sufficient (i.e. x as predictor and y as dependent variable). So my question is: Is it normal - or under which conditions does it take place - that predict.loess predicts values that are almost 20000/max(y) ~ 5000 times higher than expected? best, leo g??rtler>On Tue, 2005-12-06 at 18:09 +0100, Leo G??rtler wrote: > > >>Dear altogether, >> >> ><snip> > > >># here is the difference!! >>predict(mod, data.frame(x=X), se=TRUE) >>predict(mod, x=X, se=TRUE) >> >> >><--- end of snip ---> >> >>I assume this has some reason but I do not understand this reason. >>Merci, >> >> > >Not sure if this is the reason, but there is no argument x in >predict.loess, and: > >a <- predict(mod, se = TRUE) > >gives you the same results as: > >b <- predict(mod, x=X, se=TRUE) > >so the x argument appears to be being passed on/in the ... arguments and >ignored? As such, you have no newdata, so mod$x is used. > >Now, when you do: > >c <- predict(mod, data.frame(x=X), se=TRUE) > >You have used an un-named argument in position 2. R takes this to be >what you want to use for newdata and so works with this data rather than >the one in mod$x as in the first case: > ># now named second argument - gets ignored as in a and b >d <- predict(mod, x = data.frame(x=X), se=TRUE) > >all.equal(a, b) # TRUE >all.equal(a, c) # FALSE >all.equal(a, d) # TRUE > ># this time we assign X to x by using (), the result is used as newdata >e <- predict(mod, (x=X), se=TRUE) > >all.equal(c, e) # TRUE > >If in doubt, name your arguments and check the help! ?predict.loess >would have quickly shown you where the problem lay. > >HTH > >G > > > >>best regards >> >>leo g??rtler >> >>______________________________________________ >>R-help at stat.math.ethz.ch mailing list >>https://stat.ethz.ch/mailman/listinfo/r-help >>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >> >>-- email: leog at anicca-vijja.de www: http://www.anicca-vijja.de/ ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html _______________________________________________________________________________________ The information transmitted in this message and its attachme...{{dropped}}