thr3ads.net - R help - [R] strange behavior of loess() & predict() [Dec 2005]

If this information is useful, please help other people find it:
Share via:

Leo Gürtler

2005-Dec-06 17:09 UTC

[R] strange behavior of loess() & predict()

Dear altogether,

I tried local regression with the following data. These data are a part 
of a bigger dataset for which loess is no problem.
However, the plot shows extreme values and by looking into the fits, it 
reveals very extreme values (up to 20000 !) although the original data are

 > summary(cbind(x,y))
       x               y       
 Min.   :1.800   Min.   :2.000 
 1st Qu.:2.550   1st Qu.:2.750 
 Median :2.800   Median :3.000 
 Mean   :2.779   Mean   :3.093 
 3rd Qu.:3.050   3rd Qu.:3.450 
 Max.   :4.000   Max.   :4.000 
 >

 As you can see below, the difference lies in the line

predict(mod, data.frame(x=X), se=TRUE)   # strange values
predict(mod, x=X, se=TRUE)                     # plausible values

What is the difference whether predict() is called via

data.frame(x=X) or "just" x=X ????

Here are the data + R-code. It can be repoduced.

<--- snip --->

# data
x <- 
c(3.4,2.8,2.6,2.2,2.0,2.8,2.6,2.6,2.8,4.0,2.4,2.8,3.0,3.6,3.2,2.8,3.2,2.4,2.2,1.8,2.8,2.0,3.6,2.6,2.8,3.2,3.0,2.6)
y <- 
c(3.0,2.6,2.8,2.6,3.0,4.0,3.6,2.4,3.0,4.0,2.4,3.4,3.0,3.2,2.8,3.4,3.4,3.8,3.8,3.6,3.2,2.4,3.8,3.0,3.0,2.0,2.6,2.8)

par(mfrow=c(2,1))

# normal plot
plot(x,y)
lines(lowess(x,y))

# loess part
mod <- loess(y ~ x, span=.5, degree=1)
X <- seq(min(x), max(x), length=50)
fit <- predict(mod, data.frame(x=X), se=TRUE)
zv <- qnorm((1 + .95)/2)
lower <- fit$fit - zv*fit$se
upper <- fit$fit + zv*fit$se
plot(x, y, ylim=range(y, lower, upper))
lines(X, fit$fit)

# strange values in fit
fit

# here is the difference!!
predict(mod, data.frame(x=X), se=TRUE)
predict(mod, x=X, se=TRUE)


<--- end of snip --->

I assume this has some reason but I do not understand this reason.
Merci,

best regards

leo g??rtler

Gavin Simpson

2005-Dec-06 17:49 UTC

head link

[R] strange behavior of loess() & predict()

On Tue, 2005-12-06 at 18:09 +0100, Leo G伱伡rtler wrote:> Dear altogether,
<snip>> 
> # here is the difference!!
> predict(mod, data.frame(x=X), se=TRUE)
> predict(mod, x=X, se=TRUE)
> 
> 
> <--- end of snip --->
> 
> I assume this has some reason but I do not understand this reason.
> Merci,
Not sure if this is the reason, but there is no argument x in
predict.loess, and:

a <- predict(mod, se = TRUE)

gives you the same results as:

b <- predict(mod, x=X, se=TRUE)

so the x argument appears to be being passed on/in the ... arguments and
ignored? As such, you have no newdata, so mod$x is used.

Now, when you do:

c <- predict(mod, data.frame(x=X), se=TRUE)

You have used an un-named argument in position 2. R takes this to be
what you want to use for newdata and so works with this data rather than
the one in mod$x as in the first case:

# now named second argument - gets ignored as in a and b
d <- predict(mod, x = data.frame(x=X), se=TRUE)

all.equal(a, b) # TRUE
all.equal(a, c) # FALSE
all.equal(a, d) # TRUE

# this time we assign X to x by using (), the result is used as newdata
e <-  predict(mod, (x=X), se=TRUE)

all.equal(c, e) # TRUE

If in doubt, name your arguments and check the help! ?predict.loess
would have quickly shown you where the problem lay.

HTH

G
> 
> best regards
> 
> leo g伱伡rtler
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Leo Gürtler

2005-Dec-06 21:09 UTC

head link

[R] strange behavior of loess() & predict()

Gavin Simpson wrote:

Dear list,

I am very sorry for being inaccurate in my question. But re-reading the 
predict.loess help site does not provide a solution. As long as predict 
is used on a new dataset based on this dataset, the strange values 
remain and can be reproduced.
Adding a new element to both vectors (at the beginning, e.g. "1" for 
each vector) results in plausible values - but not in every case.
Even switching x and y is sufficient (i.e. x as predictor and y as 
dependent variable). So my question is:

Is it normal - or under which conditions does it take place - that 
predict.loess predicts values that are almost 20000/max(y) ~ 5000 times 
higher than expected?

best,

leo g伱伡rtler
>On Tue, 2005-12-06 at 18:09 +0100, Leo G伱伡rtler wrote:
>  
>
>>Dear altogether,
>>    
>>
><snip>
>  
>
>># here is the difference!!
>>predict(mod, data.frame(x=X), se=TRUE)
>>predict(mod, x=X, se=TRUE)
>>
>>
>><--- end of snip --->
>>
>>I assume this has some reason but I do not understand this reason.
>>Merci,
>>    
>>
>
>Not sure if this is the reason, but there is no argument x in
>predict.loess, and:
>
>a <- predict(mod, se = TRUE)
>
>gives you the same results as:
>
>b <- predict(mod, x=X, se=TRUE)
>
>so the x argument appears to be being passed on/in the ... arguments and
>ignored? As such, you have no newdata, so mod$x is used.
>
>Now, when you do:
>
>c <- predict(mod, data.frame(x=X), se=TRUE)
>
>You have used an un-named argument in position 2. R takes this to be
>what you want to use for newdata and so works with this data rather than
>the one in mod$x as in the first case:
>
># now named second argument - gets ignored as in a and b
>d <- predict(mod, x = data.frame(x=X), se=TRUE)
>
>all.equal(a, b) # TRUE
>all.equal(a, c) # FALSE
>all.equal(a, d) # TRUE
>
># this time we assign X to x by using (), the result is used as newdata
>e <-  predict(mod, (x=X), se=TRUE)
>
>all.equal(c, e) # TRUE
>
>If in doubt, name your arguments and check the help! ?predict.loess
>would have quickly shown you where the problem lay.
>
>HTH
>
>G
>
>  
>
>>best regards
>>
>>leo g伱伡rtler
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>>    
>>

-- 

email: leog at anicca-vijja.de
www: http://www.anicca-vijja.de/

Hong Ooi

2005-Dec-06 23:41 UTC

head link

[R] strange behavior of loess() & predict()

_______________________________________________________________________________________


The problem appears to be in how your original data has several tied values:
> table(x)x
1.8   2 2.2 2.4 2.6 2.8   3 3.2 3.4 3.6   4 
  1   2   2   2   5   7   2   3   1   2   1

IIRC the maths and programming behind loess assume unique values for the
predictor.

One way to get around this is to jitter your data:
> x2 <- jitter(x)
> modj <- loess(y ~ x2, span=.5, degree=1)
> predict(modj, data.frame(x=X)) [1] 3.156192 3.141705 3.126918 3.112996 3.101108 3.087696 3.063471 3.038609
3.024639 3.032585 3.059480 3.091774
[13] 3.115763 3.117743 3.092979 3.040798 2.988283 2.957976 2.950648 3.008358
3.070065 3.127379 3.193501 3.149428
[25] 3.082843 3.010998 2.939407 2.888213 2.841487 2.812815 2.801583 2.807181
2.837887 2.899130 2.978165 3.062088
[37] 3.137995 3.204628 3.271813 3.339450 3.407396 3.475510 3.543843 3.612450
3.681267 3.750227 3.819267 3.888321
[49] 3.957324 4.026212

Another way is to summarise your data using table() and aggregate(), and fit a
weighted model where the weights are the counts for each unique x-value:
> dtab <-  aggregate(data.frame(y=y), by=list(x=x), FUN=mean)
> dtab$x <- as.numeric(as.character(dtab$x))
> dtab$w <- table(x)
> modt <- loess(y ~ x, span=.5, degree=1, weights=w, data=dtab)
> predict(modt, data.frame(x=X)) [1] 3.186959 3.163133 3.136244 3.110822 3.091396 3.076705 3.047705 3.018362
3.007143 3.032246 3.069599 3.092369
[13] 3.098049 3.084134 3.053633 3.027429 3.012429 3.013908 3.036517 3.060372
3.076116 3.086870 3.095758 3.097287
[25] 3.073824 3.031238 2.976659 2.917402 2.863489 2.821469 2.796398 2.793336
2.823850 2.892363 2.980322 3.068725
[37] 3.140843 3.208920 3.279124 3.351965 3.427952 3.504330 3.577149 3.647119
3.714984 3.781486 3.847369 3.913375
[49] 3.980249 4.048733

There's probably a way to make the aggregate and table calls neater.


-- 
Hong Ooi
Senior Research Analyst, IAG Limited
388 George St, Sydney NSW 2000
+61 (2) 9292 1566
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at
stat.math.ethz.ch] On Behalf Of Leo G??rtler
Sent: Wednesday, 7 December 2005 8:10 AM
To: r-help at stat.math.ethz.ch
Cc: gavin.simpson at ucl.ac.uk
Subject: Re: [R] strange behavior of loess() & predict()

Gavin Simpson wrote:

Dear list,

I am very sorry for being inaccurate in my question. But re-reading the 
predict.loess help site does not provide a solution. As long as predict 
is used on a new dataset based on this dataset, the strange values 
remain and can be reproduced.
Adding a new element to both vectors (at the beginning, e.g. "1" for 
each vector) results in plausible values - but not in every case.
Even switching x and y is sufficient (i.e. x as predictor and y as 
dependent variable). So my question is:

Is it normal - or under which conditions does it take place - that 
predict.loess predicts values that are almost 20000/max(y) ~ 5000 times 
higher than expected?

best,

leo g??rtler
>On Tue, 2005-12-06 at 18:09 +0100, Leo G??rtler wrote:
>  
>
>>Dear altogether,
>>    
>>
><snip>
>  
>
>># here is the difference!!
>>predict(mod, data.frame(x=X), se=TRUE)
>>predict(mod, x=X, se=TRUE)
>>
>>
>><--- end of snip --->
>>
>>I assume this has some reason but I do not understand this reason.
>>Merci,
>>    
>>
>
>Not sure if this is the reason, but there is no argument x in
>predict.loess, and:
>
>a <- predict(mod, se = TRUE)
>
>gives you the same results as:
>
>b <- predict(mod, x=X, se=TRUE)
>
>so the x argument appears to be being passed on/in the ... arguments and
>ignored? As such, you have no newdata, so mod$x is used.
>
>Now, when you do:
>
>c <- predict(mod, data.frame(x=X), se=TRUE)
>
>You have used an un-named argument in position 2. R takes this to be
>what you want to use for newdata and so works with this data rather than
>the one in mod$x as in the first case:
>
># now named second argument - gets ignored as in a and b
>d <- predict(mod, x = data.frame(x=X), se=TRUE)
>
>all.equal(a, b) # TRUE
>all.equal(a, c) # FALSE
>all.equal(a, d) # TRUE
>
># this time we assign X to x by using (), the result is used as newdata
>e <-  predict(mod, (x=X), se=TRUE)
>
>all.equal(c, e) # TRUE
>
>If in doubt, name your arguments and check the help! ?predict.loess
>would have quickly shown you where the problem lay.
>
>HTH
>
>G
>
>  
>
>>best regards
>>
>>leo g??rtler
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>>    
>>

-- 

email: leog at anicca-vijja.de
www: http://www.anicca-vijja.de/

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


_______________________________________________________________________________________

The information transmitted in this message and its attachme...{{dropped}}

Maybe Matching Threads

Search for more possibly parallel threads

R help - Dec 2005 - strange behavior of loess() & predict()

[R] strange behavior of loess() & predict()

[R] strange behavior of loess() & predict()

[R] strange behavior of loess() & predict()

[R] strange behavior of loess() & predict()

Maybe Matching Threads