thr3ads.net - R help - [R] Unexpected behavior of predict and interval="confidence" [Oct 2006]

If this information is useful, please help other people find it:
Share via:

Kevin Middleton

2006-Oct-29 17:54 UTC

[R] Unexpected behavior of predict and interval="confidence"

Based on some recent r-help discussions, I have been trying out  
plotting confidence intervals using predict and matplot. Matplot  
appeared to not be plotting the linear regression when using the  
default column names generated by read.table (V1, V2, etc). On  
further inspection, the error seemed to be with predict and vector  
names (V1, V2) rather than with matplot. I was using some textbook  
data sets that were read with read.table, so my data.frame columns  
were named by default. The problem seems to be related to the name of  
the vector only (though I may be wrong).

The example below, based on that in ?predict.lm, better describes the  
problem. Using predict with V2~V1 & y~x produces identical output  
(when x<-V1 and y<-V2). However, using predict with  
interval="confidence" results in different output from the same data.
That with y~x is correct, and V2~V1 is "incorrect".

This may be related to a previous post on r-help:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56982.html

I can't figure out why there would be a difference in predict when  
only the vector name seemingly changes. Granted, I could just read  
the data.frame is as "x" and "y."

Thanks
Kevin

######################

set.seed(10)

# For example purposes, plot side by side
par(mfrow=c(1,2))

V1 <- rnorm(15)
V2 <- V1 + rnorm(15)

new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1)))

pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence")
matplot(new$x, pred.w.clim,
         lty=c(1,2,2), type="l",
         col=c("black", "red", "red"),
         ylab="predicted y")
points(V1,V2)


# Create x & y equal to V1 & V2
x <- V1
y <- V2

pred.w.clim2 <- predict(lm(y ~ x), new, interval="confidence")
matplot(new$x, pred.w.clim2,
         lty=c(1,2,2), type="l",
         col=c("black", "red", "red"),
         ylab="predicted y")
points(x,y)

# Test if V1=x, V2=y
all.equal(x,V1)
all.equal(y,V2)

# Same output with predict
predict(lm(V2 ~ V1))
predict(lm(y ~ x))
all.equal(predict(lm(V2 ~ V1)), predict(lm(y ~ x)))

# Different output with interval="confidence"
pred.w.clim
pred.w.clim2
all.equal(pred.w.clim, pred.w.clim2)

Gavin Simpson

2006-Oct-30 08:07 UTC

head link

[R] Unexpected behavior of predict and interval="confidence"

On Sun, 2006-10-29 at 12:54 -0500, Kevin Middleton
wrote:> Based on some recent r-help discussions, I have been trying out  
> plotting confidence intervals using predict and matplot. Matplot  
> appeared to not be plotting the linear regression when using the  
> default column names generated by read.table (V1, V2, etc). On  
> further inspection, the error seemed to be with predict and vector  
> names (V1, V2) rather than with matplot. I was using some textbook  
> data sets that were read with read.table, so my data.frame columns  
> were named by default. The problem seems to be related to the name of  
> the vector only (though I may be wrong).
> 
> The example below, based on that in ?predict.lm, better describes the  
> problem. Using predict with V2~V1 & y~x produces identical output  
> (when x<-V1 and y<-V2). However, using predict with  
> interval="confidence" results in different output from the same
data.
> That with y~x is correct, and V2~V1 is "incorrect".
This is because you have been caught out by the way newdata is handled
in predict.

This is your problem:

new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1)))
names(new)

You have created new, a 1 column data frame with the colname "x". As
"x"
doesn't exist in the model formula (V2 ~ V1), what the predict line
actually means, is to predict value for the observed data used to fit
them model, i.e. the fitted values:

pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence")
fit <- fitted(lm(V2 ~ V1))
all.equal(fit, pred.w.clim[, 1])

When you do predict(lm(y ~ x), new, interval="confidence"),
"x" is now
in the model formula *and* in "new" so you get predictions for the x
values in "new".

If you'd done this:

new <- data.frame(V1 = seq(min(V1), max(V1), length.out = length(V1)))
predict(lm(V2 ~ V1), new, interval="confidence")

Then you'd have gotten what you wanted.

HTH

G
> 
> This may be related to a previous post on r-help:
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56982.html
> 
> I can't figure out why there would be a difference in predict when  
> only the vector name seemingly changes. Granted, I could just read  
> the data.frame is as "x" and "y."
> 
> Thanks
> Kevin
> 
> ######################
> 
> set.seed(10)
> 
> # For example purposes, plot side by side
> par(mfrow=c(1,2))
> 
> V1 <- rnorm(15)
> V2 <- V1 + rnorm(15)
> 
> new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1)))
> 
> pred.w.clim <- predict(lm(V2 ~ V1), new,
interval="confidence")
> matplot(new$x, pred.w.clim,
>          lty=c(1,2,2), type="l",
>          col=c("black", "red", "red"),
>          ylab="predicted y")
> points(V1,V2)
> 
> 
> # Create x & y equal to V1 & V2
> x <- V1
> y <- V2
> 
> pred.w.clim2 <- predict(lm(y ~ x), new, interval="confidence")
> matplot(new$x, pred.w.clim2,
>          lty=c(1,2,2), type="l",
>          col=c("black", "red", "red"),
>          ylab="predicted y")
> points(x,y)
> 
> # Test if V1=x, V2=y
> all.equal(x,V1)
> all.equal(y,V2)
> 
> # Same output with predict
> predict(lm(V2 ~ V1))
> predict(lm(y ~ x))
> all.equal(predict(lm(V2 ~ V1)), predict(lm(y ~ x)))
> 
> # Different output with interval="confidence"
> pred.w.clim
> pred.w.clim2
> all.equal(pred.w.clim, pred.w.clim2)
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [t] +44 (0)20 7679 0522
ECRC                              [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building                  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK                        [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT                          [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Possibly Parallel Threads

Search for more apparently analagous threads

R help - Oct 2006 - Unexpected behavior of predict and interval="confidence"

[R] Unexpected behavior of predict and interval="confidence"

[R] Unexpected behavior of predict and interval="confidence"

Possibly Parallel Threads