Kevin Middleton
2006-Oct-29 17:54 UTC
[R] Unexpected behavior of predict and interval="confidence"
Based on some recent r-help discussions, I have been trying out plotting confidence intervals using predict and matplot. Matplot appeared to not be plotting the linear regression when using the default column names generated by read.table (V1, V2, etc). On further inspection, the error seemed to be with predict and vector names (V1, V2) rather than with matplot. I was using some textbook data sets that were read with read.table, so my data.frame columns were named by default. The problem seems to be related to the name of the vector only (though I may be wrong). The example below, based on that in ?predict.lm, better describes the problem. Using predict with V2~V1 & y~x produces identical output (when x<-V1 and y<-V2). However, using predict with interval="confidence" results in different output from the same data. That with y~x is correct, and V2~V1 is "incorrect". This may be related to a previous post on r-help: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56982.html I can't figure out why there would be a difference in predict when only the vector name seemingly changes. Granted, I could just read the data.frame is as "x" and "y." Thanks Kevin ###################### set.seed(10) # For example purposes, plot side by side par(mfrow=c(1,2)) V1 <- rnorm(15) V2 <- V1 + rnorm(15) new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1))) pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence") matplot(new$x, pred.w.clim, lty=c(1,2,2), type="l", col=c("black", "red", "red"), ylab="predicted y") points(V1,V2) # Create x & y equal to V1 & V2 x <- V1 y <- V2 pred.w.clim2 <- predict(lm(y ~ x), new, interval="confidence") matplot(new$x, pred.w.clim2, lty=c(1,2,2), type="l", col=c("black", "red", "red"), ylab="predicted y") points(x,y) # Test if V1=x, V2=y all.equal(x,V1) all.equal(y,V2) # Same output with predict predict(lm(V2 ~ V1)) predict(lm(y ~ x)) all.equal(predict(lm(V2 ~ V1)), predict(lm(y ~ x))) # Different output with interval="confidence" pred.w.clim pred.w.clim2 all.equal(pred.w.clim, pred.w.clim2)
Gavin Simpson
2006-Oct-30 08:07 UTC
[R] Unexpected behavior of predict and interval="confidence"
On Sun, 2006-10-29 at 12:54 -0500, Kevin Middleton wrote:> Based on some recent r-help discussions, I have been trying out > plotting confidence intervals using predict and matplot. Matplot > appeared to not be plotting the linear regression when using the > default column names generated by read.table (V1, V2, etc). On > further inspection, the error seemed to be with predict and vector > names (V1, V2) rather than with matplot. I was using some textbook > data sets that were read with read.table, so my data.frame columns > were named by default. The problem seems to be related to the name of > the vector only (though I may be wrong). > > The example below, based on that in ?predict.lm, better describes the > problem. Using predict with V2~V1 & y~x produces identical output > (when x<-V1 and y<-V2). However, using predict with > interval="confidence" results in different output from the same data. > That with y~x is correct, and V2~V1 is "incorrect".This is because you have been caught out by the way newdata is handled in predict. This is your problem: new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1))) names(new) You have created new, a 1 column data frame with the colname "x". As "x" doesn't exist in the model formula (V2 ~ V1), what the predict line actually means, is to predict value for the observed data used to fit them model, i.e. the fitted values: pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence") fit <- fitted(lm(V2 ~ V1)) all.equal(fit, pred.w.clim[, 1]) When you do predict(lm(y ~ x), new, interval="confidence"), "x" is now in the model formula *and* in "new" so you get predictions for the x values in "new". If you'd done this: new <- data.frame(V1 = seq(min(V1), max(V1), length.out = length(V1))) predict(lm(V2 ~ V1), new, interval="confidence") Then you'd have gotten what you wanted. HTH G> > This may be related to a previous post on r-help: > http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56982.html > > I can't figure out why there would be a difference in predict when > only the vector name seemingly changes. Granted, I could just read > the data.frame is as "x" and "y." > > Thanks > Kevin > > ###################### > > set.seed(10) > > # For example purposes, plot side by side > par(mfrow=c(1,2)) > > V1 <- rnorm(15) > V2 <- V1 + rnorm(15) > > new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1))) > > pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence") > matplot(new$x, pred.w.clim, > lty=c(1,2,2), type="l", > col=c("black", "red", "red"), > ylab="predicted y") > points(V1,V2) > > > # Create x & y equal to V1 & V2 > x <- V1 > y <- V2 > > pred.w.clim2 <- predict(lm(y ~ x), new, interval="confidence") > matplot(new$x, pred.w.clim2, > lty=c(1,2,2), type="l", > col=c("black", "red", "red"), > ylab="predicted y") > points(x,y) > > # Test if V1=x, V2=y > all.equal(x,V1) > all.equal(y,V2) > > # Same output with predict > predict(lm(V2 ~ V1)) > predict(lm(y ~ x)) > all.equal(predict(lm(V2 ~ V1)), predict(lm(y ~ x))) > > # Different output with interval="confidence" > pred.w.clim > pred.w.clim2 > all.equal(pred.w.clim, pred.w.clim2) > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [t] +44 (0)20 7679 0522 ECRC [f] +44 (0)20 7679 0565 UCL Department of Geography Pearson Building [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street London, UK [w] http://www.ucl.ac.uk/~ucfagls/ WC1E 6BT [w] http://www.freshwaters.org.uk/ %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%