jranke at uni-bremen.de
2006-May-18 15:14 UTC
[Rd] predict.lm does not have a weights argument for newdata (PR#8877)
Full_Name: Johannes Ranke
Version: 2.3.0
OS: Linux-i386
Submission from: (NULL) (134.102.60.74)
In the case that predict.lm is used on an object resulting from weighted linear
regression with interval="prediction", the prediction intervals
currently depend
on
the absolute size of object$weights:
data(anscombe); attach(anscombe)
m1 <- lm(y1 ~ x1, w = rep(1,length(x1)))
m2 <- lm(y1 ~ x1, w = rep(3,length(x1)))
predict(m1, data.frame(x1 = 5), interval = "prediction")
predict(m2, data.frame(x1 = 5), interval = "prediction")
This make sense only if the weights are taken to be the numbers of replicates
used
for deriving the x values in newdata, and even then, the given prediction
interval
is only correct if the number of replicates for establishing the x values in
newdata is always one.
The desired behavior would be that the user gives a vector weights.newdata for
newdata, matching the weighting scheme applied for the weighted regression (e.g.
calculated from a variance function).
My education in statistics is only medium, so I am not sure if my proposed
solution is correct. Please check my patch to fix this:
--- lm.R.orig 2006-04-10 00:19:39.000000000 +0200
+++ lm.R 2006-05-18 17:10:29.000000000 +0200
@@ -591,7 +591,8 @@
function(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
interval = c("none", "confidence",
"prediction"),
level = .95, type = c("response", "terms"),
- terms = NULL, na.action = na.pass, ...)
+ terms = NULL, na.action = na.pass, ...,
+ weights.newdata = rep(1, length(newdata[[1]])))
{
tt <- terms(object)
if(missing(newdata) || is.null(newdata)) {
@@ -630,6 +631,11 @@
r <- object$residuals
w <- object$weights
rss <- sum(if(is.null(w)) r^2 else r^2 * w)
+ if(!is.null(w) && interval == "prediction" &&
weights.newdata =rep(1,length(newdata[[1]]))) {
+ warning(paste(
+ "To find prediction intervals from linear models with
weights,",
+ "you probably want weights.newdata different from one."))
+ }
df <- n - p
rss/df
} else scale^2
@@ -715,7 +721,7 @@
tfrac <- qt((1 - level)/2, df)
hwid <- tfrac * switch(interval,
confidence = sqrt(ip),
- prediction = sqrt(ip+res.var)
+ prediction = sqrt(ip+res.var/weights.newdata)
)
if(type != "terms") {
predictor <- cbind(predictor, predictor + hwid %o% c(1, -1))
---------------------------end patch-------------------------------------------
if you apply this to lm.R, you get the same result from both lines:
predict(m1, data.frame(x1 = 5), interval = "prediction")
predict(m2, data.frame(x1 = 5), interval = "prediction",
weights.newdata = 3)
except that you get a warning if you set all newweights to one (default),
because this is probably not what you want for a prediction interval from
weighted regression.
All it does is to (down)scale the part of the variance that each new data point
contributes to the variance of the predicted y.