jranke at uni-bremen.de
2006-May-18 15:14 UTC
[Rd] predict.lm does not have a weights argument for newdata (PR#8877)
Full_Name: Johannes Ranke Version: 2.3.0 OS: Linux-i386 Submission from: (NULL) (134.102.60.74) In the case that predict.lm is used on an object resulting from weighted linear regression with interval="prediction", the prediction intervals currently depend on the absolute size of object$weights: data(anscombe); attach(anscombe) m1 <- lm(y1 ~ x1, w = rep(1,length(x1))) m2 <- lm(y1 ~ x1, w = rep(3,length(x1))) predict(m1, data.frame(x1 = 5), interval = "prediction") predict(m2, data.frame(x1 = 5), interval = "prediction") This make sense only if the weights are taken to be the numbers of replicates used for deriving the x values in newdata, and even then, the given prediction interval is only correct if the number of replicates for establishing the x values in newdata is always one. The desired behavior would be that the user gives a vector weights.newdata for newdata, matching the weighting scheme applied for the weighted regression (e.g. calculated from a variance function). My education in statistics is only medium, so I am not sure if my proposed solution is correct. Please check my patch to fix this: --- lm.R.orig 2006-04-10 00:19:39.000000000 +0200 +++ lm.R 2006-05-18 17:10:29.000000000 +0200 @@ -591,7 +591,8 @@ function(object, newdata, se.fit = FALSE, scale = NULL, df = Inf, interval = c("none", "confidence", "prediction"), level = .95, type = c("response", "terms"), - terms = NULL, na.action = na.pass, ...) + terms = NULL, na.action = na.pass, ..., + weights.newdata = rep(1, length(newdata[[1]]))) { tt <- terms(object) if(missing(newdata) || is.null(newdata)) { @@ -630,6 +631,11 @@ r <- object$residuals w <- object$weights rss <- sum(if(is.null(w)) r^2 else r^2 * w) + if(!is.null(w) && interval == "prediction" && weights.newdata =rep(1,length(newdata[[1]]))) { + warning(paste( + "To find prediction intervals from linear models with weights,", + "you probably want weights.newdata different from one.")) + } df <- n - p rss/df } else scale^2 @@ -715,7 +721,7 @@ tfrac <- qt((1 - level)/2, df) hwid <- tfrac * switch(interval, confidence = sqrt(ip), - prediction = sqrt(ip+res.var) + prediction = sqrt(ip+res.var/weights.newdata) ) if(type != "terms") { predictor <- cbind(predictor, predictor + hwid %o% c(1, -1)) ---------------------------end patch------------------------------------------- if you apply this to lm.R, you get the same result from both lines: predict(m1, data.frame(x1 = 5), interval = "prediction") predict(m2, data.frame(x1 = 5), interval = "prediction", weights.newdata = 3) except that you get a warning if you set all newweights to one (default), because this is probably not what you want for a prediction interval from weighted regression. All it does is to (down)scale the part of the variance that each new data point contributes to the variance of the predicted y.