Arie ten Cate
2017-Oct-07 07:35 UTC
[Rd] Discourage the weights= option of lm with summarized data
In the Details section of lm (linear models) in the Reference manual, it is suggested to use the weights= option for summarized data. This must be discouraged rather than encouraged. The motivation for this is as follows.

With summarized data, the standard errors get smaller as the number of observations increases. However, the standard errors in lm do not get smaller when, for instance, all weights are multiplied by the same constant larger than one, since the inverse weights are merely proportional to the error variances.

Here is an example of the estimated standard errors being too large with the weights= option. The p value and the number of degrees of freedom are also wrong. The parameter estimates are correct.

    n <- 10
    x <- c(1,2,3,4)
    y <- c(1,2,5,4)
    w <- c(1,1,1,n)
    xb <- c(x,rep(x[4],n-1))  # restore the original data
    yb <- c(y,rep(y[4],n-1))
    print(summary(lm(yb ~ xb)))
    print(summary(lm(y ~ x, weights=w)))

Compare with PROC REG in SAS, with a WEIGHT statement (like R) and a FREQ statement (for summarized data).

Arie
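The two claims above can be checked with a short sketch in base R. It assumes the weights are integer counts of exact replicates, as in the example. The degrees-of-freedom rescaling shown here is one way to get frequency-style standard errors from the weighted fit; it is not something lm does itself, and the names se_w, se_scaled, se_freq and se_full are only illustrative.

```r
n <- 10
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)
w <- c(1, 1, 1, n)

# Standard errors from the weighted fit on the summarized data
fit_w <- lm(y ~ x, weights = w)
se_w <- sqrt(diag(vcov(fit_w)))

# Multiplying all weights by a constant leaves the standard errors unchanged
fit_scaled <- lm(y ~ x, weights = 100 * w)
se_scaled <- sqrt(diag(vcov(fit_scaled)))
print(all.equal(se_w, se_scaled))

# Fit on the restored (expanded) data, for reference
xb <- c(x, rep(x[4], n - 1))
yb <- c(y, rep(y[4], n - 1))
fit_full <- lm(yb ~ xb)
se_full <- sqrt(diag(vcov(fit_full)))

# Assumption: w holds frequencies, so the residual degrees of freedom
# should be sum(w) - p rather than length(y) - p
df_freq <- sum(w) - length(coef(fit_w))
se_freq <- sqrt(diag(vcov(fit_w)) * df.residual(fit_w) / df_freq)
print(all.equal(unname(se_freq), unname(se_full)))
```

The weighted residual sum of squares and the coefficients already agree with the expanded data, so only the degrees of freedom need correcting in this replicate-count case.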
Viechtbauer Wolfgang (SP)
2017-Oct-07 13:34 UTC
[Rd] Discourage the weights= option of lm with summarized data
Using 'weights' is not meant to indicate that the same observation is repeated 'n' times. It is meant to indicate different variances (or to be precise, that the variance of the last observation in 'x' is sigma^2 / n, while the first three observations have variance sigma^2).

Best,
Wolfgang
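To illustrate the variance interpretation, here is a small sketch in which the fourth row is the mean of n distinct raw responses rather than one value repeated n times. The raw values in y4_raw are made up for illustration and are not from the thread. Under that reading, weights = n for the averaged row reproduces the coefficients of the fit on the raw data.

```r
n <- 10
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)

# Hypothetical raw responses at x = 4: n distinct values whose mean is y[4]
y4_raw <- y[4] + seq(-0.9, 0.9, length.out = n)

# Raw data: three single observations plus the n observations at x = 4
x_raw <- c(x[1:3], rep(x[4], n))
y_raw <- c(y[1:3], y4_raw)
fit_raw <- lm(y_raw ~ x_raw)

# Summarized data: the last row is a mean of n values, so its variance is sigma^2 / n
fit_mean <- lm(y ~ x, weights = c(1, 1, 1, n))

# The coefficients agree; the standard errors generally do not,
# because the within-group variation is lost when averaging
print(all.equal(unname(coef(fit_raw)), unname(coef(fit_mean))))
```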