thr3ads.net - R devel - [Rd] Discourage the weights= option of lm with summarized data [Oct 2017]


Arie ten Cate

2017-Oct-08 12:55 UTC

[Rd] Discourage the weights= option of lm with summarized data

Indeed: Using 'weights' is not meant to indicate that the same
observation is repeated 'n' times.  As I showed, this gives erroneous
results. Hence I suggested that it is discouraged rather than
encouraged in the Details section of lm in the Reference manual.

   Arie

-----Original Message-----
On Sat, 7 Oct 2017, wolfgang.viechtbauer at maastrichtuniversity.nl wrote:

Using 'weights' is not meant to indicate that the same observation is
repeated 'n' times. It is meant to indicate different variances (or to
be precise, that the variance of the last observation in 'x' is
sigma^2 / n, while the first three observations have variance
sigma^2).
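
One way to see the variance interpretation described above is that a weighted fit should coincide with ordinary least squares after every row, including the intercept column, is multiplied by sqrt(w). That rescaling turns errors with variance sigma^2 / w into errors with a common variance sigma^2. The following is only a minimal sketch using the toy data from this thread; the names sw, fit_w and fit_t are illustrative and not part of the original messages.

```r
# Toy data from this thread; the last observation gets weight 10
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)
w <- c(1, 1, 1, 10)
sw <- sqrt(w)

# Weighted least squares as computed by lm
fit_w <- lm(y ~ x, weights = w)

# Ordinary least squares on rows rescaled by sqrt(w):
# 'sw' plays the role of the intercept column, 'I(sw * x)' of the slope column
fit_t <- lm(I(sw * y) ~ 0 + sw + I(sw * x))

# Compare estimates and standard errors, ignoring the differing names
est_w <- unname(coef(summary(fit_w))[, c("Estimate", "Std. Error")])
est_t <- unname(coef(summary(fit_t))[, c("Estimate", "Std. Error")])
same_fit <- isTRUE(all.equal(est_w, est_t))
print(same_fit)
```

Both fits have the same residual degrees of freedom (4 - 2), which is why the weights describe relative precision and not additional observations.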

Best,
Wolfgang

-----Original Message-----
From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Arie ten Cate
Sent: Saturday, 07 October, 2017 9:36
To: r-devel at r-project.org
Subject: [Rd] Discourage the weights= option of lm with summarized data

In the Details section of lm (linear models) in the Reference manual,
it is suggested to use the weights= option for summarized data. This
must be discouraged rather than encouraged. The motivation for this is
as follows.

With summarized data, the standard errors get smaller as the number of
observations increases. However, the standard errors in lm do not get
smaller when, for instance, all weights are multiplied by the same
constant larger than one, since the inverse weights are merely
proportional to the error variances.
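
The scale invariance described in the paragraph above can be checked directly. This is a small sketch, assuming the same toy data as the example in this thread; the factor 100 and the names fit1, fit2, se1 and se2 are arbitrary choices made for illustration.

```r
# Toy data from this thread
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)
w <- c(1, 1, 1, 10)

fit1 <- lm(y ~ x, weights = w)
fit2 <- lm(y ~ x, weights = 100 * w)   # every weight multiplied by 100

se1 <- coef(summary(fit1))[, "Std. Error"]
se2 <- coef(summary(fit2))[, "Std. Error"]

# Standard errors and residual degrees of freedom do not change
print(all.equal(se1, se2))
print(df.residual(fit1) == df.residual(fit2))
```

Multiplying the weights by a constant scales the estimated residual variance up and the inverse of X'WX down by the same factor, so the two effects cancel in the standard errors.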

Here is an example of the estimated standard errors being too large
with the weights= option. The p value and the number of degrees of
freedom are also wrong. The parameter estimates are correct.

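  n <- 10                                  # frequency of the fourth observation
  x <- c(1, 2, 3, 4)
  y <- c(1, 2, 5, 4)
  w <- c(1, 1, 1, n)                       # summarized data: last row stands for n cases
  xb <- c(x, rep(x[4], n - 1))             # restore the original data
  yb <- c(y, rep(y[4], n - 1))
  print(summary(lm(yb ~ xb)))              # full data: 13 rows, 11 residual df
  print(summary(lm(y ~ x, weights = w)))   # summarized data: 4 rows, 2 residual df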

Compare with PROC REG in SAS, with a WEIGHT statement (like R) and a
FREQ statement (for summarized data).
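
For readers without SAS, a frequency-style analysis can be approximated in R by refitting with the weights and then rescaling the standard errors to the residual degrees of freedom implied by the total count. The sketch below is an assumption-laden illustration and not a built-in lm feature: the column name f and the helper names (df_w, df_f, se_f, t_f, p_f) are hypothetical, and the result is checked against a fit on the expanded data.

```r
# Summarized data: f is the number of original cases behind each row
d <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 5, 4), f = c(1, 1, 1, 10))

fit_w <- lm(y ~ x, data = d, weights = f)
p     <- length(coef(fit_w))
df_w  <- df.residual(fit_w)   # rows of summarized data minus p
df_f  <- sum(d$f) - p         # total number of cases minus p

# Rescale the standard errors to the frequency-based degrees of freedom
tab  <- coef(summary(fit_w))
se_f <- tab[, "Std. Error"] * sqrt(df_w / df_f)
t_f  <- tab[, "Estimate"] / se_f
p_f  <- 2 * pt(-abs(t_f), df = df_f)

# Reference: fit on the expanded (unsummarized) data
expanded <- d[rep(seq_len(nrow(d)), times = d$f), ]
tab_full <- coef(summary(lm(y ~ x, data = expanded)))

print(cbind(Estimate = tab[, "Estimate"], SE = se_f, t = t_f, p = p_f))
print(all.equal(se_f, tab_full[, "Std. Error"]))
```

The rescaling works because the weighted residual sum of squares and X'WX equal their counterparts on the expanded data; only the divisor used for the residual variance differs.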

    Arie

Viechtbauer Wolfgang (SP)

2017-Oct-08 14:38 UTC

Ah, I think you are referring to this part from ?lm:

"(including the case that there are w_i observations equal to y_i and the
data have been summarized)"

I see; indeed, I don't think this is what 'weights' should be used for
(the other part before that is correct). Sorry, I misunderstood the
point you were trying to make.

Best,
Wolfgang

Arie ten Cate

2017-Oct-09 05:58 UTC

Yes.  Thank you; I should have quoted it.
I suggest removing this text or adding the word "not" at the
beginning.

   Arie
