thr3ads.net - R help - [R] weight in lm [Aug 2017]

Spencer Graves

2017-Aug-14 11:43 UTC

[R] weight in lm

On 2017-08-14 5:53 AM, peter dalgaard wrote:
>> On 14 Aug 2017, at 10:13, Troels Ring <tring at gvdnet.dk> wrote:
>>
>> Dear friends - I hope you will accept a naive question on lm: R version 3.4.1, Windows 10
>>
>> I have 204 "baskets" of three types corresponding to factor F, each of size from 2 to 33 containing measurements, and need to know if the standard deviation of the measurements in each basket, sdd, is different across types, F. Plotting the observed sdd versus the sizes from 2 to 33, called "k", does show a decreasing spread as k increases towards 33.
>>
>> I tried lm(sdd ~ F, weight=k) and got different results than when omitting the weight argument, but would the correct way be to use sqrt(k) as the weight instead?
>>
> I doubt that there is a "correct" way, but theory says that if the baskets have the same SD and the data are normally distributed, then the variance of the sample VARIANCE is proportional to 1/f = 1/(k-1), f being the degrees of freedom. Weights in lm are inverse-variance, so the "natural" thing to do would seem to be to regress the square of sdd with weights (k-1).
>
> (If the distribution is not normal, the variance of the sample variance is complicated by a term that involves both n and the excess kurtosis, whereas the variance of the sample SD is complicated in any case. All according to the gospel of St. Google.)
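A minimal R sketch of the weighting Peter describes, on simulated data: the data frame `dat`, the seed and the choice of a true SD of 1 in every type are hypothetical stand-ins for Troels's baskets. Note that the `lm` argument is `weights`; `weight=k` works only through R's partial argument matching. Because `lm` minimizes sum(w*e^2), the weights should be proportional to 1/variance of the response, which is why this sketch weights sdd^2 by k - 1 rather than weighting sdd by sqrt(k).

```r
# Hypothetical data shaped like the question: 204 baskets of three types (F),
# sizes k from 2 to 33, and the same true SD (1) in every type.
set.seed(42)
n_baskets <- 204
dat <- data.frame(
  F = factor(sample(c("A", "B", "C"), n_baskets, replace = TRUE)),
  k = sample(2:33, n_baskets, replace = TRUE)
)
# Sample SD of k normal measurements in each basket.
dat$sdd <- vapply(dat$k, function(k) sd(rnorm(k)), numeric(1))

# Peter's suggestion: model the sample variance sdd^2, weighting each basket
# by k - 1, because under normality Var(sdd^2) is proportional to 1/(k - 1).
fit_w <- lm(I(sdd^2) ~ F, data = dat, weights = k - 1)

# lm weights are inverse-variance: the weighted fit equals ordinary least
# squares after scaling each response value and design row by sqrt(weight).
w <- dat$k - 1
X <- model.matrix(~ F, data = dat)
fit_scaled <- lm.fit(X * sqrt(w), dat$sdd^2 * sqrt(w))

print(anova(fit_w))  # F test: does the mean variance differ across types?
print(cbind(weighted = coef(fit_w), scaled = fit_scaled$coefficients))
```

Since every simulated basket has the same true SD, the F test here is run under a true null; for real data only the `sdd`, `k` and `F` columns would change.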

       The Wikipedia article on "standard deviation" gives the more general formula.  (That article does NOT give a citation for that formula.  If you know one, please add it -- or post it here, to make it easier for someone else to add it.)


       Thanks, Peter.
       Spencer Graves
>
> -pd
>
>
>> Best wishes
>>
>> Troels Ring
>> Aalborg, Denmark
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

peter dalgaard

2017-Aug-14 12:17 UTC

[R] weight in lm

> On 14 Aug 2017, at 13:43, Spencer Graves <spencer.graves at effectivedefense.org> wrote:
>
>      The Wikipedia article on "standard deviation" gives the more general formula.  (That article does NOT give a citation for that formula.  If you know one, please add it -- or post it here, to make it easier for someone else to add it.)
> 
Er, I don't see that (i.e. var(S) etc.) in there? 

My sources were

https://math.stackexchange.com/questions/72975/variance-of-sample-variance
https://stats.stackexchange.com/questions/631/standard-deviation-of-standard-deviation

which contain further links, but no references to publications. I suspect that this stuff is easy enough to do ab initio that people don't bother to fire up a literature search.
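For reference, the standard result behind Peter's remarks is Var(S^2) = sigma^4 * (2/(n-1) + kappa/n), where kappa is the excess kurtosis of the data; for normal data kappa = 0 and it reduces to 2*sigma^4/(n-1), the 1/(k-1) proportionality used for the weights. The R sketch below is a Monte Carlo check of that formula, not code from the linked answers; n = 10, the replication count and the uniform example (excess kurtosis -6/5) are arbitrary choices.

```r
# Monte Carlo check of Var(S^2) = sigma^4 * (2/(n - 1) + kappa/n),
# where kappa is the excess kurtosis of the data distribution.
set.seed(1)

# Sample variances of R independent samples of size n, computed row-wise.
sample_vars <- function(draw, n, R) {
  m <- matrix(draw(n * R), nrow = R)  # each row is one sample of size n
  rowSums((m - rowMeans(m))^2) / (n - 1)
}

# Theoretical variance of S^2 given sigma^2 and excess kurtosis kappa.
var_s2_theory <- function(sigma2, kappa, n) sigma2^2 * (2 / (n - 1) + kappa / n)

n <- 10
R <- 2e5

# Normal data: kappa = 0, so Var(S^2) = 2 * sigma^4 / (n - 1).
normal_sim <- var(sample_vars(rnorm, n, R))
normal_theory <- var_s2_theory(1, 0, n)

# Uniform(0, 1) data: sigma^2 = 1/12, excess kurtosis = -6/5.
unif_sim <- var(sample_vars(runif, n, R))
unif_theory <- var_s2_theory(1 / 12, -6 / 5, n)

print(c(normal_sim = normal_sim, normal_theory = normal_theory))
print(c(unif_sim = unif_sim, unif_theory = unif_theory))
```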

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

David Winsemius

2017-Aug-14 17:49 UTC

[R] weight in lm

> On Aug 14, 2017, at 5:17 AM, peter dalgaard <pdalgd at gmail.com> wrote:
> 
> 
>> On 14 Aug 2017, at 13:43, Spencer Graves <spencer.graves at effectivedefense.org> wrote:
>>
>>     The Wikipedia article on "standard deviation" gives the more general formula.  (That article does NOT give a citation for that formula.  If you know one, please add it -- or post it here, to make it easier for someone else to add it.)
>> 
> 
> Er, I don't see that (i.e. var(S) etc.) in there? 
> 
> My sources were
> 
> https://math.stackexchange.com/questions/72975/variance-of-sample-variance
> https://stats.stackexchange.com/questions/631/standard-deviation-of-standard-deviation
> 
> which contain further links, but no references to publications. I suspect that this stuff is easy enough to do ab initio that people don't bother to fire up a literature search.

I don't see why that page doesn't cite:
https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

... which had several citations, including one to Johnson, Kotz and Balakrishnan (JK&B), vol. 1, ch. 13, sect. 8.2. I dug out my copy from the bottom of a large pile of tomes that I had not reshelved and can confirm that the formula is almost (but not quite) the same as what appears in print.

JK&B give a formula (p. 127) with no derivation or citation:

E[S] = sigma*(2/n)^(1/2)*Gamma(n/2)/Gamma((n-1)/2)

whereas the Wikipedia page, citing a 1968 article in The American Statistician (TAS), gives:

E[S] = sigma*(2/(n-1))^(1/2)*Gamma(n/2)/Gamma((n-1)/2)
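The two expressions differ only by the factor sqrt((n-1)/n), which is exactly the ratio between an SD computed with divisor n and one computed with divisor n - 1. So, assuming the Johnson, Kotz and Balakrishnan page defines S with divisor n (not checked here), both can be right. The R sketch below checks this numerically and by simulation; the function names are hypothetical, and `lgamma` is used so the gamma ratio stays finite for large n.

```r
# E[S]/sigma under normality, via lgamma so the gamma ratio stays finite for large n.
# Divisor n - 1 (what sd() uses): the form of the Wikipedia expression.
es_over_sigma_nm1 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
# Divisor n: the form of the p. 127 expression (assuming S there uses divisor n).
es_over_sigma_n <- function(n) sqrt(2 / n) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

# The two differ by exactly sqrt((n - 1)/n), the ratio of the two SD definitions.
n <- 2:33
ratio <- es_over_sigma_n(n) / es_over_sigma_nm1(n)
print(all.equal(ratio, sqrt((n - 1) / n)))

# Monte Carlo check for samples of size 5 with sigma = 1.
set.seed(123)
R <- 1e5
x <- matrix(rnorm(5 * R), nrow = R)     # each row is one sample
ss <- rowSums((x - rowMeans(x))^2)      # sum of squared deviations per row
s_nm1 <- sqrt(ss / 4)                   # divisor n - 1
s_n <- sqrt(ss / 5)                     # divisor n
print(c(sim_nm1 = mean(s_nm1), formula_nm1 = es_over_sigma_nm1(5),
        sim_n = mean(s_n), formula_n = es_over_sigma_n(5)))
```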

I looked up the Bloch note online:

http://www.tandfonline.com/doi/abs/10.1080/00031305.1968.10480476?journalCode=utas20

And it does not have the formula. It was a note on an earlier article by Cureton, who in turn cited an American Journal of Psychology article by Holtzman (1950, vol. 63, pp. 615-617).
http://amstat.tandfonline.com/doi/abs/10.1080/00031305.1968.10480435?src=recsys

Searching on that article, I see that the first hit is a citation in the R documentation for the MBESS::s.u function, which does implement it as recommended by Holtzman.
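Dividing the ordinary sample SD by E[S]/sigma from the divisor n - 1 formula gives an estimator of sigma that is unbiased for normal data; the factor is often called c4(n). The sketch below assumes this is the kind of correction MBESS::s.u applies, but it does not call or reproduce that function: `c4()` and `sd_unbiased()` are hypothetical helpers, and sigma = 2 with baskets of size 3 are arbitrary simulation choices.

```r
# c4(n) = E[S]/sigma for the usual sample SD (divisor n - 1) under normality,
# i.e. sqrt(2/(n - 1)) * Gamma(n/2) / Gamma((n - 1)/2), computed on the log scale.
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

# Hypothetical helper: sample SD rescaled so that its mean equals sigma
# for normal data. Not the MBESS::s.u interface.
sd_unbiased <- function(x) sd(x) / c4(length(x))

# Monte Carlo check with small baskets (k = 3), where the bias of sd() is sizeable.
set.seed(7)
sigma <- 2
k <- 3
reps <- replicate(5e4, {
  x <- rnorm(k, sd = sigma)
  c(plain = sd(x), unbiased = sd_unbiased(x))
})
print(rowMeans(reps))  # mean of each estimator over the replications
```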

If I were voting on this, I would put greater weight on JK&B, but that's just because it is incredibly unlikely that I could do the math.

Best;
David.




David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' 
-Gehm's Corollary to Clarke's Third Law
