> On 14 Aug 2017, at 13:43, Spencer Graves <spencer.graves at effectivedefense.org> wrote:
>
> On 2017-08-14 5:53 AM, peter dalgaard wrote:
>>> On 14 Aug 2017, at 10:13, Troels Ring <tring at gvdnet.dk> wrote:
>>>
>>> Dear friends - I hope you will accept a naive question on lm: R version 3.4.1, Windows 10
>>>
>>> I have 204 "baskets" of three types corresponding to factor F, each of size from 2 to 33 containing measurements, and need to know if the standard deviation of the measurements in each basket, sdd, is different across types, F. Plotting the observed sdd versus the sizes from 2 to 33, called "k", does show a decreasing spread as k increases towards 33.
>>>
>>> I tried lm(sdd ~ F, weight=k) and got different results if omitting the weight argument, but would it be the correct way to use sqrt(k) as weight instead?
>>>
>> I doubt that there is a "correct" way, but theory says that if the baskets have the same SD and data are normally distributed, then the variance of the sample VARIANCE is proportional to 1/f = 1/(k-1). Weights in lm are inverse-variance, so the "natural" thing to do would seem to be to regress the square of sdd with weights (k-1).
>>
>> (If the distribution is not normal, the variance of the sample variance is complicated by a term that involves both n and the excess kurtosis, whereas the variance of the sample SD is complicated in any case. All according to the gospel of St. Google.)
>
> The Wikipedia article on "standard deviation" gives the more general formula. (That article does NOT give a citation for that formula. If you know one, please add it -- or post it here, to make it easier for someone else to add it.)

Er, I don't see that (i.e. var(S) etc.) in there?

My sources were

https://math.stackexchange.com/questions/72975/variance-of-sample-variance
https://stats.stackexchange.com/questions/631/standard-deviation-of-standard-deviation

which contains further links, but no references to publications.
I suspect that this stuff is easy enough to do ab initio that people don't bother to fire up a literature search.

-pd

> Thanks, Peter.
> Spencer Graves
>>
>> -pd
>>
>>> Best wishes
>>>
>>> Troels Ring
>>> Aalborg, Denmark
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
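A minimal R sketch of Peter's suggestion, run on simulated data rather than Troels's: the 204 baskets, the type labels A/B/C, the common true SD of 1 and the seeds are all assumptions made for illustration. It regresses sdd^2 on the type factor with weights k - 1, and checks by Monte Carlo that, under normality, the variance of a sample variance from n observations is about 2 * sigma^4 / (n - 1).

```r
## Hypothetical simulated data (not Troels's): 204 baskets of three types,
## sizes k from 2 to 33, every basket drawn from N(0, 1), i.e. same true SD.
set.seed(1)
nb   <- 204
type <- factor(sample(c("A", "B", "C"), nb, replace = TRUE))  # "F" in the post
k    <- sample(2:33, nb, replace = TRUE)
sdd  <- vapply(k, function(n) sd(rnorm(n)), numeric(1))

## Under normality Var(sdd^2) = 2 * sigma^4 / (k - 1), so inverse-variance
## weights for a regression of the sample VARIANCE are proportional to k - 1.
fit_w <- lm(sdd^2 ~ type, weights = k - 1)
fit_u <- lm(sdd^2 ~ type)                 # unweighted, for comparison
rbind(weighted = coef(fit_w), unweighted = coef(fit_u))
anova(fit_w)                              # F test of equal variance across types

## Monte Carlo check of Var(s^2) = 2 * sigma^4 / (n - 1) for n = 5, sigma = 1
s2 <- replicate(5e4, var(rnorm(5)))
c(simulated = var(s2), theory = 2 / (5 - 1))
```

With real data one would substitute the observed sdd, k and type; the anova() F test is then only as trustworthy as the normality assumption behind the k - 1 weights.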

> On Aug 14, 2017, at 5:17 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>
>> On 14 Aug 2017, at 13:43, Spencer Graves <spencer.graves at effectivedefense.org> wrote:
>>
>> [...]
>>
>> The Wikipedia article on "standard deviation" gives the more general formula. (That article does NOT give a citation for that formula. If you know one, please add it -- or post it here, to make it easier for someone else to add it.)
>>
>
> Er, I don't see that (i.e. var(S) etc.) in there?
>
> My sources were
>
> https://math.stackexchange.com/questions/72975/variance-of-sample-variance
> https://stats.stackexchange.com/questions/631/standard-deviation-of-standard-deviation
>
> which contains further links, but no references to publications. I suspect that this stuff is easy enough to do ab initio that people don't bother to fire up a literature search.

I don't see why that page doesn't cite:

https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

... which has several citations, including one to Johnson, Kotz and Balakrishnan, v. 1, ch. 13, sect. 8.2. I dug my copy out from the bottom of a large pile of tomes that I had not reshelved and can confirm that the formula is almost (but not quite) the same as what appears in print. JK&B give a formula (p. 127) with no derivation or citation:

E[S] = sigma * (2/n)^(1/2) * Gamma(n/2) / Gamma((n-1)/2)

whereas the Wikipedia page, citing a 1968 TAS article, gives:

E[S] = sigma * (2/(n-1))^(1/2) * Gamma(n/2) / Gamma((n-1)/2)

I looked up the Bloch note online:

http://www.tandfonline.com/doi/abs/10.1080/00031305.1968.10480476?journalCode=utas20

and it does not have the formula. It was a note on an earlier article by Cureton, who in turn cited an American Journal of Psychology article by Holtzman (1950, v. 63, pp. 615-617).

http://amstat.tandfonline.com/doi/abs/10.1080/00031305.1968.10480435?src=recsys

Searching on that article, I see that the first hit is a citation in the R documentation for the MBESS::s.u function, which does implement it as recommended by Holtzman.

If I were voting on this I would put greater weight on JK&B, but that's just because it is incredibly likely that I could do the math.

Best;
David.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
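The two E[S] expressions quoted above differ only in the factor (2/n)^(1/2) versus (2/(n-1))^(1/2). Below is a small R sketch, assuming normal data, sigma = 1 and n = 5 (arbitrary choices), that compares both against simulation. One possible reconciliation, offered as an assumption and not checked against the text of JK&B, is that the (2/n)^(1/2) version is the expectation of the SD computed with divisor n rather than n - 1. The helper sd_unbiased is hypothetical and is not taken from MBESS.

```r
## E[S] for a normal sample of size n, written with lgamma() for stability.
## Version with (2/(n-1))^(1/2): S computed with divisor n - 1, as sd() does.
ES_nm1 <- function(n, sigma = 1)
  sigma * sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

## Version with (2/n)^(1/2). Assumption: this matches E[S] when S uses
## divisor n, since that S equals sqrt((n - 1) / n) times the sd() value.
ES_n <- function(n, sigma = 1)
  sigma * sqrt(2 / n) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

set.seed(42)
n  <- 5
S  <- replicate(1e5, sd(rnorm(n)))   # divisor n - 1
Sn <- S * sqrt((n - 1) / n)          # the same samples, divisor n
rbind(divisor_n_minus_1 = c(simulated = mean(S),  formula = ES_nm1(n)),
      divisor_n         = c(simulated = mean(Sn), formula = ES_n(n)))

## Bias-corrected SD: divide sd() by c4(n) = ES_nm1(n, sigma = 1).
## (Hypothetical helper; not checked against MBESS::s.u.)
sd_unbiased <- function(x) sd(x) / ES_nm1(length(x))
```

If the simulated means agree with their respective formulas, the "almost but not quite" discrepancy is just a matter of which divisor defines S.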