On 08 Apr 2016, at 12:57, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

> On 07/04/2016 5:21 PM, Murray Efford wrote:
>> Following some old advice on this list, I have been reading the code for summary.lm to understand the computation of R-squared from a weighted regression. Usually weights in lm are applied to squared residuals, but I see that the weighted mean of the observations is calculated as if the weights are on the original scale:
>>
>> [...]
>> f <- z$fitted.values
>> w <- z$weights
>> [...]
>> m <- sum(w * f/sum(w))
>> [mss <-] sum(w * (f - m)^2)
>> [...]
>>
>> This seems inconsistent to me. What am I missing?
>
> I think you are expecting consistency where there needn't be any. Why do you see an inconsistency here? Those are different calculations. You get expressions like these if you assume observations have variance sigma^2/w, and you're trying to estimate sigma^2.
>

It's also perfectly consistent that m is the minimizer of mss:

d/dm sum(w*(f-m)^2) = -2 sum(w*(f-m)) = 0  =>  m = sum(w*f) / sum(w)

However, beware the distinction between inverse variance weights, replication weights, and sampling weights.

> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
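A small sketch, on simulated data invented purely for illustration (the names `x`, `y`, `w` and the data-generating model are assumptions, not from the thread), that writes out the intercept-model branch of the summary.lm lines quoted above, checks numerically that m minimizes mss, and checks that, under Var(y_i) = sigma^2/w_i, the weighted RSS over the residual df reproduces the sigma reported by summary():

```r
# Simulated data, invented for illustration only.
set.seed(1)
n <- 30
x <- runif(n)
w <- runif(n, 0.5, 2)                       # inverse-variance weights: Var(y_i) = sigma^2 / w_i
y <- 1 + 2 * x + rnorm(n, sd = 1 / sqrt(w))
fit <- lm(y ~ x, weights = w)

f <- fitted(fit)
r <- residuals(fit)                         # raw residuals y - f (not multiplied by sqrt(w))

# Intercept-model branch of the summary.lm computation, written out by hand
m   <- sum(w * f / sum(w))                  # weighted mean of the fitted values
mss <- sum(w * (f - m)^2)                   # weighted model sum of squares
rss <- sum(w * r^2)                         # weighted residual sum of squares
r2  <- mss / (mss + rss)

# Numerical check that m minimizes sum(w * (f - m)^2) over m
m_num <- optimize(function(mm) sum(w * (f - mm)^2), interval = range(f))$minimum

# Under Var(y_i) = sigma^2 / w_i, sigma^2 is estimated by weighted RSS / residual df
sigma2_hat <- rss / df.residual(fit)

all.equal(r2, summary(fit)$r.squared)
all.equal(m, m_num, tolerance = 1e-3)
all.equal(sigma2_hat, summary(fit)$sigma^2)
```

The loose tolerance on the optimize() comparison only reflects its default convergence tolerance; the closed form m = sum(w*f)/sum(w) is exact.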
Thanks for these perfectly consistent replies - I didn't understand the purpose of m = sum(w * f/sum(w)) and saw it merely as a weighted average of the fitted values. My ultimate concern is how to compute an appropriate weighted TSS (or equivalently, MSS) for PRESS-R^2 = 1 - PRESS/TSS = 1 - PRESS/(MSS + PRESS). Do you think it then makes sense to substitute the vector of leave-one-out fitted values for f here?

    m <- sum(w * f/sum(w))
    mss <- sum(w * (f - m)^2)

Murray

________________________________________
From: peter dalgaard <pdalgd at gmail.com>
Sent: Friday, 8 April 2016 11:28 p.m.
To: Duncan Murdoch
Cc: Murray Efford; r-help at r-project.org
Subject: Re: [R] R.squared in summary.lm with weights
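To make the question concrete, here is a rough sketch of the substitution Murray describes, with leave-one-out fitted values obtained by brute-force refitting. It is only an illustration of the proposal, not an answer to whether it is appropriate; the simulated data and the column names `x`, `y`, `w` are assumptions. For comparison it also computes a denominator from the weighted spread of the observed y, which is just one alternative choice:

```r
# Simulated data, invented for illustration only.
set.seed(1)
n <- 30
d <- data.frame(x = runif(n), w = runif(n, 0.5, 2))
d$y <- 1 + 2 * d$x + rnorm(n, sd = 1 / sqrt(d$w))
fit <- lm(y ~ x, data = d, weights = w)

# Leave-one-out fitted values by brute force: drop row i, refit, predict row i
f_loo <- vapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ], weights = w)   # 'w' is looked up in d[-i, ]
  unname(predict(fit_i, newdata = d[i, , drop = FALSE]))
}, numeric(1))

w <- d$w
press <- sum(w * (d$y - f_loo)^2)           # weighted PRESS

# The proposed substitution: leave-one-out fitted values in place of f
m_loo   <- sum(w * f_loo / sum(w))
mss_loo <- sum(w * (f_loo - m_loo)^2)
press_r2_proposed <- 1 - press / (mss_loo + press)

# For comparison only: a denominator from the observed y (weighted TSS)
tss_y <- sum(w * (d$y - sum(w * d$y) / sum(w))^2)
press_r2_tss <- 1 - press / tss_y

c(proposed = press_r2_proposed, observed_tss = press_r2_tss)
```

The two denominators generally differ, because MSS + PRESS is not an exact decomposition of the observed-y TSS; the in-sample identity TSS = MSS + RSS has no leave-one-out counterpart.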
>>>>> Murray Efford <murray.efford at otago.ac.nz>
>>>>>     on Fri, 8 Apr 2016 18:45:33 +0000 writes:

    > Thanks for these perfectly consistent replies - I didn't
    > understand the purpose of m = sum(w * f/sum(w)) and saw it
    > merely as a weighted average of the fitted values. My
    > ultimate concern is how to compute an appropriate weighted
    > TSS (or equivalently, MSS) for PRESS-R^2 = 1 - PRESS/TSS
    > = 1 - PRESS/(MSS + PRESS). Do you think it then makes sense
    > to substitute the vector of leave-one-out fitted values
    > for f here?

    --> A new topic, really.

I think you should find the answer on the help pages (and in the source) of

    ?influence.measures   (which documents a host of such functions)
and
    ?influence

Note that influence is an S3 generic and

    methods(influence)

indicates that the 'lm' and 'glm' methods are hidden.

Of course I do recommend reading the real R source code (which also contains the comments and has some logical order in all the function definitions), but you can use

    stats:::influence.lm

to show a version of the function that looks not too different from the source.

Martin Maechler, ETH Zurich
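As a sketch of the kind of answer influence() can give, the leave-one-out residuals needed for PRESS can be obtained without refitting, assuming the standard least-squares identity y_i - f_(-i) = e_i / (1 - h_ii), where h_ii is the diagonal of the weighted hat matrix (the weighted fit is ordinary least squares on sqrt(w)-scaled data). The simulated data, the column names, and the assumption of strictly positive weights (zero-weight cases are handled differently by the influence code) are illustrative only. The brute-force refit loop is included as a check:

```r
# Simulated data, invented for illustration only; all weights strictly positive.
set.seed(1)
n <- 30
d <- data.frame(x = runif(n), w = runif(n, 0.5, 2))
d$y <- 1 + 2 * d$x + rnorm(n, sd = 1 / sqrt(d$w))
fit <- lm(y ~ x, data = d, weights = w)

infl <- influence(fit)                      # dispatches to the hidden lm method
h <- infl$hat                               # diagonal of the weighted hat matrix
e <- residuals(fit)                         # raw residuals y - f
e_loo <- unname(e / (1 - h))                # deleted residuals y_i - f_(-i)

press_hat <- sum(d$w * e_loo^2)             # weighted PRESS without any refitting

# Brute-force check: refit n times, each time leaving one row out
e_brute <- vapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ], weights = w)
  d$y[i] - unname(predict(fit_i, newdata = d[i, , drop = FALSE]))
}, numeric(1))

all.equal(e_loo, e_brute)
all.equal(press_hat, sum(d$w * e_brute^2))
```

hatvalues(fit) returns the same diagonal as infl$hat, if a single vector is all that is needed.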