Hello,

I was constructing a simple linear model with one categorical (3-levels) and one quantitative predictor variable for a colleague. I estimated model parameters with and without an intercept, sometimes called reference cell coding and cell means coding.

Model 1: yResp ~ -1 + xCat + xCont
Model 2: yResp ~ xCat + xCont

These models are equivalent and the estimated coefficients come out fine, but the R-squared and F statistics returned by summary() differ markedly. I spent some time looking at the code for both lm() and summary.lm() but did not find the source of the difference. aov() and anova() results also differ, so I suspect the issue involves how the sums of squares are being computed. I've also spent some time trying to search online for information on this, without success. I haven't used lm() for quite a while, but my memory is that these differences didn't occur in the distant past when I was teaching.

Thanks in advance for any insights you might have,
Jeff

Jeffrey F. Bromaghin
Research Statistician
USGS Alaska Science Center
907-786-7086
Jeffrey Bromaghin, Ph.D. | U.S. Geological Survey (usgs.gov) <https://www.usgs.gov/staff-profiles/jeffrey-bromaghin>
Ecosystems Analytics | U.S. Geological Survey (usgs.gov) <https://www.usgs.gov/centers/alaska-science-center/science/ecosystems-analytics>
The models are NOT equivalent. Why would you think they were?

David

> On Feb 9, 2022, at 11:10 PM, Bromaghin, Jeffrey F via R-help <r-help at r-project.org> wrote:
>
> Model 1: yResp ~ -1 + xCat + xCont
> Model 2: yResp ~ xCat + xCont
>
> These models are equivalent and the estimated coefficients come out fine, but the R-squared and F statistics returned by summary() differ markedly.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On Wed, 9 Feb 2022 22:00:40 +0000
"Bromaghin, Jeffrey F via R-help" <r-help at r-project.org> wrote:

> These models are equivalent and the estimated coefficients come out
> fine, but the R-squared and F statistics returned by summary() differ
> markedly.

Is the mean of yResp far from zero? Here's what summary.lm says about that:

>> r.squared: R^2, the 'fraction of variance explained by the model',
>>
>>     R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2),
>>
>> where y* is the mean of y[i] if there is an intercept and
>> zero otherwise.

--
Best regards,
Ivan
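The point in the quoted documentation can be checked directly. The following is only a minimal sketch on simulated data: the variable names follow the original post, but the values, group labels, and sample size are made up for illustration. It suggests that the two formulas give the same fitted values and residual sum of squares, and that the R-squared values differ only because the baseline y* changes from mean(y) to zero.

```r
# Simulated data; names follow the original post, all values are assumptions.
set.seed(1)
n <- 60
xCat  <- factor(rep(c("a", "b", "c"), each = n / 3))
xCont <- runif(n)
yResp <- 10 + c(0, 1, 2)[as.integer(xCat)] + 3 * xCont + rnorm(n)

m1 <- lm(yResp ~ -1 + xCat + xCont)  # cell means coding, no intercept
m2 <- lm(yResp ~ xCat + xCont)       # reference cell coding, with intercept

# Both design matrices span the same column space, so the fits agree.
same_fit <- isTRUE(all.equal(unname(fitted(m1)), unname(fitted(m2))))
rss <- sum(residuals(m2)^2)

# R^2 recomputed by hand with the two baselines from ?summary.lm.
r2_no_int <- 1 - rss / sum(yResp^2)                  # y* = 0
r2_int    <- 1 - rss / sum((yResp - mean(yResp))^2)  # y* = mean(y)

# Compare with what summary() reports for each model.
print(c(summary(m1)$r.squared, r2_no_int))
print(c(summary(m2)$r.squared, r2_int))

# The F statistic also uses a different numerator df (no intercept removed).
print(summary(m1)$fstatistic)
print(summary(m2)$fstatistic)
```

When the mean of yResp is far from zero, sum(yResp^2) is much larger than the centered sum of squares, so the no-intercept R-squared is pushed toward 1 even though the residuals are unchanged.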
Hi

Is this enough of an explanation?

https://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-increases-r2-in-linear-mo

https://stackoverflow.com/questions/57415793/r-squared-in-lm-for-zero-intercept-model

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Bromaghin, Jeffrey F via R-help
> Sent: Wednesday, February 9, 2022 11:01 PM
> To: r-help at r-project.org
> Subject: [R] Question About lm()