Hello All-

I found an inconsistency between the R-squared reported in Excel vs. that in R, and I am wondering which (if any) may be correct and if this is a known issue. While it certainly wouldn't surprise me if Excel is just flat out wrong, I just want to make sure since the R-squared reported in R seems surprisingly high. Please let me know if this is the wrong list. Thanks!

To begin, I have a set of data points in which y is the experimental number and x is the predicted value. The Excel-generated graph (complete with R^2 and trend line) is provided at this link if you want to take a look:

http://www.quantumbioinc.com/downloads/public/excel.png

As you can see, the R-squared reported by Excel is -0.1005, while the same data brought into R give an R-squared of +0.9331 (see below). Being that I am new to R and semi-new to stats: is there a difference between "Multiple R-squared" and plain R-squared that I am simply misinterpreting, or is this a known inconsistency between the two applications? If so, which is correct? Any insight would be greatly appreciated!

=====================
> # note: a is experimental and c is predicted
> summary(lm(a ~ c - 1))

Call:
lm(formula = a ~ c - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-2987.6 -1126.6  -181.7   855.3  5602.8

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
c  0.99999    0.01402   71.33   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1423 on 365 degrees of freedom
Multiple R-Squared: 0.9331,     Adjusted R-squared: 0.9329
F-statistic:  5088 on 1 and 365 DF,  p-value: < 2.2e-16

> version
         _
platform powerpc-apple-darwin7.9.0
arch     powerpc
os       darwin7.9.0
system   powerpc, darwin7.9.0
status
major    2
minor    2.1
year     2005
month    12
day      20
svn rev  36812
language R
=====================

Thank you very much for your time!

-Lance
____________________
Lance M. Westerhoff, Ph.D.
General Manager
QuantumBio Inc.

WWW: http://www.quantumbioinc.com
Email: lance at quantumbioinc.com

"Safety is not the most important thing. I know this sounds like heresy,
but it is a truth that must be embraced in order to do exploration.
The most important thing is to actually go." ~ James Cameron
Lance Westerhoff <lance at quantumbioinc.com> writes:

> Hello All-
>
> I found an inconsistency between the R-squared reported in Excel vs.
> that in R, and I am wondering which (if any) may be correct and if
> this is a known issue. While it certainly wouldn't surprise me if
> Excel is just flat out wrong, I just want to make sure since the
> R-squared reported in R seems surprisingly high. Please let me know
> if this is the wrong list. Thanks!

Excel is flat out wrong. As the name implies, R-squared values cannot be less than zero (adjusted R-squared can, but I wouldn't think that is what Excel does).

R-squared is a bit odd in the zero-intercept case because it describes how much better the line describes the data compared to a horizontal line *at zero*. However, it doesn't really make sense to compare with a non-zero constant, because the models are not nested.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark       Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
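Peter's point about the horizontal line *at zero* can be checked numerically. Here is a small R sketch (simulated data, not Lance's original) comparing the two conventions; the same residual sum of squares yields a positive value against one denominator and a negative value against the other:

```r
## Simulated data with a large offset and only a weak trend, so a line
## forced through the origin fits badly.
set.seed(1)
x <- 1:50
y <- 1000 + 0.1 * x + rnorm(50, sd = 5)

fit0 <- lm(y ~ x - 1)          # no-intercept fit
rss  <- sum(resid(fit0)^2)

## R's convention for a no-intercept model: compare against the line y = 0.
r2_vs_zero <- 1 - rss / sum(y^2)

## Mean-based convention (what a with-intercept R-squared uses): this can
## go negative when the forced fit is worse than the flat line y = mean(y).
r2_vs_mean <- 1 - rss / sum((y - mean(y))^2)

r2_vs_zero   # positive
r2_vs_mean   # negative here
```

The sign mismatch in the thread (+0.9331 vs. -0.1005) is of exactly this kind: two different denominators answering two different questions.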
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Lance Westerhoff
> Sent: Tuesday, January 24, 2006 8:51 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] R vs. Excel (R-squared)
>
> Hello All-
>
> I found an inconsistency between the R-squared reported in Excel vs.
> that in R, and I am wondering which (if any) may be correct and if
> this is a known issue. While it certainly wouldn't surprise me if
> Excel is just flat out wrong, I just want to make sure since the
> R-squared reported in R seems surprisingly high. Please let me know
> if this is the wrong list. Thanks!
>
<<snip>>
>
> > # note: a is experimental and c is predicted
> > summary(lm(a~c-1))
>
> Call:
> lm(formula = a ~ c - 1)
>
<<snip>>
>
> Multiple R-Squared: 0.9331,     Adjusted R-squared: 0.9329
> F-statistic:  5088 on 1 and 365 DF,  p-value: < 2.2e-16
>
<<snip>>

Lance,

Did you force the regression through the origin in Excel, like you are doing with your R code? And why are you doing the regression without an intercept in R?

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
Hello Lance,

this was discussed on the list lately, see:

http://tolstoy.newcastle.edu.au/~rking/R/help/06/01/18934.html

Bernhard

-----Original Message-----
From: Lance Westerhoff [mailto:lance at quantumbioinc.com]
Sent: Tuesday, January 24, 2006 17:51
To: r-help at stat.math.ethz.ch
Subject: [R] R vs. Excel (R-squared)

Hello All-

I found an inconsistency between the R-squared reported in Excel vs. that in R, and I am wondering which (if any) may be correct and if this is a known issue. While it certainly wouldn't surprise me if Excel is just flat out wrong, I just want to make sure since the R-squared reported in R seems surprisingly high. Please let me know if this is the wrong list. Thanks!

<<snip>>

Thank you very much for your time!

-Lance
____________________
Lance M. Westerhoff, Ph.D.
General Manager
QuantumBio Inc.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
I would think that a negative value for a statistic that takes on non-negative values would be suspiciously low. But some people think I'm overly skeptical.

There may or may not be a discussion of this in one or more of the links in 'Spreadsheet Addiction':

http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

The executive summary is, "Don't do statistics in spreadsheets, especially Excel."

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Lance Westerhoff wrote:

>Hello All-
>
>I found an inconsistency between the R-squared reported in Excel vs.
>that in R, and I am wondering which (if any) may be correct and if
>this is a known issue. While it certainly wouldn't surprise me if
>Excel is just flat out wrong, I just want to make sure since the
>R-squared reported in R seems surprisingly high. Please let me know
>if this is the wrong list. Thanks!
>
<<snip>>
> -----Original Message-----
> From: Lance Westerhoff [mailto:lance at quantumbioinc.com]
> Sent: Tuesday, January 24, 2006 10:48 AM
> To: Nordlund, Dan
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] R vs. Excel (R-squared)
>
> On Jan 24, 2006, at 12:11 PM, Nordlund, Dan wrote:
>
> > Lance,
> >
> > Did you force the regression through the origin in Excel, like you
> > are doing with your R code? And why are you doing the regression
> > without an intercept in R?
> >
> > Dan
>
> Hi Dan-
>
> The reason the intercept is forced to be zero is that I would like
> to determine how well my prediction compares to experiment.
> Therefore, the only point we really know is (0,0) - everything else
> is conjecture. In both the Excel case and the R case, the intercept
> is forced to be zero. As for your question about the regression
> without an intercept in R, I'm not sure what you mean. Haven't I set
> the intercept to be zero?
>
> Thanks!
>
> -Lance

Your model formula, a ~ c - 1, estimates a slope coefficient but removes the column of 1's that would be used to estimate an intercept term (i.e., you eliminated the intercept term). This effectively forces the regression through the origin. So yes, you set the intercept to zero.

You stated that "the only point we really know is (0,0)". The reason I asked why you were forcing the regression through the origin is that *I* don't know that you know anything about what happens when c=0. It is possible that your measurement process, whatever that might be, has a bias such that a is not equal to 0 when c=0. Did you actually have any points where the predicted value was 0? Just something to think about.

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
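Dan's description of what the "- 1" does can be seen directly in the design matrix. A minimal R illustration (toy data standing in for the thread's a and c):

```r
## Toy data: a is "experimental", c is "predicted", as in the thread.
d <- data.frame(a = c(1.1, 2.3, 2.9), c = c(1.0, 2.0, 3.0))

## With an intercept: the design matrix has an (Intercept) column of 1's.
model.matrix(a ~ c, d)

## With "- 1": the column of 1's is removed, so the fitted line must
## pass through the origin.
model.matrix(a ~ c - 1, d)

## The fit correspondingly reports a single slope and no intercept.
coef(lm(a ~ c - 1, d))
```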
Hi

In a model without an intercept, R-squared is high. See e.g. Julian J. Faraway - Practical Regression:

"Warning: R2 as defined here doesn't make any sense if you do not have an intercept in your model. This is because the denominator in the definition of R2 has a null model with an intercept in mind when the sum of squares is calculated. Alternative definitions of R2 are possible when there is no intercept, but the same graphical intuition is not available and the R2's obtained should not be compared to those for models with an intercept. ***Beware of high R2's reported from models without an intercept.***"

HTH
Petr

On 24 Jan 2006 at 11:50, Lance Westerhoff wrote:

To: r-help at stat.math.ethz.ch
From: Lance Westerhoff <lance at quantumbioinc.com>
Date sent: Tue, 24 Jan 2006 11:50:43 -0500
Subject: [R] R vs. Excel (R-squared)

> Hello All-
>
> I found an inconsistency between the R-squared reported in Excel vs.
> that in R, and I am wondering which (if any) may be correct and if
> this is a known issue. While it certainly wouldn't surprise me if
> Excel is just flat out wrong, I just want to make sure since the
> R-squared reported in R seems surprisingly high. Please let me know
> if this is the wrong list. Thanks!
>
<<snip>>

Petr Pikal
petr.pikal at precheza.cz
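Faraway's warning about the denominator can be verified against what summary.lm() actually reports. A quick R check (simulated data): for a no-intercept fit, the printed "Multiple R-Squared" equals 1 - RSS/sum(y^2), i.e. the null model is y = 0, not y = mean(y).

```r
set.seed(42)
x <- 1:20
y <- 2 * x + rnorm(20)

fit <- lm(y ~ x - 1)           # no-intercept model
rss <- sum(resid(fit)^2)

r2_reported <- summary(fit)$r.squared      # as printed by summary()
r2_zero     <- 1 - rss / sum(y^2)          # null model: y = 0
r2_mean     <- 1 - rss / sum((y - mean(y))^2)  # mean-based version

all.equal(r2_reported, r2_zero)   # TRUE: summary() uses sum(y^2)
```

This is why a no-intercept R-squared should not be compared with a with-intercept one (such as the mean-based number Excel produced): the two statistics are measured against different baselines.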