Hello All-

I found an inconsistency between the R-squared reported in Excel vs. that in R, and I am wondering which (if any) may be correct and if this is a known issue. While it certainly wouldn't surprise me if Excel is just flat out wrong, I just want to make sure since the R-squared reported in R seems surprisingly high. Please let me know if this is the wrong list. Thanks!

To begin, I have a set of data points in which y is the experimental number and x is the predicted value. The Excel-generated graph (complete with R^2 and trend line) is provided at this link if you want to take a look:

http://www.quantumbioinc.com/downloads/public/excel.png

As you can see, the R-squared reported by Excel is -0.1005, while the same data brought into R give an R-squared of +0.9331 (see below). Being that I am new to R and semi-new to stats: is there a difference between "Multiple R-squared" and plain R-squared that I am simply misinterpreting, or is this a known inconsistency between the two applications? If so, which is correct? Any insight would be greatly appreciated!

=====================
> # note: a is experimental and c is predicted
> summary(lm(a ~ c - 1))

Call:
lm(formula = a ~ c - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-2987.6 -1126.6  -181.7   855.3  5602.8

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
c  0.99999    0.01402   71.33   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1423 on 365 degrees of freedom
Multiple R-Squared: 0.9331,     Adjusted R-squared: 0.9329
F-statistic:  5088 on 1 and 365 DF,  p-value: < 2.2e-16

> version
         _
platform powerpc-apple-darwin7.9.0
arch     powerpc
os       darwin7.9.0
system   powerpc, darwin7.9.0
status
major    2
minor    2.1
year     2005
month    12
day      20
svn rev  36812
language R
=====================

Thank you very much for your time!

-Lance
____________________
Lance M. Westerhoff, Ph.D.
General Manager
QuantumBio Inc.

WWW: http://www.quantumbioinc.com
Email: lance at quantumbioinc.com

"Safety is not the most important thing. I know this sounds like heresy,
but it is a truth that must be embraced in order to do exploration.
The most important thing is to actually go." ~ James Cameron
Lance Westerhoff <lance at quantumbioinc.com> writes:

> Hello All-
>
> I found an inconsistency between the R-squared reported in Excel vs.
> that in R, and I am wondering which (if any) may be correct and if
> this is a known issue. While it certainly wouldn't surprise me if
> Excel is just flat out wrong, I just want to make sure since the
> R-squared reported in R seems surprisingly high. Please let me know
> if this is the wrong list. Thanks!

Excel is flat out wrong. As the name implies, R-squared values cannot be less than zero (adjusted R-squared can, but I wouldn't think that is what Excel does).

R-squared is a bit odd in the zero-intercept case because it describes how much better the line describes the data compared to a horizontal line *at zero*. However, it doesn't really make sense to compare with a non-zero constant, because the models are not nested.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark       Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
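Peter's point about the horizontal line *at zero* can be checked numerically. Here is a small R sketch (simulated data, not Lance's original) comparing the two conventions; the same residual sum of squares yields a positive value against one denominator and a negative value against the other:

```r
## Simulated data with a large offset and only a weak trend, so a line
## forced through the origin fits badly.
set.seed(1)
x <- 1:50
y <- 1000 + 0.1 * x + rnorm(50, sd = 5)

fit0 <- lm(y ~ x - 1)          # no-intercept fit
rss  <- sum(resid(fit0)^2)

## R's convention for a no-intercept model: compare against the line y = 0.
r2_vs_zero <- 1 - rss / sum(y^2)

## Mean-based convention (what a with-intercept R-squared uses): this can
## go negative when the forced fit is worse than the flat line y = mean(y).
r2_vs_mean <- 1 - rss / sum((y - mean(y))^2)

r2_vs_zero   # positive
r2_vs_mean   # negative here
```

The sign mismatch in the thread (+0.9331 vs. -0.1005) is of exactly this kind: two different denominators answering two different questions.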
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Lance Westerhoff
> Sent: Tuesday, January 24, 2006 8:51 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] R vs. Excel (R-squared)
>
> Hello All-
>
> I found an inconsistency between the R-squared reported in Excel vs.
> that in R, and I am wondering which (if any) may be correct and if
> this is a known issue. While it certainly wouldn't surprise me if
> Excel is just flat out wrong, I just want to make sure since the
> R-squared reported in R seems surprisingly high. Please let me know
> if this is the wrong list. Thanks!
>
<<snip>>
>
> > # note: a is experimental and c is predicted
> > summary(lm(a~c-1))
>
> Call:
> lm(formula = a ~ c - 1)
>
<<snip>>
>
> Multiple R-Squared: 0.9331,     Adjusted R-squared: 0.9329
> F-statistic:  5088 on 1 and 365 DF,  p-value: < 2.2e-16
>
<<snip>>

Lance,

Did you force the regression through the origin in Excel, like you are doing with your R code? And why are you doing the regression without an intercept in R?

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
Hello Lance,

this was discussed on the list lately, see:

http://tolstoy.newcastle.edu.au/~rking/R/help/06/01/18934.html

Bernhard

-----Original Message-----
From: Lance Westerhoff [mailto:lance at quantumbioinc.com]
Sent: Tuesday, January 24, 2006 17:51
To: r-help at stat.math.ethz.ch
Subject: [R] R vs. Excel (R-squared)

Hello All-

I found an inconsistency between the R-squared reported in Excel vs. that in R, and I am wondering which (if any) may be correct and if this is a known issue. While it certainly wouldn't surprise me if Excel is just flat out wrong, I just want to make sure since the R-squared reported in R seems surprisingly high. Please let me know if this is the wrong list. Thanks!

<<snip>>

Thank you very much for your time!

-Lance
____________________
Lance M. Westerhoff, Ph.D.
General Manager
QuantumBio Inc.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
I would think that a negative value for a statistic that takes on non-negative values would be suspiciously low. But some people think I'm overly skeptical.

There may or may not be a discussion of this in one or more of the links in 'Spreadsheet Addiction':

http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

The executive summary is, "Don't do statistics in spreadsheets, especially Excel."

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Lance Westerhoff wrote:

>Hello All-
>
>I found an inconsistency between the R-squared reported in Excel vs.
>that in R, and I am wondering which (if any) may be correct and if
>this is a known issue. While it certainly wouldn't surprise me if
>Excel is just flat out wrong, I just want to make sure since the
>R-squared reported in R seems surprisingly high. Please let me know
>if this is the wrong list. Thanks!
>
<<snip>>
> -----Original Message-----
> From: Lance Westerhoff [mailto:lance at quantumbioinc.com]
> Sent: Tuesday, January 24, 2006 10:48 AM
> To: Nordlund, Dan
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] R vs. Excel (R-squared)
>
> On Jan 24, 2006, at 12:11 PM, Nordlund, Dan wrote:
>
> > Lance,
> >
> > Did you force the regression through the origin in Excel, like you
> > are doing with your R code? And why are you doing the regression
> > without an intercept in R?
> >
> > Dan
>
> Hi Dan-
>
> The reason the intercept is forced to be zero is that I would like
> to determine how well my prediction compares to experiment.
> Therefore, the only point we really know is (0,0) - everything else
> is conjecture. In both the Excel case and the R case, the intercept
> is forced to be zero. As for your question about the regression
> without an intercept in R, I'm not sure what you mean. Haven't I set
> the intercept to be zero?
>
> Thanks!
>
> -Lance

Your model formula, a ~ c - 1, estimates a slope coefficient but removes the column of 1's that would be used to estimate an intercept term (i.e., you eliminated the intercept term). This effectively forces the regression through the origin. So yes, you set the intercept to zero.

You stated that "the only point we really know is (0,0)". The reason I asked why you were forcing the regression through the origin is that *I* don't know that you know anything about what happens when c=0. It is possible that your measurement process, whatever that might be, has a bias such that a is not equal to 0 when c=0. Did you actually have any points where the predicted value was 0? Just something to think about.

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
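Dan's description of what the "- 1" does can be seen directly in the design matrix. A minimal R illustration (toy data standing in for the thread's a and c):

```r
## Toy data: a is "experimental", c is "predicted", as in the thread.
d <- data.frame(a = c(1.1, 2.3, 2.9), c = c(1.0, 2.0, 3.0))

## With an intercept: the design matrix has an (Intercept) column of 1's.
model.matrix(a ~ c, d)

## With "- 1": the column of 1's is removed, so the fitted line must
## pass through the origin.
model.matrix(a ~ c - 1, d)

## The fit correspondingly reports a single slope and no intercept.
coef(lm(a ~ c - 1, d))
```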
Hi

In a model without an intercept, R-squared is high. See e.g. Julian J. Faraway - Practical Regression:

"Warning: R2 as defined here doesn't make any sense if you do not have an intercept in your model. This is because the denominator in the definition of R2 has a null model with an intercept in mind when the sum of squares is calculated. Alternative definitions of R2 are possible when there is no intercept, but the same graphical intuition is not available and the R2's obtained should not be compared to those for models with an intercept. ***Beware of high R2's reported from models without an intercept.***"

HTH
Petr

On 24 Jan 2006 at 11:50, Lance Westerhoff wrote:

To: r-help at stat.math.ethz.ch
From: Lance Westerhoff <lance at quantumbioinc.com>
Date sent: Tue, 24 Jan 2006 11:50:43 -0500
Subject: [R] R vs. Excel (R-squared)

> Hello All-
>
> I found an inconsistency between the R-squared reported in Excel vs.
> that in R, and I am wondering which (if any) may be correct and if
> this is a known issue. While it certainly wouldn't surprise me if
> Excel is just flat out wrong, I just want to make sure since the
> R-squared reported in R seems surprisingly high. Please let me know
> if this is the wrong list. Thanks!
>
<<snip>>

Petr Pikal
petr.pikal at precheza.cz
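Faraway's warning about the denominator can be verified against what summary.lm() actually reports. A quick R check (simulated data): for a no-intercept fit, the printed "Multiple R-Squared" equals 1 - RSS/sum(y^2), i.e. the null model is y = 0, not y = mean(y).

```r
set.seed(42)
x <- 1:20
y <- 2 * x + rnorm(20)

fit <- lm(y ~ x - 1)           # no-intercept model
rss <- sum(resid(fit)^2)

r2_reported <- summary(fit)$r.squared      # as printed by summary()
r2_zero     <- 1 - rss / sum(y^2)          # null model: y = 0
r2_mean     <- 1 - rss / sum((y - mean(y))^2)  # mean-based version

all.equal(r2_reported, r2_zero)   # TRUE: summary() uses sum(y^2)
```

This is why a no-intercept R-squared should not be compared with a with-intercept one (such as the mean-based number Excel produced): the two statistics are measured against different baselines.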