John Hunter
2009-Jun-28 05:32 UTC
[R] multiple regression w/ no intercept; strange results
I am writing some software to do multiple regression and am using R to
benchmark the results. The results are squaring up nicely for the
"with-intercept" case but not for the "no-intercept" case, and I am not
sure what R is doing to get the statistics for the zero-intercept case.
For example, I would expect the Multiple R-squared to equal the square
of the correlation between the actual values "y" and the fitted values
"yprime". For the with-intercept case they are equal, but not for the
"no-intercept" case. My sample file and R session output are below.

> dataset = read.table("/Users/jdhunter/tmp/sample1.csv", header=TRUE, sep=",")

The "with-intercept" fit: the "Multiple R-squared" is equal to
cor(yprime, y)**2:

> fit <- lm(y ~ x1 + x2, data=dataset)
> summary(fit)

Call:
lm(formula = y ~ x1 + x2, data = dataset)

Residuals:
    Min      1Q  Median      3Q     Max
-1.8026 -0.4651  0.1778  0.5241  1.0222

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.10358    1.26103  -3.254  0.00467 **
x1           0.08641    0.03144   2.748  0.01372 *
x2           0.08760    0.04548   1.926  0.07100 .
---

Residual standard error: 0.7589 on 17 degrees of freedom
Multiple R-squared: 0.6709,     Adjusted R-squared: 0.6322
F-statistic: 17.33 on 2 and 17 DF,  p-value: 7.888e-05

> yp = fitted.values(fit)
> cor(yp, dataset$y)**2
[1] 0.6709279

The "no-intercept" fit: the "Multiple R-squared" is not equal to
cor(yprime, y)**2:

> fitno <- lm(y ~ 0 + x1 + x2, data=dataset)
> summary(fitno)

Call:
lm(formula = y ~ 0 + x1 + x2, data = dataset)

Residuals:
     Min       1Q   Median       3Q      Max
-1.69640 -0.58134  0.03650  0.53673  1.33358

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x1  0.03655    0.03399   1.075    0.296
x2  0.04358    0.05376   0.811    0.428

Residual standard error: 0.9395 on 18 degrees of freedom
Multiple R-squared: 0.9341,     Adjusted R-squared: 0.9267
F-statistic: 127.5 on 2 and 18 DF,  p-value: 2.352e-11

> ypno = fitted.values(fitno)
> cor(ypno, dataset$y)
[1] 0.6701336

If anyone has some suggestions about how R is computing these summary
stats for the no-intercept case, or references to literature or docs,
that would be helpful. It seems odd to me that dropping the intercept
would cause the R^2 and F statistics to rise so dramatically, and the
p-value to consequently drop so much. In my implementation I get the
same beta1 and beta2, and the R2 I compute using the
variance_regression / variance_total agrees with cor(ypno, dataset$y)
but not with the value R reports in the summary, and my F and p-values
are similarly off for the no-intercept case.

Thanks,
JDH

R version 2.9.1 (2009-06-26)

home:~/tmp> uname -a
Darwin Macintosh-7.local 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24
17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386
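For reference, ?summary.lm spells out the convention behind the numbers
above: the reported "Multiple R-squared" is 1 - RSS/TSS, where TSS is
taken about mean(y) when the model has an intercept and about zero when
it does not. A minimal sketch that reproduces both reported values,
reusing the fit, fitno, and dataset objects from the session above:

## no-intercept case: TSS about zero, matches the reported 0.9341
rss.no <- sum(residuals(fitno)^2)
1 - rss.no / sum(dataset$y^2)

## with-intercept case: TSS about mean(y), matches the reported 0.6709
rss.wi <- sum(residuals(fit)^2)
1 - rss.wi / sum((dataset$y - mean(dataset$y))^2)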
Dieter Menne
2009-Jun-28 08:38 UTC
[R] multiple regression w/ no intercept; strange results
> I am writing some software to do multiple regression and am using R to
> benchmark the results. The results are squaring up nicely for the
> "with-intercept" case but not for the "no-intercept" case. I am not
> sure what R is doing to get the statistics for the 0 intercept case.
> ... It seems odd to me that dropping the intercept would cause the R^2
> and F stats to rise so dramatically, and the p value to consequently
> drop so much. In my implementation, I get the same beta1 and beta2,
> and the R2 I compute using the

Removing the intercept can harm your sanity. See

http://markmail.org/message/q67jf7uaig7d4tkm

for an example.

Dieter
John Hunter
2009-Jun-29 13:39 UTC
[R] multiple regression w/ no intercept; strange results
On Sun, Jun 28, 2009 at 3:38 AM, Dieter Menne
<dieter.menne at menne-biomed.de> wrote:

> It seems odd to me that dropping the intercept would cause the R^2 and
> F stats to rise so dramatically, and the p value to consequently drop
> so much. In my implementation, I get the same beta1 and beta2, and the
> R2 I compute using the
>
> Removing the intercept can harm your sanity. See
>
> http://markmail.org/message/q67jf7uaig7d4tkm
>
> for an example.

I read the paper and the example, so thanks for sending those along.
The paper made some good arguments from a modeling perspective for
keeping the intercept -- the most convincing to me is that you would
like the model to be robust to a location and scale transformation. But
my question was more numerical: in particular, I expected the R^2 of
the model to equal the square of the correlation between the fitted
values and the actual values. It does with the intercept and does not
without it, as my code example shows. Am I correct in assuming these
should always be the same, and if they are not, does that reflect a bug
in R or perhaps a numerical instability?

You also wrote in your post "There are reasons why the standard
textbooks...". I read the reasons Venables addressed in his "Exegeses",
but none of them seems to address my particular concern. Can you
elaborate on these or provide additional links?

Thanks!
JDH
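On the numerical question itself: the identity R^2 = cor(fitted, y)^2
holds only when the model contains an intercept (or an equivalent
constant in its column space), because it relies on the residuals
summing to zero; a no-intercept fit does not guarantee that, so the two
numbers are not expected to agree and their disagreement is not a bug.
The F statistic follows the same convention, comparing the fit against
the zero model rather than against y = mean(y). A minimal sketch that
reproduces the no-intercept F statistic from the first message, reusing
fitno and dataset (n = 20 observations, p = 2 regressors):

n <- nrow(dataset)                       # 20 observations
p <- 2                                   # x1 and x2, no intercept column
rss <- sum(residuals(fitno)^2)
tss <- sum(dataset$y^2)                  # TSS about zero, not mean(y)
Fstat <- ((tss - rss) / p) / (rss / (n - p))
Fstat                                    # matches the reported 127.5
pf(Fstat, p, n - p, lower.tail = FALSE)  # matches the reported 2.352e-11

This also explains the dramatic jump: sum(y^2) is much larger than
sum((y - mean(y))^2) whenever mean(y) is far from zero, so the
no-intercept R^2 and F are measured against a much bigger baseline.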