Please forgive a straight stats question, and the informal notation. Let us say we wish to perform a linear regression:

y = b0 + b1*x + b2*z

There are two ways this can be done: the usual way, as a single regression,

fit1 <- lm(y ~ x + z)

or by doing two regressions. In the first regression we could have y as the dependent variable and x as the independent variable:

fit2 <- lm(y ~ x)

The second regression would be one in which the residuals from the first regression are the dependent variable and z is the independent variable:

fit3 <- lm(fit2$residuals ~ z)

I would think the two methods would give the same p value and the same beta coefficient for z. They don't. Can someone help me understand why the two methods do not give the same results? Additionally, could someone tell me when one method might be better than the other, i.e. what question does the first method answer, and what question does the second method answer? I have searched a number of textbooks and have not found this question addressed.

Thanks,
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
University of Maryland School of Medicine, Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street, GRECC (BT/18/GR)
Baltimore, MD 21201-1524
410-605-7119
NOTE NEW EMAIL ADDRESS: jsorkin@grecc.umaryland.edu
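A minimal sketch that reproduces the discrepancy John describes, on simulated data. The data-generating values mirror Jari Oksanen's example later in the thread; the seed is an arbitrary added assumption for reproducibility:

set.seed(42)
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)   # x and z correlated, i.e. not orthogonal
y <- 3 + x + z + rnorm(100, sd = 0.3)

fit1 <- lm(y ~ x + z)           # method 1: single regression
fit2 <- lm(y ~ x)               # method 2, step 1
fit3 <- lm(fit2$residuals ~ z)  # method 2, step 2

coef(fit1)["z"]                 # differs from ...
coef(fit3)["z"]                 # ... this, because z is correlated with x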
Bill.Venables@csiro.au
2005-Apr-06 03:25 UTC
[R] two methods for regression, two different results
This is possible if x and z are orthogonal, but in general it doesn't work, as you have noted. (If it did work, it would almost amount to a way of inverting general square matrices by working one row at a time, no going back...)

It is possible to fit a bivariate regression using simple linear regression techniques iteratively like this, but it is a bit more involved than your two-step process (a runnable sketch of these steps follows below):

1. Regress y on x and take the residuals:
   ryx <- resid(lm(y ~ x))

2. Regress z on x and take the residuals:
   rzx <- resid(lm(z ~ x))

3. Regress ryx on rzx:
   fitz <- lm(ryx ~ rzx)

4. This gives you the estimate of the coefficient on z (what you call b2):
   b2 <- coef(fitz)[2]

5. Regress y - b2*z on x:
   fitx <- lm(I(y - b2*z) ~ x)

This last step gets you the estimates of b0 and b1.

None of this works with significances, though, because in going about it this way you have essentially disguised the degrees of freedom involved. So you can get the right estimates, but the standard errors, t-statistics and residual variances are all somewhat inaccurate (though usually not by much). If x and z are orthogonal, the (curious-looking) step 2 is not needed.

This kind of idea lies behind some algorithms (e.g. Stevens' algorithm) for fitting very large regressions essentially by iterative processes, to avoid constructing a huge model matrix.

Bill Venables

-----Original Message-----
From: John Sorkin
Sent: Wednesday, 6 April 2005 12:55 PM
To: r-help at stat.math.ethz.ch
Subject: [R] two methods for regression, two different results

[original question snipped; quoted in full above]
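A self-contained sketch of the five steps above, on simulated data. The data-generating model mirrors Jari Oksanen's example below; the seed is an added assumption for reproducibility:

set.seed(1)
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)    # x and z deliberately correlated
y <- 3 + x + z + rnorm(100, sd = 0.3)

ryx <- resid(lm(y ~ x))          # step 1: residuals of y on x
rzx <- resid(lm(z ~ x))          # step 2: residuals of z on x
fitz <- lm(ryx ~ rzx)            # step 3: regress ryx on rzx
b2 <- coef(fitz)[2]              # step 4: estimate of the z coefficient
fitx <- lm(I(y - b2 * z) ~ x)    # step 5: recovers b0 and b1

coef(lm(y ~ x + z))              # single-regression estimates ...
c(coef(fitx), b2 = unname(b2))   # ... match the iterative estimates

The point estimates agree; as Bill notes, the standard errors and p-values reported by fitz and fitx do not, because the degrees of freedom are disguised.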
On Tue, 2005-04-05 at 22:54 -0400, John Sorkin wrote:
> [original question snipped; quoted in full above]

John,

Bill Venables already told you why they don't give the same results: x and z are not orthogonal. Here is a simpler way of getting the same result as he suggested for the coefficient of z (but only for z):

> x <- runif(100)
> z <- x + rnorm(100, sd=0.4)
> y <- 3 + x + z + rnorm(100, sd=0.3)
> mod <- lm(y ~ x + z)
> mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)
> summary(mod)

Call:
lm(formula = y ~ x + z)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.96436    0.06070   48.836  < 2e-16 ***
x            0.96272    0.11576    8.317 5.67e-13 ***
z            1.08922    0.06711   16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

> summary(mod2)

Call:
lm(formula = residuals(lm(y ~ x)) ~ x + z)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15731    0.06070   -2.592   0.0110 *
x           -0.84459    0.11576   -7.296 8.13e-11 ***
z            1.08922    0.06711   16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

You can omit x from the outer lm only if x and z are orthogonal, even though you already removed the effect of x; in the orthogonal case the coefficient for x would be 0. The residuals are equal in these two models:

> range(residuals(mod) - residuals(mod2))
[1] -2.797242e-17  5.551115e-17

But, of course, the fitted values are not equal, since you fit mod2 to the residuals after removing the effect of x...

cheers, jari oksanen
--
Jari Oksanen <jarioksa at sun3.oulu.fi>
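A sketch of the orthogonal case mentioned above. Here z is made exactly orthogonal to x by construction (by residualizing noise on x); the variable names, seed, and data-generating values are illustrative assumptions:

set.seed(2)
x <- runif(100)
z <- resid(lm(rnorm(100) ~ x))        # z exactly orthogonal to x by construction
y <- 3 + x + z + rnorm(100, sd = 0.3)

coef(lm(y ~ x + z))["z"]              # full regression
coef(lm(resid(lm(y ~ x)) ~ z))["z"]   # John's two-step method: same estimate

The two estimates now agree exactly; the p-values still differ slightly, because the two-step fit has one extra residual degree of freedom, which is Bill's point about disguised degrees of freedom.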