Hi,

I've been researching how to compare the slopes of two regression lines
(of y versus x for 2 groups, "group" being a factor) using R.

I knew the method based on the following statement:

  t = (b1 - b2) / s_{b1,b2}

where b1 and b2 are the two slope coefficients and s_{b1,b2} is the
pooled standard error of the slopes, which can be calculated in R this
way:

> df1 <- data.frame(x=1:3, y=1:3+rnorm(3))
> df2 <- data.frame(x=1:3, y=1:3+rnorm(3))
> fit1 <- lm(y~x, df1)
> s1 <- summary(fit1)$coefficients
> fit2 <- lm(y~x, df2)
> s2 <- summary(fit2)$coefficients
> db <- (s2[2,1]-s1[2,1])
> sd <- sqrt(s2[2,2]^2+s1[2,2]^2)
> df <- (fit1$df.residual+fit2$df.residual)
> td <- db/sd
> 2*pt(-abs(td), df)
[1] 0.9510506

However, I also found a procedure in Wonnacott & Wonnacott that is
based on a dummy variable D taking a binary value according to the
group to which a given point belongs (group 1: D=0; group 2: D=1).
The equation that is fitted is then:

  y = b0 + b1.x + D.b2.x

which can be computed in R with:

> fit <- lm(y ~ group + x + x:group)

where y is the response of the 2 groups. The p-value of x:group gives
the probability for the two slopes to be different, and the estimated
parameter values are those of both populations.

These two methods have already been described on the mailing list but
not compared and discussed. So, my questions are:
- are these methods different?
- which one should preferably be used?

This is not really a question about R but more about statistics. I
don't think I'm being very clear, and I know my descriptions are not
rigorous, but I hope someone will understand me.

Thanks,
Etienne

-------------------------------------------------------------------
Etienne Toffin, PhD Student
Unit of Social Ecology
Université Libre de Bruxelles, CP 231
Boulevard du Triomphe
B-1050 Brussels
Belgium

Tel: +32(0)2/650.55.30
Fax: +32(0)/650.59.87
Skype: etienne_titou
http://www.ulb.ac.be/sciences/use/toffin.html
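[Editorial note: for readers following along, here is the statistic in
the first method with the pooled standard error and degrees of freedom
made explicit. This simply restates what the code above computes:

\[
t = \frac{b_1 - b_2}{s_{b_1,b_2}}, \qquad
s_{b_1,b_2} = \sqrt{\mathrm{SE}(b_1)^2 + \mathrm{SE}(b_2)^2}, \qquad
\nu = (n_1 - 2) + (n_2 - 2),
\]

and the two-sided p-value is 2*pt(-abs(t), nu), exactly as computed by
td, sd and df in the code.]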
Etienne Toffin <etoffin <at> ulb.ac.be> writes:

> I've been researching how to compare the slopes of two regression
> lines (of y versus x for 2 groups, "group" being a factor) using R.
>
> I knew the method based on the following statement:
>
>   t = (b1 - b2) / s_{b1,b2}
>
> where b1 and b2 are the two slope coefficients and s_{b1,b2} is the
> pooled standard error of the slopes.
>
> However, I also found a procedure in Wonnacott & Wonnacott that is
> based on a dummy variable D taking a binary value according to the
> group to which a given point belongs (group 1: D=0; group 2: D=1).
> The equation that is fitted is then:
>
>   y = b0 + b1.x + D.b2.x
>
> which can be computed in R with:
>
> > fit <- lm(y ~ group + x + x:group)
>
> where y is the response of the 2 groups. The p-value of x:group gives
> the probability for the two slopes to be different, and the estimated
> parameter values are those of both populations.
>
> These two methods have already been described on the mailing list but
> not compared and discussed. So, my questions are:
> - are these methods different?
> - which one should preferably be used?

I think you're perfectly clear. These procedures are identical: the
first has the virtue of being very mechanical and transparent, but the
second is much easier (and easier to extend, e.g. to multiple groups):

dat <- data.frame(x=rep(1:3,2), y=rep(1:3,2)+rnorm(6),
                  f=factor(rep(1:2,each=3)))

test1 <- function(dat) {
  ## fit y ~ x separately within each group
  fits  <- lapply(split(dat, dat$f), lm, formula=y~x)
  sums  <- lapply(fits, summary)
  coefs <- lapply(sums, coef)
  ## difference of slopes, pooled SE, pooled df, t statistic
  db <- coefs[[2]]["x","Estimate"] - coefs[[1]]["x","Estimate"]
  sd <- sqrt(sum(sapply(coefs, function(x) x["x","Std. Error"])^2))
  df <- sum(sapply(fits, "[[", "df.residual"))
  td <- db/sd
  c(est=db, sd=sd, tstat=td, prt=2*pt(-abs(td), df))
}

test2 <- function(dat) {
  ## one model with an x-by-group interaction
  fit <- lm(y ~ x*f, data=dat)
  coef(summary(fit))["x:f2",]
}

rbind(test1(dat), test2(dat))

           est       sd     tstat       prt
[1,] 0.5333555 1.382019 0.3859249 0.7367364
[2,] 0.5333555 1.382019 0.3859249 0.7367364
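[Editorial note: to illustrate the point about extending the second
approach, here is a hypothetical three-group sketch (new simulated
data, not from the original thread). With more than two groups, no
single interaction coefficient summarizes all the slope differences,
so one compares nested models with an F test instead:

set.seed(101)
dat3 <- data.frame(x = rep(1:5, 3),
                   y = rep(1:5, 3) + rnorm(15),
                   f = factor(rep(1:3, each=5)))
fit0 <- lm(y ~ x + f, data=dat3)  # common slope, separate intercepts
fit1 <- lm(y ~ x * f, data=dat3)  # separate slopes and intercepts
anova(fit0, fit1)                 # F test: are all three slopes equal?
]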
Hi,

Yes, the two methods are equivalent. The p-value R calculates is based
on the same t-statistic used in your manual analysis. You can see this
by doing the second method:

y2 = rbind(df1, df2)
y2 = cbind(c(0,0,0,1,1,1), y2)  # columns are now: D, x, y
summary(lm(y2[,3] ~ y2[,1] + y2[,2] + y2[,2]*y2[,1]))

Look at the values you previously calculated and see where they
reappear...

print(td)
print(db)
print(sd)

Looked at from the other way, the models with the D's and so on are
one way to explain where the t-test comes from. Just do H0: b2 = 0 vs
H1: b2 != 0, and sprinkle some independence and normality assumptions.

It's probably preferable to use the automatic lm-based method, because
then you specify the model explicitly, while with the seemingly
recipe-based approach the actual models and hypotheses you are testing
may not be clear. Plus you get nice diagnostic statistics and pretty
graphs. The downside is that you might get lured into complacency...

Zhou Fang

PS: Your model equation isn't right. In both methods, we are also
allowing the intercept to vary between groups. So really you want:

  y = c + D.b0 + b1.x + D.b2.x
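[Editorial note: a minimal sketch of the corrected model in the PS,
with D coded explicitly and named coefficients; it assumes df1 and df2
from the original post still exist, and the variable names are
illustrative:

dat   <- rbind(df1, df2)
dat$D <- c(0, 0, 0, 1, 1, 1)        # group 1: D=0; group 2: D=1
## y = c + b0*D + b1*x + b2*(D*x)
fit <- lm(y ~ D + x + D:x, data=dat)
summary(fit)  # the "D:x" coefficient is the slope difference b2;
              # compare its estimate and t test with db and td above
]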