thr3ads.net - R help - [R] Interaction term in multiple regression [Jul 2009]

kfortino at email.unc.edu

2009-Jul-14 01:31 UTC

[R] Interaction term in multiple regression

Hello All, Thank you for taking my question.  I am looking for 
information on how R handles interaction terms in a multiple regression 
using the "lm" command.  I originally noticed something was unusual 
when my R output did not match the output from JMP for an identical 
test run previously. Both programs give identical results for the main 
test and if the models do not contain the interaction term then the 
output is identical.  However the results of the partial F tests differ 
dramatically when the interaction term is included.

Here are the results from R of the test with the interaction:
> summary(lm(TD[Year==2007]~Kd[Year==2007]*area[Year==2007], data=boon_tot))
Call:
lm(formula = TD[Year == 2007] ~ Kd[Year == 2007] * area[Year ==    
2007], data = boon_tot)

Residuals:
     Min       1Q   Median       3Q      Max
-0.42696 -0.25648 -0.11960  0.03151  1.27957

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                           5.5714     1.7995   3.096   0.0148 *
Kd[Year == 2007]                      0.2867     4.0696   0.070   0.9456
area[Year == 2007]                    0.8192     0.2874   2.851   0.0215 *
Kd[Year == 2007]:area[Year == 2007]  -1.8074     0.6320  -2.860   0.0211 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5238 on 8 degrees of freedom
Multiple R-squared: 0.6826,     Adjusted R-squared: 0.5636
F-statistic: 5.736 on 3 and 8 DF,  p-value: 0.02155

Here are the results from JMP for the same model

Source     df  SS          MS          F           p
Model       3  4.72157318  1.57385773  5.73591141  0.02155127
Error       8  2.19509349  0.27438669
C. Total   11  6.91666667

Source                         Est.         Std Error   t value     p > t
Intercept                      10.4933505   1.24016642  8.46124381  0.00002911
Kd                             -11.213166   2.95096414  -3.7998315  0.00523792
area (ha)                      0.04560254   0.03069489  1.48567197  0.17567049
(Kd-0.428)*(area (ha)-6.3625)  -1.8074455   0.63195669  -2.860078   0.02114887


As you can see although the results of the main test and the 
interaction term are identical, the estimate and std error of the other 
factors are very different.

Additionally if I remove the interaction term from the model, the two 
programs then give identical results.

Any thoughts as to why they differ would be appreciated.

Sincerely
Ken

David Winsemius

2009-Jul-14 04:01 UTC

[R] Interaction term in multiple regression

On Jul 13, 2009, at 9:31 PM, kfortino at email.unc.edu wrote:
> Hello All, Thank you for taking my question.  I am looking for  
> information on how R handles interaction terms in a multiple  
> regression using the "lm" command.  I originally noticed something  
> was unusual when my R output did not match the output from JMP for  
> an identical test run previously. Both programs give identical  
> results for the main test and if the models do not contain the  
> interaction term then the output is identical.  However the results  
> of the partial F tests differ dramatically when the interaction term  
> is included.
The interpretation of the coefficients and partial F-tests for individual  
terms of a model involving interactions is at the very least  
difficult, and I have been advised by my statistical betters simply  
not to attempt it. Compare the differences between overall model  
statistics instead and, while paying careful attention to the coding  
of terms, create predictions for combinations of variables.
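
A minimal sketch of that advice, assuming simulated stand-in data (the
boon_tot data frame is not available here, so the values and the object
names d, fit_add, fit_int and grid are hypothetical). It compares the
additive model with the interaction model as whole models, then builds
predictions over a small grid of predictor combinations:

```r
# Simulated stand-in data; the variable names only mirror the post.
set.seed(1)
d <- data.frame(Kd = runif(12, 0.2, 0.7), area = runif(12, 1, 15))
d$TD <- 10 - 11 * d$Kd + 0.05 * d$area -
  1.8 * (d$Kd - mean(d$Kd)) * (d$area - mean(d$area)) +
  rnorm(12, sd = 0.5)

fit_add <- lm(TD ~ Kd + area, data = d)  # main effects only
fit_int <- lm(TD ~ Kd * area, data = d)  # adds the Kd:area interaction

# Whole-model comparison: F test for adding the interaction
cmp <- anova(fit_add, fit_int)
print(cmp)

# Predictions for combinations of low/high Kd and low/high area
grid <- expand.grid(Kd   = unname(quantile(d$Kd,   c(0.25, 0.75))),
                    area = unname(quantile(d$area, c(0.25, 0.75))))
grid$TD_hat <- predict(fit_int, newdata = grid)
print(grid)
```

Because the interaction adds a single degree of freedom, the F statistic
from anova() equals the square of the t value on the Kd:area row of
summary(fit_int), and neither depends on whether the predictors were
centered.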
>
> Here are the results from R of the test with the interaction:
>
>> summary(lm(TD[Year==2007]~Kd[Year==2007]*area[Year==2007],  
>> data=boon_tot))
>
> Call:
> lm(formula = TD[Year == 2007] ~ Kd[Year == 2007] * area[Year ==     
> 2007], data = boon_tot)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.42696 -0.25648 -0.11960  0.03151  1.27957
>
> Coefficients:
>                                     Estimate Std. Error t value Pr(>|t|)
> (Intercept)                           5.5714     1.7995   3.096   0.0148 *
> Kd[Year == 2007]                      0.2867     4.0696   0.070   0.9456
> area[Year == 2007]                    0.8192     0.2874   2.851   0.0215 *
> Kd[Year == 2007]:area[Year == 2007]  -1.8074     0.6320  -2.860   0.0211 *
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.5238 on 8 degrees of freedom
> Multiple R-squared: 0.6826,     Adjusted R-squared: 0.5636
> F-statistic: 5.736 on 3 and 8 DF,  p-value: 0.02155
>
> Here are the results from JMP for the same model
>
> Source		df	SS		MS		F		p
> Model		3	4.72157318	1.57385773	5.73591141  0.02155127
> Error		8	2.19509349	0.27438669
> C. Total	11	6.91666667
>
> Source			Est.		Std Error	t value	p > t
> Intercept		10.4933505	1.24016642	8.46124381	0.00002911
> Kd			-11.213166	2.95096414	-3.7998315	0.00523792
> area (ha)		0.04560254	0.03069489	1.48567197	0.17567049
> (Kd-0.428)*
      ^^^^^
> (area (ha)-6.3625)	-1.8074455	0.63195669	-2.860078	0.02114887
             ^^^^^^
This suggests that JMP has automatically centered the variables prior  
to forming the interaction term. What's not so clear is whether the  
other terms may have been centered as well.
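
The posted numbers themselves allow a quick arithmetic check of the
centering explanation. The sketch below assumes the JMP model is
a0 + a_Kd*Kd + a_area*area + a_int*(Kd - 0.428)*(area - 6.3625), that is,
only the interaction term is centered; the object names are made up for
illustration. Expanding the product and collecting terms gives the
coefficients of the uncentered model that R fits:

```r
# JMP estimates and centering constants, copied from the output above
a0     <- 10.4933505
a_Kd   <- -11.213166
a_area <- 0.04560254
a_int  <- -1.8074455
m_Kd   <- 0.428
m_area <- 6.3625

# a_int * (Kd - m_Kd) * (area - m_area) expands to
#   a_int*Kd*area - a_int*m_area*Kd - a_int*m_Kd*area + a_int*m_Kd*m_area
r_coefs <- c(
  "(Intercept)" = a0 + a_int * m_Kd * m_area,
  "Kd"          = a_Kd - a_int * m_area,
  "area"        = a_area - a_int * m_Kd,
  "Kd:area"     = a_int
)
print(round(r_coefs, 4))
```

The results round to 5.5714, 0.2867, 0.8192 and -1.8074, which are the
estimates in the R output. That is consistent with the two programs
fitting the same model under different parameterizations, with the main
effect terms themselves left uncentered in JMP.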
>
> As you can see although the results of the main test and the  
> interaction term are identical, the estimate and std error of the  
> other factors are very different.
The real question would be whether they give identical predictions and  
what the differences between model statistics show when the simpler  
models are compared with the more complex. You have not yet looked at  
this question in detail, although the information is available in the  
outputs alluded to below.
> Additionally if I remove the interaction term from the model, the  
> two programs then give identical results.
Then JMP must be giving acceptable computations, I suppose.
>
> Any thoughts as to why they differ would be appreciated.
Different codings of the variables in the interaction models. Perhaps  
you could create a variable that resembles the JMP interaction term and  
see if that is confirmed, or you could review the respective manuals  
regarding interactions.
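
One way to try the first suggestion, sketched on simulated stand-in data
because boon_tot is not available here. The column Kd_area_c and the
object names are hypothetical, and centering at the sample means is an
assumption about what JMP does, inferred from the term label in its
output:

```r
# Simulated stand-in data; the variable names only mirror the post.
set.seed(42)
d <- data.frame(Kd = runif(12, 0.2, 0.7), area = runif(12, 1, 15))
d$TD <- 10 - 11 * d$Kd + 0.05 * d$area + rnorm(12, sd = 0.5)

# Hand-built product of the mean-centered predictors
d$Kd_area_c <- (d$Kd - mean(d$Kd)) * (d$area - mean(d$area))

fit_raw <- lm(TD ~ Kd * area, data = d)              # R's default coding
fit_ctr <- lm(TD ~ Kd + area + Kd_area_c, data = d)  # JMP-like coding (assumed)

# The interaction estimate and the fitted values should agree ...
same_int <- all.equal(unname(coef(fit_raw)["Kd:area"]),
                      unname(coef(fit_ctr)["Kd_area_c"]))
same_fit <- all.equal(fitted(fit_raw), fitted(fit_ctr))
print(c(same_int = isTRUE(same_int), same_fit = isTRUE(same_fit)))

# ... while the intercept and main-effect rows differ between codings
print(coef(summary(fit_raw)))
print(coef(summary(fit_ctr)))
```

Running the same comparison on the real data, with Kd_area_c built from
the 2007 rows only, would show whether the second fit reproduces the JMP
table.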

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
