Hi all,

I have a dataset with 2 independent variables: one (x1) is continuous, the other (x2) is a categorical variable with 2 levels. The dependent variable (y) is continuous. When I run the linear regression y~x1*x2, I find that the p value for the continuous independent variable x1 changes when different contrasts are used (helmert vs. treatment), while the p values for the categorical x2 and for the interaction are independent of the contrasts used. Can anyone explain why? I guess the p value for x1 is testing a different hypothesis under different contrasts? If the interaction is NOT significant, what contrast should I use to test the hypothesis that x1 is not significantly related to y?

x1<-rnorm(50,9,2)
x2<-as.factor(as.numeric(runif(50)<0.35))
y<-rnorm(50,30,5)

options(contrasts=c('contr.treatment','contr.poly'))
summary(lm(y~x1*x2))

options(contrasts=c('contr.helmert','contr.poly'))
summary(lm(y~x1*x2))
Bill.Venables@csiro.au
2005-Apr-23 05:53 UTC
[R] ANOVA with both discreet and continuous variable
An anonymous enquirer asks:

: -----Original Message-----
: From: r-help-bounces at stat.math.ethz.ch
: [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of array chip
: Sent: Saturday, 23 April 2005 2:32 PM
: To: r-help at stat.math.ethz.ch
: Subject: [R] ANOVA with both discreet and continuous variable
:
: Hi all,
:
: I have dataset with 2 independent variable, one (x1)
: is continuous, the other (x2) is a categorical
: variable with 2 levels. The dependent variable (y) is
: continuous. When I run linear regression y~x1*x2, I
: found that the p value for the continuous independent
: variable x1 changes when different contrasts was used
: (helmert vs. treatment), while the p values for the
: categorical x2 and interaction are independent of the
: contrasts used. Can anyone explain why?

Because the hypotheses the corresponding test statistics are testing are invariant with respect to the choice of contrast matrices you have considered. (This is NOT true if your factor has more than two levels, by the way.)

: I guess the p
: value for x1 is testing different hypothesis under
: different contrasts?

The tests are for different null hypotheses, yes.

: If the interaction is NOT
: significant, what contrast should I use to test the
: hypothesis that x1 is not significantly related with
: y?

There is no choice of contrast matrix that will give the test statistic associated with the linear term x1 this meaning. Your question only specifies a null hypothesis, a significance test requires a null and an alternative hypothesis. Both matter.
In the context you have set up below, the way I would go about addressing what I think is your question would be something like:

M0 <- lm(y ~ x2)     ## Null hypothesis with no x1
M1 <- lm(y ~ x1*x2)  ## outer hypothesis as below
anova(M0, M1)

: x1<-rnorm(50,9,2)
: x2<-as.factor(as.numeric(runif(50)<0.35))
: y<-rnorm(50,30,5)
:
: options(contrasts=c('contr.treatment','contr.poly'))
: summary(lm(y~x1*x2))
:
: options(contrasts=c('contr.helmert','contr.poly'))
: summary(lm(y~x1*x2))
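The nested-model comparison suggested above can be run end to end roughly as follows. This is only a sketch: the seed and the helper `compare()` are not part of the original posts and are introduced purely for illustration. It fits the two models under each contrast setting and checks that the F test for dropping x1 altogether (main effect plus interaction) comes out the same either way.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Simulated data as in the original post
x1 <- rnorm(50, 9, 2)
x2 <- factor(as.numeric(runif(50) < 0.35))
y  <- rnorm(50, 30, 5)

# Hypothetical helper: run the nested-model F test under a given
# contrast setting for unordered factors, then restore the old options.
compare <- function(contr) {
  old <- options(contrasts = c(contr, "contr.poly"))
  on.exit(options(old))
  M0 <- lm(y ~ x2)        # null model: no x1 at all
  M1 <- lm(y ~ x1 * x2)   # outer model: separate line per level of x2
  anova(M0, M1)
}

a_trt <- compare("contr.treatment")
a_hel <- compare("contr.helmert")

print(a_trt)
print(a_hel)
```

Both calls compare the same pair of model spaces, so the F statistic and its 2 numerator degrees of freedom should not depend on which contrasts are in force, unlike the single-coefficient t test for x1 in `summary()`.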
Steve Adams
2005-Apr-25 23:16 UTC
[R] RE: ANOVA with both discreet and continuous variable
If the treatment contrast is used, the p value for x1 tests whether the slope at the reference level of x2 is equal to 0 (think of the model y~x1*x2 as fitting 2 straight lines, one for the reference level of x2 and one for the other level). I am not quite sure what it tests when the helmert contrast is used; my guess is that it tests whether the slope at the mid-level (?) of x2 is equal to 0. Maybe other experts can comment on this.

Steve
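Steve's reading of the x1 coefficient can be checked numerically. The sketch below is an illustration only (the seed and the variable names `slopes`, `b_trt` and `b_hel` are assumptions, not from the thread). It fits a separate straight line within each level of x2 and compares those slopes with the x1 coefficient from y~x1*x2 under each contrast. For a two-level factor `contr.helmert` codes the levels as -1 and +1, so the x1 coefficient should be the unweighted average of the two slopes, which is one way to read the "mid-level" guess.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Simulated data as in the original post
x1 <- rnorm(50, 9, 2)
x2 <- factor(as.numeric(runif(50) < 0.35))
y  <- rnorm(50, 30, 5)

# Slope of y on x1 fitted separately within each level of x2
slopes <- unname(sapply(levels(x2), function(l) {
  coef(lm(y ~ x1, subset = x2 == l))["x1"]
}))

# x1 coefficient from the interaction model under each contrast,
# set per model through lm()'s contrasts argument
b_trt <- unname(coef(lm(y ~ x1 * x2,
                        contrasts = list(x2 = "contr.treatment")))["x1"])
b_hel <- unname(coef(lm(y ~ x1 * x2,
                        contrasts = list(x2 = "contr.helmert")))["x1"])

# Treatment: slope at the reference (first) level of x2
print(c(treatment = b_trt, reference_slope = slopes[1]))

# Helmert: unweighted mean of the two per-level slopes
print(c(helmert = b_hel, mean_slope = mean(slopes)))
```

So the two t tests for x1 address different null hypotheses: zero slope in the reference group under treatment contrasts, and zero average slope across the two groups under Helmert contrasts.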