Agustin Lobo
2013-Nov-08 08:35 UTC
[R] Different output from lm() and lmPerm lmp() if categorical variables are included in the analysis
I've found a problem when using categorical variables in lmp() from package lmPerm According to help(lmp): "This function will behave identically to lm() if the following parameters are set: perm="", seq=TRUE, center=FALSE.") But not in the case of including categorical variables: require(lmPerm) set.seed(42) testx1 <- rnorm(100,10,5) testx2 <- c(rep("a",50),rep("b",50)) testy <- 5*testx1 + 3 + runif(100,-20,20) test <- data.frame(x1=testx1,x2testx2,y=testy) atest <- lm(y ~ x1*x2,data=test) aptest <- lmp(y ~ x1*x2,data=test,perm = "", seqs = TRUE, center = FALSE) summary(atest) Call: lm(formula = y ~ x1 * x2, data = test) Residuals: Min 1Q Median 3Q Max -17.1777 -9.5306 -0.9733 7.6840 22.2728 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -2.0036 3.2488 -0.617 0.539 x1 5.3346 0.2861 18.646 <2e-16 *** x2b 2.4952 5.2160 0.478 0.633 x1:x2b -0.3833 0.4568 -0.839 0.404 summary(aptest) Call: lmp(formula = y ~ x1 * x2, data = test, perm = "", seqs = TRUE, center = FALSE) Residuals: Min 1Q Median 3Q Max -17.1777 -9.5306 -0.9733 7.6840 22.2728 Coefficients: Estimate Std. Error t value Pr(>|t|) x1 5.1429 0.2284 22.516 <2e-16 *** x21 -1.2476 2.6080 -0.478 0.633 x1:x21 0.1917 0.2284 0.839 0.404 It looks like lmp() is internally coding dummy variables in a different way, so lmp results are for "a" (named "1" by lmp) while lm results are for "b" ? Agus
Rolf Turner
2013-Nov-13 08:27 UTC
[R] Different output from lm() and lmPerm lmp() if categorical variables are included in the analysis
RTFM!!! :-) The help explicitly says "The default contrasts are set internally to (contr.sum, contr.poly) ....". Set options(contrasts=c("contr.sum","contr.poly")) before your call to lm() and "atest" will agree with "aptest" all down the line. cheers, Rolf Turner On 11/08/13 21:35, Agustin Lobo wrote:> I've found a problem when using > categorical variables in lmp() from package lmPerm > > According to help(lmp): "This function will behave identically to lm() > if the following parameters are set: perm="", seq=TRUE, > center=FALSE.") > But not in the case of including categorical variables: > > require(lmPerm) > set.seed(42) > testx1 <- rnorm(100,10,5) > testx2 <- c(rep("a",50),rep("b",50)) > testy <- 5*testx1 + 3 + runif(100,-20,20) > test <- data.frame(x1=testx1,x2> testx2,y=testy) > atest <- lm(y ~ x1*x2,data=test) > aptest <- lmp(y ~ x1*x2,data=test,perm = "", seqs = TRUE, center = FALSE) > summary(atest) > > Call: > lm(formula = y ~ x1 * x2, data = test) > Residuals: > Min 1Q Median 3Q Max > -17.1777 -9.5306 -0.9733 7.6840 22.2728 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) -2.0036 3.2488 -0.617 0.539 > x1 5.3346 0.2861 18.646 <2e-16 *** > x2b 2.4952 5.2160 0.478 0.633 > x1:x2b -0.3833 0.4568 -0.839 0.404 > > summary(aptest) > > Call: > lmp(formula = y ~ x1 * x2, data = test, perm = "", seqs = TRUE, > center = FALSE) > > Residuals: > Min 1Q Median 3Q Max > -17.1777 -9.5306 -0.9733 7.6840 22.2728 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > x1 5.1429 0.2284 22.516 <2e-16 *** > x21 -1.2476 2.6080 -0.478 0.633 > x1:x21 0.1917 0.2284 0.839 0.404 > > It looks like lmp() is internally coding dummy variables in a different way, so > lmp results are for "a" (named "1" by lmp) while lm results are for > "b" ?