Harry Athanassiou
2004-Nov-24 17:01 UTC
[R] what does order() stand for in an lme formula?
I'm a beginner in R, and I am trying to fit linear models with a different
intercept per group, of the type y = A*x1 + B, where x1 is a numeric
variable. I cannot work out whether I should use
y1 ~ x1 + 1
or
y1 ~ order(x1) + 1
Although it makes only a small difference in the toy example included here,
in models with many groups the fits without order() converge slowly, if at all!
Please help.
---------------
R script : START
---------------
##
# what does order() do in an lme formula?
##
# prep data
y1 <- c(rnorm(25, 35, sd=5), rnorm(17, 55, sd=4)) # two lines parallel to the x-axis (slope=0) with noise
x1 <- c(sample(1:25, 25, replace=F), sample(1:17, 17, replace=F)) # scramble the x, so that they do not appear in order
f1 <- c(rep("A",25), rep("B", 17))
dat1 <- data.frame(y1, x1, f1)
rm(x1, y1, f1) # remove the free-standing vectors so that only the columns of dat1 are used
# load libraries
require(nlme)
require(gmodels) # for the ci()
# fit model with and w/o order()
dat1.lm.1 <- lm(y1 ~ x1 + f1 -1, data=dat1)
dat1.lm.2 <- lm(y1 ~ order(x1) + f1 -1, data=dat1)
#
# using lme, and assigning f1 to the random effects; this differs from lm(),
# but in my larger models f1 is a repeated experiment rather than a fixed factor
dat1.lme.1 <- lme(y1 ~ x1 + 1, random= ~ 1 | f1, data=dat1, method="ML")
dat1.lme.2 <- update(dat1.lme.1, fixed=y1 ~ order(x1) + 1, random= ~ 1 | f1)
# compare
summary(dat1.lm.1)
summary(dat1.lm.2)
ci(dat1.lm.1)
ci(dat1.lm.2)
#
summary(dat1.lme.1)
summary(dat1.lme.2)
ci(dat1.lme.1)
ci(dat1.lme.2)
---------------
R script : END
---------------
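A minimal follow-up sketch, assuming the dat1.lme.1 and dat1.lme.2 objects
fitted in the script above: since both were fit with method="ML", anova()
tabulates their AIC, BIC and log-likelihood side by side (the likelihood-ratio
test it also prints is not meaningful here, because x1 and order(x1) are not
nested terms).

anova(dat1.lme.1, dat1.lme.2)  # AIC/BIC/logLik of the x1 fit vs the order(x1) fit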
---------------
R session: START
---------------
> # compare
> summary(dat1.lm.1)
Call:
lm(formula = y1 ~ x1 + f1 - 1, data = dat1)
Residuals:
Min 1Q Median 3Q Max
-7.1774 -2.9020 -0.1616 2.3576 10.0103
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1 -0.06173 0.09318 -0.663 0.512
f1A 35.25704 1.43540 24.563 <2e-16 ***
f1B 54.98193 1.25519 43.804 <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 3.851 on 39 degrees of freedom
Multiple R-Squared: 0.9928, Adjusted R-squared: 0.9923
F-statistic: 1799 on 3 and 39 DF, p-value: < 2.2e-16
> summary(dat1.lm.2)
Call:
lm(formula = y1 ~ order(x1) + f1 - 1, data = dat1)
Residuals:
Min 1Q Median 3Q Max
-7.0089 -3.0955 0.1829 2.3387 10.0083
Coefficients:
Estimate Std. Error t value Pr(>|t|)
order(x1) -0.002098 0.049830 -0.042 0.967
f1A 34.502668 1.381577 24.973 <2e-16 ***
f1B 54.466926 1.346118 40.462 <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 3.872 on 39 degrees of freedom
Multiple R-Squared: 0.9927, Adjusted R-squared: 0.9922
F-statistic: 1779 on 3 and 39 DF, p-value: < 2.2e-16
> ci(dat1.lm.1)
Estimate CI lower CI upper Std. Error p-value
x1 -0.06173363 -0.2502001 0.1267329 0.09317612 5.115170e-01
f1A 35.25703803 32.3536752 38.1604009 1.43539619 2.461184e-25
f1B 54.98192838 52.4430771 57.5207797 1.25518501 8.793394e-35
> ci(dat1.lm.2)
Estimate CI lower CI upper Std. Error p-value
order(x1) -0.002097868 -0.1028889 0.09869315 0.04983016 9.666335e-01
f1A 34.502667919 31.7081648 37.29717105 1.38157694 1.338416e-25
f1B 54.466925648 51.7441455 57.18970575 1.34611772 1.819408e-33
> #
> summary(dat1.lme.1)
Linear mixed-effects model fit by maximum likelihood
Data: dat1
AIC BIC logLik
249.2458 256.1965 -120.6229
Random effects:
Formula: ~1 | f1
(Intercept) Residual
StdDev: 9.819254 3.802406
Fixed effects: y1 ~ x1 + 1
Value Std.Error DF t-value p-value
(Intercept) 45.14347 7.215924 39 6.256090 0.0000
x1 -0.06517 0.094245 39 -0.691489 0.4934
Correlation:
(Intr)
x1 -0.144
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.90575027 -0.76278244 -0.02205757 0.60302788 2.61718120
Number of Observations: 42
Number of Groups: 2
> summary(dat1.lme.2)
Linear mixed-effects model fit by maximum likelihood
Data: dat1
AIC BIC logLik
249.741 256.6917 -120.8705
Random effects:
Formula: ~1 | f1
(Intercept) Residual
StdDev: 9.944285 3.823607
Fixed effects: y1 ~ order(x1)
Value Std.Error DF t-value p-value
(Intercept) 44.48952 7.309834 39 6.086256 0.0000
order(x1) -0.00297 0.050415 39 -0.058968 0.9533
Correlation:
(Intr)
order(x1) -0.146
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.85021774 -0.81922106 0.03922119 0.61748886 2.60194590
Number of Observations: 42
Number of Groups: 2
> ci(dat1.lme.1)
Estimate CI lower CI upper Std. Error DF p-value
(Intercept) 45.14346803 30.5478836 59.7390525 7.2159242 39 2.283608e-07
x1 -0.06516927 -0.2557976 0.1254590 0.0942449 39 4.933546e-01
> ci(dat1.lme.2)
Estimate CI lower CI upper Std. Error DF p-value
(Intercept) 44.489521212 29.7039857 59.2750567 7.30983431 39 3.930232e-07
order(x1) -0.002972836 -0.1049461 0.0990004 0.05041464 39 9.532789e-01
>
---------------
R session : END
---------------
Harry Athanassiou
BioInformatics manager
Automated Cell, Inc
"Harry Athanassiou" <hathanassiou at automatedcell.com> writes:> I'm a beginner in R, and trying to fit linear models with different > intercepts per group, of the type y ~ A*x1 + B, where x1 is a numerical > variable. I cannot understand whether I should use > y1 ~ x1 +1 > or > y1 ~ order(x1) + 1 > Although in the toy example included it makes a small difference, in models > with many groups the models without order() converge slower if at all!Er? What gave you the idea of using order in the first place? To the best of my knowledge, order(x) is also in this context just a function, which for the nth observation returns the position of the nth largest observation in x. This is not likely to make sense as a predictor in a model. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907