Harry Athanassiou
2004-Nov-24 17:01 UTC
[R] what does order() stand for in an lme formula?
I'm a beginner in R, trying to fit linear models with a different intercept per
group, of the type y ~ A*x1 + B, where x1 is a numerical variable. I cannot
work out whether I should use

   y1 ~ x1 + 1
or
   y1 ~ order(x1) + 1

Although in the toy example included it makes only a small difference, in
models with many groups the models without order() converge more slowly, if
at all! Please help.

--------------- R script : START ---------------
##
# what does order() do in an lme formula?
##

# prep data
y1 <- c(rnorm(25, 35, sd=5), rnorm(17, 55, sd=4))  # a line parallel to the x-axis (slope=0) with noise
x1 <- c(sample(1:25, 25, replace=F), sample(1:17, 17, replace=F))  # scramble the x, so that they do not appear in order
f1 <- c(rep("A", 25), rep("B", 17))
dat1 <- data.frame(y1, x1, f1)
# drop the loose vectors now that they live inside dat1
x1 <- NULL
y1 <- NULL
f1 <- NULL

# load libraries
require(nlme)
require(gmodels)   # for ci()

# fit model with and w/o order()
dat1.lm.1 <- lm(y1 ~ x1 + f1 - 1, data=dat1)
dat1.lm.2 <- lm(y1 ~ order(x1) + f1 - 1, data=dat1)

# using lme, and assigning f1 to the random effects; this differs from the lm() fits,
# but in my larger models f1 is a repeated experiment rather than a fixed factor
dat1.lme.1 <- lme(y1 ~ x1 + 1, random= ~ 1 | f1, data=dat1, method="ML")
dat1.lme.2 <- update(dat1.lme.1, fixed=y1 ~ order(x1) + 1, random= ~ 1 | f1)

# compare
summary(dat1.lm.1)
summary(dat1.lm.2)
ci(dat1.lm.1)
ci(dat1.lm.2)
#
summary(dat1.lme.1)
summary(dat1.lme.2)
ci(dat1.lme.1)
ci(dat1.lme.2)
--------------- R script : END ---------------

--------------- R session : START ---------------
> # compare
> summary(dat1.lm.1)

Call:
lm(formula = y1 ~ x1 + f1 - 1, data = dat1)

Residuals:
    Min      1Q  Median      3Q     Max
-7.1774 -2.9020 -0.1616  2.3576 10.0103

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
x1  -0.06173    0.09318  -0.663    0.512
f1A 35.25704    1.43540  24.563   <2e-16 ***
f1B 54.98193    1.25519  43.804   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 3.851 on 39 degrees of freedom
Multiple R-Squared: 0.9928,     Adjusted R-squared: 0.9923
F-statistic:  1799 on 3 and 39 DF,  p-value: < 2.2e-16

> summary(dat1.lm.2)

Call:
lm(formula = y1 ~ order(x1) + f1 - 1, data = dat1)

Residuals:
    Min      1Q  Median      3Q     Max
-7.0089 -3.0955  0.1829  2.3387 10.0083

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
order(x1) -0.002098   0.049830  -0.042    0.967
f1A       34.502668   1.381577  24.973   <2e-16 ***
f1B       54.466926   1.346118  40.462   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 3.872 on 39 degrees of freedom
Multiple R-Squared: 0.9927,     Adjusted R-squared: 0.9922
F-statistic:  1779 on 3 and 39 DF,  p-value: < 2.2e-16

> ci(dat1.lm.1)
       Estimate   CI lower   CI upper Std. Error      p-value
x1  -0.06173363 -0.2502001  0.1267329 0.09317612 5.115170e-01
f1A 35.25703803 32.3536752 38.1604009 1.43539619 2.461184e-25
f1B 54.98192838 52.4430771 57.5207797 1.25518501 8.793394e-35

> ci(dat1.lm.2)
              Estimate   CI lower    CI upper Std. Error      p-value
order(x1) -0.002097868 -0.1028889  0.09869315 0.04983016 9.666335e-01
f1A       34.502667919 31.7081648 37.29717105 1.38157694 1.338416e-25
f1B       54.466925648 51.7441455 57.18970575 1.34611772 1.819408e-33

> #
> summary(dat1.lme.1)
Linear mixed-effects model fit by maximum likelihood
 Data: dat1
       AIC      BIC    logLik
  249.2458 256.1965 -120.6229

Random effects:
 Formula: ~1 | f1
        (Intercept) Residual
StdDev:    9.819254 3.802406

Fixed effects: y1 ~ x1 + 1
               Value Std.Error DF   t-value p-value
(Intercept) 45.14347  7.215924 39  6.256090  0.0000
x1          -0.06517  0.094245 39 -0.691489  0.4934
 Correlation:
   (Intr)
x1 -0.144

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-1.90575027 -0.76278244 -0.02205757  0.60302788  2.61718120

Number of Observations: 42
Number of Groups: 2

> summary(dat1.lme.2)
Linear mixed-effects model fit by maximum likelihood
 Data: dat1
      AIC      BIC    logLik
  249.741 256.6917 -120.8705

Random effects:
 Formula: ~1 | f1
        (Intercept) Residual
StdDev:    9.944285 3.823607

Fixed effects: y1 ~ order(x1)
               Value Std.Error DF   t-value p-value
(Intercept) 44.48952  7.309834 39  6.086256  0.0000
order(x1)   -0.00297  0.050415 39 -0.058968  0.9533
 Correlation:
          (Intr)
order(x1) -0.146

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-1.85021774 -0.81922106  0.03922119  0.61748886  2.60194590

Number of Observations: 42
Number of Groups: 2

> ci(dat1.lme.1)
               Estimate   CI lower   CI upper Std. Error DF      p-value
(Intercept) 45.14346803 30.5478836 59.7390525  7.2159242 39 2.283608e-07
x1          -0.06516927 -0.2557976  0.1254590  0.0942449 39 4.933546e-01

> ci(dat1.lme.2)
                Estimate   CI lower   CI upper Std. Error DF      p-value
(Intercept) 44.489521212 29.7039857 59.2750567 7.30983431 39 3.930232e-07
order(x1)   -0.002972836 -0.1049461  0.0990004 0.05041464 39 9.532789e-01
>
--------------- R session : END ---------------

Harry Athanassiou
BioInformatics Manager
Automated Cell, Inc
"Harry Athanassiou" <hathanassiou at automatedcell.com> writes:> I'm a beginner in R, and trying to fit linear models with different > intercepts per group, of the type y ~ A*x1 + B, where x1 is a numerical > variable. I cannot understand whether I should use > y1 ~ x1 +1 > or > y1 ~ order(x1) + 1 > Although in the toy example included it makes a small difference, in models > with many groups the models without order() converge slower if at all!Er? What gave you the idea of using order in the first place? To the best of my knowledge, order(x) is also in this context just a function, which for the nth observation returns the position of the nth largest observation in x. This is not likely to make sense as a predictor in a model. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907