Harry Athanassiou
2004-Nov-24  17:01 UTC
[R] what does order() stand for in an lme formula?
I'm a beginner in R, and trying to fit linear models with different
intercepts per group, of the type y ~ A*x1 + B, where x1 is a numerical
variable. I cannot understand whether I should use
            y1 ~ x1 +1
or
            y1 ~ order(x1) + 1
Although in the toy example included it makes a small difference, in models
with many groups the models without order() converge slower if at all!
 
Please help
---------------
R script : START
---------------
##
# what does order() do in an lme formula?
##
 
# prep data
y1 <- c(rnorm(25, 35, sd=5), rnorm(17, 55, sd=4)) # this is a line paralle
to x-axis (slope=0) with noise
x1 <- c(sample(1:25, 25, replace=F), sample(1:17, 17, replace=F)) # scramble
the x, so that they do not appear in-oorder
f1 <- c(rep("A",25), rep("B", 17))
dat1 <- data.frame(y1, x1, f1)
x1 <- NULL
y1 <- NULL
f1 <- NULL
 
# load libraries
require(nlme)
require(gmodels) # for the ci()
 
# fit model with and w/o order()
dat1.lm.1  <- lm(y1 ~ x1 + f1 -1, data=dat1)
dat1.lm.2  <- lm(y1 ~ order(x1) + f1 -1, data=dat1)
#
#    using lme, and assigning f1 to the random effects; this is different
than in lm(), but in my larger models
#        f1 is a repeated experiment vs a fixed factor
dat1.lme.1 <- lme(y1 ~ x1 + 1, random= ~  1 | f1, data=dat1,
method="ML")
dat1.lme.2  <- update(dat1.lme.1, fixed=y1 ~ order(x1) + 1, random= ~ 1 |
f1)
# compare
summary(dat1.lm.1)
summary(dat1.lm.2)
ci(dat1.lm.1)
ci(dat1.lm.2)
#
summary(dat1.lme.1)
summary(dat1.lme.2)
ci(dat1.lme.1)
ci(dat1.lme.2)
 
---------------
R script : END
---------------
 
 
 
---------------
R session: START
---------------> # compare
> summary(dat1.lm.1)
 
Call:
lm(formula = y1 ~ x1 + f1 - 1, data = dat1)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-7.1774 -2.9020 -0.1616  2.3576 10.0103 
 
Coefficients:
    Estimate Std. Error t value Pr(>|t|)    
x1  -0.06173    0.09318  -0.663    0.512    
f1A 35.25704    1.43540  24.563   <2e-16 ***
f1B 54.98193    1.25519  43.804   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` '
1
 
Residual standard error: 3.851 on 39 degrees of freedom
Multiple R-Squared: 0.9928,     Adjusted R-squared: 0.9923 
F-statistic:  1799 on 3 and 39 DF,  p-value: < 2.2e-16 
 > summary(dat1.lm.2)
 
Call:
lm(formula = y1 ~ order(x1) + f1 - 1, data = dat1)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-7.0089 -3.0955  0.1829  2.3387 10.0083 
 
Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
order(x1) -0.002098   0.049830  -0.042    0.967    
f1A       34.502668   1.381577  24.973   <2e-16 ***
f1B       54.466926   1.346118  40.462   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` '
1
 
Residual standard error: 3.872 on 39 degrees of freedom
Multiple R-Squared: 0.9927,     Adjusted R-squared: 0.9922 
F-statistic:  1779 on 3 and 39 DF,  p-value: < 2.2e-16 
 > ci(dat1.lm.1)
       Estimate   CI lower   CI upper Std. Error      p-value
x1  -0.06173363 -0.2502001  0.1267329 0.09317612 5.115170e-01
f1A 35.25703803 32.3536752 38.1604009 1.43539619 2.461184e-25
f1B 54.98192838 52.4430771 57.5207797 1.25518501
8.793394e-35> ci(dat1.lm.2)
              Estimate   CI lower    CI upper Std. Error      p-value
order(x1) -0.002097868 -0.1028889  0.09869315 0.04983016 9.666335e-01
f1A       34.502667919 31.7081648 37.29717105 1.38157694 1.338416e-25
f1B       54.466925648 51.7441455 57.18970575 1.34611772
1.819408e-33> #
> summary(dat1.lme.1)
Linear mixed-effects model fit by maximum likelihood
 Data: dat1 
       AIC      BIC    logLik
  249.2458 256.1965 -120.6229
 
Random effects:
 Formula: ~1 | f1
        (Intercept) Residual
StdDev:    9.819254 3.802406
 
Fixed effects: y1 ~ x1 + 1 
               Value Std.Error DF   t-value p-value
(Intercept) 45.14347  7.215924 39  6.256090  0.0000
x1          -0.06517  0.094245 39 -0.691489  0.4934
 Correlation: 
   (Intr)
x1 -0.144
 
Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.90575027 -0.76278244 -0.02205757  0.60302788  2.61718120 
 
Number of Observations: 42
Number of Groups: 2 > summary(dat1.lme.2)
Linear mixed-effects model fit by maximum likelihood
 Data: dat1 
      AIC      BIC    logLik
  249.741 256.6917 -120.8705
 
Random effects:
 Formula: ~1 | f1
        (Intercept) Residual
StdDev:    9.944285 3.823607
 
Fixed effects: y1 ~ order(x1) 
               Value Std.Error DF   t-value p-value
(Intercept) 44.48952  7.309834 39  6.086256  0.0000
order(x1)   -0.00297  0.050415 39 -0.058968  0.9533
 Correlation: 
          (Intr)
order(x1) -0.146
 
Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.85021774 -0.81922106  0.03922119  0.61748886  2.60194590 
 
Number of Observations: 42
Number of Groups: 2 > ci(dat1.lme.1)
               Estimate   CI lower   CI upper Std. Error DF      p-value
(Intercept) 45.14346803 30.5478836 59.7390525  7.2159242 39 2.283608e-07
x1          -0.06516927 -0.2557976  0.1254590  0.0942449 39
4.933546e-01> ci(dat1.lme.2)
                Estimate   CI lower   CI upper Std. Error DF      p-value
(Intercept) 44.489521212 29.7039857 59.2750567 7.30983431 39 3.930232e-07
order(x1)   -0.002972836 -0.1049461  0.0990004 0.05041464 39
9.532789e-01> 
 
---------------
R session : END
---------------
     
 
Harry Athanassiou
BioInformatics manager
Automated Cell, Inc
 
	[[alternative HTML version deleted]]
"Harry Athanassiou" <hathanassiou at automatedcell.com> writes:> I'm a beginner in R, and trying to fit linear models with different > intercepts per group, of the type y ~ A*x1 + B, where x1 is a numerical > variable. I cannot understand whether I should use > y1 ~ x1 +1 > or > y1 ~ order(x1) + 1 > Although in the toy example included it makes a small difference, in models > with many groups the models without order() converge slower if at all!Er? What gave you the idea of using order in the first place? To the best of my knowledge, order(x) is also in this context just a function, which for the nth observation returns the position of the nth largest observation in x. This is not likely to make sense as a predictor in a model. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Reasonably Related Threads
- new to latex to pdf
- [PATCH 09/23] nv50-: separate vertex formats from surface format descriptions
- [PATCH 09/23] nv50-: separate vertex formats from surface format descriptions
- [LLVMdev] Questions on Memory Optimizations
- question about technieque do with large computation