Dear R list members,
I've turned up a strange discrepancy between results obtained from the lme 
function in the nlme package in R and results obtained with lme in S-PLUS. 
I'm using version 3.1-24 of nlme in R 1.4.1 under Windows 2000, and both 
S-PLUS 2000 and 6.0, again under Windows 2000.
I've noticed discrepancies in a couple of instances. Here's one, using
data
from Bryk and Raudenbush's Hierarchical Linear models:
 From R:
     > attach(Bryk)
     > cses <- meanses <- numeric(length(ses))  # initialize
     > for (sc in unique(school)) {
     +     meanses[school==sc] <- mean(ses[school==sc])
     +     cses[school==sc]<-ses[school==sc]- meanses[school==sc]
     +     }
     > Bryk$cses <- cses
     > Bryk$meanses <- meanses
     > Bryk[1:10,]
        school    ses mathach sector      cses  meanses
     1    1224  0.022   4.583 public  0.456383 -0.43438
     2    1224  0.332  20.349 public  0.766383 -0.43438
     3    1224  0.372   6.714 public  0.806383 -0.43438
     4    1224 -0.078  16.405 public  0.356383 -0.43438
     5    1224 -0.158  17.898 public  0.276383 -0.43438
     6    1224 -0.298  19.338 public  0.136383 -0.43438
     7    1224 -0.458  21.521 public -0.023617 -0.43438
     8    1224 -0.468   3.154 public -0.033617 -0.43438
     9    1224 -0.468  21.178 public -0.033617 -0.43438
     10   1224 -0.528  20.349 public -0.093617 -0.43438
     > remove(cses, meanses)
     > Bryk$sector <- factor(Bryk$sector, levels=c('public',
'Catholic'))
     > contrasts(Bryk$sector)
             Catholic
     public          0
     Catholic        1
     >
     > bryk.lme.1<-lme(mathach ~ meanses*cses + sector*cses,
     +     random = ~ cses | school,
     +     data=Bryk)
     > summary(bryk.lme.1)
     Linear mixed-effects model fit by REML
     Data: Bryk
         AIC   BIC logLik
     46525 46594 -23252
     Random effects:
     Formula: ~cses | school
     Structure: General positive-definite, Log-Cholesky parametrization
                 StdDev   Corr
     (Intercept) 1.541177 (Intr)
     cses        0.018174 0.006
     Residual    6.063492
     Fixed effects: mathach ~ meanses * cses + sector * cses
                           Value Std.Error   DF t-value p-value
     (Intercept)         12.1282   0.19920 7022  60.886  <.0001
     meanses              5.3367   0.36898  157  14.463  <.0001
     cses                 2.9421   0.15122 7022  19.456  <.0001
     sectorCatholic       1.2245   0.30611  157   4.000   1e-04
     meanses:cses         1.0444   0.29107 7022   3.588   3e-04
     cses:sectorCatholic -1.6421   0.23312 7022  -7.044  <.0001
     Correlation:
                         (Intr) meanss cses   sctrCt mnss:c
     meanses              0.256
     cses                 0.000  0.000
     sectorCatholic      -0.699 -0.356  0.000
     meanses:cses         0.000  0.000  0.295  0.000
     cses:sectorCatholic  0.000  0.000 -0.696  0.000 -0.351
     Standardized Within-Group Residuals:
         Min        Q1       Med        Q3       Max
     -3.170106 -0.724877  0.014892  0.754263  2.965498
     Number of Observations: 7185
     Number of Groups: 160
     >
 From S-PLUS 6.0 (identical results from S-PLUS 2000):
     > attach(Bryk)
     > cses <- meanses <- numeric(length(ses))  # initialize
     > for (sc in unique(school)) {
     +     meanses[school==sc] <- mean(ses[school==sc])
     +     cses[school==sc]<-ses[school==sc]- meanses[school==sc]
     +     }
     > Bryk$cses <- cses
     > Bryk$meanses <- meanses
     > Bryk[1:10,]
       school    ses mathach sector      cses  meanses
     1   1224  0.022   4.583 public  0.456383 -0.43438
     2   1224  0.332  20.349 public  0.766383 -0.43438
     3   1224  0.372   6.714 public  0.806383 -0.43438
     4   1224 -0.078  16.405 public  0.356383 -0.43438
     5   1224 -0.158  17.898 public  0.276383 -0.43438
     6   1224 -0.298  19.338 public  0.136383 -0.43438
     7   1224 -0.458  21.521 public -0.023617 -0.43438
     8   1224 -0.468   3.154 public -0.033617 -0.43438
     9   1224 -0.468  21.178 public -0.033617 -0.43438
     10   1224 -0.528  20.349 public -0.093617 -0.43438
     > remove(c('cses', 'meanses'))
     > options(contrasts=c('contr.treatment', 'contr.poly'))
     >
     > Bryk$sector <- factor(Bryk$sector, levels=c('public',
'Catholic'))
     > contrasts(Bryk$sector)
             Catholic
     public        0
     Catholic      1
     >
     > bryk.lme.1<-lme(mathach ~ meanses*cses + sector*cses,
     +     random = ~ cses | school,
     +     data=Bryk)
     > summary(bryk.lme.1)
     Linear mixed-effects model fit by REML
     Data: Bryk
         AIC   BIC logLik
     46524 46592 -23252
     Random effects:
     Formula:  ~ cses | school
     Structure: General positive-definite
                 StdDev   Corr
     (Intercept) 1.54278 (Inter
         cses 0.32004 0.395
     Residual 6.05976
     Fixed effects: mathach ~ meanses * cses + sector * cses
                   Value Std.Error   DF t-value p-value
     (Intercept)  12.128   0.19931 7022  60.851  <.0001
         meanses   5.333   0.36920  157  14.444  <.0001
            cses   2.945   0.15565 7022  18.921  <.0001
          sector   1.227   0.30630  157   4.005  0.0001
     meanses:cses  1.039   0.29899 7022   3.476  0.0005
     sector:cses  -1.643   0.23986 7022  -6.849  <.0001
     Correlation:
                 (Intr) meanss   cses sector mnss:c
          meanses  0.256
             cses  0.076  0.020
           sector -0.699 -0.356 -0.053
     meanses:cses  0.019  0.076  0.293 -0.027
      sector:cses -0.053 -0.028 -0.696  0.078 -0.351
     Standardized Within-Group Residuals:
         Min       Q1     Med      Q3    Max
     -3.1591 -0.72314 0.01723 0.75461 2.9581
     Number of Observations: 7185
     Number of Groups: 160
Notice that the results are very close, except for the standard deviation 
of the random cses effect and the correlation between the random effects 
for cses and the intercept. Published results and SAS PROC MIXED agree with 
the S-PLUS output.
I don't think that I've done anything wrong in setting up the problem in
R.
In any event, any suggestions would be greatly appreciated. I could make 
the data available, if that would help.
Thanks,
  John
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
John, I'm travelling today and won't be able to check this discrepancy. I will try to look at it tomorrow. A couple of things to consider: - the underlying optimization software is different in S-PLUS and R. It may be that the REML estimates are poorly defined and convergence is being declared in different places by different software. Have you checked the intervals on those variance components using intervals(bryk.lme.1) - I believe the data are already in the nlme package as the MathAchieve and MathAchSchool data sets. - In Bryk's analysis the meanses is calculated incorrectly. - Some analysis of these data is given in the slides at http://www.stat.wisc.edu/~st850-1/MEcourse6_4up.pdf - It is much easier (and faster) to calculate the meanses gsummary. I think it would be gsummary(MathAchieve, which = "ses") -- Douglas Bates bates at stat.wisc.edu Statistics Department 608/262-2598 University of Wisconsin - Madison http://www.stat.wisc.edu/~bates/ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Dear Doug, First, thanks for getting back to me. At 08:18 AM 5/2/2002 -0500, Douglas Bates wrote:>John, > >I'm travelling today and won't be able to check this discrepancy. I >will try to look at it tomorrow.There's no rush.>A couple of things to consider: > > - the underlying optimization software is different in S-PLUS and R. > It may be that the REML estimates are poorly defined and > convergence is being declared in different places by different software. > Have you checked the intervals on those variance components using > intervals(bryk.lme.1)It occurred to me that the problem may be ill-conditioned, but I didn't check the confidence intervals for the variance components. I did look at the log-likelihoods, though. To more decimal places than in my previous note, they are: R: -23252.3919 S-PLUS: -23253.2177 Getting the confidence intervals on the variance components produces: R: > intervals(bryk.lme.1) Approximate 95% confidence intervals Fixed effects: lower est. upper (Intercept) 11.7377216 12.128207 12.518692 meanses 4.6078631 5.336665 6.065467 cses 2.6457000 2.942145 3.238589 sectorCatholic 0.6198986 1.224531 1.829164 meanses:cses 0.4738118 1.044406 1.615000 cses:sectorCatholic -2.0991261 -1.642148 -1.185170 Random Effects: Level: school lower est. upper sd((Intercept)) 1.321746e+00 1.541176850 1.797037e+00 sd(cses) 6.686558e-09 0.018173637 4.939478e+04 cor((Intercept),cses) -8.630434e-02 0.005809486 9.782482e-02 Within-group standard error: lower est. upper 5.964008 6.063492 6.164636 S-PLUS > intervals(bryk.lme.1) Problem in intervals.lme(bryk.lme.1): Cannot get confidence inter vals on var-cov components: Non-positive definite approximate var iance-covariance Use traceback() to see the call stack Obviously, there's a problem here, as you suspected: the interval for sd(cses) in the R solution is very broad, and the covariance matrix in the S-PLUS solution isn't positive definite. (Does the S-PLUS version of the software use the log-Cholesky parametrization of the covariance matrix for the random effects?)> - I believe the data are already in the nlme package as the > MathAchieve and MathAchSchool data sets.Yes, they are -- I didn't know that. Thanks.> - In Bryk's analysis the meanses is calculated incorrectly.I was aware of this -- it's why I recalculated the means rather than using Bryk and Raudenbush's -- but the results don't change much.> - Some analysis of these data is given in the slides at > http://www.stat.wisc.edu/~st850-1/MEcourse6_4up.pdfAgain, thanks. I'll take a look. I'm working up the example for use in my appendix on mixed models, a draft of which is at <http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix-mixed-models.pdf>.> - It is much easier (and faster) to calculate the meanses gsummary. > I think it would be gsummary(MathAchieve, which = "ses")The versions of gsummary that I have don't seem to take a which argument, but tapply works (and is faster than my previous code, though in a data set this size both are essentially instantaneous): > mses <- tapply(ses, school, mean) > meanses <- mses[as.character(school)] I expect that the simple answer to my basic question is that the problem is ill-conditioned. Interesting, since Bryk and Raudenbush's data have been used a lot. Thanks for the suggestions and for your help. John ----------------------------------------------------- John Fox Department of Sociology McMaster University Hamilton, Ontario, Canada L8S 4M4 email: jfox at mcmaster.ca phone: 905-525-9140x23604 web: www.socsci.mcmaster.ca/jfox ----------------------------------------------------- -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._