Keith Wong
2004-Jul-04 07:21 UTC
[R] Random intercept model with time-dependent covariates, results different from SAS
Dear list-members I am new to R and a statistics beginner. I really like the ease with which I can extract and manipulate data in R, and would like to use it primarily. I've been learning by checking analyses that have already been run in SAS. In an experiment with Y being a response variable, and group a 2-level between-subject factor, and time a 5-level within-subject factor. 2 time-dependent covariates are also measured (continuous variables W and Z). The subject id variable (ID) is unique to each subject, and is not duplicated across groups. I tried to fit a random intercept model in R (after setting options(contrasts c(factor = "contr.SAS", ordered = "contr.poly")) as recommended on this list), and making the time, group and id variables factors:> g2 = lme(Y ~ time + group + time:group + W + Z, random = ~ 1 | id, data datamod)> anova(g2)numDF denDF F-value p-value (Intercept) 1 42 5.54545 0.0233 time 4 42 16.41069 <.0001 group 1 11 0.83186 0.3813 W 1 42 0.07555 0.7848 Z 1 42 45.23577 <.0001 time:group 4 42 3.04313 0.0273 I compared the results using SAS proc mixed: proc mixed data = datamod; class id time group; model Y = time group time*group W Z /s; random int / sub = id; run; And get the following anova table for the fixed effects: Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F time 4 42 2.55 0.0534 group 1 42 0.54 0.4664 time*group 4 42 3.04 0.0273 W 1 42 8.80 0.0050 Z 1 42 32.52 <.0001 I am perplexed to see that the test for the main effect "time" is quite different. Both models seem to be specified equivalently to me, am I doing something wrong - particularly with the inclusion of the time-dependent covariates W and Z? I have looked at the data in both programmes and they are the same. There are no missing observations. I a simpler model without the time-dependent covariates, and in this case the results are similar: [R]> g1 = lme(Y ~ time + group + time:group, random = ~ 1 | id, data = datamod) > anova(g1)numDF denDF F-value p-value (Intercept) 1 44 3.387117 0.0725 time 4 44 10.620547 <.0001 group 1 11 0.508092 0.4908 time:group 4 44 3.961726 0.0079 [SAS] proc mixed data = datamod; class id time group; model Y = time group time*group /s; random int / sub = id; run; Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F time 4 44 7.75 <.0001 group 1 44 0.51 0.4797 time*group 4 44 3.96 0.0079 Secondly, is there no R equivalent of the "LSMEANS" statement in SAS? Is there a work-around? I would very much appreciate assistance. Thanks. Keith
Prof Brian Ripley
2004-Jul-04 08:13 UTC
[R] Random intercept model with time-dependent covariates, results different from SAS
Looking at the significance of a main effect (group) in the presence of an interaction (time:group) is hard to interpret, and in your case is I think not even interesting. (The `main effect' probably represents difference in intercept for the time effect, that is the group difference at the last time. But see the next para.) Note that the two systems are returning different denominator dfs. At this point you have not told us enough. My guess is that you have complete balance with the same number of subjects in each group. In that case the `group' effect is in the between-subjects stratum (as defined for the use of Error in aov, which you could also do), and thus R's 11 df would be right (rather than 44, without W and Z). Without balance Type III tests get much harder to interpret and the `group' effect would appear in two strata and there is no simple F test in the classical theory. So further guessing, SAS may have failed to detect balance and so used the wrong test. The time-dependent covariates muddy the issue more, and I looked mainly at the analyses without them. Again, a crucial fact is not here: do the covariates depend on the subjects as well? The good news is that the results _are_ similar. You do have different time behaviour in the two groups. So stop worrying about tests of uninteresting hypotheses and concentrate of summarizing that difference. On Sun, 4 Jul 2004, Keith Wong wrote:> Dear list-members > > I am new to R and a statistics beginner. I really like the ease with which I can > extract and manipulate data in R, and would like to use it primarily. I've > been learning by checking analyses that have already been run in SAS.That presumes SAS gets them `right'. A body of experienced statisticians thinks that `Type III' tests are somewhere between `often misleading' and `evil'. It certainly presents tests that are uninteresting without you asking for them. It reminds me of the days of yore when we taught using Minitab and had to explain that the Durbin-Watson test was not relevant to almost all regressions the students would see, yet appeared on every piece of output. The standard output from the lme fit (which you have not shown us) would be more revealing. I don't think I have ever used anova() on a single lme fit (I would look at the estimates and use anova on two fits, which is full (RE)ML likelihood ratio not a Wald approximation). May I suggest a better way to learn is to understand published examples using lme, in Pinheiro & Bates (2000) or MASS4 for example.> In an experiment with Y being a response variable, and group a 2-level > between-subject factor, and time a 5-level within-subject factor. 2 > time-dependent covariates are also measured (continuous variables W and > Z). The subject id variable (ID) is unique to each subject, and is not > duplicated across groups. > > I tried to fit a random intercept model in R (after setting options(contrasts > c(factor = "contr.SAS", ordered = "contr.poly")) as recommended on this list), > and making the time, group and id variables factors: > > > g2 = lme(Y ~ time + group + time:group + W + Z, random = ~ 1 | id, data > datamod) > > > anova(g2) > numDF denDF F-value p-value > (Intercept) 1 42 5.54545 0.0233 > time 4 42 16.41069 <.0001 > group 1 11 0.83186 0.3813 > W 1 42 0.07555 0.7848 > Z 1 42 45.23577 <.0001 > time:group 4 42 3.04313 0.0273 > > > I compared the results using SAS proc mixed: > proc mixed data = datamod; > class id time group; > model Y = time group time*group W Z /s; > random int / sub = id; > run; > > And get the following anova table for the fixed effects: > > Type 3 Tests of Fixed Effects > > Num Den > Effect DF DF F Value Pr > F > > time 4 42 2.55 0.0534 > group 1 42 0.54 0.4664 > time*group 4 42 3.04 0.0273 > W 1 42 8.80 0.0050 > Z 1 42 32.52 <.0001 > > I am perplexed to see that the test for the main effect "time" is quite > different. Both models seem to be specified equivalently to me, am I doing > something wrong - particularly with the inclusion of the time-dependent > covariates W and Z? I have looked at the data in both programmes and they are > the same. There are no missing observations. > > I a simpler model without the time-dependent covariates, and in this case the > results are similar: > [R] > > g1 = lme(Y ~ time + group + time:group, random = ~ 1 | id, data = datamod) > > anova(g1) > numDF denDF F-value p-value > (Intercept) 1 44 3.387117 0.0725 > time 4 44 10.620547 <.0001 > group 1 11 0.508092 0.4908 > time:group 4 44 3.961726 0.0079 > > > [SAS] > proc mixed data = datamod; > class id time group; > model Y = time group time*group /s; > random int / sub = id; > run; > > Type 3 Tests of Fixed Effects > > Num Den > Effect DF DF F Value Pr > F > > time 4 44 7.75 <.0001 > group 1 44 0.51 0.4797 > time*group 4 44 3.96 0.0079 > > Secondly, is there no R equivalent of the "LSMEANS" statement in SAS? Is there a > work-around? > > I would very much appreciate assistance. > > Thanks. Keith > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Dan Bebber
2004-Jul-07 09:39 UTC
[R] Random intercept model with time-dependent covariates, results different from SAS
Hello, I have been struggling with a similar problem, i.e. fitting an LME model to unbalanced repeated measures data. I found "Linear Mixed Models" by John Fox (http://socserv2.socsci.mcmaster.ca/jfox/Books/Companion/appendix-mixed-mode ls.pdf) quite helpful. Fox gives examples which are unbalanced, so I guess that balance is not a requirement (assuming Fox is correct). However, the sample sizes are large compared to yours (and mine), which may make a difference. Dan Bebber ____________________________ Dr. Daniel P. Bebber Department of Plant Sciences University of Oxford South Parks Road Oxford OX1 3RB Tel. 01865 275060 Web. http://www.forestecology.co.uk/ "Data, data, data!" he cried impatiently. "I can't make bricks without clay" - Sherlock Holmes, The Adventure of the Copper Beeches, 1892> Message: 24 > Date: Sun, 4 Jul 2004 19:21:32 +1000 > From: Keith Wong <keithw at med.usyd.edu.au> > Subject: Re: [R] Random intercept model with time-dependent covariates,results different from AS> To: Prof Brian Ripley <ripley at stats.ox.ac.uk> > Cc: r-help at stat.math.ethz.ch > Message-ID: <1088932892.40e7cc1c8c24d at www.mail.med.usyd.edu.au> > Content-Type: text/plain; charset=ISO-8859-1 > > Thank you for the very prompt response. I only included a small > part of the > output to make the message brief. I'm sorry it did not provide > enough detail to > answer my question. I have appended the summary() and anova() > outputs to the > two models I fitted in R. > > Quoting Prof Brian Ripley <ripley at stats.ox.ac.uk>: > > > Looking at the significance of a main effect (group) in the > presence of an > > interaction (time:group) is hard to interpret, and in your case > is I think > > not even interesting. (The `main effect' probably represents difference > > in intercept for the time effect, that is the group difference > at the last > > time. But see the next para.) Note that the two systems are returning > > different denominator dfs. > > > I take your point that the main effect is probably not interesting in the > presence of an interaction. I was checking the results for > consistency to see > if I was doing the right thing. I was not 100% sure that the SAS > code was in > itself correct. > > > At this point you have not told us enough. My guess is that you have > > complete balance with the same number of subjects in each > group. In that > > case the `group' effect is in the between-subjects stratum (as > defined for > > the use of Error in aov, which you could also do), and thus R's 11 df > > would be right (rather than 44, without W and Z). Without balance Type > > III tests get much harder to interpret and the `group' effect > would appear > > in two strata and there is no simple F test in the classical theory. So > > further guessing, SAS may have failed to detect balance and so used the > > wrong test. > > I had not appreciated the need for balance: in actual fact, one > group has 5 > subjects and the other 7. Will this be a problem? Would the R > analysis still be > valid in that case? > > > > The time-dependent covariates muddy the issue more, and I > looked mainly at > > the analyses without them. Again, a crucial fact is not here: do the > > covariates depend on the subjects as well? > > Yes the covariates are measures of blood pressure and pulse, and > they depend on > the subjects as well. > > > The good news is that the results _are_ similar. You do have different > > time behaviour in the two groups. So stop worrying about tests of > > uninteresting hypotheses and concentrate of summarizing that difference. > > > > -- > > Brian D. Ripley, ripley at stats.ox.ac.uk > > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > > University of Oxford, Tel: +44 1865 272861 (self) > > 1 South Parks Road, +44 1865 272866 (PA) > > Oxford OX1 3TG, UK Fax: +44 1865 272595 > > > Thank you. I was concerned that one or both methods were > incorrect given the > results were inconsistent. Perhaps reassuringly, the parameter > estimates for > the fixed effects in both SAS and R were the same. > > Is the model specification OK for the model with just time, group > and their > interaction? > Is the model specification with the 2 time dependent covariates > appropriate? > > Once again, I'm very grateful for the time you've taken to answer > my questions. > > Keith > > [Output from the 2 models fitted in R follows] > > > g1 = lme(Y ~ time + group + time:group, random = ~ 1 | id, data > = datamod) > > > anova(g1) > numDF denDF F-value p-value > (Intercept) 1 44 3.387117 0.0725 > time 4 44 10.620547 <.0001 > group 1 11 0.508092 0.4908 > time:group 4 44 3.961726 0.0079 > > summary(g1) > Linear mixed-effects model fit by REML > Data: datamod > AIC BIC logLik > 372.4328 396.5208 -174.2164 > > Random effects: > Formula: ~1 | id > (Intercept) Residual > StdDev: 11.05975 3.228684 > > Fixed effects: Y ~ time + group + time:group > Value Std.Error DF t-value p-value > (Intercept) 8.250 4.073428 44 2.025321 0.0489 > time1 -0.250 1.614342 44 -0.154862 0.8776 > time2 -8.125 1.614342 44 -5.033011 0.0000 > time3 -8.875 1.614342 44 -5.497596 0.0000 > time4 -4.250 1.614342 44 -2.632652 0.0116 > group1 2.126 6.568205 11 0.323681 0.7523 > time1:group1 -2.734 2.603048 44 -1.050307 0.2993 > time2:group1 5.583 2.603048 44 2.144793 0.0375 > time3:group1 5.549 2.603048 44 2.131732 0.0387 > time4:group1 3.634 2.603048 44 1.396056 0.1697 > Correlation: > (Intr) time1 time2 time3 time4 group1 tm1:g1 > tm2:g1 tm3:g1 > time1 -0.198 > > time2 -0.198 0.500 > > time3 -0.198 0.500 0.500 > > time4 -0.198 0.500 0.500 0.500 > > group1 -0.620 0.123 0.123 0.123 0.123 > > time1:group1 0.123 -0.620 -0.310 -0.310 -0.310 -0.198 > > time2:group1 0.123 -0.310 -0.620 -0.310 -0.310 -0.198 0.500 > > time3:group1 0.123 -0.310 -0.310 -0.620 -0.310 -0.198 0.500 > 0.500 > time4:group1 0.123 -0.310 -0.310 -0.310 -0.620 -0.198 0.500 > 0.500 0.500 > > Standardized Within-Group Residuals: > Min Q1 Med Q3 Max > -2.63416413 -0.42033405 0.03577472 0.46164486 1.74068368 > > Number of Observations: 65 > Number of Groups: 13 > > > > g2 = lme(Y ~ time + group + time:group + W + Z, random = ~ 1 | > id, data > datamod) > > > anova(g2) > numDF denDF F-value p-value > (Intercept) 1 42 5.54545 0.0233 > time 4 42 16.41069 <.0001 > group 1 11 0.83186 0.3813 > W 1 42 0.07555 0.7848 > Z 1 42 45.23577 <.0001 > time:group 4 42 3.04313 0.0273 > > > summary(g2) > Linear mixed-effects model fit by REML > Data: datamod > AIC BIC logLik > 355.2404 382.8245 -163.6202 > > Random effects: > Formula: ~1 | id > (Intercept) Residual > StdDev: 8.639157 2.597380 > > Fixed effects: Y ~ time + group + time:group + W + Z > Value Std.Error DF t-value p-value > (Intercept) 10.056433 9.583658 42 1.049331 0.3000 > time1 0.209668 1.301306 42 0.161121 0.8728 > time2 4.111435 2.556420 42 1.608278 0.1153 > time3 0.423056 2.077066 42 0.203679 0.8396 > time4 -3.976417 1.300572 42 -3.057437 0.0039 > group1 4.677706 5.162006 11 0.906180 0.3843 > W 0.377142 0.127146 42 2.966212 0.0050 > Z -0.531895 0.093276 42 -5.702395 0.0000 > time1:group1 -0.845857 2.126289 42 -0.397809 0.6928 > time2:group1 -5.145361 2.962470 42 -1.736848 0.0897 > time3:group1 -3.261241 2.597008 42 -1.255769 0.2161 > time4:group1 4.153245 2.096587 42 1.980956 0.0542 > Correlation: > (Intr) time1 time2 time3 time4 group1 W Z > tm1:g1 > tm2:g1 > time1 -0.051 > > > time2 0.199 0.308 > > > time3 0.023 0.361 0.817 > > > time4 -0.029 0.501 0.293 0.342 > > > group1 -0.202 0.131 0.136 0.146 0.129 > > > W -0.790 0.019 0.243 0.366 -0.015 0.044 > > > Z -0.146 -0.063 -0.853 -0.779 -0.041 -0.086 -0.409 > > > time1:group1 -0.028 -0.601 -0.043 -0.074 -0.302 -0.187 0.147 > -0.144 > > time2:group1 -0.293 -0.262 -0.818 -0.642 -0.255 -0.198 -0.051 > 0.665 0.276 > > time3:group1 -0.016 -0.286 -0.626 -0.774 -0.273 -0.214 -0.277 > 0.590 0.308 > 0.668 > time4:group1 0.065 -0.306 -0.116 -0.159 -0.616 -0.199 0.002 > -0.046 0.497 > 0.318 > tm3:g1 > time1 > time2 > time3 > time4 > group1 > W > Z > time1:group1 > time2:group1 > time3:group1 > time4:group1 0.376 > > Standardized Within-Group Residuals: > Min Q1 Med Q3 Max > -2.11181231 -0.43210237 0.04949838 0.32444580 2.77710590 > > Number of Observations: 65 > Number of Groups: 13 > > > > > > ------------------------------ > > Message: 25 > Date: Sun, 4 Jul 2004 10:24:47 +0100 (BST) > From: Prof Brian Ripley <ripley at stats.ox.ac.uk> > Subject: Re: [R] Re: Seasonal ARMA model > To: Ajay Shah <ajayshah at mayin.org> > Cc: r-help at stat.math.ethz.ch > Message-ID: <Pine.LNX.4.44.0407041021440.9904-100000 at gannet.stats> > Content-Type: TEXT/PLAIN; charset=US-ASCII > > On Sun, 4 Jul 2004, Ajay Shah wrote: > > > > It might clarify your thinking to note that a seasonal ARIMA model > > > is just an ``ordinary'' ARIMA model with some coefficients > > > constrained to be 0 in an efficient way. E.g. a seasonal AR(1) s > > > 4 model is the same as an ordinary (nonseasonal) AR(4) model with > > > coefficients theta_1, theta_2, and theta_3 constrained to be 0. You > > > can get the same answer as from a seasonal model by using the > > > ``fixed'' argument to arima. E.g.: > > > > set.seed(42) > > x <- arima.sim(list(ar=c(0,0,0,0.5)),300) > > f1 = arima(x,seasonal=list(order=c(1,0,0),period=4)) > > f2 > arima(x,order=c(4,0,0),fixed=c(0,0,0,NA,NA),transform.pars=FALSE) > > > > Is there a convenient URL which shows the mathematics of the seasonal > > ARMA model, as implemented by R? > > No, but there is a book, MASS4 (see the FAQ). Although the > software is in > base R it was in fact written by me to support MASS4. > > R follows S-PLUS in some of its choices of signs, which do differ between > accounts. > > > I understand f2 fine. I understand that you are saying that f1 is just > > an AR(4) with the lags 1,2,3 constrained to 0. But I'm unable to > > generalise this. What would be the meaning of mixing up both order and > > seasonal? E.g. what would it mean to do something like: > > > > arima(x,order=c(2,0,0),seasonal=list(order=c(2,0,0),period=12)) > > That is in MASS4 and most of the books referenced on the help page. > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 > > > > ------------------------------ > > Message: 26 > Date: Sun, 04 Jul 2004 11:38:30 +0200 > From: Peter Mathe <mathe at wias-berlin.de> > Subject: [R] Is there rpm for suse 9.1 under x86_64? > To: R-help at stat.math.ethz.ch > Message-ID: <40E7D016.4060705 at wias-berlin.de> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > I recently upgraded to Suse 9.1 for Amd64. > So far I could not find precompiled binaries of R-1.9.1 for this case. > So I tried installation from source, but could not succeed. Although the > configuration/installation procedure ran without problems, the make > check always ended with errors. When trying to run R , to see what's > going on, the eigen() reported error code -18. > So, is a rpm for R-base-1.9.1 under x86_64 for Suse available, or how > can I succesfully install from sources? > Thank's for reading this message, Peter > > > > ------------------------------ > > Message: 27 > Date: 04 Jul 2004 11:53:54 +0200 > From: Peter Dalgaard <p.dalgaard at biostat.ku.dk> > Subject: Re: [R] Is there rpm for suse 9.1 under x86_64? > To: Peter Mathe <mathe at wias-berlin.de> > Cc: R-help at stat.math.ethz.ch > Message-ID: <x2ekns2j4d.fsf at biostat.ku.dk> > Content-Type: text/plain; charset=us-ascii > > Peter Mathe <mathe at wias-berlin.de> writes: > > > I recently upgraded to Suse 9.1 for Amd64. > > So far I could not find precompiled binaries of R-1.9.1 for this case. > > So I tried installation from source, but could not succeed. Although > > the configuration/installation procedure ran without problems, the > > make check always ended with errors. When trying to run R , to see > > what's going on, the eigen() reported error code -18. > > So, is a rpm for R-base-1.9.1 under x86_64 for Suse available, or how > > can I succesfully install from sources? > > I can't get the upgrade working for me (SATA trouble -- again!) but I > have 9.0 on a system. This has run cleanly for a while with a > home-built RPM based on Detlef's SPEC file, as well as several local > builds. > > A good guess is that they upgraded GCC and something got broken -- > again. You could try reducing the optimization levels on the relevant > files, or as the first thing on everything (-O0 in CFLAGS and FFLAGS). > > > Thank's for reading this message, Peter > > How did you know I would, Peter? ;-) > -- > O__ ---- Peter Dalgaard Blegdamsvej 3 > c/ /'_ --- Dept. of Biostatistics 2200 Cph. N > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907 > > > > ------------------------------ > > _______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE read the posting guide! http://www.R-project.org/posting-guide.html > > > End of R-help Digest, Vol 17, Issue 4 > ************************************* >
Reasonably Related Threads
- Newbie question: Statistical functions (e.g., mean, sd) in a "transform" statement?
- help on "stacking" matrices up
- Strange behaviour when using diff with POSIXt and POSIXlt objects
- Repeates Measures MANOVA for Time*Treatment Interactions
- Treatment-response analysis along time